
Contextual Search Navigation using Semantic Tag Signatures

Geir Solskinnsbakk
Department of Computer and Information Science
NTNU, Trondheim, Norway

Jon Atle Gulla
Department of Computer and Information Science
NTNU, Trondheim, Norway


ABSTRACT
Search has been, and will continue to be, an important tool for users who need to locate information in an ever-increasing amount of resources. Not all queries have a well-defined information need that can easily be described by a keyword query. Exploratory search is one such type of search, where the user is not necessarily proficient in the domain or does not have a clear idea of what he is looking for. In such searches, navigation is beneficial for guiding the user in his quest. In this paper we present an approach to contextual navigation search based on a hierarchical structure constructed from folksonomy tags. The tags are associated with an extended semantic representation that is used to guide the navigation. We introduce five semantic navigators, which are navigation strategies the user can benefit from. We present a prototype that has been implemented to show the applicability of the approach. The preliminary results are promising and demonstrate the ability to direct the user to interesting navigational suggestions and documents.

Categories and Subject Descriptors H.3.3 [Information Search and Retrieval]: Search process—Navigation

General Terms Theory

Keywords Semantic navigation, Tag signatures

1. INTRODUCTION
Search has been, and will continue to be, a very important tool for users who need to find information in an ever-increasing amount of resources. Traditionally, standard keyword-based systems (e.g. Google) have been the most popular. However, as Marchionini points out, “As people demand more of Web services, short queries typed into search boxes are not robust enough to meet all of their demands” [7]. Marchionini makes a distinction between lookup search and exploratory search. Lookup search is typically associated with information seeking with a well-defined information need and precise results. Exploratory search, on the other hand, may require several iterations and the examination and interpretation of multiple sources. Exploratory search may benefit from other types of search that give the user guidance in the search procedure. Precision and recall have long been important measures of success in the information retrieval community, but they require a specific information need that can be fulfilled. This is not necessarily the case when the user wants to explore a topic or domain. In such cases, navigational search can help the user explore a topic by (1) showing the breadth of a topic/domain, and (2) showing related topics, subtopics, etc. Traditional navigation systems are often based on ontologies to guide the user in the navigation task (see e.g. [3, 8, 11]). Using the ontology directly for navigation is cumbersome, since large ontologies are hard to visualize well. Moreover, mapping the user input (query) to the ontology is not straightforward. The concepts of the user and those of the ontology creator (expert) may differ, both in how the concepts are defined and in how they are represented in term space [9]. There is thus a mismatch between the concept space of humans and the terms that represent the concepts, both in the users' interpretation of a concept in term space and in the ontology's representation of that concept in term space. As a result, it may be hard for users to locate a good starting point in the ontology for the navigation task. Finally, ontology engineering is a time-consuming and expensive activity, which may lead to a knowledge acquisition bottleneck [6].

A representation of the domain for navigational purposes that is simple to construct and maintain is therefore more adequate, since such a structure can be updated and rearranged automatically at certain intervals to reflect changes in the domain. This paper proposes an approach to semantic navigation based on folksonomies. Folksonomies are centered around three main entities: users, objects, and tags. Popular examples of folksonomies include Delicious (www.delicious.com) and Flickr (www.flickr.com). In our approach, we base the semantic navigation on structuring the folksonomy data and extending the folksonomy tags with semantics in the form of tag signatures. We introduce five types of semantic navigation strategies, called navigators, which support the user in navigating the information space. The approach is demonstrated with an implemented prototype, which shows promising results. The paper is organized as follows: Section 2 discusses semantic navigation, while Section 3 gives an overview of our approach. Our preliminary results are described in Section 4, and finally Section 5 concludes the paper.

2. SEMANTIC NAVIGATION
Exploratory search [7] means that a user is sifting through large amounts of information and has more complex information needs. Traditional search, in which a simple keyword query returns a list of documents related to the query, is not always the best strategy in such a situation [7]. Navigation strategies that help the user explore the information would be beneficial here. To our knowledge, semantic navigation has no well-established definition. Villela Dantas et al. [11] describe conceptual navigation as “...a way of navigating in Web sites when one is inserted in a context”. The context here is represented by an ontology which guides the user in the search by (i) letting the user see the structure of the domain modeled by the ontology and (ii) helping the user see relations between different parts of the domain ontology. The ontology is thus a model that aids the user in understanding the domain. In our opinion, such models are especially beneficial when the user is not familiar with the domain. In the literature, semantic navigation is also closely related to the notion of context. Boselli et al. [3] describe navigation as a change of context. This context can be identified through the activities of a user, previous activities, relevant topics in the user's company, etc. [3]. Becker et al. [1] present an approach for modeling semantic navigation structures. The authors “argue that navigation structures should resemble existing structures or hierarchies, that are familiar within an organization” [1]. In other words, a structure that the user understands and is well acquainted with is beneficial for navigation. Further, the authors state that the structure can be used to classify the system's content. Three types of navigation are identified: content-structure navigation, free navigation, and semantic navigation. Kopak et al. [5] characterize semantic navigation as “a process of constructing meaning while moving through an information space, which is supported by explicit and implicit cues in the environment”. The authors focus on genre, linking, and annotation as means of supporting semantic navigation. Genre is important since the genre (email, homepage, article, etc.) gives cues to the intention of the content creator, and is also associated with expectations about the structure and content of the information [5]. Links are important so that the user is explicitly aware of how information objects are related. Systems like Tabulator [2] that are based on more formal models (e.g. RDF) may also give the user the opportunity to find what he is seeking within the structure itself. Summing up, we can say that semantic navigation should not only be a means for users to explore new information, but should also help the user get the bigger picture; the user should be able to recognize how information is related. It is beneficial for this process to have some model to guide the user. The user should be able to select a starting point (e.g. a node in the model or a keyword query) and from there navigate to related information, gathering information along the way. This leads to our definition of semantic navigation, given as Definition 1 below. The term concept will be used rather than tag in the remainder of this section to make the discussion more generic.

Definition 1. Semantic navigation lets the user explore the information space by using a structured model of a domain. The model should visualize the domain to the user in such a way that the user gets an overview of important relations in the domain. The user should be able to find information in terms of resources that have a connection to the concepts of the domain.

Our approach to semantic navigation introduces the notion of semantic navigators. A semantic navigator is a strategy for visualizing navigation alternatives to the user. The semantic navigators are constructed so that the user is presented with the most interesting navigation alternatives for a given query. First, the user query is mapped to a concept in the structure. This concept is used as the initial context for the navigators, since the semantic navigators act in the vicinity of a context concept. We have defined five different semantic navigators. The first two are the generalization and specialization navigators, which operate on the abstraction relations in the hierarchy. The generalization navigator presents the user with more general concepts higher up in the hierarchy, while the specialization navigator presents the user with child concepts. Next, the sibling navigator presents the user with concepts that are siblings of the current concept, facilitating horizontal navigation of the structure. The association navigator presents the user with concepts that are related through relations other than abstraction. The association navigator is thus freer in suggesting concepts for navigation. Finally, the instance navigator presents the user with relevant instances. Each navigator is responsible for ranking the concepts it proposes to the user according to some relevance metric. The motivation behind these navigators is that each provides the user with valuable clues for the navigation process: each navigator presents a different aspect of the structure that is a viable option for further navigation given the current context.
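To make the three structural navigators concrete, the following Python sketch (not the authors' implementation) computes the candidate sets of the generalization, specialization, and sibling navigators over a tag hierarchy stored as a hypothetical child-to-parent map. The candidate sets follow the implementation described in Section 3.1 (the parent plus the parent's siblings, all subsumed tags, and all tags sharing the parent); ranking the candidates is a separate step, and the class and method names are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class TagHierarchy:
    """Tag hierarchy stored as a child -> parent map (illustrative structure).

    Root tags have parent None.
    """

    def __init__(self, parent_of: Dict[str, Optional[str]]):
        self.parent_of = dict(parent_of)
        # Invert the map once so children can be looked up directly.
        self.children_of: Dict[Optional[str], List[str]] = defaultdict(list)
        for tag, parent in self.parent_of.items():
            self.children_of[parent].append(tag)

    def _siblings(self, tag: str) -> List[str]:
        # Tags sharing the parent of `tag` (root tags are siblings of each other).
        parent = self.parent_of.get(tag)
        return [t for t in self.children_of[parent] if t != tag]

    def generalization_candidates(self, tag: str) -> List[str]:
        # The parent and all siblings of the parent.
        parent = self.parent_of.get(tag)
        if parent is None:
            return []
        return [parent] + self._siblings(parent)

    def specialization_candidates(self, tag: str) -> List[str]:
        # Every tag subsumed by `tag` (all descendants, not only children).
        result: List[str] = []
        stack = list(self.children_of.get(tag, []))
        while stack:
            child = stack.pop()
            result.append(child)
            stack.extend(self.children_of.get(child, []))
        return result

    def sibling_candidates(self, tag: str) -> List[str]:
        # Horizontal navigation: tags with the same parent as `tag`.
        return self._siblings(tag)
```

For example, with a made-up hierarchy in which science has the children energy, space, and biology, and energy has the children solar and nuclear, `generalization_candidates("solar")` returns `['energy', 'space', 'biology']` and `sibling_candidates("solar")` returns `['nuclear']`.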

3. APPROACH

The model we use is a hierarchy of tags built by an unsupervised construction process. Since the construction is unsupervised, expensive and tedious ontology construction is avoided. Second, the model is extended with tag signatures, which are shallow semantic representations of the tags in the hierarchy. The tag signature of a tag is represented as a vector of terms describing the tag, where each term is weighted to reflect the strength of the relation between the tag and the term. The tag signature thus gives a richer representation of a tag's semantics than the tag does on its own. For a more thorough description of the construction and rationale of both the tag hierarchy and the tag signatures, we refer the reader to [10].
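A tag signature can be pictured as a sparse mapping from terms to weights. The Python sketch below assumes that representation (how the weights are actually computed is described in [10] and not reproduced here) and implements the cosine similarity between two signatures, the measure the navigators in Section 3.1 use for ranking. The type alias and function name are hypothetical.

```python
import math
from typing import Dict

# A tag signature: term -> weight (sparse; absent terms have weight 0).
TagSignature = Dict[str, float]


def cosine(a: TagSignature, b: TagSignature) -> float:
    """Cosine similarity between two sparse tag signatures."""
    # Iterate over the smaller signature for the dot product (cosine is symmetric).
    if len(a) > len(b):
        a, b = b, a
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty signature is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

Storing signatures sparsely keeps the dot product proportional to the number of shared terms rather than to the size of the whole vocabulary.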

3.1 Navigation

We have implemented four of the five navigators described in Section 2: sibling, generalization, specialization, and association. The instance navigator has not been implemented because instances are not recognized in the current structure. At present we support two query types: (i) two tags, and (ii) a single tag and a keyword. These two query types offer two different aspects. For a query with two tags, the navigators suggest navigation options that are in some sense connected to the semantics of both tags, whereas for a query with a tag and a keyword, the navigators try to balance the suggestions between the tag semantics and the keyword. The generalization, specialization, and sibling navigators are implemented similarly and use the relations found in the hierarchy. The association navigator uses the same strategy when the query maps to a tag and a keyword; the difference is that its relations are not necessarily defined by the hierarchy, but by the magnitude of the cosine similarity between the tag and all other tags in the hierarchy. This introduces generic relations into the hierarchy based on the semantic similarity between the tags. First, the tag(s) in the query are mapped to tag(s) in the hierarchy, which places the navigation in context. The generalization navigator ranks the more general tags: it includes the parent and all siblings of the parent in the ranking. The specialization navigator ranks all tags subsumed by the given tag, while the sibling navigator ranks all tags with the same parent as the given tag. Tag ranking is done in two ways. For queries that map to two tags, we first average the tag signatures of the two tags to obtain a combined representation. We then calculate the cosine score between this average signature and the signature of each related tag (the first factor of Equation 1).
The tags are ranked by this cosine score. If one query term can be mapped to a tag and the other is treated as a keyword, we use the keyword as an extra indicator of relevance. The tag signature is used to calculate the cosine similarity with each related tag, while the keyword is used to look up its weight in the tag signature of each related tag. This is done according to Equation 1, where the first factor is the cosine measure and the second factor is the keyword's influence on the score. Here $\mathit{score}_n$ is the score of related tag $n$, $w_{l,m}$ is the weight of term $l$ in the signature of tag $m$, $i$ is the tag the query was mapped to, $n$ is the related tag, $w_{kw,n}$ is the weight of the keyword in the tag signature of $n$, and the sums run over the $t$ terms of the signature vocabulary.

$$\mathit{score}_n = \frac{\sum_{k=1}^{t} w_{k,i} \cdot w_{k,n}}{\sqrt{\sum_{k=1}^{t} w_{k,i}^{2}} \cdot \sqrt{\sum_{k=1}^{t} w_{k,n}^{2}}} \cdot w_{kw,n} \qquad (1)$$
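The two ranking schemes can be sketched in Python as follows, again assuming signatures are sparse term-to-weight dictionaries; the function names are hypothetical and this is not the authors' code. `rank_two_tags` averages the two query signatures and ranks candidates by cosine, while `rank_tag_and_keyword` implements Equation 1 literally, multiplying the cosine by the keyword's weight in each candidate's signature.

```python
import math
from typing import Dict, List, Tuple

Signature = Dict[str, float]
Ranking = List[Tuple[str, float]]


def cosine(a: Signature, b: Signature) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def average_signature(a: Signature, b: Signature) -> Signature:
    # Term-wise mean; a term missing from one signature counts as weight 0.
    return {t: (a.get(t, 0.0) + b.get(t, 0.0)) / 2 for t in set(a) | set(b)}


def rank_two_tags(sig_a: Signature, sig_b: Signature,
                  candidates: Dict[str, Signature]) -> Ranking:
    # Query type (i): rank candidates by cosine with the averaged signature.
    query = average_signature(sig_a, sig_b)
    scored = [(tag, cosine(query, sig)) for tag, sig in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def rank_tag_and_keyword(sig_i: Signature, keyword: str,
                         candidates: Dict[str, Signature]) -> Ranking:
    # Query type (ii), Equation 1: score_n = cosine(i, n) * w_{kw,n}.
    # A candidate whose signature lacks the keyword scores 0.
    scored = [(tag, cosine(sig_i, sig_n) * sig_n.get(keyword, 0.0))
              for tag, sig_n in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Here `candidates` would hold the signatures of the tags returned by one of the structural navigators; for the association navigator with a tag and a keyword, it would instead hold all other tags in the hierarchy. Note that the multiplicative keyword factor means a candidate is ranked last whenever the keyword is absent from its signature.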

For the association navigator, when the query maps to two tags A and B, the tags are ranked by first calculating, for each of the two tags, the cosine similarity between its tag signature and the signatures of all other tags in the hierarchy. Then, for each other tag C, we multiply the two cosine scores, $\mathit{score}_C = \mathrm{cosine}(A, C) \cdot \mathrm{cosine}(B, C)$, where $\mathrm{cosine}(N, M)$ is the cosine score between the tag signatures of tags N and M. The tags C with the highest $\mathit{score}_C$ are presented to the user. Informal experimentation with several scoring metrics found that the ones presented here seem to give the best performance.

In addition to the navigation suggestions, the user is presented with a set of documents retrieved using the tag(s) and, if present, the keyword. If the query maps to two tags, the generated query consists of the 50 most prominent terms of the average tag signature, used as a weighted query against a vector-space document index. If the query maps to a single tag and a keyword, the keyword is required in the result documents, while the tag signature query acts as a filter that guides the search. The keyword thus provides the direction intended by the user, while the tag signature provides filtering based on the semantics of the tag. Using tag signatures in document search gives a dynamic binding between tags and documents, so we do not rely on pre-indexing the relations between tags and documents. Consequently, the document collection does not have to be re-indexed when the tag signatures are updated. The document search is implemented in Java using Lucene (http://lucene.apache.org).
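The association ranking for two tags and the signature-based document query can be sketched as below. This is a minimal Python illustration, not the authors' Java/Lucene code: `association_rank` multiplies the two cosine scores as described above, `top_terms` keeps the 50 most prominent signature terms, and `lucene_query_string` is a hypothetical helper that renders a query in Lucene's classic query-parser syntax (`+term` marks a required clause, `term^boost` a weighted clause). Treating the signature terms as optional boosted clauses is our assumption about how the "filter" was realized; the paper does not give the actual query.

```python
import math
from typing import Dict, Iterable, List, Optional, Tuple

Signature = Dict[str, float]


def cosine(a: Signature, b: Signature) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def association_rank(sig_a: Signature, sig_b: Signature,
                     all_tags: Dict[str, Signature],
                     exclude: Iterable[str] = (),
                     k: int = 10) -> List[Tuple[str, float]]:
    # score_C = cosine(A, C) * cosine(B, C) for every other tag C.
    skip = set(exclude)
    scored = [(tag, cosine(sig_a, sig_c) * cosine(sig_b, sig_c))
              for tag, sig_c in all_tags.items() if tag not in skip]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def top_terms(sig: Signature, n: int = 50) -> Signature:
    # Keep the n most prominent (highest-weighted) terms of a signature.
    best = sorted(sig.items(), key=lambda pair: pair[1], reverse=True)[:n]
    return dict(best)


def lucene_query_string(sig: Signature, keyword: Optional[str] = None,
                        n: int = 50) -> str:
    # Weighted signature terms become optional boosted clauses; a keyword,
    # if given, becomes a required (+) clause. Terms are assumed to be
    # plain tokens that need no query-syntax escaping.
    clauses = [f"{term}^{weight:g}" for term, weight in top_terms(sig, n).items()
               if weight > 0]
    if keyword is not None:
        clauses.insert(0, f"+{keyword}")
    return " ".join(clauses)
```

Because the product of two cosines is high only when a tag is similar to both query tags, `association_rank` favors tags that bridge the two topics rather than tags close to just one of them.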

Figure 1: Partial hierarchy used in the prototype.

4. PRELIMINARY RESULTS

We have implemented a prototype to test our approach on real data. The prototype uses a tag hierarchy generated with the data and approach described in [10]. We used a pruned version of the hierarchy from [10] containing 150 tags and spanning three topics: science, foto, and security. Figure 1 depicts the part of the hierarchy under the top node security. Figure 2 shows two example queries and the four implemented navigators. The examples show a query that maps to the tag energy and the keyword pollution (Figure 2(a)), and a query that maps to the tags energy and space (Figure 2(b)). Each navigator points out further directions that may be interesting for the user to follow. In these specific examples, the generalization navigator does not supply interesting alternatives to the user, since the tree is so small. However, the other navigation suggestions are good at showing different directions the search could take. The examples also show that the navigators rank the tag suggestions differently depending on the query context. Tag suggestions given by the navigators act as links the user can follow to explore the topic of the query. Informal inspection of the retrieved documents shows that the documents indeed reflect the intention of the queries. The results appear to be of higher quality and more focused on the query intention than results for plain keyword queries. We attribute this to the tag signatures guiding the document search in the right direction.

Figure 2: Example navigators for two queries: (a) Query: Tag:energy Keyword:pollution; (b) Query: Tag:energy Tag:space.

5. DISCUSSION & CONCLUSIONS
The hierarchy is based on folksonomy tags and unsupervised construction. One may argue that ontologies are more precise, since they are manually constructed, and should thus be of higher quality. However, the knowledge acquisition bottleneck [6] associated with ontology construction suggests that a more informal representation based on unsupervised approaches is more appealing. The evaluation in [10] shows that the quality of the hierarchy is generally good, and in our opinion the benefit of a structure that is constructed unsupervised and easy to update outweighs its few quality issues. One of the current limitations of the structure is that only hierarchical relations are found. It may be beneficial to add non-hierarchical relations across the hierarchy, which can be done by utilizing the cosine between the tag signatures of the tags in the hierarchy. However, such relations have no clear definition and are not easy to name (e.g. hasPart, isMemberOf), so they would be regarded as generic associations. The quality of the tag signatures is important, since they are used both to connect tags to documents and to rank the tags in the navigators. Since the tag signatures are generated from how users apply tags to textual resources, we argue that the signatures should in general be good. Further, the use of supporting keywords lets the system balance between using the tag signature as a filter for topical semantics and using the keywords for directional guidance of the navigation. Tag ambiguity is a limiting factor for our approach, since we do not disambiguate tags. Disambiguation could easily be handled by applying one of the many approaches for tag disambiguation found in the literature (e.g. [4]) before generating both the tag signatures and the tag hierarchy.

In this paper we have presented an approach to contextual navigation search that uses a hierarchical model of folksonomy tags to guide the search. In addition, we use an extended semantic representation of the tags, tag signatures, which describe the tags in terms of the vocabulary of the documents the folksonomy users have applied them to. Five types of semantic navigators have been identified, which employ different strategies for navigating the hierarchy. The preliminary experimentation with the prototype shows promising results, which make a good basis for further improvement. We also plan to do a more thorough evaluation of the improved approach.

6. REFERENCES

[1] J. Becker, C. Brelage, K. Klose, and M. Thygs. Conceptual modeling of semantic navigation structures: the mosena-approach. In Proceedings of the 5th ACM international workshop on Web information and data management, WIDM ’03. ACM, 2003.

[2] T. Berners-Lee, Y. Chen, L. Chilton, D. Connolly, R. Dhanaraj, J. Hollenbach, A. Lerer, and D. Sheets. Tabulator: Exploring and analyzing linked data on the semantic web. In The 3rd International Semantic Web User Interaction Workshop (SWUI06), 2006.

[3] R. Boselli and F. Paoli. Semantic navigation through multiple topic ontologies. In Proceedings of the 2nd Italian Semantic Web Workshop, Semantic Web Applications and Perspectives (SWAP), 2005.

[4] A. Garcia, M. Szomszor, H. Alani, and O. Corcho. Preliminary results in tag disambiguation using dbpedia. In CKCaR’09: Proceedings of the 1st International Workshop on Collective Knowledge Capturing and Representation at K-CAP 2009, 2009.

[5] R. Kopak, L. Freund, and H. L. O’Brien. Supporting semantic navigation. In Proceedings of the third symposium on Information Interaction in context (IIiX 2010), pages 359–364. ACM, 2010.

[6] A. Maedche and S. Staab. Ontology learning for the semantic web. IEEE Intelligent Systems, 16:72–79, March 2001.

[7] G. Marchionini. Exploratory search: from finding to understanding. Commun. ACM, 49:41–46, April 2006.

[8] G. Polaillon, M. A. Aufaure, B. Le Grand, and M. Soto. FCA for contextual semantic navigation and information retrieval in heterogeneous information systems. In 18th International Workshop on Database and Expert Systems Applications, DEXA ’07, 2007.

[9] M. L. Shaw and B. R. Gaines. Comparing conceptual structures: consensus, conflict, correspondence and contrast. Knowledge Acquisition, 1(4):341–363, 1989.

[10] G. Solskinnsbakk and J. A. Gulla. A hybrid approach to constructing tag hierarchies. In On the Move to Meaningful Internet Systems, OTM 2010, volume 6427 of LNCS, pages 975–982. Springer, 2010.

[11] J. R. Villela Dantas and P. P. Muniz Farias. Conceptual navigation in knowledge management environments using NavCon. Information Processing & Management, 46(4):413–425, 2010.