Evaluating usability and precision of visual search engine

Evaluating Usability and Precision of Visual Search Engine ∗

Olfa Nasraoui
Knowledge Discovery and Web Mining Lab, Dept. of Computer Engineering and Computer Science, University of Louisville, Louisville, KY 40292 USA

Leyla Zhuhadar
Knowledge Discovery and Web Mining Lab, Dept. of Computer Engineering and Computer Science, University of Louisville, Louisville, KY 40292 USA

Abstract

In this paper, we evaluate a visual search engine designed to solve the real problem of browsing and searching for documents in a vast repository of colleges/courses located at Western Kentucky University (WKU). The architectural design of this interface combines Formal Concept Analysis (FCA) with Semantic Factoring to decompose complex, vast concepts into their primitives in order to develop a knowledge representation for the HyperManyMedia platform 1. The main objective of this study is to test (a) the efficiency of ranking the documents using precision, and (b) the usability of the visual search engine. This approach has been implemented and used by online students at WKU 2.

Categories and Subject Descriptors H.3.3 [Information Systems]: Information Storage and Retrieval - Information Search and Retrieval

Keywords Visual Information Retrieval, Usability, Precision, Ontology, Semantic Web, Search Engine

1. Introduction

This paper evaluates a visual knowledge representation of a graphical model expressed as a semantic network. To develop the visual knowledge representation for the whole E-learning repository, HyperManyMedia, we rely on a well-known principle, Zipf's Principle of Least Effort [7], to argue for building a small vocabulary that represents the whole domain of our repository. In addition, we use the Collocation Concept [3] to build the ontology. More specifically, our ontology consists not only of nouns but also of compound phrases (e.g., “Introduction to Literature”, as a course name). The main focus of this paper is the evaluation of a visual representation of the ontology that allows learners to navigate the system visually. This representation provides the user (learner) with a visual search engine to summarize the entire domain (E-learning). It can be considered a tool to help visualize concepts and subconcepts. This visual exploration of documents gives users an overall view of the entire repository without clicking on the resources and reading each document. When a user types a query in the visual search engine, the engine dynamically matches the query against the whole visual ontology (concepts, subconcepts, etc.). It presents all the sectors (concepts/subconcepts) that share terms with the submitted query in different colors than the unmatched concepts, so the user can find what he/she is looking for immediately. As the user adds more terms to the query, the number of matched sectors narrows down to the most similar concepts in the ontology.

∗ This work is partially supported by National Science Foundation CAREER Award IIS-0133948 to Olfa Nasraoui.
1 http://HyperManyMedia.wku.edu
2 http://www.wku.edu

© 2010 SCS. All rights reserved. Reprinted here with permission.

2. Background

2.1 Ontology

Gruber defined an ontology as “an explicit specification and formal specification of conceptualization of a domain of interest” [2]. The main goal of using an ontology in Gruber’s work was to support the sharing and reuse of formally represented knowledge in AI systems. To accomplish this, a common vocabulary needs to be defined and then used to represent the shared knowledge [2]. This vocabulary includes definitions of classes, functions, objects, and the relationships among them, which together constitute an ontology. More specifically, the ontology represents the language of the Semantic Web. Since the Semantic Web will not replace the current Web but will rather be built on top of it, a new structure was needed. The existing markup language, HTML, needed to be preserved, and a new semantic language, the Resource Description Framework (RDF), needed to be introduced. The Web Ontology Language (OWL) is layered on top of RDF, which in turn can be serialized in an XML-based syntax. The first structure of what is called “The Semantic Layers” [1] was proposed by Tim Berners-Lee in 2001.

2.2 Information Retrieval Evaluation Metrics

Several evaluation methods have been introduced in the literature, such as Recall, Precision, F-measure, Harmonic Mean, E Measure, User-Oriented Measure (coverage, novelty), expected search length, satisfaction, frustration, etc. The most widely used are: (a) Top-n Recall, which is the number of relevant retrieved documents among the top n retrieved documents divided by the total number of relevant documents, and (b) Top-n Precision, which is the number of relevant retrieved documents within the top n divided by n.
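As a minimal sketch of the two most widely used metrics, the Python functions below compute Top-n Precision and Top-n Recall from a ranked result list. The function names, the document identifiers, and the relevance set are illustrative assumptions and are not part of the HyperManyMedia implementation.

```python
from typing import Sequence, Set


def top_n_precision(ranked: Sequence[str], relevant: Set[str], n: int) -> float:
    """Relevant documents within the top n results, divided by n."""
    if n <= 0:
        raise ValueError("n must be positive")
    hits = sum(1 for doc in ranked[:n] if doc in relevant)
    return hits / n


def top_n_recall(ranked: Sequence[str], relevant: Set[str], n: int) -> float:
    """Relevant documents within the top n results, divided by all relevant documents."""
    if not relevant:
        raise ValueError("the set of relevant documents must not be empty")
    hits = sum(1 for doc in ranked[:n] if doc in relevant)
    return hits / len(relevant)


# Hypothetical ranking and relevance judgments, for illustration only.
ranked_results = ["d1", "d2", "d3", "d4", "d5"]
relevant_docs = {"d1", "d3", "d9"}

precision_at_5 = top_n_precision(ranked_results, relevant_docs, 5)  # 2 hits / 5 = 0.4
recall_at_5 = top_n_recall(ranked_results, relevant_docs, 5)        # 2 hits / 3 relevant
```

Note that the denominators differ: precision divides by n, so it can be computed from the top n results alone, whereas recall needs the full set of relevant documents in the repository.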


3. Implementation

3.1 Constructing the HyperManyMedia Ontology

We used Protégé 3, an open source ontology editor and knowledge-based framework that supports two ways of modeling ontologies, (1) the Protégé-Frames editor and (2) the Protégé-OWL editor, to design and build the structure of the HyperManyMedia ontology. Our current ontology consists of ~40,000 lines of code 4. The main question is how to design an ontology that can summarize the whole domain. We used two concepts, Formal Context Representation (FCR) and Semantic Factoring (SF); refer to our previous work [5] for more details.

Figure 1 depicts the upper level of the HyperManyMedia ontology in Protégé and describes the classification of the ontology. The highest level is “Thing”; underneath it are the definitions of the five major entities (College, Course, Language, Lecture, and Professor). Since we extended the domain ontology into the multilingual domain (English and Spanish), we needed to define the same entities in Spanish. Protégé provides the user with the capability to create any type of relationship that fits the structure needed. In our case, we defined the following relationships: has_College, has_Course, has_Language, has_Lecture, has_Professor, and sub_Class_Of. In addition, each entity has different characteristics (Functional, Description).

Figure 1. Augmented Ontology: level-1

3 http://protege.stanford.edu
4 http://people.wku.edu/leyla.zhuhadar/semanticowl.owl

4. Evaluating the Visual Search Engine

We used two types of evaluation methodologies to test our visual search engine: (1) Precision, and (2) Usability.

4.1 Precision

Precision is used to evaluate the accuracy of the retrieved search results. Our visual search engine was designed based on the ontology discussed above. Table 1 presents the visual concepts that have been tested.

Table 1. Top-20 Precision for Visual Search Interface

Visual Concept                              Top-20-Precision
History                                     0.20
General Biology                             0.56
Bioquímica de posgrado                      0.87
Introducción a la biología                  0.77
Accounting                                  0.30
Economic Analysis for Business Decisions    0.76
Game Theory for Managers                    0.88
Teoría de juegos para directivos            0.50
Introducción al marketing                   0.78
Marketing Management                        0.89
Introducción al marketing                   0.88
Marketing                                   0.30
Estrategia de marketing                     0.88
Game Theory                                 0.86
Teoría de juegos                            0.85
Engineering                                 0.28
Wave Propagation                            0.79
Contabilidad financiera                     0.75
Financial Accounting                        0.53
English                                     0.11
Average                                     0.637

We selected 20 concepts and subconcepts randomly to test the visual interface. As shown in Table 1, when the visual term is a concept, the precision is very low, whereas when the visual term is a subconcept, the precision is very high. This is comparable to our findings in metadata search [4, 6] when using “single-term keyword,” “two-term keywords,” and “three-term keywords” queries. When the term is a concept that consists of a single term, e.g., “English” (0.11), this term can be found in many different documents not related to the English courses. On average, the Top-20 precision with random sampling of concepts/subconcepts was 0.637, which is considered acceptable. We did not evaluate recall in the visual test, since our testing covered only the Top-20 results. Figure 2 presents the visual interface with the ranked documents after a user clicks on the concept Engineering.

Figure 2. Right clicking on the “Engineering” Sector
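To make the concept/subconcept gap in Table 1 concrete, the following sketch recomputes the reported average and compares two groups of rows. The pairing of labels with values follows the row order of Table 1. Treating single-word labels as top-level concepts is an assumption made only for illustration, since the paper does not state which rows are concepts.

```python
from statistics import mean

# (visual concept, Top-20 precision), in the row order of Table 1.
TABLE_1 = [
    ("History", 0.20),
    ("General Biology", 0.56),
    ("Bioquímica de posgrado", 0.87),
    ("Introducción a la biología", 0.77),
    ("Accounting", 0.30),
    ("Economic Analysis for Business Decisions", 0.76),
    ("Game Theory for Managers", 0.88),
    ("Teoría de juegos para directivos", 0.50),
    ("Introducción al marketing", 0.78),
    ("Marketing Management", 0.89),
    ("Introducción al marketing", 0.88),
    ("Marketing", 0.30),
    ("Estrategia de marketing", 0.88),
    ("Game Theory", 0.86),
    ("Teoría de juegos", 0.85),
    ("Engineering", 0.28),
    ("Wave Propagation", 0.79),
    ("Contabilidad financiera", 0.75),
    ("Financial Accounting", 0.53),
    ("English", 0.11),
]


def is_single_term(label: str) -> bool:
    """Assumption: a one-word label stands for a top-level concept."""
    return len(label.split()) == 1


overall = mean([p for _, p in TABLE_1])
single_term = mean([p for label, p in TABLE_1 if is_single_term(label)])
multi_term = mean([p for label, p in TABLE_1 if not is_single_term(label)])

print(f"overall:     {overall:.3f}")      # 0.637, the average reported in Table 1
print(f"single-term: {single_term:.3f}")  # 0.238
print(f"multi-term:  {multi_term:.3f}")   # 0.770
```

Under this grouping, the five single-word labels average well below the fifteen multi-word labels, which matches the observation that single-term concepts such as “English” match many unrelated documents.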

4.1.1 Usability

The usability test consists of evaluating each concept and subconcept presented in the visual interface. The test covers three levels of testing: (1) based on the hierarchical level of the ontology domain, (2) based on the English resources in each level, and (3) based on the Spanish resources in each level (refer to Table 2 for more details).

Table 2. Usability Test for the Visual Search Engine

Test Type            Hierarchical Level                               English Resources          Spanish Resources
                                                                      (Concepts/SubConcepts)     (Concepts/SubConcepts)
Left button click    College (Concept)
                     Course (SubConcept)
                     Lecture (SubSubConcept)
                     descriptive features from (SubSubSubConcept)
Right button click   Course (SubConcept)
                     Lecture (SubSubConcept)
                     descriptive features from (SubSubSubConcept)
Double-click         Course (SubConcept)
                     Lecture (SubSubConcept)
                     descriptive features from (SubSubSubConcept)

• Functionality Test:

Testing the usability of the visual interface is related to the functions the interface provides through the mouse. Each of the following functions serves a different purpose. In Table 2, we distinguish each of these functions and run the test on each level separately.

1. Active Visual Search: When a user types a query in the visual search engine, the engine dynamically matches the query against the whole visual ontology (concepts, subconcepts, etc.). Consequently, it presents all the sectors (concepts/subconcepts) that share the query terms in different colors than the unmatched sectors, so the user can find what he/she is looking for immediately. As the user types more letters in the query, the number of matched sectors narrows down to the most similar concepts in the ontology. For example, when a user types the three letters “Eng” one at a time, Figure 4 shows 186 concepts retrieved by typing only “E”, Figure 5 shows 32 concepts retrieved by typing “En”, and Figure 6 shows 4 concepts retrieved by typing “Eng”. In the last case, the visual ontology matches 4 concepts (refer to Figure 6): (a) Monarchy in England (Lecture: SubSubSubConcept level), (b) English (College: SubConcept level), (c) Engineering (College: SubConcept level), and (d) Introduction to Computers and Engineering (Course: SubSubConcept level).

Figure 4. Visual Search for “E”

Figure 5. Visual Search for “En”

Figure 6. Visual Search for “Eng”

Figure 3. One Level Filtering of the query “Engineering”
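The narrowing behavior of Active Visual Search can be illustrated with a small sketch. It assumes that a sector matches when any word of its label starts with the typed text, ignoring case; the paper does not specify the actual matching rule, so this rule, the function name, and the shortened label list (including one hypothetical label) are assumptions.

```python
from typing import List

# A few labels taken from the paper, plus one hypothetical label
# ("Energy Systems") added only to show the narrowing step.
SECTOR_LABELS = [
    "Monarchy in England",
    "English",
    "Engineering",
    "Introduction to Computers and Engineering",
    "Economic Analysis for Business Decisions",
    "Estrategia de marketing",
    "Energy Systems",  # hypothetical, not from the paper
    "History",
    "Game Theory",
]


def matched_sectors(query: str, labels: List[str]) -> List[str]:
    """Return labels in which at least one word starts with the query (case-insensitive)."""
    prefix = query.strip().lower()
    if not prefix:
        return []
    return [
        label
        for label in labels
        if any(word.lower().startswith(prefix) for word in label.split())
    ]


# Each additional letter can only keep or shrink the set of highlighted sectors.
for typed in ("E", "En", "Eng"):
    print(typed, len(matched_sectors(typed, SECTOR_LABELS)))
```

With this toy list, “Eng” leaves exactly the four labels the paper reports for that query; the counts for “E” and “En” are far smaller than the 186 and 32 reported for the full ontology because the list is only a sample.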

2. Left Mouse Button Click on a Sector: If the level of filtering is equal to 1, the user is able to move from a concept to its subconcepts (e.g., Engineering -> Hydrology) and to all the concepts underneath the specific concept “Engineering”; thus, all concepts under Engineering can be seen and retrieved visually (refer to Figure 3).

(a) In each sector, the user can go to a deeper level of granularity until reaching the leaves of that level in the graph.

(b) If the level of filtering is higher than 1 (refer to Figures 7 and 8), the user sees from the beginning a level of granularity equal to the level of filtering. By clicking on a specific concept, the level of granularity of that concept can be extended further. The process stops when it reaches the leaves of the graph.

Figure 7. Two Levels Filtering of the query “Engineering”

Figure 8. Three Levels Filtering of the query “Engineering”

3. Right Mouse Button Click on a Sector (refer to Figure 2): The retrieval system considers the concept/subconcept in this node as a query term and retrieves all related concepts matching that query.

(a) That specific node becomes the root of the graph, and all the concepts underneath it are updated.

(b) This procedure is repeated until the user reaches the leaves of the tree under that specific concept.

4. Double-Click on a Sector:

(a) In this case, the order of the visualization changes: e.g., double-clicking on Engineering brings Engineering to the top level of the graph, where it is considered the main concept under which the user would like to search (refer to Figure 9).

(b) The user can navigate up and down through the graph (ontology); the upper hierarchy level represents an upper concept of the current node, and the lower level represents a subconcept of the current node.

Figure 9. Double-click on the “Engineering” Sector

5. Conclusion and Future Work

In this paper, we evaluated a visual search model for the HyperManyMedia platform. Testing this model was based on (a) the efficiency of the visual interface in ranking the retrieved documents, measured using precision, and (b) the usability of the visual model. In the efficiency test, we noticed that when the visual term is a concept, the precision is very low, whereas when the visual term is a subconcept, the precision is very high. When the term is a concept consisting of a single term, e.g., “English”, it can be found in several different documents that may be unrelated to the English courses. We found that the average Top-20 precision with random sampling of concepts/subconcepts was 0.637, which is considered a satisfactory precision level. Currently, we are working on embedding semantic tagging functionality into the HyperManyMedia platform.

References

[1] T. Berners-Lee, J. Hendler, O. Lassila, et al. The semantic web. Scientific American, 284(5):28–37, 2001.
[2] T.R. Gruber. A translation approach to portable ontology specifications. Knowledge Acquisition, 5:199–199, 1993.
[3] C.D. Manning and H. Schütze. Foundations of statistical natural language processing. MIT Press, 1999.
[4] L. Zhuhadar, O. Nasraoui, and R. Wyatt. Metadata as seeds for building an ontology driven information retrieval system. International Journal of Hybrid Intelligent Systems, 6(3):169–186, 2009.
[5] L. Zhuhadar, O. Nasraoui, and R. Wyatt. Visual ontology-based information retrieval system. In Proceedings of the 2009 13th International Conference Information Visualisation-Volume 00, pages 419–426. IEEE Computer Society, 2009.
[6] L. Zhuhadar, O. Nasraoui, and R. Wyatt. Metadata domain-knowledge driven search engine in "hypermanymedia" e-learning resources. In CSTST ’08: Proceedings of the 5th international conference on Soft computing as transdisciplinary science and technology, pages 363–370, New York, NY, USA, 2008. ACM.
[7] G.K. Zipf. Human behavior and the principle of least effort. Hafner New York, 1972.