IVEA: Toward a Personalized Visual Interface for Exploring Text Collections

VinhTuan Thai
Digital Enterprise Research Institute
National University of Ireland, Galway
IDA Business Park, Lower Dangan, Galway, Ireland
E-mail: [email protected]

Siegfried Handschuh
Digital Enterprise Research Institute
National University of Ireland, Galway
IDA Business Park, Lower Dangan, Galway, Ireland
E-mail: [email protected]

ABSTRACT

In this paper we present IVEA, a personalized visual interface which enables users to explore text collections from different perspectives and at different levels of detail. This work explores the use of a personal ontology, which encapsulates users' entities of interest, as an anchor for the exploration process. This, in effect, simplifies the comprehension of visual representations of text collections by helping users to focus on aspects that they are particularly concerned with.

ACM Classification: H5.2 [Information interfaces and presentation]: User Interfaces - Graphical user interfaces.

General terms: Design

Keywords: Visual exploration, text collections

Copyright is held by the author/owner(s). IUI’09, February 8–11, 2009, Sanibel Island, Florida, USA. ACM 978-1-60558-331-0/09/02.

INTRODUCTION

Information visualization is an essential mechanism to support scientists and analysts alike in exploring different facets of complex information spaces such as text collections. Given the amount and the unstructured or weakly structured nature of textual documents that analysts have to deal with, visual applications can help to gain needed insights in a timely manner. Previous research in document corpus visualization has mainly focused on visualizing different linkages between automatically extracted entities. The shortcoming of this approach is that the findings presented are not aligned with users' interests. This matters because, in practice, analysts following business logic often focus strictly on a very specific set of pre-defined entities. Consequently, visual presentation of entities that are outside of analysts' spheres of interest could prove counterproductive. Meeting this need, however, requires more than traditional information retrieval applications, since non-visual information has to be visualized and interacted with.

In this context, we proposed an approach [2,3], as illustrated in Fig. 1, to simplify the comprehension of visual representation of text collections by helping users to focus on aspects that they are particularly concerned with. In this approach, users only need to define their spheres of interest by encoding important entities in a personal, simple ontology. This ontology is then leveraged to allow users to interpret various aspects of a text collection at different levels of detail. Here we describe the prototype, IVEA, which has been enhanced with new features based on users' feedback.

Figure 1. Approach Overview

VISUAL INTERFACE AND INTERACTIONS

IVEA, as shown in Fig. 2, employs multiple coordinated views as described below.

Personal Knowledge View

This view shows the hierarchical relationship between concepts and instances within the personal ontology, and serves as the starting point for the exploration process. Text documents are analyzed based on these entities to obtain information such as their frequencies, relevance, and relative locations in each document. Each entity can be represented by a user-defined set of associated texts, so that its abbreviations and synonyms can be accounted for. Entities can be interactively added or removed in this view.

Overview display

This matrix-based multi-dimensional view enables users to execute dynamic queries simply by dragging one or more entities from the ontology and dropping them onto the display area. Since hierarchical relationships between entities are taken into account, dragging and dropping a class onto the Overview display results in the automatic inclusion of all of its direct instances and, recursively, all of its subclasses. In this matrix, rows represent selected entities, columns represent documents containing at least one of those entities, and each cell shows the relevance value of a document with respect to an entity via its height. The vertical part of the cross-hair highlighter helps to focus on which entities a document contains, and its horizontal part helps to show the distribution of an entity in a collection. The hierarchy attached to the matrix allows users to dynamically drill down or roll up to achieve views at different levels of detail based on the class-subclass or class-instance relationships.
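The class expansion and matrix layout of the Overview display can be illustrated with a minimal sketch. It is not IVEA's implementation: the `Entity` model, the function names, and the sample ontology and documents are all hypothetical, and the relevance measure is replaced by a naive count of associated-text occurrences, since the measure itself is not specified here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entity:
    """A class or instance in a personal ontology (hypothetical model)."""
    name: str
    texts: list[str]  # user-defined associated texts (synonyms, abbreviations)
    instances: list[Entity] = field(default_factory=list)
    subclasses: list[Entity] = field(default_factory=list)


def expand(entity: Entity) -> list[Entity]:
    """Return the dropped entity, its direct instances and, recursively, its subclasses."""
    selected = [entity, *entity.instances]
    for sub in entity.subclasses:
        selected.extend(expand(sub))
    return selected


def relevance(entity: Entity, document: str) -> int:
    """Stand-in relevance (assumption): naive substring count of associated texts."""
    lowered = document.lower()
    return sum(lowered.count(text.lower()) for text in entity.texts)


def overview_matrix(dropped: Entity, documents: dict[str, str]):
    """Rows are the expanded entities; columns are documents matching at least one row."""
    rows = expand(dropped)
    columns = [doc_id for doc_id, body in documents.items()
               if any(relevance(e, body) > 0 for e in rows)]
    cells = {e.name: [relevance(e, documents[d]) for d in columns] for e in rows}
    return columns, cells


# Hypothetical ontology: one class with two instances and one subclass.
rdf = Entity("RDF", ["RDF", "Resource Description Framework"])
owl = Entity("OWL", ["OWL"])
sparql = Entity("SPARQL", ["SPARQL"])
query_lang = Entity("QueryLanguage", ["query language"], instances=[sparql])
sem_web = Entity("SemanticWebTechnology", ["Semantic Web"],
                 instances=[rdf, owl], subclasses=[query_lang])

docs = {
    "d1": "RDF and OWL are Semantic Web standards.",
    "d2": "SPARQL is a query language for RDF.",
    "d3": "Bar charts do not scale well.",
}

columns, cells = overview_matrix(sem_web, docs)
print(columns)  # ['d1', 'd2'] (d3 mentions none of the selected entities)
for name, row in cells.items():
    print(name, row)
```

Dropping the single class `SemanticWebTechnology` yields five rows, and the document that mentions none of them is left out of the columns. Substring counting is only a placeholder; it would, for instance, also match "OWL" inside "knowledge".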

To accommodate large collections, a new feature has been implemented to group documents which contain the same set of entities into a cluster, and only one document of that cluster is initially displayed on the matrix. The column of the representative document has a visual cue, a '+' sign on top, to indicate that there are more documents containing exactly the same set of entities. Clicking on this representative column makes all documents in the cluster visible on the matrix, and its visual cue changes to '-'. Clicking again on the representative column hides the other documents in the cluster. This interactive feature enables users to comprehend the visual display of a large number of documents without having to examine a matrix with one column for every document containing the selected entities.

Users can also dynamically filter out attributes that they are no longer interested in while exploring: right-clicking on a concept or an instance removes the whole row from the Overview display. This interaction in effect restricts the matrix to a particular subset of the data. Furthermore, when users move from one column to another, the Document View and Entities Distribution View change accordingly to show details of the document in focus.

Document View

This view consists of: (1) a tabbed panel showing the frequency of entities mentioned in a document as well as keyphrases extracted from it, and (2) a textual view of a document's contents. In previous versions, we used a bar chart to display frequencies. A bar chart, however, does not cope well with a large number of entities. Therefore, we opted for a visual encoding scheme in which relative font sizes correspond to frequency values. By hovering the mouse over an entity's label, users can see its exact frequency.

Keyphrases in each document are automatically extracted by the XtraK4Me (http://smile.deri.ie/projects/keyphraseextraction) library and are used as suggestions for incrementally enriching the personal ontology. With the newly added entities, the personal ontology becomes a richer and better representation of the users' interests and hence can lead to more personalized subsequent exploration and analysis experiences. Ontology enrichment can be done with minimal effort by simply dragging a keyphrase and dropping it onto a class in the Personal Knowledge View.

The textual view shows a document's contents, and texts associated with entities in the ontology are highlighted for better focus and navigation.

Entities Distribution View

A variant of TileBars [1] is used to display the relative locations of the ontology's entities together with their corresponding frequencies in each fragment of a document. This view has been improved so that it can act as a navigation pad, i.e. clicking on a cell in this view results in the first appearance of an entity in the corresponding fragment being displayed in the Document View. This direct-access navigation capability can help users to gain quick access to the text portions surrounding entities of interest and can therefore reduce the time required to traverse lengthy documents from beginning to end.

Figure 2. IVEA Visual Interface

SUMMARY AND OUTLOOK

IVEA provides an intelligent visual interface that allows users to interactively explore text collections in a personalized manner. All visual components containing advanced features are extensions of existing visual paradigms that most users are familiar with (tree structure, spreadsheet, tag clouds), and most interactions involve a single mouse click. As such, IVEA requires a short learning curve and minimal effort to use. In future work, we plan to evaluate its usability and utility.

ACKNOWLEDGMENTS

This work is supported by the Science Foundation Ireland (SFI) under the DERI-Líon project (SFI/02/CE1/I131).

REFERENCES


1. Hearst, M. A. TileBars: visualization of term distribution information in full text information access. In Proc CHI’95, pp. 59–66, 1995.

2. Thai, V., Handschuh, S. and Decker, S. IVEA: An Information Visualization Tool for Personalized Exploratory Document Collection Analysis. In Proc ESWC 2008, pp. 139–153, 2008.

3. Thai, V., Handschuh, S. and Decker, S. Tight coupling of personal interests with multi-dimensional visualization for exploration and analysis of text collections. In Proc IV08, pp. 221–226, 2008.
