Interactive Visualization of DL Data and Metadata

Mark Derthick
Human Computer Interaction Institute
Carnegie Mellon University
Pittsburgh, PA, USA 15213
+1 412 268-8812

[email protected]

ABSTRACT

Much current research on digital libraries focuses on named entity extraction and transformation into structured metadata. Examples include entities like events, people, and places, and attributes like birth date or latitude. Unfortunately, this extraction process is not very reliable, and in any case a digital library may contain references to hundreds of thousands of entities. Information visualization is a powerful tool for summarizing large sets of structured data about entities and their relationships to uncover overall patterns, and then drilling down into interesting subsets. We have applied this technique to metadata extracted from the Informedia Digital Video Library, and show examples of conclusions that can be drawn from metadata patterns alone. Currently, text attributes are handled poorly in terms of both query semantics and interaction speed. Our goal is to overcome these difficulties, so that integrated data and metadata library browsing becomes a continuous interactive activity.

1. INTRODUCTION

As digital libraries grow, the familiar problem of too many disorganized query matches becomes ever harder to manage. Furthermore, accessing a document in a digital library is rarely the user's end goal, as finding a novel to read might be in a traditional library. More often the goal is to understand, organize, and communicate information in terms of people, events, or other entities that exist outside the library. One document may refer to many entities, and one entity may be referred to in many documents. Documents are thus bottlenecks that stand between the user and the natural elements of discourse for the task.

Off-line named entity extraction and transformation into structured metadata can support this process. Unfortunately, this extraction is currently not very reliable, and in any case a digital library may contain references to hundreds of thousands of entities. Information visualization is a powerful tool for summarizing large sets of structured data about entities and their relationships to uncover overall patterns, and then drilling down into interesting subsets.

This paper begins with an example of metadata and of browsing it. However, it is impossible to anticipate and extract as structured data everything of interest to the user, so online processing of unstructured data is also required. To stimulate workshop discussion, open problems in integrated browsing of both structured and unstructured data are then discussed.

2. METADATA EXTRACTION

CMU's Informedia News on Demand project has recorded and processed thousands of hours of news video from CNN and other sources [6]. A transcript of the audio is obtained from closed-caption information or speech recognition. Each word is looked up in a gazetteer of geographical locations. Any locations found are linked to the video segment, and the number of times the location is mentioned is recorded in a database of metadata. The latitude and longitude, country, and continent are also added. The video frames are also analyzed by face recognition and optical character recognition (OCR) algorithms. Frames containing a name and a face give rise to a "named face" entity. The person's title is extracted by OCR or from a biographical dictionary; the latter source also adds nationality and dates of birth and death. The Informedia database used in the example below contains about 50K video segments, 1500 named-face occurrences of 200 distinct names, and 78K occurrences of 1800 distinct locations.
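The gazetteer pass described above can be sketched as a simple lookup over the transcript. This is a hypothetical illustration, not Informedia's actual pipeline: the gazetteer entries, record fields, and segment identifiers are all invented for the example.

```python
# Hypothetical sketch of the gazetteer pass: each transcript word is
# looked up in a gazetteer of locations, and per-segment mention counts
# plus geographic attributes are recorded as metadata rows.
from collections import Counter

# Illustrative gazetteer; the real one covers thousands of locations.
GAZETTEER = {
    "belgrade": {"country": "Serbia", "continent": "Europe",
                 "lat": 44.8, "lon": 20.5},
    "pittsburgh": {"country": "USA", "continent": "North America",
                   "lat": 40.4, "lon": -80.0},
}

def extract_locations(segment_id: str, transcript: str):
    """Return one metadata row per distinct location mentioned."""
    counts = Counter(w for w in transcript.lower().split()
                     if w in GAZETTEER)
    return [{"segment": segment_id, "location": loc,
             "mentions": n, **GAZETTEER[loc]}
            for loc, n in counts.items()]

rows = extract_locations(
    "seg001",
    "NATO strikes near Belgrade today as Belgrade residents react")
```

A real implementation would also need tokenization that handles punctuation and multi-word place names; the whitespace split here is the simplest possible stand-in.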

3. ENTITY VISUALIZATION

The Visage data visualization and exploration environment [5] is used to visualize the metadata. A visual query language supports database-style joins between entities of different types, Dynamic Query (DQ) filtering of attribute values, visualization of conditional attribute value distributions with histograms, and drill-down to individual entities [3]. Data populate the query structure, which becomes an interface for exploration that gives continuous feedback in the form of visualizations of summary statistics.

The target user of our previous work has been a data analyst familiar with the domain from which the entities come, but not a computer scientist. For digital library applications, the target will be a more casual user. Statistical tools will not be necessary, but learnability will be more important. In particular, the interface should be upward compatible with familiar Web-style search interfaces. Improved support for text attributes, a prerequisite for familiarity, is discussed in the future work section. The example below uses only structured attributes (which include text if there are only a small number of "atomic" values, as is the case for countries).

The left box in the figure below contains a DQ widget for the attribute "cntry_name." The 15 most common values are given their own bar, and the rest are lumped under a single catch-all bar. The horizontal histogram visualization superimposed on the DQ buttons shows the distribution of the countries of each occurrence of a location extracted as metadata. It shows the dominance of the US, CNN's home country, but also concentrations that reflect specific events, as in the case of Serbia. By clicking on the histogram bars, the user can focus on any subset of countries. Here only locations in Serbia are selected, so the summary line at the top of the left box shows that 2836 of all the 77,851 location mentions are in Serbia.
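The top-15-plus-remainder bucketing of the DQ widget can be sketched in a few lines. This is an illustrative reconstruction of the widget's behavior, not Visage code; the function and bucket names are assumptions.

```python
# Sketch of the DQ widget's categorical bucketing: one bar per top-N
# value, with everything else lumped into a single catch-all bucket.
from collections import Counter

def dq_histogram(values, top=15, other="<other>"):
    """Bucket a categorical attribute as the DQ widget does."""
    counts = Counter(values)
    common = counts.most_common(top)
    bars = dict(common)
    rest = sum(counts.values()) - sum(n for _, n in common)
    if rest:
        bars[other] = rest
    return bars

# Toy location-mention data; clicking a bar (e.g. "Serbia") restricts
# the working set to the mentions matching that value.
mentions = ["USA"] * 6 + ["Serbia"] * 3 + ["UK", "France"]
bars = dq_histogram(mentions, top=2)
serbia_subset = [m for m in mentions if m == "Serbia"]
```

With `top=2` the toy data yields bars for USA and Serbia plus a catch-all of size 2, mirroring how the real widget summarizes 1800 distinct locations with 16 bars.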
The link between the boxes represents a bi-directional constraint (conceptually a database join) that propagates the Serbia location restriction along the link to the right box, which provides an overview of the video segments (documents) that mention the locations. There are 21,541 segments that mention some location, of which 3023 mention a location in Serbia. The dark conditional distribution of the segment copyright_date attribute shows that most of the Serbia segments were recorded in 1998-1999, while the light unconditional distribution shows that segments overall are distributed more uniformly from 1996-1999. In some intervals, as many as one third of all the news segments mention locations in Serbia. Using the date slider, the histograms can be filtered further, with continuous feedback. In the figure, it is impossible to make out the date distribution for the Serbia segments before mid-1998 because they comprise such a small percentage of the 21K segments. To support drilling down on this subset, the "3023" visual object can be dragged to start a new query in which further filtering and browsing take place. If the subset has fewer than about 1,000 documents, each record can be visualized explicitly on a chart, map, timeline, table, or other information graphic. A video demo at the main JCDL01 conference illustrates browsing and drill-down with Visage in more detail [1].
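The join propagation and the conditional-versus-unconditional histograms above can be sketched with two toy tables. The table shapes, attribute names, and values here are illustrative assumptions, not the Informedia schema.

```python
# Sketch of propagating a selection across a join: restricting location
# mentions to one country restricts the joined segment table, whose
# date distribution is then shown conditionally (dark) against the
# unconditional distribution of all segments (light).
from collections import Counter

# Toy tables: (segment_id, country) mentions, and segment -> copyright date.
mentions = [("s1", "Serbia"), ("s2", "USA"), ("s3", "Serbia"),
            ("s4", "USA"), ("s5", "USA")]
segments = {"s1": 1999, "s2": 1996, "s3": 1998, "s4": 1997, "s5": 1999}

def propagate(country):
    """Conceptually the database join: the country restriction on
    mentions propagates along the link to the segment table."""
    return {seg for seg, c in mentions if c == country}

unconditional = Counter(segments.values())            # light histogram
serbia_segs = propagate("Serbia")
conditional = Counter(segments[s] for s in serbia_segs)  # dark histogram
```

Overlaying `conditional` on `unconditional` per year is exactly the dark-on-light comparison the figure shows: the toy Serbia segments cluster in 1998-1999 while all segments spread over 1996-1999.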

4. FUTURE WORK

The next step is to provide Informedia users with an integrated interface where relevance-based text queries can be intermingled with DQ and join constraints. Informedia is moving to web delivery of content through the use of W3C standards like XML and XSL. Visage is a large standalone application, which currently uses ODBC and SQL to communicate with a database. We plan to modify the Informedia web server to accept SQL encoded in a URI, in addition to URIs from the current Informedia interface, and to send XML data back to both applications. Query modifications in either interface propagate to the other, so users can treat the two applications as one application that uses two windows. Initially the Informedia interface will not respond to Visage DQ actions, because recomputing the XML and redisplaying would take longer than the 100ms latency that users can tolerate for mouse moves. Later, we can write XSL that can handle incremental changes efficiently.

Text searching in Informedia uses an inverted index to find a list of potential query matches, sequentially scans them to compute a query-specific relevance score, and returns the top 500. This is a completely different way to index than the kd-trees used to make Visage DQ efficient. Therefore, changing the text query requires rebuilding the DQ index, as do adding DQ widgets or dragging to create a new query. The build time is linear in the number of attributes and records, and takes 30 seconds for 5 attributes of 200K hits on a 450 MHz Pentium II with 384 MB of RAM. Building a new index for the 3023 Serbia segments, or for a new text query that only returns a few thousand hits, can therefore be done interactively. However, we would like to be able to browse whole libraries interactively while changing constraints on unstructured attributes (for instance, with relevance feedback, or by selection within a Themescape visualization [4]).

Can unstructured attributes, especially text, be indexed in a manner compatible with efficient DQ indexing? Even changing the query semantics (e.g., to Boolean disjunction of keywords) or using approximate query algorithms may be useful for browsing whole libraries. For instance, each text attribute could be transformed into a handful of scalar attributes corresponding to the principal components. Then text queries would be transformed in the same manner and used to set DQ ranges around the value for each component. Alternatively, the system may be able to make educated guesses about words that will occur in queries. For instance, it might save the 100 most recent or most common query words. Then it becomes like the country case, except that documents may contain multiple values. Fortunately, Visage's approximate kd-tree counting algorithm correctly counts unique documents even if they match multiple values [2] (hence the Boolean disjunction query semantics suggested above).
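The principal-components idea above can be sketched with a plain SVD: reduce the document-term matrix to a few scalar components, project a text query the same way, and set DQ ranges around its coordinates. This is a hypothetical sketch under assumed data; the component count, epsilon, and toy term counts are invented for illustration, and this is not Visage's actual indexing code.

```python
import numpy as np

def text_to_scalars(doc_term: np.ndarray, k: int = 2):
    """Reduce a document-term matrix to k principal components so a
    text attribute can be filtered with ordinary scalar DQ ranges."""
    mean = doc_term.mean(axis=0)
    # Rows of vt are the principal axes of the centered matrix.
    _, _, vt = np.linalg.svd(doc_term - mean, full_matrices=False)
    axes = vt[:k]
    coords = (doc_term - mean) @ axes.T   # k scalar attributes per doc
    return coords, axes, mean

# Toy document-term counts (5 documents, 4 vocabulary terms).
dt = np.array([[3, 0, 1, 0],
               [2, 1, 0, 0],
               [0, 4, 0, 2],
               [0, 3, 1, 2],
               [1, 0, 4, 0]], dtype=float)
coords, axes, mean = text_to_scalars(dt, k=2)

# A text query is transformed in the same manner, and DQ ranges are
# set around its value on each component.
query = dt[3]                             # query resembling document 3
q = (query - mean) @ axes.T
eps = 0.5
hits = np.nonzero(np.all(np.abs(coords - q) <= eps, axis=1))[0]
```

The appeal of this transformation is that the resulting scalar attributes index exactly like latitude or copyright date, so the existing kd-tree DQ machinery applies unchanged; what is lost is the exact keyword semantics of the original text query.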

5. ACKNOWLEDGMENTS

This material is based on work supported by the National Science Foundation under Cooperative Agreement No. IRI-9817496. Thanks to the Informedia group members for helpful discussions.

6. REFERENCES

1. Derthick, M. "Interactive Visualization of Video Metadata (22MB avi video)" in Proceedings of the Joint ACM/IEEE Conference on Digital Libraries, 2001, Virginia Beach, VA. To appear. http://www.cs.cmu.edu/~sage/animations/JCDL01/JCDL.avi

2. Derthick, M., J. Harrison, A. Moore, and S.F. Roth. "Efficient Multi-object Dynamic Query Histograms" in Proceedings of the IEEE Information Visualization Symposium (InfoVis '99), 1999: IEEE Press. p. 84-91. http://www.cs.cmu.edu/~sage/PDF/IV99.pdf

3. Derthick, M., J.A. Kolojejchick, and S. Roth. "An Interactive Visual Query Environment for Exploring Data" in Proceedings of the ACM Symposium on User Interface Software and Technology (UIST), 1997, Banff, Canada: ACM Press. p. 189-198. http://www.cs.cmu.edu/~sage/UIST97/UIST97.pdf

4. Hetzler, B., P. Whitney, L. Martucci, and J. Thomas. "Multi-faceted Insight Through Interoperable Visual Information Analysis Paradigms" in Proceedings of the IEEE Symposium on Information Visualization (InfoVis '98), October 1998, Research Triangle Park, North Carolina. p. 137-144. http://www.pnl.gov/infoviz/insight.pdf

5. Roth, S.F., M.C. Chuah, S. Kerpedjiev, J.A. Kolojejchick, and P. Lucas, "Towards an Information Visualization Workspace: Combining Multiple Means of Expression." Human-Computer Interaction Journal, 1997. 12(1-2): p. 131-185. http://www.cs.cmu.edu/~sage/PDF/Towards.pdf

6. Wactlar, H., T. Kanade, M. Smith, and S. Stevens, "Intelligent Access to Digital Video: Informedia Project." IEEE Computer, 1996. 29(5): p. 46-52.
