Metadata Visualization for Digital Libraries: Interactive Timeline Editing and Review

Vijay Kumar, Richard Furuta

Robert B. Allen

Center for the Study of Digital Libraries and the Dept. of Computer Science, Texas A&M University, College Station, TX 77843-3112, USA. E-mail: {vijayk,furuta}@cs.tamu.edu

Bellcore 1A-352R, 445 South Street Morristown, NJ-07960, USA E-mail: [email protected]

ABSTRACT

Interactive Timeline Editing and Review (ITER), a general framework for modeling and presenting temporal information, is described. In addition, the tmViewer interface is described for viewing temporal and other metadata. ITER and tmViewer go beyond previous electronic timeline displays in treating timelines as hypertexts and structured documents, and allowing interactive display of the metadata in addition to the events. The use of the tool is described for exploring bibliographic records, such as search hits from the book database available at amazon.com, and for the presentation of timelines.

KEYWORDS: history, interactivity, metadata, taxonomy, timelines, tmViewer, visualization

OVERVIEW

A timeline is a linear graphical or textual presentation of events with respect to time. In a typical timeline presentation, time is arranged along one dimension and a number of markers, representing events, are placed appropriately along the time dimension. A timeline consists of entities, e.g., events, people, places, organizations, actions, etc. Most timelines use a graphical presentation, but non-graphical presentations such as tables, lists, and even textual descriptions are abundant in the literature. A timeline has a theme and the events listed are related to the theme. Peripheral and external information, which is information not related to the theme, is sometimes provided to better emphasize the time frame. The earliest use of timelines in the published literature can be traced to William Playfair who, in 1785 A.D., extended the graphical metaphor to non-spatial data. Today, most major fields of study use some form of temporal data and represent it in a visual form.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Digital Libraries 98 Pittsburgh PA USA Copyright ACM 1998 0-89791-965-3/98/6...$5.00


Timelines are applied in fields as diverse as history, anthropology, sociology, psychology, music, business, marketing, biology, and physics [19]. Tufte [22] outlines four reasons for creating a graphic, namely, description, exploration, tabulation, and decoration. With the increasing use of the Internet as the publishing medium we see the proliferation of informal timelines such as personal profiles and company histories on Web sites. Simple electronic timelines have been proposed by several authors [2, 11, 12, 20, 21]. Our work is descended from the interactive timelines developed by Allen [2]. Another effort to develop interactive timelines is reported by Plaisant, et al. [20]. Users of electronic timelines can explore events not only by zooming in and “drilling down” into an event but also by resizing, rotating, and rescaling the graphic. By interactively reformatting the timeline display, readers are able to reveal structure among events that is not apparent at first sight in a static graphic or table.

Problem Statement

Visualization of metadata using timelines is a new research area; therefore no established models exist in the published literature. Timeline tools such as Timeliner [21] use ad hoc models and support only a limited number of user interactions such as zooming and rescaling. Other models [12, 21] are narrowly focused on one domain, making them unsuitable for modeling generalized timelines. However, they serve an important objective by providing a basis for developing a reference model. Timelines are typically used in knowledge-rich domains [15]. Knowledge-rich domains are characterized by having a large variety of entities with many types of interrelationships among the entities. Each entity differs from the others in the number and type of its attributes. The relationships among entities are also unique. This distinguishes such domains from other fields of study that deal with time-based data, such as statistics and data visualization, where the behavior of one or more variables is studied with respect to time. These characteristics make developing structured content for knowledge-rich domains quite challenging.

Another issue is that of presentation of the data so users can reformat the presentation interactively. Interactive timelines can be used for information exploration and may support ad hoc querying capability. Such problems of presentation and exploration have been studied earlier in the fields of hypertext, CSCW, interactive data mining, and information visualization [9, 10, 17]. However, models developed in the fields of hypertext, CSCW, and interactive data mining are too narrowly focused on specific issues, while models for information visualization tend to be too broad to be useful for timeline presentation [3, 7, 18, 23]. For example, there is a fair amount of work done in the field of visualizing time-related biomedical data. Cousins and Kahn [5, 6] have informally defined seven operations on timelines. These operations are slice, overlay, filter, concatenate, new, copy, and delete. They also mention the need for a timeline browser, a tool for visually manipulating a group of timelines. What is needed is a model that expresses the process of creating and presenting knowledge-rich content in a flexible manner.

During the course of this research, several timelines were studied in an effort to characterize the content and presentation of timelines. The characterization process resulted in development of two taxonomies as shown in the Appendix. This paper describes the ITER model of timelines, which is a framework within which generalized timelines can be created and viewed. Experiences with a prototype interface are described in the implementation section of this paper.

THE ITER MODEL

The timeline prototype presented here is an integral part of the ITER framework. The ITER model is based on two layers: the storage layer and the presentation layer. The storage layer models the structured data within a database residing in persistent storage. The presentation layer models the visual manifestation of the components on the screen. During the design session, as the components are viewed and manipulated dynamically, the modifications are reflected in the updated content. Table 1 shows the terminology used in the model.

Figure 1: The architecture of the ITER model. [Diagram showing the storage layer and the presentation layer.]

Storage Layer: Presentation Specifications

As shown in Figure 1, presentation specifications are associated with every object in the content base. They specify how the associated content should be rendered on the display. When the user starts a design session, the content and a default presentation specification are sent to the presentation layer. As the user makes modifications to the content and the presentation, the edits are stored in the storage layer. When the design session ends, the modified specifications are saved. When a browsing session starts, the content and the associated specifications are read by the presentation layer.

Storage Layer: Content

At the content level we are primarily interested in characteristics such as data type, ordering function, criticality of entities, and the strength and arity of relationships. These attributes are selected from the content taxonomy [13] shown in the Appendix.

Simple entity      A tangible object, an event, or a concept that is a necessary component of a timeline. It may be part of one or more composite entities.
Composite entity   An entity that consists of two or more entities.
Attribute          A property of the entity that has a value. A typical entity has at least one attribute, which is the name of the entity.
Relationship       An association between two entities. Associations can be unary or binary.
Component          An entity or a relationship.
Content            A set of entities with their associated attributes and relationships forming a coherent theme.

Table 1: ITER terminology.

Storage Layer: Operations

In addition to the data model, the storage layer specifies three operations that can be used to modify a timeline from the interactive interface. Operation merge() allows a user to merge the content of two timelines into a single timeline. Operation split() enables a user to split a timeline into two parts based on a given criterion. The third operation, subset(), allows a user to create a view of the content based on a given subset criterion.

Axes                Swap, rotate, rescale, rearrange, label, or tick mark.
Entities            Show/hide entities and attributes. Pan, zoom, navigate to an entity. Edit entity display.
Relationships       Show/hide relationships. Edit display of relationships. Transitive closure on relationships.
Animations          Animation with and without prompts.
Unary operations    Elision, interval view, split.
Binary operations   Superimpose, juxtapose, and merge.

Table 2: User interactions supported by the presentation layer.
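The storage-layer terminology and the three operations merge(), split(), and subset() described above can be illustrated with a minimal Python sketch. The paper gives no signatures for these operations at the storage level, so the class names (Entity, Timeline), the representation of a relationship as a (name, source, target) tuple, the helper _restrict, and the rule that a relationship is kept only when both of its endpoints survive are all assumptions made for illustration. The prototype itself is a Java application and is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Entity:
    # Every entity has at least a name attribute (Table 1); the other
    # fields are assumed here so that criteria have something to test.
    name: str
    kind: str
    start: int
    end: int


# Hypothetical encoding: (relationship name, source entity, target entity)
Relationship = tuple[str, str, str]


@dataclass
class Timeline:
    theme: str
    entities: list[Entity] = field(default_factory=list)
    relationships: set[Relationship] = field(default_factory=set)


def _restrict(t: Timeline, keep: list[Entity], theme: str) -> Timeline:
    """Keep only relationships whose two endpoints are both retained."""
    names = {e.name for e in keep}
    rels = {r for r in t.relationships if r[1] in names and r[2] in names}
    return Timeline(theme, list(keep), rels)


def merge(a: Timeline, b: Timeline, theme: str) -> Timeline:
    """Merge the content of two timelines into a single timeline."""
    unique = dict.fromkeys(a.entities + b.entities)  # de-duplicate, keep order
    return Timeline(theme, list(unique), a.relationships | b.relationships)


def split(t: Timeline, criterion: Callable[[Entity], bool]) -> tuple[Timeline, Timeline]:
    """Split a timeline into two parts based on a criterion."""
    yes = [e for e in t.entities if criterion(e)]
    no = [e for e in t.entities if not criterion(e)]
    return _restrict(t, yes, t.theme), _restrict(t, no, t.theme)


def subset(t: Timeline, criterion: Callable[[Entity], bool]) -> Timeline:
    """Create a view of the content based on a subset criterion."""
    return _restrict(t, [e for e in t.entities if criterion(e)], t.theme)


# Illustrative data only; these rows are not taken from the paper's datasets.
reigns = Timeline(
    "Reigns",
    [Entity("Henry VIII", "reign", 1509, 1547),
     Entity("Elizabeth I", "reign", 1558, 1603)],
    {("Parent of", "Henry VIII", "Elizabeth I")},
)
wars = Timeline("Wars", [Entity("War of the Spanish Succession", "war", 1701, 1714)])

history = merge(reigns, wars, "British history")
before_1700, after_1700 = split(history, lambda e: e.end < 1700)
only_reigns = subset(history, lambda e: e.kind == "reign")
```

In this sketch subset() returns a new Timeline object rather than a live view; a real implementation of a view would more likely keep a reference to the underlying content.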

Presentation Layer: Browsing

The presentation layer supports the user interaction operations with timelines. Tables 2 and 3 list the supported operations; these operations were taken from the taxonomy of the presentation [13] as shown in the Appendix.

Operation         Arguments                                             Result
Transitive-cls    relation set Rs                                       relation-display closure of (Rs)
Present-ent       entity E                                              timeline + P(E)
UnPresent-ent     entity E                                              timeline - P(E)
Present-att       entity-type E, attribute A                            E + A
UnPresent-att     entity-type E, attribute A                            E - A
Pan               component C                                           component C in center
comp-zoom         component C, zoom factor                              magnified C
Navigate          component C; or component C, attribute A              detailed view of C or view of A

Table 3: Signatures of the presentation layer operations available in the browsing mode.

Presentation Layer: Editing

In addition to browsing the content, a user can also alter the structured content via the interface. As shown in Table 4, these operations allow one to create, delete, or edit an entity, relation, or an attribute.

Operation         Arguments                                             Result
edit-entity       entity E, p-spec                                      new display (E)
crispify-attrib   entity E, attribute A                                 E with crisp A
fuzzify-attrib    entity E, attribute A                                 E with fuzzy A
rotate-axis       timeline t, degrees M                                 t
swap-axis         timeline t, axis x, y                                 t with axis y, x
Rescale           timeline t, axis x, scale-factor f                    t
rearrange-axis    timeline t, ticks k1, k2                              t with ticks k1 and k2
Multi-label       timeline t, attribute a1, a2                          t with one axis labeled two ways
Grid-on           timeline t, axis x, “major” and/or “minor”            t + grid
Grid-off          timeline t, axis x, “major” and/or “minor”            t - grid

Table 4: Signatures of the presentation layer operations available in the design mode.

The operations at the storage and the presentation layers take the unique characteristics of time, such as linearity and continuity, into account. For example, events can be compared based on their begin time, end time, or their duration. This means that any attribute of events that has the above-mentioned characteristics can be substituted for time, thereby widening the applicability of the ITER model to non-temporal data. Although the names of the operations in Tables 3 and 4 are generally self-explanatory, some of them need clarification. For example, fuzzify means using a less precise value of an attribute. Similarly, the crispify operation will use a higher precision value of an attribute, if one is available.

tmVIEWER IMPLEMENTATION

We developed a prototype implementation of the ITER model [13]. Several practical issues arose during this phase, which were resolved by making certain assumptions about the data and the prospective user. The prototype is a 10,000-line Java application [14]. In the present implementation, the user selects interactively which attributes are to be displayed and assigns one display dimension to each selected attribute. Available display dimensions are location, color, size, shape, symbol, texture, fixed label, popup label, and frame of the visual marker. Apart from data-specific constraints like “only attributes with discrete values can be displayed using the shape dimension”, the user has complete freedom to define the presentation of the content interactively.

Time-Based Metadata Visualization

The tmViewer can be used to display metadata in association with the temporal dimension. Records from the on-line bookstore amazon.com were generated based on the query “hypermedia”. These records were manually restructured and stored in a data file. Figure 2 shows a sorting of those records by publisher and year. This is somewhat similar to the Starfield viewer [1].
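The assignment of display dimensions to attributes in the prototype, applied to bibliographic records of this kind, might look like the following Python sketch. Only the list of dimensions and the shape constraint come from the paper. The split of location into x-location and y-location, the rule that each dimension is used at most once, the "discrete"/"continuous" labels, the function names, and the placeholder records are assumptions for illustration, not the tmViewer API.

```python
# Display dimensions named in the paper; "location" is split into two axes
# here as an assumption so that two attributes can be plotted against each other.
DIMENSIONS = {
    "x-location", "y-location", "color", "size", "shape",
    "symbol", "texture", "fixed label", "popup label", "frame",
}

# Constraint quoted in the paper: only discrete-valued attributes may use shape.
DISCRETE_ONLY = {"shape"}


def check_mapping(kinds, mapping):
    """Validate an attribute -> display dimension assignment.

    kinds maps each attribute name to "discrete" or "continuous" (hypothetical labels).
    """
    used = set()
    for attribute, dimension in mapping.items():
        if attribute not in kinds:
            raise ValueError(f"unknown attribute: {attribute}")
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown display dimension: {dimension}")
        if dimension in used:
            # Assumed rule: a dimension shows at most one attribute at a time.
            raise ValueError(f"dimension assigned twice: {dimension}")
        if dimension in DISCRETE_ONLY and kinds[attribute] != "discrete":
            raise ValueError(f"{dimension} requires a discrete attribute")
        used.add(dimension)


def markers(records, kinds, mapping):
    """Turn records into one marker description per record."""
    check_mapping(kinds, mapping)
    return [{dim: rec[attr] for attr, dim in mapping.items()} for rec in records]


# Placeholder records, not actual search results.
KINDS = {"title": "discrete", "publisher": "discrete", "year": "continuous"}
RECORDS = [
    {"title": "Book A", "publisher": "Publisher X", "year": 1995},
    {"title": "Book B", "publisher": "Publisher Y", "year": 1997},
]

by_publisher_and_year = markers(
    RECORDS, KINDS,
    {"year": "x-location", "publisher": "y-location", "title": "popup label"},
)
```

Mapping year to shape would be rejected by check_mapping, since year is labeled continuous in this sketch.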

Figure 2: Library metadata sorted by publisher and year.

Following the ITER model, it is possible to show relationships among objects. Figure 3 shows books that were chosen by the same people who ordered Jakob Nielsen’s Hypertext and Hypermedia. More than one relationship can be viewed and the user can choose the display attribute of the line (such as color, thickness, or style) to differentiate between the displayed relationships.

Figure 3: Visualizing relationships.

Life and Works of Musical Composers

We experimented with several datasets to show that the tmViewer can be used to produce a wide variety of timelines. A database of Western music composers was compiled during the early stage of development. The domain of music offers a rich set of attributes and relationships. Some of the attributes are musical eras, style, type of music, genre, emergence of instruments, nationality, and number of pieces composed. Many temporal and causal relationships are evident in the database, such as contemporary, student-teacher, influence-of, etc. Spatial attributes, such as place of birth and places of education, that influenced the style of composition are also stored in the database. Figure 4 shows a sample timeline from the composer database. Here the colored bars represent the lives of music composers. The Y axis shows the name of the country with which a composer is associated. The colors of the bars represent their style of music.

Figure 4: Using tmViewer on the composers database.

British Royal History

One of the inspirations for this research was the Timeline of World History [20]. That paper timeline shows changes in governments and administrations across more than 3000 years of history and across many parts of the world. Governments and administrations are generally easy to represent on timelines because they have clear time frames. To show that tmViewer is a useful tool for information discovery we developed a dataset of the history of British Royalty. The dataset was compiled from the information available on the Royal family’s official Web site [4].


The dataset contains 90 events including war, reign, era, event, life, and administration, from 1030 AD to present. There are five types of relationships among events. The “events included” relation is a composition relationship. An event may include many sub-events. Other relationships include “Child of”, “Parent of”, “Peer to”, and “Caused”. Table 5 shows the data schema for this dataset.

Field          Description                                             Example
Event          Event type (e.g., administration)
Name           Name                                                    William the Conqueror
Relationship   Name of a relationship, pointer to the related event    None
Start Year     Number                                                  1030
End Year       Number                                                  1087
Sub Events     Pointer to another event                                None

Table 5: Data schema for historical events.

Cause-effect relationships can be visualized easily by interacting with the system. For example, a student may be interested in knowing how the Prime Ministership started in England and may want to know how Robert Walpole became the first Prime Minister. In tmViewer this will involve locating the glyph for Robert Walpole and then focusing on the area to see what event was immediately prior to his administration. As shown in Figure 5-A, this event was the South Sea Bubble Crisis. The user can confirm this inference by exploring the textual description associated with each event. Similarly, Figure 5-B shows the answer to the question “What war resulted in the Treaty of Utrecht?”

One can also answer quantitative queries such as “How many rulers reigned longer than 20 years?” To answer this question users can create a derived attribute, say, duration, that is computed by finding the difference of “End date” and “Begin date”. Then they can use the elision feature to remove all but “Reign” event-types and then the interval feature to show only durations that are greater than 20 years. Users can then easily count the displayed glyphs, as shown in Figure 5-C, and answer the question.

Beyond cause-effect relationships it is possible to explore inter-relationships among people, such as the question “How are the Black Prince and John of Gaunt related?” Users can find this relationship by locating the data glyph for one of the persons. Focusing on the event will display related entities. Users can now use the relationships menu and display defined relationships.
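The quantitative query described above (rulers who reigned longer than 20 years) combines a derived attribute, elision, and an interval view. A minimal Python sketch of that sequence follows. The Event fields loosely follow the schema in Table 5, while the function names elide and interval_view and the sample rows are assumptions for illustration. The rows are a handful of well-known dates, not the paper's 90-event dataset.

```python
from dataclasses import dataclass


@dataclass
class Event:
    name: str
    event_type: str
    start_year: int
    end_year: int


def duration(event):
    """Derived attribute: difference of end year and start year."""
    return event.end_year - event.start_year


def elide(events, keep_type):
    """Elision: remove all but one event type from the display."""
    return [e for e in events if e.event_type == keep_type]


def interval_view(events, attribute, lower_bound):
    """Interval view: show only events whose attribute exceeds lower_bound."""
    return [e for e in events if attribute(e) > lower_bound]


# Illustrative sample rows (assumed, not from the paper's data file).
events = [
    Event("Henry VIII", "Reign", 1509, 1547),
    Event("Mary I", "Reign", 1553, 1558),
    Event("Elizabeth I", "Reign", 1558, 1603),
    Event("Victoria", "Reign", 1837, 1901),
    Event("Hundred Years' War", "War", 1337, 1453),
]

long_reigns = interval_view(elide(events, "Reign"), duration, 20)
print(len(long_reigns), [e.name for e in long_reigns])
# prints: 3 ['Henry VIII', 'Elizabeth I', 'Victoria']
```

In the interface the final count is read off the displayed glyphs; here it is simply the length of the filtered list. Note that elision must come before the interval view, otherwise the long-running war would also pass the duration test.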

Figure 5: Four examples, (a) through (d), of displaying metadata in addition to temporal relationships.

No relationship is defined between John of Gaunt and the Black Prince, but following the defined relationships leads the user to Edward III. When the user focuses on the glyph marked Edward III, he or she can immediately see that John of Gaunt and the Black Prince are both sons of Edward III, as shown in Figure 5-D.
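The lookup described above, where no direct relationship exists but following defined relationships reaches a shared entity, can be sketched as a transitive closure over a single relationship type. The pair encoding of “Child of”, the function names, and the two extra rows for Richard II and Henry IV are assumptions for illustration. The paper does not say how ITER computes the closure internally.

```python
from collections import defaultdict

# "Child of" pairs as (child, parent). Illustrative rows only.
CHILD_OF = [
    ("Edward the Black Prince", "Edward III"),
    ("John of Gaunt", "Edward III"),
    ("Richard II", "Edward the Black Prince"),
    ("Henry IV", "John of Gaunt"),
]


def closure(pairs, start):
    """Return every entity reachable from start by following the relation."""
    graph = defaultdict(set)
    for source, target in pairs:
        graph[source].add(target)
    reached, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if nxt not in reached:
                reached.add(nxt)
                stack.append(nxt)
    return reached


def common_related(pairs, a, b):
    """Entities reachable from both a and b through the relation."""
    return closure(pairs, a) & closure(pairs, b)


print(common_related(CHILD_OF, "Edward the Black Prince", "John of Gaunt"))
# prints: {'Edward III'}
print(common_related(CHILD_OF, "Richard II", "Henry IV"))
# prints: {'Edward III'}
```

The second call shows why closure matters: Richard II and Henry IV share no parent, and the common ancestor only appears after the relationship is followed more than one step.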

Data Representations Underlying Timelines

The ITER framework provides a basis for developing timeline applications. An application developed using the framework will allow its users to create, visualize, and interact with generalized timelines in a flexible manner. We have attempted to model timelines in terms of established standards for hypertexts and documents. One way of extending this effort would be to follow the SGML model and develop DTDs for timelines. The advantage of this approach is that authors can make use of existing data easily. Most historical data is in descriptive text form such as history books and story books. It is a lot of work to put this information into a database using an authoring tool. Manual data entry and scanning seem to be the only solutions, both of which are slow and error prone. Therefore, a DTD-based approach that processes the input in situ may work better.

Limitations of Timelines

We feel that interactive graphical timelines can be used for exploring relationships and understanding the context of historical events. However, we do not mean to imply that timelines can replace a detailed examination of history. Many historic events are complex. In many cases, there are several different versions of historical facts. Even if the facts are accepted, the interpretation of these facts may diverge wildly. Clearly, future timelines must include the ability to present different versions of history. Indeed the ability to indicate and contrast the versions of history clearly could be a great benefit of timelines. ITER provides simple inferencing capability, in that it supports transitivity and performs transitive closure on relationships. More sophisticated reasoning capability will allow users to test hypotheses about historical events.

However, reasoning about complex events, which are described by only a few brief attributes, can be difficult. At present ITER incorporates a simple querying capability. A potential extension of this work would be the use of natural language input to generate timeline displays.

CONCLUSION

Timelines may be viewed as an interface to digital libraries. Effective visualization of knowledge-rich data sets can help users gain insight into relevant features of the data, construct accurate mental models of the information presented, and locate regions of particular interest. By allowing users to see the same information in a variety of formats, timelines enable their users to test claims about the plausible relationships between events. Moreover, timelines are not limited to showing traditional historical information. Timelines can be used to display a wide variety of time-based information visually, such as medical histories, stock market data, company histories, lawsuit histories, and individuals’ credit histories.

ACKNOWLEDGEMENTS

This material is based in part on work supported by the Texas Advanced Research Program under Grant Number 99903-230.

REFERENCES

1. Ahlberg, C. and Shneiderman, B. “Visual Information Seeking: Tight coupling of dynamic query filters with starfield displays”, in Proc. of ACM CHI ’94, Boston, MA, April 24-28, 1994, pp. 313-317.

2. Allen, R. B. “Interactive Timelines as Information Systems Interfaces”, Symposium on Digital Libraries, Japan, August 1995, pp. 175-180.

3. Arens, Y., Hovy, E., and Vossers, M. “The Knowledge Underlying Multimedia Presentations”, in M. Maybury, editor, Intelligent Multimedia Interfaces, pp. 280-306. The MIT Press, 1993.

4. COI Publications, “The British Monarchy: The official web site”, Oct. 1997, URL http://www.royal.gov.uk.

5. Cousins, S. B. and Kahn, M. G. “Visualizing Operations on Temporal Data”, in Proc. First Conf. Visualization in Biomedical Computing, Atlanta, GA, May 1990.

6. Cousins, S. B. and Kahn, M. G. “The Visual Display of Temporal Information”, Artificial Intelligence in Medicine, Vol. 3, pp. 341-357, 1991.

7. Furuta, R. “An Object-based Taxonomy for Abstract Structure in Document Models”, The Computer Journal, Vol. 32, No. 6, 1989, pp. 494-504.

8. Halasz, F. and Schwartz, M. “The Dexter Hypertext Reference Model”, in Special Issue on Hypermedia, Communications of the ACM, Vol. 32, No. 2, February 1994.

9. Jerding, D. F. and Stasko, J. T. “The Information Mural: A Technique for Displaying and Navigating Large Information Spaces”, in Proc. InfoVis ’95, IEEE Computer Society Press, October 1995, pp. 43-50.

10. Johnson, B. S. “Treemaps: Visualizing Hierarchical and Categorical Data”, dissertation, Dept. of Computer Science, University of Maryland, College Park, MD, 1993.

11. Joyce, D. “History of Mathematics Timeline”, URL http://aleph0.clarku.edu/~djoyce/mathhist/time.html, October 1997.

12. Karam, G. M. “Visualization Using Timelines”, in Proc. of Intl. Symposium on Software Testing and Analysis (ISSTA), 1994; also in SIGSOFT, ACM Software Engineering Notes, 1994.

13. Kumar, V. “Timelines as Interfaces to Information Systems”, dissertation, Dept. of Computer Science, Texas A&M University, College Station, TX 77843, May 1998.

14. Kumar, V. “TMViewer Prototype”. Available from the URL: http://www.csdl.tamu.edu/~vijayk/timelines.html.

15. Kumar, V., Furuta, R., and Allen, R. B. “Interactive Interfaces for Knowledge-rich Domains”, EP ’96: Proceedings of the Sixth International Conference on Electronic Publishing, Document Manipulation, and Typography, Sept. 1995, pp. 235-246.

16. Lesk, M. E. “What to do When There’s Too Much Information”, in Proc. ACM Hypertext 1989.

17. Mackinlay, J. D., Robertson, G. G., and Card, S. K. “The Perspective Wall: Detail and Context Smoothly Integrated”, in Proc. of CHI ’91, 1991, pp. 173-179.

18. Mackinlay, J. D. “Automating the Design of Graphical Presentation of Relational Information”, ACM Transactions on Graphics, vol. 5, no. 2, April 1986, pp. 110-141.

19. Ocha, G. “The Timeline Book of Arts”, Ballantine Books, New York, 1995.

20. Plaisant, C., Milash, B., Rose, A., Widoff, S., and Shneiderman, B. “LifeLines: Visualizing Personal Histories”, in Proc. of CHI ’96, Vancouver, BC, Canada, April 14-18, 1996, pp. 221-227.

21. Tom Snyder Productions. “Timeliner 4.0”, Tom Snyder Productions, 1997.

22. Tufte, E. “The Visual Display of Quantitative Information”, Graphics Press, Cheshire, CT, 1992.

23. Zhou, M. and Feiner, S. K. “Data Characterization for Automatically Visualizing Heterogeneous Information”, in Proc. InfoVis ’96, Oct. 1996, pp. 13-20.

APPENDIX

Figure A: The content taxonomy. [Tree diagram. Legible node labels include Content, Entities, Linearity, Continuity, Quality, Frozen, Real-time, Relative, Precision, Accuracy, Media, Visual, Audio, Text, Images, Relevance, Significance, Computation type, Computable, Defined, Temporal, Causal, Spatial, Ordering, Type, Quant, Ordinal, Nominal, and Composition.]

[Tree diagram of the presentation taxonomy. Legible node labels include Presentation, Media (Paper, Computer), Form, Dimensions (Spatial, Color, Shape, Size, Texture, Frame color, Frame texture, Frame style, Symbol, Label), and Interactivity (Elision, Interval view, Rescale, Axis).]
