Library Projects
Knowledge-based Editing and Visualization for Hypermedia Encyclopedias
Christoph Hüser, Klaus Reichenberger, Lothar Rostek, and Norbert Streitz
One of the main goals in developing digital libraries is to provide users with opportunities for accessing and using information in highly flexible and user-oriented ways not available in current information repositories. This implies a focus on supporting intellectual access to information and on considering the context of the information request. Systems for digital libraries should be able to select the appropriate type and amount of information from a comprehensive pool and compose it on the fly into a meaningful and coherent presentation, one that may never have occurred before and may never occur again because it is customized to the current situation.

In order to meet these goals, one has to adopt a new perspective on information retrieval, the notion of documents, and publishing in general. This is achieved by building on the paradigm shift currently taking place in electronic publishing caused by hypertext and hypermedia: documents are no longer viewed as static entities published at one point in time in a definite form, but as dynamic and networked collections of information composed on demand and presented with possibilities for interaction. This also allows for multiple views of the content according to the different interests of different users.

At the Integrated Publications and Information
Systems Institute (GMD-IPSI) in Darmstadt, Germany, we are developing concepts and tools supporting the production and use of innovative publication products as flexibly compiled cross-sectional information collections. These efforts are part of the Europublishing project, which is funded by the European R&D program RACE II. As a concrete application, we use the Dictionary of Art, which is to be published this year by Macmillan Publishers Ltd., U.K., as a 34-volume print edition. More than 6,000 authors and 50 editors have been involved in its conventional publication process for over 15 years. Our research focuses on providing support for the complex and demanding editorial work and on the presentation of hypermedia publications (see Figure 1).

Figure 1. User interface for the Dictionary of Art

Like digital library applications, the Dictionary of Art poses the problems of large amounts of information and thus of individual access, involving the selection and combination of information assembled according to users’ demands. As is true for all electronic publication products, there is the additional challenge of meeting the quality standards of traditional publishing, in particular those of design and layout, when presenting the information on the screen.

Our approach to meeting these requirements is based on three concepts that also reflect the overall process. First, we use an object-oriented, formal representation of facts that are extracted from Dictionary of Art source material in order to create a powerful knowledge base. Second, we support a process of network editing and enrichment, partially achieved by automatic means and partially by sophisticated tools for the human editor who is part of the editorial cycle. Third, we employ a set of procedures and tools for automatic presentation of results utilizing the underlying formal representation.

COMMUNICATIONS OF THE ACM, April 1995/Vol. 38, No. 4

The core components of our publishing environment are the Editor’s Workbench for advanced hypermedia publications and an automatic visualization engine. The Editor’s Workbench reflects our view that maintaining and editing a knowledge base of extracted facts remains an intellectual task. However, the Editor’s Workbench (see Figure 2) makes the added value of the intellectual efforts accessible for further processing.

Figure 2. Conceptual architecture of the Editor’s Workbench

The skeleton of the knowledge base is an object network consisting of Dictionary of Art articles and domain-specific objects such as representations of art styles, artists, and works of art. The schema and update behavior of these object types are modeled using the frame-based representation tool Smalltalk Frame Kit (SFK). It supports a variety of consistency checks and the atomicity of complex update operations. For knowledge acquisition (that is, for populating the object network), we experiment with automatic text processing techniques ranging from pattern-oriented parsing to full text analysis, applied, for example, to biography articles. The head of the dictionary biographies is a particularly well-structured, densely phrased piece of text that contains the essential facts of a person’s life structured
according to editorial guidelines. These guidelines are encoded in the rules of our parsing and text-to-object conversion tools (XGrammar).

Allowing flexible information access to the knowledge base results in unpredictable content selections to be presented. This requires automatic generation of graphical presentations in combination with text generation, which flexibly transforms complex knowledge representations into readable natural language texts. Our approach is unique insofar as there are no predefined templates for the presentations, such as timeline, network, or geographical diagrams. Although a variety of presentations is created by our system, each one results from the specific request combined with the available information and depends on the characteristics of the selected content.

This integrative approach is based on a common description for both the facts represented in the knowledge base and the basic graphical means of expression: both are described as relations, existing either between domain objects or between graphical elements. Correspondence in their characteristic relational properties serves to determine whether a particular domain relation can be visualized using a particular graphical relation. Once a conflict-free set of graphical relations has been determined, an optimization algorithm based on a force model is applied to compute the final presentation.

We built an environment consisting of integrated prototypes providing the functions described here. As a whole, they address major issues relevant to digital libraries of the future, where we expect the source material to go beyond scanned images of text pages from existing books. The described prototypes are part of a more comprehensive effort at GMD-IPSI that includes multimedia archives based on object-oriented databases, support for cooperative work, for example among multiple authors and editors, and information retrieval and 3D visualization for large document bases.
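The idea of matching domain relations to graphical relations through their relational properties can be illustrated with a small sketch. It is not the system described here: the relation names, the property vocabulary, and the compatibility rule (a graphical relation qualifies only if every property it visually implies also holds for the domain relation) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """A relation described only by its characteristic properties."""
    name: str
    properties: frozenset


def can_visualize(domain: Relation, graphical: Relation) -> bool:
    # Assumed rule: the picture must not suggest a property
    # (e.g. transitivity) that the domain relation lacks.
    return graphical.properties <= domain.properties


def candidates(domain: Relation, graphical_relations: list) -> list:
    """Names of all graphical relations compatible with a domain relation."""
    return [g.name for g in graphical_relations if can_visualize(domain, g)]


# Hypothetical graphical means of expression.
GRAPHICAL = [
    Relation("left_of", frozenset({"transitive", "asymmetric"})),
    Relation("linked_by_line", frozenset({"symmetric"})),
    Relation("nested_inside", frozenset({"transitive", "asymmetric"})),
    Relation("arrow_to", frozenset({"asymmetric"})),
]

# Hypothetical domain relations from an art-history knowledge base.
born_before = Relation("born_before", frozenset({"transitive", "asymmetric"}))
collaborated_with = Relation("collaborated_with", frozenset({"symmetric"}))
teacher_of = Relation("teacher_of", frozenset({"asymmetric"}))

for rel in (born_before, collaborated_with, teacher_of):
    print(rel.name, "->", candidates(rel, GRAPHICAL))
```

Under these assumptions, an ordering such as born_before may be drawn by horizontal position, while teacher_of, which is not transitive, is restricted to arrows. Choosing a conflict-free subset of the candidates and computing the final layout with the force model are separate steps that this sketch does not cover.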
References
1. Kamps, T., Reichenberger, K. A dialogue approach to graphical information access. Designing User Interfaces for Hypermedia, Schuler, W., Hannemann, J., and Streitz, N., Eds. Springer, Heidelberg (1995), 141–55.
2. Rostek, L., Möhr, W. An editor’s workbench for an art history reference work. In Proceedings of the ACM European Conference on Hypermedia Technology. Edinburgh, U.K., Sept. 13–18, 1994, pp. 233–238.
3. Rostek, L., Möhr, W., Fischer, D. Weaving a web: The structure and creation of an object network representing an electronic reference work. Electronic Publishing – Origination, Dissemination and Design. Special Issue, 6, 4, Wiley, NY, 495–505.

Christoph Hüser is the manager of the Publications and Visualization Environment research department at GMD-IPSI.
Klaus Reichenberger is a member of the research staff at GMD-IPSI.
Lothar Rostek is a senior member of the research staff at GMD-IPSI.
Norbert A. Streitz is the deputy director of GMD-IPSI and the manager of the Cooperative Hypermedia Systems research division.
Email: hueser, reichen, rostek, [email protected]
© ACM 0002-0782/95/0400
The University of California CD-ROM Information System
Deane Merrill, Nathan Parker, Fredric Gey, and Chris Stuber
The University of California CD-ROM Information System replaces the equivalent of 260,000 books of published federal statistics with a CD-ROM-based online information system. The size of this database is currently 270 CD-ROMs (135GB). It contains 1990 U.S. census data (approximately 3,000 items of socio-economic and demographic information, including race-ethnicity, employment, income, educational level, and poverty)

NCSA Mosaic: Document View
Document Title: 1990 Census Lookup (1.0.5e)
Document URL: http://cedr.lbl.gov/cdrom/lookup/date=788117324

Current Level: State -- Place
Ann Arbor city: FIPS.STATE=26, FIPS.PLACE90=03000

RACE
Universe: Persons
White (800–869, 971) ................................................ 90196
Black (870–934, 972) ................................................. 9785
American Indian, Eskimo, or Aleut (000–599, 935–970, 973–975):
  American Indian (000–599, 973) ...................................... 263
  Eskimo (935–940, 974) ................................................. 0
  Aleut (941–970, 975) .................................................. 0
Asian or Pacific Islander (600–699, 976–985):
  Asian (600–652, 976, 977, 979–982, 985):
    Chinese (605–607, 976) ........................................... 3170
    Filipino (608, 977) ............................................... 620
    Japanese (611, 981) ............................................... 981
    Asian Indian (600, 982) .......................................... 1469
    Korean (612, 979) ................................................ 1729
    Vietnamese (619, 980) .............................................. 85
    Cambodian (604) ..................................................... 0
    Hmong (609) ......................................................... 7
    Laotian (613) ....................................................... 0
    Thai (618) ......................................................... 43
    Other Asian (601–603, 610, 614–617, 620–652, 985) ................. 386
  Pacific Islander (653–699, 978, 983, 984):
    Polynesian (653–659, 978, 983):
      Hawaiian (653, 654, 978) ......................................... 23
      Samoan (655, 983) ................................................. 0
      Tongan (657) ...................................................... 0
      Other Polynesian (656, 658, 659) .................................. 0
    Micronesian (660–675, 984):
      Guamanian (660, 984) .............................................. 0
      Other Micronesian (661–675) ....................................... 0
Figure 1. Example use of LOOKUP system to retrieve the racial composition of the population of Ann Arbor, Michigan.