Semantic enabled presentation of information Frank Nack, Lynda Hardman and Jacco van Ossenbruggen CWI, Amsterdam, Kruislaan 413, P.O. Box 94079, 1090 GB Amsterdam, The Netherlands E-mail: {firstname.secondname}@cwi.nl
1. Introduction

Human culture is embodied in the collection of information transmitted from one generation to another. It is based not only on storing this information, but also on conveying this information in such a way that the relationships within the information are understood. The era of multimedia and Internet technology has not changed this overall goal in any fundamental way. Problems remain at a higher level - for example, that of selecting appropriate multimedia content for conveying the information to the user in a way that enhances the communication of the message.

The automatic generation of multimedia presentations has been a focus of multimedia research for over a decade. The aim is to establish generation mechanisms with adaptive [4] or adaptable qualities [14] that adjust the multimedia presentation to the specific context of an individual user. Various attempts to explore and develop innovative presentation techniques have been described [1, 2, 3, 6, 10, 19]. These approaches facilitate the synthesis of multimedia documents and plan how this material is presented to various users. The underlying assumption of these systems, however, is that all material and user requests are known. In dynamic environments, where neither the individual user requirements nor the requested material can be predicted in advance, the established planning approaches are insufficient.

Instead, we claim that a system that automatically generates functional and aesthetically pleasing presentations in dynamic environments needs the knowledge to provide a balance between media content, meaning, usability and aesthetics. As a result, it facilitates the communication of information by addressing the circumstances and presuppositions of the user at the time of accessing the information.
To enable this, the system requires knowledge of low-level codes, collections of objective measurements [7, 18] representing prototypical style elements, in combination with high-level conceptual descriptions [17] that support contextual and presentational requirements.
2. Conveying information

Conveying audio-visual information to the user in a way that enhances the communication of the message is a non-trivial task. For this to be successful two things have to take place. The first is to ensure that the information reaches the user, in other words, that it is disseminated correctly. The second is that it is presented in such a way that the user extracts the intended meaning from the information [9]. To illustrate this, consider the way that current-generation search engines, Google in this case, present the information that has been found. Google concentrates on the search side of the problem, not on the presentation side. What we see as the result is a list of links to Web pages where potentially relevant information can be found. All the information is presented as text - a handy medium, but not necessarily the one needed. Similarly, an image search, which is already forcing the user to make a choice of media before she even starts the search, returns a seemingly randomly-ordered list of images. Hence, it is not so much the quality of the search result that is troublesome but rather that any information known by the system about the images is not being used to display them in any meaningful way.
What we would much rather have is a system that, knowing the query, is able to generate a presentation which more effectively communicates the information incorporated within the retrieved media items. Figure 1 displays an example GUI for a large information network on the director Sergej Eisenstein. The system always tries to present information units of all available media types, as demonstrated in Figure 1, where the images represent videos. The importance of an object is emphasized through its size and position within the presentation area. The importance can be determined based on aspects such as the traversal-valency or significance (the relation between link type and context of investigation) of a relation between nodes. Figure 1 also shows the full text of the definition of intellectual montage, whereas the related textual examples are only visualised as an active area, encouraging further investigation. A visual example, on the other hand, is displayed more prominently just above it. Thus, the GUI is the visual presentation of a knowledge space that facilitates not only the representation of ideas and arguments within a work, from the level of higher semantic structures down to the precise unit of articulation, but also the discourse about various aspects of it from different viewpoints.
Figure 1: Dynamic Interface representing the various ways of linking material
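The importance-driven sizing described above can be sketched as follows; the weighting scheme and pixel ranges are our own illustrative assumptions, not the actual Cuypers implementation:

```python
# Hypothetical sketch: deriving a display size from link-based importance,
# as described for the Figure 1 interface. All names and constants are
# invented for illustration.

def importance(traversal_valency, significance, w_valency=0.5, w_significance=0.5):
    """Combine normalised traversal-valency and relation significance
    (both in [0, 1]) into a single importance score in [0, 1]."""
    return w_valency * traversal_valency + w_significance * significance

def display_size(score, min_px=80, max_px=320):
    """Map an importance score onto a display width in pixels for the
    presentation area."""
    return int(min_px + score * (max_px - min_px))

# A highly connected, highly significant node is rendered larger than a
# peripheral one.
assert display_size(importance(0.9, 0.8)) > display_size(importance(0.2, 0.1))
```

Position within the presentation area could be chosen analogously, e.g. by placing the highest-scoring units closest to the current focus of investigation.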
3. Presentation generation

In order to create this presentation, we need to specify a number of characteristics of the presentation. However complex the process may be, it boils down to making decisions and trade-offs in the style of the final presentation. The style of the presentation can be broken up into three parts:

• The content of the presentation: a digital data repository for a variety of primary data (text, audio, video, 3D animation, 2D image, 3D image, graphic) and the expert annotations related to one or more of these primary data.

• The overall logic of the presentation, i.e. the structure that connects the content units together. This might address questions such as how to establish an evolving presentation. With evolving we refer to the concept of progression of detail that facilitates navigation based on a given weighted set of descriptors representing a story context on a micro level (next step in content exploration) as well as on a macro level (larger contextual units clustering content, such as classes of artefacts within an art movement), as described in [6].

• The final ingredient is aesthetics. This includes aspects such as typography, layout, alignment and colours.
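As a rough sketch, the three style ingredients could be captured in a data structure like the following; the class and field names are hypothetical, not taken from the Cuypers system:

```python
# Hypothetical sketch of the three style ingredients as a data structure.
from dataclasses import dataclass

@dataclass
class Content:
    media_items: list   # primary data: text, audio, video, images, ...
    annotations: dict   # expert annotations, keyed by media item id

@dataclass
class Logic:
    structure: list     # ordered/linked content units
    micro_links: list   # next steps in content exploration
    macro_links: list   # larger contextual clusters

@dataclass
class Aesthetics:
    typography: str
    layout: str
    colours: list

@dataclass
class PresentationStyle:
    content: Content
    logic: Logic
    aesthetics: Aesthetics
```

Separating the three parts makes the trade-offs explicit: a generation step can change the aesthetics, say, while leaving content and logic untouched.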
Of course, the problem does not stop here. While one user wants his information about Eisenstein to be presented in the way described in Figure 1, there are many others who would like their information presented in some other form. Similarly, any one user may need different ways of presenting the information depending on the time they have, the location they are in, or the means they have for viewing or listening to the information. While the goal is to convey the retrieved information to the user effectively, there are of course very many diverse users who each want their own information displayed on their own device. Multimedia presentation design is, however, an expensive process that traditionally implies large amounts of manual design effort, requiring skills that only professionals possess. To make this scalable and cost-effective, we need to automate at least parts of the presentation design process.
4. Automatic presentation generation

Our approach for conveying information effectively is to develop an explicit model of the knowledge needed to make an informed choice for the design. The Cuypers system, our experimental workbench, provides a framework for automatically generating multimedia presentations as described earlier [15, 16].
[Figure 2 diagram: the Domain Model (with metadata and multimedia databases), Discourse Model, Design Model, User Profile, Device Profile and Media Model feed the domain selection, discourse organization and presentation design stages, which sit on top of a constraint layer.]
Figure 2: Information sources used by the Cuypers hypermedia generation engine

The architecture, as outlined in Figure 2, consists of six models covering the knowledge, in the form of facts and task-solving routines (mainly constraint solving), required during the generation process, which is organized by the presentation engine. Note that all modules have their own conceptual representations in RDF, which are accessible from all other modules. The areas covered by the various modules are:
Domain model: conveys the underlying relations in the subject matter by providing an explicit specification of the relationships among the concepts of a particular topic. Since such models have already been developed for many domains, and interchange languages for them are now being developed in the context of the Semantic Web, the domain model itself is not a topic of our research.

Discourse model: establishes information relevant for the evolving communicational structure of the presentation. For example, importance weights for information units such as text or images are established on a scale from 0 to 1, depending on the importance of their role for the current communicational goal (e.g. for an introduction, definitions might receive a high weight). The discourse model is also responsible for the navigational style. For example, as the presentation evolves over time based on the browsing behaviour of the user, the importance weight of links might be high. Finally, relevant items for the story component on the level of navigation, such as links for macro- and micro-navigation, and content, such as the role of required text (definition or overview), are templated here as well. This information is mainly of interest for retrieval, but also serves during the design process.
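The 0-to-1 importance weighting used by the discourse model might be sketched as follows; the communicational goals, unit roles and weight values here are invented for illustration:

```python
# Hypothetical sketch: importance weights per communicational goal, on the
# 0-1 scale the discourse model uses. Goals, roles and values are illustrative.

WEIGHTS = {
    "introduction": {"definition": 0.9, "overview": 0.7, "example": 0.4},
    "in-depth":     {"definition": 0.3, "overview": 0.2, "example": 0.8},
}

def rank_units(units, goal):
    """Order information units by the weight of their role for the
    current communicational goal; unknown roles default to 0."""
    table = WEIGHTS[goal]
    return sorted(units, key=lambda u: table.get(u["role"], 0.0), reverse=True)

units = [{"id": "t1", "role": "example"},
         {"id": "t2", "role": "definition"}]
# For an introduction, the definition outranks the example.
assert [u["id"] for u in rank_units(units, "introduction")] == ["t2", "t1"]
```

The same table could be extended with link weights to capture the navigational style the model is responsible for.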
Design model: contains methods and facts to design the stylistic aspects of a presentation with respect to layout, typography and colour design [5].

User profile: embodies the most suitable user profile and provides rules, indicated by that profile, to guide the choice of values for system attributes such as colour and text size [11].

Device profile: contains information about the technical environment used, such as network connection, screen size, etc.

Media model: embodies the information needed to choose the appropriate media, in the form of descriptions of the characteristics of the media themselves. The media model allows the system to solve constraints such as: Which medium is appropriate for the message to be conveyed? Does a process need to be conveyed, making video an appropriate choice, or does the user want to see details of a painting, requiring a high-quality still image? [12, 13]

Presentation engine: manages the overall synchronisation of the presentation design steps. Given that it uses multiple models of different knowledge types, the process of presentation generation is knowledge intensive. The system needs to be able to relate different types of knowledge and take this into account at different stages in the process. In addition, the system allows trade-offs to be made. There is no single algorithm or heuristic for coming up with "the" solution, so the system needs a plug-in architecture that allows flexible experimentation with alternative algorithms. In addition to this symbolic-level processing, the system has to calculate the pixel positions and split-second durations for the final presentations. Processing numerical constraints is thus a required part of the functionality. In case constraints are not met, the system needs to be able to backtrack and try other solutions [8, 15, 16].
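The backtracking over numerical constraints described for the presentation engine can be illustrated with a toy example; the candidate layouts and fitting rules are our own simplifications, not the system's actual constraint solver:

```python
# Hypothetical sketch of backtracking over layout alternatives: try each
# candidate in order, check a numerical constraint (everything must fit
# the screen width), and fall back to the next alternative on failure.

def fits(layout, item_widths, screen_width):
    """Check the width constraint for one candidate layout."""
    if layout == "row":       # items placed side by side
        return sum(item_widths) <= screen_width
    if layout == "grid":      # two columns of stacked items
        half = (len(item_widths) + 1) // 2
        return (max(item_widths[:half], default=0)
                + max(item_widths[half:], default=0)) <= screen_width
    if layout == "column":    # items stacked; only the widest must fit
        return max(item_widths) <= screen_width
    return False

def solve(item_widths, screen_width, layouts=("row", "grid", "column")):
    """Return the first layout whose constraints hold, mimicking the
    plug-in list of alternative algorithms; None if all fail."""
    for layout in layouts:
        if fits(layout, item_widths, screen_width):
            return layout
    return None               # all alternatives exhausted -> backtrack further

# Three 300px images on an 800px screen: a row (900px) fails, so the
# solver falls back to a two-column grid (600px), which fits.
assert solve([300, 300, 300], 800) == "grid"
```

A real engine would backtrack across many such decision points at once (media choice, layout, timing), but the fail-and-retry pattern is the same.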
The architecture outlined above uses metadata of media items and knowledge about the domain and graphic design to resolve the constraints based on the interdependencies between the style ingredients - content, presentation structure and aesthetics. Resolving these dependencies is exactly why graphic design is difficult. An illustration of a dependency is that a background colour should not clash with the colours of any images being displayed. In this case the aesthetics, the choice of colours, are dependent on content. Another example is the amount of content that can be included in the space available. Here the content is dependent on the presentation structure. A further dependency is that of presentation structure on the content. This dependency is not so much on the images and texts per se, but on the relationships among them from the underlying subject matter. Suppose there is a collection of images by Eisenstein that are to be incorporated into a presentation. There are, however, too many to be displayed on the screen at the same time. Some are sketches for his films and others illustrate details of montage theory. This allows us to make groupings within the presentation structure that reflect this distinction. In this way, we can build up a presentation structure that not only depends on the underlying meaning in the material, but can also be used as one of the mechanisms for conveying that meaning effectively.
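The colour-clash dependency can be made concrete with a small sketch; the hue-based colour representation and the distance threshold are illustrative assumptions, not part of the described system:

```python
# Hypothetical sketch of the aesthetics-on-content dependency: reject
# background candidates whose hue is too close to the dominant hues of
# the displayed images. Hues are in degrees on the 0-360 colour wheel.

def hue_distance(h1, h2):
    """Circular distance between two hues in degrees."""
    d = abs(h1 - h2) % 360
    return min(d, 360 - d)

def pick_background(candidates, image_hues, min_distance=60):
    """Return the first candidate hue far enough from every image hue,
    or None if the clash constraint cannot be satisfied."""
    for hue in candidates:
        if all(hue_distance(hue, img) >= min_distance for img in image_hues):
            return hue
    return None

# Images dominated by reds and oranges (0-40 degrees): a red background
# (10) is rejected as a clash, while blue (220) is accepted.
assert pick_background([10, 220], [0, 30, 40]) == 220
```

The grouping dependency works the same way in reverse: subject-matter relations among the media items constrain which presentation structures are acceptable.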
5. Conclusion

Systems such as our Cuypers environment need to be firmly rooted in the global Web architecture. This forces the design to be based on standard representation languages and shared technologies. Fortunately, this is offset by the significant advantages of being able to reuse third-party multimedia content, declarative knowledge and off-the-shelf tools. It also allows others to embed our work in their own Web environments. In addition to the current Web, the Semantic Web is beginning to develop, and more and more of the meaning implicit in the content of Web pages is made explicit. Most talks about the Semantic Web use semantically assisted information retrieval to illustrate its application - a super-Google, if you will. This will give us our correct and relevant information. For our own work, however, the Semantic Web is most important once this information has been found. We want to make use of any available semantics to assist in conveying the information to the user. Just as we can now share style sheets for Web pages, because the style information has been encapsulated and made explicit, separate from the content to which it refers, with the Semantic Web we should be able to share our smart style - declarative descriptions of discourse characteristics and design rules. In addition, we can make the meaning conveyed in our multimedia presentations explicit using Semantic Web languages, so that the generated presentations themselves become processable by software agents in addition to being more accessible to human agents.
6. Acknowledgement

Part of the research described here was funded by the Dutch national ToKeN2000/CHIME project.
7. References

[1] ANDRÉ, E., MÜLLER, J., and RIST, T. WIP/PPP: Knowledge-Based Methods for Fully Automated Multimedia Authoring. In: Proceedings of EUROMEDIA'96, London, UK, 1996, pp. 95-102.
[2] BATEMAN, J., KLEINZ, J., KAMPS, T., and REICHENBERGER, K. Towards Constructive Text, Diagram, and Layout Generation for Information Presentation. In: Computational Linguistics 27(3), pp. 409-449, September 2001.
[3] BOLL, S., KLASS, W., and WANDEL, J. A Cross-Media Adaptation Strategy for Multimedia Presentations. In: ACM Multimedia '99 Proceedings, pp. 37-46, Orlando, Florida, October 30 - November 5, 1999. ACM, Addison Wesley Longman.
[4] BRUSILOVSKY, P. Adaptive Hypermedia. Journal on User Modeling and User-Adapted Interaction, 11:87-110, 2001.
[5] CZAJKA, K. Design of interactive and adaptive interfaces to exploit large media-based knowledge spaces in the domain of museums for the fine arts. Masters Thesis, University of Applied Science Darmstadt, Germany, June 14, 2002.
[6] DAVENPORT, G., and MURTAUGH, M. ConText: Towards the Evolving Documentary. In: ACM Multimedia '95 Proceedings, pp. 377-378, San Francisco, California, November 5-9, 1995. ACM Press.
[7] DEL BIMBO, A. Visual Information Retrieval. Morgan Kaufmann, San Francisco, USA, 1999.
[8] GEURTS, J. Constraints for Multimedia Presentation Generation. Masters Thesis, University of Amsterdam, 2002.
[9] HARDMAN, L. Smart style for conveying information. Inaugural lecture, 2 May, Technische Universiteit Eindhoven, 2003. http://alexandria.tue.nl/extra2/redes/hardman2003.pdf
[10] KAMPS, T. Diagram Design: A Constructive Theory. Springer Verlag, 1999.
[11] LITTLE, S. Cuypers Meets Users. Technical Report, 2002.
[12] NACK, F., WINDHOUWER, M., HARDMAN, L., PAUWELS, E., and HUIJBERTS, M. The Role of High-level and Low-level Features in Style-based Retrieval and Generation of Multimedia Presentations. In: New Review of Hypermedia and Multimedia 7, pp. 39-65, 2001.
[13] NACK, F., and HARDMAN, L. Denotative and Connotative Semantics in Hypermedia: Proposal for a Semiotic-Aware Architecture. In: New Review of Hypermedia and Multimedia 7, pp. 7-37, 2001.
[14] RUTLEDGE, L., VAN OSSENBRUGGEN, J., HARDMAN, L., and BULTERMAN, D. C. A. Mix'n'Match: Exchangeable Modules of Hypermedia Style. In: Proceedings of the 10th ACM Conference on Hypertext and Hypermedia, Darmstadt, Germany, February 21-25, 1999, pp. 179-188.
[15] VAN OSSENBRUGGEN, J., CORNELISSEN, F., GEURTS, J., RUTLEDGE, L., and HARDMAN, L. Towards Second and Third Generation Web-based Multimedia. In: The Tenth International World Wide Web Conference, Hong Kong, May 1-5, 2001.
[16] VAN OSSENBRUGGEN, J., GEURTS, J., HARDMAN, L., and RUTLEDGE, L. Towards a Formatting Vocabulary for Time-based Hypermedia. In: The Twelfth International World Wide Web Conference, ACM Press, Budapest, Hungary, May 20-24, 2003. To be published.
[17] SCHREIBER, A. T. G., DUBBELDAM, B., WIELEMAKER, J., and WIELINGA, B. Ontology-based Photo Annotation. IEEE Intelligent Systems 16(3), pp. 66-74, May/June 2001. http://www.computer.org/intelligent/ex2001/x3066abs.htm
[18] SMEULDERS, A. W. M., WORRING, M., SANTINI, S., GUPTA, A., and JAIN, R. Content-based Image Retrieval: The End of the Early Years. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(12), pp. 1349-1380, 2000.
[19] WEITZMAN, L., and WITTENBURG, K. Automatic Presentation of Multimedia Documents Using Relational Grammars. In: Proceedings of the Second ACM International Conference on Multimedia '94, San Francisco, pp. 443-451, October 15-20, 1994.