Interaction and Focus: Towards a Coherent Degree of Detail in Graphics, Captions and Text

Knut Hartmann∗   Antonio Krüger†   Ralf Helbing‡   Stefan Schlechtweg‡

∗ Institut für Wissens- und Sprachverarbeitung, Otto-von-Guericke-Universität Magdeburg, Universitätsplatz 2, D-39106 Magdeburg, Germany, email: [email protected]
† Graduiertenkolleg Kognitionswissenschaft, Lehrstuhl Prof. Wahlster, Bau 36, Universität des Saarlandes, Postfach 151150, D-66041 Saarbrücken, Germany, email: [email protected]
‡ Institut für Simulation und Graphik, Otto-von-Guericke-Universität Magdeburg, Universitätsplatz 2, D-39106 Magdeburg, Germany, email: {stefans,helbing}@isg.cs.uni-magdeburg.de

Abstract

Direct manipulation is the key concept of interaction with systems that use graphics to communicate with the user. The latest generation of such systems also supports interaction with other modalities, i.e., text and figure captions. However, the effects of interaction within one modality must be reflected in the other modalities in order to keep the presentation consistent. Although some approaches exist that distribute the content of a presentation coherently over different modalities, none of them is suited for highly interactive systems. This paper provides a fast and simple solution to this problem. We argue that interaction with objects changes their actual importance for the presentation. Therefore, a central focus structure is used as a common basis for the generation of graphics and figure captions. A semantic network is used to determine relevant background objects and concepts. We show how graphical abstraction can be used to render images that have a desired focus structure, and how to generate figure captions appropriately.

1 Introduction

Good presentations that use several modalities (e.g., text and graphics) to communicate information rely on a proper distribution of the content among the different modalities. Hence, authors have to decide which parts of the information should be given in textual form and which should be provided by illustrations. Text and graphics need adequate focus structures that help to distinguish clearly between important parts of the presentation and less relevant details that provide background information. Whereas the role of focus structures in text is well understood (e.g., [MC90, GS86]) and computer-based text generation approaches rely on this experience, research on the automatic generation of focus structures for graphics is a rather new area. Nevertheless, non-photorealistic rendering techniques developed over the last few years support the generation of graphical abstractions that impose clear focus structures on the resulting rendered image. The problem of automatically generating graphical abstractions that fulfill a specific communicative goal is still open and subject to ongoing research. Especially in highly interactive systems, where the communicative goal changes due to user interactions, two interesting questions arise: first, how to generate texts and abstracted graphics with coherent focus structures, and second, how to modify the focus structure as the user interacts with the system. Similar problems arise when appropriate illustrations (i.e., illustrations with a coherent focus structure within text and graphics) are to be generated for a given text portion. This paper proposes solutions to both problems. In Section 3 we introduce the notion of a focus structure, which is used as a common basis for the generation of graphics and figure captions as well as to select those fragments within a given canned text which focus on the most prominent entities of the focus structure. In Section 4 we exploit this structure to generate appropriate graphical presentations together with figure captions. Our ideas are illustrated in Section 5 by an example of a medical tutoring system for human anatomy which is designed to assist students during exam preparation.

2 Related Work

The fact that users find many information presentation systems confusing or hard to understand is mainly due to their lack of customization. Therefore, a lot of effort has recently been spent on the development of so-called Intellimedia Systems, which generate adequate presentations fitting the user's needs and goals. Instead of relying on stored elements, all parts of the presentation (i.e., graphics and text) are generated from scratch. These systems ([WAF+93, RAM97, FM93, SF91]) use advanced planning approaches to select the content of the presentation as well as the medium to convey it, and therefore produce sophisticated results. If users can interact with the system, an expensive re-planning is necessary in order to adjust the generated presentation to the new user intention. The systems mentioned above only allow for rudimentary interaction. For example, menu-based interactions within the PPP system [RAM97] are restricted to graphics. Even though some systems provide richer interaction possibilities (e.g., the Zoom Illustrator [PRS97]), only very few of them support direct manipulation in text as well as in graphics (e.g., the VisDok system [HHS98]). Although these systems generate multimedia presentations, they do not make extended use of graphics with multiple focus structures, i.e., several focus structures with nested focussed and background objects. This can be achieved with powerful graphical abstraction techniques [SR98, SW98] which highlight important objects in the rendered image and de-emphasize unimportant background objects. In [Krü98], several rules are identified explaining how a suitable focus structure for a given presentation context must be constructed in order to achieve a well-balanced degree of abstraction in the final image. A consistency rule, for instance, constrains the maximum abstraction degree for objects in the graphics according to their future importance for the presentation. Violation of this rule leads to sequences of graphics with objects "popping out" in the middle of the presentation at the moment they gain importance. If interaction is possible in all modalities used by these systems, the graphical and textual focus structures must be synchronized to keep the presentation consistent. Until now, presentation systems have failed to provide a fast and flexible solution to this problem.

3 Focus Structure

In order to generate presentations with well-balanced and consistent content in all modalities, a few considerations lead to the concept of focus structures. First, it is essential to assign values to textual and graphical entities (referring themselves to domain objects) which indicate their importance for the communicative task at hand. In the following, we use the term dominance value to denote these values. Second, one main observation is that the dominance values of separate entities are related to each other. Our main hypothesis is that the objects currently focussed pass or propagate some of their dominance to other, related entities. This propagation is well recognized in presentation planning approaches and is achieved by design rules in the content selection phase of presentation planning.¹ The data structure representing the dominance values themselves, as well as the contribution of one object's dominance values to the dominance values of related objects, is referred to as focus structure. In contrast to presentation planning approaches, which determine dominance values in the content selection phase, we propose an extension to highly interactive systems where user interaction controls the set of focussed objects. For this purpose, we use an explicit representation of the document's focus structure. Changes of the user's interest (e.g., expressed by an interaction) modify this focus structure and, as a consequence, the dominance values of the domain objects.

Domain-specific knowledge, i.e., domain objects and their relations, can be represented effectively in a semantic network. The relations contained in this network include object classifications (is-a and instance-of relations), their partonomy (part-of relations) and other relations between objects (e.g., connected-with relations). This semantic network is enhanced with the following additional parameters, which represent the dominance values of domain objects (nodes)² and the flux of dominance via the relations between domain objects (edges):

- a percentage value c(n) indicating the consumption of dominance within the node n, i.e., the amount of dominance remaining in an activated node,
- a cumulated dominance value dom(n) to enable dominance propagation via multiple activations,
- a weight ω(edge) indicating the ability of the relation edge to propagate dominance to related nodes; the amount of dominance can be decreased or increased, modeling a resistor or repeater associated with that edge, and
- a percentage value l(n) indicating the loss of dominance within a node n, i.e., a fraction of the transferred dominance which neither contributes to the cumulated dominance dom(n) of the node itself nor is passed to other nodes via relations.

¹ See, for instance, the provide-background operator in the WIP/PPP system [And95].
² Nodes representing domain objects with dominance values above zero are called activated nodes.

In our framework, the focus structure is a subgraph of this enhanced semantic network marking relevant objects and attributes of the presentation. Our main idea, the distribution or flux of dominance via relations, is formalized in the dominance propagation algorithm. First, initial dominance values DOM are assigned to the domain objects in the main focus (i.e., to nodes in the semantic network). Second, this dominance can be spread over to related objects, which can in turn propagate a portion of their dominance. This is achieved by a recursive application of the propagation function propagation(flux, n_i) for all initial dominance values (flux, n_i) ∈ DOM until termination. The propagation function propagation(flux, n) is given in Figure 1.

    propagation(flux, n) is
        distribution := flux · (1 − c(n) − l(n))
        dom(n) := dom(n) + flux · c(n)
        edge-number := card({n_i | edge(n, n_i)})
        unless (distribution > threshold) return
        forall n_i in edge(n, n_i) do
            flux_i := ω(edge) · distribution / edge-number
            propagation(flux_i, n_i)
        od

Figure 1: The dominance propagation algorithm.
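To make the algorithm concrete, the following is a minimal Python sketch of the propagation in Figure 1. The graph layout, the concrete values for c(n), l(n) and ω(edge), and the termination threshold are illustrative assumptions; the paper leaves their choice open (cf. Section 6).

    from typing import List, Tuple

    class Node:
        """A domain object in the enhanced semantic network."""
        def __init__(self, name: str, c: float, l: float):
            self.name = name
            self.c = c        # consumption: fraction of incoming dominance kept
            self.l = l        # loss: fraction of incoming dominance that vanishes
            self.dom = 0.0    # cumulated dominance value dom(n)
            self.edges: List[Tuple["Node", float]] = []  # (neighbour, omega)

    def connect(a: Node, b: Node, omega: float) -> None:
        """Add an undirected relation with propagation weight omega(edge)."""
        a.edges.append((b, omega))
        b.edges.append((a, omega))

    THRESHOLD = 0.01  # assumed termination threshold

    def propagation(flux: float, n: Node) -> None:
        distribution = flux * (1.0 - n.c - n.l)  # dominance left for neighbours
        n.dom += flux * n.c                      # node consumes its share
        if distribution <= THRESHOLD:            # terminate once the flux dries up
            return
        edge_number = len(n.edges)
        for neighbour, omega in n.edges:
            flux_i = omega * distribution / edge_number
            propagation(flux_i, neighbour)

    # Example: a focussed ligament activates a muscle, which activates a bone.
    ligament = Node("retinaculum", c=0.6, l=0.1)
    muscle = Node("tibialis anterior", c=0.5, l=0.1)
    bone = Node("tibia", c=0.5, l=0.1)
    connect(ligament, muscle, omega=0.8)  # connected-with
    connect(muscle, bone, omega=0.6)      # attached-to

    DOM = [(1.0, ligament)]               # initial dominance values
    for flux, node in DOM:
        propagation(flux, node)
    print({n.name: round(n.dom, 3) for n in (ligament, muscle, bone)})
    # -> high dominance for the ligament, less for the muscle and the bone

Note that because 1 − c(n) − l(n) < 1 and the threshold cuts off small contributions, the recursion terminates even on cyclic (here: undirected) networks.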

As a consequence of dominance propagation, a focussed object that is part of a group is able to activate the other objects of that group. Furthermore, an instance can activate the other objects within its class, and objects can activate other related objects (e.g., focussed muscles can activate the bones to which they are attached). The resulting focus structure can be modified directly or indirectly by altering the preselected set of initially focussed objects, whereas the focus area can be enlarged or reduced by adjusting the initial dominance values. Besides direct manipulation of the parameters of the propagation function, the user may influence these parameters indirectly by scrolling the visible text portion or by selecting objects within the image, the figure caption, or the text. As a consequence of user interactions, a new set of focussed objects is determined and the propagation algorithm starts with a new set of initial dominance values DOM. Furthermore, the current viewing direction can be evaluated in order to extract a clue as to which objects are currently focussed, since objects in the center of the viewing volume are more important, whereas hidden objects may be less important to the user.
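Building on the sketch above, the following hypothetical glue code shows how such an interaction could be turned into a new initial dominance set before re-running the propagation; the reset step and the scaling of the focus area are our assumptions, not part of the paper's design:

    from typing import List

    def refocus(selected: List[Node], all_nodes: List[Node],
                initial_dominance: float = 1.0) -> None:
        """Recompute the focus structure after a user interaction."""
        for n in all_nodes:       # discard the previous focus structure
            n.dom = 0.0
        for n in selected:        # objects picked in image, caption or text
            propagation(initial_dominance, n)

    # A larger initial dominance value enlarges the focus area.
    refocus([ligament], [ligament, muscle, bone], initial_dominance=2.0)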

4 Image and Caption Generation from the Focus Structure

Having assigned dominance values to the objects of the presentation, this information has to be conveyed appropriately in all modalities. In this section we illustrate the central role of the focus structure in adjusting the content of the different media.

Graphics:

Since the rendered images in the presentation are generated from 3D models, the rendering style and rendering parameters can be adapted to reflect the determined dominance values. The most important objects should be highlighted in the rendered image, while objects providing valuable background information may be rendered in less detail and unimportant objects may be omitted completely. Thus, the dominance values can be used to determine appropriate rendering styles and rendering parameters. To render images reflecting the computed focus structure, a notion of the degree of detail of the rendering styles invoked by the system is necessary. The metric proposed in this paper focuses on lines that are generated by a sketch renderer [SR98] and is based on four categories of decreasing degrees of graphical abstraction.³ The most abstract graphical representation of an object is no representation at all, i.e., excluding the object completely from the image. The second category includes only the object's silhouette. The third category adds some line drawings to the silhouette, selected from the edge-based representation of the 3D model.

³ In this work we do not discuss the notion of graphical abstraction in general and use it in an intuitive manner. For a more elaborate view refer to [S+98].

Figure 2: Examples of different abstraction styles to model appropriate focus structures (Kaiserpfalz zu Magdeburg).

Finally, the last category also includes lighting information, allowing for a broad range of details from emphasized edges to a combination of sketched and shaded images. Since the third and fourth categories each contain a range of differently abstracted representations of the objects, an additional, smoother metric can be defined on each of these classes. In this way, more or fewer edges, or more or less lighting information, can be considered, and different styles of representing surface-related information can be employed to customize the level of abstraction. For a better understanding of these concepts, consider the different depictions of a palace shown in Figure 2. The images of column A) are examples with different focus structures that were realized by means of graphical abstraction. The uppermost image in column A) has no clear focus structure since all parts of the palace are rendered in the same color. Rendering objects in their original color can yield a clear separation of fore- and background objects (second image). A more complex focus structure is realized in the third image, guiding the viewer's attention to both towers. Column B) contains some examples of the abstraction categories mentioned above. The first and second images are examples of depictions from the third category, showing the object's silhouette and additional lines. The third one was rendered under consideration of all available information (i.e., lighting information) and is therefore a member of the fourth abstraction category. One major property of this approach is that the more abstract classes are completely included in the more detailed classes, thus allowing a direct mapping from dominance values of the focus structure to abstraction categories. This can be done by partitioning the range of allowed dominance values into four intervals. If a given value for an object falls into the first two intervals, the object is completely left out or only the silhouette is drawn. If the value lies in the third or fourth interval, it determines the number of lines that are used to depict the object and how much lighting information is used to render the image.
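The following Python sketch illustrates this mapping, assuming dominance values normalised to [0, 1] and uniformly chosen interval borders (both assumptions are ours):

    def abstraction_style(dom: float) -> dict:
        """Map a dominance value in [0, 1] onto the four abstraction categories."""
        if dom < 0.25:
            return {"category": 1, "style": "omit object entirely"}
        if dom < 0.50:
            return {"category": 2, "style": "silhouette only"}
        if dom < 0.75:
            # the smoother metric: share of edge lines taken from the 3D model
            return {"category": 3, "style": "silhouette plus lines",
                    "line_fraction": (dom - 0.50) / 0.25}
        return {"category": 4, "style": "lines plus lighting",
                "lighting_weight": (dom - 0.75) / 0.25}

Within the third and fourth categories the residual dominance value drives the smoother metric, so small changes in dominance change the depiction gradually rather than switching styles abruptly.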

Figure caption:

Information on the transformation of dominance values to rendering styles might be essential for a correct interpretation of the rendered image. Therefore, the encoding scheme itself should be described within the figure caption. In this scenario, figure captions serve as legends that explain the graphical conventions used in the rendered image. Furthermore, causal relations in the application of abstraction techniques can be explained in figure captions. Such figure captions ease the interpretation of the image and enable the graphics generator to incorporate powerful abstraction techniques [PMHS98, MMCR98]. Other parts of the figure caption provide the name of the object, the viewing direction and the focussed objects. This information is essential: images attract the viewer's attention most, and it is thus very likely that the user reads only the figure caption without reading the accompanying text.
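A template-based sketch of such a caption generator is given below; the wording of the templates and the available metadata (subject name, viewing direction, legend text) are illustrative assumptions, not the paper's implementation:

    from typing import List

    def generate_caption(subject: str, view: str, focussed: List[str],
                         legend: str) -> str:
        """Compose a figure caption: subject, view, focus and graphical legend."""
        return (f"{subject}, {view}. "
                f"Focussed objects: {', '.join(focussed)}. "
                f"{legend}")

    print(generate_caption(
        "The dorsum of the foot", "anterior view", ["retinacula"],
        "Muscles are shown as line drawings, bones as silhouettes."))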

Text:

User interaction in the graphics or figure caption may result in severe changes to the focus structure, as discussed in the last section. Consequently, those text portions with a compatible focus structure must be determined in order to adjust the visible text portion to the modified graphics. The system should therefore display those text portions where the most dominant graphical objects are most salient. As the content of the text is unknown, different heuristics to select good text portions have to be evaluated; one conceivable scoring heuristic is sketched below. As we have shown in this section, the focus structure can be used to coordinate the behavior of independent generation modules for each modality.
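One such heuristic (our illustration, not a method from the paper) scores each scrollable text portion by the dominance-weighted number of mentions of domain objects, exploiting the fact that anatomical names are unique:

    from typing import Dict, List

    def score_portion(portion: str, dominance: Dict[str, float]) -> float:
        """Dominance-weighted count of object mentions in a text portion."""
        text = portion.lower()
        return sum(dom * text.count(name.lower())
                   for name, dom in dominance.items())

    def best_portion(portions: List[str], dominance: Dict[str, float]) -> str:
        """Pick the text portion whose focus structure fits best."""
        return max(portions, key=lambda p: score_portion(p, dominance))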

Figure 3: A proposal for a system enhancing medical texts with generated rendered images and figure captions.

5 Example

To demonstrate the concepts introduced in this paper, consider the following example of a medical tutoring system. As Figure 3 shows,⁴ this system provides information in three modalities: an image (on the left), text (on the right) and a figure caption below the image. The images are rendered from a 3D model, whereas the main text is canned, but scrollable.

Interaction is possible with all three modalities: the user may select objects in the rendered image, in the figure caption⁵ and in the text. All these interactions impose changes on the intended focus structure of the entire document. Figure 3 shows a snapshot of an ongoing session, where the user has selected a region in the text. This text portion is analyzed in order to determine an initial set of focussed objects, which serves as input for the dominance propagation algorithm. The dominance values of the resulting focus structure are used to select appropriate rendering styles and rendering parameters for the objects in the 3D model. In the following, we explain these steps in more detail.

Since human anatomy uses unique names, no deep text analysis is needed to decide which objects are mentioned in the region. Furthermore, some cues on important objects can be derived by analyzing the text formatting. High initial dominance values are assigned to objects which are highlighted in the text (e.g., by a special font). Additionally, the system assigns a low dominance value to other discourse participants. The main focus of the selected text part in this example lies on the ligaments (lat. retinacula). The structure of the book from which the text is taken [Rog92, p. 304] concentrates on the description of functional systems and their parts. Thus, the hierarchical structure of the textbook (Part 2: "Musculo-skeletal system", Section 20: "Muscles and movements of the lower limb", subsection "The dorsum of the foot", subsubsection "Retinacula") can be used to determine the most important and thus focussed objects, i.e., those ligaments located in the dorsum of the foot.

The dominance propagation should consider assumptions about the knowledge of the intended user. For naive users, for instance, it might be necessary to include all objects somehow related to the focussed objects. This can be achieved by a global increment of the weight parameter ω, which results in an extension of the focus area. Thus, the muscles and the bones in this area are activated with a lower dominance value. In this way, the dominance propagation algorithm assigns a high dominance value to ligaments, a medium dominance value to muscles and a low dominance value to bones. This dominance distribution is then visualized by rendering objects of different classes with different line styles. The selection of a text portion other than the one in Figure 3, for example, imposes a new calculation of the initial set of focussed objects, which, in turn, has to be reflected in the rendered image. Consequently, the descriptive figure caption has to be updated automatically to reflect the picture's content correctly.

So far, the initial set of objects has been determined by an analysis of the displayed text portion. In addition to interactions on text, the user can change the viewing direction or select objects in the rendered image. Both interactions change the focus structure, which may force the selection of another text portion with a compatible focus structure. Thus, this focus shift may cause the text to scroll to an adequate position where the focussed objects are mentioned.

⁴ This figure merely serves to illustrate our ideas and did not result from a screenshot of a running system.
⁵ Preim et al. [PMHS98] introduce the term interactive figure captions to denote user interactions via figure captions.
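Returning to the text-analysis step of this example, the following sketch derives an initial dominance set from unique anatomical names and text formatting. The markup convention for highlighted terms and the two dominance levels are our assumptions:

    import re
    from typing import List, Tuple

    def initial_dominance_set(region: str,
                              lexicon: List[str]) -> List[Tuple[float, str]]:
        """Assign high dominance to highlighted terms, low to other mentions."""
        DOM = []
        for name in lexicon:  # unique anatomical names, so plain matching suffices
            if name.lower() not in region.lower():
                continue
            # assumed convention: highlighted terms are wrapped in asterisks
            highlighted = re.search(rf"\*{re.escape(name)}\*", region,
                                    re.IGNORECASE)
            DOM.append((1.0 if highlighted else 0.2, name))
        return DOM

    region = "The *retinacula* hold down the tendons of the tibialis anterior."
    print(initial_dominance_set(region, ["retinacula", "tibialis anterior"]))
    # -> [(1.0, 'retinacula'), (0.2, 'tibialis anterior')]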

6 Discussion

We have suggested a framework to adjust the level of detail in multimodal presentation generation. This work combines ideas the authors developed in several systems: on multimodal presentation planning [HHS98], on the generation of figure captions [PMHS98], and on systems producing powerful graphical abstractions [SR98, Krü98]. The implementation of the propagation algorithm and the automatic selection of different abstraction techniques to convey dominance values is work in progress. The major problem of this approach is to appropriately parameterize the semantic network (i.e., to assign values to the variables c and l for each node, as well as to the variable ω for each edge between nodes). There are at least two possible solutions to this problem: a machine learning approach, where dominance values are assigned manually to the objects in a number of illustrations and appropriate values for the dependent parameters are trained in a second phase, and a rule-based approach, where rules are applied to estimate the parameters of the algorithm. Both approaches are worth exploring in the future to extend the techniques presented in this paper.

References

[And95] E. André. Ein planbasierter Ansatz zur Generierung multimedialer Präsentationen. infix Verlag, Sankt Augustin, 1995.

[CCLT98] T. Catarci, M. F. Costabile, S. Levialdi, and L. Tarantino, editors. Advanced Visual Interfaces: An International Workshop, AVI '98, L'Aquila, Italy, May 25-27, 1998. ACM Press.

[FM93] S. K. Feiner and K. R. McKeown. Automating the Generation of Coordinated Multimedia Explanations. In M. T. Maybury, editor, Intelligent Multimedia Interfaces, pages 117-138. AAAI Press, Menlo Park, CA, 1993.

[GS86] B. J. Grosz and C. L. Sidner. Attention, Intentions, and the Structure of Discourse. Computational Linguistics, 12:175-204, 1986.

[HHS98] R. Helbing, K. Hartmann, and Th. Strothotte. Dynamic Visual Emphasis in Interactive Technical Documentation. In T. Rist, editor, Proc. of the Workshop on Combining AI and Graphics for the Interface of the Future, pages 82-90, Brighton, UK, August 24, 1998.

[Krü98] A. Krüger. Automatic Graphical Abstraction in Intent-Based 3D Illustrations. In Catarci et al. [CCLT98], pages 47-55.

[MC90] K. F. McCoy and J. Cheng. Focus of attention: Constraining what can be said next. In C. L. Paris, W. R. Swartout, and W. C. Mann, editors, Natural Language Generation in Artificial Intelligence and Computational Linguistics. Kluwer Academic Publishers, 1990.

[MMCR98] V. O. Mittal, J. D. Moore, G. Carenini, and S. Roth. Describing Complex Charts in Natural Language: A Caption Generation System. Computational Linguistics, 24(3):431-467, 1998.

[PMHS98] B. Preim, R. Michel, K. Hartmann, and Th. Strothotte. Figure Captions in Visual Interfaces. In Catarci et al. [CCLT98], pages 235-246.

[PRS97] B. Preim, A. Raab, and Th. Strothotte. Coherent Zooming of Illustrations with 3D-Graphics and Text. In W. E. Davis, M. Mantei, and V. Klassen, editors, Proc. of Graphics Interface '97, pages 105-113. Canadian Information Processing Society, 1997.

[RAM97] T. Rist, E. André, and J. Müller. Adding Animated Presentation Agents to the Interface. In Proc. of the 1997 International Conference on Intelligent User Interfaces, pages 79-86, 1997.

[Rog92] A. W. Rogers. Textbook of Anatomy. Churchill Livingstone, Edinburgh, 1992.

[S+98] Th. Strothotte et al. Computational Visualization: Graphics, Abstraction, and Interactivity. Springer Verlag, Berlin / Heidelberg / New York, 1998.

[SF91] D. Seligmann and S. Feiner. Automated Generation of Intent-Based 3D Illustrations. Computer Graphics, 25(4):123-132, 1991.

[SR98] S. Schlechtweg and A. Raab. Rendering Line Drawings for Illustrative Purposes. In Computational Visualization: Graphics, Abstraction, and Interactivity [S+98], pages 65-89.

[SW98] S. Schlechtweg and H. Wagener. Interactive Medical Illustrations. In Computational Visualization: Graphics, Abstraction, and Interactivity [S+98], pages 295-311.

[WAF+93] W. Wahlster, E. André, W. Finkler, H.-J. Profitlich, and T. Rist. Plan-based Integration of Natural Language and Graphics Generation. Artificial Intelligence, 63:387-427, 1993.