Contextualized Text Explanations for Visualizations

Wallace Chigona

Thomas Strothotte

Department of Simulation and Graphics, University of Magdeburg, Universitätsplatz 2, D-39106 Magdeburg, Germany

Department of Simulation and Graphics, University of Magdeburg, Universitätsplatz 2, D-39106 Magdeburg, Germany

[email protected]

[email protected]

ABSTRACT

According to the multimedia design principle of spatial contiguity, presenting text explanations for visualizations within the image space improves users’ ability to make referential links between the text and its corresponding objects. In this paper we introduce the concept of Dual-Use of Image Space (DUIS) and show how it presents text explanations for visualizations within the image space without obstructing the image. In DUIS the pixels are used both as shading information for the objects and as text which can be read. DUIS has two further benefits: (1) it provides a mechanism that helps users compare information for different objects with minimum cognitive burden, and (2) by using the same space for both the image and the text, it uses the screen real estate efficiently.

Keywords Dual-Use of Image Space, Text explanations, Spatial contiguity, Smooth transition, Hypertext navigation, Image maps

1. INTRODUCTION

The importance of providing text explanations for images has long been noted. For example, despite the saying that “a picture is worth a thousand words”, the art historian Gombrich [9] points out that “no picture tells its own story”. Weidenmann [20], an educational psychologist examining the use of images in learning, observes that text explanations enhance the interpretation of images. However, based on the multimedia principle of spatial contiguity, we know that the effectiveness of a text explanation depends not only on its content but also on its location relative to the corresponding image [11]. The principle states that learning is improved when the text and its corresponding image(s) are presented near each other rather than far from each other. As Mayer [11] points out, the theoretical bases for this improvement are the following:

1. The readers do not waste cognitive resources saccading between the image and the text.
2. Since the readers can hold both the text and the image in working memory simultaneously, they can easily make referential links between the two.

In the area of interactive systems, current research on presenting text close to the corresponding images is taking the following directions:

1. Using popup windows. The limitation of this approach is that the window may obstruct the image.
2. Placing the text as labels near the corresponding graphical objects. This is limited to short text.

We propose a method which is in line with the design principle of spatial contiguity and at the same time overcomes the limitations of the methods described above. Our method, called Dual-Use of Image Space (DUIS), presents text explanations on the corresponding objects themselves. In other words, the text explanations are contextualized in their corresponding objects. The benefits of contextualized annotations have already been noted within the text realm [22]. In DUIS the pixels in the image space are used both for shading and as text which can be read.

The fundamental interface issue in this concept is: given an image, smoothly integrate text associated with individual objects which the user has selected. The text should be integrated into the image with smooth transitions in real time. The amount of text should be variable.

The rest of the paper is organized as follows. Section 2 discusses related work. In Section 3 we present the concept of Dual-Use of Image Space and explain how it is used to present contextualized text explanations for visualizations. The section also highlights other benefits of the concept, namely (1) its ability to let users compare information for different objects with minimum cognitive burden and (2) its efficiency in using the screen real estate. Section 4 discusses the techniques for presenting text in the image space. Section 5 is a short discussion of the application of DUIS in hypertext navigation. Finally, we present concluding remarks and future research directions in Section 6.

2. RELATED WORK

Fluid documents

Fluid documents [6, 22] is a text annotation technique which uses lightweight interactive animation to incorporate annotations in their context. Here text flows smoothly on the screen in response to user manipulations, particularly with regard to following hyperlinks. For example, mousing over an anchor causes text about the destination document to expand from the mark. At the same time, the source lines next to the mark are pushed down to make room for the additional text. These changes are animated, thereby helping the user to comprehend them. The annotation (the introduced text) may also contain links. This provides a mechanism for offering multiple links from a single link in the source document which, as Zellweger et al. [22] argue, reduces wasteful intermediate pages. Fluid documents demonstrates the benefits of providing annotations within their context. This concept, however, cannot be extended to images in any obvious way: text can be made to flow around images, but cannot be integrated within them.

Popup windows

Popup windows, based on the balloon help system introduced on the Macintosh, provide text information about an image or a GUI component when the mouse moves over it. An example of such systems is link titles, available in most recent web browsers [13]. Although popup windows provide the textual information close to the anchor, they have the limitation that the window may obstruct the source image [21, 22]. This problem may be addressed by the use of see-through widgets [4]. Another limitation of popup windows is that they cannot present information for more than one object simultaneously. This makes comparison of information about different objects cognitively demanding. For instance, to make a comparison, the reader uses cognitive resources to store and recall information which he/she saw previously. As we will show in Section 3.1, DUIS addresses this problem.

Static labeling

Here text explanations are displayed within the image space as labels next to their corresponding objects. This approach is commonly referred to as label placement in the area of cartography, where text is integrated in a map to show names of features such as places, rivers and buildings. Static labels are also common in the area of technical illustration, where they are used to enhance the reader’s understanding of the depicted objects. The main advantage of this approach is that, since the text is provided close to its corresponding object, it is easier for readers to associate them. However, the text may obscure parts of the image and as such the approach is well suited only for short text [15].

Zoom illustrator

Zoom illustrator, designed by Preim et al. [16], is a dynamic method of labeling images by placing text in the margin. Each object and its corresponding text are joined by a line. Interesting interaction issues result when the image is manipulated. The system also empowers the user to manipulate the text to precipitate corresponding changes to the image. However, this approach is limited in the following ways:

1. Since the text and the image are presented at different locations, the reader must saccade between the two.
2. Due to space limitations, this method is suited only for short texts.

Text illustrator

Text illustrator, proposed by Schlechtweg and Strothotte [18], deals with the illustration of long texts. In this system, images and texts are presented in separate windows; manipulating the contents of one window leads to changes in the contents of the other window. For example, when an object in the image is selected, the text scrolls so that a portion dealing with the selected object is visible. Conversely, when the text is scrolled, the graphics are also manipulated smoothly to reflect the contents of the currently visible portion of text. However, unlike DUIS, Text illustrator does not achieve integration of the two media: the text and graphics are presented in separate windows.

Artistic screening

This approach, proposed by Ostromoukhov and Hersch [14], uses a special half-toning method, called screening, to introduce text as artifacts into renditions. An image is subdivided into blocks, each of which is filled by a character. As with any half-toning method, the system aims to reproduce the tone of the original image. Here this is done by varying the weight (thickness of stroke) of the characters. For instance, narrow-stroked characters are used to represent light shading while thick-stroked characters are used to represent dark shading. See the example presented in Figure 1. The approach taken by Artistic screening differs from ours in the following ways:

1. Artistic screening is not an interactive system.
2. It treats the image as a whole and not as the individual objects which make up the image. As such it is not possible to get information about individual objects.
3. It does not address issues of legibility of the text.

3. DUAL-USE OF IMAGE SPACE

In DUIS the textual information related to the object is displayed in the image space and at the same time is used to simulate the shading of the object. In other words, the pixels in the image space represent both text which can be read and, at the same time, shading information in the images.

One of the major goals of DUIS is to achieve a smooth transition between the representation of an object as an image and its representation as text. To explain the concept, let us assume a scenario in which the user is presented with an image and wishes to obtain textual information about individual objects being displayed. First, the user selects an object about which he/she wishes to obtain textual information. This is done by pointing and clicking. In previous systems, the text would now be introduced into the image, either as a label beside the object or as a floating window next to the mouse pointer. Alternatively, a document with text would be opened, either in a separate window or in the same window, replacing the current image.

Instead, DUIS is based on the concept of the legibility of the object. The underlying principle is that every image is dithered with text, except that without any manipulation the characters are too small to be recognized. After selecting an object, the user adjusts the legibility. This means that over time (we have found that about a second is sufficient) the characters comprising the text are enlarged, starting from a predefined minimum size (say one pixel) up to the point where the characters are large enough to be recognized by human readers (say 8 point size). The process of turning up the legibility of the object is also referred to as extracting the text from the object.

From a technical point of view, turning up the legibility means that the dither matrices used to render the selected object are enlarged quickly while the size and shading of the object are kept constant. As the dither matrix size increases, the characters become more and more recognizable.

Figure 2 illustrates the concept of the legibility of an object. The image in this example is a part of the map of Germany and the selected object is the state of Saxony-Anhalt. The first image (from the top) shows a close-up rendition of the map. When the user turns up the legibility of the selected object, the text is introduced in an animation until the characters are fully recognizable, as shown in the last image in Figure 2. The second and third images are some of the frames in the animation sequence. The text which appears in this example is tourism information about the selected state. Note how the pixels representing the object are used in two ways: on the one hand, they display the object itself; on the other hand, they display the text about the object. This dual use of the display space, shading the object and at the same time displaying a text about the object, is a new concept for intimately linking an image and the text associated with it.

Figure 1: An example from Artistic screening: The shading of the image is simulated by varying the weight of the characters. Image reproduced with permission from authors of [14].

3.1 Advantages of DUIS

Ease of comparison of information

One of the goals of visualization is to enhance comparison of information. For example, given a geographical map as in the previous example, the user may want to compare the information about different regions. DUIS is designed to enable users to make such comparisons with minimum demands on their cognitive resources. As shown in Figure 3, DUIS can display information about multiple objects simultaneously. This places information for multiple objects at the user’s disposal at the same time, so that he/she can easily make the comparison. This is a clear contrast to other approaches, popup windows for example, where it is possible to view information about only one object at a time.

Efficiency in use of screen real estate

Another advantage of DUIS is its efficiency in using the screen real estate. Here the image space is also used to display the text. For this reason this approach would be appropriate for small-screen devices such as electronic books.

4. PRESENTING TEXT IN THE IMAGE SPACE

4.1 Shading

As is the case in Artistic screening (see Section 2), in DUIS the intensity of the object is simulated by varying the weight of the font glyphs. Narrow strokes are used for light shading and thick strokes for dark shading, and the in-between levels of shading are achieved through interpolation. However, unlike Artistic screening, where each glyph is designed individually, DUIS uses existing fonts. The use of existing fonts is advantageous since it offers flexibility in the choice of the look of the text. It also allows DUIS to take advantage of the features of existing font rendering libraries, thereby ensuring good-quality characters. For our purposes we require a font in which the weight can vary continuously. Standard fonts do not qualify since they offer only discrete weight options (regular and bold). For this reason we employ Multiple Master Fonts [1] with variable weight and width axes.
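As a rough illustration of the interpolation mentioned above, the following sketch maps an object's intensity to a value on a continuous weight axis. The function name and the axis range (200 to 900) are assumptions made for this example; they are not taken from the DUIS implementation or from any particular Multiple Master font.

```python
def weight_for_intensity(intensity, min_weight=200.0, max_weight=900.0):
    """Map an intensity in [0, 1] to a position on a font's weight axis.

    0.0 stands for the lightest shading (thinnest strokes) and 1.0 for the
    darkest shading (thickest strokes). The axis limits are hypothetical.
    """
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must lie in [0, 1]")
    # Linear interpolation between the two ends of the weight axis.
    return min_weight + intensity * (max_weight - min_weight)
```

With these assumed limits, a mid-grey object (intensity 0.5) would be set at a weight halfway along the axis.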

Figure 3: Text explanation for more than one object can be displayed simultaneously. This makes comparison of the information easier than in existing methods.

Figure 4: A font glyph at different levels of intensity for different font sizes.

Figure 4 illustrates how the variations in the weight are used to simulate the intensity. Each column represents a font size; the bottommost glyph has the minimum weight (representing the lightest shading) and the weight increases upwards. Each glyph is rendered (centered) in a box whose size is equal to the bounding box of that glyph at the maximum weight (in the example, the topmost glyph). This means that for any size a glyph with the maximum weight value fills the entire box, while the minimum weight glyph occupies the least space in the box. To avoid dealing with the ascenders and descenders of the glyphs, the current implementation does not use lowercase characters.

Figure 2: Turning up the legibility of an object. Text is introduced into the selected object in an animation; the images shown in this figure are some of the frames in the animation sequence.

Rendering of characters is computationally intensive and therefore conflicts with the speed required for smooth transitions. DUIS addresses this problem by precomputing the characters.

4.2 Text Layout

There are several algorithms for fitting a paragraph of text to a given shape. Although the shapes are usually rectangular, most algorithms can also fit text to other kinds of shapes. One such algorithm works by applying and evaluating penalty values and rules, as proposed by Knuth [10] and used in the TeX system. One of the goals of text layout algorithms is to avoid unreasonable word breaks at the end of a line, since these make reading difficult [12]. In addition to this goal, the text layout algorithm for DUIS aims at preserving the average intensity of the object. This is necessary so that the objects remain distinguishable and recognizable even after text extraction.

Our line-breaking algorithm is based on the algorithm proposed by Knuth [10]. The major modification concerns how the algorithm fills up a line when the text does not fit perfectly. The original algorithm manipulates the spaces between the characters and between the words. This, however, is not suitable for DUIS since it alters the intensity of the object. Instead, we manipulate the width of the font face on the line: when the text is too long or too short to fit on a line, the width of the font face is reduced or increased respectively.
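The width adjustment described above can be sketched as follows. This is a simplified illustration, not the DUIS algorithm: it uses greedy line breaking as a stand-in for Knuth's penalty-based method, and it assumes that every character has the same advance at the nominal width. The function names and parameters are hypothetical.

```python
def break_lines(words, chars_per_line):
    """Greedy line breaking (a simple stand-in for Knuth's algorithm)."""
    lines, current = [], ""
    for word in words:
        candidate = word if not current else current + " " + word
        if current and len(candidate) > chars_per_line:
            lines.append(current)   # the line is full, start a new one
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


def width_scale(line, line_width, char_advance):
    """Factor applied to the font's width axis so the line fills line_width.

    Values above 1 widen the glyphs (line too short), values below 1 narrow
    them (line too long). Inter-word and inter-character spacing is untouched.
    """
    if not line:
        raise ValueError("cannot scale an empty line")
    natural_width = len(line) * char_advance
    return line_width / natural_width
```

For example, a line of 8 characters with an assumed advance of 10 units has a natural width of 80 units, so filling a 100-unit line would require widening the face by a factor of 1.25.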

4.3 Object Size

In most cases the size of the object and the amount of text may not match exactly; there is either too much or too little space for the text to be displayed.

Too Much Text

Here the reader may page through the text. As can be observed in Figure 3, arrows graphically indicate the presence of extra text before or after the currently displayed page; this text may be accessed by clicking on the respective arrow.

The text format is also used to address this problem. If the text is formatted with headings (according to HTML format), the text which comes after a heading and before the next heading is considered to belong to the section of that heading. Users may fold a section, meaning that only the heading is displayed; conversely, a folded section may be unfolded. As shown in the example in Figure 5, the presence of a heading is indicated graphically by square brackets enclosing either a plus or a minus sign. A minus sign stands for an unfolded (open) section, in other words it prompts for folding; a plus sign stands for a folded section. In the example the “materials” section has been folded while the “cost” section is unfolded. During folding, the text belonging to the section disappears into the square brackets in an animation, and during unfolding the text grows out of them. Just as in Fluid documents (see Section 2), the text of the subsequent sections moves up or down respectively.

Figure 5: Folding/unfolding: A square bracket enclosing a minus or plus sign is a graphical indication of the presence of a section.

A special case arises when the object itself is too small to contain a reasonable amount of text. Here the term reasonable means a specified minimum amount of text, usually the name of the object, which must be displayed. In such cases the object is scaled up to a size where at least the minimum amount of text fits. At the same time the other objects in the image are squeezed to create room for the scaled-up object. A more detailed discussion of image distortion is presented in Section 4.4.
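The folding and unfolding of sections described in this subsection can be modelled with a very small data structure. The sketch below is an illustration only: the class and function names are hypothetical, the section bodies are invented placeholder text, and the animation is left out.

```python
from dataclasses import dataclass


@dataclass
class Section:
    heading: str
    body: str
    folded: bool = False


def toggle(section):
    """Fold an unfolded section, or unfold a folded one."""
    section.folded = not section.folded


def render(sections):
    """Return the visible text lines; folded sections show only the heading."""
    lines = []
    for section in sections:
        # "[+]" marks a folded section, "[-]" an unfolded (open) one.
        marker = "[+]" if section.folded else "[-]"
        lines.append(f"{marker} {section.heading}")
        if not section.folded:
            lines.append(section.body)
    return lines
```

Because folded sections contribute only their heading, the text of later sections moves up when a section is folded and down again when it is unfolded.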

Too Much Space

This problem is not as critical as the previous one. If the text were displayed normally, that is, from top to bottom, the lower part of the object would remain unfilled and the average intensity of the object would therefore differ from that of the original image. DUIS offers two solutions to this problem:

1. The text is centered in the object and the remaining space is shaded using normal shading.
2. The text is repeated over and over until the entire object is filled up.

4.4 Object Shape

Reading from the object surfaces may be difficult because of the shape: on the one hand, readers are used to reading from rectangular regions (windows); on the other hand, the silhouette of a graphical object is usually irregular. As a solution to this problem, DUIS has an option where, on the user’s request, the shape of the object morphs into a rectangular window, as shown in the rightmost image in Figure 6. This process is called rectangularization.

In most instances the rectangle has the size of the rectangular bounding box of the object. This, however, may not always yield pleasant results. We consider, among other things, the length of a line of text: below a certain threshold the length of the line has negative effects on legibility [8]. For this reason, rectangles with widths below the threshold are not acceptable. On the other hand, bounding boxes of long objects, for example a river running across a country, may take up a lot of space. These problems are addressed by specifying minimum and maximum sizes for rectangles (users may adjust these values). The rectangle has the size of the bounding box only if the bounding box falls within these bounds. When the bounding box of an object is below or above the bounds, the size of the rectangle equals the minimum or maximum rectangle respectively.
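The sizing rule for the rectangle can be sketched as a simple clamp. The version below treats width and height independently, which is an assumption of this example (the text does not say how mixed cases are handled); the function name and the sample limits are likewise hypothetical.

```python
def rectangle_size(bbox_width, bbox_height, min_size, max_size):
    """Clamp a bounding box to user-adjustable minimum and maximum sizes.

    min_size and max_size are (width, height) pairs. Each dimension is
    clamped on its own, which is an assumption made for this sketch.
    """
    min_w, min_h = min_size
    max_w, max_h = max_size
    width = min(max(bbox_width, min_w), max_w)
    height = min(max(bbox_height, min_h), max_h)
    return width, height
```

A tall, narrow object such as a river segment with a 50 by 300 bounding box would, under assumed limits of (120, 80) and (400, 250), be given a 120 by 250 rectangle.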

After deciding on the size of the rectangle, the algorithm creates a rectangular polygon with the same number of vertices as the original polygon, i.e. each vertex on the original polygon has a corresponding vertex on the rectangular polygon. Morphing between the two polygons is achieved by linearly interpolating between their corresponding vertices.

In order to maintain the context of the rectangularized object, the other objects in the image should remain visible. Here, the other objects in the image are displaced, that is, their shapes are distorted to create room for the rectangle (see Figure 6). The displacement algorithm is adapted from the displacement algorithms proposed by Carpendale et al. [5]. We call this form of rectangularization contextual rectangularization.

Figure 6: This figure shows some of the frames during rectangularization. In contextual rectangularization, the rest of the objects in the scene are also distorted to create room for the rectangularized object.

Overlay Rectangularization

Our informal observations indicate that in some cases users do not like distortion in their images, especially when the distortions are large. For this reason DUIS has an option where the rectangle is laid over the image and the objects in the image remain undistorted. This kind of rectangularization is called overlay rectangularization. The following options are available to keep overlay rectangles from obstructing the underlying image:

1. The rectangle may be made transparent, as shown in Figure 8.
2. The rectangle may float, meaning that users can drag it to a position of their choice, as shown in Figure 9. A pair of lines connects the rectangle and its original object to help the user visually associate the two.

Figure 7: Multiple rectangularization. Clipping is used to avoid rectangles intersecting each other.
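The vertex-wise morph used in rectangularization, linear interpolation between corresponding vertices, can be sketched as follows. The function name is hypothetical, and the sketch assumes that the correspondence between the original polygon and the rectangular polygon has already been established, since the text does not specify how the vertices are distributed along the rectangle.

```python
def morph(source, target, t):
    """Interpolate linearly between two polygons with corresponding vertices.

    source and target are lists of (x, y) pairs of equal length; t runs from
    0.0 (original shape) to 1.0 (rectangular polygon).
    """
    if len(source) != len(target):
        raise ValueError("polygons must have the same number of vertices")
    return [
        ((1 - t) * sx + t * tx, (1 - t) * sy + t * ty)
        for (sx, sy), (tx, ty) in zip(source, target)
    ]
```

Calling `morph` with a sequence of increasing `t` values yields the intermediate frames of the animation.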

Multiple Rectangularization

As shown in Figure 7, an image may contain more than one rectangularized object. This, as pointed out earlier, minimizes the cognitive resources needed to compare information about different objects. Rectangles which would intersect each other deserve special consideration, since an intersection would obstruct some of the text. To avoid intersecting rectangles, the new rectangle is tested against all the existing rectangles to identify intersections which, if found, are clipped from the new rectangle. Due to this operation, the newly created “rectangle” may have more than four sides. However, as shown in the example in Figure 7, all angles still remain right angles.
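One way to realize this clipping is sketched below. It is an assumption of this example, not a description of the DUIS implementation, that rectangles are axis-aligned tuples `(x0, y0, x1, y1)` and that the clipped region is represented as a list of smaller rectangles whose union is the right-angled shape.

```python
def subtract(rect, other):
    """Return axis-aligned pieces covering rect minus its overlap with other."""
    x0, y0, x1, y1 = rect
    ox0, oy0, ox1, oy1 = other
    # Intersection of the two rectangles.
    ix0, iy0 = max(x0, ox0), max(y0, oy0)
    ix1, iy1 = min(x1, ox1), min(y1, oy1)
    if ix0 >= ix1 or iy0 >= iy1:
        return [rect]                      # no overlap, nothing to clip
    pieces = []
    if y0 < iy0:
        pieces.append((x0, y0, x1, iy0))   # full-width band before the overlap
    if iy1 < y1:
        pieces.append((x0, iy1, x1, y1))   # full-width band after the overlap
    if x0 < ix0:
        pieces.append((x0, iy0, ix0, iy1)) # strip to the left of the overlap
    if ix1 < x1:
        pieces.append((ix1, iy0, x1, iy1)) # strip to the right of the overlap
    return pieces


def clip_new_rectangle(new_rect, existing):
    """Clip new_rect against every existing rectangle."""
    pieces = [new_rect]
    for other in existing:
        pieces = [p for piece in pieces for p in subtract(piece, other)]
    return pieces
```

Since every piece is itself an axis-aligned rectangle, the outline of their union contains only right angles, in line with the observation about Figure 7.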

Figure 8: The rectangularized object can be transparent so that the underlying image remains visible.

Figure 9: The rectangularized object can be moved to a position of the readers’ choice.

4.5 Smooth Transition

As illustrated in Figure 2, the extraction of text from the objects is realized using lightweight animations. This also holds for most of the changes in the scene, for example the distortions of the objects in the image (see Figure 6). Smooth transitions are necessary because, as Bederson and Boltman [2] as well as Robertson et al. [17] note, readers find abrupt transitions, such as instantly replacing an object visualized on the screen with another one, confusing, distracting and irritating. Smooth transitions spread over a short period of time instead give the user a chance to comprehend and appreciate the operation and its implications. On the other hand, users find animations which take “too” long disruptive since they get in the way of their (the users’) task [2]. For this reason, the animations in DUIS take at most one second.

In the case of rectangularization, the smooth transition helps the readers to maintain object constancy despite the change in shape. Here we draw our lessons from research on cartograms. In cartograms the areas of regions on a map are drawn proportional to the value being visualized, say population. According to the findings of Ware [19], when presented with static cartograms, readers have problems recognizing the regions. However, the problem is greatly reduced when the readers observe the animation of the maps being deformed.
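To make the timing constraint concrete, the following sketch generates the character sizes for the frames of a legibility animation. It uses the example values from Section 3 (a minimum size of 1 and a legible size of 8) and caps the duration at one second; the linear easing, the frame rate and the function name are assumptions of this example.

```python
def legibility_frames(min_size=1.0, max_size=8.0, duration=1.0, fps=30):
    """Return the character size for each frame of the animation.

    The duration is capped at one second, the upper bound stated for DUIS
    animations. Sizes grow linearly, which is an assumed easing.
    """
    duration = min(duration, 1.0)
    steps = max(1, round(duration * fps))
    return [min_size + (max_size - min_size) * i / steps
            for i in range(steps + 1)]
```

The first frame shows the characters at the minimum size and the last frame at the legible size; the same schedule could drive the interpolation parameter of other transitions in the scene.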

4.6 Further Considerations

Tube-like Objects

The current implementation fills the object with text from left to right, top to bottom. However, this approach does not yield pleasant results when dealing with tube-like (pseudo 1D) objects such as rivers and blood vessels in anatomy illustrations. We are currently working on algorithms to present text along the shape of tube-like objects, and instead of rectangularizing them we are experimenting with inflating them, that is, making the object wider.

Multiple Columns

We are also currently experimenting with presenting text in multiple columns in the objects, as shown in the illustration in Figure 10. This would be necessary in objects whose shapes are complex (non-convex) and especially those that contain holes. In the current implementation, when the polygon has a “dent”, as in the top part of Figure 10, or a hole, the text line from the left side continues after the interruption. This makes reading difficult since, after jumping the interruption, the reader must find the position where the text line continues. Likewise, after the end of a line it may not be easy to find the beginning of the subsequent line. We are of the opinion that introducing multiple columns would address this shortfall.

Figure 10: Multiple columns may improve the legibility of text.

5. HYPERTEXT NAVIGATION

One task hypertext readers undertake is to decide whether or not to follow a link. Conklin [7] refers to this task as a cognitive overhead of hypertext navigation since it does not relate to the readers’ main task of information gathering. A number of researchers have argued that this overhead can be reduced by providing preview information about the destination of the links, since this informs the reader of the relevance of the information in the destination document(s) [3, 13]. Here, as in text explanations for visualizations, the location of the preview information is of usability significance [22]; that is, the nearer the preview information is to the anchor, the more effective it is.

Using the techniques described in this paper, textual preview information for image map links can be provided within the image map itself. The benefit of ease of comparison among different objects, noted in the visualization application, also applies in this application area. As shown in the example in Figure 11, the text itself may also contain links (here, as in most hypertext systems, underlining is used as a graphical indication of the presence of a link). This makes it possible to introduce, for the first time, multiple links for image maps which, as Zellweger et al. [22] note, reduces wasteful intermediate pages.

Figure 11: The text may contain links. Underlining is used as a graphical indication of the presence of a link.

6. CONCLUSION AND FUTURE WORK

In this paper we have introduced the concept of Dual-Use of Image Space and we have shown how it is used to provide contextualized text explanations for visualizations. We have also demonstrated the application of the concept in hypertext navigation. Besides providing contextualized text for visualizations, DUIS has other usability benefits. (1) Since it is possible in DUIS to simultaneously present text explanations for multiple objects, readers can compare information with minimum cognitive burden. (2) By using the same space for both the text and the image, DUIS uses the screen real estate efficiently and as such it is suitable for small-screen devices.

Until now we have not investigated the usability of DUIS in systematic studies. The development of the system has so far been guided by the available literature on usability as well as by our informal observations. However, we are of the opinion that it is necessary to carry out a study to assess the correctness of our assumptions. A formal usability study of the system is planned for the near future.

7. REFERENCES

[1] Designing multiple master typefaces. Tech. rep., Adobe Solutions Development, 1997.
[2] Bederson, B., and Boltman, A. Does animation help users build mental maps of spatial information. In Proc. of the IEEE Symposium on Information Visualization (1999), pp. 28–35.
[3] Bernstein, M. More than legible: On links that readers don’t want to follow. In Proc. ACM Hypertext’00 (2000), pp. 216–217.
[4] Bier, E., Stone, M., Pier, P., Buxton, W., and DeRose, T. Toolglass and magic lenses: The see-through interface. In Proceedings of Conference on Computer Graphics (1993), pp. 73–80.
[5] Carpendale, M., Cowperthwaite, D., and Fracchia, F. Extending distortion viewing from 2d to 3d. IEEE Graphics and Applications 17 (1997), 42–51.
[6] Chang, B., Mackinlay, J., and Zellweger, P. Fluidly revealing information in fluid documents. In Proceedings of the Smart Graphics 2000 Spring Symposium (2000), AAAI Press, pp. 178–181.
[7] Conklin, J. Hypertext: An introduction and survey. Communications of ACM 20, 9 (September 1987), 17–41.
[8] Dyson, M., and Kipping, G. The effects of line length and method of movement on patterns of reading from screen. Visual Language 32, 2 (1998), 150–181.
[9] Gombrich, E. The Sense of Order - A Study in the Psychology of Decorative Art. 2nd ed. Phaidon Press, London, 1984.
[10] Knuth, D. Digital Typography. CSLI Publications, Stanford, CA, 1999.
[11] Mayer, R. Multimedia Learning. Cambridge University Press, Cambridge, UK, 2001.
[12] Nas, G. The effect on reading speed of word divisions at the end of a line. In Human Computer Interaction: Psychonomic aspects, G. van der Veer and G. Mulder, Eds. Springer-Verlag, Berlin, 1988.
[13] Nielsen, J. Using Link Titles to Help Users Predict Where they are going. Jakob Nielsen’s Alertbox, January, 1998, http://www.useit.com/alertbox/980111.html.
[14] Ostromoukhov, V., and Hersch, R. Artistic screening. In Proceedings of Conference on Computer Graphics (1995), pp. 219–228.
[15] Petzold, I., Plümer, L., and Heber, M. Label placement for dynamically generated screen maps. In Proceedings of the Ottawa ICA (1999), pp. 893–903.
[16] Preim, B., Raab, A., and Strothotte, T. Coherent zooming of illustrations with 3d-graphics and text. In Proceedings Graphics Interface’97 (May 1997), Canadian Computer-Human Communications Society, pp. 105–113.
[17] Robertson, G., Card, S., and Mackinlay, J. Information visualization using 3d interactive animation. Communications of the ACM 36, 4 (April 1993), 16–19.
[18] Schlechtweg, S., and Strothotte, T. Generating scientific illustrations in electronic books. In Proceedings of the Smart Graphics 2000 Spring Symposium (2000), AAAI Press, pp. 8–15.
[19] Ware, J. Using animation to improve the communicative aspect of cartograms. Master’s thesis, Michigan State University, Department of Geography, 1998.
[20] Weidenmann, B. Informative bilder (was sie können, wie sie didaktisch nutzt und wie man sie nicht verwenden sollte). Pädagogik (1989), 30–34.
[21] Weinreich, H., and Lamersdorf, W. Concepts for improved visualization of web link attributes. In Proceedings of the 9th International World Wide Web (2000), pp. 403–416.
[22] Zellweger, P., Chang, B., and Mackinlay, J. Fluid links for informed and incremental link transitions. In Proc. ACM Hypertext’98 (1998), pp. 50–57.
