Annotation of Animated 3D Objects

Timo Götzelmann, Knut Hartmann, and Thomas Strothotte
Otto-von-Guericke University of Magdeburg
{timo,knut,tstr}@isg.cs.uni-magdeburg.de

Abstract

This paper presents a novel approach to illustrating dynamic procedures in tutoring material by analyzing the corresponding animations. We propose two strategies: (i) to enhance animations with secondary elements (e.g., textual annotations and arrows) and (ii) to generate visual summaries, i.e., illustrations where secondary elements indicate the direction and extent of the objects moving in an animation. We propose metrics aiming at an unambiguous and frame-coherent layout in animations. Moreover, we integrated real-time algorithms for laying out secondary elements in animations into an interactive 3D browser. In order to test the impact of various conflicting functional and aesthetic layout requirements, our system contains several algorithms and strategies, which are compared in a user study. Finally, this paper presents results of applying our approach to enhance scientific illustrations and technical documentation.

1 Introduction

Learning materials (e.g., text books or visual dictionaries) incorporate many illustrations in order to convey visual attributes of objects (e.g., shape, appearance) and spatial relations between components in complex configurations. These visual entities are primary elements in the layout of illustrations [Tuf97]. Moreover, textual annotations (labels) and other secondary elements [Tuf97] (i) establish co-referential relations between visual and textual elements or (ii) encode directions of moving components and user actions in assembly or maintenance processes (cf. the arrows in Fig. 1). Textual annotations either directly overlay the referred object (internal labels) or are placed on the exterior of the geometric model (external labels), where reference lines establish the link between the label's text and the referred object. These 2D illustrations present 3D objects from a fixed point of view. For complex spatial configurations or complex-shaped objects, an interactive exploration of 3D models facilitates comprehension and learning (see [PPS99]). Therefore, several researchers developed real-time layout algorithms that smoothly integrate secondary elements into interactive 3D visualizations. Some of these real-time layout algorithms already consider coherency aspects, i.e., they aim to minimize the flow of layout elements between subsequent frames during user interactions.

Instructive material that aims at explaining processes or actions often comprises animations and videos. Thus, we developed strategies to incorporate secondary elements into computer animations. Many dynamic processes can be explained by a cyclic sequence of slightly different 3D models (e.g., the pumping of a heart or the work cycle of a combustion motor). Our new real-time layout algorithm analyzes such sequences of 3D models considering the current viewing direction. In contrast to interactive 3D visualizations, which are characterized by unpredictable user interactions, the flow of primary elements in an animation can be analyzed in advance in order to minimize the flow of secondary elements. Therefore, we determine calm and fluctuating regions and store the results in a special G-buffer. Subsequently, secondary layout elements are placed in calm regions considering the requirements of a functional and aesthetic layout. Inspired by a common illustration technique to visualize movements and actions in many textbooks (see Fig. 1), we analyze the objects' movements and insert new secondary elements (arrows) in the direction of those movements if there is not sufficient space to accommodate labels in a calm region. Finally, these animation paths are labeled.

Figure 1: Illustration of a working combustion motor.

This paper is organized as follows: Sec. 2 discusses related work on annotating sequences of images and points out that our problem has not been addressed yet. Sec. 3 first defines criteria for the annotation of animated 3D objects, explains why former approaches cannot satisfy them, and describes our approach. The results of our work are discussed in Sec. 4, followed by a short user study in Sec. 5. Finally, Sec. 6 concludes this paper and gives some ideas about future work.

2 Related Work

Interactive 3D Exploration. Automatic labeling algorithms have been used to integrate secondary elements with respect to visual constraints in Virtual or Augmented Reality environments (e.g., [BFH05, AF03]). Various real-time layout algorithms for secondary elements have been proposed for interactive 3D visualizations (e.g., for virtual 3D city models [MD06], anatomic tutoring systems [PRS97], general surface models [AHS05, GHS06], or volumetric data [BG05]). In all these applications secondary elements support specific tasks. Therefore, label layouts are often evaluated with respect to conflicting functional and aesthetic criteria (cf. [HGAS05]) in order to determine an optimal label layout with different heuristics (cf. [CMS95]). But only a few of those algorithms also aim at frame-coherent transitions of layout elements during user interactions. Moreover, all current algorithms consider only movements between subsequent frames and are therefore not able to stabilize annotations throughout pre-defined animations.

Video Annotation. In order to annotate video streams, the flow of visual elements is often analyzed with image processing techniques. These algorithms either determine trajectories of moving elements or evaluate the potential of regions to accommodate annotations. (i) Trajectories: Yan et al. [YKCK06] developed several tracking algorithms to detect a moving ball in tennis matches in order to display trajectories on selected (key) frames. Goldman et al. [GCSS06] analyzed trajectories of moving objects to automatically generate storyboards from video streams. (ii) Evaluation of regions: Thanedar and Höllerer [TH04] employ a uniform grid to determine so-called calm and dormant regions. Rosten et al. [RRD05] used the feature density to determine positions and regions of low visual interest in an augmented reality scenario. The segmentation of all frames of a video stream into foreground and background, and the evaluation of regions in order to find good placements for annotations, is the major challenge in all these approaches. As video analysis techniques cannot exploit semantic information, these applications cannot impose any constraints on the layout of the annotations. Moreover, none of the approaches mentioned above considers coherency aspects. Our system, however, renders animations from user-selected points of view. Therefore, we can exploit color-coded renditions into invisible buffers to ease the analysis, exploit co-referential relations between primary and secondary elements, consider layout guidelines, and minimize the flow of secondary elements throughout an animation.

3 Approach

The approach presented in this paper is restricted to animations consisting of cyclic sequences of 3D models that are rendered from the user's current viewing direction. Our layout algorithm considers the exact position and shape of visual objects on the projection and does not rely on any shape simplification with bounding objects. In order to select an appropriate annotation strategy and a good placement for all secondary layout elements, all frames of the animation are analyzed. In order to facilitate comprehension and learning, we adopt and extend guidelines for a functional and aesthetic placement of layout elements [HGAS05] to annotate 3D animations. An unambiguous label layout requires that anchor points overlay their visual reference objects. Moreover, mutual overlaps between layout elements and occlusions of visual objects by secondary layout elements should be minimized. Finally, we aim at minimizing the visual flow of layout elements. This requirement is enforced for some elements: in order to guarantee the readability of texts presented within labels and to facilitate learning (mental maps), textual annotations are not allowed to move. As these mutually conflicting requirements have to be considered in an interactive application where learners can navigate and explore 3D scenes, we employ a hierarchical approach and several heuristics to determine an appropriate layout: The layout algorithm first determines candidates for an unambiguous placement of anchor points and internal labels for all frames of an animation. Then overlaps between external labels and visual objects as well as the flow of layout elements are analyzed. Finally, a rendering component inserts secondary layout elements into all frames.

Figure 2: The architecture of the layout system.

The architecture of our layout system for secondary elements in animations is presented in Fig. 2. It extends a real-time label layout system [GHS06] which is based on an analysis of invisible ID-buffers of the current view onto a 3D model. Instead of a single 3D model, a set of 3D models representing frames with coherently moving 3D objects serves as input. This frame sequence is repeated while the user explores the 3D model. User interactions define camera parameters and view transformations that are applied to all 3D models. The segmentation of the projections is based on color-coded renditions of the 3D scene. Therefore, an ID-buffer for each frame of the animation is created in the background. These ID-buffers are used to determine calm regions, i.e., positions on the view-plane that do not change during the animation. If an object that has to be annotated does not offer calm regions to accommodate the label, trajectories of the object's movements are determined. In the following subsections we first discuss problems that arise in applying conventional labeling techniques to animated 3D scenes (Sec. 3.1). The determination of calm regions in animated 3D scenes is described in Sec. 3.2. Our novel approach to determine additional secondary elements (arrows) that visualize trajectories of moving objects in animations is presented in Sec. 3.3.
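The per-object decision logic of this pipeline can be sketched in a few lines (an illustrative Python sketch with hypothetical names such as `layout_animation`; the actual system operates on GPU-rendered ID-buffers inside a C++/Coin3D viewer):

```python
def pixels(buf):
    """All (x, y) coordinates of a 2D ID-buffer given as a list of rows."""
    return [(x, y) for y in range(len(buf)) for x in range(len(buf[0]))]

def midpoint(buf, obj):
    """Average position of an object's pixels in one frame."""
    pts = [(x, y) for (x, y) in pixels(buf) if buf[y][x] == obj]
    return (sum(x for x, _ in pts) / len(pts),
            sum(y for _, y in pts) / len(pts))

def layout_animation(frames, objects_to_label):
    """frames: per-frame ID-buffers. Returns one placement decision per
    object, mirroring the two strategies described above."""
    placements = {}
    for obj in objects_to_label:
        # Positions covered by this object in *every* frame are calm.
        calm = [(x, y) for (x, y) in pixels(frames[0])
                if all(f[y][x] == obj for f in frames)]
        if calm:
            # Anchor a static label inside the calm region.
            placements[obj] = ('calm-label', calm)
        else:
            # No calm region: derive a trajectory arrow from the
            # per-frame mid-points of the object's projection.
            placements[obj] = ('trajectory-arrow',
                               [midpoint(f, obj) for f in frames])
    return placements
```

For example, an object that keeps its pixels over all frames yields a `'calm-label'` decision, while an object whose projection moves every frame yields a `'trajectory-arrow'` with its per-frame mid-points.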

3.1 Problems using conventional approaches

Our initial hypothesis was that coherency strategies for the annotation of interactive 3D visualizations can also be applied to annotate 3D animations. In order to motivate new functional requirements and heuristics to implement them, we briefly summarize our initial observations:

Moving Text. Conventional label layout techniques frequently follow Imhof's guidelines for placing names in maps [Imh75]. Imhof advises cartographers to place names for area and line features (internal labels in our terminology) on salient regions of their reference object. Thus, internal labels are commonly placed in the middle of or on the skeleton of visual objects. Hence, any object's movement and any occlusion on the projection induces a movement of labels, which massively decreases their readability (see Fig. 3).

Figure 3: Problems induced by moving secondary elements. (a) Moving internal text. (b) Fixed external text, moving reference line.

Moving Reference Line. In learning material the majority of annotated illustrations employ external labels. These labels are placed on the background and are aligned either with respect to the silhouette of the foreground object or with respect to dedicated annotation areas. Reference lines connect visual objects and textual descriptions. Any user interaction induces changes to the silhouette of the foreground object and to the spatial arrangement of the available background. Even though coherency strategies reduce the flow of layout elements, drastic changes in the global arrangement of layout elements or crossings between reference lines cannot be prevented completely. In contrast to the unpredictable effects of user interactions on the spatial arrangement of foreground elements on the projection, these changes can and should be considered when annotating pre-defined animations.

Our algorithm aims to achieve a coherent layout of secondary elements for all frames of an animation. Moreover, the layout has to respect functional requirements. External labels, for example, should not occlude the foreground during the entire animation. Our layout algorithm exploits spatial coherency within animations, i.e., it is based on the assumption that visual objects move only slightly on the view-plane. Therefore, there are areas which belong to a single object during the entire animation. We determine these calm regions that are not affected by the animation and try to place the labels there.

3.2 Determination of calm regions

Our system computes layouts for secondary elements based on the location, size, and shape of complex-shaped primary objects on the projection plane (i.e., in image space). Therefore, in a second pass we render invisible ID-buffers containing unique color codes for the individual visual objects. Regions where ID-values do not change throughout all frames of an animation can be considered calm and used for labeling (see Fig. 4). To evaluate the flow of foreground objects within an animation, we render ID-buffers for all frames using the current camera parameters. Subsequently, by applying a simple AND operation on those ID-buffers we create another G-buffer; zero values in that G-buffer indicate regions where labels should not be placed. We exploit calm regions to place internal and external labels. For objects that move extensively, calm regions can be very small or may not even exist at all. In those cases, we have to find another strategy to annotate these objects. In the next step we determine and visualize the movement of the objects.

Figure 4: Detection of calm areas. (a) Blue denotes the ID of the piston; green areas encode calm regions. (b) An object without calm regions.
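Once a calm mask exists, placing a label reduces to finding a rectangle of the label's size that lies entirely inside the mask. A naive scan suffices to illustrate the idea (`find_label_anchor` is a hypothetical helper; the actual system additionally weighs candidate positions against the functional and aesthetic layout criteria):

```python
def find_label_anchor(calm, label_w, label_h):
    """Return the top-left (x, y) of the first position where a
    label_w x label_h rectangle fits entirely inside the calm mask
    (a 2D list of booleans), or None if no such position exists."""
    h, w = len(calm), len(calm[0])
    for y in range(h - label_h + 1):
        for x in range(w - label_w + 1):
            if all(calm[y + dy][x + dx]
                   for dy in range(label_h) for dx in range(label_w)):
                return (x, y)
    return None
```

Returning `None` corresponds to the case discussed above: the calm region is too small (or absent), so the trajectory-based annotation strategy must be used instead.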

3.3 Analysis of the objects' movements

We simplify the analysis of the objects' movements in animations by abstracting the objects' locations to points. These points should be placed unambiguously, i.e., viewers should easily determine their visual reference object. Therefore, we determine the midpoint on salient regions of the referred object. We compared two algorithms to determine the central position of complex-shaped objects (shape approximation with bounding boxes, and distance transformations; see Tab. 1) with respect to their exactness, the possibility of outliers (proposed mid-points that are actually not contained in the objects' projections), their performance, and the coherency of their results within subsequent frames of an animation.

Method                   | Exact | Outliers | Performance | Coherency
-------------------------|-------|----------|-------------|----------
Distance transformation  | yes   | no       | slow        | jumps
Bounding box             | no    | possible | fast        | coherent

Table 1: Comparison of methods extracting the mid-point of complex-shaped objects.

Distance transformations. For interactive applications, the computation of distance transformations for all ID-buffers associated with the frames of an animation is too complex. Down-sampled ID-buffers ease this problem, but serious incoherences between the results of subsequent frames disqualify this technique for our application.
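The bounding-box variant with the outlier repair described below can be sketched as follows (hypothetical helper names; recursing into the most-populated quadrant is one possible subdivision rule, the paper does not specify which segment is chosen):

```python
def object_midpoint(mask):
    """Approximate an object's mid-point via its bounding-box center;
    if the center is an outlier (not on the object), subdivide the
    bounding box hierarchically until a valid point is found.
    mask: 2D list of booleans marking the object's pixels."""
    pts = [(x, y) for y, row in enumerate(mask)
           for x, v in enumerate(row) if v]
    return _center_in(mask, pts)

def _center_in(mask, pts):
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    cx, cy = (min(xs) + max(xs)) // 2, (min(ys) + max(ys)) // 2
    if mask[cy][cx]:                 # center lies on the object: done
        return (cx, cy)
    # Outlier: recurse into the quadrant of the bounding box that
    # contains the most object pixels.
    quads = [[p for p in pts if (p[0] <= cx) == qx and (p[1] <= cy) == qy]
             for qx in (True, False) for qy in (True, False)]
    best = max((q for q in quads if q), key=len)
    return _center_in(mask, best)
```

Each recursion works on a strict subset of the object's pixels, so the subdivision terminates at the latest when a single pixel remains, which trivially lies on the object.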

Shape approximations with bounding boxes. For complex-shaped objects, the bounding-box center may not lie within the projected area of the visual reference object and thus produce invalid animation paths. An additional check on the ID-buffer, testing whether the center point of the bounding box is an outlier, easily fixes this problem: for outliers we hierarchically subdivide the bounding box until the segment's center is no longer an outlier.

In a second step the mid-points of moving objects within the frames of an animation are connected. In order to achieve an aesthetic appearance like the arrows in hand-made illustrations, these trajectories are smoothed. We evaluated several line smoothing algorithms with respect to the requirements of our application; Table 2 shows the number of resulting points and the influence on the curvature according to the classification in [MS92]. Big changes in the position of an object in subsequent frames would produce long line segments and a rough appearance of the animation path; thus point averaging routines (e.g., the median) are not sufficient for our application. Because of the good control over the resulting curvature of the line stroke, we chose curve fitting routines (in our case B-splines). We visualize the results with additional arrowheads at the ends of the determined line. If there is enough space for the labeling stroke, we directly project the letters of the annotation onto the label path (internal path annotation). If the space is not sufficient, we place the label's text outside on a calm region around the object (external path annotation) and connect the text with the animation path of the referred object (see Fig. 5).

Method                              | # resulting points | Curvature control
------------------------------------|--------------------|------------------
Point averaging routines            | fixed              | bad
Mathematical curve fitting routines | variable           | good
Tolerancing routines                | some are fixed     | bad

Table 2: Comparison of line smoothing algorithms.

Figure 5: Annotation of animation paths. (a) Internally labeled animation path. (b) Externally labeled animation path.
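To illustrate trajectory smoothing, the following sketch uses Chaikin's corner-cutting subdivision, whose limit curve is a quadratic B-spline. It is a lightweight stand-in for the cubic B-spline fitting actually chosen above, not the system's implementation; note how the endpoints are preserved so that the arrowheads stay anchored at the first and last mid-point:

```python
def chaikin_smooth(points, iterations=3):
    """Smooth a polyline of (x, y) mid-points by corner cutting
    (Chaikin's algorithm). Each pass replaces every segment by two
    points at its 1/4 and 3/4 positions; the endpoints are kept."""
    pts = [tuple(map(float, p)) for p in points]
    for _ in range(iterations):
        refined = [pts[0]]                       # keep first endpoint
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            refined.append((0.75 * x0 + 0.25 * x1,
                            0.75 * y0 + 0.25 * y1))
            refined.append((0.25 * x0 + 0.75 * x1,
                            0.25 * y0 + 0.75 * y1))
        refined.append(pts[-1])                  # keep last endpoint
        pts = refined
    return pts
```

Because corner cutting only ever averages neighbouring points, it gives the rounded appearance desired for the animation paths while, unlike point averaging over raw mid-points, never producing long straight segments from large inter-frame jumps.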


3.4 Comparison

Both methods described in the previous sections have advantages. Label placement in calm regions minimizes the space overlaid on the object; arrows present the trajectories of moving objects and can often be found in hand-made illustrations. Thus, it is application-dependent which method should be preferred. In interactive applications, however, the animation itself communicates the object's movement. Therefore, we suggest preferring label placement in calm regions and using trajectories only for objects without calm regions. For still images and screenshots, trajectories can convey the movement of objects.

4 Results

We developed an experimental system to test both strategies discussed in the last section for integrating annotations of animated objects into an interactive 3D visualization. Therefore, we enhanced several 3D models of the Viewpoint 3D library with common 3D modeling programs in order to obtain sequences of moving 3D objects. As the layout algorithm is based purely on color-coded images, our approach could be applied to procedural animation methods too. The annotation process worked with all models in real-time. The computational performance of the annotation process is almost independent of the geometric complexity of the 3D models. For very complex 3D animations, however, scene-graph based 3D visualization toolkits (Coin3D), as used in our system, might not be the best choice in terms of performance. The determination of calm regions requires color-coded renditions of a complete animation sequence. The performance depends on the number of animation frames and the rendering speed of the graphics hardware. With the 3 different animated 3D models (animations of 16 frames), the determination of calm regions needed less than a second. Therefore, this process starts whenever the user stops interacting with the 3D model. Fig. 6 shows the calm and fluctuating regions of a V8 motor with a rotating ventilator and correspondingly moving pistons. Fig. 7 shows the same motor annotated with external labels; their anchors are placed on calm regions. Fig. 8 presents an illustration where the pistons are enriched by arrows showing the trajectories of their movement in the animation. These trajectories can be integrated into static illustrations of instructive texts. In order to provide spatial indications (i.e., depth information), our approach can modify the appearance of arrows.

Figure 6: Calm (white) and fluctuating (grey) regions of a motor (animated ventilator and pistons).
The homepage of our labeling project contains further details and videos: http://wwwisg.cs.uni-magdeburg.de/~hartmann/Projects/labeling.html


Figure 7: Labeled calm regions.

5 User Study

In order to evaluate whether the methods presented in the previous section are appropriate (i) to annotate animated 3D models and (ii) to improve the comprehension of animated 3D models, we developed and performed several tests. The test application showed 5 different shapes which were labeled in different fashions. These shapes moved along defined paths; the labels either followed these movements or remained at fixed positions. The participants were asked to read the text presented in the labels and to associate it with the corresponding shape.

5.1 Method

Subjects and design. The tests were conducted with 30 subjects (computer graphics students; 8 female, 22 male) who were subdivided into 2 test groups of 14 and 16 participants. Both test groups completed the same blind tests on different types of hardware.

Materials and apparatus. The first group completed the test on a 19-inch TFT monitor (Belinea 101920) at 75 Hz refresh rate. The second test group was tested on a 21-inch CRT monitor (Fujitsu-Siemens MCM213V) at 80 Hz refresh rate. The visible diagonal sizes of both were almost equal (TFT: 48.3 cm, CRT: 50.0 cm). The test application initially presented five different shapes (circle, square, square rotated by 90°, triangle, triangle rotated by 180°) at five different starting positions (see Fig. 9). In each test these assignments were chosen randomly. On the right side of the display there were 5 large corresponding buttons for the user interaction.


Figure 8: Labeled animation paths.

In order to measure the impact of different labeling strategies, these 5 shapes were annotated with slightly differing label texts without any semantic relation to their associated shape. Once again, those assignments were chosen randomly. To generate textual annotations with 4-letter words, the following procedure was used: each letter of a base string was chosen randomly. From this random string, 4 other strings were generated by altering one of the 4 letters (Levenshtein distance of 1 [Lev66]). An instruction 'Select the button of the shape with the label:' was presented at the top of the display, followed by a 4-letter string.

Procedure. The test subjects were randomly assigned to the groups TFT or CRT. Initially, the test was presented to the participants. They were instructed (i) to read the 4-letter string at the top of the display, (ii) to find the corresponding label on the left side of the display, and (iii) to assign the correct shape by pushing the appropriate button on the right side of the display. Additionally, they were instructed to prioritize accuracy over speed. After this introduction, the users started the tests. Each test had a preparation screen, which was identical to the subsequent test except for the string 'Preparation:'. After clicking a button the actual test was performed.
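The label-generation procedure above can be reconstructed as follows (a hypothetical helper, `make_labels`; the study's exact generator and alphabet are not published):

```python
import random
import string

def make_labels(length=4, n_variants=4, seed=None):
    """Generate one random base label plus n_variants strings at
    Levenshtein distance 1 from it: each variant replaces exactly
    one letter of the base string with a different letter."""
    rng = random.Random(seed)
    base = ''.join(rng.choice(string.ascii_uppercase) for _ in range(length))
    variants = []
    for _ in range(n_variants):
        pos = rng.randrange(length)
        repl = rng.choice([c for c in string.ascii_uppercase if c != base[pos]])
        variants.append(base[:pos] + repl + base[pos + 1:])
    return base, variants
```

Since every variant differs from the base in exactly one position, the resulting label set forces participants to read the labels carefully rather than match them at a glance.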

5.2 Results and Discussion

Scoring. For each test the application logged (i) whether or not the subjects selected the correct shape and (ii) the selection time in milliseconds. The preparation time was not considered. For better comparability, the timings of each subject were subsequently normalized by dividing them by that subject's median.
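The per-subject normalization is a one-liner; this sketch (hypothetical helper name) makes the arithmetic explicit:

```python
import statistics

def normalize_timings(timings_ms):
    """Divide each of a subject's selection times by that subject's
    median, yielding unit-free timings comparable across subjects."""
    med = statistics.median(timings_ms)
    return [t / med for t in timings_ms]
```

For example, `normalize_timings([100, 200, 400])` yields `[0.5, 1.0, 2.0]`, so a value of 1.0 always marks a subject's median performance regardless of their absolute speed.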


Figure 9: Test scene of the user study: select the corresponding button.

Hypothesis 1: Moving text is more readable on CRT than on TFT monitors. First we evaluated the influence of the display types. We applied Student's t-test to determine whether or not this factor results in a significant difference. Statistical significance cannot be assumed for a result ≥ 5% (p ≥ 0.05). The test yields that no significant difference could be determined, both with respect to the correct shape selection (F=0.274, p=0.604) and with respect to the selection time (F