AN INTRODUCTION TO MPEG-4 ANIMATION FRAMEWORK EXTENSION (AFX)

Mikaël Bourges-Sévenier
Mindego Inc., 100 Buckingham Drive, Suite 238, Santa Clara, CA 95051, USA

ABSTRACT

This document presents MPEG-4 Animation Framework eXtension (AFX), a recent amendment to the MPEG-4 Systems specification. AFX defines high-level geometry, texture, volume, and animation components for enhanced interactive multimedia applications.

1 INTRODUCTION

Computer graphics standards such as the Virtual Reality Modeling Language (VRML) [2] or MPEG-4's Binary Format for Scene (BIFS) [1] are based on common industry practice and favor interoperability among players. These standards are made of tools, or components, organized in a scene graph. The scene graph is a tree structure in which each node is a component and branches are its properties. How is a component defined? Originally, a component followed the maxim "one tool, one functionality". Hence, the first components that appeared in VRML were very low-level, in the sense of being very close to graphics APIs such as OpenGL. However, higher-level components were needed. BIFS is a binary superset of VRML 2.0 and supports all its features. While VRML 2.0 follows a download-and-play philosophy, BIFS was designed for streaming alongside other media. Since its first release at the end of 1998 [1], low-level components have been added to BIFS but few high-level ones. In November 2000, the AFX group started to look at high-level components, and a framework to support them, motivated by the following observation: in VRML/BIFS content with 2D/3D animated objects, 80% or more of the file often consists of animation and geometry data. Because the available tools are so low-level, a great deal of information is needed to describe realistic animations and 2D/3D objects. On the other hand, many higher-level tools have been developed for industries such as medicine, CAD/CAM, and games. Higher-level components can be defined as providing a compact representation of functionality in a more abstract manner. Typically, this abstraction leads to mathematical models that need few parameters. These models cannot be rendered directly by a graphics card: internally, they are converted to low-level primitives a graphics card can render. Besides a more compact representation, this abstraction often provides other functionalities. For example, a

0-7803-7622-6/02/$17.00 ©2002 IEEE

subdivision surface can be subdivided based on the area viewed by the user. This provides four functionalities: compact representation, view-dependent subdivision, automatic levels of detail, and progressive local refinements. Enabling all these functionalities may require substantial computation, and an implementation may provide only crude capabilities on a resource-limited terminal but full support on a desktop. Obviously, the rendering will not have the same quality, but the content will be the same. Thus, another benefit of such representations is scalability. The organization of this document is as follows. First, we present MPEG-4 with an emphasis on the synthetic scenes where AFX components are used. Then, we present AFX and the components in the specification. Finally, we discuss perspectives and challenges of future AFX work.

2 MPEG-4 OVERVIEW

The MPEG-4 toolbox [1] contains many tools for audio, video, 2D and 3D graphics, animation, interactivity, stream synchronization, and so on: everything one can expect in order to build a multimedia platform. Figure 1 shows the internals of an MPEG-4 player. Going from left to right, an incoming stream is received by an abstract interface called the Delivery Multimedia Integration Framework (DMIF). DMIF handles the network connections used to retrieve the content. Each stream in the content is then demultiplexed and fed into its corresponding decoder. In contrast with other media decoders, the BIFS (Binary Format for Scene) decoder outputs a tree of objects instead of an array of data, as with audio and video. The scene composes all streams together in the compositor in order to render the content. Intellectual Property Management and Protection (IPMP) systems may protect each input and output of these tools. BIFS is a binary representation of a VRML scene graph [5], enriched with the capabilities of the MPEG-4 streaming architecture. AFX, described in the remainder of this document, extends BIFS features [4]. In contrast with BIFS nodes, AFX nodes may have more efficient dedicated encodings [4].
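The demultiplex-and-decode flow described above can be sketched roughly as follows. This is a toy illustration only; the class, method, and stream names are hypothetical and not part of the MPEG-4 specification. Note how the BIFS decoder can return a tree while the media decoders return flat data, as described in the text.

```python
class Player:
    """Toy sketch of an MPEG-4-style player pipeline (names hypothetical)."""

    def __init__(self):
        self.decoders = {}     # stream_id -> decode function
        self.composition = []  # decoded access units awaiting composition

    def register(self, stream_id, decoder):
        # each elementary stream gets its own decoder
        self.decoders[stream_id] = decoder

    def demux(self, packets):
        # each packet is (stream_id, payload); route it to the right decoder
        for stream_id, payload in packets:
            decoded = self.decoders[stream_id](payload)
            self.composition.append((stream_id, decoded))

    def composite(self):
        # a real compositor would render the scene graph with its media;
        # here we simply hand back the decoded units
        return self.composition


player = Player()
player.register("video", lambda p: ("frame", p))   # media decoder: flat data
player.register("bifs", lambda p: {"root": p})     # scene decoder: object tree
player.demux([("video", 1), ("bifs", "scene")])
units = player.composite()
```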

III - 1

IEEE ICIP 2002

Figure 1 – MPEG-4 Systems Architecture and AFX streams.

Figure 2 – AFX conceptual organization of models.

3 THE ANIMATION FRAMEWORK EXTENSION (AFX)

3.1 AFX concepts

The AFX specification [4] contains components for rendering geometry, textures, volumes, and animation, organized around a conceptual hierarchy of models for computer games and animation [11] (Figure 2). To understand this organization, let us take an example. Suppose one wants to build an avatar. The avatar consists of geometric elements that describe its legs, arms, head, and so on. Simple geometric elements can be used and deformed to produce more physically realistic geometry. Then skin, hair, and clothes are added. These may be physics-based models attached to the geometry. Whenever the geometry is deformed, these models deform too and, thanks to their physics, may produce wrinkles. Biomechanical models are used for motion, collision response, and so on. Finally, our avatar may exhibit special behaviors when it encounters objects in its world. It might also learn from experience: for example, if it touches a hot surface, it hurts, and next time it will avoid touching such a surface. This hierarchy also works top-down: if the avatar touches a hot surface, its behavior may be to retract its hand. Retracting the hand follows a biomechanical pattern. The speed of the movement is based on the physical properties of the hand and its link to the rest of the body, which in turn modify the geometric properties that define the hand. AFX does not define models for the last two categories, as they are heavily application-dependent and standard techniques are often customized for each application. Animation of the models is possible at any stage of the pyramid except the last two stages.
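The top-down propagation in the avatar example can be sketched as a chain of layers, each driving the one below it. All class and parameter names here are hypothetical, chosen purely to illustrate the pyramid; AFX does not define such an API.

```python
class Geometry:
    def __init__(self):
        self.hand_position = 0.0          # 1-D stand-in for mesh vertices

    def deform(self, offset):
        self.hand_position += offset


class Physics:
    def __init__(self, geometry, mass=1.0):
        self.geometry, self.mass = geometry, mass

    def apply_force(self, force, dt):
        # the speed of the movement depends on physical properties (mass)
        self.geometry.deform(force / self.mass * dt)


class Biomechanics:
    def __init__(self, physics):
        self.physics = physics

    def retract_hand(self):
        # a biomechanical pattern expressed as a force over a time step
        self.physics.apply_force(force=-10.0, dt=0.1)


class Behavior:
    def __init__(self, biomechanics):
        self.biomechanics = biomechanics

    def on_stimulus(self, stimulus):
        # reactive behavior: a stimulus triggers a biomechanical pattern,
        # which in turn drives physics and finally geometry
        if stimulus == "hot surface":
            self.biomechanics.retract_hand()


geo = Geometry()
avatar = Behavior(Biomechanics(Physics(geo)))
avatar.on_stimulus("hot surface")         # propagates down to the geometry
```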

3.2 The AFX models

AFX defines six categories of models, following [11]:

1. Geometric models. They capture the form and appearance of an object. Many characters in animations and games can be controlled quite efficiently at this low level. Due to the predictable nature of their motion, building higher-level models for characters that are controlled at the geometric level is generally much simpler.
2. Modeling models. They are an extension of geometric models and provide linear and non-linear deformations of the geometry they control.
3. Physical models. They capture additional aspects of the world such as an object's mass and inertia, and how it responds to forces such as gravity. The use of physical models allows many motions to be created automatically and with unparalleled realism.
4. Biomechanical models. Real animals have muscles that they use to exert forces and torques on their own bodies.
5. Behavioral models. A character exhibits reactive behavior when its behavior is based solely on its perception of the current situation. Goal-directed behaviors can be used to define a cognitive character's goals. They can also be used to model flocking behaviors.
6. Cognitive models. If a character is able to learn from stimuli from the world, it may be able to adapt its behavior. These models are related to artificial-intelligence techniques.

3.3 AFX components

3.3.1 Shaping objects

VRML objects consist of polygonal meshes, typically described in IndexedFaceSet nodes. A polygonal mesh represents a sampled version of smooth surfaces, and small


faces approximate the curvature. AFX proposes curved surfaces such as NURBS [14], [12] and subdivision surfaces [3]. Subdivision surfaces support the well-known Loop [13] and Catmull-Clark [8] algorithms as well as extensions such as normal control and edge sharpness [3]. An extended Loop algorithm enables quadrangulated meshes to be subdivided smoothly while retaining smooth color transitions during the subdivision process. Hierarchical subdivision surfaces enable progressive additions of detail at each level of the subdivision process; wavelets are used to compress the detail signal.
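One step of Loop subdivision for a closed triangle mesh can be sketched as follows. This is a simplified illustration of the scheme in [13], ignoring boundaries, crease/sharpness extensions, and normal control; it shows why the representation is compact — a handful of control vertices generates an ever-denser smooth mesh.

```python
import math
from collections import defaultdict


def loop_subdivide(verts, faces):
    """One Loop subdivision step on a closed triangle mesh.
    verts: list of (x, y, z); faces: list of (i, j, k) index triples."""
    # Adjacency: for each undirected edge, the "opposite" vertices of its
    # two incident triangles; for each vertex, its one-ring neighbours.
    edge_opp = defaultdict(list)
    neighbours = defaultdict(set)
    for a, b, c in faces:
        for u, v, w in ((a, b, c), (b, c, a), (c, a, b)):
            edge_opp[frozenset((u, v))].append(w)
            neighbours[u].update((v, w))

    def smooth_even(i):
        # Loop's rule for existing ("even") vertices of valence n
        n = len(neighbours[i])
        beta = (1.0 / n) * (5.0 / 8.0
                            - (3.0 / 8.0 + 0.25 * math.cos(2 * math.pi / n)) ** 2)
        s = [sum(verts[j][k] for j in neighbours[i]) for k in range(3)]
        return tuple((1 - n * beta) * verts[i][k] + beta * s[k] for k in range(3))

    new_verts = [smooth_even(i) for i in range(len(verts))]

    # New ("odd") vertex per edge: 3/8 of the endpoints, 1/8 of the opposites
    edge_point = {}
    for edge, (c, d) in edge_opp.items():
        a, b = tuple(edge)
        p = tuple(3 / 8 * (verts[a][k] + verts[b][k])
                  + 1 / 8 * (verts[c][k] + verts[d][k]) for k in range(3))
        edge_point[edge] = len(new_verts)
        new_verts.append(p)

    # Each triangle is split 4-to-1
    new_faces = []
    for a, b, c in faces:
        ab = edge_point[frozenset((a, b))]
        bc = edge_point[frozenset((b, c))]
        ca = edge_point[frozenset((c, a))]
        new_faces += [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]
    return new_verts, new_faces


# Subdividing a tetrahedron: 4 faces become 16, 4 vertices become 4 + 6 edges
tetra_verts = [(1.0, 1.0, 1.0), (1.0, -1.0, -1.0), (-1.0, 1.0, -1.0), (-1.0, -1.0, 1.0)]
tetra_faces = [(0, 1, 2), (0, 3, 1), (0, 2, 3), (1, 3, 2)]
v2, f2 = loop_subdivide(tetra_verts, tetra_faces)
```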

3.3.2 Texturing objects

AFX proposes new tools for creating textures: from procedural textures, to light-field mapping [9], to image-based rendering [10], to photorealistic synthetic images [7]. Light-field mapping offers a compelling solution for efficient interactive visualization of the photorealistic reflectance properties of both real and synthetic objects and complete environments. Image-based rendering uses images with depth information to represent objects photorealistically without the need for any mesh.
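The core idea of depth image-based rendering — each pixel carries a depth value that lets it be back-projected into 3-D, so no mesh is needed — can be illustrated with a minimal pinhole-camera sketch. The function name and the zero-depth-means-missing convention are assumptions for illustration, not part of the AFX tools.

```python
def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image into camera-space 3-D points
    (simple pinhole model with focal lengths fx, fy and principal
    point cx, cy; zero depth marks a missing sample)."""
    points = []
    for v, row in enumerate(depth):      # v = pixel row
        for u, z in enumerate(row):      # u = pixel column
            if z > 0:
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


# A 2x2 depth image with one missing sample yields three 3-D points
pts = depth_to_points([[1.0, 1.0], [1.0, 0.0]], fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```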

Figure 3 – Subdivision surfaces using the extended Loop algorithm: starting from quads (left), triangulation creates invisible edges (center). The extended Loop algorithm preserves curvature during the subdivision process (right).

Figure 4 – Subdivision surfaces with normal control.

Figure 7 – Light-field mapping of objects and an environment (top). A troll represented using depth image-based rendering (bottom).

Photorealistic synthetic textures can be achieved using the SynthesizedTexture framework: the color information of an image is represented with various vector-graphics tools that can be animated directly from within the scene.

Figure 5 – Hierarchical subdivision surfaces.

AFX introduces solid modeling [15], which enables content authors to create complex volumes using exact geometry and an extension to constructive solid geometry (Figure 6).

Figure 6 – Solid modeling operations with density. From left to right: two separate spheres, intersecting spheres, and overlapping spheres. The center line is a cross-section of the left sphere showing the densities.

Figure 8 – The SynthesizedTexture framework describes images using scene tools in a photorealistic manner. Left: original image. Right: close-up showing the scene elements.

3.3.3 AFX animation tools

As shown in Figure 2, the first four levels of the pyramid can be animated and, as a rule of thumb, the higher the level in the pyramid, the less data is needed to convey the animation. MPEG-4 BIFS provides animation tools using


piecewise-linear interpolators. Such tools often require large amounts of data to approximate the curvature of real paths, and they assume a linear timeline. In contrast, AFX proposes the Animator node, based on NURBS curve geometry for both the animation path and the timeline (Figure 9).
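The curve evaluation underlying such a node can be illustrated with de Boor's algorithm for (unweighted) B-spline curves; a full NURBS adds per-control-point weights. This is a sketch of the standard algorithm, not the normative AFX decoding: a few control points and a knot vector replace the many key/value pairs a piecewise-linear interpolator would need.

```python
def de_boor(t, degree, knots, ctrl):
    """Evaluate a B-spline curve at parameter t (de Boor's algorithm).
    Valid for knots[degree] <= t < knots[len(ctrl)]."""
    # locate the knot span containing t
    k = degree
    while k + 1 < len(ctrl) and knots[k + 1] <= t:
        k += 1
    # triangular scheme of successive linear interpolations
    d = [list(ctrl[j + k - degree]) for j in range(degree + 1)]
    for r in range(1, degree + 1):
        for j in range(degree, r - 1, -1):
            lo = knots[j + k - degree]
            hi = knots[j + 1 + k - r]
            alpha = 0.0 if hi == lo else (t - lo) / (hi - lo)
            d[j] = [(1 - alpha) * d[j - 1][i] + alpha * d[j][i]
                    for i in range(len(d[j]))]
    return tuple(d[degree])


# A clamped quadratic (here a Bezier segment) through three control points;
# the same machinery can drive both the value path and a warped timeline.
point = de_boor(0.5, degree=2,
                knots=[0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
                ctrl=[(0.0, 0.0), (1.0, 2.0), (2.0, 0.0)])
```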

Figure 9 – Left: the same path is traveled with different timelines (discrete, linear, paced, velocity spline). Right: example of an animation path made of two NURBS segments.

AFX also proposes the Bone-Based Animation (BBA) tool, which enables skeleton animation. A skeleton is composed of bones. Bones are typically connected to a skin mesh model such that when a bone moves, the skin is deformed accordingly. BBA is a biomechanical tool and can be used for any type of skeleton, not just human-like avatars. The skin models can be simple meshes or more complex models such as subdivision surfaces.
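The bone-to-skin attachment can be illustrated with linear blend skinning, one common way (not necessarily the normative BBA deformation) to deform a skin vertex from weighted bone transforms:

```python
def transform(m, t, v):
    # apply a 3x3 matrix m and a translation t to vertex v
    return tuple(sum(m[i][j] * v[j] for j in range(3)) + t[i] for i in range(3))


def skin_vertex(v, influences):
    """Linear blend skinning: blend a rest-pose vertex v through
    weighted bone transforms. influences is a list of
    (weight, (matrix, translation)) pairs; weights should sum to 1."""
    out = [0.0, 0.0, 0.0]
    for w, (m, t) in influences:
        p = transform(m, t, v)
        out = [out[i] + w * p[i] for i in range(3)]
    return tuple(out)


# A vertex influenced equally by a moving bone and a static bone
IDENT = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
p = skin_vertex((0.0, 0.0, 0.0),
                [(0.5, (IDENT, (1.0, 0.0, 0.0))),   # bone moved +1 in x
                 (0.5, (IDENT, (0.0, 0.0, 0.0)))])  # bone at rest
```

With equal weights the vertex lands halfway between the two bone poses, which is exactly the smooth-transition behavior wanted near joints.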

Figure 10 – Skeleton definition (left), skin using subdivision surfaces (middle), and refinement (right). As the skeleton is animated, the skin is deformed.

4 FUTURE WORK AND PERSPECTIVE

AFX components provide higher-level representations of geometry, animation, and texturing models than BIFS and VRML. The abstraction provided by these components can be extended to streaming, and the AFX group is already working on new areas such as view-dependent streaming and scene partitioning.

5 REFERENCES

[1] ISO/IEC 14496, Coding of Audio-Visual Objects: Systems, January 2001.
[2] ISO/IEC 14772-1, The Virtual Reality Modeling Language (VRML), 1997.
[3] H. Biermann, A. Levin, D. Zorin, "Piecewise-smooth subdivision surfaces", SIGGRAPH 2000 Conference Proceedings, New Orleans, Louisiana, July 23-28, 2000, pp. 113-120.
[4] M. Bourges-Sévenier et al., Study of ISO/IEC 14496-1:2001/PDAM4, Animation Framework eXtension and Multi-User Worlds, document N4852, May 2002.
[5] M. Bourges-Sévenier, A. Walsh, MPEG-4 Jumpstart, Prentice Hall, December 2001.
[6] M. Bourges-Sévenier, A. Walsh, Core Web3D, Prentice Hall, September 2000.
[7] M. Briskin, Y. Elichai, Y. Yomdin, "How can Singularity Theory help in Image Processing?", in Pattern Formation in Biology, Vision and Dynamics, A. Carbone, M. Gromov, and P. Prusinkiewicz, Eds., World Scientific Publishers, pp. 392-423, 1999.
[8] E. Catmull, J. Clark, "Recursively generated B-spline surfaces on arbitrary topological meshes", Computer-Aided Design, 10:350-355, September 1978.
[9] W-C. Chen, R. Grzeszczuk, J-Y. Bouguet, "Light Field Mapping: Hardware-Accelerated Visualization of Surface Light Fields", part of "Acquisition and Visualization of Surface Light Fields", SIGGRAPH 2001 Course Notes, Course #46.
[10] P. Debevec, "Introduction to Image-Based Modeling, Rendering, and Lighting", SIGGRAPH '99 courses, 1999.
[11] J.D. Funge, AI for Computer Games and Animation: A Cognitive Modeling Approach, A K Peters Ltd, August 1999.
[12] H. Grahn, "NURBS extension for VRML97", Blaxxun Interactive, 2000.
[13] C. Loop, Smooth Subdivision Surfaces Based on Triangles, Master's thesis, Department of Mathematics, University of Utah, August 1987.
[14] L. Piegl, W. Tiller, The NURBS Book, Springer-Verlag, 1997.
[15] J-F. Rotgé, "Principles of solid geometry design logic", Proceedings of the CSG 96 Conference, Winchester, UK, pp. 233-254, April 17-19, 1996.
