Proc. 3rd IEEE Benelux Signal Processing Symposium (SPS-2002), Leuven, Belgium, March 21-22, 2002

EFFICIENT ANIMATION OF MESHGRID MODELS, USING A SKIN & BONE SYSTEM

A. Salomie, A. Gavrilescu, R. Deklerck, I. Ravyse, J. Cornelis

V.U.Brussel, Dept. ETRO-IRIS, Pleinlaan 2, 1050 Brussels, Belgium

M. Preda

Institut National des Télécommunications 9, Rue Charles Fourier 91011 Evry Cedex - FRANCE [email protected]

[email protected]

ABSTRACT

In this paper we describe an MPEG-4 AFX compliant Skin&Bone based animation technique applied to a novel surface representation method, called MESHGRID, which is characterized by a connectivity description defined in terms of a regular 3D grid of points, i.e. the reference-grid. The approach is based on deforming the regular reference-grid in order to animate the vertices of the surface attached to this reference-grid. The MESHGRID representation used in the experiments is a humanoid model built from implicit surfaces, e.g. blobs, cylinders, cones and ellipsoids. The flexibility and efficiency of the MESHGRID representation for animation purposes rely upon the reduced number of vertices that are described in the animation script and the hierarchical nature of the representation. Moreover, it is more intuitive to design the animation in terms of the regular reference-grid than to interact directly with the vertices.

1. INTRODUCTION

Realistic animation of mesh-based surfaces usually requires an extensive amount of data to be coded and transmitted, since at each time instant the position of each vertex belonging to the model has to be known. In this paper we propose an efficient approach, which combines an advanced hierarchical multi-resolution surface representation, called MESHGRID [1], with a bone-based animation technique, i.e. AFX Skin&Bones [2], which is an extension of the existing Face and Body Animation (FBA) coding in MPEG-4. The MESHGRID surface representation, developed at the ETRO-IRIS department of VUB (Vrije Universiteit Brussel), an associated lab of IMEC, is a novel method that has been promoted to the SNHC (Synthetic Natural Hybrid Coding) Working Draft of MPEG-4.

2. THE MESHGRID SURFACE REPRESENTATION

The basic idea of the new surface representation called MeshGrid is (1) to define a regular 3D grid of points, called the reference-grid, and (2) to attach a wireframe description of the surface of the object to this reference-grid. The wireframe description, which is called the connectivity-wireframe, keeps the connectivity information between a set of vertices located on the surface of the object. The reference-grid stores the spatial distribution of the vertices from the connectivity-wireframe. Each reference-grid point is identified by a discrete position (u,v,w), and by a coordinate (x(u,v,w),y(u,v,w),z(u,v,w)).

The connectivity-wireframe is similar to the classical wireframe in the sense that it consists of a set of vertices and lines or curves connecting those vertices (see Figure 1). Yet it is different, because the direct polygonal representation may consist of non-planar polygons ranging from triangles to heptagons, which have to be split into triangles for the final polygonal representation. It has a regular structure, each vertex being connected to exactly four other vertices, and is flexible enough to fit complex surfaces. Each vertex of the connectivity-wireframe is located on the grid-line between two reference-grid points, one located inside and the other outside the object (see Figure 3).
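As a rough illustration of how vertices relate to the reference-grid, the following Python sketch builds a small uniform reference-grid around a sphere and lists the grid-line segments that have one end point inside and one end point outside the object; each such segment would carry one surface vertex. The grid size, the spacing, the function names and the use of a uniform (undeformed) grid are assumptions made for this example. It is not the TRISCAN implementation.

```python
import itertools


def build_reference_grid(n, spacing):
    """Map each discrete position (u, v, w) to a coordinate (x, y, z).

    A uniform grid centred on the origin is assumed for simplicity.
    """
    offset = (n - 1) / 2.0
    return {
        (u, v, w): ((u - offset) * spacing,
                    (v - offset) * spacing,
                    (w - offset) * spacing)
        for u, v, w in itertools.product(range(n), repeat=3)
    }


def inside_sphere(point, radius):
    """Implicit test: True when the point lies inside the sphere."""
    x, y, z = point
    return x * x + y * y + z * z < radius * radius


def crossing_segments(grid, radius):
    """Return (inside_index, outside_index) pairs of neighbouring grid points."""
    pairs = []
    for (u, v, w) in grid:
        # Visit each grid-line segment once, in the +u, +v and +w directions.
        for du, dv, dw in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
            neighbour = (u + du, v + dv, w + dw)
            if neighbour not in grid:
                continue
            here_in = inside_sphere(grid[(u, v, w)], radius)
            there_in = inside_sphere(grid[neighbour], radius)
            if here_in != there_in:
                if here_in:
                    pairs.append(((u, v, w), neighbour))
                else:
                    pairs.append((neighbour, (u, v, w)))
    return pairs


grid = build_reference_grid(n=5, spacing=1.0)
segments = crossing_segments(grid, radius=1.5)
print(len(segments), "grid-line segments carry a surface vertex")
```

Storing the pair of grid indices rather than an absolute position is what later allows a vertex to follow the grid when the grid is deformed.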

Figure 1: MeshGrid representations of a sphere: (a) discrete, (b) continuous. The vertices of the surface are displayed as balls and are connected via the connectivity-wireframe.

The benefits of the MESHGRID representation are that both a very compact coding of the geometry and a global or view-dependent multi-resolution representation of the original mesh description are possible. The connectivity-wireframe is efficiently encoded by using a new type of 3D extension of the Freeman chain-code. The reference-grid is a smooth vector field defined on a regular discrete 3D space, i.e. each reference-grid point is identified by a discrete position (u,v,w), and the coordinates (x,y,z) are efficiently compressed using an embedded 3D wavelet-based multi-resolution intra-band coding algorithm.

A MESHGRID mesh can be obtained in several ways: (1) from discrete and mathematically represented 3D models, by applying a surface extraction method called TRISCAN; (2) from INDEXEDFACESET models, by remeshing with a similar TRISCAN surface extraction approach; and (3) from quadrilateral meshes, by a lossless conversion that deforms the reference-grid towards the surface mesh.

The idea behind the TRISCAN technique is to build the global connectivity (i.e. the connectivity-wireframe) of the surface of an object such that surface primitives can be unambiguously derived from the connectivity information. The surface of the object is approximated by the union of these surface primitives. The TRISCAN technique can be classified in the group of contour oriented methods for surface extraction. In order to build the connectivity-wireframe, we apply a border-tracking engine in three sets of reference surfaces cutting the object. The surface primitives fill the connectivity-wireframe, and a set of rules has been designed to determine the orientation and the type of primitives that fit a certain closed or open circuit in the connectivity-wireframe (see Figure 2). The polygonization of the connectivity-wireframe is done at a later stage or when needed. The shape of the objects is not an issue: objects with or without holes are correctly handled.

Our approach differs both (1) from the other contour oriented methods for surface extraction [3], which perform the contouring in only one set of reference surfaces, and (2) from the local methods based on a lookup table to extract the surface primitives [4, 5], although there are similarities between the surface primitives generated with TRISCAN and the ones obtained by means of the local methods (e.g. the Marching Cubes algorithm [6]).

Figure 2: Images illustrating some of the connectivity cases corresponding to (a) the triangle primitive, (b) the rectangle primitive, (c) the hexagon primitive.

The MESHGRID representation can be efficiently exploited in progressive transmission schemes, since it allows for three types of scalability in both view-dependent and view-independent scenarios: (1) resolution scalability, i.e. the adaptation of the number of transmitted vertices, (2) shape precision, i.e. the adaptive reconstruction of the reference-grid positions, and (3) vertex precision scalability, i.e. the change of the precision of known vertex positions with respect to the reference-grid.

Figure 3: Animation of the model by reshaping the reference-grid nodes. A slice through the reference-grid is displayed in grey lines (label 1). The contour, displayed in black (label 4), consisting of the vertices (label 5) lying at the surface of the object, is attached to the reference-grid. (a) The original reference-grid and the original contour. (b) Final shape of the reference-grid and the corresponding deformations of the contour.

2.1. Animation techniques of a MESHGRID representation

The MESHGRID model is very flexible for animation purposes, since, in addition to the vertex-based animation typical for INDEXEDFACESET models, it allows for specific animation types, such as (1) rippling effects obtained by changing the position of the vertices relative to the corresponding reference-grid points and (2) reshaping of the regular reference-grid. The latter form of animation can be done on a hierarchical multi-resolution basis: deforming the reference-grid for a lower resolution mesh will propagate the action to the higher levels, while applying a similar deformation at the higher resolution levels will only have a local impact. Each vertex of the wireframe is updated whenever a grid point it is attached to moves.

Figure 3 illustrates the concept of this shape deformation technique in a planar slice through the reference-grid. The vertices (label 5) attached to the reference-grid lie at the surface of the object and define a 2D contour (label 4) in the planar slice. Note that in general the vectors (x(u,v,w),y(u,v,w),z(u,v,w)) defining the reference-grid points will define curved slices (i.e. reference-surfaces) when one of the coordinates u, v, w is kept constant, and that the vertices attached to these vectors will form a 3D contour. By definition, vertices lie at the intersection between the grid-lines and the surface of the object, at a certain ratio (label 6) between two reference-grid points. This vertex-precision ratio is defined in terms of vertex-precision bits in the MeshGrid surface representation. In the absence of these bits, the ratio is set to 0.5, which means that the vertices are positioned halfway between a pair of grid points, consisting of one external point (label 2) and one internal point (label 3). When the grid is deformed as shown in Figure 3 (b), the vertices move accordingly, and their new locations are computed from the altered positions of the grid-point pairs via a linear interpolation based on the vertex-precision ratio. In order to preserve the consistency of the connectivity-wireframe, deformations should be kept smooth, so that grid-lines do not start intersecting each other at a local scale.
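The vertex update described above amounts to a linear interpolation between the two grid points of a pair. The short Python sketch below illustrates it. The function name, the argument order and the convention that the ratio is measured from the internal point towards the external point are assumptions made for this example, not part of the MESHGRID specification.

```python
def vertex_position(internal, external, ratio=0.5):
    """Place a vertex on the grid-line between an internal and an external
    reference-grid point.

    ratio = 0.0 puts the vertex on the internal point and 1.0 on the
    external one (assumed convention); 0.5 is the default used when no
    vertex-precision bits are available.
    """
    return tuple(a + ratio * (b - a) for a, b in zip(internal, external))


# Original grid-point pair and the vertex attached to it.
internal = (0.0, 0.0, 0.0)
external = (1.0, 0.0, 0.0)
ratio = 0.25
before = vertex_position(internal, external, ratio)

# Deform the grid: both grid points move, the stored ratio stays the same.
internal_moved = (0.0, 1.0, 0.0)
external_moved = (2.0, 1.0, 0.0)
after = vertex_position(internal_moved, external_moved, ratio)

print(before)  # (0.25, 0.0, 0.0)
print(after)   # (0.5, 1.0, 0.0)
```

Only the grid points are animated; the vertex keeps its stored ratio and is recomputed from the moved pair.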


3. SKIN&BONES ANIMATION TECHNIQUES

In order to reach a satisfactory degree of visual attractiveness during animation, 3D graphical models of human and animal bodies should consist of at least two components [7]: (1) a representation of the skeleton and (2) a surface representation of the skin surrounding it. The skeleton is simply a collection of segments and joint angles with various degrees of freedom at the articulation sites, typically rotation and/or translation, which may vary within authorized limits based on real human or animal mobility. The motion of the skeleton drives the deformation of the seamless surface model representing the skin, which is in general a triangular mesh or a set of surface patches.

The so-called Bone-based Animation (BBA) [2] specification of the AFX group of MPEG-4 is based on the same ideas. With the two objectives of (1) performing realistic animation and (2) addressing low-bitrate streamed animation, the AFX group is currently finalising the specifications allowing animation of any kind of articulated model based on skeleton modeling. The AFX Skin&Bones (SB) related nodes allow for the definition of any kind of skeleton hierarchy, with any kind of attached geometry (MESHGRID, INDEXEDFACESET or higher order geometry nodes such as NURBS and SUBDIVISION SURFACES), and any appearance attributes. In order to define a static 3D pose of an articulated figure including geometry, color and texture attributes, one single global seamless 3D mesh has to be built for the entire figure. The bone skeleton is provided together with a field of weighting values specifying, for each vertex of the mesh (or each reference-grid point in the case of the MESHGRID representation), the influence of the bones that directly alter the 3D position of the point. Moreover, the weighting factors can be specified implicitly and more compactly by means of two influence regions defined through a set of cross-sections attached to the bone.

The Skin&Bones animation bitstream (BBA stream) allows for a compact representation of motion parameters. The seamless 3D model is semantically separated into component parts with respect to the animation requirements. Due to the hierarchical structure of this type of model, animation implies registering each bone relative to its parent, which serves as the reference. The AFX specifications allow generic registration by means of translation, rotation and scale components; in the case of realistic animation the rotation component is usually sufficient. Animation of the character is achieved by moving the bones through updates of the geometric transformation component of the skeleton. During animation, each 3D point related to the skin is updated by a translation component obtained from the transformations of the associated bones.

4. EXPERIMENTS

For our experiment we built a humanoid model from a combination of implicit surfaces and converted this mathematical representation via the TRISCAN method into a multi-resolution MESHGRID representation, each resolution consisting of a single seamless mesh. In our approach the reference-surfaces defining the reference-grid have been chosen in such a way that they pass through the anatomical articulations (joints) of the body, in order to be able to virtually split the single mesh into meaningful anatomical parts (such as the shoulder, elbow and wrist) that can be driven by the hierarchical skeleton definition from the Skin&Bones animation script.

To reduce the number of triangles needed to represent the humanoid model, a non-uniformly distributed reference-grid was interactively designed to have a much higher density in the areas of the joints [Figure 4(a)]. In our experiment, the reference-grid is hierarchically organized in three levels by means of a wavelet decomposition. The first level (Level 1) contains 1638 points, the second level 11056 points and the third and final level 83349 points. The lowest resolution level is chosen as the base model for the animation, while the higher mesh resolution levels improve the smoothness of the deformations at the joints and of the overall shape. The reference-grid points contained in the lowest resolution level represent only a small percentage of the total number of points present in the final level (1638 of 83349, i.e. about 2%). Hence, such a hierarchical approach is very advantageous, since the animation can be designed in terms of a limited number of points and since the number of computations needed to apply the Skin&Bone transformation to each of these points is reduced.

The new positions of the reference-grid points belonging to a higher resolution level (l+1) can be computed efficiently via discrete interpolation filters from the new positions of the neighboring reference-grid points at the lower resolution level (l), and by adding the details of the wavelet decomposition at that level. The interpolated position of a reference-grid point $\mathbf{p}^{l+1}_{2i+1}$ is computed according to "Dyn's four point subdivision scheme for curves" [8] as follows:

$$
\mathbf{p}^{l+1}_{2i+1} = -w\,\mathbf{p}^{l}_{i-1} + \left(\tfrac{1}{2}+w\right)\mathbf{p}^{l}_{i} + \left(\tfrac{1}{2}+w\right)\mathbf{p}^{l}_{i+1} - w\,\mathbf{p}^{l}_{i+2}, \qquad \mathbf{p}^{l+1}_{2i} = \mathbf{p}^{l}_{i},
$$

where l represents the hierarchical level of the grid point. This interpolation scheme is applied three times, i.e. consecutively along the u, v and w directions. For computing the details in the wavelet decomposition, the inverse of this filter is used, while for the LLL subband it suffices to apply only subsampling, given the smoothness of the reference-grid. This approach has the advantage that the grid points present in the lowest resolution stay unchanged in the higher resolutions.

For the weight w = 0, one obtains a short filter predicting the position of the grid point $\mathbf{p}^{l+1}_{2i+1}$ from its two immediate neighbours at the lower resolution level l via linear interpolation. The case w = 1/16 leads to a longer filter, but guarantees a smooth interpolation, since it corresponds to fitting a Catmull-Rom or Cardinal spline curve through the points. When using this long filter, one needs to make sure that neighbouring parts of the model moving in different directions are separated by at least four reference-surfaces, to avoid mutual influences on their motions. For the short filter a more compact model can be designed, since two separation planes are sufficient. Yet the reference-grid and the derived surface of the animated object will be less smooth at the higher resolution levels.

The benefits of specifying an animation in terms of a hierarchical reference-grid are even more pronounced when comparing this approach to animation methods that directly use the vertices of, for instance, an INDEXEDFACESET representation. This difference in complexity is illustrated by Figure 5, which depicts the knee of the humanoid model. As one can see in Figure 5 (a), the number of vertices contained in the surface mesh of the second resolution level is already quite high (216 vertices), while the reference-grid at the lowest level only consists of 27 points at the height of the knee (three planes defined by nine points each).

Although the number of grid points will be higher when applying the long interpolation filter, the fact that the grid is defined on a regular space greatly simplifies the interactive selection of the grid points, since it is possible to determine the majority of these points automatically once a few key points have been chosen. Moreover, the same animation script can animate (1) any resolution of the MESHGRID model, due to its hierarchical construction, and (2) any model with a reference-grid that is defined in a way compatible with the reference model used in the animation script. Compatibility can be achieved by making a similar set of reference-surfaces pass through the same anatomical articulations of the body. Animating the INDEXEDFACESET representation, by contrast, will require a different script for each resolution and for each model.
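The hierarchical procedure of this section can be sketched in a few lines of Python: the bone transformation is applied only to the grid points of the lowest resolution level, and the positions at the next level are then interpolated with the four-point scheme along one grid-line. The sketch makes several assumptions that are not taken from the paper or from the AFX specification: the function names are hypothetical, the bone influence is modelled as a plain weighted sum of the bone-transformed positions, indices are clamped at the ends of the grid-line, and the wavelet details are taken to be zero.

```python
import math


def rotate_about_z(point, angle, pivot):
    """Rotate a 3D point by `angle` radians about a z-parallel axis through pivot."""
    x, y = point[0] - pivot[0], point[1] - pivot[1]
    c, s = math.cos(angle), math.sin(angle)
    return (pivot[0] + c * x - s * y, pivot[1] + s * x + c * y, point[2])


def apply_bones(points, weights, transforms):
    """Blend the bone-transformed positions of every point by its weights.

    weights[i][b] is the influence of bone b on point i (assumed to sum to 1).
    """
    result = []
    for p, w in zip(points, weights):
        moved = [t(p) for t in transforms]
        result.append(tuple(
            sum(wb * m[k] for wb, m in zip(w, moved)) for k in range(3)))
    return result


def refine_four_point(points, w=1.0 / 16.0):
    """One four-point subdivision step along a grid-line.

    Points of the coarse level are kept; one new point is inserted between
    each pair of neighbours. w = 0 gives linear interpolation.
    """
    last = len(points) - 1
    fine = []
    for i in range(last):
        a = points[max(i - 1, 0)]       # clamped at the line ends (assumption)
        b, c = points[i], points[i + 1]
        d = points[min(i + 2, last)]
        fine.append(b)
        fine.append(tuple(
            (0.5 + w) * (b[k] + c[k]) - w * (a[k] + d[k]) for k in range(3)))
    fine.append(points[last])
    return fine


def identity(p):
    """Bone 0 stays at rest."""
    return p


def bend(p):
    """Bone 1 rotates by 30 degrees about the joint at x = 2."""
    return rotate_about_z(p, math.pi / 6, pivot=(2.0, 0.0))


# Coarse grid-line along x: a "limb" with a joint at x = 2.
coarse = [(float(i), 0.0, 0.0) for i in range(5)]
# Points before the joint follow bone 0, points after it follow bone 1,
# and the point at the joint is shared equally.
weights = [(1.0, 0.0), (1.0, 0.0), (0.5, 0.5), (0.0, 1.0), (0.0, 1.0)]

animated_coarse = apply_bones(coarse, weights, [identity, bend])
animated_fine = refine_four_point(animated_coarse)               # long filter
animated_fine_short = refine_four_point(animated_coarse, w=0.0)  # short filter

print(len(animated_coarse), "coarse points ->", len(animated_fine), "fine points")
```

The bone transformation is evaluated once per coarse point only; the finer level costs a few multiplications per inserted point, and every second point of the fine level coincides with a coarse point.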

Figure 4: (a) Frontal view of the humanoid model, shown at the highest resolution level. (b) The same model displayed at the lowest resolution level, with the reference-grid drawn in overlay. Note that the reference-grid is defined to pass through the joints of the model (such as the elbow, wrist and ankle). The short filter is used to compute the higher resolution levels. (c)(d)(e) Successive moments of an animated humanoid model, displayed at the highest resolution level.

Figure 5: Snapshot of the humanoid knee during the animation. The reference-grid lines are displayed in black. The surface at the second resolution level is displayed as a wireframe in (a) and Gouraud-shaded in (b).

5. CONCLUSIONS

The proposed Skin&Bone hierarchical animation technique based on the MESHGRID representation method offers significant advantages in terms of compactness, design simplicity and computational load, compared to a bone-based animation defined for a complex single resolution model described as an INDEXEDFACESET. The major advantage compared to animation techniques defined for other hierarchical models (e.g. SUBDIVISION SURFACES) is that it is more intuitive to address the regularly defined grid points than the vertices, and that it is possible, with the same script, to animate compatible MESHGRID models. In the future we intend to improve the realism of the animation by elaborating some parts of the model in more detail (e.g. the hands and the face), by adding textures and by taking into account muscle deformations.

6. REFERENCES

[1] I. A. Salomie, A. Gavrilescu, R. Deklerck, J. Cornelis, "Flexible MeshGrid Representation and Coding", ISO/IEC JTC1/SC29/WG11 MPEG2001/7038, Sydney, July 2001.

[2] M. Preda, F. Preteux, "Skin & Bone implementation in reference software", ISO/IEC JTC1/SC29/WG11 MPEG2001/7587, Pattaya, December 2001.

[3] C. Bajaj, E. Coyle, and K. Lin, "Arbitrary Topology Shape Reconstruction from Planar Cross Sections", Graphical Models and Image Processing, 58(6): 524-543, 1996.

[4] J. O. Lachaud, "Extraction de surfaces à partir d'images tridimensionnelles: approches discrete et approche par modèle deformable", Ph.D. Thesis, July 1998.

[5] T. Poston, H. T. Nguyen, P-A. Heng, and T-T. Wong, "Skeleton Climbing: Fast Isosurfaces with Fewer Triangles", in Proceedings of Pacific Graphics'97, Seoul, Korea, pp. 117-126, October 1997.

[6] W. E. Lorensen and H. E. Cline, "Marching Cubes: A high resolution 3D surface construction algorithm", in Proceedings Computer Graphics (SIGGRAPH '87), vol. 21, pp. 163-169, 1987.

[7] A. Aubel, R. Boulic, D. Thalmann, "Real-time Display of Virtual Humans: Level of Details and Impostors", IEEE Trans. Circuits and Systems for Video Technology, Special Issue on 3D Video Technology, 10(2): 207-217, 2000.

[8] N. Dyn, D. Levin and J. A. Gregory, "A four-point interpolatory subdivision scheme for curve design", Computer Aided Geometric Design, 4: 257-268, July 1987.