Dead-Reckoning Algorithms for Synthetic Objects in MPEG-4 SNHC

Tolga K. Capin (1), Igor Sunday Pandzic (2), Nadia Magnenat Thalmann (2), Daniel Thalmann (1)

(1) Computer Graphics Laboratory, Swiss Federal Institute of Technology, CH-1015 Lausanne, Switzerland
{capin,thalmann}@lig.di.epfl.ch, http://ligwww.epfl.ch

(2) MIRALab-CUI, University of Geneva, 24 rue du Général-Dufour, CH-1211 Geneva 4, Switzerland
{Igor.Pandzic,Nadia.Thalmann}@cui.unige.ch, http://miralabwww.unige.ch

Abstract

MPEG-4 SNHC requires transmission of object parameters through a bitstream, over connections with varying bandwidth and latency. Especially with complex environments and a large number of transmitters, these requirements might be excessive. Therefore, tools should be provided to decrease the amount of information sent through the bitstream. In this paper, we propose to use in SNHC the dead-reckoning technique that has been used successfully in the networked virtual environments area. The technique is based on extrapolation of the object parameters from the last animation parameters received from the transmitter during the update phase. We survey the use of the dead-reckoning technique for synthetic objects in MPEG-4 SNHC, particularly simple non-articulated rigid objects, human faces, and human bodies.

Keywords: MPEG-4, SNHC, dead-reckoning, human face, human body, Kalman filtering.

1. Introduction

SNHC has selected a few anchor applications to abstract essential requirements and develop concrete initial results. The initial applications are audio/video with 2D graphics and interpersonal communication. The next generation of applications may include hybrid media conferencing, tele-teaching, and 3D networked virtual environments [Doenges97]. All these applications require real-time streaming with high bandwidth and low latency in order to communicate the objects' state changes. However, low-bitrate streams fail to supply this quality of service. Additionally, in applications with a large number of transmitters, the communication requirements might be excessive. In the networked virtual environments (NVE) field, the dead-reckoning technique has been proposed to decrease the number of messages communicated among the participants by extrapolating the next position from the last received object parameters. Initially, the dead-reckoning algorithm was applied to rigid non-articulated objects [IEEE93][Macedonia94]; recently, it has been applied to the virtual human body [Capin97]. This paper extends [Capin97] from the body object to simple synthetic objects and the face. We give an overview of the dead-reckoning technique and how it can be incorporated in SNHC. The next section discusses the basic dead-reckoning technique. Then, we survey the animation parameters for the body and the face in MPEG-4 SNHC, and we propose the use of the dead-reckoning technique for these objects.

2. Dead-Reckoning Technique

The dead-reckoning technique is a way to decrease the number of messages communicated among the participants, and is used for simple non-articulated objects in popular systems such as DIS [IEEE93] and NPSNET [Macedonia94]. To describe the dead-reckoning algorithm, similar to [Gossweiler94], we can give the example of a space dogfight game with n players. Each player is represented by, and controls, a different ship. When a player X moves its own ship, it sends a message containing the new position to the other n-1 players. When all players move once, a total of n*(n-1) messages are communicated. To reduce this communication overhead, player X sends the ship's position and velocity to the other participants, who then use the velocity information to extrapolate the next position of player X's ship. This extrapolation operation is named dead-reckoning. In this approach, each participant also stores another copy of its own model, called the ghost model, to which it applies the dead-reckoning algorithm. If the difference between the real position and this additional copy exceeds a predefined maximum, player X sends the real position and velocity to the other participants, so that they can correct their copy of participant X's object. Note that player X sends messages only if there is a large difference between the real position and the extrapolated one.

The performance of the dead-reckoning algorithm depends on how correctly it predicts the next frames. Therefore, the characteristics of the simulation and of the underlying object should be taken into account when developing the algorithm. The dead-reckoning technique for non-articulated rigid objects is straightforward. Message transmission occurs if the Euclidean distance between the object and its ghost exceeds a threshold distance, or if the 3D angle between the object and its ghost exceeds a threshold angle. The extrapolation computation is also straightforward: the ghost object is transformed (i.e. translated and rotated) using the current translational and rotational speed.
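As an illustration, the following sketch shows first-order extrapolation and the threshold test for a rigid object. The function names and thresholds are our own, and orientation is treated as a vector of Euler angles for simplicity; the DIS standard defines more elaborate extrapolation schemes.

    import numpy as np

    # Illustrative thresholds; real systems tune these per application.
    MAX_DISTANCE = 0.5   # threshold distance (metres)
    MAX_ANGLE = 5.0      # threshold angle (degrees)

    def extrapolate(position, orientation, velocity, angular_velocity, dt):
        """Dead-reckon a rigid ghost object one step ahead (first order)."""
        new_position = position + velocity * dt
        new_orientation = orientation + angular_velocity * dt
        return new_position, new_orientation

    def needs_update(real_pos, real_orient, ghost_pos, ghost_orient):
        """Transmit a correction only when the ghost drifts past a threshold."""
        distance = np.linalg.norm(real_pos - ghost_pos)
        angle = np.max(np.abs(real_orient - ghost_orient))
        return distance > MAX_DISTANCE or angle > MAX_ANGLE

Each participant runs extrapolate() on its ghost copies every frame, and the owner of an object calls needs_update() to decide whether a correction message must be sent.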

3. Dead-Reckoning Algorithm for the Virtual Body

3.1. Overview of Body Representation in SNHC

Real-time representation and animation of virtual human figures has been a challenging and active area in computer graphics [Boulic95][Badler93]. Typically, an articulated structure corresponding to the human skeleton is needed for the control of the body posture. Surfaces representing the body shape have to be attached to the skeleton, and clothes may be wrapped around the body shape. The Face and Body Animation Ad-Hoc Group within the MPEG-4 SNHC group has proposed a set of parameters to animate the virtual human body, called BAPs (Body Animation Parameters) [MPEG96]. The BAPs, if correctly interpreted, will produce reasonably similar high-level results in terms of body posture and animation on different body models, without the need to initialize or calibrate the model. No assumption is made and no limitation is imposed on the number of articulations (joints) in the human body model or on the range of motion of the joints. In other words, the human body model should be capable of supporting various applications, from realistic simulation of human motions to network games using simple human-like models. Four types of BAPs are defined:

Global Positioning Domain Parameters: These are the global position and orientation values of particular observable points on the body, in the body coordinate system. Possible choices: top of head, back of neck (C7-T1), mid-clavicle, shoulders (acromion), elbow, wrist, pelvis (L3-L4), hip, knee, ankle, bottom of mid-toe.

Joint Angle Domain Parameters: These parameters comprise the joint angles connecting different body parts. Possible candidates: toes, ankle, knee, hip, the spine (C1-C7, T1-T12, L1-L5), shoulder, clavicle, elbow, wrist, and the fingers. The detailed joint list, with the rotation normals, is given in [MPEG96]. The rotation angles are assumed to be positive in the counterclockwise direction with respect to the rotation normal, and are defined as zero in the default posture, which is the standing posture. Note that the normals of rotation move with the body; they are fixed with respect to the parent body part. That is, the axes of rotation are not aligned with the body or world coordinate system, but move with the body parts.

Hand and Finger Parameters: The hand is capable of performing complicated motions, and there are at least fifteen joints in the hand, not counting the carpal part. The inclusion of the fifteen joint values, e.g. available from a CyberGlove, was proposed in [MPEG96].

Force Parameters: Force parameters should be part of the human animation system, so that human body animations can be generated by applying forces of a certain magnitude and direction to specific places on the body model. The force parameters are given by specifying the direction and magnitude of the force, and the application point on the body. The position and orientation are in the body coordinate system. No assumption is made on the application point of the force (i.e. it can be on the surface, on the skeleton, or in between).

High Level Parameters: High level parameters can be used to define high-level expressions or motions, without having to describe them by lower-level parameters. The set of high-level parameters and their input values have not been defined yet.
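For concreteness, a record grouping these parameter types might be represented as follows; the field names and the 74 joint-angle degrees of freedom (used later in Section 3.3) are our own illustrative choices, not the MPEG-4 bitstream syntax.

    from dataclasses import dataclass, field
    from typing import Optional

    NUM_JOINT_DOFS = 74  # joint angle domain dofs, as assumed in Section 3.3

    @dataclass
    class BAPMessage:
        """Illustrative grouping of the BAP types described above."""
        # Joint angle domain: one angle per dof, in degrees, zero in the
        # default standing posture.
        joint_angles: list = field(default_factory=lambda: [0.0] * NUM_JOINT_DOFS)
        # Global positioning domain: observable point -> (x, y, z) in the
        # body coordinate system.
        global_positions: dict = field(default_factory=dict)
        # Hand and finger parameters: fifteen joint values per hand.
        hand_joints: Optional[list] = None
        # Force parameters: (application_point, direction, magnitude) triples.
        forces: list = field(default_factory=list)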

[Figure 1 depicts the joint hierarchy of the virtual human figure, with joints including head_top, head, vc7-vc8, l/r_clavicle, l/r_scapula, l/r_shoulder, l/r_elbow, l/r_wrist, l/r_hand_center, the spine (vt1, vt4-vt6, vl2-vl3), pelvis, l/r_cp_pelvic, l/r_hip, l/r_knee, l/r_ankle, l/r_subtalar, l/r_mid_foot, and l/r_toe.]

Figure 1: Virtual Human Figure Representation

In this paper, we only consider the joint angle domain parameters; however, the technique can also be adapted to the other types of parameters.

3.2. Dead-Reckoning for the Virtual Human Body

The dead-reckoning algorithm on virtual human figures works on their position and body joint angles. There are two main possible levels of dead-reckoning for the virtual human body:

- Action-independent dead-reckoning: This approach requires no knowledge of the type of action the figure is executing (e.g. walking, grasping) and no information on the motion control method (e.g. inverse kinematics, dynamics, real-time motion capture). The BAPs of the virtual body are considered to be the only available information, and the dead-reckoning computations are performed on this information.

- Action-based dead-reckoning: The algorithms within this approach know that the actor is performing a particular action (e.g. walking), together with parameters of the actor's state (e.g. tired). There has been some work on the automatic recognition and synthesis of participants' actions [Unuma95][Tosa96]. In this type of dead-reckoning, the algorithm uses the parameters of the current motion (for example, speed and direction for walking) and uses higher-level motion control mechanisms (e.g. a walking motor [Boulic90]) to obtain the motion.

In this paper, we provide a joint-level dead-reckoning algorithm. In order to predict in-between postures between messages, we use a Kalman filter.

3.3. Kalman Filtering

The Kalman filter is an optimal linear estimator that minimizes the expected mean-square error in the estimated state variables. It provides a means to infer the missing information from noisy measurements, and is used for predicting the future course of dynamic systems. Its efficient recursive formulation allows the algorithm to keep up with the real-time requirements of posture prediction. For further information on Kalman filtering, see [Brown92]. Previously, in the virtual reality field, the Kalman filter has been applied to decrease the lag between tracking and display in head trackers; for an overview, see [Azuma95][Foxlin96].

There is considerable freedom in modeling the system, depending on the knowledge of the modeled processes. In our system, we make the following assumptions: at a given time frame, only the joint angles of the body are available, and their velocity and acceleration are to be computed. The 74 degrees of freedom can be decomposed into 74 independent 1-dof values, each using a separate predictor, which makes the system simpler. For this, we assume that the changes in joint angles across the prediction interval are small; therefore it is possible to represent rotations by yaw, pitch, and roll operations where the order of operations is not important. Based on these assumptions, we use a linear Kalman filter of the Gauss-Markov type. This allows us to have a simple solution without requiring any further information or making assumptions about the type of action that the virtual body is performing.
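To make the per-joint predictor concrete, the following is a minimal sketch of one such 1-dof filter, with a constant-acceleration state model (angle, angular velocity, angular acceleration). The class name and the noise covariances q and r are our own illustrative choices; the paper does not specify them and they would need tuning.

    import numpy as np

    class JointAnglePredictor:
        """Linear Kalman filter for one joint angle (1 dof).

        State x = [angle, velocity, acceleration], propagated with a
        constant-acceleration model; one instance per degree of freedom
        (74 filters for the whole body)."""

        def __init__(self, dt, q=1e-4, r=1e-2):
            # State transition over a time step dt (constant acceleration).
            self.F = np.array([[1.0, dt, 0.5 * dt * dt],
                               [0.0, 1.0, dt],
                               [0.0, 0.0, 1.0]])
            self.H = np.array([[1.0, 0.0, 0.0]])  # only the angle is measured
            self.Q = q * np.eye(3)                # process noise (placeholder)
            self.R = np.array([[r]])              # measurement noise (placeholder)
            self.x = np.zeros(3)
            self.P = np.eye(3)

        def predict(self):
            """Extrapolate one step ahead; returns the predicted angle."""
            self.x = self.F @ self.x
            self.P = self.F @ self.P @ self.F.T + self.Q
            return self.x[0]

        def update(self, measured_angle):
            """Correct the state with a newly received joint angle."""
            y = measured_angle - self.H @ self.x        # innovation
            S = self.H @ self.P @ self.H.T + self.R     # innovation covariance
            K = self.P @ self.H.T @ np.linalg.inv(S)    # Kalman gain
            self.x = self.x + (K @ y).ravel()
            self.P = (np.eye(3) - K @ self.H) @ self.P

A ghost body then holds 74 such predictors: predict() is called at every time step, and update() only when a message arrives.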

3.4. Dead-Reckoning Algorithm for the Virtual Body

The dead-reckoning algorithm is as follows:

    for each participant p do
        initialize Kalman filters for body p
    endfor

    at each time step do
        for each participant p do
            if (p == mybody) then
                /* Compute measured body joints -> store in body[p] */
                body[p] = Measure()
                /* Compute predicted body joints of local body */
                ghost_body = Kalman(ghost_body)
                /* Compare body[p] and ghost_body joint angles */
                delta = compare(body[p], ghost_body)
                if (delta > maximum_allowed_delta) then
                    send message m with body joints of body[p]
                    copy body[p] joints to ghost_body joints
                endif
            else
                if (message m arrived from participant p) then
                    copy message m joint angle contents to body[p]
                endif
                body[p] = Kalman(body[p])
            endif
        endfor
    end

Figure 2: Dead-reckoning algorithm for the virtual body

Note that, even if participant Y receives a message from participant X at time frame i, it still performs the Kalman filter computations for body X. This is due to the delay between the time at which participant X's body posture is obtained and the time at which participant Y receives the message containing this information. One important decision is the selection of a metric for comparing two body postures. The common practice has been to compare two postures by eye; however, the dead-reckoning algorithm requires a mathematical comparison of joint angles. There are many possible comparison metrics, among them:

1. Maximum of differences between corresponding angles:

   delta = max over joint = 1..74 of | body[mybody][joint] - ghost_body[joint] |

2. Maximum of differences between corresponding angles, with a different coefficient for each joint:

   delta = max over joint = 1..74 of coef[joint] * | body[mybody][joint] - ghost_body[joint] |

3. 3D distance between corresponding joints.

Approach 1 assumes that every angle has equal importance for the posture. However, in most cases, some angles have only a slight effect on the overall posture (for example, in a hand-waving posture, the wrist angles have a small effect). Approach 2 tries to take this into consideration by assigning a coefficient to each degree of freedom. The third approach uses the 3D position of each joint and computes the linear distance between corresponding joints. The third approach is expected to achieve better results as a metric for comparing two postures, because it takes into consideration the 3D positions, similar to comparison by eye, and the overall posture of the body. In the next section, we compare the performance of these approaches on example sequences.
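A sketch of the three metrics follows; for approach 3 we assume (our assumption, not given in the paper) that a forward_kinematics function is available to map joint angles to 3D joint positions.

    import numpy as np

    def max_angle_diff(body, ghost):
        """Approach 1: largest absolute joint angle difference."""
        return np.max(np.abs(np.asarray(body) - np.asarray(ghost)))

    def weighted_max_angle_diff(body, ghost, coefs):
        """Approach 2: each dof weighted by its importance coefficient."""
        diffs = np.abs(np.asarray(body) - np.asarray(ghost))
        return np.max(np.asarray(coefs) * diffs)

    def max_joint_distance(body, ghost, forward_kinematics):
        """Approach 3: largest Euclidean distance between corresponding joints.

        forward_kinematics maps a joint angle vector to an (n_joints, 3)
        array of 3D joint positions in the body coordinate system."""
        pos_body = forward_kinematics(body)
        pos_ghost = forward_kinematics(ghost)
        return np.max(np.linalg.norm(pos_body - pos_ghost, axis=1))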

4. Dead-Reckoning for the Face

4.1. Overview of the Face in SNHC

Similar to the body, the Face and Body Ad Hoc Group has decided on a set of Facial Animation Parameters (FAPs) to animate faces. The FAPs are based on the study of minimal facial actions and are closely related to muscle actions. They represent a complete set of basic facial actions, and therefore allow the representation of most natural facial expressions.

Figure 3: FAP Units in MPEG-4 SNHC

All the parameters involving translational movement are expressed in terms of Facial Animation Parameter Units (FAPUs). These units are defined in order to allow interpretation of the FAPs on any facial model in a consistent way, producing reasonable results in terms of expression and speech pronunciation. The FAPUs are listed in [MPEG96]; they correspond to fractions of distances between some key facial features, and the fractions are chosen to allow enough precision. For each FAP, the list contains the name, a short description, whether the parameter is unidirectional (can have only positive values) or bidirectional, the definition of the direction of movement for positive values, and the definition of the measurement units. The measurement units are either degrees or one of the FAPUs [MPEG96]. One parameter (depress_chin) is defined in terms of an intensity ranging from 1 to 10. The parameter set contains three high-level parameters. The viseme parameter makes it possible to render visemes on the face without having to express them in terms of other parameters, or to enhance the result of other parameters, ensuring the correct rendering of visemes. The full list of visemes is not defined yet. Similarly, the expression parameter allows the definition of high-level facial expressions.

4.2. Dead-Reckoning for the Face

Currently the dead-reckoning technique for the face is at an initial stage. For the face, we propose a dead-reckoning approach similar to that for the virtual body. Typically, each FAP in the FAP set can be assigned to a separate Kalman filter. Thus, there would be as many Kalman filters as the number of FAPs, each containing the current value, velocity, and acceleration of the FAP. As we are still developing the dead-reckoning technique for the face, we cannot yet provide more details on the face filter.
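Under these assumptions, the face filter bank could reuse the 1-dof predictor sketched in Section 3.3, one instance per FAP. The FAP count and frame rate below are placeholders, since the final FAP set was not fixed at the time of writing.

    NUM_FAPS = 68        # placeholder; the final FAP count is not yet fixed
    DT = 1.0 / 25.0      # assumed frame interval (25 fps)

    fap_filters = [JointAnglePredictor(dt=DT) for _ in range(NUM_FAPS)]

    def predict_faps():
        """Extrapolate all FAPs one frame ahead."""
        return [f.predict() for f in fap_filters]

    def update_faps(received_faps):
        """Correct the filters when a FAP message arrives from the bitstream."""
        for f, value in zip(fap_filters, received_faps):
            f.update(value)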

5. Experimental Results

We chose the joint-level dead-reckoning technique as the initial implementation and tested its performance under varying conditions using the VLNET (Virtual Life Network) system that we have been developing [Thalmann95]. As representative examples, we selected three actions: a football kick sequence, a jumping sequence, and a hello (hand-wave) sequence. Figure 4 shows the three actions, with their joint changes with respect to time. The football kick action consists of a slightly jerky motion, due to the nature of the behavior and the tracking noise. The jumping sequence is more predictable, except at the beginning of the action. The hand-wave sequence mainly involves the right arm joints, with various joint behaviors with respect to time.

Figure 5 shows the performance of the dead-reckoning program for the three example actions, with the three approaches to posture comparison discussed in the previous section. The x-axes show the maximum angle difference between corresponding joint angles of the local body and the ghost body in Figures 5(a) and 5(b), and the maximum Euclidean distance between corresponding joints in Figure 5(c). The y-axis denotes the percentage of timesteps in which the actions caused a message transfer, relative to the whole period of the motion. A percentage of 100% denotes that the dead-reckoning operation has not been performed, and 70% shows that 30% of the timesteps were successfully predicted using the dead-reckoning operation. Figure 5(a) shows the results for the basic algorithm, with varying maximum allowed angle differences between joints. As expected, as the limit increases, the algorithm prediction rate increases, hence the message communication decreases. The results in Figure 5(b) were obtained using approach 2 for posture comparison, by decreasing the coefficients of the twisting angles, on the assumption that they have less effect on the final posture. Figure 5(c) shows the results of approach 3 with varying maximum Euclidean distances. The resulting animation was also similar to the original motion when observed by eye. The results show that using the distance metric for comparison achieves better dead-reckoning performance than joint angle comparison: with a maximum error estimate of 15 cm, a 50% decrease in the exchange of messages can be achieved.

6. Integration of Dead-Reckoning in MPEG-4 SNHC

Figure 6 illustrates the block diagram of an object decoder and its input-output possibilities, taken from [MPEG96b]. The object receives the animation parameters from the input stream, and outputs the final produced mesh and surfaces to the video renderer, which is later composited with other AV objects. The Body object can also receive local controls that can be used to modify the look or behavior of the body locally, by a program or by the user. There are three possibilities for local control. First, by locally sending a set of BDPs to the Body, the shape and/or texture can be changed. Second, a set of Amplification Factors can be defined, each factor corresponding to an animation parameter in the BAP set. The Body object will apply these Amplification Factors to the BAPs, resulting in amplification or attenuation of selected body actions. The third local control is provided through the definition of a Filter Function. This function, if defined, is invoked by the Body object immediately before each rendering. The Body object passes the original BAP set to the Filter Function, which applies any modification to it and returns it to be used for the rendering. The Filter Function can include user interaction. It is also possible to use the Filter Function as a source of body animation if there is no bitstream to control the body, e.g. in the case where the body is driven by a sign language system using text as input.

With this architecture, the dead-reckoning process can be easily integrated with the synthetic object through this local control input, as sketched below. The object should be able to pass the latest received animation parameter set to the dead-reckoning module, and the dead-reckoning module passes back the extrapolated parameters at each time step. This requires two-way communication for local control, as well as further information such as the timestep of the animation. Therefore, there is a need to extend the functionalities of the synthetic objects for these tasks.
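As an illustration of this coupling, a hypothetical Filter Function hook might look as follows. The interface names are ours; [MPEG96b] defines the mechanism but not a concrete API.

    class DeadReckoningModule:
        """Hypothetical glue between the Body object's Filter Function hook
        and the dead-reckoning predictors (cf. Section 3.3)."""

        def __init__(self, predictors):
            self.predictors = predictors   # one 1-dof Kalman filter per BAP

        def on_bitstream_update(self, baps):
            """Called when a fresh BAP set arrives from the bitstream."""
            for p, value in zip(self.predictors, baps):
                p.update(value)

        def filter_function(self, baps, timestep):
            """Invoked by the Body object immediately before each rendering.

            baps is the original (possibly stale) BAP set passed in by the
            Body object; timestep is the extra information the text argues
            must be exchanged (a full implementation would re-derive the
            transition model from it). Here we simply return the
            extrapolated parameter set."""
            return [p.predict() for p in self.predictors]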

7. Conclusions and Future Work

In this paper, we have proposed the use of the dead-reckoning algorithm within MPEG-4 SNHC for synthetic objects, particularly simple objects, faces, and bodies. The results obtained show that, with acceptable errors in the posture information, it is possible to decrease the network communication overhead considerably. It was also shown that the performance of the general-purpose filter depends strongly on the characteristics of the instantaneous motion. The work on networking virtual human figures is just beginning, and it is necessary to develop adaptive dead-reckoning algorithms that adjust their parameters depending on the current motion. These developments should further improve the network performance for communicating synthetic object data.

Figure 4: Example sequences and the joint value changes with respect to time: (a) football kick, (b) hello, (c) jump sequence.

Figure 5: Performance of the Kalman filter with varying delta values. The y-axis shows the percentage of message communication; the x-axes show (a) delta angle in degrees for approach 1 (maximum of joint angle differences), (b) delta angle in degrees for approach 2 (maximum of joint angle differences with per-joint coefficients), and (c) distance in cm for approach 3 (Euclidean distance between corresponding joints).

Figure 6: Inputs and outputs of a synthetic object in MPEG-4 SNHC. The input "Local Control and Interaction" can be used for interfacing the dead-reckoning program with the object.

Acknowledgements

The authors would like to thank the members of the MPEG-4 SNHC Body and Face Ad Hoc Group; Ronan Boulic for his help with the body definition in MPEG-4; Tom Molet for his Flock of Birds interface and motion sequences; Jean-Michel Puiatti for his basic implementation of the Kalman filter; and the assistants at LIG and MIRALab for their contributions to the human models. This work is partially supported by the European ACTS VIDAS and COVEN projects.

References

[Azuma95] R. Azuma, G. Bishop, "A Frequency-Domain Analysis of Head-Motion Prediction", Proc. ACM SIGGRAPH '95, 1995.

[Badler93] N. I. Badler, C. B. Phillips, B. L. Webber, Simulating Humans: Computer Graphics Animation and Control, Oxford University Press, 1993.

[Boulic90] R. Boulic, N. Magnenat-Thalmann, D. Thalmann, "A Global Human Walking Model with Real Time Kinematic Personification", The Visual Computer, Vol. 6(6), 1990.

[Boulic95] R. Boulic et al., "The HUMANOID Environment for Interactive Animation of Multiple Deformable Human Characters", Proc. Eurographics '95, 1995.

[Brown92] R. G. Brown, P. Y. C. Hwang, Introduction to Random Signals and Applied Kalman Filtering, 2nd edition, John Wiley & Sons, 1992.

[Capin97] T. K. Capin, I. S. Pandzic, N. Magnenat Thalmann, D. Thalmann, "A Dead-Reckoning Algorithm for Virtual Human Figures", Proc. IEEE VRAIS '97, 1997.

[Doenges97] P. Doenges, T. K. Capin, F. Lavagetto, J. Ostermann, I. S. Pandzic, E. Petajan, "MPEG-4: Audio/Video and Synthetic Graphics/Audio for Real-Time, Interactive Media Delivery", Image Communications Journal, 1997 (to appear).

[Foxlin96] E. Foxlin, "Inertial Head-Tracker Sensor Fusion by a Complementary Separate-Bias Kalman Filter", Proc. IEEE VRAIS '96, 1996.

[Gossweiler94] R. Gossweiler, R. J. Laferriere, M. L. Keller, R. Pausch, "An Introductory Tutorial for Developing Multiuser Virtual Environments", Presence: Teleoperators and Virtual Environments, Vol. 3, No. 4, 1994.

[IEEE93] Institute of Electrical and Electronics Engineers, ANSI/IEEE Standard 1278-1993, Standard for Information Technology, Protocols for Distributed Interactive Simulation, March 1993.

[Macedonia94] M. R. Macedonia, M. J. Zyda, D. R. Pratt, P. T. Barham, "NPSNET: A Network Software Architecture for Large-Scale Virtual Environments", Presence: Teleoperators and Virtual Environments, Vol. 3, No. 4, 1994.

[MPEG96] ISO/IEC JTC1/SC29/WG11 M1365, "Face and Body Definition and Animation Parameters", MPEG, 1996.

[MPEG96b] ISO/IEC JTC1/SC29/WG11 M1364, "Draft Specification of SNHC Verification Model 1.0", MPEG, 1996.

[Thalmann95] D. Thalmann, T. K. Capin, N. Magnenat Thalmann, I. S. Pandzic, "Participant, User-Guided, Autonomous Actors in the Virtual Life Network VLNET", Proc. ICAT/VRST '95, pp. 3-11, 1995.

[Tosa96] N. Tosa, R. Nakatsu, "The Esthetics of Artificial Life: Human-Like Communication Character 'MIC' and Feeling Improvisation Character 'MUSE'", Proc. Artificial Life, 1996.

[Unuma95] M. Unuma, K. Anjyo, R. Takeuchi, "Fourier Principles for Emotion-Based Human Figure Animation", Proc. ACM SIGGRAPH '95, 1995.
