
Synthetic Actors in Real World: Animating Virtual Actors in Real Environments

Nadia Magnenat Thalmann, MIRALab, University of Geneva
Daniel Thalmann, Computer Graphics Lab, EPFL

Abstract

This paper provides a detailed description of how animated virtual actors can be merged into a real environment. It describes the tasks involved in each stage of the integration: video acquisition, extraction of the camera parameters, creation and animation of the virtual actors, and rendering of the final images. The most important problems are discussed: real objects hidden by virtual actors and virtual actors hidden by real objects, collision detection between the virtual actor and the real environment, correspondence between the real and the virtual cameras, and casting shadows of the virtual actors onto the real world. Case studies are presented, such as the virtual actress Marilyn walking with real people on a real street or sitting down on a real chair.

Keywords: virtual actors, augmented reality, hidden surfaces, virtual camera, shadows

Introduction

Computer-generated images are used more and more for advertising, simulation and special effects. Virtual or synthetic actors have also been created in the last few years. More recently, image processing has become popular for digital warping [2] and morphing [3], and for inserting real actors into existing films, as in the movie Forrest Gump. Another popular approach, especially in advertising, combines computer-generated images with real images. However, very few experiments have been made in animating virtual actors in the real world. Film industry companies have successfully produced films like Jurassic Park and The Mask, but the complete process of integrating virtual actors into real scenes has not been described in detail. This paper describes, step by step, the techniques for creating scenes involving realistic virtual actors in a real environment. Although we are not aware of a similar work described in detail in the scientific literature, we may mention interesting related work that is useful to our approach, especially in computer vision [4, 5, 6]. The use of reference points corresponds to the general problem of camera calibration, which is extensively described in the robotics and photogrammetry literature [7].

Such techniques compute a camera's viewing parameters by matching points in the 2-D images with corresponding 3-D points on the object. Similar problems are also present in Augmented Reality [8, 9]; however, the problems are slightly different, because criteria of time are more important than criteria of image quality. Pentland et al. [10] describe how they extracted a 3-D model of a building from a video clip of a walk-around outside the Media Laboratory at MIT. The recovered 3-D locations of feature points were used to compute 3-D polygons for the building surfaces visible in the scene. The 2-D vertices of the polygons that define each face were manually identified in one video frame, and the recovered camera position for that frame was used to back-project the 2-D vertices onto the appropriate 3-D plane, resulting in 3-D vertices for the polygons in scene coordinates. Although the approach is very different, we should also mention interactive video environments like the ALIVE system [11]. In this environment, autonomous virtual agents and the user can "see" each other: users can see the agents on the video screen, and the agents can see users through a computer vision system. An image of the user appears on the video screen, creating a kind of "magic mirror" in which users see themselves in a different world. Before other processing can occur, the vision system must isolate the figure of the user from the background, using low-level image processing techniques to detect differences in the scene and connected-components analysis to extract objects. Fellous [12] also describes an experiment involving real people in a synthetic scenery using a blue-box approach, as explained in the next section.

The technical descriptions in this paper are based on several sequences produced at the University of Geneva and the Swiss Federal Institute of Technology. All the scenes produced involve the synthetic Marilyn in various real situations: walking behind and in front of real objects, walking with two students in a very well-known place in Geneva, entering a laboratory through the window, sitting down on a real chair, and walking along the shore of Lake Geneva with an autonomously animated dress.

A general view of the problem

There are two ways to combine computer-generated images with real ones. One way consists of taking the real world into account during the generation of the images by the computer. The other, more common, way consists of composing the images without taking the real world into account. In this second case, the method simply superimposes virtual objects over the real world. Two kinds of processes are possible: analog and digital. In the analog, or video, process, a chromatic threshold is determined on the first source to cut it out, and the value of the second source is substituted. This is essentially the blue-box approach: the background of the computer-generated image is set to a specific color, say blue, which none of the virtual objects use. The combining step then replaces all blue areas with the corresponding parts of the real images. In the digital process, each pixel of the image is coded and can be tested and replaced; a numerical threshold, masks or a transparent background may be used. The most powerful approach is of course an image synthesis system working on digital images. Such a system should provide facilities for editing, composing, animating, deforming and modifying sequences of digital images. It also requires bi-directional connections with broadcast video systems, high-quality images, large storage with fast access to images, and a high-level user interface.
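As a concrete illustration of the digital process just described, here is a minimal sketch of pixel-level keying written in Python with NumPy. The language, library, function name and tolerance value are our own assumptions; the paper does not name any implementation. Every pixel of the synthetic frame that is close enough to the reserved background color is replaced by the corresponding pixel of the real frame.

```python
import numpy as np

def chroma_composite(synthetic, real, key=(0, 0, 255), tol=30):
    """Replace every 'key'-colored pixel of the synthetic frame by the real frame.
    Both frames are assumed to be aligned 8-bit RGB arrays of the same size."""
    synthetic = synthetic.astype(np.int16)
    # A pixel belongs to the reserved background if all channels are within 'tol' of the key color.
    background = np.all(np.abs(synthetic - np.array(key)) < tol, axis=-1)
    # Keep the real image where the background was detected, the synthetic image elsewhere.
    out = np.where(background[..., None], real, synthetic)
    return out.astype(np.uint8)
```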

For creating scenes involving virtual actors in the real world, we should really take the real world into account during the generation of the images by the computer. For example, consider a virtual actor passing behind a real tree: in some images of the actor, part of the body should be hidden. For more realism, the shadow of the actor should be cast on the real floor. This means that the computer-generated images are dependent on the real world. One way of solving these problems is to create virtual objects similar to the real ones and a virtual camera corresponding to the real camera used to shoot the real scene. However, this correspondence is generally hard to establish. In summary, the virtual actor should be integrated into the real world using the same parametric conditions as in reality. This means that several interesting problems should be solved:

- "collision detection" between the virtual actor and the real environment, e.g. a virtual actor walking on a real street or sitting down on a real chair;
- processing the hidden surfaces, which means real objects hidden by virtual actors and virtual actors hidden by real objects;
- adapting the sizes of the virtual actors to the dimensions of the real world;
- making the rendering of the virtual actor similar to the representation of the real world (photo or video);
- casting shadows of the virtual actors on the real world;
- if there is a camera motion, making a correspondence between the virtual camera and the real one.

We will now explain the main steps to produce a sequence involving a virtual actor in a real scene. As shown in Figure 1, the procedure may be decomposed into several steps. First, the real scene has to be digitized in 2D using video acquisition. Some elements of this real scene may have to be built in 3D in the computer in order to evaluate contact points, to detect hidden surfaces, or to generate shadows. From the 2D pictures and the 3D coordinates, the characteristics of the real camera and other parameters have to be extracted. Then the virtual actor has to be created and his motion designed. Using the parameters extracted from the real scene, images of the virtual actor are calculated using a rendering process. At this stage, 3D elements may be involved for hidden surface removal and shadow generation. Image processing techniques allow the blending of the computer-generated images with the real images. Finally, the combined images have to be recorded on film or video.

Figure 1. Diagram of the Real/Virtual procedure

Video acquisition and 3D modeling

Before shooting a sequence, it is necessary to select a decor well adapted to this sequence. For example, if we want to produce a sequence with Marilyn walking in a real decor, we need a flat floor and reference points in the decor. The main reason to require a flat floor is that modelling walking on a non-flat floor is difficult, and our walking software requires a flat floor. There is a second important reason: the floor must be simple enough to be modeled in the computer, so that the contact points between Marilyn's feet and the floor can be estimated. Figure 2 shows an example of a real scene we used. The reference points in the decor are mainly used for calculating the camera parameters, as we will see in the section on the extraction of camera parameters. It is also important to make sure that the real objects that can hide the virtual actors are simple, as they have to be modeled in the computer. It is also better to select diffuse lighting in order to minimize the impact of shadows.

Figure 2. Example of a real decor

The construction of 3D elements of the real scene is a tedious task and should be restricted to the minimum. However, it is necessary in some specific cases. If a real object hides the virtual actor, one solution consists of modeling the object in 3D in order to use the standard hidden surface removal algorithm at the rendering stage. It is also necessary to model 3D objects when the virtual actor is in contact with them. For example, when an actor walks on the floor or climbs a mountain, we should model the floor or the mountain. If the actor should sit down on a chair, we have to build the chair. Finally, if a real object casts shadows on a virtual actor, we have to build the real object. This is also true for a virtual actor's shadow cast on a real object. One question arises: how do we build the necessary 3D objects? Unfortunately, the automatic construction of 3D elements from 2D images is still a complex problem, and in our case we use a manual process.

Extraction of Camera Parameters

Static camera

To combine a synthetic object with a photographed scene, it is necessary to render the object from the same point of view as was used to take the photo. In other words, the virtual camera of the image synthesis system should correspond to the real camera used for the photo. A virtual camera is generally a simplification of a real camera: it has a position (eye), a view direction (interest point), an orientation (spin), a view angle (zoom) and clipping planes. A real camera has the same characteristics, but also optical distortions, focus control and exposure time. In the case of a static camera (no camera motion), the camera parameters can be determined by an algorithmic method. A first technique was described by Rosebush and Kushner [13], but it could only determine the camera position. In our case, we want to determine the following 10 characteristics: the 3 coordinates of the center of the camera, the 3 angles of its orientation, the view angle, the two dimensions of the projection screen, and the ratio r between the width and height of a pixel. Our approach, based on Bogart's method [14], consists of an iterative process that determines the characteristics of the real camera from the 2D image coordinates of 3D points in the scene. The method requires that at least five points be visible in the photo and that the 3D coordinates of those points be known. Later on, when a computer-generated object is modeled, it must be in the same 3D space as the photographed objects and must use the same units. For each of the five points, the 2D screen position must be found. This can be done by examining the photo or by using image processing techniques to detect specific hues in the photo. As Bogart's method is iterative, the 2D points do not need to be accurate to sub-pixel detail. The final set of camera parameters will project the given 3D points to 2D locations that have the minimum error with respect to the given 2D screen points. At the beginning of the iteration, fictitious values are given to the camera characteristics. With the help of these values and the 3D coordinates of the points, we calculate the 2D coordinates of these points on the projection plane. Because of the inexact values of the camera characteristics, we get 2D coordinates which do not match the real 2D coordinates measured on the image. Using the difference between the real and the calculated coordinates, we can calculate new characteristics for the camera. With these new characteristics, we obtain new 2D coordinates for the points, and we continue until the difference between the real and calculated 2D coordinates is less than a given threshold. More details may be found in the appendix.
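The following Python sketch illustrates the kind of iteration described above and detailed in the appendix. The pinhole projection model, the default image width, and all function names are our own stand-ins for the paper's equations; the loop simply projects the reference points with the current parameter guess, compares them with the measured 2D positions, and applies a least-squares correction computed from a numerical Jacobian. The optional `free` argument anticipates the constrained cases used for camera motion in the next subsection, where some parameters are frozen.

```python
import numpy as np

def rotation(phix, phiy, phiz):
    """Rotation matrix from the three orientation angles (radians)."""
    cx, sx, cy, sy, cz, sz = np.cos(phix), np.sin(phix), np.cos(phiy), np.sin(phiy), np.cos(phiz), np.sin(phiz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def project(params, points3d, width=720):
    """Simplified pinhole projection of 3D points for a given parameter vector
    (xc, yc, zc, phix, phiy, phiz, theta, u0, v0, r); theta is in radians."""
    xc, yc, zc, phix, phiy, phiz, theta, u0, v0, r = params
    cam = (points3d - np.array([xc, yc, zc])) @ rotation(phix, phiy, phiz).T
    f = (width / 2.0) / np.tan(theta / 2.0)     # focal length expressed in pixels
    u = u0 + f * cam[:, 0] / cam[:, 2]
    v = v0 + r * f * cam[:, 1] / cam[:, 2]
    return np.column_stack([u, v]).ravel()      # (u1, v1, ..., un, vn)

def calibrate(params, points3d, measured2d, free=None, tol=1e-3, eps=1e-6):
    """Iterative refinement of the camera parameters; 'free' optionally restricts
    the unknowns (e.g. only the view angle for a zoom, only the angles for a pan)."""
    params = np.asarray(params, dtype=float)
    free = np.arange(len(params)) if free is None else np.asarray(free)
    target = np.asarray(measured2d, dtype=float).ravel()
    for _ in range(100):
        base = project(params, points3d)
        residual = target - base
        if np.linalg.norm(residual) < tol:
            break
        # Numerical Jacobian of the projected 2D coordinates w.r.t. the free parameters.
        J = np.zeros((len(target), len(free)))
        for k, idx in enumerate(free):
            bumped = params.copy()
            bumped[idx] += eps
            J[:, k] = (project(bumped, points3d) - base) / eps
        # Least-squares correction of the free parameters (Gauss-Newton step).
        params[free] += np.linalg.lstsq(J, residual, rcond=None)[0]
    return params
```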

Dynamic camera

When the camera moves during a sequence involving virtual actors in a real world, the problem is much more difficult, because it requires coordinating a real camera with a virtual one. The problem may be solved using special hardware equipment or by software. One hardware solution consists of fixing the real camera to a robot arm driven by the computer. By modeling the components of the real world in the virtual one, the virtual camera may be animated by the computer, and the real camera follows it thanks to the robot. The geometric coherence and the accuracy are guaranteed; however, the system is expensive and limited by the robot motion. Another possibility is to slave the virtual camera to the real camera, as in the SYNTHETIC TV system [7]. This is a dynamic coupling obtained by placing sensors on the real shooting system that continuously inform the computer of the actions executed on this real system. The real camera motion is thus transmitted to the computer, which calculates the corresponding parameters. A wireframe image of the virtual world is generated using the current viewpoint and composed into the camera sight. The synthetic images are then calculated with the geometric distortions due to the real camera. We did not use a hardware solution because of the price and the limitations of these approaches. Our software solution is an extension of the Bogart algorithm to specific camera movements. The idea is to fix a few parameters depending on the type of camera motion. The technique consists in first calculating the first frame using the algorithm for a static camera; for the next frames, the fixed parameters keep the values of the first frame and the remaining parameters are calculated using the same algorithm. A complete new iteration is initiated when the fixed parameters no longer correspond to reality. The advantage is that we reduce the number of parameters involved in the iteration, which reduces the Jacobian calculation (see appendix) and allows us to decrease the number of reference points in the real world. We applied the technique to three cases:

- zoom: only the distance between the camera center and the projection plane changes;
- pan: the camera is fixed at a point and only a rotation is performed;
- tilt: only a translation is performed.

For example, in a zoom sequence, the first frame has 10 unknowns and needs 5 reference points, but each following frame has only one unknown and needs only two reference points. In the case of a traveling shot, we have 3 unknowns and need two reference points.

Problems of this approach

Points in the real world should be selected so that at least ten points are visible in the camera field during the camera motion. Moreover, the iteration process requires non-coplanar points. It is enough to compute the camera position for only one frame out of four using the algorithm; in-between positions can easily be calculated using spline interpolation. The images we get appear deformed during camera motion, especially at high speed. This is due to a problem with the reference points: their positions on the image are generally shifted by several pixels with respect to their true positions. This is something to take into account when defining the reference points. In theory, we need at least 5 reference points for one iteration with a static camera; in practice, for better accuracy, 10 points are recommended. For panning and travelling, the theoretical minimum is 2 points (6 in practice). A zoom only requires one reference point in theory, but 3 in practice.
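As noted above, the camera only needs to be solved at key frames (e.g. one frame out of four) and the in-between parameter vectors can be interpolated. A minimal sketch, assuming SciPy's cubic spline as the interpolant (the paper does not specify which spline is used):

```python
import numpy as np
from scipy.interpolate import CubicSpline

def interpolate_camera(key_frames, key_params, frames):
    """key_params: one solved camera parameter vector per key frame (one row each).
    Angular parameters may need unwrapping before interpolation."""
    spline = CubicSpline(key_frames, np.asarray(key_params), axis=0)
    return spline(frames)      # one interpolated parameter vector per requested frame

# Example use: keys solved every 4 frames, smooth parameters for frames 0..19.
# params = interpolate_camera([0, 4, 8, 12, 16], solved_keys, np.arange(20))
```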

Concerning the camera problem, mixing real and virtual images works very well for a static camera. During a zoom, the results are also very good, and we may have virtual actors walking in the real world. When there is a camera displacement, the results are less satisfactory. The problems are due to many inaccuracies during the process, whose causes are multiple and difficult to eliminate completely. First, there are inaccuracies in the measured 2D positions of the reference points on the images because of focus problems. Camera speed tends to increase the inaccuracy of these 2D positions. Moreover, travelling motion tends to be a little jerky due to the way the camera is moved, tied to a roller stroller. Finally, the iterative method can only provide an estimate of the results, and we use spline interpolation to obtain smooth motion. More sophisticated techniques should be investigated [15]. There is also a potential problem because parts of the real world may not be in focus. To overcome this, the virtual actors could be rendered to simulate a limited depth of field. Another problem is due to the optical distortions of the camera, especially far from the center. This problem could compromise the illusion that the virtual actors and the real world coexist. One way to correct the problem is to use image warping techniques on both the digitized video and the graphic images [16].

Creating and animating the Virtual Actor

Creating the actor

If the virtual actor does not already exist, he should be created. For the face, the operations conducted in traditional sculpture can be performed by computer on computer-generated objects. Our sculpting software is based on the Ball and Mouse metaphor [17] and allows the user to create a polygon mesh surface. When used in conjunction with a common 2-D mouse, such that the SpaceBall is held in one hand and the mouse in the other, full three-dimensional user interaction is achieved. Local deformations based on an extension of FFD [18] are applied while the SpaceBall device is used to move the object and examine the progress of the deformation from different angles; mouse movements on the screen are used to produce vertex movements in 3D space from the current viewpoint. Local deformations make it possible to produce local elevations or depressions on the surface and to even out unwanted bumps once the work is nearing completion. The technique is intended to be a metaphor analogous to pinching, lifting and moving a stretchable fabric material. Pushing the apex vertex inwards renders a believable effect of pressing a mould into clay.

For the body shape, we have recently proposed a new approach based on metaballs [19]. Metaballs (or soft objects) [20, 21], which are a particular subset of implicit surfaces, are becoming a hot topic in computer graphics. Because metaballs join smoothly and gradually, they give shape to realistic, organic-looking creations, suitable for modeling human bodies, animals and other organic figures, which are very hard to model using traditional geometric methods. In order to enhance the modeling capability of the metaball technique, we devised ways to reduce the number of metaballs needed for a desired shape and to allow the designer to manipulate the blobby surface at interactive speed. A cross-sectional isosurface sampling method, combined with B-spline blending, enables us to achieve these two goals. From a practical point of view, we use an interactive metaball editor (BodyBuilder) for shape designers. We start the shape design by first creating a skeleton for the organic body to be modeled. Metaballs are used to approximate the shape of the internal structures which have observable effects on the surface shape. Each metaball is attached to its proximal joint, defined in the joint local coordinate system of the underlying skeleton, which offers a convenient reference frame for positioning and editing metaball primitives, because the relative proportion, orientation and size of the different body parts are already well-defined.
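To illustrate why metaballs blend so smoothly, here is a toy sketch of the underlying idea: each primitive contributes a field that decays polynomially to zero at its radius of influence, the contributions are summed, and the body surface is an iso-level of the sum. The particular falloff function and threshold below are illustrative assumptions, not the BodyBuilder formulation.

```python
import numpy as np

def soft_field(point, center, radius):
    """Smooth field contribution of one metaball, zero beyond its radius of influence.
    The falloff (1 - r^2/R^2)^3 is a simple choice in the spirit of soft objects."""
    r2 = np.sum((point - center) ** 2) / radius ** 2
    if r2 >= 1.0:
        return 0.0
    return (1.0 - r2) ** 3

def inside_metaball_surface(point, balls, threshold=0.5):
    """True if 'point' lies inside the blended iso-surface of the given metaballs."""
    total = sum(soft_field(point, np.asarray(c, dtype=float), R) for c, R in balls)
    return total >= threshold

# Example: two overlapping metaballs blend into a single smooth blob.
# inside_metaball_surface(np.array([0.5, 0.0, 0.0]), [((0, 0, 0), 1.0), ((1, 0, 0), 1.0)])
```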

Motion control

This step corresponds to the animation of the virtual actor. In our case it is performed using the TRACK system. TRACK [22] is an interactive tool for the visualization, editing and manipulation of multiple track sequences. A sequence is associated with an articulated figure and can integrate different motion generators such as walking, grasping (Figure 3), inverse kinematics, dynamics and key framing within a unified framework. The integration of different motion generators is vital for the design of complex motions whose characterization can quickly change in terms of functionality, goals and expressivity. This induces a drastic change in the motion control algorithm at multiple levels: behavioral decision making, global criteria optimization and actuation of joint-level controllers. At present, there is no global approach that can reconfigure itself with such flexibility. The system provides a large set of tools for track space manipulations and Cartesian space corrections. This approach allows an incremental refinement design combining information and constraints from both the track space (usually joints) and the Cartesian space. We have dedicated this system to the design and evaluation of human motions for the purpose of animation. For this reason, we also ensure the real-time display of the 3D figure motion with a simultaneous scan of the 2D tracks. The function of TRACK can basically be divided into two parts: motion generation and track editing. Figure 4 shows the system control flow of TRACK.
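The following toy sketch illustrates the multiple-track idea: each joint carries one or more keyframed value tracks, a track can be sampled at an arbitrary time, and tracks produced by different generators can be combined with weights. It is a simplified illustration of the concept only, not the TRACK system itself; the class and function names are invented for the example.

```python
import bisect

class Track:
    """One keyframed value track for a single joint angle."""
    def __init__(self, keys):
        # keys: sorted list of (time, value) pairs.
        self.times = [t for t, _ in keys]
        self.values = [v for _, v in keys]

    def sample(self, t):
        """Linearly interpolate the track value at time t (clamped at the ends)."""
        i = bisect.bisect_left(self.times, t)
        if i <= 0:
            return self.values[0]
        if i >= len(self.times):
            return self.values[-1]
        t0, t1 = self.times[i - 1], self.times[i]
        a = (t - t0) / (t1 - t0)
        return (1 - a) * self.values[i - 1] + a * self.values[i]

def blend(tracks_with_weights, t):
    """Weighted combination of several tracks for the same joint at time t."""
    total_w = sum(w for _, w in tracks_with_weights)
    return sum(w * trk.sample(t) for trk, w in tracks_with_weights) / total_w

# Example: a walking-generator track corrected by a hand-keyed track.
# knee = blend([(walking_knee_track, 0.7), (keyframed_knee_track, 0.3)], t=1.25)
```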

Figure 3. Example of real-time grasping using TRACK

To create an animation sequence, the first step may consist of creating key positions of the scene, for example Marilyn sitting down. We start from the TRACK default position (Figure 5a), then we build the position corresponding to the actress on the chair (Figure 5b). Using the editing functions of TRACK, Marilyn is positioned and placed backwards (Figure 5c); then, with the walking function, the walking parameters are set up so that Marilyn can walk to the initial position (Figure 5d). Marilyn should first rotate by 90°, sit down (Figure 5e) and take the position of Figure 5b. To generate Marilyn standing up again and walking, we simply use the TRACK facilities to generate a reverse sequence. Figure 6 shows a frame with the virtual Marilyn on the real chair.

Figure 4. System control flow of TRACK

Figure 5. a. Initial position in TRACK b. "Sit down" position c. Starting position (backwards from the initial position) d. Ready to sit down e. "Sit down" position and height of the virtual chair

Figure 6. Marilyn on a real chair (see Color Plate)

Walking generation

There are many examples of the integration of virtual actors in the real world, but a very important case involves actors walking on a real floor. For this reason, the animator needs a tool for designing the walking path and the walking style. Walking design is a key part of our motion control system. The main principle is to separate the design of the spatial and temporal characteristics. The path design takes place first; then the temporal characteristics can be constructed in a continuous design loop:

- constrained design of distance over time to provide a personified speed;
- translation into a relative theoretical velocity with the inverted transfer function;
- computation of the phase to determine the step instants and locations;
- visualization of the steps along the path.

Figure 7 shows the principle of the design process.

Figure 7. General design process for walking generation

This design architecture is important when the human figure moves in a predefined setting. Thanks to the real-time visualization, the user can instantaneously check the validity of the step positions with regard to the setting. Following stepping stones in a path or continuously avoiding puddles are clear examples of its usefulness. In addition, the temporal control provides a desired rhythm during the animation. The path is designed on a 2D layout of the scene (Figure 8a) by means of Cardinal splines. Four control points are sufficient to determine a cubic segment passing through the two middle points. The complete path is defined by a list of such control points. The path passes through the control points (except the first and last ones) with first-order continuity between adjacent segments. We may adjust the tension and have local control on the curve. Prior to mapping the walking trajectory, the path is transformed into a sequence of line segments using an adaptive sampling algorithm [23]. Figure 8b shows a session of our walking software. More details may be found in [24].
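A minimal sketch of this path design, assuming the standard Cardinal-spline formulation (Hermite segments whose tangents are scaled by a tension parameter). The uniform flattening step at the end stands in for the adaptive sampling algorithm of [23]; function names and sampling density are our own choices.

```python
import numpy as np

def cardinal_segment(p0, p1, p2, p3, t, tension=0.0):
    """Point on the cubic segment between p1 and p2 (Hermite form),
    with tangents derived from the neighbouring control points."""
    s = (1.0 - tension) / 2.0
    m1 = s * (p2 - p0)                    # tangent at p1
    m2 = s * (p3 - p1)                    # tangent at p2
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * p1 + h10 * m1 + h01 * p2 + h11 * m2

def sample_path(points, samples_per_segment=16, tension=0.0):
    """Flatten the spline through the 2D control points into a polyline,
    ready for the walking trajectory to be mapped onto it."""
    pts = [np.asarray(p, dtype=float) for p in points]
    polyline = []
    for i in range(1, len(pts) - 2):
        for t in np.linspace(0.0, 1.0, samples_per_segment, endpoint=False):
            polyline.append(cardinal_segment(pts[i - 1], pts[i], pts[i + 1], pts[i + 2], t, tension))
    polyline.append(pts[-2])              # last interpolated control point
    return np.array(polyline)
```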


Figure 8. a. Design of a path b. Interactive session of the walking software

Figure 9 shows scenes with Marilyn walking in real worlds. For the second example, Marilyn was positioned on the first and the last images of the sequence in order to obtain the distance she should walk.

Using the duration of the sequence, her velocity was calculated and entered into the walking software, which takes care of the walking speed.


Figure 9. a. Marilyn entering a room b. Marilyn walking with friends (see Color Plates)

The rendering process

For this step, a standard rendering system should be used; in our case, we use the public-domain ray tracer RayShade. Two kinds of entities have to be generated: the virtual actor, and the virtual objects corresponding to the real objects that hide the virtual actor. These virtual objects have to be simple and should be created using a modeler, as explained in the section on video acquisition and 3D modeling. Then the virtual object has to be adjusted to the real one using an image processor, or blended as explained in the section on the blending process. To produce realistic shadows of the virtual actress on the real world, we first determine the orientation of the shadows by sticking a 1-meter stake into the floor and measuring the coordinates of its shadow. We may then define in the ray-tracing software a directional light source consistent with the measured coordinates. Additional ambient light should generally be added. For the color of the shadow, we analyze a real digitized image in order to obtain the RGB values of the shadow on the floor. In the resulting images, Marilyn has her shadow cast on the floor, which has been built in 3D as described in the section on video acquisition and 3D modeling.
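The stake measurement reduces to a small piece of geometry: the direction of the light rays is the unit vector from the top of the stake to the measured tip of its shadow. A minimal sketch, assuming the floor lies in the z = 0 plane; the function name and coordinates are illustrative only.

```python
import numpy as np

def light_direction(stake_base, shadow_tip, stake_height=1.0):
    """Direction of the directional light (unit vector) from a 1 m stake and
    the measured position of the tip of its shadow on the floor (z = 0)."""
    top = np.asarray(stake_base, dtype=float) + np.array([0.0, 0.0, stake_height])
    d = np.asarray(shadow_tip, dtype=float) - top
    return d / np.linalg.norm(d)

# Example: stake at the origin, shadow tip measured on the floor.
# print(light_direction([0, 0, 0], [1.2, 1.05, 0.0]))
```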

Blending process

The blending process is the key operation for combining the virtual actors with the real picture, taking into account the problems of hiding and occluding objects and of shadowing. The operation is based on the generation of masks, but there are different ways to create and process these masks. We will consider three types of masks: 2D masks, 3D generated masks and depth masks.

2D masks

In this approach, we first use a blending technique to generate the mask of the objects located between the actress and the camera. We start with decor.rgb, from which we create the mask that will hide the actress. This can be done with an image processing program such as Adobe Photoshop, using a simple procedure. First, we select the part of the decor (Figure 10) that should hide the actress, cut it (Figure 11a) and fill it in black (Figure 11b). Then we select the inverse part (everything except the previously selected part) and cut the new selection (Figure 11c). Using the resulting mask and the actress image actress.rgb(t) (Figure 12a), we can obtain an image of the masked actress, mactress.rgb(t) (Figure 12b), which can be blended with the decor decor.rgb to produce the final image (Figure 13). The procedure has to be applied for as long as the actress is behind the object.
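In digital form, the same procedure amounts to a few array operations. The sketch below assumes aligned 8-bit RGB images, a binary occluder mask derived from decor.rgb as described above, and an actress rendered on a black background so that her own coverage can be recovered from the non-black pixels (as in the next subsection); the file and function names follow the text but the implementation is only illustrative.

```python
import numpy as np

def composite_frame(decor, actress, occluder_mask):
    """occluder_mask: 0 where a real object hides the actress, 1 elsewhere."""
    # Pixels of the actress hidden by the real occluder are blanked out (mactress.rgb(t)).
    mactress = actress * occluder_mask[..., None]
    # The actress is rendered on a black background: her remaining non-black pixels define her coverage.
    alpha = np.any(mactress > 0, axis=-1)[..., None]
    # Composite the masked actress over the decor.
    return np.where(alpha, mactress, decor).astype(np.uint8)

# Per-frame use, e.g.: final = composite_frame(decor, actress_t, mask)
```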

Figure 10. Original scene

Figure 11. a. Object cut b. Black filling c. mask

Figure 12. a. Original actress b. Cut actress

Figure 13. Final image

3D generated masks

This is the most general method and the one we apply most often. The masks are obtained by a separate rendering of the occluding objects. This means that each time a real object has to hide the virtual actress, we model the real object in the virtual scene and render it. For example, Figure 14 shows an example where Marilyn is passing behind a pole; a cylinder has been generated by the computer in order to model this pole. In order to show the methodology in detail, we present a complete example in which Marilyn enters a laboratory through the window. We start with two RGB color pictures: decor.rgb, representing the decor (Figure 15), and actress.rgb(t) (Figure 16a), representing the virtual actress at time t. actress.rgb(t) shows the actress on a black background; some parts of the actress' body may also be black where virtual objects corresponding to real objects hide those parts of the synthetic actress. A mask is created and used during the image composition. In order to create the mask, actress.rgb(t) is first converted to a black-and-white image actress.bw(t); then all pixel values of the image different from 0 are set to 1, resulting in the mask actress.mask(t) (Figure 16b). Blending decor.rgb with actress.mask(t) produces the resulting image with the synthetic actress inside the real decor. In Figure 17, the actress really seems to be behind the computer-generated window.

Figure 18 shows four frames of the resulting animation sequence.

Figure 14. Marilyn passing behind a real pole (see Color Plate)

Figure 15. The real world

Figure 16. a. Virtual actress b. Mask

Figure 17. Result of the blending process

Figure 18. Sequence with Marilyn entering the lab through the window (see Color Plate)

Depth mask blending

It is evident that the system knows the distance to the virtual actors, because the actors are built into the system. But the system generally does not know where all the real objects are in the environment. Of course, we may assume that the entire environment is measured at the beginning and remains static thereafter, or even that the real world has been built as a complete 3D model. However, in most cases, we do not have such a model, or the real environment is dynamic. In this case, we may use a depth map of the real environment, which allows real objects to occlude virtual objects by performing a pixel-by-pixel comparison of depth values. In order to obtain the depth map, we may use laser rangefinders, computer vision techniques for recovering shape, or intensity-based matching from a pair of stereo images [25].
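A minimal sketch of this pixel-by-pixel comparison, assuming the depth map of the real scene and the depth buffer of the rendered actress are available in the same units, with the background pixels of the virtual image carrying a very large depth:

```python
import numpy as np

def depth_composite(real_rgb, real_depth, virtual_rgb, virtual_depth):
    """Keep the real pixel wherever the real surface is closer than the virtual one."""
    real_in_front = (real_depth < virtual_depth)[..., None]
    return np.where(real_in_front, real_rgb, virtual_rgb).astype(np.uint8)
```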

Recording the images

Unfortunately, even with a Reality Engine and a Sirius video card, it is not possible to produce real-time sequences of realistic virtual actors living in a real world. For this reason, we have produced our sequences frame by frame. Figure 19 shows the complete process in detail.

Figure 19. The complete process with input/output

A complete example with a dressed actress

One of the main problems in integrating virtual actors into the real world is to make them realistic enough not to appear too different from real people in the same world. One key aspect is the behavior of the actors' clothes during an animation sequence. We cannot imagine a realistic sequence involving a virtual actress walking in a completely rigid dress. For this reason, we have recently produced a sequence with a completely dressed Marilyn. The clothes are simulated using physics-based models [26, 27]. A series of figures illustrates the complete process. Figure 20 shows the original background and Figure 21 the nude body of the actress. In Figure 22, we show how the body is placed inside the real decor. The way the clothes are created is illustrated in Figure 23. Figure 24 explains how a virtual floor is created for generating Marilyn's shadow and for setting up the lights. The final combined image is displayed in Figure 25.
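For readers unfamiliar with physics-based cloth, the sketch below shows the general principle in its simplest mass-spring form: particles connected by springs, integrated forward in time under gravity. It is a generic illustration in the spirit of [26, 27], not the garment model actually used for Marilyn's dress, and all parameter values are placeholders.

```python
import numpy as np

def cloth_step(pos, vel, springs, rest, dt=0.005, k=400.0, damping=0.02, mass=0.01):
    """One symplectic Euler step for a mass-spring cloth.
    pos, vel: (n, 3) arrays; springs: (m, 2) integer array of particle index pairs;
    rest: (m,) rest lengths of the springs."""
    force = np.zeros_like(pos)
    force[:, 2] -= mass * 9.81                      # gravity along -z
    d = pos[springs[:, 1]] - pos[springs[:, 0]]     # spring vectors
    length = np.linalg.norm(d, axis=1, keepdims=True)
    # Hooke force proportional to the stretch, directed along each spring.
    f = k * (length - rest[:, None]) * d / np.maximum(length, 1e-9)
    np.add.at(force, springs[:, 0], f)
    np.add.at(force, springs[:, 1], -f)
    vel = (1.0 - damping) * vel + dt * force / mass
    return pos + dt * vel, vel
```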

Figure 20. Background image

Figure 21. Marilyn's body

Figure 22. Marilyn placed with respect to the background

Figure 23. Dress creation

Figure 24. Creation of the virtual floor for Marilyn's shadow and the light setup

Figure 25. Final combined image

Applications

Mixing virtual humans with real scenery is mainly interesting for special effects. It then becomes possible to recreate actors from the past in today's situations and let them interact. In this kind of application, the illusion is at its best. If the techniques for mixing real and virtual worlds improve in the coming years, we will be able to simulate, in live action, people of today together with people from other worlds or other times. The effect will be extremely interesting and creative: for example, dynamically superimposing styles of speech, manners or clothing from one period onto another. This will help in understanding how behaviour evolves through time and decors. This technique will therefore be of use not only in the entertainment industry, but also in simulations that mix different periods of history and compare them dynamically. Another potential category of applications is the production of audiovisual documents (films, CD-ROMs) illustrating the behavior people should adopt in a dangerous situation, e.g. during a fire or in a nuclear plant. There are also applications in maintenance, in a similar way as described for Augmented Reality [28, 29].

Conclusion

There is an enormous potential in the production of sequences involving virtual actors living and playing in real scenes. New films could be produced with deceased actors or with actors who will never exist. However, the existing techniques are still very limited. No good tools are available to establish the correspondence between the 3D real scene and the 3D virtual actors. These problems will only be solved when the 3D reconstruction of a scene becomes easily possible, which is a current challenge in pattern recognition and image processing research.

Acknowledgments

The authors would like to thank the students who worked on these projects at the University of Geneva and the Swiss Federal Institute of Technology: Mirko Dinarich, Hugues Favey, GianMarco Martino, David Passamani, and Riccardo Santandrea. They are also grateful to the research staff of both laboratories for their assistance in these projects when needed, especially Pierre Beylot, Laurent Bezault, Zhyong Huang, and Shen Jianhua.

The software used for the creation and animation of the virtual actors has been sponsored by the Swiss National Research Foundation, the ESPRIT Project HUMANOID and the Federal Office for Education and Science.

References

Appendix: Determination of the camera parameters

The following 10 characteristics have to be determined: V = (xc, yc, zc, φx, φy, φz, θ, u0, v0, r), with:

- xc, yc, zc: the 3 coordinates of the center of the camera
- φx, φy, φz: the 3 angles of the orientation
- θ: the view angle
- u0, v0: the two dimensions of the projection screen
- r: the ratio between the width and height of a pixel

At the beginning of the iteration, fictitious values are given to the camera characteristics. With the help of these values and the 3D coordinates of the points, we calculate the 2D coordinates of these points on the projection plane of the camera using two projection equations in which L is the screen width in pixels. Because of the inexact values of the camera characteristics, we get n calculated 2D coordinates (u'i, v'i) which do not match the n real 2D coordinates (ui, vi) measured on the image. If the difference is greater than a convergence criterion (which is the case at the beginning of the iteration), we calculate a correction of the parameters using the Jacobian matrix: dU = J·dV, where

U = (u1, v1, ..., un, vn) is the vector of exact 2D positions,
U' = (u'1, v'1, ..., u'n, v'n) is the vector of calculated 2D positions,
dU = U' - U,

and J is the Jacobian matrix of the calculated 2D positions with respect to the camera parameters, Jij = ∂U'i/∂Vj. Solving this system gives the correction dV of the parameter vector, and the iteration is repeated until the difference between the real and calculated 2D coordinates falls below the threshold.