Virtual Dancer: Architecture for Generating Semi-Automatic 3D Animations

Rodrigo B. V. Lima, Davi M. Cabral, Roberto S. M. Barros, Geber L. Ramalho
Centro de Informática – Universidade Federal de Pernambuco (UFPE)
Caixa Postal 7.851 – 50.732-970 – Recife – PE – Brazil
{rbvl, dmc3, Roberto, glr}@cin.ufpe.br

Abstract

The process of creating animations for avatars in virtual worlds has always been a non-trivial task. Advanced techniques, such as motion capture, are available in the market, but, due to resource limitations, these approaches are usually out of reach for most users. This paper proposes an XML-based language capable of describing dance movements in an intuitive way, so that they can be associated with avatars, as well as an application that interprets the proposed language in order to generate an avatar that dances in synchrony with any MIDI file.

1 Introduction

It is a known fact that a well-designed avatar can improve interaction in a virtual world: it can make it easier for the user to explore the environment and enhance user enjoyment [Hansel, 2001]. Avatars are a rich means of communication, capable of representing an almost unlimited universe of expressive possibilities, and exploring these possibilities may yield better final results. Expressive possibilities such as dance and music can be associated directly with an avatar through choreographies.

However, the process of creating dance animations for avatars has always been a challenging task. Producing gestures and movements for characters is highly complex due to the existence of various articulations, which must all be coordinated simultaneously. Therefore, dance animation can easily become a very tiring and time-consuming task, especially if the designer wants to build a repository of dance movements and avatars. There are advanced techniques available in the market that can be used to deal with this task, for instance motion capture [Lee, 2002]. However, due to resource limitations, these approaches are usually out of reach for most users.

With this in mind, our main objective is to propose an alternative solution for synchronizing movements and music, one that is accessible and easy to use, i.e. based on common input devices, and that has publishing capabilities. The main idea is to propose a model capable of receiving the definition of a 3D avatar, a list of movements and a song as input and, based on the results acquired after processing these inputs, generating as output a dancing avatar that can be published over the Web.

The remainder of this paper is organized as follows: a review of related work is provided in section 2. Section 3 describes the technological aspects of this work. Sections 4 and 5 describe the proposed model: the former introduces the language XMSL (eXtensible Movements Synchronization Language), and the latter describes the application responsible for validating the proposed language and generating a dancing avatar. In section 6 we present and analyse the obtained results. Concluding remarks and avenues for further work follow in section 7.

2 Related Work

We have identified some applications related, at some level, to our work. All of them were analysed considering their available features and ease of use. Below follows a brief description of the most relevant systems identified.

2.1 The Dancer

An application developed by the website GlobZ [GlobZ, 2004] using Flash technology. It allows the user to create choreographies for a single dancer defined by the system. A choreography is assembled by combining predefined dance movements for the dancer's feet, hands and body. The user plays the role of a choreographer, that is, he/she is responsible for elaborating the choreography that he/she thinks best suits the song being played. This application can be seen in Figure 1 (extracted from http://www.globz.net).

Figure 1: The Dancer

The main limitations of this application are: its environment is strictly 2D; the number of available songs is limited (currently there is just one); the user cannot define new dance movements to be used by the dancer; and the user cannot create his/her own dancer.

2.2 Rube

An application based on XML (eXtensible Markup Language) [Bray, 2004] and on Web3D publishing technologies such as X3D (eXtensible 3D) and VRML (Virtual Reality Modelling Language). It was developed at the University of Florida [Kim, 2002] for the construction of dynamic 3D models (simulations). The system proposes a modelling paradigm where the geometric and behavioural information of a given 3D simulation are represented separately. This approach allows fast representation of a given experiment under different conditions. An example of an experiment in Rube can be seen in Figure 2 (extracted from [Kim, 2002]).

Figure 2: Experiment in Rube

In Rube, virtual elements are represented internally using X3D, which allows the tool to use XSLT [Clark, 1999] to fuse together structural and behavioural information. Additionally, behavioural information can also be part of a JavaScript package designed for simulations (SimPack).

2.3 Other Systems

The two applications described above were the most significant ones for our work and ended up serving as an important baseline. However, there are other similar works, such as [Barrientos, 2001] and [Rieger, 2003], which also aim to develop simpler mechanisms (hardware and software) for movement creation. These approaches are based on a pre-defined database of movements, which are subsequently combined in order to create sequences of more complex movements.

3 Technological Issues

As mentioned before, one of the main objectives of this work is to create a tool that provides a way to add dance elements and choreographies to virtual environments in a simple, flexible and inexpensive fashion. Based on these requirements, we have chosen VRML and X3D to define our avatars, since they are the standard languages for developing virtual worlds for the Internet.

The XMSL (eXtensible Movements Synchronization Language) language is based on XML. This gives the project great flexibility for two main reasons: it is possible to use any previously designed avatar, since the designer just needs to re-structure it according to the format specified by XMSL (further details about XMSL are given in section 4); and it is also possible to build a repository of movements that is totally independent of any avatar.

The application responsible for interpreting XMSL and for processing the MIDI (Musical Instrument Digital Interface) files was implemented in Java, using mainly the JavaSound and CyberX3D APIs. JavaSound is a low-level API designed to execute and control input and output over sound media, for both MIDI and digital audio files. It provides mechanisms to install, access and manipulate system resources such as sound mixers, MIDI synthesizers and other devices, and also allows reading and writing files. The CyberX3D API is basically a package for developing applications capable of converting and handling X3D and VRML files.

We have used the Castor data binding framework [Castor, 2004] in order to integrate the Java application and the XML language. In XML, a data binding is a link between XML documents and objects specifically designed to store all the data contained in those documents. Hence, applications can manipulate data that have been serialised in XML in a very natural way (when compared to DOM and SAX1 parsers). We have opted for an object-centred approach ([Bourret, 2003] and [Sosnoski, 2003]), which is based on three concepts: marshalling (the transformation of an object into its XML representation), unmarshalling (the opposite process, from XML to objects) and mapping (a set of ancillary rules governing the two previous operations). One advantage of this approach is that programmers are not restricted to working with objects of kind element, text and attribute, as would be necessary in, for instance, DOM, and have the freedom to design their own class hierarchy. Furthermore, attributes and elements can be accessed via class methods, and new methods can be implemented for additional functionality.
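To make the data-binding step concrete, the fragment below is a minimal sketch of how an XMSL document could be unmarshalled with Castor. The mapped classes Doc and Avatar and the file name are illustrative stand-ins, not the actual Virtual Dancer source.

import java.io.FileReader;
import org.exolab.castor.xml.Unmarshaller;

// Hypothetical mapped classes standing in for the real XMSL object model.
class Avatar {
    private String name;
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}

class Doc {
    private Avatar avatar;
    public Avatar getAvatar() { return avatar; }
    public void setAvatar(Avatar avatar) { this.avatar = avatar; }
}

public class BindExample {
    public static void main(String[] args) throws Exception {
        try (FileReader reader = new FileReader("estrutural.xml")) {
            // Unmarshalling: build the object graph from the XML document.
            Doc doc = (Doc) Unmarshaller.unmarshal(Doc.class, reader);
            // The data can now be handled through ordinary class methods.
            System.out.println(doc.getAvatar().getName());
        }
    }
}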

4 XMSL (eXtensible Movements Synchronization Language)

In this section we present our proposed language, called XMSL (eXtensible Movements Synchronization Language). The language is based on the XML standard and on observations of other systems for 3D object manipulation, such as Rube [Kim, 2002]. Its main objective is to provide a means of describing avatars and movements for 3D environments in a modular way. The idea behind XMSL is to separate the process of creating 3D animations into two modular steps, one structural and the other dynamic, or behavioural.

4.1 Structural Model

A structural file contains the avatar definition used by the system. These files can be acquired from any VRML/X3D repository or created with an authoring tool that can save files in VRML or X3D format (trueSpace, Cosmo Worlds, 3D Studio Max, Spazz3D). Figure 3 shows a small part of a structural file2 and Figure 4 shows the corresponding avatar. This example has only a minimal structure and, for simplicity, will be used throughout the remainder of this section to exemplify the most important components of a structural file in XMSL. The proposed model contains specific elements and properties to be used when defining an avatar for a virtual world. Among these elements, the most important are the tags doc, avatar and part. The tag doc is the root of the language, grouping together one or more characters defined by the tag avatar. The tag part defines each articulated piece of the avatar; its content must be written in VRML or X3D. The elements defined by the tags avatar and part have a property called name, responsible for naming them. Additionally, the properties x, y and z of avatar define its initial position inside the virtual world.

1 Further information about DOM and SAX parsers can be found in [Federizzi, 2001].
2 The complete file can be downloaded from http://www.cin.ufpe.br/~dmc3/virtualdancer/estrutural.xml

Figure 3: Part of a structural file corresponding to the avatar Ball
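For readability, the fragment below gives a minimal sketch of what such a structural file may look like, assuming only the tags and properties described above (doc, avatar with its name, x, y and z properties, and part containing VRML content); the complete downloadable example may differ in detail.

<doc>
  <avatar name="Ball" x="0" y="0" z="0">
    <part name="leftHand">
      <!-- VRML/X3D content defining this articulated piece -->
      Transform {
        children [ Shape { geometry Sphere { radius 0.3 } } ]
      }
    </part>
    <!-- further part elements for the remaining articulated pieces -->
  </avatar>
</doc>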

Figure 4: Avatar Ball

4.2 Behavioural Model

The behavioural model, just like the structural model, is defined by a series of elements and properties. In this module we find the information related to the movements and actions supported by our model. The behavioural code has two main elements, the tags move and choreography. The former defines the minimum units of movement, and the latter uses these units to assemble a given choreography. Figure 5 presents an example of a behavioural file3.

3 The complete file can be downloaded from http://www.cin.ufpe.br/~dmc3/virtualdancer/comportamental.xml

The element move consists of an attribute name, used to label each movement, and two sub-elements, key and keyValue. Key defines the time fractions (from 0 to 1) of each movement, and keyValue defines the position of the object at each of these time fractions. The meaning of these two elements is similar to that of the fields key and keyValue of the VRML node PositionInterpolator [Guynup, 2000]. Their values are delimited by brackets ([ ]), with real numbers separated by commas (,). As an example, the movement named “upLeftHand” in Figure 5 is divided into two instants, 0 and 1. At instant 0, the associated object will be positioned at the coordinates x = 0.6, y = -1.3 and z = 0, and at instant 1, at the position x = 0.6, y = -1.1 and z = 0. This movement could have been split into a greater number of time fractions (e.g. [0 0.3 0.6 1]), which would lead to more complex and more precise movements. For simplicity, we have decided to keep just two instants in the analysed example.

The element choreography consists of an attribute type and two sub-elements, partRef and moveRef. The attribute type defines how the synchronization between the choreography and the music happens: according to the music rhythm (type = 0), or according to the beat events of a percussion instrument (type = 1). The elements partRef and moveRef associate the choreography with one part element, which must be defined in the structural model, and with one or more move elements defined in the behavioural model. Figure 5 shows a choreography associated with the part named lefthand of an avatar; its synchronization is based on the music rhythm and it has four movements: upLeftHand, downLeftHand, upLeftHand and downLeftHand.

Figure 5: Part of a behavioural file (movement upLeftHand)
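Again for readability, the fragment below sketches a behavioural file consistent with the description above. The upLeftHand coordinates follow the text, while the downLeftHand values and the exact attribute syntax of the references are illustrative assumptions.

<doc>
  <move name="upLeftHand">
    <key>[0, 1]</key>
    <keyValue>[0.6 -1.3 0, 0.6 -1.1 0]</keyValue>
  </move>
  <move name="downLeftHand">
    <key>[0, 1]</key>
    <keyValue>[0.6 -1.1 0, 0.6 -1.3 0]</keyValue>
  </move>
  <choreography type="0">
    <partRef name="lefthand"/>
    <moveRef name="upLeftHand"/>
    <moveRef name="downLeftHand"/>
    <moveRef name="upLeftHand"/>
    <moveRef name="downLeftHand"/>
  </choreography>
</doc>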

5 The Virtual Dancer Application

We have also developed a Java application that recognizes both the structural and behavioural models presented earlier. It receives as input an XMSL file and a MIDI file and, after processing them, generates VRML or X3D code containing an avatar that dances in synchrony with the selected MIDI file. The application is flexible enough to adapt itself to any MIDI file, as well as to any avatar created. The following sections show more details about its structure and its main modules.

5.1 MIDI Synchronization Algorithm

The music synchronization module is based on an algorithm capable of processing MIDI files. In order to develop this algorithm we had to study some musical concepts, such as music tempo and beat detection, as well as the JavaSound API. The main musical characteristic taken into account was the notion of beating. Beating is the regular occurrence of rhythm through time, that is, beats that happen at specific, constant intervals [Samartino, 2003]. The velocity of the beat may vary from one song to another: some are slow-paced while

others are faster; it may even vary within the same song. This velocity is called tempo, and it is measured in beats per minute (bpm).

Our algorithm uses two different approaches: synchronization according to the song rhythm, and synchronization based on a given percussion instrument. In the first strategy, the algorithm computes the song rhythm in microseconds per quarter note, in order to determine the duration of a quarter note during the song. With this approach, the dancer executes one dance movement on each quarter note. The algorithm has to determine the number of movements needed to form the choreography according to the song duration. This total is computed by dividing the length of the song by the number of microseconds per quarter note. Each movement code is then repeated as many times as necessary to fill the music execution time, following the order defined by the choreography4. All movements must have the same duration (one quarter note). The attribute key associated with the choreography establishes this characteristic: its value is computed by incrementing the key values of each movement to be processed by the highest key value of the partial choreography5. For instance, assume a hypothetical song lasting three seconds and a choreography (based on the rhythm) defined by two movements whose key values are, respectively, [0 0.5 1] and [0 0.3 1]. The value of the attribute key for the choreography is then [0 0.5 1 1 1.3 2]. This value is rescaled, using a simple rule of three, to an interval ranging from 0 to 1: 2 / x = 1 / y, where 2 is the highest key value found, 1 is the highest value of the desired interval, x is each element of the obtained interval and y is the corresponding rescaled value. For our example, the result is the interval [0 0.25 0.5 0.5 0.65 1].

The second approach to the synchronization problem is based on beat-extraction techniques, a process that extracts beat events from a MIDI file. Beat information then becomes the most important element for the structural analysis of a MIDI sequence [Dixon, 2001]. This second method of synchronization (attribute type of the choreography set to 1) makes the avatar execute a faster and rougher movement every time a beat event occurs. Hence, the resulting choreography becomes much more interesting because, in many songs, some instruments (especially percussion instruments, normally present on track 9 of a MIDI sequence) strongly determine the rhythm of the song. It is up to the user to determine which percussion instrument he/she wants to use. Every time an event of the selected instrument occurs, the avatar executes a given movement. The main difference from the first approach is that the total number of movement repetitions is equal to the number of events of the selected instrument. The values of the key attribute are computed according to the equation y = x · (t2 − t1) + t1, where x is the key value of the movement, t1 and t2 are, respectively, the initial and final times of the movement in the choreography, and y is the desired key value to be used in the choreography. For example, considering the previous example and assuming that there is a percussion event after one second of song execution, applying this equation yields the key values [0 0.5 1 1 1.6 3] for the choreography. Just as in the rhythm-based algorithm, the last step is to scale these values down to the interval between 0 and 1, yielding [0 0.17 0.33 0.33 0.53 1].
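As a concrete check of the rhythm-based computation, the sketch below reproduces the worked example (two movements with keys [0 0.5 1] and [0 0.3 1]); the method and variable names are illustrative, not the actual Virtual Dancer code.

import java.util.ArrayList;
import java.util.List;

public class KeyScaling {

    /** Concatenates the key lists of the movements, shifting each list by
     *  the highest key of the partial choreography built so far. */
    static List<Double> concatenateKeys(List<double[]> movementKeys) {
        List<Double> result = new ArrayList<>();
        double offset = 0.0;
        for (double[] keys : movementKeys) {
            for (double k : keys) {
                result.add(offset + k);
            }
            offset = result.get(result.size() - 1); // highest key so far
        }
        return result;
    }

    /** Rescales the concatenated keys back into the [0, 1] interval
     *  (the rule of three 2 / x = 1 / y from the example, i.e. y = x / max). */
    static List<Double> normalize(List<Double> keys) {
        double max = keys.get(keys.size() - 1);
        List<Double> result = new ArrayList<>();
        for (double k : keys) {
            result.add(k / max);
        }
        return result;
    }

    public static void main(String[] args) {
        List<double[]> moves = List.of(new double[]{0, 0.5, 1},
                                       new double[]{0, 0.3, 1});
        // Prints [0.0, 0.25, 0.5, 0.5, 0.65, 1.0], as in the worked example.
        System.out.println(normalize(concatenateKeys(moves)));
    }
}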

5.2 VRML Generator Algorithm

The input analysis and the output generation process are based on an abstract syntax tree (AST) of objects, used to represent the elements of an XMSL input file. As stated before (section 3), the transformation of XMSL elements into an abstract tree of objects is done by Castor. The general model used for representing XMSL files as an AST, as well as the other classes used to generate the VRML output, can be seen in Figure 6. The most important elements comprising the tree of objects are Doc, Avatar, Part, Movement, Choreography, Sound, Timer and TimerVisitor. The first five elements are extracted during the conversion of the XMSL file into objects; the remaining three come from the user interface (Sound) and from the MIDI analyzer (Timer and TimerVisitor).

4 We have decided to keep repeating the choreography code as an alternative approach to the synchronization problem caused by the time needed to render the images. With this solution, the final animation code carries within itself the timing information on which each movement has to be executed, leaving to the rendering plugins the work of re-synchronizing all images.
5 By partial choreography we mean the moment at which the movements of a specific choreography are being repeated and concatenated to assemble the final and complete choreography.

The classes Bind, AvatarObj, MovementObj, PartObj, AvatarRef, MovementRef and PartRef are used to convert the XMSL elements into objects. The class Bind is responsible for encapsulating the unmarshalling of the input files. The classes AvatarObj, MovementObj and PartObj are direct implementations of their respective elements in XMSL. The last three classes are implementations of the reference concept required by Castor due to the XMSL structure used to represent the attributes AvatarRef, MovementRef and PartRef mentioned in section 4.2. In addition, the interface and abstract classes Avatar, Movement and Part are only abstractions over the xxxRef and xxxObj classes used by the system, making them easier to deal with.


Figure 6: Object model used by the VRML generator algorithm

In order to facilitate dealing with this number of objects, we have opted to use the Composite design pattern [Gamma, 2000]. This design pattern represents the object structure as a hierarchy of successive compositions, so that individual objects and object compositions can be treated in a uniform fashion. The decision to use this design pattern proved to be very important for generating the output code, which was reduced to the declaration of a single method, called getCode, on the base interface, in this case the class Component. This solution divides the problem of output code generation into well-defined parts: each class provides its own VRML definition, implemented in the method getCode. We just need to invoke the method getCode of the class located at the top of the abstract tree (in this case, an instance of the class Doc) and it will, recursively, invoke the getCode methods of all objects situated below it in the hierarchy. Each class returns the specific VRML code necessary to generate the output file. Figure 7 shows an example of an AST.

Figure 7: Object tree (a Doc root with an Avatar child, which groups the Part, Choreography, Timer and Movement nodes)
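The code generation scheme can be sketched as follows, assuming the Component and getCode names given in the text; the VRML header string and the internal structure are illustrative, not the actual implementation.

import java.util.ArrayList;
import java.util.List;

interface Component {
    /** Each node of the AST returns its own VRML fragment. */
    String getCode();
}

abstract class Composite implements Component {
    protected final List<Component> children = new ArrayList<>();

    void add(Component child) { children.add(child); }

    /** Recursively concatenates the VRML code of all children. */
    protected String childrenCode() {
        StringBuilder sb = new StringBuilder();
        for (Component c : children) {
            sb.append(c.getCode());
        }
        return sb.toString();
    }
}

class Doc extends Composite {
    @Override
    public String getCode() {
        // The root emits the VRML header followed by the code of its children.
        return "#VRML V2.0 utf8\n" + childrenCode();
    }
}

Calling getCode on the single Doc instance at the top of the tree is then enough to produce the whole output file, exactly as described above.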

It is important to note that an instance of the class Choreography generates the avatar choreography by combining the sequence of specific codes of each MovementObj instance referenced by it (synchronization algorithm, section 5.1). Notice that the VRML code stored by the MovementObj instances is indirectly saved to the output file by means of instances of the class Choreography. Additionally, the CyberX3D API generates the X3D code automatically, converting the final VRML code into its X3D correspondent. However, the system could easily be adapted to generate the X3D code directly; it would only be necessary to rewrite the getCode methods. The AST contains a single Timer object responsible for storing the song duration. This object is dynamically associated with all choreographies thanks to another object, called TimerVisitor, which implements the Visitor design pattern [Gamma, 2000], simplifying the process of AST traversal.
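A minimal sketch of how the TimerVisitor could attach the single Timer to every Choreography node during the AST traversal follows; the accept/visit signatures are assumptions in the spirit of the Visitor pattern, not the actual implementation.

interface Visitor {
    void visit(Choreography choreography);
}

class Timer {
    final long songDurationMicroseconds;
    Timer(long songDurationMicroseconds) { this.songDurationMicroseconds = songDurationMicroseconds; }
}

class Choreography {
    private Timer timer;
    void setTimer(Timer timer) { this.timer = timer; } // receives the song duration
    void accept(Visitor visitor) { visitor.visit(this); } // Visitor entry point
}

class TimerVisitor implements Visitor {
    private final Timer timer;
    TimerVisitor(Timer timer) { this.timer = timer; }

    @Override
    public void visit(Choreography choreography) {
        // Dynamically associates the single Timer with each choreography found.
        choreography.setTimer(timer);
    }
}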

5.3 Analysis and Comparison of Similar Systems

Both Rube and Virtual Dancer are expansible systems that allow code reuse. This characteristic is achieved thanks to the definition of a language (Model Exchange Language in Rube and XMSL in Virtual Dancer) responsible for splitting the static and dynamic codes of a virtual environment, which leads both systems to dynamically generate the animation sequences for each new geometric model. Of all the analysed systems, only Virtual Dancer and The Dancer are capable of dealing with animation for musical environments. However, in The Dancer, all movements are statically generated based on a finite set of pre-processed movements. Finally, Virtual Dancer is the only system that can generate output files in both VRML and X3D. Table 1 below presents a summary of the features available in each system.

Table 1: Comparison between systems

System           Expansible/Reuse   Dynamic Animation   VRML Output   X3D Output   Music
The Dancer       -                  -                   -             -            X
Rube             X                  X                   -             X            -
Virtual Dancer   X                  X                   X             X            X

6 Results

In order to verify the Virtual Dancer architecture, a small set of volunteers rated the choreographies generated for 200 different MIDI files, taking into account aspects such as the level of synchronization and the general aesthetics of the choreography. In almost all cases, Virtual Dancer was capable of generating dancing avatars that performed choreographies rated as satisfactory by all volunteers.

Another result observed during the execution of this project was that, on average, the output X3D files were 121.35% larger than the corresponding VRML ones. This large increase in size may seem strange at first, especially given that the system substitutes each VRML command with its direct X3D correspondent. It is caused by an intrinsic characteristic of X3D: the use of tags inherited from XML to represent information, which increases the number of characters needed to represent a piece of information when compared to VRML. For instance, the command Transform is represented in the two languages as follows:

Table 2: Command Transform in VRML and X3D

Language   Command                       Number of Characters
VRML       Transform { … }               11
X3D        <Transform> … </Transform>    23

7 Conclusions and Future Work

The main contribution of this work is the development of an easy-to-use architecture for the creation of 3D animations, more specifically dance movements. The low effort needed to construct such animations makes it very easy to add this interesting feature to a virtual world. Furthermore, the modular nature of XMSL allows a very high level of code reuse; hence, the designer may create a repository of movements that can later be associated with any avatar. Possible extensions of this project are: the definition of dynamic backgrounds, which could also be synchronized with the song; the inclusion of more than one dancer simultaneously; and the synchronization of each part of the dancer's body with different instruments.

References

Barrientos, F. and Canny, J. (2001) “Cursive: A Novel Interaction Technique for Controlling Expressive Avatar Gesture”, Computer Science Division, University of California, Berkeley, USA.
Bell, G., Carey, R. and Marrin, C. (1996) “VRML 2.0: Cover page”, http://gun.teipir.gr/VRML-amgem/index2.html, April.
Bray, T., Paoli, J., Sperberg-McQueen, C., Maler, E. and Yergeau, F. (2004) “Extensible Markup Language (XML) 1.0 (Third Edition)”, W3C Recommendation, http://www.w3.org/TR/2004/REC-xml-20040204/, March.
Clark, J. (1999) “XSL Transformations (XSLT) Version 1.0”, W3C Recommendation, http://www.w3.org/TR/xslt, April.
Daly, L. (2002) “Introducing X3D - Overview of X3D”, www.realism.com/Web3D/x3d/s2002/Overview/slides/index.htm, April.
Dixon, S. (2000) “A Lightweight Multi-Agent Musical Beat Tracking System”, Pacific Rim International Conference on Artificial Intelligence, p. 778-788.
Gamma, E., Helm, R., Johnson, R. and Vlissides, J. (2000) “Padrões de Projeto” (Design Patterns), Porto Alegre: Bookman.
Guynup, S. (2000) “Basic Game Programming: 3D Invaders”, http://www.3dezine.com/3DEZine/gamestory.html, April.
Hansel, P. (2001) “All About Avatars”, http://philliphansel.com/legal.htm, March.
Kim, T. and Fishwick, P. (2002) “A 3D XML-Based Visualization Framework for Dynamic Models”, University of Florida, http://citeseer.ist.psu.edu/kim02xmlbased.html, March.
Lee, J., Chai, J., Reitsma, P., Hodgins, J. and Pollard, N. (2002) “Interactive Control of Avatars Animated with Human Motion Data”.
Rieger, T. (2003) “Avatar Gestures”, Interactive Graphics Systems Group, Technische Universität Darmstadt, Germany.
Roads, C. (1996) “The Computer Music Tutorial”, MIT Press.
Samartino, L. (2003) “Teoria Musical”, http://www.malhanga.com/musica/Teoria%20Musical_pag3.html, April.