Improv: Interactive Improvisational Animation and Music

Eric Singer, Athomas Goldberg, Ken Perlin, Clilly Castiglia, Sabrina Liao
Media Research Laboratory, New York University

Introduction

Improv is a system for the creation of real-time behavior-based animated actors. There have been several recent efforts to build network distributed autonomous agents, but in general these efforts do not focus on the author’s view. To create rich interactive worlds inhabited by believable animated actors, authors need the proper tools. Improv provides tools to create actors that respond to users and to each other in real time with personalities and moods consistent with the author’s goals and intentions.

The character animation system in Improv consists of two subsystems. The first is an Animation Engine that uses procedural techniques to enable authors to create layered, continuous, non-repetitive motions and smooth transitions between them. The second is a Behavior Engine that enables authors to create sophisticated rules governing how actors communicate, change and make decisions. The combined system provides an integrated set of tools for authoring the "minds" and "bodies" of interactive actors.

Recent development has added audio and musical features to the Improv system. Known as Improv Musique, this development work focuses on two areas: Interactive Virtual Musicians and Dancers, and Audio Support for Virtual Environments. Features include actor speech, voice recognition, controllable music, environmental sound and user input from external devices and video cameras. This paper begins with an overview of the Improv animation and authoring system, followed by the music and audio features, and concludes with a description of recent demonstrations and installations created with Improv.

Authoring Animation in Improv

As an authoring system, Improv must provide creative experts with tools for constructing the various aspects of an interactive application. These tools must be intuitive to use, allow for the creation of rich, compelling content and produce behavior at run time which is consistent with the author’s vision and intentions. Animated actors must be able to respond to a wide variety of user interactions in ways that are both appropriate and non-repetitive. This is complicated by the fact that in applications involving several characters, the actors must be able to work together while faithfully carrying out the author’s intentions. The author needs to control the choices an actor makes and how the actors move their bodies.

Architecture

The model used by Improv consists of an Animation Engine, which utilizes descriptions of atomic animated actions (such as Walk or Wave) to manipulate 3D models, and a Behavior Engine, which is responsible for higher-level capabilities (such as going to the store or engaging another actor in a conversation) and decisions about which animations to trigger. In addition, the Behavior Engine maintains the internal model of the actor, representing various aspects of an actor’s moods, goals and personality. In a sense, the Animation Engine represents the "body" of the actor while the Behavior Engine constitutes its "mind".

Animation Engine

The Animation Engine provides tools for generating and interactively blending realistic gestures and motions. Actors are able to move from one animated motion to another in a smooth and natural fashion in real time. Motions can be layered and blended to convey different moods and personalities.

The author defines an action simply as a list of joint rotations together with a range and a time-varying expression for each. Most actions are constructed by varying a few of these over time via combinations of sine, cosine and coherent noise (controlled randomness). For example, sine and cosine signals are used together within actions to impart elliptical rotations. Using coherent noise in limb movements allows authors to give the impression of naturalistic motions without needing to incorporate complex simulation models.

The author can also import keyframed animation from commercial modeling systems such as Alias or SoftImage. The Improv system internally converts these into actions that specify time-varying values for various joint rotations or deformations. To the rest of the system, these imported actions look identical to any other action.
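To make the idea concrete, here is a minimal Python sketch of a layered procedural action of this kind (Improv itself uses its own scripting language, not Python): each joint rotation is a base value plus sinusoidal terms and a band of coherent noise, so the gesture never repeats exactly. The value-noise function and the joint names below are illustrative assumptions, not part of the Improv system.

    import math, random

    def coherent_noise(t, seed=0):
        # Cheap 1-D value noise standing in for Perlin-style coherent noise:
        # smoothly interpolate pseudo-random values placed at integer times.
        def value_at(i):
            random.seed(i * 7919 + seed)
            return random.uniform(-1.0, 1.0)
        i = math.floor(t)
        frac = t - i
        s = frac * frac * (3 - 2 * frac)          # smoothstep blend
        return value_at(i) * (1 - s) + value_at(i + 1) * s

    def arm_sway(t):
        # One "action": time-varying joint rotations in degrees. Sine and
        # cosine give the broad elliptical swing; coherent noise adds a
        # non-repetitive wobble so the gesture never loops exactly.
        return {
            "l_shoulder_x": 20 * math.sin(t) + 5 * coherent_noise(0.7 * t, seed=1),
            "l_shoulder_z": 10 * math.cos(t) + 3 * coherent_noise(0.7 * t, seed=2),
            "l_elbow_x":    15 + 8 * math.sin(1.3 * t) + 4 * coherent_noise(t, seed=3),
        }

    # Sample the action at 10 Hz for one second of animation.
    for frame in range(10):
        print(arm_sway(frame / 10.0))

In the full system, curves like these would be blended with any other active actions and applied to the 3D model on every frame.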

Behavior Engine

Because the user is a variable in the run-time system, Improv authors cannot create deterministic scenarios. The user’s responses are always implicitly presenting the actor with a choice of what to do next. Because of this variability, the user’s experience of an actor’s personality and mood must be conveyed largely by that actor’s probability of selecting one choice over another.

The Behavior Engine provides several authoring tools for guiding an actor’s behavioral choices. The most basic tool is a simple parallel scripting system in which individual scripts, like actions, are organized into groups of mutually exclusive behavior. However, unlike actions, when a script within a group is selected, any other script that was running in the same group immediately stops. In any group at any given moment, exactly one script is running. Generally speaking, at any given moment an actor will be executing a number of scripts in parallel. In each of these scripts, the most common operation is to select one item from a list of items. These items are usually other scripts or actions for the actor (or for some other actor) to perform.

The author must assume that the user will be making unexpected responses. For this reason, it is not sufficient to provide the author with a tool for scripting long linear sequences. Rather, the author must be able to create layers of choices, from more global and slowly changing plans to more localized and rapidly changing activities, that take into account the continuously changing state of the actor’s environment and the unexpected behavior of the human participant.

Individual Scripts

A script is organized as a sequence of clauses. At run time, the system runs these clauses sequentially for the selected script in each group. At any update cycle, the system may run the same clause that it ran on the previous cycle, or it may move on to the next clause. The author is provided with tools to "hold" clauses in response to events or timeouts. The simplest thing an author can do within a script clause is trigger a specific action or script, which is useful when the author has a specific sequence of activities (s)he wants the actor to perform.

In addition to commands that explicitly trigger specific actions and scripts, Improv provides a number of tools for generating the more non-deterministic behavior required for interactive non-linear applications. In Improv, authors can create decision rules which take information about an actor and its environment and use this to determine the actor’s tendencies toward certain choices over others. The author specifies what information is relevant to the decision and how this information influences the weight associated with each choice. As this information changes, the actor’s tendency to make certain choices over others will change as well (a small illustrative sketch of such a weighted choice appears below, after the discussion of multi-actor coordination).

Coordination Of Multiple Actors

An author can coordinate a group of actors as if they were a single actor. We do this by enabling actors to trigger each other’s scripts and actions with the same freedom with which an actor can trigger its own. If one actor tells a joke, the author may want the other actors to respond, favorably or not, to the punchline. By having the joke teller cue the other actors to respond, proper timing is maintained even if the individual actors make their own decisions about how exactly to react. In this way, an actor can give the impression of always knowing what other actors are doing and responding immediately and appropriately in ways that fulfill the author’s goals.
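The decision rules described under Individual Scripts can be read, in simplified form, as a weighted random choice whose weights are functions of the actor’s current state. The Python sketch below only illustrates that idea; the mood variables, candidate actions and weighting functions are invented for the example and are not Improv’s actual rule syntax.

    import random

    # Hypothetical actor state; in Improv this lives in the Behavior Engine.
    actor_state = {"cheerfulness": 0.8, "energy": 0.3}

    # Each candidate action gets a weight computed from the actor's state.
    candidates = {
        "WaveEnthusiastically": lambda s: s["cheerfulness"] * s["energy"],
        "NodPolitely":          lambda s: s["cheerfulness"] * (1.0 - s["energy"]),
        "Ignore":               lambda s: 1.0 - s["cheerfulness"],
    }

    def choose_action(state):
        names = list(candidates)
        weights = [max(0.0, rule(state)) for rule in candidates.values()]
        # Pick one name with probability proportional to its weight.
        return random.choices(names, weights=weights, k=1)[0]

    print(choose_action(actor_state))   # most often "NodPolitely" with this state

As the state changes (say, the actor’s energy rises), the same rule set shifts the actor’s tendencies without any change to the scripts themselves.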

User Interaction and Multi-Level Control Of Actor State

One important feature of Improv is the ability for the user to interact with the system at different levels. This means that the author can give the user the right kind of control for every situation. If the user requires very fine control over the actors’ motor skills, the author can provide direct access to the action level. On the other hand, if the user is involved in a conversation, the author might let the user specify a set of gestures for the actor to use and have the actor decide on the specific gestures from moment to moment. At an even higher level, the author may want to have the user directing large groups of actors, such as an acting company or an army, in which case (s)he might have the user give directions to the entire group and leave it to the individual actors to carry out those instructions. Since any level of the actor’s behavior can be made accessible to the user, the author is free to vary the level of control as necessary at any point in the application.

Improv Musique

Over the past year, we have been adding music, audio and user-input features to Improv. As a result, Improv now provides facilities for adding actor speech with lip synching, voice recognition, ambient background sound and effects, controllable music sequence playback, singing synthesis and user input from external devices and video.

Implementation

Improv Musique is made up of several components. Features are implemented on both Macintosh and UNIX machines which communicate across a local area network. Much of the audio system is implemented in Opcode™ MAX, a visual programming environment for the Macintosh used primarily for MIDI applications. In the Musique system, we use MAX for receiving and filtering data from input devices; playing and processing MIDI and digital audio files; interfacing with Macintosh voice recognition facilities; and network communication with actors. MAX programs (called "patches") are the central input, output and control point for Musique features. Custom MAX external objects, written in C, enable digital audio playback, MIDI file playback, voice recognition and video input features. UNIX programs provide network message routing as well as sound file analysis for lip synching.

Network communication plays an important part in the integrated audio and visual environment of Improv. Messages sent via telnet between actors and the Musique components allow actors to request services, such as speech playback; inform the Musique system about changes in the environment (e.g. user location in a virtual space); and receive information about audio and music (e.g. sound volume or song tempo). The author has the ability to define message types and formats as required, relating to the various Musique features, and direct them to different actors.
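As a toy illustration of this kind of exchange, the sketch below sends a newline-terminated text command over a TCP socket, much as a telnet-style connection would. The host name, port number and the "speak" message format are assumptions made up for the example; they are not the documented Improv/Musique protocol.

    import socket

    def send_musique_message(message, host="max-host.local", port=7400):
        # Open a TCP connection to the MAX patch and send one plain-text,
        # newline-terminated command (telnet-style).
        with socket.create_connection((host, port), timeout=2.0) as sock:
            sock.sendall((message + "\n").encode("ascii"))

    # Hypothetical request: an actor asks the audio system to play a speech file.
    send_musique_message("speak gregor greeting_01.aiff volume 0.8")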

Custom MAX Objects

Integral to the Musique system are several custom external objects for MAX. These include SeqPlayer, AiffPlayer, Reco, SoundMap and VideoIn.

SeqPlayer is a Standard MIDI File (SMF) player. It allows a MIDI sequence to be played back beat by beat, enabling one to conduct and control the tempo of a computer band or orchestra in much the way a conductor leads a real orchestra. For example, each beat received by SeqPlayer could be used to output one quarter note’s worth of music from a score file. SeqPlayer also provides other essential features such as bar and beat reporting, jumping and looping. MIDI data from SeqPlayer can be processed in various ways before output, thereby enabling a musical score to be played back and modified in real time under control of a human user or an Improv actor.

AiffPlayer plays Audio Interchange File Format (AIFF) digital audio files. Files play directly from hard disk, allowing sounds of any length to be played. Several files can be played simultaneously with independent control of volume and panning. AiffPlayer is used in Improv for speech file playback as well as ambient audio and sound effects.

Reco provides an interface to the Macintosh speaker-independent speech recognition system. It allows the author to create groups of words and phrases to be recognized as valid responses. When a user speaks, Reco reports which word or phrase was spoken, or a special message for an unrecognized utterance. It supports filtering out of optional words and spurious sounds ("um"s and "er"s).

SoundMap controls ambient sound and sound effects for a virtual environment. It stores the location and attributes of sounds within a virtual space and controls playback based on a user’s position within the space. Sound playback from both MIDI and digital audio is supported.

VideoIn receives live input from a video source, typically a video camera. It performs motion detection by comparing each frame to the previous one. It outputs this information as motion occurring in user-definable zones.

Actor Speech

Actors have the ability to speak a variety of pre-recorded phrases, which they select in much the same way as they choose physical actions. Phrases are recorded and stored as AIFF files on the Macintosh. Upon selecting a phrase, an actor sends a message to initiate playback of the corresponding sound file by AiffPlayer. The system also tracks the location of the user in the virtual space relative to the speaking actor and adjusts volume and panning accordingly.

To enable lip synching, sound files are first analyzed using a Linear Predictive Coding (LPC) program called "lpanal" (part of MIT’s Csound package). Output from "lpanal" is further analyzed to determine the locations of vowels and consonants, or simply mouth opening based on loudness information. This second-stage analysis creates and saves an animation code file which translates this timing information into facial animations. When the actor speaks a particular phrase, the corresponding animation code is executed simultaneously.
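A minimal sketch of the second-stage analysis, assuming the first stage has already produced a per-frame loudness envelope for a phrase: louder frames map to wider mouth openings, and each frame becomes a timed animation code. The frame rate, threshold and output format are invented for the example and are not Improv’s actual animation code file format.

    # Turn a per-frame loudness envelope (assumed output of the LPC stage)
    # into timed mouth-opening codes for facial animation.
    FRAME_RATE = 30.0                      # analysis frames per second (assumed)

    def envelope_to_mouth_codes(loudness, quiet=0.05):
        peak = max(loudness) or 1.0        # avoid dividing by zero on silence
        codes = []
        for i, amp in enumerate(loudness):
            t = i / FRAME_RATE
            opening = 0.0 if amp < quiet else amp / peak   # 0 = closed, 1 = wide open
            codes.append((round(t, 3), round(opening, 2)))
        return codes

    # Toy envelope standing in for the analysis of a short phrase.
    for t, opening in envelope_to_mouth_codes([0.0, 0.1, 0.6, 0.9, 0.5, 0.2, 0.0]):
        print(f"{t:6.3f}s  mouth_open {opening}")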

Voice Recognition

Actors initialize the Reco object by sending a list of response sets to MAX. Each response set contains a list of expected responses from the user for a given situation. For example, when asked a "yes or no" question, the user might respond "yes," "no," "maybe," "I don’t know," etc. Sets are defined using a flexible syntax which allows a large number of responses to be specified succinctly. When expecting a response, the actor sends a message to enable a particular set. When the user speaks, the Macintosh attempts to analyze the utterance and match it to an item from the response set. Reco then returns a message to the actor with the name of the recognized phrase, or an "*unreco*" message to indicate an unrecognized utterance.

Environmental Sound

Using a visual interface patch in MAX, a sound designer can place sounds throughout an aerial map of a virtual space. By setting attributes such as range, volume and panning curves, the designer can tune the spatial characteristics of each sound (for example, making a sound more directional or more ambient). This information is stored by SoundMap and used to control the audio of the environment. During operation, Improv continually reports the user’s location and orientation in the virtual space. Based on this information, SoundMap turns sounds on and off and adjusts volume and panning individually for each sound based on the user’s relative position and orientation.

User Input

Using MAX, the system can acquire input from a video camera, serial devices (such as a magnetic tracker) or MIDI instruments (such as electronic keyboards or drums). Input data can be analyzed and filtered in MAX and used to control music and sound or to provide information to actors. For example, motion detection from VideoIn can inform an actor about how "lively" the user is. A performance on a MIDI instrument can be analyzed to provide volume and timing information to an actor. A magnetic tracking device can be used as an electronic baton to drive a musical performance (as outlined below).
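The motion detection mentioned above reduces to a very small sketch: compare each frame of grey-level pixels to the previous one and report the fraction of changed pixels inside author-defined zones, which an actor could then read as a rough "liveliness" value. The zone layout, frame representation and threshold below are assumptions for illustration, not VideoIn’s implementation.

    # Frame-differencing motion detection in user-defined zones.
    # Frames are lists of rows of grey-level pixel values (0-255).
    def zone_motion(prev, curr, zones, threshold=16):
        motion = {}
        for name, (x0, y0, x1, y1) in zones.items():
            changed = sum(
                1
                for y in range(y0, y1)
                for x in range(x0, x1)
                if abs(curr[y][x] - prev[y][x]) > threshold
            )
            motion[name] = changed / ((x1 - x0) * (y1 - y0))   # fraction that moved
        return motion

    zones = {"left": (0, 0, 4, 4), "right": (4, 0, 8, 4)}       # (x0, y0, x1, y1)
    previous = [[0] * 8 for _ in range(4)]
    current = [[0] * 4 + [200] * 4 for _ in range(4)]           # movement on the right
    print(zone_motion(previous, current, zones))                # {'left': 0.0, 'right': 1.0}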

Improv In Practice

Botanica Virtuàl

"Walking through the fog, you cross over a bridge and into the bayou. At a fork in the road, you meet Papa Legba, a huddled old man playing the harmonica. He looks up at you and says, ‘At the crossroads, anything can happen.’"

At SIGGRAPH ’96, we presented The Botanica Virtual, an immersive VR experience in which a participant enters a bayou swamp environment and meets a number of characters, each representing a Voodoo archetype. The Improv characters respond to the participant’s actions in the space as well as engaging the participant in conversation through speech recognition and generation.

Dancing Gregor

"Inside the Juke Joint, we hear the sounds of the bar clientele along with music blaring from the jukebox. The bartender turns down the jukebox and the virtual band begins their set. Gregor, a virtual actor, hears the band break into a blues shuffle and begins to dance."

The virtual band is an audio entity controlled by a user playing an electronic drum. The user functions as the conductor, controlling the tempo and volume of the band. Beats from the drum are received by MAX and used to drive the SeqPlayer object, which outputs the MIDI score file. The output is processed so that the velocity of the drum beats controls the band’s volume. Gregor "listens" to the band by means of messages indicating the beat, tempo and volume, sent via MAX’s "telnet" object. He synchronizes his step to the beat and chooses his dance style based on the tempo and volume. For example, when the music is slow and loud, Gregor will tend to dance in a "limbo" style.

Aria

"The conductor steps up to the podium, picks up the baton and commands the virtual orchestra to play. Gigio, a virtual opera singer, nods to the conductor and surveys the audience. He begins to sing and act out the aria in the tradition of great tenors past."

In the Aria installation, also presented at SIGGRAPH ’96, a user conducts the opera singer and orchestra using an electronic baton. The baton contains a magnetic tracking device used to sense its position. Data from the sensor is read in MAX and analyzed to derive a beat from each down stroke, as well as the amplitude of the stroke and the horizontal position. Beats from the baton drive the SeqPlayer object, thereby controlling the tempo. The amplitude of the baton stroke controls the volume of the orchestra and vocals, and the horizontal position controls the vowel sound of the vocalist, who sings with an "a, e, i, o or u" sound.

Gigio’s vocal part is generated on an SGI using MIT Csound, a real-time software synthesis program. In Csound, we use the FOF algorithm to create vocal synthesis of a male tenor singing in vowel sounds. The MIDI score file output from SeqPlayer contains both the orchestral and vocal parts. The orchestral part is output to a MIDI synthesizer, while the vocal part is formatted into a Csound command and sent by MAX via telnet to the machine running Csound.
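As an illustration of the baton analysis described above (only a sketch; the sampling, smoothing and thresholds of the real installation are not specified here), a beat can be detected when the baton’s vertical position bottoms out after a down stroke, with the depth of the stroke standing in for its amplitude:

    # Detect beats in a stream of baton height samples: a beat is reported at
    # each local minimum that follows a sufficiently deep down stroke.
    def detect_beats(heights, min_depth=0.05):
        beats = []
        last_peak = heights[0]
        for i in range(1, len(heights) - 1):
            if heights[i] > last_peak:
                last_peak = heights[i]                  # top of the current stroke
            if heights[i - 1] > heights[i] <= heights[i + 1]:
                depth = last_peak - heights[i]          # how far the baton came down
                if depth >= min_depth:
                    beats.append((i, depth))            # (sample index, stroke amplitude)
                    last_peak = heights[i]
        return beats

    samples = [0.9, 0.6, 0.3, 0.2, 0.5, 0.8, 0.4, 0.1, 0.6]     # made-up height track
    print(detect_beats(samples))    # beats near samples 3 and 7, each about 0.7 deep

Each detected beat would then advance SeqPlayer by one beat, while the stroke amplitude scales the volume of the orchestra and vocals.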

Gigio is sent messages for the notes and the location in the score. He synchronizes his facial expression to the vowel sound and chooses his actions based on the score location. For example, he knows when to make his entrance, where the climax of the music is and when to take his bow.

Credits

Improv is a project of the New York University Media Research Laboratory and Center for Advanced Technology in conjunction with the Laboratório de Sistemas Integráveis of the University of São Paulo, Brazil. Special thanks to Ruggero Ruscioni for his direction on the Aria project. For further information, see http://cat.nyu.edu/projects.

References

Badler, N., C. Phillips and B. Webber, "Simulating Humans: Computer Graphics, Animation, and Control," Oxford University Press, 1993

Blumberg, B. M. and T. A. Galyean, "Multi-Level Direction of Autonomous Creatures for Real-Time Virtual Environments," Proceedings of SIGGRAPH 95, in Computer Graphics Proceedings, Annual Conference Series, pp. 47-54, ACM SIGGRAPH, New York, 1995

Bruderlin, A. and T. W. Calvert, "Goal-Directed, Dynamic Animation of Human Walking," Proceedings of SIGGRAPH 89, in Computer Graphics 23(4), pp. 233-242, ACM SIGGRAPH, New York, 1989

Bruderlin, A. and L. Williams, "Motion Signal Processing," Proceedings of SIGGRAPH 95, in Computer Graphics Proceedings, Annual Conference Series, pp. 97-104, ACM SIGGRAPH, New York, 1995

Dannenberg, R. B., "Real-Time Scheduling and Computer Accompaniment," in Current Directions in Computer Music Research, eds. M. V. Mathews and J. R. Pierce, MIT Press, Cambridge, MA, 1989

Dannenberg, R. B., "Nitely News," in program notes from The Second Artificial Intelligence Based Arts Exhibition, Joseph Bates, curator, published in AAAI-94, Seattle, WA, 1994

Johnson, M., "WavesWorld: A Testbed for Three Dimensional Semi-Autonomous Animated Characters," PhD thesis, MIT, 1994

Mathews, M. V., "The Conductor Program and Mechanical Baton," in Current Directions in Computer Music Research, eds. M. V. Mathews and J. R. Pierce, MIT Press, Cambridge, MA, 1988

Minsky, M., "The Society of Mind," MIT Press, 1986

Perlin, K. and A. Goldberg, "Improv: A System for Scripting Interactive Actors in Virtual Worlds," Proceedings of SIGGRAPH 96, in Computer Graphics Proceedings, Annual Conference Series, pp. 205-216, ACM SIGGRAPH, New York, 1996

Perlin, K., "Real Time Responsive Animation with Personality," IEEE Transactions on Visualization and Computer Graphics 1(1), IEEE, New York, 1995

Perlin, K., "An Image Synthesizer," Proceedings of SIGGRAPH 85, in Computer Graphics 19(3), pp. 287-293, ACM SIGGRAPH, New York, 1985

Rowe, R., "Interactive Music Systems," MIT Press, Cambridge, MA, 1992

Strassmann, S., "Desktop Theater: Automatic Generation of Expressive Animation," PhD thesis, MIT Media Lab, June 1991 (online at http://www.strassmann.org/straz/phdthesis.pdf)

Sundberg, J., L. Nord and R. Carlson, eds., "Music, Language, Speech, and Brain," Macmillan Press, Mount Vernon, NY, 1991

Unuma, M., K. Anjyo and R. Takeuchi, "Fourier Principles for Emotion-based Human Figure Animation," Proceedings of SIGGRAPH 95, in Computer Graphics Proceedings, Annual Conference Series, pp. 91-96, ACM SIGGRAPH, New York, 1995