Planning Animations Using Cinematography Knowledge

Kevin Kennedy and Robert E. Mercer

Cognitive Engineering Laboratory, Department of Computer Science
The University of Western Ontario, London, Ontario, CANADA
[email protected], [email protected]
Abstract. Our research proposes, and demonstrates with a prototype system, an automated aid for animators in presenting their ideas and intentions using the large range of techniques available in cinematography. Experienced animators can use techniques far more expressive than the simple presentation of spatial arrangements: effects and idioms such as framing, pacing, colour selection, lighting, cuts, pans, and zooms. In different contexts, a combination of techniques can create an enhanced effect or lead to conflicting effects, so cinematography offers a rich environment for automated reasoning and planning. Our system employs a knowledge base of cinematographic techniques such as lighting, colour choice, framing, and pacing to enhance the expressive power of an animation. The prototype system does not create animations, but assists in their generation. It is intended to enhance the expressiveness of a possibly inexperienced animator working in this medium.
1 Related Work
Some computer graphics systems have incorporated cinematographic principles. He et al. [3] apply rules of cinematography to generate camera angles and shot transitions in 3D communication situations. Their real-time camera controller uses a hierarchical finite state machine to represent the cinematographic rules. Ridsdale and Calvert [8] have used AI techniques to design animations of interacting characters from scripts and relational constraints. Karp and Feiner [4, 5] approach the problem of organizing a film as a top-down planning problem. Their method concentrates on the structure and sequencing of film segments. Perlin and Goldberg [7] have used AI techniques to develop tools to author the behaviour of interactive virtual actors. Sack and Davis [9] use a GPS model to build image sequences from pre-existing cuts based on cinematographic idioms. Butz [2] has implemented a tool with goals similar to our own, for the purpose of generating animations that explain the function of mechanical devices. His system uses visual effects to convey a communicative goal; the animation scripts are incrementally generated in real time and presented immediately to the user.

E. Stroulia and S. Matwin (Eds.): AI 2001, LNAI 2056, pp. 357–360, 2001.
© Springer-Verlag Berlin Heidelberg 2001
2 RST Plan Representation
The transformation from animator intent into presentation actions requires a structured methodology to allow implementation. For this purpose we employ Rhetorical Structure Theory (RST) [6]. Though RST was envisioned as a tool for the analysis of text, it also functions in a generative role. Its focus on communicative goals is useful for modelling the intentions of the author, and how these intentions control the presentation of the text. This technique is used by Andre and Rist to design illustrated documents [1]. In our work the author is replaced by an animator and the text is replaced with images. The communicative acts are not composed of sentences, but are assembled from the structure and presentation of the scene.
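An RST plan of this kind can be pictured as a tree of communicative goals linked by rhetorical relations. The following sketch is illustrative only: the class and relation names are our invention, not the paper's representation, which is expressed in LOOM.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class RSTNode:
    """Hypothetical RST plan node: a relation joining a nucleus (the main
    communicative act) to optional satellite acts that support it."""
    relation: str                      # e.g. "elaborate", "motivate", "primitive"
    goal: str                          # the communicative goal this node serves
    nucleus: Optional["RSTNode"] = None
    satellites: List["RSTNode"] = field(default_factory=list)

    def leaves(self) -> List[str]:
        """Collect the primitive presentation acts in left-to-right order."""
        if self.nucleus is None and not self.satellites:
            return [self.goal]
        acts = self.nucleus.leaves() if self.nucleus else []
        for s in self.satellites:
            acts += s.leaves()
        return acts

# A toy plan: convey menace through lighting, elaborated by framing.
plan = RSTNode(
    relation="elaborate",
    goal="convey-menace",
    nucleus=RSTNode("primitive", "use-low-key-lighting"),
    satellites=[RSTNode("primitive", "frame-character-close-up")],
)
print(plan.leaves())  # ['use-low-key-lighting', 'frame-character-close-up']
```

Flattening the leaves of the tree yields the sequence of presentation acts that the rest of the pipeline would realize as scene structure, lighting, and framing.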
3 Design Approach
We are using a traditional AI approach: acquire and represent the knowledge, then build a reasoning system. The source of our knowledge is a traditional cinematography textbook [10]. The knowledge in this book is general in nature but lends itself to a simple rule-based treatment. There are three major components to the reasoning system: the knowledge base, the planner, and the renderer.

Knowledge Base. The knowledge base is our attempt to capture the “common sense” of cinematography. Some important concepts represented in the knowledge base are: cameras, camera positions, field of view, lights, colours, scenes, stage positions, solid objects, spatial relationships, 3D vectors, occlusion, moods, themes, and colour/light effects. Figure 1 shows an example of some of the knowledge presented in several chapters on scene lighting in our cinematography reference text. In this figure we have broken down the techniques described into their major classifications, arranging them from left to right according to the visual “energy” they convey. The terms written below each lighting method are the thematic or emotional effects associated with that technique. It is these effects that the animator can select when constructing a scene with our program.

In addition to lighting techniques, the knowledge base represents camera effects like framing, zooms, and wide-angle or narrow-angle lenses. Colour selections for objects and backgrounds, as well as their thematic meanings, are also contained in the knowledge base. These three major techniques (lighting, colour, and framing) can be used to present a wide variety of effects to the viewer.

We have used a qualitative reasoning approach to representation in our knowledge base. For instance, a size instance is categorized as one of tiny, small, medium-size, large, and very-large, while stage positions consist of locations like stage-right or stage-left-rear.
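The qualitative categories above can be sketched as finite value sets with a mapping from numeric scene measurements. The vocabulary below is taken from the text, but the Python structure and the numeric thresholds are our assumptions for illustration.

```python
# Qualitative value sets as described in the paper (the tuples themselves
# are from the text; representing them this way is our assumption).
SIZES = ("tiny", "small", "medium-size", "large", "very-large")

def qualitative_size(height_m: float) -> str:
    """Map a numeric height onto a qualitative size category.
    The threshold values are hypothetical; the paper does not give them."""
    thresholds = (0.1, 0.5, 2.0, 10.0)  # upper bounds for the first four sizes
    for label, limit in zip(SIZES, thresholds):
        if height_m < limit:
            return label
    return SIZES[-1]

print(qualitative_size(1.7))   # 'medium-size'
print(qualitative_size(50.0))  # 'very-large'
```

Reasoning over a handful of symbolic values like these, rather than raw numbers, is what lets the knowledge base state rules such as "a tiny subject suits a narrow-angle close-up" without numeric special cases.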
The knowledge base is written in LOOM, a classification/subsumption-based knowledge representation language implemented in LISP. LOOM represents knowledge using concepts and relations, which are arranged in a classification hierarchy. LOOM’s power lies in its ability to classify concepts into the classification hierarchy automatically.
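The automatic classification LOOM performs can be illustrated, in a drastically simplified form, by treating a concept as a set of required properties: one concept subsumes another when its requirements are a subset of the other's. The concept names and properties below are invented examples, and this sketch omits almost everything a real description logic classifier does.

```python
# Toy concept definitions: each concept is just its set of required properties.
CONCEPTS = {
    "Shot":          {"has-camera"},
    "LitShot":       {"has-camera", "has-lightset"},
    "LowKeyLitShot": {"has-camera", "has-lightset", "low-key"},
}

def subsumes(general: str, specific: str) -> bool:
    """A concept subsumes another if its requirements are a subset of
    the other's (every instance of `specific` satisfies `general`)."""
    return CONCEPTS[general] <= CONCEPTS[specific]

def classify(name: str) -> list:
    """Find a concept's ancestors, most general first, as a classifier
    would when slotting a new definition into the hierarchy."""
    parents = [c for c in CONCEPTS if c != name and subsumes(c, name)]
    return sorted(parents, key=lambda c: len(CONCEPTS[c]))

print(classify("LowKeyLitShot"))  # ['Shot', 'LitShot']
```

In the actual system this classification is what LOOM provides for free: asserting properties of a Shot instance automatically places it under the most specific concepts it satisfies.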
[Figure: a diagram arranging lighting techniques from left to right by the amount of light (“energy”) they convey, from low key to high key. The low-key side groups the chiaroscuro methods silhouette, fast fall-off, and cameo; the high-key side groups regular flat lighting, high key-light, flat + high key-light, over saturation, and reverse light (from below); shadow quality ranges from sharp to soft. Each technique is annotated with its associated thematic or emotional effects, e.g. fast fall-off with “sorrow, age, exhaustion” and reverse light with “disorientation, questionable credibility, ghostly, frightened”.]

Fig. 1. Semantic Deconstruction of Cinematography Lighting Models
Planner. The planner constructs RST plans which contain cinematographic instructions for presenting animation scenes. The planner is a depth-first forward chainer that actively analyzes the effects of the RST plan steps. While the RST plan is being constructed, the planner searches through the space of all possible RST plans implied by the predefined RST plan steps. The partial RST plan at any point is the “state” of the planner as it searches through possible plans.

As the planner proceeds, a description of the animation shot is created. A Shot concept contains relations (in frame terminology, slots) for characters, lightsets, colour choices, camera positions, etc. The specifics of a particular Shot are created through a series of constraints and assertions to the knowledge base. This specific Shot is an “instance” of the Shot “concept”. If at any point a Shot instance is found to be inconsistent (for example, it is constrained to be both brightly lit and dark at the same time), then this branch fails and the planner backtracks to try another approach.

If a plan succeeds, the resulting shot is presented to the animator. At this point, the animator can evaluate the scene using his or her own criteria and can choose to accept or reject the result. If the animator rejects a shot, the planner is told that the current solution is a failure. The planner then backtracks to the most recent choice point and continues to search for another solution.

Renderer. After the planner has found an RST plan for a shot, it can be rendered. The Shot instance for the plan contains all the information needed to render the scene visually. For this task we use the Persistence of Vision ray-tracer (POV-ray). A ray-tracer is needed to correctly render the complex lighting effects that can be generated by the RST planner. Alternatively, the shot can be rendered to VRML (Virtual Reality Modelling Language) and viewed with an appropriate tool.
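The depth-first search with backtracking on inconsistent Shot descriptions can be sketched as follows. The effect names, the alternative constraints, and the use of a plain dictionary in place of LOOM assertions are all our simplifications, invented for illustration.

```python
from typing import Optional

# Each desired effect expands into alternative constraints, tried in order
# (hypothetical examples; the real system draws these from its RST plan steps).
EFFECT_OPTIONS = {
    "menace":  [{"lighting": "low-key"}, {"lighting": "chiaroscuro"}],
    "clarity": [{"lighting": "high-key"}, {"lighting": "flat"}],
}

def consistent(shot: dict, constraint: dict) -> bool:
    """A constraint clashes when it re-binds an already-bound slot
    to a different value (the toy analogue of a LOOM inconsistency)."""
    return all(shot.get(k, v) == v for k, v in constraint.items())

def plan(effects: list, shot: dict) -> Optional[dict]:
    """Depth-first search over constraint choices; an inconsistent branch
    fails and the planner backtracks to the most recent choice point."""
    if not effects:
        return shot                     # all effects satisfied: a finished Shot
    head, rest = effects[0], effects[1:]
    for option in EFFECT_OPTIONS[head]:
        if consistent(shot, option):
            result = plan(rest, {**shot, **option})
            if result is not None:
                return result           # this branch succeeded
    return None                         # every branch failed; backtrack further

print(plan(["menace"], {}))             # {'lighting': 'low-key'}
print(plan(["menace", "clarity"], {}))  # None: no lighting satisfies both
```

An animator's rejection of a finished shot corresponds to treating the returned solution as a failure and resuming the search from the last choice point, exactly as the loop above resumes when a recursive call returns None.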
4 Current Status and Future Work
The present implementation accepts input statements about animator intentions and scene structure, and produces ray-traced images of the scene with appropriate lighting, colour choice, and framing applied. In the future we will concentrate on assembling short scenes from several distinct shots.

Acknowledgements. We would like to thank Robert E. Webber for his contributions to an earlier version of this paper. This research was funded by NSERC Research Grant 0036853.
References

1. E. Andre and T. Rist. The design of illustrated documents as a planning task. In Intelligent Multimedia Interfaces, pages 94–116. American Association for Artificial Intelligence, 1993.
2. A. Butz. Anymation with CATHI. In Proceedings of the 14th Annual National Conference on Artificial Intelligence (AAAI/IAAI), pages 957–962. AAAI Press, 1997.
3. L.-w. He, M. F. Cohen, and D. H. Salesin. The virtual cinematographer: A paradigm for automatic real-time camera control and directing. Computer Graphics, pages 217–224, August 1996. SIGGRAPH ’96.
4. P. Karp and S. Feiner. Issues in the automated generation of animated presentations. In Proceedings Graphics Interface ’90, pages 39–48. Canadian Information Processing Society, May 1990.
5. P. Karp and S. Feiner. Automated presentation planning of animation using task decomposition with heuristic reasoning. In Proceedings Graphics Interface ’93, pages 118–127. Canadian Information Processing Society, May 1993.
6. W. C. Mann and S. A. Thompson. Rhetorical structure theory: Toward a functional theory of text organization. Text, 8(3):243–281, 1988.
7. K. Perlin. Real time responsive animation with personality. IEEE Transactions on Visualization and Computer Graphics, 1(1):5–15, March 1995.
8. G. Ridsdale and T. Calvert. Animating microworlds from scripts and relational constraints. In N. Magnenat-Thalmann and D. Thalmann, editors, Computer Animation ’90, pages 107–118. Springer-Verlag, 1990.
9. W. Sack and M. Davis. IDIC: Assembling video sequences from story plans and content annotations. In Proceedings International Conference on Multimedia Computing and Systems, pages 30–36. IEEE Computer Society Press, 1994.
10. H. Zettl. Sight Sound Motion: Applied Media Aesthetics. Wadsworth Publishing Company, 1990.