Adaptive Video Documentaries

C. Rocchi
ITC-irst, Via Sommarive 18, Povo, Trento, Italy
[email protected]

M. Zancanaro
ITC-irst, Via Sommarive 18, Povo, Trento, Italy
[email protected]
ABSTRACT
We present a first step toward a system for video adaptation. We introduce a formalism, XASCRIPT, which allows the definition of templates with adaptation rules expressed as multiple-choice points over user-dependent features. We then illustrate a system that composes video documentaries describing 2D images. The novelty of the approach stems from the application of adaptive hypermedia techniques to the dynamic composition of video documentaries. The architecture of the system is presented and the features currently supported in the application scenario are discussed.
Keywords
adaptive systems, video documentaries, temporal media

1. INTRODUCTION
Although some interesting architectures are emerging to define and formalize the notion of Adaptive Hypermedia (see for example [1]), research in this field is primarily focused on hypermedia based on static media, mainly text and images. With some notable exceptions, few attempts have been made to provide a framework for the adaptation of temporal media. Not and Zancanaro [4] introduce a framework for the automatic composition of audio-based hypermedia. Lindley and colleagues [2] describe a system that automatically extracts and combines small video fragments from a news archive. Another noteworthy system is Cuypers, which generates web-based presentations from a multimedia database (see [6]).
We propose an initial step toward the adaptive composition of video documentaries, that is, video presentations built from still 2D images. Video clips are produced by applying cinematic techniques, mainly camera movements and transition effects, to the images. The novelty of our approach stems from the application of adaptive hypermedia techniques to the dynamic composition of video documentaries exploiting the basic notions of cinematography. We introduce XASCRIPT, a flexible mark-up formalism that allows a hypothetical author to define a set of possible video documentaries, from which user-tailored documentaries can be dynamically assembled. The author, via a graphical interface (under development), can describe templates that represent a set of potential video documentaries, together with instructions for the selection and editing of the elements of the video clips. The templates include adaptation rules as multiple-choice points, along which the presentation is tailored to a specific user. The application supports both content adaptation and the dynamic selection of transition effects between video shots.

2. XSCRIPT
The XSCRIPT formalism is an XML-based specification language that represents the logical structure of a video documentary by exploiting the basic notions of cinematography: shots, camera movements and transitions (see [3]). We define a shot as a sequence of camera movements applied to the same image. The basic camera movements are pan, tilt and zoom, respectively rotations of the camera along the x, y and z axes. Transitions among shots act as the punctuation symbols of cinematography; they affect the rhythm of the discourse and the message conveyed by the video. The main transitions are: cut, where the first frame of the shot to be displayed immediately replaces the last frame of the shot currently on display; fade, where a shot is gradually replaced by (fade-out) or gradually replaces (fade-in) a black screen or another shot; and cross-fade, the composition of a fade-out on the displayed shot and a fade-in applied to the shot to be shown. In [5] we described an automatic video planner that generates video documentaries starting from an audio commentary annotated at the discourse level. Here we report on the extensions brought to XSCRIPT to enable the definition of adaptive video templates.
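For concreteness, the notions above can be pictured with a small data-model sketch. This is purely illustrative and is not the XSCRIPT schema itself; the class names, fields and the example image file are assumptions of ours.

# Illustrative model of the building blocks discussed above (shots built
# from camera movements over a still image, plus transitions between shots).
# Names and fields are assumptions, not the actual XSCRIPT schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class CameraMovement:
    kind: str          # "pan", "tilt" or "zoom"
    amount: float      # e.g. degrees for pan/tilt, scale factor for zoom
    duration: float    # seconds

@dataclass
class Shot:
    shot_id: str
    image: str                                        # the still 2D image the shot is built on
    movements: List[CameraMovement] = field(default_factory=list)

@dataclass
class Transition:
    kind: str               # "cut", "fade-in", "fade-out" or "cross-fade"
    duration: float = 0.0   # a cut takes no time

# A hypothetical shot: a slow zoom on one panel, introduced by a two-second cross-fade.
shot01 = Shot("shot01", "january_panel.png", [CameraMovement("zoom", 1.4, 5.0)])
intro = Transition("cross-fade", 2.0)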
3. XASCRIPT
We have designed XASCRIPT to allow the definition of adaptation rules [7] and constraints that establish when and how the elements of the language of cinematography can be used to adaptively assemble a video documentary. XASCRIPT is a language for the definition of templates, that is, intensional descriptions of a set of potential documentaries, with multiple-choice points on user-dependent parameters. Once a video documentary is requested, the adaptation engine elaborates the templates according to the current user model and returns a script for a video documentary in XSCRIPT form. At the moment, XASCRIPT offers two ways to affect the presentation with user-dependent features: (i) the selection of shots and (ii) the choice of transition effects.
To create a template, the author first defines a shot repository. The second part of the template relates to editing, that is, how pieces of information (shots) are presented to the user. Editing is a key point in the generation of a movie. The choice of transitions affects the flow of the discourse and highlights the passage from one piece of information to another. In a classic text-based hypermedia, transitions from one document to another might not be important for the user's understanding. In video-based hypermedia they are crucial, as pointed out by cinematographers and film critics (see for example [3]), for they underline the rhythm of the presentation, signalling the (semantic) "closeness" of the scenes to which they apply. A director might choose a long cut to underline that two scenes are not closely related, whereas short and faded transitions are better suited to smoother changes.
To define how shots are to be sequenced, which transitions apply and how long transitions last, XASCRIPT implements adaptation rules. An adaptation rule is a condition-action structure, where conditions express requirements and actions are pieces of documentary (transitions) or other rules. We have designed XASCRIPT to support two types of resources for adaptivity: (i) user-model features (UM-expressions) and (ii) editing features (EDIT-expressions). A UM-expression is a check over the set of features encoded in the user model. In the current implementation the set of features includes:
Spatial position: the user has already been here, she is in front of an artwork, she is close to an exhibit;
Interests: the user is interested in a painting, she prefers details about the author;
Discourse history: the user has already seen a particular shot, she has already watched the presentation of an exhibit;
Background knowledge: the user has already been in the museum, she is a painting teacher;
Device: the user is visiting a museum equipped with a PDA, she is looking for information on the web with a desktop PC;
Skills: the user has good skills with technological tools;
Visiting style: the user tends to move a lot, tends to request additional information;
Expertise: the user is a novice, an expert, a child.
EDIT-expressions are conditions over the editing of the current movie and include:
Dependencies among content units: e.g. the selection of a given shot requires that another shot has already been selected;
Dependencies among presentation forms: e.g. if the last transition is not a cut, the next shot can fade in.
The combination of the two resources allows the definition of fine-grained templates, providing a flexible mechanism that supports both content adaptation and the dynamic selection of transition effects.
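The sketch below illustrates one plausible way to evaluate an adaptation rule that combines a UM-expression with an EDIT-expression. It is a minimal model under our own assumptions; UserModel, EditState, AdaptationRule and their fields are illustrative names, not the actual XASCRIPT machinery.

# Illustrative evaluation of an adaptation rule combining a UM-expression
# (a check on user-model features) with an EDIT-expression (a check on the
# editing state of the movie assembled so far). Names are assumptions.
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class UserModel:
    interests: List[str]
    seen_shots: List[str]
    device: str                      # e.g. "pda" or "desktop"

@dataclass
class EditState:
    selected_shots: List[str]
    last_transition: Optional[str]   # e.g. "cut", "fade-out", or None

@dataclass
class AdaptationRule:
    um_condition: Callable[[UserModel], bool]
    edit_condition: Callable[[EditState], bool]
    action: str   # here simply the transition to emit; in general, shots or nested rules

    def applies(self, um: UserModel, edit: EditState) -> bool:
        # A rule fires only when both the user-model and the editing conditions hold.
        return self.um_condition(um) and self.edit_condition(edit)

# Example: introduce the next shot with a fade-in only for PDA visitors
# and only if the previous shot ended with a fade-out.
rule = AdaptationRule(
    um_condition=lambda um: um.device == "pda",
    edit_condition=lambda ed: ed.last_transition == "fade-out",
    action="fade-in",
)

um = UserModel(interests=["painting"], seen_shots=["shot01"], device="pda")
edit = EditState(selected_shots=["shot01"], last_transition="fade-out")
print(rule.action if rule.applies(um, edit) else "cut")   # -> "fade-in"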
Figure 1: Example of template, built around the condition $currentMovie.getLastTransition() == fade-out.
4. AN APPLICATION SCENARIO
At the moment the system is adopted in an integrated architecture for the adaptive presentation of "The Cycle of the Months", a fresco in Torre Aquila (Trento, Italy) composed of eleven panels. The system detects the position of the visitor by means of infrared emitters placed in front of each panel. In this scenario we have defined a set of templates and a user model supporting all the features presented in Section 3. Templates implement different strategies to cope with content adaptation and, in particular, with the dynamic selection of transitions. Figure 1 shows an example of dynamic selection of transition effects: if the last transition is a fade-out, then shot02 is introduced by a fade-in of two seconds, otherwise it is simply displayed.
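Since the full markup of Figure 1 is not reproduced here, the following minimal sketch renders the same conditional behaviour procedurally; apart from the shot and transition names taken from the description above, everything in it is hypothetical.

# Procedural rendering of the behaviour described for Figure 1:
# shot02 is preceded by a two-second fade-in only if the movie assembled
# so far ends with a fade-out; otherwise it is simply displayed.
# Illustrative sketch, not the XASCRIPT template itself.
def introduce_shot02(last_transition: str) -> list:
    if last_transition == "fade-out":
        return [("fade-in", 2.0), ("shot02",)]
    # Otherwise shot02 is shown with no transition effect.
    return [("shot02",)]

print(introduce_shot02("fade-out"))  # [('fade-in', 2.0), ('shot02',)]
print(introduce_shot02("cut"))       # [('shot02',)]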
5. REFERENCES
[1] P. De Bra, A. Aerts, D. Smits, and N. Stash. AHA! version 2.0: More adaptation flexibility for authors. In Proceedings of the AACE E-Learn Conference, Budapest, Hungary, October 2002.
[2] C. Lindley, J. Davis, F. Nack, and L. Rutledge. The application of rhetorical structure theory to interactive news program generation from digital archives. Technical Report INS-R0101, CWI, Centrum voor Wiskunde en Informatica, January 2001.
[3] C. Metz. Film Language: A Semiotics of the Cinema. Oxford University Press, 1974.
[4] E. Not and M. Zancanaro. The MacroNode approach: Mediating between adaptive and dynamic hypermedia. Lecture Notes in Computer Science, 1892, 2000.
[5] C. Rocchi and M. Zancanaro. Generation of video documentaries from discourse structures. In Proceedings of the 9th European Workshop on Natural Language Generation, Budapest, Hungary, 13-14 April 2003.
[6] J. van Ossenbruggen, J. Geurts, F. Cornelissen, L. Rutledge, and L. Hardman. Towards second and third generation web-based multimedia. In Proceedings of the 10th International World Wide Web Conference, Hong Kong, 1-5 May 2001.
[7] H. Wu, E. de Kort, and P. De Bra. Design issues for general purpose adaptive hypermedia systems. In Proceedings of the 12th ACM Conference on Hypertext and Hypermedia, Aarhus, Denmark, August 2001.