Adding Dynamic Visual Manipulations to Declarative Multimedia Documents

Fons Kuijk, Rodrigo Laiola Guimarães, Pablo Cesar and Dick C. A. Bulterman
CWI: Centrum Wiskunde & Informatica, Science Park 123, 1098 XG Amsterdam, The Netherlands
+31 20 592 93 33
{fons.kuijk, rlaiola, p.s.cesar, dick.bulterman}@cwi.nl

ABSTRACT

Imagine a tourist who went on holiday to Rio de Janeiro, Brazil. Suppose he visited the 'Hipódromo da Gávea' to watch a horse race. Next he walked alongside the 'Lagoa Rodrigo de Freitas', where he saw a Christmas tree on a platform floating in the lagoon. In the afternoon he took a boat trip to the Cagarras Islands, and the next morning he climbed the Two Brothers Hill to see a magnificent sunrise. Our tourist was impressed, and now wants to put some of his excitement into a slideshow presentation of his journey. A photograph (Figure 1) can be used to tell the story of his trip. In Figure 1a, regions of interest are highlighted, metadata has been associated with the regions, and our authoring tourist may have added comments to the photo and the regions.

The objective of this work is to define a document model extension that enables complex spatial and temporal interactions within multimedia documents. As an example we describe an authoring interface of a photo sharing system that can be used to capture stories in an open, declarative format. The document model extension defines visual transformations for synchronized navigation driven by dynamically associated content. Thanks to the open declarative format, the presentation content can be targeted to individuals while maintaining the underlying data model. The impact of this work is reflected in its recent standardization in the W3C SMIL language. Multimedia players such as the Ambulant Player and RealPlayer support the extension described in this paper.

Categories and Subject Descriptors
H.5.1 [Information Interfaces and Presentations]: Multimedia Information Systems - Animations. I.7.2 [Document and Text Processing]: Document Preparation - Format and notation, hypertext/hypermedia, languages and systems, multi/mixed media.

General Terms
Documentation, Experimentation, Human Factors, Languages.

Keywords
Animation, Content Enrichment, Declarative Language, Media Annotation, Pan and Zoom, Photo Sharing, SMIL.


1. INTRODUCTION
Currently, there are a number of Web-based interfaces for sharing pictures with others (e.g., Flickr, Picasa, and Facebook). Among the social features that make these image-sharing interfaces popular is the ability to include metadata and to add comments and tags to regions of interest [1][5]. As a result, the amount of information linked to media material on the Web has grown exponentially. The question remains: has this trend materialized in innovative structured multimedia documents and models?

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. DocEng'09, September 16-18, 2009, Munich, Germany. Copyright 2009 ACM 978-1-60558-575-8/09/09...$10.00.

Figure 1. Panorama of Rio de Janeiro. (a) Regions of interest: the Hippodrome, a floating Christmas tree, the Cagarras Islands, and Two Brothers Hill. (b) Paths: the interconnecting path indicates the temporal ordering.


Envision a presentation in which various descriptions associated with these regions are presented as dynamic, synchronized pans and zooms across the image area, temporally ordered by a connecting path as indicated in Figure 1b. A timeline representation is shown in Figure 2a. Consumers of the story may select interactive options accessible as temporal and spatial hyperlinks. The presentation can be customized [4] to support user interests, as is illustrated in Figure 2b and c.

Section 2 of this paper provides an overview of existing technical solutions and systems. Section 3 discusses extensions to existing declarative languages. Section 4 validates the contribution by elaborating on an authoring system that, based on the proposed extensions, converts photos, a-temporal in nature, into structured multimedia documents.

2. EXISTING TECHNOLOGY
To realize rich media presentations we discern three categories: rich media environments, general-purpose representation languages, and multimedia representation languages.

Rich Media Environments. Silverlight and Flex both provide a proprietary solution for presentation functions and programming logic (JavaScript and ActionScript, respectively), yet implementing pan and zoom functionality is non-trivial.

General-Purpose Representation Languages. HTML+CSS currently does not support animations, transitions, or pan and zoom functionality, and rich relationships between different media elements cannot be specified. JavaScript libraries that work across distinct platforms, such as Yahoo! UI and jQuery, make the animation creation task easier.


Multimedia Representation Languages. SMIL (Synchronized Multimedia Integration Language) [9], SVG (Scalable Vector Graphics)1, and NCL (Nested Context Language) [8] are XML-based languages that describe multimedia presentations in a declarative manner. They include wide support for media playback, spatial layout, and temporal composition; support for animations is a strong point. SMIL provides tags for building interactive multimedia presentations. SVG describes static and animated two-dimensional vector graphics; it can be purely declarative or may include scripting. NCL is used to specify interactive multimedia presentations. The major difference between these languages is the support for pan and zoom; the most complete feature set is offered by SMIL 3.0.
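To make the declarative style concrete, the fragment below sketches a minimal SMIL 3.0 presentation in which an image and an audio narration are composed in parallel. The region name and media file names are illustrative assumptions, not taken from the paper.

```xml
<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0">
  <head>
    <layout>
      <root-layout width="640" height="480"/>
      <region id="photo" width="640" height="480"/>
    </layout>
  </head>
  <body>
    <par>
      <!-- the image and the narration are scheduled in parallel -->
      <img src="rio.jpg" region="photo" dur="10s"/>
      <audio src="narration.mp3"/>
    </par>
  </body>
</smil>
```

Temporal composition is expressed entirely with the par and seq containers; no scripting is needed.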


3. SMIL MediaPanZoom MODULE
Our proposed extension to the SMIL 3.0 recommendation, as presented in this paper, now forms part of the standardized MediaPanZoom module2. The SMIL panZoom attribute and the animate element in particular are relevant for the specification of the envisioned structured multimedia presentation. The panZoom attribute of the MediaPanZoom module integrates the functionality of the SVG viewBox attribute and adapts it for use within the SMIL media framework. The panZoom attribute allows users to define an area of the media object that is projected within the panZoom area into a SMIL presentation, as shown in Figure 3. The panZoom area may be smaller than, equal to, or larger than the media object area. The fit attribute, or sub-region positioning and alignment directives, dictates scaling.
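A static use of the attribute can be sketched as follows; the coordinate quadruple mirrors the SVG viewBox form, and the specific values, region name and file name are illustrative assumptions.

```xml
<!-- Project only the 300x200 sub-area of the photo whose
     top-left corner lies at (100,50); the fit attribute
     dictates how that area is scaled into the region -->
<img src="rio.jpg" region="photo" dur="5s"
     panZoom="100,50,300,200" fit="meet"/>
```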

Figure 2. Starting with the full image, the presentation zooms in on the first region, pans to the successive regions of interest and finally zooms out to the full image again (a). Based on user interest, metadata or external events, the user can get his preferred events presented (b) and (c).
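Customized presentations such as those in Figure 2b and c could, for instance, be selected with SMIL custom test attributes. The sketch below assumes each storyline variant is modeled as an alternative inside a switch; the test ids and file names are illustrative, not part of the proposed module.

```xml
<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0">
  <head>
    <customAttributes>
      <!-- viewer profiles; exactly one is active at playback time -->
      <customTest id="family" defaultState="true"/>
      <customTest id="friends" defaultState="false"/>
    </customAttributes>
  </head>
  <body>
    <switch>
      <!-- storyline variant shown to the family -->
      <seq customTest="family">
        <audio src="family-story.mp3"/>
      </seq>
      <!-- alternative storyline for close friends -->
      <seq customTest="friends">
        <audio src="friends-story.mp3"/>
      </seq>
    </switch>
  </body>
</smil>
```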

If supported by the profile implementing the MediaPanZoom module, a dynamic pan-and-zoom effect is obtained by applying standard SMIL animation primitives to the dimensions of the panZoom area. Panning is obtained by varying the X and Y positioning values; for zooming, the size dimensions of the panZoom area have to be varied. If a panZoom area extends past the viewable extents of a media object, the effective contents of these extended areas will be transparent.

Unfortunately this scenario is hard to realize. External authoring tools for creating and customizing complex declarative media presentations often cannot easily access the vast amount of data linked to media. Reuse of media and annotations can be offered by using open formats.

1 http://www.w3.org/Graphics/SVG
2 http://www.w3.org/TR/SMIL3/smil-extended-media-object.html
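The dynamic pan-and-zoom effect described above can be sketched as follows. The assumption that the animate element may target the panZoom attribute follows the module description; the coordinates, timings and file names are illustrative.

```xml
<par>
  <img src="rio.jpg" region="photo" dur="12s"
       panZoom="0,0,640,480">
    <!-- zoom in: shrink the panZoom area from the full image
         to the first region of interest -->
    <animate attributeName="panZoom"
             from="0,0,640,480" to="180,90,160,120"
             begin="0s" dur="3s" fill="freeze"/>
    <!-- pan: shift the X position at constant area size -->
    <animate attributeName="panZoom"
             from="180,90,160,120" to="400,90,160,120"
             begin="5s" dur="3s" fill="freeze"/>
  </img>
  <!-- narration synchronized with the visual transitions -->
  <audio src="narration-region1.mp3" begin="0s"/>
</par>
```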

Figure 3. An example of SMIL panZoom attribute usage. Scaling is dictated by the fit attribute.

It is important to highlight that SMIL allows users to add metadata to any of the elements contained in the document specification. Moreover, authors can synchronize pan and zoom effects with audio or captions, and even generate customized presentations for distinct audiences.

4. A PHOTO BROWSER APPLICATION OF PAN ZOOM
The authoring tool we present here converts photos into temporally structured documents; currently, timed enrichments include audio narratives. Authoring systems can be full-fledged, providing powerful functionality; ours provides very specific functionality, thus keeping it affordable and simple for users. A sequence of images and regions on these images forms the basis of an animated storyline (see Section 1). The author can sort images on an image-level timeline (see Figure 4). For each image, the author can manually identify regions of interest (Figure 1a). In the future we aim to import region annotations from photo sharing systems or MPEG-7 descriptors [7], or to have them identified by automatic recognition tools. In addition, the author can link recorded audio files (e.g., saying "On this mountain I did see….") and other information (textual descriptions and annotations) to the regions. It is important to note that regions and associated linked media are not embedded, thus assuring customizable and accessible capabilities.

Figure 4. Image-level (bottom row) and Region-level (upper row) timeline showing the order of images and regions and the transition between them (F is a fade-in fade-out, P is panning).

Having identified the regions of interest, the author can define animated paths between regions (Figure 1b) by sorting regions on a region-level timeline (see Figure 4). The ordering determines temporal transitions, created by the system, that maintain visual continuity. Similar to regions, transitions support linked media (e.g., an audio file saying "after the horse race, I went to…"). The resulting multimedia presentation (Figure 2a) will incorporate the regions, paths, and annotations provided by the author. It will start by zooming in on the first region of interest while the linked information is displayed, either as text or as synchronized audio. Then, following the defined paths, a transition to the next region of interest takes place, and so on. Customized presentations can be created by reordering the paths (as shown in Figure 2b and c) or by providing different associated media files.

The authoring system is implemented in Java. It has a preview mode to allow for fine-tuning the presentation. A presentation is created based on an internal data model that handles regions, comments, annotations and paths independently. This model can be serialized in XML format for persistence and to be able to cut, copy and paste a storyline or parts of it. Regions are bound to images; storylines combine images, regions and audio fragments. Audio fragments can be coupled to regions or clusters of regions and may control timing of the static and dynamic panZoom functionality. The author can export the presentation as a structured multimedia document. The encoding is based on SMIL, which, being an open format, offers navigation and customization to the end-user. The images are referred to via URIs, maintaining the integrity of the sources. The timed annotations for transitions between the regions of interest include functionality of the SMIL MediaPanZoom module. The visual component of a storyline is in effect a sequence of static and dynamic panZoom components. Support


for interaction is obtained by using temporal and spatial hyperlinks. Specific storylines can be targeted to individual users, so that watching the presentation may become an interactive, personalized experience. Although there is common ground, our tourist may want to convey a story to his family that differs from what he wants to tell his close friends.

Photo management tools and video production systems such as iPhoto, iMovie, Photo Animation and Photo Story pack the media in a self-contained media file. By doing so, media integrity is not maintained, timed navigation and selective viewing are not supported, and reuse of media and metadata is not possible. MemoryNet Viewer [6] does not enable end-users to add timed enrichments such as pan and zoom, nor to export to an open document format. StoryTrack [2] stores stories and metadata in XML that can be translated to HTML or SMIL for sharing with others; however, an audio track cannot span a collection of images, and users cannot add timed pan/zoom enrichments. The Flipper System [3] does not offer timed enrichments, and it is not possible to create a story based on a set of images. iTell [5] supports storywriting and media production; it associates scripts with voice-over and images. Stories are expressed in SMIL, but timed annotations and hyperlinks are not supported. Web-based photo sharing services (e.g., Flickr) and community-sharing environments (e.g., Orkut) do allow users to share pictures with the ability to add (spatial) notes and comments. However, all annotations are a-temporal and site-specific.

5. CONCLUSIONS
This paper describes an extension to a declarative multimedia language that allows authors to add customizable panning and zooming capabilities to media content.

As an example we presented our authoring system that adds temporal aspects to content. For this, the author has to identify regions of interest, associate comments and audio commentaries, and determine a navigation path between the regions. The result is a structured multimedia document in which the base media objects (recorded audio, textual descriptions and base image) are linked to each other, not encoded into an embedded format. This enhances their re-usability in different contexts and different environments.

Given the right facilities, the consumer can walk out on the linear storyline of the presentation; the links assure that playing narratives and panning and zooming the images maintain synchronization. Meanwhile the presentation can continue to play an optional continuous audio stream (e.g., background music) that in turn can be synchronized with other internal or external events. Of course the presentation can also be consumed in linear fashion.

The MediaPanZoom module we proposed has been included in the SMIL 3.0 standard. Mainstream multimedia players, such as the Ambulant Player (http://www.ambulantplayer.org) and the RealNetworks player (http://www.realnetworks.com), already support it.

6. ACKNOWLEDGMENTS
This work was supported by EU FP7-ICT project TA2. The development of the Ambulant Player is supported by the NLnet foundation.

7. REFERENCES
[1] Ames, M. and Naaman, M. 2007. Why we tag: motivations for annotation in mobile and online media. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. CHI '07. ACM Press, New York, NY, 971-980. DOI= http://doi.acm.org/10.1145/1240624.1240772
[2] Balabanović, M., Chu, L. L., and Wolff, G. J. 2000. Storytelling with digital photographs. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (The Hague, The Netherlands). CHI '00. ACM Press, New York, NY, 564-571. DOI= http://doi.acm.org/10.1145/332040.332505
[3] Counts, S. and Fellheimer, E. 2004. Supporting social presence through lightweight photo sharing on and off the desktop. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Vienna, Austria). CHI '04. ACM Press, New York, NY, 599-606. DOI= http://doi.acm.org/10.1145/985692.985768
[4] Jansen, J. and Bulterman, D. C. A. 2008. Enabling adaptive time-based web applications with SMIL state. In Proceedings of the Eighth ACM Symposium on Document Engineering. DocEng '08. ACM Press, New York, NY, 18-27. DOI= http://doi.acm.org/10.1145/1410140.1410146
[5] Landry, B. M. and Guzdial, M. 2006. iTell: supporting retrospective storytelling with digital photos. In Proceedings of the 6th Conference on Designing Interactive Systems (University Park, PA, USA). DIS '06. ACM Press, New York, NY, 160-168. DOI= http://doi.acm.org/10.1145/1142405.1142432
[6] Rajani, R. and Vorbau, A. 2004. Viewing and annotating media with MemoryNet. In Extended Abstracts on Human Factors in Computing Systems (Vienna, Austria). CHI '04. ACM Press, New York, NY, 1517-1520. DOI= http://doi.acm.org/10.1145/985921.986104
[7] Ryu, M.-S., Park, S.-J., and Won, C. S. 2005. In Applications of Digital Image Processing XXVIII, Tescher, A. G., Ed. Proceedings of the SPIE, Vol. 5909, 232-240.
[8] Soares, L. F. G. and Rodrigues, R. F. 2006. Nested Context Language 3.0: Part 8 - NCL Digital TV Profiles. Technical Report 35/06, Department of Informatics, PUC-Rio. ISSN 0103-9741.
[9] Synchronized Multimedia Integration Language (SMIL 3.0). W3C Recommendation, 01 December 2008. http://www.w3.org/TR/SMIL.

