Predictable Processing of Multimedia Content, using MPEG-21 Digital Item Processing

Chris Poppe, Frederik De Keukelaere*, Saar De Zutter, Wesley De Neve, and Rik Van de Walle

Ghent University - IBBT, Department of Electronics and Information Systems - Multimedia Lab, Gaston Crommenlaan 8, B-9050 Ledeberg-Ghent, Belgium
{chris.poppe,frederik.dekeukelaere,saar.dezutter,wesley.deneve,rik.vandewalle}@ugent.be
http://www.multimedialab.elis.ugent.be/

Abstract. Within an MPEG-21 architecture, the two key concepts are the Digital Item, representing multimedia content, and Users, interacting with this content. MPEG-21 introduced Digital Item Processing to allow content authors to describe suggested processing of their Digital Items. It standardizes ways to insert functionality into a Digital Item, thereby creating a dynamic and interactive multimedia format. A terminal that wants to support Digital Item Processing needs to provide an execution environment offering basic functionality. The semantics of this functionality have been standardized; however, there is significant room for interpretation. Consequently, a Digital Item author may not know how a given terminal will actually process the content when this functionality is used. In this paper, a system is proposed, compliant with the Digital Item Processing specification, that gives content creators full control over the processing. This allows the creation of advanced, predictable multimedia systems in an MPEG-21 environment.

1 Introduction

ISO/IEC 21000, better known as MPEG-21, is a standards suite developed by MPEG. It aims to create a multimedia framework for the creation, delivery, and consumption of multimedia content across a wide range of networks and devices [1]. An MPEG-21-compliant terminal is a terminal that provides the necessary functionality to process MPEG-21 content; in the remainder of this paper it is called an MPEG-21 terminal. A content author creates a Digital Item (DI), containing references to multimedia content and metadata. Functionality embedded in the DI allows the author to define the way the content should be processed when an MPEG-21 terminal is used to consume it.

* The work considered in this paper was partially conducted while Frederik De Keukelaere was a PhD student at Multimedia Lab. Since October 1, 2006, he has been working at IBM Japan Tokyo Research Laboratory.

This paper focuses on the processing part, defined in part 10 of the MPEG-21 Multimedia Framework, named Digital Item Processing (DIP) [2]. DIP allows the execution of script code inserted in the DI. Moreover, it provides basic functionality with standardized interfaces, available on any DIP-compliant terminal (called a DIP terminal). DIP has been successfully applied in the Los Alamos National Laboratory Digital Library for the dissemination of digital objects and as a means to create services, linked to the digital objects, which can be executed by agents [3]. In that case, the DIP terminals were all implemented by the same person and known to the DI authors. In a more open environment, by contrast, different DIP terminals might interpret the standardized interfaces differently. In an MPEG-21 environment, these differing implementations make it hard to create DIs containing DIP functionality that behave the same everywhere.

In this paper, a system is presented that allows the content author to take full control of the processing by using standardized DIP functionality. The proposed system allows an author to implement his own set of basic functionality, which is then used on a client device when the content is consumed. As a result, the author knows exactly how his content will be processed, while still maintaining MPEG-21 compliance.

The outline of the paper is as follows. The next section elaborates on the MPEG-21 standard, specifically on DIP. Section 3 describes the issues arising within a ubiquitous MPEG-21 environment, and Section 4 presents our solution. Section 5 elaborates on new use cases that become achievable with the proposed system. Finally, Section 6 formulates a number of concluding remarks.

2 MPEG-21

The aim of MPEG-21, the so-called Multimedia Framework, is to enable transparent and augmented use of multimedia resources across a wide range of networks, devices, and communities. In this framework, the fundamental unit of transaction is a Digital Item (DI) as defined in part 2, named Digital Item Declaration (DID) [1]. This part defines the structure of a DI, which can contain (references to) multimedia content and metadata. The declaration of a DI uses an XML-based language, called Digital Item Declaration Language (DIDL), which defines the structure of the items. A DI can, for example, represent a music collection including audio files, descriptive information about every song, graphical elements representing CD covers, etc. This is a static representation, meaning there is no information available on how a DI should be processed by a consumer.

DIP has been created to permit the author of a DI to add explicit information on how the item should be processed [2]. This way, an author can, for instance, add a method to the item which shows the cover of the CD while playing a song and displaying a textual description. DIP allows the addition of interaction to the static declaration of a DI by means of Digital Item Methods (DIMs). These methods are written in the Digital Item Method Language (DIML), which extends ECMAScript [5]. They are essentially code fragments inserted in the XML representation of the DI. A DIP terminal will most likely contain a module, called a DIP engine, capable of executing these methods.

To extend the scripting functionality, DIP provides specific multimedia processing by defining a standardized set of functions, so that similar behavior can be obtained on different terminals. These functions, called Digital Item Base Operations (DIBOs), form a library, available on any DIP terminal, which can be called from within a DIM. MPEG-21 has standardized the interfaces and semantics of this functionality, and the developer of the DIP engine is responsible for providing an implementation. The advantage is that different vendors can compete on their implementations. The DIBOs are divided into different categories, relating to different parts of the MPEG-21 framework. Moreover, the DOM Level 3 Core API and the DOM Level 3 Load and Save API [6] are included in DIML, allowing access, manipulation, loading, and serializing of the DID at the XML level. For a detailed description of the DIBOs, the reader is referred to [1].

An example of a DI containing DIP functionality is shown in Fig. 1. The XML representation shows two Components. The first component (identified by the id “movieResource”) defines a movie resource and a descriptor stating that this element represents a Movie object. The second component (identified by the id “DIM”) contains a resource that represents a DIM. The first descriptor in this component is used to indicate the presence of a DIM. The second descriptor is used to denote the type of arguments the DIM takes. Consequently, the figure shows a DIM which takes a Movie object as argument and then executes a DIBO (DIP.play() in the example) on this object. The play DIBO is one of the DIP-related DIBOs and renders the element, passed as an argument, into a transient and directly perceivable presentation.
When the functionality provided by the DIBOs on a DIP terminal is not sufficient, a DI author can make use of Digital Item eXtension Operations (DIXOs). A DIXO is externally generated code which can be included in the DI. DIP defines ways to invoke DIXOs from inside a DIM, and a DIXO has access to the entire DIBO set through standardized bindings. In short, DIBOs are part of a DIP terminal, whereas DIXOs are typically created externally by a DI author. The language of the DIXOs can be chosen freely, but currently only DIXOs written in the Java language, called J-DIXOs, are standardized. The J-DIXO itself is a Java class (if necessary included in a Java archive) which implements a pre-defined J-DIXO interface. A specific DIBO has been defined, called runJDIXO, which invokes the J-DIXO.

Fig. 2 shows the different components of an MPEG-21 terminal from a DIP point of view. The DIP engine takes a central position: it interacts with the User and is connected to additional modules related to the different MPEG-21 parts. A DID engine parses the item and forwards the DIP elements to the DIP engine. Through the DIBOs, an interface is created for a DI to utilize (part of) the underlying platform. However, there is room for interpretation of the semantics of the DIBOs, and this vagueness can introduce several problems in real-life scenarios, as will be discussed in the next section.
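The J-DIXO mechanism described above can be pictured with a small Java sketch. It is only an illustration under stated assumptions: the interface names `JDixoSketch` and `TerminalDibos`, the method `runJDixo`, and the class `AnnounceAndPlay` are hypothetical stand-ins, not the interface or bindings defined by the DIP specification.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the pre-defined J-DIXO interface; the real
// interface name and signature are defined by the DIP specification.
interface JDixoSketch {
    Object run(TerminalDibos dibos, Object[] args);
}

// Hypothetical view of the DIBO bindings that a terminal exposes to a DIXO.
interface TerminalDibos {
    void alert(String message);
    void play(Object element);
}

// Author-supplied extension: announces an element, then plays it,
// using only the DIBOs made available by the terminal.
class AnnounceAndPlay implements JDixoSketch {
    @Override
    public Object run(TerminalDibos dibos, Object[] args) {
        Object element = args[0];
        dibos.alert("Now playing: " + element);
        dibos.play(element);
        return Boolean.TRUE;
    }
}

// Minimal terminal that records which DIBOs were called.
class RecordingTerminal implements TerminalDibos {
    final List<String> calls = new ArrayList<String>();

    @Override
    public void alert(String message) { calls.add("alert:" + message); }

    @Override
    public void play(Object element) { calls.add("play:" + element); }

    // Stand-in for the runJDIXO DIBO: hands the DIBO bindings and the
    // argument array to an already instantiated DIXO.
    Object runJDixo(JDixoSketch dixo, Object[] args) {
        return dixo.run(this, args);
    }
}

class JDixoDemo {
    public static void main(String[] args) {
        RecordingTerminal terminal = new RecordingTerminal();
        terminal.runJDixo(new AnnounceAndPlay(), new Object[] {"movieResource"});
        System.out.println(terminal.calls);
    }
}
```

The point of the sketch is the direction of the dependencies: the DIXO is author code, but everything it does to the platform still passes through the terminal's DIBO bindings.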

Fig. 1. Example Digital Item

Fig. 2. MPEG-21 terminal

3 Problem Description

Given that every DIP terminal can have its own implementation of the DIBOs, a number of problems arise. Although the semantics of each function are determined, the actual implementation can vary considerably, making it hard for a DI author to compose his content without knowledge of the client application. A short example is the alert DIBO, which takes a string as parameter and alerts the User. The semantics of this DIP-related DIBO are defined as “provide simple textual feedback to the User” [1]. The actual interpretation of alerting a User is rather vague: it can mean synchronously showing a popup message on the screen, displaying a warning on the media player, or even just adding some information to a log file. This poses a problem for a content author if his application relies on the reaction of the User to this alert. Similar problems can be found for the other DIBOs. Several of the DIBOs can only be used to their full capacity if the creator of the DI is aware of the actual implementation of the DIBOs at the client side. Clearly, a mechanism is needed that allows DI authors to control the processing of their content in a more detailed manner.
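The alert example can be made concrete with a short Java sketch. The types `AlertTerminal`, `PopupTerminal`, and `LogOnlyTerminal`, as well as the hook `messagesSeenByUser`, are hypothetical and exist only for this illustration; they model two terminals that could each claim to provide textual feedback, while only one of them puts the message in front of the User.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Java view of a terminal's alert DIBO plus an observation hook.
interface AlertTerminal {
    void alert(String message);          // the DIBO under discussion
    List<String> messagesSeenByUser();   // observation hook, not part of DIP
}

// Interpretation A: the message is put in front of the User (e.g. a popup).
class PopupTerminal implements AlertTerminal {
    private final List<String> seen = new ArrayList<String>();

    public void alert(String message) { seen.add(message); }

    public List<String> messagesSeenByUser() { return seen; }
}

// Interpretation B: the message is only appended to a log file.
class LogOnlyTerminal implements AlertTerminal {
    final List<String> log = new ArrayList<String>();

    public void alert(String message) { log.add(message); }

    public List<String> messagesSeenByUser() { return new ArrayList<String>(); }
}

class AlertDemo {
    // Author logic that relies on the User actually noticing the alert.
    static boolean userWasWarned(AlertTerminal terminal) {
        terminal.alert("This clip requires a license");
        return !terminal.messagesSeenByUser().isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(userWasWarned(new PopupTerminal()));   // true
        System.out.println(userWasWarned(new LogOnlyTerminal())); // false
    }
}
```

The same author logic therefore yields a different outcome depending on the terminal, which is the unpredictability this section describes.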

4 System for Predictable Digital Item Processing

The solution we propose makes use of the existing technology defined by DIP, and therefore remains fully compliant with the standard. The basic idea is that a DI author provides his own implementation for a set of DIBOs (further called authorDIBOSet), encapsulated in a DIXO, which will then be used whenever a DIBO is called from within the DI. To accomplish this, a DI author adds a specific method which can be called by the DIP engine. By adding the attribute “autoRun” to the definition of the DIM and setting its value to “true”, the author ensures that a DIP engine will automatically execute this method when processing the DI. The method invokes a DIXO provided by the content author, containing the authorDIBOSet. The DIXO itself can be transported along with the DI or can be made available online.

The execution of the DIXO starts with changing the occurrences of the desired DIBO calls into calls to the DIXO itself, as shown in Fig. 3. Since the DIXO can make use of the DIBOs, and more specifically of the DOM functionality, the textual occurrences of the DIBO calls can easily be replaced by calls to the DIXO. The first four arguments of runJDIXO() are used to identify the element containing the Java archive or class, while the fifth argument (given the value “DIXOSet” in Fig. 3) defines the class to be executed. The last argument is an array of arguments which is passed to the class. In our system, the first element in this array identifies the DIBO call that was replaced, using the name of the DIBO (in this case the play DIBO, visible as the “play” argument of the DIXO call). The rest of the array is used to pass the original arguments of the DIBO to the DIXO. This ends the initialization phase, corresponding to step 1 in the sequence diagram in Fig. 4.

Fig. 3. Conversion of DIBO calls to DIXO calls

The DIXO will then allow the User to choose a DIM in the updated document. If a DIBO is invoked within that DIM, a direct call to the DIXO containing the authorDIBOSet is performed, with the appropriate arguments (step 2 in Fig. 4). At this point, the execution of the appropriate DIBO, provided by the DI author, starts, and return values, if any, are passed back to the invoking DIM. As such, the content author can be sure that his implementation of the DIBO set is used and can uniquely determine the outcome. This system is transparent to the User, who is not involved with the internal working of the methods, but only with the perceivable outcome.
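The two steps described above (textual conversion of DIBO calls, then dispatch on the first array element) can be sketched in Java as follows. This is a minimal sketch, not the implementation used in the paper: the class names, the regular-expression approach (the text mentions DOM functionality for the actual replacement), and the placeholder `loc1, loc2, loc3, loc4` for the four locating arguments of runJDIXO() are all assumptions made for illustration. The pattern only handles argument lists without nested parentheses.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Step 1: textual conversion of selected DIBO calls into DIXO calls.
class DiboCallRewriter {
    // Hypothetical placeholders for the four arguments that locate the DIXO code.
    static final String LOCATOR_ARGS = "loc1, loc2, loc3, loc4";

    private final Pattern callPattern;

    DiboCallRewriter(List<String> diboNames) {
        StringBuilder names = new StringBuilder();
        for (String name : diboNames) {
            if (names.length() > 0) names.append('|');
            names.append(Pattern.quote(name));
        }
        // Matches DIP.<name>(<args>) where <args> contains no parentheses.
        callPattern = Pattern.compile("\\bDIP\\.(" + names + ")\\(([^()]*)\\)");
    }

    String rewrite(String dimSource) {
        Matcher m = callPattern.matcher(dimSource);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String args = m.group(2).trim();
            // The DIBO name becomes the first array element, followed by
            // the original arguments of the call.
            String array = args.isEmpty()
                    ? "[\"" + m.group(1) + "\"]"
                    : "[\"" + m.group(1) + "\", " + args + "]";
            String call = "DIP.runJDIXO(" + LOCATOR_ARGS + ", \"DIXOSet\", " + array + ")";
            m.appendReplacement(out, Matcher.quoteReplacement(call));
        }
        m.appendTail(out);
        return out.toString();
    }
}

// One author-provided DIBO implementation (hypothetical interface).
interface AuthorDibo {
    Object invoke(Object[] originalArgs);
}

// Step 2: the DIXO entry point selects an author implementation by name.
class DixoSetSketch {
    private final Map<String, AuthorDibo> authorDiboSet = new HashMap<String, AuthorDibo>();

    void register(String diboName, AuthorDibo implementation) {
        authorDiboSet.put(diboName, implementation);
    }

    // args[0] names the replaced DIBO; the remaining elements are its
    // original arguments.
    Object run(Object[] args) {
        String diboName = (String) args[0];
        AuthorDibo implementation = authorDiboSet.get(diboName);
        if (implementation == null) {
            throw new IllegalArgumentException("No author implementation for " + diboName);
        }
        return implementation.invoke(Arrays.copyOfRange(args, 1, args.length));
    }
}

class RewriteDemo {
    public static void main(String[] args) {
        DiboCallRewriter rewriter = new DiboCallRewriter(Arrays.asList("play"));
        System.out.println(rewriter.rewrite("function playMovie(movie) { DIP.play(movie); }"));

        DixoSetSketch dixo = new DixoSetSketch();
        dixo.register("play", a -> "author player renders " + a[0]);
        System.out.println(dixo.run(new Object[] {"play", "movie"}));
    }
}
```

Keeping the DIBO name as the first array element means a single DIXO class can serve as the entry point for the whole authorDIBOSet, which matches the single “DIXOSet” class argument described in the text.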

[Figure: sequence diagram between Client, DIXO, and Server. Step 1: load DI, execute autorun DIM, get DIXO containing authorDIBOSet, change DIBO calls to DIXO calls. Step 2: DIXO call, call DIBO, result. Step 3: server-side processing, result.]

Fig. 4. Sequence diagram of the use of an authorDIBOSet

The DI author can deliver an implementation for every DIBO or restrict himself to the most relevant ones and reuse a number of the DIBOs available in the client application. As shown in step 3 of Fig. 4, the author can even choose to place the DIBO implementation on a centralized server, or provide the DIBO functionality through a web service, thereby reducing the processing effort on the client device. To increase the performance of the system, a simple check might be added in the DIM which loads the DIXO, to see if the set is already present in the client application. Since a content author typically produces numerous diverse DIs, there is a clear benefit in retrieving the authorDIBOSet only once and subsequently referring to it in those DIs.

The described system gives an author full control of the processing, allowing him to exploit his in-depth understanding of the content. As a result, the possibilities for the different DIBOs are extended in the following ways.

The DIBO related to Digital Item Adaptation (DIA) makes it possible to adapt elements of the DI. According to its semantics, the DIBO requests that an attempt be made to adapt a specific element. This attempt might fail if the DIP engine has no adaptation capabilities, or if it does not know how to interpret the input arguments. Since the DI author is aware of the actual format of the input arguments, he is better equipped to create the adaptation. Our system allows the author to introduce specific adaptation tools steered by associated metadata. MPEG-21 DIA defines tools to adapt DIs based on context information. Resources can be adapted according to descriptions of the usage environment, introducing an advanced quality of service. For example, within DIA, means are defined to describe the preferences of the user. This way, scenes of interest within a movie can be identified and could be displayed at higher quality [7]. The rich variety of adaptations introduced in DIA can only be used if appropriate software is available and if the format of the arguments is exactly known. Our system makes it possible to execute this adaptation, since the DI author himself will implement it.
If resource-intensive adaptations cannot be run on the client device, the author can choose to place them on a server.

The DIBOs related to part 2 of MPEG-21, Digital Item Declaration (DID), provide means to allow the end-user to make specific choices when dealing with a DI (for example, the choice of a specific movie to be shown). The way in which these choices are presented to the user is not defined. Through our system, the DI author can present the choices in a consistent and structured way, according to his own preferences.

The DIBOs related to part 3 of MPEG-21, Digital Item Identification (DII), specify means to retrieve elements from a DI according to a specific identification. The DI author is better suited to provide an implementation of these DIBOs, since he has a priori knowledge about the structure of the DI and the location of several identified elements. This way, high-cost XML processing can be avoided to increase the system's performance [8].

The DIP-related DIBOs, which mostly interact with the User, can be extended with rich user interfaces allowing consistent presentation of different DIs from the same content author. Playback of specific content can be achieved by providing appropriate codecs and even entire players which offer advanced control to the User.

The DIBOs related to part 5 of MPEG-21, Rights Expression Language (REL), can now be used according to the intention of the DI author. The author can set up his own license server and consequently has more control over the usage of his content.

A DI author can use an alternative to our proposed system. Upon creation of the DIMs within his DI, he might choose to use DIXOs instead of DIBOs. The playMovie DIM in Fig. 3 will then directly contain a DIXO call instead of the play DIBO call. This is similar to the DI formed after the initialization phase (step 1 in Fig. 4). Creating DIs in this manner has the advantage that there is no need to convert the DI at the User's side. However, when the DI author wants to change the inserted functionality, this has to be done for all the produced DIs. Our system collects the DIBO implementations in a set and allows easy updates. The author can also choose between DIBO implementations on the DIP terminal and implementations from the authorDIBOSet, whereas this is not possible in the alternative system, where the choice is hard-coded.
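The choice between terminal DIBOs and author DIBOs, together with the check that the authorDIBOSet is fetched only once, can be sketched as below. All names (`DiboResolver`, `ensureAuthorSet`, `call`) are hypothetical, and the lookup policy (author implementation first, terminal implementation as fallback) is one possible reading of the text rather than a prescribed behavior.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

class DiboResolver {
    private final Map<String, Function<Object[], Object>> terminalDibos;
    private Map<String, Function<Object[], Object>> authorDiboSet; // null until loaded
    private int loadCount = 0;

    DiboResolver(Map<String, Function<Object[], Object>> terminalDibos) {
        this.terminalDibos = terminalDibos;
    }

    // Fetches the author's set only if it is not yet present on the client,
    // so that several DIs from the same author can share one download.
    void ensureAuthorSet(Supplier<Map<String, Function<Object[], Object>>> loader) {
        if (authorDiboSet == null) {
            authorDiboSet = loader.get();
            loadCount++;
        }
    }

    int loadCount() {
        return loadCount;
    }

    // Prefers the author's implementation; falls back to the terminal's DIBO
    // for every operation the author chose not to override.
    Object call(String diboName, Object... args) {
        Function<Object[], Object> impl =
                authorDiboSet != null ? authorDiboSet.get(diboName) : null;
        if (impl == null) {
            impl = terminalDibos.get(diboName);
        }
        if (impl == null) {
            throw new IllegalArgumentException("Unknown DIBO: " + diboName);
        }
        return impl.apply(args);
    }
}

class ResolverDemo {
    public static void main(String[] args) {
        Map<String, Function<Object[], Object>> terminal = new HashMap<>();
        terminal.put("play", a -> "terminal:play " + a[0]);
        terminal.put("alert", a -> "terminal:alert " + a[0]);

        Map<String, Function<Object[], Object>> author = new HashMap<>();
        author.put("play", a -> "author:play " + a[0]);

        DiboResolver resolver = new DiboResolver(terminal);
        resolver.ensureAuthorSet(() -> author);
        resolver.ensureAuthorSet(() -> author); // second DI: no new download

        System.out.println(resolver.call("play", "movie"));
        System.out.println(resolver.call("alert", "hello"));
        System.out.println(resolver.loadCount());
    }
}
```

Because the override table is data rather than code embedded in each DIM, updating one entry changes the behavior of every DI that refers to the set, which is the maintenance advantage claimed over hard-coded DIXO calls.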

5 Discussions

Our system enables new use cases and new forms of service delivery. In this section, the use case of a museum equipped with an interactive multimedia infrastructure is presented.

In a closed environment, meaning a system in which the client applications and the provided content are known to the DI author, the author is aware of the actual processing. This might be the case in an interactive museum, where people get a museum-owned PDA containing an MPEG-21 terminal, which can be used to consume content provided by the museum (this was the use case of the European project DANAE¹). Since the client application is known to the museum, several assumptions can be made about the values of the different arguments of the DIBOs and about what happens during processing. If we want to achieve the goal of MPEG-21 and broaden the environment so that content authors do not need knowledge about the client MPEG-21 terminals, we foresee the difficulties for content authors mentioned above.

Consider a museum with the infrastructure to present interactive multimedia content. A central server stores MPEG-21 content and is responsible for delivering it to available terminals. If a user enters the museum carrying his own PDA, cell phone, or other multimedia device containing an MPEG-21 terminal, he will be able to consume the museum content. Context about the consumers is collected and processed to generate advanced quality of service and user experience.

We can work out the use case through the system presented in this paper. The museum will create an authorDIBOSet, implementing the relevant DIBOs, which will be used on all client devices. By using this authorDIBOSet, a specific user interface can be created for communication with the user (see Fig. 5). Advertisements can be added to the user interface without having to incorporate them in the media content itself. A consistent, uniform user interface is provided in any interaction and on any device. Control of the play DIBO makes it possible to play proprietary content with a specific codec or player according to the needs of the museum (in the figure, the player is a multimedia player able to show rich multimedia presentations including text, graphics, sound, and movies). A full implementation of the license-related DIBOs, making use of internal licensing and registration servers, can be used to deliver specific content to appropriate users. As such, customers with a subscription to the museum can get access to additional content or services. Heavy processing, like the adaptation of content, is done on the server side, allowing even the most constrained devices to consume the multimedia content. Information is gathered on the number of consuming clients, the maximum bandwidth, and the capacity of the museum's infrastructure, and is taken into account when performing adaptations or delivering content.

Fig. 5. Multimedia application using DIP

By using a centralized server, context can be collected and made available to each application. For example, the figure shows the name of the consumer, “Katrien”, when starting an interaction. The name is just a simple example of the various contextual information which can be collected when a consumer enters a museum. This use of context creates a more personalized approach to multimedia processing; it focuses on the key player in an interactive multimedia application, namely the user.

The presented solution allows the introduction of any MPEG-21 terminal into the multimedia infrastructure of the museum. Visitors can use their own multimedia devices, resulting in reduced costs for the museum. The use case presented in this section can be extended to other domains in which a multimedia infrastructure can be exploited, e.g., warehouses, educational environments, and cultural events. By providing the appropriate DIBO sets and domain-specific content, the existing client application can be used in other settings.

¹ http://danae.rd.francetelecom.com/

6 Conclusions

The major contribution of our work is the development of an MPEG-21-compliant system that extends the way a DI author can define the actual processing of his DI. The system gives content authors full control over the actual processing. A number of advantages across the different categories of DIBOs were presented, together with a use case showing the applicability of the system in a real-life scenario. Our system makes the processing of Digital Items more appealing and interesting for industrial content providers, allowing advanced service delivery.

Acknowledgments. The research activities that have been described in this paper were funded by Ghent University, the Interdisciplinary Institute for Broadband Technology (IBBT), the Institute for the Promotion of Innovation by Science and Technology in Flanders (IWT-Flanders), the Fund for Scientific Research-Flanders (FWO-Flanders), and the European Union.

References

1. Burnett, I., Pereira, F., Van de Walle, R., Koenen, R.: The MPEG-21 Book. Wiley. (2003) 195–204
2. De Keukelaere, F., De Zutter, S., Van de Walle, R.: MPEG-21 Digital Item Processing. IEEE Transactions on Multimedia. Vol. 7. (2005) 427–434
3. Bekaert, J., Balakireva, L., Hochstenbach, P., Van de Sompel, H.: Using MPEG-21 DIP and NISO OpenURL for the Dynamic Dissemination of Complex Digital Objects in the Los Alamos National Laboratory Digital Library. D-Lib Magazine. Vol. 10. (2004)
4. Poppe, C., De Keukelaere, F., De Zutter, S., Van de Walle, R.: Advanced Multimedia Systems Using MPEG-21 Digital Item Processing. Proceedings of Eighth IEEE International Symposium on Multimedia. (2006) 785–786
5. ECMA, Standard ECMA-262 ECMAScript Language Specification 3rd edition. (1999) http://www.ecma-international.org/publications/standards/Ecma-262.htm
6. The W3C Document Object Model. http://www.w3.org/DOM/
7. Devillers, S., Timmerer, C., Heuer, J., Hellwagner, H.: Bitstream Syntax Description-Based Adaptation in Streaming and Constrained Environments. IEEE Transactions on Multimedia. Vol. 7. (2005) 463–470
8. De Zutter, S., De Keukelaere, F., Poppe, C., Van de Walle, R.: Performance analysis of MPEG-21 technologies on mobile devices. Proceedings of Electronic Imaging. Vol. 6074 (2006)