MPEG-21 Session Mobility on Mobile Devices

Frederik De Keukelaere, Robbie De Sutter, and Rik Van de Walle

Multimedia Lab, Department of Electronics and Information Systems, Ghent University-IBBT, Sint-Pietersnieuwstraat 41, B-9000 Ghent, Belgium
Abstract— Since broadband network connections are becoming ubiquitous, people expect access to multimedia content anywhere, anytime. This new demand results in a need to create and develop new, powerful devices for multimedia consumption. These new devices, each having their own set of network and terminal capabilities, allow transparent access to multimedia and therefore open the path to transparent mobility of multimedia. In this paper we demonstrate how MPEG-21 can be used to provide a solution for session mobility between devices with different terminal and network capabilities. We demonstrate how MPEG-21 Digital Items can be used for realizing session mobility between two different devices and how this approach can overcome the difficulties caused by capability differences. The feasibility of the MPEG-21 approach for session mobility is illustrated by the implementation of a demonstrator on the Windows Mobile 2003 and Windows XP platforms. This paper outlines the technical details of the implementation and the results of several performance measurements on both platforms.

Index Terms— Session Mobility, MPEG-21, Digital Item Declaration, Digital Item Adaptation
1 Introduction

Today, people are willing to use applications on a wide set of devices in a broad set of environments and situations. The increased demand for such devices has resulted in the development of many new mobile and versatile products. These new devices each have their own capabilities and performance characteristics. As will be demonstrated in this paper, creators of mobile devices are currently building hardware platforms powerful enough to bring the experience of a multimedia application to mobile terminals. Given the capabilities of these new and future devices, end-users are becoming more and more able to access their multimedia content anywhere, anytime. This concept, accessing multimedia anywhere on any device, is called Universal Multimedia Access [1], [2]. It requires content providers to produce multimedia resources suited for this wide variety of devices. To make this possible, research is going on in different
Figure 1: Session mobility in a use case scenario (a student watches a lesson on a TV/set-top box, continues the session on a Pocket PC, and finishes the e-learning session on a tablet PC; each device receives its media stream from the content and streaming server, with session transfers between the devices)
fields allowing the realization of Universal Multimedia Access [3], [4]. This research has made it possible to access multimedia content with a whole set of different devices. Since people can now use multiple devices to access their content, they are demanding a new and easy way to transparently switch from one device to another. This concept, called session mobility, has been studied in various domains. While some have focused on the mobility of sessions between applications (e.g., between browsers [5]), others have focused on protocols allowing session mobility [6], [7]. In this paper we describe how session mobility for multimedia sessions can be realized. The paper is organized as follows. In the first section, we discuss session mobility in general. In this discussion, we provide an architecture allowing the realization of session mobility between two devices. This brings us to a set of problems that can occur when realizing session mobility in general. In the second part of this paper, we discuss the MPEG-21 Multimedia Framework [8], [9]. In that section, we demonstrate how MPEG-21 can provide a solution to the problems that occur when realizing session mobility between devices with different capabilities. We discuss how MPEG-21 Session Mobility allows transparent session transfer within MPEG-21. In the third part of this paper we give the technical details and performance
Figure 2: Simple session mobility (1. device A collects the session data; 2. device A sends the session data to device B; 3. device B processes the session data and continues the session)
results of an implementation of MPEG-21 Session Mobility on a PDA (Windows Mobile 2003) and a PC (Windows XP) platform. Finally, this paper concludes with remarks on our results.
2 Session Mobility – general concept

2.1 Use case scenario
Before providing an architecture in which session mobility can be realized, let us start by illustrating the concept of session mobility with a possible use case scenario. Consider a student "attending" online classes. While at home this student watches the course on a TV connected to a set-top box. The set-top box offers the student the ability to video-chat with other students while watching the lectures. Suddenly, the student receives a phone call. As he does not want to miss a part of the course, he pauses his online course session. By the time he has finished his phone call, he notices that he has to take the bus to a meeting of the student council. While leaving his home, he transfers his session to his PDA, so he can continue watching the course during the bus ride to the meeting. Some time later he arrives at the meeting. Since he is a bit too early, he decides to continue the course on a tablet PC while he is waiting for the other students to arrive. This time, he transfers his session from his PDA to the tablet PC, in order to have the advantage of a larger screen, a faster Internet connection and some extra processing power.
2.2 A general architecture for session mobility
Before a multimedia session can be successfully transferred from one device to another, there are several requirements that need to be fulfilled. Let us consider the example of a session mobility scenario between device A and device B as depicted in Figure 2. In the first step of the session transfer, device A gathers the information about the current session. During this step, information such as the media being consumed, the current media position, and the current
Figure 3: Advanced session mobility (as in Figure 2, but device B additionally requests updated session information from device A until both sessions are synchronized)
status of the session (play, pause, etc.) is collected. As a second step in the session transfer, the collected information is sent to device B and device A stops the session. To complete the session transfer, device B processes the received information and continues the session. Although this is a simple example of session mobility, there are three clear steps:

• Collect session data
• Transfer session data
• Process data and continue session
It is possible to think of more sophisticated protocols for session transfer; for example, the protocol described above can be extended to allow multiple updates of the session information. This can be useful when devices A and B are required to be synchronized before the session transfer is completed. Figure 3 shows how this can be realized. Up to the initial startup of the session on device B, the algorithm is identical, except that here device A does not stop the session but continues until device B tells it to stop. After the initial startup of the session, device B requests new updates so that the current position of the media can be adjusted. The updates can be repeated until the session on device B is synchronized with device A. At that point device B can tell device A to stop, and the session is then successfully transferred to device B. Using multiple updates results in a smoother transfer of the session between device A and device B because the session is not stopped at any time during the transfer. In the simple scenario the session is stopped at device A as soon as the session data is transferred, and the session continues after device B has processed the session data. During that period the session is temporarily “paused”. In the scenario with multiple updates, the session at device A is stopped as soon as the session on device B is running and synchronized with the session on device A. Therefore there is no “pause” of the session at any time.
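The multi-update variant can be illustrated with a small simulation (a sketch with hypothetical class and method names; in MPEG-21 the exchanged messages are Digital Items, as discussed in section 3):

```python
# Simulation of the advanced session-transfer protocol (hypothetical
# names; in MPEG-21 the exchanged messages are context Digital Items).

class Device:
    def __init__(self, name):
        self.name = name
        self.position = 0.0   # current media position in seconds
        self.playing = False

    def tick(self, seconds):
        # Simulate playback progressing while the device is playing.
        if self.playing:
            self.position += seconds

    def collect_session_data(self):
        # Step 1: gather what is needed to rebuild the session elsewhere.
        return {"media": "lecture.wmv",
                "position": self.position,
                "state": "play" if self.playing else "pause"}

    def start_session(self, data):
        # Step 3: rebuild the session from the received session data.
        self.position = data["position"]
        self.playing = data["state"] == "play"

def advanced_transfer(a, b, sync_threshold=0.1):
    data = a.collect_session_data()   # 1. collect session data
    a.tick(0.5)                       # A keeps playing during the transfer
    b.start_session(data)             # 2.+3. send and process (B starts stale)
    while abs(a.position - b.position) > sync_threshold:
        # 4. request updated session information until B has caught up
        b.position = a.collect_session_data()["position"]
        a.tick(0.05)                  # small delay per update round trip
    a.playing = False                 # A stops only once B is synchronized

a = Device("A")
a.position, a.playing = 12.5, True
b = Device("B")
advanced_transfer(a, b)
```

Note that device A only pauses after device B is within the synchronization threshold, which is exactly why this variant avoids the visible "pause" of the simple protocol.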
2.3 General remarks on session mobility
Because the aim of a Universal Multimedia Access framework is to distribute content to a wide set of devices, there may be a significant difference in the capabilities of the terminals between which session mobility occurs. This difference in terminal capabilities adds complexity to session mobility. For example, watching a video on a terminal with a large screen and then transferring this session to a terminal that does not support such a large screen will most likely require the adaptation of the video to a lower resolution. Other differences in terminal capabilities can cause similar problems for session mobility (e.g., differences in processing power, availability of codecs, etc.). Another area adding complexity to session mobility is the network. Just as with terminal capabilities, differences in network capabilities can result in different requirements for the session. For example, switching from a device with a broadband connection to a device with limited bandwidth will likely require switching to content encoded at a different bit rate in the new session. Other differences in network characteristics, such as differences in error rate and packet loss, can also impact session mobility. As a final remark on session mobility in general, we would like to address the interoperability of the messages sent between the different terminals. Unless there is a common representation, proprietary or standardized, for the messages exchanged between the two terminals transferring a session, they will not be able to understand each other. Without such a common representation, it is impossible to reconstruct the session on the target device based upon the session data of the originating device.
3 How can MPEG-21 tools realize session mobility?
In this section we demonstrate how the MPEG-21 standard provides a solution for session mobility and the difficulties described in section 2. However, before moving on to the solutions, let us look at the new technology that MPEG-21 brings to the multimedia world. Today, many different standards are combined on our devices to build an infrastructure for the delivery and consumption of multimedia content. Some of them focus on the structure of the different elements of a multimedia presentation (e.g., the Metadata Encoding & Transmission Standard [10]), while others focus on the
layout and the temporal behavior of multimedia presentations (e.g., XHTML [11], SMIL [12], etc.). ISO/IEC 21000, better known as the MPEG-21 Multimedia Framework, is a much broader standard that looks beyond data encapsulation and data presentation to address the problems of digital rights management, dynamic adaptation of multimedia, etc. Contrary to most other multimedia standards, which are usually focused on one aspect of a multimedia session, MPEG-21 tries to realize the “big picture”. MPEG-21 describes how various elements of multimedia content fit together throughout the multimedia delivery and consumption chain. It provides the ability to deliver and consume multimedia data across a variety of terminals, networks and platforms. Therefore, MPEG-21 can be considered a true Universal Multimedia Access framework.
3.1 Digital Item
Within MPEG-21 the concept of a “Digital Item” is the key to the whole framework; every transaction, every message, every form of communication is performed using a Digital Item. Within ISO/IEC 21000-1 [13], [14], Digital Items are defined as “structured digital objects, including a standard representation, identification and metadata”. To be able to use a Digital Item across the framework, there is a need for a flexible but precise description of that Digital Item. The second part of MPEG-21, the Digital Item Declaration (DID) [15], [16], provides the required flexibility and makes it possible to declare a Digital Item composed of multiple multimedia resources. It is a container structure allowing users to describe the relationship between the different elements of the Digital Item. As an example of how Digital Items can be used, consider the creation of a digital music album. Suppose the digital music album consists of different music tracks, the album’s title, the titles of the individual songs, some additional video clips, etc. A Digital Item Declaration containing the necessary resources (or references to them) and metadata for this music album can be constructed in XML using the Digital Item Declaration Language (DIDL). Since Digital Items are used throughout the whole MPEG-21 framework, they can have different functionalities within it. The most basic functionality for a Digital Item is carrying content throughout the framework. We will call such Digital Items content Digital Items (CDIs). This type of Digital Item is aimed at consumption by an end-user and usually consists of a set of resources (e.g., a video
Figure 4: A content Digital Item
stream), a set of choices (e.g., about bit rate), and a set of descriptors (e.g., indicating interesting parts in the video stream). In Figure 4, a Digital Item consisting of several video streams is represented in the DIDL. The video streams contained in the Digital Item have different resolutions and different bit rates. This Digital Item also contains two choices allowing a user to choose between different bit rates and resolutions. It should be noted that the choices given in Figure 4 can only be made by human interaction with the Digital Item, because no additional descriptors containing machine-readable information about the resolutions and bit rates have been added to the Digital Item. This information has not been added because it is not required for the purposes of this section and it would add unnecessary complexity to the example. However, such information is standardized in ISO/IEC 21000-7 Digital Item Adaptation [17], [18] and can be added to the Digital Item by means of descriptors. Besides containing multimedia content, a Digital Item can be used to contain information about the context in which another Digital Item will be used. This type of Digital Item, which we will call a context Digital Item (XDI) throughout this paper, contains information about terminal capabilities, network capabilities, or any other information about a Digital Item that is needed in the multimedia consumption and delivery chain. In this paper, context Digital Items will be used to carry metadata for realizing MPEG-21 session mobility. To enable session mobility, the XDI captures the configuration state of a content Digital Item, which is defined by the state of the choices that have been made (e.g., chosen bit rate, resolution, etc.). The Digital Item for
Figure 5: A context Digital Item for MPEG-21 Session Mobility (capturing, among other things, the playback state isPlaying and the media position 1.234898 s)
session mobility also captures application-state information, which pertains to information specific to the application currently rendering the Digital Item. Examples of application-state information include the position on the screen at which a video contained in a CDI is being rendered, or the track of a music album currently being played. Figure 5 is an example of a possible context Digital Item for the content Digital Item of Figure 4. It encapsulates the state of the choices made for the CDI (i.e., resolution CIF, bit rate 512 kbps) and contains the current playback position (1.2s) and state (isPlaying) of the resource being consumed.
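To make the shape of such an XDI concrete, the following sketch parses a simplified session-state document (the element names here are illustrative stand-ins, not the normative DIDL/DIA schema):

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for a context Digital Item such as the one in
# Figure 5: it records the choices made for the CDI plus the current
# playback state and position. Element names are illustrative only,
# not the normative DIDL/DIA syntax.
XDI = """
<SessionState>
  <Selection choice="resolution" value="CIF"/>
  <Selection choice="bitrate" value="512kbps"/>
  <Playback state="isPlaying" position="1.234898"/>
</SessionState>
"""

def parse_xdi(xml_text):
    # Extract the configuration state (selections) and the
    # application state (playback state and position).
    root = ET.fromstring(xml_text)
    selections = {s.get("choice"): s.get("value")
                  for s in root.findall("Selection")}
    playback = root.find("Playback")
    return selections, playback.get("state"), float(playback.get("position"))

selections, state, position = parse_xdi(XDI)
```

The receiving device would use `selections` to reconfigure the choices in the CDI and `state`/`position` to resume consumption at the right point.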
3.2 MPEG-21 session mobility
As described previously, MPEG-21 defines a generic, standardized format in which Digital Items can be expressed. This allows the producers of content to encapsulate their multimedia in a Digital Item that can be consumed on a wide set of terminals and networks. This is possible because content creators have the possibility to include different kinds of resources and metadata inside one Digital Item. Along with the different kinds of data, it is possible to include a set of choices allowing the end-user or the terminal to select the resource that is suitable for the environment in which the multimedia will be consumed. It is this choice mechanism, in combination with descriptors describing terminal and network characteristics, that allows the flexibility to create Digital Items that can be consumed on a wide set of
Figure 6: Architecture overview (client A starts an MPEG-21 multimedia session; the session is transferred to client B through MPEG-21 Session Mobility; client B resumes the session; both clients receive their media streams from the content and streaming server)
terminals and networks. As a consequence, the mechanism also provides session mobility with the ability to handle devices with different characteristics. MPEG-21 session mobility has the advantage that the messages exchanged between two devices, the XDIs, are represented in a standardized form (i.e., the DIDL) and, moreover, in the same standardized form as the content itself, the CDIs. This enables interoperability at the level of the messages. Since the DIDL is standardized, every MPEG-21-compliant device will automatically be able to understand the messages for session mobility. If that terminal is also able to understand the data contained within the message, then MPEG-21 session mobility can be realized.
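As an illustration of how the choice mechanism supports devices with different characteristics, the sketch below resolves which resource to play from a set of guarded alternatives (simplified data structures rather than DIDL syntax; the stream URLs are hypothetical):

```python
# Illustrative sketch of the DIDL choice mechanism (simplified data
# structures, not the normative syntax): each resource in the CDI is
# guarded by conditions on the choices, and a terminal resolves the
# resource matching the selections recorded in the XDI.
CDI_RESOURCES = [
    {"require": {"resolution": "CIF", "bitrate": "512kbps"},
     "ref": "rtsp://server/trailer_cif_512.wmv"},
    {"require": {"resolution": "CIF", "bitrate": "256kbps"},
     "ref": "rtsp://server/trailer_cif_256.wmv"},
    {"require": {"resolution": "QCIF", "bitrate": "256kbps"},
     "ref": "rtsp://server/trailer_qcif_256.wmv"},
]

def resolve_resource(resources, selections):
    # Return the first resource whose conditions all match the selections.
    for res in resources:
        if all(selections.get(k) == v for k, v in res["require"].items()):
            return res["ref"]
    raise LookupError("no resource matches the selected configuration")

url = resolve_resource(CDI_RESOURCES, {"resolution": "CIF", "bitrate": "512kbps"})
```

Because both the originating and the receiving device resolve resources against the same declared choices, the target device can pick a different resource (e.g., a lower bit rate) while still restoring the same logical session.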
4 Implementation of MPEG-21 session mobility
In order to demonstrate the usability of MPEG-21 session mobility we have created an MPEG-21 session mobility demonstrator. The demonstrator consists of three different parts: a streaming server, a content server, and an implementation of MPEG-21 session mobility. Before describing the different parts of the demonstrator, we first describe the multimedia content used in the demonstrator. In the test we used the trailer of the movie Spiderman and encoded the video stream with Windows Media Video 8 at different bit rates and different resolutions. The video stream was encoded at 256 kbps and 512 kbps, at a constant bit rate, both in the CIF and the QCIF resolution. The frame rate of the video was kept at 30 fps. The audio streams were encoded at 64 kbps, at a constant bit rate, using Windows Media Audio 8. For the tests, we combined both the streaming server and the content server on one machine. The operating system, Windows Server 2003 Standard Edition, runs on top of an Intel Pentium 4 running at 2.8 GHz with Hyper-Threading enabled and with 256 MB of RAM. To deliver the content to the clients we used a standard web
server, Internet Information Services 6.0. For the delivery of the multimedia streams, we used the Windows Media Services streaming server. During the tests we transferred a session between three different clients. The first, an Intel Pentium 4 running at 2.8 GHz with Hyper-Threading enabled and with 512 MB of RAM, was running Windows XP. The second, an Intel Pentium 1 Celeron at 450 MHz with 128 MB of RAM, was running Windows 2000. The third, an iPAQ 5550 with an Intel XScale processor at 400 MHz and 128 MB of RAM, was running Windows Mobile 2003. Before describing the test results, let us first describe the implementation of the demonstrators. The demonstrators were built using the .Net framework on the PC platform and the .Net Compact framework on the PDA platform. Both implementations use the Windows Media Player for displaying the multimedia content. The applications consist of an MPEG-21 content Digital Item viewer and an MPEG-21 context Digital Item handler. They allow their users to load an MPEG-21 Digital Item and present a user interface suited for that Digital Item. The applications look in the Digital Item for Choice tags and present those choices to the user. For the Digital Item presented in Figure 4, they present the choices between different resolutions and different bit rates. A screenshot of the user interface on the iPAQ for that Digital Item can be found in Figure 7. After the choices are made, the applications detect which resources are available and present them to the end-user. At this point the end-user is able to consume the resources (Figure 8). For this consumption, the Windows Media Player is controlled from within the .Net (Compact) framework using the Windows Media Player COM component. At this point, a session has been started on one device. To transfer multimedia sessions to another device, the demonstrators continuously listen for requests from other devices. In the applications, this listening is done through a standard TCP socket.
All communication between different devices is done through that socket. When device A wants to transfer a session to device B, it generates a session mobility context Digital Item and sends it to device B using the TCP connection. Device B then parses the message and uses the information within the XDI to load the CDI. Next, it configures the choices in the CDI according to the information in the XDI. After configuring the state of the choices, device B continues the consumption of the resources based upon the information in the XDI. An example XDI that has been transferred between two devices can be found in Figure 5.
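The exchange itself can be sketched with a plain TCP socket (Python instead of the demonstrator's .Net code, with a hypothetical XDI payload; message framing and error handling are omitted):

```python
import socket
import threading

# Hypothetical, simplified XDI payload; the real message is a DIDL document.
XDI = b"<SessionState state='isPlaying' position='1.234898'/>"
received = []

def device_b(server_sock):
    # Device B listens for session mobility requests on a TCP socket.
    conn, _ = server_sock.accept()
    with conn:
        data = b""
        while chunk := conn.recv(4096):
            data += chunk
    # Here device B would parse the XDI, configure the choices in the
    # CDI accordingly, and resume consumption of the resources.
    received.append(data)

server = socket.socket()
server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
server.listen(1)
listener = threading.Thread(target=device_b, args=(server,))
listener.start()

# Device A: generate the session mobility XDI and send it to device B.
with socket.create_connection(server.getsockname()) as conn:
    conn.sendall(XDI)
listener.join()
server.close()
```

Closing the sending socket signals the end of the message here; the demonstrator's actual framing on its TCP socket is not specified in the text.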
Figure 7: Making choices on the PDA
We measured several parts of the processing complexity that MPEG-21 adds to session mobility. As a test of the performance of the demonstrators, we measured four different parts of a typical MPEG-21 multimedia session: the parsing of a CDI, the generation of an XDI, the parsing of an XDI, and the transfer of a session. All four performance tests were done on the Pentium 4, the Pentium 1, and the iPAQ. The results of the tests can be found in Table 1, Table 2, and Table 3 respectively. Since the demonstrators are implemented in the .Net framework and the .Net Compact framework, the times we measured are only estimates of the actual times: in both frameworks, the garbage collector can interfere with the measurement of execution times and can therefore give varying results for the same operation. The average times we measured on the Pentium 4 are 6 ms for parsing the CDI, below 0.5 ms for generating an XDI, 4.5 ms for parsing an XDI, and 540 ms for transferring and resuming a session. On the Pentium 1 the same operations took respectively 22 ms, 1 ms, 20 ms, and 833.1 ms. On the iPAQ we measured respectively 219.2 ms, 14.4 ms, 63.1 ms, and 1270.9 ms. It is clear from those results that the largest amount of time is spent reconstructing and resuming the multimedia session. This is because, during the reconstruction and resumption of the session, the Windows Media Player is loaded and positioned to the correct place in the media stream. The average time taken to move an MPEG-21 session
Figure 8: Consuming components on the PDA
between two devices of the same type is 540 ms for the Pentium 4, 833.1 ms for the Pentium 1 and 1270.9 ms for the iPAQ. All of those times are only a very small fraction of the time it takes to consume the actual multimedia content, and they can therefore be considered minimal (generally negligible) overhead.
5 Conclusions
Given the wide variety of terminals capable of handling multimedia, it is becoming more and more obvious that there is a demand for transparent session mobility between those devices. As broadband wireless solutions become more widely available and reasonably priced, users want access to their content anywhere and at any time with any type of device. In this paper, we have presented how Universal Multimedia Access can leverage the concept of session mobility. During the discussion of session mobility, three major difficulties have been identified: differences in terminal capabilities, differences in network capabilities, and the lack of a common representation of the messages used in session mobility can make it hard, or even impossible, to realize session mobility in a transparent way. We have demonstrated how MPEG-21 can provide a solution to the problems of session mobility in general. MPEG-21 provides us with a mechanism allowing the creation of multimedia that can be consumed by devices with different terminal capabilities and different network characteristics. It also provides us with the necessary standardization to create a common format for the
References
Table 1: Time measurements on Pentium 4 (2.8 GHz); columns: average (ms), median (ms), std. dev. (ms); rows: Parse CDI, Generate XDI, Parse XDI, Session Transfer