MPEG-4 powered by Java: A Weather Forecast application case study

S. M. TRAN, M. PREDA, F. PRETEUX
Artemis Project Units, Institut National des Télécommunications
9 rue Charles Fourier, 91011 Evry cedex, France
[email protected], [email protected], [email protected]

Abstract: - The paper gives a short overview of an attractive new feature of the recent MPEG-4 standard, namely the interactive facilities of multimedia content. Some experiments on BIFS and MPEG-J ─ the two main engines enabling interactivity in MPEG-4 ─ are presented. These experiments serve two targets. First, they demonstrate a sophisticated multimedia content enhanced with dynamic behaviors thanks to the programmatic services supported by the MPEG-4 standard. Second, the demo application serves as a comprehensive test of the integration of MPEG and the Java programming language. Experiments and lessons are recorded and analyzed. They form the basis on which the authors propose some new extensions to the MPEG-J API structure, which would ease the design/programming task as well as improve the capability for creating advanced multimedia scenarios.

Key-Words: - Multimedia compression, MPEG-4, BIFS, MPEG-J, interactive multimedia, Java.

1 Introduction

The boom of the Internet, together with drastic improvements in digital storage and ever faster computing processors, makes multimedia applications more and more popular. In turn, the increasing need for multimedia applications and their related uses imposes new requirements on the quality of those applications. Among many others, consideration shall be given to achieving higher compression of image/video, enabling the composition of various types of media data and, not least, supporting interactivity between users and applications. The emergence of the recent MPEG-4 standard [4] brings efficient solutions to these new challenges. Beyond standardizing advanced tools for better compression in the traditional fields (audio and video) of the MPEG family, MPEG-4 takes a great step forward with a new capability to build up a so-called multimedia scene, serving information to users in a much more innovative way than the conventional movie ever could: interaction between multimedia content and users is now possible. This new feature of MPEG-4 sets a steady milestone in the convergence of three worlds: TV/film/entertainment, computing and telecommunications. That broad target of MPEG-4 implies considerable complexity for the standard in making the addressed novelty realizable. This is the reason why numerous research efforts are in progress in this field to stabilize as well as improve the magic term "interactivity" in the video broadcasting

domain. It is important to note that this advanced feature of MPEG-4 has not yet been exploited extensively in recent MPEG-4 applications. For instance, only the MPEG-4 audio/video coding performance, offering high quality at a low bitrate cost, is utilized in MPEG-4 capturing and/or playing [10], [11] as well as in the so-called DivX applications [12]. This means that the actual feedback from service providers regarding the new services is still ahead. The innovative capability of MPEG-4, namely its interactivity, is still maturing to match the demands of users in this era of integration between the computing and broadcasting domains. The research in this work targets two objectives: exploring and contributing to the aforementioned ongoing process. To this end, we first give, in the next section, a short overview of the core technique that makes the composition and interactivity of multimedia content possible. We proceed in Section 3 with a demo application, which focuses on the programmatic logic as well as the resulting adaptive behaviour of the multimedia scene. This section is the foundation from which we derive the proposals in Section 4. On the one hand, the proposals aim to ease the work of the scene designer by extending the MPEG-J interface to a higher level of abstraction. On the other hand, the authors suggest several new elements in the structure of BIFS and MPEG-J to improve the interactive features of the standard. The paper closes with conclusions and perspectives in Section 5.

2 The core structure enabling the interactive capability of MPEG-4

MPEG-4 addresses the coding of audio-visual objects of various types: natural video and audio objects as well as textures, text, 2D and 3D graphics, and also synthetic music and sound effects. Adding to this complexity, there can be a large range of objects of the same type, which can again be categorized in numerous classes based on their logical meaning. These classes can be repeatedly rearranged into higher and higher abstraction classes. An encyclopedia application is a good example of such complicated (therefore sophisticated) multimedia content, challenging the capability of new multimedia standards including MPEG-4. To efficiently handle this huge volume of information, the possibility of composition, reuse and manipulation on demand becomes crucial to multimedia compression. In MPEG-4, the incorporation of the BInary Format for Scenes (BIFS) and Java technology are the key techniques to satisfy the aforementioned requirements.

2.1 BIFS, the light-weight layer

In order to reconstruct a rich multimedia content ─ combining various types of audio-visual objects serving different intentions ─ at the client's terminal, it is not sufficient to transmit only the compressed audio-visual data to that terminal. In MPEG-4, the transmission of the additional BIFS information is dedicated to specifying how multimedia objects are combined together for final presentation. This new type of information is part of the system information, acting as an intermediate layer between media data (audio, still image, video) and the final built-up content. BIFS enables the modeling of any multimedia scenario as a scene graph, where the nodes of the graph are either media or assistant objects (Fig. 1). An MPEG-4 player no longer retrieves the compressed audio/video data directly from a transmission channel. Instead, the BIFS information is decoded first. Via the BIFS nodes occurring in the scene, the decoder accesses the associated audio-visual data as well as other (new) supplementary data defined in MPEG-4, such as 3D mesh streams and visual still textures with arbitrary shape. Therefore, the essential operations for presenting the compressed audio/video ─ including scheduling, coordination in the temporal and/or spatial domain, synchronization, processing interactivity, and so on ─ can be flexibly controlled through the properties of BIFS nodes. In complicated scenarios ─ interactive multimedia content ─ BIFS nodes can be dynamically added or

removed from the scene graph on demand. Certain types of nodes ─ sensors ─ can interact with users and generate appropriate triggers, which are transmitted to other nodes by a routing mechanism, causing changes in the state of these receiving nodes. The effect results in adaptive behaviors of the whole multimedia scene upon users' requests. The use of the so-called Script node further expands the possibilities of interaction. It is a node interface wrapped around a European Computer Manufacturers Association (ECMA) script (or JavaScript) function [13]. This allows complex transcoding of events occurring in the scene.


Fig. 1: The underlying structure of an enhanced video scene based on MPEG-4.

In the purest MPEG tradition, BIFS scenes also have initial states similar to video I (intra-coded) frames, and predictive states similar to video P (predicted-coded) frames. A scene I frame is packaged into an access unit (AU) which is a random access point (RAP), and scene P frames are packaged into subsequent AUs, which are not RAPs. All the AUs in sequence build up the BIFS stream, which is multiplexed with the conventional compressed audio and video streams to be sent to clients' terminals.

2.2 MPEG-J, the heavy-weight layer

The scene composition and interactivity provided by the BIFS information mentioned in Section 2.1 are called the parametric description of multimedia content. Within the scope of the BIFS layer, the deployment of the Script node in a scene offers the scene designer a taste of "programming" the scene (the programming language here being ECMAScript). However, the complete programmatic behavior in the presentation of a multimedia scenario is realized only with the integration of a Java virtual machine at the MPEG-4 terminal. The vision behind MPEG-J is that an application written in Java is embedded in the MPEG-4 stream (it is then called an MPEGlet), so that it can flexibly dictate the reaction

of a compliant MPEG-J terminal in presenting the associated content, taking into account users' demands as well as terminal conditions. As seen from Fig. 2, the controlling range of MPEG-J covers all functional components of the client terminal, not only the scene as in the case of BIFS. Therefore, MPEG-J maximizes the computational capability of MPEG-4 content.


Fig. 2: Structure of the mechanism enabling interactivity and composition in MPEG-4-based content.

The Java programming language is widespread and object-oriented; it also provides ─ among many other features ─ exception handling, an event mechanism, polymorphism, platform independence as well as robustness and security. These factors were the main reasons that led to the choice of Java in MPEG [6]. The so-called MPEG-J API (Application Programming Interface) is standardized in MPEG-4 to facilitate the communication between the MPEG-4 terminal and the Java virtual machine, which starts up whenever an MPEG-J stream is detected on the communication channel. This is a new elementary stream, dedicated to carrying instances of the MPEGlet interface (the starting point for the Java processing thread) as well as all other associated classes and objects involved in the embedded Java application. The Object Descriptors (ODs, conveyed in their own elementary stream) determine the name scope of MPEG-J classes and objects, as in the case of audio/visual objects.
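For concreteness, a minimal MPEGlet skeleton is sketched below, following the init/run/stop/destroy life cycle listed in Fig. 4; the package name and exact signatures are assumptions for illustration rather than quotations from the normative specification:

// A minimal MPEGlet skeleton (a sketch; package and signatures assumed).
import org.iso.mpeg.mpegj.MPEGlet; // assumed MPEG-J package name

public class HelloMPEGlet implements MPEGlet {
    public void init() {
        // Called once the terminal has loaded the MPEGlet from the
        // MPEG-J elementary stream: acquire scene/resource managers here.
    }

    public void run() {
        // Body of the MPEGlet processing thread: query the scene with
        // getEventOut and modify it with sendEventIn on BIFS nodes.
    }

    public void stop() {
        // The terminal suspends the MPEGlet: release transient resources.
    }

    public void destroy() {
        // Final cleanup before the MPEGlet is discarded.
    }
}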

3 Case-study: Forecast application

In this section, we introduce the power of MPEG-4-based multimedia content enhanced with interactivity through a real application, referred to as Forecast (Fig. 3). The implementation of the application relies heavily on MPEG-J technology in order to highlight its contributions to the MPEG family: dynamic behavior and flexible programmability. The former is not available in any predecessor of MPEG; the latter is impossible with the deployment of the MPEG-4 BIFS mechanism alone. To distinguish them, we use the following convention throughout the figures showing class diagrams: classes and interfaces marked with unfilled rectangles come from the standard MPEG-J API, while those symbolized with a yellow (darker) background and bold text are application-specific classes and interfaces (Fig. 4).

Fig. 3: The Forecast application based on MPEG-J, visualized with the MPEG-4 Player3D. Labeled scene elements: button-like text, 3D globe, forecast data, direct input text.

The Forecast application is intended to display the current weather as well as the forecast for several cities grouped by continent. The information is retrieved on demand; that is, the user first locates the continent, then chooses the city of interest. The request for the selected city is subsequently sent through an Internet connection to the online forecast server [14] to obtain the corresponding data. Other enhanced features of MPEG-4 are also involved in this multimedia content, such as button-like texts, a globe with a 3D view, and multithreaded programming (Fig. 3). Fig. 4 outlines the class structure of the application. The main object of the application is derived from the class ForecastQuery (implementing the MPEGlet interface), which contains several visible "button" objects. These buttons react to right-click mouse events issued by users. In fact, they are implementations of the EventOutListener interface, each customized with respect to its functionality. For instance, when activated, the CityButton object creates an instance of the ParseXML class to make a network connection, send a query, and fill the customized ForecastData object (which will be displayed to the user) with the response from the server. The ForecastQuery object can also launch some

threads to make the scene more lively, such as rotating the globe to emphasize the 3D impression (the RotateGlobeThread object) or simulating the waiting state familiar from conventional window environments (the BusyCursorThread object). The object of class LanguageTranslator is responsible for the multi-language display of the scene. The interaction between objects in the scene, as a reaction to a user's request for forecast data, is summarized in Fig. 5.
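To make this pattern concrete, the sketch below outlines such a button; the notify signature and the helper-method parameters are illustrative assumptions rather than the application's actual source:

// Sketch of a scene button reacting to a sensor event (signatures and
// helper-method parameters are assumed for illustration).
public class CityButton implements EventOutListener {
    private final String cityName;
    private final ForecastData data; // filled with the server response

    public CityButton(String cityName, ForecastData data) {
        this.cityName = cityName;
        this.data = data;
    }

    // Invoked by the scene when the sensor attached to this button fires.
    public void notify(int eventId, FieldValue value) {
        new BusyCursorThread().start();                 // simulate the waiting state
        new ParseXML().ProcessDocument(cityName, data); // query the forecast server
    }
}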

Fig. 4: Class diagram of the Forecast application.


Fig. 5: Main sequence of interaction in the scenario of Forecast.

4 Discussion and Proposals

The Forecast application described in Section 3 is an illustration of the convergence between computing

and broadcasting technology. Some of the most sophisticated features of computing technology, including 3D presentation, network communication and multithreaded operations, are integrated with the advanced MPEG-4 methods for compressing and broadcasting media data (audio, video and related data). Together they contribute to a new type of multimedia presentation, which is more attractive, more interactive and therefore more efficient. The Forecast application, developed within the "Jules Verne" European Project [17], relies deeply on the support of the MPEG-4 standard to create highly interactive multimedia scenarios. Such an application serves as a good environment for developing improved implementations as well as new design structures, which we propose as extensions of the MPEG-4 interactivity capability as follows.

4.1 Extension in implementation

The set of MPEG-J API interfaces is sufficient for an embedded Java application to query information from the scene. That is, any kind of information can be retrieved with the getEventOut method of the Node interface attached to any internal object of the scene (statically defined within the BIFS stream). However, in the reverse direction (i.e. modifying the scene), these interfaces prove cumbersome to code against. When a Java application dynamically creates a Java object and pushes it into the scene via the sendEventIn method of some internal node, the associated interface of the object must be implemented by an anonymous class, and the object is then derived on the fly from this class. For example, the following code sets the whichChoice field of a Switch node to 3 by instantiating a class which implements the SFInt32FieldValue interface:

MySwitchNode.sendEventIn(
    EventIn.Switch.whichChoice,
    new SFInt32FieldValue() {
        public int getSFInt32Value() { return 3; }
    }
);

Such on-the-fly definitions are awkward and reduce the readability of the code. This is the cost of the implementation flexibility offered by the MPEG-J API. Since the API is definitively defined for the Java environment (hence the name MPEG-Java API), it is much more useful to replace the aforementioned anonymous classes with concrete ones, equipped with primitive get/set methods. In the case of the SFInt32FieldValue interface, we declare and define once an ASFInt32FieldValue class as a primitive type in the MPEG-J API (Fig. 6).
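Declared once in the API, a sketch of this primitive class could look as follows (the member names mirror Fig. 6; the method bodies are our illustration):

// Proposed primitive wrapper class for the SFInt32FieldValue interface.
public class ASFInt32FieldValue implements SFInt32FieldValue {
    private int m_value;

    // The constructor captures the value, so callers no longer need an
    // anonymous class at the point of scene modification.
    public ASFInt32FieldValue(int value) {
        m_value = value;
    }

    // Required by the standard SFInt32FieldValue interface.
    public int getSFInt32Value() {
        return m_value;
    }

    public void setSFInt32Value(int value) {
        m_value = value;
    }
}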

Then, whenever a scene modification involves this interface as in the above code segment, we can simply use the class as follows:

MySwitchNode.sendEventIn(
    EventIn.Switch.whichChoice,
    new ASFInt32FieldValue(3)
);

Similarly, all interfaces would be associated with classes as in Fig. 6. The ASFNode class implements not only the SFNodeFieldValue interface but also the NewNode interface, due to the complexity of Node-styled objects. The proposed class structure raises the MPEG-J API interfaces to a higher level of primitive classes. The introduction of these classes avoids frequent redefinition of the same anonymous class for the same type of object while modifying the scene. It also makes the Java source code more readable and more object-oriented.
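The same pattern extends to multi-valued fields. As a hedged illustration (the AMFNode method names follow Fig. 6, while the EventIn constant and the ASFNode constructor argument are our assumptions by analogy with the Switch example above), appending a dynamically created node to a grouping node could look like this:

// Appending a node to a grouping node's children field via the
// proposed AMFNode wrapper (a sketch, not normative API).
AMFNode children = new AMFNode();
children.add(new ASFNode(myShapeNodeID)); // wraps an existing node ID
MyGroupNode.sendEventIn(EventIn.Group.children, children);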


Fig. 6: Extension to MPEG-J API.

4.2 Extension in structure

In our previous work [16], we pointed out a possible shortcoming of MPEG-4 in integrating computing power into the broadcasting domain. The former technology has a strict chronology constraint, while the latter must be as loose as possible to enable random access. This means that, starting from a static scene and going through interactions between objects in the scene as well as between the scene and users, the original scene can change so drastically that the retransmission of the scene in its very first state would reset all the modifications in the current scene. Conversely, retransmission is inevitable in the broadcasting domain, where a client terminal is supposed to switch onto the communication channel at any time. Therefore, in streamed MPEG-4 content, only static or simple interactive scenes

(which support prompt reactions to users' requests and do not have any long-term effect) are available. Sophisticated multimedia scenes are possible only for local play. A possible solution we proposed in [16] is the introduction of a Cookies node, which has the following syntax:

Cookies {
    ExposedField MFBool   IsPers[]
    ExposedField MFString Name[]
    ExposedField MFString Value[]
}

The lifetime of a Cookies node surpasses the boundaries of AUs carrying RAP flags (Fig. 7). At the border between two RAPs, a Cookies node that already has a previous state remains unchanged; otherwise, it is initialized as usual. Those Cookies nodes whose IsPers field takes the true value can be saved at the terminal for future use. This is useful for storing the

high scores of a multimedia game or a bookmark for encyclopedia content.



Fig. 7: Cookies node and its lifetime.

In the domain of MPEG-J, the implementation of the run method of any MPEGlet must respect the content of the Cookies nodes to avoid resetting the scene when that MPEGlet is retransmitted. Alternatively, the run method would not be passed to the Java virtual machine for execution when a retransmission is detected.
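A minimal sketch of this convention is given below; since the Cookies node is only a proposal, the CookiesNode type and its accessor are hypothetical placeholders:

// Proposed retransmission guard inside an MPEGlet's run() method.
// CookiesNode and getCookiesNode() are hypothetical, as no normative
// API exists yet for the proposed node.
public void run() {
    CookiesNode cookies = sceneManager.getCookiesNode();
    // "initialized" is a (Name, Value) pair that persists across RAPs.
    if ("true".equals(cookies.getValue("initialized"))) {
        return; // retransmission detected: keep the accumulated state
    }
    cookies.setValue("initialized", "true");
    buildInitialScene(); // application-specific first-time setup
}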

5 Conclusion

This work reviewed the core technologies which add interactivity and computation capability to MPEG-4-based content: the BIFS nodes and the MPEG-J API. The Forecast enhanced "movie" ─ a real application from the viewpoint of computing technology ─ was created to demonstrate this new attractive feature of the standard. In the application, we highlighted the network communication, the dynamic behavior upon users' requests and the multithreaded programming, none of which had previously been available in media compression. It is convincing proof of the merger happening between TV/film/entertainment, computing and telecommunications. The marriage of computation with broadcasting technology, as well as of Java with MPEG, was analyzed. These are the roots from which we developed several extensions to the MPEG-4 standard. We proposed a layer of primitive classes fully covering the low-level MPEG-J API interfaces, to ease the work of programming the scene. We also suggested the introduction of a new Cookies node, which addresses the problem of retransmitting a dynamic multimedia scene for random access. The actual implementation of these proposals in the MPEG-4 reference software decoder, fine-tuning the solutions and standardizing them are among our targets for future research.

References:

[1] WD 3.0 ISO/IEC 14496-1:2002, Information technology ─ Coding of audio-visual objects ─ Part 1: Systems, August 2002.
[2] ISO/IEC JTC1/SC29/WG11 MPEG99/M5450, BIFS/OD Encoder version 4.0, November 1999.
[3] ISO/IEC 14772-1:1997, Information technology ─ Computer graphics and image processing ─ The Virtual Reality Modeling Language ─ Part 1: Functional specification and UTF-8 encoding.
[4] ISO/IEC JTC1/SC29/WG11 N4668, MPEG-4 Overview, March 2002.
[5] K. Arnold, J. Gosling, D. Holmes, The Java Programming Language, 3rd Edition, Redwood City, CA: Benjamin/Cummings, 1994.
[6] F. Pereira, T. Ebrahimi, The MPEG-4 Book, IMSC Press Multimedia Series/Andrew Tescher, 2002.
[7] Website of MPEG-4 @ ENST, http://www.comelec.enst.fr/~dufourd/mpeg4/index.html.
[8] C. Concolato, J.-C. Dufourd, Comparison of MPEG-4 BIFS and some other multimedia description languages, Workshop and Exhibition on MPEG-4, WEPM 2002, San Jose, California, USA.
[9] Website of Envivio, http://www.envivio.com.
[10] Website of Philips, http://www.mpeg4.philips.com.
[11] Website of the MPEG4IP project, http://mpeg4ip.sourceforge.net.
[12] Website of DivX, http://www.divx.com.
[13] ISO/IEC DIS 16262, Information technology ─ ECMAScript: A general purpose, cross-platform programming language.
[14] Website of Unisys Weather, http://www.weather.unisys.com.
[15] Website of Osmose @ ENST, http://www.comelec.enst.fr/osmo4/#samples.
[16] S. M. Tran, M. Preda, F. Preteux, K. Fazekas, New proposal for enhancing the interactive capability in MPEG-4, MMSP 2004, Siena, Italy.
[17] Website of the "Jules Verne" project, www.citi.tudor.lu/julesverne.