What is Multimedia? Multimedia APIs - TKK - TML

T-111.5350 Multimedia Programming Pablo Cesar

What is Multimedia? Multimedia APIs

Pablo Cesar http://www.tml.hut.fi/~pcesar


Outline
• Definitions of Multimedia
• Multimedia Elements:
  – Multimedia Objects: audio, video, graphics, text
  – Visual Style
  – Layout of those objects
    • Temporal dimension (animation, synchronization)
    • Graphical layout
  – Application Logic: state of the application (e.g., games)
  – User Interaction: passive to authoring (visualization, navigation, WIMP concepts)
• Taxonomy of Authoring Content Formats
  – Expressive Power, Ease of Use, Safety of Distribution, Interoperability
  – Compiled Languages (C, C++)
  – Virtual Machine Languages (Java)
  – XML Based Languages (SMIL, XForms)

Multimedia Heller
• Media Type: text, sound, graphics, and motion
• Media Expression: describes the level of abstraction using the media
  – Elaboration: unedited information
  – Representation: edited or stylized
  – Abstraction: for example, icons (most abstract)
• Context: includes properties such as
  – Interactivity
  – Aesthetics
  – Audience
[Figure: taxonomy cube with axes Media Type (Text, Sound, Graphics, Motion), Media Expression (Elaboration, Representation, Abstraction, in increasing abstraction), and Context (Audience, Discipline, Interactivity, Quality, Usefulness, Aesthetics)]

Multimedia Purchase
• Nature of the sign:
  – Concrete iconic (photorealistic image)
  – Abstract iconic (map)
  – Symbolic (written word)
• Syntax / Arrangement:
  – Individual
  – Augmented
  – Temporal
  – Linear
  – …
• Modality:
  – Aural: audio
  – Visual: graphics
[Figure: taxonomy cube with axes Sign (Concrete iconic, Abstract iconic, Symbolic), Syntax (Individual, Augmented, Temporal, Linear, Schematic, Network), and Modality (Aural, Visual)]

Multimedia Bulterman and Hardman • Media Assets: How to reference the multimedia objects of the presentation • Synchronization Composition: – Hard timing relationships, Relative structural ordering – Constraints

• Spatial Layout – Implicit (video), explicit, and dynamic

• Asynchronous Events – Content-based (timing) and user interaction (navigation)

• Adjunct/Replacement Content – Alternative content / adaptation content

• Performance Analysis – performance optimization for various delivery scenarios


Multimedia Vuorimaa • Multiple media – Types: text, graphics, animation, image, audio, and video – Source: Natural (e.g., video) vs. artificial (e.g., 3D graphics)

• Interaction – Stand-alone vs. Networked applications – Level of interaction (user interface, application, and service) – Amount of interaction • E-mail, video-on-demand, video conference, video • Game, and virtual reality

• Timing – External synchronization of different media (e.g., video and slides) – Internal timing within single medium (e.g. video) – Usually applications have time dimension (e.g., story line)


Multimedia Summary (1/2) • Multimedia Objects – Audio, video, graphics, text

• Visual Style • Layout of those objects – Temporal dimension (animation, synchronization) – Spatial layout

• User Interaction – Passive to authoring (Visualization, Navigation, WIMP concepts)

• Application Logic – State of the application (e.g., Games)


Multimedia Summary (2/2)
”Computer mediated applications that integrate and present different media objects, which are arranged spatially and temporally. Moreover, user interaction can control the behavior of the application.”
[Figure: Multimedia Objects, Visual Style, Temporal Dimension, Spatial Layout, Application Logic, User Interaction]

Multimedia Elements Objects and Visual Style
• Discrete Media
  – Icons: semantic images (e.g., stop symbol). Require the user to have previous knowledge
  – Graphics: computer generated. Can be 2D or 3D graphics depending on the goal
  – Images: natural source (e.g., photograph)
  – Text: Size, Color
• Continuous Media
  – Motion Pictures (audio + video)

Multimedia Elements Spatial Layout (Pihkala 2003, Boll 2001)
• Absolute
  – coordinates relative to origin
• Directional relations:
  – define order in space (e.g., North, East)
• Topological relations:
  – disjoint, touch, equals, inside of, covered by, contains, covers, and overlap
• Text Flow:
  – one-dimensional flow shown in a two-dimensional area
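The topological relations listed above can be made concrete for the simplest case, two axis-aligned rectangles in absolute coordinates. The following is a minimal sketch under that assumption; the names `TopologyDemo`, `Box`, `Relation`, and `relate` are hypothetical and do not come from the cited models.

```java
// Illustrative sketch only: all names here are hypothetical.
final class TopologyDemo {
    enum Relation { DISJOINT, TOUCH, EQUALS, INSIDE, COVERED_BY, CONTAINS, COVERS, OVERLAP }

    /** Axis-aligned box with x1 < x2 and y1 < y2, in coordinates relative to the origin. */
    static final class Box {
        final int x1, y1, x2, y2;
        Box(int x1, int y1, int x2, int y2) {
            this.x1 = x1; this.y1 = y1; this.x2 = x2; this.y2 = y2;
        }
    }

    /** True if every point of a lies in b (boundaries included). */
    static boolean within(Box a, Box b) {
        return a.x1 >= b.x1 && a.x2 <= b.x2 && a.y1 >= b.y1 && a.y2 <= b.y2;
    }

    /** True if a lies in b without touching the boundary of b. */
    static boolean strictlyWithin(Box a, Box b) {
        return a.x1 > b.x1 && a.x2 < b.x2 && a.y1 > b.y1 && a.y2 < b.y2;
    }

    /** Relation of a with respect to b. */
    static Relation relate(Box a, Box b) {
        if (a.x1 == b.x1 && a.y1 == b.y1 && a.x2 == b.x2 && a.y2 == b.y2) {
            return Relation.EQUALS;
        }
        // No shared point at all, not even on the boundary.
        if (a.x2 < b.x1 || b.x2 < a.x1 || a.y2 < b.y1 || b.y2 < a.y1) {
            return Relation.DISJOINT;
        }
        // Some point is shared; if the interiors do not intersect, only boundaries meet.
        boolean interiorsMeet = a.x1 < b.x2 && b.x1 < a.x2 && a.y1 < b.y2 && b.y1 < a.y2;
        if (!interiorsMeet) {
            return Relation.TOUCH;
        }
        if (within(a, b)) {
            return strictlyWithin(a, b) ? Relation.INSIDE : Relation.COVERED_BY;
        }
        if (within(b, a)) {
            return strictlyWithin(b, a) ? Relation.CONTAINS : Relation.COVERS;
        }
        return Relation.OVERLAP;
    }

    public static void main(String[] args) {
        Box screen = new Box(0, 0, 640, 480);
        Box video = new Box(100, 100, 420, 340);
        System.out.println(relate(video, screen));
        System.out.println(relate(screen, video));
    }
}
```

Directional relations (e.g., North, East) could be derived in the same style by comparing box centres, which is left out here to keep the sketch short.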


Multimedia Elements Temporal Dimension (Pihkala 2003, Boll 2001) • Temporal Models: – Definite: for example 6 seconds – Indefinite: for example, when user clicks – Parallel and Sequential relations (e.g., start these two videos at this moment or start this video after this other one)

• Animation: – Mixture of temporal dimension and spatial layout (i.e., position of an object changes in time)
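The temporal model above can be sketched in a few lines. This is only an illustration under stated assumptions: an indefinite duration is encoded as positive infinity, a parallel group is assumed to end with its longest child, and animation is assumed to be linear motion. The names `TimingDemo`, `Timed`, `media`, `seq`, `par`, and `positionAt` are hypothetical.

```java
// Illustrative sketch only: all names here are hypothetical.
final class TimingDemo {
    /** Assumed encoding of an indefinite duration, e.g. "until the user clicks". */
    static final double INDEFINITE = Double.POSITIVE_INFINITY;

    /** Anything that occupies an interval on the timeline. */
    interface Timed {
        double duration(); // seconds
    }

    /** A single media object with a definite or indefinite duration. */
    static Timed media(final double seconds) {
        return () -> seconds;
    }

    /** Sequential relation: each child starts when the previous one ends. */
    static Timed seq(final Timed... children) {
        return () -> {
            double total = 0;
            for (Timed child : children) {
                total += child.duration();
            }
            return total;
        };
    }

    /** Parallel relation: all children start together; assumed to end with the longest one. */
    static Timed par(final Timed... children) {
        return () -> {
            double longest = 0;
            for (Timed child : children) {
                longest = Math.max(longest, child.duration());
            }
            return longest;
        };
    }

    /** Animation: position moves linearly from 'from' to 'to' during 'seconds' (seconds > 0). */
    static double positionAt(double from, double to, double seconds, double t) {
        double clamped = Math.max(0, Math.min(seconds, t));
        return from + (to - from) * (clamped / seconds);
    }

    public static void main(String[] args) {
        Timed intro = media(6); // definite: 6 seconds
        Timed show = seq(intro, par(media(30), media(20)));
        System.out.println(show.duration());          // 36.0
        System.out.println(positionAt(0, 100, 4, 1)); // 25.0
    }
}
```

A sequence that contains an indefinite child has an indefinite total duration, which is why the end of such a presentation can only be resolved at run time.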


Multimedia Elements User Interaction
• Different Levels of Interaction (Aleem):
  – Passive: only visualization
  – Reactive: limited interaction (e.g., Scroll Pane functionality)
  – Proactive: choose a path or make selections (e.g., Button)
  – Reciprocal: corresponds to user authoring of information

• Interaction Models (Boll): – Navigational: choice to decide where to go next – Design: user can modify the visual style of the presentation (e.g., colors) – Movie: user can control the global time (e.g., VCR capabilities)


Multimedia Elements Application Logic • Traditionally multimedia presentations did not have that much logic: – Virtual visit to a museum, DVD menus...

• Real-time interactive systems: – Virtual Reality worlds, games

• Application Logic needs a programming language (if, case, goto...) – Compiled Languages: C, C++ – Virtual Machine: Java – World Wide Web, MPEG-4, Director: scripting

Taxonomy of Authoring Content Formats Requirements
• Supported Media Types: audio, video, text, graphics, and animation
• Arrangement of the signs: spatial and temporal
• Interaction: passive, reactive, proactive, and reciprocal
• Difficulty to use (threshold)
• Expressional power (i.e., ceiling)
• Safety of Distribution
• Interoperability

                      Threshold  Ceiling  Interoperability  Safety of Distribution
Compiled Languages    +++        +++      +                 +
VM Languages          ++         ++       ++                ++
XML based Languages   +          +        +++               +++

Compiled Languages
Normally used for system software (e.g., operating system) and resource-demanding services: C, C++
Pro
• Efficient approach
• Expressive power (closer to computer hardware)
Con
• Interoperability (each service has to be compiled for the target device)
• Less safe to distribute (it can include harmful code)

Compiled Languages System Software • ”User Interface Software Tools” (1995, Myers) defines a layered model • Applications implemented using higher-level tools • Toolkit: a library of widgets used by applications • Windowing System: helps user to monitor and control different contexts (input and output functionality)


Compiled Languages Windowing System (1/3)
[Figure: X Window software stack. KDE Desktop Environment: KDE Libraries, Qt Toolkit, Xlib base layer. Gnome Desktop Environment: Gnome Libraries, GIMP Toolkit (GTK), GDK, GLib, Xlib. Both reach the XServer over the X Network Protocol. One window manager per session (Kwin, Enlightenment, Fluxbox, Sawfish).]

Compiled Languages Windowing System (2/3) • X-Window – X.org: font management, graphics card support, composite functionality – Desktop environments: KDE, GNOME (Toolkits + Applications) – Window Managers: FluxBox, Sawfish…

• DirectFB – XDirectFB: X-Window Support on DirectFB – DirectFBGL

• Microsoft Windows – DirectX

• Mac – Video: QuickTime – 3D: OpenGL – 2D: QuickDraw

Screenshots – X.org


Screenshots – DirectFB


Compiled Languages Windowing System: DirectFB
[Figure: DirectFB architecture. User Space: DirectFB Application, DirectFB, Chipset Driver. Kernel Space: Framebuffer Driver. Hardware: Framebuffer, Timing and Mode, Accelerator.]

Compiled Languages Windowing System: Direct-X
[Figure: DirectX graphics stack, top to bottom: Win32 Applications; Direct3D API and GDI; HAL Device; Device Driver Interface (DDI); Graphics Hardware]

Compiled Languages Toolkits • Toolkits provide – Interaction: to handle user input – Canvas Operations: both the rendering region (canvas) and graphics primitives – Set of Widgets: predefined user interface elements (e.g., Button) – Graphical Layout: to control the location of the widgets

• Examples: Qt, GTK • Virtual Toolkit – Device-independent Toolkit – Mapped to the actual Toolkit in the device – Example: AWT

Compiled Languages Media Providers
• Audio/Video: Xine, MPlayer
• Television: linuxtv
• Games: SDL
• Other Languages: for example, libflash
• 3D graphics:
  – OpenGL
  – OpenGL ES
• Home media platforms: LIMMBO, MythTV

VM Languages
A Virtual Machine is an abstraction of the computing environment. JVM + APIs
Pro
• Platform independence
• Safer to distribute (restricts potential security attacks)
• Expressive power (programming language)
• Well documented APIs
Con
• Heavy applications (because of VM concept)
• Difficult to use (programming language)
• Less powerful than compiled languages

VM Languages Java Overview
• Nowadays, trying to target all kinds of computing devices
• Editions:
  – Java 2 Enterprise Edition (J2EE): for servers and enterprise computers
  – Java 2 Standard Edition (J2SE): for servers and personal computers
  – Java 2 Micro Edition (J2ME): for embedded devices, PDAs, mobile phones, and digital television set-top boxes
  – Java Card: for smart cards
• Profile
  – Requirements for a specific vertical market of devices (set of APIs)
• Configuration
  – Minimum platform for a horizontal grouping of devices (VM + core APIs)
[Figure: a Profile (MIDP) on top of a Configuration (CLDC) on top of the KVM]

[Figure: Java 2 platform overview, by target device]
• Servers: Java 2 Enterprise Edition (J2EE); Optional Packages; Java Virtual Machine
• Personal Computers: Java 2 Standard Edition (J2SE); Optional Packages; Java Virtual Machine
• TV STBs, High End PDAs: Java 2 Micro Edition (J2ME); Optional Packages; Personal Profile; Foundation Profile; CDC; Java Virtual Machine
• Mobile Phones, Low end PDAs: Java 2 Micro Edition (J2ME); Optional Packages; MIDP; CLDC; KVM
• Smart Cards: Java Card; Card VM

VM Languages Multimedia • User interface development (AWT/Swing) – Layout: Grid, North-South-East-West, Flow – Set of Widgets: Button, TextArea – User Interaction: java.awt.event.* (Mouse, Keyboard…)

• Video/Audio and Synchronization (JMF) – Manager, Player, Data Source, and Controller

• 3D Graphics – Java3D – Java wrappers for OpenGL

• Different Devices – Television: MHP/OCAP/ACAP/ARIB -> GEM – Handheld: MIDP


VM Languages User Interface Development
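As a small illustration of the three roles a Java toolkit plays (layout, widgets, user interaction), the sketch below places a text area and a button with a North-South-East-West layout (`BorderLayout`) and reacts to the button through an `ActionListener`. The class name `ToolkitDemo`, the method `buildPanel`, and the label texts are hypothetical choices made for this example.

```java
import java.awt.BorderLayout;
import java.awt.GraphicsEnvironment;
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import javax.swing.JButton;
import javax.swing.JFrame;
import javax.swing.JPanel;
import javax.swing.JTextArea;
import javax.swing.SwingUtilities;

// Illustrative sketch only: class and method names are hypothetical.
final class ToolkitDemo {
    /** Builds a panel with a text area in the centre and an OK button at the bottom. */
    static JPanel buildPanel() {
        JPanel panel = new JPanel(new BorderLayout()); // layout
        final JTextArea text = new JTextArea("Press OK"); // widget
        JButton ok = new JButton("OK");                   // widget
        ok.addActionListener(new ActionListener() {      // user interaction
            public void actionPerformed(ActionEvent event) {
                text.setText("OK pressed");
            }
        });
        panel.add(text, BorderLayout.CENTER);
        panel.add(ok, BorderLayout.SOUTH);
        return panel;
    }

    public static void main(String[] args) {
        if (GraphicsEnvironment.isHeadless()) {
            System.out.println("No display available; skipping the window.");
            return;
        }
        SwingUtilities.invokeLater(new Runnable() {
            public void run() {
                JFrame frame = new JFrame("Toolkit sketch");
                frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
                frame.add(buildPanel());
                frame.pack();
                frame.setVisible(true);
            }
        });
    }
}
```

Swapping `BorderLayout` for `GridLayout` or `FlowLayout` changes only the placement policy; the widgets and the listener stay the same.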


VM Languages JMF (1/2)
• Data Source: retrieves the actual media data
• Player: decodes and plays the media data
• Controller: implements the state machine

VM Languages JMF (2/2) • Unrealised: when it does not have all the information to acquire the needed resources • Realised: when it has all the information to acquire the needed resources • Prefetched: when it has all the needed resources, and has already prefetched enough media data to start playing immediately • Started: when it is actually playing the media
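The four states above form a small state machine, which can be sketched without the JMF library itself. The following is a simplified model, not the JMF API: `PlayerStateDemo` and `SketchPlayer` are hypothetical names, the transitional states that JMF also defines (Realizing and Prefetching) are omitted, and returning to Prefetched on stop is an assumption of this sketch.

```java
// Illustrative sketch only: this is not the JMF Player interface.
final class PlayerStateDemo {
    /** The four states named on the slide (spelling follows the slide). */
    enum State { UNREALISED, REALISED, PREFETCHED, STARTED }

    /** Hypothetical player that only tracks its state. */
    static final class SketchPlayer {
        private State state = State.UNREALISED;

        State getState() {
            return state;
        }

        /** Gather the information needed to acquire resources. */
        void realize() {
            move(State.UNREALISED, State.REALISED);
        }

        /** Acquire the resources and buffer enough media data to start immediately. */
        void prefetch() {
            move(State.REALISED, State.PREFETCHED);
        }

        /** Begin playing the media. */
        void start() {
            move(State.PREFETCHED, State.STARTED);
        }

        /** Assumed here: stopping keeps the resources, so the player is Prefetched again. */
        void stop() {
            move(State.STARTED, State.PREFETCHED);
        }

        private void move(State expected, State next) {
            if (state != expected) {
                throw new IllegalStateException("cannot go from " + state + " to " + next);
            }
            state = next;
        }
    }

    public static void main(String[] args) {
        SketchPlayer player = new SketchPlayer();
        player.realize();
        player.prefetch();
        player.start();
        System.out.println(player.getState());
        player.stop();
        System.out.println(player.getState());
    }
}
```

The sketch rejects out-of-order calls to make the ordering of the states explicit; a real player may be more forgiving and walk through the missing states on its own.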


VM Languages 3D Graphics
• Java3D
  – Completely new API for stand-alone 3D graphics applications
  – Can use any underlying architecture (Direct-X, OpenGL...)
  – It might not be the most efficient approach
  – Developers have to learn a new API
• Java wrappers of OpenGL
  – Functionality from OpenGL
  – Developers know the API already
  – Only wrappers: use the Java Native Interface (JNI)
  – Much intercommunication between layers (Java -> C)
  – API is not standardised yet (Java Specification Requests)
    • JSR 231: OpenGL
    • JSR 239: OpenGL ES

VM Languages J2ME
• Defines two Configurations:
  – CDC: High end consumer devices (TV STBs, High End PDAs)
    • RAM Java Memory: around 2MB
    • ROM Java Memory: around 2.5MB
  – CLDC: Low end consumer devices (Mobile Phones, Low end PDAs)
    • Processor: 16 bit/16 MHz or higher
    • Java total memory: 160-512 KB
• CDC (Connected Device Configuration)
  – Personal Profile
    • Adds support for lightweight AWT
  – Foundation Profile
    • Basic application APIs (no GUI)
• CLDC (Connected Limited Device Configuration)
  – Mobile Information Device Profile (MIDP)
    • Application APIs + GUI APIs
[Figure: CDC stack (Optional Packages, Personal Profile, Foundation Profile, CDC, JVM) and CLDC stack (Optional Packages, MIDP, CLDC, KVM)]

VM Languages Handheld


VM Languages Television
[Figure: television terminal software stack, top to bottom: Interoperable Applications and Application Manager; Sun Java APIs, HAVi APIs, DAVIC APIs, and DVB Specific APIs, with Transport Protocol(s) and Data; Java Virtual Machine; System Software (operating system, drivers, firmware)]

VM Languages Summary
Supported Media Types
  Text, Graphics      AWT
  Video, Audio        JMF
Arrangement of the signs
  Spatial             AWT
  Temporal            Java Threads
Interaction           AWT Events
Different Devices
  Handheld            MIDP
  Television          GEM

XML Based Languages
Declarative programming languages (they state only what has to be done, not how). The major contributor is the W3C
Pro
• Ease of use (you can even use a text editor)
• Interoperability (only needs a compatible browser)
• Safest to distribute
Con
• Expressive power (quite limited, not a programming language!)
• Use of scripting for application logic (or not?)
• Needs a service under it (browser)

XML Based Languages Overview

• HTML & XHTML • Multimedia – SMIL, Timesheets

• User Interface – XForms, XIML

• Vector Graphics – SVG

• Voice – VoiceXML


XML Based Languages HTML & XHTML
HTML
• HTML 4.01: (24 Dec. 1999) W3C Recommendation
• Lingua franca for publishing hypertext on the WWW
• Non-proprietary
• Can be created by a wide range of tools:
  – Text editors
  – Authoring tools
• All kinds of features (mixed together):
  – UI components
  – Fonts
  – Lists
XHTML
• XHTML 1.0 (26 Jan. 2000, revised 1 Aug. 2002) W3C Recommendation
• XHTML 2.0: (22 July 2004) W3C Working Draft
• Reformulation of HTML 4 in XML
• Intention
  – To only describe the structure of the document (CSS formatting)
• XHTML 1.0
  – Well formed documents
  – Proper nesting
  – ...
• XHTML 2.0
  – Not backwards compatible
  – Reduces scripting
  – Includes XForms and XML Events

XML Based Languages XHTML Modularization and XHTML 1.1
[Figure: XHTML modules, shown once for XHTML Modularization and once for XHTML 1.1]
• Core Modules: Structure, Text, Hypertext, List
• Other XHTML Modules: Applet, Presentation, Edit, Bi-directional Text, Forms, Tables, Basic Forms, Basic Tables, Image, Object, Client-side Image Map, Server-side Image Map, Intrinsic Events, Frames, Target, IFrame, Name Identification, Legacy, Metainformation, Scripting, Stylesheet, Style Attribute, Link, Base
• Other W3C Modules: Ruby Annotation (in the XHTML 1.1 figure)
• Private Modules

XML Based Languages Multimedia
SMIL
• SMIL 2.0 (07 Aug. 2001) W3C Recommendation
• Easy to write, like HTML
• Doesn’t define media formats, only integrates them
• Defines the spatial and temporal dimensions of the document
• Limited Interaction: links
• Absolute Layout
Timesheets
• Similar to CSS, but for the temporal dimension
• Document composed of:
  – Content: XHTML
  – Formatting: CSS
  – Timing: Timesheets
• Syntax similar to SMIL

XML Based Languages User Interface
XForms
• XForms 1.0 (14 Oct. 2003) W3C Recommendation
• Next generation of web forms
• Not intended as a self-standing document type
• Uses a host language for the document layout (e.g., XHTML, SMIL)
• Advanced user interface features:
  – text input, select one, select many, submit
• User input can be validated on the client side
• Calculations are done on the client side as well
XUL
• XML User Interface Language
• Only supported in Mozilla and Netscape 6 (or later) browsers
• Only for window-based graphical UI (mobile phones?)
• Abstraction only at the platform level (not at the UI level, voice?)
• It separates:
  – Client application definition and programmatic logic
  – Presentation (using CSS)
  – Language-specific text labels
• Look & feel changed as wished
• Interaction achieved by scripting
• Interface elements: windows, menubar, scrollbar

XML Based Languages Vector Graphics, Voice
SVG
• SVG 1.0 (04 Sept. 2001) W3C Recommendation
• SVG 1.1 (14 Jan. 2003) W3C Recommendation
• Describes vector-based graphics for the Web (not pixel based)
  – Shapes (e.g., lines & curves)
  – Images
  – Text
• Drawings can be
  – Interactive (e.g., mouse clicked)
  – Animated (e.g., change location)
VoiceXML
• VoiceXML 2.0 (16 Mar. 2004) W3C Recommendation
• Creation of audio dialogs (user interfaces)
• Input
  – Speech Recognition and/or touch tone (keypad)
• Output
  – Pre-recorded audio and Text-to-Speech Synthesis (TTS)
• Describes:
  – Spoken prompts: synthetic speech
  – Recognition of spoken words and touch tone key presses (fields)
  – Control of dialog flow (menu, form that can be submitted to server)
  – Telephony control (call transfer)

XML Based Languages Terminals & Browsers (Desktop)

http://www.xsmiles.org

http://www.opera.com/

http://www.mozilla.org

http://www.microsoft.com/windows/ie/

http://home.netscape.com/


XML Based Languages Terminals & Browsers (Embedded) Espial Browser

Web TV Mobile Phone


XML Based Languages Summary Media Types Audio Video Text, Images Arrangement of the signs Spatial Temporal Interaction

XHTML

SVG SMIL XForms

No No Yes

Yes No Yes

Flow & Absolute No Links

Absolute

Yes Yes Yes

-----

No Yes -Links Links Full


Multimedia Languages
[Figure: example scene with the prompt "Please, select a topic by using your remote control", an OK button, and further buttons, next to its scene graph: OrderedGroup (Scene), MovieTexture (Video), Transform2D (Graphics), Text and Rectangle (Background), ImageTexture (Image of Topics), TouchSensor (Panel)]

Multimedia Languages MPEG-4 Overview (1/2) • Evolution: – MPEG traditionally targeted to audio/video codecs (MPEG-1, MPEG-2) – Complex toolkit capable of providing solutions for multimedia applications

• Scene: – Composition of different multimedia objects (2D, 3D, video) including their spatial and temporal relationships

• Entry points: – BInary Format for Scenes (BIFS): • Hierarchical structure (scene graph) • Properties: color, size, position, and timing • Behavior: BIFS commands (conditional) and Animations

– MPEG-Java: set of Java APIs – eXtensible MPEG-4 Textual (XMT): XML language that describes scenes

Multimedia Languages MPEG-4 Overview (2/2)
• Some of the Scene Nodes:
  – Top: root of the graph (e.g., Layer3D and Layer2D)
  – Grouping: containers of multimedia objects
  – Sensor: nodes capable of detecting events (e.g., Time and Touch)
  – Shape: graphical primitives that include two fields: Geometry (e.g., rectangle and circle) and Appearance (e.g., texture and material)
  – Face: integration of synthetic 3D human-like objects

• Interaction: – Sensors detect events and Routes distribute them – Predefined behaviors: resize, relocate – Complex behavior: script or Java

• Widgets: – Can be implemented (e.g., sensor + Shape)

• Layout: – Local coordinates of the objects (more complex automatic layout is not permitted)
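The sensor, route, and widget ideas above can be illustrated with a toy model: a sensor detects an event, a route forwards the value to a field of another node, and a button-like widget is simply a sensor wired to a shape. This is a sketch under assumptions, not the MPEG-J API; `RouteDemo`, `EventIn`, `TouchSensor`, `Shape`, and `attachButtonShape` are hypothetical names, and the colour values are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: all names here are hypothetical.
final class RouteDemo {
    /** Receiving end of a route: a field of some node that accepts a value. */
    interface EventIn {
        void receive(boolean value);
    }

    /** Sensor node: detects an event and hands it to every route attached to it. */
    static final class TouchSensor {
        private final List<EventIn> routes = new ArrayList<EventIn>();

        void routeTo(EventIn target) {
            routes.add(target);
        }

        /** Called by the (imaginary) terminal when the panel is pressed or released. */
        void setActive(boolean active) {
            for (EventIn target : routes) {
                target.receive(active);
            }
        }
    }

    /** Shape node reduced to a single appearance property. */
    static final class Shape {
        String color = "grey";
    }

    /** A button-like widget: a shape whose colour is driven by a sensor through a route. */
    static Shape attachButtonShape(TouchSensor sensor) {
        final Shape shape = new Shape();
        sensor.routeTo(new EventIn() {
            public void receive(boolean active) {
                shape.color = active ? "red" : "grey";
            }
        });
        return shape;
    }

    public static void main(String[] args) {
        TouchSensor sensor = new TouchSensor();
        Shape button = attachButtonShape(sensor);
        sensor.setActive(true);
        System.out.println(button.color);
        sensor.setActive(false);
        System.out.println(button.color);
    }
}
```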


Multimedia Languages MHEG Overview
• Content Classes
  – Multimedia objects (e.g., video or audio clips)
  – Contained in the MHEG object (small data) or referenced (e.g., filename, web server address)
  – Author can reference smaller sections (e.g., track 5)
• Behavior Classes:
  – Synchronization of events and user interaction
  – User Interaction
• The action class:
  – Event triggers
  – Sequential and parallel
• The link class: establishes relationships between events and objects, i.e., what actions to take on what objects in response to a particular event.
• Selection and modification classes:
  – E.g., push button, checkbox, radio button, slider, text entry field, and text lists
  – Selections, input information, and trigger events.

Multimedia Languages MHEG Example

(scene:InfoScene1
  group-items:
    (bitmap: BgndInfo
      content-hook: #bitmapHook
      original-box-size: (320 240)
      original-position: (0 0)
      content-data: referenced-content: "InfoBngd"
    )
    (text:
      content-hook: #textHook
      original-box-size: (280 20)
      original-position: (40 50)
      content-data: included-content: "1. Lubricate..."
    )
  links:
    (link: Link1
      event-source: InfoScene1
      event-type: #UserInput
      event-data: #Left
      link-effect: action: transition-to: InfoScene2
    )
)
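The behaviour that Link1 describes in the example (a #UserInput event with data #Left, raised while InfoScene1 is active, triggers a transition to InfoScene2) can be mimicked by a tiny event dispatcher. This is a sketch of the link idea only, not an MHEG engine; `LinkDemo`, `Link`, and `Engine` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: all names here are hypothetical.
final class LinkDemo {
    /** One link: when the event matches, transition to another scene. */
    static final class Link {
        final String eventSource;
        final String eventType;
        final String eventData;
        final String transitionTo;

        Link(String eventSource, String eventType, String eventData, String transitionTo) {
            this.eventSource = eventSource;
            this.eventType = eventType;
            this.eventData = eventData;
            this.transitionTo = transitionTo;
        }

        boolean matches(String source, String type, String data) {
            return eventSource.equals(source) && eventType.equals(type) && eventData.equals(data);
        }
    }

    /** Keeps the active scene and applies the first link that matches an event. */
    static final class Engine {
        private final List<Link> links = new ArrayList<Link>();
        private String currentScene;

        Engine(String initialScene) {
            this.currentScene = initialScene;
        }

        void addLink(Link link) {
            links.add(link);
        }

        String getCurrentScene() {
            return currentScene;
        }

        /** Events are taken to originate from the currently active scene. */
        void fire(String eventType, String eventData) {
            for (Link link : links) {
                if (link.matches(currentScene, eventType, eventData)) {
                    currentScene = link.transitionTo;
                    return;
                }
            }
        }
    }

    public static void main(String[] args) {
        Engine engine = new Engine("InfoScene1");
        engine.addLink(new Link("InfoScene1", "UserInput", "Left", "InfoScene2"));
        engine.fire("UserInput", "Left");
        System.out.println(engine.getCurrentScene());
    }
}
```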


Conclusion • Multimedia – Multimedia objects, visual style – Spatial layout, temporal dimension – Application logic, user interaction

• Four alternatives (from taxonomy)
  – Compiled languages (C): most efficient, less safe to distribute
  – VM languages (Java): programming language, interoperable
  – XML based languages: most interoperable, less expressive power
  – Multimedia Languages: intended for multimedia

• Number of APIs – C: OpenGL/Direct-X, DirectFB, SDL, linuxTV – Java: AWT, Swing, JMF, Java3D, Java OpenGL – XML: XHTML, SMIL, Timesheets, XForms, SVG, VoiceXML


References
1. T. A. Aleem. A Taxonomy of Multimedia Interactivity. Doctoral dissertation, The Union Institute, USA, September 1998.
2. S. Boll. ZYX, Towards Flexible Multimedia Document Models for Reuse and Adaptation. Doctoral dissertation, University of Vienna, Austria, August 2001.
3. D. C. A. Bulterman and L. Hardman. Structured Multimedia Authoring. ACM Transactions on Multimedia Computing, Communications, and Applications, 1(1): 89-109, February 2005.
4. P. Cesar. Graphics Software Architecture for High End Interactive Television Terminals. Helsinki University of Technology, Finland, December 2005 (in print).
5. R. S. Heller, C. D. Martin, N. Haneef, and S. Gievska-Krliu. Using a theoretical multimedia taxonomy framework. ACM Journal of Educational Resources in Computing, 1(1): article number 6, 2001.
6. K. Pihkala. Extensions to the SMIL Language. Doctoral dissertation, Helsinki University of Technology, Finland, November 2003.
7. H. Purchase. Defining multimedia. IEEE Multimedia, 5(1): 8-15, 1998.
8. M. Williams. A Taxonomy of Media Usage in Multimedia. Doctoral dissertation, Nova Southeastern University, USA, May 2003.
9. P. Vuorimaa. Multimedia Technology Course (http://www.tml.hut.fi/Opinnot/T-111.350)