A Metadata Schema Design on Representation of Sensory Effect Information for Sensible Media and its Service Framework using UPnP

Shinjee Pyo 1), Sanghyun Joo 2), Bumsuk Choi 2), Munchurl Kim 1) and Jaegon Kim 3)
1) Information and Communications University, 2) Electronics and Telecommunications Research Institute, 3) Korea Aviation University
1) {vy311, mkim}@icu.ac.kr, 2) {joos, bschoi}@etri.re.kr, 3) {jgkim}@kau.ac.kr
Abstract
With the advent of various media services and the development of audio and video devices, we can enjoy media more effectively and realistically. Conventional media content is presented mostly via speakers, TVs and LCD monitors. Beyond media rendering alone, if the media content is interlinked with peripheral devices during playback, it is possible to create fascinating effects on audiovisual media content. In this paper, we propose device-rendered sensible media and a metadata schema for representing the effect and control information, and design a service framework for device-rendered sensible media based on the UPnP framework.
Keywords
Device-Rendered Sensible media, DCI metadata, Metadata, Schema, RoSE media, UPnP AV framework

1. Introduction
Conventional media content is usually presented to users by TVs or audiovisual peripheral devices such as LCD monitors and speakers. Users want more realistic experiences of multimedia content with high fidelity. For example, stereoscopic video, virtual reality, 3-dimensional TV and multi-channel audio are typical types of media for realistic experiences. However, these sorts of applications are limited to the visual and audio perspectives. More realistic media can be associated with target devices which are rendered in a manner coupled with the content. For example, special effects can be authored as a separate track in conjunction with audiovisual content in a synchronized way. While the audiovisual content is being played back, a series of special effects can be made by shaking a window's curtains for a sensation of fear, by turning on a flashbulb for a lightning flash effect, and so on. Furthermore, fragrance, flame, fog and scare effects can be made by a scent device, a flame-thrower, a fog generator and a shaking chair, respectively [1]. From a rich media perspective, realistic media coupled with or assisted by target devices is very beneficial to users because their experience of media consumption can be greatly enhanced, and industry markets can be enriched by coupling media and devices. From a technical perspective, this requires a representation of sensory effects for device-rendered media presentation, which may define the representation of information about special effects, characteristics of target devices, synchronization, and so on, for enriched experiences. This information, associated with the multimedia content, constitutes the so-called Representation of Sensory Effect (RoSE) media. Presentation of RoSE media is achieved by rendering the multimedia content in device-rendered and device-synchronized ways. In this paper, we introduce a metadata schema and a UPnP-based service framework for RoSE media. The original work of this paper was presented as a contribution at the 82nd MPEG standardization meeting, where the DRESS (Device-Rendered Sensible) media was renamed RoSE media. This paper is organized as follows: Section 2 introduces the RoSE media concept with its service scenarios; Section 3 addresses a metadata schema for RoSE; Section 4 presents an authoring example of DCI metadata; Section 5 describes a RoSE media service framework based on UPnP; and Section 6 concludes our work.
ISBN 978-89-5519-136-3
2. RoSE Media
RoSE media is interlinked with peripheral devices which can present various effects according to the scenes of the media. This makes it possible to create fascinating effects on audiovisual media content, so that users can get a sensible feeling for the media content through the rendering of peripheral devices. To express the various effects, RoSE media couples the original media content with DCI (Device Control Information) metadata, which describes the information for expressing the device-rendering effects using peripheral devices. When RoSE media content is being played back, the sensible effects in the content can be realized via various devices controlled by a RoSE media handler using the DCI metadata included in the RoSE media content. Fig. 1 exhibits a conceptual model of the RoSE media service framework. In Fig. 1, the RoSE media content is generated with a rose picture and scent information described in the DCI metadata. The RoSE media can then be played back by a RoSE media handler via various devices. Here, the RoSE media handler parses the DCI metadata and controls the required peripheral devices by sending the parsed device control data to their target devices.
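The handler behavior just described — parsing the DCI metadata and sending each piece of control data to its matching target device — can be illustrated with a small Python sketch. The element and attribute names (DCI, Effect, DeviceControlData, TargetID, RefTargetID) follow the schema designed later in this paper; the sample instance and the dispatch logic themselves are illustrative, not normative.

```python
import xml.etree.ElementTree as ET

# A tiny DCI-style instance: one scent effect and its control data entry.
# Attribute values are made up for illustration.
SAMPLE_DCI = """\
<DCI>
  <EffectDescription>
    <Effect TargetID="scent-1" TypeOfEffect="scent"/>
  </EffectDescription>
  <DeviceControlDescription>
    <DeviceControlData RefTargetID="scent-1"/>
  </DeviceControlDescription>
</DCI>"""

def dispatch(dci_xml):
    """Return (effect type, target id) pairs a handler would send to devices."""
    root = ET.fromstring(dci_xml)
    # Index the declared effects by their TargetID.
    effects = {e.get("TargetID"): e.get("TypeOfEffect")
               for e in root.iter("Effect")}
    jobs = []
    for dcd in root.iter("DeviceControlData"):
        ref = dcd.get("RefTargetID")
        jobs.append((effects.get(ref), ref))  # match control data to its effect
    return jobs
```

In a real handler, the returned pairs would drive device invocations instead of being collected in a list.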
Fig. 1 A Conceptual Model for RoSE Media Service
Feb. 17-20, 2008 ICACT 2008
The sensibility to be expressed by the DCI metadata includes the wind effect using a wind device such as an electric fan or an air conditioner, the lighting effect using a lighting device such as a lamp, the temperature effect using a temperature controller, the vibration effect using vibration devices such as cell phones or vibration chairs, the scent effect using a diffusion device such as a diffuser, and the shielding effect using shading devices such as curtains and blinds. In this section, we discuss RoSE media, several application scenarios of RoSE media, and requirements for RoSE media.
2.1 Application Scenarios
2.1.1 RoSE media on mobile phones
Recently, rich mobile phone terminals have become very popular in the markets. These rich mobile terminals are usually equipped with color display units, speakers, lighting LEDs, vibrators, etc. Furthermore, they are connected to 3G mobile networks such as WCDMA, WiBro (or NGN) and HSDPA networks. A video content can be rendered via the display unit and speakers of a mobile phone, and the phone can vibrate with its LEDs turned on and off alternately when an explosion scene in the content is played back, as shown in Fig. 2.

Fig. 2 A mobile phone with RoSE media (vibration and lighting flash)
2.1.2 Advertisements
A user watches a soccer game. After the first half of the game is over, an advertisement for pizza based on RoSE media is shown. The advertisement content is accompanied by the smell of pizza from a peripheral device which emits various smells and fragrances. When the advertisement arrives at the RoSE media controller, the controller searches the home network for a connected device which can present the scent effect of the pizza advertisement, and serves the scent effect to the user with the appropriate device according to the metadata included in the advertisement content. As a result, the user experiences the pizza advertisement not only visually but also olfactorily.

2.1.3 Showing movies in amusement parks
The Backdraft attraction of the Universal Studios theme park provides realistic experiences with movies, and many other amusement parks provide sensible movie services with various effects such as vibrating chairs, pouring water and blowing smoke in their theaters. However, these movies and effects are operated separately by individual control points, so the control information for the device-rendering effects is not represented together with the movie content in a combined form. If we use a RoSE media service framework in such conventional amusement park applications, the devices which present the various effects are controlled by the included metadata and synchronized with the movie content itself. RoSE media will thus be applicable to advertisements, amusement parks, sensible movies on mobile phones and so on.

2.2 What is RoSE media?
In order to represent RoSE media, we require two things: one is the presentation of effects; the other is the DCI metadata.

2.2.1 Representation of effects
RoSE media is coupled with peripheral devices and presents various effects such as lighting, scent, sound, wind, shielding, vibration, visual and temperature effects. The information about these effects must be described and communicated so as to express RoSE media properly; the metadata of RoSE media contains this information. While the media content is being played, the information about these effects is transmitted to the peripheral devices, which are then controlled to express the various effects. The lighting effect is a set of effects which use lighting devices such as a floor lamp, a fluorescent lamp or a flood lamp. There are various actions in the lighting effect, such as flickering a lamp, dimming the brightness and changing the color of a lamp. The sound effect includes actions which change the volume or turn the sound on and off according to the scene, using sound devices such as speakers and audio equipment. The wind effect contains actions which use an electric fan, an air conditioner or a ceiling fan; the actions make the wind devices operate and change the direction or intensity of the wind. The scent effect uses an emitting device to emit various scents according to the scene of the media. The shielding effect uses a controllable curtain or window blind to shield light in accordance with the scene. The vibration effect delivers vibration to users using a vibration chair or other devices which are able to vibrate. The visual effect consists of actions which show additional images using peripheral visual devices other than the device playing the media. The temperature effect changes the temperature using a temperature controller. We assume that all the devices required to present these effects are available, and give pertinent actions to each device. The actions are synchronized with the media content. In the next section, we discuss how to design the metadata schema for the device control information.
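As a compact summary of the effect families above, the following Python sketch maps each effect to example device actions. The effect names come from the text; the action identifiers are hypothetical labels for illustration, not schema-defined names.

```python
# Effect families from the text, each paired with hypothetical action names.
EFFECT_ACTIONS = {
    "lighting":    ["flicker", "dim", "change_color"],
    "sound":       ["set_volume", "on_off"],
    "wind":        ["on_off", "set_direction", "set_intensity"],
    "scent":       ["emit"],
    "shielding":   ["open_curtain", "close_curtain"],
    "vibration":   ["vibrate"],
    "visual":      ["show_image"],
    "temperature": ["set_temperature"],
}

def supported_actions(effect_type):
    """Look up the example actions for an effect family (empty if unknown)."""
    return EFFECT_ACTIONS.get(effect_type, [])
```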
2.2.2 DCI Metadata
In order to represent RoSE media, the metadata should represent the various effects and the conditions for each effect, allow for synchronization with the media content, and express the control information. The DCI metadata therefore requires data types for describing the device control information, expressing the sensory effects, and synchronizing the control information with the media content. These data types are based on LonWorks [2] and MPEG-7 MDS [3]; we discuss them in the next section.
3. RoSE Metadata Schema
The DCI metadata schema is the structure for writing DCI metadata for RoSE media, so we need to design a DCI schema for RoSE media. We now propose data types as the building blocks of the schema, together with the structure of the schema.

3.1 Data Types based on the LonWorks and MPEG-7 standards
Data types are the building blocks for expressing DCI metadata. Our data types are based on the LonWorks and MPEG-7 standards. LonWorks is a platform, developed by Echelon, which enables various devices in a control networking system to interoperate [6]. LonMark is an association for standardizing interoperable systems and devices. LonMark provides guidelines for building interoperable systems, such as the LONMARK Interoperability Guidelines, the ANSI/EIA-709 Control Network Protocol Specification and the LONMARK Functional Profiles; these guidelines are available on the LonMark web site [6]. LonMark also standardizes SNVTs (Standard Network Variable Types) for communicating with and controlling devices. There are 187 different SNVTs for representing device status and control values. Among them, we select nine SNVTs to express the effects and control data of devices for RoSE media, and translate these SNVTs into XML Schema types. The translated SNVT types use the namespace prefix "snvt:" and are defined in the following format.
Table 1. SNVT mapping to XML Schema types

  SNVT            XML conversion type
  SNVT_lux        snvt:luxType
  SNVT_speed_mil  snvt:speed_milType
  SNVT_angle_deg  snvt:angle_degType
  SNVT_angle_vel  snvt:angle_velType
  SNVT_mass_mil   snvt:mass_milType
  SNVT_sound_db   snvt:sound_dbType
  SNVT_time_sec   snvt:time_secType
  SNVT_temp_p     snvt:temp_pType
  SNVT_rpm        snvt:rpmType
SNVT_lux is a variable for brightness, SNVT_speed_mil is a variable for speed (m/s), SNVT_angle_deg represents the degree of an angle, SNVT_angle_vel is a variable for angular velocity, SNVT_mass_mil is a variable for mass (mg), SNVT_sound_db represents the intensity of sound (dB), SNVT_time_sec represents time (sec), SNVT_temp_p is a variable for temperature, and SNVT_rpm is a variable for angular velocity (rpm). MPEG-7 Part 5 Multimedia Description Schemes (ISO/IEC 15938-5) is an international standard for media description and expression [3]. From MPEG-7 MDS, we use three data types for the DCI metadata: mediaTimePointType for representing a time point in the media; mediaDurationType for the duration of the playing time of the media; and DescriptionMetadataType for describing the general information of the DCI metadata. We discuss the structure of DCI in the next section.

3.2 Namespace Prefixes
The DCI metadata schema uses three namespace prefixes: dci, snvt and mpeg7. The prefix dci denotes the DCI metadata schema itself, snvt denotes the schema which defines the various SNVTs, and mpeg7 denotes MPEG-7 MDS.

3.3 Design of the DCI Metadata Schema
Based on these data types, we organize the DCI metadata schema as follows: the root element of the DCI metadata is DCI, and DCI has three child elements, as shown in Fig. 4.
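The three prefixes can be sketched with Python's xml.etree as follows. Note that the paper does not give the actual namespace URIs, so the dci and snvt URIs below are placeholders; the mpeg7 URI is the commonly used MPEG-7 schema namespace, also an assumption here.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI bindings for the DCI metadata schema. Only the prefixes
# (dci, snvt, mpeg7) come from the schema design; the URIs are placeholders.
NAMESPACES = {
    "dci":   "urn:example:dci",             # DCI metadata schema itself (placeholder)
    "snvt":  "urn:example:snvt",            # XML-converted SNVT types (placeholder)
    "mpeg7": "urn:mpeg:mpeg7:schema:2001",  # MPEG-7 MDS (assumed URI)
}

for prefix, uri in NAMESPACES.items():
    ET.register_namespace(prefix, uri)

# A DCI root element qualified with the dci namespace.
root = ET.Element("{%s}DCI" % NAMESPACES["dci"])
xml_text = ET.tostring(root, encoding="unicode")
```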
Fig. 3 SNVT_angle_deg in LonWorks Standard Document
Among the items in the table, we use four items, namely Type Category, Valid Type Range, Type Resolution and Units, to define the data types. The Type Category specifies how variables are represented using predefined variable types, such as unsignedInt, float, decimal and boolean. The Valid Type Range restricts the range of the variable, and the Type Resolution defines the resolution with which values can be represented. The Units item specifies the units used for the SNVT type; in the SNVT_angle_deg case, the unit is degrees. According to these items of the SNVTs, we convert the selected SNVTs to the corresponding XML SNVT types for use in the DCI metadata schema, as shown in Table 1.
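A minimal sketch of how the Valid Type Range and Type Resolution items can drive validation of a control value follows. The concrete numbers used for the angle example are illustrative, since the exact LonWorks definition shown in Fig. 3 is not reproduced here.

```python
# Validate a control value against an SNVT-style definition: reject values
# outside the Valid Type Range and snap accepted values to the Type
# Resolution grid. The numbers in the example call are assumptions.
def validate_snvt(value, lo, hi, resolution):
    """Check range and quantize the value to the type's resolution."""
    if not (lo <= value <= hi):
        raise ValueError("out of Valid Type Range")
    steps = round((value - lo) / resolution)
    return lo + steps * resolution  # value quantized to Type Resolution

# e.g. an angle in degrees with an assumed 0.02-degree resolution
angle = validate_snvt(45.005, -360.0, 360.0, 0.02)
```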
Fig 4. Structure of DCI element
In dci:DCIType, there are three elements: GeneralInfo, EffectDescription and DeviceControlDescription. The GeneralInfo element describes the general information of the DCI metadata, the EffectDescription element is the container for every effect applied to the media, and the DeviceControlDescription element is the container for the control parameters of each device. We take a close look at these three elements in the following subsections.
3.3.1 GeneralInfo element
The GeneralInfo element is of type mpeg7:DescriptionMetadataType. The GeneralInfo element contains various sub-elements describing the metadata, as follows.
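A sketch of authoring the GeneralInfo part with Python's xml.etree follows. The sub-element names Version, LastUpdate and CreationTime are plausible renderings of the description metadata fields discussed later in the authoring section, not verbatim components of mpeg7:DescriptionMetadataType.

```python
import xml.etree.ElementTree as ET

def make_general_info(version, last_update, creation_time):
    """Build a GeneralInfo element; sub-element names are assumptions."""
    info = ET.Element("GeneralInfo")
    ET.SubElement(info, "Version").text = version
    ET.SubElement(info, "LastUpdate").text = last_update
    ET.SubElement(info, "CreationTime").text = creation_time
    return info

# Example values for a hypothetical DCI instance.
gi = make_general_info("1.0", "2007-10-20T09:00:00", "2007-10-01T12:00:00")
```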
Fig 5. Structure of the GeneralInfo element

3.3.2 Effect Description
The EffectDescription element is of type dci:EffectDescriptionType and enables the terminal to know which types of effects are applied to the media and to match a target device to each effect. The EffectDescription element has one child element, Effect, which describes the information of an effect. The following figure exhibits the structure of the EffectDescription element.

Fig 6. Structure of the EffectDescription element

In order to represent each effect, the Effect element should provide the condition information for the effect, assign an ID to the effect, and initialize the effect. The EffectDescription element can contain many Effect elements (the maxOccurs of Effect is unbounded). The Effect element is of type dci:EffectType.

Fig 7. Structure of the Effect element

The Effect element has two attributes and two child elements. The TargetID and TypeOfEffect attributes serve to name or tag the Effect element. The Condition element provides a requirement for realizing the effect, such as the maximum/minimum brightness of a lighting device or the controllability of the wind direction; it is of type dci:ConditionType. The InitialEffect element describes the device control parameters for the initial effect; it is of type dci:InitialEffectType. The following figure shows part of the ConditionType.

Fig 8. Structure of the ConditionType element

3.3.3 Device Control Description
The DeviceControlDescription element contains the control information for each device and the synchronization information for interlocking with the media. The synchronization information is represented by the mediaTimePointType and mediaDurationType of MPEG-7. The DeviceControlDescription has one child element, DeviceControlData. Each DeviceControlData element is matched with an Effect in the EffectDescription by referencing its TargetID.

Fig 9. Structure of the DeviceControlData element

The DeviceControlData element has the RefTargetID attribute and the Sync and ControlData child elements. The RefTargetID attribute points to the TargetID of an Effect element in the EffectDescription element; using it, the terminal can recognize which device the control data applies to. The Sync element describes the timing information for synchronization with the media and is of type dci:SyncType, which has two attributes, start and duration, represented by mpeg7:mediaTimePointType and mpeg7:mediaDurationType. The ControlData element describes the various control parameters for the device.
Fig 10. Structure of the DeviceControlDataType element
The ControlData element is of type dci:ControlDataType, which is the same as the type of the InitialEffect element, because both elements deal with the control information of devices.
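Since the Sync element carries its start and duration values in MPEG-7 time formats, a handler needs to turn them into playback seconds. The sketch below assumes simplified "Thh:mm:ss" and ISO-8601-like "PT…S" shapes; real mpeg7:mediaTimePointType and mediaDurationType values can carry dates and fractional parts beyond this.

```python
import re

def timepoint_to_seconds(tp):
    """Parse a simplified mediaTimePoint of the form 'Thh:mm:ss'."""
    h, m, s = re.fullmatch(r"T(\d{2}):(\d{2}):(\d{2})", tp).groups()
    return int(h) * 3600 + int(m) * 60 + int(s)

def duration_to_seconds(dur):
    """Parse a simplified mediaDuration of the form 'PT[nM][nS]'."""
    m = re.fullmatch(r"PT(?:(\d+)M)?(?:(\d+)S)?", dur)
    minutes, seconds = (int(g) if g else 0 for g in m.groups())
    return minutes * 60 + seconds

start = timepoint_to_seconds("T00:01:30")  # 90 seconds into the media
length = duration_to_seconds("PT10S")      # effect lasts 10 seconds
```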
Fig 11. Structure of the ControlDataType element

4. Authoring of DCI metadata
Using the DCI metadata schema, we implement DCI metadata for the RoSE media service. The following XML instantiations are exemplars of DCI metadata for a romantic movie. The DCI metadata consists of three parts, which describe the general information of the metadata, the effects which are suitable for the content, and the control information to present the effects, respectively. The first part is the GeneralInfo element, which represents the information about the version number, the time of the last update and the creation time.

Fig 12. GeneralInfo

The second part of the DCI metadata is the EffectDescription element, which describes all the kinds of effects which can be presented in the content. Each effect has its own ID, type of effect and an essential condition for the effect.

Fig 13. EffectDescription

The third part is the DeviceControlDescription element, which represents the entire device control information for realizing the various effects. Each control data element points to the ID of one of the effects mentioned above, synchronizes with the content, and describes the control information for the effect, such as SetWindSpeedLevel and SetTemperatureLevel.

Fig 14. DeviceControlDescription

Since RoSE media contains the effect information and the device control information in its DCI metadata, the RoSE media service handler parses the DCI metadata, recognizes the media content, checks the peripheral devices and sends the control information to the proper devices. In the next section, we introduce a RoSE media service framework which is able to present RoSE media.

5. A Service Framework of RoSE media
Using the metadata schema, we can construct DCI metadata according to the media content. The media content and the DCI metadata are combined together, and the combined media constitutes the RoSE media. As shown in Fig. 1, the RoSE media is fed into the RoSE media handler, which sends the device control information to peripheral devices such as a dimmer, an air conditioner and a perfumer. We realize our proposed RoSE media service framework based on the UPnP AV architecture.
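The playback-side behavior implied by the authoring example can be sketched as a scheduler that orders parsed control entries by media time. Device I/O is stubbed out with a list; the SetWindSpeedLevel and SetTemperatureLevel parameter names come from the text, while their values are made up.

```python
def schedule(entries):
    """entries: iterable of (start_sec, ref_target_id, params) tuples.

    Returns the entries in media-time order, standing in for sending each
    one to its device at the right moment during playback.
    """
    sent = []
    for start, ref, params in sorted(entries, key=lambda e: e[0]):
        sent.append((start, ref, params))  # stand-in for a device invocation
    return sent

# Two control entries from a hypothetical DCI instance.
plan = schedule([
    (95, "wind-1", {"SetWindSpeedLevel": 3}),     # value is illustrative
    (90, "temp-1", {"SetTemperatureLevel": 18}),  # value is illustrative
])
```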
Fig 15. UPnP AV 1.0 Architecture (a Media Server, a Media Renderer and a Control Point; the Control Point discovers AV contents and AV players, and the media data stream flows from the server to the renderer)
The UPnP AV architecture consists of a media server, a media renderer and a control point; Fig 15 shows these three components. A control point is a kind of user input device and controls the actions of the media server and renderer. The UPnP AV architecture uses IP networking, and each component sends signals and data streams over the network. To implement a service framework for RoSE media, we use the Intel Tools for UPnP Technologies, which consist of Device Spy, Device Sniffer, Device Validator, Device Author, Device Relay, Network Light, AV Media Controller, AV Media Server, AV Media Renderer, and AV Wizard. Among these tools, Device Spy, AV Media Server, AV Media Renderer, Device Author and Network Light can be used for the RoSE media service framework. Device Spy is Intel's Universal Control Point (UCP). As mentioned before, a control point is a component or device for controlling the actions of other devices, such as an MP3 player, a PVR or a PC. With this tool we can test "action" requests and events, and it also traces packets which are sent to UPnP devices. AV Media Server is an AV media server which can be configured to share local media files; it reads metadata from audio tags and image formats and makes them available on the network. AV Media Renderer adds a rich set of AV features to the Windows Media Player and ActiveX control; it supports multiple connections, media types and playlists. Device Author is a tool for authoring device descriptions as XML documents; it can generate and validate service description XML automatically, and allows service description XML to be imported from any UPnP device on the network. Network Light is a device which supports the SwitchPower and Dimming services. We now describe the implementation of the RoSE media service framework. The first step of the RoSE media service is a packaging/unpackaging process.
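The packaging step just introduced can be sketched as below. The paper does not specify the container format, so a ZIP archive holding the media file and the DCI metadata side by side, with assumed entry names, serves purely as an illustration.

```python
import io
import zipfile

def package(media_bytes, dci_xml):
    """Bundle media content and DCI metadata into one RoSE media blob."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("content.mp4", media_bytes)  # media track (name assumed)
        z.writestr("content.dci.xml", dci_xml)  # DCI metadata (name assumed)
    return buf.getvalue()

def unpackage(blob):
    """Recover the media content and the DCI metadata from the blob."""
    with zipfile.ZipFile(io.BytesIO(blob)) as z:
        return z.read("content.mp4"), z.read("content.dci.xml").decode()

rose = package(b"\x00fake-video", "<DCI/>")
media, dci = unpackage(rose)
```

The unpackaging side would live in the media server, which forwards the DCI metadata to the Control Point and streams the media to the renderer.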
The reason we perform the packaging and unpackaging process is that DCI metadata is a new form of metadata. In the UPnP framework, there are two kinds of metadata: device description metadata and service description metadata for a certain device. So, if the media content and the DCI metadata exist separately in the network, it is difficult to recognize the DCI metadata as belonging to the content. We therefore write the DCI metadata for the media content and then package the media content and the DCI metadata together. The next step is feeding the RoSE media to the media server using the AV Media Server of the Intel tools for UPnP. To read the packaged form of RoSE media in the UPnP framework, we also need an unpackaging process. So, we need an application to perform the packaging process before the media server, as well as an additional part in the media server for unpackaging and for sending the metadata to the Control Point. The AV Media Server unpackages the RoSE media, sends the DCI metadata to the control point, and prepares to transmit the media content to the AV Media Renderer; the AV Media Controller controls the playing and stopping of the media content. The next step is checking the peripheral devices connected to the network. Using Device Spy, we can see which devices exist in our network, and we can control a device by invoking its actions. We set up our environment using a Network Light, an AV Media Server and an AV Media Renderer. To link the DCI metadata with the devices and the media player, we need an additional part for
conversion of the DCI metadata into SOAP messages available to the UPnP framework, located in the Control Point. Then, when the media renderer starts to play the media sent by the media server, the Control Point controls the devices connected to the network according to the SOAP messages converted from the DCI metadata.

Fig 16. Advanced UPnP AV Framework for RoSE media service (the RoSE media is packaged before and unpackaged at the Media Server; the DCI metadata is converted in the Control Point, which discovers AV players and connected devices, while the media data stream flows to the Media Renderer; the added conversion and packaging parts form the RoSE media controller on top of the UPnP architecture)
Fig 16 shows the service framework based on the UPnP AV architecture. Fig 17 shows an example of turning a light on and off and dimming it using Network Light and Device Spy.
Fig 17. UPnP service example of controlling light
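The kind of SOAP action request the Control Point would send to dim such a light can be sketched as follows. The Dimming service type and the SetLoadLevelTarget action mirror the standard UPnP Dimming:1 service that the Network Light supports, but the argument name and the envelope construction here are a hand-rolled illustration rather than the output of a real UPnP stack.

```python
# Template for a UPnP SOAP action request; the envelope shape follows the
# standard SOAP 1.1 encoding used by UPnP control messages.
SOAP_TEMPLATE = """\
<?xml version="1.0"?>
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"
            s:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <s:Body>
    <u:{action} xmlns:u="{service}">
      <{arg}>{value}</{arg}>
    </u:{action}>
  </s:Body>
</s:Envelope>"""

def control_to_soap(action, service, arg, value):
    """Wrap one parsed DCI control parameter as a UPnP SOAP action body."""
    return SOAP_TEMPLATE.format(action=action, service=service,
                                arg=arg, value=value)

# Dim the light to 40% (argument name is an assumption).
msg = control_to_soap("SetLoadLevelTarget",
                      "urn:schemas-upnp-org:service:Dimming:1",
                      "newLoadlevelTarget", 40)
```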
6. Conclusion and Future Works
In this paper, we proposed a new emerging media, called RoSE media, together with a DCI (Device Control Information) metadata schema for the RoSE media service. RoSE media is interlinked and synchronized with peripheral devices connected to the network. We also designed a service framework for RoSE media using the DCI metadata, based on the UPnP AV architecture. The proposed RoSE media can enrich the user's experience in a more realistic and sensible way and open up a new kind of media, which is now under consideration for MPEG standardization.

REFERENCES
[1] Sanghyun Joo, et al., "A Service Framework for Device-rendered Sensible Media," ISO/IEC JTC1 SC29 WG11, MPEG82/m14900, Shenzhen, China, Oct. 2007.
[2] LonMark SNVT Master List, Version 13, Revision 00, January 2006.
[3] MPEG-7 Part 5: Multimedia Description Schemes, ISO/IEC 15938-5.
[4] B. S. Choi, et al., "Device Control Information Metadata Schema for Device-Rendered Sensible Media," ISO/IEC JTC1 SC29 WG11, MPEG82/m14976, Shenzhen, China, Oct. 2007.
[5] John Ritchie, Thomas Kuehnel, "UPnP AV Architecture:1," UPnP Version 1.0, June 25, 2002.
[6] LonMark International, http://www.lonmark.org.