Management of multimedia data using an Object-Oriented Database System M. Adiba
R. Lozano
H. Martin
F. Mocellin
Laboratoire LSR, Universite Joseph Fourier, B.P. 72, 38402 Saint Martin d'Heres Cedex, France
Tel: (33/0) 4 76 82 72 02, Fax: (33/0) 4 76 82 72 87
mail: {Michel.Adiba/Rafael.Lozano/Herve.Martin/[email protected]
Abstract
A MultiMedia DataBase Management System (MMDBMS) must provide facilities to store, model and query multimedia data. In the context of the STORM project, we are developing an object-oriented client-server MMDBMS which addresses these problems. We use an object-oriented approach to develop reusable components which can be easily integrated into the general framework of multimedia database applications. In this paper, we present the main functionalities of our system and we focus on video objects and their integration in multimedia presentations.

Keywords: Object-Oriented DBMS, Multimedia, Video, Presentation, Temporal, Query language.
1 Introduction
Because it improves communication and information exchange between people, multimedia information has become extensively used in applications such as entertainment, medicine, advertising and education. A multimedia information system (MIS) encompasses the representation, storage, retrieval, processing, transmission and presentation of several time-dependent (e.g. video or audio) and time-independent (e.g. image or text) media [6]. Building a MIS is therefore an exciting challenge, and many problems, such as managing presentations which include several types of data composed and synchronized in different manners, have attracted a lot of attention from the research community [3, 4, 6]. To address this challenge, a MIS must fulfill several requirements: multimedia data storage, multimedia data structure and presentation modelling, and query specification.

(Footnotes: supported by SFERE-CONACYT Mexico; STORM: Structural and Temporal Object-oRiented Multimedia database system.)
Over the years, several approaches have been used to design MIS. Many multimedia authoring systems, like ToolBook and Director [2], propose sophisticated and unified environments for creating multimedia applications. These approaches focus on non-conventional data for constructing multimedia presentations. They do not take into account semantic links between multimedia data and conventional data (for example, the descriptive attributes of an object Person and its photo which appears in a presentation). This separation implies some limitations, especially in the data extraction process. Conversely, it is difficult to access multimedia data from data stored in the database. In this paper we present the STORM multimedia database system, which is a client-server object-oriented DBMS extended with specific multimedia aspects. We aim to show that merging conventional and multimedia data increases MIS capabilities. We propose a model to capture information about each medium. We focus on the hierarchical structure of video data, because video delivery systems are a very important class of multimedia applications and video structure can be complex to model. We also propose an object-oriented approach to store, query and play multimedia presentations. A presentation is an application which combines different types of multimedia data.
2 STORM MMDBMS Architecture
The architecture of our STORM multimedia DBMS (figure 1) is similar to the one proposed in [3]. Instead of defining a new DBMS, we extended the O2 object-oriented DBMS [5] and, as far as possible, we use DBMS tools for managing multimedia objects. We chose the O2 DBMS because it provides several facilities for storing and manipulating objects and because it is ODMG compliant [1]. We use the object-oriented data model to model the specific structure of each medium and to model multimedia presentations. We extended the OODBMS client/server services to support synchronization constraints (e.g. multi-threading). The client machine is a multimedia workstation which is used to create, manipulate and play multimedia objects stored in the DBMS server. A set of development tools is provided which accesses the server through the C++ Application Programming Interface.

The STORM architecture consists of three layers: (1) a database management layer for multimedia objects, (2) a presentation management layer, (3) a user interface layer. The database management layer provides facilities for managing each type of media with suitable storage techniques and gives a canonical representation of each monomedia data type in order to capture its complex structure. Moreover, the object model allows a hierarchy of classes to be defined for specifying the structure and the behavior of each media data type. As we show in the next section, we make wide use of object-oriented concepts such as classes, inheritance and methods. The presentation management layer models relationships between objects in order to create, query and run complex multimedia presentations which include several objects with spatial and temporal constraints. We propose our own class library, presented in section 3. We introduce the concept of temporal shadow, which consists of adding temporal properties (delay and duration) to each object that appears in a presentation. The O2 system supports the ODMG/OQL language. We extend this language in order to query temporal attributes of presentations. The user interface layer proposes facilities such as image display or video playout and also furnishes tools to create and play presentations according to synchronization constraints. In our environment, O2Look, built on top of X-Window, provides primitives to present objects on the screen, to handle user interactivity and to handle spatial aspects.
3 Multimedia Object Management
3.1 Multimedia objects of the database
To integrate images, texts, audios and videos, we use four classes: Image, Text, Audio and Video. The Image and Text classes are imported from the O2Kit schema provided by the O2 DBMS. The Audio class allows audio objects to be modeled and stored in binary format.
3.1.1 Video class

Our video model takes into account the fact that video data are dynamic and time dependent. The object structure represents the hierarchical aspect of video (cutting into sequences, scenes and shots). An important feature of a DBMS is to separate the physical data storage from the external view of the information. In order to meet this requirement for video data, we introduce the concept of virtual video. A virtual video is an algebraic formula which produces a new video as a result. Because this video is not physically stored, we call it a virtual video. A parallel with views in the relational data model may easily be established. We have created a schema of classes to model a video. The root of this schema is the Video class. It is an abstract class whose only role is to define three subclasses, RAW_VIDEO, VIRTUAL_VIDEO and VIDEO_DOCUMENT, and to specify that basic operations (play, pause, rewind, forward, ...) can be applied to all objects of the subclasses.

1. The RAW_VIDEO class allows raw videos with different format types to be integrated into the database.

2. The VIRTUAL_VIDEO class is the kernel of our model. A virtual video is a list of video segments, where each video segment has the following form:

VI = [src : Video; sf : integer; ef : integer; freq : integer; rep : integer]   (1)

where [ and ] represent a tuple. A video segment is defined over the video src, which can be a raw video, a virtual video or a video document; sf and ef represent, respectively, the frame numbers of the beginning and the end of the video segment; freq is the frame rate to be used at video playing time and rep is its "repeating factor". Because we do not store the physical frames which compose a virtual video, but rather the definition of each video segment which forms part of it, a video built in this way is called a virtual video. In short, a virtual video is a list of segments, where each segment is a tuple as defined previously. We do not keep track of the whole algebra expression, as in [7]. We just store the list of intervals which compose the virtual video and some properties which modify their initial behavior (frequency, repetition). The algebra operators used to compose virtual videos are:

Concatenation (+) : this operation takes two virtual videos and creates a new virtual video composed of the first virtual video followed by the second.

Union (%) : this operation has the same semantics as concatenation, but it eliminates the parts of the second operand common with the first operand.

Intersection (^) : this operation creates a virtual video with the common part of the operands.
[Figure 1 content: Interactive Layer (Layer 3) — navigation tool, media editing, multimedia query interface (construct, query, present); Presentation Management Layer (Layer 2) — STORM classes, meta-schema, O2Kit, O2Look, O2C, OQL; Database Management Layer (Layer 1) — O2Engine objects server, objects & values (audio, video, text, image), SO objects, database.]

Figure 1: Architecture of the MMDBMS.
Difference (-) : this operation creates a virtual video which is composed of the same images as the first operand, but without the images common with the second operand.

3. The VIDEO_DOCUMENT class specifies the hierarchical logical structure of a video. We propose a normalized structure (sequence -> scene -> shot) or a free structure (level n -> level n-1 -> ... -> level 1).
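The segment tuple of equation (1) and the four algebra operators can be sketched in a few lines of Python. This is an illustrative model only, assuming segments expand to lists of (source, frame) pairs; the class and function names are ours, not STORM's actual API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    src: str    # source video, identified by name here for simplicity
    sf: int     # start frame in src
    ef: int     # end frame in src
    freq: int   # frame rate at playing time
    rep: int    # repeating factor

def frames(virtual):
    """Expand a virtual video (list of segments) into (src, frame) pairs, honoring rep."""
    out = []
    for s in virtual:
        out.extend([(s.src, f) for f in range(s.sf, s.ef + 1)] * s.rep)
    return out

def concat(a, b):        # (+) : a followed by b
    return frames(a) + frames(b)

def union(a, b):         # (%) : concat, dropping b's frames already present in a
    fa, seen = frames(a), set(frames(a))
    return fa + [f for f in frames(b) if f not in seen]

def intersect(a, b):     # (^) : frames common to both operands
    fb = set(frames(b))
    return [f for f in frames(a) if f in fb]

def difference(a, b):    # (-) : a's frames without those shared with b
    fb = set(frames(b))
    return [f for f in frames(a) if f not in fb]

v1 = [Segment("ski.mpg", 0, 4, 25, 1)]   # frames 0..4 of a hypothetical raw video
v2 = [Segment("ski.mpg", 3, 6, 25, 1)]   # frames 3..6 of the same video
```

With these definitions, intersect(v1, v2) yields frames 3 and 4, and difference(v1, v2) yields frames 0, 1 and 2, matching the operator semantics above.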
3.1.2 User defined classes

We do not impose any structure on the user. Nevertheless, to benefit from our environment, each monomedia object (e.g. a video) must be an instance of the corresponding class (e.g. the Video class). The user can then create his/her own classes, with his/her specific attributes, to manage multimedia database objects. Figure 2 sketches a possible hierarchy of user-defined classes with some textual attributes which will be used later, in the query section 4. Afterwards, we show that we can present each object of the database, like Video objects or MyImage objects.
3.2 STORM objects as multimedia presentations
We associate with each multimedia object belonging to a presentation a couple of temporal elements, duration and delay, which constitute the Temporal Shadow of the object. For any object x, duration(x) is the time (in seconds) during which the object will be perceived by a human operator. For instance, the duration of an image is the time during which the image is displayed on the screen. A duration is either a free (i.e. unlimited) value or a bound (i.e. limited) value. By default it is free if the corresponding item is static (e.g. an integer, a text, an image) and bound in the case of dynamic or ephemeral data such as audio or video. For instance, this means that the duration associated with an image is unlimited by default. Once an image is displayed, it is the user's responsibility to erase it. Of course, it is also possible to allocate a fixed amount of time (e.g. 5 minutes) during which the image is displayed and then automatically erased. We also associate a delay with each object. For any object x, delay(x) is the time (e.g. in seconds) before observing x. In other words, it is the waiting time before presenting (i.e. playing) the object at the interface level. For instance, wait ten seconds before playing an audio comment. Here again we have either bound or free delays.

The Temporal Shadow allows an object to be presented in an individual way. But we also want to specify presentations including several objects. So we introduce sequential and parallel synchronization constraints. A sequential presentation is a series of objects which are mutually time exclusive. In a parallel presentation, several objects can be observed simultaneously; each object has its own timeline. Constraints can be applied on the synchronization between objects via the temporal relations {seq_meet, seq_before, par_equal, par_start, par_finish, par_overlap, par_during}. For instance, a seq_meet constraint between two objects specifies that there is no delay between the two objects.
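The free/bound convention for the Temporal Shadow can be sketched as follows. This is a minimal illustration, assuming a free duration is modeled as the absence of a value; the names are ours, not the STORM class library's.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TemporalShadow:
    delay: float = 0.0                 # seconds to wait before presenting the object
    duration: Optional[float] = None   # None = free (unlimited), number = bound

    @property
    def is_free(self) -> bool:
        # free durations are the default for static media (text, image, ...)
        return self.duration is None

# Defaults follow the paper's convention: an image is free until erased,
# an audio object is bound (here: wait 10 s, then play for 3 minutes).
image_shadow = TemporalShadow(delay=0.0)
audio_shadow = TemporalShadow(delay=10.0, duration=180.0)
```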
[Figure 2 content: root class MyObject (name : string, author : string, subject : string, keywords : list(string)); subclasses MyVideo (content : Video), MyAudio (type : string, format : string, content : Audio), MyImage (type : string, format : string, content : Image), MyText (content : Text) and MyShow (content : Show).]

Figure 2: Hierarchy of database object classes.

[Figure 3 content: root class SO (delay : Duration); subclasses Static (duration : Duration) and Dynamic; SO_Image and SO_Text inherit from Static, SO_Audio and SO_Video from Dynamic; SO_Image (NE : Point, SW : Point, content : Image) and SO_Video (NE : Point, SW : Point, content : Video); Show (content : list(SO), constraint : string).]
Figure 3: Hierarchy of O2 presentation classes.

Each presentation is a STORM object, which is a quadruple (i, δ, c, d) where i is an object identifier. The duration d and the delay δ constitute the Temporal Shadow. The data to be presented is the content c, which can be either monomedia (or atomic) or composed of several objects using sequential and parallel operators. For spatial properties, we add NE and SW attributes to the monomedia objects which will be displayed on the screen; they are the spatial coordinates for the object display. Figure 3 sketches our STORM class library. The root is SO, for STORM Object; it owns one attribute, delay, which each STORM object inherits (cf. paragraph 3.1). The class SO also has several methods, such as the method "start_presentation", which can play each presentation. Classes SO_C, where C is either Text, Image, Video, Audio or one of the user-defined classes (cf. section 3.1.2), are used to associate one or several presentations with a database object. This approach establishes a clear distinction between database objects and their presentations. The use of oids allows several presentations to share the same object with different Temporal Shadows. The class Show allows a presentation to be composed of objects of the same or of different natures. The content attribute corresponds to the list of SO objects which compose the presentation, and the constraint attribute represents their synchronization. In the Show class we find methods like "synchro" and "components", which are used for querying purposes. The "synchro" method returns a string which represents the type of synchronization between two objects of the presentation: {meet, before} and {equal, start, finish, overlap, during}. For instance, if o1 and o2 are objects of the Show s, s->synchro(o1, o2) returns "meet" if the synchronization between o1 and o2 is sequential with a meet constraint. The "components" method returns the list of SO objects which belong to the show. In the following section, we show how a complex presentation can be constructed.
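The querying interface of Show can be sketched as below. This is a hedged model, assuming the pairwise constraints are stored in a lookup table keyed by object names; STORM's actual internals are not described at this level of detail in the paper.

```python
class SO:
    """A STORM object: something that can appear in a presentation."""
    def __init__(self, name, delay=0.0):
        self.name, self.delay = name, delay

class Show(SO):
    """A composite presentation with 'components' and 'synchro' query methods."""
    def __init__(self, name, content, constraints):
        super().__init__(name)
        self.content = content            # list of SO objects in the show
        self._constraints = constraints   # {(name1, name2): relation string}

    def components(self):
        # returns the list of SO objects which belong to the show
        return list(self.content)

    def synchro(self, o1, o2):
        # returns "meet", "before", "equal", ... or None if no constraint holds
        return self._constraints.get((o1.name, o2.name))

i1, v1 = SO("I1"), SO("V1")
s = Show("S1", [i1, v1], {("I1", "V1"): "meet"})
# s.synchro(i1, v1) returns "meet": I1 is immediately followed by V1
```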
3.3 Constructing a multimedia presentation
The creation of a presentation is done in several steps: (1) create or retrieve the database objects which are part of the presentation, (2) create the corresponding STORM objects (of the SO class) with temporal and spatial aspects, (3) establish parallel or sequential synchronization. Then we can play the presentation and make it persistent.

[Figure 4 content: SO objects S, S1, S2, A1, T1, I1, V1.]

Figure 4: SO objects of the parallel presentation.

For instance, suppose we want to create a multimedia presentation, composed of several database objects, which presents the "Meribel" resort. We use four multimedia objects for this presentation: a piece of music, a photo of the resort, a text which presents Meribel and a video of a skier. The duration of the music corresponds to the duration of the presentation. The image and the video are presented in sequence and the text is displayed in parallel. The construction of this presentation requires several steps:

1. First, we retrieve the database objects which compose our presentation. For instance, we search for images whose subject is the "Meribel" resort. Then we choose one of them, which will be part of the presentation. In the same way we retrieve the other three objects (MyText, MyAudio and MyVideo) which will be part of the "Meribel" presentation we want to create.

2. For each selected object, we create the corresponding STORM object, whose content attribute refers to the oid of the object to be presented. Each created STORM object owns a Temporal Shadow and spatial coordinates (see figure 4: A1, V1, T1, I1).

3. Then we define the synchronization between the STORM objects. The chosen synchronization is:

par_during(A1, par_equal(T1, seq_before(I1, V1))).

4. Finally, we can retrieve this presentation to play or update it, using the OQL language (see Section 4). To play the presentation, the user simply executes the "start_presentation" method (see section 3.1.2).
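Step 3 above can be sketched as nested composition: each synchronization operator builds a Show from two children. The operator functions and the naming scheme below are illustrative assumptions, not STORM's actual API.

```python
class SO:
    def __init__(self, name):
        self.name = name

class Show(SO):
    def __init__(self, name, children, constraint):
        super().__init__(name)
        self.children = children       # the two composed SO objects
        self.constraint = constraint   # e.g. "par_during", "seq_before"

def seq_before(a, b):   # a, then b, with a possible delay in between
    return Show(f"seq({a.name},{b.name})", [a, b], "seq_before")

def par_equal(a, b):    # a and b start and finish together
    return Show(f"par({a.name},{b.name})", [a, b], "par_equal")

def par_during(a, b):   # b is observed during a's timeline
    return Show(f"par({a.name},{b.name})", [a, b], "par_during")

# The "Meribel" synchronization: music A1 spans the whole show, text T1 runs
# in parallel with the image I1 followed by the video V1.
A1, T1, I1, V1 = SO("A1"), SO("T1"), SO("I1"), SO("V1")
meribel = par_during(A1, par_equal(T1, seq_before(I1, V1)))
```

The resulting object tree mirrors the parenthesized expression: the root is a par_during show whose second child nests T1 over the (I1 ; V1) sequence.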
4 Query multimedia objects
In this section we show how we address extensions of OQL in order to offer a general framework for querying multimedia data. First, we use descriptive attributes or temporal aspects of the database objects to query presentations. Second, we are interested in the synchronization between them. The following persistent roots are used for the query expressions: Shows: set(Show) and SO_MyAudios: set(SO_MyAudio).

Use of descriptive attributes: STORM objects can refer to a database object which has several attributes: name, subject, keywords, etc. A keyword list (keywords) is associated with each MyObject object, which describes a monomedia object.

Q1: select all presentations about "Sport":

select s
from s in Shows, so in s->components
where "Sport" in so->content->keywords
We can also query SO objects on their temporal aspects, delay and duration.

Q2: select the titles of pieces of music which last less than 10 minutes (600 seconds) and which were composed by Beethoven:

select distinct sa->content->name
from sa in SO_MyAudios
where sa->duration < 600 and sa->content->author = "Beethoven"
Querying on synchronization between objects: We extended OQL to query presentations on the synchronization between the objects which compose them. For that we use the methods available in the Show class (see section 3.2) and the title method, which returns a string corresponding to the class name of the object which receives this message.

Q3: select each picture of Grenoble which is immediately followed by a picture of Paris:

select pg
from s in Shows, pg in s->components, pp in s->components
where pg->content->subject = "Grenoble" and pp->content->subject = "Paris"
and pg->title = "SO_MyImage" and pp->title = "SO_MyImage"
and s->synchro(pg, pp) = "meet"
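The semantics of Q3 can be mimicked over an in-memory model. The classes and attribute names below are illustrative stand-ins for the STORM schema, not the system's implementation.

```python
class SOImage:
    """Stand-in for an SO_MyImage presentation object."""
    def __init__(self, name, subject):
        self.name, self.subject = name, subject
        self.title = "SO_MyImage"   # mimics the OQL 'title' method

class Show:
    def __init__(self, components, constraints):
        self._components = components          # list of SO objects
        self._constraints = constraints        # {(name1, name2): relation}
    def components(self):
        return self._components
    def synchro(self, a, b):
        return self._constraints.get((a.name, b.name))

pg = SOImage("pg", "Grenoble")
pp = SOImage("pp", "Paris")
shows = [Show([pg, pp], {("pg", "pp"): "meet"})]

# Q3: each picture of Grenoble immediately followed by a picture of Paris
result = [g for s in shows
            for g in s.components() for p in s.components()
            if g.subject == "Grenoble" and p.subject == "Paris"
            and g.title == "SO_MyImage" and p.title == "SO_MyImage"
            and s.synchro(g, p) == "meet"]
```

The comprehension plays the role of the select/from/where clauses: it iterates over the show's components twice, exactly as the two component variables pg and pp do in Q3.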
5 Conclusion
In this paper, we presented our current work on Multimedia Object-Oriented Databases. We aim to show that the object-oriented approach is suitable for extending a DBMS with multimedia facilities. We focus in this paper on some basic and essential features of multimedia systems, such as the general schema and query specification. We also present different types of queries using the almost standardized language OQL. Obviously, such a language must be extended to take advantage of characteristics inherent to multimedia data, like the visual and spatial characteristics of images. But those extensions must be made by integrating new paradigms with standard query languages. The major contribution of this paper is to show how database concepts such as integrity constraints and logical models can be used to handle multimedia data.
References
[1] Cattell, R. (ed.). Object Databases: The ODMG-93 Standard. Morgan Kaufmann, 1993.
[2] Cutler, R. and Candan, K. S. Multimedia Authoring Systems. In Subrahmanian, V. S. and Jajodia, S. (eds.), Multimedia Database Systems, pp. 279-296. Springer, 1996.
[3] Ghafoor, A. Multimedia Database Management Systems. ACM Computing Surveys, vol. 27, no. 4, December 1995, pp. 593-598.
[4] Aberer, K. and Klas, W. Supporting Temporal Multimedia Operations in an Object-Oriented Database System. In International Conference on Multimedia Computing and Systems, Boston, May 1994.
[5] Mocellin, F., Martin, H. and Adiba, M. STORM: a Structural and Temporal Object-oRiented Multimedia database system. Demonstration and poster, EDBT'96, Avignon, March 1996.
[6] Nwosu, K., Thuraisingham, B. and Berra, B. Multimedia Database Systems: Design and Implementation Strategies. Kluwer Academic Publishers, May 1996.
[7] Weiss, R., Duda, A. and Gifford, D. K. Composition and Search with a Video Algebra. IEEE Multimedia, Spring 1995, pp. 12-25.