Audio/Video Databases: An Object-Oriented Approach

Simon Gibbs, Christian Breiteneder and Dennis Tsichritzis

University of Geneva

Abstract

The notion of an audio/video, or AV, database is introduced. An AV database is a collection of AV values (digital audio and video data) and AV activities (interconnectable components used to process AV values). Two abstraction mechanisms, temporal composition and flow composition, allow the aggregation of AV values and AV activities respectively. An object-oriented framework, incorporating an AV data model and prescribing AV database/application interaction, is described.

1: Introduction

Storing and retrieving multimedia data has been investigated in the past (e.g., [4][6][7][12][23][24]), often with an emphasis on text and images. Audio and video have received somewhat less attention, perhaps because of their extremely high storage requirements (for example, one second of high quality digital video can occupy tens of Mbytes). However a number of ongoing developments (these will be described below) are greatly increasing the viability of digital audio and video. Current products such as CD-I [16], DVI [10][17] and QuickTime [2] now bring these media to the personal computer user. The results are growing collections of digital audio and video material and a growing need for systems which manage this material by providing the functionality generally associated with database systems.

In this paper we introduce audio/video, or AV, databases. We identify useful abstractions for AV data and address their implications on database design. The main contribution of the paper is a framework which serves both as a guide in constructing AV databases and applications, and as an AV data model. Central to this framework is the notion of AV activity - interconnectable components existing within the database and within applications.

AV data consists of temporally correlated digital audio and digital video material. Digital audio is basically a sequence of digitized samples, while digital video is a sequence of raster frames. However, in either case the sequences may not be stored directly in an AV database. Instead an AV database may store a compressed representation (e.g., JPEG [22] or MPEG [13] encoded video) or an alternate representation from which the audio or video sequences are produced (examples would be synthesizing digital audio from MIDI¹ data or rendering video frames from animation data).

Because of the temporal nature of AV values, their production and consumption often require specialized devices capable of the real-time handling of streams of data. Until recently this equipment was expensive and not readily available. However a number of significant advances are now taking place that are greatly increasing the use of AV data. These developments include advances in high-bandwidth networks and protocols facilitating real-time transfer of digital audio and video (e.g., broadband ISDN and ATM); improvements in storage media such as high-capacity magnetic disks and writable CDs; faster rendering rates for graphics hardware allowing real-time animation; greater availability of special-purpose audio and video processors on workstations; and better computer interfaces (e.g., provision of serial-line interfaces) to both commercial and professional video products such as camcorders, VCRs, and video mixers. Another significant development is real-time compression and decompression hardware for digital video. Compressed video has data rates comparable to bus and disk bandwidths and so opens the possibility of video recording and playback from conventional secondary storage devices. In addition, an anticipated future development having broad impact on the use of video will be the emergence of standards for HDTV.

With the confluence of the above activities in networking, compression and storage technology, the need for AV databases is becoming apparent. Since AV databases are closely related to those for multimedia we begin, in section 2, by giving some background on existing multimedia databases and point out some of their limitations. Section 3 proposes a definition for AV database systems and describes how they differ from traditional database systems. The main contribution of this paper lies in section 4, where we present an object-oriented framework for AV databases and AV applications.

1. MIDI, or Musical Instrument Digital Interface, is a standard for communicating with musical devices.

2: Multimedia databases

Systems for storing and retrieving "multimedia information" have been designed and developed for different application areas in the last decade. Most of the systems we will describe are prototypes tailored for specific application areas and so provide domain-specific functionalities and features; however, one can also identify certain features essential for multimedia databases in general.

Multimedia Document Systems

One of the earliest uses of multimedia information systems comes from office automation. The goal here is to provide flexible retrieval from large repositories of "multimedia documents" - text documents enhanced by raster images, graphics and audio annotations. Two examples are MINOS [7] and MULTOS [4]. Both systems offer the following features: modelling of complex object structure, content-based retrieval, and differentiation of internal (logical) and external (layout) representations. Whereas content-based retrieval is possible for text components, it is far more difficult for images and audio components. In the case of MINOS, content-based retrieval for images is limited to conditions on their existence and their type. MULTOS differentiates between active and passive components; existence and content conditions are allowed on active components, while only existence conditions are allowed for passive components. Active components contain attributes and text values, passive components contain image, graphics and audio values.

Other multimedia document systems put less emphasis on content-based retrieval. For instance, DIAMOND [20] uses simple file system-like folders for retrieval and stresses network communication - transmission of documents has to be handled flexibly; e.g., a node which cannot handle images receives only documents where the images have been replaced by a notification message. Still other systems focus on tools for authoring (creation of multimedia documents) and tools for interlinking media components (hypermedia).

Image Database Management Systems

Image databases stress the importance of image processing. The extent to which image processing should be provided by multimedia databases in general is an open issue; however, restricted content-based retrieval for images, e.g., by some form of similarity measure, is a possibility. For example, REDI [6] (Relational Database System for Images) allows simple content retrieval (based on similarity) for LANDSAT images. It combines systems for image processing, image recognition, and database management. Image structures and features are extracted from images and stored in a relational database, while the original images are kept in a different image store. The query interface (Query-by-Pictorial-Example) first tries to answer a query using the extracted information to avoid retrieval and processing of the originals.

Multimedia Database Systems

Multimedia database systems replicate to some extent the features of multimedia document systems, e.g., modelling of complex object structure, content-based retrieval, and differentiation of internal and external representation. The difference is that these systems also stress typical database systems issues such as concurrency control, access control, recovery, versioning, and management of storage devices. An example of a multimedia database system is ORION's Multimedia Information Manager (MIM) [23][24]. One interesting aspect of the MIM is that all devices (storage devices as well as I/O devices) are represented as objects. For I/O devices these objects further specify how information is presented and captured. MIM supports images, voice and analog video. The problem of version control has also been investigated.

The features of the various systems mentioned above and the requirements they present for multimedia database systems can be summarized as follows:
- Most of the work done up to now favors the object-oriented approach and suggests the use of an OODBMS.
- With minor divergences, the constructs which should be supported by the multimedia data model (e.g., [12][23]) are widely accepted.
- As storage requirements of multimedia data are large, techniques to minimize storage space on the physical level (e.g., compression techniques) and the logical level (e.g., data sharing through aggregation) are considered important.
- Querying and browsing of multimedia information as well as control of data capture and presentation must be provided.
- Additionally, multimedia database management systems provide traditional database functions such as support for multiple views, content-based retrieval (which is problematic for image and audio, but at least discussed in several lists of requirements), concurrency control, transaction management, security (an issue discussed in database research, but never really addressed in multimedia database systems), backup and recovery. Finally, version control is also considered important.

The purpose of this brief summary is to help understand whether current multimedia database systems adequately support audio and video. Although there is some support for audio annotation, it appears that multimedia database systems, as described above, would have difficulties with AV applications. These applications deal with sequences of data that must be retrieved, processed and presented subject to real-time constraints. The sequences typically require synchronization (e.g., an audio track to video) and may require sophisticated processing by the database (e.g., decompression or synthesis). Yet support for temporal sequences, their synchronization and processing, are not part of current multimedia database design.


3: Audio/video databases

The notion of multimedia databases that emerged in the previous section pays little attention to audio or video media. AV databases require a different perspective, one more attuned to the temporal nature of these media. We consider the crucial difference to be related to activities. The correct view of an AV database is not as a simple repository of data; instead an AV database should be considered as a locus of AV activities, where an activity may involve processing of AV data or the exchange of AV data between the database and an application.

3.1: Definitions

The above view of AV databases is elaborated in the following definitions:

Definition: An AV value, v, is a (finite) sequence, vi, of digital audio or digital video data elements.

AV values are examples of so-called "dynamic," "continuous" or "time-based" media. We use the term "AV" to emphasize that audio and video are our primary concern. However, we consider databases of the form described in this paper to be required for time-based media in general.

Definition: Each AV value has a media data type governing the encoding and interpretation of its elements. The type of v (and v itself) determine r, the data rate of v.

Examples of media data types include CD encoded audio (data elements are pairs of 16 bit audio samples occurring at a rate of 44.1 kHz) and CCIR 601¹ digital video (data elements are frames of 8 bit video samples, the sampling rate is 13.5 MHz); here the data rates are the same for all values of these types. Other examples are the compressed digital video formats (e.g., MPEG, DVI) which allow a range of video resolutions and so span a range of data rates.

Definition: An AV activity is the production and/or consumption of AV values at their associated data rates.

AV activities can be classified as sources (producers), sinks (consumers) or transformers (processors) of AV values.

Definition: An AV database system is a software/hardware entity managing a collection of AV values and AV activities.

The system is capable of storing a large number of AV values and controls their concurrent access. The AV values are organized by structures and abstractions specified using a data model. Clients (applications) issue requests to the database. Certain requests, such as queries, may return references (i.e., names or identifiers) to AV values rather than the values themselves. Other requests cause AV values to be produced, consumed and processed. These requests involve AV activities, which may exist within the client or within the database system.

AV database systems should provide the functionality found in traditional database systems, i.e., query processing, concurrency control, recovery mechanisms, etc. In order to implement such functionality for AV data, two problems appear fundamental. These are first AV data modelling, or how to structure and organize AV data, and second, the nature of the interface provided to clients of an AV database. Exploring these two problems is the aim of this paper.
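To make the notion of a data rate concrete, the following sketch (our own C++ illustration; the struct and its members are invented, not part of the framework) derives the raw data rates implied by the two media data types just mentioned.

    #include <cstdio>

    // A media data type determines the data rate r of its values:
    // elements per second times the size of one element.
    struct MediaDataType {
        double elementsPerSecond;  // sampling or frame rate
        double bytesPerElement;    // size of one data element
        double dataRate() const { return elementsPerSecond * bytesPerElement; }
    };

    int main() {
        // CD encoded audio: pairs of 16-bit samples at 44.1 kHz.
        MediaDataType cdAudio{44100.0, 4.0};
        // CCIR 601 digital video, luminance samples only: 8-bit samples
        // at 13.5 MHz (the chrominance components add to this figure).
        MediaDataType ccir601Luma{13.5e6, 1.0};
        std::printf("CD audio:        %.0f bytes/s\n", cdAudio.dataRate());
        std::printf("CCIR 601 (luma): %.0f bytes/s\n", ccir601Luma.dataRate());
    }

For CD audio this gives 176,400 bytes per second; uncompressed CCIR 601 video is roughly two orders of magnitude higher, which is why the compressed formats mentioned above matter in practice.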

3.2: Scenarios

How could AV databases be used in practice? We give two scenarios:

Scenario I: The Corporate AV Database. Consider a large software producer with an internal multimedia network. The company uses video for a number of purposes. A professional in-house production group prepares product announcements and other promotional videos. Important project presentations, video conferences, and demos are also recorded and edited. Various public broadcasts are captured and archived. The entire video collection is managed by an AV database system. The video material is accessible through a hypermedia interface which links, for example, the documents describing a project to the video of a presentation by the project leader. Users modify the database, either through the hypermedia interface or other specialized applications such as workstation-based video editors.

Scenario II: The Virtual World AV Database. An AV database supporting "virtual worlds" is provided as a network service. The contents of the database include 3D scenes, high-resolution raster images, "surface scan" data [3], video and audio clips, and graphic objects. Users interactively move through the virtual world by querying the database. As the user changes position, a new visualization of the world is rendered at the database site, resulting in a sequence of images (an AV value) being sent to the user.

3.3: Characteristics of AV databases

The proposed view of AV databases as "loci" of AV producers, consumers and transformers has a number of implications concerning their design, functionality and client interface. Here we identify five general characteristics differentiating AV database systems from other forms of database systems:

database platform - should allow specialized hardware

The production, consumption and processing of AV values (examples of processing include format conversion, filtering, compression and rendering) often requires special hardware such as analog-to-digital and digital-to-analog converters, digital signal processors, and graphics pipelines. It should be possible to place some of these hardware elements under control of the database system. The reasons are: First, certain devices are very expensive (e.g., digital video effects processors [19]) and it is more cost-effective if they can be shared by different clients. Second, placing the device with the data can drastically reduce network traffic. This is extremely important given the high data rates associated with AV media.

1. A digital video standard developed by an international association of broadcasters.


scheduling - should allow application involvement

Traditional databases perform various scheduling activities. For instance, read and write requests are scheduled by the concurrency control sub-system and disk accesses are scheduled by the storage sub-system. However, these scheduling activities are hidden from the client. With AV databases, certain forms of scheduling should be under client control. The need for this derives from three observations: First, AV values often have long durations (i.e., of minutes or hours). Second, it may not be possible to allow concurrent use of special-purpose hardware. Third, system resources (buffers, processor cycles, bus bandwidth, network bandwidth) are limited. The result is that client requests can tie up resources, or the database itself, for significant periods of time. Consequently, concurrent access to AV data may require explicit scheduling (in particular, resource pre-allocation) by clients.

The above is a rather pragmatic reason for client involvement with scheduling; there is also a more fundamental reason. We have mentioned that AV sequences are temporally correlated. The database system is responsible for coordinating the presentation of these values, in particular starting and stopping various sources and maintaining synchronization of activated sources (because of unpredictable system latencies, AV values tend to jitter and require regular resynchronization). However, in order for the database system to perform this feat of coordination, it requires a specification of the temporal constraints applicable to the AV value. These constraints are determined by clients and they basically involve scheduling, i.e., indicating when particular values start and stop.

client interface - should be asynchronous, stream-based

With conventional database systems, the interaction between the client and the system resembles a call-by-value procedural interface. The client issues a request and then receives a reply. The requests and replies contain all the data exchanged by the client and the database system. With AV data this form of interaction is not suitable. For example, consider a client which requests an audio value. In many cases the client will not be interested in handling the reply stream sample-by-sample, but rather simply directing the stream to some suitable sink such as a digital-to-analog converter. (Of course, at some level, there would be software on the client side which transfers samples to the converter; the point is that the inner details of this activity should not be expressed in the database/client interface.) As another example, consider timing. Here again the issue-request / receive-reply mode of interaction is inadequate. Certain AV values require significant lengths of time for their transfer¹. The client does not want to "block" during such transfers. Rather it needs to initiate the transfer and then proceed to other tasks, perhaps being informed when the transfer is complete. Thus the client interface should be based on notions of multiple tasks, stream redirection, and asynchronous notification rather than on a simple issue-request / receive-reply protocol (see the sketch after this list).

data placement - should allow application involvement

Assuring physical data independence can severely diminish the usability of an AV database system. To give an example, consider an application which combines two (or more) video values. Such "video mixing" is commonly used during video editing. Depending upon the characteristics of the storage devices in use, it may simply not be possible for the database to simultaneously produce the two video values unless they reside on different devices. Thus to preserve physical data independence the database system would, when the two values are initially on the same device, need to copy one video value to a temporary area on a second device. This could be so time-consuming as to destroy any sense of interactivity (which is the main advantage of "non-linear" digital video editing as opposed to video tape editing). The alternative then is to make visible to the client some aspect of the physical storage structure so that the two values can be assured to be available simultaneously.

data representation - applications should deal with "quality factors"

There are many alternatives for encoding and compressing digital video and digital audio. It is also possible that AV data may be stored in analog form and digitized only when needed. (An analog videodisc jukebox provides a video storage capacity difficult to achieve using magnetic disks. Both write-once and rewritable videodiscs are now available, and form an attractive and practical alternative to digital technology.) Application programs need not be aware of the representations used by the database system. Instead applications should specify data representation indirectly, in terms of AV "quality factors." These factors include, for example, video resolution and audio fidelity.

1. In some cases, by exchanging compressed AV data, transfer durations can be reduced to those experienced with traditional database systems. This is not possible in general since: first, there may be inadequate resources (storage, bus or network bandwidths) for the compressed data, and secondly, the data may involve a "live" source in which case it is impossible to compress the entire value prior to exchange. (Examples of live sources include video cameras, microphones, and values that are changing due to interaction with the client - for instance a video sequence depicting an interactive walkthrough of a 3D model.)
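The following sketch (our own C++ illustration; Stream, onComplete and the rest are invented names, not an API from the paper) shows the shape of such an asynchronous, stream-based interface: the client initiates a transfer, keeps working, and is notified on completion.

    #include <chrono>
    #include <cstdio>
    #include <functional>
    #include <future>
    #include <thread>

    // Hypothetical asynchronous transfer: start() returns immediately and
    // completion is signalled through a callback, instead of the client
    // blocking in an issue-request / receive-reply cycle.
    struct Stream {
        std::function<void()> onComplete;  // asynchronous notification
        std::future<void> worker;

        void start() {
            // A sleep stands in for streaming samples to a sink
            // (e.g., a digital-to-analog converter).
            worker = std::async(std::launch::async, [this] {
                std::this_thread::sleep_for(std::chrono::milliseconds(100));
                if (onComplete) onComplete();
            });
        }
    };

    int main() {
        Stream audio;
        audio.onComplete = [] { std::printf("transfer complete\n"); };
        audio.start();                       // does not block
        std::printf("client proceeds to other tasks\n");
        audio.worker.wait();                 // demo only: wait before exiting
    }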

4: An AV database framework

The purpose of the preceding section has been to illustrate some of the unique characteristics of database systems that support AV media. In this section we present a framework for AV databases and AV applications. Here the term "framework" is used in the rather specific sense found in object-oriented programming [8][11]. In particular, a framework is a collection of interdependent classes - a framework can be thought of as embodying a generalized design, or conceptual model, for some particular application area or system type. Typically the framework will consist of a number of abstract classes, classes providing general behavior and not intended for direct use in applications. Instead, the framework classes are specialized to form concrete classes - which can be instantiated and manipulated by the application programmer. To express database system functionality through the use of a framework, the framework must address aspects of data modelling, database design and application design.

4.1: AV data model

The AV database framework is an extension to an object-oriented framework for multimedia programming that has been described elsewhere [9]. The earlier framework introduced two abstract classes for developing multimedia applications. These two classes, called MediaValue and MediaActivity, are the starting point from which media-dependent classes are specialized (the names have been altered slightly from the original description; frameworks evolve and this naming reflects the current version). A partial specification for MediaValue is:

class MediaValue {
    WorldTime  duration
    WorldTime  start
    ObjectTime WorldToObject(WorldTime)
    WorldTime  ObjectToWorld(ObjectTime)
    Scale(float)
    Translate(WorldTime)
    MediaValue Element(WorldTime)
}

In the above and following examples we will use a simple syntax for class definitions. Each class contains attributes and methods. Attributes have a name and a class to which their values belong. For example, duration is an attribute name whose values are objects from the class WorldTime. Methods have a name and a signature identifying classes of the return value (if present) and parameters. In the MediaValue class, the method WorldToObject takes an instance of WorldTime as a parameter and returns an instance of ObjectTime.

The MediaValue class makes use of two temporal coordinate systems, world time and object time, and provides the basic behaviors (methods) for handling temporal sequences. The units of world time are specified by the MediaValue class, while the units of object time are a subclass responsibility. As an example, a subclass dealing with video could measure object time using video "timecode" (where the smallest unit is 1/30th of a second). The methods of MediaValue include transforming between temporal coordinates and scaling or translating temporal sequences. The implementation of these methods is a subclass responsibility.

The specializations of MediaValue correspond to the media data types introduced in definition 3.2. Applications working with AV media require two specializations; these classes would resemble:

class VideoValue subclass-of MediaValue {
    int width
    int height
    int depth
    int numFrame
    ImageValue frame[numFrame]
}

class AudioValue subclass-of MediaValue {
    int numChannel
    int depth
    int numSample
    int sample[numChannel][numSample]
}

Each of these classes would in turn have a number of specializations reflecting different encoding and storage strategies for AV data. Possible specializations of VideoValue include JPEG-VideoValue, MPEG-VideoValue, DVI-VideoValue, CCIR-VideoValue and LV-VideoValue (for values stored on LaserVision videodiscs). The class to which a value belongs must be specified, or deduced, when the value is created. However an application working with existing AV values can use the generic VideoValue class and thus be screened from underlying differences in representation.

Generally applications should avoid explicit references to particular AV data representations. Instead AV storage and presentation requirements should be specified via quality factors. A video quality factor is an expression of the form:

    w x h x d @ r

indicating a video resolution of width w and height h pixels, a depth of d bits per pixel and a rate of r frames per second. An audio quality factor is a description such as voice-quality, FM-quality, or CD-quality. Other formats for expressing quality factors are possible; the selection of a format is not a critical issue. What is important is that an AV database system, given a quality factor, be capable of determining a data representation (if more than one possibility exists), the appropriate encoding parameters, and storage and processing requirements.

The ability of AV applications to specify presentation quality may appear infeasible since quality is determined when a value is captured, not when it is retrieved. There is the possibility, though, of changing quality. One example is the notion of scalable video [14], a proposal for increasing the device-independence of digital video. Using a scalable representation, a video value encoded at one quality can be viewed at a lower quality by ignoring some of the encoded data. It is also possible to view a value at "higher" quality than used for encoding by scaling and interpolation of the decoded data (however this does not add information).
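To illustrate how an application-supplied quality factor might be interpreted, here is a small sketch (our own, in C++; the framework does not prescribe a parsing API, and all names are invented) that parses a w x h x d @ r expression and derives the raw data rate the database would have to sustain for it.

    #include <cstdio>

    // Invented representation of a video quality factor "w x h x d @ r".
    struct VideoQuality {
        int width, height, depth, rate;
        // Raw (uncompressed) data rate implied by this quality factor.
        double bytesPerSecond() const {
            return double(width) * height * depth / 8.0 * rate;
        }
    };

    // Parse an expression such as "640 x 480 x 8 @ 30".
    bool parseQuality(const char* s, VideoQuality& q) {
        return std::sscanf(s, "%d x %d x %d @ %d",
                           &q.width, &q.height, &q.depth, &q.rate) == 4;
    }

    int main() {
        VideoQuality q;
        if (parseQuality("640 x 480 x 8 @ 30", q))
            std::printf("%.1f Mbytes/s uncompressed\n",
                        q.bytesPerSecond() / 1e6);  // about 9.2 Mbytes/s
    }

Given such a rate, the database can check the request against device and network capacity before committing resources (cf. the connection setup in section 4.3).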

We now give an example of a class that could appear within an AV database. The class records news broadcasts and contains a number of descriptive attributes (title, etc.) and a single video-valued attribute for which a quality factor has been provided. (Quality factors are optional in class definitions. If absent, stored values can be of varying quality.)

class SimpleNewscast {
    String     title
    String     broadcastSource
    String     keywords
    Date       whenBroadcast
    VideoValue videoTrack quality 640 x 480 x 8 @ 30
}

The above class is not very practical since it contains video but no audio information; a more realistic example would contain an accompanying audio track. However the audio track should not be specified as just another attribute of SimpleNewscast. The audio track and video track are temporally correlated; this correlation is specified using temporal composition. In general, temporal composition is necessary when a number of media values are simultaneously presented. Television and film are two obvious examples, each containing both audible and visual components. Within a class definition, temporally correlated attributes are grouped using a "tcomp" construct. As an example, a more realistic Newscast class, with bilingual audio and text subtitles, would be:

class Newscast {
    ...
    tcomp clip {
        VideoValue      videoTrack
        AudioValue      englishTrack
        AudioValue      frenchTrack
        TextStreamValue subtitleTrack
    }
}

Here the clip attribute of Newscast is a temporal composite containing four attributes (also called tracks or components). Correlations between the components are specified, on a per-instance basis, by a timeline diagram. Such diagrams depict the relative timing (start time and duration) of each component. For example, the timeline in Fig. 1 indicates that videoTrack starts at time t0 and ends at time t1, while the other tracks last from t1 until t2. Timing information would be specified when a Newscast instance is created. This could be done interactively (via an authoring program) or automatically (for example, by a program controlling recording of television broadcasts).

Fig. 1 Timeline diagram for a Newscast.clip value.
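A temporal composite could be represented along the lines of the following sketch (our own C++ illustration; the types and the timing constants are invented, mirroring the timeline of Fig. 1 rather than reproducing the framework's classes). Each track records its start time and duration on the composite's timeline, and the composite can report which tracks must be presented simultaneously at a given world time.

    #include <cstdio>
    #include <string>
    #include <vector>

    // One track of a temporal composite: per-instance timing information
    // of the kind a timeline diagram (Fig. 1) depicts.
    struct Track {
        std::string name;
        double start;     // world time at which the track begins
        double duration;  // how long the track lasts
    };

    struct TemporalComposite {
        std::vector<Track> tracks;
        // Tracks active at world time t must be presented in synchrony.
        std::vector<std::string> activeAt(double t) const {
            std::vector<std::string> active;
            for (const Track& tr : tracks)
                if (t >= tr.start && t < tr.start + tr.duration)
                    active.push_back(tr.name);
            return active;
        }
    };

    int main() {
        // Invented timing: videoTrack spans 0..60, the other tracks 60..90.
        TemporalComposite clip{{{"videoTrack",     0.0, 60.0},
                                {"englishTrack",  60.0, 30.0},
                                {"subtitleTrack", 60.0, 30.0}}};
        for (const std::string& n : clip.activeAt(75.0))
            std::printf("%s\n", n.c_str());
    }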

A track-like structure is a common feature among the emerging multimedia data formats. Temporal composition naturally describes this structure and so is essential to AV databases.

4.2: AV operations

AV data can be viewed as having two states: First, there is a passive state where data is stored in some format and serves as the target of modification and query operations. Second, and of more interest, AV data has an active state. In this form it is best thought of as a stream, i.e., a rate can be associated with the data and operations on the data must proceed at this rate.

Certain operations can be applied to AV data in the passive state. The traditional database operations - inserting, deleting, modifying and querying data items - fall into this category. For instance, it should be possible to take a Newscast object and modify the value of its videoTrack attribute, perhaps changing particular frames or perhaps adding or deleting frames. These operations have no timing constraints and so can be expressed in current database systems (however, this does not imply that they would be implemented efficiently). Other operations, in particular recording and playback, deal with AV data in its active state. In order to support such operations, AV database systems must manage streams of data in addition to passive data elements. This fundamental requirement has implications on many aspects of AV database system design and implementation. Discussion of these issues in full is clearly beyond the scope of the paper; instead we focus on one particular question: what abstractions should be provided to database applications so that they may effectively manipulate AV data in the active state?

Our approach is to give applications control over active AV data, that is streams, through the creation and manipulation of instances of "activity classes". These classes are specializations of an abstract class called MediaActivity, a class which defines the basic behavior provided by all activities. A partial specification for this class is:

class MediaActivity {
    PortSet  ports
    EventSet events
    Bind(MediaValue, Port)
    Cue(WorldTime)
    Start()
    Stop()
    Catch(Event, Handler)
}

Activity classes rely on a number of notions, these include:

activity creation - An activity is created by instantiating a MediaActivity subclass. (The MediaActivity class itself cannot be instantiated; it merely specifies the general interface to activities.)

activity location - Activities have a location, which can be thought of as the processor or node on which they execute. Activity locations need not be visible to applications; however, during creation sufficient information must be present to determine a location for the new activity.

activity ports - Each activity is associated with a set of Port objects through which streams enter and leave the activity.

A port has a direction, either "in" or "out", and a media data type. Activities can be classified as sources, sinks and transformers depending upon whether they have, respectively, output ports only, input ports only, or ports of both directions.

activity binding - The MediaActivity class provides a Bind method allowing a media value to be associated with a particular port. Typically this would be used to configure a source activity so that the specified value is produced from the output port of the source. For example, consider a VideoSource activity specified as:

class VideoSource subclass-of MediaActivity {
    PortSet  ports  = {video-out}
    EventSet events = {EACH-FRAME, LAST-FRAME}
    ...
}

Assuming that the data type of video-out is VideoValue (a MediaValue subclass), then a SimpleNewscast.videoTrack value could be bound to the video-out port of a VideoSource instance.

activity control - An application can direct an activity to start and stop; once started the activity will produce and consume data through its ports. It is also possible to "cue" the activity to a particular point in world time. For instance, cueing a VideoSource activity to world time "0" would position it at the first frame of its bound value.

activity event notification - As an activity proceeds it generates events which can be "caught" by applications. In the example above, the VideoSource class identifies two events, EACH-FRAME and LAST-FRAME. An application could instantiate this class, request notification on a frame-by-frame basis (using EACH-FRAME) or simply when the last frame has been reached, start the activity and then wait to be notified.

Some further examples of activities are listed in Table 1.

Table 1 Examples of video activities.

    activity          kind         input port datatype   output port datatype
    video digitizer   source       -                     raw
    video reader      source       -                     compressed
    video encoder     transformer  raw                   compressed
    video decoder     transformer  compressed            raw
    video mixer       transformer  raw x n               raw
    video tee         transformer  raw                   raw x n
    video window      sink         raw                   -
    video writer      sink         compressed            -

A few points should be mentioned (although the examples in the table deal with video, the following would also apply to audio activities): First, while these examples are more specific than MediaActivity they are still abstract in the sense that the data types of their input and output ports are not fully specified. We have informally used "raw" and "compressed" to give an idea of the function of each activity. Concrete (instantiable) classes for the activities in Table 1 would have ports with data types such as CCIR-VideoValue or JPEG-VideoValue. A second point is that many of these activities require specialized hardware or dedicated software resources. Generally, prior to starting an activity, resources must be acquired and initialized. These actions are the responsibility of activity implementors rather than application implementors. Third, it should be clear that these components can be connected in a variety of configurations. This leads to the notion of flow composition.

Flow composition refers to the forming of groups of connected activities. This may come about in two ways:
1. First, activities are connected via their "in" and "out" ports. An "in" port can be connected to an "out" port provided they are of the same data type. A group of activities connected in this fashion is called an activity graph.
2. Second, composite activities can be formed which contain component activities. It is possible to connect an "out" port of a component to the "out" of the composite in which it is contained - provided the ports are of the same data type. A similar rule applies to the connection of "in" ports.

A simple activity is one which cannot be decomposed into other activities. Composite activities occur in two situations. First, activities which process composite AV values will generally contain components for each track of the value. Such a composite would maintain the synchronization of its component activities, assuring that the streams corresponding to the different tracks remain temporally correlated. Second, certain activities are frequently used together. It may then be convenient to group these activities and let the composite activity be responsible for their connection.

Flow composition, activity graphs, simple and composite activities can be depicted using a graphical notation where nodes correspond to activities and directed arcs indicate port connections. Some examples are shown in Fig. 2.

Fig. 2 Flow composition: simple activities (top) and a composite activity (bottom).
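As a sketch of rule 1 above (our own C++ illustration; Port, Activity and connect are invented names, not the framework's API), the following checks the type-compatibility rule for "in"/"out" connections using the reader -> decoder -> window chain at the top of Fig. 2.

    #include <cstdio>
    #include <string>
    #include <vector>

    // Invented flow-composition primitives: typed, directed ports.
    enum class Dir { In, Out };

    struct Port {
        std::string dataType;  // e.g. "compressed", "raw"
        Dir dir;
    };

    struct Activity {
        std::string name;
        std::vector<Port> ports;
    };

    // An "in" port can be connected to an "out" port of the same data type.
    bool connect(const Port& out, const Port& in) {
        return out.dir == Dir::Out && in.dir == Dir::In &&
               out.dataType == in.dataType;
    }

    int main() {
        // The chain at the top of Fig. 2: reader -> decoder -> window.
        Activity reader {"video reader",  {{"compressed", Dir::Out}}};
        Activity decoder{"video decoder", {{"compressed", Dir::In},
                                           {"raw",        Dir::Out}}};
        Activity window {"video window",  {{"raw",        Dir::In}}};

        std::printf("reader->decoder: %s\n",
            connect(reader.ports[0], decoder.ports[0]) ? "ok" : "type mismatch");
        std::printf("decoder->window: %s\n",
            connect(decoder.ports[1], window.ports[0]) ? "ok" : "type mismatch");
    }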

On the top of Fig. 2, three simple activities are connected in a chain; this group of activities reads a compressed data stream from storage, decodes the stream, and displays the final result. On the bottom of Fig. 2, the read and decode activities have been grouped within the composite activity called source. This composite could then be connected to a display activity. The difference now is that an application working with a source activity need not be aware of its internal configuration.

AV database systems require both temporal composition and flow composition. While temporal composition determines when operations on AV values take place, what these operations are is left open. For instance, an application using the Newscast class could decide to display a videoTrack value on a screen, store the value in a file, or mix it with another value in the database. It is the choice of the AV activities to which the video track is bound that determines which operations take place.

The resulting view of AV database systems and their relationship to applications is summarized in Fig. 3. An AV database contains temporally-composed AV values that are processed and transferred to applications by groups of connected activities. Activities can be bound to AV values (this is indicated by curved arrows in Fig. 3 which point to the part of the AV value being processed by an activity) and started and stopped under application control. Requests by applications to create and connect activities are mediated by the database system, which maintains responsibility for controlling access to shared resources.

Fig. 3 AV database system and applications.

4.3: Application examples

We now give some more detailed examples of temporal and flow composition based on the two scenarios of section 3. We will use pseudo-code to indicate requests issued by an application to an AV database system. (The pseudo-code statements correspond to method invocations on objects belonging to the various classes we have introduced.)

First consider the "corporate AV database". Suppose the database contains the SimpleNewscast class previously specified. Consider an application which queries this class and plays back the videoTrack attribute. In order to present these values, appropriate activities must be created and connected:

1 dbSource = new activity VideoSource for SimpleNewscast.videoTrack
2 appSink = new activity VideoWindow quality 320 x 240 x 8 @ 30
3 videoStream = new connection from dbSource.out to appSink.in
4 myNews = select SimpleNewscast where (title = "60 Minutes" and whenBroadcast = someDate)
5 bind myNews.videoTrack to dbSource
6 start videoStream

Statements 1 and 2 instantiate activities. The first creates a video source located with the database, the second creates a local video sink located at the application. The locations can be determined since 1) activities bound to database values must be located with the database, and 2) VideoWindow activities would be located at applications. In requesting a video source the application is allocating resources within the database system. If insufficient resources were available this statement would fail. The activity dbSource may be a composite; furthermore, its components may depend on the particular value to which it is bound (i.e., if SimpleNewscast.videoTrack values use various underlying representations, such as JPEG-VideoValue and MPEG-VideoValue, then dynamic configuration of dbSource is necessary). Statement 3 connects the output port of the source to the input port of the sink; the quality factor associated with the sink aids in resource allocation. This statement would fail if insufficient network bandwidth were available or if the source could not provide data at this rate. Statement 4 is a database query; the only information returned to the application is a reference (myNews) to a SimpleNewscast object in the database. The final statements bind a particular video value to the video source and then start a stream. It is at this point that video data is passed from the database to the application. The transfer and the application can then proceed in parallel. At any point the application may stop the transfer or set up handlers for events generated by the activities.

The above example does not involve temporal composition. Suppose the application must receive an audio track in addition to the video. The application could proceed as above by instantiating and connecting audio sinks and sources in addition to those used for video. However if the application requires the audio and video to be synchronized then a pair of composite activities must be used. Using the Newscast specification given above, the operations needed to present video with a simultaneous audio track (in English) would resemble:

dbSource = new activity MultiSource
install (new activity VideoSource for Newscast.clip.videoTrack) in dbSource
install (new activity AudioSource for Newscast.clip.englishTrack) in dbSource
appSink = new activity MultiSink
install (new activity VideoWindow quality 320 x 240 x 8 @ 30) in appSink
install (new activity AudioSink quality voice) in appSink
compositeStream = new connection from dbSource.out to appSink.in
myNews = select Newscast where (title = "60 Minutes" and whenBroadcast = someDate)
bind myNews.clip to dbSource
start compositeStream

The first statement creates a composite activity, in which two components are installed, used for Newscast.clip.videoTrack values and Newscast.clip.englishTrack values respectively. The next statements create a matching composite sink. The two activities are then connected; the result is a configuration resembling that shown on the top of Fig. 3. As in the previous example, these statements result in the allocation of database, network and application resources. The final statements issue a query, bind a value to the source and start a presentation activity.

Our final example, the virtual world scenario, is derived from a prototype we have developed [21]. Consider an AV database application that displays a rendered scene through which the user can navigate. Video imagery stored in the database is incorporated in the scene; for instance the video material could be projected on a wall in the virtual world. Fig. 4 shows the required activities and their connections. The essential component is render, which processes two streams - one coming from the user-driven activity, move, the other from a video source - and generates a stream of raster images. Depending upon the capabilities and resources of the database system and the client, rendering may be done by the database or locally by the client. For example, a client with 3D graphics hardware may simply request the video stream from the database and render it locally (i.e., incorporate the video material in the virtual world); this is shown on the top of Fig. 4. A client without such hardware could request that rendering occur at the database site (bottom of Fig. 4). This example can be extended in a number of ways, for instance by adding multiple clients or multiple sources of artifacts. These extensions would be handled by more complex configurations of activities.

Fig. 4 Alternative activity graphs for a virtual world application.
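The choice between the two graphs of Fig. 4 could be made along the lines of this sketch (our own C++ illustration; ClientProfile and placeRender are invented): if the client reports 3D graphics hardware, the render activity is located at the client, otherwise at the database site.

    #include <cstdio>

    // Invented placement decision for the render activity of Fig. 4.
    enum class Site { Database, Client };

    struct ClientProfile {
        bool has3DGraphicsHardware;  // can the client render locally?
    };

    // Clients able to render locally receive the video stream and
    // incorporate it into the virtual world themselves; other clients
    // receive raster images rendered at the database site.
    Site placeRender(const ClientProfile& client) {
        return client.has3DGraphicsHardware ? Site::Client : Site::Database;
    }

    int main() {
        ClientProfile workstation{true}, terminal{false};
        std::printf("workstation: render at %s\n",
            placeRender(workstation) == Site::Client ? "client" : "database");
        std::printf("terminal:    render at %s\n",
            placeRender(terminal) == Site::Client ? "client" : "database");
    }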

5: Conclusion

As new technology is developed, new forms of applications become possible. It is our contention that for database systems, recent developments related to digital audio and video are likely to have significant impact, and that the design and implementation of audio/video database systems must be undertaken.

We have presented an object-oriented framework and related abstractions for AV databases. The framework gives a new perspective for databases, one where the database must become involved with capture, presentation, scheduling and synchronization of complex objects, managing access and allocation of devices and channel bandwidths, and notifying the application of presentation-related events.

AV database systems, as described here, have yet to be implemented. However there are some steps in this direction. One of the key notions of the framework is the encapsulation of media processing activities, such as coding and decoding, filtering, and synchronization, by active objects. This is a generalization of the approach taken by ORION's MIM, where storage, capture and presentation devices are represented by objects. Our approach is also related to the work on audio and video servers [1][15][18] where the issues of connecting and synchronizing media objects are addressed.

In order to realize AV database systems, two steps appear necessary: First, the data modelling requirements for AV must be identified. We have argued that temporal composition is the basic abstraction that applies to AV data; however, many refinements of temporal composition are possible. We are exploring this issue by modelling a particular AV format in detail [5]. Second, it is necessary to extend database architectures with facilities for managing AV activities. While this is a major undertaking, it is related to, and can benefit from, the rapidly progressing area of operating system support for digital audio and video.


Acknowledgments

We would like to acknowledge the support of the Swiss National Research Foundation (Fonds National de la Recherche Scientifique) under project number 20-29053.90 and the Austrian National Science Foundation (Fonds zur Förderung der wissenschaftlichen Forschung) under contract number 50619-PHY.

References

1. Anderson, D.P. and Homsy, G. A Continuous Media I/O Server and Its Synchronization Mechanism. IEEE Computer, Vol. 24, No. 10 (Oct. 1991), pp. 51-57.
2. Apple Corp. QuickTime Developer's Guide. Preliminary version, 1991.
3. Baribeau, R., Taylor, J., Rioux, M., and Godin, G. Color and Range Sensing for Hypermedia and Interactivity in Museums. Proc. Intl. Conf. on Interactivity and Hypermedia in Museums (Pittsburgh, Oct. 1991), pp. 265-275.
4. Bertino, E., Rabitti, F., and Gibbs, S. Query Processing in a Multimedia Document System. ACM TOIS, Vol. 6, No. 1 (Jan. 1988), pp. 1-41.
5. Breiteneder, C., Gibbs, S. and Tsichritzis, D. Modelling of Audio/Video Data. Proc. 11th Intl. Conf. on the Entity Relationship Approach, Springer-Verlag, 1992, pp. 332-339.
6. Chang, N.S. and Fu, K.S. Query by Pictorial Example. IEEE Transactions on Software Engineering, Vol. SE-6, No. 6, 1980, pp. 519-524.
7. Christodoulakis, S. et al. Multimedia Document Presentation, Information Extraction, and Document Formation in MINOS: A Model and a System. ACM TOIS, Vol. 4, No. 4 (Oct. 1986), pp. 345-383.
8. Deutsch, L.P. Design Reuse and Frameworks in the Smalltalk-80 System. In Software Reusability, Vol. II (Eds. T.J. Biggerstaff and A.J. Perlis), ACM Press, 1989, pp. 57-71.
9. Gibbs, S. Composite Multimedia and Active Objects. Proc. OOPSLA '91, pp. 97-112.
10. Green, J. The Evolution of DVI System Software. Commun. of the ACM, Vol. 35, No. 1 (Jan. 1992), pp. 53-67.
11. Johnson, R., and Wirfs-Brock, B. Object-Oriented Frameworks. Proc. OOPSLA '91 Tutorial Notes.
12. Klas, W., Neuhold, E.J., and Schrefl, M. Using an Object-Oriented Approach to Model Multimedia Data. Computer Communications, Vol. 13, No. 4, 1990, pp. 204-216.
13. Le Gall, D. MPEG: A Video Compression Standard for Multimedia Applications. Commun. of the ACM, Vol. 34, No. 4 (April 1991), pp. 46-58.
14. Lippman, A. Feature Sets for Interactive Images. Commun. of the ACM, Vol. 34, No. 4 (April 1991), pp. 92-102.
15. Milazzo, P.G. Shared Video under UNIX. Proc. 1991 Summer UNIX Conf., pp. 369-383.
16. Preston, J.M. (Ed.) Compact-Disc Interactive: A Designer's Overview. Kluwer, Deventer NL, 1987.
17. Ripley, G.D. DVI - A Digital Multimedia Technology. Commun. of the ACM, Vol. 32, No. 7 (July 1989), pp. 811-822.
18. Rowe, L.A., and Smith, B. A Continuous Media Player. Third Intl. Workshop on Network and Operating System Support for Digital Audio and Video, San Diego, Nov. 1992, pp. 334-344.
19. Sony DME-9000. Product Brochure.
20. Thomas, R.H. et al. Diamond: A Multimedia Message System Built on a Distributed Architecture. IEEE Computer, Vol. 18, No. 12 (Dec. 1985), pp. 65-77.
21. Tsichritzis, D. and Gibbs, S. Virtual Museums and Virtual Realities. Proc. Intl. Conf. on Interactivity and Hypermedia in Museums (Pittsburgh, Oct. 1991), pp. 17-25.
22. Wallace, G.K. The JPEG Still Picture Compression Standard. Commun. of the ACM, Vol. 34, No. 4 (April 1991), pp. 30-44.
23. Woelk, D., Kim, W., and Luther, W. An Object-Oriented Approach to Multimedia Databases. Proc. of the ACM-SIGMOD 1986 Conf. on Management of Data (Washington, D.C., May 1986), ed. C. Zaniolo, SIGMOD Record, Vol. 15, No. 2, pp. 311-325.
24. Woelk, D. and Kim, W. Multimedia Information Management in an Object-Oriented Database System. Proc. 13th Int. Conf. on VLDB (Brighton, England, Sept. 1987), Eds. P.M. Stocker and W. Kent, Morgan Kaufmann Publishers, Los Altos, CA, 1987, pp. 319-329.
