Management of Resources to Support Continuous Display of Structured Video Objects
Martha Lucia Escobar-Molano
[email protected]
Computer Science Department, University of Southern California. November 10, 1994
Committee: Shahram Ghandeharizadeh (Chairperson), George Bekey, Melvin Breuer, Dennis McLeod, Tomlinson Holman (Outside Member)
12:00 Noon, Tuesday, November 22, 1994 Salvatori Computer Science Center, Room 222
Abstract

Video is becoming widely used in a variety of fields such as medicine, education, entertainment and science. In science and education, for instance, it is used for visualization and interactive simulation as an integral part of virtual reality. It has been predicted that virtual reality could revolutionize the world of the scientist in the same way that the spreadsheet changed the world of the financial planner [Wal93]. For example, the techniques of virtual reality were used to examine the positioning of the corrective optics to be installed in the Hubble Space Telescope prior to the actual shuttle mission to fix the problem [Han93]. With a proliferation of video data, an effective representation that facilitates the design and implementation of algorithms to support queries becomes of paramount importance. This dissertation proposes a conceptual model for a video clip. It partitions a video clip into smaller and more manageable pieces. First we divide a video clip into an Object Space and a Name Space. The object space represents the rendering aspect of video, e.g., a collection of 3D representations of characters and backgrounds associated using temporal and spatial constraints along with rendering characteristics. The name space represents the user interpretation of video, e.g., names of characters, actions such as running, walking, etc. Next, we organize the object space into three layers: 1) components, 2) spatial and temporal relationships between components, and 3) rendering features such as light sources, view point, etc. The focus of this proposal is on the object space due to the value of its content description. We first present the model and its benefits for content based querying. Subsequently, we discuss the physical design of a system that realizes the proposed model and its related research topics.
1 Introduction

Video in a variety of formats has been available since the late 1800s: in the 1870s Eadweard Muybridge created a series of motion photographs to display a horse in motion [Enc92], and Thomas Edison patented a motion picture camera in 1887 [Enc92]. The primary motivation of these inventors was to entertain the audience and to capture their imagination. Needless to say, video has enjoyed more than a century of research and development to evolve to its present format. During the 1980s, digital video started to become of interest to computer scientists. Repositories containing digital video clips started to emerge. Both the National Information Infrastructure and the Information Super Highway initiatives have added to this excitement by envisioning massive archives that contain digital video in addition to other types of information, e.g., textual, record-based data. This has resulted in a growing interest in a system that supports queries that manipulate and retrieve digital video based on its content. Such a system is useful because it enables the user to retrieve the relevant data instead of browsing through weeks (potentially years) of video data. While the present format of video clips is ideal for a human to manipulate and reason about, it is not ideal for designing algorithms that process queries. This proposal describes how video clips can be authored to contain structure in order to support content-based retrieval of information. During the past several years, numerous studies have proposed alternative techniques to support content-based retrieval of video information. From a database perspective, these approaches can be categorized using the alternative abstract view points of a database. A database can be viewed from two1 levels of abstraction [Ull88]: (1) Conceptual database: The user specifies the entities and the relationships between these entities. Moreover, the user identifies the required functionality from the final system that might store the identified information. Data definition and manipulation languages are used to describe the conceptual database in terms of a data model (i.e., relational, entity-relationship, object-oriented, etc.). (2) Physical database: The data definition and manipulation languages are realized using an implementation. An implementation includes important topics such as crash-recovery, query optimization, access methods, concurrency control and support for multiple users, techniques to maintain the integrity of the data, etc. An implementation of data definition and data manipulation languages may vary from one system designer to another depending on their employed algorithms and programming skills. The current studies can be taxonomized as bottom-up. They assume the current physical representation of video (a sequence of frames2 displayed at a pre-specified rate) and employ it to support both compression and content-based retrieval3. Consider the following as examples. Several studies have proposed alternative compression techniques for video based on this physical representation, with that of the Motion Picture Expert Group, MPEG [Gal91], developing into a standard. In computer vision, some studies have investigated techniques to interpret the content of a picture based on human visual perception [GW92].

1 There is a third level of abstraction, termed view database, that is not relevant for this study. It is a portion of the conceptual database or an abstraction of parts of the conceptual database.
2 A frame is a snapshot of a motion picture.
3 Note that in the late 1800s, the concept of frames displayed at a pre-specified rate was designed to support neither compression nor content-based retrieval. It was introduced to force the human perception to observe motion.
Several studies have attempted to parse the physical representation of video to construct structure [FAC93]. There have been efforts at the conceptual level with studies of appropriate data definition and manipulation languages for a video clip [OT93] based on the frame representation of a video clip. This dissertation is based on a top-down approach. It proposes that a video clip maintain the complex relationships that exist among its data items in order to support content-based queries. To a certain degree, this approach has started to emerge in an ad-hoc manner with the entertainment industry taking the lead in this direction. For example, in the movie "In the Line of Fire", Clint Eastwood as an actor was overlaid on a background that consisted of snapshots of Clinton's 1992 presidential campaign. Similarly, in the movie "Jurassic Park", the dinosaurs (and their complex movements) were generated using computers and later incorporated into the different scenes of the movie. This approach to authoring movies has the following advantages: 1) it is economically viable (e.g., the cost of creating a fictitious presidential campaign would have been higher than incorporating Clinton's campaign), 2) it might improve the quality (e.g., the inclusion of an actual presidential campaign added realism to the picture), and 3) it resurrects characters from the past (e.g., JFK became part of the movie "Forrest Gump" long after his assassination). There have also been efforts in other disciplines. Some studies have analyzed computer generated motion based on laws of physics [WK88, Coh92]: motions of synthetic creatures are defined based on constraints (e.g., physical structure of the creature) and objectives (e.g., jump from here to there). Finally, it is important to note that software to produce animated video is commercially available [Mac92]. From a conceptual perspective, a frame is not the appropriate level of granularity to reason about the content of a video clip. Instead, a frame should be described in terms of its logical data items. The object-oriented paradigm is the prime candidate for representation of the objects that constitute a video clip due to its rich modeling capability, i.e., complex objects, methods, etc. However, physical implementations of alphanumeric object-oriented databases require further investigation for video databases, as explained in Section 7. In addition to proposing a top-down approach to investigate databases containing structured video objects, this proposal develops a conceptual data model for video and states some research directions at the physical level. The model presented is not by any means exhaustive. It concentrates on the spatial and temporal aspects of video that are fundamental for its representation. It provides a framework to investigate issues opened by the new requirements, such as video data placement and retrieval from secondary storage. The following example illustrates many of the concepts presented in this article and serves as a motivation for the proposed model.
Example 1: Animation and virtual reality can be employed to train a person on how to drive in California. The training may consist of a theoretical section and a practical exercise. The theoretical section employs an animated video to illustrate the basic principles of driving and the traffic regulations in effect in California. The practical exercise is an interactive simulation of the trainee driving in Los Angeles. It presents a virtual environment (a view of Los Angeles from a car) to the trainee, who manipulates a steering wheel, a brake, a gas pedal and a shift stick. The trainer may construct alternative scenarios by manipulating the presentation of information in order to instruct the trainee. The construction is a real-time task for the practical exercise
and a non real-time task for the theoretical section. Both the theoretical section and the practical exercise have three sources of information: 1) A spatial database with information about the buildings, roads and signs in the city. It can be represented as a collection of 3D objects (the different buildings, signs and roads) placed in a 3D coordinate system associated with Los Angeles. 2) The moving vehicles. They can be represented by the path and timing followed by each moving vehicle: for each vehicle, a curve in Los Angeles's coordinate system with timing information and the direction of the vehicle associated with different points on the curve. 3) The rendering parameters indicating the view point, light sources, shading, etc. Notice that the state of the objects in 1) can be affected by the moving vehicles. For example, if a car hits a light pole, the system must compute the new state of the pole and the vehicle based on the speed, weight, and direction of the vehicle, etc. The sources of information 2) and 3) and the scenarios shown to the trainee are defined dynamically in the practical exercise and statically in the theoretical section. For example, in the practical exercise, if the trainee is driving south on freeway 110 and decides to take the Santa Monica freeway heading west, the video must show a view of the exit from the 110 to the Santa Monica freeway heading west. Therefore, the system must consider the buildings, roads and signals located around that exit to create dynamically (in real time) a scenario that shows the exit from the driver's perspective. If the theoretical section includes a drive through the exit from the 110 south to the Santa Monica west, then the same scenario is created, but statically. Furthermore, the path followed by the vehicle is defined dynamically for the practical exercise and statically for the theoretical section. The selective display of the animated video is important to review sections where the trainee is weak. Also, a selective replay of the practical exercise would allow the trainee to self-criticize his/her reactions. For example, the trainee might want to display the scenes where the car turns right at a red light, to either review the regulations for that circumstance or examine whether his/her reactions were appropriate. Such a query can be stated as: Retrieve the scenes where the path followed by the car forms an angle greater than θ2 and smaller than 180 − θ1 (the angles θ1 and θ2 determine when a path changes from going straight to a right turn and from a right turn to making a U-turn, respectively; the angle is measured by drawing a circle counterclockwise starting from the first section of the path; see Figure 1), and there is a red light at the location where the change of direction occurs. □

Figure 1: Paths followed by left and right turns: (a) left turn, (b) right turn.
Temporal and spatial information play a key role in the description of video content, and therefore in its interrogation. Hence, their representation is crucial to the retrieval of data. This study employs object-based constructs to describe the relationships between the objects in a presentation (e.g., a scene). It assumes the existence of a repository containing the representation of the participants (e.g., 3D representations of characters). Next, it employs spatial and temporal constructs in order to describe the relationships that exist in a presentation (e.g., motion paths and their timing). It separates the rendering parameters (e.g., the lighting and the angle of the view point) from the data items, providing an author with the ability to have alternative presentations (e.g., present a scene from alternative view points). The presentation participants, their temporal and spatial relationships and their rendering characteristics constitute the Object Space. Besides the object space, a video database contains the user's interpretation of the object space (e.g., character names, motion adjectives such as running, walking, dancing, etc.). Such an interpretation is termed the Name Space and facilitates the user's interaction with the database. For instance, it is simpler to refer to a "stop sign" instead of the geometric features of a stop sign when querying the animated video of Example 1. The name space is a subjective representation and varies from user to user, while the object space is objective. The focus of this study is on the object space. Therefore, when referring to a video and a video database, we mean its rendering description and its object space, respectively. We describe the conceptual model for a video database and its value in content description. We also describe research issues raised at the physical level of the proposed model. We start by presenting the conceptual database and the data definition of the proposed model in Sections 3 and 4. In Section 5, a taxonomy of techniques to produce/display video clips is presented. This taxonomy defines requirements to be considered in the implementation of the database at the physical level. The relevance of temporal and spatial relations in the description of content is presented in Section 6. Finally, the future work is presented in Section 7.
2 Related Work

Several studies have investigated video databases from a conceptual perspective. Oomoto and Tanaka [OT93] proposed an object-based data model for video. It follows the traditional bottom-up approach by assuming that a video consists of a sequence of frames. It focuses on the name space of a video database. Temporal representation and its manipulation have been studied before the advent of multimedia. Allen [All83] introduced an interval-based representation of time and a reasoning algorithm to maintain it. With the introduction of continuous media types such as video, researchers have studied temporal modeling in the context of databases. Gibbs et al. [GBT94] proposed the timed stream as the basic element of time-based media and introduced basic structuring mechanisms for timed streams. Little and Ghafoor [LG93] presented an interval-based representation of temporal
relationships between multimedia streams as part of a database. It introduced algorithms for forward, reverse and partial playout of the streams based on the introduced representation. Schloss and Wynblatt [SW94] presented a conceptual object-oriented database to represent multimedia data. It divides the multimedia database into layers for modularity and re-usability purposes. It considers a multimedia database as a collection of complex objects with temporal relationships and synchronization policies (i.e., strictness and resynchronization after failure). In sum, these are conceptual studies that did not consider the spatial aspects of the objects that participate in a video clip. Spatial and temporal aspects of multimedia have been studied together before. Weitzman and Wittenburg [WW94] introduced a technique to automatically generate multimedia documents based on relational grammars. Documents are considered as collections of objects with spatial and temporal constraints. The main focus of this study is on the rendering aspect of multimedia documents. It does not address issues such as data organization on secondary storage for efficient retrieval of multimedia data and re-usability. However, it provides spatial and temporal information that might be useful for databases that consist of multimedia documents.

Figure 2: Temporal and spatial relationships between objects: (a) A still picture: Objects are related using spatial constructs. (b) Rolling ball motion: Objects are related using spatial and temporal constructs.
Several studies have investigated the continuous display of video clips [BGMJ94, CL93, GCEMJ93, GDS95, GR93, GS93]. However, these studies assumed video to be a sequence of frames that are displayed at a constant rate.
3 Conceptual Database
We propose a model that views video as a collection of component objects, spatial and temporal constructs, and rendering features. The spatial and temporal constructs define where in the rendering space and when in the temporal space the component objects are displayed. The rendering
features define how the objects are displayed. A Rendering Space is a coordinate system defined by n orthogonal vectors, where n is the number of dimensions (i.e., n = 3 for 3D, n = 2 for 2D). A spatial construct specifies the placement of a component in the rendering space. To illustrate, consider Figure 2 (a). Its rendering space is 3D. One spatial construct describes the placement of the desk (one component object of the picture) in the rendering space. Another describes the placement of the workstation (a second component object). Analogously, different components are rendered within a time interval, termed the Temporal Space. A temporal construct specifies the subinterval in the temporal space during which the component object is rendered. For example, if a movie has 30 scenes of 3 minutes each, then the temporal space of the movie is [0, 90]. Moreover, there is a temporal construct specifying the subinterval within the temporal space during which each scene should be rendered. For example, a temporal construct for the first scene will specify the subinterval [0, 3]. To illustrate the use of both constructs simultaneously, consider the motion of a rolling ball in Figure 2(b). The motion is captured by a sequence of snapshots, represented by a sequence of triplets: (the object (i.e., the ball), its positioning, a subinterval). Each triplet specifies a spatial and a temporal construct. The model presented in this paper partitions the information associated with video into three layers: (i) indivisible objects (e.g., the 3D representation of the ball in Figure 2 (b)), termed Atomic Objects; (ii) Composed Objects that consist of objects constrained using temporal and spatial constructs (e.g., a triplet (the ball, a positioning, a subinterval) for each snapshot in Figure 2 (b)); (iii) the rendering features (e.g., viewpoint, light sources, etc.). We now present each layer in turn. Subsequently, we present an example that illustrates the representation of a movie using this model.
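To make the triplet notation concrete, the following is a minimal sketch in Python (not part of the proposal; names and numbers are illustrative) of the rolling-ball motion as a sequence of (component, placement, subinterval) triplets:

# Each snapshot of the rolling ball is one triplet: which atomic object appears,
# where it is placed in the rendering space, and during which subinterval.
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Triplet:
    component: str                            # identifier of the atomic object (e.g., "ball")
    placement: Tuple[float, float, float]     # position in the 3D rendering space
    subinterval: Tuple[float, float]          # [start, end] within the temporal space

# Temporal space [0, 3]: the ball advances one unit along x per second.
rolling_ball = [Triplet("ball", (float(x), 0.0, 0.0), (float(x), float(x + 1))) for x in range(3)]

for t in rolling_ball:
    print(t.component, "at", t.placement, "during", t.subinterval)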
3.1 Atomic Objects

This section presents the atomic objects layer. This layer contains objects that are considered indivisible (i.e., they are rendered in their entirety). The exact representation of an atomic object is application dependent. In animation, as described in [TT90], the alternative representations include: (1) wire-frame representation: an object is represented by a set of line segments; (2) surface representation: an object is represented by a set of primitive surfaces, typically polygons, equations of algebraic surfaces, or patches; (3) solid representation: an object is a set of primitive volumes.
Figure 3: Spatial constructs: (a) Three different directions for a die, (b) Two Atomic Objects, (c) A Composed Object constructed using spatial constructs and the atomic objects in (b).
But from the database perspective, the specific representation (e.g., wire-frame, surface or solid representation) is relevant at the physical level, where persistent data structures are employed to store a representation (beyond the scope of this paper). From a conceptual perspective, the system supplies data to be processed by the specialized software (e.g., graphics packages to produce a frame in animation).
3.2 Composed Objects

The composed objects layer contains the representation of temporal and spatial constructs. In addition to specifying the positioning and timing of objects, these constructs define objects as composed of other objects. The composition might be recursive (i.e., a composed object may consist of a collection of other composed objects). For example, a molecule is composed of atoms. An atom itself is a composed object consisting of electrons, protons and neutrons. In this case, the electrons, protons and neutrons are in the atomic objects layer while the atoms and molecules belong to the composed objects layer.
Spatial constructs place objects in the rendering space and implicitly define spatial relationships between objects. The placement of an object defines its position and direction in the rendering space. For example, consider a path from a house to a pond. The placement of a character on the path must include, in addition to its position, the direction of the character (e.g., heading towards the pond or heading towards the house). A coordinate system defined by n orthogonal vectors defines unambiguously the position and direction of an object in the rendering space. For example, consider a 3D representation of a die. Figure 3 (a) shows three different placements of the die in the rendering space defined by the x-y-z axes. Notice that the position of the die in each placement is the same, but the direction of the die varies (e.g., the face at the top is different for each placement). However, the coordinate system defined by the red, green and blue axes specifies unambiguously the position and the direction of the die. A spatial construct specifies the coordinate system (e.g., the red, green and blue coordinate system) in the rendering space that dictates the placement of the object. This coordinate system is matched to n orthogonal vectors in the object to determine the placement (e.g., the green arrow is matched to the edge between 6 and 2 heading towards 3, the blue to the edge between 1 and 2 heading towards 5, and the red to the edge between 1 and 6 heading towards 4). A Spatial Construct of a component object o is a bijection that maps n orthogonal vectors in o into n orthogonal vectors in the rendering space, where n is the number of dimensions. Let o's coordinate system and the mapped coordinate system be defined by the n orthogonal vectors in o and the mapped vectors, respectively. The placement of a component object o in the rendering space is the translation of o from its coordinate system to the mapped coordinate system, such that its relative position with respect to both coordinate systems does not change. Note that there is a unique placement for a given spatial construct.
Example 3.1: Consider the atomic objects in Figure 3 (b). Intuitively, a spatial construct drags
the coordinate system of an atomic object to the mapped coordinate system in the rendering space, carrying the object along. Figure 3 (c) shows the mapped coordinate systems of both spatial constructs (one for each atomic object) in the rendering space. It also shows the effect of applying the constructs to the atomic objects. The mapped coordinate systems define uniquely the position and direction of the objects in the rendering space. □
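Under the assumption that both the object's own frame and the mapped frame are given as an origin plus orthonormal axis vectors, the placement can be computed by expressing each vertex in the object's frame and re-expressing it in the mapped frame. The sketch below is only an illustration of the definition above, not an algorithm from the proposal:

import numpy as np

def apply_spatial_construct(points, obj_origin, obj_axes, target_origin, target_axes):
    # points: (m, 3) vertices of the object; obj_axes and target_axes hold three
    # orthonormal axis vectors as rows. The relative position of every vertex with
    # respect to its coordinate system is preserved, as the definition requires.
    local = (points - obj_origin) @ obj_axes.T       # coordinates in the object's frame
    return target_origin + local @ target_axes       # same coordinates in the mapped frame

# Example: a unit cube re-placed at (5, 0, 2) and rotated 90 degrees about the z axis.
cube = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)], dtype=float)
rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
placed = apply_spatial_construct(cube, np.zeros(3), np.eye(3), np.array([5.0, 0.0, 2.0]), rot_z)
print(placed)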
Temporal constructs define the rendering time of objects and implicitly establish temporal relationships among the objects. They are always defined with respect to a temporal space. Given a temporal space [0, t], a Temporal Construct of a component o of duration d maps o to a subinterval [i, j] such that 0 ≤ i ≤ j ≤ t and j − i = d. Temporal and spatial constructs can be defined simultaneously in this layer by assigning positions in the rendering space as well as time intervals to objects. For example, the motion of a character can be represented by considering the different postures of the character while moving as a function of time: assigning a position and a posture to each time interval.
3.3 Rendering Features

This layer specifies the rendering parameters of the presentation. It defines what section of a scene will be visible, where and what light sources will be used, etc. Given a composed object with duration d, this layer defines the rendering parameters for intervals [i, j] where 0 ≤ i ≤ j ≤ d. For example, consider a movie that lasts dmovie minutes. Suppose that each scene lasts one minute. Then we can define the rendering features for each subinterval of one minute in the time interval [0, dmovie]. Then, for each interval [0, 1], [1, 2], ..., [dmovie − 1, dmovie] we specify: view point, light sources, etc.
3.4 Example

This section illustrates how a movie is represented using this model. The 3D representations of the characters and objects in the scenes are represented in the Atomic Objects layer. Motions, scenes and sequences of scenes are represented in the Composed Objects layer as collections of spatial and temporal constructs. Finally, the rendering parameters are specified in the Rendering Features layer. Figure 4 shows the different levels of abstraction of a media object and an example of the representation of a scene in the movie. Suppose Mickey Mouse walks along a path in the scene. Then, we have the 3D representations of different postures of Mickey Mouse in the Atomic Objects layer. For example, his posture when he starts walking, his posture one second later, etc. They are denoted by p1, p2, etc. These postures might be originals composed by an artist or generated using interpolation. We also include the 3D representation of the background (denoted by sc1) in the Atomic Objects layer. To represent the motion, we specify spatial and temporal constructs in the Composed Objects layer. The spatial constructs for the motion representation are specified by the curve labeled c1. The curve illustrates the path followed by Mickey Mouse (i.e., the different positions reached) and the blue coordinate systems the direction of the 3D representation of Mickey Mouse's posture. The temporal constructs are specified by triplets (component, starting time, duration) representing the posture of the character and its timing. For example, the point labeled by (p1, 0, 1) indicates that the posture p1 is in the scene at the 3D coordinates associated with the point, appears at time 0 and lasts for 1 second. To associate the motion of Mickey Mouse with the background, we have the spatial and temporal constructs in the Composed Objects layer represented by c2. The spatial constructs define where in the rendering space the motion and the background are placed. The rendering space is represented by the black coordinate system and the placement of the motion and background by the blue coordinate systems. The motion is mapped to the blue coordinate system labeled by (c1,5,25) and the background to the other blue coordinate system. The temporal constructs define the timing of the appearances of the background and the motion. For instance, the background sc1 appears at the beginning of the scene while Mickey Mouse's motion (c1) starts at the 5th second. Finally, the rendering features are assigned by specifying the view point, the light sources, etc., for the time interval at which the scene is rendered.
Figure 4: Levels of Abstraction in Data Representation. (a) Description of each level: Rendering Features (rendering characterization of composed objects, e.g., the view point, light sources, etc. of each scene in the movie); Composed Objects (temporal and spatial association of objects, e.g., c2: (sc1,0,30), (c1,5,25) and c1: (p1,0,1), (p2,1,2), (p3,3,2)); Atomic Objects (indivisible objects, e.g., the different postures of Mickey Mouse p1, p2, ... and a scenery with a house, trees, mountains and a meadow: sc1). (b) Example of how a movie with a scene where Mickey Mouse walks is represented.
4 Data Definition

This section describes a data definition of the model described in the previous section. The data definition language chosen to represent the different levels of abstraction of the object space is that of DAPLEX [Shi81] extended with methods. The reasons for choosing DAPLEX are its simplicity and its support for complex objects and inheritance, which are essential for our model representation. Moreover, it is a well studied language and some object-oriented database models are based on it [ISK+93].
4.1 Atomic Objects

Video applications may represent atomic objects in alternative ways (e.g., wire-frame, surface, solid representation, etc.) at the physical level. From a conceptual perspective, these physical representations are considered as an unstructured unit (i.e., a BLOB). Atomic objects can also be represented at the conceptual level as either: (i) A procedure that takes some parameters and builds a BLOB that represents the object. For example, a geometric figure can be represented by the parameters of its parts (i.e., the radius, the length of a side of a square, etc.) and a procedure that consumes those parameters and produces a bitmap that represents the object. This type of representation is termed Parametric. (ii) An interpolation of two other atomic objects. For example, in animation the motion of a character can be represented as postures at selected times, and the postures in between can be obtained by interpolation. As in animation, this representation is termed In-Between. (iii) A transformation applied to another atomic object. For example, the representation of a posture of Mickey Mouse can be obtained by applying some transformation to a master representation. This representation is termed Transform. In Figure 5, we present the schema of the type Atomic that describes these alternative representations. The conventions employed in this schema representation, as well as in others presented in this paper, are as follows: The names of built-in types (i.e., strings, integers, etc.) are all in capital letters as opposed to defined types that use lower case letters. ANYTYPE refers to strings, integers, characters and complex data structures. A type is represented by its name surrounded by an oval. The attributes of a type are denoted by arrows with single line tails. The name of the attribute labels the arrow and the type is given at the head of the arrow. Multivalued attributes are denoted by arrows with two heads and single valued attributes by arrows with a single head. For multivalued attributes, an S overlapping the arrow is used to denote a sequence instead of a set. The type/subtype relationship is denoted by arrows with double line tails. The type at the tail is the subtype and the type at the head is the supertype. For example, in Figure 5 Parametric is a subtype of Atomic, and it has two attributes: Parameters and Generator. Parameters is a set of elements of any type and Generator is a function that maps a set of elements of any type (i.e., Parameters) into a BLOB.
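As an illustration only (the class and attribute names below are ours, not the DAPLEX schema of Figure 5), the three conceptual representations could be sketched as follows:

from dataclasses import dataclass
from typing import Any, Callable, Dict

@dataclass
class Atomic:
    # Conceptually, an atomic object ultimately renders to an unstructured BLOB.
    def blob(self) -> bytes: ...

@dataclass
class Parametric(Atomic):
    parameters: Dict[str, Any]                           # e.g., {"radius": 2.0}
    generator: Callable[[Dict[str, Any]], bytes]         # builds the BLOB from the parameters
    def blob(self) -> bytes:
        return self.generator(self.parameters)

@dataclass
class InBetween(Atomic):
    first: Atomic
    last: Atomic
    interpolate: Callable[[bytes, bytes, float], bytes]  # e.g., posture interpolation
    fraction: float = 0.5
    def blob(self) -> bytes:
        return self.interpolate(self.first.blob(), self.last.blob(), self.fraction)

@dataclass
class Transform(Atomic):
    master: Atomic
    delta: Any                                           # transformation parameters
    transform: Callable[[bytes, Any], bytes]
    def blob(self) -> bytes:
        return self.transform(self.master.blob(), self.delta)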
Figure 5: Atomic object schema (the subtypes Parametric, In-Between and Transform of the type Atomic, with their Parameters, Generator, First, Last, Interp., Master, Delta and Transform attributes).
Figure 6: Examples of coordinate systems.
4.2 Composed Objects

Composed Objects are collections of objects with spatial and/or temporal constructs.
Definition: A Composed Object C is represented by the set {(ei, pi, si, di) | ei is a component of C, pi is the mapped coordinate system in C's rendering space defined by the spatial construct on ei, and [si, di] is the subinterval defined by a temporal construct on ei}.
A composed object may have more than one occurrence of the same component. For example, a character may appear and disappear in a scene. Then, the description of the scene includes one 4-tuple for each appearance of the character. Each tuple specifies the character's position in the scene and a subinterval when the character appears. The definition of composed objects establishes a hierarchy among the different components of an object. This hierarchy can be represented as a tree. Each node in the tree represents an object with spatial and temporal constructs (i.e., the 4-tuple in the composed object representation: (component, position, starting time, duration)), and each arc represents the relation component-of. The following example illustrates the role of the spatial and temporal constructs in the rendering of a media object.
Example 4.1: First, consider a restricted version of the spatial constructs to simplify their
representation and the placement of objects in the rendering space: Restricted spatial constructs map a coordinate system associated with the object to a coordinate system parallel to the rendering space (i.e., the axes of the mapped coordinate system are parallel to and in the same direction as the axes of the rendering space). For example, assume that the rendering
space is the coordinate system A in Figure 6. The systems C and D are invalid restricted spatial constructs while the system B is a valid one. If all spatial constructs are restricted as described above, then it suffices to specify a point in the rendering space instead of the mapped coordinate system. For example, there is no need to specify the blue coordinate systems in Figure 4. The placement of the objects can be unambiguously specified by the origin points (i.e., the path). Temporal and spatial constructs define subintervals and positions of components with respect to the temporal and rendering space of their immediate ancestors. But it might be necessary to compute the timing and positioning of an object with respect to a non-immediate ancestor. The timing and positioning of an object a with respect to the temporal and rendering space of its ancestor A are:

[(Σ_{i is an ancestor of a} s_i) + s_a, d_a]  and  ((Σ_{i is an ancestor of a} x1_i) + x1_a, ..., (Σ_{i is an ancestor of a} xn_i) + xn_a)

where (a, (x1_a, ..., xn_a), s_a, d_a) are a's spatial and temporal constructs, i ranges over the ancestors of a in the tree rooted by A, s_x and d_x are the starting time and duration of object x with respect to its immediate ancestor, and xj_i is the j-th coordinate of the position of the object i with respect to its immediate ancestor. Consider a movie that consists of n scenes sc_1, ..., sc_n, with scene sc_3 consisting of a motion m and a background b. Suppose that the path followed by m is defined by p points as follows:

m = {(pc_1, (pcx_1, pcy_1, pcz_1), 0, dm_1), (pc_2, (pcx_2, pcy_2, pcz_2), dm_1, dm_2), ..., (pc_p, (pcx_p, pcy_p, pcz_p), Σ_{i=1..p−1} dm_i, dm_p)}

where pc_i is the OID of a posture of the character. Suppose that the spatial and temporal constructs associated with the background of the third scene, the third scene and the movie are as follows:

b = {(ob_1, (obx_1, oby_1, obz_1), 0, Σ_{i=1..p} dm_i + 5), ..., (ob_r, (obx_r, oby_r, obz_r), 0, Σ_{i=1..p} dm_i + 5)}
sc_3 = {(b, (0, 0, 0), 0, Σ_{i=1..p} dm_i + 5), (m, (mx, my, mz), 5, Σ_{i=1..p} dm_i)}
movie = {(sc_1, (scx_1, scy_1, scz_1), s_1, d_1), ..., (sc_n, (scx_n, scy_n, scz_n), s_n, d_n)}

Then the placement of a posture pc_k with respect to the scene is (mx + pcx_k, my + pcy_k, mz + pcz_k) and its starting time and duration with respect to the beginning of the movie are [s_3 + 5 + Σ_{i=1..k−1} dm_i, dm_k]. □

Figure 7 illustrates the schema associated with the representation of spatial and temporal constructs in the Composed Objects layer.
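A small sketch of this bottom-up accumulation, restricted to the translation-only spatial constructs of this example and using made-up components and numbers, is given below; it simply sums the starting times and position offsets along the path from the ancestor down to the component:

# Each node maps to its children as 4-tuples: (component, position relative to the
# parent, starting time relative to the parent, duration).
from typing import Dict, List, Tuple

Node = Tuple[str, Tuple[float, float, float], float, float]
tree: Dict[str, List[Node]] = {
    "movie": [("sc3", (0.0, 0.0, 0.0), 10.0, 35.0)],
    "sc3":   [("b", (0.0, 0.0, 0.0), 0.0, 35.0), ("m", (2.0, 0.0, 1.0), 5.0, 30.0)],
    "m":     [("pc1", (0.0, 0.0, 0.0), 0.0, 1.0), ("pc2", (1.0, 0.0, 0.0), 1.0, 2.0)],
}

def absolute(component: str, ancestor: str):
    # Return (start, duration, position) of `component` relative to `ancestor`.
    def search(node, t, pos):
        for child, (dx, dy, dz), start, dur in tree.get(node, []):
            nt, npos = t + start, (pos[0] + dx, pos[1] + dy, pos[2] + dz)
            if child == component:
                return nt, dur, npos
            found = search(child, nt, npos)
            if found:
                return found
        return None
    return search(ancestor, 0.0, (0.0, 0.0, 0.0))

print(absolute("pc2", "movie"))   # (16.0, 2.0, (3.0, 0.0, 1.0))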
4.3 Rendering Features

Rendering features are associated with intervals in the temporal space of a composed object. They are a collection of tuples (descriptor, value) that represent the description and value of a feature. Figure 8 represents the schema associated with the rendering features layer.
Figure 7: Composed Object schema (a Composed object consists of Associations, each with a Component, a Position in SPACE, a StartingAt time and a Duration).

Figure 8: The rendering features schema.
Example 4.2: Consider a scene that lasts 180 seconds (i.e., its temporal space is [0, 180]). Assume that the rendering features change every minute. Suppose that the camera is located at (100, 100, 100) during the first minute of the scene. Then, the representation of the rendering features of the scene is as follows:

(scene, {([0, 60], {(view point position, (100, 100, 100)), ...}), ...}) □
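A sketch of this representation in Python (the parameters past the first minute are invented for illustration), together with a lookup of the features in effect at a given instant:

# A composed object is paired with (interval, {descriptor: value, ...}) entries.
scene_rendering_features = (
    "scene",
    [
        ((0, 60),    {"view point position": (100, 100, 100), "light sources": ["sun"]}),
        ((60, 120),  {"view point position": (100, 120, 100), "light sources": ["sun"]}),
        ((120, 180), {"view point position": (100, 140, 100), "light sources": ["sun"]}),
    ],
)

def features_at(rendering, t):
    # Return the rendering parameters associated with the interval containing t.
    _, intervals = rendering
    for (start, end), params in intervals:
        if start <= t < end:
            return params
    return None

print(features_at(scene_rendering_features, 75)["view point position"])   # (100, 120, 100)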
5 Taxonomy of Techniques to Produce/Display Video Clips

The three layers of abstraction described in Sections 3 and 4 define the objects that constitute a video clip, their temporal and spatial relationships, and its rendering features. This description is termed the structured version of a video object. It has the advantage of representing content information that is useful when processing queries. However, this information needs some processing before it is rendered to the user (e.g., the animated video of Example 1 must be created by specialized software prior to its display). The result of processing this information is a bitmap (termed non-structured) representation of the video object (a sequence of frames to be displayed at a certain rate).
The creation of the structured version of video can be done in the middle of its rendering (i.e., dynamically). For instance, the practical exercise of Example 1 is created dynamically. The direction of the automobile is defined interactively by the trainee using the keyboard; therefore the path followed by the car is dynamic. The creation of the structured version of video can also be defined before its rendering (statically). For example, consider animation as a visualization tool for financial data (i.e., 3D charts with the x-axis representing the branch, the y-axis the item and the z-axis the gross sales of an item y at a branch x; the chart changes over time to show the behaviour over the last 60 months). First, the data and the layout of the charts are defined, then the sequence of charts is displayed. Therefore, the structure of the animation (i.e., the data and layout of the charts) is static and defined only once. The creation of the non-structured version from the structured version of media data can be done in either an interpreted mode or a compiled mode. The interpreter interleaves the creation of the non-structured version with its display. It considers information in the structure to create the first portion of the non-structured version, then displays this portion. This procedure is repeated for subsequent portions of the non-structured version, until all portions are displayed. To illustrate, the interpreted display of a video is a sequence of frame constructions and displays. Conversely, the compiler constructs the entire non-structured version and then displays it. The dynamic creation of the structured version must be interpreted because the interaction with a user determines what is displayed next, while the static creation can be either interpreted or compiled. The mode of creation of non-structured media data (i.e., interpreted versus compiled) raises several issues. Video requires continuous display. Thus the delivery of data from the database to the specialized software (e.g., a graphics package to create each frame in a video) must satisfy the continuous display constraint in the case of interpretation. For compilation, there is no such constraint. However, some optimization issues may arise (e.g., construct frames in an order that minimizes data retrieval from disk). As in the case of programming languages, interpretation is useful for the development stage as opposed to the production stage. The video designer might want to test a portion of the video specification just as a programmer might want to test a routine. The display of video is periodic (i.e., a fixed number of frames is displayed each second). Therefore, we supply data from the database to the rendering software based on a period p. The period must be based on the display rate. More precisely, if the frequency of the display is f (e.g., the number of frames per second) then the period must be a multiple of 1/f. The following procedure outlines the creation of a non-structured version of a video object O:
Step 1: Based on the period p defined, split the temporal space of O into subintervals of duration equal to p. For example, if the duration of O is d, then its temporal space (i.e., [0, d]) is split into the subintervals [0, p], [p, 2p], [2p, 3p], ..., [(⌈d/p⌉ − 1)p, ⌈d/p⌉p].
Step 2: For each subinterval [i, j] in O's temporal space (note that this step has a continuous display constraint if the video object is being interpreted):
a. The DBMS supplies: 1) the components whose time intervals, defined relative to O, intersect with [i, j], 2) the components' spacetime attributes, and 3) the rendering parameters associated with [i, j].
b. The specialized software (e.g., graphics packages) creates the ready-to-render version of O corresponding to the subinterval [i, j].
c. (Optional) Compress the ready-to-render version of O corresponding to [i, j].
Step 3: (Optional) Concatenate the ready-to-render versions of all subintervals in O's temporal space.
Step 4: (Optional) Compress the concatenation.
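The sketch below illustrates the interpreted mode of this procedure; dbms_supply, render and display are hypothetical placeholders rather than an actual API, and the deadline check only reports a missed period:

import math
import time

def interpret(video, duration, period, dbms_supply, render, display):
    # Split the temporal space [0, duration] into subintervals of length `period`
    # (Step 1) and, for each one, fetch the needed data, build its frames, and show
    # them (Steps 2.a and 2.b). In interpreted mode each iteration must finish
    # within one period to keep the display continuous.
    for k in range(math.ceil(duration / period)):
        i, j = k * period, min((k + 1) * period, duration)
        deadline = time.monotonic() + period
        components, spacetime, rendering = dbms_supply(video, i, j)   # Step 2.a
        frames = render(components, spacetime, rendering, i, j)       # Step 2.b
        display(frames)
        if time.monotonic() > deadline:
            print(f"hiccup: subinterval [{i}, {j}] missed its deadline")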
The compilation of video objects can be done either by parts or at once. In the case of compilation by parts, the components are compiled first and then merged to create the non-structured version of the composed object. For example, the compilation of a scene can be done in two steps: first, compile each motion (i.e., follow the procedure above for each motion), and then merge the non-structured versions of the motions (i.e., follow the procedure above for the scene and, in Step 2.a, supply the frames in the non-structured versions of the motions corresponding to the interval [i, j]). In the case of compilation at once, Step 2.a supplies atomic objects. The compiler must create a link between the structured and non-structured versions of video objects. This link might be useful for query processing because of the content description of the structured version. For instance, suppose that the animated video (i.e., the theoretical section) of Example 1 is compiled. The processing of the query "display scenes with blinking traffic lights" might be as follows: query the structured version first, employ the linkage to retrieve the desired scenes, and then display them. Figure 9 summarizes how structured and non-structured versions of video objects are created. The coloring of the ovals specifies the allowed combinations of video object creations. For example, compilation is surrounded by only one red oval; thus, compilation is possible only if the structured version was created statically. The way a video object is created defines requirements to be satisfied by the implementation of the database at the physical level. Therefore, this taxonomy can be used to classify implementation techniques at the physical level.
6 Content Based Queries

Spatial and temporal constructs are useful for content based retrieval. They implicitly define temporal relationships (e.g., a character appears in a movie during the first minute), spatial relationships (e.g., a painting by Van Gogh is hanging on the wall) and combinations of temporal and spatial relationships (e.g., one car is chasing another). It is beyond the scope of this paper to give a complete list of relationships between objects and a query language to interrogate the database. The purpose of this section is to illustrate the content information stored in the temporal and spatial constructs defined in the composed objects layer. We discuss retrieval based on temporal, spatial, and combined spatial and temporal information.
6.1 Temporal Relationships
Figure 9: Taxonomy of implementations (the structured version of a video object is created either statically or dynamically; the non-structured version is created by interpretation or, for statically created structured versions, by compilation, either by parts or at once).
Figure 10: The Thirteen Possible Temporal Relationships [All83] between two Time Intervals.

The tree representation described in Section 4.2 implicitly defines temporal relations among the components of a composed object. Each node has an associated time interval that represents the timing of a component with respect to its ancestor. The time intervals can be translated to represent timing with respect to any ancestor. Two elements in the tree can be compared if their timing was established with respect to a common ancestor. The time intervals associated with comparable elements determine whether the elements satisfy any of the relationships in Figure 10 defined in [All83]. Any two intervals satisfy at least one of the relationships in Figure 10. In order to make the formulation of a query user friendly, objects are referred to by their surrogates in the Name Space. Therefore, a mapping I between the Name Space and the Object Space must be defined. The following example illustrates the use of temporal constructs in query formulation.
Figure 11: Hierarchy of a scene and its mapping from the Name Space.

Example 6.1: Figure 11 shows the hierarchy corresponding to the definition of a scene c with timing relative to the temporal space of the movie (i.e., all the elements in the hierarchy are comparable). Each node contains three elements: 1) The OID of the object that the node represents. For example, c: identifier of a scene, o1: identifier of a Mickey Mouse motion, o2: identifier of a Minnie Mouse motion, o11, ..., o1n: identifiers of the representations of the postures of Mickey Mouse when motion o1 is played, o21, ..., o2m: identifiers of the representations of the postures of Minnie Mouse when motion o2 is played. 2) The time interval when the object is rendered. For example, t: time interval when the scene is played within the movie, ti1, ti2: time intervals when the Mickey Mouse and Minnie Mouse
motions are played in the movie, ti11: time interval when the first posture of Mickey Mouse in the motion o1 is played in the movie. 3) The position of the object in the rendering space of the movie. For example, p11, ..., p1n: positions in the scenario where Mickey Mouse is at different instants of time in the motion. Figure 11 also presents a mapping between the Object and Name spaces. This mapping is the interpretation that a user gives to the objects. A query retrieving the scenes containing both Mickey and Minnie Mouse could be formulated as: Find the nodes n such that n ∈ I(Scene) and there exist nodes n1 and n2 in the subtree rooted by n where n1 ∈ I(Mickey Mouse), n2 ∈ I(Minnie Mouse), and the time intervals ti1, ti2 associated with n1 and n2 respectively intersect (i.e., ¬before(ti1, ti2) AND ¬before(ti2, ti1) AND ¬meets(ti1, ti2) AND ¬meets(ti2, ti1)). If ti13 = [100, 103] and ti25 = [102, 105] then the answer to the query would be a set of OIDs including c. □
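The interval test used in this query follows directly from Allen's definitions of before and meets; the sketch below is illustrative and assumes intervals are given as (start, end) pairs:

def before(a, b):
    return a[1] < b[0]

def meets(a, b):
    return a[1] == b[0]

def intersect(a, b):
    # Two intervals intersect exactly when neither precedes nor merely meets the other.
    return not (before(a, b) or before(b, a) or meets(a, b) or meets(b, a))

ti13, ti25 = (100, 103), (102, 105)
print(intersect(ti13, ti25))   # True, so the scene c qualifies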
6.2 Spatial Relationships

Figure 12: Object and Name Spaces for a blueprint of Los Angeles.

The tree representation described in Section 4.2 implicitly defines spatial relationships among the components. Similar to temporal relations, to establish a spatial relationship between two objects it is necessary that the positioning of the two objects is defined with respect to a common ancestor.
To establish spatial relationships in video, it is necessary to consider an instantaneous snapshot. In this case, the hierarchy associated with the video object can be reduced to the atomic objects whose time intervals contain the given instant (e.g., the atomic objects that make up the frame displayed at the given instant) and their positioning in the rendering space. As in the case of temporal relationships, objects are referred to by their surrogates in the name space in the formulation of a query; therefore, the mapping between the name space and the object space must be defined. The spatial relationships between objects might either be dependent on or independent of rendering features such as the view point. For example, to say that an object is behind another object is viewpoint dependent, while to say that an object is west of another object is independent of the viewpoint. We call the spatial relationships independent of the view point and other rendering parameters Objective Spatial Relationships, and the ones depending on the rendering parameters Subjective Spatial Relationships.
6.2.1 Objective Spatial Relationships

Objective spatial relationships might either be dependent on or independent of an objective coordinate system (i.e., a coordinate system that is independent of the rendering parameters). For example, the relationships North, South, East and West are based on the coordinate system defined by the cardinal points. On the other hand, relationships such as Contains, Adjacent and Intersects are independent of any objective coordinate system. At the atomic objects layer, each object has a volume that can be scaled. The composed objects layer determines the position of the object in the rendering space. Therefore, information at both the atomic and composed objects layers determines objective spatial relationships between two objects such as: intersects, adjacent, contains, etc. The following example illustrates how spatial constructs can be used to process queries with spatial relationships.
Example 6.2: Suppose that the database contains a blueprint of the city of Los Angeles (see Figure 12). We could formulate queries such as: locate the first gas station on freeway 210 that is located east of freeway 605. Figure 12 shows a section of the blueprint of Los Angeles. It represents part of the object and name spaces.6 The system selects the two objects in the hierarchy of Los Angeles that correspond to the freeways 605 and 210 (e.g., search for Fwy 605 in the name space and then apply the mapping to obtain the corresponding object in the object space). Next, the system determines the area where the two freeways intersect (i.e., the shaded red rectangle). Subsequently, the system computes the gas stations that are located on freeway 210 (using the spatial constructs). Finally, the system selects the gas station whose coordinates are east of freeway 605 (i.e., gas stations whose West-East coordinates are greater than x) and that has the smallest West-East coordinate. □
6 Instead of the actual hierarchy and mapping from the name to the object space, a graphical representation is presented to facilitate understanding.
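A sketch of the final selection step of this query, assuming the blueprint has already been reduced to West-East (x) coordinates; all names and numbers below are invented:

fwy_intersection_x = 605.0                                          # where freeway 210 crosses freeway 605
gas_stations_on_210 = [("A", 590.0), ("B", 612.0), ("C", 640.0)]    # (name, West-East coordinate)

east_of_605 = [(name, x) for name, x in gas_stations_on_210 if x > fwy_intersection_x]
first_station = min(east_of_605, key=lambda s: s[1]) if east_of_605 else None
print(first_station)   # ('B', 612.0): the first gas station east of the intersection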
6.2.2 Subjective Spatial Relationships

Subjective spatial relationships can be established with procedures that determine whether a relation between two objects is satisfied based on the rendering parameters. For example, the InFrontOf and Behind relationships are viewpoint dependent. We are not aware of specific algorithms to decide whether an object is in front of another object. However, a similar problem, the hidden-surface problem (i.e., deciding which surfaces of a 3D object are visible), has been studied extensively [TT90]. Therefore, we believe that algorithms to establish InFrontOf and Behind exist.
6.3 Spatial and Temporal Relationships

Some spatial and temporal relationships can be expressed as a combination of spatial relationships and temporal relationships using logical connectors such as AND, OR, etc. But there are some cases where it is impossible to express them as a finite combination of spatial and temporal relationships. To illustrate, consider a concept such as "chasing" (a relationship between several objects). In a video that includes one car chasing another, we can capture the concept of chasing by analyzing the paths7 followed by the two cars. If the paths of both cars are similar, we can infer that one car is chasing the other. However, we would mistakenly infer a car chase in the case of a trailer carrying cars. This limitation can be eliminated by analyzing the spatial relationship (on top of) between the cars and the trailer.
7 Future Research

The feasibility of the implementation of the proposed model, and the resources (memory capacity and disk bandwidth) required to support the desired functionality, are two issues worth investigating. The feasibility of the compiler can easily be observed. A compiler has two main components: a database and a rendering module. The database supplies structured video clips to the rendering software and the rendering software constructs non-structured video clips to be stored back in the database. There are many implementations of object-oriented databases available8. In addition, 3D representation of objects and their rendering have been studied extensively [FvDFH90]. Moreover, there are commercial rendering packages available (e.g., PHIGS, PEX, GL, etc.). The interpreter, on the other hand, imposes continuous display constraints that require further investigation. The retrieval of the objects that constitute a structured video clip and their rendering must satisfy temporal constraints. Otherwise, the display may suffer from disruptions, termed hiccups. Current technological trends predict CPUs significantly faster than those available today. Therefore, the aspect that needs further study is the retrieval. It constitutes the focus of this dissertation. Current techniques to retrieve data from a database do not consider the new requirements imposed by the proposed model: hiccup-free display of video and divisible video clips (i.e., objects participating in video clips might be shared among several video clips and within a video clip). Therefore, we shall investigate new techniques to retrieve data under the new requirements. We shall concentrate on the interpretation of statically created structured video clips.
7 The paths are represented in the composed objects layer.
8 Omega [GCKL93] is an object-oriented database implementation developed at USC.
Disk bandwidth and memory capacity are often the critical resources when displaying an object. Intelligent placement of data across the disks is crucial to effective utilization of the available disk bandwidth. Similarly, memory management policies impact the frequency of retrieval from the disk subsystem. Therefore, it is essential to study memory management and data placement techniques for the target application. We intend to investigate these techniques by focusing on atomic objects only, ignoring: 1) temporal and spatial constructs, and 2) rendering features. This is because atomic objects are large and require a significant fraction of disk bandwidth and memory capacity. We speed up the retrieval of atomic objects by employing the aggregate bandwidth of multiple disks. Our design enables the system to scale up, i.e., as the size of the database grows, additional disks can be introduced to maintain the desired performance. We shall start by focusing on a single user system. Time permitting, we shall investigate a multi-user environment assuming a shared-nothing architecture [DG90]. Based on how the data is assigned to the disks and the employed memory management policies, a scheduler must assign resources (i.e., disk bandwidth and memory) intelligently in order to ensure a hiccup-free display while minimizing the amount of resources required to support this display. The scheduler determines what data to retrieve from the disks, and when, to satisfy a hiccup-free display. We shall devise and evaluate scheduling techniques for the data placement and memory management policies. We intend to evaluate the proposed techniques using a simulation study. We will obtain the parameters for the simulations based on educated guesses derived from an implementation. Our implementation consumes a structured object and produces a non-structured video clip as output. It reads a script specifying structured video clips and generates the corresponding non-structured versions. The non-structured video clips can then be compressed and displayed using MPEG. The script specifies, for each video clip: the file names where the atomic objects are stored, temporal and spatial constructs, and rendering features. Each atomic object has a 3D representation that is stored as a file in the UNIX operating system. We shall estimate simulation parameters based on our experience using the implementation.
7.1 Memory Management

Intelligent management of memory is crucial to the interpretation process. Data is retrieved from disk into memory, then processed and displayed. Since video clips must be displayed continuously, either data must be prefetched to assure a hiccup-free display or data must be retrieved fast enough to keep up with the display rate. The second alternative appears more viable because it requires less memory. One way of minimizing the disk bandwidth requirement of a display is to minimize the frequency of retrieval from the disk subsystem. An intelligent memory management policy will increase the hit ratio, decreasing the amount of data retrieved from the disk subsystem. The proposed model differs from alpha-numeric databases in that the system can predict future references to objects based on the temporal constructs. Thus, traditional page replacement policies such as FIFO, MRU, LRU, and LRU-K [OOW93] might not be appropriate. A page replacement policy that replaces the page that will not be referenced for the longest period appears more appropriate for our target environment. A replacement policy will be devised based on the temporal constructs of video clips and compared to the existing techniques. The evaluation of this technique might include the following metrics: working set size [Den68] and hit ratio.
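As an illustration of such a policy, the sketch below evicts the resident object whose next reference, predicted from the temporal constructs, lies farthest in the future; when every future reference is known, this corresponds to the classical optimal off-line replacement rule. The data structures and names are hypothetical and merely outline the idea.

    # Sketch of a look-ahead replacement policy: because the temporal
    # constructs of a structured video clip make future references
    # predictable, the buffer can evict the resident object whose next
    # reference is farthest away.
    def evict_victim(resident, next_reference, now):
        """Pick the resident object whose next reference is farthest in the future.

        resident        -- set of object ids currently in memory
        next_reference  -- maps object id -> sorted list of future reference times
        now             -- current display time
        """
        def next_use(obj):
            future = [t for t in next_reference.get(obj, []) if t >= now]
            return future[0] if future else float("inf")   # never referenced again
        return max(resident, key=next_use)

    # Example: objects referenced again at times 12, 40, and never, respectively.
    victim = evict_victim({"a", "b", "c"},
                          {"a": [12], "b": [40], "c": []}, now=10)
    # -> "c" is evicted first; "b" would be the next candidate.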
7.2 Data Placement on Disks

The size of the disk page impacts the fraction of disk bandwidth that is wasted. A magnetic disk drive is a mechanical device: it incurs a seek and a rotational latency prior to transferring a disk page. The operations attributed to seek and latency times are wasteful work; they reduce the effective bandwidth of the disk drive. By increasing the size of a disk page, the system can minimize the fraction of disk bandwidth that is wasted due to seek and latency operations. However, increasing the size of a disk page implies that the system might read more data than necessary. This results in a thrashing behavior that increases the number of references to the disk subsystem. We intend to investigate this tradeoff further (see the sketch at the end of this overview).

A hiccup-free display requires a constant rate of data retrieval. On a multi-disk hardware platform, the distribution of data across the disks impacts the retrieval rate of the system. For example, assigning all objects of a video to one disk might result in the formation of a bottleneck, yielding a retrieval rate lower than the display rate and causing the display to suffer disruptions. An intelligent assignment strategy would avoid this scenario by distributing the objects across multiple disks in order to utilize their aggregate bandwidth.

The placement of data on disks must assign objects, or portions of objects when an object is larger than a disk page, to disk pages. There are two alternative ways of making this assignment. The first clusters small objects into disk pages and stripes large objects across disk pages; then it assigns the disk pages to disk drives. The second assigns small objects to disk drives and stripes large objects across disk pages that are assigned to disk drives; then, for each disk drive, it clusters the small objects assigned to it into disk pages. In sum, the placement of data on disks must consider the following issues: 1) clustering objects into disk pages, and 2) assigning objects/disk pages to disks. The dissertation shall investigate both alternatives for data placement.
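The following back-of-the-envelope sketch illustrates the page-size tradeoff mentioned above: it computes the effective bandwidth of a single disk when each retrieval pays a fixed seek and rotational latency. The disk parameters are assumed values chosen for illustration, not measurements of any particular drive.

    # Larger pages amortize seek and rotational latency, raising the effective
    # bandwidth, but may also fetch data the display never uses.
    def effective_bandwidth(page_size_kb, seek_ms=15.0, latency_ms=8.0,
                            transfer_mb_per_s=3.0):
        """Effective bandwidth (MB/s) when each retrieval pays seek + latency."""
        transfer_ms = page_size_kb / 1024.0 / transfer_mb_per_s * 1000.0
        total_ms = seek_ms + latency_ms + transfer_ms
        return (page_size_kb / 1024.0) / (total_ms / 1000.0)

    for kb in (16, 64, 256, 1024):
        print(kb, "KB page ->", round(effective_bandwidth(kb), 2), "MB/s")
    # Small pages waste most of the bandwidth on seek/latency; very large pages
    # approach the raw transfer rate but risk reading unneeded data.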
7.2.1 Clustering

Data clustering has been studied in the context of alpha-numeric object-oriented databases. Tsangaris and Naughton [TN91], Cheng and Hurson [CH91], and McIver and King [JK94] developed clustering techniques based on the heat (i.e., frequency of access) of the objects and the frequency of traversals associated with the relationships that exist among the different objects. However, these techniques did not consider the hiccup-free display requirement of video. One issue that will be addressed in this dissertation is whether these techniques are appropriate for use in a system that implements the proposed model. If they are not, we intend to propose new clustering techniques, which will be evaluated and compared with the current ones. A possible spectrum of clustering techniques could range from frequency-based techniques (i.e., the current techniques) to those that take advantage of the temporal relationships that exist between objects. We shall consider different degrees of combination of these two types of techniques and evaluate them with metrics such as working set size [Den68] and hit ratio.
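As one point on the temporal end of this spectrum, the sketch below packs objects into disk pages in order of their display start time, so that objects referenced around the same time tend to share a page. The tuple format, the assumption of small equally sized objects, and the function name are illustrative simplifications rather than the dissertation's eventual technique.

    # Sketch of a clustering heuristic driven by temporal relationships rather
    # than access frequency alone: objects whose displays begin close together
    # in time are packed into the same disk page, so one retrieval serves
    # several imminent references.
    def cluster_by_interval(objects, page_capacity):
        """Group objects into disk pages in order of their display start time.

        objects       -- list of (object_id, start_time, end_time) tuples
        page_capacity -- maximum number of objects per disk page (simplification:
                         all objects are assumed small and of similar size)
        Returns a list of pages, each a list of object ids.
        """
        ordered = sorted(objects, key=lambda o: o[1])    # by display start time
        pages = [ordered[i:i + page_capacity]
                 for i in range(0, len(ordered), page_capacity)]
        return [[obj_id for obj_id, _s, _e in page] for page in pages]

    pages = cluster_by_interval([("bg", 0, 90), ("hero", 0, 60), ("car", 30, 45),
                                 ("tree", 62, 90)], page_capacity=2)
    # -> [["bg", "hero"], ["car", "tree"]]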
7.2.2 Assigning Objects/Disk Pages to Disks

The assignment of objects/disk pages to multiple disks has been studied before. Copeland et al. [CABK88] and Weikum et al. [WZS93] introduced techniques to assign objects and disk pages to disks with the aim of balancing the load. These techniques were based on the heat of the objects; they consider neither the frequency of traversals nor the hiccup-free display constraint. Load balancing techniques do not prevent the system from assigning simultaneously retrieved disk pages to the same disk drive. Ideally, simultaneously retrieved disk pages should be placed on different disk drives in order to use their aggregate bandwidth and reduce the retrieval time. An increase in the number of simultaneously retrieved disk pages assigned to the same disk drive leads to an increase in retrieval time that might cause hiccups. The technique proposed in [GWLZ94] considers the frequency of traversals in addition to the heat of the objects when assigning objects to multiple disks to balance the load. However, this technique does not take into consideration the hiccup-free display requirement of video, and, as with the other techniques, the frequency of traversals does not prevent the system from assigning simultaneously retrieved disk pages to the same disk. Therefore, the use of this technique in a system that implements the proposed model might lead to hiccups.

We shall devise techniques to assign objects or disk pages to disks that consider the temporal relationships between objects in addition to heat and frequency of traversals; a sketch of one such assignment appears at the end of this subsection. Replication is an alternative that might alleviate the formation of temporary bottlenecks that result in hiccups (e.g., if two simultaneously retrieved disk pages reside on the same disk, then a backup copy of one might be placed on another disk so that both pages can be retrieved simultaneously). Notice that replicating atomic objects has the additional advantage of reducing the probability of data becoming unavailable due to disk failures (if a disk drive fails, its content can be found elsewhere). Moreover, unlike alpha-numeric databases, atomic objects are immutable, so there is no need to propagate updates between atomic objects and their replicas. We intend to evaluate the proposed techniques and compare them with current techniques. A possible spectrum of these techniques could range from frequency-based (i.e., the current techniques) to those that take advantage of the temporal relationships between objects. We shall consider different degrees of combination of these two types of techniques and evaluate them with metrics such as the conflict ratio (i.e., clusters competing for the same disk bandwidth).
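The sketch below outlines one possible greedy assignment of this kind: pages that are retrieved during the same time interval are spread across different disks whenever possible, with the heat of each disk balanced as a secondary criterion. The inputs and names are hypothetical, and the procedure is only a starting point, not the dissertation's final method.

    # Sketch of a placement step that considers which pages are retrieved
    # during the same time interval, spreading them across disks while
    # balancing load (heat) as a secondary criterion.
    def assign_pages(page_groups, num_disks, heat):
        """Greedy assignment of disk pages to disks.

        page_groups -- list of groups; the pages in one group are retrieved
                       during the same time interval
        num_disks   -- number of disk drives
        heat        -- maps page id -> access frequency
        Returns a dict: page id -> disk index.
        """
        load = [0.0] * num_disks
        placement = {}
        for group in page_groups:
            used = set()                      # disks already taken by this group
            for page in sorted(group, key=lambda p: -heat.get(p, 0.0)):
                candidates = [d for d in range(num_disks) if d not in used]
                if not candidates:            # more pages in the group than disks
                    candidates = list(range(num_disks))
                disk = min(candidates, key=lambda d: load[d])
                placement[page] = disk
                load[disk] += heat.get(page, 0.0)
                used.add(disk)
        return placement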
7.3 Scheduling

Scheduling when each disk page is retrieved from the disks is crucial for assuring a hiccup-free display. The scheduler fragments time into equi-sized intervals, termed time intervals; a time interval is the duration of the retrieval of a disk page in the worst-case scenario. At each time interval, the scheduler decides which disk pages to retrieve from the disks. This decision must be based on the deadlines that the display must satisfy to avoid hiccups: the objects that constitute a frame must be retrieved from the disks (if they are not already in memory) and processed before the system finishes the display of the previous frame.
A hiccup-free display might dictate the simultaneous retrieval of disk pages that reside on the same disk. To resolve this conflict, the scheduler might prefetch all but one of these disk pages during previous time intervals. However, prefetching disk pages requires extra memory. Hence, data placement techniques should minimize the number of simultaneously retrieved disk pages assigned to the same disk drive. The main objective of the data placement, memory management, and scheduling techniques is to satisfy the continuous display constraints. Therefore, we shall evaluate them based on their effectiveness in avoiding hiccups.
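A minimal sketch of such an interval-based schedule follows. It assumes each disk can service one page per time interval, serves the tightest deadlines first, and prefetches a page in an earlier interval whenever its slot on the target disk is already taken; if no earlier interval is available, the display would hiccup. All structures and names are illustrative simplifications rather than the proposed scheduler.

    # Sketch of an interval-based retrieval schedule: each disk services one
    # page per time interval, and conflicting pages are prefetched earlier.
    def build_schedule(requests, num_disks):
        """requests: list of (page_id, disk, deadline_interval) tuples.

        Returns {interval: [page ids retrieved in that interval]} or raises if
        a page would have to be fetched before interval 0 (i.e., a hiccup).
        """
        busy = {}                                  # (interval, disk) -> page id
        schedule = {}
        # Serve the tightest deadlines first.
        for page, disk, deadline in sorted(requests, key=lambda r: r[2]):
            t = deadline
            while t >= 0 and (t, disk) in busy:    # prefetch if the slot is taken
                t -= 1
            if t < 0:
                raise RuntimeError("cannot meet deadline for page %s" % page)
            busy[(t, disk)] = page
            schedule.setdefault(t, []).append(page)
        return schedule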
References

[All83] James F. Allen. Maintaining knowledge about temporal intervals. Communications of the ACM, November 1983.
[BGMJ94] S. Berson, S. Ghandeharizadeh, R. Muntz, and X. Ju. Staggered Striping in Multimedia Information Systems. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 79-90, June 1994.
[CABK88] G. Copeland, W. Alexander, E. Boughter, and T. Keller. Data placement in Bubba. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 100-110, 1988.
[CH91] J. R. Cheng and A. R. Hurson. Effective clustering of complex objects in object-oriented databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 22-31, 1991.
[CL93] H-J. Chen and T. D. C. Little. Physical storage organizations for time-dependent multimedia data. In Proceedings of the 1993 Foundations of Data Organization and Algorithms (FODO) Conference, October 1993.
[Coh92] M. F. Cohen. Interactive spacetime control for animation. In Proceedings of SIGGRAPH, pages 293-302, 1992.
[Den68] Peter J. Denning. The working set model for program behavior. Communications of the ACM, 11(5):323-333, May 1968.
[DG90] D. J. DeWitt and J. Gray. Parallel database systems: The future of database processing or a passing fad? SIGMOD Record, 19(4):104-112, December 1990.
[Enc92] The Software Toolworks Multimedia Encyclopedia. Software Toolworks Incorporated, 1992.
[FAC93] F. Arman, A. Hsu, and M-Y. Chiu. Image processing on compressed data for large video databases. In Proceedings of ACM Multimedia, pages 267-272, 1993.
[FvDFH90] J. D. Foley, A. van Dam, S. K. Feiner, and J. F. Hughes. Computer Graphics: Principles and Practice. Addison-Wesley, 1990.
[Gal91] Didier Le Gall. MPEG: A video compression standard for multimedia applications. Communications of the ACM, 34(4):47-58, April 1991.
[GBT94] S. Gibbs, C. Breiteneder, and D. Tsichritzis. Data Modeling of Time-based Media. In Proceedings of the ACM SIGMOD International Conference on Management of Data, May 1994.
[GCEMJ93] S. Ghandeharizadeh, H. Chan, M. Escobar-Molano, and X. Ju. On configuring hierarchical multimedia storage managers. Technical Report USC-012-93, University of Southern California, 1993.
[GCKL93] S. Ghandeharizadeh, V. Choi, C. Ker, and K. Lin. Design and implementation of the Omega object-based system. In Proceedings of the Fourth Australian Database Conference, February 1993.
[GDS95] S. Ghandeharizadeh, A. Dashti, and C. Shahabi. A Pipelining Mechanism to Minimize the Latency Time in Hierarchical Multimedia Storage Managers. In Data Communications, 1995.
[GR93] S. Ghandeharizadeh and L. Ramos. Continuous retrieval of multimedia data using parallelism. IEEE Transactions on Knowledge and Data Engineering, 1(2), August 1993.
[GS93] S. Ghandeharizadeh and C. Shahabi. Management of Physical Replicas in Parallel Multimedia Information Systems. In Proceedings of the 1993 Foundations of Data Organization and Algorithms (FODO) Conference, October 1993.
[GW92] Rafael C. Gonzalez and Richard E. Woods. Digital Image Processing. Addison-Wesley, 1992.
[GWLZ94] S. Ghandeharizadeh, D. Wilhite, K. Lin, and X. Zhao. Object placement in parallel object-oriented database systems. IEEE Data Engineering, February 1994.
[Han93] D. Hancock. Virtual prototyping. IEEE Spectrum, pages 34-39, October 1993.
[ISK+93] H. Ishikawa, F. Suzuki, F. Kozakura, A. Makinouchi, M. Miyagishima, Y. Izumida, M. Aoshima, and Y. Yamane. The model, language, and implementation of an object-oriented multimedia knowledge base management system. ACM Transactions on Database Systems, 18(1):1-50, March 1993.
[JK94] W. J. McIver Jr. and R. King. Self-adaptive, on-line reclustering of complex object data. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 407-418, 1994.
[LG93] T. D. C. Little and A. Ghafoor. Interval-based conceptual models for time-dependent multimedia data. IEEE Transactions on Knowledge and Data Engineering, 5(4), August 1993.
[Mac92] MacroMind Director. MacroMind Inc., 1992.
[OOW93] E. J. O'Neil, P. O'Neil, and G. Weikum. The LRU-K page replacement algorithm for database disk buffering. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 297-306, May 1993.
[OT93] E. Oomoto and K. Tanaka. OVID: Design and implementation of a video-object database system. IEEE Transactions on Knowledge and Data Engineering, 5(4):629-643, August 1993.
[Shi81] D. Shipman. The Functional Data Model and the Data Language DAPLEX. ACM Transactions on Database Systems, 6(1):140-173, 1981.
[SW94] G. A. Schloss and M. J. Wynblatt. Building temporal structures in a layered multimedia data model. In Proceedings of ACM Multimedia, pages 271-278, October 1994.
[TN91] M. Tsangaris and J. Naughton. A stochastic approach for clustering in object bases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 12-21, May 1991.
[TT90] Nadia Magnenat Thalmann and Daniel Thalmann. Computer Animation: Theory and Practice. Springer-Verlag, 1990.
[Ull88] Jeffrey D. Ullman. Principles of Database and Knowledge-Base Systems, Volume I. Computer Science Press, 1988.
[Wal93] J. Walton. Get the picture - new directions in data visualization. In Animation and Scientific Visualization: Tools and Applications, pages 29-36. Academic Press, 1993.
[WK88] A. Witkin and M. Kass. Spacetime constraints. In Proceedings of SIGGRAPH, pages 159-168, 1988.
[WW94] L. Weitzman and K. Wittenburg. Automatic presentation of multimedia documents using relational grammars. In Proceedings of ACM Multimedia, pages 443-451, October 1994.
[WZS93] G. Weikum, P. Zabback, and P. Scheuermann. Dynamic file allocation in disk arrays. In Proceedings of the 1993 Foundations of Data Organization and Algorithms (FODO) Conference, October 1993.