
Heuristics for Optimizing Multi-clip Queries in Video Databases

Harald Kosch¹, Ahmed Mostefaoui², László Böszörményi¹, Lionel Brunie³

¹ Institute of Information Technology, University Klagenfurt, Austria; email: harald(laszlo)@itec.uni-klu.ac.at
² Computer Science Lab Franche-Comté, University of Montbéliard, France; email: [email protected]
³ Information Systems Engineering Laboratory, National Institute of Applied Sciences Lyon, France; email: (Lionel.Brunie)@insa-lyon.fr

February, 2004

Abstract. In this paper we address the multi-clip query optimization problem, where a multi-clip query requests multiple video clips. We propose a new heuristic called Restricted Search Interval that maximizes clip sharing between queries and consequently reduces the network bandwidth a video server needs in a multicast system. An adaptation of our heuristic for optimizing the response time of the query is also presented. The experimental results show that the suggested heuristic reduces the server workload by about 28% on average in comparison to a classical heuristic approach.

Keywords: Video Databases, Video Server, Multi-Clip Queries, Piggybacking.

1. Introduction

Many recent multimedia applications use the multi-clip query paradigm to display multiple video clips. In this paradigm, the result of a query is a set of continuous objects (audio or video) that need to be retrieved from a video server and delivered or presented to the user. In addition, many applications tend to request the same clips, so-called hot clips, in peak request hours. For instance, in News-on-Demand the top news of the day is in high demand in the evening hours. Such typical request scenarios have been exploited to improve buffer management (Kosch et al., 2002) or disk bandwidth in video servers (Min-You Wu, 2001). However, optimization of the network and disk bandwidth of a video server has not been considered yet. In this context, this paper investigates how the admission control of a video server may benefit from a multicast-enabled network. The main idea is not to serve the same clip demand more than once. The work is motivated by the fact that multicast network systems are technologically mature (Wittmann and Zitterbart, 2000). For instance, the virtual network layer MBone has been in use for many years.

© 2004 Kluwer Academic Publishers. Printed in the Netherlands.

koschmtap234-01.tex; 21/02/2004; 20:28; p.1


Moreover, multicast systems are supported by network protocols, like IP Multicast, which enable sources to send a single copy of a video to multiple recipients who want to receive it (Diot et al., 2000). Using multicast for multiple video delivery is more efficient than requiring the source to send an individual video to each requester, in which case the number of receivers is limited by the bandwidth available to the sender. In our theoretical framework (Section 4), we assume that the server has a maximum available bandwidth sharable among the clips. This abstraction allows us, first, to formalize the problem independently of any physical architecture and, second, to provide general approaches to optimize the external network bandwidth as well as the I/O workload. The focus of our study is multi-clip queries, whose use is driven by a wide range of Video-on-Demand applications including:
(a) Customized News-On-Demand (Jiang et al., 1999), where users formulate queries like: "show me the news clips of the day". The result of such a query includes all news clips (politics, sport, economics, etc.) related to that day. A user may even ask for specific news like: "show me all the highlights of the basketball matches of the weekend".
(b) Tele-Learning (Zhang and Gollapudi, 2000; Megzari et al., 2002), which aims to deliver instructional video material to individual users. An example is the research channel on-demand video archive at http://www.researchchannel.com. This archive is supported by more than 50 universities and research organizations. For instance, a computer science student preparing for an exam on commercial image retrieval systems may issue the query: "show me all clips explaining the IBM QBIC System".
(c) Specialized Video Archives (Kosch et al., 2001), where users may submit queries for decision-making. For instance, let us assume a video archive storing soccer clips of a national league.
Then a soccer trainer preparing his team for the penalty shoot-out of a cup final may rely on this archive by submitting a query like: "give me all clips showing a penalty shot by one player of the next adversary".
(d) Video Editing (Anderson, 1997), where an object is composed of a number of clips with strict temporal relationships between them. Hence, delivering an object is synonymous with delivering the various clips that compose that object.

It is a common experience that the management of multi-clip queries poses a number of problems for the server. First, there can be a number of possible delivery scenarios for a submitted multi-clip query¹. This is due to the presentation flexibility permitted by many applications. For example, a presentation of 3 clips with no ordering constraints has 6 different possible delivery scenarios. In general, a presentation of n clips has n! different delivery possibilities. The task of the server is therefore to find the optimal delivery scenario according to a given optimization metric. Secondly, applications may specify complex presentation scenarios by imposing structural and temporal constraints. For instance, documents written in the Synchronized Multimedia Integration Language (SMIL) (W3C, 2001)² allow the specification of precedence (i.e., one clip has to be delivered after another one) together with minimum and maximum tolerable delays between two clips to be presented sequentially. For example, a simple SMIL document can specify the constraint that video2.mpeg has to be presented 5 seconds after video1.mpeg.

There exist many different types of complex presentation compositions. In video editing applications, it is generally required that the clips of a presentation be ordered (Meng et al., 1999). In customized Video-On-Demand applications, clips can be partially ordered (Bouras et al., 1999; Raymond and Paul, 1998). For example, in a Tele-Learning system, concepts contained in different clips may build on each other. Let us suppose that the three clips c1, c2, c3 contain concepts A, B, and C, and let us further assume that A may build on the two other concepts B and C. A student requesting these three clips will probably declare a precedence dependency between B and A, and between C and A. He/she will, however, not care whether B or C is delivered first. Partial ordering in News-on-Demand can be imposed by user preferences, for example by a user preferring to receive political news before sports news.
Finally, there is generally no clip ordering in electronic commerce applications. For example, a user asking a company for twenty clips of the latest soul music will in general not impose delivery constraints when making his/her selection.

¹ The terms multi-clip query and presentation are used interchangeably in the remainder of the paper.
² SMIL 2.0 was released in August 2001. The popularity of SMIL is underlined by the numerous tools that have appeared recently (see http://www.w3.org/TR/smil20/).
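The effect of precedence constraints on the number of delivery orderings can be illustrated with a small Python sketch. It is only an illustration of the Tele-Learning example above, not part of the proposed heuristics; the function name `valid_orderings` and the use of the concept names A, B, C as clip identifiers are assumptions made for this example.

```python
from itertools import permutations

def valid_orderings(clips, precedence):
    """Return every ordering of `clips` that respects all (before, after) pairs."""
    result = []
    for order in permutations(clips):
        position = {clip: idx for idx, clip in enumerate(order)}
        # An ordering is kept only if each predecessor appears before its successor.
        if all(position[a] < position[b] for a, b in precedence):
            result.append(order)
    return result

# Tele-Learning example: concept A builds on B and C, so B and C precede A.
clips = ["A", "B", "C"]
unconstrained = valid_orderings(clips, [])
constrained = valid_orderings(clips, [("B", "A"), ("C", "A")])

print(len(unconstrained))  # all 3! orderings
print(constrained)         # only the orderings that deliver A last
```

Without constraints all 3! = 6 orderings are admissible; with the two precedence pairs only the two orderings that end with A remain, which is the partial ordering described in the text.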


The server must adhere to the ordering requirements when delivering clips of submitted presentations. The constraints that can be imposed by users and applications on a presentation fall into two categories (Shahabi et al., 1998b):
1. precedence constraints, related to the ordering of clips when delivering them;
2. delay constraints, related to the waiting time that users/applications can tolerate.

We distinguish the following delay constraints imposed on the server:
Max Startup: the maximum waiting time an application can tolerate between the time the query is submitted and the time the first clip is delivered.
Max Delay: the maximum waiting time between the delivery of two successive clips.
Min Delay: the minimum waiting time imposed on the server for the delivery of two successive clips. This last constraint is necessary in video editing applications, for example, where the processing of clips is rather costly in terms of computing and storage resources. It is therefore crucial for such applications not to be forced to receive successive clips too early.

Example: To illustrate the presentation optimization problem, let us consider the following example with three presentations $P_1$, $P_2$ and $P_3$, such that:
$P_1 = \{(c_1, 15, 1.5), (c_2, 10, 3), (c_3, 15, 1.5)\}$
$P_2 = \{(c_4, 15, 3), (c_5, 10, 1.5), (c_6, 15, 1.5), (c_7, 10, 1.5)\}$
$P_3 = \{(c_8, 25, 1.5), (c_6, 15, 1.5), (c_4, 15, 3)\}$
Each tuple contains the clip identity $c_i$, its length and its delivery rate, respectively (e.g., in presentation $P_1$, clip $c_1$ has a length of l = 15 s and a delivery rate of r = 1.5 Mb/s). For each presentation we further specify the following delay and precedence constraints:

      Max Startup   Max Delay   Min Delay   Precedence
P1    25 s          10 s        0 s         {(c1, c2)}
P2    20 s          20 s        0 s         (none)
P3    5 s           15 s        0 s         {(c8, c6)}

We suppose that the server has an available network bandwidth of 3 Mb/s. Figure 1 then shows an optimal schedule for the clips of the three submitted presentations with respect to the Workload metric, i.e., there exists no other valid schedule with a smaller mean workload. For every clip, the optimizer assigns a start-time at which the clip is to be delivered. Note that some clips are requested by more than one presentation. For example, clip c6 is requested simultaneously by presentations P2 and P3. If no constraint violation is encountered, clip c6 can therefore be shared between P2 and P3. Such sharing is called "piggybacking".

Figure 1. Example schedule of the presentations. (Axes: Time (s), 0 to 80; Server Bandwidth (Mb/s), 0 to 3; clips C1 to C8 drawn as bars.)
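As a minimal sketch of how piggybacking candidates could be detected, the following Python fragment encodes the three example presentations and lists the clips requested by more than one of them. The dictionary layout and the function name `shareable_clips` are assumptions made for illustration; whether a candidate can actually be shared still depends on the delay, precedence and bandwidth constraints.

```python
from collections import defaultdict

# (clip id, length in s, delivery rate in Mb/s), copied from the example above.
presentations = {
    "P1": [("c1", 15, 1.5), ("c2", 10, 3.0), ("c3", 15, 1.5)],
    "P2": [("c4", 15, 3.0), ("c5", 10, 1.5), ("c6", 15, 1.5), ("c7", 10, 1.5)],
    "P3": [("c8", 25, 1.5), ("c6", 15, 1.5), ("c4", 15, 3.0)],
}

def shareable_clips(presentations):
    """Map each clip requested by several presentations to those presentations."""
    requested_by = defaultdict(list)
    for name, clips in presentations.items():
        for clip_id, _length, _rate in clips:
            requested_by[clip_id].append(name)
    # Only clips requested more than once are candidates for piggybacking.
    return {c: ps for c, ps in requested_by.items() if len(ps) > 1}

shared = shareable_clips(presentations)
print(shared)
```

For the example data the candidates are c4 and c6, both requested by P2 and P3.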

In many multimedia applications, including those mentioned above, a subset of clips is requested more frequently than the rest of the data (these clips are "hot"). For instance, in News-On-Demand applications, clips from the current day are usually requested far more often than those from previous days. Some of today's clips contain top news and will be demanded simultaneously at peak access times. In Tele-Learning applications, clips explaining core problems of a course are "hot" in the preparation period before exams. In such periods, these clips are likely to be requested simultaneously by many students. Hence, piggybacking, whenever possible, is beneficial because shared clips require no additional server resources. This increases the throughput of the server and consequently allows it to support more simultaneous presentations. In this paper, we concentrate on how to maximize the effect of piggybacking in scheduling presentations.

The rest of the paper is organized as follows. Section 2 describes how the proposed optimization algorithms in the video server relate to an end-to-end system. Section 3 presents previous work on the multi-clip optimization problem. Section 4 outlines the research problem tackled in this paper and introduces the necessary definitions. Section 5 presents the suggested heuristics. In Section 6, the effectiveness of the proposed heuristics is evaluated through a series of experiments. Section 7 outlines future work and concludes the paper.

2. Multi-Clip Query Optimization and End-to-End Systems

Multi-clip queries are related to two broad classes of applications. In the first class, the multimedia presentations are an integral part of a multimedia database and the multi-clip query to the video server is issued from the database. In the second class, the multi-clip query is composed at the client application. A brief explanation of these classes follows.

1. Multimedia presentations can be an integral part of a multimedia database system, i.e., users are able to store, query, and possibly manipulate multimedia presentations using a single database management system. Moreover, as a result of demands from users, video servers need the capability to serve, possibly over a network, not only individual video streams (i.e., video-on-demand), but also presentations (i.e., presentation-on-demand) (Oria et al., 1999; Adali et al., 1999; Lee et al., 2000; Prabhakaran, 2000).

Figure 2. End-to-end architecture of a multi-clip application involving a multimedia database. (Diagram: two clients connected over a multicast (MCAST) network to a multimedia database with query optimization and processing, and to a video server with resource scheduling, admission control, and a buffer cache. The client queries "Give me the presentation showing the highlights of today's sport events" and "Give me the presentation of today's basketball events" lead to the presentations P={(c1,15,1.5),(c2,10,1.5)} and P={(c1,15,1.5)}; clip c1 is sent to both clients.)

Figure 2 shows the end-to-end architecture of such an application. The client queries the database for a stored multimedia presentation. Support is given by presentation query languages or visual browsing tools, e.g., the GVISUAL browsing and query tool and the GOQL query language proposed in (Sheng et al., 1999). Once a presentation has been selected for delivery, the database contacts the video server to deliver the clips under the constraints specified by the presentation. The system is multi-user enabled, i.e., different clients may request videos simultaneously. Many video-on-demand applications follow a request pattern in which some clips are demanded far more often than others. It is therefore sensible to deliver a clip requested by simultaneously, or nearly simultaneously, submitted queries only once. With the help of a multicast-enabled network (MCAST), as shown in Figure 2, the video server may deliver the same clip (c1 in Figure 2) to different users without spending additional network bandwidth. This bandwidth sharing is exploited by the heuristics proposed in this paper.
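The saving obtained by multicasting a shared clip can be made concrete with a short calculation for the two presentations of Figure 2. The sketch below is illustrative only: the function name `delivered_volume` is hypothetical, and it assumes that the deliveries of a shared clip can be aligned in time so that one multicast stream serves all requesters.

```python
def delivered_volume(presentations, multicast):
    """Total data (in Mb) the server sends for the given presentations."""
    total = 0.0
    already_sent = set()
    for clips in presentations:
        for clip_id, length_s, rate_mbps in clips:
            if multicast and clip_id in already_sent:
                continue  # piggybacked on a stream that is sent anyway
            already_sent.add(clip_id)
            total += length_s * rate_mbps
    return total

# The two presentations of Figure 2: (clip id, length in s, rate in Mb/s).
queries = [
    [("c1", 15, 1.5), ("c2", 10, 1.5)],
    [("c1", 15, 1.5)],
]
unicast_mb = delivered_volume(queries, multicast=False)
multicast_mb = delivered_volume(queries, multicast=True)
print(unicast_mb, multicast_mb)
```

Under these assumptions the server sends 60 Mb when each client is served individually, and 37.5 Mb when clip c1 is multicast once to both clients.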


2. Multimedia presentations are composed at the client application, i.e., the user requests multiple multimedia objects from a multimedia database, and the request can contain complex structural and temporal constraints defined by QoS requirements (Zhang and Gollapudi, 2000). These can be application specific (e.g., video editing (Anderson, 1997)) or user specific (e.g., sequential presentation of the news highlights of the day (Raymond and Paul, 1998)). Again, the multimedia server should have the capability to serve the presentations consistently, i.e., the delivered presentations must satisfy the specified constraint parameters.

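Figure 3. End-to-end architecture of a multi-clip application without a multimedia database. (Diagram: two clients connected over a multicast (MCAST) network to a video server with resource scheduling, admission control, and a buffer cache; the presentations P={(c1,15,1.5),(c2,10,1.5)} and P={(c1,15,1.5)} both request clip c1.)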

Figure 3 shows the end-to-end architecture of such an application. In contrast to the first class of applications, the client composes the constraints for the presentation. This kind of application is driven by well-accepted standards, like SMIL, which enable simple authoring of interactive audiovisual presentations. Assuming a multi-user system as before, the multi-clip query looks much the same to the video server, except that it is delivered from the client and not generated by the database. Once again, the multicast-enabled network is the basis for successful network bandwidth sharing.

3. Related Work

The problem of optimizing multi-clip queries in video databases is related to the problem of multiple-query optimization and processing in traditional relational databases. The main goal of multiple-query processing is to optimize a set of queries together and to execute the common operations once (Chen and Dunham, 1998). This has been an active research area over the last fifteen years. Sellis has shown in (Sellis, 1988) that the problem is NP-complete and that a substantial reduction in response time over single-query execution can be obtained. This inspired us to examine the possible benefits of sharing objects in multi-clip query processing. However, the techniques employed in multiple-query optimization do not apply to the problem of optimizing multi-clip queries. The reason is the fundamental difference in the nature of the constraints imposed in the two classes of applications. Multi-clip queries impose precedence, delay, and resource constraints, whereas multiple relational queries have their "constraints" specified in the query formulation, i.e., in the dependence of the operations executed on relations. The aim of a multiple-query optimizer is therefore to share sub-expressions as much as possible between queries (Chen and Dunham, 1998), whereas a multi-clip optimizer directs its optimization at a given metric (see Section 4.3). For instance, for the Response Time metric, sharing clips from previously submitted presentations with the smallest start-time is preferable, while for the Workload metric, sharing as many heavy clips (where length × delivery rate is high) as possible is preferable.

Only recently has work started on the optimization and processing of queries on multimedia presentations stored in distributed database systems (Adali et al., 1999; Lee et al., 2000; Prabhakaran, 2000). The constraints considered are, for instance, sequential or concurrent playout of corresponding streams during presentation. Multimedia presentations are modeled as presentation graphs. Lee et al. (Lee et al., 2000) have worked on the conception of the presentation graph model and on the development of a query language to query and manipulate presentation graphs. Adali et al.
(Adali et al., 1999) have extended the work of (Lee et al., 2000) by accounting for interaction, and have defined operations for combining presentations from multiple databases into a single presentation. Furthermore, they proposed a query optimization framework, including a query algebra. These works (Adali et al., 1999; Lee et al., 2000) allow users to query the presentation database for content and structure and enable them to build new presentations based on the query results. However, they are not concerned with presentation delivery management. B. Prabhakaran (Prabhakaran, 2000) discusses different approaches to adapt multimedia presentations when resources cannot guarantee real-time requirements (on the server and the client side). He introduces the concept of flexible temporal specification to allow efficient adaptation (of structure and content) of the delivery. The possible benefit of piggybacking is, however, not considered. S.T. Campbell et al. (Campbell and Chung, 2002) have recently presented a methodology for delivering multimedia objects from a multimedia database system that maintains the temporal ordering requirements specified by a client. They rely on well-known disk scheduling policies, like scan, and perform disk block prefetching for requests to be delivered. However, this work does not consider the possible profit of piggybacking for multiple submitted presentations.

An important contribution to presentation delivery management comes from the works in (Huang et al., 1998; Johnson and Zhang, 1999; Song et al., 1999). These works aim to keep the delivery of the presentations consistent (i.e., the delivered presentations satisfy the specified constraint parameters). Huang et al. (Huang et al., 1998) have designed effective strategies to dynamically adjust to rate variations in the transmission of the data. Zhang et al. (Johnson and Zhang, 1999; Song et al., 1999) have developed strategies to ensure the consistent play-out defined in the presentations. Bouras et al. (Bouras et al., 1999) have developed efficient buffer management algorithms to smooth presentation and synchronization anomalies. However, the impact of presentation constraints on resource management in video databases has so far been little exploited. In a previous work (Kosch et al., 2002) we have proposed a novel prefetching strategy based not only on run-time information (access frequencies of individual video objects, for example) but also on knowledge about the clip structures. Thus, we give a competitive edge to presentations submitted in the future that demand already buffered (hot-accessed) clips. Balkir and Özsoyoglu (Balkir and Özsoyoglu, 1998a; Balkir and Özsoyoglu, 1998b) have developed algorithms for buffer management and admission control with respect to different application domains. Neither of these works (Kosch et al., 2002; Balkir and Özsoyoglu, 1998a) considers any delivery constraints. Golubchik et al. (Lau et al., 1998) have studied the effectiveness of piggybacking for reducing I/O demand in Video-on-Demand systems.
They have found that a small variation in the delivery rate of streams can enable enough merging (piggybacking) of I/O streams to realize a significant reduction of I/O bandwidth. However, they do not consider multi-clip queries.

The works in (Raymond and Paul, 1998; Shahabi et al., 1998a; Garofalakis et al., 1998) address the problem of optimizing more complex queries in video databases. Shahabi et al. (Shahabi et al., 1998a) have proposed an excellent formulation of the problem by defining the set of constraints that a multimedia application can impose on a presentation. However, they do not consider the potential benefit of piggybacking. Garofalakis et al. (Garofalakis et al., 1998) have provided a near-optimal scheduling algorithm based on Graham's list-scheduling for composite multimedia objects of different length and rate. However, they do not include any delay constraint, nor do they consider piggybacking. Raymond and Paul (Raymond and Paul, 1998) make the simplification that all clips in the database have the same rate and duration. They then show that optimizing multi-clip queries is the same as finding a maximum matching in a bipartite graph. In our work, we consider the potential benefit of piggybacking without making reductionist assumptions on either the rate or the duration of the clips.

Finally, we remark that our problem shares similarities with sequencing and scheduling problems as described in (El-Rewini et al., 1994). These problems cover task scheduling with or without weights, deadlines, and constraints on resources and waiting queues. However, the proposed solutions are not applicable (as also pointed out by Raymond and Paul in (Raymond and Paul, 1998)), because they do not take piggybacking into account. The key complication introduced by piggybacking (where it can be applied) is that the resource constraints may change over time.

4. The Problem

The research problem that we study in this paper is how to efficiently find an optimal or near-optimal schedule of a presentation's clips that maximizes the benefits of piggybacking. In other words, given a submitted presentation with its constraints, the optimizer must find an optimal schedule of the presentation's clips in a reasonable time such that neither the presentation constraints nor the network and server constraints are violated. The latter assumes that the server has enough available resources to sustain the requirements of all supported presentations. This implies that for every newly submitted presentation, the server checks whether or not it can be admitted (admission control). Deriving admission criteria for a presentation is a complex task and is highly dependent on the physical characteristics of the server (Jiang and Mohapatra, 1999) (size of the buffer, disk bandwidth, striping technique used, etc.). The purpose of this paper is not to address a particular server architecture, but to propose a general framework for the multi-clip optimization problem. For this reason, we assume that the server has a maximum available bandwidth sharable among the clips (i.e., at any instant the maximum number of simultaneously delivered clips is limited). We also assume that a multicast-enabled network is available so that the same clip can be sent to different clients (please refer to Section 2 for our distributed environment). Furthermore, we assume that the clips of the same presentation do not overlap (linearity constraint). The task of the optimizer is then to assign, for every clip of the submitted presentation, a start-time with respect to all defined constraints (i.e., precedence, delay, linearity, and server bandwidth constraints).

4.1. Notations and Definitions

Table I. Notations used.

Symbol : Meaning
$C$ : the set of clips queued in the server and not yet delivered.
$|C|$ : cardinality of $C$.
$c_i$ : the identifier of a clip ($c_i \in C$).
$l_i$ : length of clip $c_i$ in time units.
$b_i(t)$ : delivery bandwidth required for clip $c_i$ at time $t$.
$s_i$ : start-time of clip $c_i$ in time units.
$P$ : identifier of a presentation.
$V_P$ : set of clips in a presentation $P$.
$E_P$ : set of precedence constraints in a presentation $P$.
$(i, j)$ : precedence constraint: $c_i$ precedes $c_j$.
$S_k$ : identifier of a schedule.
$Succ_{S_k}(i, P)$ : function which returns the identifier of the clip which is the successor of $c_i$ in the schedule $S_k$ of presentation $P$.
$\phi_{S_k}(P)$ : function which returns the start-time of the first clip in the schedule $S_k$ of presentation $P$.
$Work_{S_k}(P)$ : workload in Mb/s introduced into the server by the presentation $P$ for the schedule $S_k$.
$Bandwidth(t)$ : available server bandwidth at time $t$.
$\ell_{S_k}(P)$ : length of a presentation $P$ for a schedule $S_k$ in time units. It is computed as the sum of the lengths of all participating clips plus the sum of the waiting times between two successive participating clips.

Definition 1. (Presentation) A presentation $P$ submitted to the server is composed of the following components:
• a set $V_P$ of $n$ clips, $V_P = \{c_1, \ldots, c_n\}$, such that $\forall i\ (1 \le i \le n)$, $c_i \in C$ and $n \le |C|$;
• a set $E_P$ of precedence constraints, $E_P = \{\ldots, (i, j), \ldots\}$, such that $(i, j)$ means that the clip $c_i$ precedes the clip $c_j$ and $c_i, c_j \in V_P$;
• the constraints $\mathit{Max\ Startup}$, $\mathit{Max\ Delay}$ and $\mathit{Min\ Delay}$, i.e., the maximum tolerable start-time of the schedule, and the maximum and minimum tolerable delays between two clips in a schedule, respectively.

A valid schedule for a submitted presentation $P$ is an assignment of a start-time to each $c \in V_P$ such that none of the assignments violates the following constraints.

Definition 2. (Valid Schedule) A schedule $S_k$ with respect to the submitted presentation $P$, with $V_P = \{c_1, \ldots, c_n\}$, is said to be valid iff:
• Resource constraint: $\forall t \in [\phi_{S_k}(P), \phi_{S_k}(P) + \ell_{S_k}(P)]$ and $\forall i \in [1, n]$: if $s_i \le t \le s_i + l_i$ then $b_i(t) \le Bandwidth(t)$. This condition formalizes the constraint on the available bandwidth (disk and network), i.e., a schedule can only be valid if the delivery bandwidth required by each clip of the presentation does not exceed the available bandwidth while the clip is delivered.
• Precedence constraint: $\forall (i, j) \in E_P$: $s_i + l_i < s_j$. This condition is related to the ordering of clips when delivering them.
• Linearity constraint: $\nexists\, i, j\ (1 \le i, j \le n,\ i \ne j)$: $s_i \le s_j \le s_i + l_i$. This constraint specifies that no two clips of the same presentation may overlap.
• Delay constraints: $\phi_{S_k}(P) \le \mathit{Max\ Startup}$, and $\forall i \in [1, n]$: if $j = Succ_{S_k}(i, P)$ then $\mathit{Min\ Delay} \le s_j - (s_i + l_i) \le \mathit{Max\ Delay}$. These constraints are related to the waiting times that users/applications can tolerate.
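The four conditions of Definition 2 can be turned into a small validity check. The following Python function is a sketch under stated assumptions, not the authors' implementation: time is discretized to integer units, the presentation is submitted at time 0, each clip has a constant rate, and a clip is taken to occupy the half-open interval [s_i, s_i + l_i), so that a successor may start exactly when its predecessor ends if Min Delay is 0 (Definition 2 as printed uses the inequalities shown above). The names `is_valid_schedule` and `available` are hypothetical.

```python
def is_valid_schedule(clips, start, precedence, max_startup, min_delay,
                      max_delay, available):
    """clips: {clip_id: (length, rate)}; start: {clip_id: start time};
    precedence: iterable of (before, after) ids; available: t -> Mb/s."""
    order = sorted(clips, key=lambda c: start[c])
    end = {c: start[c] + clips[c][0] for c in clips}

    # Resource constraint: each clip's rate fits the bandwidth while it plays.
    for c in clips:
        rate = clips[c][1]
        if any(rate > available(t) for t in range(start[c], end[c])):
            return False

    # Precedence constraint: the predecessor ends before the successor starts.
    if any(end[a] > start[b] for a, b in precedence):
        return False

    # Delay constraint on the first clip (submission time assumed to be 0).
    if start[order[0]] > max_startup:
        return False

    # Linearity and delay constraints on successive clips.
    for prev, nxt in zip(order, order[1:]):
        gap = start[nxt] - end[prev]
        if gap < 0 or not (min_delay <= gap <= max_delay):
            return False
    return True

# Presentation P1 of the example in Section 1, with 3 Mb/s available throughout.
p1 = {"c1": (15, 1.5), "c2": (10, 3.0), "c3": (15, 1.5)}
back_to_back = {"c1": 0, "c2": 15, "c3": 25}
ok = is_valid_schedule(p1, back_to_back, [("c1", "c2")], 25, 0, 10, lambda t: 3.0)
print(ok)
```

Because the clips are sorted by start-time, checking that successive clips do not overlap is sufficient to rule out overlaps between any pair of clips.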


4.2. Number of Valid Schedules

The number of valid schedules depends on the number of possible orderings of clips $c \in V_P$. Let us consider the maximum number of orderings, i.e., when no precedence constraints are imposed. Then there exist $n!$ orderings of clips $c \in V_P$, i.e., all possible permutations of clips $c \in V_P$, where $n$ is the cardinality of $V_P$. The complexity of the problem, that is, the number of valid schedules, evaluates as the sum of possible assignments of all clips for each ordering. Let us further suppose that no resource constraints are imposed (i.e., sufficient server bandwidth is available) and that the time unit is 1 s. Then the first clip of each ordering can be placed at $(\mathit{Max\ Startup} + 1)$ slots. The second clip can be placed at $(\mathit{Max\ Delay} - \mathit{Min\ Delay} + 1)$ slots, and each of the remaining clips of the ordering can likewise be placed at $(\mathit{Max\ Delay} - \mathit{Min\ Delay} + 1)$ slots. Thus, the number of valid schedules evaluates as $n! \cdot (\mathit{Max\ Startup} + 1) \cdot (\mathit{Max\ Delay} - \mathit{Min\ Delay} + 1)^{n-1}$. This number of possible schedules is very high even for a small number of clips in a presentation. For instance, let us consider a presentation containing 10 clips and delay constraints set to $\mathit{Max\ Startup} = 5$ s, $\mathit{Max\ Delay} = 3$ s, $\mathit{Min\ Delay} = 0$ s. The number of possible schedules then evaluates to $10! \cdot 6 \cdot 4^{9} \approx 5.71 \times 10^{12}$.

4.3. Metric

The multi-clip optimizer has to direct its optimization at a given metric. We propose to consider two metrics relevant to content-based video retrieval (e.g., News-on-Demand applications or video archives). These are the server workload and the response time. The Workload metric measures the mean network and server bandwidth in Mb/s, and the Response Time metric measures the time difference (in seconds) between the presentation submission and the delivery of the first clip. Minimizing Response Time is of interest for single user requests, whereas minimizing Workload is of interest for a higher request throughput.
We consider both metrics individually for our optimization problem, as is also the case for complex database query processing. Therefore, we first propose a heuristic to minimize the Workload of the server (RSI in Section 5.2) and then a heuristic to minimize the Response Time (mRSI in Section 5.3). We now define the two metrics as follows:

Definition 3. (Metric) Let $P$ be the currently submitted presentation. A schedule $S_{opt}$ from the set of valid schedules $\{S_1, \ldots, S_k, \ldots, S_m\}$ is said to be optimal (with respect to the metric) iff:
Workload: $\forall k\ (1 \le k \le m)$: $Work_{S_{opt}}(P) \le Work_{S_k}(P)$
Response Time: $\forall k\ (1 \le k \le m)$: $\phi_{S_{opt}}(P) \le \phi_{S_k}(P)$.
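The size of the search space derived in Section 4.2 can be evaluated directly. The short Python sketch below computes the expression exactly as it is stated there, with the exponent n − 1, for the case without precedence or resource constraints and with 1 s time slots; the function name `schedule_count` is an assumption made for illustration.

```python
from math import factorial

def schedule_count(n, max_startup, max_delay, min_delay):
    """n! * (Max Startup + 1) * (Max Delay - Min Delay + 1)^(n - 1), for n >= 1."""
    first_clip_slots = max_startup + 1          # start slots for the first clip
    later_clip_slots = max_delay - min_delay + 1  # start slots for every later clip
    return factorial(n) * first_clip_slots * later_clip_slots ** (n - 1)

# 10 clips, Max Startup = 5 s, Max Delay = 3 s, Min Delay = 0 s.
print(schedule_count(10, 5, 3, 0))  # 5707608883200, about 5.71e12
# With no slack at all, only the 3! orderings of 3 clips remain.
print(schedule_count(3, 0, 0, 0))   # 6
```

Even with the modest delay bounds of the ten-clip example, the count is in the order of 10^12, which is why an exhaustive search is not an option for an on-line scheduler.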

5. Heuristic Approaches for Optimizing Multi-Clip Queries

As mentioned before, the admission of a newly submitted presentation depends on the scheduling of the clips of that presentation. This implies that the scheduling task is performed on-line. Moreover, as the server receives several presentations concurrently, the processing time taken to compute a schedule affects not only the response time of the presentation currently being processed but also the response times of the other presentations. Here, we face a new constraint that is related to the computational time of a presentation schedule. As the number of possible valid schedules is very high even for a small number of clips in a presentation, as shown in Subsection 4.2, optimal algorithms have not been considered because of their relatively long processing time. We concentrate mainly on heuristic approaches to fulfill all the constraints, including the computational time of the schedule. In this Section, we present two new heuristics that attempt to maximize piggybacking: Restricted Search Interval (RSI) for the Workload metric and modified RSI (mRSI) for the Response Time metric. Before outlining the principles of these heuristics, we introduce the basic heuristic algorithm generally used to schedule presentation clips, called the baseline heuristic.

5.1. Baseline heuristics

The baseline scheduling heuristic is a simple and widely used form of list scheduling. The clips of a presentation submitted to the server are scheduled according to their order of presentation (where this order is determined by the precedence constraints). Each clip is scheduled as early as possible; for every clip but the first, this is the end position of the previously scheduled clip plus the smallest delay that violates neither the delay nor the resource constraints. This approach has the advantage of being simple and fast, but it does not utilize the potential benefits of piggybacking.
Let us use an example to illustrate the baseline heuristics. Let us assume that the optimizer has to schedule the presentation P1 of the example in Section 1 with the specifications: P1 = {(c1, 15, 1.5), (c2, 10, 3), (c3, 15, 1.5)}, Max Startup = 25s, Max Delay = 10s, Min Delay = 0s, and the precedence (c1, c2).


Let us further assume that the server has an available network bandwidth of 7.5Mb/s and that the actual server workload is as shown in figure 4 (all shareable clips are displayed).

Figure 4. Actual server workload at the submission time of P1. (Plot: server bandwidth in Mb/s versus time in s.)

Figure 5. Schedule of the presentation P1 generated by the baseline heuristics. (Plot: server bandwidth in Mb/s versus time in s.)

Figure 5 shows the schedule generated by the baseline heuristics. The three clips of P1 are placed one after the other with a startup time of 0s and a maximum delay in between clips of 0s. This is possible, because the server has enough bandwidth available in this time period. One clearly sees that the schedule is optimal with respect to the Response Time metric. However, as no sharing of clips from previously submitted presentations is done, the schedule is not optimal with respect to the Workload metric. The optimality with respect to the Response Time is achieved with a higher mean workload compared to schedules generated by the RSI heuristics. Thus we expect that a query optimizer implementing the baseline heuristics rejects more presentations for a high workload than one implementing the RSI heuristics. The experiments in Section 6.3 confirm this assumption and show that the rejection rate for the baseline heuristics was on the average about 1/3 higher than for the RSI (and the mRSI) under the experimental settings (5000 presentations submitted with a mean interval time equal to 1s).
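As an illustration, the baseline placement rule can be sketched in Python as follows. This is a minimal sketch and not the authors' implementation: the one-second time grid, the `load` dictionary (second mapped to Mb/s already reserved), the names `Clip`, `fits` and `baseline_schedule`, and the `busy_load` values are assumptions made for the example, and the clips are assumed to be passed in presentation order.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Clip:
    name: str
    length: int    # duration in seconds
    rate: float    # delivery rate in Mb/s

def fits(load, start, clip, capacity):
    """True if the clip can run in [start, start + length) within capacity."""
    return all(load.get(t, 0.0) + clip.rate <= capacity
               for t in range(start, start + clip.length))

def baseline_schedule(clips, load, capacity, max_startup, min_delay, max_delay):
    """Place each clip as early as possible; return {name: start} or None."""
    schedule, prev_end = {}, None
    for clip in clips:
        if prev_end is None:
            # first clip: any start up to Max Startup is allowed
            window = range(0, max_startup + 1)
        else:
            # later clips: start between Min Delay and Max Delay after the previous clip
            window = range(prev_end + min_delay, prev_end + max_delay + 1)
        start = next((t for t in window if fits(load, t, clip, capacity)), None)
        if start is None:
            return None            # a constraint cannot be met: reject
        schedule[clip.name] = start
        prev_end = start + clip.length
    return schedule

P1 = [Clip("c1", 15, 1.5), Clip("c2", 10, 3.0), Clip("c3", 15, 1.5)]
idle = baseline_schedule(P1, {}, 7.5, max_startup=25, min_delay=0, max_delay=10)
busy_load = {t: 6.0 for t in range(15, 20)}   # hypothetical: 6 Mb/s reserved during 15..19 s
busy = baseline_schedule(P1, busy_load, 7.5, max_startup=25, min_delay=0, max_delay=10)
```

With no competing load, the sketch places the three clips back to back from time 0, which corresponds to the startup time of 0s and the inter-clip delay of 0s discussed for figure 5. Note that the sketch never looks at clips already queued by other presentations, which is exactly the sharing opportunity the baseline leaves unused.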


5.2. Restricted Search Interval Heuristics (RSI)

The principle of the proposed RSI heuristics is to merge heavy clips (those for which length × delivery rate is large) with clips already queued for scheduling from previously submitted presentations. The pseudo-code in figure 6 gives an overview of RSI. It returns a valid schedule for a submitted presentation P.

    Schedule RSI(P) {
      Part A
        Construct a list PL of clips from VP such that ∀c ∈ PL : c ∈ C;
        Sort PL by decreasing weight and store back to PL;
      Part B
        While (PL not empty) {
          cp = first(PL); PL = tail(PL);
          Construct a list SL of clips from C such that ∀c ∈ SL : c = cp;
          Sort SL by increasing start time and store back to SL;
          While (SL not empty) {
            Part B1
              cs = first(SL); SL = tail(SL);
              Determine a Search Interval I for cs and P;
            Part B2
              If (∃ valid schedule for P in I) return this schedule;
          }
        }
      Apply the baseline;
    }

Figure 6. Pseudo-code explaining the main parts of the RSI heuristics.

For a submitted presentation P, the heuristics operates in two main steps, denoted by A and B in the pseudo-code:

A. Split the clips of the submitted presentation P into two clip lists, one for clips where sharing is possible with clips already queued for scheduling from previously submitted presentations (piggybacking list, denoted by PL) and one for the remaining clips. Sort the piggybacking list by decreasing weight (length × delivery rate) of the clips in order to maximize the effect of piggybacking. The resulting list is stored back in PL.

Figure 7. Construction of a search interval for the presentation P1. (Plot: server bandwidth in Mb/s versus time in s, showing the search intervals I1 and I2 and the invalid search interval I3.)

B. Go through the piggybacking list and find a valid schedule for the submitted presentation P, knowing that the clip currently being examined is shared with previously submitted presentations. Determining where such a schedule may lie is denoted by Part B1 in the pseudo-code, and the search for a valid schedule itself is denoted by Part B2; both are detailed in the following. If no valid schedule for P can be constructed for any clip in the piggybacking list (this means that piggybacking is not possible), the baseline algorithm is applied to P.

The second part of RSI, Part B, is performed as follows: we start by considering the first element cp of the piggybacking list PL (the heaviest shareable clip). A new list, SL, is constructed which contains those clips from the list of already queued clips which are identical to cp (these clips may occur in different presentations). We then consider the first clip cs of SL, which has the smallest start-time among all clips of SL. For this clip cs we determine a search interval I, defined as the maximal interval the presentation P can occupy under the given constraints (Part B1).

Figure 7 illustrates the determination of a search interval. Assume again that the optimizer has to schedule the presentation P1 of the previous example under the actual server workload as shown in figure 4. The piggybacking list PL for P1 under the actual server workload is supposed to be (c2, c1) (as length × rate of c2 is greater than that of c1). There exist two possible search intervals for P1 for the workload situation outlined in figure 7: interval I1 related to the sharing of clip c1 and interval I2 related to the sharing of clip c2. The interval I3 related to the sharing of clip c1 is not valid, because its startup time of 55s violates the Max Startup constraint of 25s. The length of a search interval depends on the precedence and delay constraints imposed on the server. Thus, for the interval I2, the clip c1 has to be placed before c2 and the clip c3 can be placed before or after c1, leading to the search interval drawn in figure 7.

If several search intervals are related to one clip in the piggybacking list, the one with the smallest start-time is chosen first for scheduling. This allows us to target the Response Time metric as the optimization subgoal.
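The order in which RSI examines sharing candidates (Part A, then the outer loops of Part B) can be sketched as follows. This is only an illustrative sketch: the function name `rsi_candidates` and the dictionary `queued` are assumptions, the start-times 25s and 55s are those quoted in the running example for the queued c1 and c2, and 70s is a made-up value standing in for the later instance of c1.

```python
def rsi_candidates(presentation, queued):
    """Yield (clip name, queued start time) in the order RSI would examine them.

    presentation: list of (name, length_s, rate_mbps) tuples
    queued: dict mapping clip name -> start times of already queued instances
    """
    # Part A: keep only shareable clips, heaviest (length x rate) first.
    pl = sorted((c for c in presentation if queued.get(c[0])),
                key=lambda c: c[1] * c[2], reverse=True)
    # Part B: for each such clip, try its queued instances by increasing start time.
    for name, _length, _rate in pl:
        for start in sorted(queued[name]):
            yield name, start

P1 = [("c1", 15, 1.5), ("c2", 10, 3.0), ("c3", 15, 1.5)]
queued = {"c1": [70, 25], "c2": [55]}   # 70 s is a made-up value for illustration
order = list(rsi_candidates(P1, queued))
```

For each yielded candidate, the actual heuristics would then determine the search interval and test it against the constraints; that step is omitted here.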

Figure 8. Branch & bound search trees for the two search intervals I1 and I2. (Plot: server bandwidth in Mb/s versus time in s, with the search tree of each interval.)

The heuristics starts by considering the search interval I2 related to the first clip c2 in the piggybacking list and tries to find a valid schedule in this search interval. If this is not possible due to a constraint violation, we consider the search interval I1 related to the second clip c1 in the piggybacking list (there is no second possible search interval related to c2). If the search for a valid schedule fails again, so that no valid schedule can be found for any of the clips in the piggybacking list, we apply the baseline algorithm.

The task of determining a valid schedule in a search interval is performed by constructing a branch & bound search tree (Part B2). The heuristics stops when the first valid schedule is found. We illustrate the principle of the search in figure 8. Let us assume that the optimizer has to schedule the presentation P1 under the same actual server workload as above. Figure 8 shows the branch & bound search trees for the two search intervals I1 (clip c1 shared) and I2 (clip c2 shared). The RSI heuristics considers the interval I2 first. The clip c1 has to be placed before c2 and the clip c3 can be placed before or after c1. This leads to the displayed search tree with two branches. The branch placing c3 after c2 is not explored, because in this case no start-time can be assigned to c1 without violating the Max Startup constraint. The branch & bound search tree constructed for I1 has three branches, as only one precedence (c1, c2) has to be taken into consideration. Clip c3 can be placed after c1, because the startup time of all schedules for this situation is 25s and therefore within the limits of Max Startup.
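The branching structure of these search trees can be reproduced with a small depth-first enumeration. The sketch below only enumerates clip orderings that respect the precedences and cuts branches through a caller-supplied predicate; it does not assign start-times or check bandwidth. The names `orderings` and `prune_i2` are hypothetical, and `prune_i2` simply encodes the statement that, for I2, placing c3 after c2 is not explored.

```python
def orderings(clips, precedences, prune=lambda partial: False):
    """Yield clip orderings that respect precedences, skipping pruned branches.

    precedences: list of (a, b) pairs meaning clip a must be placed before clip b
    prune: predicate on a partial ordering; True cuts the whole branch
    """
    def extend(partial, remaining):
        if prune(partial):
            return                      # bound: abandon this branch
        if not remaining:
            yield tuple(partial)
            return
        for clip in sorted(remaining):
            # a clip may be placed once all of its predecessors are placed
            if all(a in partial for a, b in precedences if b == clip):
                yield from extend(partial + [clip], remaining - {clip})
    yield from extend([], frozenset(clips))

def prune_i2(partial):
    # hypothetical stand-in for the Max Startup check in interval I2
    return ("c2" in partial and "c3" in partial
            and partial.index("c3") > partial.index("c2"))

clips = ["c1", "c2", "c3"]
branches_i1 = list(orderings(clips, [("c1", "c2")]))
branches_i2 = list(orderings(clips, [("c1", "c2")], prune=prune_i2))
```

With the single precedence (c1, c2), the enumeration yields three branches for I1 and, once the c3-after-c2 branch is cut, two branches for I2, matching the counts given in the text.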

Figure 9. Schedule of the presentation P1 generated by the RSI heuristics. (Plot: server bandwidth in Mb/s versus time in s.)

Figure 9 shows the schedule finally generated by the RSI heuristics. The branch & bound search starts considering the branch c3, c1, c2 of the search tree. It places c3 at the beginning of the search interval and then tries to place c1 within a waiting time interval from Min Delay of 0s to Max Delay of 10s. Only at the maximum delay time of 10s can a valid schedule be constructed (shown in figure 9). One valid schedule has then been found and the search stops.


In order to fulfill the on-line requirement of the optimizer, we set a computational time limit within which the heuristics must find a valid schedule. If this limit is exceeded, the baseline algorithm is triggered. Note that this limit is an input parameter for the RSI heuristics and can be set based on the arrival rate of presentations.

5.3. Adaptation of the RSI for the Response Time Metric: mRSI heuristics

The principle of the proposed RSI heuristics was to merge heavy clips, thus minimizing the Workload metric. In order to find a heuristically acceptable solution for the Response Time metric, we propose a modified RSI (mRSI) which merges a clip with the clip already queued for scheduling from previously submitted presentations that has the smallest start-time among all possible candidates. That means a clip c1 of the actual presentation is stored in the piggybacking list before another clip c2 if the smallest start-time of all instances of c1 already queued for scheduling is smaller than the smallest start-time of all instances of c2 queued for scheduling. If one fails to find a valid schedule for the first instance of the first clip in the piggybacking list, the first clip is inserted in the piggybacking list again, according to the smallest start-time of its remaining queued instances.

For instance, let c1 be queued two times from previously submitted presentations with start-times of 10s and 30s and c2 be queued once with a start-time of 20s. Then the initial piggybacking list would be (c1, c2). If one fails to find a valid schedule for the first instance of c1 (with start-time 10s), then c1 is inserted after c2 into the piggybacking list. The reason is that the remaining instance of c1 has a start-time of 30s, whereas the unique instance of c2 has a start-time of 20s. Thus the unique instance of c2 is considered next.
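The re-insertion rule of mRSI can be sketched with a priority queue keyed by the smallest remaining start-time of each clip. This is a minimal sketch under assumptions: the name `mrsi_candidates` and the dictionary layout are hypothetical, and the generator only produces the order in which queued instances would be tried, assuming every attempt fails; the real heuristics stops at the first valid schedule.

```python
import heapq

def mrsi_candidates(queued):
    """Yield (clip name, start time) in the order mRSI would try queued instances.

    queued: dict mapping clip name -> start times of already queued instances
    """
    remaining = {name: sorted(starts) for name, starts in queued.items() if starts}
    # one heap entry per clip, keyed by its smallest remaining start time
    heap = [(starts[0], name) for name, starts in remaining.items()]
    heapq.heapify(heap)
    while heap:
        start, name = heapq.heappop(heap)
        yield name, start
        remaining[name].pop(0)
        if remaining[name]:
            # re-insert the clip according to its next smallest start time
            heapq.heappush(heap, (remaining[name][0], name))

tries = list(mrsi_candidates({"c1": [10, 30], "c2": [20]}))
```

For the example in the text (c1 queued at 10s and 30s, c2 queued at 20s), the instances are tried in the order c1 at 10s, c2 at 20s, c1 at 30s.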
Let us now reconsider the schedule of the presentation P1 and concentrate on figure 8, which shows the branch & bound search trees for the two search intervals I1 (clip c1 shared) and I2 (clip c2 shared). Contrary to the RSI, the mRSI heuristics considers first the search interval I1 (clip c1 shared) and then the interval I2 (clip c2 shared). The reason is the smaller start-time of the already queued clip c1, which is 25s (the queued clip c2 has a start-time of 55s). Figure 10 shows the schedule finally generated by the mRSI heuristics. The branch & bound search starts considering the branch c3, c1, c2 of the search tree and places c3 at the beginning of the search interval. Clip c2 is placed after c1 with respect to the delay constraint. Due to the resource constraints of the server, c2 has to be placed after the maximum waiting time of 10s.

Figure 10. Schedule of the presentation P1 generated by the mRSI heuristics. (Plot: server bandwidth in Mb/s versus time in s.)

Obviously we aim at minimizing the response time while paying a price in the form of a possible increase of the server's workload and the risk of accepting fewer presentations. In our previous example, the mean server workload when applying the mRSI heuristics is 1.6% higher than the workload when applying the RSI heuristics. We will show in the experimental Section 6.3 that the increase in workload is small and varies from 3% to 6% for the different distribution parameters used.

6. Experimental Analysis

This Section describes the series of experiments we performed in order to evaluate the effectiveness of the suggested heuristics. We have implemented a multi-clip query optimizer which performs admission control based on the baseline algorithm and the RSI and mRSI algorithms. Let us start by presenting the experimental settings:

− The number of presentations, NPres, submitted to the server was fixed at a default value of 5000. The arrival rate of presentations is modeled as a Poisson process with a mean inter-arrival time equal to 1 second. The size of the clip database varies between 500 and 10,000 clips in steps of 500 (thus we considered 20 different values). The number of clips per presentation is chosen randomly between 3 and 5. The length of a clip is chosen randomly between 20 and 60 seconds. The compression rate of the clips is randomly chosen from the two compression rates of MPEG-1 with 1.5 Mb/s and MPEG-2 with 4 Mb/s. The set of precedence constraints is initially kept empty. Later (Subsection 6.2) we introduce one precedence for presentations containing 3 or 4 clips and two precedences for presentations having 5 clips. The clips involved in the precedence dependency are chosen randomly from all clips in a presentation. Finally, the server has an available bandwidth of 100 Mb/s.


− The constraint Min Delay is set to the typical default value of 0s for News-On-Demand applications. The Max Startup is set to 60s, and the Max Delay is set to 10s.

In order to evaluate the effectiveness of the proposed heuristics, we use realistic statistical distributions of clips. The first one is based on the Zipf distribution, which is proven to be close to the access distribution in video archives in general and in News-On-Demand archives in particular (Dan et al., 1994). The second one is based on the Hot-Spot distribution, where a subset of clips (hot clips) is requested more frequently than the others. Note that for all experiments, we measured a mean computational time of 40 milliseconds for the RSI heuristics. This computational time is negligible in comparison to the advantage of the RSI heuristics as reported below.

The experimental results are grouped into two parts. First we consider the Workload Metric for both Zipf and Hot-Spot distributions by varying the distribution characteristic and the delay constraints. In the second part we concentrate on the Response Time Metric for both Zipf and Hot-Spot distributions. Finally, we give a brief discussion of the experimental results.

6.1. Workload Metric

6.1.1. Zipf Distribution

In these experiments, we measure the mean server workload under the two heuristics (baseline and RSI). All presentations have been admitted for the conditions discussed above. In our analysis we vary the Zipf parameter with four values: param1 = 0.9, param2 = 0.7, param3 = 0.5, and param4 = 0.3. The higher the value of the parameter, the more often the same clips are requested. For a parameter value equal to 0, a uniform distribution is reached. Formally, let NPres be the total number of submitted presentations and let Access(i) be the access frequency of the clip ci; then

$Access(i) = \left( \sum_{j=1}^{N_{Pres}} \frac{1}{j^{param}} \right)^{-1} \cdot i^{-param}$.

Figure 11 displays the mean workload reduction (in %) using the RSI heuristics instead of the baseline for the Zipf distribution. Figure 12 displays a typical server workload distribution for the baseline and for the RSI heuristics (param2 = 0.7).

Figure 11 shows that a significant reduction in the server workload can be achieved by applying the RSI heuristics instead of the baseline. The highest achieved reduction for each parameter (database size of 500 clips) ranges from 24.4% for param4 = 0.3 to 46.8% for param1 = 0.9. With an increase in the size of the clip database, the reduction decreases but remains very significant. For example, for the largest database size of 10,000 clips, the reduction is 13.2% for param4 = 0.3 and 27.1% for param1 = 0.9.

Figure 11. Mean workload reduction (in %) using RSI heuristics instead of baseline for the Zipf distribution with respect to the variation in the Zipf parameter. (Plot: workload reduction in % versus number of clips in the database, one curve per Zipf parameter.)

Figure 12 gives additional details about the behavior of the two algorithms for the Zipf distribution. For databases containing a larger number of clips, the sharing of clips between presentations decreases, and this consequently affects the effectiveness of the RSI heuristics. Figure 12 shows that the mean server workload is around 59 Mb/s with the baseline algorithm for a database of size smaller than 8,000 clips and then increases slightly as the size of the database increases. On the other hand, the mean server workload increases steadily for the RSI heuristics, and this increase becomes smoother with a higher database size.

Figure 12. Typical example of the server workload for the Zipf distribution (param2 = 0.7). (Plot: mean workload in Kb/s versus number of clips in the database, for the RSI heuristics and the baseline.)

6.1.2. Hot-Spot Distribution

In these experiments, we study the impact of the RSI heuristics for the Workload Metric in the presence of a Hot-Spot distribution. We assume that 80% of the queries (presentations) access the hot percentage of the clips in the database. We consider four values for the percentage of hot clips: hot1 = 8%, hot2 = 11%, hot3 = 17%, and hot4 = 50%. The lower the hot percentage value, the more often the same clips are requested.

Figure 13 illustrates the mean workload reduction (in %) using the RSI heuristics instead of the baseline. We achieve a significant reduction of the mean server workload when using the RSI heuristics, as with the Zipf distribution. The highest achieved reduction for each parameter (database size of 500 clips) varies between 17.4% (for the case hot4 = 50%) and 55.6% (for hot1 = 8%). With an increase in the size of the clip database, the reduction rate decreases but remains very significant. For example, for the largest database size of 10,000 clips the reduction is 8.2% for hot4 = 50% and 28.0% for hot1 = 8%. Figure 14 gives additional details about the behavior of the two algorithms. The mean server workload for the Hot-Spot distribution shows similar characteristics concerning scalability as for the Zipf distribution, i.e., the advantage of the RSI over the baseline is significant even for higher database sizes.

Figure 13. Mean workload reduction (in %) using RSI heuristics instead of baseline for the Hot-Spot distribution with respect to the variation in the hot parameter. (Plot: workload reduction in % versus number of clips in the database, one curve per hot percentage.)

6.2. Workload Metric including Precedence Dependencies

In these experiments we include precedence dependencies in the presentations submitted to the server. For presentations containing 3 or 4 clips we introduced one precedence, and for presentations having 5 clips we introduced two precedences. We compared the mean server workload of a multi-clip optimizer using the RSI heuristics to one using the baseline. In order to measure the robustness of the multi-clip optimizer, we let the optimizer using the baseline heuristics ignore the precedence dependencies in the submitted presentations. Thus, the RSI heuristics is in the unfavorable position of having to respect the precedences, which means that fewer slots are available to it for placing all clips of a presentation than in the non-precedence case.

Figure 15 displays the mean workload reduction (in %) using the RSI heuristics instead of the baseline for the Zipf distribution, and figure 16 displays the difference (in % to the baseline) of the workload reduction when the RSI heuristics has to deal with precedences. Results for the Hot-Spot distribution are similar to those for the Zipf distribution and are not shown here.

Figure 15 reveals similar characteristics as figure 11 for the experiments without precedence dependencies, but one notices a decrease in the workload reduction. This is to be expected, as the RSI heuristics operates here under a more unfavorable condition than in figure 11. Figure 16 quantifies the difference in the workload reduction when the RSI heuristics has to deal with precedences. On the average, the difference is not significant.
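The two clip-access distributions used in Sections 6.1 and 6.2 can be sketched as follows. This is a hedged illustration, not the generator used for the experiments: the names `zipf_access` and `hot_spot_pick` are hypothetical, `n` stands for the upper limit of the normalizing sum in the Access(i) formula, hot clips are assumed to be the clips with the lowest indices, and the 80% rule is applied per clip draw rather than per presentation.

```python
import random

def zipf_access(n, param):
    """Access frequencies Access(1..n) = i^-param / sum_j j^-param."""
    norm = sum(j ** -param for j in range(1, n + 1))
    return [i ** -param / norm for i in range(1, n + 1)]

def hot_spot_pick(n_clips, hot_fraction, rng, hot_share=0.8):
    """Pick a clip index; indices 0 .. n_hot-1 are assumed to be the hot clips."""
    n_hot = min(n_clips, max(1, round(n_clips * hot_fraction)))
    # with probability hot_share (80% in the text) the request goes to a hot clip
    if n_hot >= n_clips or rng.random() < hot_share:
        return rng.randrange(n_hot)
    return rng.randrange(n_hot, n_clips)

freq = zipf_access(500, 0.7)                 # database of 500 clips, param2 = 0.7
rng = random.Random(0)                       # fixed seed for repeatability
picks = [hot_spot_pick(500, 0.08, rng) for _ in range(10_000)]   # hot1 = 8%
hot_ratio = sum(p < 40 for p in picks) / len(picks)
```

The frequencies returned by `zipf_access` sum to 1 and become uniform for a parameter of 0, in line with the remark in Section 6.1.1; a lower hot percentage concentrates the same 80% share of requests on fewer clips, which is why it favors sharing.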

Figure 14. Typical example of the server workload for the Hot-Spot distribution (hot2 = 11%). (Plot: mean workload in Kb/s versus number of clips in the database, for the RSI heuristics and the baseline.)

6.3. Response Time Metric

In the following experiments we study the efficiency of the two heuristics, the RSI (merges heavy clips) and the mRSI (merges clips according to smallest start-time), for the Response Time metric. We assume now that the server has a smaller available bandwidth of only 50 Mb/s. Under these conditions not all of the 5000 submitted presentations can be admitted. Such an experimental protocol better shows the tradeoff between start-time reduction and increased workload. Thus our experiments consist of two parts: first we compare the response time and then the number of accepted presentations of RSI and mRSI.

We obtain a relatively invariable result with respect to the distribution parameters. As a demonstration, we show here the result for a Zipf distribution with a parameter of 0.3, which represents a very uncomfortable situation for the server: the limited server bandwidth of 50 Mb/s together with a low Zipf parameter means less piggybacking, and for a large clip database the acceptance rate falls below 25%.

Figure 17 shows the Response Time for the RSI and mRSI heuristics for the Zipf distribution (param4 = 0.3). For a database with a small number of clips (≤ 5500 clips), the response time increases more sharply than for a larger database. For a database size higher than 7000, no significant differences have been observed; these results are therefore not shown.

Figure 15. Mean workload reduction (in %) using RSI heuristics instead of baseline for the Zipf distribution where presentations contain precedence dependencies. (Plot: workload reduction in % versus number of clips in the database, one curve per Zipf parameter.)

The response time of the RSI is always above that of the mRSI and the difference is on the average 1.7s. Although the mean response time difference is only 1.7s, it should be noted that for response time-critical applications the achieved reduction is significant.

Figure 18 shows the number of accepted presentations of the RSI and mRSI heuristics for the Zipf distribution (param4 = 0.3). The number of accepted presentations decreases sharply as the database size increases from 500 clips to 3500 clips (where slightly more than 25% of the total 5000 presentations are accepted for a database of size 3500 clips). The percentage of accepted presentations remains fairly constant for a database of size 5500 clips or more. On the average, RSI admitted 4.5% more presentations than mRSI. Whether or not the consequences of this higher number of rejections can be accepted depends upon the application requirement.

The response time of the baseline heuristics is around 33s for the same settings (not shown in the figure). It is, therefore, only slightly higher than the response time of the mRSI heuristics. However, the rejection rate of the baseline heuristics is on the average about 1/3 higher than for both the mRSI and the RSI. This clearly shows the efficiency of the proposed heuristics.

Figure 16. Difference of the workload reduction having the RSI heuristics to deal with precedences. (Plot: difference in workload reduction in % versus number of clips in the database, one curve per Zipf parameter.)

6.4. Discussion

The experimental results with the Workload and Response Time Metrics clearly demonstrate the advantages of the RSI and the mRSI heuristics (with their piggybacking strategy) over the baseline algorithm for optimizing multi-clip queries in video databases. Over all experiments, the workload was reduced on the average by 28% (without precedence constraints). If we consider precedences, the reduction in workload decreases only slightly on the average and remains significant. Finally, note that little overhead was introduced by imposing a time limit on the computing time. In all, a mean computing time of 40 milliseconds was measured for the proposed heuristics.

Figure 17. Response Time for the Zipf distribution (param4 = 0.3). (Plot: response time in s versus number of clips in the database, for the mRSI and RSI heuristics.)

Figure 18. Number of Accepted Presentations for the Zipf distribution (param4 = 0.3). (Plot: accepted presentations out of 5000 versus number of clips in the database, for the mRSI and RSI heuristics.)

7. Conclusion and Future Work

In this paper we tackled the problem of multi-clip optimization. Compared to related approaches (e.g., (Shahabi et al., 1998a; Raymond and Paul, 1998; Garofalakis et al., 1998)), we considered a more realistic scenario which allows the optimization of both structural and temporal constraints on the delivery (i.e., delay, resource and precedence constraints) of clips of any rate and duration. We developed a novel heuristic approach that takes advantage of piggybacking, based on a multicast enabled network. The experimental results clearly showed the effectiveness of the suggested heuristics.

In future work, we plan to tackle the problem of user/application interactivity. Indeed, user/application interactivity poses a number of challenges. First, classical interactivity, which is related to individual clips (VCR functionalities: forward, rewind, etc.), desynchronizes the planned sharing of clips made by the optimizer. In that case, more server resources are needed to cope with the desynchronization. Secondly, other interactivity functionalities for multi-clip queries have to be supported by the server. Such functionalities may include: (a) clip jumping: a user may jump to a specific clip, (b) dynamic clip ordering: during the clip delivery, an application may ask to reorder the rest of the clips by imposing a new metric (clips with low duration first), (c) clip adding: the external network may ask the server to re-send a lost clip (i.e., dynamically add the lost clip to the presentation currently being delivered).

Furthermore, we are currently examining video adaptation capabilities in multicast enabled networks for the case that resource availability in routers and proxies is no longer sufficient to deliver the presentation properly (Kosch, 2002). In particular, we are interested in how the server can support adaptation in the network. For this purpose, we insert MPEG-7 descriptions into the streams, mainly an instance of the Variation Description Scheme (van Beek et al., 2001), to describe the adaptation capacities of the video stream.

References

Adali, S., M. Sapino, and V. Subrahmanian: 1999, ‘A Multimedia Presentation Algebra’. In: Proceedings of the ACM SIGMOD Conference. pp. 121–132.
Anderson, D.: 1997, ‘Device reservation in Audio/Video Editing Systems’. ACM Transactions On Computer Systems 15(1), 111–133.
Balkir, N. and G. Özsoyoglu: 1998a, ‘Delivering Presentations from Multimedia Servers’. VLDB Journal 7(4), 294–307.
Balkir, N. and G. Özsoyoglu: 1998b, ‘Multimedia Presentation Servers: Buffer Management and Admission Control’. In: Proceedings of the International Workshop on Multimedia Database Management Systems. pp. 154–161.
Bouras, C., V. Kapoulas, D. Miras, V. Ouzounis, P. Spirakis, and A. Tatakis: 1999, ‘On-Demand Hypermedia/Multimedia Service using Pre-Orchestrated Scenarios over the Internet’. Networking and Information Systems Journal (Hermes Science) 2(5-6), 741–762.
Campbell, S. T. and S. Chung: 2002, ‘Scheduling and Optimization of the Delivery of Multimedia Streams Using Query Scripts’. Multimedia Tools and Applications 18(1), 5–30.
Chen, F.-C. F. and M. H. Dunham: 1998, ‘Common Subexpression Processing in Multiple-Query Processing’. IEEE Transactions on Knowledge and Data Engineering 10(3), 493–499.
Dan, A., D. Sitaram, and P. Shahabuddin: 1994, ‘Scheduling Policies for an On-Demand Video Server With Batching’. In: ACM International Multimedia Conference. pp. 15–23.
Diot, C., B. Levine, B. Lyles, H. Kassem, and D. Balensiefen: 2000, ‘Deployment issues for the IP multicast service and architecture’. IEEE Network 14(1), 78–88.


El-Rewini, H., T. Lewis, and H. Ali: 1994, Task Scheduling in Parallel and Distributed Systems. Prentice-Hall.
Garofalakis, M., Y. Ioannidis, and B. Özden: 1998, ‘Resource scheduling for composite multimedia objects’. In: International Conference on Very Large Databases. New York, pp. 74–85.
Huang, J., P. Wan, and D.-Z. Du: 1998, ‘Criticality- and QoS-Based Multiresource Negotiation and Adaptation’. Real-Time Systems 15(3), 249–273.
Jiang, H., D. Montesi, and A. K. Elmagarmid: 1999, ‘Content-based Access to Video Databases’. Multimedia Tools and Applications 9(3), 227–249.
Jiang, X. and P. Mohapatra: 1999, ‘Efficient Admission Control Algorithms for Multimedia Servers’. Multimedia Systems 7(4), 294–304.
Johnson, T. V. and A. Zhang: 1999, ‘Dynamic Playout Scheduling Algorithms for Continuous Multimedia Streams’. Multimedia Systems 7(4), 312–325.
Kosch, H.: 2002, ‘MPEG-7 and Multimedia Database Systems’. SIGMOD Record 31(2).
Kosch, H., A. Mostefaoui, and L. Brunie: 2002, ‘Semantic based Prefetching in News-on-Demand Video Servers’. Multimedia Tools and Applications 18(2).
Kosch, H., R. Tusch, L. Böszörményi, A. Bachlechner, B. Dörflinger, C. Hofbauer, C. Riedler, M. Lang, and C. Hanin: 2001, ‘SMOOTH - A Distributed Multimedia Database System’. In: Proceedings of the International VLDB Conference. Rome, Italy, pp. 713–714.
Lau, S.-W., C. S. Lui, and L. Golubchik: 1998, ‘Merging Video Streams in Multimedia Storage Server: Complexity and Heuristics’. Multimedia Systems 6(1), 29–42.
Lee, T., L. Sheng, N. Balkir, A. Al-Hamdani, G. Özsoyoglu, and Z. Özsoyoglu: 2000, ‘Query Processing Techniques for Multimedia Presentations’. Multimedia Tools and Applications 11(1), 63–69.
Megzari, O., L. Yuan, and A. Karmouch: 2002, ‘Meta-Data and Media Management in a Multimedia Interactive Telelearning System’. Multimedia Tools and Applications 16(1-2), 137–160.
Meng, H., D. Zhong, and S.-F. Chang: 1999, ‘Searching and Editing MPEG-Compressed Video in a Distributed Online Environment’. Multimedia Systems 7(4), 282–293.
Min-You Wu, W. S.: 2001, ‘Optimal Scheduling for Parallel CBR Video Servers’. Multimedia Tools and Applications 14(1), 79–99.
Oria, V., M. Özsu, B. Xu, L. Cheng, and P. Iglinski: 1999, ‘VisualMOQL: The DISIMA Visual Query Language’. In: IEEE International Conference on Multimedia Computing and Systems, Vol. 1. Florence, Italy, pp. 536–542.
Prabhakaran, B.: 2000, ‘Adaptive Multimedia Presentation Strategies’. Multimedia Tools and Applications 12(2-3), 281–298.
Raymond, T. N. and S. Paul: 1998, ‘Optimal clip ordering for multi-clip queries’. VLDB Journal 7(4), 239–252.
Sellis, T. K.: 1988, ‘Multiple-Query Optimization’. ACM Transactions on Database Systems 13(1), 23–52.
Shahabi, C., A. Dashti, and S. Ghandeharizadeh: 1998a, ‘Continuous Media Retrieval Optimizer and Hierarchical Storage Structures’. In: Third International Conference on Integrated Design and Process Technology IADT’98. pp. 360–367.
Shahabi, C., A. Dashti, and S. Ghandeharizadeh: 1998b, ‘Profile Aware Retrieval Optimizer for Continuous Media’. In: World Automation Congress (WAC).

Sheng, L., Z. Özsoyoglu, and G. Özsoyoglu: 1999, ‘A Graph Query Language and Its Query Processing’. In: IEEE International Conference on Data Engineering (ICDE). Sydney, Australia, pp. 572–581.
Song, Y., M. Mielke, and A. Zhang: 1999, ‘NetMedia: Synchronized Streaming of Multimedia Presentations in Distributed Environments’. In: IEEE International Conference on Multimedia Computing and Systems (Vol. 2). pp. 585–590.
van Beek, P., A. Benitez, J. Heuer, J. Martinez, P. Salembier, J. Smith, and T. Walker: 2001, ‘MPEG-7: Multimedia Description Schemes’. ISO/IEC FDIS 15938-5:2001.
W3C: 2001, ‘Synchronized Multimedia Integration Language (SMIL) Version 2.0’. REC-smil20-20010807. http://www.w3.org/TR/smil20/.
Wittmann, R. and M. Zitterbart (eds.): 2000, Multicast Communication. Morgan Kaufmann Publishers.
Zhang, A. and S. Gollapudi: 2000, ‘QoS Management in Educational Digital Library Environments’. Multimedia Tools and Applications 10(2-3), 133–156.
