On Scheduling Atomic and Composite Multimedia Objects

Cyrus Shahabi, Shahram Ghandeharizadeh
Computer Science Department
University of Southern California
Los Angeles, California 90089

Surajit Chaudhuri
Hewlett-Packard Labs
1501 Page Mill Road
Palo Alto, CA 94304

[shahabi, shahram]@perspolis.usc.edu

[email protected]

Keywords: Composite Objects, Continuous Media Retrieval, Multimedia, Parallelism, Real-time Scheduling, Storage Servers, Temporal Relationships

Abstract

In multi-user multimedia information systems (e.g., movie-on-demand, digital editing), scheduling the retrievals of continuous media objects becomes a challenging task. This is because of both intra- and inter-object time dependencies. Intra-object time dependency refers to the real-time display requirement of a continuous media object. Inter-object time dependency refers to the temporal relationships defined among multiple continuous media objects. In order to compose tailored multimedia presentations, a user might define complex time dependencies among multiple continuous media objects having various lengths and display bandwidth requirements. Scheduling the retrieval tasks corresponding to the components of such a presentation in order to respect both inter- and intra-task time dependencies is the focus of this study. To tackle this task scheduling problem (CRS), we start with a simpler scheduling problem (ARS) where there is no inter-task time dependency (e.g., movie-on-demand). With ARS, the order of task activation might impact both system throughput and the observed startup latency. An on-line scheduling algorithm is hence proposed that employs alternative heuristics to order tasks differently. Our simulation studies show that the choice of a heuristic under a high system load can impact the system performance (throughput and latency) by a factor of 20%-150%. We also investigate an augmented version of ARS (termed ARS+) where requests reserve displays in advance (e.g., reservation-based movie-on-demand). Using this advance knowledge, we develop techniques to resolve contention by employing memory buffers instead of postponing tasks. By means of a simulation study, we demonstrate that the average startup latency can be decreased (by up to 950%) by employing buffers to resolve contention. Finally, with CRS, postponing tasks due to contention might destroy the time dependencies defined among the tasks. Therefore, employing memory buffers to resolve contention becomes unavoidable. Alternative memory assignment algorithms are proposed, and simulation studies indicate that one of them outperforms the rest by a wide margin (up to 230%). One contribution of this study is the formal definition of the scheduling problems and the proof of their NP-hardness.

 This research was supported in part by the National Science Foundation under grants IRI-9203389, IRI-9258362 (NYI award), and CDA-9216321, and a Hewlett-Packard unrestricted cash/equipment gift.

1 Introduction

Multimedia information systems that provide service to multiple simultaneous requests are starting to become commonplace. In these systems, scheduling the retrievals of multimedia objects is challenging in order to: 1) satisfy the real-time constraint associated with the display of continuous media objects (i.e., audio and video), 2) maximize the number of supported simultaneous displays (throughput), 3) minimize the average startup latency incurred by the displays, and 4) respect the defined temporal relationships among multiple displays (for applications such as digital editing). This paper investigates three classes of multimedia applications and alternative issues in scheduling the retrievals of their objects. The application classes are:

1. On-demand Atomic Object Retrieval: With this class of applications, a system strives to display an object (audio or video) as soon as a user request arrives referencing the object. The envisioned movie-on-demand and news-on-demand systems are examples of this application class. We formalize the scheduling problem that represents this class as the Atomic Retrieval Scheduling (ARS) problem.

2. Reservation-based Atomic Object Retrieval: This class is similar to on-demand atomic object retrieval except that a user requests the display of an object at some point in the future. An example might be a movie-on-demand system where the customers call to request the display of a specific movie at a specific time, e.g., Bob calls in the morning to request a movie at 8:00 pm. Reservation-based retrieval is expected to be cheaper than on-demand retrieval because it enables the system to minimize the amount of resources required to service requests (using planning optimization techniques). The scheduling problem that represents this application class is termed Augmented ARS (ARS+).

3. On-demand Composite Object Retrieval: As compared to atomic objects, a composite object describes when two or more atomic objects should be displayed in a temporally related manner [1]. To illustrate the application of composite objects, consider the following environment. During the post-production of a movie, a sound editor accesses an archive of digitized audio clips to extend the movie with appropriate sound effects. The editor might choose two clips from the archive: a gun-shot and a screaming sound effect. Subsequently, she authors a composite object by overlapping these two sound clips and synchronizing them with the different scenes of a presentation. During this process, she might try alternative gun-shot or screaming clips from the repository to evaluate which combination is appropriate. To enable her to evaluate her choices immediately, the system should be able to display the composition as soon as it is authored (on-demand). The scheduling problem that represents this application class is termed Composite Retrieval Scheduling (CRS).

The class of applications that constitute CRS is the focus of this study. This is because researchers are just starting to realize the importance of customized multimedia presentations targeted toward an individual's preferences [39]. Some studies propose systems that generate customized presentations automatically based on a user query [34]. Independent of the method by which a presentation is generated, the retrieval engine must respect the time dependencies between the elements of a presentation when retrieving the data.
Hence, the CRS problem is the most challenging problem and cannot be tackled unless the simpler problems ARS and ARS+ are solved. A contribution of this paper is that it relates the three scheduling problems in order to apply the solutions proposed for one to the others. Other contributions of this study are as follows:

1. Classifying multimedia applications and their corresponding retrieval scheduling problems (see Sec. 1).

2. Identifying new scheduling problems with different characteristics as compared to those of conventional scheduling problems (see Sec. 6).

3. Defining a robust framework which is a combination of both previously proposed techniques and new extensions (see Sec. 2).

4. Recognizing the retrieval contention problem and investigating the features desired to devise efficient contention prediction algorithms (see Sec. 3).

5. Defining the scheduling problems formally and showing their NP-hardness in the context of an off-line (clairvoyant) scheduler, where the system has complete knowledge about all the tasks and their release times¹.

6. Proposing a simple, but flexible and efficient, structure for activating requests in an on-line system. (An on-line system services requests as they arrive and has no knowledge of when requests might arrive as a function of time.)

7. Proposing some new heuristics as well as revisiting the heuristics proposed for conventional scheduling problems and adapting them for ARS. These heuristics behave differently only when the system is heavily loaded, and they are compared in Sec. 5 using a simulation study. The choice of a heuristic may, on average, improve the observed average startup latency by 150% and enhance the throughput under a high system load by 20%.

8. Introducing sliding techniques to improve the average startup latency with ARS+ when the system load is low. In Sec. 5, we show that the reduction in the average startup latency with buffered sliding can be in excess of 950% with a light system load.

9. Extending the ARS+ sliding techniques for CRS. We show that scheduling the retrieval of a single composite object might not be feasible without employing the sliding techniques to resolve internal contention. Hence, we develop several memory assignment algorithms to resolve internal contention with the objective of minimizing the amount of required memory. Our simulation results (Sec. 5) show the superiority of one of the algorithms (up to 230% improvement).

10. Extending the structure and the heuristics of ARS for an on-line system to solve the CRS problem.

In [8], we focused on a special case of the CRS problem (binary single-media composite objects) and introduced buffered sliding techniques to avoid retrieval contention. The CRS problem is a generalization of [8] to support: 1) a mix of media types and 2) n-ary composite objects consisting of a mix of media types. Moreover, the approaches described in this study for CRS are more general and subsume those of [8]. In [41], we assumed an n-ary composite object with pre-defined temporal relationships. Therefore, the constituting atomic objects could be placed intelligently. Instead, CRS assumes that the temporal relationships are defined on-demand and there is not sufficient time to modify the placement of the objects. Hence, temporal relationships are satisfied by intelligent scheduling instead of intelligent placement of data.

To put our work in perspective, we denote the retrieval of an object as a task. The distinctive characteristics of the scheduling problems (ARS, ARS+, CRS) that distinguish them from the conventional scheduling problems² are: 1) tasks are IO-bound and not CPU-bound, 2) each task utilizes multiple disks during its life

1 Sec. 4 contains the details of this item and other items where no reference section is specified.
2 A more detailed comparison can be found in Sec. 6.


[Figure: disk activity and system activity within a time interval for subobjects Wi, Xj, and Zk, with c(X) and c(W) denoting their consumption rates]
… and 1 ≤ d ≤ D, in parallel with the execution of retrieval tasks, for the remainder of time interval i. Second, the scheduler should not reserve resources in advance. That is, even if we assume a regular pattern of subobject assignments, the complexity of cop will still be O(n · ℓ) if resources are reserved in advance. For example, in Fig. 3c, where the objects still follow the round-robin assignment, assume a request referencing object Y arrives at time i − 1 while cluster 3 is busy until time i + 2. Suppose that the scheduler looks ahead and reserves cluster 3 at time i + 2 for Y. Now, at time i, a new request arrives referencing object Z and, since Z1 can be scheduled at i with no contention, cop initiates its retrieval task at i. However, due to the advance resource reservation for object Y, a collision will occur at i + 2. Cop needs to look ahead when scheduling Z to prevent such a collision. However, similar to the case of random assignment, this increases the complexity of cop back to O(n · ℓ). Note that another data structure could keep the information that d1 is reserved at interval i and hence prevent the retrieval of Z. However, this would prevent a small task of (say) length 2 from utilizing d1 and d2 at i and i + 1, respectively. The proposed scheduling algorithms for atomic objects respect these two criteria. However, we show that these criteria cannot be guaranteed when scheduling the retrieval of composite objects. In this case, the duration of a time interval should be extended to incorporate the duration of the computation of cop.

4 Class of Scheduling Problems

To tackle the scheduling problem, we first focus on scheduling the retrievals of atomic objects belonging to different media types in a multi-disk/multi-user environment. We term this problem Atomic Retrieval Scheduling, ARS. Subsequently, we introduce an augmented model of ARS, termed ARS+. Finally, we focus on scheduling the retrievals of composite objects, termed Composite Retrieval Scheduling (CRS). We show that the solutions proposed for ARS and ARS+ can be extended to support CRS.

4.1 Atomic Retrieval Scheduling (ARS)

To motivate the ARS problem, consider a movie-on-demand application which has to deal with multiple requests from users for movies that may have different bandwidth requirements and display qualities. Thus, movies may have MPEG-1 or MPEG-2 formats and different display qualities (e.g., NTSC or HDTV). The problem is to determine in which order a number of requests referencing these movies should be serviced so that all the customers observe a tolerable startup latency. A variation of the ARS objective is to determine the resource requirements to ensure that no request observes a startup latency greater than x seconds. The ARS problem is to schedule retrieval tasks such that the total bandwidth requirement of the scheduled tasks on each cluster during each interval does not exceed the bandwidth of that cluster (i.e., no retrieval contention). Moreover, ARS should satisfy an optimization objective. Depending on the application,

this objective could be to 1) minimize the average startup latency of the tasks, or 2) minimize the total duration of scheduling for a set of tasks. We make the assumption that the bottleneck resource is cluster bandwidth, and that there is always an idle processor to schedule the retrieval tasks [15]. This is a valid assumption due to the performance characteristics of disk drives relative to processors [35, 40]. In this section, we: 1) present a formal definition of ARS, 2) show that it is an NP-hard problem, 3) discuss some issues that should be considered when implementing the on-line scheduling system, and 4) investigate some heuristics for a highly loaded system.

4.1.1 Defining Tasks

Let T be a set of tasks where each t ∈ T is a retrieval task corresponding to the retrieval of a video object. Note that if an object X is referenced twice by two different requests, a different task is assigned to each request. The time to display a subobject is defined as a time interval, which is employed as the time unit for the rest of this paper. For each task t ∈ T, we define:

- r(t): Release time of t, r : T → N. The time that t's information (i.e., the information of its referenced object) becomes available to the scheduler. This is identical to the time that a request referencing the object is submitted to the system.
- ℓ(t): Length (size) of the object referenced by t, ℓ : T → N. The unit is the number of subobjects.
- c(t): Consumption rate of the object referenced by t, 0 < c(t) ≤ 1. This rate is normalized by R_D. Thus, c(t) = 0.40 means that the consumption rate of the object referenced by t is 40% of R_D, the cluster bandwidth.
- p(t): The cluster that contains the first subobject of the object referenced by t, 1 ≤ p(t) ≤ D. It determines the placement of the object referenced by t on the clusters.

We denote a task ti as a quadruple ⟨r(ti), ℓ(ti), c(ti), p(ti)⟩. To illustrate, consider the following example.

Example 4.1: Suppose D = 3 and t1: ⟨0, 5, 0.6, 1⟩. This task corresponds to the retrieval of object X, whose placement is shown in Fig. 4. The task t1 corresponds to a request referencing X that is submitted at time interval 0 (r(t1) = 0). Object X consists of 5 subobjects and its retrieval (or display) spans 5 time intervals, hence ℓ(t1) = 5. The display bandwidth requirement of X is 0.6 · R_D. That is, assuming R_D = 20 Mb/s, X can be an MPEG-2 compressed video object with a 12 Mb/s bandwidth requirement. Furthermore, since the placement of X starts from cluster d1, p(t1) = 1. For purposes of exposition, in this as well as in future examples, we assume a high bandwidth requirement for the objects (i.e., large c(t)). In Table 3 (Sec. 5), we provide realistic values for c(t), which were used for our experiments.
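The task quadruple and the round-robin cluster mapping of Fig. 4 can be captured in a few lines of code. The sketch below is our own illustration (not part of the paper); the field names are arbitrary.

```python
# Illustrative sketch only: the task quadruple of Sec. 4.1.1 and the
# round-robin cluster mapping of Fig. 4, instantiated with t1 of Example 4.1.
from dataclasses import dataclass

@dataclass
class Task:
    release: int   # r(t): interval at which the request is submitted
    length: int    # l(t): number of subobjects (= intervals of display)
    rate: float    # c(t): display bandwidth normalized by the cluster bandwidth R_D
    first: int     # p(t): cluster holding the first subobject (1..D)

D = 3
t1 = Task(release=0, length=5, rate=0.6, first=1)   # object X of Fig. 4

def cluster_at(t: Task, u: int, start: int) -> int:
    """Cluster retrieved from by t at interval u if its retrieval began at `start`."""
    return (t.first - 1 + (u - start)) % D + 1

print([cluster_at(t1, u, 0) for u in range(t1.length)])   # [1, 2, 3, 1, 2]
```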

4.1.2 Formal Definition of ARS

Definition 4.2: The problem of ARS is to find a schedule σ (where σ : T → N) for a set T such that: 1) it minimizes the finishing time [14] w, where w is the least time at which all tasks of T have been completed, and 2) it satisfies the following constraints:


Cluster d1: X1, X4, Y1, Y4, Z1, Z4, V3
Cluster d2: X2, X5, Y2, Y5, Z2, Z5, V1, V4
Cluster d3: X3, Y3, Z3, V2, V5

Figure 4: Round-robin object placement, D = 3

[Figure 5: Required display schedules (tasks t1, t2, t3 over time) — a. Example 4.3, b. Example 4.5, c. Example 4.12]

- ∀t ∈ T, σ(t) ≥ r(t).
- ∀u ≥ 0, let S(u) be the set of tasks for which σ(t) ≤ u < σ(t) + ℓ(t); then ∀i, 1 ≤ i ≤ D: ∑_{t∈S(u)} R_i(t) ≤ 1.0, where

$$ R_i(t) = \begin{cases} c(t) & \text{if } (p(t) + u - \sigma(t) - 1) \bmod D = i - 1 \\ 0.0 & \text{otherwise} \end{cases} \qquad (1) $$

The first constraint ensures that no task is scheduled before its release time. The second constraint strives to avoid retrieval contention. It guarantees that at each time interval u and for each cluster i, the aggregate bandwidth requirement of the tasks that employ cluster i and are in progress (i.e., have been initiated at or before u but have not committed yet) does not exceed the bandwidth of cluster i. The mod-function handles the round-robin utilization of clusters per task.

Example 4.3: Suppose D = 3, T = {t1, t2, t3}, and
t1: ⟨0, 5, 0.6, 1⟩, t2: ⟨2, 5, 0.8, 2⟩, t3: ⟨3, 5, 0.6, 1⟩.
The placement of objects X, V, and Z, referenced respectively by t1, t2, and t3, is shown in Fig. 4. The required display schedules of these tasks as a function of time are depicted in Fig. 5a. Consider the schedule σ(t) = r(t), which results in the minimum w (no task can start sooner than its release time). However, this is not a feasible schedule. To illustrate, consider the following discussion:

- The schedule is: σ(t1) = 0, σ(t1) + ℓ(t1) = 5; σ(t2) = 2, σ(t2) + ℓ(t2) = 7; σ(t3) = 3, σ(t3) + ℓ(t3) = 8.

- Iterating over the time intervals u:
  - u = 0 ⇒ S(0) = {t1}. Hence, R1(t1) = 0.6 ≤ 1.0, R2(t1) = 0.0 ≤ 1.0, R3(t1) = 0.0 ≤ 1.0.
  - u = 1 ⇒ S(1) = {t1}. Hence, R1(t1) = 0.0 ≤ 1.0, R2(t1) = 0.6 ≤ 1.0, R3(t1) = 0.0 ≤ 1.0.
  - u = 2 ⇒ S(2) = {t1, t2}. Hence, R1(t1) + R1(t2) = 0.0 ≤ 1.0, R2(t1) + R2(t2) = 0.8 ≤ 1.0, R3(t1) + R3(t2) = 0.6 ≤ 1.0.
  - u = 3 ⇒ S(3) = {t1, t2, t3}. Hence, R1(t1) + R1(t2) + R1(t3) = 1.2 > 1.0, R2(t1) + R2(t2) + R2(t3) = 0.0 ≤ 1.0, R3(t1) + R3(t2) + R3(t3) = 0.8 ≤ 1.0.

- Since at u = 3 the aggregate bandwidth requirement of t1, t2, and t3 exceeds the bandwidth of d1, the proposed schedule (i.e., σ(t) = r(t)) is not a feasible one.
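The check carried out by hand above can be mechanized. The following sketch (our own, not the paper's code) evaluates the second constraint of Definition 4.2 for the schedule σ(t) = r(t) of Example 4.3 and reports the first interval and cluster at which contention occurs.

```python
# Sketch: feasibility check of Def. 4.2 (Eq. 1) for Example 4.3 with sigma(t) = r(t).
D = 3
tasks = [(0, 5, 0.6, 1),   # t1: <r, l, c, p>, object X
         (2, 5, 0.8, 2),   # t2: object V
         (3, 5, 0.6, 1)]   # t3: object Z
sigma = [r for (r, l, c, p) in tasks]          # sigma(t) = r(t)

def first_contention(tasks, sigma, D, horizon):
    for u in range(horizon):
        load = [0.0] * (D + 1)                 # load[i]: aggregate c(t) on cluster i at u
        for (r, l, c, p), s in zip(tasks, sigma):
            if s <= u < s + l:                 # t is in S(u)
                i = (p - 1 + (u - s)) % D + 1  # cluster used by t at u
                load[i] += c
        for i in range(1, D + 1):
            if load[i] > 1.0:
                return u, i, round(load[i], 2)
    return None

print(first_contention(tasks, sigma, D, horizon=8))   # (3, 1, 1.2): cluster d1 overflows at u = 3
```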

ARS is NP-hard

ARS can be shown to be NP-hard by a reduction from the Bin Packing Problem [16]. The key intuition is that deciding which tasks to pack together within unit cluster bandwidth can be reduced to the problem of packing objects in a bin. The details are in the technical report [42].

Theorem 4.4: ARS is NP-hard.

An alternative metric of optimization important in real-world applications is minimum average startup latency⁸. The startup latency of a task t is defined as σ(t) − r(t). Hence, the objective of ARS can be modified to be minimizing ∑_{t∈T} (σ(t) − r(t)). The new objective does not impact the intractability of the ARS problem.

4.1.3 On-line Scheduling

Since ARS is NP-hard, we need to consider efficient heuristics. Furthermore, since in real-world applications (such as movie-on-demand) a scheduler does not have complete knowledge about all the tasks in advance, an on-line scheduler is desirable. First, we describe the main structure of the on-line scheduling algorithm. The heuristics are then discussed in Sec. 4.1.5. The main structure starts with time interval u = 0 and increments u until all t ∈ T are scheduled. For each value of u the following steps are taken respectively:

8 The precise equation is σ(t) − r(t) + 1 because of the cycle-based approach where the display is always one interval behind the retrieval.

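A minimal sketch of this per-interval structure is given below; it is our own simplification of the figure that follows (the busy-table bookkeeping is replaced by an explicit set of active tasks), and FCFS is used as a placeholder for the heuristic h.

```python
# Sketch of the on-line structure: at every interval u, Cop scans the pending
# queue (kept sorted on h(t)) and activates every task that causes no retrieval
# contention at u. Tasks are dicts {"r": release, "l": length, "c": rate, "p": first cluster}.

def cop(u, queue, active, D):
    # discard tasks whose retrieval has completed
    active[:] = [(t, s) for (t, s) in active if u < s + t["l"]]
    # per-cluster load of the tasks currently in progress
    load = [0.0] * (D + 1)
    for t, s in active:
        load[(t["p"] - 1 + (u - s)) % D + 1] += t["c"]
    # head-to-tail scan of the queue; activate every task that fits
    waiting = []
    for t in queue:
        if t["r"] <= u and load[t["p"]] + t["c"] <= 1.0:
            load[t["p"]] += t["c"]          # busy[u, p(t)] += c(t)
            active.append((t, u))           # activate(t, u)
        else:
            waiting.append(t)
    queue[:] = waiting

# Usage: keep `queue` sorted on h(t) (here FCFS, i.e., release time) and call
# cop(u, queue, active, D) for u = 0, 1, 2, ...
```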

[Figure: structure of the on-line scheduler — arriving requests are turned into tasks by a Queue Center and added to a queue sorted on h(t); at the beginning of every interval u, Cop scans the queue from head to tail and, for each task t with no contention, removes t from the queue, adds c(t) to busy[u, p(t)], activates t at u, and updates MaxL, the maximum length among the activated tasks.]

4.2 Augmented Atomic Retrieval Scheduling (ARS+)

We denote a task t as a quintuple ⟨r(t), x(t), ℓ(t), c(t), p(t)⟩ for ARS+, where x(t) is the lag between the release time and the requested start time of the display. A sample application of ARS+ could be a movie-on-demand application where the customers reserve movies in advance. For example, at 7:00 pm Alice reserves GodFather to be displayed at 8:00 pm. Hence, letting t be the task corresponding to Alice retrieving GodFather, r(t) = 7:00 and x(t) = 1:00. We assume that at time r(t), a task t is either accepted, which means that its display will start at r(t) + x(t) with at most a bounded number of seconds of startup latency, or the system rejects the task right away (at time r(t)). Hence, this system always has a low system load and a short waiting queue. Otherwise, the behavior of ARS+ under a high system load becomes identical to that of ARS, and similar heuristics (see Sec. 4.1.5) can be employed.

With ARS+, the scheduler has extra information about the tasks, which results in more flexible scheduling. Suppose t is started at time interval u; there are now three possible scenarios: 1) r(t) ≤ u < r(t) + x(t), 2) u = r(t) + x(t), or 3) u > r(t) + x(t). Trivially, with the second and third scenarios t will observe a startup latency of zero and u − r(t) − x(t) intervals, respectively. These two scenarios are similar to ARS, where a task is started at or after its release time. The interesting case is the first scenario. In this case, although the retrieval of the corresponding object is started sooner than r(t) + x(t), the display can still be initiated at r(t) + x(t). This can be achieved by buffering the data retrieved prior to r(t) + x(t), termed upsliding, described¹⁰ in [8]. Upsliding starts a retrieval task sooner than the display of its corresponding object. Memory buffers are employed to store those subobjects that are retrieved sooner and have not been displayed as yet. As the delay between retrieval and display increases, the amount of required memory also increases. To illustrate the concept of upsliding and its memory requirement, consider the following example.

Example 4.5: Suppose D = 3, T = {t1, t2, t3}, and
t1: ⟨0, 2, 5, 0.6, 1⟩, t2: ⟨1, 1, 5, 0.6, 1⟩, t3: ⟨2, 0, 5, 0.6, 1⟩.
The placement of objects X, Y, and Z, referenced respectively by t1, t2, and t3, is shown in Fig. 4. The required display schedules of these tasks as a function of time are depicted in Fig. 5b. To simplify the example, all the tasks have equal start times (i.e., r(t) + x(t)), lengths, and display bandwidths, and their first subobjects reside on the same cluster. Hence, if a scheduler starts all the tasks at their start time, there will be retrieval contention for cluster 1. One solution is to postpone the retrieval of t2 and t3 for one and two intervals, respectively, observing a total startup latency of 3 intervals for this collection of tasks. However, since the information about t2 (t1) is released one (two) interval(s) sooner than its start time, its retrieval can start earlier by employing upsliding. In this case, no startup latency will be observed. Table 1 demonstrates the status of retrieval, memory, and display per time interval.

From the above example, the memory requirement for an upslided task has a growing phase, a steady phase, and a shrinking phase. The detailed computation of the memory requirement for each phase can be found¹¹ in [41]. The important phase is the steady phase, which determines the maximum memory requirement.

10 The difference is that in [8], we introduced upsliding for binary single-media composite objects.
11 In [41], we assumed a single media type. The extension of the equations to a mix of media types can be done similarly to Eq. 2, which computes the memory requirement for the steady phase.

Time Interval | Retrieve    | Memory (Sliding) | Memory (Cycle-based) | Display
0             | X1          | X1               | -                    | -
1             | X2, Y1      | X1, X2, Y1       | -                    | -
2             | X3, Y2, Z1  | X2, X3, Y2       | X1, Y1, Z1           | -
3             | X4, Y3, Z2  | X3, X4, Y3       | X2, Y2, Z2           | X1, Y1, Z1
4             | X5, Y4, Z3  | X4, X5, Y4       | X3, Y3, Z3           | X2, Y2, Z2
5             | Y5, Z4      | X5, Y5           | X4, Y4, Z4           | X3, Y3, Z3
6             | Z5          | -                | X5, Y5, Z5           | X4, Y4, Z4
7             | -           | -                | -                    | X5, Y5, Z5

Table 1: System status with upsliding for Example 4.5

The maximum memory requirement for each upslided task is proportional to both the number of upslided intervals and its display bandwidth requirement:

$$ MaxMem(t) = \begin{cases} c(t) \cdot (r(t) + x(t) - \sigma(t)) & \text{if } r(t) + x(t) > \sigma(t) \\ 0 & \text{otherwise} \end{cases} \qquad (2) $$
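Eq. 2 translates directly into code. The sketch below is only an illustration; memory is expressed in the normalized units of the equation and, as explained in the next paragraph, multiplying the result by R_D · interval gives megabits.

```python
# Sketch: Eq. 2, the steady-phase memory requirement of an upslided task.
def max_mem(c, r, x, sigma):
    """c = c(t), r = r(t), x = x(t), sigma = interval at which retrieval starts."""
    return c * (r + x - sigma) if r + x > sigma else 0.0

# t1 of Example 4.5 (c = 0.6, r = 0, x = 2) started at sigma = 0 is upslided by
# two intervals, so it holds the equivalent of 1.2 full-rate subobjects:
print(max_mem(0.6, 0, 2, 0))     # 1.2 (multiply by R_D * interval for megabits)
```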

Note that the amount of required memory in megabits can be computed as MaxMem(t) · R_D · interval, where interval is the duration of a time interval in seconds. This is because the bandwidth requirement of t is c(t) · R_D and the display time of one subobject of the object referenced by t is interval seconds. Since Eq. 2 has already incorporated c(t), it is sufficient to multiply MaxMem(t) by R_D · interval to compute the memory requirement in megabits. Finally, note that ARS+ can be restricted to ARS by assuming ∀t ∈ T, x(t) = 0. Therefore, the following result holds.

Theorem 4.6: ARS+ is NP-hard.

4.2.1 Solving ARS+: Role of Memory

To describe the main structure of the ARS+ algorithm, assume the maximum amount of sliding for a task t is denoted by B(t). The main structure of ARS+ is very similar to that of ARS (see Sec. 4.1.3). The structure is on-line and does not make advance reservations of resources. However, the availability of buffers¹² is exploited. Thus, for each interval u, the set T' is constructed from the tasks t' such that ∀t' ∈ T':

1. u ≥ r(t'),
2. r(t') + x(t') − u ≤ B(t'),
3. t' can be scheduled at u with no retrieval contention (as defined in Def. 4.2), and
4. t' can be scheduled at u without exhausting the system memory (SYSMEM), i.e., assuming σ(t') = u, then ∑_{t ∈ S(u) ∪ {t'}} MaxMem(t) ≤ SYSMEM.

Previously, with ARS, as soon as a task was released (item 1) and it resulted in no retrieval contention (item 3), it was added to T'. Here, however, a task can start before its start time (item 2) if it is allowed to (based on B(t)), and it will be added to T' if, besides all the mentioned constraints, the system memory does not overflow (item 4). Once T' is constructed, the alternative methods to sort T' and the procedure of selecting tasks are exactly identical to those of the ARS heuristics. Indeed, the same heuristics and h functions can be employed. In addition, the implementation optimizations of Sec. 4.1.4 are applicable here.

12 We use the terms buffer and memory interchangeably in this paper.
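The construction of T' can be sketched as follows. This is our own illustration of the four conditions above; fits_bandwidth stands for the contention test of Def. 4.2, and max_mem for Eq. 2.

```python
# Sketch: building the candidate set T' of Sec. 4.2.1 at interval u.
def max_mem(t, start):
    # Eq. 2 with sigma(t) = start, in normalized units
    return t["c"] * max(t["r"] + t["x"] - start, 0)

def build_candidates(u, queue, active, B, sysmem, fits_bandwidth):
    # memory already committed by the tasks in progress (S(u))
    used = sum(max_mem(t, s) for t, s in active)
    candidates = []
    for t in queue:
        released      = u >= t["r"]                          # condition 1
        within_budget = t["r"] + t["x"] - u <= B(t)          # condition 2
        no_contention = fits_bandwidth(t, u, active)         # condition 3
        fits_memory   = used + max_mem(t, u) <= sysmem       # condition 4
        if released and within_budget and no_contention and fits_memory:
            candidates.append(t)
    # the candidates are then sorted on h(t) and activated exactly as in ARS,
    # re-checking conditions 3 and 4 as each one is admitted
    return candidates
```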

4.2.2 Assignment of Memory to Tasks

B(t) determines the maximum number of intervals a task t is allowed to slide. Since the amount of memory buffers required by t is directly proportional to the number of intervals it slides, B(t) also determines the amount of buffers available to the task. Hence, two issues remain to be discussed: 1) the value of B(t), and 2) the utilization of B(t) by t (i.e., should it be utilized greedily or conservatively). The lower bound of B(t) is 0. In this case, no sliding is allowed by t. If ∀t ∈ T, B(t) = 0, then the ARS and ARS+ scheduling algorithms become identical (treating r(t) + x(t) as the release time). The upper bound of B(t) is x(t). That is, independent of the amount of available memory, a task cannot be slided sooner than its release time. In Sec. 5, the optimal value of B(t) is both computed analytically and verified using a simulation study.

To address the second issue, in [8] we proposed two alternative algorithms, First Match and Exhaust B, to assign memory buffers for sliding. The algorithm First Match is conservative in its approach to using buffers. On the other hand, the algorithm Exhaust B tries to greedily use all B(t) buffers for sliding. Thus, the latter tries to minimize latency at the cost of increased memory, whereas the former tries to minimize the use of buffers for the current display. The algorithm of Sec. 4.2.1 is a generalization of Exhaust B. It strives to schedule a task t at r(t) + x(t) − B(t), exhausting all the B(t) buffers. In contrast, a generalization of First Match would perform as follows. At time r(t), it tries to see whether task t can be scheduled at r(t) + x(t). If so, then none of the B(t) buffers will be needed. Otherwise, it tries to schedule t at an interval nearest to r(t) + x(t) (say u), decrementing u until it either exhausts all B(t) buffers or u = r(t). If this happens, it tries to schedule t at u where u > r(t) + x(t). We eliminated the generalization of First Match from consideration for two reasons: 1) our simulation studies in [8] demonstrate the superiority of Exhaust B, and 2) the generalization of First Match is not efficient. To observe this, note that with the generalization of First Match it is possible to schedule a task t at time interval r(t) to start at a later time u, where u > r(t). However, such a strategy reserves resources in advance, making scheduling significantly inefficient (see Sec. 3).

4.3 Composite Retrieval Scheduling (CRS)

With CRS, each composite task consists of a number of atomic tasks. We use t to represent an atomic task and τ for a composite task. Similarly, T represents a set of atomic tasks, while Γ is a set of composite tasks. A composite task, itself, is a set of atomic tasks, e.g., τ = {t1, t2, ..., tn}. Each atomic task has the same parameters as defined in Sec. 4.1.1, except for the release time r(t). Instead, each atomic task has a lag time denoted by x(t). Without loss of generality, we assume for a composite task τ that x(t1) ≤ x(t2) ≤ ... ≤ x(tn). Subsequently, we denote the first atomic task in the set τ as car(τ), i.e., car(τ) = t1. The lag time of a task is identical to the lag parameter (see Sec. 2.3) of its corresponding object and determines the start time of the task with respect to x(car(τ)), i.e., the start time of the first object t1. Trivially, x(car(τ)) = 0. Briefly, x(t) determines the temporal relationships among the atomic tasks of a composite task. Each composite task, on the other hand, has only a release time r(τ), which is the time that a request for the corresponding composite object is submitted.

Definition 4.7: An atomic task (of a composite task) t is schedulable at u iff t can be started at u and completed at u + ℓ(t) − 1 without resulting in retrieval contention as defined in Def. 4.2.

Definition 4.8: A composite task τ = {t1, t2, ..., tn} is said to be schedulable at u iff ∀t ∈ τ, t is schedulable at u + x(t).

The formal definition of the CRS problem is in [42]. It is also shown there that CRS can be restricted to ARS+ by assuming that ∀τ ∈ Γ, τ is a singleton set, and hence,

Theorem 4.9: CRS is NP-hard.

4.3.1 Approaches to Solve CRS

There are two general approaches to solve the CRS problem. One is to view a composite task as a set of independent atomic tasks. The other is to view a composite task as a single atomic task. As explained below, either of these approaches increases the complexity of scheduling.

The first approach constructs a set T as T = ∪_{τ∈Γ} τ. The release time for an atomic task t ∈ τ can be computed as r(t) = r(τ), i.e., an atomic task is released once its corresponding composite task is released. Now the CRS problem can be viewed as the ARS+ problem that should schedule the set T. The main difference is that those t that are not the first task of any composite task (i.e., ∄τ ∈ Γ s.t. t = car(τ)) should either be scheduled immediately at their start time (i.e., r(t) + x(t)) or the display of their corresponding composite object will suffer from hiccups. In other words, they cannot tolerate any startup latency once the display of their composite task has started. The only case in which t ∈ τ can tolerate a startup latency is when car(τ) observes the same startup latency. Enforcing such a constraint requires the scheduler to be more complex.

An alternative approach is to schedule composite tasks as if they are atomic tasks, employing the ARS heuristics. To achieve this, we should assign a length (ℓ(τ)) and a bandwidth requirement (c(τ)) to each composite task so that the heuristics designed for ARS are able to prefer one task to another. Towards that end, for each composite task τ with n atomic tasks we define:

$$ c(\tau) = \frac{\sum_{t \in \tau} c(t)}{n} \qquad (3) $$

$$ \ell(\tau) = \max_{t \in \tau} \, (x(t) + \ell(t)) \qquad (4) $$
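A literal transcription of Eqs. 3 and 4 (our own sketch) is shown below, evaluated on the composite task of Example 4.12, introduced later in this section.

```python
# Sketch: c(tau) and l(tau) of Eqs. 3 and 4, used only to let the ARS heuristics
# compute h(tau) for a composite task.
def composite_rate(tau):                       # Eq. 3
    return sum(t["c"] for t in tau) / len(tau)

def composite_length(tau):                     # Eq. 4
    return max(t["x"] + t["l"] for t in tau)

tau = [{"x": 0, "l": 5, "c": 0.8, "p": 2},     # t1 of Example 4.12 (object V)
       {"x": 2, "l": 5, "c": 0.6, "p": 1},     # t2 (object Y)
       {"x": 2, "l": 5, "c": 0.6, "p": 1}]     # t3 (object Z)
print(round(composite_rate(tau), 2), composite_length(tau))   # 0.67 7
```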

If more than one composite task is schedulable at u, then one of the ARS heuristics is employed to select one (or more) task(s). The definition of a composite task being schedulable at u is as given in Def. 4.8. However, we should emphasize that the use of c(τ) and ℓ(τ) is only to enable the ARS heuristics to compute h(τ). As in the previous approach, the key problem with this approach is also to verify schedulability, i.e., it is more complex to infer retrieval contention. To provide an analogy of how the first and second approaches differ, consider Fig. 8. In Fig. 8a each shaded box is an atomic task and the problem of ARS is to fit them all in the big rectangle on the right, with the objective of minimizing the amount of required time by consuming as much of the available cluster bandwidth as possible. With CRS the problem of filling the rectangle is identical to ARS; however, each task is now a polygon (instead of a rectangle).

[Figure 8: ARS vs. CRS analogy (bandwidth vs. time) — a. ARS Problem, b. CRS Problem]

The first approach strives to partition a polygon into boxes so that it can use the ARS+ solutions to solve CRS. The problem is that the ARS+ solutions might destroy the shape of the polygons in the process of scheduling. Hence, more constraints should be added to ARS+. The second approach, on the other hand, maintains the shape of the polygons while scheduling them. Instead, it employs the ARS solutions to select a polygon among all the polygons that can fit from a certain point. In this paper we focus on the second approach and do not consider the first further.

We now return to the problem of determining retrieval contention with the second approach. The key point is that the structure of a composite task does not follow a regular pattern, thus violating the first criterion for efficient contention prediction (see Sec. 3). Hence, to schedule a composite task τ at u, the entire duration from u to u + ℓ(τ) should be examined for retrieval contention. In other words, advance resource reservation via look-ahead becomes essential. Implementation of this technique demands choosing a longer time interval in order to compensate for the extended duration of the computation of cop. This results in fewer simultaneous displays per interval compared to a system with the same system load that does not support composite objects.

4.3.2 The Critical Role of Memory

As we have shown in Sec. 4.2, the scheduling of ARS+ can benefit from memory. However, in the case of CRS, memory may be necessary to achieve schedulability, as discussed below. A composite object may have internal contention, i.e., the atomic tasks that constitute a composite task may compete with one another for the available cluster bandwidth. Hence, it is possible that, due to such internal contention, a composite task is not schedulable even if there are no other active requests. For example, consider a composite task τ that consists of the atomic tasks of Example 4.3. Note that the release times of the tasks of this example should now be considered as their lag times. Independent of the time interval at which τ is initiated, there is an internal contention between t1 and t3 after 4 intervals from the start time of τ. In other words, it is not possible to start all the atomic tasks of τ at their start time.

Definition 4.10: Internal contention: Consider a composite task τ = {t1, t2, ..., tn} and ∀u ≥ 0 let S_τ(u) ⊆ τ be the set of atomic tasks for which x(t) ≤ u < x(t) + ℓ(t). The composite task τ has internal contention iff ∃u, i, 1 ≤ i ≤ D, such that ∑_{t∈S_τ(u)} R_i(t) > 1.0, where:

$$ R_i(t) = \begin{cases} c(t) & \text{if } (p(t) + u - x(t) - 1) \bmod D = i - 1 \\ 0.0 & \text{otherwise} \end{cases} \qquad (5) $$

The above definition intuitively means that ∄u such that τ is schedulable at u (see Def. 4.8). This problem is particular to CRS because there is a dependency among the start times of atomic tasks and yet these atomic tasks can conflict with each other. Using our graphical analogy, the width of the polygon that represents a composite task with internal contention is larger than the width of the rectangle (see Fig. 8). To resolve internal contention, one should move the atomic tasks of τ so that the deformed polygon can fit in the rectangle. This can be achieved by extending the buffered sliding introduced in [8]. Resolving the internal contention for a composite task τ means modifying the start times of its constituent atomic tasks such that Def. 4.10 does not hold true for τ. Such a modification requires the use of buffers. Ideally, we should minimize the amount of required buffer. However, it can be shown [42] that the problem, termed RIC for short, is intractable:

Theorem 4.11: Minimizing the amount of buffer needed to resolve internal contention in a composite task (RIC) is NP-hard.

Resolving internal contention can be performed as a pre-processing step for composite tasks before they are scheduled. This pre-processing step can use two types of buffered sliding for composite tasks: upsliding and downsliding. We briefly describe each of these. Upsliding is similar to the discussion of Sec. 4.2: it resolves internal contention for a composite task τ in the same way that it solves ARS+ for a set of atomic tasks τ where ∀t ∈ τ, r(t) = r(τ). Once this problem is solved, the start times of the atomic tasks can be updated to their scheduled times (i.e., x(t) ← σ(t) − r(t)) and the modified composite task will have no internal contention. Since resolving internal contention is equivalent to solving ARS+, the use of ARS+ may lead a task to start after its start time. This can also be compensated for by memory, as described later as part of the downsliding approach. Note that the objective of ARS+ should be modified to minimize the amount of required memory (instead of minimizing the average startup latency). This is described in detail in Sec. 4.3.3.

An alternative approach to resolving internal contention is downsliding. With this approach, the start times of the constituent atomic tasks of a composite task are postponed (moved downward). To compensate for this delay, the display of the entire composite object is delayed. To illustrate, consider the following example.

Example 4.12: Suppose D = 3, τ = {t1, t2, t3}, and r(τ) = 0. Given the task information as ti: ⟨x(ti), ℓ(ti), c(ti), p(ti)⟩:
t1: ⟨0, 5, 0.8, 2⟩, t2: ⟨2, 5, 0.6, 1⟩, t3: ⟨2, 5, 0.6, 1⟩.
The placement of objects V, Y, and Z, referenced respectively by t1, t2, and t3, is shown in Fig. 4. The required display schedules of these tasks as a function of time are depicted in Fig. 5c. If all the tasks are initiated at their start time (i.e., r(τ) + x(t)), there will be retrieval contention for cluster 1 at u = 2. However, the system can downslide the retrieval of t2 (t3) by one (two) interval(s) while the retrieval of τ still starts at 0. In this case, the display of τ will observe a startup latency of two intervals. Table 2 demonstrates the status of retrieval, memory, and display per time interval.

The amount of startup latency introduced by downsliding for a composite task τ is:

$$ delay(\tau) = \max_{t \in \tau} \, (\sigma(t) - r(t) - x(t)) \qquad (6) $$

Time Interval | Retrieve    | Memory (Sliding) | Memory (Cycle-based) | Display
0             | V1          | V1               | -                    | -
1             | V2          | V1, V2           | -                    | -
2             | V3          | V2, V3           | V1                   | -
3             | V4, Y1      | V3, V4, Y1       | V2                   | V1
4             | V5, Y2, Z1  | V4, V5, Y2       | V3, Y1, Z1           | V2
5             | Y3, Z2      | V5, Y3           | V4, Y2, Z2           | V3, Y1, Z1
6             | Y4, Z3      | Y4               | V5, Y3, Z3           | V4, Y2, Z2
7             | Y5, Z4      | Y5               | Y4, Z4               | V5, Y3, Z3
8             | Z5          | -                | Y5, Z5               | Y4, Z4
9             | -           | -                | -                    | Y5, Z5

Table 2: System status with downsliding for Example 4.12

The memory requirement of a composite task contributed by downsliding at each time interval u is:

$$ Mem(u) = Mem(u-1) + Ret(u) - Disp(u) \qquad (7) $$

where Ret(u) and Disp(u) are the amounts of data retrieved and displayed at interval u, respectively. The computation of Ret(u) and Disp(u) is described in [42].
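Under the assumptions of Example 4.12 (normalized units, the cycle-based one-interval offset between retrieval and display, and a display delay equal to the maximum downslide), Eq. 7 reproduces the memory columns of Table 2; the computation of Ret(u) and Disp(u) below is our own simplification of the procedure in [42].

```python
# Sketch: memory profile of Eq. 7 for the downslided schedule of Table 2.
tasks = [(0, 5, 0.8, 0),    # t1: (lag x, length l, rate c, downslide), object V
         (2, 5, 0.6, 1),    # t2: object Y, downslided by 1
         (2, 5, 0.6, 2)]    # t3: object Z, downslided by 2
delay = max(d for (_, _, _, d) in tasks)        # delay(tau) = 2, Eq. 6

horizon = 10
ret = [0.0] * horizon                           # Ret(u): data retrieved at u
disp = [0.0] * horizon                          # Disp(u): data displayed at u
for x, l, c, down in tasks:
    for k in range(l):                          # k-th subobject of the task
        ret[x + down + k] += c                  # retrieval postponed by `down`
        disp[x + delay + 1 + k] += c            # display delayed by delay(tau),
                                                # plus one cycle-based interval

mem = 0.0
for u in range(horizon):
    mem += ret[u] - disp[u]                     # Eq. 7
    print(u, round(mem, 2))
# Peaks at 4.2 for u = 4: the subobjects V4, V5, Y2 (sliding) plus V3, Y1, Z1
# (cycle-based buffer) listed in Table 2.
```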

4.3.3 Heuristic Approaches to Solve RIC

As mentioned in Sec. 4.3.2, ARS+ and RIC are very similar. With both ARS+ and RIC, if an atomic task t is scheduled sooner than its start time (i.e., σ(t) < r(t) + x(t)), it requires some memory. However, if σ(t) > r(t) + x(t), with ARS+ the task will observe a startup latency, while with RIC the corresponding composite task might (or might not) observe an extra latency and some memory is still required. Furthermore, the objective of ARS+ is different from that of RIC. With ARS+, a certain amount of memory buffers is assigned to a task (proportional to B(t)) and the objective is to utilize the buffers to minimize the average startup latency. With RIC, however, the objective is to minimize both the amount of required memory and the startup latency for the modified composite task. In other words, the problem is: if a composite task τ has internal contention, how should the start times of its constituent atomic tasks be modified (using the buffered sliding techniques) in order to minimize both the amount of required memory and the startup latency? Since the startup latency is contributed by downsliding and downsliding increases the memory requirement, these two objectives seem to be consistent. However, a heuristic that slides all the tasks upward might observe the lowest startup latency but might require a huge amount of memory. This claim is verified by our simulation studies.

There could be a family of heuristics based on alternative decisions on which tasks should be slided, by how much, and in which direction, to meet both of the objectives of RIC. In this paper, we have investigated five heuristics. The performance of the heuristics is compared in Sec. 5.4. As a starting point, consider the two alternative memory assignment algorithms described in Sec. 4.2.2: Exhaust B and First Match. To resolve internal contention for a composite task τ, a variation of Exhaust B is to assume ∀t ∈ τ, B(t) = x(t). That is, it strives to start all the constituent atomic tasks from the release time of the composite task and, if this fails, it slides them down. From Sec. 4.2.2 and [8], the intuition is that Exhaust B will result in the minimum amount of downsliding and hence reduce the startup latency of the composite task. However, due to its blind upsliding of all the atomic tasks, it might require a large amount of memory.

First Match will not modify the starting time of a task unless it is definitely required. That is, its first attempt is to start the task at its start time. Furthermore, based on the observation that downsliding results in startup latency, if the first attempt fails, it first tries to upslide the task (one interval at a time). Downsliding the task occurs only if the task cannot be upslided. Another observation about downsliding is that only the maximum amount of downsliding determines the startup latency of the corresponding composite task (see Eq. 6). In addition, downsliding a task requires all other tasks that are not downslided to consume memory. Based on these observations, once a task is downslided for k intervals, downsliding other tasks for k' intervals where k' ≤ k results in both a lower memory requirement and no extra startup latency. This fact is used by DynamicU, which is an extension of First Match. DynamicU dynamically updates the start time of the unscheduled tasks from x(t) to x(t) + MaxDown, where MaxDown is the maximum amount of downsliding computed locally among the already scheduled tasks.

LateU is an alternative version of DynamicU which has a higher complexity. LateU first applies First Match to τ to resolve contention. Assume that for every t ∈ τ, x'(t) is the modified value of x(t) resulting from applying First Match. Subsequently, MaxDown is computed as MaxDown = max_{t∈τ} (x'(t) − x(t)), and in a second pass the algorithm strives to set the start time of every task t to x(t) + MaxDown. If that fails, the task is upslided from x(t) + MaxDown one interval at a time until, in the worst case, it starts back at x'(t). The goal of LateU is that once a task is downslided for MaxDown, other tasks should also be downslided for MaxDown if possible, in order to minimize the memory requirement while the observed latency remains unchanged. Our final algorithm is the Largest Object First (LofLateU) heuristic. The observation here is that the shorter the length of a task (ℓ(t)) and the lower its consumption rate (c(t)), the less memory it requires for sliding. Hence, LofLateU first sorts the tasks in non-increasing order of ℓ(t) · c(t) and then applies LateU. Therefore, it increases the probability that a larger task is scheduled at its start time without requiring any sliding. Our simulation results (see Sec. 5.4) indicate the superiority of the LofLateU heuristic. Note that the complexity of LofLateU is higher than that of the other heuristics.
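The step that distinguishes LofLateU from LateU is just a reordering; the sketch below is an assumption on our part, with late_u standing for a hypothetical implementation of the two-pass LateU procedure, and shows only that ordering step.

```python
# Sketch: the ordering step of LofLateU. Tasks are sorted in non-increasing order
# of l(t) * c(t), so larger objects are more likely to keep their original start
# times, and the reordered list is handed to LateU.
def lof_late_u(tau, late_u):
    ordered = sorted(tau, key=lambda t: t["l"] * t["c"], reverse=True)
    return late_u(ordered)      # late_u: hypothetical two-pass LateU routine
```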

5 Some Simulation Experiments

As discussed before, an accurate study of the behavior of the heuristics and buffered sliding techniques should be done either analytically or in the context of a real implementation. However, both of these methods are time consuming (in the order of years) and difficult to achieve. Instead, to gain a basic insight, we chose to analyze the heuristics and sliding techniques using different simulation models. This section contains three sets of experiments. Each set investigates the specific characteristics of ARS, ARS+, and CRS, respectively.

In the first set, we study a heavily loaded system in the context of ARS. By a heavily loaded system, we mean that during the majority of time intervals the aggregate bandwidth requirement of the tasks greatly exceeds the aggregate bandwidth of the system clusters. Therefore, a queue of pending tasks is formed, and hence the choice of a heuristic makes a difference in the system performance. A sample application is a news-on-demand system where, due to some unexpected event (e.g., Iraq invades Kuwait), many people want to access the system simultaneously. Other example applications are provided in Sec. 4.1.5. In these types of applications, rejecting a request due to a high system load is not possible. Recall that the same heuristics are extended for the CRS problem (see Sec. 4.3), which is also another sample application where rejecting a request is not a choice.

Media type                          | Consumption rate (Mb/s) | Subobject size (MB) | c(t)
MPEG-1 (or stereo CD quality audio) | 1.5                     | 0.38                | 0.03
Low quality MPEG-2                  | 5                       | 1.25                | 0.1
High quality MPEG-2                 | 15                      | 3.75                | 0.3
Compressed HDTV                     | 35                      | 8.75                | 0.7

Table 3: Media parameters

For our experiments, we assume a news-recording application, where a queue of reporters is formed at peak load waiting to store their news on magnetic storage devices. If the queue is served off-line, the objective would be to empty the queue as fast as possible (minimizing finishing time). Otherwise, the objective might be to minimize the average observed startup latency. Note that in a lightly loaded system, where either a small or no queue of tasks is formed, the performance of the different heuristics becomes identical.

The focus of the second set of experiments is on ARS+ and a lightly loaded system, in order to investigate the impact of the amount of buffer assigned for sliding on the system performance. From [42], we concluded that buffered sliding has no deterministic impact on system performance when the system load is high, and hence we do not report it here. We chose one of the heuristics (FCFS) and study the impact of B(t). A sample application that might assume a light system load is a reservation-based news-on-demand application where a very low startup latency is desired. That is, a request is either serviced observing a few seconds of latency or rejected if the system is overloaded. Note that rejection occurs at the time of reservation so that the user does not need to wait at all. The application must be reservation-based or else buffers cannot be utilized (see Sec. 4.2).

Finally, the last set of experiments studies the performance of the memory assignment heuristics to resolve internal contention of composite tasks (RIC). Recall that with CRS, the internal contention of a composite task should be resolved prior to its activation. Subsequently, multiple composite tasks are scheduled employing the heuristics described for ARS and evaluated in the first set of experiments. For the rest of this section, we first describe the simulation model. Next, we report the results of our experiments.

5.1 Simulation Model

For the purposes of this evaluation, we assumed a hardware platform consisting of a variable number of disk clusters (1 ≤ D ≤ 20), each consisting of two one-Gigabyte Seagate ST31200W disks. Hence, the effective bandwidth of each cluster is 50 Mb/s. Note that the choice of the number of disks per cluster is arbitrary and does not impact the observations. That is, fixing the number of both clusters and released tasks per unit of time, the system becomes overloaded later (sooner) once we increase (decrease) the number of disks per cluster. In our experiments we chose to vary the number of clusters and/or released tasks per unit of time in order to study different loads. For all the experiments the subobject sizes are computed assuming that the duration of a time interval is 2 seconds. For the first two sets of experiments we assumed a news-on-demand application where objects belong to one of the four different media types shown in Table 3. Again, these assumptions are arbitrary and different values for the parameters did not impact the main observations reported in this section. However, we strive to choose realistic values consistent with the state of the art at the time of this writing.
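The entries of Table 3 follow from the cluster bandwidth and interval length just stated; the short check below (our own, not from the paper) recomputes them.

```python
# Sketch: recomputing Table 3 from R_D = 50 Mb/s and a 2-second interval.
R_D = 50.0          # Mb/s, effective bandwidth of a two-disk cluster
interval = 2.0      # seconds per time interval

media = {"MPEG-1 (or stereo CD quality audio)": 1.5,   # display rates in Mb/s
         "Low quality MPEG-2": 5.0,
         "High quality MPEG-2": 15.0,
         "Compressed HDTV": 35.0}

for name, rate in media.items():
    c = rate / R_D                     # normalized consumption rate c(t)
    sub_mb = rate * interval / 8.0     # one interval of data, in megabytes
    print(f"{name:36s} c(t) = {c:.2f}  subobject = {sub_mb:.2f} MB")
```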

5.2 Set 1: ARS, Comparison of the Heuristics

In the first set of experiments, we varied the size of clips (= ℓ(t)) from 2.33 minutes (70 subobjects) up to 9.32 minutes (280 subobjects), simulating the typical length of a news clip. To simulate a peak load, we assumed that the system is fully utilized from time 0 to time warmup and newly released tasks cannot be scheduled. During this period a number of tasks (say n) are released with random release times (where 0 ≤ r(t) ≤ warmup) and added to a queue. Finally, at time warmup the system starts to schedule tasks from the head of the queue. The queue is sorted based on the h-function of a heuristic (see Sec. 4.1.5). The value of n determines the magnitude of the peak load and is the length of the queue. By setting warmup equal to n, we make the random release times of the n tasks uniformly distributed from 0 to warmup. In our experiments we varied the length of the queue (n) from 100 up to 3000 tasks.

There are two criteria to compare the performance of the heuristics. First, how fast a heuristic can empty the queue. This is consistent with the first optimization objective of ARS, which is to minimize the finishing time (w) of a set of tasks T. Second, which heuristic results in the minimum average startup latency observed by the tasks. These two criteria are actually identical to the well-known throughput and latency criteria. The similarity between latency and average startup latency is trivial to see. However, translating the finishing time to throughput is not as straightforward. Assume a set of tasks T is scheduled on a D cluster system and finished by time¹³ w. Then the average throughput of the system (the number of tasks activated concurrently) can be computed as:

$$ \Theta = \frac{n \cdot \bar{\ell}}{w} \qquad (8) $$

where ℓ̄ is the average length of the tasks (175 intervals in our experiments). To have a theoretical base point for comparison, the average throughput of a D cluster system can also be computed as:

$$ AVG = \frac{D}{\bar{c}} \qquad (9) $$

where c̄ is the average consumption rate of the tasks (0.28 in our experiments, assuming the c(t) values provided in Table 3). Fig. 9 compares the throughput of the different heuristics by using AVG as a comparison point. To be specific, the throughput of each heuristic (computed from its finishing time by employing Eq. 8) is normalized by the value of AVG (say Θ_normalized). Subsequently, the percentage of improvement as compared to AVG is computed as¹⁴ (Θ_normalized − 1) × 100. In Fig. 9, n was fixed at 1500 and D was varied from 1 to 20 (X-axis). Fig. 9a illustrates the throughput of the set of heuristics that is termed well-behaved. LOF outperformed both SOF and FFI by an average margin of 20%. The margin of difference between LOF and FFD was approximately 5%. The other set of heuristics, termed ill-behaved (Fig. 9b), almost always performed worse than AVG. The reason that they are not well-behaved is that their behavior is non-deterministic. They sometimes get lucky and sort the tasks in such a way that it results in a good finishing time, and sometimes they do not. Hence, one might as well sort the tasks randomly and gain similar performance. Briefly, LOF shows a consistent superiority over the other heuristics in system throughput.

For the same experiments, we normalized the average startup latency observed by the different heuristics by that of SOF.

13 Note that the real finishing time is w − warmup because no tasks are scheduled during the first warmup intervals. We consider this factor in our experiments and henceforth by w we mean w − warmup.
14 The reason that we do not report any absolute numbers for latency and throughput is that those numbers would be very dependent on our simulation parameters. However, the percentage of improvement is almost independent of the parameters.
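The percentages plotted in Fig. 9 are obtained from Eqs. 8 and 9 as sketched below; the finishing time w used here is a made-up value for illustration, since the paper reports only relative numbers.

```python
# Sketch: throughput of Eq. 8, the AVG comparison point of Eq. 9, and the
# percentage of improvement plotted in Fig. 9.
n = 1500            # length of the queue
avg_len = 175       # average task length in intervals (from the text)
avg_rate = 0.28     # average normalized consumption rate (from the text)
D = 16              # number of clusters
w = 4200            # hypothetical finishing time, in intervals

throughput = n * avg_len / w                     # Eq. 8
avg_point = D / avg_rate                         # Eq. 9
improvement = (throughput / avg_point - 1) * 100
print(round(throughput, 1), round(avg_point, 1), round(improvement, 1))
# 62.5 57.1 9.4  -> this hypothetical run would sit about 9% above AVG in Fig. 9
```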


[Figure 9: Throughput: varying D, length of the queue is 1500; Y-axis: % of improvement in throughput as compared to AVG, X-axis: number of clusters — a. well-behaved heuristics (LOF, FFD, SOF, FFI, and the AVG line), b. ill-behaved heuristics (ECF, LCF, EDF, FCFS, LCFS)]

[Figure 10: Latency: varying D, length of the queue is 1500; Y-axis: % of increase in latency as compared to SOF, X-axis: number of clusters — a. well-behaved heuristics (LOF, FFD, FFI), b. ill-behaved heuristics (ECF, LCF, EDF, FCFS, LCFS)]


[Figure 11: Well-behaved heuristics: varying the length of the queue (D = 16) — a. throughput (% of improvement as compared to AVG), b. latency (% of increase as compared to SOF)]

Subsequently, as in the case of throughput, we computed the percentage of increase as compared to SOF. The reason we chose SOF is that it demonstrated the minimum average startup latency in all the cases. As shown in Fig. 10a, LOF and FFD suddenly become the worst heuristics in latency. To observe why, consider the case where the queue consists of 11 tasks with identical lengths, where c(t) = 0.1 for ten of them and 1.0 for the last one. LOF and FFD schedule the last one first, resulting in a startup latency observed by the 10 other tasks, while FFI or SOF schedules the 10 tasks first and makes only a single task observe a startup latency. The difference between the latency of SOF and LOF is more significant (an average of 150%) than the difference between their throughput. The ill-behaved heuristics (Fig. 10b) were always outperformed by SOF (by a margin of 10% to 50%).

To understand the impact of different lengths of the queue, we fixed D at 16 and varied n from 100 to 3000. The results for the well-behaved heuristics are reported in Fig. 11. The main observations remain valid, and as expected the performance of the heuristics becomes closer for small n. An interesting observation is that for small n, LOF and FFD outperformed SOF and FFI in latency as well (Fig. 11b). This is because with short queues, the different objectives are no longer conflicting. A similar trend was observed with RIC and LofLateU when the system utilization was in the neighborhood of 100% (see Sec. 5.4).

In conclusion, for an off-line system where it is important to empty the wait queue as fast as possible, LOF is the superior heuristic. On the other hand, for an on-line system where minimizing the average startup latency is the important objective, SOF should be the heuristic of choice. Similarly, a system that expects a single peak load and does not require the entire system resources immediately (expecting a light system load to follow) is better off employing SOF. A system administrator can switch heuristics in the middle of system execution to adapt the system to alternative scenarios. In this case, the ill-behaved heuristics, with their non-predictable behavior, are not useful to the administrator.

5.3 Set 2: ARS+, Impact of Buffered Sliding

Figure 12: Value of B(t): average startup latency (in number of intervals) versus B (in number of intervals); (a) very light load (6 requests per minute), (b) light load (7.5 requests per minute).

With a light system load, tasks compete with each other not because the entire system bandwidth is exhausted during the majority of intervals, but because the bandwidth of a single cluster is exhausted during specific intervals. In other words, due to the placement of their corresponding objects, tasks might collide on a single cluster. In this case, instead of postponing tasks, one can resolve the contention by applying sliding and hence decrease the average startup latency. This set of experiments strives both to verify this claim and to determine the reasonable amount of sliding each task should be allowed to do.

To simulate a lightly loaded system, we employed an open simulation model: the number of tasks released every minute is controlled by a Poisson distribution (n = 4000 tasks in total). We manipulated the Poisson parameter to model two alternative system loads: a very light load (6 tasks per minute) and a light load (7.5 tasks per minute), corresponding to 50% and 65% system utilization (D = 16), respectively. To allow the tasks to slide, their release time should differ from their start time (the ARS+ problem). Similar to the first set of experiments, we varied the size of the clips (= l(t)) from 2.33 minutes (70 subobjects) up to 9.32 minutes (280 subobjects), simulating the typical length of news clips. The lag parameter, x(t), varies from 0 to 40 intervals (80 seconds). We fixed the number of clusters at 16, and the amount of memory available for buffered sliding is 1.22 Gigabytes (GB) (see footnote 15). The observations are independent of the values of D and system memory; for example, if less memory is assumed, the system memory is simply exhausted sooner.

Footnote 15: Note that for a system with 32 physical disk drives, this is a reasonable amount of memory, conceptually translating to 39 MB of memory per physical disk (a typical workstation comes with a 1 GB disk and 64 MB of memory).

Before analyzing the impact of B(t) using the simulation model, we derive a theoretical optimal value for B(t). A theoretical optimal value for B(t) is the one that satisfies the following equation:

\sum_{t \in S(u)} B(t) \cdot Sub(t) = sysmem \qquad (10)

where S(u) denotes the set of active tasks at u, Sub(t) is the subobject size of the object referenced by t, and sysmem is the available system memory in MB. Given a task t_big where c(t_big) = 1, we can compute Sub(t_big) = (R_D \cdot interval) / 8, which is the size of its subobject in MB. Subsequently, for any t, Sub(t) = Sub(t_big) \cdot c(t) / c(t_big), or Sub(t) = (R_D \cdot interval \cdot c(t)) / 8. Substituting Sub(t) in Equation 10 and assuming a fixed value (Opt_B) for all B(t), we obtain:

\frac{R_D \cdot interval}{8} \cdot Opt_B \cdot \sum_{t \in S(u)} c(t) = sysmem \qquad (11)

Note that in the worst case \sum_{t \in S(u)} c(t) = D, and hence from Equation 11 we obtain:

Opt_B = \frac{8 \cdot sysmem}{R_D \cdot interval \cdot D} \qquad (12)
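As a numeric sketch of Eq. 12: with interval = 2 seconds (40 intervals correspond to 80 seconds in the setup above) and sysmem of roughly 1250 MB (1.22 GB), an assumed per-cluster bandwidth of R_D = 50 Mb/s reproduces the Opt_B = 6.25 used below. The actual value of R_D comes from Table 3 and is an assumption here.

    # Eq. 12: Opt_B = 8 * sysmem / (R_D * interval * D); sysmem in MB, R_D in Mb/s.
    def opt_b(sysmem_mb, rd_mbps, interval_s, d):
        return 8.0 * sysmem_mb / (rd_mbps * interval_s * d)

    print(opt_b(sysmem_mb=1250.0, rd_mbps=50.0, interval_s=2.0, d=16))  # 6.25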

To verify the derived optimal value, consider Fig. 12. In this figure, for all tasks t, we varied the value of B(t) from 0 up to 20 and compared the average startup latency observed under FCFS. Recall that with a low system load, no queue is constructed, so all the heuristics perform identically. The value of Opt_B for our experiments is 6.25. In Fig. 12a, as we increase the value of B, the average startup latency decreases until B = 10; when more than 10 buffers are assigned to each task for sliding, the curve plateaus. The reason for the small jumps in the average startup latency when B > 10 is the competition for the available memory (i.e., memory becomes a scarce resource). The curve did not plateau at B = 6.25 because the system load is very light and \sum_{t \in S(u)} c(t) < D. With the light system load (Fig. 12b), however, we observe that the curve plateaus when B increases beyond 6, consistent with the computed value of Opt_B. The reduction in the average startup latency with buffered sliding ranges from 240% to 950%, depending on the system load. Note that with all the different loads, the startup latency is decreased significantly even with B(t) = 1 as compared to no sliding (B(t) = 0).

We also examined setting B(t) as a function of both x(t) (i.e., B(t) = x(t) / B) and sysmem (i.e., B(t) = 8 \cdot sysmem / (R_D \cdot interval \cdot B)). In each case we varied the value of B. With B(t) = x(t) / B, the observations were similar to those reported for a constant B(t); however, the reduction in the average startup latency was less than when B(t) is fixed. The second case (where B(t) = 8 \cdot sysmem / (R_D \cdot interval \cdot B)) has already been covered because it is a subset of the reported cases. To illustrate, note that in this case all the tasks are still assigned a fixed value of B(t); subsequently, by fixing sysmem at 1.22 GB and varying B, the different values generated for B(t) are a subset of those reported in this section.
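The buffered (up-)sliding used in this set of experiments can be sketched as follows. The data structures and the single-pass policy below are simplifying assumptions for illustration and do not reproduce the exact algorithm of Sec. 4: if a task collides on its clusters at its desired start interval, its retrieval is shifted up to min(B(t), x(t)) intervals earlier, and the prefetched subobjects are held in memory so that the display still begins on time.

    # Simplified sketch of buffered up-sliding for ARS+. 'free' maps
    # (interval, cluster) -> residual bandwidth; a task reads one subobject
    # per interval from clusters p, p+1, ... (round-robin over D clusters).
    # Sliding s intervals up costs roughly s subobject buffers.

    def try_slide(free, start, length, cluster0, rate, D, max_shift, capacity=1.0):
        for s in range(max_shift + 1):            # s = 0 means no sliding
            first = start - s
            if first < 0:
                break
            slots = [(first + k, (cluster0 + k) % D) for k in range(length)]
            if all(free.get(slot, capacity) >= rate for slot in slots):
                for slot in slots:                # reserve the bandwidth
                    free[slot] = free.get(slot, capacity) - rate
                return s                          # buffers used ~ s subobjects
        return None                               # no feasible slide: postpone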

5.4 Set 3: CRS, Resolving Internal Contention

In this set of experiments we compare the memory requirement and the incurred startup latency of the following heuristics to resolve internal contention: Exhaust B, First Match, DynamicU, LateU, and LofLateU (see Sec. 4.3.3). To generate a composite task we used uniform random number generators to generate 30 atomic tasks (see footnote 16) with variable lengths (1 <= l(t) <= 140, i.e., 2-second to 4.67-minute clips), consumption rates (c(t) in {0.7, 0.6, 0.4, 0.3}), and placement (1 <= p(t) <= D). The reason that we changed the task lengths from the previous experiments is to have more realistic values for the clips that constitute a composite object; note that the target application is now digital editing instead of news-on-demand. Higher consumption rates are assumed because in a digital-editing environment video objects are required to be uncompressed [13]. These values do not impact the main observations, as discussed in the following paragraph.

To construct temporal relationships between the tasks we used a Poisson distribution to generate x(t) for the atomic tasks. We manipulated the Poisson parameter to model five alternative levels of contention (from 1 to 5). Level 3 corresponds to 100% utilization of the system; that is, a single composite task for some specific time intervals might require all the bandwidth of the D-cluster system.

Footnote 16: This is an arbitrary choice and will not impact the main observations.
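A sketch of the composite-task generator just described is given below; the utilization accounting continues after it. The Poisson mean for x(t) is passed in directly because the text only states that the parameter was tuned per contention level (with level 3 corresponding to 100% utilization); the tuning itself is not reproduced here and the example mean is illustrative.

    import numpy as np

    # Generate one composite task: 30 atomic tasks with uniform lengths,
    # consumption rates drawn from {0.7, 0.6, 0.4, 0.3}, uniform placement
    # over the D clusters, and Poisson-distributed start times x(t).

    def make_composite_task(D, poisson_mean, n_atomic=30):
        rates = [0.7, 0.6, 0.4, 0.3]
        return [{
            "length":  int(np.random.randint(1, 141)),        # l(t): 1..140 intervals
            "rate":    float(np.random.choice(rates)),        # c(t)
            "cluster": int(np.random.randint(1, D + 1)),      # p(t): 1..D
            "start":   int(np.random.poisson(poisson_mean)),  # x(t), in intervals
        } for _ in range(n_atomic)]

    # e.g., one workload on a 16-cluster system (the mean is illustrative):
    tasks = make_composite_task(D=16, poisson_mean=40.0)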


Briefly, the utilization factors for the five levels are 25%, 50%, 100%, 125%, and 150%, respectively. Note that the computation of the utilization factor takes the assumed consumption rates of the tasks into account, and hence the experiments are independent of the choice of those values. When the level of contention is 3, the composite task should not observe much contention when scheduled on a D-cluster system. As the level of contention decreases (increases), the probability that the generated composite tasks suffer from contention decreases (increases). Note that even for level 1, the possibility of contention is greater than zero due to conflicts in the data placement of the atomic tasks; that is, it might be the case that a single cluster becomes the bottleneck due to the temporal relationships, while the other clusters are idle. We varied the number of clusters D from 1 to 20. However, since the start time x(t) of the tasks determines the load on the system, and x(t) is based on a Poisson distribution that takes the value of D into account, the trends were almost identical for different values of D; hence, we report the case of D = 16. Note that as D increases, the probability of contention for the lower levels (level <= 3) decreases.

The experiments were conducted as follows. With each level of contention, a single composite task was generated as described in the previous paragraphs. Subsequently, all five heuristics were invoked to resolve its internal contention, and the maximum memory requirement and startup latency were computed for each heuristic. This process was repeated 3000 times, each time with a different composite task; the averages over the 3000 experiments are reported in the following paragraphs.

In Fig. 13, the Y-axis is the maximum memory requirement for buffered sliding; the values are normalized by interval x R_D. The X-axis in all the graphs is the five levels of contention; the larger the level, the higher the probability of internal contention. As expected, the memory requirement of Exhaust B is much higher than that of First Match (see Fig. 13a). This is because Exhaust B slides all the tasks up blindly, which might result in also sliding those tasks that do not cause contention. Since the number of such tasks is larger for the lower levels, the margin of difference between Exhaust B and First Match is proportionally wider at those levels. On the other hand, Exhaust B is superior to First Match in startup latency (see Fig. 14a). This is because, by sliding all the tasks up, Exhaust B empties future cluster bandwidth, reducing the possibility that a task must be slided downward. The margin of difference becomes wider as the level of contention increases because, with the lower levels, the contention can usually be resolved by upsliding. Hence, although Exhaust B was a superior algorithm for ARS+, where a fixed amount of buffers was assigned to the tasks and the objective was minimizing startup latency, it is not appropriate for RIC.

First Match and its other three variations are conservative in their usage of memory buffers. DynamicU corrects the start times of the unscheduled atomic tasks once it observes a MaxDown latency caused by the scheduling of the already-scheduled tasks; hence, it benefits from a marginal improvement as compared to First Match (Fig. 13b).
LateU performs the correction also for the already-scheduled tasks in an extra pass and hence enjoys a wider margin of improvement, although its complexity is higher than that of DynamicU due to that extra pass. The margin of improvement is 97% with level 3 of contention and increases up to 150% with level 5. Finally, LofLateU, by sorting the tasks prior to invoking LateU, benefits from a 22% (level 2) to 230% (level 5) improvement as compared to First Match. The reason that the heuristics perform almost identically with level 1 of contention is that with that level (or lower) most of the contentions can be resolved by upsliding; the performance of these heuristics differs when downsliding occurs.

In Fig. 14b the startup latencies of the different heuristics are compared. The latencies observed by LateU and First Match are exactly identical. This is because LateU corrects the start time of a task at most to x(t) + MaxDown. Hence, the start time of the task that is responsible for the maximum latency, and that is downslided for MaxDown intervals, remains intact.

Figure 13: Memory: maximum memory requirement (normalized) versus contention level; (a) First Match vs. Exhaust B, (b) First Match, DynamicU, LateU, and LofLateU.

Figure 14: Latency: maximum startup latency (in number of intervals) versus contention level; (a) First Match vs. Exhaust B, (b) DynamicU, LofLateU, and First Match & LateU.

DynamicU, by modifying the start times of the tasks using the locally computed MaxDowns, resulted in a marginally higher startup latency as compared to LateU. LofLateU, by shuffling the original order of the tasks, resulted in a higher latency at the lower levels of contention. This is because tasks are scheduled based on their lengths and consumption rates and not based on their start times; hence, the possibility that a task cannot be started at its start time due to contention is increased. However, with low levels of contention, the latency is not that high anyway (on the order of milliseconds). With higher levels of contention, LofLateU resulted in a better startup latency for the same reason that, with ARS, LOF resulted in both better latency and throughput when the system utilization was only a little higher than 100%.

In conclusion, LofLateU is the superior heuristic for the RIC problem. Its memory requirement is much less than that of the other alternatives, while its observed latency is either tolerable or lower than that of the other alternatives. For example, consider level 2 of contention, which is very common: a composite task in the worst case might require only 50% of the system bandwidth. With this level and the mentioned system parameters, if LofLateU were employed to resolve contention, a composite task would require 115 MB of maximum memory (3.6 MB per disk) and would observe 0.64 seconds of startup latency.

With First Match the memory requirement would increase to 142 MB, while the latency would be 0.2 seconds. This is because First Match downslided a few tasks while some large tasks were upslided. Instead, LofLateU upslided smaller tasks, resulting in a lower memory requirement, with the penalty that more tasks were scheduled later than their start times (downslided), resulting in a higher startup latency as compared to First Match.
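As a quick arithmetic check of the per-disk figures, assuming the 32-physical-disk configuration mentioned in the footnote of Sec. 5.3 (the First Match per-disk value is derived here, not stated in the text):

115\ \mathrm{MB} / 32\ \mathrm{disks} \approx 3.6\ \mathrm{MB\ per\ disk}, \qquad 142\ \mathrm{MB} / 32\ \mathrm{disks} \approx 4.4\ \mathrm{MB\ per\ disk}.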

6 Related work

Several studies have investigated techniques to support the continuous display of atomic multimedia objects [36, 23, 46, 38, 9, 17, 3]. Others study the problem of scheduling the retrievals of the subobjects of different atomic objects during a time interval in order to utilize more of the disk bandwidth (and less memory) by eliminating the impact of seek time [50, 6, 17, 19, 10, 32]. However, we are aware of only a few studies [11, 48, 46] that are related to scheduling the retrievals of atomic objects. Our work differs from all of the above in that our scheduling techniques rely on and exploit the regular (round-robin) pattern in which a task utilizes the clusters. In addition, with ARS, we focus on a heavily loaded system, while in [46] it is assumed that the number of active requests is less than the number of streams supported by the system (a light system load). Methods to admit a submitted request without violating the quality of service for the active requests are studied in [48]; however, that study does not investigate which policy should be employed to choose a task from a set of queued requests. Cop in our scheduling algorithms (see Sec. 3) is similar in principle to admission control policies; however, due to our focus on round-robin activation of the clusters, the algorithm of cop is different. Dan et al. [11] propose a batching technique to group the requests that reference the same video in order to reduce the retrieval load. This technique can be adopted and tuned to our scheduling algorithms; in this case, each task can be viewed as a representative of a collection of requests referencing the same object. Finally, in [33] an almost identical architecture to our framework is considered, and the detailed implementation of an admission control algorithm similar to cop is investigated. It assumes a fully loaded system (neither low load nor high load) and observes starvation of some requests; it proposes intelligent data structures and algorithms to solve the starvation problem. That study is complementary to the ARS problem because it eliminates the starvation problem.

When compared with atomic objects, the retrieval of composite objects using a multi-disk hardware platform has received less attention. A number of studies have analyzed composite objects and temporal relationships from a conceptual perspective [24, 47, 29, 25]; their focus is on alternative abstract representations of composite objects. Other studies [45, 28, 37] concentrate on the networking aspects of a geographically distributed system and propose techniques to support the temporal relationships of a composite object when displayed at a remote client. A survey of these studies is presented in [7, 37]. The relation between our previous work ([41, 8]) on composite objects and this study is discussed in Sec. 1.

ARS might appear to be similar to the problem of real-time scheduling of tasks on multiprocessors (from [31, 30] to [26]). One might conceptualize a cluster as a processor to reduce ARS to a multiprocessor task scheduling problem. However, the constrained placement of data across the clusters introduces new challenges. In multiprocessor task scheduling, a task can be scheduled for execution on any processor at any time, whereas with ARS the retrieval of object X must start with the cluster containing its first subobject and employ the other clusters in a round-robin manner.
Stankovic et al. suggest that placement constraints, and the impact of such placement on run-time scheduling, are an open research area [43, 44].

An alternative reduction of ARS is to consider each cluster as a resource. This suggests a similarity between ARS and task scheduling under resource constraints [16, 15, 5, 52, 49]. One major characteristic of ARS differentiates it from these studies: all of them assume that the resources are occupied by a task during its entire lifetime, whereas with ARS a display acquires and releases resources periodically in a round-robin manner. A method to compensate for this is to break a task into subtasks, where each subtask acquires resources for its entire duration. Subsequently, a precedence relation [27] can be employed to relate the subtasks of a task. However, the precedence relation is not restrictive enough for ARS. If a partial order is assumed such that task t1 precedes task t2, then this means that t1 should complete before t2 starts. However, with ARS, assuming t1 and t2 correspond to retrieval subtasks from clusters i and i+1 on behalf of a single request, t2 should start immediately after t1 completes; otherwise the display may suffer from hiccups. Hence, it is not sufficient to start t2 at an arbitrary time after the completion of t1.
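The distinction can be stated compactly (the notation s(.) and f(.) for the start and finish times of a subtask is assumed here):

\text{precedence constraint: } s(t_2) \ge f(t_1) \qquad \text{vs.} \qquad \text{ARS continuity: } s(t_2) = f(t_1).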

7 Conclusions and Future Research Directions

We investigated a class of scheduling problems (ARS, ARS+, CRS) that arise in a large number of multimedia applications. Due to the special requirements of these applications, they introduce new challenges to conventional scheduling problems. Hence, we showed their uniqueness and concluded that the solutions proposed for the conventional scheduling problems need to be revisited and adapted for ARS, ARS+, and CRS. We formalized these problems and proved their NP-hardness.

Two different types of applications can be modeled by ARS. First, there are applications that can tolerate the rejection of a request when the system load is high (e.g., movie-on-demand). In this case, the system load is low and each task observes a maximum startup latency of D intervals. For a typical system this translates to (say) one minute of startup latency, which can be hidden from the users by displaying a memory-resident preview. We described the retrieval contention problem in such an environment and proposed an on-line scheduling algorithm. The second type of application cannot reject requests, and hence during some peak load the system might become overloaded. We proposed both new and adapted heuristics for these applications and, by means of a simulation model, concluded that the choice of heuristic (LOF vs. SOF) can make up to a 28% difference in throughput and an average 150% difference in latency.

The behavior of ARS+ when the system load is high is similar to ARS, and identical heuristics can be employed. However, when requests reserve displays in advance, it is more reasonable to either service them on time or reject them at the time of reservation. Hence, we studied ARS+ with a low system load and demonstrated that memory buffers can be employed to reduce the startup latency significantly (up to 950%). The optimal share of each task of the memory used for buffered sliding was computed analytically and verified using a simulation study. We also demonstrated that an algorithm that utilizes buffers greedily is both less complex and more successful than a conservative one.

With CRS, the problem of retrieval contention becomes more constrained because the atomic tasks of a single composite task cannot be postponed, or the defined temporal relationships will be destroyed. The buffered sliding techniques of ARS+ were extended to resolve such internal contention without violating the time dependencies. Furthermore, alternative heuristics were introduced to determine which tasks to slide, and by how much, in order to minimize both the amount of required memory and the startup latency. By means of a simulation study we showed that the LofLateU heuristic outperforms the rest by a wide margin (up to 230%). In addition, we extended the heuristics proposed for ARS so that an on-line scheduler for CRS can employ them to schedule composite tasks in a multi-user environment.

This study can be extended in many ways. We start with short-term future plans. First, the role of buffered sliding in resolving contention between atomic tasks of different composite tasks should be investigated. This will result in more flexible but relatively complex scheduling algorithms. This investigation can be done in the context of the first approach for solving CRS, where atomic tasks are scheduled independently; in this case, it is necessary to add a constraint to the atomic tasks such that their start times and release times become a function of the release time of the composite task. Second, object replication should be investigated in order to reduce the buffer requirement of the sliding techniques. Third, in this study we focused on the demand-driven paradigm [41]; that is, a task is released based on a submitted request. However, tasks might be released in a regular manner based on a data-driven paradigm [41]. To illustrate such an environment, suppose that there is a large database of digitized movies. A number of cable companies (e.g., HBO, Showtime, Movie Channel) might use this database to serve their customers; the schedules (daily, weekly, or monthly) of cable companies are known in advance. Meanwhile, a pay-per-view service provider might use the same database to display movies for its customers on demand. It is worthwhile to study the scheduling problems in a combined environment that incorporates both data- and demand-driven paradigms. Finally, with ARS we focused on a movie-on-demand application where the user has no interaction with the display. However, with a video-on-demand application, the user should be provided with VCR operations (i.e., pause, resume, fast-forward, fast-rewind). In such an environment the duration of a task and its resource (cluster) requirements might vary based on user interaction; hence, once a task is started, the interval at which it completes is not known by the scheduler.

As a long-term future plan, CRS can be extended to a geographically distributed environment. To envision such an environment, consider movie editors in different locations collaborating on authoring a movie. Each editor has a local repository but might access the clips stored in colleagues' repositories across the globe. The network delay should be considered when satisfying temporal relationships. Once the network resources are reserved [51] and a specific bandwidth (c_net) is guaranteed, the network can be considered as a disk cluster with a c_net transfer rate. Hence, the same techniques discussed here can be extended to this environment. However, there are two major distinctions. First, a clip might now be stored entirely on a single cluster instead of being striped across multiple clusters. Second, the disk clusters (either real ones or those representing the network) might have different transfer rates (a heterogeneous environment). In this case, each box in Fig. 3 has a different size; consequently, two tasks might collide at time interval i while no collision is detected during time intervals i - 1 and i + 1. Conceptualizing the network as a cluster, the box corresponding to that cluster might have different sizes during different intervals due to network congestion. Scheduling tasks in such an environment is more complex, and novel, more knowledgeable heuristics should be developed.

References

[1] James F. Allen. Maintaining Knowledge about Temporal Intervals. Communications of the ACM, 26(11):832-843, November 1983.
[2] D. Anderson, Y. Osawa, and R. Govindan. Real-Time Disk Storage and Retrieval of Digital Audio and Video. UCB/ERL Technical Report M91/646, UC Berkeley, 1991.
[3] S. Berson, S. Ghandeharizadeh, R. Muntz, and X. Ju. Staggered Striping in Multimedia Information Systems. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 1994.
[4] S. Berson, L. Golubchik, and R. R. Muntz. A Fault Tolerant Design of a Multimedia Server. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 364-375, 1995.
[5] J. Blazewicz. Deadline Scheduling of Tasks with Ready Times and Resource Constraints. Information Processing Letters, 8(2):61-63, February 1979.
[6] P. Bocheck, H. Meadows, and S. Chang. Disk Partitioning Technique for Reducing Multimedia Access Delay. In ISMM Distributed Systems and Multimedia Applications, August 1994.
[7] John F. Koegel Buford, editor. Multimedia Systems, chapter 7. ACM Press and Addison-Wesley, 1994.
[8] S. Chaudhuri, S. Ghandeharizadeh, and C. Shahabi. Avoiding Retrieval Contention for Composite Multimedia Objects. In Proceedings of the VLDB Conference, 1995.
[9] H. J. Chen and T. Little. Physical Storage Organizations for Time-Dependent Multimedia Data. In Proceedings of the Foundations of Data Organization and Algorithms (FODO) Conference, October 1993.
[10] A. Cohen, W. Burkhard, and P. V. Rangan. Pipelined Disk Arrays for Digital Movie Retrieval. In Proceedings of ICMCS'95, 1995.
[11] A. Dan, D. Sitaram, and P. Shahabuddin. Scheduling Policies for an On-Demand Video Server with Batching. In Proceedings of ACM Multimedia, pages 15-23, 1994.
[12] M. L. Dertouzos. Control Robotics: The Procedural Control of Physical Processes. In Proc. of IFIP Congress, 1974.
[13] S. R. Alpert et al. The EFX Editing and Effects Environment. IEEE Multimedia, 3(1):15-29, Spring 1996.
[14] M. Garey and R. Graham. Bounds for Multiprocessor Scheduling with Resource Constraints. SIAM Journal on Computing, 4(2):187-200, June 1975.
[15] M. Garey, R. Graham, D. Johnson, and A. Yao. Resource Constrained Scheduling as Generalized Bin Packing. Journal of Combinatorial Theory (A), 21:257-298, 1976.
[16] M. Garey and D. Johnson. Complexity Results for Multiprocessor Scheduling under Resource Constraints. SIAM Journal on Computing, 4:397-411, 1975.
[17] D. J. Gemmell, J. Han, R. J. Beaton, and S. Christodoulakis. Delay-Sensitive Multimedia on Disks. IEEE Multimedia, 1(3):56-67, Fall 1994.
[18] S. Ghandeharizadeh, A. Dashti, and C. Shahabi. A Pipelining Mechanism to Minimize the Latency Time in Hierarchical Multimedia Storage Managers. Computer Communications, 18(3), March 1995.
[19] S. Ghandeharizadeh, S. H. Kim, and C. Shahabi. On Configuring a Single Disk Continuous Media Server. In Proceedings of the 1995 ACM SIGMETRICS/PERFORMANCE, May 1995.
[20] S. Ghandeharizadeh, S. H. Kim, and C. Shahabi. On Disk Scheduling and Data Placement for Video Servers. USC Technical Report, University of Southern California, 1996.
[21] S. Ghandeharizadeh and S. H. Kim. Striping in Multi-disk Video Servers. In Proceedings of the SPIE'95 Conference, pages 88-102, October 1995.
[22] S. Ghandeharizadeh, L. Ramos, Z. Asad, and W. Qureshi. Object Placement in Parallel Hypermedia Systems. In Proceedings of the International Conference on Very Large Databases, 1991.
[23] S. Ghandeharizadeh and C. Shahabi. Management of Physical Replicas in Parallel Multimedia Information Systems. In Proceedings of the Foundations of Data Organization and Algorithms (FODO) Conference, October 1993.
[24] R. G. Herrtwich. Time Capsules: An Abstraction for Access to Continuous Media Data. In Proc. of the 11th Real-Time Systems Symposium, pages 11-20, December 1990.
[25] N. Hirzalla, B. Falchuk, and A. Karmouch. A Temporal Model for Interactive Multimedia Scenarios. IEEE Multimedia, 2(3):24-31, Fall 1995.
[26] G. Koren and D. Shasha. MOCA: A Multiprocessor Online Competitive Algorithm for Real-Time System Scheduling. Theoretical Computer Science, 128(1-2):75-97, June 1994.
[27] E. L. Lawler. Optimal Sequencing of a Single Machine Subject to Precedence Constraints. Management Science, 19, 1973.
[28] T. D. C. Little and A. Ghafoor. Network Considerations for Distributed Multimedia Object Composition and Communication. IEEE Network, pages 32-49, November 1990.
[29] T. D. C. Little and A. Ghafoor. Interval-Based Conceptual Models for Time-Dependent Multimedia Data. IEEE Transactions on Knowledge and Data Engineering, 5(4):551-563, August 1993.
[30] C. L. Liu and J. W. Layland. Scheduling Algorithms for Multiprogramming in a Hard-Real-Time Environment. Journal of the ACM, 20(1):46-61, January 1973.
[31] R. R. Muntz and E. G. Coffman. Optimal Preemptive Scheduling on Two-Processor Systems. IEEE Transactions on Computers, C-18:1014-1020, 1969.
[32] B. Ozden, A. Biliris, R. Rastogi, and A. Silberschatz. A Disk-Based Storage Architecture for Movie On Demand Servers. Information Systems, 20(6):465-482, September 1995.
[33] B. Ozden, R. Rastogi, and A. Silberschatz. Disk Striping in Video Server Environments. Data Engineering Bulletin, 18(4):4-16, December 1995.
[34] G. Ozsoyoglu, V. Hakkoymaz, and J. Kraft. Automating the Assembly of Presentations from Multimedia Databases. In Proceedings of the IEEE 12th International Conference on Data Engineering, 1996.
[35] D. Patterson, G. Gibson, and R. Katz. A Case for Redundant Arrays of Inexpensive Disks (RAID). In Proceedings of the ACM SIGMOD International Conference on Management of Data, May 1988.
[36] V. G. Polimenis. The Design of a File System that Supports Multimedia. Technical Report TR-91-020, ICSI, 1991.
[37] S. Ramanathan and P. V. Rangan. Feedback Techniques for Intra-Media Continuity and Inter-Media Synchronization in Distributed Multimedia Systems. The Computer Journal, 36(1):19-31, 1993.
[38] P. Rangan and H. Vin. Efficient Storage Techniques for Digital Continuous Media. IEEE Transactions on Knowledge and Data Engineering, 5(4), August 1993.
[39] R. Reddy. Some Research Problems in Very Large Multimedia Databases. In Proceedings of the IEEE 12th International Conference on Data Engineering, 1996.
[40] C. Ruemmler and J. Wilkes. An Introduction to Disk Drive Modeling. IEEE Computer, 27(3), March 1994.
[41] C. Shahabi and S. Ghandeharizadeh. Continuous Display of Presentations Sharing Clips. ACM Multimedia Systems, 3(2), May 1995.
[42] C. Shahabi, S. Ghandeharizadeh, and S. Chaudhuri. On Scheduling Atomic and Composite Multimedia Objects. USC Technical Report USC-CS-95-622, University of Southern California, 1995.
[43] J. Stankovic, M. Spuri, M. Natale, and G. Buttazzo. Implications of Classical Scheduling Results for Real-Time Systems. Technical Report UMASS-CS-93-023, University of Massachusetts, 1993.
[44] J. Stankovic, M. Spuri, M. Natale, and G. Buttazzo. Implications of Classical Scheduling Results for Real-Time Systems. IEEE Computer, 28(6), June 1995.
[45] R. Steinmetz. Synchronization Properties in Multimedia Systems. IEEE Journal on Selected Areas in Communications, 8(3):401-412, April 1990.
[46] F. A. Tobagi, J. Pang, R. Baird, and M. Gang. Streaming RAID: A Disk Array Management System for Video Files. In First ACM Conference on Multimedia, August 1993.
[47] M. Vazirgiannis and C. Mourlas. An Object-Oriented Model for Interactive Multimedia Presentations. The Computer Journal, 36(1), 1993.
[48] H. M. Vin, P. Goyal, and A. Goyal. A Statistical Admission Control Algorithm for Multimedia Servers. In Proceedings of ACM Multimedia, pages 33-40, 1994.
[49] Q. Z. Wang and E. L. Leiss. A Heuristic Schedule of Independent Tasks with Bottleneck Resource Constraints. SIAM Journal on Computing, 24(2):235-241, April 1995.
[50] P. S. Yu, M. S. Chen, and D. D. Kandlur. Design and Analysis of a Grouped Sweeping Scheme for Multimedia Storage Management. In Proceedings of the Third International Workshop on Network and Operating System Support for Digital Audio and Video, November 1992.
[51] L. Zhang, S. Deering, D. Estrin, and D. Zappala. RSVP: A New Resource ReSerVation Protocol. IEEE Network, 7(5), September 1993.
[52] W. Zhao and K. Ramamritham. Simple and Integrated Heuristic Algorithms for Scheduling Tasks with Time and Resource Constraints. Journal of Systems and Software, 7:195-205, 1987.

