QoS-Oriented Multi-query Scheduling over Data Streams

Ji Wu¹, Kian-Lee Tan¹, and Yongluan Zhou²

¹ School of Computing, National University of Singapore, {wuji,tankl}@comp.nus.edu.sg
² Dept. of Mathematics & Computer Science, University of Southern Denmark, [email protected]

Abstract. Existing query scheduling strategies over data streams mainly focus on metrics of system performance, such as processing time or memory overhead. However, for commercial stream applications, what actually matters most is the users' satisfaction with the Quality of Service (QoS) they perceive. Unfortunately, a system-oriented optimization strategy does not necessarily lead to a high degree of QoS. Motivated by this, we study QoS-oriented query scheduling in this paper. One important contribution of this work is that we correlate the operator scheduling problem with the classical job scheduling problem. This not only offers a new angle on the issue but also allows techniques for the well-studied job scheduling problems to be adapted to this new context. We show how these two problems can be related and propose a novel operator scheduling strategy inspired by job scheduling algorithms. The performance study demonstrates promising results for our proposed strategy.

1 Introduction

Many typical applications of Data Stream Management Systems (DSMS) involve time-critical tasks such as disaster early warning, network monitoring and online financial analysis. In these applications, output latency (the main QoS measure) is extremely crucial. Managing system resources to maintain a high QoS is particularly important for applications that have Service Level Agreements (SLA) with their clients, where each client may have its own QoS requirement as to when query answers should be delivered.

In a traditional DBMS, where data access is pull-based, the output delay depends only on the query cost. In a DSMS, however, where data are pushed from scattered sources and their arrivals are beyond the DSMS's control, the output latency becomes tuple-dependent. Input tuples may have experienced different degrees of delay (due to, for instance, varying data transmission conditions) before arriving at the system. Therefore, the query executor has to continuously adapt to the ever-changing initial input delay to ensure that results are produced in a timely manner. Given the unpredictable workload and limited resources, a DSMS may not always be able to meet all QoS requirements. When that happens, efforts should be made to satisfy as many clients as possible, to maximize profit or minimize loss.

The strategies proposed in this paper can be viewed as our initial effort towards QoS-oriented adaptive query processing for data streams. In this work, we take the output
latency as the main QoS metric, since it is the key parameter for typical online applications. Given that each input tuple carries a timestamp indicating when it was generated, a result tuple is said to meet the QoS if the time difference between the output time and the input timestamp is no more than the user-specified delay threshold. It is important to note that the output latency defined here embraces both the query processing time and the various delays incurred during query processing and data transmission; the latter are variables that fluctuate over time beyond the query engine's control.

Several query scheduling strategies have been proposed to improve query performance in a DSMS. However, the main objective of these algorithms is to minimize the average query processing time [5,10,12] or memory consumption [1]. Few of them deal with the issue of optimizing total user satisfaction. Our proposed QoS-oriented metric complements the above work by considering a more realistic scenario. As will be seen later, a QoS-oriented perspective leads to an entirely different scheduling strategy from the existing work.

In summary, the contributions of this paper are:
1. The proposition of QoS-oriented query scheduling, as opposed to system-oriented scheduling, for continuous query processing;
2. An in-depth analysis of how the operator scheduling problem can be transformed into a job scheduling problem;
3. A novel planning-based scheduling strategy, designed based on the above transformation, that addresses the QoS-oriented multi-query optimization issue;
4. Extensive experimental studies that verify the effectiveness of the solution.

The rest of the paper is organized as follows: Section 2 introduces the problem to be solved and surveys related work. Section 3 transforms the operator scheduling issue into a job scheduling problem to facilitate our analysis. Section 4 details the proposed scheduling algorithm. Section 5 demonstrates the effectiveness of the proposed strategy through our experimental study. Finally, we conclude the paper in Section 6.

2 Preliminaries

2.1 Metric Definition

As mentioned before, the quality requirement is defined as the maximum tolerable delay of output tuples. To evaluate the QoS performance of the system, we define the following QoS penalty function.

Definition 1. Given a query Q, let $T_{out}^i$ denote the time when an output tuple i is produced and $T_{in}^i$ be the maximal timestamp of all the input tuples that contribute to tuple i. Let L be the predefined QoS threshold for query Q. The penalty function for tuple i is:

$$U_i = \begin{cases} 0 & \text{if } T_{out}^i - T_{in}^i \leq L \\ 1 & \text{otherwise} \end{cases} \quad (1)$$

Accordingly, the query-level QoS can be evaluated by taking the normalized aggregation of the tuple-level penalties:

$$\frac{\sum_{i=1}^{n} U_i}{n}, \quad \text{where } n \text{ is the total number of output tuples} \quad (2)$$

Intuitively, a query's quality is inversely proportional to the value in Equation 2.
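Read operationally, Equations (1) and (2), together with the weighted aggregate defined as Equation (3) in the next subsection, amount to the following small Python sketch. This is our own transcription for illustration; the function names and data layout are not part of the paper's system:

```python
def tuple_penalty(t_out, t_in, L):
    """Eq. (1): penalty 1 iff the output latency t_out - t_in exceeds L."""
    return 0 if t_out - t_in <= L else 1

def query_penalty(outputs, L):
    """Eq. (2): normalized penalty over one query's output tuples in the
    observation period; `outputs` holds (t_out, t_in) pairs."""
    return sum(tuple_penalty(o, i, L) for o, i in outputs) / len(outputs)

def total_penalty(queries):
    """Eq. (3): weighted sum over all m queries, each given as a
    (weight, outputs, L) triple."""
    return sum(w * query_penalty(outs, L) for w, outs, L in queries)
```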

[Fig. 1. A query graph example: a DAG with input streams I1, I2, I3; operators op1–op7; and output streams O1–O6]

In a multi-query environment, we seek to achieve high output quality across all participating queries. Each query $q_i$ is assumed to be associated with a weight $w_i$ to indicate its importance: a higher $w_i$ implies a higher priority for $q_i$. Now, the penalty function U over all queries is the weighted sum of those of the individual queries:

$$U = \sum_{j=1}^{m} w_j \frac{\sum_{i=1}^{n_j} U_i}{n_j} \quad (3)$$
where m is the number of participating queries. For unbounded input streams, the objective function should be defined within an observation period; the parameter $n_j$ in the equation then refers to the total number of output tuples produced in the most recent observation period (say, the last five hours). Note that the length of the period does not affect our algorithm; it should simply be meaningful to the application. For example, if its length is 5 hours, then our algorithm optimizes the objective function defined over the last 5 hours.

2.2 System Models

Similar to existing work on stream processing, we model the entire Continuous Query (CQ) plan as a Directed Acyclic Graph (DAG). Vertices with only outgoing edges represent input streams and those with only incoming edges represent output streams. Other vertices are query operators. Edges connecting vertices are tuple queues that link adjacent operators, and data flows are indicated by arrows. For example, Figure 1 shows a query graph with three input streams (I1, I2, I3) and six output streams (O1, O2, ..., O6). Each output stream corresponds to exactly one registered query in the system (O1 is the output for query Q1, O2 for query Q2, and so on). There are also seven query operators (op1, op2, ..., op7) in this plan. Some operators are dedicated to a single query (such as op7 for query Q6) while others are shared among several queries (such as op4 for queries Q3, Q4 and Q5).

In this problem setting, we assume complete and ordered query results are desired. For each input stream, tuple arrivals are ordered by their timestamps. Each query operator can only process tuples from its input queue in a First-Come First-Served (FCFS)
manner, so that the tuple order is preserved throughout query execution. Since each input tuple has a timestamp indicating the tuple's creation time and each query has a predefined QoS threshold, whenever a new input tuple arrives, the system can compute the deadline for producing the corresponding output in order to satisfy the QoS requirement. For example, given Q1's QoS threshold $L_1$, if an input tuple $p \in I_1$ with timestamp $T_{in}^p$ arrives at time t, then the deadline to produce the corresponding output tuple for query Q1 is $T_{in}^p + L_1$. The time left for query processing, called the Remaining Available Time (RAT), is $T_{in}^p + L_1 - t$. If processing the tuple actually requires $C_p$ amount of CPU time, called the Remaining Processing Time (RPT), then a qualified output can possibly be produced only when $C_p \leq T_{in}^p + L_1 - t$ (i.e., RPT ≤ RAT).

However, it is not always easy to find the deadlines for producing qualified output, especially when a query involves more than one input stream. For example, tuples from I1 alone cannot determine the deadlines for the resultant output tuples of query Q3, since those deadlines also depend on the timestamps of inputs from I2. Queries involving multiple streams are discussed in detail in Section 4.3.

2.3 Problem Statement

The formal problem statement is as follows: given the query operator graph, continuously allocate a time slot for each operator to process each of its input tuples such that the objective function U, defined in Equation 3, is minimized.

2.4 Related Work

Our work mainly concerns two areas: 1) operator scheduling in stream systems; and 2) job scheduling algorithms that minimize the number of late jobs. The issue of operator scheduling has been studied with different objectives. For example, the Chain algorithm [1] schedules operators so that runtime memory overhead is minimized. In the rate-based scheduling algorithm proposed by Urhan et al. [12], the objective is to maximize the output rate at the early stage of query execution. There are also scheduling algorithms for optimizing query response time (a.k.a. output latency) [5] or variant metrics (such as slowdown in [10]). However, the objectives of these works are system-oriented, in the sense that the optimal solution is the one that maximizes system performance, not users' satisfaction. In contrast, we adopt a QoS-oriented view, which brings user requirements in as another dimension of the issue. This seemingly slight difference renders a totally different problem setting, and hence completely different solutions.

Probably the most relevant work to ours is that of Carney et al. [3], who provide interesting scheduling solutions that account for QoS-oriented requirements. However, their approach only targets the scenario where operators are not shared among queries. This is a strong assumption, because stream applications often involve multi-query processing with operators shared among different queries. Scheduling over shared operators, as illustrated in this paper, can be far more complicated.

[Fig. 2. Transform operator scheduling to job scheduling: a tuple p ∈ I1 with ts = 10 arrives at t = 30 and triggers jobs J1 to J5 at op1(1), op2(1), op3(2), op4(2), op5(2) (unit costs in brackets). Due dates: J1 (MinDue 31, MaxDue 38), J2: 33, J3: 35, J4: 34, J5: 40; QoS thresholds L1 = 25, L2 = 24, L3 = 30]

The job scheduling problem (particularly the problem of minimizing the number of late jobs) has been studied over the years, and various algorithms have been proposed for this class of problems under different constraints. Karp [6] proved that the weighted number of late jobs problem in general, denoted $1||\sum w_j U_j$, is NP-hard, although it is solvable in pseudo-polynomial time [8]. A polynomial-time algorithm is available if the processing times and the job weights can be oppositely ordered [7]. In more recent work [9], solutions were proposed for the same class of problems when job release times are not equal. However, to the best of our knowledge, no existing work can directly address the multi-query scheduling problem, owing to the complications of operator sharing and precedence constraints that are unique to a DSMS. Existing approaches are either too general or too restrictive to be applied directly in our context.

3 From Operator Scheduling to Job Scheduling

We show in this section how the operator scheduling problem can be approximated by a job scheduling model. This provides a new angle from which to view the issue and allows us to borrow ideas from a well-studied subject to develop low-cost operator scheduling algorithms.

In a typical single-machine job scheduling problem, one looks for a plan that allocates each job an appropriate time slot for execution so that the objective function is optimized. Each participating job $J_i$ is associated with a processing cost $c_i$, a deadline $d_i$ and a penalty value $u_i$. $J_i$ is on time if its completion time $t_i \leq d_i$; otherwise, the job is late and the penalty $u_i$ is incurred.

Analogously, in continuous query processing, we can treat the work done by a query operator in response to the arrival of a new input tuple as a job. For example, for the query plan shown in Figure 2, the arrival of a tuple $p \in I_1$ with timestamp ts = 10 triggers five jobs in the system, each corresponding to one involved operator (let $J^x$ denote the job performed by operator opx). The estimation of the jobs' processing costs, deadlines and penalty values is explained in the rest of this section.
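To make the mapping concrete, here is a minimal Python sketch (our own illustration, not the paper's implementation) of how one tuple arrival spawns one job per reached operator and how each job's cost is derived; `upstream`, `unit_cost` and `multiplicity` stand in for statistics the system would maintain:

```python
from math import prod

def jobs_for_arrival(reached_ops, upstream, unit_cost, multiplicity):
    """One job per operator reached by the new tuple.  A job's cost is the
    operator's per-tuple cost times the product of the multiplicities of
    all upstream operators on its path (its expected input cardinality)."""
    return {op: unit_cost[op] * prod(multiplicity[u] for u in upstream[op])
            for op in reached_ops}

# Fig. 2: tuple p from I1 reaches op1..op5; with all multiplicities 1,
# J3's cost is rho1 * rho2 * c3 = 1 * 1 * 2 = 2.
upstream  = {"op1": [], "op2": ["op1"], "op3": ["op1", "op2"],
             "op4": ["op1"], "op5": ["op1"]}
unit_cost = {"op1": 1, "op2": 1, "op3": 2, "op4": 2, "op5": 2}
mult      = {op: 1 for op in unit_cost}
print(jobs_for_arrival(upstream, upstream, unit_cost, mult)["op3"])  # -> 2
```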


Processing Cost. The job processing cost is the product of two parameters: the unit processing cost and the input cardinality. The unit processing cost is the time taken by the operator to process one tuple from its input queue. The input cardinality is determined by the multiplicity (or selectivity) of all upstream operators along the path from the input stream node to the current operator. For the example in Figure 2, let $\rho_1$ and $\rho_2$ denote the multiplicities of op1 and op2 respectively; the input cardinality for op3 is then simply $\rho_1 \rho_2$. If $c_3$ is the unit processing cost of op3, the cost of job $J^3$ is $\rho_1 \rho_2 c_3$.

Deadline. Unlike in the traditional job scheduling problem, not all jobs are given explicit deadlines here. First, it is important to distinguish two types of jobs in this context: Leaf-Jobs (L-Jobs) and NonLeaf-Jobs (NL-Jobs). An L-Job is a job performed by the last operator in a query tree; examples are $J^3$, $J^4$ and $J^5$ in Figure 2. Intuitively, an L-Job's deadline coincides with the due date by which the query output should be produced; the value is calculated by adding the respective query's QoS threshold to the input tuple's timestamp. For example, the deadline for $J^3$ is 35 (input timestamp ts = 10 plus Q1's QoS threshold $L_1$ = 25). An NL-Job is a job performed by a non-leaf operator; its output becomes the input of some other NL-Job or L-Job in the query plan. $J^1$ and $J^2$ are examples of NL-Jobs in Figure 2.

Computing the deadline of an NL-Job with fan-out equal to 1 is relatively easy: it can be derived backwards from its only immediate downstream job. For example, to compute $J^2$'s deadline, we just need the deadline and processing cost of job $J^3$. For simplicity, assume the multiplicity of every operator in the example is 1 and the unit cost of each operator is the number in the corresponding bracket in the figure. Hence $J^3$'s processing cost is $\rho_1 \rho_2 c_3 = 1 \times 1 \times 2 = 2$, and $J^2$'s deadline is simply $J^3$'s deadline minus $J^3$'s processing cost, 35 − 2 = 33. This is the latest time by which $J^2$ has to finish so that it is still possible for the downstream job $J^3$ to complete on time.

Unfortunately, defining the deadline of an NL-Job with fan-out greater than 1 is much more involved, because different downstream jobs have different QoS requirements and due dates. Consider job $J^1$ in the figure, whose fan-out is 3. The three immediate downstream jobs $J^2$, $J^4$ and $J^5$ require $J^1$ to complete by 32, 32 and 38, respectively, in order for them to complete on time. There is no single definite deadline for $J^1$ in this case. Here, we use two due dates, MinDue and MaxDue, to characterize the situation:

Definition 2. The MinDue of job i is the latest time by which i has to complete such that there exists a feasible plan in which all of its downstream jobs are scheduled on time.

Definition 3. The MaxDue of job i is the latest time by which i has to complete such that there exists a feasible plan in which at least one of its downstream jobs is scheduled on time.

The significance of MinDue and MaxDue can be viewed as follows: if an NL-Job completes before its MinDue, then no downstream job need be overdue on account of this NL-Job completing too late. On the other hand, if the NL-Job completes after its
MaxDue, then no downstream job can possibly complete on time. Section 4 will show how MinDue and MaxDue are used in the proposed scheduling algorithm.

The MaxDue of an NL-Job is in fact just the maximum among the deadlines derived from each of its downstream jobs; in the example, $J^1$'s MaxDue is max{32, 32, 38} = 38. Computing the MinDue, however, is more involved because of the schedule feasibility check. Due to space constraints, we omit the details here; interested readers can refer to [13] for the algorithm that computes MinDue. In the above example, the resulting MinDue for $J^1$ is 31 (not 32), and the corresponding feasible schedule for its downstream jobs to complete on time is $J^2 \to J^4 \to J^5$. All the job deadlines in the example are shown inside boxes with dotted lines in Figure 2.

Job Penalty. Determining the penalty value associated with each job is another issue. In our objective function, the (weighted) penalty is only defined over each output tuple; it can be seen as the penalty applied to a late L-Job. For an NL-Job, however, the penalty value is undefined, since its completion does not affect the objective function in a direct way. Nevertheless, a late NL-Job that causes its downstream L-Jobs to be late does influence the overall QoS. We shall see how this subtlety is handled in our proposed scheduling algorithm in Section 4.
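For the fan-out-3 example in the Deadline discussion above, the MaxDue computation and a simple EDF-style feasibility sweep for the MinDue can be sketched as follows. This is our simplification over the immediate downstream jobs only, represented as (cost, deadline) pairs; the full MinDue algorithm with its feasibility check is in [13]:

```python
def max_due(downstream):
    """MaxDue: latest completion time at which at least one immediate
    downstream job, taken alone, can still meet its own deadline."""
    return max(d - c for c, d in downstream)

def min_due(downstream):
    """MinDue (sketch): latest completion time t such that running the
    immediate downstream jobs back-to-back in earliest-deadline-first
    order keeps all of them on time."""
    t, elapsed = float("inf"), 0
    for cost, deadline in sorted(downstream, key=lambda j: j[1]):
        elapsed += cost
        t = min(t, deadline - elapsed)
    return t

# J1's downstream jobs from Fig. 2 as (cost, deadline) pairs: J2, J4, J5.
jobs = [(1, 33), (2, 34), (2, 40)]
print(min_due(jobs), max_due(jobs))  # -> 31 38
```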

4 Scheduling Algorithm

The distinct features of multi-query scheduling described in the previous section reveal that the problem is much harder than a traditional job scheduling problem. This stems mainly from two reasons: 1) jobs may be shared among different queries; and 2) job precedence constraints (given by the query plan tree) have to be observed. In what follows, we propose a novel dynamic planning-based heuristic, which effectively schedules the query operators to achieve a good overall QoS in polynomial time.

4.1 Job Set for Scheduling

Since the system deals with continuous queries running over potentially unbounded data streams, the scheduling strategy has to be an online algorithm. Here we propose a planning-based online strategy: instead of selecting individual jobs, the scheduler selects a set of jobs from the received input tuples for each scheduling round. As we shall see, the planning-based approach enables the scheduler to take a holistic view when making the current scheduling decision, yielding better output quality. The question that immediately follows is how large the job set should be, i.e., how far ahead the scheduler should look. If it looks too far, the plan may become outdated when subsequent arrivals of input tuples trigger jobs with earlier deadlines, causing rescheduling that inflicts unnecessary overhead. On the other hand, if the lookahead is too short, poor decisions may be reached for lack of global deliberation. Hence, the ideal scenario is to include the minimum number of jobs sufficient to construct a globally optimal (or quasi-optimal) plan for the near future. To this end, we propose the following job selection criterion: all jobs included for scheduling must have deadlines earlier than those of any future jobs to be generated by the system.

[Fig. 3. Choosing the appropriate job set for scheduling: tuples p1, p2, p3 (ts = 10, 16, 23) on I1 and q1, q2, q3 (ts = 12, 14, 28) on I2 generate 12 jobs at op1(1), op2(3), op3(2), op4(2); the latest job at op1, J_3^1 with MinDue 45, yields the cut-off deadline d = 45. QoS thresholds L1 = 25, L2 = 30, L3 = 30]

This rule can be enforced because input tuples are ordered by their timestamps and processed in an FCFS manner by the operators. The monotonicity allows us to find a point in time beyond which all future jobs' deadlines are bound to fall. This point is obtained by finding the earliest deadline among the latest generated jobs associated with each operator. For example, Figure 3 shows a snapshot of a query graph with two input streams and four operators producing three output streams. Three tuples are buffered at each input queue (p1, p2, p3 at I1 and q1, q2, q3 at I2). These tuples virtually generate 12 jobs for the system, where the job performed by operator opx with reference to input tuple py (or qy) is denoted $J_y^x$. (Again, assume all operator selectivities/multiplicities are 1 for simplicity.) For an NL-Job with fan-out greater than 1, we take its MinDue into consideration. Among all the latest jobs for each operator (i.e., $J_3^1$, $J_3^2$, $J_3^3$, $J_3^4$), the one associated with op1 (i.e., $J_3^1$) has the earliest deadline, d = 45, which is essentially the cut-off point: all jobs with deadlines no later than 45 are included for scheduling, while the others are left for the next round. The selected job set is considered for scheduling by the algorithms described in the next section.

4.2 Scheduling Heuristic

In Section 3, we showed that the uncertainties about NL-Jobs, in terms of both deadline and job penalty, greatly increase the problem's complexity. In addition, precedence constraints among jobs with reference to the same input tuple further complicate the issue. For example, in Figure 3, any upstream job $J_y^1$, y ∈ {1, 2, 3}, has to be scheduled before the downstream jobs $J_y^2$ and $J_y^3$ to make the plan meaningful. In fact, the entire scheduling problem can be categorized as $1|prec|\sum w_j U_j$¹ in standard notation [4].

¹ That is, the problem of finding a non-preemptive schedule on a single machine such that the job precedence constraints are satisfied and the total weighted penalty function is minimized.


This problem is NP-hard, and generally no good solutions or heuristics are known. Designing a heuristic for it faces two challenges: 1) the produced plan must be feasible, with all precedence constraints observed; and 2) the benefit of executing NL-Jobs must be assessed in an efficient and intelligent way. We show how these two challenges are tackled in our proposed algorithm.

Evaluating Job Value. Assigning values (or utilities) to jobs is the essential step in a scheduling algorithm. In our problem setting, a duly completed job avoids penalty being applied to the objective function; its value can therefore be quantified as the amount of Penalty Reduction (PR) it contributes to the system, provided it completes on time. For L-Jobs, which are located at the bottom of the query tree, PR values are calculated as follows. At the beginning of each scheduling cycle, the system calculates the current QoS of the L-Job's corresponding query according to Equation 2. The amount of PR gained by completing the current L-Job, say job j, on time is therefore approximated as

$$w \times \left( \frac{\sum_{i=1}^{n} U_i + \rho_j}{n + \rho_j} - \frac{\sum_{i=1}^{n} U_i}{n + \rho_j} \right) = \frac{w \, \rho_j}{n + \rho_j} \quad (4)$$

where w is the weighting factor assigned to the given query, $U_i$ is the penalty function defined in Equation 1, and $\rho_j$ is the product of the multiplicities of all operators along the path from the first operator that receives the input stream to the leaf operator (i.e., the one performing the L-Job).

Unfortunately, computing PR for NL-Jobs is not as straightforward. On the premise that the system is not overloaded, we may reasonably assume that the number of late jobs in each scheduling cycle does not constitute a significant portion of the total number of jobs in that cycle. In other words, most NL-Jobs should finish around their MinDues, since a delayed NL-Job seriously affects all of its downstream jobs. In view of this, we adopt an optimistic approach in our heuristic: we assign an NL-Job's PR to be the sum of the PRs of its immediate downstream jobs. Experiments show that this gives a very good estimate of the PR of an NL-Job.

Algorithm Sketch. The heuristic consists of two phases. Lines 1 to 7 of Algorithm 1 sketch the first phase. As our intention is to maximize the potential NL-Job value (the optimistic view), the algorithm allocates each NL-Job the earliest possible time slot without considering any L-Job in this phase. The sequence among NL-Jobs is determined by their MinDues; this implicitly enforces the precedence constraints, because an upstream job always has an earlier MinDue than its downstream jobs. When an NL-Job is allocated a time slot, its scheduled completion time is recorded. This information is later used to check how much laxity is left between the current scheduled completion time and the due date.

In phase II of the heuristic (Lines 8 to 30), L-Jobs are inserted into the schedule produced in phase I in a greedy manner. At the outset, it is necessary to introduce the concept of Penalty Reduction Density (PRD) for NL-Jobs. The idea is similar to value density in job scheduling [11]: it is essentially the ratio between the value and the cost of a given job. Since an NL-Job's value (in terms of PR) drops from its maximum to zero as its completion time moves from below its MinDue to above its MaxDue, we can evaluate the PRD of an NL-Job using the following formula:

$$PRD = \frac{PR}{MaxDue - MinDue} \quad (5)$$


Algorithm 1. Job Scheduling Algorithm

Notations:
  N: the set of all NL-Jobs
  L: the set of all L-Jobs
  P: queue that records the final schedule
  x.cost: the time cost for processing job x
  x.time: current scheduled completion time for job x
  x.due: deadline of job x if x is an L-Job; MinDue if x is an NL-Job
  x.maxdue: MaxDue of NL-Job x
  x.pr: the value (in terms of Penalty Reduction) of job x
  x.prd: Penalty Reduction Density of NL-Job x

 1: t := 0
 2: while N is not empty do
 3:   Find the job i in N with the earliest MinDue
 4:   N := N \ {i}
 5:   i.time := t + i.cost
 6:   Append i at the end of P
 7:   t := i.time
 8: while L is not empty do
 9:   Find the job j in L with the earliest deadline
10:   L := L \ {j}
11:   if CheckAncestor(j) returns FALSE then
12:     Append j at the end of P        /* j will miss its deadline */
13:   else
14:     Search from the beginning of P for the first job k s.t. k.time ≥ j.due − j.cost
15:     Let Q denote the set consisting of job k and all jobs after k in P
16:     pen := 0                        /* accumulated penalty */
17:     for all jobs m ∈ Q do
18:       if m.time < m.due then
19:         p := min{m.cost, max{0, m.time + j.cost − m.due}}
20:       else if m.time ≥ m.due AND m.time < m.maxdue then
21:         p := min{m.maxdue − m.time, j.cost}
22:       else
23:         p := 0
24:       pen := pen + m.prd × p
25:     if j.pr > pen then
26:       Insert j into P right before job k
27:       for all jobs n ∈ Q do
28:         n.time := n.time + j.cost
29:     else
30:       Append j at the end of P      /* j will miss its deadline */

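As a companion to the listing, the following Python rendering of phase II for a single L-Job j is a sketch under our own simplifications (a contiguous schedule starting at time 0, with the CheckAncestor test already passed); the job fields mirror the notation of Algorithm 1:

```python
def phase2_insert(P, j):
    """Sketch of Algorithm 1, phase II, for one L-Job j.  P is the list of
    jobs from phase I ordered by scheduled completion time (job.time)."""
    # latest position at which j still finishes by its deadline (Line 14)
    k = next((i for i, m in enumerate(P) if m.time >= j.due - j.cost), len(P))
    pen = 0.0  # PR lost by deferring P[k:] by j.cost (Lines 16-24)
    for m in P[k:]:
        if m.time < m.due:                 # m currently on time
            p = min(m.cost, max(0.0, m.time + j.cost - m.due))
        elif m.time < m.maxdue:            # m between MinDue and MaxDue
            p = min(m.maxdue - m.time, j.cost)
        else:                              # m already past MaxDue
            p = 0.0
        pen += m.prd * p
    if j.pr > pen:                         # insertion pays off (Line 25)
        j.time = (P[k - 1].time if k > 0 else 0) + j.cost
        for m in P[k:]:
            m.time += j.cost
        P.insert(k, j)
    else:
        P.append(j)                        # j is given up as late (Line 30)
```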


With PRD, we are able to estimate the potential penalty incurred by the tardiness of an NL-Job's completion. Phase II of the algorithm goes as follows. All L-Jobs are considered for scheduling one by one, ordered by their due dates. First, the function CheckAncestor() verifies that the ancestor jobs are scheduled early enough for the current L-Job to complete on time (Line 11); this check is straightforward and runs in O(log n) time. If the check passes, the L-Job is tentatively allocated the latest possible time slot that still lets it complete by its due date. For all NL-Jobs whose completion times are deferred by the insertion of the new L-Job, the potential losses, in terms of PR, are assessed by multiplying each job's tardiness by its PRD (Lines 17 to 24), and their aggregate is compared against the new L-Job's PR. The new job secures the tentative time slot only if its PR is greater than the total PR loss of all affected NL-Jobs; otherwise, the L-Job is appended at the end of the schedule and treated as a late job. The total runtime of the heuristic is O(n²).

4.3 Multi-stream Queries

As mentioned at the beginning, for queries taking multiple input streams, the response time is defined as the difference between the result tuple's delivery time and the maximal timestamp among all the contributing input tuples. However, quite often the system has no clue as to which particular contributing input tuple carries the largest timestamp when the tuples first arrive at the query engine. Consequently, jobs triggered by the arrivals of these tuples will be assigned earlier deadlines than necessary (if those tuples are indeed not the ones carrying the largest timestamp in the final output). Although such miscalculation only means that some jobs are scheduled unnecessarily early, and no serious consequence results most of the time, it would be better to identify the delay or time difference among the input streams so that an offset can be applied to improve the deadline prediction. This is an issue of input stream coordination. Strategies [2,14] have been proposed to address the synchronization issue among streams; however, this is an area where an ideal solution has not yet been found. We will study this problem further in our future work.

4.4 Batch Processing

Tuple-based operator scheduling offers fine-grained control over query execution, but it incurs substantial overhead due to frequent context switches among operators. A batch-based strategy effectively reduces such overhead by processing a series of tuples in one shot. The batch processing discussed in this section refers to grouping input tuples from the same data source such that they collectively trigger one job per involved operator (as opposed to tuple-based scheduling, where every input tuple triggers one job per involved operator). This not only cuts down the number of context switches, but also reduces the runtime of the scheduling algorithm, since fewer jobs need to be scheduled.

An important issue to consider is the appropriate size of each batch. Given the dynamic input characteristics, we propose a dynamic criterion to determine the batch size as follows. Sequential tuples from the same input may form a single batch if:
1) the timestamp difference between the head tuple and the tail tuple in the batch does not exceed aμ, where μ is the average laxity (defined as RAT − RPT) of the jobs currently in the system and a is a runtime coefficient; and 2) the timestamp difference between any two consecutive tuples is no more than bτ, where τ is the average inter-arrival time of the corresponding input stream and b is a runtime coefficient.

Criterion 1 essentially constrains the length of the batch. The reference metric is the average laxity of the jobs currently in the system; the intuition is that a job's laxity should be positively related to the length of delay that input tuples can tolerate (consider the delay experienced by the first tuple in the batch). In our experiments we set a = 1, meaning the timestamp difference between the head tuple and the tail tuple in a batch cannot be greater than the average job laxity in the system. Criterion 2 essentially determines the point that marks the end of a batch. In our experiments we set b = 2, meaning that if the timestamp difference between two consecutive tuples is greater than twice the average inter-arrival time, they are separated into two batches.

Another issue is setting the appropriate deadline for a job batch. The easiest way is to pick a representative tuple from the batch and set the batch deadline according to that tuple's deadline. In our experiments, we choose the first tuple in the batch (the one with the earliest timestamp) as the representative, because it produces the most stringent deadline: if that deadline is met, no output tuple generated from the batch will be late.
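A hedged sketch of this batching rule follows, with `mu` and `tau` supplied by the statistics the system already tracks; the function is our illustration, with coefficient defaults matching the paper's a = 1 and b = 2:

```python
def form_batches(timestamps, mu, tau, a=1.0, b=2.0):
    """Group timestamp-ordered tuples of one input stream into batches.
    mu:  average job laxity (RAT - RPT) currently in the system
    tau: average inter-arrival time of this input stream"""
    batches, current = [], []
    for t in timestamps:
        # start a new batch if adding t would violate criterion 1 or 2
        if current and (t - current[0] > a * mu or t - current[-1] > b * tau):
            batches.append(current)
            current = []
        current.append(t)
    if current:
        batches.append(current)
    return batches

# A batch's deadline follows its first (oldest) tuple: first ts + threshold L.
```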

5 Experimental Evaluation

5.1 Experimental Setup

We implemented our proposed algorithm as part of a QoS-aware DSMS prototype. The system consists of three main components: the query engine, the statistical manager and the query scheduler. The query engine can process queries involving selection, projection, join and aggregation. The statistical manager monitors information such as the unit processing cost of each operator, the input data rates and the current QoS of each registered query, and reports them to the scheduler, which then makes scheduling decisions based on this information.

The multi-query plan used for the experiments is generated randomly. The number of operators ranges from 24 to 48 and the number of queries from 12 to 32. Each query is given a QoS threshold, an integer between 500 and 10000 ms, and a weighting factor, an integer between 1 and 10. We use three different data streams, produced by a data generator that emits input streams following a Poisson process with customizable mean inter-arrival time. Each produced tuple is given a timestamp indicating its generation time. Input tuples pass through a "delay" operator before they can be processed; this operator simulates the input transmission delay from the data source to the query engine. For comparison, we also implemented three other scheduling strategies: Earliest Deadline First (EDF), Least Laxity First (LLF)², and a random approach.

² The strategy schedules the job with the minimum slack time (i.e., min{RAT − RPT} among all jobs with RAT ≥ RPT).

[Fig. 4. QoS Score (%) over time (seconds) with increasing data rate, tuple-based scheduling; curves: EDF, LLF, Random, Heuristic]
[Fig. 5. QoS Score (%) over time (seconds) with increasing input transmission delay, tuple-based scheduling; curves: EDF, LLF, Random, Heuristic]
[Fig. 6. QoS Score (%) over time (seconds) with increasing data rate, batch-based scheduling; curves: EDF, LLF, Random, Heuristic]
[Fig. 7. QoS Score (%) over time (seconds) with increasing input transmission delay, batch-based scheduling; curves: EDF, LLF, Random, Heuristic]

All experiments were conducted on an IBM x255 server running Linux with four Intel Xeon MP 3.00GHz/400MHz processors and 18 GB of DDR main memory.

5.2 Performance Study

For ease of presentation, we convert the value of the objective function (defined in Equation 3 in terms of weighted penalty aggregation) into a percentile QoS score as follows:

$$\text{QoS score} = \frac{U_{worst} - U}{U_{worst}} \quad (6)$$

where U denotes the penalty value obtained from the experiment and $U_{worst}$ denotes the worst possible penalty value (i.e., the penalty value when all output tuples are late).
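Equivalently, as a one-line helper (ours, for illustration):

```python
def qos_score(U, U_worst):
    """Eq. (6): percentile QoS score; U_worst is the penalty incurred
    when every output tuple is late."""
    return (U_worst - U) / U_worst
```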

Strategy Comparison. We use the same set of queries and data to evaluate the performance of all four scheduling strategies. The experiment is designed as follows: we use two different ways to slowly increase the system workload over time and observe the resulting QoS score. In the first case, we achieve this by increasing the data rate until the workload reaches 80% of the system capacity.

[Fig. 8. Average QoS score (%) for LLF, Random and Heuristic under tuple-based and batch-based (-B) scheduling]
[Fig. 9. Multi-query scalability: average QoS score (%) vs. query-to-operator ratio (0.4 to 0.8) for LLF-B, Random-B and Heuristic-B]

In the second approach, we keep the workload constant while slowly increasing the transmission delay of input tuples (achieved through the "delay" operator). This essentially reduces the RAT of input tuples, leading to scheduling contention. Results in Figures 4 and 5 clearly indicate that in both cases our heuristic approach performs better than the other strategies. In particular, we can see that for EDF, the QoS score starts to drop significantly when the input load exceeds a certain level. This can be explained by the domino effect: when the system load becomes heavy, an EDF scheduler can perform arbitrarily badly, because it consistently picks the most urgent task to execute, which may already be hopeless to meet its deadline. LLF also performs worse than our heuristic, due to its lack of vision to foresee the potential gain of scheduling NL-Jobs. The same set of experiments was also performed with all four strategies running in batch-based scheduling mode (refer to Figures 6 and 7), and similar conclusions can be reached.

Tuple-Based vs. Batch-Based Scheduling. We focus on the performance comparison between tuple-based and batch-based scheduling in this section. Since the EDF strategy may perform arbitrarily badly, we do not include it here. Figure 8 plots the average QoS score achieved by the other three strategies for both tuple-level and batch-level scheduling. It turns out that batch-based scheduling outperforms tuple-based scheduling for all three strategies. This is mainly attributed to the reduced number of context switches among query operators, as well as the decreased scheduling overhead. We also conducted experiments on input data of a bursty nature; there, the contrast between the tuple-based and batch-based strategies is even more pronounced. Due to space constraints, the details are not reported here.

Multi-query Scalability. A commercial stream system often runs a large number of similar queries, each for one subscribed user. That means that, within the query engine, operators are largely shared by different queries. This experiment examines the scalability of the different scheduling strategies (again excluding EDF) with respect to query sharing. The degree of sharing is measured by the ratio of the number of queries to the number of operators in the system. As depicted in Figure 9, as query sharing increases, the superiority of our heuristic over the other scheduling approaches becomes more and more evident. This is because our algorithm adopts a planning-based approach, which is able to look ahead to assess the potential value of each job for the
entire system, while other strategies do not possess such clairvoyance when making the scheduling decision.

6 Conclusions

For a service-oriented data stream system, QoS-based query scheduling is an indispensable component. We propose a new multi-query scheduling strategy that aims to turn a DSMS into a true real-time system that can meet application-defined deadlines. The strategy is based on a novel transformation of our query scheduling problem into a job scheduling problem. The experimental study demonstrates promising results for our proposed strategy. As future work, we will extend the current approach to cater for applications with more generalized QoS specifications.

References

1. Babcock, B., Babu, S., Datar, M., Motwani, R.: Chain: Operator Scheduling for Memory Minimization in Data Stream Systems. In: SIGMOD, pp. 253–264 (2003)
2. Babu, S., Srivastava, U., Widom, J.: Exploiting k-constraints to reduce memory overhead in continuous queries over data streams. ACM Trans. Database Syst. 29(3), 545–580 (2004)
3. Carney, D., Çetintemel, U., Rasin, A., Zdonik, S.B., Cherniack, M., Stonebraker, M.: Operator Scheduling in a Data Stream Manager. In: VLDB, pp. 838–849 (2003)
4. Graham, R.L., Lawler, E.L., Lenstra, J.K., Rinnooy Kan, A.H.G.: Optimization and Approximation in Deterministic Sequencing and Scheduling: A Survey. Annals of Discrete Mathematics 5, 287–326 (1979)
5. Jiang, Q., Chakravarthy, S.: Scheduling strategies for processing continuous queries over streams. In: Williams, H., MacKinnon, L.M. (eds.) BNCOD 2004. LNCS, vol. 3112, pp. 16–30. Springer, Heidelberg (2004)
6. Karp, R.M.: Reducibility among Combinatorial Problems. In: Complexity of Computer Computations, pp. 85–103. Plenum, New York (1972)
7. Lawler, E.L.: Sequencing to minimize the weighted number of tardy jobs. RAIRO Operations Research 10, 27–33 (1976)
8. Lawler, E.L., Moore, J.: A functional equation and its application to resource allocation and sequencing problems. Management Science 16, 77–84 (1969)
9. Péridy, L., Pinson, E., Rivreau, D.: Using short-term memory to minimize the weighted number of late jobs on a single machine. European Journal of Operational Research 148(3), 591–603 (2003)
10. Sharaf, M.A., Chrysanthis, P.K., Labrinidis, A., Pruhs, K.: Efficient Scheduling of Heterogeneous Continuous Queries. In: VLDB, pp. 511–522 (2006)
11. Stankovic, J.A., Spuri, M., Ramamritham, K., Buttazzo, G.C.: Deadline Scheduling for Real-Time Systems - EDF and Related Algorithms. Kluwer Academic Publishers, Norwell (1998)
12. Urhan, T., Franklin, M.J.: Dynamic Pipeline Scheduling for Improving Interactive Query Performance. In: VLDB, pp. 501–510 (2001)
13. Wu, J., Tan, K.L., Zhou, Y.: QoS-Oriented Multi-Query Scheduling over Data Streams. Technical Report (2008), http://www.comp.nus.edu.sg/~wuji/TR/QoS.pdf
14. Wu, J., Tan, K.L., Zhou, Y.: Window-Oblivious Join: A Data-Driven Memory Management Scheme for Stream Join. In: SSDBM, p. 21 (2007)
