Comparison of Strategies for Serving Two Streams of Jobs

Feng Zhang, Lester Lipsky∗, Sarah Tasneem†, and Steve Thompson

Abstract

Highly varying job demands generally consist of many short jobs mixed with several long jobs. In this paper, we consider a simple scenario where two job streams with different levels of demand must be processed by the same server. We study the performance of several round-robin variants and FCFSP in such a scenario. The simulation results show that, on the one hand, by employing immediate preemption to favor newly arrived jobs, round-robin can effectively reduce the mean response time for the short-job stream while only slightly increasing the response time for the long-job stream. On the other hand, by assuming the availability of job stream information and always favoring the short-job stream, FCFSP may improve performance. However, to further improve performance, other information, if available (e.g., the characteristics of each individual stream), should be considered.

Keywords: First-come-first-served with priority (FCFSP), round-robin (RR), last-come-first-served with preemptive resume (LCFSPR), least-attained-time (LAT)

1

Introduction

Imagine $n$ Poisson streams of jobs arriving at a single-queue server for processing, as illustrated in Figure 1. Stream $i$ has a job arrival rate $\lambda_i$ and a service time distribution $F_i(x)$ with mean service time $\bar{x}_i = E[X_i]$, where $X_i$ is the service time random variable of Stream $i$. Taking all the streams together, the whole system is an M/G/1 queue. Specifically, the total job arrival rate is $\lambda = \sum_{i=1}^{n} \lambda_i$, and the probability that a given job is from Stream $i$ is $p_i = \lambda_i/\lambda$. The overall service time distribution is $F(x) = \sum_{i=1}^{n} p_i F_i(x)$, with mean service time $\bar{x} = E[X] = \sum_{i=1}^{n} p_i \bar{x}_i$.

Figure 1: Illustration of $n$ streams of jobs arriving at a single-queue server.

∗ F. Zhang, L. Lipsky, and S. Thompson are with the Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269.
† S. Tasneem is with the Department of Math and Computer Science, Eastern Connecticut State University, Willimantic, CT 06226.

Many scheduling strategies have been proposed for an M/G/1 queue. At one extreme, if the individual service times are known exactly, shortest-remaining-processing-time (SRPT), which preempts the job in execution if a newly arrived job has a smaller service time requirement, is optimal [1, 2]. At the other extreme, without requiring such information, first-come-first-served (FCFS) is efficient in handling jobs with a small squared coefficient of variation (i.e., $C_v^2 = \sigma^2/\bar{x}^2 < 1$, where $\sigma$ is the standard deviation), whereas processor sharing (PS), last-come-first-served with preemptive resume (LCFSPR), and least-attained-time (LAT) [3] can handle jobs with $C_v^2 > 1$ well. In a previous paper [4], we showed that by favoring newly arrived jobs, the performance of round-robin (RR) could be better than that of PS for time-slice values on the order of one mean service time.

An open issue is how well the latter methods perform in the case of multiple streams of jobs. For some methods, such as FCFS and PS, there are known results regarding their performance in this case. Both FCFS and PS are blind to job streams. In FCFS, all jobs have the same mean waiting time in the queue before being served, regardless of their stream. Let $T_w$ be the waiting time random variable and $E[T_w]$ be the mean waiting time. It is easy to see that the mean response time of each stream, $E[T_i]$, is $E[T_w] + \bar{x}_i$. So if we can compute $E[T_w]$, we know how well FCFS performs for each stream of jobs. By applying Little's Theorem, we have Eq. 1, with $T$ being the system time random variable. According to the Pollaczek-Khinchin formula, $E[T]$ is given by Eq. 2, where $\rho$ ($= \lambda\bar{x} = \sum_{i=1}^{n} \rho_i$) is the overall utilization parameter, with $\rho_i$ ($= \lambda_i \bar{x}_i$) being the utilization parameter of Stream $i$.

$$E[T_w] = E[T] - \bar{x} \tag{1}$$

$$\text{FCFS:}\quad E[T] = \bar{x} + \frac{\rho\bar{x}}{1-\rho}\,\frac{C_v^2 + 1}{2} \tag{2}$$

So the mean response time of each stream increases linearly with $C_v^2$, as given by Eq. 3 and Eq. 4:

$$\text{FCFS:}\quad E[T_w] = \frac{\rho\bar{x}}{1-\rho}\,\frac{C_v^2 + 1}{2} \tag{3}$$

$$\text{FCFS:}\quad E[T_i] = \frac{\rho\bar{x}}{1-\rho}\,\frac{C_v^2 + 1}{2} + \bar{x}_i \tag{4}$$
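As a worked illustration of Eq. 3 and Eq. 4, the following Python sketch computes the FCFS mean waiting time and the per-stream mean response times from the stream parameters. The function name `fcfs_stream_response_times` and the numbers in the usage lines are illustrative assumptions, not values taken from the paper; the overall $C_v^2$ is passed in directly rather than derived from the per-stream distributions.

```python
def fcfs_stream_response_times(streams, cv2):
    """Per-stream FCFS mean response times in an M/G/1 queue (Eq. 3 and Eq. 4).

    streams: list of (arrival_rate, mean_service_time) pairs, one per stream.
    cv2:     squared coefficient of variation of the overall service time.
    Returns (mean_wait, [mean_response_time_of_each_stream]).
    """
    lam = sum(rate for rate, _ in streams)              # total arrival rate
    if lam <= 0.0:
        raise ValueError("total arrival rate must be positive")
    rho = sum(rate * mean for rate, mean in streams)    # overall utilization
    if not 0.0 <= rho < 1.0:
        raise ValueError("utilization must satisfy 0 <= rho < 1")
    mean_x = rho / lam                                  # overall mean service time
    mean_wait = rho * mean_x / (1.0 - rho) * (cv2 + 1.0) / 2.0    # Eq. 3
    return mean_wait, [mean_wait + mean for _, mean in streams]   # Eq. 4


# Hypothetical example: two streams, with the overall cv2 chosen arbitrarily.
wait, per_stream = fcfs_stream_response_times([(0.3, 1.0), (0.1, 2.0)], cv2=1.0)
print(wait, per_stream)
```

Every stream pays the same waiting-time term, so a large $C_v^2$ penalizes the short-job stream relatively more than the long-job stream.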

If PS is employed, the mean system time $E[T]$ is given by Eq. 5, which is the same as that for the M/M/1 queue [5]. So PS is preferred to FCFS when $C_v^2$ is large, since $E[T]$ no longer depends on $C_v^2$. Is the mean response time of each stream independent of $C_v^2$ as well? By applying results of multi-class Jackson networks, this is indeed true, as $E[T_i]$ is given by Eq. 6.

$$\text{PS:}\quad E[T] = \frac{\bar{x}}{1-\rho} \tag{5}$$

$$\text{PS:}\quad E[T_i] = \frac{\bar{x}_i}{1-\rho} \tag{6}$$

For highly varying job demands (e.g., the majority of jobs from short-job streams and a small portion of jobs from long-job streams), $C_v^2$ could be much greater than 1. In such cases, FCFS could be much worse than PS for handling each stream, even though both of them are blind to job streams. What if each stream can be differentiated from the others? Can the job stream information be employed to come up with a better strategy?

To answer these questions, in this paper we consider a simple scenario with two streams of jobs: one stream characterized by a small mean service time $\bar{x}_s$, and the other characterized by a large mean service time $\bar{x}_l$. We assume that the job stream information is available. Such an assumption is not unrealistic in many cases. For example, a Web server can distinguish the incoming traffic by examining the subnet IP addresses of individual requests. Within this scenario, we investigate a policy employing the job stream information, which is called first-come-first-served with priority (FCFSP). In FCFSP, jobs in the same priority class are served in FCFS order, and those from higher priority classes always preempt the current active job immediately. Assuming that the two streams of jobs correspond to two priority classes, with the short-job stream having the higher priority, the short-job stream is not affected by the long-job one at all. The mean response time of the short-job stream $E[T_s]$ using FCFSP is given by:

$$\text{FCFSP:}\quad E[T_s] = \frac{\bar{x}_s}{1-\rho_s} + \frac{\bar{x}_s\rho_s}{1-\rho_s}\,\frac{C_{v_s}^2 - 1}{2} \tag{7}$$

Here, $\rho_s = \lambda_s \bar{x}_s$, with $\lambda_s$ being the short-job stream arrival rate and $C_{v_s}^2$ the squared coefficient of variation of the short-job stream.

In addition to FCFSP, we re-examine three round-robin variants investigated in [4] in the above-mentioned scenario with two streams of jobs. Unlike FCFSP, these three round-robin variants are blind to job streams. So we first study how the variants handle each stream of jobs by considering several cases, which are characterized by different service-time distributions. The simulation results show that the variants using immediate preemption can reduce $E[T_s]$ remarkably, whereas the delay on long jobs remains small. Furthermore, by always favoring the short-job stream, FCFSP slightly outperforms the round-robin variants in some cases. However, even with such information incorporated, FCFSP may perform poorly. Indeed, it is found to be worse in one simulation. We conclude that the distribution of each stream is important in determining which strategy is best.

The rest of this paper is organized as follows: Section 2 briefly describes the three round-robin variants investigated in [4]; the simulation results are presented in Section 3; Section 4 concludes the paper.

2

Round-Robin Variants

As discussed in [4], the handling of newly arrived jobs is important if $C_v^2 \neq 1$, and putting newly arrived jobs at the back may not be the best strategy. To avoid unnecessary delay on the short jobs in the case of $C_v^2 \gg 1$, it is critical to serve newly arrived jobs first. Based on this idea, three round-robin variants were investigated in [4].

The first round-robin variant (denoted by FrontNPR) inserts a newly arrived job at the front of the queue so that it is served after the job currently in execution finishes or uses up its time-slice $\Delta$. If the new job is shorter than $\Delta$, it finishes in one $\Delta$. If $\Delta$ is not too small, most short jobs can get through quickly. Only if a new job is a long job will it take multiple time-slices. Each time it gets a $\Delta$, it moves to the end of the queue and will not get another one until all the jobs in front of it (including newly arrived jobs) have been served. This limits the delay it imposes on short jobs.

Whereas no preemption is performed on the current active job (before expiration of its $\Delta$) in FrontNPR, the second and third variants (denoted by PreFull and PreRem) favor a newly arrived job by preempting the current active job and serving the new job immediately. The preempted job is put at the front of the queue. The difference between these two variants is that in PreFull, the preempted job gets another full $\Delta$ when resumed, while in PreRem, the preempted job gets only the unused portion of its $\Delta$ when resumed. In both variants, if the current active job cannot finish in the allocated $\Delta$ and no preemption happens during its execution (i.e., no new job arrives), the job moves back to the end of the queue and gets a full $\Delta$ in the next round.

All these variants are simple to implement, since jobs are inserted only at the front and back of the queue and the exact service times of individual jobs need not be known. Although PreFull and PreRem require immediate preemption of the current active job, no special preemption scheme needs to be defined. Instead, the existing scheme can be reused, since jobs need to be preempted anyway at the end of their time-slices. The only extra information to be managed in PreRem is the remaining time-slice value of each job, which can be easily computed.

In the next section, we study the performance of these three variants and FCFSP in handling two streams of jobs using simulation. To facilitate the following discussion, Table 1 gives a brief description of the strategies examined in this paper.

Table 1: Description of strategies.

Strategy    Description
FrontNPR    Front insertion of new jobs, no immediate preemption
PreFull     Immediate preemption of current active job, allocated full ∆ when resumed
PreRem      Immediate preemption of current active job, allocated remaining ∆ when resumed
FCFSP       FCFS in each class, immediate preemption of low-priority jobs
LAT         Jobs with least attained times first
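The queue operations of the three variants can be sketched in a few lines of Python. This is a minimal event-driven sketch written for illustration, not the simulator used for the results in this paper; the function name `simulate_rr`, the fixed job list, and the treatment of simultaneous events (a slice expiry is processed before an arrival at the same instant) are assumptions.

```python
from collections import deque


def simulate_rr(jobs, delta, variant):
    """Simulate one round-robin variant on a fixed list of jobs.

    jobs:    list of (arrival_time, service_time), sorted by arrival_time.
    delta:   time-slice length (must be positive).
    variant: "FrontNPR", "PreFull", or "PreRem".
    Returns the response time (completion minus arrival) of each job.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    if variant not in ("FrontNPR", "PreFull", "PreRem"):
        raise ValueError("unknown variant")
    n = len(jobs)
    remaining = [size for _, size in jobs]   # unfinished work per job
    slice_left = [delta] * n                 # slice granted at next dispatch
    response = [0.0] * n
    queue = deque()                          # waiting jobs, front at the left
    current, now, nxt, finished = None, 0.0, 0, 0
    while finished < n:
        if current is None:
            if not queue:                    # idle server: jump to next arrival
                now = jobs[nxt][0]
                queue.append(nxt)
                nxt += 1
            current = queue.popleft()
        run = min(remaining[current], slice_left[current])
        next_arrival = jobs[nxt][0] if nxt < n else float("inf")
        if next_arrival < now + run:         # a job arrives during this run
            elapsed = next_arrival - now
            remaining[current] -= elapsed
            slice_left[current] -= elapsed
            now = next_arrival
            if variant == "FrontNPR":
                queue.appendleft(nxt)        # waits at the front, no preemption
            else:
                if variant == "PreFull":
                    slice_left[current] = delta   # full slice when resumed
                queue.appendleft(current)    # preempted job waits at the front
                current = nxt                # new job is served immediately
            nxt += 1
        else:                                # job finishes or its slice expires
            now += run
            remaining[current] -= run
            if remaining[current] <= 1e-12:
                response[current] = now - jobs[current][0]
                finished += 1
            else:
                slice_left[current] = delta  # fresh slice for the next round
                queue.append(current)        # back of the queue
            current = None
    return response


# Hypothetical workload: two long jobs followed by one short job.
jobs = [(0.0, 2.0), (0.5, 2.0), (0.75, 0.25)]
for variant in ("FrontNPR", "PreFull", "PreRem"):
    print(variant, simulate_rr(jobs, delta=1.0, variant=variant))
```

In this small example, the short job arriving at time 0.75 has a response time of 0.5 under FrontNPR and 0.25 under both preemptive variants, while PreRem completes the second long job earlier than PreFull does because the first long job resumes with only its unused half slice.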

3

Simulation Results

In this section, we study the performance of the above round-robin variants and FCFSP in handling two Poisson streams of jobs. Three M/G/1 queues are considered, with G being hyper-exponential-2 ($H_2$), hyper-Erlangian-2 ($HE_2$), and truncated-power-tail with three phases ($PT_3$). $H_2$ corresponds to the case of two streams of jobs with exponential ($C_v^2 = 1$) service times, $HE_2$ corresponds to the case of two streams of jobs with Erlangian-2 ($E_2$, $C_v^2 = 0.5 < 1$) service times, whereas $PT_3$ represents one stream of jobs with exponential service times and another stream of jobs with $C_v^2 > 1$. The last case is denoted by $PT_3$ because its parameters are set according to the truncated-power-tail (TPT) distribution model (using $\alpha = 1.4$) [6]. A short description of each of the distributions can be found in the Appendix.

We assume that the mean service time of each distribution is 1.0 (i.e., $\bar{x} = 1.0$). The $C_v^2$ value of each distribution is set to 10.0. Since $H_2$ and $HE_2$ have another free parameter, for simplicity it is assumed that 90% of jobs are from the short-job stream. A finite set of $\Delta$ values (with minimal value 0.1) is selected to show the behavior of the round-robin variants near the origin (i.e., for relatively small $\Delta$ values) as well as how they approach the optima when $\Delta$ increases to the order of one mean service time. For each chosen $\Delta$ and a given distribution, one simulation run using $10^9$ job samples is conducted.
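The parameter setting described above fixes the two branch means of $H_2$ and $HE_2$. The following Python sketch solves for them by matching the first two moments of a two-branch mixture; the function name `two_branch_means` and the assumption that the parameters were obtained by this two-moment match are ours. It relies on each branch having a squared coefficient of variation of 1 (exponential) or 0.5 (Erlangian-2), as stated above.

```python
import math


def two_branch_means(p_short, mean, cv2, branch_cv2):
    """Branch means of a two-branch mixture matched to an overall mean and Cv^2.

    p_short:    probability that a job comes from the short-job branch.
    mean, cv2:  overall mean and squared coefficient of variation.
    branch_cv2: squared coefficient of variation inside each branch
                (1.0 for exponential branches, 0.5 for Erlangian-2 branches).
    Returns (mean_short, mean_long).
    """
    p_long = 1.0 - p_short
    # Second-moment condition:
    #   (1 + branch_cv2) * (p_s * m_s^2 + p_l * m_l^2) = (cv2 + 1) * mean^2
    k = (cv2 + 1.0) * mean ** 2 / (1.0 + branch_cv2)
    if k < mean ** 2:
        raise ValueError("overall cv2 is too small for these branches")
    # Smaller root of the resulting quadratic in the short-branch mean.
    m_short = mean - math.sqrt(p_long * (k - mean ** 2) / p_short)
    if m_short <= 0.0:
        raise ValueError("no positive short-branch mean for these parameters")
    m_long = (mean - p_short * m_short) / p_long   # first-moment condition
    return m_short, m_long


print(two_branch_means(0.9, 1.0, 10.0, 1.0))   # H2 setting
print(two_branch_means(0.9, 1.0, 10.0, 0.5))   # HE2 setting
```

Under these assumptions, the computed branch means agree to two decimals with the values of $\bar{x}_s$ and $\bar{x}_l$ given in the captions of Figures 2 and 3.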

3.1

Performance of RR Variants

In the simulation, we measure the mean response time of each stream separately with respect to the chosen $\Delta$ values. The simulation results are shown in Figures 2, 3, and 4. Note that as $\Delta \to 0$, all variants converge to PS, and the mean response time for each stream can be computed analytically using Eq. 6. It is clear from the figures that the simulation results approach the analytical solutions for small $\Delta$ values. As $\Delta$ increases, the performance of all variants on the short-job stream improves initially but degrades eventually for large $\Delta$ values, whereas the performance on the long-job stream first degrades but improves for large $\Delta$ values. This is because all variants favor new jobs, which are likely to be short. As $\Delta$ increases, more and more short jobs can get through in one $\Delta$ without being delayed much by the presence of long jobs. When $\Delta$ is on the order of one mean service time, almost all the short (and therefore comparably sized) jobs are served in LCFS order, while mixtures of many short jobs with one or two long ones are effectively served by processor sharing. As $\Delta$ is increased further, long jobs benefit more in comparison to short jobs. The second observation is that the performance improvement of FrontNPR on the short-job stream is far smaller than that of PreFull and PreRem. The latter two variants can achieve over 50% reduction in $E[T_s]$ under moderately heavy load. This demonstrates that by employing immediate preemption, both variants ensure the timely processing of short jobs. On the other hand, none of the round-robin variants has much impact on the long-job stream. In particular, the increase of $E[T_l]$ is usually less than 5% with respect to that of PS. After all, although 90% of jobs are from the short-job stream in the case of $H_2$ (or $HE_2$), they represent only 26.87% (or 14.5%) of the total workload.
The third observation is that PreRem favors the short-job stream for a wider range of $\Delta$ values than PreFull does. Our explanation is that in PreFull, a preempted job is given a full time-slice when resumed, so a long job may be served for more than one full $\Delta$ in each round. This delays the processing of many short jobs, especially for large $\Delta$ values. On the other hand, PreRem gives only one full time-slice to a long job in each round, so the delay on short jobs is not as significant for large $\Delta$ values. By slowing down the processing of long jobs slightly, PreRem achieves better performance on the short-job stream. Overall, PreRem performs no worse than PreFull in all the examined cases.

Figure 2: The performance of round-robin variants for individual branches of $H_2$ for $\rho = 0.7$: (a) short-job stream ($\bar{x}_s = 0.29$); (b) long-job stream ($\bar{x}_l = 7.36$). [Plots of $(1-\rho)E[T_s]$ and $(1-\rho)E[T_l]$ versus $\Delta$ for FrontNPR, PreFull, PreRem, and FCFSP.]

Figure 3: The performance of round-robin variants for individual branches of $HE_2$ for $\rho = 0.7$: (a) short-job stream ($\bar{x}_s = 0.16$); (b) long-job stream ($\bar{x}_l = 8.55$). [Plots of $(1-\rho)E[T_s]$ and $(1-\rho)E[T_l]$ versus $\Delta$ for FrontNPR, PreFull, PreRem, and FCFSP.]

Figure 4: The performance of round-robin variants for individual branches of $PT_3$ for $\rho = 0.7$: (a) short-job stream ($\bar{x}_s = 0.64$); (b) long-job stream ($\bar{x}_l = 6.07$). [Plots of $(1-\rho)E[T_s]$ and $(1-\rho)E[T_l]$ versus $\Delta$ for FrontNPR, PreFull, PreRem, and FCFSP.]

3.2

Performance of FCFSP

In the figures, one can observe that FCFSP provides better performance for the short-job stream in all three examined distributions. This is because with FCFSP, the processing of the short-job stream is not slowed down by the long-job stream at all, whereas the three round-robin variants, having no knowledge of the job stream information, cannot guarantee that the server is available to serve the short-job stream first. Even though PreRem and PreFull can reduce the impact of the long-job stream on the short-job stream by using $\Delta$ values on the order of one mean service time, some interference still exists.

Whereas the performance of FCFSP on the short-job stream is a lower bound for $H_2$, $HE_2$, and $PT_3$, its performance on the long-job stream depends on the distribution. For $H_2$ (as shown in Figure 2), it is worse than the three round-robin variants. This is reasonable since it gives priority to the short-job stream. In addition, each stream is exponential, so FCFS performs the same as PS for the long-job stream. For $\Delta$ values on the order of one mean service time, most short jobs can be finished in one time-slice, whereas most long jobs may require tens of time-slices to be completed. So essentially, the outcome is like using PS on the long-job stream. This is probably why PreRem and PreFull perform similarly to FCFSP on the long-job stream for such $\Delta$ values.

If the service time requirements of the two streams of jobs together satisfy $HE_2$, FCFSP outperforms the round-robin variants on the long-job stream as well. At first sight, this seems to contradict the trends obtained in the case of $H_2$, but there is no contradiction. Each stream of $HE_2$ is $E_2$, whose $C_v^2 < 1$, so FCFS is preferred to RR in handling the long-job stream. It is therefore not surprising that $E[T_l]$ under FCFSP is more than 15% lower than under the three round-robin variants.
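The short-stream lower bound provided by FCFSP (Eq. 7) can be compared numerically with the PS baseline (Eq. 6), to which all round-robin variants converge as $\Delta \to 0$. The Python sketch below evaluates both expressions; the function names and the sample parameters are illustrative assumptions rather than values reported in this paper.

```python
def ps_stream_response(mean_xi, rho):
    """Mean response time of stream i under PS (Eq. 6)."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("utilization must satisfy 0 <= rho < 1")
    return mean_xi / (1.0 - rho)


def fcfsp_short_response(lam_s, mean_xs, cv2_s):
    """Mean response time of the high-priority short-job stream under FCFSP (Eq. 7).

    Only the short-job stream's own load rho_s appears, because the
    low-priority long jobs are always preempted immediately.
    """
    rho_s = lam_s * mean_xs
    if not 0.0 <= rho_s < 1.0:
        raise ValueError("short-stream utilization must satisfy 0 <= rho_s < 1")
    base = mean_xs / (1.0 - rho_s)
    return base + mean_xs * rho_s / (1.0 - rho_s) * (cv2_s - 1.0) / 2.0


# Hypothetical setting: exponential short jobs (cv2_s = 1) at overall load 0.75.
print(ps_stream_response(mean_xi=1.0, rho=0.75))
print(fcfsp_short_response(lam_s=0.5, mean_xs=1.0, cv2_s=1.0))
```

For an exponential short-job stream the second term of Eq. 7 vanishes, so the short jobs see an M/M/1 queue loaded only by $\rho_s$ instead of the full $\rho$.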

With the above two trends explained, we can understand the results for $PT_3$. Similar to the $H_2$ case, FCFSP is worse than the three round-robin variants on the long-job stream. But here it is much worse, and the three variants are superior for any $\Delta$ value. The reason is that the long-job stream is itself $H_2$ (with $C_v^2 = 3.19$). In such a case, RR is preferred to FCFS. It should be pointed out that FCFSP may perform even worse if the long-job (as well as the short-job) stream has $C_v^2 \gg 1$.

From the above analysis, it can be concluded that by using job stream information, FCFSP may improve performance in some cases, but not always. To further improve performance, it seems important to utilize other available information, such as the distribution of each stream. For example, if it is known that both streams have $C_{v_i}^2 > 1$, round-robin with priority (RRP) is preferred to FCFSP.

3.3

Overall Performance Comparison

Here, we consider the two streams together and compare the overall performance of PreRem (the best among the variants) and FCFSP with LAT. In LAT, jobs are differentiated based on their attained service times, and those with less attained time (e.g., newly arrived jobs) are always favored. Figure 5 shows that for $H_2$, both LAT and FCFSP outperform PreRem slightly. This is expected since both of them can distinguish short jobs from long jobs more accurately. Specifically, since $H_2$ has an increasing residual-time (i.e., expected-remaining-time) function, in which longer attained times correspond to longer residual times, LAT essentially favors short jobs. By relying on the job stream information, FCFSP also favors short jobs based on our assumption. Note that if the long-job stream were favored, the performance of FCFSP would be much worse.

For $HE_2$, Figure 6 shows that FCFSP still outperforms PreRem, whereas LAT does not. The superior performance of FCFSP again relies on knowing the job stream information and then favoring short jobs. Moreover, as explained in the previous subsection, FCFS is more effective than PS or round-robin in serving the long-job stream, which is $E_2$. The performance of LAT degrades because the residual-time function of $HE_2$ is not monotonically increasing. As such, LAT often identifies jobs incorrectly and favors long jobs instead of short jobs.

Finally, it is clear from Figure 7 that both LAT and PreRem outperform FCFSP for $PT_3$. LAT performs the best because $PT_3$ has an increasing residual-time function as well. PreRem ranks second by effectively reducing the mean response time of the long-job stream (characterized by $H_2$), while FCFSP performs poorly on that stream, as shown in Figure 4(b).

Figure 5: For hyper-exponential-2 ($H_2$), PreRem performs slightly worse than FCFSP and LAT. [Plot of $(1-\rho)E[T]$ versus $\rho$ for LAT, PreRem, and FCFSP.]

Figure 6: For hyper-Erlangian-2 ($HE_2$), PreRem is worse than FCFSP, but better than LAT. [Plot of $(1-\rho)E[T]$ versus $\rho$ for LAT, PreRem, and FCFSP.]

Figure 7: For $PT_3$, PreRem is worse than LAT, but better than FCFSP. [Plot of $(1-\rho)E[T]$ versus $\rho$ for LAT, PreRem, and FCFSP.]

4

Conclusions

In this paper, we examined the performance of three round-robin variants and FCFSP in handling two streams of jobs. Through simulation, it was demonstrated that by employing immediate preemption (i.e., preempting the current active job immediately upon a new arrival), round-robin can implicitly ensure the timely processing of the short-job stream using time-slice values on the order of one mean service time. Furthermore, it avoids any significant delay in the processing of the long-job stream in comparison to PS. The simulation results also show that if the job stream information is available, a strategy like FCFSP may improve performance over round-robin and LAT, especially when the service times of each stream follow the exponential distribution or a distribution with $C_v^2 < 1$. However, when the long-job stream has $C_v^2 > 1$, FCFSP could be much worse than round-robin or LAT.

Our future work will focus on investigating other methods that employ not only the job-stream information, but also the diverse characteristics of each job stream. For example, a scheduling method better than FCFSP would use an appropriate strategy for each stream based on its squared coefficient of variation. Specifically, if it is determined (e.g., through measurement) that Stream $i$ has $C_{v_i}^2 \le 1$, FCFS will be chosen to serve Stream $i$, whereas if Stream $i$ has $C_{v_i}^2 > 1$, some round-robin variant like PreRem will be chosen to serve the jobs from Stream $i$.

Appendix: Distributions Used Here

For any service time distribution, the probability that the service time is no more than $x$ is $F(x) = \Pr(X \le x)$, which is called the Probability Distribution Function (PDF) in probability theory. Its derivative, $f(x) = dF(x)/dx$, is called the probability density function (pdf). In the following, we describe the service-time distributions examined in this paper and give their pdfs.

The well-known exponential distribution has pdf $f(x) = \mu e^{-\mu x}$, where $\mu$ is the service rate. All the other distributions examined here use the exponential distribution as a building block.

The Erlangian-$n$ distribution (i.e., the $n$-phase Erlangian distribution, $E_n$) describes the distribution of the sum of $n$ mutually independent, identically distributed exponential random variables. Its pdf is

$$f(x) = \frac{\mu(\mu x)^{n-1}}{(n-1)!}\, e^{-\mu x}.$$

The hyper-exponential-2 distribution ($H_2$) describes the service times of jobs whose execution takes one of two exponential branches, with service rates $\mu_1$ and $\mu_2$, respectively. The first branch is taken with probability $p_1$ and the other with probability $p_2$ ($p_1 + p_2 = 1$). Its probability density function is

$$f(x) = p_1 \mu_1 e^{-\mu_1 x} + p_2 \mu_2 e^{-\mu_2 x}.$$

The hyper-Erlangian-$n$ distribution ($HE_n$) also describes a two-branch process. However, instead of being exponential, each branch is $E_n$. Combining the pdf of $E_n$ with that of $H_2$, the probability density function of $HE_n$ is

$$f(x) = p_1 \frac{\mu_1(\mu_1 x)^{n-1}}{(n-1)!}\, e^{-\mu_1 x} + p_2 \frac{\mu_2(\mu_2 x)^{n-1}}{(n-1)!}\, e^{-\mu_2 x}.$$

The truncated-power-tail (TPT) distributions are a special class of generalized hyper-exponential distributions in which both the branching probabilities and the branch service rates are geometric. Specifically, let $\theta\gamma^{\alpha} = 1$, where $\theta < 1$ and $\alpha \le 2$. The probability of taking the $i$th branch is $p_i = p_1\theta^{i-1}$, and the service rate of the $i$th branch is $\mu/\gamma^{i-1}$ if the first branch has a service rate of $\mu$. A nice property of TPT is that its behavior in a finite range is similar to that of a power-tail distribution. Furthermore, by removing the constraint on the maximum number of branches, a power-tail distribution with $\alpha < 2$ can be represented.
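To make the geometric structure of the TPT branches concrete, the following Python sketch builds the branch probabilities and branch means for a given number of branches and returns the resulting mean and $C_v^2$. The function name `tpt_moments` is hypothetical, and the choice of $p_1$ so that the branch probabilities sum to one is our assumption; the moment formulas are the standard ones for a mixture of exponentials.

```python
def tpt_moments(theta, alpha, n_branches, mu=1.0):
    """Branch parameters, mean, and Cv^2 of a truncated-power-tail distribution.

    theta:      geometric ratio of the branch probabilities (0 < theta < 1).
    alpha:      power-tail exponent; gamma follows from theta * gamma**alpha = 1.
    n_branches: number of exponential branches (truncation level).
    mu:         service rate of the first branch.
    Returns (probs, branch_means, mean, cv2).
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie in (0, 1)")
    if n_branches < 1:
        raise ValueError("at least one branch is required")
    gamma = theta ** (-1.0 / alpha)
    weights = [theta ** i for i in range(n_branches)]
    total = sum(weights)
    probs = [w / total for w in weights]          # p_i = p_1 * theta^(i-1), normalized
    branch_means = [gamma ** i / mu for i in range(n_branches)]  # 1 / (mu / gamma^(i-1))
    mean = sum(p * m for p, m in zip(probs, branch_means))
    second = sum(2.0 * p * m * m for p, m in zip(probs, branch_means))
    cv2 = second / mean ** 2 - 1.0
    return probs, branch_means, mean, cv2


# Hypothetical parameters: three branches with alpha = 1.4 and theta = 0.2.
probs, branch_means, mean, cv2 = tpt_moments(theta=0.2, alpha=1.4, n_branches=3)
print(probs, branch_means, mean, cv2)
```

With a single branch the function reduces to the exponential distribution ($C_v^2 = 1$); adding branches or lowering $\theta$ increases $C_v^2$, and $\mu$ only rescales the mean.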

References

[1] N. Bansal and M. Harchol-Balter. Analysis of SRPT scheduling: Investigating unfairness. In Proc. ACM Sigmetrics'01, 2001.
[2] L.E. Schrage and L.W. Miller. The Queue M/G/1 with the Shortest Remaining Processing Time Discipline. Operations Research, 14:670–684, 1966.
[3] S. Aalto, U. Ayesta, and E. Nyberg-Oksanen. Two-level processor-sharing scheduling disciplines: mean delay analysis. SIGMETRICS Perform. Eval. Rev., 32(1):97–105, 2004.
[4] F. Zhang, S. Tasneem, L. Lipsky, and S. Thompson. Analysis of round-robin variants: Favoring newly arrived jobs. To appear in ANSS 2009.
[5] F. Baskett, K. M. Chandy, R. R. Muntz, and F. G. Palacios. Open, closed, and mixed networks of queues with different classes of customers. J. ACM, 22(2):248–260, 1975.
[6] M. Greiner, M. Jobmann, and L. Lipsky. The importance of power-tail distributions for modeling queueing systems. Operations Research, 47(2):313–326, 1999.