COMPARISON OF BUFFER USAGE UTILIZING MULTIPLE SERVERS IN NETWORK SYSTEMS WITH POWER-TAIL DISTRIBUTIONS

John E. Hatem, Lester Lipsky and Pierre Fiorini
The Taylor L. Booth Center for Computer Applications and Research
233 Glenbrook Road, U-31
University of Connecticut
Storrs, Connecticut 06269-3031
[email protected]   [email protected]   [email protected]

December 4, 1996
ABSTRACT

We present the results of a parametric study of the buffer size needed to prevent overflow or loss in single and multiple server systems where data arrivals or service times are "bursty", "self-similar", or "fractal". Such erratic behavior can be caused (or adequately described) by renewal processes whose interarrival distributions are power-tail (or Pareto, or Levy, or "long-tail") with infinite variance. We show that power tails can cause problems for intermediate values of the utilization parameter, ρ, and become very serious (beyond the usual 1/(1 − ρ) factor) when ρ is close to 1, and/or when α approaches 1. For systems with a power-tail arrival distribution and multiple servers (PT/M/C), we gain no performance increase by utilizing multiple exponential servers. However, for systems with a Poisson arrival rate and power-tail service times (M/PT/C), the improvement from using multiple, slower servers over a single faster one can be great indeed.

Keywords: Power-tail, long-tail, buffer overflow, buffer loss, multiple servers
1 Introduction

In recent years, there has been a push towards increasingly faster communications networks, especially in light of the World Wide Web and other systems with high-bandwidth requirements. As network demands grow, increasing network speed is seen as the only way to handle the increasing traffic. However, while doubling the service rate on a system does indeed double the throughput, other measurements, such as response time [ERRA96], probability of buffer overflow [LIPS97] and mean time to overflow [HEAT96], in many cases show only marginal improvement or are unaffected. The key to understanding this apparent paradox is in the distribution of the service times of these networks.

In recent years there has been an ever increasing interest in the development of systems which will be able to process incoming traffic from various communications networks. Numerous papers have appeared indicating that the traffic to be expected in the future will be of an extraordinary character. Leland et al. [LELA94] have measured and analyzed the arrival of millions of packets on ETHERNET networks at Bellcore, and found an enormous instability of arrival rates. Park et al. [PARK96] have analyzed file sizes and network traffic on the World Wide Web, and discovered that such traffic shows variability at a wide range of scales; they see the same instability of the number of arrivals whether they measure in .01, .1, 1, 10 or 100 second intervals. This has been described as "self-similar", "bursty" or "fractal" behavior. This behavior is not new or exclusive to multimedia; the same self-similarity was evident in text file sizes and CPU time distributions years ago, but was not thought to be important. We are just now gaining an understanding of this self-similarity. Willinger et al. [WILL95] have shown that self-similarity can be caused by the superposition of heavy-tailed on-off processes, which they called packet trains. Each node on a network is either transmitting (which corresponds to the on-times) or not transmitting (which corresponds to the off periods). As the Web grows, the number of systems in this model increases; at high levels of aggregation, the bursts are more prominent. They model this by using the Hurst parameter to show the degree of self-similarity, and an appropriate underlying distribution.

Greiner et al. [GREI95] define a class of heavy-tail distributions known as power-tails, which have finite mean but infinite variance. The degree of self-similarity for a power-tail distribution is determined by α, which is related to the Hurst parameter by H = (3 − α)/2. [GREI95] describe in detail the properties of power-tail distributions, defining an analytic class of well-behaved distributions (a sub-class of which are Phase Distributions, which can be used in Markov Chain models) that have truncated power-tails (TPT), and in the limit become power-tail distributions. This class was first used in [LIPS86] to explain the long-tail behavior of measured CPU times at Bellcore in 1986 [LELA86]. Van de Liefvoort and Weng have generated "self-similar" data of the kind described in [LELA94] by simulating a renewal process where the inter-arrival times come from a single power-tail distribution with a finite mean (but infinite variance) [LIEF94]. Recent efforts have been made to model data with these distributions, with different formulas for deriving the α (or Hurst) parameter.
The difficulty lies in gaining an accurate measurement of the α parameter. Data with these distributions appear to be linear over several orders of magnitude on a log-log scale. One attempt at modeling has used a Hill estimator [TAQQ95], attempting to infer a value of α from a stable region of the plot. This method is heuristic at best, since it is difficult to get a point estimate from a graph, and the graph may exhibit considerable volatility. Attempts at smoothing can yield somewhat better results; however, finding accurate values of α remains an open topic. Current empirical estimates place α somewhere between 1.05 [PARK96] and 1.4 [LELA94] for network traffic.

In a previous paper, we showed that this self-similarity creates tremendous bottlenecks for single server computer networks [LIPS97]. Traditionally, the way to reduce bottlenecks is to identify and upgrade the offender. This increases the speed of the system. However, the self-similar nature of file sizes and network traffic indicates that this will only work marginally. No matter how fast you make the network or how much you speed up the system, eventually a single job of significant size will arrive and cause a bottleneck. This paper explores systems which exhibit this type of behavior to see if their performance can be improved by utilizing multiple servers instead of a single one with the same capacity. Heath et al. [HEAT96] studied the time it takes until a fluid queue with an ON/OFF input stream (with heavy-tail on distribution) and a finite but large holding capacity reaches the overflow point. Lipsky and Hatem [LIPS97] analyze systems with power-tail service or arrival times and show that the buffer size required to prevent even a small fraction of packets from going to an overflow buffer increases much faster than the usual 1/(1 − ρ) factor. The conclusions point to the law of diminishing returns, where geometrically increasing the buffer size only yields linear improvements in the percentage of packets rejected. However, these models are all based on a first-come first-served algorithm with a single server.

We will show here how multiple servers improve performance dramatically, especially for moderately loaded systems. Many analyses focus on unrealistically high utilization parameters (i.e., ρ := arrival rate × mean service time = 0.9 and above) where few people would actually run their systems. By focusing on systems with more realistic utilizations, we can hope to find ways of improving working systems. Another advantage of using multiple servers is expense; a server (or network line) that runs twice as fast is almost always more expensive than two systems (network lines) at single speed. (Note: In this paper, we use the word "system" to indicate any shared resource.) These analyses will focus on network lines such as those of the World Wide Web, which epitomizes the self-similar behavior we are examining. We show that even with limited buffer size, where overflow packets are rejected, slower multiple servers still outperform single fast servers in systems with power-tail distributions.

The rationale behind this approach lies with the fundamental nature of the packet traffic: its self-similar behavior. With this behavior, eventually a job with a long service time will arrive at the server. When it arrives, it occupies the server, no matter how fast, for a long period of time. While the server is occupied, the queue backs up, waiting time increases, and the buffer size needed to prevent overflows (or packets going to a backup buffer) increases. With two servers, this long job can go to one server, while the smaller jobs can still go through the other server. This is optimal for moderate utilizations because at low utilization, a doubly fast server will be able to process the infrequent large job. At high utilization, both servers eventually are occupied with long jobs, and the system behaves similarly to a system with a doubly fast server.

Throughout this paper, we will show that for self-similar distributions, multiple slower servers are almost always better. Of course, all this is done assuming steady-state behavior, which may require inordinately many arrivals before such large queue lengths could be seen in reality. Discrete event simulation models necessarily suffer from the same problem. [GREI95] presented an argument showing that the closer α is to 1, the more arrivals must occur before any system's steady state can be approached. They show that the number of events needed for α = 1.4 is 100 times that needed for a similar queue with Poisson arrivals and exponential service times. Thus transient behavior should be examined in future work.
2 The Basic Models

Our system is made up of one or more processors receiving data from an arrival stream of variable-sized packets. The arrival stream may be considered a renewal process. Fiorini and Lipsky [LIPS95] have compared truncated power-tail renewal processes with equivalent semi-Markov (non-renewal) processes and show that the behavior is similar. The receiving processor has a finite primary memory buffer which can hold at most N packets. If a no-loss system is required, then we assume there exists an unbounded secondary or backup buffer that will store the overflow (e.g., a disk-array sub-system), and transfer the data to the primary buffer as space becomes available. We also assume that this transfer will occur faster than the primary buffer can empty out. The assumption of an (almost) "infinite buffer" is not unreasonable, given the emerging technologies for fairly high-speed massive storage. We will show presently that for M/G/C queues under moderate load, the size of the primary buffer can be substantially reduced in comparison with an M/G/1 server of comparable speed. First-come first-served sequencing is preserved, since packets are passed to each server in the order in which they arrive and are put in the queue. If there is no backup buffer, then there must be losses, and we have a GI/G/C/N system.

Following [LIPS97], we will assume that either "GI" or "G" is exponential, yielding a total of four different types of queues. First we will assume that packet arrivals can be considered a general renewal process, where each packet must be serviced in a time taken from an exponential distribution with mean time 1/μ (a GI/M/C queue). If no backup buffer is provided, then we have a GI/M/C/N queue. In an alternate view (see [LIKH95]), a Poisson process with a "dispersed" batch of packets whose number is distributed by a power-tail can also generate self-similar data. If the packets can be "reassembled" at the receiving node and counted as one customer whose service time is taken from a power-tail distribution, then we have an M/G/C queue, or an M/G/C/N queue if there is no backup buffer. For a queue with a single server, we will assume a service rate of 1; for queues with C multiple servers, we will assume a service rate of 1/C, and hence a mean service time of C, for each server, so that the maximal throughput of the multiple servers equals that of the single server. Therefore, for an M/G/1 queue, the mean service time will be 1 (with a power-tail distribution), while for an M/G/2 queue, the mean service time will be 2 for each server.
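Although all the results below are computed analytically, the model itself is easy to probe by discrete-event simulation. The following sketch (our own illustration, not part of the original analysis; all names are ours, and, as noted in Section 1, power-tail inputs approach steady state very slowly, so such estimates are noisy) simulates an M/G/C FCFS queue under the normalization above: each of the C servers works at rate 1/C, so a job of nominal size s occupies a server for C·s time units, and ρ = λ.

    import random

    def mgc_mean_response(lam, C, sample_service, n_jobs, seed=1):
        # M/G/C FCFS queue: Poisson arrivals at rate lam; sample_service(rng)
        # returns a nominal job size with mean 1; each server runs at rate 1/C.
        rng = random.Random(seed)
        t, free, total = 0.0, [0.0] * C, 0.0   # free[i] = when server i is next idle
        for _ in range(n_jobs):
            t += rng.expovariate(lam)          # next arrival epoch
            i = free.index(min(free))          # FCFS: job takes earliest-free server
            start = max(t, free[i])
            free[i] = start + C * sample_service(rng)
            total += free[i] - t               # response time = departure - arrival
        return total / n_jobs

    # Exponential service as a baseline; a power-tail sampler (see the
    # sketch in Section 2.1) can be substituted to reproduce M/PT/C.
    exp_service = lambda rng: rng.expovariate(1.0)
    print(mgc_mean_response(0.5, 1, exp_service, 200_000))
    print(mgc_mean_response(0.5, 2, exp_service, 200_000))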
2.1 Properties of Power-tail Distributions
These distributions are thoroughly described in [GREI95]. A summary is given here. A Probability Distribution Function (PDF), for some random variable X, is defined as
\[ F(x) := \Pr(X \le x), \]
while its Reliability Function is given as
\[ R(x) := \Pr(X > x) = 1 - F(x). \]
Also, if it exists, the probability density function (pdf) is defined as
\[ f(x) := \frac{dF(x)}{dx} = -\frac{dR(x)}{dx}. \]
A power-tail distribution can be defined by its behavior for very large x. That is, if
\[ R(x) \longrightarrow \frac{c}{x^{\alpha}} \quad (x \to \infty), \tag{1} \]
then R(x) [or F(x)] is a power-tail distribution with power α, where α is a positive, real number. From elementary calculus it is easy to show that if α ≤ 1 then the distribution has an infinite mean. If 1 < α ≤ 2 then F(x) has a finite mean, but an infinite variance. In general,
\[ E(X^{\ell}) := \int_0^{\infty} x^{\ell} f(x)\, dx = \infty \qquad \forall\; \ell \ge \alpha. \tag{2} \]
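To see where the divergence in (2) comes from, consider for illustration a density with an exact (rather than merely asymptotic) power tail, f(x) = αc/x^{α+1} for x ≥ 1, so that R(x) = c/x^α there. Then
\[ \int_1^{\infty} x^{\ell}\, \frac{\alpha c}{x^{\alpha+1}}\, dx = \alpha c \int_1^{\infty} x^{\ell-\alpha-1}\, dx, \]
which converges only when ℓ − α − 1 < −1, that is, for ℓ < α; for ℓ ≥ α the integral, and with it E(X^ℓ), is infinite.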
Such distributions have been known to exist for a long time. Pareto used them in describing the distribution of wealth in economics. Levy showed that all stable distributions with infinite variance have power tails. Thus they are also referred to as Pareto-, or Levy-Pareto-, or simply Levy distributions in the literature of various disciplines. For a more complete discussion, the reader is referred to William Feller's book [FELL71], or [GREI95]. These distributions have been ignored in computer science and related fields in the past because an extremely large number of events (a number of the order of 10^7 would not be unreasonable) must occur for the effects of this distribution to be relevant. What does it mean for a model to predict a steady-state mean queue length of, say, 10,000 customers, when there are hardly that many customers in the user community? Only now, with the presence of the information super-highway (and the World Wide Web) can we expect to see (and, in fact, have already seen) so many events (customers, packets) in a relatively short time. The problem is further exacerbated by the fact that, since the World Wide Web is global, there is not the "overnight" most current systems enjoy for the queue to drain while the users sleep. The sun never sets on the World Wide Web.

In general, simple power-tail distributions [the one used by Pareto was of the form f(x) = c x^{β−1}/(1+x)^{α+β}] are difficult to use for Laplace transforms, and do not have direct matrix representations. But a most useful sub-class of them is given in [GREI95]. The particular one we use here is defined as
\[ R_M(x) = \frac{1-\theta}{1-\theta^M} \sum_{n=0}^{M-1} \theta^n \exp(-\mu x/\gamma^n), \tag{3} \]
where θ and γ are parameters satisfying the inequalities 0 < θ < 1 and γ > 1. It is not hard to show that the ℓ-th moments are given by
\[ E(X_M^{\ell}) = \frac{1-\theta}{1-\theta^M}\; \frac{1-(\theta\gamma^{\ell})^M}{1-\theta\gamma^{\ell}}\; \frac{\ell\,!}{\mu^{\ell}}\,. \tag{4} \]
Next define their limit function
\[ R(x) := \lim_{M\to\infty} R_M(x) = (1-\theta) \sum_{n=0}^{\infty} \theta^n \exp(-\mu x/\gamma^n). \tag{5} \]
It can be shown that R(x) satisfies (1), and that α is related to θ and γ by
\[ \theta\,\gamma^{\alpha} = 1, \qquad \text{or} \qquad \alpha := -\frac{\log\theta}{\log\gamma}\,. \tag{6} \]
From this it follows that
\[ E(X^{\ell}) := \lim_{M\to\infty} E(X_M^{\ell}) = \infty \quad \text{for } \ell \ge \alpha; \]
that is, Equation (2) is satisfied. We refer to the functions R_M(x) as truncated power-tail (TPT) distributions because, depending on the size of M, they look like their limit function, the true power-tail R(x); but for some large x, depending upon M, they drop off exponentially. These distribution functions have the benefit of being easy to manipulate algebraically.
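Since Equation (3) is just a finite mixture of exponentials, R_M is easy both to evaluate and to sample. The sketch below (our own illustration; all names are ours) chooses μ via Equation (4) so that E(X_M) = 1. Printing R_M(x) against the pure power law shows the tail tracking x^{−α} over more decades as M grows, before its eventual exponential drop-off; the sample function is what one would plug into a renewal-process or queueing simulation.

    import math, random

    def make_tpt(alpha=1.4, theta=0.5, M=8):
        # Truncated power tail of Eq. (3): M exponential phases with
        # weights ~ theta**n and means gamma**n / mu, scaled so E(X) = 1.
        gamma = theta ** (-1.0 / alpha)        # Eq. (6): theta * gamma**alpha = 1
        norm = (1 - theta) / (1 - theta ** M)
        mu = norm * (1 - (theta * gamma) ** M) / (1 - theta * gamma)  # gives E(X) = 1
        weights = [norm * theta ** n for n in range(M)]

        def R(x):                              # reliability function R_M(x)
            return sum(w * math.exp(-mu * x / gamma ** n)
                       for n, w in enumerate(weights))

        def sample(rng):                       # draw one TPT-distributed variate
            u, acc = rng.random(), 0.0
            for n, w in enumerate(weights):
                acc += w
                if u <= acc:
                    return rng.expovariate(mu / gamma ** n)
            return rng.expovariate(mu / gamma ** (M - 1))

        return R, sample

    R8, _ = make_tpt(M=8)
    R20, _ = make_tpt(M=20)
    for x in (1.0, 10.0, 100.0, 1000.0):
        print(f"x={x:7.0f}   R_8={R8(x):.2e}   R_20={R20(x):.2e}   x**-1.4={x ** -1.4:.2e}")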
Furthermore, they are M-dimensional phase distributions whose vector-matrix representations ⟨p_M, B_M⟩ are given by (using the notation of [LIPS92]):
\[ B_M = \mu \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & \gamma^{-1} & 0 & \cdots & 0 \\ 0 & 0 & \gamma^{-2} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \gamma^{-(M-1)} \end{bmatrix} \qquad \text{and} \qquad p_M = \frac{1-\theta}{1-\theta^M}\, [\,1 \;\; \theta \;\; \theta^2 \;\cdots\; \theta^{M-1}\,]. \tag{7} \]
We need these matrices to calculate the properties of finite-buffer queues. They generate the functions given above by the relation
\[ R_M(x) = \Psi_M[\exp(-x B_M)], \]
where, for any square matrix Z,
\[ \Psi_M[Z] := p_M\, Z\, \epsilon'_M, \tag{8} \]
and ε′_M is the column vector with all 1's. The general method of representing processes by matrix operators is called Linear Algebraic Queueing Theory (LAQT) in [LIPS92]. We define
\[ V_M := B_M^{-1}, \]
which gives
\[ E(X_M^{\ell}) = \ell\,!\; \Psi_M[V_M^{\ell}]. \]
Furthermore, the Laplace transform of f_M(x) is
\[ B^*(s) := \int_0^{\infty} e^{-sx} f_M(x)\, dx = \Psi_M[(I + sV_M)^{-1}]. \]
Note that the matrix B representing R(x) is infinite dimensional, and has an infinite set of eigenvalues, {μ/γ^n}, with an accumulation point at 0. So, in principle, its inverse does not exist. However, with judicious use of limits, all calculations can be carried out. Our purpose in this paper is to examine the effect which power-tail distributions and their truncated cousins have upon buffer sizes, and how utilizing multiple servers can reduce the buffer space required to limit buffer overflow to some threshold. For our "base" function we have chosen α = 1.4 and θ = 0.5. [GREI95] has shown that for fixed α, system behavior is quite insensitive to changes in θ, so any intermediate value will do. On the other hand, performance is very sensitive to α. The value α = 1.4 fits the data given in [LELA94]. Other values of α have also been reported [PARK96], and may be further explored in future work. In all cases, the μ's have been chosen to give a mean interarrival time of 1. These formulas are all appropriate for single-server models. With multiple servers, extended formulas are required. In each section, we will introduce the additional terms required to generate the formulas.
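As a concrete check on the representation (7) (again our own sketch, with our own variable names), the moments computed from the matrices via ℓ!Ψ_M[V_M^ℓ] agree with the closed form (4); the Laplace transform Ψ_M[(I + sV_M)^{−1}] can be evaluated the same way.

    import numpy as np
    from math import factorial

    alpha, theta, M = 1.4, 0.5, 8
    gamma = theta ** (-1.0 / alpha)
    norm = (1 - theta) / (1 - theta ** M)
    mu = norm * (1 - (theta * gamma) ** M) / (1 - theta * gamma)   # scales E(X) to 1

    p = norm * theta ** np.arange(M)                 # entrance vector p_M of Eq. (7)
    B = mu * np.diag(1.0 / gamma ** np.arange(M))    # service-rate operator B_M
    V = np.linalg.inv(B)                             # V_M = B_M^{-1}
    eps = np.ones(M)                                 # epsilon': column vector of 1's

    def psi(Z):                                      # Psi_M[Z] := p_M Z epsilon'
        return p @ Z @ eps

    for l in (1, 2, 3):
        from_matrices = factorial(l) * psi(np.linalg.matrix_power(V, l))
        closed_form = (norm * (1 - (theta * gamma ** l) ** M)
                       / (1 - theta * gamma ** l) * factorial(l) / mu ** l)
        print(l, from_matrices, closed_form)         # the two columns agree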
2.2 M/G/C Queues - No Packets Lost
Here we assume that the time to process a packet has service time distribution F(x). We also assume that the arrival process is Poisson, with rate λ, and that we have both a primary buffer of limited size and a secondary buffer of infinite size.
Figure 1: An M/G/C queue with a power-tail service time distribution. In this example, C = 2; we also use C = 1 for comparison. Note that for systems with multiple servers (C ≥ 2), the sum of the service rates is 1; this ensures that we are comparing equitable systems.

We also assume we can have more than one server (C = 1, 2, …), all with the same service time distribution F(x). For our model, we will assume that the total service rate of the system is 1, i.e., if there are two servers, each has mean service time 2. This allows us to make a fair comparison; we wish to find whether a single double-speed server is better than two single-speed servers. This model constitutes an open M/G/C queue. A diagram of the system can be found in Figure 1.

There are some obvious statements we can make. If we have one customer in the system, then a faster server will be better than two servers, as only one server can be utilized. At what point does one system become better? For an M/M/2 queue, the doubly fast server is always better, since in the memoryless system it is impossible to distinguish between two servers and a doubly fast one when both servers are active. We include the M/M/1 and M/M/2 curves for comparison in our examples.

In Figure 2, we chart the mean response time for some of these systems. As one would expect, for lower utilization rates, a single, faster server gives better response times. However, for truncated power-tails with M = 8 there is a "crossover" point at around ρ = 0.38. This is what we would expect: at lower rates of utilization, a faster server is better than two servers. However, as the power-tail grows, this point shifts towards lower values of ρ; for a truncated power-tail with M = 20, this crossover occurs at a point lower than ρ = 0.05. So for all but the systems with the shortest truncated power-tail distributions (which look almost exponential), two servers have a better response time than a single, doubly fast one.

The question for this example, and the primary question of this paper, is "For what systems is it better to buy multiple servers?". We look at response times in Figure 3. The crossover points distinctly indicate that it is better to use two slower servers than one faster one starting at ρ = 0.38. Also, three are better than one at about ρ = 0.5. However, there is only marginal improvement from 2 to 3 servers, at about ρ = 0.7. Clearly, the optimal number of servers depends not only on the distribution, but on the mean utilization of the system. For ρ > 0.9 almost any system would be expected to behave badly. Of more interest is the range 0.4 < ρ < 0.8. For truncated power-tails with larger M, the advantage goes to the multiple server system; this supports our contention that the more self-similar a system is (the larger M is), the better multiple servers will perform over a faster, single server. One could also argue that this is merely a result of the higher coefficient of variation, C², for the higher-order power-tails. Indeed, some system performance characteristics, such as the mean queue length, depend solely on the first two moments. But [LIPS97] shows that for many of the parameters, such as buffer size choice, the first two moments are inadequate to explain what is going on. Indeed, systems with the same mean service time and variance (i.e., the first two moments) can yield wildly different behaviors.
Figure 2: Mean response time for an M/PT/C queue with truncated power-tails of length M = 8 and M = 20, and C = 1, 2 servers. M/M/1 and M/M/2 queues are included for comparison.
Figure 3: Mean response time for an M/TPT/C queue with truncated power-tails with M = 8 and C = 1, 2, 3 servers. For this case, two servers are better than one for ρ > 0.38, and three are better than two for ρ > 0.65.
Figure 4: Primary buffer size N needed in an M/PT/C queue for overflow to be less than 1%, as a function of ρ = 1/(μ x̄). Because the buffer size can become very large as ρ approaches 1, the function actually plotted is log[(1 − ρ)N]. The curves are discontinuous because N is an integer function, and have negative slopes for small ρ because of the factor 1 − ρ.

We can also look at the buffer size required to limit overflow. In Figure 4 we chart the buffer size required so that only 1.0% of the arriving packets go to our "overflow" buffer. As would be expected, the necessary buffer size grows unboundedly as ρ approaches 1. In order to control the variation along the y-axis, N was multiplied by 1 − ρ. Even so, for large values of M (the power-tail length), C² becomes unboundedly large, so we have plotted log[(1 − ρ)N] versus ρ. Note that as the truncated power-tail grows (M gets bigger), and it will as the World Wide Web gets larger and more international, the double server requires a significantly smaller buffer size than the double-fast single server. Also of interest is the region of greatest improvement. For the lower-middle values of ρ (ρ = 0.2 to 0.5), the required buffer size is reduced by one or two orders of magnitude. As the system gets busier (ρ → 1), the buffer sizes required for the same percentage to overflow converge again.

Some of this behavior can be explained by the large variance inherent in power-tail distributions. The high coefficient of variation indicates that large jobs will arrive, although infrequently. The system will be processing many small jobs, some mid-size jobs and a rare large job. However, as ρ increases, the probability of a large job increases. For the relatively idle system, the large job will occupy only one of the servers; the other server remains to service the small and mid-sized jobs in the meanwhile. With a busier system, the probability that both servers are simultaneously occupied with large jobs increases. The system saturates, and the arriving jobs land in the backup queue, just as they would with a single server. So as ρ → 1, the advantage of the double server diminishes.
Also note that buffer size is an integer function. A job cannot occupy half a slot, so we either need a whole slot or we don't. This is also what happens in real systems; no one would set up a system with 13.5 megabytes, even if that were optimal. Because of the way that memory has come to be allocated as a de facto standard, a system either has 8M or 16M. This also explains the step function we get when trying to allocate an optimal buffer size, as in the lower values for M/PT8/2 in Figure 4.
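For reference, the exponential baselines (m/m/1 and m/m/2) in Figure 4 follow from elementary birth-death algebra; the sketch below is our own, not the LAQT computation needed for the PT curves. With total service rate 1, the departure rate in state n is min(n, C)/C, and by PASTA the fraction of arrivals sent to the overflow buffer equals the steady-state probability of finding N or more customers in the system.

    def mmc_buffer_for_overflow(rho, C, target=0.01, cutoff=1e-18):
        # Smallest N with P(arrival finds >= N in system) <= target, for an
        # M/M/C queue with arrival rate rho and per-server rate 1/C.
        probs, p, n = [1.0], 1.0, 0
        while True:
            n += 1
            p *= rho / (min(n, C) / C)       # birth-death: p_n from p_{n-1}
            probs.append(p)
            if n > C and p < cutoff:
                break                        # remaining geometric tail is negligible
        total, tail = sum(probs), sum(probs)
        for N, pn in enumerate(probs):
            if tail / total <= target:
                return N
            tail -= pn

    for rho in (0.3, 0.5, 0.7, 0.9):
        print(rho, mmc_buffer_for_overflow(rho, 1), mmc_buffer_for_overflow(rho, 2))

For exponential service this reproduces the ordering visible in Figure 4: the doubly fast single server always needs the smaller primary buffer, which is precisely what is reversed once the service times have a (truncated) power tail.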
2.3 M/G/C/N Queues - Packets Rejected
In a closed system, as with most real systems, buffer size is limited. If preserving packets is critical, we can add our "infinite buffer" (the secondary buffer on disk which feeds the first buffer), but rarely are packets so critical that we must take these steps. Traditionally, upper layers in a protocol recover misordered or lost packets. The IP protocol is defined as an unreliable, best-effort, connectionless packet delivery system. It does not discard packets capriciously; unreliability arises when resources are exhausted. Discarding packets is not in itself unreasonable. The TCP layer is what guarantees in-order delivery [COME91]. Other protocols have similar methods for handling dropped packets. The probability that an M/G/C/N queue will drop a packet is the same as the probability that an arriving customer will see a full queue. We define this as r(N; N), the probability that a customer will see N customers in a system that only holds N customers. This is:
\[ r(N;N) = r(0;N)\, x_c(N;N) \qquad \text{for } C \le N. \]
The derivation of these formulas is beyond the scope of this paper; the reader is invited to see [LIPS92] for a full discussion. In [LIPS97], it was shown that discarding packets reduces the buffer size required, compared to having an infinite secondary buffer, by several orders of magnitude in some cases. It was also shown that the high variance of the truncated power-tails was inadequate to fully explain this bad behavior: hyperexponential distributions with the same variance yielded fundamentally different results. Here we have reduced the buffer size even further by splitting the servers. Now, not only do we discard packets when the buffer fills, but the buffer fills less frequently, due to the multiple servers compensating for the arrival of large jobs. In Figure 5, we chart the buffer size required so that only 1.0% of the arrivals are rejected in a system with single and multiple servers. Of interest here are the great savings in buffer size due to splitting the servers. Even when we reject jobs when the queue fills, there is still the problem of a large job occupying the server. Therefore, although part of the problem is that the queue length can grow unboundedly, especially due to the larger jobs that appear periodically, we can still gain significant savings in buffer size by splitting the servers.
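The finite-buffer analogue is again elementary for exponential service (a sketch of ours, playing the role of the m/m baselines; the PT curves in Figure 5 come from the formulas in [LIPS92]): for an M/M/C/N queue the rejection probability is, by PASTA, just the steady-state probability p_N that the system is full.

    def mmcn_loss(rho, C, N):
        # Rejection probability of an M/M/C/N queue with arrival rate rho
        # and per-server rate 1/C (total rate 1): equals p_N by PASTA.
        probs, p = [1.0], 1.0
        for n in range(1, N + 1):
            p *= rho / (min(n, C) / C)       # truncated birth-death chain
            probs.append(p)
        return probs[N] / sum(probs)

    def buffer_for_rejection(rho, C, target=0.01):
        N = C                                # at least one slot per server
        while mmcn_loss(rho, C, N) > target:
            N += 1
        return N

    for rho in (0.3, 0.5, 0.7, 0.9):
        print(rho, buffer_for_rejection(rho, 1), buffer_for_rejection(rho, 2))

For exponential service the savings from rejecting rather than backing up are modest; [LIPS97] shows they grow to orders of magnitude once the service times have a power tail.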
2.4 G/M/C Queues - No Packets Rejected
Up until now, we have assumed that the service time has a power-tail distribution. Now let us look at what happens when the interarrival times have a power-tail distribution. The premise behind this paper has been that we can gain performance by splitting a faster server into two slower, independent servers. This works well, as we have seen, when the service time has a power-tail distribution. But with the G/M/C queues, we are splitting an exponential server into C exponential servers, each with rate μ/C. When all of the servers are busy, it is impossible to distinguish between a load-dependent server and multiple servers, and the performance of the two systems is almost exactly the same. As Figure 6 shows, the required buffer size differs only through the states with n < C. If n ≥ C, all C servers are busy, each at rate μ/C, and it is impossible to distinguish this system from one with a single exponential server with service rate μ.
Figure 5: Primary buffer size needed in an M/PT/C/N queue for the rejection rate to be less than 1%. Here, we assume that when the system fills up (we have N customers), any newly arriving customers are rejected. It is up to the upper layers in the protocol to resend.
Figure 6: A G/M/C queue with a power-tail arrival distribution and exponential servers. In this example, C = 1, 2, 3, 4. Note that as ρ increases, the systems look alike. This is because when all the servers are busy, the system is indistinguishable from a system with one fast server. Here, the mean service rate for each server is 1/C; since the superposition of Poisson processes is Poisson, the total service rate of the system is the sum of the individual rates, C · (1/C) = 1. For this reason, a single server is better than multiple servers. In this chart, the system with the smallest buffer requirements is the PT16/M/1 system, and the largest is PT16/M/4.

The interarrival times have a truncated power-tail distribution with M = 16. We analyze the system for 1, 2, 3 and 4 servers. The formulas for these graphs are from [LIPS92] and [GREI95]. We note without giving details that a system with packets rejected (TPT/M/C/N) yields the expected results. Since packets are rejected, the needed primary queue size is smaller. Since multiple exponential servers of slower speed perform similarly to a single higher-speed server (as long as the cumulative service rates are the same), we derive no performance improvement by splitting the servers in this case.
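The indistinguishability argument is easy to confirm numerically; the sketch below (ours, self-contained, reusing the same TPT mixture as in Section 2.1 for the interarrival times) estimates the fraction of arrivals finding at least N customers in a PT16/M/C system. Because the servers are exponential, the number in system between arrivals is a simple death process, which is all the simulation needs.

    import random

    def tpt_sampler(alpha=1.4, theta=0.5, M=16):
        # TPT interarrival times, the mixture of Eq. (3), scaled to mean 1.
        gamma = theta ** (-1.0 / alpha)
        norm = (1 - theta) / (1 - theta ** M)
        mu = norm * (1 - (theta * gamma) ** M) / (1 - theta * gamma)
        w = [norm * theta ** n for n in range(M)]
        def sample(rng):
            u, acc = rng.random(), 0.0
            for n, wn in enumerate(w):
                acc += wn
                if u <= acc:
                    return rng.expovariate(mu / gamma ** n)
            return rng.expovariate(mu / gamma ** (M - 1))
        return sample

    def fraction_seeing_at_least(sample_arrival, C, N, n_arrivals=500_000, seed=2):
        # GI/M/C queue, per-server rate 1/C: between arrivals the number in
        # system is a death process with rate min(n, C)/C (memorylessness).
        rng = random.Random(seed)
        n, count = 0, 0
        for _ in range(n_arrivals):
            t = sample_arrival(rng)            # time until the next arrival
            while n > 0:
                d = rng.expovariate(min(n, C) / C)
                if d > t:
                    break
                t, n = t - d, n - 1
            count += (n >= N)
            n += 1
        return count / n_arrivals

    tpt = tpt_sampler()
    arrivals = lambda rng: tpt(rng) / 0.7      # mean 1/0.7, so rho = 0.7
    for C in (1, 2, 4):
        print(C, fraction_seeing_at_least(arrivals, C, N=20))

The tail probabilities, and hence the buffer sizes they imply, come out nearly the same for all C, echoing Figure 6.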
3 System speedup for different values of α

In the previous figures we chose the power parameter to be α = 1.4, matching the experimental value that appeared in [LELA94]. However, some more recent data analyses, using Hill estimators and other methods, have measured α as low as 1.06 for World Wide Web traffic [PARK96]. We now look at what kind of reduction in required buffer size we can obtain by splitting the servers for lower values of α. In Figure 7 we chart the buffer size required so that only 1.0% of the customers will go to the overflow buffer, for α = 1.4 and α = 1.1.
Figure 7: Primary buffer size needed in an M/PT/C queue for overflow to be less than 1%, as a function of ρ = 1/(μ x̄). Here we compare multiple servers vs. single, faster servers for a power-tail of length M = 16 and power parameter α = 1.1, 1.4. We can see here the sensitivity of the system to α.
In both cases, we see that there is a vast improvement in required buffer size from splitting the servers. This chart also shows the sensitivity of these systems to the α parameter. The improvement in the closed system, with overflow packets rejected, is also significant, although not as dramatic. These numbers assume that the system has reached a steady state. The relaxation time (a measure of the time it takes a system to approach its steady state) in such systems may be longer than the lifetime of the system. If this is the case, then to properly measure these systems, we should look at the transition time, the time after the system starts up but before it reaches the steady state.
4 Conclusion

We have shown how to integrate power-tail distributions and their truncated cousins into the analysis of communications networks using various GI/G/C queues, both with and without finite buffers, and with single faster servers versus slower multiple servers. We have shown that, due to the increasingly self-similar nature of network traffic, increasing the speed of a single server yields only marginal improvement. By utilizing multiple servers we gain improvements in response time and required buffer space for M/G/C queues. The option of duplicating a server has the additional benefit of being more cost effective than doubling its speed: it is usually cheaper to buy multiple, slower servers than one high-speed one.

What makes duplication a better choice is the nature of World Wide Web traffic. A web server can receive many HTTP requests. While one system is servicing one request, another server can be servicing others. The independent nature of web traffic makes a high level of parallelism possible. This holds true even for ATM switches, which have a fixed packet size. The erratic traffic may be caused by files whose sizes are distributed according to a power-tail law, but which are broken up into numerous smaller packets. These packets are then transmitted close together in time, giving an appearance of burstiness. Changing the underlying medium therefore will not change the burstiness which is so characteristic of Web traffic.

The distribution we use for our modeling, the power-tail, models bursty Web traffic more accurately than other distributions in use. Exponential distributions, in particular, are woefully inadequate for describing network traffic. Hyperexponential distributions with the same mean and variance as the truncated power-tails are accurate for low and high values of ρ (ρ < 0.1 and ρ > 0.95), but are completely unreliable for middle values of ρ, where most systems operate. Other, more complicated processes which heuristically build in correlations and burstiness (such as Compound Poisson processes) may well be unnecessary, and are less accurate. Its simplicity and elegance make the power-tail distribution a powerful modeling tool.

The reason multiple servers offer such a gain in performance is that a single large job does not tie up the system. With the secondary buffer, while the system is serving large jobs, there is a higher probability that a second large job will arrive. The server splitting offsets this, but ultimately enough large jobs may arrive to drag down the system. We have also seen the considerable buffer savings gained by rejecting packets when the buffer fills instead of sending them to an overflow buffer. In this case, large arriving jobs can be rejected if the system is already full.

The results of this paper suggest further questions. For example, we show that even for smaller truncated power-tail distributions, having two servers improves performance over having one. However, we saw that having three servers offers only slight additional savings in buffer space and response time. Evidently, there is an optimal level of parallelism; perhaps some sort of dynamic server splitting (similar to time sharing) can occur.
References
[BERA95] J. Beran, R. Sherman, M. S. Taqqu and W. Willinger, "Variable-Bit-Rate Video Traffic and Long-Range Dependence", IEEE/ACM Trans. on Networking, 1995.

[COME91] Douglas E. Comer and David L. Stevens, Internetworking with TCP/IP, Vol. I, Prentice Hall, Englewood Cliffs, NJ, 1991.

[ERRA96] Ashok Erramilli, Onuttom Narayan and Walter Willinger, "Experimental Queueing Analysis with Long-Range Dependent Packet Traffic", IEEE/ACM Trans. on Networking, pp. 209-223, April 1996.

[FELL71] William Feller, An Introduction to Probability Theory and its Applications, Vol. II, John Wiley and Sons, New York, 1971.

[GARG92] Sharad Garg, Lester Lipsky and Maryann Robbert, "The Effect of Power-tail Distributions on the Behavior of Time Sharing Computer Systems", 1992 ACM SIGAPP Symposium on Applied Computing, Kansas City, MO, March 1992.

[GREI95] Michael Greiner, Manfred Jobmann and Lester Lipsky, "The Importance of Power-tail Distributions for Telecommunication Traffic Models", Technical Report, Department of Informatics, Technical University of Munich (TUM), submitted for publication.

[HEAT96] David Heath, Sidney Resnick and Gennady Samorodnitsky, "Patterns of Buffer Overflow in a Class of Queues with Long Memory in the Input Stream", Technical Report, Cornell University.

[LELA86] Will E. Leland and Teunis Ott, "Load-Balancing Heuristics and Process Behavior", Proceedings of ACM SIGMETRICS 1986, May 27-30, 1986, pp. 54-69. (The proceedings appeared as Vol. 14, No. 1, Performance Evaluation Review, May 1986.)

[LELA94] Will E. Leland, Murad S. Taqqu, Walter Willinger and Daniel V. Wilson, "On the Self-Similar Nature of Ethernet Traffic (Extended Version)", IEEE/ACM Trans. on Networking, 2(1), Feb. 1994.

[LIEF94] Appie van de Liefvoort and Hai Feng Weng, "On the Instability ('Fractal' Behavior) of Arrival Counts Generated by Power-tail Renewal Processes", Department Report, Computer Science and Telecommunications Program, University of Missouri-Kansas City, 1994.

[LIKH95] Nikolai Likhanov, Boris Tsybakov and Nicolas D. Georganas, "Analysis of an ATM Buffer with Self-Similar ('Fractal') Input Traffic", Proc. IEEE INFOCOM'95, Boston, April 1995.

[LIPS86] Lester Lipsky, "A Heuristic Fit of an Unusual Set of Data", Bell Communications Research Report, January 1986.

[LIPS92] Lester Lipsky, QUEUEING THEORY: A Linear Algebraic Approach, MacMillan and Company, New York, 1992.

[LIPS95] Lester Lipsky and Pierre Fiorini, "Auto-Correlation of Counting Processes Associated with Renewal Processes", Technical Report, Booth Research Center, University of Connecticut, August 1995.

[LIPS97] Lester Lipsky and John Hatem, "Buffer Problems in Telecommunications Systems", Technical Report, BRC, University of Connecticut, submitted for publication.

[LOWR93] Walter Lowrie and Lester Lipsky, "A Model for the Probability Distribution of Medical Expenses", Proceedings of the Conference of Actuaries in Public Practice, 1993.

[PARK96] Kihong Park, Gitae Kim and Mark Crovella, "On the Relationship between File Sizes, Transport Protocols, and Self-Similar Traffic", Technical Report TR-96-016, Boston University, submitted for publication.

[TAQQ95] Walter Willinger, Murad S. Taqqu, Robert Sherman and Daniel V. Wilson, "Self-Similarity in High-Speed Packet Traffic: Analysis and Modeling of Ethernet Traffic Measurements", Statistical Science, Vol. 10, pp. 67-85, 1995.

[WILL95] Walter Willinger, Murad S. Taqqu, Robert Sherman and Daniel V. Wilson, "Self-Similarity Through High-Variability: Statistical Analysis of Ethernet LAN Traffic at the Source Level (Extended Version)", ACM SIGCOMM'95.