
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 47, NO. 1, JANUARY 2001

On the Estimation of Buffer Overflow Probabilities from Measurements

Ioannis Ch. Paschalidis, Member, IEEE, and Spyridon Vassilaras

Abstract—We propose estimators of the buffer overflow probability in queues fed by a Markov-modulated input process and serviced by an autocorrelated service process. These estimators are based on large-deviations asymptotics for the overflow probability. We demonstrate that the proposed estimators are less likely to underestimate the overflow probability than the estimator obtained by certainty equivalence. As such, they are appropriate in situations where the overflow probability is associated with quality of service (QoS) and we need to provide firm QoS guarantees. We also show that as the number of observations increases to infinity the proposed estimators converge with probability one to the appropriate target, and thus, do not lead to underutilization of the system in this limit.

Index Terms—Empirical measure, estimation, large deviations, Markov-modulated processes, Sanov’s theorem.

I. INTRODUCTION

ESTIMATING small buffer overflow probabilities has important applications in a variety of areas. Two prominent examples are the control of communication networks and of manufacturing systems. In modern high-speed communication networks, congestion manifests itself as buffer overflows; the quality of service (QoS) seen by the various connections can be quantified by the buffer overflow probability. To provide QoS guarantees, the so-called effective bandwidth admission control mechanism has been proposed [1]–[6]. Briefly, the effective bandwidth of a connection is a number between its peak and average rate, chosen in such a way that when connections are allocated their effective bandwidth in an appropriately dimensioned buffer, the buffer overflow probability stays below a given small level. Real-time applications can tolerate such small frequencies of congestion phenomena.

In make-to-stock manufacturing systems and supply chains, on the other hand, the objective is to control the inventory in order to avoid stockouts (see [7], [28], [8], [9]). In such systems, demand is met from a finished goods inventory and is backordered if inventory is not available. The stockout probability quantifies the QoS encountered by customers. It can be shown that this probability is equal to a buffer overflow probability in a corresponding make-to-order system [7], [28], [9]. Thus, the problem of estimating the stockout probability can be transformed into one of estimating a buffer overflow probability.

Manuscript received November 14, 1999; revised August 16, 2000. This work was supported in part by the National Science Foundation under CAREER Award ANI-9983221 and under Grants NCR-9706148 and ACI-9873339, and by the Nokia Research Center, Boston, MA. The material in this paper was presented in part at the 37th Annual Allerton Conference, Monticello, IL, September 1999. I. Ch. Paschalidis is with the Department of Manufacturing Engineering, Boston University, Boston, MA 02215 USA (e-mail: [email protected]; url: http://ionix.bu.edu). S. Vassilaras is with the College of Engineering, Boston University, Boston, MA 02215 USA (e-mail: [email protected]). Communicated by V. Anantharam, Associate Editor for Communication Networks. Publisher Item Identifier S 0018-9448(01)00587-9.

In both these applications, we are interested in estimating buffer overflow probabilities that are very small. Moreover, arrival and service processes are typically autocorrelated (to model bursty traffic in communication networks, and to accommodate realistic demand scenarios and model failure-prone production facilities in supply chains). As a result, obtaining exact analytic expressions is intractable; it is, therefore, natural to focus on asymptotic regimes. To that end, large deviations theory (see [10]) has been an important analytical tool. Large deviations techniques have recently been applied to a variety of problems in telecommunications (see [11], [12], [6] and references therein) and supply chains (see [7], [28], [9]).

All of this large deviations work assumes detailed knowledge of models for the arrival and service processes. In practice, however, such traffic models are not known a priori and have to be estimated from real observations. Consequently, one approach for estimating buffer overflow probabilities is to assume a certain traffic model, estimate the parameters of the model from real observations, and then calculate the overflow probability using large deviations techniques. Consider, for example, the case of a deterministic Markov-modulated source (i.e., a source which has a constant rate at every state of the underlying Markov chain) and let $P_{\mathrm{ov}}(\mathbf{P})$ be the overflow probability when this source is fed to a certain buffer, where $\mathbf{P}$ denotes the transition probability matrix of the underlying Markov chain. (We assume that the rate at every state and the characteristics of the buffer are given; thus, we do not explicitly denote the dependence of the overflow probability on these quantities.) Let $\hat{\mathbf{P}}$ be an unbiased estimator of the transition probability matrix $\mathbf{P}$, that is, $E[\hat{\mathbf{P}}] = \mathbf{P}$. Suppose that we use $P_{\mathrm{ov}}(\hat{\mathbf{P}})$ as an estimator of the overflow probability. An important observation is that, due to the nonlinearity of $P_{\mathrm{ov}}(\cdot)$, $E[P_{\mathrm{ov}}(\hat{\mathbf{P}})]$ is not necessarily equal to $P_{\mathrm{ov}}(\mathbf{P})$. That is, a certainty equivalence approach can lead to an erroneous estimate.

Measurement-based, model-free admission control has received some attention recently [13]–[15]. In particular, the authors in [13] and [14] consider measurement-based admission control schemes in bufferless multiplexers. They investigate how the overflow probability is affected by estimation errors, relying on approximations based on the central limit theorem. Duffield [15] extends these results by considering large-deviations asymptotics in a multiplexer with nonzero buffer. The

0018–9448/01$10.00 © 2001 IEEE


approach in [15] relies on some earlier work in [16] which develops an estimator of the large-deviations rate function of the arrival process directly from measurements, without assuming any particular stochastic model for that process. Finally, the authors in [17] argue, as we do, that certainty equivalence can lead to erroneous estimates for rare event probabilities, and propose a Bayesian approach in a simpler [independent and identically distributed (i.i.d.)] context than ours.

In this paper, we focus on queues fed by Markov-modulated arrival processes and develop new estimators of overflow probabilities that take estimation errors into account. To that end, we establish an inverse of Sanov’s theorem (see [10]) for Markov chains in the large-deviations regime. Intuitively, the proposed estimators suggest that we should quote a quantity that is larger than the certainty equivalence estimate to guard against estimation errors. We will provide analytical and numerical evidence that the proposed estimators are “safe” even when based on relatively few observations, meaning that they do not lead to substantial underestimation of the overflow probability that can compromise QoS. Still, they are consistent in the sense that they converge to the true overflow probability with probability one (w.p. 1) as the number of observations tends to infinity. Our main contributions are the following.

• The proposed estimators are “safe” for relatively few observations, but eventually (as the number of observations tends to infinity) “learn” the true overflow probability. Providing “safe” estimates from relatively few observations is crucial since it allows the estimator to quickly adapt to level shifts in the input, and thus, accommodate nonstationary scenarios. One of the proposed estimators has the same structure as the estimator suggested in [15]. In [15], however, the estimation process is different; a limiting log-moment-generating function is estimated directly from the data. In our work, we impose additional structure by considering Markov-modulated processes (which is appropriate in some applications, e.g., inventory control in supply chains [7], [28], [9]). Moreover, we derive additional estimators based on higher moments of a loss measure. These estimators are shown to be more appropriate (“safer,” yet still consistent) for our purposes.

• The development of efficient algorithms to compute the proposed estimators. Computing the estimators amounts to solving a nonlinear programming problem; efficient computation is an issue, especially in high-dimensional instances. We study the structure of this problem and develop algorithms that perform well in practice.

• An inverse of Sanov’s theorem for Markov chains. This generalizes a result in [18] which deals with i.i.d. processes. It is also related to the result in [19], which derives a large deviations rate function of the posterior distribution for the transition probabilities of a Markov chain with parameter the “true” transition probabilities. We provide an independent proof based on first principles, where in our version the limit of the empirical measure enters the rate function; this is motivated by the application, where the “true” transition probabilities are not known.


Moreover, our result applies to Markov chains with zeros in the transition probability matrix, a case which is not discussed in [19].

The paper is organized as follows. We start in Section II with some preliminaries on large deviations and Sanov’s theorem. In Section III, we discuss analytical large-deviations expressions for buffer overflow probabilities and formulate the estimation problem. In Section IV, we analyze the estimation process in the large-deviations regime and establish an inverse of Sanov’s theorem for Markov chains. We use this result in Section V to propose our estimators of buffer overflow probabilities. We discuss issues related to the computation of the proposed estimators in Section VI. Comparisons between various estimators and some illustrative numerical results are in Section VII. Conclusions are in Section VIII.

II. PRELIMINARIES

To provide background on large deviations and to establish some of our notation, we first review some basic results. Consider a sequence $\{X_i;\ i \geq 1\}$ of i.i.d. random variables with mean $E[X_1]$. The strong law of large numbers asserts that $\frac{1}{n}\sum_{i=1}^{n} X_i$ converges to $E[X_1]$, as $n \to \infty$, w.p. 1. Thus, for large $n$, the event $\{\frac{1}{n}\sum_{i=1}^{n} X_i \geq a\}$ for $a > E[X_1]$ (or $\{\frac{1}{n}\sum_{i=1}^{n} X_i \leq a\}$ for $a < E[X_1]$) is a rare event. In particular, its probability behaves as $e^{-n \Lambda^*(a)}$, as $n \to \infty$, where the function $\Lambda^*(\cdot)$ determines the rate at which the probability of this event is diminishing. Cramér’s theorem [20] determines $\Lambda^*(\cdot)$, and is considered the first large deviations statement.

Consider next a sequence $\{S_n;\ n \geq 1\}$ of random variables with values in $\mathbb{R}$ and define

$$\Lambda_n(\theta) \triangleq \frac{1}{n} \log E\big[e^{\theta S_n}\big].$$

For the applications we have in mind, $S_n$ is a partial sum process. Namely, $S_n = \sum_{i=1}^{n} X_i$, where the $X_i$, $i \geq 1$, are identically distributed, possibly dependent random variables. We will be making the following assumption.

Assumption A:

1) The limit

$$\Lambda(\theta) \triangleq \lim_{n \to \infty} \Lambda_n(\theta) = \lim_{n \to \infty} \frac{1}{n} \log E\big[e^{\theta S_n}\big]$$

exists for all $\theta$, where $\pm\infty$ are allowed both as elements and as limit points of the sequence $\{\Lambda_n(\theta)\}$.

2) The origin is in the interior of the domain $D_{\Lambda} \triangleq \{\theta \mid \Lambda(\theta) < \infty\}$ of $\Lambda(\cdot)$.

3) $\Lambda(\cdot)$ is differentiable in the interior of $D_{\Lambda}$ and the derivative tends to infinity as $\theta$ approaches the boundary of $D_{\Lambda}$.

4) $\Lambda(\cdot)$ is lower semicontinuous, i.e., $\liminf_{\theta_n \to \theta} \Lambda(\theta_n) \geq \Lambda(\theta)$ for all $\theta$.
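As an illustration of part 1) of Assumption A, the convergence of $\Lambda_n(\theta)$ can be checked numerically. The sketch below uses a hypothetical two-state Markov-modulated partial sum process (all numbers are invented for illustration); the limit is the log of the largest eigenvalue of a suitably "twisted" transition matrix, a fact developed formally in Section III-A.

```python
import numpy as np

# Hypothetical two-state Markov-modulated process: state 1 contributes
# g = 0 per slot, state 2 contributes g = 2.
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])
g = np.array([0.0, 2.0])
theta = 0.5
pi0 = np.array([0.5, 0.5])          # initial distribution

# "Twisted" matrix with elements P[i, j] * exp(theta * g[j]).
A = P * np.exp(theta * g)[None, :]

def Lambda_n(n):
    """(1/n) log E[exp(theta * S_n)] with S_n = g(Y_1) + ... + g(Y_n),
    computed exactly by iterating the twisted matrix (with rescaling
    to avoid floating-point overflow)."""
    w = np.ones(2)
    log_scale = 0.0
    for _ in range(n - 1):
        w = A @ w
        s = w.sum()
        w /= s
        log_scale += np.log(s)
    return (log_scale + np.log(pi0 @ (np.exp(theta * g) * w))) / n

# The limit Lambda(theta) is the log of the largest eigenvalue of A.
Lambda_limit = np.log(np.max(np.abs(np.linalg.eigvals(A))))
```

Running this shows $\Lambda_n(\theta)$ approaching `Lambda_limit` at rate $O(1/n)$, consistent with the limit in part 1) existing for this class of processes.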


Let us define

$$\Lambda^*(a) \triangleq \sup_{\theta}\,\big(\theta a - \Lambda(\theta)\big) \tag{1}$$

which is the Legendre transform of $\Lambda(\cdot)$. $\Lambda(\cdot)$ and $\Lambda^*(\cdot)$ are convex duals (see [21]); namely, along with (1), it also holds that

$$\Lambda(\theta) = \sup_{a}\,\big(\theta a - \Lambda^*(a)\big). \tag{2}$$

The function $\Lambda^*(\cdot)$ is convex and lower semicontinuous (see [10]). Cramér’s theorem has been extended to cover autocorrelated processes. In particular, under Assumption A, the Gärtner–Ellis theorem (see [10]) establishes that $S_n/n$ satisfies a large deviations principle (LDP) with rate function $\Lambda^*(\cdot)$. More specifically, this theorem intuitively asserts that for large enough $n$ and for small $\epsilon > 0$

$$\mathbf{P}\big[S_n/n \in (a - \epsilon,\, a + \epsilon)\big] \sim e^{-n \Lambda^*(a)}.$$

The set of processes satisfying Assumption A (and hence the Gärtner–Ellis theorem) is large enough to include renewal, Markov-modulated, and stationary processes with mild mixing conditions. Such processes can model “burstiness” and are commonly used in modeling the input traffic to communication networks. They have recently been used in modeling demand and the production process in manufacturing systems [7], [28], [9].

For discrete random variables, a stronger result than Cramér’s theorem is Sanov’s theorem, which deals with large deviations of the empirical measure. Consider a sequence $\{X_i;\ i \geq 1\}$ of i.i.d. random variables taking values from a finite alphabet $\Sigma = \{a_1, \ldots, a_N\}$ with $N$ elements. Let $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_N)$ denote the probability law for the $X_i$, where $\boldsymbol{\mu}$ is an element of the standard $N$-dimensional probability simplex $\Delta_N$ (i.e., $\sum_{i=1}^{N} \mu_i = 1$, $\mu_i \geq 0$). The empirical measure of the sequence $X_1, \ldots, X_n$ is given by $\mathbf{L}^n = (L_1^n, \ldots, L_N^n)$, where

$$L_i^n \triangleq \frac{1}{n} \sum_{j=1}^{n} \mathbf{1}\{X_j = a_i\}, \qquad i = 1, \ldots, N$$

and where $\mathbf{1}\{X_j = a_i\}$ is the indicator function of $X_j$ being equal to $a_i$. Thus, $L_i^n$ is equal to the fraction of occurrences of $a_i$ in the sequence. Sanov’s theorem (see [10]) establishes that $\mathbf{L}^n$ satisfies an LDP with rate function given by the relative entropy

$$H(\boldsymbol{\nu} \mid \boldsymbol{\mu}) \triangleq \sum_{i=1}^{N} \nu_i \log \frac{\nu_i}{\mu_i}. \tag{3}$$

In particular, we have the following.

Theorem II.1 (Sanov’s): For every set $\Gamma$ of probability vectors in $\Delta_N$

$$-\inf_{\boldsymbol{\nu} \in \Gamma^{\circ}} H(\boldsymbol{\nu} \mid \boldsymbol{\mu}) \leq \liminf_{n \to \infty} \frac{1}{n} \log \mathbf{P}[\mathbf{L}^n \in \Gamma] \leq \limsup_{n \to \infty} \frac{1}{n} \log \mathbf{P}[\mathbf{L}^n \in \Gamma] \leq -\inf_{\boldsymbol{\nu} \in \Gamma} H(\boldsymbol{\nu} \mid \boldsymbol{\mu})$$

where $\Gamma^{\circ}$ denotes the interior of $\Gamma$.
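The exponent in Sanov’s theorem can be checked directly in a small case. The sketch below (with invented numbers) compares the exact probability that the empirical measure of $n$ fair coin flips equals a fixed biased vector against the relative-entropy prediction $e^{-n H(\boldsymbol{\nu} \mid \boldsymbol{\mu})}$; the two exponents agree up to a Stirling correction of order $(\log n)/n$.

```python
import math

def kl(nu, mu):
    """Relative entropy H(nu | mu) in nats, cf. eq. (3)."""
    return sum(a * math.log(a / b) for a, b in zip(nu, mu) if a > 0)

def log_prob_empirical(n, counts, mu):
    """Exact log-probability that the empirical measure equals counts/n,
    via the multinomial distribution (lgamma avoids overflow)."""
    lp = math.lgamma(n + 1)
    for c, m in zip(counts, mu):
        lp += -math.lgamma(c + 1) + c * math.log(m)
    return lp

mu = (0.5, 0.5)          # true law: fair coin
nu = (0.7, 0.3)          # atypical empirical measure
n = 2000
counts = (1400, 600)     # counts consistent with nu

# Empirical decay rate -(1/n) log P[L^n = nu] vs. the Sanov rate H(nu|mu).
rate = -log_prob_empirical(n, counts, mu) / n
```

With these numbers, `rate` comes out close to `kl(nu, mu)` (about 0.082 nats), illustrating the theorem’s exponent.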

Intuitively, the empirical measure is “close” to $\boldsymbol{\nu}$ with probability behaving as $e^{-n H(\boldsymbol{\nu} \mid \boldsymbol{\mu})}$. Notice that this probability is diminishing as $\boldsymbol{\nu}$ is “further away” from the a priori measure $\boldsymbol{\mu}$, since the relative entropy can be interpreted as a distance (Kullback–Leibler distance, see [22]). (The relative entropy is nonnegative and equal to zero if and only if $\boldsymbol{\nu} = \boldsymbol{\mu}$. However, it is not a true metric since it is not symmetric and does not satisfy the triangle inequality.)

On a notational remark, in the rest of the paper we will be denoting by $\Lambda_X(\cdot)$ and $\Lambda_X^*(\cdot)$ the limiting log-moment-generating function and the large-deviations rate function, respectively, of a process $\{X_t\}$. Moreover, vectors will be denoted by bold characters and will be assumed to be column vectors, unless otherwise specified.

III. LARGE DEVIATIONS OF A G/G/1 QUEUE AND PROBLEM FORMULATION

In this section, we review work on the large deviations behavior of a G/G/1 queue under a Markov-modulated arrival process. In particular, we state an asymptotic expression for the probability that the queue length process exceeds a certain threshold $U$. This probability is the quantity we wish to estimate from measurements.

Consider a single class queue. We will be using a discrete-time model, where time is divided into time slots. We let $A_t$ denote the aggregate number of customers that enter the queue at time $t$. The queue has an infinite buffer and is serviced according to a general service process which can clear up to $B_t$ customers during the time interval $[t, t+1)$. We assume that the stochastic processes $\{A_t\}$ and $\{B_t\}$ are stationary, possibly autocorrelated, and mutually independent processes that satisfy Assumption A. We denote by $L_t$ the queue length at time $t$ (without counting arrivals at time $t$). We assume that the server uses a work-conserving policy (i.e., the server never stays idle when there is work in the system) and that

$$E[A_0] < E[B_0] \tag{4}$$

which by stationarity carries over to all $t$. We also assume that the queue length process $\{L_t\}$ is stationary. To simplify the analysis, we consider a discrete-time “fluid” model, meaning that we will be treating $A_t$, $B_t$, and $L_t$ as nonnegative real numbers (the amount of fluid entering, in queue, or served). An LDP for the queue length process has been established in [23], [11] and is given in the next proposition. In preparation for the result, consider a convex function $f(\cdot)$ with the property $f(0) = 0$. We define the largest root of $f(\theta) = 0$ to be the solution of


the optimization problem $\sup\{\theta \geq 0 \mid f(\theta) \leq 0\}$. If $f(\cdot)$ has a negative derivative at $\theta = 0$, there are two cases: either $f(\cdot)$ has a single positive root $\theta^*$, or it stays below the horizontal axis, i.e., $f(\theta) < 0$ for all $\theta > 0$. In the latter case, we will say that $f(\cdot)$ has a root at $\theta^* = \infty$.

Proposition III.1: The steady-state queue length process $L$ satisfies

$$\lim_{U \to \infty} \frac{1}{U} \log \mathbf{P}[L \geq U] = -\theta^* \tag{5}$$

where $\theta^*$ is the largest root of the equation

$$\Lambda_A(\theta) + \Lambda_B(-\theta) = 0. \tag{6}$$

More intuitively, for large enough $U$ we have

$$\mathbf{P}[L \geq U] \sim e^{-\theta^* U}.$$

This expression can be used to estimate the overflow probability in a queue with a finite buffer of size $U$. Kelly [5] establishes that the latter probability has the same asymptotic decay rate (same exponent) as $\mathbf{P}[L \geq U]$.

A. Markov-Modulated Arrivals

Assume next that the arrival process is Markov-modulated. More specifically, consider an irreducible Markov chain with $M$ states and transition probability matrix $\mathbf{P} = [p_{ij}]_{i,j=1,\ldots,M}$. We will be using the notation $\mathbf{p} = (\mathbf{p}_1', \ldots, \mathbf{p}_M')'$, where $\mathbf{p}_i$ is the $i$th row of $\mathbf{P}$ and prime denotes transpose. The Markov chain makes one transition per time slot; let $Y_t$ be the state at time $t$. The number of arrivals at time $t$ is a deterministic function of the state, i.e., $A_t = g(Y_t)$. In [10, Sec. 3.1.1] it is established that the limiting log-moment-generating function of the arrival process is given by

$$\Lambda_A(\theta) = \log \rho(\mathbf{P}_{\theta}) \tag{7}$$

where $\rho(\mathbf{P}_{\theta})$ denotes the Perron–Frobenius eigenvalue of the matrix $\mathbf{P}_{\theta}$ which has elements

$$p_{\theta}(i, j) \triangleq p_{ij}\, e^{\theta g(j)}, \qquad i, j = 1, \ldots, M. \tag{8}$$

(In this Markov-modulated case, we are using notation that explicitly denotes the dependence of $\mathbf{P}_{\theta}$ and $\Lambda_A(\cdot)$ on the transition probabilities.) Notice that because the quantities $e^{\theta g(j)}$ are always positive, the irreducibility of $\mathbf{P}$ implies that $\mathbf{P}_{\theta}$ is irreducible. This calculation of $\Lambda_A(\cdot)$ can be easily extended to the case where $A_t$ is a random function of the state $Y_t$ (see [10]).
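Equations (7) and (8), combined with Proposition III.1, give a concrete recipe: build the matrix with elements $p_{ij} e^{\theta g(j)}$, take the log of its Perron–Frobenius eigenvalue, and find the largest root of (6). The sketch below works through a hypothetical two-state example with deterministic service rate $c$ per slot (so that $\Lambda_B(-\theta) = -c\theta$); all numbers are illustrative, not from the paper.

```python
import numpy as np

# Hypothetical two-state Markov-modulated source: state 1 emits 0 units
# of fluid per slot, state 2 emits 2; deterministic service of c per slot.
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])
g = np.array([0.0, 2.0])
c = 1.0

def Lambda_A(theta):
    """Eq. (7): log Perron-Frobenius eigenvalue of the matrix in eq. (8)."""
    A = P * np.exp(theta * g)[None, :]
    return np.log(np.max(np.abs(np.linalg.eigvals(A))))

def f(theta):
    """Eq. (6) with deterministic service: Lambda_A(theta) - c*theta."""
    return Lambda_A(theta) - c * theta

# Largest root theta* of f via bisection (f < 0 just above 0, f > 0 for
# large theta since the peak rate 2 exceeds c).
lo, hi = 1e-6, 10.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
theta_star = 0.5 * (lo + hi)

# Proposition III.1 approximation of the overflow probability.
U = 50.0
p_overflow = np.exp(-theta_star * U)
```

For these numbers the root lands near $\theta^* \approx 0.25$, giving an overflow estimate on the order of $10^{-6}$ for a buffer of $U = 50$.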

B. Estimating the Overflow Probability

We are interested in estimating the overflow probability $\mathbf{P}[L \geq U]$ from measurements. In particular, we will be assuming that we have perfect knowledge of the service process $\{B_t\}$, and that we can observe the states of the Markov chain associated with the arrival process. That is, we do know the function $g(\cdot)$, but the transition probability matrix $\mathbf{P}$ is unknown.

One way of estimating the overflow probability $\mathbf{P}[L \geq U]$ is to form an estimate $\hat{\mathbf{P}}$ of the transition probability matrix, obtain the Perron–Frobenius eigenvalue of the corresponding matrix $\hat{\mathbf{P}}_{\theta}$, and use it in Proposition III.1 to obtain an estimate of the decay rate. This procedure yields an estimate of the overflow probability. A problem with this approach is that the resulting estimate is not necessarily unbiased, even if $\hat{\mathbf{P}}$ is an unbiased estimate of $\mathbf{P}$, since it is a nonlinear function of the transition probabilities in $\hat{\mathbf{P}}$. That is, a certainty equivalence approach leads to an erroneous estimate. Our objective in this paper is to develop alternative and more appropriate estimates of the overflow probability.

IV. LARGE-DEVIATIONS ANALYSIS OF THE ESTIMATION PROCESS

We start the analysis by studying the large-deviations behavior of the estimation for a Markov-modulated process. In particular, we determine the large-deviations rate function of the a posteriori measure for the underlying Markov chain, given the observed empirical measure.

In the simpler case where we are observing the realization of a sequence of i.i.d. random variables, [18] provides the result. In particular, let $\{X_i;\ i \geq 1\}$ denote a sequence of i.i.d. random variables taking values from a finite alphabet $\Sigma$, with probability law $\boldsymbol{\mu}$. The probability law is not known; we have instead a prior distribution $\pi(\cdot)$ on $\Delta_N$. Given observations $X_1, \ldots, X_n$, the posterior distribution is a function of the empirical measure $\mathbf{L}^n$; we will be denoting by $\pi_n(\cdot \mid \mathbf{L}^n)$ the posterior distribution and by $\boldsymbol{\nu}^n$ the corresponding random variable. Assume that, as $n \to \infty$, $\mathbf{L}^n$ converges weakly to some $\mathbf{q}$ in the support of $\pi(\cdot)$. In [18], it is established that for any prior $\pi(\cdot)$, $\boldsymbol{\nu}^n$ satisfies an LDP with rate function

$$J(\boldsymbol{\nu}) = \begin{cases} H(\mathbf{q} \mid \boldsymbol{\nu}) & \text{if } \boldsymbol{\nu} \text{ is in the support of } \pi(\cdot) \\ \infty & \text{otherwise.} \end{cases}$$

This result can be interpreted as an “inverse” of Sanov’s theorem. Intuitively, given observations, $\boldsymbol{\nu}^n$ is “close” to $\boldsymbol{\nu}$ with probability behaving as $e^{-n H(\mathbf{q} \mid \boldsymbol{\nu})}$. Recall that Sanov’s theorem is dealing with deviations of the empirical measure from its typical value and considers the law as given. We will develop an equivalent of this result in the Markov case.

Before we proceed with this agenda, it is instructive to state an equivalent of Sanov’s theorem in the Markov case. Consider an irreducible Markov chain with $M$ states and transition probability matrix $\mathbf{P} = [p_{ij}]$. Let $\mathbf{p}$ denote the vector consisting of the rows of $\mathbf{P}$. Let $Y_0, Y_1, \ldots, Y_n$ denote a sequence of states that the Markov chain visits, with the initial state being $Y_0 = y_0$, and consider the empirical measures

$$Q^n(i, j) \triangleq \frac{1}{n} \sum_{t=1}^{n} \mathbf{1}\{Y_{t-1} = i,\ Y_t = j\}, \qquad i, j = 1, \ldots, M.$$

Note that when $n$ is large, the empirical measure $Q^n(i, j)$ denotes the fraction of times that the Markov chain makes transitions from $i$ to $j$ in the sequence $Y_0, \ldots, Y_n$. Let now $E$ denote the set of pairs of states $(i, j)$ that can appear in the sequence


and denote by $\Delta_{|E|}$ the standard $|E|$-dimensional probability simplex, where $|E|$ denotes the cardinality of $E$. Note that the vector of $Q^n(i,j)$’s, denoted by $\mathbf{Q}^n$, is an element of $\Delta_{|E|}$. For any $\mathbf{q} \in \Delta_{|E|}$, let

$$q_1(i) \triangleq \sum_{j} q(i, j) \quad \text{and} \quad q_2(j) \triangleq \sum_{i} q(i, j) \tag{9}$$

be its marginals. Whenever $q_1(i) > 0$ for all $i$, we will be using the notation $q(j \mid i) \triangleq q(i,j)/q_1(i)$. We will say that a probability measure $\mathbf{q}$ is shift-invariant if both its marginals are identical, i.e., $q_1(i) = q_2(i)$ for all $i$. An LDP for $\mathbf{Q}^n$ is established in the next theorem and is proven in [10, Sec. 3.1.3].

Theorem IV.1 ([10]): For every $\mathbf{q} \in \Delta_{|E|}$, let

$$I(\mathbf{q}) \triangleq \begin{cases} \displaystyle\sum_{(i,j) \in E} q(i,j) \log \frac{q(j \mid i)}{p_{ij}} & \text{if } \mathbf{q} \text{ is shift-invariant} \\ \infty & \text{otherwise} \end{cases} \tag{10}$$

where the sum is of the relative entropy form defined in (3) [cf. (9)]. Then, for any set $\Gamma$ of probability vectors in $\Delta_{|E|}$

$$-\inf_{\mathbf{q} \in \Gamma^{\circ}} I(\mathbf{q}) \leq \liminf_{n \to \infty} \frac{1}{n} \log \mathbf{P}[\mathbf{Q}^n \in \Gamma] \leq \limsup_{n \to \infty} \frac{1}{n} \log \mathbf{P}[\mathbf{Q}^n \in \Gamma] \leq -\inf_{\mathbf{q} \in \Gamma} I(\mathbf{q})$$

where $\Gamma^{\circ}$ denotes the interior of $\Gamma$.
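The rate function in Theorem IV.1 is easy to evaluate numerically. The sketch below (hypothetical two-state chain, invented numbers) simulates a long state sequence, forms the pair empirical measure $\mathbf{Q}^n$, and evaluates the rate (10); at the true transition matrix the rate is essentially zero, and it grows when evaluated against a perturbed matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])   # hypothetical two-state chain

# Simulate the chain and form the pair empirical measure Q(i, j).
n = 100_000
Q = np.zeros((2, 2))
y = 0
for _ in range(n):
    y_next = 0 if rng.random() < P[y, 0] else 1
    Q[y, y_next] += 1.0
    y = y_next
Q /= n

def rate(Q, P):
    """I(Q) = sum_{ij} Q(i,j) log(Q(j|i) / P(i,j)), cf. eq. (10)."""
    Qi = Q.sum(axis=1, keepdims=True)   # marginal q_1, cf. eq. (9)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(Q > 0, Q * np.log(Q / (Qi * P)), 0.0)
    return float(terms.sum())
```

Note that `Q` is nearly shift-invariant by construction: its two marginals can differ by at most $1/n$ (the boundary effect of the first and last state).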

We next develop the main result of this section, which can be viewed as the “inverse” of Theorem IV.1. In particular, we observe a certain sequence of states $Y_0, \ldots, Y_n$ in the Markov chain with the initial state being $Y_0 = y_0$. Let again $\mathbf{P}$ be the transition probability matrix of the Markov chain and $\mathbf{p}$ the vector of transition probabilities. Note that $\mathbf{p}$ is an element of $\Delta_M^M$ (i.e., the $M$-times Cartesian product of $\Delta_M$). We assume that the transition probabilities are not known; instead, we have a prior $\pi(\cdot)$, which assigns probability mass only to $\mathbf{v}$’s corresponding to irreducible Markov chains. Let $\operatorname{supp}(\pi)$ denote the support of $\pi(\cdot)$. Given the sequence $Y_0, \ldots, Y_n$, the posterior distribution is a function of the empirical measure $\mathbf{Q}^n$; we will write $\pi_n(\cdot \mid \mathbf{Q}^n)$ for the posterior distribution and $\mathbf{V}^n$ for the corresponding random variable.

Before we proceed with the main result, we establish the following lemma. We will be using the notation $\mathbf{P}_{y_0, \mathbf{v}}[\cdot]$ to denote probabilities when the initial state is $y_0$ and the transition probabilities are equal to $\mathbf{v}$.

Lemma IV.2: Suppose that, as $n \to \infty$, the empirical measure $\mathbf{Q}^n$ converges weakly to some shift-invariant $\mathbf{q}$ which satisfies $q_1(i) > 0$ for all $i$. Suppose, also, that $v_{ij} = 0$ implies $q(i,j) = 0$ for all $(i,j)$. Then, for any given $\mathbf{v}$

$$\lim_{n \to \infty} \frac{1}{n} \log \mathbf{P}_{y_0, \mathbf{v}}[\mathbf{Q}^n = \mathbf{q}^n] = -\sum_{(i,j) \in E} q(i,j) \log \frac{q(j \mid i)}{v_{ij}} \tag{11}$$

where $\mathbf{q}^n$ denotes the observed empirical measure.

Proof: We have

$$\mathbf{P}_{y_0, \mathbf{v}}[\mathbf{Q}^n = \mathbf{q}^n] = |T_n(\mathbf{q}^n)| \prod_{(i,j) \in E} v_{ij}^{\,n q^n(i,j)}$$

where $|T_n(\mathbf{q}^n)|$ denotes the size of the type class of $\mathbf{q}^n$, i.e., the number of sequences $Y_0, \ldots, Y_n$ for which $\mathbf{Q}^n = \mathbf{q}^n$. For the equality above, note that given that the Markov chain is in state $i$, transitions “out of $i$” are i.i.d. with probability $v_{ij}$ for a transition to state $j$; furthermore, by the definition of the empirical measure, $n q^n(i,j)$ is equal to the number of transitions from $i$ to $j$ in the sequence $Y_0, \ldots, Y_n$. Letting $H(q(\cdot \mid i))$ denote the entropy of the probability vector $q(\cdot \mid i)$, it can be shown that (see [10, Exercise 3.1.21])

$$\lim_{n \to \infty} \frac{1}{n} \log |T_n(\mathbf{q}^n)| = \sum_{i} q_1(i)\, H\big(q(\cdot \mid i)\big). \tag{12}$$

Therefore, taking logarithms in the product form above, dividing by $n$, and using (12), we obtain

$$\lim_{n \to \infty} \frac{1}{n} \log \mathbf{P}_{y_0, \mathbf{v}}[\mathbf{Q}^n = \mathbf{q}^n] = \sum_{i} q_1(i)\, H\big(q(\cdot \mid i)\big) + \sum_{(i,j) \in E} q(i,j) \log v_{ij} \tag{13}$$

which is precisely the limit in (11). This establishes the desired result.


The main result of this section is the following theorem.

Theorem IV.3: Suppose that, as $n \to \infty$, the empirical measure $\mathbf{Q}^n$ converges weakly to some shift-invariant $\mathbf{q}$ which satisfies $q_1(i) > 0$ for all $i$. Suppose, also, that $v_{ij} = 0$ implies $q(i,j) = 0$ for all $\mathbf{v} \in \operatorname{supp}(\pi)$. Then, for any prior $\pi(\cdot)$ and for every set $B$ in $\Delta_M^M$, we have

$$-\inf_{\mathbf{v} \in B^{\circ}} J(\mathbf{v}) \leq \liminf_{n \to \infty} \frac{1}{n} \log \pi_n(B \mid \mathbf{Q}^n) \leq \limsup_{n \to \infty} \frac{1}{n} \log \pi_n(B \mid \mathbf{Q}^n) \leq -\inf_{\mathbf{v} \in B} J(\mathbf{v}) \tag{17}$$

where $B^{\circ}$ denotes the interior of $B$ and the rate function is given by

$$J(\mathbf{v}) = \begin{cases} \displaystyle\sum_{(i,j) \in E} q(i,j) \log \frac{q(j \mid i)}{v_{ij}} & \text{if } \mathbf{v} \in \operatorname{supp}(\pi) \\ \infty & \text{otherwise.} \end{cases} \tag{18}$$

Proof: Conditional on the empirical measure being equal to $\mathbf{q}^n$, and using Bayes’ theorem, we have

$$\pi_n(B \mid \mathbf{q}^n) = \frac{\displaystyle\int_{B} \mathbf{P}_{y_0, \mathbf{v}}[\mathbf{Q}^n = \mathbf{q}^n] \, d\pi(\mathbf{v})}{\displaystyle\int_{\operatorname{supp}(\pi)} \mathbf{P}_{y_0, \mathbf{v}}[\mathbf{Q}^n = \mathbf{q}^n] \, d\pi(\mathbf{v})}. \tag{19}$$

We assume here that the integrand is well-defined, that is, its denominator is not equal to zero. (For example, it is equal to zero when $n$ is large, i.e., $\mathbf{q}^n$ is close to $\mathbf{q}$, and the transition probabilities induced by $\mathbf{q}$ do not belong to $\operatorname{supp}(\pi)$.) If this is the case, then the integrand is equal to the Radon–Nikodym derivative evaluated at $\mathbf{q}^n$.

To obtain an upper bound on $\pi_n(B \mid \mathbf{q}^n)$, consider first the numerator on the right-hand side of (19). Using (12), (11), and the fact that the relative entropy is nonnegative, we obtain an upper bound on the numerator (20). Here, we are assuming that $\inf_{\mathbf{v} \in B} J(\mathbf{v}) < \infty$; otherwise the bound still holds. Taking the limit as $n \to \infty$, and since the prior distribution should integrate to one in $\Delta_M^M$, the denominator is handled by the result of Lemma IV.2. Since, as $n \to \infty$, $\mathbf{q}^n$ converges to $\mathbf{q}$, we combine (19), (20), and the result of Lemma IV.2 to conclude that (21) holds, which establishes the upper bound in (17).

We are left with proving a matching lower bound. Next, note that if $B^{\circ} \cap \operatorname{supp}(\pi)$ is empty, then by the definition of $J(\cdot)$ we have $\inf_{\mathbf{v} \in B^{\circ}} J(\mathbf{v}) = \infty$ and the lower bound in (17) holds trivially. Otherwise, for any $\mathbf{v} \in B^{\circ} \cap \operatorname{supp}(\pi)$ there exists $\epsilon$ sufficiently small such that the “$\epsilon$-neighborhood” of $\mathbf{v}$, namely the set of $\mathbf{v}'$ with $|v_{ij}' - v_{ij}| \leq \epsilon$ for all $(i,j)$ (14), is contained in $B$. Restricting the integral in the numerator of (19) to this neighborhood, we obtain a lower bound (15), where we have used (11) in the last inequality. The first term on the right-hand side of (15) is zero, since $\mathbf{v} \in \operatorname{supp}(\pi)$ implies that the integral over the neighborhood is strictly positive. For the second term, it can be shown (see [10, eq. 3.1.22]) that it vanishes as $n \to \infty$ (16). The third term on the right-hand side of (15) equals the quantity in (20). Next, note that the fourth term on the right-hand side of (15) is not infinity, since $\mathbf{v} \in \operatorname{supp}(\pi)$ implies $v_{ij} = 0 \Rightarrow q(i,j) = 0$ for all $(i,j)$; furthermore, it can be made negligible by taking $\epsilon \to 0$. Combining these observations with the result of Lemma IV.2 and (19), we obtain a lower bound on the exponent of the numerator (22)–(23). Recall now that the inequality above holds for all $\mathbf{v} \in B^{\circ} \cap \operatorname{supp}(\pi)$. Optimizing over such $\mathbf{v}$ to obtain the tightest bound, we have (24), which establishes the lower bound in (17).

V. ESTIMATES OF THE OVERFLOW PROBABILITY

In this section we develop estimates of the overflow probability in a G/G/1 queue and discuss confidence measures. Consider the model of the G/G/1 queue, introduced in Section III, under a Markov-modulated arrival process. We assume that we have perfect knowledge of the service process $\{B_t\}$, of the number of states $M$ that characterize the arrival process, and of the function $g(\cdot)$ that maps Markov states to the amount of arriving fluid. The transition probability matrix of the underlying Markov chain of the arrival process is unknown. Suppose that we observe a sequence $Y_0, \ldots, Y_n$ of states that the Markov chain visits, with the initial state being $Y_0 = y_0$. With the notation of Section IV, and letting the empirical measure $\mathbf{Q}^n$ be equal to $\mathbf{q}^n$, a maximum-likelihood estimator of the transition probabilities is given by

$$\hat{v}_{ij} = \frac{q^n(i,j)}{q_1^n(i)} \tag{25}$$

where we assume that $n$ is large enough to have $q_1^n(i) > 0$ for all $i$. (To avoid overburdening the notation, in (25) and the sequel we suppress the dependence of the estimators on the sequence $Y_0, \ldots, Y_n$.) Let $\hat{\mathbf{v}}$ denote the vector of these estimates. We can now construct a matrix $\hat{\mathbf{P}}_{\theta}$ with elements $\hat{v}_{ij}\, e^{\theta g(j)}$ [cf. (8)] and obtain an estimate $\hat{\Lambda}_A(\theta)$ of the limiting log-moment-generating function for the arrival process by computing the Perron–Frobenius eigenvalue of $\hat{\mathbf{P}}_{\theta}$. Applying the result of Proposition III.1, for large values of $U$ we obtain the following estimate of the overflow probability:

$$e^{-\theta^*(\hat{\mathbf{v}}) U} \tag{26}$$

where $\theta^*(\hat{\mathbf{v}})$ is the largest root of the equation $\hat{\Lambda}_A(\theta) + \Lambda_B(-\theta) = 0$. If we knew the true transition probabilities, say $\mathbf{v}_0$, then by Proposition III.1 the ersatz $e^{-\theta^*(\mathbf{v}_0) U}$ approximates the overflow probability; in (26) we use the estimate $\hat{\mathbf{v}}$ in the place of the unknown $\mathbf{v}_0$. Notice, however, that, as we outlined in Section III, even though $\hat{\mathbf{v}}$ is an unbiased estimate of the transition probabilities, (26) is not necessarily unbiased. Due to the nonlinearity of $\theta^*(\cdot)$, estimation errors in $\hat{\mathbf{v}}$ can be severely amplified. To address this drawback, we next propose an alternative estimator. In particular, taking the Bayesian approach of Section IV, we consider the estimator

$$E\big[e^{-\theta^*(\mathbf{V}) U} \,\big|\, \mathbf{Q}^n = \mathbf{q}^n\big] \tag{27}$$

where the expectation is taken with respect to the posterior probability distribution of the transition probabilities. Recall that we have assumed the prior to assign probability mass only to measures corresponding to irreducible Markov chains. As a result, the posterior does so as well and $\theta^*(\mathbf{V})$ is well-defined. Calculating the expectation in (27) exactly is intractable; we will resort to asymptotics. In particular, we let $U = n b$, where $b$ is a scalar, and compute the dominant exponent in the next proposition.
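Although (27) is analytically intractable, it is straightforward to approximate by Monte Carlo in small examples, which is useful as a sanity check on the asymptotics that follow. The sketch below uses a hypothetical two-state example with a deterministic service rate and — purely as our own assumption, not the paper's — independent uniform Dirichlet priors on the rows, so that each posterior row is Dirichlet with parameters equal to the observed transition counts plus one.

```python
import numpy as np

rng = np.random.default_rng(1)
g = np.array([0.0, 2.0])   # arrivals per state (hypothetical)
c = 1.0                    # deterministic service rate
U = 50.0

def theta_star(V):
    """Largest root of Lambda_A(theta) - c*theta = 0, cf. Prop. III.1."""
    def f(th):
        A = V * np.exp(th * g)[None, :]
        return np.log(np.max(np.abs(np.linalg.eigvals(A)))) - c * th
    lo, hi = 1e-9, 20.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

# Hypothetical observed transition counts from a long state sequence.
counts = np.array([[900, 100],
                   [300, 700]])
v_hat = counts / counts.sum(axis=1, keepdims=True)  # ML estimate, eq. (25)

# Certainty-equivalence estimate, eq. (26).
p_ce = np.exp(-theta_star(v_hat) * U)

# Monte Carlo version of the Bayesian estimator (27): sample transition
# rows from the (assumed Dirichlet) posterior and average the loss measure.
samples = []
for _ in range(300):
    V = np.vstack([rng.dirichlet(row + 1) for row in counts])
    samples.append(np.exp(-theta_star(V) * U))
p_bayes = float(np.mean(samples))
```

With these counts, the averaged quote `p_bayes` typically exceeds the plug-in value `p_ce`, previewing the "estimation premium" discussed below: samples with slightly smaller decay rates dominate the expectation.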

Finally, if which implies

185

,

for all

,

Consequently, . We conclude that (28) holds in all three cases. Based on (27) and the result of Proposition V.1, for large values of we obtain the following estimator of the overflow : probability

Proposition V.1: It holds

(29) Proof: The result follows by direct application of Varadhan’s integral lemma [10, Theorem 4.3.1]. It suffices to verify that the necessary assumptions are satisfied. Indeed, the moment condition

is trivially satisfied for any which implies that

since

A few remarks on this estimator are in order. Consider a G/G/1 queue serviced by an independent copy of the service process . We draw a sample vector of transition probabiliand feed ties according to the posterior distribution the queue with an independent copy of the arrival process with transition probabilities equal to this sample vector. Let denote the queue length in this queue. For large enough we have

Moreover, as it is established in the lemma that follows this is a continuous function. proof, Lemma V.2: Assume that is the vector of transition probabilities corresponding to an irreducible Markov chain. Then, is a continuous function of . Proof: It suffices to show that (28) is the Perron–Frobenius eigenvalue of the Recall that . As such, it is real, nonnegative, and continuous in matrix and [24]. Therefore, is continuous in and as well, which implies

We next distinguish between three cases: , , and . , has a single positive If the root, which by continuity implies that for small enough perturbed equation

(30) captures estimation errors by Essentially, the estimator computing what the loss probability would have been if the arrival process were not exactly known, but characterized by some posterior probability distribution. This interpretation of is similar to the result in [15, Theorem 8]; the difference is that [15] does not consider the Markov-modulated case and estimates the limiting log-moment-generating function directly from measurements. can be decomposed An interesting observation is that be the optimal solution of into two terms. In particular, let . We the optimization problem appearing in the definition of have (31)

has a single root

as well. Furthermore,

and for all Therefore, If now which implies

Consequently,

. ,

for all

.

,

The first term in the right-hand side of the above approximates the loss probability in a G/G/1 queue with buffer size and arrival process governed by transition probabilities equal to . The second term in the right-hand side of the above approximates the probability that the posterior transition probabilities can be inof the arrival process are equal to . The vector terpreted as the most likely value of the transition probabilities of the arrival process, given the observation , that leads to an can be thought of as the “worst” overflow. In view of (30), transition probability vector in , i.e., the one leading to larger . overflow probability , To understand on a more intuitive level the estimator converges to some note that when the empirical measure

186

, that

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 47, NO. 1, JANUARY 2001

. Furthermore, we have assumed in Theorem IV.3 . Consequently,

where

is some positive scalar and

is given by [cf. (18)] if

(32)

The right-hand side of the above is the large-deviations asymptotic for the overflow probability if we believe the transition probabilities to be given by their estimated values. Instead, we quote the proposed estimator, which is larger; we can say that we pay an "estimation premium" to guard against estimation errors. Taking into account the interpretation in (30), the certainty-equivalence estimator underestimates the overflow probability, i.e., certainty equivalence fails.

If we view the estimator as an expectation of the loss measure [cf. (27)], we can also compute higher order moments. More specifically, consider the quantities

(33)

Using exactly the same reasoning as in the proof of Proposition V.1, we can compute the dominant exponent of these quantities. We can thus define estimators based on higher moments of the loss measure at hand. In particular, we define

(34)

Interestingly enough, we can also asymptotically characterize the distribution of the loss measure. The next proposition describes the result.

Proposition V.3: We have

(35)

Proof: The result follows by direct application of the contraction principle [10, Theorem 4.2.1]. In particular, the loss measure satisfies an LDP with the rate function obtained by contraction. Therefore, the asserted tail behavior follows, which is equivalent to (35).

VI. COMPUTING THE ESTIMATORS

Computing the estimators in (29) and (34) requires solving nonlinear optimization problems. In this section, we examine the structure of these problems and devise efficient algorithms for their solution. The optimization problems in (29) and (34) have the following form:

(36)

where the rate function appearing in the objective is defined in (37).

We are interested in solving relatively large instances of (36); typically, the number of states would be in the range of 5-100, which brings the number of decision variables into the range of 25-10 000. As a result, computational efficiency is critical, and special-purpose algorithms that exploit the special structure are of interest.

On the structure of the objective function, a first observation is that the first term is a convex function if the support of the prior is a convex set; this can be easily established directly from the definition of convexity. Furthermore, the rate function is strictly convex where it is finite. In Section IV, we have assumed that the prior assigns probability mass only to transition probability vectors corresponding to irreducible Markov chains. The next proposition shows that the set of such vectors is a convex set.

Proposition VI.1: The set of transition probability vectors corresponding to irreducible Markov chains is a convex set.

Proof: Consider two irreducible Markov chains with transition probability vectors p1 and p2, respectively. For some scalar a in [0, 1], form a new Markov chain with transition probabilities p = a p1 + (1 - a) p2. Clearly, for a equal to either 0 or 1, the chain p is irreducible. We will use contradiction to show that this is also the case for a in (0, 1). Assume otherwise. This implies that p has two states i and j such that j is not accessible from i. Since p1 is irreducible, there exists a path from i to j with intermediate states i1, ..., in, i.e., p1(i, i1) p1(i1, i2) ... p1(in, j) > 0. Consequently, for any a > 0, p(i, i1) p(i1, i2) ... p(in, j) >= a^(n+1) p1(i, i1) p1(i1, i2) ... p1(in, j) > 0, which contradicts our initial assumption.

Based on this proposition, and assuming that the support of the prior is a convex subset of this set, the objective is a convex function. Henceforth, and in the absence of more information on the true transition probabilities of the Markov chain we wish to estimate, we will be making the following assumption.

Assumption B: The support of the prior is the set of transition probability vectors corresponding to irreducible Markov chains.

Recall next from the statement of Theorem IV.3 that the limit of the empirical measure, which appears in the definition of the rate function [cf. (37)], is the transition probability vector of an irreducible Markov chain. Consider now the following optimization problem:

(38)


where

(39)

The next lemma establishes a property of the optimal solution.

Lemma VI.2: Let p* be an optimal solution of the optimization problem in (38). Then p* is the transition probability vector of an irreducible Markov chain.

Proof: Recall that the limit of the empirical measure is the transition probability vector of an irreducible Markov chain. Then, for any pair of states i and j of this chain, j is accessible from i. More specifically, if i1, ..., in are the intermediate states, it holds that the transition probabilities along the path i, i1, ..., in, j are positive. This implies that the corresponding entries of p* are positive as well; otherwise, the objective would be infinite. Thus, the Markov chain with transition probabilities p* is irreducible.
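Both Proposition VI.1 and this lemma turn on accessibility between states along transitions of positive probability. The convexity claim can be checked numerically with a plain reachability test; a minimal sketch, in which the two example chains are invented for illustration:

```python
def is_irreducible(P):
    """Irreducibility check: from every state, every other state is
    reachable through transitions of positive probability (depth-first
    search on the directed graph induced by P)."""
    n = len(P)
    for s in range(n):
        seen, stack = {s}, [s]
        while stack:
            i = stack.pop()
            for j in range(n):
                if P[i][j] > 0 and j not in seen:
                    seen.add(j)
                    stack.append(j)
        if len(seen) < n:
            return False
    return True

def mix(P1, P2, a):
    """Entrywise convex combination a*P1 + (1 - a)*P2 of two chains."""
    n = len(P1)
    return [[a * P1[i][j] + (1 - a) * P2[i][j] for j in range(n)]
            for i in range(n)]

# Two invented irreducible two-state chains.
P1 = [[0.9, 0.1], [0.2, 0.8]]
P2 = [[0.5, 0.5], [0.5, 0.5]]
# Every convex combination remains irreducible, as the proposition asserts.
assert all(is_irreducible(mix(P1, P2, a)) for a in (0.0, 0.25, 0.5, 0.75, 1.0))
# A reducible chain is correctly flagged.
assert not is_irreducible([[1.0, 0.0], [0.0, 1.0]])
```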

The result of this lemma suggests that under Assumption B the optimization problem in (38) is equivalent to the problem in (36). Hence, we will focus on solving (38).

A very important issue for any nonlinear programming problem is the form of the objective function. A convex objective function, especially under polyhedral constraints (as in (38)), can lead to more efficient algorithms. Let F denote the objective function in (38). Notice that it is the weighted sum of a strictly convex function and a function which is not necessarily convex [see Fig. 1 for a case where the latter is not convex]. Consequently, F is not convex in general. Nevertheless, since the first term is strictly convex, it can be seen that F will be convex for small enough values of the weight s. Recalling that s shrinks as the number of observations grows, and assuming that the buffer size is given, we will be dealing with a convex objective function if we can afford a large number of measurements.

A. A Heuristic Algorithm

To solve the problem in (38) we have developed a heuristic algorithm which performs very well in practice, typically giving first-decimal-digit approximations to the optimal value by the third iteration and third-decimal-digit approximations by the fourth. To describe the heuristic, note that the constraint set is the Cartesian product of simplices, one per row of the transition probability matrix. This feasible set implies the following optimality conditions [25, pp. 178-179]: there exist scalars, one per simplex,

satisfying

(40)

The heuristic algorithm iterates as follows.

1) Initialize with a feasible vector.

2) Let p^k be the outcome of the kth iteration. Stop if the change in the objective over the previous iterate falls below the desired accuracy.

Fig. 1. We consider an example where k = 1 and the arrival process is the superposition of 36 two-state Markov-modulated processes with (f(1), f(2)) = (0.042, 0.077). This aggregate process is fed into a buffer with capacity c = 50/24. The parameter s was set equal to 0.002. In (a) we plot the nonconvex term of the objective and in (b) the full objective, versus the two (arbitrarily chosen as independent) decision variables p(1, 1) and p(2, 2). It can be seen that although the former is not convex (it is convex only along some directions), the objective is convex.

3) Form an approximation of the objective around the current iterate:

(41)

Minimize the expression in (41) subject to the constraints of problem (38) to obtain an optimal solution. (This minimization can be done by solving the optimality conditions in (40). More specifically, the system of equations in (40) reduces to a scalar nonlinear equation which can be solved numerically.)

4) Set the new iterate to this solution and return to Step 2.
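Step 3 minimizes over the same feasible set as (38): a Cartesian product of probability simplices. For a linear objective such a problem decomposes row by row and is solved at a vertex, which is also why the conditional-gradient subproblem discussed later in this section is easy. A hedged sketch, with invented gradient values:

```python
def argmin_over_simplices(grad):
    """Minimize sum_ij grad[i][j] * x[i][j] over x whose rows lie on
    probability simplices: the optimum puts all of row i's mass on the
    column with the smallest gradient entry."""
    x = [[0.0] * len(row) for row in grad]
    for i, row in enumerate(grad):
        j_star = min(range(len(row)), key=lambda j: row[j])
        x[i][j_star] = 1.0
    return x

# Invented gradient of the objective at the current iterate.
g = [[0.2, -0.5, 0.1],
     [0.0, 0.3, -0.1]]
d = argmin_over_simplices(g)
assert d == [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

When only the simplex constraints are present this vertex rule replaces a general-purpose LP solver; with additional constraints a solver such as the one mentioned later in this section is the safer choice.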

The intuition behind this algorithm is that as we get closer to the real minimum of problem (38), the approximations in (41) improve and yield iterates that are closer to it. Because at each step we solve the approximate problem exactly, the algorithm typically needs far fewer iterations than a standard gradient-based algorithm.

Although this heuristic performs well in practice, it does not guarantee convergence. This is the case because the approximations are good in a small region around the expansion point but may be well off away from that point. This might lead to unpleasant situations where an iteration fails to improve the objective. An algorithm that does guarantee convergence to a stationary point (i.e., a local minimum) is the conditional gradient method (see [25, Sec. 2.2]). This is a feasible direction method that iterates as follows:

(42)

where the step size is suitably chosen and the search direction is the optimal solution of the subproblem

(43)

Notice that this subproblem is a linear programming problem and, thus, can be solved very fast with sophisticated solvers (such as CPLEX). To obtain the excellent performance of the heuristic in practice and still be able to guarantee convergence, one can implement a hybrid algorithm that uses the heuristic initially (when far away from the local minimum) and switches to the conditional gradient method when the iterate approaches a local minimum.

Both algorithms examined above require the gradient of the objective. The gradient of the first term can be easily obtained; the calculation of the gradient of the second is a bit more involved and is given in the Appendix. Concluding this section, we note that the optimization problem in (35) can be handled similarly. That is, we can dualize the nonlinear constraint and apply the method of multipliers [25, Sec. 4.2]. As a result, at each iteration we will be dealing with a problem identical to (38), which makes the discussion above applicable.

VII. COMPARISONS AND NUMERICAL RESULTS

In this section, we compare the various estimators discussed so far and present some illustrative numerical results. We consider a queue with a certain Markov-modulated arrival process with a given transition probability vector. As in Section III, we use the same notation for the queue length and the buffer size. The result of Proposition III.1 suggests approximating the overflow probability, for large buffer sizes, by its large-deviations asymptotic (see [6], [7], [28] for related approximations and numerical results indicating that this approximation is very accurate for a wide range of buffer sizes). We pretend now that we do not know the true transition probabilities and we are interested in estimating the loss probability from measurements. In the sequel, when computing the various estimators we will be treating the buffer size as constant; we will discuss the behavior of the estimators as the number of measurements increases.

Given that we have observed a sequence of transitions of the Markov chain corresponding to the arrival process, we will consider three estimators: i) the certainty-equivalence estimator, ii) the proposed estimator, and iii) the higher-moment estimators, defined in (26), (29), and (34), respectively, where m in (34) is some scalar. Recall that in Section V we interpreted the second as the expectation of the loss measure, with the corresponding second moment also available; in this light, the third can be interpreted as expectation plus m times standard deviation. To compute these estimators we will be using the empirical measure as an approximation of its limit, which appears in the definition of the rate function. Maintaining the notation established so far, we take the empirical transition probabilities to be the row normalization of the empirical measure. The next proposition compares the three estimators discussed above.

Proposition VII.1: For any nonnegative m it holds that

(44)

Furthermore, for all m, w.p. 1,

(45)
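The empirical measure underlying all three estimators reduces, after row normalization, to observed transition frequencies, and (45) rests on its convergence along a sample path. Before turning to the proof, a minimal sketch; the chain used to generate the path is invented for illustration:

```python
import random

def empirical_transitions(states, n_states):
    """Row-normalized transition counts: the plug-in estimate of the
    transition probabilities built from an observed state sequence."""
    counts = [[0] * n_states for _ in range(n_states)]
    for i, j in zip(states, states[1:]):
        counts[i][j] += 1
    P_hat = []
    for row in counts:
        total = sum(row)
        P_hat.append([c / total if total else 1.0 / n_states for c in row])
    return P_hat

# Simulate a two-state chain with invented transition probabilities.
P = [[0.9, 0.1], [0.3, 0.7]]
random.seed(0)
path = [0]
for _ in range(50_000):
    i = path[-1]
    path.append(0 if random.random() < P[i][0] else 1)

P_hat = empirical_transitions(path, 2)
# With 50 000 observed transitions, each estimated row is within a few
# percent of the true one, in line with the w.p. 1 convergence above.
assert abs(P_hat[0][0] - 0.9) < 0.03 and abs(P_hat[1][0] - 0.3) < 0.03
```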

Proof: The first inequality in (44) is due to the same argument used in establishing (32), since the rate function is nonnegative. The second inequality in (44) is due to the fact that the standard deviation is nonnegative.

To establish the behavior of the estimators as the number of observations grows, note that using renewal reward arguments it can be seen that the empirical measure converges w.p. 1. As a result, the empirical transition probabilities converge to the true ones w.p. 1. Next, note that the weight s tends to zero as the number of observations grows, treating the buffer size as a constant. Observe that the limit is a strict global minimum of the rate function, and that both terms of the objective are continuous in a neighborhood of it. Then it can be shown (see [25, Ch. 1, Exercise 1.9]) that the optimal solution of the optimization problem

is some function of s which tends to the minimizer of the rate function as s tends to zero. Intuitively, for s small enough the strictly convex term dominates the objective function, which implies that the optimal solution is "close" to its minimizer. Namely, for a large enough number of observations, the estimator of (29) converges to the target w.p. 1. The same argument applies to the second-moment quantity w.p. 1, which implies convergence of the estimators of (34) w.p. 1. We conclude that all three estimators converge to the same limit w.p. 1.

Essentially, this proposition establishes that the estimators considered are consistent, since they converge to the "target" overflow probability: this is what we would have quoted as the overflow probability if we knew the true transition probabilities. When we estimate an overflow probability we are more concerned with underestimating the "true" value than with overestimating it. The reason is that the overflow probability quantifies QoS (see, for example, applications in communication networks [6] and supply chains [7], [28], [9]); controlling the queue to guarantee the desired QoS is important. In this light, the estimators of (29) and (34) are "safer" than the certainty-equivalence estimator since, due to (44), they are less likely to underestimate the overflow probability; yet they are consistent and converge to the correct target. The parameter m in (34) determines how conservative we choose to be for relatively small sample sizes.

A. An Example

We next present a numerical example. Consider a Markov-modulated arrival process driven by a Markov chain with five states and transition probability matrix

The Markov chain makes one transition per time slot and the number of arrivals per time slot at every state is characterized by

We fed this arrival process into a queue with a given capacity per time slot and buffer size. To improve the accuracy of the large-deviations asymptotic we used a refinement discussed in [6] and [7], [28] that introduces a constant in front of the exponential. In particular, the refined large-deviations asymptotic is

(46)

where the constant is as given in those references (see [6], [7], [28] for details). In the particular example we are considering, we computed the refined asymptotic; for comparison, simulation yields a value which indicates that the refined asymptotic is a good approximation (in the sense that it captures


the order of magnitude and approximates well the first significant digit).

To compare the different estimators of the overflow probability we generated 10 000 long sample paths of the arrival process. For every one of these sample paths, for all sample sizes, and for some values of m, we computed refined versions of the three estimators, which included the constant in front of the exponentials as in (46).3 Fig. 2 depicts the histograms of the values obtained for sample sizes equal to 5000 [Fig. 2(a)-(c)] and 10 000 [Fig. 2(d)-(f)]. We were interested in the fraction of sample paths that lead to a substantial underestimation of the target overflow probability. In Fig. 3, for every sample size we plot the fraction of sample paths that yield an estimator which does not stay above the target from that sample size onwards; more specifically, for each estimator we plot the fraction of sample paths for which there exists some later sample size at which the estimator drops below the target. From both Figs. 2 and 3 it can be seen that the estimator of (29) is "safer" (in the sense discussed above) than the certainty-equivalence estimator, with the difference being significant for small sample sizes. Moreover, the estimator of (34) is "safe" even for very small sample sizes (around 2000). To put these sample sizes into perspective, note that estimating an overflow probability of this order by directly observing the occupancy of the queue (i.e., measuring the fraction of time that we have an overflow) requires a number of observations on the order of the reciprocal of that probability. Having consistent, yet "safe," estimators with relatively few observations is important in applications since it allows one to quickly track changes in the statistics of the input and adapt to level shifts and other nonstationary phenomena. Of course, to avoid underestimation errors the price to pay is that for small sample sizes and for a number of sample paths the estimators overestimate the overflow probability and lead to underutilization of the system. Nevertheless, according to Proposition VII.1, as the number of measurements increases they converge to the appropriate target.
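To illustrate why direct observation is expensive, one can measure the overflow frequency along a simulated path via the Lindley recursion. Everything below (source parameters, capacity, buffer size) is invented for illustration:

```python
import random

def overflow_fraction(P, arrivals, c, B, n, seed=1):
    """Discrete-time queue fed by a two-state Markov-modulated source:
    Lindley recursion q <- min(B, max(0, q + a - c)); returns the
    fraction of slots during which the buffer is full."""
    random.seed(seed)
    state, q, hits = 0, 0.0, 0
    for _ in range(n):
        # One Markov transition per time slot.
        state = 0 if random.random() < P[state][0] else 1
        q = min(B, max(0.0, q + arrivals[state] - c))
        if q >= B:
            hits += 1
    return hits / n

# Invented on/off-like source; the fast state overloads the server.
P = [[0.95, 0.05], [0.10, 0.90]]
frac = overflow_fraction(P, arrivals=[0.0, 2.0], c=1.0, B=80.0, n=200_000)
# The overflow event is rare enough that 2 * 10^5 slots barely resolve
# it, which is the point made above about direct estimation.
assert 0.0 <= frac < 0.01
```

Resolving a probability several orders of magnitude smaller this way would need correspondingly longer paths, whereas the large-deviations estimators work from the transition counts alone.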
Finally, we consider the result of Proposition V.3 to assess the probability of underestimating the "true" overflow probability. For the same data reported above, we generated a sample path of the arrival process and computed the empirical measure based on the first 10 000 observations. Using this empirical measure, we computed the quantities of Proposition V.3. In Fig. 4, we plot the resulting probability for various values of its argument. It can be seen that such calculations can be used to assess the level of confidence in the "safety" of the proposed estimators. Alternatively, this computation can be used to determine the appropriate overflow probability to quote for a given confidence level.

VIII. CONCLUSION

We have proposed estimators of buffer overflow probabilities in buffers fed by Markov-modulated inputs. These estimators are based on analytical large-deviations asymptotics for these probabilities. In particular, we considered a situation where the transition probabilities of the Markov chain characterizing the

3 To avoid overburdening the notation, we will use the same symbols to denote the refined estimators.
Fig. 2. Plots (a)-(c) depict histograms of the values of the three estimators for a sample size of 5000; plots (d)-(f) depict the corresponding histograms for a sample size of 10 000. The vertical line drawn in every one of these figures is at the overflow probability obtained by simulation.

Fig. 3. Comparing the estimators of (26), (29), and (34), for m = 1, 3. In the legend, "PlossI," "PlossII," and "PlossIV(m)" denote the estimators of (26), (29), and (34), respectively.

arrival process are not known. For any prior distribution of these transition probabilities, we characterized the large-deviations behavior of the posterior distribution. We utilized the latter to define the proposed estimators. We discussed nonlinear programming algorithms for computing these estimators and provided some illustrative numerical results. We demonstrated that the proposed estimators are preferable to the estimator based on certainty equivalence, and preferable to observing the occupancy of the queue directly.

APPENDIX

In this appendix we will calculate the gradient of the overflow exponent in the region where it is finite. Consider a matrix A. We will denote by l_i(A) its ith eigenvalue, by r(A) its spectral radius, by u_i its ith (normalized) right eigenvector (solution to A u_i = l_i(A) u_i), and by v_i its ith (normalized) left eigenvector (solution to v_i' A = l_i(A) v_i'). Assume that l_i(A) has multiplicity 1 (as in the case of r(A)), with corresponding right (respectively, left) eigenvector denoted by u (respectively, v). Then (see [26, Secs. 2.5-2.8]) the derivative of l_i(A) with respect to the (j, k)th entry of A is v(j) u(k) / (v' u), that is,

(47)

Recalling the definition of the matrix whose spectral radius appears in the exponent and differentiating via (47), we obtain

(48)

where the eigenvalue involved is the Perron-Frobenius eigenvalue of that matrix and u, v are the corresponding right and left eigenvectors. Similarly

(49)
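The eigenvalue-derivative formula behind (47)-(49) can be sanity-checked against a finite difference. A sketch on an invented positive matrix, using plain power iteration for the dominant eigenpair:

```python
def power_iteration(A, iters=500):
    """Dominant (Perron-Frobenius) eigenvalue and right eigenvector of
    a positive matrix, by plain power iteration."""
    n = len(A)
    x = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        lam = max(abs(t) for t in y)
        x = [t / lam for t in y]
    return lam, x

def pf_gradient(A):
    """Gradient of the Perron-Frobenius eigenvalue:
    d(rho)/dA[j][k] = v[j] * u[k] / (v' u), with u (right) and v (left)
    eigenvectors of the simple dominant eigenvalue, as in (47)."""
    n = len(A)
    _, u = power_iteration(A)
    At = [[A[j][i] for j in range(n)] for i in range(n)]
    _, v = power_iteration(At)
    vu = sum(v[i] * u[i] for i in range(n))
    return [[v[j] * u[k] / vu for k in range(n)] for j in range(n)]

# Invented positive matrix; compare against a central difference.
A = [[0.6, 0.4], [0.3, 0.7]]
G = pf_gradient(A)
eps = 1e-6
Ap = [row[:] for row in A]; Ap[0][1] += eps
Am = [row[:] for row in A]; Am[0][1] -= eps
fd = (power_iteration(Ap)[0] - power_iteration(Am)[0]) / (2 * eps)
assert abs(G[0][1] - fd) < 1e-5
```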


Recall also that the overflow exponent is the largest root of the associated equation. Based on the discussion in Section III, it can be alternatively written as

(50)
(51)

where the second equality is due to the convexity of the limiting log-moment-generating functions and the fact that they are negative for sufficiently small arguments. The third equality above is due to strong duality, which holds since we are dealing with a convex programming problem (see [25, Ch. 5]). Using the envelope theorem [27], we obtain

(52)

where the arguments are optimal solutions of the optimization problems in (51), and the elements of the eigenvalue gradient are given by (48). Note that the relevant multiplier is a Lagrange multiplier for the original optimization problem in (50) and, therefore, satisfies the first-order optimality condition, which implies

(53)

Another way of deriving this result is by using the fact that the limiting log-moment-generating function evaluated at the overflow exponent is equal to zero for all transition probability vectors in the region of interest. Thus, by using the chain rule we obtain

(54)

which is in agreement with (52).

Fig. 4. We plot the logarithm of the probability appearing in Proposition V.3. For comparison we draw three vertical lines at the values of the estimators of (26), (29), and (34) with m = 3 (from left to right, respectively).

ACKNOWLEDGMENT

The authors wish to thank A. J. Ganesh for his comments on an earlier conference version of the present paper and for bringing [19] to their attention. They are also very grateful to the anonymous reviewers for their constructive remarks and suggestions.

REFERENCES

[1] J. Y. Hui, "Resource allocation for broadband networks," IEEE J. Select. Areas Commun., vol. 6, pp. 1598-1608, Sept. 1988.
[2] R. J. Gibbens and P. J. Hunt, "Effective bandwidths for the multi-type UAS channel," Queueing Syst., vol. 9, pp. 17-28, 1991.
[3] F. P. Kelly, "Effective bandwidths at multi-class queues," Queueing Syst., vol. 9, pp. 5-16, 1991.
[4] R. Guérin, H. Ahmadi, and M. Naghshineh, "Equivalent capacity and its applications to bandwidth allocation in high-speed networks," IEEE J. Select. Areas Commun., vol. 9, pp. 968-981, 1991.
[5] F. P. Kelly, "Notes on effective bandwidths," in Stochastic Networks: Theory and Applications, S. Zachary, I. B. Ziedins, and F. P. Kelly, Eds. Oxford, U.K.: Oxford Univ. Press, 1996, vol. 9, pp. 141-168.
[6] I. Ch. Paschalidis, "Class-specific quality of service guarantees in multimedia communication networks," in Automatica (Special Issue on Control Methods for Communication Networks), V. Anantharam and J. Walrand, Eds., 1999, vol. 35, pp. 1951-1968.
[7] D. Bertsimas and I. Ch. Paschalidis. (1999, Feb.) Probabilistic service level guarantees in make-to-stock manufacturing systems. Dept. Manuf. Eng., Boston Univ., Tech. Rep. [Online]. Available: http://ionia.bu.edu
[8] P. Glasserman, "Bounds and asymptotics for planning critical safety stocks," Operations Res., vol. 45, no. 2, pp. 244-257, 1997.
[9] I. Ch. Paschalidis and Y. Liu. (2000, Apr.) Large deviations-based asymptotics for inventory control in supply chains. Dept. Manuf. Eng., Boston Univ. [Online]. Available: http://ionia.bu.edu
[10] A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications, 2nd ed. New York: Springer-Verlag, 1998.
[11] D. Bertsimas, I. Ch. Paschalidis, and J. N. Tsitsiklis, "On the large deviations behavior of acyclic networks of G/G/1 queues," Ann. Appl. Probab., vol. 8, no. 4, pp. 1027-1069, 1998.
[12] D. Bertsimas, I. Ch. Paschalidis, and J. N. Tsitsiklis, "Asymptotic buffer overflow probabilities in multiclass multiplexers: An optimal control approach," IEEE Trans. Automat. Contr., vol. 43, pp. 315-335, Mar. 1998.
[13] M. Grossglauser and D. N. C. Tse, "A framework for robust measurement-based admission control," IEEE/ACM Trans. Networking, vol. 7, pp. 293-309, June 1999.
[14] M. Grossglauser and D. N. C. Tse, "A time-scale decomposition approach to measurement-based admission control," in Proc. IEEE INFOCOM, 1999.
[15] N. G. Duffield, "A large deviation analysis of errors in measurement based admission control to buffered and bufferless resources," in Proc. IEEE INFOCOM, 1999.
[16] N. G. Duffield, J. T. Lewis, N. O'Connell, R. Russell, and F. Toomey, "Entropy of ATM traffic streams: A tool for estimating QoS parameters," IEEE J. Select. Areas Commun., vol. 13, pp. 981-990, June 1995.
[17] A. Ganesh, P. Green, N. O'Connell, and S. Pitts, "Bayesian network management," Queueing Syst., vol. 28, pp. 267-282, 1998.
[18] A. Ganesh and N. O'Connell, "An inverse of Sanov's theorem," BRIMS, HP Labs, Bristol, U.K., Tech. Rep., 1998.
[19] F. Papangelou, "Large deviations and the Bayesian estimation of higher-order Markov transition functions," J. Appl. Probab., vol. 33, pp. 18-27, 1996.
[20] H. Cramér, "Sur un nouveau théorème-limite de la théorie des probabilités," in Actualités Scient. Industrielles (Colloque Consacré à la Théorie des Probabilités). Paris, France: Hermann, 1938, pp. 5-23.
[21] R. T. Rockafellar, Convex Analysis. Princeton, NJ: Princeton Univ. Press, 1970.
[22] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[23] P. W. Glynn and W. Whitt, "Logarithmic asymptotics for steady-state tail probabilities in a single-server queue," J. Appl. Probab., vol. 31A, pp. 131-156, 1994.
[24] P. Lancaster, Theory of Matrices. New York: Academic, 1969.
[25] D. P. Bertsekas, Nonlinear Programming. Belmont, MA: Athena Scientific, 1995.
[26] J. H. Wilkinson, The Algebraic Eigenvalue Problem. Oxford, U.K.: Oxford Univ. Press, 1965.
[27] H. R. Varian, Microeconomic Analysis, 3rd ed. New York: W. W. Norton, 1992.
[28] D. Bertsimas and I. Ch. Paschalidis, "Probabilistic service level guarantees in make-to-stock manufacturing systems," Oper. Res., to be published.
