Approximation Capability of Independent Wavelet Models to Heterogeneous Network Traffic

Chuanyi Ji, Sheng Ma and Xusheng Tian

Abstract: In our previous work, we showed empirically that independent wavelet models were parsimonious, computationally efficient, and accurate in modeling heterogeneous network traffic measured by both auto-covariance functions and buffer loss rate. In this work, we focus on auto-covariance functions, to establish a theory of independent wavelet models as unified models for heterogeneous network traffic. We have developed the theory on the approximation capability of independent wavelet models for heterogeneous traffic in terms of the decay rate of auto-covariance functions at large lags. Average auto-covariance functions of independent wavelet models have been derived and shown to be linear combinations of basis functions. Through a simple analytical expression, we have shown that the decay rate of the auto-covariance functions of independent wavelet models is determined explicitly through a single quantity called the rate function of variances of wavelet coefficients. By specifying analytical forms of the rate function, independent wavelet models have been shown as unified models of heterogeneous traffic in terms of auto-covariance functions. The simplicity of the theory thereby provides both quantitative and qualitative explanations why independent wavelet models are unified models of heterogeneous traffic.
I. Introduction

Modeling and analysis of heterogeneous network traffic is imperative to supporting multi-media applications with diverse statistical characteristics and Quality of Service (QoS) requirements. The heterogeneity of the traffic can be characterized by the rate of decay of the corresponding auto-covariance function at large lags. For short-range dependent (SRD) traffic such as voice [1] and video conferencing [2], the decay rate is either exponential or polynomial (of order greater than one) so that the auto-covariance is summable. For long-range dependent (LRD) traffic such as Ethernet data [3], the decay rate is polynomial (of order at most one) so that the auto-covariance is non-summable. For a mixture of SRD and LRD such as VBR video traffic [4][5][6], the decay rate is dominated by the one with the heaviest tail. Therefore, to show that a class of models is versatile enough to approximate heterogeneous traffic in terms of capturing auto-covariance function, we need to show that the auto-covariance function of the models can exhibit the same rates of decay as the traffic.

Recently, several models have been proposed to model heterogeneous network traffic. They include superposition of Markov models [7][8], M/G/∞ models [4][9], FARIMA models [5][10], and wavelet models [6].

This work was supported by NSF grant NCR 9805338, and in part by NSF grant CAREER IRI-9502518, and in part by DARPA grant F19628-98-C-0057. C. Ji and X. Tian are with Department of Electrical, Computer and System Engineering, Rensselaer Polytechnic Institute, Troy, NY 12180, e-mail: {jic,tianx}@rpi.edu. S. Ma is with the IBM T.J. Watson, Hawthorne, NY 10532, e-mail: [email protected].
To evaluate the ability of a traffic model, three key issues [11] need to be addressed. They are performance, efficiency, and feasibility for analysis. The performance addresses the ability of a model to characterize significant statistical properties in a diverse class of network traffic. Commonly-used criteria on performance include the loss rate and the auto-covariance function¹, where the latter shows how a model captures the second-order properties in the traffic. The efficiency deals with the computational complexity required to develop a model and to process large data sets using the model. The feasibility for analysis concerns whether a model is simple enough for analyzing the performance of the model and the network performance.

Markov models are capable of approximating any stationary process provided the number of states is sufficiently large [12]. When applied to network traffic, however, they often result in a complicated structure with too many parameters [13]. M/G/∞ models have been shown to be able to model a versatile class of processes [14][4]. But because M/G/∞ is a compound point process, it is not clear whether it can scale well for modeling traffic of a large volume. FARIMA models can model different types of traffic [5]. But the computational cost required for both model development and synthetic traffic generation is prohibitively high [5][10].

In our previous work, we developed unified models [6][15] for heterogeneous traffic using wavelets. We discovered through experiments that the independent wavelet models were able to match the sample auto-covariance functions, and to faithfully predict buffer loss rate². The independent wavelet models were also shown to be parsimonious, and had the lowest computational complexity achievable. A question we did not answer, though, is why the independent wavelet models perform so well in terms of matching auto-covariance functions and predicting buffer loss rate from synthesized traffic.
As the previous work focused mostly on feasibility and development of the models, a theory needs to be established on the approximation capability of independent wavelet models for heterogeneous traffic. In this paper, we will develop such a theory by first deriving a simple analytical form of the auto-covariance functions of the independent wavelet models. Such a simple form makes it possible for us to show that the independent wavelet models are “universal approximators” of heterogeneous traffic ranging from short-range to long-range dependence. In particular, we will show that the decay rate of the auto-covariance functions of independent wavelet models is determined in an explicit way by the growth rate of variances of wavelet coefficients. We will provide examples of the growth rate for heterogeneous traffic to demonstrate various decay rates of auto-covariance functions of independent wavelet models. We will also show that the theory leads to a simple algorithm for approximating an auto-covariance function directly, and that the model is parsimonious and computationally efficient.

¹ Also quantities related to delays and marginal distributions.
² For marginal distributions other than Gaussian, a shaping algorithm is required for a better match in terms of buffer loss rate.
II. The Independent Wavelet Models

Let $x(t)$ be a random process generated from an independent wavelet model for discrete time $t$ ($t \ge 0$). That is, through the inverse wavelet transform, we have

$$x(t) = \sum_{j=1}^{\infty} \sum_{m=0}^{\infty} d_j^m \phi_j^m(t) + \phi_0, \tag{1}$$

where $\phi_j^m(t)$ and $d_j^m$ are the Haar wavelet basis functions (see [6][16] for details) and the corresponding wavelet coefficients, respectively, at the time scale $j$ ($j \ge 1$, and an integer) and shift $m$ ($m \ge 0$, and an integer)³. The wavelet basis functions are obtained by dilating and translating a mother wavelet $\phi(t)$, where

$$\phi(t) = \begin{cases} 1 & \text{if } 0 \le t < 1/2, \\ -1 & \text{if } 1/2 \le t < 1, \\ 0 & \text{otherwise.} \end{cases}$$

The wavelet coefficients, $d_j^m$'s, can be obtained through the wavelet transform

$$d_j^m = \sum_{t=0}^{+\infty} x(t)\, \phi_j^m(t). \tag{2}$$

For the independent wavelet models we consider in this paper, $d_j^m$'s are assumed to be independent random variables. For a given $j$ ($j \ge 1$), $d_j^m$'s are i.i.d. random variables with a zero mean and a variance $\sigma_j^2$. That is,

$$E[d_{j_1}^{m_1} d_{j_2}^{m_2}] = \begin{cases} \sigma_j^2 & m_1 = m_2 \text{ and } j_1 = j_2 = j, \\ 0 & \text{otherwise.} \end{cases} \tag{3}$$

³ Since for the Haar wavelet, $\phi_0$ represents the mean of $x(t)$, without loss of generality, $\phi_0 = 0$ is assumed in the rest of the paper.

III. Average Auto-Covariance Function of An Independent Wavelet Model

Although an auto-covariance⁴ function is well defined for a wide-sense-stationary process, the existing definition does not apply directly to that of the independent wavelet models. This is because the time signal $x(t)$ generated from the independent wavelet model given in Equ. (1) is non-stationary [17], where the non-stationarity is introduced by the non-differentiable Haar wavelets. Therefore, an appropriate auto-covariance function needs to be defined for the independent wavelet models.

⁴ Since for the independent wavelet models, the wavelet coefficients are zero-mean, auto-correlation and auto-covariance functions are the same. We will use both terms interchangeably.

A. Definition

Let $x_K(t)$ be the resolution-limited approximation [17] of $x(t)$ for which information at resolutions higher than $2^K$ is discarded, for $K > 0$. Then

$$x_K(t) = \sum_{j=1}^{K} \sum_{m=0}^{2^{K-j}-1} d_j^m \phi_j^m(t), \tag{4}$$

and

$$x(t) = \lim_{K \to \infty} x_K(t). \tag{5}$$

An average auto-covariance function of $x(t)$ can be defined through that of $x_K(t)$.

Definition 1: Let $T_K = 2^K$, and $t, \tau > 0$ be the discrete time. The average auto-covariance function of $x_K(t)$ is defined as

$$R_K(\tau) = \frac{1}{T_K} \sum_{t=0}^{T_K-1} E[x_K(t)\, x_K(t+\tau)]. \tag{6}$$

The average auto-covariance function of $x(t)$ with unlimited resolution is defined as

$$R(\tau) = \lim_{K \to \infty} R_K(\tau), \tag{7}$$

assuming the limit exists.

B. An Analytical Expression for Independent Wavelet Models

To obtain a closed-form expression of $R(\tau)$, we insert $x_K(t)$ given by Equ. (4) into Equ. (6), and obtain

$$R_K(\tau) = \frac{1}{T_K} \sum_{j_1=1}^{K} \sum_{j_2=1}^{K} \sum_{m_1=0}^{2^{K-j_1}-1} \sum_{m_2=0}^{2^{K-j_2}-1} \Big\{ E[d_{j_1}^{m_1} d_{j_2}^{m_2}] \cdot \sum_{t=0}^{T_K-1} \phi_{j_1}^{m_1}(t)\, \phi_{j_2}^{m_2}(t+\tau) \Big\}. \tag{8}$$

Define the (deterministic) temporal correlation $h_j(\tau)$ of the wavelet basis $\phi_j^m(t)$ to be

$$h_j(\tau) = \sum_{t=0}^{T_j-1} \phi_j^0(t)\, \phi_j^0(t+\tau),$$

where $T_j = 2^j$. Using Equ. (3) for the independent wavelet models, the average auto-correlation functions with limited and unlimited resolutions can be rewritten as⁵:

$$R_K(\tau) = \sum_{j=1}^{K} \sigma_j^2\, 2^{-j} h_j(\tau), \tag{9}$$

$$R(\tau) = \sum_{j=1}^{+\infty} \sigma_j^2\, 2^{-j} h_j(\tau). \tag{10}$$

⁵ Please refer to Appendix A for details.

For the Haar basis functions, $h_j(\tau)$ can be easily shown to possess a simple form:

$$h_j(\tau) = \begin{cases} 1 - \dfrac{3\tau}{T_j} & 0 \le \tau < T_j/2, \\[4pt] \dfrac{\tau}{T_j} - 1 & T_j/2 \le \tau < T_j, \\[4pt] 0 & \text{otherwise.} \end{cases} \tag{11}$$
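The closed form in Equ. (11) can be checked numerically against the defining sum of $h_j(\tau)$. The following is a minimal sketch, not part of the original derivation. It assumes the unit-energy normalisation $\phi_j^0(t) = 2^{-j/2}\phi(2^{-j}t)$ for the Haar basis (an assumption, chosen because it gives $h_j(0) = 1$ as in Equ. (11)); the function names `haar_wavelet`, `h_numeric` and `h_closed` are illustrative.

```python
import numpy as np


def haar_wavelet(j, length):
    """Scale-j, shift-0 Haar wavelet sampled at t = 0, ..., length - 1.

    Assumption: unit-energy normalisation 2^(-j/2), so that h_j(0) = 1.
    """
    T = 2 ** j
    phi = np.zeros(length)
    phi[: T // 2] = 1.0        # first half of the support
    phi[T // 2 : T] = -1.0     # second half of the support
    return phi / np.sqrt(T)


def h_numeric(j, tau):
    """Defining sum: h_j(tau) = sum_{t=0}^{T_j-1} phi_j^0(t) * phi_j^0(t + tau)."""
    T = 2 ** j
    phi = haar_wavelet(j, T + tau)  # zero outside the support [0, T_j)
    return float(np.dot(phi[:T], phi[tau : tau + T]))


def h_closed(j, tau):
    """Piece-wise linear closed form of Equ. (11)."""
    T = 2 ** j
    if 0 <= tau < T / 2:
        return 1.0 - 3.0 * tau / T
    if T / 2 <= tau < T:
        return tau / T - 1.0
    return 0.0


# Largest gap between the two expressions over several scales and lags.
max_gap = max(
    abs(h_numeric(j, tau) - h_closed(j, tau))
    for j in range(1, 7)
    for tau in range(0, 2 ** (j + 1))
)
print(max_gap)
```

The printed gap is expected to be at the level of floating-point round-off, which is consistent with the piece-wise linear shape and the limited support $[0, T_j)$ of $h_j(\tau)$.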
Such an $h_j(\tau)$ is plotted in Fig. 1, which shows that $h_j(\tau)$ is piece-wise linear and has limited support. The support $[0, T_j)$ increases as a function of the time scale $j$. The set of $h_j(\tau)$ at all time scales forms a set of basis functions for $R(\tau)$. The average auto-covariance function given by Equ. (10) is therefore a linear combination of the basis functions, where a base $h_j(\tau)$ is weighted by the variance, $\sigma_j^2$, of the wavelet coefficient $d_j^m$ at the same time scale. As will soon be shown, such a simple expression of the average auto-covariance function provides an intuitive understanding of the independent wavelet models, and makes it possible to establish the theory.

Fig. 1. $h_j(\tau)$ versus $\tau$.

C. Understanding Based on Intuition

To understand intuitively why the independent wavelet models are capable of capturing complex temporal dependence in heterogeneous traffic, we plot both the Haar wavelet basis $\phi_j^m(t)$ and the basis $h_j(\tau)$ of the auto-covariance function in Fig. 2 at several time scales. The reason then becomes clear why independent wavelet models can capture the temporal dependence in network traffic. The basis functions ($\phi_j^m(t)$'s or $h_j(\tau)$'s) have absorbed most of the temporal dependence in the traffic. Then the dependence among wavelet coefficients can be neglected. The left figure also shows that the wavelet basis $\phi_j^m(t)$'s are (deterministic) self-similar, which naturally matches the (statistical) self-similarity in bursty traffic. The right figure shows that due to the intrinsic multi-resolution property of wavelets, such $h_j(\tau)$'s with longer and longer support build up the auto-covariance function in a progressive fashion at larger and larger time scales.

Fig. 2. Left figure shows the Haar wavelet basis functions. Right figure illustrates the corresponding basis $h_j(\tau)$ of the auto-covariance function.

IV. Independent Wavelet Models: “Universal Approximators” of Heterogeneous Traffic

The heterogeneity of traffic can be characterized by the rate of decay at the large lags of auto-covariance functions. If a theory can be established to show that the average auto-covariance function of the independent wavelet models exhibits similar decay rate at the tail, the independent wavelet models will be proven to be “universal approximators” of auto-covariance functions of heterogeneous traffic. To do so, we need to specify conditions on $\sigma_j^2$'s, which completely determine the average auto-covariance function of the independent wavelet models.

A. Sample Variances of Wavelet Coefficients of Heterogeneous Traffic

What conditions on $\sigma_j^2$'s are feasible to describe the variances of wavelet coefficients of heterogeneous traffic? An answer to such a question can be drawn from sample variances of independent wavelet coefficients corresponding to heterogeneous traffic. In particular, sample variances of independent wavelet coefficients are obtained using the traces from the following cases: SRD traffic generated from AR(1), LRD traffic from FARIMA(0, 0.4, 0), a mixture of SRD and LRD from FARIMA(1, 0.4, 0), a VBR video trace (“Star Wars”⁶), and an Ethernet data trace (pAug⁷).

⁶ See [5] for detailed description of this trace.
⁷ See [18] for detailed description of this trace.

Fig. 3. $\log_2 \sigma_j^2$ versus the time scale $j$ for AR(1), FARIMA(0, 0.4, 0) and FARIMA(1, 0.4, 0).

Fig. 4. $\log_2 \sigma_j^2$ versus the time scale $j$ for pAug and “Star Wars”.

Fig. 3 plots the (logarithmic) sample variances of wavelet coefficients as a function of the time scale for AR(1), FARIMA(0, 0.4, 0), and FARIMA(1, 0.4, 0), respectively. A similar plot is given in Fig. 4 for VBR video traffic and Ethernet data.

B. Conditions on The Variances: The Rate Function

Two common features can be observed for all these cases which represent heterogeneous traffic. First, the variances are non-decreasing⁸ with respect to the time scale. Second, $\sigma_j^2$, as a function of $j$, can be characterized as either increasing until approaching a constant at a large $j$, or increasing at all $j$'s. For developing the theory, we can assume that $\sigma_j^2$ increases⁹ with respect to $j$, i.e., $\sigma_{j+1}^2 > \sigma_j^2$ for $j \ge 1$. $\sigma_j^2$ as a function of $j$ can be further specified through a so-called rate function $f(y)$. Such a rate function will be used to specify the growth rate of $\sigma_j^2$ with respect to the time scale. $f(y)$ is assumed to be a differentiable and monotonically increasing function of $y$, where $y \in R^1$ ($y > 0$). Furthermore, $f(y) - \log y$ is assumed to be monotonic¹⁰. Then $\sigma_j^2$ can be characterized as
$$\sigma_j^2 = \begin{cases} A - C\, 2^{-f(2^j)+j}, & \text{if } f(y) \text{ is so chosen that for } y \text{ sufficiently large,} \\ & \text{(a) } \log y < f(y) < O(y^m) \text{ for } m > 0, \\ & \text{(b) } (\log y)' < f'(y) < O(y^{m-1}), \\ & \text{(c) } f(y) - \log y \to +\infty \text{ when } y \to +\infty; \\[4pt] B\, 2^{-f(2^j)+j}, & \text{if } f(y) \text{ is so chosen that for } y \text{ sufficiently large,} \\ & \text{(a) } \log y - f(y) \to +\infty \text{ when } y \to +\infty, \\ & \text{(b) } p \log\log y < f(y) < \log y \text{ for } p > 1, \text{ and} \\ & \text{(c) } (\log\log y)' < f'(y) < \tfrac{1}{y}, \end{cases} \tag{12}$$

where $A$, $C$ and $B$ are positive constants, with $A > C$. $m > 0$ is a constant. $O(q)$ represents a quantity in the order of $q$.

⁸ If we take statistical variations of the sample estimates into consideration.
⁹ $\sigma_j^2$ is taken to be increasing rather than non-decreasing for the feasibility of analysis.
¹⁰ Unless otherwise stated, $\log$ is to be understood as $\log_2$ in the rest of the paper.

To understand the meaning of the constraints on $f(y)$, let us consider $\sigma_j^2 = A - C\, 2^{-f(2^j)+j}$ as an example. The conditions $f(y) - \log y \to +\infty$, and $f(y) - \log y$ being monotonic, ensure that $\sigma_j^2$ is a monotonically increasing function of $j$. $\log y < f(y) < O(y^m)$ and $(\log y)' < f'(y) < O(y^{m-1})$ specify both the rate of growth of $\sigma_j^2$, and how smooth the growth is. The former is needed to obtain a versatile class of decay rates of $R(\tau)$ corresponding to heterogeneous traffic. The latter is necessary for deriving the main theoretical result, where $R(\tau)$ will be related to $\sigma_j^2$ through $f(2^j)$ in a simple form. Similar understanding can be obtained for other cases of $\sigma_j^2$.

To further understand whether those conditions can be satisfied by well-known examples of heterogeneous traffic, we consider an example of SRD traffic, where $f(y) = y^{\alpha} \log\frac{1}{\rho}$ for $\alpha > 0$, and $0 < \rho < 1$. Then $f(2^j) = 2^{\alpha j} \log\frac{1}{\rho}$, which is monotonically increasing. $j - f(2^j) < 0$ for $j$ large and approaches $-\infty$ monotonically when $j \to +\infty$. In addition, both $f(y) < O(y^m)$ for $m > \alpha$, and $f'(y) < O(y^{m-1})$. Then all conditions are satisfied for such $f(y)$. It should be noted that since the above conditions require $\sigma_j^2$ to have a certain growth rate, they exclude the case for i.i.d. traffic¹¹, where $f(2^j) = j$, and $\sigma_j^2$ is a constant.

C. The Main Theorem

Using the specified $\sigma_j^2$'s, we are ready to derive an explicit expression at large lags of the auto-covariance function of an independent wavelet model. Intuitively, since $f(y)$ is the rate function characterizing the rate of increase of $\sigma_j^2$, the corresponding auto-covariance function should be determined through $f(y)$ also.

Theorem 1: Let $\sigma_j^2$ be the variance of wavelet coefficients $d_j^m$ for $j \ge 1$ and $m \ge 0$. Let $\sigma_j^2$ satisfy the conditions given in Equ. (12). Then for $\tau$ sufficiently large,

$$B_l(\tau) \le R(\tau) \le B_u(\tau), \tag{13}$$

where $\lim_{\tau \to +\infty} \frac{B_l(\tau)}{2^{-f(a\tau)}} = A_l$, and $\lim_{\tau \to +\infty} \frac{B_u(\tau)}{2^{-f(b\tau)}} = A_u$, for $a \in [2, 4]$ and $b \in [1, 2]$. $A_l > 0$ and $A_u > 0$ are constants. Equ. (13) can be rewritten as, for $\tau$ sufficiently large,
$$O(2^{-f(a\tau)}) \le R(\tau) \le O(2^{-f(b\tau)}).$$

The proof of the theorem can be found in Appendix B. The correctness of the result can be examined through simple examples. First, since $f(y)$ is a monotonic increasing function, $2^{-f(a\tau)} \le 2^{-f(b\tau)}$, and both the upper and lower bounds decrease monotonically. Second, consider the Fractional Gaussian Noise (FGN) process, for which $\sigma_j^2 = (2^{2H-1} - 1)\, 2^{j(2H-1)}$ with $\frac{1}{2} < H < 1$ being the Hurst parameter. Then $f(2^j) = 2(1-H)\, j$. Replacing $2^j$ by $a\tau$, we have $f(a\tau) = 2(1-H) \log \tau + 2(1-H) \log a$. So the lower bound of $R(\tau)$ is $O(\frac{1}{\tau^{2(1-H)}})$. Similarly, we can show that the upper bound is in the same order as the lower bound. Then $R(\tau) = O(\frac{1}{\tau^{2(1-H)}})$. This is the same as that derived in [15] through a different approach. The example therefore confirms the correctness of the theorem on FGN traffic.

¹¹ $R(\tau)$ for i.i.d. traffic can be found in [15].

D. Examples

What are the examples of the rate function $f(2^j)$ which correspond to auto-covariance functions of well-known heterogeneous traffic?

D.1 Examples of SRD Traffic

The first such rate function corresponds to short-range dependent traffic, discussed previously.

Corollary 1: When $\sigma_j^2 = A - C \rho^{2^{\alpha j}}$ for $\alpha > 0$ and $0 < \rho < 1$, i.e., $f(2^j) = 2^{\alpha j} \log\frac{1}{\rho} + j$,

$$R(\tau) = O(\rho^{\tau^{\alpha}}) \tag{14}$$

for $\tau$ sufficiently large.

The proof of the corollary can be derived directly from the results of the theorem and is omitted. When $\alpha = 1$, $R(\tau) = O(\rho^{\tau})$ decays exponentially. This corresponds to the auto-covariance of a first-order Markov process specified through a first-order Auto-Regressive process [15]. When $0 < \alpha < 1$, $R(\tau)$ decays sub-exponentially. In particular, when $\alpha = \frac{1}{2}$, $R(\tau) = O(e^{-\beta\sqrt{\tau}})$ ($\beta > 0$), which corresponds to the auto-covariance function resulting from an M/G/∞ model for VBR video traffic [4]. The second example of an SRD process is for $R(\tau)$ decaying at a polynomial rate while remaining summable.

Corollary 2: When $\sigma_j^2 = A - C\, 2^{-\alpha j}$ for $\alpha > 0$, i.e., $f(2^j) = (1 + \alpha)\, j$, we have

$$R(\tau) = O\Big(\frac{1}{\tau^{1+\alpha}}\Big)$$

for $\tau$ sufficiently large.

Compared with Corollary 1, Corollary 2 shows that when $\sigma_j^2$ increases at a slower rate with respect to the time scale (from super-exponential to exponential before approaching a constant), the decay rate of $R(\tau)$ at the tail decreases from exponential to polynomial.

D.2 Examples of LRD Traffic

When $\sigma_j^2$ increases at an even slower rate, the tail of the auto-covariance function is even heavier, which leads to long-range dependent traffic. The first such case is for the decay at the tail of $R(\tau)$ to be dominated by $\frac{1}{\tau}$ so that the auto-covariance function is non-summable.

Corollary 3: Let $\sigma_j^2 = B j^n$, where $n > 0$ is a constant. Then

$$R(\tau) = O\Big(\frac{1}{\tau}\Big).$$

The corollary shows that when $\sigma_j^2$ increases at a polynomial rate, the resulting auto-covariance function is long-range dependent. The next example also corresponds to long-range dependent traffic, where $\sigma_j^2$ increases at an exponential rate.

Corollary 4: Let $\sigma_j^2 = B\, 2^{\alpha j}$, i.e., $f(2^j) = (1 - \alpha)\, j$, where $0 < \alpha < 1$. Then

$$R(\tau) = O\Big(\frac{1}{\tau^{1-\alpha}}\Big).$$

As mentioned in Section IV-B, when $\alpha = 2H - 1$, this case corresponds to FGN traffic.

D.3 An Example of Mixtures

Complex network traffic may also be modeled as a mixture of SRD and LRD. To illustrate this idea, consider such a mixture, where

$$\sigma_j^2 = p_1 \sigma_{j1}^2 + p_2 \sigma_{j2}^2,$$

for $p_1, p_2 > 0$, $\sigma_{j1}^2 = A - C\, 2^{-\alpha_1 j}$, and $\sigma_{j2}^2 = B\, 2^{\alpha_2 j}$. Since $R(\tau)$ is linear in $\sigma_j^2$, we have $R(\tau) = p_1 R_1(\tau) + p_2 R_2(\tau)$, where

$$R_1(\tau) = \sum_{j=1}^{+\infty} \sigma_{j1}^2\, 2^{-j} h_j(\tau), \qquad R_2(\tau) = \sum_{j=1}^{+\infty} \sigma_{j2}^2\, 2^{-j} h_j(\tau).$$

For small $j$'s, $\sigma_{j1}^2$ dominates $\sigma_j^2$, and $R_1(\tau)$ dominates $R(\tau)$. For large $j$'s, $\sigma_{j2}^2$ dominates $\sigma_j^2$, and $R_2(\tau)$ dominates $R(\tau)$.

E. Discussions

To get a coherent view on the capability of independent wavelet models as universal approximators to auto-covariance functions of heterogeneous traffic, we summarize the examples in Table I. The examples show explicitly the rate function $f(y)$ for well-known heterogeneous traffic, and how the rate function $f(y)$ of independent wavelet coefficients determines the rate of decay of the auto-covariance function.

How should we interpret the relationship between the rate function $f(y)$ (the growth rate of $\sigma_j^2$) and the nature of heterogeneous traffic? The increase in the variances of wavelet coefficients at a larger time scale indicates the amount of new statistical information captured by that time scale, which further indicates the range of temporal dependence in the traffic. As an extreme case when $\sigma_j^2$ is a constant at all time scales, the amount of statistical information remains the same at all time scales. As there is no new information captured when the time scale is being increased, the original process is a white noise process [19]. When $\sigma_j^2$ increases rapidly at small time scales but soon saturates at larger time scales, most of the statistical information is being captured by the small time scales. The corresponding process is thus short-range dependent. When the variances either keep increasing at all time scales or approach a constant at a very slow rate, statistical information is distributed at almost all time scales. The corresponding process is long-range dependent.

TABLE I
Examples. $A > 0$, $B > 0$, $0 < \rho < 1$.

Case | $\sigma_j^2$ | $f(y)$ | $R(\tau)$ | $x(t)$ | Examples on Traffic
1 | $A - C \cdot \rho^{2^{\alpha j}}$, $\alpha > 0$ | $\log y + y^{\alpha} \log\frac{1}{\rho}$ | $O(\rho^{\tau^{\alpha}})$ | SRD | video conferencing, VBR video ([4])
2 | $A - C \cdot 2^{-\alpha j}$, $\alpha > 0$ | $(1 + \alpha) \log y$ | $O(\frac{1}{\tau^{1+\alpha}})$ | SRD |
3 | $B \cdot j^n$, $n > 0$ | $\log y - n \cdot \log(\log y)$ | $O(\frac{1}{\tau})$ | LRD |
4 | $B \cdot 2^{\alpha j}$, $0 < \alpha < 1$ | $(1 - \alpha) \log y$ | $O(\frac{1}{\tau^{1-\alpha}})$ | LRD | Ethernet data ([18])([6])
5 | A mixture | A mixture | A mixture | | VBR video
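The link between the growth of $\sigma_j^2$ and the decay of $R(\tau)$ stated in Corollaries 2 and 4 can be explored numerically from Equ. (10) and Equ. (11). The sketch below is illustrative only. It assumes that the infinite sum in Equ. (10) is truncated at `j_max` scales, that lags are taken as powers of two (where the Haar sums scale cleanly), and that the constants $A = 2$, $B = C = 1$ and the values of $\alpha$ are arbitrary choices; the function names are hypothetical.

```python
import math


def h(j, tau):
    """Closed-form temporal correlation of the Haar basis, Equ. (11)."""
    T = 2.0 ** j
    if 0 <= tau < T / 2:
        return 1.0 - 3.0 * tau / T
    if T / 2 <= tau < T:
        return tau / T - 1.0
    return 0.0


def average_autocovariance(sigma2, tau, j_max=60):
    """Equ. (10), truncated at j_max scales (an assumption of this sketch)."""
    return sum(sigma2(j) * 2.0 ** (-j) * h(j, tau) for j in range(1, j_max + 1))


def decay_exponent(sigma2, tau1, tau2):
    """Slope of log R(tau) versus log tau between two lags."""
    r1 = average_autocovariance(sigma2, tau1)
    r2 = average_autocovariance(sigma2, tau2)
    return math.log(r2 / r1) / math.log(tau2 / tau1)


# Corollary 4 (LRD): sigma_j^2 = B * 2^(alpha * j) with B = 1, alpha = 0.6.
alpha_lrd = 0.6
slope_lrd = decay_exponent(lambda j: 2.0 ** (alpha_lrd * j), 64, 1024)

# Corollary 2 (SRD): sigma_j^2 = A - C * 2^(-alpha * j) with A = 2, C = 1, alpha = 0.5.
alpha_srd = 0.5
slope_srd = decay_exponent(lambda j: 2.0 - 2.0 ** (-alpha_srd * j), 64, 1024)

# slope_lrd is expected to be close to -(1 - alpha_lrd),
# slope_srd is expected to be close to -(1 + alpha_srd).
print(slope_lrd, slope_srd)
```

Under these assumptions the two slopes should agree with the exponents in rows 4 and 2 of Table I, illustrating how a slower growth of $\sigma_j^2$ yields a faster polynomial decay of $R(\tau)$.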
V. Modeling Average Auto-Covariance Functions

Not only does the theory show what independent wavelet models can do in terms of modeling heterogeneous traffic, but it also suggests a simple method for modeling the average auto-covariance functions directly.

A. An Algorithm

Consider the auto-covariance function $R_K(\tau)$ given in Equ. (9). To obtain a desired average auto-covariance function using a trace, we simply need to obtain sample variances of wavelet coefficients at different time scales. These sample variances can then be used to weight $h_j(\tau)$'s to obtain $R_K(\tau)$, which can be regarded as an approximation to $R(\tau)$. This method directly estimates the auto-covariance function, and provides an alternative approach to our previous work that models wavelet coefficients [6][15]. A simple algorithm to implement such an idea is summarized below.

1. Choose a traffic trace of length $N$, where $N > \tau_0$ with $\tau_0$ being the largest lag of interest in an auto-covariance function.
2. Compute the wavelet coefficients $d_j^m$'s through the wavelet transform using Equ. (2).
3. Obtain the sample variance at the $j$-th time scale through $\hat{\sigma}_j^2 = \frac{1}{n_j} \sum_{m=1}^{n_j} (d_j^m)^2$.
4. Use $\hat{\sigma}_j^2$ as an estimate of $\sigma_j^2$ in Equ. (9) to obtain $R_K(\tau)$ as an approximation for $R(\tau)$.

B. Experimental Results

A VBR video trace (“Star Wars”) and an Ethernet data trace (pAug) were used to test the results on average auto-covariance functions. The average auto-correlation functions are obtained through the algorithm presented in Section V-A. Sample auto-covariance functions and average auto-covariance functions of the VBR video and Ethernet data are plotted in Fig. 5 and Fig. 6, respectively. Fig. 7 and Fig. 8 illustrate the squared difference between the sample auto-covariance function and the average auto-covariance functions evaluated for the two traces, respectively. As Fig. 5 and Fig. 6 indicate, the average auto-covariance function matches the sample auto-covariance function very well, which suggests that the independent wavelet models have the ability to faithfully model the second-order statistics of the real traffic, and that the algorithm is feasible.

Fig. 5. Auto-correlation functions for “Star Wars” versus lags. Solid line: sample auto-correlation function; dashed line: average auto-correlation function.

Fig. 6. Auto-correlation functions for Bellcore trace pAug versus lags. Solid line: sample auto-correlation function; dashed line: average auto-correlation function.

Fig. 7. Squared error between sample auto-correlation and average auto-correlation: “Star Wars”.

Fig. 8. Squared error between sample auto-correlation and average auto-correlation: pAug.

C. Efficiency of Independent Wavelet Models

The efficiency of the independent wavelet models can be characterized in terms of (1) the number of parameters in $R_K(\tau)$, and (2) the computation time needed to estimate the parameters (and to generate the synthesized traffic of length $N$). $K$ is the number of parameters in $R_K(\tau)$. Let $\tau_0$ be the largest lag of our interest. Then $K > \log \tau_0$, since the basis function $h_j(\tau)$ has limited support, and the time scale needs to be sufficiently large to have some $h_j(\tau_0) \ne 0$. Meanwhile, if $N$ is the length of a trace from which sample variances are estimated, then $K \le \log N$. Therefore, $K$ satisfies

$$\log \tau_0 < K \le \log N.$$

The computation time needed to obtain the sample variances is at most $O(N)$. The computational complexity needed to generate synthesized traffic of length $N$ is also $O(N)$ by independent wavelet models [6][15]. Therefore, independent wavelet models (in terms of average auto-covariance functions) have $O(\log N)$ parameters, and computational complexity $O(N)$. This shows that independent wavelet models are efficient.

VI. Conclusion

Based on average auto-covariance functions of the independent wavelet models we defined in this work, we have derived a simple closed-form expression for the average auto-covariance functions. Using such a simple expression, we have been able to develop the theory on approximation capability of the independent wavelet models for heterogeneous traffic in terms of the decay rate of auto-covariance functions at large lags. We have been able to show that the average auto-covariance function of an independent wavelet model decays exponentially in terms of a rate function. Such a rate function characterizes the growth rate of variances of wavelet coefficients. By specifying different forms of rate functions, the auto-covariance functions of the independent wavelet models have exhibited the same decay rate as that of a versatile class of heterogeneous traffic. We have also derived a simple algorithm based on the theory to model auto-covariance functions directly, and shown that our model is computationally efficient.

The theory we have established so far is based on the auto-covariance function. As auto-covariance functions alone are not sufficient to capture all significant statistics in network traffic, we will investigate in our future work how to evaluate the approximation capability of independent wavelet models through buffer loss rate, to account for higher-order statistics and effects of non-Gaussian traffic.
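The four steps of the algorithm in Section V-A can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the authors' implementation: the Haar transform is computed with an orthonormal pyramid (differences and sums scaled by $1/\sqrt{2}$), the trace mean is removed beforehand in line with the $\phi_0 = 0$ convention, the trace length is a power of two, and all function names are hypothetical.

```python
import numpy as np


def haar_coefficients(trace):
    """Step 2: Haar wavelet coefficients d_j^m, keyed by time scale j (j = 1 is finest).

    Assumption: orthonormal pyramid; samples that do not fill a pair are dropped.
    """
    approx = np.asarray(trace, dtype=float)
    coeffs = {}
    j = 1
    while approx.size >= 2:
        approx = approx[: approx.size // 2 * 2]
        first, second = approx[0::2], approx[1::2]
        coeffs[j] = (first - second) / np.sqrt(2.0)  # +1 on first half, -1 on second
        approx = (first + second) / np.sqrt(2.0)
        j += 1
    return coeffs


def sample_variances(coeffs):
    """Step 3: sample variance of the zero-mean coefficients at each scale."""
    return {j: float(np.mean(d ** 2)) for j, d in coeffs.items()}


def h(j, tau):
    """Closed-form temporal correlation of the Haar basis, Equ. (11)."""
    T = 2.0 ** j
    if 0 <= tau < T / 2:
        return 1.0 - 3.0 * tau / T
    if T / 2 <= tau < T:
        return tau / T - 1.0
    return 0.0


def average_autocovariance(sigma2_hat, tau):
    """Step 4: R_K(tau) from Equ. (9) with estimated variances."""
    return sum(s2 * 2.0 ** (-j) * h(j, tau) for j, s2 in sigma2_hat.items())


# Step 1: a synthetic stand-in for a traffic trace (length 256, mean removed).
rng = np.random.default_rng(0)
trace = rng.standard_normal(256)
trace -= trace.mean()

sigma2_hat = sample_variances(haar_coefficients(trace))
r_k = [average_autocovariance(sigma2_hat, tau) for tau in range(0, 64)]
```

For a zero-mean trace whose length is a power of two, $R_K(0)$ computed this way equals the mean squared value of the trace, by the energy preservation of the orthonormal transform. The sketch stores only $K = \log_2 N$ variances and touches each sample a bounded number of times, in line with the parameter count and $O(N)$ cost discussed in Section V-C.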
References

[1] H. Heffes and D. Lucantoni, “A Markov modulated characterization of voice and data traffic and related statistical multiplexer performance,” IEEE J. Select. Areas Commun., vol. 4, pp. 856–867, 1986.
[2] D. Cohen and D. Heyman, “A simulation study of video teleconferencing traffic in ATM networks,” in Proc. IEEE INFOCOM’93, (San Francisco, CA), 1993.
[3] W. Leland, M. Taqqu, W. Willinger, and D. Wilson, “Self-similarity in high-speed packet traffic: Analysis and modeling of Ethernet traffic measurements,” Statistical Science, vol. 10, no. 1, pp. 67–85, 1995.
[4] M. M. Krunz and A. M. Makowski, “Modeling video traffic using M/G/∞ input processes: a compromise between Markovian and LRD models,” IEEE J. Select. Areas Commun., vol. 16, pp. 733–748, June 1998.
[5] M. W. Garrett and W. Willinger, “Analysis, modeling and generation of self-similar VBR video traffic,” in Proc. ACM SIGCOMM’94, (London, UK), pp. 269–279, 1994.
[6] S. Ma and C. Ji, “Modeling video traffic in the wavelet domain,” in Proc. IEEE INFOCOM’98, (San Francisco, CA), April 1998.
[7] D. Tse, R. Gallager, and J. Tsitsiklis, “Statistical multiplexing of multiple time-scale Markov streams,” IEEE J. Select. Areas Commun., vol. 43, pp. 1566–1579, 1995.
[8] A. T. Andersen and B. F. Nielsen, “An application of superpositions of two state Markovian sources to the modeling of self-similar behavior,” in Proc. IEEE INFOCOM’97, (Kobe, Japan), pp. 196–204, 1997.
[9] D. Cox, “Long-range dependence: A review,” in Statistics: An Appraisal (H. D. David and H. T. David, eds.), pp. 55–74, The Iowa State University Press, 1984.
[10] C. Huang, M. Devetsikiotis, I. Lambadaris, and A. Kaye, “Modeling and simulation of self-similar variable bit rate compressed video: A unified approach,” in Proc. ACM SIGCOMM’96, (San Francisco, CA), pp. 114–125, 1996.
[11] V. Frost and B. Melamed, “Traffic modeling for telecommunications networks,” IEEE Communications Magazine, vol. 32, pp. 70–80, 1994.
[12] G. Box and G. Jenkins, Time Series Analysis, forecasting and control. Holden-Day, 1976.
[13] A. T. Andersen and B. F. Nielsen, “A Markovian approach for modeling packet traffic with long-range dependence,” IEEE J. Select. Areas Commun., vol. 16, pp. 719–732, June 1998.
[14] M. Krunz and S. K. Tripathi, “Exploiting the temporal structure of MPEG video for the reduction of bandwidth requirements,” in Proc. IEEE INFOCOM’97, (Kobe, Japan), 1997.
[15] S. Ma, Traffic modeling and analysis. PhD thesis, Department of Electrical, Computer and Systems Engineering, Rensselaer Polytechnic Institute, 1998.
[16] I. Daubechies, Ten Lectures on Wavelets. Philadelphia: SIAM, 1992.
[17] G. Wornell and A. Oppenheim, “Wavelet-based representations for a class of self-similar signals with application to fractal modulation,” IEEE Trans. Inform. Theory, vol. 38, pp. 785–800, 1992.
[18] W. Leland, M. Taqqu, W. Willinger, and D. Wilson, “On the self-similar nature of Ethernet traffic,” IEEE/ACM Trans. Networking, vol. 2, pp. 1–15, February 1994.
[19] L. Kaplan and C. Kuo, “Fractal estimation from noisy data via discrete fractional Gaussian noise (DFGN) and the Haar basis,” IEEE Trans. Inform. Theory, vol. 41, no. 12, pp. 3554–3562, 1993.
Appendix A: Deriving The Analytical Expression for Average Auto-Covariance Functions

By manipulating the expression for $R_K(\tau)$, we have

$$R_K(\tau) = \sum_{j=1}^{K} \sigma_j^2 \sum_{m_1=0}^{2^{K-j}-1} \frac{1}{T_K} \sum_{t=0}^{T_K-1} \phi_j^{m_1}(t)\, \phi_j^{m_1}(t+\tau) = \sum_{j=1}^{K} \sigma_j^2\, \frac{1}{T_j} \sum_{t=0}^{T_j-1} \phi_j^0(t)\, \phi_j^0(t+\tau), \tag{15}$$

where $T_j = 2^j$. The last step in the above equation is obtained by using the fact that

$$\sum_{t=0}^{T_j-1} \phi_j^0(t)\, \phi_j^0(t+\tau) = \sum_{t=0}^{T_K-1} \phi_j^m(t)\, \phi_j^m(t+\tau),$$

for $0 \le m \le 2^{K-j} - 1$. Define the (deterministic) temporal correlation, $h_j(\tau)$, of Haar basis functions as

$$h_j(\tau) = \sum_{t=0}^{T_j-1} \phi_j^0(t)\, \phi_j^0(t+\tau).$$

Then the average auto-correlation function in Equ. (15) can be rewritten as:

$$R_K(\tau) = \sum_{j=1}^{K} \sigma_j^2\, 2^{-j} h_j(\tau). \tag{16}$$

The average auto-correlation function of an independent wavelet model with unlimited resolution can be expressed as

$$R(\tau) = \sum_{j=1}^{+\infty} \sigma_j^2\, 2^{-j} h_j(\tau). \tag{17}$$

For Haar basis functions, $h_j(\tau)$ is shown in Equ. (11). Inserting $h_j(\tau)$ into the expressions for $R_K(\tau)$ and $R(\tau)$, we have the analytical expressions for both $R_K(\tau)$ and $R(\tau)$.

Appendix B: Proof of The Main Theorem

The proof of the theorem consists of two steps: (1) to derive a lower bound, and (2) to derive an upper bound. Within each step we need to derive the bound using the two expressions of $\sigma_j^2$'s.

1. Deriving The Lower Bound

Let $k_\tau = \lfloor \log_2 \tau \rfloor$, for any $\tau \ge 2$. By the definition of $R(\tau)$, we have

$$R(\tau) = \sum_{j=1}^{k_\tau} \sigma_j^2\, 2^{-j} h_j(\tau) + \sum_{j=k_\tau+1}^{\infty} \sigma_j^2\, 2^{-j} h_j(\tau).$$

Since $h_j(\tau) = 0$ for $\tau > T_j$ ($T_j = 2^j$),

$$\sum_{j=1}^{k_\tau} \sigma_j^2\, 2^{-j} h_j(\tau) = 0.$$

Furthermore, from the expression of $h_j(\tau)$, Equ. (11), we have

$$h_{k_\tau+l}(\tau) = \begin{cases} \tau\, 2^{-(k_\tau+1)} - 1 & l = 1, \\ 1 - 3\tau\, 2^{-(k_\tau+l)} & l \ge 2. \end{cases} \tag{18}$$

Then

$$R(\tau) = \sigma_{k_\tau+1}^2\, 2^{-(k_\tau+1)} \big(\tau\, 2^{-(k_\tau+1)} - 1\big) + \sum_{j=k_\tau+2}^{\infty} \sigma_j^2\, 2^{-j} \big(1 - 3\tau\, 2^{-j}\big). \tag{19}$$

Since $\sigma_j^2$ is increasing,

$$\sum_{j=k_\tau+2}^{\infty} \sigma_j^2\, 2^{-j} \big(1 - 3\tau\, 2^{-j}\big) \ge \sigma_{k_\tau+2}^2 \sum_{j=k_\tau+2}^{\infty} 2^{-j} \big(1 - 3\tau\, 2^{-j}\big) = \sigma_{k_\tau+2}^2 \big(2^{-(k_\tau+1)} - \tau\, 2^{-(2k_\tau+2)}\big). \tag{20}$$

Putting Equ. (20) into (19), we have

$$R(\tau) \ge O\Big(\frac{\sigma_{k_\tau+2}^2 - \sigma_{k_\tau+1}^2}{\tau}\Big).$$

To show the lower bound is as given in the theorem, we need to show next that $\frac{\sigma_{k_\tau+2}^2 - \sigma_{k_\tau+1}^2}{\tau} = O(2^{-f(a\tau)})$, where $a \in [2, 4]$, using the two expressions of $\sigma_j^2$.

(a) $\sigma_j^2 = A - C\, 2^{-f(2^j)+j}$

$$\frac{1}{\tau}\big(\sigma_{k_\tau+2}^2 - \sigma_{k_\tau+1}^2\big) = \frac{C}{\tau}\Big[2^{-f(2^{k_\tau+1})+(k_\tau+1)} - 2^{-f(2^{k_\tau+2})+(k_\tau+2)}\Big] = O\Big(2 \times 2^{-f(2\tau 2^{-\delta})} - 4 \times 2^{-f(4\tau 2^{-\delta})}\Big)$$
where 0 ≤ δ < 1. Define g(x; z) = x 2−f(xz) for x > 0 and −δ −δ z > 0 large. Let γl = 2 × 2−f(2τ 2 ) − 4 × 2−f(4τ 2 ) , which is
2. Deriving The Upper Bound j
(a) σj2 = A − C 2−f(2 )+j Inserting the given σj2 into the expression of R(τ ), we have
γl = g(2; z) − g(4; z), −δ
where z = τ 2 . Since g(x; z) is continuous and differentiable with respect to x, by the mean value theorem, we have
R(τ ) = A
+∞ X
2−j hj (τ ) − C
It can be easily shown that A
−C
where a ∈ [2, 4], and g′ (a; z) = 2−f(xz) [1 − xzf ′ (xz) ln 2] |x=a
−g′ (a; z) = O(2−f(aτ) ),
+∞ X
P+∞ j=1
2−j hj (τ ) = 0, and
j
2−f(2 ) hj (τ )
j=1
=
Since for τ sufficiently large (and thus xz sufficiently large), the conditions on f(xz) ensures 1 − xzf ′ (xz) ln 2 < 0, and 2−f(xz) | 1 − xzf ′ (xz) ln 2 | is O(2−f(xz) ). Then
j
2−f(2 ) hj (τ ).
j=1
j=1
g(2; z) − g(4; z) = (2 − 4)g′ (a; z),
+∞ X
C2
−f(2kτ +1 ) +∞ X
C
(1 − τ 2−kτ −1 ) − j
2−f(2 ) (1 − 3τ 2−j )
j=kτ +2
≤ ≤
kτ +1
) C 2−f(2 O(2−f(bτ) ),
where b ∈ [1, 2]. This expression is obtained using the fact that 0 < 1 − 3τ 2−j < 1 and 0 < 1 − τ 2−kτ +1 < 1. Therefore, we have the upper bound
i.e., R(τ ) ≤ O(2−f(bτ) ). γl = O(2−f(aτ) ). j
(b)σj2 = B 2−f(2 )+j Using similar derivations, we have the lower bound
j
(b) σj2 = B 2−f(2 )+j Inserting the given σj2 into the expression of R(τ ), we have R(τ )
γl = (4 − 2)[g(4; z) − g(2; z)].
=
Using the mean-value theorem again, we have g(4; z) − g(2; z) = (4 − 2)g′ (a; z),
≤
kτ +1
) B 2−f(2 (τ 2−(kτ +1) − 1) + ∞ X j B 2−f(2 ) (1 − 3τ 2−j )
B
j=kτ +2 ∞ X
j
2−f(2 ) (1 − 3τ 2−j )
j=kτ +2
where a ∈ [2, 4].
≤
g′ (a; z) = 2−f(xz) [1 − xzf ′ (xz) ln 2]. Since for τ sufficiently large, now we have 1 − 1 ′ xzf ′ (xz) ln 2 > 0 (due to the given condition y ln 2 > f (y), and 2−f(xz) [1 − xzf ′ (xz) ln 2] = O(2−f(xz) ), g′ (a; z) = 2−f(xz) [1 − xzf ′ (xz) ln 2] |x=a = O(2−f(az) ). Then the rest of the derivation will be the same as that of Case (a), and the same lower bound can be obtained. Putting the two case together, we have γl = O(2−f(aτ) ) for a ∈ [2, 4], and τ sufficiently large.
O(2−f(2
kτ +2
)
),
where the last expression is obtained using conditions on f(2j ), and 1 − 3τ 2−j = O(1). Then R(τ ) ≤ O(2−f(bτ) ), for b ∈ [1, 2]. Combining the lower and upper bounds, the theorem is proved. Q.E.D.
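
The quantities used in the two appendices can be checked numerically. The following is a minimal sketch, not part of the original derivation. It assumes that the Haar temporal correlation takes the piecewise form of Equ. (18), with h_j(τ) = 0 for τ ≥ T_j, and it truncates the infinite sum of Equ. (17) at a hypothetical cutoff `j_max`. The function names and the rate function f(y) = β log2(y) are illustrative choices only; whether that f meets every condition of the theorem is not verified here.

```python
import math


def haar_h(j, tau):
    """h_j(tau) for a unit-energy Haar wavelet with support T_j = 2**j.

    Assumed piecewise form, taken from Equ. (18), with h_j(tau) = 0 for tau >= T_j.
    """
    T = 2.0 ** j
    if tau >= T:
        return 0.0
    if tau >= T / 2:
        return tau / T - 1.0
    return 1.0 - 3.0 * tau / T


def avg_autocov(sigma2, tau, j_max=200):
    """Truncated R(tau) = sum_{j=1}^{j_max} sigma2(j) * 2^{-j} * h_j(tau), cf. Equ. (17)."""
    return sum(sigma2(j) * 2.0 ** (-j) * haar_h(j, tau) for j in range(1, j_max + 1))


def tail_sum(tau, j_max=200):
    """Truncated sum_{j >= k+2} 2^{-j} (1 - 3 tau 2^{-j}), with k = floor(log2 tau)."""
    k = math.floor(math.log2(tau))
    return sum(2.0 ** (-j) * (1.0 - 3.0 * tau * 2.0 ** (-j))
               for j in range(k + 2, j_max + 1))


def tail_closed_form(tau):
    """Closed form on the right of Equ. (20): 2^{-(k+1)} - tau * 2^{-(2k+2)}."""
    k = math.floor(math.log2(tau))
    return 2.0 ** (-(k + 1)) - tau * 2.0 ** (-(2 * k + 2))


if __name__ == "__main__":
    beta = 0.5  # illustrative exponent, an assumption of this sketch

    def sigma2_power_law(j):
        # Case (b) with f(y) = beta * log2(y): sigma_j^2 = 2^{-f(2^j) + j} = 2^{(1 - beta) j}
        return 2.0 ** ((1.0 - beta) * j)

    for tau in (2, 5, 8, 16, 100):
        constant_case = avg_autocov(lambda j: 1.0, tau)        # expected to be close to 0
        gap = tail_sum(tau) - tail_closed_form(tau)            # expected to be close to 0
        ratio = avg_autocov(sigma2_power_law, 2 * tau) / avg_autocov(sigma2_power_law, tau)
        print(tau, constant_case, gap, ratio)
```

With constant variances the sum vanishes, which is the identity used in Case (a) of the upper bound. With the assumed power-law variances, doubling the lag scales R(τ) by about 2^{-β}, which is the polynomial decay 2^{-f(bτ)} that the bounds describe for this choice of f.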