Adaptive Random Sampling for Load Change Detection [Extended Abstract]∗
Baek-Young Choi, Jaesung Park, Zhi-Li Zhang
University of Minnesota, 200 Union St. SE, Minnesota, MN 55414
ABSTRACT
Timely detection of changes in traffic load is critical for initiating appropriate traffic engineering mechanisms. Accurate measurement of traffic is essential, since the efficacy of change detection depends on the accuracy of traffic estimation. However, precise traffic measurement involves inspecting every packet traversing a link, which results in significant overhead, particularly on high-speed links. Sampling techniques for traffic load estimation have been proposed as a way to limit the measurement overhead. In this paper, we address the problem of bounding the sampling error within a pre-specified tolerance level and propose an adaptive random sampling technique that determines the minimum sampling probability adaptively according to traffic dynamics. Using real network traffic traces, we show that the proposed adaptive random sampling technique indeed produces the desired accuracy, while also yielding a significant reduction in the number of traffic samples. We also investigate the impact of sampling errors on the performance of load change detection.
Categories and Subject Descriptors
C.2.3 [Network Operations]

General Terms
Measurement

Keywords
Sampling, change detection

1. INTRODUCTION
Network traffic may fluctuate frequently and often unexpectedly for various reasons, such as transitions in user behavior and failures of network elements. Timely detection of such changes in traffic is critical for initiating appropriate traffic engineering mechanisms. The performance of a change detection algorithm depends on the accuracy of traffic measurement. However, inspecting every packet traversing a link to obtain the exact traffic load impairs the processing capacity of a router. Therefore, sampling techniques that estimate traffic accurately with minimal measurement overhead are needed. In this paper, we develop an adaptive random sampling technique for load change detection, using sampled traffic measurement as a scalable solution for traffic measurement on today's and future high-speed links. Our adaptive random sampling technique differs from existing sampling techniques for traffic measurement in that it bounds the sampling error within a pre-specified error tolerance level. Bounding the estimation error is important for further analysis of traffic, such as change point detection. We analyze and verify the proposed adaptive random sampling technique with real traces and investigate the impact of sampling errors on the performance of traffic load change detection. In Section 2, we formally state the problem addressed in this paper. In Section 3, the adaptive random sampling technique is described and applied to a change point detection algorithm. Section 4 concludes the paper.

∗A full version of this paper is available at www.cs.umn.edu/~choiby/papers/ars.ps

2. PROBLEM STATEMENT
Assume that there are $m$ packets arriving in a block, and let $X_i$ be the size of the $i$th packet. Hence the traffic load of this block is

$$V = \sum_{i=1}^{m} X_i.$$

To estimate the traffic load of the block, suppose we randomly sample $n$ of the $m$ packets; in other words, each packet is sampled with equal probability $p = n/m$. Let $\hat{X}_j$, $j = 1, 2, \ldots, n$, denote the size of the $j$th sampled packet. Then the traffic load $V$ can be estimated from the samples by

$$\hat{V} = \frac{m}{n} \sum_{j=1}^{n} \hat{X}_j.$$

Our objective is to bound the relative error $|\hat{V} - V|/V$ by $\varepsilon$ with probability $1 - \eta$. Let $\mu$ and $\sigma$ be, respectively, the population mean and standard deviation of the packet size distribution in a block, so that $V = m\mu$ and $\hat{V}/m$ is the sample mean. By the central limit theorem for random samples [1], as the sample size $n \to \infty$, the distribution of the sample mean approaches a normal distribution with mean $\mu$ and variance $\sigma^2/n$, regardless of the distribution of the population. Thus the problem of bounding the estimation error can be written as

$$\Pr\left\{ \frac{|\hat{V} - V|}{V} > \varepsilon \right\} \approx 2\left[ 1 - \Phi\!\left( \frac{\varepsilon \mu \sqrt{n}}{\sigma} \right) \right] \le \eta, \qquad (1)$$

where $\Phi(\cdot)$ is the cumulative distribution function (c.d.f.) of the standard normal distribution. Hence, to satisfy the given error tolerance level, the number of packet samples must satisfy

$$n \ge n^{*} = \left( \frac{z_p}{\varepsilon} \right)^{2} S, \qquad (2)$$

where $z_p = \Phi^{-1}(1 - \eta/2)$ is a constant determined by the given error tolerance level, and $S = (\sigma/\mu)^2$ is the squared coefficient of variation (SCV) of the packet size distribution in a block. Eq. (2) concisely relates the minimum number of packet samples to the estimation accuracy and the variability in packet sizes. In particular, it states that the minimum required number of packet samples, $n^*$, is linearly proportional to the SCV of the packet size distribution in a block.

[Figure 1: Traffic volume estimations and relative error ({η, ε} = {0.1, 0.1}, B = 300 sec). Panels: traffic volume over time for the original volume, optimal sampling, and adaptive random sampling; error probability versus relative error for optimal sampling and adaptive random sampling.]

[Figure 2: Detection statistics for population estimated traffic loads with different accuracy. Panels: original trace; Dstat for the original traffic; panels for {η, ε} = {0.05, 0.05} and {η, ε} = {0.20, 0.25}.]
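Eq. (2) can be evaluated directly when the size of every packet in a block is known, which corresponds to the ideal optimal sampling case; in practice these block parameters are not known in advance. The Python sketch below is an illustration under stated assumptions, not the authors' implementation: the helper names `min_samples`, `estimate_load`, and `error_rate` are hypothetical, and the packet-size mix of the block is invented for the example rather than taken from the paper's traces. It computes $n^*$ and then checks Eq. (1) empirically by repeatedly drawing $n^*$ packets uniformly without replacement and counting how often the relative error exceeds $\varepsilon$.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def min_samples(sizes, eta, eps):
    """Eq. (2): n* = (z_p / eps)^2 * S, where S is the SCV of packet sizes in the block."""
    z_p = NormalDist().inv_cdf(1 - eta / 2)  # z_p = Phi^{-1}(1 - eta/2)
    mu, sigma = fmean(sizes), pstdev(sizes)   # population mean and std. dev. of the block
    scv = (sigma / mu) ** 2                   # squared coefficient of variation S
    return math.ceil((z_p / eps) ** 2 * scv)


def estimate_load(sizes, n, rng):
    """V_hat = (m / n) * (sum of n packet sizes sampled uniformly without replacement)."""
    return len(sizes) / n * sum(rng.sample(sizes, n))


def error_rate(sizes, n, eps, trials, seed=0):
    """Fraction of trials in which |V_hat - V| / V > eps (the probability in Eq. (1))."""
    rng = random.Random(seed)
    v = sum(sizes)
    misses = sum(abs(estimate_load(sizes, n, rng) - v) / v > eps for _ in range(trials))
    return misses / trials


# Hypothetical block of m = 10,000 packets (illustrative only, not from the paper).
block = [40] * 6000 + [576] * 1000 + [1500] * 3000
n_star = min_samples(block, eta=0.1, eps=0.1)
print(n_star)  # 409
print(error_rate(block, n_star, eps=0.1, trials=2000))
```

With $\{\eta, \varepsilon\} = \{0.1, 0.1\}$, the measured error rate should come out close to $\eta = 0.1$, typically slightly below it, because sampling without replacement from a finite block has a little less variance than the normal approximation in Eq. (1) assumes. A fixed seed keeps the check reproducible.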
3. ADAPTIVE RANDOM SAMPLING AND ITS APPLICATION TO LOAD CHANGE DETECTION
In this section, we briefly outline the adaptive random sampling technique and the impact of sampling errors on the performance of traffic load change detection. Interested readers can refer to the full version of the paper [3] for a detailed analysis and description. We observe that network traffic fluctuates significantly over time; hence, the optimal sampling probability also varies over time. Note that to determine the optimal sampling probability, we need to know the actual SCV of the packet size distribution and the packet count in a block. Unfortunately, in practice these traffic parameters of a block are unknown at the time the sampling probability must be determined. To address this problem, we employ an Auto-Regressive (AR) model to predict these parameters from past sampled measurements. Note that in addition to estimation errors due to sampling, the prediction model also introduces prediction errors. We quantify and analyze the impact of these errors on the traffic load estimation and discuss how they can be controlled. Using the central limit theorem for a sum of a random number of random variables (p. 369 in [2]), we establish the following.

THEOREM 1. With probability $1 - \eta$, the relative error in estimating the traffic load $V_k$ of the $k$th block satisfies

$$\left| \frac{\tilde{V}_k - V_k}{V_k} \right| \le \varepsilon + \frac{1}{\sqrt{z_p}}\,(1 + \varepsilon)\,Y, \qquad (3)$$

where $Y \sim N(0, 1)$.

Theorem 1 yields a theoretical bound on the variance of adaptive random sampling, namely $(1+\varepsilon)^2 / z_p$, with probability $1 - \eta$. Furthermore, it shows that the variance of adaptive random sampling is independent of the distribution of the objects being sampled and is controllable by the accuracy parameters. Figure 1 shows the time series of the original traffic load and the estimated traffic loads using both the ideal optimal sampling and adaptive random sampling with prediction. For both sampling methods, the traffic load estimation indeed conforms to the pre-specified accuracy parameters, i.e., the probability of a relative error larger than ε = 0.1 is around η = 0.1.

Based on the time series of estimated traffic loads, we employ the non-parametric change point detection algorithm based on singular spectrum analysis (SSA) developed in [4]. Figure 2 illustrates the impact of sampling errors on the performance of the load change detection algorithm. We used a 99% significance level as the detection threshold. The detection statistics obtained with different accuracy parameters are compared against those of the original traffic, shown in the top plot. Non-detections and false alarms are observed for the traffic load estimates obtained with loose accuracy parameters. This shows that bounding the estimation error is critical in traffic load change detection.

4. CONCLUSIONS
In this paper, we proposed an adaptive random sampling technique that bounds the sampling error within a pre-specified tolerance level while minimizing the number of samples. We experimented with real traffic traces and demonstrated that the proposed adaptive random sampling indeed conforms to the pre-specified accuracy parameters. We also investigated the impact of sampling error on the performance of a change detection algorithm and illustrated the importance of bounding the estimation error.

5. REFERENCES
[1] D. A. Berry and B. W. Lindgren. Statistics: Theory and Methods, 2nd ed. Duxbury Press, ITP, 1996.
[2] P. Billingsley. Convergence of Probability Measures. Wiley, New York, 1968.
[3] B.-Y. Choi, J. Park, and Z.-L. Zhang. Adaptive random sampling for load change detection. Technical Report, University of Minnesota, Nov. 2001.
[4] V. Moskvina and A. Zhigljavsky. Change-point detection algorithm based on the singular-spectrum analysis. Preprint, School of Mathematics, Cardiff University, CF24 4YH, UK, 2001.