Fitting Univariate Distributions to Computer Network Traffic Data Using GUI

Petar Čisar* and Sanja Maravić Čisar**

* Academy of Criminalistic and Police Studies, Belgrade-Zemun, Serbia
** Subotica Tech, Department of Informatics, Subotica, Serbia
[email protected], [email protected]

Abstract - The available literature does not agree on which type(s) of probability distribution best model computer network traffic. The statistical analysis presented in this paper demonstrates the use of a graphical interface for fitting univariate distributions to authentic network traffic data. The analysis is carried out in a Matlab-based GUI, using the Distribution Fitting Tool.

I. INTRODUCTION

A random variable is a variable (typically denoted by x) whose single numerical value is determined by chance. A probability distribution is a graph, table or formula that gives the probability of each value of the random variable. A univariate distribution is a probability distribution of one random variable.

Discrete distributions - If x is a random variable, then P(x) denotes the probability that x occurs. It must hold that 0 ≤ P(x) ≤ 1 for each value of x and

$$\sum_j p_j = 1$$

where j runs over all possible values that x can take and p_j is the probability at x_j.

Figure 1. Discrete distribution [8]

Continuous distributions - Mathematically, a continuous probability function f(x) is a function that satisfies the following properties:

1. The probability that x is between two points a and b is

$$p[a \le x \le b] = \int_a^b f(x)\,dx$$

2. It is non-negative for all real x.

3. The integral of the probability function is one, that is

$$\int_{-\infty}^{+\infty} f(x)\,dx = 1$$

Figure 2. Continuous distribution [8]

Since continuous probability functions are defined for an infinite number of points over a continuous interval, the probability at a single point is always zero. Probabilities are measured over intervals, not at single points: the area under the curve between two distinct points defines the probability for that interval. For this reason the height of the probability function can be greater than one. The property that the integral must equal one corresponds to the property of discrete distributions that the sum of all the probabilities must equal one.

Fitting a distribution consists of finding a mathematical function that represents a statistical variable well. A common situation in statistics is the following: there are observations of a quantitative character x1, x2, … xn, and the task is to test whether those observations, being a sample from an unknown population, belong to a population with a probability density function (pdf) f(x,q), where q is a vector of parameters to be estimated from the available data. In Matlab, pdfs are evaluated with the appropriate parameters. Each supported pdf represents a parametric family of distributions. The input arguments are arrays of outcomes followed by a list of parameter values specifying a particular member of the distribution family.
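As a rough illustration of these ideas, the following Python sketch (not the Matlab tool used in this paper) estimates the parameter vector q = (mean, standard deviation) of a normal family from a small sample and then checks the properties of the resulting pdf numerically. The sample values and the helper `integrate` are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical observations x1..xn (not data from the paper).
sample = [0.42, 0.51, 0.47, 0.55, 0.49, 0.53, 0.46, 0.50]

# Estimate q = (mean, standard deviation) of a normal family from the sample.
fitted = NormalDist.from_samples(sample)


def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


# The area under the pdf over (effectively) the whole real line should be 1.
lo = fitted.mean - 8 * fitted.stdev
hi = fitted.mean + 8 * fitted.stdev
total_area = integrate(fitted.pdf, lo, hi)

# The height of the pdf may exceed 1 even though the total area is 1.
peak_height = fitted.pdf(fitted.mean)

# The probability of an interval is the area under the pdf between its ends.
prob_interval = fitted.cdf(0.55) - fitted.cdf(0.45)
area_interval = integrate(fitted.pdf, 0.45, 0.55)

print(f"total area    = {total_area:.6f}")
print(f"peak height   = {peak_height:.3f}")
print(f"P(0.45..0.55) = {prob_interval:.6f} (area: {area_interval:.6f})")
```

Because the hypothetical sample has a small spread, the fitted density is tall and narrow, so its peak lies well above one while the area under it stays at one.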

II. COMPUTER NETWORK TRAFFIC DISTRIBUTIONS

The different computer network traffic models each have their own advantages and disadvantages. The type of network under observation and the traffic characteristics largely determine the choice of the traffic model used for analysis. Traffic models that cannot capture or describe the statistical characteristics of the actual traffic on the network are to be avoided, since choosing such models results in under-estimation or over-estimation of network performance. No single model can be used universally for modeling traffic in all types of networks. For heavy-tailed traffic, it can be shown that the Poisson distribution model under-estimates the traffic [1]. In high-speed networks with unexpected demand on packet transfers, Pareto distribution based traffic models are acceptable, since they take into consideration the long-term correlation in packet arrival times [2]. Markov models, although mathematically correct, fail to fit the actual traffic of high-speed networks. The available literature is not unanimous about which type(s) of probability distribution best model network traffic. For example, the uniform, Poisson, lognormal (Figure 3), Pareto and Rayleigh distributions have been used in different applications.
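To give a feel for why a light-tailed model can under-estimate heavy-tailed traffic, the following Python sketch compares the tail probabilities P(X > x) of an exponential distribution (the inter-arrival time distribution of a Poisson process) and a Pareto distribution with the same mean. The parameter values are hypothetical, and the comparison is only an illustration, not the analysis carried out in [1].

```python
import math


def exponential_tail(x, mean):
    """P(X > x) for an exponential distribution with the given mean."""
    return math.exp(-x / mean)


def pareto_tail(x, shape, scale):
    """P(X > x) for a Pareto distribution (shape > 0, scale > 0)."""
    return 1.0 if x < scale else (scale / x) ** shape


# Hypothetical parameters, chosen so that both distributions have the same mean.
shape, scale = 1.5, 1.0
pareto_mean = shape * scale / (shape - 1)  # formula valid only for shape > 1

for x in (3, 10, 30, 100, 300):
    exp_p = exponential_tail(x, pareto_mean)
    par_p = pareto_tail(x, shape, scale)
    print(f"x = {x:>3}: exponential {exp_p:.3e}   Pareto {par_p:.3e}")
```

The exponential tail decays exponentially while the Pareto tail decays only as a power of x, so for sufficiently large thresholds the Pareto model assigns far more probability to extreme values.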

Figure 3. Network traffic distribution [3]

III. NETWORK TRAFFIC DATA

For the analysis of network traffic curves, this research uses daily, weekly and monthly traffic graphs of several larger Internet users, produced by the popular network software MRTG (Multi Router Traffic Grapher). Without loss of generality, the curves of three users are presented below; the observed traffic curves of other users do not deviate significantly from the forms shown here [4].

Figure 4. Traffic curves of different users (User 1, User 2 and User 3; daily, weekly and monthly)

The daily outgoing traffic of a typical user (in this case for Tuesday, September 21, 2010) is taken as an example, in which the following four characteristic intervals can be identified (Figure 5): 02-06 h (night traffic), 06-10 h (morning traffic), 10-22 h (daily traffic) and 22-02 h (night traffic).
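The four intervals can be expressed as a simple lookup, sketched here in Python. The function name `traffic_interval` and the treatment of the boundaries as half-open (each interval includes its starting hour and excludes its ending hour) are assumptions made for illustration; the text does not state how boundary hours are assigned.

```python
def traffic_interval(hour):
    """Map an hour of day (0-23) to one of the four intervals in the text.

    Assumption: intervals are half-open, e.g. 10-22 h covers hours 10..21.
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    if 2 <= hour < 6:
        return "night (02-06 h)"
    if 6 <= hour < 10:
        return "morning (06-10 h)"
    if 10 <= hour < 22:
        return "daily (10-22 h)"
    return "night (22-02 h)"  # hours 22, 23, 0 and 1


# Hours of the day whose hourly averages fall into the 10-22 h interval.
daily_hours = [h for h in range(24) if traffic_interval(h).startswith("daily")]
print(daily_hours)
```

Under this assumption, twelve hourly averages per day fall into the 10-22 h interval.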

Figure 5. Daily traffic curve

Because the monitoring software PRTG [5] can also provide numeric values (Figure 6), 349 consecutive hourly averages were taken for the first 15 days of the monthly period (Aug 24 - Sep 06).

Figure 6. PRTG - Traffic samples (example)

This paper uses 144 samples of daily traffic (10-22 h). The descriptive statistics for these samples are given in Table I (rates in kbit/s).

TABLE I. DESCRIPTIVE STATISTICS OF DAILY TRAFFIC

Descriptive statistics        10-22 h
Mean                          59144.47319
Standard Error                443.5149934
Median                        60117.992
Mode                          N/A
Standard Deviation            5322.179921
Sample Variance               28325599.11
Kurtosis                      -0.38901581
Skewness                      -0.280017092
Range                         25817.343
Minimum                       45322.681
Maximum                       71140.024
Sum                           8516804.14
Count                         144
Confidence Level (95.0%)      876.6926133
Upper Control Limit           75111.01296
Lower Control Limit           43177.93343

Using this data as input for the GUI Distribution Fitting Tool [7] gives the result shown in Figure 7.

Figure 7. Fitting distributions to network traffic data using GUI
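As a consistency check, several rows of Table I can be recomputed from the others. The Python sketch below uses only values copied from the table (written with decimal points). The assumption that the control limits are the mean plus and minus three standard deviations is an inference from the tabulated numbers, not something stated in the text.

```python
import math

# Values copied from Table I (kbit/s), written with decimal points.
n = 144
mean = 59144.47319
std_dev = 5322.179921
minimum = 45322.681
maximum = 71140.024

standard_error = std_dev / math.sqrt(n)  # table: 443.5149934
variance = std_dev ** 2                  # table: 28325599.11
value_range = maximum - minimum          # table: 25817.343
total = mean * n                         # table: 8516804.14

# Assumption: control limits are mean +/- 3 standard deviations.
upper_control_limit = mean + 3 * std_dev  # table: 75111.01296
lower_control_limit = mean - 3 * std_dev  # table: 43177.93343

print(f"standard error      = {standard_error:.7f}")
print(f"sample variance     = {variance:.2f}")
print(f"range               = {value_range:.3f}")
print(f"sum                 = {total:.2f}")
print(f"upper control limit = {upper_control_limit:.5f}")
print(f"lower control limit = {lower_control_limit:.5f}")
```

The recomputed values agree with the tabulated ones to within rounding, which supports reading the control limits as three-sigma limits.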

Weibull distribution - A random variable X is said to have a Weibull distribution with parameters α and β (α > 0, β > 0) if the pdf of X is:

$$f(x;\alpha,\beta) = \begin{cases} \dfrac{\alpha}{\beta^{\alpha}}\, x^{\alpha-1} e^{-(x/\beta)^{\alpha}}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$
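The Weibull density can be evaluated directly from this definition. The following Python sketch implements the pdf and the log-likelihood that a fitting procedure would typically maximize over α and β. The sample values and the two candidate parameter pairs are hypothetical, and the sketch is not the estimation routine used by the Matlab tool.

```python
import math


def weibull_pdf(x, alpha, beta):
    """Weibull density f(x; alpha, beta); zero for x < 0.

    Computed as (alpha/beta) * (x/beta)**(alpha-1) * exp(-(x/beta)**alpha),
    which is algebraically equal to
    (alpha/beta**alpha) * x**(alpha-1) * exp(-(x/beta)**alpha).
    Assumes alpha > 0 and beta > 0; x == 0 with alpha < 1 is not handled.
    """
    if x < 0:
        return 0.0
    z = x / beta
    return (alpha / beta) * z ** (alpha - 1) * math.exp(-(z ** alpha))


def log_likelihood(sample, alpha, beta):
    """Sum of log-densities of the sample under Weibull(alpha, beta)."""
    return sum(math.log(weibull_pdf(x, alpha, beta)) for x in sample)


# Hypothetical traffic rates in kbit/s (not the paper's data).
sample = [52000.0, 58000.0, 60000.0, 61000.0, 65000.0]

# Two hypothetical candidates: a peaked shape and an exponential-like shape.
ll_peaked = log_likelihood(sample, alpha=12.0, beta=61000.0)
ll_flat = log_likelihood(sample, alpha=1.0, beta=61000.0)

print(f"log-likelihood (alpha=12): {ll_peaked:.2f}")
print(f"log-likelihood (alpha=1):  {ll_flat:.2f}")
```

For α = 1 the density reduces to the exponential density with mean β, so comparing the two log-likelihoods shows how strongly a tightly clustered sample favours a larger shape parameter.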
