Hierarchical Models for Different Kinds of Traffic on CDMA ... - CiteSeerX

1 downloads 0 Views 424KB Size Report
[1] W. E. Leland, M. S. Taqq, W. Willinger, and D. V.. Wilson, “On the self-similar ... [5] W. Willinger, M. S. Taqqu, R. Sherman, and D. V.. Wilson, “Self-Similarity ...
Hierarchical Models for Different Kinds of Traffic on CDMA-1xRTT Networks Chlo´e Rolland∗ , Julien Ridoux† , Bruno Baynat∗ ∗ Universit´e Pierre et Marie Curie – Paris VI, LIP6/CNRS, UMR 7606, Paris, France † ARC Special Research Center for Ultra-Broadband Communications - The University of Melbourne, Australia {rolland,baynat}@rp.lip6.fr, [email protected]

1

Abstract

Introduction

From an operational point of view, traffic models are essential components. They allow the computation of the essential inputs (performance parameters, generation of synthetic traffic, etc.) required by most of the traffic engineering applications. For years, traffic modeling has been restricted to its simpler form in considering plain renewal processes of traffic entities (packets, pages, flows, sessions, etc.). Poisson processes have then been widely used because of their memoryless property. More recent traffic analysis studies highlighted evidences for scaling behaviors in IP traffic. Localarea traffic and web traffic have been shown to be self-similar [1, 2]; long-range dependence [3] and fractal behaviors [4] have been discovered in backbone traffic. These scaling properties indicate the correlations existing in the time series extracted from the traffic traces. The correlation structure of the traffic has strong implications on queuing and performance, especially leading to high variability (burstiness) over a wide range of time scales. As a result, the inter-dependences among traffic entities prevent them from being modeled by “simple” (e.g. Markovian) models. In the context of IP traffic, the source of the phenomenon is not clear. The presence of heavy-tailed distributions [2], the superposition of independent sources [5] or the inherent structure and interactions of protocol layers [6, 7] have been identified as possible causes. For over more than ten years now, a lot of work has been devoted to the modeling of IP traffic. Attempts for capturing scaling behaviors in realistic models have been made by combining measurements, simulation and analytical modeling. All these approaches can roughly be classified into two main categories: the descriptive or “black-box” approaches and the constructive or “white-box” approaches. A descriptive approach relies on the traffic obser-

The presence of scaling behaviors in IP traffic (fractality, long-range dependency, ...) is challenging the commonly assumed traffic models. The origins of such behaviors are still under investigation. Several possible causes have been highlighted: among them, the presence of heavy-tailed distributions in traffic characteristics, the superposition of multiple independent sources, and the inherent structure and interactions of protocol layers. The objective of this paper is to provide a general framework for traffic description that aims at reproducing and understanding the scaling properties of IP traffic. To this aim, we developed LiTGen, an easy to use and tune traffic generator. LiTGen relies on a hierarchical model that statistically models traffic on a per user and application basis. The user-driven approach aims at evaluating the impact of sources superposition. The hierarchical structure is closely related to the protocol interactions from application to IP layer. Finally the flexibility of the model enables us to investigate the sensitivity with regard to the distributions of the different random variables involved in the traffic description. We validated our model by comparison with a large CDMA trace captured in a large scale ISP wireless access network1 . The synthetic trace generated by our model and the measured trace are compared with wavelet spectral and semi-experiment techniques. Hopefully the results of this study will be the starting point for the development of new analytical models that reflect more accurately the scaling behaviors in IP traffic. Keywords: scaling behaviors, traffic modeling, traffic characterization, wavelet, semi-experiments

1 This study would not have been conducted without the support of Sprint Labs. The authors would like to thank Sprint Labs for giving access to the wireless traffic traces and particularly Ashwin Sridharan for his support.

1

vation coming from measurements. The objective is to develop models of traffic that aim at reproducing the scaling behaviors that have been measured. “Statistical models” and “hidden models” are common examples of this class of approaches. Statistical models of traffic (e.g. “Self-similar” and “Fractal” models) are based on the estimation of statistical moments (mean, variance, etc.) and dependencies (autocorrelation, covariance, etc.). They produce complex stochastic processes (e.g. Fractional Gaussian Noise, Fractional ARIMA processes, Multifractal Wavelet Model) that reflect more accurately the scaling properties of the traffic [1, 5, 8]. Hidden models (e.g. Hidden Markovian Models or Hidden Bayesian Models) consist in estimating the parameters of a presupposed simple model (e.g. a simple Markov chain), such that the output of the model fits the measured traffic with regard to the expected behaviors [9]. Both of these descriptive approaches provide a traffic description at a very fine granularity. However, they pain to give insights about the intrinsic causes of the observed scaling behaviors for two reasons. The high complexity of the resulted traffic models prevents them from taking into account the relations existing between the users and the network elements. These “descriptive” techniques reproduce the traffic in a blind way, as coming out of a “black-box” not structurally related to the system, preventing their use to seek for the origins of scaling behaviors in IP traffic.

scaling behaviors. In this paper we present a new approach, halfway between the black-box and the white-box approaches. Our approach can thus be qualified as a “gray-box” approach. As descriptive approaches, it is based on measurements, but tries to rely on simple assumptions. As constructive approaches, it takes into account a given knowledge of the system, but only considers the interactions between traffic and applications (protocols). The goal of this paper is not only to provide another generator of traffic but also to identify more precisely the causes of the traffic scaling behaviors. For this purpose, we develop a hierarchical user-driven approach to investigate the influence of the presupposed possible causes of scaling behaviors. The underlying model first relies on a per user description of the traffic to evaluate the impact of sources superposition. The model is also hierarchical and takes into account the interactions of protocol mechanisms from application to IP layer. Finally the traffic description is based on the definition of a set of random variables, corresponding to the description of different entities at different levels of the hierarchy, enabling us to investigate the sensitivity of the model with regard to the distribution of these random variables and to test whether these random variables are (and need to be) heavy-tailed. Modeling traffic and understanding causes of scaling behaviors require a deep analysis among classes of traffic. This paper focuses on the presentation of the methodology while giving preliminary results for the web, mail and P2P traffic. The rest of the paper is organized as follows. Section 2 describes precisely LiTGen’s underlying model. In section 3 we introduce the architecture of the wireless access network from which the traffic traces we benefit were captured, and present the identification of the underlying model’s traffic entities in the traces. The evaluation of LiTGen is presented in section 4. Finally, section 5 explores the causes of scaling behaviors before concluding this study.

A constructive approach is based on the knowledge of the system and will be referred as to a “white-box” approach. The objective of such an approach is to reflect as simply as possible the real interactions between users, protocols, applications and/or network elements, and thus try to have a look at the system as a white (transparent) box [6, 10, 11]. As opposed to the models of the descriptive class, these models usually rely on very simple assumptions on the traffic and on the network (e.g. renewal processes, Markovian models). As a consequence they can be solved analytically (or numerically) in order to quickly derive the performance parameters. In addition, by taking into account (even succinctly) the possible interaction between the traffic and the network, they allow to give insights on the possible causes leading to the observed traffic. Nevertheless, such models lack of generality and have difficulties to handle traffic scaling behaviors because of their inherent simplicity. For these reasons, very interesting approaches such as [6] manage to capture efficiently the overall traffic scaling behaviors. But they do not give sufficient insights about the interaction of their model internal parameters as well as their impact on the traffic

2

Model description

Earlier works identified three possible causes of correlation in IP traffic: the presence of heavy-tailed distributions [2], the superimposition of independent traffic sources [5] and the inherent structure and interactions of protocol layers [6]. These two last assumptions call on the conception of our traffic generator to be based on a user-oriented approach and a hierarchical model. This model is made of several semantically meaningful levels, each of them 2

R.V.

Semantic level

Description

Nature

Nsession

Session

Session size

Discrete

Tis

Session

Inter-Session period duration

Continuous

Npage

Page

Page size

Discrete

Tof f

Page

OFF period duration

Continuous

Nobj

Object

Object size

Discrete

IAobj

Object

Object Inter-Arrival times

Continuous

IApkt

Packet

Packet Inter-Arrival times

Continuous

Table 1: Random variables associated in the model characterized by a specific traffic entity. For each traffic entity, we define a set of random variables either related to a time or a size characterization.

2.1

- Nobj , the object size, i.e. the number of IP packets in an object; - IAobj , the objects inter-arrival times in a session (in seconds).

Session level 2.4

We assume each user undergoes an infinite succession of session and inter-session periods. During a session, a user makes use of the network resources by downloading a certain number of objects. We define two random variables to characterize this level: - Nsession , the session size, i.e. the number of objects downloaded during a session; - Tis , the inter-session duration (in seconds).

Finally, each object is made of a set of packets. The arrival process of packets in an object can be described by giving the inter-arrival time between two successive packets: - IApkt : packets inter-arrival times in an object.

2.5 2.2

Web page level

Further model assumptions

So far, no assumption has been made concerning the correlation structure of the predefined random variables. As an example, the arrival process of packets in objects may not necessarily be a renewal process. Successive random variables IApkt can thus be correlated (and a more precise notation should be given)2 . Moreover, a random variable (of a given level) can be correlated to another random variable (of the same level or of another one). For example IApkt can be correlated to Nobj . Actually, we will investigate this dependency in Section 4. As a consequence, the hierarchical approach developed in this paper is very generic and, complex dependency mechanisms between the different random variables can be taken into account. Of course, the objective is to remain as simple as possible, and introduce dependencies only if necessary. The first goal of this paper is then to test whether the userdriven and the hierarchical approach, coupled with very simple i.i.d. hypothesis for the random variables, are sufficient to reflect the measured traffic scaling behaviors. Secondly, the simple structure of the model enables us to investigate both the dependencies between random variables and the sensitivity with regard to the distributions.

This level is proper to web traffic and is not relevant in the context of mail and P2P. In the case of web traffic, each user session can be seen as a succession of ON and OFF periods, corresponding respectively to the downloading of a web page and the user’s reading time of this page. The two following random variables characterize this level: - Npage : the page size, i.e. the number of objects embedded in a page; - Tof f : the OFF duration, corresponding to the reading time of a page (in seconds).

2.3

Packet level

Object level

A session (resp. web page in the case of web traffic) is made of one or several objects. Indeed, a session (resp. page) is split up into a set of requests (sent by user) and responses (from the server), where responses gather the session’s objects (resp. the page’s objects). In the case of web, objects may be skeletons or embedded pictures of web pages. In the case of mail, objects may be servers responses to clients requests (e.g. e-mails, clients accounts meta-data. . . ). In the case of P2P, objects may be files or chunks of files. The description of this level requires the definition of two random variables:

2 A more precise notation has been omitted here for the sake of clarity of the presentation.

3

3

Trace analysis

3.1

ties. Starting from the captured sets of packets, we aggregate packets to identify traffic entities of the higher levels: objects, web pages and sessions (see Figure 2).

Raw packets trace

In order to calibrate and validate the model we propose, we benefit from data traces captured on the Sprint PCS CDMA-1xRTT access network. Figure 1 shows a summary of the wireless access network architecture as well as the measurement point where packet traces have been captured. This measurement point is located at the interface between the Internet core network and the wireless specific set of networking elements.

P2P Traffic of User 1 User Filter

Download Traffic

e.g: P2P

...

P2P Traffic

P2P Traffic of User i

Object Filter

Objects

Session Filter

Sessions

...

P2P Traffic of User n

Figure 2: Filtering the data trace

Customer Data Network

BS

P2P Traffic of User 2

Application Filter

3.2.1

Packet level

AAA

From the packet level trace, we identify traffic specific to a given application based on a source port number selection. For a given application traffic, we determine the set of packets corresponding to each user by projecting the 5-tuple data with a given destination IP address, a given transport protocol (TCP) and a given set of source port numbers identifying an application (e.g. 143, 220, 109 and 110 for the mail traffic). Among each user’s packets, we obtain subsets of packets with varying {source IP address, destination port}, corresponding typically to the server IP address contacted and the port opened on the user’s side. All resulting subsets of a given user correspond then to all flows he requested (considering the given application).

PDSN / Foreign Agent

BS

Radio Access Network

Home Agent

Wireless specific Optimizations and Processing

Core Network

PDSN / Foreign Agent

BS

Measurement Point BS

BS: Base Station; PDSN: Packet Data Switch Node; AAA: Authorization, Authentication, Accounting.

Figure 1: Sprint PCS capture architecture At this measurement point, a traffic capture has been realized leading to two 24 hours long traces in both directions. These traces consist of a collection of IP packets with accurate timestamps and entire TCP/IP headers (the monitoring systems capture the first 44 bytes of all IP packets). Thus, we have access to the well-known 5-tuple: {IP destination, IP source, port destination, port source, transport protocol}. These traces have already been used in a previous study [12] that gives more insights on the raw characteristics of this data traffic. This previous study highlights the similarities and differences existing between typical backbone traffic and this wireless traffic capture.

3.2

3.2.2

Object level

After applying the “application and user filter”, packets subsets are grouped to identify objects by means of the method presented in [13]. Based on the analysis of the TCP headers, this method observes the TCP flags (SYN, FIN, etc) to differentiate objects within flows (characterized by a rupture in the acknowledgment number series).

Traffic entities identification 3.2.3

Considering the measurement point, as shown in Figure 1, the traffic we study is in the download path, that is the traffic going from the core network to the wireless access network. While the trace captured in the upload path contains the requests of the users to the servers, the trace captured in the download path contains the responses of the servers to the users. As a consequence, the trace captured in the download path carries the traffic we want to study. To calibrate and validate the model, we need to identify in the data trace the specified traffic enti-

Web page level

Only in the case of web traffic, we aggregate web objects into their corresponding pages. Since we do not have access to the packets’ payloads, we use heuristics (also used in [10, 11, 13]) to achieve the aggregation. The definition of the web pages relies on active periods during which one or several objects are being downloaded. Those periods are separated by inactive periods, where the silence can come from the user (thus it is an OFF) or not (e.g. idle time due to the web browser: parsing of the page’s skeleton). 4

arrivals are supposed to be independent), but also we assume the random variables to be independent between them. We led to similar results for the web, mail and P2P traffics. First, power law distributions, which present a heavy-tail, approximate well the random variables Npage , Tof f and Nobj . On figures 4, 5 and 6 we plot respectively the distributions of the number of objects in pages (Npage ), the OFF period durations (Tof f ) and the number of packets in objects (Nobj ), for the web traffic only. On each figure, we also plot the fitted power-law distribution. We could not manage to fit accurately the interarrivals of packets within objects (IApkt ). Since we make the assumption that the random variables are independent and identically distributed for the fit investigation, one may suppose that there exists a correlation on IApkt that we do not take into account. We will investigate this possibility in Section 4.

We then use a temporal clustering method to infer the pages’ boundaries. An inactive period that lasts for more than a predefined threshold determines the precedent page termination (this given inactive period is thus an OFF). In figure 3 we plot the empirical distribution of the inactive periods and fixed empirically the threshold to 1s, a consistent result compared to [10, 11, 13]. 0.2 0.18

Empirical PDF of the duration of the inactive periods

0.16

Pr[duration k])

10

Web session level

−3

10

−4

10

Finally, we aggregate objects (resp. web pages in the case of web traffic) into sessions. Similarly to web pages, the definition of the sessions relies on active periods during which one or several objects (resp. web pages) are being downloaded. We fixed empirically the threshold to 300 seconds3 .

3.3

−5

10

Empirical distribution Power law with alpha=1.3 −6

10

0

10

1

2

10

3

10 log(k)

10

4

10

Figure 4: Web traffic: # of objects in pages Npage 0

Measured parameters

10

In this paper, we investigate the influence of the distributions followed by the random variables associated to the traffic entities. The presence of heavytailed distributions in traffic has been identified as a possible cause for self-similarity in traffic [2]. We thus need to know if there are random variables following such distributions in the measured trace, to better drive our study and understand the obtained results. We compute statistics from the measured trace and examine if “well-known” distributions can approximate the empirical ones followed by the random variables listed in Tab. 1. To this aim, we first assume that the random variables are independent and identically distributed. Precisely, not only we assume the independence of random variables within a given process (e.g. two successive packet

−1

log(Pr[OFF>off])

10

−2

10

−3

10

−4

10

Empirical distribution Power law with alpha=0.7 −5

10

0

10

1

2

10

10

3

10

log(off sec)

Figure 5: Web traffic: OFF durations Tof f

3.4

Traffic generation

Contrarily to the trace analysis, LiTGen generates traffic from upper level entities (sessions) to lower ones (packets). LiTGen is used for the generation of traffic corresponding to different user’s applications. For each kind of traffic, we first fix the number

3 Note that the value of this threshold does not impact significantly the results.

5

0

compare the scaling properties of the time series extracted from traffic.

10

−1

10

−2

4.1

−3

To study the traffic behaviors we use a wavelet analysis. The statistical tool we rely on is the LDEstimate (Logscale Diagram Estimate or LDE), developed by P. Abry and D. Veitch, whose code is freely available [14]. This tool performs an analysis based on a discrete wavelet transform: for a given time serie of packet arrivals, it produces a logarithm plot of the data wavelet spectrum estimates. The alignment of points in the resulting diagram allows to examine traffic behaviors over different time scales. We consider a given stochastic process X. Borrowing notations and definition from [15], we briefly describe the performing of the X wavelet spectrum estimates. One begins with an elementary function ψ, called the mother wavelet, dilates it by the scale factor s = 2j and translates it by 2j k to obtain the wavelets ψj,k (t) = 2−j/2 ψ(2−j t − k). The wavelet coefficients dX (j, k) are then defined as:

log(Pr[P>k])

10

10

−4

10

−5

10

Empirical distribution Power law with alpha=1.1

−6

10

0

10

1

10

2

3

10

10

4

10

5

10

log(k)

Figure 6: Web traffic: # of packets in objects Nobj of users of the corresponding application. LiTGen generates then traffic for each user independently. The final synthetic trace is obtained by superimposing synthetic traffic of all users and all applications. For validation purposes we can extract the proportion and the number of users of each application from the captured trace. In an operational network these statistics can be derived from operator’s knowledge of customers subscribed services4 .

Wavelet analysis

dX (j, k) = hX, ψj,k i

4

Validation

The energy function, which is an estimation of the variance of these wavelet coefficients, is defined by:

We evaluate our model by examining how well it captures the complexity of the traffic correlation structure. To this aim, we first generate synthetic traffic using the real statistics collected on the data trace: we thus associate with each random variable of the model the empirical distributions measured on the raw trace (see Section 3). We then compare the behavior of the packet time series extracted from the synthetic traffic with the measured one, as illustrated in Figure 7.

S2 (j) =

Generation

(2)

where nj is the number of wavelet coefficients at scale j. The Logscale Diagram Estimate is a plot of the logarithm of the energy function S2 (log2 S2 (j)) against the octave j = log2 2j . To ease the reading of the diagram, the horizontal axis are calibrated both in octave j and scale s (in seconds). Such plots are particularly useful to observe the variation of S2 (j) with j. While straight lines with positive slope constitute experimental evidence for the presence of scaling, horizontal alignments of points constitute experimental evidence for the absence of scaling (e.g. Poisson process). For more details about mathematical developments and intuitions behind this technique see [15]. We focus on web, mail and P2P traffic and generate three independent synthetic traces using a simple version of our generator. With this so called basic LiTGen, all traffic entities are generated from renewal processes using the empirical distributions extracted from the captured trace, without introducing any additional dependency between the random variables. The three synthetic traces are then merged into a single one and compared to the filtered captured traffic composed of the same three applications. Figure 8(a) shows the resulting LDE

packet time series

?

Synthetic packet trace

1 X | dX (j, k) |2 nj k

Measured packet trace Measurement

(1)

packet time series

Figure 7: Validating the model To this aim, we choose to use two methodologies of evaluation, which are very well suited to study scaling behaviors in traffic. These two methods, defined below, are 1) the LDestimate; and 2) the semiexperiments. They allow to identify, study and then 4 Such a finite assumption is typically used for network planning to predict the active population that will be served during a given time.

6

23

488mus 0.002 0.0078 0.031 0.12

0.5

2s

8s

32s

128s

512s

7

9

pear in the same proportions in the captured trace: while carrying 92.7% of the packets and 95.6% of the flows, web is the dominant application; mail carries 6.8% of the packets and 3.9% of the flows; P2P carries 0.5% of the packets and 0.5% of the flows. Figure 8(b) clearly indicates the differences between the three applications spectra in the captured trace. Moreover, figure 8(b) shows that the mail and P2P traffics are hidden by the predominant web traffic. These two last points call for studying each application independently. Finally, figure 8(b) also shows that our extended model accurately models the web traffic. In the following, we study each of these three applications independently. Figure 9(a) confirms the previous results on web traffic. The reference spectrum (thick gray curve) corresponds to the captured web traffic only. While the synthetic trace produced by basic LiTGen (thin curve) pains to reproduce the captured scaling structure, the one produced by extended LiTGen (circle curve) superimposes with the captured one. Figure 10(a) presents the mail traffic spectra. Unlike web traffic, we do not observe a very important difference between the synthetic trace obtain by basic LiTGen and the one obtain by extended LiTGen. Indeed, the basic LiTGen’s underlying model reproduces in quite a good way the mail traffic spectrum. Extended LiTGen improves the results showing LiTGen’s good ability to model mail traffic. Figure 11(a) shows the case of P2P traffic. Basic LiTGen pains to reproduce the captured traffic correlation structure for the scales above j = 0. The dip of reference spectrum at scales around j = 2 indicates a possible periodic behavior which we do not capture. Although the structural dependency introduced between Nobj and IApkt in extended LiTGen does not lead to the same improvement when dealing with P2P traffic, it allows the corresponding spectra to match the reference one except over scales comprised between j = 0 and j = 5. Due to space limitation, we do not provide here the investigation required to capture this apparent periodic behavior and leave it as future work. While LiTGen exhibits good results on the overall spectra, we need a further advanced methodology to validate the internal properties of the synthetic traffic.

Captured trace (mail, P2P and web) Synthetic trace using "basic" LiTGen Synthetic trace using "extended" LiTGen

log2 Variance(j)

19

15

11

7

−11

−9

−7

−5

−3

−1 scale j

1

3

5

(a) “Basic” vs “extended” LiTGen 23

488mus 0.002 0.0078 0.031 0.12

0.5

2s

8s

32s

128s

512s

5

7

9

Captured trace: mail, P2P and web Captured trace: mail Captured trace: P2P Captured trace: web

19

log2 Variance(j)

15

11

7

3

−1

−11

−9

−7

−5

−3

−1 scale j

1

3

(b) Mail, P2P and web spectra

Figure 8: Model evaluation and comparison of the mail, P2P and web spectra. spectra. Clearly, the synthetic trace produced by basic LiTGen (thin curve) does not match the captured traffic spectrum (thick gray curve). This simple version of LiTGen’s underlying model does not succeed in reproducing the captured traffic scaling structure with a good accuracy. To improve this result, we refine the model by adding a structural dependency between the object sizes and the internal arrival process of packets within objects. This slight improvement of basic LiTGen is referred as extended LiTGen. This version still models the arrivals of packets within objects by renewal processes but, the average values of the inter-arrival random variables IApkt now depend on the corresponding objects sizes Nobj . The spectra obtained (circle curve in figure 8(a)) is barely distinguishable from the captured one. As a first result, the introduction of a very simple structural dependency in the model succeeds in reproducing accurately the traffic correlation structure. It is important to notice that this can be achieved without taking into account network characteristics (such as TCP dynamics, RTT, loss rates or link capacities) leading to a much simpler generator than the one developed in [16]. These three kinds of traffic, however, do not ap-

4.2

Semi-experiments method

Semi-experiments have been introduced in [3] and consist in an arbitrary but insightful manipulation of internal parameters of the time series studied. The comparison of the energy spectrum before and after the semi-experiment leads to conclusions 7

23

488mus 0.002 0.0078 0.031 0.12

0.5

2s

8s

32s

128s

512s

24

488mus 0.002 0.0078 0.031 0.12

Captured trace Synthetic trace using basic LiTGen Synthetic trace using extended LiTGen 20

2s

8s

32s

128s

512s

24

488mus 0.002 0.0078 0.031 0.12

16

12

0.5

2s

8s

32s

128s

512s

1

3

5

7

9

Reference: Synthetic trace Semi−exp: A−Pois Semi−exp: S−Thin Semi−exp: P−Uni Semi−exp: T−Pkt

20

log2 Variance(j)

log2 Variance(j)

log2 Variance(j)

19

15

0.5

Reference: Captured trace Semi−exp: A−Pois Semi−exp: S−Thin Semi−exp: P−Uni Semi−exp: T−Pkt

16

12

11 8

7

−11

−9

−7

−5

−3

−1 scale j

1

3

5

7

4

9

(a) Basic VS extended LiTGen

8

−11

−9

−7

−5

−3

−1 scale j

1

3

5

7

4

9

(b) SE - captured trace

−11

−9

−7

−5

−3

−1 scale j

(c) SE - extended LiTGen

Figure 9: Web: LiTGen underlying model evaluation: wavelet analysis and semi-experiments methodology 18

488mus 0.002 0.0078 0.031 0.12

0.5

2s

8s

32s

128s

512s

17

488mus 0.002 0.0078 0.031 0.12

Captured trace Synthetic trace using "basic" LiTGen Synthetic trace using "extended" LiTGen

10

6

2

2s

8s

32s

128s

512s

17

9

5

−11

−9

−7

−5

−3

−1 scale j

1

3

5

7

1

9

(a) Basic VS extended LiTGen

488mus 0.002 0.0078 0.031 0.12

0.5

2s

8s

32s

128s

512s

1

3

5

7

9

Reference: Synthetic trace Semi−exp: A−Pois Semi−exp: S−Thin Semi−exp: P−Uni Semi−exp: T−Pkt

13

log2 Variance(j)

13

log2 Variance(j)

log2 Variance(j)

14

0.5

Reference: Captured trace Semi−exp: A−Pois Semi−exp: S−Thin Semi−exp: P−Uni Semi−exp: T−Pkt

9

5

−11

−9

−7

−5

−3

−1 scale j

1

3

5

7

1

9

(b) SE - captured trace

−11

−9

−7

−5

−3

−1 scale j

(c) SE - extended LiTGen

Figure 10: Mail: LiTGen underlying model evaluation: wavelet analysis and semi-experiments methodology 488mus 0.002 0.0078 0.031 0.12

15

488mus0.002 0.0078 0.031 0.12

0.5

2s

8s

32s

11

8s

32s

128s

512s

7

3

−9

−7

−5

−3

−1 1 scale j

3

5

0.5

2s

8s

32s

128s

512s

7

9

(a) Basic VS extended LiTGen

11

7

3

−5

1

3

5

7

9

Reference: Synthetic trace Semi−exp: A−Pois Semi−exp: S−Thin Semi−exp: P−Uni Semi−exp: T−Pkt

11

−1

−11

488mus 0.002 0.0078 0.031 0.12

log2 Variance(j)

log2 Variance(j)

log2 Variance(j)

2s

Reference: Captured trace Semi−exp: A−Pois Semi−exp: S−Thin Semi−exp: P−Uni Semi−exp: T−Pkt

11

−1

0.5

128s 512s 2048s

Captured trace Synthetic trace using "basic" LiTGen Synthetic trace using "extended" LiTGen

7

3

−1

−11

−9

−7

−5

−3

−1 scale j

1

3

5

(b) SE - captured trace

7

9

−5

−11

−9

−7

−5

−3

−1 scale j

(c) SE - extended LiTGen

Figure 11: P2P: LiTGen underlying model evaluation: wavelet analysis and semi-experiments methodology about the importance of the role played by the parameters modified by the semi-experiment. We apply the same set of semi-experiments to the captured traces and the synthetic traces generated by extended LiTGen. We then compare the impact of the internal manipulations to the two time series for the web (Fig.9), the mail (Fig.10) and the P2P (Fig.11) traffic.

the synthetic traces, for web, mail and P2P traffic.

T-Pkt is a Truncation manipulation that allows to examine the objects arrival process by keeping only the first packet of each object. Removing packets decreases the energy of the spectrum that takes smaller values. As shown in figures 9, 10 and 11, T-Pkt has a similar impact on the captured and

A-Pois allows to examine the interactions between objects. This manipulation repositions the objects Arrival times according to a Poisson process and randomly permutes the objects order (while preserving the internal packet structure of objects). While A-Pois is a drastic manipulation, it has very

The S-Thin manipulation allows to test for the independence of objects. It randomly Selects objects with some probability, here equal to 0.9. When applying S-Thin, the spectra of the captured and synthetic trace, for the web as well as mail and P2P traffic, keep the same shape but drop by a small amount close to log2 (0.9) = −0.15.

8

little (and similar) effect on the spectra of all traces, indicating the negligible contribution of object arrival process in comparison to packets arrival. P-Uni confirms this conclusion since it allows to examine the impact of in-objects packets burstiness. P-Uni uniformly distributes arrival times of packets in each object while preserving packets count and object duration. This manipulation flattens the spectrum from scales j = −11 to j = −5 for the web and the mail applications (resp. from scales scales j = −11 to j = 2 for the P2P application) in a comparable manner for the captured and synthetic traces. As a conclusion, the captured and synthetic traces spectra present similar reactions to each semi-experiment manipulation. This indicates that LiTGen captured the key internal properties of the traffic highlighted by the semi-experiments, i.e. the object arrival process has few influence; the objects can be considered as independent and the packets arrival process within objects contributes mostly to the energy spectrum. Note that the simple structure of our traffic description, which still relies on renewal processes, is sufficient to reproduce these traffic internal properties.

5

TIS , Nobj , IAobj and IApkt ; as well as Npage and Tof f for web traffic). We then compare these traces to the reference synthetic trace generated by extended LiTGen calibrated with the experimental distributions. Observing first the web traffic, figure 12(a) shows that modeling the random variables IAobj , Npage , Tof f , TIS and Nsession by memoryless distributions has a small impact on the spectra of the LDE. On the contrary, modeling IApkt and Nobj by memoryless distributions widely impacts the spectra, as shown in figure 12(b). As an example, modeling IApkt by an exponential distribution completely erases the correlation existing between packets inter-arrivals and flattens the spectrum at scales below j = −3. This confirms the results obtained by the semi-experiments methodology that designated the in-object packets inter-arrival structure as the main source of energy in the spectrum. As shown in figure 14 and 13 similar conclusions can be drawn for P2P and web traffic. For web, P2P and mail, we observe that modeling the random variables IAobj , Tis and Nsession by memoryless distributions is a realistic assumption that leads to an extremely small loss of accuracy compared to the reference spectrum. This is also the case of the random variables Npage and Tof f characterizing web traffic. On the contrary, modeling IApkt and Nobj by memoryless distributions is far from realistic and indicates the need to model these random variables more carefully. This investigation leads then to very interesting results illustrated by the number of objects (resp. pages in the case of web traffic) within sessions Nsession and their arrival process IAobj , as well as the number of objects within pages Npage and the OFF durations Tof f in the case of web traffic. Although the experimental distributions of these random variables are closely approximated by heavy-tailed distributions, we show these distributions have negligible influence on the scaling behaviors in traffic. The presence of heavy-tailed distributions does not compulsorily imply a presupposed scaling behavior. Nevertheless, the internal structure of objects has a strong influence on the spectra. In web as well as mail and P2P traffic, the investigation concerning the packets inter-arrival distribution clearly points it out as the source of correlation in traffic at small scales. We finally create two new synthetic traces for each kind of traffic, which we compare to the reference one. We obtain the first trace by calibrating the insensitive random variables (cf. Fig. 12(a)) with memoryless distributions. We thus create a

Impact of traffic entities properties

One of the goals of this paper is to explore the causes of scaling behaviors in IP traffic. In this section, we investigate the sensitivity of the generated synthetic traffic with regard to the distributions of the different random variables involved in the model. This investigation is done thanks to LiTGen, whose flexibility allows to model the random variables in different ways. Each random variable can be modeled either: - by its empirical distribution derived from the measured trace; - by a fitted distribution; - by a memoryless distribution.

5.1

Testing memoryless hypothesis

The flexibility of extended LiTGen (validated in the previous section) enables us to investigate the sensitivity of the traffic correlation structure with regards to the random variable distributions. To this aim, we first replace individually the experimental distribution of each random variable by a memoryless distribution (exponential or geometric) of same mean. We thus create five (resp. seven in the case of web traffic) synthetic traces, each one corresponding to a given random variable (Nsession , 9

23

488mus 0.002 0.0078 0.031 0.12

0.5

2s

8s

32s

128s

512s

23

488mus 0.002 0.0078 0.031 0.12

Synthetic trace Synthetic trace: investigating IAobj

2s

8s

32s

128s

512s

3

5

7

9

Synthetic trace: investigating Npkt

Synthetic trace: investigating N

page

Synthetic trace: investigating Toff

19

0.5

Synthetic trace Synthetic trace: investigating IApkt

19

Synthetic trace: investigating T

log2 Variance(j)

log2 Variance(j)

Synthetic trace: investigating Nsession IS

15

11

7

15

11

−11

−9

−7

−5

−3

−1 scale j

1

3

5

7

7

9

(a) Insensitive: IAobj , Npage , Tof f , TIS and Nsession .

−11

−9

−7

−5

−3

−1 scale j

1

(b) Sensitive: IApkt and Nobj

Figure 12: Web: test for the memoryless hypothesis 18

488mus 0.002 0.0078 0.031 0.12

0.5

2s

8s

32s

128s

512s

18

488mus 0.002 0.0078 0.031 0.12

Synthetic trace Synthetic trace: investigating IAobj

32s

128s

512s

3

5

7

9

14

log2 Variance(j)

log2 Variance(j)

8s

obj

Synthetic trace: investigating Nsession

10

6

2

2s

Synthetic trace: investigating N

Synthetic trace: investigating TIS 14

0.5

Synthetic trace Synthetic trace: investigating IApkt

10

6

−11

−9

−7

−5

−3

(a) Insensitive: Nsession .

−1 scale j

1

3

5

IAobj ,

7

TIS

2

9

and

−11

−9

−7

−5

−3

−1 scale j

1

(b) Sensitive: IApkt and Nobj

Figure 13: Mail: test for the memoryless hypothesis 15

488mus0.002 0.0078 0.031 0.12

0.5

2s

8s

32s

128s 512s 2048s

15

488mus0.002 0.0078 0.031 0.12

Synthetic trace Synthetic trace: investigating IAobj Synthetic trace: investigating Nsession

log2 Variance(j)

log2 Variance(j)

8s

32s

128s 512s 2048s

11

7

3

−1

2s

Synthetic trace: investigating Nobj

Synthetic trace: investigating TIS 11

0.5

Synthetic trace Synthetic trace: investigating IApktts

7

3

−11

−9

−7

−5

(a) Insensitive: N session

−3

−1 1 scale j

3

IAobj ,

5

7

TIS

9

−1

11

and

−11

−9

−7

−5

−3

−1 1 scale j

3

5

7

9

11

(b) Sensitive: IApkt and Nobj

Figure 14: P2P: test for the memoryless hypothesis synthetic trace in which only Nobj and IApkt are calibrated with empirical distributions. Observing first the web traffic, figure 15(a) shows that the spectrum of the synthetic traces obtained with the partially memoryless replacements (square line) matches the reference. In the case of mail and P2P traffic, the two spectra are very close (figures 15(b) and 15(c)). This result has a strong practical appeal: five of the seven random variables (resp. three of the five random variables in the case of mail and P2P traffic) are modeled by memoryless distributions, while reproducing the traffic scaling structure. Moreover, we found that these spectra reproduce the traffic internal properties highlighted

by the semi-experiments in section 4. The last synthetic trace is obtained by modeling all the random variables with memoryless distributions. On figures 15(a) and 15(b), we observe a great deviation between the corresponding spectrum (cross line) and the reference one, which confirms the sensitivity of the traffic to the distributions of IApkt and Nobj . One notice a particularity concerning P2P traffic. Indeed, while the tested synthetic trace do not capture the bump between the scales j = −5 and j = 2, it captures the original spectra before (there is no correlation) but also after the bump.

10

23

488mus 0.002 0.0078 0.031 0.12

0.5

2s

8s

32s

Captured trace Synthetic trace: empirical distributions Synthetic trace: ml distributions except IA

pkt

128s

512s

19

0.5

2s

8s

32s

Measured trace Synthetic trace: empirical distributions Synthetic trace: ml distributions except IA

and N

obj

pkt

Synthetic trace: ml distributions

19

488mus 0.002 0.0078 0.031 0.12

128s

512s

488mus0.002 0.0078 0.031 0.12

and N

obj

Synthetic trace: ml distributions

15

0.5

2s

8s

32s

128s 512s 2048s

Captured trace Synthetic trace: empirical distributions Synthetic trace: ml distributions except IApkt and Nobj

15

Synthetic trace: ml distributions

15

11

7

log2 Variance(j)

log2 Variance(j)

log2 Variance(j)

11

11

7

−11

−9

−7

−5

−3

−1 scale j

1

(a) Web traffic

3

5

7

9

3

7

3

−11

−9

−7

−5

−3

−1 scale j

1

3

5

7

9

−1

(b) Mail traffic

−11

−9

−7

−5

−3

−1 1 scale j

3

5

7

9

11

(c) P2P traffic

Figure 15: Investigating the substitution of all empirical distributions

6

Conclusion

References [1] W. E. Leland, M. S. Taqq, W. Willinger, and D. V. Wilson, “On the self-similar nature of Ethernet traffic,” in ACM Sigcomm, 1993.

This paper describes LiTGen, a light traffic generator. LiTGen has the benefit to reproduce accurately the traffic scaling properties at small and large time scales, while using a very simple underlying hierarchical model. We highlighted the dependency between objects sizes and in-object packets interarrivals, and showed how this impacts the quality of the generator. Thanks to LiTGen, we investigated the impact of the random variables distributions describing the IP traffic structure. This investigation helps understanding the sensitivity of the traffic, with regards to the distributions involved in its description, and then identify crucial parameters. It also gives insights to anyone willing to provide accurate traffic models. As an example, the objects size (in number of packets) and the respective packets inter-arrivals have to be modeled by heavy tailed distributions to reproduce accurately original traffic. Nevertheless, our study demonstrated that the presence of heavy-tailed distributions in traffic does not necessarily implies the correlation, some of them can be modeled by memoryless distributions without impacting the traffic scaling properties.

[2] M. Crovella and A. Bestavros, “Self-similarity in world wide web traffic: Evidence and possible causes,” in ACM Sigmetrics, May 1996. [3] N. Hohn, D. Veitch, and P. Abry, “Does fractal scaling at the IP level depend on TCP flow arrival process?” in ACM IMC, November 2002. [4] Z.-L. Zhang, V. J. Ribeiro, S. B. Moon, and C. Diot, “Smalltime scaling behaviors of Internet backbone traffic: an empirical study,” in IEEE INFOCOM, 2003. [5] W. Willinger, M. S. Taqqu, R. Sherman, and D. V. Wilson, “Self-Similarity throught high-variability: Statistical analysis of ethernet LAN traffic at the source level,” in ACM Sigcomm, August 1995. [6] V. Misra and W.-B. Gong, “A hierarchical model for teletraffic,” in IEEE CDC, December 1998. [7] B. B. Mandelbrot and M. S. Taqqu, “Robust R/S analysis of long run serial correlation,” in 42nd Session ISI, 1979. [8] R. H. Riedi, M. S. Crouse, V. J. Ribeiro, and R. G. Baraniuk, “A Multifractal Wavelet Model with Application to Network Traffic,” IEEE Transactions on Information Theory, vol. 45, no. 4, pp. 992–1018, 1999. [9] S. Vaton, “‘Fractal’ versus ‘Poisson’ models of traffic,” in IFIP Workshop on Performance Modeling and Evaluation of ATM and IP Networks, 2000. [10] B. A. Mah, “An empirical model of http network traffic,” in IEEE Infocom, April 1997. [11] P. Barford and M. Crovella, “Generating representative web workloads for network and server performance evaluation,” in ACM Sigmetrics, June 1998.

This study also demonstrated the ability of a hierarchical model to reproduce accurately the characteristics of classes of traffic. Precise classes of applications (e.g. web and mail together, the main contributors to wireless traffic) will be defined soon to specify the utilization domain of the hierarchical model. While the results of LiTGen are proper for the P2P traffic, a more accurate but simple model for P2P traffic and other classes of application is still to be defined. This will become particularly important when mobile users will massively adopt new services and decrease the domination of web application in the overall traffic.

[12] J. Ridoux, A. Nucci, and D. Veitch, “Seeing the difference in IP traffic: Wireless versus wireline,” in IEEE Infocom, April 2006. [13] F. Donelson-Smith, F. Hernandez-Campos, K. Jeffay, and D. Ott, “What TCP/IP protocol headers can tell us about the web,” in ACM Sigmetrics, June 2001. [14] Veitch, D. and Abry, P.: Matlab code for the wavelet based analysis of scaling processes, http://www.cubinlab.ee.mu.oz.au/∼darryl/. [15] P. Abry, M. S. Taqqu, P. Flandrin, and D. Veitch, SelfSimilar Network Traffic and Performance Evaluation. Wiley, 2000, ch. Wavelets for the analysis, estimation, and synthesis of scaling data.

11

[16] K. V. Vishwanath and A. Vahdat, “Realistic and responsive network traffic generation,” in ACM Sigcomm, September 2006. [17] C. Rolland, J. Ridoux, and B. Baynat, “Hierarchical models for different kinds of traffics on CDMA-1xRTT networks,” UPMC - Paris VI, LIP6/CNRS, Tech. Rep., 2006, http://www-rp.lip6.fr/∼rolland/techreport.pdf.

12

Suggest Documents