Measurement-Based WWW User Traffic Model for Radio ... - BME

Measurement-Based WWW User Traffic Model for Radio Access Networks

Attila Vidács†, József Barta‡, Zsolt Kenesi‡, Tamás Eltető‡

† Budapest University of Technology and Economics, Department of Telecommunications and Telematics, H-1117 Budapest, Pázmány Péter s. 1/D, Hungary. Tel: [+36](1)463-1926; fax: [+36](1)463-3107. E-mail: [email protected]

‡ Ericsson Traffic Lab, Ericsson Research, H-1037 Budapest, Laborc u. 1., Hungary. Tel: [+36](1)437-7615; fax: [+36](1)437-7219. E-mail: Jozsef.Barta,Zsolt.Kenesi,[email protected]

1 Abstract

This contribution focuses on Web user traffic modeling for radio access networks providing multimedia applications. The main goal is to provide detailed and accurate source models of wireless WWW users suitable for radio access networks. The modem pool measurements performed to determine the model parameters are described, and some validation results of the implemented model are also reported.

Keywords: traffic modeling, Internet, world wide web, traffic measurements, radio access networks

2 Introduction

The performance evaluation of 3rd generation (3G) radio access networks providing data services requires source models for several traffic types. The access networks must simultaneously fulfill the QoS requirements of these traffic types, which differ significantly in their structure, amount and variability. This diversity makes source modeling a challenging task. A common set of traffic models is essential to obtain comparable results from different performance evaluations of radio access networks. The dominant part of data traffic today, and also expected in the near future, is World Wide Web traffic. Therefore, our main focus is on a wireless Web user traffic model. (The traffic source in our investigations is a wireless user in a cell of an access network.)

Earlier traffic models (see, e.g., [1, 2, 3, 4, 5, 6]) did not provide answers to all of the problems in wireless networks. Two problems that are not addressed, or only partially addressed, are the limited-resource (not LAN-like) scenario of mobile users and the differences between uplink (upstream) and downlink (downstream) traffic. Another reason why a continuously updated Web traffic model is needed is the changing structure of the traffic transferred on the WWW. This is due to changes in the use and content of the Internet, and also to the different protocols and browsers in use. In this document a new general WWW traffic model is proposed. This structural multilayer model captures the internal structure of Web traffic and thus makes it possible to fit the model to the wireless scenario of radio access networks.

3 Performed measurements

One way of modeling data traffic is to capture traffic on an actual network and fit the model parameters empirically. Since 3G networks are not yet deployed, the measurements were carried out on a modem pool. This configuration enabled us to examine the main features of user behaviour when users have low-rate access to a common link and have to share its bandwidth, which is the typical situation in radio access networks. (Some models in the literature [3, 4, 5, 6] are based on LAN measurements, but the different configuration causes different user behaviour.) The measurements were made at the modem pool of Ericsson Traffic Laboratory, Hungary, during January-August 1999. About 400 users, mainly researchers, software developers and university students (and their family members), had access to the modem pool with a free callback option. The topology of the monitored network and the measurement configuration are shown in Figure 1.

Figure 1: The measurement setup

The traffic was measured at two levels. The call level parameters were recorded over half a year. More than 11000 calls were registered, with a total call holding time of about 580 days. The total amount of downloaded data was about 5 GBytes, carried by 80 million packets. Detailed packet level measurements were carried out only for a few days because of the large storage requirement for post-processing. The tcpdump sniffer program was used to store all packets destined to or originating from the modems. The measured trace contained about 3 million packets, and nearly 60 MBytes of data were transferred. As expected, HTTP caused about 80% of all traffic, calculated both in bytes and in packets. We analyzed the daily profiles for both weekdays and weekends. Because of the free callback option, the ISP's tariffs had no effect on user behaviour. The period between 9am and 1pm, with relatively constant but not peak load, was chosen for further statistical analysis (see Figure 2).

Figure 2: Daily profiles for a weekday (Monday, left) and a weekend day (Sunday, right)

We found that about 90% of the traffic volume was transferred in the downstream direction, while the number of downstream packets was nearly the same as the number of upstream packets. In other words, upstream and downstream packets mainly pair up as data/acknowledgement (or request/answer) packet pairs. Figure 3 shows the number of packets in downstream and upstream traffic.

Figure 3: Traces of the upstream and downstream traffic of a randomly chosen user session.

As suggested by the figure, there was a strong packet-by-packet correlation between up- and downstream arrivals when upstream packets were shifted by 0.5 second and compared to downstream arrivals (see Figure 4).

Figure 4: The difference between the number of downstream and upstream packets in each 10 sec bin (left), and in the total trace (right), for different time shifts

This important feature enabled us to generate downstream traffic with our model and derive the upstream traffic flow from it in a deterministic way. (The time shift can be a parameter representing the delay of the actual access network when the model is used for radio access networks.)
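The deterministic derivation of upstream traffic described above can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: it assumes one upstream packet per downstream packet (the pairing observed above), places each upstream packet `shift_s` seconds before its downstream partner (our reading of the sign of the 0.5 s shift, which is an assumption), and gives every upstream packet a hypothetical fixed size `ack_bytes = 40`. The helper `bin_difference` mirrors the per-bin packet-count difference plotted in Figure 4.

```python
import math
from collections import Counter


def derive_upstream(down_times, shift_s, ack_bytes=40):
    """Pair every downstream packet with one upstream packet.

    Assumption: the upstream packet precedes its downstream partner by
    shift_s seconds; ack_bytes is a hypothetical fixed upstream size.
    Returns a list of (time_s, size_bytes) tuples.
    """
    return [(t - shift_s, ack_bytes) for t in down_times]


def bin_counts(times, bin_s):
    """Count packets per bin of length bin_s seconds."""
    return Counter(math.floor(t / bin_s) for t in times)


def bin_difference(down_times, up_times, shift_s, bin_s=10.0):
    """Downstream minus (time-shifted) upstream packet count in each bin."""
    down = bin_counts(down_times, bin_s)
    up = bin_counts([t + shift_s for t in up_times], bin_s)
    return {b: down[b] - up[b] for b in sorted(set(down) | set(up))}


down = [0.7, 1.2, 12.3, 25.0]
up = derive_upstream(down, shift_s=0.5)
print(bin_difference(down, [t for t, _ in up], shift_s=0.5))  # {0: 0, 1: 0, 2: 0}
```

Shifting the derived upstream packets back by the same amount makes every bin difference vanish, which is the behaviour Figure 4 shows for the measured traces at the best time shift.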

4 Web user model

As the traffic has very complex characteristics, a structural, multilayer model is proposed to give a complete characterization of HTTP traffic. In contrast to the black-box approach, in which only the statistical characteristics are captured without taking into account the mechanisms that generate them, structural modeling reveals the internal structure of the traffic. This makes it possible to fit the model to the wireless scenario and also ensures scalability. Based on the modeling considerations above and our modem pool measurement results, the following Web user traffic model can be outlined. Figure 5 shows in detail how a Web browsing session can be decomposed into hierarchical layers. The definition of each layer, the properties it tries to capture, and the elements and mechanisms that affect the given layer are described in the following. The latter is of great importance in setting and scaling the model parameters.

Figure 5: The model structure (layers from top to bottom: session, page, TCP, packet; quantities shown: session arrivals, session interarrival time and session length; requests for pages and page download time; requests for embedded objects and object download time; packet transmission time and packet interarrival time within the same TCP session)

4.1 Session level

A WWW session is the time period during which the user is actively browsing the Web. For dial-in users, the session arrival is the time instant when the user gets connected through the modem, and the holding time of a session can be defined as the duration of the seizure of an access line (i.e., the call holding time). Since a mobile data user will be ‘always connected’ in 3G networks, the session needs to be defined differently: a WWW session of a mobile user is the period of time during which the user is actively using the Web. This highest layer captures the users’ willingness to browse the net. The intensity and the duration of sessions mainly depend on the quality of service provided and on the price the user has to pay for it. Session arrivals can be well modeled by a Poisson process, i.e., with exponentially distributed interarrival times. The length of a session is modeled as a Weibull distributed random variable (see Table 1). (Note the slightly heavy-tailed holding times.)
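The session layer can be sampled as in the following Python sketch, using the Weibull parameters of Table 1 (α = 0.8, β = 3000 s). The session arrival rate `rate_per_s` is a hypothetical input, because Table 1 gives no value for it. Note that Python's `random.weibullvariate(alpha, beta)` takes the scale first and the shape second, i.e., the reverse of the α/β roles used in Table 1.

```python
import math
import random

WEIBULL_SHAPE = 0.8     # alpha in Table 1
WEIBULL_SCALE = 3000.0  # beta in Table 1, seconds


def weibull_mean(shape, scale):
    """Mean of F(x) = 1 - exp(-(x/scale)**shape), i.e. scale * Gamma(1 + 1/shape)."""
    return scale * math.gamma(1.0 + 1.0 / shape)


def session_arrivals(rate_per_s, horizon_s, rng):
    """Poisson session arrivals in [0, horizon_s) with Weibull session lengths.

    rate_per_s is a hypothetical input: Table 1 gives no session arrival rate.
    Returns a list of (start_s, length_s) tuples.
    """
    sessions = []
    t = rng.expovariate(rate_per_s)  # exponential interarrival times
    while t < horizon_s:
        # random.weibullvariate(scale, shape): scale = beta, shape = alpha
        sessions.append((t, rng.weibullvariate(WEIBULL_SCALE, WEIBULL_SHAPE)))
        t += rng.expovariate(rate_per_s)
    return sessions


print(round(weibull_mean(WEIBULL_SHAPE, WEIBULL_SCALE)))  # 3399, i.e. about 3400 s
sessions = session_arrivals(1 / 600, 8 * 3600, random.Random(1))
```

The closed-form mean β·Γ(1 + 1/α) reproduces the µ = 3400 s listed in Table 1, which is a quick consistency check on the fitted parameters.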

4.2 Page level

Within a session the user visits several Web files. A Web file may include other files (embedded objects) by reference, so the user’s request for a single Web file results in the transfer of multiple files from the Web server. A Web file together with all the files that must also be transferred to display it is called a Web page. The page download time is the total time needed to transfer all the files of a page. While traversing the hypertext of a Web server, the user requests Web pages by clicking on links or typing in URL addresses. Thus, the events of page requests at this level are (mainly¹) user-initiated.

The characteristics of an accessed Web page depend on the content the user prefers to visit (or, from a different point of view, on the content the Web provides to the user). The capabilities of the mobile terminal can also influence the characteristics of the downloaded pages (e.g., which media types can be displayed on the terminal). Note that page downloads can also overlap and thus run in parallel. This can easily happen when the user opens more than one browser window and uses them in parallel. This possibility is missing from the modeling proposals in the literature, where the “download-one-page-and-read-it” assumption is used. Page requests are modeled as events with exponentially distributed interarrival times (see Table 1), which can be related to user “clicks”. (Note that the page download time and the number of pages in a session are not explicitly modeled.)
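A minimal Python sketch of the page layer, assuming that page requests inside one session form a Poisson process with rate λ = 0.02 1/s (mean 50 s, Table 1). Because requests are not gated by the completion of earlier downloads, pages may overlap in time, as discussed above. Starting the first request one exponential gap after the session start, and the name `page_request_times`, are assumptions of this sketch.

```python
import random

PAGE_RATE = 0.02  # lambda of page request interarrival times [1/s], mean 50 s (Table 1)


def page_request_times(session_length_s, rng, rate=PAGE_RATE):
    """Page request instants (seconds from session start) within one session.

    Interarrival times are exponential, so requests form a Poisson process;
    a new request does not wait for the previous page to finish downloading.
    """
    times = []
    t = rng.expovariate(rate)  # assumption: first request after one exponential gap
    while t < session_length_s:
        times.append(t)
        t += rng.expovariate(rate)
    return times


rng = random.Random(7)
counts = [len(page_request_times(3400.0, rng)) for _ in range(2000)]
print(sum(counts) / len(counts))  # close to 3400 / 50 = 68
```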

4.3 TCP level

For each embedded object of a Web page, a new² TCP connection is opened to transfer the data. Therefore, the transfer of a Web page consists of several TCP connections that run in parallel. The object download time is the total time needed to transfer all bytes of a single Web file (object). Typically, the browser opens TCP connections as new objects occur on the Web page that is currently being downloaded. Thus, the requests for embedded objects are machine-initiated. The complex dynamics of TCP is the key mechanism responsible for the generated traffic pattern. Besides the implemented TCP version, the main parameters that can influence the TCP flows are the browser settings (e.g., how many parallel TCP connections are opened), the terminal capabilities (e.g., memory size), the accessed Web content (e.g., the sizes of the files that must be transferred), and, most of all, the network itself (e.g., bandwidth, load, losses, delays). The requests for embedded objects are modeled as events with exponentially distributed interarrival times. The number of objects per page is modeled as a geometric random variable with a mean of 4. The size of an embedded object is defined as the total number of bytes that must be transferred to display it (i.e., the file size). The object size is modeled as a random variable with a shifted Pareto distribution with a mean of 15 kbytes.

¹ Note the ‘pop-up’ windows that can open without the user’s direct request.
² Note that with HTTP 1.1, TCP connections can be kept open to download several files.
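The embedded-object layer can be sketched with inverse-transform sampling. The geometric count inverts F(n) = 1 - (1 - p)^n with p = 1/4, and the shifted Pareto size inverts F(x) = 1 - (x0/(x + x0))^α with x0 = 6 kbytes and α = 1.4, whose mean x0/(α - 1) is the 15 kbytes quoted above. Requesting the first object at the page request instant and spacing the remaining ones by exponential gaps (mean 4 s) is an assumption of this sketch, and the function names are illustrative.

```python
import math
import random

P_GEOM = 0.25     # p: number of embedded objects per page, mean 1/p = 4
X0_KBYTES = 6.0   # x0: shift of the Pareto object size distribution
ALPHA = 1.4       # alpha: Pareto shape, mean x0 / (alpha - 1) = 15 kbytes
OBJ_RATE = 0.25   # lambda of object request interarrival times [1/s], mean 4 s


def num_objects(rng, p=P_GEOM):
    """Inverse transform for F(n) = 1 - (1 - p)**n, n = 1, 2, ..."""
    u = rng.random()
    return max(1, math.ceil(math.log(1.0 - u) / math.log(1.0 - p)))


def pareto_inverse(u, x0=X0_KBYTES, alpha=ALPHA):
    """Solve u = 1 - (x0 / (x + x0))**alpha for x (object size in kbytes)."""
    return x0 * ((1.0 - u) ** (-1.0 / alpha) - 1.0)


def page_objects(page_time_s, rng):
    """(request_time_s, size_kbytes) for every embedded object of one page.

    Assumption: the first object is requested at the page request instant,
    later ones after exponential gaps with rate OBJ_RATE.
    """
    objects = []
    t = page_time_s
    for k in range(num_objects(rng)):
        if k > 0:
            t += rng.expovariate(OBJ_RATE)
        objects.append((t, pareto_inverse(rng.random())))
    return objects


rng = random.Random(3)
print(page_objects(10.0, rng))
```

With α = 1.4 the object size has infinite variance, so sample means converge slowly; checks against the 15 kbytes mean are better done on the closed form than on short simulation runs.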

Table 1: Model parameter summary.

Model | Distribution | Distr. function | Parameters | Mean
Session interarrival time | Exp. | $F(x) = 1 - e^{-\lambda x}$ | |
Session length | Weibull | $F(x) = 1 - e^{-(x/\beta)^{\alpha}}$ | α = 0.8, β = 3000 s | µ = 3400 s
Page req. interarr. time | Exp. | $F(x) = 1 - e^{-\lambda x}$ | λ = 0.02 1/s | µ = 50 s
Object req. interarr. time | Exp. | $F(x) = 1 - e^{-\lambda x}$ | λ = 0.25 1/s | µ = 4 s
# of emb. obj. in a page | Geom. | $F(n) = 1 - (1 - p)^{n}$ | p = 1/4 | µ = 4
Object size | Pareto | $F(x) = 1 - \left(\frac{x_0}{x + x_0}\right)^{\alpha}$ | x0 = 6 kbytes, α = 1.4 | µ = 15 kbytes
Packet size | Mult. mod. | - | see Eqs. (3)-(5) | -

Direct modeling of the TCP layer is avoided in the modeling approaches proposed in the literature, mainly because of its complexity. To calculate the average bandwidth of a TCP connection, the modeling goal is to capture the following effects. When a single TCP connection is active at a time, the connection will eventually use all the available bandwidth. On the other hand, when several TCP connections are active in parallel, they compete and divide the available bandwidth almost equally. To capture this phenomenon, we propose the following algorithm. Assuming that a certain fixed bandwidth is allocated to the user, each TCP connection is given half of the bandwidth that is still available at the time instant when it is opened, and this bandwidth remains constant during the connection time, i.e.,

$$BW^{(i)} = \frac{1}{2}\left( BW_{total} - \sum_{j=1}^{i-1} BW^{(j)} \cdot I\left\{ t^{(j)} < t^{(i)} < t^{(j)} + T^{(j)} \right\} \right), \tag{1}$$

where $I\{\cdot\}$ is the indicator function, $t^{(i)}$ is the start time of the $i$th TCP connection, $BW_{total}$ is the amount of bandwidth allocated to the user, and $T^{(i)}$ is the object download time given by

$$T^{(i)} = \frac{S_{obj}^{(i)}}{BW^{(i)}}, \tag{2}$$

where $S_{obj}^{(i)}$ is the size of the $i$th object.

4.4 Packet level

TCP connections are composed of IP packets. Since several TCP connections can be active at the same time, the packet streams of different TCP connections can be mixed. The size of an IP packet can vary between TCP connections. The packet sizes are determined by the lower layer protocols and the PDU (Protocol Data Unit) sizes used within the network.

The number of bytes in each packet ($S_{packet}^{(i)}$) is modeled by a multimodal distribution with probabilities

$$\Pr\{S_{packet}^{(i)} = 40 \text{ bytes}\} = p_{40} = 0.25, \tag{3}$$
$$\Pr\{S_{packet}^{(i)} = 512 \text{ bytes}\} = p_{512} = 0.35, \tag{4}$$
$$\Pr\{S_{packet}^{(i)} = 1500 \text{ bytes}\} = p_{1500} = 0.4. \tag{5}$$

We assign uniform packet sizes within the same TCP connection. This means that when a new TCP connection is opened, the packet size is chosen only once and is kept constant for all of its packets. (Note that the last packet can carry only a fraction of a full packet, namely the last segment of the file.) The number of packets ($N_{packet}^{(i)}$) can be calculated from the packet size and the object size, i.e.,

$$N_{packet}^{(i)} = \left\lceil \frac{S_{obj}^{(i)}}{S_{packet}^{(i)}} \right\rceil. \tag{6}$$

The packet interarrival times ($IAT_{packet}^{(i)}$) within the same TCP connection are modeled as a deterministic sequence, i.e.,

$$IAT_{packet}^{(i)} = \frac{T^{(i)}}{N_{packet}^{(i)}}. \tag{7}$$

Note that the packet interarrival times and packet sizes are constant only within a TCP connection. The packet stream generated by the model is the mixture of the packets of all TCP connections and thus has a composite interarrival time distribution and the multimodal packet size distribution of Eqs. (3)-(5).
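The bandwidth-sharing rule of Eqs. (1)-(2) and the packetization of Eqs. (3)-(7) can be combined into the short Python sketch below. It is an illustration under stated assumptions, not the authors' code: sizes are in bytes, bandwidth in bytes/s, connections are processed in order of start time, the first packet leaves at the connection start, and the last packet of a connection carries the remaining bytes. The names `Connection`, `assign_bandwidth` and `packetize` are hypothetical.

```python
import math
import random
from dataclasses import dataclass

PACKET_SIZES = (40, 512, 1500)    # bytes, Eqs. (3)-(5)
PACKET_PROBS = (0.25, 0.35, 0.40)


@dataclass
class Connection:
    start: float           # t^(i): start time [s]
    size: float            # S_obj^(i): object size [bytes]
    bw: float = 0.0        # BW^(i): allocated bandwidth [bytes/s]
    duration: float = 0.0  # T^(i): object download time [s]


def assign_bandwidth(conns, bw_total):
    """Eqs. (1)-(2): sorts conns in place by start time; each connection gets
    half of the bandwidth not held by connections still active at its start,
    and keeps that bandwidth until its object is downloaded."""
    conns.sort(key=lambda c: c.start)
    for i, c in enumerate(conns):
        busy = sum(p.bw for p in conns[:i]
                   if p.start < c.start < p.start + p.duration)
        c.bw = 0.5 * (bw_total - busy)
        c.duration = c.size / c.bw
    return conns


def packetize(conn, rng):
    """Eqs. (3)-(7): one packet size per connection, ceil(S_obj / S_packet)
    packets, equally spaced T / N apart starting at the connection start."""
    size = rng.choices(PACKET_SIZES, weights=PACKET_PROBS)[0]
    n = max(1, math.ceil(conn.size / size))
    iat = conn.duration / n
    packets = []
    for k in range(n):
        # the last packet carries only the remaining bytes of the object
        payload = size if k < n - 1 else conn.size - size * (n - 1)
        packets.append((conn.start + k * iat, payload))
    return packets


conns = assign_bandwidth(
    [Connection(0.0, 36000.0), Connection(1.0, 18000.0), Connection(20.0, 3600.0)],
    bw_total=3600.0,  # 28.8 kbit/s = 3600 bytes/s
)
print([c.bw for c in conns])  # [1800.0, 900.0, 1350.0]
packets = packetize(conns[0], random.Random(5))
```

In the example, the first connection gets 1800 bytes/s; the second, opened while the first is still active, gets half of the remaining 1800 bytes/s, i.e., 900 bytes/s; the third, opened at t = 20 s when the first has just finished, gets (3600 - 900)/2 = 1350 bytes/s. Because every new connection takes only half of what is left, the allocated bandwidths never exceed $BW_{total}$.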

4.5 Model parameter summary

Table 1 gives a summary of the distributions used in the model together with their parameters. The following quantities can also be calculated from these distributions and parameters:
• There are 68 pages per session on average.
• There are 272 downloaded objects (files) per session on average.
• 4.08 Mbytes of data are downloaded in a session on average.
• Assuming a peak access rate of 3.6 kbyte/s (i.e., 28.8 kbit/s), the average data rate per session is 1.2 kbyte/s.
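The derived quantities listed above follow from the means in Table 1 by simple arithmetic (via Wald's identity, assuming that the layers are mutually independent). The small Python function below reproduces them; its name and default arguments are illustrative.

```python
def session_summary(mean_session_s=3400.0, mean_page_iat_s=50.0,
                    mean_objects_per_page=4.0, mean_object_kbytes=15.0):
    """Average pages, objects, volume [kbytes] and data rate [kbyte/s] per session."""
    pages = mean_session_s / mean_page_iat_s          # 3400 / 50 = 68
    objects = pages * mean_objects_per_page           # 68 * 4 = 272
    volume_kbytes = objects * mean_object_kbytes      # 272 * 15 = 4080 kbytes
    rate_kbytes_s = volume_kbytes / mean_session_s    # 4080 / 3400 = 1.2 kbyte/s
    return pages, objects, volume_kbytes, rate_kbytes_s


print(session_summary())  # (68.0, 272.0, 4080.0, 1.2)
```

Relative to the 3.6 kbyte/s peak rate, the average of 1.2 kbyte/s corresponds to one third of the access capacity.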

5 Validation

The Web user traffic model described above was implemented in full detail. The model parameters were fitted to the measurement results, synthetic data traces were generated, and the model was validated by comparing these traces to the measured ones.

5.1 Higher level behaviour

The validation has concentrated on the TCP level behaviour of the model. The packet level is determined by the TCP level; therefore, it is essential to model the TCP connections correctly. First, we compared the arrival process of TCP connections in the measurement and in the model. The QQ-plot in Figure 6 compares the distributions of the interarrival times of TCP connections in our measurements and in simulations based on our synthetic data, and shows good agreement. The modeled and measured correlation structures of TCP interarrival times both suggest statistical independence. We also examined the probability distribution of the holding times of TCP connections in the measurements and in our model. Our model fits the measurements reasonably well, although the holding times were not explicitly modeled. The number of parallel TCP connections as a function of time was also compared: samples were taken in 10 second intervals, and the distributions showed surprisingly good agreement. This also supports the validity of our model.

Figure 6: Interarrival times in measurement and simulation (axes in seconds)

5.2 Lower level behaviour

Some lower level behaviour of our model was also investigated. The main question of interest was the feasibility of the modeled TCP download rates. The measurements showed that, apart from very short downloads, a wide range of different rates occurred with approximately equal weights. In our model, the distribution of download rates was somewhat different from the measured one. A possible reason is that the model of how the available bandwidth is shared requires further investigation. Nevertheless, the overall statistics are of the same order of magnitude, and in our opinion the model produces valid traffic even in simulations where no TCP stack is available.
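The two comparisons of Section 5.1 can be reproduced on any pair of traces with helpers like the following Python sketch: empirical quantile pairs for a QQ-plot, and the number of TCP connections open at 10-second sampling instants. The nearest-rank quantile rule, the (start, duration) representation of a connection, and the function names are assumptions of this sketch, not taken from the paper.

```python
def qq_points(sample_a, sample_b, n_quantiles=99):
    """Pairs of empirical quantiles (nearest rank) of two samples for a QQ-plot."""
    a, b = sorted(sample_a), sorted(sample_b)
    points = []
    for k in range(1, n_quantiles + 1):
        q = k / (n_quantiles + 1)
        qa = a[min(len(a) - 1, int(q * len(a)))]
        qb = b[min(len(b) - 1, int(q * len(b)))]
        points.append((qa, qb))
    return points


def active_counts(connections, horizon_s, step_s=10.0):
    """Number of TCP connections open at t = 0, step_s, 2*step_s, ... below horizon_s.

    connections: iterable of (start_s, duration_s) pairs.
    """
    conns = list(connections)
    return [
        sum(1 for start, dur in conns if start <= k * step_s < start + dur)
        for k in range(int(horizon_s // step_s))
    ]


print(active_counts([(0.0, 15.0), (5.0, 30.0)], horizon_s=40.0))  # [1, 2, 1, 1]
```

Points of `qq_points(measured, simulated)` lying close to the diagonal indicate matching distributions, which is how the agreement in Figure 6 is read.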

6 Model for radio access networks

6.1 ‘Make it wireless’

The main parameters of the model should be adjusted to the wireless mobile environment provided by radio access networks and services. The relevant factors are
• the changes in user behaviour due to mobility,
• the limited access rates,
• the different terminal capabilities,
• the ‘always connected’ scenario.
First of all, there is one key factor that is not mobile specific but can have a strong impact on the traffic generated by a wireless Web user: tariffing. The price of the service will primarily determine who will use the service, when, and how. (Going further, pricing can be a tool to shape the users’ traffic demand according to the operator’s strategy.) However, the pricing schemes that will be applied are hard to predict and thus are not considered in the present work. The mobility of the user is not the focus of our study; rather, the ‘wireless’ property is taken into account, i.e., the user is connected to the Internet via a radio access network. Radio access primarily means limited access rates. How to scale the model parameters to fit the whole range of available access rates in RANs is the main question for us, and it is discussed in more detail in Section 6.2. The model parameters assume mobile data terminals with capabilities similar to current desktop PCs, i.e., a large display, ample memory and computational power. Future handheld mobile data terminals are not considered in this document. The implemented model should also handle the fact that mobile stations with data capabilities will be ‘always connected’ without any need to dial in.

6.2 Scalability

The proposed model should be able to support the whole range of access rates available in RANs, from 16 kbps up to 384/2048 kbps. As the current measurements were performed at a user access rate of about 32 kbps (i.e., close to the 28.8 kbps modem speed), the model parameters need to be scaled. A higher available access rate can be expected to change the behaviour of the Web user considerably. In the current implementation, the TCP and page interarrival time parameters are scaled according to the actual access rate. The underlying assumption is that, when more bandwidth is available, the user clicks more frequently to access Web documents and the network serves the user faster, which is in line with expectations. (This is a reasonable theoretical assumption that we plan to confirm by measurements with modified user access rates. Similar investigations were published in [8].) Downloaded file sizes and packet sizes are unchanged, but the total amount of downloaded data changes, because the total number of TCP connections changes with the data rate. The measured mean session length of dial-in users is nearly one hour. This seems too long for a mobile environment and thus needs to be modified³.
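The text does not state the exact scaling rule, so the Python sketch below only illustrates one possible choice: a hypothetical linear rule in which the mean page and object (TCP) request interarrival times shrink in proportion to the access rate, relative to the roughly 32 kbps at which Table 1 was measured. File and packet sizes are left unchanged, as stated above. Both the rule and the name `scaled_mean_iat` are assumptions.

```python
REFERENCE_RATE_KBPS = 32.0  # approximate access rate of the modem pool measurements


def scaled_mean_iat(measured_mean_s, access_rate_kbps,
                    reference_rate_kbps=REFERENCE_RATE_KBPS):
    """Hypothetical linear scaling of a mean interarrival time with access rate."""
    return measured_mean_s * reference_rate_kbps / access_rate_kbps


for rate in (16.0, 32.0, 384.0, 2048.0):
    # page requests (Table 1 mean 50 s) and object requests (mean 4 s)
    print(rate, scaled_mean_iat(50.0, rate), scaled_mean_iat(4.0, rate))
```

Under this rule a 2048 kbps user would request a new page every 0.78 s on average; whether such a rule holds is exactly what the planned measurements with modified access rates are meant to check.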

6.3 Usability

The proposed Web traffic model is primarily intended to support the performance evaluation of radio access networks. The evaluation of such networks requires different levels of modeling detail, such as call or packet level. At different hierarchical levels, the model can be used for
• call level simulations to assess user activity,
• TCP level simulations, or
• packet level simulations (e.g., in link level radio access network simulations).
The proposed model can also be used in analytical studies.

7 Conclusions

A Web user traffic model for radio access networks was established. The model parameters were fitted to data measured at the modem pool. The main advantage of our proposed solution is that it makes it possible to create a suitable model for radio access networks by tuning the model parameters at the required modeling level. With this property, a helpful tool can be created to evaluate the performance of access networks as a function of different Web traffic characteristics. The next important step is to make the model ‘wireless’, that is, to modify the model parameters to represent the traffic expected from a wireless user. For this purpose, new data collections are planned with a modified modem pool (‘network’) setup, such as different modem access rates and modified outgoing link capacity. This study is intended to assess the scalability of the model parameters.

³ For example, call holding times with a mean of 11 seconds were reported in [7], where the users had to pay for the service.

References

[1] M. E. Crovella and A. Bestavros, “Self-Similarity in World Wide Web Traffic: Evidence and Possible Causes”, IEEE/ACM Transactions on Networking, Vol. 5, No. 6, pp. 835-846, December 1997.
[2] ETSI, “Selection Procedures for the Choice of Radio Transmission Technologies for the UMTS”, Technical Report TR 101 112, 1998.
[3] A. Reyes-Lecuona, E. Gonzalez-Parada, E. Casilari, J. C. Casasola and A. Diaz-Estrella, “A Page-Oriented WWW Traffic Model for Wireless System Simulations”, ITC 16, 1999.
[4] P. Barford and M. Crovella, “Generating Representative Web Workloads for Network and Server Performance Evaluation”, BU-CS-97-006, 1997.
[5] M. Poza and M. Iracheta, “On the Quest of a Better World Wide Web Traffic Model for UMTS”, Proc. of IEE 3G Mobile Communication Technologies Conference, Publication Number 471, pp. 456-460, London, April 2000.
[6] Z. Liu, N. Niclausse, C. Jalpa-Villanueva, and S. Barbier, “Traffic Model and Performance Evaluation of Web Servers”, technical document RR-3840, INRIA, December 1999.
[7] J. Farber, S. Bodamer, and J. Charzinski, “Statistical Evaluation and Modelling of Internet Dial-up Traffic”, technical document COST257TD(99)32, 1999.
[8] N. Vicari and S. Köhler, “Measuring Internet User Traffic Behavior Dependent on Access Speed”, technical document COST257TD(99)33, 1999.