On the capabilities of application level traffic measurements to differentiate and classify Internet traffic

Mika Ilvesmäki, Marko Luoma
Helsinki University of Technology, Networking Laboratory
Otakaari 5 A, ESPOO, Finland

ABSTRACT
The use of network-based traffic classification to differentiate aggregate traffic has been introduced with the development of new Internet service architectures, especially Differentiated Services. We present measurements and analysis of various packet and flow statistics to aid in classifying or differentiating traffic flows according to the nature of the application. Our study of traffic classification methods includes the background analysis of traffic traces to detect applications of varying nature by measuring packet inter-arrival times, packet lengths, flow inter-arrival times, and packet and flow shares of total traffic. The most promising results with a single statistic are achieved when classifying traffic based on packet inter-arrival patterns. The inter-arrival time (IAT) distributions of packets seem to be able to divide the traffic into two distinguishable classes. However, the division into three or more classes remains a somewhat ambiguous issue and needs further research. The results also indicate that no single statistic is able to classify application flows with reasonable certainty, but that this might be achieved when several statistics and their analysis results are combined. A good way to improve the classification result would be to increase the dimensionality of the classification. For instance, combining the classification results of packet IAT and packet length distributions would almost certainly lead to the detection of applications of different nature.

Keywords: Traffic classification, Internet, Router Performance, Quality of Service, Artificial Intelligence

1. INTRODUCTION
The existence and use of Internet applications change across the network in various ways [1]. The changes occur in relation to the network topology, and relative to the location on the planet in general. Furthermore, the changes occur on different time scales: applications send packets at varying times, the use of applications changes during the hours of the day, and new applications emerge and old ones fade away over longer periods of time. To achieve user satisfaction, these dynamics of Internet traffic must be taken into account when aiming to provide traffic differentiation for QoS, or when trying to optimize the workload required to deliver this traffic [2].

In the current Internet there is only a single best-effort service class, with no guarantees for either relative or absolute service classes; furthermore, it is questionable whether we can ever expect the Internet to provide absolute end-to-end QoS [3–6]. The current best-effort-only architecture is ideal for elastic applications that adapt to the available network resources [7]. However, emerging delay-sensitive applications, such as voice-over-IP and videoconferencing (the latter also bandwidth-hungry), require, or benefit from, the ability to provide service differentiation and service classes in the network. Similarly, the new QoS-sensitive applications that are likely to appear might expect some sort of QoS guarantees from the network [8].

A lot of work has been put into developing service architectures for the Internet. The most recent and, at the time of writing, also the most popular suggestion comes from the IETF Differentiated Services workgroup. The Differentiated Services architecture aims to provide network service to traffic aggregates with the aid of traffic classification at the network edges [9]. Whether the need for service levels in the network arises from using applications sensitive to the network performance parameters (bandwidth, packet loss, delay, jitter) or from the general need to receive better performance than other users in the network, it is evident that some form of traffic differentiation needs to be performed in order for the network to be able to offer these different service levels to the traffic.

Other author information: (Send correspondence to M.I.) M.I.: E-mail: [email protected] Fax: +358 9 451 2474

2. TRAFFIC CLASSIFICATION BY THE NETWORK
The ability to identify flows is a useful network property for optimizing resource usage in network equipment [10]. Furthermore, flows may also be associated with real-time performance guarantees, and one could identify these flows by matching packet headers with pre-specified filters. The fundamental question, and a source of heavy debate, is whether we should classify an IP packet flow implicitly, by and in the network, or explicitly, by user requests [11]. The different methods and possibilities of application classification and subsequent quality differentiation are illustrated in Figure 1.

[Diagram: a tree rooted at QoS. Node labels as they appear in the figure: Quality differentiation, Equal quality; Network Controlled, User Controlled, Flat fee; Cost differentiated, Signaled; Per Flow, Per Packet; RSVP, IntServ, DSCP, DiffServ; Header Analysis, Traffic Analysis; User Detection, Application Detection; SIMA, Smart Market, DiffServ, DiffServ AF; Real Time, Background; Length, BW, IAT, Activity. Annotation: "These produce filters for Header Analysis".]

Figure 1. Classification methods for quality differentiation

The selection done by the user implies that the network should offer a set of service classes from which the user then chooses. In this approach, the incentives in the network must encourage users to request the proper service class for their application, so that the use of network resources can be controlled. As informal social conventions are seen as inadequate to control selfish behavior, the alternative incentive of pricing has to be considered. In addition, applications have to know about the network service offerings.

When the service level is selected implicitly by the network, the network chooses the applications to be prioritized and the appropriate service class. Since there is no explicit commitment to a given service level, the mapping of the application to the service class, and the nature of the service delivered to each service class, need not be uniform across all routers nor stable over time. The selection criteria are based on preset filters indicating users, user groups, or applications. To determine the filters, analysis of measured traffic may be run as a background measurement and analysis process.

The implicit approach entails a fixed set of application classes, and it cannot accommodate individual or situational variations within a single application, because the network needs to know something about the requirements of each application [11]. However, if we use real-time monitoring, measurement and analysis of the network traffic, we might be able to map dynamic sets of applications to service classes and, consequently, offer varying and dynamically changing service levels. Another problem is that, with the implicit approach, the service offerings could differ between routers [11]. This problem could be overcome by introducing common service offering databases into the network, which could then be used to update the databases of individual routers. Finally, one could argue that the service should be requested explicitly by applications [11]. However, recent efforts on Internet QoS, such as the Differentiated Services approach, indicate that service levels concern aggregates of traffic flows, as, for example, when handing out bandwidth to specific user or application groups. Therefore, we feel that network-based traffic classification is a valid problem requiring more attention.

Consequently, if the network is used to initiate the classification and differentiation of the traffic, additional functionality in the form of traffic analysis methods should exist to dynamically create and update the classification filters and rules. Our intention in this work is to survey the statistics that could be measured and analyzed from the network and to study whether these properties reveal any indication of the nature of the application behind the traffic.

The rest of the work is arranged as follows. In Section 3 we examine the application classes that exist in the current Internet and suggest a division of traffic into three classes. In Section 4 we take a look at the packet and data flow properties that could be measured to form the classification criteria. In Section 5 we present the measurement results and analyze them in order to determine whether any particular measured property could be used as an aid in the classification process. Finally, we conclude and present some suggestions for further research.
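The idea of header filters that a background analysis process creates and updates, as discussed in this section, can be sketched as follows. This is only an illustration under stated assumptions: the names `HeaderFilter` and `FilterTable`, the choice of protocol and source port as the matched header fields, and the numeric class identifiers are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HeaderFilter:
    """One classification rule (hypothetical layout); None matches anything."""
    protocol: Optional[str]
    src_port: Optional[int]
    service_class: int

    def matches(self, protocol, src_port):
        return (self.protocol in (None, protocol)
                and self.src_port in (None, src_port))


class FilterTable:
    """Ordered rule set consulted for every packet header."""

    def __init__(self, default_class):
        self.default_class = default_class
        self.filters = []

    def update(self, new_filters):
        # Called by the background traffic analysis whenever it produces
        # a new set of filters; the old set is replaced as a whole.
        self.filters = list(new_filters)

    def classify(self, protocol, src_port):
        # First matching filter wins; unmatched traffic gets the default class.
        for rule in self.filters:
            if rule.matches(protocol, src_port):
                return rule.service_class
        return self.default_class


table = FilterTable(default_class=0)
before = table.classify("tcp", 22)          # no filters installed yet
table.update([HeaderFilter("tcp", 22, 2)])  # analysis installs one filter
after = table.classify("tcp", 22)
```

Replacing the whole rule set in `update` keeps the per-packet lookup simple; a router-wide deployment would additionally need the common service offering database mentioned above, which this sketch does not model.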

3. DIVIDING APPLICATIONS INTO CLASSES
One option for bringing dynamism into the selection of the prioritized application sets is to observe and measure traffic characteristics and then base the prioritized application sets on these observations and their analysis. Ideally, the set would contain those applications whose classification benefits the user. Therefore, we have to develop methods to recognize the possible characteristics of different types of applications. Looking at the duration of flows and traffic directionality, the Internet today carries three basic categories of traffic [12], and any kind of network environment aiming to offer services at different levels must recognize and adjust itself to these three basic categories. The categories are:

1. The long held adaptive reliable traffic flows, typically including the long held TCP traffic flows.
2. The short duration reliable transactions, where the lifetime of the traffic flow is so short that the flow sits completely within the startup phase of the TCP adaptive flow control protocol.
3. The externally controlled load unidirectional traffic flow, which is typically a result of compression of a real-time audio or video signal.

The classification of applications may also be done by other criteria. The transport layer protocols are the last possibility to control the sending of packets in the Internet. A view based on the application's use of transport protocols is shown in Figure 2. We can derive three main classes for the traffic if we put the UDP-based rate-controlled applications in the first class because of their likely interactive nature, combine the TCP-based interactive applications into the second group, and leave the rest of the applications to the third. Combining the previous approaches to application types, we arrive at the three traffic classes that are used in this work.

Consequently, in this work, we aim to divide the traffic into three classes using the measured properties of packets and flows. These classes, together with suggestions on the related applications, are presented in Table 1. The applications in Table 1 are chosen for closer inspection because, first of all, they exist in the network trace and because they offer, in our view, a diverse set of applications of distinctly different and characteristic nature. The aim from here on is to observe the traffic characteristics on different time scales and see whether any single measured property would clearly indicate the application type.

[Diagram: a tree rooted at Application classification, branching into TCP based (Interactive, Bulk) and UDP based (Controlled, Raw). Leaf labels as they appear in the figure: NFS, Object transfer, HTTP, Xwin; Bulk, FTP; Proprietary, RealAudio, RealVideo; RTP based, H.323; Bulk, Interactive; NFS; IP phone.]

Figure 2. Application types based on the use of transport protocols

Table 1. Service classes used: service class properties and typical applications

Class 1, Interactive traffic
  Intended class properties: High priority traffic, applications that send packets at very short intervals for lengthy periods of time and usually have human user interaction at both ends of the traffic flow.
  Example applications: VocalTech IP phone (port 22555).

Class 2, Transactional traffic
  Intended class properties: Priority traffic, applications that part of the time send packets at very short intervals, but have moderate intervals between bursts. These applications typically have a human user at one end requesting information from a server at the other end.
  Example applications: Secure Shell (port 22) and HTTP (port 80).

Class 3, Supportive traffic
  Intended class properties: Traffic that is important to the network functionality and reliable functioning of the applications but whose actual prioritization does not affect the perceived overall quality of the original application causing the birth of supportive traffic.
  Example applications: Name service applications (port 53 for dns).
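As a small illustration, the port numbers listed in Table 1 can be written down as a lookup table. The dictionary and function names below are hypothetical; only the port numbers and class labels are taken from the table, and keying on the source port is an assumption borrowed from the paper's own simplification.

```python
# Port numbers and class labels as listed in Table 1.
PORT_TO_CLASS = {
    22555: 1,  # VocalTech IP phone
    22: 2,     # Secure Shell
    80: 2,     # HTTP
    53: 3,     # name service (dns)
}

CLASS_NAMES = {
    1: "Interactive traffic",
    2: "Transactional traffic",
    3: "Supportive traffic",
}


def table1_class(src_port):
    """Return the Table 1 class name for a source port, or None if unlisted."""
    return CLASS_NAMES.get(PORT_TO_CLASS.get(src_port))
```

Ports that Table 1 does not list map to `None`; what class such traffic should receive is not specified in the text.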

4. MEASURED PROPERTIES TO CHARACTERIZE APPLICATIONS
As a general principle, the measurements should be kept as simple as possible; unnecessarily complex measurements should be avoided if similar results can be achieved with simpler ones. If we are to detect and classify individual applications from the traffic, there should be no need to measure properties that mix the packets and flows of several different applications. Therefore, in this work we concentrate on measurements made on the packets and flows of the selected application types listed in Table 1. The application is defined, for simplicity, by the source TCP port number in the packet header. This enables us to see the traffic patterns of the sending application sources. Internet traffic presents itself differently on different timescales. This is illustrated in Figure 3, and we can see that there is only a limited set of properties that may be measured and statistics that may be analyzed in order to determine the application type. In this work, we focus on measuring different statistics at the packet and flow level.

[Diagram: a timescale axis running from slow phenomena to fast phenomena. Labels as they appear in the figure: Traffic classification (resource based, user based); IP applications; flow classification; flows; Resource allocation; TCP connections; buffering; packets; packet classification; bytes; bits.]

Figure 3. The timescales of Internet traffic

4.1. Packet level statistics for application flows
Since real-time applications need to send and receive packets within a bounded delay, it is evident that such applications try to send packets at a fixed inter-sending time. At the receiving end it is also preferred, for satisfactory performance, that the packets arrive regularly according to a preset or presumed arrival process. Therefore, traffic flows are assumed to have real-time properties if the distribution of the arrival process is uni-modal [13]. This assumption does not take into account the advanced features of some interactive applications, such as silence detection in the IP phone application. With elastic applications, which usually utilize the flow control and congestion avoidance properties of the TCP protocol, the inter-arrival distribution is heavily distorted by the delay and capacity properties of the connection. These properties influence the window algorithm of the transmitting TCP client, through delayed ACKs, producing an inter-arrival distribution that resembles a hyper-geometrical shape. We measure the inter-arrival times of the packet streams [14] of selected applications and observe whether the distributions indicate the application type.

At the packet level, we also measure the packet length and how it is distributed within an application flow. For interactive applications, with their fixed sending process and their need to tolerate packet losses, the packet length may be considered short. In elastic applications the packet lengths may vary and also depend on the underlying network technology. Usually the aim is to use relatively long packets in order to maximize throughput and utilization.

Last, we measure the number of packets sent by an application compared to the total number of packets in the traffic trace. Comparing the number of packets sent by one application against the total tells us something about its popularity in the network. However, it should be remembered that a large number of packets may simply mean that an application is sending short packets, or that an application is in an erroneous state as a result of either a malfunction in the application software itself or the network being unstable or compromised.
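The packet inter-arrival time measurement described above can be sketched as follows. The function names, the fixed-width binning, and in particular the simple peak-counting test for modality (with its `min_share` threshold) are assumptions made for illustration; the paper does not specify how the distributions were computed or how modality would be judged. Timestamps are taken to be integers in milliseconds to avoid floating-point binning artifacts.

```python
from collections import Counter


def inter_arrival_times(timestamps):
    """Differences between consecutive packet timestamps."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def normalized_histogram(values, bin_width):
    """Map bin index -> fraction of values that fall into that bin."""
    if not values:
        return {}
    counts = Counter(int(v // bin_width) for v in values)
    return {b: c / len(values) for b, c in sorted(counts.items())}


def count_modes(hist, min_share=0.1):
    """Count local maxima of the histogram that hold at least min_share.

    Empty neighbouring bins count as zero; a flat plateau is counted once.
    """
    modes = 0
    for b, share in hist.items():
        left = hist.get(b - 1, 0.0)
        right = hist.get(b + 1, 0.0)
        if share >= min_share and share > left and share >= right:
            modes += 1
    return modes


# Hypothetical packet stream: 20 ms spacing with two 500 ms silent gaps.
stream_ms = [0, 20, 40, 60, 560, 580, 600, 1100, 1120]
hist = normalized_histogram(inter_arrival_times(stream_ms), bin_width=10)
modes = count_modes(hist)
```

Under the uni-modality assumption cited above, a stream with one mode would be a candidate for real-time treatment, while a stream with silence suppression, like the hypothetical one here, shows a second peak at the gap length.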

4.2. Flow level statistics for application flows
Advancing to the flow level, we first run into the trouble of defining the flow. Two points of view exist in the literature. The first states that a flow consists of the packets between the TCP packets that establish and tear down a TCP connection. The other view, which is also favored in this work, sees the flow as packets traversing from one part of the network to another, where the packets follow each other with certain common contents in the packet header and within a set time interval (the flow timeout) [15]. In this work, we use the commonly accepted [15–18] value of 60 seconds for the flow timeout.

The flow inter-arrival time distribution, that is, the frequency with which new flows occur, indicates how popular the application is in the network, provided that the flow timeout has a reasonable value. This property is partly dependent on the flow timeout and partly indicates the usage pattern of the application: if the timeout is very short, the flow times out rapidly and a new flow needs to be set up for the remaining packets. As the timeout increases, we start seeing flows that are created because new users of the application appear in the network. We also count the number of flows of an application relative to the total flow count of a traffic trace. For a fixed timeout value and a given application, this indicates how the application behaves and how often it is used in the network compared to other applications.
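The timeout-based flow definition adopted above can be sketched in a few lines. This is a minimal illustration, not the simulator used in the paper: the packet representation as `(timestamp, key)` pairs and the function names are hypothetical, and the flow key stands in for the "common contents in the packet header", which the text does not enumerate.

```python
FLOW_TIMEOUT = 60.0  # seconds, the timeout value used in this work


def assign_flows(packets, timeout=FLOW_TIMEOUT):
    """Group (timestamp, key) packets into timeout-delimited flows.

    A packet joins the latest flow of its key if it arrives within
    `timeout` seconds of that flow's previous packet; otherwise it
    starts a new flow.
    """
    latest = {}  # key -> index of that key's most recent flow
    flows = []
    for ts, key in sorted(packets, key=lambda p: p[0]):
        idx = latest.get(key)
        if idx is not None and ts - flows[idx]["last"] <= timeout:
            flows[idx]["last"] = ts
            flows[idx]["packets"] += 1
        else:
            latest[key] = len(flows)
            flows.append({"key": key, "start": ts, "last": ts, "packets": 1})
    return flows


def flow_inter_arrival_times(flows):
    """Time between the starts of consecutive flows."""
    starts = sorted(f["start"] for f in flows)
    return [later - earlier for earlier, later in zip(starts, starts[1:])]


# Hypothetical trace: key "a" pauses for 90 s, so its last packet opens a new flow.
trace = [(0.0, "a"), (5.0, "b"), (10.0, "a"), (100.0, "a")]
flows = assign_flows(trace)
gaps = flow_inter_arrival_times(flows)
```

With a shorter timeout the same trace splits into more flows, which is the dependence on the timeout value that the text describes.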

5. MEASUREMENTS AND RESULTS
The measurements and related simulations were done on a traffic trace obtained from the access point of the Networking Laboratory at Helsinki University of Technology. The trace is 34 days long and contains both outbound and inbound traffic. The traces were fed to a simulated IP router, similar in functionality to the one found in the simulation package used in [17], which then produced results on the packet and flow data as required.

The inter-arrival-time patterns for packets of the selected applications are shown in Figure 4. We present the distributions observing the packets both as one long stream and divided into flows. Note that the distributions do not change significantly. Observing the packet inter-arrival-time (IAT) distributions, we see that the distributions are different for each application type. The distribution of the secure shell (ssh / port 22) seems to have a long tail, whereas the IP phone application clearly has a characteristic bi-modal IAT-distribution, with the active and silent periods of the application showing as the two spikes in the distribution. The dns-service has its characteristic IAT-distribution, indicating that the packets appear in an equally spaced pattern. The http-protocol somewhat resembles the distribution of the ssh-application, with a shorter tail at the end. The IAT-distributions seem to characterize the applications with some level of certainty: dns is clearly separable from ssh and http, which resemble each other, and the IP phone application portrays its characteristic packet inter-arrival pattern. These distributions could be either parameterized or used for pattern recognition in order to detect other similarly behaving applications in the network.

Figure 4. Packet interarrival times for selected applications: (a) ssh, (b) http, (c) dns, (d) Vocaltech - IP phone. [Each panel shows a "Packet stream" plot and a "Flow sorted" plot; x-axis: Time (ms), y-axis: % of occurrences.]

The distributions of packet lengths for the selected applications are shown in Figure 5. Note that for the http-application we show the packet length distributions for randomly selected flows to show that the distribution varies from flow to flow. With the other applications the reader is assured that the individual packet length distributions per flow are similar to each other. We can see that the ssh-service and the dns-service send short packets, whereas the http-protocol seems to be more geared towards longer packets, indicating that the majority of packets seen in this application originate from www-servers. This is an expected phenomenon given the definition of the application by the source TCP port number. The spikes implying short packets in the http-protocol are the result of client-side acknowledgments to the server. All in all, a packet length distribution of this kind, with a lot of short and long packets, seems to be characteristic of the http-protocol. The IP phone application clearly shows its characteristic of using short packets, with almost exclusive preference for 180 bytes (almost 90% of the packets). The packet length distributions clearly divide the applications into two groups. Short packets may indicate interactive applications. However, there seems to be no absolutely certain way of telling the application priority based purely on the packet length distributions.

Figure 5. Packet length distributions for selected applications: (a) ssh, (b) http (Flow #1 to Flow #4), (c) dns, (d) Vocaltech - IP phone. [x-axis: Packet length (bytes), y-axis: % of occurrences.]

In Figure 6 we can see the share of packets and flows that the selected applications have in the traces. The http-application accounts for the largest share of both packets and flows among the selected applications, whereas the ssh-application is quite high in packet count but considerably lower in flow count. It should also be noted that, despite its relatively high share of packets, the IP phone application has quite a low share of flows. The packet and flow shares do not indicate the application characteristics as such, but observing the packet and flow shares together might indicate interactive applications when the packet count is relatively high and the flow count relatively low.

The inter-arrival-time distributions of application flows for the selected applications are shown in Figure 7. The ssh- and dns-applications are shown on a 10 minute timescale, whereas http is shown on a 1 minute timescale and the IP phone on a 1 second scale. We can see that the arrival processes of the application flows are very much alike in appearance and do not provide much information for the classification problem. Especially for the applications with interactive properties (ssh and http), the inter-arrival times are spread in a somewhat exponential fashion. The spike in the distribution at 60 seconds for ssh indicates that the flow timeout value used is too short and should be somewhat increased to lower the need for frequent flow setups. With the http-protocol the flows appear to be relatively short and to occur frequently. This is in accordance with the nature of the WWW-service, where flows are formed frequently to a wide selection of servers. The spikes at 90 second intervals for the dns-service are unexplained. A superficial glance at the dns-configuration did not reveal any parameters that could explain the behavior. The low overall usage of the IP phone provides no additional information on the application arrival process.
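The packet length observations above, for example that the IP phone sends almost 90% of its packets with a length of 180 bytes, correspond to simple summary statistics of the length distribution. The following sketch shows two of them; the function names and the 200-byte boundary for a "short" packet are assumptions for illustration only, and the sample length lists are invented rather than taken from the trace.

```python
from collections import Counter


def dominant_length(lengths):
    """Return (most common packet length, its share of all packets).

    Assumes a non-empty list of packet lengths in bytes.
    """
    length, count = Counter(lengths).most_common(1)[0]
    return length, count / len(lengths)


def short_packet_share(lengths, threshold=200):
    """Fraction of packets no longer than `threshold` bytes (assumed cutoff)."""
    return sum(1 for n in lengths if n <= threshold) / len(lengths)


# Invented samples: a phone-like flow and an http-like flow with ACKs and data.
phone_like = [180] * 9 + [60]
http_like = [40, 40, 1500, 1500, 1500]

phone_summary = dominant_length(phone_like)
http_short = short_packet_share(http_like)
```

A single dominant length with a high share fits the fixed sending process described for interactive applications, whereas a mix of very short and very long packets matches the pattern reported here for http.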

6. CONCLUSIONS
In this work, we observe the accumulated statistics of flows and packets for selected applications. No single measured property indicates the nature of an application with certainty. The best single characteristic for a given application seems to be the inter-arrival-time distribution of packets. The IAT distribution of packets seems to be able to divide the traffic into two distinguishable classes. However, the division into three or more classes remains a somewhat ambiguous issue and needs further research. The distributions of flow inter-arrival times do not seem to produce any additional data for classification purposes.

Application   (a) Packet count   (b) Flow count
http          18.32 %            21.19 %
ssh           14.34 %             2.73 %
dns            0.31 %             1.49 %
ftp            0.34 %             0.04 %
Vocaltech      1.56 %             0.01 %
Other         65.13 %            74.55 %

Figure 6. Packet and flow shares for selected applications

The modeling of flow arrival processes for different kinds of applications might nevertheless prove to be an interesting research issue in the future. A good way to improve the classification result would be to increase the dimensionality of the classification. For instance, combining the classification results of packet IAT and packet length distributions would almost certainly lead to the detection of applications of different nature. The same applies to combining the packet and flow count measures: these measures alone do not aid in detecting different applications, but combined they might make it possible to detect applications with relatively few flows and relatively large packet counts as being of interactive nature. Naturally, this statement needs to be backed up with extensive measurements and analysis.
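The suggestion to combine packet and flow counts can be tried directly on the shares reported in Figure 6. The sketch below uses one possible combination, the ratio of packet share to flow share; this particular index and the names used are assumptions for illustration, not a measure proposed in the paper.

```python
# (packet share %, flow share %) per application, as read from Figure 6.
SHARES = {
    "http": (18.32, 21.19),
    "ssh": (14.34, 2.73),
    "dns": (0.31, 1.49),
    "ftp": (0.34, 0.04),
    "Vocaltech": (1.56, 0.01),
}


def packet_to_flow_ratio(packet_share, flow_share):
    """Large values mean few flows that each carry many packets."""
    return packet_share / flow_share


# Applications ordered from most to least packet-heavy per flow.
ranking = sorted(
    SHARES,
    key=lambda app: packet_to_flow_ratio(*SHARES[app]),
    reverse=True,
)
```

On these figures the IP phone ranks first and dns last, which agrees with the expectation stated above. The ftp application, however, also ranks above ssh, so the ratio by itself does not appear to separate interactive traffic from bulk transfers; this is consistent with the caution that the statement needs further measurements and, presumably, further dimensions such as packet IAT or length.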

ACKNOWLEDGMENTS
This work was sponsored by the IMELIO-project of the Networking Laboratory at Helsinki University of Technology.

Figure 7. Flow interarrival time distributions for selected applications: (a) ssh, (b) http, (c) dns, (d) Vocaltech - IP phone. [x-axis: time (s), from 0 to 600 s for ssh and dns, 0 to 60 s for http, and 0 to 1 s for Vocaltech; y-axis: % of occurrences.]

REFERENCES
1. J. Apisdorf, K. Claffy, K. Thompson, and R. Wilder, “OC3MON: Flexible, affordable, high-performance statistics collection,” tech. rep., MCI Telecommunications Corporation, 1997.
2. K. Thompson, G. J. Miller, and R. Wilder, “Wide-area traffic patterns and characteristics (extended version),” IEEE Network, November/December 1997.
3. M. Borden, E. Crawley, B. Davie, and S. Batsell, Integration of Real-Time Services in an IP-ATM Network Architecture, August 1995.
4. S. Shenker, C. Partridge, and R. Guerin, Specification of Guaranteed Quality of Service, February 1997.
5. E. Crawley, R. Nair, B. Rajagopalan, and H. Sandick, A Framework for QoS-based Routing in the Internet, March 1996.
6. B. Rajagopalan and R. Nair, Quality of Service (QoS)-based Routing in the Internet - Some Issues, October 1996.
7. L. Breslau and S. Shenker, “Best-effort versus reservations: A simple comparative analysis,” in Proceedings of SIGCOMM ’98, IEEE/ACM, 1998.
8. E. Guarene, P. Fasano, and V. Vercellone, “IP and ATM integration perspectives,” IEEE Communications Magazine, January 1998.
9. K. Kilkki, Differentiated Services for the Internet, Macmillan Technology Series, Macmillan Technical Publishing, 1999.
10. V. P. Kumar, T. Lakshman, and D. Stiliadis, “Beyond best effort: Router architectures for the differentiated services of tomorrow’s internet,” IEEE Communications Magazine 36, pp. 152–164, May 1998.
11. S. Shenker, “Fundamental design issues for the future internet,” IEEE Journal on Selected Areas in Communications 13, pp. 1176–1188, September 1995.
12. P. Ferguson and G. Huston, “Quality of service in the internet: Fact or fiction,” 1998.
13. A. Chapman and H. Kung, “Automatic quality of service in IP networks,” in Proceedings of the Canadian Conference on Broadband Research, pp. 184–189, April 1997.
14. B. Nandy, N. Seddigh, A. Chapman, and J. H. Salim, “A connectionless approach to providing QoS in IP networks,” in High Performance Networking, H. van As, ed., pp. 363–379, IFIP, Kluwer Academic Publishers, September 1998.
15. K. C. Claffy, H.-W. Braun, and G. C. Polyzos, “A parameterizable methodology for internet traffic flow profiling,” IEEE Journal on Selected Areas in Communications 13, pp. 1481–1494, October 1995.
16. P. Newman, T. Lyon, and G. Minshall, “Flow labelled IP: A connectionless approach to ATM,” in IEEE Infocom, San Francisco, IEEE, March 1996.
17. S. Lin and N. McKeown, “A simulation study of IP switching,” in ACM SIGCOMM ’97, 1997.
18. P. Newman, G. Minshall, and T. L. Lyon, “IP switching – ATM under IP,” IEEE/ACM Transactions on Networking 6, pp. 117–129, April 1998.
