A Classification Method for Bulk/RealTime Traffic based on Flow Statistics
Graduate School of Engineering Osaka City University, Japan
Masaki TAI, Shingo ATA, and Ikuo OKA
Abstract Streaming services that deliver video and voice are increasing rapidly. On the Internet, two kinds of traffic (bulk and real-time) coexist in the same network, which makes it difficult to satisfy the communication quality required by each kind individually. One promising approach is for the edge router to classify the kind of traffic and provide a different service for each type. Real-time traffic used to be easy to identify by checking the protocol and port numbers in the packet headers; however, this has become more difficult with the spread of streaming over TCP connections. Moreover, detecting real-time traffic carried over an overlay network such as P2P or VPN is not easy either. In this paper, we propose a new identification method for real-time traffic that does not check the protocol number but instead analyzes the statistical characteristics of packet arrivals. Experiments on monitored data show that the proposed method identifies real-time traffic effectively.
Key words: streaming, real-time traffic, traffic identification, classification, flow analysis
1
Background and research goals
Bulk and Real-time traffics exist together
Quality of real-time traffic is interfered by bulk traffic
Differentiated service
Traffic classification is needed
[Figure: a real-time flow and a bulk flow share one link in the same traffic direction; the application-controlled interval between real-time packets is lengthened by background bulk traffic.]
Existing classification techniques and problems
• Protocol number (UDP as real-time) ⇒ Not suitable for streaming over TCP (e.g., rtsp)
• Well-known port number ⇒ Cannot identify:
  1. Applications changing port #
  2. Unknown applications
  3. Real-time over VPN or overlay
• New traffic classification ⇒ based on flow statistics ⇒ not on port/protocol #
2
1. Introduction
Streaming services that deliver video and voice are increasing rapidly. For real-time traffic such as streaming, packet transmission delay is a particularly important metric for playout quality. However, in the current Internet, real-time and bulk traffic coexist in the same network, and under this condition the communication quality of real-time traffic is affected by background bulk traffic. The figure shows an example in which two flows (real-time and bulk) share the same link. Packets of the real-time flow are separated by inter-arrival gaps controlled by the application. However, the router processes packets on a FIFO basis regardless of whether they are bulk or real-time, so these gaps are disturbed by packets of the background traffic (the bulk flow in this figure). Lengthened intervals in real-time traffic may cause playback delay or packet loss.
To address this problem, one promising approach is to offer a differentiated service for each type of traffic by classifying traffic at the edge router [1]. Real-time traffic used to be easy to identify by checking the protocol field in the IP header, and checking the port number used in the packet was also effective. However, identification has become more difficult because of streaming over TCP connections, such as RTSP (Real Time Streaming Protocol). Moreover, a port number check cannot be applied to applications that change their port numbers between sessions for security reasons, and unknown real-time applications cannot be identified by port or protocol number either. Furthermore, when real-time traffic is delivered over an overlay network such as P2P or VPN, it is quite hard to identify the real-time traffic on its own, because both kinds of traffic (bulk and real-time) may be transferred over the same overlay connection.
For these reasons, traffic classification based on port numbers may no longer be applicable in the future. In this paper, we propose a new classification method that distinguishes bulk and real-time traffic by using flow statistics instead of port or protocol numbers. More specifically, we first monitor packets at the gateway of our laboratory and investigate the characteristics of each type of traffic. Based on the results, we propose a classification method that identifies real-time traffic by checking only the IP header of packets. We then apply the method to a separately captured dataset and show that it identifies real-time traffic with high probability. We note that little literature exists on classifying the type of traffic using only flow information at the router; so far, only a method for detecting P2P traffic has been proposed [2].
Monitoring flows
• Traffic monitor: deployed at the gateway of our laboratory
• Monitored traffic: Real-time (ms-streaming, macromedia-fcs) and Bulk (FTP or HTTP)
[Figure: Capture environment]

Bulk
Service             Site        Protocol name (Port #)
Linux distribution  Debian      ftp (21)
Linux distribution  Vine        ftp (21)
Linux distribution  Fedoracore  ftp (21)
Linux distribution  Debian      http (80)

Real-time
Service             Site        Protocol name (Port #)
Streaming contents  Biglobe     ms-streaming (1755)
Music               Bonjovi     ms-streaming (1755)
News                NHK         ms-streaming (1755)
News                FujiTV      macromedia-fcs (1953)
Dancer              Takefuji    macromedia-fcs (1953)
Streaming contents  GyaO        ms-streaming over rtsp (554)
3
2. Traffic monitoring
In this section we describe the measurement environment used in this paper.
2.1 Capture environment and classification of packets into flows
We deploy a traffic monitor at the gateway of our laboratory. The figure shows the network environment. We mirror the traffic that passes through the gateway and capture all packets exchanged between the hosts in the laboratory and the Internet. We implemented a capture program using libpcap [4]. The program records the source IP address, destination IP address, source port number, destination port number, protocol number, packet size, and timestamp of each packet. After monitoring, we classify the captured packets into flows and analyze statistics such as the flow duration and the packet inter-arrival times within each flow. For flow classification, we treat packets that share the same five-tuple (i.e., source and destination IP addresses, source and destination port numbers, and protocol number) as belonging to the same flow. We set the timeout for detecting the end of a flow to 60 seconds, based on the maximum value of the TCP retransmission timeout (RTO), which is 64 seconds [5].
2.2 Target traffic
As the target real-time traffic, we analyze several streaming flows from well-known sites in Japan. As bulk transfer traffic, we examine file downloads from the sites of Linux distributions. During the examination we capture all packets at the monitoring point. The sites and protocols are summarized in the tables on this slide. For real-time traffic, we select ms-streaming (Windows Media [6]), macromedia-fcs (streaming on Flash [7]), and RTSP (defined in RFC 2326 [8]). Note that these protocols use TCP as the transport layer protocol. Since our objective is to identify real-time traffic that cannot be distinguished by simply checking port or protocol numbers, we do not consider real-time applications that use UDP and/or well-known port numbers.
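The flow grouping described in Section 2.1 can be sketched roughly as follows. This is a minimal illustration, not the capture program used in the paper: the `Packet` record, its field names, and the function `split_into_flows` are hypothetical, and the sketch assumes that a flow is unidirectional (source and destination are not swapped) and that a gap longer than the 60-second timeout starts a new flow for the same five-tuple.

```python
from collections import namedtuple

# Hypothetical record layout mirroring the fields the capture program logs.
Packet = namedtuple("Packet", "ts src dst sport dport proto size")

FLOW_TIMEOUT = 60.0  # seconds, the flow-end timeout stated in Section 2.1


def split_into_flows(packets, timeout=FLOW_TIMEOUT):
    """Group packets by five-tuple; a silence longer than `timeout` ends a flow."""
    active = {}    # five-tuple -> packets of the currently open flow
    finished = []  # closed flows, each a list of packets
    for pkt in sorted(packets, key=lambda p: p.ts):
        key = (pkt.src, pkt.dst, pkt.sport, pkt.dport, pkt.proto)
        flow = active.get(key)
        if flow is not None and pkt.ts - flow[-1].ts > timeout:
            finished.append(flow)  # the previous flow timed out
            flow = None
        if flow is None:
            flow = []
            active[key] = flow
        flow.append(pkt)
    finished.extend(active.values())  # flows still open at the end of the trace
    return finished
```

Each returned flow is a time-ordered list of packets, from which the flow duration and the packet inter-arrival times can be derived.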
Flow characteristics
• Packet inter-arrival time is different
• Some packets have large gaps in real-time
• Most of packets arrive closely in bulk
[Figure: Variations of packet intervals. Panels: Real-time, Bulk. Axes: Elapsed time [usec], Interval time [usec].]
4
3. Characteristics of bulk and real-time traffic by flow analysis
In this section we investigate which flow statistics are important for identifying real-time traffic. We focus in particular on the packet inter-arrival intervals within a flow.
3.1 Comparison of packet interval characteristics between bulk and real-time traffic
The figure compares how the packet arrival intervals vary over time for four flows: Biglobe (ms-streaming), GyaO (ms-streaming over rtsp), Takefuji (macromedia-fcs), and Debian (ftp). In the Debian flow, almost all intervals are small, which indicates that its packets arrive in bursts. This is because the rate of bulk traffic is controlled only by the window flow control of TCP. For real-time traffic, on the other hand, the intervals form a kind of sawtooth wave: a set of packets arrives with small gaps between packets, and a large interval separates two consecutive sets. The size of the large gap differs from site to site, because the sites use different transfer rates and frame sizes. Furthermore, the large intervals in GyaO are almost identical to one another, because some real-time traffic uses constant bit rate transmission to maintain streaming quality. Variable bit rate streaming traffic, however, also has large intervals, for example between movie frames. In any case, as the figure shows, there is a significant difference in the variation of packet intervals between bulk and real-time traffic. The following slides discuss how we distinguish this difference automatically.
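As a small illustration of the quantity examined in this section, the sketch below derives packet inter-arrival intervals from the arrival timestamps of one flow. The function name and the two timestamp lists are hypothetical; the timestamps are synthetic values in microseconds chosen only to mimic the sawtooth pattern (real-time) and the back-to-back pattern (bulk) described above, and are not measured data.

```python
def inter_arrival_intervals(timestamps):
    """Return the gaps between consecutive packet arrival times of one flow."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


# Synthetic arrival times in microseconds (illustrative only, not measured).
realtime_like = [0, 500, 1000, 101000, 101500, 102000]  # two bursts, one large gap
bulk_like = [0, 500, 1000, 1500, 2000, 2500]            # packets arrive closely

realtime_gaps = inter_arrival_intervals(realtime_like)  # [500, 500, 100000, 500, 500]
bulk_gaps = inter_arrival_intervals(bulk_like)          # [500, 500, 500, 500, 500]
```

In the synthetic real-time flow a single large gap separates two sets of closely spaced packets, whereas the synthetic bulk flow contains only small gaps.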
Cumulative counts of packet intervals
1. Classify into flows
2. Measure arrival intervals between packets in flow
3. Sort intervals in ascending order
4. Calculate the cumulative probability f(x) from: (# of intervals eq. or less than x) divided by (total # of intervals)

Time                    3      4      9
Number of intervals     6      3      2
Probability             6/11   3/11   2/11
Cumulative probability  6/11   9/11   11/11

[Figure: Cumulative probability of packet intervals. Panels: Bulk, Real-time.]
• Large difference where cumulative distribution reaches 1
• Bulk: less than 10ms
• Real-time: around 100~1000ms
5
3.2 Cumulative counts of packet intervals
The figure shows the cumulative probability of the number of packet intervals of each length. To plot it, we first calculate the interval between each pair of consecutive packets in the flow. Next, we sort all intervals in ascending order and count the number of intervals of each length. Finally, we obtain the cumulative probability distribution of the packet arrival intervals. In real-time flows, 60% of the intervals are less than 5 msec, and a significant increase in the cumulative probability can be observed around 50% of the distribution. This is caused by the fixed interval generated by the streaming server. The remaining intervals, however, are significantly longer, and the cumulative probability reaches 100% only at around 100 msec. In bulk traffic, on the other hand, the increase in the cumulative probability is gradual, and most of the intervals are less than 50 msec. In particular, the majority of packets arrive back to back. When TCP is in congestion avoidance mode, most packets of bulk traffic arrive even more continuously, and the RTT has little influence on the arrival intervals. For these reasons, the arrival intervals of the two kinds of traffic are clearly different, and the interval at which the cumulative probability reaches 1 is shorter for bulk traffic than for real-time traffic.
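The count-based cumulative probability f(x) can be sketched as follows. The function name is hypothetical and the code is only an illustration of the definition "number of intervals equal to or less than x, divided by the total number of intervals"; it uses exact fractions so that the small example from the slide for this section (intervals 3, 4 and 9 observed 6, 3 and 2 times) can be checked by hand.

```python
from collections import Counter
from fractions import Fraction


def cumulative_count_distribution(intervals):
    """Return [(x, f(x))], where f(x) = (# intervals <= x) / (total # intervals)."""
    counts = Counter(intervals)
    total = len(intervals)
    running = 0
    result = []
    for x in sorted(counts):
        running += counts[x]
        result.append((x, Fraction(running, total)))
    return result


# Example from the slide: interval 3 seen 6 times, 4 seen 3 times, 9 seen twice.
example = [3] * 6 + [4] * 3 + [9] * 2
f = cumulative_count_distribution(example)
# f == [(3, Fraction(6, 11)), (4, Fraction(9, 11)), (9, Fraction(1, 1))]
```

The result reproduces the cumulative probabilities 6/11, 9/11 and 11/11 of the example.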
Cumulative time for packet arrivals
1. Classify into flows
2. Measure arrival intervals between packets in flow
3. Sort intervals in ascending order
4. Calculate the cumulative probability g(x) from: (sum of intervals eq. or less than x) divided by (total duration of flow)

Time                    3      4      9
Number of intervals     6      3      2
Probability             18/48  12/48  18/48
Cumulative probability  18/48  30/48  48/48

[Figure: Cumulative probability of the time of packet intervals. Panels: Bulk, Real-time.]
• Significant difference on the slope
• Bulk: almost vertical
• Real-time: gradually increasing
6
3.3 Cumulative time for packet arrivals
The figure shows, for each flow, the cumulative probability of the time occupied by packet arrival intervals. In contrast to the previous slide, we accumulate the time (not the number) of the packet arrival intervals. Let us illustrate with the table on this slide. By monitoring packets, we observe intervals of 3, 4, and 9 a total of 6, 3, and 2 times, respectively. In the previous subsection, the cumulative probability at the interval 3 is 6/11 (= 6/(6+3+2)). The cumulative time at the interval 3, on the other hand, is 18/48 (= (3×6)/(3×6+4×3+9×2)). The cumulative time also reflects the bursty nature of the traffic. As the figure shows, the curve for bulk transfer traffic rises much more sharply than the one for real-time traffic. The reason is that in bulk transfer most packets arrive immediately after the previous packet and therefore have very small intervals. Even though many packets have a small interval, the time they accumulate is still small, and most of the intervals are covered by a small interval value (e.g., 20 msec). In real-time traffic, on the other hand, the increase in the cumulative time is relatively slow and gradual because of the rate control performed by applications. The intervals generated by streaming applications are clearly larger than the packet gaps in bursty traffic.
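The time-weighted cumulative probability g(x) differs from f(x) only in that each interval contributes its length rather than a count of one. The sketch below is an illustration under that reading; the function name is hypothetical, and the intervals are assumed to be integers (for example microseconds) so that exact fractions can be used.

```python
from collections import Counter
from fractions import Fraction


def cumulative_time_distribution(intervals):
    """Return [(x, g(x))], where g(x) = (sum of intervals <= x) / (flow duration)."""
    counts = Counter(intervals)
    duration = sum(intervals)  # total time covered by all intervals of the flow
    if duration == 0:
        return []              # no measurable duration, g(x) is undefined
    running = 0
    result = []
    for x in sorted(counts):
        running += x * counts[x]
        result.append((x, Fraction(running, duration)))
    return result


# Same example as in the text: intervals 3, 4 and 9 seen 6, 3 and 2 times.
example = [3] * 6 + [4] * 3 + [9] * 2
g = cumulative_time_distribution(example)
# g == [(3, Fraction(18, 48)), (4, Fraction(30, 48)), (9, Fraction(48, 48))]
```

The values 18/48, 30/48 and 48/48 match the worked example, with `Fraction` storing them in reduced form (3/8, 5/8 and 1).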
Classification method
1. Calculate the cumulative probability g(x) from measured intervals (see Slide 6)
2. Obtain the interval x0 satisfying the median of the cumulative probability (i.e., g(x0) = 0.5)
3. Calculate the ratio of # of packets included within the range of ±Δ around x0
4. Identify that the flow is real-time when the ratio is less than the threshold r
We use Δ = 28ms and r = 0.75 for evaluation
[Figure: Cumulative probability. Curves: Real-time, Bulk.]
7
4. Real-time traffic classification method based on the analysis results
We propose a classification method for bulk and real-time traffic that uses the flow statistics characteristics shown in Section 3.
4.1 Classification algorithm
Our algorithm classifies each flow as either a bulk transfer or a real-time transfer by using the characteristics shown in the figure. The proposed classification method is as follows:
(1) Classify packets into flows.
(2) Obtain the arrival intervals of the packets in the flow.
(3) Sort the intervals in ascending order.
(4) Calculate the proportion of the flow duration occupied by each arrival interval, and calculate the cumulative time for packet intervals.
(5) Obtain the arrival interval at which the cumulative probability reaches its median (0.5).
(6) Obtain the ratio of packets included within the range of ±Δ msec around that interval.
(7) Identify the flow as real-time when the ratio of packets in the ±Δ msec region is less than the threshold.
We perform Steps (2) to (7) for each flow. As the threshold, we use the value that separates bulk traffic from real-time traffic in the flows analyzed in Section 3. The figure at the bottom left illustrates the classification method.
4.2 Influence of parameters
To apply our method, we need to determine the threshold for traffic classification. We first investigate the influence of the parameter Δ, which is the range within which a value is considered equal to the median, allowing for a negligible error. We examine how the detection accuracy changes as Δ is varied and search for an optimal value of Δ. The figure shows the detection accuracy for different values of Δ. From this figure, we observe that the rise in the proportion of packets within ±Δ as Δ increases is more remarkable for real-time traffic than for bulk traffic, because in bulk traffic more packets share a similar arrival interval. From this result, we set the threshold on the ratio of packets to 0.75 and Δ to 28 msec.
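Steps (2) to (7) for a single flow can be sketched as below. This is one possible reading of the algorithm, not the authors' implementation: the function name and return format are hypothetical, x0 is taken as the smallest interval at which the cumulative time reaches half of the flow duration, and the "ratio of packets" is approximated by the ratio of intervals that lie within ±Δ of x0. The defaults Δ = 28 msec and r = 0.75 are the values stated above; the two test flows are synthetic.

```python
def classify_flow(intervals_ms, delta_ms=28.0, ratio_threshold=0.75):
    """Classify one flow from its packet inter-arrival intervals (in msec).

    Returns (label, x0, ratio), where label is "real-time" or "bulk".
    """
    ordered = sorted(intervals_ms)
    duration = sum(ordered)
    if not ordered or duration <= 0:
        raise ValueError("flow needs at least one positive interval")

    # Steps (4)-(5): walk the time-weighted distribution g(x) up to its median.
    half = duration / 2.0
    running = 0.0
    x0 = ordered[-1]
    for x in ordered:
        running += x
        if running >= half:
            x0 = x
            break

    # Step (6): share of intervals inside the +/- delta window around x0.
    near = sum(1 for x in ordered if abs(x - x0) <= delta_ms)
    ratio = near / len(ordered)

    # Step (7): few intervals near x0 means the flow is treated as real-time.
    label = "real-time" if ratio < ratio_threshold else "bulk"
    return label, x0, ratio


# Synthetic flows (illustrative only): back-to-back packets vs. paced bursts.
bulk_result = classify_flow([1.0] * 100)
realtime_result = classify_flow([2.0] * 80 + [200.0] * 20)
```

For the synthetic bulk flow every interval lies within the window around x0 = 1 msec, so the ratio is 1.0 and the flow is labeled bulk. For the synthetic paced flow the median of the cumulative time falls on the 200 msec gaps, only 20% of the intervals lie near it, and the flow is labeled real-time.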
Classification result for real traffic
• 12 hours traffic captured at the gateway in lab.
• Different dataset from analyzed one

Date              Feb. 10, 2006
Total size        8,451,533,502 [Bytes]
Total flow        59,063 [flows]
Real-time / Bulk  25.7% / 74.3%

Real-time
Protocol (port)               Detection rate
ms-streaming (1755)           100%
macromedia-fcs (1953)         83.3%
ms-streaming over rtsp (554)  84%

Bulk
Protocol (Port)               Detection rate
ftp (21)                      91.8%
http (80)                     55.2%

• Real-time traffic is identified accurately
• Some traffics are mistakenly identified
  • Short-lived flows
  • Rate controlled flows
  • Keep-alive messages
• Future work
  • Evaluate with larger volumed traffic for validation
  • Combined method to avoid mis-identification
8
5. Classification result of using real traffic
The tables show the classification results for real traffic. We use a dataset different from the one analyzed to design the classification of real-time traffic. The top-left table summarizes the dataset. For comparison, we also count the number of flows that use the port numbers shown in the tables and calculate the ratio of those flows that are identified correctly by our proposed method. As a result, real-time traffic can be classified with high accuracy by the proposed method. From the left table, we identify more than 83% of real-time traffic, and 100% for ms-streaming in particular. However, real-time traffic is not detected completely, because the accuracy of the proposed method is influenced by its parameters. By analyzing the mis-identified traffic, we consider that an optimal threshold can be found with which real-time traffic can be classified perfectly. On the other hand, 44.8% of http traffic is identified as real-time traffic. Thus, the method occasionally mis-identifies bulk traffic as real-time traffic. We therefore have to investigate the port numbers and flow statistics of the mis-identified flows. We consider that such mis-identification is also caused by rate control other than that of streaming, and by delays and congestion in the network.
6. Conclusion
In this paper, we have proposed an algorithm that automatically classifies real-time traffic and bulk traffic by using the results of a statistical analysis of both kinds of flows. By applying it to separately captured data, we have shown that our method can identify real-time traffic with high accuracy. As future work, we have to evaluate the method with a larger volume of traffic for validation. In addition, we have to develop a combined method to avoid mis-identification.
References
[1] M. Ilvesmaki and M. Luoma, "Measurement based traffic classification in differentiated services," in Proceedings of SPIE International Symposium ITCom, August 2001.
[2] T. Karagiannis, A. Broido, M. Faloutsos, and K. Claffy, "Transport layer identification of P2P traffic," in Proceedings of IMC'04, pp. 25-27, October 2004.
[3] N. Brownlee and K. Claffy, "Understanding Internet traffic streams: dragonflies and tortoises," IEEE Communications Magazine, vol. 40, pp. 110-117, October 2002.
[4] "libpcap, a system-independent interface for user-level packet capture." http://sourceforge.net/project/libpcap/.
[5] V. Paxson and M. Allman, "Computing TCP's retransmission timer," RFC 2988, November 2000.
[6] "Windows media format, a digital media format for streaming applications." http://www.microsoft.com/windows/windowsmedia/.
[7] "Macromedia Flash format, a digital media format for streaming applications." http://www.macromedia.com/software/flash/.
[8] H. Schulzrinne, A. Rao, and R. Lanphier, "Real time streaming protocol (RTSP)," RFC 2326, April 1998.