Monitoring the Spatial-Temporal Effect of Internet Traffic Based on Random Matrix Theory

Jia Liu
Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
Email: [email protected]

Wenzhu Zhang
Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
Email: [email protected]

Jian Yuan, Depeng Jin and Lieguang Zeng
Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
Email: [email protected]

978-1-4244-2413-9/08/$25.00 ©2008 IEEE

Abstract—Monitoring traffic patterns is an important approach to improving the performance of computer networks. In recent years, Peer-to-Peer (P2P) services have gradually come to dominate Internet traffic, overtaking traditional Client-Server (C-S) services such as the Web. This changes the situation of networks in two main respects. First, popular P2P file-sharing applications require broad downstream and upstream bandwidth, which conflicts with the asymmetric structure of networks designed for C-S services. Second, the topology of P2P overlay networks varies as peers join and leave. This dramatic shift in Internet traffic may produce patterns different from those observed in earlier years. It calls for a new round of Internet measurement, which requires effective monitoring of network-wide traffic within the region of an ISP or autonomous system. Unfortunately, few technologies can capture the dynamics of C-S and P2P traffic well at a macroscopic level. In this paper, we propose a method based on Random Matrix Theory (RMT) for detecting network-wide traffic patterns. Using only a few observation points, our method can monitor the macroscopic effect of Internet traffic. We show that such macroscopic-level monitoring can capture shifts in the spatial-temporal patterns caused by P2P and C-S traffic, and indicate where and when P2P and C-S traffic are likely to arise in transit networks.

I. INTRODUCTION
In the decade since the birth of Peer-to-Peer (P2P) technology, the Internet has continued its rapid and often surprising evolution. A 2006 survey by CacheLogic [1] found that from 1993 to 2002 Web service was the main source of Internet traffic, after which P2P traffic grew to dominate Internet traffic, and it is still growing. Recently, another study at Tsinghua University showed that P2P file-sharing traffic dominates the campus network, consuming 49.09% of the bandwidth. P2P traffic has undoubtedly become one of the main types of Internet traffic, and the popularity of P2P applications has made it convenient for Internet users to share resources. However, in contrast to traditional network services, P2P services transfer data without much consideration for the fair and optimal use of network resources or for side effects on other network services. For example, more than 60% of today's Internet traffic is P2P, which ties up 50-60% of downstream bandwidth and 70-80% of upstream bandwidth, respectively [1]. This bandwidth requirement conflicts with the asymmetric structure of the Internet, which was originally designed for Client-Server (C-S) services, and may create bandwidth bottlenecks. The stunning growth and bandwidth-intensive nature of such applications have a substantial impact on the traffic patterns of the underlying network, and some assumptions made in the past may no longer hold for P2P networking. Worse still, the topology of P2P networks changes continuously as peers join or leave the overlay, which makes monitoring more difficult. Yet it is important for ISPs and Internet protocol designers to monitor P2P systems dynamically across the whole network, and then to configure resources effectively so as to avoid bandwidth bottlenecks and other problems. Consequently, the modern Internet requires effective monitoring of network-wide traffic.
Research based on measurements of P2P systems (e.g., Napster, Gnutella and KaZaA), such as P2P topology monitoring [2] [3] and P2P traffic characterization [4] [5], has flourished. Some studies of P2P topology focus on properties of the topology graph, such as vertex connectivity [6], the distribution of distances among peers [3] [7] and clustering [3] [8], while others examine the connectivity and session times of the dynamic topology. In addition, some researchers focus on traffic characteristics, such as the skewed distribution of traffic across the network at different levels of spatial aggregation [4]. However, ISPs may be more interested in a direct approach that monitors traffic patterns at a macroscopic level, indicating where and when P2P and C-S traffic are likely to arise in the network, and where and when routers are congested or likely to become congested. Such detailed information is helpful for traffic engineering.
Unfortunately, few technologies to date can evaluate and understand P2P traffic patterns at a macroscopic level. On the one hand, it is difficult to measure the spatial distribution of peers. As peers join or leave the network, they
collectively, in a decentralized fashion, form an unstructured and dynamically changing overlay topology. On the other hand, P2P traffic is distributed differently in different areas [1], and temporal analysis requires many more resources, including sufficient observation points and effective algorithms. As a result, it is difficult to collect information over the whole network. In this paper, we propose a method based on Random Matrix Theory (RMT) that reveals the spatial-temporal patterns of P2P and C-S traffic. With only a few observation points, this method can capture more detailed information, indicating where and when peers join and leave, and where and when more bandwidth is consumed. The rest of this paper is organized in five sections. Section II introduces related work on RMT methods. Section III describes our technique for analyzing macroscopic behavior in detail. Section IV describes our simulation model, and Section V discusses the simulation results. Finally, Section VI concludes the paper and outlines future work.

II. RELATED WORK
Random matrix theory is concerned with the following question. Consider a large matrix whose elements are random variables with given probability laws. What can one say about the probabilities of a few of its eigenvalues or eigenvectors? This question was originally pertinent to understanding the statistical behavior of slow neutron resonances [9] [10] in nuclear physics, where it was posed in the 1950s and studied intensively by physicists. Random matrix theory (RMT) was later developed by Wigner, Dyson, Mehta and others, and then gained importance in other areas. In [11] [12], studies applying RMT methods to the cross-correlation matrix C of financial time series showed that almost 98% of the eigenvalues of C agree with RMT predictions, suggesting a considerable degree of randomness in the measured cross correlations.
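As a concrete reading of "agree with RMT predictions": for K mutually uncorrelated, normalized time series of length M, in the limit of large K and M with Q = M/K ≥ 1 fixed, the eigenvalues of their correlation matrix fall inside the band [λ−, λ+] with λ± = 1 + 1/Q ± 2√(1/Q), distributed according to the density given as Eq. (1) in Section III. The plain-Python sketch below evaluates that band and density and checks numerically that the density integrates to one. The sizes K = 160 and M = 500 are illustrative assumptions only.

```python
import math


def mp_bounds(q):
    """Edges (lambda_minus, lambda_plus) of the RMT band for Q = M/K >= 1."""
    spread = 2.0 * math.sqrt(1.0 / q)
    return 1.0 + 1.0 / q - spread, 1.0 + 1.0 / q + spread


def mp_density(lam, q):
    """Eigenvalue density of a purely random correlation matrix."""
    lo, hi = mp_bounds(q)
    if lam <= lo or lam >= hi:
        return 0.0  # no eigenvalues outside the band in the large-K, M limit
    return q / (2.0 * math.pi) * math.sqrt((hi - lam) * (lam - lo)) / lam


def integrate_density(q, n=200_000):
    """Midpoint-rule integral of the density over its support (should be ~1)."""
    lo, hi = mp_bounds(q)
    h = (hi - lo) / n
    return h * sum(mp_density(lo + (i + 0.5) * h, q) for i in range(n))


K, M = 160, 500  # illustrative sizes only (assumption, not from the paper)
Q = M / K
lam_minus, lam_plus = mp_bounds(Q)
print(f"Q = {Q:.3f}, band = [{lam_minus:.3f}, {lam_plus:.3f}]")
print(f"integral of density = {integrate_density(Q):.4f}")
```

Eigenvalues of an empirical correlation matrix that lie well above lam_plus are the ones that carry genuine correlation; those are the deviations the rest of the method relies on.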
These results raise a new question: what can we infer about the structure of C? In [13], Plerou et al. interpreted the deviations from RMT in financial data: the largest eigenvalue of C represents the influence of the entire market, common to all stocks, while the remaining eigenvalues that deviate from RMT reveal cross correlations among stocks in the same industry, stocks with large market capitalization, and stocks of firms doing business in certain geographical areas. Like financial data, Internet traffic data are correlated in time. A number of empirical studies have convincingly shown that the temporal dynamics of Internet traffic exhibit long-range dependence [14] [15], which implies a nontrivial correlation structure at large timescales. This can be explained by the self-organizing behavior of the TCP congestion-control algorithm: when a large number of connections share the Internet, underlying interactions among the connections avoid router congestion simultaneously over varying spatial extents. The flow rates in a network play the role of the different stocks, while the underlying interactions among the
connections play the role of the varying levels of volatility of different stocks. Since deviations of eigenvalues from RMT have a significant interpretation in financial markets, the same may be expected for Internet traffic. In 2002, an RMT-based study of correlations among data flows in the Renater network [16] found that the largest eigenvalue is approximately 100 times larger than predicted for uncorrelated time series, and that the distribution of the components of its eigenvector deviates significantly from the Gaussian distribution predicted by RMT. Furthermore, the Renater study revealed that all components of the eigenvector corresponding to the largest eigenvalue are positive, which implies that they contribute collectively to strong correlation in congestion over the whole network. Since all network flows contribute to this eigenvector, it can be viewed as an indicator of spatial-temporal correlation in network congestion. From this point of view, congestion emerges from underlying interactions among flows crossing a network in various directions. Building on this approach, [17] successfully monitored DDoS attacks. In our research, we find that P2P traffic and C-S traffic share this likely-congested characteristic: each peer is the destination of aggregate flows crossing a network constructed by peers, while a server is the endpoint of flows from thousands of clients. The frequent exchange of information among peers, or between clients and servers, strengthens their correlation. Hence, the impact of P2P and C-S traffic on the Internet can be exposed through their correlation, as revealed by the components of the eigenvector of the largest eigenvalue. Furthermore, [16] shows that the n most correlated connections are given by the n largest components of the eigenvector corresponding to the largest eigenvalue. In the Internet, flows into peers and servers are all aggregated.
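The Renater observation can be reproduced qualitatively on synthetic data. The sketch below is an illustration under assumptions of our own, not the paper's simulation: eight hypothetical flows share one "congestion" signal with arbitrary coupling strengths (3.0, 0.6 and 0.2) plus independent noise. It normalizes each flow, forms the equal-time correlation matrix, finds the largest eigenpair by power iteration (one standard solver; the paper does not say which one it uses), and ranks flows by the magnitude of their eigenvector components. The strongly coupled flows should rank first, and the largest eigenvalue should lie well above λ+ = 1 + 1/Q + 2√(1/Q) with Q = M/K.

```python
import math
import random


def normalize(row):
    """Z-score one flow over its window (zero mean, unit variance)."""
    m = sum(row) / len(row)
    sd = math.sqrt(sum((v - m) ** 2 for v in row) / len(row))
    return [(v - m) / sd for v in row]


def correlation(f):
    """Equal-time cross-correlation <f_i f_j> averaged over the M samples."""
    m = len(f[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / m for fj in f] for fi in f]


def largest_eigenpair(c, iters=200):
    """Power iteration; C is positive semi-definite, so the dominant
    eigenvalue found this way is also the largest one."""
    k = len(c)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iters):
        w = [sum(cij * vj for cij, vj in zip(row, v)) for row in c]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cv = [sum(cij * vj for cij, vj in zip(row, v)) for row in c]
    return sum(vi * x for vi, x in zip(v, cv)), v  # Rayleigh quotient, vector


rng = random.Random(7)
M = 400
coupling = [3.0, 3.0, 0.6, 0.6, 0.2, 0.2, 0.2, 0.2]  # hypothetical flows
g = [rng.gauss(0.0, 1.0) for _ in range(M)]         # shared congestion signal
flows = [[a * g[t] + rng.gauss(0.0, 1.0) for t in range(M)] for a in coupling]

C = correlation([normalize(r) for r in flows])
lam_max, v = largest_eigenpair(C)
Q = M / len(flows)
lam_plus = 1.0 + 1.0 / Q + 2.0 * math.sqrt(1.0 / Q)
ranking = sorted(range(len(v)), key=lambda i: -abs(v[i]))
print(f"largest eigenvalue {lam_max:.2f} vs RMT edge {lam_plus:.2f}")
print("flows ranked by |component|:", ranking)
```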
Because these flows aggregate at their destinations, we can define a weight vector by grouping together the eigenvector components that correspond to the same destination, which builds up information about the influence of that destination over the whole network. By comparing the entries of the weight vector with each other, we can summarize a network-wide view of spatial-temporal correlation, locate peers and servers, and observe traffic patterns to find further information, e.g., the impact of P2P traffic on the whole network and where and when P2P and C-S traffic arise. In particular, using relatively few observation points, we may infer a shift in the spatial-temporal correlation of large areas of interest outside the few areas where measurements are made. This approach can significantly reduce data requirements, perhaps to the point of real-time analysis. In what follows, we describe the mathematical details of our analysis method.

III. METHODOLOGY
As described in Section II, our approach is based on deviations from RMT. Statistical properties of random matrices such as X_{K×M} are specified in [18]. In the limit K → ∞, M → ∞ with Q ≡ M/K (≥ 1) fixed, it was shown analytically that the probability density function P_rm(λ) of the eigenvalues λ of the
random correlation matrix constructed from X_{K×M} is given by

$$P_{rm}(\lambda) = \frac{Q}{2\pi}\,\frac{\sqrt{(\lambda_{+}-\lambda)(\lambda-\lambda_{-})}}{\lambda}, \qquad (1)$$

for λ within the bounds λ− ≤ λ ≤ λ+, where λ− and λ+ are the minimum and maximum eigenvalues of this random correlation matrix, given by

$$\lambda_{\pm} = 1 + \frac{1}{Q} \pm 2\sqrt{\frac{1}{Q}}.$$

Accordingly, information about correlations among the time series can be extracted from the eigenvalues that fall outside the range λ− ≤ λ ≤ λ+. In the Renater case [16], the empirical distribution of the eigenvector components of the large eigenvalues is "flat", with all components of the same order. This suggests that the largest eigenvalue is associated with strong correlations across the network. [16] also showed that all components of the eigenvector of the largest eigenvalue reflect correlations existing in the network, i.e., the n most correlated connections are given by the n largest components of the eigenvector corresponding to the largest eigenvalue. Our approach is a development of this theory.

A. Traffic matrix
In our analysis, we assume that N subnets are interconnected through backbone routers to form a network, and that the routers of L subnets are selected as observation points to log outbound traffic. Let X_{K×M} denote the matrix of packet flows observed in the L subnets, where K = L × N is the number of observation points multiplied by the number of subnets, and M is the number of observations. Flows within the same subnet are set to zero in X_{K×M}. Each component x_ij of X_{K×M} is the number of packets of the ith source-destination flow counted in the jth time interval T, where T is the length of each observation interval. Our approach aims to capture the spatial-temporal characteristics of the whole network in order to locate peers and servers, which stand out because of their bursty traffic and strong correlation. We therefore normalize X_{K×M} as follows.

Each flow variable x_ij of X_{K×M} is normalized to f_ij by its mean m_ij and standard deviation σ_ij [13] [16] [17]:

$$f_{ij} = \frac{x_{ij} - m_{ij}}{\sigma_{ij}}. \qquad (2)$$
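Eq. (2) is a per-flow z-score. The minimal sketch below assumes that m_ij and σ_ij are the mean and (population) standard deviation of flow i over the M samples of the current window. Because intra-subnet flows are identically zero, their σ is zero and Eq. (2) is undefined for them; the sketch maps such rows to zeros, which is our own choice rather than something the paper specifies. The packet counts are hypothetical.

```python
import math


def normalize_flows(x):
    """Row-wise z-score of a K x M packet-count matrix, as in Eq. (2).

    m and sigma are taken per flow (row) over the M samples of the window.
    Zero-variance rows, such as the all-zero intra-subnet flows, are mapped
    to zeros: an assumption of this sketch, since Eq. (2) is undefined there.
    """
    f = []
    for row in x:
        m = sum(row) / len(row)
        sd = math.sqrt(sum((v - m) ** 2 for v in row) / len(row))
        f.append([(v - m) / sd if sd > 0 else 0.0 for v in row])
    return f


# Toy X with K = 3 flows and M = 4 intervals (hypothetical packet counts).
X = [
    [3, 5, 4, 8],
    [0, 0, 0, 0],     # intra-subnet flow: always zero
    [10, 12, 9, 13],
]
F = normalize_flows(X)
for row in F:
    print([round(v, 3) for v in row])
```

After this step every non-constant row has zero mean and unit variance, so the entries of the correlation matrix computed from F lie between -1 and 1.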
The normalized flow matrix F_{K×M} corresponds to X_{K×M}. In a uniformly distributed network, however, the normalization has little impact on the results.

B. Cross-correlation analysis
Given the characteristics of Internet traffic discussed above, we analyze the cross-correlation of the normalized traffic F_{K×M}. First, we compute the equal-time cross-correlation matrix C with elements [17]

$$c_{(ij)(kl)} = \langle f_{ij}\, f_{kl} \rangle_{t}, \qquad (3)$$

which measures the correlation between f_ij and f_kl, where ⟨...⟩_t denotes an average over a selected time interval t. The cross-correlation matrix C is real and symmetric, and each of its elements lies between -1 and 1.

Eigenanalysis is a frequently used method in matrix theory, and we analyze the correlation matrix C further through it [19]. The equation

$$C\omega = \lambda\omega \qquad (4)$$

defines the eigenvalues and eigenvectors, where the scalar λ is called an eigenvalue. If C is a square K × K matrix, then ω is an eigenvector, a nonzero K × 1 column vector. Eigenvalues and eigenvectors always come in corresponding pairs, and they can be computed in various ways [19]. An eigenvector is special for its associated matrix because the action of the matrix on it takes a particularly simple form: rescaling by a scalar, which is real here because C is real and symmetric.

C. Subnet weight
Let ω_l be the eigenvector corresponding to the largest eigenvalue λ_l; it often has special significance. As described in [16], the n most correlated connections are given by the n largest components of this eigenvector, and the cross-correlation matrix contains information about the underlying interactions among flows between subnets. The components of ω_l therefore reveal the influence of the corresponding flows on macroscopic behavior, abstracted from the matrix C into the pair (λ_l, ω_l). To monitor traffic patterns more explicitly, we define the subnet weight as in our earlier work [17]. ω_l is divided into N sub-vectors, i.e., ω_l = (ω_l1, ω_l2, ..., ω_lN). The kth sub-vector of ω_l contains the components ω_lik, representing the contribution of the ith observation to the kth subnet. Let S_{1×N} denote the weight vector, whose kth element is defined as

$$S_k = \sum_{i} \left(\omega_{lik}\right)^{2}. \qquad (5)$$

Related research [13] [16] [17] has shown that the cross-correlation matrix contains information about the underlying interactions among flows between subnets. We can therefore observe P2P and C-S traffic patterns through S_{1×N}, calculated from the eigenvector corresponding to the largest eigenvalue.

IV. SIMULATION MODEL
We use a four-tier model, as in our earlier work [17], to monitor the macroscopic effect of P2P and C-S traffic.

A. Topology settings
The four-tier topology, shown in Fig. 1, resembles the original Abilene network [20] and includes 11 backbone routers (A, B, ..., K), 40 subnet routers (A1, A2, ..., K4, K5), 110 leaf routers (A1a, A1b, ..., K5c, K5d) and 200 hosts per leaf router. The links between backbone routers are assumed to have varying delays; e.g., the link between routers D and F has the longest propagation delay, and the link between routers J and K has the shortest. Link capacities increase gradually from hosts to the backbone, with backbone links hundreds of times faster than host links. These assumptions simulate real hierarchical networks.
Fig. 1. The simulation model with 11 backbone routers, 40 subnet routers, and 110 leaf routers.
B. Generating background traffic
The background traffic consists of traffic between normal hosts. Our model includes 22,000 sources, each representing a client. Each source generates traffic as an ON/OFF process, which provides a convenient model of user behavior. At the beginning of each ON period, a destination receiver is chosen randomly from any leaf router. TCP is used in our experiment. Modern TCP implementations contain four intertwined algorithms: slow start, congestion avoidance, fast retransmit, and fast recovery. Our model behaves like Reno TCP, except that it halves the congestion window after receiving one duplicate ACK instead of three. The shortest path is selected for each packet, so routing is static; as a result, the packets of a connection take the same route and no reordering occurs. Although real networks have larger link capacities than our model, they also carry more connections. It has been verified previously that our TCP model generates behavior consistent with the dynamics of Internet traffic at various timescales [17]. Empirical measurements of the Internet show a heavy-tailed distribution of file sizes [21], so we use the Pareto distribution to model heavy-tailed characteristics. The Pareto distribution function has the form

$$P[X \le x] = 1 - \left(\frac{k}{x}\right)^{\alpha}, \qquad k \le x, \qquad (6)$$

where 0 < α ≤ 2 is the shape parameter. In our experiment, we select the same shape parameter, α = 1.5, for both ON and OFF periods, but with different means. Here, λ_on = 50 packets is chosen to represent the preference for small files, as is typical of Web page downloads. Empirical observations of OFF periods change dramatically
between observations made at night and in the day. We let λ_off = 5000 ms represent the average thinking time before a user requests another file. To detect traffic patterns, the background traffic should be neither too sparse nor too congested: when the network is too lightly loaded, traffic patterns cannot be observed because correlations among flows are weak; when the network is heavily overloaded, the likely-congested phenomenon we expect to observe is submerged by congestion everywhere.
C. Generating C-S traffic
Two nodes are selected as servers, located in subnets D5 and I1, respectively. For convenience, each C-S link generates traffic at a constant rate. For each of the servers in D5 and I1, we assign 50 sources at randomly selected leaf routers, with different starting times and sending rates, in order to verify that our method can identify detailed temporal information. The parameter details are given together with the simulation results.
D. Generating P2P traffic
Two cases are considered. The first is special and simple: there is only one group of peers in the network, and they act consistently with each other. The second is more complicated but closer to real P2P file-sharing systems. The simple case makes certain effects easier to observe, while the complicated one simulates real P2P systems.
1) One group of peers: Assume there is only one group of peers in our model, distributed under subnets D2, H2, J5, K2 and K3. In this P2P group, peers select sources uniformly from the leaf routers in these subnets, with the exception of their own
subnet. They join and leave synchronously to share the same files, and each peer sends packets at a constant rate. As in the real Internet, the P2P rate differs from the C-S rate.
2) Multiple groups of peers: In this case, we consider a dynamic P2P overlay network as follows. Let t0 < t1 < t2 denote three time instants. At t0, a group of peers distributed under A1, D2, J5, K2 and K3 begin communicating with each other. Then the peers under D2 leave the file-sharing system at t1, while those under H2 join at t2. This experiment simulates a P2P file-sharing system with dynamic peers.
E. Generating mixed traffic
In this part, we set up mixed traffic, which is more complicated and more realistic than the cases above. Let t0 ≤ t1 ≤ t2 ≤ t3 denote four increasing time instants. Initially, peers distributed under A1, D2, J5, K2 and K3 start at t0. At t1, the peers under D2 leave the P2P network, and C-S traffic to two servers under D5 and I1 is added. Then, the peers under H2 join the peer group at t2. Although this experiment differs from real traffic, which may involve many more connections and much larger capacity, our settings abstract the typical Internet traffic pattern: dynamic P2P traffic and C-S traffic mixed with plenty of background traffic. The results show whether dynamic P2P and C-S traffic can be detected together.
F. Routers
Leaf routers forward 5,000 packets per second (pps), subnet routers forward 20,000 pps, and backbone routers forward 160,000 pps, similar to real Internet routing speeds. To store and forward packets, all routers maintain a queue of limited length, where arriving packets are stored until they can be processed on a first-in, first-out basis.
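The router queue described above can be sketched as a bounded FIFO buffer. The sketch assumes drop-tail behavior (a packet arriving at a full queue is discarded), which the text implies through "limited length" but does not name. The 160-packet capacity matches the maximum queue length used in the simulation; the 10 ms service tick for a 5,000 pps leaf router (50 packets per tick) and the 300-packet burst are illustrative choices.

```python
from collections import deque


class DropTailQueue:
    """Bounded FIFO router queue; arrivals at a full queue are dropped."""

    def __init__(self, capacity=160):
        self.capacity = capacity
        self.buf = deque()
        self.dropped = 0

    def enqueue(self, packet):
        if len(self.buf) >= self.capacity:
            self.dropped += 1  # drop-tail: discard the newest arrival
            return False
        self.buf.append(packet)
        return True

    def service(self, n):
        """Forward up to n packets in arrival order (first in, first out)."""
        out = []
        while self.buf and len(out) < n:
            out.append(self.buf.popleft())
        return out


# A leaf router forwarding 5,000 pps serves 50 packets per 10 ms tick.
q = DropTailQueue(capacity=160)
for pkt in range(300):  # a burst of 300 packets arriving within one tick
    q.enqueue(pkt)
sent = q.service(50)
print(len(sent), q.dropped, len(q.buf))  # 50 140 110
```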
Small router queues lead to many losses during TCP slow start, while large queues produce excessive delays. To achieve a reasonable balance, TCP simulations [22] often set router queue lengths within a moderate range; we use a maximum queue length of 160 packets, and values within this range would not influence our qualitative findings.

V. SIMULATION RESULTS
About 10% of the subnet routers, i.e., four or five, are selected as observation points, which record all outbound flows to destination leaf routers. A central collector reliably receives the continuous stream of measured data from the observation points in time to perform the analysis. The simulation calculates the weight vector from M data points, each counted over an interval T, within a window that moves from one period to the next. We set M = 500 and T = 500 ms, and the window moves by 10 data points at a time, i.e., each step of the X axis corresponds to 10 data points. The first 200 data points are regarded as initial data, so the X axis represents how many times the time window has slid. From X we can calculate the
corresponding time; e.g., X = 5 corresponds to (200 + 5 × 10) × T = 125 s. The Y and Z axes represent the subnets and the weights in S_{1×N}, respectively.
Fig. 2. Spatial-temporal pattern of background traffic, observed by monitors B3, F1, F3 and J4. Generated by 22,000 ON/OFF sources, with λ_on = 50 packets, λ_off = 5000 ms and shape parameter α = 1.5.
A. C-S traffic monitoring
We select subnet routers B3, F1, F3 and J4 as observation points, which record all outbound flows to destination leaf routers. In Fig. 2, only background traffic is simulated, without any C-S or P2P traffic, and the traffic pattern is distributed randomly. Then, as described in part C of Section IV, two servers are added in subnets D5 and I1. A group of 50 randomly selected sources across the network starts sending packets at a constant rate of 0.5 pps at 100 s, with the server under subnet D5 as their destination, while another group of 50 randomly selected sources starts sending packets at 0.2 pps at 125 s, with destination I1. The simulation result is shown in Fig. 3. Compared with Fig. 2, the traffic pattern changes greatly, and subnets D5 and I1 reveal themselves clearly. The two raises begin at about X = 2 for D5 and X = 7 for I1, corresponding to 110 s and 135 s, respectively. Although there is a delay of about ten seconds, the temporal characteristics of the traffic pattern are clearly captured. A similar experiment with different sending rates, 0.14 pps for the server in D5 and 0.2 pps for the server in I1, is shown in Fig. 4. Comparing Fig. 3 with Fig. 4, the amplitude of a raise is larger when the corresponding server has higher-rate sources. We infer that the amplitude of a subnet's raise reflects the intensity of its inbound aggregate flow. This is very useful for ISPs, which can monitor likely-congested nodes and then
take effective action to optimize network resources and prevent congested nodes from being overloaded. According to this analysis, C-S traffic leads to a network-wide shift in spatial-temporal correlation, and our approach can effectively monitor the changing traffic pattern caused by C-S traffic. In addition, the traffic intensity is reflected in the amplitude of the raise.
Fig. 3. Spatial-temporal pattern of C-S traffic, observed by monitors B3, F1, F3 and J4. For D5, rate = 0.5 pps, starting time = 125 s; for I1, rate = 0.2 pps, starting time = 100 s.
Fig. 4. Spatial-temporal pattern of C-S traffic, observed by monitors B3, F1, F3 and J4. For D5, rate = 0.14 pps, starting time = 150 s; for I1, rate = 0.2 pps, starting time = 125 s.
B. P2P traffic monitoring
For one group of peers, the peers are distributed under A1, D2, H2, J5, K2 and K3, start synchronously at 125 s, and exchange traffic at 30 pps. The simulation results are shown in Figs. 5 and 6. In Fig. 5, the weight
vector is calculated from traffic information over the whole network. Such complete data are hard to obtain in a real network; they are used here only for comparison with Fig. 6. As shown in Fig. 5, the raises located in A1, D2, H2, J5, K2 and K3 appear smooth. In Fig. 6, with monitoring points located in B1, F1, F3 and H2, observable raises also appear in these subnets, with some fluctuation. These raises all begin at X = 7, which is nearly consistent with our experimental settings, as in Fig. 5. On the other hand, the amplitudes of the peers differ from each other, ranging from 0.04 to 0.11. This is possibly caused by the irregular topology of the model; e.g., there are fewer leaf routers under A1 and D2 than under H2, J5, K2 and K3. Consequently, the spatial-temporal pattern of P2P traffic can be monitored well using only a few observation points.
Fig. 5. Spatial-temporal pattern of static P2P traffic, calculated from complete information. For each peer (A1, D2, H2, J5, K2 and K3), rate = 30 pps, starting time = 125 s.
Fig. 6. Spatial-temporal pattern of static P2P traffic, observed by monitors B1, F1, F3 and K2. For each peer (A1, D2, H2, J5, K2 and K3), rate = 30 pps, starting time = 125 s.
For multiple groups of peers, the P2P network is always much more complex than with a single steady group. To monitor the dynamic pattern of P2P traffic, the experiment simulates a plausible process in a P2P file-sharing system. A group of peers distributed under A1, D2, J5, K2 and K3 begins communicating at 125 s. Then the peers under D2 leave the file-sharing system at 160 s, while those under H2 join at 200 s with a rate of 35 pps. Fig. 7 is calculated from all data over the whole network; its raises correspond to A1, D2, J5, K2 and K3, as expected. The weight vector in Fig. 8 is calculated from the data of monitors A3, B4, F3, G2 and K2. There is more fluctuation, but the raises remain obvious. In particular, the raise for D2 lasts from X = 7 to X = 14, and the raise for H2 starts at X = 24. The dynamic spatial-temporal pattern is thus captured. Furthermore, the end edge of D2's raise and the start edge of H2's raise, i.e., the times when the peers under D2 leave and those under H2 join, have some impact on the monitored amplitude.
Fig. 7. Spatial-temporal pattern of P2P traffic, calculated from complete information. For each peer in the group A1, D2, J5, K2 and K3, rate = 25 pps, starting time = 125 s. Then the peers of D2 leave at 160 s and the peers of H2 join at 200 s, with rate = 35 pps.
Fig. 8. Spatial-temporal pattern of P2P traffic, observed by monitors A3, B4, F3, G2 and K2. For each peer in the group A1, D2, J5, K2 and K3, rate = 25 pps, starting time = 125 s. Then the peers of D2 leave at 160 s and the peers of H2 join at 200 s, with rate = 35 pps.
C. Mixed traffic monitoring
Fig. 9. Spatial-temporal pattern of mixed traffic, calculated from complete information. P2P traffic: for peers in the group A1, D2, J5, K2 and K3, rate = 25 pps, starting time = 100 s; at 170 s, the peers of D2 leave and the other peers communicate at 35 pps, and at 180 s the peers of H2 join to form a new group A1, H2, J5, K2 and K3 with rate = 35 pps. C-S traffic: for D5, rate = 0.125 pps, starting time = 135 s; for I1, rate = 0.1 pps, starting time = 135 s.
In this scenario, we simulate mixed traffic containing both C-S traffic and dynamic P2P traffic, as specified in part E of Section IV. The experiment is designed as follows. First, peers in subnets A1, D2, J5, K2 and K3 exchange traffic at 25 pps starting at 100 s. Second, both C-S flows, to D5 and I1, start at 135 s with source rates of 0.125 pps for D5 and 0.1 pps for I1. Third, the peers under D2 leave at 170 s, and the rate of the remaining peers changes to 35 pps. Finally, the peers under H2 join at 180 s with a rate of 35 pps. In Fig. 9, the spatial-temporal pattern of the mixed traffic is calculated from all data, while in Fig. 10 it is calculated from the data monitored by B3, C2, D1, F1 and K2. In both figures, for P2P traffic, the raises of A1, D2, J5, K2 and K3 start at X = 0, with D2 leaving at X = 18 and H2 joining at X = 22; for C-S traffic, both raises begin at X = 13. The result roughly agrees with our expectations, which means that we can capture the spatial-temporal impact of both C-S traffic and dynamic P2P traffic on the network.
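The window indices quoted above can be converted to simulation time with the mapping given at the start of this section, t = (200 + 10X) × T with T = 500 ms. The short sketch below applies it to the mixed-traffic events and prints the configured event times next to them for comparison; the dictionary only restates numbers from the text, and the constant names are our own.

```python
T = 0.5        # seconds per data point (T = 500 ms)
INITIAL = 200  # data points that form the initial window
STEP = 10      # data points the window advances per step


def window_to_time(x):
    """Simulation time in seconds that corresponds to window index X."""
    return (INITIAL + STEP * x) * T


# event: (window index X reported in the text, configured start time in s)
events = {
    "P2P peers start": (0, 100),
    "C-S traffic starts": (13, 135),
    "D2 peers leave": (18, 170),
    "H2 peers join": (22, 180),
}
for name, (x, configured) in events.items():
    t = window_to_time(x)
    print(f"{name:18s} X={x:2d} -> {t:5.1f} s (configured {configured} s)")
```

The same mapping gives 110 s for X = 2 and 135 s for X = 7, the values quoted for the C-S experiment in part A. With the mixed-traffic numbers, the detected changes trail the configured event times by up to about 30 s.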
Fig. 10. Spatial-temporal pattern of mixed traffic, observed by monitors B3, C2, D1, F1 and K2. P2P traffic: for peers in the group A1, D2, J5, K2 and K3, rate = 25pps, starting time = 100s; at 170s, peers of D2 leave and the other peers continue communicating at 35pps; at 180s, peers of H2 join to form a new group A1, H2, J5, K2 and K3 with rate = 35pps. C-S traffic: for D5, rate = 0.125pps, starting time = 135s; for I1, rate = 0.1pps, starting time = 135s.
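The weight vectors in Figs. 7 to 10 are computed from the time series collected at a chosen set of monitors. Since the exact construction is not restated in this section, the sketch below assumes one common RMT-style procedure: normalize each monitor's series within a sliding window, form the equal-time cross-correlation matrix, and take the eigenvector of its largest eigenvalue, found here by power iteration. Reporting squared components, so that the weights are sign-independent and sum to 1, is also an assumption, as are the parameter names `window` and `step`. It is a minimal illustration in pure Python, not the authors' implementation.

```python
import math


def normalize(series):
    """Return a zero-mean, unit-variance copy of one time series."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:  # a constant series carries no correlation information
        return [0.0] * n
    return [(x - mean) / std for x in series]


def correlation_matrix(rows):
    """Equal-time cross-correlation matrix C[i][j] = <g_i g_j> of normalized rows."""
    g = [normalize(r) for r in rows]
    n, t = len(g), len(g[0])
    return [[sum(g[i][k] * g[j][k] for k in range(t)) / t for j in range(n)]
            for i in range(n)]


def largest_eigenvector(c, iters=1000, tol=1e-12):
    """Power iteration on a symmetric positive semi-definite matrix c.

    Returns (eigenvalue, unit eigenvector) for the largest eigenvalue. A
    non-uniform start vector is used so that it is unlikely to be orthogonal
    to the leading eigenvector.
    """
    n = len(c)
    v = [float(i + 1) for i in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(c[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:  # c maps v to zero (e.g. c is all zeros)
            break
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    # The Rayleigh quotient v^T c v gives the eigenvalue for the unit vector v.
    lam = sum(v[i] * sum(c[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v


def weight_vectors(rows, window, step=1):
    """One weight vector (squared eigenvector components) per window position."""
    t = len(rows[0])
    result = []
    for start in range(0, t - window + 1, step):
        segment = [r[start:start + window] for r in rows]
        _, v = largest_eigenvector(correlation_matrix(segment))
        result.append([x * x for x in v])
    return result
```

Here `rows` would hold one series per selected monitor (for example B3, C2, D1, F1 and K2), and each entry of the result corresponds to one window position, which is assumed to play the role of the X axis in the figures. How the paper maps monitor weights onto subnets such as A1 or J5 is not covered by this sketch.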
VI. Conclusion and Future Work
In recent years, P2P services have gradually come to dominate Internet traffic, overtaking traditional Client-Server (C-S) services. This dramatic shift may produce traffic patterns different from those observed in earlier years, yet few techniques capture the dynamics of C-S and P2P traffic well at a macroscopic level. In this paper, we propose a monitoring method based on RMT for detecting network-wide P2P and C-S traffic. We experiment with three traffic settings in our four-tier model: C-S traffic, P2P traffic and mixed traffic. Our simulation results show that, with only a few observation points, macroscopic-level monitoring can capture shifting traffic patterns during transient periods and reveal when and where each type of traffic arises. Future work will investigate the influence of the choice of observation points and the characteristics of the other eigenvalues.
Acknowledgment
We would like to thank the anonymous reviewers for their helpful comments. This work is supported by the National Natural Science Foundation of China (Grants No. 60674048, 60772053), and the State Key Development Program for Basic Research of China (Grant No. 2007CB310701).
References
[1] F. David, "P2P File Sharing - The Evolving Distribution Chain", 2006. http://www.dcia.info/activities/p2pmswdc2006/ferguson.pdf
[2] J. Liang, R. Kumar and K. W. Ross, "The KaZaA overlay: A measurement study", Proc. of the 19th IEEE Annual Computer Communications Workshop, 2005.
[3] D. Stutzbach, R. Rejaie and S. Sen, "Characterizing unstructured overlay topologies in modern P2P file-sharing systems", Proc. of the 5th ACM SIGCOMM Internet Measurement Conference, Berkeley, pp. 49-62, Oct. 2005.
[4] S. Sen and J. Wang, "Analyzing peer-to-peer traffic across large networks", IEEE/ACM Trans. on Networking, vol. 12, no. 2, pp. 219-232, Apr. 2004.
[5] L. Plissonneau, J-L. Costeux and P. Brown, "Analysis of peer-to-peer traffic on ADSL", Lecture Notes in Computer Science, vol. 3431, pp. 69-82, Mar. 2005.
[6] S. Saroiu, P. K. Gummadi and S. D. Gribble, "A measurement study of Peer-to-Peer File Sharing Systems", Proc. of Multimedia Computing and Networking 2002, San Jose, CA, USA, pp. 156-170, Jan. 2002.
[7] M. Ripeanu, I. Foster and A. Iamnitchi, "Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design", IEEE Internet Computing Journal, vol. 6, no. 1, pp. 50-57, Jan. 2002.
[8] M. Jovanovic, F. Annexstein and K. Bean, "Modeling peer-to-peer network topologies through small-world models and power laws", Proc. of the Telecommunications Forum, Belgrade, pp. 161-172, 2001.
[9] E. P. Wigner, "On the statistical distribution of the widths and spacings of nuclear resonance levels", Mathematical Proceedings of the Cambridge Philosophical Society, vol. 47, pp. 790-798, 1951.
[10] E. P. Wigner, "Results and theory of resonance absorption", Conference on Neutron Physics by Time-of-Flight, Gatlinburg, pp. 59-70, Nov. 1956.
[11] L. Laloux, P. Cizeau, J. P. Bouchaud and M. Potters, "Noise Dressing of Financial Correlation Matrices", Physical Review Letters, vol. 83, no. 7, pp. 1467-1470, Aug. 1999.
[12] V. Plerou, P. Gopikrishnan, B. Rosenow, L. A. N. Amaral and H. E. Stanley, "Universal and Nonuniversal Properties of Cross Correlations in Financial Time Series", Physical Review Letters, vol. 83, no. 7, pp. 1471-1474, Aug. 1999.
[13] V. Plerou, P. Gopikrishnan, B. Rosenow, L. A. N. Amaral and H. E. Stanley, "Random matrix approach to cross correlations in financial data", Physical Review E, vol. 65, no. 10, Jun. 2002.
[14] A. Feldmann, A. C. Gilbert, P. Huang and W. Willinger, "Dynamics of IP Traffic: A Study of the Role of Variability and the Impact of Control", Proc. of the ACM SIGCOMM'99 Conference, Cambridge, MA, pp. 301-313.
[15] V. Paxson and S. Floyd, "Wide Area Traffic: The Failure of Poisson Modeling", IEEE/ACM Trans. on Networking, vol. 3, no. 3, pp. 226-244, Jun. 1995.
[16] M. Barthelemy, B. Gondran and E. Guichard, "Large-Scale Cross-Correlations in Internet Traffic", Physical Review E, vol. 66, no. 9, 2002.
[17] J. Yuan and K. Mills, "Monitoring the Macroscopic Effect of DDoS Flooding Attacks", IEEE Transactions on Dependable and Secure Computing, vol. 2, no. 4, pp. 324-335, Oct.-Dec. 2005.
[18] A. M. Sengupta and P. P. Mitra, "Distributions of singular values for some random matrices", Physical Review E, vol. 60, no. 3, pp. 3389-3392, 1999.
[19] Z. Bai, J. Demmel, J. Dongarra, A. Ruhe and H. van der Vorst, "Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide", Philadelphia: Society for Industrial and Applied Mathematics, 2000.
[20] M. Crovella and E. Kolaczyk, "Graph Wavelets for Spatial Traffic Analysis", Proc. of IEEE Infocom 2003, San Francisco, CA, USA, pp. 1847-1857, Apr. 2003.
[21] A. Feldmann, A. C. Gilbert, W. Willinger and T. G. Kurtz, "The Changing Nature of Network Traffic: Scaling Phenomena", ACM Computer Communication Review, vol. 28, no. 4, pp. 5-29, 1998.
[22] M. Claypool, R. Kinicki, M. Li, J. Nichols and H. Wu, "Inferring Queue Sizes in Access Networks by Active Measurement", Proc. of the 5th Passive and Active Measurement Workshop, France, pp. 227-236, Apr. 2004.