Exploring Graph-based Network Traffic Monitoring Marios Iliofotou Department of Computer Science University of California, Riverside Email:
[email protected]
Modeling network traffic as a graph has recently attracted significant attention. Initial studies, have only explored the capabilities of TDGs for the detection of worm activity [1] and the grouping of related hosts in enterprise networks [5]. In addition, prior efforts only focused on TDGs extracted from a single network. The main goals of this work is to explore the full potential of TDGs over a large set of applications, over long time duration, and different locations of observation. In more detail, this work comprises of two thrusts. First, we target the selection of: (a) the best methods to create and use TDGs in practice, and (b) the best metrics to model and summarize TDGs. Towards this end we experimented with a large set of real-world network traces collected from a diverse set of network links, some being of extended duration ranging from 2000 to 2008. Second, using the TDG profile of well known applications, we design a P2P classification framework [2], which is a systematic methodology for classifying network traffic using TDGs.
interaction between hosts. Such information can be the number of flows/bytes/packets exchanged between two IPs. In the basic case, an edge can represent the exchange of a single packet between two IPs. Other edge filters can collect traffic that belongs to a particular application. For example, under the assumption that all flows (defined using the 5-tuple {srcIP,srcPort,dstIP,dstPort,Proto}) with port number 53 belong to the DNS application, an edge filter using this port can be used to form the DNS TDG. Similarly, we can choose filters to generate TDGs for a large range of applications, e.g., SMTP, Gnutella, Bittorrent, Web, etc. A filter can potentially utilize all attributes of a flow, such as: (a) the payload, (b) port numbers, and (c) packet sizes and packet timing (e.g., inter packet gap) information. To model and summarize TDGs we experimented with a large set of both standard and novel metrics. We can group these metrics in two groups: (a) static metrics that capture the structure of the graph, and (b) dynamic metrics that can describe the changes of TDGs in time. Static graph metrics provide statistical summaries of TDGs. Such metrics include the degree distribution, degree correlation, distances, etc. Preliminary results of this study can be found in [3]. Dynamic metrics introduce time as a parameter in the study of TDGs. For example, a dynamic metric can quantify how much the graph has changed over the past two hours. An in depth study of dynamic properties is part of our ongoing work. We studied a large set of metrics (static, and dynamic) over a large set of real-world traces. We here summarize some key observations. The graphs formed within five-minute intervals generate graphs with stable graph structure, compared to graphs generated over shorter intervals. Such five-minute long graphs can allow the separation between different classes of applications, as we show in Section III in more detail. Another interesting observation is that TDGs show periodic and predictable behavior over the length of a day and over weekends. Finally, even though TDGs can have quantitatively different metric values over different locations, their structure is distinctive enough to lead to qualitatively similar observations.
II. TDG S : P RACTICAL C ONSIDERATIONS
III. G RAPH - BASED P2P T RAFFIC C LASSIFICATION
The definition of what constitutes an edge in the graph is flexible and can change depending on the application at hand. We refer to the processes of selecting which nodes are connected in a TDG as Edge Filtering. The edges in TDGs can be annotated to store information regarding the
The key idea in our graph-based classifier, is that TDGs generated by different applications have characteristic structure that we can use to distinguish between them. For example, in Figure 1 we show the Gnutella (P2P) and Web TDGs. Clearly, the two graphs even visually look different. To select the best
Abstract—Monitoring network traffic and classifying applications are essential functions for network administrators. These tasks are becoming increasingly challenging since (a) many applications obfuscate their traffic using nonstandard ports, and (b) new applications constantly appear. This suggests the need for a behavioral-based approach, where the detector looks for fundamental behaviors of the application that are both intrinsic to the application and distinct from normal traffic. Identifying intrinsic behaviors makes it difficult for application writers to disguise such behaviors without defeating the very purpose of the application. In this paper, we propose a graph-based representation of network traffic which captures the networkwide interactions of applications. In these graphs, nodes are individual IP address and edges between nodes represent particular communications. For example, an edge might represent the exchange of a single packet, or the exchange of at least ten packets of any type. We call such graphs “Traffic Dispersion Graphs” or TDGs [3]. As a proof of concept we show that our proposed graph-based classifier out-perfoms BLINC [4] in detecting P2P traffic on backbone links. Our results are very promising, showing that TDGs can provide the basis for the next generation of network monitoring tools.
I. I NTRODUCTION
864 1337
1335
1330
710
1323
663 1257
1245
1243
1020
1014
1019
1338
1336
1331
1013 917
1324
915
907
904
898
896
892
889
502 1032
990
987
952 916
914
906
903
897
895
838
834
830
891
888
825
1271 1258
1246
1244 837
272
833
829
802
827
824
799
374 846
801
798
920 1033
991
988
611
1033
791
865 102
652 884
953
526
712
1023
1022
1016 828
9 651
711
664 619
58
875
763
1045
843
338
375
618
1003 633
503 853 267 903 103 641
90
785 696
506 273
499
377
370
565
502
1239 993
608
606
603
376
59
369
564
923
854
615
1312 46 398 449
426
403
343
776
912
922
142
1342
788
1108
139
131
102
1240
498 862
823 10 71 337 463
559 654 179
1234
473 689
45
238 462
973
695
240
96
558
P2P non-P2P Boundaries
556
1028
200
4
208
22 756 557
353
49
507 904 863
492
983
1027
207
509
607
605
602
824
150 284 448
425
402
342
421
911 887
870
503 757
956 971
588
241
527 650
1202 141
1038
138
567
480
16
130
101
886
935
368
177
566
70
367
72
151 559
115 236
295
984 380 737
728
190 147 535 755
764
58 479
759 1039
604
555
1259 275 925
552
48
1026 84
264 83
820
887
17 520 3
532
80
536 819 392
199 1201
237
569
985
347 146
336 352 587 1040
21 568
346 497
901 587
114
563
981
715 1047 1015
588 1041
124
554
22 266 560
388
1144
277
1343
706
430 826
937
123
553 316
189
621
338
821 759 792 59
1311
982 620
1036
531
533 265
621
70
1077
767
822 115
111
882
1344 783 20 177 119 908 114
110
881
535
366 311
23 468 337 1308 642
622
42 431
365 178
992
656
33
194
1102 43
273
346
655
705
332 949
127
1377 140
649
986
1300
21 533
71 638
771
126
607 109
99
878
596
455
614
943 281
1141
108
586
98
877 582 518
752
721
354
171
1280
19
643
148
353
70
213 268
688
849
352
604 457 794 958
639
1120
608 322 684 848
351
356
1188 89
828 154
1119 239
270 860 536
1096
1281
348
20
504
439 797
88 1184
34 112
24 467
86
1345
314 288
949
532
796
40
126
292
276
1362 798
883
297
217
738
291 525
1283
1232
927
34
379 571 132 85 869
1055 926
378
519
175
74 39 33
718 687
496
167 17
89
636
294 73 373 787
698 169
557 1284
534
1233
265
1154
297
1163
786
93
817
287
1252 91
133 134 832
777 92 851
1031
18
677
286
91
1197 583
472
90
758
950
761
1292
993 1030
471
329
1036
66
1329
404 1029
524
92
813
462
818 65 550
290
671 670
1351
298
837
958 1079
1098
206
1250
996
293
152
549 1049
797 487
932 1293
699
729
775 1008
719
112
570
514
551
960
492
573 576
1178 930
36
60
333 577
659 135
629
450
54 330 522 929
760
35 493 978 345
558 930
681
332
382
259
461 640
704
331
574
381
747
460
446 657
1191 515
1030
545
424 609
291
180
205
679
184 1039
1012 55
13 471 951 1315
60
1156 852
649
1038
523
1011 839
441
26
145 328
685 1157
254
563 193
729
1247
113 411
153
25
655
562
144
601 433
372
389
271
475
1261
454 590 360
959
661 137
1225
474
14
997
453
447 440
1162
626
907 825
636
795
23 459
488 740
1048
776
1347
829
344 1024
411
204
1113
451
315
910
659
1010
258 306
308 656
57 1037 444
946
933
1169
208
891 351
1009
1213 30
10
928
1207 1192 1182
136
631
209
472 1305 548 454
214
594
911
29
703 1348
9
648
573
397
341
24 708
456
572
340
819
207
76 326
940
101
57
1374
945
282
701 243
1015
1327
711
1214 1100
2
869 818
1078 994
1014 667
1052 814
523
871
315
119
766
605
714 450
817 690
1371
56
1208
739
483 233
246 963 835
1058
260
1194 90
816
1222 595
470 339
169 466 602
942
564
88
309
720
1328
418
528
192
412
594
875
962 582
247
540
1319 96 369
1364
153
439
500
941
1215
87
246
1019 983
1340 795
226
972
725
1095
1339 35 585
762
661
147
2 1161
1074
1376
1290
327
905
1065 233
857
391
620
1326
842
734 738
1195
944 933
163
686 299
646
26
1174 282
106
1172
783
786
785 1277
1
390
1196
896
1
514
733
1231 926
42
542
845
144
432
883
363
154
478
1047 457 384 180 442
422
104
1159
745
954
886
262
965
670
660
363 1238
410
1000
456 261
1200
939
1262 948
237 81
909 626
453 913 995
770
1235 215 730
234
320
160
1012
387
746
76
100
947
236 82
143 280 224
347
894
107
787
882 805
417
632
1212 747 672
826
28
872
51
226
1370 65
321 310
1341
136
1013
396
566
726
129 5
210
1017 404 1237 1093
170
1321 77
395
25 572
725
867
7 924
1094 881
245
247 279
386
850
658 41 673
328 793
36
161
578 64
463
801
634 399 322
1220
832
1021 155 8 181 156
934 377
680
7
520
879
1153
836 416
1266
436
8
1114
248
75
438 1224
52
227
884 1365
218
360
623
447
1007
281 633 194 556 117
999 1206
176
128
361
302
111
184
209
863 622
118
27
446
62
544
1006
412
41
627 355
155
561 864
125 1025
624
364 125
301
876 644
1066 967 408 183 804
771
497
1204
984
277
244
823
435
985
1025 827 847
792 341
841
244
741 6
666 713
356
415
595
399
319 616
543
110
317 31
756
613
171
641
311
675
1075 593 749
398
496
991
1189
865 953
129
613
468
690 257
724
906
715
754 816
617
750
491 838
185
1140
589 597
735
727
170
909
850
576 686
538 511
952
1101
128
730
616 866
979
285
561 592
107
534
397
580 856
961 216 201
878
921 61 934 845
252
946
714 731
986
577
1285
1017
1128 182 512
448
493
1166
198
83
944
109
789
1147 1011
693
206
706 276
373
295
1002
1090
49
123 964
815
1359
InO (%)
1044
1369
945
131
368
669
50 40
484 162
1133
578
897
482
350 429
445
188
127
423
968
469
899
521
81
966
201
60
804
87
348
1164
964
980
113
461 568 420
889
628
159
674
890
888
1092
1127
481
1199
836
339
262
666
50
385
1043
833 160
531
1278
202
355 1021
609
1103
499 296
1264
936
570
1004
799
1050
393
358 357
464
637
1291
593
340 870
477
948
161
108
239 530
1042
28
680 648
251
224
241 731
212
162
469
552
1004
376 15
53
856
374
1048
874
914
413
198
575
51
240 32
1325 1057
82
1282
263
562
784
465 174
225
975
938 1001
951
222
61
1046
762
299
915
137
62
1254
1016 429
1056
841
159
313
245
614
647
211
1051 668
794
124
132
789
312
970
584
619
97
135
331
925
165
1049
784 250
1217 1000
646
1008
773
479
1260 1251
118
673 788
484
899
149 1099 403
168
67
590
1170
597 140
116
1373 32
275
69 292 16
317
758
640
227
567
488
1296
893 1106
342
50
489
947
343
323 47
1248 545
748
1105
1009
500
31
364
449
1353
976
261
768
1037
1205 1137
225
86
644
1272
678
571
615
1372 230
919
1190
1045
618
880
831
242
223
1241 635
167
187 1303 419
893
1176 768
148
537 122
610
755
1040
303 318
1121 191
221
235 732
37
158
505 232
196
40
575 765
851
432
913
868
1183
782
1001 769
27
485
901
39 316
421
630
1088
598
130
716 978
465
723
1097
1112
344
612 639
510
1352
278
1288
501
84
579
254 392
203 362
525 418
138
1216
808
774
475
383
410 298
312
1028
625
791
569
718
638
372
222 844
324
905
371
596
658
85
394
414
210
255
253 1089
1255
334
812
164
405
197
52
46 515
728
139
100
898
1116
876
1267
375
1322
274
1356
436
624
1165 1333
982
732
1273 1022
242
452
902
1148
428 1129
63 612 1046
12 290
443
1242
354 1210
653
689
378
1059
645
1151
486
1302 487
671
405
143
727
717
43
610
981
485
11
974 314
679
157
813
530
811
999
1173 855
961
902 1289
955
1142
1060
1041
268
1203
489
719
684
163
384
1018 892
217
717
1256
672
98
529
843
494
435
793 751
809
231
660
365 551
540
852
401
451 1320 1005 866
349
599
445
30
940 931
722 611
858
716
1027 834 637
166
753
1136
998
176
394
643 806
751
321
1035
181
704
400
116
796
444
1334 579 495
1185
957
839 554
647
653
650
474
790
937
38
504 1086
560
939
1064
1209
422
313
591
989
478
734 1061
320
68
908
1186
1269
654
278
979
1031
1069 724
486
895 810
168
508
323
516
956
1053
134 396 517
1279
105
1375
117
603
778
916 1249 1349
1218
1363
1010
78
665
846
1117
452
105
733
174
1316
524 606
44 589
757 1076 969
175
935
840
362 678 1171
388
395
657
1198
707
1135
900
188
1043
997
1301 977
565
1346 165
800 1275
737
283
409
195
458 1024
258
381 387 807
186
1287
705
700
736
172
455
750
955
243
1070
187
996
812
1134
400
391
491
443
781
333
97 156 623
1054
1152
1044
145 73
1228 490
442 350
830 1350
790
1317
345
1085
121
642
720 166
120 45
1229
393
146
464
675 173
1081
1236 406
303
106 1003
574
699
44
1354
64
274
20
782 900
847
1118
1087 539
1160 269
19
1274
63
781
104 495
1034
284
815 349
216 555 494
402
283
15 257 13
1355
1268 894
1263 221
186
1297
676
69
820 401 1304
12
417
80 68
780
1062
803
694
726
585
848
831
721 617 849 1357
383
553
1313 38
927
302
1143
936
779
581
928
280
219
5
859
473
885 1270
121
1020 977
1042
467
539
922
580
1265
879
18
279
1080 99 778
14 890
29 840 300
238 189
164
780 1358
1126
1314 645 3
72
748
366
980
1286 950
601 318
920
1181 921
466 6
1219
223 861 779
760
600 998
600
584
79
814
335 740 835
1091 1332 220 702
407
581
434
319 583
912
37
821 325
667
220
1223 476 954
95
289
367
1115 917 482
1193
120 213
11
1067
336
761 1158
1227
1187
54 513 1107
1023
1211 652
877 185
458
359 697 802
361
1175 1130
988
133
289 987
1063 880 251
777
272
910
1253
775
10
324
873 931
498
480 30 197 67 509
66
807 359
271
774
95
929 630
94 586
695 1068 4
219 628
938
629 230 844
674 122
1155 481 1221
943
1072
1005 842
685 270
773
103
53
218
625
772
548
483
263 918
800
269
772
1029 459
663
1007
696
142
662
1073 803
749
1026 256
229
78
770
627
1034
857
56 752
293 1082
763
460 434
651 267
1083 992
885
744
860
681
438
712
264
266
743
1368 742 255
74
228
430
963 80 932
490
1032
1071 506
1104
433
1226 1084
1002
713 427 437
505
329
1018
47
150
1006
93 437
707 296 477
511
77
1230
149
855 152
141
599
75
0
1109 962
1318 470 1110
325 522
632
382 822
151
739
1035
598
957 158
541
173
256 431 501 94
746 157
754
854
172
521 769
508
183
55 294
507
79
182 196
1111
330
1360
249
200
191
193
811
48
285
287
178
202
228
1179
195
199
190
192
810 327
1177
476 212
1276 334
304
215
510
211
250
286
288
370
179
203
379
229
214
205
231
631 249
664
766
745
248
753
301
305
204
235
307
853
234
305
300
662
253
304
260
252
547
259
546 308
518
526
385
2
1180
357
335
326
232
306
665
1361
389
310
425
307
309
416 407
371
380
358
691
543
546
409
414
420
767
406
682
408
413
419
549 960
513
959
519
527
386
390
424
512
428
423
923
708
743
517
544
547
941
669
668
924
709
1138
1145
516
1149
744
677
676
683
688
682
542
4
541
635
691
874
634
736
694
735
415
698
693
697
873
701
703
700
710
702
709
742
942 805
1298
1306
741 859
1309
809
858 966
1295
1139
1146
1150
968
970
972
808
974
862
868
872
1168
965
967
969
971
973
861 976
1367
723
722 538
537
692
687
765
1167
806
1132
440
528
550
1131
1125
529
764
591
692
1124
919
1294
1123
441
427
426 592
918
683
1122
1366
1299
1307
990
867
871
995
1310 975
989
994
6 Average Degree
8
10
Fig. 1. Left: The Gnutella TDG (P2P). Center: The Web TDG (non-P2P). Right: Scatter plot showing how few metrics can distinguish between TDG classes. The InO metric captures the percentage of nodes that have both in-coming and out-going connections. The average degree, is the average number of neighbors of a node in the graph. Percision Recall
100%
90% Percentage(%)
metrics to model P2P TDGs, we used standard data mining (feature selection) methods and traces from a diverse set of links. The data set comprise of three Tier-1 ISPs and five backbone links from Abilene Internet2 backbone. We observed that few static graph metrics are robust in time and space so they can be used to distinguish between applications. An example is illustrated in Figure 1 (Right), where two metrics are suffice to distinguish between the majority of P2P and non-P2P TDGs. Even though some non-P2P applications have graphs that look like P2P, we observed that using dynamic graph metrics we can still distinguish between the two. For example, DNS TDGs are similar in structure with P2P applications (e.g., FastTrack, Gnutella, etc.), but they are very different in how the graphs change over time. In particular, DNS TDGs are more stable and change less frequent than P2P TDGs. Our proposed methodology consists of three steps. At step one, we use data mining methods (K-means) to cluster flows into groups based on their similarity. For example, similar flows can be those with a common application-layer header. Intuitively, similar flows will belong to the same application. Therefore, the final output of this step will be groups of flows where each group has flows from a single application. At the second step, a TDG is formed for each group. Using the empirical profile of P2P TDGs (Figure 1), our method classifies each group as P2P or non-P2P. Finally, as an optional step, we can use statistical methods to extract a distinctive signature for each application. For example, such signature can be a regular expression (regex) matching part of the payload. All three steps can be performed offline. The evaluation on backbone traces shows that our method can detect over 90% of P2P traffic with over 95% accuracy. Detailed results for the main P2P protocols are presented in Figure 2. By comparing our method with BLINC [4], we observe that our approach detects overall 7% more P2P traffic with 6% higher accuracy on the same link. BLINC has significantly lower performance for some P2P applications. For example, we detect 95% of BitTorrent traffic, while BLINC detects only 25%!
80%
70%
60%
50%
BitTorrent MP2P eDonkey FastTrack Gnutella Soribada
Total
Fig. 2. Classification precision and recall for the six main P2P protocols. Recall, is the percentage of traffic from each class that was detected as P2P. Precision denotes the percentage of times we classify a flow as, e.g., Gnutella, and the flow is actually Gnutella.
IV. S UMMARY & C ONCLUSIONS Even though modeling network traffic as a graph has been proposed before [1], [5], we are the first to study TDGs in depth and explore their full potential. To the best of our knowledge, our graph-based approach is the first method to use the network-wide behavior of applications in order to classify network traffic. We believe TDGs will provide another useful tool in the arsenal of traffic analysis and monitoring techniques. ACKNOWLEDGMENTS The author would like to thank all his collaborators: Michalis Faloutsos, George Varghese, Michael Mitzenmacher, Prashanth Pappu, Sumeet Singh, Hyun-chul Kim, and CAIDA. R EFERENCES [1] D. Ellis, J. Aiken, K. Attwood, and S. Tenarglia. A Behavioral Approach to Worm Detection. In ACM CCS WORM, 2004. [2] M. Iliofotou, P. Pappu, M. Faloutsos, M. Mitzenmacher, H. Kim, and G. Varghese. Graption: Automated Detection of P2P Applications in the Internet Backbone. Technical report, UC Rverside, 2008. [3] M. Iliofotou, P. Pappu, M. Faloutsos, M. Mitzenmacher, S. Singh, and G. Varghese. Network Monitoring Using Traffic Dispersion Graphs (TDGs). In ACM IMC, 2007. [4] T. Karagiannis, K. Papagiannaki, and M. Faloutsos. BLINC: Multi-level Traffic Classification in the Dark. In ACM SIGCOMM, 2005. [5] G. Tan, M. Poletto, J. Guttag, and F. Kaashoek. Role classification of hosts within enterprise networks based on connection patterns. In USENIX Annual Technical Conference, 2003.