Exploring Graph-based Network Traffic Monitoring - CiteSeerX

3 downloads 786 Views 437KB Size Report
promising, showing that TDGs can provide the basis for the next generation of network monitoring tools. I. INTRODUCTION. Modeling network traffic as a graph ...
Exploring Graph-based Network Traffic Monitoring Marios Iliofotou Department of Computer Science University of California, Riverside Email: [email protected]

Modeling network traffic as a graph has recently attracted significant attention. Initial studies, have only explored the capabilities of TDGs for the detection of worm activity [1] and the grouping of related hosts in enterprise networks [5]. In addition, prior efforts only focused on TDGs extracted from a single network. The main goals of this work is to explore the full potential of TDGs over a large set of applications, over long time duration, and different locations of observation. In more detail, this work comprises of two thrusts. First, we target the selection of: (a) the best methods to create and use TDGs in practice, and (b) the best metrics to model and summarize TDGs. Towards this end we experimented with a large set of real-world network traces collected from a diverse set of network links, some being of extended duration ranging from 2000 to 2008. Second, using the TDG profile of well known applications, we design a P2P classification framework [2], which is a systematic methodology for classifying network traffic using TDGs.

interaction between hosts. Such information can be the number of flows/bytes/packets exchanged between two IPs. In the basic case, an edge can represent the exchange of a single packet between two IPs. Other edge filters can collect traffic that belongs to a particular application. For example, under the assumption that all flows (defined using the 5-tuple {srcIP,srcPort,dstIP,dstPort,Proto}) with port number 53 belong to the DNS application, an edge filter using this port can be used to form the DNS TDG. Similarly, we can choose filters to generate TDGs for a large range of applications, e.g., SMTP, Gnutella, Bittorrent, Web, etc. A filter can potentially utilize all attributes of a flow, such as: (a) the payload, (b) port numbers, and (c) packet sizes and packet timing (e.g., inter packet gap) information. To model and summarize TDGs we experimented with a large set of both standard and novel metrics. We can group these metrics in two groups: (a) static metrics that capture the structure of the graph, and (b) dynamic metrics that can describe the changes of TDGs in time. Static graph metrics provide statistical summaries of TDGs. Such metrics include the degree distribution, degree correlation, distances, etc. Preliminary results of this study can be found in [3]. Dynamic metrics introduce time as a parameter in the study of TDGs. For example, a dynamic metric can quantify how much the graph has changed over the past two hours. An in depth study of dynamic properties is part of our ongoing work. We studied a large set of metrics (static, and dynamic) over a large set of real-world traces. We here summarize some key observations. The graphs formed within five-minute intervals generate graphs with stable graph structure, compared to graphs generated over shorter intervals. Such five-minute long graphs can allow the separation between different classes of applications, as we show in Section III in more detail. Another interesting observation is that TDGs show periodic and predictable behavior over the length of a day and over weekends. Finally, even though TDGs can have quantitatively different metric values over different locations, their structure is distinctive enough to lead to qualitatively similar observations.

II. TDG S : P RACTICAL C ONSIDERATIONS

III. G RAPH - BASED P2P T RAFFIC C LASSIFICATION

The definition of what constitutes an edge in the graph is flexible and can change depending on the application at hand. We refer to the processes of selecting which nodes are connected in a TDG as Edge Filtering. The edges in TDGs can be annotated to store information regarding the

The key idea in our graph-based classifier, is that TDGs generated by different applications have characteristic structure that we can use to distinguish between them. For example, in Figure 1 we show the Gnutella (P2P) and Web TDGs. Clearly, the two graphs even visually look different. To select the best

Abstract—Monitoring network traffic and classifying applications are essential functions for network administrators. These tasks are becoming increasingly challenging since (a) many applications obfuscate their traffic using nonstandard ports, and (b) new applications constantly appear. This suggests the need for a behavioral-based approach, where the detector looks for fundamental behaviors of the application that are both intrinsic to the application and distinct from normal traffic. Identifying intrinsic behaviors makes it difficult for application writers to disguise such behaviors without defeating the very purpose of the application. In this paper, we propose a graph-based representation of network traffic which captures the networkwide interactions of applications. In these graphs, nodes are individual IP address and edges between nodes represent particular communications. For example, an edge might represent the exchange of a single packet, or the exchange of at least ten packets of any type. We call such graphs “Traffic Dispersion Graphs” or TDGs [3]. As a proof of concept we show that our proposed graph-based classifier out-perfoms BLINC [4] in detecting P2P traffic on backbone links. Our results are very promising, showing that TDGs can provide the basis for the next generation of network monitoring tools.

I. I NTRODUCTION

864 1337

1335

1330

710

1323

663 1257

1245

1243

1020

1014

1019

1338

1336

1331

1013 917

1324

915

907

904

898

896

892

889

502 1032

990

987

952 916

914

906

903

897

895

838

834

830

891

888

825

1271 1258

1246

1244 837

272

833

829

802

827

824

799

374 846

801

798

920 1033

991

988

611

1033

791

865 102

652 884

953

526

712

1023

1022

1016 828

9 651

711

664 619

58

875

763

1045

843

338

375

618

1003 633

503 853 267 903 103 641

90

785 696

506 273

499

377

370

565

502

1239 993

608

606

603

376

59

369

564

923

854

615

1312 46 398 449

426

403

343

776

912

922

142

1342

788

1108

139

131

102

1240

498 862

823 10 71 337 463

559 654 179

1234

473 689

45

238 462

973

695

240

96

558

P2P non-P2P Boundaries

556

1028

200

4

208

22 756 557

353

49

507 904 863

492

983

1027

207

509

607

605

602

824

150 284 448

425

402

342

421

911 887

870

503 757

956 971

588

241

527 650

1202 141

1038

138

567

480

16

130

101

886

935

368

177

566

70

367

72

151 559

115 236

295

984 380 737

728

190 147 535 755

764

58 479

759 1039

604

555

1259 275 925

552

48

1026 84

264 83

820

887

17 520 3

532

80

536 819 392

199 1201

237

569

985

347 146

336 352 587 1040

21 568

346 497

901 587

114

563

981

715 1047 1015

588 1041

124

554

22 266 560

388

1144

277

1343

706

430 826

937

123

553 316

189

621

338

821 759 792 59

1311

982 620

1036

531

533 265

621

70

1077

767

822 115

111

882

1344 783 20 177 119 908 114

110

881

535

366 311

23 468 337 1308 642

622

42 431

365 178

992

656

33

194

1102 43

273

346

655

705

332 949

127

1377 140

649

986

1300

21 533

71 638

771

126

607 109

99

878

596

455

614

943 281

1141

108

586

98

877 582 518

752

721

354

171

1280

19

643

148

353

70

213 268

688

849

352

604 457 794 958

639

1120

608 322 684 848

351

356

1188 89

828 154

1119 239

270 860 536

1096

1281

348

20

504

439 797

88 1184

34 112

24 467

86

1345

314 288

949

532

796

40

126

292

276

1362 798

883

297

217

738

291 525

1283

1232

927

34

379 571 132 85 869

1055 926

378

519

175

74 39 33

718 687

496

167 17

89

636

294 73 373 787

698 169

557 1284

534

1233

265

1154

297

1163

786

93

817

287

1252 91

133 134 832

777 92 851

1031

18

677

286

91

1197 583

472

90

758

950

761

1292

993 1030

471

329

1036

66

1329

404 1029

524

92

813

462

818 65 550

290

671 670

1351

298

837

958 1079

1098

206

1250

996

293

152

549 1049

797 487

932 1293

699

729

775 1008

719

112

570

514

551

960

492

573 576

1178 930

36

60

333 577

659 135

629

450

54 330 522 929

760

35 493 978 345

558 930

681

332

382

259

461 640

704

331

574

381

747

460

446 657

1191 515

1030

545

424 609

291

180

205

679

184 1039

1012 55

13 471 951 1315

60

1156 852

649

1038

523

1011 839

441

26

145 328

685 1157

254

563 193

729

1247

113 411

153

25

655

562

144

601 433

372

389

271

475

1261

454 590 360

959

661 137

1225

474

14

997

453

447 440

1162

626

907 825

636

795

23 459

488 740

1048

776

1347

829

344 1024

411

204

1113

451

315

910

659

1010

258 306

308 656

57 1037 444

946

933

1169

208

891 351

1009

1213 30

10

928

1207 1192 1182

136

631

209

472 1305 548 454

214

594

911

29

703 1348

9

648

573

397

341

24 708

456

572

340

819

207

76 326

940

101

57

1374

945

282

701 243

1015

1327

711

1214 1100

2

869 818

1078 994

1014 667

1052 814

523

871

315

119

766

605

714 450

817 690

1371

56

1208

739

483 233

246 963 835

1058

260

1194 90

816

1222 595

470 339

169 466 602

942

564

88

309

720

1328

418

528

192

412

594

875

962 582

247

540

1319 96 369

1364

153

439

500

941

1215

87

246

1019 983

1340 795

226

972

725

1095

1339 35 585

762

661

147

2 1161

1074

1376

1290

327

905

1065 233

857

391

620

1326

842

734 738

1195

944 933

163

686 299

646

26

1174 282

106

1172

783

786

785 1277

1

390

1196

896

1

514

733

1231 926

42

542

845

144

432

883

363

154

478

1047 457 384 180 442

422

104

1159

745

954

886

262

965

670

660

363 1238

410

1000

456 261

1200

939

1262 948

237 81

909 626

453 913 995

770

1235 215 730

234

320

160

1012

387

746

76

100

947

236 82

143 280 224

347

894

107

787

882 805

417

632

1212 747 672

826

28

872

51

226

1370 65

321 310

1341

136

1013

396

566

726

129 5

210

1017 404 1237 1093

170

1321 77

395

25 572

725

867

7 924

1094 881

245

247 279

386

850

658 41 673

328 793

36

161

578 64

463

801

634 399 322

1220

832

1021 155 8 181 156

934 377

680

7

520

879

1153

836 416

1266

436

8

1114

248

75

438 1224

52

227

884 1365

218

360

623

447

1007

281 633 194 556 117

999 1206

176

128

361

302

111

184

209

863 622

118

27

446

62

544

1006

412

41

627 355

155

561 864

125 1025

624

364 125

301

876 644

1066 967 408 183 804

771

497

1204

984

277

244

823

435

985

1025 827 847

792 341

841

244

741 6

666 713

356

415

595

399

319 616

543

110

317 31

756

613

171

641

311

675

1075 593 749

398

496

991

1189

865 953

129

613

468

690 257

724

906

715

754 816

617

750

491 838

185

1140

589 597

735

727

170

909

850

576 686

538 511

952

1101

128

730

616 866

979

285

561 592

107

534

397

580 856

961 216 201

878

921 61 934 845

252

946

714 731

986

577

1285

1017

1128 182 512

448

493

1166

198

83

944

109

789

1147 1011

693

206

706 276

373

295

1002

1090

49

123 964

815

1359

InO (%)

1044

1369

945

131

368

669

50 40

484 162

1133

578

897

482

350 429

445

188

127

423

968

469

899

521

81

966

201

60

804

87

348

1164

964

980

113

461 568 420

889

628

159

674

890

888

1092

1127

481

1199

836

339

262

666

50

385

1043

833 160

531

1278

202

355 1021

609

1103

499 296

1264

936

570

1004

799

1050

393

358 357

464

637

1291

593

340 870

477

948

161

108

239 530

1042

28

680 648

251

224

241 731

212

162

469

552

1004

376 15

53

856

374

1048

874

914

413

198

575

51

240 32

1325 1057

82

1282

263

562

784

465 174

225

975

938 1001

951

222

61

1046

762

299

915

137

62

1254

1016 429

1056

841

159

313

245

614

647

211

1051 668

794

124

132

789

312

970

584

619

97

135

331

925

165

1049

784 250

1217 1000

646

1008

773

479

1260 1251

118

673 788

484

899

149 1099 403

168

67

590

1170

597 140

116

1373 32

275

69 292 16

317

758

640

227

567

488

1296

893 1106

342

50

489

947

343

323 47

1248 545

748

1105

1009

500

31

364

449

1353

976

261

768

1037

1205 1137

225

86

644

1272

678

571

615

1372 230

919

1190

1045

618

880

831

242

223

1241 635

167

187 1303 419

893

1176 768

148

537 122

610

755

1040

303 318

1121 191

221

235 732

37

158

505 232

196

40

575 765

851

432

913

868

1183

782

1001 769

27

485

901

39 316

421

630

1088

598

130

716 978

465

723

1097

1112

344

612 639

510

1352

278

1288

501

84

579

254 392

203 362

525 418

138

1216

808

774

475

383

410 298

312

1028

625

791

569

718

638

372

222 844

324

905

371

596

658

85

394

414

210

255

253 1089

1255

334

812

164

405

197

52

46 515

728

139

100

898

1116

876

1267

375

1322

274

1356

436

624

1165 1333

982

732

1273 1022

242

452

902

1148

428 1129

63 612 1046

12 290

443

1242

354 1210

653

689

378

1059

645

1151

486

1302 487

671

405

143

727

717

43

610

981

485

11

974 314

679

157

813

530

811

999

1173 855

961

902 1289

955

1142

1060

1041

268

1203

489

719

684

163

384

1018 892

217

717

1256

672

98

529

843

494

435

793 751

809

231

660

365 551

540

852

401

451 1320 1005 866

349

599

445

30

940 931

722 611

858

716

1027 834 637

166

753

1136

998

176

394

643 806

751

321

1035

181

704

400

116

796

444

1334 579 495

1185

957

839 554

647

653

650

474

790

937

38

504 1086

560

939

1064

1209

422

313

591

989

478

734 1061

320

68

908

1186

1269

654

278

979

1031

1069 724

486

895 810

168

508

323

516

956

1053

134 396 517

1279

105

1375

117

603

778

916 1249 1349

1218

1363

1010

78

665

846

1117

452

105

733

174

1316

524 606

44 589

757 1076 969

175

935

840

362 678 1171

388

395

657

1198

707

1135

900

188

1043

997

1301 977

565

1346 165

800 1275

737

283

409

195

458 1024

258

381 387 807

186

1287

705

700

736

172

455

750

955

243

1070

187

996

812

1134

400

391

491

443

781

333

97 156 623

1054

1152

1044

145 73

1228 490

442 350

830 1350

790

1317

345

1085

121

642

720 166

120 45

1229

393

146

464

675 173

1081

1236 406

303

106 1003

574

699

44

1354

64

274

20

782 900

847

1118

1087 539

1160 269

19

1274

63

781

104 495

1034

284

815 349

216 555 494

402

283

15 257 13

1355

1268 894

1263 221

186

1297

676

69

820 401 1304

12

417

80 68

780

1062

803

694

726

585

848

831

721 617 849 1357

383

553

1313 38

927

302

1143

936

779

581

928

280

219

5

859

473

885 1270

121

1020 977

1042

467

539

922

580

1265

879

18

279

1080 99 778

14 890

29 840 300

238 189

164

780 1358

1126

1314 645 3

72

748

366

980

1286 950

601 318

920

1181 921

466 6

1219

223 861 779

760

600 998

600

584

79

814

335 740 835

1091 1332 220 702

407

581

434

319 583

912

37

821 325

667

220

1223 476 954

95

289

367

1115 917 482

1193

120 213

11

1067

336

761 1158

1227

1187

54 513 1107

1023

1211 652

877 185

458

359 697 802

361

1175 1130

988

133

289 987

1063 880 251

777

272

910

1253

775

10

324

873 931

498

480 30 197 67 509

66

807 359

271

774

95

929 630

94 586

695 1068 4

219 628

938

629 230 844

674 122

1155 481 1221

943

1072

1005 842

685 270

773

103

53

218

625

772

548

483

263 918

800

269

772

1029 459

663

1007

696

142

662

1073 803

749

1026 256

229

78

770

627

1034

857

56 752

293 1082

763

460 434

651 267

1083 992

885

744

860

681

438

712

264

266

743

1368 742 255

74

228

430

963 80 932

490

1032

1071 506

1104

433

1226 1084

1002

713 427 437

505

329

1018

47

150

1006

93 437

707 296 477

511

77

1230

149

855 152

141

599

75

0

1109 962

1318 470 1110

325 522

632

382 822

151

739

1035

598

957 158

541

173

256 431 501 94

746 157

754

854

172

521 769

508

183

55 294

507

79

182 196

1111

330

1360

249

200

191

193

811

48

285

287

178

202

228

1179

195

199

190

192

810 327

1177

476 212

1276 334

304

215

510

211

250

286

288

370

179

203

379

229

214

205

231

631 249

664

766

745

248

753

301

305

204

235

307

853

234

305

300

662

253

304

260

252

547

259

546 308

518

526

385

2

1180

357

335

326

232

306

665

1361

389

310

425

307

309

416 407

371

380

358

691

543

546

409

414

420

767

406

682

408

413

419

549 960

513

959

519

527

386

390

424

512

428

423

923

708

743

517

544

547

941

669

668

924

709

1138

1145

516

1149

744

677

676

683

688

682

542

4

541

635

691

874

634

736

694

735

415

698

693

697

873

701

703

700

710

702

709

742

942 805

1298

1306

741 859

1309

809

858 966

1295

1139

1146

1150

968

970

972

808

974

862

868

872

1168

965

967

969

971

973

861 976

1367

723

722 538

537

692

687

765

1167

806

1132

440

528

550

1131

1125

529

764

591

692

1124

919

1294

1123

441

427

426 592

918

683

1122

1366

1299

1307

990

867

871

995

1310 975

989

994

6 Average Degree

8

10

Fig. 1. Left: The Gnutella TDG (P2P). Center: The Web TDG (non-P2P). Right: Scatter plot showing how few metrics can distinguish between TDG classes. The InO metric captures the percentage of nodes that have both in-coming and out-going connections. The average degree, is the average number of neighbors of a node in the graph. Percision Recall

100%

90% Percentage(%)

metrics to model P2P TDGs, we used standard data mining (feature selection) methods and traces from a diverse set of links. The data set comprise of three Tier-1 ISPs and five backbone links from Abilene Internet2 backbone. We observed that few static graph metrics are robust in time and space so they can be used to distinguish between applications. An example is illustrated in Figure 1 (Right), where two metrics are suffice to distinguish between the majority of P2P and non-P2P TDGs. Even though some non-P2P applications have graphs that look like P2P, we observed that using dynamic graph metrics we can still distinguish between the two. For example, DNS TDGs are similar in structure with P2P applications (e.g., FastTrack, Gnutella, etc.), but they are very different in how the graphs change over time. In particular, DNS TDGs are more stable and change less frequent than P2P TDGs. Our proposed methodology consists of three steps. At step one, we use data mining methods (K-means) to cluster flows into groups based on their similarity. For example, similar flows can be those with a common application-layer header. Intuitively, similar flows will belong to the same application. Therefore, the final output of this step will be groups of flows where each group has flows from a single application. At the second step, a TDG is formed for each group. Using the empirical profile of P2P TDGs (Figure 1), our method classifies each group as P2P or non-P2P. Finally, as an optional step, we can use statistical methods to extract a distinctive signature for each application. For example, such signature can be a regular expression (regex) matching part of the payload. All three steps can be performed offline. The evaluation on backbone traces shows that our method can detect over 90% of P2P traffic with over 95% accuracy. Detailed results for the main P2P protocols are presented in Figure 2. By comparing our method with BLINC [4], we observe that our approach detects overall 7% more P2P traffic with 6% higher accuracy on the same link. BLINC has significantly lower performance for some P2P applications. For example, we detect 95% of BitTorrent traffic, while BLINC detects only 25%!

80%

70%

60%

50%

BitTorrent MP2P eDonkey FastTrack Gnutella Soribada

Total

Fig. 2. Classification precision and recall for the six main P2P protocols. Recall, is the percentage of traffic from each class that was detected as P2P. Precision denotes the percentage of times we classify a flow as, e.g., Gnutella, and the flow is actually Gnutella.

IV. S UMMARY & C ONCLUSIONS Even though modeling network traffic as a graph has been proposed before [1], [5], we are the first to study TDGs in depth and explore their full potential. To the best of our knowledge, our graph-based approach is the first method to use the network-wide behavior of applications in order to classify network traffic. We believe TDGs will provide another useful tool in the arsenal of traffic analysis and monitoring techniques. ACKNOWLEDGMENTS The author would like to thank all his collaborators: Michalis Faloutsos, George Varghese, Michael Mitzenmacher, Prashanth Pappu, Sumeet Singh, Hyun-chul Kim, and CAIDA. R EFERENCES [1] D. Ellis, J. Aiken, K. Attwood, and S. Tenarglia. A Behavioral Approach to Worm Detection. In ACM CCS WORM, 2004. [2] M. Iliofotou, P. Pappu, M. Faloutsos, M. Mitzenmacher, H. Kim, and G. Varghese. Graption: Automated Detection of P2P Applications in the Internet Backbone. Technical report, UC Rverside, 2008. [3] M. Iliofotou, P. Pappu, M. Faloutsos, M. Mitzenmacher, S. Singh, and G. Varghese. Network Monitoring Using Traffic Dispersion Graphs (TDGs). In ACM IMC, 2007. [4] T. Karagiannis, K. Papagiannaki, and M. Faloutsos. BLINC: Multi-level Traffic Classification in the Dark. In ACM SIGCOMM, 2005. [5] G. Tan, M. Poletto, J. Guttag, and F. Kaashoek. Role classification of hosts within enterprise networks based on connection patterns. In USENIX Annual Technical Conference, 2003.

Suggest Documents