Active BGP Measurement with BGP-Mux - Caida

Active BGP Measurement with BGP-Mux

Ethan Katz-Bassett (USC) with testbed and some slides hijacked from Nick Feamster and Valas Valancius

Before I Start

- Georgia Tech system; I am just an enthusiastic user
  - Nick Feamster and his students: Valas Valancius, Bharath Ravi
- Questions for the audience:
  - What would you use this system for?
  - What should we use it for?
  - How do we get more ASes to connect to us?
    - Getting them to agree to peer
    - Then, getting the connection to work

Networks Use BGP to Interconnect

- BGP sessions
- Route advertisements
- Traffic over those routes
- BGP controls both inbound and outbound traffic

[Figure: a route advertisement for WS propagating across the network, built up over several animation steps. AS paths shown: WS; ATT WS; L3 ATT WS; Sprint ATT WS; UW L3 ATT WS.]
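
The AS paths on this slide (WS, ATT WS, L3 ATT WS, and so on) grow because each network adds its own name to the front of the path before passing the advertisement on. A minimal Python sketch of that behavior follows. The adjacency table is an assumption inferred from the paths on the slide, and the function name `propagate` is hypothetical; real BGP route selection involves many more attributes.

```python
from collections import deque

# Hypothetical adjacency, inferred from the AS paths shown on the slide:
# each AS maps to the neighbors it passes the advertisement on to.
ADVERTISES_TO = {
    "WS": ["ATT"],
    "ATT": ["L3", "Sprint"],
    "L3": ["UW"],
    "Sprint": [],
    "UW": [],
}


def propagate(origin):
    """Return the AS path each AS learns for the origin's prefix."""
    paths = {origin: [origin]}
    queue = deque([origin])
    while queue:
        sender = queue.popleft()
        for receiver in ADVERTISES_TO.get(sender, []):
            # Loop prevention: ignore a path that already contains us.
            if receiver in paths[sender]:
                continue
            # Simplification: keep the first path learned.
            if receiver in paths:
                continue
            # The receiver prepends itself before re-advertising.
            paths[receiver] = [receiver] + paths[sender]
            queue.append(receiver)
    return paths


paths = propagate("WS")
for asn, path in paths.items():
    print(asn, ":", " ".join(path))
```

Traffic flows in the opposite direction to the advertisement: UW sends packets for WS to L3, which forwards them to ATT, and so on.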

Virtual Networks Need BGP, too

Say I have some neat new routing ideas. I want to test them:
- Emulate the type of AS (CDN, stub, etc.) of my choice
- Choose a set of providers, peers, and customers
- Inbound:
  - Choose routes from those providers
  - Send traffic along those routes
- Outbound:
  - Announce my prefix(es) to neighbors of choice, with communities, etc.
  - Receive traffic to prefix(es)
- And everyone else should be able to do this, also

Traditionally, BGP Experiments are Hard

I have some neat new routing ideas. How do I test them?
- Passive observation
  - E.g., RouteViews, RIPE
  - Receive feeds only
- Limited "active" measurements
  - E.g., Beacons
  - Generally, regular announcements and withdrawals
- Know the right people
  - Negotiate the ability to make announcements
  - High overhead, limited deployment
- All limit what you can do

What I Need to Get What I Want

- Resources
  - IP address space
  - AS number
- Connectivity & contracts
  - BGP peering with real ASes
  - Data plane forwarding
- Time and money

BGP-Mux Provides All This For You

- Resources
  - IP address space: 184.164.224.0/19
  - AS number: AS47065
- Connectivity & contracts
  - BGP peering with real ASes: 5 universities as providers
  - Data plane forwarding: send & receive traffic
- Time and money: one-time cost

[Figure: diagram with the labels Internet, UW, GT, BGP-Mux, and two Virtual Networks.]
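
To give a feel for how much room 184.164.224.0/19 leaves for experiments, the sketch below splits it with Python's standard `ipaddress` module. Splitting into /24s is an assumption made for illustration; the slides do not say how the block is divided among experiments.

```python
import ipaddress

# The address block named on the slide.
block = ipaddress.ip_network("184.164.224.0/19")

# Assumption for illustration: one /24 per experiment.
slash24s = list(block.subnets(new_prefix=24))

print(block.num_addresses)          # total addresses in the /19
print(len(slash24s))                # number of /24 prefixes available
print(slash24s[0], slash24s[-1])    # first and last /24 in the block
```

A /19 holds 8192 addresses, which is 32 separate /24 prefixes, from 184.164.224.0/24 through 184.164.255.0/24.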

Design Requirements

- Session transparency: BGP updates should appear as they would with a direct connection
- Session stability: Upstreams should not see transient behavior
- Isolation: Individual networks should be able to set their own policies, forward independently, etc.
- Scalability: BGP-Mux should support many networks

A Project Using BGP-Mux

LIFEGUARD: Locating Internet Failures Effectively and Generating Usable Alternate Routes Dynamically
- Locate the ISP / link causing the problem
- Suggest that other ISPs reroute around the problem
  - What would we like to add to BGP to enable this?
  - What can we deploy today, using only available protocols and router support?

Our Goal for Failure Avoidance

- Enable content / service providers to repair persistent routing problems affecting them, regardless of which ISP is causing them
- Setting
  - Assume we can locate the problem
  - Assume we are multi-homed / have multiple data centers
  - Assume we speak BGP
- We use BGP-Mux to speak BGP to the real Internet: 5 US universities as providers

Self-Repair of Forward Paths

Straightforward: Choose a path that avoids the problem.

LIFEGUARD: Practical Repair of Persistent Route Failures

A Mechanism for Failure Avoidance

- Forward path: Choose route that avoids ISP or ISP-ISP link
- Reverse path: Want others to choose paths to my prefix P that avoid ISP or ISP-ISP link X
- Want a BGP announcement AVOID(X,P):
  - Any ISP with a route to P that avoids X uses such a route
  - Any ISP not using X need only pass on the announcement
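
The intended effect of AVOID(X,P) at a single ISP can be sketched as a route filter. This is only an illustration of the semantics stated on the slide, not an existing BGP feature. The names `uses` and `select_route` are hypothetical, route preference is reduced to shortest AS path, and the fallback when no candidate avoids X is an assumption.

```python
from typing import List, Optional, Tuple, Union

Path = List[str]
Target = Union[str, Tuple[str, str]]  # an ISP, or an ISP-ISP link


def uses(path: Path, target: Target) -> bool:
    """True if the AS path traverses the ISP or ISP-ISP link to avoid."""
    if isinstance(target, tuple):
        a, b = target
        hops = set(zip(path, path[1:]))
        return (a, b) in hops or (b, a) in hops
    return target in path


def select_route(candidates: List[Path], avoid: Target) -> Optional[Path]:
    """Prefer routes to P that avoid X; choose the shortest among them."""
    if not candidates:
        return None
    avoiding = [p for p in candidates if not uses(p, avoid)]
    # Assumption: with no avoiding route, keep ordinary selection.
    pool = avoiding if avoiding else candidates
    return min(pool, key=len)


# Two hypothetical candidate routes to WS, as heard from two neighbors.
candidates = [["L3", "ATT", "WS"], ["Sprint", "Qwest", "WS"]]
print(select_route(candidates, "L3"))           # avoid an ISP
print(select_route(candidates, ("L3", "ATT")))  # avoid an ISP-ISP link
```

In both calls the route through Sprint and Qwest is chosen, because it is the only candidate that avoids L3 or the L3-ATT link.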

Ideal Self-Repair of Reverse Paths

[Figure: animation in which the label AVOID(L3,WS) appears on successive links of the network diagram.]

Practical Self-Repair of Reverse Paths

[Figure: animation of the routes each network holds toward WS. The paths recoverable from the animation steps are listed below.]

Before poisoning:
- WS: WS
- ATT: ATT → WS
- Qwest: Qwest → WS
- L3: L3 → ATT → WS
- Sprint: Sprint → Qwest → WS
- AISP: AISP → Qwest → WS
- UW: UW → L3 → ATT → WS

To get the effect of AVOID(L3,WS), WS announces the poisoned path WS → L3 → WS. After poisoning:
- Qwest: Qwest → WS → L3 → WS
- AISP: AISP → Qwest → WS → L3 → WS
- Sprint: Sprint → Qwest → WS → L3 → WS
- ATT: ATT → WS → L3 → WS
- L3: ? (no route shown)
- UW: UW → Sprint → Qwest → WS → L3 → WS

BGP loop prevention encourages switch to working path.
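
The poisoning trick relies on standard BGP loop prevention: a network discards any announcement whose AS path already contains its own AS. A minimal sketch follows, using AS names in place of AS numbers; the function names `poison` and `accepts` are hypothetical.

```python
def poison(origin, avoid):
    """Build the poisoned path origin -> avoid -> origin.

    The origin stays last, so it still appears as the origin of the prefix.
    """
    return [origin, avoid, origin]


def accepts(local_as, as_path):
    """BGP loop prevention: reject a path that already contains us."""
    return local_as not in as_path


announcement = poison("WS", "L3")

for asn in ["ATT", "Qwest", "Sprint", "L3", "UW"]:
    verdict = "accepts" if accepts(asn, announcement) else "rejects"
    print(asn, verdict, " → ".join(announcement))
```

Only L3 rejects the announcement, so L3 loses its route to WS and networks that routed through L3, such as UW, have to pick a path that does not traverse it.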

Naive Poisoning Causes Transient Loss

- Some ISPs may have working paths that avoid problem ISP X
- Naively, poisoning causes path exploration even for these ISPs
- Path exploration causes transient loss

[Figure: animation on a topology with origin O, ISPs A through F, and the announcement AVOID(X,P). Initial paths: A-O, B-A-O, D-A-O, E-D-A-O, F-B-A-O. O then announces the poisoned path O-X-O, and the paths become A-O-X-O, B-A-O-X-O, and D-A-O-X-O. At E and F the intermediate steps show the stale paths E-D-A-O and F-B-A-O alongside the poisoned paths B-A-O-X-O and D-A-O-X-O. Final paths: E-D-A-O-X-O and F-B-A-O-X-O.]

Prepend to Reduce Path Exploration

- Most routing decisions based on: (1) next hop ISP (2) path length
- Keep these fixed to speed convergence
- Prepending prepares ISPs for later poison

[Figure: animation on the same topology. O first announces the prepended path O-O-O, giving A-O-O-O, B-A-O-O-O, D-A-O-O-O, E-D-A-O-O-O, and F-B-A-O-O-O. O then announces O-X-O for AVOID(X,P), and the paths become A-O-X-O, B-A-O-X-O, D-A-O-X-O, E-D-A-O-X-O, and F-B-A-O-X-O, with the same lengths and next hops as before. The final step adds the labels UW, GT, BGP-Mux, and LIFEGUARD.]
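
The effect of prepending can be checked with a small calculation from the point of view of ISP F, which hears routes from B and E. The sketch below models route choice by AS-path length only. This is a simplification made for illustration, since real BGP decisions also depend on local preference and other attributes. The scenario assumes B's update reaches F before E's does, and the helper names are hypothetical.

```python
def split(path):
    """Turn 'B-A-O' into ['B', 'A', 'O']."""
    return path.split("-")


def best(candidates):
    """Pick the shortest AS path (ties go to the first listed)."""
    return min(candidates, key=len)


def next_hop(candidates):
    """The neighbor F would forward through."""
    return best(candidates)[0]


# Naive poisoning: baseline O, then O-X-O. B has updated, E is still stale.
naive_before = [split("B-A-O"), split("E-D-A-O")]
naive_during = [split("B-A-O-X-O"), split("E-D-A-O")]

# Prepending: baseline O-O-O, then O-X-O. Same update order.
prep_before = [split("B-A-O-O-O"), split("E-D-A-O-O-O")]
prep_during = [split("B-A-O-X-O"), split("E-D-A-O-O-O")]

print("naive:  ", next_hop(naive_before), "->", next_hop(naive_during))
print("prepend:", next_hop(prep_before), "->", next_hop(prep_during))
```

With naive poisoning, B's path grows from 3 to 5 hops, so F briefly switches to the stale 4-hop route through E. With prepending, B's path is 5 hops both before and after the poison, so F keeps the same next hop throughout.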

Tested Idea Using BGP-Mux

[Figure: CDF plot. x-axis: Peer Convergence Time (minutes), 0 to 8. y-axis: Cumulative Fraction of Convergences (CDF), with ticks at 0, 0.65, 0.95, 0.99, 0.999, and 0.9999. Series: "Prepend, no change" and "No prepend, no change".]

- With no prepend, only 65% of unaffected ISPs converge instantly
- With prepending, 95% of unaffected ISPs re-converge instantly, 98%