Networks Use BGP to Interconnect. 4. 4 ... BGP controls both inbound and
outbound traffic. 4 .... LIFEGUARD: Practical Repair of Persistent Route Failures.
Active BGP Measurement with BGP-Mux
Ethan Katz-Bassett (USC) with testbed and some slides hijacked from Nick Feamster and Valas Valancius
2
Before I Start Georgia Tech system, I am just an enthusiastic user
Nick Feamster and his students: Valas Valancius Bharath Ravi
Questions for the audience:
What would you use this system for? What should we use it for? How do we get more ASes to connect to us?
3
Getting them to agree to peer Then, getting the connection to work
Active BGP Measurement with BGP-Mux 3
Networks Use BGP to Interconnect
BGP sessions Route advertisements Traffic over those routes BGP controls both inbound and outbound traffic
4
Active BGP Measurement with BGP-Mux 4
4
Networks Use BGP to Interconnect
WS
BGP sessions Route advertisements Traffic over those routes BGP controls both inbound and outbound traffic
4
Active BGP Measurement with BGP-Mux 4
4
Networks Use BGP to Interconnect
WS
BGP sessions Route advertisements Traffic over those routes BGP controls both inbound and outbound traffic
4
Active BGP Measurement with BGP-Mux 4
4
Networks Use BGP to Interconnect
ATTWS
WS
BGP sessions Route advertisements Traffic over those routes BGP controls both inbound and outbound traffic
4
Active BGP Measurement with BGP-Mux 4
4
Networks Use BGP to Interconnect
ATTWS
WS
BGP sessions Route advertisements Traffic over those routes BGP controls both inbound and outbound traffic
4
Active BGP Measurement with BGP-Mux 4
4
Networks Use BGP to Interconnect
L3ATTWS ATTWS
SprintATTWS
WS
BGP sessions Route advertisements Traffic over those routes BGP controls both inbound and outbound traffic
4
Active BGP Measurement with BGP-Mux 4
4
Networks Use BGP to Interconnect
L3ATTWS ATTWS
SprintATTWS
WS
BGP sessions Route advertisements Traffic over those routes BGP controls both inbound and outbound traffic
4
Active BGP Measurement with BGP-Mux 4
4
Networks Use BGP to Interconnect UWL3ATTWS L3ATTWS ATTWS
SprintATTWS
WS
BGP sessions Route advertisements Traffic over those routes BGP controls both inbound and outbound traffic
4
Active BGP Measurement with BGP-Mux 4
4
Networks Use BGP to Interconnect UWL3ATTWS L3ATTWS ATTWS
SprintATTWS
WS
BGP sessions Route advertisements Traffic over those routes BGP controls both inbound and outbound traffic
4
Active BGP Measurement with BGP-Mux 4
4
Networks Use BGP to Interconnect UWL3ATTWS L3ATTWS ATTWS
SprintATTWS
WS
BGP sessions Route advertisements Traffic over those routes BGP controls both inbound and outbound traffic
4
Active BGP Measurement with BGP-Mux 4
4
Virtual Networks Need BGP, too Say I have some neat new routing ideas. I want to test them: Emulate the type of AS (CDN, stub, etc) of my choice
Choose a set of providers, peers, and customers
Inbound:
Choose routes from those providers Send traffic along those routes
Outbound:
Announce my prefix(es) to neighbors of choice, with communities, etc Receive traffic to prefix(es)
And everyone else should be able to do this, also
5
Active BGP Measurement with BGP-Mux 5
Traditionally, BGP Experiments are Hard I have some neat new routing ideas. How do I test them? Passive observation
E.g., RouteViews, RIPE Receive feeds only
Limited “active” measurements
E.g., Beacons Generally, regular announcements and withdrawals
Know the right people
Negotiate the ability to make announcements High overhead, limited deployment
All limit what you can do 6
Active BGP Measurement with BGP-Mux 6
What I Need to Get What I Want Resources
IP address space
AS number
Connectivity & contracts
BGP peering with real ASes
Data plane forwarding
Time and money
7
Active BGP Measurement with BGP-Mux 7
BGP-Mux Provides All This For You Resources
IP address space 184.164.224.0/19 AS number AS47065
Internet
Connectivity & contracts
BGP peering with real ASes 5 Universities as providers Data plane forwarding Send & receive traffic
Time and money One-time cost
9
UW
GT
BGP-Mux
Virtual Network
Virtual Network
Active BGP Measurement with BGP-Mux 9
Design Requirements
Session transparency: BGP updates should appear as they would with direction connection Session stability: Upstreams should not see transient behavior Isolation: Individual networks should be able to set their own policies, forward independently, etc Scalability: BGP-Mux should support many networks
10
Active BGP Measurement with BGP-Mux 10
A Project Using BGP-Mux LIFEGUARD: Locating Internet Failures Effectively and Generating Usable Alternate Routes Dynamically Locate the ISP / link causing the problem
Suggest that other ISPs reroute around the problem
11
What would we like to add to BGP to enable this? What can we deploy today, using only available protocols and router support?
Active BGP Measurement with BGP-Mux 11
Our Goal for Failure Avoidance
Enable content / service providers to repair persistent routing problems affecting them, regardless of which ISP is causing them
Setting Assume we can locate problem Assume we are multi-homed / have multiple data centers Assume we speak BGP
We use BGP-Mux to speak BGP to the real Internet: 5 US universities as providers 12
Active BGP Measurement with BGP-Mux 12
Self-Repair of Forward Paths
Straightforward: Choose a path that avoids the problem. 13
LIFEGUARD: Practical Repair of Persistent Route Failures 13
Self-Repair of Forward Paths
Straightforward: Choose a path that avoids the problem. 13
LIFEGUARD: Practical Repair of Persistent Route Failures 13
A Mechanism for Failure Avoidance Forward path: Choose route that avoids ISP or ISP-ISP link Reverse path: Want others to choose paths to my prefix P that avoid ISP or ISP-ISP link X Want a BGP announcement AVOID(X,P):
14
Any ISP with a route to P that avoids X uses such a route Any ISP not using X need only pass on the announcement
Active BGP Measurement with BGP-Mux 14
Ideal Self-Repair of Reverse Paths
15
LIFEGUARD: Practical Repair of Persistent Route Failures 15
Ideal Self-Repair of Reverse Paths
AVOID(L3,WS)
15
LIFEGUARD: Practical Repair of Persistent Route Failures 15
Ideal Self-Repair of Reverse Paths
AVOID(L3,WS)
AVOID(L3,WS)
15
LIFEGUARD: Practical Repair of Persistent Route Failures 15
Ideal Self-Repair of Reverse Paths AVOID(L3,WS)
AVOID(L3,WS)
AVOID(L3,WS)
15
LIFEGUARD: Practical Repair of Persistent Route Failures 15
Ideal Self-Repair of Reverse Paths AVOID(L3,WS)
AVOID(L3,WS)
AVOID(L3,WS)
15
LIFEGUARD: Practical Repair of Persistent Route Failures 15
Practical Self-Repair of Reverse Paths
16
LIFEGUARD: Practical Repair of Persistent Route Failures 16
Practical Self-Repair of Reverse Paths
WS
16
LIFEGUARD: Practical Repair of Persistent Route Failures 16
Practical Self-Repair of Reverse Paths
ATT → WS WS
Qwest → WS
16
LIFEGUARD: Practical Repair of Persistent Route Failures 16
Practical Self-Repair of Reverse Paths
L3 → ATT → WS ATT → WS Sprint → Qwest → WS
AISP → Qwest → WS
16
WS
Qwest → WS
LIFEGUARD: Practical Repair of Persistent Route Failures 16
Practical Self-Repair of Reverse Paths UW → L3 → ATT → WS L3 → ATT → WS ATT → WS Sprint → Qwest → WS
AISP → Qwest → WS
16
WS
Qwest → WS
LIFEGUARD: Practical Repair of Persistent Route Failures 16
Practical Self-Repair of Reverse Paths UW → L3 → ATT → WS L3 → ATT → WS ATT → WS Sprint → Qwest → WS
AISP → Qwest → WS
16
WS
Qwest → WS
LIFEGUARD: Practical Repair of Persistent Route Failures 16
Practical Self-Repair of Reverse Paths UW → L3 → ATT → WS L3 → ATT → WS ATT → WS Sprint → Qwest → WS
AISP → Qwest → WS
16
WS
Qwest → WS
LIFEGUARD: Practical Repair of Persistent Route Failures 16
Practical Self-Repair of Reverse Paths UW → L3 → ATT → WS L3 → ATT → WS ATT → WS Sprint → Qwest → WS
WS AVOID(L3,WS)
AISP → Qwest → WS
17
Qwest → WS
LIFEGUARD: Practical Repair of Persistent Route Failures 17
Practical Self-Repair of Reverse Paths UW → L3 → ATT → WS L3 → ATT → WS ATT → WS Sprint → Qwest → WS
AISP → Qwest → WS
WS → L3→ WS
Qwest → WS
BGP loop prevention encourages switch to working path. 17
LIFEGUARD: Practical Repair of Persistent Route Failures 17
Practical Self-Repair of Reverse Paths UW → L3 → ATT → WS L3 → ATT → WS ATT → WS Sprint → Qwest → WS
AISP → Qwest → WS
WS → L3→ WS
Qwest → WS → L3→ WS
BGP loop prevention encourages switch to working path. 17
LIFEGUARD: Practical Repair of Persistent Route Failures 17
Practical Self-Repair of Reverse Paths UW → L3 → ATT → WS L3 → ATT → WS ATT → WS Sprint → Qwest → WS
AISP → Qwest → WS → L3→ WS
WS → L3→ WS
Qwest → WS → L3→ WS
BGP loop prevention encourages switch to working path. 17
LIFEGUARD: Practical Repair of Persistent Route Failures 17
Practical Self-Repair of Reverse Paths UW → L3 → ATT → WS L3 → ATT → WS ATT → WS SprintSprint → Qwest → Qwest → WS→→WS L3→ WS
WS → L3→ WS
Qwest → WS → L3→ WS
BGP loop prevention encourages switch to working path. 17
LIFEGUARD: Practical Repair of Persistent Route Failures 17
Practical Self-Repair of Reverse Paths UW → L3 → ATT → WS L3 → ATT → WS ATT → WS → L3→ WS SprintSprint → Qwest → Qwest → WS→→WS L3→ WS
WS → L3→ WS
BGP loop prevention encourages switch to working path. 17
LIFEGUARD: Practical Repair of Persistent Route Failures 17
Practical Self-Repair of Reverse Paths UW → L3 → ATT → WS ? ATT → WS → L3→ WS SprintSprint → Qwest → Qwest → WS→→WS L3→ WS
WS → L3→ WS
BGP loop prevention encourages switch to working path. 17
LIFEGUARD: Practical Repair of Persistent Route Failures 17
Practical Self-Repair of Reverse Paths UW → Sprint → Qwest L3 → ATT → WS→ WS → L3→ WS ? ATT → WS → L3→ WS SprintSprint → Qwest → Qwest → WS→→WS L3→ WS
WS → L3→ WS
BGP loop prevention encourages switch to working path. 17
LIFEGUARD: Practical Repair of Persistent Route Failures 17
Practical Self-Repair of Reverse Paths UW → Sprint → Qwest L3 → ATT → WS→ WS → L3→ WS ? ATT → WS → L3→ WS SprintSprint → Qwest → Qwest → WS→→WS L3→ WS
WS → L3→ WS
BGP loop prevention encourages switch to working path. 17
LIFEGUARD: Practical Repair of Persistent Route Failures 17
Naive Poisoning Causes Transient Loss
Some ISPs may have working paths that avoid problem ISP X Naively, poisoning causes path exploration even for these ISPs Path exploration causes transient loss
18
B-A-O F E-D-A-O
C
B-A-O
D-A-O E F-B-A-O
B
A-O
A-O
D
A
O
O
AVOID(X,P)
Active BGP Measurement with BGP-Mux 18
Naive Poisoning Causes Transient Loss
Some ISPs may have working paths that avoid problem ISP X Naively, poisoning causes path exploration even for these ISPs Path exploration causes transient loss
19
B-A-O F E-D-A-O
C
B-A-O
D-A-O E F-B-A-O
B
A-O
A-O
D
A
O
O-X-O
AVOID(X,P)
Active BGP Measurement with BGP-Mux 19
Naive Poisoning Causes Transient Loss
Some ISPs may have working paths that avoid problem ISP X Naively, poisoning causes path exploration even for these ISPs Path exploration causes transient loss
20
B-A-O F E-D-A-O
C
B-A-O
D-A-O E F-B-A-O
B
A-O-X-O
A-O-X-O
D
A
O
O-X-O
AVOID(X,P)
Active BGP Measurement with BGP-Mux 20
Naive Poisoning Causes Transient Loss
Some ISPs may have working paths that avoid problem ISP X Naively, poisoning causes path exploration even for these ISPs Path exploration causes transient loss
21
B-A-O-X-O F E-D-A-O
C
B-A-O-X-O
D-A-O-X-O E F-B-A-O
B
A-O-X-O
A-O-X-O
D
A
O
O-X-O
AVOID(X,P)
Active BGP Measurement with BGP-Mux 21
Naive Poisoning Causes Transient Loss
Some ISPs may have working paths that avoid problem ISP X Naively, poisoning causes path exploration even for these ISPs Path exploration causes transient loss
22
B-A-O-X-O E-D-A-O F E-D-A-O B-A-O-X-O
C
B-A-O-X-O
D-A-O-X-O F-B-A-O E F-B-A-O D-A-O-X-O
B
A-O-X-O
A-O-X-O
D
A
O
O-X-O
AVOID(X,P)
Active BGP Measurement with BGP-Mux 22
Naive Poisoning Causes Transient Loss
Some ISPs may have working paths that avoid problem ISP X Naively, poisoning causes path exploration even for these ISPs Path exploration causes transient loss
23
B-A-O-X-O E-D-A-O F E-D-A-O B-A-O-X-O
C
B-A-O-X-O
D-A-O-X-O F-B-A-O E F-B-A-O D-A-O-X-O
B
A-O-X-O
A-O-X-O
D
A
O
O-X-O
AVOID(X,P)
Active BGP Measurement with BGP-Mux 23
Naive Poisoning Causes Transient Loss
Some ISPs may have working paths that avoid problem ISP X Naively, poisoning causes path exploration even for these ISPs Path exploration causes transient loss
24
B-A-O-X-O E-D-A-O F E-D-A-O B-A-O-X-O
C
B-A-O-X-O
F-B-A-O E D-A-O-X-O D-A-O-X-O F-B-A-O
B
A-O-X-O
A-O-X-O
D
A
O
O-X-O
AVOID(X,P)
Active BGP Measurement with BGP-Mux 24
Naive Poisoning Causes Transient Loss
Some ISPs may have working paths that avoid problem ISP X Naively, poisoning causes path exploration even for these ISPs Path exploration causes transient loss
25
B-A-O-X-O F E-D-A-O-X-O
C
B-A-O-X-O
D-A-O-X-O E F-B-A-O-X-O
B
A-O-X-O
A-O-X-O
D
A
O
O-X-O
AVOID(X,P)
Active BGP Measurement with BGP-Mux 25
Prepend to Reduce Path Exploration
Most routing decisions based on: (1) next hop ISP (2) path length Keep these fixed to speed convergence Prepending prepares ISPs for later poison
26
B-A-O-O-O F E-D-A-O-O-O
C
B-A-O-O-O
D-A-O-O-O E F-B-A-O-O-O
B
A-O-O-O
A-O-O-O
D
A
O
O-O-O
AVOID(X,P)
Active BGP Measurement with BGP-Mux 26
Prepend to Reduce Path Exploration
Most routing decisions based on: (1) next hop ISP (2) path length Keep these fixed to speed convergence Prepending prepares ISPs for later poison
27
B-A-O-O-O F E-D-A-O-O-O
C
B-A-O-O-O
D-A-O-O-O E F-B-A-O-O-O
B
A-O-O-O
A-O-O-O
D
A
O
O-O-O O-X-O
AVOID(X,P)
Active BGP Measurement with BGP-Mux 27
Prepend to Reduce Path Exploration
Most routing decisions based on: (1) next hop ISP (2) path length Keep these fixed to speed convergence Prepending prepares ISPs for later poison
28
B-A-O-O-O F E-D-A-O-O-O
C
B-A-O-O-O
D-A-O-O-O E F-B-A-O-O-O
B
A-O-X-O A-O-O-O
A-O-O-O A-O-X-O
D
A
O
O-O-O O-X-O
AVOID(X,P)
Active BGP Measurement with BGP-Mux 28
Prepend to Reduce Path Exploration
Most routing decisions based on: (1) next hop ISP (2) path length Keep these fixed to speed convergence Prepending prepares ISPs for later poison
29
B-A-O-X-O F E-D-A-O-O-O
C
B-A-O-X-O
D-A-O-X-O E F-B-A-O-O-O
B
A-O-X-O
A-O-X-O
D
A
O
O-X-O
AVOID(X,P)
Active BGP Measurement with BGP-Mux 29
Prepend to Reduce Path Exploration
Most routing decisions based on: (1) next hop ISP (2) path length Keep these fixed to speed convergence Prepending prepares ISPs for later poison
30
B-A-O-X-O F E-D-A-O-X-O
C
B-A-O-X-O
D-A-O-X-O E F-B-A-O-X-O
B
A-O-X-O
A-O-X-O
D
A
O
O-X-O
AVOID(X,P)
Active BGP Measurement with BGP-Mux 30
Prepend to Reduce Path Exploration
Most routing decisions based on: (1) next hop ISP (2) path length Keep these fixed to speed convergence Prepending prepares ISPs for later poison
B-A-O-X-O F E-D-A-O-X-O
C
B-A-O-X-O
D-A-O-X-O E F-B-A-O-X-O
B
A-O-X-O
A-O-X-O
D
A UW
O-X-O O-X-O
GT
AVOID(X,P) O BGP-Mux
LIFEGUARD 30
Active BGP Measurement with BGP-Mux 30
Cumulative Fraction of Convergences (CDF)
Tested Idea Using BGP-Mux
0.9999 0.999 0.99 0.95 Prepend, no change No prepend, no change
0.65 0 0
1 2 3 4 5 6 7 8 Peer Convergence Time (minutes)
With no prepend, only 65% of unaffected ISPs converge instantly With prepending, 95% of unaffected ISPs re-converge instantly, 98%