IP/MPLS CORE IP/MPLS CORE – HIGH AVAILABILITY DESIGN

12 downloads 4225 Views 4MB Size Report
27 Jan 2010 ... HIGH AVAILABILITY SOLUTION ARCHITECTURE ... Clean separation of routing and packet forwarding functions. Junos. Software. RE p g.
IP/MPLS CORE – HIGH AVAILABILITY DESIGN Mars Chen 27th January 2010

TODAY’S IP NETWORK Is an infrastructure that supports mission critical services: ƒ VoIP/Mobility ƒ Converged data network services ƒ Business VPN Services ƒ Cloud Computing ƒ And Internet access services ƒ ………………….. ƒ These carrier services typically have customer SLA’s that must be

supported

2

Copyright © 2009 Juniper Networks, Inc.

www.juniper.net

BUSINESS CASE FOR HIGH AVAILABILITY

Cost

Downtime Cost

Availability Cost

Availability

3

Copyright © 2009 Juniper Networks, Inc.

www.juniper.net

HIGH AVAILABILITY SOLUTION ARCHITECTURE Carrier-Class Availability Is a Culture, Not a Single Feature or Product

Dependable Networks Dependable Platforms Hardware

4

Copyright © 2009 Juniper Networks, Inc.

Software

www.juniper.net

Scalability

Security

M Management t process

PLATFORM HIGH AVAILABILITY

1. Hardware 2. Software 3. Control plane •



Nonstop forwarding ( NSF) : Graceful restart + GRES N Nonstop t routing ti

4. Virtualization – Virtual Chassis

5

Copyright © 2009 Juniper Networks, Inc.

www.juniper.net

LOGICAL PLATFORM VIEW OF MODERN ROUTER Clean separation of routing and packet forwarding p g functions ƒ Consistent performance

RE

Junos Software Forwarding Table

ƒ Stability ƒ Provider-class routing

Update

Routing Engine (RE) ƒ JUNOS software

Internet

Packet Forwarding Engine (PFE) ƒ Processor-based design

Forwarding Table

PFE

Interfaces

Processor (s)

Switch Fabric

ƒ FPC/PICs I/O Card

6

Copyright © 2009 Juniper Networks, Inc.

www.juniper.net

I/O Card

on n Daemo

Daemo n n3 Daemon

on 2 Daemo

3 D Daemon Daemo on 1

Daemon 2

Control plane and forwarding plane separation allows continuous packet forwarding during control plane failure GRES provides stateful replication between the master and backup REs

Daemon 1

ROUTING ENGINE REDUNDANCY GRACEFUL ROUTING ENGINE SWITCHOVER (GRES) ( )

Kernel ƒ Keepalives exchange info such as

Kernel

interfaces, and kernel. ƒ GRES allows fast switchover d i RE failure during f il

Keepalive

GRES alone is not enough

ƒ Routing adjacencies broken during

switchover ƒ Must be combined with either graceful restart protocol extensions or nonstop active routing

Packet Forwarding

Ph i l IInterfaces Physical t f 7

Copyright © 2009 Juniper Networks, Inc.

www.juniper.net

Daemon n

Daem Daemon nmon 3

emon 2 Dae

Daemon 3 emon 1 Dae

Daemon 2

GRES + Graceful Restart

Daemon 1

NONSTOP FORWARDING

Kernel e e Kernel

= NSF (Nonstop Forwarding)

8

Keepalive Routing Adjacencies

Copyright © 2009 Juniper Networks, Inc.

www.juniper.net

GRACEFUL RESTART: OPERATIONS Separate control and d data d t planes l P

P

When router recovers, neighbors sync up without disturbing forwarding forwarding.

PE 2

PE 1

PE 3

If router’s control plane fails, data plane can keep forwarding packets

9

P

P Other routers are never made aware of failure

Neighbors hide failure from all others routers in the network

Copyright © 2009 Juniper Networks, Inc.

www.juniper.net

GRACEFUL RESTART PROTOCOL DETAILS Purpose - Continue forwarding (PFE) during a restart of routing (RE)

BGP

OSPF

IS-IS

MPLS

Changes

IETF

Protocol extensions Per-peer configuration Various timers with configurable defaults

Graceful Restart Mechanism for BGP

Protocol extensions New opaque-LSA type 9, “Grace-LSA”

Graceful OSPF Restart

Protocol extensions 3 new timers New “re-start” option (TLV) in IIH PDU

Restart Signaling for ISIS

Protocol Extensions

Graceful Restart Mechanism for BGP with MPLS

Uses signaling as described in “Graceful Restart Mechanism for BGP

RSVP

rfc4724

rfc3623

rfc3847

Rfc4781

Extensions to GMPLS RSVP

Protocol Extensions

Graceful Restart

Extend rfc 3473

Rfc5063

Recovery ERO

10

Copyright © 2009 Juniper Networks, Inc.

www.juniper.net

GRACEFUL RESTART: LIMITATIONS ƒ

Neighboring routers must understand GR procedures and messages • Inhibits full GR implementation • Particularly significant on PEs

ƒ

GR must stop if topology changes during grace period

ƒ

In some cases, cases cannot distinguish between link failure and control plane failure

ƒ

In some cases, routing re-convergence might exceed grace period • For example, if there are hundreds of BGP peers

ƒ

Protocol interdependencies can slow re-convergence beyond grace period • For example, if BGP depends on LDP restart completion, which depends on RSVP restart completion

ƒ

11

Operators acceptance of GR is not widespread

Copyright © 2009 Juniper Networks, Inc.

www.juniper.net

Kernel Kernel

ƒ No need to run protocol extensions

on neighbors Routing Adjacencies

Backup routing engine becomes hot standby ƒ Both REs run routing protocol

processes ƒ Relies R li on GRES tto preserve kkernell and interface info

switchover seamless to neighbors 12

Copyright © 2009 Juniper Networks, Inc.

www.juniper.net

Dae emon n

Dae emon 3 Daemon n

Dae emon 2

Daemon 3 Dae emon 1

Individual routers assume sole responsibility for RE failure or switchover it h

Daemon 2

Internal processes keep backup RE aware of protocol state and adjacency activities

Daemon 1

NON-STOP ACTIVE ROUTING

Kernel Kernel

Ideally, a fully redundant system can eliminate any disruption during software upgrade Reality is, most systems today are nott fully f ll redundant d d t

emon n Dae

Daenmon 3 Daemon

emon 2 Dae

Daemon 3 Dae emon 1

software upgrade isn’t merely a single g p point of software replacement.

Daemon 2

Not every vendor extends GRES, GR and NSR support to ISSU

Daemon 1

IN SERVICE SOFTWARE UPGRADE

Keepalive

Packet Forwarding

Physical Interfaces 13

Copyright © 2009 Juniper Networks, Inc.

www.juniper.net

TRUE ISSU Version Y.1

Version X.0

Kernel Control Plane Forwarding Plane

Packet Forwarding

Physical Interfaces 14

Copyright © 2009 Juniper Networks, Inc.

www.juniper.net

Da aemon n

Daemon 3

Da aemon 2

Can be done to major or minor releases l

Routing E Engine

Upgrades entire software image

Da aemon 1

Upgrade Between Major Releases

VIRTUALIZATION VIRTUAL CHASSIS KEY ADVANTAGES No Dependency on Dedicated Connectivity Hardware Leverage existing Redundancy Mechanisms ƒ GRES/NSR, LAG (Aggregated-Ethernet).

Operational efficiency – Single control plane as visible externally. Failover transparent to external control entities (OSS, policy servers, AAA etc).

ƒ Failover not gated by routing re-convergence.

Copyright © 2009 Juniper Networks, Inc.

RE 0

1

RE 1

2

LC

35 4

No routing change as visible externally: Inter-chassis link failover completely contained within VC

15

0

www.juniper.net

LC LC

RELIABLE NETWORKS ƒSDH (Layer 1) ƒEthernet Eth t OAM ( Layer L 2) ƒLink bundling ƒVRRP ƒIGP fast convergence ƒBFD ƒVRRP ƒMPLS (RSVP) Fast reroute ƒIP/LDP fast reroute

16

Copyright © 2009 Juniper Networks, Inc.

www.juniper.net

SONET/SDH PROTECTION SWITCHING

SONET APS & SDH MSP ƒ Redundant routers share uplink

Rapid circuit failure recovery ƒ Used on router router-to-ADM to ADM links

ADM

Interoperable with standard ADM

Protect

Working & protect circuits ƒ May reside on different routers ƒ May reside on same router

17

Copyright © 2009 Juniper Networks, Inc.

www.juniper.net

OAM LAYERS Transport Connectivity

Transport Layer ƒ Ensures two directly connected peers

maintain bidirectional communication. communication ƒ Must detect link down failure and notify higher layer for protocol to re-route around the failure. ƒ Monitor link quality to ensure that performance meets an acceptable level.

S i Service

Connectivity Layer ƒ Monitor the communication path

between two non-adjacent devices.

Service Layer ƒ Measures and represents the status of

the services as seen by the customer. ƒ Metrics such as throughput, round-trip delay jitter need to be monitored in an delay, effect to meet the Service Level Agreements (SLAs) contracted between the provider and the customer.

18

Copyright © 2009 Juniper Networks, Inc.

www.juniper.net

ETHERNET OAM FAILURE ACTION

Both 802.3ah and 802.1ag can mark the link down upon failure MPLS FRR can be triggered by link down indication ƒ Ethernet Ring Protection will also link down indication

19

Copyright © 2009 Juniper Networks, Inc.

www.juniper.net

LINK BUNDLING Reliable Links ƒ Link failure does not affect forwarding ƒ Load redistributed among

Simultaneous Physical Connections

other members

Parallel Link Technologies ƒ ƒ ƒ ƒ

20

MLPPP – T1/E1 Link aggregation Multi-Link Frame Relay 802 3ad – Ethernet aggregation 802.3ad SONET/SDH aggregation

Copyright © 2009 Juniper Networks, Inc.

www.juniper.net

IP DYNAMIC ROUTING

New York San Francisco

ƒOSPF or IS-IS computes path ƒIf link or node fails, New path is computed ƒResponse times: Typically a few seconds ƒCompletion time: Typically a few minutes, but very dependant on topology

21

Copyright © 2009 Juniper Networks, Inc.

www.juniper.net

FASTER ROUTER CONVERGENCE

ƒ Faster convergence improves Network Reliability ƒ Separation of control & forwarding planes is key ƒ Protocol expertise is key

Features

Benefits

High Priority Flooding for Interested LSPs (ISIS)

ƒ Timer reduced from 100 to 20msec ƒ Faster propagation of major changes

Quick SPF Scheduling (ISIS)

ƒ Reduces Red ces time from 7 sec to 50 msec ƒ Speeds calculation of optimum path

Sub-second Hellos (ISIS)

ƒ Lowest Hello Time possible for IS-IS, 333msec ƒ Faster Link Failure Detection

RIB and FIB Enhancements (BGP)

22

ƒ Indirect Next Hop implies faster convergence

Copyright © 2009 Juniper Networks, Inc.

www.juniper.net

WHAT IS BFD (BIDIRECTIONAL FORWARDING DETECTION ) ? Yet Another Hello Protocol ( but lightweight – compared to OSPF/ISIS….) Packets sent at regular intervals; neighbor failure detected when packets stop showing up Always unicast, even on shared media Not just for direct links; can be used over MPLS LSPs LSPs, multi multi-hop hop separated neighbors, unidirectional links. Simple packet format, very low processing overhead. Can be implemented in forwarding plane to the extent possible.

23

Copyright © 2009 Juniper Networks, Inc.

www.juniper.net

BFD ENHANCEMENTS ƒStatic route ƒOSPF/ISIS ƒ eBGP ƒ Multihop iBGP ƒ BFD for MPLS OAM ƒ BFD provides LSP data plane verification. LSP-Ping verifies consistency between LSP control and

data plane. BFD benefits: ƒ Lightweight. Scales to large number of LSPs ƒ Sub-second S b d ffailure il d detection t ti ƒ Periodic fault detection

ƒ BFD over different types of LSPs: ƒ Point-to-point LSP (RSVP or LDP signaled) ƒ ECMP PE-PE PE PE awareness ƒ P2MP LSP ƒ L2 Pseudowire ƒ Can use VCCV-BFD and VCCV-Ping for Pseudowires

24

Copyright © 2009 Juniper Networks, Inc.

www.juniper.net

BFD VS ETHERNET OAM 802.1AG CC COMPARISON

25

BFD

802.1ag CC

Layer 3 continuity check. check Better for Layer 3 and MPLS

Ethernet Layer 2 continuity check. check Better for Ethernet

Use ping and traceroute for loopback and trace functions

802.1ag also supports loopback and linktrace

C Comparatively simpler configuration f

MEPs, MIPs, MD give more flexibility f but make configuration complex. Working to simplify it.

Runs on aggregate link but not on child li k links

Runs on aggregate and child links. Can adjust dj t aggregate t bandwidth. b d idth

Session down causes IGP, BGP to reroute

Interface down causes protocols to reroute

S OAM: O data plane p a e failure a u e detection detect o MPLS for RSVP and LDP LSPs, LSP switch action, can trigger FRR

MPLS S OAM: O interface te ace do down ca can ttrigger gge FRR

Copyright © 2009 Juniper Networks, Inc.

www.juniper.net

MPLS (RSVP BASED) PROTECTION Secondary LSP Secondary Standby LSP FAST REROUTE

Detour

ƒ One-to-One Backup p ((aka 1:1

Pi Primary

detour) –using label swapping ƒ Facility Backup – using label stacking g

LSR

ƒ Link Protection – Protects only against link failure

ƒ Node Protection – Protects against both link failure and node forwarding plane failure

26

Copyright © 2009 Juniper Networks, Inc.

www.juniper.net

IP/LDP FAST REROUTE WHY? ƒSome networks do not implement RSVP ƒRSVP Requires PE to PE full mesh for transport plane protection t ti ƒRSVP N^2 Problem ƒ 200 PEs = app. app 39800 LSPs + detours & bypasses ƒ Fixable using LSP hierarchy

27

Copyright © 2009 Juniper Networks, Inc.

www.juniper.net

LOOP-FREE ALTERNATES (LFA) ƒAdds fast-reroute (FRR) capability to IS-IS, OSPF and LDP ƒ Normally only best nodal path is used for RIB walks ƒAdd Add a non-best non best (albeit loop free) path for backup purposes. ƒHow ? ƒ Shared, common link state database ƒ Place the SPF root at your neighbors

28

Copyright © 2009 Juniper Networks, Inc.

www.juniper.net

SPF ROOTS & LFA ILLUSTRATED

N1

2

R2

1 3

1

S

D

3

3 1

N2

3

R3

Main SPF Backup SPF 29

Copyright © 2009 Juniper Networks, Inc.

www.juniper.net

MANAGEMENT PROCESS CONTINUOUS SYSTEMS AVAILABILITY Minimize impact of human factors with failsafe mechanisms ƒ Prevent configuration errors and ensure compliance

to configuration policies with candidate configurations, commit verifications, commit scripts

Speed response and resolution to unplanned events

Planned Maintenance Hardware and software upgrades

ƒ Avert downtime with transparent failover, network

Unplanned Events Network failures, h d hardware events and software defects

recovery features ƒ Provide proactive response and accelerate resolution with extensive instrumentation, event policies, op scripts

Reduce time of planned events ƒ Stable releases reduce the frequency of fixes and

Human Factors Changes that negatively impact network performance

30

duration of upgrades ƒ In-service upgrades available in high-end routing platforms Copyright © 2009 Juniper Networks, Inc.

www.juniper.net

SCRIPT AUTOMATION ƒHelps to Reduce Human Error ƒCommit Scripts p - Parse Configurations g upon p commit ƒ Generate Warnings ƒ Reject the commit ƒ Modify the configuration ƒOp Scripts - Used to ensure compliance with company policy ƒ Generate notifications ƒ Automatic Diagnosis ƒ Correction of Problems ƒEvent E t Policies P li i - Monitors M it an eventt trigger ti ƒ Works with Op Scripts ƒ Wait for Notification messages 31

Copyright © 2009 Juniper Networks, Inc.

www.juniper.net

SCALABILITY/SECURITY Scalability ƒ Hardware : performance/next-hops/firewall performance/next hops/firewall filter… filter ƒ Control plane : routes/peers/routing instances/logging….

Security ƒ DDOS Attacks

32

Copyright © 2009 Juniper Networks, Inc.

www.juniper.net

SUMMARY

High Availability:

33

™

Is a culture

™

Has many layers

™

Is business critical

Copyright © 2009 Juniper Networks, Inc.

www.juniper.net