BRKRST 3363
Agenda
• Thinking About Fast Convergence
• Routed Convergence
• Failure Detection
• OSPF/ISIS
• EIGRP
• BGP
• Additional Routed Convergence
• NSF/NSR
Fast Convergence Mindset
• How Fast?
• 200ms (or less)
• 50ms – SONET APS
• Do I Need It?
• Complexity vs. Return
• Business Drivers
• Risks
Convergence = Failure Detection + Event Propagation + Routing Process + FIB Update
Measuring Fast Convergence
• Failure Detection
• What happened?
• Event Propagation
• Spread the word
• Routing Process
• Now where do we go?
[Figure: routers A, B, C and D. The link between B and D fails; the routers announce "My Link to D is down!" and "My Link to B is down!", and the outcome labels read "Reach D via C" and "No change".]
Network Convergence Overview
• Network convergence is the time needed for traffic to be rerouted to the
alternative or more optimal path after the network event
• Network convergence requires all affected routers to process the event and
update the appropriate data structures used for forwarding
The Culture of Availability
What’s Your Availability Level?
• Proactive ~99.9%
• Good change management processes including what-if analysis and change validation
• Low number of Single Points of Failure
• Fault and configuration management tools
• Improved consistency (HW, SW, Config, design)
• Typically no quality improvement process
• Predictive ~99.99+%
Fast Convergence Mindset
• The mistake: we should not have been thinking about routing protocol
convergence, but of network convergence
Failure Detection
Failure Detection
Detecting Link Failure
Hardware Dependent
• Polling vs Interrupt
• 6748-GE-TX: 20ms/port * 48 ports = 960ms (polled)
• Nexus 7k, ASR9k, 6708-10GE/ES/ES+: <10ms (interrupt)
Failure Detection
Detecting Link Failure
• Link Failure -> Interface Down, Easy?
• Debounce Timer
• Throttles down notification
• Switches only
[Figure: Hardware / Software notification path with the labels Port, PHY, ASIC, Firmware, CPU, IOS]
• Tunnels (GRE, IPsec, etc.)
• L2 bridged network
• Normal Hellos…but fast!
• ~1 second detection
• Process Driven
• 1 Hello/Protocol
• PIM, LDP, BGP, OSPF
• Handled by Central CPU
• Even Faster (BFD)
• 50ms x 3 = 150ms detection
• Interrupt Driven (like CEF)
• 1 Hello to Rule Them All
• Hardware Offload Possible
• Nexus 7k, ASR 1k/9k, me3600-CX, 7600 ES+
BFD Configuration
IOS-XE(config)# interface…
IOS-XE(config-if)# bfd interval 50 min_rx 50 multiplier 3
IOS-XE(config)# router ospf 1
IOS-XE(config-router)# bfd all-interfaces
RP/0/RSP0/CPU0:XR# configure
RP/0/RSP0/CPU0:XR(config)# router ospf 0
RP/0/RSP0/CPU0:XR(config-ospf)# area 0
RP/0/RSP0/CPU0:XR(config-ospf-ar)# interface …
RP/0/RSP0/CPU0:XR(config-ospf-ar-if)# bfd fast-detect
NX-OS(config)# interface…
NX-OS(config-if)# bfd interval 50 min_rx 50 multiplier 3
NX-OS(config)# router ospf 1
NX-OS(config-router)# bfd
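The arithmetic behind the 50 ms interval and multiplier of 3 can be sketched in a few lines of Python. This is a minimal sketch, assuming BFD asynchronous mode as described in RFC 5880, where the local detection time is the remote multiplier times the larger of the local min_rx and the remote transmit interval. The function and parameter names are illustrative and do not come from any Cisco tool.

```python
def bfd_detection_time_ms(local_min_rx_ms: int,
                          remote_tx_ms: int,
                          remote_multiplier: int) -> int:
    """Time after which the local side declares the BFD session down.

    Assumption: asynchronous mode, no echo function. The negotiated
    receive interval is the slower of what we accept (min_rx) and what
    the peer wants to send (tx).
    """
    if min(local_min_rx_ms, remote_tx_ms, remote_multiplier) <= 0:
        raise ValueError("intervals and multiplier must be positive")
    negotiated_interval_ms = max(local_min_rx_ms, remote_tx_ms)
    return remote_multiplier * negotiated_interval_ms


if __name__ == "__main__":
    # Both sides configured with "interval 50 min_rx 50 multiplier 3".
    print(bfd_detection_time_ms(50, 50, 3))
    # A slower peer (hypothetical 300 ms tx) stretches detection time.
    print(bfd_detection_time_ms(50, 300, 3))
```

The second call illustrates why both ends need matching timers: the slower side sets the pace for the session.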
Measuring Fast Convergence
• Event Propagation
• Spread the word
• FIB Update
• Make it so
OSPF
SPF and LSA Generation Throttling
Throttling is the general process of slowing down responses to the frequently
oscillating events such as link flaps.
The general idea is to reduce resource wastage in unstable situations and wait till the
situations calm down…….The general idea is as follows.
When an event occurs, e.g. a link goes down or new LSA arrives, do not respond to
it immediately, e.g. by generating an LSA or running SPF, but wait some time, hoping
to accumulate more similar events, e.g. waiting for the link to go back up, or more
LSAs arriving.
This could potentially save a lot of resources, by reducing the number of SPF runs or
amount of LSAs flooded.
The question is – how long should we hold or throttle the responses?
Petr Lapukhov
http://blog.ine.com/2009/12/31/tuning-ospf-performance/
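The hold-and-wait idea in the quote above is commonly implemented as an exponential backoff with three values: an initial delay, a hold time that doubles while events keep arriving, and a maximum wait. The following Python sketch models only that doubling. It is a simplified model under stated assumptions: it ignores the quiet period after which real implementations reset to the initial delay, and the 50/200/5000 ms values are illustrative, not recommendations from the source.

```python
def throttle_waits(start_ms: int, hold_ms: int, max_ms: int, runs: int) -> list[int]:
    """Return the wait before each of `runs` consecutive SPF runs.

    Assumption: events keep arriving, so the timer never resets.
    The first run waits `start_ms`; later runs wait `hold_ms`, doubled
    after every run and capped at `max_ms`.
    """
    waits = []
    wait = start_ms
    next_hold = hold_ms
    for _ in range(runs):
        waits.append(min(wait, max_ms))
        wait = next_hold
        next_hold = min(next_hold * 2, max_ms)
    return waits


if __name__ == "__main__":
    # Hypothetical aggressive tuning: fast first reaction, slower under churn.
    print(throttle_waits(50, 200, 5000, 8))
    # Start 5000 / hold 10000 / max 10000: slow from the very first event.
    print(throttle_waits(5000, 10000, 10000, 4))
```

A small initial delay gives a fast reaction to a single failure, while the doubling hold time protects the CPU when a link keeps flapping.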
OSPF Convergence Times
• Convergence =
Failure Detection + Event Propagation + SPF + FIB Update
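The sum above can be turned into a simple budget calculation. The Python sketch below is illustrative only: the class name and all millisecond values are assumptions chosen for the example, except that a 5000 ms initial SPF delay is used in the second case to show how a single timer can dominate the total.

```python
from dataclasses import dataclass


@dataclass
class ConvergenceBudget:
    """One term per component of the convergence formula, in milliseconds."""
    failure_detection_ms: float
    event_propagation_ms: float
    spf_ms: float          # SPF scheduling delay plus computation time
    fib_update_ms: float

    def total_ms(self) -> float:
        return (self.failure_detection_ms + self.event_propagation_ms
                + self.spf_ms + self.fib_update_ms)

    def dominant_component(self) -> str:
        parts = {
            "failure_detection": self.failure_detection_ms,
            "event_propagation": self.event_propagation_ms,
            "spf": self.spf_ms,
            "fib_update": self.fib_update_ms,
        }
        return max(parts, key=parts.get)


if __name__ == "__main__":
    # Hypothetical tuned network: BFD at 150 ms, short SPF delay.
    tuned = ConvergenceBudget(150, 20, 80, 100)
    # Same network with a 5000 ms initial SPF delay plus 30 ms of computation.
    untuned = ConvergenceBudget(150, 20, 5030, 100)
    for budget in (tuned, untuned):
        print(budget.total_ms(), budget.dominant_component())
```

Looking at the dominant term first is a cheap way to decide which timer is worth tuning.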
IOS
brisbane#sh ip ospf 100
Routing Process "ospf 100" with ID 30.0.0.13
Initial SPF schedule delay 5000 msecs    <- Initial SPF delay (5 secs)
Minimum hold time between two consecutive SPFs 10000 msecs    <- (10 secs)
Maximum wait time between two consecutive SPFs 10000 msecs    <- (10 secs)
Incremental-SPF disabled
Minimum LSA interval 5 secs    <- Min LSA interval
IOS XE OSPF
OSPF: Defaults
Default Timers
IOS XE
darwin#sh ip ospf 100
Routing Process "ospf 100" with ID 30.0.0.14
.
.
Initial SPF schedule delay 5000 msecs    <- Initial SPF delay (5 secs)
Minimum hold time between two consecutive SPFs 10000 msecs    <- (10 secs)
Maximum wait time between two consecutive SPFs 10000 msecs    <- (10 secs)
Incremental-SPF disabled
Minimum LSA interval 5 secs    <- Min LSA interval
IOS XR OSPF
OSPF: Defaults
Default Timers
IOS XR
RP/0/5/CPU0:rangers#sh ospf
Event Propagation in EIGRP
• The Good
• Immediate event notification
• The Bad
• Query Domain Size
[Figure: routers A, B, C and D; EIGRP Queries spread from router to router]
Improving EIGRP Event Propagation
• Reduce Query Domains
• Summary
• Stub
• Filters
[Figure: router A sends an EIGRP Query toward B, C and D; an EIGRP Reply comes back at the Summary Boundary]
EIGRP Summary Configuration
IOS-XE(config)# interface…
IOS-XE(config-if)# ip summary-address eigrp <AS_NUM> 192.168.0.0 255.255.0.0
!
IOS-XE(config)#router eigrp <AS_NUM>
IOS-XE(config-router)#eigrp stub
IOS-XE(config-router)#distribute-list 7 out <Interface>
IOS-XE(config)# access-list 7 permit 172.16.1.0 0.0.0.255
RP/0/RSP0/CPU0:XR# configure
RP/0/RSP0/CPU0:XR(config)# router eigrp <AS_NUM>
RP/0/RSP0/CPU0:XR(config-eigrp)# address-family ipv4
RP/0/RSP0/CPU0:XR(config-eigrp-af)# stub
RP/0/RSP0/CPU0:XR(config-eigrp-af)# route-policy summary out
RP/0/RSP0/CPU0:XR(config-eigrp-af)# interface <interface>
RP/0/RSP0/CPU0:XR(config-eigrp-af-if)# summary-address 192.168.0.0/16
NX-OS(config)# interface…
NX-OS(config-if)# ip summary-address eigrp <AS_NUM> 192.168.0.0 255.255.0.0
!
NX-OS(config)#router eigrp <AS_NUM>
NX-OS(config-router)#eigrp stub
NX-OS(config-router)#distribute-list 7 out <Interface>
NX-OS(config)# access-list 7 permit 172.16.1.0 0.0.0.255
Improving EIGRP Event Propagation
• Feasible Successors
• Don’t even ask!
• No Query/Reply
• ~0 ms
[Figure: routers B, C and D]
EIGRP
Feasible Successor
• Whether the next best path is considered loop free by EIGRP (a feasible
successor) or not has a large impact on convergence times.
• Don’t just consider the best path from every point in your network, but also
the next best path.
• Determine how best to set up your path metrics to improve convergence
performance.
• Always use the delay metric to engineer your routing, never the bandwidth
metric!
EIGRP Feasible Successors
• EIGRP selects Successor and Feasible Successor
• Successor is the best route
• Feasible Successor is 2nd best route
• Must be mathematically loop-free (meets feasibility condition)
• Feasible Successor acts as a “backup route”
• Kept in topology table (not routing table)
• Up to 6 Feasible Successors
• Built into the protocol, nothing to enable
EIGRP Feasible Successors
[Figure: EIGRP 10 topology around router B with two paths toward 172.16.2.0/24, labelled Delay 10 and Delay 15]
EIGRP Feasible Successors
RouterB#show ip eigrp topology
P 172.16.2.0/24, 1 successors, FD is 285440
via 192.168.200.1 (285440/281600), Ethernet0/1
via 172.16.1.1 (307200/281600), Ethernet0/0
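The feasibility condition can be checked by hand against the output above: a path qualifies as a feasible successor when its reported distance (the second number in parentheses) is strictly lower than the feasible distance (FD). The Python sketch below applies that rule to the two paths shown. The `Path` type and function name are illustrative; only the metric values come from the output.

```python
from typing import NamedTuple


class Path(NamedTuple):
    next_hop: str
    computed_distance: int   # first number in "(285440/281600)"
    reported_distance: int   # second number: the neighbor's own metric


def feasible_successors(feasible_distance: int, paths: list[Path]) -> list[str]:
    """Return next hops that pass the feasibility condition (RD < FD),
    excluding the successor itself (the lowest computed distance)."""
    best = min(p.computed_distance for p in paths)
    return [
        p.next_hop
        for p in paths
        if p.computed_distance > best and p.reported_distance < feasible_distance
    ]


if __name__ == "__main__":
    paths = [
        Path("192.168.200.1", 285440, 281600),  # successor
        Path("172.16.1.1", 307200, 281600),     # candidate backup
    ]
    print(feasible_successors(285440, paths))
```

Here 281600 is lower than 285440, so 172.16.1.1 is a feasible successor and can take over without a query.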
BGP
• BGP and IGP Convergence tuning have a different focus
• IGP Convergence - Rebuild the topology quickly following an event
• BGP Convergence - Transfer large amounts of prefix information very quickly
• Keepalives
• 60/180s default
• Don’t tune (at least not aggressively)
• Interface Tracking
• Notifies BGP if interface/route down
• Enabled by default
• BFD
• neighbor <> fall-over bfd
XE-NX(config)# router bgp 65535
XE-NX(config-router)# neighbor <> fall-over bfd
RP/0/RSP0/CPU0:XR# configure
RP/0/RSP0/CPU0:XR(config)# router bgp 65535
RP/0/RSP0/CPU0:XR(config-bgp)# bfd minimum-interval 50
RP/0/RSP0/CPU0:XR(config-bgp)# bfd multiplier 3
RP/0/RSP0/CPU0:XR(config-bgp)# neighbor <>
RP/0/RSP0/CPU0:XR(config-bgp-nbr)# bfd fast-detect
BGP Event Propagation
• BGP Based On TCP
• MTU
• Bigger packets
• MSS
• Maximum amount of TCP data
• Window Size
• Local TCP buffer
• ACKs reduce window as it fills
[Figure: packet header fields: L3 Source, L3 Destination, TTL, Source Port, Destination Port, Flags, MTU, Window]
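How MTU, MSS and window size bound BGP update transfer can be sketched with basic TCP arithmetic. This is a rough Python sketch, assuming IPv4 with 20-byte IP and TCP headers and no options, and using the standard window/RTT ceiling on throughput. The byte counts and RTT in the usage example are hypothetical.

```python
import math

IPV4_HEADER_BYTES = 20   # assumption: no IP options
TCP_HEADER_BYTES = 20    # assumption: no TCP options


def mss_for_mtu(mtu_bytes: int) -> int:
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu_bytes - IPV4_HEADER_BYTES - TCP_HEADER_BYTES


def segments_needed(payload_bytes: int, mss_bytes: int) -> int:
    """Number of TCP segments required to carry the payload."""
    return math.ceil(payload_bytes / mss_bytes)


def window_limited_throughput(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound in bytes/second: one full window per round trip."""
    return window_bytes / rtt_seconds


if __name__ == "__main__":
    updates = 1_000_000  # hypothetical volume of BGP update data, in bytes
    for mtu in (576, 1500):
        mss = mss_for_mtu(mtu)
        print(mtu, mss, segments_needed(updates, mss))
    print(window_limited_throughput(16384, 0.5))
```

A larger MTU cuts the segment count, and a larger window raises the throughput ceiling on long-RTT sessions.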
RP/0/RSP0/CPU0:XR# configure
RP/0/RSP0/CPU0:XR(config)# router bgp 65535
RP/0/RSP0/CPU0:XR(config-bgp)# address-family ipv4 unicast
RP/0/RSP0/CPU0:XR(config-bgp-af)# nexthop trigger-delay critical <> non-critical <>
BGP Routing Update – PIC Core
• Flat RIB = slow convergence
10.1.1.0/24 192.168.1.1
10.1.2.0/24 192.168.1.1
10.1.3.0/24 192.168.1.1
• Before PIC
• Update per route
• Convergence dependent on
BGP RIB size
BGP Routing Update – PIC Core
• Instead of flat FIB, Hierarchical
10.1.1.0/24 192.168.1.1
10.1.2.0/24 192.168.1.1
10.1.3.0/24 192.168.1.1
• Single change updates multiple entries
• Convergence time independent from prefix count
7600(config)# cef table output-chain build favor convergence-speed
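The difference between a flat and a hierarchical FIB can be illustrated with a toy model in Python. This is a conceptual sketch, not how CEF is implemented: the class, function and interface names (`Gi0/1`, `Gi0/2`) are hypothetical. The three prefixes and the BGP next hop 192.168.1.1 follow the tables above.

```python
def flat_repoint(fib: dict[str, str], old_path: str, new_path: str) -> int:
    """Flat FIB: every prefix stores its resolved path, so a core failure
    means one rewrite per prefix. Returns the number of rewrites."""
    writes = 0
    for prefix in list(fib):
        if fib[prefix] == old_path:
            fib[prefix] = new_path
            writes += 1
    return writes


class HierarchicalFib:
    """Prefixes point at a shared BGP next hop; only that shared entry
    stores how the next hop is reached through the IGP."""

    def __init__(self) -> None:
        self.prefix_to_next_hop: dict[str, str] = {}
        self.next_hop_to_path: dict[str, str] = {}

    def add(self, prefix: str, bgp_next_hop: str, igp_path: str) -> None:
        self.prefix_to_next_hop[prefix] = bgp_next_hop
        self.next_hop_to_path[bgp_next_hop] = igp_path

    def repoint(self, bgp_next_hop: str, new_path: str) -> int:
        self.next_hop_to_path[bgp_next_hop] = new_path
        return 1  # a single rewrite, regardless of prefix count

    def lookup(self, prefix: str) -> str:
        return self.next_hop_to_path[self.prefix_to_next_hop[prefix]]


if __name__ == "__main__":
    prefixes = ["10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24"]
    flat = {p: "Gi0/1" for p in prefixes}
    hier = HierarchicalFib()
    for p in prefixes:
        hier.add(p, "192.168.1.1", "Gi0/1")
    print(flat_repoint(flat, "Gi0/1", "Gi0/2"))   # grows with prefix count
    print(hier.repoint("192.168.1.1", "Gi0/2"))   # stays constant
```

With hundreds of thousands of prefixes the flat count grows linearly, while the hierarchical update stays at one.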
BGP
Minimum Route Advertisement Interval (MRAI)
RFC 4271
Section 9.2.1.1
BGP
Minimum Route Advertisement Interval (MRAI)
• MRAI timers are maintained per peer
• iBGP – 0 seconds by default
• eBGP – 30 seconds by default
• neighbor x.x.x.x advertisement-interval <0-600>
• Pros
• Promotes stability by batching route changes
• Improves update packing in some situations
• Cons
• May drastically slow convergence
• One flapping prefix can slow convergence for other prefixes
BGP
Minimum Route Advertisement Interval (MRAI)
• BGP is not a link state protocol, but instead is path vector based
• May take several “rounds/cycles” of exchanging updates & withdraws for the
network to converge
• MRAI must expire between each round!
• The more fully meshed the network and the more tiers of Autonomous
Systems, the more rounds required for convergence
• Think about
• The many tiers of Autonomous Systems that are in the Internet
• The degree to which peering can be fully meshed
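The cost of "MRAI must expire between each round" can be estimated with a one-line model. The Python sketch below is a deliberate simplification: it assumes the first round is sent immediately and every later round waits one full MRAI, and it ignores processing and propagation time. The round counts in the example are hypothetical.

```python
def mrai_convergence_floor_s(rounds: int, mrai_seconds: int) -> int:
    """Lower bound on convergence time imposed by MRAI alone.

    Assumption: `rounds` update/withdraw exchanges are needed, the first
    goes out at once, and each of the remaining rounds waits one MRAI.
    """
    if rounds < 1:
        raise ValueError("at least one round is required")
    return (rounds - 1) * mrai_seconds


if __name__ == "__main__":
    for rounds in (1, 4, 8):
        ebgp = mrai_convergence_floor_s(rounds, 30)  # eBGP default
        ibgp = mrai_convergence_floor_s(rounds, 0)   # iBGP default
        print(rounds, ebgp, ibgp)
```

Even a handful of rounds at the 30-second eBGP default adds minutes, which is why meshing and AS tiers matter.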
Additional Routed Convergence
Forwarding Table Overview (CEF)
[Figure: routing protocols (OSPF, EIGRP) feed the Routing Table, shown alongside the Adjacency Table]
Software CEF Updates
• Process quantum
• XE only
• Prefix Prioritization
• Install /32s first
[Figure: chart comparing RSP720, RP1, RSP2, Sup2e, RSP440 and RP2]
• OSPF Prefix Suppression (XE Only)
XE(config)#router ospf 1
XE(config-router)#prefix-suppression
• ISIS advertise passive-only (XE/XR)
XE(config)#router isis CLUS
XE(config-router)#advertise passive-only
RP/0/RSP0/CPU0:XR(config)# router isis CLUS
RP/0/RSP0/CPU0:XR(config-isis)# address-family ipv4 unicast
RP/0/RSP0/CPU0:XR(config-isis-af)# advertise passive-only
• Prefix Prioritization
XE(config)#interface g0/0
XE(config-if)#isis tag 7
XE(config)#router isis CLUS
XE(config-router)# ip route priority high tag 7
RP/0/RSP0/CPU0:XR(config)# ipv4 prefix-list critical-priority-prefix
RP/0/RSP0/CPU0:XR(config-ipv4_pfx)# 10 permit 0.0.0.0/0 eq 32
RP/0/RSP0/CPU0:XR(config)# router isis CLUS
RP/0/RSP0/CPU0:XR(config-isis)# address-family ipv4 unicast
RP/0/RSP0/CPU0:XR(config-isis-af)# spf prefix-priority critical critical-priority-prefix
[Figure: Loss Of Connectivity time t (0 to 25) plotted against n, 2n, 3n, 4n, 5n, 6n, comparing "no pic" with "pic"]
• PIC is the ability to restore forwarding without resorting to per prefix operations.
• Loss Of Connectivity does not increase as my network grows (one problem
less).
And Then There Were LFAs
• Loop Free Alternate: a routing protocol calculates the next-best hop should a
link fail (per-link LFA) or a particular prefix become unreachable (per-prefix
LFA)
• EIGRP’s concept of Feasible Successor is per-prefix LFA
• Recent LFA technologies simply apply FS logic to link-state protocol
LFA in Link-State Protocols
• WWLSPD (What Would Link-State Protocols Do?)
• OSPF and ISIS can apply the same concept as EIGRP’s FS
• Calculate the second-best NH in the event of a failure
• …variants can handle link or node failure
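The loop-free test a link-state router applies can be written down directly. The Python sketch below uses the basic inequality from RFC 5286: neighbor N of source S is a loop-free alternate for destination D when dist(N, D) < dist(N, S) + dist(S, D). The topology, node names and link costs are hypothetical, and only link protection is covered, not the node-protecting variants.

```python
import heapq

# Hypothetical symmetric topology: S reaches D via E at cost 2.
TOPOLOGY = {
    "S": {"E": 1, "N": 2, "M": 1},
    "E": {"S": 1, "D": 1},
    "N": {"S": 2, "D": 2},
    "M": {"S": 1, "D": 10},
    "D": {"E": 1, "N": 2, "M": 10},
}


def shortest_distances(graph: dict, source: str) -> dict:
    """Dijkstra: cost of the shortest path from source to every node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist


def loop_free_alternates(graph: dict, s: str, dest: str) -> list[str]:
    """Neighbors of s that are not primary next hops and satisfy
    dist(N, D) < dist(N, S) + dist(S, D)."""
    dist_s = shortest_distances(graph, s)
    alternates = []
    for n, link_cost in graph[s].items():
        dist_n = shortest_distances(graph, n)
        is_primary = link_cost + dist_n[dest] == dist_s[dest]
        if not is_primary and dist_n[dest] < dist_n[s] + dist_s[dest]:
            alternates.append(n)
    return sorted(alternates)


if __name__ == "__main__":
    print(loop_free_alternates(TOPOLOGY, "S", "D"))
```

In this topology N qualifies, while M does not: M's best path to D runs back through S, so using it as a backup would loop.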
• This process is the same as initial database synchronization, but it uses
different packet types.
[Figure: DBD exchange and LSA exchange between A and B]
NSF/GR OSPF
• When A and B have resynchronized their databases, they place each
other in full state, and run SPF.
NSF/GR OSPF
• Use the nsf command under the router ospf configuration mode to
enable graceful restart.
• Show ip ospf can be used to verify graceful restart is operational.
A:
router ospf 100
nsf
....
B:
router ospf 100
nsf
....
router#sh ip ospf
Routing Process "ospf 100" with ID 10.1.1.1
....
Non-Stop Forwarding enabled, last NSF restart 00:02:06 ago (took 44 secs)
router#show ip ospf neighbor detail
Neighbor 3.3.3.3, interface address 170.10.10.3
....
Options is 0x52
LLS Options is 0x1 (LR), last OOB-Resync 00:02:22 ago
NSF/GR BGP
• When the BGP peering session is brought up, the graceful restart
capability is negotiated. If both peers state they are capable of GR, it is …
• … not run the bestpath calculations until B has finished sending updates
(read only mode).
• When B has finished sending updates, it sends an end of RIB marker, which is
an update with an empty withdrawn NLRI TLV.
NSF/GR BGP
• When A receives the end of RIB marker, it runs bestpath, and installs
the best routes in the routing table.
NSF/GR BGP
• Use the bgp graceful-restart command under the router bgp
configuration mode to enable graceful restart.
• Show ip bgp neighbors can be used to verify graceful restart is
operational.
A:
router bgp 65000
bgp graceful-restart
....
B:
router bgp 65501
bgp graceful-restart
....
• Unlike GR, NSR is a self-contained solution to maintain the routing topology across HA events
• TCP connections and the routing protocol sessions are migrated from the active RP to the standby RP
without the peers knowing about the switchover
• Neighbors/protocol peers and the rest of the network do not notice that an OSPF/LDP/BGP process went
through a restart
• Minimal LSA/Route information re-flooded during NSR recovery
• Overall CPU usage greatly reduced during NSR recovery
• Improves reliability of the overall system
Agenda
• Thinking About Fast Convergence
• Reactive Convergence
• Failure Detection
• Event Propagation
• Routing Update
• BGP Convergence
• Forwarding Table Update
• Proactive Convergence
• Closing Remarks
BGP PIC Edge
[Figure: topology with CE1, PE1, PE2, PE3, PE4 and CE2]