Bring your Network Closer to Five Nines with Graceful Shutdown
by Ivan Pepelnjak
The five nines (99.999% availability of a service) is the holy grail of many Chief Information Officers (CIOs), sometimes having a direct impact on their bonus structure. In real life, the five nines translate to roughly five minutes of downtime per year, which is extremely hard to reach even in a fully redundant architecture. It’s quite a challenge to fine-tune your network to reroute around link and node failures in seconds if you want to keep the overall downtime to a minimum. Scheduled router outages (upgrades, hardware maintenance), while necessary, are an additional burden that further reduces your safety margin. In this article, you’ll see how you can use the stub router advertisement functionality built into the OSPF (Open Shortest Path First) routing protocol to implement graceful shutdown and reduce the network downtime caused by scheduled router outages.
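The downtime budget behind an availability target is simple arithmetic. The following Python sketch (the function name is illustrative, and a 365-day year is assumed) shows how little room the five nines leave:

```python
# Yearly downtime budget for a given availability target.
# Assumption: a 365-day year (leap years ignored).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability: float) -> float:
    """Return the downtime (in minutes per year) allowed by an availability ratio."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    targets = [("three nines", 0.999), ("four nines", 0.9999), ("five nines", 0.99999)]
    for label, availability in targets:
        minutes = allowed_downtime_minutes(availability)
        print(f"{label}: {minutes:.2f} minutes of downtime per year")
```

A single router reload that causes even a minute of packet loss therefore consumes a sizable part of the yearly budget.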
Introduction to Graceful Shutdown
The traditional approach to detecting a router being shut down is hostile to high-availability goals:
Some routing protocols (most notably, OSPF and IS-IS) have no mechanisms to inform the neighbors that a router is going to be shut down.
The only means of detecting a reloading neighbor device is through the absence of hello packets, which can take seconds (unless you use a dedicated fast failure-detection protocol).
Until a neighbor loss is detected, packets are forwarded to it, resulting in packet loss.
The network re-convergence can start only after the node failure has been detected, further increasing the downtime.
The graceful shutdown functionality addresses all these issues by announcing that a router is about to disappear from the network:
The router about to be shut down starts advertising its links with a maximum cost.
The change in link cost starts a network convergence process. If there are alternate paths, the traffic will be rerouted around the device that is about to be shut down.
Throughout the convergence period, the routing and forwarding tables on the router about to be removed from the network are intact, allowing it to properly forward any traffic still sent to it.
After the network convergence is complete, you can safely remove the router from the network.
Stub Router Advertisement
OSPF implements graceful shutdown with the Stub Router Advertisement feature (described in RFC 3137 and implemented in IOS release 12.2(4)T). This feature is configured with the max-metric router-lsa router configuration command, which causes the router to modify its router (type-1) Link State Advertisement (LSA), changing the costs to all adjacent routers and networks (type-2 LSA) to 65535 (the maximum per-hop cost allowed by OSPF).
The operation of this feature is best illustrated with an example. Let’s assume we have a highly redundant network shown in Figure 1.
Figure 1
Sample multi-area OSPF network with external connectivity

Note
The network in Figure 1 is highly redundant, but does not offer any internal load sharing capabilities due to different line speeds on primary and backup paths.
As expected, the LAN subnet between X1 and X2 (192.168.0.0/24) can be reached via two paths from the POP router (Listing 1).
Listing 1
Two equal-cost routes to external LAN from the POP router
POP#show ip route | section 192.168.0.0
O IA 192.168.0.0/24 [110/138] via 10.7.1.5, 00:05:12, Serial1/1
[110/138] via 10.7.1.1, 00:04:29, Serial1/0
Configuring max-metric router-lsa on C1 as the first step in the router shutdown process changes the inbound cost of all OSPF links on C1 to 65535 (similar, but not equivalent, to changing the OSPF interface cost), resulting in the network topology displayed in Figure 2.
Figure 2
OSPF topology with max-metric router-lsa configured on C1

After max-metric router-lsa has been configured, C1 starts advertising high link costs in its router LSA; Listing 2 displays its router LSA as seen from the POP router. C1 also recalculates the OSPF topology in all areas (C1 is an area border router) and advertises summary LSAs (inter-area routes) with increased costs (Listing 3). After receiving the changed LSAs from C1, the POP router recalculates the OSPF topology and starts using the path going through C2 as the only path to the 192.168.0.0/24 subnet (Listing 4).
Listing 2
Router LSA advertised by C1 as seen on the POP router
POP#show ip ospf database router 10.0.1.1
… the printout has been shortened …
Routing Bit Set on this LSA
Advertising Router: 10.0.1.1
Link connected to: another Router (point-to-point)
(Link ID) Neighboring Router ID: 10.1.0.3
(Link Data) Router Interface address: 10.7.1.1
TOS 0 Metrics: 65535
Link connected to: a Stub Network
(Link ID) Network/subnet number: 10.7.1.0
(Link Data) Network Mask: 255.255.255.252
TOS 0 Metrics: 64
Listing 3
Inter-area route to 192.168.0.0/24 advertised by C1 and C2
POP#show ip ospf database summary 192.168.0.0
… the printout has been shortened …
Summary Net Link States (Area 1)
Link State ID: 192.168.0.0 (summary Network Number)
Advertising Router: 10.0.1.1
Network Mask: /24
TOS: 0 Metric: 65545
Link State ID: 192.168.0.0 (summary Network Number)
Advertising Router: 10.0.1.2
Network Mask: /24
TOS: 0 Metric: 74
Listing 4
The POP router uses a single route toward the external LAN subnet
POP#show ip route | section 192.168.0.0
O IA 192.168.0.0/24 [110/138] via 10.7.1.5, 00:07:56, Serial1/1
Last but definitely not least, it’s important to mention that the increased OSPF cost reported by an OSPF stub router triggers OSPF recalculations that will select alternate OSPF paths if they exist. If there is no alternate OSPF path, the path through the stub router (configured with max-metric router-lsa) will still be used, as it’s a valid OSPF path. The stub router advertisement will thus not trigger a failover to a backup route learnt through another routing protocol.
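The route selection on the POP router can be reproduced with a few additions. The Python sketch below is a simplified model, not the OSPF algorithm itself: it adds the cost to each area border router to the summary metric that router advertises. The summary metrics 74 and 65545 come from Listing 3; the cost of 64 from POP to each ABR and the normal summary metric of C1 are inferred from the equal-cost total of 138 in Listing 1, and the function name is illustrative.

```python
# Simplified inter-area route selection as seen from the POP router.
# Assumption: POP reaches C1 and C2 over links of cost 64 (138 - 74).
COST_POP_TO_ABR = {"C1": 64, "C2": 64}

SUMMARY_NORMAL = {"C1": 74, "C2": 74}        # before max-metric router-lsa (C1 inferred)
SUMMARY_C1_STUB = {"C1": 65545, "C2": 74}    # metrics shown in Listing 3


def best_inter_area_paths(cost_to_abr, summary_metric):
    """Return (best total cost, sorted list of ABRs offering that cost)."""
    totals = {
        abr: cost_to_abr[abr] + metric
        for abr, metric in summary_metric.items()
        if abr in cost_to_abr  # ignore ABRs that POP cannot reach
    }
    best = min(totals.values())
    return best, sorted(abr for abr, total in totals.items() if total == best)


if __name__ == "__main__":
    # Two equal-cost paths before the shutdown procedure starts (Listing 1).
    print(best_inter_area_paths(COST_POP_TO_ABR, SUMMARY_NORMAL))
    # C1 is a stub router: only the path through C2 remains (Listing 4).
    print(best_inter_area_paths(COST_POP_TO_ABR, SUMMARY_C1_STUB))
    # No alternate path: the expensive path through C1 is still a valid route.
    print(best_inter_area_paths({"C1": 64}, SUMMARY_C1_STUB))
```

The last call illustrates the caveat above: with C2 out of the picture, the model still returns the path through C1, only with a very high cost.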
Automating Graceful Shutdown
If you want a consistent implementation of the reliable graceful shutdown in your network, you have to address the following caveats:
The network operators shall not be allowed to execute the reload command.
The router configuration modified to enable the graceful shutdown (with the max-metric router-lsa command) should not be saved to NVRAM, otherwise the router will still not be used for packet forwarding after the reload.
The Embedded Event Manager (EEM) applets can help you fix both issues. You can disable the reload command with TACACS+ command authorization if you use the Authentication, Authorization and Accounting (AAA) framework and a TACACS+ server, or you could define an EEM applet that triggers on the reload command and disables it (Listing 5).
Listing 5
EEM applet that disables the reload command
event manager applet noReload
event cli pattern "^reload" sync no skip yes occurs 1 period 1
action 1.0 syslog priority informational msg "Reload is disabled"
Note
It’s very important that the cli pattern defined in the noReload applet contains the beginning-of-line symbol (the caret); otherwise every command containing the substring reload anywhere in it would be disabled.
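The effect of the caret can be illustrated with ordinary regular expressions. The sketch below uses Python's re module as a stand-in for EEM pattern matching (an assumption that holds for a pattern this simple, but EEM's regular expression engine is not identical to Python's); the helper name and the sample commands are illustrative.

```python
import re


def applet_would_trigger(pattern: str, command: str) -> bool:
    """Approximate an EEM 'event cli pattern' match with an unanchored regex search."""
    return re.search(pattern, command) is not None


if __name__ == "__main__":
    commands = ["reload", "reload in 10", "show reload"]
    for pattern in ("^reload", "reload"):
        for command in commands:
            verdict = "blocked" if applet_would_trigger(pattern, command) else "allowed"
            print(f"pattern {pattern!r:10} command {command!r:15} -> {verdict}")
```

With the anchored pattern only commands that start with reload are intercepted; without the caret, a harmless command such as show reload would be skipped as well.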
The graceful shutdown process can be automated with another EEM applet that configures the OSPF stub router advertisement and triggers a router reload after a specified time period (Listing 6).
Listing 6
Shutdown process
event manager applet Shutdown
event none
action 1.0 cli command "enable"
action 1.1 cli command "configure terminal"
action 2.0 cli command "router ospf 1"
action 2.1 cli command "max-metric router-lsa"
action 3.0 cli command "event manager applet doReload"
action 3.1 cli command "event timer countdown time 20"
action 3.2 cli command "action 1.0 syslog msg shutting-down"
action 3.3 cli command "action 1.1 reload"
action 4.0 syslog msg "shutdown has been triggered"
Note
You can create the doReload EEM applet that triggers the actual reload from within the Shutdown applet (as in Listing 6), or you can configure it in advance with the event none option and have the Shutdown applet change only its triggering event (the event timer countdown line, action 3.1 in Listing 6).
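If you deploy the Shutdown applet on many routers, the OSPF process number and the delay before the reload will probably differ between devices. The following Python sketch renders the applet from Listing 6 with those two values as parameters. The function name and its parameters are hypothetical; the generated lines are the ones shown in Listing 6 and nothing more.

```python
def build_shutdown_applet(ospf_process_id: int = 1, delay_seconds: int = 20) -> str:
    """Render the Shutdown EEM applet from Listing 6 as configuration text."""
    if ospf_process_id <= 0:
        raise ValueError("OSPF process ID must be a positive integer")
    if delay_seconds <= 0:
        raise ValueError("reload delay must be a positive number of seconds")
    lines = [
        "event manager applet Shutdown",
        "event none",
        'action 1.0 cli command "enable"',
        'action 1.1 cli command "configure terminal"',
        # Step 1: start advertising maximum link costs.
        f'action 2.0 cli command "router ospf {ospf_process_id}"',
        'action 2.1 cli command "max-metric router-lsa"',
        # Step 2: create the doReload applet that fires after the delay.
        'action 3.0 cli command "event manager applet doReload"',
        f'action 3.1 cli command "event timer countdown time {delay_seconds}"',
        'action 3.2 cli command "action 1.0 syslog msg shutting-down"',
        'action 3.3 cli command "action 1.1 reload"',
        'action 4.0 syslog msg "shutdown has been triggered"',
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Defaults reproduce Listing 6; other values are per-router choices.
    print(build_shutdown_applet(ospf_process_id=1, delay_seconds=20))
```

The delay should be longer than the time your network needs to converge after the link costs change; the 20 seconds used in Listing 6 is only an example value.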
After defining the shutdown alias (with the alias exec shutdown event manager run Shutdown configuration command), you can start reloading your router with the shutdown command. A sample printout generated with that command is included in Listing 7.
Listing 7
Triggering graceful shutdown
X2#shutdown
X2#
%HA_EM-6-LOG: Shutdown: shutdown has been triggered
%SYS-5-CONFIG_I: Configured from console by vty0
%HA_EM-6-FMS_RELOAD_SYSTEM: fh_io_msg: Policy has requested a system
reload; -Process= "EEM Server", ipl= 0, pid= 230
%SYS-5-RELOAD: Reload requested by EEM. Reload Reason: Embedded Event Manager action.
Graceful Shutdown on Edge Routers
The OSPF stub router advertisement works best if the router about to be removed from the network serves as a transit OSPF router (a router in the middle of an OSPF network). If you want to achieve the same effects on an edge (a router with stub networks) or AS boundary router (redistributing external routing information into OSPF), you have to take additional precautions.
To illustrate the problems that can arise, let’s assume we want to shut down the X1 router in the sample network (Figure 3).
Figure 3
Router X1 should be removed from the network

If OSPF is run on the LAN connecting X1 and X2, the LAN subnet is modeled as a network (type-2) LSA, resulting in the OSPF topology in area 0 shown in Figure 4.
Figure 4
Area 0 topology with external LAN modeled as a transit OSPF network

Before the shutdown procedure is started, the traffic from C1 to EXT flows through X1 (Listing 8).
Listing 8
Traffic from C1 to EXT flows over Serial 1/1 (through X1)
C1#show ip route | section 192.168.0.0
O 192.168.0.0/24 [110/74] via 10.7.5.2, 00:07:21, Serial1/1
When max-metric router-lsa is configured on X1, the cost of the link between X1 and the attached LAN is increased as displayed in Figure 5 (the modified router LSA is displayed in Listing 9).
Figure 5
Area 0 topology with max-metric router-lsa configured on X1

Listing 9
Modified router LSA advertised by X1
C1#show ip ospf database router 10.0.5.1
… the printout has been shortened …
LS Type: Router Links
Link State ID: 10.0.5.1
AS Boundary Router
Number of Links: 6
… stub links have been removed …
Link connected to: a Transit Network
(Link ID) Designated Router address: 192.168.0.2
(Link Data) Router Interface address: 192.168.0.1
Number of TOS metrics: 0
TOS 0 Metrics: 65535
Link connected to: another Router (point-to-point)
(Link ID) Neighboring Router ID: 10.0.1.2
(Link Data) Router Interface address: 10.7.5.14
Number of TOS metrics: 0
TOS 0 Metrics: 65535
Link connected to: another Router (point-to-point)
(Link ID) Neighboring Router ID: 10.0.1.1
(Link Data) Router Interface address: 10.7.5.2
Number of TOS metrics: 0
TOS 0 Metrics: 65535
The increased link cost between X1 and the network LSA representing the IP subnet 192.168.0.0/24 results in the desired topology where the traffic from C1 toward the EXT router flows through C2 and X2 (see Listing 10).
Listing 10
Traffic from C1 to EXT flows over the LAN interface (through C2 and X2)
C1#show ip route | section 192.168.0.0
O 192.168.0.0/24 [110/84] via 10.0.2.2, 00:00:05, FastEthernet0/0
Running OSPF on a LAN with third-party devices attached to it is an obvious security risk. In a high quality production network, OSPF would very likely be disabled on the LAN between X1 and X2 with the passive-interface router configuration command (the OSPF configuration on X1 is included in Listing 11), resulting in the OSPF topology displayed in Figure 6.
Note
You might want to use the same setup in small remote offices that should never become a transit site in the network.
Figure 6
Area 0 topology with external LAN represented as stub networks

Listing 11
OSPF configuration on X1
X1#show running | section router ospf
router ospf 1
log-adjacency-changes
passive-interface FastEthernet0/0
network 0.0.0.0 255.255.255.255 area 0
When the max-metric router-lsa router configuration command is entered on X1 in this scenario, the traffic from C1 to EXT still flows through X1 (Listing 12) as the command only increases the link costs, not the costs of the stub networks. The resulting OSPF topology is displayed in Figure 7.
Listing 12
The route to the external LAN on C1 still points to X1
C1#show ip route | section 192.168.0.0
O 192.168.0.0/24 [110/74] via 10.7.5.2, 00:02:34, Serial1/1
Figure 7
The OSPF cost from C1 to the stub subnet advertised by X1 is not increased

As the OSPF cost from C1 to X1 is still 64 (only the outbound cost is considered in the SPF graph) and X1 advertises the LAN prefix with a cost of 10 (default cost of an Ethernet interface), the OSPF cost from C1 to the IP prefix 192.168.0.0/24 advertised by X1 is still 74, which is better than the path going through X2.
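The arithmetic can be checked with a small shortest-path model. The Python sketch below runs Dijkstra's algorithm over directed link costs and then adds the stub cost advertised by each edge router. The topology is a simplified assumption reconstructed from the costs in the listings (C1 to C2 cost 10, C1 to X1 and C2 to X2 cost 64, stub cost 10); the remaining links of the sample network are omitted, and all names are illustrative. It also models the effect of increasing the stub costs, which is what the include-stub option of the max-metric router-lsa command does.

```python
import heapq

MAX_LINK_COST = 65535
PREFIX = "192.168.0.0/24"

# links[a][b] is the cost router a advertises toward router b (outbound cost).
LINKS = {
    "C1": {"C2": 10, "X1": 64},
    "C2": {"C1": 10, "X2": 64},
    "X1": {"C1": 64},
    "X2": {"C2": 64},
}
# stubs[r][prefix] is the cost of a stub network in router r's LSA.
STUBS = {"X1": {PREFIX: 10}, "X2": {PREFIX: 10}}


def shortest_costs(links, source):
    """Dijkstra's algorithm: lowest cost from source to every reachable router."""
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > dist.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, link_cost in links.get(node, {}).items():
            new_cost = cost + link_cost
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return dist


def best_stub_route(links, stubs, source, prefix):
    """Return (total cost, advertising router) of the best path to a stub prefix."""
    dist = shortest_costs(links, source)
    return min(
        (dist[router] + nets[prefix], router)
        for router, nets in stubs.items()
        if prefix in nets and router in dist
    )


def apply_max_metric(links, stubs, router, include_stub=False):
    """Return a copy of the topology with maximum costs advertised by one router."""
    new_links = {r: dict(n) for r, n in links.items()}
    new_stubs = {r: dict(n) for r, n in stubs.items()}
    for neighbor in new_links.get(router, {}):
        new_links[router][neighbor] = MAX_LINK_COST
    if include_stub:
        for prefix in new_stubs.get(router, {}):
            new_stubs[router][prefix] = MAX_LINK_COST
    return new_links, new_stubs


if __name__ == "__main__":
    print(best_stub_route(LINKS, STUBS, "C1", PREFIX))
    print(best_stub_route(*apply_max_metric(LINKS, STUBS, "X1"), "C1", PREFIX))
    print(best_stub_route(*apply_max_metric(LINKS, STUBS, "X1", include_stub=True), "C1", PREFIX))
```

In this model the best route from C1 stays at cost 74 through X1 when only the link costs of X1 are raised, and moves to cost 84 through X2 once the stub cost is raised as well.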
To solve this challenge, you have to use additional options of the max-metric router-lsa command that increase the costs of other OSPF topology elements, not just the link costs. The options are summarized in Table 1.
Table 1
Configuration options of the max-metric router-lsa command
| Option | Description |
|---|---|
| include-stub | Increases the OSPF cost of the stub networks included in the router LSA. Use in scenarios where multiple OSPF routers advertise the same stub network. |
| summary-lsa | Increases the OSPF cost of the inter-area summary routes (type-3 LSAs). Use in scenarios where the stub subnets are part of the summary route (dictating its cost), but you don’t want to use the include-stub option. |
| external-lsa [metric] | Increases the metric on external routes advertised by the router. |
In our scenario, we have to increase the cost of the stub networks with the max-metric router-lsa include-stub configuration command. This command increases all the costs advertised in the router (type-1) LSA. The router LSA advertised by X1 after this command has been entered is displayed in Listing 13 and the resulting routing table on C1 is shown in Listing 14 (note that the traffic from C1 toward EXT now flows over C2 and X2).
Listing 13
The metrics of stub networks advertised by X1 are set to 65535
C1#show ip ospf database router 10.0.5.1
… the printout has been shortened …
Link State ID: 10.0.5.1
AS Boundary Router
Number of Links: 6
Link connected to: a Stub Network
(Link ID) Network/subnet number: 192.168.0.0
(Link Data) Network Mask: 255.255.255.0
Number of TOS metrics: 0
TOS 0 Metrics: 65535
… rest of the printout deleted …
Listing 14
Traffic from C1 to EXT flows over C2 and X2
C1#show ip route | section 192.168.0.0
O 192.168.0.0/24 [110/84] via 10.0.2.2, 00:06:15, FastEthernet0/0
The increase of the stub network costs has almost no side effects on the OSPF network, unless some of the routers in it work according to the ancient OSPF specification that was made obsolete in 1994 (and whose successor has itself been made obsolete by at least two more RFCs). If you happen to have such equipment in your network, it would probably ignore stub subnets with maximum cost, making the router’s loopback interfaces unreachable (and potentially making the router itself unreachable from anywhere but its directly connected neighbors).
External Routes
The result of configuring stub router advertisement on an AS boundary router (a router inserting external routes into an OSPF network) is very similar to the situation described in the previous section. The max-metric router-lsa configuration command increases only the costs of the router’s inbound links, which has no effect on the OSPF topology for subnets originated by the router.
For example, if both X1 and X2 originate an external conditional default route (configuration of X1 is included in Listing 15), the default route on C1 will still point to X1 (Listing 16) even after the max-metric router-lsa command has been entered, as the cost of the default route advertised by X1 has not changed (the external route LSAs are displayed in Listing 17).
Listing 15
OSPF default route configuration on X1
router ospf 1
log-adjacency-changes
passive-interface FastEthernet0/0
network 0.0.0.0 255.255.255.255 area 0
default-information originate always metric 20 route-map DefaultRoute
!
route-map DefaultRoute permit 10
match ip address prefix-list DefaultSubnet
!
ip prefix-list DefaultSubnet seq 5 permit 172.16.0.0/24
Listing 16
Default route on C1 points to X1
C1#show ip route | section 0.0.0.0/0
O*E2 0.0.0.0/0 [110/20] via 10.7.5.2, 01:09:32, Serial1/1
Listing 17
External subnet LSAs in OSPF area 0
C1#show ip ospf database external
… the printout has been shortened …
LS Type: AS External Link
Link State ID: 0.0.0.0 (External Network Number)
Advertising Router: 10.0.5.1 (X1)
Network Mask: /0
Metric Type: 2 (Larger than any link state path)
Metric: 20
Forward Address: 0.0.0.0
External Route Tag: 1
LS Type: AS External Link
Link State ID: 0.0.0.0 (External Network Number)
Advertising Router: 10.0.5.2 (X2)
Network Mask: /0
Metric Type: 2 (Larger than any link state path)
Metric: 20
Forward Address: 0.0.0.0
External Route Tag: 1
The max-metric router-lsa external-lsa [ metric ] command solves this issue as well; when entered, the metric on external routes inserted into the OSPF network is increased (the resulting type-5 LSA from X1 is displayed in Listing 18), resulting in the network topology where all the traffic is rerouted around the device about to be removed if there is an alternate OSPF path (or, in case of the external routes, if another router is inserting the same external IP prefix into the OSPF topology).
Listing 18
Default route advertised by X1
C1#show ip ospf database external adv-router 10.0.5.1
… the printout has been shortened …
Link State ID: 0.0.0.0 (External Network Number)
Advertising Router: 10.0.5.1
Network Mask: /0
Metric Type: 2 (Larger than any link state path)
Metric: 16711680
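The value 16711680 in Listing 18 is 0xFF0000, and it explains why C1 switches to X2. For type-2 external routes, OSPF compares the external metric first and uses the internal cost to the AS boundary router only as a tie-breaker. The Python sketch below models just that comparison; the costs from C1 to the two ASBRs (64 and 74) are assumptions derived from the earlier listings, and the function name is illustrative.

```python
MAX_EXTERNAL_METRIC = 0xFF0000  # 16711680, the metric shown in Listing 18


def best_e2_route(candidates):
    """Pick the preferred type-2 external route.

    candidates: list of (asbr, external_metric, cost_to_asbr) tuples.
    The external metric is compared first; the internal cost to the
    ASBR is used only to break ties.
    """
    return min(candidates, key=lambda c: (c[1], c[2]))[0]


if __name__ == "__main__":
    # Both ASBRs advertise the default route with metric 20 (Listing 17):
    # the tie is broken by the lower internal cost to X1.
    before = [("X1", 20, 64), ("X2", 20, 74)]
    print(best_e2_route(before))

    # X1 advertises the maximum external metric: X2 wins regardless of
    # the internal cost.
    after = [("X1", MAX_EXTERNAL_METRIC, 64), ("X2", 20, 74)]
    print(best_e2_route(after))
```

This also shows why max-metric router-lsa alone cannot move the default route: it changes neither the external metric nor the cost from C1 to X1.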
Unlike the max-metric router-lsa include-stub configuration command, which has almost no impact on the network, the max-metric router-lsa external-lsa command causes flooding of all external LSAs originated by the router. In designs where an ASBR advertises a large number of external routes into an OSPF network, the resulting OSPF traffic could cause significant utilization of resources throughout the network. The max-metric router-lsa external-lsa command should thus be used only in scenarios where multiple ASBRs advertise the same external prefixes into the OSPF domain.
Summary
If you’re aiming at achieving very high service availability, you should address all the weak spots that could increase the network downtime, one of them being abrupt router shutdown that disrupts routing protocols and causes packet loss during the network convergence phase.
The OSPF stub router advertisement described in RFC 3137 and implemented in IOS release 12.2(4)T helps you improve the service availability by rerouting the traffic around the router that you plan to shut down. The default action, configured with the max-metric router-lsa router configuration command, increases the inbound link costs advertised by the router in its LSA, resulting in increased path costs whenever the router appears as a transit router in the OSPF topology graph. The minimum-cost SPF tree computed by other routers should thus exclude the stub router if there is an alternate OSPF path to the destination. However, the OSPF stub router advertisement will not remove an OSPF path from the IP routing table; therefore you cannot use this feature to install backup routes learnt from another routing protocol or through floating static route configuration.
If your network design includes stub networks with multiple OSPF routers attached to them (for example, external IP subnets announced into OSPF as stub networks), you should use the include-stub option of the max-metric router-lsa configuration command to ensure the cost increase (and resulting topology change) of the stub networks.
Following the increase in link and (optionally) stub network costs, all routers in all the affected OSPF areas (all areas to which the router undergoing shutdown is attached) recalculate the OSPF topology. The increased link costs could lead to increased inter-area route costs (if the router on which you’ve configured max-metric router-lsa is an area border router), resulting in flooding of all inter-area LSAs.
If multiple AS boundary routers insert the same external IP prefix into the OSPF domain, you should use the max-metric router-lsa external-lsa configuration command on them. On the other hand, if an AS boundary router redistributes a large number of external IP routes into the OSPF topology, you should try to avoid using the external-lsa option, as it causes changes and flooding of all external LSAs advertised by the affected router.