
When OSPF Becomes a Distance Vector Protocol

by Ivan Pepelnjak

We were always told that Open Shortest Path First (OSPF) is a fast-converging link-state routing protocol that always results in a loop-free and blackhole-free network topology. In reality, it’s a link-state protocol within an area and almost a distance-vector protocol between areas. In this article, I’ll illustrate how this unexpected behavior can affect the convergence of your network and how you can use proprietary extensions of Cisco IOS to alleviate the undesired side effects of OSPF.

The Scenario

Let’s start with the very simple network topology displayed in Figure 1: four routers in a meshed OSPF area (area 1), with two of them (A1 and A2) also participating in area 0 and thus becoming Area Border Routers (ABRs).

Figure 1

Sample network topology

This topology is very common in hierarchical networks (A1 and A2 would be the core or aggregation routers and S1 and S2 would be two sites with a backdoor link between them).

Note

The backdoor link between S1 and S2 was inserted solely to better illustrate the side effects of OSPF. The same behavior is observed (although in a slightly less pronounced way) in a pure hub-and-spoke scenario.

When a subnet is lost on the S1 router (subnet loss can be easily emulated by disabling a loopback interface), the corresponding route is lost on S2 when it runs the Shortest Path First (SPF) algorithm (the IP routing debugging printouts are included in Listing 1). However, the route mysteriously reappears within a few tens of milliseconds, now pointing to both ABRs (obviously a black hole). After another five seconds, the spurious routes disappear and the network finally converges.
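The test described above could be set up roughly as follows. This is a sketch, not the exact lab configuration: the interface name Loopback0 is an assumption, while access list 98 and the prefix 10.0.0.11 are taken from Listing 1.

```
! On S2: millisecond timestamps on debugging printouts
service timestamps debug datetime msec
!
! On S2: limit the IP routing debugging to the S1 loopback prefix
access-list 98 permit 10.0.0.11
!
! On S2 (exec mode): start the filtered debugging
! debug ip routing 98
!
! On S1: emulate the subnet loss (Loopback0 is an assumed interface name)
interface Loopback0
 shutdown
```

Re-enabling the loopback with no shutdown restores the subnet for the next test run.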

Technical details

The initial SPF delay and the inter-SPF interval can be configured with the timers throttle spf router configuration command.

Listing 1

Area Border Routers blackhole a lost subnet

S2#show debug

IP routing:

  IP routing debugging is on for access list 98

S2#show access-list

Standard IP access list 98

    10 permit 10.0.0.11

S2#

04:44.047: RT: del 10.0.0.11/32 via 10.0.2.17, ospf metric [110/65]

04:44.051: RT: delete subnet route to 10.0.0.11/32

04:44.051: RT: NET-RED 10.0.0.11/32

04:44.087: RT: add 10.0.0.11/32 via 10.0.2.13, ospf metric [110/139]

04:44.091: RT: NET-RED 10.0.0.11/32

04:44.119: RT: add 10.0.0.11/32 via 10.0.2.5, ospf metric [110/139]

04:44.123: RT: NET-RED 10.0.0.11/32

04:49.139: RT: del 10.0.0.11/32 via 10.0.2.13, ospf metric [110/139]

04:49.143: RT: NET-RED 10.0.0.11/32

04:49.175: RT: del 10.0.0.11/32 via 10.0.2.5, ospf metric [110/139]

04:49.179: RT: delete subnet route to 10.0.0.11/32

04:49.183: RT: NET-RED 10.0.0.11/32

This behavior might seem purely academic at first, but if the backup solution on S2 uses another routing protocol or floating static routes, the backup route will not be applied until the spurious routes received from ABRs are removed.
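A floating static route on S2 illustrates the point. In this sketch, the next hop 192.0.2.1 is a placeholder for a router on the backup path and the administrative distance 250 is an arbitrary value higher than the OSPF default of 110; neither comes from the sample network.

```
! On S2: backup route toward the S1 loopback.
! 192.0.2.1 is a placeholder next hop; 250 is an arbitrary
! administrative distance higher than OSPF's 110.
ip route 10.0.0.11 255.255.255.255 192.0.2.1 250
```

As long as OSPF offers any route to 10.0.0.11/32, including the spurious inter-area routes with metric 139 seen in Listing 1, the static route stays out of the routing table because of its higher administrative distance.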

What’s going on?

To understand why OSPF behaves as it does in a multi-area environment, you have to realize that it’s a link-state protocol only within a single area. For example, the path toward the 10.0.0.11/32 prefix (the loopback interface on S1) is announced into area 0 as a summary (type-3) Link State Advertisement (LSA), as shown in Figure 2.

Figure 2

Generation of inter-area summary LSA

The contents of the LSA are displayed in Listing 2. The LSA contains only the prefix and the cost toward the prefix (typical distance-vector information). Even the area in which the prefix originates is not included in the LSA.

Listing 2

Summary LSA for IP prefix 10.0.0.11/32 in area 0

A1#show ip ospf database summary 10.0.0.11

 

            OSPF Router with ID (10.0.0.1) (Process ID 1)

                Summary Net Link States (Area 0)

 

  Options: (No TOS-capability, DC, Upward)

  LS Type: Summary Links(Network)

  Link State ID: 10.0.0.11 (summary Network Number)

  Advertising Router: 10.0.0.1

  Network Mask: /32

        TOS: 0  Metric: 65

 

  Options: (No TOS-capability, DC, Upward)

  LS Type: Summary Links(Network)

  Link State ID: 10.0.0.11 (summary Network Number)

  Advertising Router: 10.0.0.2

  Network Mask: /32

        TOS: 0  Metric: 65

Configuration tip

The results of the show command in Listing 2 were filtered with the output filter include ^$|Options|Type|Link|Router|Mask|Metric.
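Put together, the complete command behind Listing 2 would look like this (assembled from the show command and the output filter quoted above; the ^$ alternative keeps the blank lines that separate the individual LSAs):

```
A1#show ip ospf database summary 10.0.0.11 | include ^$|Options|Type|Link|Router|Mask|Metric
```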

When S1 loses the 10.0.0.11/32 subnet, the ABRs (A1 and A2) run the SPF algorithm in area 1 and discover that the path toward 10.0.0.11/32 is lost. However, each of them still has an alternate path through area 0 and the other ABR. The path through area 0 is thus selected as the best path and re-advertised into area 1, as illustrated in Figure 3.

Figure 3

A temporary misdirected routing introduced by the ABRs

The corresponding debugging printouts on A1 are included in Listing 3 (please note that the printouts have been heavily filtered for brevity).

Listing 3

Initial SPF run on A1

25:08.699: OSPF: Detect change in LSA type 1, LSID 10.0.0.11, from 10.0.0.11 area 1

25:13.707: OSPF: running SPF for area 1, SPF-type Full

25:13.751: OSPF: Generate sum from intra-area route 10.0.0.11, mask 255.255.255.255, type 3, age 3600, metric 16777215, seq 0x80000002 to area 0

25:13.763: OSPF: running spf for summaries area 0

25:13.767: OSPF: Start processing Summary LSA 10.0.0.11, mask 255.255.255.255, adv 10.0.0.2, age 66, seq 0x80000001 (Area 0) type 3

25:13.771:    Add better path to LSA ID 10.0.0.11, gateway 0.0.0.0, dist 75

25:13.771:    Add path: next-hop 10.0.1.2, interface FastEthernet0/0

25:13.775: Add Summary Route to 10.0.0.11/255.255.255.255. Metric: 75, Next Hop: 10.0.1.2

25:13.779: OSPF: Entered inter-area route sync - area 0

25:13.783: OSPF: Generate sum from inter-area route 10.0.0.11, mask 255.255.255.255, type 3, age 0, metric 75, seq 0x80000001 to area 1

However, as the summary LSA for IP prefix 10.0.0.11/32 in area 0 depends on the router LSA in area 1, both ABRs eventually remove the summary LSA from area 0, triggering another SPF run in area 0 (Figure 4 and Listing 4). When the SPF algorithm is run the second time in area 0, both ABRs discover they no longer have an inter-area route toward the 10.0.0.11/32 prefix. The summary LSA is thus removed from area 1, finally resulting in the correct network topology.

Figure 4

Network topology is corrected after the summary LSAs are removed

Listing 4

Partial SPF run after the inter-area summary is removed from area 0

25:13.827: OSPF: Detect change in LSA type 3, LSID 10.0.0.11, from 10.0.0.2 area 0

25:13.851: OSPF: Start partial processing Summary LSA 10.0.0.11, mask 255.255.255.255, adv 10.0.0.2, age 3600, seq 0x80000002 (Area 0) type 3

25:13.855: OSPF: inter-route to 10.0.0.11/32 became unreachable, check externals

25:13.867: OSPF: Start partial processing Summary LSA 10.0.0.11, mask 255.255.255.255, adv 10.0.0.1, age 0, seq 0x80000001 (Area 1) type 3

25:13.867: OSPF: Non-backbone/self-originated LSA

25:13.871: OSPF: Start partial processing Summary LSA 10.0.0.11, mask 255.255.255.255, adv 10.0.0.2, age 2, seq 0x80000001 (Area 1) type 3

25:13.875: OSPF: Non-backbone/self-originated LSA

25:18.859: OSPF: Generate sum from inter-area route 10.0.0.11, mask 255.255.255.255, type 3, age 3600, metric 16777215, seq 0x80000002 to area 1

However, the whole process resulted in two topology changes in area 1, requiring an extra SPF run on all routers in area 1 to complete the network convergence. As the default throttle timers set the inter-SPF interval at a higher value than the initial SPF delay, network convergence is delayed significantly.

The debugging printouts on S2 illustrate another interesting behavior of Cisco’s OSPF implementation. A1 and A2 realized pretty early on that they’re dealing with a loop (the second SPF run was a partial SPF run in area 0 and was thus not subject to the inter-SPF interval). Even so, they could not immediately originate a changed LSA into area 1 to remove the bogus summary route, because of the LSA throttling functionality of Cisco IOS: the default LSA throttling parameters allow an LSA to be originated only once every five seconds. The five-second gap between the original LSA and the changed LSA is evident in the abbreviated debugging printouts in Listing 5.

Listing 5

Delayed convergence in area 1 due to LSA throttling

27:06.271: OSPF: Detect change in LSA type 1, LSID 10.0.0.11, from 10.0.0.11 area 1

27:11.307: OSPF: running SPF for area 1, SPF-type Full

27:11.379: OSPF: Schedule partial SPF - type 3 id 10.0.0.11 adv rtr 10.0.0.2

27:11.387: OSPF: Schedule partial SPF - type 3 id 10.0.0.11 adv rtr 10.0.0.1

27:16.495: OSPF: Detect change in LSA type 3, LSID 10.0.0.11, from 10.0.0.1 area 1

27:16.499: OSPF: Schedule partial SPF - type 3 id 10.0.0.11 adv rtr 10.0.0.1

27:16.571: OSPF: Detect change in LSA type 3, LSID 10.0.0.11, from 10.0.0.2 area 1

27:16.571: OSPF: Schedule partial SPF - type 3 id 10.0.0.11 adv rtr 10.0.0.2

The Solution

You might want to tackle the unexpected route flaps introduced by OSPF inter-area mechanisms by tuning various OSPF timers. This approach is clearly a kludge, not a solution: it does not address the underlying problem, but merely shortens the time during which the problem is visible. If you want to go down this route, these are the router configuration commands you can use:

Table 1

Tuning OSPF with the router configuration commands

| Configuration command syntax | Explanation |
| --- | --- |
| timers throttle spf delay interval max-interval | Sets the SPF-related timers. The delay parameter specifies the time between LSA change detection and the SPF run. The interval and max-interval parameters specify the minimum and maximum intervals between the full SPF runs (the inter-SPF interval increases if the network remains unstable). |
| timers throttle lsa all delay interval | Specifies the initial delay between a change in the routing table and the LSA update. The interval parameter sets the minimum interval between updates to the same LSA. |

Note

The OSPF timers configuration syntax has changed with the introduction of OSPF Shortest Path First Throttling and OSPF LSA Throttling features in IOS release 12.2T.
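A tuned configuration might look like the sketch below. All values are in milliseconds and are purely illustrative, not recommendations or tested settings. Note also that in the IOS releases I am familiar with, the timers throttle lsa all command takes a third value (the maximum interval) after the two parameters listed in Table 1; check the command reference for your release.

```
router ospf 1
 ! SPF throttling: initial delay 10 ms, inter-SPF interval 100 ms,
 ! growing to at most 5000 ms while the network remains unstable
 ! (illustrative values)
 timers throttle spf 10 100 5000
 !
 ! LSA throttling: initial delay 10 ms, minimum interval between
 ! updates to the same LSA 100 ms, maximum interval 5000 ms
 ! (illustrative values)
 timers throttle lsa all 10 100 5000
 !
 ! Minimum interval at which this router accepts the same LSA from
 ! its neighbors; it should not exceed the neighbors' LSA throttling
 ! interval, or rapid updates may be ignored (illustrative value)
 timers lsa arrival 80
```

Shorter timers reduce the duration of the route flap but increase the CPU load during network instabilities, so values like these should be verified in a lab before they are deployed.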

There are two solutions (although both imperfect) to the unexpected OSPF route flaps:

Inter-area route summarization removes the preconditions for the route flap, as the summary LSA inserted into area 0 is less specific than the disappearing IP prefix from a non-backbone OSPF area.

You could also use the (non-standard) OSPF ABR Type 3 LSA Filtering feature of Cisco IOS to prevent an ABR from propagating backbone summary LSAs back into a non-backbone area.
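The first option could be sketched as follows on A1 and A2, assuming (as the filtering example later in this article does) that the loopback interfaces in area 1 fall within the address range 10.0.0.8/29:

```
router ospf 1
 ! Advertise a single type-3 summary LSA for the area 1 loopbacks
 ! into the backbone instead of the individual /32 prefixes
 ! (address range assumed, see above)
 area 1 range 10.0.0.8 255.255.255.248
```

The summary LSA in area 0 remains unchanged as long as at least one prefix within the range is still reachable in area 1, so the loss of 10.0.0.11/32 alone no longer produces a /32 summary that the other ABR could pick up and re-advertise.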

For example, we could configure a Type 3 LSA filter on A1 and A2 in our sample network to ensure that they don’t accept summary LSAs for prefixes known to be in area 1 from the backbone area. To configure LSA filtering, you have to:

Step 1. Define an IP prefix list that is used to match the LSAs you want to filter.

Step 2. Define an OSPF Type-3 filter with the area area-id filter-list prefix prefix-name in|out router configuration command.

The area filter-list configuration command is a bit complex to understand. To start with, the ip prefix-list used in the command has to permit the prefixes (summary LSAs) that should not be filtered and deny the prefixes that should be. The in and out keywords are also a bit counterintuitive:

If you specify the in keyword, the IP prefix list filters the summary LSAs originated by this router into the specified area.

If you specify the out keyword, the IP prefix list filters the summary LSAs generated into other areas based on the information received from this area.

In our scenario, we want to filter the summary LSAs propagated from area 0 into all non-backbone areas. The easiest way to configure this filter is to use the area 0 filter-list out configuration command. The complete configuration is displayed in Listing 6 (assuming that the loopback interfaces in area 1 fall within the address range 10.0.0.8/29).

Note

You could also use the area 1 filter-list in configuration command, but then the route flaps could still occur in other non-backbone areas attached to the same ABRs.

Listing 6

Summary LSA filter configured on A1 and A2

router ospf 1

 area 0 filter-list prefix Area_1_Loopback out

!

ip prefix-list Area_1_Loopback seq 5 deny 10.0.0.8/29 ge 32

ip prefix-list Area_1_Loopback seq 10 permit 0.0.0.0/0 ge 1
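The alternative mentioned in the note above could be sketched like this. It reuses the prefix list from Listing 6 and differs only in the area and direction of the filter; I have not tested this variant in the sample network.

```
router ospf 1
 ! Filter the type-3 LSAs originated by this ABR into area 1 only;
 ! other non-backbone areas attached to the ABR are not protected
 area 1 filter-list prefix Area_1_Loopback in
!
ip prefix-list Area_1_Loopback seq 5 deny 10.0.0.8/29 ge 32
ip prefix-list Area_1_Loopback seq 10 permit 0.0.0.0/0 ge 1
```

With either variant, the prefix 10.0.0.11/32 lies within 10.0.0.8/29 and has a prefix length of 32, so it matches sequence 5 and is denied. A prefix outside that range, for example 10.0.0.1/32, falls through to sequence 10 and is permitted. The show ip prefix-list Area_1_Loopback command displays the entries as configured.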

Summary

Contrary to common wisdom, OSPF is not a pure link-state protocol. It uses link-state algorithms within an area, but behaves almost like a distance-vector protocol between areas. The distinction could be considered purely academic if it did not introduce temporary routing instabilities into any multi-area OSPF network that does not use inter-area summarization.

In this article, you’ve seen how any OSPF design that has multiple ABRs in a non-backbone area can lead to temporarily incorrect IP routing within that area after an IP prefix is lost. Using default IOS OSPF parameters, the incorrect routing can persist for up to ten seconds, which is long enough to disrupt mission-critical applications and voice traffic.

You could tweak OSPF parameters (SPF and LSA throttle timers) to reduce the time span during which the Area Border Routers insert incorrect information into the affected area, or you could use OSPF route summarization or Type 3 (summary) LSA filters to prevent an IP prefix from reappearing through the backbone area in the area from which it originated.

Related learning products:

When OSPF Becomes a Distance Vector Protocol E-Lesson

Open Shortest Path First - Complete Technology Remote Labs

Building Scalable Cisco Internetworks course

Building Scalable Cisco Internetworks Remote Labs

Building Scalable Cisco Internetworks E-course

More to explore:

OSPF ABR Type 3 LSA Filtering

OSPF Shortest Path First throttling

OSPF Link State Advertisement throttling

OSPF configuration commands

More OSPF design and configuration tips


© 1997-2008 NIL, Terms of use