
Scalable Policy Routing

by Ivan Pepelnjak

If you’re serious about the high availability of your network, your remote sites have a primary and a backup link into the core network. In the old days, the backup link was usage-charged (think about ISDN and X.25), and the important issue was to reduce the usage of the backup link. These days, we usually use fixed-cost primary and backup links (for example, Metro Ethernet for the primary link and Frame Relay or ADSL for the backup), and once the top managers realize that, they want us to utilize both links all the time.

It doesn’t take much to convince anyone (even people who have never been involved in networking) that it doesn’t make sense to load-share between a 20 Mbps symmetrical fiber-optic link and a 1 Mbps/256 kbps asymmetrical ADSL link. The next idea the managers get is usually very predictable: why don’t you transport certain applications over the backup link? Welcome to the murky world of policy routing.

Whenever you mention policy routing to Cisco-focused engineers, they get disturbing mental pictures involving myriads of access-lists and route-maps that bypass the regular routing tables on a packet-by-packet basis and turn a nicely designed network into a spaghetti-like construction of hop-by-hop static routing. Fortunately, server virtualization has allowed us to dedicate virtual servers (and, consequently, specific IP addresses) to individual applications, and distance-vector routing protocols (or MPLS Traffic Engineering) help you design alternate paths for specific IP prefixes. In this article, we’ll use the most flexible solution and discuss how to build a network with BGP, but EIGRP or even RIP could give you pretty satisfying results. Unfortunately, you cannot do much if you use OSPF and are not willing to deploy MPLS Traffic Engineering (the approach I’ll cover in an upcoming article).

Sample Network

The sample network that will be used in all the printouts and configuration examples is a redundant hub-and-spoke network where each remote site connects to the central location via a high-speed GRE-over-Internet connection and a lower-speed Frame Relay link. The central site has two core routers (one concentrating the Internet connections, the other one serving as the Frame Relay hub) and two distribution-layer routers. The network schematics are displayed in Figure 1.

Note

In a live network, you’d use IPsec in combination with GRE tunnels, but it’s not included in the configuration samples to reduce the overall complexity.

Figure 1

Network diagram

Management would like all traffic to and from the Legacy server farm (for example, TN3270 sessions or low-volume client-server transactions) to flow over the lower-speed links, while the new flashy Web 2.0 applications should use the high-speed link. Obviously, all applications should be able to use either one of the links in case of link failure.

Note

In a well-designed network, non-critical bandwidth-intensive applications would be blocked from using the lower-speed backup link.

The rest of the article covers the basic BGP design and associated router configurations that you can use in any large-scale network and the changes made to the BGP design to support the desired policy routing.

Basic Routing Design

The whole network uses BGP as its core routing protocol, giving us a highly scalable solution with the inherent capability to implement policy-based routing (most of BGP’s complexity is a direct result of its ability to make policy-based routing decisions). Each site is a separate autonomous system (AS); remote sites have one or two routers in their AS and the central site can have as many routers as needed, using the two core routers as BGP route reflectors. OSPF is also deployed in the core site to ensure fast convergence and solve the BGP next-hop problems. The overall routing design is displayed in Figure 2.

Figure 2

Routing design

You could decide to simplify the router configuration by redistributing directly connected routes into BGP on each router, but this would just pollute the BGP tables with the point-to-point WAN subnets that are usually not needed for proper network operation. It’s thus better to manually list the networks you want to announce in the BGP routing process. The sample configuration from one of the remote sites is included in Listing 1.

Listing 1

BGP configuration on Site-A

router bgp 65100

 network 10.0.1.1 mask 255.255.255.255

 network 192.168.1.0

Note

Only the most relevant parts of the router configurations are included in the article. You can subscribe to the e-lesson associated with this article to check the complete router configurations, build the solution, and test it in an actual network. Please check the NIL e-lessons web site for its availability.

To make the core router configurations as scalable as possible, we’re using the new template mechanisms introduced in IOS releases 12.0S and 12.3T. These template mechanisms might look verbose if you have only a few neighbors, but the ease of management they give you quickly pays off: if you have to make a change in your BGP configuration, you change the settings in one place and they get propagated to all BGP neighbors automatically. For example, the core BGP routers use several templates:

The LocalAS session template that is used to create all peers in the same AS;

The Global policy template that defines rules that apply to all neighbors (community propagation is configured in this template);

The RRClient template that covers all the other routers in the central site;

The RemoteSite template that specifies parameters for the BGP neighbors on remote sites.

The BGP templates used on the CoreInet router are shown in Listing 2 and the BGP neighbor configuration is included in Listing 3.

Listing 2

BGP templates on the CoreInet router

router bgp 65000

 template peer-policy Global

  send-community both

!

 template peer-policy LocalAS

  inherit peer-policy Global 1

!

 template peer-policy RRClient

  route-reflector-client

  inherit peer-policy Global 1

!

 template peer-policy RemoteSite

  inherit peer-policy Global 1

!

 template peer-session LocalAS

  remote-as 65000

  update-source Loopback0

Listing 3

BGP neighbors of the CoreInet router

router bgp 65000

 neighbor 10.0.1.3 description CoreFR

 neighbor 10.0.1.3 inherit peer-session LocalAS

 neighbor 10.0.1.3 inherit peer-policy LocalAS

 !

 neighbor 10.0.1.4 description Legacy

 neighbor 10.0.1.4 inherit peer-session LocalAS

 neighbor 10.0.1.4 inherit peer-policy RRClient

 !

 neighbor 10.0.1.20 description Web

 neighbor 10.0.1.20 inherit peer-session LocalAS

 neighbor 10.0.1.20 inherit peer-policy RRClient

 !

 neighbor 10.0.11.2 description Site-A

 neighbor 10.0.11.2 remote-as 65100

 neighbor 10.0.11.2 inherit peer-policy RemoteSite

 !

 neighbor 10.0.11.6 description Site-B

 neighbor 10.0.11.6 remote-as 65101

 neighbor 10.0.11.6 inherit peer-policy RemoteSite

 !

 neighbor 10.0.11.10 description Site-C

 neighbor 10.0.11.10 remote-as 65102

 neighbor 10.0.11.10 inherit peer-policy RemoteSite

On the other hand, the routers on the remote sites have just two BGP neighbors. Implementing peer templates is thus overkill, and the traditional BGP configuration (Listing 4) is used on these sites.

Listing 4

BGP configuration on a remote site

router bgp 65100

 neighbor 10.0.8.1 remote-as 65000

 neighbor 10.0.8.1 description CoreFR

 !

 neighbor 10.0.11.1 description CoreInet

 neighbor 10.0.11.1 remote-as 65000

Policy Routing to the Server

Whenever you want to implement policy routing in a network, you have to consider both traffic flow directions independently. For example, changes you make to force the traffic from remote sites to use a particular link toward the server usually do not influence the traffic flowing in the other direction, potentially resulting in asymmetrical traffic.

Warning

Asymmetrical traffic flow should be avoided, as it could introduce unwanted jitter and subsequently reduce the overall throughput.

In this section, we’ll focus on the traffic flow from remote sites toward the servers; the next section will deal with return traffic.

To influence the traffic flow toward the servers, remote site routers have to prefer IP prefixes (for legacy servers) received through the backup BGP session over those received through the primary BGP session. You could use a variety of BGP mechanisms, but the only one that requires no configuration changes on remote sites (and this is crucial to achieve scalability) is the Multi-Exit Discriminator (MED) attribute, which can be set on the central site routers and accepted by the remote sites.

The following MED values are used in the sample network:

MED=200 is set on all IP prefixes advertised from the CoreInet router to the remote sites.

MED=300 is set on IP prefixes advertised from the CoreFR router to the remote sites to ensure the backup path is only used when the primary link fails (lower MED values are preferred), as shown in Figure 3 and Listing 5.

MED=100 is set by the CoreFR router on IP prefixes of the legacy servers, so the remote sites prefer the backup link for these prefixes, as shown in Figure 4 and Listing 6.
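The effect of these three values can be checked with a small model. The Python sketch below is not router code; it is a simplified illustration that assumes both paths arrive over EBGP from AS 65000 with equal weight, local preference, and AS-path length, so the MED is the deciding attribute. The Path class and the best_path function are hypothetical names introduced for this example; the addresses and MED values are the ones shown in Listings 5 and 6.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    next_hop: str  # address of the advertising core router
    via: str       # descriptive label only
    med: int       # Multi-Exit Discriminator received from AS 65000


def best_path(paths):
    """Pick the path with the lowest MED (all earlier tie-breakers assumed equal)."""
    return min(paths, key=lambda p: p.med)


# Paths seen on Site-A, taken from Listings 5 and 6
web_lan = [Path("10.0.11.1", "CoreInet", 200), Path("10.0.8.1", "CoreFR", 300)]
legacy_lan = [Path("10.0.11.1", "CoreInet", 200), Path("10.0.8.1", "CoreFR", 100)]

print(best_path(web_lan).via)     # CoreInet
print(best_path(legacy_lan).via)  # CoreFR

# If the Frame Relay session goes down, only the CoreInet path is left
remaining = [p for p in legacy_lan if p.via != "CoreFR"]
print(best_path(remaining).via)   # CoreInet
```

The last call shows why no extra configuration is needed for the failure case: once one BGP session is lost, the surviving path wins regardless of its MED.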

Figure 3

CoreInet router is preferred for the Web LAN

Listing 5

BGP paths toward the Web LAN on the Site-A router

Site-A#show ip bgp 10.0.21.0

BGP routing table entry for 10.0.21.0/24, version 22

Paths: (2 available, best #1, table Default-IP-Routing-Table)

  65000

    10.0.11.1 from 10.0.11.1 (10.0.1.2)

      metric 200, localpref 100, valid, external, best

  65000

    10.0.8.1 from 10.0.8.1 (10.0.1.3)

      metric 300, localpref 100, valid, external

Figure 4

CoreFR router is preferred for the Legacy LAN

Listing 6

BGP paths toward the Legacy LAN on the Site-A router

Site-A#show ip bgp 10.0.20.0

BGP routing table entry for 10.0.20.0/24, version 13

Paths: (2 available, best #2, table Default-IP-Routing-Table)

  65000

    10.0.11.1 from 10.0.11.1 (10.0.1.2)

      metric 200, localpref 100, valid, external

      Community: 65000:100

  65000

    10.0.8.1 from 10.0.8.1 (10.0.1.3)

      metric 100, localpref 100, valid, external, best

      Community: 65000:100

You could set the MED with an access-list or a prefix-list on the CoreFR router, but a more scalable approach would use BGP communities (highlighted in Listing 6): the originating router (the Legacy router in our network) would set a BGP community (65000:100 is used in the sample network) to indicate that the IP prefix belongs to the legacy servers and the CoreFR router would use the community to set the MED values. The overall process is illustrated in Figure 5.

Figure 5

BGP route propagation from the Legacy router to the Site router

Configure Policy Routing on the Core Routers

BGP configuration on the Legacy router changes only slightly: a route-map is attached to the network statement advertising the IP prefixes of the legacy servers (Listing 7).

Listing 7

Changes in the BGP configuration of the Legacy router

router bgp 65000

 network 10.0.20.0 mask 255.255.255.0 route-map FRBest

!

route-map FRBest permit 10

 set community 65000:100

The RemoteSite peer policy template is changed on the CoreInet router to set the MED to 200 on all outgoing updates (Listing 8).

Listing 8

BGP configuration changes on the CoreInet router

router bgp 65000

 template peer-policy RemoteSite

  route-map InetMED out

!

route-map InetMED permit 10

 set metric 200

The changes on the CoreFR router are a bit more extensive (Listing 9):

An ip community-list is defined to match the target BGP community (65000:100).

A route-map is used to match BGP prefixes with the target BGP community and change their MED to 100. The MED of all other BGP prefixes is set to 300.

The RemoteSite peer policy template is modified to include an outgoing route-map.

Listing 9

BGP configuration changes on the CoreFR router

router bgp 65000

 template peer-policy RemoteSite

  route-map FRMED out

!

ip community-list standard FRBest permit 65000:100

!

route-map FRMED permit 10

 match community FRBest

 set metric 100

!

route-map FRMED permit 20

 set metric 300
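To see how the two outbound route-maps produce the MED values from Listings 5 and 6, you can model them in a few lines of Python. This is only a sketch under the assumption that a route-map is evaluated entry by entry and the first matching entry sets the metric; the function names inet_med and fr_med are made up for the example and do not correspond to any IOS feature.

```python
LEGACY_COMMUNITY = "65000:100"


def inet_med(communities):
    """InetMED on CoreInet: a single entry that sets MED 200 on every prefix."""
    return 200


def fr_med(communities):
    """FRMED on CoreFR: entry 10 matches the legacy community, entry 20 the rest."""
    if LEGACY_COMMUNITY in communities:  # route-map FRMED permit 10
        return 100
    return 300                           # route-map FRMED permit 20


prefixes = {
    "10.0.20.0/24": {"65000:100"},  # Legacy LAN, tagged by route-map FRBest
    "10.0.21.0/24": set(),          # Web LAN, no community attached
}

for prefix, communities in prefixes.items():
    print(prefix, "CoreInet MED", inet_med(communities),
          "CoreFR MED", fr_med(communities))
```

A new legacy prefix only has to carry the 65000:100 community when it is originated; neither function (and neither core router) needs to know the prefix itself.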

Policy Routing from the Server

The changes to BGP routing that force the traffic flow from the Legacy servers to remote sites through the backup links are slightly more complex. Obviously, both core routers (CoreFR and CoreInet) have to prefer BGP prefixes received from remote sites over the same BGP prefixes received from the other core router. That’s the default BGP behavior (so we need no configuration change on CoreInet and CoreFR), but this requirement also precludes the usage of tools like BGP Local Preference or Multi-Exit Discriminator as any of these attributes would make routes from one of the core routers preferable on all other routers on the central site.
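The reasoning above can be illustrated with a simplified model of the BGP best-path comparison. The Python sketch below covers only a subset of the decision steps (weight, local preference, AS-path length, MED, EBGP over IBGP) and skips the remaining tie-breakers; the Path class and the helper functions are hypothetical names introduced for this example, and the local preference value of 200 is an assumption used purely to show what would go wrong.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    learned_from: str
    next_hop: str
    ebgp: bool
    weight: int = 0
    local_pref: int = 100
    as_path_len: int = 1
    med: int = 0


def preference_key(p):
    # The smallest tuple wins; "higher is better" attributes are negated.
    return (-p.weight, -p.local_pref, p.as_path_len, p.med, 0 if p.ebgp else 1)


def best_path(paths):
    return min(paths, key=preference_key)


# A Site-A prefix as seen on CoreInet: received directly over the tunnel (EBGP)
# and received from CoreFR over IBGP with an unchanged next hop.
direct = Path("Site-A", "10.0.11.2", ebgp=True)
via_corefr = Path("CoreFR", "10.0.8.2", ebgp=False)
print(best_path([direct, via_corefr]).learned_from)     # Site-A

# Hypothetical: CoreFR raises the local preference of remote-site routes.
via_corefr_lp = Path("CoreFR", "10.0.8.2", ebgp=False, local_pref=200)
print(best_path([direct, via_corefr_lp]).learned_from)  # CoreFR
```

In the second case CoreInet would stop using its own link to Site-A, because local preference is compared before the EBGP-over-IBGP rule.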

Even though the CoreInet and CoreFR routers cannot modify any BGP attributes of the routes received from the remote sites, the Legacy router has to prefer BGP routes received from the CoreFR router (to ensure the traffic from the legacy servers is sent over the Frame Relay link) and the Web router has to prefer BGP routes received from the CoreInet router (making the traffic from the internal Web servers flow through the GRE-over-Internet tunnels). The easiest BGP tool available to do the job is the weight mechanism, which is local to a router and does not change any of the BGP attributes (ensuring that even if the Web and Legacy routers propagated their BGP routes, these would not be changed).

To simplify the implementation, we’ll use static weights: all routes received from the CoreFR router will have a higher weight on the Legacy router. The changes in BGP configuration are included in Listing 10 and the BGP prefixes advertised by Site-A as seen on the Legacy router are displayed in Listing 11.

Note

You also have to perform BGP soft reconfiguration with the clear ip bgp * soft in command on the Legacy router after changing its BGP configuration to make sure that the BGP prefixes received from the CoreFR and CoreInet routers are processed using the new set of parameters.

Listing 10

BGP configuration change on the Legacy router

router bgp 65000

 neighbor 10.0.1.3 weight 500

Listing 11

BGP prefixes originated by Site-A as seen on the Legacy router

Legacy#show ip bgp reg 65100

   Network          Next Hop            Metric LocPrf Weight Path

*>i10.0.1.1/32      10.0.8.2                 0    100    500 65100 i

* i                 10.0.11.2                0    100      0 65100 i

*>i192.168.1.0      10.0.8.2                 0    100    500 65100 i

* i                 10.0.11.2                0    100      0 65100 i 

Technical detail

Even though the Legacy router always prefers routes received from the CoreFR router, the traffic flow is always optimal, as the BGP next-hop of an external route does not change within the autonomous system, regardless of the path the route has taken to reach the final BGP router.

Likewise, the weights are set on the Web router to prefer BGP prefixes received from the CoreInet router (Listing 12). The results of this configuration change are shown in Listing 13.

Listing 12

BGP configuration change on the Web router

router bgp 65000

 neighbor 10.0.1.2 weight 500

Listing 13

BGP prefixes originated by Site-A as seen on the Web router

Web#show ip bgp regexp 65100

   Network          Next Hop            Metric LocPrf Weight Path

*>i10.0.1.1/32      10.0.11.2                0    100    500 65100 i

* i                 10.0.8.2                 0    100      0 65100 i

*>i192.168.1.0      10.0.11.2                0    100    500 65100 i

* i                 10.0.8.2                 0    100      0 65100 i
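The outcome shown in Listings 11 and 13 can be reproduced with a short Python sketch. It assumes that the two paths to each Site-A prefix differ only in weight (which is what the printouts show) and that neighbors without a configured weight get 0; the dictionary layout and the best_next_hop function are hypothetical and exist only for this illustration.

```python
# Weight configured per IBGP neighbor (Listings 10 and 12); others default to 0.
NEIGHBOR_WEIGHT = {
    "Legacy": {"10.0.1.3": 500},  # CoreFR
    "Web": {"10.0.1.2": 500},     # CoreInet
}

# Site-A routes received from the two core routers: (neighbor, unchanged next hop)
SITE_A_PATHS = [
    ("10.0.1.3", "10.0.8.2"),   # from CoreFR, next hop on the Frame Relay link
    ("10.0.1.2", "10.0.11.2"),  # from CoreInet, next hop on the GRE tunnel
]


def best_next_hop(router):
    """Return the next hop of the path with the highest weight on this router."""
    weights = NEIGHBOR_WEIGHT[router]
    _neighbor, next_hop = max(SITE_A_PATHS, key=lambda p: weights.get(p[0], 0))
    return next_hop


print(best_next_hop("Legacy"))  # 10.0.8.2
print(best_next_hop("Web"))     # 10.0.11.2
```

Because the weight never leaves the router it is configured on, the choice made on the Legacy router has no effect on the Web router and vice versa.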

In a more complex scenario, you could duplicate the setup used in the previous section: the CoreFR and the CoreInet routers would set BGP communities on BGP routes received from the remote sites and the Legacy and the Web routers would change BGP local preference or weight for routes marked with a specific BGP community, but this would only complicate the design. The implementation would be even more complex if there were additional routers between the Legacy and the CoreFR routers.

Ready-for-Use Tests

To test the correct operation of the policy routing in your network, you should perform at least the following tests:

Traceroute from the sample clients at various remote sites to all classes of servers (in our scenario, a server in the Legacy LAN and a server in the Web LAN).

Traceroute from the servers back to the clients.

Technical detail

The record route option available in the IOS traceroute command does not help you, as it records the forward route (which you test with the traceroute command anyway), not the return route.

The tests should be performed under all possible link conditions (both links active, failure of the primary link, failure of the backup link).
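If you want to keep track of the combinations, the test plan can be written down as a small table generator. The Python sketch below simply enumerates the link conditions, server classes, and traffic directions named above and attaches the link the traffic is expected to use according to the policy in this article; the names used in the code are illustrative and the sketch does not run any traceroute itself.

```python
from itertools import product

LINK_STATES = ["both links active", "primary link failed", "backup link failed"]
SERVER_CLASSES = ["legacy", "web"]
DIRECTIONS = ["client to server", "server to client"]


def expected_link(state, server_class):
    """Link the traffic should use, following the policy described in the article."""
    if state == "primary link failed":
        return "Frame Relay"
    if state == "backup link failed":
        return "GRE-over-Internet"
    return "Frame Relay" if server_class == "legacy" else "GRE-over-Internet"


test_plan = [
    (state, server_class, direction, expected_link(state, server_class))
    for state, server_class, direction in product(
        LINK_STATES, SERVER_CLASSES, DIRECTIONS
    )
]

for row in test_plan:
    print(" | ".join(row))
```

Each row corresponds to one traceroute you would run and compare against the expected link.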

The first set of tests, executed between a client on site A and the legacy (TN3270) and web (MAIL) servers, is displayed in Listing 14. Similar tests executed from the two servers toward the client on site A are shown in Listing 15.

Note

You can execute additional tests in the associated remote lab exercise. Please check the NIL e-lessons web site for its availability.

Listing 14

Traceroute executed from a client on site A toward various servers

Client.Site-A#traceroute TN3270

Type escape sequence to abort.

Tracing the route to TN3270 (10.0.20.20)

  1 Site-A (192.168.1.1) 4 msec 4 msec 4 msec

  2 Serial-1-0-100.CoreFR (10.0.8.1) 8 msec 8 msec 8 msec

  3 Fast-0-0.Legacy (10.0.10.3) 16 msec 12 msec 12 msec

  4 TN3270 (10.0.20.20) 20 msec *  36 msec

Client.Site-A#traceroute MAIL

Type escape sequence to abort.

Tracing the route to MAIL (10.0.21.25)

  1 Site-A (192.168.1.1) 8 msec 8 msec 4 msec

  2 Tunnel-0.CoreInet (10.0.11.1) 12 msec 8 msec 12 msec

  3 Fast-0-0.Web (10.0.10.4) 8 msec 8 msec 16 msec

  4 MAIL (10.0.21.25) 36 msec *  28 msec

Listing 15

Traceroute executed from a legacy and a web server toward a client on site A

TN3270#traceroute Client.Site-A

Type escape sequence to abort.

Tracing the route to Client.Site-A (192.168.1.100)

  1 Fast-0-1.Legacy (10.0.20.1) 8 msec 4 msec 4 msec

  2 Fast-0-0.CoreFR (10.0.10.2) 12 msec 12 msec 8 msec

  3 FR-1-0-100.Site-A (10.0.8.2) 8 msec 8 msec 12 msec

  4 Client.Site-A (192.168.1.100) 20 msec *  40 msec

Mail#traceroute Client.Site-A

Type escape sequence to abort.

Tracing the route to Client.Site-A (192.168.1.100)

  1 Fast-0-1.Web (10.0.21.1) 8 msec 4 msec 4 msec

  2 Fast-0-0.CoreInet (10.0.10.1) 12 msec 8 msec 8 msec

  3 Tunnel-0.Site-A (10.0.11.2) 16 msec 16 msec 16 msec

  4 Client.Site-A (192.168.1.100) 24 msec *  40 msec

Summary

Most network designers and implementers try to avoid policy routing, as its common implementation in Cisco IOS requires a complex mix of access-lists and route-maps that have to be deployed on a hop-by-hop basis. In reality, distance vector routing protocols can be used to implement common policy routing requirements in enterprise networks where a set of applications should prefer a different subset of links than other applications.

Routing protocol-based policy routing should be implemented (if at all possible) with BGP, as it gives you the richest set of tools to use to influence the route selection policy. EIGRP is a viable alternative (you can manipulate the delay portion of the metric for each individual IP prefix), with RIP being the solution of last resort. You cannot implement the same mechanisms with any link-state protocol, as you cannot increase the link cost for individual IP prefixes (OSPF with type-of-service support would allow you to do that, but it’s never been implemented in a mainstream routing device).

Related learning products:

Configuring BGP on Cisco Routers Course

Configuring BGP on Cisco Routers Remote Labs

Configuring BGP on Cisco Routers E-course

Building Scalable Cisco Internetworks Course

Building Scalable Cisco Internetworks Remote Labs

Building Scalable Cisco Internetworks E-course

More to explore:

Designing Fast Converging BGP Networks

Redistributing customer routes into BGP

BGP essentials: BGP communities

BGP essentials: peer session templates

BGP essentials: configuring internal BGP sessions

BGP fast session deactivation

Perfect load-balancing: How close can you get?

More BGP hints and tips

More MPLS Traffic Engineering hints and tips
