Saturday, September 13, 2014

BGP always-compare-med vs deterministic-med

The BGP route selection algorithm is very systematic until you get to the MED (Multi-Exit Discriminator), which can be a little confusing. In this post, I’ll try to make the difference between the commands bgp always-compare-med and bgp deterministic-med as simple as possible to understand.

I’m writing this assuming that the reader already knows the BGP route selection process and only wants to understand the difference between those two commands.

Now let’s look at the topology below.


R1 in AS1 peers with R2 and R3 in AS23 and with R4 in AS4, and so does R5 in AS5. Throughout the post, we’ll examine the network 5.5.5.5/32, originated in AS5, from R1’s point of view as a reference for the difference between those two commands.

Let’s first check the BGP table on R1

R1#show ip bgp 5.5.5.5
BGP routing table entry for 5.5.5.5/32, version 17
Paths: (3 available, best #3, table Default-IP-Routing-Table)
Flag: 0x820
 Advertised to update-groups:
       1
 23 5
   10.0.12.2 from 10.0.12.2 (2.2.2.2)
     Origin IGP, localpref 100, valid, external
 23 5
   10.0.13.3 from 10.0.13.3 (3.3.3.3)
     Origin IGP, localpref 100, valid, external
 4 5
   10.0.14.4 from 10.0.14.4 (4.4.4.4)
     Origin IGP, localpref 100, valid, external, best

When BGP receives updates, it lists the paths from newest (top) to oldest (bottom). Assuming all the routes are valid and nothing earlier in the selection process breaks the tie, the oldest route is selected as the best.

We can check that by simply clearing the session to R4 (10.0.14.4). Once the session restarts, R4’s route will be the newest one, so it’ll be at the top of the list, while the route from 3.3.3.3 will be at the bottom and selected as the best.

R1#clear ip bgp 10.0.14.4
R1#
*Mar  1 02:25:48.883: %BGP-5-ADJCHANGE: neighbor 10.0.14.4 Down User reset
*Mar  1 02:25:49.683: %BGP-5-ADJCHANGE: neighbor 10.0.14.4 Up

R1#show ip bgp 5.5.5.5
BGP routing table entry for 5.5.5.5/32, version 6
Paths: (3 available, best #3, table Default-IP-Routing-Table)
Flag: 0x860
 Advertised to update-groups:
       1
 4 5
   10.0.14.4 from 10.0.14.4 (4.4.4.4)
     Origin IGP, localpref 100, valid, external
 23 5
   10.0.12.2 from 10.0.12.2 (2.2.2.2)
     Origin IGP, localpref 100, valid, external
 23 5
   10.0.13.3 from 10.0.13.3 (3.3.3.3)
     Origin IGP, localpref 100, valid, external, best

As expected. Now let’s send a different MED from each of R2, R3 and R4 to R1 and see how that works.

R1#show ip bgp 5.5.5.5
BGP routing table entry for 5.5.5.5/32, version 13
Paths: (3 available, best #2, table Default-IP-Routing-Table)
Flag: 0x4860
 Advertised to update-groups:
       1
 4 5
   10.0.14.4 from 10.0.14.4 (4.4.4.4)
     Origin IGP, metric 400, localpref 100, valid, external
 23 5
   10.0.12.2 from 10.0.12.2 (2.2.2.2)
     Origin IGP, metric 200, localpref 100, valid, external, best
 23 5
   10.0.13.3 from 10.0.13.3 (3.3.3.3)
     Origin IGP, metric 300, localpref 100, valid, external

R2 is now the preferred exit for 5.5.5.5. The reason is that, by default, BGP compares MED only between paths received from the same neighboring AS, and it works through the list in pairs from the top. The path from R4 is the newest one (top of the list) and comes from a different AS than R2, so MED is skipped for that pair and the older path, R2, wins. R2 is then compared with R3; both are from AS23, so MED is compared and R2’s 200 beats R3’s 300.
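
The pairwise walk described above can be sketched in a few lines of Python. This is only a rough model, not IOS’s actual implementation: the Path class, the age field and the function names are made up for illustration, and every attribute that comes before MED in the selection process is assumed to tie.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    peer: str         # router that advertised the path (illustrative label)
    neighbor_as: int  # first AS in the AS_PATH
    med: int          # "metric" in the show output
    age: int          # larger value = received earlier (older); assumed ranks


def prefer(a: Path, b: Path) -> Path:
    """Pick between two paths that tie on everything before MED."""
    # Default behaviour: MED is only compared within the same neighboring AS.
    if a.neighbor_as == b.neighbor_as and a.med != b.med:
        return a if a.med < b.med else b
    # Otherwise fall through to the "oldest path wins" tie-breaker.
    return a if a.age > b.age else b


def best_path(newest_first: list[Path]) -> Path:
    """Walk the list top-down, carrying the current winner forward."""
    best = newest_first[0]
    for candidate in newest_first[1:]:
        best = prefer(best, candidate)
    return best


# Same order as the table above: R4 newest (top), R3 oldest (bottom).
table = [
    Path("R4", 4, 400, age=1),
    Path("R2", 23, 200, age=2),
    Path("R3", 23, 300, age=3),
]
print(best_path(table).peer)  # R2
```

Changing R4’s MED in this sketch has no effect on the result, because R4 is never compared on MED with a path from AS23.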

To be really sure about it, we’ll lower the metric from R4 and soft clear the sessions. R2 should still be the preferred route.

R1#show ip bgp 5.5.5.5
BGP routing table entry for 5.5.5.5/32, version 11
Paths: (3 available, best #2, table Default-IP-Routing-Table)
 Advertised to update-groups:
       1
 4 5
   10.0.14.4 from 10.0.14.4 (4.4.4.4)
     Origin IGP, metric 100, localpref 100, valid, external
 23 5
   10.0.12.2 from 10.0.12.2 (2.2.2.2)
     Origin IGP, metric 200, localpref 100, valid, external, best
 23 5
   10.0.13.3 from 10.0.13.3 (3.3.3.3)
     Origin IGP, metric 300, localpref 100, valid, external

R2 is still preferred. To change this behavior, we need to make R1 compare MED between routes from different autonomous systems, which is done with the command bgp always-compare-med.

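It works the same way as before: BGP scans the paths from the top down, comparing the first two, and the winner is then compared with the next path down the list until the oldest one is reached. The only difference is that MED is now compared even when the two paths come from different ASes.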

R1(config-router)#bgp always-compare-med

R1#clear ip bgp *
*Mar  1 00:25:19.759: %BGP-5-ADJCHANGE: neighbor 10.0.12.2 Down User reset
*Mar  1 00:25:19.763: %BGP-5-ADJCHANGE: neighbor 10.0.13.3 Down User reset
*Mar  1 00:25:19.767: %BGP-5-ADJCHANGE: neighbor 10.0.14.4 Down User reset
*Mar  1 00:25:20.531: %BGP-5-ADJCHANGE: neighbor 10.0.12.2 Up
*Mar  1 00:25:20.783: %BGP-5-ADJCHANGE: neighbor 10.0.13.3 Up
*Mar  1 00:25:20.951: %BGP-5-ADJCHANGE: neighbor 10.0.14.4 Up

R1#show ip bgp 5.5.5.5
BGP routing table entry for 5.5.5.5/32, version 2
Paths: (3 available, best #1, table Default-IP-Routing-Table)
Flag: 0x820
 Advertised to update-groups:
       1
 4 5
   10.0.14.4 from 10.0.14.4 (4.4.4.4)
     Origin IGP, metric 100, localpref 100, valid, external, best
 23 5
   10.0.13.3 from 10.0.13.3 (3.3.3.3)
     Origin IGP, metric 300, localpref 100, valid, external
 23 5
   10.0.12.2 from 10.0.12.2 (2.2.2.2)
     Origin IGP, metric 200, localpref 100, valid, external

Now, even though R4’s route is the newest and is not from the same AS as R2 and R3, its MED is included in the path selection, and it wins with the lowest value.
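
As a rough illustration of what the command changes, here is a small Python model of the top-down comparison with an always_compare_med switch. The class, function and field names are invented for this sketch, the age ranks are assumed from the order of the listing above, and all attributes before MED are assumed to tie.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    peer: str         # illustrative label for the advertising router
    neighbor_as: int  # first AS in the AS_PATH
    med: int
    age: int          # larger = older (assumed ranks)


def prefer(a, b, always_compare_med):
    same_as = a.neighbor_as == b.neighbor_as
    # With always-compare-med, MED is compared regardless of the neighboring AS.
    if (always_compare_med or same_as) and a.med != b.med:
        return a if a.med < b.med else b
    # MED skipped or equal: the older path wins.
    return a if a.age > b.age else b


def best_path(newest_first, always_compare_med=False):
    best = newest_first[0]
    for candidate in newest_first[1:]:
        best = prefer(best, candidate, always_compare_med)
    return best


# Order from the output above: R4 newest, then R3, then R2 oldest.
table = [
    Path("R4", 4, 100, age=1),
    Path("R3", 23, 300, age=2),
    Path("R2", 23, 200, age=3),
]
print(best_path(table, always_compare_med=False).peer)  # R2
print(best_path(table, always_compare_med=True).peer)   # R4
```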

Now let’s remove bgp always-compare-med and talk about bgp deterministic-med.

Deterministic MED groups the paths for a prefix by neighboring AS in the BGP table, regardless of the order in which they were received. It compares the paths inside each group first, and the best of each group then competes with the best of the other groups.

The reason to do this is that it eliminates the arbitrary behavior where the choice between routes from the same AS depends on which one happens to be the oldest.

R1(config)#router bgp 1
R1(config-router)#no bgp always-compare-med

R1#clear ip bgp *
*Mar  1 00:42:35.435: %BGP-5-ADJCHANGE: neighbor 10.0.12.2 Down User reset
*Mar  1 00:42:35.439: %BGP-5-ADJCHANGE: neighbor 10.0.13.3 Down User reset
*Mar  1 00:42:35.443: %BGP-5-ADJCHANGE: neighbor 10.0.14.4 Down User reset
*Mar  1 00:42:36.107: %BGP-5-ADJCHANGE: neighbor 10.0.12.2 Up
*Mar  1 00:42:36.739: %BGP-5-ADJCHANGE: neighbor 10.0.13.3 Up
*Mar  1 00:42:37.167: %BGP-5-ADJCHANGE: neighbor 10.0.14.4 Up

R1#show ip bgp 5.5.5.5
BGP routing table entry for 5.5.5.5/32, version 6
Paths: (3 available, best #3, table Default-IP-Routing-Table)
Flag: 0x4860
 Advertised to update-groups:
       1
 23 5
   10.0.12.2 from 10.0.12.2 (2.2.2.2)
     Origin IGP, metric 200, localpref 100, valid, external
 4 5
   10.0.14.4 from 10.0.14.4 (4.4.4.4)
     Origin IGP, metric 100, localpref 100, valid, external
 23 5
   10.0.13.3 from 10.0.13.3 (3.3.3.3)
     Origin IGP, metric 300, localpref 100, valid, external, best

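After removing bgp always-compare-med and clearing all sessions, R1 simply ends up preferring the oldest route, R3, even though R2 from the same AS advertises a lower MED: R2 lost to R4 on age, so it was never compared with R3 directly.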

Now let’s enable bgp deterministic-med

R1(config)#router bgp 1
R1(config-router)#bgp deterministic-med
R1(config-router)#end

R1#show ip bgp 5.5.5.5
BGP routing table entry for 5.5.5.5/32, version 9
Paths: (3 available, best #2, table Default-IP-Routing-Table)
Flag: 0x4840
 Advertised to update-groups:
       1
 4 5
   10.0.14.4 from 10.0.14.4 (4.4.4.4)
     Origin IGP, metric 100, localpref 100, valid, external
 23 5
   10.0.12.2 from 10.0.12.2 (2.2.2.2)
     Origin IGP, metric 200, localpref 100, valid, external, best
 23 5
   10.0.13.3 from 10.0.13.3 (3.3.3.3)
     Origin IGP, metric 300, localpref 100, valid, external

As you can see, right after we issued the command the route from R2 became the preferred exit for 5.5.5.5. Within AS23, R1 now always picks R2 for its lower MED, no matter how old the routes are.
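
To see why the grouping matters, the sketch below tries every possible arrival order for the three paths and records which router wins with and without grouping. It is a simplified model under stated assumptions: the names are made up for illustration, attributes before MED are assumed to tie, and the age of a path is just its position in the arrival order.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Path:
    peer: str         # illustrative label for the advertising router
    neighbor_as: int  # first AS in the AS_PATH
    med: int


def prefer(a, b, age):
    """Default comparison: MED only within one neighboring AS, else older wins."""
    if a.neighbor_as == b.neighbor_as and a.med != b.med:
        return a if a.med < b.med else b
    return a if age[a.peer] > age[b.peer] else b


def sequential_best(newest_first, age):
    best = newest_first[0]
    for candidate in newest_first[1:]:
        best = prefer(best, candidate, age)
    return best


def deterministic_best(newest_first, age):
    # Group paths by neighboring AS, pick a winner per group, then compare winners.
    groups = {}
    for p in newest_first:
        groups.setdefault(p.neighbor_as, []).append(p)
    winners = [sequential_best(g, age) for g in groups.values()]
    return sequential_best(winners, age)


paths = [Path("R2", 23, 200), Path("R3", 23, 300), Path("R4", 4, 100)]

default_winners, deterministic_winners = set(), set()
for order in permutations(paths):  # order[0] is the newest path
    age = {p.peer: i for i, p in enumerate(order)}  # later in list = older
    default_winners.add(sequential_best(list(order), age).peer)
    deterministic_winners.add(deterministic_best(list(order), age).peer)

print(sorted(default_winners))        # ['R2', 'R3', 'R4']
print(sorted(deterministic_winners))  # ['R2', 'R4']
```

In this model R3 can never win once the paths are grouped, because it always loses to R2 inside the AS23 group. The choice between the AS23 winner and R4 still depends on age, since MED is not compared across ASes unless always-compare-med is enabled as well.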

Finally, let’s enable bgp always-compare-med together with bgp deterministic-med

R1(config-router)#bgp always-compare-med

R1#clear ip bgp *
*Mar  1 00:56:55.059: %BGP-5-ADJCHANGE: neighbor 10.0.12.2 Down User reset
*Mar  1 00:56:55.067: %BGP-5-ADJCHANGE: neighbor 10.0.13.3 Down User reset
*Mar  1 00:56:55.071: %BGP-5-ADJCHANGE: neighbor 10.0.14.4 Down User reset
*Mar  1 00:56:55.447: %BGP-5-ADJCHANGE: neighbor 10.0.13.3 Up
*Mar  1 00:56:55.895: %BGP-5-ADJCHANGE: neighbor 10.0.12.2 Up
*Mar  1 00:56:56.099: %BGP-5-ADJCHANGE: neighbor 10.0.14.4 Up

R1#show ip bgp 5.5.5.5
BGP routing table entry for 5.5.5.5/32, version 2
Paths: (3 available, best #1, table Default-IP-Routing-Table)
Flag: 0x820
 Advertised to update-groups:
       1
 4 5
   10.0.14.4 from 10.0.14.4 (4.4.4.4)
     Origin IGP, metric 100, localpref 100, valid, external, best
 23 5
   10.0.12.2 from 10.0.12.2 (2.2.2.2)
     Origin IGP, metric 200, localpref 100, valid, external
 23 5
   10.0.13.3 from 10.0.13.3 (3.3.3.3)
     Origin IGP, metric 300, localpref 100, valid, external

Since deterministic MED is enabled along with always-compare MED, the best route of group one (which contains only R4) is compared with the best of group two (R2 and R3), which is R2. R4 wins because of its lower MED.

A few things to note before closing this post. Cisco recommends enabling deterministic MED in BGP deployments to eliminate any “randomness” when routers choose the best path.

always-compare-med needs an agreement between your domain and the service providers you connect to on how MED values are set. If you’re hooked up to two service providers and ISP-A, for example, decides to send you a lower MED, all traffic will be directed to ISP-A even though ISP-B might be the better exit for you.

Hopefully this cleared up the difference between those two commands.

Monday, September 8, 2014

Redistributing iBGP into IGP via bgp redistribute-internal command and the probable consequences


By default, the redistribution of iBGP routes into an IGP isn’t allowed on Cisco IOS. The reason is that redistributing BGP into an IGP can introduce several problems, some of which are:
  • If an internet router redistributes all 500,000 internet prefixes into the IGP, it will almost certainly crash the IGP routers, as IGPs aren’t designed to carry that many routes
  • Redistributing BGP routes into the IGP on a router in the middle of the network can cause a blackhole, since the routes lose the information about their originator and the redistributing router becomes the advertising router for them


In service provider networks, BGP runs everywhere, so there’s no need to redistribute BGP into the IGP (we can exclude MPLS L3VPNs, since that’s a different story).
In some networks, BGP runs only on the edge and the rest of the network runs only an IGP. Redistributing BGP into the IGP so that the internal network can reach destinations outside the AS may seem like a good solution, but it’s not. You can simply originate a default route from the edge routers into the IGP, and that will handle the outbound traffic of the internal networks easily and without hassle.
For the sake of argument, I’ll discuss how to enable the redistribution of iBGP into an IGP and how that can introduce a routing loop in the AS.

Let’s check this simple topology



AS2 is running OSPF as its IGP. R1 and R2 are eBGP neighbors, and R2 is an iBGP neighbor of both R3 and R4.

Now let’s check R3 and R4 BGP table for the network 1.1.1.1/32
R3#show ip bgp 1.1.1.1
BGP routing table entry for 1.1.1.1/32, version 2
Paths: (1 available, best #1, table Default-IP-Routing-Table)
  Not advertised to any peer
  1
    2.2.2.2 (metric 21) from 2.2.2.2 (2.2.2.2)
      Origin IGP, metric 0, localpref 100, valid, internal, best

R4#show ip bgp 1.1.1.1/32
BGP routing table entry for 1.1.1.1/32, version 17
Paths: (1 available, best #1, table Default-IP-Routing-Table)
  Not advertised to any peer
  1
    2.2.2.2 (metric 11) from 2.2.2.2 (2.2.2.2)
      Origin IGP, metric 0, localpref 100, valid, internal, best


Both R3 and R4 are pointing to R2 as the next-hop for the prefix 1.1.1.1/32 as expected. Now let’s try to redistribute the iBGP routes on R3

R3(config)#router ospf 1
R3(config-router)#redistribute bgp 2 subnets

Checking the OSPF database
R3#show ip ospf database

            OSPF Router with ID (3.3.3.3) (Process ID 1)

                Router Link States (Area 0)

Link ID         ADV Router      Age         Seq#       Checksum Link count
2.2.2.2         2.2.2.2         136         0x80000008 0x00E723 4
3.3.3.3         3.3.3.3         42          0x80000009 0x0046B4 4
4.4.4.4         4.4.4.4         1996        0x80000006 0x0087F3 3

                Net Link States (Area 0)

Link ID         ADV Router      Age         Seq#       Checksum
10.2.4.2        2.2.2.2         136         0x80000003 0x009174
10.3.4.3        3.3.3.3         8           0x80000003 0x007F7C

The route hasn’t been redistributed, even though the redistribute command is clearly present under the router ospf configuration
router ospf 1
 log-adjacency-changes
 redistribute bgp 2 subnets

Now let’s enable the redistribution of iBGP routes into IGPs
R3(config)#router bgp 2
R3(config-router)#bgp redistribute-internal

Checking the OSPF database again
R3#show ip ospf database 

            OSPF Router with ID (3.3.3.3) (Process ID 1)

                Router Link States (Area 0)

Link ID         ADV Router      Age         Seq#       Checksum Link count
2.2.2.2         2.2.2.2         332         0x80000008 0x00E723 4
3.3.3.3         3.3.3.3         238         0x80000009 0x0046B4 4
4.4.4.4         4.4.4.4         176         0x80000007 0x0085F4 3

                Net Link States (Area 0)

Link ID         ADV Router      Age         Seq#       Checksum
10.2.4.2        2.2.2.2         332         0x80000003 0x009174
10.3.4.3        3.3.3.3         204         0x80000003 0x007F7C

                Type-5 AS External Link States

Link ID         ADV Router      Age         Seq#       Checksum Tag
1.1.1.1         3.3.3.3         41          0x80000001 0x00B2EF 1


That’s more like it. The prefix 1.1.1.1/32 is now in the OSPF database and the advertising router is R3 (3.3.3.3).
Up to now everything seems fine, but in fact it isn’t. If we try to ping or traceroute 1.1.1.1 from R3 or R4, it fails.

R3#ping 1.1.1.1

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 1.1.1.1, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)


R3#traceroute 1.1.1.1

Type escape sequence to abort.
Tracing the route to 1.1.1.1

  1 10.3.4.4 64 msec 52 msec 20 msec
  2 10.3.4.3 24 msec 44 msec 20 msec
  3  *  *
    10.3.4.4 84 msec
  4  *  *  *
  5  *  116 msec *
  6  *  *  *
  7  *  124 msec *
  8  *  *
    10.3.4.3 124 msec
  9  *  *  *
 10  *  *  *
 11  *  *
    10.3.4.4 144 msec
 12  *




R4#ping 1.1.1.1

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 1.1.1.1, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)


R4#traceroute 1.1.1.1 timeout 1

Type escape sequence to abort.
Tracing the route to 1.1.1.1

  1 10.3.4.3 44 msec 36 msec 20 msec
  2 10.3.4.4 40 msec 40 msec 24 msec
  3  *  *
    10.3.4.3 60 msec
  4  *
    10.3.4.4 84 msec *
  5  *  *  *
  6  *  72 msec *
  7  *  *  *
  8  *  *  124 msec
  9  *  *  *
 10  *  *  *
 11  *  *  *

 12 10.3.4.4 168 msec *  *

We have a routing loop on our hands: packets destined to 1.1.1.1 keep bouncing between R3 and R4. R3 advertises the network into the OSPF domain as an external route, and OSPF’s AD of 110 is preferred over iBGP’s AD of 200, so R4 installs the OSPF route pointing at R3. R3 itself still uses the iBGP route, whose next hop is R2, and its shortest path to R2 goes through R4 (metric 21 compared to 61). So R3 sends the packet to R4, and R4, following the OSPF external route, sends it straight back to R3. That’s the routing loop.

Here’s how to confirm it

R3#show ip route 1.1.1.1
Routing entry for 1.1.1.1/32
  Known via "bgp 2", distance 200, metric 0
  Tag 1, type internal
  Redistributing via ospf 1
  Advertised by ospf 1 subnets
  Last update from 2.2.2.2 00:14:25 ago
  Routing Descriptor Blocks:
  * 2.2.2.2, from 2.2.2.2, 00:14:25 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 1

R3#show ip route 2.2.2.2
Routing entry for 2.2.2.2/32
  Known via "ospf 1", distance 110, metric 21, type intra area
  Last update from 10.3.4.4 on FastEthernet0/0, 01:23:23 ago
  Routing Descriptor Blocks:
  * 10.3.4.4, from 2.2.2.2, 01:23:23 ago, via FastEthernet0/0
      Route metric is 21, traffic share count is 1

R4#show ip route 1.1.1.1
Routing entry for 1.1.1.1/32
  Known via "ospf 1", distance 110, metric 1
  Tag 1, type extern 2, forward metric 10
  Last update from 10.3.4.3 on FastEthernet0/0, 00:16:49 ago
  Routing Descriptor Blocks:
  * 10.3.4.3, from 3.3.3.3, 00:16:49 ago, via FastEthernet0/0
      Route metric is 1, traffic share count is 1
      Route tag 1
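
The three routing entries above are enough to reproduce the loop on paper. The Python sketch below is only a toy model: the table layout and function names are made up for illustration, next hops are written as router names instead of interface addresses, and R4 reaching 2.2.2.2 directly through R2 is inferred from its metric of 11 rather than shown in the output.

```python
ROUTERS = {"R2", "R3", "R4"}

# destination -> candidate routes as (source, administrative distance, next hop).
# A next hop is either a directly connected router or an address to resolve.
BEFORE = {
    "R3": {"1.1.1.1": [("ibgp", 200, "2.2.2.2")],
           "2.2.2.2": [("ospf", 110, "R4")]},
    "R4": {"1.1.1.1": [("ibgp", 200, "2.2.2.2")],
           "2.2.2.2": [("ospf", 110, "R2")]},
}
# After redistribution on R3, R4 also learns 1.1.1.1 as an OSPF external from R3.
AFTER = {
    "R3": BEFORE["R3"],
    "R4": {"1.1.1.1": [("ospf-e2", 110, "R3"), ("ibgp", 200, "2.2.2.2")],
           "2.2.2.2": [("ospf", 110, "R2")]},
}


def next_router(tables, router, dest):
    """Use the lowest-AD route, then resolve its next hop recursively."""
    _, _, hop = min(tables[router][dest], key=lambda r: r[1])
    return hop if hop in ROUTERS else next_router(tables, router, hop)


def trace(tables, start, dest, exit_router="R2"):
    """Follow the forwarding path until the border router or a repeat visit."""
    path = [start]
    while path[-1] != exit_router:
        hop = next_router(tables, path[-1], dest)
        path.append(hop)
        if path.count(hop) > 1:
            return path, True  # same router visited twice: forwarding loop
    return path, False


print(trace(BEFORE, "R3", "1.1.1.1"))  # (['R3', 'R4', 'R2'], False)
print(trace(AFTER, "R3", "1.1.1.1"))   # (['R3', 'R4', 'R3'], True)
```

The only difference between the two tables is the OSPF external route on R4, which wins on administrative distance and points back at R3.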


To fix that, you have several options, some of which are:
  • Redistribute BGP into the IGP on R2, i.e. the border router, instead of on R3
  • Reduce the IGP cost between R3 (the redistributing router) and the border router, so that R3’s best path to R2 no longer goes through R4. Be extremely careful, though: the cost change propagates and will most likely affect path selection on every router in your IGP domain



Hopefully this cleared up the usage of the bgp redistribute-internal command and the probable consequences of using it.