I didn't use to be all that fond of BGP as a routing protocol: it just struck me as too complicated and scary. Then I had to spend a lot of time with it and it became a firm favourite.
In fact I spent so much time with it that I think my husband was going to cite it as a co-respondent in the divorce case! (Just joking: he probably likes it more than me. More than I like it, I should say, not more than he likes me, obviously. I think.)
BGP is without doubt the most scalable and controllable routing protocol around. But with the best will in the world, nobody has ever described it as particularly responsive. You could worry yourself into an early grave waiting for it to do something sometimes.
All that is changing. There have been various tweaks over time to speed up its failure detection and reconvergence mechanisms, from event-driven failure detection and recalculation (rather than waiting for scheduled tasks to run) to narrowing down the bits of the BGP table that have to be reworked when something changes.
But the problem BGP has always had is that it just carries so many prefixes. Over 350,000 for an Internet routing table. No IGP could cope with that. Something that big just has to take time, and if its reconvergence time is directly related to the number of prefixes it has to look after, it’s going to be near impossible to make it very fast.
If you’re using BGP within your enterprise, it is possible to get subsecond convergence. That wouldn't have been possible several years ago, but it's do-able now. Even with, say, a few thousand routes, a few seconds isn’t out of the question. The trick has been to scale that, and the goal has been some sort of deterministic, prefix-independent convergence.
Enter BGP Prefix Independent Convergence (PIC). Actually, it’s not that new: it was presented at NANOG (North American Network Operators’ Group) back in 2007 by a clever chap from Cisco. But it’s no longer found only on huge, expensive Service Provider routers; it's now available on the sort of kit the rest of us can afford too.
The idea is that rather than tie every BGP destination prefix to the interface out which it can be reached, you tie that prefix (i.e. network) to a BGP next hop which is in turn linked to an IGP next hop (which points at an interface).
It sounds a bit more convoluted, but the structure is now hierarchical. If an interface fails, instead of having to update the forwarding table with a new interface for every route (which will take some time if you have several hundred thousand of them), you now only have to update the IGP next hop-to-interface piece. If you can still reach the BGP next hop via another path, that’s the only part you need to update in order to reroute the traffic.
That part will reconverge in the time your IGP takes (typically a few seconds, or even subsecond), and it doesn’t matter how many BGP prefixes are involved: they all still point at the same BGP next hop, and that hasn’t changed.
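To make the difference concrete, here is a minimal Python sketch, assuming a toy model rather than any vendor's real FIB: a flat table where each prefix stores its own outgoing interface, versus a hierarchical one where every prefix points at a shared BGP next hop object, which in turn points at a shared IGP next hop holding the interface. The interface names (Gi0/1, Gi0/2), the documentation address 192.0.2.1 and the prefix labels are all hypothetical; the 300,000 figure just echoes the Internet-table scale mentioned earlier. The sketch counts how many entries each design has to rewrite when one interface fails.

```python
from dataclasses import dataclass

@dataclass
class IgpNextHop:
    """Shared IGP next hop: the only place the outgoing interface is stored."""
    interface: str

@dataclass
class BgpNextHop:
    """Shared BGP next hop, resolved through an IGP next hop."""
    address: str
    igp: IgpNextHop

N_PREFIXES = 300_000
prefixes = [f"prefix-{i}" for i in range(N_PREFIXES)]  # hypothetical labels

# Flat FIB: every prefix stores its own outgoing interface.
flat_fib = {p: "Gi0/1" for p in prefixes}

# Hierarchical FIB: prefix -> shared BGP next hop -> shared IGP next hop.
igp_nh = IgpNextHop("Gi0/1")
bgp_nh = BgpNextHop("192.0.2.1", igp_nh)
hier_fib = {p: bgp_nh for p in prefixes}

def repair_flat(fib, failed, backup):
    """Rewrite every entry that used the failed interface; return the write count."""
    writes = 0
    for p, intf in fib.items():
        if intf == failed:
            fib[p] = backup  # replacing a value (not adding keys) is safe mid-iteration
            writes += 1
    return writes

def repair_hierarchical(igp_next_hops, failed, backup):
    """Rewrite only the shared IGP next hops that used the failed interface."""
    writes = 0
    for nh in igp_next_hops:
        if nh.interface == failed:
            nh.interface = backup
            writes += 1
    return writes

print(repair_flat(flat_fib, "Gi0/1", "Gi0/2"))          # 300000
print(repair_hierarchical([igp_nh], "Gi0/1", "Gi0/2"))  # 1
print(hier_fib["prefix-42"].igp.interface)              # Gi0/2
```

The flat repair touches one entry per prefix, so its cost grows with the table; the hierarchical repair touches one shared object, and every prefix follows it at once. Real routers do this with hardware path lists rather than Python objects, but the shape of the saving is the same.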
The software will then carry on and do all its proper reconvergence tasks to pick the best path, since the path the traffic is now using may not be the optimal one. But it can do that at its normal (not hugely speedy) rate without disrupting anything. The important thing is that the traffic has started flowing again with minimal disruption.
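The two phases can be sketched in the same toy style. This is a minimal illustration under heavy assumptions: the prefixes, next hop addresses, interfaces and IGP costs are hypothetical, and best-path selection is reduced to "lowest IGP cost to the BGP next hop", whereas the real BGP decision process compares weight, local preference, AS-path length, origin, MED and more before it gets to IGP cost. Phase 1 patches the one shared IGP entry so traffic keeps flowing; phase 2 walks every prefix and re-runs selection at its own pace.

```python
from dataclasses import dataclass

@dataclass
class IgpPath:
    """Shared IGP resolution of one BGP next hop: outgoing interface + cost."""
    interface: str
    cost: int

# Hypothetical IGP state for two BGP next hops.
igp = {
    "192.0.2.1": IgpPath("Gi0/1", 10),
    "192.0.2.2": IgpPath("Gi0/3", 20),
}

# Hypothetical candidate BGP next hops for each prefix.
candidates = {f"10.0.{i}.0/24": ["192.0.2.1", "192.0.2.2"] for i in range(4)}

def best_next_hop(prefix):
    # Heavily simplified: only the IGP cost to the next hop is compared.
    return min(candidates[prefix], key=lambda nh: igp[nh].cost)

fib = {p: best_next_hop(p) for p in candidates}  # all prefer 192.0.2.1 at first

def fast_repair(next_hop, interface, cost):
    """Phase 1: update the one shared IGP entry; every prefix follows it at once."""
    igp[next_hop].interface = interface
    igp[next_hop].cost = cost

def background_reconverge():
    """Phase 2: re-run best-path selection per prefix; return how many moved."""
    changed = 0
    for p in fib:
        nh = best_next_hop(p)
        if nh != fib[p]:
            fib[p] = nh
            changed += 1
    return changed

# Gi0/1 fails; the IGP still reaches 192.0.2.1, but via Gi0/2 at cost 30.
fast_repair("192.0.2.1", "Gi0/2", 30)
print({igp[fib[p]].interface for p in fib})  # {'Gi0/2'}: traffic already flowing
# Later, the slower pass finds 192.0.2.2 (cost 20) is now better for all prefixes.
print(background_reconverge())               # 4
print({igp[fib[p]].interface for p in fib})  # {'Gi0/3'}
```

Traffic is back on a working (if suboptimal) path the moment phase 1 finishes; phase 2 only tidies up the choice of best path, so how long it takes no longer decides how long the outage lasts.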
Tests have shown that you can get connectivity back in a BGP network in a couple of hundred milliseconds whether you have 30 BGP routes or 300,000, whereas without this technique you’d have been looking at between 10 and 100 seconds for more than, say, 100,000 routes. And the beauty of it is that it doesn’t matter how many routes you have.
Okay, if you’re not a Service Provider, perhaps you don’t care. But if you use a Service Provider network, it’s nice to know that there may be things they can do in the future to speed up your end-to-end convergence.
And I’ll have to stop making inane jokes about BGP convergence and watching paint dry.