OSPF: time to get rid of the totally not so stubby legacy (posted 2022-05-12)
Recently, I was looking through some networking certification material. A very large part of it was about OSPF. That's fair, OSPF is probably the most widely used routing protocol in IP networks. But the poor students were subjected to a relentless sequence of increasingly baroquely named features: stub areas, not-so-stubby areas, totally stubby areas, culminating in totally not-so-stubby areas.
Can we please get rid of some of that legacy? And if not from the standard documents or the router implementations, then at least from the certification requirements and training materials?
Shortest path first, but not so fast
The Open Shortest Path First routing protocol (OSPF, Internet Standard 54) was first defined in RFC 1131 in 1989. So in internet time, OSPF is truly ancient. The base OSPFv2 specification, RFC 2328, is over 200 pages, with additional extensions in separate documents spanning the early 1990s to the late 2010s.
OSPF is powered by Edsger Dijkstra's shortest path first algorithm. SPF is a relatively efficient algorithm for finding the shortest path between two places, whether in the real world or in a network. Still, in a large network there are a lot of paths to check before you can be sure you've found the shortest one. The problem is that the work grows faster than the network itself. To pick illustrative numbers: say a network that's 10 times larger makes SPF run 60 times as long. Then if a router in a network with 100 routers needs a second to do its SPF calculations after an update, in a network with 1000 routers that takes a minute, and in a network with 10,000 routers an hour.
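For readers who haven't met the algorithm, here is a minimal Python sketch of Dijkstra's SPF over a toy link-state database. The topology, router names and link costs are made up, and the function name `spf` is hypothetical; real OSPF implementations build the graph from LSAs and keep equal-cost paths, which this sketch ignores.

```python
import heapq


def spf(links, root):
    """Dijkstra's shortest path first from root over a link-state database.

    links maps each router to {neighbor: link cost}. Returns (dist, first_hop):
    the lowest total cost to each reachable router, and the neighbor of root
    through which that lowest-cost path leaves.
    """
    dist = {root: 0}
    first_hop = {}
    settled = set()
    queue = [(0, root, None)]  # (cost so far, router, first hop out of root)
    while queue:
        cost, router, hop = heapq.heappop(queue)
        if router in settled:
            continue  # stale entry: a cheaper path to this router was already found
        settled.add(router)
        if hop is not None:
            first_hop[router] = hop
        for neighbor, link_cost in links.get(router, {}).items():
            new_cost = cost + link_cost
            if neighbor not in settled and new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                # A path from root to one of its direct neighbors starts with that neighbor.
                next_hop = hop if hop is not None else neighbor
                heapq.heappush(queue, (new_cost, neighbor, next_hop))
    return dist, first_hop


# Hypothetical topology with symmetric link costs.
links = {
    "A": {"B": 10, "C": 1},
    "B": {"A": 10, "C": 1, "D": 1},
    "C": {"A": 1, "B": 1, "D": 5},
    "D": {"B": 1, "C": 5},
}
dist, first_hop = spf(links, "A")
print(dist["D"], first_hop["D"])  # 3 C
```

On this example, A reaches D at a total cost of 3 by way of C and B, even though C has a direct link to D, because that link costs 5.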
So in order to make OSPF useful in large networks, you can split your network into different areas. The SPF calculations are then confined to the routers within each area. So rather than calculate SPF over a 10,000-router network, you could have 100 areas with 100 routers each. Routers that connect two areas then have to calculate SPF over 100 routers for each of their two areas: 2 seconds rather than an hour's worth of SPF calculations.
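The arithmetic behind the last two paragraphs fits in a short Python sketch. The factor of 60 per tenfold increase and the one-second baseline are the illustrative numbers from the text, not properties of Dijkstra's algorithm, and the names `spf_seconds`, `BASE_ROUTERS` and friends are made up here.

```python
import math

# Illustrative model from the text (an assumption, not a measurement):
# 100 routers take 1 second, and each tenfold increase multiplies that by 60.
BASE_ROUTERS = 100
BASE_SECONDS = 1.0
GROWTH_PER_TENFOLD = 60


def spf_seconds(routers):
    """Estimated time for one SPF run over an area with this many routers."""
    tenfolds = math.log10(routers / BASE_ROUTERS)
    return BASE_SECONDS * GROWTH_PER_TENFOLD ** tenfolds


# One flat area with all 10,000 routers.
flat = spf_seconds(10_000)
# 100 areas of 100 routers: a border router runs SPF once for each of its two areas.
border_router = 2 * spf_seconds(100)

print(f"flat: {flat:.0f} s, border router: {border_router:.0f} s")
```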
But if each of those 10,000 routers still injects two, three or four address blocks into OSPF, the OSPF database will have something like 30,000 entries. So now updating and remembering all those address blocks becomes a bottleneck. Solution: summarize address blocks at area boundaries. So if routers in area 35 advertise address blocks 10.35.1.x, 10.35.2.x, … 10.35.95.x, rather than push out all that information to all 10,000 routers throughout the network, the area border routers for area 35 simply say “10.35.x.x” to the rest of the network.
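Here is the area 35 example as a minimal Python sketch using the standard `ipaddress` module, reading the text's “10.35.1.x” as 10.35.1.0/24 and “10.35.x.x” as 10.35.0.0/16. It only checks the address arithmetic; it doesn't model how the area border routers actually flood the summary.

```python
import ipaddress

# Hypothetical area 35: its routers advertise 10.35.1.0/24 ... 10.35.95.0/24.
area_35_blocks = [ipaddress.ip_network(f"10.35.{i}.0/24") for i in range(1, 96)]

# The summary the area border routers advertise to the rest of the network.
summary = ipaddress.ip_network("10.35.0.0/16")

# Every block inside the area falls within the summary range, so the rest of
# the network can reach all of them through this single advertisement.
assert all(block.subnet_of(summary) for block in area_35_blocks)

print(f"{len(area_35_blocks)} advertisements inside area 35, 1 outside it")
```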
Even better: if an area only connects to the “backbone” area (area 0) and doesn't import any routing information from outside OSPF, it doesn't need to know about all the external routes in the rest of the network. So let's make it a stub area and give it a default route to reach the rest of the world.
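The payoff of the default route shows up in an ordinary routing lookup. This minimal Python sketch assumes a made-up routing table for one router in a stub area; all prefixes are hypothetical. Anything the router has no more specific route for, such as a destination outside OSPF, matches 0.0.0.0/0 and is sent toward the area border router.

```python
import ipaddress

# Hypothetical routing table of one router inside a stub area.
routes = {
    ipaddress.ip_network("10.35.1.0/24"): "intra-area",
    ipaddress.ip_network("10.35.2.0/24"): "intra-area",
    ipaddress.ip_network("10.36.0.0/16"): "inter-area summary",
    ipaddress.ip_network("0.0.0.0/0"): "default route from the area border router",
}


def lookup(destination):
    """Longest-prefix match: the most specific route containing the address wins."""
    address = ipaddress.ip_address(destination)
    matches = [network for network in routes if address in network]
    return routes[max(matches, key=lambda network: network.prefixlen)]


print(lookup("10.35.2.7"))  # intra-area
print(lookup("10.36.4.1"))  # inter-area summary
print(lookup("192.0.2.1"))  # default route from the area border router
```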
Variations on a stubby theme
Stub areas still receive routing information about destinations in other OSPF areas. We can get rid of that too, leaving only the default route, and then we have a totally stubby area.
On the other hand, maybe we want to import external routing information into OSPF even in our stub area, and then propagate that external information to other areas. This makes for a not-so-stubby area.
And who said you can't have your cake and eat it: let's make our totally stubby area not-so-stubby, and we'll have a totally not-so-stubby area, guaranteeing certification income for years to come. (See Wikipedia's page on OSPF for more details.)
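Stripped of the names, the stubby menagerie comes down to three yes/no questions about what an area accepts. The Python sketch below is a simplified summary of OSPFv2 behavior, not a quote from any standard or vendor manual: it ignores virtual links and the details of how each flavor gets its default route, and the names `AreaType`, `AREA_TYPES` and `names_for` are made up for illustration.

```python
from typing import NamedTuple


class AreaType(NamedTuple):
    inter_area: bool    # receives summary routes (type 3 LSAs) for other areas' prefixes
    external: bool      # receives external routes (type 5 LSAs) from elsewhere
    local_import: bool  # routers inside may import external routes themselves


# Simplified model; routes imported inside a not-so-stubby area travel as
# type 7 LSAs and are translated to type 5 by the area border router.
AREA_TYPES = {
    "normal":                AreaType(inter_area=True,  external=True,  local_import=True),
    "stub":                  AreaType(inter_area=True,  external=False, local_import=False),
    "totally stubby":        AreaType(inter_area=False, external=False, local_import=False),
    "not-so-stubby":         AreaType(inter_area=True,  external=False, local_import=True),
    "totally not-so-stubby": AreaType(inter_area=False, external=False, local_import=True),
}


def names_for(inter_area, external, local_import):
    """Return the area type(s) matching a combination of the three switches."""
    wanted = AreaType(inter_area, external, local_import)
    return [name for name, props in AREA_TYPES.items() if props == wanted]


print(names_for(inter_area=False, external=False, local_import=True))
# ['totally not-so-stubby']
```

Any combination without a name, such as accepting external routes but not inter-area ones, returns an empty list.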
Spring cleaning
As protocol designers, we're really good at adding more capabilities, more options. As network architects and engineers, we're really good at adding complexity to make our networks do something they won't do out of the box. But we can't just keep adding options and complexity without ever taking any of it away. At least not if we want to have a fighting chance at teaching our craft to the next generation so we can retire at some point.
Our routers/computers are now 1000 times as fast as the 68030-based routers/computers back in 1990, and have 1000 times the memory. OSPF implementations also support incremental SPF, which recalculates only the part of the shortest-path tree affected by a change.
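Applying that speedup to the illustrative SPF timings from earlier is simple arithmetic, sketched here in Python. Treating those timings as 1990-era figures is an assumption made for this example, and the sketch leaves out incremental SPF, which can reduce the work further.

```python
# Illustrative SPF run times from earlier in the text, assumed here to describe
# 1990-era hardware: routers in one area -> seconds per SPF run.
SPF_SECONDS_1990 = {100: 1, 1_000: 60, 10_000: 3_600}
SPEEDUP = 1_000  # "1000 times as fast"

# Same workload on today's hardware, ignoring incremental SPF.
spf_seconds_today = {n: s / SPEEDUP for n, s in SPF_SECONDS_1990.items()}
for routers, seconds in spf_seconds_today.items():
    print(f"{routers:>6} routers: {seconds:g} s")
```

Under those assumptions, the one-hour SPF run for 10,000 routers shrinks to 3.6 seconds.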
10,000 routers in one area will melt the network operations center long before the SPF calculations melt the router CPUs. I've personally worked on a network with 600 routers in area 0 back in 1999. SPF performance was the least of our concerns.
So I'm calling it: OSPF areas and summarization are now legacy. New and current OSPF networks should just use a flat area 0 rather than try to micromanage the information flow between areas. Students should no longer have to learn how areas work; the various flavors of stubbiness should only come up as an example of humorous naming that doesn't age well.