When LinkedIn announced their Project Falco I knew exactly what one of my future Software Gone Wild podcasts would be: a chat with Russ White (Mr. CCDE, now network architect @ LinkedIn). It took us a long while (and then the summer break intervened) but I finally got it published: Episode 62 is waiting for you. —ipspace
What about I2RS performance?
The first post in this series provides a basic overview of I2RS; there I used a simple diagram to illustrate how I2RS interacts with the RIB—
One question that comes to mind when looking at a data flow like this (or rather should come to mind!) is what kind of performance this setup will provide. Before diving into the answer to this question, though, perhaps it’s important to ask a different question—what kind of performance do you really need? There are (at least) two distinct performance profiles in routing—the time it takes to initially start up a routing peer, and the time it takes to converge on a single topology and/or route change. In reality, this second profile can be further broken down into multiple profiles (with or without an equal cost path, with or without a loop free alternate, etc.), but for our purposes I’ll just deal with the two broad categories here.
If your first instinct is to say that initial convergence time doesn’t matter, go back and review the recent Delta Airlines outage carefully. And then read about how Facebook shuts down entire data centers to learn what happens, and think about it some more. Keep thinking about it until you are convinced that initial convergence time really matters. 🙂 It’s a matter of “when,” not “if,” where major outages like this are concerned; if you think your network taking on the order of tens of minutes (or hours) to perform initial convergence so applications can start spinning back up is okay, then you’re just flat wrong.
How fast is fast enough for initial convergence? Let’s assume we have a moderately sized data center fabric, or larger network, with something on the order of 50,000 routes in the table. If your solution can install on the order of 8,000 routes in ten seconds in a lab test (as a recently tested system did), then you’re looking at around a minute to converge on 50,000 routes in a lab. I don’t know what the actual ratio is, but I’d guess the “real world” has at least a doubling effect on route convergence times, so two minutes. Are you okay with that?
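The arithmetic behind those numbers can be sketched in a few lines; the 2x “real world” factor is my guess from the text, not a measured value:

```python
# Back-of-the-envelope convergence-time estimate using the numbers from
# the text: ~8,000 routes installed per ten seconds in a lab test, a
# 50,000-route table, and a guessed 2x "real world" penalty.

def convergence_estimate(total_routes, lab_routes, lab_seconds, real_world_factor=2.0):
    """Scale a lab-measured install rate up to a full routing table."""
    rate = lab_routes / lab_seconds      # routes installed per second
    lab_time = total_routes / rate       # seconds to converge in the lab
    return lab_time, lab_time * real_world_factor

lab, real = convergence_estimate(50_000, 8_000, 10)
print(lab, real)  # 62.5 125.0 -- about a minute in the lab, two in the wild
```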
To be honest, I’m not. I’d want something more like ten seconds to converge on 50,000 routes in the real world (not in a lab). Let’s think about what it takes to get there. In the image just above, working from a routing protocol (not an I2RS object), we’d need to do—
- Receive the routing information
- Calculate the best path(s)
- Install the route into the RIB
- The RIB needs to arbitrate between multiple best paths supplied by protocols
- The RIB then collects the layer 2 header rewrite information
- The RIB then installs the information into the FIB
- The FIB, using magic, pushes the entry to the forwarding ASIC
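The steps above can be sketched as a toy RIB; every name here is invented for illustration, and real RIB arbitration (administrative distance, recursive next hop resolution, callbacks into the protocol) is far more involved:

```python
# Hypothetical sketch of the install pipeline above; all names invented.
from dataclasses import dataclass, field

@dataclass
class Route:
    prefix: str
    next_hop: str
    distance: int   # used by the RIB to arbitrate between protocols

@dataclass
class RIB:
    routes: dict = field(default_factory=dict)
    fib: dict = field(default_factory=dict)

    def install(self, route: Route) -> str:
        # Arbitrate: keep the candidate with the best (lowest) distance.
        best = self.routes.get(route.prefix)
        if best is None or route.distance < best.distance:
            self.routes[route.prefix] = route
            rewrite = self.layer2_rewrite(route.next_hop)       # collect L2 header info
            self.fib[route.prefix] = (route.next_hop, rewrite)  # push toward the ASIC
            return "installed"
        return "overridden"   # a result the protocol may need a callback for

    def layer2_rewrite(self, next_hop: str) -> str:
        # Stand-in for resolving the next hop's layer 2 header (ARP/ND).
        return f"rewrite-for-{next_hop}"

rib = RIB()
print(rib.install(Route("10.0.0.0/24", "192.0.2.1", 20)))   # installed
print(rib.install(Route("10.0.0.0/24", "192.0.2.2", 200)))  # overridden
```

Even in this toy version, one “route install” fans out into arbitration, rewrite collection, a FIB push, and a result the protocol has to process: several transactions per route.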
What is the point of examining this process? To realize that a single route install is not, in fact, a single operation performed by the RIB. Rather, there are several operations here, including potential callbacks from the RIB to the protocol (what happens when BGP installs a route for which the next hop isn’t available, but then becomes available later on, for instance?). The RIB, and any API between the RIB and the protocol, needs to operate at about 3 to 4 times the speed at which you expect to be able to actually install routes.
What does this mean for I2RS? To install, say, 50,000 routes in 10 seconds at the 4x transaction multiplier described above, there need to be around 200,000 transactions in those 10 seconds, or about 20,000 transactions per second. Now, consider the following illustration of the entire data path the I2RS controller needs to feed routing information through—
For any route to be installed in the RIB from the I2RS controller, it must be:
- Calculated based on current information
- Marshalled, which includes pouring it into the YANG format, potentially pushing it to JSON, and placing it into a packet
- Transported, which includes serialization delay, queuing, and the like
- Unmarshalled, or rather locally copied from the YANG format into a format that can be installed into the RIB
- Route arbitration and layer 2 rewrite information calculation performed
- Any response, such as an “install successful,” or “route overridden” returned through the same process to the I2RS controller
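To make the 20,000 transactions per second figure concrete: that leaves a budget of 50 microseconds per transaction for the entire loop above. A rough sketch of what just the marshal/unmarshal steps cost follows; the route fields are invented for illustration (real I2RS would use YANG-modeled data, not this ad hoc dictionary), and the measured time will vary by machine:

```python
import json
import time

# At 20,000 transactions/second, the whole marshal-transport-unmarshal-
# install-respond loop has a budget of 50 microseconds per transaction.
budget_us = 1_000_000 / 20_000
print(budget_us)  # 50.0

# Rough cost of just the JSON (un)marshal steps for one route object.
route = {"prefix": "10.0.0.0/24", "next-hop": "192.0.2.1", "metric": 20}
start = time.perf_counter()
for _ in range(20_000):
    wire = json.dumps(route)       # marshal
    decoded = json.loads(wire)     # unmarshal on the far side
elapsed_us = (time.perf_counter() - start) / 20_000 * 1_000_000
print(f"~{elapsed_us:.1f} us per marshal/unmarshal round trip")
```

Whatever the exact numbers on your hardware, the point stands: serialization alone eats a meaningful slice of a 50 microsecond budget before transport, arbitration, or the response path are even counted.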
It is, of course, possible to do all of this 20,000 times per second, especially with a lot of heavy optimization in a well designed and well operated network. But not all networks operate under ideal conditions all the time, so perhaps replacing the entire control plane with a remote controller isn’t the best idea in the world.
Luckily, I2RS wasn’t designed to replace the entire control plane, but rather to augment it. To explain, the next post will begin considering some use cases where I2RS can be useful.
Networking is often a “best effort” type of configuration. We monkey around with something until it works, then roll it into production and hope it holds. As we keep building more patches on top of patches or try to implement new features that require something to be disabled or bypassed, that creates a house of cards that is only as strong as the first stiff wind. It’s far too easy to cause a network to fall over because of a change in a routing table or a series of bad decisions that aren’t enough to cause chaos unless done together. —Networking Nerd
But what are we to do about it? Tom’s Take is that we need to push back on applications. This, also, I completely agree with. But this only brings us to another problem—how do we make the case that applications need to be rewritten to work on a simpler network? The simple answer is—let’s teach coders how networks really work, so they can figure out how to better code to the environment in which their applications live. Let me be helpful here—I’ve been working on networks since somewhere around 1986, and on computers and electronics since before then. When I first started in network engineering, I could still wander up in the hills and see Noah’s Ark…
And in all that time, “we,” as in network engineers, have been trying to teach coders how to make their applications run on the network.
Maybe—just maybe—this quest isn’t actually going anyplace. Maybe we convince a few coders here and there. And then they’re replaced by a new generation of coders (just like old network engineers are replaced with new ones every now and again) who never learned those lessons, and want to do really cool stuff, and see the network engineering team as a bunch of old fuddy-duddies who don’t know how to get things done.
The root of this problem isn’t coders. It’s the people who run the businesses coders work for. In most places, IT is just an inconvenience—something I must use to get my “real job” done, rather than an invaluable tool that enables me. Selling X is the focus, IT just gets in the way by making me jump through hoops to get to the information I need to sell X, and by making me put stuff in that I don’t care about once I’ve sold X. In this world, the network is one layer back from the actual system I need to work so I can get my “real work” done, so it’s an enigma wrapped in a painful GUI I hate to use but I must.
This is how networks are really seen by folks who use our systems. Network engineering is a bunch of whiners piled on top of a bunch of folks who make things that make my job harder.
If teaching coders isn’t going to solve the problem, then what do we do?
This is what I think we need to do: we need to go to where the money is. Applications aren’t bought by coders, just as networks aren’t bought by network engineers. When your manager comes to you and says, “we need a new network,” do they also tell you which gear to buy, and how to configure it? That’s not generally my experience (just remember to keep your managers hermetically sealed off from sales engineers and in-flight “CIO” magazines, and your life will be easier). The same applies to coders—when someone says, “I want an application that does this,” they don’t specify how it should work, just that it should. Which means the coder is going to take the shortest of short cuts to make it work, knowing that when it comes down to deployment day, the exec is going to push on the network people to “make it work.”
Why does this land on the network folks? First, we always geek out and say “yes.” Second, we don’t know how to effectively say “no.” We don’t have any sort of language, or process, etc., to say “no” with.
This is what I think we need to change. We need to learn how to talk in terms of tradeoffs and complexity. We need to figure out how to say things like, “sure, I can deploy that, and it won’t cost a penny today—but it will cost you in downtime and operational costs in the future.” As a field, we don’t even have the language for this sort of discussion, much less any way to measure it and make it real. I’ve worked in the area of complexity for several years now because I believe this is where that language is going to come from. I don’t think we’re there yet, but I do think understanding complexity is the right tail to grab when trying to get to this dog. The applications folks already know all about this stuff in their own world; they talk a lot about modularity, APIs, and the like. This isn’t to say they always get it right, but at least they’re talking about it. We’re not even talking about it—instead, we’re building patch on patch, feature on feature, with little thought about the technical debt we’re creating.
Only when we can talk complexity against complexity, technical debt against technical debt, head to head with companies that develop applications, will we begin to truly participate in these conversations.
The network vendors aren’t going to help us here, because they’re keen to sell boxes with the latest features—this is something we must do. There is no cavalry off on the horizon.
And then again…
The reality is we shouldn’t need DevOps for configuration at all. This is a bit of a revolution in my thinking in the last two or three years, but what I’m trying to do is to simply make DevOps, as it’s currently constituted, obsolete. DevOps should be about understanding how the network is working and making the network work better, rather than about making the network work in the first place. We need to get to the point where configuration just isn’t something that’s “done” any longer, beyond a few basic points that get things up and running. I know we won’t ever get there, but this is an attitude, not an absolute destination, that keeps me thinking about how to make things simpler.
My daughter has started a new fiction series on her blog; I’ll just link to it here for anyone who’s interested.
A cold circular object is pressed to my head, a hand reaching around to cover my mouth before I can make a sound. My eyes squeeze shut reflexively. Is this it? Is this how I’m going to die? After so much time of hiding, running, escaping the war around me, am I finally reaching the end of my life? My heart explodes inside me, waiting for the explosion of powder, marking the final moments of my life…but nothing comes. I open my eyes, surprised. What is this? Why hasn’t he made that one little motion that would end my life? —Nenshou Fire