Renesys Blog on Routing Vulnerabilities
I've been writing about the routing infrastructure monitoring company Renesys for several years. James Cowie's post Staring Into the Gorge contains some real gems:
Here We Go Again.
Imagine an innocent BGP message, sent from a random small network service provider's border router somewhere in the world. It contains a payload that is unusual, but strictly speaking, conformant to protocol. Most of the routers in the world, when faced with such a message, pass it along. But a few have a bug that makes them drop sessions abruptly and reopen them, flooding their neighbors with full-table session resets every time they hear the offending message. The miracle of global BGP ensures that every vulnerable router on earth gets a peek at the offending message in under 30 seconds. The global routing infrastructure rings like a bell, as BGP update rates spike by orders of magnitude in the blink of an eye. Links congest. Small routing hardware falls over and dies. It takes hours for things to return to normal...
At 17:07:26 UTC on Monday, CNCI (AS9354), a small network service provider in Nagoya, Japan, advertised a handful of BGP updates containing an empty AS4_PATH attribute...
This one seems to have bitten Cisco's IOS XR, a relatively newborn "from scratch" rewrite of the venerable IOS, destined to run on big iron, like the CRS-1 or the 12000 series...
The global mesh of BGP-speaking routers that we call the Internet has inherent vulnerabilities that stem from the software quality and policy weaknesses of its weakest participants, and the amplification potential of its best-connected participants. Running sloppy software at the edge of the routing mesh (in enterprises, say) is unlikely to give anyone the ability to propagate large amounts of instability or partition the Internet. But closer to the core, I think we have a serious problem to contemplate.
Remember, if you can get just one provider to listen to you, and not filter your announcements, you can get your message into the ear of just about every BGP-speaking router on the planet within about thirty seconds. And if some subpopulation of those routers can be reset, they act as amplifiers for your instability. Power law outage-size distributions are not a myth — they are a logical consequence of the structure of the Internet, the importance of a few key participants in carrying global traffic, and their reliance for interconnection on technologies that are clearly still in the shaking-out-the-obvious-bugs mode.
So that is really cool. I liked the following too:
I was at USENIX last week, and sat in on a great workshop on security metrics, where Sandy Clark gave a somewhat controversial presentation on the interaction between software quality and the timing of exploit appearances...
[O]ne of the strongest predictors of a significantly large time to the emergence of the first zero-day exploit for a new version of software is the degree to which the release represents a substantial rewrite of the code. Doing a rewrite seems to start a "honeymoon period," during which time the system in question is safer from exploitation than it has been in a long time. In fact, the magnitude of the protective effect is so significant, that you might ask yourself whether a dollar spent in pursuit of higher quality code is actually better spent rewriting the code periodically, to whatever quality standard you can achieve.
That sounds like an argument for diversity to me. Writing new software introduces diversity, and the attackers have to decide if they want to spend resources to understand the new target sufficiently to exploit it.
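To make the "unusual, but strictly speaking, conformant" payload a little more concrete, here is a minimal sketch of how a monitoring script might spot an empty AS4PATH attribute (AS4_PATH, type code 17, in the BGP specifications) in the path-attribute block of an UPDATE. I am assuming "empty" means an attribute with a zero-length value; the quoted post does not spell that out. The function names are hypothetical, and this is not how IOS XR or any other router implementation handles the attribute.

```python
import struct

AS4_PATH = 17            # path attribute type code for AS4_PATH (RFC 4893)
EXTENDED_LENGTH = 0x10   # flag bit: attribute length is two octets, not one


def parse_path_attributes(data: bytes):
    """Yield (flags, type_code, value) for each attribute in a raw block.

    `data` is assumed to be only the path-attribute portion of a BGP
    UPDATE, not the whole message (hypothetical helper).
    """
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated attribute header")
        flags, type_code = data[offset], data[offset + 1]
        offset += 2

        # The length field is one octet unless the extended-length bit is set.
        length_size = 2 if flags & EXTENDED_LENGTH else 1
        if offset + length_size > len(data):
            raise ValueError("truncated attribute length")
        if length_size == 2:
            (length,) = struct.unpack_from("!H", data, offset)
        else:
            length = data[offset]
        offset += length_size

        if offset + length > len(data):
            raise ValueError("truncated attribute value")
        yield flags, type_code, data[offset:offset + length]
        offset += length


def has_empty_as4_path(data: bytes) -> bool:
    """Return True if the block carries an AS4_PATH with a zero-length value."""
    return any(
        type_code == AS4_PATH and len(value) == 0
        for _flags, type_code, value in parse_path_attributes(data)
    )


# ORIGIN (type 1, value IGP) followed by an AS4_PATH with no segments.
suspicious = bytes([0x40, 1, 1, 0, 0xC0, AS4_PATH, 0])

# Same ORIGIN, then an AS4_PATH holding one AS_SEQUENCE segment with AS9354.
ordinary = bytes([0x40, 1, 1, 0, 0xC0, AS4_PATH, 6, 2, 1]) + struct.pack("!I", 9354)
```

Calling `has_empty_as4_path(suspicious)` flags the first block and `has_empty_as4_path(ordinary)` does not. The parser only looks at attribute framing, which is the point: the odd update is well-formed at this level, so a check like this has to be written deliberately.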