Recoverable Network Architecture

Last year I outlined my Defensible Network Architecture 2.0, consisting of 8 (originally 7, plus 1 great idea from a comment) characteristics of an enterprise that give it the best chance to resist an intrusion.

I'd like to step into the post-intrusion phase to discuss Recoverable Network Architecture (RNA, goes well with DNA, right?), a set of characteristics for an enterprise that give it the best chance to recover from an intrusion. This list is much rougher than the previous DNA list, and I appreciate feedback. The idea is that without these characteristics, you are not likely to be able to resume operations following an incident.

RNA does not mean your enterprise will be intruder-free, just as DNA didn't mean you would be intrusion free. Rather, if you do not operate a Recoverable Network Architecture you have very little chance of returning at least the system of interest to a trustworthy state. (Please remember the difference between trusted and trustworthy!)

The recoverable network must be defensible. Being defensible not only helps with resisting intrusions; it helps recovery too. For example, the network must already be:

Monitored: Monitoring helps determine incident scope before recovery and remediation effectiveness after recovery.

Inventoried: Inventories help incident responders understand the range of potential victims in an incident before recovery and help ensure no unrecognized victims are left behind after recovery.

Controlled: Control helps implement short term incident containment, if appropriate, before recovery, and enforces better resistance after recovery.

Claimed: Because an asset is claimed, incident responders know which asset owners to contact.

Minimized: Assets that retain security exposures following recovery are subject to easy compromise again.

Assessed: Assessment validates that monitoring works (can we see the assessment?), that inventories are accurate (is the system where it should be?), that controls work (did we need an exception to scan the target, or could we sail through?), and that minimization/keeping current worked (are easy holes present?)

Current: Assets that retain security vulnerabilities following recovery are subject to easy compromise again.

Measured: Measurement helps justify various recovery actions, e.g. showing that so-called "cleaning" is less effective and costs more than complete system rebuilds.

Assets in a recoverable network must be capable of being replaced -- fast. IT shops are slowly waking up to the fact that "cleaning" does not work, is too expensive, and should be standard for any disaster recovery/business process continuity activity anyway. Complete rebuilds are becoming the only semi-effective remedy. (I say semi-effective because even complete rebuilds can preserve BIOS-level and other persistent, extra-OS rootkits.)

Incident responders in a recoverable network must be authorized and empowered to collect evidence, analyze leads, escalate findings, and guide remediation. An IRT that is asked to assist with an incident, but that is not allowed or able to collect information from a victim, is basically helpless. An IRT that must wait for other parties to provide information is ineffective, and likely to find the "data" provided by the other party to be of decreasing value as time passes and asset owners trounce host-based evidence.

A recoverable network is supported by an organization that has planned for intrusions. The IR plan must engage a variety of parties, contain realistic scenarios, and actually be followed. IR plans help increase the likelihood of incident recovery because time is not wasted on phone calls asking "what do we do now?"

A recoverable network is supported by an organization that has exercised the IR plan. Drills find weaknesses in plans that will hamper recovery.

A recoverable network is supported by an IRT that is appropriately segmented. By that I mean that the IRT's infrastructure is not hosted or maintained by the same infrastructure the IRT is trying to recover. In other words, the IRT should not depend on equipment administered by the same people who suffered a loss of their credentials, or be part of the same Windows domain, and so on. If the IRT does share infrastructure with the victim, then the IRT can no longer trust its own systems and must first restore the trustworthiness of its own gear before turning to the organization.

A recoverable network is supported by an IRT that is also connected. The team can communicate in degraded situations, with itself and with outside parties. The IRT will definitely have requirements that exceed the end user community, and almost certainly even the IT shop.

What do you think of these requirements? I may try expanding on each of the DNA items with examples at some point. If that works well I will apply the same to RNA.

Richard Bejtlich is teaching new classes in Europe and Las Vegas in 2009. Online Europe registration ends by 1 Apr, and seats are filling. "Super Early" Las Vegas registration ends 15 Mar.

Comments

Anonymous said…

I might also suggest two refinements although they may be covered within the larger scope of your statements and are just not explicitly identified.

1. Your system/network must be Documented. Not just the owner information is required but a clear understanding of the data elements stored, recovery procedures, continuity procedures. This has to detailed and be made available so that a local IT resource can be the "hands on" asset for resumption or continuity operations without previous knowledge of the system. (Just to be clear I'm not advocating they should be pulling memory dumps or anything)

2. Categorization. The system/network should absolutely be categorized according to the needs of your business and/or IR team. Categorization is not only to help in the determination of the priority for the incident but for post incident analysis or analysis of previous incidents to show trends of systems/networks being targeted/exploited. This may help you justify further activities (additional monitoring, redesign, etc) on categories of systems or networks later on.

Rocky

1:10 AM