Saturday, March 15, 2008

How Many Burning Homes

I mentioned the idea of host integrity assessment in my post Controls Are Not the Solution to Our Problem. The idea is to sample live devices (laptops, desktops, servers, routers, switches -- anything that runs a network-enabled operating system) to see if they are trustworthy. (They may be trusted, but that does not make them trustworthy.)

I described how I might determine trustworthiness, or integrity, in Three Capabilities, Three Companies. I'd like to expand on these thoughts with five metrics. Before showing the security metrics, I'd like to introduce an analogy.

Imagine a city with an understaffed, under-resourced, and possibly unappreciated fire department. The FD would like to prevent fires, but it spends most of its time responding to fires. How should city leadership decide how to staff and resource the FD? (There is no way to eliminate fires, at least no way that could ever be financed using any foreseeable resources. Even if people lived in concrete cells with no furnishings, they would probably figure out a way to light each other or the ground on fire!)

In this situation, one might argue that one way to judge the peril of the situation is the ability of the FD to "manage the fires." In other words, perhaps there is some number of burning homes that can be maintained while the FD responds, contains, and extinguishes fires. If the FD is large enough, the number of fires can be rapidly decreased such that the time to extinguish is very small. If the FD is too small, then eventually the whole city burns because the fires overwhelm the FD's ability to respond, contain, and extinguish.

The question becomes: what is the "right" number? You could think in terms of the following metrics.

  1. Number of burning homes at any sampled time. The higher this number, the more likely the fire will spread.

  2. Average length of time any home is burning. Again, the higher this number, the more likely the fire will spread.

  3. Average time from detection to response. This measures how fast the FD arrives on site.

  4. Average time from response to recovery. This measures how effective the FD is at fighting fires.

  5. Average property value of burning homes. One would be less concerned if the burning homes are abandoned or condemned, and more concerned if they are inhabited.


I do not consider the number of arsonists here. That is relevant, but it brings in the role of the police, who deter, investigate, apprehend, prosecute, and incarcerate threats. The FD cannot fight arsonists directly.

Now let's turn to digital security. While it's easy to spot a fire, identifying a "burning" (i.e., compromised) computer can be more difficult. If we could do that via host integrity assessment, we could imagine the following metrics.

  1. Number of compromised computers at any sampled time. This should be drawn from a statistically valid sample.

  2. Average length of time any computer is compromised. Answering this question requires a forensic investigation to identify the point in time where the intrusion is most likely to have happened.

  3. Average time from detection to response. This measures the effectiveness of the intrusion detection program.

  4. Average time from response to recovery. This measures the effectiveness of the IRT and provisioning personnel.

  5. Average asset value of compromised computers. Again, a lot of owned low-value assets might not be a big problem.
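
As a rough illustration, the five metrics above could be computed from a sample of assessed hosts along the following lines. This is only a sketch under stated assumptions: the `HostAssessment` record, its field names, and the `summarize` function are hypothetical, and it assumes each compromised host already carries timestamps produced by forensic investigation and incident response. Hosts that have not yet recovered are simply left out of the duration averages.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional


@dataclass
class HostAssessment:
    """One sampled host. All field names here are hypothetical."""
    name: str
    asset_value: float                          # business-assigned value, any consistent unit
    compromised_at: Optional[datetime] = None   # forensic estimate of the intrusion time
    detected_at: Optional[datetime] = None
    responded_at: Optional[datetime] = None
    recovered_at: Optional[datetime] = None

    @property
    def compromised(self) -> bool:
        return self.compromised_at is not None


def _avg_delta(pairs):
    """Average (end - start) over pairs where both timestamps are known."""
    deltas = [(end - start).total_seconds() for start, end in pairs if start and end]
    return timedelta(seconds=mean(deltas)) if deltas else None


def summarize(sample):
    """Compute the five metrics for a list of HostAssessment records."""
    owned = [h for h in sample if h.compromised]
    return {
        # Metric 1: compromised hosts in the sample (count and fraction)
        "compromised_count": len(owned),
        "compromised_fraction": len(owned) / len(sample) if sample else 0.0,
        # Metric 2: average time a host stays compromised
        "avg_time_compromised": _avg_delta((h.compromised_at, h.recovered_at) for h in owned),
        # Metric 3: average time from detection to response
        "avg_detect_to_respond": _avg_delta((h.detected_at, h.responded_at) for h in owned),
        # Metric 4: average time from response to recovery
        "avg_respond_to_recover": _avg_delta((h.responded_at, h.recovered_at) for h in owned),
        # Metric 5: average asset value of compromised hosts
        "avg_asset_value": mean(h.asset_value for h in owned) if owned else None,
    }
```

The numbers are only as good as the sampling and the forensic estimate of when each intrusion began, so the output is best read as a trend to track over time rather than a precise measurement.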


So what do you do with these numbers? First, I recommend just collecting them. Second, take them to business owners and ask if the situation is acceptable. For example:

  • Is it acceptable to have 25% of a business's computers compromised? 50%? 10%? 5%?

  • Is it ok for them to be owned for 6 months? 1 day? 2 years?

  • Is it ok for us to take 6 months to notice? 2 hours? 2 days?

  • Is it ok for us to take 1 week to recover? 1 day? 1 month?

  • Is it ok for us to be suffering compromise on development servers? Call center PCs? Human resources databases?
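
Once business owners answer questions like these, the answers can be written down as tolerances and compared against the measured values. The sketch below is one hypothetical way to do that: the metric names, the example tolerance values, and the `unacceptable` function are all assumptions made for illustration, not figures from any real organization.

```python
from datetime import timedelta

# Hypothetical tolerances a business owner might state in answer to the questions above.
tolerances = {
    "compromised_fraction": 0.05,                  # at most 5% of sampled hosts
    "avg_time_compromised": timedelta(days=1),     # owned for at most 1 day on average
    "avg_detect_to_respond": timedelta(days=2),    # respond within 2 days of detection
    "avg_respond_to_recover": timedelta(weeks=1),  # recover within 1 week of response
}


def unacceptable(measured, tolerances):
    """Return the names of metrics whose measured value exceeds the stated tolerance.

    Metrics with no measurement (missing or None) are skipped rather than flagged.
    """
    return sorted(
        name
        for name, limit in tolerances.items()
        if measured.get(name) is not None and measured[name] > limit
    )


# Example measurements, also invented for illustration.
measured = {
    "compromised_fraction": 0.10,
    "avg_time_compromised": timedelta(hours=12),
    "avg_detect_to_respond": timedelta(days=3),
}

print(unacceptable(measured, tolerances))
```

Anything the function returns is a metric where the current situation is worse than what the business said it would accept, which is the point at which a conversation about staffing and spending starts.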


Note on arsonists: you should be able to tell that "arsonists" are intruders. Since most companies can't reduce threats directly, IRTs are in exactly the same position as the FD.

Note on prevention: you can extend the fire analogy to other areas. Fire resistance is like the time required for a red team to penetrate a target. Applying fire retardants is like blue teams taking countermeasures upon discovering vulnerabilities.

Finally, with these answers we can make decisions to change the metrics. For example, a firefighter could say "increase my staff by two people per shift, and buy this new fire engine, and I can change the metrics this way." In the digital realm, a security analyst could say "increase my staff by two people per shift, and buy this new sensor grid, and I can change the metrics this way."

You could also try to influence the prevention side by saying "change all antivirus software from vendor A to vendor B, and change all local users from administrators to unprivileged users" and then see if the metrics change.

The manager is now in a position where spending influences metrics, and the failure to spend could result in an unacceptable answer to the question "How many burning homes?"

8 comments:

Alex said...

Richard,

Great approach to determining the risk tolerance of the organization.

At the risk of carrying it too far, I would encourage you to think about changing your analogy, though. Houses wouldn't directly correspond to what Exec. Mgmt. cares about: the value tied to a specific process. So instead of using "houses" I'd use public facilities (Is it OK to let schools burn down, and not hospitals? What about Police Stations?) and corporations (can the city operate without grocery stores? What about gas stations?). Houses might be more like desktops, maybe.

By tying the asset to the business process it supports instead of platform or IP address, business owners can generally correlate the worth of groups of (or individual) assets to their tolerance for risk.

Dr Anton Chuvakin said...

Disagree - not a good approach in many env since:

# Is it acceptable to have 25% of a business' computers compromised? 50%? 10%? 5%?

# Is it ok for them to be owned for 6 months? 1 day? 2 years?

Many would say 'yes - as long as we can use them too' (or 'no, but we won't spend on this so - yes')

# Is it ok for us to take 6 months to notice? 2 hours? 2 days?

Many would say 'yes - as long as we can use them too' (or 'no, but we won't spend on this so - yes')


# Is it ok for us to take 1 week to recover? 1 day? 1 month?


Many would say 'yes - as long as we can use them too' (or 'no, but we won't spend on this so - yes')


# Is it ok for us to be suffering compromise on development servers? Call center PCs? Human resources databases?

Few would say yes, but then - it is 'yes as long as nobody knows AND we can use the systems'....


Please, please debate me :-)

Richard Bejtlich said...

Hi Anton,

I think this approach works very well. For a site like the one you describe, where essentially no one cares their systems are owned, the answers are yes, any amount; yes, any duration; yes, any time; yes, any time; yes, any system. In such a situation I would probably look to get another job because we have specifically defined that no one cares about integrity in such an organization. That's a lawsuit waiting to happen, especially if any regulations apply.

Dr Anton Chuvakin said...

Correct, but the keyword seems to be "waiting..." to happen. Will it be waiting? Or will it actually happen?

Michael H Buselli said...

I agree with Anton. A company that is "waiting" will keep "waiting" until its hand is forced. But then, in that case, it is true what Richard said: one should find a new place to work. If your company does not care for properly protecting its assets, then you should not be satisfied working there.

Richard Bejtlich said...

Michael, let me make a little more subtle point: if your risk tolerance is tighter than your company's, you will be frustrated and not happy. If that delta is too big, you should probably consider another job.

Carlo said...

Good Job! :)