Thursday, August 30, 2007

More Thoughts on FAIR

My post Thoughts on FAIR has attracted some attention, but as is often the case, some readers choose to obscure my point by overlaying their own assumptions. In this post I will try to explain my problems with FAIR as simply as possible.

Imagine if someone proposed the following model for assessing force: F=ma

(Yes, this is Newton's Second Law, and yes, I am using words like "model" and "assess" to reflect the risk assessment modeling problem.)

I could see two problems with using this model to assess force.

  1. Reality check: The model does not reflect reality. In other words, an accurate measurement of mass times an accurate measurement of acceleration does not result in an accurate measurement of force.

  2. Input check: To accurately measure force, the values for m and a must not be arbitrary. Otherwise, the value for F is arbitrary.


With respect to FAIR, I make the following judgments.

  1. Reality check: The jury is out on whether FAIR reflects reality. It certainly might. It might not.

  2. Input check: I have not seen any evidence that FAIR expects or requires anything other than arbitrary inputs. Arbitrary inputs to a model that passes the reality check do not produce anything valuable as far as I am concerned.

    If you personally like feeding your own opinions into a model to see what comes out the other end, have at it. It's nice to play around by making assumptions, seeing the result, and then altering the inputs to suit whatever output you really wanted to see.
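
The input check can be made concrete with a toy calculation (a sketch only; the masses and accelerations below are made up, and the point is the spread, not the specific values):

```python
def force(mass_kg: float, accel_ms2: float) -> float:
    """Newton's Second Law: F = m * a, in newtons."""
    return mass_kg * accel_ms2

# Three equally "confident" but arbitrary guesses about the same event:
guesses = [(1.0, 10.0), (5.0, 2.0), (50.0, 40.0)]
forces = [force(m, a) for m, a in guesses]
print(forces)                     # [10.0, 10.0, 2000.0]
print(max(forces) / min(forces))  # 200.0
```

The model itself is flawless; the 200x spread in the answers comes entirely from the spread in the arbitrary inputs.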


One of the commenters on the previous post mentioned the book Uncertainty, which looks fascinating. If you read the excerpt you'll notice this line:

In the early 1970s what was then the U.S. Atomic Energy Commission (AEC) asked Norman C. Rasmussen, a professor of nuclear engineering at the Massachusetts Institute of Technology, to undertake a quantitative study of the safety of light-water reactors...

Rasmussen assembled a team of roughly sixty people, who undertook to identify and formally describe, in terms of event trees, the various scenarios they believed might lead to major accidents in each of the two reactors studied. Fault trees were developed to estimate the probabilities of the various events.

A combination of historical evidence from the nuclear and other industries, together with expert judgment, was used to construct the probability estimates, most of which were taken to be log-normally distributed.
(emphasis added)
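
The Rasmussen team's approach can be sketched in a few lines of Monte Carlo simulation (the mu and sigma parameters below are invented for illustration and are not from the actual study):

```python
import random

random.seed(42)

def simulate_mean_annual_loss(trials: int = 50_000) -> float:
    """Monte Carlo over two log-normal estimates: event frequency and loss magnitude."""
    total = 0.0
    for _ in range(trials):
        freq = random.lognormvariate(mu=0.0, sigma=0.5)        # incidents per year
        magnitude = random.lognormvariate(mu=10.0, sigma=1.0)  # dollars per incident
        total += freq * magnitude
    return total / trials

print(f"mean annualized loss: ${simulate_mean_annual_loss():,.0f}")
```

The log-normal shape matters: it keeps estimates positive and allows for rare, heavy-tailed outcomes, which is why the AEC study chose it, but the distributions are only as good as the evidence behind their parameters.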

I've seen no commitment to including real evidence in FAIR, and I submit that those who lack evidence will have their so-called "expert judgment" fail due to soccer-goal security. Therefore, they possess neither historical evidence nor truly expert judgment. So, even if FAIR meets my reality check, the results are only a feel-good exercise.

So how can I defend the use of this model?

Risk = Vulnerability X Threat X Impact (or Cost)

As I've said before, it is useful for showing the effects on Risk if you change one of the factors, ceteris paribus. This assumes that the model passes the reality check, which I believe it does.

I am not trying to calculate absolute values for Risk when I cite this equation. I am trying to conceptually show how Risk decreases when Threat decreases (ceteris paribus), or how Risk increases as Vulnerability increases (ceteris paribus), and so on.

You could take the same approach with F=ma if you were trying to explain to someone how it would hurt more to be struck by an object whose mass is larger than another object, assuming constant acceleration for each event. I am not trying to calculate F in such a case; I'm only using the model to describe the relationship between the components.
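
That conceptual, relative use can be sketched directly (the factor values below are arbitrary placeholders; only the ratio matters):

```python
def risk(vulnerability: float, threat: float, impact: float) -> float:
    """Conceptual model: Risk = Vulnerability x Threat x Impact."""
    return vulnerability * threat * impact

baseline = risk(vulnerability=0.5, threat=0.4, impact=100.0)       # 20.0
halved_threat = risk(vulnerability=0.5, threat=0.2, impact=100.0)  # 10.0

# Ceteris paribus: halving Threat halves Risk, no matter which
# arbitrary absolute values the other factors take.
print(halved_threat / baseline)  # 0.5
```

The absolute numbers 20.0 and 10.0 mean nothing; the 0.5 ratio is the only output being used, which is exactly the conceptual role described above.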

17 comments:

Tomas said...

Richard,

I think you are right in some respects: since with FAIR you do not usually have real data to make probability estimates, you will not get as good a risk estimate as you might wish.

However, in FAIR and similar frameworks you get help to elicit expert knowledge and transform it into a risk estimation. And the validity of this risk estimation is of course related to the validity of the expert knowledge: If you put garbage in, you get garbage out.

But, I think you are wrong when you are saying that the input to FAIR is arbitrary. Of course, if used incorrectly, the input can be arbitrary.

My question is: why would anybody who seriously wants to use FAIR make "arbitrary" input? Why not make "guesses" that are the best according to your knowledge? Then, based on the input and its modeling assumptions, FAIR will output the best possible risk estimate (at least if you believe in Bayesian statistics and decision theory...).

This means that you cannot make any better risk estimate based on the knowledge you have given as input without changing the FAIR model or adding more input.

So if you have to make a decision that is the best according to your knowledge, then FAIR might work well.

http://deeptrust.blogspot.com/

jbmoore said...

With all due respect, you are comparing apples and oranges here. While Newton's Second Law is idealized, it is pretty darn accurate until friction or atmospheric resistance become significant. So, there are cases where it's off. You can count them on three fingers - friction, atmospheric resistance and Mercury's orbital precession. Newton's Law is grounded in reality and explains it quite well. To test inputs, you have to keep one input constant while varying the other to determine the relationship between the two - simple empirical experimental science.

FAIR is likely a compromise among committees and an attempt to quantitate very complex systems and variables that are unknown in an untestable and likely unproven way. Newton's Laws are grounded in the scientific method and were proven to be false only in special cases. There seem to be a lot of assumptions made about FAIR that may be false, but worse still, the model is likely untestable. There's no way to verify that the calculations and processes you use are valid. If FAIR were a rigorous scientific model, you would build a network and secure it based on FAIR's methodology, vary the inputs (fuzzing, pen testing, etc.), and see what outputs you get (am I secure or hacked). It's similar to certain biological neural nets that sense a wide range of sensory input and reduce it to a boolean output response (fire or don't fire).

Biological networks are likely more complex than computer networks. If neurophysiologists and biophysicists can deduce the behavior of biological neural networks, then computer scientists can do the same with computer networks and their security. It should be easier to test computer networks because men designed them. After all, it took 4.499999 billion years for evolution to build a human brain capable of deducing Newton's Laws, building nuclear weapons, and designing computer networks that are relatively simple compared to your brain's own circuitry. The difference between a biological system such as your brain and an artificial system such as a computer network is that the brain is resilient and redundant at the molecular, cellular, and possibly the tissue levels. The human-made computer network is brittle. There is little redundancy or resiliency built in at the molecular, component, or subsystem levels. Therefore, since a computer network is easy to break or compromise, it should be easy to quantitate at some level how vulnerable the network is.

I understand your logic, but the systems you chose to compare aren't comparable. Newton's Laws were derived from experimental observation, while FAIR wasn't. One is a pretty good model of the physical world, whereas the latter is something to make people feel good with no basis in reality. Calling FAIR a model of reality is an insult to valid scientific models and scientific methodology. Engineers and physicists know how to test systems. As I said earlier, if people built it, they can test it, or let Nature test it for them. They don't call them engineering disasters for nothing.

Richard Bejtlich said...

jbmoore,

Would you have been happier if I had laid out all of my own reservations with this analogy, such as my own recognition that Newton's Laws are real science and FAIR and other models are not? Or would you have not bothered to read any farther after seeing ten pages of disclaimers prior to me making my point? (Sigh.)

Alex said...

@jbmoore - Very good thoughts! Some things to note:

FAIR was not developed by a committee, rather it was developed at a Fortune 100 Insurance company - and vetted against actuarials and Chris Holloman and staff at The Ohio State University for validity as a Bayesian network (as well as some informal help from Washington University's NMR lab). It just so happened that this Insurance company didn't have particularly clear policies concerning IP ownership.

That said, I completely agree: FAIR is certainly not a scientific model in the sense that you define it in your comments. Of course, Bayesian networks are valid for creating probability statements within the context of a scientific model, and that is the goal of FAIR. You seem to be somewhat confused in this regard as I read your comment above. There is a distinction between using FAIR as a prior for some other model - perhaps an ISMS - and using it as a model to build "secure" networks itself.

This is an important distinction, but it does not make the use of FAIR any less valid, or an "insult to scientific models and scientific methodology". I'm sure you are not trying to say that "engineers and physicists" at no time use probability theory or stochastic methods to test their theories.

FAIR is, however, an attempt to use a rational model and probability theory to implement scientific method towards the study of risk. Two things to note here:

1. This is certainly not classical statistics - FAIR suffers quite a bit of criticism in our industry from those who have had (direct quote) "a couple of statistics classes in college" and so therefore believe they are experts in probability theory. Inference-based statistics will seem counter-intuitive to these folks until they invest time and effort rather than ridicule what they are not equipped to understand.

2. The use of Bayes' Theorem itself has been called the implementation of scientific method, but only in its ability to create and test probabilities. Again, it is used by "engineers and physicists" as such when they have noisy data (and who has noisy data? We've got noisy data!).

So, as you say, it is apples and oranges compared to what both you and Richard are describing above. Your criticisms should then be focused on the following:

1.) Is a Bayesian network a valid tool for determining probability?

2.) Does FAIR fail when we apply Cox's Theorem or Jaynes' desiderata to it?
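
For readers following Alex's Bayesian references, the kind of probability statement at issue reduces to a single application of Bayes' Theorem; a minimal sketch, with all numbers hypothetical:

```python
# Sketch of one Bayesian update: revising a compromise probability
# after a noisy detector fires. All numbers are hypothetical.
prior = 0.01                  # P(compromised) before the alert
p_alert_given_comp = 0.90     # detector sensitivity
p_alert_given_clean = 0.05    # false-positive rate

# Bayes' Theorem: P(comp | alert) = P(alert | comp) * P(comp) / P(alert)
p_alert = p_alert_given_comp * prior + p_alert_given_clean * (1 - prior)
posterior = p_alert_given_comp * prior / p_alert
print(round(posterior, 3))  # 0.154 -- still far from certainty
```

The update is mechanical and uncontroversial; the debate in this thread is about where defensible values for the prior and likelihoods come from.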

Alex said...

@ Richard,

Wrong picture! You should be using:

http://en.wikipedia.org/wiki/Image:Thomasbayes.jpg

or maybe:

http://bayes.wustl.edu/etj/1960.jpg

Richard Bejtlich said...

Alex,

Please -- save your cheap shots for Bayes, Cox, Jaynes, and any other big names surrounding your methodology. I have a master's degree from Harvard in public policy. I am well-equipped to understand any rational argument you can make. I never claimed to be an expert in statistics, but if that is what is needed to appreciate the beauty of your system then it should be self-evident that it's unsuitable for the real world. I asked a really simple question and I got a reply that ducks behind two other issues. It's clear to me there is no way to resolve an issue when the financial future of one of the parties (not me) depends on supporting his argument. I'm done.

jbmoore said...

I understood what you were trying to say, but did you state it? To understand something you have to measure it in a meaningful way. Did you point out that Newton's Laws were derived through testing? No. You implied it. What's the underlying flaw with FAIR? There is no empirical evidence to support or refute it. It's untestable. Why? Because people are not assembling data on computer break-ins/data breaches, we don't know what works versus what doesn't work, and we have no quantifiable data. I mean we have best practices, but that's what they are, best practices. Advocating network monitoring as you do as a form of data collection is the wise thing to do. And I must apologize. I should have said that your analogy is flawed rather than your logic. You are advocating accurate measurement in order to define relationships that can be used to estimate information that is in alignment with reality. But it still comes down to testing one's ideas against what is being observed.
Ceci n'est pas une pipe - the painting of a pipe is not the pipe. A model of reality is not reality - it's a simplified version or approximation of reality suitable enough that it can be tested against reality and either refuted or supported by the evidence.

Is this model (Risk = Vulnerability X Threat X Impact (or Cost)) true? There is likely no way to estimate true cost, nor is there a way to estimate true threat. The only known is the vulnerability. Therefore, risk is not quantifiable in any meaningful way. That is not to say that there is not significant risk, but that there's no way to meaningfully obtain solid data until after the damage has been done, i.e. known measured threat x known vulnerability x known loss or damage = known risk. No one has ever disclosed the true costs because there's no transparency or incentive to do so.

For all you and I know, risk = (vulnerability)^3, or threat is actually a power function, risk = v*t^2*c. Do you know how this equation came into existence and based upon what evidence? From en.wikipedia.org, the quantitative engineering definition of risk is:

Risk = (probability of an accident) * (losses per accident).

Then you'll find out half a sentence later that measuring engineering risk is "often difficult". If you can't meaningfully measure the quantity, anything you spout is dogma and untested dogma at that.
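
That engineering definition is just an expected value, which is easy to compute and hard to feed with defensible numbers; a sketch with invented inputs:

```python
# The quoted engineering definition is an expected value:
# Risk = P(accident) * (losses per accident). All numbers invented.
def engineering_risk(p_accident: float, loss_per_accident: float) -> float:
    return p_accident * loss_per_accident

# The "often difficult" part is the inputs: shifting either one
# by 10x shifts the answer by 10x, with no way to tell which is right.
print(engineering_risk(0.001, 1_000_000))  # 1000.0
print(engineering_risk(0.01, 1_000_000))   # 10000.0
```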

John Rodenbiker said...

As someone who has reviewed, if not completed, many, many risk assessments, I have hands-on experience with these things.

In my professional opinion they have very little value in actually assessing the amount of risk faced by an organization.

As Richard has implied in his postings, it is a matter of garbage in, garbage out. We, as an industry, are reading the entrails of rabbits.

I do think what we call "risk assessments" has some value as a security management tool, but not as intended.

If you go through the entire FAIR rigmarole you should have a list of your organization's information assets prioritized by their subjective value to the company. Having such a list in hand -- even if the "priorities" only reflect the biases of the list's creator -- is a lot better than not. I doubt most organizations have anything like this, really. Hardware and software inventories are not the same thing.

The other benefit to "risk assessments" is that they can make a nice checklist for best practice compliance. "Have Business Division X's security policies been reviewed lately? Do we have current hardware and software inventories? Has HR been complying with its security procedures? How's the physical security? The environmental controls in the data centers? Is the malicious software prevention system working, and is anyone monitoring it? Are the application controls on our business software in place and effective?" Etc., etc., etc. Wash, rinse, repeat.

Of course, this tool isn't really being used to assess risk at this point, but instead is a checklist.

If an organization is going to use a risk assessment (or be forced to use it as banks under FDIC oversight must) this is the best way to derive some value from the exercise.

--
John Rodenbiker, CISA
jrodenbiker@rodenbiker.net

John Rodenbiker said...

@tomas

You said:
"My question is: why would anybody that seriously wants to use FAIR make "arbitrary" input? Why not make "guesses" that are the best according to your knowledge?"

The problem is that our knowledge, as an industry, is insufficient for any guesses based on it to be distinguishable from arbitrary input.

--
John Rodenbiker, CISA
jrodenbiker@rodenbiker.net

jbmoore said...

The best you can do is make qualitative statements of risk for complex systems. In that case, if costs double, triple, or even go up tenfold, risk remains about the same. In biology and biochemistry, 2 = 1, in other words, a 100% increase in gene activity is still x, not 2x, due to sampling errors, variability and other complex factors. A reproducible thousand-fold increase in activity would be quantitatively and qualitatively significant.

Or, you can wave your hand and state that the field is in its infancy, no one knows the risks, no one's measured the risks and the field IT Security Risk Analysis is ripe for a fresh perspective/approach. Such honesty isn't appreciated in business, but it's demanded in the Sciences.

jbmoore said...

Alex,

I know that you have to train Bayesian systems using training sets. Doesn't that run the risk of selecting for discrimination of something else rather than what you intend? There's a famous example of training a neural net to recognize tanks. Well, all the pictures of tanks also had trees in them while the pictures without tanks didn't. The neural net didn't learn to recognize tanks, but trees instead, and it failed its first real test.

Alex said...

@ Richard,

The "cheap shot" wasn't aimed at you at all, actually - and I'm very sorry you took it personally. It's just an observation of existing criticisms of risk analysis, not even concerning FAIR but the study of risk itself. I'll intentionally leave the author of that quote unnamed, first because it's not my intention to be snarky, and second because they're actually much smarter than that quote suggests.

RE: the "Big Names", wouldn't you want to validate an approach using the validation methods developed? I'm offering Cox's Theorem and Jaynes' Desiderata as ways you can go test FAIR for suitability as a Bayesian Network. As far as being the one with financial interest, FAIR is released under a CC license, and note that I'm actually offering tests for independent validation of the framework - it just needs to be in the context of what it is, not what it is not.

In fact, I would offer that the issues that we're dancing around really have nothing to do with FAIR, but whether or not risk can be modeled in a logical (or objective, epistemic) probabilistic manner. I am guilty of moving the conversation towards "yes" and then directing you to whether FAIR is a valid framework for modeling, but only because I've yet to really see a cogent "no" answer to that original question.

@jbmoore,

"What's the underlying flaw with FAIR? There is no empirical evidence to support or refute it. It's untestable."

No one is suggesting that risk analysis is anything more than a so-called "soft" science - what we are suggesting is that scientific methods be used to make as rational a probability statement as possible. As you say, the model is not reality, just as "a 60% chance of rain" doesn't mean we will need to measure "60% rain" this afternoon. It does mean that roughly 6 of 10 "60% chance of rain" statements for discrete days should indeed result in precipitation. As such, the model does need to match the (subjective) reality of the observer, and stand up against the rigors of what probability theorists expect (thus my comments above).
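
The calibration claim (roughly 6 of 10 "60% chance of rain" statements should see rain) can be checked mechanically; the sketch below simulates an ideally calibrated forecaster, since real forecast records would be needed for an actual test:

```python
import random

# Sketch: checking whether "60% chance of rain" forecasts are calibrated.
# We simulate a well-calibrated forecaster; real data would replace this.
random.seed(7)
forecasts = [0.6] * 10_000
outcomes = [random.random() < p for p in forecasts]  # True = it rained

observed_rate = sum(outcomes) / len(outcomes)
print(round(observed_rate, 2))  # close to 0.60 for a calibrated forecaster
```

The same check applies to risk statements: collect the probability claims, record the outcomes, and compare the observed frequency against the stated probability.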

A couple of other notes:

1.) Risk Analysis may be "often difficult" but there's a leap in logic to say that "often difficult" = impossible or even unusable.

2.) Be very careful when you toe the line on scientific objectivity. As I.J. Good said, "...the subjectivist states his judgments, whereas the objectivist sweeps them under the carpet by calling assumptions knowledge, and he basks in the glorious objectivity of science."

3.) I take great comfort in the fact that the whole argument tends to boil down to "can we make models to express reality or not".

4.) "...there's no way to meaningfully obtain solid data until after the damage has been done. No one has ever disclosed the true costs because there's no transparency or incentive to do so." Please don't forget that FAIR was developed by an IRM group. They may not have disclosed costs to the public in a meaningful way (see DSW or TJX for how costs can be gamed in financial statements), but you can bet your sweet momma's pecan pie they knew the true costs of an incident from an internal perspective (in no small part thanks to using FAIR's category of loss factors).

@ John:

"reviewed if not completed many, many risk assessments I have hands on experience with these things."

As did I. Hates OCTAVE and 800-30 we do. And I would agree that in the context of those frameworks, "they have very little value in actually assessing the amount of risk faced by an organization." And if you're worried about aggregate risk to the organization, the goal of using FAIR isn't to express one master risk statement to rule them all. There is no other answer to "how much risk do I have" other than "lots", really. From an organizational aggregate standpoint, it is much more valid to study your capability to manage risk than risk itself (as theoretically, the amount of risk one has is a lagging indicator of our ability to manage it). But that capability measurement is a whole other can of beans.

"If you go through the entire FAIR rigmarole you should have a list of your organization's information assets prioritized by their subjective value to the company."

Good Lord Of The Dance I hope not. I would hope that FAIR would be applied as a method for measuring discrete risk issues (policy exceptions, audit findings, etc.). Note that while you can use FAIR in a monolithic risk assessment approach, as I mentioned above, those things suck and usually aren't useful unless/until your IRM program is mature enough to follow a business-process-based view of asset management rather than an IP address view of what an asset is.

I would also argue that FAIR does not apply value to a discrete asset as many other risk assessment methodologies (attempt to) do, but rather estimates the probable magnitude of loss regarding specific threat actions by specific threat communities. I would rather leave the subjective (nothing wrong with that, as long as it is done with appropriate rigor) asset valuation to the BIA people.

As for the "best practices" thing - you've got a whole different perspective on risk assessment, and one that I would advocate we should abandon (for many of the reasons you list). I would also argue that this perspective is incongruent to FAIR.

Alex said...

@jbmoore:

I'm not sure I understand your "The best you can do" paragraph. I'm a little slow right now and somewhat distracted (on a video chat).

"Or, you can wave your hand and state that the field is in its infancy, no one knows the risks, no one's measured the risks and the field IT Security Risk Analysis is ripe for a fresh perspective/approach. Such honesty isn't appreciated in business, but it's demanded in the Sciences."

I'm not sure I have a problem with that. Or, as John illustrates, we're trying to do the wrong things with risk because of what we thought risk was years ago, which has since hardened into "best practices" (or, as I intimate in a blog post, because scan technology was so cool we just built risk around that and called it risk management).

"I know that you have to train Bayesian systems using training sets. Doesn't that run the risk of selecting for discrimination of something else rather than what you intend? There's a famous example of training a neural net to recognize tanks. Well, all the pictures of tanks also had trees in them while the pictures without tanks didn't. The neural net didn't learn to recognize tanks, but trees instead, and it failed its first real test."

"Run the risk, LOL" We could estimate the risk for that, I suppose :)

Short answer is "we certainly run that risk", but qualified with "just like any other theory". That is to say, this is not an issue with Bayes' Theorem itself as much as the model used (use pictures without trees next time). Could FAIR (or a practitioner's discrete risk study) suffer the same problem? Certainly, and that's why it must be vetted. What I've been trying to explain is that this is no different than any other use of scientific method, and as such is a heck of a lot better than not using it at all. Note that this is also why RMI has released FAIR under CC and is trying to transfer "ownership" to The Open Group. If RMI took VC and ran the regular start-up route, we may have been forced to hold the IP close to our chest as "secret consulting sauce" - but Jack is determined to offer something up for use and scrutiny under scientific method.

Now unfortunately, the lack of VC means we're going slower than normal technology companies. The white paper approach is several years old and needs updating (yeah, we're getting around to it). The framework of factors and taxonomy are still the same, but obviously we don't like the matrix approach, the asset centric approach, and the expression of risk as high, medium, low rather than LEFxPLM, etc... etc...

Chris Walsh said...

Rich:

You've often said that historical data are not useful in assessing probabilities for infosec-relevant events because circumstances (including knowledge) change so rapidly.

Couple that with the apparent value you place on the nuke fault tree's use of historical data, and it would seem that you would have to believe that a 'scientifically' sound assessment of loss likelihoods in the infosec field is impossible.

Is this a fair (no pun intended) conclusion?

Richard Bejtlich said...

Chris,

I see where you are going with this, but I emphasized the use of historical data in the nuke example to show that at least some of the inputs to their model were based on evidence (of some type) rather than only expert opinion. I liked the use of fault trees too.

I think it may be possible to produce some reckoning of risk, but the window for which that value is useful may be very short (days, weeks at most) and be based on very fresh evidence (again, days, weeks at most).

alex said...

"I think it may be possible to produce some reckoning of risk, but the window for which that value is useful may be very short (days, weeks at most) and be based on very fresh evidence (again, days, weeks at most)."

Do you have a model around this, or is this just a wild guess?

:)

Chris Walsh said...

Rich:

Cool. I'm inclined not to agree, but "it's an empirical question". :^)

BTW, I am glad you finally got down to looking at sensitivity in these models (I just read the climate post). This is a topic which is almost completely ignored outside of academia (and pretty well ignored within it).