Sunday, August 26, 2007

Thoughts on FAIR

You knew I had risk on my mind given my recent post Economist on the Peril of Models. The fact is I just flew to Chicago to teach my last Network Security Operations class, so I took some time to read the Risk Management Insight white paper An Introduction to Factor Analysis of Information Risk (FAIR). I needed to respond to Risk Assessment Is Not Guesswork, so I figured reading the whole FAIR document was a good start. I said in Brothers in Risk that I liked RMI's attempts to bring standardized terms to the profession, so I hope they approach this post with an open mind.

I have some macro issues with FAIR as well as some micro issues. Let me start with the macro issue by asking you a question:

Does breaking down a large problem into small problems, the solutions to which rely upon making guesses, result in solving the large problem more accurately?

If you answer yes, you will like FAIR. If you answer no, you will not like FAIR.

FAIR defines risk as

Risk - the probable frequency and probable magnitude of future loss

That reminded me of

Annual Loss Expectancy (ALE) = Annualized Rate of Occurrence (ARO) X Single Loss Expectancy (SLE)

If you don't agree remove the "annual" terms from the second definition or add them to the FAIR definition.

I have always preferred this equation

Risk = Vulnerability X Threat X Impact (or Cost)

because it is useful for showing the effects on risk if you change one of the factors, ceteris paribus. (Ok, I threw the Latin in there as homage to one of my economics instructors.)

If you consider frequency when estimating threat activity and include countermeasures as a component of vulnerability, you'll notice that Threat X Vulnerability starts looking like ARO. Impact (or Cost) is practically the same as SLE, so the two equations are similar.
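To make the comparison concrete, here is a minimal sketch of both equations in Python. The function names and every number below are invented for illustration; nothing here comes from the FAIR paper or from any ALE standard. It assumes threat is expressed as attempts per year and vulnerability as the fraction of attempts that succeed.

```python
def ale(aro: float, sle: float) -> float:
    """Annual Loss Expectancy = Annualized Rate of Occurrence x Single Loss Expectancy."""
    return aro * sle


def risk(vulnerability: float, threat: float, impact: float) -> float:
    """Risk = Vulnerability x Threat x Impact (or Cost)."""
    return vulnerability * threat * impact


# Hypothetical inputs: threat as attempts per year, vulnerability as the
# fraction of attempts that succeed, impact as dollars lost per success.
threat = 4.0
vulnerability = 0.25
impact = 10_000.0

baseline = risk(vulnerability, threat, impact)

# Ceteris paribus: halve vulnerability, hold threat and impact constant.
hardened = risk(vulnerability / 2, threat, impact)

# Threat x Vulnerability plays the role of ARO, and Impact the role of SLE.
same_as_ale = ale(aro=threat * vulnerability, sle=impact)

print(baseline, hardened, same_as_ale)
```

Under those assumptions the two equations give the same number, and halving one factor halves the result. The arithmetic is the easy part; where the inputs come from is the hard part.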

FAIR turns its definition into the following.



If you care to click on that diagram, you'll see many small elements that need to be estimated. Specifically, you can follow the Basic Risk Assessment Guide to see these are the steps.

  • Stage 1. Identify scenario components
    • 1. Identify the asset at risk
    • 2. Identify the threat community under consideration

  • Stage 2. Evaluate Loss Event Frequency (LEF)
    • 3. Estimate the probable Threat Event Frequency (TEF)
    • 4. Estimate the Threat Capability (TCap)
    • 5. Estimate Control strength (CS)
    • 6. Derive Vulnerability (Vuln)
    • 7. Derive Loss Event Frequency (LEF)

  • Stage 3. Evaluate Probable Loss Magnitude (PLM)
    • 8. Estimate worst-case loss
    • 9. Estimate probable loss

  • Stage 4. Derive and articulate Risk
    • 10. Derive and articulate Risk



The problem with FAIR is that in every place you see the word "Estimate" you can substitute "Make a guess that's not backed by any objective measurement and which could be challenged by anyone with a different agenda." Because all the derived values are based on those estimates, your assessment of FAIR depends on the answer to the question I asked at the start of this post.
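One way to picture the concern is a rough sketch of how uncertainty compounds when guesses are chained. It assumes, purely for illustration, that the estimates are positive ranges that combine by multiplication. FAIR's own charts are not a straight product, so treat this as a thought experiment about the opening question, not a model of FAIR. The names `product_range` and `spread` are hypothetical.

```python
from math import prod


def product_range(estimates):
    """Multiply (low, high) ranges of positive factors into one (low, high) range."""
    lows = [low for low, _ in estimates]
    highs = [high for _, high in estimates]
    return prod(lows), prod(highs)


def spread(interval):
    """Ratio of the high bound to the low bound: 1.0 means no uncertainty."""
    low, high = interval
    return high / low


# Hypothetical: every estimate is only known to within a factor of two.
one_guess = [(1.0, 2.0)]
five_guesses = [(1.0, 2.0)] * 5

print(spread(product_range(one_guess)))     # 2.0
print(spread(product_range(five_guesses)))  # 32.0
```

In this toy setup, five factor-of-two guesses leave the final answer uncertain by a factor of 32. Decomposition only helps if each small estimate is better informed than a single direct guess would have been.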

Let's see how this process stands up to some simple scrutiny by reviewing FAIR's Analyzing a Simple Scenario.

A Human Resources (HR) executive within a large bank has his username and password written on a sticky-note stuck to his computer monitor. These authentication credentials allow him to log onto the network and access the HR applications he’s entitled to use...

1. Identify the Asset at Risk: In this case, however, we’ll focus on the credentials, recognizing that their value is inherited from the assets they’re intended to protect.


We start with a physical security risk case. This simplifies the process considerably and actually gives FAIR the best chance it has to reflect reality. Why is that? The answer is that the physical world changes more slowly than the digital world. We don't have to worry about solid walls being penetrated by a mutant from the X-Men movies or about the state of the credentials suddenly being altered by a patch or configuration change.

2. Identify the Threat Community: If we examine the nature of the organization (e.g., the industry it’s in, etc.), and the conditions surrounding the asset (e.g., an HR executive’s office), we can begin to parse the overall threat population into communities that might reasonably apply... For this example, let’s focus on the cleaning crew.

That's convenient. The document lists six potential threat communities but decides to analyze only one. Simplification sure makes it easier to proceed with this analysis. It also means the result is so narrowly targeted as to be almost worthless, unless we decide to repeat this process for the rest of the threat communities. And this is still only looking at a sticky note.

3. Estimate the probable Threat Event Frequency (TEF): Many people demand reams of hard data before they’re comfortable estimating attack frequency. Unfortunately, because we don’t have much (if any) really useful or credible data for many scenarios, TEF is often ignored altogether. So, in the absence of hard data, what’s left? One answer is to use a qualitative scale, such as Low, Medium, or High.

And, while there’s nothing inherently wrong with a qualitative approach in many circumstances, a quantitative approach provides better clarity and is more useful to most decision-makers – even if it’s imprecise.

For example, I may not have years of empirical data documenting how frequently cleaning crew employees abuse usernames and passwords on sticky-notes, but I can make a reasonable estimate within a set of ranges.

Recognizing that cleaning crews are generally comprised of honest people, that an HR executive’s credentials typically would not be viewed or recognized as especially valuable to them, and that the perceived risk associated with illicit use might be high, then it seems reasonable to estimate a Low TEF using the table below...

Is it possible for a cleaning crew to have an employee with motive, sufficient computing experience to recognize the potential value of these credentials, and with a high enough risk tolerance to try their hand at illicit use? Absolutely! Does it happen? Undoubtedly. Might such a person be on the crew that cleans this office? Sure – it’s possible. Nonetheless, the probable frequency is relatively low.
(emphasis added)

Says who? Has the person making this assessment done any research to determine if infiltrating cleaning crews is a technique used by economic adversaries? If yes, how often does that happen? What is the nature of the crew cleaning this office? Do they perform background checks? Have they been infiltrated before? Are they owned by a competitor? Figuring all of that out is too hard. Let's just supply guess #1: "low."

4. Estimate the Threat Capability (Tcap): Tcap refers to the threat agent’s skill (knowledge & experience) and resources (time & materials) that can be brought to bear against the asset... In this case, all we’re talking about is estimating the skill (in this case, reading ability) and resources (time) the average member of this threat community can use against a password written on a sticky note. It’s reasonable to rate the cleaning crew Tcap as Medium, as compared to the overall threat population.

Why is that? Why not "low" again? These are janitors we're discussing. Guess #2.

5. Estimate the Control Strength (CS): Control strength has to do with an asset’s ability to resist compromise. In our scenario, because the credentials are in plain sight and in plain text, the CS is Very Low. If they were written down, but encrypted, the CS would be different – probably much higher.

It is easy to accept guess #3 because we are dealing with a physical security scenario. It's simple for any person to understand that a sticky note in plain sight has zero controls applied against it, so the (nonexistent) "controls" are worthless. But what about that new Web application firewall? Or your anti-virus software? Or any other technical control? Good luck assessing their effectiveness in the face of attacks that evolve on a weekly basis.

6. Derive Vulnerability (Vuln)

This value is derived using a chart that balances Tcap vs Control Strength. Since it is based on two guesses, one could ask whether it is more or less accurate than estimating the vulnerability directly.

7. Derive Loss Event Frequency (LEF)

This value is derived using a chart that balances TEF vs Vulnerability. We derived vulnerability in the previous step and estimated TEF in step 3.
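The two derivation steps can be sketched as lookups on a five-level scale. The rules in `derive_vuln` and `derive_lef` below are hypothetical stand-ins that I made up for illustration; they are not FAIR's actual charts, which are not reproduced in this post. The only values taken from the scenario are the three estimates: TEF Low, Tcap Medium, CS Very Low.

```python
LEVELS = ["Very Low", "Low", "Medium", "High", "Very High"]


def clamp(index: int) -> int:
    """Keep an index inside the five-level scale."""
    return max(0, min(len(LEVELS) - 1, index))


def derive_vuln(tcap: str, cs: str) -> str:
    """Hypothetical rule: vulnerability rises as Tcap exceeds Control Strength."""
    gap = LEVELS.index(tcap) - LEVELS.index(cs)
    return LEVELS[clamp(2 + gap)]


def derive_lef(tef: str, vuln: str) -> str:
    """Hypothetical rule: LEF never exceeds TEF, and lower vulnerability lowers it further."""
    shortfall = (len(LEVELS) - 1) - LEVELS.index(vuln)
    return LEVELS[clamp(LEVELS.index(tef) - shortfall)]


# The scenario's three estimates, run through the stand-in rules.
print(derive_lef("Low", derive_vuln("Medium", "Very Low")))

# Same scenario, but with Tcap guessed one level lower.
print(derive_lef("Low", derive_vuln("Low", "Very Low")))

# Same scenario, but with TEF guessed one level higher.
print(derive_lef("Medium", derive_vuln("Medium", "Very Low")))
```

With these made-up rules, moving a single estimate by one level moves the derived LEF by one level. The derived value is only as defensible as the estimates fed into it.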

8. Estimate worst-case loss: Within this scenario, three potential threat actions stand out as having significant loss potential – misuse, disclosure, and destruction... For this exercise, we’ll select disclosure as our worst-case threat action.

This step considers Productivity, Response, Replacement, Fine/Judgments, Competitive Advantage, and Reputation, with Threat Actions including Access, Modification, Disclosure, and Denial of Access. Enter guess #4.

9. Estimate probable loss magnitude (PLM): The first step in estimating PLM is to determine which threat action is most likely. Remember; actions are driven by motive, and the most common motive for illicit action is financial gain. Given this threat community, the type of asset (personal information), and the available threat actions, it’s reasonable to select Misuse as the most likely action – e.g., for identity theft. Our next step is to estimate the most likely loss magnitude resulting from Misuse for each loss form.

Again, says who? Was identity theft chosen because it's popular in the news? My choice for guess #5 could be something completely different.

10. Derive and Articulate Risk: [R]isk is simply derived from LEF and PLM. The question is whether to articulate risk qualitatively using a matrix like the one below, or articulate risk as LEF, PLM, and worst-case.

The final risk rating is another derived value, based on previous estimates.

The FAIR author tries to head off critiques like this blog with the following section:

It’s natural, though, for people to accept change at different speeds. Some of us hold our beliefs very firmly, and it can be difficult and uncomfortable to adopt a new approach. Ultimately, not everyone is going to agree with the principles or methods that underlie FAIR. A few have called it nonsense. Others appear to feel threatened by it.

Apparently I'm resistant to "change" and "threatened" because I firmly hold on to "beliefs." I'm afraid that is what I will have to do when frameworks like this are founded upon someone's opinion at each stage of the decision-making process.

The FAIR document continues:

Their concerns tend to revolve around one or more of the following issues:

The absence of hard data. There’s no question that an abundance of good data would be useful. Unfortunately, that’s not our current reality. Consequently, we need to find another way to approach the problem, and FAIR is one solution.


I think I just read that the author admits FAIR is not based on "good data," and since we don't have data, we should just "find another way," like FAIR.

The lack of precision. Here again, precision is nice when it’s achievable, but it’s not realistic within this problem space. Reality is just too complex... FAIR represents an attempt to gain far better accuracy, while recognizing that the fundamental nature of the problem doesn’t allow for a high degree of precision.

The author admits that FAIR is not precise. How can it even be accurate when the derived values are all based on subjective estimates anyway?

Some people just don’t like change – particularly change as profound as this represents.

I fail to see why FAIR is considered profound. Is the answer because the process has been broken into five estimates, from which several other values are derived? Why is this any better than articles like How to Conduct a Risk Analysis or Risk Analysis Tools: A Primer or Risk Assessment and Threat Identification?

I'm sure this isn't the last word on this issue, but I need to rest before teaching tomorrow. Thank you for staying with me if you read the whole post. Obviously if I'm not a fan of FAIR I should propose an alternative. In Risk-Based Security is the Emperor's New Clothes I cited Donn Parker, who is probably the devil to FAIR advocates. If the question is how to make security decisions by assessing digital risk, I will put together thoughts on that for a post (hopefully this week).

Incidentally, the fact that I am not a fan of FAIR doesn't mean I think the authors have wasted their time. I appreciate their attempt to bring rigor to this process. I also think the questions they ask and the elements they consider are important. However, I think the ability to insert whatever value one likes into the five estimations fatally wounds the process.

This is the bottom line for me: FAIR advocates claim their output is superior due to their framework. How can a framework that relies on arbitrary inputs produce non-arbitrary output? And what makes FAIR so valuable anyway -- has the result been tested against any other methods?

17 comments:

Anonymous said...

FAIR is more precise because instead of making 3 guesses like in the old-school "risk equation" it's now 5. So it's almost twice as precise. :) It's also more scientific because it has cool acronyms like Tcap and LEF and acronyms are always more scientific.

"No, dude, see... it goes to 11!"

mjr./Marcus Ranum

Alex said...

Uh, no Marcus, precision in a Bayesian network is primarily a function of how informative your priors are, not the amount of complexity in the model.

FAIR is more accurate because it is a logical framework for expressing risk that uses things like frequency.

Think of it this way: if I'm shooting at a target, and I have a very tight grouping, I'm precise. That has nothing to do with accuracy (whether I've hit the bullseye or not). I can be less precise (have a much wider distribution of shots) but still be accurate (in the bullseye).

You should really read the book I recommended after your first podcast on the subject - "Probability Theory -- The Logic Of Science". Quoting E.T. Jaynes is a lot hipper than quoting Nigel Tufnel :)
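The target-shooting distinction in the comment above can be put in numbers. This is a minimal sketch with invented shot coordinates and hypothetical function names: accuracy is measured as how far the center of the group sits from the bullseye at (0, 0), and precision as how tightly the shots cluster around their own center.

```python
from math import hypot
from statistics import mean


def accuracy_error(shots):
    """Distance from the bullseye (0, 0) to the center of the group. Lower is more accurate."""
    cx = mean(x for x, _ in shots)
    cy = mean(y for _, y in shots)
    return hypot(cx, cy)


def precision_spread(shots):
    """Average distance of each shot from the center of the group. Lower is more precise."""
    cx = mean(x for x, _ in shots)
    cy = mean(y for _, y in shots)
    return mean(hypot(x - cx, y - cy) for x, y in shots)


# Hypothetical groupings.
tight_but_off = [(5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0)]
wide_but_centered = [(-4.0, 0.0), (4.0, 0.0), (0.0, -4.0), (0.0, 4.0)]

for name, shots in [("tight_but_off", tight_but_off), ("wide_but_centered", wide_but_centered)]:
    print(name, accuracy_error(shots), precision_spread(shots))
```

The first group is precise but inaccurate; the second is imprecise but accurate on average. Note that measuring accuracy here requires knowing where the bullseye is, which is exactly what a risk estimate without observational data lacks.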

Richard Bejtlich said...

I'm cross-commenting so people who might not have seen Alex's post will see my response.

Alex, you said:

"if it is true that Bayesian approaches are valid, and if risk is a probability issue and FAIR is the right framework for use in such an approach, then where does using FAIR fail us?"

Consider the Microsoft article you cited. It says:

"Bayesian probability theory is a branch of mathematical probability theory that allows one to model uncertainty about the world and outcomes of interest by combining common-sense knowledge and observational evidence..."

So, it is obvious to me what the problem with FAIR is: who determines what is "common-sense knowledge," and where is your "observational evidence"? The answers seem to be "whoever is doing the analysis" (which is no answer at all) and "unavailable." Therefore, you have arbitrary inputs producing arbitrary outputs.

Clint Laskowski said...

Assessing risk is a very complex issue. I'm convinced it may not even be possible with regards to information security, which involves aggregate risk and complex/compound risks (not just the simple risks like the one in the FAIR paper). I've been saying for some time that with Information Security Risk Assessments, The Journey is the Reward. In other words, the effort of attempting to identify and (at least partially) understand possible threats, vulnerabilities, and impacts is ultimately more valuable than the risk calculation. I'll be posting more on this soon.

Alex said...

That's a very valid point, Clint. I've often thought that for the practitioner, much of the benefit comes not from the probability statement at the end of a FAIR analysis, but in the rigor applied to the process.

RE: "Guess" x "Guess" by the way:

This is perfectly valid in Probability Theory (not the "guess" part but that's just a superficial 'attack via thesaurus' on the validity of prior information). Creating a prior from a posterior is certainly acceptable. More cheekily - "One man's posterior is another man's prior".
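The "posterior becomes prior" idea in the comment above can be sketched with a Beta-Binomial update, a standard textbook construction. The prior and the incident counts are invented for illustration, the function names are hypothetical, and this is not a description of how FAIR itself computes anything.

```python
def update(prior, events, non_events):
    """Beta-Binomial update: a Beta(a, b) prior plus observed counts gives a Beta posterior."""
    a, b = prior
    return (a + events, b + non_events)


def beta_mean(dist):
    """Mean of a Beta(a, b) distribution."""
    a, b = dist
    return a / (a + b)


# Hypothetical prior: Beta(1, 1) is uniform, i.e. no opinion about the rate.
prior = (1, 1)

# Hypothetical observations: 1 incident in the first 10 periods, 0 in the next 10.
after_first = update(prior, events=1, non_events=9)
after_second = update(after_first, events=0, non_events=10)  # posterior reused as prior

# Updating in two steps matches updating once with all 20 periods.
all_at_once = update(prior, events=1, non_events=19)

print(after_second, all_at_once, beta_mean(after_second))
```

The mechanics are sound, as the comment says. What the sketch cannot settle is the question raised earlier in the thread: where the prior and the observations come from in the first place.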

Clint Laskowski said...

More questions: (1) If we can't measure risk, can we manage it ('you can't manage what you can't measure')? (2) Can we ever hope to be taken seriously if we can't measure risk to a high degree of confidence (i.e., as compared to six-sigma, five-nines, etc.)? (3) Can anyone point to a public (or redacted) document that takes a real-world scenario (say a small business with a server, 10 desktops, a few laptops, some printers, an Internet connection, and some confidential data) and conducts a serious quantitative or qualitative information security risk assessment resulting in risk rankings that are not based on "estimates" that can easily be dismissed by almost anyone? I don't think so (trust me, I've searched)!

Clint Laskowski said...

CHALLENGE

(Richard, I hope you don't mind me posting this challenge on your blog, but this seems to be where the discussion is taking place.)

Alex - or anyone else promoting an information security risk assessment methodology - how about posting a real-world example of how your risk assessment methodology works using a scenario like I described in my previous post (a small business with a server, 10 desktops, a few laptops, some printers, an Internet connection, and some confidential data). You can make the threats, vulnerabilities, and impacts whatever you want; they just need to be realistic and reasonably complete. If your methodology can generate risk rankings that can be used to determine if a risk is greater than a specific tolerance level, and it can rank the risks by magnitude (i.e., show which risk is the "highest", second "highest", third, etc.), and it is not based on complete guesswork that can be easily refuted ... I'll eat my hat! And, I venture to say Richard might eat his hat, too!

Tomas said...

In defense of FAIR

http://deeptrust.blogspot.com/2007/08/fair-is-defended.html

Alex said...

Clint,

Actually, we're about to do that - it was a study used last week to justify security controls around a web application.

It has a good mix of noisy and non-noisy priors.

Chris Hayes said...

I have been a FAIR practitioner for over two years and have performed hundreds of risk assessments with it. Risk scenarios I have assessed have spanned the spectrum of information security (technical, application security, legal, regulatory risk, 3rd party risk, you get the point). I like FAIR for quite a few reasons, but will list a few:

1. The business decision makers for multi-million dollar lines of business, to whom I have had to defend my risk assessments, understand the terms that make up the framework and understand risk better than most information security professionals (who tend to think in binary terms: 1 or 0, good or bad).

2. FAIR allows for a consistent approach to assessing risk, whether it's group-based or an individual effort. I had a co-worker who recently came back from a SANS course where he was introduced to a VMware vulnerability. You would have thought when he came back that we were literally on the verge of a global catastrophe. Once we started talking about the vulnerability in terms of a scenario, his blood pressure started to drop and balance was restored in the universe. A classic possible-versus-probable example.

3. Finally – FAIR is agile and not just for information security professionals to use. I have used FAIR for non-security risk scenarios to help guide personal decisions.

So, there you have it. While I might not be able to debate the foundations of FAIR (Bayesian analysis), there is no debating the results it provides to decision makers as well as the professional growth I have enjoyed since using it.

Clint Laskowski said...

@ chris hayes ...

So, with the experience of doing "hundreds of risk assessments" using FAIR, do you not have even a single information security risk assessment that you could post for us to learn from ... one that meets my challenge above (and please don't say "but they contain confidential information" ... that stuff can be redacted)?

Anonymous said...

Have a look at this book:
"Uncertainty"
by Morgan and Henrion. It contains many ways for experts to estimate risk, and the authors deal with fields, including nuclear safety, where there has been a rather lower frequency of incidents and rather higher loss values than are normally found in Information Security.

Andrew Yeomans

Chris Hayes said...

@ Clint - I will commit to posting a few scenarios at http://infosecrisk.blogspot.com/. I will need a few days to get some things in order, configure the blog to my liking, and do some sanitizing.

Alex said...

@Clint,

The offer is out on RMI's Weblog. We'll also begin posting some real world examples, complete with where prior information is gathered from.

Clint Laskowski said...

I'll start preparing my hat ... salt, pepper, etc. ;-)

Allen Baranov, CISSP said...

It seems to me that FAIR is better than the traditional ALE equation because, although you are still guessing, your guesses are leading you through a better thought process.

It is by no means perfect or even very much better than ALE but it seems better to me.

Having said that I have never seen a risk assessment that I was comfortable with where the risk wasn't negligible (aliens stealing our servers and blowing up the DRM site) or certain (a virus landing on the mail server).

In answer to Clint's comment:

If we can't measure risk, can we manage it ('you can't manage what you can't measure')?

Fair enough, but the original quote is from Lord Kelvin who worked in a science lab where everything was standardised. We are in the real world, where there are different factors changing and interacting all the time. Our measurement will never be exact or anything better than gut feel and good guesses.

(2) Can we ever hope to be taken seriously if we can't measure risk to a high degree of confidence (i.e., as compared to six-sigma, five-nines, etc.);

Sure. Work with what is certain - viruses, portscans, problematic patches, stupid users. If you seriously need five-nines you can get it with a good firewall and some very strict server access rules. Most desktops don't have to be up all the time. Accept a certain level of risk. Just make sure you have backups.

(3) Can anyone point to a public (or redacted) document that takes a real-world scenario (say a small business with a server, 10 desktops, a few laptops, some printers, an Internet connection, and some confidential data) and conducts a serious quantitative or qualitative information security risk assessment resulting in risk rankings that are not based on "estimates" that can easily be dismissed by almost anyone? I don't think so (trust me, I've searched)!

Hmm... even in this scenario there would be some issues getting standard data. Do the laptops go home? Do the workers undergo security awareness training? Is the company a target for hackers? What is the physical security like? Do they use NAC? Do they have firewalls between segments? Are there branches? Do they use wireless? Do they use credit cards ever? Does one of their staff run a personal web server on his desktop and copy movies through P2P?

Clint Laskowski said...

@allen, re: (3)

All your questions asking what controls are in place are irrelevant.

I'm not asking about a specific scenario.

I'm saying, can proponents of information security risk assessment point to even a single public document (scrubbed if you like) measuring and prioritizing the information security risks of a typical small business (much less a large business), based on values for likelihood and impact that are anything more than guesses?

If information security risk assessment (risk identification, risk analysis, and risk evaluation) were real, there would be plenty of good examples out there by now, and they could stand up to scrutiny. As it is, there are none as far as I can see.