Thursday, January 04, 2007

Hawke vs the Machine

One of the better episodes of an otherwise lacklustre third season of Airwolf (one of the best TV shows of the 1980s) was called Fortune Teller. Ace helicopter pilot Hawke finds himself face-to-face with an aircraft equipped with the Fortune Teller, a machine built to out-fly the world's best. The best line in the episode follows:

Archangel: They haven't built a machine yet that could replace a good pilot, Hawke.

Hawke: Let's hope so.


I thought of this truth when a colleague called yesterday. He plans to bring me to his customer's security operations center. The manager of the SOC is worried that the team is not detecting and responding to intrusions properly (or at all). One of the services my company provides is assessing and evaluating detection and response teams. So, I plan to visit this SOC and see what I can do to improve the situation.

I think I already know what the problem is. My friend told me the SOC is using an IDS/IPS solution you would all recognize, coupled with a SIM/SEM you would all recognize. If that is the extent of the tools available to the SOC analysts, I pretty much know what I am going to say before I even arrive.

The vast majority of security tools -- especially prevention-oriented tools -- are alert-centric. They are created by developers who assume they can understand attacks, code that understanding into a product, and count on customers to deploy it properly in the expected environment. In reality, those assumptions fail on most counts, if not all. For example, who would ever have expected the Adobe Acrobat JavaScript Execution Bug as explained by my friend Nitesh Dhanjani?

The problem with a purely alert-centric tool is that it cannot handle all the rich varieties of conditions it will encounter in a production network -- ever. As soon as it's updated, something new appears. The only "solutions" that truly work are those that completely eliminate classes of vulnerabilities. (I think I've written about this before, and I know others have said much more than I have on that issue.)

So what does an analyst do with an alert-centric tool, like an IDS/IPS? The analysis process looks like the figure at left. When you only have alerts to handle, you quickly reach a dead end. This is why I know the problem at the SOC before I get there. They're using alert-centric tools. The "detection" and "response" mission becomes "process tickets" generated when the alert-centric tool blinks red. The focus is on processing tickets, not finding intrusions. What else can these analysts do? Their tools are a straitjacket.

Network Security Monitoring (NSM) is different. Generating statistical, session, full content, and alert data gives analysts somewhere to go when they get an alert -- or when they want to do manual analysis. All of the best intrusions I've ever found were discovered manually. Sure, I've discovered compromises solely using IDS alerts, but all of the best subsequent investigation relied on NSM data.
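
To make the pivot concrete, here is a minimal Python sketch. The session record layout (start time, addresses, destination port) is invented for illustration, not taken from any particular NSM tool:

```python
from dataclasses import dataclass

@dataclass
class Session:
    start: int      # epoch seconds when the session began
    src: str        # source IP address
    dst: str        # destination IP address
    dst_port: int   # destination port

def sessions_around(records, ip, alert_time, window=3600):
    """Return every session touching `ip` within `window` seconds of an
    alert -- the context a bare alert cannot supply on its own."""
    return [r for r in records
            if ip in (r.src, r.dst) and abs(r.start - alert_time) <= window]
```

An analyst with only the alert stops there; an analyst with session data can ask "what else did this host do around that time?" and keep going.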

With NSM, an alert is the beginning of the investigation, not the end, as shown at right. What does this have to do with Hawke and the Machine? An analyst using an alert-centric tool is only as good as the tool.

An analyst using NSM data can exceed the capabilities of the tool.

Here's a related thought. Years ago I told Snort creator Marty Roesch how I perform NSM analysis. He said "Richard, I wrote Snort so you don't have to look at packets." I replied, "Marty, I look at packets because you wrote Snort."

Why is a person better than a machine, in cases like this? The reason is a person is more creative and adaptable. A person can develop a feel for a network. A person can read about a new exploit Wednesday morning and mine NSM data for evidence of that attack faster than a vendor can issue some sort of updated detection or prevention method. Machines are good for providing hints about where to look, which is why I still use IDSs. However, I frequently do completely manual analysis with systems like Sguil, looking for interesting patterns or sessions that my IDS doesn't recognize.
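
The Wednesday-morning hunt can be sketched in a few lines of Python. The record fields and the port-based indicator are assumptions for illustration; real hunts pivot on whatever detail the exploit write-up provides:

```python
def hunt(records, suspect_port, since):
    """Retrospective sweep of stored session records: which hosts already
    talked to the newly reported vulnerable port, grouped by source?
    Each record is a (src, dst, dst_port, start) tuple."""
    hits = {}
    for src, dst, dst_port, start in records:
        if dst_port == suspect_port and start >= since:
            hits.setdefault(src, []).append((dst, start))
    return hits
```

No vendor signature update is required; the evidence was already on disk before the exploit was public.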

I'll close with the note that some vendors are using what I call a "dumb is better" approach for generating alert data. For example, you can read about Tenable Security's Never Before Seen approach to identifying interesting events. This is exactly the sort of alert that is helpful, assuming I can then turn to other NSM data to learn more about it. Never Before Seen (NBS) is a great way to identify events which are new. You can apply this to many sorts of issues, which I will not address here but are mentioned by Ron Gula in his blog.
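
The core of a never-before-seen detector is tiny, which is part of its appeal. Here is a minimal Python sketch; the choice of key (client, server, port) is my assumption, and a real implementation would persist the seen-set across restarts:

```python
class NeverBeforeSeen:
    """Flag the first occurrence of each unique key and stay quiet after."""

    def __init__(self):
        self.seen = set()

    def observe(self, client, server, port):
        """Return True the first time a (client, server, port) triple
        appears -- that is the NBS alert -- and False afterwards."""
        key = (client, server, port)
        if key in self.seen:
            return False
        self.seen.add(key)
        return True
```

This is "dumb" in the best sense: it encodes no attack knowledge at all, yet it surfaces exactly the novel events worth pivoting into with other NSM data.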

(Disclaimers: Before you argue that your system already does this, I know this approach is a lot older than the Tenable post. I know Marcus was already doing this well before he joined Tenable. The point is that I can read and learn about it thanks to the Tenable blog. I can't point my clients to vague marketing materials on other vendors' sites and expect them to make sense of what they're reading. Thanks to the Tenable blog I can learn about systems like Tenable's, written from a technical angle with real product screenshots and so on.)

To conclude, I'm not an expert in any vendor tools. I'm an expert in using the right sorts of data to detect and eject intruders. I am not going to be any better at using some popular vendor's IDS/IPS than another person who understands the interface. There might be room for differentiation based on the sorts of attacks reported or the services monitored, but not much.

At the end of the day, expertise grows from having the right forms of data available. I guarantee the SOC in this story doesn't have that data, so that will be recommendation zero.

12 comments:

LonerVamp said...

I'm certainly not trying to fish out details, but I will say the McAfee Intrushield (IDS/IPS) that my company uses leaves a lot to be desired: in fact, exactly what you mentioned above. Alerts are great, but I distinctly hate seeing one packet for an alert or something similarly useless, in addition to downright horrible alert/attack descriptions and signature information. Too bad I inherited this device and we'll likely be buying another one soon. :(

ayoi said...

This is a great post Richard. I can say that we've encountered a similar problem at our place. Finally I can point out the importance of not relying only on alert data, and ask all the SAs to read your post and learn. It's really frustrating when we don't have any other data to support our analysis. IMHO NSM is so far the best practice.

p/s: And your books are great.

Daniel said...

This NBS (from Tenable) looks like just a copy of the FTS (first time seen) from OSSEC. It does basically the same thing, just with different names.

Daniel

Richard Bejtlich said...

Daniel,

It's definitely not a copy. Marcus (Tenable CTO) has had NBS implementations for years. The latest is here. We were doing "NBS" with ASIM in the USAF in 1993, so nothing's really "new."

Rob Lewis said...

As you know, Marcus also writes a lot about how all of this is dumb, and none of it is better than deny-by-default and enumerating goodness.

No matter how you frame it, this is just reactive versus proactive security.

It makes more sense, to me anyway, to deny "NBS" events that are unauthorized, rather than do a song and dance to figure out where they came from.

Beth Rosenberg said...

Hi Richard. Doing a little PR via blog comment here; a couple of months ago I suggested you take a look at Sandstorm's NetIntercept 3.2 --- either via download or live demo.

We have such a great product, and it's so overlooked, in favor of freeware on one end, and six-figure bloatware on the other. :(

Richard Bejtlich said...

Rob, "Proactive" vs "reactive" is not the way to look at this. I agree that ideally you would deny anything that's not allowed. In reality, it's exceptionally difficult to decide what should be allowed or even what "allowed" means. At the point when you make a decision the network has changed. An exceptionally "secure" or "tight" default deny will never be enforced due to "business concerns."

Mike SC said...

Richard,

I've always used net-flows or argus flows to complement IDS/IPS analysis, and I even think that a full packet capture system - something like Network General InfiniStream - would be even better. Now, the 64K question is: why don't the major IDS/IPS vendors add a similar capability to their systems? Yes, they have the option to capture x packets prior to the alert, or they can do session tagging similar to Snort, but I do not know of a major IDS/IPS product that can do session flow capture. Either they do not agree with the NSM approach, or, if they agree, they don't want it because it would tax the performance of their sensors.

Thanks,
Mike SC

DJB said...

Richard,

I enjoyed your observations and agree with your assessment. I read your blog often, but rarely comment. I feel, however, that this topic deserves further attention. The topic is not alert-centric vs. NSM, or even passive vs. reactive. The real issue here is Return on Investment for security and Due Care.

The cost and lack of common expertise of NSM is why it has not been fully adopted. Every SOC/NOC I've ever been in (over 100) suffers the plight you have identified. Furthermore, I could hire a hundred people with your level of expertise or the same number of Gulas, Ranums and Roeschs to perform NSM. The only problem is that the problem would not go away and I would be out a significant amount of money, even if you have "the right forms of data available." The volume of traffic that we are talking about would require far too many experts. IDS/IPS arms the packet pushers and senior managers with the mighty shield of ignorance. As long as you are doing something, it's perceived as better than doing nothing.

IA as a profession needs to wake up and assert itself. We need to refocus our efforts to be enablers of security rather than enforcers of security. We can't eliminate the threats that perpetrate the attacks and intrusions that NSM detects. Instead we must budget and focus our efforts on ensuring that the people, processes and technology we employ adhere to the established policies and standards. See you at the next ISSA meeting.

Anonymous said...

Wish you a happy and prosperous new year.
Soleilmavis
http://soleilmavis.blogspot.com

one.miguel said...

No need to buy another vendor product for full packet capture. InfiniStream costs about 60k, I think. Get a cheap server with a terabyte of drive space, run Linux (or FreeBSD), and capture packets from a mirror port using tcpdump. Having an all-in-one solution is scary, based on experience. Having three separate (or more) systems (IDS/IPS + flow data + tcpdump) is definitely nicer for incident response. The fact that it doesn't cost an arm and a leg is nice too!
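
For what it's worth, tcpdump's -C and -W options can rotate capture files on their own. The sweeper below is a minimal Python sketch of the same disk-budget housekeeping a DIY full-content sensor needs; the directory layout is hypothetical:

```python
import os

def enforce_budget(capture_dir, max_bytes):
    """Delete the oldest .pcap files in capture_dir until the total size
    fits within max_bytes -- a simple ring buffer for a DIY sensor."""
    pcaps = sorted(
        (os.path.join(capture_dir, f) for f in os.listdir(capture_dir)
         if f.endswith(".pcap")),
        key=os.path.getmtime)                  # oldest capture first
    total = sum(os.path.getsize(p) for p in pcaps)
    for p in pcaps:
        if total <= max_bytes:
            break
        total -= os.path.getsize(p)
        os.remove(p)
```

Run it from cron and the terabyte of drive space becomes a rolling window of full content instead of a disk-full outage.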

Rob Lewis said...

Richard, absolutely all of your comments are fair, but my view is based on a different model of security that was developed exactly because of the difficulties you mention. I am not real technical, but I read your blog because it's as good as any to get a feel for what security people typically go through and have to deal with.

As an exercise for myself, I will attempt to address your comments from our different point of view, (which I hope is following the principle of idea debate that I believe is the true ideal behind blogging).

>>>it's exceptionally difficult to decide what should be allowed or even what "allowed" means.

I do not feel this is difficult at all. We look at the question in the context of "which users are allowed to access these files?"
To be manageable and intuitive, even to non-technical managers, internal controls must be user-centric at the data level; this goes far beyond governing access to the network. It is more than endpoint security, which really is not security at all from this point of view: if your rule is that Richard can access data set A but not data set B (assuming all other requirements for authentication etc. are satisfied), then Richard cannot access data set B no matter what device he uses. This approach must be deny-by-default.

This is accomplished easily if you can enforce group permissions and security rankings within and between groups that already exist in all systems, in what amounts to multilevel/trusted security.


>>>At the point when you make a decision the network has changed.

With the user-centric approach it does not matter how the network changes, but the approach to governing user access must keep pace.

Rather than try to protect core data with edge security, it probably makes more sense to lock down the crown jewels of data at the host or core and push those user-centric rules outward to clients and perimeter. If this can be achieved, then the network can evolve optimally to enhance data flow and still be secure. The white-list rules can be pushed to all devices. Remote users are an extension of your internal network, but by invitation only, and guests to the site have no privileges of any consequence.



>>>>An exceptionally "secure" or
"tight" default deny will never be enforced due to "business concerns."

In our case, there is a continuum of security available to the enterprise, from full mandatory access controls as found in MLS/TOS that can be relaxed to a normal discretionary system with a few authorized rule changes by the CSO.
You can have one group that has full MLS and another that has normal discretionary rules, with the option of allowing users multiple group permissions, but that will not allow any user to share that file with anyone not on the permission list. As per true MLS/TOS, improper use of data is disallowed. It becomes possible to have secure hand-offs of data, complete with tamper proof audit trails of each event, between groups, divisions, orgs, etc.

Such a system has natural advantages that have not been available before: reducing unauthorized network traffic, eliminating false positives and providing useful user-based auditing. While past experiences with TOS etc. may have been negative due to complexity, cost and interoperability (bricking the system), that does not happen with this model.

A whole slew of security problems are eliminated with this approach. Rather than just eliminate use of USB drives, which would inhibit legitimate business data flow, our approach can allow use of all devices for files a user is permitted to access, but prohibit their use for sensitive data or trade secrets. The same can be done for network cards, printers, burners, etc.

In fact, is there a defense against screen scraping (photo capture of screens)? It is very easy to designate that secret info can only be viewed on a certain monitor, in the presence of a security officer or, minimally, video cameras.

The end result is that it becomes possible to enforce security policies that also become much more intuitive to set.

Just my 2 cents.