Five Thoughts on Incident Response

March 14, 2007

Speaking of incidents, I thought it might be interesting to share a few brief observations based on incidents I've worked recently. Please remember this is a blog post. If you expect thorough explanations of these points with footnotes, historical references, arguments to the contrary expertly swept aside, etc., please wait for a future book! :)

Anti-Virus is not (or should not be) an incident response tool. I am baffled when I see machines compromised, and the owners think a magic signature from their AV vendor is going to save the day. In this day and age intruders who gain kernel level control of a host often disable AV and will not give up the fight so easily. My second point relates to this one.

Your default incident recovery strategy should be to rebuild from scratch. By scratch I mean reinstallation from original trusted media and re-installation of applications and data.

Today, in 2007, I am still comfortable saying that existing hardware can usually be trusted, without evidence to the contrary, as a platform for reinstallation. This is one year after I saw John Heasman discuss PCI rootkits (.pdf). I was lucky enough to spend a few hours chatting with John and fellow NGS Software guru David Litchfield after John's talk on firmware rootkits (.pdf). John's talks indicate that the day is coming when even hardware that hosted a compromised OS will eventually not be trustworthy.

One day I will advise clients to treat an incident zone as if a total physical loss has occurred and new platforms have to be available for hosting a reinstallation. If you doubt me now, wait for the post in a few years where I link back to this point. In brief, treat an incident like a disaster, not a nuisance. Otherwise, you will be perpetually compromised.

SPAN ports should not be the default traffic access option. I cannot tell you how much time, effort, and frustration has accompanied the use (or attempted use) of SPAN ports in incident response situations.
- "The SPAN port is already used."
- "The SPAN port can't do that." (although it probably can, the network engineer either doesn't know how to set it up or doesn't want it configured to help the security team)
- "Do you see anything? No? Try now. No? Try now. No?"
- "You only see half the traffic? Wait, try this. Now you see double? Ok, try now."
For Pete's sake, buy a tap, put it in the proper place, and stand back while the packets are collected properly.

A Linux live CD is not a substitute for a real network security monitoring platform. Upon realizing that Cisco MARS is not an incident response solution, I was desperate to collect some form of useful network-centric data at one client site. In a last-ditch attempt to salvage a bad situation my on-site colleague deployed a Network Security Toolkit live CD on top of a box previously running Linux natively. I was able to SSH into it, mount the local filesystem, and start writing packets to the hard drive using Tshark's ring buffer. This is absolutely making the best out of a mess, which is standard incident response behavior.
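For readers who have not used Tshark's ring buffer, here is a minimal sketch of how such a capture command could be assembled. The interface name, output path, file size, and file count are all assumptions chosen for illustration, not the values used on that engagement. The helper name is hypothetical. The `-i`, `-w`, and `-b filesize:`/`-b files:` options are standard Tshark capture options, with `filesize` given in kB.

```python
from typing import List


def tshark_ring_buffer_cmd(
    interface: str,
    out_path: str,
    file_kb: int = 100_000,
    max_files: int = 50,
) -> List[str]:
    """Build a tshark argument list that writes packets to a ring buffer.

    Hypothetical helper for illustration only. tshark rotates to a new
    file once the current one reaches file_kb kilobytes, and keeps at
    most max_files files, deleting the oldest as it wraps around.
    """
    if file_kb <= 0 or max_files <= 0:
        raise ValueError("file_kb and max_files must be positive")
    return [
        "tshark",
        "-i", interface,               # capture interface (assumed name)
        "-w", out_path,                # base name for the capture files
        "-b", f"filesize:{file_kb}",   # rotate after this many kB
        "-b", f"files:{max_files}",    # cap on files kept on disk
    ]


if __name__ == "__main__":
    # Print the command rather than running it; capturing needs privileges.
    print(" ".join(tshark_ring_buffer_cmd("eth0", "/mnt/disk/capture.pcap")))
```

The resulting list could be handed to `subprocess.run` on the sensor. Capping both the file size and the file count bounds the disk space the capture can consume, which matters on borrowed hardware.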

I would ask anyone who turns to a live CD for their monitoring needs to avoid the temptation to think Snort on a live CD on spare, old hardware is anything like Snort on properly sized, configured, deployed hardware. Furthermore, Snort != monitoring. Live CDs are fine for assessment work but they are nearly worthless for packet capture. Needless to say I was able to talk my colleague through a FreeBSD installation and was soon collecting data in a somewhat better environment.

When you are compromised, you are probably not facing a zero-day exploit unique to you and not capable of being prevented. When you are compromised you're most likely suffering from some fairly modern variant of attack code that nevertheless contains exploits dating back to 2002. For some reason people seem to feel better if they think the incident is caused by some uber elite intruder who saved up his killer 0-day just for their enterprise. In reality someone probably connected an infected laptop physically to the network, or via VPN, and found a way to get a worm or other malware to the segment of the enterprise running "production" machines that never get patched.

Do you have any IR stories or lessons to share? Please post them as comments or write on your blog, then post a link here as a comment. Thank you.

Comments

hogfly said…

Richard,

I've included a random spattering of comments in my blog

12:55 AM

H. Carvey said…

Thoughts...

Anti-Virus is not (or should not be) an incident response tool.

And yet, many times, it is. The fact remains that many infrastructures are simply unprepared for an incident...personnel aren't trained, etc., so "incident response" is performed by committee. I really don't think that there's anything wrong with performing your own IR, at least to start...there just needs to be the training in place first.

Your default incident recovery strategy should be to rebuild from scratch.

Ugh! In the face of business continuity issues, blanket statements such as this do a disservice to your customers. Not to pick nits, but how about some caveats here? If you get hit with something in user mode and find no evidence of a kernel compromise, why can't you keep your systems running, after extensive cleaning, and with continued monitoring...at least until you can get replacements up and running?

"Rebuild from scratch" without a root cause analysis will very likely lead to re-compromise or re-infection. You can re-install the system from clean media, and patch it from here to kingdom come, but if the original issue was a misconfiguration in an application or a weak, easily guessed password, you're setting yourself up to be p0wned all over again.

Should systems be rebuilt? Yes, I believe so...in many cases, this is not only the best option, but a great opportunity to get upgrades in place.

A Linux live CD is not a substitute for a real network security monitoring platform.

Nor is it useful during live response.

When you are compromised, you are probably not facing a zero-day exploit unique to you and not capable of being prevented.

Agreed. The problem that I've seen is that most incident responders are thrown into the fire when an incident occurs, with no training or background. Too often, cases like this result in worms running unchecked for weeks at a time, or the responder claiming that a rootkit is to blame and throwing in the towel after doing nothing more than looking at Task Manager.

There's a lot more to IR than just what most folks see, and the best overall approach is a proactive one.

1. If you have an IT infrastructure, get IR training. If you've already experienced issues, then you've already got a basis for functional, hands-on training right there. Going off to Linux-centric training when you're an all-Windows shop can be a waste of time and money, depending on your staff.

2. Prepare for incidents. In my book, I use "Mission Impossible" as an example of IR prep...Cruise's character makes it back to the safe house and, on reaching the top of the stairs, takes off his jacket and crushes the light bulb inside it. He then spreads the shards in the hallway as he backs toward the door (ingress route). Anyone entering the hallway won't see the glass and will make noise. Configuring systems appropriately will cause noise when an incident occurs...at that point, it's just a matter of listening.

3. Develop the ability of your internal staff to perform tier 1 IR, and if necessary, seek out professionals to perform tier 2/3 IR. Seek their training and assistance. One of the more common issues I've seen when going on-site is that the customer wants to know what processes were running on the system, and if any sensitive data was leaving the network...and the system in question has been shut down and taken off the network, and there are no network logs.

9:11 AM

Richard Bejtlich said…

Hi Harlan,

Thanks for your comments. I agree with you, but please don't think my recommendation for rebuilding from scratch precludes root cause analysis. I would never recommend rebuilding from scratch blindly because I agree that re-compromise would likely happen.

9:24 AM

yoshi said…

re: taps

I am sensitive to taps, as a tap becomes a single point of failure. In my last permanent position we employed taps on numerous occasions only to watch them fail and take out service at some random moment (so much for fail open). In the end, we set a standard for SPAN ports, enforced it, and rarely had issues. It's called setting process and standards.

10:29 AM

PaulM said…

Excellent article, and good advice all around.

Anti-Virus is not (or should not be) an incident response tool.

I think your point started out something like, "Once you know a machine has been compromised, running Stinger is a waste of time. It probably won't find it and it definitely can't clean it." And I agree totally with that statement. Just don't throw out the baby with the bathwater.

Over the past 6-7 years, AV has grown up. I get real-time alert data from all of my AV clients. Not only does it detect named viruses, this AV product is also capable of some generic attack/behavior detection. Since malware is increasingly being delivered and operating over TLS, host-based detection will continue to be important.

11:12 AM

Richard Bejtlich said…

Tim,

So how do you prevent your switches failing? Are all your hosts connected to two or more switches for redundancy?

I am interested in knowing your tap vendor, too.

11:44 AM

Anonymous said…

Great article.

I have to agree with you about rebuilding from scratch. If it's just a PC that's infected (even with a root kit) due to some malware that doesn't mean your whole enterprise is "owned".

One thing not mentioned (and probably beyond the scope):
How do you handle what you've found in the course of your investigation? What if it leads outside of your enterprise?

1:10 PM

Richard Bejtlich said…

Thanks David.

How I handle findings depends on the client, unless there's some bright line issue (like CP) which requires me notifying the authorities if the client won't act. I usually leave dealing with other companies to the client, although I will support them. It's usually not my role to act specifically on behalf of a company when speaking to another company. Furthermore, corporate lawyers tend to get heavily involved when other companies are affected.

1:50 PM

H. Carvey said…

Richard,

Re: root cause analysis...one thing to remember is that even with a blog, lots of folks aren't going to remember a lot of the intricacies of what you said. While many incidents will result in a rebuild, it's extremely important to not only highlight the need for an RCA, but to tell folks how to do one.

Harlan

2:48 PM

TaoSecurity Blog
