Monday, December 26, 2005

Pulling the Plug in 2005

Every time I attend a USENIX conference, I gather free copies of the ;login: magazine published by the association. The August 2005 issue features some great stories, with some of them available right now to non-USENIX members. (USENIX makes all magazine articles open to the public one year after publication. For example, anyone can now read the entire December 2004 issue.)

An article which caught my eye was Forensics for System Administrators by Sean Peisert. Although the USENIX copy of the article won't be published until August 2006, you can read Sean's copy here (.pdf).

I thought the article was proceeding well until I came across this advice.

"What happens when there is some past event that a system administrator wishes to understand on their system? Where should the administrator, now a novice forensic analyst, begin? There are many variables and questions that must be answered to make proper decisions about this. Under almost all circumstances in which the system can be taken down to do the analysis, the ideal thing to do is halt or power-off the system using a hardware method." (emphasis added)

Is he serious? The article continues:

"[T]he x86 BIOS does not have a monitor mode that supports this [a hardware interrupt]. The solution for everyone else? Pull the plug. The machine will power off, the disk will remain as-is, and there will be no possibility of further contamination of the evidence through some sort of clean-up script left by the intruder, as long as the disk is not booted off or mounted in read/write mode again. The reason for stopping a machine is that it prevents further alteration of the evidence. The reason for halting with a hardware interrupt, rather than using the UNIX halt or shutdown command is that if a root compromise occurred, those commands could have been trojaned by an intruder to clean up evidence."

I can't believe I'm reading this advice in 2005, only 6 days from 2006. This is the advice I heard nearly 10 years ago. "Pulling the plug" as the first step in a forensic investigation is absolutely terrible advice. I am not a host-based forensics guru, but I know that a live response, first described in the June 2001 book Incident Response by Mandia, Prosise, and Pepe, should be part of even the most basic forensically-minded sys admin's techniques. Sean could have even looked into the ;login: archives to find Keith Jones' article in the November 2001 issue describing live response.

Live response is a technique to retrieve volatile information from a running system in a forensically sound manner. Live response can be frustrated by some binary and kernel alteration techniques, but it is a good (non-network-centric) first step whenever a host is suspected of being compromised. Those who want to know more about live response, and see how helpful the advice can be, will enjoy reading Real Digital Forensics.
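To make the idea concrete, here is a minimal sketch of the collection step, not a vetted live response toolkit. It assumes a Unix-like host, and the command list and the function name `collect_volatile` are hypothetical, chosen only for illustration. Each command's output is timestamped and hashed so it can be verified later.

```python
import hashlib
import subprocess
import sys
from datetime import datetime, timezone

# Hypothetical command list (an assumption, not taken from any cited book).
# A real responder would run trusted binaries from read-only media rather
# than the suspect host's own, possibly trojaned, copies.
DEFAULT_COMMANDS = {
    "processes": ["ps", "aux"],
    "network": ["netstat", "-an"],
    "logged_in_users": ["who"],
}

def collect_volatile(commands):
    """Run each command; return its raw output, a SHA-256 digest, and a UTC timestamp."""
    results = {}
    for label, argv in commands.items():
        collected_at = datetime.now(timezone.utc).isoformat()
        try:
            proc = subprocess.run(argv, capture_output=True, timeout=30)
            output = proc.stdout
            error = proc.stderr.decode(errors="replace")
        except (OSError, subprocess.TimeoutExpired) as exc:
            # A missing or hung tool is recorded, not fatal to the collection.
            output, error = b"", str(exc)
        results[label] = {
            "argv": argv,
            "collected_at": collected_at,
            "sha256": hashlib.sha256(output).hexdigest(),
            "output": output,
            "error": error,
        }
    return results
```

Calling `collect_volatile(DEFAULT_COMMANDS)` returns one record per command. In practice the results should be sent to a separate collection system instead of being written to the suspect host's disk, since writing locally alters the evidence.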

Sean tries to defend pulling the plug here:

"In our first example intrusion, I took a preliminary look at the syslog and saw that dates of suspicious logins went back at least three weeks. Given that the intrusion seemed to be going on for so long, I decided that I could no longer trust the system to reliably and accurately report evidence about itself. Therefore, pulling the plug on the machine was the best option."

That is a really weak excuse. Certainly a non-ankle-biter attacker will take steps to hide his presence. That does not mean that no attempt should be made to collect volatile system information!

Sean continues:

"It is certainly the case that halting a system can help perserve more evidence, particularly that in swap, slack, or otherwise unallocated space on disk. But it also can destroy some evidence. For example, halting a system will wipe out the contents of memory, hindering the ability of an analyst to dump a memory image to disk. However, in the forensic discussions in this article, slack space and memory dumps are outside the scope of our analysis. In our case, halting a system merely helped to preserve real evidence, and had the intrusion in our first example been discovered sooner, and the system sooner halted as a result, the intruder would have had less time to cover their tracks. Then, as I will discuss, certain helpful log files that were deleted may have been recoverable."

If Sean is worried that an intruder will take actions to "cover their tracks," then the live response can be performed after the victim host has been cut off from the Internet. Sure, the most 31337 attackers may detect this and start self-cleansing procedures, but how often does that happen? Also, collecting live response data does not usually trigger any cleaning mechanisms. The sort of data one collects is the normal information a system administrator might inspect during the course of regular duties.

The fundamental issue here is whether pulling the plug should be the first response activity or not. In my experience, cutting off remote access is the first step. Analysis of NSM data involving the target host is second. Live response is the third. Forensic duplication and analysis is the fourth, if the previous two steps point to compromise and the resources for investigation are available.

This part of the article makes me sad:

"This material is based on work sponsored by the United States Air Force and supported by the Air Force Research Laboratory under Contract F30602-03-C-0075 and performed in conjunction with Lockheed Martin Information Assurance. Thanks to Sid Karin, Abe Singer, Matt Bishop, and Keith Marzullo, who provided valuable discussions during the writing of this article."

First, why is the Air Force paying for advice that should have been abandoned in 1998, the last time I remember the Air Force suggesting these sorts of actions? Second, why didn't any of the article reviewers speak out against this bad advice?

8 comments:

geek00L said...

Pulling the plug after an incident is not a good idea. A few times I have had to perform live incident response where the bad guy was running code in memory that had deleted itself from the hard drive. If we pull the plug without performing live forensics, that code will be gone, and we give ourselves a hard time understanding the circumstances. I wonder why they don't treat swap and memory as a serious source for finding evidence and the footsteps of the bad guy. This is not a clever approach, and if this article came out of discussions among many talented people, I wonder where their minds were. That being said, it sucks.

js said...

I agree wholeheartedly. The very concept of pulling the plug goes against basic forensic science theory. Destroying potential evidence (i.e., pulling the plug) is foolish and damaging advice to give. Unfortunately, I know of law enforcement who have been trained to pull the plug unless they're dealing with Unix systems or servers. (!!!!)

Keydet89 said...

Joe,

I'm completely on-board with the idea of performing some kind of live response activities prior to pulling the plug. What I'm curious about is the last sentence of your comment...why pull the plug only on non-Unix systems?

One thing I'm curious about is the tendency to pull an "image" of physical memory...what does one do with upwards of a full gigabyte of data? From what I've been told, some folks run strings on it, and though that may give you leads, it does so without context.

H. Carvey
"Windows Forensics and Incident Recovery"
http://www.windows-ir.com
http://windowsir.blogspot.com

js said...

Harlan,

Their reason is that Unix systems are more susceptible to system files being corrupted after pulling the plug (cached data, kernel memory space objects, etc.). The problem is safely shutting down a *nix box without the root password. If you don't have it to run su or a sudo-based shutdown, then you have no choice but to pull the plug. I know some sysadmins that would attest to weird things happening when an upgrade froze the system or a sudden power loss trashed their *nix OS. Windows is somehow more resilient in this aspect.

As for the memory dump, there are some useful applications for that. The context can be achieved with strings and tying that back to the hard drive image or "known" memory text, e.g. trojan signature of some type. This is useful when you have rogue applications running on the system, rather than accounting fraud and other static incidents.
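The strings-and-signature approach described above can be sketched roughly as follows. This is an illustration under stated assumptions, not any particular tool's method: the function names `extract_strings` and `find_signatures` are hypothetical, and only printable ASCII runs are considered. Recording the byte offset of each string gives a little positional context, though it still does not tie a string to a specific process.

```python
import re

def extract_strings(data, min_len=4):
    """Yield (offset, text) for each run of printable ASCII at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")

def find_signatures(data, signatures, min_len=4):
    """Return (offset, text) pairs whose text contains any of the known signature strings."""
    return [
        (offset, text)
        for offset, text in extract_strings(data, min_len)
        if any(sig in text for sig in signatures)
    ]
```

For a real memory image, `data` would be the bytes read from the dump file. An image approaching a gigabyte would likely need to be read in chunks, which this sketch does not handle.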

Keydet89 said...

Joe,

My concern is that the "context" one achieves with strings is limited, as you really don't have much of a way to tie it to a specific process.

Also, you say, "As for the memory dump, there are some useful applications for that."...can you name any available for Windows physical memory dumps?

js said...

By "applications," I meant ways to apply or use the memory dump. I do not know of any useful Windows programs. I tend to use EnCase and FTK when performing forensics inside of Windows. I can think of several useful ways to parse and make the data available through even just a scripting language.

I agree that you can't very well compare it to a specific program, but there are times when it can be done. Say a program is writing packets, which will be stored in memory until overwritten. The binary data will be of little to no value.

Anonymous said...

As was mentioned, law enforcement is taught to pull the plug on the majority of Windows systems. However, this doesn't pertain to incident response situations (cops do dead box forensics, not incident response, as a general rule).

As an analogy with LE, cops are trained to preserve the crime scene without doing any further harm or changes, whether it be a murder scene or computer related incident, so, you get lots of training to 'pull the plug'. The IT guys and gals deal with another spectrum of computer incidents where pulling the plug won't be beneficial.

Brett Shavers
bshavers@gmail.com

Anonymous said...

I find it very interesting that several of the posts say that one way or the other "sucks" or is not acceptable. I think this is where ignorance or intolerance leads one to spout off without seeing the “big picture”.
There is more than one acceptable way of performing an analysis and collection of “evidence” and everyone should be on board with that philosophy, short of something destroying "evidence". One speaks from an "enterprise" philosophy while the other speaks from LE. I have completed both styles of exams and there is no "one" way to arrive at a correct result.
There are many applications to capture the "live" data, including RAM and running processes: ProDiscover, LiveWire, HELIX and, of course, EnCase FIM/Enterprise. If one captures the "live" information and then pulls the plug, what is the harm? It really is a win/win situation where the live data is/was captured and the pagefile/swap and other "changeable" disk areas are preserved "as is", at least on Win9X or NT systems. The LINUX/UNIX based systems do have the potential to suffer kernel damage, and therefore the risk vs. reward must be evaluated when deciding whether or not to “pull the plug”.
Harlan, with regards to the live state of RAM. I have reviewed live captures of the RAM and successfully recovered passwords while others have recovered partial, unlogged chats and other useful information, including some memory resident only programs that don’t show up on the running processes list. I think this is what you were asking, I apologize if not.
My personal opinion is that we, as forensic analysts or examiners should not be too quick to point the finger unless something is just flat out wrong. This stuff will come back to bite someone when we make vague statements that imply that a procedure is not acceptable or incorrect or just not “up to date” instead of stating that a procedure is not a “best practice” situation. Just my two cents.

Trooper Michael C. Taylor
PA State Police
Criminal Investigator / Computer Crimes