Every time I attend a USENIX conference, I gather free copies of the ;login: magazine published by the association. The August 2005 issue features some great stories, with some of them available right now to non-USENIX members. (USENIX makes all magazine articles open to the public one year after publication. For example, anyone can now read the entire December 2004 issue.)
An article which caught my eye was Forensics for System Administrators by Sean Peisert. Although the USENIX copy of the article won't be published until August 2006, you can read Sean's copy here (.pdf).
I thought the article was proceeding well until I came across this advice.
"What happens when there is some past event that a system administrator wishes to understand on their system? Where should the administrator, now a novice forensic analyst, begin? There are many variables and questions that must be answered to make proper decisions about this. Under almost all circumstances in which the system can be taken down to do the analysis, the ideal thing to do is halt or power-off the system using a hardware method." (emphasis added)
Is he serious? The article continues:
"[T]he x86 BIOS does not have a monitor mode that supports this [a hardware interrupt]. The solution for everyone else? Pull the plug. The machine will power off, the disk will remain as-is, and there will be no possibility of further contamination of the evidence through some sort of clean-up script left by the intruder, as long as the disk is not booted off or mounted in read/write mode again. The reason for stopping a machine is that it prevents further alteration of the evidence. The reason for halting with a hardware interrupt, rather than using the UNIX halt or shutdown command is that if a root compromise occurred, those commands could have been trojaned by an intruder to clean up evidence."
I can't believe I'm reading this advice in 2005, only 6 days from 2006. This is the advice I heard nearly 10 years ago. "Pulling the plug" as the first step in a forensic investigation is absolutely terrible advice. I am not a host-based forensics guru, but I know that a live response, first described in the June 2001 book Incident Response by Mandia, Prosise, and Pepe, should be part of even the most basic forensically-minded sys admin's techniques. Sean could have even looked into the ;login: archives to find Keith Jones' article in the November 2001 issue describing live response.
Live response is a technique to retrieve volatile information from a running system in a forensically sound manner. Live response can be frustrated by some binary and kernel alteration techniques, but it is a good (non-network-centric) first step whenever a host is suspected of being compromised. Those who want to know more about live response, and see how helpful the advice can be, will enjoy reading Real Digital Forensics.
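To make the idea concrete, here is a minimal sketch of the collection side of a live response. It is not the procedure from either book mentioned above. The command list, the `collect` function name, and the JSON record layout are all assumptions chosen for illustration. A real toolkit would run trusted binaries from read-only media and send the results off the suspect host, rather than rely on the host's own copies of `ps` or `netstat`.

```python
import hashlib
import json
import subprocess
import sys
from datetime import datetime, timezone

# Hypothetical command set: the kind of volatile state an administrator
# would normally inspect. Adjust for the platform; these are assumptions.
COMMANDS = {
    "date": ["date"],
    "uptime": ["uptime"],
    "logged_in_users": ["who"],
    "processes": ["ps", "aux"],
    "network_connections": ["netstat", "-an"],
}


def collect(commands, timeout=30):
    """Run each command once and return timestamped, hashed records."""
    records = []
    for label, argv in commands.items():
        started = datetime.now(timezone.utc).isoformat()
        try:
            proc = subprocess.run(argv, capture_output=True, timeout=timeout)
            raw, error = proc.stdout, ""
        except (OSError, subprocess.SubprocessError) as exc:
            # A missing or hung tool is recorded, not fatal: keep collecting.
            raw, error = b"", str(exc)
        records.append({
            "label": label,
            "command": " ".join(argv),
            "started_utc": started,
            # Hash the raw bytes so later tampering with the record is detectable.
            "sha256": hashlib.sha256(raw).hexdigest(),
            "output": raw.decode("utf-8", errors="replace"),
            "error": error,
        })
    return records


if __name__ == "__main__":
    # Write to stdout only, so the results can be piped to another machine
    # instead of being saved on the suspect disk.
    json.dump(collect(COMMANDS), sys.stdout, indent=2)
```

Writing only to standard output is deliberate: the output can be piped across the network to a collection workstation, which avoids overwriting unallocated space on the victim's disk.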
Sean tries to defend pulling the plug here:
"In our first example intrusion, I took a preliminary look at the syslog and saw that dates of suspicious logins went back at least three weeks. Given that the intrusion seemed to be going on for so long, I decided that I could no longer trust the system to reliably and accurately report evidence about itself. Therefore, pulling the plug on the machine was the best option."
That is a really weak excuse. Certainly a non-ankle-biter attacker will take steps to hide his presence. That does not mean that no attempt should be made to collect volatile system information!
"It is certainly the case that halting a system can help perserve more evidence, particularly that in swap, slack, or otherwise unallocated space on disk. But it also can destroy some evidence. For example, halting a system will wipe out the contents of memory, hindering the ability of an analyst to dump a memory image to disk. However, in the forensic discussions in this article, slack space and memory dumps are outside the scope of our analysis. In our case, halting a system merely helped to preserve real evidence, and had the intrusion in our first example been discovered sooner, and the system sooner halted as a result, the intruder would have had less time to cover their tracks. Then, as I will discuss, certain helpful log files that were deleted may have been recoverable."
If Sean is worried that an intruder will take actions to "cover their tracks," then the live response can be performed after the victim host has been cut off from the Internet. Sure, the most 31337 attackers may detect this and start self-cleansing procedures, but how often does that happen? Also, collecting live response data does not usually trigger any cleaning mechanisms. The sort of data one collects is the normal information a system administrator might inspect during the course of regular duties.
The fundamental issue here is whether pulling the plug should be the first response activity or not. In my experience, cutting off remote access is the first step. Analysis of NSM data involving the target host is second. Live response is third. Forensic duplication and analysis is fourth, if the previous two steps point to compromise and the resources for investigation are available.
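The ordering above can be written down as a small decision sketch. The function and parameter names are hypothetical. It also reads "the previous two steps point to compromise" as requiring both the NSM analysis and the live response to suggest compromise, which is one interpretation and not a rule stated in this post.

```python
def response_plan(nsm_suggests_compromise, live_response_suggests_compromise,
                  resources_available):
    """Return the ordered response steps for a suspected host."""
    # The first three steps always happen, in this order.
    steps = [
        "cut off remote access",
        "analyze NSM data involving the target host",
        "perform live response",
    ]
    # Duplication and analysis is conditional on the earlier findings
    # and on having the people and time to investigate (assumed "and").
    if (nsm_suggests_compromise and live_response_suggests_compromise
            and resources_available):
        steps.append("forensic duplication and analysis")
    return steps
```

Note that pulling the plug does not appear as a first step anywhere in the plan; imaging the disk comes only after the volatile data has been collected.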
This part of the article makes me sad:
"This material is based on work sponsored by the United States Air Force and supported by the Air Force Research Laboratory under Contract F30602-03-C-0075 and performed in conjunction with Lockheed Martin Information Assurance. Thanks to Sid Karin, Abe Singer, Matt Bishop, and Keith Marzullo, who provided valuable discussions during the writing of this article."
First, why is the Air Force paying for advice that should have been abandoned in 1998, the last time I remember the Air Force suggesting these sorts of actions? Second, why didn't any of the article reviewers speak out against this bad advice?