New Honeynet Project Challenge
I saw that the Honeynet Project announced a new Scan of the Month last week. The evidence consists of Apache logs, Linux syslogs, Snort logs, and IPTables firewall logs. Here are examples.
From the Apache access log:
210.116.59.164 - - [13/Mar/2005:04:05:47 -0500]
"POST /_vti_bin/_vti_aut/fp30reg.dll HTTP/1.1" 404 1063 "-" "-"
From the /var/log/messages syslog:
Mar 13 22:50:53 combo sshd(pam_unix)[9356]:
authentication failure; logname= uid=0 euid=0
tty=NODEVssh ruser= rhost=h-67-103-15-70.nycmny83.covad.net user=root
From the Snort logs, apparently captured via syslog:
Feb 25 12:21:33 bastion snort: [1:483:5] ICMP PING CyberKit 2.2 Windows
[Classification: Misc activity] [Priority: 3]: {ICMP}
70.81.243.88 -> 11.11.79.100
Finally, from the IPTables logs:
Feb 25 12:11:24 bridge kernel: INBOUND TCP: IN=br0
PHYSIN=eth0 OUT=br0 PHYSOUT=eth1
SRC=220.228.136.38 DST=11.11.79.83 LEN=64 TOS=0x00
PREC=0x00 TTL=47 ID=17159 DF
PROTO=TCP SPT=1629 DPT=139 WINDOW=44620 RES=0x00 SYN URGP=0
Let's see how much data is in each of the logs. I used 'wc' to count lines in each of the sets of logs.
- Apache: 7,620
- Syslog: 3,925
- Snort: 69,039
- IPTables: 179,752
- Total: 260,336
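For reference, a command along these lines produces those counts. The file names are placeholders for however the SotM evidence unpacks, not necessarily the actual names in the archive.

# count lines per evidence file; file names are placeholders
wc -l access_log messages snort.log iptables.log

With multiple files, wc also prints its own total line, which is where the 260,336 figure comes from.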
So, we have over 260,000 lines of log entries to review. This seems fairly crazy to me. As an NSM practitioner who advocates collecting session and full content data, I am often criticized by those who consider it too difficult or expensive to collect such forms of network evidence. This Scan of the Month presents the alternative -- working through line after line of text-based log entries. Now which is more expensive, in terms of time and resources?
You might say I would have the same problem analyzing this intrusion using NSM techniques. You might believe Snort would yield the same number of alerts whether configured to emit text-based records via syslog or to present alerts in Sguil.
I guarantee I could determine if the system was compromised, and by how many parties, faster using NSM techniques than manual log analysis.
I would also know exactly what network traffic the intruder launched against the target, regardless of whether it triggered a Snort alert. I would not have to look at text-based IPTables representations of packet movement. I could instead look at session data, which summarizes the thousands of packets in a flow into a single record.
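To make that concrete, here is a rough awk sketch that collapses the IPTables entries into one line per source, destination, and destination port -- a poor man's approximation of what session data provides natively. The field layout and file name are assumed from the sample entry above.

awk '
# tally one record per SRC/DST/DPT triple; the SRC=, DST=, and DPT=
# fields are assumed from the sample IPTables entry shown earlier
/PROTO=TCP/ {
    src = ""; dst = ""; dpt = ""
    for (i = 1; i <= NF; i++) {
        if ($i ~ /^SRC=/) src = substr($i, 5)
        if ($i ~ /^DST=/) dst = substr($i, 5)
        if ($i ~ /^DPT=/) dpt = substr($i, 5)
    }
    flows[src " -> " dst " port " dpt]++
}
END { for (f in flows) print flows[f], f }' iptables.log | sort -rn | head

Each output line stands in for what could be thousands of individual packet log entries.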
I believe the winner of this SotM will end up being a Perl or Awk wizard who can parse the logs efficiently to reduce the number of lines to be analyzed.
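In that spirit, a one-liner like the following (assuming the Snort syslog format shown above and a placeholder file name) reduces the 69,039 Snort lines to one line per signature:

awk -F'snort: ' '/ snort: / {
    # strip everything from [Classification onward, leaving the
    # signature ID and name as seen in the sample entry above
    sub(/ \[Classification.*/, "", $2)
    tally[$2]++
}
END { for (sig in tally) print tally[sig], sig }' snort.log | sort -rn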
This is still a useful challenge. If there is any data available at all after a compromise, it is often in the form of Web logs, syslogs, and so on. It is important to know how to interpret such evidence, if that is all there is to analyze. Still -- imagine the possibilities when NSM-based evidence is collected!
Comments
You might want to try using SEC (Simple Event Correlator) for this log parsing task. One of my colleagues used this tool in a similar situation, parsing all the Snort logs distributed during our SANS GIAC course. Worked great!
SEC: http://kodu.neti.ee/~risto/sec/
Here's a great write-up by Jim Brown on using SEC: http://sixshooter.v6.thrupoint.net/SEC-examples/article.html
And here's Chris's posted practical, if you'd like to see how he used it to parse his Snort logs:
http://www.giac.org/certified_professionals/practicals/gcia/0650.php
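To give a flavor of SEC, a minimal rule file might look like the following. The pattern is modeled on the sshd syslog sample above, and the window and threshold values are purely illustrative, so treat this as a sketch rather than a tuned rule.

# ssh.sec: flag repeated SSH authentication failures per source host
# (pattern modeled on the sshd sample above; values illustrative)
type=SingleWithThreshold
ptype=RegExp
pattern=sshd\(pam_unix\)\[\d+\]: authentication failure;.*rhost=(\S+)
desc=SSH authentication failures from $1
action=write - repeated SSH authentication failures from $1
window=60
thresh=5

Run it over the syslog evidence with something like 'sec -conf=ssh.sec -input=messages -fromstart'.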