Question on NSM Methodology

I received the following question via email today:

"I'm a huge fan of your newest book, and I read it cover-to-cover in a handful of evenings. However, I have a question about the approach you take for doing network monitoring.

The average throughput of our Internet connection is around 5Mbits/sec sustained. I would love to implement Sguil as an interface to my IDS infrastructure (currently Acid and Snort on the network side), but I ran some numbers on the disk space required to store that much network traffic, and the number quickly swamped the disk resources I currently have available to me for this activity.

Am I missing something with regards to how Snort stores data in this kind of scenario, or do I really need to plan for that much disk space?"

This is a good question, and it is a common initial response to learning about Network Security Monitoring (NSM).

Remember that NSM is defined as "the collection, analysis, and escalation of indications and warning to detect and respond to intrusions." There is no explicit mention in that definition of collecting every packet that traverses the network. However, NSM analysts find that the best way to detect and respond to intrusions is by interpreting alert, session, full content, and statistical network evidence. Simply moving "beyond intrusion detection" (my book's subtitle) -- beyond reliance on alert data alone -- moves one away from traditional "IDS" and towards NSM.

The answer for those operating in high-bandwidth environments is to collect what you can. Chapter 2 (.pdf) lists several principles of detection and security, including:

- Detection through sampling is better than no detection.
- Detection through traffic analysis is better than no detection.
- Collecting everything is ideal but problematic.

I recommend looking at Chapter 2 for more information on these principles.

Someone monitoring a data center uplink or an Internet backbone is not going to collect meaningful amounts of full content data without spending a lot of money on high-end hardware and potentially custom software. You may only be able to collect small amounts of full content data in response to specific problems like identification of a covert back door. You may have to take a cue from Internet2 and analyze NetFlow data, and hardly look at full content at all.

There is nothing wrong with either approach. The idea is to give your analysts as much supporting information as possible when they need to make a decision concerning suspicious or malicious traffic. Giving them only an alert, with no other context or non-judgment-based data, makes it very unlikely that the analyst can make an informed validation and escalation decision.

My specific answer for the question at hand would be to try deploying Sguil with a conservative full content collection strategy. Pass BPF filters in the log_packets.sh script to limit the full content data collected on the sensor. Additionally, if you find the amount of session data logged by SANCP to be a burden, you can pass filters to SANCP as well.
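
If you want a feel for how much a candidate filter would reduce your full content volume before committing it to log_packets.sh, you can replay a representative trace through tcpdump offline. The sketch below is only illustrative: the trace filename and the example filter expression are assumptions, not recommendations for what to exclude on your network.

```python
#!/usr/bin/env python3
"""Preview how much a candidate BPF filter shrinks a sample capture.

Illustrative sketch only: sample.pcap and the filter expression are
placeholders. Requires tcpdump on the PATH.
"""
import os
import subprocess
import tempfile

SAMPLE = "sample.pcap"                 # assumed: a representative trace from the sensor
CANDIDATE_FILTER = "not tcp port 443"  # assumed: an example of excluding bulk encrypted traffic

with tempfile.NamedTemporaryFile(suffix=".pcap", delete=False) as tmp:
    filtered = tmp.name

# Apply the filter offline; only matching packets are written to the new file.
subprocess.run(["tcpdump", "-r", SAMPLE, "-w", filtered, CANDIDATE_FILTER], check=True)

before = os.path.getsize(SAMPLE)
after = os.path.getsize(filtered)
print(f"Unfiltered: {before / 1e6:.1f} MB")
print(f"Filtered:   {after / 1e6:.1f} MB ({100 * after / before:.0f}% of original)")

os.unlink(filtered)
```

Once the size reduction looks acceptable, the same filter expression can be applied where log_packets.sh invokes the packet logger on the sensor.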

If at all possible I advise not filtering SANCP and other session collection mechanisms, as these content-neutral collection measures can really save you in an incident response scenario. If SANCP and its database inserts become a problem, consider the more mature Argus code base, or collect NetFlow records from routers you control. My book also outlines how to do this.

Update: My buddy Bamm Visscher points out that a sustained 5 Mbps throughput is 2,250 MB per hour, or 54,000 MB per day, in raw traffic. However, some overhead is needed for libpcap headers. For small packets, the header could be as large as the content, effectively doubling the disk space needed to record that packet. For large packets, the header is a smaller percentage of the overall record of the packet.
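
Here is the back-of-the-envelope math as a minimal sketch. The 16-byte figure is the standard libpcap per-packet record header; the average packet size is an assumption chosen only to illustrate how the relative overhead changes.

```python
# Back-of-the-envelope storage math for a sustained 5 Mbps link.
LINK_MBPS = 5
PCAP_RECORD_HEADER = 16   # bytes prepended by libpcap to every captured packet
AVG_PACKET_SIZE = 500     # bytes; assumed average for illustration only

bytes_per_sec = LINK_MBPS * 1_000_000 / 8
mb_per_hour = bytes_per_sec * 3600 / 1e6
mb_per_day = mb_per_hour * 24
print(f"Raw traffic: {mb_per_hour:,.0f} MB/hour, {mb_per_day:,.0f} MB/day")

# libpcap overhead as a fraction of each stored record depends on packet size.
overhead = PCAP_RECORD_HEADER / (PCAP_RECORD_HEADER + AVG_PACKET_SIZE)
print(f"pcap header overhead at {AVG_PACKET_SIZE}-byte packets: {overhead:.1%}")
```

This reproduces the 2,250 MB per hour and 54,000 MB per day figures, and shows why a link dominated by small packets stores noticeably more than the raw throughput alone suggests.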

Anecdotal evidence from one of Bamm's friends says a link with sustained 10 Mbps is writing about 8 GB per hour to disk.

Speaking conservatively for the original question of 5 Mbps, a sensor like a Dell PowerEdge 750 with two 250 GB SATA drives can hold at least several days' worth of libpcap data, and potentially up to a week. That's plenty of time to retrieve useful full content data if regular monitoring is done.
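
As a rough check on that retention claim, here is a minimal estimate. The usable capacity and the overhead multiplier are assumptions, not measurements; real retention also depends on what the filesystem, operating system, and database consume.

```python
# Rough full content retention estimate for the sensor described above.
DAILY_RAW_MB = 54_000      # from the 5 Mbps figure above
USABLE_DISK_MB = 450_000   # assumed usable space on two 250 GB SATA drives
OVERHEAD_FACTOR = 1.1      # assumed allowance for libpcap headers

days = USABLE_DISK_MB / (DAILY_RAW_MB * OVERHEAD_FACTOR)
print(f"Approximate full content retention: {days:.1f} days")
```

Under those assumptions the sensor retains roughly a week of full content, which matches the "several days, potentially up to a week" estimate.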
