Frequently I'm asked about the data sources I cite as being necessary for Network Security Monitoring, namely statistical data, session data, full content data, and alert data. Sometimes people ask me "Is it NSM if I'm not collecting full content?" or "Where's the statistical data in Sguil? Without it, is Sguil an NSM tool?" In this post I'd like to address this point and answer a question Joe left as a comment on my post My Investigative Process Using NSM.
In 2002 while working for Foundstone, I contributed to the fourth edition of Hacking Exposed, pictured at left. On page 2 I defined NSM as the collection, analysis, and escalation of indications and warning to detect and respond to intrusions. Since then I've considered modifying that definition to emphasize the traffic-centric approach I intended to convey by using the term "network."
Whenever I speak or write about NSM I emphasize the four types of network data most likely to discover and control intrusions. However, I also say and write that you should collect the maximum amount of data that is technically, legally, and reasonably possible. For example, in some environments it is technically impossible (without spending vast amounts of money) to continuously collect more than a short period of full content traffic. In other cases it is not legally allowed, or privacy concerns make collecting full content a bad idea. For example, I would hope my ISP avoids storing all user packets simply because it claims doing so has security value. With reason as a guide, I would also expect NSM practitioners to avoid storing the full content of the traffic passing over their storage area network or similar infrastructure.
I like the approach taken by the inspiration for The Tao of Network Security Monitoring, namely the incomparable Bruce Lee's The Tao of Jeet Kune Do. Bruce Lee didn't advocate slavish devotion to any style. He suggested taking what was valuable from a variety of styles and applying what works in your own situation. I recommend the same idea with NSM.
Does this mean that one can completely avoid collecting full content data, perhaps relying instead on statistical, session, and alert data? I argue that, whatever limitations prevent continuous full content data collection, the ability to perform on-demand full content data collection is an absolute requirement of NSM.
First, nearly every network must have this capability anyway, simply to meet lawful intercept requirements. Second, although I love session data, it is not always able to answer every question I may have about a suspicious connection. This is why approaches like the DoD's Centaur project are helpful but not sufficient. There is really no substitute for being able to look at full content, even if collection is activated only in the hope of catching a future instance of a suspicious event. Third, only full content data can be carefully re-examined by deep inspection applications (like an IDS) once a new detection method is deployed. Session data can be mined, but without full packet details, certain types of suspicious behavior cannot be detected at all.
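That third point can be sketched in a few lines. This is a hypothetical illustration, not any particular IDS: the function names, the signature format, and the sample payloads are all invented for the example. The point is that a signature deployed today can still match payload bytes captured before the signature existed, which is only possible if the full content was retained.

```python
# Hypothetical sketch: re-examining stored full content once a new
# detection method (here, a simple byte-pattern signature) is deployed.
# Session records could not be re-scanned this way; the payload is gone.

def rescan_stored_payloads(payloads, new_signatures):
    """Replay previously captured payloads against newly deployed signatures.

    payloads: iterable of (session_id, payload_bytes) from a full content store
    new_signatures: dict mapping signature name -> byte pattern
    Returns a list of (session_id, signature_name) hits.
    """
    hits = []
    for session_id, payload in payloads:
        for name, pattern in new_signatures.items():
            if pattern in payload:
                hits.append((session_id, name))
    return hits

# Invented sample data: traffic "captured" before the signature existed.
stored = [
    ("s1", b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n"),
    ("s2", b"\x90\x90\x90\xeb\x1fMALICIOUS-MARKER\x00"),
]
sigs = {"new-exploit-marker": b"MALICIOUS-MARKER"}
print(rescan_stored_payloads(stored, sigs))  # -> [('s2', 'new-exploit-marker')]
```

Real deep inspection engines are vastly more sophisticated, but the workflow is the same: keep the packets, and every future detection method gets a second chance at old traffic.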
While we're talking about full content, I suppose I should briefly address the issue of encryption. Yes, encryption is a problem. Shoot, even binary protocols, obscure protocols, and the like make understanding full content difficult and maybe impossible. Yes, intruders use encryption, and those who don't are fools. The point is that even if you find an encrypted channel when inspecting full content, the fact that it is encrypted has value.
When I discover outbound traffic to port 44444 TCP on a remote server, I react one way if I can read clear HTTP, but differently if the content appears encrypted.
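One crude way to make that "appears encrypted" judgment programmatically is byte entropy: readable HTTP sits well below the near-8-bits-per-byte profile of encrypted or compressed streams. This is a minimal sketch under my own assumptions; the function names and the 7.0 threshold are illustrative, not from any standard tool, and high entropy can also indicate compression rather than encryption.

```python
import math
import os
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte; approaches 8.0 for
    encrypted or compressed data, much lower for readable text."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

def looks_encrypted(payload: bytes, threshold: float = 7.0) -> bool:
    # Crude heuristic: high byte entropy suggests encryption (or compression).
    # The threshold is an illustrative assumption, not a calibrated value.
    return shannon_entropy(payload) > threshold

clear = b"GET / HTTP/1.1\r\nHost: www.example.com\r\nUser-Agent: test\r\n\r\n"
random_like = os.urandom(2048)  # stands in for an encrypted stream

print(looks_encrypted(clear))        # False: readable HTTP
print(looks_encrypted(random_like))  # True: near-uniform bytes
```

A heuristic like this only tells you *that* the channel is opaque, not *what* it carries, but as noted above, that fact alone changes how I react to the traffic.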
I will admit to filtering full content collection of traditionally encrypted traffic, such as that on 443 TCP, when collecting such traffic would drastically decrease the overall amount of full content I could collect. (For example, with HTTPS I might only save 1 day's worth of traffic; without, maybe 3 days.) In such cases I am making a trade-off that I hope is acceptable given the constraints of my environment.
As to why we don't have statistical data in Sguil: I think those who want it can turn to other projects like MRTG, Darkstat, or even Wireshark for interesting statistical data.
In brief, I consider NSM's basic data requirements to be all four types of data mentioned earlier, with the understanding that collecting full content is expensive. On-demand definitely, continuous if possible.