Operational Traffic Intelligence System Woes
Recently I posted thoughts on Cisco's Self-Defending Network. Today I spent several hours on a Cisco Monitoring, Analysis and Response System (MARS) trying to make sense of the data for a client. I am disappointed to report that I did not find the experience very productive. This post tries to explain the major deficiencies I see in products like MARS.
Note: I call this post Operational Traffic Intelligence System Woes because I want it to apply to detecting and resisting intrusions. As I mentioned earlier, hardly anyone builds real intrusion detection systems. So-called "IDS" are really attack indication systems. I also dislike the term intrusion prevention system ("IPS"), since anything that seeks to resist intrusion could be considered an "IPS." Most available "IPS" are firewalls in the sense that anything that denies activity is a policy enforcement system. I use the term traffic intelligence system (TIS) to describe any network-centric product which inspects traffic for detection or resistance purposes. That includes products with the popular labels "firewall," "IPS," and "IDS."
Three main criticisms can be made against TIS. I could point to many references but since this is a blog post I'll save that heavy lifting for something I write for publication.
- Failure to Understand the Environment: This problem is old as dirt and will never be solved. The root of the issue is that in any network of even minimal size, it is too difficult for the TIS to properly model the states of all parties. The TIS can never be sure how a target will decipher and process traffic sent by an intruder, and vice versa. This situation leaves enough room for attackers to drive a Mack truck through, e.g., they can fragment at the IP / TCP / SMB / DCE-RPC levels and confuse just about every TIS available, while the target happily processes what it receives. Products that gather as much context as possible about targets improve the situation, but there are no perfect solutions.
- Analyst Information Overload: This problem is only getting worse. As attackers devise various ways to exploit targets, TIS vendors try to identify and/or deny malicious activity. For example, Snort's signature base is rapidly approaching 10,000 rules. (It's important to realize Snort is not just a signature-based IDS/IPS. I'll explain why in a future Snort Report.) The information overload problem means it's becoming increasingly difficult (if not already impossible) for security analysts to understand all of the attack types they might encounter while inspecting TIS alerts. SIM/SEM/SIEM vendors try to mitigate this problem by correlating events, but at the end of the day I want to know why a product is asking me to investigate an alert. That requires drilling down to the individual alert level and understanding what is happening.
- Lack of Supporting Details: The vast majority of TIS continue to be alert-centric. This is absolutely crippling for a security analyst. I am convinced that the vast majority of TIS developers never use their products on operational networks supporting real clients with contemporary security problems. If they did, developers would quickly realize their products do not provide the level of detail needed to figure out what is happening.
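To make the first criticism concrete, here is a minimal sketch of the reassembly ambiguity, assuming a toy model in which fragments are plain (offset, data) pairs and a host resolves overlaps with one of two policies. The function name, the policy names, and the sample fragments are all hypothetical; real stacks implement more policies than these two, and this is not how any particular product or operating system works.

```python
def reassemble(fragments, policy):
    """Rebuild a byte stream from (offset, data) fragments in arrival order.

    policy="first": bytes already received win when a later fragment overlaps.
    policy="last":  later fragments overwrite bytes already received.
    """
    if policy not in ("first", "last"):
        raise ValueError("policy must be 'first' or 'last'")
    buffer = {}
    for offset, data in fragments:
        for i, byte in enumerate(data):
            position = offset + i
            if policy == "first" and position in buffer:
                continue  # keep the byte that arrived earlier
            buffer[position] = byte
    return bytes(buffer[p] for p in sorted(buffer))


# Hypothetical overlapping fragments (illustrative only).
fragments = [
    (0, b"GET /index"),
    (5, b"admin"),   # overlaps bytes 5-9 of the first fragment
    (10, b".html"),
]

print(reassemble(fragments, "first"))  # b'GET /index.html'
print(reassemble(fragments, "last"))   # b'GET /admin.html'
```

If the TIS reassembles with one policy and the target uses the other, the TIS inspects a request the target never processes. Knowing which policy each target uses is exactly the kind of environmental context the TIS usually lacks.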
In brief, we have TIS that don't/can't fully understand their environment, reporting more alerts than an analyst can understand, while providing not enough details to satisfy operational investigations. I did not even include usability as a critical aspect of this issue.
How does this apply to MARS? It appears that MARS (like other SIM/SEM/SIEM) believes that the "more is better" approach is the way to address the lack of context. The idea is that collecting as many input sources as possible will result in a system that understands the environment. This works to a certain limited point, but what is really needed is comprehensive knowledge of a target's existence, operating system, applications, and configuration. That level of information is not available, so I was left with inspecting 209 "red" severity MARS alerts for the last 7 days (3099 yellow, 1672 green). Those numbers also indicate information overload to me. All I really want to know is which of those alerts represent intrusions. MARS (and honestly, most products) can't answer that question.
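A quick back-of-the-envelope calculation shows why those counts read as overload. The alert counts come from the paragraph above; the 15 minutes of analyst time per alert is purely an assumption for illustration, not a measured figure.

```python
# Alert counts from the 7-day MARS view described above.
alerts = {"red": 209, "yellow": 3099, "green": 1672}
days = 7

# ASSUMPTION: average analyst time to validate one alert, in minutes.
MINUTES_PER_ALERT = 15


def triage_hours(count, minutes_per_alert=MINUTES_PER_ALERT):
    """Hours needed to manually inspect `count` alerts."""
    return count * minutes_per_alert / 60


total = sum(alerts.values())
per_day = total / days

print(f"total alerts: {total}")                    # 4980
print(f"alerts per day: {per_day:.1f}")            # 711.4
print(f"red alerts per day: {alerts['red'] / days:.1f}")  # 29.9
print(f"hours to triage red alerts alone: {triage_hours(alerts['red'])}")  # 52.25
```

Under that assumption, the red alerts alone consume more than a full analyst work week, before anyone looks at a single yellow alert.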
The way I am usually forced to determine if I should worry about TIS alerts is manual inspection. The open source project Sguil provides session and full content data -- independent of any alert -- that lets me know a lot about activity directly or indirectly related to an event of interest. With MARS and the like, I can basically query for other alerts. Theoretically NetFlow can be collected, but the default configuration is to collect NetFlow for statistical purposes while discarding the individual records.
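The value of alert-independent session data can be sketched as follows. This is a toy model, not Sguil's schema or query interface: the `Session` and `Alert` records, their fields, the `related_sessions` helper, and the sample addresses (taken from documentation ranges) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    start: int        # epoch seconds
    src: str
    dst: str
    dst_port: int
    bytes_sent: int


@dataclass(frozen=True)
class Alert:
    time: int         # epoch seconds
    src: str
    dst: str
    message: str


def related_sessions(alert, sessions, window=3600):
    """Return sessions involving either alert host within `window` seconds.

    Sessions are recorded whether or not anything alerted on them, so this
    can surface activity that no signature ever flagged.
    """
    hosts = {alert.src, alert.dst}
    return [
        s for s in sessions
        if abs(s.start - alert.time) <= window
        and (s.src in hosts or s.dst in hosts)
    ]


sessions = [
    Session(1000, "192.0.2.10", "198.51.100.5", 445, 5000),
    Session(1200, "198.51.100.5", "203.0.113.9", 443, 900000),  # target talks outbound
    Session(1300, "192.0.2.77", "192.0.2.78", 80, 100),         # unrelated hosts
    Session(90000, "192.0.2.10", "198.51.100.5", 445, 10),      # outside the window
]
alert = Alert(1100, "192.0.2.10", "198.51.100.5", "hypothetical SMB alert")

for s in related_sessions(alert, sessions):
    print(s)
```

The second session is the interesting one: the alerted target opens a large outbound connection shortly afterward, and nothing alerted on it. A system that can only query for other alerts would never show it.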
If I want to see full content, the closest I can get is this sort of ASCII rendition of a packet excerpt. That is ridiculous; it was state-of-the-art in 1996 to take a binary protocol (say SMB -- not shown here but common) and display the packet excerpt in ASCII. That level of detail gives the analyst almost nothing useful as far as incident validation or escalation.
(Is there an alternative? Sure, with Sguil we extract the entire session from Libpcap and provide it in Ethereal/Wireshark, or display all of the content in ASCII if requested by the analyst.)
The bottom line is that I am at a loss regarding what I am going to tell my client. They spent a lot of money deploying a Cisco SDN but their investigative capabilities as provided by MARS are insufficient for incident analysis and escalation. I'm considering recommending augmentation with a separate product that collects full content and session data, then using MARS as a tip-off for investigation using those alternative data sources.
Are you stuck with similar products? How do you handle the situation? Several of you posted ideas earlier, and I appreciate hearing more.
Copyright 2007 Richard Bejtlich
Comments
I'm curious...where do you think this will go in the future? Devices like MARS that do alerts but also capture full content for better analysis? I've never done it or seen it done, so it just kinda overwhelms me to think about capturing full content for even small periods of time. It just sounds like a ton of operational storage capacity.
Do you think devices/technologies like this will ever find a real place in the repertoire of someone skilled enough to analyze packets at the level needed to augment these devices? Or will a lot of experts just forgo the cost and stick to Snort, Ethereal, storage capacity, and their own engines?
I am hoping that maybe this year will be the year the managers, techies, and non-techies alike start to realize that security is not easy or narrow. You can't study the topic for a year and be able to explain it all and understand everything electronic (too often the unspoken expectation from non-tech mgmt has been that, in my experience). It is old hat to say there is no one magical device or lump sum of money to do it, but the ramifications of saying that are still sinking in so slowly...
http://netjitsu.blogspot.com/2007/01/investigating-alerts-with-cisco-ids-v5x.html
I'm interested in whether or not MARS can collect full NetFlow session records. If/when you find out, please let us know.
The rules in MARS need to be fine-tuned to the environment and there need to be enough NetFlow events collected to establish a baseline (at least a few weeks). I also recommend that you do some filtering before the packets even get to the MARS appliance. The strength of MARS lies in its ability to correlate data to help you trace an attack to the point of origin. You need to configure it to use NetFlow, Windows event logs, etc. if you want it to be anything more than a big centralized logging server. Generally speaking, the more devices you have feeding data to it, the clearer the picture becomes.
We used to keep full NetFlow data going back a month or so as well but we didn't use it all that often. That being said, it was great to have it when we needed it, which I guess is exactly the reason to have it!
Would have loved full data capture but it would have been prohibitively expensive.
It seems that a lot of these SIM and IDS/IPS systems are now being sold to small and medium enterprises without any regard to the amount of additional staff time and expertise that will be required to maintain them. Consequently I find that the ones I've used aren't oriented towards making investigation of an incident easier but are there simply to send out more alerts, under the premise that more alerts are surely better because we're detecting and stopping more attacks.
When I discovered this I nearly puked.
I had the enjoyment of implementing the MARS product at my last employer. It holds a lot of promise when configured appropriately, but as you say it fails to deliver enough raw data to the analyst to expediently classify an event before investing paid time on investigation.
A much larger issue with the tool than with the richness of information (and something itching in my memory tells me you can get more detailed packet dumps from it) is the significant time required to configure the product for one's organization. Far too often, with any rule-based event engine, you have just enough time to bang a framework together before your management runs out of patience. Whether the tools are adequate or not by design, a partial implementation will defeat all.
For that reason, the larger issue may be selecting solutions that provide practical coverage within the company culture. Along that line of thought, Sguil and other minimal processing solutions seem to be a required baseline before striking off to Cisco SDN-land.
Homogenization of network architecture may change this in the future, but for now, I'd love to see a complete implementation in a very large enterprise.
- Bill
Maybe it is just part of the energetically growing trend to feel secure as opposed to actually being secure? Maybe it's just part of the growing pains of the industry where we have lots and lots of new CISSP/security people who are still learning the basics and will take another 5-10 years to become viable experts that can do more than just fill a cubicle and do low-hanging tasks?
At any rate, as long as attacks have a human factor to them (as opposed to automated scans and script kiddies), no technology is going to fully replace the human factor on the security side.
Can I get a Nitro demo?
I recommend sending this comment via email to sguil-users@lists.sourceforge.net. No one will see or respond to your comment on a blog post from January.