Saturday, October 25, 2008

Security Event Correlation: Looking Back, Part 1

I've been thinking about the term "correlation" recently. I decided to take a look back to determine just what this term was supposed to mean when it first appeared on the security scene.

I found "Thinking about Security Monitoring and Event Correlation" by Billy Smith of LURHQ, written in November 2000. He wrote:

Security device logging can be extensive and difficult to interpret... Along with lack of time and vendor independent tools, false positives are another reason why enterprise security monitoring is not easy...

The next advance in enterprise security monitoring will be to capture the knowledge and analytical capabilities of human security experts for the development of an intelligent system that performs event correlation from the logs and alerts of multiple security technologies.


Ok, so far so good.

For example Company A has a screening router outside of their firewall that protects their corporate network and a security event monitoring system with reliable artificial intelligence. The monitoring system would start detecting logs where the access control lists or packet screens on the screening router were denying communications from a certain IP address. Because the intelligent system is intelligent it begins detailed monitoring of the firewall logs and logs of any publicly accessible servers for any communications destined for or originating from the IP address. If the intelligent system determined that there was malicious communication, the system would have the capability to modify the router access control lists or the firewall configuration to deny any communication destined for or originating from the IP address. (emphasis added)

Ok, you lost me. The enterprise is already "denying communications," implying an administrator already knew to configure defensive measures. Because denied traffic is logged, the correlation system looks for traffic somewhere else in the enterprise, and then modifies access control lists it finds that currently allow said traffic? What is this, a mistake detection mechanism?
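
One way to read the quoted rule is as the minimal Python sketch below. Everything in it is hypothetical: the `Event` fields, the one-hour watch window, and especially the `malicious` flag, which stands in for whatever judgment the "intelligent system" is supposed to make. What the sketch does show is the scope of the rule: it can only ever act on an address the screening router has already denied.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    time: datetime
    device: str        # which device produced the log: "router", "firewall", "web", ...
    src_ip: str
    action: str        # "deny" or "allow"
    malicious: bool = False  # HYPOTHETICAL stand-in for the "intelligent" verdict


def ips_to_block(events, window=timedelta(hours=1)):
    """Return source IPs the quoted rule would block: first denied by the
    screening router, then seen doing something judged malicious in another
    device's logs within `window` of that first deny."""
    first_deny = {}  # src_ip -> time of the first router deny
    to_block = set()
    for ev in sorted(events, key=lambda e: e.time):
        if ev.device == "router" and ev.action == "deny":
            first_deny.setdefault(ev.src_ip, ev.time)
        elif ev.src_ip in first_deny and ev.time - first_deny[ev.src_ip] <= window:
            if ev.malicious:
                to_block.add(ev.src_ip)
    return to_block


t0 = datetime(2008, 10, 25, 10, 0)
events = [
    Event(t0, "router", "203.0.113.5", "deny"),
    Event(t0 + timedelta(minutes=10), "firewall", "203.0.113.5", "allow", malicious=True),
    # Never denied by the router, so the rule never starts watching it.
    Event(t0 + timedelta(minutes=5), "web", "198.51.100.7", "allow", malicious=True),
]
print(ips_to_block(events))  # {'203.0.113.5'}
```

The second source in the sample data reaches the web server and is flagged as malicious, yet the rule ignores it because it never tripped a router ACL.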

Let's look at the next example.

What if the intelligent system began detecting multiple failed logins to an NT server by the president of the company? It would be useful for this technology to determine where these failed logins were originating from and "look for" suspicious activity from this IP and/or user for some designated timeframe. If this system determined that the failed logins originated from a user other than the president of the company, it could begin to closely monitor for a period of time all actions by this user and the company president (the user could be impersonating the president). This monitoring could include card readers, PBXs or voice mail access, security alarms from secured doors and gates and access to other servers. If the monitoring system were not correlating events the user impersonating the company president would probably bypass all access control and security monitoring devices because the user's actions appear as "normal" activity. (emphasis added)

This example is a little better, until the end. Failed logins happen every day, but an excessive number of failed logins can indicate an attack. I'm not exactly sure how the inclusion of other log sources is supposed to make a difference here, however. Furthermore, if the "user's actions appear as 'normal' activity," just how is it supposed to be identified as suspicious?
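
Spotting "an excessive number of failed logins" needs nothing beyond the authentication log itself. A minimal sketch, assuming failed logins arrive as hypothetical `(time, user, src_ip)` tuples and using an arbitrary threshold of five failures in ten minutes, might look like this:

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def excessive_failures(failures, threshold=5, window=timedelta(minutes=10)):
    """failures: iterable of (time, user, src_ip) tuples for failed logins.
    Returns {user: set of source IPs} for every user with at least `threshold`
    failures inside a sliding `window`."""
    recent = defaultdict(deque)   # user -> (time, ip) pairs still inside the window
    flagged = defaultdict(set)
    for t, user, ip in sorted(failures):
        q = recent[user]
        q.append((t, ip))
        # Drop failures that have slid out of the window.
        while t - q[0][0] > window:
            q.popleft()
        if len(q) >= threshold:
            flagged[user].update(src for _, src in q)
    return dict(flagged)


t0 = datetime(2008, 10, 25, 9, 0)
failures = [(t0 + timedelta(minutes=i), "president", "10.1.1.50") for i in range(6)]
failures += [(t0, "alice", "10.1.2.20"), (t0 + timedelta(minutes=1), "alice", "10.1.2.20")]
print(excessive_failures(failures))  # {'president': {'10.1.1.50'}}
```

Card readers, PBX logs, and door alarms play no part in raising this alert; the threshold and window are the only tuning knobs.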

The correlation argument falls to pieces in the penultimate paragraph of this article:

Today there is one major obstacle to intelligent event correlation enterprise-wide. There is no standard for logging security related information or alerts. Every vendor uses their own logging or alerting methodology on security related events. In many cases there are inconsistent formats among products from the same vendor. These issues make enterprise security monitoring difficult and event correlation almost impossible with artificial intelligence. The industry will need to impose a standard method or protocol for logging and alerting security related events before an intelligent system can be developed and successfully implemented enterprise-wide. (emphasis added)

Wow, that is absolutely off-target. Lack of a logging standard is problematic, but the absolute worst problems involve having no idea 1) what assets exist; 2) what assets matter; 3) what activity is normal; 4) who owns what assets; 5) what to do about an incident.

So far there's nothing compelling about "correlation" here. The article hints that one might learn more about failed login attempts if an analyst could check physical access logs to verify the in-office presence of a person, but couldn't the source IP for the failed logins roughly indicate the same? Even if the company president is in the building, it doesn't mean he/she is at his/her computer.
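
Checking the source IP only works if you know which asset belongs to whom, which is exactly the knowledge listed above as missing. A minimal sketch, assuming a hypothetical inventory that maps each user to an assigned workstation IP, could flag failed logins that did not come from that workstation:

```python
from datetime import datetime

# HYPOTHETICAL asset inventory: the workstation IP each user is assigned.
ASSIGNED_WORKSTATION = {
    "president": "10.1.1.50",
    "alice": "10.1.2.20",
}


def unexpected_sources(failures, inventory=ASSIGNED_WORKSTATION):
    """Return failed logins (time, user, src_ip) that did not come from the
    user's assigned workstation. Users missing from the inventory are returned
    as well, because there is no baseline saying what is normal for them."""
    return [
        (t, user, ip)
        for t, user, ip in failures
        if inventory.get(user) != ip
    ]


t0 = datetime(2008, 10, 25, 9, 0)
failures = [
    (t0, "president", "10.1.1.50"),   # from the president's own machine
    (t0, "president", "10.1.3.99"),   # from some other machine
    (t0, "bob", "10.1.4.4"),          # bob is not in the inventory
]
for f in unexpected_sources(failures):
    print(f)
```

A failure from the president's own IP passes this check, but that still says nothing about who was sitting at the keyboard.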

In the next part of this article we'll move forward in time to look at more correlation history.

1 comment:

Mike Epplin said...

Good article, and you raise some great points with regard to correlation. Toward the end, however, you list some of the problems.

Lack of logging standards is a problem. Some groups are trying to tackle this issue, but because of the cost to vendors of implementing standards and a lack of direction, these efforts will ultimately not bear much fruit. This dictates a need for a log management/correlation system that can capture logs regardless of format and normalize them using something like regular expressions. Systems with a flexible, standards-based architecture are much easier to use in this regard, since they keep costs down from a resource and training perspective.
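
Regex-based normalization of the kind described here can be sketched in Python as follows. Both log formats are invented for illustration and do not correspond to any particular vendor, and the common schema (`action`, `src_ip`, `dst_ip`, `dst_port`) is likewise an assumption:

```python
import re

# Two made-up log formats standing in for different vendors' output,
# each paired with a map from that vendor's verbs to a common action name.
PATTERNS = [
    # e.g. "DENY tcp 203.0.113.5:4431 -> 192.0.2.10:80"
    (re.compile(r"^(?P<action>ALLOW|DENY) \w+ (?P<src_ip>[\d.]+):\d+ -> "
                r"(?P<dst_ip>[\d.]+):(?P<dst_port>\d+)$"),
     {"ALLOW": "allow", "DENY": "deny"}),
    # e.g. "action=drop src=203.0.113.5 dst=192.0.2.10 dport=80"
    (re.compile(r"action=(?P<action>accept|drop) src=(?P<src_ip>[\d.]+) "
                r"dst=(?P<dst_ip>[\d.]+) dport=(?P<dst_port>\d+)"),
     {"accept": "allow", "drop": "deny"}),
]


def normalize(line):
    """Map one raw log line onto the common schema, or return None if no pattern matches."""
    for regex, action_map in PATTERNS:
        m = regex.search(line)
        if m:
            rec = m.groupdict()
            rec["action"] = action_map[rec["action"]]
            rec["dst_port"] = int(rec["dst_port"])
            return rec
    return None


print(normalize("DENY tcp 203.0.113.5:4431 -> 192.0.2.10:80"))
print(normalize("action=drop src=203.0.113.5 dst=192.0.2.10 dport=80"))
```

Both lines normalize to the same record. The cost of this approach is that every new format, or every vendor change to an old one, means another pattern to write and maintain.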

The asset-related problems you refer to affect a large number of organizations, and they impact more than security event correlation.

Knowing what activity is normal and having incident handling procedures are issues regardless of what type of monitoring system is used. If you don't know who owns an asset, who do you call when a patch hasn't been applied, when a system is infected, and so on?