Sunday, June 06, 2010

Simple Questions, Difficult Answers

Recently I had a discussion with one of the CISOs in my company. He asked a simple question:

"Can you tell me when something bad happens to any of my 100 servers?"

That's a very reasonable question. Don't get hung up on the wording. If it makes you feel better, replace "something bad happens to" with "an intruder compromises," or any other wording that conveys the question in a way you like.

It's a simple question, but the answer is surprisingly difficult. Let's consider the factors that affect answering this question.

  • We need to identify the servers.

    • We will almost certainly need IP addresses.

      • How many IP addresses does each server have?

      • What connectivity does each IP address provide?

      • Are they IPv4, IPv6, both?

      • Are they static or dynamic? (Servers should be static, but that is unfortunately not universal.)

    • We will probably need hostnames.

      • How many hostnames does each server have?

      • What DNS entries exist?

      • Extrapolate from the IP questions above to derive related hostname questions.

    • We will need to identify server users and owners to separate authorized activity from unauthorized activity, if possible.

  • What is the function and posture of each server?

    • Is the server Internet-exposed? Internally exposed? A combination? Something different?

    • How is the server used? What sort of services does it provide, at what load?

    • What is considered normal server activity? Suspicious? Malicious?

  • What data can we collect and analyze to detect intrusion?

    • Can we see network traffic?

      • Do we have instrumentation in place to collect data for the servers in question?

      • Can we see network traffic involving each server interface?

      • Is some or all of the traffic encrypted?

      • Does the server use obscure protocols?

      • What volume of data do we need to analyze?

      • What retention period do we have for this data?

      • What laws, regulations, or other restrictions affect collecting and analyzing this data?

    • Can we collect host and application logs?

      • Do we have instrumentation in place to collect data for the servers in question?

      • Are the logs standard? Nonstandard? Obscure? Binary?

      • Are the logs complete? Useful?

      • What volume of data do we need to analyze?

      • What retention period do we have for this data?

      • What laws, regulations, or other restrictions affect collecting and analyzing this data?

    • Is the collection and analysis process sufficient to determine when an intrusion occurs?

      • Is the data sufficiently helpful?

      • Are our analysts sufficiently trained?

      • Do our tools expose the data for analysis in an efficient and effective manner?

      • Do analysts have a point of contact for each server knowledgeable in the server's operations, such that the analyst can determine if activity is normal, suspicious, or malicious?

I'll stop there. I'm not totally satisfied with what I wrote, but you should have a sense of the difficulty associated with answering this CISO's question.

Furthermore, at what number is this process likely to yield results in your organization, and at what number will it fail? Can it be done for 1 server? 10? 100? 1,000? 10,000? 100,000?
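One rough way to see where the process starts to fail as the server count grows is to record, per server, which of the questions above have answers. The following is only a minimal sketch: the `ServerRecord` fields, the `gaps` and `coverage` helpers, and the sample hosts are all hypothetical names chosen for illustration, and a real inventory would need many more fields than this.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ServerRecord:
    """Hypothetical per-server record covering a few of the questions above."""
    name: str
    ip_addresses: List[str] = field(default_factory=list)
    hostnames: List[str] = field(default_factory=list)
    owner: Optional[str] = None
    internet_exposed: Optional[bool] = None  # None means "we don't know"
    network_traffic_visible: bool = False
    logs_collected: bool = False
    point_of_contact: Optional[str] = None


def gaps(server: ServerRecord) -> List[str]:
    """Return the names of the questions we cannot yet answer for this server."""
    missing = []
    if not server.ip_addresses:
        missing.append("ip_addresses")
    if not server.hostnames:
        missing.append("hostnames")
    if server.owner is None:
        missing.append("owner")
    if server.internet_exposed is None:
        missing.append("internet_exposed")
    if not server.network_traffic_visible:
        missing.append("network_traffic_visible")
    if not server.logs_collected:
        missing.append("logs_collected")
    if server.point_of_contact is None:
        missing.append("point_of_contact")
    return missing


def coverage(servers: List[ServerRecord]) -> float:
    """Fraction of servers for which every tracked question has an answer."""
    if not servers:
        return 0.0
    complete = sum(1 for s in servers if not gaps(s))
    return complete / len(servers)


# Illustrative data only; addresses come from the documentation range 192.0.2.0/24.
fleet = [
    ServerRecord(
        name="web-01",
        ip_addresses=["192.0.2.10"],
        hostnames=["web-01.example.com"],
        owner="web team",
        internet_exposed=True,
        network_traffic_visible=True,
        logs_collected=True,
        point_of_contact="web on-call",
    ),
    ServerRecord(name="db-01", ip_addresses=["192.0.2.20"]),
]

for server in fleet:
    print(server.name, gaps(server))
print("coverage:", coverage(fleet))
```

Filling in a record like this for 1 server is easy; the interesting number is how far the coverage figure drops at 100 or 10,000.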


Anonymous said...

I agree 100%. This is the type of info an analyst or CIRT needs in order to categorize events and view them within the context of their environment and its "expected" traffic patterns and behavior. I would really like to see how these ideals and other "TAO" concepts scale in huge, globally diverse networks, especially if this type of visibility does not already exist.

Mister Reiner said...

All good questions indeed.

This is an oversimplification, but there are really only two things people want to prevent when it comes to servers: unauthorized control and unauthorized transfer of information. Unauthorized control can be achieved by injection of code or by compromising enabled remote control capabilities. One way unauthorized transfer of information can be achieved is by compromising the workstations of users who have authorized access to the information, which means having control of those workstations.

My questions for your CISO are:

1. What is in place to prevent/detect the injection of code and what is the system's response to code injects?

2. How are you preventing exploitation of enabled remote control capabilities?

If you need to be called in, it means that security measures have been inadequate in preventing the above. It's like the owner of a bodyguard service asking, "Can you give me a call if you're out on a job and the client is killed?"

People need to change their mindset from being reactive to proactive. Too many people are focused on detecting things after the fact - well after the damage is already done. It's time for folks like your CISO to start asking different questions.

Don't get me wrong, I feel compromise detection capabilities are EXTREMELY important, but enabling more capability to detect things after the fact isn't making things more secure. Prevention should still be the number one priority.

Mister Reiner

Anonymous said...

Actually...both are equally important. As the saying goes, "Prevention eventually fails". Spend too much money on the prevention, and the detection won't exist. Standard trade-off that never seems to occur under the 'risk management' umbrella.

Anonymous said...

In the logs section...

An NTP server is necessary to synchronize event timestamps.

Regards, Richard.


Mister Reiner said...

To Anon: There is a certain truth to "Prevention eventually fails", but I think that those implementations are predestined to fail because of a flawed implementation and/or a misguided belief that certain security measures/tools are actually effective and infallible.

We all know that compromise is inevitable and unavoidable, so we need to start architecting our network and security with that mindset. Protect the information at all costs, realize that certain assets will be compromised no matter what we do, and minimize the impact of those compromises until we can identify those assets and shut them down. And of course we need the right detection capabilities to do the latter. Ha ha. ;)

Brian said...

Not to complicate things, but what happens once the same application is moved to a private, internal, shared cloud? Does figuring out if something bad happened become easier or harder? Easier because you would have more control over the environment; harder because you don't necessarily know exactly where the application is running, what else is running on the same hardware, or a host of other things.

Anonymous said...

Check out this Swiss software company, NEXThink.
I'm using them to get answers to a lot of the questions you raised in two simple mouse clicks.

They do real-time monitoring on the desktops, so I can even say immediately who is connected to all the servers, which account is being used, which applications are involved, etc.