Wednesday, January 31, 2007

I Am Not Anti-Log

Some of you who rely on various system and application logs might take exception to my emphasis on interpreting network traffic. You might think I am "anti-log." That is absolutely not true. I will demonstrate a case that shows I appreciate logs in certain situations.

Last night I was analyzing alert data collected from one of the customers I monitor. One of the Snort alerts I saw (a bleeding-exploit.rules entry) indicated BLEEDING-EDGE EXPLOIT Possible MSIE VML Exploit. This did not look promising, especially since I was not flooded with these events. In other words, if I had seen 100, I would not be 100 times more worried than if I saw only one alert. The fact that I was investigating a single alert made me think this signature might be deadly accurate.

I am not going to walk through the entire investigation for this event. Suffice it to say I wanted to know if the victim system was truly exploited. I eventually found myself looking at transcripts of traffic and the traffic itself. I duplicated part of the activity on my own system so I could show you what that might look like without revealing client data.
No, I did not visit a dating site for fun. Neither did my client. Prior to this I saw nothing indicating those sorts of interests, so I'm guessing this was an unintentional case.

The point is that it's much easier to understand a victim's Web browsing (if that's the crucial aspect of the investigation) if Web proxy logs are available, like these:

1170223023.601 388 192.168.2.5 TCP_MISS/404 558
GET http://back88008800.com/ -
DIRECT/81.95.146.166 text/html
1170223024.219 318 192.168.2.5 TCP_MISS/404 569
GET http://back88008800.com/favicon.ico -
DIRECT/81.95.146.166 text/html
1170223028.897 390 192.168.2.5 TCP_MISS/200 797
GET http://back88008800.com/dating.html -
DIRECT/81.95.146.166 text/html
1170223061.677 344 192.168.2.5 TCP_REFRESH_HIT/304 240
GET http://back88008800.com/dating.html -
DIRECT/81.95.146.166 -
1170223062.070 355 192.168.2.5 TCP_MISS/200 1946
GET http://back88008800.com/script.js -
DIRECT/81.95.146.166 application/x-javascript
1170223062.329 123 192.168.2.5 TCP_MISS/302 438
GET http://www.worlddatinghere.com/? -
DIRECT/63.218.226.67 text/html
1170223062.463 392 192.168.2.5 TCP_MISS/302 696
GET http://81.95.146.133/sutra/in.cgi? -
DIRECT/81.95.146.133 text/html
1170223062.802 339 192.168.2.5 TCP_MISS/200 4084
GET http://81.95.146.133/sp/sp2/index.php -
DIRECT/81.95.146.133 text/html

These are my personal Squid Web cache logs, which tracked my investigation. I like these logs because they cut right to the heart of the matter, namely what sites were visited as part of this event.
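As a minimal sketch of pulling "what sites were visited" out of such a log, the Python below parses entries in Squid's native access.log layout and lists the distinct hosts a client requested. It assumes each entry sits on a single line (the entries above are wrapped for display) and that the fields appear in the order shown above. The function names `parse_squid_line` and `hosts_visited` are hypothetical helpers, not part of Squid.

```python
from urllib.parse import urlsplit

# A few of the entries shown above, rejoined so each occupies one line.
# Assumed field order (Squid native format):
# time elapsed client action/status bytes method URL ident hierarchy/peer type
SAMPLE_LOG = """\
1170223023.601 388 192.168.2.5 TCP_MISS/404 558 GET http://back88008800.com/ - DIRECT/81.95.146.166 text/html
1170223062.329 123 192.168.2.5 TCP_MISS/302 438 GET http://www.worlddatinghere.com/? - DIRECT/63.218.226.67 text/html
1170223062.802 339 192.168.2.5 TCP_MISS/200 4084 GET http://81.95.146.133/sp/sp2/index.php - DIRECT/81.95.146.133 text/html
"""


def parse_squid_line(line):
    """Split one access.log entry into a dict; return None if it is too short."""
    fields = line.split()
    if len(fields) < 10:
        return None
    action, _, status = fields[3].partition("/")
    _, _, peer = fields[8].partition("/")
    return {
        "time": float(fields[0]),
        "client": fields[2],
        "action": action,
        "status": int(status),
        "bytes": int(fields[4]),
        "method": fields[5],
        "url": fields[6],
        "host": urlsplit(fields[6]).hostname,
        "peer": peer,
    }


def hosts_visited(log_text, client):
    """Return the distinct hosts requested by one client, in first-seen order."""
    seen = []
    for line in log_text.splitlines():
        entry = parse_squid_line(line)
        if entry and entry["client"] == client and entry["host"] not in seen:
            seen.append(entry["host"])
    return seen


if __name__ == "__main__":
    for host in hosts_visited(SAMPLE_LOG, "192.168.2.5"):
        print(host)
```

Keeping the first-seen order preserves the redirect chain, which is usually the interesting part when reconstructing a browsing session.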

While analyzing this case I also had access to session data, like this.

Session data is great because it shows me everything that happened, regardless of whether it involved a logging application (like a Web proxy) or not. However, to get at the details I would need to generate transcripts, like this.

My point is that sometimes it's helpful to work with an application-specific log, like a Squid Web proxy log, instead of rebuilding everything from traffic.

Speaking of Squid, I found that the default /etc/logrotate.d/squid entry, which controls rotation of /var/log/squid/access.log, contains this:

#
# Logrotate fragment for squid.
#
/var/log/squid/*.log {
    daily
    compress
    delaycompress
    rotate 2
    missingok
    nocreate
    sharedscripts
    prerotate
        test ! -x /usr/sbin/sarg-maint || /usr/sbin/sarg-maint
    endscript
    postrotate
        test ! -e /var/run/squid.pid || /usr/sbin/squid -k rotate
    endscript
}

I decided to change the "rotate 2" to "rotate 30" to give me 30 days of logs. Remember this is my own network's setting, where I was duplicating my client's experience for your blog reading enjoyment. As far as my client was concerned, I did not find any evidence of compromise after checking my session and full content data for suspicious post-alert activity.

Sunday, January 28, 2007

NoVA Sec Meeting 1900 Mon 29 Jan 07 at Getronics

Since this blog has a higher readership than the NoVA Sec blog, I want to reiterate:

The next NoVA Sec meeting will take place 1900 (7 pm) Monday 29 January 2007 at Getronics Red Siren. Wesley Shields will discuss FreeBSD jails.

The Self-Defeating Network

At the risk of adding yet more fuel to the fire, I'd like to share a few more thoughts on NSM. Although the title of this post is The Self-Defeating Network (SdN), I don't intend it to be a slam of Cisco's Self-Defending Network (SDN). Rather, the post's title demonstrates a probably lame attempt at branding an otherwise potentially boring issue.

Thus far I've tried to explain NSM, and the related concept of Defensible Network Architecture (originated in my Tao book, expanded in Extrusion), from the view of best practices. I've tried to say here's what you should do, because it gives you the best chance to survive on the mean streets of the Internet. In this post I'll take a different approach by describing the Self-Defeating Network -- what you should not do if you want to have a chance to defend your enterprise.

These are the characteristics of the Self-Defeating Network (SdN).

  1. The SdN is unknown, meaning no one really understands how it works. There is no complete, current inventory of all infrastructure, hosts, services, applications, and data. No one knows what patterns of activity are normal, or which are suspicious or malicious. One cannot defend what one does not understand.

  2. The SdN is unmonitored, which results in trust without verification. Those using the SdN trust the integrity, confidentiality, and availability of information resources but have no idea if their trust is well-placed. (Here's a hint: it's not.)

  3. The SdN is uncontrolled, which means anything goes -- including suspicious and malicious traffic. Security is seen as an impediment and not a requirement, and as a result the enterprise is actually less capable of providing resources than one which is controlled. The uncontrolled environment also makes monitoring exceptionally difficult because no policy exists saying what is allowed and what is disallowed.

  4. The SdN strives to be unmanned (i.e., to reduce headcount to zero), because products, not processes and people, are perceived as the "solution." Sadly enough one could argue this aspect of the SdN is shared by the SDN. For an explanation of why people are necessary, please read Hawke vs the Machine.

  5. In addition to (mis)placing trust in an untrustworthy network, the users of the SdN also (mis)trust their products to provide alert-centric warnings when "bad things happen." If no alerts are sounded, SdN users assume everything is fine.

    A friend described a case of this approach taken to the extreme, when a large company paid an MSSP for three years of service, during which no alerts were ever raised. After paying several hundred thousand dollars, the company realized the MSSP had misconfigured the SPAN ports feeding the MSSP sensors. For three years no traffic was inspected, so the MSSP reported nothing. The MSSP wrote a big refund check.

    Hint on a future post: this is also the problem with log-centric detection methods. The absence of logs does not indicate an absence of problems. An absence of network traffic, however, does at least indicate an absence of remote connectivity with a suspected victim computer. Obviously an absence of network traffic does not preclude physical problems like stealing data via USB token.


Eventually I'll turn all the notes I write in the wee hours of the night into blog posts. For now that's all!

Is It NSM If...

Frequently I'm asked about the data sources I cite as being necessary for Network Security Monitoring, namely statistical data, session data, full content data, and alert data. Sometimes people ask me "Is it NSM if I'm not collecting full content?" or "Where's the statistical data in Sguil? Without it, is Sguil a NSM tool?" In this post I'd like to address this point and answer a question posted as a comment Joe left on my post My Investigative Process Using NSM.

In 2002 while working for Foundstone, I contributed to the fourth edition of Hacking Exposed, pictured at left. On page 2 I defined NSM as the collection, analysis, and escalation of indications and warning to detect and respond to intrusions. Since then I've considered modifying that definition to emphasize the traffic-centric approach I intended to convey by using the term "network."

Whenever I speak or write about NSM I emphasize the four types of network data most likely to help discover and control intrusions. However, I also say and write that you should collect the maximum amount of data that is technically, legally, and reasonably possible. For example, in some environments it is technically impossible (without spending vast amounts of money) to continuously collect more than a short period of full content traffic. In other cases, it is not legally allowed, or privacy concerns render collecting full content a bad idea. For example, I would hope my ISP avoids storing all user packets merely because it claims a security value. With reason as a guide, I would also expect NSM practitioners to avoid storing the full content of the traffic passing on their storage area network or similar.

I like the approach taken by the inspiration for The Tao of Network Security Monitoring, namely the incomparable Bruce Lee's The Tao of Jeet Kune Do. Bruce Lee didn't advocate slavish devotion to any style. He suggested taking what was valuable from a variety of styles and applying what works in your own situation. I recommend the same idea with NSM.

Does this mean that one can completely avoid collecting full content data, perhaps relying instead on statistical, session, and alert data? I argue that whatever the limitations that prevent continuous full content data collection, the ability to perform on-demand full content data collection is an absolute requirement of NSM.

First, every network probably must have this capability, simply to meet lawful intercept requirements. Second, although I love session data, it is not always able to answer every question I may have about a suspicious connection. This is why approaches like the DoD's Centaur project are helpful but not sufficient. There is really no substitute for being able to look at full content, even if it's activated in the hopes of catching a future instance of a suspicious event. Third, only full content data can be carefully re-examined by deep inspection applications (like an IDS) once a new detection method is deployed. Session data can be mined, but the lack of all packet details renders detection of certain types of suspicious behavior impossible.

While we're talking about full content, I suppose I should briefly address the issue of encryption. Yes, encryption is a problem. Shoot, even binary protocols, obscure protocols, and the like make understanding full content difficult and maybe impossible. Yes, intruders use encryption, and those that don't are fools. The point is that even if you find an encrypted channel when inspecting full content, the fact that it is encrypted has value.

When I discover outbound traffic to port 44444 TCP on a remote server, I react one way if I can read clear HTTP, but differently if the content appears encrypted.

I will admit to filtering full content collection of traditionally encrypted traffic, such as that on 443 TCP, when collecting such traffic would drastically decrease the overall amount of full content I could collect. (For example, with HTTPS I might only save 1 day's worth of traffic; without, maybe 3 days.) In such cases I am making a trade-off that I hope is acceptable given the constraints of my environment.
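The trade-off above is simple arithmetic, sketched here in Python. The disk size and traffic rates are hypothetical numbers chosen only to land near the "1 day versus 3 days" example; they are not measurements from any real sensor.

```python
SECONDS_PER_DAY = 86400


def retention_days(disk_bytes, avg_bits_per_second):
    """Days of full content a capture disk holds at a steady average rate."""
    bytes_per_day = avg_bits_per_second / 8 * SECONDS_PER_DAY
    return disk_bytes / bytes_per_day


# Hypothetical sensor: 500 GB of capture space, 45 Mbps average traffic,
# of which 30 Mbps is assumed to be HTTPS on 443 TCP.
DISK_BYTES = 500 * 10**9
TOTAL_BPS = 45 * 10**6
HTTPS_BPS = 30 * 10**6

if __name__ == "__main__":
    with_https = retention_days(DISK_BYTES, TOTAL_BPS)
    without_https = retention_days(DISK_BYTES, TOTAL_BPS - HTTPS_BPS)
    print(f"Capturing everything: {with_https:.1f} days")
    print(f"Excluding 443 TCP:    {without_https:.1f} days")
```

A capture filter such as `not tcp port 443` (BPF syntax) is one way a collector could exclude that traffic. Whether the extra retention is worth losing the encrypted sessions depends on the environment.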

As to why we don't have statistical data in Sguil: I think those who want statistical data can turn to other projects like MRTG, Darkstat, or even Wireshark to get interesting statistical data.

In brief, I consider NSM's basic data requirements to be all four types of data mentioned earlier, with the understanding that collecting full content is expensive. On-demand definitely, continuous if possible.

Saturday, January 27, 2007

Wireshark Display Filters and SSL

I mentioned the power of Wireshark display filters when analyzing 802.11 last year. Now I read Ephemeral Diffie Hellman support - NOT ! by the Unsniff guys and they tell me that they cannot decode SSL traffic which uses the ephemeral Diffie-Hellman cipher suite. I wonder what that looks like in traffic?

Thanks to Wireshark display filters, I can find a suitable packet. Here's a matching packet.
You could use syntax like this with Tshark:

tshark -V -n -r capture -R "ssl.handshake.ciphersuite == 0x39"
...edited...
Secure Socket Layer
TLSv1 Record Layer: Handshake Protocol: Server Hello
Content Type: Handshake (22)
Version: TLS 1.0 (0x0301)
Length: 74
Handshake Protocol: Server Hello
Handshake Type: Server Hello (2)
Length: 70
Version: TLS 1.0 (0x0301)
Random
gmt_unix_time: Jan 26, 2007 19:32:44.000000000
random_bytes: 76744E818415307EA6F7C14FAF4BA640F67834C1263E5065...
Session ID Length: 32
Session ID (32 bytes)
Cipher Suite: TLS_DHE_RSA_WITH_AES_256_CBC_SHA (0x0039)
Compression Method: null (0)
Maybe some of you crypto gurus can comment on their blog post -- is it possible to decrypt traffic if the cipher suite is TLS_DH_RSA_WITH_AES_256_CBC_SHA (0x0037) instead of TLS_DHE_RSA_WITH_AES_256_CBC_SHA (0x0039)? The cited perfect forward secrecy article says Diffie-Hellman provides PFS but isn't clear on the differences between plain DH and DHE (ephemeral).

From what I read in Cryptography Decrypted, SSL/TLS uses Diffie-Hellman to create a shared pre-master secret key and six shared secret keys. Anonymous DH doesn't require either side to authenticate each other. Fixed or static DH exchanges public DH values using digital certs. Ephemeral DH exchanges public DH values signed with RSA or DSA keys. What does this mean for decryption using Wireshark, etc.? Thank you.
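As a rough sketch of the distinction being asked about, the Python below classifies a cipher suite by the key exchange named in its identifier. The suite values are the standard registered ones. The one-line notes on passive decryption reflect the general design of each key exchange and say nothing about what any particular tool supports. The `key_exchange` and `describe` helpers are hypothetical.

```python
# A small subset of registered TLS cipher suite values.
CIPHER_SUITES = {
    0x002F: "TLS_RSA_WITH_AES_128_CBC_SHA",
    0x0035: "TLS_RSA_WITH_AES_256_CBC_SHA",
    0x0037: "TLS_DH_RSA_WITH_AES_256_CBC_SHA",
    0x0039: "TLS_DHE_RSA_WITH_AES_256_CBC_SHA",
}

# What an observer would need, in principle, to recover session keys
# from captured traffic alone.
NOTES = {
    "rsa": "server RSA private key decrypts the pre-master secret",
    "static-dh": "server's fixed DH private key (from its certificate) is needed",
    "ephemeral-dh": "per-session DH secrets are discarded; the RSA key only signs",
}


def key_exchange(suite_name):
    """Return 'rsa', 'static-dh', or 'ephemeral-dh' for a TLS_*_WITH_* name."""
    kx = suite_name[len("TLS_"):].split("_WITH_")[0]
    if kx.startswith("DHE") or kx.startswith("ECDHE"):
        return "ephemeral-dh"
    if kx.startswith("DH_") or kx.startswith("ECDH_"):
        return "static-dh"
    if kx == "RSA":
        return "rsa"
    raise ValueError(f"unrecognized key exchange: {kx}")


def describe(suite_id):
    """Return the suite name, its key exchange class, and the matching note."""
    name = CIPHER_SUITES[suite_id]
    kind = key_exchange(name)
    return name, kind, NOTES[kind]


if __name__ == "__main__":
    for suite_id in sorted(CIPHER_SUITES):
        name, kind, note = describe(suite_id)
        print(f"0x{suite_id:04X} {name}: {kind} ({note})")
```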

What Do I Want

If you've read this blog for a while, or even if you've just been following it the last few months, you might be saying "Fine Bejtlich, we get it. So what do you want?"

The answer is simple: I want NSM-centric techniques and tools to be accepted as best practices for digital security. I don't say this to sell products. I say this because it's the best chance we have of figuring out what's happening in our enterprise.

NSM means deploying sensors to collect statistical, session, full content, and alert data. NSM means having high-fidelity, actionable data available for proactive inspection when possible, and reactive investigation every other time. NSM means not having to wait to hire a network forensics consultant who brings his own gear to the enterprise, hoping for the intruder to make a return appearance while the victim is instrumented. I'd like to see organizations realize they need to keep track of what's happening in their enterprise, in a content-neutral way, similar to the services provided by a cockpit voice recorder and a flight data recorder (CVR, FDR).

This is critical: what does content-neutral mean? The CVR doesn't start recording when it detects the pilot saying "help" or "emergency." The FDR doesn't start recording when the plane's altitude drops below 1000 feet. Rather, both devices are always recording, because those who deploy CVRs and FDRs know they don't know what will happen. This is the opposite of soccer-goal security, where you pick a defensive method and possibly miss everything else.

Network-centric access control, implemented by firewalls and the like, is pretty much a given. (I'm not talking about NAC, which is an acronym for Cisco's Network Admission Control and not Network Access Control, for pity's sake.) Ignoring the firewall-dropping folks at the Jericho Forum and Abe Singer, everyone (especially auditors and regulators) recognizes firewalls are necessary and helpful components of network security. This is undeniably true when one abandons the idea of the firewall as a product and embraces the firewall as a system, as rightly evangelized by Marcus Ranum and described in the Firewall FAQ:

2.1 What is a network firewall?

A firewall is a system or group of systems that enforces an access control policy between two or more networks. The actual means by which this is accomplished varies widely, but in principle, the firewall can be thought of as a pair of mechanisms: one which exists to block traffic, and the other which exists to permit traffic. Some firewalls place a greater emphasis on blocking traffic, while others emphasize permitting traffic. Probably the most important thing to recognize about a firewall is that it implements an access control policy.


Notice this description doesn't mention Pix, or Checkpoint, or Pf, or any other box. Those words apply equally well to router ACLs, layer 3-4 firewalls, "IPS," "deep packet inspection" devices -- whatever. It's about blocking and permitting traffic.

Returning to the main point: we've got to get network visibility and awareness as deeply into the digital security mindset as network-centric access control (via firewalling). How can you possibly consider blocking or permitting traffic if you don't even know what's happening in the enterprise? NSM will answer that question for you. Build yourself a network data recorder today, and learn to interpret what you're seeing. You'll sleep worse in the beginning, but better as you get a grip on your security posture -- managing by fact, not belief.

TaoSecurity Enterprise Trust Pyramid

My Monitor Your Routers post touched on the idea of trust. I'd like to talk about that for a moment, from the perspective of an operational security person. I'm not qualified to address trust in the same way an academic might, especially since trust is one of the core ideas of digital security. Trust can be described in extreme mathematical detail and in some cases even proven. I don't know how to do that. Instead, I'm going to describe how I decide what I trust when performing network incident response.

The diagram above shows the level of trust I have in the evidence or operation of various devices. I've broken them down into four categories. My trust decreases as the level of interaction with users increases. This is not a slam on users. Rather, it's a reflection of the idea that the level of exposure increases as one considers a device that is operated at the whim of a human.

At the very bottom of the pyramid would be a person on his/her user platform. This could be a PC with a user browsing the Web, reading email, chatting on IM, swapping files on P2P, etc. (Administrative rights worsen the situation but are not necessary to make my point.) It could also be a smart phone or other powerful portable general computing device. Because of the extreme number and diversity of attacks against this victim pool and the relative vulnerability of the platform, I hardly trust anything I see when interacting with a live victim system. An image of a victim system hard drive is more trusted than the same system viewed via a potentially compromised operating system, but memory- or hardware-resident attacks confuse that situation too.

I trust servers more than clients because servers typically offer constrained means by which users access the server. A properly designed and deployed server provides limited access to computing resources, although administrators typically interact directly with the platform. Sys admins are presumed to be more vigilant than regular users, but the level of exposure of services and the fallibility of human admins relegate the server to only one notch above client systems in my pyramid.

Next comes infrastructure. These are systems which carry user traffic, but users aren't expected to directly interact with them as they might with clients or servers. Again, admins will interact with the infrastructure, but most infrastructure is not a general computing platform with a very powerful shell. Note that users (and therefore attackers) can determine the existence of infrastructure when it interferes with traffic, as is the case when ACLs block communications. Infrastructure with directly addressable interfaces is more vulnerable to interaction and therefore less trusted than infrastructure with no directly addressable interfaces. For example, an unmanaged switch is potentially more trusted than a managed switch offering an administration interface.

At the top of the pyramid we find sensors. This being a Network Security Monitoring blog, you would expect that! The reason I trust my network sensors so much is that, when properly designed and deployed, they are invisible to users. They do not act on traffic and therefore cannot be discovered. When remote connectivity is required, my sensors do need administrative interfaces. That is a potential weakness. Furthermore, like all devices with TCP/IP stacks, they are susceptible to high-end attacks against the stack itself or the inspection applications watching passing traffic. Therefore these devices (really, any devices) cannot be perfectly trustworthy, but they are leagues above the desktops at the bottom of the pyramid.

For a while I've held out hope that memory snapshots taken in some reliable manner might give valuable insights into the state of a victim system, thereby revealing useful evidence. I mentioned this last year. Now I read Joanna Rutkowska is demolishing this technique too. Back to the bottom of the trust heap goes the desktop!

My Investigative Process Using NSM

I know some of you believe that my Network Security Monitoring (NSM) methodology works and is the best option available for independent, self-reliant, network-centric collection, analysis, and escalation of security events. Some of you think NSM is impossible, a waste of time, irrelevant, whatever. I thought I would offer one introductory case based on live data from my cable line demonstrating my investigative process. Maybe after seeing how I do business the doubters will either think differently (doubtful) or offer their own answer to this problem: how do I know what happened in my enterprise?

(Please: I don't want to hear people complain that I'm using data from a cable line with one public target IP address; I'm not at liberty to disclose data from my client networks in order to satisfy the perceived need for bigger targets. The investigative methodology is the same. Ok?)

As shown in the figure at left, I'm using Sguil for my analysis. I'm not going to screen-capture the whole investigation, but I will show the data involved. I begin this investigation with an alert generated by the Bleeding Threat ruleset. This is a text-based representation of the alert from Sguil.


Count:1 Event#1.78890 2007-01-15 03:17:36
BLEEDING-EDGE DROP Dshield Block Listed Source
203.113.188.203 -> 69.143.202.28
IPVer=4 hlen=5 tos=32 dlen=48 ID=39892 flags=2 offset=0 ttl=104 chksum=57066
Protocol: 6 sport=1272 -> dport=4899

Seq=2987955435 Ack=0 Off=7 Res=0 Flags=******S* Win=65535 urp=35863 chksum=0
Payload:
None.

I'm not using Sguil or Snort to drop anything, but I'm suspicious why this alert fired. Using Sguil I can see exactly how Snort decided to fire this alert.

alert ip [80.237.173.0/24,217.114.49.0/24,68.216.152.0/24,89.0.143.0/24,69.24.128.0/24,
193.138.250.0/24,69.158.31.0/24,211.147.241.0/24,207.138.45.0/24,201.230.81.0/24,
221.12.113.0/24,195.245.179.0/24,193.255.250.0/24,199.227.77.0/24,218.106.91.0/24,
66.70.120.0/24,203.113.188.0/24,219.146.96.0/24,129.93.9.0/24,61.128.211.0/24]
any -> $HOME_NET any (msg:"BLEEDING-EDGE DROP Dshield Block Listed Source - BLOCKING";
reference:url,feeds.dshield.org/block.txt; threshold: type limit, track by_src, seconds 3600,
count 1; sid:2403000; rev:319; fwsam: src, 72 hours;)

/nsm/rules/cel433/bleeding-dshield-BLOCK.rules: Line 32

Wow, that's a big list of IPs. The source IP in this case (203.113.188.203) is in the class C of the Snort alert, so Snort fired. Who is the source? Again, Sguil shows me without launching any new windows or Web tabs.

% [whois.apnic.net node-2]
% Whois data copyright terms http://www.apnic.net/db/dbcopyright.html

inetnum: 203.113.128.0 - 203.113.191.255
...edited...
person: Nguyen Manh Hai
nic-hdl: NMH2-AP
e-mail: spam@viettel.com.vn
address: Vietel Corporation
address: 47 Huynh Thuc Khang, Dong Da District, Hanoi City
phone: +84-4-2661278
fax-no: +84-4-2671278
country: VN
changed: hm-changed@vnnic.net.vn 20040825
mnt-by: MAINT-VN-VIETEL
source: APNIC

Vietnam -- probably not someone I visited and no one from whom I would expect traffic.
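The netblock match that made Snort fire can be checked by hand. As a minimal sketch using Python's standard `ipaddress` module, the code below tests an address against a subset of the /24 networks copied from the rule above. The `matching_networks` helper is hypothetical and is not how Snort itself evaluates the rule.

```python
import ipaddress

# A subset of the networks listed in the rule above.
BLOCK_LIST = [
    "80.237.173.0/24",
    "69.24.128.0/24",
    "69.158.31.0/24",
    "66.70.120.0/24",
    "203.113.188.0/24",
    "219.146.96.0/24",
    "61.128.211.0/24",
]


def matching_networks(ip, cidrs):
    """Return every CIDR block in cidrs that contains the address ip."""
    addr = ipaddress.ip_address(ip)
    return [cidr for cidr in cidrs if addr in ipaddress.ip_network(cidr)]


if __name__ == "__main__":
    # Source of the alert, then the target for comparison.
    for ip in ("203.113.188.203", "69.143.202.28"):
        hits = matching_networks(ip, BLOCK_LIST)
        print(ip, "->", hits if hits else "no match")
```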

So I know the source IP and I know exactly (not to be taken for granted given other systems) why this alert appeared. The question now is, should I care? If I were restricted to using an alert-centric system (as described earlier), my main option would now be to query for more alerts. I'll spare the description and say this is the only alert from this source. In the alert-centric world, that's it -- end of investigation.

At this point my log-centric friends might say "Check the logs!" That's a fine idea, but what if there aren't any logs? Does this mean the Vietnam box didn't do anything else, or that it did act but generated no logs? That's an important point.

In the NSM world I have two options: check session data, and check full content data. Let's check full content first by right-clicking and asking Sguil to fetch Libpcap data into Wireshark.

Here I could show another cool screen shot of Wireshark, but the data is the important element. Since Sguil copies it to a local directory I specify, I'll just re-read it with Tshark.

1 2007-01-14 22:17:36.162428 203.113.188.203 -> 69.143.202.28 TCP 1272 > 4899
[SYN] Seq=0 Len=0 MSS=1460
2 2007-01-14 22:17:36.162779 69.143.202.28 -> 203.113.188.203 TCP 4899 > 1272
[RST, ACK] Seq=0 Ack=1 Win=0 Len=0
3 2007-01-14 22:17:37.230040 203.113.188.203 -> 69.143.202.28 TCP 1272 > 4899
[SYN] Seq=0 Len=0 MSS=1460
4 2007-01-14 22:17:37.230393 69.143.202.28 -> 203.113.188.203 TCP 4899 > 1272
[RST, ACK] Seq=0 Ack=1 Win=0 Len=0

Not very exciting -- apparently two SYNs and two RST ACK responses. As we could have recognized from the original alert, port 4899 TCP activity is really old and since I know my network (or at least I think I do), I know I'm not offering 4899 TCP to anyone. But how do you know for all the systems you administer -- or watch -- or don't watch? This full content data, specific to the alert generated but collected independently of the alert, tells us we don't need to worry about this specific event.

This next idea is crucial: just because we have no other alerts from a source, it does not mean this event is all the activity that host performed. With this IP-based Snort alert, we can have some assurance that no other activity occurred because Snort will tell us when it sees packets from the 203.113.188.203 network. If we weren't using an IP-based alert, we could query session data, collected independently of alert and full content data, to see what else the attacking host -- or target host -- did.

(Yes, these sections are heavy on the bolding and even underlining, because after five years of writing about this methodology a lot of people still don't appreciate the real-world problems faced by people investigating network incidents.)

I already said we're confident nothing else happened from the attacker here because Snort is triggering on its specific netblock. That sort of alert is a minuscule fraction of the entire rule base. Normally I would find other activity by querying session data and getting results like this from Sguil and SANCP.

Sensor:cel433 Session ID:5020091160069307011
Start Time:2007-01-15 03:17:36 End Time:2007-01-15 03:17:37
203.113.188.203:1272 -> 69.143.202.28:4899
Source Packets:2 Bytes:0
Dest Packets:2 Bytes:0

As you can see, Sguil and SANCP have summarized the four packets shown earlier. There's nothing else. Now I am sure the intruder did not do anything else, at least from 203.113.188.203 within the time frame I queried. For added confidence I could query on a time range involving the target to look for suspicious activity to or from other IPs, in the event the intruder switched source IPs.
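To illustrate what "summarized" means here, the following Python sketch collapses the four packets from the earlier Tshark output into one session record with per-direction packet and byte counts. It is only an approximation of the idea: SANCP's actual logic (timeouts, TCP state tracking, and so on) is not reproduced. The `summarize` function and its record layout are hypothetical, and the input is assumed to be sorted by time.

```python
# The four packets shown earlier:
# (timestamp, src_ip, src_port, dst_ip, dst_port, payload_bytes)
PACKETS = [
    ("2007-01-14 22:17:36.162428", "203.113.188.203", 1272, "69.143.202.28", 4899, 0),
    ("2007-01-14 22:17:36.162779", "69.143.202.28", 4899, "203.113.188.203", 1272, 0),
    ("2007-01-14 22:17:37.230040", "203.113.188.203", 1272, "69.143.202.28", 4899, 0),
    ("2007-01-14 22:17:37.230393", "69.143.202.28", 4899, "203.113.188.203", 1272, 0),
]


def summarize(packets):
    """Group time-ordered packets into sessions keyed by the initiator's 4-tuple."""
    sessions = {}
    for ts, sip, sport, dip, dport, length in packets:
        forward = (sip, sport, dip, dport)
        reverse = (dip, dport, sip, sport)
        if reverse in sessions:
            # Reply traffic: count it against the destination side.
            record = sessions[reverse]
            record["dst_pkts"] += 1
            record["dst_bytes"] += length
        else:
            # First packet of a new session, or more traffic from its initiator.
            record = sessions.setdefault(forward, {
                "start": ts, "end": ts,
                "src_pkts": 0, "src_bytes": 0,
                "dst_pkts": 0, "dst_bytes": 0,
            })
            record["src_pkts"] += 1
            record["src_bytes"] += length
        record["end"] = ts
    return sessions


if __name__ == "__main__":
    for (sip, sport, dip, dport), rec in summarize(PACKETS).items():
        print(f"{sip}:{sport} -> {dip}:{dport}")
        print(f"  Source Packets:{rec['src_pkts']} Bytes:{rec['src_bytes']}")
        print(f"  Dest Packets:{rec['dst_pkts']} Bytes:{rec['dst_bytes']}")
```

The byte counts here track payload only, which is why two SYNs and two RST ACKs add up to zero bytes in each direction.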

I have plenty of other cases to pursue, but I'll stop here. What do you think? I expect to hear from people who say "That takes too long," "It's too manual," etc. I picked a trivial example with a well-defined alert and a tiny scope so I could avoid spending most of Saturday night writing this blog post. Think for a moment how you could expand this methodology and the importance of this sort of data to more complex cases and I think you'll give better feedback. Thank you!

Friday, January 26, 2007

Thoughts on December 2006 USENIX Login

I had the opportunity to "hang in the sky" (to use John Denver's phrase) again this week. While flying I read one of the best issues of USENIX ;login: I've seen. The December 2006 issue featured these noteworthy articles, most of which aren't online for everyone. USENIX members have the printed copy or can access the .pdf now. Nonmembers have to wait a year or attend the next USENIX conference, where free copies are provided.

  • My favorite article was The Underground Economy: Priceless by Team Cymru (.pdf available for free now). The article described the sorts of stolen material one can find circulating in the underground. It's a definite wake-up call for anyone who doesn't pay attention to that issue. Choice quotes include:

    Entire IRC networks--networks, not just single servers--are dedicated to the underground economy. There are 35 to 40 particularly active servers, all of which are easy to find. Furthermore, IRC isn't the only Internet vehicle they use. Other conduits include, but are not limited to, HTTP, Instant Messaging, and Peer-to-Peer (P2P)...

    This is the greatest failure of new technology--a rush to market, without consideration of the risks and a cost/benefit analysis. This is at the heart of the security problem. Certainly, that is not to say that industries should not capitalize on technological advances but, rather, that they should consider risk and threat mitigation strategies prior to bringing any product to market...

    The underground economy is fertile ground for the pursuit (and, we hope) prosecution of the miscreants. Most of the underground economy servers are public, advertised widely, and easy to find (standard IRC ports, very descriptive DNS RRs, etc.). There is absolutely no presumption of privacy in the underground economy; the channels aren't hidden, the channels have no keys, and the servers have no passwords. The clients in these channels are widely divergent. Think about what has just been shared:

    1. There is no need for specialized IRC clients.
    2. There is no need to rapidly track ever-changing DNS RRs and IPs.
    3. There is no need to pull apart every new permutation of malware.
    4. There is no need to hide, period.

  • Jan Göbel, Jens Hektor, and Thorsten Holz wrote Advanced Honeypot-Based Intrusion Detection, describing a combination of tools like Nepenthes, CWSandbox and local solutions.

  • In The Security of OpenBSD: Milk or Wine?, Andy Ozment and Stuart E. Schechter demonstrate that OpenBSD security has improved over time.

  • In White Worms Don't Work, Nicholas Weaver and Dan Ellis argue that "good" worms don't work.

  • Michael B. Scher shares his legal expertise in On Doing "Being Reasonable".

  • In the New Security Paradigms Workshop (NSPW '06) conference write-up (.pdf, available for free now), I read this interesting summary:

    Panel: Control vs. Patrol: A New Paradigm for Network Monitoring Panelists: John McHugh, Dalhousie University; Fernando Carvalho-Rodrigues, NATO; David Townshed, University of New Brunswick; The panelists debated the idea of an independent network-monitoring authority operating to ensure network integrity. The panelists contrast their concept of patrol versus more traditional discussions of network monitoring, which, in their perspective, are control- or ownership-oriented. The analogy driving the discussion was the role of highway patrols: Where a person drives in public spaces is their own business but that they were present is publicly accessible knowledge.

    Another summary read:

    Challenging the Anomaly Detection Paradigm: (.pdf) Carrie Gates, CA Labs; Carol Taylor, University of Idaho; This paper described weaknesses the authors perceived in the anomaly detection paradigm. The authors identified and questioned assumptions in three domains: the rarity and hostility of anomalies, problems in training data, and incorrect assumptions about operational requirements.

    In the first case, the authors argue that the assumptions made about the "normalcy" of data differ both since Denning's original studies and owing to changes in scope: Network data is more complex than system logs, and network data today is far more hostile than at the time of Denning's paper. In the second case, there are implicit assumptions about training data, such as the normalcy of a previous sample and the rarity of attacks that overlap this former case. Finally, the operational constraints were discussed in depth, with several commentators noting that the acceptable false-positive rate among the operational community is close to zero.
    (emphasis added)


Journals like ;login: are another great reason to be a USENIX member.

Thursday, January 25, 2007

Snort Report 2 Posted

My second Snort Report has been posted. In this edition I talk about upgrading from an older version to 2.6.1.2, and then I begin discussing the snort.conf file.

I recommend reading the first Snort Report so you can follow along with my methodology. In the third article (to be posted next month) I describe the sorts of activity you can detect without using Snort rules or dynamic preprocessors. The idea behind this series of articles is to develop an intuitive understanding of Snort's capabilities, starting with the basics and becoming more complicated.

Wednesday, January 24, 2007

Monitor Your Routers

Today I read this new Cisco advisory containing these words:

Cisco routers and switches running Cisco IOS® or Cisco IOS XR software may be vulnerable to a remotely exploitable crafted IP option Denial of Service (DoS) attack. Exploitation of the vulnerability may potentially allow for arbitrary code execution. The vulnerability may be exploited after processing an Internet Control Message Protocol (ICMP) packet, Protocol Independent Multicast version 2 (PIMv2) packet, Pragmatic General Multicast (PGM) packet, or URL Rendezvous Directory (URD) packet containing a specific crafted IP option in the packet's IP header...

A crafted packet addressed directly to a vulnerable device running Cisco IOS software may result in the device reloading or may allow execution of arbitrary code.


This is the sort of "magic packet" that's an attacker's silver bullet. Send an ICMP echo with the right IP option to a router interface and whammo -- you could 0wn the router. Who would notice? Most people consider their border router an appliance that connects to an ISP, and only begin watching traffic behind the router. I've been worried about this scenario since I posted Ethernet to your ISP in 2005 and read Hacking Exposed: Cisco Networks in 2006.

A router, like a printer, is just a computer in a different box. We should design architectures such that all nodes within our control can be independently and passively observed by a trusted platform. We should not rely on a potential target to reliably report its security status without some form of external validation, such as traffic collected and analyzed by network security monitoring processes.

This Cisco advisory reminds me of a case I encountered at Foundstone. Shortly after joining the company in early 2002 the Apache chunked encoding vulnerability became known in certain circles. As the new forensics guy on the incident response team, I was asked how I could tell if Foundstone's Apache Web server was hacked. I replied that I would first check session data collected by a trusted sensor looking for unusual patterns to and from the Web server, as far back as the data and my time would allow.

Foundstone, being an assessment company, didn't share my opinions on NSM. No such data was available, so the host integrity assessment descended into the untrustworthy and often fruitless attempt to find artifacts of compromise on the Web server itself. Of course the Web server could not be taken down for forensic imaging, so we performed a live response and looked for anything unusual.

This case was one of the easier ones in the sense that all that was at stake was a single server. We didn't have a farm of Web servers, any of which could have been hacked.

Whenever I am trying to scope an incident I always turn to network data first and host data second. The network data helps decide which hosts merit additional individual scrutiny at the host-centric level. It makes no sense and is not time-efficient to deeply or even quickly analyze a decent-sized group of potential victims.

Because most security shops do not bother to collect NSM data, their first response activities usually involve a system administrator perusing the victim file system, manually checking potentially altered or erased logs on the target, and inspecting potentially obfuscated processes in memory controlled by a rootkit. While I believe it can be important to collect live response data and even image and inspect a hard drive, I never start an incident response with those activities.

Remember: network first. I don't care if the intruder uses an encrypted channel. As long as he has utilized the network in a mildly suspicious manner, I will have some validation that the system is not behaving as expected. This presumes system owners know what is normal, preferably by comparing a history of network data against more recent, potentially suspect, activity.
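
As a rough illustration of comparing a history of network data against more recent activity, here is a minimal Python sketch. The record layout (source, destination, destination port), the function name, and the addresses are all assumptions made up for this example, not the format of any particular tool.

```python
from collections import namedtuple

# Hypothetical session record; real session data carries many more fields.
Session = namedtuple("Session", "src dst dport")

def new_peers(baseline, recent):
    """Return, per source, the (dst, dport) pairs seen recently but never in the baseline."""
    seen = {}
    for s in baseline:
        seen.setdefault(s.src, set()).add((s.dst, s.dport))
    unusual = {}
    for s in recent:
        if (s.dst, s.dport) not in seen.get(s.src, set()):
            unusual.setdefault(s.src, set()).add((s.dst, s.dport))
    return unusual

# Made-up history and recent activity for one internal host.
baseline = [Session("10.1.1.5", "10.1.1.53", 53), Session("10.1.1.5", "192.0.2.10", 80)]
recent = [Session("10.1.1.5", "192.0.2.10", 80), Session("10.1.1.5", "203.0.113.7", 6667)]
suspects = new_peers(baseline, recent)
print(suspects)
```

Only endpoints and ports are compared, so a check like this still works when the payload is encrypted.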


Copyright 2007 Richard Bejtlich

Monday, January 22, 2007

Review of The Pragmatic CSO

While waiting in the airport, and flying between Ottawa and Washington Dulles, I read a copy of Mike Rothman's new book The Pragmatic CSO. I was somewhat suspicious of some of the early reviews, since they appeared so quickly after the book was published. You can rest assured that I read the whole book -- and I really liked it.

The most important feature of "P-CSO" (as it's called) is that it is a business book. P-CSO teaches readers (assumed to be techies, for the most part) how to think like a businessperson who reports to and interacts with other businesspeople. I took business classes in college and graduate school, and I run my own business. Most of the time, however, I'm doing technical work. I usually stay so busy that I don't consciously consider the sorts of business issues Mike describes. Consider the following quote from pages 51-2:

The only way to get a seat at the table is by holding yourself to the same standards as everyone else. Operate a program, improve where necessary, track metrics, and report progress. Then repeat. Welcome to the wonderful world of business...

In business, perception is often more important than reality. Competence does the CSO little good unless senior management perceives him (or her) as competent. To do that, a Pragmatic CSO must learn to approach the job as a business manager does. The CSO job should be managed in the same way that the CFO manages finances, the CIO manages the IT department, and the CEO manages the business. This means identifying business goals, creating a step-by-step plan for achieving those goals, and executing on that plan, all the while communicating activities and success to senior management... instead of being treated as a security wonk.


Consider this from page 45:

When the CEO asked you if your security is effective, do you think he believed you... Since you haven't told the CEO what effective security is, why would he believe you?

In other words, frame perceptions. Furthermore, from page 70:

If there are no consequences for failure, you aren't a business unit.

So what is good security? Read pages 47-48:

No availability issues due to security problems. No loss of corporate intellectual property. No lawsuits because of policy violations. No problems that cause the PR spin-meisters to work overtime. Finally, a strong presentation to the auditors and examiners that you are in compliance with whatever regulation/policy is applicable...

You want to show improvement in the areas that are within your control. You want to see awareness going in the right direction. You want to make sure that security is not so onerous that it's getting in the way of business. You want to show that your environment is getting more secure via periodic penetration and vulnerability tests. And you want to show that you continue to improve how incidents are dealt with.


What, no tracking to show that 100% of machines are patched? Who cares! Mike is exactly right there, and here on pages 46-47:

Security is clearly overhead... the goals of any security program are to maintain availability, protect intellectual property, shepherd the brand, limit corporate liability, and ensure compliance. None of those activities directly contribute to the top line. But it can provide a strategic advantage...

[Y]ou are not going to put together a model that shows a positive ROI. That is fruitless and very hard to prove, so ultimately it's a waste of time. But we are trying to evangelize the mindset that an effective, programmatic approach to security will save the company money.


From the book I synthesized a few lists I plan to use in the future.

First, how to run a business or team:

  1. Set goals.

  2. Build a plan.

  3. Execute the plan.

  4. Track metrics and try to improve.

  5. Report progress.


The last item really only applies when you have upper or outside accountability.

Second, how to build a business plan using five elements:

  1. Position: Why does your group exist?

  2. Priorities: Where should you focus attention?

  3. Structure: How should you organize and operate?

  4. Service: What do you deliver to customers?

  5. Time: When are your deadlines?


None of this may make an impact unless you're in the middle of a project that involves contemplating such issues. As a small business owner I'm always grappling with these subjects. Even though P-CSO is written for Chief Security Officers in the corporate world, I found its business focus helpful for me as a consultant and business person. If any of what I wrote resonates with you, I strongly recommend buying and reading The Pragmatic CSO. All CSOs should also have a copy, period.

Thursday, January 18, 2007

Security Responsibilities

It's been several years since I had operational responsibility for a single organization's network security operations. As a consultant I find myself helping many different customers, but I maintain continuous monitoring operations for only a few. Sometimes I wonder what it would be like to step back into a serious security role at a single organization. Are any of you looking for someone with my background (.pdf)? If yes, please feel free to email taosecurity [at] gmail [dot] com. Thank you.

Latest Laptop Recommendations

It's been over a year since my last request for comments on a new laptop. I had a scare using my almost 7-year-old Thinkpad a20p today while teaching a private class. I wanted to run VMware Server using a VM configured to need 192 MB RAM. The laptop has 512 MB of physical RAM. When I started the VM, VMware Server complained it didn't have sufficient free RAM. Puzzled, I checked my Windows hardware properties and saw only 256 MB RAM reported! Oh oh.

I guessed that maybe one of the two 256 MB RAM sticks in my laptop had been loosened on the trip to the class site. Using a grounding wrist band thoughtfully provided by my class, I removed my laptop's RAM and reseated it. After booting, I saw all 512 MB again. Whew.

This experience made me again consider buying a new laptop. I am going to buy a Thinkpad, probably something in the T series like a T60p. However, I'm considering a new OS strategy. Currently I dual boot Windows 2000 Professional and FreeBSD 6.x. For my next laptop, I'm thinking of installing an OS fully supported by VMware Server, like Ubuntu, with VMware Server over it. I won't install anything else in Ubuntu. I'll do all my work inside VMware, with one VM running FreeBSD for daily work and another running some version of Windows for Office-like tasks.

I've avoided relying on VMware in the past as a primary work environment because I thought I would regularly need hardware-level access to run wireless assessment tools. This hasn't turned out to be a real need, and I think I would just turn to a live CD like BackTrack that has figured out all the Linux kernel voodoo needed for the cooler wireless tools.

Is anyone else doing this? What has been your experience?

FreeBSD VMware Interfaces

A site hosting news on FreeBSD 7.0 also included several great tips for FreeBSD under VMware. One tip talked about the lnc network interface standard under VMware.

You can see lnc0 in this sample VM. Here's dmesg output:

lnc0: <PCNet/PCI Ethernet adapter> port 0x1400-0x147f
irq 18 at device 17.0 on pci0
lnc0: Attaching PCNet/PCI Ethernet adapter
lnc0: [GIANT-LOCKED]
lnc0: Ethernet address: 00:0c:29:38:7d:ea
lnc0: if_start running deferred for Giant
lnc0: PCnet-PCI

This is what the interface looks like in VMware:

taosecurity:/root# ifconfig lnc0
lnc0: flags=108843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,
NEEDSGIANT> mtu 1500
inet6 fe80::20c:29ff:fe38:7dea%lnc0 prefixlen 64 scopeid 0x1
inet 198.18.153.167 netmask 0xffff0000 broadcast 198.18.255.255
ether 00:0c:29:38:7d:ea

The fact that lnc is GIANT-locked is bad for network performance. Furthermore, lnc is deprecated in FreeBSD 7.0, replaced by le.

The site included a tip to replace the lnc0 driver with an emulated em0 driver. I modified my .vmx file by adding the second line.

Ethernet0.present = "TRUE"
ethernet0.virtualDev="e1000"

With that option, I see em0 in dmesg output:

em0: <Intel(R) PRO/1000 Network Connection Version - 3.2.18>
port 0x1070-0x1077
mem 0xec820000-0xec83ffff,0xec800000-0xec80ffff irq 18
at device 17.0 on pci0
em0: Memory Access and/or Bus Master bits were not set!
em0: Ethernet address: 00:0c:29:fd:f6:1d

Here is the device in ifconfig output:

kbld:/root# ifconfig em0
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
options=b<RXCSUM,TXCSUM,VLAN_MTU>
inet6 fe80::20c:29ff:fefd:f61d%em0 prefixlen 64 scopeid 0x1
inet 198.18.152.169 netmask 0xffff0000 broadcast 198.18.255.255
ether 00:0c:29:fd:f6:1d
media: Ethernet autoselect (1000baseTX <full-duplex>)
status: active

That's really cool, especially since em0 doesn't need the GIANT lock. Performance should be improved. The site I mentioned said this with respect to replacing lnc with le:

le is still a SIMPLEX device but at least it's not GIANT-locked.

It's true that le is not GIANT locked, but I think SIMPLEX is really irrelevant. You'll see SIMPLEX everywhere, like in the above em0 output. As I mentioned here, SIMPLEX "has nothing to do with half or full duplex. SIMPLEX refers to the NIC not being able to transmit and receive while operating on true CSMA/CD Ethernet, which is really not the case with switched networks."
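
To see that SIMPLEX is just one bit among the interface flags, the hex values in the ifconfig output above can be decoded by hand. This is a minimal Python sketch; the bit values are a subset of those in FreeBSD's net/if.h, and the function name is made up for illustration.

```python
# Interface flag bits from FreeBSD's <net/if.h> (subset only).
IFF_FLAGS = {
    0x1: "UP",
    0x2: "BROADCAST",
    0x8: "LOOPBACK",
    0x10: "POINTOPOINT",
    0x40: "RUNNING",
    0x80: "NOARP",
    0x100: "PROMISC",
    0x200: "ALLMULTI",
    0x800: "SIMPLEX",
    0x8000: "MULTICAST",
    0x100000: "NEEDSGIANT",
}

def decode_flags(value):
    """Return the names of the known flag bits set in value, lowest bit first."""
    return [name for bit, name in sorted(IFF_FLAGS.items()) if value & bit]

em0 = decode_flags(0x8843)      # flags=8843 from the em0 output
lnc0 = decode_flags(0x108843)   # flags=108843 from the lnc0 output
print(em0)
print(lnc0)
```

The two values differ only in the 0x100000 bit, which is NEEDSGIANT. SIMPLEX (0x800) is set on both interfaces.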

On a related note, you don't have to recompile your kernel to use le. You can load it as a kernel module during boot by entering if_le_load="YES" in /boot/loader.conf as noted in the le man page. In my tests, however, this did not result in using the le driver instead of lnc, so those wishing to use an lnc0 replacement should use em0.

I also concur with the recommendation to use kern.hz="100" in /boot/loader.conf to deal with VMware timing issues. That helps immensely and doesn't require VMware Tools installation.

Wednesday, January 17, 2007

FreeBSD News

I'd like to mention a few FreeBSD news items. First, FreeBSD 6.2 was released Monday. I am not rushing to install it but I plan to deploy it everywhere. I have a subscription to FreeBSDMall.com, so I don't need to download any .iso's at the moment. I plan to upgrade all existing FreeBSD 6.1 systems using Colin Percival's 6.1 to 6.2 binary upgrade script. I am particularly glad to see that Colin's freebsd-update utility is now part of the base system.

Second, FreeSBIE 2.0, a FreeBSD live CD based on FreeBSD 6.2, was just released. I plan to download and try it out, at least in a VM. I'll probably burn a CD to use for testing FreeBSD support on various hardware.

Third, this BSDNews.com story pointed me to a site watching developments in FreeBSD 7.0, called What's cooking for FreeBSD 7.0?. It provides a quick summary of features expected in the next major version of FreeBSD. Keeping with the development theme, the Oct-Dec 2006 status report was also just published.

Update: If you use a USB mouse, wait for FreeSBIE 2.0.1 -- according to this interview with Matteo Riondato.

Tuesday, January 16, 2007

Brief Response to Marty's Post

Marty Roesch was kind enough to respond to my recent posts on NSM. We shared a few thoughts in IRC just now, but I thought I would post a few brief ideas here.

My primary concern is this: just because you can't collect full content, session, statistical, and alert data everywhere doesn't mean you should avoid collecting it anywhere. I may not have sensors on the sorts of network Marty describes (high bandwidth, core networks) but I have had (and have) sensors elsewhere that did (and do) support storing decent amounts of NSM data on commodity hardware using open source software. I bet you do too.

I'm not advocating you store full content on the link to your storage area network. I don't expect Sony to store full content of 8 Gbps of traffic entering their gaming servers. I don't advocate storing full content in the core. Shoot, I probably wouldn't try storing session data in the core. Rather, you should develop attack models for the sorts of incidents that worry you and develop monitoring strategies that best meet those needs given your resource constraints.

For example, almost everyone can afford to monitor almost all forms of NSM data at the point where their users exit the intranet and join the Internet. I seldom see these sorts of access links carrying the loads typically thought to cause problems for commodity hardware and software. (ISPs, this does not include you!) This is the place where you can perform extrusion detection by watching for suspicious outbound connections. If you follow defensible network architecture principles, you can augment your monitoring by directing all outbound HTTP (and other supported protocols) through a proxy. You can inspect those proxy logs instead of reviewing NSM data, if you have access.
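
As a sketch of what inspecting proxy logs might look like, the Python below parses lines in Squid's native access.log layout and counts which hosts each client requested. It assumes each entry sits on a single line and uses a plain http URL; CONNECT entries, which log host:port instead of a URL, are not handled. The sample lines and addresses are invented for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit

def parse_squid_line(line):
    """Split one native-format Squid access.log line into a dict."""
    f = line.split()
    action, _, status = f[3].partition("/")
    return {
        "time": float(f[0]),
        "client": f[2],
        "action": action,
        "status": int(status),
        "bytes": int(f[4]),
        "method": f[5],
        "url": f[6],
        "host": urlsplit(f[6]).hostname,
    }

# Made-up sample entries in the same layout.
sample = [
    "1170223023.601 388 192.0.2.5 TCP_MISS/404 558 GET http://www.example.com/ - DIRECT/198.51.100.7 text/html",
    "1170223028.897 390 192.0.2.5 TCP_MISS/200 797 GET http://www.example.com/index.html - DIRECT/198.51.100.7 text/html",
]
records = [parse_squid_line(line) for line in sample]
hosts_by_client = Counter((r["client"], r["host"]) for r in records)
print(hosts_by_client)
```

A per-client tally like this is a starting point for spotting outbound requests to hosts nobody on the network normally visits.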

Marty also emphasizes the problems caused by centralizing NSM data. I do not think centralization is a key aspect, or necessarily a required aspect, of NSM. One of my clients has three sensors. None of them report to a central point. All of them run their own sensor, server, and database components.

The existing Sguil architecture centrally stores alert and session data. Full content data remains on the sensor and is periodically overwritten. I am personally in favor of giving operators the option of storing session data on a database local to the sensor. That would significantly reduce the problems of centralization. I almost never "tune" or "filter" statistical or full content data. I seldom "tune" or "filter" session data, but I always tune alert data. By keeping the session data on the sensor, you can collect records of everything the sensor sees but not waste bandwidth pushing all that information to a central store.

Marty also said this:

Then we've got training. I know what the binary language of moisture vaperators, Rich knows the binary language of moisture vaperators, lots of Sguil users know it too. The majority of people who deploy these technologies do not. Giving them a complete session log of an FTP transfer is within their conceptual grasp, giving them a fully decoded DCERPC session is probably not. Who is going to make use of this data effectively? My personal feeling is that more of the analysis needs to be automated, but that's another topic.

Excellent Star Wars comment. I don't like the alternative, though. As I described here, I'm consulting for a client stuck with a security system they don't understand and for which they don't have the data required to acquire real knowledge of their network. I don't understand how providing less information is supposed to help this situation. As I wrote in Hawke vs the Machine, expertise grows from having the right forms of data available. In other words, it's the data that makes the expert. I don't have any special insights into alerts from an IDS or IPS. I can make sense of them only through investigation, and that requires data to investigate.

Recording everything, everywhere will never scale and isn't feasible. However, the revolution will be monitored, which will help us understand our networks better and hopefully detect and eject more intruders.

Comments on ISSA Journal Article

It's been 2 1/2 years since my first book was published, although I've been writing and speaking about Network Security Monitoring (NSM) for at least five years. I'm starting to see other people cite my works, which is neat. It also means people are starting to criticize what I wrote, so I need to elaborate on some ideas.

The December 2006 ISSA Journal includes an article by Robert Graham titled Detection Isn’t Optional: Monitoring-in-depth. (No, it's not the Robert Graham of Black Ice/ISS fame. This is a different person.)

The implication of this article is that NSM is insufficient because it does not integrate SNMP data, event logs, and other sources. I do not disagree with this assessment. The reason I focus on NSM is that I start from the premise of self-reliance. In many enterprises, the security team does not have access to SNMP data from infrastructure devices. That belongs to the networking team. They also might not have access to event logs, since those are owned by system administrators. In these situations, security analysts are left analyzing whatever data they can collect independently -- hence NSM.

Granted, the NSM definition I proposed is far too wide to apply strictly to traffic-centric monitoring. As I wrote previously, I'm going to revise the NSM definition prior to writing a second edition of Tao. I think it makes sense to think of monitoring within this skeleton framework:

  • Enterprise Monitoring


    • Performance Monitoring

    • Fault Monitoring

    • Security Monitoring


      • Network- (i.e., traffic) centric

      • Infrastructure-centric

      • Host-centric

      • Application-centric


    • Compliance Monitoring



Here you see that I consider NSM to be a single part of the security aspect of enterprise situational awareness. NSM is not the be-all, end-all approach to solving enterprise problems. If I had tried to tackle this entire issue my first book could have been 2400 pages instead of 800. If you've read my blog for a while you'll remember seeing me review books on Nagios and host integrity monitoring and also commenting on SNMP. I do all this because I recognize the value of these other data sources.

Intel Premier IT Security Graphic

The image at left is from the first issue of an Intel marketing magazine called Premier IT. I like it because it shows many of the terms I try to describe in this blog, in relationship to each other. In English, the graphic says something like the following:

Threats exploit vulnerabilities, thereby exposing assets to a loss of confidentiality/integrity/availability, causing business impact.

I disagree that business impact is mitigated by controls. I think those terms were connected to make a pretty cyclical diagram. I would also say that controls mitigate attacks (exploits) by threats, not the threats themselves. Imprisonment mitigates threats.

The next diagram shows Intel emphasizes Policy at the base, followed by Training and Education, then Technology and Testing, and finally Monitoring and Enforcement. I think the Training and Education piece is marginally effective at best, at least for the general user population. It's tough enough for security pros to keep up with the latest attacks. It's impossible for general users. The school of hard knocks (i.e., experience) is doing a better job teaching the general user population not to trust anything online. I like the recommendation for Continuous monitoring for attacks and policy violations.

The last diagram positions most of the components of digital security within context. It includes Governance and Personnel, Physical Security, Network Security, Platform Security, Application Security, Storage Security, and File and Data Security. I like this image because it makes me question what aspects of this environment I understand and can personally implement.

Monday, January 15, 2007

Operational Traffic Intelligence System Woes

Recently I posted thoughts on Cisco's Self-Defending Network. Today I spent several hours on a Cisco Monitoring, Analysis and Response System (MARS) trying to make sense of the data for a client. I am disappointed to report that I did not find the experience very productive. This post tries to explain the major deficiencies I see in products like MARS.

Note: I call this post Operational Traffic Intelligence System Woes because I want it to apply to detecting and resisting intrusions. As I mentioned earlier, hardly anyone builds real intrusion detection systems. So-called "IDS" are really attack indication systems. I also dislike the term intrusion prevention system ("IPS"), since anything that seeks to resist intrusion could be considered an "IPS." Most available "IPS" are firewalls in the sense that anything that denies activity is a policy enforcement system. I use the term traffic intelligence system (TIS) to describe any network-centric product which inspects traffic for detection or resistance purposes. That includes products with the popular labels "firewall," "IPS," and "IDS."

Three main criticisms can be made against TIS. I could point to many references but since this is a blog post I'll save that heavy lifting for something I write for publication.

  1. Failure to Understand the Environment: This problem is old as dirt and will never be solved. The root of the issue is that in any network of even minimal size, it is too difficult for the TIS to properly model the states of all parties. The TIS can never be sure how a target will decipher and process traffic sent by an intruder, and vice versa. This situation leaves enough room for attackers to drive a Mack truck through, e.g., they can fragment at the IP / TCP / SMB / DCE-RPC levels and confuse just about every TIS available, while the target happily processes what it receives. Products that gather as much context as possible about targets improve the situation, but there are no perfect solutions.

  2. Analyst Information Overload: This problem is only getting worse. As attackers devise various ways to exploit targets, TIS vendors try to identify and/or deny malicious activity. For example, Snort's signature base is rapidly approaching 10,000 rules. (It's important to realize Snort is not just a signature-based IDS/IPS. I'll explain why in a future Snort Report.) The information overload problem means it's becoming increasingly difficult (if not already impossible) for security analysts to understand all of the attack types they might encounter while inspecting TIS alerts. SIM/SEM/SIEM vendors try to mitigate this problem by correlating events, but at the end of the day I want to know why a product is asking me to investigate an alert. That requires drilling down to the individual alert level and understanding what is happening.

  3. Lack of Supporting Details: The vast majority of TIS continue to be alert-centric. This is absolutely crippling for a security analyst. I am convinced that the vast majority of TIS developers never use their products on operational networks supporting real clients with contemporary security problems. If they did, developers would quickly realize their products do not provide the level of detail needed to figure out what is happening.
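
The fragmentation problem in the first item can be shown with a toy example. This Python sketch reassembles the same overlapping fragments under two policies, one favoring the first data to arrive and one favoring the last. The payload, offsets, and function name are invented, and real stacks have more policies than these two.

```python
def reassemble(fragments, policy):
    """Rebuild a byte string from (offset, data) fragments.

    policy "first" keeps the bytes that arrived first where fragments overlap;
    policy "last" lets later fragments overwrite earlier ones.
    Assumes the fragments leave no gaps.
    """
    buf = {}
    for offset, data in fragments:
        for i, byte in enumerate(data):
            pos = offset + i
            if policy == "last" or pos not in buf:
                buf[pos] = byte
    return bytes(buf[p] for p in sorted(buf))

# Two overlapping fragments: offsets 0-7, then 4-7 again with different bytes.
frags = [(0, b"GET /ok "), (4, b"/bad")]
sensor_view = reassemble(frags, "first")
target_view = reassemble(frags, "last")
print(sensor_view, target_view)
```

If the sensor favors the first fragment while the target favors the last, the two see different requests built from identical packets.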


In brief, we have TIS that don't/can't fully understand their environment, reporting more alerts than an analyst can understand, while providing not enough details to satisfy operational investigations. I did not even include usability as a critical aspect of this issue.

How does this apply to MARS? It appears that MARS (like other SIM/SEM/SIEM) believes that the "more is better" approach is the way to address the lack of context. The idea is that collecting as many input sources as possible will result in a system that understands the environment. This works to a certain limited point, but what is really needed is comprehensive knowledge of a target's existence, operating system, applications, and configuration. That level of information is not available, so I was left with inspecting 209 "red" severity MARS alerts for the last 7 days (3099 yellow, 1672 green). Those numbers also indicate information overload to me. All I really want to know is which of those alerts represent intrusions? MARS (and honestly, most products) can't answer that question.
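
To put those counts in perspective, here is a quick back-of-the-envelope calculation in Python. It uses only the numbers quoted above.

```python
# Alert counts reported for the last 7 days.
alerts = {"red": 209, "yellow": 3099, "green": 1672}
days = 7

total = sum(alerts.values())
per_day = total / days
red_per_day = alerts["red"] / days
print(total, round(per_day), round(red_per_day))
```

That works out to 4980 alerts for the week, or roughly 711 per day, about 30 of them red.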

The way I am usually forced to determine if I should worry about TIS alerts is manual inspection. The open source project Sguil provides session and full content data -- independent of any alert -- that lets me know a lot about activity directly or indirectly related to an event of interest. With MARS and the like, I can basically query for other alerts. Theoretically NetFlow can be collected, but the default configuration is to collect NetFlow for statistical purposes while discarding the individual records.
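
The idea of pivoting from an alert to session data collected independently of it can be sketched in a few lines of Python. The field names, time window, and sample records below are hypothetical and are not Sguil's actual schema.

```python
def related_sessions(alert, sessions, window=3600):
    """Return sessions involving either alert endpoint within +/- window seconds."""
    hosts = {alert["src"], alert["dst"]}
    return [
        s for s in sessions
        if abs(s["start"] - alert["time"]) <= window
        and (s["src"] in hosts or s["dst"] in hosts)
    ]

# Made-up alert and session records; times are seconds on an arbitrary clock.
alert = {"time": 1000, "src": "203.0.113.7", "dst": "10.1.1.5"}
sessions = [
    {"start": 900, "src": "203.0.113.7", "dst": "10.1.1.5", "dport": 80},
    {"start": 1200, "src": "10.1.1.5", "dst": "198.51.100.9", "dport": 4444},
    {"start": 1300, "src": "10.1.1.8", "dst": "192.0.2.10", "dport": 80},
    {"start": 9000, "src": "10.1.1.5", "dst": "192.0.2.10", "dport": 80},
]
hits = related_sessions(alert, sessions)
print(hits)
```

The second hit, an outbound connection from the alerted host shortly after the event, is the kind of record that would never appear when the only thing you can query is other alerts.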

If I want to see full content, the closest I can get is this sort of ASCII rendition of a packet excerpt. That is ridiculous; it was state-of-the-art in 1996 to take a binary protocol (say SMB -- not shown here but common) and display the packet excerpt in ASCII. That level of detail gives the analyst almost nothing useful as far as incident validation or escalation.

(Is there an alternative? Sure, with Sguil we extract the entire session from Libpcap and provide it in Ethereal/Wireshark, or display all of the content in ASCII if requested by the analyst.)

The bottom line is that I am at a loss regarding what I am going to tell my client. They spent a lot of money deploying a Cisco SDN but their investigative capabilities as provided by MARS are insufficient for incident analysis and escalation. I'm considering recommending augmentation with a separate product that collects full content and session data, then using MARS as a tip-off for investigation using those alternative data sources.

Are you stuck with similar products? How do you handle the situation? Several of you posted ideas earlier, and I appreciate hearing more.



Friday, January 12, 2007

Certified Malware Removal Expert

I read the following in the latest SANS NewsBites (link will work shortly):

Does anyone on your staff do an excellent job of cleaning out PCs that have been infected by spyware and other malicious software? We are just starting development of a new certification (and related training) for Certified Malware Removal Experts and we are looking for a council of 30 people who have done a lot of it to help vet the skills and knowledge required for the certification exam and classes. Email cmre@sans.org if you have a lot of experience.

This must be the easiest SANS certification of all! The safest way to remove malware is to reinstall from trusted original media (not backups which could be compromised). That doesn't even account for BIOS or other hardware rootkits, but hardly anyone cares about that problem yet.

Hopefully SANS will come to the same conclusion that Microsoft already did and drop this idea.

Wednesday, January 10, 2007

Thoughts on Cisco Self-Defending Network Book

I didn't exactly "read" Self-Defending Networks: The Next Generation of Network Security by Duane DeCapite. Therefore, I won't review the book at Amazon.com. I definitely didn't read a majority of the text, which is a personal requirement for a book review. However, I'd like to discuss the title here.

The book has a ton of screen shots and is essentially a big marketing piece for Cisco's Self-Defending Network gear, which includes:

Why do I mention this, especially with product listings? Well, I realized the Self-Defending Network (SDN) is a security integrator's dream. I'm working with a client who has sold essentially this entire setup to a customer, and they want me to help get the most value from the deployment. I'm also going to assist with incident response planning.

The point is a security integrator can pitch this entire SDN suite as a coherent, one-brand "solution," and cover pretty much all the bases. That's impressive and I'm interested in knowing what sort of traction Cisco is getting with this approach. My sense is that it will sell well to non-technology companies who are really late in the security game. Yes, there are many companies who have no real protection, even in 2007. I severely doubt the readers of this blog are in that category, but what are you seeing?

The Revolution Will Be Monitored

I read the following in the latest SANS NewsBites:

Revised Civil Procedure Rules Mean Companies Need to Retain More Digital Data (4 January 2007)

The revised Federal Rules of Civil Procedure, which took effect on December 1, 2006, broaden the types of electronic information that organizations may be asked to produce in court during the discovery phase of a trial. The new types of digital information include voice mail systems, flash drives and IM archives. This will place a burden on organizations to retain the data in the event it is needed in a legal case.

Section V, Depositions and Discovery, Rule 34 of the Federal Rules of Civil Procedure reads, in part,

"Any party may serve on any other party a request to produce and permit the party making the request, or someone acting on the requestor's behalf, to inspect, copy, test or sample any designated documents or electronically stored information - including writings, drawings, graphs, charts, photographs, sound recordings, images, and other data or data compilations stored in any medium from which information can be obtained ..."


This ComputerWorld article adds:

According to a 2006 study by the American Management Association and the ePolicy Institute, more than half of those who use free IM software at work say that their employers have no idea what they're up to.

There are two ways to look at this problem. The first involves limiting the amount of data available, i.e., data creation. Brian Honan mentions this in his commentary for SANS:

Make sure to include how to deal with personal electronic devices such as PDAs and pen drives - hint best to prohibit their use in a corporate environment in the first place.

Yeah, right. Everything is going to have USB/Bluetooth/whatever connectivity and a flash drive sooner or later. We're already seeing this with cell phones and integrated cameras. It's almost impossible to buy a new cell phone without a camera. One of my clients is considering banning cell phones with cameras in the office. That absolutely will not work. Who is going to enforce that policy? They don't have guards and no guard is going to strip-search employees to find cell phones with cameras.

The second way to look at the problem involves limiting data retention. In other words, don't save as much data and therefore have less data available for legal scrutiny. That is absolutely going to fail too. The trend across all sectors is to retain more information. Section 10 of the PCI Security Standard is just one example. Since 2003 the National Association of Securities Dealers (NASD) has required financial firms to retain IM for three years. The Securities and Exchange Commission (SEC) has already fined companies millions of dollars for not retaining email for at least three years.

I would not be surprised to see best practice evolve into requiring network traffic retention systems, perhaps at the session level or maybe even at the full content level. I would also not be surprised to see requirements for intercepting outbound encrypted traffic for inspection and retention purposes. The only reason we don't see those requirements yet is regulators don't understand how any protocol can be tunneled over any protocol, as long as the endpoints understand the mechanism involved.

New Laser Printer

My old HP DeskJet 970cxi died, so I decided to finally buy a color laser printer. Owning a color laser printer has been sort of a Holy Grail for me. I owned a black-and-white laser printer in 1994, and I always thought the true day of personal desktop publishing would arrive with reasonably priced color laser printers.

I bought a Lexmark C530dn at NewEgg.com for slightly more than $500 (when shipping is included). Since I bought the DeskJet several years ago for around $300, this new $500 printer seems the right price. There are cheaper color laser printers from Lexmark and Dell, but I wanted an integrated duplex unit. (I dislike wasting paper and I prefer to carry fewer sheets when possible.) The printer got outstanding CNet scores and I found the Better Buys for Business (.pdf) praise convincing.

After lugging the box upstairs (60+ lbs), it took about 15 minutes to set up the printer, install the software on Windows XP SP2, and attach the proper cables. I'm printing through the XP box, although the printer has an integrated Ethernet port if I wished to attach it to the network. Print quality (even in draft, which I made the default) thus far is excellent. The printer supports PCL 5c Emulation, PCL 6 Emulation, Personal Printer Data Stream (PPDS), PostScript 3 Emulation, and PDF 1.5, but I haven't tried printing to it from Unix yet.

Monday, January 08, 2007

Many Intruders Remain Unpredictable

The second of the three security principles listed in my first book is:

Many intruders are unpredictable.

I think the new Adobe Acrobat Reader vulnerability demonstrates this perfectly. (I'm not calling Stefano Di Paola an intruder; anyone who uses his technique maliciously is an intruder, though.)

Who would have thought to abuse a .pdf viewer in such a manner? Read more about the problem here.

This event reminds me of soccer goal security.

And Another Thing... More NSM Thoughts

My Hawke vs the Machine post elicited many comments, all of which I appreciate. I'd like to single out one set of comments for a formal reply here. These are by "DJB," which I highly doubt is Daniel J. Bernstein since the comment ends with "See you at the next ISSA meeting." (DJB lives in Illinois and I live in Virginia.)

DJB writes:

The topic is not alert-centric vs. NSM, or even passive vs. reactive. The real issue here is Return on Investment for security and Due Care. The cost and lack of common expertise of NSM is why it has not been fully adopted. Every SOC/NOC I’ve ever been in (over 100) suffers the plight you have identified. Furthermore, I could hire a hundred people with your level of expertise or the same number of Gulas, Ranums and Roeschs to perform NSM. The only problem is that the problem would not go away and I would be out a significant amount of money, even if you have “the right forms of data available.” The volume of traffic that we are talking about would require far too many experts.

Let me address these points in turn.

  • There is no ROSI (return on security investment). There is simply cost avoidance. Due care is a concept I am more likely to embrace.

  • NSM requires almost no cost at all. All of the software I use is open source. Most of the hardware I use is old junk. (This is one beauty of Unix.) When necessary I do buy hardware, and certainly one can spend lots of money on specialized gear. However, for the average company using an OC-3 or lower, you can get an acceptable amount of NSM data without breaking the $3000 barrier for a very fast, big-hard-drive 1U box and network tap.

  • What takes more expertise: interpreting cryptic output from a series of alerts (through alert-centric processes) or inspecting all of the traffic associated with an event of interest (NSM processes)? Making sense of the alerts from any leading commercial IDS/IPS can be an exercise in astrological prognostication and intestine reading. Looking at session patterns or -- unbelievably! -- what commands the intruder tried to execute on the Web server and what responses came back takes very little skill. (Darn, I think I just called myself an idiot.) Furthermore, from whence does skill derive? Looking at alerts, or inspecting traffic? Q.E.D.

  • If every SOC/NOC (100) you've visited suffers the same problem, they need help! Contact me: I provide assessment and remediation services for exactly those sorts of broken organizations.

  • With NSM you don't need to hire a hundred Gulas, Ranums, and Roesches. First, they don't exist. Second, the data helps make the expert, not the other way around.

  • With or without NSM, security problems never go away. This is important: There is no security end game. All you do is achieve an acceptable level of perceived risk. That's my definition of security. With NSM, however, you know what is happening and can try to improve.


I think the other points have already been addressed.

One closing thought: I have never met an analyst -- a person who is actually trying to figure out what security events are occurring on a network -- who rejects NSM once exposed to its tools and techniques. In fact, when I taught at SANS CDI East last month one of my students offered one of the best comments I've ever read:

Wow, practical information for network and security engineers... why isn't anyone else teaching something this useful? (Comment from student, 15 Dec 06)

Many people who are tangentially related to network security or sell products or do other services reject my ideas all the time. (Some do, not all do.) The people in the trenches see the value, and really that's all I care about. The possible exception is convincing their bosses, so the analysts get the equipment and training they need.

Brothers in Risk

I write about risk, threat, and other security definitions fairly regularly. Lo and behold I just read a post by someone else who shares my approach. This is a must read. How did you react to the story?

A second brother in risk is Gunnar Peterson, who writes in part:

When security teams conflate threats and vulnerabilities, the result is confusion. Instead efforts dealing with threats... and vulnerabilities... should be separately optimized, besides both being part of "security"; they don't have that much in common.

Oh bravo, especially the old school link to Dan Geer which I should read again.

Security in the Real World

I received the following from a student in one of my classes. He is asking for help dealing with security issues. He is trying to perform what he calls an "IDS/IPS policy review," which is a tuning exercise. I will apply some comments inline and some final thoughts at the end.

If you recall, I was in one of your NSO classes last year. At the end of the day the only place I am able to use everything I learned is at home.

This is an example of a security person knowing what should be done but unable to execute in the real world. This is a lesson for all prevention fanboys. You know the type -- they think 100% prevention is possible. In the real world, "business realities" usually interfere.

As you are aware with other corporate environments, our company goes by Gartner rating on products and ends up buying a technology where you don't get any kind of data, but just an alert name. So that is a pain within itself.

Here we see a security analyst who has been exposed to my Network Security Monitoring ideas (alert, full content, session, and statistical data), but is now stuck in an alert-centric world. I have turned down jobs that asked me to leave my NSM sources behind and supervise alert ticketing systems. No thanks!

There is this issue that I have been running into and thought maybe you can help me, if you are free. I work for this pretty large organization with 42000 users in 15 states... in a dynamic ever-changing environment, what is the best route for policy review?

We have two different IDS technologies across the company. The IDS/IPS policy review is really for turning off the signatures that we don't need to know about and cutting down on alerts, so that alert monitoring becomes easier. Since we use ArcSight for correlation, its easier to look for our interesting traffic.


Wait a minute -- I thought ArcSight and other SIM/SEM/SIEM/voodoo was supposed to solve this problem? Why disable anything if ArcSight is supposed to deal with it?

Right now, we are dealing with just a very high volume of alerts, there is no way we are going to be able to catch anything. In other small environments, I have been able to easily determine what servers we have/havent and turn on only those that are needed. For example, if we are not running frontpage, we can turn off all frontpage alerts. In our environment, it will be difficult to determine that and often times, we have no idea what changes have taken place.

This is an example of a separation between the security team and the infrastructure team. This poor analyst is trying to defend something his group doesn't understand.

Therefore, our policy review needs to cover all the missing pieces as well. By that, I mean we have to take into consideration the lack of cooperation across the board from other teams, when we disable or enable alerts.

Here we see the effects of lack of cooperation between security and infrastructure groups. When I show people how I build my own systems to collect NSM data, some say "Why bother? Just check NetFlow from your routers, or the firewall logs, or the router logs, etc..." In many places the infrastructure team won't let the security team configure or access that data.

(1) Going by the firewall rule - Feedback from my team is that we wont know about the firewall rule changes, if any change were to occur, hence we can't do it. Another is they trust the IDS technology too much and they say "well you are recommending turning off a high level alert "There is a reason why vendor rates it high".

It seems the infrastructure team trusts the security vendor more than their own security group.

(2) Going by applications that are running on an environment - This has proven to be even more difficult, since there is no update what has been installed and not.


Again, you can't monitor or even secure what you don't understand.

(3) Third approach for external IPS - Choose those that aren't triggered in the past three months, review those and put them in block mode - In case something were to be open by the firewall team, they will identify it being blocked by something, it will be brought to our attention then we can unblock it after a discussion.

This is an interesting idea. Run the IDS in monitoring mode. Anything that doesn't trigger gets set to block mode when the IDS becomes an IPS! If anyone complains, then react.
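The selection step in that third approach can be sketched in a few lines of Python. This is only an illustration, not the student's actual process or any vendor's API: the function name `block_candidates`, the signature IDs, and the alert tuples are all hypothetical, and it assumes you can export alert history as (signature ID, timestamp) pairs.

```python
from datetime import datetime, timedelta

def block_candidates(enabled_sids, alerts, now, quiet_days=90):
    """Return enabled signature IDs that produced no alerts in the last quiet_days."""
    cutoff = now - timedelta(days=quiet_days)
    # Signatures that fired at least once on or after the cutoff stay in monitor mode.
    recently_fired = {sid for sid, ts in alerts if ts >= cutoff}
    return sorted(set(enabled_sids) - recently_fired)

# Hypothetical data for illustration only.
now = datetime(2007, 1, 8)
enabled = [1001, 1002, 1003, 1004]
alerts = [
    (1001, datetime(2007, 1, 2)),    # fired last week
    (1002, datetime(2006, 9, 1)),    # last fired more than 90 days ago
    (1003, datetime(2006, 12, 15)),  # fired last month
]
candidates = block_candidates(enabled, alerts, now)
print(candidates)
```

The output is a review list, not a final decision. A signature that never fires may be quiet because the firewall blocks the traffic, which is exactly the case where a later firewall change would surface as a complaint.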

None of this has been approved thus far. as you know I used to work for [a mutual friend] and we had small customers like small banks and so forth. With them the policy review was much easier.

Smaller sites are less complex, and therefore more understandable.

Lets pick one example. I recommended at one point that we can disable all mysql activity on our external IDSs, since they are blocked by the firewall anyway, so that we dont have to see thousands and thousands of scans on our network on port 1434 all the time. Even that didn't get approved. The feedback for that was IDS blocking the alerts can take some load off the firewall.


So, this is a complicated topic. I appreciate my former student permitting me to post this anonymously. Here's what I recommend.

  1. Perform your own asset inventory. You first have to figure out what you're defending. There are different ways to do this. You could analyze a week's worth of session data to see what's active. You could look for the results of intruder scans or other recon to find live hosts and services. You could conduct your own scan. You could run something like PADS for a week. In the end, create a database or spreadsheet showing all your assets, their services, applications, and OS if possible.

  2. Determine asset security policy and baselines. What should the assets you've inventoried be doing? Are they servers only accepting requests from clients? Do the assets act as clients and servers? Only clients? What is the appropriate usage for these systems based on policy?

  3. Create policy-based detection mechanisms. Once you know what you're protecting and how they behave, devise ways to detect deviations from these norms. Maybe the best ways involve some of your existing IDS/IPS mechanisms -- maybe not.

  4. Tune stock IDS/IPS alerts. I described how I tune Snort for Sys Admin magazine. I often deploy a nearly full rule set for a short time (one or two days) and then pare down the obviously unhelpful alerts. Different strategies apply. You can completely disable alerts. You can threshold alerts to reduce their frequency. You can (with systems like Sguil) let alerts fire but send them only to the database. I use all three methods.

  5. Exercise your options. Once you have a system you think is appropriate, get a third party to test your setup. Let them deploy a dummy target and see if you detect them abusing it. Try client-side and server-side exploitation scenarios. Did you prevent, detect, and/or respond to the attacks? Do you have the data you need to make decisions? Tweak your approach and consider augmenting the data you collect with third party tools if necessary.
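To make the first three recommendations concrete, here is a minimal Python sketch of building an inventory from session data and comparing it against policy. It assumes session records have already been exported as dictionaries with `dst_ip`, `proto`, and `dst_port` keys, and that each record represents a session the destination actually answered. The field names, function names, addresses, and policy table are all hypothetical, not the output format of any particular tool.

```python
from collections import defaultdict

def build_inventory(sessions):
    """Map each destination IP to the set of (proto, port) services seen in sessions."""
    inventory = defaultdict(set)
    for s in sessions:
        inventory[s["dst_ip"]].add((s["proto"], s["dst_port"]))
    return dict(inventory)

def find_deviations(inventory, policy):
    """Return observed services that the policy does not allow, keyed by IP."""
    deviations = {}
    for ip, services in inventory.items():
        # An IP missing from the policy has no allowed services at all.
        unexpected = services - policy.get(ip, set())
        if unexpected:
            deviations[ip] = sorted(unexpected)
    return deviations

# Hypothetical week of session data and a hypothetical policy.
sessions = [
    {"dst_ip": "10.1.1.10", "proto": "tcp", "dst_port": 80},
    {"dst_ip": "10.1.1.10", "proto": "tcp", "dst_port": 443},
    {"dst_ip": "10.1.1.20", "proto": "tcp", "dst_port": 25},
    {"dst_ip": "10.1.1.20", "proto": "tcp", "dst_port": 6667},
]
policy = {
    "10.1.1.10": {("tcp", 80), ("tcp", 443)},  # Web server
    "10.1.1.20": {("tcp", 25)},                # mail server
}

inventory = build_inventory(sessions)
deviations = find_deviations(inventory, policy)
print(deviations)
```

The inventory doubles as the spreadsheet from step 1, and the deviation list is a starting point for step 3. Anything it reports still needs an analyst to decide whether the policy or the host is wrong.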


I hope this is helpful. Do you have any suggestions?