Port Independent Protocol Identification

One of the Holy Grails of network security monitoring is Port Independent Protocol Identification (PIPI -- lousy acronym, but technically useful). PIPI allows inspection of protocols regardless of the port in use. PIPI has many security implications for discovery and (preferably) denial of covert channels, back doors, and other policy-violating channels. PIPI can also help network engineers better understand the legitimate use of protocols on their networks.

Some implementations exist. Last year after visiting Fidelis Security I mentioned their appliance uses port-neutral methods to identify protocols. Sourcefire's RNA also does PIPI. The Linux-only Application Layer Packet Classifier for Linux (L7-filter) and IPP2P projects use signatures to discover protocols on arbitrary ports. I'd like to hear of other approaches.
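
To make the signature idea concrete, here is a minimal sketch of payload-based identification that never looks at the port. The three patterns and the function name identify_protocol are my own simplified assumptions for illustration; they are not taken from L7-filter or IPP2P.

```python
import re

# Hypothetical, deliberately simplified payload signatures. Each one is
# matched against the first bytes of a flow; the port is never consulted.
SIGNATURES = {
    "http": re.compile(rb"^(GET|POST|HEAD|PUT|DELETE|OPTIONS) \S+ HTTP/1\.[01]\r\n"),
    "ssh": re.compile(rb"^SSH-\d\.\d+-"),
    "smtp": re.compile(rb"^220[ -].*SMTP", re.IGNORECASE),
}


def identify_protocol(payload: bytes) -> str:
    """Return the first protocol whose signature matches, else 'unknown'."""
    for name, pattern in SIGNATURES.items():
        if pattern.search(payload):
            return name
    return "unknown"


# An SSH banner is labelled as SSH no matter which port carried it.
print(identify_protocol(b"SSH-2.0-OpenSSH_4.3\r\n"))
```

A matcher like this is cheap, but it only checks the opening bytes, so anything that imitates a banner will fool it.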

Today, thanks to geek00l, I read the paper "Dynamic Application-Layer Protocol Analysis for Network Intrusion Detection" by an all-star team from Technische Universität München and Berkeley's ICSI Center for Internet Research. From the abstract:

In this paper, we discuss the design and implementation of a NIDS extension to perform dynamic application-layer protocol analysis. For each connection, the system first identifies potential protocols in use and then activates appropriate analyzers to verify the decision and extract higher-level semantics. We demonstrate the power of our enhancement with three examples: reliable detection of applications not using their standard ports, payload inspection of FTP data transfers, and detection of IRC-based botnet clients and servers.

Even better, their implementation is scheduled for integration in the next release of Bro, perhaps next month.

On a related PIPI note, in the future I expect we will not create firewall policies using port numbers as a major component. A security policy enforcement system might instead allow an administrator to implement a policy like "deny all outbound HTTP except [real] HTTP on port 80 and HTTPS on port 443." In other words, network (i.e., traffic-centric) security policy will be decoupled from ports and instead focus on applications and data.
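
A rough sketch of what such a policy could look like once the protocol is a first-class field. The Flow type, the ALLOWED set, and the decide function are hypothetical names I made up for illustration, and the protocol label is assumed to come from some PIPI engine rather than from the port number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    dst_port: int
    protocol: str  # label assigned by a (hypothetical) PIPI engine


# Outbound policy: only verified HTTP on 80 and verified HTTPS on 443.
ALLOWED = {("http", 80), ("https", 443)}


def decide(flow: Flow) -> str:
    """Allow a flow only if its identified protocol matches its port."""
    return "allow" if (flow.protocol, flow.dst_port) in ALLOWED else "deny"


print(decide(Flow(dst_port=80, protocol="http")))  # real HTTP on port 80
print(decide(Flow(dst_port=80, protocol="ssh")))   # SSH hiding on port 80
```

The point of the sketch is that the rule keys on what the traffic is, with the port reduced to one attribute among several.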


Joao Barros said…
I once mentioned this subject on the pf mailing list. It's almost taboo...
I don't know if you noticed but there was some code added to CURRENT that can tag traffic based on protocol inspection:

A netgraph node that can do different manipulations with
mbuf_tags(9) on packets.

Submitted by: Vadim Goncharov

Check ng_tag(4)
Before the commit the discussion started on current@

Oddly enough, Microsoft ISA Server supports L7 inspection.

Thanks for your comment! I found the thread which includes this helpful example.
Anonymous said…


If not fully capable, the remaining functionality is in the works. Regex is pretty flexible for finding anything you may be looking for.
PADS-man (and others), please read the paper I mentioned -- it discusses limitations of these sorts of systems. PADS is still awesome though.

By the way I appreciate your link to an earlier post!
Unknown said…
I can foresee something like this happening, especially as egress of data out of networks keeps trying to occur over known ports using obfuscated (encrypted or proprietary) protocols.

I admit I need to read up on this some more, but this almost sounds like you need signatures or samples to determine protocols. Does that mean the slight changes used to evade AV or IDS signatures will become normal even in protocols for application communications?
Joao Barros said…
Yes, you will need signatures to determine protocols, much as Snort works, for example, with regex to find known patterns in the traffic.
But like you said, changes that make the regex miss would make the marking/detection fail.
Actually this was the argument I got from the pf mailing list: this is fallible.
Well yes it is, but I'd rather have something rather than nothing.

It's fallible but the technology is used in IDS, firewalls, traffic shapers. Am I the one who's wrong?
The implementation in the paper does not rely on signatures alone. Bro-PIA uses signatures to signal additional inspection by an application-aware protocol inspection module. The decision is not strictly made on a signature match.
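
Here is a minimal sketch of that two-stage idea: a loose signature only nominates a candidate protocol, and a protocol-aware check has to confirm it before the flow is labelled. This is not code from Bro or the paper. The names parse_http_request, CANDIDATES, and classify are hypothetical, and the HTTP check is far simpler than a real analyzer.

```python
import re
from typing import Optional


def parse_http_request(payload: bytes) -> bool:
    """Structural check: a three-part request line plus 'name: value' headers."""
    head, sep, _ = payload.partition(b"\r\n\r\n")
    if not sep:
        return False  # header block is not terminated
    lines = head.split(b"\r\n")
    parts = lines[0].split(b" ")
    if len(parts) != 3 or not parts[2].startswith(b"HTTP/"):
        return False
    return all(b":" in line for line in lines[1:])


# (protocol name, cheap hint signature, analyzer that verifies the hint)
CANDIDATES = [
    ("http", re.compile(rb"^[A-Z]+ "), parse_http_request),
]


def classify(payload: bytes) -> Optional[str]:
    """Label a payload only when the hint matches and the analyzer agrees."""
    for name, hint, analyzer in CANDIDATES:
        if hint.search(payload) and analyzer(payload):
            return name
    return None
```

With this split, a payload such as "GET me out of here" trips the hint but fails the structural check, so it is not labelled as HTTP.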
Martin Roesch said…
"Signature matching" (regex) is not a great way of doing it IMO. In RNA we do full validation of the protocols we detect via programmatic methods: each protocol is validated by a stateful analyzer that knows the structure of the protocol and validates the traffic at hand. When we get a non-match, we have a multi-method system that uses three separate approaches to try to determine the protocol. As a result RNA has a very low false positive rate; once RNA identifies a protocol, it's very rare for it to be wrong. The same can't be said of regex-based methods because there's typically no structural validation of the protocol, just a set of "keyphrase matches" that are subject to all the standard string matching problems. We also use a confidence model in RNA to give you an idea of the statistical certainty of the data, which is useful as a thresholding mechanism for automated systems to take advantage of.

I wish I could go into more detail but it's proprietary technology at this point. Maybe we ought to open source it... :)
Anonymous said…
Another product that performs application layer traffic classification
is Qradar from Q1Labs: http://www.q1labs.com/content.php?id=175
Augusto Barros said…
Check Point already does something like that (working with protocols instead of port numbers). I had trouble getting an application that uses simple HTTP over port 443 to pass through it because the firewall was reporting that the wrong protocol was being used on that port.

This approach can detect and block lots of tunnels, but it is still hard to detect and block, for example, tunnels like OzymanDNS (DNS tunnel - Dan Kaminsky) and httptunnel, where binary data from the protocol being tunneled is encoded with stuff like base64 or base32. For those cases I can only think of something like flow behaviour analysis for detection.
Anonymous said…
This is something near and dear to my heart, since I'm doing my master's on it. :-) I'm with Augusto, I think "Deep Packet Inspection" i.e. protocol parsing is going to hit a brick wall. It's quite useful where it can be used, but it's no good against encrypted data (https anyone?), and it's probably no good against my pet problem, hidden channels in http; http is simply too flexible in what it allows you to send to be able to parse everything. Base64 itself is no problem - if the parser can recognize it, it can decode and process it. However, if a hacker is doing a shell over HTTP and tunneling all their traffic in data blocks labelled as images or javascript literals or cookies or whatever, you're hosed.

I'll post references to some academic papers on classifying traffic on behavioural attributes when I have time, if there's any interest. Annie DeMontigny-LeBoeuf came up with a nice set of attributes, Mike Collins has a paper coming up at ESORICS, Zuev and Moore did some work in the area, Borders and Prakash did 'WebTap' in 2005 (I think), and the folks at Swinburne in Australia have an interesting set of papers on the subject. That's just off the top of my head, though; there's more that I'm forgetting. Again, those are mainly focused on classifying traffic based on attributes derived mainly from packet lengths, interpacket delays, data volumes, directional dynamics, and the like.

Very interesting stuff, but I think it's still a ways off from being ready to go live. :-) Protocol parsing is much more feasible, but like I said, I think there's a lot of ground that it won't cover. That's not to say, of course, that it's not worth doing, just like I wouldn't suggest that it's not worth having a firewall because of its limitations.
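
As a rough illustration of the kind of attributes mentioned above (packet lengths, interpacket delays, data volumes, directional dynamics), here is a small sketch that derives a few of them from one flow. The packet tuple layout, the attribute names, and the function flow_attributes are assumptions made for this example; they are not taken from any of the papers.

```python
from statistics import mean
from typing import Dict, List, Tuple

# One packet: (timestamp in seconds, payload length in bytes, direction),
# where direction is +1 for client-to-server and -1 for server-to-client.
Packet = Tuple[float, int, int]


def flow_attributes(packets: List[Packet]) -> Dict[str, float]:
    """Compute a few payload-agnostic attributes for a single flow."""
    if not packets:
        raise ValueError("flow has no packets")
    times = [t for t, _, _ in packets]
    sizes = [n for _, n, _ in packets]
    gaps = [b - a for a, b in zip(times, times[1:])]
    up = sum(n for _, n, d in packets if d > 0)
    down = sum(n for _, n, d in packets if d < 0)
    # Count how often the sender changes between consecutive packets.
    changes = sum(1 for a, b in zip(packets, packets[1:]) if a[2] != b[2])
    return {
        "mean_packet_length": mean(sizes),
        "mean_interpacket_delay": mean(gaps) if gaps else 0.0,
        "total_bytes": up + down,
        "upstream_fraction": up / (up + down) if up + down else 0.0,
        "direction_changes": changes,
    }
```

None of these values depend on reading the payload, which is why this style of analysis still applies to encrypted or encoded tunnels. Turning them into a classifier is a separate step that the sketch does not attempt.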

Is your work published on the Web?
Anonymous said…
Hi Richard,

No not yet, I'm still working on it. I'm aiming to be finishing up within a couple of months, at which point I'll make it available.

Here's some of the works I mentioned earlier, though:

Kevin Borders and Atul Prakash. WebTap: Detecting covert web traffic. In Proceedings of the 11th ACM Conference on Computer and Communications Security (CCS ’04), October 2004.

T.T.T. Nguyen, G. Armitage, "Training on multiple sub-flows to optimise the use of Machine Learning classifiers in real-world IP networks," in (to be presented) IEEE 31st Conference on Local Computer Networks, Tampa, Florida, USA, November 2006.

A. DeMontigny-LeBoeuf. Flow attributes for use in traffic characterization. Technical report CRC-TN-2005-003, December 2005.

Those should give you a flavor for what sort of work is going on, though it's by no means complete. IMHO, the field is still fairly immature, and should improve to the point of at least being a good complement to existing approaches.
Erik said…
I recently got a report published, in which I describe a new efficient algorithm for protocol identification. The report is called "The SPID Algorithm - Statistical Protocol IDentification" and can be downloaded from: www.iis.se/docs/The_SPID_Algorithm_-_Statistical_Protocol_IDentification.pdf.

The SPID algorithm can reliably detect/identify/classify the protocol based on just the first 5 TCP packets with payload. I do however need more training data in order to use the full potential of the SPID algorithm.

There is also a proof-of-concept application for the SPID algorithm available at SourceForge.
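
For readers who want a feel for statistical identification in general, here is a toy sketch that compares the byte-frequency distribution of the first few payload packets against per-protocol models. It is not the SPID implementation: the single byte-frequency attribute, the add-one smoothing, the use of Kullback-Leibler divergence as the distance, and every name in the code are assumptions chosen for illustration, and the training data is synthetic.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def byte_counts(payloads: Iterable[bytes]) -> Counter:
    """Count how often each byte value occurs across the payloads."""
    counts: Counter = Counter()
    for payload in payloads:
        counts.update(payload)  # iterating bytes yields ints 0..255
    return counts


def train_model(payloads: Iterable[bytes]) -> List[float]:
    """Byte distribution with add-one smoothing so no probability is zero."""
    counts = byte_counts(payloads)
    denom = sum(counts.values()) + 256
    return [(counts.get(b, 0) + 1) / denom for b in range(256)]


def kl_divergence(counts: Counter, model: List[float]) -> float:
    """KL(observed || model); byte values never observed contribute nothing."""
    total = sum(counts.values())
    return sum(
        (c / total) * math.log((c / total) / model[b]) for b, c in counts.items()
    )


def identify(payloads: List[bytes], models: Dict[str, List[float]],
             max_packets: int = 5) -> str:
    """Pick the model closest to the first max_packets payloads."""
    observed = byte_counts(payloads[:max_packets])
    if not observed:
        raise ValueError("no payload bytes to classify")
    return min(models, key=lambda name: kl_divergence(observed, models[name]))


# Synthetic training data standing in for real labelled traffic.
models = {
    "text": train_model([b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"] * 20),
    "binary": train_model([bytes(range(128, 256))] * 8),
}
print(identify([b"GET /about.html HTTP/1.1\r\n"], models))
```

With models this crude the result says little about real traffic; it only shows the mechanics of scoring an observation against trained fingerprints.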
