Port Independent Protocol Identification
One of the Holy Grails of network security monitoring is Port Independent Protocol Identification (PIPI -- lousy acronym, but technically useful). PIPI allows inspection of protocols regardless of the port in use. PIPI has many security implications for discovery and (preferably) denial of covert channels, back doors, and other policy-violating channels. PIPI can also help network engineers better understand the legitimate use of protocols on their networks.
Some implementations exist. Last year after visiting Fidelis Security I mentioned their appliance uses port-neutral methods to identify protocols. Sourcefire's RNA also does PIPI. The Linux-only Application Layer Packet Classifier for Linux (L7-filter) and IPP2P projects use signatures to discover protocols on arbitrary ports. I'd like to hear of other approaches.
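To make the signature idea concrete, here is a minimal sketch of matching payload bytes against per-protocol patterns without ever consulting the port. The patterns are simplified illustrations of my own, not the actual signatures shipped with L7-filter or IPP2P, and the function name `identify_protocol` is hypothetical.

```python
import re

# Illustrative, simplified patterns (assumptions for this sketch only;
# they are NOT the signatures used by L7-filter or IPP2P).
SIGNATURES = [
    ("http", re.compile(rb"^(GET|POST|HEAD|PUT|DELETE|OPTIONS) \S+ HTTP/1\.[01]\r\n")),
    ("ssh", re.compile(rb"^SSH-\d\.\d+-")),
    ("smtp", re.compile(rb"^220[ -].*SMTP", re.IGNORECASE)),
    ("irc", re.compile(rb"^(NICK|USER|PASS) \S+", re.IGNORECASE)),
]


def identify_protocol(payload: bytes) -> str:
    """Return the first protocol whose pattern matches the start of the payload.

    The port number is deliberately not an input: the decision rests on
    the bytes alone.
    """
    for name, pattern in SIGNATURES:
        if pattern.search(payload):
            return name
    return "unknown"


if __name__ == "__main__":
    # An SSH banner is recognised whether it arrived on port 22 or port 80.
    print(identify_protocol(b"SSH-2.0-OpenSSH_4.3\r\n"))
```

A small change to the payload that breaks a pattern makes the match fail, so real signature sets tend to be far more careful than these.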
Today, thanks to geek00l, I read the paper Dynamic Application-Layer Protocol Analysis for Network Intrusion Detection by an all-star team from Technische Universität München and Berkeley's ICSI Center for Internet Research. From the abstract:
In this paper, we discuss the design and implementation of a NIDS extension to perform dynamic application-layer protocol analysis. For each connection, the system first identifies potential protocols in use and then activates appropriate analyzers to verify the decision and extract higher-level semantics. We demonstrate the power of our enhancement with three examples: reliable detection of applications not using their standard ports, payload inspection of FTP data transfers, and detection of IRC-based botnet clients and servers.
Even better, their implementation is scheduled for integration in the next release of Bro, perhaps next month.
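The two-step idea in the abstract (first identify potential protocols, then activate an analyzer to verify the decision) can be sketched roughly as follows. This is not the paper's or Bro's implementation; every function name here is hypothetical, and the checks are deliberately crude.

```python
from typing import Callable, Dict, List, Optional


def candidate_protocols(payload: bytes) -> List[str]:
    """Step 1: cheap prefix checks that only nominate candidates."""
    candidates = []
    if payload[:4] in (b"GET ", b"POST", b"HEAD"):
        candidates.append("http")
    if payload[:5] in (b"NICK ", b"USER ", b"PASS "):
        candidates.append("irc")
    return candidates


def verify_http(payload: bytes) -> bool:
    """Step 2 for HTTP: require a well-formed request line and header block."""
    head, sep, _ = payload.partition(b"\r\n\r\n")
    if not sep:
        return False
    lines = head.split(b"\r\n")
    parts = lines[0].split(b" ")
    if len(parts) != 3 or not parts[2].startswith(b"HTTP/1."):
        return False
    return all(b":" in line for line in lines[1:])


def verify_irc(payload: bytes) -> bool:
    """Step 2 for IRC: require both NICK and USER registration commands."""
    lines = [line for line in payload.split(b"\r\n") if line]
    commands = {line.split(b" ", 1)[0].upper() for line in lines}
    return b"NICK" in commands and b"USER" in commands


# Hypothetical analyzer registry: protocol name -> verification function.
ANALYZERS: Dict[str, Callable[[bytes], bool]] = {
    "http": verify_http,
    "irc": verify_irc,
}


def classify(payload: bytes) -> Optional[str]:
    """Return a protocol only if a nominated analyzer confirms it."""
    for proto in candidate_protocols(payload):
        if ANALYZERS[proto](payload):
            return proto
    return None
```

The point of the second step is that a payload which merely starts with "GET " is rejected unless it also parses as a request, which cuts down on false positives from the cheap first pass.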
On a related PIPI note, in the future I expect we will not create firewall policies using port numbers as a major component. A security policy enforcement system might instead allow an administrator to implement a policy like "deny all outbound HTTP except [real] HTTP on port 80 and HTTPS on port 443." In other words, network (i.e., traffic-centric) security policy will be decoupled from ports and instead focus on applications and data.
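As a rough illustration of that kind of policy, the sketch below decides on the identified protocol first and uses the port only as a secondary constraint. The `Flow` type, the `decide` function, and the protocol labels are all hypothetical; I am assuming some identification engine has already labelled each flow, and I am reading the quoted policy as "HTTP is allowed only on port 80, HTTPS only on port 443, and anything else on those ports is denied."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    dst_port: int
    protocol: str  # label from a hypothetical protocol identification engine


# Protocol/port pairs the example policy permits (assumed interpretation).
ALLOWED = {("http", 80), ("https", 443)}
WEB_PORTS = {80, 443}


def decide(flow: Flow) -> str:
    """Apply the example policy to one outbound flow."""
    if flow.protocol in ("http", "https"):
        # Web traffic is allowed only on its expected port.
        return "allow" if (flow.protocol, flow.dst_port) in ALLOWED else "deny"
    if flow.dst_port in WEB_PORTS:
        # Something that is not web traffic is using a web port.
        return "deny"
    # Everything else is left to whatever other rules exist.
    return "other-policy"


if __name__ == "__main__":
    print(decide(Flow(dst_port=8080, protocol="http")))
    print(decide(Flow(dst_port=80, protocol="ssh")))
```

A port-only rule would pass the second flow; here it is denied because the traffic on port 80 was not identified as HTTP.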
Comments
I don't know if you noticed but there was some code added to CURRENT that can tag traffic based on protocol inspection:
Log:
A netgraph node that can do different manipulations with
mbuf_tags(9) on packets.
Submitted by: Vadim Goncharov
Check ng_tag(4)
Before the commit the discussion started on current@
Oddly enough, Microsoft ISA Server supports L7 inspection.
Thanks for your comment! I found the thread which includes this helpful example.
http://taosecurity.blogspot.com/2004/08/passive-asset-detection-system.html
If not fully capable, the remaining functionality is in the works. Regex is flexible enough to find almost anything you may be looking for.
By the way I appreciate your link to an earlier post!
I admit I need to read up on this some more, but this almost sounds like you need signatures or samples to determine protocols. Does that mean the slight changes used to evade AV or IDS signatures will become normal even in application protocols?
But like you said, changes that would make the regex miss would make the marking/detection fail.
Actually this was the argument I got from the pf mailing list: this is fallible.
Well, yes, it is, but I'd rather have something than nothing.
It's fallible but the technology is used in IDS, firewalls, traffic shapers. Am I the one who's wrong?
I wish I could go into more detail but it's proprietary technology at this point. Maybe we ought to open source it... :)
is Qradar from Q1Labs : http://www.q1labs.com/content.php?id=175
This approach can detect and block lots of tunnels, but we still have trouble detecting and blocking, for example, tunnels like OzymanDNS (DNS tunnel - Dan Kaminsky) and httptunnel, where binary data from the protocol being tunneled is encoded with something like base64 or base32. For those cases I can only think of something like flow behaviour analysis for detection.
I'll post references to some academic papers on classifying traffic based on behavioural attributes when I have time, if there's any interest. Annie DeMontigny-LeBoeuf came up with a nice set of attributes, Mike Collins has a paper coming up at ESORICS, Zuev and Moore did some work in the area, Borders and Prakash did 'WebTap' in 2005 (I think), and the folks at Swinburne in Australia have an interesting set of papers on the subject. That's just off the top of my head, though; there's more that I'm forgetting. Again, those are mainly focused on classifying traffic based on attributes derived from packet lengths, interpacket delays, data volumes, directional dynamics, and the like.
Very interesting stuff, but I think it's still a ways off from being ready to go live. :-) Protocol parsing is much more feasible, but like I said, I think there's a lot of ground that it won't cover. That's not to say, of course, that it's not worth doing, just like I wouldn't suggest that it's not worth having a firewall because of its limitations.
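For readers wondering what "attributes derived from packet lengths, interpacket delays, data volumes, directional dynamics" might look like in practice, here is a minimal sketch that computes a handful of such features for one flow. The feature names and the `(timestamp, direction, length)` packet representation are assumptions made for illustration; they are not taken from any of the papers mentioned in this thread.

```python
from statistics import mean
from typing import Dict, List, Tuple

# One packet: (timestamp in seconds, direction "out" or "in", payload length in bytes).
Packet = Tuple[float, str, int]


def flow_attributes(packets: List[Packet]) -> Dict[str, float]:
    """Summarise a flow using only behavioural attributes, never payload content."""
    if not packets:
        raise ValueError("flow has no packets")
    times = [t for t, _, _ in packets]
    # Interpacket delays between consecutive packets.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    out_bytes = sum(n for _, d, n in packets if d == "out")
    in_bytes = sum(n for _, d, n in packets if d == "in")
    total = out_bytes + in_bytes
    return {
        "packet_count": float(len(packets)),
        "mean_length": float(mean(n for _, _, n in packets)),
        "mean_gap": mean(gaps) if gaps else 0.0,
        "total_bytes": float(total),
        # Share of bytes sent by the initiator: a crude directional measure.
        "outbound_ratio": out_bytes / total if total else 0.0,
    }


if __name__ == "__main__":
    sample = [(0.0, "out", 100), (0.5, "in", 300), (1.0, "out", 200)]
    print(flow_attributes(sample))
```

Vectors like this would then be fed to a classifier. Because nothing here looks at payload bytes, encoding the tunneled data with base64 or base32 does not change the features, which is why this line of work is interesting for tunnels.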
Is your work published on the Web?
No, not yet; I'm still working on it. I'm aiming to finish up within a couple of months, at which point I'll make it available.
Here are some of the works I mentioned earlier, though:
Kevin Borders and Atul Prakash. WebTap: Detecting covert web traffic. In Proceedings of the 11th ACM Conference on Computer and Communications Security (CCS ’04), October 2004.
http://www.eecs.umich.edu/~aprakash/papers/borders-prakash-ccs04.pdf
T.T.T. Nguyen, G. Armitage, "Training on multiple sub-flows to optimise the use of Machine Learning classifiers in real-world IP networks," in (to be presented) IEEE 31st Conference on Local Computer Networks, Tampa, Florida, USA, November 2006.
http://caia.swin.edu.au/pubs/lcn2006-nguyen_armitage_marked.pdf
A. DeMontigny-LeBoeuf. Flow attributes for use in traffic characterization. Technical report CRC-TN-2005-003, December 2005.
http://www.crc.ca/files/crc/home/research/network/system_apps/network_systems/network_security/publications/ADeMontigny_CRCTN2005003.pdf
Those should give you a flavor for what sort of work is going on, though it's by no means complete. IMHO, the field is still fairly immature, and should improve to the point of at least being a good complement to existing approaches.
The SPID algorithm can reliably detect/identify/classify the protocol based on just the first 5 TCP packets with payload. I do however need more training data in order to use the full potential of the SPID algorithm.
There is also a proof-of-concept application for the SPID algorithm available at SourceForge.