Sunday, February 06, 2005
I started the day in a briefing by Joe Stewart and Mike Wisener of LURHQ. I attended this talk primarily because Joe has been my point of contact at LURHQ for contributing several malware analysis case studies to my next book, Extrusion Detection. LURHQ analysts do some of the best technical research publicly available, and they are going to share some original write-ups in the new book.
The title of the talk was "Binary Difference Analysis via Phase Cancellation." This didn't mean much to me initially, but I am definitely glad I attended. Joe and Mike explained a way to analyze compiled binaries. In other words, how does the code in compiled malware A resemble variant B? Alternatively, how does a patch to binary C change it into binary D?
Joe cited work by Halvar Flake of SABRE Security on function signature plug-ins for IDA Pro and Todd Sabin of Bindview on instruction comparison. Joe and Mike devised a method which takes advantage of the fact that adding two waveforms together, when they are 180 degrees out of sync, results in cancellation. The top "wave equation" in the figure at right shows this property. Adding two waveforms together when they are exactly the same causes a new wave with twice the amplitude.
Joe and Mike use math to convert assembly instructions obtained from a compiled binary into a wave-like representation. They then take the binary they want to compare with the first and convert it into a wave-like form as well. They apply an algorithm that assigns higher relevancy values to so-called "worker" assembly commands (like CALL) and lower relevancy values to so-called "helper" assembly commands (like PUSH). Using their algorithm they compare the waveforms of the assembly language of the two binaries and display differences.
To implement this in code, the LURHQ analysts wrote an architecture for the free debugger Ollydbg to extend it to understand Perl plug-ins. Their OllyPerl API was inspired by IDA Python. They use OllyPerl to run their proof of concept tool WaveDiff. Joe and Mike showed how their tool found the deltas in a Microsoft binary before and after a patch was applied. The patch appeared to address the vulnerability stated by the vendor, but other differences were also apparent. This method shows a way to determine the extent of changes introduced by patches.
The LURHQ duo stated their tool is frustrated by obfuscation techniques. It depends on identical subroutine layouts and JMPing to different offsets is also a problem. They theorized that first using techniques developed by Sabin and Flake, like graph isomorphism, to isolate subroutines, and then detecting changes in those subroutines using phase cancellation, would make for a more robust solution.
After the LURHQ talk I went to the room where Marty Roesch was scheduled to talk about Snort developments. I ended up watching the end of a discussion of Renderman's WarPack. I can almost let this one speak for itself. The device is a mobile wireless tour-de-force. Look to see it in action at the next DefCon. I recommend keeping a safe distance. Renderman warned his audience that his biology has probably been adversely affected by having antennas pointed in certain directions.
Marty's Snort talk was a great look at future Snort developments involving target-based IDS (aka "T-IDS"). The problem is that traditional IDS lack context about the networks they monitor. In other words, how does the IDS know how an end host will handle traffic variations? For example, there are eight possible ways to reassemble IP fragments, with five commonly seen in real TCP/IP stacks. There are two ways to reassemble TCP streams, confounded by questions of handling resets in and out of the TCP window. TCP also features options involving maximum segment size, timestamps, inactivity timeouts, and PAWS (Protect Against Wrapped Sequences), to name a few. If the IDS reassembles IP or TCP fragments one way, and the end host uses another, an opportunity for evasion a la Ptacek and Newsham arises.
There are three ways to provide context to the IDS: active scanning, agents on monitored hosts, or passive identification. Active scanning suffers from a time lag problem, meaning scans 24 hours apart will miss activity by hosts active outside the scanning interval. Host agents are problematic because not all devices can accept them, like printers, embedded devices, and the like. (This is also a problem for network addmission control schemes.) Passive identification has the best chance of offering real-time information for all hosts which appear online.
Implementing target-based IDS in Snort requires a large code overhaul. Marty is replacing the frag2 IP defragmenter with frag3 and the stream4 TCP reassembly module with stream5. Both are target-based and multiple instances of each can be run simultaneously. For example, one could run one instance of stream5 to handle TCP streams for Windows stacks and a second for Linux stacks.
These are the first steps in the drive to implement target-based traffic processing. The next will be application-based processing. The former has the IDS model TCP/IP stacks for various target hosts. The latter has the IDS model versions and features for various target applications, like IIS 4.0, 5.0, or 6.0, perhaps at varying patch levels.
In addition, Sourcefire is working on a new traffic data acquisition system, or 'daq_module'. This is an abstraction layer to make it easier for Snort to collect traffic other than Ethernet. For example, one could use the new module to grab traffic from a divert socket, or a custom NIC driver, or even the pflog output from the Pf firewall. The new Snort decoder will support various output formats, including "Snort classic" (the current version), plus Tcpdump and Tethereal.
Marty briefly previewed the new Snort rules syntax, called Snort Language 2.0. The new language is needed because the existing rule syntax is far overextended. Marty never anticipated that Snort rules would include stateful analysis, PCRE matching, and protocol decoding! The new rules will have an info - test - action structure. Beyond a new structure, the new rules will be less port-centric and instead be based on "discoverable attributes." When a packet arrives for inspection, the new Snort will check a "dispatcher," which will look for target attributes (like OS, applications, and so on) in an attribute table. Snort will apply "micropolicies" (for, say Linux, or OpenSSH, or Win32, or IIS) to decide whether to alert on the packet or stream in question. Snort admins will still be able to write their own rules, along with the new micropolicies and attribute table. Sourcefire hopes users will buy a commercial RNA system to create the attribute table automatically.
I asked Marty if or how his competitors were innovating. He said NFR continues to integrate OS fingerprinting into its IDS. He said Gartner reported IDS was "dead" because it took too much time to properly configure an IDS and interpret its results. Target-based IDS (especially combined with RNA) will make detecting suspicious and malicious activity much easier and more accurate. Marty cited my book with respect to my idea of "defensible networks" and mentioned Sourcefire is working on a Secure Sockets Layer (SSL) decryption device.
After Marty's talk I attended Johnny Long's Google hacking presentation. Johnny's Google-fu is impressive, and he is definitely a PowerPoint Ranger. In short, "all your information are belong to Google." I learned that you can remove image requests from views of Google's caches by appending "&strip=1" to the end of cache URLs. In other words, http://126.96.36.199/search?q=cache:5OTSvO8549oJ:www.taosecurity.com/+&hl=en displays www.taosecurity.com and pulls non-cached images from www.taosecurity.com, while http://188.8.131.52/search?q=cache:5OTSvO8549oJ:www.taosecurity.com/+&hl=en&strip=1 won't touch www.taosecurity.com and won't display images.
Johnny showed how to get around Google's attempts to surpress queries for the PHPNuke admin.php vulnerability. If you query for "inurl:admin.php" you see an error screen. If you change the case of, say the first P, you get the results you want. Regarding what you can find with Google, the answer is everything -- Social Security numbers, credit cards, driver's licenses, etc. It's often not your fault, but the fault of ignorant or naive local governments, police departments, and others who collect and post such information without thinking of the consequences.
Johnny also showed how to use Sensepost.com's BiLE tool to perform link and relationship analysis of Web sites. You really need to read Johnny's new book, pictured earlier, to get the full idea.
Shmoocon concluded with the publication of an advisory on International Domain Name spoofing. Shmoo released this advisory to spur browser developers to fix the problem. This is a phisher's dream, so I hope Web browsers fix their IDN support soon.
On a related note, you mind enjoy reading Security Best Practice: Host Naming & URL Conventions by Gunter Ollmann. He offers advice on naming hosts in order to make life more difficult for phishers.
Overall I very much enjoyed Shmoocon. I intend to attend next year and perhaps submit a proposal to talk about packet radio or another offbeat topic. I felt the theme of this con, in some respects, was "cross training." Beyond traditional host- and network-oriented presentations, I saw briefings on RF, IR, and using an audio noise reduction technique (phase cancellation) to find differences in compiled binaries. Talks on lock picking were offered. I think this cross-pollination is healthy for the digital security community because we are exposed to ideas outside our mental comfort zone.
For another look at Shmoocon, check out this Secureme Blog entry.