Friday, January 28, 2005

Flowgrep: Flow-oriented Content Matching

Last month I found Meling Mudin's IDS blog, and learned of Jose Nazario's tool Flowgrep. Flowgrep is written in Python. It is similar to Ngrep, which I addressed in my first book. Ngrep is packet-oriented, meaning the strings for which Ngrep searches must all appear in a single packet. If you search for 'bejtlich', and 'bejt' is in one packet and 'lich' another, then Ngrep won't find anything.

Flowgrep, in contrast, is conversation-oriented. Flowgrep assembles TCP sessions, as well as pseudo-sessions for UDP and ICMP. Flowgrep will rebuild a conversation where 'bejt' is in one packet and 'lich' another, and report seeing 'bejtlich'.

Flowgrep relies on Mike Schiffman's Libnet and Mike Pomraning's Pynids, a Python wrapper for Rafal Wojtczuk's Libnids. Mike was kind enough to work with me over the last week to get Pynids operational on FreeBSD 5.3.

Here's how I ended up with a working Flowgrep implementation. First I ensured Python was on my system:

janney:/usr/local/src/pynids-0.5rc1$ which python2.4
/usr/local/bin/python2.4

I installed the net/libnet-devel port. Note this installs Libnet 1.1.2.1, rather than the deprecated Libnet 1.0.2a found in net/libnet.

Next I downloaded and extracted pynids-0.5rc1.tar.gz.

janney:/usr/local/src$ tar -xzvf pynids-0.5rc1.tar.gz
x pynids-0.5rc1/
x pynids-0.5rc1/CHANGES
x pynids-0.5rc1/COPYING
x pynids-0.5rc1/Example
x pynids-0.5rc1/MANIFEST.in
x pynids-0.5rc1/README
x pynids-0.5rc1/dispatch.diff
x pynids-0.5rc1/libnids-1.19.tar
x pynids-0.5rc1/nidsmodule.c
x pynids-0.5rc1/setup.cfg
x pynids-0.5rc1/setup.py
x pynids-0.5rc1/PKG-INFO

Note I did not install Libnids myself. Mike packages a patched version of Libnids with his Pynids distribution.

I used these commands to install Pynids:

janney:/usr/local/src$ cd pynids-0.5rc1
janney:/usr/local/src/pynids-0.5rc1$ python2.4 setup.py build
running build
tar -xf libnids-1.19.tar
patch -c -p1 -i ../dispatch.diff
...edited...
janney:/usr/local/src/pynids-0.5rc1$ sudo python2.4 setup.py install
running install
running build
running build_ext
running install_lib
copying build/lib.freebsd-5.3-RELEASE-p2-i386-2.4/nidsmodule.so
-> /usr/local/lib/python2.4/site-packages

Now that Pynids was installed, I downloaded Flowgrep and installed it.

janney:/usr/local/src$ tar -xzvf flowgrep-0.8.tar.gz
x flowgrep-0.8
x flowgrep-0.8/flowgrep.py
x flowgrep-0.8/flowgrep.8
x flowgrep-0.8/setup.py
x flowgrep-0.8/Makefile
x flowgrep-0.8/README
janney:/usr/local/src$ cd flowgrep-0.8
janney:/usr/local/src/flowgrep-0.8$ make
python setup.py build
running build
janney:/usr/local/src/flowgrep-0.8$ sudo make install
cp flowgrep.py flowgrep
python setup.py install
running install
running build
running install_data
copying flowgrep -> /usr/local/sbin/
copying flowgrep.8 -> /usr/local/man/man8/

Now that Flowgrep is installed, let's see how it works. For demonstration purposes, I have telnet running on host janney on an odd port -- 47557 TCP. I tell Tcpdump to collect all traffic on janney involving that port. This gives us traffic for later analysis:

janney:/home/richard$ sudo tcpdump -n -i xl0 -s 1515
-w flowgrep2.lpc port 47557
tcpdump: listening on xl0,
link-type EN10MB (Ethernet), capture size 1515 bytes
^C88 packets captured
810 packets received by filter
0 packets dropped by kernel

On janney (or really any host with visibility of janney) to watch for flows containing the string 'hackerpassword':

janney:/home/richard$ sudo flowgrep -d xl0 -l /tmp
-i -a hackerpassword

From host neely, I telnet to janney on port 47557 TCP and try to login as user 'bejtlich' (echoed to my screen) and password 'hackerpassword' (not echoed to my screen):

richard@neely:~$ telnet janney 47557
Trying 192.168.2.7...
Connected to janney.
Escape character is '^]'.

FreeBSD/i386 (janney.taosecurity.com) (ttyp6)

login: bejtlich
Password:
Login incorrect
login:
telnet> quit
Connection closed.

Let's see what Flowgrep wrote to the /tmp directory:

1106916564-10.10.10.20-32785-192.168.2.7-47557-tcp
1106916564-192.168.2.7-47557-10.10.10.20-32785-tcp

We can use hd to view the contents of these files:

janney:/home/richard$ hd /tmp/1106916564-10.10.10.20-32785-192.168.2.7-47557-tcp
00000000 ff fc 25 ff fe 26 ff fb 18 ff fb 20 ff fc 23 ff |..%..&..... ..#.|
00000010 fb 27 ff fc 24 ff fa 20 00 33 38 34 30 30 2c 33 |.'..$.. .38400,3|
00000020 38 34 30 30 ff f0 ff fa 27 00 ff f0 ff fa 18 00 |8400....'.......|
00000030 52 58 56 54 ff f0 ff fd 03 ff fc 01 ff fb 22 ff |RXVT..........".|
00000040 fa 22 03 01 00 00 03 62 03 04 02 0f 05 00 00 07 |.".....b........|
00000050 62 1c 08 02 04 09 42 1a 0a 02 08 0b 02 15 0c 02 |b.....B.........|
00000060 17 0d 02 12 0e 02 16 0f 02 11 10 02 13 11 02 ff |................|
00000070 ff 12 02 ff ff ff f0 ff fb 1f ff fa 1f 00 50 00 |..............P.|
00000080 18 ff f0 ff fd 05 ff fb 21 ff fa 22 01 14 ff f0 |........!.."....|
00000090 ff fd 01 ff fc 22 62 65 6a 74 6c 69 63 68 0d 00 |....."bejtlich..|
000000a0 68 61 63 6b 65 72 70 61 73 73 77 6f 72 64 0d 00 |hackerpassword..|
000000b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001000

That is what the client sent to the server. Here is what the server sent to the client:

janney:/home/richard$ hd /tmp/1106916564-192.168.2.7-47557-10.10.10.20-32785-tcp
00000000 ff fd 25 ff fb 26 ff fd 18 ff fd 20 ff fd 23 ff |..%..&..... ..#.|
00000010 fd 27 ff fd 24 ff fa 20 01 ff f0 ff fa 27 01 ff |.'..$.. .....'..|
00000020 f0 ff fa 18 01 ff f0 ff fb 03 ff fd 01 ff fd 22 |..............."|
00000030 ff fd 1f ff fb 05 ff fd 21 ff fa 22 01 10 ff f0 |........!.."....|
00000040 ff fa 21 00 ff f0 ff fa 21 03 ff f0 ff fb 01 ff |..!.....!.......|
00000050 fe 22 ff fa 22 03 03 e2 03 04 82 0f 07 e2 1c 08 |.".."...........|
00000060 82 04 09 c2 1a 0a 82 08 0b 82 15 0c 82 17 0d 82 |................|
00000070 12 0e 82 16 0f 82 11 10 82 13 11 82 ff ff 12 82 |................|
00000080 ff ff ff f0 0d 00 0d 0a 46 72 65 65 42 53 44 2f |........FreeBSD/|
00000090 69 33 38 36 20 28 6a 61 6e 6e 65 79 2e 74 61 6f |i386 (janney.tao|
000000a0 73 65 63 75 72 69 74 79 2e 63 6f 6d 29 20 28 74 |security.com) (t|
000000b0 74 79 70 36 29 0d 00 0d 0a 0d 00 0d 0a 6c 6f 67 |typ6)........log|
000000c0 69 6e 3a 20 62 65 6a 74 6c 69 63 68 0d 0a 50 61 |in: bejtlich..Pa|
000000d0 73 73 77 6f 72 64 3a 0d 0a 4c 6f 67 69 6e 20 69 |ssword:..Login i|
000000e0 6e 63 6f 72 72 65 63 74 0d 0a 6c 6f 67 69 6e 3a |ncorrect..login:|
000000f0 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ...............|
00000100 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001000

"So what," you might say. I can reconstruct streams with Tcpflow. True enough, but how did we end up with these two streams? These are the result of Flowgrep searching streams for the content 'hackerpassword'. How did the packets which built that stream look? Let's jump to the packets where the server presented "Password:" to the client:

07:49:16.711854 IP 192.168.2.7.47557 > 10.10.10.20.32785:
P 207:216(9) ack 161 win 33304
<nop,nop,timestamp 101794307 107723922>
0x0000: 4510 003d 9342 4000 4006 d09b c0a8 0207 E..=.B@.@.......
0x0010: 0a0a 0a14 b9c5 8011 af45 1755 e813 c462 .........E.U...b
0x0020: 8018 8218 7e23 0000 0101 080a 0611 4203 ....~#........B.
0x0030: 066b bc92 5061 7373 776f 7264 3a .k..Password:

My client acknowledges receipt of the server's 'Password:' packet but sends no data of its own:

07:49:16.713050 IP 10.10.10.20.32785 > 192.168.2.7.47557:
. ack 216 win 5840 <nop,nop,timestamp 107723923 101794307>
0x0000: 4510 0034 790c 4000 3f06 ebda 0a0a 0a14 E..4y.@.?.......
0x0010: c0a8 0207 8011 b9c5 e813 c462 af45 175e ...........b.E.^
0x0020: 8010 16d0 d11b 0000 0101 080a 066b bc93 .............k..
0x0030: 0611 4203 ..B.

Now I enter 'hackerpassword'. This packet has the 'h' -- it's the last ASCII character:

07:49:17.703919 IP 10.10.10.20.32785 > 192.168.2.7.47557:
P 161:162(1) ack 216 win 5840
<nop,nop,timestamp 107724022 101794307>
0x0000: 4510 0035 790d 4000 3f06 ebd8 0a0a 0a14 E..5y.@.?.......
0x0010: c0a8 0207 8011 b9c5 e813 c462 af45 175e ...........b.E.^
0x0020: 8018 16d0 68af 0000 0101 080a 066b bcf6 ....h........k..
0x0030: 0611 4203 68 ..B.h

The server ACKs the packet with the 'h':

07:49:17.796820 IP 192.168.2.7.47557 > 10.10.10.20.32785:
. ack 162 win 33304 <nop,nop,timestamp 101794416 107724022>
0x0000: 4510 0034 9347 4000 4006 d09f c0a8 0207 E..4.G@.@.......
0x0010: 0a0a 0a14 b9c5 8011 af45 175e e813 c463 .........E.^...c
0x0020: 8010 8218 6502 0000 0101 080a 0611 4270 ....e.........Bp
0x0030: 066b bcf6 .k..

I send the 'a'. Again, it's the very last byte of application layer data:

07:49:17.811595 IP 10.10.10.20.32785 > 192.168.2.7.47557:
P 162:163(1) ack 216 win 5840
<nop,nop,timestamp 107724033 101794416>
0x0000: 4510 0035 790e 4000 3f06 ebd7 0a0a 0a14 E..5y.@.?.......
0x0010: c0a8 0207 8011 b9c5 e813 c463 af45 175e ...........c.E.^
0x0020: 8018 16d0 6f36 0000 0101 080a 066b bd01 ....o6.......k..
0x0030: 0611 4270 61 ..Bpa

Next, another server ACK and a client character, 'c':

07:49:17.906823 IP 192.168.2.7.47557 > 10.10.10.20.32785:
. ack 163 win 33304 <nop,nop,timestamp 101794427 107724033>
0x0000: 4510 0034 934a 4000 4006 d09c c0a8 0207 E..4.J@.@.......
0x0010: 0a0a 0a14 b9c5 8011 af45 175e e813 c464 .........E.^...d
0x0020: 8010 8218 64eb 0000 0101 080a 0611 427b ....d.........B{
0x0030: 066b bd01 .k..
07:49:18.087160 IP 10.10.10.20.32785 > 192.168.2.7.47557:
P 163:164(1) ack 216 win 5840
<nop,nop,timestamp 107724061 101794427>
0x0000: 4510 0035 790f 4000 3f06 ebd6 0a0a 0a14 E..5y.@.?.......
0x0010: c0a8 0207 8011 b9c5 e813 c464 af45 175e ...........d.E.^
0x0020: 8018 16d0 6d0e 0000 0101 080a 066b bd1d ....m........k..
0x0030: 0611 427b 63 ..B{c

By now everyone should appreciate just how powerful and useful Flowgrep can be. Even though every character of my 'hackerpassword' string appeared in separate packets, Flowgrep assembled the stream and logged the traffic that matched the filter I specified.

Flowgrep is not the only tool with this capability, since more robust intrusion detection systems offer similar features. However, this is the only stand-alone tool I know that offers rapid string matching on stream contents on arbitrary ports.

I chose to demonstrate telnet because I knew it would place virtually every character I typed into separate packets. The principle applies anywhere you are concerned that content of interest could be split between multiple packets. Furthermore, Flowgrep is designed to watch UDP and ICMP conversations as well.

If you'd like to know more about Flowgrep, check out Jose's presentation to UMeet 2004 and the transcript of the talk. Jose also maintains the Worm Blog, which I intend to peruse, and is part of the Internet Motion Sensor project.

I think Python is a very promising language for rapidly developing these sorts of tools. I have the following books on Python on my to-read list, courtesy of APress:

Expect reviews when I get a chance to read, which is difficult as I work on my own new book.

3 comments:

jose nazario said...

richard, thanks for your very kind intro to my tool flowgrep. it's a very great into to the tool, and i'll have to link to it. i hope all is well.

Christina Estenopolis said...
This comment has been removed by a blog administrator.
dghnfgj said...
This comment has been removed by a blog administrator.