Network Forensic Traffic Reconstruction with Tcpxtract
Today I got a chance to try Nick Harbour's Tcpxtract program. I had heard of it several months ago, but I had trouble compiling it on FreeBSD. Just now I tried the regular ./configure, make, make install routine using version 1.0.1 and had no problems.
Tcpxtract searches Libpcap traces for file formats it recognizes, using the following configuration file:
#---------------------------------------------------------------------
# ANIMATION FILES
#---------------------------------------------------------------------
#
# AVI (Windows animation and DiVX/MPEG-4 movies)
avi(4000000, RIFF\?\?\?\?);
# MPEG Video
mpg(4000000, \x00\x00\x01\xba, \x00\x00\x01\xb9);
mpg(4000000, \x00\x00\x01\xb3, \x00\x00\x01\xb7);
# Macromedia Flash
fws(4000000, FWS);
#---------------------------------------------------------------------
# GRAPHICS FILES
#---------------------------------------------------------------------
#
#
# AOL ART files
art(150000, \x4a\x47\x04\x0e, \xcf\xc7\xcb);
art(150000, \x4a\x47\x03\x0e, \xd0\xcb\x00\x00);
# GIF and JPG files (very common)
gif(3000000, \x47\x49\x46\x38\x37\x61, \x00\x3b);
gif(3000000, \x47\x49\x46\x38\x39\x61, \x00\x00\x3b);
jpg(1000000, \xff\xd8\xff\xe0\x00\x10, \xff\xd9);
jpg(1000000, \xff\xd8\xff\xe1);
# PNG (used in web pages)
png(1000000, \x50\x4e\x47\?, \xff\xfc\xfd\xfe);
# BMP (used by MSWindows, use only if you have reason to think there are
# BMP files worth digging for. This often kicks back a lot of false
# positives
bmp(100000, BM\?\?\x00\x00\x00);
# TIF
tif(200000000, \x49\x49\x2a\x00);
#---------------------------------------------------------------------
# MICROSOFT OFFICE
#---------------------------------------------------------------------
#
# Word documents
doc(12500000, \xd0\xcf\x11\xe0\xa1\xb1);
# Outlook files
pst(400000000, \x21\x42\x4e\xa5\x6f\xb5\xa6);
ost(400000000, \x21\x42\x44\x4e);
# Outlook Express
dbx(4000000, \xcf\xad\x12\xfe\xc5\xfd\x74\x6f);
idx(4000000, \x4a\x4d\x46\x39);
mbx(4000000, \x4a\x4d\x46\x36);
#
#---------------------------------------------------------------------
# HTML
#---------------------------------------------------------------------
html(50000, \x3chtml, \x3c\x2fhtml\x3e);
#---------------------------------------------------------------------
# ADOBE PDF
#---------------------------------------------------------------------
pdf(5000000, \x25PDF, \x25EOF\x0d);
#---------------------------------------------------------------------
# AOL (AMERICA ONLINE)
#---------------------------------------------------------------------
#
# AOL Mailbox
mail(500000, \x41\x4f\x4c\x56\x4d);
#---------------------------------------------------------------------
# SOUND FILES
#---------------------------------------------------------------------
# wav will be captured as avi.
# Real Audio Files
ra(1000000, \x2e\x72\x61\xfd);
ra(1000000, \x2eRMF);
#---------------------------------------------------------------------
# MISCELLANEOUS
#---------------------------------------------------------------------
#
zip(10000000, PK\x03\x04, \x3c\xac);
java(1000000, \xca\xfe\xba\xbe);
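To make the rule format concrete, here is a minimal Python sketch of header/footer carving. It reads each rule as type(maximum size, header, optional footer), with \xNN for a literal byte and \? for a single wildcard byte; that reading is inferred from the entries above. The function names, and the choice to cut at the maximum size when no footer is found, are my assumptions and not Tcpxtract's actual implementation.

```python
import re
from typing import List, Optional


def compile_signature(sig: str) -> bytes:
    r"""Turn a config-style signature such as 'BM\?\?\x00' into a bytes regex.

    '\xNN' is a literal byte, '\?' matches any single byte, and every other
    character stands for itself.
    """
    out = []
    i = 0
    while i < len(sig):
        if sig.startswith("\\x", i):
            out.append(re.escape(bytes([int(sig[i + 2:i + 4], 16)])))
            i += 4
        elif sig.startswith("\\?", i):
            out.append(b".")
            i += 2
        else:
            out.append(re.escape(sig[i].encode("latin-1")))
            i += 1
    return b"".join(out)


def carve(data: bytes, header: str, footer: Optional[str], max_len: int) -> List[bytes]:
    """Return one candidate per header match.

    With a footer, the candidate ends at the first footer inside the
    max_len window; otherwise it is cut off at max_len bytes (assumed).
    """
    head_re = re.compile(compile_signature(header), re.DOTALL)
    foot_re = re.compile(compile_signature(footer), re.DOTALL) if footer else None
    found = []
    for m in head_re.finditer(data):
        window = data[m.start():m.start() + max_len]
        end = len(window)
        if foot_re:
            # Start looking for the footer just past the header bytes.
            f = foot_re.search(window, len(m.group()))
            if f:
                end = f.end()
        found.append(window[:end])
    return found


# Made-up byte stream containing one GIF87a-style blob, matched with the
# first gif rule from the configuration file.
stream = b"junk" + b"GIF87a" + b"\x01\x02\x03" + b"\x00\x3b" + b"trailer"
files = carve(stream, r"\x47\x49\x46\x38\x37\x61", r"\x00\x3b", 3000000)
print(files)
```

Note that a carver of this kind works on whatever bytes it is handed, so it only recovers a whole file if the file's bytes arrive in one contiguous run.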
Here are the program's options. Note it can listen to an interface or read a trace.
orr:/var/tmp/tcpxtract$ tcpxtract
Usage: tcpxtract [OPTIONS] [[-d <DEVICE>] [-f <FILE>]]
Valid options include:
  --file, -f <FILE>         to specify an input capture file instead of a device
  --device, -d <DEVICE>     to specify an input device (i.e. eth0)
  --config, -c <FILE>       use FILE as the config file
  --output, -o <DIRECTORY>  dump files to DIRECTORY instead of current directory
  --version, -v             display the version number of this program
  --help, -h                display this lovely screen
Here is Tcpxtract in action on a trace containing a visit to a Web site.
orr:/var/tmp/tcpxtract$ tcpxtract -f test.lpc
Found file of type "html" in session [192.168.2.7:14348 -> 192.168.2.5:48117], exporting to 00000000.html
Found file of type "gif" in session [192.168.2.7:14348 -> 192.168.2.5:22002], exporting to 00000001.gif
Found file of type "jpg" in session [192.168.2.7:14348 -> 192.168.2.5:48117], exporting to 00000002.jpg
Found file of type "gif" in session [192.168.2.7:14348 -> 192.168.2.5:43975], exporting to 00000003.gif
Found file of type "gif" in session [192.168.2.7:14348 -> 192.168.2.5:45023], exporting to 00000004.gif
Found file of type "html" in session [192.168.2.7:14348 -> 192.168.2.5:22002], exporting to 00000005.html
Tcpxtract was able to reconstruct the HTML for my cousin's Bejtlich.com consulting company site. It also rebuilt the three GIFs used as graphics on the page.
orr:/var/tmp/tcpxtract$ file 0000*
00000000.html: HTML document text
00000001.gif: GIF image data, version 89a, 305 x 106
00000002.jpg: JPEG image data, JFIF standard 1.02
00000003.gif: GIF image data, version 89a, 31 x 31
00000004.gif: GIF image data, version 89a, 147 x 214
00000005.html: HTML document text
Tcpxtract is not foolproof. Here is a download of putty.zip from an FTP server, requested through an HTTP proxy listening on port 3128. As seen by Tethereal:
4 1.594194 192.168.2.5 52245 192.168.2.7 3128 HTTP GET ftp://ftp.tartarus.org/pub/people/simon/putty-snapshots/x86/putty.zip HTTP/1.1
Let's see how Tcpxtract handles this trace.
orr:/var/tmp/tcpxtract$ tcpxtract -f test2.lpc
Found file of type "zip" in session [192.168.2.7:14348 -> 192.168.2.5:5580], exporting to 00000000.zip
Found file of type "zip" in session [192.168.2.7:14348 -> 192.168.2.5:5580], exporting to 00000001.zip
Found file of type "zip" in session [192.168.2.7:14348 -> 192.168.2.5:5580], exporting to 00000002.zip
Found file of type "zip" in session [192.168.2.7:14348 -> 192.168.2.5:5580], exporting to 00000003.zip
Found file of type "zip" in session [192.168.2.7:14348 -> 192.168.2.5:5580], exporting to 00000004.zip
Found file of type "zip" in session [192.168.2.7:14348 -> 192.168.2.5:5580], exporting to 00000005.zip
Found file of type "zip" in session [192.168.2.7:14348 -> 192.168.2.5:5580], exporting to 00000006.zip
Found file of type "zip" in session [192.168.2.7:14348 -> 192.168.2.5:5580], exporting to 00000007.zip
Tcpxtract creates 8 .zip files:
orr:/var/tmp/tcpxtract$ ls -al *.zip
-rwx------ 1 richard wheel 297 Jan 3 11:55 00000000.zip
-rwx------ 1 richard wheel 6407 Jan 3 11:55 00000001.zip
-rwx------ 1 richard wheel 213520 Jan 3 11:55 00000002.zip
-rwx------ 1 richard wheel 42590 Jan 3 11:55 00000003.zip
-rwx------ 1 richard wheel 23523 Jan 3 11:55 00000004.zip
-rwx------ 1 richard wheel 10386 Jan 3 11:55 00000005.zip
-rwx------ 1 richard wheel 38498 Jan 3 11:55 00000006.zip
-rwx------ 1 richard wheel 94888 Jan 3 11:55 00000007.zip
orr:/var/tmp/tcpxtract$ file *.zip
00000000.zip: Zip archive data, at least v2.0 to extract
00000001.zip: Zip archive data, at least v2.0 to extract
00000002.zip: Zip archive data, at least v2.0 to extract
00000003.zip: Zip archive data, at least v2.0 to extract
00000004.zip: Zip archive data, at least v2.0 to extract
00000005.zip: Zip archive data, at least v2.0 to extract
00000006.zip: Zip archive data, at least v2.0 to extract
00000007.zip: Zip archive data, at least v2.0 to extract
None of them comes close to the size of the real file:
orr:/var/tmp/tcpxtract$ ls -al /home/richard/putty.zip
-rw-r--r-- 1 richard richard 1069490 Jan 3 11:49 /home/richard/putty.zip
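A possible explanation for the eight fragments, offered as a guess rather than something verified against Tcpxtract's source: every member of a zip archive begins with its own local file header, whose signature is PK\x03\x04, the same bytes the zip rule in the configuration file uses as its header. The archive listing further down shows eight members, so a carver that starts a new file at each signature would produce eight candidates. The Python sketch below builds a stand-in archive in memory (the member contents are placeholders, not the real PuTTY files) and counts the signatures.

```python
import io
import zipfile

LOCAL_HEADER = b"PK\x03\x04"  # ZIP local file header signature


def count_local_headers(archive: bytes) -> int:
    """Count occurrences of the local file header signature."""
    return archive.count(LOCAL_HEADER)


def build_archive(names):
    """Build an in-memory zip with one stored (uncompressed) member per name."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for name in names:
            # Placeholder contents only; these are not the real binaries.
            zf.writestr(name, b"placeholder contents for " + name.encode())
    return buf.getvalue()


members = ["pageant.exe", "putty.hlp", "putty.cnt", "plink.exe",
           "pscp.exe", "psftp.exe", "putty.exe", "puttygen.exe"]
archive = build_archive(members)
print(count_local_headers(archive))
```

Compressed member data can also contain these four bytes by chance, so on a real archive the count may exceed the number of members.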
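On the other hand, Tcpflow has more success: it reassembles the proxy response into a single file, although unzip complains about 278 extra bytes, most likely the HTTP response headers Squid sent ahead of the zip data.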
orr:/var/tmp/tcpxtract$ tcpflow -r test2.lpc
orr:/var/tmp/tcpxtract$ ls -al 192*
-rw-r--r-- 1 richard wheel 547 Jan 3 11:55 192.168.002.005.52245-192.168.002.007.03128
-rw-r--r-- 1 richard wheel 288 Jan 3 11:55 192.168.002.007.00022-192.168.002.005.51747
-rw-r--r-- 1 richard wheel 1069768 Jan 3 11:55 192.168.002.007.03128-192.168.002.005.52245
orr:/var/tmp/tcpxtract$ file 192*
192.168.002.005.52245-192.168.002.007.03128: ASCII text, with CRLF line terminators
192.168.002.007.00022-192.168.002.005.51747: data
192.168.002.007.03128-192.168.002.005.52245: data
orr:/var/tmp/tcpxtract$ unzip -l 192.168.002.007.03128-192.168.002.005.52245
Archive: 192.168.002.007.03128-192.168.002.005.52245
warning [192.168.002.007.03128-192.168.002.005.52245]: 278 extra bytes at beginning or within zipfile
(attempting to process anyway)
Length Date Time Name
-------- ---- ---- ----
131072 01-02-06 21:03 pageant.exe
608818 01-02-06 19:30 putty.hlp
29840 01-02-06 19:30 putty.cnt
274432 01-02-06 21:03 plink.exe
286720 01-02-06 21:03 pscp.exe
286720 01-02-06 21:03 psftp.exe
434176 01-02-06 21:03 putty.exe
167936 01-02-06 21:03 puttygen.exe
-------- -------
2219714 8 files
orr:/var/tmp/tcpxtract$ unzip 192.168.002.007.03128-192.168.002.005.52245
Archive: 192.168.002.007.03128-192.168.002.005.52245
warning [192.168.002.007.03128-192.168.002.005.52245]: 278 extra bytes at beginning or within zipfile
(attempting to process anyway)
inflating: pageant.exe
inflating: putty.hlp
inflating: putty.cnt
inflating: plink.exe
inflating: pscp.exe
inflating: psftp.exe
inflating: putty.exe
inflating: puttygen.exe
The pageant.exe file worked on a Windows system to which I transferred it.
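The reassembled stream is 1069768 bytes and the real putty.zip is 1069490 bytes, a difference of 278, the same figure unzip warns about. If those bytes are the HTTP response headers from the proxy (an assumption; I have not inspected them), a short Python sketch like the following would strip them and leave a clean archive. The function name is hypothetical, and the sketch assumes a single response with no chunked transfer encoding.

```python
import io
import zipfile


def strip_http_headers(stream: bytes) -> bytes:
    """Return the bytes after the first blank line (CRLF CRLF) of an HTTP response.

    A stream that does not start with an HTTP status line, or has no blank
    line, is returned unchanged.
    """
    if not stream.startswith(b"HTTP/"):
        return stream
    _head, sep, body = stream.partition(b"\r\n\r\n")
    return body if sep else stream


# Stand-in for a Tcpflow output file: made-up headers followed by a tiny zip.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("pageant.exe", b"not the real binary")
payload = buf.getvalue()
response = b"HTTP/1.0 200 OK\r\nContent-Type: application/zip\r\n\r\n" + payload

body = strip_http_headers(response)
print(zipfile.ZipFile(io.BytesIO(body)).namelist())
```

For a real capture, the same function could be applied to the contents of the large Tcpflow output file after reading it in binary mode.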
I really look forward to seeing Tcpxtract develop, and I hope to add some file formats to the configuration file. I might also try to hear the interview with Nick at CyberSpeak.
Comments
I've had to research file carvers for my job. Tcpxtract seems to be based on foremost/scalpel, so it will share their flaw: foremost and scalpel assume that the file is intact on the filesystem. If the file is fragmented, as it might be within a libpcap capture file, a scalpel-like file carver is going to carve multiple copies of a file whenever multiple Ethernet frames contain the header and footer. The 2006 and 2007 challenges at dfrws.org tried to address this issue. See the first-place winner of the 2007 challenge (http://sandbox.dfrws.org/2007/cohen/). His zip_carver and pdf_carver programs are quite good at reconstructing whole files from fragments. Here's a wrapper script for the zip_carver.py program:
zipcarver.bash:
#!/bin/bash
# Interactive wrapper around zip_carver.py: build the index, create the
# initial maps, then extract one archive per map file the user names.
TOOLSDIR="/opt/dfrws"
#echo "Please enter the directory where the tools are:"
#read -r TOOLSDIR
echo "Please enter the output directory:"
read -r OUTPUTDIR
echo "Please enter the absolute pathname of the image file"
echo "to be carved:"
read -r IMGFILE
echo "Making the output directory..."
mkdir -p "$OUTPUTDIR" || exit 1
# Work inside the output directory and use its absolute path from here on,
# so a relative OUTPUTDIR still resolves after the cd.
cd "$OUTPUTDIR" || exit 1
OUTPUTDIR=$(pwd)
echo "Creating the zip index file in $OUTPUTDIR"
"$TOOLSDIR/zip_carver.py" -c -i "$OUTPUTDIR/zip.idx" "$IMGFILE"
echo "Now we create the initial maps..."
"$TOOLSDIR/zip_carver.py" -m -i "$OUTPUTDIR/zip.idx" "$IMGFILE"
# Loop until the user presses Ctrl-C or closes stdin.
while true
do
    echo "Please hit Ctrl-C to exit the program"
    echo "Please enter mapfile to be processed:"
    read -r MAPFILE || break
    "$TOOLSDIR/zip_carver.py" -e "$OUTPUTDIR/$MAPFILE.zip" -M "$OUTPUTDIR/$MAPFILE.map" "$IMGFILE"
done
Hope this helps.
John