Network Forensic Traffic Reconstruction with Tcpxtract

Today I got a chance to try Nick Harbour's Tcpxtract program. I had heard of it several months ago, but I had trouble compiling it on FreeBSD. Just now I tried the regular ./configure, make, make install routine using version 1.0.1 and had no problems.

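Tcpxtract searches Libpcap traces for file formats it recognizes, using the following configuration file. Each entry gives a file extension, a maximum size in bytes, a header signature, and an optional footer signature: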


#---------------------------------------------------------------------  
# ANIMATION FILES
#---------------------------------------------------------------------  
#

# AVI (Windows animation and DiVX/MPEG-4 movies)
avi(4000000, RIFF\?\?\?\?);

# MPEG Video
mpg(4000000, \x00\x00\x01\xba, \x00\x00\x01\xb9);
mpg(4000000, \x00\x00\x01\xb3, \x00\x00\x01\xb7);

# Macromedia Flash
fws(4000000, FWS);

#---------------------------------------------------------------------
# GRAPHICS FILES
#---------------------------------------------------------------------  
#
#
# AOL ART files
art(150000,     \x4a\x47\x04\x0e, \xcf\xc7\xcb);
art(150000,     \x4a\x47\x03\x0e, \xd0\xcb\x00\x00);


# GIF and JPG files (very common)
gif(3000000, \x47\x49\x46\x38\x37\x61, \x00\x3b);
gif(3000000, \x47\x49\x46\x38\x39\x61, \x00\x00\x3b);
jpg(1000000, \xff\xd8\xff\xe0\x00\x10, \xff\xd9);
jpg(1000000, \xff\xd8\xff\xe1);

# PNG   (used in web pages)
png(1000000, \x50\x4e\x47\?, \xff\xfc\xfd\xfe);

# BMP   (used by MSWindows, use only if you have reason to think there are
#       BMP files worth digging for. This often kicks back a lot of false
#       positives
bmp(100000, BM\?\?\x00\x00\x00);

# TIF
tif(200000000, \x49\x49\x2a\x00);


#---------------------------------------------------------------------  
# MICROSOFT OFFICE 
#---------------------------------------------------------------------  
#
# Word documents
doc(12500000, \xd0\xcf\x11\xe0\xa1\xb1);

# Outlook files
pst(400000000, \x21\x42\x4e\xa5\x6f\xb5\xa6);
ost(400000000, \x21\x42\x44\x4e);

# Outlook Express
dbx(4000000, \xcf\xad\x12\xfe\xc5\xfd\x74\x6f);
idx(4000000, \x4a\x4d\x46\x39);
mbx(4000000, \x4a\x4d\x46\x36);
#
#---------------------------------------------------------------------  
# HTML
#---------------------------------------------------------------------  

html(50000, \x3chtml, \x3c\x2fhtml\x3e);

#---------------------------------------------------------------------  
# ADOBE PDF
#---------------------------------------------------------------------  

pdf(5000000, \x25PDF, \x25EOF\x0d);

#---------------------------------------------------------------------  
# AOL (AMERICA ONLINE)
#---------------------------------------------------------------------  
#
# AOL Mailbox
mail(500000, \x41\x4f\x4c\x56\x4d);

#---------------------------------------------------------------------  
# SOUND FILES
#---------------------------------------------------------------------  

# wav will be captured as avi.

# Real Audio Files
ra(1000000, \x2e\x72\x61\xfd);
ra(1000000, \x2eRMF);

#---------------------------------------------------------------------  
# MISCELLANEOUS
#---------------------------------------------------------------------  
#
zip(10000000, PK\x03\x04, \x3c\xac);
java(1000000, \xca\xfe\xba\xbe);
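To make the rule format concrete, here is a minimal Python sketch of Foremost-style header/footer carving driven by lines like the ones above. It is not Tcpxtract's code. The helper names (signature_to_regex, parse_rule, carve) are hypothetical, and two behaviors are assumptions: that `\?` matches any single byte, and that when no footer turns up within the size limit the carve keeps max_size bytes.

```python
import re

# Tokens in a signature: a \xNN hex escape, a \? wildcard, or any literal character.
_TOKEN = re.compile(r"\\x([0-9a-fA-F]{2})|(\\\?)|(.)", re.S)
# ext(max_size, header[, footer]);  -- hypothetical parser for the format above
_RULE = re.compile(r"^\s*(\w+)\(\s*(\d+)\s*,\s*([^,]+?)\s*(?:,\s*([^,]+?)\s*)?\);\s*$")


def signature_to_regex(sig):
    """Compile a signature such as r'PK\x03\x04' into a bytes regex.
    ASSUMPTION: '\?' is a one-byte wildcard."""
    parts = []
    for hex_byte, wildcard, literal in _TOKEN.findall(sig):
        if hex_byte:
            parts.append(re.escape(bytes([int(hex_byte, 16)])))
        elif wildcard:
            parts.append(b".")
        else:
            parts.append(re.escape(literal.encode("latin-1")))
    return re.compile(b"".join(parts), re.S)


def parse_rule(line):
    """Return (ext, max_size, header_regex, footer_regex_or_None)."""
    m = _RULE.match(line)
    if m is None:
        raise ValueError("not a rule: %r" % line)
    ext, size, header, footer = m.groups()
    return (ext, int(size), signature_to_regex(header),
            signature_to_regex(footer) if footer else None)


def carve(data, header, footer, max_size):
    """Yield (offset, carved_bytes) for every header match in data."""
    for hit in header.finditer(data):
        start = hit.start()
        window = data[start:start + max_size]
        end = None
        if footer is not None:
            f = footer.search(window, hit.end() - start)
            if f is not None:
                end = f.end()  # keep the footer bytes themselves
        # ASSUMPTION: with no footer hit, keep up to max_size bytes.
        yield start, window if end is None else window[:end]
```

Feeding the gif rule and a buffer containing `GIF89a ... \x00\x00\x3b` to carve returns one candidate that starts at the header and stops right after the first footer.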

Here are the program's options. Note it can listen to an interface or read a trace.


orr:/var/tmp/tcpxtract$ tcpxtract 
Usage: tcpxtract [OPTIONS] [[-d <DEVICE>] [-f <FILE>]]
Valid options include:
  --file, -f <FILE>         to specify an input capture file instead of a device
  --device, -d <DEVICE>     to specify an input device (i.e. eth0)
  --config, -c <FILE>       use FILE as the config file
  --output, -o <DIRECTORY>  dump files to DIRECTORY instead of current directory
  --version, -v             display the version number of this program
  --help, -h                display this lovely screen
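For scripting, a small Python wrapper can assemble a command line from the options listed above. This is a sketch: the function names are hypothetical, it assumes tcpxtract is on the PATH, it assumes a run uses either a trace or a device but not both, and it merges stdout and stderr because it is not established here which stream the "Found file" messages go to.

```python
import subprocess


def tcpxtract_argv(trace=None, device=None, config=None, outdir=None):
    """Build an argument list using only the options in the usage screen."""
    if (trace is None) == (device is None):
        # ASSUMPTION: read a trace OR listen on a device, not both.
        raise ValueError("pass exactly one of trace= or device=")
    argv = ["tcpxtract"]
    argv += ["-f", trace] if trace is not None else ["-d", device]
    if config is not None:
        argv += ["-c", config]
    if outdir is not None:
        argv += ["-o", outdir]
    return argv


def run_tcpxtract(**options):
    """Run tcpxtract (assumed to be on PATH) and return its output lines."""
    result = subprocess.run(
        tcpxtract_argv(**options),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merged: which stream carries "Found file" is unconfirmed
        text=True,
    )
    return result.stdout.splitlines()
```

For example, run_tcpxtract(trace="test.lpc", outdir="carved") runs the equivalent of `tcpxtract -f test.lpc -o carved`.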

Here is Tcpxtract in action on a trace containing a visit to a Web site.


orr:/var/tmp/tcpxtract$ tcpxtract -f test.lpc 
Found file of type "html" in session [192.168.2.7:14348 -> 192.168.2.5:48117], exporting to 00000000.html
Found file of type "gif" in session [192.168.2.7:14348 -> 192.168.2.5:22002], exporting to 00000001.gif
Found file of type "jpg" in session [192.168.2.7:14348 -> 192.168.2.5:48117], exporting to 00000002.jpg
Found file of type "gif" in session [192.168.2.7:14348 -> 192.168.2.5:43975], exporting to 00000003.gif
Found file of type "gif" in session [192.168.2.7:14348 -> 192.168.2.5:45023], exporting to 00000004.gif
Found file of type "html" in session [192.168.2.7:14348 -> 192.168.2.5:22002], exporting to 00000005.html
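These messages follow a regular pattern, so they are easy to post-process, for example to see which carved files came out of the same TCP session (here 00000000.html and 00000002.jpg share one). A Python sketch; the regular expression is written against the sample lines above and may not match other versions, and the names are hypothetical.

```python
import re
from collections import defaultdict, namedtuple

Carved = namedtuple("Carved", "ftype src dst filename")

# Written against the sample output above; other versions may word it differently.
FOUND = re.compile(
    r'Found file of type "(?P<ftype>[^"]+)" in session '
    r'\[(?P<src>\S+) -> (?P<dst>\S+)\], exporting to (?P<filename>\S+)'
)


def parse_found(lines):
    """Turn 'Found file of type ...' lines into Carved records, skipping others."""
    records = []
    for line in lines:
        m = FOUND.search(line)
        if m:
            records.append(Carved(**m.groupdict()))
    return records


def by_session(records):
    """Group exported filenames by (src, dst) session endpoints."""
    sessions = defaultdict(list)
    for r in records:
        sessions[(r.src, r.dst)].append(r.filename)
    return dict(sessions)
```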

Tcpxtract was able to reconstruct the HTML of my cousin's Bejtlich.com consulting company Web site. It also rebuilt the three GIFs used as graphics on the page.


orr:/var/tmp/tcpxtract$ file 0000*
00000000.html: HTML document text
00000001.gif:  GIF image data, version 89a, 305 x 106
00000002.jpg:  JPEG image data, JFIF standard 1.02
00000003.gif:  GIF image data, version 89a, 31 x 31
00000004.gif:  GIF image data, version 89a, 147 x 214
00000005.html: HTML document text
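The `file` results above come from fixed header fields, so a carved image can be sanity-checked by reading those fields directly. A minimal Python sketch for the two image types in this trace; the function names are hypothetical, and only the GIF logical screen descriptor and a leading JFIF APP0 segment are handled.

```python
import struct


def gif_info(data):
    """Return (version, width, height) for a GIF header, else None.
    Layout: b'GIF' + b'87a' or b'89a', then the logical screen width and
    height as little-endian unsigned 16-bit integers."""
    if len(data) < 10 or data[:3] != b"GIF" or data[3:6] not in (b"87a", b"89a"):
        return None
    width, height = struct.unpack("<HH", data[6:10])
    return data[3:6].decode("ascii"), width, height


def jfif_version(data):
    """Return the JFIF version (e.g. '1.02') when the JPEG starts with an
    APP0 'JFIF' segment: FF D8 FF E0, 2-byte length, b'JFIF\\0', major, minor."""
    if len(data) < 13 or data[:4] != b"\xff\xd8\xff\xe0" or data[6:11] != b"JFIF\x00":
        return None
    return "%d.%02d" % (data[11], data[12])
```

Reading the first few bytes of 00000001.gif this way should give the same 305 x 106 that `file` reports.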

Tcpxtract is not foolproof. Here is a download of putty.zip via FTP through an HTTP proxy (Squid, listening on port 3128), as seen by Tethereal:


  4   1.594194  192.168.2.5 52245 192.168.2.7  3128 HTTP GET ftp://ftp.tartarus.org/pub/people/simon/putty-snapshots/x86/putty.zip HTTP/1.1

Let's see how Tcpxtract handles this trace.


orr:/var/tmp/tcpxtract$ tcpxtract -f test2.lpc 
Found file of type "zip" in session [192.168.2.7:14348 -> 192.168.2.5:5580], exporting to 00000000.zip
Found file of type "zip" in session [192.168.2.7:14348 -> 192.168.2.5:5580], exporting to 00000001.zip
Found file of type "zip" in session [192.168.2.7:14348 -> 192.168.2.5:5580], exporting to 00000002.zip
Found file of type "zip" in session [192.168.2.7:14348 -> 192.168.2.5:5580], exporting to 00000003.zip
Found file of type "zip" in session [192.168.2.7:14348 -> 192.168.2.5:5580], exporting to 00000004.zip
Found file of type "zip" in session [192.168.2.7:14348 -> 192.168.2.5:5580], exporting to 00000005.zip
Found file of type "zip" in session [192.168.2.7:14348 -> 192.168.2.5:5580], exporting to 00000006.zip
Found file of type "zip" in session [192.168.2.7:14348 -> 192.168.2.5:5580], exporting to 00000007.zip

Tcpxtract creates 8 .zip files:


orr:/var/tmp/tcpxtract$ ls -al *.zip
-rwx------  1 richard  wheel     297 Jan  3 11:55 00000000.zip
-rwx------  1 richard  wheel    6407 Jan  3 11:55 00000001.zip
-rwx------  1 richard  wheel  213520 Jan  3 11:55 00000002.zip
-rwx------  1 richard  wheel   42590 Jan  3 11:55 00000003.zip
-rwx------  1 richard  wheel   23523 Jan  3 11:55 00000004.zip
-rwx------  1 richard  wheel   10386 Jan  3 11:55 00000005.zip
-rwx------  1 richard  wheel   38498 Jan  3 11:55 00000006.zip
-rwx------  1 richard  wheel   94888 Jan  3 11:55 00000007.zip
orr:/var/tmp/tcpxtract$ file *.zip
00000000.zip: Zip archive data, at least v2.0 to extract
00000001.zip: Zip archive data, at least v2.0 to extract
00000002.zip: Zip archive data, at least v2.0 to extract
00000003.zip: Zip archive data, at least v2.0 to extract
00000004.zip: Zip archive data, at least v2.0 to extract
00000005.zip: Zip archive data, at least v2.0 to extract
00000006.zip: Zip archive data, at least v2.0 to extract
00000007.zip: Zip archive data, at least v2.0 to extract

None of them resembles the real file, which is 1,069,490 bytes; the largest carved piece is only 213,520 bytes:


orr:/var/tmp/tcpxtract$ ls -al /home/richard/putty.zip 
-rw-r--r--  1 richard  richard  1069490 Jan  3 11:49 /home/richard/putty.zip
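One plausible contributor to the eight fragments, not confirmed against Tcpxtract's source: the zip rule keys on PK\x03\x04, the signature of a Zip local file header, and an archive carries one local file header per member. The unzip listing of the real archive further down shows eight members, the same as the number of carves. This sketch (hypothetical helper names) builds a small archive with Python's zipfile module and counts those signatures.

```python
import io
import zipfile

LOCAL_FILE_HEADER = b"PK\x03\x04"  # same bytes as the zip() header in the config


def local_header_offsets(data):
    """Return every offset where the Zip local file header signature occurs."""
    offsets = []
    pos = data.find(LOCAL_FILE_HEADER)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(LOCAL_FILE_HEADER, pos + 1)
    return offsets


def sample_archive(names):
    """Build an in-memory, uncompressed Zip with one small member per name."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for name in names:
            zf.writestr(name, ("contents of " + name).encode("ascii"))
    return buf.getvalue()
```

Each offset is a point where a header-driven carver could start a new "zip" file. In a real archive the compressed data may also contain those four bytes by chance, which would add further false starts.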

On the other hand, Tcpflow has more success, although its reconstructed stream still carries the HTTP traffic from the Squid proxy in front of the Zip data.


orr:/var/tmp/tcpxtract$ tcpflow -r test2.lpc 

orr:/var/tmp/tcpxtract$ ls -al 192*
-rw-r--r--  1 richard  wheel      547 Jan  3 11:55 192.168.002.005.52245-192.168.002.007.03128
-rw-r--r--  1 richard  wheel      288 Jan  3 11:55 192.168.002.007.00022-192.168.002.005.51747
-rw-r--r--  1 richard  wheel  1069768 Jan  3 11:55 192.168.002.007.03128-192.168.002.005.52245

orr:/var/tmp/tcpxtract$ file 192*
192.168.002.005.52245-192.168.002.007.03128: ASCII text, with CRLF line terminators
192.168.002.007.00022-192.168.002.005.51747: data
192.168.002.007.03128-192.168.002.005.52245: data
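Tcpflow's advantage is that it reassembles each TCP connection into an ordered byte stream before writing it out, so the archive comes out contiguous instead of being scattered across frames. The core idea can be sketched in Python as below. This is a simplified illustration, not Tcpflow's implementation: it assumes the segments belong to one direction of one connection, ignores 32-bit sequence wraparound, keeps bytes already placed when segments overlap, and stops at the first gap. The function name is hypothetical.

```python
def reassemble(segments, first_seq):
    """Rebuild one direction of a TCP stream.

    segments: iterable of (seq, payload) pairs in any order, possibly with
    retransmissions. first_seq: sequence number of the first payload byte.
    """
    stream = bytearray()
    next_seq = first_seq
    for seq, payload in sorted(segments):
        end = seq + len(payload)
        if end <= next_seq:
            continue  # nothing new: a retransmission or empty segment
        if seq > next_seq:
            break  # hole in the capture; a real tool would record it
        stream += payload[next_seq - seq:]  # skip bytes already placed
        next_seq = end
    return bytes(stream)
```

Out-of-order, duplicated, and overlapping segments such as (1004, b"o wo"), (1000, b"hell"), (1002, b"llo "), (1008, b"rld") come back as the single stream b"hello world".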

orr:/var/tmp/tcpxtract$ unzip -l 192.168.002.007.03128-192.168.002.005.52245
Archive:  192.168.002.007.03128-192.168.002.005.52245
warning [192.168.002.007.03128-192.168.002.005.52245]:  278 extra bytes at beginning or within zipfile
  (attempting to process anyway)
  Length     Date   Time    Name
 --------    ----   ----    ----
   131072  01-02-06 21:03   pageant.exe
   608818  01-02-06 19:30   putty.hlp
    29840  01-02-06 19:30   putty.cnt
   274432  01-02-06 21:03   plink.exe
   286720  01-02-06 21:03   pscp.exe
   286720  01-02-06 21:03   psftp.exe
   434176  01-02-06 21:03   putty.exe
   167936  01-02-06 21:03   puttygen.exe
 --------                   -------
  2219714                   8 files

orr:/var/tmp/tcpxtract$ unzip 192.168.002.007.03128-192.168.002.005.52245
Archive:  192.168.002.007.03128-192.168.002.005.52245
warning [192.168.002.007.03128-192.168.002.005.52245]:  278 extra bytes at beginning or within zipfile
  (attempting to process anyway)
  inflating: pageant.exe             
  inflating: putty.hlp               
  inflating: putty.cnt               
  inflating: plink.exe               
  inflating: pscp.exe                
  inflating: psftp.exe               
  inflating: putty.exe               
  inflating: puttygen.exe
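The 278 extra bytes that unzip reports are exactly the difference between the Tcpflow output (1,069,768 bytes) and the real putty.zip (1,069,490 bytes), which is consistent with the proxy's HTTP response header sitting in front of the archive. A minimal Python sketch that drops everything up to the first blank line (CRLF CRLF) should leave the bare archive, assuming the flow holds a single response whose body is not chunk-encoded. The function names are hypothetical.

```python
import hashlib


def strip_http_header(stream):
    """Return the body after the first CRLF CRLF of a single HTTP response."""
    end = stream.find(b"\r\n\r\n")
    if end == -1:
        raise ValueError("no end-of-headers marker found")
    return stream[end + 4:]


def extract_body(flow_path, out_path):
    """Write the response body from a tcpflow output file; return its SHA-256."""
    with open(flow_path, "rb") as f:
        body = strip_http_header(f.read())
    with open(out_path, "wb") as f:
        f.write(body)
    return hashlib.sha256(body).hexdigest()
```

Comparing the returned hash with a SHA-256 of the original putty.zip would show whether the recovered archive matches byte for byte.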

The pageant.exe file worked on a Windows system to which I transferred it.

I really look forward to seeing Tcpxtract develop, and I hope to add some file formats to the configuration file. I might also listen to the interview with Nick on the CyberSpeak podcast.
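On the subject of adding formats, the PNG entry in the configuration above appears not to match the PNG specification. A PNG file begins with the eight bytes 89 50 4E 47 0D 0A 1A 0A, so `\x50\x4e\x47\?` would match one byte into the file, and a PNG ends with an IEND chunk rather than `\xff\xfc\xfd\xfe`. A small Python helper (hypothetical names) can emit config lines in the same syntax from raw byte strings; the PNG rule it produces is a suggestion that has not been tested with Tcpxtract.

```python
import zlib

PNG_SIGNATURE = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])
# The IEND chunk carries no data, so its CRC is always the CRC-32 of b"IEND".
PNG_IEND = b"IEND" + zlib.crc32(b"IEND").to_bytes(4, "big")


def to_signature(raw):
    """Spell every byte as \\xNN, the escape syntax used in the config file."""
    return "".join("\\x%02x" % b for b in raw)


def make_rule(ext, max_size, header, footer=None):
    """Format an ext(max_size, header[, footer]); configuration line."""
    fields = [str(max_size), to_signature(header)]
    if footer is not None:
        fields.append(to_signature(footer))
    return "%s(%s);" % (ext, ", ".join(fields))
```

make_rule("png", 1000000, PNG_SIGNATURE, PNG_IEND) yields a line with the full eight-byte signature as the header and the IEND chunk type plus CRC as the footer.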

Comments

jbmoore said…

Richard,

I've been having to research file carvers for my job. Tcpxtract seems to be based on foremost/scalpel. It will have the same flaw as they do, which is that foremost and scalpel assume the file is stored intact and contiguous on the filesystem. If the file is fragmented, as it might be within a libpcap capture file, a scalpel-like file carver is going to carve multiple copies of a file if there are multiple Ethernet frames with the header and footer. The 2006 and 2007 challenges at dfrws.org tried to address this issue. See the first-place winner of the 2007 challenge (http://sandbox.dfrws.org/2007/cohen/). His zip_carver and pdf_carver programs are quite good at reconstructing whole files from fragments. Here's a wrapper script for the zip_carver.py program:

zipcarver.bash:

#!/bin/bash
# Interactive wrapper for zip_carver.py from the DFRWS 2007 challenge tools.
IMGFILE=""
MAPFILE=""
TOOLSDIR="/opt/dfrws"
OUTPUTDIR=""
#echo "Please enter the directory where the tools are:"
#read -r TOOLSDIR
echo "Please enter the output directory:"
read -r OUTPUTDIR
echo "Please enter the absolute pathname of the image file"
echo "to be carved:"
read -r IMGFILE
echo "Making the output directory..."
mkdir -p "$OUTPUTDIR" || exit 1
cd "$OUTPUTDIR" || exit 1
# Use the absolute path from here on, so a relative answer still works after cd.
OUTPUTDIR="$PWD"
echo "Creating the zip index file in $OUTPUTDIR"
"$TOOLSDIR/zip_carver.py" -c -i "$OUTPUTDIR/zip.idx" "$IMGFILE"
echo "Now we create the initial maps..."
"$TOOLSDIR/zip_carver.py" -m -i "$OUTPUTDIR/zip.idx" "$IMGFILE"
# Process one map at a time until Ctrl-C, or until input ends (Ctrl-D).
while true
do
    echo "Please hit Ctrl-C to exit the program"
    echo "Please enter mapfile to be processed:"
    read -r MAPFILE || break
    "$TOOLSDIR/zip_carver.py" -e "$OUTPUTDIR/$MAPFILE.zip" -M "$OUTPUTDIR/$MAPFILE.map" "$IMGFILE"
done
exit 0

Hope this helps.

John

7:05 AM

TaoSecurity Blog
