Friday, June 13, 2008

Logging Web Traffic with Httpry

I don't need to tell anyone that a lot of interesting command-and-control traffic is sailing through our Web proxies right now. I encourage anyone running Web proxies to log that traffic decently. Later in this post are three example entries from a Squid access.log. They use the "squid" format with fields for referer and user-agent appended to the end.

Incidentally, here is a diff of my Squid configuration against the original file, which shows how I set up Squid.

r200a# diff /usr/local/etc/squid/squid.conf /usr/local/etc/squid/squid.conf.orig
632,633c632,633
< acl our_networks src 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16
< http_access allow our_networks
---
> #acl our_networks src 192.168.1.0/24 192.168.2.0/24
> #http_access allow our_networks
936c936
< http_port 172.16.2.1:3128
---
> http_port 3128
1990,1992d1989
< logformat squid-extended %ts.%03tu %6tr %>a %Ss/%03Hs %<st %rm %ru %un %Sh/%<A %mt "%{Referer}>h" "%{User-Agent}>h"
<
<
2022c2019
< access_log /usr/local/squid/logs/access.log squid-extended
---
> access_log /usr/local/squid/logs/access.log squid
2216c2213
< strip_query_terms off
---
> # strip_query_terms on
3056d3052
< visible_hostname r200a.taosecurity.com

If you worry I'm exposing this to the world, don't worry too much. I find the value of having this information in a place I can find it outweighs the possibility someone will use this data to exploit me. There are much easier ways to do that, I think.

Here are the three log entries. The first record shows a Google query for the term "dia", where the referer was a query for "fbi". The second record is a Firefox prefetch of the first record. The third record is a request for a .gif image.

1213383786.614 255 192.168.2.103 TCP_MISS/200 9263
GET http://www.google.com/search?hl=en&client=firefox-a&rls=
com.ubuntu%3Aen-US%3Aofficial&hs=Hqt&q=dia&btnG=Search -
DIRECT/64.233.169.103 text/html "http://www.google.com/search
?q=fbi&ie=utf-8&oe=utf-8&aq=t&rls=com.ubuntu:en-US:official&client=firefox-a"
"Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.14) Gecko/20060601
Firefox/2.0.0.14 (Ubuntu-edgy)"

1213383786.704 76 192.168.2.103 TCP_MISS/200 2775
GET http://www.google.com/pfetch/dchart?s=DIA -
DIRECT/64.233.169.147 image/gif
"http://www.google.com/search?hl=en&client=firefox-a&rls=com.ubuntu%3A
en-US%3Aofficial&hs=Hqt&q=dia&btnG=Search" "Mozilla/5.0 (X11; U; Linux
i686; en-US; rv:1.8.1.14) Gecko/20060601 Firefox/2.0.0.14 (Ubuntu-edgy)"

1213383786.717 81 192.168.2.103 TCP_MISS/200 1146
GET http://www.google.com/images/blogsearch-onebox.gif -
DIRECT/64.233.169.99 image/gif "http://www.google.com/search?hl=en
&client=firefox-a&rls=com.ubuntu%3Aen-US%3Aofficial&hs=Hqt&q=dia&btnG=Search"
"Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.14) Gecko/20060601
Firefox/2.0.0.14 (Ubuntu-edgy)"
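
If you want to pull fields out of these entries programmatically, something like the following Python might be a starting point. It is only a sketch: the regular expression mirrors the squid-extended logformat from my diff, the names parse_squid_extended and query_terms are my own invention for illustration, and it assumes each entry sits on one physical line (the entries above are wrapped for display). It also assumes the referer and user-agent never contain a double quote.

```python
import re
from datetime import datetime, timezone
from urllib.parse import urlsplit, parse_qs

# One group per field of the squid-extended logformat shown in the diff.
SQUID_EXTENDED = re.compile(
    r'(?P<time>\d+\.\d+)\s+'                            # %ts.%03tu
    r'(?P<elapsed_ms>\d+)\s+'                           # %6tr
    r'(?P<client>\S+)\s+'                               # %>a
    r'(?P<squid_status>\w+)/(?P<http_status>\d{3})\s+'  # %Ss/%03Hs
    r'(?P<bytes>\d+)\s+'                                # %<st
    r'(?P<method>\S+)\s+'                               # %rm
    r'(?P<url>\S+)\s+'                                  # %ru
    r'(?P<user>\S+)\s+'                                 # %un
    r'(?P<hierarchy>\w+)/(?P<peer>\S+)\s+'              # %Sh/%<A
    r'(?P<mime>\S+)\s+'                                 # %mt
    r'"(?P<referer>[^"]*)"\s+'                          # "%{Referer}>h"
    r'"(?P<user_agent>[^"]*)"'                          # "%{User-Agent}>h"
)

# Third entry from the log excerpt above, rejoined onto a single line.
SAMPLE = (
    '1213383786.717     81 192.168.2.103 TCP_MISS/200 1146 '
    'GET http://www.google.com/images/blogsearch-onebox.gif - '
    'DIRECT/64.233.169.99 image/gif '
    '"http://www.google.com/search?hl=en&client=firefox-a'
    '&rls=com.ubuntu%3Aen-US%3Aofficial&hs=Hqt&q=dia&btnG=Search" '
    '"Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.14) '
    'Gecko/20060601 Firefox/2.0.0.14 (Ubuntu-edgy)"'
)


def parse_squid_extended(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = SQUID_EXTENDED.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["http_status"] = int(record["http_status"])
    # Squid logs epoch seconds; convert to an explicit UTC datetime.
    record["when_utc"] = datetime.fromtimestamp(float(record["time"]), tz=timezone.utc)
    return record


def query_terms(url):
    """Return the values of the 'q' parameter in a URL's query string."""
    return parse_qs(urlsplit(url).query).get("q", [])
```

Calling parse_squid_extended(SAMPLE) and passing the resulting "referer" field to query_terms recovers the search term "dia". The UTC conversion matters when you compare these entries with tools that log local time.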

What if you're a security person who can't access Web logs, but you have an NSM sensor in the vicinity? You might use Bro to log this activity, but last year I found something much simpler, written by Jason Bittel: Httpry.

r200a# httpry -h
httpry version 0.1.3 -- HTTP logging and information retrieval tool
Copyright (c) 2005-2008 Jason Bittel
Usage: httpry [ -dhpq ] [ -i device ] [ -n count ] [ -o file ] [ -r file ]
[ -s format ] [ -u user ] [ 'expression' ]

-d run as daemon
-h print this help information
-i device listen on this interface
-n count set number of HTTP packets to parse
-o file write output to a file
-p disable promiscuous mode
-q suppress non-critical output
-r file read packets from input file
-s format specify output format string
-u user set process owner
expression specify a bpf-style capture filter

Additional information can be found at:
http://dumpsterventures.com/jason/httpry

In the following example I run Httpry against a trace of the traffic taken when I visited the site shown in the Squid logs earlier.

r200a# httpry -i bge0 -o /tmp/httprytest3.txt -q -u richard
-s timestamp,source-ip,x-forwarded-for,direction,dest-ip,method,host,
request-uri,user-agent,referer,status-code,http-version,reason-phrase
-r /tmp/test3.pcap
r200a# cat /tmp/httprytest3.txt

# httpry version 0.1.3
# Fields: timestamp,source-ip,x-forwarded-for,direction,dest-ip,method,host,
request-uri,user-agent,referer,status-code,http-version,reason-phrase

06/13/2008 15:03:06 68.48.240.186 - > 64.233.169.103
GET www.google.com /search?hl=en&client=firefox-a&rls=com.ubuntu
%3Aen-US%3Aofficial&hs=Hqt&q=dia&btnG=Search Mozilla/5.0
(X11; U; Linux i686; en-US; rv:1.8.1.14) Gecko/20060601 Firefox/2.0.0.14
(Ubuntu-edgy) http://www.google.com/search?q=fbi&ie=utf-8&
oe=utf-8&aq=t&rls=com.ubuntu:en-US:official&client=firefox-a -
HTTP/1.0 -

06/13/2008 15:03:06 64.233.169.103 - < 68.48.240.186
- - - - - 200 HTTP/1.0 OK

06/13/2008 15:03:06 68.48.240.186 192.168.2.103 > 64.233.169.147
GET www.google.com /pfetch/dchart?s=DIA Mozilla/5.0
(X11; U; Linux i686; en-US; rv:1.8.1.14) Gecko/20060601 Firefox/2.0.0.14
(Ubuntu-edgy) http://www.google.com/search?hl=en&client=
firefox-a&rls=com.ubuntu%3Aen-US%3Aofficial&hs=Hqt&q=dia&btnG=Search -
HTTP/1.0 -

06/13/2008 15:03:06 68.48.240.186 192.168.2.103 > 64.233.169.99
GET www.google.com /images/blogsearch-onebox.gif Mozilla/5.0
(X11; U; Linux i686; en-US; rv:1.8.1.14) Gecko/20060601 Firefox/2.0.0.14
(Ubuntu-edgy) http://www.google.com/search?hl=en&client=
firefox-a&rls=com.ubuntu%3Aen-US%3Aofficial&hs=Hqt&q=dia&btnG=Search -
HTTP/1.0 -

06/13/2008 15:03:06 64.233.169.147 - < 68.48.240.186
- - - - - 200 HTTP/1.0 OK
06/13/2008 15:03:06 64.233.169.99 - < 68.48.240.186
- - - - - 200 HTTP/1.0 OK

As you can see, Httpry logs requests and replies as separate records. The first two records appear as request then reply, although the last four records appear as request, request, reply, reply.
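
Because replies can trail their requests like this, you may want to pair them up after the fact. Here is a rough Python sketch of one way to do it. It assumes the Httpry output file puts the "# Fields:" header and each record on a single physical line and separates fields with tabs (the listing above is wrapped and spaced for display). The function names read_httpry and pair_requests are hypothetical. Matching is first-in, first-out per client/server address pair, which is only a heuristic because the fields I logged do not include ports.

```python
from collections import defaultdict, deque


def read_httpry(lines):
    """Yield one dict per record, keyed by the names in the '# Fields:' header."""
    fields = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("# Fields:"):
            fields = [name.strip() for name in line[len("# Fields:"):].split(",")]
        elif line and not line.startswith("#") and fields:
            # Assumption: fields within a record are tab-separated.
            yield dict(zip(fields, line.split("\t")))


def pair_requests(records):
    """Yield (request, reply) tuples; request is None if no match is pending."""
    pending = defaultdict(deque)  # (client ip, server ip) -> queued requests
    for record in records:
        if record["direction"] == ">":
            key = (record["source-ip"], record["dest-ip"])
            pending[key].append(record)
        elif record["direction"] == "<":
            # A reply's source is the server and its destination is the client.
            queue = pending[(record["dest-ip"], record["source-ip"])]
            request = queue.popleft() if queue else None
            yield request, record
```

With the last four records above, this pairs the /pfetch/dchart request with the reply from 64.233.169.147 and the blogsearch-onebox.gif request with the reply from 64.233.169.99. Several outstanding requests to the same server would be matched in arrival order, which may not reflect reality.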

Although I first tried Httpry straight from the source code, in this case I tested an upcoming FreeBSD port created by my friend WXS. If you give Httpry a try, let me know what you think and how you like to invoke it on the command line. I plan to daemonize it in production and run it against a live interface, not traces.
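
For the daemonized case, a small Python helper that assembles the command line might look like the sketch below. It uses only options listed in the help output above (-d, -q, -i, -o, -u, and a trailing capture expression). The function name, the log path /var/log/httpry.log, and the filter 'tcp port 80' are placeholders of mine, not settings I have tested in production.

```python
import shlex


def httpry_daemon_argv(interface, logfile, user, bpf="tcp port 80"):
    """Build an argv list that runs httpry as a daemon on a live interface."""
    return [
        "httpry",
        "-d",             # run as daemon
        "-q",             # suppress non-critical output
        "-i", interface,  # listen on this interface
        "-o", logfile,    # write output to a file
        "-u", user,       # set process owner
        bpf,              # bpf-style capture filter (placeholder value)
    ]


def as_shell_command(argv):
    """Render an argv list as a copy-and-paste shell command."""
    return shlex.join(argv)
```

For example, as_shell_command(httpry_daemon_argv("bge0", "/var/log/httpry.log", "richard")) yields a command you could paste into a startup script. Keeping the arguments as a list also makes it easy to hand them to subprocess.run without shell quoting problems.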

5 comments:

Anonymous said...

hi, richard:

from browsing the source, it appears httpry doesn't do any defragmentation or desegmentation of TCP/IP frames, so it's vulnerable to all the usual evasions.

Barry said...

Also see
http://www.unixwiz.net/tools/websnarf.html
which seems to have the advantage of single-line logging.

Richard Bejtlich said...

Barry, Websnarf sits on port 80 to log traffic. It replaces a Web server.

Anonymous said...

see http://justniffer.sourceforge.net/

It performs defragmentation