Why Collect Full Content Data?
I am writing a SANS Gold paper on a custom full packet capture system using Linux and tcpdump. It is for the GSEC Certification, so my intent is to cover the reasons to do full packet capture and the basic setup of a system (information that wasn't readily available when I set up my own system)...
I am already referencing The Tao of Network Security Monitoring.
These are the questions I came up with, based on questions peers have asked me...
Here are the questions, followed by my answers. Most of this is covered in my previous books and blog posts, but for the sake of brevity I'll try posting short, stand-alone responses.
- As an information security analyst in today's threat landscape, why would I want to do full packet capture in my environment? What value does it have?
Full content data, meaning captured full packets, provides the most flexibility and granularity when analyzing network-centric data. Unlike various forms of log data, full content data, if properly collected, is the actual data that was transferred -- not a summarization, representation, or sample.
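For example, on a Linux sensor running tcpdump, making sure the snapshot length covers the whole packet is what separates full content data from truncated headers; the interface and path below are illustrative:

```
# Capture entire packets, not just headers: -s 0 requests the maximum
# snapshot length, and -n skips name resolution during capture.
tcpdump -i eth1 -n -s 0 -w /data/pcap/full.pcap
```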
- Where should I place a full packet capture system on my network - are ingress/egress points sufficient?
I prioritize collection locations as follows:
- Collect where you can see the true Internet destination IP address for traffic of interest, and where you can see the true internal source IP address for traffic of interest. This may require deploying two traffic access methods with two sensors; so be it.
- Collect where you can see traffic to and from your VPN segment. Remember the previous IP address requirements.
- Collect where you can see traffic to and from business partners or through "third party gateways." You need to acquire the true source IP, but you may not be able to acquire the true destination IP if the business partner prevents collection behind the NAT or security devices that obscure it.
- Collect where your business units exchange traffic. This is more of a concern for larger companies, but you want to see the true source and destination IPs (if possible) of internal traffic as it crosses business boundaries.
- Consider cloud or hosted vendors who enable collection near Infrastructure-as-a-Service platforms used by your company.
- What advantages are there to creating a custom server with open source tools (such as a server running Linux and capturing with tcpdump) as opposed to buying a commercial solution (like Solera or Niksun)?
A custom or "open" platform enables analysts to deploy the sorts of tools they need to accomplish their security mission. Closed platforms require the analyst to rely on the information provided by the vendor.
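As a minimal sketch of the kind of flexibility an open platform offers, here is one way to run tcpdump as a rolling full content collector with a fixed disk budget; the interface, path, and sizes are assumptions to adjust for your environment:

```
# Ring buffer: rotate files at roughly 1000 million bytes each (-C)
# and keep at most 100 files (-W), overwriting the oldest -- about a
# 100 GB rolling window of full content data.
tcpdump -i eth1 -n -s 0 -w /data/pcap/buffer -C 1000 -W 100
```

On a closed platform, by contrast, you accept whatever retention and access model the vendor ships.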
- Now that I have full packet data, what kind of analysis goals should I have to address advanced threats and subtle attacks?
The goal for any network security monitoring operation is to collect and analyze indicators and warnings to detect and respond to intrusions. Your ultimate role is to detect, respond to, and contain adversaries before they accomplish their mission, which may be to steal, alter, or destroy your data.
- Any other advice for an analyst just getting started with full packet capture systems and analyzing the data?
Rarely start with full content data. Don't dump a ton of traffic into Wireshark and start scrolling around. I recommend working with session data (connection logs) and application-specific logs (HTTP, DNS, etc.) to identify sessions of interest, then examining the content if necessary to validate your suspicions.
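For example, once session data points you to a conversation of interest, a read filter can carve just that traffic out of a large capture for closer inspection; the address and port are placeholders:

```
# Extract one suspicious conversation from a large capture, producing
# a small pcap that Wireshark can open comfortably.
tcpdump -n -r /data/pcap/full.pcap -w session.pcap \
  'host 192.0.2.15 and tcp port 443'
```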
I could write a lot more on this topic. Stay tuned.
Comments
Second, compromises are seldom solitary events. Full network capture makes it possible to search backward for events that have already happened. IDS is the easiest way to find events, but signatures can only be applied after a threat is known. Even if the lag is only hours or a day, the compromise may already have occurred by the time the signature is deployed.
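As a rough illustration of that retrospective search, assuming stored capture files and a newly published indicator (the address here is a placeholder):

```
# Sweep stored captures for an indicator learned only today; full
# content data lets you test new intelligence against old traffic.
for f in /data/pcap/*.pcap; do
  if tcpdump -n -r "$f" 'host 203.0.113.10' 2>/dev/null | grep -q .; then
    echo "indicator seen in $f"
  fi
done
```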
The primary problem with full network capture at the enterprise level is the expense. Disk storage that is sufficiently large and fast is costly. Aggregating traffic and ensuring that you're capturing without significant loss at line speed can be difficult. Finally, some type of index is needed, which generally necessitates a commercial solution. 300 TB of network traffic is great, but you can't use it unless you have an index to grab the data you want.
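A back-of-the-envelope sizing shows why, assuming a sustained 1 Gbps of monitored traffic:

```
1 Gbps sustained        ≈ 125 MB/s
125 MB/s × 86,400 s/day ≈ 10.8 TB/day
300 TB ÷ 10.8 TB/day    ≈ 28 days of retention
```

Real links are rarely saturated around the clock, so actual retention tends to run longer, but the indexing problem remains either way.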
Similar to what you write, it's much easier to get started monitoring only HTTP or DNS. Likewise, if possible, prioritize network segments such as those with PII, those that require PCI compliance, or any other "important" network. I would also suggest beginning with netflows. They're much lighter and will help scope full packet capture requirements. Additionally, you'll get some idea of what you want to capture when netflows indicate you might have a problem and you realize you need more information. Those are the networks on which you want to begin full capture.
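One way to start small along those lines is a capture filter that records only the protocols you care about; the interface is an assumption:

```
# Capture only web and DNS traffic to keep volume manageable while
# learning; broaden the filter as requirements become clearer.
tcpdump -i eth1 -n -s 0 -w /data/pcap/http-dns.pcap \
  'tcp port 80 or port 53'
```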
Overview First | Zoom and Filter | Details on Demand
I apply this a lot to analyze network traffic. I start with an overview visualization of any kind of log (or multiple logs). Then I pivot around, explore, and find interesting areas. Once I find an interesting area, I identify the network captures that were generated around that time and among the machines involved.
I find visualization to be the best way to start understanding large amounts of data and to home in on interesting areas. The remaining problem is how you visualize your data. But that's another topic, and it is what pixlcloud is focused on.
Regarding sensor placement, what do you think about full packet capture for strictly internal traffic? Should an intrusion succeed, I would think this would help track an attacker's internal movements and identify the full scope of the compromise.
One issue I could see would be duplication of captures. For example, if all traffic on an internal switch is captured via port mirroring, and traffic at an egress point is also being captured, then all outbound traffic would be captured twice: once on the switch and again as it leaves the gateway.
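If overlapping sensors are unavoidable, one option is to handle the duplicates at analysis time; for example, the Wireshark command-line tools can merge captures from both vantage points and drop duplicate packets (file names are placeholders):

```
# Merge captures from the switch and the egress sensor, then remove
# duplicate packets seen within editcap's default comparison window.
mergecap -w merged.pcap switch.pcap egress.pcap
editcap -d merged.pcap deduped.pcap
```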
Of course, identifying and dealing with the initial attack vector before the attacker becomes entrenched should be the goal, but that may not always be possible.
That said, it can be intimidating for those who are just starting out. Using HTTP as a stepping stone is a fantastic approach: learn what normal network traffic looks like, then examine the capture data from a compromise.