Today in #snort-gui on irc.freenode.net, Marty Roesch of Snort fame explained how Snort handles stream reassembly:
roesch: when stream4 is doing its thing it queues the tcp segments as they come in
roesch: in stream4 we actually queue the entire packet and keep a pointer to the payload to manage reassembly
roesch: "flushing" is what happens when we accumulate a certain number of bytes on a stream that's in excess of the "flush point" for that stream
roesch: when we flush, we reassemble the segments into a pseudopacket and run it back thru the preprocessor stack and detection engine
roesch: if there's a detect, we ask stream4 to log all the queued *packets* on the stream
roesch: the first packet gets identified as the attack packet and the rest of them are tagged off of that event
roesch: so if you're detecting on "foobar" and it's been spread across three packets as "fo" "ob" "ar" then you're going to get one event packet and two tagged packets
roesch: this was in 2.1.x or maybe 2.2
roesch: the idea is that we don't want to log the pseudopacket since it's pretty much "inadmissible" from an evidence standpoint
qru: roesch: Yeah, I always hated that thing. What do you do w/the pseudo packet then?
roesch: we chuck it
roesch: as an analyst you'll need to have something that can reassemble the segments and present them to you
roesch: which in theory is pretty easy but in implementation is a pain if you've got an evasive attacker
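To make the flow Marty describes concrete, here is a minimal Python sketch of the queue, flush, detect, and log steps. It is not Snort's code. The names `Segment`, `Stream`, `add`, and `flush`, the substring match standing in for the detection engine, and the 5-byte flush point are all assumptions made for illustration. It also ignores overlapping segments, retransmissions, and sequence number wraparound, and it stores only the payload where stream4 queues the entire packet.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    seq: int          # TCP sequence number of the first payload byte
    payload: bytes    # payload carried by the original packet


@dataclass
class Stream:
    flush_point: int                            # hypothetical threshold, in bytes
    queue: list = field(default_factory=list)   # original segments, in arrival order
    queued_bytes: int = 0

    def add(self, segment, signature):
        """Queue a segment; flush once queued bytes exceed the flush point."""
        self.queue.append(segment)
        self.queued_bytes += len(segment.payload)
        if self.queued_bytes <= self.flush_point:
            return []
        return self.flush(signature)

    def flush(self, signature):
        """Build a pseudopacket, inspect it, then discard it."""
        ordered = sorted(self.queue, key=lambda s: s.seq)
        pseudopacket = b"".join(s.payload for s in ordered)
        logged = []
        if signature in pseudopacket:
            # Log the original packets, never the pseudopacket itself:
            # the first is the event packet, the rest are tagged.
            logged = [("event" if i == 0 else "tagged", s)
                      for i, s in enumerate(ordered)]
        self.queue, self.queued_bytes = [], 0
        return logged


if __name__ == "__main__":
    stream = Stream(flush_point=5)
    for seg in (Segment(1, b"fo"), Segment(3, b"ob"), Segment(5, b"ar")):
        for label, packet in stream.add(seg, b"foobar"):
            print(label, packet)
```

Feeding in the "fo" "ob" "ar" example from the chat, the third segment pushes the stream past the assumed flush point, the pseudopacket "foobar" matches, and the sketch returns one event packet followed by two tagged packets.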
This explanation matters for several reasons. First, you need to understand how your IDS works. If you don't, you're less likely to trust the alert data it generates. If you don't trust IDS alerts, why are you collecting them?
Second, this stream implementation represents a trade-off between capability and performance. Sensors do not have unlimited capacity to capture and reassemble traffic. Anything you can do to make the traffic stream cleaner for your sensor, like packet scrubbing, helps.
Third, Marty demonstrates that the pseudopacket Snort presents to an analyst may not be an actual packet that crossed the wire. If an analyst wants to see exactly what passed by the sensor, she must turn to full content data collected independently of Snort's alert generation.
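Marty's remark that an analyst needs "something that can reassemble the segments" can be illustrated with a small Python sketch. It assumes the segments have already been extracted from full content data as (sequence number, payload) pairs in arrival order. The function name `reassemble` and the policy that the first copy of a byte wins are assumptions for illustration only. Real TCP stacks differ in how they resolve overlapping segments, which is part of why an evasive attacker makes this painful in practice.

```python
def reassemble(segments, initial_seq):
    """Rebuild a payload from (seq, payload) pairs given in arrival order.

    Assumed overlap policy: the first copy of a byte to arrive wins.
    Reassembly stops at the first gap in the byte stream.
    """
    stream = {}
    for seq, payload in segments:
        for i, byte in enumerate(payload):
            # Offset from the first payload byte, tolerating 32-bit wraparound.
            offset = (seq + i - initial_seq) % 2**32
            stream.setdefault(offset, byte)

    data = bytearray()
    offset = 0
    while offset in stream:
        data.append(stream[offset])
        offset += 1
    return bytes(data)


if __name__ == "__main__":
    # Out-of-order arrival of the "fo" "ob" "ar" example.
    print(reassemble([(105, b"ar"), (101, b"fo"), (103, b"ob")], initial_seq=101))
```

A different overlap policy, such as letting the last copy win, would require only replacing `setdefault` with a plain assignment. Choosing the policy that matches the target host is the hard part.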