Monday, November 01, 2010

Collage: Defeating Censorship [aka Security] with User-Generated Content

The Economist article Anti-censorship: Hidden truths; A new way of beating the web’s censors brought a system called "Collage" to my attention. Collage, a project by Sam Burnett, Nick Feamster, and Santosh Vempala, is described this way on its project site:

We have developed Collage, which allows users to exchange messages through hidden channels in sites that host user-generated content.

Collage has two components: a message vector layer for embedding content in cover traffic; and a rendezvous mechanism to allow parties to publish and retrieve messages in the cover traffic.

Collage uses user-generated content (e.g., photo-sharing sites) as “drop sites” for hidden messages.

To send a message, a user embeds it into cover traffic and posts the content on some site, where receivers retrieve this content using a sequence of tasks.

Collage makes it difficult for a censor to monitor or block these messages by exploiting the sheer number of sites where users can exchange messages and the variety of ways that a message can be hidden. Our evaluation of Collage shows that the performance overhead is acceptable for sending small messages (e.g., Web articles, email).

Applications use Collage to send and receive messages, by hiding these messages inside user-generated cover content (e.g., images, tweets, etc.) and publishing them on user-generated content hosts like Flickr or Twitter. At the receiver, Collage fetches the cover content from content hosts and decodes the message. By hiding data inside user-generated content as they traverse the network, Collage escapes detection by censors.
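
To make the "message vector" idea in that description concrete, here is a minimal sketch of one way to hide bytes inside cover content. It is not Collage's actual code: the function names `embed` and `extract` are hypothetical, the cover is treated as a plain byte string (say, raw pixel values), and the least-significant-bit technique is simply the most familiar example of steganographic embedding.

```python
def embed(cover: bytes, message: bytes) -> bytes:
    """Hide message in the least significant bit of each cover byte.

    Assumption: cover is uncompressed data (e.g. raw pixel values), so
    flipping the low bit of a byte is visually negligible.
    """
    # Prefix the message with a 4-byte length so the receiver knows when to stop.
    payload = len(message).to_bytes(4, "big") + message
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover too small for message")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # overwrite only the lowest bit
    return bytes(stego)


def extract(stego: bytes) -> bytes:
    """Recover a message previously hidden with embed()."""

    def read_bytes(start_bit: int, count: int) -> bytes:
        out = bytearray()
        for b in range(count):
            value = 0
            for k in range(8):
                value = (value << 1) | (stego[start_bit + b * 8 + k] & 1)
            out.append(value)
        return bytes(out)

    length = int.from_bytes(read_bytes(0, 4), "big")
    return read_bytes(32, length)
```

Calling `extract(embed(cover, b"hello"))` on a sufficiently large cover returns `b"hello"`, and no cover byte changes by more than 1. A real system would also have to survive whatever recompression the content host applies, which this sketch ignores.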


Freedom FTW, right? Let's rewrite this description from the point of view I care more about:

We have developed Collage, which allows intruders to exchange messages through hidden channels in sites that host user-generated content.

Collage has two components: a message vector layer for embedding content in cover traffic that will fly past your proxies and other filtering mechanisms; and a rendezvous mechanism to allow parties to publish and retrieve messages in the cover traffic.

Collage uses user-generated content (e.g., photo-sharing sites) as “drop sites” for hidden messages, like command and control traffic, or stolen data.

To send a message, a user embeds it into cover traffic and posts the content on some site, where receivers retrieve this content using a sequence of tasks that defenders will not recognize as malicious.

Collage makes it difficult for incident detection and response teams to monitor or block these messages by exploiting the sheer number of sites where users can exchange messages and the variety of ways that a message can be hidden. Our evaluation of Collage shows that the performance overhead is acceptable for sending small messages (e.g., Web articles, email), perfect for command and control instructions.

Malware or backdoors use Collage to send and receive messages, by hiding these messages inside user-generated cover content (e.g., images, tweets, etc.) and publishing them on user-generated content hosts like Flickr or Twitter that are not blocked by reputation systems, which some security vendors think solve the world's problems. At the receiver, Collage fetches the cover content from content hosts and decodes the message. By hiding data inside user-generated content as they traverse the network, Collage escapes detection by organizations trying to protect their data.


I wonder if I'm not the only one thinking this way?

10 comments:

Mike said...

Well, I think you have proven that Collage is morally neutral.

That is, if used by the oppressed in places like China and Iran to communicate, then it's a good tool.

If used as a covert communication channel by the bad guys, then it's bad.

The question is, how does one allow the good uses and not the bad, or how does one punish the bad uses but not the good?

David said...

Of course you aren't the only one thinking this way. Now, what's the best way to detect this type of C&C traffic?

I would start by looking for very frequent image posting or very infrequent but routine image posting behaviour.
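
A rough sketch of that heuristic, assuming you can already extract per-account posting timestamps (in seconds) from proxy logs. The function name `posting_profile` and every threshold here are hypothetical and would need tuning against real traffic.

```python
from statistics import mean, pstdev


def posting_profile(timestamps, max_per_hour=20.0, max_jitter=0.1, min_posts=5):
    """Classify one account's image-posting pattern.

    timestamps: posting times in seconds (any epoch).
    Thresholds are illustrative guesses, not measured values.
    """
    ts = sorted(timestamps)
    if len(ts) < min_posts:
        return "insufficient data"
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    avg_gap = mean(gaps)
    if avg_gap == 0:
        return "very frequent"
    # Case 1: posting far more often than a typical user would.
    if 3600.0 / avg_gap > max_per_hour:
        return "very frequent"
    # Case 2: infrequent but machine-like regularity (low relative jitter).
    if pstdev(gaps) / avg_gap < max_jitter:
        return "routine"
    return "unremarkable"
```

An account posting exactly once a day would come back as "routine", while one posting every minute would come back as "very frequent". A sender who adds random delays would slip past the second check, so this is a starting point at best.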

Shirkdog said...

Just another form of image steganography, a rather interesting one though.

gsirt said...

Hey Richard,

What do you think is the best way to 'defend' against this technique? It seems there would be no way to defend against it with existing technology (to my knowledge). Maybe a security policy that limits web access (no access to social sites such as Facebook, Twitter, etc.).

mab

Anonymous said...

Not just you...as I read through the description it was a head slapper. Why didn't I think of this medium? :)

Alisdair McKenzie said...

Seems to me it could be used to bypass an entity's DLP measures!

Anonymous said...

Here's a post along similar lines, where the author uses Amazon's EC2 service so he can do some number crunching, but as a special benefit, he gets to thwart the evil IT security folks.

VivekRajan said...

Wow, all this in just 650 lines of Python + Selenium.

I guess if someone adapted Collage for a C&C app, it is goodbye to IP/DNS blacklisting and Hello to Flickr/Twitter user blacklisting.

The Family Griot said...

Great blog 'Capt' Bejtlich! I stumbled upon this while looking for some old AIA AFCERT info. Looking forward to further reading!

Peter Joseph said...

I am also certainly thinking this way. While this blog entry certainly provokes thought, the tone could come across in the same light as those who see Metasploit and similar tools as always bad things.

These "new" concepts and tools need to be out in the open so we can think of ways to counter them!