Tuesday, June 24, 2008

Pascal Meunier Is Right About Virtualization

I love Pascal Meunier's post Virtualization Is Successful Because Operating Systems Are Weak:

It occurred to me that virtual machine monitors (VMMs) provide similar functionality to that of operating systems...

What it looks like is that we have sinking boats, so we’re putting them inside a bigger, more powerful boat, virtualization...

I’m now not convinced that a virtualization solution + guest OS is significantly more secure or functional than just one well-designed OS could be, in theory...

I believe that all the special things that a virtualization solution does for functionality and security, as well as the “new” opportunities being researched, could be done as well by a trustworthy, properly designed OS.


Please read the whole post to see all of Pascal's points. I had similar thoughts on my mind when I wrote the following in my post NSM vs Encrypted Traffic, Plus Virtualization:

[R]eally nothing about virtualization is new. Once upon a time computers could only run one program at a time for one user. Then programmers added the ability to run multiple programs at one time, fooling each application into thinking that it had individual use of the computer. Soon we had the ability to log multiple users into one computer, fooling each user into thinking he or she had individual use. Now with virtualization, we're convincing applications or even entire operating systems that they have the attention of the computer...

Thanks to those who noted the author was Pascal Meunier and not Spaf!

9 comments:

David said...

Ulrich Drepper, the Linux libc maintainer, wrote about this in depth on his blog.

BadTux said...

A couple of comments. First: unless Spaf has changed his name to Pascal Meunier, that wasn't a posting by Spaf. Oops!

Second: I think Pascal is on to something. Machine-level virtualization adds yet more complexity, and merely pushes the vulnerabilities upwards. You get isolation between applications, but at the cost of greatly increased maintenance -- instead of one OS, you have to maintain multiple OS's (one per virtual machine). Look back at the Verizon study and its patching-policy questions, and you realize that this multiplies the patching problem. The question is whether we can retrofit current operating systems with the sort of OS-layer features that would let us avoid the complexity of hardware-layer virtualization while still creating the bulkheads between applications (and the OS!) that we agree are necessary, given the poor application-level security of commonly available applications. (Watch this space; it's a scenario under much discussion in circles like the Jericho Forum.)

To a certain extent we've been going backwards since the failure of the Multics project to produce a system competitive on a price-performance basis. Compare the security model of Multics (which, as far as I know, never suffered a security breach) with that of any currently extant OS. Now, I am quite cognizant of the many failings of Multics as an OS (mostly due to the antiquated GE hardware it was implemented upon, which forced numerous compromises in system design), but the point is that we do not have a single OS in common use today which implements what we knew was good security back in 1975. Thirty years, and no progress. Computer science is an oxymoron; we do not learn from the past, we just keep re-implementing its mistakes.

Richard Bejtlich said...

BadTux -- thank you very much -- fixed.

jbmoore said...

I disagree with the focus. The problem is not with the operating system per se; it lies with the applications and device drivers. With Linux and BSD you can secure the OS and still have gaping holes via the installed applications. The OS will run fine on multicore hardware now (latest Windows, FreeBSD, and Linux), but many applications aren't optimized for multiple processors. Also, virtualization lets you run multiple instances of application servers on one hardware platform. The inherent problem has always been that hardware development outpaces software development, and that software development is constrained by the current hardware, not the hardware on the market six months from now.

Seventy-five per cent of Windows system crashes are due to buggy device drivers (mostly third-party drivers). Linux is likely less stable than, say, Solaris 10 for the same reason; Solaris 10 is more stable because the few drivers it uses are well written. This is not the fault of the Linux kernel developers but of the device-driver developers, over whom Microsoft and the various kernel-developing communities have little or no control. There is a device-driver subsystem for Linux called Nooks that would eliminate device-driver crashes, but it has not been picked up by the Linux kernel community. Besides, VMware's ESX server runs on top of a minimal, specially written Red Hat kernel that runs only on certain hardware. You can't install VMware ESX server on a SATA system; the install will fail because it doesn't have the device drivers to access the hard drives.

Virtualization was actually pioneered by IBM for use on its mainframes in the mid-1960's. Virtualization on PCs only took off when the computing hardware became more powerful and cheap. Virtualization just increases efficiency: you get more bang for the buck out of the hardware. Restoring virtual machines is usually quicker as well, so you are decreasing administrative costs, and with snapshots you are minimizing data loss. A virtual Windows guest with a Linux VMware host is likely more secure than a real Windows platform from the standpoint that data loss will likely be less. In that sense, it will be a bit more "secure"; it fits the definition of security as a means of minimizing physical or monetary loss. Is it the best security solution? Likely not, but security is a trade-off between economics, risks, and productivity.

BadTux said...

JB, it was possible to write a bad application for Multics too. But once said application was written, it could not take advantage of any OS holes to gain superuser access, because it would not have the access to do so.

Let's look at, say, Virtuozzo and FreeBSD jails. These compartmentalize a runtime environment rather than an entire virtual machine. But then also consider the Verizon study. I can tell you from personal experience that if you have five FreeBSD jails running five instances of the FreeBSD runtime environment, and a new release of FreeBSD comes down the pike to fix major security holes, you're in for a s***load of hurt. The end result is that some of those jails don't get patched, and while a jailbreak only breaks that jail, you still have an exploit being exercised. You can't just roll out a patch all in one day; you have to make sure your critical core infrastructure doesn't break. And the more virtual environments you have, the more virtual infrastructure you have, and the harder it becomes. You just multiplied your complexity; congratulations, you be 0wn3d.

Now imagine an OS with provisions to isolate programs from one another to the extent that one failing application gave no opportunity to even see any part of the OS that could be exploited. Instead, we have applications such as Snort which require root access, and which do not allow multiple users on a machine to each run their own instance in their own namespace, completely isolated from other instances of Snort. If those kinds of isolation mechanisms could be built into the OS transparently to Snort, you wouldn't have to update ten copies of Snort for ten departments that are each trying to detect intrusions on their individual networks. You'd update one copy of Snort -- the one on your IDS server with ten Ethernet ports -- and each department's Snort would automatically be restarted in its own isolated environment.

Also note that you can somewhat simulate this today with chroot and jails. The problem is that you are once again duplicating things into the jail. If an OS provided for fine-grained mapping of files into namespaces such that /etc/resolv.conf always got mapped down into the instance namespace while /etc/snort always was unique to the instance, then you could provide real fine-grained isolation that did *not* require you to upgrade the contents of chroot'ed environments every time an exploit was discovered in one of those libraries. Instead, you patch once, and it patches all the environments.
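That mapping idea can be sketched in a few lines. This is a toy model only -- the paths, instance names, and layout here are invented for illustration, not any real mount-namespace or jail API -- but it shows the payoff: shared files resolve to one global copy (patch once, every instance sees the fix), while instance-private files resolve to per-instance copies.

```python
# Toy per-instance namespace resolver (illustrative, not a real OS API).
# Shared paths map to one global backing file; everything else is
# private to the instance that opened it.

SHARED_PATHS = {"/etc/resolv.conf", "/usr/lib/libpcap.so"}

def resolve(instance, path):
    """Map a path as seen inside an instance to its backing file."""
    if path in SHARED_PATHS:
        return "/global" + path              # one copy, patched once for all
    return "/instances/" + instance + path   # unique to this instance

print(resolve("snort1", "/etc/resolv.conf"))  # /global/etc/resolv.conf
print(resolve("snort1", "/etc/snort"))        # /instances/snort1/etc/snort
print(resolve("snort2", "/etc/snort"))        # /instances/snort2/etc/snort
```

Patching the library under /global fixes every instance at once, while each instance keeps its own /etc/snort configuration -- exactly the property the chroot-duplication approach lacks.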

In other words, virtualization in the VMware sense of the word is a dead end. Everything it offers could be offered at a higher level -- with better performance, and with better security due to having only *one* copy of the OS that needs to be patched rather than *dozens* of copies. I will note that while IBM indeed invented VM in the late 60's, most IBM shops don't use it in a production environment -- they're running good ole MVS/CICS unvirtualized. That's because a real DASD runs much faster than a virtual DASD, even when you have hardware support for slicing one DASD into multiple virtual DASDs.

In the meantime, the other things mentioned -- snapshots, for example -- are doable (and done) on the OS level and do not require an external environment to do them. Other things like process migration from one machine in a cluster to another machine in a cluster can also be done at the OS level if you have an OS that has the features to do it. The basic problem is that while we are slowly accumulating the features needed to do all of this -- for example, current Linux 2.6 kernels contain the code to re-map directory accesses (but not file accesses) -- nobody has put it all together.

And with that I must go, for I'm verging perilously close to a line I cannot cross even pseudonymously. Sorry, I cannot go into more detail about what that environment would look like, for reasons that will become clear in a few months...

-BT

Rob Lewis said...

@JBMOORE, (and Pascal Meunier as well)

The original post and the comments have been excellent, obviously coming from folks with a lot more experience and knowledge than myself.

The discussion has brought up the point that virtualization has potential from the point of view of (some) operational efficiency and recovery from polluted machines.

But can I ask, where is the advantage in terms of information-centric security? Granular access control at the data file level on a per user basis has been hard enough as it is, without having to contend with users running rogue virtual machines and doing who knows what with the data.

jbmoore said...

Do you trust your end users or not? Your question comes down to that. If you don't, then you push out the appropriate desktop policies, turn off or remove devices, install monitoring agents on their desktops, and watch your monitoring people's eyes glaze over from the deluge of data, 95% of which is likely false positives. Then there are laptops to contend with.

I don't have any answers. I have to beg for tools and access to programs on my monitoring system at work. My productivity has been constrained by what my Legal Department has dictated not be on my work system, even though those measures can be circumvented by a web-based application. I have to bring a personal laptop running Linux into work to diagnose problems on the network using Wireshark. I have to use a LiveCD on a less locked-down workstation in order to use a hex editor to view and clean up binary data I encounter on the first workstation. Meanwhile, I wonder if the Symantec AV client on both workstations is catching all of the malicious web-based malware my systems might be exposed to, and I'm guessing the answer is a firm no.

If the AV vendors are drowning in malware, where do we stand? There's Tripwire and other host-based file-integrity applications, and desktop monitoring apps have been out for a while. The problem with all of them is that, of the thousands of files on a typical system, only a few change at a time. Is a change due to a software update, a normal system process, an administrative configuration change, or a malicious event? If you use a signature-based scanning/monitoring solution, you have to keep the signature databases up to date, which is no easy feat these days.
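The core of such a host-based integrity check is simple enough to sketch. This is a toy, in-memory version of the Tripwire idea (the file names and contents are invented): baseline a set of files by hash, then report which ones changed. It detects *that* something changed; deciding *why* -- update, admin action, or malware -- is the hard part the comment describes.

```python
import hashlib

# Toy file-integrity monitor. Real tools such as Tripwire also track
# permissions, sizes, and timestamps, and protect the baseline database.

def baseline(files):
    """Record a SHA-256 hash for each file (name -> contents)."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}

def changed(files, base):
    """Return the names of files whose current hash differs from the baseline."""
    return sorted(name for name, data in files.items()
                  if hashlib.sha256(data).hexdigest() != base.get(name))

files = {"/bin/login": b"original binary", "/etc/passwd": b"root:x:0:0"}
base = baseline(files)
files["/bin/login"] = b"trojaned binary"   # simulate a malicious modification
print(changed(files, base))                # ['/bin/login']
```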

People have tried building immune systems for operating systems, but I'm guessing they are still brittle. Living systems are fuzzy from the molecular level up, with resilience and redundancy built in at every level. DNA is double stranded: the information on one strand can be used to repair the other, and we carry two copies of each chromosome. There are DNA repair mechanisms to correct errors or fix excess damage; if they fail, a suicide switch is triggered, causing cell death, and failure of the suicide switch can lead to cancer, at which point the immune system becomes involved. Modern computers have none of this.

The computer equivalent of a human cell would be to give everyone their own two-node cluster. The "inactive" node would compare its file structure to the active node's, be smart enough to tell normal from bad, and help the active node resist attack. Should the active node die, the inactive node comes online; if it becomes infected, it sends out a call for help and disconnects from the network automatically. No one has built that much redundancy or intelligence into a system -- possibly the economics are against it. Following the biological analogy, though, we'll likely need either nodes sharing state information about their health, or nodes sending that state to a central monitoring node that decides whether to shut down unhealthy ones. That will be some sort of distributed computing environment, with the intelligence built into the hardware (FPGAs) and software. Or perhaps we'll migrate to some sort of hybrid biological/silicon computational system, in which case your computer really can catch a cold or flu -- but it would be much harder for the bad guys to code the malware.

BadTux said...

"Then there's laptops to contend with." And let us not forget Internet-connected smart phones: if you can sync your phone book between computer and smart phone, you can move data from that computer to the Internet. Are you going to embargo all smart phones? Perhaps the NSA or CIA can get away with that, but our sales force would riot if forced to give up their Blackberries or Treos.

Thus the Jericho Forum's somewhat pretentious declaration that "perimeter security is dead." Their point is that it is so easy for things to get through the perimeter nowadays that the only resilient network is one with its own "immune system" inside the perimeter as well. VLANs, policy servers between VLANs, and IDS sensors running all over the place appear to be part of their solution; the rest... ah, but I should not be doing your research for you :-).

Regarding redundant systems and immune systems internal to the OS that can "heal" problems: it is harder than you would think -- but also easier than you think. There are no fundamentally new technologies involved, though it typically requires OS modifications to get the best results. The really big problem is that these immune systems tend to develop "allergies" and start creating failures where, absent the "immune system," the system would otherwise have stayed up. This is a Hard Problem, and one we spend a ton of time and effort on, revving down the "immune system" of our OS so that it merely grumbles rather than going on the attack when something noteworthy but not fatal happens. This is one reason current general-purpose OS's ship with such limited self-healing enabled out of the box: unless you know exactly what applications will run on the OS and exactly what their failure modes are, we simply do not yet know how to write a good general-purpose "immune system" capable of ferreting out and healing breakage without causing an allergic reaction as bad as the disease. But once again we are drifting away from virtualization, though perhaps not from network security...

-BT

Rob Lewis said...

@JB Moore,

"Do you trust your end users or not? Your question comes down to that."

People hire staff with barely a background check all the time, and give them the keys to the kingdom. Should they trust their end users? Even the most trusted employee can be compromised, depending on the situation and the level of desperation involved.

"watch your monitoring people's eyes glaze over from the deluge of data, 95% of which is likely false positives."

We provide a default-deny environment, in which rogue activity, network noise, and false positives tend toward zero.

"If the AV vendors are drowning in malware, where do we stand...building immune systems for operating systems..."

Interestingly enough, we have had success with a deterministic kernel-level policy enforcer that acts as a behavior enforcer. Unless malware is able to authenticate as a bona fide user role, it falls off the system as a non-event. This does not act as an immune system, but it does play a role, as the behavior-enforcement component, in work we are doing on self-healing systems.


@badtux,

"Then there's laptops to contend with."

"...Are you going to embargo all smart phones?"

Implementing information-centric security, in the form of fine-grained access controls at the data-file level for all users, provides a solution here. In our schema, any object the kernel can recognize can be ranked for secrecy, just like users and data. If endpoint devices are ranked lower for secrecy than mission-critical data, even authorized users cannot use that data with those devices in an unauthorized fashion (e.g., they cannot send anything out by email or IM, because the network card carries a lower secrecy ranking than the document).
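The underlying rule is the classic "no write-down" check from multilevel security. The sketch below is purely illustrative -- the rankings and object names are invented, and this is not a depiction of any actual product -- but it shows the shape of the policy: data may only flow to objects ranked at or above its own secrecy level.

```python
# Toy secrecy-lattice check (Bell-LaPadula "no write-down" style).
# All rankings and names here are hypothetical examples.
SECRECY = {
    "mission_critical_doc":   3,  # highly sensitive data
    "network_card":           1,  # endpoint device, ranked low
    "usb_stick":              1,
    "local_encrypted_store":  3,
}

def may_write(data, target):
    """Allow data to flow only to targets ranked at or above its level."""
    return SECRECY[target] >= SECRECY[data]

print(may_write("mission_critical_doc", "network_card"))           # False: email/IM blocked
print(may_write("mission_critical_doc", "local_encrypted_store"))  # True
```

Because the network card's ranking is below the document's, any attempt to push the document out over the network fails the check, regardless of which user (or which virtual machine) initiates it.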

"...the Jericho Forum's somewhat pretentious declaration that 'Perimeter security is dead.'"

I think they really mean that it is inadequate. Protecting the infrastructure protects the containers, not the contents.

Again, information-centric security allows de-perimeterization, because you retain total control over the flow of business data.

I think this may reinforce Pascal Meunier's original position about what happens if you strengthen the controls of the OS. Yet I believe these principles can be applied to virtsec as well.