Monday, August 21, 2006

Minix 3: A Sign of the Future?

The April 2006 issue of ;login; features Modular System Programming in MINIX 3 (.pdf, free). I'd like to share some excerpts from this article that struck a chord.

If you ask ordinary computer users what they like least about their current operating system, few people will mention speed. Instead, it will probably be a neck-and-neck race among mind-numbing complexity, lack of reliability, and security in a broad sense (viruses, worms, etc.).

We believe that many of these problems can be traced back to design decisions made 40 or 50 years ago. In particular, the early designers' goal of putting speed above all else led to monolithic designs with the entire operating system running as a single binary program in kernel mode.

When the maximum memory available to the operating system was only 32K words, as was the case with MIT's first timesharing system, CTSS, multi-million-line operating systems were not possible and the complexity was manageable.

I agree, although trying various tricks to get maximum functionality out of minimum space can introduce its own problems.

In our view, the only way to improve operating system reliability is to get rid of the model of the operating system as one gigantic program running in kernel mode, with every line of code capable of compromising or bringing down the system.

Nearly all the operating system functionality, and especially all the device drivers, have to be moved to user-mode processes, leaving only a tiny microkernel running in kernel mode. Moving the entire operating system to a single user-mode process as in L4Linux [4] makes rebooting the operating system after a crash faster, but does not address the fundamental problem of every line of code being critical.

What is required is splitting the core of the operating system functionality -- including the file system, process management, and graphics -- into multiple processes, putting each device driver in a separate process, and very tightly controlling what each component can do. Only with such an architecture do we have a chance to improve system reliability.

I am not a kernel developer, although I speak regularly with at least one. This argument seems to make sense.
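
To make the "very tightly controlling what each component can do" part concrete for myself, here is a toy sketch in Python. It is not MINIX 3 code: the component names, the ALLOWED_TARGETS table, and the send function are all hypothetical. The only idea it shows is a small trusted layer checking each message against a per-component allowlist before delivering it.

```python
# Hypothetical policy table: which components each sender may message.
# None of these names come from MINIX 3; they are for illustration only.
ALLOWED_TARGETS = {
    "audio_driver": {"kernel"},
    "file_server": {"disk_driver", "kernel"},
}


def send(sender, target, message):
    """Deliver a message only if the policy table permits sender -> target."""
    if target not in ALLOWED_TARGETS.get(sender, set()):
        raise PermissionError(f"{sender} may not message {target}")
    # A real system would queue the message for the target here;
    # this sketch just returns what would have been delivered.
    return (target, message)
```

With this table, send("file_server", "disk_driver", "read block 7") goes through, while send("audio_driver", "disk_driver", "wipe") raises PermissionError. A buggy audio driver has no path to the disk.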

The reasons that such a modular, multiserver design is better than a monolithic one are threefold. First, by moving most of the code from kernel mode to user mode, we are not reducing the number of bugs but we are reducing the power of each bug to cause damage.

Bugs in user-mode processes have much less opportunity to trash critical kernel data structures and cannot touch hardware devices they have no business touching.

The crash of a user-mode process is rarely fatal, whereas a crash of the kernel always is. By moving most of the code out of the kernel, we are moving most of the bugs out as well.

Second, by breaking the operating system into many processes, each in its own address space, we greatly restrict the propagation of faults. A bug in the audio driver may turn the sound off, but it cannot wipe out the file system by accident. In a monolithic system, in contrast, bugs in any function can destroy code and data structures in unrelated and much more critical functions.

Third, by constructing the system as a collection of user-mode processes, the functionality of each module can be clearly determined, making the entire system much easier to understand and simpler to implement. In addition, the operating system's maintainability will improve, because the modules can be maintained independently from each other, as long as interfaces and shared data structures are respected.

Again, I agree.
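
The second point, fault containment through separate address spaces, is easy to demonstrate at the userland level. The following is a minimal sketch, assuming a made-up "audio driver" that simply aborts; BUGGY_DRIVER, run_driver, and the file_system dictionary are my own illustrative names, not anything from the article. The driver runs in its own process, so its crash shows up to the parent as nothing more than a nonzero exit status.

```python
import subprocess
import sys

# Hypothetical "audio driver" that hits a fatal bug and aborts.
BUGGY_DRIVER = "import os; os.abort()"


def run_driver(source):
    """Run driver code in its own process and return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", source],
        stderr=subprocess.DEVNULL,  # hide the child's crash message
    )
    return result.returncode


# State owned by the parent, standing in for the file system.
file_system = {"/etc/motd": "hello"}

status = run_driver(BUGGY_DRIVER)
driver_crashed = status != 0
file_system_intact = file_system == {"/etc/motd": "hello"}
```

After this runs, driver_crashed is true and file_system_intact is still true: the parent survives, its data is untouched, and it could restart the driver if it wanted to. Had the same bug run inside the parent's own process, everything would have gone down together.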

Even Microsoft understands this. The next version of Windows (Vista) will feature many user-mode drivers, and Microsoft's Singularity research project is also based on a microkernel.

Will Vista be the last Microsoft OS using the old model? Maybe.

It is useful to recall that, historically, restricting what programmers can do has often led to more reliable code.

That is a very interesting idea. It is certainly true that restricting users can improve security, as long as those restrictions cannot be circumvented.

I'm downloading a live CD iso of MINIX 3 now, so I hope to speak more about it in a future blog post.


Matt said...

Looks like Herder and Bos forgot a reference:
The Tanenbaum-Torvalds Debate

Chris Walsh said...

Don't forget the userland analog to this: multiple cooperating processes being used to perform a complex task, each with the privileges it needs (and no others), communicating through a rigidly-defined mechanism, a la Postfix.

Anonymous said...

In fact, Linux is also taking pieces out of the kernel (like udev or FUSE) since the kernel should implement mechanisms, not policy.

Although I agree with non-monolithic, micro-kernel ideas, I think they are still too immature and suffer from performance problems. However, that doesn't keep me from thinking they're the future.

Anonymous said...

You should read the following:

An excerpt: "What modern microkernel advocates claim is that properly component-structured systems are engineerable[...]. There are many supporting examples for this assertion[...]. There are no supporting examples suggesting that unstructured systems are engineerable."

Anonymous said...

I'm not sure Apple and other users of 'Mach' would agree that the micro-kernel is immature. It seems to be working quite well for them actually.