Thursday, February 03, 2005

A Closer Look at Linux Kernel Development

Last month I wrote a blog entry titled Linux Kernel Development Problems in response to discoveries of new Linux kernel vulnerabilities. I wondered what people were saying about possibly forking the Linux kernel to start a 2.7 branch, and I found a fascinating thread from last month on the linux-kernel mailing list whose subject starts with 2.7. In my previous article I cited Ted Ts'o's contribution to the thread. Here are a few other thoughts on Linux kernel development from that discussion. (For background on the new Linux development model, check here and here.)

My take is simple: I prefer the FreeBSD development process. I avoid 6.0 CURRENT, except for lab test systems, because I know it has zero guarantee of stability. I play with 5.3 STABLE to see what might appear in FreeBSD 5.4. On production systems I run the "security" branch, which only has bug fixes and critical fixes (few and far between, in my experience).

For Linux, I would like to see the old development model resurrected, as it mirrors this approach. The 2.5.z kernel was "CURRENT." The 2.4.z kernel was "STABLE." There wasn't really a "security" branch, but I like the idea of incrementing the .z to address security flaws.
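To make the old numbering convention concrete, here is a minimal Python sketch. It assumes the pre-2.6 rule that an odd second number (2.5) marked a development series and an even one (2.4) a stable series. The function name `classify_series` and the FreeBSD-style labels are my own illustration, not anything published by kernel.org.

```python
def classify_series(version: str) -> str:
    """Map a kernel version such as '2.4.25' to a FreeBSD-style label.

    Assumption: the old (pre-2.6) convention, where an odd second
    number means development and an even one means stable.
    """
    parts = version.split(".")
    if len(parts) < 2:
        raise ValueError(f"expected at least major.minor, got {version!r}")
    minor = int(parts[1])
    # Odd minor -> development ("CURRENT"), even minor -> "STABLE".
    return "CURRENT" if minor % 2 else "STABLE"


if __name__ == "__main__":
    for v in ("2.4.25", "2.5.75"):
        print(v, classify_series(v))
```

Under the new model this mapping stops being useful for 2.6.z: the number is even, but all patches go into that series, which is exactly what the posters below are arguing about.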

Here is a sample of what the linux-kernel thread posters had to say.

I agree with Bill Davidsen, who writes:

"Several of us have suggested that only security fixes and fixes for bugs which resulting in crashes, hangs, filesystem damage and the like be backported to the 2.6.N until 2.6.N+1 is released. No new drivers, schedulers (unless the old one breaks), just fixes."

Adrian Bunk demonstrates that a lot of changes are being made:

"The 2.6.9 -> 2.6.10 patch is 28 MB, and while the changes that went into 2.4 were limited since the most invasive patches were postponed for 2.5, now _all_ patches go into 2.6 ."

Alan Cox lets the world know what he thinks of the 2.6.9 kernel:

"After 2.6.9-ac its clear that the long 2.6.9 process worked very badly. While 2.6.10 is looking much better its long period meant the allegedly 'official' base kernel was a complete pile of insecure donkey turd for months. That doesn't hurt most vendor users but it does hurt those trying to do stuff on the base kernels very badly."

Ted Ts'o opines that the 2.4 kernel wasn't as stable as everyone seems to remember:

"You have *got* to be kidding. In my book at least, 2.4 ranks as one of the less successful stable kernel series, especially as compared against 2.2 and 2.0. 2.4 was far less stable, and a vast number of patches that distributions were forced to apply in an (only partially successful) attempt to make 2.4 stable meant that there are some 2.4-based distributions where you can't even run with a stock 2.4 kernel from kernel.org. Much of the reputation that Linux had of a rock-solid OS that never crashed or locked up that we had gained during the 2.2 days was tarnished by 2.4 lockups, especially in high memory pressure situations."

Dave Jones from Red Hat posts a Linux distributor's point of view. I recommend reading his whole post.

"The delta between 2.6.9 -> 2.6.10 was around 4000 changesets. Cherry picking csets to backport to 2.6.9 at this rate of change is nigh on impossible. You /will/ miss stuff...

So now we're at our 2.6.9-ac+a few dozen 2.6.10 csets and all is happy with the world. Except for the regressions. As an example, folks upgrading from Fedora core 2, with its 2.6.8 kernel found that ACPI no longer switched off their machines for example. Much investigation went into trying to pin this down. Kudos to Len Brown and team for spending many an hour staring into bug reports on this issue, but ultimately the cause was never found.

It was noted by several of our users seeing this problem that 2.6.10 no longer exhibits this flaw. Yet our 2.6.9-ac+backports+every-2.6.10-acpi-cset also was broken. It's likely Fedora will get a 2.6.10 based update before the fault is ever really found for a 2.6.9 backport.

This is just one example of a regression that crept in unnoticed, and got fixed almost by accident. (If it was intentionally fixed, we'd know which patches we needed to backport 8-)"

Felipe Alfaro Solana explains kernel tracking exhaustion:

"I would like to comment in that the issue is not exclusively targeted to stability, but the ability to keep up with kernel development."

Arjan van de Ven believes working on a single code base (2.6) is better than working on 2.7 and 2.6:

"as long as more things get fixed than new bugs introduced (and that still seems to be the case) things only improve in 2.6.

The joint approach also has major advantages, even for quality: All testing happens on the same codebase. Previously, the testing focus was split between the stable and unstable branch, to the detriment of *both*."

David Lang doesn't care so much about kernel quality, since it's the user's responsibility to test it prior to production:

"Sorry, I've been useing kernel.org kernels since the 2.0 days and even within a stable series I always do a full set of tests before upgrading. every single stable series has had 'paper bag' releases, and every single one has had fixes to drivers that have ended up breaking those drivers.

the only way to know if a new kernel will work on your hardware is to try it. It doesn't matter if the upgrade is from 2.4.24 to 2.4.25 or 2.6.9 to 2.6.10 or even 2.4.24 to 2.6.10

anyone who assumes that just becouse the kernel is in the stable series they can blindly upgrade their production systems is just dreaming."

Bill Davidsen explains why Linux distros make money:

"There is a reason why people pay big bucks to Redhat (and others) for a five year contract to back port the bug fixes to the original kernel and software. Barring some huge change I need, I expect to run AS3.0 for four more years for one application, 'learning experiences' are not a good thing."

Jesper Juhl lives at the other end of the stability extreme:

"Every morning when I turn on my machine I grab the latest -bk, build it with my usual config, install that kernel and reboot, then use that as my "kernel of the day". I do this on both my home and work box (well, the work box only does this on mondays) and I've had very little trouble so far."

Richard Moser points out there's more than the 2.4 and 2.6 kernels at play:

"The latest 2.0 version of the Linux kernel is: 2.0.40 2004-02-08 07:13 UTC F V VI Changelog

You have FOUR. 2.6, 2.4, 2.2, 2.0

In my scheme it's time to let go of 2.0; support moves to 2.6, 2.4, 2.2. ~ Development goes to 2.7, in the same way the 2.6 model is done now (so that it's always usable and needs no feature freeze etc before release).
~ In 6 months, 2.2 support is dropped, support moves to 2.8, 2.4, 2.2 with development on 2.9. Support includes bugfixes (security and otherwise) only."

A recurring theme that I don't specifically cite is the burden on those who bundle distros. Several posters implied that it's the responsibility of the distro developers to patch the vanilla kernels into shape for release in Red Hat and so on. Those running vanilla kernels are more or less expected to handle problems themselves. I don't think this is an appropriate answer, but I guess it is realistic.
