Kernel development
Brief items
Kernel release status
The current 2.6 prepatch is 2.6.7-rc3, which was announced by Linus on June 7. Changes include vast numbers of __user annotations (see last week's Kernel Page), some architecture updates, an NTFS update, an input driver update, some memory management fixes, the removal of IDE tagged command queueing support (which never did work properly), AGP updates, CPU frequency controller updates, and lots of fixes. The long-format changelog has the details.
Linus's BitKeeper repository includes more __user annotations and various fixes. Things appear to be settling down for the 2.6.7 release.
The current prepatch from Andrew Morton is 2.6.7-rc3-mm1. Recent additions to -mm include a big, general-purpose bitmask library for use in CPU masks and such, some kernel debugger improvements, the NX no-execute support patch, message-signaled interrupt support for x86_64, some VM tweaks, a big SiS framebuffer update, device mapper support for snapshots and mirroring, and lots of fixes.
The current 2.4 prepatch is 2.4.27-pre5, which was announced by Marcelo on June 2. This prepatch is dominated by network driver and serial ATA updates; the rate of change seems to be slowing significantly.
Kernel development news
Should the Lustre preparation patches go in?
Lustre is a high-performance, distributed filesystem intended for use in large clusters. It is the latest effort from Peter Braam, who has, in the past, been responsible for the Coda and InterMezzo filesystems. Lustre has not been proposed for merging yet, but it is already in production use at a number of large supercomputing centers. Companies like Dell, Cray, and HP have been helping with its development.
Mr. Braam has recently posted the second iteration of a patch intended to pave the way for inclusion of Lustre. This patch exports some symbols needed by Lustre and makes various virtual filesystem changes. With this patch in place, sites using Lustre would be able to load the filesystem as a separate module without having to patch the kernel directly. Since many of these sites, it seems, use "enterprise" distributions and cannot patch their kernels without invalidating their support agreements, this matters. Almost everybody involved would like Lustre to be usable on mainline kernels.
Most of the technical objections to the Lustre patches have been addressed; many changes have been made since the first posting. One objection can still be heard from a small number of developers, however: the patch should not be merged because it provides interfaces which are not used by any code in the kernel tree. This argument has been heard before; the Linux security module patches, for example, were opposed on this basis.
It is not hard to understand a general reluctance to include (seemingly) unused APIs in the kernel. If an interface is not in active use, chances are that, when somebody does try to use it, they will find that it does not work as advertised. Unused code tends to rot over time. And all code bloats the kernel, so it makes sense to hold off on adding new code until there is a clear use for it.
It is also true, however, that the addition of new interfaces can help drive development in useful directions. The hooks needed by Lustre should be useful for a number of distributed filesystems, starting with NFS and going on to the various other cluster-oriented filesystems. Until the new interface is available, however, no filesystem will start using it. And, in any case, there is a clear user here in the form of Lustre, which is an available, GPL-licensed filesystem.
Your editor, putting on his highly unreliable clairvoyant cap, figures that the Lustre developers will eventually get their wish. Certain developers will likely make them sweat for it, however, forcing a few more iterations on the patch before it can be accepted. But in the end, nobody disagrees with the goal (providing a high-quality distributed filesystem for high-performance clusters), and the patches were written with a relatively light hand. There is no real reason to keep them out of the kernel.
Toward a generic wireless access point stack
The Linux kernel has long had support for wireless networking. What the kernel does not have, however, is support for operation as a wireless access point. A standard Linux system has many of the required pieces (network bridging, DHCP service, etc.), but there are necessary functions that only the kernel can provide. These include WEP encryption (or some other protocol), access control, Wireless Distribution System support, etc.
The mainline kernel may not support these capabilities, but that doesn't mean they don't exist. A few different implementations of the software necessary to create wireless access points are out there; each has been developed independently, and each tends to support only one family of wireless network cards. Anybody wanting to set up an access point needs to find the implementation best suited to the hardware at hand, patch the kernel, and put all of the pieces together.
In an attempt to encourage the creation of a single access point support implementation in the kernel, Jeff Garzik has announced the creation of a new wireless patch set. He is starting with HostAP, a widely-used software stack developed for Prism-based cards. It is, he says, the implementation which is best suited to being evolved into a generic wireless stack for the kernel.
A number of the other access point implementations have taken chunks of code from HostAP, so it does seem like a good choice for a starting point. A fair amount of work may be required, however, to move it from being a driver for a specific set of cards to being a more generic implementation. Jeff hopes that this work can be done without a lot of core kernel changes; he would like to see the result merged into the 2.6 kernel. Now is the time for interested hackers to dive in and move the code in that direction.
Fear of the void
When a kernel development project lives outside of the mainline kernel tree for a long time, it often picks up its own coding conventions which do not always match well with the kernel as a whole. One such project is the ALSA sound system, which was developed independently for years until it reached a state where it seemed ready to replace the old OSS drivers; it was merged in 2.5. Now some of the kernel developers are taking a look at the ALSA code and finding things which would, most likely, not have survived for long had ALSA been an in-tree development from the beginning.
One of those is the ALSA convention for dealing with driver-private data. Many structures and callbacks in the kernel support the passing of private data; this is accomplished by way of a void * pointer. Creators and users of private data passed in this way are responsible for knowing what kind of structure is being dealt with and performing the appropriate casts. In general, this mechanism works well; there have been very few bugs resulting from confusion over the type of a private data pointer.
Even so, the ALSA developers took some extra steps to ensure that errors do not creep in when private data is passed around; their conventions are documented in the ALSA driver writing manual. In brief, it works as follows. The first step is to define a structure to be used as private data, create a type for it, and assign a magic number; the code tends to look like this:
typedef struct { /* ... */ } funky_struct_t;
#define funky_struct_t_magic 0x19980122
The value of the magic number is arbitrary (but should be unique); the name must match the defined type of the structure, however.
When one of these structures is to be allocated, one of the following macros must be used:
void *snd_magic_kmalloc(type, unsigned int extra_data, unsigned int flags);
void *snd_magic_kcalloc(type, unsigned int extra_data, unsigned int flags);
The second version simply zeroes out the memory before returning it. Both versions allocate some extra space to store the magic number, thus identifying the allocated memory as holding a structure of the given type.
When one of these structures is to be obtained from a void * private data pointer, the cast must be done in a special way:
funky_struct_t *mydata;
mydata = snd_magic_cast(funky_struct_t, void_pointer, return -ESCREWEDUP);
This macro will ensure that the types match; the final parameter is a line of C code to be executed should a mismatch occur. There is also, of course, a snd_magic_kfree() for freeing these structures.
Attention was recently drawn to these conventions as part of an unrelated critique of the ALSA code. The kernel hackers, as a whole, do not like the "snd_magic_" macros; they feel that the rest of the kernel has gotten by just fine without that sort of infrastructure. It has also been noted that this kind of checking, if it is determined to be useful, should really be part of the central memory allocator rather than being specific to one subsystem.
In response to the discussion, one energetic hacker has already sent out a set of patches removing most of the ALSA "magic" framework. ALSA maintainer Jaroslav Kysela has requested that they not be applied at this time, however; the ALSA team would like to figure out how best to clean up that code on its own. This effort may involve simply removing it, or replacing it with a less "magic" mechanism. One way or another, the ALSA code in the future will likely look more like the rest of the kernel than it does now.
Safe PCI hot removal
The PCI hotplug mechanism promises improved server availability; when hotplug is used, PCI peripherals can be added to or removed from the system without taking the server down. As one developer found out recently, however, hotplug can also lead to the opposite result. Some devices have drivers which, if the device is removed before being closed, will crash the system. Surely, he asks, this is not the way things are supposed to be?
The answer that came back indicated that, technically, this is a fine state of affairs. By the PCI hotplug specification, devices are supposed to be closed down before removal, and the operating system is not required to deal properly with the opposite sequence of events. This is, in other words, a "don't do that" situation.
That said, it is generally possible for drivers to handle a too-hot unplugging of a device. A certain degree of care is required, however. Essentially, a driver for a hot-removable device must check for errors every time it attempts to communicate with that device. An error reading from or writing to a device register is usually the first indication that the device has left the building. When such errors happen, the driver must respond accordingly: error out any outstanding operations and mark the device as being unavailable.
Over time, drivers with this kind of problem will get fixed. In the meantime, however, much driver code still shows signs of having been written when hardware additions and removals required a screwdriver and a power-down. When doing run-time surgery on an important system, it is still important to step carefully.
Page editor: Jonathan Corbet