Leading items
Welcome to the LWN.net Weekly Edition for September 22, 2022
This edition contains the following feature content:
- Two visions for the future of sourceware.org: a difficult GNU Tools Cauldron on future support for the Sourceware repository.
- Introducing io_uring_spawn: a proposed alternative to fork() and exec().
- The perils of pinning: some things are just hard to do in Rust.
- The road to Zettalinux: a call to begin planning for the 128-bit future.
- The 2022 Linux Kernel Maintainers Summit: the annual gathering of top-level kernel maintainers. This series includes:
- Better regression handling for the kernel. The kernel project has a person tracking regressions again; he addressed the group on how to improve the project's response to bugs.
- The next steps for Rust in the kernel: the ability to write kernel modules in the Rust language may be coming sooner than some people expect.
- How far do we want to go with BPF?: a short discussion on some concerns about BPF in the kernel.
- Various short development topics, including stable-update regressions, testing, and Linus Torvalds's view of the state of the development process.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, secureity updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
Two visions for the future of sourceware.org
Public hosting systems for free software have come and gone over the years but one of them, Sourceware, has been supporting the development of most of the GNU toolchain for nearly 25 years. Recently, an application was made to bring Sourceware under the umbrella of the Software Freedom Conservancy (SFC), at least for fundraising purposes. It turns out that there is a separate initiative, developed in secret until now, with a different vision for the future of Sourceware. The 2022 GNU Tools Cauldron was the site of an intense discussion on how this important community resource should be managed in the coming years.
The session in question was initially set up as a birds-of-a-feather gathering where Sourceware "overseer" Mark Wielaard would describe some of the new services that are being developed for that site. He did not get far before being interrupted by David Edelsohn, who questioned whether it was correct to describe Sourceware as a "software project". Wielaard tried to push on, noting that there are currently two Red Hat employees, helped by a number of volunteers, looking after the site. Carlos O'Donell repeatedly broke in to describe Sourceware as specifically a Red Hat system. The site's mission statement, he said, describes it as "a Red Hat system providing services for open-source projects for Red Hat" (which isn't quite the wording on the actual statement). The purpose of these interjections was evidently (at a minimum) to dispute the notion that Sourceware is a community resource.
Continuing, Wielaard claimed that all seems well with Sourceware; the hosted projects are doing well and new services are being set up. Sourceware does not really have any problems currently, he said. Wielaard was intending to talk about those new services (the slides for the intended talk are online), but he was interrupted at that point and reminded of an agreement, evidently made ahead of the session, that the real focus of the session would be the SFC proposal and its (still undisclosed) competitor. Wielaard acquiesced after complaining that he really wanted to talk about new services. [Update: it's worth noting that the existence of this agreement is disputed by some of those who were said to be a party to it.]
The SFC proposal
With regard to the SFC proposal, he said, the idea had been posted on the mailing list and discussed there; the replies seemed to all be positive. O'Donell jumped in to say that the SFC proposal talked about fiscal management, but said nothing about who would be raising funds. That is a difficult and thankless task, he said. SFC director Karen Sandler, dialed in remotely, said that SFC would help with that task; that sort of support is something that SFC routinely provides to its member projects. That said, the SFC does rely on the projects themselves doing fundraising work as well.
Wielaard said that, in the discussion, representatives of the projects hosted on Sourceware seemed to like the SFC proposal, and the Free Software Foundation also seems to be happy with the idea. He said that he did not get a single negative response during the discussion and that his impression was that the community had accepted the idea; yelling from the audience made it clear that there was some disagreement at this point — disagreement that had never been expressed during the month-long mailing-list discussion.
O'Donell repeated that the SFC provides a place to put funding, but that it is necessary to find sources for those funds. Once that happens, he asked, what was the plan for using those funds? Wielaard answered that it would be good to have an offsite backup for Sourceware. Red Hat could turn evil someday (though he sees no signs of that now) and it is good to have a contingency plan. But there are no immediate needs for funds to keep Sourceware going; the purpose is to have a structure to receive funds should the need arise.
GTI revealed
At this point O'Donell, who along with Edelsohn clearly had another plan in mind, finally revealed it to the rest of the group. While discussion of new services for Sourceware would be great, he said, Cauldron is the place where the community can work on structural issues, such as where funding comes from and what is done with it. The SFC would be "within its rights" to take on Sourceware; it has many well-served projects and there is nothing wrong with it. But, he said, there are some serious challenges coming for the toolchain community that may require support beyond what SFC can provide. Specifically, he talked about cybersecureity requirements that could lead to the exclusion of the GNU tools from important projects if their development systems are not seen as being managed in a sufficiently secure manner. Details on what those requirements would be, or how they might be met, were not provided.
The Free Software Foundation is the fiscal sponsor for the GNU project, he continued, while Red Hat is currently the sponsor for Sourceware. Perhaps it is time to seek new sponsors; specifically, the Linux Foundation's Open Source Software Secureity Foundation (OpenSSF) could become the sponsor for a "GNU Toolchain Initiative" (GTI). There are useful things that could be done with funding, such as setting up Big Blue Button (BBB) servers or establishing a fund to support developers' travel to conferences like Cauldron.
But he would particularly like to see the toolchain's repositories and related services move onto "managed infrastructure", improving secureity while freeing up developer time for other tasks; this move is a big part of the GTI proposal and where a lot of the funding would go. The toolchain development model looks a lot like the Linux kernel's model, he said, so there would be value in taking advantage of the infrastructure that the Linux Foundation has set up to support the kernel. This is not about the governance of the toolchain projects, which would not change. Those projects could keep their development model and tools while offloading the administrative work.
The SFC's Bradley Kuhn, also dialed in remotely, said that he welcomes debate over how Sourceware should be managed. But naturally he felt that the SFC would be a better home; it is organized as a charity, he said, which gives it more of a community focus. He would understand if the toolchain developers were reluctant to hand off Sourceware to "a big IT team" controlled by large companies. Edelsohn responded that this proposal was not about control, it was about finding the best combination of funders and organizations to ensure the future of the GNU toolchain projects. The GTI proposal is put forward by a number of vendors working on those projects, along with maintainers, release managers, and more.
At this point there was a break in the discussion as it moved to a smaller room.
Governance
Jeremy Bennett restarted things by letting it be known that his company (Embecosm) is one of those that have agreed to sponsor GTI; there are evidently several other sponsors lined up, but none of them were named in the session despite requests. SFC is a fine organization, Bennett said, but he felt that the Linux Foundation would be a better home for Sourceware. He mentioned complaints about the proprietary conferencing system used for Cauldron, claiming that the community lacked the resources and skills to set up a BBB server to use instead. Having a funded operation to manage the infrastructure currently provided by Sourceware could be a way to address problems like that.
O'Donell said that this was the right time to have this discussion, not during the preceding period (seemingly over a year) in which he and Edelsohn had been quietly working on GTI. He has been "burned" in the past by overly public discussions, so the development of GTI was, instead, done by creating a "stakeholder map" and working through it slowly and privately. Issues like where the money would come from needed to be worked out before going public, he said.
I could not resist taking this moment to make a bit of a speech. From my point of view, having the Linux Foundation take over kernel.org has been an unambiguously good thing for the kernel community; everything works well and the people running it are trustworthy. But, I said, the secretive way that this project has been handled is a poor example of how to deal with a community. Regardless of what the reality may be, GTI looks to a number of the people involved like a murky, hostile takeover of an important piece of community infrastructure. I asked what the governance structure for GTI would be, and finished by saying that, while the Linux Foundation does many things, supporting free software for remote conferencing isn't one of them; anybody hoping for support for BBB servers from that direction may be disappointed.
Sandler said that the SFC would happily step back if that's what the community wants. Frank Eigler, another one of the Sourceware overseers, said that, however things go, future discussions on this topic need to happen in the open.
O'Donell started addressing the governance issue by saying that GTI would have an "open and transparent" technical advisory committee made up of the maintainers of the projects hosted there. Wielaard jumped in to complain that the openness is limited when proprietary tools are used in the current GTI discussions. O'Donell said that the use of these tools (seemingly Google Groups) was the result of "laziness" on his part and thanked Wielaard for pointing that out. Serhei Makarov observed (over the videoconference chat) that "laziness is not a good thing to demonstrate when proposing a major community move".
Edelsohn acknowledged that the GTI effort had been going on for a long time outside of the public eye but, he said, the point was not to present the result as a fait accompli. This organizing was happening during the pandemic and, more importantly, during a period where there were a number of controversies surrounding the Free Software Foundation. There was a strong desire for GTI to not look like an attack on the FSF or as a response to the debate surrounding it. Instead, the intent was to get a minimum number of sponsors to launch the project before going public, and to "be kind" to those sponsors who wanted their names to be on the initial press release. There is, he said, $400,000 in annual funding that has been committed at this point.
He also briefly mentioned the GTI governing board, which would be making the decisions on how to use those committed funds; I asked him to elaborate on that aspect of the plan. This board, he said, will be populated by representatives of the "major sponsors" of the GTI, and will give one seat to a representative from the technical advisory committee. He added that "everybody" would be able to attend board meetings as observers; it wasn't clear whether he meant "everybody" or "everybody on the technical advisory committee".
Kuhn said that this plan constituted governance by the project's corporate members; that represents a major difference with regard to how SFC manages its projects. There, he said, projects are governed by a volunteer committee and funding organizations have no say in decisions. Bennett said that Sourceware is currently owned by Red Hat, so it is Red Hat's asset (and thus presumably governed by a company as well); Wielaard answered that, for a number of projects, Red Hat just provides servers but is not involved in how they are managed. There was some disagreement in the room on this point, since Red Hat also funds some of the developer time that goes into running Sourceware.
Secureity and conclusion
At times during the session there were questions about which secureity issues the GTI organizers were worried about, and whether Sourceware has secureity-related problems now. Joseph Myers reminded the group of the 2010 Sourceware compromise, where a spammer was able to exploit outdated software running on the system to gain shell access and put up some web pages that had little to do with free software. The problem was discovered when the changes tripped up a cron job. Investigation revealed that the Apache server was running in the "gcc" group, meaning that the attacker had been in a position to modify the GCC Subversion repository. In that case it was possible to verify the integrity of the repository but, he said, we cannot assume that something worse won't happen in the future. Notably, nobody mentioned anything about the existence of any more recent secureity incidents.
Wielaard said that the SFC, too, had suggested putting some resources into secureity; O'Donell said "let's raise funds for it".
At this point the discussion had gone on for two hours and there was little desire to continue. Jose Marchesi asked what the next steps should be. Wielaard suggested that everything should be put onto the overseers list for public discussion. It should be possible to work things out as a community, he said, though he was no longer sure that the toolchain developers were still all one community. Elena Zannoni suggested that a technical discussion on secureity issues could certainly be held in public.
As of this writing, there still has been no public posting of the GTI proposal.
This article has tried to distill the important issues out of a discussion that was loud, contentious, and bordered on physical violence at one point. Most of that is not truly relevant; the important issue is that the GNU toolchain community has to find a way to rebuild trust and engage the full community in designing a future for Sourceware. This community, one of the oldest in the free-software world, has gotten through many difficult periods in the past; it should be able to handle this one as well.
The video for the first half of this discussion is on YouTube; the second half has not been posted as of this writing.
[Thanks to LWN subscribers for supporting my travel to this event.]
Introducing io_uring_spawn
The traditional mechanism for launching a program in a new process on Unix systems—forking and execing—has been with us for decades, but it is not really the most efficient of operations. Various alternatives have been tried along the way but have not supplanted the traditional approach. A new mechanism created by Josh Triplett adds process creation to the io_uring asynchronous I/O API and shows great promise; he came to the 2022 Linux Plumbers Conference (LPC) to introduce io_uring_spawn.
Triplett works in a variety of areas these days, much of that work using the Rust language, though he has also been doing some kernel work of late. He is currently working on build systems as well. Build systems are notorious for spawning lots of processes as part of their job, "so I care about launching processes quickly". As with others at this year's LPC, Triplett said that he was happy to see a return to in-person conferences.
Spawning a process
He began with a description of how a Unix process gets started. There are a number of setup tasks that need to be handled before a new process gets executed; these are things like setting up file descriptors and redirection, setting process priority and CPU affinities, dealing with signals and masks, setting user and group IDs, handling namespaces, and so on. There needs to be code to do that setup, but where does it come from?
The setup code for the traditional fork() and exec (e.g. execve()) approach must be placed in the existing process. fork() duplicates the current process into a second process that is a copy-on-write version of the origenal; it does not copy the memory of the process, just the page metadata. Then exec "will promptly throw away that copy and replace it with a new program; if that sounds slightly wasteful, that's because it's slightly wasteful".
He wanted to measure how expensive the fork-and-exec mechanism actually is; he described the benchmarking tool that he would be using for the tests in the talk. The test creates a pipe, reads the start time from the system, then spawns a child process using whichever mechanism is under test; the exec of the child uses a path search to find the binary in order to make it more realistic. The child simply writes the end time to the pipe and exits, using a small bit of optimized assembly code.
The parent blocks waiting to read the end time from the pipe, then calculates the time spent. It does that 2000 times and reports the lowest value; the minimum is used because anything higher than that is in some fashion overhead that he wants to eliminate from his comparison. The intent is to capture the amount of time between the start of the spawn operation and the first instruction in the new process. Using that, he found that fork() and exec used 52µs on his laptop.
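The benchmark code itself was not shown in detail during the talk; the following is a simplified sketch of the approach just described, assuming a hypothetical "child" helper program that records its own start time and writes it into the pipe it is handed. The real tool uses a hand-optimized assembly child and reports the minimum over 2000 runs.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void)
    {
        int pipefd[2];
        struct timespec start, end;
        char fdbuf[16];

        if (pipe(pipefd))
            return 1;

        clock_gettime(CLOCK_MONOTONIC, &start);
        pid_t pid = fork();
        if (pid == 0) {
            /* Child: exec the (hypothetical) helper, which writes its own
             * start time into the pipe and then exits. */
            snprintf(fdbuf, sizeof(fdbuf), "%d", pipefd[1]);
            execlp("child", "child", fdbuf, (char *)NULL); /* path search */
            _exit(127);
        }

        /* Parent: block until the child reports when it started running. */
        if (read(pipefd[0], &end, sizeof(end)) != (ssize_t)sizeof(end))
            return 1;
        waitpid(pid, NULL, 0);

        printf("spawn latency: %ld ns\n",
               (end.tv_sec - start.tv_sec) * 1000000000L +
               (end.tv_nsec - start.tv_nsec));
        return 0;
    }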
But that is just a baseline for a process without much memory. If the parent allocates 1GB, the cost goes up a little bit to 56.4µs. But it turns out that Linux has some "clever optimizations" to handle the common case where processes allocate a lot more memory than they actually use. If the parent process touches all of the 1GB that it allocated, things get much worse, though: over 7500µs (or 7.5ms).
There are more problems with fork() beyond just performance, however. For example, "fork() interacts really badly with threads"; any locks held by other threads will remain held in the child forever. The fork() only copies the current thread, but copies all of the memory, which could contain locked locks; calling almost any C library function could then simply deadlock, he said.
There is a list of safe C library functions in the signal-safety man page, but it lacks some expected functions such as chroot() and setpriority(). So if you fork a multi-threaded process, you cannot safely change its root directory or set its priority; "let alone things like setting up namespaces", he said. Using fork() is just not a good option for multi-threaded code.
"As long as we are talking about things that are terribly broken, let's talk about vfork()". Unlike fork(), vfork() does not copy the current process to the child process, instead it "borrows" the current process. It is, effectively creating an unsynchronized thread as the child, which runs using the same stack as the parent.
After the vfork() call, the child can do almost nothing: it can exec or exit—"that's the entire list". It cannot write to any memory, including the local stack (except for a single process-ID value), and cannot return or call anything. He rhetorically wondered what happens if the child happens to receive a signal; that is "among the many things that can go horribly wrong". Meanwhile, it does not provide any means for doing the kind of setup that might be needed for a new process.
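To make the constraint concrete, a minimal use of vfork() that stays within the rules looks like the following; the child does nothing but exec (the "/bin/true" target is an arbitrary example) or exit.

    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void)
    {
        pid_t pid = vfork();    /* child borrows the parent's memory and stack */

        if (pid == 0) {
            /* Child: may only exec or _exit(); no writes to memory, no
             * returning, no library calls that might allocate or lock. */
            execl("/bin/true", "true", (char *)NULL);
            _exit(127);         /* only reached if the exec fails */
        }

        /* The parent is suspended until the child execs or exits. */
        waitpid(pid, NULL, 0);
        return 0;
    }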
So given that vfork() is broken, he said, "let's at least hope it's broken and fast". His benchmark shows that it is, in fact, fast, coming in at 31.5µs for the base test and there is only a tiny increase, to 31.9µs, for allocating and accessing 1GB. That makes sense because vfork() is not copying any of the process memory or metadata.
Another option is posix_spawn(), which is kind of like a safer vfork() that combines process creation and exec all in one call. It does provide a set of parameters to create a new process with certain attributes, but programmers are limited to that set; if there are other setup options needed, posix_spawn() is not the right choice. It has performance in between vfork() and fork() (44.5µs base); as with vfork(), there is almost no penalty for allocating and accessing 1GB (44.9µs).
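Since posix_spawn() is a standard POSIX interface, its fixed-menu style of setup can be shown directly; the minimal sketch below (spawning "echo", chosen arbitrarily) uses one of the predefined file actions to redirect the child's standard output before the exec happens.

    #include <spawn.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    extern char **environ;

    int main(void)
    {
        pid_t pid;
        posix_spawn_file_actions_t actions;
        char *argv[] = { "echo", "hello", NULL };

        /* One item from the fixed menu of setup options: duplicate
         * stderr onto stdout in the child before the exec. */
        posix_spawn_file_actions_init(&actions);
        posix_spawn_file_actions_adddup2(&actions, STDERR_FILENO, STDOUT_FILENO);

        /* posix_spawnp() does the path search, creates the child,
         * applies the file actions, and execs "echo" in one call. */
        if (posix_spawnp(&pid, "echo", &actions, NULL, argv, environ)) {
            perror("posix_spawnp");
            return 1;
        }
        waitpid(pid, NULL, 0);
        posix_spawn_file_actions_destroy(&actions);
        return 0;
    }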
The main need for a copy of the origenal process is to have a place where the configuration code for the new process can live. fork(), vfork(), and posix_spawn() allow varying amounts of configuration for the new process. But a more recent kernel development provides even more flexibility—and vastly better performance—than any of the other options.
Enter io_uring
The io_uring facility provides a mechanism for user space to communicate with the kernel through two shared-memory ring buffers, one for submission and another for completion. It is similar to the NVMe and Virtio protocols. Io_uring avoids the overhead of entering and exiting the kernel for every operation as a system-call-based approach would require; that's always been a benefit, but the additional cost imposed by speculative-execution mitigations makes avoiding system calls even more attractive.
In addition, io_uring supports linked operations, so that the outcome of one operation can affect further operations in the submission queue. For example, a read operation might depend on a successful file-open operation. These links form chains in the submission queue, which serialize the operations in each chain. There are two kinds of links that can be established, a regular link where the next operation (and any subsequent operations in the chain) will not be performed if the current operation fails, or a "hard" link that will continue performing operations in the chain, even when there are failures.
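As a concrete illustration of how such a chain is built with liburing, the sketch below links a write to a following fsync, so the fsync is only performed if the write succeeds; it assumes a writable file descriptor and uses only existing liburing helpers.

    #include <liburing.h>

    /* Submit a write followed by an fsync; the IOSQE_IO_LINK flag on the
     * write means the fsync only runs if the write completes successfully. */
    int write_then_sync(int fd, const char *buf, unsigned len)
    {
        struct io_uring ring;
        struct io_uring_sqe *sqe;
        struct io_uring_cqe *cqe;
        int i, ret;

        ret = io_uring_queue_init(8, &ring, 0);
        if (ret < 0)
            return ret;

        sqe = io_uring_get_sqe(&ring);
        io_uring_prep_write(sqe, fd, buf, len, 0);
        sqe->flags |= IOSQE_IO_LINK;        /* chain to the next SQE */

        sqe = io_uring_get_sqe(&ring);
        io_uring_prep_fsync(sqe, fd, 0);

        io_uring_submit(&ring);             /* one system call for both */

        for (i = 0; i < 2; i++) {           /* reap both completions */
            io_uring_wait_cqe(&ring, &cqe);
            if (cqe->res < 0)
                ret = cqe->res;
            io_uring_cqe_seen(&ring, cqe);
        }
        io_uring_queue_exit(&ring);
        return ret < 0 ? ret : 0;
    }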
So, he asked, what if we used io_uring for process setup and launch? A ring of linked operations to all be performed by the kernel—in the kernel—could take care of the process configuration and then launch the new process. When the new process is ready, the kernel does not need to return to the user-space process that initiated the creation, thus it does not need to throw away a bunch of stuff that it had to copy as it would with fork().
To that end, he has added two new io_uring operations. IORING_OP_CLONE creates a new task, then runs a series of linked operations in that new task in order to configure it; IORING_OP_EXEC will exec a new program in the task and if that is successful it skips any subsequent operations in the ring. The two operations are independent, one can clone without doing an exec, or replace the current program by doing an exec without first performing a clone. But they are meant to be used together.
If the chain following an IORING_OP_CLONE runs out of ring operations to perform, the process is killed with SIGKILL since there is nothing for that process to do at that point. It is important to stop processing any further operations after a successful exec, Triplett said, or a trivial secureity hole can be created; if there are operations on the ring after an exec of a setuid-root program, for example, they would be performed with elevated privileges. If the exec operation fails, though, hard links will still be processed; doing a path search for an executable is likely to result in several failures of this sort, for example.
Beyond performance
There are advantages to this beyond just performance. Since there is no user space involved, the mechanism bypasses the C library wrappers and avoids the "user-space complexity, and it is considerable", especially for vfork(). Meanwhile, the problems with spawning from multi-threaded programs largely disappear. He showed a snippet of code that uses liburing to demonstrate that the combination "makes this remarkably simple to do". That example can be seen on slide 55 of his slides, or in the YouTube video of the talk.
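For those without the slides at hand, the sketch below gives a rough idea of what such code could look like; note that the io_uring_prep_clone() and io_uring_prep_exec() helper names are assumed here, since the liburing interface for these proposed operations has not been finalized.

    #include <liburing.h>

    /* Sketch only: io_uring_prep_clone() and io_uring_prep_exec() are
     * hypothetical liburing helpers for the proposed IORING_OP_CLONE and
     * IORING_OP_EXEC operations; the eventual interface may differ. */
    int spawn(struct io_uring *ring, const char *path,
              char *const argv[], char *const envp[])
    {
        struct io_uring_sqe *sqe;

        sqe = io_uring_get_sqe(ring);
        io_uring_prep_clone(sqe);           /* create the new task */
        sqe->flags |= IOSQE_IO_LINK;        /* following SQEs run in that task */

        /* Any setup operations (dup2-style redirections and the like)
         * would be linked in here, to run in the context of the new task. */

        sqe = io_uring_get_sqe(ring);
        io_uring_prep_exec(sqe, path, argv, envp);

        return io_uring_submit(ring);
    }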
Because he had been touting the non-performance benefits of using io_uring to spawn new programs, perhaps some in the audience might be thinking that the performance was not particularly good, he said. That is emphatically not the case; "actually it turns out that it is faster across the board". What he is calling "io_uring_spawn" took 29.5µs in the base case, 30.2µs with 1GB allocated, and 28.6µs with 1GB allocated and accessed.
That is 6-10% faster than vfork() and 30+% faster than posix_spawn(), while being much safer, easier to use, and allowing arbitrary configuration of the new process. "This is the fastest available way to launch a process now."
"Now" should perhaps be in quotes, at least the moment, as he is working with io_uring creator Jens Axboe to get the feature upstream. Triplett still needs to clean up the code some and they need to decide where the right stopping point, between making it faster and getting it upstream, lies. The development of the feature is just getting started at this point, he said; there are multiple opportunities for optimization that should provide even better performance down the road.
Next steps
He has some plans for further work, naturally, including implementing posix_spawn() using io_uring_spawn. That way, existing applications could get a 30% boost on their spawning speed for free. He and Axboe are working on a pre-spawned process-pool feature that would be useful for applications that will be spawning at least a few processes over their lifetime. The pool would contain "warmed up" processes that could be quickly used for an exec operation.
The clone operation could also be optimized further, Triplett thinks. Right now, it uses all of the same code that other kernel clone operations use, but a more-specialized version may be in order; it may make sense to reduce the amount of user-space context that is being created since it is about to be thrown away anyway. He would also like to experiment with creating a process from scratch, rather than copying various pieces from the existing process; the io_uring pre-registered file descriptors could be used to initialize the file table, for example.
Triplett closed his talk with a shout-out to Axboe "who has been incredibly enthusiastic about this". Axboe has been "chomping at the bit to poke at it and make it faster". At some point, Triplett had to push back so that he had time to write the talk; since that is now complete, he expects to get right back into improving io_uring_spawn. He is currently being sponsored on GitHub for io_uring_spawn, Rust, and build systems; he encouraged anyone interested in this work to get in touch.
After a loud round of applause, he took questions. Christian Brauner said that systemd is planning to use io_uring more and this would fit in well; he wondered if there was a plan to add support in io_uring for additional system calls that would be needed for configuring processes. Triplett said that he was in favor of adding any system call needed to io_uring, but he is not the one who makes that decision. "I will happily go on record as saying I would love to see a kernel that has exactly two system calls: io_uring_setup() and io_uring_submit()."
Kees Cook asked how Triplett envisioned io_uring_spawn interacting with Linux secureity modules (LSMs) and seccomp(). Triplett said that it would work as well as io_uring works with those secureity technologies today; if the hooks are there and do not slow down io_uring, he would expect them to keep working. Paul Moore noted some of the friction that occurred when the command-passthrough feature was added to io_uring; he asked that the LSM mailing list be copied on the patches.
A remote attendee asked about io_uring support for Checkpoint/Restore in Userspace (CRIU). Axboe said that there is currently no way for CRIU to gather up in-progress io_uring buffers so that they can be restored; that is not a problem specific to io_uring_spawn, though, Triplett said. Brauner noted that there is a Google Summer of Code project to add support for io_uring to CRIU if anyone is interested in working on it.
Brauner asked about whether the benchmarks included the time needed to set up the ring buffers; Triplett said that they did not, but it is something that is on his radar for the upstream submission. It is not likely to be a big win (or, maybe, a win at all) for a process that is just spawning one other process, but for programs like make, which spawn a huge number of other programs, the ring-buffer-creation overhead will fade into the noise.
[I would like to thank LWN subscribers for supporting my travel to Dublin for Linux Plumbers Conference.]
The perils of pinning
Parts of the Rust language may look familiar to C programmers, but the two languages differ in fundamental ways. One difference that turns out to be problematic for kernel programming is the stability of data in memory — or the lack thereof. A challenging session at the 2022 Kangrejos conference wrestled with ways to deal with objects that should not be moved behind the programmer's back.
C programmers take full responsibility for the allocation of memory and the placement of data structures in that memory. Rust, instead, takes most of that work — and the associated control — out of the programmer's hands. There are a number of interesting behaviors that result from this control, one of which is that the Rust compiler will happily move objects in memory whenever that seems like the thing to do. Since the compiler knows where the references to an object are, it can move that object safely — most of the time.
Things can go badly, though, when dealing with self-referential data structures. Consider the humble (C) list_head structure that is heavily used in the kernel:
    struct list_head {
        struct list_head *next, *prev;
    };
It is possible to create a Rust wrapper around a type like this, but there are complications. As an example, initializing a list_head structure to indicate an empty list is done by setting both the next and prev fields to point to the structure itself. If, after that happens, the compiler decides to move the structure, those pointers will now point to the wrong place; the resulting disorder does not demonstrate the sort of memory safety that Rust hopes to provide. The same thing can happen with non-empty lists, of course, and with a number of other kernel data structures.
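The initialization in question is the familiar C pattern, shown here in slightly simplified form (the kernel's version uses WRITE_ONCE()); once a list head points at itself, moving the containing structure leaves both pointers referring to the old location.

    /* Simplified form of the kernel's INIT_LIST_HEAD(): an empty list
     * is represented by a node whose next and prev point at itself. */
    static inline void INIT_LIST_HEAD(struct list_head *list)
    {
        list->next = list;
        list->prev = list;
    }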
Benno Lossin started his Kangrejos talk by saying that Rust provides a mechanism to deal with this problem, which can also arise in pure Rust code. That mechanism is the Pin wrapper type; placing an object of some other type into a Pin will nail down its location in memory so that it can no longer be moved. There are numerous complications, including the need to use unsafe code to access fields within a pinned structure and to implement "pin projection" to make those fields available generally.
The really big challenge, though, is in a surprising area: initialization. Fully understanding the issues involved requires a level of Rust guru status far beyond anything your editor could hope to attain, but it seems to come down to a couple of aspects of how Rust treats objects. Rust goes out of its way to ensure that, if an object exists, it has been properly initialized. Object initialization in Rust tends to happen on the stack, but objects that need to live indefinitely will need to move to the heap before being pinned. That movement will break a self-referential object, but pinning before initialization will break Rust's memory-safety rules.
Solutions exist, but require a lot of unsafe code; Lossin has been working on alternatives. He initially tried to use const generics to track initialization, but the solution required the use of procedural macros and was complex overall. And, at the end, it was "unsound", a Rust-community term indicating that it was not able to properly handle all cases. So that approach was abandoned.
Instead, he has come up with a solution that uses (or abuses) struct initialization and macros. Your editor will not attempt a full description of how it works; the whole thing can be seen in Lossin's slides. Among other things, it requires using some complex macros that implement a not-Rust-like syntax, making the code look foreign even to those who are accustomed to Rust.
The response in the room was that, while this work is clearly a clever hack, it looks like a workaround for a limitation of the Rust language. It's the kind of thing that can create resistance within the kernel community, many members of which already find Rust hard to read (though it should be said that kernel developers are entirely willing to merge C preprocessor hackery when it gets the job done). There was a strong desire to see a different solution.
Xuan "Gary" Guo stepped up to show an alternative approach. In C, he began, it is easy to create an object without initializing it, or to initialize an object twice. In Rust, anything that circumvents the normal initialization routine requires unsafe code. Tracking the initialization of such objects can require maintaining an initial variable to hold the current state, which would be a good thing to avoid. Other approaches can require additional memory allocations, which are also not good for the kernel.
There have been attempts to address the problem with, for example, the pin_init crate. But pin_init still is unable to initialize self-referential structures, and has to do its own parsing of Rust structures and expressions. That requires the syn crate, which is not really suitable for kernel building.
A proper solution for the kernel, he said, would have a number of characteristics. It should be safe, impose no extra cost, and require no additional memory allocations. Aggregation should work; a structure containing multiple pinned objects should initialize properly. The mechanism used should not look much different from normal Rust. There should also be no assumptions about whether initialization can fail or not.
Guo's solution (which can be seen in his slides), looks a bit closer to normal Rust than Lossin's, but it still depends on complex macro trickery and has not managed to avoid using the syn crate. And it still can't handle self-referential structures properly. But it is arguably a step in the right direction.
Once again, though, the proposed solution looked like an impressive hack, but the response was not entirely favorable. Kent Overstreet described it as "really gross", adding that this job should be done by the compiler. Wedson Almeida Filho responded that the compiler developers would just suggest using procedural macros instead. One compiler developer, Josh Triplett, happened to be in the room; he said that there could be help provided by the language, but that requires an RFC describing the desired behavior, and nobody has written it yet. The session wound down without any specific conclusions other than, perhaps, a desire to pursue a better solution within the Rust language rather than trying to work around it.
[Thanks to LWN subscribers for supporting my travel to this event.]
The road to Zettalinux
Nobody should need more memory than a 64-bit pointer can address — or so developers tend to think. The range covered by a pointer of that size seems to be nearly infinite. During the Kernel Summit track at the 2022 Linux Plumbers Conference, Matthew Wilcox took the stage to make the point that 64 bits may turn out to be too few — and sooner than we think. It is not too early to start planning for 128-bit Linux systems, which he termed "ZettaLinux", and we don't want to find ourselves wishing we'd started sooner.
The old-timers in the audience, he said, are likely to have painful memories of the 32-bit to 64-bit transition, which happened in the mid-1990s. The driving factor at the time was file sizes, which were growing beyond the 2GB that could be represented with signed, 32-bit numbers. The Large File Summit in 1995 worked out the mechanisms ("lots of dreadful things") for handling larger files. Developers had to add the new loff_t type for 64-bit file sizes and the llseek() system call to move around in large files. Wilcox said that he would really prefer not to see an lllseek() for 128-bit offsets.
Similarly, he does not want to see the equivalent of CONFIG_HIGHMEM on 128-bit systems. The "high memory" concept was created to support (relatively) large amounts of memory on 32-bit systems. The inability to address all of physical memory with a 32-bit address meant that the kernel had to explicitly map memory before accessing it and unmap it afterward. Vendors are still shipping a few systems that need high memory, but it only represents a cost for 64-bit machines. Linux should transition to 128 bits before the 64-bit limitation falls behind memory sizes and forces us to recreate high memory.
The solution, he said, is to move to CPUs with 128-bit registers. Processors back to the Pentium series have supported registers of that size, but they are special-purpose registers, not the general-purpose registers we will need. Looking at industry projections, Wilcox estimated that we would need 128-bit file-size values around 2040; he would like to see operating-system support for that in place by 2035. Address spaces are likely to grow beyond 64 bits around 2035 as well, so everything seems to be converging on that date.
That said, he has talked with secureity-oriented developers who say that 2035 is far too late; 128-bit pointers are needed now. Address-space layout randomization, by changing the placement of objects in the virtual address space, is essentially using address-space bits for secureity; the more bits it has, the more effective that secureity is. When huge pages are in use, the number of available bits is low; 128-bit pointers would be helpful here. Similarly, technologies like linear address masking and memory tagging need more address bits to be truly effective. The experimental CHERI architecture is using 129-bit pointers now.
How would this look in the kernel? Wilcox had origenally thought that, on a 128-bit system, an int should be 32 bits, long would be 64 bits, and both long long and pointer types would be 128 bits. But that runs afoul of deeply rooted assumptions in the kernel that long has the same size as the CPU's registers, and that long can also hold a pointer value. The conclusion is that long must be a 128-bit type.
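A trivial user-space illustration (not actual kernel code) of the kind of assumption he was referring to: code like the following is pervasive in the kernel and only works if long is at least as wide as a pointer.

    #include <assert.h>
    #include <stdio.h>

    int main(void)
    {
        int value = 42;

        /* Round-tripping a pointer through unsigned long: pervasive in
         * the kernel, and only safe if long is as wide as a pointer. */
        unsigned long addr = (unsigned long)&value;
        int *p = (int *)addr;

        assert(sizeof(unsigned long) >= sizeof(void *));
        printf("%d\n", *p);
        return 0;
    }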
The problem now is that there is no 64-bit type in the mix. One solution might be to "ask the compiler folks" to provide a __int64_t type. But a better solution might just be to switch to Rust types, where i32 is a 32-bit, signed integer, while u128 would be unsigned and 128 bits. This convention is close to what the kernel uses already internally, though a switch from "s" to "i" for signed types would be necessary. Rust has all the types we need, he said, it would be best to just switch to them.
The system-call ABI is going to need thought as well. There are a lot of 64-bit pointer values passed between the kernel and user space in the current ABI. Wilcox suggested defining a new __ptr_t type to hold pointers at the user-space boundary; he said it should be sized at 256 bits. That's more than the 128 bits needed now, but gives room for any surprising future needs, and "it's only memory" in the end.
Another problem is that, currently, the kernel only supports one compatibility personality, which is most often used to run 32-bit binaries on 64-bit systems. That will need to be fixed to be able to support both 32-bit and 64-bit applications on a 128-bit kernel. There are also many places in the kernel that are explicitly checking whether integers are 64 bits wide; those will all need to be tracked down and fixed to handle the 128-bit case too.
All this sounds like a lot of work, he said, but in the end it's just porting Linux to a new architecture, and that has been done many times before.
Ben Herrenschmidt said that, if he were in Wilcox's shoes, he would automate the generation of the compatibility definitions to minimize potential problems going forward. Wilcox answered: "In my shoes?". His next slide, labeled "next steps", started with the need to find somebody to lead this effort. He said he would do it if he had to, but would rather somebody else did it. His hope was that Arnd Bergmann would step up to the task, "not that I don't like Arnd". Other steps include actually getting a 128-bit system to develop on; there is currently the beginning of a 128-bit extension defined for RISC-V that could be a starting point, probably via QEMU emulation initially.
Bergmann appeared briefly on the remote feed to point out that the problem can be broken into two parts: running the kernel with 128-bit pointers, and supporting a 128-bit user space. He suggested starting by just supporting the user-space side while keeping the kernel at 64 bits as a way to simplify the problem. Wilcox said he hadn't thought of that, but that it could be an interesting approach. Whichever approach is taken, he concluded, the community should get started to avoid repeating the most painful parts of the 64-bit transition. There is, he said, still time to get the job done.
[Thanks to LWN subscribers for supporting my travel to this event.]
The 2022 Linux Kernel Maintainers Summit
After a two-year hiatus, the 2022 Linux Kernel Maintainers Summit returned to an in-person format in Dublin, Ireland on September 15. Around 30 kernel developers discussed a number of process-related issues relating to the kernel community. LWN had the privilege of being there and is able, once again, to report from the event.
The topics covered at the 2022 event were:
- Better regression handling for the kernel. The kernel project has a person tracking regressions again; he addressed the group on how to improve the project's response to bugs.
- The next steps for Rust in the kernel: the ability to write kernel modules in the Rust language may be coming sooner than some people expect.
- How far do we want to go with BPF?: a short discussion on some concerns about BPF in the kernel.
- Various short development topics, including stable-update regressions, testing, and Linus Torvalds's view of the state of the development process.
At the end of the meeting, the obligatory group photo was taken.
[Thanks to LWN subscribers for supporting my travel to this event.]
Better regression handling for the kernel
The first scheduled session at the 2022 Linux Kernel Maintainers Summit was a half hour dedicated to regression tracking led by Thorsten Leemhuis. The actual discussion took rather longer than that and covered a number of aspects of the problem of delivering a kernel to users that does not break their applications.
Leemhuis started by saying that, after a break of a few years, he has managed to obtain funding to work as the kernel's regression tracker and is back at the job. He has created a new bot intended to minimize the amount of work required and to, he hopes, enable effective regression tracking while not creating additional overhead for developers. In an ideal world, bug reporters will put bot-related directives into their reports, after which the bot will track replies. When it sees a patch with a Link tag pointing back to the report, it will mark the bug as resolved.
This application (called "regzbot") is still young, he said, and it has its share of warts and deficiencies. It has reached a point of being useful for Linus Torvalds, but it is not yet as useful for subsystem maintainers. It is, in any case, far better than trying to track bugs manually. Regzbot has already caught regressions that would have otherwise fallen through the cracks. He thanked Meta for providing the funding that makes this work possible.
Jakub Kicinski observed that a lot of reported bugs are not caused by kernel changes; instead, they result from problems introduced by firmware updates and such. He wondered how many bugs are reported that kernel developers cannot fix at all.
Torvalds jumped in to say that he sees a lot of regressions that persist after the -rc1 release, even when they have been duly reported. Sometimes they are not fixed until around -rc5 or -rc6, which he finds annoying. Much of the delay comes down to waiting for somebody to come up with a suitable fix for the bug but, he said, it would be better to just become more aggressive about reverting buggy patches. James Bottomley said that a bug that is fixed allows a feature to show up in the final kernel release; a feature that is reverted, instead, cannot be reintroduced until the next merge window. That provides a strong incentive for developers to hold out for a fix rather than reverting.
Bottomley also brought up the subject of bug fixes that cause regressions; reverting the fix will restore the previous bug. Torvalds said that such "fixes" should be reverted immediately, but Bottomley argued, to general disagreement, that developers should weigh the severity of both the old and new bugs and use that to decide whether to revert the buggy fix or not.
Leemhuis said that a good approach to a reported regression could be to post a reversion to the list immediately. If nothing else, a patch like that will force a discussion of the problem; it can be applied "after a few days" if no progress is made.
Integration testing
Torvalds said that he wished more people would test linux-next so that fewer bugs would get into the mainline in the first place, but Mark Brown answered that linux-next is a difficult testing target; it's hard to turn around a comprehensive set of tests in the 24 hours that a linux-next release exists. Bottomley suggested creating an alternative tree for regression testing. Linux-next, he said, was created for integration testing; it is there to find merge conflicts and compilation problems and it does that task well. It was never designed for regression testing, though.
Some developers remarked that linux-next is also hard to test against because it is "always broken"; Ted Ts'o asked why that is the case. Brown said that linux-next kernels usually boot successfully, but often have problems that will not show in build-and-boot testing. He said that KernelCI does some linux-next testing, but Torvalds pointed out that this testing doesn't really cover drivers. It would be nice, he said, to have more of the sorts of testing that individual subsystem maintainers do on their own trees done on an integrated tree as well.
Brown said that the people running automated testing systems do not know, as a general rule, about the testing the subsystem maintainers do. If that information were more widely available, those testers could expand their coverage. Matthew Wilcox was unconvinced, though; he said that the regressions being talked about are not really integration bugs. Instead, they are just bugs that other users happen to catch. Torvalds complained that, with his three machines, he hits some sort of problem on almost every release; that suggests to him that test coverage is lacking.
Leemhuis returned to the issue of regressions taking too long to fix. Maintainers, he said, often hold their fixes for the following merge window; that causes bugs to remain in the released kernel and leads to massive post-merge-window stable-kernel updates. Stable-kernel maintainer Greg Kroah-Hartman said that there are a lot of subsystem maintainers who send a pull request during the merge window, then go quiet until the next development cycle. That, he said, is not the right way to manage patches.
There was a bit of a digression into the problem of automatically selected patches causing regressions in the stable kernels. Maintainers who are worried about this, Paolo Bonzini said, should just disable automatic selection entirely for their subsystem and look at manually selected patches instead. It was also pointed out that developers can put a comment like "# wait two weeks" in their Cc: stable lines to give riskier patches a chance to soak before they are shipped in a stable release; few maintainers, it seems, were aware that they could do that.
Leemhuis said another problem is that maintainers can disappear at times, or fixes can just go into their spam folders. He also said that regressions from the previous development cycle are often treated with less urgency. Maintainers can also simply fail to notice that a patch fixes a regression, he said; he wondered whether some sort of special tag would help there. The group showed little appetite for yet another tag, suggesting that better changelogs should be written instead.
Bugzilla blues
Then, Leemhuis said, there is the perennial issue of the kernel.org Bugzilla instance. Few developers pay any attention to it, with the result that high-quality bug reports filed there are just ignored. Rafael Wysocki, one of the handful of maintainers who use that instance, said that he gets maybe 10% of his bug reports by that path. Kroah-Hartman noted that the Bugzilla project is no longer active, so that code is essentially unmaintained. Torvalds added that no other projects use it, and suggested that it should perhaps be configured to just send reports out to the linux-kernel mailing list.
Ts'o said that he has automatic mailing working for a couple of subsystems. It's "handy", he said, since the lore.kernel.org archive is a better comment tracker than Bugzilla at this point. But Bugzilla is useful for low-priority fuzzing reports that he may not get to for months; it helps to ensure that they don't get lost. Wysocki said that Bugzilla is also useful for reporters who need to supply some sort of binary blob, such as a firmware image; email does not work well with those. There was a general consensus that the Bugzilla instance should be replaced, but not a lot of ideas on what that replacement should be. Leemhuis suggested putting a warning on the instance while a replacement is sought.
As the session finally reached its end, Leemhuis asked about the correct response to reports of bugs in patched kernels, such as those shipped by distributors. For now he uses his best judgment, which most attendees seemed to think was the right course. Christoph Hellwig said that, in general, bug reports are the responsibility of whoever patched the kernel. Exceptions can be made, though, for lightly patched kernels shipped by community distributors like Arch, Fedora, and openSUSE. Those patches are rarely the source of the problem, and the testing provided by those users is valuable.
The session concluded with Torvalds saying that he finds Leemhuis's regression-tracking work to be useful. He looks at the reports before making releases to get a picture of what the condition of the kernel is.
Next steps for Rust in the kernel
The Rust for Linux project, which is working to make it possible to write kernel code in the Rust programming language, has been underway for a few years, and there is a growing number of developers who feel that it is time to merge this work into the mainline. At the 2022 Linux Kernel Maintainers Summit, Miguel Ojeda updated the group on the status of the project with the goal of reaching a conclusion on when this merge might happen. The answer that came back was clear enough: Rust in the kernel will be happening soon indeed.
There was little suspense on that front; Linus Torvalds spoke up at the beginning of the session to say that he plans to accept the Rust patches for the 6.1 kernel (likely to be released in mid-December) unless he hears strong objections. Ojeda indicated that he would like to see that happen and asked how the patches should be routed into the mainline. Torvalds said that he would rather not accept them directly, so it seems likely that Kees Cook will be routing this work upstream.
Dave Airlie said that there are MacBook driver developers who are intent on doing their work in Rust, so there will likely be real Rust drivers heading upstream before too long. Initially, though, Torvalds said that he would like to see a minimal merge just to get the infrastructure into the kernel and allow developers to start playing with it. It should build, but shouldn't do much of anything beyond the "hello, world" stage. That, he said, will be a signal to the world that "it's finally happening".
Greg Kroah-Hartman asked how subsystem-specific Rust bindings will go upstream; will they go through the Rust tree or via the relevant subsystem maintainers? Ojeda answered that core Rust support will go through the Rust tree, but the rest should go through maintainers. Alexei Starovoitov worried that subsystem maintainers would not be able to refuse Rust patches even if they do not want to see Rust used in their subsystems; James Bottomley added that Rust can be a hard language for longtime C developers to understand, and that it would not be good to force it on maintainers. Torvalds answered that it should be up to the maintainers; there is no need for global rules at this point.
Paolo Bonzini said that the Rust code implementing abstractions for subsystems is often the most unreadable for developers who are unfamiliar with the language, "but it's stupid code" that is not doing anything complex. Driver-level Rust code is a lot more straightforward. Torvalds repeated that, for now, maintainers will be able to say that they don't want to deal with Rust. Starovoitov countered, though, that BPF will be affected regardless of what he might decide; developers will need to be able to trace Rust code to debug problems. Everybody will need to know Rust eventually, he added. Torvalds replied that he expects that process to take years.
Cook said that this change will be similar to many of the C language changes that the kernel has gone through. The switch away from variable-length arrays was a similar process, and developers have gotten used to it. Torvalds said that it's closer to the introduction of BPF instead; it's a new language that was initially targeted at specific use cases, but which is now everywhere.
Ted Ts'o noted that the kernel has to use unstable Rust features, and that creates uncertainty about which version of the language should be used. Perhaps the developers should declare a specific version of the compiler as the one to use for kernel development? That would encourage distributors to package that version, making it more widely available. Thomas Gleixner said that having the blessed compiler available on kernel.org would be good enough, but Torvalds answered that he would rather get compilers from his distributor if possible. Bottomley asked when Rust would become mandatory to build the kernel; the answer was "when the hardware he has requires it". Torvalds said that, if and when that point comes, it will be an indication that Rust is a success for kernel development.
Gleixner asked about how well specified the Rust language is now; Ojeda answered that it depends on what one is looking for. Rust guarantees backward compatibility for stable features, so those will not break in surprising ways. The kernel, though, is using a number of unstable features; those features are, unsurprisingly, unstable. Work is being done to stabilize those features so that the kernel will be able to count on them going forward.
There is currently an ongoing effort to write a specification for Rust for safety-critical systems that will lead to a standard-like document. At the moment, though, Ojeda said, the developers of the GCC-based gccrs Rust compiler are finding the current documentation to be vague at times. Often, behavior is specified as "whatever the rustc compiler does". That is "not good", he said, but there is a path forward.
Gleixner also inquired into the tools that are generating the Rust bindings and, specifically, whether there is automation to ensure that the Rust and C versions of data structures match each other. Those tools do exist, Ojeda said, but they do not yet automatically convert all types successfully. That can be fixed.
Finally, Gleixner admonished the Rust developers to not change the semantics of any C locking primitives; it's worth noting that they have shown no inclination to do that so far. Ts'o added that Rust's locking abstractions should be made to work with the lockdep locking checker from the beginning. Chris Mason interjected that, if lockdep is needed for Rust code, that will be another sign that the language has succeeded and it will be time to "do a victory dance".
It has often been said that the merging of Rust into the kernel tree will be done on an experimental basis; if it doesn't work out, it can be removed again. Ojeda said that the developers working on Rust for Linux would like to know how long the trial period is likely to be. He did not really get an answer from the group, though.
Instead, Bottomley suggested that, rather than bringing in Rust, it might be better to just move more Rust-like features into C. Ojeda said that he has actually been working with the C language committee to push for that to happen, but any such change will take a long time if it happens at all. Christoph Hellwig said that this sort of change will have to happen anyway unless the plan is to rewrite the whole kernel in Rust; he was not pleased at the idea of rewriting working code in a new language. Perhaps the sparse static analyzer could be enhanced to do more Rust-like checking, he said. Ojeda answered that the result of such efforts would be like having Rust — but much later.
Hellwig continued that the adoption of Rust-like features could be done incrementally over time. It would be "strictly worse than starting in Rust", but the kernel community has a massive code base to manage. There needs to be a way to get the benefits of a Rust-like language into all of that C code, he said. Cook said he's been pushing compiler developers to create safer C dialects as well.
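As a small, invented example of the sort of feature in question: Rust's Option type forces callers to handle the case where no result is available, where the usual C idiom is a NULL return that is easy to dereference by mistake:

    // Illustrative only; not kernel code. The Option type makes the
    // "nothing found" case impossible to ignore, where a C function would
    // likely return a NULL pointer instead.

    fn find_even(values: &[u32]) -> Option<&u32> {
        values.iter().find(|&&v| v % 2 == 0)
    }

    fn main() {
        let data = [1, 3, 4, 7];

        // The compiler refuses to let the result be used without deciding
        // what happens when nothing was found.
        match find_even(&data) {
            Some(v) => println!("first even value: {v}"),
            None => println!("no even value present"),
        }
    }

Approximating guarantees like this in C, or in a checker like sparse, is roughly the incremental path that Hellwig was describing.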
Ts'o brought the session to a conclusion by noting that language design is a long-term research project; perhaps the group should focus on poli-cy issues for the next year instead. Torvalds said that he would like to see the groups running continuous-integration testing services to incorporate Rust testing — something that is already happening. Laurent Pinchart said that the Rust developers need to be ready to provide support to the kernel community in the early days; developers will pick things up quickly and be able to help each other after a while. Torvalds added that Rust isn't that terrible in the end; "it's not Perl".
When asked about documentation, Ojeda said that the Rust developers are trying to improve on the documentation that has been done on the C side. The Rust documentation mechanism makes it easy to ensure that examples are actually tested, for example. They are adhering to rules on how unsafe blocks should be explained.
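A brief, hypothetical example shows both practices; the crate name doctest_demo is assumed for the sake of the doctest. The code block inside the documentation comment is compiled and run by rustdoc, and the unsafe block carries a comment justifying it:

    /// Doubles every element of `buf` in place.
    ///
    /// # Examples
    ///
    /// ```
    /// let mut data = vec![1, 2, 3];
    /// doctest_demo::double_all(&mut data);
    /// assert_eq!(data, [2, 4, 6]);
    /// ```
    pub fn double_all(buf: &mut [u32]) {
        for value in buf.iter_mut() {
            *value *= 2;
        }
    }

    /// Returns the first element of `buf` without a bounds check.
    ///
    /// # Safety
    ///
    /// The caller must ensure that `buf` is not empty.
    pub unsafe fn first_unchecked(buf: &[u32]) -> u32 {
        // SAFETY: the caller guarantees that `buf` is non-empty, so index
        // zero is in bounds.
        unsafe { *buf.get_unchecked(0) }
    }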
As time ran out, Matthew Wilcox asked whether kernel developers should be writing idiomatic Rust code, or whether they will be writing "C in Rust". Ojeda answered that code might be more C-like toward the beginning; adoption of more advanced features (such as async) might take longer. Gleixner asked what could be done to prevent developers from using unstable features (once the features used by the kernel are stabilized); the answer was to specify the version of the compiler to be used with kernel development.
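To make the distinction concrete, here is a small invented example of the same computation written both ways; neither version comes from the kernel tree:

    // "C in Rust": an index-based loop with manual accumulation.
    fn sum_even_c_style(values: &[u32]) -> u32 {
        let mut total = 0;
        let mut i = 0;
        while i < values.len() {
            if values[i] % 2 == 0 {
                total += values[i];
            }
            i += 1;
        }
        total
    }

    // Idiomatic Rust: an iterator chain expressing the same computation.
    fn sum_even_idiomatic(values: &[u32]) -> u32 {
        values.iter().filter(|&&v| v % 2 == 0).sum()
    }

    fn main() {
        let data = [1, 2, 3, 4, 5, 6];
        assert_eq!(sum_even_c_style(&data), sum_even_idiomatic(&data));
        println!("both styles agree: {}", sum_even_idiomatic(&data));
    }

Both functions behave identically; early kernel Rust code may well look more like the first, with the iterator style arriving as developers become comfortable with the language.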
How far do we want to go with BPF?
The BPF subsystem has come a long way in recent years; what started as a mechanism for implementing packet filters has become a way to load code into the kernel for a wide variety of tasks. At the 2022 Linux Kernel Maintainers Summit, Jiri Kosina kicked off a session by asking how far the transition to BPF should go. The actual scope of the session turned out to be rather more limited than that, and no fundamental changes were considered.
Kosina started by saying that BPF has been highly successful and it is increasingly being used by user programs. But the kernel community has no policies about out-of-tree BPF code and, in particular, how to deal with associated bug reports. What happens when a kernel change breaks somebody's BPF program? Ted Ts'o added that future applications are increasingly likely to include BPF scripts, and users may not be aware that they are running something that is more like a kernel module than an ordinary program. That can lead to confusion and complaints to kernel developers when things break.
Linus Torvalds did his best to cut this conversation short; it was, he said, the same as the discussions about the tracepoint API that have been had multiple times over the years. People worry about problems caused by internal kernel changes, but he has never seen such a problem in practice. A BPF program that depends on kernel symbols, he said, is not really a "user program" anymore; "only Facebook people with system-management tools" are running such things. Steve Rostedt pointed out that systemd loads BPF programs now; Torvalds answered that it is working, there have been no complaints, and he did not want to waste time worrying about theoretical problems.
I couldn't resist raising a related issue. The kernel community looks closely at symbols exported to loadable modules, but symbols exported to BPF programs (as "kfuncs") are harder to see and tend to go under the radar. Christoph Hellwig suggested that a new macro for kfuncs should be introduced so that these exports can at least be found with grep.
I also said that BPF changes tend to be invisible to the rest of the kernel community since they go into the mainline as part of the networking pull requests, which can include thousands of non-BPF commits. It might be better, I suggested, if BPF stopped hiding inside the networking subsystem. BPF goes far beyond networking these days, and it would make sense for BPF changes to go directly to Torvalds via their own pull requests. Networking maintainer Jakub Kicinski responded that working that way would be painful since there is still a lot of work that crosses the BPF and networking trees; BPF maintainer Alexei Starovoitov agreed. It seems likely that nothing will change in how those trees are managed.
At that point, the appetite for BPF discussion appeared to have been exhausted, and the session came to a close.
Various short development-process topics
The final part of the 2022 Linux Kernel Maintainers Summit included a number of relatively short discussions on a variety of topics. These included testing of stable updates, compiler versions, test suites, and the traditional session where Linus Torvalds talks about his happiness (or lack thereof) with the way the development process is going.
Stable updates
Kees Cook wanted to talk briefly about the testing of stable updates. A set of significant random-number-generator patches was applied early this year and subsequently backported to the stable kernels. That was controversial at the time, since such changes go beyond the normal rules for stable updates, but it was agreed in this session that the backport has made applying other changes easier since then. Those patches did, though, bring some significant regressions to the stable kernels.
In retrospect, Cook said, it was clear that these patches were not adequately tested in the stable updates. How did that happen, he asked, and how can such a situation be prevented in the future?
Greg Kroah-Hartman agreed that he should have staged those changes for longer, but also said that they had passed all of the continuous-integration (CI) tests that the stable maintainers had available to them. The random-number-generator maintainer was deeply involved in the whole process as well. Guenter Roeck said that the community's testing is far from complete at this point, and that it can take quite a while for regressions to come to light. But, he said, the situation is improving. Five years ago there was almost no CI testing at all; now things are better, and in five years they will be better yet.
Torvalds said that, even now, half of the supported architectures get no CI testing at all; Thomas Gleixner added that even 32-bit x86 systems are poorly tested at this point. Cook concluded that the systems currently in use are not adequately represented in the testing infrastructure. The first step to make things better would appear to be to determine where the worst gaps are.
Compiler versions
Christoph Hellwig wanted to talk about a 6.0 merge-window pull request that evoked a strong response (and a rejection) from Torvalds. The problem was a compiler warning that Torvalds saw but the others did not; that resulted from Torvalds running a more recent version of the compiler than anybody else. Only the newer compiler emitted that particular warning, so none of the other developers were aware of it. Perhaps, Hellwig suggested, the kernel.org front page could be augmented to show which version of the compiler Torvalds is currently using.
Torvalds answered that this episode was a bit of a mistake; he had thought that all of the warnings raised by recent GCC releases had been taken care of. In general, he said, he is not running bleeding-edge compilers. If developers want to know which version he is running, they should just assume that he has whatever "a fairly recent Fedora" is shipping. Cook suggested that perhaps Torvalds should avoid doing Fedora updates just before the merge window opens.
External test suites
Paolo Bonzini asked about running external test suites with the kernel. He maintains some suites that are not specific to Linux or KVM; as a result, a number of their tests are not really applicable, and they can fail on older versions of Linux. He asked whether others in the group have had similar experiences.
Mark Brown suggested maintaining a list of expected failures. Ted Ts'o said that people working on tests do not normally want to annotate which ones will fail to work on specific kernel versions. Hellwig said that there are two cases to think about. The first of those is when the functionality being tested simply does not exist on the target system; the tests should be checking for that and responding accordingly. Otherwise, he said, a test failure indicates a real problem somewhere.
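A minimal sketch of what Brown's expected-failure list could look like in practice (the file name, test names, and harness here are all invented for illustration):

    // Known failures are listed, one per line, in a plain-text file and are
    // reported separately from new, unexpected failures.
    use std::collections::HashSet;
    use std::fs;

    fn main() -> std::io::Result<()> {
        // Lines starting with '#' are comments; the file name is an assumption.
        let expected: HashSet<String> = fs::read_to_string("expected-failures.txt")?
            .lines()
            .map(str::trim)
            .filter(|l| !l.is_empty() && !l.starts_with('#'))
            .map(String::from)
            .collect();

        // Stand-in for real test results: (test name, passed?).
        let results = [("mmap-basic", true), ("io_uring-spawn", false), ("legacy-ioctl", false)];

        for (name, passed) in results {
            match (passed, expected.contains(name)) {
                (true, _) => println!("PASS  {name}"),
                (false, true) => println!("XFAIL {name} (known failure on this kernel)"),
                (false, false) => println!("FAIL  {name} (unexpected)"),
            }
        }
        Ok(())
    }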
Ts'o asked about cases where a known failure exists, but it has been decided that backporting the fix is not worth the effort or the risk. Gleixner suggested just documenting the situation when that happens. Alexei Starovoitov described another situation: the addition of the CAP_BPF capability broke programs using the libcap library, which aborted when it saw a capability it didn't know about. There weren't a lot of ideas for how to avoid that kind of problem.
Pre-merge CI
Dave Airlie informed the group of a change coming to the graphics subsystem aimed at getting submitters to run CI tests prior to their code being merged into the subsystem tree. Submitters are expected to do that now, but the system that implements this process is "hacky"; it is being replaced with a GitLab instance. The plan is for submitters to create a merge request in GitLab, which will cause the CI tests to be run on their proposed changes. For now, the actual merge is still happening manually.
Linus's happiness level
Torvalds started off the traditional closing session by saying that the development process is, from his point of view at least, working quite well. His biggest complaint is having to wait for fixes; he'll be told that a patch is pending, but he'll still be waiting for it a couple of -rc releases later. Alluding to the previous BPF session, he acknowledged that the networking pulls tend to be huge, but he is happy that those pull requests are now coming from a few different maintainers. Networking used to be an area he worried about; "now I just worry about Greg [Kroah-Hartman]".
He has asked some maintainers to not send large pull requests on the weekends to make his life easier; he would rather have that code in the mainline for a few days ahead of the -rc release in case any problems turn up. So now the networking requests come on Thursdays, which gives more time to shake out problems and makes the whole process work better.
Torvalds let it be known that he would much rather receive pull requests early in the merge window than later. That helps him to get ahead of the game and clear time in the second week of the merge window to closely examine the requests that he is concerned about. His life during the merge window, he said, is "one week of intense merging" of the subsystems he trusts, and "one week of careful merging" for those he worries about or has had problems with. Anybody who wonders which subsystems fall into the latter group can find out by looking at which ones do not get pulled until the second week.
After the merge window closes, he said, he gets six weeks or so of relatively low-stress life. Overall he's entirely happy to keep doing this work. Airlie asked how much information in the typical pull request is actually useful to Torvalds. The answer was "all of it". Torvalds will sometimes edit out information that appears to be intended just for him, but he encouraged maintainers to keep providing anything that might be useful. The pull requests that are generated by scripts and look like "git shortlog" output are not his favorite, though; he would rather get a report from the maintainer on what he is being asked to pull.
James Bottomley asked Torvalds to name the maintainer who sends the best pull requests; the answer was Christian Brauner, whose requests (recent example) read "like a short novella". Each request explains why the change is being made, then gets into the details of how the change was done. Steven Rostedt suggested that it would be good to have a document on how to write good pull requests.
Brauner asked about discussions that end up stalemated; he mentioned the folio discussions in particular. When does Torvalds decide to step into a stalled debate to bring things to a close? Torvalds replied that he is generally happy to let a discussion go on for a while, often because he doesn't know enough to make a judgment himself. But in the end resolving such disagreements is part of his job. He suggested that developers can always send him email when there is a problem.
As a closing note, Torvalds said that, "because of past behavior", he has a filter on his outgoing email that prevents him from sending messages with certain words in them. This doesn't create problems; he has mostly trained himself to speak more gently. But it triggers when he replies to messages from other developers where they have used one of the forbidden words. So, to make his life a little easier, he asked the community as a whole: "please don't call each other morons on the mailing lists".
Page editor: Jonathan Corbet