LWN.net Weekly Edition for January 30, 2025
Welcome to the LWN.net Weekly Edition for January 30, 2025
This edition contains the following feature content:
- Vendoring Go packages by default in Fedora: the ongoing story of the impedance mismatch between language-specific repositories and the traditional Linux distribution model.
- The Rust 2024 Edition takes shape: the next major Rust edition is about to land; here's what the final changes look like.
- The first part of the 6.14 merge window: an overview of what has landed in the mainline so far.
- The trouble with the new uretprobes: a newish (and somewhat strange) system call creates problems with container-management systems.
- FOSDEM keynote causes concerns: who should be given a platform at this important conference?
- Offline applications with Earthstar: a privacy-oriented database system.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
Vendoring Go packages by default in Fedora
The Go language is designed to make it easy for developers to import other Go packages and compile everything into a static binary for simple distribution. Unfortunately, this complicates things for those who package Go programs for Linux distributions, such as Fedora, that have guidelines which require dependencies to be packaged separately. Fedora's Go special interest group (SIG) is asking for relief and a loosening of the bundling guidelines to allow Go packagers to bundle dependencies into the packages that need them, otherwise known as vendoring. So far, the participants in the discussion have seemed largely in favor of the idea.
Discussions about vendoring and distribution packaging are not new nor unique to Go or Fedora. LWN has covered the overlap between language and distribution package managers in 2017, vendoring and packaging Kubernetes for Debian in 2020, a discussion around iproute2 and libbpf vendoring also in 2020, and another Debian conversation about vendoring in 2021—and there's no doubt similar discussions have taken place in the interim. It is a recurring topic because it remains an unsolved problem and a perennial pain point for packagers for Linux distributions that have policies that discourage bundling.
Those policies are not meant to frustrate packagers, however, but to benefit the other contributors and users of the distribution. If there is a bug or security flaw in a dependency, for example, it is easier to update a single package than if the dependency is bundled in several packages. Licensing and technical review of packages with bundled dependencies is more complicated, and there is a greater chance that non-compliant licenses and bugs will be missed. Bundling also leads to bloat: if five packages bring along their own copies of a dependency, that is a waste of disk space for users. While disk space is less precious than it was when distributions first developed these policies, there is no sense in wasting it. In short, there are good reasons Fedora and other distributions have developed packaging policies that discourage bundling.
Nevertheless, those policies do frustrate packagers—many of whom are volunteers with limited time and patience to dedicate to the task of splitting out a program's dependencies into multiple packages. Go programs are known to be particularly painful in this regard. Last year "Maxwell G" wrote a blog post about the problems with unbundling Go to meet Fedora's guidelines. They pointed out that the Go SIG maintains 556 application packages, with 1,858 library packages to support those applications:
Maintaining these library packages is not exactly straightforward. The Go packaging ecosystem is [non-ideal] for distributions that wish to maintain each library as a separate package—it was never designed that way. A fair number of libraries do not publish tagged releases, so each project depends on an arbitrary commit of those libraries. Other widely used libraries, such as the golang-x-* projects, only have 0.Y.Z beta releases, despite being depended on by a large number (494) of source packages. (This is a recent development; for a while, these packages did not have tagged releases either.) Furthermore, the Go module ecosystem lacks a way for packages to set compatible version ranges like, for example, the Python and Rust crate ecosystems. Each project just sets a single version in the go.mod file. We cannot rely on upstream metadata or stable versions to help us determine whether an update is incompatible with other packages in the distribution like other Fedora language ecosystems do.
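To see the shape of the problem, consider a hypothetical go.mod fragment (the module path and versions below are invented for illustration): every requirement pins exactly one version, and an untagged library can only be referenced by a pseudo-version naming a specific commit — there is no syntax for a compatible range:

    module example.com/sometool

    go 1.22

    require (
        // A tagged, but still 0.Y.Z, release.
        golang.org/x/sys v0.18.0
        // An untagged dependency: a pseudo-version pinning one commit.
        github.com/example/lib v0.0.0-20240115103000-abcdef012345
    )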
To cope with the mismatch between Go and RPM packaging, Maxwell G has been working on the go-vendor-tools project to handle creating reproducible vendor archives and better handle license scanning for those dependencies. They said recently that the Go SIG has moved 25 packages to this new tooling, including Docker. Maxwell G recommends bundling by default for Go software.
The proposal
The most recent discussion started when Álex Sáez, a member of the Go SIG, opened a ticket about the SIG's "challenges in maintaining and updating Go dependencies" with the Fedora Engineering Steering Council (FESCo) on January 13. There has been a "growing difficulty in managing Go dependencies in the container ecosystem", which is making it difficult to update packages that have new dependencies. Kubernetes, Docker, Podman, and many other applications popular in the container ecosystem are primarily written in Go.
Fedora's guidelines currently state that Go dependencies should be unbundled by default, but say that it "can be reasonable to build from bundled dependencies" for some projects. Sáez argued in the ticket that the complexity of unbundling dependencies is justification enough, and suggested a shift toward allowing bundled Go packages by default instead. FESCo member Kevin Fenzi recommended gathering feedback from the larger developer community.
Mikel Olasagasti sent a message to the fedora-devel mailing list to make the Go SIG's case and solicit feedback. Allowing vendoring by default would alleviate bottlenecks and improve the stability of Go programs in Fedora, he said. It is urgent now due to the inclusion of Go 1.24 in the upcoming Fedora 42 release, which has caused more than 200 packages to fail to build from source (FTBFS). If consensus on the proposal were to be reached, the Go SIG could implement the change as part of Fedora 43.
Why just Go?
One of the first and most predictable reactions to the idea of loosening bundling restrictions on Go packages was that it would usher in new rules for other languages as well. Michael Gruber called the proposal "the can-opener to deviate from our distribution model in general".
I mean, if I can pull in a myriad of Go or Rust modules for my package to build (without individual review or re-review in case of changed dependencies), then why should I bother unbundling a library such as zlib, regex, openssl, ... ?
Gruber said that if developers were installing dependencies via language ecosystem tools instead of using dependencies from Fedora, it would fundamentally change Fedora to become merely a platform for applications rather than a development platform.
Daniel P. Berrangé replied that the logic should be applied consistently across languages, and suggested that it was time to rethink Fedora's approach to packaging for "non-C/non-ELF language modules". With languages such as Go, Python, Java, Rust, and more, dependencies are "fast moving with a somewhat more relaxed approach to API [compatibility] than traditional ELF libraries". Fedora's approach, he said, relies on "the heroics of small numbers of SIG members"; if those people decide to focus on other things, then support for those languages is at risk of falling apart. He mentioned Node.js, which was maintained almost entirely by Stephen Gallagher until he stepped down from that role in May 2024.
Fedora's review process has critical components, Berrangé said, particularly when it comes to license scanning. But Fedora and its upstreams are "completely non-aligned" when it comes to bundling dependencies, and all signs point to the two moving further apart "for non-infrastructure components of the OS in the non-C/ELF world". He did note that there would be security tradeoffs if Fedora allowed more packages to bundle dependencies, but suggested that there would be more time to work on security updates and improve automation if developers didn't have to do the busy work of unbundling dependencies in the first place.
Gerd Hoffmann pointed out that the current practice of unbundling Go and Rust dependencies does not make it easier to publish security updates, due to static linking: it is not enough to update the package with a security flaw; any packages depending on the updated package have to be rebuilt as well. A hefty side discussion followed after FESCo member Neal Gompa complained about how package rebuilds are handled in Fedora versus openSUSE.
FESCo member Zbigniew Jędrzejewski-Szmek disagreed with the reasoning that Fedora should allow bundling across the board for "consistency", and disagreed with those who opposed bundling because "that'll create a slippery slope to use it everywhere." Other language ecosystems, such as Rust and Python, deal well with unbundling dependencies. He disagreed that modern languages were incompatible with traditional distribution packaging.
Jan Drögehoff disputed the claim that Rust deals well with unbundling dependencies, though he said that Go's ecosystem is even more annoying than Rust's, because its decentralized namespace allows pulling in dependencies hosted anywhere. Rust, in comparison, uses the crates.io registry, which ensures that two dependencies do not share the same name. It should be a matter of when and how bundling is allowed for all languages, he said, not if. Unbundling dependencies causes too much work for packagers:
Lots of Fedora package maintainers are already spread across a dozen packages for which they often don't have the time to properly maintain and most of the time no one cares because there are only a handful of people using it or any programs that consume it hasn't been [updated] either.
Rings to unbind them
Many years ago, outgoing Fedora Project Leader (FPL) Matthew Miller put together a proposal to change how Fedora was developed. Specifically, he proposed the idea of package "rings". (LWN covered this in 2013.) Under that plan, Fedora would adopt a tiered system for packages. Packages in the zeroth tier would adhere to the Fedora Packaging Guidelines we know and love today. Packages in the first tier would have looser guidelines, such as allowing vendoring, while outer tiers would have looser guidelines still. That proposal was never adopted, but Miroslav Suchý suggested revisiting the idea and allowing bundling if a package was in the outer ring and no inner packages were dependent on it.
Jędrzejewski-Szmek said that the ring idea is dead, and should stay dead. The answer to whether bundling is appropriate depends on the implementation language and details of the individual package, such as whether that package is at the core of Fedora with many dependencies or a leaf package with no dependencies. "This does not sort into rings in any way."
Suchý countered that the Copr build system and repositories had become the unofficial outer ring of Fedora. Berrangé also argued that the ring idea had come to exist after all — just outside of Fedora rather than within it.
An increasingly large part of the ecosystem is working and deploying a way that Fedora (and derivative distros) are relegated to only delivering what's illustrated as Ring 1. This is especially the case in the CoreOS/SilverBlue spins, but we see it in traditional installs too which only install enough of Fedora to bootstrap the outside world. Meanwhile ring 2 is the space filled by either language specific tools (pip, cargo, rubygems, etc), and some docker container images, while ring 3 is the space filled by Flatpaks and further docker container images.
As users adopt things from the outer rings, they lose the benefits that Fedora's direct involvement would have enabled around licensing and technical reviews, as well as security updates.
Should we stay or should we vendor?
At least one FESCo member seemed skeptical of loosening the rules for all languages, and not entirely convinced about the Go SIG proposal. Fabio Valentini asked how bundling would let the Go SIG update 200 FTBFS packages any faster, and said that patching vendored dependencies is really annoying, at least when it comes to Rust. "It *is* possible, with workarounds (disabling checksum checks, etc.) but it's very gnarly."
He also said, in reply to a comment about supply-chain security from Richard W.M. Jones, that bundling is not a cheat code to avoid review. "The responsibility to check that vendored dependencies contain only permissible content is still on the package maintainer".
Olasagasti explained that the number of packages a Go program requires in Fedora can far exceed the number of dependencies defined in its go.mod (the file that describes a module's properties, including its dependencies). If a Go package had vendored dependencies "it might not require these additional packages, as they wouldn't be part of the bundled application." The doctl package, which implements the DigitalOcean command-line interface, requires 122 dependencies in its go.mod but 752 Fedora packages to build—629 of which are development packages for Go.
For example, doctl's go.mod file defines 3 indirect dependencies in the containerd modules (console, containerd, and log). However, for Fedora, at least 16 containerd modules would need to be included (aufs, btrfs, cgroups, cni, console, containerd, continuity, fifo, fuse-overlayfs-snapshotter, imgcrypt, nri, runc, stargz-snapshotter, ttrpc, typeurl, and zfs).
FESCo member Fabio Alessandro Locati agreed with Valentini that the conversation should focus only on Go. However, he looked favorably on the idea of moving Go packages to a vendored approach.
At the moment, the debate continues without a clear indication that FESCo will approve or deny the Go SIG proposal. However, there seems to be less resistance to the idea than there might have been a year or three ago. If approved for Go, it seems likely that lobbying will begin shortly after for other languages to follow suit.
The Rust 2024 Edition takes shape
Last year, LWN examined the changes lined up for Rust's 2024 edition. Now, with the edition ready to be stabilized in February, it's time to look back at the edition process and see what was successfully adopted, which new changes were added, and what still remains to work on. A surprising amount of new work was proposed, implemented, and stabilized during the year.
Editions are Rust's mechanism for ensuring stability in a language that makes frequent, small releases, and which is still evolving quickly. Each edition represents a backward-compatibility guarantee: once the edition is finalized, code that compiles on that edition will always compile on that edition. The editions aren't totally frozen — the language can still add new features, so long as they're backward compatible — but the project takes the commitment to backward compatibility seriously. New releases of the compiler are tested against most of the published Rust code on crates.io and the Rust-for-Linux kernel code to ensure that they don't break code written for old editions.
Therefore, the introduction of a new edition represents the language's only chance to introduce backward-incompatible changes. Over the past year, a good number of such changes have piled up, ranging from minor syntax changes all the way to changes in how certain types are interpreted by the type system. While none of the changes are essential for Rust projects to adopt immediately, they do represent something to be aware of when writing Rust code.
Scope changes
One problem that has vexed Rust programmers for some time is the way that the language handles the lifetime of temporary values. For example, the if let construction lets the programmer fallibly unwrap a value. In the current version of Rust, however, it has the annoying property of keeping any temporaries involved in the match alive until after the corresponding else block. For example, in this code the std::sync::Mutex remains locked in the else branch:
    if let Some(x) = mutex.lock().some_method() {
        ...
    } else {
        // Mutex remains locked here
    }
Changing this is a backward-incompatible change, because Rust programs can be sensitive to the order in which items are dropped. In particular, any program that relied on the mutex remaining locked in the else block would be broken if the language changed to drop temporary items earlier. That change is ready to be enabled in the 2024 edition, along with a linting rule to warn programmers about affected programs.
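Code that actually depends on the old behavior can keep it by using a match expression instead, since the temporaries in a match scrutinee live until the end of the whole match in every edition; this is also the rewrite that the migration lint suggests. A minimal sketch, reusing the placeholder some_method() from the example above:

    // The temporary returned by mutex.lock() lives until the end of
    // the match in both editions, preserving the 2021 drop order.
    match mutex.lock().some_method() {
        Some(x) => { /* ... */ }
        _ => {
            // Mutex remains locked here, as before.
        }
    }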
The project is also fixing the order in which temporary expressions at the end of a block get cleaned up. Rust blocks have return values, which lets if statements be used as expressions, among other cases. But when an expression is returned from a block, it might create temporary values. For example, this code creates a temporary value for cell.borrow():
    fn f() -> usize {
        let cell = RefCell::new("..");
        // The borrow checker throws an error for the below
        // line in the 2021 edition, but not the 2024 edition
        cell.borrow().len()
    }
Currently, that temporary value is dropped after the locals from the block (just cell, in this case) are dropped. This is somewhat inconvenient, because it means that the above code is rejected by the borrow checker, since the temporary borrow of cell lives longer than cell itself. Programmers have to work around it by assigning any temporaries to variables explicitly, cluttering the code with unnecessary let statements. So the 2024 edition will fix that problem by ensuring that temporaries from the end of a block are dropped before local variables.
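On the 2021 edition, that workaround looks something like this (a minimal, self-contained sketch of the function above):

    use std::cell::RefCell;

    fn f() -> usize {
        let cell = RefCell::new("..");
        // The temporary Ref guard is dropped at the end of this
        // statement, before `cell` goes out of scope.
        let len = cell.borrow().len();
        len
    }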
Unsafety
A new warning will be issued for unsafe operations in unsafe functions. In current editions of Rust, marking a function as unsafe actually conflates the two meanings of the unsafe keyword: it both marks the function as unsafe to call, and allows the programmer to use unsafe operations in the body of the function. While this makes intuitive sense, Rust programmers have generally found that it invites confusion, because it can make it hard to spot which operations in the body of the function actually are unsafe. In the new edition, using an unsafe operation in an unsafe function without an additional unsafe block to mark the usage will result in a warning.
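A minimal sketch of what the new lint (unsafe_op_in_unsafe_fn) expects; the function and its safety contract here are hypothetical:

    /// Unsafe to call: `ptr` must be non-null and point to a valid u8.
    unsafe fn read_first(ptr: *const u8) -> u8 {
        // In the 2024 edition, the dereference needs its own unsafe
        // block; the `unsafe fn` qualifier alone draws a warning.
        unsafe { *ptr }
    }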
The biggest change to the rules for unsafe in this edition, however, is to how the language handles extern blocks (which declare external functions that can be called using the C foreign-function interface). Currently, all external functions are treated as unsafe, because the C code could potentially do anything. This is somewhat un-ergonomic for simple functions that, in fact, are safe to call. But just letting the programmer assert that some external functions are safe would break Rust's promise that, when undefined behavior occurs, there must be one or more incorrect unsafe blocks that are responsible.
The solution that the language developers have landed on is unsafe extern blocks. Now, the entire extern declaration will be marked unsafe — and the user is thereby duly warned that they must ensure that all of the external function declarations (including the safety of each individual function) are correct, or else face the risk of undefined behavior. Since that warning has moved to the point of declaration, uses of foreign functions no longer necessarily require the use of an unsafe block (although the programmer can still mark a foreign function as unsafe explicitly). In practice, since many Rust projects use auto-generated C bindings, the difference is likely to be minor. The change will reduce the number of unnecessary unsafe blocks used for calling external functions, even if it also adds a step of indirection to chasing down any problematic unsafe code.
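A sketch of the new syntax, declaring two C standard-library functions; getting the signatures right is the programmer's responsibility, which is exactly what the unsafe marker on the block signals:

    use core::ffi::c_char;

    // The programmer, not the compiler, vouches for these declarations.
    unsafe extern "C" {
        // Marked safe: call sites no longer need an unsafe block.
        safe fn sqrt(x: f64) -> f64;
        // Still unsafe: callers must pass a valid NUL-terminated string.
        unsafe fn strlen(s: *const c_char) -> usize;
    }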
The edition will also make the std::env::set_var() and std::env::remove_var() environment-variable manipulation functions unsafe, since they are only supposed to be used in single-threaded environments, but nothing in their types enforces that. Similarly, the edition will require the no_mangle attribute (which disables name mangling) to be marked unsafe, since it can cause the wrong function implementation to be called on some platforms if there are two functions with the same name. It will also no longer be possible to take references to global mutable variables — those must be manipulated with raw pointers in unsafe blocks, instead.
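A short sketch of what those rules look like in practice on the 2024 edition (the variable and values are invented for illustration):

    static mut COUNTER: u32 = 0;

    fn main() {
        // set_var() is now unsafe: the caller must ensure that no
        // other thread is reading the environment at the same time.
        unsafe { std::env::set_var("MODE", "fast") };

        // `&mut COUNTER` is rejected in the 2024 edition; a raw
        // pointer created with &raw must be used instead.
        unsafe {
            *(&raw mut COUNTER) += 1;
        }
    }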
Capturing lifetime information
LWN has covered the work to improve the usability of inferred function return types, although the work has not stopped since that article was published. To briefly recap: the Rust compiler allows programmers to say that a function returns impl Trait for some trait, and the compiler will automatically infer which specific concrete type the function should return. This is a kind of existential type that is particularly useful for functions that return complicated iterators, which have complex types that it would be redundant to have the user write out. The Rust developers call the feature "return-position impl Trait" (RPIT) types.
The problem comes when this feature has to interact with Rust's lifetime system. If the type that the compiler infers contains a reference with a given lifetime, previous versions of Rust had no way for the programmer to name that lifetime, and therefore no way for the programmer to add any bounds to it. This proved to be particularly problematic for asynchronous functions that frequently need to hold references with complex lifetimes. To fix this, the language added a new piece of syntax that allows the programmer to name the lifetimes used in an impl Trait type. For example, this function explicitly says that the lifetime 'a is referenced by the (implicit) return type, allowing the programmer to work with the lifetime by name if needed:
    fn example<'a>(one: &'a Foo, two: &Bar) -> impl use<'a> Sized { ... }
Any lifetimes not mentioned in the use block are not captured, so in the above example two is only used inside the function, not stored in its return type. This solves one of the main problems with RPIT types, and is a good step toward improving the ergonomics of asynchronous functions, which was one of the Rust project's goals for 2024. But the semantics of impl Trait types are currently a bit inconsistent. When there is a use block, both lifetimes and generic type parameters that are captured by the return type must be listed explicitly. When there is no use block, generic type parameters are captured implicitly, but lifetime parameters are not (since that would have avoided the whole problem by giving them names).
In order to make impl Trait types more consistent, the 2024 edition of the language will use the same rules for both lifetime parameters and generic type parameters: they will be implicitly captured by default, and if the programmer wants to give them names (or only capture a subset of them), then they must use a use block.
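A sketch of the difference under the 2024 rules:

    // The lifetime 'a is now captured implicitly, just as a generic
    // type parameter would be; on the 2021 edition this is an error.
    fn keep<'a>(x: &'a u8) -> impl Sized { x }

    // Opting out with an empty use<>: nothing is captured, so the
    // returned value cannot borrow from `x` at all.
    fn copy_out<'a>(x: &'a u8) -> impl use<> Sized { *x }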
Small changes
Cargo, Rust's package manager and build system, is also seeing some changes. The largest one is that it will soon support selecting dependencies with compatible minimum-supported Rust versions. So, for example, a library that declares that it supports Rust version 1.76 will no longer be able to silently depend on libraries that only declare support for Rust version 1.80. This will solve a number of headaches for maintainers.
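Concretely, a crate declares its minimum-supported Rust version in Cargo.toml, and the MSRV-aware resolver (the default for the 2024 edition) will prefer dependency versions compatible with that declaration. A hypothetical manifest fragment:

    [package]
    name = "example-lib"    # hypothetical crate
    version = "0.1.0"
    edition = "2024"        # implies the MSRV-aware resolver
    rust-version = "1.76"   # declared minimum-supported toolchain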
There are many more small changes to the language that are less likely to require changes to existing code:
- The Future and IntoFuture traits will be available without an import.
- The rules for how declarative macros match pieces of syntax are being tweaked, and made more strict.
- Rustdoc tests should run faster.
- Rust's code formatter will sort imports differently (see issues 12476, 123800, and 123802), and support multiple editions.
- The gen keyword (which will be used to indicate generators) is reserved for future use, as are unprefixed guarded string literals.
- The "never" type (!) will be less special.
- And programmers will be able to iterate through boxed slices.
Although there are a great number of backward-incompatible changes destined to land in this edition, adapting to the edition should not be too difficult. One of the requirements for any change proposed for an edition is that there be a way to automatically migrate existing code. Any Rust project using Cargo should be able to run cargo fix --edition and then update the edition declaration in its Cargo.toml file in order to upgrade. Some of the automatic fixes are a little bit ugly (or introduce new unsafe blocks), and so the process is not completely painless. Since Rust will happily mix libraries with different editions, however, many projects will choose to stay on the 2021 edition for some time.
The Rust project did not quite accomplish all of the goals it set for 2024 (perhaps explaining why the 2024 edition will be released in February 2025), but it made substantial progress. And, although many people worry about the incremental accumulation of complexity in the language, the plans for the 2024 edition show that the project is willing to simplify and streamline things over time — albeit not as quickly as some people would like. We'll have to see what gets proposed for the 2027 edition over the next few years.
The first part of the 6.14 merge window
As of this writing, just over 4,300 non-merge changesets have been pulled into the mainline repository for the 6.14 release. Many of the pull requests this time around include remarks saying that activity has been relatively low, presumably due to the holidays; so those 4,300 changesets are probably closer to the merge-window halfway point than usual. Much of the work merged thus far looks more like incremental improvements than major new initiatives, but there still have been a number of interesting changes in the mix.
Some of the most significant changes pulled into the mainline so far are:
Architecture-specific
- The PowerPC architecture has gained lazy preemption support.
- X86 systems using AMD's Secure Encrypted Virtualization feature now support a secure timestamp counter for guests. In short, it allows guests to read timestamps that cannot be manipulated by the host.
- AMD's energy-use counters for CPU cores are now supported in the perf events subsystem.
Core kernel
- The pid_max sysctl knob sets the highest number that can be used for a process ID; it has the effects of limiting the size of PID values and of limiting the total number of processes that may exist. In 6.14, pid_max is now tied to the PID namespace, allowing it to be set independently within containers. It is hierarchical, so no namespace can set pid_max to a value higher than that found in any of its parent namespaces. See this commit for more information about this change.
- When a program is launched with execveat(), the name of the executed file as stored in its directory entry will be shown in /proc rather than (as is done in current kernels) the file-descriptor number that was used. See this article for details on this change.
- The new "dmem" control-group controller regulates access to device memory, such as that found on graphics cards. Documentation is sparse, but there is a brief guide to the configuration of this controller available.
Filesystems and block I/O
- The pidfs filesystem can now create file handles (when requested by a name_to_handle_at() call); these can be used to create a system-wide unique identifier for processes even on 32-bit systems. It is also now possible to bind-mount pidfds.
- The statx() system call can now return the required alignment for read operations on a file; that alignment may be different than the requirement for writes, and some applications can benefit from knowing both.
- Some Btrfs configurations give the filesystem a choice of multiple devices when the time comes to read a specific block. In current kernels, the PID of the reading process is used to make that decision, but that will focus all read traffic onto a single device in a single-reader workload. The 6.14 kernel adds a couple of new policy options that can implement either round-robin read balancing or simply focus reads onto a specific device. See this commit for instructions on enabling round-robin, or this one to set a specific device.
- The bcachefs filesystem has a lot of changes after missing the 6.13 development cycle; these include a major on-disk format change that will require a "big and expensive" format upgrade. The changes include self-healing improvements, filesystem-checking time "improved by multiple orders of magnitude", and more; see this merge message for more information.
- The md-linear target (which essentially concatenates block devices) was removed in 6.8 as being deprecated and unmaintained. It seems that there were still users of this target, though, so it has been restored for 6.14. This change is also marked for the stable updates, so it should propagate to the older kernels as well.
Hardware support
- Clock: Qualcomm X1P42100 graphics clock controllers, Qualcomm QCS615 and SM8750 global clock controllers, Qualcomm SM8750 TCSR clock controllers, Qualcomm SM8750 display clock controllers, Qualcomm IPQ CMN PLL clock controllers, and Qualcomm SM6115 low power audio subsystem clock controllers.
- Graphics: Synopsys Designware MIPI DSI host DRM bridges and ZynqMP DisplayPort audio interfaces.
- Hardware monitoring: TI TPS25990 monitoring interfaces, Intel common redundant power supply monitors, and Analog Devices ADM1273 hot-swap controllers.
- Miscellaneous: NVMe PCI endpoint function targets, Loongson memory controllers, AMD AI engines, STMicroelectronics LED1202 I2C LED controllers, TI LP8864/LP8866 4/6 channel LED drivers, KEBA SPI interfaces, and Airoha EN7581 SoC CPU-frequency controllers.
- Networking: NXP S32G/S32R Ethernet interfaces, Realtek 8922AE-VS PCI wireless network adapters, and QNAP microcontroller unit cores.
Miscellaneous
- The samples directory in the kernel repository contains a new program, mountinfo, which demonstrates the use of the statmount() and listmount() system calls.
- When Rust 1.84.0 (or later) is available, Rust code in the kernel will use the derive(CoercePointee) feature for pointer coercion. That feature is on the Rust-language stabilization track, and its use is an important step toward using only stable Rust features in the kernel. This merge message shows how it can be used.
Networking
- The RxRPC protocol implementation can now make use of huge UDP frames for better throughput. Support for the RACK-TLP loss-detection algorithm has also been added.
- There is a new per-network-namespace sysctl knob — tcp_tw_reuse_delay — that controls how long the system will wait before reusing the port number of a closed TCP socket; its value is in milliseconds.
- It is now possible to select whether an interface MAC or PHY should be used as the provider of PTP timestamps; this merge message gives some examples of how to do this that are presumably intelligible to people familiar with such things.
- IPsec IP-TFS/AGGFRAG (RFC 9347) is now supported.
Security-related
- The "xperms" SELinux feature allows policies to target specific ioctl() calls or netlink messages. In-kernel documentation is missing, but this wiki page has some information.
Internal kernel changes
- The kernel's annotation system, used to add information about code (such as "this jump is safe without a retpoline") would previously create a different ELF section for each annotation type. There is now a generic annotation infrastructure that gathers all of that information into a single section.
The 6.14 merge window can be expected to remain open through February 2, with the 6.14 release most likely happening on March 23. This timing seems more certain than usual, just because it will maximize editorial pain at LWN due to the Linux Storage, Filesystem, Memory Management, and BPF Summit starting on March 24. One way or another, we'll survive the experience and tell you how it goes.
The trouble with the new uretprobes
A "uretprobe" is a dynamic, user-space tracepoint injected by the kernel into a running process; this document tersely describes their use. Among other things, uretprobes are used by the perf utility to time function calls. The 6.11 kernel saw a significant change to uretprobes that improved their performance, but that change is also creating trouble for some users. The best way to solve the problem is not entirely clear.Specifically, a uretprobe exists to gain information at the return from a function in the process of interest. Older kernels implemented uretprobes by injecting code that, on entry to a function, changed the return address to a special trampoline that, in turn, contained a breakpoint trap instruction. When the target process executed that instruction, it would trap back into the kernel, which would then extract the information of interest (such as the function's return value) and run any other attached code (a BPF program, perhaps) before allowing the process to resume. This method worked, but it also had a noticeable performance impact on the probed process.
In an attempt to improve uretprobe performance, Jiri Olsa put together a patch set that changed the implementation on x86 systems. The return trampoline still exists but, rather than triggering a trap, it just calls the new uretprobe() system call, which then takes care of all of the associated work. Since system-call handling is faster than taking a trap, the cost to the probed process is lower when uretprobe() is used. This new system call takes no arguments, and it can only be called from the kernel-injected special trampoline; otherwise it will just deliver a SIGILL signal to the calling process.
Arguably, all system calls are special, but this one takes "special" to a whole new level. It is not something that a process can just call to obtain a useful service from the kernel. uretprobe() is thus unlikely to be on anybody's list of "five new system calls that every programmer should know". It does, however, succeed in accelerating uretprobes by as much as about 30%. This change went into the 6.11 release, seemingly without ill effect.
On January 10, though, Eyal Birger reported an ill effect; kernels that implement uretprobe() were causing Docker containers to crash. The problem is that Docker uses seccomp() to impose a policy on which system calls a containerized system can invoke. The policy used by Docker is, as standard practice would suggest, a default-deny arrangement; if a given system call has not been explicitly enabled, it will be blocked. uretprobe(), not being on any Docker developer's list of new exciting system calls, is not found in the allowlist. As a result, the injection of a uretprobe into a process running under Docker will result in that process's untimely and mysterious death. Docker users will, indeed, no longer notice a performance hit on a traced process, but they are unlikely to express their gratitude for that.
Various possibilities for fixing the problem were suggested at the time. Olsa put together a quick patch to detect a failure to execute uretprobe() and fall back to the old implementation in that case. He also considered simply disabling the uretprobe() implementation entirely when seccomp() is in use, or adding a sysctl knob to control whether uretprobe() is used. Birger, though, disliked the sysctl idea, saying: "'Give me speed but potentially crash processes I don't control' is a curious semantic".
Oleg Nesterov, instead, suggested patching seccomp() to simply ignore calls to uretprobe(), making that system call even more special. Birger returned with a patch implementing Nesterov's suggestion. Kees Cook, however, questioned this change, wondering why uretprobe() needs to be so special. Docker, he pointed out, already handles other weird system calls like sigreturn(); it should be able to do the same with uretprobe():
Basically, this is a Docker issue, not a kernel issue. Seccomp is behaving correctly. I don't want to start making syscalls invisible without an extremely good reason.
Birger responded that this case is indeed different:
I think the difference is that this syscall is not part of the process's code - it is inserted there by another process tracing it. So this is different than desiring to deploy a new version of a binary that uses a new libc or a new syscall.
That reasoning just hardened Cook's position, though. A process might want to defend against injection of uretprobe() by blocking it with seccomp(), he said. The whole point of seccomp() is to implement a policy given to it, he added; there should not be a separate policy within seccomp() itself.
This reasoning, sound as it may be, does little to solve Birger's problem, which, he said, is simply:
The problem we're facing is that existing workloads are breaking, and as mentioned I'm not sure how practical it is to demand replacing a working docker environment because of a new syscall that was added for performance reasons.
This replacement, he said, would not be easy to accomplish. He concluded by wondering if the right solution might be to just revert the uretprobe() change. Olsa said, again, that it might be better to introduce a new sysctl knob to control whether uretprobe() is used, but Cook answered that reverting may be the best choice, at least for now, while some thought goes into how this implementation should interact with features like seccomp(). Olsa then suggested that the best solution might be to disable uretprobe() temporarily, without removing it from the kernel, until Docker can be updated to handle it correctly. Birger went off to consider that idea.
This approach may lead to a solution for this specific problem, though it could take years before enough Docker installations have been updated to make re-enabling uretprobe() safe. But we will be seeing this problem again. Running systems within a sandbox that denies all system calls that have not been explicitly enabled may well be good for security, but that practice will run into trouble when the kernel underlying the whole system routinely adds new system calls. Beyond uretprobe(), the x86 architecture saw the addition of nine new system calls in 2024: setxattrat(), getxattrat(), listxattrat(), removexattrat(), mseal(), map_shadow_stack(), lsm_get_self_attr(), lsm_set_self_attr(), and lsm_list_modules(). There is no reason to believe that the addition of system calls will stop now.
Something will have to give; in this case, that something would appear to be the new uretprobe implementation. But it is hard to imagine that the development community will be pleased at the idea that it cannot add new system calls lest existing Docker implementations break. Perhaps there will never be another system call as special as uretprobe(), with its ability to break systems with just a kernel change but, as Cook pointed out, there have been cases where the addition of a more "normal" system call has caused similar crashes. In summary, it would be surprising if the combination of "don't allow anything new" and "add lots of new things" didn't explode every now and then.
FOSDEM keynote causes concerns
This year's edition of the Free and Open Source Software Developers' European Meeting (FOSDEM) begins on February 1 in Brussels. The event is widely regarded as one of the most important open-source conferences. One of the reasons that FOSDEM is held in high esteem by the community is its non-commercial nature. It does accept sponsors, but sponsorships come with few perks and no "pay-for-play" speaking slots. Thus, the scheduling of a keynote by Jack Dorsey—primarily known for his role in co-founding Twitter, and currently CEO and chairman of FOSDEM sponsor Block, Inc.—raised eyebrows and led to plans for a protest. The keynote has since been removed from the schedule, but there are still a number of lingering questions.
FOSDEM background
FOSDEM got its start in 2000 as the "Open Source Developers of Europe Meeting" (OSDEM). It added "Free" to the mix in 2001, at the request of Richard Stallman, according to the conference's about page. It is held yearly at the Solbosch campus of Université libre de Bruxelles, excepting 2021 and 2022, when the conference was held online-only due to COVID. It brings in thousands of developers from around the world to occupy the 35 rooms across the campus that host the main tracks and developer rooms ("devrooms"). It is, from year to year, unclear how many people attend FOSDEM, as there is no charge for entry, nor any other mechanism for counting attendees.
The FOSDEM devrooms are organized by members of open-source projects or other groups that share interests in a particular topic, such as the Android Open Source Project (AOSP), distributions, monitoring and observability, or software-defined radio—to name only a few. As a disclaimer, I should note that I have spoken at FOSDEM, assisted with organizing devrooms in the past, such as the virtualization and IaaS devroom in 2016, and have interacted with FOSDEM organizers as a sponsor.
While the devrooms are the main attraction for many attendees, the conference does have keynotes held in the Janson room with capacity for more than 1,400 people. The seating capacity for rooms at FOSDEM varies wildly from as few as 48 seats to more than 800 in the next-largest room aside from Janson.
The controversy
Block, Inc. was formerly known as Square, Inc. It owns the Square payment service, the Cash App digital-wallet service, the Tidal music-streaming service, and others. The title of Dorsey's talk was "Infusing Open Source Culture into Company DNA". The abstract (saved on Archive.org) said that the talk would cover "the strategic importance of open-source technology and why Block has committed to building in the open". Dorsey also had a co-speaker for the session, Manik Surtani, Block's head of open source. Surtani was the founder of the Infinispan open-source in-memory database, and has been active in the Java Community Process (JCP).
The talk title and description were unusually corporate-sounding for FOSDEM. A survey of keynotes back to 2013 turns up no other keynote focused solely on a single company's open-source achievements or culture, much less one from a company that is virtually invisible in the space. It is hard to see how it would be highly rated compared to usual FOSDEM fare. Felix Niederwanger published screenshots from the pretalx conference software used by FOSDEM, which show what appear to be low ratings for Dorsey's talk proposal, with the exception of one positive review. "Giving him this stage is both morally and politically wrong and damaging to the philosophy of open source," Niederwanger said.
Curl creator Daniel Stenberg said of the talk:
Jack Dorsey speaking about "Infusing Open Source Culture into Company DNA" when not a single soul has ever heard of him in relation to Open Source before this talk title would have worked as a joke to me.
Drew DeVault, creator of the sourcehut software-hosting service, the Hare programming language, and the aerc mail client, and a frequent commentator on FOSS, found the idea of Dorsey keynoting at FOSDEM less amusing than Stenberg did. He wrote a blog post arguing that "billionaires are not welcome at FOSDEM" and that the Janson stage at FOSDEM was for smaller projects that embody the values of FOSS and for discussing the challenges faced by the community. Dorsey, according to DeVault, was at fault for some of those challenges:
In 2023 this stage hosted Hachyderm's Kris Nóva to discuss an exodus of Twitter refugees to the fediverse. [...] Two years later one of the principal architects of, and beneficiaries of, that disaster will step onto the same stage. Even if our community hadn't been directly harmed by Dorsey's actions, I don't think that we owe this honor to someone who took a billion dollars to ruin their project, ostracize their users, and destroy the livelihoods of almost everyone who worked on it.
DeVault went further than complaining about the keynote, however: he also said he would organize a sit-in protest of the talk. He asked people to meet outside Janson prior to the talk and, once the previous speaker finished, to sit on the stage until the end of Dorsey's scheduled slot.
Statement
On January 16, FOSDEM issued a statement in response to the controversy about Dorsey and news of planned protests of his talk:
To be clear, in our 25 year history, we have always had the hard rule that sponsorship does not give you preferential treatment for talk selection; this policy has always applied, it applied in this particular case, and it will continue to apply in the future. Any claims that any talk was allowed for sponsorship reasons are false.
It went on to say that FOSDEM had always "allowed and welcomed" protest, and that the organizers would not take action against a protest "provided the protest is indeed peaceful and does not disrupt the proceedings". Whether the plan to occupy the keynote stage during Dorsey's talk qualified as disruption was left unstated. The statement also asked protest organizers to contact the FOSDEM team in advance to allow them to meet crowd-control and fire-safety obligations.
A discussion on the Lobste.rs community featured mixed reactions to the talk, the planned protest, and FOSDEM's request for non-disruptive protest. One commenter, "Hail_Spacecake", said they did not agree with DeVault's reasons for protesting the talk and hoped the organizers would physically remove "disruptive" protesters, or summon the police to do so, so that the talk would go on as scheduled. (Update: Lobste.rs moderator Peter Bhat Harkins has notified us that the comment has since been deleted, indicating that they consider calls for "physically removing" protestors to be a dogwhistle for political violence.)
Another commenter, "friendlysock", said that there were "varying degrees of fairness" for critiques against Dorsey, but argued that regular people have benefited from his companies:
Saying "hurr durr he's a billionaire we should automatically dismiss or side against him" is intellectually lazy at best and gleeful willful ignorance at worst.
Esther Payne wrote that it may be true that the sponsorship did not pay for the talk, but "we need to look at the soft power and very real monetary power" someone like Dorsey has. She argued that there is a disconnect between the organizers of FOSDEM and some of the community.
Talk disappears
DeVault published a follow-up post on January 20, in which he noted that the talk had been downgraded from keynote to the "main track", held in the same room at the same time. DeVault said that he had created a mailing list for organizing the protest, and indicated that a number of people had reached out to participate. He also acknowledged that "Dorsey does not bear sole responsibility for Twitter's sale", but said he was complicit and profited handsomely from the sale and "all its harmful consequences".
According to Payne's next post, Dorsey's talk disappeared from FOSDEM's web site on January 21. The organizers have not addressed the reasons for the removal or made any public statement about it since the 16th. After the talk was removed from the schedule, DeVault published another post, this time calling for a discussion about transparency and governance at FOSDEM.
DeVault said that it "strains our presumption of good faith" that the talk was rejected by three of four reviewers (referencing the scores published by Niederwanger), and that it was "kind of weird that we have to take them at their word" that the sponsorship did not play a role. Very little about how FOSDEM operates or is governed, he said, is documented anywhere publicly. "Who makes decisions? How? We don't know, and that's kind of weird for something so important in the open source space." He encouraged those who would have attended a protest to show up and "talk about what we want from FOSDEM in the future" instead. He said that he would prepare a summary of the discussion for the community and FOSDEM staff after the event.
I emailed FOSDEM's press contact, Mark Van den Borre, to ask why Dorsey's talk was selected, and whether it was canceled by the organizers or if Dorsey pulled out. He replied quickly, but said that he was unable to provide answers on those topics—except to clarify that Dorsey's talk "had nothing to do with sponsorship". FOSDEM, he said, has always kept sponsorships and programming separate, and will continue to do so in the future.
In response to the calls for more transparency about FOSDEM planning, spending, and such, he said that "I think you'll be hard pressed to find many more open and transparent conferences than FOSDEM". He cited examples such as FOSDEM's free-entry policy, its community-driven programming for devrooms, the fact that FOSDEM publishes its site setup in a public Ansible repository, and the fact that none of the organizers are paid for their work. He did not specifically address the governance or transparency complaints raised by DeVault. However, he indicated a willingness on behalf of the organizers to listen.
We don't want to rest on our laurels. We want to keep improving ourselves and help [like-minded] organisers improve. Respectful contact with thoughtful visitors, speakers and other conference organisers is crucial to that.
When all is said and done, this is a puzzling episode in FOSDEM's long history. If money played no role in Dorsey landing a main stage talk, it's hard to see why his talk was chosen. He has no obvious bona fides to recommend him as an authority on open source. Block is not known as an exceptional contributor to open source. The talk's title and description would be much more at home at a corporate event than FOSDEM. One hopes the complete story will eventually come out to help the community fully understand what transpired here.
Other critiques
FOSDEM has received criticism for more than its choice of keynotes. Even before COVID, the conference had the reputation of being a likely place to catch a seasonal illness. This is because it takes place during the height of cold and flu season, in a crowded venue with poorly ventilated rooms, and draws people (and their germs) from around the world. After the advent of COVID, some prospective attendees have lobbied for organizers to do more to prevent the spread of COVID and other diseases, and expressed disappointment with FOSDEM's current policies (or lack thereof). For example, Garrett LeSage took issue with the lack of "any kind of sickness mitigation" in the form of a masks policy or better ventilation.
There is also an "alternative, online venue" named FluConf 2025 that calls out FOSDEM as a place where people gather to talk about projects and "spread a cocktail of pathogens known as the FOSDEM Flu". The FluConf organizers have called, instead, for FOSS participants to publish videos, blog posts, podcasts, or other materials that will be linked from the main FluConf website and promoted online with the #fluConf2025 hashtag by the conference's Mastodon account.
Critiques aside, FOSDEM has for decades been pulled together by an assemblage of volunteers with more passion than resources. Putting on FOSDEM is a largely thankless task, so it seems reasonable that the organizers should get some benefit of the doubt when there is a real or perceived misstep. That does not mean the organizers should be immune to criticism or free from calls for improvement, but perspective—and one hopes kindness—is in order. Perhaps DeVault's, and others', calls for more transparent governance could also lead to reducing the workload on individual FOSDEM organizers and improve the event even further.
Offline applications with Earthstar
Earthstar is a privacy-oriented, offline-first, LGPL-licensed database intended to support distributed applications. Unlike other distributed storage libraries, it focuses on providing mutable data with human-meaningful names and modification times, which gives it an interface similar to many non-distributed key-value databases. Now, the developers are looking at switching to a new synchronization protocol — one that is general enough that it might see wider adoption.
The library
Earthstar is written in TypeScript (making it part of the JavaScript ecosystem), and designed to run both on native clients and in the browser. It has ten contributors, although the project's founder, Sam Gwilym, is certainly the most active. It has also gone through ten major releases since the project was started in 2020, and is now entering the final steps of preparing for the eleventh. Despite the library's rapid releases, the interface it presents has remained largely the same during that short time — at least, for a JavaScript library.
The Earthstar user guide explains the API for applications to build on. In short, after creating a key pair and establishing one or more connections to other peers, an application can write a binary blob to a particular key, and that data will be synchronized to all the other online peers that have permission to read that data. In this way, the total set of all values held by participants in the system forms a distributed key-value database. The database is offline-first and eventually consistent, so it doesn't make any guarantees about how long it will take for an update to be propagated to interested peers, but in practice the synchronization is fast enough to use for real-time chat applications when all of the involved peers are online.
The library does not yet support discovering new peers over the internet. There's no technical reason that this can't be supported in the future, but for now applications will need to implement their own peer discovery (such as by having peers put their public IP addresses into the application's Earthstar database). It does support automatically discovering peers on the local network, however. The project's documentation also covers how to create a relay server for an application.
There are a number of small examples that use the library, both in the documentation and generally available on the internet. Two of the most useful existing programs that use Earthstar are buntimer, a timer application that allows you to keep a todo list of timers in sync across multiple devices, and famstar, a simple photo-sharing and messaging application.
The protocol
Earthstar's new protocol is called Willow — a joint project of Gwilym and Aljoscha Meyer. Willow is a meta-protocol: it specifies most of what one needs for a distributed application, but leaves the choice of several key parameters, including encryption, up to the application. The new version of Earthstar will therefore be only one possible use of Willow, and other projects may change how the protocol is instantiated to fit their needs. Willow is split up into a handful of separate specifications for different pieces of the protocol. The central one is Willow's data model, which describes the kinds of data Willow can sync and how it's organized.
Willow stores arbitrary sequences of bytes organized by four different attributes: a path, made up of arbitrary human-readable path components; a timestamp, used to provide human-understandable modification times; a "subspace" that has a certain set of associated read and write permissions; and a "namespace" that separates data from different applications. Applications can query for values by any of those factors, with a few complications.
The first complication is timestamps. Paths in Willow are made up of a series of path components — analogous to directories in a filesystem. Writing a newer value to a prefix of an existing value's path (e.g., a new value for /a when there is an existing value for /a/b) overwrites the previous value. This is used to implement both updates and deletion; allowing for prefixes allows for the deletion of entire directories. While applications can query for values newer than a given timestamp, only the most recent version of a document will be returned.
The decision to use timestamps in a distributed system is an unusual one — most comparable protocols use vector clocks or systems like append-only logs. So Willow has a published justification for the decision. In short, the protocol doesn't actually rely on timestamps for correctness — even if all of the timestamps were zero, the protocol would still be able to pick a single, consistent document to call the "most recent" version. It might not be the version a human would have chosen, however.
Ultimately, the purpose of timestamps in the protocol is to make the idea of "the newest version" of a mutable resource in a distributed system more human-meaningful. In complex systems, understanding why a particular version is deemed newer than another can be difficult — it is much less so when the program can refer to the notion of time.
The next complication is read and write permissions. Many protocols allow a participant to discover what data a peer has available; so even if a peer didn't have permission to read some data, it could certainly tell whether the data existed. Willow doesn't work like that — it uses a simple zero-knowledge proof (detailed here) to make it so that peers can only exchange information about subspaces that they both have permission to read. In a typical use of Willow, each subspace will be controlled by a separate user (although users can have more than one subspace), who then grants read permission to whoever they desire.
The exact choice of authentication algorithm is one of the things that Willow leaves up to the implementation; all that it requires is that there is a function for determining whether an identity is permitted to read from or write to a subspace. Earthstar uses a capability system called Meadowcap to determine this. Some applications will want to allow a user to create their own subspace with no coordination; others will want identities to be managed by a centralized authority. Meadowcap supports both use cases in different ways. In particular, it supports granting not only read or write permissions, but also the ability to allow other users to also grant permissions for a particular set of keys. This makes the system more suitable for potential offline use.
Meadowcap also supports having credentials expire at a given time. Unfortunately, this is more complicated than it might initially seem, because Willow is designed to operate offline; a client can use an expired credential by just lying about the time, and pretending that an update was only now added to the network. So properly using expiring credentials in an application demands some care. The easiest way to handle it is to update the timestamp on a document to just after the expiry time of a credential when it expires — then, even if a client lies and pretends to have an update constructed in the past, it will be discarded in favor of the newer entry.
In any case, Willow presents a simple, flexible base for distributed storage. While the protocol doesn't handle conflict resolution at all (documents are always "last write wins"), leaving that up to the application, it does handle synchronization. Willow supports both active exchanges of information between peers, and the creation of "drops" of data that can be used to implement a sneakernet. Unlike online synchronization, using drops requires transmitting a whole portion of the database, not just the updates; on the other hand, portable storage is usually sufficiently dense to mitigate that problem. Compared to other protocols for distributed storage, Willow offers more flexibility and privacy, along with true mutability and deletion.
The downsides
But Willow's design decisions do come with disadvantages. For one thing, the fact that peers need read permissions to a subspace in order to even learn about the existence of it has huge privacy benefits — but it also means that peers without those read permissions can't act as relays for those that do. In order to synchronize the documents in a subspace with other users, the users need to be online simultaneously, or use the space-inefficient sneakernet method.
Another problem is with the choice to leave conflict resolution up to the application. While this makes sense for a generic protocol that aims to be used across many different areas, it does mean that there is still a decent amount to do to build an application on top of Willow. The most straightforward approach is to assign each user a separate subspace, and rely on them to not introduce conflicts in their own files, but that's not a robust or general solution.
Finally, while Willow is designed to make true deletion of data possible — something that is difficult in systems based around append-only logs — this is in some respects an impossible promise to keep. Deletion of data only works if all of the participating peers that have the data stored actually receive and honor the update. While Willow's eventual-consistency guarantees mean that this will happen if all of the peers are eventually online again, in the real world that might not happen. Also, nothing stops a peer from keeping copies of deleted data on its own. So the protocol's provisions for deleting data are useful for making moderation and removal of illegal content possible in situations where all of the protocol participants are honest, but they aren't a solid guarantee that the application developer can rely on.
Comparison
Earthstar's documentation has a comparison to a few different methods of synchronizing data across the internet. Its use of relay servers for long-distance connection and direct connections for local peers is similar to Scuttlebutt, the decentralized social media platform. Its use of paths to identify opaque blobs of data is more similar to the InterPlanetary File System (IPFS). I think the library that provides the closest comparison, however, is gun. Both Earthstar and gun are written using JavaScript, are designed for offline-first use, and present similar interfaces.
Earthstar is not a large project. Compared to existing solutions for distributed applications, it's still missing some important features. Its data model is both more simplistic and more restrictive than other tools. But in exchange for those tradeoffs, it presents a compelling, human-friendly solution for distributed data. It will not be the right solution for every use case, but it may be the right solution for some of them.
Page editor: Jonathan Corbet
Inside this week's LWN.net Weekly Edition
- Briefs: Git security; Ubuntu discussion; LWN EPUBs; Facebook moderation; Quotes; ...
- Announcements: Newsletters, conferences, security updates, patches, and more.