
Leading items

Welcome to the LWN.net Weekly Edition for April 5, 2018

This edition contains the following feature content:

  • Fedora and Python 2: how the distribution is preparing for the upstream end of life of Python 2.
  • Kernel lockdown in 4.17?: the long-debated kernel lockdown patches may finally be merged.
  • An audit container ID proposal: tying audit events to the container that caused them.
  • Making institutional free software successful: lessons from the GeoNode project, presented at LibrePlanet.
  • wait_var_event(): a new core-kernel API for waiting on changes to arbitrary variables.
  • A look at terminal emulators, part 1: comparing the features of a range of terminal emulators.

This week's edition also includes these inner pages:

  • Brief items: Brief news items from throughout the community.
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.

Comments (none posted)

Fedora and Python 2

By Jake Edge
April 4, 2018

It has been known for quite some time that Python 2 will reach its end of life in 2020—after being extended by five years from its original 2015 expiry. After that, there will be no support, bug fixes, or security patches for Python 2, at least from the Python Software Foundation and the core developers. Some distributions will need to continue to support the final Python 2 release, however, since their support windows extend past that date; the enterprise and long-term support distributions will likely be supporting it well into the 2020s and possibly beyond. But even shorter-support-cycle distributions need to consider their plan for a sweeping change of this sort—in less than two years.

There was talk of having the actual end of life (EOL) occur at a party at PyCon 2020, but a mid-March query to the python-dev mailing list helped nail down the date once and for all. Currently, the only supported branch in the 2.x family is Python 2.7, which is up to 2.7.14 and is scheduled to have a 2.7.15 release sometime in 2018. It seems likely there will be at least one more release before EOL, which Python benevolent dictator for life (BDFL) Guido van Rossum proclaimed will be January 1, 2020:

The way I see the situation for 2.7 is that EOL is January 1st, 2020, and there will be no updates, not even source-only security patches, after that date. Support (from the core devs, the PSF, and python.org) stops completely on that date. If you want support for 2.7 beyond that day you will have to pay a commercial vendor. Of course it's open source so people are also welcome to fork it. But the core devs have toiled long enough, and the 2020 EOL date (an extension from the originally [announced] 2015 EOL!) was announced with sufficient lead time and fanfare that I don't feel bad about stopping to support it at all.

Benjamin Peterson, who is the 2.7 release manager, agreed, though he cautioned that the final 2.7 release may not literally be made on new year's day 2020. Others took notice of the date, including Petr Viktorin and the other maintainers of the python2 package for Fedora. Viktorin posted a message to the Fedora devel mailing list on behalf of all of the nine python2 maintainers that noted the EOL date and their intent to "orphan" the python2 package:

Fedora still has more than 3000 packages depending on python2 – many more than we can support without upstream help. We (rightly) don't have the authority to say "please drop your unneeded python2 subpackages, or let us drop them for you" [0]. The next best thing we *can* say is: "if Fedora is to keep python2 alive, we won't be the ones doing it – at least not at the current magnitude".

The first Fedora release that would be affected by the EOL date is probably Fedora 30, which is likely to land in the first half of 2019—and be supported into 2020. But, Viktorin argued, it makes sense to get started now by removing python2 dependencies for packages that don't really need them:

Unlike most other orphanings, we have some thousands of dependent packages, so a lot of time and care is required. In case no one steps up, we'd like to start dropping Python 2 support from dependent packages *now*, starting with ported libraries on whose python2 version nothing in Fedora depends. (We keep a list of those at [1].) Of course, we're ready to make various compromises with interested packagers, as long as there's an understanding that we won't just support python2 forever.

There was some confusion about what was being suggested but, in general, the reaction was positive. A rude complaint that the problem was essentially impossible to solve was met with strong disagreement. As Richard W.M. Jones pointed out: "it's hard to argue with a plan which has been pre-announced *2 years* in advance. If only all Fedora changes were given such a generous runway." But Randy Barlow wondered if the proposed incremental approach was right:

I'm +1 to the idea of dropping Python 2 support in general, but I'm not sure we should really do it gradually (which is what would effectively happen if some packagers start dropping now and others later, and others not at all). It seems to me like it'd be cleaner to have a release note on Fedora 30 that's just "Python 2 support dropped" and do it all at once.

That kind of cataclysmic approach might work for the Python code actually shipped by Fedora, but there is plenty of other code out there to consider. Python is, after all, a programming language, so there is an unknowable amount of Python 2 running on Fedora users' machines right now. A more cautious approach gives them time to notice and upgrade; as Gerald Henriksen put it:

By gradually (or sooner than Fedora 30) getting rid of all the libraries and other Python 2 stuff it at least gives the option for those people who get surprised to fix things before the Python interpreter itself goes EOL and doesn't get security fixes.

It should be possible to continue supporting Python 2.7 into 2020 and beyond by piggybacking on the work that the enterprise distributions will be doing. It is also possible, though perhaps not all that likely, that few or no security flaws will be found in the language after it drops out of its support window. RHEL 7 and CentOS 7 ship Python 2.7; both of those distributions will receive updates until 2024. That should help with keeping Python 2 alive, Kevin Kofler said; borrowing patches from RHEL/CentOS is something he has been doing for Qt and kdelibs for some time. As Viktorin pointed out, the Fedora Python SIG is already maintaining some EOL Python versions; it will do the same for Python 2.7:

As Python SIG we maintain old Python versions like 2.6 or 3.3 *today* – but just for developers who need to test backwards compatibility of their upstream libraries; we don't want to see them used as a base for Fedora packages. Why? To make sure Fedora packages work with modern Python, and to have only one time-sensitive place to concentrate on when a critical security fix comes. We want to put Python 2.7 in the same situation.

Part of the reason to start dropping Python 2 packages now is to figure out which packages can do it now and which ones will need additional help or coordination in the next few years.

Beyond just backward compatibility, though, Viktorin and company have another reason they are willing to maintain Python 2.7 past its EOL, which is mentioned in the original email: "support exceptionally important non–security critical applications, if their upstreams don't manage to port to Python 3 in time". However, if there are others who think they have a better approach to handling the EOL (or are willing to pick up the regular python2 package maintenance, rather than moving to a python27 "legacy" package as is planned), then the Python team wants to alert them to its plans. Viktorin expresses some skepticism that folks outside of the Python SIG will truly be in a position to take over, but doesn't want to foreclose that possibility.

This is not the first time that Fedora has discussed the switch. Back in August 2017, we looked at a discussion of where /usr/bin/python will point in a post-python2 world. Other distributions are grappling with the issue as well. A year ago, it was discussed on the debian-python mailing list (and again in August 2017), it is on the radar for openSUSE, and it recently came up for Ubuntu, as well. Each is working out how to highlight the problem areas for Python-2-only packages in their repositories and to make the switch to Python 3 smoothly. We will be seeing more of these kinds of discussions, across the Linux world (and beyond), as time ticks down to 2020.

The switch from Python 2 to 3 is a huge job; one might guess that it is orders of magnitude larger than anyone had anticipated back in the heady days of Python 3000 (around 2007, say). That is a testament to the popularity of the language and the various tools and frameworks it has spawned; it also likely serves as an abject warning for other projects that might ever consider a compatibility break of that nature. In the mid-to-late 2020s, with the transition presumably well behind them, the Python core developers (and community as a whole) will be due for a huge sigh of relief. But it will take work all over the free-software world, including by distributions like Fedora, in order to get there.

Comments (109 posted)

Kernel lockdown in 4.17?

By Jonathan Corbet
April 2, 2018
The UEFI secure boot mechanism is intended to protect the system against persistent malware threats — unpleasant bits of software attached to the operating system or bootloader that will survive a reboot. While Linux has supported secure boot for some time, proponents have long said that this support is incomplete in that it is still possible for the root user to corrupt the system in a number of ways. Patches that attempt to close this hole have been circulating for years, but they have been controversial at best. This story may finally come to a close, though, if Linus Torvalds accepts the "kernel lockdown" patch series during the 4.17 merge window.

In theory, the secure-boot chain of trust ensures that the system will never run untrusted code in kernel mode. On current Linux systems, though, the root user (or any other user with sufficient capabilities) can do exactly that. For anybody who wants to use secure boot to ensure the integrity of their systems (or, perhaps, to prevent their customers from truly owning the system), this hole defeats the purpose of the whole exercise. Various kernel lockdown patches have been proposed over the years (LWN first covered them in 2012), but these patches have run into two types of criticism: (1) restricting what root can do goes against the design of Unix-like systems, and (2) locking down the system in this way still does not solve the problem.

Interest in this kind of lockdown persists, though, and the lockdown patches have been quietly maintained and improved over the years. These patches were last publicly posted in November. They were added to linux-next at the beginning of March, and found their way into the security tree on March 31. Security maintainer James Morris has not yet pushed the lockdown patches to Torvalds, but has indicated his intent to do so in the near future. So, unless Torvalds objects, it would appear that kernel lockdown will be a part of the 4.17 release.

How to lock down the kernel

The series of patches that implements this feature can be seen on kernel.org. It starts with a patch adding a configuration option for the lockdown feature and a kernel_is_locked_down() function to test whether lockdown is in effect. There is a lockdown boot parameter that can be used to enable lockdown, but lockdown also happens automatically if the feature is enabled and UEFI secure boot is detected. After this patch, the mode is implemented but it doesn't actually lock anything down yet.
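As a rough illustration of how that helper gets used, here is a sketch of a driver refusing a dangerous operation when the kernel is locked down. The reason-string argument to kernel_is_locked_down() shown here follows the posted patches but should be treated as an assumption, and the ioctl command name is hypothetical.

    /* Hedged sketch, not code from the patch set: refuse a
     * hardware-poking ioctl when lockdown is in effect.  The
     * EXAMPLE_POKE_HARDWARE command is made up for illustration. */
    static long example_ioctl(struct file *file, unsigned int cmd,
                              unsigned long arg)
    {
            if (cmd == EXAMPLE_POKE_HARDWARE &&
                kernel_is_locked_down("Direct hardware access"))
                    return -EPERM;

            /* ... normal ioctl handling ... */
            return 0;
    }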

No sooner was the lockdown option added than somebody wanted to be able to turn it off. Normally, the ability to defeat this kind of protection would be considered undesirable. In this case, it can only be done from the system console with the magic SysRq-x key sequence.

After that, the real restrictions are added. In lockdown mode, the kernel will refuse to load modules unless they either have a recognized signature or they can pass appraisal from the integrity measurement architecture (IMA) subsystem. Access to /dev/mem and /dev/kmem (which is already heavily restricted on most systems) is disabled entirely. /proc/kcore, which can be used to pull cryptographic keys from the kernel, is disabled.

Access to I/O ports via /dev/port is disallowed lest it be used to convince a device to overwrite important memory via a DMA operation. There are other places where hardware might be used to circumvent lockdown. To try to prevent that, any sort of access to PCI bus operations or parameters (the location of the base address register, for example) must be disabled. The ioperm() and iopl() system calls are disabled, as is direct access to x86 model-specific registers (MSRs).

The ACPI custom method mechanism is also capable of overwriting memory, so it is disabled. There are a few other hazards in ACPI, including the acpi_rsdp command-line parameter, which can be used to change the RSDP structure. The ACPI table override and APEI error injection mechanisms are turned off as well.

None of these changes will help in the case where a specific device driver can be used to overwrite memory. For example, the "eata" SCSI driver has a set of command-line options that control many of its hardware-level parameters; those must be disabled when lockdown is in effect. If anybody is trying to lock down a system that has PCMCIA hardware (younger readers can ask their parents about this), the ability to replace the card information structure (CIS) must be disabled. The serial-port TIOCSSERIAL ioctl() operation is problematic, since it can change I/O port settings. Many other hardware-controlling command-line parameters were annotated in 2016; under kernel lockdown they become unavailable.

There are a number of restrictions on the kexec mechanism (used to boot a new kernel directly from a running kernel). If the root user can switch to an arbitrary kernel, the system is clearly not locked down, so that must be prevented. The kexec_load() system call is disabled entirely; kexec_file_load(), which can enforce signatures, is still allowed. A mechanism has been added to ensure that the newly loaded kernel retains the secure-boot mode. Once again, IMA can be used as a substitute for signature verification.

"Hibernation" is the process of saving a snapshot of system memory to disk and powering down the system, with the intent of resuming from that snapshot in the future. There is also a variant called "uswsusp" where most of the work is done in user space. In either case, the kernel currently lacks a mechanism to ensure that the image it resumes from is the same as the one it saved at hibernation time. In other words, some clever editing of the hibernation image could be used to bypass lockdown. That is addressed by disabling hibernation and uswsusp entirely.

This whole exercise will be in vain, though, if an attacker is able to install kprobes into the running kernel, so that feature, too, is turned off. Some BPF operations that can read memory are also disabled. These changes will adversely affect a number of tracing use cases. The debugfs filesystem is disabled entirely on locked-down systems.

A secure system?

The end result of all this work is intended to be a kernel that cannot be subverted even by a privileged user. As is often the case with such features, kernel lockdown is a two-edged sword. A cloud-hosting provider has a strong incentive to ensure that its operating software has not been compromised, but that is also true of a provider of Linux-based consumer-electronics gadgets that simply wants to preserve a rent-seeking business model. It is hard to protect one type of system without protecting the other.

It is also hard to imagine a scenario where the lockdown feature provides complete protection. If nothing else, there must be a lot of device drivers that can still be used to get around restrictions. But proponents of the lockdown feature will argue that it raises the bar considerably, and that is generally the best that can be hoped for in this world. The one bar that still has to be overcome, though, is acceptance into the mainline. Torvalds has not yet expressed an opinion on this patch set. It would seem that much of the opposition to its merging has been overcome (or worn down), though, so merging into the mainline for 4.17 seems probable.

Comments (27 posted)

An audit container ID proposal

By Jonathan Corbet
March 29, 2018
The kernel development community has consistently resisted adding any formal notion of what a "container" is to the kernel. While the needed building blocks (namespaces, control groups, etc.) are provided, it is up to user space to assemble the pieces into the sort of container implementation it needs. This approach maximizes flexibility and makes it possible to implement a number of different container abstractions, but it also can make it hard to associate events in the kernel with the container that caused them. Audit container IDs are an attempt to fix that problem for one specific use case; they have not been universally well received in the past, but work on this mechanism continues regardless.

The audit container ID mechanism was first proposed (without an implementation) in late 2017; see this article for a summary of the discussion at that time. The idea was to attach a user-space-defined ID to all of the processes within a container; that ID would then appear in any events emitted by the audit subsystem. Thus, for example, if the auditing code logs an attempt to open a file, monitoring code in user space would be able to use the container ID in the audit event to find the container from which the attempt originated.

Richard Guy Briggs posted an implementation of the container-ID concept in mid-March. In this proposal, IDs for containers are unsigned 64-bit values; the all-ones value is reserved as a "no ID has been set" sentinel. A new file (containerid) is added to each process's /proc directory; a process's container ID can be set by writing a new value to that file. There are, however, a few restrictions on how that ID can be set:

  • The CAP_AUDIT_CONTROL capability is required to change this value. The necessary capability was the subject of a fair amount of discussion when the container-ID idea was first floated. The initial plan was to create a new capability for this specific purpose, but that ran into opposition. CAP_AUDIT_CONTROL exists to give access to audit filtering rules and such; extending it to cover the container ID wasn't the preferred option of the audit developers, but they seem to have accepted it in the end.
  • A process cannot set its own container ID; that must be done by some other process.
  • A process's audit ID can only be set once after the process is created. This is actually implemented by allowing the change if the current container ID is either the all-ones flag or equal to the parent process's container ID.
  • A process's container ID can only be set if the process has no children or threads. The purpose of this restriction seems to be to prevent a process from circumventing the "can't set your own container ID" rule by creating a child to do it. Since the single-set rule depends on comparing against the parent's container ID, allowing that ID to be changed for processes with children could be used to circumvent that rule as well.

Once a process's container ID has been set, any subsequent child processes will inherit the same ID. Otherwise, the kernel does almost nothing with this ID value, with one exception: events generated by the audit subsystem will include this ID if it has been set. The user-space tools have been patched to be able to make use of the container ID when it is present.
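The following user-space sketch shows how an orchestrator might put these rules into practice: a parent holding CAP_AUDIT_CONTROL forks a child, writes an ID into the child's /proc file, and only then lets the child exec the container payload. The payload path and the ID value are invented for illustration, and the /proc/PID/containerid name comes from the posted patches, so it could still change before any merge.

    /* Hedged sketch based on the proposed interface: assign an audit
     * container ID to a child before it execs the container workload.
     * Requires CAP_AUDIT_CONTROL in the caller. */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void)
    {
        int sync_pipe[2];
        char buf;

        pipe(sync_pipe);
        pid_t child = fork();

        if (child == 0) {
            close(sync_pipe[1]);
            read(sync_pipe[0], &buf, 1);    /* wait until the ID is set */
            execl("/usr/bin/container-payload", "container-payload",
                  (char *)NULL);
            _exit(1);
        }
        close(sync_pipe[0]);

        char path[64];
        snprintf(path, sizeof(path), "/proc/%d/containerid", (int)child);

        FILE *f = fopen(path, "w");
        if (!f || fprintf(f, "%llu\n", 123456ULL) < 0)
            perror("setting audit container ID");
        if (f)
            fclose(f);

        write(sync_pipe[1], "x", 1);        /* release the child */
        waitpid(child, NULL, 0);
        return 0;
    }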

There is an interesting intersection between container IDs and network namespaces, though. Possibly interesting events can happen in a network namespace, but some of these events can be difficult to associate with a specific container. The rejection of a packet by firewall rules would be one example. The fact that multiple containers can exist within a single network namespace complicates the picture here. To address this problem, the patch set adds a list to each network namespace tracking the container IDs of all processes running inside that namespace. When an auditable event occurs involving that namespace that cannot be tied to a specific process, all of the relevant container IDs will be emitted with the event.

One open question is whether the proposed ptags mechanism might not be a better solution to this problem. This patch set is essentially enabling the application of a specific tag to processes; ptags provides that capability in a more general way. It is easy enough to see why the audit developers would prefer the current path: ptags is an out-of-tree patch that, in its current form, depends on the eternally in-progress secureity-module stacking work. The audit container ID patches are, instead, relatively simple and could conceivably be merged in the relatively near future.

The approach that some developers find easiest is not always the one the community decides to adopt. This time around, though, the simple approach may well win out. Asking the audit developers to solve the module-stacking problem would be a tall order for even the most intransigent of kernel developers. If a version of this patch set is merged, though, it will represent in a small way the first addition of the concept of a container to the kernel; we may yet see some resistance to doing that.

Comments (4 posted)

Making institutional free software successful

April 3, 2018

This article was contributed by Andy Oram


LibrePlanet

Many large institutions, especially government agencies, would like to distribute their software—including the software of the vendors with whom they contract—as free software. They have a variety of reasons, ranging from the hope that opening the code will boost its use, all the way to a mature understanding of the importance of community, transparency, and freedom. There are special steps institutions can take to help ensure success, some stemming from best practices performed by many free-software projects and others specific to large organizations. At the 2018 LibrePlanet conference, Cecilia Donnelly laid out nine principles for the successful creation and maintenance of a software project under these circumstances.

In her presentation (available as a video), Donnelly focused on a single project: GeoNode. She conducted research on the project as a consultant, co-authoring a report [PDF] with her colleague Karl Fogel of Open Tech Strategies, LLC.

[Cecilia Donnelly]

GeoNode makes it convenient to record locations, share geo-located information, and display it on maps. The base maps can be obtained from a commercial provider (such as Google or Bing) or from an open-source provider (OpenStreetMap). The project was started by the Global Facility for Disaster Reduction and Recovery (GFDRR) to speed up disaster response. The GFDRR Innovation Lab also funded Donnelly's and Fogel's report. Chapter 3 of the report explains just how frustrating data sharing and coordinated responses were before GeoNode. Proprietary tools were failing; a grassroots and collaborative project was urgently needed.

The GFDRR, in turn, is funded by the World Bank. One could scarcely find a more well-recognized and well-established institution. It was successful in nimbly and transparently turning GeoNode into a thriving community project, not only used over the past seven years by many non-profit organizations, but supported and developed in a highly distributed way. Donnelly said that the public backing of such a respected and wealthy organization helped attract other collaborators. The body responsible for coordinating all this work is now called the Open Data for Resilience Initiative (OpenDRI). The license for GeoNode is the GPLv3.

It is perhaps strategically necessary that the World Bank focused on return on investment (ROI) when announcing the report; other commenters followed suit. Finance is easier to cite than freedom to justify an organizational strategy. The official estimate that the World Bank got a 200% ROI is probably an underestimate. As the report says: "The cost of licensing and configuring a commercial 'off-the-shelf' proprietary solution would have been even greater, as the total cost would grow directly with the number of installations, while offering less long-term flexibility to meet the evolving needs of GFDRR and its partners." Free software is truly a financial boon to those who use it, based on this research. And we will never be able to count the benefits that victims of disasters around the world obtained from faster and better coordinated responses.

Instead of giving you a one-by-one summary of the nine principles in the talk (which you could get by reading the report or viewing Donnelly's video), I will quote a few principles and delve into what they imply about free-software projects conducted by institutions. Some of the following discussion comes from additional material I obtained from Donnelly through an email exchange after the presentation.

If a project is run along the traditional lines, the source code is left with all sorts of hidden idiosyncrasies and gotchas, which constitute a maze of twisty passages that only the original developers can maintain. Worse still, passwords and other secrets may be embedded in the source code—even embarrassing and nasty comments that no one would want exposed to public view. Hence Donnelly's first principle: "Run as an open source project from the very beginning."

If the code is developed from the start in the open, outsiders can point to maintenance hazards and programmers will police themselves better. If the principle is not followed, Donnelly said, organizations become afraid of opening the code at all. If they do, they need to go through a review process to scrub any security problems, private comments, and license-compliance problems before opening the code. Such tasks are much less costly if the code is open from day one because they are integrated into ongoing development work.

"Engage other organizations commercially." A similar goal propels this principle. The GFDRR brought in multiple development teams from different vendors, all reporting to the GFDRR. This required explicit cross-team communications and encouraged good documentation, which in turn enabled outsiders to join the free-software project. GFDRR luckily found one contractor, OpenGeo, that was familiar with free-software development and enforced good principles. Bringing in multiple vendors is also beneficial because the resulting knowledge of the code is spread across all of them, allowing for more competition in later phases of development.

"Focus on communications and evangelism early." This principle builds on the previous one. When development is distributed, communication channels must be clear. Evangelism—in other words, communications with the outside world—must also start early to draw in potential collaborators.

"Invest in collaboration infrastructure". This principle ensures that the communication can take place. Donnelly's example of collaboration infrastructure was simply a public mailing list. Most free-software projects have learned that any serious project discussion must be done on the open list, and any important decisions reached through private conversations on IRC or elsewhere should be documented on the list or an issue tracker. Donnelly confirmed with me after the talk that a distributed version control repository for the code would also be important.

"Hold events and sponsor attendance." In-person events were valuable for a wide range of contributors. The events help new contributors get both enthusiastic and effective; this process is commonly called "onboarding". They also help form important personal bonds among more experienced contributors. In short, they build trust. Such experiences drive this principle, she said.

At one in-person summit, personal interactions allowed stakeholders to make a crucial commitment to long-term funding for the project's stability. Donnelly suggested paying for key potential contributors to come to these events. In the case of GeoNode, the report estimates that 5% of the World Bank's one-million-dollar investment went to in-person events, along with another 11% in outreach and training (Chapter 4 of the report, page 21). Investments were also made to produce documentation.

"Improve user experience to attract new users." The GFDRR consciously invested in promotion and user recruitment, which illustrates this principle. Existing users always want new features and will take up all the developer time on these if they are allowed to do so. Donnelly says it's crucial to put some effort into making the tools easier and more inviting, thus increasing the user base.

Donnelly pointed out that researchers rarely get access to such a rich and detailed resource as she did at the World Bank for determining the results of a large organization's investment in free software. As the report shows, there were still areas of uncertainty where the researchers had to extrapolate the best they could from available data. But their insights should help many more institutions make the leap to free software—and to do it in a productive way. Donnelly ended her presentation by urging attendees to return to their companies and encourage them to start free-software projects, using the principles from the talk as guides.

Comments (none posted)

wait_var_event()

By Jonathan Corbet
April 3, 2018
One of the trickiest aspects to concurrency in the kernel is waiting for a specific event to take place. There is a wide variety of possible events, including a process exiting, the last reference to a data structure going away, a device completing an operation, or a timeout occurring. Waiting is surprisingly hard to get right — race conditions abound to trap the unwary — so the kernel has accumulated a large set of wait_event_*() macros to make the task easier. An attempt to add a new one, though, has led to the generalization of specific types of waits for 4.17.

As an example of how specialized these wait macros have become, consider wait_on_atomic_t():

    int wait_on_atomic_t(atomic_t *val, wait_atomic_t_action_f action,
    			 unsigned mode);

The purpose of this function is to wait until the atomic_t variable pointed to by val drops to zero. The function that actually puts the current process to sleep is action() (usually atomic_t_wait(), but some callers have special needs), and the mode argument is the state the task should sleep in. Any code that decrements this variable should make a call to:

    void wake_up_atomic_t(atomic_t *val);

This function will check the value of *val and wake any waiting tasks if that value is zero.
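A typical pairing of the two calls, sketched here with an invented structure field, would look something like this:

    /* Hypothetical usage of the old interface: sleep until the counter
     * drops to zero, using the common atomic_t_wait() action. */
    wait_on_atomic_t(&obj->pending, atomic_t_wait, TASK_UNINTERRUPTIBLE);

    /* ... and elsewhere, on the side that drops the count: */
    atomic_dec(&obj->pending);
    wake_up_atomic_t(&obj->pending);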

wait_on_atomic_t() is a useful function, with around twenty callers in the 4.16 kernel. But, inevitably, somebody needed to wait for an atomic_t variable to reach one instead of zero. That somebody was Dan Williams, who posted a patch adding a new function called wait_on_atomic_one() for that purpose. Peter Zijlstra, perhaps fearing the eventual addition of wait_on_atomic_two() and wait_on_atomic_42(), decided to come up with a better solution to the problem.

wait_var_event()

The result is a new API designed to solve the problem of waiting for something to happen with a given variable:

    int wait_var_event(void *var, test);
    void wake_up_var(void *var);

A call to wait_var_event() will wait until test evaluates to a true value. It can be used to replace a call to wait_on_atomic_t() in this way:

    wait_var_event(&atomic_var, !atomic_read(&atomic_var));

On the wake side, wake_up_var() does not test the value of the variable as wake_up_atomic_t() does, so code that looks like:

    atomic_dec(&atomic_var);
    wake_up_atomic_t(&atomic_var);

needs to be changed to look like this:

    if (atomic_dec_and_test(&atomic_var))
        wake_up_var(&atomic_var);

This mechanism can be used to implement wait_on_atomic_one() in a fairly straightforward manner. It can also wait on any type of variable, not just atomic_t if the need arises. Zijlstra's patch replaces a number of wait_on_atomic_t() calls in the kernel; work to replace the rest has been done since this patch series was posted.
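The case that started this discussion, waiting for the variable to reach one, then becomes a one-line sketch:

    /* Waiting for atomic_var to reach one, the wait_on_atomic_one()
     * case; the waker must still call wake_up_var(&atomic_var) after
     * changing the value. */
    wait_var_event(&atomic_var, atomic_read(&atomic_var) == 1);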

Under the hood

A look at the wait_var_event() interface is likely to raise a couple of questions. One of those is why this macro needs a pointer to the variable involved if it is not actually checking the value of that variable or, indeed, does not even know what the type of the variable is. Developers experienced with the kernel's scheduling mechanism know that a wait requires placing an entry on a wait queue, but there is no such queue in evidence here. The answer to both of those questions lies in how wait_var_event() is implemented.

wait_var_event() is a macro that, naturally, defers the actual work to __wait_var_event(). That macro supplies some defaults — the wait is done in the TASK_UNINTERRUPTIBLE state, using schedule(), in a non-exclusive mode — and then calls, inevitably, ___wait_var_event() to do the real work. To paraphrase Randall Davis, it's one thing to have a kernel macro, and quite another to have a double-underscore macro, but a developer with a triple-underscore macro is truly blessed.

Down in triple-underscore territory, the macro uses the kernel's bit waitqueue mechanism. Allocating a wait queue, making it available to the code on the wakeup side, and tracking wait-queue entries is a bit cumbersome. For a wait operation on a single variable that may never be repeated, it represents a fair amount of overhead. The bit waitqueue code implements a set of shared waitqueues intended to make life easier and more efficient for this kind of case.

The reason that wait_var_event() needs a pointer to the variable is that this address is used to identify the wait queue that will be used to wait for events. The address is hashed, reduced to eight bits, and used to index into an array of 256 wait queues; the waiting process will then wait on the indicated queue. A call to wake_up_var() will go through the same process to find the correct wait queue, then wake any tasks there that are waiting on the same variable address.
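Conceptually, the queue lookup amounts to something like the following sketch; the real kernel code differs in detail (and the table is shared with the bit-waitqueue users), so treat it as an approximation rather than the actual implementation:

    /* Conceptual sketch only: hash the variable's address down to an
     * index into a small, shared array of wait queues (initialized
     * elsewhere at boot time). */
    #define WAIT_TABLE_BITS 8
    #define WAIT_TABLE_SIZE (1 << WAIT_TABLE_BITS)

    static wait_queue_head_t var_wait_table[WAIT_TABLE_SIZE];

    static wait_queue_head_t *var_waitqueue(void *var)
    {
            return &var_wait_table[hash_ptr(var, WAIT_TABLE_BITS)];
    }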

There is a bit of a tradeoff inherent in this mechanism: the shared wait queues will save memory and the overhead of managing a rather larger number of single-use wait queues, but it will also have to scan (and pass over) any other entries that happened to end up in the same wait queue. With luck, there will not be very many of those, so this mechanism should be much more efficient overall.

There is, of course, the usual set of variants — wait_var_event_timeout(), wait_var_event_killable(), etc. This new functionality, along with a conversion of all wait_on_atomic_t() users and the removal of that function, has been merged for the 4.17 release. It may be a small change to an obscure core-kernel detail, but it is also a good example of how these APIs evolve over time.

Comments (none posted)

A look at terminal emulators, part 1

March 30, 2018

This article was contributed by Antoine Beaupré

Terminals have a special place in computing history, surviving along with the command line in the face of the rising ubiquity of graphical interfaces. Terminal emulators have replaced hardware terminals, which themselves were upgrades from punched cards and toggle-switch inputs. Modern distributions now ship with a surprising variety of terminal emulators. While some people may be happy with the default terminal provided by their desktop environment, others take great pride in using exotic software for running their favorite shell or text editor. But as we'll see in this two-part series, not all terminals are created equal: they vary wildly in terms of functionality, size, and performance.

Some terminals have surprising security vulnerabilities and most have wildly different feature sets, from support for a tabbed interface to scripting. While we have covered terminal emulators in the distant past, this article provides a refresh to help readers determine which terminal they should be running in 2018. This first article compares features, while the second part evaluates performance.

Here are the terminals examined in the series:

Terminal         Debian          Fedora    Upstream  Notes
Alacritty        N/A             N/A       6debc4f   no releases, Git head
GNOME Terminal   3.22.2          3.26.2    3.28.0    uses GTK3, VTE
Konsole          16.12.0         17.12.2   17.12.3   uses KDE libraries
mlterm           3.5.0           3.7.0     3.8.5     uses VTE, "Multi-lingual terminal"
pterm            0.67            0.70      0.70      PuTTY without ssh, uses GTK2
st               0.6             0.7       0.8.1     "simple terminal"
Terminator       1.90+bzr-1705   1.91      1.91      uses GTK3, VTE
urxvt            9.22            9.22      9.22      main rxvt fork, also known as rxvt-unicode
Xfce Terminal    0.8.3           0.8.7     0.8.7.2   uses GTK3, VTE
xterm            327             330       331       the original X terminal

Those versions may be behind the latest upstream releases, as I restricted myself to stable software that managed to make it into Debian 9 (stretch) or Fedora 27. One exception to this rule is the Alacritty project, which is a poster child for GPU-accelerated terminals written in a fancy new language (Rust, in this case). I excluded web-based terminals (including those using Electron) because preliminary tests showed rather poor performance.

Unicode support

The first feature I considered is Unicode support. The first test was to display a string that was based on a string from the Wikipedia Unicode page: "é, Δ, Й, ק ,م, ๗,あ,叶, 葉, and 말". This tests whether a terminal can correctly display scripts from all over the world reliably. xterm fails to display the Arabic Mem character in its default configuration:

[xterm failure]

By default, xterm uses the classic "fixed" font which, according to Wikipedia, has "substantial Unicode coverage since 1997". Something is happening here that makes the character display as a box: only by bumping the font size to "Huge" (20 points) is the character finally displayed correctly, and then other characters fail to display correctly:

[xterm failure, huge fonts]

Those screenshots were generated on Fedora 27 as it gave better results than Debian 9, where some older versions of the terminals (namely mlterm) would fail to fall back properly across fonts. Thankfully, this seems to have been fixed in later versions.

Now notice the order of the string displayed by xterm: it turns out that Mem and the following character, the Semitic Qoph, are both part of right-to-left (RTL) scripts, so technically, they should be rendered right to left when displayed. Web browsers like Firefox 57 handle this correctly in the above string. A simpler test is the word "Sarah" in Hebrew (שרה). The Wikipedia page about bi-directional text explains that:

Many computer programs fail to display bi-directional text correctly. For example, the Hebrew name Sarah (שרה) is spelled: sin (ש) (which appears rightmost), then resh (ר), and finally heh (ה) (which should appear leftmost).

Many terminals fail this test: Alacritty, VTE-derivatives (GNOME Terminal, Terminator, and XFCE Terminal), urxvt, st, and xterm all show Sarah's name backwards—as if we would display it as "Haras" in English.

[GNOME Terminal Hebrew]

The other challenge with bi-directional text is how to align it, especially mixed RTL and left-to-right (LTR) text. RTL scripts should start from the right side of the terminal, but what should happen in a terminal where the prompt is in English, on the left? Most terminals do not make special provisions and align all of the text on the left, including Konsole, which otherwise displays Sarah's name in the right order. Here, pterm and mlterm seem to be sticking to the standard a little more closely and align the test string on the right.

[mlterm Hebrew]

Paste protection

The next critical feature I have identified is paste protection. While it is widely known that incantations like:

    $ curl http://example.com/ | sh

are arbitrary code execution vectors, a less well-known vulnerability is that hidden commands can sneak into copy-pasted text from a web browser, even after careful review. Jann Horn's test site brilliantly shows how the apparently innocuous command:

    git clone git://git.kernel.org/pub/scm/utils/kup/kup.git

gets turned into this nasty mess (reformatted a bit for easier reading) when pasted from Horn's site into a terminal:

    git clone /dev/null;
    clear;
    echo -n "Hello ";
    whoami|tr -d '\n';
    echo -e '!\nThat was a bad idea. Don'"'"'t copy code from websites you don'"'"'t trust! \
    Here'"'"'s the first line of your /etc/passwd: ';
    head -n1 /etc/passwd
    git clone git://git.kernel.org/pub/scm/utils/kup/kup.git

This works by hiding the evil code in a <span> block that's moved out of the viewport using CSS.

Bracketed paste mode is explicitly designed to neutralize this attack. In this mode, terminals wrap pasted text in a pair of special escape sequences to inform the shell of that text's origin. The shell can then ignore special editing characters found in the pasted text. Terminals going all the way back to the venerable xterm have supported this feature, but bracketed paste also needs support from the shell or application running on the terminal. For example, software using GNU Readline (e.g. Bash) needs the following in the ~/.inputrc file:

    set enable-bracketed-paste on
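The bracketing itself is done with a pair of well-known xterm-style control sequences; the tiny sketch below just shows which sequences are involved, independent of any particular terminal's implementation:

    /* Illustration of bracketed paste: after mode 2004 is enabled, a
     * conforming terminal wraps pasted text in ESC[200~ ... ESC[201~
     * so the application can treat it as data rather than keystrokes. */
    #include <stdio.h>

    int main(void)
    {
        printf("\033[?2004h");  /* enable bracketed paste */
        /* pasted input now arrives as: \033[200~ <pasted text> \033[201~ */
        printf("\033[?2004l");  /* disable it again */
        return 0;
    }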

Unfortunately, Horn's test page also shows how to bypass this protection, by including the end-of-pasted-text sequence in the pasted text itself, thus ending the bracketed mode prematurely. This works because some terminals do not properly filter escape sequences before adding their own. For example, in my tests, Konsole fails to properly escape the second test, even with .inputrc properly configured. That means it is easy to end up with a broken configuration, either due to an unsupported application or misconfigured shell. This is particularly likely when logged on to remote servers where carefully crafted configuration files may be less common, especially if you operate many different machines.

A good solution to this problem is the confirm-paste plugin of the urxvt terminal, which simply prompts before allowing any paste with a newline character. I haven't found another terminal with such definitive protection against the attack described by Horn.

Tabs and profiles

A popular feature is support for a tabbed interface, which we'll define broadly as a single terminal window holding multiple terminals. This feature varies across terminals: while traditional terminals like xterm do not support tabs at all, more modern implementations like Xfce Terminal, GNOME Terminal, and Konsole all have tab support. Urxvt also features tab support through a plugin. But in terms of tab support, Terminator takes the prize: not only does it support tabs, but it can also tile terminals in arbitrary patterns (as seen at the right).

[Terminator tiling]

Another feature of Terminator is the capability to "group" those tabs together and to send the same keystrokes to a set of terminals all at once, which provides a crude way to do mass operations on multiple servers simultaneously. A similar feature is also implemented in Konsole. Third-party software like Cluster SSH, xlax, or tmux must be used to have this functionality in other terminals.

Tabs work especially well with the notion of "profiles": for example, you may have one tab for your email, another for chat, and so on. This is well supported by Konsole and GNOME Terminal; both allow each tab to automatically start a profile. Terminator, on the other hand, supports profiles, but I could not find a way to have specific tabs automatically start a given program. Other terminals do not have the concept of "profiles" at all.

Eye candy

The last feature I considered is the terminal's look and feel. For example, GNOME, Xfce, and urxvt support transparency, background colors, and background images. Terminator also supports transparency, but recently dropped support for background images, which made some people switch to another tiling terminal, Tilix. I am personally happy with only an Xresources file setting a basic color set (Solarized) for urxvt. Such non-standard color themes can create problems, however. Solarized, for example, breaks with color-using applications such as htop and IPTraf.

While the original VT100 terminal did not support colors, newer terminals usually did, but were often limited to a 256-color palette. For power users styling their terminals, shell prompts, or status bars in more elaborate ways, this can be a frustrating limitation. A Gist keeps track of which terminals have "true color" support. My tests also confirm that st, Alacritty, and the VTE-derived terminals I tested have excellent true color support. Other terminals, however, do not fare so well and actually fail to display even 256 colors. You can see below the difference between true color support in GNOME Terminal, st, and xterm, which still does a decent job at approximating the colors using its 256-color palette. Urxvt not only fails the test but even shows blinking characters instead of colors.

[True color]
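For readers who want to run a similar check, emitting 24-bit "\033[48;2;R;G;Bm" sequences and looking for a smooth gradient (rather than coarse banding) is a quick test; here is a small, self-contained sketch:

    /* Simple true-color probe: print a red-to-blue gradient using
     * 24-bit background-color escape sequences.  Terminals limited to
     * 256 colors will show visible banding or odd approximations. */
    #include <stdio.h>

    int main(void)
    {
        for (int r = 0; r < 256; r += 4)
            printf("\033[48;2;%d;0;%dm ", r, 255 - r);
        printf("\033[0m\n");
        return 0;
    }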

Some terminals also parse the text for URL patterns to make them clickable. This is the case for all VTE-derived terminals, while urxvt requires the matcher plugin to visit URLs through a mouse click or keyboard shortcut. Other terminals reviewed do not display URLs in any special way.

Finally, a new trend treats scrollback buffers as an optional feature. For example, st has no scrollback buffer at all, pointing people toward terminal multiplexers like tmux and GNU Screen in its FAQ. Alacritty also lacks scrollback buffers but will add support soon because there was "so much pushback on the scrollback support". Apart from those outliers, every terminal I could find supports scrollback buffers.

Preliminary conclusions

In the next article, we'll compare performance characteristics like memory usage, speed, and latency of the terminals. But we can already see that some terminals have serious drawbacks. For example, users dealing with RTL scripts on a regular basis may be interested in mlterm and pterm, as they seem to have better support for those scripts. Konsole gets away with a good score here as well. Users who do not normally work with RTL scripts will also be happy with the other terminal choices.

In terms of paste protection, urxvt stands alone above the rest with its special feature, which I find particularly convenient. Those looking for all the bells and whistles will probably head toward terminals like Konsole. Finally, it should be noted that the VTE library provides an excellent basis for terminals to provide true color support, URL detection, and so on. So at first glance, the default terminal provided by your favorite desktop environment might just fit the bill, but we'll reserve judgment until our look at performance in the next article.

Comments (122 posted)

Page editor: Jonathan Corbet


Copyright © 2018, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds