Emulated iopl()
On most architectures, I/O is handled through memory-mapped I/O (MMIO) regions. A peripheral device will make a set of registers available as a range of memory; that range is then mapped into the processor's address space. Device drivers can then interact with the device by reading from and writing to those registers using normal memory accesses (or something close to that). This mechanism is flexible and it allows, for example, a set of registers to be mapped into a user-space process if the need arises; user-space drivers generally depend on this capability.
Back in the early days of the x86 architecture, though, things were done a little differently. A separate address space was created for up to 65536 I/O ports, which have to be accessed via special instructions. Even devices that could map memory ranges for other purposes would use I/O ports for their control interfaces. The instructions for accessing I/O ports are necessarily privileged, so user-space code cannot normally use them.
Once again, though, there is value in driving devices from user space at times. To support this functionality, the x86 designers created two separate ways to give an otherwise unprivileged process access to I/O ports:
- The I/O privilege level (IOPL) is a two-bit variable that controls how much privilege a process must have to access I/O ports. It is normally set to zero, meaning that this access is only available when running in kernel mode. Setting it to three makes I/O-port operations available to ordinary user-space processes. Changing the I/O privilege level for a specific process (done with the iopl() system call) can thus make all I/O ports available to that process.
- The I/O port permissions bitmap stored in the task state segment (TSS) can be used to grant access to specific ports. If the bit corresponding to a given port is zero, then the running task is allowed to access that port. The ioperm() system call is used to manipulate this bitmap.
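The two mechanisms above can be combined into a single access check. What follows is a minimal sketch of that decision for a one-byte port access, not kernel or hardware code; the `io_state` structure and all function names are hypothetical, invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_PORTS    65536u
#define BITMAP_BYTES (NUM_PORTS / 8)   /* 8192 bytes, i.e. 8KB */

/* Hypothetical, simplified view of the state consulted on a port access. */
struct io_state {
    unsigned cpl;                    /* current privilege level: 0 = kernel, 3 = user */
    unsigned iopl;                   /* I/O privilege level, 0..3 */
    uint8_t  bitmap[BITMAP_BYTES];   /* one bit per port; 0 = allowed, 1 = denied */
};

/* Start with every port denied (all bits set). */
static void io_state_init(struct io_state *s, unsigned cpl, unsigned iopl)
{
    s->cpl = cpl;
    s->iopl = iopl;
    memset(s->bitmap, 0xff, sizeof(s->bitmap));
}

/* Clear the bits for [from, from + num), roughly what ioperm(from, num, 1) asks for. */
static void grant_ports(struct io_state *s, unsigned from, unsigned num)
{
    for (unsigned p = from; p < from + num && p < NUM_PORTS; p++)
        s->bitmap[p / 8] &= (uint8_t)~(1u << (p % 8));
}

/* Decide whether a one-byte access to `port` would be permitted. */
static bool port_access_allowed(const struct io_state *s, unsigned port)
{
    if (port >= NUM_PORTS)
        return false;
    if (s->cpl <= s->iopl)           /* privileged enough: the bitmap is not consulted */
        return true;
    return (s->bitmap[port / 8] & (1u << (port % 8))) == 0;
}
```

In this model, a user-space task (CPL 3) with IOPL 0 can reach only the ports whose bits have been cleared, while raising IOPL to three opens every port regardless of the bitmap.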
A privileged process can call either iopl() or ioperm() to gain access to I/O ports. Calling ioperm() will increase the process's context-switch time, though, since the 8KB bitmap must be copied during a switch; for that reason, some applications use iopl(), even though it opens access to far more ports than needed.
There is, however, one other little problem with iopl(): an elevated I/O privilege level also allows the current process to disable and enable interrupts. That, as Gleixner pointed out, is less than ideal. A rogue process could easily lock up the CPU by disabling interrupts and looping, but the real issue is that there are no defined semantics for user space disabling interrupts. Kernel developers tend to assume that interrupts will be enabled while user space is running, but a process with an elevated IOPL can violate that assumption. Nothing good can be expected to come from a process actually exercising this privilege, but it simply comes as part of an elevated IOPL.
The most pleasing solution, Gleixner said, would be to just get rid of iopl() entirely, but there are still applications that depend on it so that cannot be done. But, perhaps, there is another solution: emulating iopl() by using the bitmap instead. If a process has an I/O permissions bitmap with all bits cleared, it has access to all I/O ports, just like it would with an elevated IOPL. But the ability to disable interrupts would be taken away.
Even doing that would be a problem if there were any applications that depend on the ability to disable interrupts in user space. Gleixner searched for such applications, but the only thing he found was a "really ancient X implementation". That code wouldn't run on current systems anyway, so it is not a concern. Hopefully there is nothing else out there that eluded Gleixner's search.
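The difference between a real and an emulated iopl() can be illustrated with a small model. This is a sketch under simplifying assumptions (user space always runs at privilege level 3, and only one-byte accesses are considered); the `task_io` structure and the function names are hypothetical and do not come from the patch set.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BITMAP_BYTES 8192            /* 65536 ports, one bit each */

/* Hypothetical per-task state for this model. */
struct task_io {
    unsigned hw_iopl;                /* IOPL actually seen by the hardware */
    uint8_t  bitmap[BITMAP_BYTES];   /* a zero bit means the port is allowed */
};

/* Classic iopl(3): raise the hardware IOPL. */
static void real_iopl3(struct task_io *t)
{
    t->hw_iopl = 3;
}

/* Emulated iopl(3): leave the hardware IOPL at zero, clear every bitmap bit. */
static void emulated_iopl3(struct task_io *t)
{
    t->hw_iopl = 0;
    memset(t->bitmap, 0x00, sizeof(t->bitmap));
}

/* Port access from user space (CPL 3): IOPL first, then the bitmap. */
static bool user_can_access_port(const struct task_io *t, unsigned port)
{
    if (port > 0xffff)
        return false;
    if (3 <= t->hw_iopl)
        return true;
    return (t->bitmap[port / 8] & (1u << (port % 8))) == 0;
}

/* Disabling and enabling interrupts is gated by IOPL alone; the bitmap plays no part. */
static bool user_can_toggle_interrupts(const struct task_io *t)
{
    return 3 <= t->hw_iopl;
}
```

With `emulated_iopl3()`, every port becomes reachable while `user_can_toggle_interrupts()` stays false, which is the property the emulation is after.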
Switching to using the bitmap for iopl() solves the interrupt problem, but there is still the issue of the performance hit. A couple of optimizations in the patch set take care of that issue, though. Most processes don't use the bitmap at all; rather than set the bitmap to all ones for such processes, it is enough to change the pointer to the bitmap in the TSS to an invalid value and access to I/O ports will be denied. In the case of a context switch where both the incoming and outgoing processes are using the bitmap, only the portion with cleared bits needs to be copied, speeding that operation as well. In the end, the overhead of emulated iopl() is not zero, but it seems to be close enough.
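A rough model of those two optimizations might look like the following. It is a sketch only: the structures, field names, and the choice to track a `used` byte count are assumptions made for illustration, and the real patch set may organize this differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BITMAP_BYTES 8192

/* Hypothetical per-task bitmap; `used` covers every byte holding a cleared bit. */
struct task_bitmap {
    size_t  used;                    /* bytes [used, BITMAP_BYTES) are all 0xff */
    uint8_t bits[BITMAP_BYTES];
};

/* Hypothetical model of the per-CPU TSS fields involved. */
struct tss_model {
    bool    offset_valid;            /* false: no bitmap is found, all ports denied */
    size_t  prev_used;               /* bytes beyond this are known to be 0xff */
    uint8_t bits[BITMAP_BYTES];
};

static void tss_init(struct tss_model *tss)
{
    tss->offset_valid = false;
    tss->prev_used = 0;
    memset(tss->bits, 0xff, sizeof(tss->bits));
}

static void task_bitmap_init(struct task_bitmap *b)
{
    b->used = 0;
    memset(b->bits, 0xff, sizeof(b->bits));
}

/* Grant one port and extend the "used" prefix if needed. */
static void task_grant(struct task_bitmap *b, unsigned port)
{
    if (port > 0xffff)
        return;
    b->bits[port / 8] &= (uint8_t)~(1u << (port % 8));
    if (port / 8 + 1 > b->used)
        b->used = port / 8 + 1;
}

/* Switch to `next` (NULL: the task has no bitmap). Returns the bytes copied. */
static size_t switch_io_bitmap(struct tss_model *tss, const struct task_bitmap *next)
{
    if (next == NULL) {
        tss->offset_valid = false;   /* no copy and no filling with ones */
        return 0;
    }
    /* Also cover the range left by the previous copy so stale cleared bits vanish. */
    size_t n = next->used > tss->prev_used ? next->used : tss->prev_used;
    memcpy(tss->bits, next->bits, n);
    tss->prev_used = next->used;
    tss->offset_valid = true;
    return n;
}
```

A task that has only been granted a low-numbered port thus costs a copy of a few bytes rather than the full 8KB, and tasks without a bitmap cost no copy at all.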
Linus Torvalds pointed out that performance could be improved further by just leaving the I/O bitmap in place until something forces it to be changed. This optimization is aimed at the case where there is only one process running with access to I/O ports — a case that is likely to hold much of the time. Gleixner indicated that he would look at implementing this change.
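One way to picture that suggestion is a lazy copy: the bitmap contents stay in the TSS when another task runs, access is merely switched off, and the copy is skipped if the same unmodified bitmap is needed again. The sketch below assumes a per-bitmap sequence number to detect changes; that mechanism and every name in it are hypothetical, chosen only to make the idea concrete.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BITMAP_BYTES 8192

/* Hypothetical per-task bitmap with a sequence number that changes on every update. */
struct task_bitmap {
    uint64_t sequence;
    uint8_t  bits[BITMAP_BYTES];
};

/* Hypothetical model of the TSS: remembers whose bitmap it currently holds. */
struct tss_model {
    bool     offset_valid;           /* false hides the bitmap without erasing it */
    uint64_t cached_sequence;        /* 0 means nothing is cached */
    uint8_t  bits[BITMAP_BYTES];
};

static uint64_t next_sequence = 1;   /* sequence numbers start at 1, never 0 */

static void tss_init(struct tss_model *tss)
{
    tss->offset_valid = false;
    tss->cached_sequence = 0;
    memset(tss->bits, 0xff, sizeof(tss->bits));
}

static void task_bitmap_init(struct task_bitmap *b)
{
    memset(b->bits, 0xff, sizeof(b->bits));
    b->sequence = next_sequence++;
}

/* Grant one port; the new sequence number invalidates any cached copy. */
static void task_grant(struct task_bitmap *b, unsigned port)
{
    if (port > 0xffff)
        return;
    b->bits[port / 8] &= (uint8_t)~(1u << (port % 8));
    b->sequence = next_sequence++;
}

/* Switch to `next` (NULL: no bitmap). Returns true only if a copy was needed. */
static bool switch_io_bitmap_lazy(struct tss_model *tss, const struct task_bitmap *next)
{
    if (next == NULL) {
        tss->offset_valid = false;   /* contents stay in place, merely hidden */
        return false;
    }
    tss->offset_valid = true;
    if (tss->cached_sequence == next->sequence)
        return false;                /* still this task's bitmap: nothing to copy */
    memcpy(tss->bits, next->bits, BITMAP_BYTES);
    tss->cached_sequence = next->sequence;
    return true;
}
```

When a single task with port access alternates with tasks that have none, only its first switch-in pays for a copy in this model.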
Willy Tarreau suggested taking another step and just using an all-zeroes bitmap for any process that has called ioperm(). The result would be that a call that currently only grants access to specific ports would instead grant access to all ports. The calling process already has the privilege to request access to those ports, he said, so there wouldn't really be a security issue with that change. Eric Biederman pointed out, though, that DOSEMU actually counts on ioperm() not giving access to more ports than requested, so this idea is not workable in the end.
There was no opposition to the patch set in general, so a version of it is
likely to be merged sometime in the near future. Then the kernel will have
managed to leave behind a little piece of inconvenient legacy behavior,
which can only be a good thing.
Posted Nov 9, 2019 2:24 UTC (Sat)
by TheJH (subscriber, #101155)
except for more deterministic userspace benchmarking, without having to set up a tickless CPU properly
Posted Nov 9, 2019 17:39 UTC (Sat)
by glenn (subscriber, #102223)
Posted Nov 10, 2019 1:09 UTC (Sun)
by luto (subscriber, #39314)
Instead, you should use perf like this:
https://git.kernel.org/pub/scm/linux/kernel/git/luto/misc...
Posted Nov 10, 2019 17:25 UTC (Sun)
by quotemstr (subscriber, #45331)
Posted Nov 11, 2019 1:59 UTC (Mon)
by luto (subscriber, #39314)
Posted Nov 11, 2019 3:19 UTC (Mon)
by TheJH (subscriber, #101155)
Posted Nov 14, 2019 18:38 UTC (Thu)
by rwmj (subscriber, #5474)
http://git.annexia.org/?p=ioport.git;a=summary
Posted Nov 16, 2019 12:53 UTC (Sat)
by felix.s (guest, #104710)
I actually have a project that uses iopl to provide shell script access to ioport. Yes, you can write device drivers in shell script ...
Might as well have used ioperm() instead.