The inherent fragility of seccomp()
seccomp() allows the establishment of a filter that will restrict the set of system calls available to a process. It has obvious uses related to sandboxing; if an application has no need for, say, the open() system call, blocking access to that call entirely can reduce the damage that can be caused if that application is compromised. As more systems and programs are subjected to hardening, the use of seccomp() can be expected to continue to increase.
Michael Kerrisk recently reported that upgrading to glibc 2.26 broke one of his demonstration applications. That program was using seccomp() to block access to the open() system call. The problem that he ran into comes down to the fact that applications almost never invoke system calls directly; instead, they call wrappers that have been defined by the C library.
The glibc open() wrapper has, since the beginning, been a wrapper around the kernel's open() system call. But open() is an old interface that has long been superseded by openat(). The older call still exists because applications expect it to be there, but it is implemented as a special case of openat() within the kernel itself. In glibc 2.26, the open() wrapper was changed to call openat() instead. This change was not visible to ordinary applications, but it will break seccomp() filters that behave differently for open() and openat().
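The effect on a filter can be illustrated with a toy model. The sketch below involves no real seccomp code: the function names and version tuples are hypothetical, chosen for illustration, and the only fact taken from the text is that the open() wrapper issues open() before glibc 2.26 and openat() from 2.26 on.

```python
# Toy model only: no real system calls or seccomp filters are involved.

def libc_open_syscall(glibc_version):
    """Return the name of the syscall that the open() wrapper issues.

    Per the article: open before glibc 2.26, openat from 2.26 onward.
    """
    return "openat" if glibc_version >= (2, 26) else "open"


def deny_list_verdict(denied, syscall):
    """A filter that blocks the listed syscalls and allows everything else."""
    return "blocked" if syscall in denied else "allowed"


def allow_list_verdict(allowed, syscall):
    """A filter that allows the listed syscalls and blocks everything else."""
    return "allowed" if syscall in allowed else "blocked"


for version in [(2, 25), (2, 26)]:
    syscall = libc_open_syscall(version)
    print(version, syscall,
          "| deny-list {open}:", deny_list_verdict({"open"}, syscall),
          "| allow-list {open}:", allow_list_verdict({"open"}, syscall))
```

In this model, a filter that only denies open() stops blocking anything once the wrapper switches to openat(), while a filter that only allows open() starts rejecting ordinary file opens.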
Kerrisk was not really complaining about the change, but he did want to inform the glibc developers that there were user-visible effects from it: "I want to raise awareness that these sorts of changes have the potential to possibly cause breakages for some code using seccomp, and note that I think such changes should not be made lightly or gratuitously". The developers should, he was suggesting, keep the possibility of breaking seccomp() filters in mind when making changes, and they should document such changes when they cannot be avoided.
Florian Weimer, however, disagreed:
Another way of putting this might be: seccomp() filters are not considered to be a part of the ABI that is provided by glibc, so incompatible changes there are not considered regressions. They are, instead, a consequence of filtering below the glibc level while expecting behavior above that level to remain unchanged.
Weimer's point of view would appear to be the one that will govern glibc development going forward. So Kerrisk has proposed some man-page changes to make the fragility of seccomp() filters a bit less surprising to developers. Playing the game at this level will require a fairly deep understanding of what is going on and the ability to adapt to future C-library changes.
This outcome could be seen as an argument in favor of a filtering interface like OpenBSD's pledge(). Like seccomp(), pledge() is used to limit the set of system calls available to a process, but pledge() is defined in terms of broad swathes of functionality rather than working at the level of individual system calls. It can be used to allow basic file I/O, for example, while disabling the opening (or creation) of new files. pledge() is far less expressive than seccomp() and cannot implement anything close to the same range of policies but, for basic filtering, it seems far less likely to generate surprises after a kernel or library update.
But Linux doesn't have pledge() and seems unlikely to get it.
seccomp() can certainly get the sandboxing job done, but
developers who use it should expect to spend some ongoing effort
maintaining their filters.
The inherent fragility of seccomp()
Posted Nov 10, 2017 21:07 UTC (Fri)
by juliank (guest, #45896)
(1) base set of permissions (normal file I/O, sysv IPC [if fakeroot is used])
(2) directory reading
(3) sockets
See https://anonscm.debian.org/cgit/apt/apt.git/tree/methods/... and later lines.
This will break eventually if a new syscall is introduced. I consider two ways to solve that:
(1) Keep a list of all syscalls that have been checked in the source code, and regularly (on CI) check if there are new ones. If new ones appear, they have to be compared to the existing set, and if similar enough, added to the list.
(2) Make syscalls return ENOSYS instead of aborting the program. This should cause libc to fall back from new optimised syscalls to old syscalls, as it has to maintain a certain base level.
Combining the two should yield a maintainable result.
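A minimal sketch of the first idea, the CI check, might look like the following. It assumes the kernel's syscall list is available as a unistd-style header whose entries look like `#define __NR_name number`; the sample header text, the reviewed list, and the function names are all hypothetical. Headers that define numbers through expressions rather than plain integers would need a looser pattern.

```python
import re

# Matches lines such as "#define __NR_openat 257" in a unistd-style header.
NR_DEFINE = re.compile(r"^\s*#\s*define\s+__NR_(\w+)\s+\d+", re.MULTILINE)


def syscalls_in_header(header_text):
    """Extract syscall names from the text of a unistd-style header."""
    return set(NR_DEFINE.findall(header_text))


def unreviewed_syscalls(header_text, reviewed):
    """Return syscalls present in the header but absent from the reviewed list."""
    return sorted(syscalls_in_header(header_text) - set(reviewed))


# Hypothetical header excerpt and reviewed list, for illustration only.
SAMPLE_HEADER = """
#define __NR_read 0
#define __NR_write 1
#define __NR_open 2
#define __NR_openat 257
"""
REVIEWED = {"read", "write", "open"}

new = unreviewed_syscalls(SAMPLE_HEADER, REVIEWED)
if new:
    # A CI job would fail here so that someone classifies the new entries.
    print("syscalls needing review:", ", ".join(new))
```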
[1] https://juliank.wordpress.com/2017/10/23/apt-1-6-alpha-1-...
Another thing people don't consider is NSS modules and LD_PRELOAD. They could be doing all kinds of weird stuff when you call getaddrinfo(). For example, they could use SysV IPC to talk to another process, like a DNS cache. Evil little bastards. We had the same problem with people running apt in fakeroot: fakeroot needs SysV IPC to talk to its metadata daemon thing, and those calls were not whitelisted. I hacked in support for that: if FAKED_MODE is set in the environment, it now adds the IPC syscalls. Ugly.
The inherent fragility of seccomp()
Posted Nov 10, 2017 21:10 UTC (Fri)
by juliank (guest, #45896)
The inherent fragility of seccomp()
Posted Nov 11, 2017 1:45 UTC (Sat)
by pkern (subscriber, #32883)
At the same time, as stated in the original post, AppArmor also leaks the details of the libraries an application loads into the profiles. And if the application execs something, you need to account for whatever the exec'ed program does.
The inherent fragility of seccomp()
Posted Nov 11, 2017 0:14 UTC (Sat)
by nix (subscriber, #2304)
> (1) Keep a list of all syscalls that have been checked in the source code, and regularly (on CI) check if there are new ones. If new ones appear, they have to be compared to the existing set, and if similar enough, added to the list.

You have to check all libraries your program uses, as well, and all libraries those libraries use, and so on ad infinitum. Oh and don't forget LD_PRELOADed libraries, dynamically loaded plugins, etc etc etc. (Particularly relevant if things like Gtk are in use because of the possibility of accessibility and IM plugins that call out to weird hardware and the like that you have quite possibly never realised even exists: but speech recognition for blind people sometimes relies on LD_PRELOAD to interpose all console I/O, etc etc etc... the list of obscure edge cases crucial to someone that this breaks is endless, and IMHO unmaintainable.)

> (2) make syscalls return ENOSYS instead of aborting the program. This should cause libc to fall back from new optimised syscalls to old syscalls, as it has to maintain a certain base level

See my comment below for a case where the affected syscall was getpid(). getpid() is guaranteed to never fail, so nobody ever checks to see if it failed.
I just checked the seccomp filters active in a bunch of programs running on the system on which I'm typing this. Several of them still do not whitelist getpid(), almost a year after glibc 2.25 was released. I guess they're working by luck. The first such example is something that really *needs* seccomp, too: ntpd 4.2.8p10. It calls getpid() multiple times in the very same source file where it sets up a filter list that excludes getpid(): the obscure and out-of-the-way ntpd/ntpd.c. One of its calls does not check for failure, so can easily end up trying to set a process group of (pid_t)-1... it's in a tangle of conditionals that mean that most of the time, if you're lucky, you'll end up not compiling in that code -- but there are several other calls elsewhere in the source tree... and oh yes it also links to OpenSSL's libcrypto. Any bets on whether *that* calls getpid()? Repeat for every other syscall it doesn't allow past, and every syscall it allows past but only with argument checking.
This is not a maintainable strategy for any but the simplest programs.
The inherent fragility of seccomp()
Posted Nov 11, 2017 8:38 UTC (Sat)
by alonz (subscriber, #815)
I believe the OP meant something subtly different: he wasn't planning to check which syscalls the program uses, but rather just which syscalls exist in the kernel. When new syscalls are added, he would add them to the appropriate group in the filters (e.g., if it's a new way to open files, it will be filtered the same as all other open* syscalls). And until this update happens, the filters will ensure glibc (or any other library) will get ENOSYS for this new syscall, forcing it to fall back to older syscalls.
In a sense, this just implements a poor-man's-pledge, with the CI system ensuring it evolves together with the kernel (or at least trying to).
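The grouping-plus-fallback idea can be sketched as a toy model. Everything here is an assumption made for illustration: the group names, the group contents, the "deny" outcome for known but unpermitted syscalls, and the use of openat2 as a stand-in for a newly added syscall that has not yet been classified. No real filter is installed.

```python
import errno

# Hypothetical grouping of syscalls into broad categories; a real policy
# would need a reviewed, complete list for the target architecture.
GROUPS = {
    "file_open": {"open", "openat"},
    "basic_io": {"read", "write", "close"},
}


def verdict(syscall, allowed_groups):
    """Decide what the model filter does with a syscall.

    Returns "allow" or "deny" for a syscall that belongs to a known group,
    or errno.ENOSYS for a syscall that is not yet in any group.
    """
    for group, members in GROUPS.items():
        if syscall in members:
            return "allow" if group in allowed_groups else "deny"
    return errno.ENOSYS


def open_with_fallback(allowed_groups, candidates=("openat2", "openat", "open")):
    """Model a library trying newer syscalls first and falling back on ENOSYS."""
    for syscall in candidates:
        result = verdict(syscall, allowed_groups)
        if result == errno.ENOSYS:
            continue  # looks unimplemented; try the older interface
        return syscall, result
    return None, errno.ENOSYS


print(open_with_fallback({"file_open", "basic_io"}))
```

The unclassified call is answered with ENOSYS, so the modelled library falls back to openat, which is then judged by its group like every other file-opening call.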
The inherent fragility of seccomp()
Posted Nov 11, 2017 16:26 UTC (Sat)
by marcH (subscriber, #57642)
https://bugs.chromium.org/p/chromium/issues/detail?id=772273
sslh seccomp policy blocks ssh to ChromeOS over link-local IPv6 addresses
https://chromium-review.googlesource.com/c/chromiumos/ove...
The inherent fragility of seccomp()
Posted Nov 10, 2017 21:59 UTC (Fri)
by luto (subscriber, #39314)
The inherent fragility of seccomp()
Posted Nov 10, 2017 22:41 UTC (Fri)
by arnd (subscriber, #8866)
Part of the problem is that we have reduced the set of available syscalls on modern architectures: anything that uses include/uapi/asm-generic/unistd.h, for instance, intentionally offers only openat() but not open(). When glibc can reasonably assume that openat() is available on all architectures, the logical next step is to always call that to reduce the differences between architectures.
The inherent fragility of seccomp()
Posted Nov 10, 2017 22:43 UTC (Fri)
by pbonzini (subscriber, #60935)
glibc can be compiled with a guaranteed minimum kernel version, and will skip compatibility code if the minimum kernel version is higher than or equal to the one that introduced a particular system call.
You can search the libc manual for "--enable-kernel".
The inherent fragility of seccomp()
Posted Nov 10, 2017 22:45 UTC (Fri)
by juliank (guest, #45896)
The backtrace thing with the trap signal is especially useful for stuff like NSS modules and preloaded libraries.
The inherent fragility of seccomp()
Posted Nov 11, 2017 0:00 UTC (Sat)
by nix (subscriber, #2304)
This is not the first instance of such breakage: in glibc 2.25, BIND's named daemon stopped working. The failure was catastrophic: rather than daemonizing, it hung forever, which had a tendency to bring boots to a grinding halt: if it didn't, it had a tendency to bring whole networks to a halt if this wasn't noticed and all machines were eventually rebooted after an update. The cause? glibc 2.25 dropped the internal caching of getpid() which it had long done, since it didn't speed much up, added a lot of complexity, introduced subtle bugs, and broke horribly with PID namespaces. When this was done, threaded programs which called getpid() for the first time after activating their seccomp filters needed to whitelist it in those filters, where they never needed to before. BIND had not done so, and called getpid() before daemonizing. Strangely neither it nor glibc expected getpid() to fail. POSIX guarantees it cannot fail, but thanks to seccomp it now can.
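The role of the PID cache can be shown with a toy model. The class and its names are hypothetical, and it assumes a filter that rejects a disallowed call with an error return of -1 rather than killing the process; it models only the difference between a cached and an uncached getpid().

```python
# Toy model only: names and structure are hypothetical.

class ToyLibc:
    def __init__(self, caches_pid, real_pid=1234):
        self.real_pid = real_pid
        # Older glibc kept a cached copy of the PID; 2.25 dropped the cache.
        self.cached_pid = real_pid if caches_pid else None
        self.allowed = None  # None means no filter is installed yet

    def install_filter(self, allowed_syscalls):
        self.allowed = set(allowed_syscalls)

    def _syscall_getpid(self):
        if self.allowed is not None and "getpid" not in self.allowed:
            return -1  # the model filter rejects the call with an error
        return self.real_pid

    def getpid(self):
        if self.cached_pid is not None:
            return self.cached_pid  # answered from the cache, no syscall
        return self._syscall_getpid()


for caches_pid in (True, False):
    libc = ToyLibc(caches_pid)
    libc.install_filter({"read", "write"})
    print("cache" if caches_pid else "no cache", "->", libc.getpid())
```

With the cache, the filter never sees a getpid() call; without it, a caller that does not check the result carries -1 onward as if it were a PID.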
Worse yet, this sort of failure can happen even if the call is only made in some non-glibc library, even if the library has no idea the seccomp filters are in force in the first place, and even if the program installing the filter has no idea the library was calling the function (perhaps it wasn't when the filters were added, and who can check every change ever made to every library your program depends on, even indirectly?)
Expecting glibc and other libraries to avoid making changes that break seccomp filters is tantamount to demanding that they never change the set of syscalls they invoke (or the arguments passed to them, because who knows what validation those filters are carrying out) in any situation ever, which would make library development on Linux essentially impossible.
I don't see a way to fix this in the current model other than to demand that all seccomp-filtered programs be statically linked and never upgraded (which would make it impossible to fix security holes in them or any libraries they used: this is of course ridiculous). The increasing use of seccomp is placing silent landmines beneath the feet of everyone using every seccomp-filtered program. This is a shame, because if programs were never upgraded and their behaviour was completely predictable, seccomp would be an excellent way to prevent malicious behaviour. However, in a world like that, programs would all already be secure and we wouldn't need seccomp in the first place.
The obvious fix, to introduce LD_AUDIT-style filtering on *library* arguments, falls at the same hurdle, for the same reason: as long as the filters are process-wide, some library getting upgraded can unintentionally violate the contract of the filter and break. The only solution I can see that would work reliably would be for each library to filter *its own* calls, so that it could at least in theory adjust its filters as its own set of expected calls changed: a sort of DT_SYMBOLIC per-.so filter for inter-shared library function calls. God knows how to implement that without totally wrecking performance though: it would mean every filtered call would have to go through the PLT and ld.so, at the very least: the very opposite of the increasing reduction in lazy binding that's actually happening. I suspect there are more complexities I haven't considered, too.
The inherent fragility of seccomp()
Posted Nov 11, 2017 3:37 UTC (Sat)
by patrakov (subscriber, #97174)
https://sites.google.com/site/fullycapable/Home/thesendma...
There, it was also a syscall failing, that could not fail previously (with sendmail not checking the return), due to a new security mechanism (capabilities).
The inherent fragility of seccomp()
Posted Nov 12, 2017 8:27 UTC (Sun)
by epa (subscriber, #39769)
Programs that are seccomp-aware, and want to handle these things defensively, could arrange to trap the signal. Otherwise existing code would at least either work correctly or fail cleanly.
The inherent fragility of seccomp()
Posted Nov 13, 2017 1:01 UTC (Mon)
by simcop2387 (subscriber, #101710)
The inherent fragility of seccomp()
Posted Nov 11, 2017 4:59 UTC (Sat)
by roc (subscriber, #30627)
There isn't really a good solution here. pledge() won't scale to a broader software ecosystem. Trying to let libraries express their syscall requirements and collect those transitively would be complicated and prone to errors that over-expose the kernel. Probably a more capability-based kernel API would be better, but it's hard to get there from here.
The inherent fragility of seccomp()
Posted Nov 12, 2017 8:59 UTC (Sun)
by wahern (subscriber, #37304)
The inherent fragility of seccomp()
Posted Nov 12, 2017 9:27 UTC (Sun)
by roc (subscriber, #30627)
Then you'd have to rewrite glibc and most other userspace libraries and applications to use capsicum-enabled APIs.
It could be great, but don't claim it's easy.
The inherent fragility of seccomp()
Posted Nov 13, 2017 21:42 UTC (Mon)
by wahern (subscriber, #37304)
Getting over that political hurdle seems daunting, unfortunately. AFAIK the CLONE_FD patch (https://lwn.net/Articles/638613/), necessary for implementing Capsicum's pdfork() interface, _still_ hasn't been merged.
Regarding glibc, I'm not sure how much of an impact it would have. The particular case of open() versus openat() is irrelevant because applications are supposed to be using openat() in the Capsicum model, anyhow. The benefit of Capsicum is that it builds upon the existing, de facto file descriptors-as-capabilities model in Unix. From the perspective of libc, playing nice with Capsicum is roughly similar to refactoring to better leverage the latest evolutions of POSIX and privilege separation best practices. For example, use getrandom() instead of expecting to open /dev/urandom. And stop relying on /proc so heavily because it's not always visible. These are things glibc has to do, anyhow.
The inherent fragility of seccomp()
Posted Nov 13, 2017 7:00 UTC (Mon)
by quotemstr (subscriber, #45331)
The inherent fragility of seccomp()
Posted Nov 13, 2017 14:43 UTC (Mon)
by musicinmybrain (subscriber, #42780)
The inherent fragility of seccomp()
Posted Nov 13, 2017 15:08 UTC (Mon)
by nix (subscriber, #2304)
e.g. if your sshd has something obscure LD_PRELOADed into it for the sake of a blind user, now you have to adapt to the new syscalls it makes in routine operation, even though you probably had no idea the thing existed. (OK, in this case, the blind user would more likely be preloading something into the ssh *client*, which is not seccomped, but if we're going to try seccomping anything associated with a user interface we'll suddenly have to consider input methods and God knows what getting preloaded in or plugged in).
The inherent fragility of seccomp()
Posted Nov 14, 2017 23:41 UTC (Tue)
by wahern (subscriber, #37304)
Which brings up another benefit of pledge over seccomp--pledge doesn't require root privileges to invoke. Almost all the standard utilities in OpenBSD call pledge, _including_ ssh(1). pledge can do this because it's not inherited across exec, which smartly sidesteps all the messy security issues with the setuid and setgid executable bits.
The inherent fragility of seccomp()
Posted Nov 15, 2017 0:34 UTC (Wed)
by nix (subscriber, #2304)
> Which brings up another benefit of pledge over seccomp--pledge doesn't require root privileges to invoke.

Neither does the installation of a seccomp filter, as long as you have done a prctl(PR_SET_NO_NEW_PRIVS, 1) first to ensure that you can't go invoking setuid programs, etc, later on. Heck, it was basically designed for Chromium's renderers, and no way are they run as root except by absolute lunatics :)
(This is how it avoids the old sendmail cap attack: setuid programs or their children can't be fooled into running with an unexpected seccomp filter installed before the setuid took effect, because installation of a filter requires turning permanently off the ability to invoke setuid programs in the process hierarchy that has the filter in force.)
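As a minimal sketch, the prctl() step can be issued from an unprivileged process as shown below. It assumes a Linux system where the C library exposes prctl(); the constants are the values from <linux/prctl.h>, and the helper name is hypothetical. The sketch only sets and reads back the flag; it does not install a filter.

```python
import ctypes
import os
import sys

PR_SET_NO_NEW_PRIVS = 38  # from <linux/prctl.h>
PR_GET_NO_NEW_PRIVS = 39


def set_no_new_privs():
    """Set no_new_privs for the calling thread and read the flag back.

    The setting is inherited by children and cannot be cleared again.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    zero = ctypes.c_ulong(0)
    # The unused arguments must be zero or the kernel returns EINVAL.
    if libc.prctl(PR_SET_NO_NEW_PRIVS, ctypes.c_ulong(1), zero, zero, zero) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return libc.prctl(PR_GET_NO_NEW_PRIVS, zero, zero, zero, zero) == 1


if sys.platform.startswith("linux"):
    print("no_new_privs set:", set_no_new_privs())
```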
The inherent fragility of seccomp()
Posted Dec 11, 2017 1:13 UTC (Mon)
by roc (subscriber, #30627)
The inherent fragility of seccomp()
Posted Dec 11, 2017 7:14 UTC (Mon)
by mjg59 (subscriber, #23239)
openat() was available before - why they are blaming glibc?
Posted Nov 15, 2017 12:29 UTC (Wed)
by sasha (guest, #16070)
openat() was available before - why they are blaming glibc?
Posted Nov 16, 2017 8:24 UTC (Thu)
by smcv (subscriber, #53363)
The situation here seems to be the other way round: a whitelist-based filter allowed a particular program to call the open syscall (and therefore open files), but in recent glibc, the open(2) wrapper function actually uses the more general openat syscall, which the filter didn't allow. This caused that program to become unable to open files - not vulnerable, but also not usable ("failing closed").
openat() was available before - why they are blaming glibc?
Posted Nov 16, 2017 9:19 UTC (Thu)
by jem (subscriber, #24231)
openat() was available before - why they are blaming glibc?
Posted Nov 16, 2017 13:27 UTC (Thu)
by vadim (subscriber, #35271)
1. Programmer writes code, wants more protection and decides on the use of seccomp.
2. Programmer looks at what the code needs, and comes up with the 'open' syscall. However the code doesn't call the syscall directly, but the wrapper glibc provides.
3. Code is finished, programmer moves on to the next project.
4. Kernel development goes on, and the 'openat' syscall gets created.
5. Glibc starts using openat, so that in some cases the glibc-provided open wrapper actually calls 'openat'.
6. In those cases, the previously written code ends up using the 'openat' syscall which is not whitelisted because it didn't exist when the code was written, or because the 'open' wrapper always used the 'open' syscall and nothing else, and this changed later.
The 'open' call doesn't go anywhere. Glibc just doesn't promise to do an exact 1 to 1 wrapper, or not to introduce internal usage of additional new syscalls for its own internal reasons. When you call glibc open(), glibc may actually invoke a new, more advanced syscall like openat instead, or use additional syscalls in the wrapper.
openat() was available before - why they are blaming glibc?
Posted Nov 16, 2017 14:09 UTC (Thu)
by corbet (editor, #1)
This conversation confuses me a bit. Who is blaming glibc? The article is about a particular kernel functionality that is prone to breakage.
The term "fragility" in the title was applied to seccomp(), not glibc, after all.
I've not seen comments blaming glibc either...?