
Filesystem sandboxing with eBPF

By Jake Edge
November 6, 2019

OSS EU

Running untrusted code in a safe manner is generally the goal of sandboxing efforts. The sandbox technique presented by Georgia Tech PhD student Ashish Bijlani at Open Source Summit Europe 2019 is no exception. He has used something of a novel scheme to allow unprivileged code to implement the sandbox policies using BPF; the policies are then enforced by the kernel.

Background

There are lots of use cases for running untrusted third-party code without risking the contents of files on the system. Two that he mentioned were web-browser plugins obtained from potentially dodgy internet sites and machine-learning code that one might like to evaluate. There is a spectrum of code that can be run, from known-good code to known-bad code; in between those is unknown, untrusted code. Known-good code can be whitelisted and known-bad code can be blacklisted; sandboxing is a technique used for the code in the middle. A sandbox is simply an isolated and controlled execution environment, he said.


Bijlani is focused on a specific type of sandbox: a filesystem sandbox. The idea is to restrict access to sensitive data when running these untrusted programs. The rules would need to be dynamic as the restrictions might need to change based on the program being run. Some examples he gave were to restrict access to the ~/.ssh/id_rsa* files or to only allow access to files of a specific type (e.g. only *.pdf for a PDF reader).
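
To make the two example policies concrete, here is a hypothetical user-space helper that uses POSIX fnmatch() to decide whether a path may be accessed. It is only a sketch: the function and enum names are invented for illustration, the home directory is assumed to be /home/user, and matching on path strings is an assumption, not a description of how SandFS evaluates its policies.

```c
#include <fnmatch.h>
#include <stdbool.h>

/* Hypothetical policy kinds, modeled on the two examples from the talk. */
enum policy_kind {
    DENY_SSH_KEYS,   /* block the user's id_rsa* key files */
    ONLY_PDF         /* allow nothing but *.pdf files */
};

/* Return true if `path` may be accessed under the given policy.
 * fnmatch() returns 0 on a match; with flags == 0, '*' also matches '/'. */
static bool path_allowed(enum policy_kind kind, const char *path)
{
    switch (kind) {
    case DENY_SSH_KEYS:
        /* Assumed home directory; a real tool would use the user's $HOME. */
        return fnmatch("/home/user/.ssh/id_rsa*", path, 0) != 0;
    case ONLY_PDF:
        return fnmatch("*.pdf", path, 0) == 0;
    }
    return false;    /* unknown policy kind: deny by default */
}
```

With these rules, /home/user/.ssh/id_rsa.pub is refused while /home/user/.ssh/known_hosts is not, and under the PDF policy only names ending in .pdf are allowed. Path-string matching is easy to get wrong (relative paths, `..` components, symbolic links), so this is meant only to illustrate the policies, not to be a safe implementation of them.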

He went through some of the existing solutions to show why they did not solve his problem, comparing them on five attributes: allowing dynamic policies, usable by unprivileged users, providing fine-grained control, meeting the security needs for running untrusted code, and avoiding excessive performance overhead. Unix discretionary access control (DAC)—file permissions, essentially—is available to unprivileged users, but fails most of the other measures. Most importantly, it does not suffice to keep untrusted code from accessing files owned by the user running the code. SELinux mandatory access control (MAC) does check most of the boxes (as can be seen in the talk slides [PDF]), but is not available to unprivileged users.

Namespaces (or chroot()) can be used to isolate filesystems and parts of filesystems, but cannot enforce security policies, he said. Using LD_PRELOAD to intercept calls to filesystem operations (e.g. open() or write()) is a way for unprivileged users to enforce dynamic policies, but it can be bypassed fairly easily. System calls can be invoked directly, rather than going through the library calls, or files can be mapped with mmap(), which will allow I/O to the files without making system calls. Similarly, ptrace() can be used, but it suffers from time-of-check-to-time-of-use (TOCTTOU) races, which would allow the security protections to be bypassed.
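
Both bypasses are easy to demonstrate. The sketch below assumes Linux and glibc, and its function names are illustrative. It opens a file with a raw openat system call through syscall(), so an LD_PRELOAD shim wrapping libc's open() never sees the call, and then fills the file through a shared mmap() mapping, so the data reaches the file without any write() call.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Open a file with a raw openat system call. This goes straight to the
 * kernel without calling libc's open()/openat(), the functions an
 * LD_PRELOAD shim would typically wrap. */
static int raw_open(const char *path, int flags, mode_t mode)
{
    return (int)syscall(SYS_openat, AT_FDCWD, path, flags, mode);
}

/* Store `msg` in the file through a shared mapping. Once mmap() succeeds,
 * the data lands in the file via plain memory writes, so a shim that only
 * watches write() never sees the file contents change. */
static int write_via_mmap(const char *path, const char *msg)
{
    size_t len = strlen(msg);
    if (len == 0)
        return -1;                          /* mmap() rejects zero-length maps */

    int fd = raw_open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {   /* size the file to fit msg */
        close(fd);
        return -1;
    }
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                              /* the mapping stays valid */
    if (p == MAP_FAILED)
        return -1;

    memcpy(p, msg, len);                    /* file I/O without write() */
    int rc = msync(p, len, MS_SYNC);
    munmap(p, len);
    return rc;
}
```

A shim could also wrap syscall() or mmap(), but a program can always issue the system-call instruction directly, which no LD_PRELOAD library can intercept.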

ptrace() also suffers from high performance overhead (roughly 50%), as does the final option that Bijlani outlined: Filesystem in Userspace (FUSE). A FUSE filesystem would check all of his boxes, but it suffers from nearly 80% performance overhead. He was looking for a solution that would only add 5-10% overhead, he said.

That is what he has created with SandFS. It is a stackable filesystem that can enforce unprivileged-user-specified policies on filesystem access. A user would invoke it this way:

    $ sandfs -s sandfs.o -d /home/user /bin/bash

The sandfs binary is unprivileged; it can be run by anyone. The example above would run bash within a sandbox for accesses to the /home/user directory. The sandbox is defined by sandfs.o, which is written in C and compiled by LLVM into BPF bytecode.

He talked a bit about BPF and how it can be used, calling BPF "a key enabling technology" for SandFS. BPF maps provide a mechanism to communicate between user space and BPF programs running in the kernel; they also have a major role to play for SandFS. More details on BPF can be found in this LWN article.

Architecture

He then turned to the architecture of SandFS, which has a few different components, starting with the SandFS daemon and SandFS library in user space. The daemon is what the sandfs binary talks to; the library is available for those developing their own security policies. There is also a modified version of Wrapfs that is used to intercept the filesystem operations on the mounted filesystem. In the kernel, a set of SandFS BPF handlers implements the security checks for each of the filesystem operations intercepted by SandFS itself, which is the filesystem based on Wrapfs.

The basic operation is that the sandfs binary sends the BPF code to the daemon, which loads it into the kernel. If the BPF verifier does not find a problem with the code, the next step is to mount SandFS on the directory specified (/home/user in the example). Any filesystem operations will be intercepted by SandFS, which will call out to the BPF programs loaded from user space in order to get access decisions. SandFS itself does not perform I/O; it simply passes any operations that were allowed by the policies down to the lower-level filesystem (e.g. ext4 or XFS).
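
The decision-then-pass-through flow can be modeled in a few lines of ordinary C. Every name here (`struct sandfs_layer`, `sandfs_read()`, `lower_read()`, `deny_secret()`) is hypothetical: the policy callback stands in for the loaded BPF program and the "lower filesystem" is a function that pretends to read data. The real SandFS is a kernel filesystem built on Wrapfs, not a user-space library.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a loaded BPF policy: 0 allows, a negative errno denies. */
typedef int (*policy_fn)(const char *path);

struct sandfs_layer {
    policy_fn check;   /* access decision (the BPF program's role) */
};

/* Pretend lower filesystem (the ext4/XFS role): copies a fixed payload. */
static long lower_read(const char *path, char *buf, size_t len)
{
    static const char data[] = "contents";
    (void)path;
    size_t n = len < sizeof data - 1 ? len : sizeof data - 1;
    memcpy(buf, data, n);
    return (long)n;
}

/* The stacked filesystem's read path: ask the policy first, then pass the
 * operation through unchanged. It performs no I/O of its own. */
static long sandfs_read(const struct sandfs_layer *layer, const char *path,
                        char *buf, size_t len)
{
    int verdict = layer->check(path);
    if (verdict < 0)
        return verdict;              /* e.g. -EACCES back to the caller */
    return lower_read(path, buf, len);
}

/* Example policy: deny one specific file. */
static int deny_secret(const char *path)
{
    return strcmp(path, "/home/user/secret") == 0 ? -EACCES : 0;
}
```

Keeping the access decision separate from the data path is what lets the policy be swapped without touching the filesystem that does the actual I/O.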

The policies can consult BPF maps, which can be written from user space; that allows for dynamic policies. The BPF programs passed in from user space may look things up in the maps, such as path names, to determine whether or not to allow access; it is even possible to alter parameters to the filesystem operations based on the policies (e.g. to make all open() calls read-only). SandFS handles kernel objects, rather than parameters directly passed by user space, so it avoids any TOCTTOU problems.
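
The value of keeping policy data in a map is that the decision logic is loaded once while the data it consults can change at run time. The sketch below models that split in user space with a tiny fixed-size table; `deny_map_update()`, `deny_map_delete()`, and `lookup_policy()` are hypothetical names. A real setup would use a BPF hash map updated from user space (for example with libbpf's `bpf_map_update_elem()`) rather than a C array.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define MAP_SLOTS    8
#define PATH_MAX_LEN 64

/* Toy stand-in for a BPF map of denied paths: user space writes it,
 * the policy only reads it. An empty string marks a free slot. */
static char deny_map[MAP_SLOTS][PATH_MAX_LEN];

/* User-space side: add a path. Returns false if it is too long or the map is full. */
static bool deny_map_update(const char *path)
{
    if (strlen(path) >= PATH_MAX_LEN)
        return false;
    for (int i = 0; i < MAP_SLOTS; i++)
        if (strcmp(deny_map[i], path) == 0)
            return true;                 /* already present */
    for (int i = 0; i < MAP_SLOTS; i++)
        if (deny_map[i][0] == '\0') {
            strcpy(deny_map[i], path);
            return true;
        }
    return false;                        /* map full */
}

/* User-space side: remove a path, re-allowing access to it. */
static void deny_map_delete(const char *path)
{
    for (int i = 0; i < MAP_SLOTS; i++)
        if (strcmp(deny_map[i], path) == 0)
            deny_map[i][0] = '\0';
}

/* Policy side: this logic never changes; only the map contents do. */
static int lookup_policy(const char *path)
{
    for (int i = 0; i < MAP_SLOTS; i++)
        if (deny_map[i][0] != '\0' && strcmp(deny_map[i], path) == 0)
            return -EACCES;
    return 0;
}
```

Adding /home/user/.ssh/id_rsa to the map makes `lookup_policy()` start returning -EACCES for it, and deleting the entry restores access, all without reloading the policy function.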

In the talk, he gave two examples of BPF programs that could be used to restrict access. The first would consult the BPF map for the path being used as part of the lookup() filesystem operation; if it found the path in the map, it would return -EACCES, thus providing a way for user space to restrict access to any part of the sandboxed directory. The second would look at the flags specified in open() operations, rejecting those with O_CREAT and changing the access mode to O_RDONLY for the rest.
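
The second example policy can be sketched as a plain C function. The name `open_policy()` and the convention of passing the flags by pointer so that the policy can rewrite them are assumptions for illustration; the actual SandFS handler interface is not shown here. Because O_RDONLY, O_WRONLY, and O_RDWR are values within the O_ACCMODE field rather than independent bits, the sketch clears that field before setting O_RDONLY. It also drops O_TRUNC, an addition beyond what the talk described, since the open(2) man page notes that the effect of O_RDONLY | O_TRUNC is unspecified and that on many systems the file is actually truncated.

```c
#include <errno.h>
#include <fcntl.h>

/* Hypothetical open() policy modeled on the talk's second example:
 * refuse to create files, and downgrade every other open to read-only.
 * Returns 0 to allow (with *flags possibly rewritten) or -EACCES to deny. */
static int open_policy(int *flags)
{
    if (*flags & O_CREAT)
        return -EACCES;

    /* O_RDONLY, O_WRONLY and O_RDWR share the O_ACCMODE field, so clear the
     * field before setting O_RDONLY; other flags (e.g. O_NONBLOCK) survive. */
    *flags = (*flags & ~O_ACCMODE) | O_RDONLY;

    /* Assumption beyond the talk: also drop O_TRUNC, which could otherwise
     * still truncate the file despite the read-only access mode. */
    *flags &= ~O_TRUNC;
    return 0;
}
```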

He then showed some performance numbers for a few different types of operations, comparing the time taken for them on ext4 versus SandFS. Creating a .tar.gz file of the 4.17 kernel showed the lowest overhead (4.57%, 61.05s vs. 63.84s). Decompressing and expanding the tar file had the most overhead (9.75%, 5.13s vs. 5.63s), while compiling the kernel (make -j 4 tinyconfig) came in at 9.28% (27.15s vs. 29.67s).
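
The percentages follow from the usual relative-overhead formula, (SandFS time - ext4 time) / ext4 time × 100. A one-line helper (its name is illustrative) reproduces the three figures from the timings above:

```c
/* Relative overhead of `sandboxed` over `baseline`, in percent. */
static double overhead_pct(double baseline, double sandboxed)
{
    return (sandboxed - baseline) / baseline * 100.0;
}
```

For example, overhead_pct(61.05, 63.84) is about 4.57, overhead_pct(5.13, 5.63) about 9.75, and overhead_pct(27.15, 29.67) about 9.28.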

The SandFS framework could be used in a number of different ways, Bijlani said. It could restrict access to private user data such as SSH keys. It could also be used to compartmentalize certain operations of a complex application, such as a web browser; handling file and media formats could be put into separate sandboxed processes. Also, container-management systems could stack multiple layers of SandFS checks to harden the filesystem access from their containers.
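
Stacking SandFS layers means an operation must pass every layer's policy before it reaches the real filesystem. The hypothetical sketch below (all names invented for illustration, not taken from SandFS) models a stack as an ordered list of checks in which the first denial wins, so a container manager's layer and an application's own layer can each veto an access independently.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* One stacked layer's policy: 0 allows, a negative errno denies. */
typedef int (*layer_check)(const char *path);

/* Evaluate stacked layers in order; the first non-zero verdict is
 * returned, so any single layer can deny an access. */
static int check_stack(const layer_check *layers, size_t n, const char *path)
{
    for (size_t i = 0; i < n; i++) {
        int verdict = layers[i](path);
        if (verdict != 0)
            return verdict;
    }
    return 0;
}

/* Example layer that a container manager might install. */
static int container_layer(const char *path)
{
    return strncmp(path, "/etc/", 5) == 0 ? -EACCES : 0;
}

/* Example layer that an application might add on top. */
static int app_layer(const char *path)
{
    return strstr(path, "/.ssh/") != NULL ? -EPERM : 0;
}
```

The two layers return different error codes only so that it is visible in the example which layer refused an access; /etc/shadow is stopped by the first layer, ~/.ssh files by the second, and ordinary files pass both.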

He wrapped up the talk by noting that the SandFS code is available on GitHub. He has written an academic paper on it as well. In addition, he pointed to some related work that he presented at OSS North America in 2018 (slides [PDF]) and at the 2018 Linux Plumbers Conference (YouTube video).

[I would like to thank LWN's travel sponsor, the Linux Foundation, for travel assistance to attend Open Source Summit Europe in Lyon, France.]

Index entries for this article
Kernel: BPF
Kernel: Filesystems
Security: Linux kernel/Filesystems
Security: Sandboxes
Conference: Open Source Summit Europe/2019



Filesystem sandboxing with eBPF

Posted Nov 6, 2019 23:43 UTC (Wed) by roc (subscriber, #30627)

> It could also be used to compartmentalize certain operations of a complex application, such as a web browser; handling file and media formats could be put into separate sandboxed processes.

Web browsers already do the latter. Those sandboxed processes implement filesystem access filtering with seccomp policy that triggers SIGSYS on openat() and a SIGSYS handler that proxies the syscall to a broker process over IPC. AFAIK this isn't actually a performance problem because it's only used for loading libraries or config files, *not* for loading Web/media content --- because that normally doesn't come from the filesystem anyway, it's obtained from other browser subsystems via IPC.

Filesystem sandboxing with eBPF

Posted Nov 7, 2019 5:29 UTC (Thu) by gutschke (subscriber, #27910)

I don't see any mention of suid-binaries, but given the very obvious potential for abuse, this filesystem hopefully prevents all new privileges, as soon as it has been mounted for a process hierarchy.

How does it protect itself?

Posted Nov 7, 2019 15:05 UTC (Thu) by NAR (subscriber, #1313)

I might miss something, but if e.g. that PDF reader has a bug allowing arbitrary code execution - what stops the attacker turning off ("dynamically change policy") this sandbox protection?

How does it protect itself?

Posted Nov 8, 2019 3:20 UTC (Fri) by mathstuf (subscriber, #69389)

Presumably the knob for controlling that is outside the sandbox? If they're already outside the sandbox, why not just use that state directly?

How does it protect itself?

Posted Dec 3, 2019 16:10 UTC (Tue) by cpuguy83 (guest, #107303)

The case for this would be containers, where you force an app to run within a specific context rather than the app setting up said context.

Way forward to on-access antivirus in Linux

Posted Nov 8, 2019 12:01 UTC (Fri) by jorgegv (subscriber, #60484)

I think this mechanism could be used for implementing on-access antivirus on Linux, similar to the way it is implemented in Windows operating systems.

Some AV software (e.g. Sophos) is now using an out-of-tree module called TALPA, or even fanotify, but these come with their own limitations and/or problems (e.g. out-of-tree patch, no NFSv4 support). This new implementation looks much cleaner and likely to enter the mainstream kernel.

Way forward to on-access antivirus in Linux

Posted Nov 8, 2019 19:48 UTC (Fri) by amacater (subscriber, #790)

And who really cares about Windows-based, Windows virus definition driven AV on Linux? Sophos - the folk whose kernel modules were deemed of too limited value to be included in a Linux kernel. AV on Linux, where single threaded processes hog whole cores on a CPU - about as much use as an ashtray inside the ISS:)

Chkrootkit / rkhunter / clamav - maybe, but even then virtually all the rootkits were patched long ago - keeping software up to date solves this problem. The only justification for AV on Linux is if you're an ISP protecting a mailspool / web server serving Windows users. It will give them an illusion of relative safety tempered only by the unpatchable disaster that is Windows. Bitter, me?

Way forward to on-access antivirus in Linux

Posted Nov 8, 2019 21:50 UTC (Fri) by Cyberax (✭ supporter ✭, #52523)

> And who really cares about Windows-based, Windows virus definition driven AV on Linux?
People who run Linux in corporate settings?

> Chkrootkit / rkhunter / clamav - maybe, but even then virtually all the rootkits were patched long ago
Linux has a new exploitable root hole about every 3-4 months. Even more when all of the infrastructure is considered.

Way forward to on-access antivirus in Linux

Posted Nov 9, 2019 12:31 UTC (Sat) by pizza (subscriber, #46)

> People who run Linux in corporate settings?

Ah yes, to meet the "poorly implemented rootkit that does more harm than good" market.

Way forward to on-access antivirus in Linux

Posted Nov 9, 2019 20:39 UTC (Sat) by Cyberax (✭ supporter ✭, #52523)

Nope. Linux is freakingly insecure. It's protected right now basically by being very niche, so that attackers are not interested in it.

If this changes, get ready for Linux ransomware and undetectable rootkits. There is no hardening at all in mainstream Linux distros.

Way forward to on-access antivirus in Linux

Posted Nov 9, 2019 21:28 UTC (Sat) by amacater (subscriber, #790)

Linux is no longer niche - it's universal. Exploitable root hole every three months? Please be so good as to look at the average Mean Time to Repair [MTTR] in Linux and common applications and compare this to the speed of comparable patching in the commercial applications.

If, say, Amazon and the Linux components of Microsoft's Azure are too small to be regarded, please advise what you regard as important.

Way forward to on-access antivirus in Linux

Posted Nov 9, 2019 21:31 UTC (Sat) by Cyberax (✭ supporter ✭, #52523)

Android is universal. Desktop Linux is not, it barely exists.

> Exploitable root hole every three months? Please be so good as to look at the average Mean Time to Repair [MTTR] in Linux and common applications and compare this to the speed of comparable patching in the commercial applications.
Uh, what? Most IoT and Android devices are not repaired at all, they just exist in a vulnerable state.

The only thing preventing mass infections are gatekeepers in Play Store and the fact that most IoT devices don't execute arbitrary code.

Way forward to on-access antivirus in Linux

Posted Nov 10, 2019 2:14 UTC (Sun) by pizza (subscriber, #46)

> If this changes, get ready for Linux ransomware and undetectable rootkits. There is no hardening at all in mainstream Linux distros.

Neither of which are (or can be) addressed by the current "enterprise antivirus" paradigm.

Way forward to on-access antivirus in Linux

Posted Nov 10, 2019 2:23 UTC (Sun) by Cyberax (✭ supporter ✭, #52523)

They can. Modern antiviruses have extensive anti-patching and kernel integrity checks. It's possible to work around them, but not at all trivial, even in kernel mode.

Way forward to on-access antivirus in Linux

Posted Nov 11, 2019 7:41 UTC (Mon) by zlynx (guest, #2285)

Which is a reason that they result in crashing Windows so often. And then Windows has to create little simulated environments for the AV so it can "watch" a pretend operating system.

Way forward to on-access antivirus in Linux

Posted Nov 11, 2019 8:41 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)

I'm using Windows with an AV for work and so far I haven't seen it crashing. Not once.

> And then Windows has to create little simulated environments for the AV so it can "watch" a pretend operating system.
Windows doesn't do anything like this. It provides official hooks for AV software in the kernel mode, but doesn't do any emulation.

Way forward to on-access antivirus in Linux

Posted Nov 11, 2019 15:46 UTC (Mon) by zlynx (guest, #2285)

Microsoft has to build hacks for nearly every release of Windows 10 because some company's idiot AV thinks it knows Windows better than Microsoft does.

Way forward to on-access antivirus in Linux

Posted Nov 11, 2019 15:52 UTC (Mon) by pizza (subscriber, #46)

Similarly, "enterprise" AV is responsible for reducing brand-new ultrabooks with NVMe storage to performing about as well as a much older system with spinning rust.

(seriously; I just saw a thread on my employer's internal messaging boards about how our current enterprise AV suite makes compiles take nearly 3x longer than without it.)

Way forward to on-access antivirus in Linux

Posted Nov 11, 2019 16:17 UTC (Mon) by dezgeg (subscriber, #92243)

Not to mention all the extra security holes introduced by AVs doing complex parsing of file formats in processes running with SYSTEM permissions, e.g. https://googleprojectzero.blogspot.com/2015/09/kaspersky-...

Way forward to on-access antivirus in Linux

Posted Nov 17, 2019 5:29 UTC (Sun) by daurnimator (guest, #92358)

Indeed. However, it's a requirement of e.g. PCI-DSS Requirement 5.1.

From https://www.pcisecuritystandards.org/documents/PCI_DSS_v3...

> 5.1 Deploy anti-virus software on all systems commonly affected by malicious software (particularly personal computers and servers)

In most corporate settings where there is card data (and unless the business is willing to convince an auditor that Linux is not commonly affected by malicious software), you have to deploy *something* antivirusy.

Way forward to on-access antivirus in Linux

Posted Nov 18, 2019 23:29 UTC (Mon) by flussence (guest, #85566)

SELinux not good enough any more?

Way forward to on-access antivirus in Linux

Posted Nov 18, 2019 23:34 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)

SELinux has been crap all along. It's pretty much impossible to use on a regular desktop system.

Way forward to on-access antivirus in Linux

Posted Nov 21, 2019 16:20 UTC (Thu) by mathstuf (subscriber, #69389)

I haven't had to disable it for years. I even have it enabled on some servers I have running too. So maybe your experience is out-of-date (it was indeed much harder to use…6, maybe 7 years ago)?

Filesystem sandboxing with eBPF

Posted Nov 21, 2019 9:22 UTC (Thu) by rhdxmr (guest, #44404)

Wow. SandFS is a very cool example of an eBPF program.


Copyright © 2019, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds
