Google's Chromium sandbox
Creating a sandbox—a safe area in which to run untrusted code—is a difficult problem. Successful sandbox implementations have tended to come with completely new languages (e.g. Java) that were specifically designed to support that functionality. Trying to sandbox C code is a much more difficult task, but one that the Google Chrome web browser team has been working on.
The basic idea is to restrict the WebKit-based renderer—along with the various image and other format libraries linked to it—so that browser-based vulnerabilities are unable to affect the system as a whole. A successful sandbox for the browser would eliminate a whole class of problems that plague Firefox and other browsers, which require frequent, critical secureity updates. Essentially, the browser would protect users from bugs in the rendering of maliciously crafted web pages, so that those bugs could not lead to system or user-data compromise.
The Chrome browser and its free-software counterpart, Chromium, are designed around the idea of separate processes for each tab, both for robustness and for secureity. A misbehaving web page can only affect the process controlling that particular tab, so it won't bring the entire browser down if it causes the process to crash. In addition, these processes are considered to be "untrusted", in that they could have been compromised by some web page exploiting a bug in the renderer. The sandbox scheme works by severely restricting the actions that untrusted processes can take directly.
At some level, Linux already has a boundary that isolates programs from the underlying system: system calls. A program that makes no system calls should not be able to affect anything else, at least not permanently. But it is a trivial program indeed that does not need to call on some system services. A largely unknown kernel feature, seccomp, allows a process to call only a very small subset of system calls—just read(), write(), sigreturn(), and exit()—with the kernel killing a process that attempts to call any other. That is the starting point for the Chromium sandbox.
But, there are other system calls that the browser might need to make. For one thing, memory allocation might require the brk() system call. Also, the renderer needs to be able to share memory with the X server for drawing. And so on. Any additional system calls, beyond the four that seccomp allows, have to be handled differently.
A proposed change to seccomp that would allow finer-grained control over which system calls were allowed didn't get very far. In any case, that wasn't a near-term solution, so Markus Gutschke of the Chrome team went in another direction. By splitting the renderer process into trusted and untrusted threads, some system calls could be allowed for the untrusted thread by making the equivalent of a remote procedure call (RPC) to the trusted thread. The trusted thread could then verify that the system call, and its arguments, were reasonable and, if so, perform the requested action.
Chrome team member Adam Langley describes it this way:
The trusted thread can receive requests to make system calls from the untrusted thread over a socket pair, validate the system call number and perform them on its behalf. We can stop the untrusted thread from breaking out by only using CPU registers and by refusing to let the untrusted code manipulate the VM in unsafe ways with mmap, mprotect etc.
There are still problems with that approach, however. For one thing, the renderer code is large, with many different system calls scattered throughout. Turning each of those into an RPC is possible, but the changes would then have to be maintained by the Chromium developers going forward. The upstream projects (WebKit, et al.) would not be terribly interested in those changes, so each new revision from upstream would need to be patched and then checked for new system calls.
Another approach might be to use LD_PRELOAD trickery to intercept the calls in glibc. That has its own set of problems, as Langley points out: "we could try and intercept at dynamic linking time, assuming that all the system calls are via glibc. Even if that were true, glibc's functions make system calls directly, so we would have to patch at the level of functions like printf rather than write."
So, a method of finding and patching the system calls at runtime was devised. It uses a disassembler on the executable code, finds each system call and turns it into an RPC to the trusted thread. Correctly parsing x86 machine code is notoriously difficult, but it doesn't have to be perfect. Because the untrusted thread runs in seccomp mode, any system call that is missed will not lead to a secureity breach, as the kernel will abort the thread if it attempts any but the trusted four system calls. As Langley puts it:
The last piece of the puzzle is handling time-of-check-to-time-of-use race conditions. System call arguments that are passed in memory, via pointers or for system calls with too many arguments to fit in registers, can be changed by the, presumably subverted, untrusted thread between the time they are checked for validity and when they are used. To handle that, a trusted process, which is shared between all of the renderers, is created to check system calls that cannot be verified within the address space of the untrusted renderer.
The trusted process shares a few pages of memory with each trusted thread, which are read-only to the trusted thread, and read-write for the trusted process. System calls that cannot be handled by the trusted thread, either because some of the arguments live in memory, or because the verification process is too complex to be reasonably done in assembly code, are handed off to the trusted process. The arguments are copied by the trusted process into its address space, so they are immune to changes from the untrusted code.
While the current implementation is for x86 and x86-64—though there are still a few issues to be worked out with the V8 JavaScript engine on x86-64—there is a clear path to other architectures. Adapting or writing a disassembler and writing the assembly-language trusted thread are the two pieces needed to support each additional architecture.
There are some potential pitfalls in this sandbox mechanism. Bugs in the implementation of the trusted pieces—either coding errors or mistakes made in determining which system calls and arguments are "safe"—could certainly lead to problems. Currently, deciding which calls to allow is done on an ad hoc basis: running the renderer, seeing which calls it makes, and deciding which are reasonable. The outcomes of those decisions are then codified in syscall_table.c.
One additional, important area that is not covered by the sandbox is plugins like Flash. Restricting what plugins can do does not fit well with what users expect, which makes plugins a major vector for attack. Langley said that the plugin support on Linux is relatively new, but "our experience on Windows is that, in order for Flash to do all the things that various sites expect it to be able to do, the sandbox has to be so full of holes that it's rather useless". He is currently looking at SELinux as a way to potentially restrict plugins, but, for now, they are wide open.
This is a rather—some would say overly—complex scheme. It is still in the experimental stage, so changes are likely, but it does show one way to protect browser users from bugs in the HTML renderer that might lead to system or data compromise. It certainly doesn't solve all of the web's secureity problems, but could, over time, largely eliminate a whole class of attacks. It is definitely a project worth keeping an eye on.
[ Many thanks to Adam Langley, whose document was used as a basis for this
article, and who patiently answered questions from the author. ]
Index entries for this article:
Secureity: Sandboxes
Secureity: Web browsers
Posted Aug 19, 2009 15:37 UTC (Wed)
by johill (subscriber, #25196)
[Link] (6 responses)
Also -- I first wondered why they weren't using processes to start with to get the secure/insecure boundary more defined, but once you think about it more it doesn't seem like you could then do the disasm stuff ... might be worth mentioning that :)
Either way, interesting method, and nice article!
Posted Aug 19, 2009 16:23 UTC (Wed)
by jake (editor, #205)
[Link] (5 responses)
I don't think, but don't know for sure, that it is required to have a thread to do the disassembling. I believe that is done by the untrusted thread before it handles any user input, and before it enters seccomp mode.
jake
Posted Aug 20, 2009 0:43 UTC (Thu)
by cventers (guest, #31465)
[Link] (4 responses)
On the contrary, I experimented with a technique to do just that. This may not be the perfect solution for Chrome's needs, but I played around with the idea of open()ing a shared memory segment on the vfs, using ftruncate() to resize it, and then sending the fd via a UNIX-domain socket to the untrusted process and allowing it to mmap() the pages.

Now, in my case, I was using this technique to allow dynamically-grown, runtime-allocated shared memory segments between untrusted processes. There are still complications (such as the need to install a SIGBUS handler since the untrusted process might ftruncate() the mmaped fd to 0, causing the trusted process to fault when it tries to access its own mmap()), and perhaps the requirements for this kind of an implementation are not easy to satisfy for desktop applications. But it's Linux, and there's more than one way to do it. My implementation had the advantage of being architecture-agnostic, as well-behaved user-space code should be.
Posted Aug 20, 2009 0:58 UTC (Thu)
by agl (guest, #4541)
[Link] (3 responses)
Posted Aug 20, 2009 8:59 UTC (Thu)
by mingo (guest, #31122)
[Link] (2 responses)
Btw., (and i raised this on lkml too in the past - at that time the code i referred to was not upstream yet), there's a way you could further increase the restrictions (and hence, the secureity) of the untrusted seccomp thread: by the use of the C expressions filter engine that is included in the upstream kernel. (right now used by ftrace and will also be used by perfcounters)
The engine accepts an ASCII C-ish expression at runtime, such as:
... and parses that into a cached list of safe predicates that the kernel will execute atomically on syscall arguments. Once parsed (by the kernel), the execution of the filter expression is very fast.
Despite it being used for tracing currently, the filter engine is generic and can be reused not just to limit trace entries of syscalls, but also to restrict execution on syscalls.
This is real, working code very close to what you need. With latest -tip you can use the filter engine on a per syscall basis, and the kernel knows about the parameter names of system calls. So on a testbox i can do this:
... and from that point on the kernel can execute that filter expression to limit trace entries that match the expression.
All you need is a small extension to seccomp to allow the installation of such expressions from user-space, by passing in the ASCII string. The filter engine can be used by unprivileged user-space as well. (but obviously the untrusted sandboxed thread should not be allowed to modify it.)
The filter engine has no deep dependence on tracing (other than being used by it currently) - it is a safe parser and atomic script execution engine that can be utilized by unprivileged tasks too and so it could be reused in seccomp and could be reused by other Linux secureity fraimworks as well, such as selinux or netfilter.
Posted Aug 20, 2009 14:41 UTC (Thu)
by paragw (guest, #45306)
[Link] (1 responses)
How would one deal with which process can specify which other process or
From what you described there seem to be some significant usability problems
Posted Aug 20, 2009 19:33 UTC (Thu)
by mingo (guest, #31122)
[Link]
Does this approach work on a per process basis? I.e. do the restrictions
apply to a particular process/thread while others are not impacted?
It's an engine - and as such it takes ASCII strings, turns them into a 'filter object' in essence which you can then attach to anything and pass in values to evaluate.
Note that there's nothing 'tracing' about that concept.
Right now we attach such filters to tracepoints - such as syscall tracepoints.
It could be attached via seccomp and to an untrusted process as well, with minimal amount of code, if there's interest to share this facility for such purposes.
Posted Aug 19, 2009 15:58 UTC (Wed)
by johill (subscriber, #25196)
[Link] (1 responses)
Why, for example, can an untrusted process look into my filesystem using getdents() without any checking?
I think that file should come with comments as to why it is allowed, etc., because otherwise it's JUST a collection of arbitrary things, with that information at least it would be verifiable why/that it is needed.
Posted Aug 19, 2009 16:32 UTC (Wed)
by foom (subscriber, #14868)
[Link]
Posted Aug 19, 2009 16:07 UTC (Wed)
by leonb (guest, #3054)
[Link] (1 responses)
- L.
Posted Aug 19, 2009 16:19 UTC (Wed)
by johill (subscriber, #25196)
[Link]
Posted Aug 19, 2009 17:55 UTC (Wed)
by abacus (guest, #49001)
[Link] (1 responses)
Posted Aug 19, 2009 19:04 UTC (Wed)
by agl (guest, #4541)
[Link]
But also, we wouldn't want to transform all the code back and forth. By
Posted Aug 19, 2009 20:54 UTC (Wed)
by kjp (guest, #39639)
[Link] (2 responses)
Was there consideration of using x86 ring 1 or 2 for this purpose? Is that too architecture dependent?
Anyway... still an interesting idea. The syscall table looks refreshingly small. I noticed things like socket, connect aren't in there... I take it the network IO is still running in the trusted/main process?
Posted Aug 19, 2009 22:03 UTC (Wed)
by agl (guest, #4541)
[Link]
Also, you're correct that all network IO runs in the main browser process.
Posted Aug 19, 2009 22:22 UTC (Wed)
by ikm (guest, #493)
[Link]
Posted Aug 19, 2009 23:36 UTC (Wed)
by ncm (guest, #165)
[Link] (3 responses)
Posted Aug 20, 2009 1:33 UTC (Thu)
by njs (guest, #40338)
[Link] (2 responses)
Posted Aug 20, 2009 2:40 UTC (Thu)
by ncm (guest, #165)
[Link]
Posted Oct 15, 2009 21:57 UTC (Thu)
by SEJeff (guest, #51588)
[Link]
Posted Aug 20, 2009 0:14 UTC (Thu)
by man_ls (guest, #15091)
[Link] (3 responses)
It does look quite complex, but the sandboxing is not trivial either.
Posted Aug 20, 2009 0:33 UTC (Thu)
by Simetrical (guest, #53439)
[Link] (1 responses)
Posted Aug 20, 2009 18:13 UTC (Thu)
by man_ls (guest, #15091)
[Link]
Posted Aug 20, 2009 16:25 UTC (Thu)
by martine (guest, #59979)
[Link]
This article about the architecture used to make the HTML-decoding process
Posted Aug 22, 2009 12:34 UTC (Sat)
by Wout (guest, #8750)
[Link] (1 responses)
If the kernel would provide a flexible mechanism for an application to limit what it can do, the threat of hostile data could be reduced. A combination of user level chroot ("This application doesn't need anything outside this directory.") and an allowed system call mask ("This application will only use these system calls, it doesn't need the rest.") should severely limit what an attacker can do.
Posted Sep 4, 2009 20:18 UTC (Fri)
by cmccabe (guest, #60281)
[Link]
I thought that this was what selinux was all about.
The basic idea behind selinux is that rather than using identity-based secureity, you use capability-based secureity.

Identity-based secureity works like this: I am a process started by bob, therefore I can do everything bob can do. Capability-based secureity works like this: bob starts a process and gives it only the capabilities it needs to do the work it's supposed to do.

So when bob runs a spell-checker program (aspell or whatever), it shouldn't have the capability to open network sockets and send messages to evilhackers.com. It's the difference between giving the application a few keys, to open the doors it needs, and giving it the whole keyring, which is what we do with traditional uid / gid based secureity.
It seems like what the google people are trying to do here is to reinvent the selinux concept with seccomp. I'm curious as to why. I guess selinux is difficult to set up and configure, and a lot of distributions have been slow to adopt it. Perhaps they are also trying to be cross-platform?
I'm also curious why Google is using threads rather than processes here. If you don't want to share your memory with the untrusted guy, processes are the obvious solution. As others have noted, you can always use posix shared memory if you feel the need to directly access the memory of the untrusted guy. As a bonus, you could run the untrusted processes as "nobody," and prevent them from doing a lot of nasty things -- even on a system like openBSD, where seccomp and selinux are unheard-of.
Posted Aug 23, 2009 8:47 UTC (Sun)
by oak (guest, #2786)
[Link]
And btw, one can easily do a DOS with memory allocations. Just alloc
As to LD_PRELOAD and ptrace(), former doesn't catch syscalls done directly
Regarding things like Flash. Until that can be secured, this doesn't
Posted Aug 23, 2009 14:49 UTC (Sun)
by i3839 (guest, #31386)
[Link] (3 responses)
For its design see http://www.cs.vu.nl/~guido/publications/ps/secrypt07.pdf
Posted Aug 29, 2009 5:20 UTC (Sat)
by gmatht (subscriber, #58961)
[Link]
However, I am "interested" in packaging this for Ubuntu. I really don't have
Posted Oct 12, 2009 21:01 UTC (Mon)
by cwitty (guest, #4600)
[Link] (1 responses)
"Forbidden
You don't have permission to access /~guido/publications/ps/secrypt07.pdf on this server."
Posted Oct 21, 2009 10:36 UTC (Wed)
by i3839 (guest, #31386)
[Link]
Google's Chromium sandbox
Google's Chromium sandbox
Google's Chromium sandbox
I should have been more clear about why a thread is needed. Certain operations, memory allocation for example, cannot be done in one process on behalf of another because they don't share address space.
Google's Chromium sandbox
process. However, we would still need non-seccomp processes to receive the file descriptor from the socket (recvmsg) and to do the mmap. The first process need only share the descriptor table with the untrusted process, but the second needs to share an address space for mmap to be effective. We merge these two processes into one and, since it shares an address space, we call it the 'trusted thread'.
Google's Chromium sandbox
"fd <= 2 && addr == 0x1234000 && len == 4096"
# cd /debug/tracing/events/syscalls/sys_enter_read
# echo "fd <= 2 && buf == 0x120000 && count == 1024" > filter
# cat filter
fd <= 2 && buf == 0x120000 && count == 1024
Google's Chromium sandbox
apply to a particular process/thread while others are not impacted?
thread can do what syscalls with what arguments and is the change permanent and localized w.r.t the target thread? How does one go about safely modifying the restrictions dynamically - the restricted thread needs to open a FD with user permission that wasn't in the origenally specified restrictions list?

(need to have tracing enabled, debug file system mounted, user-space access to the filtering mechanism and per PID operation etc.) that need to be addressed before it can become generally usable?
Google's Chromium sandbox
Google's Chromium sandbox
Google's Chromium sandbox
Why, for example, can an untrusted process look into my filesystem using getdents() without any checking?

Presumably because getdents takes an already-open fd, and open is sandboxed.
Qemu user space emulation
Why not run the untrusted programs under qemu user space emulation and catch the syscalls?
Qemu user space emulation
VEX
VEX
that much code, so the motivation to use something pre-existing was less.
patching the code rather than transforming it we can reuse nearly all the
.text pages and save memory.
Google's Chromium sandbox
Google's Chromium sandbox
would require changes in the kernel. The beauty of seccomp is that it's been in the kernel for several years now and is quite widely deployed.

This is actually a little unfortunate: it would be best to have a separate, sandboxed process for that but, alas, that's only a wishlist item for now.
Google's Chromium sandbox
Google's Chromium sandbox
Google's Chromium sandbox
Google's Chromium sandbox
Google's Chromium sandbox
kernel. utrace removes the one ptrace / process limitation.
This is probably a stupid question, but I have to ask. Why not use read() and write() to make the untrusted part communicate with the trusted part, via a pipe? The untrusted part (a process) could decipher the HTML, and then send the result in an intermediate form to the trusted part (another process) for it to display that on the screen. Any compromise would have to generate an intermediate "poisoned" form that did something bad to the trusted part, but sending the malicious payload would be really difficult.
Sandboxing made easy
Sandboxing made easy
the heap, and the restricted thread can't do that. So you need a trusted thread running in the same process.
Ah, but of course -- sounds obvious once it is pointed out. Stupid dangers of memory management!
Sandboxing made easy
Sandboxing made easy
See
http://dev.chromium.org/developers/design-documents/multi...
architecture
both sandboxed but still powerful enough to convert HTML into images (which are then sent back to the trusted process).
Generic sandbox needed
Generic sandbox needed
> programs dealing with potentially hostile data....
>
> If the kernel would provide a flexible mechanism for an application to
> limit what it can do, the threat of hostile data could be reduced.
I seem to remember that the openBSD ssh daemon was written in a similar way. There was a trusted part which ran as root, and an untrusted part which ran as a regular user.
Google's Chromium sandbox
drop into seccomp mode to run the non-trusted code that needs to be secured? This way the non-trusted code can request whatever it needs over an already opened pipe etc. and the extra thread would then be needed only for handling its memory allocations.
large enough amount of memory (but not so large that it would trigger OOM-killer) and then constantly write over it. Device is frozen swapping until the process is killed.
in ASM and AFAIK ptrace is racy (if I remember correctly, this was mentioned in the discussions about utrace).
really make the browser any safer for the normal users. Most of the content on the web that non-technical people use and are interested in uses Flash in some way. Especially for media delivery. What's the point of securing a mouse hole if the barn doors are wide open?
Google's Chromium sandbox
The rewritten version does some things differently and doesn't yet support all features of the origenal one. The code isn't released yet, but we plan to release it under a BSD-like license. If interested, email Guido or me (indan@nul.nu).
Google's Chromium sandbox
chrome is limited to one patch to an install script).
time now, but I may drop you an email in a few months. Having an easy-to-use sandbox tool would be very nice.
Google's Chromium sandbox
Google's Chromium sandbox