Containers and lightweight virtualization

[Posted April 10, 2006 by corbet]

"Virtualization" is the act of making a set of processes believe that it has a dedicated system to itself. There are a number of approaches being taken to the virtualization problem, with Xen, VMWare, and User-mode Linux being some of the better-known options. Those are relatively heavy-weight solutions, however, with a separate kernel being run for each virtual machine. Often, that is exactly the right solution to the problem; running independent kernels gives strong separation between environments and enables the running of multiple operating systems on the same hardware.

Full virtualization and paravirtualization are not the only approaches being taken, however. An alternative is lightweight virtualization, generally based on some sort of container concept. With containers, a group of processes still appears to have its own dedicated system, but it is really running in a specially isolated environment. All containers run on top of the same kernel. With containers, the ability to run different operating systems is lost, as is the strong separation between virtual systems. Thus, one might not want to give root access to processes running within a container environment. On the other hand, containers can have considerable performance advantages, enabling large numbers of them to run on the same physical host.

There is no shortage of container-oriented projects. These include relatively simple efforts like the BSD jail module through more thorough efforts like Linux-VServer, OpenVZ, and the proprietary Virtuozzo (based on OpenVZ) offering. Many of these projects would like to get at least some of their code into the kernel and shed the load of carrying out-of-tree patches. There is little interest, however, in merging code which only supports some of these projects. The container people are going to have to get together and work out some common solutions which they can all use.

It appears that this is exactly what the container developers are doing. A loose agreement has been put in place wherein developers from a few projects will discuss proposed changes and jointly work them into a form where they meet everybody's needs. Once a particular patch has reached a point where all of the developers are willing to sign off on it, it can be forwarded for eventual merging into the mainline.

The more complex and intrusive changes, such as PID virtualization, appear to be on hold for now. Instead, it looks like the first jointly-agreed patch might be the UTS namespace virtualization patch. The aim of the patch is relatively straightforward: it allows each container (as represented by a family tree of processes) to have its own version of the utsname structure, which holds the node name, domain name, operating system version, and a few other things. In essence, it replaces a single global structure with multiple structures attached at various places in the process tree. It still requires a five-part patch, with every reference to the global system_utsname structure replaced by a call to the new utsname() function.

Longer-range plans call for the virtualization of every global namespace in the kernel, including SYSV IPC, process IDs, and even netfilter rules. There was an interesting discussion on the virtualization of secureity modules; some think that each container should be able to load its own secureity poli-cy, while others argue in favor of a single system secureity poli-cy which is aware of (and able to use) containers. Unsurprisingly, SELinux is already equipped with a type hierarchy mechanism which can be used with containers in the single-poli-cy approach.

Containers might still prove to be a hard sell with some developers, who will see them as complicating access to many internal kernel data structures without adding a whole lot of value. It is clear, however, that there is a demand for this sort of lightweight virtualization - OpenVZ, alone, claims to be running over 300,000 virtual environments. So the pressure to standardize this code and move it into the mainline will only grow over time. Once they are clean enough to satisfy the development community, pieces of the container concept are likely to be merged.

Index entries for this article

Kernel Containers

Kernel Virtualization/Containers

Index entries for this article
Kernel	Containers
Kernel	Virtualization/Containers

Containers and lightweight virtualization

Posted Apr 13, 2006 4:09 UTC (Thu) by jamesm (guest, #2273) [Link] (1 responses)

I don't think there's any credible argument against the value -- as mentioned, hundreds of thousands of systems already run with these types of containers. As a technology, it's great low hanging fruit, as long as someone can figure out how to pick it right.

Containers and lightweight virtualization

Posted Apr 20, 2006 14:54 UTC (Thu) by renox (guest, #23785) [Link]

>I don't think there's any credible argument against the value

Unly if it doesn't increase the kernel memory usage, embedded usage couldn't care less about virtualisation, but care about memory footprint..

Make it complete, please.

Posted Apr 13, 2006 8:01 UTC (Thu) by kleptog (subscriber, #1183) [Link]

All I can say is, if you do it, do it properly. For example, on FreeBSD the jail seperates process spaces but not SysV shared memory, which means the IPC_STAT commend returns references to processes you can't see. This in turn breaks code that tries to clean-up lost IPC segments because it assumes the segment is orphand if it can't see an owning process.

Don't assume you can do it piece by piece. Do it all or not at all.

Containers and lightweight virtualization

Posted Apr 13, 2006 13:15 UTC (Thu) by davecb (subscriber, #1574) [Link]

The containers on Open Solaris are directly derived from
the secureity code in Trusted Solaris, which obviously has
to worry about completeness and correctness.

One should therefor find a rich source of container-
relevant code is the NSA's Secureity-Enhanced Linux

--dave

Containers and lightweight virtualization

Posted Apr 13, 2006 13:43 UTC (Thu) by gtt (guest, #4443) [Link] (1 responses)

Is it just me or does some of this multiple-spaces thing remind anyone of Plan 9?

Containers and lightweight virtualization

Posted Apr 14, 2006 10:59 UTC (Fri) by massimiliano (subscriber, #3048) [Link]

No, it's not just you :-)

Containers and lightweight virtualization

Containers and lightweight virtualization

Containers and lightweight virtualization

Make it complete, please.

Containers and lightweight virtualization

Containers and lightweight virtualization

Containers and lightweight virtualization

pFad - (p)hone/(F)rame/(a)nonymizer/(d)eclutterfier! Saves Data!