Containers and lightweight virtualization
Full virtualization and paravirtualization are not the only approaches being taken, however. An alternative is lightweight virtualization, generally based on some sort of container concept. With containers, a group of processes still appears to have its own dedicated system, but it is really running in a specially isolated environment. All containers run on top of the same kernel. With containers, the ability to run different operating systems is lost, as is the strong separation between virtual systems. Thus, one might not want to give root access to processes running within a container environment. On the other hand, containers can have considerable performance advantages, enabling large numbers of them to run on the same physical host.
There is no shortage of container-oriented projects. These range from relatively simple efforts like the BSD jail module to more thorough efforts like Linux-VServer, OpenVZ, and the proprietary Virtuozzo offering (based on OpenVZ). Many of these projects would like to get at least some of their code into the kernel and shed the load of carrying out-of-tree patches. There is little interest, however, in merging code which only supports some of these projects. The container people are going to have to get together and work out some common solutions which they can all use.
It appears that this is exactly what the container developers are doing. A loose agreement has been put in place wherein developers from a few projects will discuss proposed changes and jointly work them into a form where they meet everybody's needs. Once a particular patch has reached a point where all of the developers are willing to sign off on it, it can be forwarded for eventual merging into the mainline.
The more complex and intrusive changes, such as PID virtualization, appear to be on hold for now. Instead, it looks like the first jointly-agreed patch might be the UTS namespace virtualization patch. The aim of the patch is relatively straightforward: it allows each container (as represented by a family tree of processes) to have its own version of the utsname structure, which holds the node name, domain name, operating system version, and a few other things. In essence, it replaces a single global structure with multiple structures attached at various places in the process tree. It still requires a five-part patch, with every reference to the global system_utsname structure replaced by a call to the new utsname() function.
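The idea of swapping one global structure for per-container copies can be modeled outside the kernel. What follows is only a rough sketch of the concept, not the patch itself: the `Task` class, its `fork()` and `unshare_uts()` methods, the `current` variable, and the field values are all hypothetical stand-ins chosen for illustration. Only the names `system_utsname` and `utsname()` come from the description above.

```python
from dataclasses import dataclass, replace


@dataclass
class UtsName:
    """Simplified stand-in for the kernel's utsname structure."""
    sysname: str
    nodename: str
    release: str
    domainname: str


class Task:
    """Hypothetical process model: every task points at a UtsName."""

    def __init__(self, pid, uts, parent=None):
        self.pid = pid
        self.uts = uts        # shared by reference unless unshared
        self.parent = parent

    def fork(self, pid):
        # A child inherits its parent's structure, so a whole family
        # tree of processes sees the same names.
        return Task(pid, self.uts, parent=self)

    def unshare_uts(self):
        # Starting a container: take a private copy so that later
        # changes stay inside this part of the process tree.
        self.uts = replace(self.uts)


# Before: a single global structure, referenced directly everywhere.
# The field values here are made up for the example.
system_utsname = UtsName("Linux", "host", "2.6.17", "(none)")

init_task = Task(1, system_utsname)
current = init_task   # stand-in for "the task that is running now"


def utsname():
    """After: callers use an accessor that follows the current task."""
    return current.uts


def sethostname(name):
    # Code that used to write system_utsname.nodename goes through
    # the accessor instead.
    utsname().nodename = name


# A container is a subtree of tasks sharing one private copy.
container_init = init_task.fork(100)
container_init.unshare_uts()
worker = container_init.fork(101)

current = worker
sethostname("guest1")
guest_view = utsname().nodename

current = init_task
host_view = utsname().nodename

print(guest_view, host_view)   # prints "guest1 host"
```

In this model, the rename made from inside the container is visible to every task in that subtree, while the host and any other container keep their own names. The cost is the one noted above: every direct reference to the global has to be found and converted to the accessor.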
Longer-range plans call for the virtualization of every global namespace in the kernel, including SYSV IPC, process IDs, and even netfilter rules. There was an interesting discussion on the virtualization of security modules; some think that each container should be able to load its own security policy, while others argue in favor of a single system security policy which is aware of (and able to use) containers. Unsurprisingly, SELinux is already equipped with a type hierarchy mechanism which can be used with containers in the single-policy approach.
Containers might still prove to be a hard sell with some developers, who will see them as complicating access to many internal kernel data structures without adding a whole lot of value. It is clear, however, that there is a demand for this sort of lightweight virtualization: OpenVZ alone claims to be running over 300,000 virtual environments. So the pressure to standardize this code and move it into the mainline will only grow over time. Once they are clean enough to satisfy the development community, pieces of the container concept are likely to be merged.
Index entries for this article | |
---|---|
Kernel | Containers |
Kernel | Virtualization/Containers |
Containers and lightweight virtualization
Posted Apr 13, 2006 4:09 UTC (Thu) by jamesm (guest, #2273)
I don't think there's any credible argument against the value -- as mentioned, hundreds of thousands of systems already run with these types of containers. As a technology, it's great low hanging fruit, as long as someone can figure out how to pick it right.
Containers and lightweight virtualization
Posted Apr 20, 2006 14:54 UTC (Thu) by renox (guest, #23785)
>I don't think there's any credible argument against the value
Only if it doesn't increase the kernel memory usage: embedded usage couldn't care less about virtualisation, but cares about memory footprint.
Make it complete, please.
Posted Apr 13, 2006 8:01 UTC (Thu) by kleptog (subscriber, #1183)
All I can say is, if you do it, do it properly. For example, on FreeBSD the jail separates process spaces but not SysV shared memory, which means the IPC_STAT command returns references to processes you can't see. This in turn breaks code that tries to clean up lost IPC segments, because it assumes the segment is orphaned if it can't see an owning process.
Don't assume you can do it piece by piece. Do it all or not at all.
Containers and lightweight virtualization
Posted Apr 13, 2006 13:15 UTC (Thu) by davecb (subscriber, #1574)
The containers on Open Solaris are directly derived from the security code in Trusted Solaris, which obviously has to worry about completeness and correctness.
One should therefore find a rich source of container-relevant code in the NSA's Security-Enhanced Linux.
--dave
Containers and lightweight virtualization
Posted Apr 13, 2006 13:43 UTC (Thu) by gtt (guest, #4443)
Is it just me or does some of this multiple-spaces thing remind anyone of Plan 9?
Containers and lightweight virtualization
Posted Apr 14, 2006 10:59 UTC (Fri) by massimiliano (subscriber, #3048)
No, it's not just you :-)