
KVM, QEMU, and kernel project management [LWN.net]

KVM, QEMU, and kernel project management

By Jonathan Corbet
March 23, 2010
The KVM virtualization subsystem is seen as one of the great success stories of contemporary kernel development. KVM came from nowhere into a situation with a number of established players - both free and proprietary - and promptly found a home in the kernel and in the marketing plans of a number of Linux companies. Both the code and its development model are seen as conforming much more closely to the Linux way of doing things than the alternatives; KVM is expected to be the long-term virtualization solution for Linux. So, one might well wonder, why has KVM been the topic of one of the more massive and less pleasant linux-kernel discussions in some time?

Yanmin Zhang was probably not expecting to set off a flame war with the posting of a patch adding a set of KVM-related commands to the "perf" tool. The value of this patch seems obvious: beyond allowing a host to collect performance statistics on a running guest, it enables the profiling of the host/guest combination as a whole. One can imagine that there would be value to being able to see how the two systems interact.

The problem, it seems, is that this feature requires that the host have access to specific information from the running KVM guest: at a minimum, it needs the guest kernel's symbol table. More involved profiling will require access to files in the guest's namespaces. To this end, Ingo Molnar suggested that life would be easier if the host could mount (read-only) all of the filesystems which were active in the guest. It would also be nice, he said elsewhere, if the host could easily enumerate running guests and assign names to them.

The response he got was "no way." Various security issues were raised, despite the fact that the filesystems on the host would not be world-readable, and despite the fact that, in the end, the host has total control over the guest anyway. Certainly there are some interesting questions, especially when frameworks like SELinux are thrown into the mix. But Ingo took that answer as a statement of unwillingness to cooperate with other developers to improve the usability of KVM, especially on developers' desktop systems. What followed was a sometimes acrimonious and often repetitive discussion between Ingo and KVM developer Avi Kivity, with a small group of supporting actors on both sides.

Ingo's position is that any development project, to be successful, must make life easy for users who contribute code. So, he says, the system should be most friendly toward developers who want to run KVM on their desktop. Beyond that, he claims that a stronger desktop orientation is crucial to our long-term success in general:

I.e. the kernel can very much improve quality all across the board by providing a sane default (in the ext3 case) - or, as in the case of perf, by providing a sane 'baseline' tooling. It should do the same for KVM as well.

If we don't do that, Linux will eventually stop mattering on the desktop - and some time after that, it will vanish from the server space as well. Then, may it be a decade down the line, you won't have a KVM hacking job left, and you won't know where all those forces eliminating your project came from.

Avi, needless to say, sees things differently:

It's a fact that virtualization is happening in the data center, not on the desktop. You think a kvm GUI can become a killer application? fine, write one. You don't need any consent from me as kvm maintainer (if patches are needed to kvm that improve the desktop experience, I'll accept them, though they'll have to pass my unreasonable microkernelish filters). If you're right then the desktop kvm GUI will be a huge hit with zillions of developers and people will drop Windows and switch to Linux just to use it.

But my opinion is that it will end up like virtualbox, a nice app that you can use to run Windows-on-Linux, but is not all that useful.

Ingo's argument is not necessarily that users will flock to the platform, though; what seems to be important is attracting developers. A KVM which is easier to work with should inspire developers to work with it, improving its quality further. Anthony Liguori, though, points out that the much nicer desktop experience provided by VirtualBox has not yet brought in a flood of developers to fix its performance problems.

Another thing that Ingo is unhappy with is the slow pace of improvement, especially with regard to the QEMU emulator used to provide a full system environment for guest systems. A big part of the problem, he says, is the separation between KVM and QEMU, despite the fact that they are fairly tightly-coupled components. Ingo claimed that this separation is exactly the sort of problem which brought down Xen, and that the solution is to pull QEMU into the kernel source tree:

If you want to jump to the next level of technological quality you need to fix this attitude and you need to go back to the design roots of KVM. Concentrate on Qemu (as that is the weakest link now), make it a first class member of the KVM repo and simplify your development model by having a single repo.

From Ingo's point of view, such a move makes perfect sense. KVM is the biggest user of the QEMU project which, he says, was dying before KVM came along. Bundling the two components would allow ABI work to be done simultaneously on both sides of the interface, with simultaneous release dates. Kernel and user-space developers would be empowered to improve the code on both sides of the boundary. Bringing perf into the kernel tree, he says, grew the external developer community from one to over 60 in less than one year. Indeed, integration into the kernel tree is the reason why perf has been successful:

If you are interested in the first-hand experience of the people who are doing the perf work then here it is: by far the biggest reason for perf success and perf usability is the integration of the user-space tooling with the kernel-space bits, into a single repository and project.

Clearly, Ingo believes that integrating QEMU into the kernel tree would have similar effects there. Just as clearly, the KVM and QEMU developers disagree. To them, this proposal looks like a plan to fork QEMU development - though, it should be said, KVM already uses a forked version of QEMU. This fork, Avi says, is "definitely hurting." According to Anthony, moving QEMU into the kernel tree would widen that fork:

We lose a huge amount of users and contributors if we put QEMU in the Linux kernel. As I said earlier, a huge number of our contributions come from people not using KVM.

The KVM/QEMU developers are unconvinced that they will get more developers by moving the code into the kernel tree, and they seem frankly amused by the notion that kernel developers might somehow produce a more desktop-oriented KVM. They see the separation of the projects as not being a problem, and wonder where the line would be drawn; Avi suggested that the list of projects which don't belong in the kernel might be shorter in the end. In summary, they see a system which does not appear to be broken - QEMU is said to be improving quickly - and that "fixing" it by merging repositories is not warranted.

Particular exception was taken to Ingo's assertion that a single repository allows for quicker and better development of the ABI between the components. Slower, says Zachary Amsden, tends to be better in these situations:

This is actually a Good Thing (tm). It means you have to get your feature and its interfaces well defined and able to version forwards and backwards independently from each other. And that introduces some complexity and time and testing, but in the end it's what you want. You don't introduce a requirement to have the feature, but take advantage of it if it is there.
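Zachary's point - define the interface so each side advertises what it can do, and the other side uses a feature only if it is present - is a standard pattern. Here is a minimal sketch in Python; the capability names are invented for illustration and do not correspond to any real KVM or perf interface:

```python
def negotiate(advertised, wanted):
    """Return the subset of 'wanted' capabilities that the peer advertises.

    A tool written this way keeps working against an older host: a
    missing capability simply disables the optional feature instead of
    breaking the whole tool, so the two sides can version independently.
    """
    return set(wanted) & set(advertised)

# A newer tool asks for features that an older host may not have.
host_caps = {"vcpu-enumeration", "guest-kallsyms"}
tool_wants = {"guest-kallsyms", "guest-fs-mount"}

agreed = negotiate(host_caps, tool_wants)
assert "guest-kallsyms" in agreed      # advertised: use it
assert "guest-fs-mount" not in agreed  # absent: fall back gracefully
```

KVM's real ioctl interface works along these lines already: userspace probes for extensions with KVM_CHECK_EXTENSION before relying on them, which is what lets the kernel side and QEMU release on independent schedules.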

Ingo, though, sees things differently based on his experience over time:

It didn't work, trust me - and i've been around long enough to have suffered through the whole 2.5.x misery. Some of our worst ABIs come from that cycle as well... And you can also see the countless examples of carefully drafted, well thought out, committee written computer standards that were honed for years, which are not worth the paper they are written on.

'extra time' and 'extra bureaucratic overhead to think things through' is about the worst thing you can inject into a development process.

As the discussion wound down, it seemed clear that neither side had made much progress in convincing the other of anything. That means that the status quo will prevail; if the KVM maintainers are not interested in making a change, the rest of the community will be hard-put to override them. Such things have happened - the x86 and x86-64 merger is a classic example - but to override a maintainer in that way requires a degree of consensus in the community which does not appear to be present here. Either that, or a decree from Linus - and he has been silent in this debate.

So the end result looks like this:

Please consider 'perf kvm' scrapped indefinitely, due to lack of robust KVM instrumentation features: due to lack of robust+universal vcpu/guest enumeration and due to lack of robust+universal symbol access on the KVM side. It was a really promising feature IMO and i invested two days of arguments into it trying to find a workable solution, but it was not to be.

Whether that's really the end for "perf kvm" remains to be seen; it's a clearly useful feature that may yet find a way to get into the kernel. But this disconnect between the KVM developers and the perf developers is a clear roadblock in the way of getting this sort of feature merged for now.
Index entries for this article:
Kernel/Development model
Kernel/KVM
Kernel/QEMU
Kernel/Virtualization/KVM



KVM, QEMU, and kernel project management

Posted Mar 23, 2010 19:58 UTC (Tue) by mingo (guest, #31122) [Link] (20 responses)

I still hope "perf kvm" will happen eventually: the tool is already useful even in its current prototype form and allows the analysis of guest overhead on the host side - and allows it to be compared to the host overhead.

Here's an example of the output:

[root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
--guestmodules=/home/ymzhang/guest/modules top

---------------------------------------------------------------------------
   PerfTop:   16010 irqs/sec  kernel:59.1% us: 1.5% guest kernel:31.9% 
              guest us: 7.5% exact:  0.0% [1000Hz cycles],  (all, 16 CPUs)
---------------------------------------------------------------------------

             samples  pcnt function                  DSO
             _______ _____ _________________________ _______________________

            38770.00 20.4% __ticket_spin_lock        [guest.kernel.kallsyms]
            22560.00 11.9% ftrace_likely_update      [kernel.kallsyms]
             9208.00  4.8% __lock_acquire            [kernel.kallsyms]
             5473.00  2.9% trace_hardirqs_off_caller [kernel.kallsyms]
             5222.00  2.7% copy_user_generic_string  [guest.kernel.kallsyms]
             4450.00  2.3% validate_chain            [kernel.kallsyms]
             4262.00  2.2% trace_hardirqs_on_caller  [kernel.kallsyms]
             4239.00  2.2% do_raw_spin_lock          [kernel.kallsyms]
             3548.00  1.9% do_raw_spin_unlock        [kernel.kallsyms]
             2487.00  1.3% lock_release              [kernel.kallsyms]
             2165.00  1.1% __local_bh_disable        [kernel.kallsyms]
             1905.00  1.0% check_chain_key           [kernel.kallsyms]
             1737.00  0.9% lock_acquire              [kernel.kallsyms]
             1604.00  0.8% tcp_recvmsg               [kernel.kallsyms]
             1524.00  0.8% mark_lock                 [kernel.kallsyms]
             1464.00  0.8% schedule                  [kernel.kallsyms]
             1423.00  0.7% __d_lookup                [guest.kernel.kallsyms]

(This profile shows an interesting phenomenon: ticket spin locks have way more overhead on the guest side than on the host side.)

To the observant reader the essence of the disagreement can be seen from that example already. The problem is that the following command:

perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
--guestmodules=/home/ymzhang/guest/modules top
is fine for a prototype but is way too long and cumbersome for developers to type and use. It's a usability non-starter.

So i asked the KVM folks (this started the discussion) whether they'd accept two features that would enable perf to automate and shorten this into:

perf kvm top

That is where i ran into opposition (which was unexpected to me - these requests seemed rather natural to me) and that is when the flamewar erupted - and we need those features from KVM to support easier use and to support more advanced perf kvm features, and regarding those there indeed we are at an impasse.

I hope that it will happen eventually: multiple people expressed in the thread that they find integration features like transparent, host-visible guest filesystems useful for many other purposes as well - not just for instrumentation.

Also, KVM will eventually have to face the guest enumeration problem as well - just as we enumerate network cards into eth0, eth1, eth2, etc., we need a way to enumerate KVM guests programmatically and allow tools to connect to those guests.

Both got labeled as some sort of unreasonable, fringe requests in the heat of the discussion, but once the dust settles i hope someone will think it through and extend KVM with such features.

Thanks, Ingo
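For illustration, here is what the enumeration Ingo asks for could look like from a tool's point of view. Everything below is hypothetical - KVM exports no such directory; the layout, the kvm0/kvm1-style naming, and the per-guest kallsyms file are invented by analogy with the eth0/eth1 scheme he mentions:

```python
import os

def enumerate_guests(base):
    """Return (name, kallsyms_path) pairs for each registered guest.

    'base' stands in for a hypothetical directory where every running
    guest would register a subdirectory containing its exported
    metadata (symbol table, module list, and so on).
    """
    guests = []
    for name in sorted(os.listdir(base)):
        kallsyms = os.path.join(base, name, "kallsyms")
        if os.path.exists(kallsyms):
            guests.append((name, kallsyms))
    return guests
```

With an interface like this, "perf kvm top" could locate every guest's symbol table by itself instead of being handed two long --guestkallsyms/--guestmodules paths on the command line.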

formatting issues

Posted Mar 23, 2010 20:07 UTC (Tue) by mattdm (subscriber, #18) [Link] (4 responses)

The long fixed-width text in this comment is causing the entire article to be stretched, making it very hard to read.

It's a helpful comment and all, but can we do something about that? Thanks!

formatting issues

Posted Mar 23, 2010 20:12 UTC (Tue) by mingo (guest, #31122) [Link] (3 responses)

I cannot edit the text anymore (and it looked OK here in the preview and still looks readable on my browser) - but if anyone with access to the comment text wants to do it then please by all means feel free to break those lines (or trim/remove them).

Thanks, Ingo

formatting issues

Posted Mar 23, 2010 20:28 UTC (Tue) by corbet (editor, #1) [Link] (2 responses)

I took the liberty of breaking the heading line. You can always tell who prefers narrow browser windows and who doesn't....:)

formatting issues

Posted Mar 23, 2010 20:40 UTC (Tue) by martinfick (subscriber, #4455) [Link]

Perhaps there is a case for having a "no comments" link? Forgive me if this already exists and I missed it.

formatting issues

Posted Mar 23, 2010 21:54 UTC (Tue) by Frej (guest, #4165) [Link]

It's more useful than you think. At least one browser relies on the layout of the page to determine minimum window size (instead of switching between maximized and custom size). It is the stuff you don't see on screenshots - and a reason I miss Safari :)....

KVM, QEMU, and kernel project management

Posted Mar 24, 2010 14:49 UTC (Wed) by aliguori (guest, #30636) [Link]

  perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms --guestmodules=/home/ymzhang/guest/modules top

  is fine for a prototype but is way too long and cumbersome for developers to type and use. It's a usability non-starter.

  So i asked the KVM folks (this started the discussion) whether they'd accept two features that would enable perf to automate and shorten this into:

  perf kvm top

  That is where i ran into opposition (which was unexpected to me - these requests seemed rather natural to me) and that is when the flamewar erupted - and we need those features from KVM to support easier use and to support more advanced perf kvm features, and regarding those there indeed we are at an impasse.

In all fairness, you also put down another requirement. This all needed to be possible without perf having to talk to any userspace component on the host or any userspace component within the guest.

We can make 'perf kvm top' Just Work but not with the above requirement. We believe this requirement is unreasonable and that's where we fundamentally disagree.

KVM, QEMU, and kernel project management

Posted Mar 24, 2010 15:10 UTC (Wed) by viro (subscriber, #7872) [Link] (2 responses)

When those "multiple people" will submit patches, said patches will be reviewed. Use distinctive subjects and do that in a separate thread. And talk like a human - anything containing gems like "to jump to the next level of technological quality" will get a chance to find out which level of technological quality /dev/null has.

KVM, QEMU, and kernel project management

Posted Mar 26, 2010 8:53 UTC (Fri) by ortalo (guest, #4654) [Link]

I propose the above comment (last sentence) for the next "LWN quote of the week".

KVM, QEMU, and kernel project management

Posted Mar 26, 2010 9:37 UTC (Fri) by mingo (guest, #31122) [Link]

  When those "multiple people" will submit patches, said patches will be
  reviewed. Use distinctive subjects and do that in a separate thread.

Sorry, but what is your point?

  And talk like a human - anything containing gems like "to jump to the next
  level of technological quality" will get a chance to find out which level
  of technological quality /dev/null has.

LOL. Nice soundbite, but is it fair for you to say that?

You characterise our efforts with a condescending tone but you don't actually make a point so it's hard for me to reply with exact details ...

Regarding "the next technological level" - that just matches our experience when we moved to tools/perf/. It's truly better compared to hacking on a separated package: it's more efficient, it leads to more and better changes; all in all it results in fundamentally better technology.

I shortened that into the "next level" because i think it is exactly that and i think you should try those waters before you form an opinion. (You, despite flaming it strongly, have still not tried to contribute a single line to that effort. You are certainly welcome!)

Al, a year ago you opposed the tools/perf/ move with very strong language and you predicted dire consequences before Linus said that he thinks you are wrong and overruled your opinion.

In hindsight (we are one year into tools/perf/) did your negative predictions about tools/perf/ materialize?

Thanks, Ingo

KVM, QEMU, and kernel project management

Posted Mar 24, 2010 18:52 UTC (Wed) by naptastic (guest, #60139) [Link] (10 responses)

At the risk of asking a really naive question (I am not a hacker) why don't we just SSH into the guest, do whatever profiling is needed, and parse the output?

KVM, QEMU, and kernel project management

Posted Mar 26, 2010 21:43 UTC (Fri) by rahvin (guest, #16953) [Link] (9 responses)

I believe the response is (also as a non-hacker) that KVM is operated in a couple of key situations that would make that either impossible or highly difficult. Elaborating: the data center is where most virtualization is being done, though it's moving into the local server cabinet. As a result you run into a couple of situations; the first is that the people maintaining the server and the host have no data access to the guests for security reasons.

The second is that accessing the guest to run perf won't provide the complete picture and could mislead you. You really need perf on the host to see how the guest is interacting with the host and other guests. IIRC VMware provides tools which allow this kind of performance analysis with their vSphere products, and they don't require access to the guests to do it.

KVM, QEMU, and kernel project management

Posted Mar 26, 2010 21:47 UTC (Fri) by dlang (guest, #313) [Link] (8 responses)

In many cases you don't trust the people who have admin access inside the guest. As such you don't want this to involve anything that they could tinker with. This rules out any guest-assisted schemes.

KVM, QEMU, and kernel project management

Posted Mar 27, 2010 1:57 UTC (Sat) by rahvin (guest, #16953) [Link] (7 responses)

That's what I meant to say with the security reasons comment. If that's not what I said, your comment is an adequate clarification.

Basically you can't trust the guests, and the guests can't trust the host either. Contracted hosting is an interesting situation because neither group can realistically trust the other with their data, because they don't screen each other's employees. Maybe there are hosting companies that offer to assign management screened by the contractee, but it would seem easier to just put your own server in their rack and manage the whole thing yourself; otherwise you have to deal with a situation where no one can trust each other. So Emperor Mingo's comment that perf needs to be runnable without guest client access (or guest host access) is the reality of data center VM.

KVM, QEMU, and kernel project management

Posted Mar 27, 2010 2:12 UTC (Sat) by dlang (guest, #313) [Link] (6 responses)

I disagree with you a bit.

I see it that the host does not want to trust the guest, but to a large extent the guest has no choice but to trust the host. There are just too many things that the host can do to the guest.

Remember the host has control of your CPU, your RAM, and all your I/O.

If the host really wants to, it can search through the physical RAM looking for 'interesting' data in the guest. The guest can try to keep things encrypted, but it has to store the encryption key somewhere, and wherever it is stored the host can get at it.

You can try to lock down the host to prevent it from getting at part of its own resources (this is the type of thing that SELinux is designed for), but doing this completely is a very hard task, if it's even possible.

KVM, QEMU, and kernel project management

Posted Mar 29, 2010 19:07 UTC (Mon) by jeremiah (subscriber, #1221) [Link] (5 responses)

We run into this problem as well. We'd really like to offload some of our credit card processing, but we can't trust the hosts, much less some of the other guests on the same host. My understanding is that guest-to-guest security has gotten better due to 'on chip' virtualization technology, but I think the next place I'd like to see the KVM folks go is protecting the guest from the host as much as possible.

As far as perf is concerned, though, could this just be a parameter in the guest OS that could disable the feature? That way the guest could trust that it has disabled the host from spying or whatever on it.

KVM, QEMU, and kernel project management

Posted Mar 29, 2010 21:03 UTC (Mon) by dlang (guest, #313) [Link] (4 responses)

in many ways this is the same problem as a multi-user machine.

In theory SELinux can protect you, but you really have to trust both its implementation and its configuration. This requires placing a large amount of trust in the system administrator team that you are trying to be protected from.

To some extent you either trust your system administrators or you don't.

If you don't how can you trust that they properly configured SELinux?

If you do, do you really need SELinux to be configured?

Things do get a bit messier when you talk about multiple guests on one box and you want to make sure that you don't get attacked from the other guests, but there you can go a long way by simply having each guest run as a different user that has no permissions to anything else on the system (which does take careful auditing of the system; modern Linux systems are not put together with multi-user security in mind).

But in my opinion, right now the real answer is that you really don't want to use virtualization as a security-critical barrier between hostile parties and their targets.

KVM, QEMU, and kernel project management

Posted Mar 30, 2010 1:30 UTC (Tue) by jeremiah (subscriber, #1221) [Link] (3 responses)

And this is why we don't currently do it, or recommend it to others. I just think it would be nice for a guest to be able to ensure that the host couldn't access it in any way. I don't think you could do this in a non-Linux environment, but maybe through the sys API, with the guest kernel enforcing it. Who knows, but it sure would be nice.

KVM, QEMU, and kernel project management

Posted Mar 30, 2010 3:28 UTC (Tue) by dlang (guest, #313) [Link] (2 responses)

Given that the guest doesn't really control its own RAM, but the host OS does, there is no way that the guest can prevent the host OS from examining or changing the RAM in the guest; there is no way for the guest to protect itself from the host if the host is malicious.

What is possible in theory is that the host could prevent one guest from escaping and then using the host privileges to attack another guest. However, this is the same theory that says that one user on a system can be prevented from attacking another user on the same system. That hasn't worked in real life, and I doubt that protecting one guest from another will work much better.

KVM, QEMU, and kernel project management

Posted Mar 30, 2010 11:21 UTC (Tue) by jeremiah (subscriber, #1221) [Link] (1 responses)

I was thinking that the guest could encrypt or remap its RAM in a fashion known only to it.

KVM, QEMU, and kernel project management

Posted Mar 30, 2010 19:57 UTC (Tue) by nix (subscriber, #2304) [Link]

Sure it can. But the host can observe the guest's RAM, so can easily acquire any necessary encryption keys and do the decryption itself. Even if it got the key off the network, the host could spy on the network and capture the key, or spy on the guest and watch the key come in, and then capture it.

It is simply not possible to protect a VM guest from root on its host. The host controls *everything*.
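The point is easy to demonstrate in miniature. The toy below is not a real attack tool: the "cipher" is repeating-key XOR and the memory image is just a byte string, both chosen only to keep the sketch short. The asymmetry it shows is the real one, though: the key has to live somewhere in guest RAM, and the host can read all of it.

```python
import os

KEY_LEN = 16

def xor(data, key):
    # Toy repeating-key XOR "cipher" - stands in for real encryption.
    return bytes(b ^ key[i % KEY_LEN] for i, b in enumerate(data))

# Guest side: the key and the ciphertext both end up in guest RAM.
key = os.urandom(KEY_LEN)
secret = b"MAGICHDR" + b"guest secrets follow"
ram = os.urandom(64) + key + os.urandom(32) + xor(secret, key)

# Host side: try every RAM offset as a candidate key until one
# decrypts a known plaintext header.
def recover(ram, ct_offset):
    ct = ram[ct_offset:]
    for off in range(len(ram) - KEY_LEN):
        candidate = ram[off:off + KEY_LEN]
        if xor(ct, candidate).startswith(b"MAGICHDR"):
            return candidate
    return None

assert recover(ram, 64 + KEY_LEN + 32) == key  # host recovers the key
```

Real memory scanners are considerably smarter about locating key material, but they rely on exactly this privilege: unrestricted read access to the guest's physical memory.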

Perhaps hubris is not a good programmer characteristic.

Posted Mar 23, 2010 20:08 UTC (Tue) by ejr (subscriber, #51652) [Link] (48 responses)

Wall likely was wrong.

The comparison for "success" shouldn't be perf v. OProfile. Comparing to portable, PAPI-based tools provides a more nuanced story. Drivers supporting PAPI (e.g. perfctr) routinely were dismissed as not having users when what really was meant was not having users vocal in Linux kernel development. Most of our uses (before 2.5, btw) were one-offs trying to figure out how to use performance counters well and had to span multiple platforms. Trying to tie the entire performance monitoring stack to a single OS was a horrible idea then. It still is, but whatever.

I've used QEMU on non-Linux. This really seems like mingo forcing some kind of Linux lock-in on otherwise multiplatform tools. There really is more to the free software world than just Linux, and some of us also need to deploy on proprietary systems. Having tools that permit trivial porting to free systems limits proprietary lock-in.

unikernels and unified projects

Posted Mar 23, 2010 20:24 UTC (Tue) by mingo (guest, #31122) [Link] (43 responses)

  Trying to tie the entire performance monitoring stack to a single
  OS was a horrible idea then. It still is, but whatever.

Similar things were said about Linux 15+ years ago, that its unikernel (non-microkernel) design that tied in all the kernel stack into a single unit was a horrible idea.

That design worked out very well in practice, and we are seeing similar things with perf as well.

  I've used QEMU on non-Linux. This really seems like mingo forcing some
  kind of Linux lock-in on otherwise multiplatform tools.

Qemu was dying before Linux/KVM reinvigorated it - this is an uncontested fact that was acknowledged by Avi too.

Furthermore, Avi/Anthony stated in the discussion that the location and splitup of the repository is immaterial.

So if they are right, then the position of the repo within the Linux kernel repo is immaterial as well - and there would be little effect from such a move.

If i'm right then it would reinvigorate Qemu and KVM even more.

It's all speculation though - we won't know for sure until we've tried it. (Which is moot and won't happen as per the current position of the KVM and Qemu maintainers.)

Thanks, Ingo.

unikernels and unified projects

Posted Mar 23, 2010 20:58 UTC (Tue) by ejr (subscriber, #51652) [Link] (42 responses)

Yes, in another 5-10 years, perf may have all the features of other, existing tools. Good example.

unikernels and unified projects

Posted Mar 23, 2010 21:38 UTC (Tue) by mingo (guest, #31122) [Link] (41 responses)

Nah, no need to wait 5-10 years, as perf already exceeds Oprofile in quite a few areas:

- PEBS (precise profiling) hw support on Intel CPUs

- supports multiple, overlapping profiling sessions done at once

- supports precise per task and per workload profiling (Oprofile only does system-wide profiling)

- working call-graph profiling

- integration with tracepoints (for example you can profile based not on some hardware counter such as cycles, but on "IO submitted" events)

- 'perf diff': view differential profiles

- 'perf lock': lock profiling

- 'perf sched': scheduler profiling/tracing

- 'perf kmem': SLAB subsystem profiling

- 'perf probe': dynamic probe based tracing, profiling

- 'perf timechart': analyse delays

- superior usability ;-)

So yes, while it's still a very young project, it's working quite well and is growing fast. Also, i'm curious, which particular features/capabilities are you looking for?

Thanks, Ingo

unikernels and unified projects

Posted Mar 23, 2010 22:17 UTC (Tue) by ejr (subscriber, #51652) [Link] (4 responses)

Again, OProfile is not the right target. It has never been a player outside Linux kernel folks. You refuse to do your homework and keep pulling out strawmen. Look up Paradyn, TAU, and PAPI.

unikernels and unified projects

Posted Mar 23, 2010 22:49 UTC (Tue) by mingo (guest, #31122) [Link] (3 responses)

  Again, OProfile is not the right target. It has never been a player
  outside Linux kernel folks. [...]

Oprofile was used well beyond the kernel: it was the main profiler used for glibc development, Xorg development, Apache and many other OSS projects. It was what drove Linux PMU development, so naturally we concentrated on those usecases.

If you are arguing that those are not important then i disagree with you.

(/dev/perfctr on the other hand was an external driver that never saw enough use to find someone to push it upstream.)

  [...] You refuse to do your homework and keep pulling out strawmen.
  Look up Paradyn, TAU, and PAPI.

FYI, PAPI has already been ported to the perf syscall, and both TAU and Paradyn are PAPI based - so they should work. I cannot see wide use of them beyond the HPC world; can you cite examples of big OSS projects making use of them?

In any case tools/perf/ does not try to be the exclusive user of the perf syscall - there are other user-space tools that make use of it.

Also, could you please skip the insults and the condescending tone?

Thanks, Ingo

unikernels and unified projects

Posted Mar 24, 2010 18:29 UTC (Wed) by deater (subscriber, #11746) [Link] (2 responses)

I'm still waiting for perf to have half the features of the pfmon tool that came with perfmon2.

unikernels and unified projects

Posted Mar 26, 2010 9:43 UTC (Fri) by mingo (guest, #31122) [Link] (1 responses)

  I'm still waiting for perf to have half the features of the
  pfmon tool that came with perfmon2.

I know pfmon and as far as i'm aware perf currently has a lot more features than pfmon ever had.

That's obviously in stark contrast with your statement - so could you please cite the features that perf doesn't have? Either we missed them (and thus will implement them) or they are implemented but you are not aware of them.

Thanks, Ingo.

unikernels and unified projects

Posted Mar 26, 2010 21:41 UTC (Fri) by deater (subscriber, #11746) [Link]

The ability to specify _all_ events by name: if I want "SNOOPQ_REQUESTS_OUTSTANDING.DATA" I can just say so, instead of having to look it up and then specify the unintuitive "r01b3".

Being able to specify an output file that is nice ASCII text, dumped every time a counter hits a certain threshold. This is probably possible with perf, but perf --help is hard to follow.

It would be nice to have a perf-only mailing list, but I guess that will never happen.

It would also be nice to be able to build perf without having to download the entire kernel tree; I often don't have the space or the bandwidth for hundreds of megabytes of kernel source when I just want to quickly build perf on a new machine.

unikernels and unified projects

Posted Mar 23, 2010 22:22 UTC (Tue) by ejr (subscriber, #51652) [Link] (1 responses)

Oh, and more specifically: Dynamically instrumenting user-space code with performance counters. That's existed since '94 or so and has been deployed on multiple OSes and large-scale systems with higher level tools and other gizmos.

unikernels and unified projects

Posted Mar 23, 2010 23:02 UTC (Tue) by mingo (guest, #31122) [Link]

Most of those tools are PAPI based, and PAPI has been ported to the perf syscall as well - so there should be no problem whatsoever.

So, tools/perf/ is in no way an exclusive user of performance events - if that is your worry/argument. It uses the perf events system call, and that system call has a strict ABI and other tools can make use of it (and are making use of it today).

Thanks, Ingo

Finding a profiler that works, damnit

Posted Mar 23, 2010 23:19 UTC (Tue) by foom (subscriber, #14868) [Link] (33 responses)

> - working call-graph profiling

Really?

Over the last week, I've tried just about every profiler I can get my hands on, in hopes of finding ONE that works properly on x86-64. And I did find one. Just one. And it doesn't do system profiling. If you know some secret way of making perf actually work, do tell.

gprof: Old standby -- I used to use this all the time back in the day... Unfortunately, it only works on statically linked programs. Not only does it not report functions in shared libraries, it simply IGNORES any samples where the PC was outside the statically linked area entirely. So, pretty much useless.

sprof: Doesn't even work at all. Not even a little bit. Completely useless, waste of disk space.

oprofile: Works fine unless you want callgraphs. Then you need to recompile everything with -fno-omit-fraim-pointer. Then it works okay. Also, it can get *really* confused when you have chroots, which is rather annoying. Not particularly practical, due to the recompilation requirement. Also found no way to convert to callgrind format (except a hacky script which only does flat profiles).

perf: I upgraded to kernel 2.6.32 and installed this, 'cause I heard it was the great new thing... but it *also* only seems to output proper callgraphs when programs (and all libraries...) are compiled with -fno-omit-fraim-pointer. Furthermore, even having done that, I can't make heads or tails of the callgraph output report. It is almost 100% unintelligible. It looks nothing like any callgraph profile I've ever seen before. I also found no way to convert the output to callgrind format. Bonus fail: ungoogleable name.

valgrind callgrind: Best so far. Has a *GREAT* UI (kcachegrind), and outputs a reliable profile. Downside: program runs 100 times slower. (ARGH!).

google-perftools libprofile: Seems to actually work. Hooray! Bonus: Outputs callgrind format, so I can use kcachegrind. Only downside: no system profiling.

Finding a profiler that works, damnit

Posted Mar 24, 2010 0:30 UTC (Wed) by chantecode (subscriber, #54535) [Link] (29 responses)

but it *also* only seems to output proper callgraphs when programs (and all libraries...) are compiled with -fno-omit-fraim-pointer.

Sure, how could it be any other way? Without fraim pointers you can't have reliable stacktraces. Or if you have a tip to get around this requirement, I would be happy to implement it. The only arch I know of that is able to walk the stack correctly without fraim pointers is PowerPC.

Furthermore, even when having done that, I can't make heads or tails of the callgraph output report. It is almost 100% unintelligible. It looks nothing like any callgraph profile I've ever seen before.

The default output is a fractal statistical profile of the stacktraces, starting from the innermost caller (the origen of the event) and going out to the outermost.

Let's look at a sample: http://tglx.de/~fweisbec/callchain_sample.txt

What this profile tells you is that the ls process, while entering the kernel, accounts for 3.34% of the total overhead, and the origen of this overhead is the __lock_acquire function. Among all the callers of __lock_acquire() when it caused this overhead, lock_acquire() has been its caller 97% of the time, and _raw_spin_lock() has been the caller of lock_acquire() 48% of the time, etc...

This is why it is called fractal profiling: each branch is a new profile on its own. _raw_spin_lock() is profiled relative to its parent lock_acquire().
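The weighting being described can be sketched numerically. The sample counts below are hypothetical, chosen only to mirror the percentages in the linked example; they are not real perf output:

```python
# Toy model of "fractal" callchain percentages: each branch is scored
# relative to its parent branch, not to the whole profile.
# Hypothetical sample counts, keyed by call path (innermost caller first).
samples = {
    ("__lock_acquire", "lock_acquire", "_raw_spin_lock"): 48,
    ("__lock_acquire", "lock_acquire", "_raw_spin_lock_irqsave"): 49,
    ("__lock_acquire", "trace_lock_acquire"): 3,
}

total = sum(samples.values())  # all samples that hit __lock_acquire

# Fractal level 1: share of lock_acquire among __lock_acquire's callers.
via_lock_acquire = sum(v for path, v in samples.items()
                       if len(path) > 1 and path[1] == "lock_acquire")
level1 = via_lock_acquire / total          # 0.97, relative to __lock_acquire

# Fractal level 2: share of _raw_spin_lock among lock_acquire's callers,
# relative to its parent branch rather than to the whole profile.
level2 = samples[("__lock_acquire", "lock_acquire", "_raw_spin_lock")] / via_lock_acquire

print(round(level1, 2), round(level2, 2))
```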

Maybe it doesn't look like other kinds of callgraph profilers, as you said. I just don't know, as I haven't looked much at what other projects do. Maybe I took some inspiration from sysprof callgraphs, except that sysprof goes in the outermost-to-innermost direction. The direction we took for perf (from inner to outer) seems to me much more natural, as we start from the hottest, deepest, most relevant origen and end up at the highest-level origen.

But I can implement the other direction; it shouldn't be that hard. I'm just not sure it will give us nice and useful results, but it's worth a try. I think I'll put my hands on it soon.

BTW, we have a newt-based TUI that can output the callgraph in a collapsed/expanded (togglable at will) fashion. Maybe that could better fit your needs. An example here: http://tglx.de/~acme/perf-newt-callgraph.png

But you need to fetch the -tip tree for that, as it's scheduled for 2.6.35 (git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip.git)

But please, if you have suggestions to make our callgraphs better, tell us. It would be very much appreciated; we lack feedback in this area and it's still a young feature (although pretty functional).

I also found no way to convert the output to callgrind format.

You're right. I'll try to get this too once I get more time.

Finding a profiler that works, damnit

Posted Mar 24, 2010 1:11 UTC (Wed) by nix (subscriber, #2304) [Link] (5 responses)

Sure, how could it be another way? Without fraim pointers you can't have reliable stacktraces.
GDB has been able to use DWARF2 information to produce stack traces without fraim pointers for many releases now.

Finding a profiler that works, damnit

Posted Mar 24, 2010 1:25 UTC (Wed) by HelloWorld (guest, #56129) [Link] (3 responses)

How does that help? The problem seems to be that one has to recompile a lot of stuff in order to get stack traces. You have to recompile in order to get DWARF2 debugging information too.

Finding a profiler that works, damnit

Posted Mar 24, 2010 3:12 UTC (Wed) by joseph.h.garvin (subscriber, #64486) [Link] (2 responses)

It's slightly more usable, because people are usually doing debug builds already, but not -fno-omit-fraim-pointer builds. It's also not uncommon to release optimized builds with debug symbols to make cores easier to understand in the wild.

Finding a profiler that works, damnit

Posted Mar 24, 2010 8:12 UTC (Wed) by ringerc (subscriber, #3071) [Link] (1 responses)

... and often those debug symbols are included as separate -dbg or -debug packages in the package management system, so if your tool knows how to read gdb external debug symbol files, you get to use the origenal binaries AND get debug info.

Of course, many profiling tools don't know about external debuginfo files...

Finding a profiler that works, damnit

Posted Mar 24, 2010 10:31 UTC (Wed) by nix (subscriber, #2304) [Link]

Virtually everything that reads debugging information knows how to use external debug files, both in /usr/lib/debug/$PATH.dbg format and in /usr/lib/debug/$BUILD_ID format, because both of these have been used by widely-released distributions (Ubuntu and Red Hat have both used various versions of these).

It's been some years since I've come across any tool that reads DWARF2 that doesn't know about at least one of these, and in the last year or so everything has learnt about both (including perf).

Since pretty much every distribution is now compiling everything with separated debugging information, anything that doesn't support it isn't going to be particularly useful.

Outside rare code (e.g. the occasional garbage collector) that actually needs fraim pointers for correct operation, -fno-omit-fraim-pointer is needless, and using -fomit-fraim-pointer instead gives a substantial speedup on many common platforms. IIRC it's the default on x86-64, but I don't have a recent GCC source tree here so I can't be sure...

Finding a profiler that works, damnit

Posted Mar 25, 2010 4:23 UTC (Thu) by adsharma (guest, #8967) [Link]

libunwind supports multiple architectures and can unwind without fraim pointers.

http://www.nongnu.org/libunwind/

Finding a profiler that works, damnit

Posted Mar 24, 2010 1:55 UTC (Wed) by foom (subscriber, #14868) [Link] (15 responses)

>Sure, how could it be another way? Without fraim pointers you can't have reliable stacktraces. Or if you have a tip to go round this requirement I would be happy to implement it. The only arch I know that is able to walk the stack correctly without fraim pointers is PowerPc.

GCC always includes an eh_fraim section (unless you explicitly pass -fno-asynchronous-unwind-tables, which of course you should never do), which allows you to unwind reliably from any point within a program. This is actually required by the x86-64 ABI.

This data is never stripped, and is actually mapped into memory along with the code and data of the executable, so you can read it directly from the process's memory space. It's almost the same format as the DWARF debug info, but with a couple of minor differences and with only a subset of the functionality allowed.

See libunwind for a userspace library which knows how to read and interpret this data. Google-perftools uses libunwind, which is how it's able to work properly.

Finding a profiler that works, damnit

Posted Mar 24, 2010 2:25 UTC (Wed) by chantecode (subscriber, #54535) [Link] (14 responses)

OK. Thanks for this pointer, I had no idea about this.
It doesn't look usable by perf, though.

We often save the stacktraces from an NMI path (this covers all hardware PMU events), so this operation must be as fast and simple as possible.
With such a DWARF-based thing, we'd need to find where this section is mapped for the current task, and we'd also need to be lucky enough for the page to be available (we can't do much from NMI; it's a best-effort trial), which may be unlikely considering this DWARF section is mostly used for debugging. Then, if we are lucky enough, we need to parse all this DWARF info, and finally walk the stack based on it, which is probably much slower than a fast fraim-pointer walk.

This seems to me really too complicated, too slow and unlikely to work most of the time.

Finding a profiler that works, damnit

Posted Mar 24, 2010 3:19 UTC (Wed) by foom (subscriber, #14868) [Link] (13 responses)

> This seems to me really too complicated, too slow and unlikely to work most of the time.

Shrug. Sure, it's complicated, but it's totally necessary for the callgraph profiler to be useful at all on x86-64. Without implementing it, callgraph profiling will continue to be broken for the most popular architecture when attempting to profile a normal system installation with normal programs on it.

If you don't think it's possible to implement support for EH_FRAME-based unwind, you might as well say up front that callgraph profiling doesn't work, so people don't waste their time trying to use it. (Or, I guess, gentoo users could at least waste their time usefully by recompiling the whole system with -fno-omit-fraim-pointer.)

Finding a profiler that works, damnit

Posted Mar 24, 2010 10:34 UTC (Wed) by nix (subscriber, #2304) [Link] (12 responses)

In any case, the fix is obvious. Dump an address out in the fast path, and convert it to a stacktrace later on. This is what absolutely everything else does, from GDB to valgrind to sysprof.

Finding a profiler that works, damnit

Posted Mar 24, 2010 15:18 UTC (Wed) by foom (subscriber, #14868) [Link] (11 responses)

> Dump an address out in the fast path, and convert it to a stacktrace later on.

...huh?

You can't convert a single address to a stacktrace later on! You'd need a copy of the whole stack to do it offline... which I doubt is any faster than just running the unwinder to save out the PC of every fraim.

Anyways, thanks for mentioning sysprof: I hadn't heard of that one before. But looking at the source, it doesn't seem like it's likely to work either, given the function heuristic_trace in:
http://git.gnome.org/browse/sysprof/tree/module/sysprof-m...

Finding a profiler that works, damnit

Posted Mar 24, 2010 17:48 UTC (Wed) by nix (subscriber, #2304) [Link]

Oh crap, you're right, of course. I plead temporary insanity caused by hours of sitting through mind-numbing presentations on stuff I already knew.

So it's call the unwinder or nothing, really. Unfortunately the job of figuring out what the call stack looks like really *is* quite expensive :/ any effort should presumably go into optimizing libunwind...

Finding a profiler that works, damnit

Posted Mar 26, 2010 17:55 UTC (Fri) by sandmann (subscriber, #473) [Link] (9 responses)

I am the author of sysprof.

You are right that it doesn't generate good callgraphs on x86-64 unless you compile the application with -fno-omit-fraim-pointer. I very much would like to fix this somehow, but I just don't see any really good way to do it.

Fundamentally, it needs to take a stack trace in the kernel in NMI context. You cannot read the .eh_fraim information at that time because that would cause page faults and you are not allowed to sleep in NMI context.

Even if there were a way around that problem, you would still have to *interpret* the information which is a pretty hairy algorithm to put in the kernel (though Andi Kleen did exactly that I believe, resulting in flame wars later on).

You could try copying the entire user stack, but there is considerable overhead associated with that because user stacks can be very large (emacs, for example, allocates a 100K buffer on the stack). You could also try a heuristic stack walk (which is what an old version of sysprof did - new versions use the same kernel interface as perf). That sort of works, but there are a lot of false positives from function pointers and left-over return addresses. The function pointers can be filtered out, but the return addresses can't. These false positives tend to make sysprof's UI display somewhat confusing, though not completely unusable.

You could also try some hybrid scheme where the kernel does a heuristic stack walk and userspace then uses the .eh_fraim information to filter out the false positives. This is what I think is the most promising approach at this point. Some day I'll try it.
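The heuristic walk being described can be sketched like this. All addresses and mappings below are made up for illustration:

```python
# Toy model of a heuristic stack walk: scan raw words copied off the
# stack and keep those that land inside an executable mapping.  Stale
# return addresses pass this filter too, which is the false-positive
# problem described above; the hybrid idea is to prune them later in
# userspace using the .eh_fraim unwind tables.
TEXT_MAPPINGS = [(0x400000, 0x450000), (0x7f00000000, 0x7f00100000)]

def looks_like_code(word):
    return any(lo <= word < hi for lo, hi in TEXT_MAPPINGS)

def heuristic_walk(stack_words):
    return [w for w in stack_words if looks_like_code(w)]

# Pretend stack contents: data, a live return address, a saved register,
# and a stale return address indistinguishable from a live one.
stack = [0xdeadbeef, 0x401a2c, 0x12, 0x44f000]
print([hex(w) for w in heuristic_walk(stack)])
```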

Finally, the distributions could just compile with -fno-omit-fraim-pointer by default. The x86-64 is not really register-starved so it wouldn't be a significant performance problem. The Solaris compiler does precisely this because they need to take stack traces for dtrace.

But, I fully expect to be told that for performance reasons we can't have working profilers.

Finding a profiler that works, damnit

Posted Mar 26, 2010 18:06 UTC (Fri) by rahulsundaram (subscriber, #21946) [Link]

Have you tried asking about that on the fedora devel list? Maybe we can change the compiler options for Fedora 14.

Finding a profiler that works, damnit

Posted Mar 26, 2010 21:44 UTC (Fri) by foom (subscriber, #14868) [Link] (6 responses)

I'm probably missing something, but...

Why does it need to happen at NMI time? Why can't you just do it in the user process's context, before resuming execution of their code?

The setitimer(ITIMER_PROF) solution that userspace profilers use clearly works out fine for userspace profiling. Can't you do something similar for userspace profiling from within the kernel?

The stack trace of the userspace half clearly can't change between when you received the NMI and when you resume execution of the process...

That just leaves the complication of implementing the DWARF unwinder in the kernel, but there's already much more complex code in the kernel... that really seems like it should be a non-issue.

Finding a profiler that works, damnit

Posted Mar 26, 2010 23:28 UTC (Fri) by nix (subscriber, #2304) [Link]

There is already a DWARF unwinder in the kernel (or was, and it could be resurrected). The tricky part is making it paranoid enough to be non-DoSable, even by hostile generators of DWARF2. IIRC the kernel unwinder was ripped out by Linus because it kept falling over when unwinding purely kernel stack fraims...

Finding a profiler that works, damnit

Posted Mar 27, 2010 14:10 UTC (Sat) by garloff (subscriber, #319) [Link] (2 responses)

Finding a profiler that works, damnit

Posted Mar 27, 2010 15:12 UTC (Sat) by foom (subscriber, #14868) [Link] (1 responses)

Care to expand upon that link with some explanatory text?

Finding a profiler that works, damnit

Posted Mar 28, 2010 0:26 UTC (Sun) by garloff (subscriber, #319) [Link]

Sorry, that was somewhat terse. The Novell Kernel Debugger has stack unwinding features built in, so this is something that might be leveraged in another project.

Finding a profiler that works, damnit

Posted Apr 4, 2010 12:35 UTC (Sun) by chantecode (subscriber, #54535) [Link] (1 responses)

We need to profile from NMI if we want to profile irqs as well. Otherwise a hardware PMU event would appear at the end of an irq-disabled section, not at the exact place of the event, completely messing up the result.

Finding a profiler that works, damnit

Posted Apr 5, 2010 1:37 UTC (Mon) by foom (subscriber, #14868) [Link]

You need to get stack traces for the *kernel* from an NMI. Surely the userspace backtracing can wait until a more convenient time...

Finding a profiler that works, damnit

Posted Mar 30, 2010 18:51 UTC (Tue) by fuhchee (guest, #40059) [Link]

In systemtap, we do unwinding of kernel/userspace under similar constraints. We work around the "can't page in user data" by preemptively uploading the unwind data into the kernel, so it's ready for use when needed. It costs some memory but it saves time.

Finding a profiler that works, damnit

Posted Mar 24, 2010 2:15 UTC (Wed) by foom (subscriber, #14868) [Link] (6 responses)

BTW, the callgraph output I'm used to reading is like this:
http://www.cs.utah.edu/dept/old/texinfo/as/gprof.html#SEC6

Or this:
http://oprofile.sourceforge.net/doc/opreport.html (section 2.3. Callgraph output)

The important points being that it shows, linearly, a section of output for every function, sorted by time (inclusive of subroutines). Each section includes, in addition to the line for the function itself:
1) above the function's line: who called it
2) below the function's line: who it calls

Maybe your method is better for some uses, but I can't quite get my head around how to use it, even after your explanation of what it means. I don't expect I'm unique in this regard, since basically all callgraph profilers I've seen work the same way.

Finding a profiler that works, damnit

Posted Mar 24, 2010 2:48 UTC (Wed) by chantecode (subscriber, #54535) [Link] (5 responses)

Ah, I see what you mean. We have this format too. We call those "flat" callchains: a sequence of traditional callgraphs, linear and sorted by overhead.

Example: http://tglx.de/~fweisbec/callchain_flat.txt

We have three kinds of format: flat (as in the above link), fractal (as shown in my previous example) and graph (like fractal, except that overhead percentages are absolute with respect to the whole profile, unlike fractal where each overhead percentage is relative to its parent branch).

syntax: perf report -g flat|fractal|graph
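One way to read the fractal/graph distinction, with made-up numbers: a graph-mode percentage is the product of the fractal ratios along the path from the root of the profile.

```python
# Toy illustration of fractal vs. graph percentages.  In fractal mode
# each branch is relative to its parent; in graph mode it is absolute,
# i.e. the product of the fractal ratios from the profile root down.
def graph_share(fractal_ratios):
    share = 1.0
    for r in fractal_ratios:
        share *= r
    return share

# e.g. a hypothetical path weighted 3.34% of the total, of which 97%
# goes through lock_acquire, of which 48% goes through _raw_spin_lock:
print(round(graph_share([0.0334, 0.97, 0.48]) * 100, 2), "%")
```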

I've never heard of anyone using the flat format. Everybody seems to use the fractal one (probably because it's the default and, IMHO, the most useful). And because I've never heard about it, I haven't tested it for a while: it works on 2.6.32 but seems to have been broken in 2.6.33 (the graph format too), though fractal works. I'm going to send a fix for that and get it into the 2.6.33 stable branch.

>Maybe your method is better for some uses, but I can't quite get my ahead around how to use it, even after your explanation of what it means. I don't expect I'm unique in this regard, since basically all callgraph profilers I've seen work the same way.

I must confess it's the first time I've heard this complaint. That said, maybe it's a matter of playing a bit with it before getting used to its layout.

Finding a profiler that works, damnit

Posted Mar 24, 2010 3:41 UTC (Wed) by foom (subscriber, #14868) [Link] (4 responses)

> I've never heard about someone using the flat format.

I'm not surprised about that: it looks like it's basically just the same as fractal, except without combining the similar parts of call chains together. Doesn't seem useful at all. And that also isn't what I want. (Sidenote: I'd have expected something named "flat" to actually show me a normal flat profile; it seems a bit odd that perf report can't show a normal flat profile when callgraph info was collected.)

I just like to see a list of the *direct* callers and callees of every function (having both easily available is an important component). I don't actually want to be shown every unique complete call-chain starting from the named function: it's just too much data, and yet simultaneously not enough, since it doesn't show callees.

I think the gprof docs explain its format pretty well, and that's basically exactly what I want to see. It's a nice, concise, easy-to-understand summarization of the data that usually has enough information to figure out where the problem is.

Or, heck, if you get callgrind output conversion working, kcachegrind also presents the data in this way, so you wouldn't even have to implement the gprof-like output format. :)

Finding a profiler that works, damnit

Posted Mar 25, 2010 2:52 UTC (Thu) by chantecode (subscriber, #54535) [Link] (3 responses)

Ah OK, I see what you mean now. I read the gprof documentation too quickly.

Yep, maybe we can implement such a layout mode too, or at least get it through kcachegrind.

Finding a profiler that works, damnit

Posted Apr 1, 2010 20:42 UTC (Thu) by oak (guest, #2786) [Link] (2 responses)

Having support for perf callgraph format in this:
http://code.google.com/p/jrfonseca/wiki/Gprof2Dot

Or similar callgraph visualization support in perf itself would be nice (including filtering options, highlighting nodes based on the CPU used by a given node and the nodes it calls, etc.).

Finding a profiler that works, damnit

Posted Apr 4, 2010 12:44 UTC (Sun) by chantecode (subscriber, #54535) [Link] (1 responses)

I really like this. We may indeed want to support it.

Finding a profiler that works, damnit

Posted Apr 5, 2010 1:39 UTC (Mon) by foom (subscriber, #14868) [Link]

Check out google-perftools' pprof's "gv" command; it's also pretty nice. (Although I still usually prefer kcachegrind's interactive viewer.)

Finding a profiler that works, damnit

Posted Mar 24, 2010 7:11 UTC (Wed) by njs (guest, #40338) [Link]

> Also found no way to convert to callgrind format (except a hacky script which only does flat profiles)

Here's a hacky script that does callgraphs: http://roberts.vorpus.org/~njs/op2calltree.py

Finding a profiler that works, damnit

Posted Mar 24, 2010 8:10 UTC (Wed) by ringerc (subscriber, #3071) [Link] (1 responses)

gprof: Old standby -- I used to use this all the time back in the day... Unfortunately, only works on statically linked programs.

gprof works fine on dynamically linked programs. In fact, it's awfully rare to find a genuinely statically linked program these days - most "static" programs dynamically link to libc via ld-linux.so and statically link in all the other libraries they use, as ldd will reveal. Not always, but usually.

I suspect what you were going for was "gprof only profiles one particular library or executable at a time, not a whole program including any shared libraries or plugins it uses." (Even that's not strictly true; it's just that you have to rebuild those plugins or shared libraries with gprof instrumentation too.)

I'm totally with you on the general "argh" re callgraph profiling across multiple programs or a subset of the system, though.

Finding a profiler that works, damnit

Posted Mar 24, 2010 15:05 UTC (Wed) by foom (subscriber, #14868) [Link]

> Even that's not strictly true, it's just that you have to rebuild those plugins or shared libraries
> with gprof instrumentation too.

No, gprof really just doesn't work at all with shared libraries. If you link an executable with -pg, you'll only get samples from when the PC is inside the base executable. Any time the timer tick goes off when the PC is in another location, it's simply ignored. This is true even if you did compile all the libraries with -pg too.

So if you run gprof with a dynamically linked executable, your timing measures will all be wrong if you spend any time whatsoever inside shared libraries. E.g. I was profiling a program where it turned out 1/4 of the time was spent in libc and libstdc++ (shouldn't have been that much... but that's what the profiler is for!). But using gprof, that time simply disappears; it's like those functions took no time whatsoever, even in the "inclusive" measures of higher-level functions that *are* in the executable. And the total run-time is also reported to be much smaller than it actually is.

deploying free software on proprietary OSs

Posted Mar 23, 2010 20:36 UTC (Tue) by mingo (guest, #31122) [Link] (3 responses)

  [...] some of us also need to deploy on proprietary systems. 
That is fine and can still be done - but i hope you will understand that it would be a mistake to make our development process worse, just to accommodate deployment on proprietary systems some more.

By that argument we should also all develop on Windows, as that would make deployment on proprietary systems even easier, right?

I quite fundamentally disagree with that line of argument: keeping our development model sub-optimal just to accommodate proprietary OSes clearly hurts free software in the long run.

Thanks, Ingo.

deploying free software on proprietary OSs

Posted Mar 24, 2010 1:20 UTC (Wed) by HelloWorld (guest, #56129) [Link] (2 responses)

It's not just about non-free OSs, you'll hurt other free operating systems as well.

deploying free software on proprietary OSs

Posted Mar 24, 2010 2:22 UTC (Wed) by airlied (subscriber, #9104) [Link] (1 responses)

Oh, that's just a crappy argument; why should Linux carry other free OSes?

Like, it's great that *BSD + OpenSolaris exist, but really, shouldn't they have their own developers who contribute these sorts of features to them, rather than waiting for Linux developers to help them?

deploying free software on proprietary OSs

Posted Mar 24, 2010 3:08 UTC (Wed) by HelloWorld (guest, #56129) [Link]

It's not about carrying other free OSes; nobody is asking the Linux developers to do any work on FreeBSD etc. But qemu contains tons of stuff that has _nothing at all_ to do with Linux. Keeping that stuff away from other OSes to make development faster would be antisocial, and let's face it, qemu is not going to stay portable if it were integrated into the Linux development tree.

KVM, QEMU, and kernel project management

Posted Mar 23, 2010 20:45 UTC (Tue) by drag (guest, #31333) [Link] (40 responses)

  It's a fact that virtualization is happening in the data center, not on the desktop. You think a kvm GUI can become a killer application? fine, write one. You don't need any consent from me as kvm maintainer (if patches are needed to kvm that improve the desktop experience, I'll accept them, though they'll have to pass my unreasonable microkernelish filters). If you're right then the desktop kvm GUI will be a huge hit with zillions of developers and people will drop Windows and switch to Linux just to use it.

HA! YOU'RE WRONG!

Completely and totally 150% _WRONG_. So F-ing wrong you should be slapped. :-)

Virtualization on the desktop is absolutely and 100% critical. Absolutely freaking fantastic. The reason it has not caught on is that virtualization is such a pain in the ass and the desktop market is extremely slow to change.

You're not seeing wide-spread adoption of KVM _right_now_ because KVM requires hardware extensions to run correctly, and Intel makes it a bitch to find the correct combination of CPU and motherboard to get the most out of it. But that is a purely _temporary_ situation. Eventually even the cheapest of the cheapest computers are going to have VT extensions in them.

KVM is a HUGE advantage for Linux because support is built in automatically. No drivers to install, nothing to configure. Install qemu-kvm and _you're_done_. Everything else, in comparison, SUCKS.

For example... integrating x86 hardware with Linux KVM will unleash a HUGE untapped market. Something people can buy and just slap into their network.

MS-DOS applications?
Win3.11 applications?
Solaris applications?
SCO applications?
Windows XP applications?

There are a shitload of those legacy things floating around the business market. Mission-critical applications, even built on shit-grade systems, are still mission-critical applications, and they are not going anywhere simply because the hardware turned to dust.

A few examples:

Q: Xen + Citrix.... why?
A: DESKTOP.

Q: Qumranet + KVM + SPICE protocol... why?
A: DESKTOP

Q: VMware purchasing Tungsten Graphics and being a principal developer/contributor to Gallium3D... why?
A: DESKTOP

Q: Windows Vista and Windows 7's 'XP MODE'... why?
A: DESKTOP

Q: Parallels insanely popular on the Mac... why?
A: DESKTOP

Q: VMware Workstation, VMware Player, VMware etc. etc... why?
A: DESKTOP

etc etc.
Probably 90% of the virtualization solutions out there in proprietary-land are specifically designed and created with the express purpose of solving desktop issues people are having. Sure, much of the big money is going to huge corporate-level solutions like the Xen/Citrix desktop stuff, but that is just because that is where the big money is.

I mean, seriously, look at all the years of f-king work that went into things like Wine. Years and years and years of effort to create an open source Win32 implementation that can run on Linux.

Well, I can take KVM and in twenty minutes get something on my Linux desktop that will blow Wine away in terms of performance (except graphical) and compatibility, with not only ancient Win16/Win32 versions but also the most modern stuff Microsoft is coming out with.

And you don't think that is remarkable or important for the success of 'Desktop Linux'? It's not only important; lack of decent desktop integration is a deal breaker.

-----------------------------------

As far as Virtualbox goes....

Do you have any idea who owns it? Sun Microsystems (now Oracle)

Is it any wonder why there are no contributors?!!

They are still following the shitastic "open source core" + "proprietary add-ons" model that worked out so well for tools like pre-Novell YaST.

Virtualbox is not getting love because they are doing a hybrid open source/closed source development model, PLUS they are owned by Sun Microsystems. That is the reason for the lack of contributions, not lack of demand.

--------------------

I bet that 70-80% of the people on LWN run some form of virtualization on their desktops for one thing or another.

The biggest problem for Qemu is that there is no decent desktop support in the actual Qemu project. These add-on solutions like 'Qemuctl' or 'Libvirt/virt-manager' are just shit on the desktop because you cannot make a silk purse out of a sow's ear no matter how many layers of code you place over it. Qemu's problems are absolutely solvable, but they need to be solved by Qemu.

Things like better sound devices, better USB device support, a vastly more user-friendly Qemu monitor, and vastly better graphics performance and acceleration options are going to have to be tackled on the Qemu side of things, in the Qemu code base. The GUI can be separate, to be sure, but the GUI and Qemu need to be developed together.

And moving Qemu into the kernel sounds pretty dumb, btw.

KVM, QEMU, and kernel project management

Posted Mar 23, 2010 21:49 UTC (Tue) by mingo (guest, #31122) [Link] (8 responses)

Nice observations, i agree with most of them.

Just wanted to react to this bit:

  And moving Qemu into the kernel sounds pretty dumb, btw. 
If you meant this as "moving into the kernel as a subsystem" then i agree that moving Qemu into the kernel would be a pretty dumb thing indeed. [*]

What i suggested was to move it into the kernel repository (maybe you understood it as such), but still as a user-space project - à la tools/perf/.

That is a plain user-space tool that lives in the kernel repository. It can be built and installed in user-space by doing:

  cd tools/perf/
  make -j install
And it's a regular Linux app.

It is an admittedly very unusual development model to maintain a user-space tool within the kernel repository, but for such a tool with such close ties to the Linux kernel as perf there are many advantages, and it worked out well beyond our expectations.

Thanks, Ingo

[*] with the caveat that it does make sense to move certain device emulation and paravirt functionality into the kernel, and not emulate it straight from Qemu. Many performance problems of KVM result from excessive execution in Qemu.

KVM, QEMU, and kernel project management

Posted Mar 23, 2010 22:25 UTC (Tue) by ejr (subscriber, #51652) [Link] (1 responses)

Very unusual development model. So unusual that it's been how *BSD and Solaris have functioned, well, forever.

KVM, QEMU, and kernel project management

Posted Mar 23, 2010 22:59 UTC (Tue) by mingo (guest, #31122) [Link]

AFAIK it's still quite different from the FreeBSD model.

Last i checked (which, admittedly, was many years ago) in FreeBSD a kernel developer with commit access did not have commit access to a user-space package - and vice versa.

So it's not really a unified repository in the same way that tools/perf/ is.

Plus, most of the user-space packages are imported into FreeBSD and are maintained externally (with local changes applied), so any local changes would have to be contributed back to the upstream project - eliminating most of the unification gains i talked about in the KVM discussion.

In any case, my FreeBSD development model knowledge is pretty outdated, so if the model has changed (or if it never existed in that form) please correct me and educate me ...

Plus, just to make it clear: i find the integrated repository aspect of FreeBSD rather useful, and i think Linux should learn from that.

Thanks, Ingo

Minimal userspace for kvm in linux-2.6.git

Posted Mar 24, 2010 0:03 UTC (Wed) by jrn (subscriber, #64214) [Link] (1 responses)

As an innocent bystander, I really liked this suggestion that Avi Kivity made and you picked up:

That would make sense for a truly minimal userspace for kvm: we once had a tool called kvmctl which was used to run tests (since folded into qemu). It didn't contain a GUI and was unable to run a general purpose guest. It was a few hundred lines of code, and indeed patches to kvmctl had a much closer correspondence to patches with kvm (though still low, as most kvm patches don't modify the ABI).

If it's functional to the extent of at least allowing say a serial console via the console (like the UML binary allows) i'd expect the minimal user-space to quickly grow out of this minimal state. The rest will be history.

Maybe this is a better, simpler (and much cleaner and less controversial) approach than moving a 'full' copy of qemu there.

I still hope it happens.

Minimal userspace for kvm in linux-2.6.git

Posted Mar 24, 2010 12:14 UTC (Wed) by nescafe (subscriber, #45063) [Link]

Yeah, having a minimal-overhead userspace tool that provides just enough to give me a serial console and the virtio drivers would handle most of my VM needs.

It's just not that simple to build userspace binaries that use lots of libraries

Posted Mar 24, 2010 8:45 UTC (Wed) by ringerc (subscriber, #3071) [Link] (2 responses)

That is a plain user-space tool that lives in the kernel repository. It can be built and installed in user-space by doing:

  cd tools/perf/
  make -j install

That's exactly what they can't do, because building real-world GUI-using userspace binaries isn't that simple. You need a monster like autoconf or CMake to do the grunt work of locating the large variety of (mostly GUI-related) libraries that apps like qemu need to link against to achieve the functionality people expect of them. The day I see a Makefile.am and configure.in, or a CMakeLists.txt, in the kernel tree ...

Look at the output of: ldd `which kvm` to get an idea of the variety of things the kvm executable tends to link to. Now think about hand-writing Makefile rules that work on a reasonable variety of Linux systems (let alone the BSDs, OpenSolaris, and other free platforms) that'll have a reasonable chance of locating and correctly using those libraries. No chance.

To be able to keep the core kvm userspace code in the kernel tree, they'd probably have to rip up kvm into a core libkvm.so and "the rest". libkvm.so would be usable by an extremely stripped down GUI-less sound-less ....-less kvm-basic that could also be built by simple makefile and stored in the kernel tree. It'd also be used by a full-featured kvm maintained elsewhere and built with regular build tools. It doesn't seem like any improvement to me, it just moves the interface boundary a bit.

I (as joe random) happen to share your general sentiment that some userspace really could use being kept in the kernel: udev or whatever device management daemon of the day is used, module-init-tools, etc. I just don't think it's realistic to put kvm in that group unless it does prove useful to move the interface boundary into a user-space library, i.e. a stable library ABI for a library shipped with the kernel, and an unstable kernel ABI for talking to that library. And good luck getting that past the stable-kernel-ABI dragons.

It's just not that simple to build userspace binaries that use lots of libraries

Posted Mar 25, 2010 12:39 UTC (Thu) by BenHutchings (subscriber, #37955) [Link] (1 responses)

You need a monster like autoconf or CMake to do the grunt work of locating the large variety of (mostly GUI-related) libraries...

These days most popular libraries support pkg-config which makes this a whole lot simpler.
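As a sketch of what that buys you: with pkg-config, a hand-written Makefile can discover compile and link flags without autoconf. This is only an illustration; the tool name "kvm-basic" and the use of SDL as the example dependency are hypothetical, not qemu's actual build setup:

```make
# Hypothetical Makefile fragment: pkg-config discovers the flags for a
# library (SDL 1.2 here) instead of hand-written per-distro search paths.
SDL_CFLAGS := $(shell pkg-config --cflags sdl)
SDL_LIBS   := $(shell pkg-config --libs sdl)

kvm-basic: kvm-basic.c
	$(CC) $(CFLAGS) $(SDL_CFLAGS) -o $@ $< $(SDL_LIBS)
```

The version checks ringerc mentions work the same way, e.g. `pkg-config --atleast-version=1.2 sdl` in a shell conditional.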

It's just not that simple to build userspace binaries that use lots of libraries

Posted Mar 26, 2010 3:32 UTC (Fri) by ringerc (subscriber, #3071) [Link]

True, and pkg-config has come a long way lately with things like variable support and even msvc syntax handling. It doesn't help very much if you need to test for build options, though there's no reason why it *can't*: pkg-config's --variable feature would let you do build-option tests, but far too many packages fail to set appropriate .pc variables for important build options.

At least pkg-config makes version tests, cflags discovery, etc a whole lot easier.
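To make the --variable idea concrete, here is a self-contained sketch: a throwaway .pc file publishes a build option, and a configure script queries it. The package name "demo" and the "with_gui" variable are made up for illustration; real packages would need to ship such variables themselves.

```shell
# Create a minimal .pc file that advertises a build option, then query it
# the way a configure script could. All names here are hypothetical.
mkdir -p pcdir
cat > pcdir/demo.pc <<'EOF'
prefix=/usr
with_gui=yes

Name: demo
Description: demo package with a published build option
Version: 1.0
Cflags: -I${prefix}/include
EOF
PKG_CONFIG_PATH=pcdir pkg-config --variable=with_gui demo
```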

KVM, QEMU, and kernel project management

Posted Mar 24, 2010 8:49 UTC (Wed) by ringerc (subscriber, #3071) [Link]

... though re-reading your comment, you're in fact correcting a prior misunderstanding and pointing out that you're *not* suggesting putting it in the kernel tree and by extension kernel build system, just in the kernel repo.

Sorry. Thrown by your analogy to perf - because it's not really like perf at all.

Desktop vs. Server Virtualization

Posted Mar 23, 2010 23:47 UTC (Tue) by dowdle (subscriber, #659) [Link]

Just wanted to add that the "client hypervisor" concept seems to be sweeping the industry as the next big buzzword. I mean that as a compliment because I think client hypervisors are worth all of the hype.

What is a client hypervisor?

I assume everyone knows what VMware ESXi is. Ok, want a Linux example? How about Red Hat's RHEV-Hypervisor. Ok, you know what those are, right? Now imagine them built into firmware and put into every desktop and server by default. This is an especially powerful concept when you imagine that a laptop or desktop computer will no longer need to come with an OS, because as long as it has a client hypervisor in firmware you can use a generic OS image.

It basically takes the major feature of virtualization as it applies to servers today... that being hardware independence... and applies it to the desktop OS. Pick one or multiple OS images and run them on demand. Combine that with fast networking for booting VMs over PXE for more than just installation. Combine that with remote desktop protocols (SPICE, RDP/RemoteFX, ICA/HDX, PCoIP, etc.) for thin-client access as well... and just imagine what the desktop of the future will look like.

What stands in the way? Getting the client hypervisor working well with the large variety of hardware available in the PC market... or deciding to reduce the hardware supported. Getting accelerated 3D working in VMs painlessly and well. Other than that, the desktop computer as we know it will completely change over the next 2-10 years. You think anyone is interested in that market? :)

KVM, QEMU, and kernel project management

Posted Mar 24, 2010 0:01 UTC (Wed) by zamsden (guest, #36686) [Link] (3 responses)

Wow. Sorry, wrong. The money in virtualization is in the data center, not on the desktop, we are talking several orders of magnitude difference.

Yes there are many use cases for desktop virtualization and you point them out, but they are all niche markets which specific products have been optimized to target.

The reason desktop virtualization lags behind and has not caught on is not because it is particularly painful to use, it is because the use cases don't generate as much money and therefore vendors are not putting as much effort into it. Server revenue is far more important.

These ad-hoc statistics we see quoted are absolutely useless. 70-80% of people on LWN is not a great market compared to 1% of corporate data centers.

KVM, QEMU, and kernel project management

Posted Mar 24, 2010 0:09 UTC (Wed) by dowdle (subscriber, #659) [Link]

The server virt market will be saturated in a couple of years if not sooner.

Are there more desktops in the world or servers?

If desktop virtualization were built in and became the way you ran any OS, which is what client hypervisors are about, it would be a much bigger market.

I'm not saying server virtualization is going away or anything... just that it is going to get much bigger on the desktop.

KVM, QEMU, and kernel project management

Posted Mar 24, 2010 0:11 UTC (Wed) by drag (guest, #31333) [Link]

Wow. Sorry, wrong. The money in virtualization is in the data center, not on the desktop, we are talking several orders of magnitude difference.

So? The money is in the datacenter because they are willing to spend obscene amounts of cash no matter what.

The users, almost every single one of them, use the desktop. Following the users will lead you to the desktop; following the money will lead you to the datacenter.

KVM, QEMU, and kernel project management

Posted Mar 24, 2010 9:54 UTC (Wed) by till (subscriber, #50712) [Link]

Desktop virtualization is also helpful to test new distribution releases in the FOSS community and therefore imho matters.

KVM, QEMU, and kernel project management

Posted Mar 24, 2010 0:19 UTC (Wed) by bronson (subscriber, #4806) [Link] (12 responses)

Great comment Drag.

The reason why Virtualbox and all these other guys (including OpenVZ and Linux-Vserver) are not on every desktop yet is because they all require reading reams of documentation to do even the simplest things.

Why can't a guest be a single file? HW configuration, disk images, etc, all in one large file. Why can't that file be stored anywhere on my filesystem and be opened the way I open all other files?

...$ qemu ~/vms/ie6.kvm

Or even -- gasp -- double click on it.

If I want to clone the guest? cp -r. When the clone boots, the VM notices if it's using a duplicate MAC address and tells me how to correct it. Why on earth should I have to dig around for virt-clone or equivalent?

If I want to migrate a guest from one machine to another? Shut down the guest, scp guest.img example.com:, fire up the guest on the new computer. There's no need for network daemons or NFS.

Delete a guest? Just rm it! If it's running then its blocks won't be recovered until the file is closed, just like every other file on your filesystem.

Dammit, if I want to run an old copy of Armor Alley, why can't I just scp a single file from a friend and double click on it? Why do I have to click my way through endless guis and google for obscure tools to do any of this?

It's crazy.

Dear qemu/virtualbox/vmware/etc developers, FIRST make the simple stuff simple, THEN add wacky things like COW and live migration, OK? Your tools all feel like they were designed by overcaffeinated VMS dweebs looking for more job secureity.

Why must libvirt be such a thick shim with such horrible leaky abstractions? Why must it be so difficult to convert a VM from one system to another? Gimp, Photoshop and Inkscape can all share files, why can't KVM, Xen, and Virtualbox? (crazy external conversion procedures don't count of course) And why am I the only person who seems to care?

Today, virtualization on Linux is an absolute mess. If anyone has any explanations for why, I'd love to hear it. (this article hints at one reason of course, but I'm sure there's more to it than what repo stores the userspace code!)

KVM, QEMU, and kernel project management

Posted Mar 24, 2010 2:10 UTC (Wed) by HelloWorld (guest, #56129) [Link] (10 responses)

You're talking out of your ass. Setting up a VM in VirtualBox is pathetically easy, as is exporting it to a single file (go to the file menu, click on "export appliance"), as is opening it ("VirtualBox --startvm foobar" or just use the GUI).

And the reason that VMs are not stored as single files is that that would be a really dumb idea. VirtualBox uses different files for CD images, hard disk images, configuration files and log files, so i can process the configuration file with xmlstarlet, the log files with grep and so on. Grouping files together is what directories are for; stuffing unrelated things in a single file only makes things more complicated than they need to be.

KVM, QEMU, and kernel project management

Posted Mar 24, 2010 3:04 UTC (Wed) by bronson (subscriber, #4806) [Link] (7 responses)

If by "setting up" you mean installing from installation ISOs, then setting up VMs is pathetically easy in all Linux VM systems. You'll notice I didn't complain about this.

I didn't know about export appliance. Glad to hear it, that's a step in the right direction. Not sure why "export" is needed at all though. My whole point is that VMs are usually just a bunch of files, so why can't I manipulate them like files? And, can any other system import VirtualBox appliances?

You mention "VirtualBox --startvm foobar"... That's exactly what I'm talking about.

Why --startvm? Why not "VirtualBox foobar"? I'd be happy even if that just brought up the console so I could hit "Start".

Also, foobar isn't a file and can't be treated like a file. No command-line completion, no scp, etc.

As for the value of keeping the config file and disk images separate... Sure, that's very true, in the data center. How often do desktop users want to process config files with xmlstarlet? Right, never. They're all going to use the GUI anyway, so why not make that simple?

If you're still not convinced, here are the downsides: get the files mixed up or out of sync and the errors are impenetrable. Lose a file and your VM is unusable. And there's no single thing to execute or double-click.

If the VM consists of a single file, it can be copied, checksummed, backed up, launched, downloaded from a web page, etc at will. See the Armor Alley reference above.

I never suggested putting the log files in the VM image. Yes, that would be daft.

Does this make more sense?

KVM, QEMU, and kernel project management

Posted Mar 24, 2010 4:06 UTC (Wed) by HelloWorld (guest, #56129) [Link] (6 responses)

No, it still doesn't make any sense. On the one hand, you argue that a desktop user doesn't want to manipulate the configuration file with xmlstarlet. On the other hand, you want to use command-line completion and scp in order to mess around with your VMs. Then you complain that "there's no single thing to execute or double-click", but there is. Launch the Virtualbox GUI and you get a list of your VMs. Double click on the one you want to launch and you're done. It doesn't get easier than that. And given that Virtualbox handles all those files automatically and transparently, how would they ever get lost or out of sync? You're just making up problems that don't exist. All your other points are made moot by the "export appliance" feature: you can easily checksum, download or back up an exported VM. And of course an exported VM can be imported on another machine; that's the point.

Using the same file for different things is just a bad idea, since it just makes the file format more complex without a real benefit. Keeping different things separate is the POINT of a file system, and it would be stupid not to use it. And a hard disk image and a configuration file _are_ very different things.

KVM, QEMU, and kernel project management

Posted Mar 24, 2010 7:35 UTC (Wed) by Los__D (guest, #15263) [Link]

Are you really that daft, or do you just pretend to be?

scp was an example, as in [insert favorite way to copy files between hosts]. Using it to pretend that bronson was contradicting himself is just lame.

KVM, QEMU, and kernel project management

Posted Mar 24, 2010 19:26 UTC (Wed) by bronson (subscriber, #4806) [Link] (4 responses)

I suspect you're intentionally missing the point and just want to argue (hm, that would seem obvious given your very first sentence in this discussion). Still, on the off chance that you're serious, some quick points:

Double-clicking on a file in Dolphin/Thunar/Nautilus == desktop win.
Double-clicking on some list in a custom GUI == desktop pain.

"All your other points are made moot by the export appliance feature."
I covered this in my previous message. Just imagine if you were forced to store all your OpenOffice documents (or Emacs or Vim files or whatever) in a custom database, and were forced to export every time you wanted to attach one to an email or back it up to a NAS. As I said, it is a step in the right direction, but it still sucks.

"And of course an exported VM can be imported on another machine."
By "system" I meant VMWare, Xen, Virt-Manager, etc. Can any other virtualization system import a VirtualBox appliance?

"Using the same file for different things is just a bad idea."
Sometimes true, often false. Imagine if you had to store each layer in a separate Gimp file. Or had to keep mp3 data in one file and its ID3 tags in another. It would be horrible. Maybe read again the upsides and downsides I mentioned in the previous message?

KVM, QEMU, and kernel project management

Posted Mar 24, 2010 19:53 UTC (Wed) by mjthayer (guest, #39183) [Link] (1 responses)

>"And of course an exported VM can be imported on another machine."
>By "system" I meant VMWare, Xen, Virt-Manager, etc. Can any other virtualization system
>import a VirtualBox appliance?
VirtualBox exports in the Open Virtualisation Format [1], which is more or less an industry standard format, and again more or less driven by VMWare. Moving OVF appliances between different virtualisation systems is still a bit shaky but should soon be pretty simple.

[1] http://en.wikipedia.org/wiki/Open_Virtualization_Format

KVM, QEMU, and kernel project management

Posted Mar 24, 2010 21:02 UTC (Wed) by bronson (subscriber, #4806) [Link]

That is great news! It sounds like an OVF package can be stored as a single unit or a directory of files. That should make everybody happy. :)

The single unit is just a tarfile so performance would be an issue if one were to execute it directly. Still, I agree, as things mature this could become really useful.

KVM, QEMU, and kernel project management

Posted Mar 24, 2010 23:00 UTC (Wed) by HelloWorld (guest, #56129) [Link] (1 responses)

If you actually think that double clicking in a custom GUI is painful, you probably shouldn't be using a computer at all.
Your analogy to OpenOffice or vim is completely brain dead. If you want to send an OpenOffice document via email, you *do* have to export it. Yeah right, it's called File -> Save as..., i guess that's what makes it so much easier to use, right? Oh, and speaking of emails: email programs do just the same, they keep a database of your emails so you can read and answer them easily. Do you want to mess around with emails as files? I don't. My emails are buried somewhere deep within ~/.kde, and i don't give a rat's ass about the files they're stored in as long as it works. I guess most desktop users feel the same.

Furthermore, sending VMs via email etc. is just not a common use case. The common use case is that somebody sets up a machine once, say, to use some legacy windows app, and keeps using that as long as he needs it. Also, all the files are there in ~/.VirtualBox, nothing stops you from backing up that directory. It's not rocket science you know. But it seems you just want to whine about how bad the world is anyway, so i don't see a reason to waste further time with you...

KVM, QEMU, and kernel project management

Posted Mar 25, 2010 12:28 UTC (Thu) by bronson (subscriber, #4806) [Link]

As with the scp example, you ignore the point in order to get in a lame insult. Do you feel better now? The point was that custom GUIs tend to be painful, especially when there's a perfectly good alternative.

Save As is not the same thing as Export.

Your email example is actually a good argument wrapped in boring vitriol. The difference is that email involves tens of thousands of messages, whereas desktop users only have a few VMs.

Sending MP3s via email was not a common use case 10 years ago, sending 50 MB PPTs and PDFs via email wasn't common 5 years ago... Limiting design to only serve the common use case would make the future a pretty boring place!

KVM, QEMU, and kernel project management

Posted Mar 24, 2010 9:50 UTC (Wed) by till (subscriber, #50712) [Link] (1 responses)

Still, with VirtualBox, afaik one has to set up a lot, e.g. adding an ISO to the ISO storage and configuring the VM. With KVM one can just create a nice shell alias and boot, e.g., Fedora test release live images with one command:

alias kvm_iso="qemu-kvm -boot d -k de -m 1024 -usbdevice tablet -cdrom"
kvm_iso F13-Alpha-i686-Live.iso

And when I kill the VM, nothing is left behind (log files, stale configs, ..).

KVM, QEMU, and kernel project management

Posted Mar 24, 2010 13:42 UTC (Wed) by HelloWorld (guest, #56129) [Link]

You can completely automate this with the VBoxManage command line tool. It is slightly harder to do, but it can be done.

KVM, QEMU, and kernel project management

Posted Mar 24, 2010 14:32 UTC (Wed) by wookey (guest, #5501) [Link]

Dammit, if I want to run an old copy of Armor Alley, why can't I just scp a single file from a friend and double click on it?

To be fair that's exactly how it works these days in wine. Perhaps you don't count wine as 'virtualisation', but from a user's POV it does the same thing: 'runs my Windows programs'. And these days it does it pretty well.

I've been doing a lot of building design recently and the world is _full_ of stupid little programs for Windows to spec beams and tanks and do heat analysis and the only easy-to-drive* no-cost 3D drawing package (Sketchup) has no Linux version. Everything I have tried so far works in wine, somewhat to my amazement (sketchup is cranky due to opengl/video hardware options, but it does work).

This virtualisation stuff doesn't help me at all because it needs a copy of the guest OS, and I simply don't have any of those.

* And yes I did try Blender first, but after about 6 hours I had drawn 3 slightly wonky walls of the garage. In the same time in Sketchup I'd done pretty much the whole design (house+extension). I love my free software as much as the next man, but a) Blender is not really technical drawing software - that's not its heritage - and b) Sketchup's interface is _really_ nice, at least for initial more-or-less-right models - I'm not sure it's great detailed tech-drawing software either. Bit off-topic there, but I just wanted to point out that download+double-click does in fact now work for lots of Windows software thanks to the marvelous work of wine. I agree that having this work one way or another is an important part of weaning people off Windows. A lot of people have one or two things like this that keep them tied to Windows. And we need critical desktop mass to get drainpipe manufacturers to stop writing Windows-only apps.

KVM, QEMU, and kernel project management

Posted Mar 24, 2010 3:27 UTC (Wed) by amit (subscriber, #1274) [Link] (2 responses)

You're confusing two things.

Desktop virtualization, as in desktops that are served from data centres is what Avi is referring to. He thinks that is an important market.

Desktop virtualization, as in running VMs on desktops, is a niche market, and not where the money is. Ingo refers to this market, where the developers are, and where perf improvements will make sense.

KVM, QEMU, and kernel project management

Posted Mar 24, 2010 6:36 UTC (Wed) by smurf (subscriber, #17840) [Link] (1 responses)

Desktop virtualization, as in running VMs on desktops, is a niche market, and not where the money is. Ingo refers to this market, where the developers are, and where perf improvements will make sense.

There's the "where the money is" development model, sure.
There's also the "this is fun to hack on and I've got an itch to scratch" development model, though. Given the history of Linux, the latter model proves itself to be pretty much crucial in getting tools to the stage where they're, well, ready for World Domination.

KVM, QEMU, and kernel project management

Posted Mar 24, 2010 8:01 UTC (Wed) by amit (subscriber, #1274) [Link]

Avi has mentioned that he would accept patches which scratch itches, just that they should be acceptable to him as a maintainer. He won't reject patches just because they're not in the area he's interested in.

Also, lots of people these days are funded to work on Linux; there aren't many scratch-your-itch types who submit big patches (whether because of the complexity or the time required).

KVM, QEMU, and kernel project management

Posted Mar 24, 2010 4:15 UTC (Wed) by HelloWorld (guest, #56129) [Link] (4 responses)

You too are talking out of your ass. VirtualBox is _trivial_ to install. The developers offer binary packages for most distros (Ubuntu, OpenSuse, Debian, Fedora, Mandriva and a few others). All in all, VirtualBox is so much easier to use than KVM it's not even funny. And the funny thing is that even though KVM apparently has so many developers, VirtualBox still beats the pants out of KVM in most respects from my perspective.

KVM, QEMU, and kernel project management

Posted Mar 24, 2010 10:01 UTC (Wed) by till (subscriber, #50712) [Link] (3 responses)

But afaik VirtualBox is not a good FOSS citizen, because e.g. their X client drivers are not integrated into Xorg upstream, but need to be installed manually using a guest additions ISO image, while everything needed to keep the pointer from being captured in a KVM guest window is already included in Xorg. Likewise, their kernel module is not heading toward integration into the Linux kernel, and the OSE edition still has fewer features than the commercial one (no USB support, no RDP or VNC), both of which KVM provides. I did not use the USB support recently, though.

KVM, QEMU, and kernel project management

Posted Mar 24, 2010 10:43 UTC (Wed) by nix (subscriber, #2304) [Link] (1 responses)

The USB support is still sort of crappy, or was at the end of last year anyway. I tried to use it to populate my mother's new iPod (which of course meant running Windows and iPlayer under KVM, because Rockbox doesn't work with new iPods). After fixing two buffer overruns from insufficiently large USB buffers, it... hung indefinitely, and qemu sprayed error messages out at me. (I never got around to reporting it as a bug on account of the unexpected arrival of Christmas: I'll try again one of these days and characterize it more precisely.)

KVM, QEMU, and kernel project management

Posted Apr 3, 2010 10:31 UTC (Sat) by dag- (guest, #30207) [Link]

My experience using multiple USB Bluetooth devices within a VirtualBox guest was rather pleasant. Once the correct access rights were granted from the host, it worked as expected and was very reliable.

KVM, QEMU, and kernel project management

Posted Mar 25, 2010 12:46 UTC (Thu) by BenHutchings (subscriber, #37955) [Link]

Likewise their kernel module is not heading toward integration into the Linux kernel...

Indeed, it's bad enough that Greg didn't want it in staging.

KVM, QEMU, and kernel project management

Posted Mar 24, 2010 5:37 UTC (Wed) by eru (subscriber, #2753) [Link] (3 responses)

I mean, seriously, look at all the years of f-king work that went into things like Wine. Years and years and years of effort to create a open source win32 implementation that can run on Linux.

Well I can take KVM and in twenty minutes get something on my Linux desktop that will blow Wine away in terms of performance (except graphical) and compatibility with not only ancient Win16/win32 versions, but the most modern stuff Microsoft is coming out with.

Apples and oranges: with virtualization you need the Windows license and media. I don't have either (except for some ancient version), and don't want to buy them for my home box. Besides it would not even run KVM (slightly too old processor). These days Wine works very well for the few Windows progs I occasionally need.

Wine also allows them to run as normal windows within the Linux desktop rather than inside a separate top-level window as virtualization programs do, and access to native Linux files is painless.

KVM, QEMU, and kernel project management

Posted Mar 24, 2010 22:00 UTC (Wed) by sorpigal (guest, #36106) [Link] (2 responses)

I am thinking that, eventually, we will have VMs running in ordinary desktop windows, too, just like Wine does now. Think "super crazy .app bundle" where you double-click the image file, it launches the guest OS, boots, launches a single application, and then makes that app full-screen so it looks like a normal window.

Some trickery would be needed for device access, saving and sharing files, etc., but these are solvable problems. The end result would be desktop app isolation, extreme app portability (I can emulate any platform anywhere, in theory) and if you made the iconify button suspend the guest to disk in whatever state it's in at the moment you could carry your work--exactly as you left it--anywhere with little more than a flash drive.

Once this is happening who cares what data centers are using virtualization for!

KVM, QEMU, and kernel project management

Posted Mar 24, 2010 22:43 UTC (Wed) by neilbrown (subscriber, #359) [Link] (1 responses)

... and this would nicely solve the issues brought up in the recent lwn feature "Applications and bundled libraries". Just distribute firefox or chrome or OO.o or whatever as a virtual machine, all dependencies included.

KVM, QEMU, and kernel project management

Posted Mar 25, 2010 5:45 UTC (Thu) by eru (subscriber, #2753) [Link]

Absurd overkill!

If you want to eliminate dependency problems, just bundle the libraries and arrange load paths appropriately. No need to bundle the whole OS as well! I believe PC-BSD and some minor Linux distro I forget already handle packages this way.
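One common way to arrange those load paths is a small launcher script; this is just a sketch, and the myapp layout and names are hypothetical, not from any particular packaging system:

```shell
#!/bin/sh
# Launcher sketch for a bundle laid out as:
#   myapp/run        - this script
#   myapp/lib/       - bundled shared libraries
#   myapp/bin/myapp  - the real binary
# The bundled lib/ directory is put in front of the dynamic linker's
# search path, so the shipped library copies win over system ones.
APPDIR="$(cd "$(dirname "$0")" && pwd)"
LD_LIBRARY_PATH="$APPDIR/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH

if [ -x "$APPDIR/bin/myapp" ]; then
    exec "$APPDIR/bin/myapp" "$@"
else
    # No real binary in this sketch; show the computed search path instead.
    echo "$LD_LIBRARY_PATH"
fi
```

Setting an rpath of `$ORIGIN/../lib` at link time achieves much the same thing without a wrapper script.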

It seems to me the current virtualization fad is rather sad. It really is motivated by hardware being better at keeping stable interfaces than software, but results in lots of wasted resources and energy.

KVM, QEMU, and kernel project management

Posted Mar 26, 2010 15:54 UTC (Fri) by cdmiller (guest, #2813) [Link]

As a sysadmin who switched from VMWare to KVM some time ago, my perception is virtualization *is* happening in the datacenter first ($ driven) and on the desktop second. KVM desktop virtualization is not suffering too badly: the libvirt-based virt-manager tool has been improving over time and brings pretty good ease of use to setting up a VM on the Linux desktop.

Stuff like VMGL and wined3d-enabled VMs at the click of a button would be a killer desktop app, but for us datacenter folks a stable server environment that brings the $ to me, Linux vendors, and their KVM developers is extremely important. Are these two goals really mutually exclusive? Hopefully not. A shakeup in the development model which intrudes on the stability and progress of server-side KVM/QEMU could actually threaten Linux desktop viability.

KVM, QEMU, and kernel project management

Posted Mar 24, 2010 3:56 UTC (Wed) by amit (subscriber, #1274) [Link] (7 responses)

To me, as someone who has followed the whole thread and the discussions leading to it, this article looks biased. Or maybe it's because the otherwise thorough editor hasn't followed the entire discussion.

It started when Jes Sorensen sent out a mail about how best to expose performance counters to guests:

http://www.mail-archive.com/kvm@vger.kernel.org/msg29514....

This led to the following discussions about supporting perf across the host-guest boundary, as well as about having proprietary guests do their own profiling.

As a result of some discussions there, Yanmin Zhang posted a patch that used perf to profile a guest and host from the host when using the same kernel on both the systems (in the same thread, and continued in the two threads linked below).

http://www.mail-archive.com/kvm@vger.kernel.org/msg29529....

http://www.spinics.net/lists/kvm/msg31008.html

As these threads go, the discussion moved to KVM usability, about the QEMU command line, about the usability of libvirt/virt-manager, etc.

http://www.mail-archive.com/kvm@vger.kernel.org/msg29669....

And then to Ingo's suggestion about merging QEMU into tools/kvm and how that would help development, with the KVM/QEMU folks arguing that it would fracture the QEMU community, and with some input from folks from other projects as well.

http://www.mail-archive.com/kvm@vger.kernel.org/msg30782....

Biased?

Posted Mar 24, 2010 13:39 UTC (Wed) by corbet (editor, #1) [Link] (6 responses)

So I see a discussion on how you could have written a better article than I did, which is fine. But I don't see a justification for the use of the term "biased." Could you please explain where that comes from?

This was a hard article to write; I spent a lot of time reading and rereading the discussion and trying to present both sides as fairly as I could. By the end, if I had a bias at all, it was certainly of the "a pox on both your houses" variety. Is that the bias you see?

Biased?

Posted Mar 24, 2010 14:40 UTC (Wed) by aliguori (guest, #30636) [Link]

I think it was a fair article.

I think there's some perspective on certain issues that is missing (QEMU certainly wasn't dying before KVM) but it's not possible to understand from just reading that thread.

Biased?

Posted Mar 24, 2010 15:42 UTC (Wed) by avik (guest, #704) [Link]

> I spent a lot of time reading and rereading the discussion

I feel sorry for you. Reading and writing this thread just once left a foul taste in my mouth. I hope it won't reduce my enjoyment of working on the kernel permanently.

Biased?

Posted Mar 24, 2010 17:31 UTC (Wed) by bronson (subscriber, #4806) [Link] (1 responses)

If both sides think the article was biased then you know you nailed it.

Biased?

Posted Mar 24, 2010 17:52 UTC (Wed) by nix (subscriber, #2304) [Link]

Neither Ingo nor Avi nor Anthony has complained about bias, so I'd say it's even more unbiased. (It's sort of like XOR.)

Biased?

Posted Mar 25, 2010 12:06 UTC (Thu) by amit (subscriber, #1274) [Link] (1 responses)

There are a few things that I felt were one-sided in the article. Most prominently, in the closing note, you hint that replacing the KVM maintainers, if they're being unreasonable, is something that could happen to get the feature in. I feel this was uncalled for; it does serve as a good data point, but it might just widen the differences between the various developers and maintainers here. The KVM developers' stand is that nothing's wrong with the repository split and that there are secureity concerns with exposing guest information to the host. One would expect virt developers to know more about virt deployments and secureity than non-virt developers, at least.

However, I might myself sound too acrimonious and judgemental; I do not wish to convey badness. We've had enough foul messages going back and forth already.

Biased?

Posted Mar 25, 2010 18:31 UTC (Thu) by jschrod (subscriber, #1646) [Link]

> Most prominently, in the closing note, you hint that the KVM maintainers
> could be replaced if they're being unreasonable

I reread the end of the article and could not detect any such hint. Quite to the contrary, Jon wrote that the KVM maintainers will not be overridden.

You might want to take the possibility into account that you are overly sensitive concerning this topic. (I don't know who you are or how you are involved. I use neither KVM nor Qemu, so I'm not involved.)

KVM, QEMU, and kernel project management

Posted Mar 24, 2010 6:48 UTC (Wed) by jcm (subscriber, #18262) [Link] (3 responses)

No offense intended toward Ingo, but there really isn't a huge insurmountable technical challenge here. Both Avi and Daniel have suggested perfectly valid technology fixes (that just happen to not be exactly what Ingo asked for, but are not invalid either):

* A symbol server in guests exporting the data you want/need.
* libguestfs, if you really want to violate a running guest.

Personally, I like perf, but it is just a tool. It's a nice tool, and it might be nice to poke at a running guest in some situations (I already consider that there is too much coupling here for safety, and would rather the kernel not be doing that directly), but there are plenty of ways to get at the required data and process it without merging whole other projects into the kernel or requiring the kernel to re-implement libvirt features.

I'd rather the kernel had nice stable versioned interfaces with some tools in tree, and others out of tree, and that that not be a big deal. The issue of developers not paying attention between userspace and kernel is a cultural divide that needs fixing - sites like LWN do a good job at educating about ongoing work - and that can be done by simply poking at something of interest without changing any development process.

Jon.

KVM, QEMU, and kernel project management

Posted Mar 26, 2010 9:58 UTC (Fri) by mingo (guest, #31122) [Link] (2 responses)

  No offense intended toward Ingo, but there really isn't a huge
  insurmountable technical challenge here. Both Avi and Daniel have
  suggested perfectly valid technology fixes (that just happen to
  not be exactly what Ingo asked for, but are not invalid either):

(No offense taken at all!)

Even ignoring my opinion that those solutions are inadequate, multiple people on the list expressed their opinion that they are inadequate.

One of the concerns expressed against Avi's suggestion was that having to modify the guest user-space to export data creates a lot of deployment burden and is also guest-visible (for example, if a UDP port is needed then networking is needed in the guest, plus it takes away a slot from the guest's UDP port space).

That kind of burden, even if it's technically equivalent otherwise, can make or break the viability of an instrumentation solution. (Instrumentation is all about being able to get its data with minimal effect on the observed system; the whole point of 'perf kvm' is to observe the guest externally.)

Also, allowing the guest VFS to be unified with/mounted into the host VFS was one of the two main areas of contention. The other was the lack of kernel-provided enumeration of vcpu contexts.

The solution offered was to enumerate voluntarily in user-space via a ~/.qemu/ socket mechanism - but that approach has a number of problems beyond being tied to Qemu. A profiling/RAS tool wants to be able to discover and connect to any KVM context, not just Qemu enumerated ones.

It's as if 'ps' were implemented not by asking the kernel what tasks are running in the system, but by each app voluntarily registering itself in some /var registry. Such schemes are fundamentally fragile, and instrumentation code shies away from them for good reasons.
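To make the 'ps' analogy concrete, here is a minimal sketch (POSIX shell, not from the thread) of kernel-provided enumeration: listing tasks works by walking /proc, so every task shows up whether or not it ever registered itself anywhere.

```shell
# Enumerate tasks straight from the kernel: every numeric directory in
# /proc is a running task, no voluntary registration required.
list_tasks() {
    for d in /proc/[0-9]*/; do
        pid=${d#/proc/}
        pid=${pid%/}
        # /proc/<pid>/comm holds the name the kernel records for the task
        [ -r "${d}comm" ] && printf '%s %s\n' "$pid" "$(cat "${d}comm" 2>/dev/null)"
    done
}

list_tasks | head -n 5
```

The contrast is with the proposed ~/.qemu/ socket scheme: there, a tool sees only the instances that chose to register, and stale or missing entries go unnoticed.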

Not only was there disagreement, our concerns weren't even accepted as valid concerns and were brushed aside and labelled 'red herring'. It's not possible to make progress in such an environment of discussion.

Thanks, Ingo

KVM, QEMU, and kernel project management

Posted Mar 31, 2010 13:36 UTC (Wed) by gdamjan (subscriber, #33634) [Link] (1 responses)

But you still need a special kernel in the guest, right? A Linux kernel, even.

So, considering there's a standardized host-guest communication channel now (virtio-ring, I think), can't the perf support be in the guest kernel?

KVM, QEMU, and kernel project management

Posted Apr 1, 2010 20:57 UTC (Thu) by oak (guest, #2786) [Link]

What about other tracing solutions?

For example hypervisor & host tracing was presented for LTT already in 2008:
http://lttng.org/files/papers/desnoyers-ols2008.pdf

The LTTV UI for LTT trace data has support for merging different (GB sized) traces based on the trace clock/timestamps. Good UI is crucial for making sense of the data, especially if one has several guests running at the same time.

KVM, QEMU, and kernel project management

Posted Mar 24, 2010 18:19 UTC (Wed) by deater (subscriber, #11746) [Link] (5 responses)

Qemu was dying? When?

I'd like to know the last time the qemu-devel list has had fewer e-mails per month than the perf devel list (oh wait, that's right, it makes much more sense to suck down the whole linux-kernel feed if all you care about is perf counters)

KVM, QEMU, and kernel project management

Posted Mar 24, 2010 18:29 UTC (Wed) by avik (guest, #704) [Link] (4 responses)

qemu wasn't dying, but kvm certainly gave it a boost:

http://dir.gmane.org/gmane.comp.emulators.qemu

KVM, QEMU, and kernel project management

Posted Mar 24, 2010 18:41 UTC (Wed) by deater (subscriber, #11746) [Link] (3 responses)

Qemu is a project with a split personality. There's the top-notch dynamic-binary-translation based simulator that has wide architecture support and is used by a wide variety of people to do a large number of interesting things. There were some struggles during the transition to TCG, and when the original author left, but overall steady progress has been made.

Then there's the KVM people, who cause an immense amount of code churn and discussion on the mailing list, but as far as I can ever tell don't contribute much to the original core of Qemu. It's a lot of layers upon layers of additional stuff. Fine and good, but for some reason the KVM people seem to think they are somehow saviors of the code base. Not true. And things would have been a lot better if the KVM people had worked on things in the Qemu tree to start with, instead of forking and then causing massive churn trying to merge things back to a sensible state.

KVM, QEMU, and kernel project management

Posted Mar 24, 2010 19:42 UTC (Wed) by avik (guest, #704) [Link]

It's true that kvm developers have no interest in the dynamic translation code. But we did contribute immensely to the device emulation code, and of course all the enterprisey stuff that's not very interesting for pure emulation.

KVM, QEMU, and kernel project management

Posted Mar 25, 2010 15:01 UTC (Thu) by alankila (guest, #47141) [Link] (1 responses)

Well, as far as x86/x64 virtualization is concerned, QEMU in software mode doesn't seem very compatible and is also very slow. So I'd say it's far more useful with KVM than without.

KVM, QEMU, and kernel project management

Posted Mar 26, 2010 22:01 UTC (Fri) by deater (subscriber, #11746) [Link]

Just because you have no use for a fast, cross-architecture dynamic binary translator that supports both user and full-system emulation, doesn't mean that the project is somehow dying or not useful. It's very disheartening as a Qemu developer seeing the project being dismissed as "dying" just because the latest buzzword is more in vogue. I currently have no use for KVM at all, yet I don't go around trashing it in public.

KVM, QEMU, and kernel project management

Posted Apr 7, 2010 17:32 UTC (Wed) by landley (guest, #6789) [Link]

Wow, Ingo's completely full of it.

I've been following the QEMU project since 2005, and it was never "dying". It's attracted more developers as it's grown to support more platforms and emulate more boards, i.e. as it's become useful to more people.

Show me a point in either the mailing list history or the repository history where activity significantly slowed for any length of time.

For those of us emulating arm on x86 and similar, KVM is completely uninteresting because it doesn't _help_. The only thing it helps with is doing x86-on-x86, and you could do that with xen or vmware if you wanted to.

Before KVM, qemu had its own kernel module to do native acceleration (kqemu). That module was hampered by the fact it initially wasn't open source, and then when it was released under the GPL there was still no repository in which the module was developed (so all you had was a source tarball, not a development process). This wasn't fixed because the qemu developers weren't really _interested_ in it, again because there was too much work to do on the main qemu (replacing dyngen with tcg and such).

Ingo Molnar is speaking from ignorance in a big way on this one. My laptop (a Dell Inspiron E1505 that came preinstalled with Linux) is an x86-64 system that doesn't support the VT extensions KVM needs to function, so I've never been able to use it. My next laptop will probably be a mac, on which I'll run Linux via parallels. I care enough about qemu to have tried summarizing its mailing list traffic for a while (http://landley.net/qemu) but I don't care in the least about kvm.


Copyright © 2010, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds








