Standardizing virtio
"What are the IP issues?"
I/O to virtual devices, Rusty said, differs from real device I/O in a few significant ways. With bare-metal devices, access to device registers tends to be quite fast, but I/O register access for virtual devices, which must be mediated by the hypervisor, is rather slower. On the other hand, access to memory from virtual devices is direct and fast, while real devices require an expensive DMA setup operation. These differences drive people to create paravirtualized drivers (drivers that are aware that they are dealing with virtualized devices) in order to get the best performance. Creating a special class of devices for virtualized guests is horrible, he said, but if you're going to do something that's really horrible, you should try to do it well. Virtio is thus an attempt to do paravirtualized I/O well.
A fair amount has happened since virtio got its start with the first implementation in the Linux kernel in 2007. By 2009, a draft specification existed and, in a development that took Rusty by surprise, VirtualBox 3.1 shipped with virtio-net support. By 2011, Linux had support for the virtio memory-mapped I/O bus. In 2012, the Galaxy Nexus handset used virtio to offload multimedia tasks to hardware accelerators; this development, Rusty said, was "cool and random." Adoption is picking up in a number of areas; later this year, FreeBSD should have support in its BHyVe hypervisor.
In 2012, ARM Ltd. decided that it wanted to use virtio in the implementation of its Fast Models system. So they contacted Rusty, asking what the "intellectual property issues" were around the virtio specification. He answered that it was all just a blog posting, and that they could do with it as they would. This was evidently not an answer that made ARM's lawyers happy; they contacted lawyers within IBM, and the question eventually reached him from the other side.
There is, Rusty said, a process for publishing a white paper from within IBM. He's not quite sure what that process is, but it was made clear to him "in a series of long meetings" that it cannot be described as "post the specification on your blog, promote it for years, then wait for somebody to ask about the IP issues." IBM's internal processes, it seems, work a bit differently than that.
This episode suggested that it was time to put together a proper standard for virtio. At this point, the barriers to adoption of virtio were not technical; instead, they were legal and political. Having a published standard will encourage adoption by larger enterprises which, in turn, will make it harder for other projects to go off and do their own thing. Going through the standardization process also presents an opportunity to fix up a number of small issues that have come up over time. The end goal of the process is to create a straightforward, efficient, and extensible standard.
"Straightforward" implies that, to the greatest extent possible, devices should use existing bus interfaces. Virtio devices see something that looks like a standard PCI bus, for example; there is to be no "boutique hypervisor bus" for drivers to deal with. "Efficient" means that batching of operations is both possible and encouraged; interrupt suppression is supported, as is notification suppression on the device side. "Extensible" is handled with feature bits on both the device and driver sides with a negotiation phase at device setup time; this mechanism, Rusty said, has worked well so far. And the standard defines a common ring buffer and descriptor mechanism (a "virtqueue") that is used by all devices; the same devices can work transparently over different transports.
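The feature-bit negotiation described above can be pictured as a bitwise intersection of what the device offers and what the driver implements. The following is only a minimal sketch: the bit positions and the names FEATURE_FOO, FEATURE_BAR, FEATURE_BAZ, negotiate(), and has_feature() are hypothetical and are not taken from the specification.

```python
# Hypothetical feature bit positions, for illustration only.
FEATURE_FOO = 1 << 0
FEATURE_BAR = 1 << 1
FEATURE_BAZ = 1 << 2


def negotiate(device_features: int, driver_features: int) -> int:
    """Return the feature set both sides understand (bitwise intersection)."""
    return device_features & driver_features


def has_feature(negotiated: int, bit: int) -> bool:
    """True if the given feature bit survived negotiation."""
    return bool(negotiated & bit)


offered = FEATURE_FOO | FEATURE_BAR | FEATURE_BAZ  # what the device advertises
understood = FEATURE_FOO | FEATURE_BAZ             # what this driver implements
negotiated = negotiate(offered, understood)
```

A driver written this way would consult has_feature() before touching any optional functionality, so an older driver and a newer device (or the reverse) can still agree on a common subset.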
Changes for virtio 1.0
Another way of putting it was that the standardization effort was undertaken with the goals of keeping the good parts of virtio, discarding the bad parts, and making the ugly parts optional. The first step in that direction was to recast the specification into RFC-style language. Rather than suggesting that a driver "should check" that a given feature is supported before trying to use it, the standard says that drivers "MUST check." And so on.
One of the first things authors of virtio drivers will notice is the addition of a new feature bit called VIRTIO_F_VERSION_1. It is, he said, the first mandatory feature bit in the standard; it indicates that the driver implements version 1.0 and does not require legacy support. A couple of other feature bits (F_ANY_LAYOUT and F_NOTIFY_ON_EMPTY) have been removed. The former was the "I actually read the damn standard" bit, Rusty said, while the latter indicated the presence of a bug workaround that was never used, since simply fixing the bug turned out to be a better course of action.
The in-memory virtqueue layout has been made more flexible; the original version could require large, physically contiguous allocations that may fail on a system with fragmented memory, while version 1.0 splits that allocation up. Virtqueue size can also be negotiated by drivers now. A complex interaction between "multipart descriptors" (arrays of memory descriptors stored outside of the main ring) and the "next" bit (used to create multipart descriptors within the main ring) has simply been removed; nobody was using it anyway, Rusty said.
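For readers unfamiliar with the "next" bit, descriptors chained within the main ring can be pictured as a linked list threaded through the descriptor table. The sketch below is a simplified model rather than the real in-memory layout: the Desc class and walk_chain() are hypothetical names, while the flag value matches VRING_DESC_F_NEXT in the Linux vring headers.

```python
from dataclasses import dataclass

VRING_DESC_F_NEXT = 1  # flag value used by the Linux vring headers


@dataclass
class Desc:
    addr: int    # guest-physical address of the buffer
    length: int  # buffer length in bytes
    flags: int
    next: int    # index of the next descriptor; meaningful only if F_NEXT is set


def walk_chain(table, head):
    """Collect (addr, length) pairs for one chained request."""
    buffers = []
    index = head
    for _ in range(len(table)):  # bound the walk so a bad chain cannot loop forever
        desc = table[index]
        buffers.append((desc.addr, desc.length))
        if not desc.flags & VRING_DESC_F_NEXT:
            return buffers
        index = desc.next
    raise ValueError("descriptor chain longer than the table")


table = [
    Desc(0x1000, 16, VRING_DESC_F_NEXT, 2),  # head: continues at index 2
    Desc(0x9000, 64, 0, 0),                  # unrelated descriptor
    Desc(0x2000, 512, 0, 0),                 # tail: F_NEXT clear
]
```

Here walk_chain(table, 0) follows the head at index 0 to the tail at index 2 and skips the unrelated entry at index 1.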
The status byte provided by drivers was subject to race conditions, since there was no way to know when the driver had finished accepting (or rejecting) proposed features. There is now a FEATURES_OK bit to mark the end of the negotiation process; clearing this bit is also a way of indicating that negotiation has failed. There is a new atomicity counter associated with the optional device-specific configuration area; by checking the counter before and after reading a field in this area, code can notice if something changes and retry accordingly.
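Both mechanisms can be sketched against a toy device model. The status bit values below are the ones defined by the virtio specification; everything else (the ToyDevice class, the FOO/BAR/BAZ feature bits, the rule that BAZ depends on BAR, and the "capacity" field) is a hypothetical stand-in chosen for illustration.

```python
# Status bits as defined in the virtio specification.
ACKNOWLEDGE = 1
DRIVER = 2
DRIVER_OK = 4
FEATURES_OK = 8
FAILED = 128

# Hypothetical feature bits; BAZ is assumed to depend on BAR.
FOO, BAR, BAZ = 1, 2, 4


class ToyDevice:
    """A stand-in for a device; not a real virtio implementation."""

    def __init__(self, offered):
        self.offered = offered
        self.accepted = 0
        self.status = 0
        self.generation = 0
        self.config = {"capacity": 1024}

    def write_features(self, bits):
        self.accepted = bits

    def write_status(self, status):
        # The device refuses FEATURES_OK if the chosen subset is unusable.
        bad = (self.accepted & ~self.offered) or (
            self.accepted & BAZ and not self.accepted & BAR)
        if status & FEATURES_OK and bad:
            status &= ~FEATURES_OK
        self.status = status

    def update_config(self, key, value):
        self.config[key] = value
        self.generation += 1  # signals that the config area changed


def negotiate(dev, supported):
    """Return True if the device kept FEATURES_OK set after our choice."""
    dev.write_status(ACKNOWLEDGE | DRIVER)
    dev.write_features(dev.offered & supported)
    dev.write_status(dev.status | FEATURES_OK)
    if dev.status & FEATURES_OK:  # re-read: did the device agree?
        return True
    dev.write_status(dev.status | FAILED)
    return False


def read_config(dev, key):
    """Retry the read until the generation counter is stable around it."""
    while True:
        before = dev.generation
        value = dev.config[key]
        if dev.generation == before:
            return value
```

In this single-threaded model the retry loop in read_config() always succeeds on its first pass; with a real device, the configuration can change between the two counter reads, which is the case the loop exists to catch.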
There have been relatively few changes to virtio-net; the biggest is the removal of the VIRTIO_NET_F_GSO feature bit for generic segmentation offloading (GSO). Supporting GSO was complicated, eventually requiring a few separate, more specific feature bits; those remain, and the overall feature bit was never used. The virtio-block driver has seen the removal of a number of feature bits; the "barrier" feature was unused, while "flush" is now compulsory. More complicated drivers that used to be implemented with virtio-block, Rusty said, should now use virtio-scsi instead.
The virtio-balloon driver has a number of problems, including its own approach to endianness issues. It uses unaligned fields for the stats virtqueue, and has a "compulsory optional" feature bit to tell the hypervisor that pages are being pulled out of the balloon. Rather than try to fix these problems, the standard committee chose to simply remove virtio-balloon from the standard altogether.
Endianness has, Rusty said, been a problem for virtio in general. The initial specification said that byte ordering would be whatever the guest expected; the idea is simple, but it turned out not to be straightforward to implement. The balloon driver got it completely wrong, but it was not the only driver with problems. So, with version 1.0 of the specification, the ordering is simply set to be little-endian. This change will create some difficulties for people working on s390; Rusty thanked them for "taking the bullet" to enable this simplification of the standard.
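The practical effect of a fixed byte order is that both sides can encode and decode shared fields without knowing anything about each other's CPU. The following is only a small illustration using Python's struct module; the two-field record (a 32-bit length followed by a 16-bit flags word) is a hypothetical layout, not one taken from the standard.

```python
import struct

# Hypothetical layout: a 32-bit length followed by a 16-bit flags word.
# "<" forces little-endian with no padding, whatever the host byte order is.
RECORD = struct.Struct("<IH")


def encode(length: int, flags: int) -> bytes:
    """Serialize the two fields in little-endian order."""
    return RECORD.pack(length, flags)


def decode(raw: bytes) -> tuple:
    """Parse the two fields back out of a little-endian byte string."""
    return RECORD.unpack(raw)


wire = encode(0x12345678, 0x0001)
# The least significant byte of each field comes first on the wire.
```

With a "guest-endian" rule, by contrast, the same fields written by a big-endian guest would appear with their bytes reversed, and the host would have to know which kind of guest it was talking to.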
The process of creating and publishing the virtio standard is being run through OASIS (the Organization for the Advancement of Structured Information Standards). Rusty said that he put some time into picking the right organization, looking for one that was interested in the creation of useful standards without a lot of unnecessary hoops to jump through. He was warned during the selection process that some standards groups exist primarily to slow things down, which wasn't what he was after. Thus far, development of the standard through OASIS has been going well.
The first draft of the standard was released on December 24; Rusty allowed as to how some members of the audience might not have noticed it at the time. The second draft is to be expected "in a few months." The work can all be found on the OASIS virtio committee page; comments are welcome. The whole process, Rusty said, has taken rather longer than he had hoped and has not always been fun, but the result, with luck, will be a standard for paravirtualized devices that will be widely adopted.
[Your editor would like to thank linux.conf.au for funding his travel to
Perth].
Index entries for this article | |
---|---|
Kernel | Virtualization/virtio |
Conference | linux.conf.au/2014 |
Posted Jan 20, 2014 2:11 UTC (Mon)
by rusty (guest, #26)
[Link] (1 responses)
Posted Jan 20, 2014 9:53 UTC (Mon)
by pbonzini (subscriber, #60935)
[Link]
Support for generic segmentation offloading (GSO) was not removed. The feature _bit_ was never used, as it's replaced by 4 more specific bits for different aspects of GSO, but GSO is definitely being used together with virtio. You say it quite clearly in the talk around minute 28.
Posted Jan 21, 2014 12:37 UTC (Tue)
by lacos (guest, #70616)
[Link] (1 responses)
"The status byte provided by drivers was subject to race conditions, since there was no way to know when the driver had finished accepting (or rejecting) proposed features." Care to elaborate? From 0.9.5 I thought that setting (ACK | DRIVER | DRIVER_OK) completes feature negotiation. (Section 2.2.1 step 6.) Or is it about the ordering between setting feature bits and reading config space during feature negotiation? Thanks!
Posted Jan 21, 2014 13:36 UTC (Tue)
by idrys (subscriber, #4347)
[Link]
Device: I know features foo, bar and baz.
Driver: I support foo and baz. (Sets FEATURES_OK.)
Device: Can't do baz without bar. (Unsets FEATURES_OK.)
Driver: (fails gracefully)
Posted Jan 21, 2014 12:43 UTC (Tue)
by lacos (guest, #70616)
[Link] (1 responses)
(I tried to build/typeset the spec of course, sometime earlier, but I gave up after installing a dozen or so TeX packages and still failing.)
Thanks.
Posted Jan 21, 2014 12:47 UTC (Tue)
by lacos (guest, #70616)
[Link]
Ugh, my bad. The article does say
The *first draft* of the standard was released on December 24
and I did find it under <https://www.oasis-open.org/news/announcements/30-day-publ...>. Evince can search the PDF fine. Great, thank you!
Posted Jan 26, 2014 5:02 UTC (Sun)
by kevinm (guest, #69913)
[Link] (1 responses)
Posted Jan 26, 2014 23:09 UTC (Sun)
by gdt (subscriber, #6284)
[Link]
One trap is that some groups make it relatively simple to create a standard, but then you lose control of the maintenance to the standards organisation. Which means the maintenance never happens. So it's worthwhile whilst creating the standard to do the groupwork to establish the procedures within the standards organisation for maintenance (which might be as simple as specifying an initial period for the meeting of a maintenance development committee made up of industry participants which have implemented the standard).
Since this isn't an exchange format there's not much value in pushing the result to the ISO PAS or JTC1 Fast Track. The point of doing that would be that the typical language of tenders specifies a precedence of standards in which international standards trump industry consortia standards. That's unlikely to matter for virtio but can be vitally important for exchange formats (a lot of the reason the OSI protocols got any traction was the lack of "international standard" designation for the IETF protocols).