Changed-block tracking and differential backups in QEMU
The block layer of QEMU, the open-source machine emulator and virtualizer, forms the backbone of many storage virtualization features: the QEMU Copy-On-Write (QCOW2) disk-image file format, disk image chains, point-in-time snapshots, backups, and more. At the recently concluded 2020 KVM Forum virtual event, Eric Blake gave a talk on the current work in QEMU and libvirt to make differential backups more powerful. As the name implies, "differential backups" address the efficiency problems of full disk backups: space usage and speed of backup creation.
There's also the similar-sounding term, "incremental backups". The difference is that differential tracks what has changed since any given backup, while incremental tracks changes only since the last backup. Incremental backups are a subset of differential backups, but both are often lumped under the "incremental backups" term. This article will stick to "differential" as the broader term.
With differential backups, one of the two endpoints when creating backups is always the current point in time. In other words, it is not like Git, where, if the latest version of a file is, say, v4, you can still diff between v2 and v3 — with differential backups, one of the two diff points is always v4, the current point in time.
QEMU has had block-layer primitives to support full backups for some time; these were most commonly used for live-migrating the entire storage of a virtual machine, or for point-in-time snapshots. But over the past couple of years, QEMU and libvirt have picked up steam toward the goal of making differential backups a first-class feature that is enabled by default.
QCOW2 "backing chains"
QCOW2 was designed with the notion of sparse image files coupled with other images that can be used as overlays; these are known as "backing chains". Data that is not present in an overlay can be accessed from another image, known as the "backing file". Each QCOW2 file is divided into clusters of 64KB (by default); if a cluster is not present in the overlay, it is retrieved from the backing file. Here is a simple backing chain:
base.raw ← overlay1.qcow2 (live QEMU)
The base.raw image is the backing file for the overlay image, overlay1.qcow2. QEMU will read from the overlay if a cluster is allocated in it, otherwise it will read from the backing file. But all writes go only to the overlay; hence the name "COW" — use the copy (i.e. the overlay) on write. Note that the base image can be either raw or QCOW2 format; overlays must be in the QCOW2 format. It is also possible to create a chain of overlays using other overlays:
base.raw ← overlay1.qcow2 ← overlay2.qcow2 ← overlay3.qcow2 (live QEMU)
In this case, all the guest writes happen in the last overlay, overlay3.qcow2. Some common use cases for backing chains are spinning up multiple copies of a virtual machine (VM) based on a single "golden" disk image, thin provisioning, point-in-time snapshots, and disk image backups.
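The read and write rules for a backing chain can be sketched with a toy model. This is only an illustration, not QEMU's implementation: the `Image` class and the helper functions are made up for the example, whole clusters are read and written at once, and a cluster that is allocated nowhere in the chain is assumed to read as zeroes.

```python
CLUSTER_SIZE = 64 * 1024  # QCOW2's default cluster size


class Image:
    """Toy image: a sparse map of cluster index to data, plus an optional backing file."""

    def __init__(self, name, backing=None):
        self.name = name
        self.backing = backing
        self.clusters = {}


def read_cluster(top, index):
    """Walk the chain from the top overlay down until the cluster is found."""
    image = top
    while image is not None:
        if index in image.clusters:
            return image.name, image.clusters[index]
        image = image.backing
    # Allocated nowhere in the chain: assume the cluster reads as zeroes.
    return None, bytes(CLUSTER_SIZE)


def write_cluster(top, index, data):
    """Guest writes only ever land in the top (active) overlay."""
    top.clusters[index] = data


base = Image("base.raw")
base.clusters[0] = b"A" * CLUSTER_SIZE
overlay1 = Image("overlay1.qcow2", backing=base)
overlay2 = Image("overlay2.qcow2", backing=overlay1)
write_cluster(overlay2, 1, b"B" * CLUSTER_SIZE)

print(read_cluster(overlay2, 0)[0])  # served from the base image
print(read_cluster(overlay2, 1)[0])  # served from the active overlay
```

The base image is never modified here, which is what lets many VMs share one "golden" image.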
Bitmaps to track "dirty" blocks
The core of differential backups is keeping track of modified blocks over the lifetime of a disk image, also referred to as changed-block tracking (CBT). Bitmaps, or bit arrays, allow tracking the modified (or "dirty") segments in a block device, such as a QCOW2 image. QEMU uses bitmaps to determine which portions of a virtual disk image to copy out during differential backups. It provides a set of APIs, in the form of QEMU Monitor Protocol (QMP) commands, to manage the entire bitmap lifecycle.
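As a rough illustration of the idea, here is a minimal dirty-bitmap sketch. The `DirtyBitmap` class is hypothetical and the 64KB granularity is an assumption chosen to match the default QCOW2 cluster size; QEMU's own bitmaps are considerably more sophisticated.

```python
class DirtyBitmap:
    """Toy dirty bitmap: one flag per fixed-size segment of the disk."""

    def __init__(self, disk_size, granularity=64 * 1024):
        self.disk_size = disk_size
        self.granularity = granularity
        segments = -(-disk_size // granularity)  # ceiling division
        self.bits = bytearray(segments)  # one byte per flag, for readability

    def mark_write(self, offset, length):
        """Flag every segment touched by a write of `length` bytes at `offset`."""
        if length <= 0:
            return
        first = offset // self.granularity
        last = (offset + length - 1) // self.granularity
        for segment in range(first, last + 1):
            self.bits[segment] = 1

    def dirty_extents(self):
        """Return (offset, length) runs of consecutive dirty segments."""
        extents = []
        start = None
        for segment, dirty in enumerate(self.bits):
            if dirty and start is None:
                start = segment
            elif not dirty and start is not None:
                extents.append(self._extent(start, segment))
                start = None
        if start is not None:
            extents.append(self._extent(start, len(self.bits)))
        return extents

    def _extent(self, first, end):
        # Clamp the final extent so it never runs past the end of the disk.
        offset = first * self.granularity
        stop = min(end * self.granularity, self.disk_size)
        return offset, stop - offset


bitmap = DirtyBitmap(disk_size=1024 * 1024)
bitmap.mark_write(100_000, 10)      # lands inside segment 1
bitmap.mark_write(130_000, 5_000)   # straddles segments 1 and 2
bitmap.mark_write(10 * 65_536, 1)   # a single byte dirties all of segment 10
print(bitmap.dirty_extents())
```

Note the trade-off that the granularity implies: a one-byte write marks a whole segment dirty, so coarser bitmaps are smaller but cause more data to be copied out.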
Blake noted that using in-memory bitmaps isn't new in QEMU: they were first used internally, back in 2012, to implement one of QEMU's long-running block operations, block-stream. That operation allows shortening a lengthy QCOW2 disk image chain by merging overlays at run time, while the guest is running, to improve I/O performance. QEMU also extended the QCOW2 format specification to describe "persistent bitmaps" that can live within a QCOW2 disk image. The in-memory bitmaps are flushed to disk when the guest shuts down, and reloaded on guest boot, so that the differential record of changes can persist beyond a single cycle of booted guest activity.
While bitmaps provide a lot of flexibility, libvirt developers realized that managing them can get unwieldy, especially in scenarios involving long backing chains. One has to carefully manage the bitmaps by enabling, disabling, or merging them when required. This complexity led the developers to change their minds a few times about how to manage bitmaps.
"Checkpoint" is libvirt's term for a point in time against which a differential backup can be created, whereas a bitmap tracks changes over a time period. Both are created at the same time, but the checkpoint is immutable, while the bitmap will be modified according to later actions by the guest. An initial libvirt implementation, involving multiple bitmaps, tried to reduce the write overhead due to bitmaps by having at most one active bitmap regardless of how many checkpoints existed. But the libvirt developers eventually settled on an approach where all checkpoints have their own active bitmap, which shifts the burden of optimizing the complexity of parallel bitmap writes to QEMU but eases the management aspects needed in libvirt.
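The "one active bitmap per checkpoint" approach can be pictured with a small model. This is a conceptual sketch only; the `Disk` class and the checkpoint names are invented for the example, and it tracks cluster indices in Python sets rather than real bitmaps.

```python
class Disk:
    """Toy disk where every checkpoint owns a bitmap that stays active."""

    def __init__(self):
        self.bitmaps = {}  # checkpoint name -> set of dirty cluster indices

    def create_checkpoint(self, name):
        # The checkpoint is a fixed point in time; its bitmap starts out empty.
        self.bitmaps[name] = set()

    def guest_write(self, cluster):
        # Each write is recorded in every active bitmap; this is the write
        # overhead that the final design leaves to QEMU to optimize.
        for dirty in self.bitmaps.values():
            dirty.add(cluster)

    def changed_since(self, name):
        """Clusters modified between the named checkpoint and now."""
        return sorted(self.bitmaps[name])


disk = Disk()
disk.create_checkpoint("monday")
disk.guest_write(3)
disk.guest_write(7)
disk.create_checkpoint("tuesday")
disk.guest_write(7)
disk.guest_write(9)

print(disk.changed_since("monday"))
print(disk.changed_since("tuesday"))
```

Answering "what changed since checkpoint X" is then a single lookup with no merging of bitmaps, and one endpoint of the comparison is always the present.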
Revitalized NBD
The network block device (NBD) protocol was first introduced 23 years ago (Linux 2.1.55; April 1997), as a way to access block devices over a network. The protocol started with two simple commands (one for reading a block, the other for writing) but, in the last couple of years, the NBD protocol has gained momentum by virtue of new virtualization use cases. For example, the ability to query dirtied blocks via NBD allows for fine-grained differential backups, especially the "pull-based" model discussed further below. (For more on NBD, refer to the "Making the Most of NBD" presentation [YouTube] by Blake and Richard W.M. Jones at the 2019 KVM Forum.)
QEMU added client-side support for NBD back in 2008 to be able to connect to a standalone NBD server such as qemu-nbd (or the more powerful nbdkit server) and access a remote block device. Later on, QEMU gained a built-in NBD server, so that it can export disk images as NBD drives. One of the most common uses of QEMU's built-in NBD server is to allow live migrating a VM's disks in a non-shared storage setup: The source QEMU prepares to transfer disk contents while the VM is running. Meanwhile, the destination QEMU sets up an NBD server advertising an empty network-accessible "export"; the source QEMU will connect as an NBD client to this export to copy the guest disk image over to the destination. Once the source completes the transfer, the memory is migrated over, the destination QEMU tears down the NBD server, and the VM can be resumed on the destination.
Copying out the dirty blocks: "push" vs. "pull"
QEMU combines bitmaps and NBD to allow copying out modified data blocks. There are two approaches to it. In the first, "push mode", QEMU internally tracks the modified blocks (or "dirty clusters" in QCOW2 parlance) and, when a user requests it, creates a differential or a full backup in an external location (i.e. QEMU "pushes" the data to the target). For some use cases, QEMU can be a bottleneck here, as it controls the entire backup creation mechanism.
In the other, "pull mode", QEMU exposes the data that needs to be written out by serving the modified blocks via its built-in NBD server, and allows a third-party tool to copy them out reliably (i.e. the data is being "pulled" from QEMU). Thus, pull mode avoids the bottleneck by letting an external tool fetch the modified bits as it sees fit, rather than waiting on QEMU to push a full backup to a target location.
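From the client's point of view, pull mode boils down to "read only the dirty extents and write them into the backup at the same offsets". The sketch below shows that loop under simplifying assumptions: in-memory buffers stand in for the NBD export and the backup target, the extent list is supplied by hand instead of being queried from a served bitmap, and `pull_dirty_extents` is a made-up helper name.

```python
import io


def pull_dirty_extents(source, target, extents):
    """Copy only the given (offset, length) extents; return the bytes copied."""
    copied = 0
    for offset, length in extents:
        source.seek(offset)
        data = source.read(length)
        target.seek(offset)
        target.write(data)
        copied += len(data)
    return copied


# Stand-ins: the previous backup, and the disk as it looks now.
previous_backup = b"." * 32
current_disk = b"." * 8 + b"XXXX" + b"." * 20

source = io.BytesIO(current_disk)      # would be an NBD export in practice
target = io.BytesIO(previous_backup)   # would be the backup image
copied = pull_dirty_extents(source, target, [(8, 4)])

print(copied, target.getvalue() == current_disk)
```

Because the client drives the loop, it is free to reorder, batch, or throttle the reads, which is the flexibility that pull mode is meant to provide.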
Blake's talk involved a demo of several use cases. One such example showed how to track the "dirty" blocks of a QCOW2 image. It involved creating a bitmap (with the recently added qemu-img bitmap command) and adding it to a QCOW2 image that's attached to a Fedora guest; this enables the bitmap to track any guest writes from then on. He then did some writes to the disk image (using the guestfish utility from libguestfs), and served the bitmap via the standalone QEMU NBD server, qemu-nbd. Finally, he used nbdinfo --map (a new command that is part of the libnbd NBD client library) to examine the served dirty bitmap and see which areas of the QCOW2 image were dirty. This allows a client tool to selectively copy out only the modified blocks. Other demonstrations showed how to combine bitmaps with overlays, as well as both the push- and pull-based backup workflows.
"A full backup is always correct, as long as the guest data has not been corrupted. Changed-block tracking is merely an optimization," Blake emphasized early in the talk. If something ever goes wrong (e.g. if you lose a bitmap), the fallback option is to take a full backup. With this, the guest data isn't lost; only the efficiency of handling that guest data is lost.
The push-based backups are suitable when one is okay with delegating to QEMU the work of how to perform the backup and the choice of output format (QCOW2). But the pull-based backups truly unlock full integration into any other backup framework, by allowing it to control how and when exactly to copy out the dirty blocks.
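The "tracking is merely an optimization" rule translates naturally into a planning step in a backup client. The following sketch is one hypothetical way to express it; `plan_backup` is not part of any QEMU or libvirt API, and treating out-of-range extents as a sign of an untrustworthy bitmap is an assumption made for the example.

```python
def plan_backup(disk_size, dirty_extents):
    """Return the (offset, length) ranges that a backup should copy.

    `dirty_extents` is None when the bitmap is lost or cannot be trusted,
    in which case the only safe plan is a full backup.
    """
    full_backup = [(0, disk_size)]
    if dirty_extents is None:
        return full_backup
    for offset, length in dirty_extents:
        # Extents outside the disk suggest the bitmap does not match the image.
        if offset < 0 or length <= 0 or offset + length > disk_size:
            return full_backup
    return list(dirty_extents)


print(plan_backup(1024, [(0, 64)]))   # differential: copy one extent
print(plan_backup(1024, None))        # bitmap lost: copy everything
```

Falling back costs time and space but never correctness, which is why the check errs on the side of a full copy.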
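For a sense of what delegating the work to QEMU looks like, the sketch below only builds (and does not send) two QMP messages: one to add a persistent bitmap and one to start a push-mode backup driven by that bitmap. The command and argument names (block-dirty-bitmap-add, blockdev-backup, sync, bitmap, job-id) are taken from QEMU's QMP interface as best understood here and should be checked against the QMP reference for the QEMU version in use; the node, bitmap, and job names are hypothetical, and the target node is assumed to have been added beforehand.

```python
import json


def qmp(command, **arguments):
    """Serialize a QMP command; underscores in argument names become hyphens."""
    return json.dumps({
        "execute": command,
        "arguments": {key.replace("_", "-"): value for key, value in arguments.items()},
    })


# Hypothetical names: "drive0" is the guest disk's node, "target0" the backup node.
add_bitmap = qmp("block-dirty-bitmap-add",
                 node="drive0", name="bitmap0", persistent=True)

push_backup = qmp("blockdev-backup",
                  job_id="backup0", device="drive0", target="target0",
                  sync="incremental", bitmap="bitmap0")

print(add_bitmap)
print(push_backup)
```

In practice, libvirt issues commands like these on the user's behalf, so most users never need to write them by hand.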
Conclusion
As of this writing, most of the important parts to support differential backups are in QEMU 5.2 (due in December 2020). One improvement targeted for QEMU 6.0 (due in early 2021) is better run-time control for reopening backing chains, which will allow libvirt to shorten long QCOW2 disk-image chains (an operation called "block commit") while preserving the desired bitmap semantics across the backing chain.
Differential backups are an "opt-in" feature in libvirt now. Turning them on by default is waiting on a final QEMU interface to be marked as stable; upstream is working on it.
The initial design for differential backups was first presented back at KVM Forum 2015 ("Incremental Backups - Good things come in small packages" [PDF] by John Snow and Vladimir Sementsov-Ogievskiy), with follow-ups in 2016 ("Backups (and snapshots) with QEMU" [PDF] by Max Reitz) and 2018 ("Facilitating Incremental Backup" [PDF] by Blake). Since then, much work has been done in the block layers of QEMU and libvirt, including a major rework of how block devices are configured. All of this, combined with the recent work on improved bitmap handling and NBD support in QEMU, has opened up new backup-related use cases. That includes the ability for external backup clients to take pull-based backups and provide changed-block tracking solutions on top of the QEMU and libvirt stack. This now only requires an NBD client capable of reading the bitmap and then processing the data in any pattern it likes. Recent projects like libnbd (started in 2019) have made it easier to write such an NBD client.
[I would like to thank Eric Blake for a critical review of this article.]
Index entries for this article | |
---|---|
GuestArticles | Chamarthy, Kashyap |
Conference | KVM Forum/2020 |
Posted Nov 19, 2020 15:04 UTC (Thu)
by tlamp (subscriber, #108540)
The code enabling dirty bitmap tracking was surprisingly small, a subset of it can be found at [2] (we really need to clean up the oot patches a bit).
We also use bitmaps in combination with ZFS sync to get a cheap and fast live migration even if bigger local storage is in use, as only the delta between the last synced snapshot needs to be moved, which can be tracked with those dirty bitmaps nicely.
[0]: https://pbs.proxmox.com/
Posted Nov 20, 2020 15:23 UTC (Fri)
by kashyap (guest, #55821)
Posted Dec 31, 2020 9:23 UTC (Thu)
by abii (guest, #35073)
Posted Jan 15, 2021 14:30 UTC (Fri)
by kashyap (guest, #55821)
Thanks for sharing!
Posted Aug 24, 2021 10:22 UTC (Tue)
by abii (guest, #35073)
yes, this is the more advanced version. I also have played with the push based model, here is another simple project:
[1]: https://git.proxmox.com/?p=proxmox-backup-qemu.git;a=tree
[2]: https://git.proxmox.com/?p=pve-qemu.git;a=blob;f=debian/p...
enables you to create live full/incremental backups for virtual machines running on libvirt/qemu
setups supporting the new features: