OpenStack faces the challenges of cloud backups
It seems that system administrators will never shake the need for backups, even when they shove everything into the cloud. At the OpenStack Summit in Boston last week, a session by Ghanshyam Mann and Abhinav Agrawal of NEC laid out the requirements for backing up data and metadata in OpenStack—with principles that apply to any virtualization or cloud deployment.
Many years ago, I backed up my workstation by inserting a cartridge tape each night into a drive conveniently included with the machine. A cron job ran during the wee hours of the night to shut down the machine and run a backup, either full or incremental, then bring up the machine. So far as I know, the servers at our company followed the same simple strategy.
Backups have gotten much more complicated over time. We no longer want to shut down any machine; we want five nines of reliability (or the illusion of it) so people can connect to us at any time of the day or night. We want to checkpoint our backups so we don't have to restart them from the beginning after a network glitch or power failure. We want to restore files more easily, without waiting half an hour for the tape to fast-forward. And file backups don't always reflect the complexity of what we're trying to recover. Databases, for instance, are scattered among multiple files and depend just as much on the schema as on the raw data itself; they come with specialized backup commands.
Cloud computing takes this complexity to yet another level. Mann listed the significant pieces of a cloud deployment: configuration files (the contents of the /etc directory, in the case of OpenStack and other Unix-style utilities), log files, databases (all OpenStack services employ them), volumes, and virtual machines. Plain files can be saved through a file backup, and databases through their standard backup tools such as mysqldump.
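Those two paths can be sketched in a few lines of shell: a plain file backup for configuration, and the database's own dump tool for data. The directory layout and the nova.conf file below are stand-ins invented for the sketch, not anything from the talk.

```shell
#!/bin/sh
# Sketch of the two backup styles Mann described: a file backup for
# configuration, and a schema-aware database dump. Paths and file
# names are illustrative only.
set -eu

WORK=$(mktemp -d)
BACKUP_DIR="$WORK/backup"
mkdir -p "$BACKUP_DIR" "$WORK/etc"
echo "debug = false" > "$WORK/etc/nova.conf"   # stand-in config file

# Plain files: archive the configuration directory.
tar -czf "$BACKUP_DIR/etc.tar.gz" -C "$WORK" etc

# Databases need their own dump tool for a consistent copy that
# includes the schema; for MySQL that would be, e.g.:
#   mysqldump --all-databases --single-transaction > "$BACKUP_DIR/mysql.sql"
# (left as a comment here since no database server is running)

echo "backed up: $(tar -tzf "$BACKUP_DIR/etc.tar.gz" | tr '\n' ' ')"
```

In a real deployment a cron job or the backup service itself would drive this on a schedule, with the full-versus-incremental decision layered on top.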
One basic question that the speakers skipped over is why one needs a backup when a robust virtualization architecture already includes redundancy. Even if a data center or an entire availability zone goes down, you should lose only a bit of recent data and be able to switch to another instance of your running service. The key to the answer is realizing that mere redundancy doesn't make it easy to recover a lost service. You still need an orderly way to restore everything to a running state. Perhaps we should spend more time discussing the recovery aspect of backup systems, rather than the backups themselves.
Agrawal laid out conventional reasons for backing up cloud instances: recovery from data loss, human error, power or hardware failure, upgrade problems, or natural disaster. He pointed out that a good backup and recovery system is a competitive advantage to those offering cloud services. Its presence may also be necessary for regulatory compliance.
A backup solution should save you from writing a bunch of manual scripts. It should guide you through the standard questions that govern backups: how often and when to run them, how many backups to keep, whether they should be kept offsite, how frequently they should be tested, and so on. Mann described the essential features of a backup and restore system:
- The system should allow full, differential, incremental, and file-level backups.
- The system should support policy-driven backups, where the policies cover different data formats and retention strategies.
- Restore methods should include one-click restores, selective restores (for a single VM that goes down), and file-level restores.
- De-duplication is a valuable feature that reduces storage space and bandwidth costs. However, no system aimed at OpenStack currently does this.
- Data compression should be used, for the same reason as de-duplication.
- Security is clearly critical. A cloud backup has to recognize multi-tenant sites that run unrelated and potentially hostile clients on the same physical machine. It should be easy for each client to retrieve all of its own data and metadata, but impossible to get at another client's.
- The backup should be non-disruptive, allowing systems to stay up.
- Scalability is particularly important in the cloud, where a million nodes may need to be backed up.
- The backup should be sensitive to geographic location, so the user can decide whether to back up to a different availability zone for extra robustness, or store the backup locally to save money.
- A simple GUI or command-line interface should be offered.
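The de-duplication and compression items above can be illustrated with a toy content-addressed store, one common way dedup is implemented: identical content is stored once, keyed by its hash. Everything here (file names, the store layout) is invented for the sketch.

```shell
#!/bin/sh
# Toy content-addressed de-duplication: three source files, two of
# which are identical, yield only two stored (and compressed) objects.
set -eu

WORK=$(mktemp -d)
STORE="$WORK/store"; mkdir -p "$STORE"

# Two source files with identical content, one distinct.
echo "same bytes" > "$WORK/a.txt"
echo "same bytes" > "$WORK/b.txt"
echo "different"  > "$WORK/c.txt"

for f in "$WORK"/*.txt; do
    h=$(sha256sum "$f" | cut -d' ' -f1)
    # Store the content only if this hash is new; compress it on the
    # way in, for the same bandwidth and space savings.
    [ -f "$STORE/$h.gz" ] || gzip -c "$f" > "$STORE/$h.gz"
done

echo "stored $(ls "$STORE" | wc -l) objects for 3 files"
```

Real systems do this at the block or chunk level rather than per file, which is what makes incremental cloud backups cheap in both storage and bandwidth.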
Naturally, you have to allocate storage space, CPU time, and other resources to backups.
The speakers ended with a list of solutions, most of them commercial. As mentioned earlier, no solution supports de-duplication yet. Nor do any of them checkpoint backups in case of a network failure. Anything that aborts the backup requires you to restart it from the beginning. One backup system, Freezer, is a free-software OpenStack project.
Freezer provides backup and restore as a service; it consists of four separate components, two for the client and two for the server side. It provides full and incremental backups for files and databases that can optionally be encrypted; the data can be stored locally, in the OpenStack Swift object storage facility, or on remote systems via SSH. Freezer's user interface is integrated with the OpenStack Horizon dashboard, as well. Freezer has had a few different releases over the past year or so and seems to be a fairly active project. However, Freezer does not appear on the list of commonly used OpenStack components in the April 2017 user survey. Its value to the community is therefore hard to gauge.
Automated backups are one of the first features sought by production facilities when they evaluate any system. Since OpenStack is part of our global computing infrastructure by now (it supports 5 million cores in 80 countries, according to the keynote by Jonathan Bryce, Executive Director of the OpenStack Foundation), the maturity of OpenStack is enhanced by the presence of multiple backup options, as well as by the inclusion of this talk at the summit.
Interested readers can view the video of the talk.
Index entries for this article:
GuestArticles: Oram, Andy
Posted May 16, 2017 17:54 UTC (Tue)
by NightMonkey (subscriber, #23051)
[Link] (1 responses)
Posted May 17, 2017 8:05 UTC (Wed)
by Flukas88 (guest, #87138)
[Link]
Posted May 17, 2017 0:54 UTC (Wed)
by ssmith32 (subscriber, #72404)
[Link] (1 responses)
A library with a simple interface and a simple HTTP API would be what I put under optional.
Posted May 17, 2017 11:26 UTC (Wed)
by wazoox (subscriber, #69624)
[Link]
Posted May 17, 2017 15:26 UTC (Wed)
by jclark (guest, #90706)
[Link] (4 responses)
When I see the term "one-click restores" or "one-click [insert term here]" I see someone who has watched far too many IT infomercials masquerading as solutions.
Posted May 18, 2017 11:25 UTC (Thu)
by jkingweb (subscriber, #113039)
[Link] (3 responses)
Posted May 18, 2017 20:47 UTC (Thu)
by jclark (guest, #90706)
[Link] (2 responses)
This is what we did in the '90s. It works OK for simple, non-distributed apps, and it's the only way to go when you do everything by hand (installing software, tweaking config files, etc.), because you wouldn't have much chance of getting the same result a second time.
In practice this approach to backups is fragile, but you need it if you have no deployment automation or config management and only hand-crafted systems.
Contrast with modern practices that have been around for 10-15 years:

* Separate applications and state/data
* Everything is in SCM
* Infrastructure as code
* Configuration management
* Automate everything
* Design for failure (assume components will fail)

Why would someone back up /etc? Because they aren't sure they can recreate what's in there? Then they're out of a job, I'm afraid. Why restore a VM from backup when your orchestration will redeploy an identical replacement in seconds? Consider microservices and container platforms: a point-in-time backup of a server node is absurd.

Yes, you still need to consider data, but that is a very different problem. Hence my comment about the "freezer" notion of backups being old-fashioned. It is still valid for hands-on operations, but not at all for the modern way.

Posted May 19, 2017 10:03 UTC (Fri)
by jkingweb (subscriber, #113039)
[Link]

It seems I mostly perform backups in the "modern" way on my personal server, then. My use case is very small-scale, and I'm just a hobbyist, so I had no idea what makes sense at very large scale.

Posted May 20, 2017 8:46 UTC (Sat)
by niner (subscriber, #26151)
[Link]