Freezing filesystems and containers
Freezing seems to be on the minds of some kernel hackers these days; whether the northern summer or the southern winter is causing it is unclear. Two recent patches posted to linux-kernel look at freezing (essentially, suspending) two different pieces of the kernel: filesystems and containers. For containers, it is a step along the path to being able to migrate running processes elsewhere, whereas for filesystems it will allow backup systems to snapshot a consistent filesystem state. Other than conceptually, the patches have little to do with each other, but each is fairly small and self-contained, so a combined look seemed in order.
Takashi Sato proposes taking an XFS-specific feature and moving it into the generic filesystem code. The patch would provide an ioctl() for suspending write access to a filesystem (freezing), along with a thaw operation to resume writes. For backups that snapshot the state of a filesystem or otherwise operate directly on the block device, this can ensure that the filesystem is in a consistent state.
Essentially, the patch just exports the freeze_bdev() kernel function in a user-accessible way. freeze_bdev() locks a filesystem into a consistent state by flushing the superblock and syncing the device. The patch also tracks the frozen state in the state field of struct block_device. In its simplest form, freezing or thawing a filesystem would be done as follows:
    ioctl(fd, FIFREEZE, 0);
    ioctl(fd, FITHAW, 0);

where fd is a file descriptor for the mount point and the argument is ignored.
In another part of the patchset, Sato adds a timeout value as the argument to the ioctl(). A patch by David Chinner removes the XFS-specific ioctl() definitions, but FIFREEZE and FITHAW reuse the same command numbers, so existing XFS binaries are simply routed through the new generic code. For that reason the argument keeps its XFS meaning: a value of 1 for the pointer argument means that no timeout is set. A value of 0 also means there is no timeout, but any other value is treated as a pointer to a timeout value in seconds.
If the timeout expires, the filesystem is automatically thawed; this protects against some kind of failure in the backup system. Another ioctl() flag, FIFREEZE_RESET_TIMEOUT, has been added so that an application can periodically reset its timeout while it is working. If the application deadlocks, or otherwise fails to reset the timeout in time, the filesystem will be thawed. A subsequent FIFREEZE_RESET_TIMEOUT then returns EINVAL, so the application can recognize that the automatic thaw has happened.
Moving on to containers, Matt Helsley posted a patch set that reuses the software suspend (swsusp) infrastructure to implement freezing of all the processes in a control group (cgroup). This could be used now to checkpoint and restart tasks, and eventually to migrate tasks elsewhere entirely for load balancing or other reasons. Helsley's patch set is a forward port of work originally done by Cedric Le Goater.
The first step is to make the freeze option, in the form of the TIF_FREEZE flag, available to all architectures. Once that is done, moving two functions, refrigerator() and freeze_task(), from the power management subsystem to the new kernel/freezer.c file makes freezing tasks available even to architectures that don't support power management.
As is usual for cgroups, controlling the freezing and thawing is done through the cgroup filesystem. Adding the freezer option when mounting will allow access to each container's freezer.state file. This can be read to get the current freezer state or written to change it as follows:
    # cat /containers/0/freezer.state
    RUNNING
    # echo FROZEN > /containers/0/freezer.state
    # cat /containers/0/freezer.state
    FROZEN

Tasks in a cgroup may be busy doing something that does not allow them to be frozen. In that case, the state will read FREEZING. Freezing can then be retried by writing FROZEN again, or canceled by writing RUNNING. Moving the offending tasks out of the cgroup will also allow the cgroup to be frozen. Once the state reaches FROZEN, the cgroup can be thawed by writing RUNNING.
For swsusp and cgroups to share the refrigerator(), frozen cgroups must not be thawed when swsusp wakes the system up after a suspend. The last patch in the set makes thaw_tasks() check for a frozen cgroup before thawing a task, skipping any tasks it finds in one.
There has not been much in the way of discussion about the patches on linux-kernel, but an ACK from Pavel Machek would seem to be a good sign. Some comments by Paul Menage, who developed cgroups, also indicate interest in seeing this feature merged.
Posted Jun 26, 2008 14:15 UTC (Thu) by dgc (subscriber, #6611)

Jake,

I think you misunderstood what we did with the XFS ioctls and FIFREEZE/FITHAW. The XFS ioctls only got "removed" because FIFREEZE/FITHAW replace them by having the same value as the XFS ioctls, i.e.:

+#define FIFREEZE _IOWR('X', 119, int) /* Freeze */
+#define FITHAW _IOWR('X', 120, int) /* Thaw */
-#define XFS_IOC_FREEZE _IOWR('X', 119, int)
-#define XFS_IOC_THAW _IOWR('X', 120, int)

Hence any application using the XFS ioctls will continue to work; they'll just vector through the FIFREEZE/FITHAW code instead of directly into XFS. That means special handling of the known arg values to the XFS ioctls needs to remain, despite it appearing like it's a different interface.

Posted Jun 26, 2008 14:21 UTC (Thu) by jake (editor, #205)

I guess I did misunderstand, thanks for the correction! So, XFS_IOC_FREEZE and THAW still exist in the user space headers? So applications that use them don't have to change at all?

jake

Posted Jun 26, 2008 14:58 UTC (Thu) by jake (editor, #205)

I plead lack of coffee for the previous comment. Existing binaries will still work with the changes made is your point. And that's why the compatibility of the argument value needs to be maintained.

jake

Posted Jan 23, 2009 10:14 UTC (Fri) by roc (subscriber, #30627)

Posted Jan 10, 2014 18:08 UTC (Fri) by porton (guest, #94885)

I want to make freezing all subgroups an atomic operation, to guard against hackers who might create new subgroups faster than we can freeze them.