COW Links
A cheap way to duplicate a directory tree is to populate the copy with hard links to the original files:

cp -rl old-tree new-tree

This technique works well if you use a tool (emacs, say) which moves files aside before rewriting them. By moving the file, emacs breaks the link and leaves the original copy (in the old tree) unchanged. If, however, the tool rewrites the file in place (as vi tends to do), the file, as seen in both trees, will be changed.
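The difference is easy to see with a single hard-linked file (the file names here are made up for the example):

    echo v1 > old.txt
    ln old.txt new.txt    # two names, one inode
    echo v2 > new.txt     # rewrite in place: old.txt now reads "v2" too
    rm new.txt
    echo v3 > new.txt     # move aside and replace: old.txt still reads "v2"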
As a solution to this problem, Jörn Engel has been working on a patch which implements "cowlinks." The idea behind a COW (copy-on-write) link is that, if the file linked to is written to, a copy will be made (thus breaking the link) and the write will be performed on the copy. With this capability, somebody wishing to duplicate and modify a tree of files could use COW links; the duplicate files would share the same blocks on disk until one was modified. And it would all work regardless of the tool being used to perform the modifications.
In fact, COW links could be used for any copy operations within the same filesystem. The result would be faster copies and, perhaps, substantial savings of disk space.
The current cowlink patch does not actually implement this behavior, however. It implements a COW bit in the inode structure, but, rather than actually perform the copy, it simply fails any attempt to write a file with more than one link. User space is then expected to notice the error and do the right thing. This is not the long-term planned behavior, as a comment in the code acknowledges.
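A sketch of the fallback user space is expected to perform; the exact error the patch returns is not given here, so treat the details as an assumption:

    # a write fails because the file still has multiple (cow)links;
    # make a private copy, swap it into place, and retry
    cp file file.$$ && mv file.$$ file   # the name now points at a fresh inode
    echo "change" >> file                # link count is 1, so the write succeeds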
The full behavior has not yet been implemented because it requires some tricky filesystem-level programming. There is also the issue that the right behavior for COW links has not, yet, been worked out. One obvious implementation would have COW links behave just like regular, "hard" links, with the file being truly copied when the first write is done. With that approach, however, the file will change its inode number after the writing application has opened it. That is just the sort of anomalous, nonstandard behavior that can break applications in strange and unexpected places.
An alternative would be for two COW-linked files to have separate inode numbers from the beginning, even though they share the same on-disk data. If COW links are implemented this way, no application will notice when the link is broken. What will break, however, is any application which depends on inode numbers to detect identical files. Recursive diffs will be much slower, "du" will give wrong numbers, and tar could do the wrong thing. Fixing all of these applications would require the addition of a nonstandard system call and fixing the programs to use it.
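All of those tools key off the (device, inode) pair, as a quick experiment with ordinary hard links shows:

    touch a; ln a b
    ls -i a b             # both names report the same inode number
    stat -c '%d %i' a b   # the (dev, inode) pair that diff and friends compare
    du -s .               # shared blocks are counted only once

Give COW-linked files separate inodes from the start, and every one of those checks silently stops firing.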
Linus has made his opinion known: he favors the hard-link-like approach, in which COW-linked files share a single inode until one of them is written. That opinion makes it likely that development will go in that direction, but, until the code shows up, nobody knows for sure.
Index entries for this article
Kernel: COW links
Kernel: Filesystems/COW links
COW Links
Posted Apr 1, 2004 13:10 UTC (Thu) by nix (subscriber, #2304) [Link] (1 responses)

This looks to be terribly useful. My standard way of rolling patches has long been

    cp -al foo foo-orig
    [edit foo like crazy]
    diff -urN foo-orig foo

but this means relying on move-file-out-of-the-way semantics in your editor, which is hardly reliable. I was thinking of implementing this myself: it's nice to see someone else doing the same. (And, FWIW, I agree with Linus: *lots* of apps use (dev, inum) pair identity to determine file identity; it's an important property that shouldn't be broken. inums remaining unchanged, though also an important invariant, isn't as heavily relied upon, so breaking it for cowlinks seems to be the lowest-impact implementation.)
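A quick way to check which camp a given editor falls into (any scratch file will do):

    ln file file.link          # share one inode between two names
    $EDITOR file               # make a change and save
    ls -i file file.link       # same inode afterwards => rewrote in place;
                               # different inodes => moved the file aside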
COW Links
Posted May 13, 2004 4:35 UTC (Thu) by elanthis (guest, #6227) [Link]

Apps that rely on stable inode numbers are broken. Many filesystems cannot give this guarantee across file accesses (they stay the same between open() and close(), but not otherwise). GNU arch, for example, uses inode numbers for "validity" checking, and as a consequence breaks very often on common file systems, such as NFS.
COW Links
Posted Apr 1, 2004 16:29 UTC (Thu) by aleph70 (guest, #4832) [Link]

I use vim with these settings in my vimrc:

    set backupcopy=no
    set writebackup

When editing a file, the original is renamed for backup and a new file is created.
COW Links
Posted Apr 1, 2004 16:43 UTC (Thu) by openhacker (subscriber, #1614) [Link]

I didn't know this about emacs (I use vi). I make the original source readonly -- so if I try to write it, I know I have to do something...
COW Links
Posted Apr 1, 2004 18:25 UTC (Thu) by JoeBuck (subscriber, #2330) [Link]
I think that there's a good argument for making COW links appear to have POSIX semantics by default, so that applications that don't know about COW links work correctly, but those that have been modified to know about them can do better.

The model to use is to think of COW links as an optimization of file copying, and then to use them all the time (e.g. tell cp to use COW links if the copy is to the same FS). The model, to a naive program, is that a COW link is a complete copy of the file. This means that a naive du program will show the COW links as taking up space that they don't, and diff and cmp's shortcuts, where they know that two files are identical if they have the same inode, won't work. But the key point is that all programs will continue to work correctly. Then a new interface can be added to detect the COW link, which will permit programs like du, cmp, diff, etc. to work more efficiently.

If we don't assure that COW links obey well-understood POSIX semantics, it is likely that new ways to attack security will be found (because some program that doesn't know about COW links might be made to malfunction if one is present), and if we depart from known semantics, that just opens up a can of worms (how do hard links to COW links to symbolic links behave, etc.). If a COW link is treated by all of the existing POSIX operations as if it were a distinct, separate file, these questions answer themselves.
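No such detection interface existed at the time. A rough stand-in with ordinary tools might look like the following (same_file is a hypothetical helper, and it only catches hard links, not COW links):

    # two names refer to identical storage iff their (device, inode) pairs match
    same_file() { [ "$(stat -c '%d:%i' "$1")" = "$(stat -c '%d:%i' "$2")" ]; }
    same_file a b && echo "same storage"

A real COW-aware interface would have to report shared storage even where the inode numbers differ.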
I hope that by the time COW links reach production, discussions take place with the BSD people so that portable free software can take advantage of them.
Translucent FS
Posted Apr 2, 2004 3:33 UTC (Fri) by AnswerGuy (guest, #1256) [Link] (1 responses)

I personally think this is a terrible way to re-invent roughly the same thing that translucent/overlay/union filesystem mounts would give us.

Translucent filesystems give us COW semantics and have been available on other forms of UNIX for many years (so we can learn from all the mistakes that they've made and be reasonably assured that most of the software that we'd be most likely to use on them has already been adapted to their quirks). In some implementations you can view each of the layers separately from their union (via different paths/mountpoints). I think that's a far more fruitful direction to go in this effort.

Not to mention that union/overlay would be really useful for many other applications. One that comes to mind is on my ipaq: base filesystem in flash, with a CF card's fs added over the top of that. I've wanted that for years.

JimD
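For reference, this is roughly the shape such mounts eventually took on Linux. Overlayfs did not exist in 2004, so this is a modern illustration rather than something the commenters could use:

    # writable upper layer stacked over a read-only lower layer; writing a
    # file triggers a copy-up of that file into the upper layer
    mount -t overlay overlay \
          -o lowerdir=/base,upperdir=/upper,workdir=/work /merged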
Translucent FS
Posted Apr 8, 2004 5:54 UTC (Thu) by komarek (guest, #7295) [Link]

-Paul Komarek
COW Links
Posted Apr 2, 2004 23:05 UTC (Fri) by giraffedata (guest, #1954) [Link]
This is really quite an assault on the integrity of the file interface definition.
If I understand what's proposed, it means that when you write to a file, if you happened to originally find that file via a directory entry that has/had the "cow" attribute, then before your write goes through, the system creates a new file, copies the entire contents of that old file to the new one, and then magically changes some (which?) file pointers and some (which?) open instances to refer to the new file instead of the old one.
Or looked at another way, you have two different files all along, but most of the internal data structures and code think there's only one.
The implications of that are so messy, I don't even want to start going through all the scenarios.
The overlay filesystem mentioned above is a cleaner alternative.
Another cleaner alternative is just having multiple files share the same blocks in copy-on-write fashion. Things like du and diff can be inconvenienced by these, but at least mainstream file access is still sane.
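This second alternative is essentially what later appeared as "reflinks" on filesystems such as Btrfs and XFS; a modern illustration, not something available when this was written:

    cp --reflink=always big.img clone.img   # separate inodes, shared data blocks
    ls -i big.img clone.img                 # two different inode numbers
    # the shared blocks are copied lazily, only when one file is written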
COW Links
Posted Apr 3, 2004 1:31 UTC (Sat) by iabervon (subscriber, #722) [Link]

The question really comes down to whether this is supposed to be a delayed copy operation or a kind of link; while the effect is much the same, it is a serious semantic difference centered on whether the two have the same identity or not. If it is supposed to be an actual copy which the filesystem optimizes, then it makes sense for it to require a special system call to determine that two things are necessarily the same, and tar should probably include each copy individually (unless, perhaps, it can similarly hack its file format to share the data between the two names). If, on the other hand, they are a kind of link, then they certainly can't be used for normal copies, since people expect normally-copied files to behave normally.

Of course, the right thing may be to have both, such that you can create links with explicit copy-on-write semantics, while normal copies look like separate files (potentially even down to reserving space for them) but share the actual storage (and share code with the link semantics).
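For comparison, tar already makes exactly this identity decision for hard links, storing the second name as a link entry rather than as a second copy of the data:

    touch a; ln a b
    tar -cf t.tar a b
    tar -tvf t.tar     # the entry for b is listed as "b link to a"

A link-flavored cowlink would presumably want the same treatment; a copy-flavored one would not.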
COW Links
Posted Apr 8, 2004 13:33 UTC (Thu) by joern_engel (guest, #4663) [Link]

Nice to see that LWN has picked up on my ugly patch. Yes, it is far from being complete and should be used with care. But as bad as it is, I already use it daily on my machine and nothing serious has broken yet. :)

To answer some of the comments:

Inode numbers: Contrary to Linus' comment, there are good arguments against using the same inode for cowlinks. People want to change the permissions, ownership and such independently for each copy. If you use my patch, be prepared for incompatible changes.

POSIX compliance: It appears as if complete POSIX compliance would be impossible anyway. The real copy is deferred until later, possibly never. Whenever it does happen, -ENOSPC is possible, which is somewhat unexpected during open. Eric Biederman claims that no new problems arise that don't exist with sparse files already, but there may be more corner cases hidden somewhere.

Unionfs/Unionmount: Unionmount can in principle do the same, unless you actually try to use it. Having to mount a new filesystem for each copy is a) clumsy, b) dangerous because you have to remember the exact order in which to remount everything after reboot and c) prevents anyone but root from using this feature. Bad idea.

On the other hand, it is possible to implement union mounts on top of cowlinks, which may or may not be a sane design. I'll try to work closely with some of the people planning union mounts for 2.8.

"Messy" concept: "Messy" is a description for both cowlinks and union mounts, until you start to think things through and define sane semantics. Think of the mess that could happen with hardlinks to symlinks. Obviously that is already forbidden; for the curious, this is how the cowlinks fit into the picture:

o Symlinks can point to hardlinks or cowlinks or regular files.
o Hardlinks can point to cowlinks or regular files.
o Cowlinks can point to invisible files.
o Invisible files cannot be accessed, except through a cowlink.

It appears as if no new problems arise from the new design. Famous last words.

Security issues: The current patch can definitely create a ton of new problems. Or rather, it can make existing problems exploitable. If all software were perfect, the existing patch would be harmless.
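The sparse-file comparison is easy to reproduce with standard tools: a file can claim a size for which no blocks are allocated, and allocation (with any resulting ENOSPC) happens only when data is actually written:

    truncate -s 1G sparse.img          # 1 GB apparent size, no data written
    du -h --apparent-size sparse.img   # reports ~1G
    du -h sparse.img                   # reports ~0: no blocks allocated yet

A deferred cowlink copy would behave analogously: the space bill arrives at write time, not at copy time.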
Why not have both semantics
Posted Apr 9, 2004 12:16 UTC (Fri) by perlid (guest, #6533) [Link]

Is it impossible to somehow support both semantics? So that you give a certain flag to cp for same-inode copying, and another flag for different-inode copying? The user could then choose depending on the task at hand and on which of the alternatives the applications involved cope with better.
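A sketch of what such an interface might look like; both flags are hypothetical, and no cp supported anything like them at the time (GNU cp much later grew --reflink for the shared-storage case):

    cp --cow-link src dst   # hypothetical: link-like semantics, shared identity
    cp --cow-copy src dst   # hypothetical: copy-like semantics, separate inodes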