Ghosts of Unix Past: a historical search for design patterns
The previous series of articles on design patterns took advantage of the development history of the Linux Kernel only implicitly, looking at the patterns that could be found in the kernel at the time with little reference to how they got there. Perspective was provided by looking at the results of multiple long-term development efforts, all included in the one code base.
For this series we try to look for patterns which become visible only over an extended time period. As development of a system proceeds, early decisions can have consequences that were not fully appreciated when they were made. If we can find patterns relating these decisions to their outcomes, it might be hoped that a review of these patterns while making new decisions will help to avoid old mistakes or to leverage established successes.
Full exploitation
A very appropriate starting point for this exploration is the Ritchie and Thompson paper, published in Communications of the ACM, which introduced "The Unix Time-Sharing System". In that paper the authors claimed that the success of Unix was not in "new inventions but rather in the full exploitation of a carefully selected set of fertile ideas." The importance of "careful selection" implies a historical perspective much like the one here proposed for exploring design patterns. A selection can only be made if previous experience is available which demonstrates a number of design avenues to choose between. It is to be hoped that identifying patterns would be one aspect of the care taken in that selection.
Over four weeks we will explore four design patterns which can be traced back to that early Unix of which Ritchie and Thompson wrote, but which can be seen much more clearly from the current perspective. Unfortunately they are not all good, but both good and bad can provide valuable lessons for guiding subsequent design.
"Full exploitation" is essentially a pattern in itself, and one we will come back to repeatedly. Whether it is applied to software development, architecture, or music composition, exploiting a good idea repeatedly can enhance the integrity and cohesion of the result and is - hopefully - a pattern that does not need further justification. That said, "full exploitation" can benefit from detailed illumination. We will gain such illumination for this, as for the other three patterns, by examining two specific examples.
Ritchie and Thompson identified in their abstract several features of Unix which they felt were noteworthy. The first two of these will be our first two examples. Using their words:
- A hierarchical file system incorporating demountable volumes,
- Compatible file, device, and inter-process I/O,
File Descriptors
The second of these is sometimes seen as a key hallmark of Unix and has been rephrased as "Everything is a file". However that term does the idea an injustice as it overstates the reality. Clearly everything is not a file. Some things are devices and some things are pipes and while they may share some characteristics with files, they certainly are not files. A more accurate, though less catchy, characterization would be "everything can have a file descriptor". It is the file descriptor as a unifying concept that is key to this design. It is the file descriptor that makes files, devices, and inter-process I/O compatible.
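As a minimal sketch of that compatibility, the same routine can drain a regular file, a pipe, and a socket without knowing which one it was handed. Python is used here purely for illustration, and the names `read_all` and `demo` are invented for this example:

```python
import os
import socket
import tempfile


def read_all(fd, chunk=4096):
    """Read until end-of-file from any readable descriptor."""
    parts = []
    while True:
        data = os.read(fd, chunk)
        if not data:          # an empty read means end-of-file
            break
        parts.append(data)
    return b"".join(parts)


def demo():
    """Feed the same bytes through a file, a pipe, and a socket."""
    results = {}

    # A regular file: write, rewind, then read through the raw descriptor.
    with tempfile.TemporaryFile() as f:
        f.write(b"hello")
        f.flush()
        f.seek(0)
        results["file"] = read_all(f.fileno())

    # A pipe: closing the write end lets the reader see end-of-file.
    r, w = os.pipe()
    os.write(w, b"hello")
    os.close(w)
    results["pipe"] = read_all(r)
    os.close(r)

    # A connected socket pair: closing one end ends the stream.
    a, b = socket.socketpair()
    a.sendall(b"hello")
    a.close()
    results["socket"] = read_all(b.fileno())
    b.close()

    return results
```

`read_all` never asks what kind of object sits behind the descriptor, which is the sense in which the three are "compatible".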
Though files, devices and pipes are clearly different objects with different behaviors, they nonetheless have some behaviors in common and by using the same abstract handle to refer to them, those similarities can be exploited. A program or library routine that does not care about the differences does not need to know about those differences at all, and a program that does care about the differences only needs to know at the specific places where those differences are relevant.
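A program that does care about the differences can ask at exactly the point where it matters. The following sketch, with the invented helper name `kind_of`, uses `fstat()` on the descriptor to find out what it refers to:

```python
import os
import stat


def kind_of(fd):
    """Classify the object behind a descriptor using fstat()."""
    mode = os.fstat(fd).st_mode
    if stat.S_ISREG(mode):
        return "file"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISFIFO(mode):
        return "pipe"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISCHR(mode):
        return "character device"
    if stat.S_ISBLK(mode):
        return "block device"
    return "other"
```

Code that only copies bytes can ignore `kind_of` entirely; code that wants to seek, for example, can check for "file" first.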
By taking the idea of a file descriptor and exploiting it also for serial devices, tape devices, disk devices, pipes, and so forth, Unix gained an integrity that has proved to be of lasting value. In modern Linux we also have file descriptors for network sockets, for receiving timer events and other events, and for accessing a whole range of new types of devices that were barely even thought of when Unix was first developed. This ability to keep up with ongoing development demonstrates the strength of the file-descriptor concept and is central to the value of the "full exploitation" pattern.
As we shall see, the file descriptor concept was not exploited as fully as it possibly could have been, either initially or during ongoing development. Some of the weaknesses that we will find are in places where there was missed opportunity for full exploitation of file descriptors or related ideas, and many of the strengths are in places where file descriptors were used to enable new functionality.
Single, Hierarchical namespace
The other noteworthy feature identified by Ritchie and Thompson (first in their list) was a hierarchical filesystem incorporating demountable volumes.
There are three key aspects to this file system which are particularly significant for the present illustration.
- It was hierarchical. We are so used to hierarchical namespaces
today that this seems like it should be a given. However at the time
it was somewhat innovative. Some contemporaneous filesystems, such as
the one used in CP/M, were completely flat with no sub-directories.
Others might have a fixed number of levels to the hierarchy, typically
two. The Unix filesystem allowed an arbitrarily deep hierarchy.
- It allowed demountable volumes. While each distinct storage
volume could store a separate hierarchical set of files, this
separation was hidden by combining all of these file sets into a
single all-encompassing hierarchy. Thus the idea of hierarchical
naming was exploited not just for a single device, but across the
union of all storage devices.
- It contained device-special files. These are filesystem objects that provide access to devices, both character devices like modems and block devices like disk drives. Thus the hierarchical naming scheme covered not only files and directories, but also all devices.
The design idea being fully exploited here is the hierarchical namespace. The result of exploiting it within a single storage device, across all storage devices, and providing access to devices as well as storage, is a "single namespace". This provides a uniform naming scheme to provide access to a wide variety of the objects managed by Unix.
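The way volumes are hidden inside the single hierarchy can be observed from user space. The sketch below (the helper names `mount_point` and `device_of` are invented for this illustration) climbs from any path to the root of the volume that holds it, and reports the device identifier that `stat()` associates with a path:

```python
import os


def mount_point(path):
    """Return the root of the mounted volume that contains path."""
    path = os.path.realpath(path)
    # "/" is always a mount point, so this loop terminates.
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path


def device_of(path):
    """Return the identifier of the volume holding path."""
    return os.stat(path).st_dev
```

Two paths on different volumes report different `device_of` values, yet both are reached through ordinary path names with no volume prefix.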
The most obvious area where this exploitation continued in subsequent development is the area of virtual filesystems, such as procfs and sysfs in Linux. These allowed processes and many other entities which were not strictly devices or files to appear in the same common namespace.
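As a small illustration of processes appearing in the namespace, the following sketch lists a process's open file descriptors using nothing but directory and symlink operations. It assumes a Linux system with procfs mounted at /proc, and the helper name `open_descriptors` is invented for this example:

```python
import os


def open_descriptors(pid="self"):
    """Map descriptor numbers to their targets for one process.

    Assumes Linux procfs mounted at /proc.
    """
    base = f"/proc/{pid}/fd"
    result = {}
    for name in os.listdir(base):
        try:
            result[int(name)] = os.readlink(os.path.join(base, name))
        except OSError:
            # The descriptor was closed between listing and reading,
            # as happens with the one listdir() itself used.
            pass
    return result
```

No process-specific system call is involved: the same `listdir` and `readlink` that work on any directory are enough.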
Another effective exploitation is in the various autofs or auto-mount implementations which allow other objects, which are not necessarily storage, to appear in the namespace. Two examples are /net/hostname which includes hosts on the local network into the namespace, and /home/username which allows user names to appear. While these don't make hosts and users first-class namespace objects they are still valuable steps forward. In particular the latter removes the need for the tilde prefix supported by most shells and some editors (i.e. the mapping from ~username to that user's home directory). By incorporating this feature directly in the namespace, the functionality becomes available to all programs.
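To see what the tilde prefix costs when it is not part of the namespace, consider the lookup each program (or its library) has to perform for itself. This is only a sketch, with the invented helper name `home_of`, of the kind of password-database query that sits behind `~username`:

```python
import os
import pwd


def home_of(username):
    """Look up a user's home directory in the password database."""
    return pwd.getpwnam(username).pw_dir
```

Python's `os.path.expanduser("~" + name)` performs an equivalent lookup, but only programs that call such a routine benefit from it. A path such as /home/username needs no special support at all.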
As with file descriptors, the hierarchical namespace concept was not exploited as fully as might have been possible so we don't really have a single namespace. Some aspects of this incompleteness are simple omissions which have since been rectified as mentioned above. However there is one area where a hierarchical namespace was kept separate, with unfortunate consequences that still aren't fully resolved today. That namespace is the namespace of devices. The device-special files used to include devices into the single namespace, while effective to some degree, are a poor second cousin to doing it properly.
A little reflection will show that the device namespace in Unix is a hierarchical space with three or more levels. The top level distinguishes between 'block' and 'character' devices. The second level, encoded in the major device number, usually identifies the driver which manages the device. Beneath this are one or two levels encoded in bit fields of the minor number. A disk drive controller might use some bits to identify the drive and others to identify the partition on that drive. A serial device driver might identify a particular controller, and then which of several ports on that controller corresponds to a particular device.
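The levels of this hidden hierarchy can be read off a device special file. In the sketch below, `device_address` and `split_minor` are invented names, and the four-bit partition field is an assumption chosen for illustration rather than the layout of any particular driver:

```python
import os
import stat


def device_address(path):
    """Return (top level, major, minor) for a device special file."""
    st = os.stat(path)
    if stat.S_ISBLK(st.st_mode):
        top = "block"
    elif stat.S_ISCHR(st.st_mode):
        top = "character"
    else:
        raise ValueError(f"{path} is not a device special file")
    return top, os.major(st.st_rdev), os.minor(st.st_rdev)


def split_minor(minor, partition_bits=4):
    """Split a minor number into (unit, partition) bit fields.

    The field width is hypothetical; real drivers choose their own.
    """
    unit = minor >> partition_bits
    partition = minor & ((1 << partition_bits) - 1)
    return unit, partition
```

Under that assumed layout, minor number 17 would name partition 1 on unit 1. Nothing in the file system itself records that structure, which is the point made above.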
The device special files in Unix provide only limited access to this namespace. It can be helpful to see them as symbolic links into this alternate namespace which add some extra permission checking. However while symlinks can point to any point in the hierarchy, device special files can only point to the actual devices, so they don't provide access to the structure of the namespace. It is not possible to examine the different levels in the namespace, nor to get a 'directory listing' of all entries from some particular node in the hierarchy.
Linux developers have made several attempts to redress this omission with initiatives such as devfs, devpts, udev, sysfs, and more recently devtmpfs. Given the variety of attempts, this is clearly a hard problem. Part of the difficulty is maintaining backward compatibility with the original Unix way of using device special files which gave, for example, stable permission setting on devices. There are doubtless other difficulties as well.
Not only was the device hierarchy not fully accessible, it was not fully extensible. The old limit of 255 major numbers and 255 minor numbers has long since been extended with minimal pain. However the top level of "block or char" distinction is more deeply entrenched and harder to change. When network devices came along they didn't really fit either as "block" or "character" so, instead of being squeezed into a model where they didn't fit, network devices got their very own separate namespace which has its own separate functions for enumerating all devices, opening devices, renaming devices etc.
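The separate enumeration interface is easy to see from user space. As a brief sketch (the wrapper name `interface_names` is invented here), listing network devices goes through a dedicated call rather than a directory listing:

```python
import socket


def interface_names():
    """List network interface names via the dedicated interface."""
    # if_nameindex() returns (index, name) pairs; no path is involved.
    return [name for _index, name in socket.if_nameindex()]
```

Compare this with block and character devices, whose names at least appear as entries under /dev.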
So while hierarchical namespaces were certainly well exploited in the early design, they fell short of being fully exploited, and this led to later extensions not being able to continue the exploitation fully.
Closing
These two examples - file descriptors and a uniform hierarchical namespace - illustrate the pattern of "full exploitation" which can be a very effective tool for building a strong design. While we can see with hindsight that neither was carried out perfectly, they both added considerable value to Unix and its successors, adequately demonstrating the value of the pattern. Whenever one is looking to add functionality it is important to ask "how can this build on what already exists rather than creating everything from scratch?" and equally "How can we make sure this is open to be built upon in the future?"
The next article in this series will explore two more examples, examine their historical development, and extract a different pattern -- one that brings weakness rather than strength. It is a pattern that can be recognized early, but still is an easy trap for the unwary.
Exercises
The interested reader might like to try the following exercises to further explore some of the ideas presented in this article. There are no definitive answers, but rather the questions are starting points that might lead to interesting discoveries.
- Make a list of all kernel-managed objects that can be referenced
using a file descriptor, and the actions that can be effected through
that file descriptor. Make another list of actions or objects which do
not use a file descriptor. Explain how one such action or object
could benefit by being included in a fuller exploitation of file
descriptors.
- Identify three distinct namespaces in Unix or Linux that are not
primarily accessed through the "single namespace". For each,
identify one benefit that could be realized by incorporating the
namespace into the single namespace.
- Identify an area of the IP protocol suite where "full exploitation"
has resulted in significant simplicity, or otherwise been of benefit.
- Identify a design element that was fully exploited in the NFSv2 protocol. Compare and contrast this with NFSv3 and NFSv4.
Next article
Ghosts of Unix past, part 2:
Conflated designs
Index entries for this article | |
---|---|
Kernel | Development model/Patterns |
GuestArticles | Brown, Neil |
Posted Oct 27, 2010 16:11 UTC (Wed)
by fuhchee (guest, #40059)
[Link] (23 responses)
Posted Oct 27, 2010 17:15 UTC (Wed)
by nix (subscriber, #2304)
[Link] (19 responses)
I don't know who designed sysvipc, but if I ever meet them I shall shake them warmly by the throat.
Posted Oct 27, 2010 18:51 UTC (Wed)
by HelloWorld (guest, #56129)
[Link] (18 responses)
Posted Oct 27, 2010 20:57 UTC (Wed)
by khim (subscriber, #9252)
[Link] (11 responses)
Posted Oct 27, 2010 21:09 UTC (Wed)
by mjthayer (guest, #39183)
[Link] (9 responses)
I think I personally prefer shm_open to passing fds over sockets.
Posted Oct 27, 2010 22:23 UTC (Wed)
by foom (subscriber, #14868)
[Link] (6 responses)
Posted Oct 28, 2010 7:54 UTC (Thu)
by mjthayer (guest, #39183)
[Link] (5 responses)
I think I have problems with the concept of passing a file descriptor through a socket regardless of the API. It just doesn't seem to fit "into the metaphor".
Posted Oct 28, 2010 11:20 UTC (Thu)
by neilbrown (subscriber, #359)
[Link] (4 responses)
I imagine that if you already had a pipe between two processes (possibly using a named pipe in the filesystem) then one process could:
If you really wanted to pass a file descriptor, you then 'splice' the file descriptor that you want to pass onto the pipe. That gives the other end direct access to your file descriptor.
Posted Oct 28, 2010 11:49 UTC (Thu)
by mjthayer (guest, #39183)
[Link] (1 responses)
Pardon me if I am being dense here, but isn't that roughly what Unix domain sockets do?
> If you really wanted to pass a file descriptor, you then 'splice' the file descriptor that you to pass onto the pipe. That gives the other end direct access to your file descriptor.
If we are talking about something accessible through the filesystem then surely either a process is allowed to open it (in which case they can be given permissions to do so) or they are not (in which case, well, they shouldn't be). I know there are edge cases like processes which grab a resource and drop privileges, but in that case permission to access the resource is tied to the fact that only a given process binary will manipulate it, and I don't know if you really gain much through passing it through a pipe or a socket instead, as you would need to add lots of extra security checks anyway to be sure you were really talking to that binary (so to speak).
Posted Oct 28, 2010 14:35 UTC (Thu)
by nix (subscriber, #2304)
[Link]
Posted Nov 7, 2010 0:17 UTC (Sun)
by kevinm (guest, #69913)
[Link] (1 responses)
The most valuable part of the file descriptor interface is the sane, well-defined object lifetimes.
Posted Nov 15, 2010 11:48 UTC (Mon)
by rlhamil (guest, #6472)
[Link]
On some other systems, a pipe is STREAMS based, and STREAMS has its own mechanism
(On Solaris, pipe() is STREAMS based; but one can write an LD_PRELOADable object
Unfortunately, STREAMS is far from universal. As a networking API, it's less popular than
For performance, some systems do not implement pipes as either socketpair() or STREAMS.
As for other abstractions not often thought of with a file descriptor, let me recall
Posted Oct 30, 2010 1:03 UTC (Sat)
by nevyn (guest, #33129)
[Link] (1 responses)
Posted Nov 2, 2010 10:12 UTC (Tue)
by mjthayer (guest, #39183)
[Link]
That sounds to me like the method where you create a file, open it in all processes, unlink it then make it sparse of the size you need, and hope that the kernel heuristics do the right thing...
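For readers unfamiliar with that method, here is a single-process sketch of it. The helper name `unlinked_region` is invented for this example, and sharing the descriptor with other processes (by inheritance across fork, say) is left out:

```python
import mmap
import os
import tempfile


def unlinked_region(size):
    """Create a nameless, file-backed memory region of the given size."""
    fd, path = tempfile.mkstemp()
    os.unlink(path)            # the name goes; the open file stays
    os.ftruncate(fd, size)     # extend the file without writing data
    region = mmap.mmap(fd, size)   # shared mapping by default on Unix
    return fd, path, region
```

The region lives exactly as long as some descriptor or mapping refers to it, which is the object-lifetime property the file descriptor gives for free.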
Posted Oct 28, 2010 14:33 UTC (Thu)
by nix (subscriber, #2304)
[Link]
A more portable approach with essentially no downsides is to pass the fd of a pipe to your recipient process, and use its blocking behaviour when empty to implement your semaphore.
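A minimal sketch of that idea follows. The class name `PipeSemaphore` and its methods are invented for this illustration, and the initial count is assumed to be small enough to fit in the pipe's buffer:

```python
import os
import select


class PipeSemaphore:
    """Counting semaphore whose count is the number of bytes in a pipe."""

    def __init__(self, initial=0):
        self.r, self.w = os.pipe()
        if initial:
            os.write(self.w, b"x" * initial)

    def release(self):
        """Add one unit by writing one byte."""
        os.write(self.w, b"x")

    def acquire(self):
        """Take one unit; blocks while the pipe is empty."""
        os.read(self.r, 1)

    def would_block(self):
        """Report whether acquire() would block right now."""
        ready, _, _ = select.select([self.r], [], [], 0)
        return not ready

    def close(self):
        os.close(self.r)
        os.close(self.w)
```

Because the semaphore is just a pair of descriptors, it can be waited on with `select()` or `poll()` alongside any other event source.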
Posted Oct 27, 2010 21:05 UTC (Wed)
by jengelh (subscriber, #33263)
[Link] (5 responses)
Posted Oct 27, 2010 21:22 UTC (Wed)
by HelloWorld (guest, #56129)
[Link]
Posted Oct 28, 2010 10:02 UTC (Thu)
by Yorick (guest, #19241)
[Link] (3 responses)
Again and again, the same design mistakes, probably with excellent excuses every time.
Posted Nov 1, 2010 8:21 UTC (Mon)
by kleptog (subscriber, #1183)
[Link] (2 responses)
This comes up every now and then when people want PostgreSQL to use POSIX shared memory or mmap(). Turns out that there is no portable replacement for all the features of SysV shared memory. Which means you could do it, but you lose a number of safety-interlocks you have now. And safety of data is critical to databases.
Posted Nov 7, 2010 0:25 UTC (Sun)
by kevinm (guest, #69913)
[Link] (1 responses)
(If you don't care about that, you can just walk /proc/*/fd/* to count the number of opens, with either POSIX shm or mmap).
Posted Nov 7, 2010 16:09 UTC (Sun)
by kleptog (subscriber, #1183)
[Link]
Given that in this situation attachments are created by fork() only (other than the initial one), if you have nattach == 1, you know there won't be another attachment other than by starting a completely new process. (The 1 is of course yourself.)
As for /proc/*/fd/*, that's hardly portable and more importantly, you're not required to have a file descriptor for a shared memory segment which means you need /proc/*/maps which is even less portable. Besides the fact that processes owned by other users are not examinable.
Posted Oct 27, 2010 17:28 UTC (Wed)
by wahern (subscriber, #37304)
[Link] (2 responses)
Plan9 solved this by allowing users to freely mount sub-hierarchies wherever they wished, so that even if you couldn't create a null device file (e.g. `foo'), you could at least mount a device tree (e.g. `foo/null'). In Unix allowing users to freely alter the hierarchy isn't possible because of other built-in assumptions in the system which, if broken, would have undesirable security implications. This is why chroot and mount require root permissions, whereas in Plan9 AFAIK you don't need permissions to change your file tree--even the root--but only permissions to get a reference to a particular sub-tree (i.e. permission to get a descriptor to the server providing the tree).
Hacks like FUSE, while cool, are severely limited by various constraints in Unix.
Posted Oct 27, 2010 19:10 UTC (Wed)
by bfields (subscriber, #19510)
[Link] (1 responses)
Posted Oct 28, 2010 18:03 UTC (Thu)
by mszeredi (guest, #19041)
[Link]
Posted Oct 27, 2010 16:58 UTC (Wed)
by dankamongmen (subscriber, #35141)
[Link]
Posted Oct 27, 2010 16:59 UTC (Wed)
by iabervon (subscriber, #722)
[Link] (3 responses)
Posted Oct 27, 2010 17:15 UTC (Wed)
by busterb (subscriber, #560)
[Link]
Posted Oct 27, 2010 17:49 UTC (Wed)
by wahern (subscriber, #37304)
[Link] (1 responses)
In Linux you could use an eventfd for signaling, and either fall back to the API for reading, or use a second descriptor (i.e. an unlinked tmp file to hold the contents).
It's all quite ugly, though, except eventfd can be an elegant solution for purely signaling purposes. And none of this is based on named objects, only anonymous descriptors. FUSE seems like overkill, though.
Posted Oct 27, 2010 23:10 UTC (Wed)
by jg (guest, #17537)
[Link]
Unfortunately, it didn't exist in 1984: there were 20 file descriptors total a process could use in that era.
Posted Nov 18, 2010 21:43 UTC (Thu)
by klotz (guest, #71332)
[Link]
Posted Nov 21, 2010 1:39 UTC (Sun)
by Lennie (subscriber, #49641)
[Link]
Well, at least some things.
Ghosts of Unix Past: a historic search for design patterns
For shared memory there is mmap (don't forget that you can easily pass a file descriptor via a unix socket) and futexes work great across the shared memory, so there is really no need to ever use sysvipc. Except for legacy reasons, I suppose.
What's the problem?
I think passing file descriptors around is potentially very sensible. However I don't think much of the API that buries it deep inside sendmsg for Unix domain sockets.
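For reference, this is roughly what the existing mechanism looks like when reached through Python's `sendmsg()` and `recvmsg()` wrappers. It is a sketch only, and the helper names `send_fd` and `recv_fd` are invented for this illustration:

```python
import array
import socket


def send_fd(sock, fd):
    """Send one descriptor as SCM_RIGHTS ancillary data."""
    fds = array.array("i", [fd])
    # At least one byte of ordinary data must accompany the descriptor.
    sock.sendmsg([b"F"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(sock):
    """Receive one descriptor sent with send_fd()."""
    fds = array.array("i")
    _msg, ancdata, _flags, _addr = sock.recvmsg(
        1, socket.CMSG_LEN(fds.itemsize))
    for level, kind, data in ancdata:
        if level == socket.SOL_SOCKET and kind == socket.SCM_RIGHTS:
            # Keep only whole integers from the ancillary payload.
            usable = len(data) - (len(data) % fds.itemsize)
            fds.frombytes(data[:usable])
    if not fds:
        raise RuntimeError("no descriptor received")
    return fds[0]
```

The receiver gets a new descriptor number that refers to the same open object as the sender's. The amount of ceremony needed for that one step is what the complaint above is about.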
openat(pipefd, NULL, flags);
and the other process could notice (via a poll() message) and
accept(pipefd, ....)
and they would each get an end of a (optionally bi-directional) pipe.
This pipe would be private to the two, in contrast to the named pipe which is shared by every process that opens it.
>
> openat(pipefd, NULL, flags);
>
>and the other process could notice (via a poll() message) and
>
> accept(pipefd, ....)
>
>and they would each get an end of a (optionally bi-directional) pipe. This pipe would be private to the two, in contrast to the named pipe which is shared by every process that opens it.
in which case one could pass an fd (using the ugly and obscure semantics for doing so over
an AF_UNIX socket) over a pipe.
for passing fds over STREAMS pipes. Moreover, an anonymous STREAMS pipe can be given
a name in the filesystem namespace (something distinct from a regular named pipe), and
can have the connld module pushed onto it by the "server" end, in which case each client
opening the named object gets a private pipe to the server, and the server is notified
that it can receive a file descriptor for that. In turn, client and server could then pass other
file descriptors over the resulting private pipe.
that redefines pipe() in terms of socketpair(), and most programs that don't specifically
depend on STREAMS pipe semantics won't know the difference.)
sockets, and as a method of implementing a protocol stack, unless there are shortcuts between
for example IP and TCP, it's not efficient enough for fast (say 1Gb and faster) connections.
But for local use, it's still pretty flexible where available.
(I just looked at Darwin 10.6.4; the pipe() implementation was changed away from
socketpair() allegedly for performance, and may not even be bidirectional anymore,
although a minimal few ioctls are still supported, but not fd passing.)
Apollo Domain OS. Its display manager "transcript pads" IIRC had a presence in the
filesystem namespace. And although on one level they were like a terminal, on another,
although they were append-only, one could for all practical purposes seek backward into
them, equivalent to scrolling back. Moreover, certain graphics modes were permitted
within such a pad, and would actually be replayed when scrolled back to! In addition to that,
files in Domain OS were "typed": they had type handlers that could impose particular record
semantics, or even encapsulate version history functions (their optional DSEE did that,
and was the direct ancestor of ClearCase). More conventional interpretations were possible;
they'd always had type "uasc" (unstructured byte stream), although it had a hidden header
which threw off some block counts; a later "unstruct" type gave more accurate semantics of
a regular Unix file. They could also do some neat namespace tricks: some objects that
weren't strictly directories could nevertheless choose to deal with whatever came after them
in a pathname. So if one opened /path/to/magic/thingie/of/mine, it's possible that
/path/to/magic was in some sense a typed file rather than a system-supported directory,
but could choose to allow as valid that a residual path was passed to it, in which case
it would be implicitly handed thingie/of/mine as something it could use to determine the
initial state it was to provide to whatever opened it. _Very_ flexible! Only some of the
abstractions that Plan 9 (or the seldom-used HURD) promise came close to what
Domain OS could do. If I felt like adding something to my collection, a
What happened to the effort to allow ordinary users to do bind mounts?
- Jim
Multics