Fun with file descriptors
This issue is a problem for kernel developers. They would rather not create new, file-descriptor-based services (completion events for syslet-based asynchronous I/O, for example) if glibc will not use those services. So there has been a search for alternatives, most of which involve creating a separate space for "system" file descriptors. Linus suggested one way of doing this:
Davide Libenzi took this idea forward with a patch to create a non-sequential file descriptor area. The current kernel tracks file descriptors in a linear array - a technique which works well as long as the "lowest available descriptor" rule holds. As soon as one starts setting high-order bits in file descriptor numbers, however, the linear array becomes rather less practical. So Davide's patch creates a separate, linked-list data structure used for the non-sequential file descriptor range. The second part of the patch set then fixes up the dup2() system call to use the new file descriptor range. The normal behavior of dup2() has not changed, but if the destination file descriptor is passed as FD_UNSEQ_ALLOC, a random file descriptor will be allocated from the non-sequential area. A specific file descriptor in that area can be requested by passing a number higher than FD_UNSEQ_BASE.
This approach has the advantage of not requiring any new system calls or changing the default user-space binary interface at all. But according to Ulrich Drepper, that attribute is not an advantage at all. Since using this capability requires application changes in any case, Ulrich would rather just see a new system call created; he proposes:
int nonseqfd(int fd, int flags);
This system call would duplicate the open file descriptor fd into the non-sequential space, optionally closing fd in the process. The flags parameter would allow other attributes of the new file descriptor to be controlled. Of particular interest is whether that descriptor shows up in the /proc/pid/fd directory. The optimal way of closing all open file descriptors, apparently, is to read that directory to see which descriptors are currently open. Keeping special descriptors out of that directory (perhaps shifting them to a parallel private-fd directory) will prevent well-meaning applications from closing the library's file descriptors.
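The proposed nonseqfd() does not exist in any shipped kernel, but the closest thing an application can do today is push a descriptor out of the contended low range with fcntl(F_DUPFD), which returns the lowest free descriptor greater than or equal to its third argument. A hedged sketch (the function name and the threshold of 1000 are illustrative, not part of any proposal):

```c
#include <fcntl.h>
#include <unistd.h>

/* Rough stand-in for the proposed nonseqfd(): duplicate fd to a number
 * at or above `threshold`, keeping it clear of the "lowest available
 * descriptor" range, then close the original (mimicking the proposal's
 * "optionally closing fd" behavior). */
int dup_high(int fd, int threshold)
{
    int newfd = fcntl(fd, F_DUPFD, threshold);
    if (newfd < 0)
        return -1;
    close(fd);
    return newfd;
}
```

Unlike the real proposal, this gives no separate descriptor space and no randomization; a high descriptor is still visible in /proc/pid/fd and still reachable by a close-everything sweep.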
It has been suggested that the open() system call should get a flag which would cause it to select a non-sequential file descriptor from the outset, eliminating the need for a separate call to nonseqfd(). There are, however, a number of system calls which create file descriptors but which have no flags parameter and which, thus, will never be able to return non-sequential file descriptors; socket() is a classic example. So there will still be a need for a system call which can duplicate a file descriptor into the new space.
Ulrich has requested that all file descriptors in the non-sequential space be allocated randomly. He would rather not ever see a situation where application developers think they can rely on any specific allocation behavior when using that space. There have also been suggestions that the non-sequential space could be useful for high-performance applications which hold large numbers of file descriptors open - web servers, for example. Such applications usually have no use for the "lowest available descriptor" guarantee and would happily do without the overhead of implementing that guarantee. Davide's current implementation does not appear to have been written with thousands of non-sequential file descriptors in mind, though.
On another front, Ulrich has been working on a race condition which comes up with certain types of applications. It is possible to request that a file descriptor be automatically closed if the process performs an exec(); the fcntl() system call is used for this purpose. The problem is that there is some time between when the file descriptor is created (with an open() call, perhaps) and the subsequent fcntl() call. If another thread forks and runs a new program between those two calls, its copy of the new file descriptor will not have the close-on-exec flag set and will thus remain open.
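The window is easy to see in code. In the two-step pattern below, the descriptor exists without the flag between the open() and the fcntl(); a fork()+exec() in another thread during that interval leaks it into the new program (a minimal sketch; the helper name is mine):

```c
#include <fcntl.h>
#include <unistd.h>

/* Mark an existing descriptor close-on-exec.  This is the racy
 * two-step the article describes: the descriptor is created first,
 * and only later flagged, so another thread's fork()+exec() in
 * between inherits it with the flag unset. */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}
```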
Solving that problem generally will take some work, but fixing the open case is relatively easy. Ulrich is proposing a new O_CLOEXEC flag for this purpose. There does not appear to be much opposition to this idea, so the new flag might well make an appearance in 2.6.23.
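Assuming the flag lands as proposed, usage would look like the sketch below: the close-on-exec bit is set atomically at creation time, so there is no window for another thread's fork()+exec() to leak the descriptor (the wrapper name is illustrative):

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Open a file with close-on-exec set from the outset; the kernel sets
 * FD_CLOEXEC before the descriptor ever becomes visible, eliminating
 * the open()-then-fcntl() race entirely. */
int open_cloexec(const char *path)
{
    return open(path, O_RDONLY | O_CLOEXEC);
}
```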
Index entries for this article:
Kernel | File descriptors
Kernel | System calls
Fun with file descriptors
Posted Jun 7, 2007 4:42 UTC (Thu)
by felixfix (subscriber, #242)
[Link] (1 responses)
Hmmm ... It's been a while since I wrote threaded socket code, so I may be barking up a non-existent tree. But if a new open() flag to create a non-sequential fd is not welcome because socket(), for instance, does not have these flags, why is a new O_CLOEXEC flag acceptable? Is it the case that no socket fd will ever be closed upon exec? Or is it because exec is (still?) a no-no in threaded programs?
Fun with file descriptors
Posted Jun 14, 2007 7:51 UTC (Thu)
by slamb (guest, #1070)
[Link]
I'm not sure about a "no-no", but exec*() is generally preceded by fork(), and I'm told a fork() in large threaded programs can be extremely slow. Apache uses an external forking server for this reason. Other caveats include that in the fork()ed child process, it is as if all other threads suddenly died, and they still hold whatever mutexes they were holding. So any resources the child requires (before calling exec*()) need a pthread_atfork(3) handler or similar.
Fun with file descriptors
Posted Jun 7, 2007 5:48 UTC (Thu)
by daney (guest, #24551)
[Link] (4 responses)
The O_CLOEXEC for open(2) is a good start, but it is not sufficient. We need coverage for accept(2) also. I am not sure if there are other ways of creating file descriptors, but unless you plug them all, you have to assume that the race condition will occur and take precautions. If you have to take these precautions, then fixing only a portion of the causes (fixing open(2)) is not all that useful.
Another nasty race condition that I have considered is:
Thread 1               Thread 2               Thread 3
------------------------------------------------------------------
load fd from memory    load fd from memory
                       close fd
                                              obtain same fd from open
use fd on wrong file

If Thread 3 did not reuse the fd there would not be a problem.
I am glad that people are starting to address these issues.
Fun with file descriptors
Posted Jun 7, 2007 11:52 UTC (Thu)
by HenrikH (subscriber, #31152)
[Link] (2 responses)
But this should be controlled by the application using mutexes to lock the access to the fd, and you probably have some struct holding the fd that will also be freed upon the "close fd", so that thread #1 can no longer access the same fd after it is closed by thread #2.
That said however, it would be nice to avoid the reuse of the fd as it is now.
Fun with file descriptors
Posted Jun 7, 2007 16:41 UTC (Thu)
by daney (guest, #24551)
[Link] (1 responses)
The problem is that if you use a mutex to protect a blocking read, you cannot interrupt it by closing the file descriptor, as the thread calling close would block on the mutex.
There are ways to work around the current state of things, but the problem is that they are often complex and easy to get wrong.
Fun with file descriptors
Posted Jun 12, 2007 15:23 UTC (Tue)
by shane (subscriber, #3335)
[Link]
> The problem being is that if you use a mutex to protect a blocking
> read, you can not interrupt it by closing the file descriptor as the
> thread calling close would block on the mutex.
Yes, this is a tricky problem.
You can create a file descriptor that you can use to "wake" the blocking thread, and then use select() or poll(). You can use a pipe for this (of course you have to be careful to handle closing the pipe then).
You can also use a condition variable for the reader, and a separate thread checking for input that signals the reader. A third thread should be able to close the file descriptor, which should then return EBADF from its select() or poll(). (I think...)
Sharing file descriptors between threads is indeed a pain in the ass though. Correct threaded programming is fairly difficult, it makes one wonder if maybe good old event-driven programming wasn't really the answer all along!
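The pipe-based wakeup described in the comment above is the well-known "self-pipe trick". A minimal sketch, assuming the caller has already created the pipe (function and parameter names are mine):

```c
#include <poll.h>
#include <unistd.h>

/* Block waiting for data on `datafd`, but also watch the read end of a
 * private wakeup pipe.  Any thread can interrupt the wait by writing a
 * byte to the pipe's write end - no need to close datafd out from
 * under the sleeping thread.  Returns 1 if woken via the pipe, 0 if
 * data arrived on datafd, -1 on error. */
int wait_interruptible(int datafd, int wake_rd)
{
    struct pollfd pfds[2] = {
        { .fd = datafd,  .events = POLLIN },
        { .fd = wake_rd, .events = POLLIN },
    };
    if (poll(pfds, 2, -1) < 0)
        return -1;
    if (pfds[1].revents & POLLIN) {
        char c;
        if (read(wake_rd, &c, 1) < 0)   /* drain the wakeup byte */
            return -1;
        return 1;
    }
    return 0;
}
```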
Fun with file descriptors
Posted Jul 1, 2008 21:21 UTC (Tue)
by dankamongmen (subscriber, #35141)
[Link]
I'm being brutally bent over by the lack of an accept(2) solution right now (I'm documenting the changes as they come along, btw; see: http://dank.qemfd.net/dankwiki/index.php/Linux_APIs#File_...). It seems setsockopt(2) could be easily enough overloaded for this purpose...argh!
Fun with file descriptors
Posted Jun 7, 2007 14:24 UTC (Thu)
by RobSeace (subscriber, #4435)
[Link] (10 responses)
> This problem is evidently real, to the point that the glibc goes out of
> its way to avoid using internal file descriptors for anything.
Um, except that it DOES use internal FDs for lots of stuff... How do you think syslog() is implemented? It needs a Unix domain socket FD to talk to syslogd... (Plus, it might open another FD for the console to blast messages there...) How do you think the DNS resolver works? It needs sockets to talk to DNS servers... Plus, there are countless opens of various "/proc" files and other regular files going on all the time... Granted, usually just very temporary, and closed soon after, but if we're worrying about multi-threaded code, that could still cause problems if they brokenly assume no one will be messing with the FDs... And, that's just libc; lots of other libs also open private FDs for their own use... Got an X app? You've got at least one hidden FD for talking to the X server... There are surely many other examples, as well... It's a fairly common practice, not something strange and unheard of... I just don't get why there needs to be a big deal made about it, and the sudden desire to cater to obviously and perversely broken apps... Continuing to enable these broken apps to function is not helping the situation any; the correct thing to do is fix the broken apps, not enable their broken behavior... Because, they're STILL going to break eventually, when they use some OTHER lib someday which uses FDs behind its back...
That said, I'm not really opposed to the idea of a separate FD space... I just don't see where its all that necessary... Anyone got an example of such a real-world broken app?? (And, if so, why the *bleep* isn't it fixed yet??)
Fun with file descriptors
Posted Jun 8, 2007 0:33 UTC (Fri)
by bronson (subscriber, #4806)
[Link] (8 responses)
If file descriptors were safe by default (close-on-exec), then I agree, let glibc scatter FDs everywhere. In my case, though, it's a bit more difficult. I'm writing a test harness that starts as root, does some non-trivial processing (including a lot of forking), then eventually drops perms and execs a potentially hostile, user-supplied executable. (by hostile, I mean the way any executable in ~/bin on a multi-user system is potentially hostile)
Well, I definitely run through the entire FD space to make sure ALL FDs that I don't know about are closed. If glibc opens an FD to some sensitive resource while I'm running as root, and that FD remains open when I drop perms and exec, that user's executable gets free rein over some potentially sensitive system resource.
I'll admit that I haven't thought about this too deeply (it's just a one-off hack)... Is there any better solution than running through and closing anything I don't know about? I've found the occasional lurking fd (a file leaked earlier in the process, a forgotten syslog, etc) so my solution, while damned ugly, has probably saved me once or twice.
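The sweep described above is usually done on Linux by reading /proc/self/fd, as the article notes. A minimal sketch, with all names mine and the fixed-size array an obvious simplification; note that the descriptors must be collected first and closed only after closedir(), since closing them mid-scan would yank the directory stream's own fd out from under readdir():

```c
#include <dirent.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Close every open descriptor numbered above `lowfd`, discovered by
 * listing /proc/self/fd (Linux-specific).  The directory stream's own
 * descriptor is skipped and closed by closedir() itself. */
int close_fds_above(int lowfd)
{
    DIR *dp = opendir("/proc/self/fd");
    if (!dp)
        return -1;
    int fds[1024], n = 0;
    struct dirent *de;
    while ((de = readdir(dp)) != NULL) {
        int fd = atoi(de->d_name);   /* "." and ".." parse as 0 */
        if (fd > lowfd && fd != dirfd(dp) && n < 1024)
            fds[n++] = fd;
    }
    closedir(dp);
    for (int i = 0; i < n; i++)
        close(fds[i]);
    return 0;
}
```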
Why oh why can't FDs be safe by default?
Fun with file descriptors
Posted Jun 8, 2007 4:09 UTC (Fri)
by daney (guest, #24551)
[Link]
> Why oh why can't FDs be safe by default?
Probably because POSIX defines how they work. Being compatible with legacy systems has its price.
Fun with file descriptors
Posted Jun 8, 2007 7:16 UTC (Fri)
by quotemstr (subscriber, #45331)
[Link]
What about a thread-specific default set of FD flags?
Using something like this, we wouldn't need to modify any existing APIs.
/* Internal glibc function */
A: old_fd_flags = kernel_default_fd_flags(FD_CLOEXEC | FD_RANDFD);
B: event_fd = super_duper_event_polling_mechanism_fd();
C: kernel_default_fd_flags(old_fd_flags);
Since the state is thread-specific, we don't need to worry about cross-thread synchronization. It wouldn't be inherited across exec, fork or clone, since it's intended for purely local options. I can't think of a situation where one would want to create a new thread and atomically give it a default set of FD flags.
It's race-free as well. If a fork happens between A and B, nothing unusual happens; the child process doesn't inherit the thread setting flags. If a fork happens after B, event_fd is closed when the child exec()s.
It adheres to POSIX as long as the application doesn't touch kernel_default_fd_flags itself, and as long as any libraries restore flags after they're done with them.
Why not add an FD_CLOFORK while we're at it? That's a lot closer to what you'd want for a piece of code that allocates an internal file descriptor. Granted, multithreaded programs shouldn't fork except to then exec.
Fun with file descriptors
Posted Jun 8, 2007 10:34 UTC (Fri)
by RobSeace (subscriber, #4435)
[Link] (5 responses)
I understand your concerns, but as far as I can see, a separate FD space doesn't do anything to help with your problems at all... The FDs would still be there, but merely outside the normal range... (And, if they were in a separate "/proc" location, you might not even know where they were to legitimately be able to close them prior to your exec*()... Unless glibc had diligently and properly set CLOEXEC for them all... But, if it did that anyway, it wouldn't matter if they were in the separate space or not, for your purposes...) Now, the automatic CLOEXEC stuff would help you, and I have no problems with that notion at all... The only thing I find a bit strange is the supposed need for this separate FD space, because there are supposedly apps that don't operate properly when glibc (or whoever) creates an FD of its own, which I find absurd... If such an app really exists, I'd like to know about it, for no other reason than to know whose code to avoid in the future...
Fun with file descriptors
Posted Jun 11, 2007 9:33 UTC (Mon)
by nlucas (guest, #33793)
[Link] (4 responses)
According to the text, the problem is with applications assuming the "lowest available descriptor" guarantee, which seems to be a POSIX thing, so can't be changed in the generic case.
An example where it can be an issue would be for an application to allocate several file descriptors in a loop, mixed with the usual mix of libc functions, and then assume the <max> parameter for select (which, by the way, is a brain-dead parameter) as the first file descriptor plus the number of file descriptors created.
This seems to be a POSIX accepted behaviour, even if the code seems a bit fishy (not very good for later maintenance), so must be supported either by the kernel or by libc.
Fun with file descriptors
Posted Jun 11, 2007 10:32 UTC (Mon)
by RobSeace (subscriber, #4435)
[Link] (3 responses)
No, the only thing guaranteed is that any particular call that creates a new FD will return the lowest numbered FD currently available... That's it... It doesn't guarantee that nothing else outside the app code will use up any FDs... And, no sane code would EVER make such a brain-dead assumption... Because, any programmer worth a damn KNOWS for a fact that LOTS of library code DOES indeed open up lots of FDs of its own for various uses... So, unless you have complete control over the code, and aren't making any library function calls, you better make NO assumptions over what particular FD number you are going to get assigned at any particular time... You can be guaranteed it's the lowest currently available number, but that doesn't mean a whole lot if you don't know all of the currently open FDs...
For instance, this common behavior is fairly reasonable:
close (0);
open ("/dev/null", O_RDONLY);
Assuming that the open() will get FD# 0... That's reasonable enough, because it's hard to imagine the need for either close() or open() to create a persistent extra FD of its own for some use behind your back, and this sort of behavior has historically always worked... But, if you add any other lib function calls between the close() and the open(), you're just asking for trouble, and you can't be too surprised when it doesn't work anymore... (Plus, if you really wanted to write good code, you'd instead use dup2() or freopen() or something, to guarantee assignment of the desired FD#...)
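The dup2() alternative mentioned above can be sketched as follows; dup2() closes the target slot and installs the new file in a single step, so the result does not depend on the "lowest available descriptor" rule at all (the helper name is mine):

```c
#include <fcntl.h>
#include <unistd.h>

/* Make descriptor `fd` refer to /dev/null without relying on the
 * lowest-available-descriptor rule: dup2() atomically closes whatever
 * occupied `fd` and installs the new open file there. */
int point_at_devnull(int fd)
{
    int nullfd = open("/dev/null", O_RDONLY);
    if (nullfd < 0)
        return -1;
    if (dup2(nullfd, fd) < 0) {
        close(nullfd);
        return -1;
    }
    if (nullfd != fd)
        close(nullfd);   /* drop the temporary descriptor */
    return 0;
}
```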
Fun with file descriptors
Posted Jun 11, 2007 12:29 UTC (Mon)
by nlucas (guest, #33793)
[Link] (1 responses)
Fair enough. To me, even the lowest number fd guarantee is strange, so I thought it was some POSIX weirdness (along with other weird behaviours for compatibility's sake).
Anyway, even the { close(0); open(...); } is strange, as it doesn't take into account multithreading, so I wouldn't ever do anything like that, even in single-threaded applications (I never know when a piece of code that seems trivial will later be "copy/pasted" into a multithreaded application).
Fun with file descriptors
Posted Jun 14, 2007 7:58 UTC (Thu)
by slamb (guest, #1070)
[Link]
Avoiding code which is broken if copy'n'pasted into the wrong context is a hopeless job - the best you can do is state your code's assumptions, and whoever adds code (whether pasted or not) must take responsibility for it.
Anyway, I can't really imagine when you'd ever want to replace stdin/stdout/stderr while multiple threads are going, so I don't know why someone would paste this there.
Fun with file descriptors
Posted Sep 15, 2007 6:40 UTC (Sat)
by schabi (guest, #14079)
[Link]
You wrote:
> For instance, this common behavior is fairly reasonable:
> close (0);
> open ("/dev/null", O_RDONLY);
This one is broken as soon as threads are involved - a typical race condition: another thread could do anything between close(0) and open().
Fun with file descriptors
Posted Jun 8, 2007 18:55 UTC (Fri)
by jengelh (subscriber, #33263)
[Link]
If you run the proprietary nvidia driver, each GL app will open /dev/nvidiactl at least once and /dev/nvidia0 at least twice. Oh well.