The new pselect() system call
Applications like network servers that need to monitor multiple file descriptors using select(), poll(), or (on Linux) epoll_wait() sometimes face a problem: how to wait until either one of the file descriptors becomes ready, or a signal (say, SIGINT) is delivered. These system calls, as it turns out, do not interact entirely well with signals.
A seemingly obvious solution would be to write an empty handler for the signal, so that the signal delivery interrupts the select() call:
```c
#include <signal.h>
#include <sys/select.h>

static void
handler(int sig)
{
    /* Do nothing */
}

int
main(int argc, char *argv[])
{
    fd_set readfds;
    struct sigaction sa;
    int nfds, ready;

    sa.sa_handler = handler;        /* Establish signal handler */
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGINT, &sa, NULL);

    /* ... */

    ready = select(nfds, &readfds, NULL, NULL, NULL);

    /* ... */
```
After select() returns, we can determine what happened by looking at the function result and errno. If select() returns -1 with errno set to EINTR, we know that the call was interrupted by a signal, and can act accordingly. But this solution suffers from a race condition: if SIGINT is delivered after the call to sigaction() but before the call to select(), the handler runs while there is not yet any select() call for it to interrupt. The subsequent select() then blocks as though the signal had never arrived, so the signal is effectively lost.
We can try playing various games, such as setting a global flag within the signal handler and checking that flag in the main program, and using sigprocmask() to block the signal until just before the select() call. However, none of these techniques can entirely eliminate the race condition: there is always some interval, no matter how brief, between checking the flag (or unblocking the signal) and the start of the select() call, during which the signal can be delivered and handled without interrupting select().
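A minimal sketch of the flag-plus-sigprocmask() approach just described, with the window marked. The helper names (setup_flag_handler(), racy_wait(), got_sigint) are hypothetical. The sketch behaves correctly when the signal is already pending at the unblock step, because POSIX requires a pending signal that becomes unblocked to be delivered before sigprocmask() returns; it fails only when the signal arrives in the marked gap.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/select.h>

/* Set by the handler, checked by the main program (hypothetical name). */
static volatile sig_atomic_t got_sigint = 0;

static void
flag_handler(int sig)
{
    (void) sig;
    got_sigint = 1;
}

/* Hypothetical helper: block SIGINT, then install the flag-setting handler. */
static void
setup_flag_handler(void)
{
    struct sigaction sa;
    sigset_t block;

    sigemptyset(&block);
    sigaddset(&block, SIGINT);
    sigprocmask(SIG_BLOCK, &block, NULL);

    sa.sa_handler = flag_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGINT, &sa, NULL);
}

/* Wait for 'fd' to become readable or for SIGINT.
   Returns 1 if fd is ready, 0 if SIGINT was seen, -1 on other errors.
   NOT race-free: see the marked window below. */
static int
racy_wait(int fd)
{
    sigset_t unblock;
    fd_set readfds;
    int ready;

    sigemptyset(&unblock);
    sigaddset(&unblock, SIGINT);
    sigprocmask(SIG_UNBLOCK, &unblock, NULL);   /* A pending SIGINT runs here */

    if (got_sigint)
        return 0;

    /* RACE WINDOW: a SIGINT arriving here runs the handler and sets the
       flag, but select() has not started yet, so it blocks anyway. */

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    ready = select(fd + 1, &readfds, NULL, NULL, NULL);
    if (ready == -1)
        return (errno == EINTR) ? 0 : -1;
    return 1;
}
```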
The traditional solution to this problem is the so-called self-pipe trick, often credited to D J Bernstein. Using this technique, a program establishes a signal handler that writes a byte to a specially created pipe whose read end is also monitored by select(). The self-pipe trick cleverly solves the problem of safely waiting either for a file descriptor to become ready or for a signal to be delivered. However, it requires a relatively large amount of code to implement a requirement that is essentially simple. (For example, a robust solution requires marking both the read and write ends of the pipe non-blocking, so that the handler never blocks on a full pipe and the main program can drain the pipe without blocking.)
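A minimal sketch of the self-pipe trick, assuming a single-threaded program and using hypothetical helper names (self_pipe_setup(), self_pipe_wait()). It shows the pieces the paragraph alludes to: an errno-preserving handler that writes one byte, non-blocking pipe ends, an EINTR retry, and a loop that drains the pipe after select() reports it readable.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

static int self_pipe[2];        /* [0] = read end, [1] = write end */

static void
self_pipe_handler(int sig)
{
    int saved_errno = errno;    /* write() may change errno */
    ssize_t n;

    (void) sig;
    /* If the pipe is full, a wakeup is already pending, so EAGAIN is harmless. */
    n = write(self_pipe[1], "x", 1);
    (void) n;
    errno = saved_errno;
}

static int
set_nonblock(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Create the pipe, make both ends non-blocking, install the handler for 'sig'. */
static int
self_pipe_setup(int sig)
{
    struct sigaction sa;

    if (pipe(self_pipe) == -1)
        return -1;
    if (set_nonblock(self_pipe[0]) == -1 || set_nonblock(self_pipe[1]) == -1)
        return -1;

    sa.sa_handler = self_pipe_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(sig, &sa, NULL);
}

/* Wait until 'fd' is readable or the signal arrives.
   Returns 2 if the signal was seen (checked first), 1 if only fd is ready,
   -1 on error. */
static int
self_pipe_wait(int fd)
{
    fd_set readfds;
    int nfds, ready;
    char buf[64];

    for (;;) {
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        FD_SET(self_pipe[0], &readfds);
        nfds = (fd > self_pipe[0] ? fd : self_pipe[0]) + 1;

        ready = select(nfds, &readfds, NULL, NULL, NULL);
        if (ready == -1 && errno == EINTR)
            continue;           /* Handler ran; the pipe is now readable */
        if (ready == -1)
            return -1;

        if (FD_ISSET(self_pipe[0], &readfds)) {
            /* Drain all bytes; the non-blocking read fails with EAGAIN when empty. */
            while (read(self_pipe[0], buf, sizeof(buf)) > 0)
                continue;
            return 2;
        }
        return 1;
    }
}
```

Even this stripped-down version needs two fcntl() calls, an EINTR retry, and a drain loop, which is the bookkeeping that pselect() is meant to remove.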
For this reason, the POSIX.1g committee devised an enhanced version of select(), called pselect(). The major difference between select() and pselect() is that the latter call has a signal mask (sigset_t) as an additional argument:
```c
int pselect(int n, fd_set *readfds, fd_set *writefds, fd_set *exceptfds,
            const struct timespec *timeout, const sigset_t *sigmask);
```
The sigmask argument specifies a set of signals that should be blocked during the pselect() call; it overrides the current signal mask for the duration of that call. So, when we make the following call:
```c
ready = pselect(nfds, &readfds, &writefds, &exceptfds, timeout, &sigmask);
```
the kernel performs a sequence of steps that is equivalent to atomically performing the following system calls:
```c
sigset_t sigsaved;

sigprocmask(SIG_SETMASK, &sigmask, &sigsaved);
ready = select(nfds, &readfds, &writefds, &exceptfds,
               timeout);    /* timeout converted to a struct timeval */
sigprocmask(SIG_SETMASK, &sigsaved, NULL);
```
For some time now, glibc has provided a library implementation of pselect() that actually uses the above sequence of system calls. The problem is that this implementation remains vulnerable to the very race condition that pselect() was designed to avoid, because the separate system calls are not executed as an atomic unit.
Using pselect(), we can safely wait for either a signal to be delivered or a file descriptor to become ready, by replacing the first part of our example program with the following code:
```c
sigset_t emptyset, blockset;

sigemptyset(&blockset);         /* Block SIGINT */
sigaddset(&blockset, SIGINT);
sigprocmask(SIG_BLOCK, &blockset, NULL);

sa.sa_handler = handler;        /* Establish signal handler */
sa.sa_flags = 0;
sigemptyset(&sa.sa_mask);
sigaction(SIGINT, &sa, NULL);

/* Initialize nfds and readfds, and perhaps do other work here */

/* Unblock signal, then wait for signal or ready file descriptor */

sigemptyset(&emptyset);
ready = pselect(nfds, &readfds, NULL, NULL, NULL, &emptyset);
/* ... */
```
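This code works because SIGINT is unblocked only once control has passed to the kernel. As a result, there is no point at which the signal can be delivered before pselect() starts waiting. If SIGINT was generated earlier, while it was still blocked, it remains pending and is delivered as soon as pselect() installs the empty mask, so the call returns immediately with EINTR. If the signal is generated while pselect() is blocked, then, as with select(), the system call is interrupted, and the signal is delivered before the system call returns.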
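The following sketch exercises the case where the signal is generated before the wait starts. It assumes a kernel-backed pselect(), uses SIGUSR1 in place of SIGINT, and the helper names (block_and_install(), wait_fd_or_signal(), got_sig) are hypothetical. With a library emulation built from the three-call sequence shown earlier, the same scenario would instead deliver the signal during the first sigprocmask() step and then block in select().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/select.h>

static volatile sig_atomic_t got_sig = 0;

static void
note_signal(int sig)
{
    (void) sig;
    got_sig = 1;
}

/* Block 'sig' (saving the old mask in *origmask), then install the handler. */
static int
block_and_install(int sig, sigset_t *origmask)
{
    sigset_t blockset;
    struct sigaction sa;

    sigemptyset(&blockset);
    sigaddset(&blockset, sig);
    if (sigprocmask(SIG_BLOCK, &blockset, origmask) == -1)
        return -1;

    sa.sa_handler = note_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(sig, &sa, NULL);
}

/* Wait for 'fd' to become readable or for a signal, with all signals
   unblocked only for the duration of pselect(). */
static int
wait_fd_or_signal(int fd)
{
    fd_set readfds;
    sigset_t emptyset;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    sigemptyset(&emptyset);
    return pselect(fd + 1, &readfds, NULL, NULL, NULL, &emptyset);
}
```

On return, the previous mask (with SIGUSR1 blocked) is back in force, so the main program can inspect got_sig while the handler cannot run.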
Although pselect() was conceived several years ago, and was already publicized in 1998 by W. Richard Stevens in his Unix Network Programming, vol. 1, 2nd ed., actual implementations have been slow to appear. Their eventual appearance in recent releases of various Unix implementations has been driven in part by the fact that the 2001 revision of the POSIX.1 standard requires a conforming system to support pselect(). With the 2.6.16 kernel release, and the required wrapper function that appears in the recently released glibc 2.4, pselect() also becomes available on Linux.
Linux 2.6.16 also includes a new (but nonstandard) ppoll() system call, which adds a signal mask argument to the traditional poll() interface:
```c
int ppoll(struct pollfd *fds, nfds_t nfds,
          const struct timespec *timeout, const sigset_t *sigmask);
```
This system call adds the same functionality to poll() that pselect() adds to select(). Not to be left in the cold, the epoll maintainer has patches in the pipeline to add similar functionality in the form of a new epoll_pwait() system call.
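The same pattern with ppoll(), as a sketch: glibc declares ppoll() only when _GNU_SOURCE is defined, the example uses SIGUSR2 as the signal of interest, and the helper names (ppoll_prepare_signal(), ppoll_fd_or_signal(), ppoll_got_signal) are hypothetical. Passing a NULL timeout waits indefinitely.

```c
#define _GNU_SOURCE             /* glibc declares ppoll() only with this */
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <stddef.h>
#include <time.h>

static volatile sig_atomic_t ppoll_got_signal = 0;

static void
ppoll_note_signal(int sig)
{
    (void) sig;
    ppoll_got_signal = 1;
}

/* Block 'sig' and install a flag-setting handler for it. */
static int
ppoll_prepare_signal(int sig)
{
    sigset_t blockset;
    struct sigaction sa;

    sigemptyset(&blockset);
    sigaddset(&blockset, sig);
    if (sigprocmask(SIG_BLOCK, &blockset, NULL) == -1)
        return -1;

    sa.sa_handler = ppoll_note_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(sig, &sa, NULL);
}

/* Wait for 'fd' to become readable, for a signal, or for the timeout
   (NULL = forever). All signals are unblocked only inside ppoll(). */
static int
ppoll_fd_or_signal(int fd, const struct timespec *timeout)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    sigset_t emptyset;

    sigemptyset(&emptyset);
    return ppoll(&pfd, 1, timeout, &emptyset);
}
```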
There are a few other, minor differences between pselect() and ppoll() and their traditional counterparts. For example, the timeout argument is a pointer to a timespec structure:
```c
struct timespec {
    time_t tv_sec;      /* Seconds */
    long   tv_nsec;     /* Nanoseconds */
};
```
This allows the timeout interval to be specified with nanosecond granularity, rather than the microsecond granularity of the struct timeval used by select() or the milliseconds accepted by poll().
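To make the precision difference concrete, this sketch converts the same interval to both structures using hypothetical helpers timespec_from_ns() and timeval_from_ns(); the timeval loses everything below one microsecond.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/time.h>   /* struct timeval, as used by select() */
#include <time.h>       /* struct timespec, as used by pselect() and ppoll() */

#define NSEC_PER_SEC 1000000000LL

/* Split a non-negative nanosecond count into a timespec (exact). */
static struct timespec
timespec_from_ns(long long ns)
{
    struct timespec ts;
    ts.tv_sec = (time_t) (ns / NSEC_PER_SEC);
    ts.tv_nsec = (long) (ns % NSEC_PER_SEC);
    return ts;
}

/* The same interval as a timeval: sub-microsecond remainder is truncated. */
static struct timeval
timeval_from_ns(long long ns)
{
    struct timeval tv;
    tv.tv_sec = (time_t) (ns / NSEC_PER_SEC);
    tv.tv_usec = (ns % NSEC_PER_SEC) / 1000;
    return tv;
}
```

For example, 1500000250 ns becomes { 1, 500000250 } as a timespec but { 1, 500000 } as a timeval.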
The glibc wrappers for pselect() and ppoll() also hide a couple of details of the underlying system calls.
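First, the system calls actually expect the signal mask to be described by two values: a pointer to a sigset_t, and an integer giving the size of that set in bytes. This allows for the possibility of a larger sigset_t type in the future. (For ppoll(), these are simply the fourth and fifth arguments. Because a Linux system call can take at most six arguments, the pselect6() system call instead receives the two values packaged in a structure whose address is passed as its sixth argument.)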
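A sketch of what the wrapper supplies, calling the raw ppoll system call through syscall(). It assumes Linux on an architecture that defines SYS_ppoll with a 64-bit time_t, such as x86-64; raw_ppoll() and KERNEL_SIGSET_BYTES are hypothetical names, and _NSIG / 8 (8 bytes on x86-64) is assumed to match the size the kernel expects. glibc's own sigset_t is larger (128 bytes), and the kernel rejects a size that does not match its own signal set with EINVAL.

```c
#define _GNU_SOURCE             /* for syscall() */
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Assumed size of the kernel's signal set; NOT sizeof(sigset_t). */
#define KERNEL_SIGSET_BYTES (_NSIG / 8)

/* Invoke the ppoll system call directly, passing the signal mask as the
   (pointer, size) pair that the glibc wrapper normally fills in. */
static long
raw_ppoll(struct pollfd *fds, nfds_t nfds, const struct timespec *tmo,
          const sigset_t *mask, size_t masksize)
{
    return syscall(SYS_ppoll, fds, nfds, tmo, mask, masksize);
}
```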
The underlying system calls also modify their timeout argument so that on an early return (because a file descriptor became ready, or a signal was delivered), the caller knows how much of the timeout remained. However, the respective wrapper functions hide this detail by making a local copy of the timeout argument and passing that copy to the underlying system calls. (The Linux select() system call also modifies its timeout argument, and this behavior is visible to applications. However, many other select() implementations don't modify this argument. POSIX.1 permits either behavior in a select() implementation.)
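A Linux-specific sketch of the select() timeout write-back (select_with_timeout() is a hypothetical helper): an early return leaves the unslept time in the struct timeval, and an expired timeout leaves it zeroed.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Call select() on 'fd' (or on no descriptors if fd < 0) and leave
   whatever the kernel wrote back in *tv. */
static int
select_with_timeout(int fd, struct timeval *tv)
{
    fd_set readfds;

    FD_ZERO(&readfds);
    if (fd >= 0)
        FD_SET(fd, &readfds);
    return select(fd >= 0 ? fd + 1 : 0, fd >= 0 ? &readfds : NULL,
                  NULL, NULL, tv);
}
```

Because other systems may leave the argument untouched, portable code reinitializes the timeout before every select() call rather than relying on either behavior.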
Further details of pselect() and ppoll() can be found in the latest versions of the select(2) and poll(2) man pages, which can be found here.
Index entries for this article | |
---|---|
GuestArticles | Kerrisk, Michael |
Posted Mar 30, 2006 8:31 UTC (Thu) by wingo (guest, #26929)
Posted Mar 30, 2006 11:20 UTC (Thu) by nix (subscriber, #2304)
Posted Mar 30, 2006 15:39 UTC (Thu) by clugstj (subscriber, #4020)
Posted Mar 30, 2006 19:05 UTC (Thu) by smoogen (subscriber, #97)
Posted Mar 30, 2006 19:21 UTC (Thu) by clugstj (subscriber, #4020)
Posted Mar 30, 2006 19:58 UTC (Thu) by mikov (guest, #33179)
For comparison the Win32 API doesn't support signals.
Posted Mar 31, 2006 4:24 UTC (Fri) by hppnq (guest, #14462)
Signals can be used for notifying process groups asynchronously of some event, for instance, which is likely to be a bit more tricky to do with select(). Yes, this is ancient Unix, but so is the concept of the tty.
Using select() as a replacement for signal handling is a quite tricky hack that only solves one specific problem and creates a couple of others, so I'm glad work is being done to implement the appropriate, standard interface. :-)
Posted Mar 31, 2006 13:30 UTC (Fri) by clugstj (subscriber, #4020)
Posted Mar 31, 2006 15:23 UTC (Fri) by hppnq (guest, #14462)
The complexity that arises from mixing signals and threads is only justified if you have specific reasons to implement it like that. By default, a multi-threaded process acts the same as a non-threaded process when interrupted.
So, unless you have specific reasons for defining per-thread signal masks (which is possible), there's nothing special about the multi-threaded case.
Posted Mar 31, 2006 20:16 UTC (Fri) by giraffedata (guest, #1954)
How about the most basic use of signals: I have a server that waits for input on various sockets and I want to terminate the server. Sending SIGTERM is a conventional way to do that. I know it well, I know the tools ('kill') to do it like the back of my hand, and it works the same on most other processes, so it is very convenient for it to work on the server in question.
If the server is simple enough, the author doesn't have to write any code at all; the kernel will terminate it all by itself. If the server wants to e.g. finish up transactions in progress, the author can add a small amount of code to handle SIGTERM. But without signals, the author must write more code and design a special termination protocol. I then must remember to use it, and probably look up how to use the tool that initiates it, every time I want to terminate the server.
There are similar external interruption sort of things where signals are easier to implement and easier for the user to deal with. Think about a program in which a user types commands. One of those commands is taking too long and he wants to abort and go back to the program's command prompt. Ctl-C is an excellent way for him to interrupt, and rather easy to implement (Ctl-C normally generates a signal).
Signals also cut through layers. Let's say I run a program that calls a library function that calls a library function and so on and 5 levels deep there is a select(). I don't know or want to know anything about those deep libraries, but I know I don't want to wait more than 4 seconds for everything to finish. With a signal, I can get that select() to abort after 4 seconds without even knowing it's there. The only other way would be to add timeouts in every parameter list in the stack, and have every level keep track of elapsed time. More work for everybody.
I think of signals, when properly used with the modern interfaces, to solve the same problems that interrupts and "throwing" objects do.
Timeout, on the other hand, is what is misplaced on the select() call. Timing out should be handled either through a signal (it would have to be different from the existing global signals, so that you could use it inside a library) or a file descriptor that becomes ready at a certain time (not unlike the self-sending pipe thing that substitutes for signals).
Posted Apr 7, 2006 13:39 UTC (Fri) by lgb (guest, #784)
Posted Mar 30, 2006 16:35 UTC (Thu) by mtk77 (guest, #6040)
My little inetd at http://hairy.beasts.org/whinetd/whinetd/src/whinetd.c uses this technique.
Posted Mar 30, 2006 21:03 UTC (Thu) by jreiser (subscriber, #11027)
Posted Mar 30, 2006 22:13 UTC (Thu) by mkerrisk (subscriber, #1978)
Posted Mar 31, 2006 1:21 UTC (Fri) by dougm (guest, #4615)
Posted Mar 31, 2006 3:41 UTC (Fri) by mkerrisk (subscriber, #1978)
Posted Mar 31, 2006 13:43 UTC (Fri) by clugstj (subscriber, #4020)
It is not a better solution just because it takes less code (in user space).
Posted Apr 6, 2006 10:09 UTC (Thu) by renox (guest, #23785)
Well, does glibc include it?
Whereas the kernel implementation really is written only once, so it is really better.
That said, I find it quite awful that glibc would implement pselect() in a way that leaves the unsuspecting developer vulnerable to the race condition; it should be implemented either in the kernel or not at all (unless the library finds a way to close the race condition, of course).
Posted Jul 21, 2008 14:23 UTC (Mon) by almorozov (guest, #53014)
Posted Nov 22, 2024 13:15 UTC (Fri) by TuhinK (guest, #174722)
Nice article, thanks.
The new pselect() system call
Now all we need is working userspace kernel headers for 2.6.16, so the new glibc can actually get at these new syscalls... :/
The new pselect() system call
I don't understand why people want to use signals in their programs. They are a race condition nightmare to manage in combination with threads. I have always (20 years of experience) been able to rewrite programs to avoid signal usage except in the case of fatal error conditions.
The new pselect() system call
An example of how this is done might illuminate people who still use it.
The new pselect() system call
OK, but first someone must explain to me what the signal is being used for. Then I will try to come up with a design that doesn't require a signal to perform this function. Usually, in the past, what I've seen is signals being used for timeouts. This is easily avoided by using select/poll (which have timeout parameters) before calling read/write instead of blocking in read/write.
The new pselect() system call
I completely agree. Signals should be avoided if possible - there are countless race conditions and apparently even buggy glibc functions (!) associated with them. To me signals have always seemed like a remnant from the past when they were used to supplement a lacking API. If anything the API should evolve to decrease the need for signals even further.
The new pselect() system call
Mmmh, I like the fact that system calls can be interrupted. :-)
The new pselect() system call
System call interruption is not portable. Some UNIXes will restart some system calls after they handle the signal, some won't. Yes, signals are OK in non-threaded applications, but when the application uses threads, the race-condition nightmare begins.
System Call Interruption
Portability is not an issue when your system call is actually interrupted. :-)
System Call Interruption
The new pselect() system call
I've met with this problem when I wrote my little server, which serves incoming requests by spawning special programs with fork() and then exec(). The "main" process handles accept() and initializes communication between my server and the connecting clients, and when the client has sent enough data, I start an external program and manage the data flow between it and my server. Of course I have to keep track of the processes I've forked, so SIGCHLD is needed. I don't want to create a new process or a new thread for each incoming connection (I want my software to run on both Solaris and Linux, and I've learned that forking from a multithreaded application is not a very fast or portable way to do things), so I've decided to use non-blocking I/O and select(). Because I also need to take care of my children, signal handling is a must for me.
The new pselect() system call
The traditional way to deal with this is for the process to create a pipe and the select call to include that in its list. Then, the signal handler just writes a byte down the pipe.
The new pselect() system call
The fourth paragraph of the article begins, "The traditional solution to this problem is the so-called self-pipe trick ...". That paragraph alleges non-obvious defects in such a trick. What is your response?
Self-pipe trick (and failings)
John,
Self-pipe trick (and failings)
The fourth paragraph of the article begins, "The traditional solution to this problem is the so-called self-pipe trick ...". That paragraph alleges non-obvious defects in such a trick. What is your response?
I'm not sure if you are asking this question of me (author of article) or "mtk77" (who your article seems to reply to). Anyway, I don't allege any defects in the self-pipe trick; all I say is that it requires quite a bit of code to do things right:
It all works fine, but pselect() allows us to achieve the same result with less code.
```c
while ((ready = select(nfds, &readfds, NULL, NULL, NULL)) == -1 &
        errno == EINTR)
    continue;
```
Just a nit: I think you want '&&' in the while condition, not '&'.
Self-pipe trick (and failings)
Just a nit: I think you want '&&' in the while condition, not '&'.
Self-pipe trick (and failings)
Doh! Thanks, yes.
Yes, it takes some code, but it only needs to be written and debugged once. Then tuck it into your comm library behind a simple interface and forget about the gory details. This is a problem that can be solved completely in user space. Adding a new system call to handle it is adding complexity to the kernel for no good reason.
Self-pipe trick (and failings)
> Yes, it takes some code, but it only needs to be written and debugged once.
Self-pipe trick (and failings)
No, because the self-pipe trick cannot really be put into a library: it plays tricks with signals and pipes, which could have an impact on your code, so each userspace program that needs it must reimplement it. I'd hardly call this 'once'.
Plus it conforms to the POSIX standard, even better!
The race still occurs :-(
Hi
I have just encountered the race condition which the provided recipe tries to avoid :-(. The code looks similar to the one in the article:
1. SIGCHLD is blocked.
2. Then two child processes are invoked (in a "pipe", i.e. the output of one process is the input of another).
3. A SIGCHLD handler similar to the one described in info '(libc)Merged Signals' is installed.
4. So far so good. Then I start to wait in pselect.
Logs show that the first child exits, its termination signal is caught by the handler, and pselect returns with exitcode=-1, errno=EINTR. OK. The code performs all necessary actions and returns to pselect(). But when the second child exits, there's a *possibility* that pselect won't return after the signal is caught and processed in the handler.
I put in a simple check that SIGCHLD is blocked right before pselect() (and it is). Logs show that the handler is invoked (that is, the signal was unblocked in pselect), but strace'ing the process shows that it's hanging in select() :-(. It seems I missed some magic :-(.
Debian-4.0 under VZ-enabled 2.6.18 kernel.
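For comparison, steps 1 to 4 above can be sketched as follows, assuming a kernel-backed pselect() (glibc 2.4 or later on Linux 2.6.16 or later, per the article). The helper names (sigchld_handler(), spawn_and_reap(), children_reaped) are hypothetical, and the handler is installed before the children are started (step 3 before step 2) so that nothing depends on how a blocked SIGCHLD with the default disposition is treated. A select() call showing up in strace may indicate the library emulation of pselect() described in the article, which has exactly this race; that is an inference, not a diagnosis.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t children_reaped = 0;

/* Several exits may be merged into one pending SIGCHLD, so reap in a loop. */
static void
sigchld_handler(int sig)
{
    int saved_errno = errno;    /* waitpid() may change errno */

    (void) sig;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        children_reaped++;
    errno = saved_errno;
}

/* Start 'n' children that exit at once, then wait in pselect() until all
   have been reaped. Returns the number reaped, or -1 on error. */
static int
spawn_and_reap(int n)
{
    sigset_t blockset, emptyset;
    struct sigaction sa;
    int i;

    sigemptyset(&blockset);                     /* Step 1: block SIGCHLD */
    sigaddset(&blockset, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &blockset, NULL) == -1)
        return -1;

    sa.sa_handler = sigchld_handler;            /* Step 3: install handler */
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;

    for (i = 0; i < n; i++) {                   /* Step 2: start children */
        pid_t pid = fork();
        if (pid == -1)
            return -1;
        if (pid == 0)
            _exit(0);
    }

    /* Step 4: SIGCHLD is unblocked only inside pselect(), so an exit that
       happens between iterations stays pending until the next call. */
    sigemptyset(&emptyset);
    while (children_reaped < n) {
        if (pselect(0, NULL, NULL, NULL, NULL, &emptyset) == -1 &&
            errno != EINTR)
            return -1;
    }
    return children_reaped;
}
```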
Is there any equivalent to epoll_pwait in kqueue?