lpg-0 4
Sven Goldt
Sven van der Meer
Scott Burkett
Matt Welsh
Version 0.4
March 1995
Contents
Converted on:
Fri Mar 29 14:43:04 EST 1996
Copyright
The Linux Programmer's Guide is © 1994, 1995 by Sven Goldt
Sven Goldt, Sachsendamm 47b, 10829 Berlin, Germany
<goldt@math.tu-berlin.de>.
Chapter 8 is © 1994, 1995 by Sven van der Meer <vdmeer@cs.tu-berlin.de>.
Chapter 6 is © 1995 Scott Burkett <scottb@IntNet.net>.
Chapter 10 is © 1994, 1995 Matt Welsh <mdw@cs.cornell.edu>.
Special thanks go to John D. Harper <jharper@uiuc.edu> for proofreading this guide.
Permission to reproduce this document in whole or in part is subject to the following
conditions:
1. The copyright notice remains intact and is included.
2. If you make money with it the authors want a share.
3. The authors are not responsible for any harm that might arise by the use of it.
Preface
This guide is far from being complete.
The first release started at version 0.1 in September 1994. It concentrated on system calls
because of a lack of manpower and information. Descriptions of library functions and major
kernel changes are planned, as well as excursions into important areas like networking,
sound, graphics and asynchronous I/O. Hints about how to build shared libraries and
pointers to useful toolkits may be included later.
This guide will only be a success with generous help in the form of information or perhaps
even submission of whole chapters.
Introduction
Once upon a time I installed Linux on my PC to learn more about system administration. I
tried to install a slip server but it didn't work with shadow and mgetty. I had to patch
sliplogin and it worked until the new Linux 1.1 releases. No one could tell me what had
happened. There was no documentation about changes since the 0.99 kernel except the
kernel change summaries from Russ Nelson, but they didn't help me very much in solving
problems.
The Linux Programmer's Guide is meant to do what the name implies: to help Linux
programmers understand the peculiarities of Linux. By its nature, this also means that it
should be useful when porting programs from other operating systems to Linux. Therefore,
this guide must describe the system calls and the major kernel changes which have effects
on older programs like serial I/O and networking.
4 System calls
A system call is usually a request to the operating system (kernel) to do a hardware/system-specific
or privileged operation. As of Linux-1.2, 140 system calls have been defined. System calls like
close() are implemented in the Linux libc. This implementation often involves calling a macro
which eventually calls syscall(). Parameters passed to syscall() are the number of the system call
followed by the needed arguments. The actual system call numbers can be found in <linux/unistd.h>
while <sys/syscall.h> gets updated with a new libc. If new calls appear that don't have a stub in libc
yet, you can use syscall(). As an example, you can close a file using syscall() like this (not advised):
#include <syscall.h>

extern int syscall(int, ...);

int my_close(int filedescriptor)
{
        return syscall(SYS_close, filedescriptor);
}
On the i386 architecture, system calls are limited to 5 arguments besides the system call number
because of the number of hardware registers. If you use Linux on another architecture you can
check <asm/unistd.h> for the _syscall macros to see how many arguments your hardware supports
or how many the developers chose to support. These _syscall macros can be used instead of
syscall(), but this is not recommended since such a macro expands to a full function which might
already exist in a library. Therefore, only kernel hackers should play with the _syscall macros. To
demonstrate, here is the close() example using a _syscall macro.
#include <linux/unistd.h>
_syscall1(int, close, int, filedescriptor);
The _syscall1 macro expands into the close() function itself. Thus we have close() twice: once in
libc and once in our program. The return value of syscall() or a _syscall macro is -1 if the system
call failed and 0 or greater on success. Take a look at the global variable errno to see what happened
if a system call failed.
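The return convention described above can be sketched as follows. This is only an illustration on a modern glibc system, where syscall() is declared in <unistd.h> and the call numbers come from <sys/syscall.h>; the helper name raw_close is hypothetical and not part of any library.

```c
#define _GNU_SOURCE             /* glibc feature-test macro; exposes syscall() */
#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Hypothetical helper: close a descriptor through the generic syscall()
   entry point. Returns 0 on success, or -1 with errno set on failure. */
long raw_close(int fd)
{
        return syscall(SYS_close, fd);
}
```

Calling raw_close(-1) should return -1 and leave EBADF in errno, since -1 is never a valid descriptor.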
The following system calls that are available on BSD and SYS V are not available on Linux:
audit(), auditon(), auditsvc(), fchroot(), getauid(), getdents(), getmsg(), mincore(), poll(), putmsg(),
setaudit(), setauid().
6.1 Introduction
6.2 Half-duplex UNIX Pipes
o 6.2.1 Basic Concepts
o 6.2.2 Creating Pipes in C
o 6.2.3 Pipes the Easy Way!
o 6.2.4 Atomic Operations with Pipes
o 6.2.5 Notes on half-duplex pipes:
6.3 Named Pipes (FIFOs - First In First Out)
o 6.3.1 Basic Concepts
o 6.3.2 Creating a FIFO
o 6.3.3 FIFO Operations
o 6.3.4 Blocking Actions on a FIFO
o 6.3.5 The Infamous SIGPIPE Signal
6.4 System V IPC
o 6.4.1 Fundamental Concepts
IPC Identifiers
IPC Keys
The ipcs Command
The ipcrm Command
o 6.4.2 Message Queues
Basic Concepts
Internal and User Data Structures
Message buffer
Kernel msg structure
Kernel msqid_ds structure
Kernel ipc_perm structure
SYSTEM CALL: msgget()
SYSTEM CALL: msgsnd()
SYSTEM CALL: msgctl()
msgtool: An interactive message queue manipulator
Background
Command Line Syntax
Examples
The Source
o 6.4.3 Semaphores
Basic Concepts
Internal Data Structures
Kernel semid_ds structure
Kernel sem structure
SYSTEM CALL: semget()
SYSTEM CALL: semop()
6.1 Introduction
The Linux IPC (Inter-process communication) facilities provide a method for multiple processes to
communicate with one another. There are several methods of IPC available to Linux C
programmers:
These facilities, when used effectively, provide a solid framework for client/server development on
any UNIX system (including Linux).
A shell command line such as ls | sort | lp sets up a pipeline, taking the output of ls as the input of
sort, and the output of sort as the input of lp. The data is running through a half duplex pipe,
traveling (visually) left to right through the pipeline.
Although most of us use pipes quite religiously in shell script programming, we often do so without
giving a second thought to what transpires at the kernel level.
When a process creates a pipe, the kernel sets up two file descriptors for use by the pipe. One
descriptor is used to allow a path of input into the pipe (write), while the other is used to obtain data
from the pipe (read). At this point, the pipe is of little practical use, as the creating process can only
use the pipe to communicate with itself. Consider this representation of a process and the kernel
after a pipe has been created:
[Figure: a process holding two file descriptors, both connected to one pipe inside the kernel]
From the above diagram, it is easy to see how the descriptors are connected together. If the process
sends data through the pipe (fd1), it has the ability to obtain (read) that information from fd0.
However, there is a much larger objective of the simplistic sketch above. While a pipe initially
connects a process to itself, data traveling through the pipe moves through the kernel. Under Linux,
in particular, pipes are actually represented internally with a valid inode. Of course, this inode
resides within the kernel itself, and not within the bounds of any physical file system. This
particular point will open up some pretty handy I/O doors for us, as we will see a bit later on.
As it stands, the pipe is fairly useless. After all, why go to the trouble of creating a pipe if we are
only going to talk to ourselves? Instead, the creating process typically forks a child process.
Since a child process will inherit any open file descriptors from the parent, we now have the basis
for multiprocess communication (between parent and child). Consider this updated version of our
simple sketch:
[Figure: after fork(), parent and child each hold both descriptors of the same pipe]
Above, we see that both processes now have access to the file descriptors which constitute the
pipeline. It is at this stage that a critical decision must be made. In which direction do we desire
data to travel? Does the child process send information to the parent, or vice-versa? The two
processes mutually agree on this issue, and proceed to ``close'' the end of the pipe that they are not
concerned with. For discussion purposes, let's say the child performs some processing, and sends
information back through the pipe to the parent. Our newly revised sketch would appear as such:
[Figure: the child keeps only the write end of the pipe, the parent only the read end]
Construction of the pipeline is now complete! The only thing left to do is make use of the pipe. To
access a pipe directly, the same system calls that are used for low-level file I/O can be used (recall
that pipes are actually represented internally as a valid inode).
To send data to the pipe, we use the write() system call, and to retrieve data from the pipe, we use
the read() system call. Remember, low-level file I/O system calls work with file descriptors!
However, keep in mind that certain system calls, such as lseek(), do not work with descriptors to
pipes.
The first integer in the array (element 0) is set up and opened for reading, while the second integer
(element 1) is set up and opened for writing. Visually speaking, the output of fd1 becomes the input
for fd0. Once again, all data traveling through the pipe moves through the kernel.
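To make the roles of the two array elements concrete, here is a minimal sketch in which a process talks to itself through a pipe, and a second helper that checks the earlier remark about lseek(). The function names pipe_roundtrip and pipe_rejects_lseek are hypothetical, and the sketch assumes the message is small enough to fit in the pipe's buffer.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send msg through a pipe that connects this process
   to itself and read it back into out. Returns the number of bytes read,
   or -1 on error. msg must be short enough to fit in the pipe buffer. */
ssize_t pipe_roundtrip(const char *msg, char *out, size_t outlen)
{
        int     fd[2];
        ssize_t n;

        if (strlen(msg) == 0)
                return 0;               /* nothing to send; a read would block */
        if (pipe(fd) == -1)
                return -1;

        n = write(fd[1], msg, strlen(msg));     /* element 1: write end */
        if (n != -1)
                n = read(fd[0], out, outlen);   /* element 0: read end */

        close(fd[0]);
        close(fd[1]);
        return n;
}

/* Hypothetical helper: returns 1 if lseek() on a pipe descriptor fails
   with ESPIPE, 0 otherwise. */
int pipe_rejects_lseek(void)
{
        int fd[2];
        int rejected;

        if (pipe(fd) == -1)
                return 0;
        rejected = (lseek(fd[0], 0, SEEK_SET) == (off_t)-1 && errno == ESPIPE);
        close(fd[0]);
        close(fd[1]);
        return rejected;
}
```

A call such as pipe_roundtrip("ping", buf, sizeof(buf)) should hand back the same four bytes that were written.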
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>

main()
{
        int     fd[2];

        pipe(fd);
        .
        .
}
Remember that an array name in C decays into a pointer to its first member. Above, fd is equivalent
to &fd[0]. Once we have established the pipeline, we then fork our new child process:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>

main()
{
        int     fd[2];
        pid_t   childpid;

        pipe(fd);

        if((childpid = fork()) == -1)
        {
                perror("fork");
                exit(1);
        }
        .
        .
}
If the parent wants to receive data from the child, it should close fd1, and the child should close fd0.
If the parent wants to send data to the child, it should close fd0, and the child should close fd1.
Since descriptors are shared between the parent and child, we should always be sure to close the end
of the pipe we aren't concerned with. On a technical note, EOF will never be returned if the
unnecessary ends of the pipe are not explicitly closed.
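The point about EOF can be illustrated with a short sketch. The helper read_until_eof is a hypothetical name; it forks a child that writes a message and exits, while the parent reads until read() returns 0. If the parent skipped its close(fd[1]), that loop would be expected to block forever, because a write end would still be open.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that writes msg into a pipe, then read
   in the parent until read() reports end-of-file (0).
   Returns the total number of bytes read, or -1 on error. */
ssize_t read_until_eof(const char *msg, char *out, size_t outlen)
{
        int     fd[2];
        pid_t   childpid;
        ssize_t n, total = 0;

        if (pipe(fd) == -1)
                return -1;

        if ((childpid = fork()) == -1) {
                close(fd[0]);
                close(fd[1]);
                return -1;
        }

        if (childpid == 0) {
                close(fd[0]);                   /* child does not read */
                if (write(fd[1], msg, strlen(msg)) == -1)
                        _exit(1);
                close(fd[1]);
                _exit(0);
        }

        close(fd[1]);   /* parent closes its write end, or EOF never arrives */
        while ((size_t)total < outlen &&
               (n = read(fd[0], out + total, outlen - (size_t)total)) > 0)
                total += n;

        close(fd[0]);
        waitpid(childpid, NULL, 0);
        return total;
}
```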
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>

main()
{
        int     fd[2];
        pid_t   childpid;

        pipe(fd);

        if((childpid = fork()) == -1)
        {
                perror("fork");
                exit(1);
        }

        if(childpid == 0)
        {
                /* Child process closes up input side of pipe */
                close(fd[0]);
        }
        else
        {
                /* Parent process closes up output side of pipe */
                close(fd[1]);
        }
        .
        .
}
As mentioned previously, once the pipeline has been established, the file descriptors may be treated
like descriptors to normal files.
/*****************************************************************************
Excerpt from "Linux Programmer's Guide - Chapter 6"
(C)opyright 1994-1995, Scott Burkett
*****************************************************************************
MODULE: pipe.c
*****************************************************************************/
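#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>

int main(void)
{
        int     fd[2], nbytes;
        pid_t   childpid;
        char    string[] = "Hello, world!\n";
        char    readbuffer[80];

        pipe(fd);

        if((childpid = fork()) == -1)
        {
                perror("fork");
                exit(1);
        }

        if(childpid == 0)
        {
                /* Child process closes up input side of pipe */
                close(fd[0]);

                /* Send "string" through the output side of pipe */
                write(fd[1], string, (strlen(string)+1));
                exit(0);
        }
        else
        {
                /* Parent process closes up output side of pipe */
                close(fd[1]);

                /* Read in a string from the pipe */
                nbytes = read(fd[0], readbuffer, sizeof(readbuffer));
                printf("Received string: %s", readbuffer);
        }

        return(0);
}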
Often, the descriptors in the child are duplicated onto standard input or output. The child can then
exec() another program, which inherits the standard streams. Let's look at the dup() system call:
SYSTEM CALL: dup();

PROTOTYPE: int dup( int oldfd );
  RETURNS: new descriptor on success
           -1 on error: errno = EBADF (oldfd is not a valid descriptor)
                                EMFILE (too many descriptors for the process)

NOTES: the old descriptor is not closed!
Although the old descriptor and the newly created descriptor can be used interchangeably, we will
typically close one of the standard streams first. The dup() system call uses the lowest-numbered,
unused descriptor for the new one.
Consider:
        .
        .
        childpid = fork();

        if(childpid == 0)
        {
                /* Close up standard input of the child */
                close(0);

                /* Duplicate the input side of pipe to stdin */
                dup(fd[0]);
                execlp("sort", "sort", NULL);
                .
        }
Since file descriptor 0 (stdin) was closed, the call to dup() duplicated the input descriptor of the pipe
(fd0) onto its standard input. We then make a call to execlp(), to overlay the child's text segment
(code) with that of the sort program. Since newly exec'd programs inherit standard streams from
their spawners, it actually inherits the input side of the pipe as its standard input! Now, anything that
the original parent process sends to the pipe, goes into the sort facility.
There is another system call, dup2(), which can be used as well. This particular call originated with
Version 7 of UNIX, and was carried on through the BSD releases and is now required by the POSIX
standard.
SYSTEM CALL: dup2();

PROTOTYPE: int dup2( int oldfd, int newfd );
  RETURNS: new descriptor on success
           -1 on error: errno = EBADF (oldfd is not a valid descriptor)
                                EBADF (newfd is out of range)
                                EMFILE (too many descriptors for the process)

NOTES: if newfd is already open, dup2() closes it first; oldfd itself stays open!
With this particular call, we have the close operation, and the actual descriptor duplication, wrapped
up in one system call. In addition, it is guaranteed to be atomic, which essentially means that it will
never be interrupted by an arriving signal. The entire operation will transpire before returning
control to the kernel for signal dispatching. With the original dup() system call, programmers had to
perform a close() operation before calling it. That resulted in two system calls, with a small degree
of vulnerability in the brief amount of time which elapsed between them. If a signal arrived during
that brief instance, the descriptor duplication would fail. Of course, dup2() solves this problem for
us.
Consider:
        .
        .
        childpid = fork();

        if(childpid == 0)
        {
                /* Close stdin, duplicate the input side of pipe to stdin */
                dup2(fd[0], 0);
                execlp("sort", "sort", NULL);
                .
                .
        }
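The semantics of dup() and dup2() can also be seen without an exec(). In this sketch, descriptor 1 is temporarily pointed at a pipe with dup2(), a message is written to it, and the original stdout is put back from a copy saved with dup(). The helper name capture_stdout is hypothetical, and the message is assumed to fit in the pipe's buffer.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write msg to descriptor 1 while it is temporarily
   redirected into a pipe, then read it back from the pipe.
   Returns the number of bytes captured, or -1 on error. */
ssize_t capture_stdout(const char *msg, char *out, size_t outlen)
{
        int     fd[2];
        int     saved;
        ssize_t n;

        if (pipe(fd) == -1)
                return -1;

        saved = dup(STDOUT_FILENO);             /* remember the real stdout */
        if (saved == -1 || dup2(fd[1], STDOUT_FILENO) == -1) {
                if (saved != -1)
                        close(saved);
                close(fd[0]);
                close(fd[1]);
                return -1;
        }
        close(fd[1]);           /* descriptor 1 is now the only write end */

        n = write(STDOUT_FILENO, msg, strlen(msg));

        dup2(saved, STDOUT_FILENO);     /* restore stdout; closes the pipe copy */
        close(saved);

        if (n != -1)
                n = read(fd[0], out, outlen);
        close(fd[0]);
        return n;
}
```

Note the argument order: the descriptor to copy comes first, the descriptor number to take over comes second.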
The popen() standard library function creates a half-duplex pipeline by calling pipe() internally. It then
forks a child process, execs the Bourne shell, and executes the command argument within the
shell. Direction of data flow is determined by the second argument, type. It can be r or w, for
read or write. It cannot be both! Under Linux, the pipe will be opened up in the mode specified
by the first character of the type argument. So, if you try to pass rw, it will only open it up in
read mode.
While this library function performs quite a bit of the dirty work for you, there is a substantial
tradeoff. You lose the fine control you once had by using the pipe() system call, and handling the
fork/exec yourself. However, since the Bourne shell is used directly, shell metacharacter expansion
(including wildcards) is permissible within the command argument.
Pipes which are created with popen() must be closed with pclose(). By now, you have probably
realized that popen/pclose share a striking resemblance to the standard file stream I/O functions
fopen() and fclose().
LIBRARY FUNCTION: pclose();
PROTOTYPE: int pclose( FILE *stream );
RETURNS: exit status of wait4() call
-1 if "stream" is not valid, or if wait4() fails
NOTES: waits on the pipe process to terminate, then closes the stream.
The pclose() function performs a wait4() on the process forked by popen(). When it returns, it
destroys the pipe and the file stream. Once again, it is synonymous with the fclose() function for
normal stream-based file I/O.
Consider this example, which opens up a pipe to the sort command, and proceeds to sort an array of
strings:
/*****************************************************************************
Excerpt from "Linux Programmer's Guide - Chapter 6"
(C)opyright 1994-1995, Scott Burkett
*****************************************************************************
MODULE: popen1.c
*****************************************************************************/
#include <stdio.h>
#include <stdlib.h>

#define MAXSTRS 5

int main(void)
{
        int  cntr;
        FILE *pipe_fp;
        char *strings[MAXSTRS] = { "echo", "bravo", "alpha",
                                   "charlie", "delta"};

        /* Create one way pipe line with call to popen() */
        if (( pipe_fp = popen("sort", "w")) == NULL)
        {
                perror("popen");
                exit(1);
        }

        /* Processing loop */
        for(cntr=0; cntr<MAXSTRS; cntr++) {
                fputs(strings[cntr], pipe_fp);
                fputc('\n', pipe_fp);
        }

        /* Close the pipe */
        pclose(pipe_fp);

        return(0);
}
Since popen() uses the shell to do its bidding, all shell expansion characters and metacharacters are
available for use! In addition, more advanced techniques such as redirection, and even output
piping, can be utilized with popen(). Consider the following sample calls:
popen("ls ~scottb", "r");
popen("sort > /tmp/foo", "w");
popen("sort | uniq | more", "w");
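The sample calls above only open the pipe. As a rough sketch of the read direction, the helper below runs a command through popen() and keeps the first line it prints. The name first_line_of is hypothetical, and the command is handed to the shell unchanged, so it should not be built from untrusted input.

```c
#define _GNU_SOURCE             /* glibc feature-test macro; exposes popen() */
#include <stdio.h>

/* Hypothetical helper: run cmd through the shell and copy the first line of
   its output into buf. Returns 0 on success, -1 if the command could not be
   started or produced no output. */
int first_line_of(const char *cmd, char *buf, int buflen)
{
        FILE *pipe_fp;
        int  ok;

        if ((pipe_fp = popen(cmd, "r")) == NULL)
                return -1;

        ok = (fgets(buf, buflen, pipe_fp) != NULL);

        /* Always pair popen() with pclose(), never fclose() */
        pclose(pipe_fp);
        return ok ? 0 : -1;
}
```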
As another example of popen(), consider this small program, which opens up two pipes (one to the
ls command, the other to sort):
/*****************************************************************************
Excerpt from "Linux Programmer's Guide - Chapter 6"
(C)opyright 1994-1995, Scott Burkett
*****************************************************************************
MODULE: popen2.c
*****************************************************************************/
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
        FILE *pipein_fp, *pipeout_fp;
        char readbuf[80];

        /* Create one way pipe line with call to popen() */
        if (( pipein_fp = popen("ls", "r")) == NULL)
        {
                perror("popen");
                exit(1);
        }

        /* Create one way pipe line with call to popen() */
        if (( pipeout_fp = popen("sort", "w")) == NULL)
        {
                perror("popen");
                exit(1);
        }

        /* Processing loop */
        while(fgets(readbuf, 80, pipein_fp))
                fputs(readbuf, pipeout_fp);

        /* Close the pipes */
        pclose(pipein_fp);
        pclose(pipeout_fp);

        return(0);
}
For our final demonstration of popen(), let's create a generic program that opens up a pipeline
between a passed command and filename:
/*****************************************************************************
Excerpt from "Linux Programmer's Guide - Chapter 6"
(C)opyright 1994-1995, Scott Burkett
*****************************************************************************
MODULE: popen3.c
*****************************************************************************/
#include <stdio.h>

int main(int argc, char *argv[])
{
        FILE *pipe_fp, *infile;
        char readbuf[80];

        if( argc != 3) {
                fprintf(stderr, "USAGE:
                exit(1);
        }
        .
        .
        return(0);
}
sort popen3.c
cat popen3.c
more popen3.c
cat popen3.c | grep main
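The listing above did not survive intact, so the following is only a sketch of the idea it describes: open a file, open a pipe to a command, and copy one into the other. It is not the author's popen3.c; the function name feed_file_to_command is hypothetical.

```c
#define _GNU_SOURCE             /* glibc feature-test macro; exposes popen() */
#include <stdio.h>

/* Hypothetical helper: copy the contents of filename into the standard
   input of command. Returns the value of pclose(), or -1 if either the
   file or the pipe could not be opened. */
int feed_file_to_command(const char *command, const char *filename)
{
        FILE *pipe_fp, *infile;
        char readbuf[80];

        if ((infile = fopen(filename, "r")) == NULL)
                return -1;

        if ((pipe_fp = popen(command, "w")) == NULL) {
                fclose(infile);
                return -1;
        }

        /* Processing loop: pass the file through line by line */
        while (fgets(readbuf, sizeof(readbuf), infile))
                fputs(readbuf, pipe_fp);

        fclose(infile);
        return pclose(pipe_fp);
}
```

A call such as feed_file_to_command("sort", "popen3.c") would correspond to the first invocation listed above.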
POSIX requires that at least 512 bytes can be written to or retrieved from a pipe atomically. Anything
that crosses this threshold will be split, and not atomic. Under Linux, however, the atomic operational
limit is defined in linux/limits.h as:
#define PIPE_BUF        4096
As you can see, Linux accommodates the minimum number of bytes required by POSIX, quite
considerably I might add. The atomicity of a pipe operation becomes important when more than one
process is involved (FIFOS). For example, if the number of bytes written to a pipe exceeds the
atomic limit for a single operation, and multiple processes are writing to the pipe, the data will be
interleaved or chunked. In other words, one process may insert data into the pipeline between
the writes of another.
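Rather than hardcoding either number, a program can ask the system for the limit that applies to a particular pipe. The sketch below uses the POSIX fpathconf() call with _PC_PIPE_BUF; the helper name pipe_atomic_limit is hypothetical.

```c
#include <unistd.h>

/* Hypothetical helper: returns the atomic write limit (PIPE_BUF) for a
   freshly created pipe, or -1 on error. */
long pipe_atomic_limit(void)
{
        int  fd[2];
        long limit;

        if (pipe(fd) == -1)
                return -1;

        limit = fpathconf(fd[0], _PC_PIPE_BUF);

        close(fd[0]);
        close(fd[1]);
        return limit;
}
```

Writers that keep each write() at or below this value should not see their data interleaved with that of other writers.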
Two way pipes can be created by opening up two pipes, and properly reassigning the file
descriptors in the child process.
The pipe() call must be made BEFORE a call to fork(), or the descriptors will not be
inherited by the child! (same for popen()).
With half-duplex pipes, any connected processes must share a related ancestry. Since the
pipe resides within the confines of the kernel, any process that is not in the ancestry for the
creator of the pipe has no way of addressing it. This is not the case with named pipes
(FIFOS).
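The first note, about two-way communication, can be sketched with two pipes, one per direction. Here the child upper-cases whatever the parent sends and returns it on the second pipe. The name ask_child_upper is hypothetical, and the sketch assumes the text is at most 256 bytes.

```c
#include <ctype.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: send text to a child over one pipe and receive the
   upper-cased reply over a second pipe.
   Returns the number of bytes received, or -1 on error. */
ssize_t ask_child_upper(const char *text, char *reply, size_t replylen)
{
        int     to_child[2], to_parent[2];
        pid_t   childpid;
        ssize_t n;

        if (pipe(to_child) == -1)
                return -1;
        if (pipe(to_parent) == -1) {
                close(to_child[0]);
                close(to_child[1]);
                return -1;
        }

        /* Both pipes exist BEFORE the fork, so the child inherits them */
        if ((childpid = fork()) == -1) {
                close(to_child[0]);  close(to_child[1]);
                close(to_parent[0]); close(to_parent[1]);
                return -1;
        }

        if (childpid == 0) {
                char    buf[256];
                ssize_t i, got;

                close(to_child[1]);             /* child reads this pipe */
                close(to_parent[0]);            /* child writes this pipe */
                got = read(to_child[0], buf, sizeof(buf));
                for (i = 0; i < got; i++)
                        buf[i] = (char)toupper((unsigned char)buf[i]);
                if (got > 0 && write(to_parent[1], buf, (size_t)got) == -1)
                        _exit(1);
                _exit(0);
        }

        close(to_child[0]);                     /* parent writes this pipe */
        close(to_parent[1]);                    /* parent reads this pipe */
        n = write(to_child[1], text, strlen(text));
        close(to_child[1]);                     /* lets the child see EOF */
        if (n != -1)
                n = read(to_parent[0], reply, replylen);
        close(to_parent[0]);
        waitpid(childpid, NULL, 0);
        return n;
}
```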
From the shell, a FIFO can be created with either the mknod or the mkfifo command. The two perform identical operations, with one exception. The mkfifo command
provides a hook for altering the permissions on the FIFO file directly after creation. With mknod, a
quick call to the chmod command will be necessary.
FIFO files can be quickly identified in a physical file system by the p indicator seen here in a long
directory listing:
$ ls -l MYFIFO
prw-r--r--   1 root     root
Also notice the vertical bar (pipe sign) located directly after the file name. Another great reason to
run Linux, eh?
To create a FIFO in C, we can make use of the mknod() system call:
LIBRARY FUNCTION: mknod();
PROTOTYPE: int mknod( char *pathname, mode_t mode, dev_t dev);
RETURNS: 0 on success,
-1 on error: errno = EFAULT (pathname invalid)
EACCES (permission denied)
ENAMETOOLONG (pathname too long)
ENOENT (invalid pathname)
ENOTDIR (invalid pathname)
(see man page for mknod for others)
NOTES: Creates a filesystem node (file, device file, or FIFO)
I will leave a more detailed discussion of mknod() to the man page, but let's consider a simple
example of FIFO creation from C:
mknod("/tmp/MYFIFO", S_IFIFO|0666, 0);
In this case, the file /tmp/MYFIFO is created as a FIFO file. The requested permissions are
0666, although they are affected by the umask setting as follows:
final_permissions = requested_permissions & ~original_umask
A common trick is to use the umask() system call to temporarily zap the umask value:
umask(0);
mknod("/tmp/MYFIFO", S_IFIFO|0666, 0);
In addition, the third argument to mknod() is ignored unless we are creating a device file. In that
instance, it should specify the major and minor numbers of the device file.
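As an alternative to mknod(), POSIX also provides a mkfifo() library function that takes only the path and the permission bits. The sketch below swaps it in for mknod() and wraps the umask trick so that the caller's umask is restored afterwards. The helper names make_fifo and is_fifo are hypothetical.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create a FIFO at path with exactly the requested
   permission bits. Returns 0 on success, -1 on error. */
int make_fifo(const char *path, mode_t perms)
{
        mode_t old_mask = umask(0);     /* temporarily zap the umask */
        int    rc = mkfifo(path, perms);

        umask(old_mask);                /* put the caller's umask back */
        return rc;
}

/* Hypothetical helper: returns 1 if path names a FIFO, 0 otherwise. */
int is_fifo(const char *path)
{
        struct stat st;

        return stat(path, &st) == 0 && S_ISFIFO(st.st_mode);
}
```

Calling make_fifo() a second time on the same path fails, since the node already exists.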
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>
#include <linux/stat.h>

#define FIFO_FILE       "MYFIFO"

int main(void)
{
        FILE *fp;
        char readbuf[80];

        /* Create the FIFO if it does not exist */
        umask(0);
        mknod(FIFO_FILE, S_IFIFO|0666, 0);

        while(1)
        {
                fp = fopen(FIFO_FILE, "r");
                fgets(readbuf, 80, fp);
                printf("Received string: %s\n", readbuf);
                fclose(fp);
        }

        return(0);
}
Since a FIFO blocks by default, run the server in the background after you compile it:
$ fifoserver&
We will discuss a FIFO's blocking action in a moment. First, consider the following simple client
frontend to our server:
/*****************************************************************************
Excerpt from "Linux Programmer's Guide - Chapter 6"
(C)opyright 1994-1995, Scott Burkett
*****************************************************************************
MODULE: fifoclient.c
*****************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#define FIFO_FILE       "MYFIFO"
IPC Identifiers
Each IPC object has a unique IPC identifier associated with it. When we say IPC object, we are
speaking of a single message queue, semaphore set, or shared memory segment. This identifier is
used within the kernel to uniquely identify an IPC object. For example, to access a particular shared
memory segment, the only item you need is the unique ID value which has been assigned to that
segment.
The uniqueness of an identifier is relative to the type of object in question. To illustrate this, assume
a numeric identifier of 12345. While there can never be two message queues with this same
identifier, there exists the distinct possibility of a message queue and, say, a shared memory
segment, which have the same numeric identifier.
IPC Keys
To obtain a unique ID, a key must be used. The key must be mutually agreed upon by both client
and server processes. This represents the first step in constructing a client/server framework for an
application.
When you use a telephone to call someone, you must know their number. In addition, the phone
company must know how to relay your outgoing call to its final destination. Once the other party
responds by answering the telephone call, the connection is made.
In the case of System V IPC facilities, the telephone correlates directly with the type of object
being used. The phone company, or routing method, can be directly associated with an IPC key.
The key can be the same value every time, by hardcoding a key value into an application. This has
the disadvantage of the key possibly being in use already. Often, the ftok() function is used to
generate key values for both the client and the server.
LIBRARY FUNCTION: ftok();
PROTOTYPE: key_t ftok ( char *pathname, char proj );
RETURNS: new IPC key value if successful
-1 if unsuccessful, errno set to return of stat() call
The returned key value from ftok() is generated by combining the inode number and minor device
number from the file in argument one, with the one character project identifier in the second
argument. This doesn't guarantee uniqueness, but an application can check for collisions and retry
the key generation.
key_t   mykey;
mykey = ftok("/tmp/myapp", 'a');
In the above snippet, the directory /tmp/myapp is combined with the one letter identifier of 'a'.
Another common example is to use the current directory:
key_t   mykey;
mykey = ftok(".", 'a');
The key generation algorithm used is completely up to the discretion of the application programmer.
As long as measures are in place to prevent race conditions, deadlocks, etc, any method is viable.
For our demonstration purposes, we will use the ftok() approach. If we assume that each client
process will be running from a unique home directory, the keys generated should suffice for our
needs.
The key value, however it is obtained, is used in subsequent IPC system calls to create or gain
access to IPC objects.
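A small sketch may help show why ftok() suits a client/server pair: any two processes that pass the same existing path and the same project character get the same key. The wrapper name make_key is hypothetical, and on current systems the function is declared in <sys/ipc.h>.

```c
#define _GNU_SOURCE             /* glibc feature-test macro for System V IPC */
#include <sys/types.h>
#include <sys/ipc.h>

/* Hypothetical helper: derive an IPC key from an existing path and a one
   character project identifier. Returns the key, or (key_t)-1 if the path
   cannot be examined (for example, because it does not exist). */
key_t make_key(const char *path, char proj)
{
        return ftok(path, proj);
}
```

A server and a client that both call make_key(".", 'a') from the same directory end up with the same key, while a different project character yields a different one.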
By default, all three categories of objects are shown. Consider the following sample output of ipcs:
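------ Shared Memory Segments --------
shmid     owner     perms     bytes     nattch    status

------ Semaphore Arrays --------
semid     owner     perms     nsems     status

------ Message Queues --------
msqid     owner     perms     used-bytes  messages
0         root      660       5           1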
Here we see a single message queue which has an identifier of 0. It is owned by the user root, and
has octal permissions of 660, or -rw-rw--. There is one message in the queue, and that message has
a total size of 5 bytes.
The ipcs command is a very powerful tool which provides a peek into the kernel's storage
mechanisms for IPC objects. Learn it, use it, revere it.
ipcrm <msg | sem | shm> <IPC ID>
Simply specify whether the object to be deleted is a message queue (msg), a semaphore set (sem), or
a shared memory segment (shm). The IPC ID can be obtained by the ipcs command. You have to
specify the type of object, since identifiers are unique among the same type (recall our discussion of
this earlier).
Message buffer
The first structure we'll visit is the msgbuf structure. This particular data structure can be thought of
as a template for message data. While it is up to the programmer to define structures of this type, it
is imperative that you understand that there is actually a structure of type msgbuf. It is declared in
linux/msg.h as follows:
/* message buffer for msgsnd and msgrcv calls */
struct msgbuf {
        long mtype;         /* type of message */
        char mtext[1];      /* message text */
};
mtype
The message type, represented as a positive number. This must be a positive number!
mtext
/* Message type */
/* Request identifier */
/* Client information structure */
Here we see the message type, as before, but the remainder of the structure has been replaced by
two other elements, one of which is another structure! This is the beauty of message queues. The
kernel makes no translations of data whatsoever. Any information can be sent.
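To illustrate that the kernel passes the bytes through untouched, the sketch below defines an application structure with non-character members and sends it through a private queue and back. It uses msgget(), msgsnd(), msgrcv() and msgctl(), which are covered later in this chapter. The structure point_msg and the helper roundtrip_binary are hypothetical names.

```c
#define _GNU_SOURCE             /* glibc feature-test macro for System V IPC */
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Hypothetical application message: mtype first, then arbitrary data */
struct point_msg {
        long   mtype;           /* Message type, must be positive */
        int    x;               /* Binary payload, element 1 */
        double weight;          /* Binary payload, element 2 */
};

/* Hypothetical helper: send one point_msg through a private queue and
   receive it again. Returns 0 if the payload came back unchanged. */
int roundtrip_binary(int x, double weight)
{
        struct point_msg out = { 1, x, weight }, in;
        size_t len = sizeof(struct point_msg) - sizeof(long);
        int    qid, rc = -1;

        if ((qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600)) == -1)
                return -1;

        if (msgsnd(qid, &out, len, 0) == 0 &&
            msgrcv(qid, &in, len, 1, 0) == (ssize_t)len &&
            in.x == x && in.weight == weight)
                rc = 0;

        msgctl(qid, IPC_RMID, NULL);    /* remove the queue again */
        return rc;
}
```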
There does exist an internal limit, however, of the maximum size of a given message. In Linux, this
is defined in linux/msg.h as follows:
#define MSGMAX  4056    /* <= 4056 */
Messages can be no larger than 4,056 bytes in total size, including the mtype member, which is 4
bytes in length (long) on the i386.
This is a pointer to the next message in the queue. They are stored as a singly linked list
within kernel addressing space.
msg_type
While you will rarely have to concern yourself with most of the members of this structure, a brief
description of each is in order to complete our tour:
msg_perm
An instance of the ipc_perm structure, which is defined for us in linux/ipc.h. This holds
the permission information for the message queue, including the access permissions, and
information about the creator of the queue (uid, etc).
msg_first
Link to the first message in the queue (the head of the list).
msg_last
Link to the last message in the queue (the tail of the list).
msg_stime
Timestamp (time_t) of the last message that was sent to the queue.
msg_rtime
Timestamp of the last ``change'' made to the queue (more on this later).
wwait and rwait
Pointers into the kernel's wait queue. They are used when an operation on a message queue
requires the process to go into a sleep state (i.e. queue is full and the process is waiting for an
opening).
msg_cbytes
Total number of bytes residing on the queue (sum of the sizes of all messages).
msg_qnum
All of the above are fairly self-explanatory. Stored along with the IPC key of the object is
information about both the creator and owner of the object (they may be different). The octal access
modes are also stored here, as an unsigned short. Finally, the slot usage sequence number is
stored at the end. Each time an IPC object is closed via a system call (destroyed), this value gets
incremented by the maximum number of IPC objects that can reside in a system. Will you have to
concern yourself with this value? No.
NOTE: There is an excellent discussion on this topic, and the security reasons as to its existence
and behavior, in Richard Stevens' UNIX Network Programming book, p. 125.
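The permission information described above can be inspected from a program. This sketch asks the kernel for a queue's msqid_ds structure with the IPC_STAT command of msgctl() and returns the access mode stored in its msg_perm member. The helper name queue_mode is hypothetical.

```c
#define _GNU_SOURCE             /* glibc feature-test macro for System V IPC */
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Hypothetical helper: returns the octal access mode of a message queue
   (for example 0660), or -1 if the queue cannot be examined. */
int queue_mode(int qid)
{
        struct msqid_ds info;

        if (msgctl(qid, IPC_STAT, &info) == -1)
                return -1;

        /* msg_perm is the ipc_perm structure embedded in msqid_ds */
        return info.msg_perm.mode & 0777;
}
```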
The first argument to msgget() is the key value (in our case returned by a call to ftok()). This key
value is then compared to existing key values that exist within the kernel for other message queues.
At that point, the open or access operation is dependent upon the contents of the msgflg argument.
IPC_CREAT
Create the queue if it doesn't already exist in the kernel.
IPC_EXCL
When used with IPC_CREAT, fail if queue already exists.
If IPC_CREAT is used alone, msgget() either returns the message queue identifier for a newly
created message queue, or returns the identifier for a queue which exists with the same key value. If
IPC_EXCL is used along with IPC_CREAT, then either a new queue is created, or if the queue exists,
the call fails with -1. IPC_EXCL is useless by itself, but when combined with IPC_CREAT, it can be
used as a facility to guarantee that no existing queue is opened for access.
An optional octal mode may be OR'd into the mask, since each IPC object has permissions that are
similar in functionality to file permissions on a UNIX file system!
Let's create a quick wrapper function for opening or creating a message queue:
int open_queue( key_t keyval )
{
        int     qid;

        if((qid = msgget( keyval, IPC_CREAT | 0660 )) == -1)
        {
                return(-1);
        }

        return(qid);
}
Note the use of the explicit permissions of 0660. This small function either returns a message queue
identifier (int), or -1 on error. The key value must be passed to it as its only argument.
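The wrapper above cannot tell the caller whether the queue was freshly created. A variant that uses IPC_EXCL for that purpose might look like the following sketch; open_queue_report is a hypothetical name, and the fallback to a plain msgget() on EEXIST is one possible design, not the only one.

```c
#define _GNU_SOURCE             /* glibc feature-test macro for System V IPC */
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Hypothetical helper: open the queue for keyval, creating it if needed.
   Sets *created to 1 if this call created the queue, 0 if it already
   existed. Returns the queue identifier, or -1 on error. */
int open_queue_report( key_t keyval, int *created )
{
        int qid;

        /* First try to create the queue exclusively */
        if ((qid = msgget( keyval, IPC_CREAT | IPC_EXCL | 0660 )) != -1) {
                *created = 1;
                return(qid);
        }

        if (errno != EEXIST)
                return(-1);

        /* The queue was already there: open it without creating */
        *created = 0;
        return(msgget( keyval, 0660 ));
}
```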
The first argument to msgsnd is our queue identifier, returned by a previous call to msgget. The
second argument, msgp, is a pointer to our redeclared and loaded message buffer. The msgsz
argument contains the size of the message in bytes, excluding the length of the message type (4 byte
long).
The msgflg argument can be set to 0 (ignored), or:
IPC_NOWAIT
If the message queue is full, then the message is not written to the queue, and control is
returned to the calling process. If not specified, then the calling process will suspend (block)
until the message can be written.
Let's create another wrapper function for sending messages:
int send_message( int qid, struct mymsgbuf *qbuf )
{
        int     result, length;

        /* The length is essentially the size of the structure minus sizeof(mtype) */
        length = sizeof(struct mymsgbuf) - sizeof(long);

        if((result = msgsnd( qid, qbuf, length, 0)) == -1)
        {
                return(-1);
        }

        return(result);
}
This small function attempts to send the message residing at the passed address (qbuf) to the
message queue designated by the passed queue identifier (qid). Here is a sample code snippet
utilizing the two wrapper functions we have developed so far:
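#include <stdio.h>
#include <stdlib.h>
#include <linux/ipc.h>
#include <linux/msg.h>

main()
{
        int     qid;
        key_t   msgkey;
        struct mymsgbuf {
                long    mtype;          /* Message type */
                int     request;        /* Work request number */
                double  salary;         /* Employee's salary */
        } msg;

        /* Generate our IPC key value */
        msgkey = ftok(".", 'm');

        /* Open/create the queue */
        if(( qid = open_queue( msgkey)) == -1) {
                perror("open_queue");
                exit(1);
        }

        /* Load up the message with arbitrary test data */
        msg.mtype   = 1;        /* Message type must be a positive number! */
        msg.request = 1;        /* Data element #1 */
        msg.salary  = 1000.00;  /* Data element #2 */

        /* Bombs away! */
        if((send_message( qid, &msg )) == -1) {
                perror("send_message");
                exit(1);
        }
}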
After creating/opening our message queue, we proceed to load up the message buffer with test data
(note the lack of character data to illustrate our point about sending binary information). A quick
call to send_message merrily distributes our message out to the message queue.
Now that we have a message on our queue, try the ipcs command to view the status of your queue.
Now let's turn the discussion to actually retrieving the message from the queue. To do this, you use
the msgrcv() system call:
SYSTEM CALL: msgrcv();
PROTOTYPE: int msgrcv ( int msqid, struct msgbuf *msgp, int msgsz, long mtype,
int msgflg );
RETURNS: Number of bytes copied into message buffer
-1 on error: errno = E2BIG (Message length is greater than msgsz,
no MSG_NOERROR)
EACCES (No read permission)
EFAULT (Address pointed to by msgp is invalid)
EIDRM (Queue was removed during retrieval)
EINTR (Interrupted by arriving signal)
EINVAL (msqid invalid, or msgsz less than 0)
ENOMSG (IPC_NOWAIT asserted, and no message exists
in the queue to satisfy the request)
NOTES:
Obviously, the first argument is used to specify the queue to be used during the message retrieval
process (should have been returned by an earlier call to msgget). The second argument (msgp)
represents the address of a message buffer variable to store the retrieved message at. The third
argument (msgsz) represents the size of the message buffer structure, excluding the length of the
mtype member. Once again, this can easily be calculated as:
msgsz = sizeof(struct mymsgbuf) - sizeof(long);
The fourth argument (mtype) specifies the type of message to retrieve from the queue. The kernel
will search the queue for the oldest message having a matching type, and will return a copy of it in
the address pointed to by the msgp argument. One special case exists. If the mtype argument is
passed with a value of zero, then the oldest message on the queue is returned, regardless of type.
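A short sketch may make the type matching concrete. The structure typedmsg and the helpers put_msg, get_msg and demo_type_selection are hypothetical names used only here. Three messages are queued, then one is fetched by type and two with a type of zero. The type 2 message can be pulled out ahead of the older type 1 message, while a type of zero always takes whatever is oldest.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Hypothetical message layout for this sketch */
struct typedmsg {
    long mtype;     /* Message type */
    int  value;     /* Payload */
};

/* Queue one message of the given type; returns 0 or -1 */
static int put_msg(int qid, long type, int value)
{
    struct typedmsg m;

    memset(&m, 0, sizeof(m));
    m.mtype = type;
    m.value = value;
    return msgsnd(qid, &m, sizeof(m) - sizeof(long), 0);
}

/* Fetch one message of the requested type without blocking
   (type 0 means "oldest message of any type").
   Returns the payload, or -1 if nothing matched. */
static int get_msg(int qid, long type)
{
    struct typedmsg m;

    if (msgrcv(qid, &m, sizeof(m) - sizeof(long), type, IPC_NOWAIT) == -1)
        return -1;
    return m.value;
}

/*
 * Queues 10 (type 1), 20 (type 2) and 30 (type 1) on a private queue,
 * then reads type 2 once and type 0 twice. The three payloads are
 * stored in out[]. Returns 0 on success, -1 if setup failed.
 */
int demo_type_selection(int out[3])
{
    int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);

    if (qid == -1)
        return -1;

    if (put_msg(qid, 1, 10) == -1 || put_msg(qid, 2, 20) == -1 ||
        put_msg(qid, 1, 30) == -1) {
        msgctl(qid, IPC_RMID, NULL);
        return -1;
    }

    out[0] = get_msg(qid, 2);   /* oldest message of type 2 */
    out[1] = get_msg(qid, 0);   /* oldest remaining message */
    out[2] = get_msg(qid, 0);   /* next oldest remaining message */

    msgctl(qid, IPC_RMID, NULL);
    return 0;
}
```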
If IPC_NOWAIT is passed as a flag, and no messages are available, the call fails and sets errno to
ENOMSG. Otherwise, the calling process blocks until a message arrives in the queue that
satisfies the msgrcv() parameters. If the queue is deleted while a client is waiting on a message,
the call fails with EIDRM. It fails with EINTR if a signal is caught while the process is blocked
waiting for a message to arrive.
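In practice, a non-blocking reader has to tell "nothing there yet" apart from a real failure. The following sketch shows one way to do that. The structure note and the function try_receive are names made up for this example.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Hypothetical message layout for this sketch */
struct note {
    long mtype;       /* Message type */
    char text[32];    /* Payload */
};

/*
 * Non-blocking receive.
 * Returns 1 if a message of the given type was copied into *m,
 *         0 if the queue held no matching message (ENOMSG),
 *        -1 on any other error (bad identifier, no permission, ...).
 */
int try_receive(int qid, long type, struct note *m)
{
    if (msgrcv(qid, m, sizeof(*m) - sizeof(long), type, IPC_NOWAIT) != -1)
        return 1;

    return (errno == ENOMSG) ? 0 : -1;
}
```

A caller can poll with try_receive() and do other work whenever it returns 0, rather than sitting blocked inside msgrcv().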
Let's examine a quick wrapper function for retrieving a message from our queue:
int read_message( int qid, long type, struct mymsgbuf *qbuf )
{
        int     result, length;

        /* The length is essentially the size of the structure minus
           sizeof(mtype) */
        length = sizeof(struct mymsgbuf) - sizeof(long);

        if((result = msgrcv( qid, qbuf, length, type, 0)) == -1)
        {
                return(-1);
        }

        return(result);
}
After successfully retrieving a message from the queue, the message entry within the queue is
destroyed.
The MSG_NOERROR bit in the msgflg argument provides some additional capabilities. If the
size of the physical message data is greater than msgsz, and MSG_NOERROR is asserted, then the
message is truncated, and only msgsz bytes are returned. Without MSG_NOERROR, the msgrcv()
system call returns -1 (E2BIG), and the message remains on the queue for later retrieval. This
behavior can be used to create another wrapper function, which will allow us to ``peek'' inside the
queue, to see if a message has arrived that satisfies our request:
/* Requires <errno.h>; TRUE and FALSE are assumed to be defined as 1 and 0 */
int peek_message( int qid, long type )
{
        int     result;

        if((result = msgrcv( qid, NULL, 0, type, IPC_NOWAIT)) == -1)
        {
                if(errno == E2BIG)
                        return(TRUE);
        }

        return(FALSE);
}
Above, you will notice the lack of a buffer address and a length. In this particular case, we want the
call to fail. However, we check for the return of E2BIG which indicates that a message does exist
which matches our requested type. The wrapper function returns TRUE on success, FALSE
otherwise. Also note the use of the IPC_NOWAIT flag, which prevents the blocking behavior
described earlier.
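The two outcomes for an oversized message can be compared side by side. This is only a sketch: the structure textmsg, the function demo_truncate, and the 16-byte and 4-byte sizes are all assumptions chosen for the illustration. The first msgrcv() call asks for 4 bytes without MSG_NOERROR and should fail with E2BIG, leaving the message queued. The second call adds MSG_NOERROR and should deliver a truncated copy.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Hypothetical message layout for this sketch */
struct textmsg {
    long mtype;       /* Message type */
    char text[16];    /* Payload */
};

/*
 * Sends a 16-byte message on a private queue and tries to read it
 * through a 4-byte window, first without and then with MSG_NOERROR.
 * *plain_errno   receives errno of the first attempt (0 if it succeeded).
 * *truncated_len receives the byte count returned by the second attempt.
 * Returns 0 on success, -1 if setup failed.
 */
int demo_truncate(int *plain_errno, int *truncated_len)
{
    struct textmsg out, in;
    int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);

    if (qid == -1)
        return -1;

    memset(&out, 0, sizeof(out));
    out.mtype = 1;
    memcpy(out.text, "0123456789abcdef", sizeof(out.text));

    if (msgsnd(qid, &out, sizeof(out.text), 0) == -1) {
        msgctl(qid, IPC_RMID, NULL);
        return -1;
    }

    /* Too small and no MSG_NOERROR: the call fails, message stays put */
    *plain_errno = 0;
    if (msgrcv(qid, &in, 4, 1, IPC_NOWAIT) == -1)
        *plain_errno = errno;

    /* With MSG_NOERROR: the first 4 bytes arrive, the rest is discarded */
    *truncated_len = (int)msgrcv(qid, &in, 4, 1, IPC_NOWAIT | MSG_NOERROR);

    msgctl(qid, IPC_RMID, NULL);
    return 0;
}
```

Keep in mind that the truncated remainder is lost for good. The message is removed from the queue once the MSG_NOERROR call succeeds.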
Now, common sense dictates that direct manipulation of the internal kernel data structures could
lead to some late night fun. Unfortunately, the resulting duties on the part of the programmer could
only be classified as fun if you like trashing the IPC subsystem. By using msgctl() with a selective
set of commands, you have the ability to manipulate those items which are less likely to cause grief.
Let's look at these commands:
IPC_STAT
Retrieves the msqid_ds structure for a queue, and stores it at the address given by the buf
argument.
IPC_SET
Sets the value of the ipc_perm member of the msqid_ds structure for a queue. Takes the
values from the buf argument.
IPC_RMID
Removes the queue from the kernel.
Recall our discussion about the internal data structure for message queues (msqid_ds). The kernel
maintains an instance of this structure for each queue which exists in the system. By using the
IPC_STAT command, we can retrieve a copy of this structure for examination. Let's look at a quick
wrapper function that will retrieve the internal structure and copy it into a passed address:
int get_queue_ds( int qid, struct msqid_ds *qbuf )
{
        if( msgctl( qid, IPC_STAT, qbuf) == -1)
        {
                return(-1);
        }

        return(0);
}
If we are unable to copy the internal buffer, -1 is returned to the calling function. If all went well, a
value of 0 (zero) is returned, and the passed buffer should contain a copy of the internal data
structure for the message queue represented by the passed queue identifier (qid).
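As one example of what the retrieved copy is good for, the sketch below reads the msg_qnum member, which holds the number of messages currently sitting on the queue. The function name queued_messages is an invention for this example.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/*
 * Returns the number of messages currently waiting on the queue,
 * as reported by the kernel's msqid_ds structure, or -1 on error.
 */
long queued_messages(int qid)
{
    struct msqid_ds ds;

    /* IPC_STAT copies the kernel's structure into our local buffer */
    if (msgctl(qid, IPC_STAT, &ds) == -1)
        return -1;

    return (long)ds.msg_qnum;
}
```

Since this is only a snapshot, the count may already be stale by the time the caller looks at it if other processes are using the queue.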
Now that we have a copy of the internal data structure for a queue, what attributes can be
manipulated, and how can we alter them? The only modifiable item in the data structure is the
ipc_perm member. This contains the permissions for the queue, as well as information about the
owner and creator. However, the only members of the ipc_perm structure that are modifiable are
mode, uid, and gid. You can change the owner's user id, the owner's group id, and the access
permissions for the queue.
Let's create a wrapper function designed to change the mode of a queue. The mode must be passed
in as a character array (e.g. ``660'').
int change_queue_mode( int qid, char *mode )
{
        struct msqid_ds tmpbuf;
        unsigned int    newmode;

        /* Retrieve a current copy of the internal data structure */
        if( get_queue_ds( qid, &tmpbuf) == -1)
                return(-1);

        /* Change the permissions using an old trick */
        if( sscanf(mode, "%o", &newmode) != 1)
                return(-1);
        tmpbuf.msg_perm.mode = newmode;

        /* Update the internal data structure */
        if( msgctl( qid, IPC_SET, &tmpbuf) == -1)
        {
                return(-1);
        }

        return(0);
}
We retrieve a current copy of the internal data structure by a quick call to our get_queue_ds
wrapper function. We then make a call to sscanf() to alter the mode member of the associated
msg_perm structure. No changes take place, however, until the new copy is used to update the
internal version. This duty is performed by a call to msgctl() using the IPC_SET command.
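To convince yourself that the change really took hold, you can read the structure back after the update. The sketch below is a variation on the wrapper above and not a replacement for it: the name set_and_verify_mode is hypothetical, and it takes the mode as a number rather than a string to keep the example short.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/*
 * Sets the permission bits of a queue and reads them back.
 * Returns the mode the kernel reports afterwards (lower 9 bits),
 * or -1 if any msgctl() call fails.
 */
int set_and_verify_mode(int qid, unsigned int newmode)
{
    struct msqid_ds ds;

    /* Start from the kernel's current copy so uid, gid and the
       other members keep their values */
    if (msgctl(qid, IPC_STAT, &ds) == -1)
        return -1;

    ds.msg_perm.mode = newmode & 0777;

    if (msgctl(qid, IPC_SET, &ds) == -1)
        return -1;

    /* Read the structure again; this needs read permission,
       so it fails if the new mode locked us out */
    if (msgctl(qid, IPC_STAT, &ds) == -1)
        return -1;

    return (int)(ds.msg_perm.mode & 0777);
}
```

For instance, calling set_and_verify_mode(qid, 0600) on a queue you own should hand back 0600.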
BE CAREFUL! It is possible to alter the permissions on a queue, and in doing so, inadvertently lock
yourself out! Remember, these IPC objects don't go away unless they are properly removed, or the
system is rebooted. So, the fact that you can't see a queue with ipcs doesn't mean that it isn't there.
To illustrate this point, a somewhat humorous anecdote seems to be in order. While
teaching a class on UNIX internals at the University of South Florida, I ran into a
rather embarrassing stumbling block. I had dialed into their lab server the night
before, in order to compile and test the labwork to be used in the week-long class. In
the process of my testing, I realized that I had made a typo in the code used to alter
the permissions on a message queue. I created a simple message queue, and tested
the sending and receiving capabilities with no incident. However, when I attempted
to change the mode of the queue from ``660'' to ``600'', the resulting action was that
I was locked out of my own queue! As a result, I could not test the message queue
labwork in the same area of my source directory. Since I used the ftok() function to
create the IPC key, I was trying to access a queue that I did not have proper
permissions for. I ended up contacting the local system administrator on the morning
of the class, only to spend an hour explaining to him what a message queue was, and
why I needed him to run the ipcrm command for me. grrrr.
After successfully retrieving a message from a queue, the message is removed. However, as
mentioned earlier, IPC objects remain in the system unless explicitly removed, or the system is
rebooted. Therefore, our message queue still exists within the kernel, available for use long after a
single message disappears. To complete the life cycle of a message queue, it should be removed
with a call to msgctl(), using the IPC_RMID command:
int remove_queue( int qid )
{
if( msgctl( qid, IPC_RMID, 0) == -1)
{
return(-1);
}
return(0);
}
This wrapper function returns 0 if the queue was removed without incident, else a value of -1. The
removal of the queue is atomic in nature, and any subsequent accesses to the queue for whatever
purpose will fail miserably.