QNX Software Systems
QNX Neutrino Realtime Operating System
Programmer’s Guide
Publishing history
July 2004 First edition
QNX, Momentics, Neutrino, and Photon microGUI are registered trademarks of QNX Software Systems in certain jurisdictions. All other trademarks and
trade names belong to their respective owners.
Printed in Canada.
Debugging
Debugging in a self-hosted environment
Debugging in a cross-development environment
The GNU debugger (gdb)
The process-level debug agent
A simple debug session
Configure the target
Compile for debugging
Start the debug session
Get help
Sample boot image
2 Programming Overview
Process model
An application as a set of processes
Processes and threads
Some definitions
Priorities and scheduling
Priority range
BLOCKED and READY states
The ready queue
Suspending a running thread
When the thread is blocked
When the thread is preempted
When the thread yields
Scheduling algorithms
FIFO scheduling
Round-robin scheduling
Why threads?
Summary
3 Processes
Starting processes — two methods
Process creation
Concurrency
Using fork() and forkpty()
Inheriting file descriptors
Process termination
Normal process termination
Abnormal process termination
Effect of parent termination
Detecting process termination
Glossary
Index
Typographical conventions
Throughout this manual, we use certain typographical conventions to
distinguish technical terms. In general, the conventions we use
conform to those found in IEEE POSIX publications. The following
table summarizes our conventions:
Reference                    Example
Code examples                if( stream == NULL )
Command options              -lR
Commands                     make
Environment variables        PATH
File and pathnames           /dev/null
Function names               exit()
Keyboard chords              Ctrl-Alt-Delete
Keyboard input               something you type
Keyboard keys                Enter
Program output               login:
Programming constants        NULL
Programming data types       unsigned short
Programming literals         0xFF, "message string"
Variable names               stdin
User-interface components    Cancel
We use an arrow (→) in directions for accessing menu items, like this:
This table may help you find what you need in the Programmer’s
Guide:
Recommended reading
For the most part, the information that’s documented in the
Programmer’s Guide is specific to QNX. For more general
information, we recommend the following books:
Threads:
In this chapter. . .
Choosing the version of the OS
Conforming to standards
Header files in /usr/include
Self-hosted or cross-development
Using libraries
Linking your modules
Debugging
A simple debug session
When you install QNX Momentics, you get a set of configuration files
that indicate where you’ve installed the software. The
QNX_CONFIGURATION environment variable stores the location
of the configuration files for the installed versions of Neutrino; on a
self-hosted Neutrino machine, the default is /etc/qconfig.
If you’re using the command-line tools, use the qconfig utility to
configure your machine to use a specific version of Neutrino.
• If you run it without any options, qconfig lists the versions that
are installed on your machine.
• If you use the -e option, you can use qconfig to set up the
environment for building software for a specific version of the OS.
For example, if you’re using the Korn shell (ksh), you can
configure your machine like this:
eval `qconfig -n "QNX Neutrino 6.3.0" -e`
When you start the IDE, it uses your current qconfig choice as the
default version of the OS; if you haven’t chosen a version, the IDE
chooses an entry from the directory identified by
QNX_CONFIGURATION. If you want to override the IDE’s choice,
you can choose the appropriate build target. For details, see “Version
coexistence” in the Concepts chapter of the IDE User’s Guide.
Neutrino uses these environment variables to locate files on the host
machine:
QNX_TARGET
The location of target backends on the host machine.
Conforming to standards
The header files supplied with the C library provide the proper
declarations for the functions and for the number and types of
arguments used with them. Constant values used in conjunction with
the functions are also declared. The files can usually be included in
any order, although individual function descriptions show the
preferred order for specific headers.
When the -ansi option is used, qcc compiles strict ANSI code. Use
this option when you’re creating an application that must conform to
the ANSI standard. The effect on the inclusion of ANSI- and
POSIX-defined header files is that certain portions of the header files
are omitted:
• for ANSI header files, these are the portions that go beyond the
ANSI standard
• for POSIX header files, these are the portions that go beyond the
POSIX standard
You can then use the qcc -D option to define feature-test macros to
select those portions that are omitted. Here are the most commonly
used feature-test macros:
_POSIX_C_SOURCE=199506
Include those portions of the header files that relate to the
POSIX standard (IEEE Standard Portable Operating System
Interface for Computer Environments - POSIX 1003.1, 1996)
_FILE_OFFSET_BITS=64
Make the libraries use 64-bit file offsets.
_LARGEFILE64_SOURCE
Include declarations for the functions that support large files
(those whose names end with 64).
_QNX_SOURCE
Include everything defined in the header files. This is the
default.
The following ANSI and POSIX header files are affected by the
_QNX_SOURCE feature-test macro:
• <limits.h>
• <setjmp.h>
• <signal.h>
• <stdio.h>
• <stdlib.h>
• <time.h>
Self-hosted or cross-development
In the rest of this chapter, we’ll describe how to compile and debug a
Neutrino system. Your Neutrino system might be anything from a
deeply embedded turnkey system to a powerful multiprocessor server.
You’ll develop the code to implement your system using development
tools running on the Neutrino platform itself or on any other
supported cross-development platform.
Neutrino supports both of these development types:
A simple example
We’ll now go through the steps necessary to build a simple Neutrino
system that runs on a standard PC and prints out the text
“Hello, world!” — the classic first C program.
Let’s look at the spectrum of methods available to you to run your
executable:
Which method you use depends on what’s available to you. All the
methods share the same initial step — write the code, then compile
and link it for Neutrino on the platform that you wish to run the
program on.
☞ You can choose how you wish to compile and link your programs:
you can use tools with a command-line interface (via the qcc
command) or you can use an IDE (Integrated Development
Environment) with a graphical user interface (GUI) environment. Our
samples here illustrate the command-line method.
#include <stdio.h>

int
main (void)
{
    printf ("Hello, world!\n");
    return (0);
}
qcc -V
If you’re using an IDE, refer to the documentation that came with the
IDE software for more information.
At this point, you should have an executable called hello.
Self-hosted
If you’re using a self-hosted development system, you’re done. You
don’t even have to use the -V cross-compilation flag (as was shown
above), because the qcc driver will default to the current platform.
You can now run hello from the command line:
hello
For a network filesystem, you’ll need to ensure that the shell’s PATH
environment variable includes the path to your executable via the
network-mounted filesystem. At this point, you can just type the
name of the executable at the target’s command-line prompt (if you’re
running a shell on the target):
hello
Download/upload facility
When the debug agent is connected to the host debugger, you can
transfer files between the host and target systems. Note that this is a
general-purpose file transfer facility — it’s not limited to transferring
only executables to the target (although that’s what we’ll be
describing here).
In order for Neutrino to execute a program on the target, the program
must be available for loading from some type of filesystem. This
means that when you transfer executables to the target, you must
write them to a filesystem. Even if you don’t have a conventional
filesystem on your target, recall that there’s a writable “filesystem”
present under Neutrino — the /dev/shmem filesystem. This serves as
a convenient RAM-disk for downloading the executables to.
[virtual=ppcbe,elf] .bootstrap = {
startup-800fads
PATH=/proc/boot procnto-800
}
[+script] .script = {
devc-serppc800 -e -c20000000 -b9600 smc1 &
reopen
hello
}
[type=link] /dev/console=/dev/ser1
[type=link] /usr/lib/ldqnx.so.2=/proc/boot/libc.so
[perms=+r,+x]
libc.so
[data=copy]
[perms=+r,+x]
devc-serppc800
hello
• A PowerPC 800 FADS board and ELF boot prefix code are being
used to boot.
• The image should contain devc-serppc800, the serial
communications manager for the PowerPC 80x family, as well as
hello (our test program).
Let’s assume that the above buildfile is called hello.bld. Using the
mkifs utility, you could then build an image by typing:
mkifs hello.bld hello.ifs
You could then copy the image to a bootable floppy with the dinit
utility:
dinit -f hello.ifs a:
Using libraries
When you’re developing code, you almost always make use of a
library — a collection of code modules that you or someone else has
already developed (and hopefully debugged). Under Neutrino, we
have three different ways of using libraries:
• static linking
• dynamic linking
• runtime loading
Static linking
You can combine your modules with the modules from the library to
form a single executable that’s entirely self-contained. We call this
static linking. The word “static” implies that it’s not going to change
— all the required modules are already combined into one executable.
Dynamic linking
Rather than build a self-contained executable ahead of time, you can
take your modules and link them in such a way that the Process
Manager will link them to the library modules before your program
runs. We call this dynamic linking. The word “dynamic” here means
that the association between your program and the library modules
that it uses is done at load time, not at linktime (as was the case with
the static version).
Runtime loading
There’s a variation on the theme of dynamic linking called runtime
loading. In this case, the program decides while it’s actually running
that it wishes to load a particular function from a library.
Static libraries
A static library is usually identified by a .a (for “archive”) suffix (e.g.
libc.a). The library contains the modules you want to include in
your program and is formatted as a collection of ELF object modules
that the linker can then extract (as required by your program) and bind
with your program at linktime.
This “binding” operation literally copies the object module from the
library and incorporates it into your “finished” executable. The major
advantage of this approach is that when the executable is created, it’s
entirely self-sufficient — it doesn’t require any other object modules
to be present on the target system. This advantage is usually
outweighed by two principal disadvantages, however:
• Every executable created in this manner has its own private copy of
the library’s object modules, resulting in large executable sizes
(and possibly slower loading times, depending on the medium).
Dynamic libraries
A dynamic library is usually identified by a .so (for “shared object”)
suffix (e.g. libc.so). Like a static library, this kind of library also
contains the modules that you want to include in your program, but
these modules are not bound to your program at linktime. Instead,
your program is linked in such a way that the Process Manager causes
your program to be bound to the shared objects at load time.
The Process Manager performs this binding by looking at the program
to see if it references any shared objects (.so files). If it does, then
the Process Manager looks to see if those particular shared objects are
already present in memory. If they’re not, it loads them into memory.
Then the Process Manager patches your program to be able to use the
shared objects. Finally, the Process Manager starts your program.
Note that from your program’s perspective, it isn’t even aware that it’s
running with a shared object versus being statically linked — that
happened before the first line of your program ran!
The main advantage of dynamic linking is that the programs in the
system will reference only a particular set of objects — they don’t
contain them. As a result, programs are smaller. This also means that
you can upgrade the shared objects without relinking the programs.
This is especially handy when you don’t have access to the source
code for some of the programs.
dlopen()
When a program decides at runtime that it wants to “augment” itself
with additional code, it will issue the dlopen() function call. This
function call tells the system that it should find the shared object
referenced by the dlopen() function and create a binding between the
program and the shared object. Again, if the shared object isn’t
present in memory already, the system will load it. The main
advantage of this approach is that the program can determine, at
runtime, which objects it needs to have access to.
Note that there’s no real difference between a library of shared
objects that you link against and a library of shared objects that you
load at runtime. Both modules are of the exact same format. The only
difference is in how they get used.
By convention, therefore, we place libraries that you link against
(whether statically or dynamically) into the lib directory, and shared
objects that you load at runtime into the lib/dll (for “dynamically
loaded libraries”) directory.
Note that this is just a convention — there’s nothing stopping you
from linking against a shared object in the lib/dll directory or from
using the dlopen() function call on a shared object in the lib
directory.
☞ You can use the -L option to qcc to explicitly provide a library path.
☞ For this release of Neutrino, you can’t use the floating point emulator
(fpemu.so) in statically linked executables.
You may wish to use the above “mixed-mode” linking because some
of the libraries you’re using will be needed by only one executable or
because the libraries are small (less than 4 KB), in which case you’d
be wasting memory to use them as shared libraries. Note that shared
libraries are typically mapped in 4-KB pages and will require at least
one page for the “text” section and possibly one page for the
“data” section.
1 Compile the source files for the library using the -shared
option to qcc.
☞ Make sure that all objects and “static” libs that are pulled into a .so
are position-independent as well (i.e. also compiled with -shared).
"-Wl,-hname"
(You might need the quotes to pass the option through to the linker
intact, depending on the shell.)
This option sets the internal name of the shared object to name instead
of to the object’s pathname, so you’d use name to access the object
when dynamically linking. You might find this useful when doing
cross-development (e.g. from a Windows NT system to a Neutrino
target).
Debugging
Now let’s look at the different options you have for debugging the
executable. Just as you have two basic ways of developing
(self-hosted and cross-development), you have similar options for
debugging.
(Figure: the debugger talks to the executable through a debug agent.)
In this case, the debugger starts the debug agent, and then establishes
its own communications channel to the debug agent.
(Figure: the debugger and the debug agent communicate over a separate
communications channel.)
☞ In order to debug your programs with full source using the symbolic
debugger, you’ll need to tell the C compiler and linker to include
symbolic information in the object and executable files. For details,
see the qcc docs in the Utilities Reference. Without this symbolic
information, the debugger can provide only assembly-language-level
debugging.
Starting gdb
You can invoke gdb by using the following variants, which
correspond to your target platform:
For more information, see the gdb entry in the Utilities Reference.
☞ To use the pdebug agent, you must set up pty support (via
devc-pty) on your target.
When the process’s threads are stopped and the debugger is in control,
you may examine the state of any thread within the process. You may
also “freeze” all or a subset of the stopped threads when you continue.
For more info on examining thread states, see your debugger docs.
The pdebug agent may either be included in the image and started in
the image startup script or started later from any available filesystem
that contains pdebug.
The pdebug command-line invocation specifies which device will be
used. (Note that for self-hosted debugging, pdebug is started
automatically by the host debugger.)
You can start pdebug in one of three ways, reflecting the nature of the
connection between the debugger and the debug agent:
• serial connection
• TCP/IP static port connection
• TCP/IP dynamic port connection
Serial connection
If the host and target systems are connected via a serial port, then the
debug agent (pdebug) should be started with the following command:
pdebug devicename[,baud]
For example, the following command runs the process debug agent on
/dev/ser2 with a serial link at 115200 baud:
pdebug /dev/ser2,115200
(Figure: a full serial cable between the host and target, both wired as
DTE, with the Tx, Rx, RTS/CTS, DTR/DSR, CD, RI, and ground lines
connected on each side.)
TCP/IP connection
If the host and the target are connected via some form of TCP/IP
connection, the debugger and agent can use that connection as well.
Two types of TCP/IP communications are possible with the debugger
and agent: static port and dynamic port connections (see below).
The Neutrino target must have a supported Ethernet controller. Note
that since the debug agent requires the TCP/IP manager to be running
on the target, this requires more memory.
This need for extra memory is offset by the advantage of being able to
run multiple debuggers with multiple debug sessions over the single
network cable. In a networked development environment, developers
on different network hosts could independently debug programs on a
single common target.
(Figure: several developers’ stations debugging programs on a single
target over TCP/IP.)
For a static port connection, the debug agent is assigned a TCP/IP port
number and will listen for communications on that port only. For
example, the pdebug 1204 command specifies TCP/IP port 1204:
(Figure: for a static port connection, the debugger connects directly to
pdebug listening on a fixed port such as 1204; for a dynamic port
connection, inetd listens on the well-known port, such as 1234, and
starts pdebug for each incoming session.)
For a TCP/IP dynamic port connection, the inetd process will manage the
port.
Note that this method is also suitable for one or more developers.
Although the target-side port remains the same, inetd causes unique
ports to be used on the debugger side. This ensures a unique socket
pair for each session.
Note that inetd should be included and started in your boot image.
The pdebug program should also be in your boot image (or available
from a mounted filesystem).
The config files could be built into your boot image (as in this sample
script) or linked in from a remote filesystem using the [type=link]
command:
[+script] startup-script = {
# explicitly running in edited mode for the console link
devc-ser8250 -e -b115200 &
reopen
display msg Welcome to Neutrino on a PC-compatible BIOS system
# tcp/ip with a NE2000 Ethernet adaptor
io-net -dne2000 -pttcpip if=ndi0:10.0.1.172 &
waitfor /dev/socket
inetd &
pipe &
# pdebug needs devc-pty and esh
devc-pty &
# NFS mount of the Neutrino filesystem
fs-nfs2 -r 10.89:/x86 /x86 -r 10.89:/home /home &
# CIFS mount of the NT filesystem
fs-cifs -b //QA:10.0.1.181:/QARoot /QAc apkleywegt 123 &
# NT Hyperterm needs this to interpret backspaces correctly
stty erase=08
reopen /dev/console
[+session] esh
}
[type=link] /usr/lib/ldqnx.so.2=/proc/boot/libc.so
[type=link] /lib=/x86/lib
[type=link] /tmp=/dev/shmem # tmp points to shared memory
[type=link] /dev/console=/dev/ser2 # no local terminal
/etc/services = {
ftp 21/tcp
telnet 23/tcp
finger 79/tcp
pdebug 8000/tcp
}
/etc/inetd.conf = {
ftp     stream tcp nowait root /bin/ftpd     ftpd
telnet  stream tcp nowait root /bin/telnetd  telnetd
finger  stream tcp nowait root /bin/fingerd  fingerd
pdebug  stream tcp nowait root /bin/pdebug   pdebug -
}
The above specifies that the host IP address is 10.0.1.172 (or 10.428
for short). The pdebug program is configured to use port 8000.
Get help
While in a debug session, any of the following commands could be
used as the next action for starting the actual debugging of the project:
n Execute the next source line (stepping over function calls)
help inspect
Get help for the inspect command
# list command:
(gdb) l
3
4 main () {
5
6 int x,y,z;
7
8 setprio (0,9);
9 printf ("Hi ya!\n");
10
11 x=3;
12 y=2;
16
17 }
Continuing.
[+script] startup-script = {
# explicitly running in edited mode for the console link
devc-ser8250 -e -b115200 &
reopen
display msg Welcome to Neutrino on a PC-compatible BIOS system
# tcp/ip with a NE2000 Ethernet adaptor
io-net -dne2000 -pttcpip if=ndi0:10.0.1.172 &
waitfor /dev/socket
pipe &
# pdebug needs devc-pty
devc-pty &
# starting pdebug twice on separate ports
[+session] pdebug 8000 &
}
[type=link] /usr/lib/ldqnx.so.2=/proc/boot/libc.so
[type=link] /lib=/x86/lib
[type=link] /tmp=/dev/shmem # tmp points to shared memory
[type=link] /dev/console=/dev/ser2 # no local terminal
The following executables must also be present in the boot image (or
be available from a mounted filesystem): io-net, pipe, devc-pty,
pdebug, esh, ping, and ls.
In this chapter. . .
Process model
Processes and threads
Priorities and scheduling
Scheduling algorithms
Why threads?
Summary
Process model
The Neutrino OS architecture consists of the microkernel and some
number of cooperating processes. These processes communicate with
each other via various forms of interprocess communication (IPC).
Message passing is the primary form of IPC in Neutrino.
(Figure: the Neutrino microkernel and process manager at the center of a
“software bus”, with the QNX 4, CD-ROM, Flash, DOS, and NFS filesystem
managers, the Qnet network manager, the Photon GUI manager, the font
manager, and applications plugged into it.)
The Neutrino architecture acts as a kind of “software bus” that lets you
dynamically plug in/out OS modules. This picture shows the graphics driver
sending a message to the font manager when it wants the bitmap for a font.
The font manager responds with the bitmap.
the text to be drawn in. The font manager responds with the requested
bitmaps, and the graphics driver then draws the bitmaps on the screen.
Some definitions
In the Neutrino OS, we typically use only the terms process and
thread. An “application” typically means a collection of processes;
the term “program” is usually equivalent to “process.”
A thread is a single flow of execution or control. At the lowest level,
this equates to the program counter or instruction pointer register
advancing through some machine instructions. Each thread has its
own current value for this register.
Threads don’t share such things as stack, values for the various
registers, SMP thread-affinity mask, and a few other things.
Two threads residing in two different processes don’t share very
much. About the only thing they do share is the CPU. You can have
them share memory between them, but this takes a little setup (see
shm_open() in the Library Reference for an example).
When you run a process, you’re automatically running a thread. This
thread is called the “main” thread, since the first
programmer-provided function that runs in a C program is main().
The main thread can then create additional threads if need be.
Only a few things are special about the main thread. One is that if it
returns normally, the code it returns to calls exit(). Calling exit()
terminates the process, meaning that all threads in the process are
terminated. So when you return normally from the main thread, the
process is terminated. When other threads in the process return
normally, the code they return to calls pthread_exit(), which
terminates just that thread.
Another special thing about the main thread is that if it terminates in
such a manner that the process is still around (e.g. it calls
pthread_exit() and there are other threads in the process), then the
memory for the main thread’s stack is not freed up. This is because
the command-line arguments are on that stack and other threads may
need them. If any other thread terminates, then that thread’s stack is
freed.
Priority range
Each thread can have a scheduling priority ranging from 1 to 63 (the
highest priority), independent of the scheduling policy. The special
idle thread (in the process manager) has priority 0 and is always ready
to run. A thread inherits the priority of its parent thread by default.
A thread has both a real priority and an effective priority, and is
scheduled in accordance with its effective priority. The thread itself
can change both its real and effective priority together, but the
effective priority may change because of priority inheritance or the
scheduling policy. Normally, the effective priority is the same as the
real priority.
Interrupt handlers are of higher priority than any thread, but they’re
not scheduled in the same way as threads. If an interrupt occurs, then:
(Figure: the priority range from 0 to 63, with hardware interrupt
handlers above priority 63, threads A, B, and C at priority 10, threads
D and G at priority 5, thread E at priority 1, and the idle thread F at
priority 0.)
(Figure: the ready queue, with threads A, B, and C queued at priority
10, D at priority 5, E at priority 1, and the idle thread F at priority
0; threads G and Z are blocked.)
The ready queue for six threads (A-F) that are READY. All other threads
(G-Z) are BLOCKED. Thread A is currently running. Threads A, B, and C are
at the highest priority, so they’ll share the processor based on the running
thread’s scheduling algorithm.
• is blocked
• is preempted
• yields
Scheduling algorithms
To meet the needs of various applications, Neutrino provides these
scheduling algorithms:
Each thread in the system may run using any method. Scheduling
methods are effective on a per-thread basis, not on a global basis for
all threads and processes on a node.
Remember that these scheduling algorithms apply only when two or
more threads that share the same priority are READY (i.e. the threads
are directly competing with each other). If a higher-priority thread
becomes READY, it immediately preempts all lower-priority threads.
In the following diagram, three threads of equal priority are READY.
If Thread A blocks, Thread B will run.
(Figure: thread A has blocked; threads B and C remain on the
priority-10 ready queue, and B now runs.)
FIFO scheduling
In FIFO (SCHED_FIFO) scheduling, a thread selected to run continues
executing until it:
• voluntarily relinquishes control (e.g. it blocks)
• is preempted by a higher-priority thread
(Figure: under FIFO scheduling, thread A runs until it blocks or is
preempted; threads B and C wait on the priority-10 ready queue.)
Round-robin scheduling
In round-robin (SCHED_RR) scheduling, a thread selected to run
continues executing until it:
• voluntarily relinquishes control
• is preempted by a higher-priority thread
• consumes its timeslice
(Figure: under round-robin scheduling, thread A moves to the end of the
priority-10 ready queue, behind B and C, once it consumes its timeslice.)
Why threads?
Now that we know more about priorities, we can talk about why you
might want to use threads. We saw many good reasons for breaking
things up into separate processes, but what’s the purpose of a
multithreaded process?
Let’s take the example of a driver. A driver typically has two
obligations: one is to talk to the hardware and the other is to talk to
other processes. Generally, talking to the hardware is more
time-critical than talking to other processes. When an interrupt comes
in from the hardware, it needs to be serviced in a relatively small
window of time — the driver shouldn’t be busy at that moment
talking to another process.
One way of fixing this problem is to choose a way of talking to other
processes where this situation simply won’t arise (e.g. don’t send
messages to another process such that you have to wait for
acknowledgment, don’t do any time-consuming processing on behalf
of other processes, etc.).
Another way is to use two threads: a higher-priority thread that deals
with the hardware and a lower-priority thread that talks to other
processes. The lower-priority thread can chat away with other
processes without disturbing the time-critical work: when an interrupt
arrives, the higher-priority thread preempts it and services the
hardware promptly.
Summary
The modular architecture is apparent throughout the entire system:
the Neutrino OS itself consists of a set of cooperating processes, as
does an application. And each individual process can comprise
several cooperating threads. What “keeps everything together” is the
priority-based preemptive scheduling in Neutrino, which ensures that
time-critical tasks are dealt with by the right thread or process at the
right time.
In this chapter. . .
Starting processes — two methods
Process creation
Process termination
Detecting process termination
Process creation
The process manager component of procnto is responsible for
process creation. If a process wants to create another process, it
makes a call to one of the process-creation functions, which then
effectively sends a message to the process manager.
Here are the process-creation functions:
• fork()
• forkpty()
• popen()
• spawn()
• system()
• vfork()
For details on each of these functions, see their entries in the Library
Reference. Here we’ll mention some of the things common to many
of them.
Concurrency
Three possibilities can happen to the creator during process creation:
2 The child replaces the parent. In fact, they’re not really parent
and child, because the image of the given process simply
replaces that of the caller. Many things will change, but those
things that uniquely identify a process (such as the process ID)
will remain the same. This is typically referred to as “execing,”
since usually the exec*() functions are used.
Many things will remain the same (including the process ID,
parent process ID, and file descriptors) with the exception of
file descriptors that had the FD_CLOEXEC flag set using fcntl().
See the exec*() functions for more on what will and will not be
the same across the exec.
☞ Many programmers coming from the Unix world are familiar with the
technique of using a call to fork() followed by a call to one of the
exec*() functions in order to create a process that’s different from the
caller. In Neutrino, you can usually achieve the same thing in a single
call to one of the spawn*() functions.
For example, if the parent has file descriptor 5 in use for a
particular file when it creates the child, the child
will also have file descriptor 5 in use for that same file. The child’s
file descriptor will have been duped from the parent’s. This means
that at the filesystem manager level, the parent and child have the
same open control block (OCB) for the file, so if the child seeks to
some position in the file, then that changes the parent’s seek position
as well. It also means that the child can do a write(5, buf,
nbytes) without having previously called open().
If you don’t want the child to inherit a particular file descriptor, then
you can use fcntl() to prevent it. Note that this won’t prevent
inheritance of a file descriptor during a fork(). The call to fcntl()
would be:
If you want the parent to set up exactly which files will be open for
the child, then you can use the fd_count and fd_map parameters with
spawn(). Note that in this case, only the file descriptors you specify
will be inherited. This is especially useful for redirecting the child’s
standard input (file descriptor 0), standard output (file descriptor 1),
and standard error (file descriptor 2) to places where the parent wants
them to go.
Alternatively, this file descriptor inheritance can also be done through
use of fork(), one or more calls to dup(), dup2(), and close(), and then
exec*(). The call to fork() creates a child that inherits all of the
parent’s file descriptors. dup(), dup2() and close() are then used by the
child to rearrange its file descriptors. Lastly, exec*() is called to
replace the child with the process to be created. Though more
complicated, this method of setting up file descriptors is portable
whereas the spawn() method is not.
Process termination
A process can terminate in one of two basic ways:
• If you create a shared memory object and then map in more than
the size of the object, when you try to write past the size of the
object you’ll be hit with SIGBUS. In this case, the virtual address
used is valid (since the mapping succeeded), but the memory
cannot be accessed.
You can also have the current state of a terminated process written to
a file so that you can later bring up the debugger and examine just
what happened. This type of examination is called postmortem
debugging. This happens only if the process is terminated due to one
of these signals:
Signal Description
SIGABRT Program called the abort() function
SIGBUS Parity error
SIGEMT EMT instruction
SIGFPE Floating-point error or division by zero
SIGILL Illegal instruction executed
SIGQUIT Quit
SIGSEGV Segmentation violation
continued. . .
Signal Description
SIGSYS Bad argument to a system call
SIGTRAP Trace trap (not reset when caught)
SIGXCPU Exceeded the CPU limit
SIGXFSZ Exceeded the file size limit
The process that dumps the state to a file when the process terminates
is called dumper, which must be running when the abnormal
termination occurs. This is extremely useful, because embedded
systems may run unassisted for days or even years before a crash
occurs, making it impossible to reproduce the actual circumstances
leading up to the crash.
¯ puts it in session 1
¯ optionally, closes all file descriptors except stdin, stdout, and stderr
The following sample illustrates the use of wait() for waiting for child
processes to terminate.
#include <spawn.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
int main(void)
{
    int status;
    pid_t pid;

    /* ... spawn or fork the child processes here ... */

    while (1) {
        if ((pid = wait(&status)) == -1) {
            perror("wait() failed (no more child processes?)");
            exit(EXIT_FAILURE);
        }
        printf("a child terminated, pid = %d\n", pid);
        if (WIFEXITED(status)) {
            printf("child terminated normally, exit status = %d\n",
                   WEXITSTATUS(status));
        } else if (WIFSIGNALED(status)) {
            printf("child terminated abnormally by signal = %X\n",
                   WTERMSIG(status));
        } /* else see the documentation for wait() for more macros */
    }
}
The following is a simple child process to try out with the above
parent.
#include <stdio.h>
#include <unistd.h>
The sigwaitinfo() function blocks, waiting until any of the signals that
the caller tells it to wait for is set on the caller. If a child process
terminates, then the SIGCHLD signal is set on the parent. So all the
parent has to do is request that sigwaitinfo() return when SIGCHLD
arrives.
#include <errno.h>
#include <spawn.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>
#include <sys/neutrino.h>
void
signal_handler(int signo)
{
    /* do nothing */
}

int main(void)
{
    sigset_t mask;
    siginfo_t info;

    /* ... set up the handler and spawn the children here ... */

    /*
     * Mask out the SIGCHLD signal so that it won't interrupt us.
     * (Side note: the child inherits the parent's mask.)
     */
    sigemptyset(&mask);
    sigaddset(&mask, SIGCHLD);
    sigprocmask(SIG_BLOCK, &mask, NULL);

    while (1) {
        if (sigwaitinfo(&mask, &info) == -1) {
            perror("sigwaitinfo() failed");
            continue;
        }
        switch (info.si_signo) {
        case SIGCHLD:
            /* info.si_pid is the terminated process's pid; it's not POSIX */
            printf("a child terminated, pid = %d\n", info.si_pid);
            break;
        default:
            /* shouldn't get here, since we asked only for SIGCHLD */
            break;
        }
    }
}
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/iofunc.h>
#include <sys/dispatch.h>
#include <sys/neutrino.h>
#include <sys/procfs.h>
#include <sys/stat.h>
memset(&rattr, 0, sizeof(rattr));
rattr.msg_max_size = 2048;
while (1) {
if ((ctp = dispatch_block(ctp)) == NULL) {
fprintf(stderr, "%s: dispatch_block failed: %s\n",
progname, strerror(errno));
exit(1);
}
dispatch_handler(ctp);
}
}
struct dinfo_s {
procfs_debuginfo info;
char pathbuffer[PATH_MAX]; /* 1st byte is info.path[0] */
};
int
display_process_info(pid_t pid)
{
char buf[PATH_MAX + 1];
int fd, status;
struct dinfo_s dinfo;
procfs_greg reg;
sizeof(dinfo), NULL);
if (status != EOK) {
close(fd);
return status;
}
/*
* For getting other types of information, see <sys/procfs.h>,
* <sys/debug.h>, and <sys/dcmd_proc.h>
*/
close(fd);
return EOK;
}
int
io_write(resmgr_context_t *ctp, io_write_t *msg,
RESMGR_OCB_T *ocb)
{
char *pstr;
int status;
if (dumper_fd != -1) {
/* pass it on to dumper so it can handle it too */
if (write(dumper_fd, pstr, strlen(pstr)) == -1) {
close(dumper_fd);
dumper_fd = -1; /* something wrong; no sense in
doing it again later */
}
}
return EOK;
}
In this chapter. . .
What is a resource manager? 75
Components of a resource manager 85
Simple examples of device resource managers 90
Data-carrying structures 99
Handling the _IO_READ message 109
Handling the _IO_WRITE message 119
Methods of returning and replying 122
Handling other read/write details 127
Attribute handling 133
Combine messages 134
Extending Data Control Structures (DCS) 142
Handling devctl() messages 145
Handling ionotify() and select() 152
Handling private messages and pulses 164
Handling open(), dup(), and close() messages 167
Handling client unblocking due to signals or timeouts 168
Handling interrupts 170
Multi-threaded resource managers 173
Filesystem resource managers 178
Message types 186
Resource manager data structures 188
restricted in what you can do (except for the restrictions that exist
inside an ISR).
A few examples...
A serial port may be managed by a resource manager called
devc-ser8250, although the actual resource may be called
/dev/ser1 in the pathname space. When a process requests serial
port services, it does so by opening a serial port (in this case
/dev/ser1).
fd = open("/dev/ser1", O_RDWR);
for (packet = 0; packet < npackets; packet++)
write(fd, packets[packet], PACKET_SIZE);
close(fd);
fd = open("/dev/dvd", O_WRONLY);
while ((data = get_dvd_data(handle, &nbytes))) {
bytes_written = write(fd, data, nbytes);
if (bytes_written != nbytes) {
perror ("Error writing the DVD data");
}
}
close(fd);
The cat utility takes the name of a file and opens the file, reads
from it, and displays whatever it reads to standard output (typically
the screen). As a result, you can type:
cat /proc/ipstats
verbosity level 0
ip checksum errors: 0
udp checksum errors: 0
tcp checksum errors: 0
packets sent: 82
packets received: 82
/*
* In this stage, the client talks
* to the process manager and the resource manager.
*/
fd = open("/dev/ser1", O_RDWR);
/*
* In this stage, the client talks directly to the
* resource manager.
*/
for (packet = 0; packet < npackets; packet++)
write(fd, packets[packet], PACKET_SIZE);
close(fd);
[Diagram: the client, the process manager, the resource manager, and the device, with numbered message arrows corresponding to the steps listed here.]
5 When the file descriptor is obtained, the client can use it to send
messages directly to the device associated with the pathname.
In the sample code, it looks as if the client opens and writes
directly to the device. In fact, the write() call sends an
_IO_WRITE message to the resource manager requesting that the
given data be written, and the resource manager responds that it
either wrote some or all of the data, or that the write failed.
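Because a manager may legitimately accept only part of the data, robust clients wrap write() in a retry loop. A portable sketch (nothing QNX-specific):

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Keep calling write() until all `nbytes` are accepted or a real
 * error occurs; returns 0 on success, -1 on failure. */
int write_all(int fd, const char *buf, size_t nbytes)
{
    while (nbytes > 0) {
        ssize_t n = write(fd, buf, nbytes);
        if (n == -1) {
            if (errno == EINTR)
                continue;          /* interrupted: retry the same chunk */
            return -1;
        }
        buf += n;                  /* only `n` bytes were written */
        nbytes -= n;
    }
    return 0;
}
```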
ENDCASE
CASE _IO_READ:
call io_read handler
ENDCASE
CASE _IO_WRITE:
call io_write handler
ENDCASE
. /* etc. handle all other messages */
. /* that may occur, performing */
. /* processing as appropriate */
ENDSWITCH
ENDDO
Many of the details in the above pseudo-code are hidden from you by
a resource manager library that you’ll use. For example, you won’t
actually call a MsgReceive*() function — you’ll call a library
function, such as resmgr_block() or dispatch_block(), that does it for
you. If you’re writing a single-threaded resource manager, you might
provide a message handling loop, but if you’re writing a
multi-threaded resource manager, the loop is hidden from you.
You don’t need to know the format of all the possible messages, and
you don’t have to handle them all. Instead, you register “handler
functions,” and when a message of the appropriate type arrives, the
library calls your handler. For example, suppose you want a client to
get data from you using read() — you’ll write a handler that’s called
whenever an _IO_READ message is received. Since your handler
handles _IO_READ messages, we’ll call it an “io_read handler.”
The resource manager library:
The library does default handling for any messages that you don’t
want to handle. After all, most resource managers don’t care about
presenting proper POSIX filesystems to the clients. When writing
them, you want to concentrate on the code for talking to the device
you’re controlling. You don’t want to spend a lot of time worrying
about the code for presenting a proper POSIX filesystem to the client.
home/thomasf
Identifies the remaining part that’s to be managed by
the filesystem resource manager.
¯ resmgr layer
¯ dispatch layer
iofunc layer
This top layer consists of a set of functions that take care of most of
the POSIX filesystem details for you — they provide a
POSIX-personality. If you’re writing a device resource manager,
you’ll want to use this layer so that you don’t have to worry too much
about the details involved in presenting a POSIX filesystem to the
world.
resmgr layer
This layer manages most of the resource manager library details. It:
If you don’t use this layer, then you’ll have to parse the messages
yourself. Most resource managers use this layer.
The names of the functions and structures for this layer have the form
resmgr_*. The header file is <sys/resmgr.h>. For more
information, see the Library Reference.
[Diagram: in the resmgr layer, IPC messages arrive on a channel; a blocking function receives them and a handler function dispatches them to your I/O handlers, such as the io_read and io_write functions.]
dispatch layer
This layer acts as a single blocking point for a number of different
types of things. With this layer, you can handle:
_IO_* messages
It uses the resmgr layer for this.
[Diagram: in the dispatch layer, IPC messages arrive on a channel; a blocking function hands them to a handler function, which routes them to a message handler, a pulse handler, a select handler, or the I/O handlers.]
You can use the dispatch layer to handle _IO_* messages, select, pulses, and
other messages.
#include <errno.h>
#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/iofunc.h>
#include <sys/dispatch.h>
Here we supply two tables that specify which function to call when a
particular message arrives:
Effectively, this is a per-name data structure. Later on, we’ll see how
you could extend the structure to include your own per-device
information.
/*
* define THREAD_POOL_PARAM_T such that we can avoid a compiler
* warning when we use the dispatch_*() functions below
*/
#define THREAD_POOL_PARAM_T dispatch_context_t
#include <sys/iofunc.h>
#include <sys/dispatch.h>
argv[0]);
return EXIT_FAILURE;
}
#include <sys/iofunc.h>
#include <sys/dispatch.h>
The THREAD_POOL_PARAM_T manifest tells the compiler what type
of parameter is passed between the various blocking/handling
functions that the threads will be using. This parameter should be the
context structure used for passing context information between the
functions. By default, it’s defined as a resmgr_context_t, but since
this sample is using the dispatch layer, we need it to be a
dispatch_context_t. We define it before doing the includes
above, since the header files refer to it.
The thread pool attributes tell the threads which functions to use for
their blocking loop and control how many threads should be in
existence at any time. We go into more detail on these attributes when
The thread pool handle is used to control the thread pool. Amongst
other things, it contains the given attributes and flags. The
thread_pool_create() function allocates and fills in this handle.
The thread_pool_start() function starts up the thread pool. Each newly
created thread allocates a context structure of the type defined by
THREAD_POOL_PARAM_T, using the context_alloc function we gave
above in the attribute structure. The threads then block on the block_func,
and when the block_func returns, they call the handler_func, both of
which were also given through the attributes structure. Each thread
essentially does the same thing that the single-threaded resource
manager above does for its message loop.
From this point on, your resource manager is ready to handle
messages. Since we gave the POOL_FLAG_EXIT_SELF flag to
thread_pool_create(), once the threads have been started up,
pthread_exit() will be called and this calling thread will exit.
[Diagram: two client processes, each with its own OCB (OCB B and OCB C) in the resmgr library; both OCBs point to the single attribute structure for /dev/time, which points to an optional mount structure describing the device.]
Multiple clients with multiple OCBs, all linked to one mount structure.
uid and gid The user ID and group ID of the owner of this
resource. These fields are updated automatically by
the chown() helper functions (e.g.
iofunc_chown_default()) and are referenced in
conjunction with the mode member for
access-granting purposes by the open() helper
functions (e.g. iofunc_open_default()).
IOFUNC_PC_SYNC_IO
If not set, causes the default iofunc-layer
_IO_OPEN handler to fail if the client specified
any one of O_DSYNC, O_RSYNC, or O_SYNC.
IOFUNC_PC_LINK_DIR
Controls whether or not root is allowed to link
and unlink directories.
where:
struct _io_read {
uint16_t type;
uint16_t combine_len;
int32_t nbytes;
uint32_t xtype;
};
typedef union {
struct _io_read i;
/* unsigned char data[nbytes]; */
/* nbytes is returned with MsgReply */
} io_read_t;
combine_len This field has meaning for a combine message — see
the “Combine messages” section in this chapter.
We’ll create an io_read() function that will serve as our handler that
actually returns some data (the fixed string "Hello, world\n").
We’ll use the OCB to keep track of our position within the buffer that
we’re returning to the client.
When we get the _IO_READ message, the nbytes member tells us
exactly how many bytes the client wants to read. Suppose that the
client issues:
Granted, this isn’t a terribly efficient way for the client to perform
reads! In this case, we would get msg->i.nbytes set to 1 (the size
of the buffer that the client wants to get). We can’t simply return the
entire string all at once to the client — we have to hand it out one
character at a time. This is where the OCB’s offset member comes
into play.
#include <unistd.h>
#include <sys/iofunc.h>
#include <sys/dispatch.h>
int io_read (resmgr_context_t *ctp, io_read_t *msg, RESMGR_OCB_T *ocb);
int
io_read (resmgr_context_t *ctp, io_read_t *msg, RESMGR_OCB_T *ocb)
{
int nleft;
int nbytes;
int nparts;
int status;
/*
* On all reads (first and subsequent), calculate
* how many bytes we can return to the client,
* based upon the number of bytes available (nleft)
* and the client’s buffer size
*/
if (nbytes > 0) {
/* set up the return data IOV */
SETIOV (ctp->iov, buffer + ocb->offset, nbytes);
/*
* advance the offset by the number of bytes
* returned to the client.
*/
ocb->offset += nbytes;
nparts = 1;
} else {
/*
* they’ve asked for zero bytes or they’ve already previously
* read everything
*/
nparts = 0;
}
if (msg->i.nbytes > 0)
ocb->attr->flags |= IOFUNC_ATTR_ATIME;
The ocb maintains our context for us by storing the offset field, which
gives us the position within the buffer, and by having a pointer to the
attribute structure attr, which tells us how big the buffer actually is
via its nbytes member.
Of course, we had to give the resource manager library the address of
our io_read() handler function so that it knew to call it. So the code in
main() where we had called iofunc_func_init() became:
/* initialize functions for handling messages */
iofunc_func_init(_RESMGR_CONNECT_NFUNCS, &connect_funcs,
_RESMGR_IO_NFUNCS, &io_funcs);
io_funcs.read = io_read;
int io_read (resmgr_context_t *ctp, io_read_t *msg, RESMGR_OCB_T *ocb);
Where did the attribute structure’s nbytes member get filled in? In
main(), just after we did the iofunc_attr_init(). We modified main()
slightly:
After this line:
iofunc_attr_init (&attr, S_IFNAM | 0666, 0, 0);
At this point, if you were to run the resource manager (our simple
resource manager used the name /dev/sample), you could do:
# cat /dev/sample
Hello, world
Where does it get the IOV array? It’s using ctp->iov. That’s why
we first used the SETIOV() macro to make ctp->iov point to the
data to reply with.
If we had no data, as would be the case of a read of zero bytes, then
we’d do a return (_RESMGR_NPARTS(0)). But read() returns with
the number of bytes successfully read. Where did we give it this
information? That’s what the _IO_SET_READ_NBYTES() macro was
for. It takes the nbytes that we give it and stores it in the context
structure (ctp). Then when we return to the library, the library takes
this nbytes and passes it as the second parameter to MsgReplyv().
The second parameter tells the kernel what the MsgSend() should
return. And since the read() function is calling MsgSend(), that’s
where it finds out how many bytes were read.
We also update the access time for this device in the read handler. For
details on updating the access time, see the section on “Updating the
time for reads and writes” below.
The first two are almost identical, because the default functions really
don’t do that much by themselves — they rely on the POSIX helper
functions. The third approach has advantages and disadvantages.
int
io_open (resmgr_context_t *ctp, io_open_t *msg,
RESMGR_HANDLE_T *handle, void *extra)
{
return (iofunc_open_default (ctp, msg, handle, extra));
}
Obviously, this is just an incremental step that lets you gain control in
your io_open() when the message arrives from the client. You may
wish to do something before or after the default function does its
thing:
/* example of doing something before */
int
io_open (resmgr_context_t *ctp, io_open_t *msg,
RESMGR_HANDLE_T *handle, void *extra)
{
if (!accepting_opens_now) {
return (EBUSY);
}
/*
Or:
int
io_open (resmgr_context_t *ctp, io_open_t *msg,
RESMGR_HANDLE_T *handle, void *extra)
{
int sts;
/*
* have the default function do the checking
* and the work for us
*/
/*
* if the default function says it’s okay to let the open
* happen, we want to log the request
*/
if (sts == EOK) {
log_open_request (ctp, msg);
}
return (sts);
}
It goes without saying that you can do something before and after the
standard default POSIX handler.
The principal advantage of this approach is that you can add to the
functionality of the standard default POSIX handlers with very little
effort.
int
iofunc_stat_default (resmgr_context_t *ctp, io_stat_t *msg,
iofunc_ocb_t *ocb)
{
iofunc_time_update (ocb -> attr);
iofunc_stat (ocb -> attr, &msg -> o);
return (_RESMGR_PTR (ctp, &msg -> o,
sizeof (msg -> o)));
}
Notice how the iofunc_chmod() handler performs all the work for the
iofunc_chmod_default() default handler. This is typical for the simple
functions.
The more interesting case is the iofunc_stat_default() default handler,
which calls two helper routines. First it calls iofunc_time_update() to
ensure that all of the time fields (atime, ctime, and mtime) are up to
date. Then it calls iofunc_stat(), which builds the reply. Finally, the
default function builds a pointer in the ctp structure and returns it.
The most complicated handling is done by the iofunc_open_default()
handler:
int
iofunc_open_default (resmgr_context_t *ctp, io_open_t *msg,
iofunc_attr_t *attr, void *extra)
{
int status;
2 It then calls the helper function iofunc_open(), which does the
actual verification of the permissions.
Note that even in such cases, there are still helper functions you can
use: iofunc_read_verify() and iofunc_write_verify().
struct _io_write {
uint16_t type;
uint16_t combine_len;
int32_t nbytes;
uint32_t xtype;
/* unsigned char data[nbytes]; */
};
typedef union {
struct _io_write i;
/* nbytes is returned with MsgReply */
} io_write_t;
int
io_write (resmgr_context_t *ctp, io_write_t *msg, RESMGR_OCB_T *ocb)
{
int status;
char *buf;
/*
* Reread the data from the sender’s message buffer.
* We’re not assuming that all of the data fit into the
* resource manager library’s receive buffer.
*/
if (msg->i.nbytes > 0)
ocb->attr->flags |= IOFUNC_ATTR_MTIME | IOFUNC_ATTR_CTIME;
Of course, we’ll have to give the resource manager library the address
of our io_write handler so that it’ll know to call it. In the code for
main() where we called iofunc_func_init(), we’ll add a line to register
our io_write handler:
/* initialize functions for handling messages */
iofunc_func_init(_RESMGR_CONNECT_NFUNCS, &connect_funcs,
_RESMGR_IO_NFUNCS, &io_funcs);
io_funcs.write = io_write;
At this point, if you were to run the resource manager (our simple
resource manager used the name /dev/sample), you could write to
it by doing echo Hello > /dev/sample as follows:
return (ENOMEM);
In the case of a read(), both of the above cause the read to return -1
with errno set to ENOMEM.
...
And:
SETIOV (ctp->iov, buffer, nbytes);
return (_RESMGR_NPARTS(1));
return (EOK);
Note that in neither case are you causing the MsgSend() to return with
a 0. The value that the MsgSend() returns is the value passed to the
_IO_SET_READ_NBYTES(), _IO_SET_WRITE_NBYTES(), and other
similar macros. These two were used in the read and write samples
above.
The resmgr_msgwrite() function copies the contents of buffer into the client’s
reply buffer immediately. Note that a reply is still required in order to
unblock the client so it can examine the data. Next we free the buffer.
Finally, we return to the resource manager library such that it does a
reply with zero-length data. Since the reply is of zero length, it
doesn’t overwrite the data already written into the client’s reply
buffer. When the client returns from its send call, the data is there
waiting for it.
¯ You can leave the client blocked and later, when your write handler
function is called, you can reply to the client with the new data.
Another example might be if the client wants you to write out to some
device but doesn’t want to get a reply until the data has been fully
written out. Here’s the sequence of events that might follow:
4 Many interrupts may occur before all the data is written — only
then would you reply to the client.
The first issue, though, is whether the client wants to be left blocked.
If the client doesn’t want to be left blocked, then it opens with the
O_NONBLOCK flag:
...
int nonblock;
The question remains: How do you do the reply yourself? The only
detail to be aware of is that the rcvid to reply to is ctp->rcvid. If
you’re replying later, then you’d save ctp->rcvid and use the saved
value in your reply.
Or:
iov_t iov[2];
Note that you can fill up the client’s reply buffer as data becomes
available by using resmgr_msgwrite() and resmgr_msgwritev(). Just
remember to do the MsgReply*() at some time to unblock the client.
¯ Handling readcond().
_IO_XTYPE_NONE
No extended type information is being provided.
_IO_XTYPE_OFFSET
If clients are calling pread(), pread64(), pwrite(), or pwrite64(),
then they don’t want you to use the offset in the OCB. Instead,
they’re providing a one-shot offset. That offset follows the
struct _io_read or struct _io_write headers that
reside at the beginning of the message buffers.
For example:
struct myread_offset {
struct _io_read read;
struct _xtype_offset offset;
};
Some resource managers can be sure that their clients will never
call pread*() or pwrite*(). (For example, a resource manager
that’s controlling a robot arm probably wouldn’t care.) In this
case, you can treat this type of message as an error.
_IO_XTYPE_READCOND
If a client is calling readcond(), they want to impose timing and
return buffer size constraints on the read. Those constraints
struct myreadcond {
struct _io_read read;
struct _xtype_readcond cond;
};
int
io_read (resmgr_context_t *ctp, io_read_t *msg,
RESMGR_OCB_T *ocb)
{
int status;
/* No special xtypes */
if ((msg->i.xtype & _IO_XTYPE_MASK) != _IO_XTYPE_NONE)
return (_RESMGR_ERRNO(ENOSYS));
...
}
¯ pread*()
¯ pwrite*()
int
io_read (resmgr_context_t *ctp, io_read_t *msg,
RESMGR_OCB_T *ocb)
{
off64_t offset; /* where to read from */
int status;
...
}
int
io_write (resmgr_context_t *ctp, io_write_t *msg,
RESMGR_OCB_T *ocb)
{
off64_t offset; /* where to write */
int status;
size_t skip; /* offset into msg to where the data
resides */
...
/*
* get the data from the sender’s message buffer,
* skipping all possible header information
*/
resmgr_msgreadv(ctp, iovs, niovs, skip);
...
}
Handling readcond()
The same type of operation that was done to handle the
pread()/_IO_XTYPE_OFFSET case can be used for handling the client’s
readcond() call:
typedef struct {
struct _io_read read;
struct _xtype_readcond cond;
} io_readcond_t;
Then:
struct _xtype_readcond *cond;
...
case _IO_XTYPE_READCOND:
cond = &((io_readcond_t *)msg)->cond;
break;
}
Then your manager has to properly interpret and deal with the
arguments to readcond(). For more information, see the Library
Reference.
Attribute handling
Updating the time for reads and writes
In the read sample above we did:
if (msg->i.nbytes > 0)
ocb->attr->flags |= IOFUNC_ATTR_ATIME;
According to POSIX, if the read succeeds and the reader had asked
for more than zero bytes, then the access time must be marked for
update. But POSIX doesn’t say that it must be updated right away. If
you’re doing many reads, you may not want to read the time from the
kernel for every read. In the code above, we mark the time only as
needing to be updated. When the next _IO_STAT or _IO_CLOSE_OCB
message is processed, the resource manager library will see that the
time needs to be updated and will get it from the kernel then. This of
course has the disadvantage that the time is not the time of the read.
Similarly for the write sample above, we did:
if (msg->i.nbytes > 0)
ocb->attr->flags |= IOFUNC_ATTR_MTIME | IOFUNC_ATTR_CTIME;
if (msg->i.nbytes > 0) {
ocb->attr->flags |= IOFUNC_ATTR_ATIME;
iofunc_time_update(ocb->attr);
}
if (msg->i.nbytes > 0) {
ocb->attr->flags |= IOFUNC_ATTR_MTIME | IOFUNC_ATTR_CTIME;
iofunc_time_update(ocb->attr);
}
You should call iofunc_time_update() before you flush out any cached
attributes. As a result of changing the time fields, the attribute
structure will have the IOFUNC_ATTR_DIRTY_TIME bit set in the flags
field, indicating that this field of the attribute must be updated when
the attribute is flushed from the cache.
Combine messages
In this section:
Atomic operations
Consider a case where two threads are executing the following code,
trying to read from the same file descriptor:
a_thread ()
{
char buf [BUFSIZ];
The first thread performs the lseek() and then gets preempted by the
second thread. When the first thread resumes executing, its offset into
the file will be at the end of where the second thread read from, not
the position that it had lseek()’d to.
This can be solved in one of three ways:
¯ The two threads can use a mutex to ensure that only one thread at a
time is using the file descriptor.
¯ Each thread can open the file itself, thus generating a unique file
descriptor that won’t be affected by any other threads.
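One way to see why carrying the offset in the call itself helps (this is the idea behind the one-shot xtype offset described in the previous chapter section): POSIX pread() specifies the position explicitly and never touches the shared file offset, so two threads using it can’t disturb each other. A portable sketch, with an illustrative temp-file name:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Read from an absolute position with pread(); the descriptor's
 * shared offset is left exactly where it was, so a preempting
 * thread sees no change. */
off_t offset_after_pread(void)
{
    char path[] = "/tmp/prdXXXXXX";   /* illustrative temp file */
    int fd = mkstemp(path);
    write(fd, "abcdefghij", 10);
    lseek(fd, 0, SEEK_SET);           /* park the shared offset at 0 */

    char buf[4];
    pread(fd, buf, 4, 6);             /* read 4 bytes from offset 6 */

    off_t where = lseek(fd, 0, SEEK_CUR);   /* still 0: untouched */
    close(fd);
    unlink(path);
    return where;
}
```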
Using a mutex
Per-thread files
Bandwidth considerations
Another place where combine messages are useful is in the stat()
function, which can be implemented by calling open(), fstat(), and
close() in sequence.
Rather than generate three separate messages (one for each of the
functions), the C library combines them into one contiguous message.
This boosts performance, especially over a networked connection, and
also simplifies the resource manager, because it’s not forced to have a
connect function to handle stat().
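The equivalence is easy to check with plain POSIX calls; a sketch (nothing QNX-specific here, and the path used in the check is illustrative):

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* stat() behaves like open() + fstat() + close(); return 1 when
 * both forms report the same size for `path`. */
int stat_matches_fstat(const char *path)
{
    struct stat via_stat, via_fstat;
    stat(path, &via_stat);            /* the combined form */

    int fd = open(path, O_RDONLY);    /* the three-call form */
    fstat(fd, &via_fstat);
    close(fd);

    return via_stat.st_size == via_fstat.st_size;
}
```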
¯ Component responses
¯ _IO_CONNECT_COMBINE
Component responses
As we’ve seen, a combine message really consists of a number of
“regular” resource manager messages combined into one large
contiguous message. The resource manager library handles each
component in the combine message separately by extracting the
individual components and then calling out to the handlers you’ve
specified in the connect and I/O function tables, as appropriate, for
each component.
This generally doesn’t present any new wrinkles for the message
handlers themselves, except in one case. Consider the readblock()
combine message:
If the _IO_COMBINE_FLAG bit is set in the combine_len member, this
indicates that the message is being processed as part of a combine
message.
When the resource manager library is processing the individual
components of the combine message, it looks at the error return from
the individual message handlers. If a handler returns anything other
than EOK, then processing of further combine message components is
aborted. The error that was returned from the failing component’s
handler is returned to the client.
As you can see, resmgr_msgread() simply calls MsgRead() with the
offset of the component message from the beginning of the combine
message buffer. For completeness, there’s also a resmgr_msgwrite()
that works in an identical manner to MsgWrite(), except that it
dereferences the passed ctp to obtain the rcvid.
¯ lock_ocb
¯ unlock_ocb
These are members of the I/O functions structure. The handlers that
you provide for those callouts should lock and unlock the attribute
structure pointed to by the OCB by calling iofunc_attr_lock() and
iofunc_attr_unlock(). Therefore, if you’re locking the attribute
structure, there’s a possibility that the lock_ocb callout will block for a
period of time. This is normal and expected behavior. Note also that
the attributes structure is automatically locked for you when your I/O
function is called.
Callouts: io_open()
io_lock_ocb()
io_stat()
io_unlock_ocb()
io_close()
_IO_CONNECT_COMBINE
For the access() function, the client’s C library will open a connection
to the resource manager and perform a stat() call. Then, based on the
results of the stat() call, the client’s C library access() may perform an
optional devctl() to get more information. In any event, because
access() opened the device, it must also call close() to close it:
Callouts: io_open()
io_lock_ocb()
io_stat()
io_unlock_ocb()
io_lock_ocb() (optional)
io_devctl() (optional)
io_unlock_ocb() (optional)
io_close()
the attribute structure. Since the attribute structure doesn’t have any
spare fields, we would have to extend it to contain that pointer.
Sometimes you may want to add extra entries to the standard
iofunc_*() OCB (iofunc_ocb_t).
Let’s see how we can extend both of these structures. The basic
strategy used is to encapsulate the existing attributes and OCB
structures within a newly defined superstructure that also contains our
extensions. Here’s the code (see the text following the listing for
comments):
/* Define our overrides before including <sys/iofunc.h> */
struct device;
#define IOFUNC_ATTR_T struct device /* see note 1 */
struct ocb;
#define IOFUNC_OCB_T struct ocb /* see note 1 */
#include <sys/iofunc.h>
#include <sys/dispatch.h>
struct ocb *ocb_calloc (resmgr_context_t *ctp, struct device *device);
void ocb_free (struct ocb *ocb);
iofunc_funcs_t ocb_funcs = { /* our OCB allocating & freeing functions */
_IOFUNC_NFUNCS,
ocb_calloc,
ocb_free
};
/* One struct device per attached name (there’s only one name in this
example) */
main()
{
...
/*
* deviceattr will indirectly contain the addresses
* of the OCB allocating and freeing functions
*/
deviceattr.attr.mount = &mountpoint;
resmgr_attach (..., &deviceattr);
...
}
/*
* ocb_calloc
*
* The purpose of this is to give us a place to allocate our own OCB.
* It is called as a result of the open being done
* (e.g. iofunc_open_default() causes it to be called). We
* registered it through the mount structure.
*/
IOFUNC_OCB_T *
ocb_calloc (resmgr_context_t *ctp, IOFUNC_ATTR_T *device)
{
struct ocb *ocb;
/* see note 3 */
ocb -> prev = &device -> list;
if ((ocb -> next = device -> list)) {
device -> list -> prev = &ocb -> next;
}
device -> list = ocb;
return (ocb);
}
/*
* ocb_free
*
* The purpose of this is to give us a place to free our OCB.
* It is called as a result of the close being done
* (e.g. iofunc_close_ocb_default() causes it to be called). We
* registered it through the mount structure.
*/
void
ocb_free (IOFUNC_OCB_T *ocb)
{
/* see note 3 */
if ((*ocb -> prev = ocb -> next)) {
ocb -> next -> prev = ocb -> prev;
}
free (ocb);
}
3 The ocb_calloc() and ocb_free() sample functions shown here
cause the newly allocated OCBs to be maintained in a linked
list. Note the use of dual indirection on the struct ocb
**prev; member.
struct newmount {
iofunc_mount_t mount;
int ourflag;
};
typedef union {
struct _io_devctl i;
struct _io_devctl_reply o;
} io_devctl_t;
The nbytes value is the nbytes that’s passed to the devctl() function.
The value contains the size of the data to be sent to the device driver,
or the maximum size of the data to be received from the device driver.
The most interesting item of the input structure is the dcmd that’s
passed to the devctl() function. This command is formed using the
macros defined in <devctl.h>:
#define POSIX_DEVDIR_NONE  0
#define POSIX_DEVDIR_TO    0x80000000
#define POSIX_DEVDIR_FROM  0x40000000
#define __DIOF(class, cmd, data)  ((sizeof(data)<<16) + ((class)<<8) + (cmd) + POSIX_DEVDIR_FROM)
#define __DIOT(class, cmd, data)  ((sizeof(data)<<16) + ((class)<<8) + (cmd) + POSIX_DEVDIR_TO)
#define __DIOTF(class, cmd, data) ((sizeof(data)<<16) + ((class)<<8) + (cmd) + POSIX_DEVDIR_TOFROM)
#define __DION(class, cmd)        (((class)<<8) + (cmd) + POSIX_DEVDIR_NONE)
☞ The size of the structure that's passed as the last field to the __DIO*
macros must be less than 2^14 == 16 KB. Anything larger than this
interferes with the upper two directional bits.
In the above code, we defined three commands that the client can use:

MY_DEVCTL_SETVAL
    Sets the server global to the integer the client provides.

MY_DEVCTL_GETVAL
    Gets the server global and puts that value into the client's
    buffer.

MY_DEVCTL_SETGET
    Sets the server global to the integer the client provides and
    returns the previous value of the server global in the client's
    buffer.
And the following code gets added before the main() function:
int io_devctl(resmgr_context_t *ctp, io_devctl_t *msg, RESMGR_OCB_T *ocb);

Now, you need to include the new handler function to handle the
_IO_DEVCTL message:
/*
Let common code handle _DCMD_ALL_* cases.
You can do this before or after you intercept devctls, depending
on your intentions. Here we aren’t using any pre-defined values
so let the system ones be handled first.
*/
if ((status = iofunc_devctl_default(ctp, msg, ocb)) != _RESMGR_DEFAULT) {
return(status);
}
status = nbytes = 0;
/*
Note this assumes that you can fit the entire data portion of
the devctl into one message. In reality you should probably
perform a MsgReadv() once you know the type of message you
have received to suck all of the data in rather than assuming
it all fits in the message. We have set in our main routine
that we’ll accept a total message size of up to 2k so we
don’t worry about it in this example where we deal with ints.
*/
rx_data = _DEVCTL_DATA(msg->i);
/*
 Three examples of devctl operations.
*/
switch (msg->i.dcmd) {
    ...
    default:
        return(ENOSYS);
}
/* Clear the return message ... note we saved our data after this */
memset(&msg->o, 0, sizeof(msg->o));
/*
If you wanted to pass something different to the return
field of the devctl() you could do it through this member.
*/
msg->o.ret_val = status;
• The data being returned to the client is placed at the end of the
reply message. This is the same mechanism used for the input data,
so we can use the _DEVCTL_DATA() macro to get a pointer to
this location. With large replies that wouldn’t necessarily fit into
the server’s receive buffer, you should use one of the reply
mechanisms described in the “Methods of returning and replying”
section. Again, in this example, we’re only returning an integer
that fits into the receive buffer without any problem.
With this handler in place, a client should be able to open
/dev/sample and subsequently set and retrieve the global integer
value:
int main(int argc, char **argv) {
int fd, ret, val;
data_t data;
return(0);
}
For more information on the ionotify() and select() functions, see the
Library Reference.
typedef union {
    struct _io_notify i;
    struct _io_notify_reply o;
} io_notify_t;
io_notify_t *msg
• type
• combine_len
• action
• flags
• event
• check for conditions now, and if none are met, arm them
Since iofunc_notify() looks at this, you don't have to worry about it.
The flags member contains the conditions that the client is interested
in and can be any mixture of the following:
#include <sys/iofunc.h>
#include <sys/dispatch.h>
/*
* define structure and variables for storing the data that is received.
* When clients write data to us, we store it here. When clients do
* reads, we get the data from here. Result ... a simple message queue.
*/
typedef struct item_s {
    struct item_s *next;
    char *data;
} item_t;
• a queue to keep the data that gets written to us, and that we use to
reply to a client. For this, we defined item_t; it's a type that
contains data for a single item, as well as a pointer to the next
item_t. In device_attr_t we use firstitem (which points to the first
item in the queue) and nitems (the number of items).

Note that we removed the definition of attr, since we use device_attr
instead.
Of course, we have to give the resource manager library the address of
our handlers so that it’ll know to call them. In the code for main()
where we called iofunc_func_init(), we'll add the following code to
register our handlers:
/* initialize functions for handling messages */
iofunc_func_init( _RESMGR_CONNECT_NFUNCS, &connect_funcs,
                  _RESMGR_IO_NFUNCS, &io_funcs);
io_funcs.notify = io_notify;   /* for handling _IO_NOTIFY, sent as
                                  a result of client calls to ionotify()
                                  and select() */
io_funcs.write = io_write;
io_funcs.read = io_read;
io_funcs.close_ocb = io_close_ocb;
And, since we're using device_attr in place of attr, we need to change
the code wherever we use it in main(). So, you'll need to replace this
code:

/* initialize attribute structure used by the device */
iofunc_attr_init(&attr, S_IFNAM | 0666, 0, 0);

Note that we set up our device-specific data in device_attr. And, in the
call to resmgr_attach(), we passed &device_attr (instead of
&attr) for the handle parameter.
Now, you need to include the new handler function to handle the
_IO_NOTIFY message:
int
io_notify(resmgr_context_t *ctp, io_notify_t *msg, RESMGR_OCB_T *ocb)
{
    device_attr_t *dattr = (device_attr_t *) ocb->attr;
    int trig;

    /*
     * 'trig' will tell iofunc_notify() which conditions are currently
     * satisfied. 'dattr->nitems' is the number of messages in our list of
     * stored messages.
     */

    /*
     * iofunc_notify() will do any necessary handling, including adding
     * the client to the notification list if need be.
     */
int
io_write(resmgr_context_t *ctp, io_write_t *msg,
         RESMGR_OCB_T *ocb)
{
    device_attr_t *dattr = (device_attr_t *) ocb->attr;
    int i;
    char *p;
    int status;
    char *buf;
    item_t *newitem;

    if (msg->i.nbytes > 0) {
        /* allocate an item and space for the data, then copy
         * the data from the client's send buffer */
        newitem = malloc(sizeof(item_t));
        newitem->data = malloc(msg->i.nbytes + 1);
        resmgr_msgread(ctp, newitem->data, msg->i.nbytes,
                       sizeof(msg->i));
        newitem->data[msg->i.nbytes] = '\0';

        if (dattr->firstitem)
            newitem->next = dattr->firstitem;
        else
            newitem->next = NULL;
        dattr->firstitem = newitem;
        dattr->nitems++;
/*
* notify clients who may have asked to be notified
* when there is data
*/
if (msg->i.nbytes > 0)
    ocb->attr->attr.flags |= IOFUNC_ATTR_MTIME |
                             IOFUNC_ATTR_CTIME;
The important part of the above io write() handler is the code within
the following section:
if (msg->i.nbytes > 0) {
....
}
Here we first allocate space for the incoming data, and then use
resmgr_msgread() to copy the data from the client's send buffer into
the allocated space. Then, we add the data to our queue.
Next, we pass the number of input units that are available to
IOFUNC_NOTIFY_INPUT_CHECK() to see if there are enough units
to notify clients about. This is checked against the notifycounts that
if (dattr->firstitem) {
    int nbytes;
    item_t *item, *prev;

    /* walk to the end of the list to find the oldest item */
    item = dattr->firstitem;
    prev = NULL;
    while (item->next != NULL) {
        prev = item;
        item = item->next;
    }

    /*
     * figure out the number of bytes to give; even if we have more
     * bytes than the client is asking for, we remove the item from
     * our list
     */
    nbytes = min (strlen (item->data), msg->i.nbytes);

    /*
     * write the bytes to the client's reply buffer now since we
     * are about to free the data
     */
    resmgr_msgwrite (ctp, item->data, nbytes, 0);
    _IO_SET_READ_NBYTES (ctp, nbytes);

    /* remove the item from our list and free it */
    if (prev)
        prev->next = NULL;
    else
        dattr->firstitem = NULL;
    free(item->data);
    free(item);
    dattr->nitems--;
} else {
    /* the read() will return with 0 bytes */
    _IO_SET_READ_NBYTES (ctp, 0);
}
if (msg->i.nbytes > 0)
    ocb->attr->attr.flags |= IOFUNC_ATTR_ATIME;

return (EOK);
}
The important part of the above io_read() handler is the code within this
section:
if (dattr->firstitem) {
....
}
We first walk through the queue looking for the oldest item. Then we
use resmgr msgwrite() to write the data to the client’s reply buffer.
We do this now because the next step is to free the memory that we’re
using to store that data. We also remove the item from our queue.
Lastly, if a client closes their file descriptor, we must remove them
from our list of clients. This is done using an io_close_ocb handler:
int
io_close_ocb(resmgr_context_t *ctp, void *reserved, RESMGR_OCB_T *ocb)
{
    device_attr_t *dattr = (device_attr_t *) ocb->attr;

    /*
     * a client has closed their file descriptor or has terminated.
     * Remove them from the notification list.
     */
int
timer_tick(message_context_t *ctp, int code, unsigned flags, void *handle) {
    /*
     * ....
     */
    printf("received timer event, value %d\n", value.sival_int);
    return 0;
}
int
message_handler(message_context_t *ctp, int code, unsigned flags, void *handle) {
    printf("received private message, type %d\n", code);
    return 0;
}
int
main(int argc, char **argv) {
    thread_pool_attr_t pool_attr;
    resmgr_attr_t resmgr_attr;
    struct sigevent event;
    struct _itimer itime;
    dispatch_t *dpp;
    thread_pool_t *tpp;
    resmgr_context_t *ctp;
    int timer_id;
    int id;

    if((tpp = thread_pool_create(&pool_attr, POOL_FLAG_EXIT_SELF)) == NULL) {
        fprintf(stderr, "%s: Unable to initialize thread pool.\n", argv[0]);
        return EXIT_FAILURE;
    }

    iofunc_func_init(_RESMGR_CONNECT_NFUNCS, &connect_func, _RESMGR_IO_NFUNCS,
                     &io_func);
    iofunc_attr_init(&attr, S_IFNAM | 0666, 0, 0);

    /* Never returns */
    thread_pool_start(tpp);
}
☞ Most users of the library will want to have the default functions
manage the _IO_DUP and _IO_CLOSE messages; you'll most likely
never override the default actions.
Why two? Because we may get an abort in one of two places. We can
get the abort pulse right after the client has sent the _IO_OPEN
message (but before we’ve replied to it), or we can get the abort
during an I/O message.
Once we've performed the handling of the _IO_CONNECT message,
the I/O functions' unblock member will be used to service an unblock
pulse. Therefore, if you're supplying your own io_open handler, be
sure to set up all relevant fields in the OCB before you call
resmgr_open_bind(); otherwise, your I/O functions' version of the
unblock handler may get called with invalid data in the OCB. (Note
that this issue of abort pulses “during” message processing arises only
if there are multiple threads running in your resource manager. If
there’s only one thread, then the messages will be serialized by the
library’s MsgReceive() function.)
The effect of this is that if the client is SEND-blocked, the server
doesn’t need to know that the client is aborting the request, because
the server hasn’t yet received it.
Only in the case where the server has received the request and is
performing processing on that request does the server need to know
that the client now wishes to abort.
For more information on these states and their interactions, see the
MsgSend(), MsgReceive(), MsgReply(), and ChannelCreate()
functions in the Library Reference; see also the chapter on
Interprocess Communication in the System Architecture book.
If you’re overriding the default unblock handler, you should always
call the default handler to process any generic unblocking cases first.
For example:
if((status = iofunc_unblock_default(...)) != _RESMGR_DEFAULT) {
return status;
}
This ensures that any client waiting on a resource manager list (such
as an advisory-lock list) will be unblocked if possible.
Handling interrupts
Resource managers that manage an actual hardware resource will
likely need to handle interrupts generated by the hardware. For a
detailed discussion on strategies for interrupt handlers, see the chapter
on Writing an Interrupt Handler in this book.
How do interrupt handlers relate to resource managers? When a
significant event happens within the interrupt handler, the handler
needs to inform a thread in the resource manager. This is usually done
via a pulse (discussed in the “Handling private messages and pulses”
section), but it can also be done with the SIGEV_INTR event
notification type. Let’s look at this in more detail.
When the resource manager starts up, it transfers control to
thread_pool_start(). This function may or may not return, depending
on the flags passed to thread_pool_create() (if you don't pass any
flags, the function returns after the thread pool is created). This means
that if you’re going to set up an interrupt handler, you should do so
before starting the thread pool, or use one of the strategies we
discussed above (such as starting a thread for your entire resource
manager).
However, if you're going to use the SIGEV_INTR event notification
type, there's a catch — the thread that attaches the interrupt (via
InterruptAttach() or InterruptAttachEvent()) must be the same thread
that calls InterruptWait().
#define INTNUM 0
#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/iofunc.h>
#include <sys/dispatch.h>
#include <sys/neutrino.h>
void *
interrupt_thread (void *data)
{
    struct sigevent event;
    int id;

    /* set up an event to be notified via SIGEV_INTR,
     * and attach the interrupt */
    SIGEV_INTR_INIT (&event);
    id = InterruptAttachEvent (INTNUM, &event, 0);

    while (1) {
        InterruptWait (NULL, NULL);
        /* do something about the interrupt,
         * perhaps updating some shared
         * structures in the resource manager
         *
         * unmask the interrupt when done
         */
        InterruptUnmask(INTNUM, id);
    }
}
int
main(int argc, char **argv) {
    thread_pool_attr_t pool_attr;
    resmgr_attr_t resmgr_attr;
    dispatch_t *dpp;
    thread_pool_t *tpp;
    int id;

    /* Never returns */
    thread_pool_start(tpp);
}
#include <errno.h>
#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>
/*
* define THREAD_POOL_PARAM_T such that we can avoid a compiler
* warning when we use the dispatch_*() functions below
*/
#define THREAD_POOL_PARAM_T dispatch_context_t
#include <sys/iofunc.h>
#include <sys/dispatch.h>
int id;
The thread pool attribute (pool_attr) controls various aspects of the
thread pool, such as which functions get called when a new thread is
started or dies, the total number of worker threads, the minimum
number, and so on.
The functions that you fill into the above structure can be taken from
the dispatch layer (dispatch_block(), ...), the resmgr layer
(resmgr_block(), ...), or they can be of your own making. If you're not
using the resmgr layer functions, then you'll have to define
THREAD_POOL_PARAM_T to some sort of context structure for the
library to pass between the various functions. By default, it's defined
as a resmgr_context_t, but since this sample is using the dispatch
layer, it's defined as a dispatch_context_t (see the #define above).
The important parameters specify the maximum thread count and the
increment. The value for maximum should ensure that there’s always
a thread in a RECEIVE-blocked state. If you’re at the number of
maximum threads, then your clients will block until a free thread is
ready to receive data. The value you specify for increment will cut
down on the number of times your driver needs to create threads. It's
probably wise to err on the side of creating more threads and leaving
them around rather than having them created and destroyed all the
time.
You determine the number of threads you want to be
RECEIVE-blocked on the MsgReceive() at any time by filling in the
lo_water parameter.
If you ever have fewer than lo_water threads RECEIVE-blocked, the
increment parameter specifies how many threads should be created at
a time.

context_free() Free the context when the worker thread exits.
If no flags are passed (i.e. 0 instead of any flags), the function returns
after the thread pool is created.
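As a sketch of how the attributes discussed above might be filled in (assuming the standard thread_pool_attr_t fields and the dispatch-layer handler functions; the water-mark values are arbitrary illustrations, not recommendations):

```c
/* fill in the thread-pool attributes (illustrative values) */
memset(&pool_attr, 0, sizeof(pool_attr));
pool_attr.handle = dpp;
pool_attr.context_alloc = dispatch_context_alloc;
pool_attr.block_func = dispatch_block;
pool_attr.handler_func = dispatch_handler;
pool_attr.context_free = dispatch_context_free;
pool_attr.lo_water = 2;    /* min. threads RECEIVE-blocked        */
pool_attr.hi_water = 4;    /* max. threads RECEIVE-blocked        */
pool_attr.increment = 1;   /* create this many threads at a time  */
pool_attr.maximum = 50;    /* hard cap on the pool size           */
```

With lo_water set to 2, the library creates increment (here, 1) new threads whenever fewer than two threads are waiting in MsgReceive().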
• Handling directories
ls -l /mount/home
d = opendir("/mount/home");
while (...) {
dirent = readdir(d);
...
}
/*
* MOD [1]: allocate multiple attribute structures,
* and fill in a names array (convenience)
*/
#define NumDevices 2
iofunc_attr_t sample_attrs [NumDevices];
char *names [NumDevices] =
{
"/dev/sample1",
"/dev/sample2"
};
main ()
{
...
/*
* MOD [2]: fill in the attribute structure for each device
* and call resmgr attach for each device
*/
for (i = 0; i < NumDevices; i++) {
    iofunc_attr_init (&sample_attrs [i],
                      S_IFCHR | 0666, NULL, NULL);
    pathID = resmgr_attach (dpp, &resmgr_attr, names[i],
                            _FTYPE_ANY, 0,
                            &my_connect_funcs,
                            &my_io_funcs,
                            &sample_attrs [i]);
}
}
...
}
Handling directories
Up until this point, our discussion has focused on resource managers
that associate each device name via discrete calls to resmgr_attach().
We’ve shown how to “take over” a single pathname. (Our examples
have used pathnames under /dev, but there’s no reason you couldn’t
take over any other pathnames, e.g. /MyDevice.)
A typical resource manager can take over any number of pathnames.
A practical limit, however, is on the order of a hundred — the real
And those are just the most obvious ones. The reasons (and
possibilities) are almost endless.
The common characteristic of these resource managers is that they all
implement filesystems. A filesystem resource manager differs from
the “device” resource managers (that we have shown so far) in the
following key areas:
3 The _IO_READ logic has to return the data for either the “file”
or “directory” specified by the pathname.
struct _io_connect {
    unsigned short type;
    unsigned short subtype; /* _IO_CONNECT_* */
Looking at the relevant fields, we see ioflag, mode, sflag, and access,
which tell us how the resource was opened.
The path_len parameter tells us how many bytes the pathname takes;
the actual pathname appears in the path parameter. Note that the
pathname that appears is not /sample_fsys/spud, as you might
expect, but instead is just spud — the message contains only the
pathname relative to the resource manager’s mountpoint. This
simplifies coding because you don’t have to skip past the mountpoint
name each time, the code doesn’t have to know what the mountpoint
is, and the messages will be a little bit shorter.
Note also that the pathname will never have relative (. and ..) path
components, nor redundant slashes (e.g. spud//stuff) in it — these
are all resolved and removed by the time the message is sent to the
resource manager.
When writing filesystem resource managers, we encounter additional
complexity when dealing with the pathnames. For verification of
access, we need to break apart the passed pathname and check each
component. You can use strtok() and friends to break apart the string,
and then there's iofunc_check_access(), a convenient iofunc-layer call
that performs the access verification of pathname components leading
up to the target. (See the Library Reference page for iofunc_open()
for information detailing the steps needed for this level of checking.)
☞ The binding that takes place after the name is validated requires that
every path that’s handled has its own attribute structure passed to
iofunc_open_default(). Unexpected behavior will result if the wrong
attribute is bound to the pathname that’s provided.
struct dirent {
    ino_t d_ino;
    off_t d_offset;
    unsigned short d_reclen;
    unsigned short d_namelen;
    char d_name [NAME_MAX + 1];
};
[Figure: the reply buffer for a directory read — each struct dirent
entry is optionally followed by a struct stat and alignment filler.]

Returning the optional struct stat along with the struct dirent
entry can improve efficiency.
Message types
Generally, a resource manager receives these types of messages:
¯ connect messages
¯ I/O messages
Connect messages
A connect message is issued by the client to perform an operation
based on a pathname. This may be a message that establishes a longer
term relationship between the client and the resource manager (e.g.
open()), or it may be a message that is a “one-shot” event (e.g.
rename()).
The library looks at the connect_funcs parameter (of type
resmgr_connect_funcs_t — see the Library Reference) and calls
out to the appropriate function.
If the message is the _IO_CONNECT message (or one of its variants),
corresponding to the client's open() call, then a context needs to be
established for further I/O messages that will be processed later. This
context is referred to as an OCB (Open Control Block) — it holds any
information required between the connect message and subsequent
I/O messages.
Basically, the OCB is a good place to keep information that needs to
be stored on a per-open basis. An example of this would be the
current position within a file. Each open file descriptor would have its
own file position. The OCB is allocated on a per-open basis. During
the open handling, you’d initialize the file position; during read and
write handling, you’d advance the file position. For more information,
see the section “The open control block (OCB) structure.”
I/O messages
An I/O message is one that relies on an existing binding (e.g. OCB)
between the client and the resource manager.
As an example, an _IO_READ (from the client's read() function)
message depends on the client’s having previously established an
association (or context) with the resource manager by issuing an
open() and getting back a file descriptor. This context, created by the
open() call, is then used to process the subsequent I/O messages, like
the _IO_READ.
There are good reasons for this design. It would be inefficient to pass
the full pathname for each and every read() request, for example. The
open() handler can also perform tasks that we want done only once
(e.g. permission checks), rather than with each I/O message. Also,
when the read() has read 4096 bytes from a disk file, there may be
another 20 megabytes still waiting to be read. Therefore, the read()
function would need to have some context information telling it the
position within the file it’s reading from, how much has been read,
and so on.
The resmgr_io_funcs_t structure is filled in a manner similar to
the connect functions structure resmgr_connect_funcs_t.
Notice that the I/O functions all have a common parameter list. The
first entry is a resource manager context structure, the second is a
message (the type of which matches the message being handled and
contains parameters sent from the client), and the last is an OCB
(containing what we bound when we handled the client’s open()
function).
In this chapter. . .
What is Qnet? 193
Benefits of Qnet 193
How does it work? 196
Locating services using GNS 200
Quality of Service (QoS) and multiple paths 209
Designing a system using Qnet 212
Autodiscovery vs static 218
When should you use Qnet, TCP/IP, or NFS? 219
Writing a driver for Qnet 222
What is Qnet?
Qnet is Neutrino’s protocol for distributed networking. Using Qnet,
you can build a transparent distributed-processing platform that is fast
and scalable. This is accomplished by extending the Neutrino
message passing architecture over a network. This creates a group of
tightly integrated Neutrino nodes (systems) or CPUs — a Neutrino
native network.
A program running on a Neutrino node in this Qnet network can
transparently access any resource, whether it’s a file, device, or
another process. These resources reside on any other node (a
computer, a workstation or a CPU in a system) in the Qnet network.
The Qnet protocol builds an optimized network that provides a fast
and seamless interface between Neutrino nodes.
Benefits of Qnet
The Qnet protocol extends interprocess communication (IPC)
transparently over a network of microkernels. This is done by taking
advantage of Neutrino's message-passing paradigm. Message
passing is the central theme of Neutrino: it manages a group of
cooperating processes by routing messages.
Since Qnet extends Neutrino message passing over the network, other
forms of interprocess communication (e.g. signals, message queues,
and named semaphores) also work over the network.
Qnet drivers
In order to support different hardware, you may need to write a driver
for Qnet. The driver essentially performs three functions: transmits a
packet, receives a packet, and resolves the remote node’s interface.
In most cases, you don't need a specific driver for your hardware: for
example, for implementing a local area network using Ethernet
hardware, or for implementing TCP/IP networking that requires IP
encapsulation. In these cases, the underlying io-net and tcpip
layer is sufficient to interface with the Qnet layer for transmitting and
receiving packets. You use standard Neutrino drivers to implement
Qnet over a local area network or to encapsulate Qnet messages in IP
(TCP/IP) to allow Qnet to be routed to remote networks.
But suppose you want to set up a very tightly coupled network
between two CPUs over a super-fast interconnect (e.g. PCI or
RapidIO). You can easily take advantage of the performance of such a
high-speed link, because Qnet can talk directly to your hardware
driver. There’s no io-net layer in this case. All you need is a little
code at the very bottom of Qnet layer that understands how to transmit
and receive packets. This is simple, because there's a standard internal API
between the rest of Qnet and this very bottom portion, the driver
interface. Qnet already supports different packet transmit/receive
interfaces, so adding another is reasonably straightforward. The
transport mechanism of Qnet (called the L4) is quite generic and can
be configured for different MTU sizes and for whether ACK packets
or CRC checks are required, to take full advantage of your link's
advanced features (e.g. guaranteed reliability).
For details about how to write a driver, see the section on “Writing a
driver for Qnet” later in this chapter.
The QNX Momentics Transparent Distributed Processing Source Kit
(TDP SK) is available to help you develop custom drivers and/or
modify Qnet components to suit your particular application. For more
information, contact your sales representative.
/net/node1/dev/socket
/net/node1/dev/ser1
/net/node1/home
/net/node1/bin
....
• To get system information about all of the remote nodes that are
listed in /net, use pidin with the net argument:
$ pidin net
• You can use pidin with the -n option to get information about the
processes on another machine:
pidin -n node1 | less
fd = open("/net/node1/dev/ser1", O_RDWR...);
As you can see, the code required for accessing remote resources and
local resources is identical. The only change is the pathname used.
In the TCP/IP socket() case, it’s the same, but implemented
differently. In the socket case, you don’t directly open a filename.
This is done inside the socket library. In this case, an environment
variable is provided to set the pathname for the socket call (the SOCK
environment variable — see npm-tcpip.so).
Some other applications are:
Message queue
You can create or open a message queue by using
mq_open(). The mqueue manager must be running.
When a queue is created, it appears in the pathname
space under /dev/mqueue. So, you can access
/dev/mqueue on node1 from another node by
using /net/node1/dev/mqueue.
[Figure: a manager process (which owns /dev/par1) calls
name_attach("printer"), and an application calls name_open("printer");
both communicate with the Global Name Service over Qnet. The GNS
name table maps:

Name      Path
printer   /net/node1/dev/name/global/printer
...       ...
]
In this example, there's one gns client and one gns server. As far as
an application is concerned, the GNS service is one entity. The
relevant API calls are:

Server
name_attach()   Register your service with the GNS server.
name_detach()   Deregister your service with the GNS server.

Client
name_open()     Open a service via the GNS server.
name_close()    Close the service opened with name_open().
Registering a service
In order to use GNS, you need to first register the manager process
with GNS, by calling name_attach().
When you register a service, you need to decide whether to register
this manager’s service locally or globally. If you register your service
locally, only the local node is able to see this service; another node is
not able to see it. This allows you to have client applications that look
for service names rather than pathnames on the node they're executing
on. This document highlights registering services globally.
When you register your service globally, any node on the network
running a client application can use this service, provided the node is
running a gns client process and is connected to the gns server; client
applications on the node running the gns server process can use it as
well.
You can use a typical name attach() call as follows:
if ((attach = name_attach(NULL, "printer", NAME_FLAG_ATTACH_GLOBAL)) == NULL) {
    return EXIT_FAILURE;
}
The first thing to note is the flag NAME_FLAG_ATTACH_GLOBAL: it
causes your service to be registered globally instead of locally.
The last thing to note is the name. This is the name that clients search
for. This name can have a single level, as above, or it can be nested,
such as printer/ps. The call looks like this:
if ((attach = name_attach(NULL, "printer/ps", NAME_FLAG_ATTACH_GLOBAL)) == NULL) {
    return EXIT_FAILURE;
}
Nested names have no impact on how the service works. The only
difference is how the services are organized in the filesystem
generated by gns. For example:
$ ls -l /dev/name/global/
total 2
dr-xr-xr-x 0 root techies 1 Feb 06 16:20 net
dr-xr-xr-x 0 root techies 1 Feb 06 16:21 printer
$ ls -l /dev/name/global/printer
total 1
dr-xr-xr-x 0 root techies 1 Feb 06 16:21 ps
or:
if ((fd = name_open("printer/ps", NAME_FLAG_ATTACH_GLOBAL)) == -1) {
    return EXIT_FAILURE;
}
If you don’t specify this flag, GNS looks only for a local service. The
function returns an fd that you can then use to access the service
manager by sending messages, just as if you had opened the service
directly as /dev/par1, or /net/node/dev/par1.
[Figure: redundant GNS servers on node1 and node2. Each server holds
the same name table, mapping printer to
/net/node3/dev/name/global/printer; the manager owning /dev/par1 is
registered with both.]
You don’t have to start all redundant gns servers at the same time.
You can start one gns server process first, and then start a second gns
server process at a later time. In this case, use the special option -s
backup_server on the second gns server process to make it download
the current service database from another node that’s already running
the gns server process. When you do this, the clients connected to the
first node (that’s already running the gns server process) are notified
of the existence of the other server.
In the second scenario, you maintain more than one global domain.
For example, assume you have two nodes, each running a gns server
process. You also have a client node that’s running a gns client
process and is connecting to one of the servers. A different client
node connects to the other server. Each server node has unique
services registered by each client. A client connected to server node1
can’t see the service registered on the server node2.
[Figure: two separate global domains. The gns server on node1 maps
printer to /net/node3/dev/name/global/printer, while the gns server on
node2 maps printer to /net/node5/dev/name/global/printer.]
☞ If you have only a single network interface, the QoS policies don’t
apply at all.
QoS policies
Qnet supports transmission over multiple networks and provides the
following policies for specifying how Qnet should select a network
interface for transmission:
exclusive Qnet uses one — and only one — link, ignoring all
others, even if the exclusive link fails.
loadbalance
Qnet decides which links to use for sending packets, depending on
current load and link speeds as determined by io-net. A packet is
queued on the link that can deliver the packet the soonest to the
remote end. This effectively provides greater bandwidth between
nodes when the links are up (the bandwidth is the sum of the
bandwidths of all available links) and allows a graceful degradation of
service when links fail.
If a link does fail, Qnet switches to the next available link. By default,
this switch takes a few seconds the first time, because the network
driver on the bad link will have timed out, retried, and finally died.
But once Qnet “knows” that a link is down, it will not send user data
over that link. (This is a significant improvement over the QNX 4
implementation.)
The time required to switch to another link can be set to whatever is
appropriate for your application using command line options of Qnet.
See the npm-qnet-l4_lite.so documentation.
Using these options, you can create redundant behavior by
minimizing the latency that occurs when switching to another
interface if one of the interfaces fails.
While load-balancing among the live links, Qnet sends periodic
maintenance packets on the failed link in order to detect recovery.
When the link recovers, Qnet places it back into the pool of available
links.
preferred
With this policy, you specify a preferred link to use for transmissions.
Qnet uses only that one link until it fails. If your preferred link fails,
Qnet then turns to the other available links and resumes transmission,
using the loadbalance policy.
Once your preferred link is available again, Qnet again uses only that
link, ignoring all others (unless the preferred link fails).
exclusive
You use this policy when you want to lock transmissions to only one
link. Regardless of how many other links are available, Qnet will
latch onto the one interface you specify. And if that exclusive link
fails, Qnet will not use any other link.
Why would you want to use the exclusive policy? Suppose you
have two networks, one much faster than the other, and you have an
application that moves large amounts of data. You might want to
restrict transmissions to only the fast network, in order to avoid
swamping the slow network if the fast one fails.
/net/node1~exclusive:en0/dev/ser1
The QoS parameter always begins with a tilde (~) character. Here
we're telling Qnet to lock onto the en0 interface exclusively, even if it
fails.
Symbolic links
You can set up symbolic links to the various “QoS-qualified”
pathnames:
☞ You can’t create symbolic links inside /net because Qnet takes over
that namespace.
rm /remote/sql_server
ln -sP /net/magenta /remote/sql_server
This removes the link to the old node and reassigns the service to the
node magenta. The real advantage here is that applications can be
coded based on the abstract "service name" rather than be bound to a
specific node name.
For a real-world example of choosing an appropriate QoS policy in an
application, see the following section on designing a system using
Qnet.
(Figure: a high-speed transport connecting the controller card to the data cards.)
• Starting Qnet
$ ls /net
$ ls /net/cc0
You get the following output (i.e. the contents of the root of the
filesystem for the controller card):
. .inodes mnt0 tmp
.. .longfilenames mnt1 usr
.altboot bin net var
.bad_blks dev proc xfer
.bitmap etc sbin
.boot home scratch
A simple variation of the above command requires that you run the
following command during initialization:
$ ln -s /net/cc0/dev/mqueue /mq
These selections allow you to control how data will flow via different
transports.
In order to do that, first find out what interfaces are available. Use the
following command at the prompt of any card:
ls /dev/io-net
hs0 hs1
(Figure: a low-speed transport connecting the controller card to the data cards.)
the primary controller card. Use the following command to access this
card via the /cc directory:
ln -s /net/cc0 /cc
Scalability
You can also scale your resources to run a particular server
application by using additional controller cards. For example, if your
controller card (either an SMP or non-SMP board) doesn't have the
necessary resources (e.g. CPU cycles, memory), you could increase
the total processor and box resources by using additional controller
cards. Qnet transparently distributes the load of application servers
across two or more controller cards.
Autodiscovery vs static
When you’re creating a network of Neutrino hosts via Qnet, one thing
you must consider is how they locate and address each other. This
falls into two categories: autodiscovery and static mappings.
The decision to use one or the other can depend on security and ease
of use.
The advantage of using Qnet is that it lets you build a truly distributed
processing system with incredible scalability. For many applications,
it could be a benefit to be able to share resources among your
application systems (nodes). Qnet implements a native network
protocol to build this distributed processing system.
The basic purpose of Qnet is to extend Neutrino message passing to
work over a network link. It lets these machines share all their
resources with little overhead. A Qnet network is a trusted
environment where resources are tightly integrated, and remote
manager processes can be accessed transparently. For example, with
Qnet, you can use the Neutrino utilities (cp, mv, and so on) to
manipulate files anywhere on the Qnet network as if they were on
your machine. Because it’s meant for a group of trusted machines
(such as you’d find in an embedded system), Qnet doesn’t do any
authentication of remote requests. Also, the application really doesn’t
know whether it’s accessing a resource on a remote system; and most
importantly, the application doesn’t need any special code to handle
this capability.
If you’re developing a system that requires remote procedure calling
(RPC), or remote file access, Qnet provides this capability
transparently. In fact, you use a form of remote procedure call (a
Neutrino message pass) every time you access a manager on your
Neutrino system. Since Qnet creates an environment where there’s no
difference between accessing a manager locally or remotely, remote
procedure calling is built in. You don't need to write
source code to distribute your services. Also, since you are sharing
the filesystem between systems, there’s no need for NFS to access
files on other Neutrino hosts (of the same endian), because you can
access remote filesystem managers the same way you access your
local one. Files are protected by the normal permissions that apply to
users and groups (see “File ownership and permissions” in the
Working with Files chapter in the User’s Guide).
There are several ways to control access to a Qnet node, if required:
(Figure: a network of QNX Neutrino hosts connected via Qnet, with a Windows-based monitoring host attached via TCP/IP (SNMP, HTTP).)
Another issue may be the required behavior. For example, NFS has
been designed for filesystem operations between all hosts and all
endians. It's widely supported and uses a connectionless protocol. In
NFS, the server can be shut down and restarted, and the client resumes
automatically. NFS also uses authentication and controls directory
access. However, NFS retries forever to reach a remote host if it
doesn't respond, whereas Qnet can return an error if connectivity is
lost to a remote host. For more information, see "NFS filesystem" in
the Working with Filesystems chapter in the User's Guide.
If you require broadcast or multicast services, you need to look at
TCP/IP functionalities, because Qnet is based on Neutrino message
passing, and has no concept of broadcasting or multicasting.
(Figure: TCP/IP and Qnet layered side by side on io-net, which manages the Ethernet driver (e.g. devn-speedo) presenting the en0 interface.)
In the above case, io-net is actually the driver that transmits and
receives packets, and thus acts as a hardware-abstraction layer. Qnet
doesn’t care about the details of the Ethernet hardware or driver.
So, if you simply want new Ethernet hardware supported, you don’t
need to write a Qnet-specific driver. What you need is just a normal
Ethernet driver that knows how to interface to io-net.
There is a bit of code at the very bottom of Qnet that’s specific to
io-net and has knowledge of exactly how io-net likes to transmit
and receive packets. This is the L4 driver API abstraction layer.
Let’s take a look at the arrangement of layers that exist in the node
when Qnet is run with the optional binding of IP encapsulation (e.g.
bind=ip):
(Figure: the layering with IP encapsulation: Qnet on top of TCP/IP, on top of io-net, on top of the Ethernet driver (e.g. devn-speedo), presenting /dev/io-net/en0.)
(Figure: Qnet layered directly on top of your custom high-speed hardware driver.)
Just as before, Qnet needs a little code at the very bottom that knows
exactly how to transmit and receive packets to this new driver. There
exists a standard internal API (L4 driver API) between the rest of
Qnet and this very bottom portion, the driver interface. Qnet already
supports different packet transmit/receive interfaces, so adding
another is reasonably straightforward. The transport mechanism of
Qnet (called the L4) is quite generic, and can be configured for
different MTU sizes, and for whether ACK packets or CRC checks are
required, to take full advantage of your link's advanced features
(e.g. guaranteed reliability).
For more details, see the QNX Momentics Transparent Distributed
Processing Source Kit (TDP SK) documentation.
In this chapter. . .
What’s an interrupt? 229
Attaching and detaching interrupts 229
Interrupt Service Routine (ISR) 230
Running out of interrupt events 241
Advanced topics 241
What’s an interrupt?
The key to handling hardware events in a timely manner is for the
hardware to generate an interrupt. An interrupt is simply a pause in,
or interruption of, whatever the processor was doing, along with a
request to do something else.
The hardware generates an interrupt whenever it has reached some
state where software intervention is desired. Instead of having the
software continually poll the hardware — which wastes CPU time —
an interrupt is the preferred method of “finding out” that the hardware
requires some kind of service. The software that handles the interrupt
is therefore typically called an Interrupt Service Routine (ISR).
Although crucial in a realtime system, interrupt handling has
unfortunately been a very difficult and awkward task in many
traditional operating systems. Not so with Neutrino. As you’ll see in
this chapter, handling interrupts is almost trivial; given the fast
context-switch times in Neutrino, most if not all of the “work”
(usually done by the ISR) is actually done by a thread.
Let’s take a look at the Neutrino interrupt functions and at some ways
of dealing with interrupts.
#define IRQ3 3
/*
* Associate the interrupt handler, serint,
* with IRQ 3, the 2nd PC serial port
*/
ThreadCtl( _NTO_TCTL_IO, 0 );
id = InterruptAttach( IRQ3, serint, NULL, 0, 0 );
...
• updating some data structures shared between the ISR and some of
the threads running in the application
Depending on the complexity of the hardware device, the ISR, and the
application, some of the above steps may be omitted.
Let’s take a look at these steps in turn.
(Figure: a hardware interrupt request line going active and inactive over steps 1 through 4, with the resulting IRQx/Intx (level-sensitive) and IRQy/Inty (edge-triggered) interrupt signals.)
a new interrupt, the interrupt is still considered active even when the
original cause of the interrupt is removed (step 3). Not until the last
assertion is cleared (step 4) will the interrupt be considered inactive.
In edge-triggered mode, the interrupt is “noticed” only once, at step 1.
Only when the interrupt line is cleared, and then reasserted, does the
PIC consider another interrupt to have occurred.
Neutrino allows ISR handlers to be stacked, meaning that multiple
ISRs can be associated with one particular IRQ. The impact of this is
that each handler in the chain must look at its associated hardware and
determine if it caused the interrupt. This works reliably in a
level-sensitive environment, but not an edge-triggered environment.
To illustrate this, consider the case where two hardware devices are
sharing an interrupt. We’ll call these devices “HW-A” and “HW-B.”
Two ISR routines are attached to one interrupt source (via the
InterruptAttach() or InterruptAttachEvent() call), in sequence (i.e.
ISR-A is attached first in the chain, ISR-B second).
Now, suppose HW-B asserts the interrupt line first. Neutrino detects
the interrupt and dispatches the two handlers in order — ISR-A runs
first and decides (correctly) that its hardware did not cause the
interrupt. Then ISR-B runs and decides (correctly) that its hardware
did cause the interrupt; it then starts servicing the interrupt. But
before ISR-B clears the source of the interrupt, suppose HW-A asserts
an interrupt; what happens depends on the type of IRQ.
Edge-triggered IRQ
If you have an edge-triggered bus, when ISR-B clears the source of
the interrupt, the IRQ line is still held active (by HW-A). But because
it’s edge-triggered, the PIC is waiting for the next clear/assert
transition before it decides that another interrupt has occurred. Since
ISR-A already ran, it can’t possibly run again to actually clear the
source of the interrupt. The result is a “hung” system, because the
interrupt will never transit between clear and asserted again, so no
further interrupts on that IRQ line will ever be recognized.
Level-sensitive IRQ
On a level-sensitive bus, when ISR-B clears the source of the
interrupt, the IRQ line is still held active (by HW-A). When ISR-B
finishes running and Neutrino sends an EOI (End Of Interrupt)
command to the PIC, the PIC immediately reinterrupts the kernel,
causing ISR-A (and then ISR-B) to run.
Since ISR-A clears the source of the interrupt (and ISR-B doesn’t do
anything, because its associated hardware doesn’t require servicing),
everything functions as expected.
• The type of functions that the ISR itself can execute is very limited
(those that don't call any kernel functions, except the ones listed
below).
• The ISR runs at a priority higher than any software priority in the
system — having the ISR consume a significant amount of
processor time has a negative impact on the realtime aspects of
Neutrino.
Safe functions
When the ISR is servicing the interrupt, it can’t make any kernel calls
(except for the few that we’ll talk about shortly). This means that you
need to be careful about the library functions that you call in an ISR,
because their underlying implementation may use kernel calls.
☞ For a list of the functions that you can call from an ISR, see the
Summary of Safety Information appendix in the Library Reference.
Here are the only kernel calls that the ISR can use:
• InterruptMask()
• InterruptUnmask()
• TraceEvent()
You'll also find these functions (which aren't kernel calls) useful in an
ISR:
• InterruptLock()
• InterruptUnlock()
• You may decide that nothing needs to be done in the ISR; you just
want to schedule a thread.
int
InterruptAttach (int intr,
const struct sigevent * (*handler) (void *, int),
const void *area,
int size,
unsigned flags);
int
InterruptAttachEvent (int intr,
const struct sigevent *event,
unsigned flags);
Using InterruptAttach()
Looking at the prototype for InterruptAttach(), the function associates
the IRQ vector (intr) with your ISR handler (handler), passing it a
communications area (area). The size and flags arguments aren’t
germane to our discussion here (they’re described in the Library
Reference for the InterruptAttach() function).
For the ISR, the handler() function takes a void * pointer and an
int identification parameter; it returns a const struct sigevent
* pointer. The void * area parameter is the value given to the
InterruptAttach() function — any value you put in the area parameter
to InterruptAttach() is passed to your handler() function.
...
// do the work
...
// if the isr handler did an InterruptMask, then
// this thread should do an InterruptUnmask to
// allow interrupts from the hardware
}
}
Using InterruptAttachEvent()
Most of the discussion above for InterruptAttach() applies to the
InterruptAttachEvent() function, with the obvious exception of the
ISR. You don’t provide an ISR in this case — the kernel notes that
you called InterruptAttachEvent() and handles the interrupt itself.
Since you also bound a struct sigevent to the IRQ, the kernel
can now dispatch the event. The major advantage is that we avoid a
context switch into the ISR and back.
An important point to note is that the kernel automatically performs
an InterruptMask() in the interrupt handler. Therefore, it’s up to you
to perform an InterruptUnmask() when you actually clear the source
of the interrupt in your interrupt-handling thread. This is why
InterruptMask() and InterruptUnmask() are counting.
• The interrupt load is too high for the CPU (it's spending all of the
time handling the interrupt).
Or:
Advanced topics
Now that we’ve seen the basics of handling interrupts, let’s take a
look at some more details and some advanced topics.
Interrupt environment
When your ISR is running, it runs in the context of the process that
attached it, except with a different stack. Since the kernel uses an
internal interrupt-handling stack for hardware interrupts, your ISR is
impacted in that the internal stack is small. Generally, you can assume
that you have about 200 bytes available.
The PIC doesn’t get the EOI command until after all ISRs — whether
supplied by your code via InterruptAttach() or by the kernel if you
use InterruptAttachEvent() — for that particular interrupt have been
run. Then the kernel itself issues the EOI; your code should not issue
the EOI command.
Normally, any interrupt sources that don’t have an ISR associated
with them are masked off by the kernel. The kernel automatically
unmasks an interrupt source when at least one ISR is attached to it
and masks the source when no more ISRs are attached.
Interrupt latency
Another factor of concern for realtime systems is the amount of time
taken between the generation of the hardware interrupt and the first
line of code executed by the ISR. There are two factors to consider
here:
Atomic operations
Some convenience functions are defined in the include file
<atomic.h> — these allow you to perform atomic operations (i.e.
operations that are guaranteed to be indivisible or uninterruptible).
• adding a value
• subtracting a value
• clearing bits
• setting bits
• toggling bits
In this chapter. . .
Introduction 247
Dynamic memory management 247
Heap corruption 248
Detecting and reporting errors 252
Manual checking (bounds checking) 265
Memory leaks 268
Compiler support 271
Summary 274
October 6, 2005 Chapter 7 • Heap Analysis: Making Memory Errors a Thing of the Past 245
2005, QNX Software Systems
Introduction
If you develop a program that dynamically allocates memory, you’re
also responsible for tracking any memory that you allocate whenever
a task is performed, and for releasing that memory when it’s no longer
required. If you fail to track the memory correctly you may introduce
“memory leaks,” or unintentionally write to an area outside of the
memory space.
Conventional debugging techniques usually prove to be ineffective for
locating the source of corruption or leak because memory-related
errors typically manifest themselves in an unrelated part of the
program. Tracking down an error in a multithreaded environment
becomes even more complicated because the threads all share the
same memory address space.
In this chapter, we’ll introduce you to a special version of our
memory management functions that’ll help you to diagnose your
memory management problems.
The allocator returns memory from the heap to the system when the
program releases memory.
Heap corruption
Heap corruption occurs when a program damages the allocator’s view
of the heap. The outcome can be relatively benign and cause a
memory leak (where some memory isn’t returned to the heap and is
inaccessible to the program afterwards), or it may be fatal and cause a
memory fault, usually within the allocator itself. A memory fault
typically occurs within the allocator when it manipulates one or more
of its free lists after the heap has been corrupted.
It’s especially difficult to identify the source of corruption when the
source of the fault is located in another part of the code base. This is
likely to happen if the fault occurs when:
Multithreaded programs
Multithreaded execution may cause a fault to occur in a different
thread from the thread that actually corrupted the heap, because
threads interleave requests to allocate or release memory.
When the source of corruption is located in another part of the code
base, conventional debugging techniques usually prove to be
ineffective. Conventional debugging typically applies breakpoints to
stop the program from executing and narrow down the
offending section of code. While this may be effective for
single-threaded programs, it's often unhelpful for multithreaded
execution because the fault may occur at an unpredictable time and
the act of debugging the program may influence the appearance of the
fault by altering the way that thread execution occurs. Even when the
source of the error has been narrowed down, there may be a
substantial amount of manipulation performed on the block before it’s
released, particularly for long-lived heap buffers.
Allocation strategy
A program that works in a particular memory allocation strategy may
abort when the allocation strategy is changed in a minor way. A good
example of this would be a memory overrun condition (for more
information see “Overrun and underrun errors,” below) where the
allocator is free to return blocks that are larger than requested in order
to satisfy allocation requests. Under this circumstance, the program
may behave normally in the presence of overrun conditions. But a
simple change, such as changing the size of the block requested, may
result in the allocation of a block of the exact size requested, resulting
in a fatal error for the offending program.
Fatal errors may also occur if the allocator is configured slightly
differently, or if the allocator policy is changed in a subsequent
release of the runtime library. This makes it all the more important to
detect errors early in the life cycle of an application, even if it doesn’t
exhibit fatal errors in the testing phase.
Common sources
Some of the most common sources of heap corruption include:
• releasing memory
Even the most robust allocator can occasionally fall prey to the above
problems.
Let’s take a look at the last three bullets in more detail:
Releasing memory
A request to release memory requires your program to track the
pointer for the allocated block and pass it to the free() function. If the
pointer is stale, or if it doesn't point to the exact start of the allocated
block, the result may be heap corruption.
A pointer is stale when it refers to a block of memory that’s already
been released. A duplicate request to free() involves passing free() a
stale pointer — there’s no way to know whether this pointer refers to
unallocated memory, or to memory that’s been used to satisfy an
allocation request in another part of the program.
Passing a stale pointer to free() may result in a fault in the allocator, or
worse, it may release a block that’s been used to satisfy another
allocation request. If this happens, the code making the allocation
request may compete with another section of code that subsequently
allocated the same region of heap, resulting in corrupted data for one
or both. The most effective way to avoid this error is to NULL out
pointers when the block is released, but this is uncommon, and
difficult to do when pointers are aliased in any way.
A second common source of errors is to attempt to release an interior
pointer (i.e. one that’s somewhere inside the allocated block rather
than at the beginning). This isn’t a legal operation, but it may occur
when the pointer has been used in conjunction with pointer
arithmetic. The result of providing an interior pointer is highly
To achieve this goal, we’ll use a replacement library for the allocator
that can keep additional block information in the header of every heap
buffer. This library may be used during the testing of the application
to help isolate any heap corruption problems. When a source of heap
corruption is detected by this allocator, it can print an error message
indicating:
The library technique can be refined to also detect some of the sources
of errors that may still elude detection, such as memory overrun or
underrun errors, that occur before the corruption is detected by the
allocator. This may be done when the standard libraries are the
vehicle for the heap corruption, such as an errant call to memcpy(), for
example. In this case, the standard memory manipulation functions
and string functions can be replaced with versions that make use of
the information in the debugging allocator library to determine if their
arguments reside in the heap, and whether they would cause the
bounds of the heap buffer to be exceeded. Under these conditions, the
function can then call the error reporting functions to provide
information about the source of the error.
LD_PRELOAD=/usr/lib/malloc_g/libmalloc.so.2
or:
LD_PRELOAD=/usr/lib/libmalloc.so.2
☞ In this chapter, all references to the malloc library refer to the debug
version, unless otherwise specified.
Both versions of the library share the same internal shared object
name, so it’s actually possible to link against the nondebug library and
test using the debug library when you run your application. To do
this, you must change LD_LIBRARY_PATH as indicated above.
The nondebug library doesn’t perform heap checking; it provides the
same memory allocator as the system library.
By default, the malloc library provides a minimal level of checking.
When an allocation or release request is performed, the library checks
only the immediate block under consideration and its neighbors,
looking for sources of heap corruption.
Additional checking and more informative error reporting can be
done by using additional calls provided by the malloc library. The
mallopt() function provides control over the types of checking
performed by the library. There are also debug versions of each of the
allocation and release routines that you can use to provide both file
and line information during error-reporting. In addition to reporting
the file and line information about the caller when an error is detected,
the error-reporting mechanism prints out the file and line information
that was associated with the allocation of the offending heap buffer.
To control the use of the malloc library and obtain the correct
prototypes for all the entry points into it, it’s necessary to include a
different header file for the library. This header file is
<malloc_g/malloc.h>. If you want to use any of the functions
defined in this header file, other than mallopt(), make sure that you
link your application with the debug library. If you forget, you’ll get
undefined references during the link.
The recommended practice for using the library is to always use the
library for debug variants in builds. In this case, the macro used to
identify the debug variant in C code should trigger the inclusion of the
<malloc_g/malloc.h> header file, and the malloc debug library
option should always be added to the link command. In addition, you
may want to follow the practice of always adding an exit handler that
provides a dump of leaked memory, and initialization code that turns
on a reasonable level of checking for the debug variant of the
program.
The malloc library achieves what it needs to do by keeping
additional information in the header of each heap buffer. The header
information includes additional storage for keeping doubly-linked
lists of all allocated blocks, file, line and other debug information,
flags and a CRC of the header. The allocation policies and
configuration are identical to the normal system memory allocation
routines except for the additional internal overhead imposed by the
malloc library. This allows the malloc library to perform checks
without altering the size of blocks requested by the program. Such
manipulation could result in an alteration of the behavior of the
program with respect to the allocator, yielding different results when
linked against the malloc library.
All allocated blocks are integrated into a number of allocation chains
associated with allocated regions of memory kept by the allocator in
arenas or blocks. The malloc library has intimate knowledge about
the internal structures of the allocator, allowing it to use short cuts to
find the correct heap buffer associated with any pointer, resorting to a
lookup on the appropriate allocation chain only when necessary. This
minimizes the performance penalty associated with validating
pointers, but it’s still significant.
The time and space overheads imposed by the malloc library are too
great to make it suitable for use as a production library, but are
What’s checked?
As indicated above, the malloc library provides a minimal level of
checking by default. This includes a check of the integrity of the
allocation chain at the point of the local heap buffer on every
allocation request. In addition, the flags and CRC of the header are
checked for integrity. When the library can locate the neighboring
heap buffers, it also checks their integrity. There are also checks
specific to each type of allocation request that are done. Call-specific
checks are described according to the type of call below.
You can enable additional checks by using the mallopt() call. For
more information on the types of checking, and the sources of heap
corruption that can be detected, see "Controlling the level of
checking," below.
Allocating memory
When a heap buffer is allocated using any of the heap-allocation
routines, the heap buffer is added to the allocation chain for the arena
or block within the heap that the heap buffer was allocated from. At
this time, any problems detected in the allocation chain for the arena
or block are reported. After successfully inserting the allocated buffer
in the allocation chain, the previous and next buffers in the chain are
also checked for consistency.
Reallocating memory
When an attempt is made to resize a buffer through a call to the
realloc() function, the pointer is checked for validity if it’s a
non-NULL value. If it’s valid, the header of the heap buffer is checked
for consistency. If the buffer is large enough to satisfy the request, the
buffer header is modified, and the call returns. If a new buffer is
required to satisfy the request, memory allocation is performed to
obtain a new buffer large enough to satisfy the request with the same
consistency checks being applied as in the case of memory allocation
described above. The original buffer is then released.
Releasing memory
This includes, but isn’t limited to, checking to ensure that the pointer
provided to a free() request is correct and points to an allocated heap
buffer. Guard code checks may also be performed on release
operations to allow fill-area boundary checking.
char *p;
int opt;
opt = 1;
mallopt(MALLOC_CKACCESS, opt);
p = malloc(30);
free(p);
strcpy(p, "hello, there!"); /* error: access to a freed block */
MALLOC_FILLAREA
Turn on (or off) fill-area boundary checking that validates that
the program hasn't overrun the user-requested size of a heap
buffer. Environment variable: MALLOC_FILLAREA.
The value argument can be:
• zero to disable the checking
• nonzero to enable it.
It does this by applying a guard code check when the buffer is
released or when it’s resized. The guard code check works by
filling any excess space available at the end of the heap buffer
with a pattern of bytes. When the buffer is released or resized,
the trailing portion is checked to see if the pattern is still
present. If not, a diagnostic warning message is printed.
The effect of turning on fill-area boundary checking is slightly
different from that of the other checks: the checking is performed
only on memory buffers allocated after the point in time at
which the check was enabled. Memory buffers allocated before
the change won’t have the checking performed.
Here’s how you can catch an overrun with the fill-area boundary
checking option:
...
int *foo, *p, i, opt;

opt = 1;
mallopt(MALLOC_FILLAREA, opt);
foo = (int *)malloc(10 * 4);
for (p = foo, i = 12; i > 0; p++, i--)
    *p = 89;
free(foo); /* a warning is generated here */
MALLOC_CKCHAIN
Enable (or disable) full-chain checking. This option is
expensive, and should be considered a last resort.
Forcing verification
You can force a full allocation chain check at certain points while
your program is executing, without turning on chain checking.
Specify the following option for cmd:
MALLOC_VERIFY
Perform a chain check immediately. If an error is found,
perform error handling. The value argument is ignored.
MALLOC_FATAL
Specify the malloc fatal handler. Environment
variable: MALLOC_FATAL.
MALLOC_WARN
Specify the malloc warning handler.
Environment variable: MALLOC_WARN.
M_HANDLE_ABORT
Terminate execution with a call to abort().
M_HANDLE_EXIT
Exit immediately.
M_HANDLE_IGNORE
Ignore the error and continue.
M_HANDLE_CORE
Cause the program to dump a core file.
M_HANDLE_SIGNAL
Stop the program when this error occurs, by sending
itself a stop signal. This lets you attach to the process
with a debugger.
Handler            Value
M_HANDLE_IGNORE    0
M_HANDLE_ABORT     1
M_HANDLE_EXIT      2
M_HANDLE_CORE      3
M_HANDLE_SIGNAL    4
☞ You can OR any of these handlers with the value MALLOC_DUMP
to cause a complete dump of the heap before the handler takes action.
Here’s how you can cause a memory overrun error to abort your
program:
...
int *foo, *p, i;
int opt;

opt = 1;
mallopt(MALLOC_FILLAREA, opt);
foo = (int *)malloc(10 * 4);
for (p = foo, i = 12; i > 0; p++, i--)
    *p = 89;
opt = M_HANDLE_ABORT;
mallopt(MALLOC_WARN, opt);
free(foo); /* a fatal error is generated here */
Caveats
When its various checks are enabled, the debug malloc library uses
more stack space (i.e. it calls more functions and uses more local
variables) than the regular libc allocator. As a result, programs that
explicitly set the stack size to something smaller than the default may
run out of stack space and crash. You can prevent this by increasing
the stack space allocated to the threads in question.
MALLOC_FILLAREA is used to do fill-area checking. If fill-area
checking isn’t enabled, the program can’t detect certain types of
errors. For example, if an application accesses memory beyond the
end of a block, and the real block allocated by the allocator is larger
than what was requested, the allocator won’t flag an error unless
MALLOC_FILLAREA is enabled. By default, this environment
variable isn’t enabled.
MALLOC_CKACCESS is used to validate accesses made through the
str* and mem* families of functions. If this variable isn’t enabled,
such accesses won’t be checked, and errors aren’t reported. By
default, this environment variable isn’t enabled.
MALLOC_CKCHAIN performs extensive heap checking on every
allocation. When you enable this environment variable, allocations
can be much slower. Also, since full heap checking is performed on
every allocation, an error anywhere in the heap could be reported
upon entry into the allocator for any operation. For example, a call to
free(x) checks block x, and also the complete heap, before completing
the operation. So any error in the heap will be reported in the context
of freeing block x, even if the error isn’t specifically related to that
operation.
Manual checking (bounds checking)
One approach is to define the pointer as a structure that contains all
relevant information about it, including the current value, the base
pointer, and the extent of the buffer. Access to the pointer can then be
controlled through macros or access functions.
The accessors can perform the necessary bounds checks and print a
warning message in response to attempts to exceed the bounds.
Any attempt to dereference the current pointer value can then be
checked against the boundaries obtained when the pointer was
initialized. If a boundary is exceeded, the malloc_warning() function
should be called to print a diagnostic message and perform error
handling. Its arguments are the file, the line, and a message.
Memory leaks
The ability of the malloc library to keep full allocation chains of all
the heap memory allocated by the program — as opposed to just
accounting for some heap buffers — allows heap memory leaks to be
detected by the library in response to requests by the program. Leaks
can be detected in the program by performing tracing on the entire
heap. This is described in the sections that follow.
Tracing
Tracing is an operation that attempts to determine whether a heap
object is reachable by the program. In order to be reachable, a heap
buffer must be available either directly or indirectly from a pointer in
a global variable or on the stack of one of the threads. If this isn’t the
case, then the heap buffer is no longer visible to the program and can’t
be accessed without constructing a pointer that refers to the heap
buffer — presumably by obtaining it from a persistent store such as a
file or a shared memory object. The set of global variables and stack
for all threads is called the root set. Because the root set must be
stable for tracing to yield valid results, tracing requires that all threads
other than the one performing the trace be suspended while the trace
is performed.
Tracing operates by constructing a reachability graph of the entire
heap. It begins with a root set scan that determines the root set (the
global variables and the stacks of all threads) comprising the initial
state of the reachability graph. Once the root set scan is complete,
tracing initiates a mark operation for each element of the root set.
The mark operation examines a node, marks it as reachable, and
scans it for further pointers into the heap.
A leak-detection request causes the library to suspend all threads,
clear the mark information for all heap buffers, perform the trace
operation, and print a report of all memory leaks detected. All items
are reported in memory order.
detail Indicates how the trace operation should deal with any heap
corruption problems it encounters.
Analyzing dumps
The dump of unreferenced buffers prints out one line of information
for each unreferenced buffer.
File and line information is available if the call to allocate the buffer
was made using one of the library’s debug interfaces. Otherwise, the
return address of the call is reported in place of the line number. In
some circumstances, no return address information is available. This
usually indicates that the call was made from a function with no frame
information, such as the system libraries. In such cases, the entry can
usually be ignored and probably isn’t a leak.
Given the way tracing is performed, some leaks may escape detection
and not be reported in the output. This happens if the root set, or a
reachable buffer in the heap, contains something that merely looks
like a pointer to the buffer.
Compiler support
Manual bounds checking can be avoided where the compiler supports
bounds checking under the control of a compile-time option. For C
compilers this requires explicit support in the compiler; patches are
available for the GNU C compiler that allow it to perform bounds
checking on pointers in this manner. This is dealt with later. For C++
compilers, extensive bounds checking can be performed through
operator overloading and the information functions described earlier.
C++ issues
In place of a raw pointer, C++ programs can make use of a
CheckedPtr template that acts as a smart pointer. The smart pointer
has initializers that obtain complete information about the heap buffer
on an assignment operation and initialize the current pointer position.
Any attempt to dereference the pointer causes bounds checking to be
performed, and a diagnostic error is printed in response to an attempt
to dereference a value beyond the bounds of the buffer. The
CheckedPtr template is provided in the <malloc_g/malloc.h>
header for C++ programs.
The checked-pointer template provided for C++ programs can be
modified to suit the needs of the program. The bounds checking
performed by the checked pointer is restricted to the actual bounds of
the heap buffer, rather than the program-requested size.
For C programs, it’s possible to compile individual modules that obey
certain rules with the C++ compiler to get the behavior of checked
pointers.
Clean C
The Clean C dialect is the subset of ANSI C that is compatible with
the C++ language. Writing Clean C requires imposing coding
conventions on the C code, restricting it to features that are
acceptable to a C++ compiler. This section summarizes some of the
more pertinent points to consider. It is a mostly complete, but by no
means exhaustive, list of the rules that must be applied.
To use the C++ checked pointers, the module, including all the
header files it includes, must be compatible with the Clean C subset.
All the system headers for Neutrino, as well as the
<malloc_g/malloc.h> header, satisfy this requirement.
The most obvious aspect of Clean C is that it must be strict ANSI C
with respect to function prototypes and declarations. K&R prototypes
and definitions aren’t allowed in Clean C. Similarly, default types for
variable and function declarations can’t be used.
Another important consideration for declarations is that forward
declarations must be provided when referencing an incomplete
structure or union. This frequently occurs for linked data structures
such as trees or lists. In this case the forward declaration must occur
before any declaration of a pointer to the object in the same or another
structure or union. For example, a list node may be declared as
follows:
struct ListNode;

struct ListNode {
    struct ListNode *next;
    void *data;
};
C++ example
Here’s how the overrun example from earlier could have the exact
source of the error pinpointed with checked pointers:
typedef CheckedPtr<int> intp_t;
...
intp_t foo, p;
int i;
int opt;

opt = 1;
mallopt(MALLOC_FILLAREA, opt);
foo = (int *)malloc(10 * 4);
opt = M_HANDLE_ABORT;
mallopt(MALLOC_WARN, opt);
for (p = foo, i = 12; i > 0; p++, i--)
    *p = 89; /* a fatal error is generated here */
opt = M_HANDLE_IGNORE;
mallopt(MALLOC_WARN, opt);
free(foo);
Summary
When you develop an application, we recommend that you test it
against the debug version of the malloc library to detect possible
sources of memory errors, such as overruns and memory leaks.
The malloc library and the different levels of compiler support can be
very useful in detecting the source of overrun errors (which may
escape detection during integration testing) during unit testing and
program maintenance. In such cases, more stringent low-level bounds
checking of individual pointers may prove useful. The Clean C subset
may also help, by facilitating the use of C++ templates for low-level
checking. Otherwise, you might consider porting the bounds-checking
variant of GCC to meet your project’s requirements.
Appendix A
Freedom from Hardware and
Platform Dependencies
In this appendix. . .
Common problems
Solutions
Common problems
With the advent of multiplatform support, which involves non-x86
platforms as well as peripheral chipsets across these multiple
platforms, we don’t want to have to write different versions of device
drivers for each and every platform.
While some platform dependencies are unavoidable, let’s talk about
some of the things that you as a developer can do to minimize the
impact. At QNX Software Systems, we’ve had to deal with these
same issues — for example, we support the 8250 serial chip on
several different types of processors. Ethernet controllers, SCSI
controllers, and others are no exception.
Let’s look at these problems:
¯ big-endian vs little-endian
¯ typecast mangling
¯ hardware access
¯ network transparency
¯ atomic operations
Big-endian vs little-endian
Big-endian vs little-endian is another compatibility issue with various
processor architectures. The issue stems from the byte ordering of
multibyte quantities. The x86 architecture is little-endian. For
example, the hexadecimal number 0x12345678 is stored in memory as:

address  contents
0        0x78
1        0x56
2        0x34
3        0x12

A big-endian processor would store the same value as:

address  contents
0        0x12
1        0x34
2        0x56
3        0x78
Typecast mangling
Consider the following code:
void func (void)
{
    long a = 0x12345678;
    char *p;

    p = (char *) &a;
    printf ("%02X\n", *p);  /* 78 on a 32-bit little-endian CPU, 12 on big-endian */
}
Hardware access
Sometimes the hardware can present you with a conflicting choice of
the “correct” size for a chunk of data. Consider a piece of hardware
that has a 4 KB memory window. If the hardware brings various data
structures into view with that window, it’s impossible to determine
a priori what the data size should be for a particular element of the
window. Is it a 32-bit long integer? An 8-bit character? Blindly
performing operations as in the above code sample will land you in
trouble, because the CPU will determine what it believes to be the
correct endianness, regardless of what the hardware manifests.
Network transparency
These issues are naturally compounded when heterogeneous CPUs
are used in a network with messages being passed among them. If the
implementor of the message-passing scheme doesn’t decide up front
what byte order will be used, then some form of identification needs
to be done so that a machine with a different byte ordering can receive
and correctly decode a message from another machine. This problem
has been solved with protocols like TCP/IP, where a defined network
byte order is always adhered to, even between homogeneous
machines whose byte order differs from the network byte order.
Atomic operations
One final problem that can occur with different families of processors,
and SMP configurations in general, is that of atomic access to
variables. Since this is so prevalent with interrupt service routines and
their handler threads, we’ve already talked about this in the chapter on
Writing an Interrupt Handler.
Solutions
Now that we’ve seen the problems, let’s take a look at some of the
solutions you can use. The following header files are shipped standard
with Neutrino:
<gulliver.h>
isolates big-endian vs little-endian issues
<hw/inout.h>
provides input and output functions for I/O or memory address
spaces
Determining endianness
The file <gulliver.h> contains macros to help resolve endian
issues. The first thing you may need to know is the target system’s
endianness, which you can find out via the following macros:
__LITTLEENDIAN__
defined if little-endian
__BIGENDIAN__
defined if big-endian
ENDIAN_LE16()
uint16_t ENDIAN_LE16 (uint16_t var)
If the host is little-endian, this macro does nothing (expands simply to
var); otherwise, it swaps the two bytes.
ENDIAN_LE32()
uint32_t ENDIAN_LE32 (uint32_t var)
If the host is little-endian, this macro does nothing (expands simply to
var); otherwise, it reverses the order of the four bytes.
ENDIAN_LE64()
uint64_t ENDIAN_LE64 (uint64_t var)
If the host is little-endian, this macro does nothing (expands simply to
var); otherwise, it reverses the order of the eight bytes.
ENDIAN_BE16()
uint16_t ENDIAN_BE16 (uint16_t var)
If the host is big-endian, this macro does nothing (expands simply to
var); otherwise, it swaps the two bytes.
ENDIAN_BE32()
uint32_t ENDIAN_BE32 (uint32_t var)
If the host is big-endian, this macro does nothing (expands simply to
var); otherwise, it reverses the order of the four bytes.
ENDIAN_BE64()
uint64_t ENDIAN_BE64 (uint64_t var)
If the host is big-endian, this macro does nothing (expands simply to
var); otherwise, it reverses the order of the eight bytes.
UNALIGNED_RET16()
uint16_t UNALIGNED_RET16 (uint16_t *addr16)
Returns a 16-bit quantity from the address specified by addr16.
UNALIGNED_RET32()
uint32_t UNALIGNED_RET32 (uint32_t *addr32)
Returns a 32-bit quantity from the address specified by addr32.
UNALIGNED_RET64()
uint64_t UNALIGNED_RET64 (uint64_t *addr64)
Returns a 64-bit quantity from the address specified by addr64.
UNALIGNED_PUT16()
void UNALIGNED_PUT16 (uint16_t *addr16, uint16_t val16)
Stores the 16-bit value val16 into the address specified by addr16.
UNALIGNED_PUT32()
void UNALIGNED_PUT32 (uint32_t *addr32, uint32_t val32)
Stores the 32-bit value val32 into the address specified by addr32.
UNALIGNED_PUT64()
void UNALIGNED_PUT64 (uint64_t *addr64, uint64_t val64)
Stores the 64-bit value val64 into the address specified by addr64.
Examples
Here are some examples showing how to access different pieces of
data using the macros introduced so far.
Mixed-endian accesses
This code is written to be portable. It accesses little-endian data (i.e.
data that’s known to be stored in little-endian format, perhaps as a
result of some on-media storage scheme), manipulates it, and writes
the data back. This illustrates that the ENDIAN_*() macros are
bidirectional.
Then, you’d map the memory region via mmap_device_memory().
Let’s say it gave you a char * pointer called ptr. Using this pointer,
you’d be tempted to write:
cr1 = *(ptr + PKTTYPE_OFF);
sr1 = * (uint32_t *) (ptr + PKTCRC_OFF);  // wrong!
er1 = * (uint16_t *) (ptr + PKTLEN_OFF);  // wrong!
Instead, you’d use the unaligned-access macros:
cr1 = *(ptr + PKTTYPE_OFF);
sr1 = UNALIGNED_RET32 (ptr + PKTCRC_OFF);
er1 = UNALIGNED_RET16 (ptr + PKTLEN_OFF);
The access for cr1 didn’t change, because it was already an 8-bit
variable — these are always “aligned.” However, the accesses for the
16- and 32-bit variables now use the macros.
An implementation trick used here is to make the pointer that serves
as the base for the mapped area a char * — this lets us do pointer
math on it.
To write to the hardware, you’d again use macros, but this time the
UNALIGNED_PUT*() versions:
*(ptr + PKTTYPE_OFF) = cr1;
UNALIGNED_PUT32 (ptr + PKTCRC_OFF, sr1);
UNALIGNED_PUT16 (ptr + PKTLEN_OFF, er1);
To drive big-endian hardware regardless of the host’s endianness,
combine the two families of macros:
*(ptr + PKTTYPE_OFF) = cr1; // endian neutral
UNALIGNED_PUT32 (ptr + PKTCRC_OFF, ENDIAN_BE32 (sr1));
UNALIGNED_PUT16 (ptr + PKTLEN_OFF, ENDIAN_BE16 (er1));
Appendix B
Conventions for Makefiles and
Directories
In this appendix. . .
Structure
Specifying options
Using the standard macros and include files
Advanced topics
Structure
Here’s a sample directory tree for a product that can be built for two
different operating systems (QNX 4 and Neutrino), on five CPU
platforms (x86, MIPS, PowerPC, ARM, and SH4), with both endian
combinations on the MIPS and PowerPC:
Project level:  project
Section level:  sec1 sec2
OS level:       qnx4 nto
CPU level:      arm mips ppc sh x86
Variant level:  o.le o.be o.le o.be o.le o.le o
We’ll talk about the names of the directory levels shortly. At each
directory level is a Makefile file used by the make utility to
determine what to do in order to make the final executable.
However, if you examine the makefiles, you can see that most of them
simply contain:
include recurse.mk
We’ll discuss how to cause make to compile only certain parts of the
source tree, even if invoked from the top of the tree, in the “Advanced
topics” section.
If you look at the source tree that we ship, you’ll notice that we follow
the directory structure defined above, but with a few shortcuts. We’ll
cover those shortcuts in the “Advanced Topics” section.
Makefile structure
As mentioned earlier, the makefile structure is almost identical,
regardless of the level that the makefile is found in. All makefiles
(except the bottommost level) include the recurse.mk file and may
set one or more macros.
Here’s an example of one of our standard (nonbottommost)
Makefiles:
LATE_DIRS=boards
include recurse.mk
Macros
The example given above uses the LATE_DIRS macro. Here are the
macros that can be placed within a makefile:
¯ EARLY_DIRS
¯ LATE_DIRS
¯ LIST
¯ MAKEFILE
¯ CHECKFORCE
The LIST macro tags the directory level; the conventional tag values
for the three directory levels are:
¯ VARIANT
¯ CPU
¯ OS
Note that you’re free to define whatever values you wish — these are
simply conventions that we’ve adopted for the three directory levels
specified. See the section on “More uses for LIST,” below.
Once the directory has been identified via a tag in the makefile, you
can specifically exclude or include the directory and its descendents in
a make invocation. See “Performing partial builds” below.
Directory structure
Let’s look at the directory levels themselves in some detail. Note that
you can add as many levels as you want above the levels described
here — these levels would reflect your product. For example, in a
factory automation system, the product would consist of the entire
system — you would then have several subdirectories under that
directory level to describe various projects within that product (e.g.
gui, pidloop, robot_plc, etc.).
The OS level
If you were building products to run on multiple operating systems,
you’d include an OS level directory structure. This would serve as a
branchpoint for OS-specific subdirectories. In our factory-floor
example, the gui section might be built for both QNX 4 and
Neutrino, whereas the other sections might be built just for Neutrino.
If no OS level is detected, Neutrino is assumed.
The CPU level
Generally, the CPU level would contain nothing but subdirectories for
the various CPUs, but it may also contain CPU-specific source files.
Specifying options
At the project level, there’s a file called common.mk.
This file contains any special flags and settings that need to be in
effect in order to compile and link.
At the bottommost level (the variant level), the format of the makefile
is different — it doesn’t include recurse.mk, but instead includes
common.mk (from the project level).
Macro     Value
VARIANT1  o.be
CPU       mips
OS        nto
SECTION   driver
PROJECT   robot_plc
include ../../common.mk
dll The image being built is a DLL; it’s linked with the
-Bsymbolic option (see ld in the Utilities Reference).
If the compound variant doesn’t include a, so, or dll,
an executable is being built.
shared Compile the object files for .so use, but don’t create an
actual shared object. You typically use this name in an
a.shared variant to create a static link archive that can
be linked into a shared object.
g Compile and link the source with the debugging flag set.
be, le Compile and link the source to generate big (if be) or
little (if le) endian code. If a CPU supports bi-endian
operation, one of these variants should always be present
in the compound variant name. Conversely, if the CPU is
mono-endian, neither be nor le should be specified in
the compound variant.
Variant Purpose
g.le A debugging version of a little-endian executable.
so.be A big-endian version of a shared object.
403.be A user-defined “403” variant for a big-endian system.
☞ The only valid characters for variant names are letters, digits, and
underscores (_).
In order for the source code to tell what variant(s) it’s being compiled
for, the common makefiles arrange for each variant name to be
postfixed to the string VARIANT_ and have that defined as a C or
assembler macro on the command line. For example, if the compound
variant is so.403.be, the following C macros are defined:
VARIANT_so, VARIANT_403, and VARIANT_be. Note that neither
VARIANT_be nor VARIANT_le is defined on a CPU that doesn’t
support bi-endian operation, so any endian-specific code should
always test for the C macros __LITTLEENDIAN__ or __BIGENDIAN__
(instead of VARIANT_le or VARIANT_be) to determine what
endianness it’s running under.
¯ qconfig.mk
¯ qmacros.mk
We’ll also look at some of the macros that are set or used by those
include files.
Preset macros
Before including qtargets.mk, some macros need to be set to
determine things like what additional libraries need to be searched in
the link, the name of the image (if it doesn’t match the project
directory name), and so on. This would be done in the area tagged as
“Preset make macros go here” in the sample above.
Postset macros
Following the include of qtargets.mk, you can override or (more
likely) add to the macros set by qtargets.mk. This would be done
in the area tagged as “Post-set make macros go here” in the
sample above.
qconfig.mk macros
Here’s a summary of the macros available from qconfig.mk:
PWD_HOST Print the full path of the current working directory.
The which parameter can be either the string HOST (for compiling
something for the host system) or a triplet of the form os_cpu_compiler
that specifies a combination of target OS and CPU, as well as the
compiler to be used.
The os would usually be the string nto to indicate Neutrino. The cpu
would be one of x86, mips, ppc, arm, or sh. Finally, the compiler
would be gcc.
For example, the macro CC_nto_x86_gcc would be used to specify:
¯ an x86 platform
¯ CLPOST_which
¯ CCPOST_which
¯ ASPOST_which
¯ ARPOST_which
¯ LRPOST_which
¯ LDPOST_which
¯ UMPOST_which
The parameter “which” is the same as defined above: either the string
“HOST” or the ordered triplet defining the OS, CPU, and compiler.
For example, specifying the following:
EXTRA_SRCVPATH
Added to the end of SRCVPATH. Defaults to none.
EXTRA_LIBVPATH
Added to LIBVPATH just before
$(INSTALL_ROOT_support)/$(OS)/$(CPUDIR)/lib.
Default is none.
Note that for the VFLAG_which, CCVFLAG_which, ASVFLAG_which,
and LDVFLAG_which macros, the which part is the name of a variant.
The combined macro is passed to the appropriate command line. For
example, if there were a variant called “403,” then the macro
VFLAG_403 would be passed to the C compiler, assembler, and linker.
☞ Don’t use this mechanism to define a C macro constant that you can
test in the source code to see if you’re in a particular variant. The
makefiles do that automatically for you. Don’t set the *VFLAG_*
macros for any of the distinguished variant names (listed in the
“Recognized variant names” section, above). The common makefiles
will get confused if you do.
Advanced topics
In this section, we’ll discuss how to:
¯ collapse unnecessary directory levels
¯ perform partial builds
¯ use GNU configure
include ../common.mk
In this case, we’ve specified both the variant (as “be” for big-endian)
and the CPU (as “ppc” for PowerPC) with a single directory.
Why did we do this? Because the 800fads directory refers to a very
specific board — it’s not going to be useful for anything other than a
PowerPC running in big-endian mode.
In this case, the makefile macros would have the following values:
Macro     Value
VARIANT1  ppc-be
CPU       ppc
OS        nto (default)
SECTION   800fads
PROJECT   boards
¯ recurse into all of the directories except for the specified tagged
ones
Let’s consider an example. The following (issued from the top of the
source tree):
make CPULIST=x86
causes only the directories that are at the CPU level and below (and
tagged as LIST=CPU), and that are called x86, to be recursed into.
You can specify a space-separated list of directories (note the use of
quoting in the shell to capture the space character):
make "CPULIST=x86 mips"
Then you can decide to build (or prevent from building) various
subcomponents marked with CONTROL. This might be useful in a
very big project, where compilation times are long and you need to
test only a particular subsection, even though other subsections may
be affected and would ordinarily be made.
For example, if you had marked two directories, robot_plc and
pidloop, with the LIST=CONTROL macro within the makefile, you
could then make just the robot_plc module:
make CONTROLLIST=robot_plc
Or make both (note the use of quoting in the shell to capture the space
character):
make "CONTROLLIST=robot_plc pidloop"
GNU configure
This scheme can also be used with any future third-party code that
uses a GNU ./configure script for configuration.
☞ The steps given below shouldn’t overwrite any existing files in the
project; they just add new ones.
Every time that you type make in one of the newly created directories,
the GNUmakefile is read (a small trick that works only with GNU
make). GNUmakefile in turn invokes the
/usr/include/mk/build-cfg script, which notices whether or
not configure has been run in the directory:
TARGET_SYSNAME
This is the target OS (e.g. nto, win32) that we’re
going to be generating executables for. It’s set
automatically by build-cfg, based on the
directory that you’re in.
make_cmds The command goals passed to make (e.g. all). It’s
set automatically by build-cfg from what you
passed on the original make command line.
hook_preconfigure()
This function is invoked just before we run the project’s configure
script. Its main job is to set the configure_opts variable properly. Here’s
a fairly complicated example (this is from GCC):
# The "target" variable is the compilation target: "ntoarm", "ntox86", etc.
function hook_preconfigure {
    case ${SYSNAME} in
    nto)
        case "${target}" in
        nto*) basedir=/usr ;;
        *)    basedir=/opt/QNXsdk/host/qnx6/x86/usr ;;
        esac
        ;;
    solaris)
        host_cpu=$(uname -p)
        case ${host_cpu} in
        i[34567]86) host_cpu=x86 ;;
        esac
        basedir=/opt/QNXsdk/host/solaris/${host_cpu}/usr
        ;;
    *)
        echo "Don't have config for ${SYSNAME}"
        exit 1
        ;;
    esac
    configure_opts="${configure_opts} --target=${target}"
    configure_opts="${configure_opts} --prefix=${basedir}"
    configure_opts="${configure_opts} --exec-prefix=${basedir}"
    configure_opts="${configure_opts} --with-local-prefix=${basedir}"
    configure_opts="${configure_opts} --enable-haifa"
    configure_opts="${configure_opts} --enable-languages=c++"
    configure_opts="${configure_opts} --enable-threads=posix"
    configure_opts="${configure_opts} --with-gnu-as"
    configure_opts="${configure_opts} --with-gnu-ld"
    configure_opts="${configure_opts} --with-as=${basedir}/bin/${target}-as"
    configure_opts="${configure_opts} --with-ld=${basedir}/bin/${target}-ld"
    if [ ${SYSNAME} == nto ]; then
        configure_opts="${configure_opts} --enable-multilib"
        configure_opts="${configure_opts} --enable-shared"
    else
        configure_opts="${configure_opts} --disable-multilib"
    fi
}
hook_postconfigure()
This is invoked after configure has been successfully run. Usually
you don’t need to define this function, but sometimes you just can’t
quite convince configure to do the right thing, so you can put some
hacks in here to munge things appropriately. For example, again from
GCC:
hook_premake()
This function is invoked just before the make. You don’t usually need
it.
hook_postmake()
This function is invoked just after the make. We haven’t found a use
for this one yet, but included it for completeness.
hook_pinfo()
This function is invoked after hook_postmake(). Theoretically, we
don’t need this hook at all and we could do all its work in
hook_postmake(), but we’re keeping it separate in case we get fancier
in the future.
This function is responsible for generating all the *.pinfo files in
the project. It does this by invoking the gen_pinfo() function that’s
defined in build-cfg, which generates one .pinfo. The command
line for gen_pinfo() is:
gen_pinfo [-nsrc_name] install_name install_dir pinfo_line...
src_name The name of the pinfo file (minus the .pinfo
suffix). If it’s not specified, gen_pinfo() uses
install_name.
pinfo_line Any additional pinfo lines that you want to add. You
can repeat this argument as many times as required.
Favorites include:
• DESCRIPTION="This executable performs no useful purpose"
• SYMLINK=foobar.so
In this appendix. . .
Introduction 323
The impact of SMP 324
Designing with SMP in mind 327
Introduction
As described in the System Architecture guide, there’s an SMP
(Symmetrical MultiProcessor) version of Neutrino that runs on:
• MIPS-based systems
• PowerPC-based systems
If you have one of these systems, then you’re probably itching to try it
out, but are wondering what you have to do to get Neutrino running
on it. Well, the answer is not much. The only part of Neutrino that’s
different for an SMP system is the microkernel — another example of
the advantages of a microkernel architecture!
[virtual=x86,bios] .bootstrap = {
    startup-bios
    PATH=/proc/boot procnto-smp
}
[+script] .script = {
    devc-con -e &
    reopen /dev/con1
}

libc.so
[type=link] /usr/lib/ldqnx.so.2=/proc/boot/libc.so

[data=copy]
devc-con
esh
ls
After building the image, you proceed in the same way as you would
with a single-processor system.
Processor affinity
One issue that often arises in an SMP environment can be put like this:
“Can I make it so that one processor handles the GUI, another handles
the database, and the other two handle the realtime functions?”
The answer is: “Yes, absolutely.”
This is done through the magic of processor affinity — the ability to
associate certain programs (or even threads within programs) with a
particular processor or processors.
Processor affinity works like this. When a thread starts up, its
processor affinity mask is set to allow it to run on all processors. This
implies that there’s no inheritance of the processor affinity mask, so
it’s up to the thread to use ThreadCtl() with the
_NTO_TCTL_RUNMASK control flag to set the processor affinity mask.
The processor affinity mask is simply a bitmap; each bit position
indicates a particular processor. For example, the processor affinity
mask 0x05 (binary 00000101) allows the thread to run on processors
0 (the 0x01 bit) and 2 (the 0x04 bit).
This FIFO trick won’t work on an SMP system, because both threads
may run simultaneously on different processors. You’ll have to use
the more “proper” thread synchronization primitives (e.g. a mutex).
Function                 Operation

atomic_add()             Add a number.
atomic_add_value()       Add a number and return the original value of *loc.
atomic_clr()             Clear bits.
atomic_clr_value()       Clear bits and return the original value of *loc.
atomic_set()             Set bits.
atomic_set_value()       Set bits and return the original value of *loc.
atomic_sub()             Subtract a number.
atomic_sub_value()       Subtract a number and return the original value of *loc.
atomic_toggle()          Toggle (complement) bits.
atomic_toggle_value()    Toggle (complement) bits and return the original value of *loc.
do_graphics ()
{
    int num_cpus;
    int i;
    pthread_t *tids;
    ...
    // now all the "do_lines" are off running on the processors
}

void *
do_lines (void *arg)
{
    ...
}
In this appendix. . .
GDB commands 334
Running programs under GDB 340
Stopping and continuing 350
Examining the stack 373
Examining source files 380
Examining data 387
Examining the symbol table 411
Altering execution 415
set qnxinheritenv
Set where the remote process inherits its
environment from; see “Your program’s
environment.”
set qnxremotecwd
Set the working directory for the remote process;
see “Starting your program.”
set qnxtimeout
Set the timeout for remote reads; see “Setting the
target.”
GDB commands
You can abbreviate a GDB command to the first few letters of the
command name, if that abbreviation is unambiguous; and you can
repeat certain GDB commands by typing just Enter. You can also use
the Tab key to get GDB to fill out the rest of a word in a command (or
to show you the alternatives available, if there’s more than one
possibility).
You may also place GDB commands in an initialization file and these
commands will be run before any that have been entered via the
command line. For more information, see:
Command syntax
A GDB command is a single line of input. There’s no limit on how
long it can be. It starts with a command name, which is followed by
arguments whose meaning depends on the command name. For
example, the command step accepts an argument that is the number
of times to step, as in step 5. You can also use the step command
Command completion
GDB can fill in the rest of a word in a command for you if there’s
only one possibility; it can also show you what the valid possibilities
are for the next word in a command, at any time. This works for GDB
commands, GDB subcommands, and the names of symbols in your
program.
Press the Tab key whenever you want GDB to fill out the rest of a
word. If there’s only one possibility, GDB fills in the word, and waits
for you to finish the command (or press Enter to enter it). For
example, if you type:
(gdb) info bre Tab
GDB fills in the rest of the word breakpoints, since that is the only
info subcommand beginning with bre:
You can either press Enter at this point, to run the info
breakpoints command, or backspace and enter something else, if
breakpoints doesn’t look like the command you expected. (If you
were sure you wanted info breakpoints in the first place, you
might as well just type Enter immediately after info bre, to exploit
command abbreviations rather than command completion).
If there’s more than one possibility for the next word when you press
Tab, GDB sounds a bell. You can either supply more characters and
try again, or just press Tab a second time; GDB displays all the
possible completions for that word. For example, you might want to
set a breakpoint on a subroutine whose name begins with make_, but
when you type:
b make_ Tab
GDB just sounds the bell. Typing Tab again displays all the function
names in your program that begin with those characters, for example:
make_a_section_from_file     make_environ
make_abs_section             make_function_type
make_blockvector             make_pointer_type
make_cleanup                 make_reference_type
make_command                 make_symbol_completion_list
(gdb) b make_
The most likely situation where you might need this is in typing the
name of a C++ function. This is because C++ allows function
overloading (multiple definitions of the same function, distinguished
by argument type). For example, when you want to set a breakpoint
you may need to distinguish whether you mean the version of name
that takes an int parameter, name(int), or the version that takes a
float parameter, name(float). To use the word-completion
facilities in this situation, type a single quote ’ at the beginning of the
function name. This alerts GDB that it may need to consider more
information than usual when you press Tab, or Esc followed by ?, to
request word completion:
(gdb) b ’bubble(Esc?
bubble(double,double) bubble(int,int)
(gdb) b ’bubble(
In some cases, GDB can tell that completing a name requires using
quotes. When this happens, GDB inserts the quote for you (while
completing as much as it can) if you don’t type the quote in the first
place:
(gdb) b bub Tab
GDB alters your input line to the following, and rings a bell:
(gdb) b ’bubble(
In general, GDB can tell that a quote is needed (and inserts it) if you
haven’t yet started typing the argument list when you ask for
completion on an overloaded symbol.
Getting help
You can always ask GDB itself for information on its commands,
using the command help.
help
h You can use help (h) with no arguments to display a
short list of named classes of commands:
(gdb) help
List of classes of commands:
List of commands:
help command
With a command name as help argument, GDB
displays a short paragraph on how to use that
command.
complete args
The complete args command lists all the possible
completions for the beginning of a command. Use
args to specify the beginning of the command you
want completed. For example:
complete i
results in:
info
inspect
ignore
In addition to help, you can use the GDB commands info and show
to inquire about the state of your program, or the state of GDB itself.
Each command supports many topics of inquiry; this manual
introduces each of them in the appropriate context. The listings under
info and show in the index point to all the sub-commands.
show version
Show what version of GDB is running. You should include this
information in GDB bug-reports. If multiple versions of GDB
are in use at your site, you may occasionally want to determine
which version of GDB you’re running; as GDB evolves, new
commands are introduced, and old ones may wither away. The
version number is also announced when you start GDB.
show copying
Display information about permission for copying GDB.
show warranty
Display the GNU “NO WARRANTY” statement.
The pty option spawns a pdebug server on the local machine and
connects via a pty.
Here’s a sample:
If your communication line is slow, you might need to set the timeout
for remote reads:
☞ While input and output redirection work, you can’t use pipes to pass
the output of the program you’re debugging to another program; if
you attempt this, GDB is likely to wind up debugging the wrong
program.
When you issue the run command, your program is loaded but
doesn’t execute immediately. Use the continue command to start
your program. For more information, see “Stopping and continuing.”
While your program is stopped, you may call functions in your
program, using the print or call commands. See “Examining
data.”
If the modification time of your symbol file has changed since the last
time GDB read its symbols, GDB discards its symbol table and reads
it again. When it does this, GDB tries to retain your current
breakpoints.
set args Specify the arguments to be used the next time your
program is run. If set args has no arguments, run
executes your program with no arguments. Once
you’ve run your program with arguments, using set
args before the next run is the only way to run it
again without arguments.
show args Show the arguments to give your program when it’s
started.
show paths Display the list of search paths for executables (the
PATH environment variable).
GDB records the terminal modes your program was using, and
switches back to them when you continue running your program.
You can redirect your program’s input and/or output using shell
redirection with the run command. For example,
If you exit GDB or use the run command while you have an attached
process, you kill that process. By default, GDB asks for confirmation
if you try to do either of these things; you can control whether or not
you need to confirm by using the set confirm command.
¯ thread-specific breakpoints
The GDB thread debugging facility lets you observe all threads while
your program runs — but whenever GDB takes control, one thread in
particular is always the focus of debugging. This thread is called the
current thread. Debugging commands show program information
from the perspective of the current thread.
GDB associates its own thread number — always a single integer —
with each thread in your program.
info threads
Display a summary of all threads currently in your program.
GDB displays for each thread (in this order):
1 Thread number assigned by GDB
2 Target system’s thread identifier (systag)
3 Current stack frame summary for that thread.
An asterisk * to the left of the GDB thread number indicates the
current thread. For example:
(gdb) info threads
3 process 35 thread 27 0x34e5 in sigpause ()
2 process 35 thread 23 0x34e5 in sigpause ()
* 1 process 35 thread 13 main (argc=1, argv=0x7ffffff8)
at threadtest.c:68
thread threadno
Make thread number threadno the current thread. The
command argument threadno is the internal GDB thread
number, as shown in the first field of the info threads
display. GDB responds by displaying the system identifier of
the thread you selected and its current stack frame summary:
(gdb) thread 2
[Switching to process 35 thread 23]
0x34e5 in sigpause ()
info program
Display information about the status of your program: whether
it’s running or not, what process it is, and why it stopped.
Setting breakpoints
Use the break (b) command to set breakpoints. The debugger
convenience variable $bpnum records the number of the breakpoints
you’ve set most recently; see “Convenience variables” for a
discussion of what you can do with convenience variables.
You have several ways to say where the breakpoint should go:
break function
Set a breakpoint at entry to function. When using source
languages such as C++ that permit overloading of
symbols, function may refer to more than one possible
place to break. See “Breakpoint menus” for a discussion
of that situation.
break +offset
break -offset
Set a breakpoint some number of lines forward or back
from the position at which execution stopped in the
currently selected frame.
break linenum
Set a breakpoint at line linenum in the current source file.
That file is the last file whose source text was printed. This
breakpoint stops your program just before it executes any
of the code on that line.
break filename:linenum
Set a breakpoint at line linenum in source file filename.
break filename:function
Set a breakpoint at entry to function found in file filename.
Specifying a filename as well as a function name is
superfluous except when multiple files contain similarly
named functions.
break *address
Set a breakpoint at address address. You can use this to set
breakpoints in parts of your program that don’t have
debugging information or source files.
There are several variations on the break command, all using the
same syntax as above:
GDB lets you set any number of breakpoints at the same place in your
program. There’s nothing silly or meaningless about this. When the
breakpoints are conditional, this is even useful (see “Break
conditions”).
GDB itself sometimes sets breakpoints in your program for special
purposes, such as proper handling of longjmp (in C programs).
These internal breakpoints are assigned negative numbers, starting
with -1; info breakpoints doesn’t display them.
You can see these breakpoints with the GDB maintenance command,
maint info breakpoints.
Setting watchpoints
You can use a watchpoint to stop execution whenever the value of an
expression changes, without having to predict a particular place where
this may happen.
Although watchpoints currently execute two orders of magnitude
more slowly than other breakpoints, they can help catch errors in
cases where you have no clue what part of your program is the
culprit.
rwatch arg Set a watchpoint that breaks when arg is read
by the program. If you use both watchpoints, both
must be set with the rwatch command.
awatch arg Set a watchpoint that breaks when arg is read and
written into by the program. If you use both
watchpoints, both must be set with the awatch
command.
info watchpoints
This command prints a list of watchpoints and
breakpoints; it’s the same as info break.
catch exceptions
You can set breakpoints at active exception handlers by using
the catch command. The exceptions argument is a list of
names of exceptions to catch.
You can use info catch to list active exception handlers. See
“Information about a frame.”
There are currently some limitations to exception handling in GDB:
Deleting breakpoints
You often need to eliminate a breakpoint or watchpoint once it’s done
its job and you no longer want your program to stop there. This is
called deleting the breakpoint. A breakpoint that has been deleted no
longer exists and is forgotten.
With the clear command you can delete breakpoints according to
where they are in your program. With the delete command you can
delete individual breakpoints or watchpoints by specifying their
breakpoint numbers.
You don’t have to delete a breakpoint to proceed past it. GDB
automatically ignores breakpoints on the first instruction to be
executed when you continue execution without changing the
execution address.
clear function
clear filename:function
Delete any breakpoints set at entry to function.
clear linenum
clear filename:linenum
Delete any breakpoints set at or within the code of the
specified line.
Disabling breakpoints
Rather than delete a breakpoint or watchpoint, you might prefer to
disable it. This makes the breakpoint inoperative as if it had been
deleted, but remembers the information on the breakpoint so that you
can enable it again later.
You disable and enable breakpoints and watchpoints with the enable
and disable commands, optionally specifying one or more
breakpoint numbers as arguments. Use info break or info watch
to print a list of breakpoints or watchpoints if you don’t know which
numbers to use.
A breakpoint or watchpoint can have any of the following states:
Break conditions
The simplest sort of breakpoint breaks every time your program
reaches a specified place. You can also specify a condition for a
breakpoint. A condition is just a Boolean expression in your
programming language (see “Expressions”). A breakpoint with a
condition evaluates the expression each time your program reaches it,
and your program stops only if the condition is true.
commands [bnum]
... command-list ...
end
Specify a list of commands for breakpoint number bnum. The
commands themselves appear on the following lines. Type a
line containing just end to terminate the commands.
To remove all commands from a breakpoint, type commands
and follow it immediately with end; that is, give no commands.
With no bnum argument, commands refers to the last
breakpoint or watchpoint set (not to the breakpoint most
recently encountered).
Breakpoint menus
Some programming languages (notably C++) permit a single function
name to be defined several times, for application in different contexts.
This is called overloading. When a function name is overloaded,
break function isn’t enough to tell GDB where you want a
breakpoint.
If you realize this is a problem, you can use something like break
function(types) to specify which particular version of the function you
want. Otherwise, GDB offers you a menu of numbered choices for
different possible breakpoints, and waits for your selection with the
prompt >. The first two options are always [0] cancel and [1]
all. Typing 1 sets a breakpoint at each definition of function, and
typing 0 aborts the break command without setting any new
breakpoints.
For example, the following session excerpt shows an attempt to set a
breakpoint at the overloaded symbol String::after(). We choose three
particular definitions of that function name:
(gdb) b String::after
[0] cancel
[1] all
[2] file:String.cc; line number:867
[3] file:String.cc; line number:860
[4] file:String.cc; line number:875
[5] file:String.cc; line number:853
[6] file:String.cc; line number:846
[7] file:String.cc; line number:735
> 2 4 6
Breakpoint 1 at 0xb26c: file String.cc, line 867.
Breakpoint 2 at 0xb344: file String.cc, line 875.
Breakpoint 3 at 0xafcc: file String.cc, line 846.
Multiple breakpoints were set.
Use the "delete" command to delete unwanted
breakpoints.
(gdb)
continue [ignore-count]
c [ignore-count]
fg [ignore-count]
Resume program execution, at the address where your program
last stopped; any breakpoints set at that address are bypassed.
The optional argument ignore-count lets you specify a further
number of times to ignore a breakpoint at this location; its effect
is like that of ignore (see “Break conditions”).
The argument ignore-count is meaningful only when your
program stopped due to a breakpoint. At other times, the
argument to continue is ignored.
The synonyms c and fg are provided purely for convenience,
and have exactly the same behavior as continue.
If you use the step command while control is within a function that
☞ was compiled without debugging information, execution proceeds
until control reaches a function that does have debugging information.
Likewise, it doesn’t step into a function that is compiled without
debugging information. To step through functions without debugging
information, use the stepi command, described below.
The step command stops only at the first
instruction of a source line. This prevents multiple
stops in switch statements, for loops, etc. The step
command stops if a function that has debugging
information is called within the line.
Also, the step command enters a subroutine only
if there’s line number information for the
subroutine. Otherwise it acts like the next
command. This avoids problems when using cc
-gl on MIPS machines.
(gdb) f
#0 main (argc=4, argv=0xf7fffae8) at m4.c:206
206 expand_input();
(gdb) until
195 for ( ; argc > 0; NEXTARG) {
stepi [count]
si [count] Execute one machine instruction, then stop and
return to the debugger.
It’s often useful to do display/i $pc when
stepping by machine instructions. This makes GDB
automatically display the next instruction to be
executed, each time your program stops. See
“Automatic display.”
The count argument is a repeat count, as in step.
nexti [count]
ni [count] Execute one machine instruction, but if it’s a
function call, proceed until the function returns.
The count argument is a repeat count, as in next.
Signals
A signal is an asynchronous event that can happen in a program. The
operating system defines the possible kinds of signals, and gives each
kind a name and a number. The table below gives several examples of
signals:
info signals
info handle
Print a table of all the kinds of signals and how GDB has been
told to handle each one. You can use this to see the signal
numbers of all the defined types of signals.
pass GDB should allow your program to see this signal; your
program can handle the signal, or else it may terminate
if the signal is fatal and not handled.
When a signal stops your program, the signal isn’t visible until you
continue. Your program sees the signal then, if pass is in effect for
the signal in question at that time. In other words, after GDB reports a
signal, you can use the handle command with pass or nopass to
control whether your program sees that signal when you continue.
You can also use the signal command to prevent your program from
seeing a signal, or cause it to see a signal it normally doesn’t see, or to
give it any signal at any time. For example, if your program stopped
due to some sort of memory reference error, you might store correct
values into the erroneous variables and continue, hoping to see more
execution; but your program would probably terminate immediately
as a result of the fatal signal once it saw the signal. To prevent this,
you can continue with signal 0. See “Giving your program a
signal.”
Whenever your program stops under GDB for any reason, all threads
of execution stop, not just the current thread. This lets you examine
the overall state of the program, including switching between threads,
without worrying that things may change underfoot.
Conversely, whenever you restart the program, all threads start
executing. This is true even when single-stepping with commands like
step or next.
In particular, GDB can’t single-step all threads in lockstep. Since
thread scheduling is up to the Neutrino microkernel (not controlled by
GDB), other threads may execute more than one statement while the
current thread completes a single step. Moreover, in general, other
threads stop in the middle of a statement, rather than at a clean
statement boundary, when the program stops.
You might even find your program stopped in another thread after
continuing or even single-stepping. This happens whenever some
other thread runs into a breakpoint, a signal, or an exception before
the first thread completes whatever you requested.
Stack frames
The call stack is divided up into contiguous pieces called stack
frames, or frames for short; each frame is the data associated with one
call to one function. The frame contains the arguments given to the
function, the function’s local variables, and the address at which the
function is executing.
When your program is started, the stack has only one frame, that of
the function main(). This is called the initial frame or the outermost
frame. Each time a function is called, a new frame is made. Each time
a function returns, the frame for that function invocation is
eliminated. If a function is recursive, there can be many frames for the
same function. The frame for the function in which execution is
actually occurring is called the innermost frame. This is the most
recently created of all the stack frames that still exist.
Inside your program, stack frames are identified by their addresses. A
stack frame consists of many bytes, each of which has its own
address; each kind of computer has a convention for choosing one
byte whose address serves as the address of the frame. Usually this
address is kept in a register called the frame pointer register while
execution is going on in that frame.
GDB assigns numbers to all existing stack frames, starting with 0 for
the innermost frame, 1 for the frame that called it, and so on upward.
These numbers don’t really exist in your program; they’re assigned by
GDB to give you a way of designating stack frames in GDB
commands.
Some compilers provide a way to compile functions so that they
operate without stack frames. (For example, the gcc option
-fomit-frame-pointer generates functions without a frame.)
frame args The frame command lets you move from one stack
frame to another, and to print the stack frame you
select. The args may be either the address of the
frame or the stack frame number. Without an
argument, frame prints the current stack frame.
select-frame
The select-frame command lets you move from
one stack frame to another without printing the
frame. This is the silent version of frame.
Backtraces
A backtrace is a summary of how your program got where it is. It
shows one line per frame, for many frames, starting with the currently
executing frame (frame 0), followed by its caller (frame 1), and on up
the stack.
backtrace
bt Print a backtrace of the entire stack, with one line
per frame, for all frames in the stack.
You can stop the backtrace at any time by typing
the system interrupt character, normally Ctrl – C.
backtrace n
bt n Similar, but print only the innermost n frames.
backtrace -n
bt -n Similar, but print only the outermost n frames.
The names where and info stack (info s) are additional aliases
for backtrace.
Each line in the backtrace shows the frame number and the function
name. The program counter value is also shown — unless you use
set print address off. The backtrace also shows the source
filename and line number, as well as the arguments to the function.
The program counter value is omitted if it’s at the beginning of the
code for that line number.
Here’s an example of a backtrace. It was made with the command bt
3, so it shows the innermost three frames:
The display for frame 0 doesn’t begin with a program counter value,
indicating that your program has stopped at the beginning of the code
for line 993 of builtin.c.
Selecting a frame
Most commands for examining the stack and other data in your
program work on whichever stack frame is selected at the moment.
Here are the commands for selecting a stack frame; all of them finish
by printing a brief description of the stack frame just selected.
frame n
fn Select frame number n. Recall that frame 0 is the
innermost (currently executing) frame, frame 1 is the
frame that called the innermost one, and so on. The
highest-numbered frame is the one for main.
frame addr
f addr Select the frame at address addr. This is useful
mainly if the chaining of stack frames has been
damaged by a bug, making it impossible for GDB to
up-silently n
down-silently n
These two commands are variants of up and down; they differ
in that they do their work silently, without causing display of
the new frame. They’re intended primarily for use in GDB
command scripts, where the output might be unnecessary and
distracting.
frame
f When used without any argument, this command
doesn’t change which frame is selected, but prints a
brief description of the currently selected stack
frame. It can be abbreviated f. With an argument,
this command is used to select a stack frame. See
“Selecting a frame.”
info frame
info f This command prints a verbose description of the
selected stack frame, including:
¯ the address of the frame
¯ the address of the next frame down (called by
this frame)
¯ the address of the next frame up (caller of this
frame)
¯ the language in which the source code
corresponding to this frame is written
¯ the address of the frame’s arguments
¯ the program counter saved in it (the address of
execution in the caller frame)
¯ which registers were saved in the frame
The verbose description is useful when something
has gone wrong that has made the stack format fail
to fit the usual conventions.
info frame addr
info f addr
Print a verbose description of the frame at address
addr, without selecting that frame. The selected
frame remains unchanged by this command. This
info locals Print the local variables of the selected frame, each
on a separate line. These are all variables (declared
either static or automatic) accessible at the point of
execution of the selected frame.
info catch Print a list of all the exception handlers that are
active in the current stack frame at the current point
of execution. To see other exception handlers, visit
the associated frame (using the up, down, or
frame commands); then type info catch. See
“Breakpoints and exceptions.”
By default, GDB prints ten source lines with any of these forms of the
list command. You can change this using set listsize:
set listsize count
Make the list command display count source lines (unless
the list argument explicitly specifies some other number).
show listsize
Display the number of lines that list prints.
Here are the ways of specifying a single source line — all the kinds of
linespec:
+offset Specifies the line offset lines after the last line printed.
When used as the second linespec in a list command
that has two, this specifies the line offset lines down
from the first linespec.
-offset Specifies the line offset lines before the last line printed.
filename:number
Specifies line number in the source file filename.
function Specifies the line that begins the body of the function
function. For example: in C, this is the line with the
open brace, {.
filename:function
Specifies the line of the open brace that begins the body
of function in the file filename. You need the filename
with a function name only to avoid ambiguity when
there are identically named functions in different source
files.
forward-search regexp
search regexp
fo regexp
Check each line, starting with the one following the last line
listed, for a match for regexp, listing the line found.
reverse-search regexp
rev regexp
Check each line, starting with the one before the last line listed
and going backward, for a match for regexp, listing the line
found.
☞ The executable search path isn’t used for this purpose. Neither is the
current working directory, unless it happens to be in the source path.
If GDB can’t find a source file in the source path, and the object
program records a directory, GDB tries that directory too. If the
source path is empty, and there’s no record of the compilation
directory, GDB looks in the current directory as a last resort.
Whenever you reset or rearrange the source path, GDB clears out any
information it has cached about where source files are found and
where each line is in the file.
When you start GDB, its source path is empty. To add other
directories, use the directory command.
For example, we can use info line to discover the location of the
object code for the first line of function m4_changequote:
We can also inquire (using *addr as the form for linespec) what
source line covers a particular address:
After info line, the default address for the x command is changed
to the starting address of the line, so that x/i is sufficient to begin
examining the machine code (see “Examining memory”). Also, this
address is saved as the value of the convenience variable $_ (see
“Convenience variables”).
disassemble
This specialized command dumps a range of memory as
machine instructions. The default memory range is the function
surrounding the program counter of the selected frame. A
single argument to this command is a program counter value;
GDB dumps the function surrounding this value. Two
arguments specify a range of addresses (first inclusive, second
exclusive) to dump.
Shared libraries
You can use the following commands when working with shared
libraries:
sharedlibrary [regexp]
Load shared object library symbols for files matching the given
regular expression, regexp. If regexp is omitted, GDB tries to
load symbols for all loaded shared libraries.
info sharedlibrary
Display the status of the loaded shared object libraries.
You can query the settings of these parameters with the show
solib-search-path, show solib-absolute-prefix, and
show auto-solib-add commands.
Examining data
The usual way to examine data in your program is with the print (p)
command or its synonym inspect. It evaluates and prints the value
of an expression of the language your program is written in.
print exp
print /f exp exp is an expression (in the source language). By
default, the value of exp is printed in a format
appropriate to its data type; you can choose a
different format by specifying /f , where f is a
letter specifying the format; see “Output formats.”
print
print /f If you omit exp, GDB displays the last value again
(from the value history; see “Value history”). This
lets you conveniently inspect the same value in an
alternative format.
Expressions
The print command and many other GDB commands accept an
expression and compute its value. Any kind of constant, variable or
operator defined by the programming language you’re using is valid
in an expression in GDB. This includes conditional expressions,
function calls, casts and string constants. It unfortunately doesn’t
include symbols defined by preprocessor #define commands.
GDB supports array constants in expressions input by the user. The
syntax is {element, element...}. For example, you can use the
command print {1, 2, 3} to build up an array in memory that is
malloc’d in the target program.
Because C is so widespread, most of the expressions shown in
examples in this manual are in C. In this section, we discuss operators
that you can use in GDB expressions regardless of your programming
language.
Casts are supported in all languages, not just in C, because it’s useful
to cast a number into a pointer in order to examine a structure at that
address in memory.
GDB supports these operators, in addition to those common to
programming languages:
Program variables
The most common kind of expression to use is the name of a variable
in your program.
Variables in expressions are understood in the selected stack frame
(see “Selecting a frame”); they must be either global (or file-static),
or visible according to the scope rules of the programming language
from the point of execution in that frame. For example, given a
variable a declared at the top of a function foo() and a variable b
declared inside an inner block of that function,
you can examine and use the variable a whenever your program is
executing within the function foo(), but you can use or examine the
variable b only while your program is executing inside the block
where b is declared.
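The scope rule above can be seen in a minimal C sketch (the names foo, a, and b are illustrative, matching the discussion rather than any particular program):

```c
#include <assert.h>

/* 'a' is in scope for the whole body of foo();
   'b' exists only inside the inner block, so GDB can
   examine it only while execution is inside that block. */
static int foo(void)
{
    int a = 1;              /* visible anywhere in foo() */
    {
        int b = 2;          /* visible only in this block */
        a += b;
    }
    /* here 'b' is already out of scope */
    return a;
}
```

If you stop the program after the inner block, `print a` works but `print b` reports that no such symbol exists in the current context.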
There’s an exception: you can refer to a variable or function whose
scope is a single source file even if the current execution point isn’t in
this file. But it’s possible to have more than one such variable or
function with the same name (in different source files). If that
happens, referring to that name has unpredictable effects. If you
wish, you can specify a static variable in a particular function or file,
using the colon-colon notation file::variable or function::variable.
Here file or function is the name of the context for the static variable.
In the case of filenames, you can use quotes to make sure GDB parses
the filename as a single word. For example, to print a global value of
x defined in f2.c:
(gdb) p 'f2.c'::x
This use of :: is very rarely in conflict with the very similar use of
the same notation in C++. GDB also supports use of the C++ scope
resolution operator in GDB expressions.
Artificial arrays
It’s often useful to print out several successive objects of the same
type in memory; a section of an array, or an array of dynamically
determined size for which only a pointer exists in the program.
You can do this by referring to a contiguous span of memory as an
artificial array, using the binary operator @. The left operand of @
should be the first element of the desired array, and the right operand
should be the desired length of the array. For example, to print len
elements starting at the pointer array:
p *array@len
The left operand of @ must reside in memory. Array values made with
@ in this way behave just like other arrays in terms of subscripting,
and are coerced to pointers when used in expressions. Artificial arrays
most often appear in expressions via the value history (see “Value
history”), after printing one out.
Another way to create an artificial array is to use a cast. This
reinterprets a value as if it were an array. The value need not be in
memory:
(gdb) p/x (short[2])0x12345678
Sometimes the artificial array mechanism isn’t quite enough; in
moderately complex data structures, the elements of interest may not
actually be adjacent. One useful workaround is to use a convenience
variable (see “Convenience variables”) as a counter in an expression
that prints the first interesting value, and then repeat
that expression via Enter. For instance, suppose you have an array
dtab of pointers to structures, and you’re interested in the values of a
field fv in each structure. Here’s an example of what you might type:
set $i = 0
p dtab[$i++]->fv
Enter
Enter
...
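The dtab walk shown above assumes a layout along these lines; the structure and field names here are hypothetical, invented only so the GDB session has something concrete behind it:

```c
/* hypothetical layout: each entry carries a field 'fv'
   that the debug session inspects one element at a time */
struct entry {
    int fv;
};

static struct entry e0 = { 10 }, e1 = { 20 }, e2 = { 30 };

/* an array of pointers to structures, as in the example */
static struct entry *dtab[] = { &e0, &e1, &e2 };

/* what "p dtab[$i++]->fv" effectively computes per step */
static int next_fv(int *i)
{
    return dtab[(*i)++]->fv;
}
```

Each press of Enter re-evaluates the expression with the counter advanced by one, printing successive fv values just as next_fv() returns them here.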
Output formats
By default, GDB prints a value according to its data type. Sometimes
this isn’t what you want. For example, you might want to print a
number in hex, or a pointer in decimal. Or you might want to view
data in memory at a certain address as a character string or as an
instruction. To do these things, specify an output format when you
print a value.
The simplest use of output formats is to say how to print a value
already computed. This is done by starting the arguments of the
print command with a slash and a format letter. The format letters
supported are:
x Regard the bits of the value as an integer, and print the integer
in hexadecimal.
d Print as integer in signed decimal.
u Print as integer in unsigned decimal.
o Print as integer in octal.
t Print as integer in binary. (The letter t stands for “two.”)
a Print as an address, both absolute in hexadecimal and as an
offset from the nearest preceding symbol.
c Regard as an integer and print it as a character constant.
f Regard the bits of the value as a floating point number and print
using typical floating point syntax.
To reprint the last value in the value history with a different format,
you can use the print command with just a format and no
expression. For example, p/x reprints the last value in hex.
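What /x does — reinterpret the same bits in another notation — can be mimicked in C itself. This sketch views a float's raw bits as an integer, the way p/x displays them; it assumes 4-byte IEEE 754 floats, which holds on the platforms this manual targets:

```c
#include <stdint.h>
#include <string.h>

/* view the raw bits of a float, as "p/x" would show them;
   memcpy() is the well-defined way to type-pun in C */
static uint32_t bits_of(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

For example, bits_of(1.0f) yields 0x3f800000, which is exactly what `p/x` prints for a float holding 1.0.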
Examining memory
You can use the command x (for “examine”) to examine memory in
any of several formats, independently of your program’s data types.
x/nfu addr
x addr
x Use the x command to examine memory.
The n, f , and u are all optional parameters that specify how much
memory to display and how to format it; addr is an expression giving
the address where you want to start displaying memory. If you use
defaults for nfu, you need not type the slash /. Several commands set
convenient defaults for addr.
For example, the specifications 4xw and 4wx mean exactly the same
thing. (However, the count n must come first; wx4 doesn’t work.)
Even though the unit size u is ignored for the formats s and i, you
might still want to use a count n; for example, 3i specifies that you
want to see three machine instructions, including any operands. The
command disassemble gives an alternative way of inspecting
machine instructions; see “Source and machine code.”
All the defaults for the arguments to x are designed to make it easy to
continue scanning memory with minimal specifications each time you
use x. For example, after you’ve inspected three machine instructions
with x/3i addr, you can inspect the next seven with just x/7. If you
use Enter to repeat the x command, the repeat count n is used again;
the other arguments default as for successive uses of x.
The addresses and contents printed by the x command aren’t saved in
the value history because there’s often too much of them and they
would get in the way. Instead, GDB makes these values available for
subsequent use in expressions as values of the convenience variables
$_ and $__. After an x command, the last address examined is
available for use in expressions in the convenience variable $_. The
contents of that address, as examined, are available in the convenience
variable $__.
If the x command has a repeat count, the address and contents saved
are from the last memory unit printed; this isn’t the same as the last
address printed if several units were printed on the last line of output.
Automatic display
If you find that you want to print the value of an expression frequently
(to see how it changes), you might want to add it to the automatic
display list so that GDB prints its value each time your program stops.
Each expression added to the list is given a number to identify it; to
remove an expression from the list, you specify that number. The
automatic display looks like this:
2: foo = 38
3: bar[5] = (struct hack *) 0x3804
undisplay dnums...
delete display dnums...
Remove item numbers dnums from the list of
expressions to display.
The undisplay command doesn’t repeat if you press
Enter after using it. (Otherwise you’d just get the error
No display number ....)
info display
Print the list of expressions previously set up to display
automatically, each one with its item number, but
without showing the values. This includes disabled
expressions, which are marked as such. It also includes
expressions that wouldn’t be displayed right now
because they refer to automatic variables not currently
available.
Print settings
GDB provides the following ways to control how arrays, structures,
and symbols are printed.
These settings are useful for debugging programs in any language:
(gdb) f
#0 set_quotes (lq=0x34c78 "<<", rq=0x34c88 ">>")
at input.c:530
530 if (lquote != def_lquote)
If you have a pointer and you aren’t sure where it points, try set
print symbol-filename on. Then you can determine the name
and source file location of the variable where it points, using p/a
pointer. This interprets the address in symbolic form. For example,
here GDB shows that a variable ptt points at another variable t,
defined in hi2.c:
☞ For pointers that point to a local variable, p/a doesn’t show the
symbol name and filename of the referent, even with the appropriate
set print options turned on.
$1 = {
next = 0x0,
flags = {
sweet = 1,
sour = 1
},
meat = 0x54 "Pork"
}
struct thing {
Species it;
union {
Tree_forms tree;
Bug_forms bug;
} form;
};
show demangle-style
Display the encoding style currently in use for decoding C++
symbols.
Value history
Values printed by the print command are saved in the GDB value
history. This lets you refer to them in other expressions. Values are
kept until the symbol table is reread or discarded (for example with
the file or symbol-file commands). When the symbol table
changes, the value history is discarded, since the values may contain
pointers back to the types defined in the symbol table.
The values printed are given history numbers, which you can use to
refer to them. These are successive integers starting with 1. The
print command shows you the history number assigned to a value
by printing $num = before the value; here num is the history number.
To refer to any previous value, use $ followed by the value’s history
number. The way print labels its output is designed to remind you
of this. Just $ refers to the most recent value in the history, and $$
refers to the value before that. $$n refers to the nth value from the
end; $$2 is the value just prior to $$, $$1 is equivalent to $$, and
$$0 is equivalent to $.
For example, suppose you have just printed a pointer to a structure
and want to see the contents of the structure. It suffices to type:
p *$
p *$.next
You can print successive links in the chain by repeating this command
— which you can do by just typing Enter.
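The chain-following trick assumes a self-referential structure; here is a hedged C sketch of the kind of list those repeated `p *$.next` commands walk (the names are illustrative, not from a specific program):

```c
#include <stddef.h>

/* a singly linked list whose 'next' field GDB follows
   each time you repeat "p *$.next" */
struct node {
    int value;
    struct node *next;
};

static struct node third  = { 3, NULL };
static struct node second = { 2, &third };
static struct node first  = { 1, &second };

/* the equivalent of following the chain one link */
static const struct node *follow(const struct node *n)
{
    return n->next;
}
```

Each `p *$.next` prints the next node and pushes it into the value history, so `$` always names the node just printed — exactly what makes pressing Enter walk the chain.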
For example, if the value of x is 4 and you type these commands:
print x
set x=5
then the value recorded in the value history by the print command
remains 4 even though the value of x has changed.
show values
Print the last ten values in the value history, with their item
numbers. This is like p $$9 repeated ten times, except that
show values doesn’t change the history.
show values n
Print ten history values centered on history item number n.
show values +
Print ten history values just after the values last printed. If no
more values are available, show values + produces no
display.
Pressing Enter to repeat show values n has exactly the same effect
as show values +.
Convenience variables
GDB provides convenience variables that you can use within GDB to
hold on to a value and refer to it later. These variables exist entirely
within GDB; they aren’t part of your program, and setting a
convenience variable has no direct effect on further execution of your
program. That’s why you can use them freely.
Convenience variables are prefixed with $. Any name preceded by $
can be used for a convenience variable, unless it’s one of the
predefined machine-specific register names (see “Registers”). Value
history references, in contrast, are numbers preceded by $; see
“Value history.”
show convenience
Print a list of convenience variables used so far, and their
values. Abbreviated show con.
set $i = 0
print bar[$i++]->contents
Registers
You can refer to machine register contents, in expressions, as
variables with names starting with $. The names of registers are
different for each machine; use info registers to see the names
used on your machine.
info registers
Print the names and values of all registers except floating-point
registers (in the selected stack frame).
info all-registers
Print the names and values of all registers, including
floating-point registers.
GDB has four “standard” register names that are available (in
expressions) on most machines — whenever they don’t conflict with
an architecture’s canonical mnemonics for registers:
$pc Contains the program counter.
$sp Contains the stack pointer.
$fp Contains a pointer to the current stack frame.
$ps Contains the processor status.
For example, you could print the program counter in hex with:
p/x $pc
set $sp += 4
Some machines have special registers that can hold nothing but
floating point; these
registers are considered to have floating point values. There’s no way
to refer to the contents of an ordinary register as floating point value
(although you can print it as a floating point value with print/f
$regname).
Some registers have distinct “raw” and “virtual” data formats. This
means that the data format in which the register contents are saved by
the operating system isn’t the same one that your program normally
sees. For example, the registers of the 68881 floating point
coprocessor are always saved in “extended” (raw) format, but all C
programs expect to work with “double” (virtual) format. In such
cases, GDB normally works with the virtual format only (the format
that makes sense for your program), but the info registers
command prints the data in both formats.
Normally, register values are relative to the selected stack frame (see
“Selecting a frame”). This means that you get the value that the
register would contain if all stack frames farther in were exited and
their saved registers restored. In order to see the true contents of
hardware registers, you must select the innermost frame (with frame
0).
However, GDB must deduce where registers are saved, from the
machine code generated by your compiler. If some registers aren’t
saved, or if GDB is unable to locate the saved registers, the selected
stack frame makes no difference.
p 'foo.c'::x
whatis exp Print the data type of expression exp. The exp
expression isn’t actually evaluated, and any
side-effecting operations (such as assignments or
function calls) inside it don’t take place. See
“Expressions.”
whatis Print the data type of $, the last value in the value
history.
ptype typename
Print a description of data type typename, which may
be the name of a type, or for C code it may have the
form:
• class class-name
• struct struct-tag
• union union-tag
• enum enum-tag
ptype exp
ptype Print a description of the type of expression exp. The
ptype command differs from whatis by printing a
detailed description, instead of just the name of the
type. For example, for this variable declaration:
struct complex {double real; double imag;} v;
info source
Show the name of the current source file — that is,
the source file for the function containing the current
point of execution — and the language it was written
in.
info sources
Print the names of all source files in your program
for which there is debugging information, organized
into two lists: files whose symbols have already been
read, and files whose symbols are read when needed.
info functions
Print the names and data types of all defined
functions.
info functions regexp
Print the names and data types of all defined
functions whose names contain a match for regular
expression regexp. Thus, info fun step finds all
functions whose names include step; info fun
^step finds those whose names start with step.
info variables
Print the names and data types of all variables that
are declared outside of functions (i.e. excluding local
variables).
Altering execution
Once you think you’ve found an error in your program, you might
want to find out for certain whether correcting the apparent error
would lead to correct results in the rest of the run. You can find the
answer by experimenting, using the GDB features for altering
execution of the program.
For example, you can store new values in variables or memory
locations, give your program a signal, restart it at a different address,
or even return prematurely from a function.
Assignment to variables
To alter the value of a variable, evaluate an assignment expression.
See “Expressions”. For example,
print x=4
stores the value 4 in the variable x and then prints the value of the
assignment expression (which is 4).
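GDB can print the assignment's value because in C an assignment is itself an expression; this minimal sketch shows the same thing in plain C:

```c
/* an assignment expression yields the assigned value,
   which is what "print x=4" displays */
static int assign_and_yield(int *x)
{
    return (*x = 4);   /* the expression's value is 4 */
}
```

The function both stores 4 through the pointer and returns 4, mirroring how `print x=4` both alters x and prints 4.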
If you aren’t interested in seeing the value of the assignment, use the
set command instead of the print command. The set command is
really the same as print except that the expression’s value isn’t
printed and isn’t put in the value history (see “Value history”). The
expression is evaluated only for its effects.
If the beginning of the argument string of the set command appears
identical to a set subcommand, use the set variable command
instead of just set. This command is identical to set except for its
lack of subcommands. For example, if your program has a variable
width, you get an error if you try to set a new value with just set
width=13, because GDB has the command set width; use set var
width=13 instead. The set command can also store values into
arbitrary places in memory: {type}addr refers to the memory at
address addr, treated as an object of type type. For example, this
stores the value 4 into the int at address 0x83040:
set {int}0x83040 = 4
You can get much the same effect as the jump command by storing a
new value in the register $pc. The difference is that this doesn’t start
your program running; it only changes the address of where it will run
when you continue. For example, this makes the next continue or
stepping command execute at address 0x485:
set $pc = 0x485
Invoking the signal command isn’t the same as invoking the kill
utility from the shell. Sending a signal with kill causes GDB to
decide what to do with the signal depending on the signal handling
tables (see “Signals”). The signal command passes the signal
directly to your program.
When you use return, GDB discards the selected stack frame (and
all frames within it). You can think of this as making the discarded
call expr Evaluate the expression expr without displaying
void returned values.
You can use this variant of the print command if you want to
execute a function from your program, but without cluttering the
output with void returned values. If the result isn’t void, it’s printed
and saved in the value history.
A user-controlled variable, call_scratch_address, specifies the location
of a scratch area to be used when GDB calls a function in the target.
This is necessary because the usual method of putting the scratch area
on the stack doesn’t work in systems that have separate instruction
and data spaces.
Patching programs
By default, GDB opens the file containing your program’s executable
code (or the core file) read-only. This prevents accidental alterations
to machine code; but it also prevents you from intentionally patching
your program’s binary.
If you’d like to be able to patch the binary, you can specify that
explicitly with the set write command. For example, you might
want to turn on internal debugging flags, or even to make emergency
repairs:
set write on
set write off
If you specify set write on, GDB opens
executable and core files for both reading and
writing; if you specify set write off (the
default), GDB opens them read-only.
If you’ve already loaded a file, you must load it
again (using the exec-file or core-file
command) after changing set write for your
new setting to take effect.
show write Display whether executable files and core files are
opened for writing as well as reading.
In this appendix. . .
ARM-specific restrictions and issues 423
ARM-specific features 427
This means that such privileged user processes execute with all the
access permission of kernel code:
made through one mapping aren’t visible through the cache entries
for other mappings.
• The FCSE remapping uses the top 7 bits of the address space,
which means there can be at most 128 slots. In practice, some of
the 4 GB virtual space is required for the kernel, so the real
number is lower.
The current limit is 63 slots:
- Slot 0 is never used.
- Slots 64-127 (0x80000000-0xFFFFFFFF) are used by the
kernel and the ARM-specific shm_ctl() support described below.
Since each process typically has its own address space, this
imposes a hard limit of at most 63 different processes.
ARM-specific features
This section describes the ARM-specific behavior of certain
operations that are provided via a processor-independent interface:
• shm_ctl() operations for defining special memory object properties
Any process that can use shm_open() on the object can map it,
not just the process that created the object.
In this appendix. . .
Low-level discussion on Qnet principles 433
Details of Qnet data communication 434
Node descriptors 436
Booting over the network 439
What doesn’t work ... 445
Server Client
MsgSend()
MsgReply()
MsgDeliverEvent()
☞ Each node in the network is assigned a unique name that becomes its
identifier. This is what we call a node descriptor. This name is the
only visible means to determine whether the OS is running as a
network or as a standalone operating system.
In the above call, nd is the node descriptor that identifies each node
uniquely. The node descriptor is the only visible means to determine
whether Neutrino is running as a network or as a standalone
operating system. If nd is zero, you’re specifying a local server
process, and you’ll get local message passing from the client to the
server, carried out by the local kernel as shown below:
MsgSend() MsgReceive()
MsgSend()
io-net
Network-card
Client machine driver
Network media
Server machine
Network-card
driver
io-net
MsgReceive()
The advantage of this approach lies in using the same API. The key
design features are:
• The kernel puts the user data directly into (and out of) the network
card’s buffers — there’s no copying of the payload.
• There are no context switches as the packet travels from (and to)
the kernel from the network card.
Node descriptors
The <sys/netmgr.h> header file
The <sys/netmgr.h> header defines the ND_LOCAL_NODE macro
as zero. You can use it any time that you’re dealing with node
descriptors to make it obvious that you’re talking about the local node.
As discussed, node descriptors represent machines, but they also
include Quality of Service information. If you want to see if two node
descriptors refer to the same machine, you can’t just arithmetically
compare the descriptors for equality; use the ND_NODE_CMP()
macro instead:
• If the return value from the macro is zero, the descriptors refer to
the same node.
• If the value is less than 0, the first node is “less than” the second.
• If the value is greater than 0, the first node is “greater than” the
second.
This is similar to the way that strcmp() and memcmp() work. It’s done
this way in case you want to do any sorting that’s based on node
descriptors.
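Because ND_NODE_CMP() follows the strcmp()-style contract, it drops straight into sorting routines such as qsort(). Since ND_NODE_CMP() itself is available only on a QNX target, this sketch uses a stand-in comparator with the same return convention (plain integers play the role of node descriptors):

```c
#include <stdlib.h>

/* stand-in with the ND_NODE_CMP()/strcmp() contract:
   return <0, 0, or >0 for less, equal, greater */
static int nd_cmp(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* sort an array of descriptor-like values */
static void sort_nds(int *nds, size_t n)
{
    qsort(nds, n, sizeof nds[0], nd_cmp);
}
```

On a real QNX system you would call ND_NODE_CMP() inside the comparator instead of comparing the raw values, since two different descriptor values can refer to the same node.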
The <sys/netmgr.h> header file also defines the following
networking functions:
• netmgr_strtond()
• netmgr_ndtostr()
netmgr_strtond()
netmgr_ndtostr()
int netmgr_ndtostr(unsigned flags,
                   int nd,
                   char *buf,
                   size_t maxbuf);
This function converts the given node descriptor into a string and
stores it in the memory pointed to by buf. The size of the buffer is
given by maxbuf. The function returns the actual length of the node
name (even if the function had to truncate the name to get it to fit into
the space specified by maxbuf), or -1 if an error occurs (errno is set).
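This return convention — the full name length even when the output was truncated — is the same one snprintf() uses, so a portable sketch of the contract looks like this (netmgr_ndtostr() itself requires a QNX target):

```c
#include <stdio.h>

/* snprintf() returns the length the output *would* have had,
   so callers can detect truncation by comparing the return
   value against the buffer size — the same check you'd make
   after netmgr_ndtostr() */
static int format_name(char *buf, size_t maxbuf, const char *name)
{
    return snprintf(buf, maxbuf, "%s", name);
}
```

If the return value is greater than or equal to maxbuf, the caller knows the name was truncated and can retry with a larger buffer.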
The flags parameter controls the conversion process, indicating which
pieces of the string are to be output. The following bits are defined:
• Network card boot ROM (e.g. PXE; bootp downloads GRUB from
the server)
1 Go to the www.gnu.org/software/grub website.
Here’s what the PXE boot ROM does to download the OS image:
host testpxe {
hardware ethernet 00:E0:29:88:0D:D3; # MAC address of system to boot
fixed-address 192.168.0.3; # This line is optional
option option-150 "(nd)/tftpboot/menu.lst"; # Tell grub to use Menu file
filename "/tftpboot/pxegrub"; # Location of PXE grub image
}
# End dhcpd.conf
# menu.lst end
Building an OS image
This section contains a functional buildfile that you can use to create
an OS image that can be loaded by GRUB without a hard disk or any
local storage.
Create the image by typing the following:
☞ In a real buildfile, you can’t use a backslash (\) to break a long line
into shorter pieces, but we’ve done that here, just to make the buildfile
easier to read.
PATH=/proc/boot:/bin:/usr/bin:/sbin:/usr/sbin: \
/usr/local/bin:/usr/local/sbin \
LD_LIBRARY_PATH=/proc/boot: \
/lib:/usr/lib:/lib/dll procnto
}
[+script] startup-script = {
procmgr symlink ../../proc/boot/libc.so.2 /usr/lib/ldqnx.so.2
#
# do magic required to set up PnP and pci bios on x86
#
display msg Do the BIOS magic ...
seedres
pci-bios
waitfor /dev/pci
#
# A really good idea is to set hostname and domain
# before qnet is started
#
setconf _CS_HOSTNAME aboyd
setconf _CS_DOMAIN ott.qnx.com
#
# If you do not set the hostname to something
# unique before qnet is started, qnet will try
# to create and set the hostname to a hopefully
# unique string constructed from the ethernet
# address, which will look like EAc07f5e
# which will probably work, but is pretty ugly.
#
#
# start io-net, network driver and qnet
#
# NB to help debugging, add verbose=1 after -pqnet below
#
display msg Starting io-net and speedo driver and qnet ...
io-net -dspeedo -pqnet
#
# Now that we can fetch executables from the remote server
#
# now print out some interesting techie-type information
#
display msg hostname:
getconf _CS_HOSTNAME
display msg domain:
getconf _CS_DOMAIN
display msg uname -a:
uname -a
#
# create some text consoles
#
display msg .
display msg Starting 3 text consoles which you can flip
display msg between by holding ctrl alt + OR ctrl alt -
display msg .
devc-con -n3
waitfor /dev/con1
#
# start up some command line shells on the text consoles
#
reopen /dev/con1
[+session] TERM=qansi HOME=/ PATH=/bin:/usr/bin:\
/usr/local/bin:/sbin:/usr/sbin:/usr/local/sbin:\
/proc/boot ksh &
reopen /dev/con2
[+session] TERM=qansi HOME=/ PATH=/bin:/usr/bin:\
/usr/local/bin:/sbin:/usr/sbin:\
/usr/local/sbin:/proc/boot ksh &
reopen /dev/con3
[+session] TERM=qansi HOME=/ PATH=/bin:\
/usr/bin:/usr/local/bin:/sbin:/usr/sbin:\
/usr/local/sbin:/proc/boot ksh &
#
# startup script ends here
#
}
#
# Let’s create some links in the virtual file system so that
# applications are fooled into thinking there’s a local hard disk
#
#
# Make /tmp point to the shared memory area
#
[type=link] /tmp=/dev/shmem
#
# Redirect console (error) messages to con1
#
[type=link] /dev/console=/dev/con1
#
# Now for the diskless qnet magic. In this example, we are booting
# using a server which has the hostname qpkg. Since we do not have
# a hard disk, we will create links to point to the server’s disk
#
[type=link] /bin=/net/qpkg/bin
[type=link] /boot=/net/qpkg/boot
[type=link] /etc=/net/qpkg/etc
[type=link] /home=/net/qpkg/home
[type=link] /lib=/net/qpkg/lib
[type=link] /opt=/net/qpkg/opt
[type=link] /pkgs=/net/qpkg/pkgs
[type=link] /root=/net/qpkg/root
[type=link] /sbin=/net/qpkg/sbin
[type=link] /usr=/net/qpkg/usr
[type=link] /var=/net/qpkg/var
[type=link] /x86=/
#
# these are essential shared libraries which must be in the
# image for us to start io-net, the ethernet driver and qnet
#
libc.so
devn-speedo.so
npm-qnet.so
#
# copy code and data for all following executables
# which will be located in /proc/boot in the image
#
[data=copy]
seedres
pci-bios
setconf
io-net
waitfor
Troubleshooting
If the boot is unsuccessful, troubleshoot as follows.
Make sure your:
• inetd is running
A20 gate
On x86-based systems, a hardware component that forces the A20
address line on the bus to zero, regardless of the actual setting of the
A20 address line on the processor. This component is in place to
support legacy systems, but the QNX Neutrino OS doesn’t require
any such hardware. Note that some processors, such as the 386EX,
have the A20 gate hardware built right into the processor itself — our
IPL will disable the A20 gate as soon as possible after startup.
adaptive
Scheduling algorithm whereby a thread’s priority is decayed by 1
when the thread consumes its timeslice. See also FIFO, round robin,
and sporadic.
atomic
Of or relating to atoms. :-)
In operating systems, this refers to the requirement that an operation,
or sequence of operations, be considered indivisible. For example, a
thread may need to move a file position to a given location and read
data. These operations must be performed in an atomic manner;
otherwise, another thread could preempt the original thread and move
the file position to a different location, thus causing the original thread
to read data from the second thread’s position.
attributes structure
Structure containing information used on a per-resource basis (as
opposed to the OCB, which is used on a per-open basis).
This structure is also known as a handle. The structure definition is
fixed (iofunc attr t), but may be extended. See also mount
structure.
bank-switched
A term indicating that a certain memory component (usually the
device holding an image) isn’t entirely addressable by the processor.
In this case, a hardware component manifests a small portion (or
“window”) of the device onto the processor’s address bus. Special
commands must be issued to the hardware to move the window to
different locations in the device.
block-integral
The requirement that data be transferred such that individual structure
components are transferred in their entirety — no partial structure
component transfers are allowed.
In a resource manager, directory data must be returned to a client as
block-integral data. This means that only complete struct dirent
structures can be returned — it’s inappropriate to return partial
structures, assuming that the next _IO_READ request will "pick up"
where the previous one left off.
bootable
An image can be either bootable or nonbootable. A bootable image is
one that contains the startup code that the IPL can transfer control to.
bootfile
The part of an OS image that runs the startup code and the Neutrino
microkernel.
budget
In sporadic scheduling, the amount of time a thread is permitted to
execute at its normal priority before being dropped to its low priority.
buildfile
A text file containing instructions for mkifs specifying the contents
and other details of an image, or for mkefs specifying the contents
and other details of an embedded filesystem image.
canonical mode
Also called edited mode or “cooked” mode. In this mode the
character device library performs line-editing operations on each
received character. Only when a line is “completely entered” —
typically when a carriage return (CR) is received — will the line of
data be made available to application processes. Contrast raw mode.
channel
A kernel object used with message passing.
In QNX Neutrino, message passing is directed towards a connection
(made to a channel); threads can receive messages from channels. A
thread that wishes to receive messages creates a channel (using
ChannelCreate()), and then receives messages from that channel
(using MsgReceive()). Another thread that wishes to send a message
to the first thread must make a connection to that channel by
“attaching” to the channel (using ConnectAttach()) and then sending
data (using MsgSend()).
CIFS
Common Internet File System (aka SMB) — a protocol that allows a
client workstation to perform transparent file access over a network to
a Windows 95/98/NT server. Client file access calls are converted to
CIFS protocol requests and are sent to the server over the network.
The server receives the request, performs the actual filesystem
operation, and sends a response back to the client.
CIS
Card Information Structure — a data block that maintains information
about flash configuration. The CIS description includes the types of
memory devices in the regions, the physical geometry of these
devices, and the partitions located on the flash.
combine message
A resource manager message that consists of two or more messages.
The messages are constructed as combine messages by the client’s C
library (e.g. stat(), readblock()), and then handled as individual
messages by the resource manager.
The purpose of combine messages is to conserve network bandwidth
and/or to provide support for atomic operations. See also connect
message and I/O message.
connect message
In a resource manager, a message issued by the client to perform an
operation based on a pathname (e.g. an io_open message).
Depending on the type of connect message sent, a context block (see
OCB) may be associated with the request and will be passed to
subsequent I/O messages. See also combine message and I/O
message.
connection
A kernel object used with message passing.
Connections are created by client threads to “connect” to the channels
made available by servers. Once connections are established, clients
can MsgSendv() messages over them. If a number of threads in a
process all attach to the same channel, then the one connection is
shared among all the threads. Channels and connections are identified
within a process by a small integer.
The key thing to note is that connections and file descriptors (FD) are
one and the same object. See also channel and FD.
context
Information retained between invocations of functionality.
When using a resource manager, the client sets up an association or
context within the resource manager by issuing an open() call and
getting back a file descriptor. The resource manager is responsible for
storing the information required by the context (see OCB). When the
client issues further file-descriptor based messages, the resource
manager uses the OCB to determine the context for interpretation of
the client’s messages.
cooked mode
See canonical mode.
core dump
A file describing the state of a process that terminated abnormally.
critical section
A code passage that must be executed “serially” (i.e. by only one
thread at a time). The simplest form of critical section enforcement is
via a mutex.
deadlock
A condition in which one or more threads are unable to continue due
to resource contention. A common form of deadlock can occur when
one thread sends a message to another, while the other thread sends a
message to the first. Both threads are now waiting for each other to
reply to the message. Deadlock can be avoided by good design
practices or massive kludges — we recommend the good design
approach.
device driver
A process that allows the OS and application programs to make use of
the underlying hardware in a generic way (e.g. a disk drive, a network
interface). Unlike OSs that require device drivers to be tightly bound
into the OS itself, device drivers for QNX Neutrino are standard
processes that can be started and stopped dynamically. As a result,
adding device drivers doesn’t affect any other part of the OS —
drivers can be developed and debugged like any other application.
Also, device drivers are in their own protected address space, so a bug
in a device driver won’t cause the entire OS to shut down.
DNS
Domain Name Service — an Internet protocol used to convert ASCII
domain names into IP addresses. In QNX native networking, dns is
one of Qnet’s builtin resolvers.
dynamic bootfile
An OS image built on the fly. Contrast static bootfile.
dynamic linking
The process whereby you link your modules in such a way that the
Process Manager will link them to the library modules before your
program runs. The word “dynamic” here means that the association
between your program and the library modules that it uses is done at
load time, not at link time. Contrast static linking. See also runtime
loading.
edge-sensitive
One of two ways in which a PIC (Programmable Interrupt Controller)
can be programmed to respond to interrupts. In edge-sensitive mode,
the interrupt is “noticed” upon a transition to/from the rising/falling
edge of a pulse. Contrast level-sensitive.
edited mode
See canonical mode.
EOI
End Of Interrupt — a command that the OS sends to the PIC after
processing all Interrupt Service Routines (ISR) for that particular
interrupt source so that the PIC can reset the processor’s In Service
Register. See also PIC and ISR.
EPROM
Erasable Programmable Read-Only Memory — a memory
technology that allows the device to be programmed (typically with
higher-than-operating voltages, e.g. 12V), with the characteristic that
any bit (or bits) may be individually programmed from a 1 state to a 0
state. To change a bit from a 0 state into a 1 state can only be
accomplished by erasing the entire device, setting all of the bits to a 1
state. Erasing is accomplished by shining an ultraviolet light through
the erase window of the device for a fixed period of time (typically
10-20 minutes). The device is further characterized by having a
limited number of erase cycles (typically 10^5 to 10^6). Contrast flash
and RAM.
event
A notification scheme used to inform a thread that a particular
condition has occurred. Events can be signals or pulses in the general
case; they can also be unblocking events or interrupt events in the
case of kernel timeouts and interrupt service routines. An event is
delivered by a thread, a timer, the kernel, or an interrupt service
routine when appropriate to the requestor of the event.
FD
File Descriptor — a client must open a file descriptor to a resource
manager via the open() function call. The file descriptor then serves
as a handle for the client to use in subsequent messages. Note that a
file descriptor is the exact same object as a connection ID (coid,
returned by ConnectAttach()).
FIFO
First In First Out — a scheduling algorithm whereby a thread is able
to consume CPU at its priority level without bounds. See also
adaptive, round robin, and sporadic.
flash memory
A memory technology similar in characteristics to EPROM memory,
with the exception that erasing is performed electrically instead of via
ultraviolet light, and, depending upon the organization of the flash
memory device, erasing may be accomplished in blocks (typically
64k bytes at a time) instead of the entire device. Contrast EPROM
and RAM.
FQNN
Fully Qualified NodeName — a unique name that identifies a QNX
Neutrino node on a network. The FQNN consists of the nodename
plus the node domain tacked together.
garbage collection
Aka space reclamation, the process whereby a filesystem manager
recovers the space occupied by deleted files and directories.
HA
High Availability — in telecommunications and other industries, HA
describes a system’s ability to remain up and running without
interruption for extended periods of time.
handle
A pointer that the resource manager base library binds to the
pathname registered via resmgr_attach(). This handle is typically used
to associate some kind of per-device information. Note that if you use
the iofunc_*() POSIX layer calls, you must use a particular type of
handle — in this case called an attributes structure.
image
In the context of embedded QNX Neutrino systems, an “image” can
mean either a structure that contains files (i.e. an OS image) or a
structure that can be used in a read-only, read/write, or
read/write/reclaim FFS-2-compatible filesystem (i.e. a flash
filesystem image).
interrupt
An event (usually caused by hardware) that interrupts whatever the
processor was doing and asks it to do something else. The hardware will
generate an interrupt whenever it has reached some state where
software intervention is required.
interrupt handler
See ISR.
interrupt latency
The amount of elapsed time between the generation of a hardware
interrupt and the first instruction executed by the relevant interrupt
service routine. Also designated as "Til". Contrast scheduling
latency.
I/O message
A message that relies on an existing binding between the client and
the resource manager. For example, an _IO_READ message depends
on the client’s having previously established an association (or
context) with the resource manager by issuing an open() and getting
back a file descriptor. See also connect message, context, combine
message, and message.
I/O privileges
A particular right that, if enabled for a given thread, allows the thread
to perform I/O instructions (such as the x86 assembler in and out
instructions). By default, I/O privileges are disabled, because a
program with them enabled can wreak havoc on a system. To enable
I/O privileges, the thread must be running as root and must call
ThreadCtl().
IPC
Interprocess Communication — the ability for two processes (or
threads) to communicate. QNX Neutrino offers several forms of IPC,
most notably native messaging (synchronous, client/server
relationship), POSIX message queues and pipes (asynchronous), as
well as signals.
IPL
Initial Program Loader — the software component that either takes
control at the processor’s reset vector (e.g. location 0xFFFFFFF0 on
the x86), or is a BIOS extension. This component is responsible for
setting up the machine into a usable state, such that the startup
program can then perform further initializations. The IPL is written in
assembler and C. See also BIOS extension signature and startup
code.
IRQ
Interrupt Request — a hardware request line asserted by a peripheral
to indicate that it requires servicing by software. The IRQ is handled
by the PIC, which then interrupts the processor, usually causing the
processor to execute an Interrupt Service Routine (ISR).
ISR
Interrupt Service Routine — a routine responsible for servicing
hardware (e.g. reading and/or writing some device ports), for
updating some data structures shared between the ISR and the
thread(s) running in the application, and for signalling the thread that
some kind of event has occurred.
kernel
See microkernel.
level-sensitive
One of two ways in which a PIC (Programmable Interrupt Controller)
can be programmed to respond to interrupts. If the PIC is operating in
level-sensitive mode, the IRQ is considered active whenever the
corresponding hardware line is active. Contrast edge-sensitive.
linearly mapped
A term indicating that a certain memory component is entirely
addressable by the processor. Contrast bank-switched.
message
A parcel of bytes passed from one process to another. The OS
attaches no special meaning to the content of a message — the data in
a message has meaning for the sender of the message and for its
receiver, but for no one else.
Message passing not only allows processes to pass data to each other,
but also provides a means of synchronizing the execution of several
processes. As they send, receive, and reply to messages, processes
undergo various “changes of state” that affect when, and for how
long, they may run.
microkernel
A part of the operating system that provides the minimal services
used by a team of optional cooperating processes, which in turn
provide the higher-level OS functionality. The microkernel itself lacks
filesystems and many other services normally expected of an OS;
those services are provided by optional processes.
mount structure
An optional, well-defined data structure (of type iofunc_mount_t)
within an iofunc_*() structure, which contains information used on a
per-mountpoint basis (generally used only for filesystem resource
managers). See also attributes structure and OCB.
mountpoint
The location in the pathname space where a resource manager has
“registered” itself. For example, the serial port resource manager
registers mountpoints for each serial device (/dev/ser1,
/dev/ser2, etc.), and a CD-ROM filesystem may register a single
mountpoint of /cdrom.
mutex
Mutual exclusion lock, a simple synchronization service used to
ensure exclusive access to data shared between threads. It is typically
acquired (pthread_mutex_lock()) and released
(pthread_mutex_unlock()) around the code that accesses the shared
data (usually a critical section). See also critical section.
name resolution
In a QNX Neutrino network, the process by which the Qnet network
manager converts an FQNN to a list of destination addresses that the
transport layer knows how to get to.
name resolver
Program code that attempts to convert an FQNN to a destination
address.
NDP
Node Discovery Protocol — proprietary QNX Software Systems
protocol for broadcasting name resolution requests on a QNX
Neutrino LAN.
network directory
A directory in the pathname space that’s implemented by the Qnet
network manager.
Neutrino
Name of an OS developed by QNX Software Systems.
NFS
Network FileSystem — a TCP/IP application that lets you graft
remote filesystems (or portions of them) onto your local namespace.
Directories on the remote systems appear as part of your local
filesystem and all the utilities you use for listing and managing files
(e.g. ls, cp, mv) operate on the remote files exactly as they do on
your local files.
NMI
Nonmaskable Interrupt — an interrupt that can’t be masked by the
processor. We don’t recommend using an NMI!
node domain
A character string that the Qnet network manager tacks onto the
nodename to form an FQNN.
nodename
A unique name consisting of a character string that identifies a node
on a network.
nonbootable
A nonbootable OS image is usually provided for larger embedded
systems or for small embedded systems where a separate,
configuration-dependent setup may be required. Think of it as a
second “filesystem” that has some additional files on it. Since it’s
nonbootable, it typically won’t contain the OS, startup file, etc.
Contrast bootable.
OCB
Open Control Block (or Open Context Block) — a block of data
established by a resource manager during its handling of the client’s
open() function. This context block is bound by the resource manager
to this particular request, and is then automatically passed to all
subsequent I/O functions generated by the client on the file descriptor
returned by the client’s open().
package filesystem
A virtual filesystem manager that presents a customized view of a set
of files and directories to a client. The “real” files are present on some
medium; the package filesystem presents a virtual view of selected
files to the client.
pathname prefix
See mountpoint.
persistent
When applied to storage media, the ability for the medium to retain
information across a power-cycle. For example, a hard disk is a
persistent storage medium, whereas a ramdisk is not, because the data
is lost when power is lost.
Photon microGUI
The proprietary graphical user interface built by QNX Software
Systems.
PIC
Programmable Interrupt Controller — hardware component that
handles IRQs. See also edge-sensitive, level-sensitive, and ISR.
PID
Process ID. Also often pid (e.g. as an argument in a function call).
POSIX
An IEEE/ISO standard. The term is an acronym (of sorts) for Portable
Operating System Interface — the “X” alludes to “UNIX”, on which
the interface is based.
preemption
The act of suspending the execution of one thread and starting (or
resuming) another. The suspended thread is said to have been
"preempted" by the new thread. Whenever a lower-priority thread is
running and a higher-priority thread becomes READY, the
lower-priority thread is immediately preempted by the
higher-priority thread.
prefix tree
The internal representation used by the Process Manager to store the
pathname table.
priority inheritance
The characteristic of a thread that causes its priority to be raised or
lowered to that of the thread that sent it a message. Also used with
mutexes. Priority inheritance is a method used to prevent priority
inversion.
priority inversion
A condition that can occur when a low-priority thread consumes CPU
at a higher priority than it should. This can be caused by not
supporting priority inheritance, such that when the lower-priority
thread sends a message to a higher-priority thread, the higher-priority
thread consumes CPU on behalf of the lower-priority thread. This is
solved by having the higher-priority thread inherit the priority of the
thread on whose behalf it’s working.
process
A nonschedulable entity, which defines the address space and a few
data areas. A process must have at least one thread running in it —
this thread is then called the first thread.
process group
A collection of processes that permits the signalling of related
processes. Each process in the system is a member of a process group
identified by a process group ID. A newly created process joins the
process group of its creator.
process group ID
The unique identifier representing a process group during its lifetime.
A process group ID is a positive integer. The system may reuse a
process group ID after the process group dies.
process ID (PID)
The unique identifier representing a process. A PID is a positive
integer. The system may reuse a process ID after the process dies,
provided no existing process group has the same ID. Only the Process
Manager can have a process ID of 1.
pty
Pseudo-TTY — a character-based device that has two “ends”: a
master end and a slave end. Data written to the master end shows up
on the slave end, and vice versa. These devices are typically used to
interface between a program that expects a character device and
another program that wishes to use that device (e.g. the shell and the
telnet daemon process, used for logging in to a system over the
Internet).
pulses
In addition to the synchronous Send/Receive/Reply services, QNX
Neutrino also supports fixed-size, nonblocking messages known as
pulses. These carry a small payload (four bytes of data plus a single
byte code). A pulse is also one form of event that can be returned
from an ISR or a timer. See MsgDeliverEvent() for more information.
Qnet
The native network manager in QNX Neutrino.
QoS
Quality of Service — a policy (e.g. loadbalance) used to connect
nodes in a network in order to ensure highly dependable transmission.
QoS is an issue that often arises in high-availability (HA) networks as
well as realtime control systems.
RAM
Random Access Memory — a memory technology characterized by
the ability to read and write any location in the device without
limitation. Contrast flash and EPROM.
raw mode
In raw input mode, the character device library performs no editing on
received characters. This reduces the processing done on each
character to a minimum and provides the highest performance
interface for reading data. Also, raw mode is used with devices that
typically generate binary data — you don’t want any translations of
the raw binary stream between the device and the application.
Contrast canonical mode.
replenishment
In sporadic scheduling, the period of time during which a thread is
allowed to consume its execution budget.
reset vector
The address at which the processor begins executing instructions after
the processor’s reset line has been activated. On the x86, for example,
this is the address 0xFFFFFFF0.
resource manager
A user-level server program that accepts messages from other
programs and, optionally, communicates with hardware. QNX
Neutrino resource managers are responsible for presenting an
interface to various types of devices, whether actual (e.g. serial ports,
parallel ports, network cards, disk drives) or virtual (e.g. /dev/null,
a network filesystem, and pseudo-ttys).
RMA
Rate Monotonic Analysis — a set of methods used to specify,
analyze, and predict the timing behavior of realtime systems.
round robin
Scheduling algorithm whereby a thread is given a certain period of
time to run. Should the thread consume CPU for the entire period of
its timeslice, the thread will be placed at the end of the ready queue
for its priority, and the next available thread will be made READY. If
a thread is the only thread READY at its priority level, it will be able
to consume CPU again immediately. See also adaptive, FIFO, and
sporadic.
runtime loading
The process whereby a program decides while it’s actually running
that it wishes to load a particular function from a library. Contrast
static linking.
scheduling latency
The amount of time that elapses between the point when one thread
makes another thread READY and when the other thread actually gets
some CPU time. Note that this latency is almost always at the control
of the system designer.
Also designated as "Tsl". Contrast interrupt latency.
session
A collection of process groups established for job control purposes.
Each process group is a member of a session. A process belongs to
the session that its process group belongs to. A newly created process
joins the session of its creator. A process can alter its session
membership via setsid(). A session can contain multiple process
groups.
session leader
A process whose death causes all processes within its process group
to receive a SIGHUP signal.
software interrupts
Similar to a hardware interrupt (see interrupt), except that the source
of the interrupt is software.
sporadic
Scheduling algorithm whereby a thread’s priority can oscillate
dynamically between a “foreground” or normal priority and a
“background” or low priority. A thread is given an execution budget
of time to be consumed within a certain replenishment period. See
also adaptive, FIFO, and round robin.
startup code
The software component that gains control after the IPL code has
performed the minimum necessary amount of initialization. After
gathering information about the system, the startup code transfers
control to the OS.
static bootfile
An image created at one time and then transmitted whenever a node
boots. Contrast dynamic bootfile.
static linking
The process whereby you combine your modules with the modules
from the library to form a single executable that’s entirely
self-contained. The word “static” implies that it’s not going to change
— all the required modules are already combined into one.
thread
The schedulable entity under QNX Neutrino. A thread is a flow of
execution; it exists within the context of a process.
timer
A kernel object used in conjunction with time-based functions. A
timer is created via timer_create() and armed via timer_settime(). A
timer can then deliver an event, either periodically or on a one-shot
basis.
timeslice
A period of time assigned to a round-robin or adaptive scheduled
thread. This period of time is small (on the order of tens of
milliseconds); the actual value shouldn’t be relied upon by any
program (it’s considered bad design).