Concurrent Programming Using Threads
1.1 Introduction
In a multitasking system, a task to be solved can be divided into subtasks, and the subtasks
can be executed by separate processes. The operating system reserves a separate area of memory
for every process. On a multiprocessor computer, processes are executed in parallel: each
process is executed by a separate processor. On a single-processor computer, the operating
system uses a hardware clock to allocate “time slices” to each currently running process. If
the time slices are small and the computer is not overloaded with too many processes trying to
do something, it appears to the user as if all the processes are running concurrently.
Threads are often called lightweight processes. They also permit concurrent execution of
subtasks. Managing threads requires fewer system resources than managing processes. Each
thread has its own program counter, stack, and set of registers, but all the other data
structures belong to the process that created the threads, and the threads share them. Each
process may consist of a single thread, like Process 1, or of several threads, like Processes
2 to 5 (see Fig. 1.1). Communication between threads is more efficient, and in many cases
easier to use, than communication between processes. The operating system does not control
concurrent access to the same data structures by different threads; this is the programmer’s
responsibility. How to do it is described in the following sections. The rest of this work
describes threads rather than parallel processes.
There are two kinds of threads: user-level threads and kernel-level threads (Fig. 1.1).
User-level threads avoid the kernel entirely. The kernel is not aware of the existence of such
threads; it only knows about the process that contains them. All thread management is done
by the application using a thread library, and thread switching can be done without kernel
intervention. User-level threads can run on any operating system; they only need a thread
library. The thread library uses underlying threads of control, called light-weight processes,
that are supported by the kernel. A light-weight process (LWP) is like a virtual processor
that executes code or system calls. It is a bridge between the user level and the kernel level.
Each process consists of one or more light-weight processes, each of which runs one or more
user-level threads. The kernel, however, does not know how many threads are executed by a
light-weight process; it treats each light-weight process like a single-threaded process. For
example, Process 2 (Fig. 1.1) contains a single light-weight process, which runs two threads,
whereas Process 4 contains two light-weight processes, each of which executes a single thread.
Figure 1.1: User-level and kernel-level threads
Fig. 1.1 is taken from the Sun web page:
http://docs.sun.com/app/docs/doc/801-6659/6i116aqmb?a=view
A light-weight process is also referred to as a kernel-level thread: there is a one-to-one
mapping between light-weight processes and kernel-level threads. Each process in the system is
associated with one or more kernel-level threads. For example, Process 1 (Fig. 1.1) is
associated with only one kernel-level thread, whereas Process 5 is associated with three. In
contrast to user-level threads, the kernel is aware of the existence of kernel-level threads,
and all their management is done by the kernel. There is no thread library, only an API
(Application Programming Interface) to the kernel’s thread management. The kernel schedules
the kernel-level threads on the available processors and decides which processor executes
each kernel-level thread.
Threads can be executed concurrently if they are not run by the same light-weight process.
If threads are run by the same light-weight process, like the threads in Process 2 (Fig. 1.1),
they cannot be executed concurrently, because they are executed by the same processor.
For the same reason, when one of these threads blocks, e.g. during an input-output operation,
the other one is blocked as well. The situation is different in Process 5. It contains three
light-weight processes, the last of which runs only a single thread. This thread can therefore
run concurrently with the other three threads, because it is not executed on the same
processor as they are.
Concurrent programming brings its own problems: synchronization, mutual exclusion and
deadlock. Threads often share resources such as variables, files or printers. Sometimes a
thread must have exclusive access to a resource. For example, while a thread is updating a
variable, no other thread should access this variable. A sequence of instructions that
accesses a shared resource is called a critical section. The need to restrict access to a
resource is termed mutual exclusion. It involves the following:
• If there are multiple requests for a resource, it must be granted to one of the threads in
finite time.
• When a thread has exclusive access to a shared resource, it must release it in finite time.
• When a thread requests a resource, it must obtain the resource in finite time.
• A thread should not consume processor time while waiting for a resource.
There are several mechanisms that can be used to solve the mutual exclusion problem. We
will describe semaphores, mutexes and monitors.
The second problem related to concurrent programming is deadlock. It is a situation in which
a set of threads is prevented from making any further progress by their mutually incompatible
demands for additional resources. Deadlock can occur if, and only if, the following
conditions all hold together:
• threads claim exclusive access to the resources they use,
• threads continue to hold a resource while waiting for a new resource request to be granted,
• a resource cannot be taken away from the thread that holds it,
• there is a cycle of threads, each of which is awaiting a resource held by the next thread
in the cycle.
A deadlock may be prevented by weakening one or more of these conditions. For example, the
condition that a thread holds resources while waiting for new ones may be removed by requiring
a thread to request all the resources it needs at one time. The circular-wait condition may be
removed by imposing a total ordering on resources and insisting that they be requested in
that order.
The following sections present the Pthreads library for creating and synchronizing threads.
We describe thread synchronization using semaphores, mutexes and condition variables. Only
the part of the library that is indispensable for the laboratory exercise is presented; this
is not a complete Pthreads manual. We assume that the student has a basic knowledge of the
theoretical problems presented in the lectures.
where example is the name of the executable (output) file and example.c is the name of the
source file.
where:
If thread creation is not successful, the function returns a non-zero error code. Otherwise
the function returns zero, the id of the newly created thread is stored in the location
pointed to by the first argument, and the thread_func function is called.
The pthread_exit function terminates the execution of the calling thread. It is called after a
thread has completed its work and is no longer required to exist. The value argument is the
return value of the thread. It can be obtained by another thread of the same process by
calling the pthread_join function (described in section 1.3.3). The pthread_exit function
terminates the thread but does not close files: any files opened inside the thread remain
open after the thread has terminated. It does not free the memory allocated inside the
thread, either.
A thread can terminate another thread. To do so, it calls the function:
where tid is the id of the thread that is to be terminated. The pthread_cancel function sends
a cancellation request to this thread. The target thread can ignore the request, honor it
immediately, or defer it until it reaches a cancellation point. The last option is the default
for all threads created with default attributes. Cancellation points are provided by functions
that might suspend the execution of a thread for a long time, for example:
• pthread_join,
• sem_wait.
If a thread executes any of these functions, it checks for deferred cancellation requests. The
functions mentioned above are described in the following sections.
where:
The joined thread must be in the joinable state. All threads created with default attributes
are joinable. On success, 0 is returned and the return value of the terminated thread is
stored in the location pointed to by the value argument. On failure a non-zero value is
returned.
When a joinable thread terminates, its memory resources (thread descriptor and stack) are
not deallocated until another thread calls the pthread_join function on it. Therefore,
pthread_join must be called once for each joinable thread to avoid memory leaks.
1.3.4 Sample code
1: #include <stdio.h>
2: #include <pthread.h>
The total of the values in the matrix is computed by N threads. Each thread computes the
total of one row and returns it to the main function, which adds the row totals together. The
main function initializes the matrix (lines 21-23) and creates N threads (lines 24-29). Each
of the N threads is created by the pthread_create function (line 25) and is represented by a
descriptor. The thread descriptors are declared in line 19 as an array of N values of type
pthread_t. The threads share access to the matrix, which is declared as a global variable
(line 5). The matrix is not protected by a critical section because the threads only read it
and never update it. Moreover, no two threads read the same cell of the matrix, because the
values in a given row are summed by only one thread.
Each thread executes the SumValues function, which is defined in lines 6-15. The thread
function has a single argument, which is passed to it when the thread is created: it is the
last argument of the pthread_create function. Here it is the index of the matrix row to be
summed by the thread. Each thread computes the total of one row (lines 11-12) and returns it
by calling the pthread_exit function (line 14).
The main function waits for all threads to terminate using the pthread_join function (line
33) and computes the total of the values in the matrix by summing the values returned by the
threads (line 34). When the total has been computed, it is printed to standard output using
the printf function (line 36).
1.4 Mutexes
A mutex is a mutual exclusion device that is useful for protecting shared resources from
concurrent modifications. In fact, this is how the mutex got its name: MUTual EXclusion.
It has two possible states: unlocked (not owned by any thread) and locked (owned by one
thread).
Each mutex has a locking count (the number of lock operations performed on it by the calling
thread). A mutex can never be owned by two different threads at the same time. A thread
attempting to lock a mutex that is already locked by another thread is suspended until the
owning thread releases the mutex. At that point the suspended thread wakes up and continues
execution, now holding the mutex. A suspended thread does not consume any CPU resources.
Locking a mutex is an atomic operation.
1.4.1 Creating a mutex
Mutexes in the Pthreads library are represented by variables of type pthread_mutex_t. A
mutex must be initialized before it can be used. It is initialized by calling the function:
The pthread_mutex_init function returns 0 on success (POSIX also allows it to return a
non-zero error code). The mutex is initially unlocked.
The pthread_mutex_trylock function behaves identically to the pthread_mutex_lock function,
except that it does not block the calling thread if the mutex is already locked by another
thread. Instead it returns immediately with the error code EBUSY.
On exit from a critical section the mutex is released by calling the function:
On success the function returns 0, otherwise a non-zero error code. A thread should always
release a mutex that it has locked. If the mutex was created with default attributes,
attempting to unlock it when it was not locked by the calling thread results in undefined
behavior. The function decrements the locking count of the mutex, and only when this count
reaches 0 is the mutex actually unlocked.
Mutexes with default attributes suffice for the tasks in the laboratory exercises.
1.4.3 Destroying a mutex
If a mutex is no longer needed, it should be destroyed by calling the function:
The mutex must be unlocked when this function is called. On success the function returns 0,
otherwise it returns a non-zero error code.
#include <stdio.h>
#include <stdlib.h>     /* exit */
#include <stdint.h>     /* intptr_t */
#include <pthread.h>

#define M 5             /* number of matrix columns */
#define N 10            /* number of matrix rows */

int matrix[N][M];
int total;              /* the total of the values in the matrix */
pthread_mutex_t mutex;  /* mutex protecting the "total" variable */

/* thread function; it sums the values in one row of the matrix */
void *SumValues(void *i)
{
    int n = (int)(intptr_t)i;       /* row number */
    int my_total = 0;               /* the total of the values in the row */
    int j;

    for (j = 0; j < M; j++)         /* sum the values in row "n" */
        my_total += matrix[n][j];
    printf("The total in row %d is %d\n", n, my_total);

    pthread_mutex_lock(&mutex);     /* lock the mutex */
    total += my_total;              /* update the global total */
    pthread_mutex_unlock(&mutex);   /* release the mutex */

    pthread_exit(NULL);             /* terminate the thread */
}

int main(int argc, char *argv[])
{
    int i, j;
    pthread_t threads[N];           /* descriptors of threads */

    for (i = 0; i < N; i++)         /* initialize the matrix */
        for (j = 0; j < M; j++)
            matrix[i][j] = i * M + j;

    pthread_mutex_init(&mutex, NULL);   /* initialize the mutex */

    for (i = 0; i < N; i++)         /* create the threads */
        if (pthread_create(&threads[i], NULL, SumValues, (void *)(intptr_t)i))
        {
            printf("Cannot create a thread\n");
            exit(1);
        }

    for (i = 0; i < N; i++)         /* wait for the threads to terminate */
        pthread_join(threads[i], NULL);   /* the threads return no value */

    printf("The total of the values in the matrix is %d\n", total);
    pthread_mutex_destroy(&mutex);  /* destroy the mutex */
    return 0;
}
1.5 Semaphores
Semaphores are used by the operating system to synchronize processes or threads. They
are useful for synchronizing the access of different processes to shared resources. Semaphores
are a programming construct designed by Edsger W. Dijkstra in 1965 and published in his
paper “Co-operating Sequential Processes”. A semaphore S is an integer variable that takes
only non-negative values. Before it can be used, it must be initialized with a non-negative
value. A semaphore S can be accessed only through two atomic operations, P(S) and V(S), which
are defined as:
P(S):                              V(S):
  if S > 0 then                      if some processes are suspended then
    S := S - 1                         wake up one
  else                               else
    suspend calling process            S := S + 1
  end if                             end if
The letters P and V come from the Dutch words proberen (to try) and verhogen (to raise),
or passeren (to pass) and vrijgeven (to release). In English the P and V operations are
called wait and signal.
If a semaphore may take any non-negative integer value, it is called a general semaphore.
It may be used as a counter of resources shared between threads. A semaphore that takes only
two values, 0 and 1, is called a binary semaphore.
When using semaphores, you must be careful to avoid deadlock. Deadlock occurs
when a process locks a semaphore and later tries to lock the same semaphore again while
it has the value 0. The process cannot resume until the semaphore is unlocked, and cannot
unlock the semaphore until it can resume processing, so the process never resumes. At the
same time, any other process that depends on the same semaphore will halt when it tries to
lock the semaphore.
where:
• shared - if non-zero then a semaphore is shared between processes, otherwise it is local
to the current process,
The sem_init function initializes a semaphore and returns 0 on success; otherwise it returns -1.
The sem_wait function suspends the calling thread until the semaphore has a non-zero count,
and then decrements the count. It returns 0 on success (POSIX also allows it to fail with -1,
for example when it is interrupted by a signal). In order to perform the V(semaphore)
operation we should use the function:
It increments the value of the semaphore. This function never blocks; it returns 0 on success
and -1 on error.
There is also a non-blocking variant of the sem_wait function:
If the semaphore count is 0, the calling thread is not blocked and the function returns with
an error. Otherwise the sem_trywait function acts in the same way as the sem_wait function.
We can also read the current count of a semaphore. This is done by the function:
It returns 0 and stores the current count of the semaphore in the location pointed to
by the second argument.
A semaphore can be destroyed only if no threads are waiting on it at the time when
sem_destroy is called. The function returns -1 on error and 0 otherwise.
1.5.4 Sample code
We illustrate the use of semaphores with the producer and consumer problem. There are two
processes: the producer and the consumer. The producer produces products and stores them in
a FIFO queue. The queue has a limited capacity, so the producer must wait for a free place
when the queue is full. The products are consumed by the consumer. When the queue is empty,
the consumer waits for a product to appear in the queue.
The producer and consumer processes are implemented as separate threads. The consumer
executes the consumer function (defined in lines 27-41) and the producer executes the producer
function (defined in lines 11-26). The threads are created by the main function (lines 48-49).
The producer produces integer values in the range from 0 to 99, using a random number
generator (line 16). The values are stored in the queue. The queue is represented by an array
declared as a global variable (line 8). The tail and the head of the queue are represented
by two global variables, tail and head, declared in line 9. If the produced value is equal to
0, the producer ends. The consumer consumes all the values and ends its work (lines 30-39)
after it reads 0 from the queue.
The number of products and the number of free places in the queue are represented by two
semaphores: products and empty. The semaphores are declared as global variables (line 10) and
initialized by the main function. The products semaphore represents the products in the queue
and is initialized with zero (line 47), because at the beginning the queue is empty. The empty
semaphore represents the free places in the queue and is initialized with a value equal to the
size of the queue (line 46).
The producer can insert a new value into the queue (line 19) only if the queue is not full. It
therefore executes the P operation on the empty semaphore (line 18) before it inserts the
value. Once the new product has been inserted, the producer executes the V operation on the
products semaphore (line 21), signaling that the number of products in the queue has
increased. After production the producer executes the sleep function (line 22), which suspends
the execution of the thread for a given period of time. The producer ends its work when the
produced value is equal to 0. It does not return any value.
The consumer is analogous to the producer. It can take a product from the queue only if the
queue is not empty. It executes the P operation on the products semaphore (line 32) before it
takes a product from the queue (line 33). After that it signals that the number of free places
in the queue has increased by executing the V operation on the empty semaphore (line 36). The
time of consumption is a random value in the range from 0 to 4; the consumption itself is
simulated by calling the sleep function (line 37). The consumer ends its work when the
consumed value is 0. It does not return any value.
The main function waits for the consumer and producer threads to terminate (lines 50-51).
After that it destroys the semaphores (lines 52-53) and terminates (line 54).
1: #include <stdio.h>
2: #include <stdlib.h>
3: #include <unistd.h>
4: #include <time.h>
5: #include <pthread.h>
6: #include <semaphore.h>
42: int main(int argc, char *argv[])
43: {
/* thread descriptors of the producer and the consumer */
44: pthread_t prod_t, cons_t;
/* initialize a semaphores */
46: sem_init(&empty, 0, SIZE);
47: sem_init(&products, 0, 0);
/* destroy a semaphores */
52: sem_destroy(&empty);
53: sem_destroy(&products);
54: return 0;
55: }
• signal the condition - when the condition becomes true, one of the suspended threads is
awakened and removed from the queue,
• wait for the condition - the thread is suspended until another thread signals the condition
variable; the suspended thread is inserted into the queue.
A condition variable must always be associated with a mutex, to avoid the race condition in
which a thread prepares to wait on a condition variable and another thread signals the
condition just before the first thread actually waits on it. A thread should lock the mutex
before using the condition variable.
A condition variable must be initialized before it is used. It is initialized by calling the
function:
where:
• attr - the condition variable attributes; if it is NULL, default attributes are used.
If several threads are waiting, only one is resumed, but it is not specified which one. Nothing
happens if no threads are waiting on the condition variable. All threads that are waiting on
the condition variable may be restarted with a single call to the function:
The pthread_cond_wait function suspends the calling thread and releases the mutex. When the
thread resumes execution after a signal or broadcast operation, it owns the mutex again, i.e.
the mutex is locked by it.
All functions that operate on condition variables return 0 on success. Any other returned
value indicates that an error occurred.
A condition variable with default attributes suffices for the tasks in the laboratory exercises.
1.6.3 Destroying a condition variable
A condition variable is destroyed by calling the following function:
1.6.4.1 Monitors
The semaphores described in section 1.5 are a low-level mechanism for mutual exclusion. It is
easy to cause a deadlock when using them. For example, if one thread has executed a P
operation and waits on the semaphore, another thread must execute a V operation on the same
semaphore to unblock the first thread; otherwise the first thread never resumes. A
higher-level solution makes implementing synchronization easier. Such a solution is provided
by monitors.
The monitor was introduced by C.A.R. Hoare in the paper “Monitors: An Operating System
Structuring Concept” in 1974. It encapsulates the representation of an abstract object and
provides a set of operations on this object. A monitor is a program module that consists of
variables that store the object’s state and functions that implement operations on the object.
The state of the object can be changed only by the monitor’s functions. It follows that the
monitor’s variables are accessed only by calling the monitor’s functions. Functions in the
same monitor cannot be executed concurrently. This is what allows monitors to enforce mutual
exclusion. A monitor has the following properties:
• only functions specified in the monitor are visible outside the monitor,
• the monitor’s functions may not access variables declared outside the monitor,
• variables are initialized before any function is called, by executing the initialization
function when the monitor is created.
1.6.4.2 Monitor for the producer and the consumer problem
In this section we present a monitor for the producer and consumer problem. Note that it is
only a program module, not a process. It is described in pseudocode similar to the C language.
The monitor contains the queue of values and makes it possible to insert a new value into
this queue and to take a value from it. We do not specify the type of the values in the queue,
as it is not necessary for understanding the problem. The queue can hold values of any type.
1: monitor producer_consumer
2: {
3:   private:
4:     const int SIZE = ...;
5:     type_of_value queue[SIZE];
6:     int tail, head;
7:     condition not_full, not_empty;
8:   public:
9:     void producer_consumer() /* initialize a monitor */
10:     {
11:       tail = head = 0;
12:     }
13:     void put(type_of_value value) /* insert a new value to the queue */
14:     {
15:       if (queue is full)
16:         not_full.wait();
17:       queue[tail] = value;
18:       tail = (tail + 1) mod SIZE;
19:       not_empty.signal();
20:     }
21:     type_of_value get() /* take a value from the queue */
22:     {
23:       type_of_value value;
24:       if (queue is empty)
25:         not_empty.wait();
26:       value = queue[head];
27:       head = (head + 1) mod SIZE;
28:       not_full.signal();
29:       return value;
30:     }
31: }
The presented monitor contains the following variables: the queue of values (line 5) and the
variables that represent the tail and the head of the queue (line 6). These variables are
accessible only inside the monitor. Additionally, the monitor contains two condition variables
that describe the situations in which the queue is not empty and not full (line 7). These
variables make it possible to delay a thread that uses the monitor when the queue is empty or
full. The monitor also contains a constant that represents the size of the queue.
In the producer and consumer problem there are only two operations: the producer inserts a
new value into the queue, and the consumer takes a value from the queue. These operations are
implemented by the monitor’s functions put (lines 13-20) and get (lines 21-30). These
functions are visible outside the monitor.
The put function (lines 13-20) inserts a new value into the queue if there is a free place.
When the queue is full, the thread waits for a free place (line 16) on the not_full condition
variable. The instruction in line 16 should be understood as “wait until the queue is not
full”, which is equivalent to “wait for a free place in the queue”. After a new value has been
inserted, the not_empty condition variable is used to signal that “the queue is not empty”
(line 19).
The consumer takes a value from the queue using the get function (lines 21-30), which is
analogous to the put function. When the queue is empty, it waits for a value on the not_empty
condition variable (line 25). Line 25 should be understood as “wait until the queue is not
empty” or “wait for a value in the queue”. After a value has been taken from the queue, the
not_full condition variable is used to signal that “the queue is not full” (line 28).
Having presented the monitor, we can now present the producer and consumer processes. The
processes are described by functions.
The producer and consumer processes never end. They share the monitor defined in line 1. The
producer produces a new value (line 7) and inserts it into the queue (line 8) using the
monitor’s put function. The consumer consumes the values produced by the producer; it takes a
value from the queue using the monitor’s get function.
In the next section we present an implementation of the described monitor using the Pthreads
library and condition variables.
1.6.4.3 Implementation of the producer and the consumer monitor using the
Pthread library and condition variables
1: #include <stdio.h>
2: #include <stdlib.h>
3: #include <unistd.h>
4: #include <time.h>
5: #include <pthread.h>
/* destructor of monitor’s object */
28: monitor::~monitor()
29: {
/* destroy a mutex and conditional variables */
30: pthread_mutex_destroy(&mutex);
31: pthread_cond_destroy(&not_full);
32: pthread_cond_destroy(&not_empty);
33: }
/* insert a new value to the queue */
34: void monitor::put(const int value)
35: {
36: pthread_mutex_lock(&mutex); /* lock a mutex */
57: monitor prod_cons_mon; /* monitor used for producer and consumer problem */
58: void *producer(void *arg) /* producer function */
59: {
60: int value;
61: do /* produce values (products) as long as the produced value is non-zero */
62: {
63: value = rand() % 100; /* produce a value */
64: printf("produced: %d\n", value);
65: prod_cons_mon.put(value); /* insert produced value to the queue */
66: sleep(rand() % 5); /* rest after production */
67: }
68: while(value);
69: pthread_exit(NULL); /* terminate producer thread */
70: }
The producer (lines 58-70) and the consumer (lines 71-82) are implemented as separate threads,
as in section 1.5.4. The monitor is declared as a global variable (line 57). Unlike the sample
presented there, the producer and the consumer contain no synchronization instructions; they
only call the monitor’s functions.
We defined a class representing the monitor (lines 6-20). The class contains the queue for
products (line 9) and two functions, put (lines 34-44) and get (lines 45-56), which implement
the corresponding operations: inserting a new value into the queue and getting the first value
from the queue. A value returned by the get function is removed from the queue. Outside the
monitor the queue can be accessed only through these functions.
A producer must wait if the queue is full and a consumer must wait if the queue is empty.
Threads are suspended by means of condition variables. The not_full condition variable is used
to suspend a producer (line 38) trying to insert a value into a full queue. The not_empty
condition variable is used to suspend a consumer (line 49) when the queue is empty. In
addition, a mutex (declared in line 14) is associated with the condition variables. The mutex
is locked at the beginning of the put (line 36) and get (line 47) functions and released at
the end of these functions (lines 43 and 54). This guarantees that the put and get functions
are not executed concurrently.
The not_empty condition variable is signalled after a new product has been inserted into the
queue (line 43). It is signalled because the queue now contains at least one value, so a
consumer waiting for it (line 49) should be resumed. Conversely, after a value has been
removed from the queue, the not_full condition variable is signalled (line 53): there is at
least one free place in the queue, so a suspended producer waiting for it (line 38) should be
resumed.
The condition variables and the mutex are initialized in the constructor of the object (lines
21-27) and destroyed in the destructor (lines 28-33). The main function only creates the
producer and consumer threads (lines 87-88) and waits until they terminate (lines 89-90). The
source code is listed in this section.