import java.util.concurrent.RecursiveTask;

public class SumTask extends RecursiveTask<Integer> {
    static final int THRESHOLD = 1000;
    private int begin;
    private int end;
    private int[] array;
    public SumTask(int begin, int end, int[] array) {
        this.begin = begin;
        this.end = end;
        this.array = array;
    }

    protected Integer compute() {
        if (end - begin < THRESHOLD) {
            int sum = 0;
            for (int i = begin; i <= end; i++)
                sum += array[i];
            return sum;
        }
        else {   // divide the range in half and fork two subtasks
            int mid = (begin + end) / 2;
            SumTask leftTask = new SumTask(begin, mid, array);
            SumTask rightTask = new SumTask(mid + 1, end, array);
            leftTask.fork();
            rightTask.fork();
            return rightTask.join() + leftTask.join();
        }
    }
}
4.5.3 OpenMP
OpenMP is a set of compiler directives as well as an API for programs written in C, C++,
or FORTRAN that provides support for parallel programming in shared-memory
environments. OpenMP identifies parallel regions as blocks of code that may run in
parallel. Application developers insert compiler directives into their code at parallel
regions, and these directives instruct the OpenMP run-time library to execute the region in parallel. The following C program illustrates a
compiler directive above the parallel region containing the printf() statement:
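A minimal sketch of such a program (the surrounding sequential code is only indicated by comments, and the printed message is illustrative):

#include <omp.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    /* sequential code */

    #pragma omp parallel
    {
        printf("I am a parallel region.");
    }

    /* sequential code */

    return 0;
}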
When OpenMP encounters the directive

#pragma omp parallel

it creates as many threads as there are processing cores in the system. Thus, for a dual-core
system, two threads are created; for a quad-core system, four are created; and so forth. All
the threads then simultaneously execute the parallel region. As each thread exits the
parallel region, it is terminated.
OpenMP provides several additional directives for running code regions in parallel,
including parallelizing loops. For example, assume we have two arrays, a and b, of size N.
We wish to sum their contents and place the results in array c. We can have this task run in
parallel by using the following code segment, which contains the compiler directive for
parallelizing for loops:
#pragma omp parallel for
for (i = 0; i < N; i++) {
c[i] = a[i] + b[i];
}
OpenMP divides the work contained in the for loop among the threads it has created in
response to the directive
#pragma omp parallel for
In addition to providing directives for parallelization, OpenMP allows developers to
choose among several levels of parallelism. For example, they can set the number of
threads manually. It also allows developers to identify whether data are shared between
threads or are private to a thread. OpenMP is available on several open-source and
commercial compilers for Linux, Windows, and macOS systems. We encourage readers
interested in learning more about OpenMP to consult the bibliography at the end of the
chapter.
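As a sketch of these features (the variable names total and square are illustrative), the following program requests four threads with omp_set_num_threads() and uses the shared and private clauses to state which data are shared between threads and which are private to each thread:

#include <omp.h>
#include <stdio.h>

int main(void)
{
    int total = 0;    /* one copy, shared by all threads */
    int square = 0;   /* each thread receives its own private copy */

    omp_set_num_threads(4);   /* set the number of threads manually */

    #pragma omp parallel for shared(total) private(square)
    for (int i = 0; i < 100; i++) {
        square = i * i;
        #pragma omp atomic
        total += square;      /* update the shared variable atomically */
    }

    printf("total = %d\n", total);
    return 0;
}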
• QOS_CLASS_UTILITY —The utility class represents tasks that require a longer time
to complete but do not demand immediate results. This class includes work such as
importing data.
• QOS_CLASS_BACKGROUND —Tasks belonging to the background class are not
visible to the user and are not time sensitive. Examples include indexing a mailbox
system and performing backups.
Tasks submitted to dispatch queues may be expressed in one of two different ways:
1. For the C, C++, and Objective-C languages, GCD identifies a language extension
known as a block, which is simply a self-contained unit of work. A block is
specified by a caret ^ inserted in front of a pair of braces {}. Code within the braces
identifies the unit of work to be performed. A simple example of a block is shown
below:
^{ printf("I am a block"); }
2. For the Swift programming language, a task is defined using a closure, which is
similar to a block in that it expresses a self-contained unit of functionality.
Syntactically, a Swift closure is written in the same way as a block, minus the
leading caret.
The following Swift code segment illustrates obtaining a concurrent queue for the
user-initiated class and submitting a task to the queue using the dispatch_async()
function:
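A sketch of such a segment follows. It uses the current Swift names, in which the global concurrent queues are obtained through the DispatchQueue type and dispatch_async() is surfaced as the async method; the printed message and the one-second wait at the end (included only so the closure has a chance to run before the program exits) are illustrative:

import Foundation

// Obtain the global concurrent queue for the user-initiated QOS class
let queue = DispatchQueue.global(qos: .userInitiated)

// Submit a closure (a self-contained unit of work) to the queue
queue.async {
    print("I am a closure.")
}

// Keep the main thread alive briefly so the closure can execute
sleep(1)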
Internally, GCD’s thread pool is composed of POSIX threads. GCD actively manages
the pool, allowing the number of threads to grow and shrink according to application
demand and system capacity. GCD is implemented by the libdispatch library, which Apple has released under the Apache License.
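The discussion that follows concerns Intel Threading Building Blocks (TBB) and its parallel_for construct. A sketch of the kind of call it refers to, assuming a vector v of n elements and a hypothetical apply() function invoked on each element:

#include <cstdio>
#include <vector>
#include <tbb/parallel_for.h>

/* Hypothetical operation applied to each element of v */
void apply(int value) {
    std::printf("%d\n", value);
}

int main() {
    std::vector<int> v = {1, 2, 3, 4, 5};
    size_t n = v.size();

    /* Iterate i over [0, n); TBB divides the range into chunks and
       schedules them on its worker threads */
    tbb::parallel_for(size_t(0), n, [=](size_t i) { apply(v[i]); });

    return 0;
}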
The first two parameters specify that the iteration space runs from 0 to n − 1 (which corresponds to the number of elements in the array v). The last parameter is a C++ lambda function that requires a bit of explanation. The expression [=](size_t i) declares the parameter i, which takes on each of the values over the iteration space (in this case from 0 to n − 1). Each value of i identifies which array element in v is to be passed as a parameter to the apply() function.
The TBB library will divide the loop iterations into separate “chunks” and create a
number of tasks that operate on those chunks. (The parallel_for function allows developers
to manually specify the size of the chunks if they wish to.) TBB will also create a number
of threads and assign tasks to available threads. This is quite similar to the fork-join library
in Java. The advantage of this approach is that it requires only that developers identify
what operations can run in parallel (by specifying a parallel for loop), and the library
manages the details involved in dividing the work into separate tasks that run in parallel.
Intel TBB has both commercial and open-source versions that run on Windows, Linux, and
macOS. Refer to the bibliography for further details on how to develop parallel
applications using TBB.
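The chunk size mentioned above can be stated explicitly by constructing a blocked_range with a grain size; in this sketch a simple_partitioner is also passed so that TBB honors that size rather than choosing its own:

#include <vector>
#include <tbb/parallel_for.h>
#include <tbb/blocked_range.h>

int main() {
    std::vector<int> v(10000, 1);

    /* Grain size of 256: each task receives a chunk of at most 256 iterations */
    tbb::parallel_for(
        tbb::blocked_range<size_t>(0, v.size(), 256),
        [&](const tbb::blocked_range<size_t> &r) {
            for (size_t i = r.begin(); i != r.end(); ++i)
                v[i] *= 2;
        },
        tbb::simple_partitioner());

    return 0;
}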
A signal is used in UNIX systems to notify a process that a particular event has
occurred. A signal may be received either synchronously or asynchronously,
depending on the source of and the reason for the event being signaled. All
signals, whether synchronous or asynchronous, follow the same pattern:
1. A signal is generated by the occurrence of a particular event.
2. The signal is delivered to a process.
3. Once delivered, the signal must be handled.
The standard UNIX function for delivering a signal is
kill(pid_t pid, int signal)
This function specifies the process (pid) to which a particular signal (signal) is to be delivered. Most
multithreaded versions of UNIX allow a thread to specify which signals it will accept and which it will block.
Therefore, in some cases, an asynchronous signal may be delivered only to those threads that are not blocking it.
However, because signals need to be handled only once, a signal is typically delivered only to the first thread
found that is not blocking it. POSIX Pthreads provides the following function, which allows a signal to be delivered to a specified thread (tid):
pthread_kill(pthread_t tid, int signal)
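As a sketch (POSIX systems assumed; the worker() and handler() functions are illustrative), the main thread below installs a handler for SIGUSR1 and then uses pthread_kill() to deliver that signal to a specific worker thread:

#include <pthread.h>
#include <signal.h>
#include <unistd.h>

/* Handler run by whichever thread the signal is delivered to */
static void handler(int sig) {
    const char msg[] = "SIGUSR1 handled\n";
    write(STDOUT_FILENO, msg, sizeof(msg) - 1);   /* async-signal-safe output */
}

/* Worker thread: waits until a signal handler has run */
static void *worker(void *arg) {
    pause();
    return NULL;
}

int main(void) {
    struct sigaction sa;
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGUSR1, &sa, NULL);   /* install the handler process-wide */

    pthread_t tid;
    pthread_create(&tid, NULL, worker, NULL);

    sleep(1);                        /* crude: let the worker reach pause() */
    pthread_kill(tid, SIGUSR1);      /* deliver the signal to that one thread */

    pthread_join(tid, NULL);
    return 0;
}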
Although Windows does not explicitly provide support for signals, it allows us to emulate them using
asynchronous procedure calls (APCs). The APC facility enables a user thread to specify a function that is to
be called when the user thread receives notification of a particular event.
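A sketch of this emulation (the apcRoutine() and workerThread() names are illustrative; the target thread must be in an alertable wait, here SleepEx(), for the queued routine to run):

#include <windows.h>
#include <stdio.h>

/* Routine queued as an APC; it runs in the context of the target thread */
VOID CALLBACK apcRoutine(ULONG_PTR param) {
    printf("APC delivered with parameter %lu\n", (unsigned long)param);
}

DWORD WINAPI workerThread(LPVOID arg) {
    SleepEx(INFINITE, TRUE);   /* alertable wait; returns after an APC executes */
    return 0;
}

int main(void) {
    HANDLE hThread = CreateThread(NULL, 0, workerThread, NULL, 0, NULL);

    Sleep(100);                             /* crude: let the worker reach SleepEx() */
    QueueUserAPC(apcRoutine, hThread, 42);  /* notify the thread of the "event" */

    WaitForSingleObject(hThread, INFINITE);
    CloseHandle(hThread);
    return 0;
}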
The difficulty with cancellation occurs in situations where resources have been allocated to a canceled
thread or where a thread is canceled while in the midst of updating data it is sharing with other threads. This
becomes especially troublesome with asynchronous cancellation. Often, the operating system will reclaim
system resources from a canceled thread but will not reclaim all resources. Therefore, canceling a thread
asynchronously may not free a necessary system-wide resource.
With deferred cancellation, in contrast, one thread indicates that a target thread is to be canceled, but
cancellation occurs only after the target thread has checked a flag to determine whether or not it should be
canceled. The thread can perform this check at a point at which it can be canceled safely.
In Pthreads, thread cancellation is initiated using the pthread_cancel() function. The identifier of the target
thread is passed as a parameter to the function. The following code illustrates creating—and then canceling—a
thread:
pthread_t tid;
...
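Completed into a minimal sketch (the worker() function is hypothetical; it simply loops and checks for a pending cancellation request):

#include <pthread.h>

/* Hypothetical thread function: runs until a cancellation request is acted upon */
static void *worker(void *arg) {
    while (1)
        pthread_testcancel();   /* a cancellation point for deferred cancellation */
    return NULL;
}

int main(void) {
    pthread_t tid;

    /* create the thread */
    pthread_create(&tid, NULL, worker, NULL);

    /* ... work performed while the thread runs ... */

    /* cancel the thread */
    pthread_cancel(tid);

    /* wait for the thread to terminate */
    pthread_join(tid, NULL);
    return 0;
}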