OpenMP Synchronization, Data, and the Runtime Environment
These slides were originally written by Dr. Barbara Chapman, University of Houston
OpenMP Memory Model
OpenMP assumes a shared memory
Threads communicate by sharing variables.
OpenMP Syntax
Most OpenMP constructs are compiler directives
For C and C++, they are pragmas with the form:
#pragma omp construct [clause [clause]…]
For Fortran, the directives may have fixed or free form:
*$OMP construct [clause [clause]…]
C$OMP construct [clause [clause]…]
!$OMP construct [clause [clause]…]
Include file (C/C++) and the OpenMP library module (Fortran):
#include <omp.h>
use omp_lib
Most OpenMP constructs apply to a “structured block”.
A block of one or more statements with one point of entry at the top
and one point of exit at the bottom.
It’s OK to have an exit() within the structured block.
Static schedule example: 16 iterations, 4 threads

Thread      0           1            2            3
no chunk    1-4         5-8          9-12         13-16
chunk = 2   1-2, 9-10   3-4, 11-12   5-6, 13-14   7-8, 15-16

Smaller chunks mean more scheduling overhead.
OpenMP Synchronization
Synchronization enables the user to
Control the ordering of executions in different threads
Ensure that at most one thread executes an operation or
region of code at any given time (mutual exclusion)
Barrier
We need to update all of a[ ] before using a[ ]
All threads wait at the barrier point and only continue when all
threads have reached the barrier point
(Timeline figure: threads that reach the barrier early sit idle until the last thread arrives.)
Critical Region (Section)
Ordered
The ordered construct enforces sequential ordering for a block within a parallel loop.
Code is executed in the order in which the iterations would be performed sequentially.
The worksharing construct must carry the ordered clause.
Updates to Shared Data
Blocks of data are fetched into cache lines
Values may temporarily differ from other copies of
data within a parallel region
(Figure: variable a lives in shared memory; cache1 ... cacheN may each hold a temporarily stale copy of a.)
The Flush Directive
The flush construct denotes a sequence point where
a thread tries to create a consistent view of memory
for specified variables.
All memory operations (both reads and writes) defined
prior to the sequence point must complete.
All memory operations (both reads and writes) defined
after the sequence point must follow the flush.
Variables in registers or write buffers must be updated
in memory.
Arguments to flush specify which variables are
flushed.
If no arguments are specified, all thread visible
variables are flushed.
What Else Does Flush Influence?
The flush operation does not
actually synchronize different
threads. It just ensures that a
thread’s values are made
consistent with main memory.
Something to note:
Compilers reorder instructions to better exploit the functional
units and keep the machine busy
Flush prevents the compiler from doing the following:
Reorder read/writes of variables in a flush set relative to a flush.
Reorder flush constructs when flush sets overlap.
A compiler CAN do the following:
Reorder instructions NOT involving variables in the flush set
relative to the flush.
Reorder flush constructs that don’t have overlapping flush sets.
A Flush Example
Pair-wise synchronization.
      integer ISYNC(NUM_THREADS)
C$OMP PARALLEL DEFAULT (PRIVATE) SHARED (ISYNC)
      IAM = OMP_GET_THREAD_NUM()
      ISYNC(IAM) = 0
C$OMP BARRIER                        ! make sure other threads can see my write
      CALL WORK()
      ISYNC(IAM) = 1                 ! I'm all done; signal this to other threads
C$OMP FLUSH(ISYNC)
      DO WHILE (ISYNC(NEIGH) .EQ. 0)
C$OMP FLUSH(ISYNC)                   ! make sure the read picks up a good copy from memory
      END DO
C$OMP END PARALLEL
Data-Sharing Attributes
All data clauses apply to parallel regions and worksharing constructs, except "shared", which only applies to parallel regions.
About Storage Association
(Figure: the shared data segment holds a[size][size] and index; each thread additionally has its own temp.)
A and index are shared by all threads.
temp is local to each thread.
OpenMP Private Clause
private(var) creates a local copy of var for each
thread.
The value is uninitialized
Private copy is not storage-associated with the original
The original is undefined at the end
✗ This code is broken:

      IS = 0
C$OMP PARALLEL DO PRIVATE(IS)
      DO J=1,1000
         IS = IS + J              ! wrong: the private IS was not initialized
      END DO
C$OMP END PARALLEL DO
      print *, IS                 ! IS is undefined here, regardless of initialization
(In)Visibility of Private Data
#pragma omp parallel private(x) shared(p0, p1)
Thread 0:  x = …;  p0 = &x;
Thread 1:  x = …;  p1 = &x;

/* references like *p1 on thread 0 or *p0 on thread 1 are not allowed */

You cannot reference another thread's private variables, even if you have a shared pointer between the two threads.
The Firstprivate And Lastprivate Clauses
firstprivate (list)
All variables in the list are initialized with the
value the original object had before entering
the parallel construct
lastprivate (list)
The thread that executes the sequentially last
iteration or section updates the value of the
objects in the list
Firstprivate Clause
firstprivate is a special case of private.
Initializes each private copy with the corresponding
value from the master thread.
      IS = 0
C$OMP PARALLEL DO FIRSTPRIVATE(IS)
      DO 20 J=1,1000
         IS = IS + J              ! ✔ each thread gets its own IS with an initial value of 0
   20 CONTINUE
C$OMP END PARALLEL DO
      print *, IS                 ! ✗ regardless of initialization, IS is undefined at this point
Lastprivate Clause
Lastprivate passes the value of a private variable
from the last iteration to the variable of the master
thread
      IS = 0
C$OMP PARALLEL DO FIRSTPRIVATE(IS)
C$OMP&            LASTPRIVATE(IS)
      DO 20 J=1,1000
         IS = IS + J              ! each thread gets its own IS with an initial value of 0
   20 CONTINUE
C$OMP END PARALLEL DO
      print *, IS                 ! ✔ defined -- but are you sure it is the value you wanted?

After the loop, IS holds the value from the thread that executed the sequentially last iteration (J=1000): that thread's private partial sum, not the total over all iterations.
A Data Environment Checkup
Consider this example of PRIVATE and FIRSTPRIVATE
C variables A,B, and C = 1
C$OMP PARALLEL PRIVATE(B)
C$OMP& FIRSTPRIVATE(C)
Are A,B,C local to each thread or shared inside the parallel region?
What are their initial values inside and after the parallel region?
Now consider the same loop with a reduction clause -- what gets printed?

      IS = 0
C$OMP PARALLEL DO REDUCTION(+:IS)
      DO 1000 J=1,1000
         IS = IS + J
 1000 CONTINUE
      print *, IS
OpenMP Reduction
The Reduction Clause
C/C++:   reduction ( operator : list )
Fortran: reduction ( {operator | intrinsic} : list )
Reduction variable(s) must be shared variables.
A reduction is defined as (check the specs for details):

Fortran                            C/C++
x = x operator expr                x = x operator expr
x = expr operator x                x = expr operator x
x = intrinsic (x, expr_list)       x++, ++x, x--, --x
x = intrinsic (expr_list, x)       x <binop>= expr
"min" and "max" intrinsics
Note that the value of a reduction variable is undefined
from the moment the first thread reaches the clause until
the operation has completed.
The reduction can be hidden in a function call.
Reduction Example
Remember the code we used to demo private,
firstprivate and lastprivate.
Serial version:

      program closer
      IS = 0
      DO 1000 J=1,1000
         IS = IS + J
 1000 CONTINUE
      print *, IS

Parallel version:

      program closer
      IS = 0
C$OMP PARALLEL DO REDUCTION(+:IS)
      DO 1000 J=1,1000
         IS = IS + J
 1000 CONTINUE
      print *, IS
Example - The Reduction Clause
      sum = 0.0
!$omp parallel default(none) &
!$omp shared(n,x) private(i)
!$omp do reduction (+:sum)
      do i = 1, n
         sum = sum + x(i)        ! SUM is a shared variable
      end do
!$omp end do
!$omp end parallel
      print *, sum
Default Clause Example
Are these two codes equivalent?

      itotal = 1000
C$OMP PARALLEL PRIVATE(np, each)
      np = omp_get_num_threads()
      each = itotal/np
      ………
C$OMP END PARALLEL

      itotal = 1000
C$OMP PARALLEL DEFAULT(PRIVATE) SHARED(itotal)
      np = omp_get_num_threads()
      each = itotal/np
      ………
C$OMP END PARALLEL

Yes: in both versions np and each are private and itotal is shared -- explicitly in the second, by default in the first.
OpenMP Threadprivate
Makes global data private to a thread and persistent,
so that it survives across parallel region boundaries
Fortran: COMMON blocks
C: File scope and static variables
Different from making them PRIVATE
With PRIVATE, global variables are masked.
THREADPRIVATE preserves global scope within each thread
Threadprivate variables can be initialized using COPYIN
or by using DATA statements.
Some limitations on use of threadprivate
Consult specification before using
A Threadprivate Example
Consider two different routines called within a parallel region.
Because of the threadprivate construct, each thread executing these routines has
its own copy of the common block /buf/.
Threadprivate/Copyin
You initialize threadprivate data using a copyin clause. A minimal completion of the fragment (the initialization and the parallel directive are assumed context):

      parameter (N=1000)
      common/buf/A(N)
C$OMP THREADPRIVATE(/buf/)
C     ... master thread initializes A ...
C$OMP PARALLEL COPYIN(/buf/)
C     each thread now starts with the master's copy of /buf/
C$OMP END PARALLEL
The Copyin Clause
copyin (list)
Applies to THREADPRIVATE common blocks only
At the start of the parallel region, data of the master thread is
copied to the thread private copies
Example:
      common /cblock/velocity
      common /fields/xfield, yfield, zfield
! create thread private common blocks
!$omp threadprivate (/cblock/, /fields/)
!$omp parallel &
!$omp default (private) &
!$omp copyin ( /cblock/, zfield )

The copied-in data (all of /cblock/, plus zfield) is now available to all threads.
Copyprivate
Used with a single region to broadcast values of private
variables from one member of a team to the rest of the team.
#include <omp.h>
void input_parameters (int, int); // fetch values of input parameters
void do_work(int, int);

void main()
{
    int Nsize, choice;
    #pragma omp parallel private (Nsize, choice)
    {
        #pragma omp single copyprivate (Nsize, choice)
        input_parameters (Nsize, choice);   // one thread reads the input ...
        do_work(Nsize, choice);             // ... copyprivate broadcasts the values to the team
    }
}
Fortran - Allocatable Arrays
Fortran allocatable arrays whose status is
“currently allocated” are allowed to be specified as
private, lastprivate, firstprivate, reduction, or copyprivate
OpenMP Runtime Functions
OpenMP Library Routines
To use a known, fixed number of threads in a program:
(1) tell the system that you don't want dynamic adjustment of the
number of threads, (2) set the number of threads, then (3) save the
number you got.

#include <omp.h>
void main()
{
    int num_threads;
    omp_set_dynamic( 0 );                        // disable dynamic adjustment of the number of threads
    omp_set_num_threads( omp_get_num_procs() );  // request as many threads as you have processors
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        #pragma omp single
        num_threads = omp_get_num_threads();     // protect this op since memory stores are not atomic
        do_lots_of_stuff(id);
    }
}

Even in this case, the system may give you fewer threads than requested. If the precise number of threads matters, test for it and respond accordingly.
OpenMP Runtime Functions
Name                     Functionality
omp_set_num_threads      Set number of threads
omp_get_num_threads      Number of threads in team
omp_get_max_threads      Max num of threads for parallel region
omp_get_thread_num       Get thread ID
omp_get_num_procs        Maximum number of processors
omp_in_parallel          Check whether in parallel region
omp_set_dynamic          Activate dynamic thread adjustment (but implementation is free to ignore this)
omp_get_dynamic          Check for dynamic thread adjustment
omp_set_nested           Activate nested parallelism (but implementation is free to ignore this)
omp_get_nested           Check for nested parallelism
omp_get_wtime            Returns wall clock time
omp_get_wtick            Number of seconds between clock ticks
OpenMP Runtime Functions
Name                          Functionality
omp_set_schedule              Set schedule (if "runtime" is used)
omp_get_schedule              Returns the schedule in use
omp_get_thread_limit          Max number of threads for program
omp_set_max_active_levels     Set number of active parallel regions
omp_get_max_active_levels     Number of active parallel regions
omp_get_level                 Number of nested parallel regions
omp_get_active_level          Number of nested active parallel regions
omp_get_ancestor_thread_num   Thread id of ancestor thread
omp_get_team_size (level)     Size of the thread team at this level
omp_in_final                  Check in final task or not
OpenMP Environment Variables
OpenMP Environment Variables/1
OpenMP Environment Variable            Default (Oracle Solaris Studio)
OMP_NUM_THREADS                        2
OMP_SCHEDULE "schedule,[chunk]"        static, "N/P"
OMP_DYNAMIC {TRUE | FALSE}             TRUE
OMP_NESTED {TRUE | FALSE}              FALSE
OMP_WAIT_POLICY [ACTIVE | PASSIVE]     PASSIVE
OMP_MAX_ACTIVE_LEVELS                  4

The names are in uppercase; the values are case insensitive.
Be careful when relying on defaults, because they are compiler dependent.
OpenMP Environment Variables/2
OpenMP Environment Variable            Default (Oracle Solaris Studio)
OMP_THREAD_LIMIT                       1024
OMP_PROC_BIND {TRUE | FALSE}           FALSE