Parallel Programming Models
Sathish Vadhiyar
Department of Computational and Data Sciences
Supercomputer Education and Research Centre
Indian Institute of Science, Bangalore, India
[Figure: execution timeline of two processes P0 and P1, showing computation, communication, synchronization, and idle time]
Performance Metrics
• Execution time, T(p, n)
• Speedup, S
– S(p, n) = T(1, n) / T(p, n)
– Usually, S(p, n) < p
– Sometimes S(p, n) > p (superlinear speedup)
• Efficiency, E
– E(p, n) = S(p, n) / p
– Usually, E(p, n) < 1
– Sometimes greater than 1
• Scalability – how S and E behave as the problem size n and the
processor count p grow; a central limitation in parallel computing
(a worked example follows)
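A quick worked example (illustrative numbers, not from the slides): suppose T(1, n) = 100 s and T(8, n) = 16 s. Then S(8, n) = 100 / 16 = 6.25 and E(8, n) = 6.25 / 8 ≈ 0.78, i.e., each of the 8 processors is doing useful work about 78% of the time.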
PARALLEL PROGRAMMING
CLASSIFICATION AND STEPS
[Figure omitted. Courtesy: http://www.llnl.gov/computing/tutorials/parallel_comp/]
Programming Paradigms
• Shared memory model – Threads, OpenMP,
CUDA
• Message passing model – MPI
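To make the shared memory model concrete, here is a minimal C/OpenMP sketch (an illustration, not code from the slides): the loop iterations are divided among threads that all read and write one address space, and the reduction clause combines per-thread partial sums.

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        double sum = 0.0;
        /* Iterations are split across threads; reduction(+:sum)
           gives each thread a private partial sum and combines them. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 1; i <= 1000000; i++)
            sum += 1.0 / i;
        printf("harmonic sum = %f\n", sum);
        return 0;
    }

Message passing counterparts in MPI appear in the sketches later in this section.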
Parallelizing a Program
Given a sequential program/algorithm, how do we go about
producing a parallel version?
Four steps in program parallelization (an end-to-end sketch
follows the list):
1. Decomposition
Splitting the problem into tasks, identifying as much
potential concurrent activity as possible
2. Assignment
Grouping tasks into processes with the best load balance
3. Orchestration
Reducing synchronization and communication costs
4. Mapping
Mapping processes to processors (if possible)
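The four steps can be seen in a minimal MPI sketch in C for summing N numbers (an illustration under simplifying assumptions, e.g. that p divides N; not code from the slides):

    #include <mpi.h>
    #include <stdio.h>

    #define N 1024

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, p;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        /* 1. Decomposition: the N additions are the parallel tasks.    */
        /* 2. Assignment: each process takes a contiguous block of N/p. */
        int chunk = N / p;               /* assumes p divides N */
        double local = 0.0;
        for (int i = rank * chunk; i < (rank + 1) * chunk; i++)
            local += i;                  /* stand-in for a[i] */

        /* 3. Orchestration: one collective combines the partial sums.  */
        double total;
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0,
                   MPI_COMM_WORLD);
        if (rank == 0) printf("total = %f\n", total);

        /* 4. Mapping: rank-to-processor placement is left to the MPI
           launcher (e.g., mpirun binding options).                     */
        MPI_Finalize();
        return 0;
    }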
[Figure: the four parallelization steps. Decomposition splits the problem into tasks; Assignment groups tasks into processes p0–p3; Orchestration coordinates the processes; Mapping places them on processors P0–P3]
Illustrations
Data Distributions
[Figure omitted: example data distributions across processes]
Task parallelism
[Figure: levels of a task tree over processes – level 1: 0 4; level 2: 0 2 4 6; level 3: 0 1 2 3 4 5 6 7]
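The slides leave the distribution details to the figures; two common 1-D choices are block and cyclic. A sketch in C (not from the slides) of the owner computation for each:

    #include <stdio.h>

    /* Block: process r owns the contiguous range [r*n/p, (r+1)*n/p).
       Assumes p divides n. */
    int block_owner(int i, int n, int p) { return i / (n / p); }

    /* Cyclic: element i goes to process i mod p. */
    int cyclic_owner(int i, int p) { return i % p; }

    int main(void) {
        int n = 8, p = 4;
        for (int i = 0; i < n; i++)
            printf("element %d -> block: P%d, cyclic: P%d\n",
                   i, block_owner(i, n, p), cyclic_owner(i, p));
        return 0;
    }

Block distributions keep neighboring elements together (good locality for stencil-like access), while cyclic distributions balance load when the work per element varies systematically.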
Orchestration
• Goals
–Structuring communication
–Synchronization
• Challenges
–Organizing data structures – packing
–Small or large messages?
–How to organize communication and synchronization?
Orchestration
• Maximizing data locality
– Minimizing the volume of data exchanged
• Not communicating intermediate results – e.g., a dot product
– Minimizing the frequency of interactions – packing
• Minimizing contention and hot spots
– Avoid all processes using the same communication pattern
with the same partners at the same time
• Overlapping computations with interactions (see the sketch
after this list)
– Split computations into phases: those that depend on
communicated data (type 1) and those that do not (type 2)
– Initiate communication for type 1; during the
communication, perform type 2
• Replicating data or computations
– Balancing the extra computation or storage cost against
the gain from reduced communication
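A minimal sketch in C of the type 1 / type 2 overlap using nonblocking MPI (a hypothetical halo-exchange helper, not code from the slides):

    #include <mpi.h>

    /* interior[0..n_halo) is the boundary region sent to the neighbor;
       halo[0..n_halo) receives the neighbor's boundary region. */
    void exchange_and_compute(double *interior, double *halo,
                              int n_halo, int n_int, int nbr,
                              MPI_Comm comm) {
        MPI_Request req[2];
        /* Start the exchange needed by the type 1 (boundary) work. */
        MPI_Irecv(halo, n_halo, MPI_DOUBLE, nbr, 0, comm, &req[0]);
        MPI_Isend(interior, n_halo, MPI_DOUBLE, nbr, 0, comm, &req[1]);

        /* Type 2: interior points need no communicated data, so
           compute them while the messages are in flight. */
        for (int i = n_halo; i < n_int; i++)
            interior[i] *= 0.5;          /* stand-in computation */

        /* Type 1: wait, then do the work that needs the halo. */
        MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
        for (int i = 0; i < n_halo; i++)
            interior[i] = 0.5 * (interior[i] + halo[i]);
    }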
Mapping
• Which process runs on which particular processor?
–Can depend on the network topology and the
communication pattern of the processes
–On processor speeds, in the case of
heterogeneous systems
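MPI can take over part of this decision. A sketch in C (not from the slides) using MPI's Cartesian topology support; with reorder = 1 the library may renumber ranks to better match the physical network:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int p;
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        /* Describe the communication pattern as a 2-D grid. */
        int dims[2] = {0, 0}, periods[2] = {0, 0};
        MPI_Dims_create(p, 2, dims);   /* near-square factorization of p */

        /* reorder = 1: MPI may renumber ranks to fit the topology. */
        MPI_Comm grid;
        MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &grid);

        int rank, coords[2];
        MPI_Comm_rank(grid, &rank);
        MPI_Cart_coords(grid, rank, 2, coords);
        printf("rank %d -> grid (%d, %d)\n", rank, coords[0], coords[1]);

        MPI_Comm_free(&grid);
        MPI_Finalize();
        return 0;
    }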
Assignment -- Option 3
[Figure: one assignment option, with tasks grouped onto processes P0, P1, P2, P4]
Orchestration
• Different for different programming
models/architectures
– Shared address space
• Naming: global address space
• Synchronization through barriers and locks (see the
sketch below)
– Distributed memory / message passing
• Non-shared address space
• Send-receive messages + barriers for synchronization
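A minimal shared-address-space sketch in C with OpenMP (an illustration, not code from the slides): the counter is named directly in the global address space, a lock (critical section) serializes updates, and a barrier synchronizes the threads.

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int shared_count = 0;        /* visible to all threads by name */
        #pragma omp parallel
        {
            #pragma omp critical     /* lock: one thread updates at a time */
            shared_count++;

            #pragma omp barrier      /* all threads reach this point */
            #pragma omp single
            printf("threads seen: %d\n", shared_count);
        }
        return 0;
    }

In the message passing model the same coordination would be expressed with explicit MPI_Send/MPI_Recv pairs plus MPI_Barrier, since there is no shared name space.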
SAS Program
[Figure: a shared address space (SAS) program executed by processes P0, P1, P2, P4]