Lecture 2
Parallel Computing
Agenda
o What is parallel computing?
o Terminologies of parallel computing
o Performance Evaluation of parallel computing
o Challenges of parallel computing
o Parallel processing concepts:
o How is parallelism expressed in a program
o Architectural concepts related to parallelism (parallel processing)
What is parallel computing?
Multiple processors cooperating concurrently to solve one problem.
What is parallel computing?
❑ A multicore processor has several cores that can access the same memory concurrently.
❑ A computation is decomposed into several parts, called tasks, that can be computed in parallel.
❑ Finding enough parallelism is (one of the) critical steps for high performance
(Amdahl’s law).
Performance Metrics
❑ Execution time:
The time elapsed between the beginning and the end of a program's execution.
❑ Speedup:
The ratio of serial execution time (Ts) to parallel execution time (Tp).
Speedup = Ts / Tp
❑ Efficiency:
Ratio of speedup to the number of processors.
Efficiency = Speedup / P
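A quick worked example of the two ratios (the timings below are made up for illustration, not measured):

Ts = 100.0   # serial execution time in seconds (assumed value)
Tp = 30.0    # parallel execution time on P processors (assumed value)
P = 4

speedup = Ts / Tp          # 3.33: the parallel run is 3.33x faster
efficiency = speedup / P   # 0.83: each processor delivers ~83% of ideal
print(f"Speedup = {speedup:.2f}, Efficiency = {efficiency:.2f}")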
Performance Metrics
❑ Amdahl’s Law
Used to predict maximum speedup using multiple processors.
• Let f = fraction of work performed sequentially.
• (1 - f) = fraction of work that is parallelizable.
• P = number of processors
On 1 CPU: T1 = f + (1 − f) = 1
On P processors: Tp = f + (1 − f)/P
• Speedup = T1 / Tp = 1 / (f + (1 − f)/P) < 1/f
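A minimal sketch of the formula in Python (the 10% sequential fraction is just an illustrative value):

def amdahl_speedup(f, p):
    # Predicted speedup when a fraction f of the work is sequential
    # and the remaining (1 - f) runs perfectly in parallel on p processors.
    return 1.0 / (f + (1.0 - f) / p)

# With f = 0.1, speedup saturates below the 1/f = 10 bound:
for p in (2, 4, 16, 1024):
    print(p, round(amdahl_speedup(0.1, p), 2))
# prints: 2 1.82, 4 3.08, 16 6.4, 1024 9.91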
Challenges
All parallel programs contain:
❑ Parallel sections
❑ Sequential sections
❑ In the sequential sections, either work is being duplicated or no useful work is being done (processes are waiting for others).
❑ Idling:
Processes may become idle due to many reasons such as load
imbalance, synchronization, and presence of serial components in a
program.
❑ Excess Computation:
The fastest known sequential algorithm for a problem may be difficult
or impossible to parallelize, forcing us to use a parallel algorithm
based on a poorer but easily parallelizable sequential algorithm.
Challenges
❖ 2) Memory system challenges
❖ The effective performance of a program on a computer relies on:
1. Speed of the processor (clock rates increased from 40 MHz (e.g., MIPS R3000, 1988) to 2.0 GHz (e.g., Pentium 4, 2002) to 8.429 GHz (AMD's Bulldozer-based FX chips, 2012))
2. Ability of the memory system to feed data to the processor
❖ Memory system performance is mainly captured by two parameters: latency and bandwidth.
• Latency is the time from the issue of a memory request to the time the data is available at the processor.
Example: if memory has a latency of 100 ns (no caches), and the processor has two multiply-add units and can execute four instructions in each cycle of 1 ns, then the processor must wait 100 cycles before it can process the data.
◦ Bandwidth is the rate at which data can be pumped to the processor by the memory system.
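A back-of-the-envelope calculation for the latency example above (same assumed numbers):

latency_ns = 100      # one memory request, no caches
cycle_ns = 1          # one processor cycle
ops_per_cycle = 4     # two multiply-add units -> 4 operations per cycle

# Work the processor could have done while waiting on a single access:
wasted_ops = (latency_ns // cycle_ns) * ops_per_cycle
print(wasted_ops)     # 400 operations lost per memory request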
How is parallelism expressed in a program
Implicitly: define tasks only, rest implied; or define tasks and work decomposition, rest implied.
Explicitly: define tasks, work decomposition, data decomposition, communication, synchronization.

1- IMPLICITLY
❑ A pure implicitly parallel language does not need special directives, operators or functions
to enable parallel execution.
❑ Programming languages with implicit parallelism include Axum, HPF, Id, LabVIEW, and MATLAB M-code.
❑ Example: taking the sine or logarithm of a group of numbers, a language that provides implicit parallelism might allow the programmer to write the instruction as shown below:
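The slide's original code sample is not reproduced here; a minimal sketch of the same idea in Python with NumPy (standing in for the array languages listed above):

import numpy as np

numbers = np.array([0.1, 0.5, 1.0, 2.0])
result = np.sin(numbers)   # one whole-array instruction: no loops, no task
                           # division; any element-level parallelism is left
                           # entirely to the library/runtime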
Advantages:
❑ A programmer does not need to worry about task division or process communication.
Disadvantages:
❑ It reduces the control that the programmer has over the parallel execution of the program.
2- EXPLICITLY
❑ It is the representation of concurrent computations by means of primitives in the form of special-purpose directives or function calls.
Advantages:
❑ The programmer has full control over the parallel execution of the program.
Disadvantages:
❑ The programmer must manage task division, communication, and synchronization explicitly.
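As a sketch of explicit parallelism (Python's multiprocessing is an illustrative choice here, not the slide's example; directive- or call-based models work the same way), the programmer names the worker count and the work decomposition:

from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    # The programmer explicitly chooses the degree of parallelism (4 workers)
    # and the decomposition (one task per list element).
    with Pool(processes=4) as pool:
        print(pool.map(square, range(10)))   # [0, 1, 4, 9, ..., 81]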
Parallel Processing Concepts
1- Implicit Parallelism
❖ Concerning memory-processor data path bottlenecks, microprocessor designers have explored alternate routes to cost-effective performance.
Parallel Processing Concepts
Implicit Parallelism: 1.1 Superscalar Execution
❖ Consider a processor with two pipelines and the ability to issue two instructions simultaneously (per clock cycle); hence it is called dual-issue execution.
- Consider the execution of adding 4 numbers, sketched below.
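The slide's figure is not reproduced; the dependence structure it illustrates can be written out as follows (Python used as notation only; the cycle counts assume the dual-issue processor described above):

a, b, c, d = 1, 2, 3, 4

t1 = a + b    # cycle 1, pipeline 1: independent of t2,
t2 = c + d    # cycle 1, pipeline 2: so both issue in the same cycle
t3 = t1 + t2  # cycle 2: needs t1 and t2, so it must wait
# Three additions complete in 2 cycles instead of 3.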
Limitation:
❖ True Data Dependency: The result of one operation is an input to the next.
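For example (a hypothetical two-instruction sequence):

a, b, c = 2.0, 3.0, 4.0
x = a * b    # instruction 1
y = x + c    # instruction 2 reads x, so it cannot issue until
             # instruction 1 has produced its result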
Superscalar Execution
❖ Scheduling of instructions is determined by a number of factors:
• True Data Dependency
• Resource Dependency
• Branch Dependency
❖In a more aggressive model, instructions can be issued out of order. In this case,
if the second instruction has data dependencies with the first, but the third
instruction does not, the first and third instructions can be co-scheduled. This is
also called dynamic issue.
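A minimal simulation of the idea (hypothetical three-instruction trace, an assumed issue width of two, and results assumed available in the next cycle):

# instruction -> instructions it depends on
instrs = {"i1": [], "i2": ["i1"], "i3": []}
done, cycle = set(), 0
while len(done) < len(instrs):
    ready = [i for i in instrs if i not in done
             and all(d in done for d in instrs[i])]
    issued = ready[:2]        # dynamic issue: any two ready instructions
    done.update(issued)
    cycle += 1
    print(f"cycle {cycle}: issue {issued}")
# cycle 1: issue ['i1', 'i3']   (i1 and i3 co-scheduled; i2 waits on i1)
# cycle 2: issue ['i2']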
Efficiency Considerations
❖ Due to limited parallelism in typical instruction traces, dependencies, or the inability of the scheduler to extract parallelism, the performance of superscalar processors is eventually limited.
Parallel Processing Concepts
Implicit Parallelism: 1.2 Very Long Instruction Word (VLIW) Processors
◦ Instructions that can be executed concurrently are packed into groups and presented to the processor as a single long instruction word, to be executed on multiple functional units at the same time.
◦ Very sensitive to the compiler's ability to detect data and resource dependencies.
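A toy simulation of the packing idea (hypothetical 3-slot instruction words; the register values and slot contents are chosen purely for illustration):

regs = {"r1": 2, "r2": 3, "r3": 0, "r4": 5, "r5": 0}

# One long instruction word: three slots the compiler has verified to be
# independent of each other (an empty slot becomes a no-op).
word = [
    lambda r: r.update(r3=r["r1"] + r["r2"]),   # ALU slot
    lambda r: r.update(r5=r["r4"] * 2),         # multiplier slot
    None,                                       # no-op slot
]
for op in word:      # simulate one cycle: every slot of the word executes
    if op:           # (in real hardware, all slots run simultaneously)
        op(regs)
print(regs)          # {'r1': 2, 'r2': 3, 'r3': 5, 'r4': 5, 'r5': 10}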
2- Explicit Parallelism
❖ Elements of a Parallel Computer
*Hardware:
▪ Multiple Processors
▪ Multiple Memories
▪ Interconnection Network
*System Software:
▪ Parallel Operating System
▪ Programming Constructs to Express/Orchestrate Concurrency
*Application Software:
▪ Parallel Algorithms
❖ Goal:
▪ Utilize the Hardware, System, & Application Software to either:
- Achieve Speedup.
- Solve problems requiring a large amount of memory.
Think Different:
❑ How many people are doing the work → (Degree of Parallelism)
❑ Whether they need info from each other to finish their own jobs → (Communication)