COA UNIT-III Parallel Processors
UNIT III
PARALLEL PROCESSORS
• Parallel processing and its challenges
• Instruction level parallelism
• Flynn's classification: SISD, MIMD, SIMD, SPMD and
vector architectures
• Hardware multithreading
• Multicore processors: Shared memory
multiprocessor and cluster multiprocessor
Parallel processing
• A parallel processing system can be achieved by having a
multiplicity of functional units that perform identical or
different operations simultaneously. The data can be
distributed among the multiple functional units.
• The following diagram shows one possible way of
separating the execution unit into eight functional units
operating in parallel.
• The operation performed in each functional unit is
indicated in each block of the diagram.
• The adder and integer multiplier perform arithmetic
operations on integer numbers.
• The floating-point operations are separated into three
circuits operating in parallel.
• The logic, shift, and increment operations can be
performed concurrently on different data. All units are
independent of each other, so one number can be shifted
while another number is being incremented.
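As an illustration of this idea (a sketch, not part of the original slides), the following C program lets two threads stand in for independent functional units: one shifts a number while the other increments a different number, so the two operations proceed concurrently.

    #include <stdio.h>
    #include <pthread.h>

    static int shift_in = 8, shift_out;   /* operands for the "shifter"      */
    static int inc_in = 41, inc_out;      /* operands for the "incrementer"  */

    static void *shift_unit(void *arg) {
        (void)arg;
        shift_out = shift_in << 1;        /* shift one number...             */
        return NULL;
    }

    static void *increment_unit(void *arg) {
        (void)arg;
        inc_out = inc_in + 1;             /* ...while another is incremented */
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, shift_unit, NULL);
        pthread_create(&t2, NULL, increment_unit, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("shifted=%d incremented=%d\n", shift_out, inc_out);
        return 0;
    }

Because the two "units" touch disjoint data, no ordering between them is needed; compile with -lpthread.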
PARALLEL PROCESSING CHALLENGES
The Hardware Model
• An ideal processor is one where all constraints on ILP are
removed. The only limits on ILP in such a processor are those
imposed by the actual data flows through either registers or
memory.
1) Register renaming
An unbounded number of virtual registers is available; hence all
WAW and WAR hazards are avoided, and an unlimited number of
instructions can begin execution simultaneously.
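The effect of renaming can be mimicked at the source level (a hedged sketch; real renaming happens in hardware on physical registers): reusing one name serializes independent work through WAR/WAW dependences, while giving each result a fresh name removes them.

    #include <stdio.h>

    int main(void) {
        /* Without renaming: one register "r" is reused, so the second
           write must wait for the first write (WAW) and for the read
           of r (WAR), even though the computations are independent. */
        int r;
        r = 2 * 3;          /* write r                      */
        int x = r + 1;      /* read r  (WAR with next write) */
        r = 10 * 10;        /* write r (WAW with first write)*/
        int y = r - 1;

        /* With renaming: each write gets a fresh "virtual register",
           so the two computations may overlap freely. */
        int r1 = 2 * 3;     /* renamed instance of r */
        int r2 = 10 * 10;   /* renamed instance of r: no WAW/WAR */
        int x2 = r1 + 1;
        int y2 = r2 - 1;

        printf("%d %d %d %d\n", x, y, x2, y2);
        return 0;
    }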
2) Branch prediction
Branch prediction is perfect. All conditional branches are predicted
exactly.
3) Jump prediction
All jumps (including jump register used for return and
computed jumps) are perfectly predicted. When combined with
perfect branch prediction, this is equivalent to having a processor
with perfect speculation and an unbounded buffer of instructions
available for execution.
4) Memory address alias analysis
All memory addresses are known exactly, and a load can be
moved before a store provided that the addresses are not
identical. Note that this implements perfect address alias analysis.
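A small C sketch of the same idea: the standard C99 restrict qualifier hands the compiler the no-alias guarantee that the ideal processor is assumed to derive for every load/store pair, so a load may be scheduled ahead of a preceding store.

    #include <stddef.h>

    /* Without alias information, the compiler must assume dst and src
       could overlap, so the load of src[i] cannot be hoisted above an
       earlier store to dst. 'restrict' asserts the regions are
       distinct, which legalizes that reordering. */
    void scale(float *restrict dst, const float *restrict src, size_t n) {
        for (size_t i = 0; i < n; i++)
            dst[i] = 2.0f * src[i];
    }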
5) Perfect caches
All memory accesses take 1 clock cycle. In practice, superscalar
processors will typically consume large amounts of ILP hiding
cache misses, making these results highly optimistic.
Limitations on the Window Size and Maximum Issue Count
2) Tournament-based branch predictor
The prediction scheme uses a correlating 2-bit predictor and a
noncorrelating 2-bit predictor together with a selector, which
chooses the better predictor for each branch.
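A minimal simulation of this scheme in C (an illustration with single-entry tables; a real predictor indexes large tables by branch address and history, and the counters here are plain 2-bit saturating counters rather than the correlating variant):

    #include <stdio.h>

    static int sat_update(int c, int taken) {   /* 2-bit counter: 0..3 */
        if (taken) return c < 3 ? c + 1 : 3;
        else       return c > 0 ? c - 1 : 0;
    }

    int main(void) {
        int p1 = 2, p2 = 1;       /* the two component 2-bit predictors */
        int sel = 2;              /* 2-bit selector: >=2 prefers p1     */
        int outcomes[] = {1, 1, 0, 1, 1, 1, 0, 1};   /* 1 = taken */

        for (int i = 0; i < 8; i++) {
            int g1 = p1 >= 2, g2 = p2 >= 2;       /* each component's guess */
            int guess = (sel >= 2) ? g1 : g2;     /* tournament choice      */
            int taken = outcomes[i];
            printf("predict=%d actual=%d\n", guess, taken);

            /* Train both predictors; move the selector toward whichever
               component was correct when they disagree. */
            if (g1 != g2) sel = sat_update(sel, g1 == taken);
            p1 = sat_update(p1, taken);
            p2 = sat_update(p2, taken);
        }
        return 0;
    }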
The Effects of Finite Registers
• This limit examines how the available ILP shrinks when the number
of registers available for renaming is finite rather than unbounded.
FLYNN'S CLASSIFICATION
SISD (Single Instruction stream, Single Data stream)
• Single processor
• Single instruction stream
• Data stored in a single memory
• A uni-processor
SIMD (Single Instruction stream, Multiple Data streams)
• A single machine instruction controls the simultaneous execution
of a number of processing elements on different sets of data
• Vector and array processors fall into this category
MISD (Multiple Instruction streams, Single Data stream)
• A sequence of data is transmitted to a set of processors
• Each processor executes a different instruction sequence
• Has never been implemented commercially
MIMD (Multiple Instruction streams, Multiple Data streams)
• Each processor executes a different sequence of instructions on a
different set of data
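For the SIMD category, a sketch in C: the loop below applies one operation to many data elements; a vectorizing compiler (or explicit SSE/NEON intrinsics) would execute several iterations per machine instruction.

    #include <stdio.h>

    int main(void) {
        float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
        float c[8];

        /* One operation, many data elements: the SIMD pattern. */
        for (int i = 0; i < 8; i++)
            c[i] = a[i] + b[i];

        for (int i = 0; i < 8; i++)
            printf("%g ", c[i]);
        printf("\n");
        return 0;
    }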
HARDWARE MULTITHREADING
• Fine-grained multithreading: the processor switches between
threads on each instruction (or each clock cycle), interleaving
their execution even when no thread is stalled.
• Coarse-grained multithreading: the processor runs one thread and
switches to another only on a costly stall, such as a last-level
cache miss.
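The two switching policies can be contrasted with a toy C model (an illustration only; the real mechanism is inside the processor and invisible to software): fine-grained rotates threads every cycle, while coarse-grained stays on one thread until it hits a stall.

    #include <stdio.h>

    int main(void) {
        const char *stalls[2] = { "..S.", ".S.." };  /* 'S' = stall cycle */

        printf("fine-grained:   ");
        for (int cyc = 0; cyc < 8; cyc++)
            printf("T%d ", cyc % 2);                 /* switch every cycle */

        printf("\ncoarse-grained: ");
        int cur = 0;
        for (int cyc = 0; cyc < 8; cyc++) {
            if (stalls[cur][cyc % 4] == 'S')
                cur ^= 1;                            /* switch only on stall */
            printf("T%d ", cur);
        }
        printf("\n");
        return 0;
    }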
• Applications that benefit from hardware multithreading and
multicore processors:
• Database servers
• Web servers
• Telecommunication markets
• Multimedia applications
• Scientific applications
Shared Memory Multiprocessors
• In shared-memory multiprocessors, multiple processors access one
or more shared memory modules. The processors may be physically
connected to the memory modules in many ways, but logically every
processor is connected to every memory module.
• One of the major characteristics of shared memory
multiprocessors is that all processors have equally direct access
to one large memory address space. The limitation of shared
memory multiprocessors is memory access latency.
• The figure shows shared-memory multiprocessors.
• Shared memory multiprocessors have a major benefit over other
multiprocessors: all the processors share the same view of
memory.
• These systems are also termed Uniform Memory Access (UMA)
systems. This term denotes that memory is equally accessible to
every processor, providing access at the same performance rate.
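A sketch of the shared-memory model using POSIX threads (threads stand in for processors; the counter lives in the single address space that all of them see, and a lock orders the concurrent updates):

    #include <stdio.h>
    #include <pthread.h>

    static long counter = 0;                 /* one shared address space */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *add(void *arg) {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);
            counter++;                       /* every thread updates the
                                                same shared location     */
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void) {
        pthread_t t[4];
        for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, add, NULL);
        for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
        printf("counter=%ld\n", counter);    /* prints 400000 */
        return 0;
    }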
Clustered Multiprocessors
• A clustered system integrates several machines (nodes) so that
they work as one system to complete tasks.
• Cluster systems are a mix of hardware clustering and cluster
software.
• Hardware clustering lets high-performance disks be shared among
the nodes.
• The cluster software makes the nodes work together; every node
of the clustered system runs the cluster software.
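In contrast with shared memory, cluster nodes cooperate only by exchanging messages. A minimal sketch using MPI, the usual message-passing interface on clusters (requires an MPI implementation; run e.g. with mpirun -np 4): each node computes a private partial value and the results are combined at node 0.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this node's id    */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of nodes   */

        int local = rank + 1, total = 0;        /* per-node partial value;
                                                   no memory is shared    */
        MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum over %d nodes = %d\n", size, total);
        MPI_Finalize();
        return 0;
    }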