UNIT-4 - Pipelining & Parallel Processing
• Parallel Processing
• Pipelining
• Arithmetic Pipeline
• Instruction Pipeline
• RISC Pipeline
PARALLEL PROCESSING
• Example of parallel processing:
– Multiple functional units: the execution unit is separated into
eight functional units operating in parallel.
PARALLEL COMPUTERS
Architectural Classification
– Flynn's classification
» Based on the multiplicity of Instruction Streams and Data Streams
» Instruction Stream
• Sequence of Instructions read from memory
» Data Stream
• Operations performed on the data in the processor
[Figure: SISD organization; label: Instruction stream]
• Characteristics (SISD):
– One control unit, one processor unit, and one memory unit
– Parallel processing may be achieved by means of:
» multiple functional units
» pipeline processing
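[Figure: organization with several rows of memory (M), control unit (CU), and processor (P); labels: Memory, Data stream, Instruction stream]
[Figure: organization with a single control unit; labels: Control Unit, Instruction stream, Data stream, Alignment network]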
• Characteristics (SIMD):
– Only one copy of the program exists
– All processors receive the same instruction from the control
unit but operate on different items of data.
[Figure: processors connected through an interconnection network to shared memory]
• Characteristics (MIMD):
– Multiple processing units (multiprocessor system)
– Execution of multiple instructions on multiple data
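Flynn's classification depends only on whether there is one or more than one instruction stream and data stream. The sketch below illustrates that rule; the function name `flynn_class` and its integer arguments are hypothetical and are not part of the slides.

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Return the Flynn class name for the given stream counts.

    Assumption: a count of 1 means "single", anything larger means "multiple".
    """
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return f"{i}I{d}D"


# One instruction stream applied to many data items is SIMD.
print(flynn_class(1, 8))
```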
PIPELINING
• A technique of decomposing a sequential process into suboperations,
with each subprocess being executed in a special dedicated segment
that operates concurrently with all other segments.
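• Example: compute Ai * Bi + Ci for i = 1, 2, 3, ..., 7
– Segment 1: Ai and Bi are loaded into registers R1 and R2
– Segment 2: the multiplier output (R1 * R2) goes to R3; Ci is loaded from memory into R4
– Segment 3: the adder output (R3 + R4) goes to R5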
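The three-segment example can be sketched as a small clock-by-clock simulation. This is only an illustration under assumptions: the function name `run_pipeline` is hypothetical, `None` marks an empty register, and registers are updated from the last segment back to the first so that each segment reads the values latched on the previous clock.

```python
def run_pipeline(A, B, C):
    """Simulate the 3-segment pipeline computing A[i] * B[i] + C[i]."""
    n = len(A)
    R1 = R2 = R3 = R4 = R5 = None
    results = []
    # n + 2 clock pulses are needed to fill and drain three segments.
    for clock in range(n + 2):
        # Segment 3: add the values latched in R3 and R4.
        R5 = R3 + R4 if R3 is not None else None
        # Segment 2: multiply R1 and R2, and fetch the matching C value.
        if R1 is not None:
            R3, R4 = R1 * R2, C[clock - 1]
        else:
            R3 = R4 = None
        # Segment 1: load the next pair of operands, if any remain.
        if clock < n:
            R1, R2 = A[clock], B[clock]
        else:
            R1 = R2 = None
        if R5 is not None:
            results.append(R5)
    return results


print(run_pipeline([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

The first result appears on the third clock pulse; after that, one result is produced per clock.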
GENERAL PIPELINE
• General Structure of a 4-Segment Pipeline
Input -> S1 -> R1 -> S2 -> R2 -> S3 -> R3 -> S4 -> R4
(a common clock is applied to registers R1 through R4)
• Space-Time Diagram
The following diagram shows 6 tasks T1 through T6 executed in 4
segments.
Clock cycles:  1   2   3   4   5   6   7   8   9
Segment 1:     T1  T2  T3  T4  T5  T6
Segment 2:         T1  T2  T3  T4  T5  T6
Segment 3:             T1  T2  T3  T4  T5  T6
Segment 4:                 T1  T2  T3  T4  T5  T6
No matter how many segments, once the pipeline is full, it takes only
one clock period to obtain an output.
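The space-time diagram follows a simple rule: in clock cycle c, segment s holds task c - s + 1, provided that task number lies between 1 and n. A minimal sketch of that rule follows; the function name `space_time` is hypothetical.

```python
def space_time(k, n):
    """Return one row per segment; each row lists the task held in every cycle."""
    total_cycles = k + n - 1
    rows = []
    for segment in range(1, k + 1):
        row = []
        for cycle in range(1, total_cycles + 1):
            task = cycle - segment + 1
            # An empty string marks a cycle in which the segment is idle.
            row.append(f"T{task}" if 1 <= task <= n else "")
        rows.append(row)
    return rows


# Print the diagram for 6 tasks in 4 segments.
for number, row in enumerate(space_time(4, 6), start=1):
    print(f"Segment {number}: " + " ".join(f"{cell:<3}" for cell in row))
```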
PIPELINE SPEEDUP
Consider the case where a k-segment pipeline is used to execute n tasks.
– n = 6 in the previous example
– k = 4 in the previous example
• Pipelined machine (k stages, n tasks)
– Clock cycles to complete n tasks = k + (n - 1) (9 in the previous example)
• Conventional machine (non-pipelined)
– Cycles to complete each task = k
– For n tasks, nk cycles are required (24 in the previous example)
• Speedup (S)
– S = nonpipeline time / pipeline time
– For n tasks: S = nk / (k + n - 1)
– As n becomes much larger than k - 1, k + n - 1 approaches n, so S approaches nk/n = k
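The speedup formula can be checked numerically. The sketch below assumes, as the slide does, that a non-pipelined task takes k cycles; the function names are hypothetical.

```python
def pipeline_cycles(k, n):
    """Cycles for n tasks on a k-segment pipeline."""
    return k + n - 1


def nonpipeline_cycles(k, n):
    """Cycles for n tasks when each task takes k cycles and tasks do not overlap."""
    return n * k


def speedup(k, n):
    """Ratio of non-pipelined time to pipelined time."""
    return nonpipeline_cycles(k, n) / pipeline_cycles(k, n)


print(speedup(4, 6))        # 24 / 9 for the previous example
print(speedup(4, 10**6))    # close to k = 4 for a very large n
```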
Types of Pipelining
• Arithmetic Pipeline
• Instruction Pipeline
Arithmetic Pipeline
• Pipelined arithmetic units are usually found in very high-speed
computers.
• They are used to implement floating-point operations (addition
and subtraction) and multiplication of fixed-point numbers.
• The inputs to the floating-point adder pipeline are two
normalized floating-point binary numbers.
• Floating-point addition and subtraction can be performed in
four segments, as shown in the figure below.
• The registers labeled R are placed between the segments to
store intermediate results.
• The suboperations performed in the four segments are:
ARITHMETIC PIPELINE
[Figure: four-segment floating-point adder pipeline; inputs are exponents a, b and mantissas A, B; registers R separate the segments]
1) Segment 1: Compare the exponents
2) Segment 2: Align the mantissas
   X = 0.9504 x 10^3
   Y = 0.08200 x 10^3
3) Segment 3: Add or subtract the mantissas
   Z = 1.0324 x 10^3
4) Segment 4: Normalize the result (adjust the exponent)
   Z = 0.10324 x 10^4
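The four suboperations can be traced with a short decimal sketch. It is an illustration only: the function name `fp_add` and the (mantissa, exponent) tuple layout are hypothetical, exact fractions stand in for hardware mantissas, and Y is assumed to be 0.8200 x 10^2 before alignment (the slide shows only the aligned value 0.08200 x 10^3).

```python
from fractions import Fraction


def fp_add(x, y):
    """Add two decimal floating-point numbers given as (mantissa, exponent).

    The value of each pair is mantissa * 10**exponent.
    """
    (ma, ea), (mb, eb) = x, y
    # Segment 1: compare the exponents and keep the larger one.
    diff = ea - eb
    exponent = max(ea, eb)
    # Segment 2: shift the mantissa that belongs to the smaller exponent.
    if diff > 0:
        mb = mb / 10**diff
    elif diff < 0:
        ma = ma / 10**(-diff)
    # Segment 3: add the aligned mantissas (a negative mantissa subtracts).
    m = ma + mb
    # Segment 4: normalize so that 0.1 <= |mantissa| < 1.
    if m == 0:
        return Fraction(0), 0
    while abs(m) >= 1:
        m /= 10
        exponent += 1
    while abs(m) < Fraction(1, 10):
        m *= 10
        exponent -= 1
    return m, exponent


X = (Fraction("0.9504"), 3)
Y = (Fraction("0.8200"), 2)   # assumed pre-alignment form of Y
mantissa, exponent = fp_add(X, Y)
print(float(mantissa), exponent)
```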
INSTRUCTION PIPELINE
Execution of Three Instructions in a 4-Stage Pipeline
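Conventional (12 cycles)
i     FI DA FO EX
i+1               FI DA FO EX
i+2                           FI DA FO EX
Pipelined (6 cycles)
i     FI DA FO EX
i+1      FI DA FO EX
i+2         FI DA FO EX
FI = fetch instruction, DA = decode instruction and calculate effective address,
FO = fetch operand, EX = execute instruction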
[Figure: flowchart of the four-segment instruction pipeline. Recoverable labels:
Segment 2: decode instruction and calculate effective address; Branch? (yes / no);
Segment 3: fetch operand from memory; Interrupt? (yes: interrupt handling / no);
Update PC; Empty pipe]
Step:           1  2  3  4  5  6  7  8  9  10 11 12 13
Instruction 1   FI DA FO EX
            2      FI DA FO EX
   (Branch) 3         FI DA FO EX
            4            FI -  -  FI DA FO EX
            5                     FI DA FO EX
            6                        FI DA FO EX
            7                           FI DA FO EX
(- marks an idle step; instruction 4 is fetched again at step 7, after the branch completes)
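The timing above can be reproduced with a short calculation. The sketch assumes that every instruction after the branch is delayed by k - 1 steps, because the instruction following the branch is fetched again only once the branch has left the last segment; the function name `completion_steps` is hypothetical.

```python
def completion_steps(n, k=4, branch_at=None):
    """Return the step at which each of n instructions finishes its last segment.

    branch_at is the 1-based number of a single branch instruction, or None.
    """
    steps = []
    for instr in range(1, n + 1):
        # Without any branch, instruction j finishes at step j + k - 1.
        finish = instr + k - 1
        # Instructions after the branch are refetched, costing k - 1 extra steps.
        if branch_at is not None and instr > branch_at:
            finish += k - 1
        steps.append(finish)
    return steps


print(completion_steps(7, k=4, branch_at=3))
```

With instruction 3 as the branch, the last of the seven instructions finishes at step 13 instead of step 10.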
Pipeline Conflicts
– Pipeline conflicts: 3 major difficulties
1) Resource conflicts: two segments access memory at the
same time. Most of these conflicts can be resolved by using
separate instruction and data memories.
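A resource conflict can be illustrated by counting memory accesses per step. The sketch below is a simplification under assumptions: every instruction accesses memory in its FI segment (offset 0) and its FO segment (offset 2), instruction j enters the pipeline at step j, and the function name `memory_conflicts` and the memory labels are hypothetical.

```python
from collections import defaultdict


def memory_conflicts(n, stage_memory):
    """Return the steps in which one memory is accessed by two or more segments.

    stage_memory maps a segment offset (0 for FI, 2 for FO) to a memory name.
    """
    accesses = defaultdict(int)
    for instr in range(1, n + 1):
        for offset, memory in stage_memory.items():
            # Instruction `instr` is in the segment with this offset at step instr + offset.
            accesses[(instr + offset, memory)] += 1
    return sorted({step for (step, _), count in accesses.items() if count > 1})


shared = {0: "M", 2: "M"}   # FI and FO use the same memory
split = {0: "I", 2: "D"}    # separate instruction and data memories
print(memory_conflicts(6, shared))
print(memory_conflicts(6, split))
```

With a single shared memory, the fetch of one instruction collides with the operand fetch of an earlier one; with separate memories, no step has a collision in this model.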
RISC Computer
• RISC (Reduced Instruction Set Computer)
- Machine with a very fast clock cycle that executes at the rate of one
instruction per cycle.
• Major characteristics
1. Relatively few instructions
2. Relatively few addressing modes
3. Memory access limited to load and store instructions
4. All operations done within the registers of the CPU
5. Fixed-length, easily decoded instruction format
6. Single-cycle instruction execution
7. Hardwired rather than microprogrammed control
8. Relatively large number of registers in the processor unit
9. Efficient instruction pipeline
10. Compiler support for efficient translation of high-level language
programs into machine language programs
• Types of instructions