UNIT-4 - Pipelining & Parallel Processing

Uploaded by

gedelaarjun333

Pipelining and Vector Processing 1

PIPELINING AND VECTOR PROCESSING

• Parallel Processing

• Pipelining

• Arithmetic Pipeline

• Instruction Pipeline

• RISC Pipeline

Computer Organization Computer Architectures Lab



PARALLEL PROCESSING

Parallel processing is a term used for a large class of techniques that provide simultaneous data-processing tasks for the purpose of increasing the computational speed of a computer system.

Instead of processing each instruction sequentially, as in a conventional computer, a parallel processing system performs concurrent data processing to achieve faster execution time.


• Ex: While an instruction is being executed in the ALU, the next instruction can be read from memory.

• The system may have two or more ALUs and be able to execute two or more instructions at the same time.

• Purpose: to increase throughput.

• Throughput: the amount of processing that can be accomplished during a given interval of time.


• The amount of hardware increases with parallel processing, and with it the cost of the system.

• However, technological developments have reduced hardware costs to the point where parallel processing techniques are economically feasible.


PARALLEL PROCESSING
• Example of parallel processing:
– Multiple functional units: separate the execution unit into eight functional units operating in parallel.


• The above figure shows one possible way of separating the execution unit into eight functional units operating in parallel.
• Arithmetic operations with integers: adder-subtractor, integer multiplier.
• All units are independent of each other.


• Parallel processing can be classified according to:
(i) Internal organization of the processors
(ii) Interconnection structure between processors
(iii) Flow of information through the system

• M. J. Flynn classifies the organization of a computer system by the number of instructions and data items that are manipulated simultaneously.
• The sequence of instructions read from memory constitutes an instruction stream. The operations performed on the data in the processor constitute a data stream.
• Flynn's classification divides computers into four major groups.


PARALLEL COMPUTERS
Architectural Classification
– Flynn's classification
» Based on the multiplicity of Instruction Streams and Data Streams
» Instruction Stream
• Sequence of Instructions read from memory
» Data Stream
• Operations performed on the data in the processor

                            Number of Data Streams
                            Single      Multiple
Number of        Single     SISD        SIMD
Instruction      Multiple   MISD        MIMD
Streams
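
The table can be read as a simple lookup on the two stream counts. The following is a minimal sketch of that lookup; the function name `flynn_class` and its parameters are illustrative and do not come from the slides.

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Return the Flynn category for the given numbers of streams."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    # "S" for a single stream, "M" for multiple streams.
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return f"{i}I{d}D"


if __name__ == "__main__":
    for streams in [(1, 1), (1, 8), (4, 1), (4, 4)]:
        print(streams, flynn_class(*streams))
```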


SISD COMPUTER SYSTEMS

[Figure: Control Unit, Processor Unit, and Memory, connected by an instruction stream and a data stream]

• Characteristics:
– One control unit, one processor unit, and one memory unit
– Parallel processing may be achieved by means of:
» multiple functional units
» pipeline processing


MISD COMPUTER SYSTEMS

[Figure: several M, CU, P chains attached to one Memory; multiple instruction streams operate on a single data stream]

Characteristics

- There is no computer at present that can be classified as MISD.

- It is of theoretical interest only, since no practical system has been constructed using this organization.


SIMD COMPUTER SYSTEMS


[Figure: Memory and Control Unit joined by a data bus; the Control Unit sends one instruction stream to the processor units P ... P, which exchange data streams with the memory modules M ... M through an alignment network]

• Characteristics
– Only one copy of the program exists
– All processors receive the same instruction from the control unit but operate on different items of data.

MIMD COMPUTER SYSTEMS


[Figure: processor and memory pairs (P, M) connected through an Interconnection Network to a Shared Memory]

• Characteristics:
– Multiple processing units (multiprocessor system)
– Execution of multiple instructions on multiple data

• Types of MIMD computer systems
- Shared-memory multiprocessors
- Message-passing multicomputers (multicomputer systems)


PIPELINING
• A technique of decomposing a sequential process into suboperations,
with each subprocess being executed in a special dedicated segment
that operates concurrently with all other segments.
Example: Ai * Bi + Ci for i = 1, 2, 3, ..., 7

Segment 1: registers R1 and R2, loaded with Ai and Bi from memory
Segment 2: multiplier, registers R3 and R4 (R4 is loaded with Ci from memory)
Segment 3: adder, register R5

Suboperations in each segment:
R1 ← Ai, R2 ← Bi            Load Ai and Bi
R3 ← R1 * R2, R4 ← Ci       Multiply and load Ci
R5 ← R3 + R4                Add


OPERATIONS IN EACH PIPELINE STAGE

Clock Pulse   Segment 1     Segment 2          Segment 3
Number        R1    R2      R3         R4      R5
1             A1    B1      ---        ---     ---
2             A2    B2      A1 * B1    C1      ---
3             A3    B3      A2 * B2    C2      A1 * B1 + C1
4             A4    B4      A3 * B3    C3      A2 * B2 + C2
5             A5    B5      A4 * B4    C4      A3 * B3 + C3
6             A6    B6      A5 * B5    C5      A4 * B4 + C4
7             A7    B7      A6 * B6    C6      A5 * B5 + C5
8             ---   ---     A7 * B7    C7      A6 * B6 + C6
9             ---   ---     ---        ---     A7 * B7 + C7
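
The register contents in the table above can be reproduced by clocking the three segments in software. This is a minimal sketch, assuming each register latches once per clock pulse; the function name `simulate_pipeline` and the use of `None` for an empty register are illustrative choices, not part of the slides.

```python
def simulate_pipeline(a, b, c):
    """Clock a 3-segment pipeline that computes a[i] * b[i] + c[i].

    Returns (rows, results). rows holds the register contents
    (R1, R2, R3, R4, R5) after each clock pulse; None means empty.
    """
    n = len(a)
    r1 = r2 = r3 = r4 = r5 = None
    rows, results = [], []
    for pulse in range(n + 2):  # k + n - 1 pulses with k = 3 segments
        # Update from the last segment backwards so that each segment
        # reads the values latched on the previous pulse.
        r5 = r3 + r4 if r3 is not None else None          # segment 3: add
        if r1 is not None:                                # segment 2: multiply, load Ci
            r3, r4 = r1 * r2, c[pulse - 1]
        else:
            r3 = r4 = None
        if pulse < n:                                     # segment 1: load Ai, Bi
            r1, r2 = a[pulse], b[pulse]
        else:
            r1 = r2 = None
        rows.append((r1, r2, r3, r4, r5))
        if r5 is not None:
            results.append(r5)
    return rows, results


if __name__ == "__main__":
    rows, results = simulate_pipeline([1, 2, 3, 4, 5, 6, 7], [2] * 7, [10] * 7)
    for pulse, row in enumerate(rows, start=1):
        print(pulse, row)
    print(results)
```

With seven operand sets the loop runs for nine pulses, matching the nine rows of the table.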


GENERAL PIPELINE
• General Structure of a 4-Segment Pipeline

Input → S1 → R1 → S2 → R2 → S3 → R3 → S4 → R4   (all registers are driven by a common clock)

• Space-Time Diagram
The following diagram shows 6 tasks, T1 through T6, executed in 4 segments.

             Clock cycles
             1   2   3   4   5   6   7   8   9
Segment  1   T1  T2  T3  T4  T5  T6
         2       T1  T2  T3  T4  T5  T6
         3           T1  T2  T3  T4  T5  T6
         4               T1  T2  T3  T4  T5  T6

No matter how many segments there are, once the pipeline is full it takes only one clock period to obtain an output.
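
The space-time diagram follows one rule: task j occupies segment s during clock cycle j + s - 1. The sketch below builds the diagram from that rule for any number of segments and tasks; the function name `space_time` and the grid layout are illustrative assumptions.

```python
def space_time(k: int, n: int):
    """Return a k-row grid; grid[s][c] is the task in segment s+1 at cycle c+1."""
    cycles = k + n - 1
    grid = [["" for _ in range(cycles)] for _ in range(k)]
    for task in range(n):
        for seg in range(k):
            # A task enters segment 1 at its own cycle and advances
            # one segment per clock cycle.
            grid[seg][task + seg] = f"T{task + 1}"
    return grid


if __name__ == "__main__":
    for seg, row in enumerate(space_time(4, 6), start=1):
        print(seg, " ".join(f"{cell:3}" for cell in row))
```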


PIPELINE SPEEDUP
Consider the case where a k-segment pipeline is used to execute n tasks.
– n = 6 in the previous example
– k = 4 in the previous example
• Pipelined Machine (k stages, n tasks)
– Clock cycles to complete the n tasks = k + (n - 1) (9 in the previous example)
• Conventional Machine (Non-Pipelined)
– Cycles to complete each task in the nonpipelined machine = k
– For n tasks, nk cycles are required
• Speedup (S)
– S = Nonpipeline time / Pipeline time
– For n tasks: S = nk / (k + n - 1)
– As n becomes much larger than k - 1, k + n - 1 approaches n; therefore S approaches nk/n = k
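
The cycle counts and the speedup formula can be checked numerically. This is a small sketch under the slide's assumption that one task takes k clock cycles on the nonpipelined machine; the function names are illustrative.

```python
def pipeline_cycles(k: int, n: int) -> int:
    """Clock cycles for n tasks on a k-segment pipeline."""
    return k + n - 1


def nonpipeline_cycles(k: int, n: int) -> int:
    """Clock cycles for n tasks when each task takes k cycles."""
    return n * k


def speedup(k: int, n: int) -> float:
    """S = nonpipeline time / pipeline time."""
    return nonpipeline_cycles(k, n) / pipeline_cycles(k, n)


if __name__ == "__main__":
    print(pipeline_cycles(4, 6))            # the 4-segment, 6-task example
    for n in (6, 100, 10_000, 1_000_000):
        print(n, round(speedup(4, n), 4))   # approaches k = 4 as n grows
```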


PIPELINE AND MULTIPLE FUNCTION UNITS


Example:
- 4-stage pipeline
- 100 tasks to be executed in sequence
- Each task takes 4 clock cycles in the non-pipelined system
Pipelined System : k + n - 1 = 4 + 99 = 103 clock cycles
Non-Pipelined System : n * k = 100 * 4 = 400 clock cycles
Speedup : S = 400 / 103 = 3.88


Types of Pipelining
• Arithmetic Pipeline
• Instruction Pipeline


Arithmetic Pipeline
• Pipelined arithmetic units are usually found in very high-speed computers.
• They are used to implement floating-point operations (addition and subtraction) and multiplication of fixed-point numbers.
• The inputs to the floating-point adder pipeline are two normalized floating-point binary numbers.
• Floating-point addition and subtraction can be performed in four segments, as shown in the figure below.
• The registers labeled R are placed between the segments to store intermediate results.


• The suboperations performed in the four segments are:
1. Compare the exponents
2. Align the mantissas
3. Add or subtract the mantissas
4. Normalize the result


ARITHMETIC PIPELINE
Floating-point adder

[1] Compare the exponents
[2] Align the mantissas
[3] Add/subtract the mantissas
[4] Normalize the result

Inputs: exponents a, b and mantissas A, B. Registers (R) separate the segments.
Segment 1: Compare exponents by subtraction (produces the difference)
Segment 2: Choose exponent, align mantissas
Segment 3: Add or subtract mantissas
Segment 4: Adjust exponent, normalize result

Example:
X = A x 10^a = 0.9504 x 10^3
Y = B x 10^b = 0.8200 x 10^2
1) Compare exponents:
   3 - 2 = 1
2) Align mantissas:
   X = 0.9504 x 10^3
   Y = 0.08200 x 10^3
3) Add mantissas:
   Z = 1.0324 x 10^3
4) Normalize result:
   Z = 0.10324 x 10^4
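
The four suboperations can be traced in software with decimal arithmetic. The following is a minimal sketch, assuming mantissas are decimal fractions and exponents are powers of 10 as in the worked example; the function name `fp_add` is illustrative, and the code runs the segments one after another rather than overlapping them as the hardware pipeline would.

```python
from decimal import Decimal


def fp_add(A: Decimal, a: int, B: Decimal, b: int):
    """Add A x 10^a and B x 10^b; return (mantissa, exponent)."""
    # Segment 1: compare the exponents by subtraction.
    diff = a - b
    # Segment 2: choose the larger exponent and shift the mantissa
    # of the smaller number to the right.
    if diff >= 0:
        exp, B = a, B.scaleb(-diff)
    else:
        exp, A = b, A.scaleb(diff)
    # Segment 3: add the mantissas.
    Z = A + B
    # Segment 4: normalize so that 0.1 <= |Z| < 1, adjusting the exponent.
    while abs(Z) >= 1:
        Z, exp = Z.scaleb(-1), exp + 1
    while Z != 0 and abs(Z) < Decimal("0.1"):
        Z, exp = Z.scaleb(1), exp - 1
    return Z, exp


if __name__ == "__main__":
    Z, exp = fp_add(Decimal("0.9504"), 3, Decimal("0.8200"), 2)
    print(f"Z = {Z} x 10^{exp}")
```

`Decimal` is used instead of `float` so that the shifted mantissas match the hand calculation digit for digit.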


• The comparator, shifter, adder-subtractor, incrementer, and decrementer in the floating-point pipeline are implemented with combinational circuits.

• Suppose the four segments have delays of 60 ns, 70 ns, 100 ns, and 80 ns (60 + 70 + 100 + 80 = 310 ns in total).
• Register delay = 10 ns
• Non-pipelined adder delay = 310 + 10 = 320 ns
• Pipelined adder clock cycle = slowest segment + register delay = 100 + 10 = 110 ns
• Speedup = 320 / 110 = 2.9
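
The same arithmetic can be written as a short calculation. This sketch assumes, as the figures above do, that the pipeline clock is set by the slowest segment plus the register delay; the function name `clock_speedup` is illustrative.

```python
def clock_speedup(segment_delays_ns, register_delay_ns):
    """Return (pipelined clock cycle, non-pipelined delay, speedup)."""
    # The pipeline clock must accommodate the slowest segment.
    t_pipe = max(segment_delays_ns) + register_delay_ns
    # Without pipelining, the signal passes through every segment once.
    t_nonpipe = sum(segment_delays_ns) + register_delay_ns
    return t_pipe, t_nonpipe, t_nonpipe / t_pipe


if __name__ == "__main__":
    t_pipe, t_nonpipe, s = clock_speedup([60, 70, 100, 80], 10)
    print(t_pipe, t_nonpipe, round(s, 1))
```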


INSTRUCTION PIPELINE

Pipeline processing can occur not only in the data stream but in the instruction stream as well.

An instruction pipeline reads consecutive instructions from memory while previous instructions are being executed in other segments.

Six Phases* in an Instruction Cycle

[1] Fetch an instruction from memory
[2] Decode the instruction
[3] Calculate the effective address of the operand
[4] Fetch the operands from memory
[5] Execute the operation
[6] Store the result in the proper place


• * Some instructions skip some phases.
• * Effective address calculation can be done as part of the decoding phase.
• * Storage of the operation result into a register is done automatically in the execution phase.

==> 4-Stage Pipeline

[1] FI: Fetch an instruction from memory
[2] DA: Decode the instruction and calculate the effective address of the operand
[3] FO: Fetch the operand
[4] EX: Execute the operation

INSTRUCTION PIPELINE
Execution of Three Instructions in a 4-Stage Pipeline

Conventional

i      FI  DA  FO  EX
i+1                    FI  DA  FO  EX
i+2                                    FI  DA  FO  EX

Pipelined

i      FI  DA  FO  EX
i+1        FI  DA  FO  EX
i+2            FI  DA  FO  EX


INSTRUCTION EXECUTION IN A 4-STAGE PIPELINE

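Segment 1: Fetch instruction from memory
Segment 2: Decode instruction and calculate effective address
           Branch? If yes: update PC and empty the pipe. If no: continue to segment 3.
Segment 3: Fetch operand from memory
Segment 4: Execute instruction
           Interrupt? If yes: interrupt handling, then update PC and empty the pipe. If no: continue with the next instruction.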


INSTRUCTION EXECUTION IN A 4-STAGE PIPELINE

Step:            1   2   3   4   5   6   7   8   9   10  11  12  13
Instruction  1   FI  DA  FO  EX
             2       FI  DA  FO  EX
    (Branch) 3           FI  DA  FO  EX
             4               FI          FI  DA  FO  EX
             5                               FI  DA  FO  EX
             6                                   FI  DA  FO  EX
             7                                       FI  DA  FO  EX
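
The timing above can be generated from a simple scheduling rule. This is a minimal sketch, assuming a branch is resolved at the end of its EX phase so that the next useful fetch starts on the following step; the names `PHASES` and `schedule` are illustrative, and the discarded fetch of instruction 4 at step 4 is not modeled.

```python
PHASES = ["FI", "DA", "FO", "EX"]


def schedule(n_instructions: int, branch_at=None):
    """Return {instruction: {phase: step}} for a 4-segment pipeline."""
    timing = {}
    next_fetch = 1
    for i in range(1, n_instructions + 1):
        # Each instruction moves through one phase per step.
        timing[i] = {p: next_fetch + s for s, p in enumerate(PHASES)}
        if i == branch_at:
            # Instructions after a branch wait until the branch has executed.
            next_fetch = timing[i]["EX"] + 1
        else:
            next_fetch += 1
    return timing


if __name__ == "__main__":
    for i, steps in schedule(7, branch_at=3).items():
        print(i, steps)
```

Without the branch, the seventh instruction would finish at step k + n - 1 = 10; the branch at instruction 3 delays it to step 13.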


Pipeline Conflicts
– Pipeline conflicts: 3 major difficulties
1) Resource conflicts: access to memory by two segments at the same time. Most of these conflicts can be resolved by using separate instruction and data memories.

2) Data dependency: an instruction depends on the result of a previous instruction, but this result is not yet available.

3) Branch difficulties: branch and other instructions (interrupt, return, ...) that change the value of the PC.


RISC Computer
• RISC (Reduced Instruction Set Computer)
- Machine with a very fast clock cycle that executes at the rate of one
instruction per cycle.

• Major Characteristics
1. Relatively few instructions
2. Relatively few addressing modes
3. Memory access limited to load and store instructions
4. All operations done within the registers of the CPU
5. Fixed-length, easily decoded instruction format
6. Single-cycle instruction execution
7. Hardwired rather than microprogrammed control
8. Relatively large number of registers in the processor unit
9. Efficient instruction pipeline
10. Compiler support for efficient translation of high-level language
programs into machine language programs

RISC PIPELINE

• The instruction cycle can be divided into three suboperations and implemented in three segments (I, A, E).

I segment: fetches the instruction from program memory.
A segment: decodes the instruction and performs an ALU operation.
E segment: transfers the output of the ALU to a register, memory, or the PC.


• Types of instructions

- Data Manipulation Instructions

- Load and Store Instructions

- Program Control Instructions


• 9-5 RISC Pipeline


– Example : Three-segment Instruction Pipeline
– Pipeline timing with data conflict :
– Pipeline timing with delayed load :


• In figure (a), there is a data conflict in instruction 3 because the operand in R2 is not yet available in the A segment.
• 1. LOAD: R1 ← M[address 1]
• 2. LOAD: R2 ← M[address 2]
• 3. ADD: R3 ← R1 + R2
• 4. STORE: M[address 3] ← R3
• Solution: delayed load. A no-operation (NOP) instruction is inserted after the second LOAD so that the loaded data is available before ADD uses it.
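
The delayed-load idea can be sketched as a pass over the instruction list that inserts a NOP wherever an instruction reads a register loaded by the instruction immediately before it. The tuple format `(op, dest, sources)` and the function name `insert_delayed_loads` are hypothetical, and the sketch assumes that only LOAD results arrive late, as in the four-instruction example.

```python
def insert_delayed_loads(program):
    """Return a copy of program with a NOP after each conflicting LOAD.

    program is a list of (op, dest, sources) tuples.
    """
    out = []
    for op, dest, sources in program:
        prev = out[-1] if out else None
        # With one instruction issued per cycle, a LOAD is still in its
        # E segment while the next instruction is in its A segment.
        if prev is not None and prev[0] == "LOAD" and prev[1] in sources:
            out.append(("NOP", None, ()))
        out.append((op, dest, sources))
    return out


if __name__ == "__main__":
    program = [
        ("LOAD", "R1", ()),            # R1 <- M[address 1]
        ("LOAD", "R2", ()),            # R2 <- M[address 2]
        ("ADD", "R3", ("R1", "R2")),   # R3 <- R1 + R2
        ("STORE", None, ("R3",)),      # M[address 3] <- R3
    ]
    for instruction in insert_delayed_loads(program):
        print(instruction)
```

Only the ADD is delayed: R1 was loaded two instructions earlier, so it is already available, while R2 comes from the LOAD directly before it.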
