CSE 4293 Pipelining


CSE 4293 – Computer Architecture

Pipelining

Nafiz Imtiaz
Lecturer, EEE, AUST
Ref: https://www.geeksforgeeks.org/computer-organization-and-architecture-pipelining-set-1-execution-stages-and-throughput/
Execution, Stages and Performance

CSE-4293 Computer Architecture @ Nafiz Imtiaz 1


Concept of pipelining

Speed can be increased through hardware development and also through software development.

• Hardware development -> requires changing the circuitry (a difficult process).

• Software development -> does not necessarily require changing the circuitry. We need to develop pipelining to execute instructions in the shortest possible time.



Concept of pipelining

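Cars served at a store without pipelining (O = Order, P = Pay, R = Receive; time in minutes):

Car / Time   1   2   3   4   5   6   7   8   9
C1           O   P   R
C2                       O   P   R
C3                                   O   P   R

Average time for each car without pipelining:

$\frac{\text{Total time}}{\text{Number of cars}} = \frac{9}{3} = 3\ \text{min}$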



Concept of pipelining

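Cars served at a store with pipelining (O = Order, P = Pay, R = Receive; time in minutes):

Car / Time   1   2   3   4   5   6   7   8   9
C1           O   P   R
C2               O   P   R
C3                   O   P   R

Average time for each car with pipelining:

$\frac{\text{Total time}}{\text{Number of cars}} = \frac{5}{3} \approx 1.67\ \text{min}$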
Definition of Pipelining

• Pipelining is the arrangement of the hardware elements of the CPU such that its overall performance is increased.
• More than one instruction is executed simultaneously in a pipelined processor.
• In pipelining, the execution of multiple instructions is overlapped.



Design of a basic pipeline
• In a pipelined processor, a pipeline has two ends, the input end and the
output end. Between these ends, there are multiple stages/segments such
that output of one stage is connected to input of next stage and each stage
performs a specific operation.

• Interface registers are used to hold the intermediate output between two
stages. These interface registers are also called latches or buffers.

• All the stages in the pipeline along with the interface registers are
controlled by a common clock.



Design of a basic pipeline

[Figure: a basic pipeline, with stages/segments separated by interface registers]



Pipeline Stages
A RISC processor has a 5-stage instruction pipeline to execute all the instructions
in the RISC instruction set. Following are the 5 stages of the RISC pipeline with
their respective operations:

• Stage 1 (Instruction Fetch)
In this stage, the CPU reads the instruction from the memory address held in the
program counter.

• Stage 2 (Instruction Decode)
In this stage, the instruction is decoded and the register file is accessed to get the
values of the registers used in the instruction.



Pipeline Stages
• Stage 3 (Instruction Execute)
In this stage, ALU operations are performed.

• Stage 4 (Memory Access)
In this stage, memory operands are read from or written to the memory address
specified in the instruction.

• Stage 5 (Write Back)
In this stage, the computed/fetched value is written back to the register specified in
the instruction.



Execution in a pipelined processor

The execution sequence of instructions in a pipelined processor can be visualized
using a space-time diagram.
For example, consider a processor with 5 stages and let there be 4
instructions to be executed. The execution sequence can then be drawn as a
space-time diagram, with and without pipelining.



Execution in a non-pipelined processor
Let,
Total number of instructions, N = 4
Total number of stages, K = 5
So, total cycles, T = K × N = 5 × 4 = 20 cycles

Stage / Time  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20
S1            I1                  I2                  I3                  I4
S2                I1                  I2                  I3                  I4
S3                    I1                  I2                  I3                  I4
S4                        I1                  I2                  I3                  I4
S5                            I1                  I2                  I3                  I4



Execution in a pipelined processor
Let,
Total number of instructions, N = 4
Total number of stages, K = 5
So, total cycles, T = (K + N - 1) = 5 + 4 - 1 = 8 cycles

Stage / Time  1   2   3   4   5   6   7   8
S1            I1  I2  I3  I4
S2                I1  I2  I3  I4
S3                    I1  I2  I3  I4
S4                        I1  I2  I3  I4
S5                            I1  I2  I3  I4
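
The pipelined space-time diagram can also be generated programmatically. The following is a minimal Python sketch, assuming an ideal pipeline with no stalls; the function names `space_time` and `render` are illustrative and not part of the course material.

```python
def space_time(num_instructions: int, num_stages: int) -> list:
    """Return a stage-by-cycle grid for an ideal pipeline (no stalls)."""
    total_cycles = num_stages + num_instructions - 1   # T = K + N - 1
    grid = [["" for _ in range(total_cycles)] for _ in range(num_stages)]
    for i in range(num_instructions):        # instruction index, 0-based
        for s in range(num_stages):          # stage index, 0-based
            # Instruction i occupies stage s during cycle i + s (0-based).
            grid[s][i + s] = f"I{i + 1}"
    return grid


def render(grid: list) -> str:
    """Format the grid as text, one row per stage and one column per cycle."""
    header = "     " + " ".join(f"{c + 1:>3}" for c in range(len(grid[0])))
    rows = [f"S{s + 1:<3} " + " ".join(f"{cell:>3}" for cell in row)
            for s, row in enumerate(grid)]
    return "\n".join([header] + rows)


grid = space_time(num_instructions=4, num_stages=5)
print(render(grid))
```

Each instruction moves one stage to the right per cycle, so the grid has K + N - 1 = 8 columns for N = 4 and K = 5.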



Calculations
Time taken to execute N instructions in a non-pipelined processor:

$ET_{\text{non-pipelining}} = K \times N \times T_p$

Time taken to execute N instructions in a pipelined processor:

$ET_{\text{pipelining}} = (K + N - 1) \times T_p$

Speedup (S) of the pipelined processor over the non-pipelined processor:

$S = \frac{ET_{\text{non-pipelining}}}{ET_{\text{pipelining}}} = \frac{K \times N \times T_p}{(K + N - 1) \times T_p} = \frac{K \times N}{K + N - 1}$



Calculations
When the number of tasks N is significantly larger than K, that is, N >> K:

$S = \frac{K \times N}{K + N - 1} \approx \frac{K \times N}{N} = K = S_{max}$

where K is the number of stages in the pipeline.

$\text{Efficiency} = \frac{\text{Given speedup}}{\text{Max speedup}} = \frac{S}{S_{max}} = \frac{S}{K}$

$\text{Throughput} = \frac{\text{Number of instructions}}{\text{Total time to complete the instructions}} = \frac{N}{(K + N - 1) \times T_p}$

Note: The cycles per instruction (CPI) value of an ideal pipelined processor is 1.
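
The formulas above can be collected into one small function. This is a sketch, assuming an ideal pipeline in which every stage takes one clock cycle of duration Tp; the name `pipeline_metrics` and the dictionary keys are illustrative.

```python
def pipeline_metrics(n: int, k: int, tp: float) -> dict:
    """Compute ideal-pipeline metrics for n instructions, k stages, cycle time tp."""
    et_non = k * n * tp              # ET without pipelining = K * N * Tp
    et_pipe = (k + n - 1) * tp       # ET with pipelining = (K + N - 1) * Tp
    speedup = et_non / et_pipe       # S = K * N / (K + N - 1)
    return {
        "et_non_pipelined": et_non,
        "et_pipelined": et_pipe,
        "speedup": speedup,
        "efficiency": speedup / k,   # S / S_max, with S_max = K
        "throughput": n / et_pipe,   # instructions per unit of time
    }


# The 4-instruction, 5-stage example, measured in cycles (tp = 1).
m = pipeline_metrics(n=4, k=5, tp=1.0)
print(m["et_non_pipelined"], m["et_pipelined"], m["speedup"])

# For N >> K the speedup approaches K.
large = pipeline_metrics(n=10**6, k=5, tp=1.0)
print(large["speedup"])
```

With tp = 1 the execution times are expressed directly in cycles, which matches the 20-cycle and 8-cycle totals of the space-time diagrams.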



Problems
If we need to execute 12 instructions on a 5-stage pipeline,
calculate the following: [Clock frequency = 2.8 GHz]
a) Total execution time without pipelining
b) Total execution time with pipelining
c) Speedup
d) Efficiency
e) Throughput



Solution

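Here, $T_p = \frac{1}{f_{clock}} = \frac{1}{2.8\ \text{GHz}} \approx 0.357\ \text{ns}$, with N = 12 and K = 5.

a) Total execution time without pipelining

$ET_{\text{non-pipelining}} = K \times N \times T_p = 5 \times 12 \times T_p = 60\,T_p \approx 21.43\ \text{ns}$

b) Total execution time with pipelining

$ET_{\text{pipelining}} = (K + N - 1) \times T_p = (5 + 12 - 1) \times T_p = 16\,T_p \approx 5.71\ \text{ns}$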



Solution

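c) Speedup (S) of the pipelined processor over the non-pipelined processor:

$S = \frac{ET_{\text{non-pipelining}}}{ET_{\text{pipelining}}} = \frac{K \times N \times T_p}{(K + N - 1) \times T_p} = \frac{K \times N}{K + N - 1} = \frac{5 \times 12}{5 + 12 - 1} = \frac{15}{4}$

d) Efficiency:

$\text{Efficiency} = \frac{\text{Given speedup}}{\text{Max speedup}} = \frac{S}{S_{max}} = \frac{S}{K} = \frac{15/4}{5} = \frac{3}{4} = 75\%$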



Solution
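e) Throughput:

$\text{Throughput} = \frac{\text{Number of instructions}}{\text{Total time to complete the instructions}} = \frac{N}{(K + N - 1) \times T_p} = \frac{12}{(5 + 12 - 1) \times T_p}$, where $T_p = \frac{1}{f_{clock}}$

With $f_{clock} = 2.8\ \text{GHz}$, this gives $\frac{12}{16} \times 2.8 \times 10^{9} = 2.1 \times 10^{9}$ instructions per second.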



Dependencies and Data Hazard



Dependencies in a pipelined processor
There are mainly three types of dependencies possible in a pipelined processor.
These are :

1) Structural Dependency
2) Control Dependency
3) Data Dependency

These dependencies may introduce stalls in the pipeline.

Stall: A stall is a cycle in the pipeline without new input (a NOP).



Structural dependency (Structural Hazard)

This dependency arises due to the resource conflict in the pipeline. A resource conflict is
a situation when more than one instruction tries to access the same resource in the same
cycle. A resource can be a register, memory, or ALU.



Example:

In cycle 4, instruction I1 (accessing memory for its operand) and instruction I4 (being fetched
from memory) are both trying to access the same resource (memory), which introduces a resource
conflict. To avoid this problem, we have to keep the instruction waiting until the required
resource (memory in our case) becomes available.



We need to introduce stalls in the pipeline:

[Figure: space-time diagram with stall cycles inserted]



Solution for structural dependency
To minimize structural dependency stalls in the pipeline, we use a hardware mechanism
called renaming.

Renaming:

With renaming, we divide the memory into two independent modules that store instructions and
data separately, called code memory (CM) and data memory (DM) respectively. CM contains all
the instructions, and DM contains all the operands required by the instructions.
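
The memory conflict and the effect of splitting the memory can be checked with a short simulation. This is a sketch under stated assumptions: an ideal 5-stage pipeline, one instruction entering per cycle, and every instruction accessing memory in both its IF and MEM stages. The function name `memory_conflicts` is illustrative.

```python
from collections import defaultdict

STAGES = ["IF", "ID", "EX", "MEM", "WB"]   # the 5 stages described earlier


def memory_conflicts(num_instructions: int, split_memory: bool = False) -> dict:
    """Map (cycle, memory module) to its users, keeping only contested entries.

    Assumption: every instruction touches memory in both IF and MEM.
    """
    users = defaultdict(list)
    for i in range(num_instructions):
        for s, stage in enumerate(STAGES):
            if stage not in ("IF", "MEM"):
                continue
            if split_memory:
                module = "CM" if stage == "IF" else "DM"   # code vs. data memory
            else:
                module = "MEM"                             # one shared memory
            cycle = i + s + 1                              # 1-based cycle number
            users[(cycle, module)].append(f"I{i + 1}:{stage}")
    return {key: who for key, who in users.items() if len(who) > 1}


shared = memory_conflicts(4)
split = memory_conflicts(4, split_memory=True)
print(shared)   # {(4, 'MEM'): ['I1:MEM', 'I4:IF']}
print(split)    # {}
```

With a single memory, I1 and I4 collide in cycle 4; with separate CM and DM modules, the fetch and the data access no longer compete.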





Control dependency (Control Hazard)
Any instruction that changes the program counter can cause a control hazard.

This type of dependency occurs with transfer-of-control instructions such as BRANCH,
CALL, JMP, etc. On many instruction set architectures, the processor does not know the target
address of these instructions when it needs to insert the next instruction into the pipeline. Due to
this, unwanted instructions are fed into the pipeline.



Consider the following sequence of instructions in the program:
100: I1
101: I2 (JMP 250)
102: I3
….
….
….
250: BI1
Expected output: I1 -> I2 -> BI1
NOTE: Generally, the target address of the JMP instruction is known only after the
ID stage.



Output sequence: I1 -> I2 -> I3 -> BI1
Because I3 is fetched before the jump target is known, the output sequence is not equal to
the expected output, which means the pipeline is not implemented correctly.



Output sequence: I1 -> I2 -> Delay (stall) -> BI1
As the delay slot performs no operation, this output sequence is equal to the expected output
sequence. However, this slot introduces a stall in the pipeline.

Branch penalty: The number of stalls introduced during branch operations in the pipelined
processor is known as the branch penalty.
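
The two output sequences can be reproduced with a small model of the fetch stage. This is a simplified sketch, assuming one fetch per cycle and a jump target that becomes known one cycle after the jump is fetched; `PROGRAM` and `fetch_sequence` are illustrative names.

```python
# Address -> (instruction name, jump target or None), from the example sequence.
PROGRAM = {
    100: ("I1", None),
    101: ("I2", 250),    # I2 is JMP 250
    102: ("I3", None),
    250: ("BI1", None),
}


def fetch_sequence(program: dict, start: int, stall_on_jump: bool) -> list:
    """Return the names entering the pipeline, in fetch order."""
    sequence = []
    pc = start
    while pc in program:
        name, target = program[pc]
        sequence.append(name)
        if target is None:
            pc += 1                                   # sequential fetch
            continue
        # The target is not yet known when the next fetch slot comes up.
        if stall_on_jump:
            sequence.append("stall")                  # insert a delay slot (NOP)
        elif pc + 1 in program:
            sequence.append(program[pc + 1][0])       # wrongly fetched fall-through
        pc = target                                   # target resolved after ID
    return sequence


without_stall = fetch_sequence(PROGRAM, 100, stall_on_jump=False)
with_stall = fetch_sequence(PROGRAM, 100, stall_on_jump=True)
branch_penalty = with_stall.count("stall")
print(without_stall)   # ['I1', 'I2', 'I3', 'BI1']
print(with_stall)      # ['I1', 'I2', 'stall', 'BI1']
print(branch_penalty)  # 1
```

In this model the branch penalty is one cycle per jump, because the target is assumed to be available exactly one cycle after the fetch.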



Data dependency
Example:
Let there be two instructions I1 and I2 such that:
I1 : ADD R1, R2, R3
I2 : SUB R4, R1, R2

When the above instructions are executed in a pipelined processor, a data dependency
condition occurs: I2 tries to read R1 before I1 writes it, so I2 incorrectly gets the
old value of R1.



To minimize data dependency stalls in the pipeline, operand forwarding is used.

Operand Forwarding:
In operand forwarding, we use the interface registers present between the stages to hold
the intermediate output, so that a dependent instruction can access the new value directly
from the interface register.



Data Hazard
Data hazards occur when instructions that exhibit data dependence modify data in different
stages of a pipeline. Hazards cause delays in the pipeline. There are mainly three types of data
hazards:
1) RAW (Read after Write) [Flow/True data dependency]
2) WAR (Write after Read) [Anti-Data dependency]
3) WAW (Write after Write) [Output data dependency]



Let there be two instructions I and J, such that J follows I. Let R(X) and W(X) denote the
sets of registers read and written by instruction X.
Then,
• RAW hazard occurs when instruction J tries to read data before instruction I writes it.
Condition: R(J) ∩ W(I) ≠ ∅
Example:
I: R2 <- R1 × R3
J: R4 <- R2 + R3

• WAR hazard occurs when instruction J tries to write data before instruction I reads it.
Condition: W(J) ∩ R(I) ≠ ∅
Example:
I: R2 <- R1 × R3
J: R3 <- R4 + R5

• WAW hazard occurs when instruction J tries to write output before instruction I writes it.
Condition: W(J) ∩ W(I) ≠ ∅
Example:
I: R2 <- R1 × R3
J: R2 <- R4 + R5

WAR and WAW hazards occur during out-of-order execution of the instructions.



Now, we say that instruction J depends on instruction I when

(R(J) ∩ W(I)) ∪ (W(J) ∩ R(I)) ∪ (W(J) ∩ W(I)) ≠ ∅

This condition is called the Bernstein condition. Equivalently, I and J are independent
only when all three intersections are empty.
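
The three set intersections translate directly into code. The following is a minimal sketch that classifies the hazards between an earlier instruction I and a later instruction J from their read and write register sets; the function name `classify_hazards` is illustrative.

```python
def classify_hazards(reads_i: set, writes_i: set,
                     reads_j: set, writes_j: set) -> list:
    """Return the data hazards between instruction I and a later instruction J."""
    hazards = []
    if reads_j & writes_i:      # R(J) ∩ W(I) is non-empty
        hazards.append("RAW")
    if writes_j & reads_i:      # W(J) ∩ R(I) is non-empty
        hazards.append("WAR")
    if writes_j & writes_i:     # W(J) ∩ W(I) is non-empty
        hazards.append("WAW")
    return hazards


# I: R2 <- R1 × R3 in every example from the slides.
reads_i, writes_i = {"R1", "R3"}, {"R2"}

raw = classify_hazards(reads_i, writes_i, {"R2", "R3"}, {"R4"})   # J: R4 <- R2 + R3
war = classify_hazards(reads_i, writes_i, {"R4", "R5"}, {"R3"})   # J: R3 <- R4 + R5
waw = classify_hazards(reads_i, writes_i, {"R4", "R5"}, {"R2"})   # J: R2 <- R4 + R5
print(raw, war, waw)   # ['RAW'] ['WAR'] ['WAW']
```

J depends on I exactly when the returned list is non-empty, which is the combined condition stated above.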



Thank You
