CSE 4293 Pipelining

CSE 4293 – Computer Architecture
Pipelining
Nafiz Imtiaz
Lecturer, EEE, AUST
Ref: https://www.geeksforgeeks.org/computer-organization-and-architecture-pipelining-set-1-execution-stages-and-throughput/
Execution, Stages and Performance
CSE-4293 Computer Architecture @ Nafiz Imtiaz 1

Concept of pipelining
Speed can be increased through both hardware and software development:
• Hardware development -> requires changing the circuitry (a difficult process).
• Software development -> does not necessarily require changing the circuitry. We need to develop pipelining to execute instructions in the shortest possible time.

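Example: three cars (C1 to C3) visit a store. Each car goes through three one-minute steps: Order (O), Pay (P), Receive (R).

Car/Time   1  2  3  4  5  6  7  8  9
C1         O  P  R  -  -  -  -  -  -
C2         -  -  -  O  P  R  -  -  -
C3         -  -  -  -  -  -  O  P  R

Average time for each car without pipelining:
$\dfrac{\text{Total time}}{\text{Number of cars}} = \dfrac{9}{3} = 3\ \text{min}$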

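With pipelining, the next car starts to Order (O) as soon as the previous car moves on to Pay (P):

Car/Time   1  2  3  4  5  6  7  8  9
C1         O  P  R  -  -  -  -  -  -
C2         -  O  P  R  -  -  -  -  -
C3         -  -  O  P  R  -  -  -  -

Average time for each car with pipelining:
$\dfrac{\text{Total time}}{\text{Number of cars}} = \dfrac{5}{3} \approx 1.67\ \text{min}$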
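The two averages above can be checked with a short sketch. It assumes every step takes exactly one minute and that a new car can start as soon as the previous car leaves the first step; the function name `total_time` is only illustrative.

```python
def total_time(num_cars: int, num_steps: int, pipelined: bool) -> int:
    """Minutes until the last car is done, assuming 1 minute per step."""
    if pipelined:
        # The first car needs num_steps minutes; every later car
        # finishes one minute after the car in front of it.
        return num_steps + num_cars - 1
    # Without overlap, each car occupies the store for all of its steps.
    return num_steps * num_cars


for pipelined in (False, True):
    t = total_time(num_cars=3, num_steps=3, pipelined=pipelined)
    print("pipelined" if pipelined else "sequential", t, round(t / 3, 2))
```

For the three cars in the example this gives 9 minutes in total without pipelining and 5 minutes with it.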
Definition of Pipelining
• Pipelining is the arrangement of the hardware elements of the CPU such that its overall performance is increased.
• In a pipelined processor, more than one instruction executes at the same time.
• In pipelining, multiple instructions are overlapped in execution.

Design of a basic pipeline
• In a pipelined processor, a pipeline has two ends, the input end and the
output end. Between these ends, there are multiple stages/segments such
that output of one stage is connected to input of next stage and each stage
performs a specific operation.
• Interface registers are used to hold the intermediate output between two
stages. These interface registers are also called latch or buffer.
• All the stages in the pipeline along with the interface registers are
controlled by a common clock.
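A minimal software sketch of this design, under simplifying assumptions: each stage is modelled as a plain function on integers, each interface register holds one value or `None` when empty, and one loop iteration stands for one tick of the common clock. The name `run_pipeline` and the three example stages are hypothetical.

```python
from typing import Callable, List, Optional, Tuple


def run_pipeline(stages: List[Callable[[int], int]],
                 inputs: List[int]) -> Tuple[List[int], int]:
    """Return (outputs, clock ticks used). Assumes at least one stage."""
    # regs[i] is the interface register that holds the output of stage i.
    regs: List[Optional[int]] = [None] * len(stages)
    pending = list(inputs)
    outputs: List[int] = []
    ticks = 0
    while pending or any(r is not None for r in regs[:-1]):
        feed = pending.pop(0) if pending else None
        # All registers latch on the same clock edge, so every stage
        # reads the value its predecessor held before this tick.
        sources = [feed] + regs[:-1]
        regs = [None if x is None else stage(x)
                for stage, x in zip(stages, sources)]
        ticks += 1
        if regs[-1] is not None:
            outputs.append(regs[-1])  # value leaving the output end
    return outputs, ticks


example_stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
print(run_pipeline(example_stages, [1, 2, 3, 4]))
```

With 3 stages and 4 inputs the loop runs for 6 ticks, because after the first result emerges one further result leaves the output end on every tick.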

[Figure: a basic pipeline, with stages/segments separated by interface registers]

Pipeline Stages
The classic RISC processor has a 5-stage instruction pipeline to execute all the instructions in the RISC instruction set. The 5 stages of the RISC pipeline and their operations are:
• Stage 1 (Instruction Fetch)
In this stage, the CPU reads the instruction from the memory address held in the program counter.
• Stage 2 (Instruction Decode)
In this stage, the instruction is decoded and the register file is accessed to get the values of the registers used in the instruction.
• Stage 3 (Instruction Execute)
In this stage, ALU operations are performed.
• Stage 4 (Memory Access)
In this stage, memory operands are read from or written to the memory address specified by the instruction.
• Stage 5 (Write Back)
In this stage, the computed or fetched value is written back to the destination register specified by the instruction.

Execution in a pipelined processor
The execution sequence of instructions in a pipelined processor can be visualized using a space-time diagram.
For example, consider a processor with 5 stages and 4 instructions to be executed. The execution sequence can be visualized through space-time diagrams.

Execution in a non-pipelined processor
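Let,
Total number of instructions, N = 4
Total number of stages, K = 5
So, total cycles, T = K × N = 5 × 4 = 20 cycles

Space-time diagram (rows are stages, columns are clock cycles, -- means the stage is idle):
Time  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
S1   I1 -- -- -- -- I2 -- -- -- -- I3 -- -- -- -- I4 -- -- -- --
S2   -- I1 -- -- -- -- I2 -- -- -- -- I3 -- -- -- -- I4 -- -- --
S3   -- -- I1 -- -- -- -- I2 -- -- -- -- I3 -- -- -- -- I4 -- --
S4   -- -- -- I1 -- -- -- -- I2 -- -- -- -- I3 -- -- -- -- I4 --
S5   -- -- -- -- I1 -- -- -- -- I2 -- -- -- -- I3 -- -- -- -- I4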

Execution in a pipelined processor
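Let,
Total number of instructions, N = 4
Total number of stages, K = 5
So, total cycles, T = K + N − 1 = 5 + 4 − 1 = 8 cycles

Space-time diagram (rows are stages, columns are clock cycles, -- means the stage is idle):
Time  1  2  3  4  5  6  7  8
S1   I1 I2 I3 I4 -- -- -- --
S2   -- I1 I2 I3 I4 -- -- --
S3   -- -- I1 I2 I3 I4 -- --
S4   -- -- -- I1 I2 I3 I4 --
S5   -- -- -- -- I1 I2 I3 I4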
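Both space-time diagrams can be generated programmatically. The sketch below assumes one cycle per stage and no stalls; `space_time` is an illustrative name, and `--` marks an idle stage.

```python
from typing import List


def space_time(num_instr: int, num_stages: int,
               pipelined: bool = True) -> List[List[str]]:
    """Return one row per stage; each cell is an instruction name or '--'."""
    # Cycles between the start of one instruction and the start of the next.
    stride = 1 if pipelined else num_stages
    if pipelined:
        total = num_stages + num_instr - 1
    else:
        total = num_stages * num_instr
    grid = [["--"] * total for _ in range(num_stages)]
    for i in range(num_instr):
        for s in range(num_stages):
            # Instruction i reaches stage s exactly s cycles after it starts.
            grid[s][i * stride + s] = f"I{i + 1}"
    return grid


for s, row in enumerate(space_time(num_instr=4, num_stages=5), start=1):
    print(f"S{s}", " ".join(row))
```

Calling `space_time(4, 5, pipelined=False)` produces the 20-cycle layout of the non-pipelined case.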

Calculations
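Time taken to execute N instructions in a non-pipelined processor with K stages, where $T_p$ is the time of one clock cycle:
$ET_{non\text{-}pipelining} = K \times N \times T_p$
Time taken to execute N instructions in a pipelined processor:
$ET_{pipelining} = (K + N - 1) \times T_p$
Speedup (S) of the pipelined processor over the non-pipelined processor:
$S = \dfrac{ET_{non\text{-}pipelining}}{ET_{pipelining}} = \dfrac{K \times N \times T_p}{(K + N - 1) \times T_p} = \dfrac{K \times N}{K + N - 1}$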

Calculations
When the number of tasks N is significantly larger than K, that is, N >> K:
$S = \dfrac{K \times N}{K + N - 1} \approx \dfrac{K \times N}{N} = K = S_{max}$
where K is the number of stages in the pipeline.
$\text{Efficiency} = \dfrac{\text{Given speedup}}{\text{Max speedup}} = \dfrac{S}{S_{max}} = \dfrac{S}{K}$
$\text{Throughput} = \dfrac{\text{Number of instructions}}{\text{Total time to complete the instructions}} = \dfrac{N}{(K + N - 1) \times T_p}$
Note: The cycles per instruction (CPI) value of an ideal pipelined processor is 1.
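The formulas above translate directly into code. This is a small sketch with illustrative function names; it also prints how the speedup of a 5-stage pipeline approaches its maximum K = 5 as N grows.

```python
def speedup(k: int, n: int) -> float:
    """S = (K * N) / (K + N - 1); the cycle time Tp cancels out."""
    return (k * n) / (k + n - 1)


def efficiency(k: int, n: int) -> float:
    """Efficiency = S / S_max, with S_max = K."""
    return speedup(k, n) / k


def throughput(k: int, n: int, tp: float) -> float:
    """Instructions completed per unit of time; tp is the cycle time."""
    return n / ((k + n - 1) * tp)


if __name__ == "__main__":
    for n in (4, 100, 10_000):
        print(n, round(speedup(5, n), 3), round(efficiency(5, n), 3))
```

For N = 4 the speedup is only 2.5, which matches the 20-cycle versus 8-cycle space-time diagrams; it stays below 5 for every finite N.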

Problems
We need to execute 12 instructions on a 5-stage pipeline. Calculate the following. [Clock frequency = 2.8 GHz]
a) Total execution time without pipelining
b) Total execution time with pipelining
c) Speedup
d) Efficiency
e) Throughput

Solution
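Here, $T_p = \dfrac{1}{f_{clock}} = \dfrac{1}{2.8\ \text{GHz}} \approx 0.357\ \text{ns}$, K = 5 and N = 12.
a) Total execution time without pipelining
$ET_{non\text{-}pipelining} = K \times N \times T_p = 5 \times 12 \times T_p = 60\,T_p \approx 21.43\ \text{ns}$
b) Total execution time with pipelining
$ET_{pipelining} = (K + N - 1) \times T_p = (5 + 12 - 1) \times T_p = 16\,T_p \approx 5.71\ \text{ns}$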

Solution
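c) Speedup (S) of the pipelined processor over the non-pipelined processor:
$S = \dfrac{ET_{non\text{-}pipelining}}{ET_{pipelining}} = \dfrac{K \times N \times T_p}{(K + N - 1) \times T_p} = \dfrac{K \times N}{K + N - 1} = \dfrac{5 \times 12}{5 + 12 - 1} = \dfrac{15}{4} = 3.75$
d) Efficiency:
$\text{Efficiency} = \dfrac{\text{Given speedup}}{\text{Max speedup}} = \dfrac{S}{S_{max}} = \dfrac{S}{K} = \dfrac{15/4}{5} = \dfrac{3}{4} = 75\%$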

Solution
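e) Throughput:
$\text{Throughput} = \dfrac{\text{Number of instructions}}{\text{Total time to complete the instructions}} = \dfrac{N}{(K + N - 1) \times T_p} = \dfrac{12}{(5 + 12 - 1) \times T_p} = \dfrac{12}{16} \times f_{clock} = 2.1 \times 10^{9}\ \text{instructions/s}$
where $T_p = \dfrac{1}{f_{clock}}$.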
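The worked problem can be verified numerically. The sketch below uses exact fractions for the speedup and efficiency, and assumes the clock period is simply 1 / f_clock with no stalls; `solve` and the dictionary keys are illustrative names.

```python
from fractions import Fraction


def solve(k: int, n: int, f_clock_hz: float) -> dict:
    """Evaluate parts a) to e) for K stages and N instructions."""
    tp_ns = 1e9 / f_clock_hz              # cycle time Tp in nanoseconds
    cycles_pipe = k + n - 1
    s = Fraction(k * n, cycles_pipe)      # exact speedup
    return {
        "et_non_ns": k * n * tp_ns,
        "et_pipe_ns": cycles_pipe * tp_ns,
        "speedup": s,
        "efficiency": s / k,
        "throughput_per_s": n * f_clock_hz / cycles_pipe,
    }


result = solve(k=5, n=12, f_clock_hz=2.8e9)
for name, value in result.items():
    print(name, value)
```

The speedup comes out as the fraction 15/4 and the efficiency as 3/4, in agreement with parts c) and d).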

Dependencies and Data Hazard

Dependencies in a pipelined processor
There are three main types of dependencies in a pipelined processor:
1) Structural Dependency
2) Control Dependency
3) Data Dependency
These dependencies may introduce stalls in the pipeline.
Stall: A stall is a cycle in which no new instruction enters the pipeline (a NOP is inserted instead).

Structural dependency(Structural Hazard)
This dependency arises from a resource conflict in the pipeline. A resource conflict occurs when more than one instruction tries to access the same resource in the same cycle. A resource can be a register, memory, or the ALU.

Example:
Consider four instructions I1 to I4 entering a 5-stage pipeline with a single memory. In cycle 4, I1 is in its Memory Access stage while I4 is in its Instruction Fetch stage, so both try to access the same resource (memory), which causes a resource conflict. To avoid this problem, the instruction must wait until the required resource (memory in our case) becomes available.

This wait introduces stalls in the pipeline.
[Figure: space-time diagram with stalls inserted]

Solution for structural dependency
To minimize structural dependency stalls in the pipeline, we use a hardware mechanism called Renaming.
Renaming:
The memory is divided into two independent modules that store instructions and data separately, called code memory (CM) and data memory (DM) respectively. CM contains all the instructions and DM contains all the operands that the instructions require. Instruction Fetch then uses CM while Memory Access uses DM, so the two stages no longer compete for the same memory. (This is not the same as register renaming, which targets WAR and WAW data hazards.)
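A rough sketch of the effect of splitting the memory. It assumes a 5-stage pipeline in which Memory Access happens 3 cycles after Instruction Fetch, that Memory Access has priority over fetch on a shared memory, and that only instructions flagged `True` touch data memory; the function name and flags are hypothetical.

```python
from typing import List

MEM_STAGE_OFFSET = 3  # Memory Access is stage 4, i.e. 3 cycles after fetch


def fetch_cycles(uses_data_memory: List[bool],
                 split_memory: bool = False) -> List[int]:
    """Return the cycle in which each instruction is fetched."""
    busy = set()   # cycles in which an earlier instruction is in Memory Access
    cycles = []
    cycle = 1
    for uses in uses_data_memory:
        # With one shared memory, fetch must wait while memory is busy.
        while not split_memory and cycle in busy:
            cycle += 1  # stall
        cycles.append(cycle)
        if uses:
            busy.add(cycle + MEM_STAGE_OFFSET)
        cycle += 1
    return cycles


# I1 accesses data memory; I2 to I4 do not.
print(fetch_cycles([True, False, False, False]))                     # shared memory
print(fetch_cycles([True, False, False, False], split_memory=True))  # CM + DM
```

With a shared memory the fetch of I4 slips from cycle 4 to cycle 5; with separate CM and DM every instruction is fetched on consecutive cycles.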


Control dependency(Control Hazard)
Any instruction that changes the program counter can cause a control hazard.
This type of dependency occurs with transfer-of-control instructions such as BRANCH, CALL, JMP, etc. On many instruction set architectures, the processor does not yet know the target address of such an instruction when it needs to fetch the next instruction into the pipeline. As a result, unwanted instructions are fed into the pipeline.

Consider the following sequence of instructions in the program:
100: I1
101: I2 (JMP 250)
102: I3
….
….
….
250: BI1
Expected output: I1 -> I2 -> BI1
NOTE: Generally, the target address of the JMP instruction is known only after the ID stage.

Output Sequence: I1 -> I2 -> I3 -> BI1
Here I3 is fetched before the jump target is known, so the output sequence is not equal to the expected output. This means the pipeline does not execute the program correctly.

Output Sequence: I1 -> I2 -> Delay (Stall) -> BI1
To correct this, the instruction fetch is stalled until the target address is available. Because the delay slot performs no operation, this output sequence is equivalent to the expected output sequence, but the slot introduces a stall in the pipeline.
Branch penalty: The number of stalls introduced during branch operations in a pipelined processor is known as the branch penalty.
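The two output sequences can be reproduced with a toy model. It assumes the jump target becomes known one cycle after the jump is fetched (after ID), so exactly one slot follows each jump, and that the program contains no loops; `issue_order` and the program table are illustrative only.

```python
from typing import Dict, List, Optional, Tuple

# address -> (instruction name, jump target or None)
Program = Dict[int, Tuple[str, Optional[int]]]

PROGRAM: Program = {
    100: ("I1", None),
    101: ("I2", 250),   # I2 is JMP 250
    102: ("I3", None),
    250: ("BI1", None),
}


def issue_order(program: Program, start: int, insert_stall: bool) -> List[str]:
    """List what enters the pipeline, slot by slot, until the code runs out."""
    order = []
    pc = start
    while pc in program:
        name, target = program[pc]
        order.append(name)
        if target is None:
            pc += 1
            continue
        # The target is not known yet, so one slot follows the jump:
        # either a stall, or the wrongly fetched sequential instruction.
        wrong = program.get(pc + 1, ("STALL", None))[0]
        order.append("STALL" if insert_stall else wrong)
        pc = target
    return order


print(issue_order(PROGRAM, 100, insert_stall=False))
print(issue_order(PROGRAM, 100, insert_stall=True))
```

In this model the branch penalty is the number of `STALL` entries per jump, which is 1 here.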

Data dependency
Example:
Let there be two instructions I1 and I2 such that:
I1 : ADD R1, R2, R3
I2 : SUB R4, R1, R2
When these instructions are executed in a pipelined processor, a data dependency arises: I2 tries to read R1 before I1 has written it, so I2 incorrectly gets the old value of R1.

To minimize data dependency stalls in the pipeline, operand forwarding is used.
Operand Forwarding:
In operand forwarding, the interface registers between the stages hold the intermediate output, so that a dependent instruction can read the new value directly from the interface register.
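A small sketch of the idea, using the I1/I2 pair from the example. It assumes ADD and SUB write their first operand, that the register values are arbitrary sample numbers, and that the result of I1 sits in a single interface register (called `ex_mem` here, a hypothetical name) while the register file still holds the old R1.

```python
# Register file before I1 writes back (sample values, chosen arbitrarily).
regs = {"R1": 10, "R2": 7, "R3": 5, "R4": 0}

# I1: ADD R1, R2, R3 has been executed, but its result is still only
# in the interface register after the Execute stage.
ex_mem = {"dest": "R1", "value": regs["R2"] + regs["R3"]}


def read_operand(name: str, forwarding: bool) -> int:
    """Read a source operand for the instruction that follows I1."""
    if forwarding and ex_mem["dest"] == name:
        return ex_mem["value"]   # new value taken from the interface register
    return regs[name]            # value from the register file (may be stale)


def sub_r4(forwarding: bool) -> int:
    """I2: SUB R4, R1, R2 computes R1 - R2."""
    return read_operand("R1", forwarding) - read_operand("R2", forwarding)


print("without forwarding:", sub_r4(False))  # uses the old R1
print("with forwarding:   ", sub_r4(True))   # uses R2 + R3 from I1
```

Without forwarding, the only safe alternative is to stall I2 until I1 has written R1 back.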

Data Hazard
Data hazards occur when instructions that exhibit data dependence modify data in different stages of a pipeline. Hazards cause delays in the pipeline. There are three main types of data hazards:
1) RAW (Read after Write) [Flow/True data dependency]
2) WAR (Write after Read) [Anti-data dependency]
3) WAW (Write after Write) [Output data dependency]

Let there be two instructions I and J, such that J follows I. Let R(X) and W(X) denote the sets of registers read and written by instruction X.
Then,
• RAW hazard occurs when instruction J tries to read data before instruction I writes it: R(J) ∩ W(I) ≠ ∅
Example:
I: R2 <− R1 × R3
J: R4 <− R2 + R3
• WAR hazard occurs when instruction J tries to write data before instruction I reads it: W(J) ∩ R(I) ≠ ∅
Example:
I: R2 <− R1 × R3
J: R3 <− R4 + R5
• WAW hazard occurs when instruction J tries to write output before instruction I writes it: W(J) ∩ W(I) ≠ ∅
Example:
I: R2 <− R1 × R3
J: R2 <− R4 + R5
WAR and WAW hazards occur during out-of-order execution of the instructions.

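Now, we say that instruction J depends on instruction I when
(R(J) ∩ W(I)) ∪ (W(J) ∩ R(I)) ∪ (W(J) ∩ W(I)) ≠ ∅
This test comes from Bernstein's conditions: I and J are independent, and can safely be reordered or run in parallel, only when all three intersections are empty.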
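The three hazard conditions and the combined dependence test can be sketched with Python sets. The read and write sets are written out by hand for the register examples above; `hazards` and `depends` are illustrative names.

```python
from typing import Set


def hazards(r_i: Set[str], w_i: Set[str],
            r_j: Set[str], w_j: Set[str]) -> Set[str]:
    """Classify the hazards between I and a later instruction J."""
    found = set()
    if r_j & w_i:
        found.add("RAW")   # J reads what I writes
    if w_j & r_i:
        found.add("WAR")   # J writes what I reads
    if w_j & w_i:
        found.add("WAW")   # J writes what I writes
    return found


def depends(r_i: Set[str], w_i: Set[str],
            r_j: Set[str], w_j: Set[str]) -> bool:
    """J depends on I when the union of the three intersections is non-empty."""
    return bool(hazards(r_i, w_i, r_j, w_j))


# I: R2 <- R1 x R3
R_I, W_I = {"R1", "R3"}, {"R2"}
print(hazards(R_I, W_I, {"R2", "R3"}, {"R4"}))   # J: R4 <- R2 + R3
print(hazards(R_I, W_I, {"R4", "R5"}, {"R3"}))   # J: R3 <- R4 + R5
print(hazards(R_I, W_I, {"R4", "R5"}, {"R2"}))   # J: R2 <- R4 + R5
```

An instruction such as J: R6 <- R4 + R5 shares no register with I in any of the three intersections, so `depends` returns `False` for it.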

Thank You
