Pipelining
Performance Issues
Longest delay determines clock period
Critical path: load instruction
Instruction memory → register file → ALU → data memory → register file
Not feasible to vary period for different instructions
Violates design principle: making the common case fast
We will improve performance by pipelining
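The critical-path argument above can be checked with a small calculation. This is a minimal sketch: the per-unit delays are assumed illustrative values (the slides give none), and the unit names and instruction paths are simplified labels chosen for the example.

```python
# Assumed unit delays in picoseconds (illustrative only, not from the slides).
UNIT_DELAY_PS = {
    "instr_mem": 200,
    "reg_read": 100,
    "alu": 200,
    "data_mem": 200,
    "reg_write": 100,
}

# Units each instruction class passes through (simplified model).
INSTRUCTION_PATHS = {
    "lw": ["instr_mem", "reg_read", "alu", "data_mem", "reg_write"],
    "sw": ["instr_mem", "reg_read", "alu", "data_mem"],
    "r_type": ["instr_mem", "reg_read", "alu", "reg_write"],
    "beq": ["instr_mem", "reg_read", "alu"],
}


def path_delay(instr):
    """Total delay of the units an instruction class uses."""
    return sum(UNIT_DELAY_PS[unit] for unit in INSTRUCTION_PATHS[instr])


def single_cycle_period():
    """With one fixed clock, the period must cover the slowest instruction."""
    return max(path_delay(instr) for instr in INSTRUCTION_PATHS)


def critical_instruction():
    """Instruction class whose path sets the clock period."""
    return max(INSTRUCTION_PATHS, key=path_delay)


if __name__ == "__main__":
    for instr in INSTRUCTION_PATHS:
        print(instr, path_delay(instr), "ps")
    print("clock period:", single_cycle_period(), "ps, set by", critical_instruction())
```

Under these assumed delays the load sets the period, so every faster instruction still pays the load's full delay.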
Four loads: Speedup = 16/7 ≈ 2.3
Non-stop: Speedup = 4n/(n + 3) ≈ 4 for large n = number of stages
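The two speedup figures follow from one formula: n tasks of 4 equal stages take 4n time units back to back, but only n + 3 when overlapped. A short sketch, assuming equal-length stages and a hypothetical function name `speedup`:

```python
from fractions import Fraction


def pipelined_time(n, stages=4):
    """The first task takes `stages` units; each later task finishes one unit after."""
    return n + (stages - 1)


def speedup(n, stages=4):
    """Ratio of sequential time (n * stages) to pipelined time, kept exact."""
    return Fraction(n * stages, pipelined_time(n, stages))


if __name__ == "__main__":
    print(speedup(4))              # 16/7
    print(float(speedup(10_000)))  # approaches 4 as n grows
```

The exact fraction makes the limit visible: the constant 3 in the denominator stops mattering as n grows, so the speedup tends to the number of stages.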
       lw  $t1, 0($t0)            lw  $t1, 0($t0)
       lw  $t2, 4($t0)            lw  $t2, 4($t0)
stall  add $t3, $t1, $t2          lw  $t4, 8($t0)
       sw  $t3, 12($t0)           add $t3, $t1, $t2
       lw  $t4, 8($t0)            sw  $t3, 12($t0)
stall  add $t5, $t1, $t4          add $t5, $t1, $t4
       sw  $t5, 16($t0)           sw  $t5, 16($t0)
       13 cycles                  11 cycles
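The cycle counts can be reproduced with a simple model. This sketch assumes a five-stage pipeline with forwarding, where an instruction that uses a register loaded by the immediately preceding lw costs one stall cycle. The helper names `parse` and `count_cycles` are hypothetical, and the parser handles only the lw, sw, and add forms used here.

```python
PIPELINE_STAGES = 5  # assumption: classic five-stage pipeline


def parse(line):
    """Return (opcode, destination register, source registers)."""
    op, rest = line.split(None, 1)
    args = [a.strip() for a in rest.split(",")]
    if op in ("lw", "sw"):
        # Memory operand looks like offset(base); extract the base register.
        base = args[1][args[1].index("(") + 1 : -1]
        if op == "lw":
            return op, args[0], [base]
        return op, None, [args[0], base]  # sw reads both, writes no register
    return op, args[0], args[1:]  # add rd, rs, rt


def count_cycles(program):
    """Instructions + load-use stalls + cycles to fill the pipeline."""
    stalls = 0
    prev_op, prev_dest = None, None
    for line in program:
        op, dest, sources = parse(line)
        # Simplification: any use right after the load counts as a stall.
        if prev_op == "lw" and prev_dest in sources:
            stalls += 1
        prev_op, prev_dest = op, dest
    return len(program) + stalls + (PIPELINE_STAGES - 1)


ORIGINAL = [
    "lw $t1, 0($t0)", "lw $t2, 4($t0)", "add $t3, $t1, $t2",
    "sw $t3, 12($t0)", "lw $t4, 8($t0)", "add $t5, $t1, $t4",
    "sw $t5, 16($t0)",
]
REORDERED = [
    "lw $t1, 0($t0)", "lw $t2, 4($t0)", "lw $t4, 8($t0)",
    "add $t3, $t1, $t2", "sw $t3, 12($t0)", "add $t5, $t1, $t4",
    "sw $t5, 16($t0)",
]

if __name__ == "__main__":
    print(count_cycles(ORIGINAL))   # 13
    print(count_cycles(REORDERED))  # 11
```

Moving the third load up separates each lw from the add that uses its result, which removes both stalls without changing what the code computes.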
[Figure: pipeline timing with prediction correct and with prediction incorrect]
Right-to-left flow leads to hazards (figure labels: MEM, WB)
Wrong register number