Pipeline History
10/12/11 1
Informal Early Feedback, Part I
Informal Early Feedback, Part II
Informal Early Feedback, Part III
Recall our equation for execution time
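The equation referred to here is presumably the standard CPU performance equation, which the later points about CPI and clock rate build on. The term names below follow common textbook usage and are an assumption, not a quote from the lecture:

```latex
\text{Execution time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}
```

Deeper pipelines aim at the last term (a shorter clock cycle), while multiple issue aims at the middle term (a lower CPI).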
“Superpipelining”
MIPS R4000
More Superpipelining
Historical data from Intel’s processors
There is a cost to deep pipelines
Two effects:
— Diminishing returns: pipeline register latency becomes significant
— Negatively impacts CPI (longer stalls, more instructions flushed)
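Both effects can be illustrated with a rough model. This is a minimal sketch, assuming the logic latency divides evenly across stages, every stage adds a fixed pipeline-register latency, and CPI grows linearly with depth. The function names and all numbers (800 ps of logic, 50 ps per register, 0.02 CPI per stage) are illustrative assumptions, not data from the lecture.

```python
def cycle_time_ps(stages, logic_ps=800.0, reg_ps=50.0):
    """Clock period when logic_ps of work is split evenly over `stages`,
    with reg_ps of pipeline-register latency added to every stage."""
    return logic_ps / stages + reg_ps


def time_per_instruction_ps(stages, base_cpi=1.0, stall_cpi_per_stage=0.02):
    """Average time per instruction. The CPI penalty is assumed to grow
    linearly with depth, since stalls and flushes cost more cycles."""
    cpi = base_cpi + stall_cpi_per_stage * stages
    return cpi * cycle_time_ps(stages)


if __name__ == "__main__":
    # Columns: stages, cycle time (ps), average time per instruction (ps)
    for n in (1, 5, 10, 20, 40):
        print(n, cycle_time_ps(n), round(time_per_instruction_ps(n), 1))
```

Under these assumptions, doubling the depth from 5 to 10 stages shortens the cycle from 210 ps to 130 ps, well short of a 2x gain, and going from 20 to 40 stages no longer reduces the time per instruction at all.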
Mitigating CPI loss 1: Dynamic Branch Prediction
It turns out, instructions tend to do the same things over and over again
— Idea: Use past history to predict future behavior
— First attempt:
• Keep 1 bit per branch that remembers last outcome
T T T T T T T T T NT T T T T T …
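The 1-bit scheme can be sketched in a few lines of Python. The function name and the initial prediction (taken) are assumptions made for illustration; the outcome sequence is the one shown above.

```python
def one_bit_mispredictions(outcomes, initial=True):
    """Count mispredictions of a 1-bit predictor for a single branch.

    outcomes: iterable of booleans, True = taken (T), False = not taken (NT).
    initial:  starting value of the history bit (assumed taken here).
    """
    last = initial
    misses = 0
    for taken in outcomes:
        if taken != last:   # the prediction is simply the previous outcome
            misses += 1
        last = taken        # remember this outcome for next time
    return misses


# Nine T, one NT, then T again.
pattern = [True] * 9 + [False] + [True] * 5
print(one_bit_mispredictions(pattern))  # 2: the NT itself, and the T right after it
```

A single NT in a run of taken branches costs two mispredictions, which is the weakness a 2-bit counter addresses.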
Branch prediction tables
When to predict branches?
Need:
— PC (to access predictor)
— To know it is a branch (must have decoded the instruction)
— The branch target (computed from the instruction bits)
Mitigating CPI loss 1: Branch Target Buffers
Need:
— PC: already have at fetch.
— To know it is a branch: can remember and make available at fetch.
— The branch target: can remember and make available at fetch.
BTB accessed in parallel with reading the instruction
If matching entry found, and 2-bit counter predicts taken
— Redirect fetch to branch target
— Instead of PC+4
What is the taken branch penalty?
— (i.e., how many flushes on a predicted taken branch?)
[Figure: the PC addresses the instruction memory and the BTB in parallel. The next PC is chosen among PC+4, the BTB target (on Match & Taken), and the redirect from the EX stage on a misprediction.]
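The fetch-time lookup can be sketched as a toy model in Python. This is only an illustration under stated assumptions: the class and method names are hypothetical, a dictionary stands in for the indexed and tagged hardware table, and new entries start at counter value 1 (weakly not taken).

```python
class BranchTargetBuffer:
    """Toy BTB: maps a branch PC to [target, 2-bit saturating counter]."""

    def __init__(self):
        self.entries = {}  # pc -> [target, counter in 0..3]

    def next_fetch_pc(self, pc):
        """Fetch-time lookup, done in parallel with the instruction read."""
        entry = self.entries.get(pc)
        if entry is not None and entry[1] >= 2:  # match, and counter predicts taken
            return entry[0]                      # redirect fetch to the branch target
        return pc + 4                            # otherwise fetch sequentially

    def update(self, pc, taken, target):
        """Called when the branch resolves (the EX stage in the figure)."""
        entry = self.entries.setdefault(pc, [target, 1])  # assumed initial counter value
        entry[0] = target
        entry[1] = min(entry[1] + 1, 3) if taken else max(entry[1] - 1, 0)
```

Because the table remembers both that the PC holds a branch and where it goes, the prediction is available at fetch, before the instruction has been decoded.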
Back to our equation for execution time
Multiple Issue
Issue width over time
Static Multiple Issue
Example: MIPS with Static Dual Issue
Dual-issue packets
— One ALU/branch instruction
— One load/store instruction
— 64-bit aligned
• ALU/branch, then load/store
• Pad an unused instruction with nop
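The packet rules can be sketched as a small packer in Python. This is a simplified illustration, not a real scheduler: the tuple format `(kind, dest, sources)` and the names `NOP` and `pack_dual_issue` are hypothetical, instructions are never reordered, nothing is paired with a branch, and only a read-after-write dependence inside a packet is checked.

```python
NOP = ("nop", None, ())


def pack_dual_issue(instrs):
    """Pack instructions, in program order, into (alu_or_branch, load_store) packets.

    Each instruction is a tuple (kind, dest, sources), where kind is
    "alu", "branch", or "mem" (a load or store).
    """
    packets = []
    i = 0
    while i < len(instrs):
        cur = instrs[i]
        if cur[0] == "mem":
            packets.append((NOP, cur))            # no ALU/branch to pair with: pad slot 1
            i += 1
            continue
        nxt = instrs[i + 1] if i + 1 < len(instrs) else None
        can_pair = (
            nxt is not None
            and nxt[0] == "mem"
            and cur[0] != "branch"                # assumption: never pair past a branch
            and (cur[1] is None or cur[1] not in nxt[2])  # no same-packet dependence
        )
        if can_pair:
            packets.append((cur, nxt))
            i += 2
        else:
            packets.append((cur, NOP))            # pad the unused load/store slot
            i += 1
    return packets
```

A compiler for a real static dual-issue machine would also reorder instructions to fill more slots; this sketch only shows the slot assignment and nop padding.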
Hazards in the Dual-Issue MIPS
Scheduling Example
Loop Unrolling Example
Dynamic Multiple Issue = Superscalar
Out-of-order Execution (Dynamic Scheduling)
Allow the CPU to execute instructions out of order to avoid stalls
— But commit result to registers in order
Example
lw $t0, 20($s2)
add $t1, $t0, $t2
sub $s4, $s4, $t3
slti $t5, $s4, 20
— Can start sub while add is waiting for lw
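The example can be traced with a small Python sketch. It makes several simplifying assumptions: only true (read-after-write) dependences are tracked, at most one instruction starts per cycle, the oldest ready instruction goes first, and the latencies (3 cycles for `lw`, 1 for the others) are invented for illustration. In-order commit is not modeled, and `start_cycles` is a hypothetical name.

```python
def start_cycles(program, latency):
    """Return the cycle in which each instruction starts executing.

    program: list of (name, dest, sources) in program order.
    latency: dict name -> cycles until its result is available (assumed values).
    """
    # For each instruction, find the most recent older writer of each source.
    last_writer, producers = {}, []
    for i, (_, dest, sources) in enumerate(program):
        producers.append({last_writer[r] for r in sources if r in last_writer})
        last_writer[dest] = i

    start, done = {}, {}
    cycle = 0
    while len(start) < len(program):
        for i, (name, _, _) in enumerate(program):
            if i in start:
                continue
            # Ready once every producer has started and its result is available.
            if all(p in done and done[p] <= cycle for p in producers[i]):
                start[i] = cycle
                done[i] = cycle + latency[name]
                break                      # at most one start per cycle
        cycle += 1
    return [start[i] for i in range(len(program))]


program = [
    ("lw",   "$t0", ("$s2",)),
    ("add",  "$t1", ("$t0", "$t2")),
    ("sub",  "$s4", ("$s4", "$t3")),
    ("slti", "$t5", ("$s4",)),
]
print(start_cycles(program, {"lw": 3, "add": 1, "sub": 1, "slti": 1}))  # [0, 3, 1, 2]
```

With these latencies, `sub` starts in cycle 1 and `slti` in cycle 2, both ahead of `add`, which has to wait until cycle 3 for the loaded value.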
Out-of-order Scheduling
Implementing Out-of-Order Execution
Dynamically Scheduled CPU
[Figure: block diagram of a dynamically scheduled CPU. Surviving labels: "Preserves dependencies" and "Hold pending operands".]