Digital Design and Computer Architecture, 2nd Edition


MICROARCHITECTURE Chapter 7

Digital Design and Computer Architecture, 2nd Edition


David Money Harris and Sarah L. Harris

Chapter 7 <1>
MICROARCHITECTURE Chapter 7 :: Topics

• Introduction
• Performance Analysis
• Single-Cycle Processor
• Multicycle Processor
• Pipelined Processor
• Exceptions
• Advanced Microarchitecture

Chapter 7 <2>
MICROARCHITECTURE Introduction
• Microarchitecture: how to implement an architecture in hardware
• Processor:
– Datapath: functional blocks
– Control: control signals

[Figure: levels of abstraction, top to bottom: Application Software (programs), Operating Systems (device drivers), Architecture (instructions, registers), Microarchitecture (datapaths, controllers), Logic (adders, memories), Digital Circuits (AND gates, NOT gates), Analog Circuits (amplifiers, filters), Devices (transistors, diodes), Physics (electrons)]
Chapter 7 <3>
MICROARCHITECTURE Microarchitecture
• Multiple implementations for a single architecture:
– Single-cycle: Each instruction executes in a single cycle
– Multicycle: Each instruction is broken into a series of shorter steps
– Pipelined: Each instruction is broken into a series of steps, and multiple instructions execute at once

Chapter 7 <4>
MICROARCHITECTURE Processor Performance
• Program execution time
Execution Time = (#instructions)(cycles/instruction)(seconds/cycle)

• Definitions:
– CPI: Cycles/instruction
– clock period: seconds/cycle
– IPC: instructions/cycle = 1/CPI
• Challenge is to satisfy constraints of:
– Cost
– Power
– Performance
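The execution-time equation above can be sketched in a few lines of Python. This is only an illustration: the function names `execution_time` and `ipc` are hypothetical and do not come from the slides.

```python
def execution_time(instructions: float, cpi: float, clock_period_s: float) -> float:
    """Execution time = (#instructions)(cycles/instruction)(seconds/cycle)."""
    return instructions * cpi * clock_period_s


def ipc(cpi: float) -> float:
    """Instructions per cycle is the reciprocal of cycles per instruction."""
    return 1.0 / cpi


if __name__ == "__main__":
    # Illustrative numbers only: 1 billion instructions, CPI of 2, 1 ns clock.
    print(execution_time(1e9, 2.0, 1e-9), "seconds")
    print(ipc(2.0), "instructions per cycle")
```

Lowering any one of the three factors reduces execution time, which is why the later slides compare designs by both CPI and clock period.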

Chapter 7 <5>
MICROARCHITECTURE MIPS Processor
• Consider subset of MIPS instructions:
– R-type instructions: and, or, add, sub, slt
– Memory instructions: lw, sw
– Branch instructions: beq

Chapter 7 <6>
MICROARCHITECTURE Architectural State
• Determines everything about a processor:
– PC
– 32 registers
– Memory

Chapter 7 <7>
MICROARCHITECTURE MIPS State Elements

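[Figure: MIPS state elements: 32-bit PC register (input PC', output PC); instruction memory (address A, read data RD); register file (5-bit address ports A1, A2, A3; 32-bit read data RD1, RD2; 32-bit write data WD3; write enable WE3); data memory (address A, read data RD, write data WD, write enable WE)]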

Chapter 7 <8>
MICROARCHITECTURE Single-Cycle MIPS Processor
• Datapath
• Control

Chapter 7 <9>
MICROARCHITECTURE Single-Cycle Datapath: lw fetch
STEP 1: Fetch instruction

[Figure: PC drives the instruction memory address input A; the memory output RD is Instr]

Chapter 7 <10>
MICROARCHITECTURE Single-Cycle Datapath: lw Register Read

STEP 2: Read source operands from RF

[Figure: Instr[25:21] drives register file address port A1; the operand is read on RD1]

Chapter 7 <11>
MICROARCHITECTURE Single-Cycle Datapath: lw Immediate

STEP 3: Sign-extend the immediate

[Figure: Instr[15:0] passes through the Sign Extend block to produce SignImm]

Chapter 7 <12>
MICROARCHITECTURE Single-Cycle Datapath: lw address
STEP 4: Compute the memory address

[Figure: the ALU adds SrcA (RD1) and SrcB (SignImm) with ALUControl2:0 = 010; ALUResult drives the data memory address input A]

Chapter 7 <13>
MICROARCHITECTURE Single-Cycle Datapath: lw Memory Read

STEP 5: Read data from memory and write it back to register file

[Figure: ALUResult addresses the data memory; ReadData drives register file input WD3; Instr[20:16] drives A3; RegWrite = 1, ALUControl2:0 = 010]

Chapter 7 <14>
MICROARCHITECTURE Single-Cycle Datapath: lw PC Increment

STEP 6: Determine address of next instruction


[Figure: an adder computes PCPlus4 = PC + 4, which drives PC'; RegWrite = 1, ALUControl2:0 = 010]

Chapter 7 <15>
MICROARCHITECTURE Single-Cycle Datapath: sw
Write data in rt to memory
[Figure: Instr[20:16] drives register file port A2; RD2 is WriteData, which drives data memory input WD; RegWrite = 0, ALUControl2:0 = 010, MemWrite = 1]

Chapter 7 <16>
MICROARCHITECTURE Single-Cycle Datapath: R-Type
• Read from rs and rt
• Write ALUResult to register file
• Write to rd (instead of rt)
[Figure: three multiplexers are added. ALUSrc selects RD2 (0) or SignImm (1) as SrcB; MemtoReg selects ALUResult (0) or ReadData (1) as Result; RegDst selects Instr[20:16] (0) or Instr[15:11] (1) as WriteReg4:0. For R-type: RegWrite = 1, RegDst = 1, ALUSrc = 0, ALUControl2:0 varies, MemWrite = 0, MemtoReg = 0]

Chapter 7 <17>
MICROARCHITECTURE Single-Cycle Datapath: beq
• Determine whether values in rs and rt are equal
• Calculate branch target address:
BTA = (sign-extended immediate << 2) + (PC+4)
[Figure: SignImm << 2 is added to PCPlus4 to form PCBranch; PCSrc (Branch AND Zero) selects PCPlus4 (0) or PCBranch (1) as PC'. For beq: RegWrite = 0, RegDst = X, ALUSrc = 0, ALUControl2:0 = 110, Branch = 1, MemWrite = 0, MemtoReg = X]
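The branch target address formula above can be checked with a short Python sketch. The helper names `sign_extend16` and `branch_target` are hypothetical, and the 32-bit wraparound mask is an assumption about how the adder behaves at the top of the address space.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16: int) -> int:
    """Interpret the low 16 bits of imm16 as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_target(pc: int, imm16: int) -> int:
    """BTA = (sign-extended immediate << 2) + (PC + 4), truncated to 32 bits."""
    return (pc + 4 + (sign_extend16(imm16) << 2)) & MASK32


if __name__ == "__main__":
    # Forward branch by 3 instructions and backward branch by 1 instruction.
    print(hex(branch_target(0x00400000, 0x0003)))
    print(hex(branch_target(0x00400010, 0xFFFF)))
```

The shift by 2 converts an instruction count into a byte offset, since every MIPS instruction is 4 bytes long.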

Chapter 7 <18>
MICROARCHITECTURE Single-Cycle Processor
[Figure: complete single-cycle MIPS processor. The Control Unit takes Op (Instr[31:26]) and Funct (Instr[5:0]) and produces MemtoReg, MemWrite, Branch, ALUControl2:0, ALUSrc, RegDst, and RegWrite; PCSrc is derived from Branch and Zero. The datapath is the one built up on the preceding slides]

Chapter 7 <19>
MICROARCHITECTURE Single-Cycle Control
[Figure: Control Unit. The Main Decoder takes Opcode5:0 and produces MemtoReg, MemWrite, Branch, ALUSrc, RegDst, RegWrite, and ALUOp1:0. The ALU Decoder takes ALUOp1:0 and Funct5:0 and produces ALUControl2:0]

Chapter 7 <20>
MICROARCHITECTURE Review: ALU
F2:0   Function
000    A & B
001    A | B
010    A + B
011    not used
100    A & ~B
101    A | ~B
110    A - B
111    SLT

[Figure: ALU symbol with N-bit inputs A and B, 3-bit control F, and N-bit output Y]
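A behavioral model of the function table above, written as a Python sketch. It assumes a 32-bit ALU (N = 32) and models SLT as a signed comparison, which ignores the overflow corner case of a subtract-based hardware implementation; the names `alu` and `to_signed` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(a: int, b: int, f: int) -> tuple[int, int]:
    """Return (Y, Zero) for the 3-bit function code f."""
    a &= MASK32
    b &= MASK32
    not_b = ~b & MASK32
    if f == 0b000:
        y = a & b
    elif f == 0b001:
        y = a | b
    elif f == 0b010:
        y = a + b
    elif f == 0b100:
        y = a & not_b
    elif f == 0b101:
        y = a | not_b
    elif f == 0b110:
        y = a - b
    elif f == 0b111:
        y = 1 if to_signed(a) < to_signed(b) else 0  # assumption: signed compare
    else:
        raise ValueError("F = 011 is not used")
    y &= MASK32
    return y, int(y == 0)


if __name__ == "__main__":
    print(alu(5, 3, 0b010))  # add
    print(alu(5, 5, 0b110))  # subtract, sets Zero
```

The Zero output is what beq uses: subtracting rs and rt yields zero exactly when the two registers are equal.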

Chapter 7 <21>
MICROARCHITECTURE Review: ALU
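[Figure: N-bit ALU implementation. F2 selects B or ~B through a 2:1 mux and also drives the adder carry-in; the adder produces S and Cout; bit S[N-1] is zero-extended for SLT; a 4:1 mux controlled by F1:0 selects the AND, OR, adder, or zero-extended result as Y]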

Chapter 7 <22>
MICROARCHITECTURE Control Unit: ALU Decoder
ALUOp1:0 Meaning
00 Add
01 Subtract
10 Look at Funct
11 Not Used

ALUOp1:0   Funct           ALUControl2:0
00         X               010 (Add)
X1         X               110 (Subtract)
1X         100000 (add)    010 (Add)
1X         100010 (sub)    110 (Subtract)
1X         100100 (and)    000 (And)
1X         100101 (or)     001 (Or)
1X         101010 (slt)    111 (SLT)
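The ALU decoder truth table above maps directly onto a small lookup, sketched here in Python. The names `alu_decoder` and `FUNCT_TO_ALUCONTROL` are hypothetical; rows are checked in the same order as the table, so ALUOp = 11 falls into the X1 (subtract) row.

```python
# Funct field (Instr[5:0]) to ALUControl, used only when ALUOp = 1X.
FUNCT_TO_ALUCONTROL = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # sub
    0b100100: 0b000,  # and
    0b100101: 0b001,  # or
    0b101010: 0b111,  # slt
}


def alu_decoder(alu_op: int, funct: int) -> int:
    """Return ALUControl2:0 for a 2-bit ALUOp and a 6-bit Funct."""
    if alu_op == 0b00:
        return 0b010              # add (lw, sw)
    if alu_op & 0b01:
        return 0b110              # X1: subtract (beq)
    if funct not in FUNCT_TO_ALUCONTROL:
        raise ValueError(f"unsupported funct {funct:06b}")
    return FUNCT_TO_ALUCONTROL[funct]  # 1X: look at Funct


if __name__ == "__main__":
    print(format(alu_decoder(0b10, 0b101010), "03b"))  # slt
```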
Chapter 7 <23>
MICROARCHITECTURE Control Unit Main Decoder
Instruction Op5:0 RegWrite RegDst AluSrc Branch MemWrite MemtoReg ALUOp1:0

R-type 000000
lw 100011
sw 101011
beq 000100

[Figure: complete single-cycle MIPS processor (Control Unit plus datapath), shown for reference while filling in the main decoder table]

Chapter 7 <24>
MICROARCHITECTURE Control Unit: Main Decoder

Instruction  Op5:0   RegWrite  RegDst  ALUSrc  Branch  MemWrite  MemtoReg  ALUOp1:0
R-type       000000  1         1       0       0       0         0         10
lw           100011  1         0       1       0       0         1         00
sw           101011  0         X       1       0       1         X         00
beq          000100  0         X       0       1       0         X         01

Chapter 7 <25>
MICROARCHITECTURE Single-Cycle Datapath: or
[Figure: single-cycle processor with control values for or: RegWrite = 1, RegDst = 1, ALUSrc = 0, ALUControl2:0 = 001, Branch = 0, MemWrite = 0, MemtoReg = 0, PCSrc = 0]

Chapter 7 <26>
MICROARCHITECTURE Extended Functionality: addi
[Figure: complete single-cycle MIPS processor (Control Unit plus datapath), unchanged]

No change to datapath
Chapter 7 <27>
MICROARCHITECTURE Control Unit: addi
Instruction Op5:0 RegWrite RegDst AluSrc Branch MemWrite MemtoReg ALUOp1:0

R-type 000000 1 1 0 0 0 0 10

lw 100011 1 0 1 0 0 1 00

sw 101011 0 X 1 0 1 X 00

beq 000100 0 X 0 1 0 X 01

addi 001000

Chapter 7 <28>
MICROARCHITECTURE Control Unit: addi
Instruction Op5:0 RegWrite RegDst AluSrc Branch MemWrite MemtoReg ALUOp1:0

R-type 000000 1 1 0 0 0 0 10

lw 100011 1 0 1 0 0 1 00

sw 101011 0 X 1 0 1 X 00

beq 000100 0 X 0 1 0 X 01

addi 001000 1 0 1 0 0 0 00

Chapter 7 <29>
MICROARCHITECTURE Extended Functionality: j
[Figure: single-cycle processor extended for j. The Control Unit gains a Jump output, and a second multiplexer in front of PC' selects PCJump when Jump = 1. PCJump is formed from PCPlus4[31:28] concatenated with Instr[25:0] << 2 (bits 27:0)]
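The jump target construction shown on this slide (upper four bits of PC + 4, then the 26-bit address field shifted left by 2) can be sketched in Python. The function name `jump_target` is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def jump_target(pc: int, instr: int) -> int:
    """PCJump = {PCPlus4[31:28], Instr[25:0] << 2}."""
    pc_plus4 = (pc + 4) & MASK32
    addr26 = instr & 0x03FFFFFF          # Instr[25:0]
    return (pc_plus4 & 0xF0000000) | (addr26 << 2)


if __name__ == "__main__":
    # j instruction (opcode 000010) with address field 0x0100004.
    print(hex(jump_target(0x00400000, 0x08100004)))
```

Because only 28 bits come from the instruction, a j can only reach targets that share the top four address bits with PC + 4.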

Chapter 7 <30>
MICROARCHITECTURE Control Unit: Main Decoder
Instruction Op5:0 RegWrite RegDst AluSrc Branch MemWrite MemtoReg ALUOp1:0 Jump

R-type 000000 1 1 0 0 0 0 10 0

lw 100011 1 0 1 0 0 1 00 0

sw 101011 0 X 1 0 1 X 00 0

beq 000100 0 X 0 1 0 X 01 0

j 000010

Chapter 7 <31>
MICROARCHITECTURE Control Unit: Main Decoder
Instruction Op5:0 RegWrite RegDst AluSrc Branch MemWrite MemtoReg ALUOp1:0 Jump

R-type 000000 1 1 0 0 0 0 10 0

lw 100011 1 0 1 0 0 1 00 0

sw 101011 0 X 1 0 1 X 00 0

beq 000100 0 X 0 1 0 X 01 0

j 000010 0 X X X 0 X XX 1
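The completed main decoder table can be written as a lookup keyed by opcode. This Python sketch uses `None` for don't-care (X) entries; the names `Controls`, `MAIN_DECODER`, `main_decoder`, and `pc_src` are hypothetical, and `pc_src` encodes the Branch AND Zero gate that drives PCSrc in the datapath.

```python
from typing import NamedTuple, Optional

X = None  # don't care


class Controls(NamedTuple):
    reg_write: int
    reg_dst: Optional[int]
    alu_src: Optional[int]
    branch: Optional[int]
    mem_write: int
    mem_to_reg: Optional[int]
    alu_op: Optional[int]
    jump: int


# One row per instruction, in the column order of the table above.
MAIN_DECODER = {
    0b000000: Controls(1, 1, 0, 0, 0, 0, 0b10, 0),  # R-type
    0b100011: Controls(1, 0, 1, 0, 0, 1, 0b00, 0),  # lw
    0b101011: Controls(0, X, 1, 0, 1, X, 0b00, 0),  # sw
    0b000100: Controls(0, X, 0, 1, 0, X, 0b01, 0),  # beq
    0b000010: Controls(0, X, X, X, 0, X, X, 1),     # j
}


def main_decoder(instr: int) -> Controls:
    """Decode control signals from Instr[31:26]."""
    opcode = (instr >> 26) & 0x3F
    if opcode not in MAIN_DECODER:
        raise ValueError(f"unsupported opcode {opcode:06b}")
    return MAIN_DECODER[opcode]


def pc_src(branch: int, zero: int) -> int:
    """PCSrc = Branch AND Zero."""
    return int(bool(branch) and bool(zero))


if __name__ == "__main__":
    print(main_decoder(0x8C000000))  # lw
```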

Chapter 7 <32>
MICROARCHITECTURE Review: Processor Performance
Program Execution Time
= (#instructions)(cycles/instruction)(seconds/cycle)
= # instructions x CPI x TC

Chapter 7 <33>
MICROARCHITECTURE Single-Cycle Performance
[Figure: single-cycle processor with control values for lw: RegWrite = 1, RegDst = 0, ALUSrc = 1, ALUControl2:0 = 010, Branch = 0, MemWrite = 0, MemtoReg = 1, PCSrc = 0]

TC limited by critical path (lw)


Chapter 7 <34>
MICROARCHITECTURE Single-Cycle Performance
• Single-cycle critical path:
Tc = tpcq_PC + tmem + max(tRFread, tsext + tmux) + tALU + tmem + tmux + tRFsetup

• Typically, limiting paths are:
– memory, ALU, register file
– Tc = tpcq_PC + 2tmem + tRFread + tmux + tALU + tRFsetup

Chapter 7 <35>
MICROARCHITECTURE Single-Cycle Performance Example
Element Parameter Delay (ps)
Register clock-to-Q tpcq_PC 30
Register setup tsetup 20
Multiplexer tmux 25
ALU tALU 200
Memory read tmem 250
Register file read tRFread 150
Register file setup tRFsetup 20

Tc = ?

Chapter 7 <36>
MICROARCHITECTURE Single-Cycle Performance Example
Element Parameter Delay (ps)
Register clock-to-Q tpcq_PC 30
Register setup tsetup 20
Multiplexer tmux 25
ALU tALU 200
Memory read tmem 250
Register file read tRFread 150
Register file setup tRFsetup 20

Tc = tpcq_PC + 2tmem + tRFread + tmux + tALU + tRFsetup


= [30 + 2(250) + 150 + 25 + 200 + 20] ps
= 925 ps
Chapter 7 <37>
MICROARCHITECTURE Single-Cycle Performance Example
Program with 100 billion instructions:

Execution Time = # instructions × CPI × Tc
= (100 × 10^9)(1)(925 × 10^-12 s)
= 92.5 seconds
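The single-cycle numbers above can be reproduced with a short Python sketch. It uses the simplified critical-path formula and the delay table from the example; the dictionary keys and function names are hypothetical.

```python
# Element delays in picoseconds, copied from the example table.
DELAYS_PS = {
    "tpcq_PC": 30,
    "tsetup": 20,
    "tmux": 25,
    "tALU": 200,
    "tmem": 250,
    "tRFread": 150,
    "tRFsetup": 20,
}


def single_cycle_tc_ps(d: dict) -> int:
    """Tc = tpcq_PC + 2*tmem + tRFread + tmux + tALU + tRFsetup."""
    return (d["tpcq_PC"] + 2 * d["tmem"] + d["tRFread"]
            + d["tmux"] + d["tALU"] + d["tRFsetup"])


def execution_time_s(instructions: float, cpi: float, tc_ps: float) -> float:
    """Execution time in seconds for a clock period given in picoseconds."""
    return instructions * cpi * tc_ps * 1e-12


if __name__ == "__main__":
    tc = single_cycle_tc_ps(DELAYS_PS)
    print(tc, "ps")
    print(execution_time_s(100e9, 1, tc), "seconds")
```

The two memory accesses (instruction fetch and data read) account for more than half of the cycle time in this example.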

Chapter 7 <38>
MICROARCHITECTURE Multicycle MIPS Processor
• Single-cycle:
+ simple
- cycle time limited by longest instruction (lw)
- 2 adders/ALUs & 2 memories
• Multicycle:
+ higher clock speed
+ simpler instructions run faster
+ reuse expensive hardware on multiple cycles
- sequencing overhead paid many times
• Same design steps: datapath & control

Chapter 7 <39>
MICROARCHITECTURE Multicycle State Elements
• Replace Instruction and Data memories with
a single unified memory – more realistic
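[Figure: multicycle state elements: PC register with enable EN (input PC', output PC); unified Instr / Data Memory (address A, read data RD, write data WD, write enable WE); register file (A1, A2, A3, RD1, RD2, WD3, WE3)]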

Chapter 7 <40>
MICROARCHITECTURE Multicycle Datapath: Instruction Fetch
STEP 1: Fetch instruction

[Figure: PC addresses the Instr / Data Memory; the memory output RD is stored in the instruction register (enable IRWrite), whose output is Instr]

Chapter 7 <41>
MICROARCHITECTURE Multicycle Datapath: lw Register Read
STEP 2a: Read source operands from RF

[Figure: Instr[25:21] drives register file port A1; RD1 is stored in register A]

Chapter 7 <42>
MICROARCHITECTURE Multicycle Datapath: lw Immediate
STEP 2b: Sign-extend the immediate
[Figure: Instr[15:0] passes through the Sign Extend block to produce SignImm]

Chapter 7 <43>
MICROARCHITECTURE Multicycle Datapath: lw Address
STEP 3: Compute the memory address

[Figure: the ALU adds SrcA (register A) and SrcB (SignImm) under ALUControl2:0; ALUResult is stored in register ALUOut]

Chapter 7 <44>
MICROARCHITECTURE Multicycle Datapath: lw Memory Read

STEP 4: Read data from memory

[Figure: the IorD multiplexer selects PC (0) or ALUOut (1) as the memory address Adr; the memory output RD is stored in register Data]

Chapter 7 <45>
MICROARCHITECTURE Multicycle Datapath: lw Write Register

STEP 5: Write data back to register file

[Figure: Data drives register file input WD3; Instr[20:16] drives A3; RegWrite enables the write]

Chapter 7 <46>
MICROARCHITECTURE Multicycle Datapath: Increment PC
STEP 6: Increment PC

[Figure: the ALU is reused to increment the PC. The ALUSrcA multiplexer selects PC (0) or A (1) as SrcA; the ALUSrcB1:0 multiplexer selects among inputs 00, 01 (constant 4), 10 (SignImm), and 11 as SrcB; ALUResult drives PC', and PCWrite enables the PC register]

Chapter 7 <47>
MICROARCHITECTURE Multicycle Datapath: sw
Write data in rt to memory

[Figure: Instr[20:16] drives register file port A2; RD2 is stored in register B, which drives memory input WD; MemWrite enables the memory write]

Chapter 7 <48>
MICROARCHITECTURE Multicycle Datapath: R-Type
• Read from rs and rt
• Write ALUResult to register file
• Write to rd (instead of rt)
[Figure: register B drives ALUSrcB input 00; the RegDst multiplexer selects Instr[20:16] (0) or Instr[15:11] (1) for A3; the MemtoReg multiplexer selects ALUOut (0) or Data (1) for WD3]

Chapter 7 <49>
MICROARCHITECTURE Multicycle Datapath: beq
• rs == rt?
• BTA = (sign-extended immediate << 2) + (PC+4)
[Figure: SignImm << 2 drives ALUSrcB input 11; the PCSrc multiplexer selects ALUResult (0) or ALUOut (1) as PC'; PCEn, the PC register enable, is derived from PCWrite, Branch, and Zero]

Chapter 7 <50>
MICROARCHITECTURE Multicycle Processor
[Figure: complete multicycle MIPS processor. The Control Unit takes Op (Instr[31:26]) and Funct (Instr[5:0]) and produces PCWrite, Branch, IorD, MemWrite, IRWrite, PCSrc, ALUControl2:0, ALUSrcB1:0, ALUSrcA, RegWrite, MemtoReg, and RegDst for the datapath built up on the preceding slides]

Chapter 7 <51>
MICROARCHITECTURE Multicycle Control
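[Figure: multicycle Control Unit. The Main Controller (FSM) takes Opcode5:0 and produces the multiplexer selects (MemtoReg, RegDst, IorD, PCSrc, ALUSrcB1:0, ALUSrcA), the register enables (IRWrite, MemWrite, PCWrite, Branch, RegWrite), and ALUOp1:0. The ALU Decoder takes ALUOp1:0 and Funct5:0 and produces ALUControl2:0]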

Chapter 7 <52>
MICROARCHITECTURE Main Controller FSM: Fetch
S0: Fetch

Reset: go to S0

[Figure: multicycle processor with control values during Fetch: IorD = 0, IRWrite = 1, ALUSrcA = 0, ALUSrcB1:0 = 01, ALUControl2:0 = 010, PCSrc = 0, PCWrite = 1, Branch = 0, MemWrite = 0, RegWrite = 0; RegDst and MemtoReg are don't-cares (X)]

Chapter 7 <53>
MICROARCHITECTURE Main Controller FSM: Fetch
S0: Fetch (entered on Reset): IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite

[Figure: multicycle processor with the Fetch control values applied]

Chapter 7 <54>
MICROARCHITECTURE Main Controller FSM: Decode
S0: Fetch (entered on Reset): IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite. Next: S1
S1: Decode: no control signals asserted

[Figure: multicycle processor during Decode: PCWrite = 0, Branch = 0, IRWrite = 0, MemWrite = 0, RegWrite = 0; multiplexer selects and ALUControl2:0 are don't-cares (X)]

Chapter 7 <55>
MICROARCHITECTURE Main Controller FSM: Address
S0: Fetch (entered on Reset): IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite. Next: S1
S1: Decode. Next: S2 if Op = LW or Op = SW
S2: MemAdr

[Figure: multicycle processor during MemAdr: ALUSrcA = 1, ALUSrcB1:0 = 10, ALUControl2:0 = 010; PCWrite = 0, Branch = 0, IRWrite = 0, MemWrite = 0, RegWrite = 0]

Chapter 7 <56>
MICROARCHITECTURE Main Controller FSM: Address
S0: Fetch (entered on Reset): IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite. Next: S1
S1: Decode. Next: S2 if Op = LW or Op = SW
S2: MemAdr: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00

[Figure: multicycle processor during MemAdr: ALUSrcA = 1, ALUSrcB1:0 = 10, ALUControl2:0 = 010; PCWrite = 0, Branch = 0, IRWrite = 0, MemWrite = 0, RegWrite = 0]

Chapter 7 <57>
MICROARCHITECTURE Main Controller FSM: lw
S0: Fetch (entered on Reset): IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite. Next: S1
S1: Decode. Next: S2 if Op = LW or Op = SW
S2: MemAdr: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00. Next: S3 if Op = LW
S3: MemRead: IorD = 1. Next: S4
S4: Mem Writeback: RegDst = 0, MemtoReg = 1, RegWrite. Next: S0

Chapter 7 <58>
MICROARCHITECTURE Main Controller FSM: sw
S0: Fetch (entered on Reset): IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite. Next: S1
S1: Decode. Next: S2 if Op = LW or Op = SW
S2: MemAdr: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00. Next: S3 if Op = LW, S5 if Op = SW
S3: MemRead: IorD = 1. Next: S4
S4: Mem Writeback: RegDst = 0, MemtoReg = 1, RegWrite. Next: S0
S5: MemWrite: IorD = 1, MemWrite. Next: S0

Chapter 7 <59>
MICROARCHITECTURE Main Controller FSM: R-Type
S0: Fetch (entered on Reset): IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite. Next: S1
S1: Decode. Next: S2 if Op = LW or Op = SW, S6 if Op = R-type
S2: MemAdr: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00. Next: S3 if Op = LW, S5 if Op = SW
S3: MemRead: IorD = 1. Next: S4
S4: Mem Writeback: RegDst = 0, MemtoReg = 1, RegWrite. Next: S0
S5: MemWrite: IorD = 1, MemWrite. Next: S0
S6: Execute: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10. Next: S7
S7: ALU Writeback: RegDst = 1, MemtoReg = 0, RegWrite. Next: S0

Chapter 7 <60>
MICROARCHITECTURE Main Controller FSM: beq
S0: Fetch (entered on Reset): IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite. Next: S1
S1: Decode: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00. Next: S2 if Op = LW or Op = SW, S6 if Op = R-type, S8 if Op = BEQ
S2: MemAdr: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00. Next: S3 if Op = LW, S5 if Op = SW
S3: MemRead: IorD = 1. Next: S4
S4: Mem Writeback: RegDst = 0, MemtoReg = 1, RegWrite. Next: S0
S5: MemWrite: IorD = 1, MemWrite. Next: S0
S6: Execute: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10. Next: S7
S7: ALU Writeback: RegDst = 1, MemtoReg = 0, RegWrite. Next: S0
S8: Branch: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 1, Branch. Next: S0

Chapter 7 <61>
MICROARCHITECTURE Multicycle Controller FSM
S0: Fetch (entered on Reset): IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite. Next: S1
S1: Decode: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00. Next: S2 if Op = LW or Op = SW, S6 if Op = R-type, S8 if Op = BEQ
S2: MemAdr: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00. Next: S3 if Op = LW, S5 if Op = SW
S3: MemRead: IorD = 1. Next: S4
S4: Mem Writeback: RegDst = 0, MemtoReg = 1, RegWrite. Next: S0
S5: MemWrite: IorD = 1, MemWrite. Next: S0
S6: Execute: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10. Next: S7
S7: ALU Writeback: RegDst = 1, MemtoReg = 0, RegWrite. Next: S0
S8: Branch: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 1, Branch. Next: S0
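The state transitions of the controller FSM on this slide can be traced with a small Python sketch. It models only the next-state logic for lw, sw, R-type, and beq, not the control outputs; the names `next_state` and `trace`, and the lowercase opcode labels, are hypothetical.

```python
def next_state(state: str, op: str) -> str:
    """Next-state function for states S0 to S8 of the main controller."""
    if state == "S0":                       # Fetch
        return "S1"
    if state == "S1":                       # Decode: dispatch on opcode
        return {"lw": "S2", "sw": "S2", "rtype": "S6", "beq": "S8"}[op]
    if state == "S2":                       # MemAdr
        return {"lw": "S3", "sw": "S5"}[op]
    if state == "S3":                       # MemRead
        return "S4"
    if state == "S6":                       # Execute
        return "S7"
    if state in ("S4", "S5", "S7", "S8"):   # last state of each instruction
        return "S0"
    raise ValueError(f"unknown state {state}")


def trace(op: str) -> list[str]:
    """States visited by one instruction, starting at Fetch."""
    states = ["S0"]
    while True:
        nxt = next_state(states[-1], op)
        if nxt == "S0":
            return states
        states.append(nxt)


if __name__ == "__main__":
    for op in ("lw", "sw", "rtype", "beq"):
        print(op, trace(op), len(trace(op)), "cycles")
```

The length of each trace is the number of cycles the instruction takes, which is the input to the CPI calculation in the performance analysis.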

Chapter 7 <62>
MICROARCHITECTURE Extended Functionality: addi
S0: Fetch (entered on Reset): IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite. Next: S1
S1: Decode: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00. Next: S2 if Op = LW or Op = SW, S6 if Op = R-type, S8 if Op = BEQ, S9 if Op = ADDI
S2: MemAdr: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00. Next: S3 if Op = LW, S5 if Op = SW
S3: MemRead: IorD = 1. Next: S4
S4: Mem Writeback: RegDst = 0, MemtoReg = 1, RegWrite. Next: S0
S5: MemWrite: IorD = 1, MemWrite. Next: S0
S6: Execute: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10. Next: S7
S7: ALU Writeback: RegDst = 1, MemtoReg = 0, RegWrite. Next: S0
S8: Branch: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 1, Branch. Next: S0
S9: ADDI Execute (new state). Next: S10
S10: ADDI Writeback (new state). Next: S0

Chapter 7 <63>
MICROARCHITECTURE Main Controller FSM: addi
S0: Fetch (entered on Reset): IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite. Next: S1
S1: Decode: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00. Next: S2 if Op = LW or Op = SW, S6 if Op = R-type, S8 if Op = BEQ, S9 if Op = ADDI
S2: MemAdr: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00. Next: S3 if Op = LW, S5 if Op = SW
S3: MemRead: IorD = 1. Next: S4
S4: Mem Writeback: RegDst = 0, MemtoReg = 1, RegWrite. Next: S0
S5: MemWrite: IorD = 1, MemWrite. Next: S0
S6: Execute: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10. Next: S7
S7: ALU Writeback: RegDst = 1, MemtoReg = 0, RegWrite. Next: S0
S8: Branch: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 1, Branch. Next: S0
S9: ADDI Execute: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00. Next: S10
S10: ADDI Writeback: RegDst = 0, MemtoReg = 0, RegWrite. Next: S0

Chapter 7 <64>
MICROARCHITECTURE Extended Functionality: j

[Figure: multicycle datapath extended for j. PCSrc widens to 2 bits (PCSrc1:0): 00 selects ALUResult, 01 selects ALUOut, and 10 selects PCJump, formed from PC[31:28] concatenated with Instr[25:0] << 2 (bits 27:0)]

Chapter 7 <65>
MICROARCHITECTURE Main Controller FSM: j
S0: Fetch (entered on Reset): IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 00, IRWrite, PCWrite. Next: S1
S1: Decode: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00. Next: S2 if Op = LW or Op = SW, S6 if Op = R-type, S8 if Op = BEQ, S9 if Op = ADDI, S11 if Op = J
S2: MemAdr: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00. Next: S3 if Op = LW, S5 if Op = SW
S3: MemRead: IorD = 1. Next: S4
S4: Mem Writeback: RegDst = 0, MemtoReg = 1, RegWrite. Next: S0
S5: MemWrite: IorD = 1, MemWrite. Next: S0
S6: Execute: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10. Next: S7
S7: ALU Writeback: RegDst = 1, MemtoReg = 0, RegWrite. Next: S0
S8: Branch: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 01, Branch. Next: S0
S9: ADDI Execute: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00. Next: S10
S10: ADDI Writeback: RegDst = 0, MemtoReg = 0, RegWrite. Next: S0
S11: Jump (new state). Next: S0

Chapter 7 <66>
MICROARCHITECTURE Main Controller FSM: j
S0: Fetch (entered on Reset): IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 00, IRWrite, PCWrite. Next: S1
S1: Decode: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00. Next: S2 if Op = LW or Op = SW, S6 if Op = R-type, S8 if Op = BEQ, S9 if Op = ADDI, S11 if Op = J
S2: MemAdr: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00. Next: S3 if Op = LW, S5 if Op = SW
S3: MemRead: IorD = 1. Next: S4
S4: Mem Writeback: RegDst = 0, MemtoReg = 1, RegWrite. Next: S0
S5: MemWrite: IorD = 1, MemWrite. Next: S0
S6: Execute: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10. Next: S7
S7: ALU Writeback: RegDst = 1, MemtoReg = 0, RegWrite. Next: S0
S8: Branch: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 01, Branch. Next: S0
S9: ADDI Execute: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00. Next: S10
S10: ADDI Writeback: RegDst = 0, MemtoReg = 0, RegWrite. Next: S0
S11: Jump: PCSrc = 10, PCWrite. Next: S0

Chapter 7 <67>
MICROARCHITECTURE Multicycle Processor Performance
• Instructions take different number of cycles:
– 3 cycles: beq, j
– 4 cycles: R-Type, sw, addi
– 5 cycles: lw
• CPI is weighted average
• SPECINT2000 benchmark:
– 25% loads
– 10% stores
– 11% branches
– 2% jumps
– 52% R-type
Average CPI = (0.11 + 0.02)(3) + (0.52 + 0.10)(4) + (0.25)(5) = 4.12
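The weighted-average CPI above can be recomputed with a few lines of Python. The instruction mix and cycle counts are the ones listed on this slide; the dictionary and function names are hypothetical.

```python
# Instruction mix for the benchmark, as fractions of all instructions.
MIX = {"lw": 0.25, "sw": 0.10, "beq": 0.11, "j": 0.02, "rtype": 0.52}

# Cycles per instruction class on the multicycle processor.
CYCLES = {"lw": 5, "sw": 4, "beq": 3, "j": 3, "rtype": 4}


def average_cpi(mix: dict, cycles: dict) -> float:
    """Weighted average of cycle counts, weighted by instruction frequency."""
    return sum(fraction * cycles[name] for name, fraction in mix.items())


if __name__ == "__main__":
    print(round(average_cpi(MIX, CYCLES), 2))
```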

Chapter 7 <68>
MICROARCHITECTURE Multicycle Processor Performance
Multicycle critical path:
Tc = tpcq + tmux + max(tALU + tmux, tmem) + tsetup
[Figure: complete multicycle MIPS processor (Control Unit plus datapath), shown for reference while identifying the critical path]

Chapter 7 <69>
MICROARCHITECTURE Multicycle Performance Example
Element Parameter Delay (ps)
Register clock-to-Q tpcq_PC 30
Register setup tsetup 20
Multiplexer tmux 25
ALU tALU 200
Memory read tmem 250
Register file read tRFread 150
Register file setup tRFsetup 20

Tc = ?

Chapter 7 <70>
MICROARCHITECTURE Multicycle Performance Example
Element Parameter Delay (ps)
Register clock-to-Q tpcq_PC 30
Register setup tsetup 20
Multiplexer tmux 25
ALU tALU 200
Memory read tmem 250
Register file read tRFread 150
Register file setup tRFsetup 20

Tc = tpcq_PC + tmux + max(tALU + tmux, tmem) + tsetup


= tpcq_PC + tmux + tmem + tsetup
= [30 + 25 + 250 + 20] ps
= 325 ps
Chapter 7 <71>
MICROARCHITECTURE Multicycle Performance Example
Program with 100 billion instructions
Execution Time = ?

Chapter 7 <72>
MICROARCHITECTURE Multicycle Performance Example
Program with 100 billion instructions
Execution Time = (# instructions) × CPI × Tc
= (100 × 10^9)(4.12)(325 × 10^-12 s)
= 133.9 seconds

This is slower than the single-cycle processor (92.5 seconds). Why?

Chapter 7 <73>
MICROARCHITECTURE Multicycle Performance Example
Program with 100 billion instructions
Execution Time = (# instructions) × CPI × Tc
= (100 × 10^9)(4.12)(325 × 10^-12 s)
= 133.9 seconds

This is slower than the single-cycle processor (92.5 seconds). Why?
– Not all steps same length
– Sequencing overhead for each step (tpcq + tsetup = 50 ps)
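The multicycle result and the comparison with the single-cycle processor can be reproduced with a short Python sketch. It uses the delay table and the multicycle critical-path formula from the preceding slides; the dictionary keys and function names are hypothetical.

```python
# Element delays in picoseconds, copied from the example table.
DELAYS_PS = {"tpcq_PC": 30, "tsetup": 20, "tmux": 25, "tALU": 200, "tmem": 250}


def multicycle_tc_ps(d: dict) -> int:
    """Tc = tpcq + tmux + max(tALU + tmux, tmem) + tsetup."""
    return (d["tpcq_PC"] + d["tmux"]
            + max(d["tALU"] + d["tmux"], d["tmem"]) + d["tsetup"])


def execution_time_s(instructions: float, cpi: float, tc_ps: float) -> float:
    """Execution time in seconds for a clock period given in picoseconds."""
    return instructions * cpi * tc_ps * 1e-12


if __name__ == "__main__":
    tc = multicycle_tc_ps(DELAYS_PS)
    multi = execution_time_s(100e9, 4.12, tc)
    single = execution_time_s(100e9, 1, 925)  # single-cycle Tc from the earlier example
    print(tc, "ps;", round(multi, 1), "s multicycle vs", round(single, 1), "s single-cycle")
```

The clock is about 2.8 times faster than the single-cycle clock (925 / 325), but the CPI is 4.12 times higher, so the multicycle design comes out slower on this mix.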

Chapter 7 <74>
MICROARCHITECTURE Review: Single-Cycle Processor
[Figure: complete single-cycle MIPS processor with jump support: Control Unit (Op, Funct inputs; Jump, MemtoReg, MemWrite, Branch, ALUControl2:0, ALUSrc, RegDst, RegWrite outputs) and datapath with instruction memory, register file, ALU, data memory, PCPlus4 adder, PCBranch adder, and PCJump path]

Chapter 7 <75>
MICROARCHITECTURE Review: Multicycle Processor
[Figure: complete multicycle MIPS processor with jump support: Control Unit (Op, Funct inputs; PCWrite, Branch, IorD, MemWrite, IRWrite, PCSrc, ALUControl2:0, ALUSrcB1:0, ALUSrcA, RegWrite, MemtoReg, RegDst outputs) and datapath with unified Instr / Data Memory, register file, a single ALU, and the PCJump path]

Chapter 7 <76>
MICROARCHITECTURE Pipelined MIPS Processor
• Temporal parallelism
• Divide single-cycle processor into 5 stages:
– Fetch
– Decode
– Execute
– Memory
– Writeback
• Add pipeline registers between stages

Chapter 7 <77>
MICROARCHITECTURE Single-Cycle vs. Pipelined
[Figure: timing comparison on a 0 to 1900 ps time axis. Single-cycle: instruction 2 starts only after instruction 1 has finished all five steps (Fetch Instruction, Decode / Read Reg, Execute ALU, Memory Read / Write, Write Reg). Pipelined: instructions 1, 2, and 3 overlap, each starting one stage after the previous one]

Chapter 7 <78>
MICROARCHITECTURE Pipelined Processor Abstraction
[Figure: pipeline diagram over cycles 1 to 10. Each instruction passes through IM, RF (read), ALU, DM, and RF (write), starting one cycle after the previous instruction]
lw $s2, 40($0)
add $s3, $t1, $t2
sub $s4, $s1, $s5
and $s5, $t5, $t6
sw $s6, 20($s1)
or $s7, $t3, $t4

Chapter 7 <79>
MICROARCHITECTURE Single-Cycle & Pipelined Datapath
[Figure: single-cycle datapath (top) and pipelined datapath (bottom). The pipelined version adds pipeline registers between the Fetch, Decode, Execute, Memory, and Writeback stages; signal names carry a stage suffix (PCF, InstrD, SrcAE, SrcBE, WriteDataE, WriteRegE4:0, ALUOutM, WriteDataM, PCBranchM, ReadDataW, ALUOutW, ResultW)]

Chapter 7 <80>
MICROARCHITECTURE Corrected Pipelined Datapath
[Figure: corrected pipelined datapath. WriteRegE4:0 is carried through the pipeline registers as WriteRegM4:0 and WriteRegW4:0 before driving register file port A3]

WriteReg must arrive at same time as Result

Chapter 7 <81>
MICROARCHITECTURE Pipelined Processor Control
[Figure: pipelined processor with control. The Control Unit decodes Op and Funct in the Decode stage (RegWriteD, MemtoRegD, MemWriteD, BranchD, ALUControlD, ALUSrcD, RegDstD); each signal is carried through the pipeline registers to the stage that uses it (ALUControlE2:0, ALUSrcE, RegDstE in Execute; MemWriteM, BranchM, PCSrcM in Memory; RegWriteW, MemtoRegW in Writeback)]

• Same control unit as single-cycle processor


• Control delayed to proper pipeline stage
Chapter 7 <82>
MICROARCHITECTURE Pipeline Hazards
• When an instruction depends on result from
instruction that hasn’t completed
• Types:
– Data hazard: register value not yet written back to
register file
– Control hazard: next instruction not decided yet
(caused by branches)

Chapter 7 <83>
MICROARCHITECTURE Data Hazard
[Figure: pipeline diagram over cycles 1 to 8. The instructions after add read $s0, which add does not write to the register file until its Writeback stage]
add $s0, $s2, $s3
and $t0, $s0, $s1
or $t1, $s4, $s0
sub $t2, $s0, $s5

Chapter 7 <84>
MICROARCHITECTURE Handling Data Hazards
• Insert nops in code at compile time
• Rearrange code at compile time
• Forward data at run time
• Stall the processor at run time

Compile-Time Hazard Elimination
• Insert enough nops for result to be ready
• Or move independent useful instructions forward
[Figure: pipeline timing diagram, cycles 1 to 10. Two nops are inserted after add, so that and reads $s0 in cycle 5, the cycle in which add writes it back.]

add $s0, $s2, $s3
nop
nop
and $t0, $s0, $s1
or  $t1, $s4, $s0
sub $t2, $s0, $s5
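
The nop-insertion idea can be sketched in Python. This is only an illustration under stated assumptions, not a real compiler pass: each instruction is given as a hypothetical (text, dest, sources) tuple, there is no forwarding hardware, and the register file is written in the first half of a cycle and read in the second half, so a consumer must sit at least three instruction slots after its producer.

```python
# Minimal sketch: pad an instruction sequence with nops so that no
# instruction reads a register before its producer reaches Writeback.
# Assumptions (not from the slides): the tuple format below, and a
# required producer-to-consumer distance of 3 instruction slots.

MIN_DISTANCE = 3  # producer is in Writeback while consumer is in Decode

def insert_nops(program):
    """program: list of (text, dest, sources). Returns a list of text lines."""
    out = []  # already scheduled instructions, as (text, dest)
    for text, dest, sources in program:
        needed = 0
        # Only the previous MIN_DISTANCE - 1 slots can be too close.
        for back in range(1, MIN_DISTANCE):
            if back > len(out):
                break
            _, prev_dest = out[-back]
            if prev_dest not in (None, "$0") and prev_dest in sources:
                needed = max(needed, MIN_DISTANCE - back)
        out.extend([("nop", None)] * needed)
        out.append((text, dest))
    return [text for text, _ in out]

program = [
    ("add $s0, $s2, $s3", "$s0", ("$s2", "$s3")),
    ("and $t0, $s0, $s1", "$t0", ("$s0", "$s1")),
    ("or $t1, $s4, $s0",  "$t1", ("$s4", "$s0")),
    ("sub $t2, $s0, $s5", "$t2", ("$s0", "$s5")),
]
scheduled = insert_nops(program)
print("\n".join(scheduled))
```

For the four-instruction example, the sketch places two nops between add and and, matching the slide.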

Data Forwarding
[Figure: pipeline timing diagram, cycles 1 to 8. The result of add is forwarded from the Memory stage to the Execute stage of and, and from the Writeback stage to the Execute stage of or; sub reads $s0 from the register file.]

add $s0, $s2, $s3
and $t0, $s0, $s1
or  $t1, $s4, $s0
sub $t2, $s0, $s5

Data Forwarding
[Figure: pipelined datapath with forwarding. A Hazard Unit takes RsE, RtE, WriteRegM, WriteRegW, RegWriteM, and RegWriteW as inputs and drives ForwardAE and ForwardBE. These select three-input multiplexers in front of the ALU operands: 00 = register file value, 01 = ResultW, 10 = ALUOutM.]

Data Forwarding
• Forward to Execute stage from either:
– Memory stage or
– Writeback stage
• Forwarding logic for ForwardAE:
if ((rsE != 0) AND (rsE == WriteRegM) AND RegWriteM)
then ForwardAE = 10
else if ((rsE != 0) AND (rsE == WriteRegW) AND RegWriteW)
then ForwardAE = 01
else ForwardAE = 00
• Forwarding logic for ForwardBE is the same, with rsE replaced by rtE
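
The forwarding rule above translates directly into a small Python function. This is a behavioral sketch only (the real logic is combinational hardware); the function name and argument names are illustrative, and register numbers follow the usual MIPS convention ($t0 = 8, $s0 = 16, $s1 = 17).

```python
# Behavioral sketch of the ForwardAE / ForwardBE selection logic.
# src_e is rsE for ForwardAE and rtE for ForwardBE.

def forward_select(src_e, write_reg_m, reg_write_m, write_reg_w, reg_write_w):
    """Return the 2-bit mux select: 0b10 (Memory), 0b01 (Writeback), 0b00 (none)."""
    # The Memory stage holds the most recent result, so it is checked first.
    if src_e != 0 and src_e == write_reg_m and reg_write_m:
        return 0b10
    if src_e != 0 and src_e == write_reg_w and reg_write_w:
        return 0b01
    return 0b00

# and $t0, $s0, $s1 is in Execute while add $s0, $s2, $s3 is in Memory:
forward_ae = forward_select(src_e=16, write_reg_m=16, reg_write_m=True,
                            write_reg_w=0, reg_write_w=False)
forward_be = forward_select(src_e=17, write_reg_m=16, reg_write_m=True,
                            write_reg_w=0, reg_write_w=False)
print(format(forward_ae, "02b"), format(forward_be, "02b"))
```

Register $0 is excluded because it is hardwired to zero and must never be forwarded.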

Stalling
[Figure: pipeline timing diagram, cycles 1 to 8. lw does not have $s0 available until the end of its Memory stage (cycle 4), but the following and needs $s0 at the start of its Execute stage in cycle 4. Forwarding alone cannot resolve this hazard ("Trouble!").]

lw  $s0, 40($0)
and $t0, $s0, $s1
or  $t1, $s4, $s0
sub $t2, $s0, $s5

Stalling
[Figure: pipeline timing diagram, cycles 1 to 9. and is held in Decode and or is held in Fetch for one extra cycle (stall). The loaded value of $s0 is then forwarded from the Writeback stage to the Execute stage of and.]

lw  $s0, 40($0)
and $t0, $s0, $s1
or  $t1, $s4, $s0
sub $t2, $s0, $s5

Stalling Hardware
[Figure: pipelined datapath with stall support. The Hazard Unit additionally takes RsD, RtD, RtE, and MemtoRegE as inputs and produces StallF (disables the PC register via EN), StallD (disables the Fetch/Decode pipeline register via EN), and FlushE (clears the Decode/Execute pipeline register via CLR).]

Stalling Logic
lwstall = ((rsD == rtE) OR (rtD == rtE)) AND MemtoRegE

StallF = StallD = FlushE = lwstall
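
A behavioral sketch of the load-use stall condition in Python, assuming integer register numbers and a boolean MemtoRegE. The function names are illustrative and not part of the slides.

```python
# Behavioral sketch of the lwstall equation and the signals derived from it.

def lw_stall(rs_d, rt_d, rt_e, mem_to_reg_e):
    """True when the instruction in Decode uses the register being loaded in Execute."""
    return (rs_d == rt_e or rt_d == rt_e) and bool(mem_to_reg_e)

def hazard_outputs(rs_d, rt_d, rt_e, mem_to_reg_e):
    stall = lw_stall(rs_d, rt_d, rt_e, mem_to_reg_e)
    # StallF = StallD = FlushE = lwstall
    return {"StallF": stall, "StallD": stall, "FlushE": stall}

# lw $s0, 40($0) is in Execute (rtE = 16); and $t0, $s0, $s1 is in Decode.
signals = hazard_outputs(rs_d=16, rt_d=17, rt_e=16, mem_to_reg_e=True)
print(signals)
```

Stalling Fetch and Decode holds the dependent instructions in place, while flushing Execute inserts a bubble behind the load.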

Control Hazards
• beq:
– Branch is not determined until the 4th stage of the pipeline
– Instructions after the branch are fetched before the branch decision is made
– These instructions must be flushed if the branch is taken
• Branch misprediction penalty
– Number of instructions flushed when the branch is taken
– May be reduced by determining the branch earlier

Control Hazards: Original Pipeline
[Figure: pipelined datapath with Hazard Unit, as on the Stalling Hardware slide. The branch decision PCSrcM and the branch target PCBranchM are produced in the Memory stage.]

Control Hazards
[Figure: pipeline timing diagram, cycles 1 to 9. beq is resolved in the Memory stage (cycle 4). The three instructions fetched after it (and, or, sub) are flushed, and slt at the branch target is fetched in cycle 5.]

20 beq $t1, $t2, 40
24 and $t0, $s0, $s1
28 or  $t1, $s4, $s0
2C sub $t2, $s0, $s5
30 ...
...
64 slt $t3, $s2, $s3

Early Branch Resolution
[Figure: pipelined datapath with the branch decision moved to the Decode stage. An equality comparator (=) on the register file outputs produces EqualD; PCSrcD is derived from BranchD and EqualD, and the branch target PCBranchD is computed in Decode. The Fetch/Decode pipeline register gains a CLR input.]

Early branch resolution introduces another data hazard, in the Decode stage


Early Branch Resolution
[Figure: pipeline timing diagram, cycles 1 to 9. With the branch resolved in Decode (cycle 2), only the one instruction fetched after beq (and) is flushed, and slt at the branch target is fetched in cycle 3.]

20 beq $t1, $t2, 40
24 and $t0, $s0, $s1
28 or  $t1, $s4, $s0
2C sub $t2, $s0, $s5
30 ...
...
64 slt $t3, $s2, $s3

Handling Data & Control Hazards
[Figure: pipelined datapath handling both data and control hazards. Forwarding multiplexers selected by ForwardAD and ForwardBD are added in front of the Decode-stage equality comparator, and the Hazard Unit additionally takes BranchD and RegWriteE as inputs. Its outputs are ForwardAD, ForwardBD, ForwardAE, ForwardBE, StallF, StallD, and FlushE.]

Control Forwarding & Stalling Logic
• Forwarding logic:
ForwardAD = (rsD != 0) AND (rsD == WriteRegM) AND RegWriteM
ForwardBD = (rtD != 0) AND (rtD == WriteRegM) AND RegWriteM

• Stalling logic:
branchstall = BranchD AND
( [RegWriteE AND ((WriteRegE == rsD) OR (WriteRegE == rtD))]
OR
[MemtoRegM AND ((WriteRegM == rsD) OR (WriteRegM == rtD))] )

StallF = StallD = FlushE = lwstall OR branchstall
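
The Decode-stage forwarding and branch stall equations can be sketched in Python in the same behavioral style. Function and argument names are illustrative; register numbers follow the usual MIPS convention ($t1 = 9, $t2 = 10).

```python
# Behavioral sketch of ForwardAD/ForwardBD and branchstall.

def forward_decode(src_d, write_reg_m, reg_write_m):
    """ForwardAD (src_d = rsD) or ForwardBD (src_d = rtD)."""
    return src_d != 0 and src_d == write_reg_m and bool(reg_write_m)

def branch_stall(branch_d, rs_d, rt_d,
                 write_reg_e, reg_write_e,
                 write_reg_m, mem_to_reg_m):
    """True when a branch in Decode needs a value that is not yet available."""
    # Result still being computed by an instruction in Execute.
    result_in_execute = reg_write_e and write_reg_e in (rs_d, rt_d)
    # Loaded value still being read from memory in the Memory stage.
    load_in_memory = mem_to_reg_m and write_reg_m in (rs_d, rt_d)
    return bool(branch_d and (result_in_execute or load_in_memory))

# beq $t1, $t2, ... in Decode directly behind an ALU instruction writing $t1:
stall_now = branch_stall(True, 9, 10, write_reg_e=9, reg_write_e=True,
                         write_reg_m=0, mem_to_reg_m=False)
# One cycle later the producer is in Memory and its result can be forwarded:
forward_ad = forward_decode(9, write_reg_m=9, reg_write_m=True)
print(stall_now, forward_ad)
```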

Branch Prediction
• Guess whether branch will be taken
– Backward branches are usually taken (loops)
– Consider history to improve guess
• Good prediction reduces fraction of branches requiring a flush

Pipelined Performance Example
• SPECINT2000 benchmark:
– 25% loads
– 10% stores
– 11% branches
– 2% jumps
– 52% R-type
• Suppose:
– 40% of loads used by next instruction
– 25% of branches mispredicted
– All jumps flush next instruction
• What is the average CPI?
– Load/Branch CPI = 1 when no stalling, 2 when stalling
– CPIlw = 1(0.6) + 2(0.4) = 1.4
– CPIbeq = 1(0.75) + 2(0.25) = 1.25
– Jump CPI = 2, because every jump flushes the next instruction

Average CPI = (0.25)(1.4) + (0.1)(1) + (0.11)(1.25) + (0.02)(2) + (0.52)(1) = 1.15
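
The same weighted average can be reproduced with a few lines of Python. The dictionary keys are illustrative labels; the numbers are the ones given on the slide.

```python
# Worked check of the average CPI for the instruction mix on the slide.

mix = {"load": 0.25, "store": 0.10, "branch": 0.11, "jump": 0.02, "rtype": 0.52}

cpi_load = 1 * 0.60 + 2 * 0.40    # 40% of loads stall for one cycle
cpi_branch = 1 * 0.75 + 2 * 0.25  # 25% of branches are mispredicted

cpi = {"load": cpi_load, "store": 1, "branch": cpi_branch, "jump": 2, "rtype": 1}

average_cpi = sum(mix[kind] * cpi[kind] for kind in mix)
print(f"average CPI = {average_cpi:.4f}")
```

The unrounded value is 1.1475, which the slide rounds to 1.15.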
Pipelined Performance
• Pipelined processor critical path:
Tc = max {
tpcq + tmem + tsetup,                             (Fetch)
2(tRFread + tmux + teq + tAND + tmux + tsetup),   (Decode)
tpcq + tmux + tmux + tALU + tsetup,               (Execute)
tpcq + tmemwrite + tsetup,                        (Memory)
2(tpcq + tmux + tRFwrite) }                       (Writeback)

Pipelined Performance Example
Element                 Parameter   Delay (ps)
Register clock-to-Q     tpcq_PC     30
Register setup          tsetup      20
Multiplexer             tmux        25
ALU                     tALU        200
Memory read             tmem        250
Register file read      tRFread     150
Register file setup     tRFsetup    20
Equality comparator     teq         40
AND gate                tAND        15
Memory write            tmemwrite   220
Register file write     tRFwrite    100

Tc = 2(tRFread + tmux + teq + tAND + tmux + tsetup)
= 2[150 + 25 + 40 + 15 + 25 + 20] ps = 550 ps
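
As a cross-check, the five candidate paths from the critical path formula can be evaluated with the delays in the table. This Python sketch assumes the clock-to-Q delay listed as tpcq_PC is the tpcq used in the formula; the stage labels are added for readability.

```python
# Evaluate each candidate path of the pipelined critical-path formula (in ps).

d = {"tpcq": 30, "tsetup": 20, "tmux": 25, "tALU": 200, "tmem": 250,
     "tRFread": 150, "teq": 40, "tAND": 15, "tmemwrite": 220, "tRFwrite": 100}

stage_delay = {
    "Fetch":     d["tpcq"] + d["tmem"] + d["tsetup"],
    "Decode":    2 * (d["tRFread"] + d["tmux"] + d["teq"]
                      + d["tAND"] + d["tmux"] + d["tsetup"]),
    "Execute":   d["tpcq"] + d["tmux"] + d["tmux"] + d["tALU"] + d["tsetup"],
    "Memory":    d["tpcq"] + d["tmemwrite"] + d["tsetup"],
    "Writeback": 2 * (d["tpcq"] + d["tmux"] + d["tRFwrite"]),
}

critical_stage = max(stage_delay, key=stage_delay.get)
tc = stage_delay[critical_stage]
print(stage_delay, critical_stage, tc)
```

With these delays the Decode path is the longest, which is why the slide evaluates only that term.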
Pipelined Performance Example
Program with 100 billion instructions
Execution Time = (# instructions) × CPI × Tc
= (100 × 10^9)(1.15)(550 × 10^-12)
= 63 seconds

Processor Performance Comparison
Processor      Execution Time (seconds)   Speedup (single-cycle as baseline)
Single-cycle   92.5                       1
Multicycle     133                        0.70
Pipelined      63                         1.47
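
The execution time and speedup figures can be reproduced as follows. This is a small Python check that uses the pipelined CPI and cycle time from the preceding slides and takes the single-cycle and multicycle times from the table as given.

```python
# Execution time = (# instructions) x CPI x Tc, and speedup vs. single-cycle.

instructions = 100e9   # 100 billion instructions
cpi = 1.15             # average CPI of the pipelined processor
tc = 550e-12           # cycle time in seconds (550 ps)

pipelined_time = instructions * cpi * tc
print(f"pipelined execution time = {pipelined_time:.2f} s")

# Times as listed in the table (seconds).
times = {"Single-cycle": 92.5, "Multicycle": 133.0, "Pipelined": 63.0}
speedup = {name: times["Single-cycle"] / t for name, t in times.items()}
for name, s in speedup.items():
    print(f"{name}: {s:.2f}")
```

The unrounded pipelined time is 63.25 s; the table uses the rounded value of 63 s when computing the speedup.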

Review: Exceptions
• Unscheduled function call to exception handler
• Caused by:
– Hardware, also called an interrupt, e.g., keyboard
– Software, also called a trap, e.g., undefined instruction
• When an exception occurs, the processor:
– Records the cause of the exception (Cause register)
– Jumps to the exception handler (0x80000180)
– Returns to the program (EPC register)

Example Exception
[Figure: not recoverable from the extracted text]

Exception Registers
• Not part of register file
– Cause
• Records cause of exception
• Coprocessor 0 register 13
– EPC (Exception PC)
• Records PC where exception occurred
• Coprocessor 0 register 14
• Move from Coprocessor 0
– mfc0 $t0, Cause
– Moves contents of Cause into $t0

mfc0 encoding:
Bits    31:26    25:21   20:16     15:11        10:0
Field   010000   00000   $t0 (8)   Cause (13)   00000000000
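
The field layout above can be packed into a 32-bit machine word with a short Python sketch. The helper name encode_mfc0 is hypothetical; the field values are the ones shown for mfc0 $t0, Cause.

```python
# Pack the mfc0 fields into a 32-bit instruction word.

def encode_mfc0(rt, rd):
    """rt: destination general-purpose register, rd: coprocessor 0 register."""
    opcode = 0b010000      # bits 31:26
    field_25_21 = 0b00000  # bits 25:21
    low_bits = 0           # bits 10:0
    return (opcode << 26) | (field_25_21 << 21) | (rt << 16) | (rd << 11) | low_bits

T0 = 8       # $t0
CAUSE = 13   # Coprocessor 0 register 13
EPC = 14     # Coprocessor 0 register 14

word = encode_mfc0(T0, CAUSE)
print(f"mfc0 $t0, Cause -> 0x{word:08X}")
```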


Exception Causes
Exception                  Cause
Hardware Interrupt         0x00000000
System Call                0x00000020
Breakpoint / Divide by 0   0x00000024
Undefined Instruction      0x00000028
Arithmetic Overflow        0x00000030

Extend multicycle MIPS processor to handle last two types of exceptions

Exception Hardware: EPC & Cause
[Figure: multicycle datapath extended with EPC and Cause registers. EPC is enabled by EPCWrite. Cause is enabled by CauseWrite, with IntCause selecting the value written: 0x30 (IntCause = 0, overflow) or 0x28 (IntCause = 1, undefined instruction). The PCSrc multiplexer gains input 11 = 0x80000180, the exception handler address, and the ALU provides an Overflow output.]

Control FSM with Exceptions
[Figure: multicycle control FSM (S0: Fetch through S11: Jump) extended with three states:]
• S12: Undefined (from S1: Decode when Op = others): PCSrc = 11, PCWrite, IntCause = 1, CauseWrite, EPCWrite
• S13: Overflow (entered when the ALU signals Overflow): PCSrc = 11, PCWrite, IntCause = 0, CauseWrite, EPCWrite
• S14: MFC0 (from S1: Decode when Op = mfc0): RegDst = 0, MemtoReg = 10, RegWrite

Exception Hardware: mfc0
[Figure: multicycle datapath with mfc0 support. A multiplexer selected by Instr 15:11 chooses the Coprocessor 0 register (01101 = Cause, 01110 = EPC) as C0, which feeds a new input 10 of the MemtoReg multiplexer. MemtoReg is widened to two bits (MemtoReg1:0).]

Advanced Microarchitecture
• Deep Pipelining
• Branch Prediction
• Superscalar Processors
• Out of Order Processors
• Register Renaming
• SIMD
• Multithreading
• Multiprocessors

Deep Pipelining
• 10-20 stages typical
• Number of stages limited by:
– Pipeline hazards
– Sequencing overhead
– Power
– Cost

Branch Prediction
• Ideal pipelined processor: CPI = 1
• Branch misprediction increases CPI
• Static branch prediction:
– Check direction of branch (forward or backward)
– If backward, predict taken
– Else, predict not taken
• Dynamic branch prediction:
– Keep history of last (several hundred) branches in branch target buffer, record:
• Branch destination
• Whether branch was taken

Branch Prediction Example
add $s1, $0, $0 # sum = 0
add $s0, $0, $0 # i = 0
addi $t0, $0, 10 # $t0 = 10
for:
beq $s0, $t0, done # if i == 10, branch
add $s1, $s1, $s0 # sum = sum + i
addi $s0, $s0, 1 # increment i
j for
done:

1-Bit Branch Predictor
• Remembers whether branch was taken the last time and does the same thing
• Mispredicts first and last branch of loop

2-Bit Branch Predictor
[Figure: state transition diagram with four states: strongly taken (predict taken), weakly taken (predict taken), weakly not taken (predict not taken), strongly not taken (predict not taken). Each taken branch moves one state toward strongly taken; each not-taken branch moves one state toward strongly not taken.]

Only mispredicts last branch of loop
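
The difference between the two predictors can be illustrated with a small Python simulation of the beq at the top of the example loop, which is not taken for ten iterations and taken once on exit. The sketch assumes the loop is executed twice and counts mispredictions during the second execution, after the predictors have warmed up. The function names and the integer encoding of the 2-bit states are illustrative.

```python
# Simulate 1-bit and 2-bit predictors on a single branch.

def run_one_bit(outcomes, last_taken=False):
    """Predict the same direction as the previous outcome."""
    misses = 0
    for taken in outcomes:
        if last_taken != taken:
            misses += 1
        last_taken = taken
    return misses, last_taken

def run_two_bit(outcomes, state=0):
    """state: 0 strongly not taken, 1 weakly not taken, 2 weakly taken, 3 strongly taken."""
    misses = 0
    for taken in outcomes:
        if (state >= 2) != taken:
            misses += 1
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses, state

loop = [False] * 10 + [True]   # beq: not taken 10 times, then taken to exit

_, one_bit_state = run_one_bit(loop)                  # first execution (warm-up)
one_bit_misses, _ = run_one_bit(loop, one_bit_state)  # second execution

_, two_bit_state = run_two_bit(loop)
two_bit_misses, _ = run_two_bit(loop, two_bit_state)

print(one_bit_misses, two_bit_misses)
```

In this sketch the 1-bit predictor misses the first and last branch of the second execution, while the 2-bit predictor misses only the last.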

Superscalar
• Multiple copies of datapath execute multiple instructions at once
• Dependencies make it tricky to issue multiple instructions at once
[Figure: two-way superscalar datapath with an instruction memory that fetches two instructions per cycle, a six-ported register file (four read ports, two write ports), two ALUs, and a two-ported data memory.]

Superscalar Example
lw  $t0, 40($s0)
add $t1, $s1, $s2
sub $t2, $s1, $s3      Ideal IPC: 2
and $t3, $s3, $s4      Actual IPC: 2
or  $t4, $s1, $s5
sw  $s5, 80($s0)
[Figure: pipeline timing diagram, cycles 1 to 8. The six instructions have no dependencies and issue in pairs (lw/add, sub/and, or/sw) in three consecutive cycles.]

Superscalar with Dependencies
lw  $t0, 40($s0)
add $t1, $t0, $s1
sub $t0, $s2, $s3      Ideal IPC: 2
and $t2, $s4, $t0      Actual IPC: 6/5 = 1.2
or  $t3, $s5, $s6
sw  $s7, 80($t3)
[Figure: pipeline timing diagram, cycles 1 to 9. Dependencies on $t0 and $t3 force stalls, so the six instructions need five issue cycles.]

Out of Order Processor
• Looks ahead across multiple instructions
• Issues as many instructions as possible at once
• Issues instructions out of order (as long as no dependencies)
• Dependencies:
– RAW (read after write): one instruction writes, later instruction reads a register
– WAR (write after read): one instruction reads, later instruction writes a register
– WAW (write after write): one instruction writes, later instruction writes a register
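
The three dependency types can be detected mechanically from the destination and source registers of two instructions. The Python sketch below is a simplified illustration: the (dest, sources) tuple format is an assumption, and the check compares one pair at a time without considering instructions in between.

```python
# Classify register dependencies between an earlier and a later instruction.

def classify(earlier, later):
    """Each instruction is (dest, sources); dest is None if nothing is written."""
    e_dest, e_sources = earlier
    l_dest, l_sources = later
    deps = set()
    if e_dest is not None and e_dest in l_sources:
        deps.add("RAW")   # earlier writes, later reads
    if l_dest is not None and l_dest in e_sources:
        deps.add("WAR")   # earlier reads, later writes
    if e_dest is not None and e_dest == l_dest:
        deps.add("WAW")   # both write the same register
    return deps

lw_t0  = ("$t0", ("$s0",))        # lw  $t0, 40($s0)
add_t1 = ("$t1", ("$t0", "$s1"))  # add $t1, $t0, $s1
sub_t0 = ("$t0", ("$s2", "$s3"))  # sub $t0, $s2, $s3

print(classify(lw_t0, add_t1), classify(add_t1, sub_t0), classify(lw_t0, sub_t0))
```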

Out of Order Processor
• Instruction level parallelism (ILP): number of instructions that can be issued simultaneously (average < 3)
• Scoreboard: table that keeps track of:
– Instructions waiting to issue
– Available functional units
– Dependencies

Out of Order Processor Example
lw  $t0, 40($s0)
add $t1, $t0, $s1
sub $t0, $s2, $s3      Ideal IPC: 2
and $t2, $s4, $t0      Actual IPC: 6/4 = 1.5
or  $t3, $s5, $s6
sw  $s7, 80($t3)
[Figure: pipeline timing diagram, cycles 1 to 8. Issue order: cycle 1, lw and or; cycle 2, sw (RAW on $t3); cycle 3, add (RAW on $t0, two-cycle latency between load and use of $t0) and sub (WAR on $t0); cycle 4, and (RAW on $t0).]
IM

Register Renaming
lw  $t0, 40($s0)
add $t1, $t0, $s1
sub $t0, $s2, $s3      Ideal IPC: 2
and $t2, $s4, $t0      Actual IPC: 6/3 = 2
or  $t3, $s5, $s6
sw  $s7, 80($t3)
[Figure: pipeline timing diagram, cycles 1 to 7. The destination of sub is renamed from $t0 to $r0, and and reads $r0. Issue order: cycle 1, lw and sub $r0, $s2, $s3; cycle 2, and $t2, $s4, $r0 (RAW on $r0) and or; cycle 3, add (RAW on $t0, 2-cycle load latency) and sw (RAW on $t3).]

SIMD
• Single Instruction Multiple Data (SIMD)
– Single instruction acts on multiple pieces of data at once
– Common application: graphics
– Perform short arithmetic operations (also called packed arithmetic)
• For example, add four 8-bit elements
padd8 $s2, $s0, $s1

Bit position   31:24     23:16     15:8      7:0
$s0            a3        a2        a1        a0
$s1            b3        b2        b1        b0
$s2            a3 + b3   a2 + b2   a1 + b1   a0 + b0
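
A packed 8-bit add can be modeled in Python on 32-bit integers. This sketch assumes each 8-bit lane wraps around modulo 256 and that carries never cross lane boundaries; the slide does not specify overflow behavior, so that choice is an assumption.

```python
# Behavioral model of padd8: four independent 8-bit additions in one 32-bit word.

def padd8(a, b):
    result = 0
    for lane in range(4):
        shift = 8 * lane
        lane_sum = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)
        # Keep only the low 8 bits so a carry does not spill into the next lane.
        result |= (lane_sum & 0xFF) << shift
    return result

s0 = 0x01020304   # a3 a2 a1 a0
s1 = 0x10203040   # b3 b2 b1 b0
s2 = padd8(s0, s1)
print(f"0x{s2:08X}")
```

An ordinary 32-bit add would let a carry out of one byte corrupt the next, which is what the per-lane masking prevents.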

Advanced Architecture Techniques
• Multithreading
– Word processor: threads for typing, spell checking, printing
• Multiprocessors
– Multiple processors (cores) on a single chip

Threading: Definitions
• Process: program running on a computer
– Multiple processes can run at once: e.g., surfing the Web, playing music, writing a paper
• Thread: part of a program
– Each process has multiple threads: e.g., a word processor may have threads for typing, spell checking, printing

Threads in Conventional Processor
• One thread runs at once
• When one thread stalls (for example, waiting for memory):
– Architectural state of that thread stored
– Architectural state of waiting thread loaded into processor and it runs
– Called context switching
• Appears to user like all threads running simultaneously

Multithreading
• Multiple copies of architectural state
• Multiple threads active at once:
– When one thread stalls, another runs immediately
– If one thread can’t keep all execution units busy, another thread can use them
• Does not increase instruction-level parallelism (ILP) of single thread, but increases throughput
• Intel calls this “hyperthreading”

Multiprocessors
• Multiple processors (cores) with a method of communication between them
• Types:
– Homogeneous: multiple cores with shared memory
– Heterogeneous: separate cores for different tasks (for example, DSP and CPU in cell phone)
– Clusters: each core has own memory system

Other Resources
• Hennessy & Patterson: Computer Architecture: A Quantitative Approach
• Conferences:
– www.cs.wisc.edu/~arch/www/
– ISCA (International Symposium on Computer Architecture)
– HPCA (International Symposium on High Performance Computer Architecture)
