
Slide 5

The lecture covers the design of a single-cycle processor, focusing on the RISC-V architecture, which includes components like the datapath and control logic. It outlines the instruction execution process, detailing the five stages: Fetch, Decode, Execute, Memory, and Write Back, while emphasizing the requirements and block diagrams for each stage. The lecture also discusses the implementation of specific instruction formats and the role of the ALU in processing arithmetic and logical operations.

ELT3047 Computer Architecture

Lecture 5: Single cycle processor design

Hoang Gia Hung


Faculty of Electronics and Telecommunications
University of Engineering and Technology, VNU Hanoi
Today’s lecture overview

❑ A single-core processor consists of


▪ Datapath: HW elements that process data,
e.g. perform the arithmetic, logical & memory
operations.
▪ Control: HW elements that tell the datapath,
memory & I/O devices what to do according
to program instructions.

❑ Building two RISC-V implementations


➢ Single cycle processor (starting this week)
➢ Pipelined processor (later)

❑ Only a simplified RISC-V ISA subset


➢ Memory reference: lw, sw
➢ Arithmetic/logical: add, sub, ori
➢ Control transfer: beq
Instruction execution in a single-cycle processor
❑ Any instruction must be executed in exactly one single clock
cycle, which comprises 5 sequential phases.
➢ Example: add t3, t1, t2 vs lw t3, 20(t1)

Phase                    add t3, t1, t2          lw t3, 20(t1)
Fetch                    Read inst. at [PC]      Read inst. at [PC]
Decode & Operand Fetch   o Addition              o Load word
                         o Read [t1] as opr1     o Read [t1] as opr1
                         o Read [t2] as opr2     o Use 20 as opr2
ALU                      Result = opr1 + opr2    MemAddr = opr1 + opr2
Memory Access            -                       Use MemAddr to read from memory
Result Write (WB)        Result stored in t3     Memory data stored in t3

All phases complete within one clock period (Clk), after which the next instruction starts.
Steps to design a datapath

[Figure: the five datapath stages, contents still to be designed, all inside one clock cycle (Clk): 1. Instruction Fetch, 2. Decode, 3. Execute, 4. Memory, 5. Write back]

❑ We will build a lite RISC-V datapath incrementally:


1. Look at each stage closely, figure out the requirements and processes.
2. Sketch a high-level block diagram, then zoom in on each element.
3. With the simple starting design, check whether different types of instructions
can be handled, and add modifications when needed.
A prelude to control
[Figure: the five stages (1. Instruction Fetch, 2. Decode, 3. Execute, 4. Memory, 5. Write back), each still to be designed, all driven by a common Control Logic block]

❑ Not all instructions need all 5 stages → the control logic selects
“needed” datapath lines based on the instruction.
➢ MUX selector, ALU op selector, write enable, etc.
Fetch Stage: Requirements
❑ Instruction Fetch Stage:
1. Use the Program Counter (PC) to fetch the instruction from memory
▪ PC is implemented as a special register in the processor
2. Increment the PC by 4 to get the address of the next instruction:
▪ How do we know the next instruction is at PC+4?
▪ Note the exception when branch/jump instruction is executed

❑ Output to the next stage (Decode):


➢ The instruction to be executed
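
The fetch requirements above can be sketched in a few lines of Python. This is only an illustrative model, not the lecture's implementation: the function name `fetch`, the dictionary-based instruction memory and the `pc_sel`/`alu_out` parameters are assumptions made for this example.

```python
def fetch(pc, imem, pc_sel=0, alu_out=0):
    """Model one fetch: return (instruction, next_pc).

    imem is a hypothetical instruction memory: byte address -> 32-bit word.
    pc_sel = 0 selects PC+4; pc_sel = 1 selects a branch/jump target
    supplied by a later stage (alu_out).
    """
    inst = imem[pc]                       # use the PC to read the instruction
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF     # every instruction is 4 bytes long
    next_pc = alu_out if pc_sel else pc_plus_4
    return inst, next_pc


# Two arbitrary instruction words placed at addresses 100 and 104.
imem = {100: 0x01498933, 104: 0xFCE08793}
inst, next_pc = fetch(100, imem)
```

The next instruction sits at PC+4 because each instruction in this subset occupies one 32-bit (4-byte) word; the `pc_sel` path models the branch/jump exception noted above.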

1. Fetch
2. Decode
3. ALU
4. Memory
5. RegWrite
Fetch Stage: Block diagram

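[Figure: Fetch stage block diagram. The PC (a 32-bit register) supplies the address to the instruction memory; an adder increments the PC by 4 for the next instruction; the fetched instruction goes to the Decode stage.]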
Zoomed-in element: PC register
❑ It seems that we’re reading and updating the PC at the same time!
➢ How can it work properly during a single cycle?

❑ Magic of clock
➢ PC is read during the first half of the clock period and it is updated with PC+4
at the next rising clock edge.
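[Figure: timing diagram of Clk, the PC register, the +4 adder (Add) and the instruction memory. While the PC holds 100, the PC input (In) settles to 104; at the next rising clock edge the PC becomes 104 and In settles to 108.]

Flip-flop timing:
▪ t_setup: time that D must not change before the rising clock edge (↑)
▪ t_clk-to-Q: delay after ↑ until D appears at Q
▪ t_add: delay at the adder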

Zoomed-in element: Instruction Memory

❑ Idealized memory
➢ One input bus: Address (32 bits)
➢ One output bus: Data Out (32 bits)
❑ Memory word is found by
➢ Address selects the word to put on Data Out
➢ The word must have been written to the memory prior to instruction fetch.
➢ During instruction fetch operation, the memory behaves as a combinational
logic block: Address valid → Data Out valid after “access time”.

❑ Note: in practice, there must be more inputs but they are not
used during instruction fetch.
➢ E.g. Data In, Clock, Write Enable were used to write the instructions to
the memory (prior to instruction fetch).
Decode Stage: Requirements
❑ Instruction Decode Stage:
➢ Gather data from the instruction fields:
1. Read the opcode to determine instruction type and field lengths
2. Read data from general purpose registers (in the register file)
▪ Can be two (e.g. add), one (e.g. addi) or zero (e.g. auipc)

❑ Input from previous stage (Fetch):


➢ The instruction to be executed

❑ Output to the next stage (ALU):


➢ Operation and the necessary operands
1. Fetch
2. Decode
3. ALU
4. Memory
5. RegWrite
Decode Stage: Block Diagram
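[Figure: Decode stage block diagram. The instruction (Inst.) from the Fetch stage supplies three 5-bit register numbers (Read register A, Read register B, Write register) to the register file, a collection of registers. The register file outputs two 32-bit operands (Read data A, Read data B), which go to the ALU stage together with the operation.]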
Zoomed-in element: Register File
❑ A collection of 32 registers:
➢ Two 32-bit output busses: busA and busB
➢ One 32-bit input bus: busW

[Figure: register file with 5-bit register-number inputs RA, RB and RW, 32-bit data outputs BusA and BusB, 32-bit data input BusW, plus Clk and Write Enable inputs.]

❑ Register is selected by:

➢ RA (number) selects the register to put on busA (data)


➢ RB (number) selects the register to put on busB (data)
➢ RW (number) selects the register to be written via busW (data) when Write
Enable is 1.

❑ Clock input (CLK)


➢ The CLK input matters ONLY during a write operation.
➢ During a read operation, the register file behaves as a combinational logic block: RA/RB valid →
busA/busB valid after “access time”
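
The read/write behaviour described above can be modelled roughly as follows. This is a sketch under stated assumptions: the class name `RegisterFile` and its methods are hypothetical, and the hard-wiring of register 0 to zero follows the RISC-V ISA rather than anything stated on this slide.

```python
class RegisterFile:
    """Hypothetical model of a 32 x 32-bit register file."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, ra, rb):
        # Combinational read: no clock involved, RA/RB select busA/busB.
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw, bus_w, write_enable):
        # Clocked write: busW is stored into register RW only when
        # Write Enable is 1. Assumption from the RISC-V ISA (not this
        # slide): register 0 always reads as zero, so writes to it are
        # ignored.
        if write_enable and rw != 0:
            self.regs[rw] = bus_w & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(rw=18, bus_w=7, write_enable=1)   # written
rf.clock_edge(rw=19, bus_w=5, write_enable=0)   # ignored, enable is 0
bus_a, bus_b = rf.read(18, 19)
```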
Decode Stage: R-Format Instruction
add x18, x19, x20
Notation: Inst[Y:X] = bits X to Y in Instruction

Field    Bits          Value     Use
funct7   Inst[31:25]   0000000
rs2      Inst[24:20]   10100     AddrB: DataB (BusB) = content of register x20
rs1      Inst[19:15]   10011     AddrA: DataA (BusA) = content of register x19
funct3   Inst[14:12]   000
rd       Inst[11:7]    10010     AddrD: register x18
opcode   Inst[6:0]     0110011

DataD (BusW): result to be stored into register x18 (produced by later
stage)
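
As a rough illustration of the Inst[Y:X] notation, the sketch below slices the fields out of the 32-bit encoding of add x18, x19, x20. The helper names `bits` and `decode_r` are made up for this example; the bit pattern is the one shown on the slide.

```python
def bits(inst, hi, lo):
    """Return Inst[hi:lo], i.e. bits lo to hi of the instruction."""
    return (inst >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_r(inst):
    """Split an R-format instruction into its fields."""
    return {
        "funct7": bits(inst, 31, 25),
        "rs2":    bits(inst, 24, 20),   # AddrB
        "rs1":    bits(inst, 19, 15),   # AddrA
        "funct3": bits(inst, 14, 12),
        "rd":     bits(inst, 11, 7),    # AddrD
        "opcode": bits(inst, 6, 0),
    }


# add x18, x19, x20: funct7 | rs2 | rs1 | funct3 | rd | opcode
ADD_X18_X19_X20 = 0b0000000_10100_10011_000_10010_0110011
fields = decode_r(ADD_X18_X19_X20)
```

Here `fields["rs1"]`, `fields["rs2"]` and `fields["rd"]` come out as 19, 20 and 18, the three register numbers fed to the register file.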
Decode Stage: I-Format Instruction
addi x15, x1, -50

Field       Bits          Value          Use
imm[11:0]   Inst[31:20]   111111001110   the immediate (-50)
rs1         Inst[19:15]   00001          AddrA: DataA (BusA) = content of register x1
funct3      Inst[14:12]   000
rd          Inst[11:7]    01111          AddrD: register x15
opcode      Inst[6:0]     0010011

DataD (BusW): result to be stored into register x15 (produced by later
stage)

Problem: Inst[24:20] still drives AddrB, but the RB data is an
immediate value, not from a register!
Adding addi to datapath

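[Figure: datapath for addi. The pc addresses IMEM and is incremented by 4; inst[11:7], inst[19:15] and inst[24:20] drive AddrD, AddrA and AddrB of Reg[]; inst[31:20] feeds the Imm. Gen, which outputs imm[31:0]; a MUX selects Reg[rs2] (0) or imm[31:0] (1) as the second ALU operand; Reg[rs1] is the first ALU operand; the ALU output (alu) returns to DataD.]

Control signals: ImmSel=I, BSel=1, ALUSel=Add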

❑ Decoding problem for addi is completely solved at ALU stage


➢ Decoding stage: copy inst[31:20] to the low 12 bits of the immediate, then sign-
extend it by filling the upper 20 bits of the immediate with inst[31].
➢ ALU stage: use a MUX prior to the ALU to select busB/immediate operand.
➢ Note: this set-up also works for all other I-format arithmetic instructions
(sltiu,andi,ori, …) just by changing the control signal ALUSel.
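
The two steps above (sign-extension in the decode stage, operand selection in front of the ALU) might look like this in a small Python model. The function names are assumptions for illustration; the instruction word is the addi x15, x1, -50 encoding from the previous slide.

```python
def sign_extend(value, width):
    """Interpret the low `width` bits of value as a two's-complement number."""
    sign = 1 << (width - 1)
    return (value & (sign - 1)) - (value & sign)


def imm_i(inst):
    """I-type immediate: inst[31:20], sign-extended using inst[31]."""
    return sign_extend((inst >> 20) & 0xFFF, 12)


def alu_operand_b(reg_rs2, imm, b_sel):
    """MUX in front of the ALU: BSel=0 picks busB, BSel=1 picks the immediate."""
    return imm if b_sel else reg_rs2


def addi_result(inst, reg_rs1, reg_rs2=0):
    """ALU stage for addi: ImmSel=I, BSel=1, ALUSel=Add."""
    opr2 = alu_operand_b(reg_rs2, imm_i(inst), b_sel=1)
    return (reg_rs1 + opr2) & 0xFFFFFFFF


# addi x15, x1, -50: imm[11:0] | rs1 | funct3 | rd | opcode
ADDI_X15_X1_M50 = 0b111111001110_00001_000_01111_0010011
result = addi_result(ADDI_X15_X1_M50, reg_rs1=100)
```

With x1 holding 100, the sketch yields 50, since the 12-bit pattern 111111001110 sign-extends to -50.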
I- & S-type Immediate Generator

Instruction formats:

Bits      31..25      24..20     19..15   14..12   11..7       6..0
I-type    imm[11:0] (bits 31..20) rs1      funct3   rd          I-opcode
S-type    imm[11:5]   rs2        rs1      funct3   imm[4:0]    S-opcode

Generated 32-bit immediate:

Bits      31..11                     10..5         4..0
I         inst[31] (sign-extension)  inst[30:25]   inst[24:20]
S         inst[31] (sign-extension)  inst[30:25]   inst[11:7]

❑ Immediates are decoded differently for I-type and S-type instr’s.


➢ Just need a 5-bit mux to select between two positions where low five bits of
immediate can reside in instruction.
➢ Other bits in immediate are wired to fixed positions in instruction.
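
A possible software model of this immediate generator is sketched below; the function name `imm_gen_is` and the string-valued `imm_sel` are assumptions for this example. The test words are the standard RV32I encodings of lw x14, 8(x2) and sw x14, 8(x2), the two instructions used later in the lecture.

```python
def imm_gen_is(inst, imm_sel):
    """Generate the signed immediate for I-type ('I') or S-type ('S')."""
    sign = (inst >> 31) & 1            # inst[31] fills imm[31:11]
    upper = (inst >> 25) & 0x3F        # inst[30:25] -> imm[10:5], fixed wiring
    # The only multiplexed part: where the low five bits come from.
    if imm_sel == "S":
        low5 = (inst >> 7) & 0x1F      # inst[11:7]
    else:
        low5 = (inst >> 20) & 0x1F     # inst[24:20]
    imm = (upper << 5) | low5          # imm[10:0]
    return imm - (sign << 11)          # apply the sign bit (weight -2048)


LW_X14_8_X2 = 0x00812703   # lw x14, 8(x2)
SW_X14_8_X2 = 0x00E12423   # sw x14, 8(x2)
lw_imm = imm_gen_is(LW_X14_8_X2, "I")
sw_imm = imm_gen_is(SW_X14_8_X2, "S")
```

Both calls return 8, even though the low five immediate bits live in different instruction fields.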
ALU Stage: Requirements
❑ Instruction ALU Stage:
➢ ALU = Arithmetic-Logic Unit
➢ Also called the Execution stage
➢ Perform the real work for most instructions here
▪ Arithmetic (e.g. add, sub), Shifting (e.g. sll), Logical (e.g. and, or)
▪ Memory operation (e.g. lw, sw): Address calculation
▪ Branch operation (e.g. bne, beq): Perform register comparison and
target address calculation

❑ Input from previous stage (Decode):


➢ Operation and Operand(s)
❑ Output to the next stage (Memory):
➢ Calculation result
1. Fetch
2. Decode
3. ALU
4. Memory
5. RegWrite
ALU Stage: Block Diagram

[Figure: ALU stage block diagram. The Decode stage supplies the operands and the operation to the ALU (logic to perform arithmetic and logical operations); the ALU result goes to the Memory stage.]
Element: Arithmetic Logic Unit
❑ ALU (Arithmetic Logic Unit)
➢ Combinational logic to implement arithmetic and logical operations

❑ Inputs:
➢ Two 32-bit numbers (A and B)

❑ Control:
➢ 4-bit ALUSel to decide the particular operation

❑ Output:
➢ 32-bit result of the arithmetic/logical operation (A op B)

ALUSel   Function
0000     AND
0001     OR
0010     add
0110     subtract
0111     slt
1100     NOR
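
The ALUSel table can be turned into a small behavioural model. The sketch below is illustrative only: the function names are invented, results are truncated to 32 bits, and slt is assumed to be a signed comparison as in the RISC-V ISA.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Reinterpret a 32-bit pattern as a signed two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(a, b, alu_sel):
    """Combinational ALU following the ALUSel encoding in the table above."""
    a &= MASK32
    b &= MASK32
    if alu_sel == 0b0000:
        result = a & b                                # AND
    elif alu_sel == 0b0001:
        result = a | b                                # OR
    elif alu_sel == 0b0010:
        result = a + b                                # add
    elif alu_sel == 0b0110:
        result = a - b                                # subtract
    elif alu_sel == 0b0111:
        result = int(to_signed(a) < to_signed(b))     # slt (signed, assumed)
    elif alu_sel == 0b1100:
        result = ~(a | b)                             # NOR
    else:
        raise ValueError("ALUSel code not in the table")
    return result & MASK32                            # keep 32 bits


total = alu(5, 3, 0b0010)
```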
ALU Stage: Branch Instructions
❑ Branch instruction is harder as we need to perform two
calculations
❑ Example: "beq x9, x0, 3"
1. Branch Outcome:
▪ Need a comparator to compare the registers
2. Branch Target Address:
▪ Use ALU to calculate the address
▪ Need PC (from Fetch Stage)
▪ Need Offset (from Decode Stage)

❑ Also need to feed the branch target address back to the fetch
stage!
Branch Comparator
❑ BrEq = 1, if A = B
❑ BrLT = 1, if A < B
❑ BrUn = 1 selects unsigned comparison for BrLT, 0 = signed
❑ BGE branch: A >= B, if !(A<B)

[Figure: Branch Comp. block with inputs A, B and BrUn, and outputs BrEq and BrLT.]
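
A minimal behavioural sketch of the comparator follows. The function names are assumptions for this example; the signal meanings are the ones listed above.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Reinterpret a 32-bit pattern as a signed two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def branch_comp(a, b, br_un):
    """Return (BrEq, BrLT) for 32-bit inputs A and B."""
    a &= MASK32
    b &= MASK32
    br_eq = int(a == b)
    if br_un:
        br_lt = int(a < b)                          # unsigned comparison
    else:
        br_lt = int(to_signed(a) < to_signed(b))    # signed comparison
    return br_eq, br_lt


def bge_taken(a, b, br_un=0):
    """BGE is derived from BrLT: A >= B exactly when !(A < B)."""
    _, br_lt = branch_comp(a, b, br_un)
    return not br_lt


signed_case = branch_comp(0xFFFFFFFF, 1, br_un=0)    # -1 < 1
unsigned_case = branch_comp(0xFFFFFFFF, 1, br_un=1)  # 4294967295 > 1
```

The same bit pattern 0xFFFFFFFF is less than 1 when treated as signed and greater when treated as unsigned, which is why BrUn is needed.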
B-type Immediate Generator

❑ Only bit inst[7] changes role in immediate between S and B


➢ Only need a single-bit 2-way mux.

❑ 12-bit immediate encodes PC-relative offset of -4096 to +4094
bytes in multiples of 2 bytes:
➢ Treat immediate as in range -2048 to +2047, then shift left by 1 bit to
multiply by 2 for branches
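
The offset range can be checked with a small decoder. This sketch assumes the standard RV32I B-type layout (inst[31] = imm[12], inst[7] = imm[11], inst[30:25] = imm[10:5], inst[11:8] = imm[4:1], imm[0] = 0); the function names and the sample instruction words are chosen for this example.

```python
def imm_b(inst):
    """Decode the signed byte offset of a B-type instruction."""
    imm = (
        (((inst >> 31) & 0x1) << 12)     # inst[31]    -> imm[12] (sign)
        | (((inst >> 7) & 0x1) << 11)    # inst[7]     -> imm[11]
        | (((inst >> 25) & 0x3F) << 5)   # inst[30:25] -> imm[10:5]
        | (((inst >> 8) & 0xF) << 1)     # inst[11:8]  -> imm[4:1]
    )                                    # imm[0] is always 0 (multiple of 2)
    return imm - (1 << 13) if imm & (1 << 12) else imm


def branch_target(pc, inst):
    """Branch target = PC + offset, computed by the ALU."""
    return (pc + imm_b(inst)) & 0xFFFFFFFF


BEQ_FWD_8 = 0x00000463    # beq x0, x0, +8 bytes
BEQ_BACK_4 = 0xFE000EE3   # beq x0, x0, -4 bytes
target = branch_target(100, BEQ_BACK_4)
```

Setting all immediate bits except the sign gives +4094, and setting only the sign bit gives -4096, matching the range stated above.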
Adding branch to the datapath

[Figure: datapath with branch support. A MUX in front of the pc selects pc+4 (0) or the ALU output alu (1); Reg[rs1] and Reg[rs2] feed the Branch Comp.; a MUX selects Reg[rs1] (0) or pc (1) as ALU operand 1; a MUX selects Reg[rs2] (0) or imm[31:0] (1) as ALU operand 2; inst[31:7] feeds the Imm. Gen.]

Control signals for a branch:
▪ PCSel = taken/not-taken: control signal to select between (PC+4) or Branch Target
▪ ImmSel = B: choose to generate the B-type immediate
▪ RegWEn = 0
▪ BrUn, BrEq, BrLT: branch comparator signals
▪ ASel = 1, BSel = 1, ALUSel = Add: choose PC as ALU opr1 and imm[31:0] as opr2, to calculate the branch target
Memory Stage: Requirements
❑ Instruction Memory Access Stage:
➢ Only the load and store instructions need to perform operation in this stage
▪ Use memory address calculated by ALU Stage
▪ Read from or write to data memory
➢ All other instructions remain idle
▪ Result from ALU Stage will pass through to be used in Register Write
stage (later in this lecture) if applicable

❑ Input from previous stage (ALU):


➢ Computation result to be used as memory address (if applicable)

❑ Output to the next stage (Register Write):


➢ Result to be stored (if applicable)
1. Fetch
2. Decode
3. ALU
4. Memory
5. RegWrite
Memory Stage: Block Diagram

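[Figure: Memory stage block diagram. The result from the ALU stage drives the 32-bit Address input of the data memory (the memory which stores data values); Write Data is a 32-bit input and Read Data is a 32-bit output that goes to the Register Write stage; MemRW selects read or write.]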
Adding lw to datapath

lw x14, 8(x2)

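[Figure: datapath with data memory. The ALU output (alu) drives the DMEM address (Addr); Reg[rs2] drives DataW; a write-back MUX selects the memory read data DataR (mem, 0) or the ALU output (alu, 1) as wb, which returns to DataD of Reg[]. The PC MUX, Branch Comp., ASel/BSel MUXes and Imm. Gen (fed by inst[31:7]) are unchanged.]

Control signals for lw: PCSel, ImmSel=I, RegWEn=1, BrUn, BrEq, BrLT, BSel=1, ASel=0, ALUSel=Add, MemRW=Read, WBSel=0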

❑ Supporting narrower loads (lh/lb) requires additional circuits.


Adding sw to datapath

sw x14, 8(x2)
Do we need any modification?

[Figure: same datapath as for lw. The ALU output (alu) drives the DMEM address (Addr) and Reg[rs2] drives DataW, so no new element is needed.]

Control signals for sw: PCSel, ImmSel=S, RegWEn=0, BrUn, BrEq, BrLT, BSel=1, ASel=0, ALUSel=Add, MemRW=Write, WBSel=*

*= “Don’t Care”
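
The memory and write-back behaviour of lw and sw can be sketched as below. This is a simplified model under stated assumptions: the data memory is a Python dictionary keyed by byte address and holding whole 32-bit words, the function names are hypothetical, and the register values (x2 = 0x1000, x14 = 42) are made up for the example.

```python
def memory_stage(dmem, alu_out, data_w, mem_rw):
    """Data memory access: the address comes from the ALU, DataW from Reg[rs2]."""
    if mem_rw == "Write":                    # sw
        dmem[alu_out] = data_w & 0xFFFFFFFF
        return None                          # nothing to write back
    return dmem.get(alu_out, 0)              # lw: DataR


def write_back(mem_data, alu_out, wb_sel):
    """Write-back MUX: WBSel=0 selects mem, WBSel=1 selects alu."""
    return alu_out if wb_sel == 1 else mem_data


dmem = {}
x2, x14 = 0x1000, 42                         # hypothetical register contents

# sw x14, 8(x2): ALU adds Reg[rs1] and the S-type immediate, MemRW=Write.
addr = (x2 + 8) & 0xFFFFFFFF
memory_stage(dmem, addr, x14, "Write")

# lw x14, 8(x2): same address calculation, MemRW=Read, WBSel=0.
loaded = write_back(memory_stage(dmem, addr, 0, "Read"), addr, wb_sel=0)
```

The store needs no new datapath element because the address still comes from the ALU and the data from Reg[rs2]; only the control signals differ from lw.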
Register Write Stage: Requirements
❑ Instruction Register Write Stage:
➢ Most instructions write the result of some computation into a register
▪ Examples: arithmetic, logical, shifts, loads, set-less-than
▪ Need destination register number and computation result
➢ Exceptions are stores, branches, jumps
▪ There are no results to be written
▪ These instructions remain idle in this stage

❑ Input from previous stage (Memory):


➢ Computation result either from memory or ALU
1. Fetch
2. Decode
3. ALU
4. Memory
5. RegWrite
Register Write Stage: Block Diagram
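[Figure: Register Write stage block diagram. The result from the Memory stage enters the register file on DataD (BusW) and is written to the register selected by the 5-bit AddrD when Write Enable is 1 at the clock edge (Clk). The read ports (AddrA, AddrB, DataA/BusA, DataB/BusB) are unchanged.]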

❑ Result Write stage has no additional element:


➢ Basically just route the correct result into register file
➢ The Write Register number (AddrD) was generated way back in the
Decode Stage
Adding jalr to datapath
jalr rd, rs1, imm

[Figure: datapath with jalr support. The write-back MUX gains a third input: mem (0), alu (1) and pc+4 (2). The ALU output (alu) feeds both the DMEM address and the PC MUX; the Control Unit reads inst[31:0].]

Control signals for jalr: PCSel=1, ImmSel=I, RegWEn=1, BrUn=*, BrEq=*, BrLT=*, BSel=1, ASel=0, ALUSel=Add, MemRW=Read, WBSel=2

Note: ASel must be 0 here, because the ALU adds Reg[rs1] (not the PC) to the immediate.

❑ Enlarging WB MUX to enable PC+4 to be written to Reg[rd]


➢ Uses same immediates as arithmetic and loads: PC = Reg[rs1] + immediate
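
A rough model of what jalr does in one cycle is given below. The function name `jalr` is an assumption for this example. Clearing bit 0 of the target follows the RISC-V ISA specification and is not shown on the slide.

```python
def jalr(pc, reg_rs1, imm):
    """Return (next_pc, value written to Reg[rd]) for jalr rd, rs1, imm."""
    # ALU: operand A is Reg[rs1], operand B is the I-type immediate, ALUSel=Add.
    alu_out = (reg_rs1 + imm) & 0xFFFFFFFF
    # PCSel=1: the ALU output becomes the next PC. The RISC-V spec also
    # clears the least-significant bit of the target (not on the slide).
    next_pc = alu_out & ~1 & 0xFFFFFFFF
    # WBSel=2: PC+4 (the return address) is written to Reg[rd].
    wb = (pc + 4) & 0xFFFFFFFF
    return next_pc, wb


next_pc, link = jalr(pc=100, reg_rs1=0x2000, imm=8)
```

With the PC at 100 and Reg[rs1] = 0x2000, execution continues at 0x2008 and the value 104 is saved in Reg[rd].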
The complete RV32I datapath

[Figure: the complete datapath. PC MUX (pc+4 / alu), pc, +4 adder, IMEM, Reg[] (AddrD = inst[11:7], AddrA = inst[19:15], AddrB = inst[24:20], DataD = wb), Imm. Gen (fed by inst[31:7], output imm[31:0]), Branch Comp., ASel and BSel MUXes, ALU, DMEM (Addr, DataW, DataR) and the 3-way write-back MUX (mem = 0, alu = 1, pc+4 = 2).]

Control Unit signals: PCSel, inst[31:0], ImmSel, RegWEn, BrUn, BrEq, BrLT, BSel, ASel, ALUSel, MemRW, WBSel

❑ Can execute any RV32I instruction in one clock cycle.


➢ The way the datapath is operated is governed by the Control Unit
➢ How do we design the Control Unit? Look forward to the next lecture ☺
