
lecture08_RISCV_Impl_2

The document covers the implementation of RISC-V single-cycle architecture, detailing CPU performance factors, components, and stages of instruction execution. It explains the roles of the datapath and control in processing instructions, as well as the design of combinational and sequential circuits. Additionally, it outlines the execution stages for various instruction types, emphasizing the importance of registers and memory access in the RISC-V architecture.

Lecture 08: RISC-V Single-Cycle Implementation

CSE 564 Computer Architecture Summer 2017

Department of Computer Science and Engineering


Yonghong Yan
yan@oakland.edu
www.secs.oakland.edu/~yan

Acknowledgements
• The notes cover Appendix C of the textbook, but we use
RISC-V instead of MIPS ISA
– Slides for general RISC ISA implementation are adapted from
lecture slides for the textbook “Computer Organization and Design,
Fifth Edition: The Hardware/Software Interface”
– Slides for RISC-V single-cycle implementation are adapted
from Computer Science 152: Computer Architecture and
Engineering, Spring 2016 by Dr. George Michelogiannakis from
UC Berkeley

Introduction
• CPU performance factors
$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Time}}{\text{Cycle}}$$
– Instruction count
• Determined by ISA and compiler
– CPI and cycle time
• Determined by CPU hardware
• Simple subset, shows most aspects
– Memory reference: lw, sw
– Arithmetic/logical: add, sub, and, or, slt
– Control transfer: beq, j
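
As a worked instance of the CPU time formula (instructions per program × cycles per instruction × time per cycle), here is a minimal Python sketch. The instruction count, CPI, and clock rate are made-up numbers for illustration and do not come from the lecture.

```python
def cpu_time(instruction_count: int, cpi: float, cycle_time_s: float) -> float:
    """CPU time = instructions/program * cycles/instruction * seconds/cycle."""
    return instruction_count * cpi * cycle_time_s

# Hypothetical program: 1 million instructions, CPI of 1 (single-cycle), 100 MHz clock.
cycle_time = 1 / 100e6              # 10 ns per cycle
t = cpu_time(1_000_000, 1.0, cycle_time)
print(f"{t * 1e3:.1f} ms")          # prints 10.0 ms
```

In a single-cycle design CPI is fixed at 1, so the only hardware lever left in this formula is the cycle time.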
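Components of a Computer
[Figure: the Processor (Control, plus a Datapath containing the PC, Registers, and the Arithmetic & Logic Unit (ALU)) connects to Memory (Program and Data bytes) through the Processor-Memory Interface signals Enable?, Read/Write, Address, Write Data, and Read Data; Input and Output connect through the I/O-Memory Interfaces.]
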
The CPU
• Processor (CPU): the active part of the computer that
does all the work (data manipulation and decision-making)
• Datapath: portion of the processor that contains
hardware necessary to perform operations required by
the processor (the brawn)
• Control: portion of the processor (also in hardware) that
tells the datapath what needs to be done (the brain)

Datapath and Control
• Datapath designed to support data transfers required by
instructions
• Controller causes correct transfers to happen
[Figure: datapath with PC, +4 adder, instruction memory, registers (rs, rt, rd), imm, ALU, and data memory; the Controller takes opcode and funct from the instruction.]
Logic Design Basics
• Information encoded in binary
– Low voltage = 0, High voltage = 1
– One wire per bit
– Multi-bit data encoded on multi-wire buses
• Combinational circuit
– Operate on data
– Output is a function of input
• State (sequential) circuit
– Store information
Combinational Circuits
• AND-gate
– Y = A & B
• Adder
– Y = A + B
• Multiplexer
– Y = S ? I1 : I0
• Arithmetic/Logic Unit
– Y = F(A, B)
[Figure: symbols for the AND gate (inputs A, B; output Y), adder (A, B; Y), multiplexer (I0, I1, select S; Y), and ALU (A, B, function select F; Y).]
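
The four combinational elements on this slide can be sketched as pure functions: the output depends only on the current inputs, with no stored state. The 32-bit bus width and the function names below are assumptions made for the sketch, not part of the slide.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit buses

def and_gate(a: int, b: int) -> int:
    return a & b                       # Y = A & B

def adder(a: int, b: int) -> int:
    return (a + b) & MASK32            # Y = A + B, carry out of bit 31 dropped

def mux(s: int, i0: int, i1: int) -> int:
    return i1 if s else i0             # Y = S ? I1 : I0

def alu(f: str, a: int, b: int) -> int:
    # Y = F(A, B); only four illustrative functions are modelled here
    results = {"add": a + b, "sub": a - b, "and": a & b, "or": a | b}
    return results[f] & MASK32
```

Because none of these functions keeps state, calling one twice with the same inputs always gives the same output, which is the defining property of a combinational circuit.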


Sequential Circuits
• Register: stores data in a circuit
– Uses a clock signal to determine when to update the
stored value
– Edge-triggered: update when Clk changes from 0 to 1

[Figure: register symbol (D, Clk, Q) with a timing diagram of Clk, D, and Q.]
Edge-Triggered D Flip Flops
• Value of D is sampled on positive clock edge.
• Q outputs sampled value for rest of cycle.
[Figure: flip-flop symbol (D, Q) with a timing diagram of CLK and Q.]
Sequential Circuits
• Register with write control
– Only updates on clock edge when write control input is 1
– Used when stored value is required later

[Figure: register symbol (D, Clk, Write, Q) with a timing diagram of Clk, Write, D, and Q.]
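
A register with write control can be modelled as an object whose stored value changes only when a rising clock edge arrives while the write input is 1. This is a behavioural sketch; the class and method names are hypothetical.

```python
class Register:
    """Edge-triggered register with a write-control input."""

    def __init__(self, value: int = 0):
        self.q = value                  # stored value, visible on output Q

    def rising_edge(self, d: int, write: int) -> None:
        # Called once per 0 -> 1 transition of Clk.
        if write:
            self.q = d                  # update only when Write is 1

r = Register()
r.rising_edge(d=7, write=0)             # Write is 0: Q holds its old value
held = r.q
r.rising_edge(d=7, write=1)             # Write is 1: Q takes the value of D
```

Between calls to `rising_edge`, `q` never changes, which matches the slide's point that the stored value is available later in the cycle.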
Clocking Methodology
• Combinational logic transforms data during clock
cycles
– Between clock edges
– Input from state elements, output to state element
– Longest delay determines clock period
Single cycle data paths
• Processor uses synchronous logic design (a “clock”).

f         T
1 MHz     1 μs
10 MHz    100 ns
100 MHz   10 ns
1 GHz     1 ns

• All state elements act like positive edge-triggered flip flops.
[Figure: flip-flop with D, Q, clk, and Reset?]
Hardware Elements of CPU
• Combinational circuits
– Mux, Decoder, ALU, ...
– ALU OpSelect: Add, Sub, ...; And, Or, Xor, Not, ...; GT, LT, EQ, Zero, ...
[Figure: Mux (inputs A0 ... An-1, lg(n)-bit Sel, output O); Decoder (lg(n)-bit input A, outputs O0 ... On-1); ALU (inputs A and B, OpSelect, outputs Result and Comp?).]
• Synchronous state elements
– Flipflop, Register, Register file, SRAM, DRAM
[Figure: flip-flop with D, En, Clk, and Q, plus a timing diagram of Clk, D, and Q.]
– Edge-triggered: Data is sampled at the rising edge


Register Files
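• Reads are combinational
[Figure: a register built from n flip-flops (inputs D0 ... Dn-1, outputs Q0 ... Qn-1) sharing Clk and En; a 2R+1W register file (two read ports, one write port) with inputs Clock, WE (we), ReadSel1 (rs1), ReadSel2 (rs2), WriteSel (ws), WriteData (wd) and outputs ReadData1 (rd1), ReadData2 (rd2).]
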
Register File Implementation
• RISC-V integer instructions have at most 2 register source
operands
[Figure: register file built from reg 0 ... reg 31, with 5-bit selects rd, rs1, and rs2, 32-bit wdata, rdata1, and rdata2, plus clk and we.]

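
The 2R+1W register file described on these two slides can be sketched behaviourally as follows. The class and method names are hypothetical. Ignoring writes to register 0 reflects the RISC-V rule that x0 always reads as zero, which the slides do not show explicitly.

```python
class RegisterFile:
    """32 x 32-bit register file with two read ports and one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, rs1: int, rs2: int):
        # Reads are combinational: outputs follow the select inputs immediately.
        return self.regs[rs1], self.regs[rs2]

    def rising_edge(self, we: int, rd: int, wdata: int) -> None:
        # Writes happen only on the clock edge, and only when we is 1.
        # x0 is hardwired to zero in RISC-V, so writes to it are ignored.
        if we and rd != 0:
            self.regs[rd] = wdata & 0xFFFFFFFF

rf = RegisterFile()
rf.rising_edge(we=1, rd=5, wdata=42)    # written
rf.rising_edge(we=0, rd=6, wdata=99)    # ignored: write enable is 0
rf.rising_edge(we=1, rd=0, wdata=1)     # ignored: x0 stays zero
```

Two read selects and one write select per cycle are enough because, as the slide notes, integer instructions have at most two register source operands and write at most one result.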
A Simple Memory Model

[Figure: MAGIC RAM block with inputs WriteEnable, Clock, Address, and WriteData, and output ReadData.]

Reads and writes are always completed in one cycle
• a Read can be done any time (i.e. combinational)
• a Write is performed at the rising clock edge
if it is enabled
=> the write address and data
must be stable at the clock edge

Later in the course we will present a more realistic model of memory

Five Stages of Instruction Execution
• Stage 1: Instruction Fetch
• Stage 2: Instruction Decode
• Stage 3: ALU (Arithmetic-Logic Unit)
• Stage 4: Memory Access
• Stage 5: Register Write

Stages of Execution on Datapath
[Figure: the datapath (PC, +4 adder, instruction memory, registers with rs, rt, rd, imm, ALU, data memory) divided into five stages: 1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Register Write.]

Stages of Execution (1/5)
• There is a wide variety of instructions: so what general
steps do they have in common?
• Stage 1: Instruction Fetch
– The 32-bit instruction word must first be fetched from
memory
• the cache-memory hierarchy
– also, this is where we increment PC
• PC = PC + 4, to point to the next instruction: byte addressing
so + 4

Stages of Execution (2/5)
• Stage 2: Instruction Decode: gather data from the fields
(decode all necessary instruction data)
1. read the opcode to determine instruction type and field
lengths
2. read in data from all necessary registers
• for add, read two registers
• for addi, read one register
• for jal, no reads necessary

Stages of Execution (3/5)
• Stage 3: ALU (Arithmetic-Logic Unit): the real work of
most instructions is done here
– AL operations:
• arithmetic (+, -, *, /), shifting, logic (&, |), comparisons
(slt)
– loads and stores
• lw $t0, 40($t1)
• the address we are accessing in memory = the value in $t1
PLUS the value 40
• Addition is done in this stage
– Conditional branch
• Comparison is done in this stage (one solution)

Stages of Execution (4/5)
• Stage 4: Memory Access: only load and store instructions
– the others remain idle during this stage or skip it altogether
– since these instructions have a unique step, we need this extra
stage to account for them
– as a result of the cache system, this stage is expected to be
fast

Stages of Execution (5/5)
• Stage 5: Register Write
– most instructions write the result of some computation into a
register
– examples: arithmetic, logical, shifts, loads, slt
– what about stores, branches, jumps?
• don’t write anything into a register at the end
• these remain idle during this fifth stage or skip it altogether

Stages of Execution on Datapath
[Figure: the datapath (PC, +4 adder, instruction memory, registers with rs, rt, rd, imm, ALU, data memory) divided into five stages: 1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Register Write.]

Instruction Execution
• PC → instruction memory, fetch instruction
• Register numbers → register file, read registers
• Depending on instruction class
– Use ALU to calculate
• Arithmetic result
• Memory address for load/store
• Branch condition and target address
– Access data memory for load/store
– PC ← target address or PC + 4
CPU Components
Multiplexers
• Can’t just join wires
together
• Use multiplexers
Control Signals
Building a Datapath
• Datapath
– Elements that process data and addresses
in the CPU
• Registers, ALUs, muxes, memories, …
• We will build a RISC-V datapath incrementally
– Refining the overview design
Instruction Fetch
[Figure: the PC, a 32-bit register, addresses the instruction memory; an adder increments it by 4 for the next instruction.]
R-Format Instructions
• Read two register operands
• Perform arithmetic/logical operation
• Write register result
Load/Store Instructions
• Read register operands
• Calculate address using 12-bit offset
– Use ALU, but sign-extend offset
• Load: Read memory and update register
• Store: Write register value to memory
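
The address calculation for loads and stores (base register plus sign-extended 12-bit offset) can be sketched as below. The helper names are hypothetical, and a 32-bit address space is assumed.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit

def effective_address(base: int, offset12: int) -> int:
    # ALU add of the base register and the sign-extended 12-bit offset,
    # truncated to a 32-bit address (assumed width).
    return (base + sign_extend(offset12, 12)) & 0xFFFFFFFF
```

For example, `effective_address(0x1000, 40)` gives `0x1028`, while an offset field of `0xFFC` is read as -4 and gives `0x0FFC`. Without sign extension the second case would wrongly land at `0x1FFC`.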
Branch Instructions
• Read register operands
• Compare operands
– Use ALU, subtract and check Zero output
• Calculate target address
– Sign-extend displacement
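– Shift left 2 places (word displacement, as in MIPS; RISC-V branch offsets are shifted left by 1)
– Add to PC + 4 (as in MIPS; RISC-V adds the offset to the PC of the branch itself)
• PC + 4 is already calculated by instruction fetch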
Branch Instructions
[Figure: branch datapath. Callouts: the shift-left unit just re-routes wires; sign extension replicates the sign-bit wire.]
Composing the Elements
• First-cut data path does an instruction in one clock
cycle
– Each datapath element can only do one function at a time
– Hence, we need separate instruction and data memories
• Use multiplexers where alternate data sources are
used for different instructions
R-Type/Load/Store Datapath
Full Datapath
ALU Control
• ALU used for
– Load/Store: F = add
– Branch: F = subtract
– R-type: F depends on funct field

ALU control   Function
0000          AND
0001          OR
0010          add
0110          subtract
0111          set-on-less-than
1100          NOR
ALU Control
• Assume 2-bit ALUOp derived from opcode
– Combinational logic derives ALU control

opcode   ALUOp   Operation          funct    ALU function       ALU control
lw       00      load word          XXXXXX   add                0010
sw       00      store word         XXXXXX   add                0010
beq      01      branch equal       XXXXXX   subtract           0110
R-type   10      add                100000   add                0010
                 subtract           100010   subtract           0110
                 AND                100100   AND                0000
                 OR                 100101   OR                 0001
                 set-on-less-than   101010   set-on-less-than   0111
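
The table above is small enough to express directly as combinational logic. The following Python sketch copies the funct and ALU control codes from the table; the function and variable names are hypothetical.

```python
# funct -> ALU control for R-type instructions, copied from the table
R_TYPE = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # subtract
    0b100100: 0b0000,  # AND
    0b100101: 0b0001,  # OR
    0b101010: 0b0111,  # set-on-less-than
}

def alu_control(alu_op: int, funct: int = 0) -> int:
    if alu_op == 0b00:          # lw / sw: add for the address calculation
        return 0b0010
    if alu_op == 0b01:          # beq: subtract, then check Zero
        return 0b0110
    if alu_op == 0b10:          # R-type: the function depends on funct
        return R_TYPE[funct]
    raise ValueError(f"unsupported ALUOp {alu_op:02b}")
```

For ALUOp 00 and 01 the funct argument is ignored, which corresponds to the XXXXXX (don't care) entries in the table.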
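Datapath: Reg-Reg ALU Instructions
[Figure: the PC addresses the Inst. Memory and an adder adds 0x4 to the PC; Inst<19:15> and Inst<24:20> drive the GPR read selects rs1 and rs2, and Inst<11:7> drives the write address wa; read data rd1 and rd2 feed the ALU, whose result returns to wd; ALU Control takes Inst<14:12> and OpCode; RegWriteEn drives we. Question posed on the slide: RegWrite timing?]

Field   func7   rs2     rs1     func3   rd     opcode
Bits    31:25   24:20   19:15   14:12   11:7   6:0
Width   7       5       5       3       5      7

rd ← (rs1) func (rs2)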
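
The field positions in the Reg-Reg format translate directly into shift-and-mask operations. A minimal sketch follows, with a hypothetical function name; the sample word is the RV32I encoding of `add x3, x1, x2`.

```python
def decode_r_type(inst: int) -> dict:
    """Split a 32-bit instruction word into the Reg-Reg format fields."""
    return {
        "opcode": inst & 0x7F,           # Inst<6:0>
        "rd":     (inst >> 7) & 0x1F,    # Inst<11:7>
        "func3":  (inst >> 12) & 0x7,    # Inst<14:12>
        "rs1":    (inst >> 15) & 0x1F,   # Inst<19:15>
        "rs2":    (inst >> 20) & 0x1F,   # Inst<24:20>
        "func7":  (inst >> 25) & 0x7F,   # Inst<31:25>
    }

fields = decode_r_type(0x002081B3)       # add x3, x1, x2
```

In hardware no shifting actually takes place: each field is simply a fixed group of wires taken from the instruction word.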
Datapath: Reg-Imm ALU Instructions
[Figure: same structure as the Reg-Reg datapath, except that the second ALU operand comes from Imm Select, fed by Inst<31:20> and controlled by ImmSel, instead of from rd2.]

Field   immediate12   rs1     func3   rd     opcode
Bits    31:20         19:15   14:12   11:7   6:0
Width   12            5       3       5      7

rd ← (rs1) op immediate
Conflicts in Merging Datapath
[Figure: the Reg-Reg and Reg-Imm datapaths overlaid; the second ALU operand can come from rd2 or from Imm Select, so the slide's callout is "Introduce muxes".]

Format    31:25   24:20   19:15   14:12   11:7   6:0      Operation
Reg-Reg   func7   rs2     rs1     func3   rd     opcode   rd ← (rs1) func (rs2)
Reg-Imm   immediate12     rs1     func3   rd     opcode   rd ← (rs1) op immediate
Datapath for ALU Instructions
[Figure: merged datapath. <19:15>, <24:20>, and <11:7> drive rs1, rs2, and wa; Inst<31:20> feeds Imm Select (ImmSel); a mux controlled by Op2Sel (Reg / Imm) picks the second ALU operand from rd2 or the immediate; <14:12> and OpCode <6:0> drive ALU Control; RegWriteEn drives we.]

Format    31:25   24:20   19:15   14:12   11:7   6:0      Operation
Reg-Reg   func7   rs2     rs1     func3   rd     opcode   rd ← (rs1) func (rs2)
Reg-Imm   immediate12     rs1     func3   rd     opcode   rd ← (rs1) op immediate
Load/Store Instructions
[Figure: ALU-instruction datapath extended with a Data Memory (addr, wdata, rdata, we). rs1 supplies the “base”, Imm Select supplies the disp, and the ALU adds them to form addr; rd2 supplies wdata; MemWrite drives the memory we; a mux controlled by WBSel (ALU / Mem) picks the value written back to wd.]

Format   31:25   24:20   19:15   14:12   11:7   6:0      Address
Store    imm     rs2     rs1     func3   imm    opcode   (rs1) + displacement
Load     immediate12     rs1     func3   rd     opcode   (rs1) + displacement

rs1 is the base register
rd is the destination of a Load, rs2 is the data source for a Store
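
Loads carry their displacement in one contiguous field, while stores split it across two fields so that rs1 and rs2 stay in the same bit positions. The sketch below reassembles both, assuming the standard RV32I layout in which the store's upper field holds displacement bits 11:5 and the lower field holds bits 4:0. The function names are hypothetical.

```python
def _sext12(imm: int) -> int:
    # Sign-extend a 12-bit value to a Python int.
    return imm - 0x1000 if imm & 0x800 else imm

def i_type_imm(inst: int) -> int:
    """Load displacement: Inst<31:20>, sign-extended."""
    return _sext12((inst >> 20) & 0xFFF)

def s_type_imm(inst: int) -> int:
    """Store displacement: Inst<31:25> joined with Inst<11:7>, sign-extended."""
    upper = (inst >> 25) & 0x7F          # displacement bits 11:5
    lower = (inst >> 7) & 0x1F           # displacement bits 4:0
    return _sext12((upper << 5) | lower)
```

This is the selection that Imm Select performs under control of ImmSel: the same adder input sees either the I-type or the S-type displacement depending on the opcode.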
RISC-V Conditional Branches

Field   imm     rs2     rs1     func3   imm    opcode
Bits    31:25   24:20   19:15   14:12   11:7   6:0
Width   7       5       5       3       5      7

Instructions: BEQ/BNE, BLT/BGE, BLTU/BGEU

• Compare two integer registers for equality (BEQ/BNE) or
signed magnitude (BLT/BGE) or unsigned magnitude (BLTU/BGEU)
• 12-bit immediate encodes branch target address as a
signed offset from PC, in units of 16 bits (i.e., shift left by 1
then add to PC).

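
The two halves of a conditional branch, the comparison and the target calculation, can be sketched as follows. The function names are hypothetical. The sketch takes the 12-bit immediate as an already reassembled value; in the real encoding its bits are scattered across the two imm fields, and that reassembly is omitted here. Register values are assumed to be 32-bit unsigned integers.

```python
def to_signed32(x: int) -> int:
    return x - (1 << 32) if x & 0x80000000 else x

def branch_taken(op: str, a: int, b: int) -> bool:
    # a and b are 32-bit register values held as unsigned Python ints.
    conditions = {
        "BEQ":  a == b,
        "BNE":  a != b,
        "BLT":  to_signed32(a) < to_signed32(b),    # signed compare
        "BGE":  to_signed32(a) >= to_signed32(b),
        "BLTU": a < b,                              # unsigned compare
        "BGEU": a >= b,
    }
    return conditions[op]

def branch_target(pc: int, imm12: int) -> int:
    # Sign-extend the 12-bit immediate, shift left by 1 (units of 16 bits),
    # then add to the PC of the branch.
    offset = imm12 - 0x1000 if imm12 & 0x800 else imm12
    return (pc + (offset << 1)) & 0xFFFFFFFF
```

The register value `0xFFFFFFFF` shows why both signed and unsigned branches exist: it is -1 for BLT (so it is less than 1) but the largest possible value for BLTU (so it is not).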
Conditional Branches (BEQ/BNE/BLT/BGE/BLTU/BGEU)
[Figure: load/store datapath extended for branches. A second adder computes the branch target br; Br Logic compares the register operands and outputs Bcomp?; a mux controlled by PCSel picks the next PC from pc+4 or br. Control signals shown: PCSel, RegWrEn, MemWrite, WBSel, ImmSel, Op2Sel.]

Full RISC-V 1-Stage Datapath
[Figure: RISC-V Sodor 1-Stage datapath. Blocks: PC, +4 adder, Instruction Mem (addr, data), Decoder producing the Control Signals, Reg File (rs1 = ir[19:15], rs2 = ir[24:20], wa = ir[11:7], wd, en), IType and SType Sign Extend and UType immediate units, Branch, Jump, and JumpReg TargGen, Branch CondGen (br_eq?, br_lt?, br_ltu?), ALU, Data Mem (addr, wdata, rdata), co-processor (CSR) registers. Control signals: pc_sel (pc+4, branch, jump, jalr, exception), rf_wen, wb_sel, Op1Sel, Op2Sel, AluFun, mem_val, mem_rw. Stage label: Execute Stage.]
Note: for simplicity, the CSR File (control and status registers) and
associated datapath is not shown
Hardwired Control is pure Combinational Logic
[Figure: combinational logic block with inputs op code and Equal?, and outputs ImmSel, Op2Sel, FuncSel, MemWrite, WBSel, WASel, RegWriteEn, and PCSel.]

ALU Control & Immediate Extension
[Figure: Inst<14:12> (Func3) and Inst<6:0> (Opcode) feed the ALUop logic, which produces FuncSel (Func, Op, +, 0?); a Decode Map produces ImmSel (IType12, SType12, UType20).]
Hardwired Control Table

Opcode     ImmSel     Op2Sel   FuncSel   MemWr   RFWen   WBSel   WASel   PCSel
ALU        *          Reg      Func      no      yes     ALU     rd      pc+4
ALUi       IType12    Imm      Op        no      yes     ALU     rd      pc+4
LW         IType12    Imm      +         no      yes     Mem     rd      pc+4
SW         SType12    Imm      +         yes     no      *       *       pc+4
BEQtrue    SBType12   *        *         no      no      *       *       br
BEQfalse   SBType12   *        *         no      no      *       *       pc+4
J          *          *        *         no      no      *       *       jabs
JAL        *          *        *         no      yes     PC      X1      jabs
JALR       *          *        *         no      yes     PC      rd      rind

Op2Sel = Reg / Imm       WBSel = ALU / Mem / PC
WASel = rd / X1          PCSel = pc+4 / br / rind / jabs

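
Because hardwired control is pure combinational logic, the table can be modelled as a lookup with no state. The sketch below copies the rows of the table; the `control` function and the way it folds the Equal? input into the BEQ row selection are illustrative assumptions.

```python
COLUMNS = ("ImmSel", "Op2Sel", "FuncSel", "MemWr", "RFWen", "WBSel", "WASel", "PCSel")

# Rows copied from the Hardwired Control Table; "*" means don't care.
TABLE = {
    "ALU":      ("*",        "Reg", "Func", "no",  "yes", "ALU", "rd", "pc+4"),
    "ALUi":     ("IType12",  "Imm", "Op",   "no",  "yes", "ALU", "rd", "pc+4"),
    "LW":       ("IType12",  "Imm", "+",    "no",  "yes", "Mem", "rd", "pc+4"),
    "SW":       ("SType12",  "Imm", "+",    "yes", "no",  "*",   "*",  "pc+4"),
    "BEQtrue":  ("SBType12", "*",   "*",    "no",  "no",  "*",   "*",  "br"),
    "BEQfalse": ("SBType12", "*",   "*",    "no",  "no",  "*",   "*",  "pc+4"),
    "J":        ("*",        "*",   "*",    "no",  "no",  "*",   "*",  "jabs"),
    "JAL":      ("*",        "*",   "*",    "no",  "yes", "PC",  "X1", "jabs"),
    "JALR":     ("*",        "*",   "*",    "no",  "yes", "PC",  "rd", "rind"),
}

def control(opcode_class: str, equal: bool = False) -> dict:
    # BEQ is the only row pair that also depends on the Equal? input.
    if opcode_class == "BEQ":
        opcode_class = "BEQtrue" if equal else "BEQfalse"
    return dict(zip(COLUMNS, TABLE[opcode_class]))
```

For instance, `control("LW")["WBSel"]` is `"Mem"`, and `control("BEQ", equal=True)["PCSel"]` is `"br"`. A real decoder would first map the 7-bit opcode field to one of these row names, a step this sketch leaves out.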
Single-Cycle Hardwired Control
The clock period is sufficiently long for all of the following steps to
be “completed”:
1. Instruction fetch
2. Decode and register fetch
3. ALU operation
4. Data fetch if required
5. Register write-back setup time
$$t_C > t_{IFetch} + t_{RFetch} + t_{ALU} + t_{DMem} + t_{RWB}$$

At the rising edge of the following clock, the PC, register file
and memory are updated

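
To make the clock-period constraint concrete, the sketch below sums five step delays and converts the bound to a maximum clock frequency. The delay values are hypothetical and chosen only to give round numbers; they are not measurements from the lecture.

```python
# Hypothetical step delays in picoseconds (illustrative values only)
delays_ps = {"IFetch": 200, "RFetch": 100, "ALU": 200, "DMem": 200, "RWB": 100}

# The clock period must exceed the sum of all five delays.
min_period_ps = sum(delays_ps.values())

# A 1 ps period corresponds to 1e6 MHz, so f[MHz] = 1e6 / T[ps].
max_freq_mhz = 1e6 / min_period_ps
```

With these numbers the period must exceed 800 ps, so the clock must run below 1250 MHz. Every instruction pays for the full sum, even one such as add that never uses the data memory.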
Implementation in Real
• Load-Store RISC ISAs designed for efficient pipelined
implementations
– Inspired by earlier Cray machines (CDC 6600/7600)
• RISC-V ISA implemented using Chisel hardware
construction language
– Chisel: https://chisel.eecs.berkeley.edu/
– Getting started:
• https://chisel.eecs.berkeley.edu/2.2.0/getting-started.html
– Check resource page for slides and other info

Chisel in One Slide
• Module
• IO
• Wire
• Reg
• Mem

UCB RISC-V Sodor
• https://github.com/ucb-bar/riscv-sodor
– Single-cycle:
• https://github.com/ucb-bar/riscv-sodor/tree/master/src/rv32_1stage
