Datorteknik, Eitf70, Per Andersson

Introduction

DATORTEKNIK, EITF70, PER ANDERSSON


Once upon a time:
When a Computer executed a Program…
The Hardware / Software Interface
Computers were not always electronic

Charles Babbage
1791 – 1871

Ada King, Countess of Lovelace


1815 – 1852

Analytical Engine: Never built!
Mechanics of the time not sufficient
By Marcin Wichary from San Francisco, U.S.A. -
Replica from Science Museum in London
Analytical Engine Mill
Chips
AMD MI300, 13 chips stacked
The Swedish Multidisciplinary Research Center towards the
EU Chips Act
Per Andersson, LTH
Who was Moore?

Gordon Moore, 1929-2023


Intel co-founder
Prediction from 1965 (rev 1975)
Picture from Scientists You Must Know video, created by the Chemical Heritage Foundation
All curves are not alike
• Total performance and device count still increasing
• Power per chip and clock frequency are flat
• More parallelism!
• Why?
– Physical limits
– Engineering challenges
Cost of Chip Development

“3nm, IC design costs range from a staggering $500 million to $1.5 billion,
according to IBS. The $1.5 billion figure involves a complex GPU at Nvidia.”

https://semiengineering.com/big-trouble-at-3nm/
Value Creation
Example: NVIDIA H100 Tensor Core GPU, 80B transistors, custom TSMC process

Knowledge of how to efficiently control 80B transistors creates a disruptive value without owning a silicon foundry (fabless)

Average Sales Price: NVIDIA H100 $40,000
5- 30×
2

Cost of silicon wafer in TSMC 4nm: ~$15,000
One wafer gives ~60 NVIDIA H100 Tensor Core GPUs
Cost: $267/GPU + testing, packaging, royalties etc. ~$1000; total $1267 (do not quote me)
[adapted from Björn Ekelund, Ericsson]
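The arithmetic behind these numbers can be checked with a few lines of Python. This is only a rough sketch using the slide's own approximate figures; the variable names are made up for illustration. Note that $15,000 spread over 60 GPUs gives $250 per GPU, so the slide's $267 presumably reflects slightly different rounding or yield assumptions.

```python
# Approximate figures taken from the slide (all values are rough estimates).
wafer_cost_usd = 15_000        # TSMC 4nm wafer
gpus_per_wafer = 60            # H100 dies per wafer
slide_silicon_cost_usd = 267   # per-GPU silicon cost as stated on the slide
other_costs_usd = 1_000        # testing, packaging, royalties etc.
sales_price_usd = 40_000       # average sales price

# Silicon cost per GPU if the wafer cost is split evenly over 60 GPUs.
silicon_per_gpu = wafer_cost_usd / gpus_per_wafer

# Total cost and price-to-cost ratio using the slide's own per-GPU figure.
total_cost_usd = slide_silicon_cost_usd + other_costs_usd
price_to_cost = sales_price_usd / total_cost_usd

print(f"silicon per GPU: ${silicon_per_gpu:.0f}")
print(f"total cost:      ${total_cost_usd}")
print(f"price / cost:    {price_to_cost:.1f}x")
```

With these inputs the sales price is roughly 30 times the estimated production cost.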
NVIDIA's secret
Verticals are crucial:
Algorithms

Adopt / Influence
30 atom layers

Semiconductor Systems
To go through
• Canvas – the main source of information
• Course representative?
• Course under revision!!!
• Schedule and course format
• Lectures (7) – theory and examples
• Exercise sessions (4, half group) – prepare for the labs
• Laboratory sessions
– Where, when & how
Examination
For grade 3:
– At least 35 of 50 in total on the quizzes
– Three laboratory sessions, passed via the quizzes
For grade 4:
– As above, plus a passed laboratory session 4
– 45 on the quizzes or grade 4 on the exam
For grade 5:
– As above, plus grade 5 on the exam
Quizzes (duggor)
• Practice quiz (5p) + 3 quizzes (3 x 15p), max 50p
• Target: 35p to pass, 45p for grade 4
• Three attempts, with a time limit
• Questions drawn from a question catalogue
• The best result counts
• Open for 72 hours from Wednesday 8:00
– Practice quiz this week
– The others in the week after laboratory sessions 1-3
Aids
See Canvas!
Mostly web-based
Install compiler package
and simulator

Course book at the exam
What is a computer?

https://eseo-tech.github.io/emulsiV/
Inside a Computer, take #1

Stored data:
Stored program: Anything!
von Neumann
All built using Digital Primitives
Textbook: Appendix A. Read it!!!

• Combinational logic: Gates, Adders, Multiplexers


• Sequential elements: Flip-flops, Registers, Memories

[Figure: D flip-flop with inputs D and Clk and output Q]
Memory
• Ordered sequence
– 8-bit bytes (typically)
• Each with unique address
• Size vs. Access time
• GB vs. GiB

Bit cell x 8
Memory size confusion
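The GB vs. GiB confusion comes down to powers of ten versus powers of two. A minimal sketch, assuming the standard definitions 1 GB = 10^9 bytes and 1 GiB = 2^30 bytes (the helper name `gib_from_gb` is made up for this example):

```python
GB = 10**9    # gigabyte: decimal (SI) prefix
GIB = 2**30   # gibibyte: binary (IEC) prefix


def gib_from_gb(size_gb):
    """Convert a size given in GB into GiB."""
    return size_gb * GB / GIB


# A drive sold as "500 GB" holds fewer GiB than the label suggests.
print(f"1 GiB - 1 GB = {GIB - GB} bytes")
print(f"500 GB = {gib_from_gb(500):.2f} GiB")
```

The gap is about 7% at the giga level and grows with each larger prefix.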
RISC-V Instruction-Set
Cheat sheet by Erik Engheim <erik.engheim@ma.com>
https://blog.translusion.com/images/posts/RISC-V-cheatsheet-RV32I-4-3.pdf

Arithmetic Operation

Mnemonic                Instruction                        Type  Description
ADD rd, rs1, rs2        Add                                R     rd ← rs1 + rs2
SUB rd, rs1, rs2        Subtract                           R     rd ← rs1 - rs2
ADDI rd, rs1, imm12     Add immediate                      I     rd ← rs1 + imm12
SLT rd, rs1, rs2        Set less than                      R     rd ← rs1 < rs2 ? 1 : 0
SLTI rd, rs1, imm12     Set less than immediate            I     rd ← rs1 < imm12 ? 1 : 0
SLTU rd, rs1, rs2       Set less than unsigned             R     rd ← rs1 < rs2 ? 1 : 0
SLTIU rd, rs1, imm12    Set less than immediate unsigned   I     rd ← rs1 < imm12 ? 1 : 0
LUI rd, imm20           Load upper immediate               U     rd ← imm20 << 12
AUIPC rd, imm20         Add upper immediate to PC          U     rd ← PC + imm20 << 12

Logical Operations

Mnemonic                Instruction                        Type  Description
AND rd, rs1, rs2        AND                                R     rd ← rs1 & rs2
OR rd, rs1, rs2         OR                                 R     rd ← rs1 | rs2
XOR rd, rs1, rs2        XOR                                R     rd ← rs1 ^ rs2
ANDI rd, rs1, imm12     AND immediate                      I     rd ← rs1 & imm12
ORI rd, rs1, imm12      OR immediate                       I     rd ← rs1 | imm12
XORI rd, rs1, imm12     XOR immediate                      I     rd ← rs1 ^ imm12
SLL rd, rs1, rs2        Shift left logical                 R     rd ← rs1 << rs2
SRL rd, rs1, rs2        Shift right logical                R     rd ← rs1 >> rs2
SRA rd, rs1, rs2        Shift right arithmetic             R     rd ← rs1 >> rs2
SLLI rd, rs1, shamt     Shift left logical immediate       I     rd ← rs1 << shamt
SRLI rd, rs1, shamt     Shift right logical imm.           I     rd ← rs1 >> shamt
SRAI rd, rs1, shamt     Shift right arithmetic immediate   I     rd ← rs1 >> shamt

Load / Store Operations

Mnemonic                Instruction                        Type  Description
LD rd, imm12(rs1)       Load doubleword                    I     rd ← mem[rs1 + imm12]
LW rd, imm12(rs1)       Load word                          I     rd ← mem[rs1 + imm12]
LH rd, imm12(rs1)       Load halfword                      I     rd ← mem[rs1 + imm12]
LB rd, imm12(rs1)       Load byte                          I     rd ← mem[rs1 + imm12]
LWU rd, imm12(rs1)      Load word unsigned                 I     rd ← mem[rs1 + imm12]
LHU rd, imm12(rs1)      Load halfword unsigned             I     rd ← mem[rs1 + imm12]
LBU rd, imm12(rs1)      Load byte unsigned                 I     rd ← mem[rs1 + imm12]
SD rs2, imm12(rs1)      Store doubleword                   S     rs2 → mem[rs1 + imm12]
SW rs2, imm12(rs1)      Store word                         S     rs2(31:0) → mem[rs1 + imm12]
SH rs2, imm12(rs1)      Store halfword                     S     rs2(15:0) → mem[rs1 + imm12]
SB rs2, imm12(rs1)      Store byte                         S     rs2(7:0) → mem[rs1 + imm12]

Branching

Mnemonic                Instruction                            Type  Description
BEQ rs1, rs2, imm12     Branch equal                           SB    if rs1 = rs2: pc ← pc + imm12
BNE rs1, rs2, imm12     Branch not equal                       SB    if rs1 ≠ rs2: pc ← pc + imm12
BGE rs1, rs2, imm12     Branch greater than or equal           SB    if rs1 ≥ rs2: pc ← pc + imm12
BGEU rs1, rs2, imm12    Branch greater than or equal unsigned  SB    if rs1 ≥ rs2: pc ← pc + imm12
BLT rs1, rs2, imm12     Branch less than                       SB    if rs1 < rs2: pc ← pc + imm12
BLTU rs1, rs2, imm12    Branch less than unsigned              SB    if rs1 < rs2: pc ← pc + imm12 << 1
JAL rd, imm20           Jump and link                          UJ    rd ← pc + 4; pc ← pc + imm20
JALR rd, imm12(rs1)     Jump and link register                 I     rd ← pc + 4; pc ← rs1 + imm12

Pseudo Instructions

Mnemonic                Instruction                      Base instruction(s)
LI rd, imm12            Load immediate (near)            ADDI rd, zero, imm12
LI rd, imm              Load immediate (far)             LUI rd, imm[31:12]
                                                         ADDI rd, rd, imm[11:0]
LA rd, sym              Load address (far)               AUIPC rd, sym[31:12]
                                                         ADDI rd, rd, sym[11:0]
MV rd, rs               Copy register                    ADDI rd, rs, 0
NOT rd, rs              One's complement                 XORI rd, rs, -1
NEG rd, rs              Two's complement                 SUB rd, zero, rs
BGT rs1, rs2, offset    Branch if rs1 > rs2              BLT rs2, rs1, offset
BLE rs1, rs2, offset    Branch if rs1 ≤ rs2              BGE rs2, rs1, offset
BGTU rs1, rs2, offset   Branch if rs1 > rs2 (unsigned)   BLTU rs2, rs1, offset
BLEU rs1, rs2, offset   Branch if rs1 ≤ rs2 (unsigned)   BGEU rs2, rs1, offset
BEQZ rs1, offset        Branch if rs1 = 0                BEQ rs1, zero, offset
BNEZ rs1, offset        Branch if rs1 ≠ 0                BNE rs1, zero, offset
BGEZ rs1, offset        Branch if rs1 ≥ 0                BGE rs1, zero, offset
BLEZ rs1, offset        Branch if rs1 ≤ 0                BGE zero, rs1, offset
BGTZ rs1, offset        Branch if rs1 > 0                BLT zero, rs1, offset
J offset                Unconditional jump               JAL zero, offset
CALL offset12           Call subroutine (near)           JALR ra, ra, offset12
CALL offset             Call subroutine (far)            AUIPC ra, offset[31:12]
                                                         JALR ra, ra, offset[11:0]
RET                     Return from subroutine           JALR zero, 0(ra)
NOP                     No operation                     ADDI zero, zero, 0

32-bit instruction format (bits 31 down to 0, left to right)

R     func | rs2 | rs1 | func | rd | opcode
I     immediate | rs1 | func | rd | opcode
SB    immediate | rs2 | rs1 | func | immediate | opcode
UJ    immediate | rd | opcode

Register File         Register Aliases
r0   r1   r2   r3     zero   ra   sp   gp
r4   r5   r6   r7     tp     t0   t1   t2
r8   r9   r10  r11    s0/fp  s1   a0   a1
r12  r13  r14  r15    a2     a3   a4   a5
r16  r17  r18  r19    a6     a7   s2   s3
r20  r21  r22  r23    s4     s5   s6   s7
r24  r25  r26  r27    s8     s9   s10  s11
r28  r29  r30  r31    t3     t4   t5   t6

ra - return address
sp - stack pointer
gp - global pointer
tp - thread pointer
t0 - t6 - Temporary registers
s0 - s11 - Saved by callee
a0 - a7 - Function arguments
a0 - a1 - Return value(s)
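The far variant of the LI pseudo instruction splits a 32-bit constant into an upper 20-bit part for LUI and a lower 12-bit part for ADDI. One detail is easy to miss: ADDI sign-extends its 12-bit immediate, so when bit 11 of the constant is set, the upper part has to be increased by one to compensate. The sketch below illustrates that split in Python; the function names are hypothetical and it models only the arithmetic, not an assembler.

```python
MASK32 = 0xFFFFFFFF


def sign_extend_12(value):
    """Interpret the low 12 bits of value as a signed number, as ADDI does."""
    value &= 0xFFF
    return value - 0x1000 if value & 0x800 else value


def split_immediate(imm):
    """Split a 32-bit constant into (upper20 for LUI, lower12 for ADDI)."""
    imm &= MASK32
    lower = sign_extend_12(imm)
    # Subtracting the signed lower part bumps the upper part when bit 11 is set.
    upper = ((imm - lower) >> 12) & 0xFFFFF
    return upper, lower


def rebuild(upper, lower):
    """Model LUI rd, upper followed by ADDI rd, rd, lower (32-bit wraparound)."""
    return ((upper << 12) + lower) & MASK32


for constant in (0x12345678, 0xDEADBEEF):
    upper, lower = split_immediate(constant)
    print(f"{constant:#010x}: LUI {upper:#07x}, ADDI {lower}")
    assert rebuild(upper, lower) == constant
```

For 0xDEADBEEF the lower part is negative, so the value loaded by LUI is one step above the raw upper 20 bits of the constant.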


Registers
• Few but fast access
• All equal (well…)
– rv32: 32 bit
– rv64: 64 bit
• r0 always zero (RISC-V)
• Use according to
– Your liking (not really)
– Conventions
Do bits have Meaning?

Let’s look at an example


• What do the bits “1110 0010 1000 0010 1010 1100 0010 0100” mean – really!?
• Can be more compactly expressed as 0xE282AC24 (hexadecimal notation)
• Would fit into four (consecutive) bytes of memory [0xE2, 0x82, 0xAC, 0x24]
• How do/should we (our computer) interpret it?

Any ideas? Check out:


https://hexed.it/
How about “€$”? Can you explain it?
We (conventions) define the semantics of bits!
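To make the point concrete, the same four bytes can be read under several conventions. The sketch below uses only the Python standard library; the choice of interpretations (big-endian and little-endian integers, a 32-bit float, UTF-8 text) is just a sample of what is possible.

```python
import struct

# The four consecutive bytes from the example above.
raw = bytes([0xE2, 0x82, 0xAC, 0x24])

# Convention 1: one unsigned 32-bit integer, most significant byte first.
as_unsigned = int.from_bytes(raw, "big")

# Convention 2: the same bits as a signed (two's complement) integer.
as_signed = int.from_bytes(raw, "big", signed=True)

# Convention 3: the bytes in the opposite (little-endian) order.
as_little = int.from_bytes(raw, "little")

# Convention 4: an IEEE 754 single-precision float.
as_float = struct.unpack(">f", raw)[0]

# Convention 5: UTF-8 encoded text (three bytes for the euro sign, one for the dollar sign).
as_text = raw.decode("utf-8")

print(hex(as_unsigned), as_signed, hex(as_little), as_float, as_text)
```

The UTF-8 reading is where "€$" comes from: 0xE2 0x82 0xAC encodes "€" and 0x24 encodes "$".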
Instructions are Bits too

32-bit instruction format (bits 31 down to 0, left to right)

R     func | rs2 | rs1 | func | rd | opcode
I     immediate | rs1 | func | rd | opcode
SB    immediate | rs2 | rs1 | func | immediate | opcode
UJ    immediate | rd | opcode

https://luplab.gitlab.io/rvcodecjs/
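As an illustration of the R format, the fields of an instruction can be packed into a 32-bit word by hand. The sketch below assumes the field positions of the RISC-V base ISA (funct7 in bits 31:25, rs2 in 24:20, rs1 in 19:15, funct3 in 14:12, rd in 11:7, opcode in 6:0); the slide only labels the two function fields "func", and the helper names are made up for this example.

```python
def encode_r_type(funct7, rs2, rs1, funct3, rd, opcode):
    """Pack the six R-type fields into one 32-bit instruction word."""
    return (
        ((funct7 & 0x7F) << 25)
        | ((rs2 & 0x1F) << 20)
        | ((rs1 & 0x1F) << 15)
        | ((funct3 & 0x7) << 12)
        | ((rd & 0x1F) << 7)
        | (opcode & 0x7F)
    )


def decode_r_type(word):
    """Extract the R-type fields from a 32-bit instruction word."""
    return {
        "funct7": (word >> 25) & 0x7F,
        "rs2": (word >> 20) & 0x1F,
        "rs1": (word >> 15) & 0x1F,
        "funct3": (word >> 12) & 0x7,
        "rd": (word >> 7) & 0x1F,
        "opcode": word & 0x7F,
    }


# ADD r3, r1, r2: opcode 0x33, funct3 = 0, funct7 = 0 in the base ISA.
add_word = encode_r_type(funct7=0, rs2=2, rs1=1, funct3=0, rd=3, opcode=0x33)
print(f"{add_word:#010x}")
print(decode_r_type(add_word))
```

The resulting word can be pasted into the rvcodec.js page linked above to cross-check the encoding.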
Performance

\[
\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n
\]
[Figure: clock signal marking the clock period; within each clock cycle, data transfer and computation are followed by a state update]

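\[
\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}
\]

\[
\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}} \right)
\]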
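A small worked example may help with the weighted CPI sum. The instruction mix, the per-class CPI values and the 1 GHz clock below are invented for illustration only; they do not describe any real processor.

```python
# Hypothetical instruction mix: class -> (instruction count, CPI of that class).
mix = {
    "alu": (50_000_000, 1),
    "load_store": (30_000_000, 2),
    "branch": (20_000_000, 3),
}
clock_rate_hz = 1_000_000_000  # assumed 1 GHz clock

instruction_count = sum(count for count, _ in mix.values())
clock_cycles = sum(count * cpi for count, cpi in mix.values())

# CPI as total cycles per instruction ...
cpi = clock_cycles / instruction_count

# ... and as the weighted sum over instruction classes; both must agree.
cpi_weighted = sum(cpi_i * count / instruction_count for count, cpi_i in mix.values())

# CPU time = clock cycles / clock rate.
cpu_time_s = clock_cycles / clock_rate_hz

print(f"CPI = {cpi:.2f}, weighted CPI = {cpi_weighted:.2f}, CPU time = {cpu_time_s:.3f} s")
```

Here the branches are only 20% of the instructions but account for more than a third of the cycles, which is why the mix matters as much as the raw instruction count.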
Performance summary

\[
\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}
\]

• Algorithm
• Language
• Compiler
• Instruction set
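The three factors can be combined to compare two machines running the same program. In this sketch both computers and all of their numbers are hypothetical; the point is only that a higher clock rate does not decide the outcome alone.

```python
def cpu_time(instructions, cpi, clock_rate_hz):
    """CPU time = instructions x cycles per instruction x seconds per cycle."""
    return instructions * cpi / clock_rate_hz


# Same program (same instruction count) on two made-up computers.
instructions = 1_000_000_000
time_x = cpu_time(instructions, cpi=2.0, clock_rate_hz=2_000_000_000)  # X: 2 GHz, CPI 2.0
time_y = cpu_time(instructions, cpi=1.2, clock_rate_hz=1_000_000_000)  # Y: 1 GHz, CPI 1.2

# Performance_X / Performance_Y = Execution time_Y / Execution time_X = n
n = time_y / time_x

print(f"X: {time_x:.2f} s, Y: {time_y:.2f} s, X is {n:.2f} times faster than Y")
```

Despite having twice the clock rate, X is only 1.2 times faster than Y because it needs more cycles per instruction.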

But:
\[
\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}
\]

Power: ×30, Voltage: 5V → 1V, Frequency: ×1000
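The power relation explains why the voltage drop mattered so much. The sketch below assumes the three annotations mean: power grew about 30 times, voltage fell from 5 V to 1 V, and frequency grew about 1000 times. The function name and the unchanged-load scenario are illustrative assumptions.

```python
def relative_power(load_ratio, voltage_ratio, frequency_ratio):
    """New power / old power for Power = Capacitive load x Voltage^2 x Frequency."""
    return load_ratio * voltage_ratio**2 * frequency_ratio


# Frequency x1000 with no voltage reduction and unchanged capacitive load.
without_scaling = relative_power(1.0, 1.0, 1000)

# Frequency x1000 with the voltage reduced from 5 V to 1 V, load unchanged.
with_scaling = relative_power(1.0, 1 / 5, 1000)

print(f"without voltage scaling: x{without_scaling:.0f}")
print(f"with 5 V -> 1 V:         x{with_scaling:.0f}")
```

Because voltage enters squared, going from 5 V to 1 V cuts power by a factor of 25, which brings a 1000-fold frequency increase down to the same order of magnitude as the ×30 on the slide.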
How do we achieve it?
What about Moore’s law?

• Use abstraction to simplify design

• Make the common case fast

• Performance via parallelism

• Performance via pipelining

• Performance via prediction

• Hierarchy of memories

• Dependability via redundancy
