0% found this document useful (0 votes)
141 views

ARM ISA Review

The document summarizes key details about three versions of the Raspberry Pi single-board computers. The original Raspberry Pi uses a single-core ARM11 processor. The Raspberry Pi 2 uses a quad-core ARM Cortex-A7 processor and has improved connectivity options. The Raspberry Pi 3 uses a 64-bit ARM Cortex-A53 processor, adds built-in WiFi and Bluetooth, and retains the improved connectivity of the Raspberry Pi 2.

Uploaded by

Fran222999 XD
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
141 views

ARM ISA Review

The document summarizes key details about three versions of the Raspberry Pi single-board computers. The original Raspberry Pi uses a single-core ARM11 processor. The Raspberry Pi 2 uses a quad-core ARM Cortex-A7 processor and has improved connectivity options. The Raspberry Pi 3 uses a 64-bit ARM Cortex-A53 processor, adds built-in WiFi and Bluetooth, and retains the improved connectivity of the Raspberry Pi 2.

Uploaded by

Fran222999 XD
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 33

Raspberry Pi

ARM ISA review

039v12 1
Raspberry Pi
▪ Based on ARM11 (ARMv6) Broadcom BCM2835/BCM2836.
▪ Low Power ARM1176JZ-F Applications Processor
▪ ARM11 is the processor integrated in the first iPhone (also in
iPhone 3G).

039v12 2
Raspberry Pi 2
▪ ARM Cortex A7
▪ 900 MHZ quad-core
▪ 1GB RAM
▪ mid-range smartphones
▪ 4 USB ports
▪ 40 GPIO pins
▪ Full HDMI port
▪ Ethernet port
▪ Combined 3.5mm audio jack and composite video
▪ Camera interface (CSI)
▪ Display interface (DSI)
▪ Micro SD card slot
▪ VideoCore IV 3D graphics core

039v12 3
Raspberry Pi 3

▪ ARMv8 Cortex A-53 (64 bits)


▪ 1.2 GHZ quad-core
▪ 1GB RAM
▪ high-range smartphones
▪ 4 USB ports
▪ 40 GPIO pins
▪ Full HDMI port
▪ ….
▪ 10/100 Ethernet port
▪ Wifi 802.11n
▪ Bluetooth 4.1 (Low Energy)

039v12 4
Devices with Cortex-A53
Samsung
Galaxy Tab A Huawei
10.1 Nova
Plus
EgoIggo S95X smart TV set of
box

Lenovo Yoga Samsung


Tab 3 10 Galaxy C7

039v12 5
Data Sizes and Instruction Sets
▪ The ARM is a 32-bit, Load-Store RISC architecture.

▪ When used in relation to the ARM:


▪ Byte means 8 bits
▪ Halfword means 16 bits (two bytes)
▪ Word means 32 bits (four bytes)

▪ Most ARM’s implement two instruction sets


▪ 32-bit ARM Instruction Set
▪ 16-bit Thumb Instruction Set

▪ Jazelle cores can also execute Java bytecode

6
Processor Modes
▪ The ARM has seven basic operating modes, used to run user tasks, an operating
system, and to efficiently handle exceptions such as interrupts:

▪ User : unprivileged mode under which most tasks run

▪ FIQ : entered when a high priority (fast) interrupt is raised

▪ IRQ : entered when a low priority (normal) interrupt is raised

▪ Supervisor : entered on start up or reset and when a Software Interrupt


instruction is executed
▪ Abort : used to handle memory access violations as a result of fetching
instructions or accessing data.

▪ Undef : used to handle unknown or illegal instructions

▪ System : privileged mode using the same registers as user mode

7
The ARM Register Set
10000 10010 10001 11011 10111 10011
User mode IRQ FIQ Undef Abort SVC
r0
r1 ARM has 37 registers, all 32-bits long
r2
r3 - A subset of these registers is accessible in each
r4 mode and does not have to be preserved
r5 - Note: System mode uses the User mode
r6 register set.
r7
r8 r8
r9 r9
r10 r10
r11 r11
r12 r12
r13 (sp) r13 (sp) r13 (sp) r13 (sp) r13 (sp) r13 (sp)
r14 (lr) r14 (lr) r14 (lr) r14 (lr) r14 (lr) r14 (lr)
r15 (pc)

cpsr
spsr spsr spsr spsr spsr
Current mode Banked out registers
8
Program Status Registers
31 28 27 24 23 16 15 8 7 6 5 4 0

N Z C V Q J U n d e f i n e d I F T mode
f s x c

▪ Condition code flags ▪ Interrupt Disable bits.


▪ N = Negative result from ALU ▪ I = 1: Disables the IRQ.
▪ Z = Zero result from ALU ▪ F = 1: Disables the FIQ.
▪ C = ALU operation Carried out
▪ V = ALU operation oVerflowed (the result is ▪ T Bit
not representable in 32 bits (C2))
▪ Architecture xT only
▪ T = 0: Processor in ARM state
▪ Sticky Overflow flag - Q flag ▪ T = 1: Processor in Thumb state
▪ Architecture 5TE/J only
▪ Indicates if saturation has occurred ▪ Mode bits
▪ Specify the processor mode
▪ J bit
▪ Architecture 5TEJ only
▪ J = 1: Processor in Jazelle state

039v12 9
Program Counter (r15)
▪ When the processor is executing in ARM state:
▪ All instructions are 32 bits wide
▪ All instructions must be word aligned
▪ Therefore the pc value is stored in bits [31:2] with bits [1:0] undefined (as instruction
cannot be halfword or byte aligned)

▪ When the processor is executing in Thumb state:


▪ All instructions are 16 bits wide
▪ All instructions must be halfword aligned
▪ Therefore the pc value is stored in bits [31:1] with bit [0] undefined (as instruction
cannot be byte aligned)

▪ When the processor is executing in Jazelle state:


▪ All instructions are 8 bits wide
▪ Processor performs a word access to read 4 instructions at once

039v12 10
Conditional Execution (predicated inst.)
▪ ARM instructions can be made to execute conditionally by postfixing them with the
appropriate condition code field.
▪ This improves code density and performance by reducing the number of
forward branch instructions.
CMP r3,#0 CMP r3,#0
BEQ skip ADDNE r0,r1,r2
ADD r0,r1,r2
skip

▪ By default, data processing instructions do not affect the condition code flags but
the flags can be optionally set by using “S”. CMP does not need “S”.
loop

SUBS r1,r1,#1 decrement r1 and set flags
BNE loop if Z flag clear then branch

11
Condition Codes
Mnemonic Meaning
{cond}
EQ (equal) When Z is enabled (Z is 1)
NE (not equal). When Z is disabled. (Z is 0)
GE (greater or equal than, in two’s complement). When both V and N are
enabled or disabled (V is N)
LT (lower than, in two’s complement). This is the opposite of GE, so when V and
N are not both enabled or disabled (V is not N)
GT (greather than, in two’s complement). When Z is disabled and N and V are
both enabled or disabled (Z is 0, N is V)
LE (lower or equal than, in two’s complement). When Z is enabled or if not that,
N and V are both enabled or disabled (Z is 1. If Z is not 1 then N is V)
MI (minus/negative) When N is enabled (N is 1)
PL (plus/positive or zero) When N is disabled (N is 0)
VS (overflow set) When V is enabled (V is 1)
VC (overflow clear) When V is disabled (V is 0)
HI (higher) When C is enabled and Z is disabled (C is 1 and Z is 0)
LS (lower or same) When C is disabled or Z is enabled (C is 0 or Z is 1)
CS/HS (carry set/higher or same) When C is enabled (C is 1)
CS/HS (carry set/higher or same) When C is enabled (C is 1)
CC/LO (carry clear/lower) When C is disabled (C is 0)

cmp
Update status register
inst{s}: adds, subs, ands, …
Brach (b) cond. b{cond}: beq, bne, bgt, ble …
Instruction cond. inst{cond}: addeq, subne, ldrgt, …

12
Conditional execution examples

C source code ARM instructions


unconditional conditional
if (r0 == 0) CMP r0, #0 CMP r0, #0
{ BNE else ADDEQ r1, r1, #1
r1 = r1 + 1; ADD r1, r1, #1 ADDNE r2, r2, #1
} B end ...
else else
{ ADD r2, r2, #1
r2 = r2 + 1; end
} ...

▪ 5 instructions ▪ 3 instructions
▪ 5 words ▪ 3 words
▪ 5 or 6 cycles ▪ 3 cycles

13
Data Processing Instructions
▪ Consist of :
▪ Arithmetic: ADD ADC SUB SBC RSB RSC
▪ Logical: AND ORR EOR BIC
▪ Comparisons: CMP CMN TST TEQ
▪ Data movement: MOV MVN

▪ These instructions only work on registers, NOT memory.

▪ Syntax:

<Operation>{<cond>}{S} Rd, Rn, Operand2

▪ Comparisons set flags only - they do not specify Rd


▪ Data movement does not specify Rn

▪ Second operand is sent to the ALU via barrel shifter.


14
Using a Barrel Shifter:The 2nd Operand
Register, optionally with shift operation
Operand Operand ▪ Shift value can either be:
1 2 ▪ 5 bit unsigned integer
▪ Specified in bottom byte of
another register.
Barrel
Shifter
▪ Used for multiplication by constant

Immediate value
▪ 8 bit number, with a range of 0-
255.
ALU
▪ Rotated right through even
number of positions
▪ Allows increased range of 32-bit
Result constants to be loaded directly into
registers

15
Immediate constants

▪ No ARM instruction can contain a 32 bit immediate constant


▪ All ARM instructions are fixed as 32 bits long
▪ The data processing instruction format has 12 bits available
for operand2
11 8 7 0
rot immed_8
Quick Quiz:
x2
Shifter 0x3FC
ROR MOV r0, #???

▪ 4 bit rotate value (0-15) is multiplied by two to give range 0-


30 in steps of 2
▪ Rule to remember is
“8-bits rotated right by an even number of bit positions”
Check: https://alisdair.mcdiarmid.org/arm-immediate-value-encoding/
18
Loading 32 bit constants
▪ To allow larger constants to be loaded, the assembler offers a pseudo-
instruction:
▪ LDR rd, =const
▪ This will either:
▪ Produce a MOV or MVN instruction to generate the value (if possible).
or
▪ Generate a LDR instruction with a PC-relative address to read the
constant from a literal pool (Constant data area embedded in the
code).
▪ For example
▪ LDR r0,=0xFF => MOV r0,#0xFF
▪ LDR r0,=0x55555555 => LDR r0,[PC,#Imm12]

.LTORG
DCD 0x55555555
▪ This is the recommended way of loading constants into a register

19
Single register data transfer
LDR STR Word
LDRB STRB Byte
LDRH STRH Halfword
LDRSB Signed byte load
LDRSH Signed halfword load

▪ Memory system must support all access sizes

▪ Syntax:
▪ LDR{<cond>}{<size>} Rd, <address>
▪ STR{<cond>}{<size>} Rd, <address>

e.g. LDREQB
20
Address accessed

▪ Address accessed by LDR/STR is specified by a base register with an offset


▪ For word and unsigned byte accesses, offset can be:
▪ An unsigned 12-bit immediate value (i.e. 0 - 4095 bytes)
LDR r0, [r1, #8]
▪ A register, optionally shifted by an immediate value
LDR r0, [r1, r2]
LDR r0, [r1, r2, LSL#2]
▪ This can be either added or subtracted from the base register:
LDR r0, [r1, #-8]
LDR r0, [r1, -r2, LSL#2]
▪ For halfword and signed halfword / byte, offset can be:
▪ An unsigned 8 bit immediate value (i.e. 0 - 255 bytes)
▪ A register (unshifted)
▪ Choice of pre-indexed or post-indexed addressing
▪ Choice of whether to update the base pointer (pre-indexed only)
LDR r0, [r1, #-8]!

21
Load/Store Exercise
Assume an array of 25 words. A compiler associates
y with r1. Assume that the base address for the
array is located in r2. Translate this C
statement/assignment using just three instructions:

array[10] = array[5] + y;

22
Load/Store Exercise Solution

array[10] = array[5] + y;

LDR r3, [r2, #20] ; r3 = array[5]


ADD r3, r3, r1 ; r3 = array[5] + y
STR r3, [r2, #40] ; array[5] + y =
array[10]

23
Load and Store Multiples
▪ Syntax:
▪ <LDM|STM>{<cond>}<addressing_mode> Rb{!}, <register list>
▪ 4 addressing modes:
▪ LDMIA / STMIA increment after
▪ LDMIB / STMIB increment before
▪ LDMDA / STMDA decrement after
▪ LDMDB / STMDB decrement before
IA IB DA DB
LDMxx r10, {r0,r1,r4} r4
STMxx r10, {r0,r1,r4}
r4 r1
r1 r0 Increasing
Base Register (Rb) r10 r0 r4 Address
r1 r4
r0 r1
r0

24
Multiply and Divide
▪ There are 2 classes of multiply - producing 32-bit and 64-bit results
▪ 32-bit versions on an ARM7TDMI will execute in 2 - 5 cycles

▪ MUL r0, r1, r2 ; r0 = r1 * r2


▪ MLA r0, r1, r2, r3 ; r0 = (r1 * r2) + r3

▪ 64-bit multiply instructions offer both signed and unsigned versions


▪ For these instruction there are 2 destination registers

▪ [U|S]MULL r4, r5, r2, r3 ; r5:r4 = r2 * r3


▪ [U|S]MLAL r4, r5, r2, r3 ; r5:r4 = (r2 * r3) + r5:r4

▪ Most ARM cores do not offer integer divide instructions


▪ Division operations will be performed by C library routines or inline shifts

25
Branch instructions
▪ Branch : B{<cond>} label
▪ Branch with Link : BL{<cond>} subroutine_label

31 28 27 25 24 23 0

Cond 1 0 1 L Offset

Link bit 0 = Branch


1 = Branch with link
Condition field

▪ The processor core shifts the offset field left by 2 positions, sign-extends it
and adds it to the PC
▪ ± 32 Mbyte range
▪ How to perform longer branches?

26
ARM Branches and Subroutines
▪ B <label>
▪ PC relative. ±32 Mbyte range.
▪ BL <subroutine>
▪ Stores return address in LR
▪ Returning implemented by restoring the PC from LR
▪ For non-leaf functions, LR will have to be stacked

func1 func2

push{regs,lr} push{regs}
:
: :
:
BL func2 :
BL func1
: :
:
pop{regs,lr} pop{regs}
:
BX lr BX lr

039v12 27
Register Usage
Register
Arguments into function The compiler has a set of rules known as a
r0 Procedure Call Standard that determine how to
Result(s) from function r1
otherwise corruptible pass parameters to a function (see AAPCS)
r2
(Additional parameters r3
passed on stack) CPSR flags may be corrupted by function call.
Assembler code which links with compiled code
r4 must follow the AAPCS at external interfaces
r5
r6 The AAPCS is part of the new ABI for the ARM
Register variables r7 Architecture
Must be preserved r8
r9/sb - Stack base
r10/sl - Stack limit if software stack checking selected
r11

Scratch register r12


(corruptible)

Stack Pointer r13/sp - SP should always be 8-byte (2 word) aligned


Link Register r14/lr - R14 can be used as a temporary once value stacked
Program Counter r15/pc

28
PSR access
31 28 27 24 23 19 16 15 10 9 8 7 6 5 4 0

N Z C V Q de J GE[3:0] IT cond_abc E A I F T mode


f s x c

▪ MRS and MSR allow contents of CPSR / SPSR to be transferred to / from a


general purpose register or take an immediate value
▪ MSR allows the whole status register, or just parts of it to be updated
▪ Interrupts can be enable/disabled and modes changed, by writing to the CPSR
▪ Typically a read/modify/write strategy should be used:

MRS r0,CPSR ; read CPSR into r0


BIC r0,r0,#0x80 ; clear bit 7 to enable IRQ (BIt Clear)
MSR CPSR_c,r0 ; write modified value to ‘c’ byte only

▪ In User Mode, all bits can be read but only the condition flags (_f) can be modified

29
Addressing modes

039v12 30
ARM assembly language

039v12 31
ARM vs MIPS

039v12 32
Examples: MIPS vs ARM
lw $1, dato($0) ldr r2, =dato // cargar dirección de dato en r2
ldr r1, [r2] // cargar dir de mem apuntada por r2 en r1

sw $1, dato($0) str r2, =dato // cargar dirección de dato en r2


str r1, [r2] // guardar en dir de mem apuntada por r2, r1

add $1, $2, $3 add r1, r2, r3


(sub …) (sub …)

addi $1, $2, 1 add r1, r2, #1

sll $1, $2, 4 mov r1, r2, LSL #4 // desplazamiento lógico a izq. 4 bits de r2 a r1

sra $1, $2, 2 mov r1, r2, ASR #2 // desplazamiento aritmético a drch.2 bits de r2 a r1
srl $1, $2, 2 mov r1, r2, LSR #2 // desplazamiento lógico a drch. 2 bits de r2 a r1
// r1 ← r1 + (r1 << 1) = 3*r1 (multiplicar por 3)
add r1, r1, r1,LSL #1
j dir b dir // salta a la dirección dir

jal fun bl fun // salta a fun y guarda dir. sig. instrc. en reg. lr

jr $ra bx lr // salta a la dir. almacenada en lr (dir. retorno)

beq $1, $2, dir cmp r1, r2 // compara r1 y r2 (r1-r2) cargando flag Z.
(bne) beq dir // mira flag Z, si Z=1 salta
// (mira flag Z, si Z=0 salta)
(bne)

039v12 33
Other examples of ARM vs. MIPS
Stack management
addi $sp, $sp, -4 push {lr} // salvar contenido de un registro (lr) en pila. Se
sw $ra, 0($sp) pueden almacenar varios a la vez {r1, r2, r3, r4}

lw $ra, 0($sp) pop {lr} // sacar contenido de pila y meterlo en un registro


addi $sp, $sp, 4 (lr). Se puede hacer con varios rg. {r1, r2, r3, r4}

Loop control when a counter reaches 0

addi $8, $8, -1 adds r8, r8, #-1 // decrementa r8 y guarda estado en flags (Z)
beq $8, $0, exit beq exit // mira flag Z, si Z=1 salta (Z cargado en adds)

Auto-increment operation to traverse data structures

lw $8, 0($9) ldr r8, [r9], #4 // direccionamiento post-auto-incrementado


add $9, $9, 4

039v12 34
Summary

039v12 35

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy