Introduction
Architecture
Programmers Model
Instruction Set
• History of ARM
ARM (Acorn RISC Machine) started as a new, powerful CPU design to replace the
8-bit 6502 in Acorn computers (Cambridge, UK, 1985)
• The first models had only a 26-bit program counter, limiting the address space to 64 MB
(not much by today's standards, but a lot at that time).
• 1990 spin-off: ARM renamed Advanced RISC Machines
• ARM now focuses on embedded CPU cores
• IP licensing: almost every silicon manufacturer sells some microcontroller with
an ARM core. Some even compete with their own designs.
• Processing power with low current consumption
• Good MIPS/Watt figure
• Ideal for portable devices
• Compact memories: 16-bit opcodes (Thumb)
• New cores with added features
• Harvard architecture (ARM9, ARM11, Cortex)
• Floating point arithmetic
• Vector computing (VFP, NEON)
• Java language (Jazelle)
Facts
• 32-bit CPU
• 3-operand instructions (typical): ADD Rd,Rn,Operand2
• RISC design…
• Few, simple, instructions
• Load/store architecture (instructions operate on registers, not memory)
• Large register set
• Pipelined execution
• … Although with some CISC touches…
• Multiplication and Load/Store Multiple are complex instructions (they take many more
cycles than regular RISC instructions)
• … And some very specific details
• No hardware call stack: the return address is saved in a link register (LR) instead
• PC as a regular register
• Conditional execution of all instructions
• Flags altered or not by data processing instructions (selectable)
• Shifts/rotations performed concurrently with other processing (barrel shifter on Operand2)
• …
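Conditional execution can be sketched as a check of the condition suffix against the N, Z, C, V flags. This is Python used only as a model; the helper name `condition_passed` is hypothetical, while the flag tests follow the standard ARM condition codes.

```python
# Minimal model of ARM condition-code checks (hypothetical helper name).
# In ARM state every instruction carries a condition; if it fails,
# the instruction is skipped as if it were a NOP.


def condition_passed(cond: str, n: bool, z: bool, c: bool, v: bool) -> bool:
    """Return True if an instruction with condition suffix `cond` executes."""
    table = {
        "EQ": z,                    # equal
        "NE": not z,                # not equal
        "CS": c, "HS": c,           # carry set / unsigned higher or same
        "CC": not c, "LO": not c,   # carry clear / unsigned lower
        "MI": n,                    # negative
        "PL": not n,                # positive or zero
        "VS": v,                    # overflow
        "VC": not v,                # no overflow
        "HI": c and not z,          # unsigned higher
        "LS": (not c) or z,         # unsigned lower or same
        "GE": n == v,               # signed greater than or equal
        "LT": n != v,               # signed less than
        "GT": (not z) and n == v,   # signed greater than
        "LE": z or n != v,          # signed less than or equal
        "AL": True,                 # always (no suffix written)
    }
    return table[cond]


# Flags after CMP r0, r1 with r0 == r1: N=0, Z=1, C=1 (no borrow), V=0.
print(condition_passed("EQ", n=False, z=True, c=True, v=False))  # True
print(condition_passed("GT", n=False, z=True, c=True, v=False))  # False
```

This is why a sequence such as CMP r0,r1 / MOVGT r2,r0 / MOVLE r2,r1 can compute a maximum without any branch.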
Architecture
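Topologies
• Von Neumann (ARM7s and older): a single AHB bus carries both instructions and data
between the CPU and memory & I/O
• Harvard (ARM9s and newer): separate instruction (I) and data (D) buses and caches;
a bus interface joins them onto the AHB bus to memory & I/O
Memory-mapped I/O:
• No specific instructions for I/O (use Load/Store instructions instead)
• Peripheral registers are located at certain memory addresses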
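Memory-mapped I/O can be modelled as a bus that decodes each load/store address and routes it either to RAM or to a peripheral register. The sketch below is Python used purely as a model; the `Bus` class, the RAM size, the peripheral address `0x4000_0000` and the little-endian byte order are hypothetical assumptions, not taken from any real ARM device.

```python
# Sketch of memory-mapped I/O: the CPU issues ordinary loads and stores,
# and the bus decodes the address to decide who answers.
# All addresses and the little-endian byte order are hypothetical.

RAM_BASE, RAM_SIZE = 0x0000_0000, 0x1000
UART_DATA = 0x4000_0000  # hypothetical peripheral data register


class Bus:
    def __init__(self):
        self.ram = bytearray(RAM_SIZE)
        self.uart_tx = []  # characters "transmitted" by the peripheral

    def _in_ram(self, addr: int) -> bool:
        return RAM_BASE <= addr <= RAM_BASE + RAM_SIZE - 4

    def store_word(self, addr: int, value: int) -> None:
        if addr == UART_DATA:
            # Writing a peripheral register triggers a side effect, not a memory update.
            self.uart_tx.append(chr(value & 0xFF))
        elif self._in_ram(addr):
            self.ram[addr:addr + 4] = value.to_bytes(4, "little")
        else:
            raise ValueError(f"bus error at {addr:#010x}")

    def load_word(self, addr: int) -> int:
        if self._in_ram(addr):
            return int.from_bytes(self.ram[addr:addr + 4], "little")
        raise ValueError(f"bus error at {addr:#010x}")


bus = Bus()
bus.store_word(0x100, 0x12345678)    # like STR to RAM
bus.store_word(UART_DATA, ord("A"))  # like STR to a peripheral register
print(hex(bus.load_word(0x100)), bus.uart_tx)  # 0x12345678 ['A']
```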
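ARM7TDMI Block Diagram (figure): the address register and address incrementer drive the
address bus A[31:0]; the register bank (including the PC) feeds the A and B buses; the
B bus passes through the barrel shifter to the ALU, alongside the multiplier; the
instruction register, Thumb-to-ARM translator and instruction decoder generate the
control lines; the write and read data registers connect to the data bus D[31:0].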
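ARM Pipelining examples (figure): the ARM7TDMI pipeline has 3 stages (Fetch, Decode,
Execute) and the ARM9TDMI pipeline has 5 stages (Fetch, Decode, Execute, Memory, Write);
each stage takes 1 clock cycle.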
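Two consequences of pipelining can be sketched numerically: the ideal cycle count for a run of instructions, and the fact that in ARM state an instruction reading r15 sees its own address + 8. This Python model assumes no stalls, branches or multi-cycle instructions; the function names are hypothetical.

```python
# Sketch: ideal pipeline timing and the "PC reads ahead" effect.
# Assumes an ideal pipeline: no stalls, branches or multi-cycle instructions.


def ideal_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles to complete n instructions in a k-stage pipeline with no stalls."""
    if n_instructions == 0:
        return 0
    # The first instruction needs k cycles; each later one completes one cycle after.
    return n_stages + n_instructions - 1


def pc_value_seen(instr_addr: int) -> int:
    """In ARM state, an instruction that reads r15 sees its own address + 8,
    because the 3-stage pipeline has already fetched two instructions
    (2 x 4 bytes) further ahead."""
    return instr_addr + 8


print(ideal_cycles(10, 3))         # 3 stages (ARM7TDMI-like): 12
print(ideal_cycles(10, 5))         # 5 stages (ARM9TDMI-like): 14
print(hex(pc_value_seen(0x8000)))  # 0x8008
```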
Register Organization Summary
• User and System (SYS) modes: r0-r12, r13 (sp), r14 (lr), r15 (pc) and cpsr; no spsr
• FIQ mode: r0-r7, r15 and cpsr shared with User mode; banked r8-r12, r13 (sp) and
r14 (lr); its own spsr
• IRQ, SVC, Undef and Abort modes: r0-r12, r15 and cpsr shared with User mode;
banked r13 (sp) and r14 (lr); each has its own spsr
Program status register (cpsr/spsr) format:
• Bits 31-28: condition flags N Z C V
• Bits 27-8: undefined (reserved)
• Bits 7-5: I (IRQ disable), F (FIQ disable), T (Thumb state)
• Bits 4-0: mode
• Byte fields: f = bits 31-24, s = bits 23-16, x = bits 15-8, c = bits 7-0
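Decoding a status register value shows how the flags, the I/F/T bits, the mode field and the f/s/x/c byte fields fit together. This is a Python sketch of the ARMv4T layout summarised above; the function names and the example value are illustrative.

```python
# Decode a CPSR/SPSR value into flags, state bits and mode (ARMv4T layout).

MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "SVC",
    0b10111: "Abort",
    0b11011: "Undef",
    0b11111: "System",
}

FIELD_SHIFT = {"f": 24, "s": 16, "x": 8, "c": 0}  # byte fields of the PSR


def bit(value: int, n: int) -> bool:
    return bool((value >> n) & 1)


def decode_psr(psr: int) -> dict:
    return {
        "N": bit(psr, 31), "Z": bit(psr, 30), "C": bit(psr, 29), "V": bit(psr, 28),
        "I": bit(psr, 7),   # 1 = IRQ disabled
        "F": bit(psr, 6),   # 1 = FIQ disabled
        "T": bit(psr, 5),   # 1 = Thumb state
        "mode": MODES.get(psr & 0x1F, "reserved"),
    }


def psr_field(psr: int, field: str) -> int:
    """Extract one of the f, s, x, c byte fields."""
    return (psr >> FIELD_SHIFT[field]) & 0xFF


example = 0x600000D3  # Z and C set; IRQ and FIQ disabled; ARM state; SVC mode
print(decode_psr(example))
print(hex(psr_field(example, "f")), hex(psr_field(example, "c")))  # 0x60 0xd3
```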
Instruction Set (for ARM state)
Data processing Instructions
Consist of:
Arithmetic: ADD ADC SUB SBC RSB RSC
Logical: AND ORR EOR BIC
Comparisons: CMP CMN TST TEQ
Data movement: MOV MVN
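The flag results of the arithmetic and comparison instructions are easy to get wrong, especially ARM's carry after a subtraction. The Python sketch below models ADDS and SUBS (CMP behaves like SUBS without writing the result) on 32-bit values; the function names are hypothetical.

```python
# Model of the N, Z, C, V flags set by ADDS and SUBS/CMP on 32-bit values.

MASK = 0xFFFF_FFFF
SIGN = 0x8000_0000


def adds(a: int, b: int):
    """ADDS Rd, Rn, Op2: return (result, (N, Z, C, V))."""
    a, b = a & MASK, b & MASK
    full = a + b
    result = full & MASK
    n = bool(result & SIGN)
    z = result == 0
    c = full > MASK                            # unsigned carry out of bit 31
    v = bool(~(a ^ b) & (a ^ result) & SIGN)   # same-sign operands, sign flipped
    return result, (n, z, c, v)


def subs(a: int, b: int):
    """SUBS Rd, Rn, Op2 (CMP does the same but discards the result).
    ARM sets C to NOT borrow: C = 1 when a >= b as unsigned numbers."""
    a, b = a & MASK, b & MASK
    result = (a - b) & MASK
    n = bool(result & SIGN)
    z = result == 0
    c = a >= b
    v = bool((a ^ b) & (a ^ result) & SIGN)    # different-sign operands, sign flipped
    return result, (n, z, c, v)


print(adds(0x7FFF_FFFF, 1))  # (2147483648, (True, False, False, True))
print(subs(5, 5))            # (0, (False, True, True, False))
```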
(Figure: barrel shifter operations on Operand2, showing bits shifted out of the
destination into the carry flag CF and 0 shifted in.)
Immediate value
• 8-bit number, with a range of 0-255
• Rotated right through an even number of positions (0-30) before reaching the ALU
• Allows a wider range of 32-bit constants to be loaded directly into registers
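Whether a constant fits the immediate format can be checked by trying all 16 even rotations. This Python sketch models the ARM-state Operand2 immediate (an 8-bit value plus a 4-bit rotate field); the helper names are hypothetical.

```python
# Check whether a 32-bit constant fits the ARM-state Operand2 immediate
# format: an 8-bit value rotated right by an even amount (0, 2, ..., 30).

MASK = 0xFFFF_FFFF


def ror32(x: int, r: int) -> int:
    """Rotate a 32-bit value right by r bits (0 <= r < 32)."""
    return ((x >> r) | (x << (32 - r))) & MASK


def encode_imm(value: int):
    """Return (imm8, rot) with value == ror32(imm8, 2 * rot), or None."""
    value &= MASK
    for rot in range(16):
        # Rotating left by 2*rot (= right by 32 - 2*rot) undoes the encoding.
        imm8 = ror32(value, (32 - 2 * rot) % 32)
        if imm8 <= 0xFF:
            return imm8, rot
    return None


def decode_imm(imm8: int, rot: int) -> int:
    return ror32(imm8, 2 * rot)


print(encode_imm(0xFF00_0000))  # (255, 4)
print(encode_imm(0x104))        # (65, 15)
print(encode_imm(0x101))        # None: the set bits span 9 bit positions
```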
Loading 32-bit constants
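A common way to load an arbitrary 32-bit constant is the assembler pseudo-instruction LDR rd,=value: the assembler emits MOV when the value fits the rotated-immediate form, MVN when its bitwise complement fits, and otherwise a PC-relative LDR from a nearby literal pool. This Python sketch models that decision; the function names are hypothetical, and the exact expansion depends on the assembler.

```python
# Sketch of how an assembler may expand LDR rd, =value:
# MOV if the constant is encodable, MVN if its complement is, else a literal pool.

MASK = 0xFFFF_FFFF


def ror32(x: int, r: int) -> int:
    return ((x >> r) | (x << (32 - r))) & MASK


def encodable(value: int) -> bool:
    """True if value is an 8-bit constant rotated right by an even amount."""
    return any(ror32(value, (32 - 2 * rot) % 32) <= 0xFF for rot in range(16))


def load_constant(value: int):
    value &= MASK
    if encodable(value):
        return ("MOV", value)              # MOV rd, #value
    if encodable(~value & MASK):
        return ("MVN", ~value & MASK)      # MVN rd, #(NOT value)
    return ("LDR literal", value)          # LDR rd, [pc, #offset] from a literal pool


for v in (0xFF00_0000, 0xFFFF_FF00, 0x1234_5678):
    print(hex(v), load_constant(v))
```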
Cycle time
Basic MUL instruction:
• 2-5 cycles on ARM7TDMI
• 1-3 cycles on StrongARM/XScale
• 2 cycles on ARM9E/ARM102xE
• +1 cycle for ARM9TDMI (over ARM7TDMI)
• +1 cycle for accumulate (not on ARM9E, though the result delay is one cycle longer)
• +1 cycle for “long”
These are general rules; refer to the Technical Reference Manual (TRM) for the core you
are using for the exact details.
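The 2-5 cycle range on ARM7TDMI comes from early termination: the multiplier stops once the remaining upper bits of the multiplier operand Rs are all zeros or all ones. This Python sketch assumes the rule as usually stated in the ARM7TDMI TRM (MUL takes 1S + mI cycles, with one extra I cycle for MLA, and m between 1 and 4); treat it as an estimate and check the TRM for the exact figures. The function name is hypothetical.

```python
def arm7tdmi_mul_cycles(rs: int, accumulate: bool = False) -> int:
    """Estimate MUL/MLA cycles on ARM7TDMI from the multiplier operand Rs.
    Assumption: MUL = 1S + mI cycles (+1 I cycle for MLA), where m is 1, 2 or 3
    if Rs bits [31:8], [31:16] or [31:24] are all zeros or all ones, else 4."""
    rs &= 0xFFFF_FFFF
    m = 4
    for shift, cycles in ((8, 1), (16, 2), (24, 3)):
        top = rs >> shift                    # bits [31:shift] of Rs
        all_ones = (1 << (32 - shift)) - 1
        if top == 0 or top == all_ones:
            m = cycles                       # multiplier terminates early
            break
    return 1 + m + (1 if accumulate else 0)


for rs in (10, 0x1234, 0x0012_3456, 0x1234_5678, 0xFFFF_FFFF):
    print(hex(rs), arm7tdmi_mul_cycles(rs))
```

Placing the operand with the smaller magnitude in Rs can therefore shorten a multiply on this core.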
Branch instructions
Branch: B{<cond>} label
Branch with Link: BL{<cond>} subroutine_label
Encoding: Cond (bits 31-28) | 101 (bits 27-25) | L (bit 24, set for BL) | Offset (bits 23-0)
The processor core shifts the offset field left by 2 positions, sign-extends it
and adds it to the PC
± 32 MB range
How can longer branches, or branches to an absolute address, be performed?
Solution: LDR PC,…
ARM Branches and Subroutines
BL <subroutine>
Stores the return address in LR
Returning is implemented by restoring the PC from LR
For non-leaf subroutines, LR has to be stacked
Example: the caller calls func1, which saves LR because it calls func2;
func2 is a leaf routine and returns directly:
        :
        BL func1
        :
func1   STMFD sp!,{regs,lr}
        :
        BL func2
        :
        LDMFD sp!,{regs,pc}
func2   :
        :
        MOV pc, lr
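Why a non-leaf routine must stack LR can be shown with a small simulation: the nested BL overwrites LR, so without the STMFD/LDMFD pair func1 would "return" into itself. This Python model is illustrative only; the addresses 0x1000/0x2000 and the function `run` are hypothetical, and it assumes that BL at address X writes X + 4 into LR (ARM state).

```python
# Hypothetical model of BL/LR behaviour; addresses are illustrative only.
# In ARM state, BL at address X writes X + 4 into LR.


def run(save_lr: bool) -> list:
    lr = 0
    stack = []   # stands in for the full descending stack in memory
    trace = []

    def func2():
        # Leaf routine: returns with MOV pc, lr and never touches LR.
        trace.append(("func2 returns to", lr))

    def func1():
        nonlocal lr
        if save_lr:
            stack.append(lr)                   # STMFD sp!, {regs, lr}
        lr = 0x2004                            # BL func2 (at 0x2000) overwrites LR
        func2()
        ret = stack.pop() if save_lr else lr   # LDMFD sp!, {regs, pc} vs. MOV pc, lr
        trace.append(("func1 returns to", ret))

    lr = 0x1004                                # BL func1 (at 0x1000) from the caller
    func1()
    return trace


print(run(save_lr=True))   # func1 returns to 0x1004, the caller
print(run(save_lr=False))  # func1 "returns" to 0x2004, inside itself
```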
Syntax:
LDR{<cond>}{<size>} Rd, <address>
STR{<cond>}{<size>} Rd, <address>
e.g. LDREQB
Address accessed
Address accessed by LDR/STR is specified by a base register plus an offset
For word and unsigned byte accesses, offset can be
An unsigned 12-bit immediate value (i.e. 0-4095 bytes).
LDR r0,[r1,#8]
A register, optionally shifted by an immediate value
LDR r0,[r1,r2]
LDR r0,[r1,r2,LSL#2]
This can be either added to or subtracted from the base register:
LDR r0,[r1,#-8]
LDR r0,[r1,-r2]
LDR r0,[r1,-r2,LSL#2]
For halfword and signed halfword/byte accesses, the offset can be:
An unsigned 8-bit immediate value (i.e. 0-255 bytes).
A register (unshifted).
Choice of pre-indexed or post-indexed addressing, e.g. LDR r0,[r1,#4]! (pre-indexed,
with base write-back) or LDR r0,[r1],#4 (post-indexed)
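The address calculation and base update for the forms above can be modelled in a few lines. This Python sketch uses hypothetical function names and example register values; the offset may already include a register shift such as r2,LSL#2.

```python
MASK = 0xFFFF_FFFF


def transfer_address(base: int, offset: int, pre_indexed: bool = True,
                     writeback: bool = False):
    """Return (address accessed, new base register value) for LDR/STR."""
    if pre_indexed:
        addr = (base + offset) & MASK          # [Rn, offset] or [Rn, offset]!
        new_base = addr if writeback else base
    else:
        addr = base                            # [Rn], offset: access first...
        new_base = (base + offset) & MASK      # ...then always update Rn
    return addr, new_base


def offset_in_range(offset: int, halfword_or_signed: bool = False) -> bool:
    """Immediate offsets: 12-bit for word/unsigned byte, 8-bit for halfword/signed."""
    limit = 0xFF if halfword_or_signed else 0xFFF
    return -limit <= offset <= limit


r1, r2 = 0x1000, 3                                # hypothetical register contents
print(transfer_address(r1, 8))                    # LDR r0,[r1,#8]
print(transfer_address(r1, -(r2 << 2)))           # LDR r0,[r1,-r2,LSL#2]
print(transfer_address(r1, 4, writeback=True))    # LDR r0,[r1,#4]!
print(transfer_address(r1, 4, pre_indexed=False)) # LDR r0,[r1],#4
```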
LDM / STM operation
Load/Store Multiple Syntax:
<LDM|STM>{<cond>}<addressing_mode> Rb{!}, <register list>
4 addressing modes:
LDMIA / STMIA increment after
LDMIB / STMIB increment before
LDMDA / STMDA decrement after
LDMDB / STMDB decrement before
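(Figure: memory layout of LDMxx/STMxx r10, {r0,r1,r4} for each addressing mode
IA, IB, DA and DB, relative to the base register (Rb) r10, with addresses increasing
upwards; in every mode r0 is at the lowest address and r4 at the highest.)
Base-update possible:
LDM r10!,{r0-r6}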
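The four addressing modes differ only in where the block starts and in which direction the base moves. This Python sketch computes the address used for each register; the function name and the base value 0x1000 are hypothetical.

```python
def block_transfer(base: int, regs, mode: str, writeback: bool = False):
    """Return ({register number: address}, new base) for LDM/STM<mode>."""
    n = len(regs)
    start = {
        "IA": base,                 # increment after
        "IB": base + 4,             # increment before
        "DA": base - 4 * n + 4,     # decrement after
        "DB": base - 4 * n,         # decrement before
    }[mode]
    # The lowest-numbered register always uses the lowest address.
    addresses = {r: start + 4 * i for i, r in enumerate(sorted(regs))}
    if writeback:
        new_base = base + 4 * n if mode in ("IA", "IB") else base - 4 * n
    else:
        new_base = base
    return addresses, new_base


r10 = 0x1000  # hypothetical base address
for mode in ("IA", "IB", "DA", "DB"):
    addrs, _ = block_transfer(r10, [0, 1, 4], mode)
    print(mode, {f"r{r}": hex(a) for r, a in addrs.items()})
```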
LDM/STM for Stack Operations
Traditionally, a stack grows down in memory, with the last “pushed” value at
the lowest address. The ARM also supports ascending stacks, where the stack
structure grows up through memory.
The value of the stack pointer can either:
• Point to the last occupied address (Full stack)
– and so needs pre-decrementing/incrementing (i.e. before the push)
• Point to an unoccupied address (Empty stack)
– and so needs post-decrementing/incrementing (i.e. after the push)
The stack type to be used is given by the postfix to the instruction:
• STMFD / LDMFD : Full Descending stack
• STMFA / LDMFA : Full Ascending stack
• STMED / LDMED : Empty Descending stack
• STMEA / LDMEA : Empty Ascending stack
Note: ARM compilers always use a full descending stack.
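Pushing the same registers onto each stack type shows how "full/empty" decides where SP ends up and "ascending/descending" decides the direction of growth. This Python sketch uses a dictionary as memory and an old SP of 0x400; the function name and register placeholders are hypothetical.

```python
def stm_push(memory: dict, sp: int, values, stack_type: str) -> int:
    """Store values like STM<stack_type> sp!, {...}; return the new SP.
    values are given in register order (lowest-numbered register first)."""
    full = stack_type[0] == "F"          # F = SP points at the last pushed item
    descending = stack_type[1] == "D"    # D = stack grows towards lower addresses
    n = len(values)
    if descending:
        start = sp - 4 * n if full else sp - 4 * n + 4   # FD: DB, ED: DA
        new_sp = sp - 4 * n
    else:
        start = sp + 4 if full else sp                   # FA: IB, EA: IA
        new_sp = sp + 4 * n
    for i, value in enumerate(values):
        memory[start + 4 * i] = value    # lowest register at lowest address
    return new_sp


regs = ["r0", "r1", "r3", "r4", "r5"]
for kind in ("FD", "ED", "FA", "EA"):
    mem = {}
    sp = stm_push(mem, 0x400, regs, kind)
    print(kind, "SP =", hex(sp), {hex(a): v for a, v in sorted(mem.items())})
```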
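Stack Examples (figure): registers r0, r1, r3, r4 and r5 pushed onto full and empty,
ascending and descending stacks, starting from Old SP = 0x400 (addresses shown from
0x3e8 to 0x418); r0 always ends up at the lowest address and r5 at the highest.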
LDM/STM Alias Names
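The stack-style suffixes are aliases for the four addressing modes, and each pop alias undoes the matching push. This Python sketch records the mapping; the function name is hypothetical and, as a simplification, condition codes in the mnemonic are not handled.

```python
# Stack-style suffixes are aliases for the four addressing modes.
STM_ALIAS = {"FD": "DB", "ED": "DA", "FA": "IB", "EA": "IA"}  # pushes
LDM_ALIAS = {"FD": "IA", "ED": "IB", "FA": "DA", "EA": "DB"}  # pops


def canonical(mnemonic: str) -> str:
    """Map e.g. 'STMFD' to 'STMDB'; already-canonical names pass through.
    Simplification: condition codes are not handled."""
    op, suffix = mnemonic[:3].upper(), mnemonic[3:].upper()
    table = STM_ALIAS if op == "STM" else LDM_ALIAS
    return op + table.get(suffix, suffix)


for name in ("STMFD", "LDMFD", "STMEA", "LDMEA"):
    print(name, "=", canonical(name))
```

For example, STMFD decrements before each store and LDMFD loads and then increments, so STMFD sp!,{...} followed by LDMFD sp!,{...} restores both the registers and SP.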
(Figure: the 16-bit Thumb instruction ADD r2,#1 is executed as the equivalent
32-bit ARM instruction ADDS r2,r2,#1.)
Thumb instruction set limitations:
• Conditional execution only for branches
• Source and destination registers identical (for most data processing instructions)
• Only Low registers (R0-R7) used (by most instructions)
• Constants are of limited size
• Inline barrel shifter not used
• No MSR, MRS instructions
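Encoding both forms of the example makes the size difference concrete: the Thumb ADD Rd,#imm8 form has room only for a low register and an 8-bit constant, while the ARM form carries a condition, two register fields and a rotated immediate. This Python sketch covers just these two encodings; the function names are hypothetical.

```python
def thumb_add_imm(rd: int, imm8: int) -> int:
    """Encode the 16-bit Thumb instruction ADD Rd, #imm8 (sets the flags)."""
    if not 0 <= rd <= 7:
        raise ValueError("this Thumb form only accepts low registers r0-r7")
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("immediate must fit in 8 bits")
    # 001 | op=10 (ADD) | Rd | imm8
    return (0b001 << 13) | (0b10 << 11) | (rd << 8) | imm8


def arm_adds_imm(rd: int, rn: int, imm8: int, rot: int = 0, cond: int = 0xE) -> int:
    """Encode the 32-bit ARM instruction ADDS Rd, Rn, #imm (I=1, opcode ADD, S=1)."""
    opcode_add = 0b0100
    return ((cond << 28) | (1 << 25) | (opcode_add << 21) | (1 << 20)
            | (rn << 16) | (rd << 12) | (rot << 8) | imm8)


print(hex(thumb_add_imm(2, 1)))    # 0x3201     (ADD r2,#1, 16 bits)
print(hex(arm_adds_imm(2, 2, 1)))  # 0xe2922001 (ADDS r2,r2,#1, 32 bits)
```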
Atomic data swap