Architecture Programmers Model Instruction Set

The document discusses the history and architecture of ARM processors. It began in 1985 as a replacement for the 6502 chip in Acorn computers. Key aspects include its 32-bit RISC design with load/store architecture and large register set. ARM focuses on low-power embedded cores that are widely licensed. Newer models added features like floating point, vector processing, and the Harvard architecture. The document outlines the ARM instruction set, pipeline, register organization, and operating modes.

Uploaded by

mariyal ece

Agenda

→ Introduction
   Architecture
   Programmers Model
   Instruction Set
History of ARM
• ARM (Acorn RISC Machine) started as a new, powerful CPU design to replace the 8-bit 6502 in Acorn computers (Cambridge, UK, 1985)
  • The first models had only a 26-bit program counter, limiting the address space to 64 MB (not much by today's standards, but a lot at the time)
• 1990 spin-off: ARM renamed Advanced RISC Machines
• ARM now focuses on embedded CPU cores
  • IP licensing: almost every silicon manufacturer sells some microcontroller with an ARM core. Some even compete with their own designs.
  • Processing power with low current consumption
    • Good MIPS/Watt figure
    • Ideal for portable devices
  • Compact memories: 16-bit opcodes (Thumb)
• New cores with added features
  • Harvard architecture (ARM9, ARM11, Cortex)
  • Floating point arithmetic
  • Vector computing (VFP, NEON)
  • Java language (Jazelle)
Facts
• 32-bit CPU
• 3-operand instructions (typical): ADD Rd,Rn,Operand2
• RISC design…
  • Few, simple instructions
  • Load/store architecture (instructions operate on registers, not memory)
  • Large register set
  • Pipelined execution
• … although with some CISC touches…
  • Multiplication and Load/Store Multiple are complex instructions (many cycles longer than regular RISC instructions)
• … and some very specific details
  • No hardware return stack: a subroutine call saves the return address in the link register instead
  • PC as a regular register
  • Conditional execution of all instructions
  • Flags altered or not by data processing instructions (selectable)
  • Concurrent shifts/rotations (performed in the same instruction as other processing)
  • …
Agenda

   Introduction
→ Architecture
   Programmers Model
   Instruction Set
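Topologies
• Von Neumann (ARM7s and older): the core reaches instructions and data over a single AHB bus to memory and I/O.
• Harvard (ARM9s and newer): the core has separate instruction and data ports, each with its own cache (I cache, D cache); a bus interface joins them onto the AHB bus to memory and I/O.

Memory-mapped I/O:
• No specific instructions for I/O (use Load/Store instructions instead)
• Peripherals' registers sit at some memory addresses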
ARM7TDMI Block Diagram
[Figure: ARM7TDMI block diagram. Blocks shown: address bus A[31:0], address register, address incrementer, PC bus, register bank (with PC), ALU bus, A bus, B bus, multiplier, barrel shifter (SHIFT), ALU, instruction decoder with control lines, instruction register, Thumb-to-ARM translator, write data register, read data register, data bus D[31:0].]
ARM Pipelining examples

ARM7TDMI Pipeline (3 stages, 1 clock cycle each):
  FETCH | DECODE | EXECUTE (Reg. Read, Shift, ALU, Reg. Write)

ARM9TDMI Pipeline (5 stages, 1 clock cycle each):
  FETCH | DECODE (Reg. Read) | EXECUTE (Shift, ALU) | MEMORY (access) | WRITE (Reg. Write)

• Fetch: Read op-code from memory to internal Instruction Register
• Decode: Activate the appropriate control lines depending on op-code
• Execute: Do the actual processing
ARM7TDMI Pipelining (I)

Cycle (time)    1       2       3        4        5
Instruction 1   FETCH   DECODE  EXECUTE
Instruction 2           FETCH   DECODE   EXECUTE
Instruction 3                   FETCH    DECODE   EXECUTE

• Simple instructions (like ADD) complete at a rate of one per cycle


ARM7TDMI Pipelining (II)
• More complex instructions:

Cycle (time)   1       2       3        4           5           6        7        8
1 ADD          FETCH   DECODE  EXECUTE
2 STR                  FETCH   DECODE   Calc. ADDR  Data Xfer.
3 ADD                          FETCH    stall       DECODE      EXECUTE
4 ADD                                   FETCH       stall       DECODE   EXECUTE
5 ADD                                                           FETCH    DECODE   EXECUTE

STR: 2 effective clock cycles (+1 cycle)
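The cycle counts in the two pipelining diagrams can be checked with a small model. This is only a sketch, not a cycle-accurate simulator: it assumes a three-stage pipeline in which each instruction spends one cycle in fetch and one in decode, and in which only the execute stage can take extra cycles. The function name `total_cycles` and its input format are illustrative assumptions.

```python
def total_cycles(execute_cycles):
    """Cycles needed to run a straight-line sequence on a 3-stage pipeline.

    execute_cycles: number of execute cycles of each instruction, in program
    order (1 for a simple ADD, 2 for the STR of the diagram).
    """
    if not execute_cycles:
        return 0
    # The first instruction needs 2 cycles (fetch, decode) before it executes.
    # From then on the execute stage is the bottleneck and is never idle.
    return 2 + sum(execute_cycles)


# Three simple instructions: last EXECUTE happens in cycle 5
print(total_cycles([1, 1, 1]))        # 5
# ADD, STR, ADD, ADD, ADD: the STR costs one extra cycle
print(total_cycles([1, 2, 1, 1, 1]))  # 8
```

With the pipeline full, each simple instruction adds one cycle to the total, which is what "one per cycle" means in the first diagram.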


Processor Modes
• The ARM has seven operating modes:
  • User: unprivileged mode under which most tasks run
  • FIQ: entered when a high priority (fast) interrupt is raised
  • IRQ: entered when a low priority (normal) interrupt is raised
  • SVC: (Supervisor) entered on reset and when a Software Interrupt instruction is executed
  • Abort: used to handle memory access violations
  • Undef: used to handle undefined instructions
  • System: privileged mode using the same registers as user mode

The ARM Register Set
[Figure: the registers visible in the current mode (r0-r12, r13 (sp), r14 (lr), r15 (pc), cpsr, plus spsr in exception modes) shown next to the banked-out registers of the other modes. Banked copies: r8-r12 for User and FIQ; r13 (sp) and r14 (lr) for User, FIQ, IRQ, SVC, Undef and Abort; spsr for FIQ, IRQ, SVC, Undef and Abort.]
Register Organization Summary

Register   User, SYS   FIQ      IRQ      SVC      Undef    Abort
r0-r7      own         User     User     User     User     User
r8-r12     own         banked   User     User     User     User
r13 (sp)   own         banked   banked   banked   banked   banked
r14 (lr)   own         banked   banked   banked   banked   banked
r15 (pc)   own         User     User     User     User     User
cpsr       own         User     User     User     User     User
spsr       (none)      banked   banked   banked   banked   banked

"User" = the mode uses the User mode register; "banked" = the mode has its own private copy.

Note: System mode uses the User mode register set
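The banking scheme can be expressed as a lookup from (mode, register number) to a physical register. The sketch below follows the summary above; the names `BANKED`, `physical_register` and `has_spsr`, and the `r13_irq` style of naming physical registers, are illustrative assumptions rather than anything defined by the slides.

```python
# Registers that each exception mode keeps as a private (banked) copy.
# User and System modes have no banked registers of their own.
BANKED = {
    "FIQ":   {8, 9, 10, 11, 12, 13, 14},
    "IRQ":   {13, 14},
    "SVC":   {13, 14},
    "Undef": {13, 14},
    "Abort": {13, 14},
}


def physical_register(mode, n):
    """Name of the physical register that rN refers to in the given mode."""
    if not 0 <= n <= 15:
        raise ValueError("register number must be 0..15")
    if n in BANKED.get(mode, set()):
        return f"r{n}_{mode.lower()}"
    return f"r{n}"  # shared with User/System mode


def has_spsr(mode):
    """Only the five exception modes own a saved program status register."""
    return mode in BANKED


print(physical_register("IRQ", 13))     # r13_irq
print(physical_register("IRQ", 8))      # r8
print(physical_register("System", 13))  # r13

# 16 User registers plus all banked copies
print(16 + sum(len(regs) for regs in BANKED.values()))  # 31
```

Because r13 and r14 are banked, each exception mode gets its own stack pointer and return address without saving anything to memory; FIQ additionally banks r8-r12.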


Program Status Registers

Bits:    31  30  29  28   27-8        7   6   5   4-0
Field:   N   Z   C   V    undefined   I   F   T   mode

Byte fields: f = bits [31:24], s = bits [23:16], x = bits [15:8], c = bits [7:0]

• Condition code flags
  • N = Negative result from ALU
  • Z = Zero result from ALU
  • C = ALU operation Carried out
  • V = ALU operation oVerflowed
• Interrupt Disable bits
  • I = 1: Disables the IRQ.
  • F = 1: Disables the FIQ.
• T Bit (architectures with Thumb mode only)
  • T = 0: Processor in ARM state
  • T = 1: Processor in Thumb state
  • Never change T directly (use BX instead). Changing T in CPSR will lead to unexpected behavior due to pipelining
• Mode bits
  10000 User
  10001 FIQ
  10010 IRQ
  10011 Supervisor
  10111 Abort
  11011 Undefined
  11111 System
• Tip: Don't change undefined bits. This allows for code compatibility with newer ARM processors
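As a worked example of the layout above, the following sketch splits a 32-bit status register value into its fields. The function name `decode_psr` and the sample value 0x600000D3 are illustrative assumptions; the bit positions and mode encodings are the ones listed on the slide.

```python
MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}


def decode_psr(value):
    """Split a CPSR/SPSR value into flags, control bits and mode name."""
    return {
        "N": (value >> 31) & 1,
        "Z": (value >> 30) & 1,
        "C": (value >> 29) & 1,
        "V": (value >> 28) & 1,
        "I": (value >> 7) & 1,   # 1 = IRQ disabled
        "F": (value >> 6) & 1,   # 1 = FIQ disabled
        "T": (value >> 5) & 1,   # 1 = Thumb state
        "mode": MODES.get(value & 0x1F, "reserved"),
    }


# Z and C set, both interrupts disabled, ARM state, Supervisor mode
print(decode_psr(0x600000D3))
```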
Program Counter (R15)
• When the processor is executing in ARM state:
  • All instructions are 32 bits wide
  • All instructions must be word aligned
  • Therefore the PC value is stored in bits [31:2] and bits [1:0] are zero
  • Due to pipelining, the PC points 8 bytes ahead of the current instruction, or 12 bytes ahead if the current instruction includes a register-specified shift
• When the processor is executing in Thumb state:
  • All instructions are 16 bits wide
  • All instructions must be halfword aligned
  • Therefore the PC value is stored in bits [31:1] and bit [0] is zero
Agenda

   Introduction
   Architecture
   Programmers Model
→ Instruction Set (for ARM state)
Data processing Instructions
• Consist of:
  • Arithmetic: ADD ADC SUB SBC RSB RSC
  • Logical: AND ORR EOR BIC
  • Comparisons: CMP CMN TST TEQ
  • Data movement: MOV MVN
• These instructions only work on registers, NOT memory.

Bits:    31-28   27-26   25   24-21     20   19-16   15-12   11-0
Field:   cond.   0 0     L    op-code   S    Rn      Rd      Operand 2

L, Literal: 0: Operand 2 from register, 1: Operand 2 immediate

• Syntax:
  <Operation>{<cond>}{S} Rd, Rn, Operand2
  • {S} means that the Status register is going to be updated
  • Comparisons always update the status register. Rd is not specified
  • Data movement does not specify Rn
• Second operand is sent to the ALU via barrel shifter.
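The encoding above can be exercised by packing the fields into a 32-bit word. This is a sketch under assumptions: the 4-bit op-code numbers and the "always" condition value 0xE come from the ARM architecture rather than from the slide, and the function name `encode_data_processing` is made up for illustration. Operand 2 is passed as the raw 12-bit field.

```python
# 4-bit op-code values of the ARM data processing instructions
OPCODES = {
    "AND": 0, "EOR": 1, "SUB": 2, "RSB": 3, "ADD": 4, "ADC": 5,
    "SBC": 6, "RSC": 7, "TST": 8, "TEQ": 9, "CMP": 10, "CMN": 11,
    "ORR": 12, "MOV": 13, "BIC": 14, "MVN": 15,
}


def encode_data_processing(op, rd, rn, operand2, literal, s=0, cond=0xE):
    """Pack the fields of a data processing instruction into one word.

    literal: the L bit (1 = Operand 2 is an immediate).
    operand2: raw 12-bit Operand 2 field.
    cond: condition field, 0xE meaning "always" (assumed default).
    """
    if not 0 <= operand2 <= 0xFFF:
        raise ValueError("Operand 2 is a 12-bit field")
    return ((cond << 28) | (literal << 25) | (OPCODES[op] << 21) |
            (s << 20) | (rn << 16) | (rd << 12) | operand2)


print(hex(encode_data_processing("MOV", rd=0, rn=0, operand2=1, literal=1)))
# 0xe3a00001  (MOV r0, #1)
print(hex(encode_data_processing("ADD", rd=1, rn=15, operand2=8, literal=1)))
# 0xe28f1008  (ADD r1, pc, #8)
```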
The Barrel Shifter
• LSL: Logical Shift Left (CF ← Destination ← 0)
  Multiplication by a power of 2
• LSR: Logical Shift Right (0 → Destination → CF)
  Division by a power of 2
• ASR: Arithmetic Shift Right (sign bit → Destination → CF)
  Division by a power of 2, preserving the sign bit
• ROR: Rotate Right (Destination → CF)
  Bit rotate with wrap around from LSB to MSB
• RRX: Rotate Right Extended (CF → Destination → CF)
  Single bit rotate with wrap around from CF to MSB
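The five operations can be modelled on 32-bit values as below. This is a minimal sketch that assumes shift amounts of 1 to 31 only (a zero amount and amounts of 32 or more follow special rules that are not covered here); the function names are illustrative. Each function returns the shifted value together with the carry flag (CF) it would produce.

```python
MASK = 0xFFFFFFFF


def _check(amount):
    if not 1 <= amount <= 31:
        raise ValueError("this sketch only models shift amounts 1..31")


def lsl(value, amount):
    _check(amount)
    return (value << amount) & MASK, (value >> (32 - amount)) & 1


def lsr(value, amount):
    _check(amount)
    return value >> amount, (value >> (amount - 1)) & 1


def asr(value, amount):
    _check(amount)
    result = value >> amount
    if value & 0x80000000:                      # negative: replicate the sign bit
        result |= (MASK << (32 - amount)) & MASK
    return result, (value >> (amount - 1)) & 1


def ror(value, amount):
    _check(amount)
    result = ((value >> amount) | (value << (32 - amount))) & MASK
    return result, result >> 31                 # CF = last bit rotated into the MSB


def rrx(value, carry_in):
    return (carry_in << 31) | (value >> 1), value & 1


print(lsl(1, 4))                 # (16, 0): 1 * 2**4
print(hex(asr(0x80000000, 4)[0]))  # 0xf8000000: sign bit preserved
print(hex(ror(0xFF, 8)[0]))        # 0xff000000
```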
Using the Barrel Shifter: The Second Operand
[Figure: Operand 1 goes straight to the ALU; Operand 2 passes through the barrel shifter first; the ALU produces the result.]

Register, optionally with shift operation
• Shift value can be either:
  • a 5-bit unsigned integer
  • specified in the bottom byte of another register
• Used for multiplication by a power of 2
  Example: ADD R1, R2, R3, LSL #2
  (R2 + R3*4) -> R1

Immediate value
• 8-bit number, with a range of 0-255
• Rotated right through an even number of positions
• Allows an increased range of 32-bit constants to be loaded directly into registers
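Whether a given constant fits the "8-bit value rotated right by an even amount" scheme can be tested by trying all 16 rotations. The sketch below does exactly that; `ror32` and `encode_immediate` are illustrative names, and the returned `rot` is the rotation count in units of two bit positions.

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def encode_immediate(constant):
    """Return (imm8, rot) with constant == ror32(imm8, 2 * rot), or None."""
    for rot in range(16):
        # Rotating left by 2*rot undoes the right rotation applied by the shifter
        imm8 = ror32(constant, 32 - 2 * rot)
        if imm8 <= 0xFF:
            return imm8, rot
    return None


print(encode_immediate(0xFF))         # (255, 0)
print(encode_immediate(0xFF000000))   # (255, 4): 0xFF rotated right by 8
print(encode_immediate(0x104))        # (65, 15): 0x41 rotated right by 30
print(encode_immediate(0x55555555))   # None: not expressible
```

Values such as 0x104 show why the rotation matters: they do not fit in 8 bits, yet they are still a single immediate.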
Loading 32-bit constants
• To allow larger constants to be loaded, the assembler offers a pseudo-instruction:
  LDR rd, =const (notice the "=" sign)
• This will either:
  • Produce a MOV or MVN instruction to generate the value (if possible), or
  • Generate a LDR instruction with a PC-relative address to read the constant from a literal pool (constant data area embedded in the code).
• For example:
  LDR r0,=0xFF        => MOV r0,#0xFF
  LDR r0,=0x55555555  => LDR r0,[PC,#Imm12]
                         DCD 0x55555555 (in the literal pool)
• This is the recommended way of loading constants into a register
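The decision the assembler makes for `LDR rd, =const` can be sketched as follows. This is a simplified model, not the logic of any particular assembler: it assumes the only candidates are MOV, MVN (which loads the bitwise complement of its operand) and a literal-pool load, and the names `is_immediate` and `expand_ldr_pseudo` are hypothetical.

```python
def is_immediate(constant):
    """True if constant is an 8-bit value rotated right by an even amount."""
    for rot in range(0, 32, 2):
        # rotate left by `rot` to undo a right rotation by `rot`
        rotated = ((constant << rot) | (constant >> (32 - rot))) & 0xFFFFFFFF
        if rotated <= 0xFF:
            return True
    return False


def expand_ldr_pseudo(constant):
    """Return the instruction kind and operand chosen for LDR rd, =constant."""
    constant &= 0xFFFFFFFF
    if is_immediate(constant):
        return "MOV", constant
    inverted = ~constant & 0xFFFFFFFF
    if is_immediate(inverted):
        return "MVN", inverted
    return "LDR from literal pool", constant


print(expand_ldr_pseudo(0xFF))         # ('MOV', 255)
print(expand_ldr_pseudo(0xFFFFFF00))   # ('MVN', 255)
print(expand_ldr_pseudo(0x55555555))   # ('LDR from literal pool', 1431655765)
```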
Loading addresses: ADR
• The assembler includes the pseudo-instruction ADR, intended to load an address into a register
  ADR Rd, label
• ADR will be translated into a data processing instruction which uses PC as the source operand
• For example:

  Source:
            .text
            .arm
            .globl _start
    _start: mov r0,#1
            adr r1,msg1
            mov r2,#12
            swi 0x900004
            swi 0x900001
    msg1:   .ascii "Hello World\n"

  Disassembly:
    8074: e3a00001    mov r0, #1
    8078: e28f1008    add r1, pc, #8
    807c: e3a0200c    mov r2, #12
    8080: ef900004    swi 0x00900004
    8084: ef900001    swi 0x00900001
    8088: 6c6c6548    (msg1)
    808c: 6f57206f
    8090: 0a646c72

  Note: PC is 8 bytes ahead of the current instruction (pipelining)
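The numbers in the listing can be verified by hand or with a few lines of code. The sketch below recomputes the offset used by the translated `add r1, pc, #8` and decodes the three data words; it assumes little-endian byte order, and the helper name `adr_offset` is illustrative.

```python
import struct


def adr_offset(instruction_address, label_address):
    """Immediate needed by ADD Rd, PC, #imm, given that PC reads as address + 8."""
    return label_address - (instruction_address + 8)


# adr r1,msg1 sits at 0x8078 and msg1 at 0x8088
print(adr_offset(0x8078, 0x8088))  # 8

# The three words after the last swi are the string, stored little-endian
words = (0x6C6C6548, 0x6F57206F, 0x0A646C72)
print(struct.pack("<3I", *words))  # b'Hello World\n'
```

The string is 12 bytes long, which matches the `mov r2,#12` in the source.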
Data processing instr. FLAGS
• Flags are changed only if the S bit of the op-code is set:
  • Mnemonics ending with "s", like "movs", and comparisons: cmp, cmn, tst, teq
• N and Z have the expected meaning for all instructions
  • N: bit 31 (sign) of the result
  • Z: set if result is zero
• Logical instructions (AND, EOR, TST, TEQ, ORR, MOV, BIC, MVN)
  • V: unchanged
  • C: from barrel shifter if shift ≠ 0. Unchanged otherwise
• Arithmetic instructions (SUB, RSB, ADD, ADC, SBC, RSC, CMP, CMN)
  • V: Signed overflow from ALU
  • C: Carry (bit 32 of result) from ALU
• When PC is the destination register (exception return)
  • CPSR is copied from SPSR. This includes all the flags.
  • No change in user or system modes
  • Example: SUBS PC,LR,#4 @ return from IRQ
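For the arithmetic group, the four flags can be derived from one 33-bit addition. The sketch below assumes that subtraction is carried out as `a + NOT(b) + 1`, so that C after a SUBS means "no borrow"; that convention is standard for ARM but is not spelled out on the slide. The function names are illustrative and operands must be in the range 0..0xFFFFFFFF.

```python
MASK = 0xFFFFFFFF


def add_with_carry(a, b, carry_in):
    """Return (result, (N, Z, C, V)) for the 32-bit sum a + b + carry_in."""
    total = a + b + carry_in
    result = total & MASK
    n = result >> 31                    # bit 31 (sign) of the result
    z = int(result == 0)
    c = total >> 32                     # bit 32 of the unmasked sum
    # Overflow: operands have the same sign, result has the opposite sign
    v = ((~(a ^ b) & (a ^ result)) >> 31) & 1
    return result, (n, z, c, v)


def adds(a, b):
    return add_with_carry(a, b, 0)


def subs(a, b):
    return add_with_carry(a, ~b & MASK, 1)


print(adds(0x7FFFFFFF, 1))  # (2147483648, (1, 0, 0, 1)): signed overflow
print(adds(0xFFFFFFFF, 1))  # (0, (0, 1, 1, 0)): carry out, zero result
print(subs(5, 5))           # (0, (0, 1, 1, 0)): equal, no borrow
print(subs(3, 5))           # (4294967294, (1, 0, 0, 0)): borrow, negative
```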
Multiply
• Syntax:
  MUL{<cond>}{S} Rd, Rm, Rs                   Rd = Rm * Rs
  MLA{<cond>}{S} Rd, Rm, Rs, Rn               Rd = (Rm * Rs) + Rn
  [U|S]MULL{<cond>}{S} RdLo, RdHi, Rm, Rs     RdHi,RdLo := Rm * Rs
  [U|S]MLAL{<cond>}{S} RdLo, RdHi, Rm, Rs     RdHi,RdLo := (Rm * Rs) + RdHi,RdLo
• Cycle time
  • Basic MUL instruction
    • 2-5 cycles on ARM7TDMI
    • 1-3 cycles on StrongARM/XScale
    • 2 cycles on ARM9E/ARM102xE
  • +1 cycle for ARM9TDMI (over ARM7TDMI)
  • +1 cycle for accumulate (not on 9E, though result delay is one cycle longer)
  • +1 cycle for "long"
• Above are "general rules" - refer to the TRM for the core you are using for the exact details
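The "long" forms produce a 64-bit result split over two registers. The sketch below models the arithmetic only (not the timing); the helper names are illustrative, and each function returns the pair `(RdLo, RdHi)` in the same order as the operands in the syntax above.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(value):
    """Interpret a 32-bit pattern as a two's complement number."""
    return value - (1 << 32) if value & 0x80000000 else value


def split64(value):
    """Split a 64-bit result into (RdLo, RdHi)."""
    value &= MASK64
    return value & MASK32, value >> 32


def umull(rm, rs):
    return split64(rm * rs)


def smull(rm, rs):
    return split64(to_signed32(rm) * to_signed32(rs))


def umlal(rdlo, rdhi, rm, rs):
    return split64(rm * rs + ((rdhi << 32) | rdlo))


lo, hi = umull(0xFFFFFFFF, 0xFFFFFFFF)
print(hex(hi), hex(lo))   # 0xfffffffe 0x1
lo, hi = smull(0xFFFFFFFF, 2)   # -1 * 2 = -2
print(hex(hi), hex(lo))   # 0xffffffff 0xfffffffe
```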
Branch instructions
• Branch: B{<cond>} label
• Branch with Link: BL{<cond>} subroutine_label

Bits:    31-28   27-25   24   23-0
Field:   Cond    1 0 1   L    Offset

  Cond: condition field
  L: link bit (0 = Branch, 1 = Branch with link)

• The processor core shifts the offset field left by 2 positions, sign-extends it and adds it to the PC
  • ± 32 Mbyte range
  • How to perform longer branches or absolute address branches? Solution: LDR PC,…
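The offset calculation can be made concrete with an encoder and decoder for the branch word. The sketch assumes ARM state, where the PC reads as the branch address plus 8, and uses 0xE ("always") as the default condition; neither the default nor the function names come from the slide.

```python
def encode_branch(address, target, link=False, cond=0xE):
    """Encode B/BL located at `address` that jumps to `target`."""
    delta = target - (address + 8)       # PC is 8 bytes ahead of the branch
    if delta % 4:
        raise ValueError("target must be word aligned")
    offset = delta >> 2                  # the field holds a word offset
    if not -(1 << 23) <= offset < (1 << 23):
        raise ValueError("target out of the +/- 32 Mbyte range")
    return (cond << 28) | (0b101 << 25) | (int(link) << 24) | (offset & 0xFFFFFF)


def branch_target(address, word):
    """Recover the target address from an encoded branch."""
    offset = word & 0xFFFFFF
    if offset & 0x800000:                # sign-extend the 24-bit field
        offset -= 1 << 24
    return address + 8 + (offset << 2)


print(hex(encode_branch(0x8000, 0x8000)))             # 0xeafffffe (branch to itself)
print(hex(encode_branch(0x8000, 0x8008, link=True)))  # 0xeb000000
print(hex(branch_target(0x8000, 0xEAFFFFFE)))         # 0x8000
```

The 24-bit signed word offset gives 2**23 words in each direction, which is where the ± 32 Mbyte figure comes from.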
ARM Branches and Subroutines
• BL <subroutine>
  • Stores return address in LR
  • Returning implemented by restoring the PC from LR
  • For non-leaf subroutines, LR will have to be stacked

main program:
        :
        BL func1
        :

func1 (subroutine):
        STMFD sp!, {regs,lr}
        :
        BL func2
        :
        LDMFD sp!, {regs,pc}

func2 (leaf subroutine, no calls):
        :
        MOV pc, lr
Single register data transfer

Load     Store    Size
LDR      STR      Word
LDRB     STRB     Byte
LDRH     STRH     Halfword
LDRSB             Signed byte load
LDRSH             Signed halfword load

• Memory system must support all access sizes
• Syntax:
  LDR{<cond>}{<size>} Rd, <address>
  STR{<cond>}{<size>} Rd, <address>
  e.g. LDREQB
Address accessed
• Address accessed by LDR/STR is specified by a base register plus an offset
• For word and unsigned byte accesses, offset can be:
  • An unsigned 12-bit immediate value (i.e. 0 - 4095 bytes).
      LDR r0,[r1,#8]
  • A register, optionally shifted by an immediate value
      LDR r0,[r1,r2]
      LDR r0,[r1,r2,LSL#2]
  • This can be either added or subtracted from the base register:
      LDR r0,[r1,#-8]
      LDR r0,[r1,-r2]
      LDR r0,[r1,-r2,LSL#2]
• For halfword and signed halfword / byte, offset can be:
  • An unsigned 8-bit immediate value (i.e. 0-255 bytes).
  • A register (unshifted).
• Choice of pre-indexed or post-indexed addressing
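The difference between the indexing choices is easiest to see by computing both the address that is accessed and the value left in the base register. The sketch below covers the immediate-offset form for word and unsigned byte accesses only. The slide does not show the assembler syntax for the write-back forms; `[r1,#8]!` (pre-indexed with write-back) and `[r1],#8` (post-indexed) are the usual spellings. The function name and the mode labels are illustrative.

```python
MASK = 0xFFFFFFFF


def transfer_address(base, offset, indexing="offset"):
    """Return (address accessed, base register value afterwards).

    indexing: "offset" -> LDR r0,[r1,#off]    base unchanged
              "pre"    -> LDR r0,[r1,#off]!   base updated before the access
              "post"   -> LDR r0,[r1],#off    base updated after the access
    """
    if not -4095 <= offset <= 4095:
        raise ValueError("immediate offset is 12 bits plus an add/subtract bit")
    if indexing == "offset":
        return (base + offset) & MASK, base
    if indexing == "pre":
        address = (base + offset) & MASK
        return address, address
    if indexing == "post":
        return base, (base + offset) & MASK
    raise ValueError("unknown indexing mode")


print(transfer_address(0x1000, 8))           # (4104, 4096)
print(transfer_address(0x1000, 8, "pre"))    # (4104, 4104)
print(transfer_address(0x1000, 8, "post"))   # (4096, 4104)
print(transfer_address(0x1000, -8))          # (4088, 4096)
```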
LDM / STM operation
• Load/Store Multiple Syntax:
  <LDM|STM>{<cond>}<addressing_mode> Rb{!}, <register list>
• 4 addressing modes:
  LDMIA / STMIA   increment after
  LDMIB / STMIB   increment before
  LDMDA / STMDA   decrement after
  LDMDB / STMDB   decrement before

Memory layout for LDMxx r10, {r0,r1,r4} or STMxx r10, {r0,r1,r4} (addresses increase upwards):

Address     IA    IB    DA    DB
r10+12            r4
r10+8       r4    r1
r10+4       r1    r0
r10         r0          r4           <- Base Register (Rb)
r10-4                   r1    r4
r10-8                   r0    r1
r10-12                        r0

Base-update possible:
  LDM r10!,{r0-r6}
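The layout for the four addressing modes follows from two rules: the lowest-numbered register always goes to the lowest address, and the mode only decides where the block starts relative to the base. A minimal sketch, with the hypothetical helper name `transfer_map`:

```python
def transfer_map(base, reg_list, mode):
    """Return ({register: address}, written-back base) for LDM/STM.

    The second value is what Rb would hold afterwards if "!" is used.
    """
    regs = sorted(reg_list)              # lowest register <-> lowest address
    size = 4 * len(regs)
    start = {
        "IA": base,                      # increment after
        "IB": base + 4,                  # increment before
        "DA": base - size + 4,           # decrement after
        "DB": base - size,               # decrement before
    }[mode]
    new_base = base + size if mode in ("IA", "IB") else base - size
    return {f"r{r}": start + 4 * i for i, r in enumerate(regs)}, new_base


for mode in ("IA", "IB", "DA", "DB"):
    addresses, new_base = transfer_map(0x1000, [0, 1, 4], mode)
    print(mode, {reg: hex(a) for reg, a in addresses.items()}, hex(new_base))
```

With a base of 0x1000 in place of r10, the printed addresses reproduce the four columns of the layout above.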
LDM/STM for Stack Operations
• Traditionally, a stack grows down in memory, with the last "pushed" value at the lowest address. The ARM also supports ascending stacks, where the stack structure grows up through memory.
• The value of the stack pointer can either:
  • Point to the last occupied address (Full stack), and so needs pre-decrementing/incrementing (i.e. before the push)
  • Point to an unoccupied address (Empty stack), and so needs post-decrementing/incrementing (i.e. after the push)
• The stack type to be used is given by the postfix to the instruction:
  • STMFD / LDMFD: Full Descending stack
  • STMFA / LDMFA: Full Ascending stack
  • STMED / LDMED: Empty Descending stack
  • STMEA / LDMEA: Empty Ascending stack
• Note: ARM compilers will always use a Full Descending stack.
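Stack Examples

Each column shows memory after the given instruction with the register list {r0,r1,r3-r5}, starting from Old SP = 0x400:

Address   STMFD sp!,     STMED sp!,        STMFA sp!,     STMEA sp!,
0x418
0x414                                      r5  <- SP      (empty) <- SP
0x410                                      r4             r5
0x40C                                      r3             r4
0x408                                      r1             r3
0x404                                      r0             r1
0x400     <- Old SP      r5  <- Old SP     <- Old SP      r0  <- Old SP
0x3FC     r5             r4
0x3F8     r4             r3
0x3F4     r3             r1
0x3F0     r1             r0
0x3EC     r0  <- SP      (empty) <- SP
0x3E8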
LDM/STM Alias Names
• STMIA, STMIB, STMDA, STMDB are the same instructions as STMEA, STMFA, STMED, STMFD, respectively
• LDMIA, LDMIB, LDMDA, LDMDB are also the same instructions as LDMFD, LDMED, LDMFA, LDMEA, respectively
• The latter names are useful when working with stacks
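The reason the stack names map to different addressing modes for STM and LDM is that a pop has to walk memory in the opposite direction of the matching push. The sketch below checks this with a push followed by a pop for each stack type, using a dictionary as memory. The helper names are illustrative, and write-back (`sp!`) is assumed in every case.

```python
# Stack-oriented suffix -> addressing mode actually encoded
STM_ALIAS = {"EA": "IA", "FA": "IB", "ED": "DA", "FD": "DB"}
LDM_ALIAS = {"FD": "IA", "ED": "IB", "FA": "DA", "EA": "DB"}


def block(base, count, mode):
    """Return (lowest address used, written-back base) for IA/IB/DA/DB."""
    size = 4 * count
    start = {"IA": base, "IB": base + 4,
             "DA": base - size + 4, "DB": base - size}[mode]
    return start, (base + size if mode in ("IA", "IB") else base - size)


def stm(memory, regs, sp, reg_list, stack_type):
    """Push: store the listed registers, return the new stack pointer."""
    start, new_sp = block(sp, len(reg_list), STM_ALIAS[stack_type])
    for i, r in enumerate(sorted(reg_list)):
        memory[start + 4 * i] = regs[r]
    return new_sp


def ldm(memory, regs, sp, reg_list, stack_type):
    """Pop: reload the listed registers, return the new stack pointer."""
    start, new_sp = block(sp, len(reg_list), LDM_ALIAS[stack_type])
    for i, r in enumerate(sorted(reg_list)):
        regs[r] = memory[start + 4 * i]
    return new_sp


saved = {0: 10, 1: 11, 3: 13, 4: 14, 5: 15}
for stack_type in ("FD", "ED", "FA", "EA"):
    memory, regs = {}, dict(saved)
    sp = stm(memory, regs, 0x400, list(saved), stack_type)
    regs = dict.fromkeys(saved, 0)           # clobber the registers
    restored_sp = ldm(memory, regs, sp, list(saved), stack_type)
    print(stack_type, hex(sp), hex(restored_sp), regs == saved)
```

For every stack type the pop returns SP to 0x400 and restores the original register values, and the intermediate SP values (0x3ec for the descending stacks, 0x414 for the ascending ones) match the Stack Examples slide.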


Thumb State
• Thumb is a 16-bit instruction set
  • Optimized for code density from C code (~65% of ARM code size)
  • Improved performance from memory with a narrow data bus
  • Subset of the functionality of the ARM instruction set
• Core has additional execution state - Thumb
  • Switch between ARM and Thumb via the BX Rn instruction (Branch and eXchange). If bit 0 of Rn is 1 (odd address) the processor will change to Thumb state.

[Figure: the 16-bit Thumb instruction ADD r2,#1 (bits 15-0) shown against the equivalent 32-bit ARM instruction ADDS r2,r2,#1 (bits 31-0).]

Thumb instruction set limitations:
• Conditional execution only for branches
• Source and destination registers identical
• Only Low registers (R0-R7) used
• Constants are of limited size
• Inline barrel shifter not used
• No MSR, MRS instructions
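The state switch performed by BX can be summarised in a few lines. This sketch assumes that bit 0 of the register selects the state and is then cleared to form the branch address, and that an ARM-state target is already word aligned; the function name `bx` is illustrative.

```python
def bx(rn):
    """Return (new state, new PC) for BX Rn."""
    if rn & 1:
        return "Thumb", rn & ~1 & 0xFFFFFFFF   # odd address: enter Thumb state
    return "ARM", rn                           # even address: ARM state


print(bx(0x8001))   # ('Thumb', 32768)
print(bx(0x8000))   # ('ARM', 32768)
```

Using the address itself to carry the state is what lets a return such as `BX lr` go back to the caller in whichever state the caller was running.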
Atomic data swap
• Exchanges a word or byte between a register and a memory location
• This operation cannot be interrupted, not even by DMA
• Main use: Operating System semaphores
• Syntax:
  SWP {<cond>} Rd, Rm, [Rn]
  SWPB{<cond>} Rd, Rm, [Rn]

  Rd=[Rn]; [Rn]=Rm (Rd and Rm can be the same)
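How SWP supports a semaphore can be shown with a small model. The sketch only reproduces the data movement `Rd=[Rn]; [Rn]=Rm`; it does not reproduce atomicity, which on the processor comes from the instruction itself. The lock convention (0 = free, 1 = taken), the address 0x2000 and the function names are assumptions made for the example.

```python
def swp(memory, address, rm_value):
    """Model of SWP Rd, Rm, [Rn]: return the old word and store the new one."""
    old = memory[address]
    memory[address] = rm_value
    return old


def try_lock(memory, address):
    """Write 1 to the lock word; the lock was acquired if it previously held 0."""
    return swp(memory, address, 1) == 0


def unlock(memory, address):
    memory[address] = 0


memory = {0x2000: 0}               # lock word, initially free
print(try_lock(memory, 0x2000))    # True: first caller gets the lock
print(try_lock(memory, 0x2000))    # False: already taken
unlock(memory, 0x2000)
print(try_lock(memory, 0x2000))    # True again after release
```

Because the read and the write happen in one uninterruptible instruction, two contenders can never both read 0 from the lock word.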


Coprocessors
• Coprocessor instructions:
  • Coprocessor data operation: CDP
  • Coprocessor Load/Store: LDC, STC
  • Coprocessor register transfer: MRC, MCR (some coprocessors, like P14 and P15, only support MRC and MCR)
• A 4-bit coprocessor number (Pxx) has to be specified in these instructions.
• These instructions result in UNDEF exceptions if the coprocessor is missing
• The most common coprocessors:
  • P15: System control (cache, MMU, …)
  • P14: Debug (Debug Communication Channel)
  • P1, P4, P10: Floating point (FPA, FPE, Maverick, VFP, …)
• The assembler can translate the floating-point mnemonics into coprocessor instructions.
