Coa Chapter 5

Uploaded by ramkh148
Copyright © All Rights Reserved
21CSS201T

COMPUTER ORGANIZATION AND ARCHITECTURE

UNIT-5

Contents
• Parallelism: Need, types, applications and challenges
• Architecture of Parallel Systems: Flynn’s classification
• ARM Processor: The Thumb instruction set
• Processor and CPU cores, instruction encoding format
• Memory load and store instructions
• Basics of I/O operations
• Case study: ARM 5 and ARM 7 Architecture

Parallelism: Need, Types, Applications and Challenges

Parallelism
• Executing two or more operations at the same time is known as parallelism.
• Parallel processing is a method of improving computer system performance by executing two or more instructions simultaneously.
• A parallel computer is a set of processors that are able to work cooperatively to solve a computational problem.
• Two or more ALUs in the CPU can work concurrently to increase throughput.
• The system may have two or more processors operating concurrently.

Goals of parallelism
• To increase computational speed, i.e., to reduce the time you have to wait for a problem to be solved
• To increase throughput, i.e., the amount of processing that can be accomplished during a given interval of time
• To improve the performance of the computer for a given clock speed
• To solve bigger problems that might not fit in the limited memory of a single CPU

Applications of Parallelism
• Numerical weather prediction
• Socioeconomics
• Finite element analysis
• Artificial intelligence and automation
• Genetic engineering
• Weapon research and defence
• Medical applications
• Remote sensing applications

Types of parallelism
1. Hardware Parallelism
2. Software Parallelism

• Hardware Parallelism:
The main objective of hardware parallelism is to increase processing speed. Based on the hardware architecture, hardware parallelism can be divided into two types: processor parallelism and memory parallelism.
• Processor parallelism
Processor parallelism means that the computer architecture has multiple nodes, multiple CPUs or sockets, multiple cores, and multiple threads.
• Memory parallelism
Memory parallelism means shared memory, distributed memory, hybrid distributed shared memory, multilevel pipelines, etc. It is sometimes associated with the parallel random access machine (PRAM) model: “It is an abstract model for parallel computation which assumes that all the processors operate synchronously under a single clock and are able to randomly access a large shared memory. In particular, a processor can execute an arithmetic, logic, or memory access operation within a single clock cycle”. Overlapping (pipelining) the execution of instructions is one practical way of achieving this kind of parallelism.

Hardware Parallelism
• One way to characterize the parallelism in a processor is by the number of instructions issued per machine cycle.
• If a processor issues k instructions per machine cycle, it is called a k-issue processor.
• In a modern processor, two or more instructions can be issued per machine cycle.
• A conventional processor takes one or more machine cycles to issue a single instruction. Such processors are called one-issue machines, with a single instruction pipeline in the processor.
• A multiprocessor system built from n k-issue processors should be able to handle a maximum of nk threads of instructions simultaneously.

Software Parallelism
• It is defined by the control and data dependences of programs.
• The degree of parallelism is revealed in the program flow graph.
• Software parallelism is a function of the algorithm, programming style, and compiler optimization.
• The program flow graph displays the patterns of simultaneously executable operations.
• Parallelism in a program varies during the execution period.
• This variation limits the sustained performance of the processor.

Software Parallelism - types
Parallelism in Software
✔ Instruction level parallelism
✔ Task-level parallelism
✔ Data parallelism
✔ Transaction level parallelism

Instruction level parallelism
• Instruction level parallelism (ILP) is a measure of how many operations can be performed in parallel at the same time in a computer.
• Parallel instructions are a set of instructions that do not depend on each other to be executed.
• ILP allows the compiler and processor to overlap the execution of multiple instructions or even to change the order in which instructions are executed.

Example: Instruction level parallelism
Consider the following example:
1. x = a + b
2. y = c - d
3. z = x * y
Operation 3 depends on the results of 1 and 2, so z cannot be calculated until x and y are calculated.
But 1 and 2 do not depend on any other operation, so they can be computed simultaneously.

• If we assume that each operation can be completed in one unit of time, then these 3 operations can be completed in 2 units of time.
• The ILP factor is therefore 3/2 = 1.5, compared with 1 for purely sequential execution (3 operations in 3 units of time).
• A superscalar CPU architecture implements ILP inside a single processor, which allows faster CPU throughput at the same clock rate.
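The counting argument above can be reproduced mechanically. The Python sketch below is a minimal illustration, assuming (as the slide does) one time unit per operation and enough functional units to run every ready operation at once; `schedule_levels` and `ilp_factor` are hypothetical helper names. Each operation is placed one step after the latest operation it reads from, and the ILP factor is the number of operations divided by the number of steps.

```python
# Sketch: estimate the ILP factor of a straight-line code fragment.
# Assumptions: unlimited functional units, every operation takes one time unit.

def schedule_levels(ops):
    """ops maps a result name to the list of names it reads.
    Returns the time step (1, 2, ...) at which each operation can run."""
    level = {}

    def step(name):
        if name not in ops:          # a primary input such as a, b, c, d
            return 0
        if name not in level:
            # run one step after the latest of the operation's inputs
            level[name] = 1 + max((step(src) for src in ops[name]), default=0)
        return level[name]

    for name in ops:
        step(name)
    return level


def ilp_factor(ops):
    levels = schedule_levels(ops)
    return len(ops) / max(levels.values())


# The three operations from the slide: x = a+b, y = c-d, z = x*y
program = {"x": ["a", "b"], "y": ["c", "d"], "z": ["x", "y"]}
print(schedule_levels(program))  # {'x': 1, 'y': 1, 'z': 2}
print(ilp_factor(program))       # 1.5
```

A chain such as p = f(a), q = g(p), r = h(q) gets one operation per step, so its factor is 1.0: no instruction level parallelism is available.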

Data-level parallelism (DLP)
• Data parallelism is parallelization across multiple processors in parallel computing environments.
• It focuses on distributing the data across different nodes, which operate on the data in parallel.
• Instructions from a single stream operate concurrently on several data items.

DLP - example
• Suppose we want to sum all the elements of a given array of size n, and a single addition takes Ta time units.
• With sequential execution, the process takes n*Ta time units.
• If we execute this job as a data-parallel job on 4 processors, the time taken reduces to (n/4)*Ta + merging overhead time units.
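A minimal Python sketch of the same idea, with hypothetical helper names. The four "processors" are simulated one after another, so it shows the data decomposition rather than real speedup. The timing functions follow the slide's model; charging the merge step (p - 1)*Ta, one addition per extra partial sum, is an assumption, since the slide only says "merging overhead".

```python
import math

# Sketch of data-level parallelism for an array sum. Each chunk's partial
# sum stands for the work of one processor; here they run sequentially.

def split(data, p):
    """Divide data into at most p contiguous chunks of near-equal size."""
    size = max(1, math.ceil(len(data) / p))
    return [data[i:i + size] for i in range(0, len(data), size)]

def data_parallel_sum(data, p=4):
    partials = [sum(chunk) for chunk in split(data, p)]  # one chunk per processor
    return sum(partials)                                  # merge step

def sequential_time(n, ta):
    return n * ta                                         # slide: n*Ta

def parallel_time(n, p, ta):
    # slide: (n/p)*Ta + merging overhead; overhead assumed to be (p - 1)*Ta
    return math.ceil(n / p) * ta + (p - 1) * ta

values = list(range(1, 101))        # n = 100
print(data_parallel_sum(values))    # 5050
print(sequential_time(100, 1))      # 100
print(parallel_time(100, 4, 1))     # 28
```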

DLP in adding elements of an array

DLP in matrix multiplication
• A[m × n] · B[n × k] can be finished in O(n) time instead of O(m·n·k) when executed in parallel using m·k processors, one per element of the m × k result.
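The O(n) claim holds because every element C[i][j] needs only row i of A and column j of B, so the m·k elements are independent tasks of n multiply-adds each. The Python sketch below (hypothetical names, tasks run sequentially for illustration) makes that task decomposition explicit; on a machine with m·k processors each call to `element_task` would run on its own processor.

```python
# Sketch: matrix multiplication as m*k independent data-parallel tasks.

def element_task(a, b, i, j):
    """Work of the (hypothetical) processor assigned to C[i][j]: n multiply-adds."""
    return sum(a[i][t] * b[t][j] for t in range(len(b)))

def matmul_data_parallel(a, b):
    m, k = len(a), len(b[0])
    tasks = [(i, j) for i in range(m) for j in range(k)]  # m*k independent tasks
    c = [[0] * k for _ in range(m)]
    for i, j in tasks:                                     # could all run at once
        c[i][j] = element_task(a, b, i, j)
    return c

A = [[1, 2, 3],
     [4, 5, 6]]          # 2 x 3
B = [[7, 8],
     [9, 10],
     [11, 12]]           # 3 x 2
print(matmul_data_parallel(A, B))  # [[58, 64], [139, 154]]
```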

• The locality of data references plays an important part in evaluating the performance of a data parallel programming model.
• Locality of data depends on the memory accesses performed by the program as well as the size of the cache.

Flynn’s Classification
• This taxonomy distinguishes multiprocessor computer architectures according to two independent dimensions: the instruction stream and the data stream.
• An instruction stream is the sequence of instructions executed by the machine.
• A data stream is a sequence of data, including inputs and partial or temporary results, used by the instruction stream.
• Each of these dimensions can have only one of two possible states: Single or Multiple.
• Flynn’s classification depends on the distinction between the performance of the control unit and the data processing unit rather than on their operational and structural interconnections.

Flynn’s Classification
• The four categories of Flynn’s classification are SISD, SIMD, MISD and MIMD

SISD
• They are also called scalar processors, i.e., one instruction at a time, and each instruction has only one set of operands.
• An SISD computer has one control unit, one processor unit and a single memory unit.
• Single instruction: only one instruction stream is being acted on by the CPU during any one clock cycle.
• Single data: only one data stream is being used as input during any one clock cycle.
• Deterministic execution.
• Instructions are executed sequentially.

SIMD
• A type of parallel computer.
• A single instruction is executed by different processing units on different sets of data.
• Single instruction: all processing units execute the same instruction issued by the control unit at any given clock cycle.
• Multiple data: each processing unit can operate on a different data element. As shown in the figure, the processors are connected to shared memory or an interconnection network that provides multiple data to the processing units.

MISD
• A single data stream is fed into multiple processing units.
• The same data flows through a linear array of processors executing different instruction streams.
• Each processing unit operates on the data independently via independent instructions.
• A single data stream is forwarded to different processing units, which are connected to different control units and execute the instructions given to them by the control unit to which they are attached.

MIMD
• Multiple Instruction: every processor may be executing a different instruction stream.
• Multiple Data: every processor may be working with a different data stream.
• Different processors each process a different task.
• Execution can be synchronous or asynchronous, deterministic or nondeterministic.
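Because the taxonomy has exactly two dimensions with two states each, it can be expressed as a small lookup. A Python sketch, where `flynn_class` is a hypothetical helper that only checks whether each stream count is one or more than one:

```python
# Sketch: Flynn's taxonomy as a function of the two dimensions.

def flynn_class(instruction_streams: int, data_streams: int) -> str:
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("each dimension needs at least one stream")
    i = "S" if instruction_streams == 1 else "M"   # Single or Multiple instruction
    d = "S" if data_streams == 1 else "M"          # Single or Multiple data
    return f"{i}I{d}D"

print(flynn_class(1, 1))   # SISD: conventional uniprocessor
print(flynn_class(1, 8))   # SIMD: one control unit, 8 processing units
print(flynn_class(4, 1))   # MISD: one data stream, several instruction streams
print(flynn_class(4, 4))   # MIMD: e.g. a multi-core CPU
```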

ARM Features

Thumb instruction set (T variant)

ARM Core dataflow model

Single-core computer

Single-core CPU chip
(figure: the single core)

Multi-core architectures
• Replicate multiple processor cores on a single die.
(figure: multi-core CPU chip with Core 1, Core 2, Core 3 and Core 4)

Multi-core CPU chip
• The cores fit on a single processor socket
• Also called CMP (Chip Multi-Processor)

The cores run in parallel
(figure: thread 1 to thread 4, each running on its own core, core 1 to core 4)

Within each core, threads are time-sliced (just like on a uniprocessor)
(figure: several threads on each of core 1 to core 4)

Instruction Encoding
• Remember that in a stored-program computer, instructions are stored in memory (just like data)
• Each instruction is fetched (according to the address specified in the PC), decoded, and executed by the CPU
• The ISA defines the format of an instruction (syntax) and its meaning (semantics)
• An ISA will define a number of different instruction formats.
• Each format has different fields
• The OPCODE field says what the instruction does (e.g. ADD)
• The OPERAND field(s) say where to find the inputs and outputs of the instruction.

MIPS Instruction Encoding
The nice thing about MIPS (and other RISC machines) is that it has very few
instruction formats (basically just 3)
• All instructions are the same size (32 bits = 1 word)
• The formats are consistent with each other (i.e. the OPCODE field is
always in the same place, etc.)
• The three formats:
1. I-type (immediate)
2. R-type (register)
3. J-type (jump)

I-type (immediate)
• An immediate instruction has the form:
XXXI rt, rs, immed
• Recall that we have 32 registers, so we need 5 bits each to specify the rt and rs registers
• We allow 6 bits for the opcode (this implies a maximum of 64 opcodes, but there are actually more, see later)
• This leaves 16 bits for the immediate field
Bits: 31-26 OPC | 25-21 rs | 20-16 rt | 15-0 immed

I-type Example
• Example:
ADDI $a0, $12, 33 # a0 <- r12 + 33
The ADDI opcode is 8, register a0 is register # 4
Bits: 31-26 = 8 | 25-21 = 12 | 20-16 = 4 | 15-0 = 33
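To check the field layout, the I-type word can be assembled by shifting each field into place. A minimal Python sketch (the `encode_itype` name is hypothetical) using the field widths above; negative immediates are stored as 16-bit two's complement:

```python
# Sketch: pack MIPS I-type fields opcode (6) | rs (5) | rt (5) | immediate (16).

def encode_itype(opcode, rs, rt, imm):
    assert 0 <= opcode < 64 and 0 <= rs < 32 and 0 <= rt < 32
    assert -32768 <= imm <= 65535, "immediate must fit in 16 bits"
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

# ADDI $a0, $12, 33  ->  opcode 8, rs = 12, rt = 4 ($a0), immed = 33
word = encode_itype(8, 12, 4, 33)
print(f"0x{word:08X}")   # 0x21840021
```

The same function also covers the load-store format, since LW uses the identical opcode | rs | rt | offset layout (e.g. `encode_itype(35, 29, 14, 8)` for `LW $14, 8($sp)`).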

Load-Store Formats
• A memory address is 32 bits, so it cannot be directly encoded in an instruction
• Recall the use of a base register + offset (16 bits) in the load-store instructions
• Thus, we need an OPCODE, a destination/source register (destination for load, source for store), a base register, and an offset
• This sounds very similar to the I-type format... example:
LW $14, 8($sp) # r14 is loaded from stack+8
• The LW opcode is 35 (0x23); $sp is register 29
Bits: 31-26 = 35 | 25-21 = 29 | 20-16 = 14 | 15-0 = 8
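The base + offset scheme can be sketched as follows. In MIPS the 16-bit offset field is sign-extended before it is added to the base register, so a single load or store reaches about ±32 KB around the base. The stack-pointer value and helper names in this Python sketch are made up for illustration:

```python
# Sketch: effective address = base register + sign-extended 16-bit offset.

def sign_extend_16(value):
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def effective_address(base, offset_field):
    return (base + sign_extend_16(offset_field)) & 0xFFFFFFFF

sp = 0x7FFFEFF0                                # hypothetical stack pointer
print(hex(effective_address(sp, 8)))           # 0x7fffeff8  (LW $14, 8($sp))
print(hex(effective_address(sp, 0xFFF8)))      # 0x7fffefe8  (field 0xFFF8 = -8)
```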
R-type (register) format
• General form:
XXX rd, rs, rt
• Arithmetic-logical and comparison instructions require the encoding of 3 registers; the remaining bits can be used to specify the OPCODE.
• To keep the format as regular as possible, the OPCODE has a primary “opcode” and a “function” field.
• We also need 5 bits for the shift amount, in case of SHIFT instructions.
• The 16 bits used for the immediate field in the I-type instruction are split into 5 bits for rd, 5 bits for shift amount, and 6 bits for function (the other fields are the same).
Bits: 31-26 OPC | 25-21 rs | 20-16 rt | 15-11 rd | 10-6 sht | 5-0 funct
R-type Example
• SUB $7, $8, $9 # r7 <- r8 - r9
• The opcode for all R-type instructions is zero, the function code for SUB is 34, and the shift amount is zero
Bits: 31-26 = 0 | 25-21 = 8 | 20-16 = 9 | 15-11 = 7 | 10-6 = 0 | 5-0 = 34
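A Python sketch that packs the six R-type fields (hypothetical `encode_rtype` helper) and reproduces the SUB example:

```python
# Sketch: pack R-type fields opcode (6) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6).
# All R-type instructions use opcode 0; funct selects the operation (34 = SUB).

def encode_rtype(rs, rt, rd, shamt, funct, opcode=0):
    for field, width in ((rs, 5), (rt, 5), (rd, 5), (shamt, 5), (funct, 6), (opcode, 6)):
        assert 0 <= field < (1 << width), "field does not fit its width"
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

# SUB $7, $8, $9  ->  rs = 8, rt = 9, rd = 7, shamt = 0, funct = 34
print(f"0x{encode_rtype(8, 9, 7, 0, 34):08X}")   # 0x01093822
```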

J-type (Jump) Format
• For a jump, we only need to specify the opcode, and we can use the other bits for an address:
Bits: 31-26 OPC | 25-0 address
• We only have 26 bits for the address, but MIPS addresses are 32 bits long...
• Because the address must reference an instruction, which is a word address, we can shift the address left by 2 bits (giving us 28 bits). We get the other 4 bits by combining with the 4 high-order bits of the PC (strictly, of PC + 4, the address of the following instruction).
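The address reconstruction can be sketched in Python as below. The helper names and the PC value are hypothetical. Following MIPS hardware, the sketch takes the 4 high-order bits from PC + 4 rather than from the jump's own address. The two differ only when the jump is the last word of a 256 MB region.

```python
# Sketch: rebuild a 32-bit jump target from the 26-bit address field.

def jump_target(pc, address_field):
    address_field &= (1 << 26) - 1            # only 26 bits are encoded
    upper = (pc + 4) & 0xF0000000             # 4 high-order bits of PC + 4
    return upper | (address_field << 2)       # 4 + 26 + 2 = 32 bits

def address_field_for(target):
    assert target % 4 == 0, "jump targets must be word aligned"
    return (target >> 2) & ((1 << 26) - 1)

pc = 0x00400018                               # hypothetical current PC
field = address_field_for(0x00400100)
print(hex(field))                             # 0x100040
print(hex(jump_target(pc, field)))            # 0x400100
```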

Branch Addressing
There are 2 kinds of branches:
1. EQ/NEQ family (compares 2 regs for (in)equality), example:
BEQ $14, $8, 1000
2. Compare-to-zero family (compares 1 reg to zero), example:
BGEZ $14, 1000
• Both “families” require an OPCODE, an rs register, and an offset
(1.) requires an additional register (rt)
(2.) requires some encoding for the comparison (>=, <, etc.)
Bits: 31-26 OPC | 25-21 rs | 20-16 rt, or code (for >, <, etc.) | 15-0 offset/4
Branch example
BEQ $14, $8, 1000 # PC := PC+1000 if r14==r8
BGEZ $14, 20 # PC := PC+20 if r14 >= 0
• The opcode for BEQ is 4; for BGEZ it is 1, and the code for >= is 1
BEQ: 31-26 = 4 | 25-21 = 14 | 20-16 = 8 | 15-0 = 250
BGEZ: 31-26 = 1 | 25-21 = 14 | 20-16 = 1 | 15-0 = 5
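A Python sketch that encodes the two examples as the slide does (the 16-bit field holds the byte offset divided by 4) and decodes a target back. Assumption: the offset is added to the PC itself, matching the slide's `PC := PC+1000`; real MIPS hardware adds it to PC + 4. Helper names are hypothetical.

```python
# Sketch: branch encoding opcode | rs | rt-or-code | offset/4 (16-bit, two's complement).

def encode_branch(opcode, rs, rt_or_code, byte_offset):
    assert byte_offset % 4 == 0, "branch offsets are multiples of 4"
    field = (byte_offset // 4) & 0xFFFF        # backward branches wrap to two's complement
    return (opcode << 26) | (rs << 21) | (rt_or_code << 16) | field

def branch_target(pc, word):
    field = word & 0xFFFF
    if field & 0x8000:                         # sign-extend the 16-bit field
        field -= 0x10000
    return pc + field * 4                      # slide's model: PC + offset

beq = encode_branch(4, 14, 8, 1000)    # BEQ  $14, $8, 1000
bgez = encode_branch(1, 14, 1, 20)     # BGEZ $14, 20  (code 1 means >=)
print(f"0x{beq:08X} 0x{bgez:08X}")     # 0x11C800FA 0x05C10005
print(branch_target(0x1000, beq))      # 5096  (0x1000 + 1000)
```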
Memory Load and Store Operation
Overview: ARM Load/Store Instructions
• The ARM is a Load/Store Architecture:
• Only load and store instructions can access memory
• Does not support memory to memory data processing operations.
• Must move data values into registers before using them.
Types of instructions
• ARM has three sets of instructions which interact with main memory. These are:
• Single register data transfer (LDR/STR)
• Block data transfer (LDM/STM)
• Single Data Swap (SWP)
Basic Load and Store Instruction Syntax and Example
• Memory system must support all access sizes
• Syntax:
- LDR{<cond>}{<size>} Rd, <address>
- STR{<cond>}{<size>} Rd, <address>
e.g.
- LDR R0, [R1]
- STR R0, [R1]
- LDREQB R0, [R1]
Load Operation
Data Transfer: Memory to Register (load)
• To transfer a word of data, we need to specify two
things:
• Register: r0-r15
• Memory address: more difficult
-Think of memory as a single one-dimensional
array, so we can address it simply by supplying a
pointer to a memory address.
-There are times when we will want to offset
from this pointer.
Case study: ARM 5 and ARM 7 Architecture
Data Sizes and Instruction Sets
■ The ARM is a 32-bit architecture.
■ When used in relation to the ARM:
■ Byte means 8 bits
■ Halfword means 16 bits (two bytes)
■ Word means 32 bits (four bytes)
■ Most ARMs implement two instruction sets:
■ 32-bit ARM Instruction Set
■ 16-bit Thumb Instruction Set
■ Jazelle cores can also execute Java bytecode

Processor Modes
■ The ARM has seven basic operating modes:

■ User : unprivileged mode under which most tasks run

■ FIQ : entered when a high priority (fast) interrupt is raised

■ IRQ : entered when a low priority (normal) interrupt is raised

■ Supervisor : entered on reset and when a Software Interrupt instruction is executed

■ Abort : used to handle memory access violations

■ Undef : used to handle undefined instructions

■ System : privileged mode using the same registers as user mode


The ARM Register Set
Current visible registers (Abort mode shown): r0-r12, r13 (sp), r14 (lr), r15 (pc), cpsr, spsr
Banked out registers (the other modes' private copies):
■ User: r13 (sp), r14 (lr)
■ FIQ: r8-r12, r13 (sp), r14 (lr), spsr
■ IRQ: r13 (sp), r14 (lr), spsr
■ SVC: r13 (sp), r14 (lr), spsr
■ Undef: r13 (sp), r14 (lr), spsr
Register Organization Summary
■ User mode: r0-r14, r15 (pc) and cpsr
■ FIQ: User mode r0-r7, r15 and cpsr, plus banked r8-r12, r13 (sp), r14 (lr) and spsr
■ IRQ, SVC, Undef, Abort: User mode r0-r12, r15 and cpsr, plus banked r13 (sp), r14 (lr) and spsr
■ Thumb state: Low registers r0-r7; High registers r8-r15

Note: System mode uses the User mode register set


The Registers
■ ARM has 37 registers, all of which are 32 bits long.
■ 1 dedicated program counter
■ 1 dedicated current program status register
■ 5 dedicated saved program status registers
■ 30 general purpose registers
■ The current processor mode governs which of several banks is accessible.
Each mode can access
■ a particular set of r0-r12 registers
■ a particular r13 (the stack pointer, sp) and r14 (the link register, lr)
■ the program counter, r15 (pc)
■ the current program status register, cpsr
Privileged modes (except System) can also access
■ a particular spsr (saved program status register)
Program Status Registers
Bits: 31 N | 30 Z | 29 C | 28 V | 27 Q | 26-25 undefined | 24 J | 23-8 undefined | 7 I | 6 F | 5 T | 4-0 mode
Fields: f = bits 31-24, s = bits 23-16, x = bits 15-8, c = bits 7-0
• Condition code flags
  • N = Negative result from ALU
  • Z = Zero result from ALU
  • C = ALU operation Carried out
  • V = ALU operation oVerflowed
• Sticky Overflow flag - Q flag
  • Architecture 5TE/J only
  • Indicates if saturation has occurred
• J bit
  • Architecture 5TEJ only
  • J = 1: Processor in Jazelle state
• Interrupt Disable bits
  • I = 1: Disables the IRQ.
  • F = 1: Disables the FIQ.
• T Bit
  • Architecture xT only
  • T = 0: Processor in ARM state
  • T = 1: Processor in Thumb state
• Mode bits
  • Specify the processor mode
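A Python sketch that pulls the individual fields out of a CPSR/SPSR value using the bit positions above. The mode-number table uses the standard ARM encodings for the seven processor modes (User 0b10000 through System 0b11111); `decode_psr` and the example value are hypothetical.

```python
# Sketch: decode the fields of a program status register value.

MODES = {0b10000: "User", 0b10001: "FIQ", 0b10010: "IRQ", 0b10011: "Supervisor",
         0b10111: "Abort", 0b11011: "Undef", 0b11111: "System"}

def decode_psr(psr):
    def bit(n):
        return (psr >> n) & 1
    return {
        "N": bit(31), "Z": bit(30), "C": bit(29), "V": bit(28),   # condition flags
        "Q": bit(27), "J": bit(24),                                # sticky overflow, Jazelle
        "I": bit(7), "F": bit(6), "T": bit(5),                     # IRQ/FIQ disable, Thumb
        "mode": MODES.get(psr & 0x1F, "reserved"),                 # bits 4-0
    }

# Example: Z and C set, IRQ and FIQ disabled, Thumb state, Supervisor mode
print(decode_psr(0x600000F3))
```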
Program Counter (r15)
■ When the processor is executing in ARM state:
■ All instructions are 32 bits wide
■ All instructions must be word aligned
■ Therefore the pc value is stored in bits [31:2] with bits [1:0] undefined (as instructions cannot be halfword or byte aligned).
■ When the processor is executing in Thumb state:
■ All instructions are 16 bits wide
■ All instructions must be halfword aligned
■ Therefore the pc value is stored in bits [31:1] with bit [0] undefined (as instructions cannot be byte aligned).

■ When the processor is executing in Jazelle state:


■ All instructions are 8 bits wide
■ Processor performs a word access to read 4 instructions at once
Development of the ARM Architecture
■ 1-3: Early ARM architectures
■ 4: Halfword and signed halfword / byte support; System mode (SA-110, SA-1110)
■ 4T: Thumb instruction set (ARM7TDMI, ARM720T, ARM9TDMI, ARM940T)
■ 5TE: Improved ARM/Thumb interworking; CLZ; saturated maths; DSP multiply-accumulate instructions (XScale, ARM9E-S, ARM966E-S, ARM1020E)
■ 5TEJ: Jazelle Java bytecode execution (ARM7EJ-S, ARM9EJ-S, ARM926EJ-S, ARM1026EJ-S)
■ 6: SIMD instructions; multi-processing; V6 memory architecture (VMSA); unaligned data support (ARM1136EJ-S)
Conditional Execution and Flags
• ARM instructions can be made to execute conditionally by postfixing them with the appropriate condition code field.
• This improves code density and performance by reducing the number of forward branch instructions.
Using a branch:
    CMP r3,#0
    BEQ skip
    ADD r0,r1,r2
skip
Using conditional execution:
    CMP r3,#0
    ADDNE r0,r1,r2
• By default, data processing instructions do not affect the condition code flags, but the flags can optionally be set by using “S”. CMP does not need “S”.
loop
    SUBS r1,r1,#1    ; decrement r1 and set flags
    BNE loop         ; if Z flag clear then branch
Condition Codes
• The possible condition codes are listed below. Note: AL is the default and does not need to be specified.

Suffix   Description                 Flags tested
EQ       Equal                       Z=1
NE       Not equal                   Z=0
CS/HS    Unsigned higher or same     C=1
CC/LO    Unsigned lower              C=0
MI       Minus                       N=1
PL       Positive or Zero            N=0
VS       Overflow                    V=1
VC       No overflow                 V=0
HI       Unsigned higher             C=1 & Z=0
LS       Unsigned lower or same      C=0 or Z=1
GE       Greater or equal            N=V
LT       Less than                   N!=V
GT       Greater than                Z=0 & N=V
LE       Less than or equal          Z=1 or N!=V
AL       Always
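The table can be turned into a small evaluator. The Python sketch below (hypothetical names) also includes a `cmp_flags` helper that sets N, Z, C and V the way CMP does, i.e. as for a subtraction Rn - Operand2 on 32-bit values, with C meaning "no borrow".

```python
# Sketch: compute CMP flags and test condition-code suffixes against them.

MASK = 0xFFFFFFFF

def cmp_flags(rn, op2):
    rn, op2 = rn & MASK, op2 & MASK
    result = (rn - op2) & MASK
    n = result >> 31                                # sign bit of the result
    z = int(result == 0)
    c = int(rn >= op2)                              # carry = NOT borrow
    v = ((rn ^ op2) & (rn ^ result)) >> 31          # signed overflow
    return n, z, c, v

def condition_passed(cond, flags):
    n, z, c, v = flags
    table = {
        "EQ": z == 1, "NE": z == 0,
        "CS": c == 1, "HS": c == 1, "CC": c == 0, "LO": c == 0,
        "MI": n == 1, "PL": n == 0, "VS": v == 1, "VC": v == 0,
        "HI": c == 1 and z == 0, "LS": c == 0 or z == 1,
        "GE": n == v, "LT": n != v,
        "GT": z == 0 and n == v, "LE": z == 1 or n != v,
        "AL": True,
    }
    return table[cond]

flags = cmp_flags(-5, 3)                    # CMP r0,#3 with r0 = -5
print(condition_passed("LT", flags))        # True  (signed: -5 < 3)
print(condition_passed("HI", flags))        # True  (unsigned: 0xFFFFFFFB > 3)
```

The second print shows why ARM has separate signed (GE/LT/GT/LE) and unsigned (HS/LO/HI/LS) conditions: the same flags give different answers for the two interpretations.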
Examples of conditional execution
• Use a sequence of several conditional instructions
if (a==0) func(1);
CMP r0,#0
MOVEQ r0,#1
BLEQ func

• Set the flags, then use various condition codes
if (a==0) x=0;
if (a>0) x=1;
CMP r0,#0
MOVEQ r1,#0
MOVGT r1,#1

• Use conditional compare instructions
if (a==4 || a==10) x=0;
CMP r0,#4
CMPNE r0,#10
MOVEQ r1,#0
Branch instructions
• Branch : B{<cond>} label
• Branch with Link : BL{<cond>} subroutine_label
Bits: 31-28 Cond (condition field) | 27-25 101 | 24 L (link bit: 0 = Branch, 1 = Branch with link) | 23-0 Offset
• The processor core shifts the offset field left by 2 positions, sign-extends it and adds it to the PC
• ± 32 Mbyte range
• How to perform longer branches?
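A Python sketch of the branch encoding. Two assumptions are made. The condition defaults to AL (0b1110). In ARM state the PC value seen by the branch is its own address plus 8 (because of the pipeline), so the stored offset is (target - (address + 8)) / 4. The 24-bit signed word offset is what gives the ±32 MB range. Helper names are hypothetical.

```python
# Sketch: encode B/BL as cond (4) | 101 (3) | L (1) | offset (24) and decode the target.

COND_AL = 0b1110

def encode_b(address, target, link=False, cond=COND_AL):
    delta = target - (address + 8)             # PC reads as address + 8 in ARM state
    assert delta % 4 == 0, "ARM targets are word aligned"
    offset = delta >> 2
    assert -(1 << 23) <= offset < (1 << 23), "outside the +/-32 MB range"
    return (cond << 28) | (0b101 << 25) | (int(link) << 24) | (offset & 0xFFFFFF)

def decode_target(address, word):
    offset = word & 0xFFFFFF
    if offset & 0x800000:                      # sign-extend 24 bits
        offset -= 1 << 24
    return address + 8 + (offset << 2)         # shift left by 2, add to PC

print(f"0x{encode_b(0x8000, 0x8000):08X}")              # 0xEAFFFFFE (branch to itself)
print(f"0x{encode_b(0x8000, 0x9000, link=True):08X}")   # 0xEB0003FE (BL to 0x9000)
```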
Data processing Instructions
• Consist of :
• Arithmetic: ADD ADC SUB SBC RSB RSC

• Logical: AND ORR EOR BIC

• Comparisons: CMP CMN TST TEQ

• Data movement: MOV MVN

• These instructions only work on registers, NOT memory.

• Syntax:

<Operation>{<cond>}{S} Rd, Rn, Operand2

• Comparisons set flags only - they do not specify Rd
• Data movement does not specify Rn
• Second operand is sent to the ALU via the barrel shifter.

The Barrel Shifter
• LSL: Logical Shift Left. Bits move towards the MSB, zeros enter at the LSB and the last bit shifted out goes to CF. Multiplication by a power of 2.
• LSR: Logical Shift Right. Zeros enter at the MSB and the last bit shifted out goes to CF. Division by a power of 2.
• ASR: Arithmetic Shift Right. The sign bit is copied into the MSB and the last bit shifted out goes to CF. Division by a power of 2, preserving the sign bit.
• ROR: Rotate Right. Bit rotate with wrap around from LSB to MSB.
• RRX: Rotate Right Extended. Single bit rotate with wrap around from CF to MSB.
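A Python sketch of the five shifter operations on 32-bit values, each returning the result and the carry-out bit. To keep it short, shift amounts are assumed to lie in 1..31 (the special encodings for shifts of 0 and 32 are not modelled), and the function names are hypothetical.

```python
# Sketch: barrel-shifter operations on 32-bit values, returning (result, carry_out).
# Assumes 0 <= x <= 0xFFFFFFFF and 1 <= n <= 31.

MASK = 0xFFFFFFFF

def lsl(x, n):
    return (x << n) & MASK, (x >> (32 - n)) & 1      # last bit out of the MSB end

def lsr(x, n):
    return x >> n, (x >> (n - 1)) & 1                # zeros enter at the MSB

def asr(x, n):
    signed = x - (1 << 32) if x & 0x80000000 else x  # interpret as two's complement
    return (signed >> n) & MASK, (x >> (n - 1)) & 1  # sign bit is replicated

def ror(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK, (x >> (n - 1)) & 1

def rrx(x, carry_in):
    return (carry_in << 31) | (x >> 1), x & 1        # CF into MSB, LSB into CF

print(hex(lsl(5, 3)[0]))            # 0x28        (5 * 2**3)
print(hex(asr(0xFFFFFFF0, 2)[0]))   # 0xfffffffc  (-16 / 4 = -4)
print(hex(ror(0x000000FF, 8)[0]))   # 0xff000000
```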
Using the Barrel Shifter: The Second Operand
Operand 1 goes straight to the ALU; Operand 2 passes through the barrel shifter before reaching the ALU, which produces the Result.
• Register, optionally with shift operation
  • Shift value can be either:
    • a 5-bit unsigned integer
    • specified in the bottom byte of another register
  • Used for multiplication by a constant
• Immediate value
  • 8-bit number, with a range of 0-255
  • Rotated right through an even number of positions
  • Allows an increased range of 32-bit constants to be loaded directly into registers
Immediate constants
• Examples (8-bit value rotated right by an even amount):
• ror #0: range 0-0x000000ff, step 0x00000001
• ror #8: range 0-0xff000000, step 0x01000000
• ror #30: range 0-0x000003fc, step 0x00000004
• The assembler converts immediate values to the rotate form:
• MOV r0,#4096 ; uses 0x40 ror 26
• ADD r1,r2,#0xFF0000 ; uses 0xFF ror 16
• The bitwise complements can also be formed using MVN:
• MOV r0, #0xFFFFFFFF ; assembles to MVN r0,#0
• Values that cannot be generated in this way will cause an error.
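How the assembler finds the rotate form can be sketched as a search over the 16 even rotations. This Python sketch (hypothetical names) returns the first (imm8, rotation) pair it finds, starting from the smallest rotation. Several pairs can encode the same value. For 4096 the sketch finds 0x01 ror 20, while the slide quotes the equally valid 0x40 ror 26, so the result need not match what a particular assembler emits.

```python
# Sketch: find imm8 and an even rotation such that imm8 ror rotation == value.

MASK = 0xFFFFFFFF

def ror32(x, n):
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK

def encode_immediate(value):
    value &= MASK
    for rot in range(0, 32, 2):                 # only even rotations are encodable
        imm8 = ror32(value, 32 - rot)           # undo "ror rot" (a rotate left by rot)
        if imm8 <= 0xFF:
            return imm8, rot
    return None                                 # not encodable: try MVN or LDR =

def decode_immediate(imm8, rot):
    return ror32(imm8, rot)

print(encode_immediate(4096))        # (1, 20): 0x01 ror 20 == 4096
print(decode_immediate(0x40, 26))    # 4096, the pair quoted on the slide
print(encode_immediate(0xFF0000))    # (255, 16): 0xFF ror 16
print(encode_immediate(0xFFFFFFFF))  # None -> assembler uses MVN r0,#0
```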


Loading 32 bit constants
• To allow larger constants to be loaded, the assembler offers a pseudo-instruction:
• LDR rd, =const
• This will either:
• Produce a MOV or MVN instruction to generate the value (if possible), or
• Generate an LDR instruction with a PC-relative address to read the constant from a literal pool (a constant data area embedded in the code).
• For example:
• LDR r0,=0xFF => MOV r0,#0xFF
• LDR r0,=0x55555555 => LDR r0,[PC,#Imm12], with the literal pool entry DCD 0x55555555
• This is the recommended way of loading constants into a register
Multiply
• Syntax:
• MUL{<cond>}{S} Rd, Rm, Rs Rd = Rm * Rs
• MLA{<cond>}{S} Rd,Rm,Rs,Rn Rd = (Rm * Rs) + Rn
• [U|S]MULL{<cond>}{S} RdLo, RdHi, Rm, Rs RdHi,RdLo := Rm*Rs
• [U|S]MLAL{<cond>}{S} RdLo, RdHi, Rm, Rs RdHi,RdLo := (Rm*Rs)+RdHi,RdLo

• Cycle time
• Basic MUL instruction
• 2-5 cycles on ARM7TDMI
• 1-3 cycles on StrongARM/XScale
• 2 cycles on ARM9E/ARM102xE
• +1 cycle for ARM9TDMI (over ARM7TDMI)
• +1 cycle for accumulate (not on 9E though result delay is one cycle longer)
• +1 cycle for “long”

• Above are “general rules” - refer to the TRM for the core you are using for the
exact details
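The long multiplies produce a 64-bit product that is split across two registers. A Python sketch of what UMULL and SMULL compute (hypothetical helper names; the accumulating MLAL forms would add the existing RdHi:RdLo value before splitting):

```python
# Sketch: 32 x 32 -> 64-bit multiplies, returning (RdLo, RdHi).

MASK32 = 0xFFFFFFFF

def to_signed(x):
    return x - (1 << 32) if x & 0x80000000 else x

def umull(rm, rs):
    product = (rm & MASK32) * (rs & MASK32)                   # unsigned operands
    return product & MASK32, (product >> 32) & MASK32

def smull(rm, rs):
    product = to_signed(rm & MASK32) * to_signed(rs & MASK32) # signed operands
    product &= (1 << 64) - 1                                  # 64-bit two's complement
    return product & MASK32, product >> 32

print([hex(r) for r in umull(0xFFFFFFFF, 2)])   # ['0xfffffffe', '0x1']
print([hex(r) for r in smull(0xFFFFFFFF, 2)])   # ['0xfffffffe', '0xffffffff']  (-1 * 2 = -2)
```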
Single register data transfer
LDR STR Word
LDRB STRB Byte
LDRH STRH Halfword
LDRSB Signed byte load
LDRSH Signed halfword load

• Memory system must support all access sizes

• Syntax:
• LDR{<cond>}{<size>} Rd, <address>
• STR{<cond>}{<size>} Rd, <address>

e.g. LDREQB
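The difference between the zero-extending and sign-extending loads is easiest to see on concrete bytes. This Python sketch models memory as a `bytearray` and assumes little-endian byte order (ARM cores can also be configured big-endian); the memory contents and helper names are made up for illustration.

```python
# Sketch: single-register load/store sizes on a little-endian byte-addressed memory.

mem = bytearray(16)
mem[0:4] = bytes([0x80, 0xFF, 0x34, 0x12])     # word at address 0 = 0x1234FF80

def ldr(addr):                                  # word
    return int.from_bytes(mem[addr:addr + 4], "little")

def ldrb(addr):                                 # zero-extended byte
    return mem[addr]

def ldrh(addr):                                 # zero-extended halfword
    return int.from_bytes(mem[addr:addr + 2], "little")

def ldrsb(addr):                                # sign-extended byte
    return int.from_bytes(mem[addr:addr + 1], "little", signed=True) & 0xFFFFFFFF

def ldrsh(addr):                                # sign-extended halfword
    return int.from_bytes(mem[addr:addr + 2], "little", signed=True) & 0xFFFFFFFF

def strb(addr, value):                          # store the low byte only
    mem[addr] = value & 0xFF

print(hex(ldr(0)))     # 0x1234ff80
print(hex(ldrb(0)))    # 0x80
print(hex(ldrsb(0)))   # 0xffffff80
print(hex(ldrh(0)))    # 0xff80
print(hex(ldrsh(0)))   # 0xffffff80
```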
Address accessed
• The address accessed by LDR/STR is specified by a base register plus an offset
• For word and unsigned byte accesses, the offset can be
• An unsigned 12-bit immediate value (i.e. 0 - 4095 bytes).
LDR r0,[r1,#8]
• A register, optionally shifted by an immediate value
LDR r0,[r1,r2]
LDR r0,[r1,r2,LSL#2]
• This can be either added to or subtracted from the base register:
LDR r0,[r1,#-8]
LDR r0,[r1,-r2]
LDR r0,[r1,-r2,LSL#2]
• For halfword and signed halfword / byte, the offset can be:
• An unsigned 8-bit immediate value (i.e. 0-255 bytes).
• A register (unshifted).
• Choice of pre-indexed or post-indexed addressing
Pre or Post Indexed Addressing?
• Pre-indexed: STR r0,[r1,#12]
■ Base register r1 = 0x200, offset = 12; source register for STR r0 = 0x5.
■ r0 is stored at address 0x200 + 12 = 0x20c; r1 keeps the value 0x200.
Auto-update form: STR r0,[r1,#12]!
■ As above, but r1 is updated to 0x20c afterwards.
■ Post-indexed: STR r0,[r1],#12
■ r0 = 0x5 is stored at the original base address 0x200.
■ The base register is then updated: r1 = 0x200 + 12 = 0x20c.
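The three forms can be replayed with the slide's values (r1 = 0x200, r0 = 0x5, offset 12). In this Python sketch, registers and memory are plain dictionaries and `store` is a hypothetical helper:

```python
# Sketch: STR addressing forms. mode: 'pre' = [rn,#off], 'pre!' = [rn,#off]!, 'post' = [rn],#off

def store(regs, mem, rd, rn, offset, mode):
    base = regs[rn]
    addr = base + offset if mode in ("pre", "pre!") else base   # post-index uses the base
    mem[addr] = regs[rd]
    if mode in ("pre!", "post"):                                 # write back to the base
        regs[rn] = base + offset
    return addr

for mode in ("pre", "pre!", "post"):
    regs, mem = {"r0": 0x5, "r1": 0x200}, {}
    addr = store(regs, mem, "r0", "r1", 12, mode)
    print(mode, hex(addr), hex(regs["r1"]))
# prints: pre 0x20c 0x200   (STR r0,[r1,#12])
#         pre! 0x20c 0x20c  (STR r0,[r1,#12]!)
#         post 0x200 0x20c  (STR r0,[r1],#12)
```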
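LDM / STM operation
• Syntax:
<LDM|STM>{<cond>}<addressing_mode> Rb{!}, <register list>
• 4 addressing modes:
LDMIA / STMIA increment after
LDMIB / STMIB increment before
LDMDA / STMDA decrement after
LDMDB / STMDB decrement before
• Example: LDMxx r10, {r0,r1,r4} / STMxx r10, {r0,r1,r4}, with base register Rb = r10. In every mode the lowest-numbered register (r0) is at the lowest address:
■ IA: r0, r1, r4 at r10, r10+4, r10+8
■ IB: r0, r1, r4 at r10+4, r10+8, r10+12
■ DA: r0, r1, r4 at r10-8, r10-4, r10
■ DB: r0, r1, r4 at r10-12, r10-8, r10-4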
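A Python sketch that computes the addresses each mode uses for a register list and the written-back base value. It relies on the ARM rule that the lowest-numbered register always goes to the lowest address; `block_addresses` is a hypothetical helper and the base value 0x1000 is made up.

```python
# Sketch: addresses used by LDM/STM in each addressing mode, plus the Rb! writeback value.

def block_addresses(base, regs, mode):
    n = len(regs)
    start = {"IA": base, "IB": base + 4,
             "DA": base - 4 * n + 4, "DB": base - 4 * n}[mode]
    ordered = sorted(regs, key=lambda r: int(r[1:]))          # r0 < r1 < r4 ...
    addresses = {r: start + 4 * i for i, r in enumerate(ordered)}
    new_base = base + 4 * n if mode in ("IA", "IB") else base - 4 * n
    return addresses, new_base

for mode in ("IA", "IB", "DA", "DB"):
    addrs, wb = block_addresses(0x1000, ["r0", "r1", "r4"], mode)
    print(mode, {r: hex(a) for r, a in addrs.items()}, "writeback ->", hex(wb))
```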
Software Interrupt (SWI)
Bits: 31-28 Cond (condition field) | 27-24 1111 | 23-0 SWI number (ignored by processor)
• Causes an exception trap to the SWI hardware vector
• The SWI handler can examine the SWI number to decide what operation has been requested.
• By using the SWI mechanism, an operating system can implement a set of privileged operations which applications running in user mode can request.
• Syntax:
• SWI{<cond>} <SWI number>
PSR Transfer Instructions
Bits: 31 N | 30 Z | 29 C | 28 V | 27 Q | 26-25 undefined | 24 J | 23-8 undefined | 7 I | 6 F | 5 T | 4-0 mode
Fields: f = bits 31-24, s = bits 23-16, x = bits 15-8, c = bits 7-0
• MRS and MSR allow the contents of CPSR / SPSR to be transferred to / from a general purpose register.
• Syntax:
• MRS{<cond>} Rd,<psr> ; Rd = <psr>
• MSR{<cond>} <psr[_fields]>,Rm ; <psr[_fields]> = Rm
where
• <psr> = CPSR or SPSR
• [_fields] = any combination of ‘fsxc’
• Also an immediate form
• MSR{<cond>} <psr_fields>,#Immediate
• In User Mode, all bits can be read but only the condition flags (_f) can be written.
ARM Branches and Subroutines
• B <label>
• PC relative. ±32 Mbyte range.
• BL <subroutine>
• Stores return address in LR
• Returning implemented by restoring the PC from LR
• For non-leaf functions, LR will have to be stacked

Caller:
    :
    BL func1
    :
func1:
    STMFD sp!,{regs,lr}
    :
    BL func2
    :
    LDMFD sp!,{regs,pc}
func2:
    :
    MOV pc, lr
Thumb
• Thumb is a 16-bit instruction set
• Optimised for code density from C code (~65% of ARM code size)
• Improved performance from narrow memory
• Subset of the functionality of the ARM instruction set
• Core has additional execution state - Thumb
• Switch between ARM and Thumb using BX instruction
Example: the 32-bit ARM instruction ADDS r2,r2,#1 (bits 31-0) corresponds to the 16-bit Thumb instruction ADD r2,#1 (bits 15-0).
For most instructions generated by the compiler:
■ Conditional execution is not used
■ Source and destination registers are identical
■ Only Low registers are used
■ Constants are of limited size
■ The inline barrel shifter is not used
Example ARM-based System
(figure) Components shown: ARM core; interrupt controller (nIRQ, nFIQ); 16-bit RAM, 32-bit RAM and 8-bit ROM; I/O peripherals; arbiter; reset; ARM TIC; remap/pause; timer; external bus interface to external ROM and RAM; on-chip RAM; decoder; bridge between the system bus (AHB or ASB) and the peripheral bus (APB).
• AMBA: Advanced Microcontroller Bus Architecture
• ADK: Complete AMBA Design Kit
• ACT: AMBA Compliance Testbench
• PrimeCell: ARM’s AMBA compliant peripherals
