02 - CH_2_ARM_Processor Architecture(6) (1)
Spring 2023
References
Tahir, Muhammad, and Kashif Javed. ARM Microprocessor Systems: Cortex-M Architecture, Programming, and Interfacing. CRC Press, 2017, Chapters 1 and 2.
Cortex-M3 Devices Generic User Guide: https://developer.arm.com/documentation/dui0552/a
Arm University Program Education Kits
Review/ Discussion
What are examples of traditional
microprocessors?
Computer performance metrics?
RISC vs CISC architecture?
Topics
An introduction to:
Programming languages
Inside a microprocessor (a simple explanation)
ARM architecture
Programming Languages
Machine Languages
Computers only understand 0s and 1s!
Early computers were programmed in machine language
A machine language is defined by the hardware architecture
Examples of instructions (assembly form and machine code):
Assembly    Machine code
ADD x,y     100101    (ADD: opcode; x, y: operands)
LD x,1      100100
SUB x,y     100011
High-level Languages
Developed to speed up the programming process
A high-level language is a programming language that combines algebraic expressions and English words.
Instructions written in these languages are known as statements.
High-level languages are much closer to human language than machine language is.
Examples include Basic, FORTRAN, COBOL, Pascal, C, C++, C#, and Java
The equation wages = rate * hours can be written in C++
as:
wages = rate * hours;
Compiler/ assembler
Software that translates a high-level language into machine language is called a compiler.
Software that translates assembly language into machine language is called an assembler.
Your code in assembly or high-level language -> compiler / assembler -> your code in machine language
What is a program? Why write a program?
A program is a set of instructions that is translated into 0s and 1s so that the machine can understand it.
ARM Microcontroller components
1. Microprocessor core
2. Bus system and bus matrix
3. Memory and peripherals
4. Debug system
5. Nested vectored interrupt controller (NVIC)
[Figure: block diagram of a Cortex-M3/M4 microcontroller showing the microprocessor core, NVIC, debug interface, bus, memory, peripherals, and I/Os]
What’s inside of a microprocessor?
ALU: Arithmetic and Logic Unit
How is data transferred?
Register access is faster than memory access, so keeping operands in registers gives higher performance
The Processor Datapath
Datapath: the elements that process data and that move and address data within the processor
Operands are stored in the
registers
Each register has 32 bits
Data bus is also 32 bits
The Processor: Instruction Memory (Harvard architecture)
Harvard architecture: separate memories for data and instructions
Von Neumann architecture: one memory for both data and instructions (Example: 8085)
The Processor: Instruction execution cycles
Step 1: Read the program counter (PC)
Step 2: Fetch the Instruction
Step 3: Increment the PC (PC = PC + 4)
Step 4: Access required registers from register file
Step 5: Perform the operation in ALU
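The five steps above can be sketched as a toy simulation in C. This is only an illustration under stated assumptions: the two-operation instruction format, the four-entry register file, and the names toy_cpu, toy_instr, and toy_step are hypothetical and are not the real ARM encoding. Each instruction is assumed to occupy one 4-byte slot, so the PC advances by 4.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy machine: 4 registers and two operations. */
enum toy_op { TOY_ADD, TOY_SUB };

struct toy_instr {
    enum toy_op op;
    uint8_t rd, rn, rm;   /* destination and two source register indices */
};

struct toy_cpu {
    uint32_t pc;          /* program counter, in bytes */
    uint32_t regs[4];     /* register file */
};

/* Executes one instruction from imem, following the five steps. */
static void toy_step(struct toy_cpu *cpu, const struct toy_instr *imem)
{
    uint32_t addr = cpu->pc;                 /* Step 1: read the PC */
    struct toy_instr ins = imem[addr / 4];   /* Step 2: fetch the instruction */
    cpu->pc = addr + 4;                      /* Step 3: PC = PC + 4 */

    assert(ins.rd < 4 && ins.rn < 4 && ins.rm < 4);
    uint32_t a = cpu->regs[ins.rn];          /* Step 4: read the register file */
    uint32_t b = cpu->regs[ins.rm];

    /* Step 5: perform the operation in the ALU, then write the result back */
    cpu->regs[ins.rd] = (ins.op == TOY_ADD) ? (a + b) : (a - b);
}
```

Calling toy_step repeatedly on an array of toy_instr values walks through the program one instruction at a time.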
What is an instruction?
A binary pattern that specifies which task the microprocessor must perform
All ARM instructions have a fixed bit width (32 bits in ARM state, 16 bits in Thumb state)
These binary patterns are different for every microprocessor architecture
The collection of instructions is called the instruction set
Each instruction is made up of an opcode and, typically, one to three operands.
Examples of instructions:
ADD R1, R2
SUB R1, R2, R3
Here ADD and SUB are the opcodes (instruction mnemonics) and R1-R3 are the operands.
Instruction Set Architecture (ISA)
ISAs can be classified by how instructions access their operands.
Memory-Memory: This type of ISA allows more than one
operand of most instructions to be specified in memory.
Example: PDP series
Register-Memory: These architectures allow one operand of an
instruction to be specified in memory, while the other operand is in
CPU register.
Example: x86, Motorola 68k
Register-Register: Also known as load-store architecture.
Most instructions in this ISA are NOT allowed to access memory directly.
Only specific instructions, called load and store instructions, are
responsible for any data movement between registers and memory.
Example: ARM, MIPS, RISC-V
ARM Instruction Set Architecture (ISA)
ARM supports two different instruction sets:
ARM instructions: 32 bits
Thumb instructions: 16 bits (A subset of ARM instructions)
Offers flexibility
Suitable for applications with limited memory
The Thumb-2 instruction set is an enhancement of the 16-bit Thumb instruction set.
It adds 32-bit instructions that can be freely intermixed with 16-bit instructions in a program.
The additional 32-bit instructions enable Thumb-2 to cover the functionality of the ARM instructions. However, 32-bit Thumb instructions are not an exact copy of the 32-bit ARM instructions.
The Cortex-M3 supports Thumb-2 and Thumb, but does not support ARM instructions.
Arm Architectures and Processors
Arm architecture is a family of RISC-based processor architectures
Well known for its power efficiency
Widely used in mobile devices, e.g. smartphones and tablets
Designed and licensed by Arm to a wide ecosystem of partners
ARM Holdings
The company that designs Arm-based processors
Arm does not manufacture, but it licenses designs to semiconductor
partners, who add their own intellectual property (IP) on top of Arm’s IP,
which they then fabricate and sell to customers
Arm also offers IP other than processors, such as physical IPs, interconnect
IPs, graphics cores and development tools
Arm Processor Families
Cortex-A series (Application)
High performance processors capable of full operating system (OS) support
Applications include smartphones, digital TV, smart books
Cortex-A processors: Cortex-A5, Cortex-A7, Cortex-A8, Cortex-A9, Cortex-A15, Cortex-A17, Cortex-A32, Cortex-A34, Cortex-A35, Cortex-A53, Cortex-A55, Cortex-A57, Cortex-A65, Cortex-A65AE, Cortex-A72, Cortex-A73, Cortex-A75, Cortex-A75AE, Cortex-A76
[Figure: an Arm-based SoC. Recoverable labels: Arm processor (Cortex-A9, Cortex-R5, Cortex-M4, Arm7, Arm9, Arm11), IP libraries, ROM, RAM, system bus (AXI bus, AHB bus, APB bus), DRAM ctrl, FLASH ctrl, SRAM ctrl, peripherals, external interface, GPIO, I/O blocks, timer]
Smaller code
Lower silicon costs
Ease of use
Faster software development and reuse
Embedded applications
Smart metering, human interface devices, automotive and industrial
control systems, white goods, consumer products and medical
instrumentation
Arm Cortex-M Series Family
Processor  | Arm Architecture | Core Architecture | Thumb  | Thumb-2 | Hardware Multiply | Hardware Divide | Saturated Math | DSP Extensions | Floating Point
Cortex-M0  | Armv6-M          | Von Neumann       | Most   | Subset  | 1 or 32 cycle     | No              | No             | No             | No
Cortex-M0+ | Armv6-M          | Von Neumann       | Most   | Subset  | 1 or 32 cycle     | No              | No             | No             | No
Cortex-M4  | Armv7E-M         | Harvard           | Entire | Entire  | 1 cycle           | Yes             | Yes            | Yes            | Optional
Cortex-M7  | Armv7E-M         | Harvard           | Entire | Entire  | 1 cycle           | Yes             | Yes            | Yes            | Optional
Cortex-M4 Processor Overview
Cortex-M4 processor
Introduced in 2010
Designed with a large variety of highly efficient signal processing features
Features extended single-cycle multiply accumulate instructions,
optimized SIMD arithmetic, saturating arithmetic, and an optional floating-
point unit
Enhanced determinism
The critical tasks and interrupt routines can be served quickly in a known number of cycles
Cortex-M4 Processor Features
32-Bit reduced instruction set computing (RISC) processor
Harvard architecture
Separated data bus and instruction bus
Instruction set
Includes the entire Thumb-1 (16-bit) and Thumb-2 (16/32-bit)
instruction sets
Supported interrupts
Non-maskable interrupt (NMI) + 1 to 240 physical interrupts
8 to 256 interrupt priority levels
Cortex-M4 Processor Features
Supports sleep modes
Up to 240 wake-up interrupts
Integrated wait for interrupt (WFI) and wait for event
(WFE) instructions and sleep on exit capability
Sleep and deep sleep signals
Optional retention mode with the Arm Power Management Kit
Enhanced instructions
Hardware divide (2-12 cycles)
Single-cycle 16, 32-bit MAC, single-cycle dual 16-bit
MAC
8, 16-bit SIMD arithmetic
Cortex-M4 Processor Features
Debug
Optional JTAG & serial-wire debug (SWD) ports
Up to eight breakpoints and four watchpoints
ARM Cortex M4 Blocks
1. Microprocessor core
2. Bus system and bus matrix
3. Memory and peripherals
4. Debug system
5. Nested vectored interrupt controller (NVIC)
Cortex-M4 Block Diagram
[Figure: Cortex-M4 block diagram. Recoverable labels: address bus A[31:0], address register, control signals, special use, instruction memory]
Cortex-M4 Block Diagram
Processor pipeline stages
Three-stage pipeline: fetch, decode, and execute
Some instructions may take multiple cycles to execute,
in which case the pipeline will be stalled
Speculatively prefetches instructions from branch target
addresses
Up to two instructions can be fetched in one transfer
(16-bit instructions)
Cortex-M4 Block Diagram
Nested vectored interrupt controller (NVIC)
Up to 240 interrupt request signals and an NMI
Automatically handles nested interrupts, such as comparing
priorities between interrupt requests and the current priority level
Debug subsystem
Handles debug control, program breakpoints, and data
watchpoints
When a debug event occurs, it can put the processor core in a
halted state, so developers can analyse the status of the processor,
such as register values and flags, at that point
ARM Cortex-M3 registers
Cortex-M3 Registers
R0-R12: general purpose registers
Low registers (R0-R7) can be accessed by any
instruction
High registers (R8-R12) sometimes cannot be
accessed, e.g. by some Thumb (16-bit)
instructions
Cortex-M3 Registers
xPSR, combined program status register (PSR)
Provides information about program execution and ALU
flags
Application PSR (APSR)
Interrupt PSR (IPSR)
Execution PSR (EPSR)
Cortex-M3 Registers
APSR
N: negative flag: set to one if the result from the ALU is negative
Z: zero flag: set to one if the result from the ALU is zero
C: carry flag: set to one if an unsigned overflow occurs
V: overflow flag: set to one if a signed overflow occurs
Q: sticky saturation flag: set to one if saturation has occurred in saturating arithmetic instructions, or overflow has occurred in certain multiply instructions
IPSR
Interrupt service routine (ISR) number: number of the currently executing ISR
EPSR
T: Thumb state: always one, since the Cortex-M3 and Cortex-M4 only support the Thumb state
IC/IT: interrupt-continuable instruction (ICI) bits, IF-THEN (IT) instruction block status bits
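As a rough illustration, the fields listed above can be pulled out of a 32-bit xPSR value in C. The bit positions used here (N = bit 31, Z = bit 30, C = bit 29, V = bit 28, Q = bit 27, T = bit 24, ISR number = bits 8:0) are not given on the slide; they are taken from the Armv7-M register layout and should be checked against the Generic User Guide. The names xpsr_fields and xpsr_decode are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of the combined program status register. */
struct xpsr_fields {
    bool n, z, c, v, q;    /* APSR flags */
    bool t;                /* EPSR Thumb state bit */
    uint32_t isr_number;   /* IPSR: number of the executing ISR */
};

/* Assumed layout: N=31, Z=30, C=29, V=28, Q=27, T=24, ISR number=[8:0]. */
static struct xpsr_fields xpsr_decode(uint32_t xpsr)
{
    struct xpsr_fields f;
    f.n = (xpsr >> 31) & 1u;
    f.z = (xpsr >> 30) & 1u;
    f.c = (xpsr >> 29) & 1u;
    f.v = (xpsr >> 28) & 1u;
    f.q = (xpsr >> 27) & 1u;
    f.t = (xpsr >> 24) & 1u;
    f.isr_number = xpsr & 0x1FFu;
    return f;
}
```

For example, decoding 0x61000000 under these assumptions yields Z = 1, C = 1, T = 1, and an ISR number of 0 (thread mode).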
Cortex-M3 Registers
Exception mask registers
1-bit PRIMASK (priority masking register): When it is set, all exceptions/interrupts are blocked except Reset, the Non-Maskable Interrupt (NMI), and the HardFault exception.
1-bit FAULTMASK (fault mask register): Setting FAULTMASK to 1 also masks the HardFault exception, so only Reset and NMI are allowed.
BASEPRI (base priority masking register): Masks interrupts according to the configured priority level. It holds a priority value of up to 8 bits rather than a single bit; when it is nonzero, exceptions whose priority is the same as or lower than that value are blocked, and a value of 0 disables the masking.
Cortex-M Registers
Concept of Overflow
Why do microprocessors have separate carry and overflow flags?
Carry flag represents overflow for unsigned numbers
Overflow flag represents overflow for signed numbers
There can be four possible outcomes when an
arithmetic operation such as addition is performed.
no overflow,
unsigned overflow only,
signed overflow only,
both signed and unsigned overflows.
Concept of Overflow
Example: an unsigned overflow but not signed overflow (i.e.,
case 2). Assume 32-bit data.
Assume that we want to perform addition of R0 = 0xFFFFFFFF and
R1= 0x00000001.
The 32-bit answer comes out to be 0x00000000 with a carry 1 (comes
out of the most significant bit (MSB)).
If considered as unsigned, result should be:
4,294,967,295 + 1 = 4,294,967,296.
Answer cannot be accommodated in 32 bits.
Got an incorrect answer (0).
C flag = 1 (unsigned overflow)
If considered as signed, result is:
-1 + 1 = 0
Answer within range of signed numbers.
V flag = 0 (no signed overflow)
Concept of Overflow
Example: a signed overflow but not unsigned
overflow (i.e., case 3). Assume 32-bit data.
Assume that we want to perform addition of R0 =
0x7FFFFFFF and R1= 0x7FFFFFFF.
The 32-bit answer comes out to be 0xFFFFFFFE.
If considered as unsigned, result is:
2,147,483,647 + 2,147,483,647 = 4,294,967,294.
C flag = 0 (no unsigned overflow)
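If considered as signed, result should be:
2,147,483,647 + 2,147,483,647 = 4,294,967,294.
Answer is outside the range of 32-bit signed numbers.
Got an incorrect answer (0xFFFFFFFE is interpreted as -2).
V flag = 1 (signed overflow)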
Concept of Overflow
Example: both signed and unsigned overflow (i.e., case 4).
Assume 32-bit data.
Assume that we want to perform addition of R0 = 0x80000000 and R1=
0x80000000.
The 32-bit answer comes out to be 0x00000000 with a carry 1 (comes
out of the most significant bit (MSB)).
If considered as unsigned, result should be:
2,147,483,648 + 2,147,483,648 = 4,294,967,296.
Answer cannot be accommodated in 32 bits.
Got an incorrect answer (0).
C flag = 1 (unsigned overflow)
If considered as signed, result should be:
-2,147,483,648 + -2,147,483,648 = -4,294,967,296
Got an incorrect answer (0).
V flag = 1 (signed overflow)
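The three worked examples can be checked with a small C sketch that adds two 32-bit values and derives the C and V flags the way the slides describe them. This is a host-side model, not how the processor is programmed; the names add_result and add32_with_flags are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct add_result {
    uint32_t sum;    /* 32-bit result, wrapped modulo 2^32 */
    bool carry;      /* C flag: unsigned overflow */
    bool overflow;   /* V flag: signed overflow */
};

static struct add_result add32_with_flags(uint32_t a, uint32_t b)
{
    struct add_result r;
    r.sum = a + b;

    /* A carry out of bit 31 occurred if the wrapped sum is smaller than an operand. */
    r.carry = r.sum < a;

    /* Signed overflow: both operands have the same sign bit and the
       sign bit of the result differs from it. */
    r.overflow = ((~(a ^ b)) & (a ^ r.sum) & 0x80000000u) != 0;
    return r;
}
```

With these definitions, 0xFFFFFFFF + 0x00000001 gives C = 1 and V = 0, 0x7FFFFFFF + 0x7FFFFFFF gives C = 0 and V = 1, and 0x80000000 + 0x80000000 gives C = 1 and V = 1, matching cases 2, 3, and 4.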
Exceptions
Exceptions are usually used to handle
unexpected events which arise during the
execution of a program
Exceptions
Some sources of Exceptions:
A side-effect of an instruction
Data abort (a memory fault during a load or store
data access)
ARM operating modes graph
Memory types: RAM, SRAM, DRAM
Random Access Memory
Volatile memory: data is lost when the power goes off
Memory address map
Predefined memory map
4 GB memory space
The Cortex-M3 has an internal structure that is optimized for this memory map
[Figure: Cortex-M3 memory map. Recoverable labels: system level (interrupt, debug system), vendor-specific information]
Memory Endianness
ARM Cortex-M processors are 32-bit and their memory interface is also 32 bits wide, but memory accesses are not limited to 32-bit words.
Cortex-M processors support the following common
data types when performing operations or transferring
data to or from memory.
Byte data type of size 8-bits
Halfword data type of size 16-bits
Word data type of size 32-bits
Memory Endianness
Little Endian
The processor stores the least significant byte of a word (or
halfword) at the lowest-numbered byte address, and the
most significant byte of the word (or halfword) at the
highest-numbered byte address.
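The little-endian byte order described above can be made concrete with a short C sketch that stores a 32-bit word into a byte array explicitly, one byte at a time, so the result does not depend on the byte order of the machine running it. The helper names store_word_le and load_word_le are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stores a 32-bit word at mem[0..3] in little-endian order. */
static void store_word_le(uint8_t *mem, uint32_t value)
{
    mem[0] = (uint8_t)(value & 0xFFu);          /* least significant byte, lowest address */
    mem[1] = (uint8_t)((value >> 8) & 0xFFu);
    mem[2] = (uint8_t)((value >> 16) & 0xFFu);
    mem[3] = (uint8_t)((value >> 24) & 0xFFu);  /* most significant byte, highest address */
}

/* Reassembles a 32-bit word from four little-endian bytes. */
static uint32_t load_word_le(const uint8_t *mem)
{
    return (uint32_t)mem[0]
         | ((uint32_t)mem[1] << 8)
         | ((uint32_t)mem[2] << 16)
         | ((uint32_t)mem[3] << 24);
}
```

Storing the word 0x12345678 this way places 0x78 at the lowest address and 0x12 at the highest.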
Bit-Band Operations
Bit-banding is an important concept, and a very useful tool, of
Cortex-M3.
How to modify only some bits in a register or a memory location?
Scenario: without bit-banding
To set or clear bit 2 in an 8-bit register, you can do:
Set: register |= 0x04
Clear: register &= 0xFB (equivalently, register &= ~0x04)
Note:
|= is the "OR-assign" operator
&= is the "AND-assign" operator
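The set and clear operations for an 8-bit register can be wrapped in two small C helpers. This is a minimal sketch of the read-modify-write approach without bit-banding; the names set_bit and clear_bit are hypothetical, and the register is modeled as a plain value rather than a hardware address.

```c
#include <assert.h>
#include <stdint.h>

/* Returns reg with the given bit (0 to 7) set, leaving other bits unchanged. */
static uint8_t set_bit(uint8_t reg, unsigned bit)
{
    assert(bit < 8);
    return (uint8_t)(reg | (1u << bit));
}

/* Returns reg with the given bit (0 to 7) cleared, leaving other bits unchanged. */
static uint8_t clear_bit(uint8_t reg, unsigned bit)
{
    assert(bit < 8);
    return (uint8_t)(reg & ~(1u << bit));
}
```

For bit 2, the mask 1u << 2 is 0x04, so set_bit ORs with 0x04 and clear_bit ANDs with 0xFB, as in the slide.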
Bit-Band Alias
Bit-band is supported in two predefined separate
memory regions (called bit-band regions), located in
the first 1 MB of the SRAM region and the peripheral
region, respectively.
Access to these two regions as bit-band regions is not
direct.
Rather, we need to access a separate memory region
called the bit-band alias regions, to perform bit-band
operations in the two predefined bit-band memory
regions.
In other words, normal read/write operations performed in the bit-band alias regions result in single-bit read/write operations in the actual bit-band region.
Locating Bit-Band memory regions
Two bit-band regions: One region is located at the start
of SRAM space and the other is located at the start of
peripheral space and each of them is 1 MB in size.
Locations of bit-band regions:
0x20000000 − 0x200FFFFF (SRAM bit-band region, 1 MB)
0x40000000 − 0x400FFFFF (peripheral bit-band region, 1
MB)
Locations of the corresponding bit-band alias regions:
0x22000000 − 0x23FFFFFF (SRAM bit-band alias region, 32
MB)
0x42000000 − 0x43FFFFFF (peripheral bit-band alias
region, 32 MB)
Mapping between bit-band region
and its corresponding alias.
Access to bit-banding region.
Each individual bit in the bit-band region is
accessed separately in the least significant bit
(LSB) of 32-bit contents at the word-aligned bit-
band alias address.
Mapping between bit-band region
and its corresponding bit-band alias
For example, when we want to set the fifth bit at address
0x20000400, what would be the corresponding bit-band alias
address to perform this operation?
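Each bit in the bit-band region has its own 32-bit word in the alias region. Bit 31 (the MSB) of the word at 0x20000000 is mapped to the alias word at 0x2200007C (bytes 0x2200007C to 0x2200007F), so bit 0 (the LSB) of the next word, at 0x20000004, is mapped to 0x22000080.
Let x represent the byte address and c the bit location (0 to 7) within that byte in the bit-band region that we want to operate on.
Then the corresponding bit-band alias address y in the SRAM region is given by:
$y = \mathrm{0x22000000} + (x - \mathrm{0x20000000}) \times 32 + c \times 4$
For the peripheral region, the bases are 0x42000000 and 0x40000000 instead.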
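The alias-address calculation can be sketched in C. The sketch assumes the mapping given in Arm's Cortex-M3 documentation: alias address = alias base + (byte offset into the bit-band region × 32) + (bit number × 4). The macro and function names are hypothetical, and only the SRAM region is covered.

```c
#include <assert.h>
#include <stdint.h>

#define SRAM_BITBAND_BASE  0x20000000u   /* start of the SRAM bit-band region */
#define SRAM_BITBAND_END   0x200FFFFFu   /* last byte of the 1 MB region */
#define SRAM_ALIAS_BASE    0x22000000u   /* start of the SRAM bit-band alias region */

/* Returns the alias word address for bit `bit` (0 to 7) of the byte at byte_addr. */
static uint32_t sram_bitband_alias(uint32_t byte_addr, unsigned bit)
{
    assert(byte_addr >= SRAM_BITBAND_BASE && byte_addr <= SRAM_BITBAND_END);
    assert(bit < 8);
    return SRAM_ALIAS_BASE + (byte_addr - SRAM_BITBAND_BASE) * 32u + bit * 4u;
}
```

If "the fifth bit" in the question is read as bit 4 (counting from bit 0, which is an assumption), sram_bitband_alias(0x20000400, 4) evaluates to 0x22008010.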
Examples of Bit-band Alias
[Figure: bit-band alias words at 0x22000000, 0x22000004, and 0x22000008]
Writing single bit to memory
With and Without Bit-band
Example 2.1 (Writing to the bit-band region to set a bit).
Let's first see what bit setting looks like with the conventional load-modify-store procedure. The pseudocode below outlines the key steps involved in this process.
Step 1: Set up the address in the bit-band region
Step 2: Read data from the address into a register
Step 3: Set the selected bit
Step 4: Write the result back to the same address
When the same activity is performed through the bit-band alias region, the read and modify steps are not needed: the bit is set by writing to its alias address.
How do we know that address 0x22000008 refers to bit 2 of address 0x20000000?
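One way to answer this question is to run the alias mapping backwards. The C sketch below assumes the standard Cortex-M3 mapping, in which every bit of the SRAM bit-band region owns one 4-byte alias word, so each byte of the region spans 32 bytes of alias space. The names bitband_target and sram_alias_decode are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The byte address and bit number that an alias word refers to. */
struct bitband_target {
    uint32_t byte_addr;
    unsigned bit;
};

/* Decodes an address in the SRAM alias region (0x22000000 to 0x23FFFFFF). */
static struct bitband_target sram_alias_decode(uint32_t alias_addr)
{
    assert(alias_addr >= 0x22000000u && alias_addr <= 0x23FFFFFFu);
    uint32_t offset = alias_addr - 0x22000000u;

    struct bitband_target t;
    t.byte_addr = 0x20000000u + offset / 32u;   /* 32 alias bytes per region byte */
    t.bit = (offset % 32u) / 4u;                /* 4 alias bytes per bit */
    return t;
}
```

For 0x22000008 the offset is 8, so the byte address is 0x20000000 and the bit number is 8 / 4 = 2.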
Advantages of Bit-Band Operations
Example 2.3 (Performing branch operation using bit banding)
Let us consider that a branch operation is to be performed after testing a status
bit in one of the registers associated with a peripheral.
This bit can be set or reset based on the presence or absence of a certain
condition related to that peripheral.
Under these situations the normal sequence of operations to perform the above-
mentioned task involves the following steps.
Step 1: Read the complete status register.
Step 2: Mask unwanted bits and perform any arbitrary bit
shifting if required.
Step 3: Compare with the test value and then perform
branch operation if the test condition is true.
However, the same activity can be performed more efficiently with bit-banding, by reading the status bit through the bit-band alias region. It involves the following steps.
Step 1: Read the status bit using the bit-band alias region.
Step 2: Compare and perform the branch operation if the test condition is true.
System Stack Architecture
Uses a special memory region, called stack memory region.
Accessed using stack pointers (MSP or PSP)
Stack memory read and write operations always follow Last-In-First-
Out (LIFO) data buffer format.
Cortex-M processors allocate a small region from the main
system memory (RAM) as the stack memory.
PUSH instruction and the POP instruction are used to access stack.
Stack memory is used in the following situations:
Storing registers that currently hold application data and need to be freed for other use, so that their contents can be retrieved later.
Passing parameters or argument values.
Holding local variables used by a software function or subroutine.
In case of exceptions and/or interrupts, the processor status and general-purpose registers are stored before the corresponding interrupt service routine is executed.
Cortex-M Stack
Cortex-M processor uses a full-descending
stack operation model, where the SP points to the
largest address (also called the stack starting
address) when stack memory is empty.
Two-Stack Model:
Cortex-M processor has two SPs: the MSP and the
PSP.
The SP used in thread mode is selected by bit 1 of the CONTROL register (CONTROL-Bit1).
CONTROL-Bit1 = 0: the MSP is used for both thread
mode and handler mode.
CONTROL-Bit1 = 1: the PSP is used in thread mode
and MSP is used in handler mode.
When CONTROL-Bit 1 is 0 both thread and handler
modes use main stack
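The full-descending stack model and the two-stack selection rule can be modeled on a host machine with a small C sketch. It is only a model: the stack is an eight-word array, the SP is an array index rather than a real address, and the names fd_stack, fd_push, fd_pop, and uses_psp are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STACK_WORDS 8u

struct fd_stack {
    uint32_t mem[STACK_WORDS];
    uint32_t sp;   /* word index; STACK_WORDS means the stack is empty */
};

/* Empty stack: the SP starts at the largest address (just past the top word). */
static void fd_init(struct fd_stack *s) { s->sp = STACK_WORDS; }

/* Full-descending push: decrement the SP first, then store. */
static void fd_push(struct fd_stack *s, uint32_t value)
{
    assert(s->sp > 0);
    s->sp--;
    s->mem[s->sp] = value;
}

/* Pop: load the word the SP points to, then increment the SP. */
static uint32_t fd_pop(struct fd_stack *s)
{
    assert(s->sp < STACK_WORDS);
    uint32_t value = s->mem[s->sp];
    s->sp++;
    return value;
}

/* The PSP is used only in thread mode with CONTROL bit 1 set; otherwise the MSP. */
static bool uses_psp(uint32_t control, bool handler_mode)
{
    return !handler_mode && (((control >> 1) & 1u) != 0);
}
```

Pushing 1 and then 2 leaves the SP two words below its starting value, and the values come back in last-in-first-out order: 2, then 1.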
Bus organization
The Cortex-M3 can carry out an instruction fetch and a data access at the same time.
Main bus interfaces:
Code memory bus
I-code
D-code
System bus
Static RAM (SRAM), peripherals, external
Bus organization
The Advanced High-performance Bus (AHB) and
Advanced Peripheral Bus (APB) bus protocols are part
of the Advanced Microcontroller Bus Architecture
(AMBA) standard.
AMBA standard consists of a set of multiple bus
protocol specifications.
A bus matrix is used as the AHB interconnection network
Bus matrix allows the data access and instruction fetch to
take place simultaneously
An internal AHB-to-APB bus bridge is used to connect a
number of APB devices, such as debugging components,
which follow the private peripheral bus interface.
Bus organization
The bus matrix allows memory and other peripherals to be
accessed using the AHB and APB buses.
Bus organization: I-code bus
The I-Code bus is a 32-bit bus based on the AHB-Lite bus protocol.
It is used for instruction fetches from the memory region 0x00000000 to 0x1FFFFFFF.
Instruction fetches are performed in word size, even for 16-bit Thumb instructions.
The CPU core can therefore fetch up to two Thumb instructions at a time.
Bus organization: D-code bus
The D-Code bus is a 32-bit bus based on the AHB-Lite bus protocol.
Data accesses in the memory region from 0x00000000 to 0x1FFFFFFF can be performed via this bus.
Bus organization: system-bus
The System bus is a 32-bit bus based on the AHB-Lite bus protocol.
It is used for instruction fetches and data accesses in the memory regions from 0x20000000 to 0xDFFFFFFF and from 0xE0100000 to 0xFFFFFFFF.
Bus organization: AHB-AP bus
The processor contains an AHB-AP (access port) interface for debug accesses.
An external Debug Port (DP) component accesses this interface.
Bus organization: Private Peripheral Bus
(PPB)
The private peripheral bus (PPB) is a 32-bit bus based on the AMBA APB protocol.
It is intended for private peripheral accesses in the memory region 0xE0040000 to 0xE00FFFFF.
However, since part of this APB memory region is already used for different debug interfaces, the region that can be used for attaching extra peripherals on this bus is only 0xE0042000 to 0xE00FF000.
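The address ranges given for the I-Code, D-Code, System, and private peripheral buses can be summarized as a lookup function. This C sketch uses only the ranges stated in the bus slides; addresses between 0xE0000000 and 0xE003FFFF are not covered by those ranges and are reported as BUS_OTHER. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum cm3_bus { BUS_ICODE, BUS_DCODE, BUS_SYSTEM, BUS_PPB, BUS_OTHER };

/* Picks the bus that carries an access, based on the ranges in the slides. */
static enum cm3_bus cm3_bus_for(uint32_t addr, bool is_instruction_fetch)
{
    if (addr <= 0x1FFFFFFFu) {
        /* Code region: fetches use I-Code, data accesses use D-Code. */
        return is_instruction_fetch ? BUS_ICODE : BUS_DCODE;
    }
    if (addr <= 0xDFFFFFFFu || addr >= 0xE0100000u) {
        /* 0x20000000-0xDFFFFFFF and 0xE0100000-0xFFFFFFFF */
        return BUS_SYSTEM;
    }
    if (addr >= 0xE0040000u && addr <= 0xE00FFFFFu) {
        return BUS_PPB;
    }
    return BUS_OTHER;   /* 0xE0000000-0xE003FFFF: not described in the slides */
}
```

For instance, an instruction fetch from 0x08000000 goes over I-Code, while a data read from 0x20000000 goes over the System bus.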
Discussion
Assume that a processor has separate data and instruction memories. The data memory is 1 GB, and the instruction memory is 512 MB. It has 8-bit registers. Assume memory cells are 1 byte. This processor has fixed-size 16-bit instructions. Assume that you are going to design a bus system for this processor.
Summary
Programming languages
ARM architecture
ARM core (Microprocessor, its registers)
ARM ISA
Memory types and ARM Memory map
Bus organization
Future sessions
ARM instructions