Module 1-Complete
Language Programming
& Architecture
Module -1
Faculty:
Dr. Shilpa Ankalaki
Assistant Professor
Department of CSE
Manipal Institute of Technology Bengaluru
Another name for embedded systems is Cyber-Physical Systems, introduced in 2006 by Helen Gill of
the National Science Foundation, because these systems combine the intelligence of a computer with the
physical objects of our world.
In an embedded system, we use ROM for storing the software and fixed constant data and RAM for
storing temporary information. Many microcomputers employed in embedded systems use Flash
EEPROM, which is an electrically erasable programmable ROM, because the information can easily be
erased and reprogrammed.
Microprocessors do not contain RAM, ROM, or I/O peripherals. As a result, they
must be connected externally to RAM, ROM, and I/O through buses.
• ARM came out of a company called Acorn Computers in the United Kingdom in the 1980s.
• Professor Steve Furber of Manchester University worked with Sophie Wilson to define the ARM
architecture and instructions.
• VLSI Technology Corp. produced the first ARM chip in 1985 for Acorn Computers, and it was
designated as the Acorn RISC Machine (ARM).
• Apple Corp. became interested in using the ARM chip for its PDA (personal digital assistant) products.
• This renewed interest in the chip led to the creation of a new company called ARM (Advanced RISC
Machine).
• This new company bet its entire fortune on selling the rights to this new CPU to other silicon
manufacturers and design houses. Since the early 1990s, an ever increasing number of companies have
licensed the right to make the ARM chip.
• ARM has defined the details of architecture, registers, instruction set, memory map, and timing of the
ARM CPU and holds the copyright to it.
• The various design houses and semiconductor manufacturers license the IP (intellectual property) for
the CPU and can add their own peripherals as they please.
• It is up to the licensee (design houses and semiconductor manufacturers) to define the details of
peripherals such as I/O ports, serial port UART, timer, ADC, SPI, DAC, I2C, and so on.
• As a result, while the CPU instructions and architecture are the same across all the ARM chips made by
different vendors, their peripherals are not compatible. This is the only drawback of ARM.
• The good news is that IDEs (integrated development environments) such as Keil (see www.keil.com) or
IAR (see www.IAR.com) provide peripheral libraries for chips from various vendors and make the
job of programming the peripherals much easier.
• In recent years ARM provides the IP for some peripherals such as UART and SPI, but unlike the CPU
architecture, its adoption is not mandatory and it is up to the chip manufacturer whether to adopt it or
not.
23-01-2025 Department of Computer Science and Engineering 13
Arm Architecture
The ARM is a Reduced Instruction Set Computer (RISC) system and includes the
attributes typical to that type of system:
• A large array of uniform registers.
• A load/store model of data processing where operations can only operate on
registers and not directly on memory. This requires that all data be loaded into
registers before an operation can be performed; the result can then be used for
further processing or stored back into memory.
• A small number of addressing modes with all load/store addresses being
determined from registers and instruction fields only.
• A uniform fixed length instruction (32-bit).
In addition to these traditional features of a RISC system the ARM provides a
number of additional features:
• Separate Arithmetic Logic Unit (ALU) and shifter giving additional control
over data processing to maximize execution speed.
• Auto-increment and Auto-decrement addressing modes to improve the
operation of program loops.
• Conditional execution of instructions to reduce pipeline flushing and thus
increase execution speed.
ARM CORE DATA FLOW MODEL
Fig. shows the basic structure of ARM core and how data moves
between its different parts.
An ARM core dataflow model may be thought of as functional units connected by
data buses, with the arrows representing data flow, the lines representing buses, and
the boxes representing either an operating unit or a storage region. The diagram
depicts both the data flow and the abstract components that make up an ARM core.
The data bus is where data enters the CPU core. The data might be an instruction to
execute or a data item.
Example: R2 = R0 + R1*8 (R1 is scaled by the shifter before the ALU adds R0)
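The expression above can be traced through the core as a two-stage path. The following is a minimal Python sketch, assuming the expression is meant to illustrate the shifter feeding the ALU (the equivalent of ADD R2,R0,R1,LSL #3). The function names are hypothetical and chosen only for this illustration.

```python
MASK32 = 0xFFFFFFFF  # ARM registers are 32 bits wide


def barrel_shift_lsl(value, amount):
    """Logical shift left, truncated to 32 bits (models the shifter stage)."""
    return (value << amount) & MASK32


def alu_add(a, b):
    """32-bit addition with wrap-around (models the ALU stage)."""
    return (a + b) & MASK32


def add_with_shift(r0, r1, shift):
    """R0 goes straight to the ALU; R1 passes through the shifter first."""
    return alu_add(r0, barrel_shift_lsl(r1, shift))


# R2 = R0 + R1*8 with R0 = 5 and R1 = 3 (a shift by 3 multiplies by 8)
r2 = add_with_shift(r0=5, r1=3, shift=3)
print(hex(r2))
```

Multiplying by 8 costs no extra instruction here because the shift happens on the way into the ALU.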
• The ARM CPU uses the tri-part instruction format for most instructions. One of the most common
formats is
instruction destination,source1,source2
Depending on the instruction, source2 can be a register, an immediate (constant) value, or memory. The
destination is often a register or read/write memory.
MOV instruction
Simply stated, the MOV instruction copies data into a register or from register to register. It has the following formats:
MOV Rn,Op2 ;load Rn register with Op2 (Operand2); Op2 can be an immediate value or a register
MOV R2,#0x25 ;load R2 with 0x25 (R2 = 0x25)
MOV R5,R7 ;copy contents of R7 into R5 (R5 = R7)
ARM Registers and ALU
• In ARM, the R13, R14, R15, and CPSR (current program status register)
registers are called SFRs (special function registers) since each one is
dedicated to a specific function.
• A given special function register is dedicated to a specific function such as
status register, program counter, stack pointer, and so on. The function of
each SFR is fixed by the CPU designer at the time of design because it is
used for control of the microcontroller or for keeping track of specific CPU
status.
• R13 is set aside for the stack pointer. R14 is designated as the link
register, which holds the return address when the CPU calls a subroutine,
and R15 is the program counter (PC).
• The PC (program counter) points to the address of the next instruction to
be executed.
• The CPSR (current program status register) is used for keeping
condition flags, among other things. In contrast to SFRs, the GPRs (R0-R12)
do not have any specific function and are used for storing general data.
ARM Data Format
ARM data type
ARM has four data types. They are bit, byte (8-bit), half-word (16-bit), and word
(32-bit).
Data format representation
There are several ways to represent a byte of data in the ARM assembler. The
numbers can be in hex, binary, decimal, or ASCII formats.
Hex numbers
To represent hex numbers in an ARM assembler we put 0x (or 0X) in front of the
number.
MOV R1,#0x99
Decimal numbers
To indicate decimal numbers in some ARM assemblers such as Keil we simply use
the decimal (e.g., 12) and nothing before or after it.
MOV R7,#12
ARM Assembly Language Programming & Architecture by Mazidi, et al.
Binary numbers
To represent binary numbers in an ARM assembler we put 2_ in front of the
number.
MOV R6,#2_10011001
Numbers in any base between 2 and 9
To indicate a number in any base n between 2 and 9 in an ARM assembler we
simply use the n_ in front of it.
ASCII characters
To represent ASCII data in an ARM assembler we use single quotes as follows:
MOV R3,#'2' ;R3 = 00110010 or 32 in hex. ASCII of 2.
To represent a string, double quotes are used; and for defining ASCII strings (more
than one character), we use the DCB directive.
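The literal formats described above (0x prefix, n_ prefix, plain decimal, and a single quoted character) can be summarized in a small decoder. This is an illustrative Python sketch, not part of any assembler; the function name parse_literal is hypothetical.

```python
def parse_literal(text):
    """Return the integer value of a literal written in one of the formats
    described above: 0x99, 2_10011001, 12, or '2'."""
    text = text.strip()
    # Single ASCII character in single quotes, e.g. '2' -> 0x32
    if len(text) == 3 and text[0] == "'" and text[2] == "'":
        return ord(text[1])
    # Hex: 0x or 0X prefix
    if text[:2].lower() == "0x":
        return int(text[2:], 16)
    # Base n between 2 and 9: n_ prefix, e.g. 2_10011001
    if "_" in text:
        base_text, digits = text.split("_", 1)
        base = int(base_text)
        if not 2 <= base <= 9:
            raise ValueError("base must be between 2 and 9")
        return int(digits, base)
    # Plain decimal: nothing before or after the digits
    return int(text, 10)


for literal in ("0x99", "2_10011001", "12", "'2'"):
    print(literal, "=", hex(parse_literal(literal)))
```

Note that 0x99 and 2_10011001 decode to the same value, which is why the two MOV examples above load the same bit pattern.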
ALIGN
This is used to make sure data is aligned on a 32-bit word or 16-bit half-word memory address. The examples
below show the effect of ALIGN on where DCB data is placed. The first program has no ALIGN directive:
        AREA E2_7A, READONLY, CODE
        ENTRY
        ADR R2,DTA
        LDRB R0,[R2]
        ADD R1,R1,R0
H1      B H1
DTA     DCB 0x55
        DCB 0x22
        END
When there is no ALIGN directive, the DCB directive allocates the first empty location for its data. In this
example, address 0x10 is allocated for 0x55, so 0x22 goes to address 0x11.
When ALIGN 2 is placed between the two DCB directives, the next data
must be put in a location with an even address. The 0x55 goes
to the first empty location, which is 0x10. The next empty
location is 0x11, which is not a multiple of 2, so it is filled
with 0 and the next data (0x22) goes to location 0x12.
When ALIGN 4 is placed between the two DCB directives, the next data
must go to a location whose address is a multiple of 4. The
0x55 goes to the first empty location, which is 0x10. The next
empty locations are 0x11, 0x12, and 0x13, which are not
multiples of 4, so they are filled with 0s and the next data (0x22)
goes to location 0x14.
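The three placements described above (no ALIGN, ALIGN 2, ALIGN 4) can be reproduced with a small layout model. This is a Python sketch under the assumption that the first free address is 0x10 and that padding bytes are 0, as stated in the text. The function name layout and the tuple encoding of directives are hypothetical.

```python
def layout(start, directives):
    """Place DCB bytes starting at `start`, padding with 0 at each ALIGN.

    `directives` is a list of ("DCB", byte_value) or ("ALIGN", boundary).
    Returns a dict mapping address -> byte stored there.
    """
    memory = {}
    address = start
    for name, arg in directives:
        if name == "DCB":
            memory[address] = arg
            address += 1
        elif name == "ALIGN":
            # Fill with 0 until the address is a multiple of the boundary
            while address % arg != 0:
                memory[address] = 0
                address += 1
        else:
            raise ValueError("unsupported directive: " + name)
    return memory


def show(memory):
    return {hex(a): hex(b) for a, b in sorted(memory.items())}


print(show(layout(0x10, [("DCB", 0x55), ("DCB", 0x22)])))
print(show(layout(0x10, [("DCB", 0x55), ("ALIGN", 2), ("DCB", 0x22)])))
print(show(layout(0x10, [("DCB", 0x55), ("ALIGN", 4), ("DCB", 0x22)])))
```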
SPACE directive
Using the SPACE directive we can allocate memory for variables.
LONG_VAR SPACE 4 ;Allocate 4 bytes
OUR_ALFA SPACE 2 ;Allocate 2 bytes
Pseudo Instructions:
LDR Pseudo Instruction
LDR Rd,=32-bit_immediate_value
Notice the = sign used in the syntax. The following pseudo-instruction loads R7 with 0x112233.
LDR R7, =0x112233
To load values of 0xFF or less, the “MOV Rd, #8-bit_immediate_value” instruction is used since it is
a real instruction of ARM and is therefore more efficient in code size.
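The choice between MOV and the LDR pseudo-instruction can be written as a simple rule. The Python sketch below applies only the simplified rule from the text (values that fit in 8 bits use MOV); real assemblers accept a wider set of MOV immediates, so treat this as an illustration rather than assembler behaviour. The function name load_constant is hypothetical.

```python
def load_constant(reg, value):
    """Return the assembly line that loads `value` into `reg`,
    using the simplified 8-bit rule described in the text."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value must fit in 32 bits")
    if value <= 0xFF:
        # Fits in 8 bits: a real MOV instruction is enough
        return "MOV {},#0x{:X}".format(reg, value)
    # Larger constant: use the LDR pseudo-instruction (note the = sign)
    return "LDR {},=0x{:X}".format(reg, value)


print(load_constant("R2", 0x25))
print(load_constant("R7", 0x112233))
```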
The first column of each line is always considered a label. Thus, be careful to
press Tab at the beginning of each line that does not have a label; otherwise, your
instruction is considered a label and an error message will appear when
assembling.
By choosing label names that are meaningful, a programmer can make a program much easier to
read and maintain. There are several rules that names must follow.
• First, each label name must be unique.
• The names used for labels in Assembly language programming consist of alphabetic letters in
both uppercase and lowercase, the digits 0 through 9, and the special characters question mark
(?), period (.), at (@), underline (_), and dollar sign ($).
• The first character of the label must be an alphabetic character. In other words, it cannot be a
number.
• Every assembler has some reserved words that must not be used as labels in the program.
Foremost among the reserved words are the mnemonics for the instructions. For example,
“MOV” and “ADD” are reserved because they are instruction mnemonics.
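The label rules listed above can be collected into one check. This is a Python sketch of those rules only; the reserved-word set contains just the two mnemonics the text names (a real assembler reserves many more), and the function name is_valid_label is hypothetical.

```python
import re

# First character alphabetic, then letters, digits, or ? . @ _ $
LABEL_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9?.@_$]*")

# Assumption: only the mnemonics named in the text; the real list is longer
RESERVED_WORDS = {"MOV", "ADD"}


def is_valid_label(name, existing_labels=()):
    """Check a candidate label against the rules listed in the text."""
    if name in existing_labels:          # each label must be unique
        return False
    if name.upper() in RESERVED_WORDS:   # mnemonics cannot be labels
        return False
    return LABEL_PATTERN.fullmatch(name) is not None


for candidate in ("LOOP1", "1LOOP", "MOV", "my_var$", "a-b"):
    print(candidate, is_valid_label(candidate))
```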
ARM Memory Map and Memory Access
The ARM CPU uses 32-bit addresses, which gives us a maximum of 4 GB (gigabytes) of memory space.
This 4 GB of memory space has addresses 0x00000000 to 0xFFFFFFFF.
• The data SRAM is used by the CPU for data variables and stack, whereas the EEPROMs are considered to be
memory that one can also add externally to the chip. In other words, while many ARM microcontroller chips
have no EEPROM memory, it is very unlikely for an ARM microcontroller to have no on-chip data SRAM.
• The on-chip Flash ROM is used for program code, while the EEPROM is used most often for critical system
data that must not be lost if power is cut off.
• The data SRAM is volatile memory and its contents are lost if the power to the chip is cut off. Volatile data
SRAM is used for dynamic variables (constantly changing data) and the stack, so we need EEPROM memory to secure
critical system data that does not change very often and must not be lost in the event of power failure.
• The on-chip Flash ROM is programmed and erased in block size. The block size is 8, 16, 32, or 64 bytes or
more depending on the chip technology. That is not the case with EEPROM, since the EEPROM is byte
programmable and erasable. Both the EEPROM and Flash memories have limited number of erase/write
cycles, which can be 100,000 or more.
A given ARM chip has the following address assignments. Calculate the space and the
amount of memory given to each section.
(a) Address range of 0x00100000 – 0x00100FFF for EEPROM
(b) Address range of 0x40000000 – 0x40007FFF for SRAM
(c) Address range of 0x00000000 – 0x0007FFFF for Flash
(d) Address range of 0xFFFC0000 – 0xFFFFFFFF for peripherals
Solution:
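(a) With address space of 0x00100000 to 0x00100FFF, we have 0x00100FFF – 0x00100000 =
0x0FFF. Converting 0x0FFF to decimal, we get 4,095 + 1, which is equal to 4K bytes.
(b) With address space of 0x40000000 to 0x40007FFF, we have 0x40007FFF – 0x40000000 =
0x7FFF. Converting 0x7FFF to decimal, we get 32,767 + 1, which is equal to 32K bytes.
(c) With address space of 0x00000000 to 0x0007FFFF, we have 0x7FFFF – 0 = 0x7FFFF. Converting
0x7FFFF to decimal, we get 524,287 + 1, which is equal to 512K bytes.
(d) With address space of 0xFFFC0000 to 0xFFFFFFFF, we have 0xFFFFFFFF – 0xFFFC0000 =
0x3FFFF. Converting 0x3FFFF to decimal, we get 262,143 + 1, which is equal to 256K bytes.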
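The same calculation can be checked for all four regions of the exercise. This is a short Python sketch; the function name region_size and the dictionary layout are chosen only for illustration.

```python
def region_size(start, end):
    """Number of bytes in an inclusive address range."""
    return end - start + 1


regions = {
    "EEPROM":      (0x00100000, 0x00100FFF),
    "SRAM":        (0x40000000, 0x40007FFF),
    "Flash":       (0x00000000, 0x0007FFFF),
    "peripherals": (0xFFFC0000, 0xFFFFFFFF),
}

for name, (start, end) in regions.items():
    size = region_size(start, end)
    print("{}: {} bytes = {}K bytes".format(name, size, size // 1024))
```

The "+ 1" matters because both the first and the last address belong to the region.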
Some ARM systems use DRAM for the RAM memory, just like the x86
and Pentium PCs.
They use DRAM as primary memory to store both the operating system
and the applications.
In such systems, the Flash memory holds the POST
(power-on self-test), BIOS (basic input/output system), and
boot programs.
As in x86 systems, such systems have both on-chip and off-chip
high-speed SRAM for cache.
There are also ARM chips on the market with some on-chip Flash ROM, SRAM,
and memory decoding circuitry for connection to external (off-chip)
memory.
Solution:
1/50 MHz = 20 ns is the bus clock period. Since the bus cycle time of zero wait states is 2 clocks, we have:
Memory cycle time with 0 WS = 2 × 20 = 40 ns
Memory cycle time with 1 WS = 40 + 20 = 60 ns
Memory cycle time with 2 WS = 40 + 2 × 20 = 80 ns
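The solution above follows one pattern: each wait state adds one bus clock to the 2-clock base cycle. A minimal Python sketch of that pattern follows; the function name and parameter names are hypothetical.

```python
def memory_cycle_ns(bus_mhz, wait_states, base_clocks=2):
    """Memory cycle time in ns for a bus clocked at `bus_mhz` MHz.

    A zero-wait-state cycle takes `base_clocks` clocks; every wait state
    adds one more clock.
    """
    clock_period_ns = 1000 / bus_mhz  # 1/50 MHz = 20 ns
    return (base_clocks + wait_states) * clock_period_ns


for ws in (0, 1, 2):
    print("{} WS: {} ns".format(ws, memory_cycle_ns(50, ws)))
```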
• The rate of data transfer is generally called bus bandwidth. In other words, bus
bandwidth is a measure of how fast buses transfer information between the CPU and
memory or peripherals. The wider the data bus, the higher the bus bandwidth.
• The advantage of the wider external data bus comes at the cost of increasing the die
size for system on-chip (SOC) or the printed circuit board size for off-chip memory.
• The speed of the CPU must be matched with the higher bus bandwidth; otherwise,
there is no use for a fast CPU.
• Bus bandwidth is measured in MB (megabytes) per second and is calculated as
follows:
bus bandwidth = (1/bus cycle time) × bus width in bytes
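The formula can be tried with concrete numbers. The Python sketch below assumes a 32-bit (4-byte) data bus and cycle times of 40, 60, and 80 ns, and it takes 1 MB as 10^6 bytes; these values are assumptions made for the illustration, not figures from the slide.

```python
def bus_bandwidth_mb_per_s(cycle_time_ns, bus_width_bytes):
    """bus bandwidth = (1 / bus cycle time) x bus width in bytes.

    With the cycle time in ns, 1/cycle_time is in 10^9 transfers per second,
    so multiplying by 1000 converts the result to MB/s (1 MB = 10^6 bytes).
    """
    return bus_width_bytes * 1000 / cycle_time_ns


for cycle_ns in (40, 60, 80):
    bandwidth = bus_bandwidth_mb_per_s(cycle_ns, 4)
    print("{} ns cycle, 32-bit bus: {:.1f} MB/s".format(cycle_ns, bandwidth))
```

Each added wait state lengthens the cycle and lowers the bandwidth, while doubling the bus width doubles it.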
• Data placed in SRAM by the programmer can be nonaligned and is therefore subject to a memory
access penalty.
• The single-cycle access of memory is also used by ARM to bring 4 bytes of data into registers every clock cycle, assuming
that the data is aligned.
• The use of the ALIGN directive for RAM data makes sure that each word is located at an address ending with
0, 4, 8, or C (in hex).
• If our data is word size (using the DCDU directive), then the use of the ALIGN directive at the start of the data section guarantees that all
the data placements will be word aligned. When word-size data is defined using the DCD directive, the assembler aligns
it on a word boundary automatically.
• Show the data transfer of the following LDRB instructions with appropriate figures
and indicate the number of memory cycle times it takes for data transfer.
LDR R1,=0x80000000
LDR R3,=0xF31E4598
LDR R4,=0x1A2B3D4F
STR R3,[R1]
STR R4,[R1,#4]
LDRB R2,[R1]
LDRB R2,[R1,#1]
LDRB R2,[R1,#2]
For the LDRH R2,[R1,#1] instruction, locations with addresses of 0x80000000, 0x80000001, 0x80000002, and
0x80000003 are accessed, but only 0x80000001 and 0x80000002 are used to get the 16 bits to R2. Therefore, it
takes only one memory cycle to transfer the data. Now, R2=0x00001E45.
For the LDRH R2,[R1,#3] instruction, in the first memory cycle, locations with addresses of 0x80000000,
0x80000001, 0x80000002, and 0x80000003 are accessed, but only 0x80000003 is used to get the lower 8
bits to R2. In the second memory cycle, the address locations 0x80000004, 0x80000005, 0x80000006, and
0x80000007 are accessed, where only the 0x80000004 location is used to get the upper 8 bits to R2. Therefore, it
takes two memory cycles to transfer the data. Now, R2=0x00004FF3.
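The byte and half-word loads above can be replayed on a small memory model. This Python sketch assumes little-endian byte order, a 32-bit data bus that fetches one aligned word per memory cycle, and support for unaligned access as the text describes. The class and method names are hypothetical.

```python
class WordMemory:
    """Byte-addressable little-endian memory read through a 32-bit bus."""

    def __init__(self):
        self.cells = {}  # address -> byte value

    def store_word(self, address, value):
        """STR: write 4 bytes, least significant byte at the lowest address."""
        for i in range(4):
            self.cells[address + i] = (value >> (8 * i)) & 0xFF

    def load(self, address, size):
        """LDRB (size=1) or LDRH (size=2).

        Returns (value, memory_cycles). One cycle is needed for every
        aligned 4-byte word that the requested bytes touch.
        """
        value = 0
        for i in range(size):
            value |= self.cells.get(address + i, 0) << (8 * i)
        first_word = address // 4
        last_word = (address + size - 1) // 4
        return value, last_word - first_word + 1


mem = WordMemory()
base = 0x80000000
mem.store_word(base, 0xF31E4598)      # STR R3,[R1]
mem.store_word(base + 4, 0x1A2B3D4F)  # STR R4,[R1,#4]

for offset, size in ((0, 1), (1, 1), (2, 1), (1, 2), (3, 2)):
    value, cycles = mem.load(base + offset, size)
    name = "LDRB" if size == 1 else "LDRH"
    print("{} R2,[R1,#{}]: R2=0x{:08X}, {} cycle(s)".format(name, offset, value, cycles))
```

Only the half-word at offset 3 crosses a word boundary, which is why it is the only access that needs two memory cycles.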
;assume R5 = 0x40000100
MOV R1,#0x55 ;R1 = 55 (in hex)
STRB R1,[R5] ;copy R1 to the location pointed to by R5
• The CPU can access operands (data) in various ways, called addressing modes. The number of
addressing modes is determined when the microprocessor is designed and cannot be changed. Advanced
addressing modes make it easier to access different data types and data structures (e.g., arrays,
pointers, classes).
• Some of the simple ARM addressing modes are:
1. register
2. immediate
3. register indirect (indexed addressing mode)
4. PC relative Addressing mode
• In indexed addressing modes, any registers including the PC (R15) register can be used as
the pointer register.
• For example, the following instruction reads the contents of memory location PC+4:
LDR R0,[PC,#4]
• For LDR R0,[PC,offset]: EA = address of current instruction + 8 + offset
• In this way, the data which has a known distance from the current executing line can be
accessed.
• The PC register points 8 bytes (2 instructions) ahead of executing instruction.
• As a result, “LDR R0,[PC,#4]” accesses a memory location whose address is 4+8 bytes ahead
of the current instruction.
• For instance, if “LDR R0,[PC,#4]” is located in address 0x10 the effective address is:
0x10 + 8 + 4 = 0x1C.
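The effective-address rule above is a one-line calculation. The Python sketch below assumes the PC reads as the current instruction address plus 8, as the text states; the function name is hypothetical.

```python
def pc_relative_ea(instruction_address, offset):
    """EA = address of current instruction + 8 + offset (32-bit wrap)."""
    return (instruction_address + 8 + offset) & 0xFFFFFFFF


# "LDR R0,[PC,#4]" located at address 0x10
print(hex(pc_relative_ea(0x10, 4)))
```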
In the von Neumann architecture, there are no separate buses for code and data memory. In
the Harvard architecture, there are separate buses for code and data memory.
When the CPU wants to execute the “LDR Rd,[Rx]” instruction, it puts Rx on the
address bus of the system and receives data through the
data bus. For example, to execute “LDR R2,[R5]”, assuming that R5 =
0x40000200, the CPU puts the value of R5 on the address bus. The memory
puts the contents of location 0x40000200 on the data
bus. The CPU receives the contents of location 0x40000200 through the data bus
and puts them in R2.
The “STR Rx,[Rd]” instruction is executed similarly. The CPU puts Rd on the
address bus and the contents of Rx on the data bus. The memory location
whose address is on the address bus receives the contents of data bus.
------------------------------------------------------------------------
Register to register (Register direct) MOV R0, R1
------------------------------------------------------------------------
Absolute (Direct) LDR R0, MEM
------------------------------------------------------------------------
Literal (Immediate) MOV R0, #15
ADD R1, R2, #12
------------------------------------------------------------------------
Indexed, base (Register indirect) LDR R0, [R1]
------------------------------------------------------------------------
Pre-indexed, base with displacement (Register indirect with offset)
LDR R0, [R1, #4] ;Load R0 from [R1+4]
LDR R0, [R1, #-4]
------------------------------------------------------------------------
Pre-indexed with autoindexing (Register indirect with pre-incrementing)
LDR R0, [R1, #4]! ; Load R0 from [R1+4], R1 = R1 +4
------------------------------------------------------------------------
Post-indexed with autoindexing (Register indirect with post-increment)
LDR R0, [R1], #4 ; Load R0 from [R1], R1 = R1 +4
------------------------------------------------------------------------
Double Reg indirect (Register indirect indexed)
LDR R0, [R1, R2] ; Load R0 from [R1 + R2]
------------------------------------------------------------------------
Double Reg indirect with scaling (Register indirect indexed with scaling)
LDR R0, [R1, R2, LSL #2]
------------------------------------------------------------------------
Program counter relative LDR R0, [PC, #offset]
---------------------------------------------------------------------
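The difference between the offset, pre-indexed, and post-indexed rows of the table is easiest to see by tracking the base register. The following Python sketch models only word loads from a dictionary-based memory; the function name ldr, the mode strings, and the sample memory contents are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def ldr(regs, mem, rd, base, offset=0, mode="offset"):
    """Model a word load for three indexing styles.

    mode="offset": LDR rd,[base,#offset]   base register unchanged
    mode="pre":    LDR rd,[base,#offset]!  base updated, then used
    mode="post":   LDR rd,[base],#offset   base used, then updated
    Returns the effective address that was read.
    """
    if mode not in ("offset", "pre", "post"):
        raise ValueError("unknown mode: " + mode)
    updated_base = (regs[base] + offset) & MASK32
    address = regs[base] if mode == "post" else updated_base
    regs[rd] = mem[address]
    if mode in ("pre", "post"):
        regs[base] = updated_base  # write-back (autoindexing)
    return address


mem = {0x100: 10, 0x104: 20, 0x108: 30}       # sample word memory
regs = {"R0": 0, "R1": 0x100, "R2": 2}

ldr(regs, mem, "R0", "R1", 4)                  # LDR R0,[R1,#4]
print(regs["R0"], hex(regs["R1"]))
ldr(regs, mem, "R0", "R1", regs["R2"] << 2)    # LDR R0,[R1,R2,LSL #2]
print(regs["R0"], hex(regs["R1"]))
ldr(regs, mem, "R0", "R1", 4, mode="pre")      # LDR R0,[R1,#4]!
print(regs["R0"], hex(regs["R1"]))
ldr(regs, mem, "R0", "R1", 4, mode="post")     # LDR R0,[R1],#4
print(regs["R0"], hex(regs["R1"]))
```

The register-offset and scaled-register rows reuse the offset mode, with the offset computed from R2 before the call.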