
ARM Processor Unit 5

Uploaded by venkat Mohan
Copyright © All Rights Reserved

ARM - THE WORLD'S MOST POPULAR 32-BIT EMBEDDED PROCESSOR

PART I
ARCHITECTURE AND ASSEMBLY LANGUAGE PROGRAMMING

In this chapter, you will learn
• The history of the ARM processor
• The features and architecture of ARM
• The instruction set of ARM
• Assembly language programming for ARM
• The addressing modes of ARM
• How to use subroutines without a stack
• How to generate 32-bit constants using the rotation scheme
• The concept of literal pools
• How to access R/W and Read Only memory
• The use of different types of stacks
Introduction

This chapter gives an introduction to ARM, the highly popular 32-bit processor, with a short account of its history, followed by details of where it stands in the embedded processor market now. ARM stands for 'Advanced RISC Machine'. The name explicitly states its characteristic of being a RISC processor. Originally, however, ARM stood for 'Acorn RISC Machine', as the first ARM processor was made by Acorn Computers Ltd., Cambridge, England, in 1985.
Chapter-opening image: An ARM7 LPC2140 board.

10.1 | History of the ARM Processor

In 1985, Acorn Computers Ltd. was in search of a new processor for the desktop market. While its designers were contemplating various design options, they came across a few papers published by a group of students at the University of California, Berkeley (USA), outlining a very simple processor design based on RISC principles. The computer architects of Acorn Computers found the design very attractive and decided to build
336 EMBEDDED SYSTEMS ARM- THE WORLD'S MOST POPULAR 32 BIT EMBEDDED PROCESSOR 337
a new processor using some of these principles. This led to the development of ARM1, which had fewer than 25,000 transistors and operated at 6 MHz.
This was followed by ARM2 (in 1987) with 30,000 transistors. Compared with the Intel and Motorola processors of that time, which had about 70,000 transistors, this was a beauty in terms of smaller die size and lower power dissipation. ARM2 was thus the first ARM processor to be produced in bulk. It had a 32-bit data bus, a 26-bit address space and sixteen 32-bit registers, and was clocked at 8 to 12 MHz. It dissipated much less power and performed much better than Intel's 80286, which came out around the same time (but was focused on the desktop market).
ARM3 followed in 1989, but no ARM4 or ARM5 was ever produced, because around this time, in 1990, Acorn Computers teamed up with Apple Computer and VLSI Technology to form a company named Advanced RISC Machines Ltd. This company continued with ARM6, ARM7, etc. ARM7 became very popular and led to ARM being used in products such as mobile phones, PDAs, iPods and computer hard disks. After this, ARM made rapid strides in the 32-bit embedded market, accounting for a very high percentage of applications in the high-end embedded systems market.
As of 2011, ARM processors account for approximately 90 per cent of all embedded 32-bit RISC processors. ARM processors are used extensively in consumer electronics, including PDAs, mobile phones, digital media and music players, handheld game consoles, calculators and computer peripherals such as hard drives and routers.
The subsequent and more advanced processors of the ARM family (ARM9, ARM10, ARM11, Cortex) have been built on the success of the ARM7 processor, which is still the most popular and widely used member of the ARM family.
Over the years, many advanced features have been added to the ARM processor, but the core has remained more or less the same.

10.1.1 | The ARM Core

What is meant by the 'core'? The core is the 'processing unit' or the 'computing engine', which has all the computing power; this aspect is decided by the architecture, which represents the basic design of the processor.
One special and unique feature of ARM as a company is that it designs the core and licenses this IP (Intellectual Property) to others. This simply means that the company does not 'fabricate' the chip, but sells only the design. This design is taken by the licensee, who may or may not add more features (usually peripherals) to the design. Sometimes the buyer can also modify the basic design to a minor extent. The buyer company fabricates the design and sells it or uses it in its own products.
There are various ways in which ARM sells its IP. It could be in the form of a soft IP. In this case, the design is sold as RTL (VHDL/Verilog code), and this allows the buyer to modify the design to a certain extent. If the design is sold as a hard IP, the buyer gets only the layout or the netlist (the connections of nets, or electronic wires). Thus, the buyer can only add peripherals to the 'black box' design that has been purchased.

Figure 10.1 | ARM SoC: core with peripherals (ARM core developed by ARM; internal bus; chip developed by licensees and chip manufacturers)

We can thus understand that ARM the company does not 'fabricate' ARM chips. (In contrast, Intel fabricates its processors and sells them as chips.) It is because of this that we have ARM chips and boards from various companies, such as Samsung, Philips, Atmel, Texas Instruments, ST Microelectronics and so on; the list is very long.

10.1.2 | The ARM Microcontroller

ARM has been designated as a 'microprocessor', and indeed it is a processor with very high computing capabilities. It has a rich set of features for handling complex computations.
However, to be used as an embedded processor, it needs many more capabilities, and these come in the form of on-chip peripherals. Peripherals are added to the ARM core, and it thus becomes a 'microcontroller' or an MCU (microcontroller unit), rather than an MPU (microprocessor unit). Figure 10.1 shows the ARM MCU. The number and kind of peripherals added depend on the requirements of the buyer of the IP. This is why ARM processors supplied by different companies have varying numbers of peripherals. Naturally, to support more peripherals, the core has to be more powerful. That is why we generally find more peripherals around an ARM9 core than around an ARM7 core. But as a rule, users have to spell out their requirements for the peripherals of an MCU.
When a chip has the core and the necessary peripherals to perform as a system, it is called a System on Chip (SoC), and the term 'ARM SoC' is very commonly used; understandably, such a chip has some version of the ARM core and a large set of peripherals.

10.1.3 | RISC vs CISC

The differences between these two schools of thought in computer architecture have been discussed in Section 0.3. But to put the idea in proper perspective in the context of ARM, some specific features of RISC are listed here. These apply to most of the instructions of ARM, but not necessarily to all.
i) Instructions are of the same size, that is, 32 bits
ii) Instructions are executed in one cycle
iii) Only the load and store instructions access memory
Figure 10.2 | Two ARM cores (ARM7TDMI: ARM7 core with Thumb, ARMv4T, Embedded ICE-RT and ETM7 interface; ARM920T: ARM9 core with Thumb, ARMv4T, MMU, dual 16K caches, Embedded ICE, ETM9 interface and ASB interface)

Subsequently, it was decided to do away with these complex naming schemes, as the features corresponding to TDMI were expected to be present as standard in all ARM processors. But some numbers were added to indicate the presence of memory interfaces, cache, tightly coupled memory and so on. For example, ARM processors with cache and an MMU are now given the suffix 26 or 36, whereas processors with MPUs are suffixed with 46. Over the years, this naming convention has also changed. Refer to Table 10.2 for some more variants of ARM.

Table 10.2 | Variants of the ARM Processor

| Processor Name | Architecture Version | Memory Management Features | Other Features |
|---|---|---|---|
| ARM7TDMI | ARMv4T | | |
| ARM7TDMI-S | ARMv4T | | |
| ARM7EJ-S | ARMv5E | | DSP, Jazelle |
| ARM920T | ARMv4T | MMU | |
| ARM922T | ARMv4T | MMU | |
| ARM926EJ-S | ARMv5E | MMU | DSP, Jazelle |
| ARM946E-S | ARMv5E | MPU | DSP |
| ARM966E-S | ARMv5E | | DSP |
| ARM968E-S | ARMv5E | | DMA, DSP |
| ARM966HS | ARMv5E | MPU (optional) | DSP |
| ARM1020E | ARMv5E | MMU | DSP |
| ARM1022E | ARMv5E | MMU | DSP |
| ARM1026EJ-S | ARMv5E | MMU or MPU | DSP, Jazelle |
| ARM1136J(F)-S | ARMv6 | MMU | DSP, Jazelle |
| ARM1176JZ(F)-S | ARMv6 | MMU + TrustZone | DSP, Jazelle |
| ARM11 MPCore | ARMv6 | MMU + Multiprocessor Cache Support | DSP, Jazelle |
| ARM1156T2(F)-S | ARMv6 | MPU | DSP |
| Cortex-M0 | ARMv6-M | | NVIC |
| Cortex-M1 | ARMv6-M | FPGA TCM interface | NVIC |
| Cortex-M3 | ARMv7-M | MPU (optional) | NVIC |

(Courtesy: The Definitive Guide to the ARM Cortex-M3 by Joseph Yiu, Newnes Publications)

10.1.5 | Architecture Versions

Over the years, the architectural features have also been enhanced, so later versions of the architecture are more powerful. Versions v4 and v4T are the early versions; later versions are v5, v5E, v6 and v7. Table 10.3 lists the various architecture variants of ARM.

Table 10.3 | Features of the Architecture Variants of ARM

| Architecture Version | Features |
|---|---|
| v4 | ARM instructions only |
| v4T | THUMB instructions also added |
| v5 | More advanced ARM and THUMB instructions |
| v5E | Advanced ARM instructions and enhanced DSP instructions |
| v6 | Advanced ARM and THUMB, SIMD and memory support instructions added |
| v7 | THUMB-2 technology, in which both 16-bit and 32-bit instructions are supported, and there is no need to switch between ARM and THUMB instruction sets |

10.1.6 | ARM CORTEX

ARM has come a long way from ARM2, which was the first one to be commercially produced. ARM7 was a resounding success, which made ARM the dominant player in the 32-bit embedded processor market. ARM7 was followed by ARM9, ARM10 and ARM11, each offering more computing power than the last. The latest in the sequence is the CORTEX series, which is based on version 7 of the architecture (although the smallest members, such as Cortex-M0, use ARMv6-M, as Table 10.2 shows). To make this series cater to well-defined application sets, the following three profiles have been defined:
i) The A profile: This profile, which has the ARMv7-A architecture, is meant for high-end applications. It is meant to handle complex applications running high-end embedded operating systems; typical applications requiring such a profile are mobile phones and video systems.
ii) The R profile: This profile, which has the ARMv7-R architecture, has been designed for high-end applications which require real-time capabilities. Typical applications are automatic braking systems and other safety-critical applications.
iii) The M profile: This profile, which has the ARMv7-M architecture, has been designed for deeply embedded, microcontroller-type systems. It is intended for industrial control applications, where a large number of peripherals may have to be handled and controlled.

10.1.7 | The Features of ARM Which Make It 'Special'

Now that we have surveyed the range of ARM processors, let's discuss the features which have made ARM a very popular processor in the high-end embedded market.
i) Data bus width: The processor has a 32-bit data bus, which means that it can read and write 32 bits in one cycle. For high-end applications, a wide data bus provides a high data bandwidth, which is very important. When ARM first made its entry into the field, there were very few embedded processors with such a wide bus.
ii) Computational capability: The instruction set of ARM has been cleverly designed to provide very good computational capability. Many novel methods of fast computation that do not require extensive hardware are used. The design of the processor used the RISC approach, but over the years, this philosophy has been diluted to enable the addition of specialized hardware for computationally intensive tasks. In essence, ARM is a RISC processor which has a few CISC features as well.
iii) Low power: In the embedded field, power saving is very important, because a large number of devices operate on battery power. Designing lower-power processor cores is thus a matter of high priority. How is a processor designed for low power consumption? Embedded processors operate at low clock frequencies compared with desktop processors. While 3.3 GHz is commonly used in the desktop processor field, ARM operates at relatively low frequencies, from 60 MHz to at most 1 GHz. The other techniques in low-power design are explained in Section 2.4.
iv) Pipelining: Pipelining is a fundamental idea in computer architecture for increasing the speed of operation. The idea is to get many activities done in tandem by dividing the processing of each instruction into sub-stages. The basic task that any processor does is 'fetch, decode and execute'. In the simplest form of pipelining (3-stage), all three stages are active all the time. While the first stage is fetching an instruction, the next stage, that is, the decode stage, is busy decoding the previously fetched instruction, and the execute stage is executing the instruction which was decoded before that. Thus, at any time, there are three instructions simultaneously present in the pipeline, at different levels of processing.
Suppose T is the time needed to fetch, decode and execute one instruction without pipelining. Dividing this work into three stages gives each stage a time of T/3. Once the pipeline is full, one instruction is completed in every sub-cycle of period T/3, which is essentially 3 instructions in the period T. Ideally, therefore, the processing speed is multiplied by 3.

Figure 10.3a | A three-stage pipeline (Fetch, Decode, Execute)
Figure 10.3b | The three-stage pipeline with 3 instructions in operation (over cycles 1 to 5, instructions 1, 2 and 3 each pass through Fetch, Decode and Execute, one cycle apart)
Figure 10.4 | A five-stage pipeline (Fetch, Decode, Execute, Buffer, Write)

Figure 10.3a shows a three-stage pipeline, while Figure 10.3b shows three instructions in the pipeline. Any instruction needs three sub-cycles to come out of the pipeline, but because a new instruction enters in every sub-cycle, the throughput is three instructions per period T.
ARM7 has a 3-stage pipeline, while ARM9 has a 5-stage pipeline with more finely divided stages (Figure 10.4), which are 'fetch, decode, execute, buffer data and write back'. As a general rule, more advanced processors have more pipeline stages. For example, ARM10 has 6 stages.
Pipelining is a great idea, but it has a drawback: when a branch instruction is taken, the instructions following it are no longer to be executed in the normal sequence. So the instructions in the earlier stage or stages have to be discarded; we say that the pipeline is flushed. This creates a loss of speed, and the penalty is higher for pipelines with more stages.
v) Multiple register instructions: Since ARM is a RISC processor, its data processing instructions work only on data held in registers; that is, they do not use addressing modes in which one operand is in memory. But there are instructions which load data from memory into multiple registers, and the contents of multiple registers can also be stored in memory, with a single instruction.
vi) DSP enhancements: Our processor has RISC as its basic policy, but the more advanced members of the family have DSP (Digital Signal Processing) instructions as an enhanced feature. This is where ARM departs from its RISC philosophy, but it is necessary for surviving in the embedded market. These DSP enhancements are signified by an 'E' in the name, as in the ARMv5TE and ARMv5TEJ architectures.
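The pipeline arithmetic above can be made concrete with a minimal C sketch. It assumes an idealised pipeline in which every stage takes one stage time (T/3 in the three-stage case), there are no stalls, and each taken branch flushes the k - 1 instructions already in the earlier stages. The function names are hypothetical, and the flush model is a simplification for illustration, not a cycle-accurate description of any particular ARM core.

```c
#include <assert.h>

/* Idealised pipeline timing, measured in stage times (T/3 for a
 * three-stage pipeline). Assumes one stage time per stage and no stalls. */

/* Without pipelining, each of the n instructions passes through all k
 * stages before the next one starts. */
static unsigned long unpipelined_cycles(unsigned long k, unsigned long n)
{
    return k * n;
}

/* With pipelining, the first instruction needs k stage times to fill the
 * pipeline; after that, one instruction completes every stage time. */
static unsigned long pipelined_cycles(unsigned long k, unsigned long n)
{
    assert(k > 0);
    return (n == 0) ? 0 : k + (n - 1);
}

/* Simplified branch model (an assumption): each taken branch flushes the
 * k - 1 instructions already fetched or decoded behind it, costing k - 1
 * extra stage times, so deeper pipelines pay a larger penalty. */
static unsigned long with_branch_flushes(unsigned long k, unsigned long n,
                                         unsigned long taken_branches)
{
    return pipelined_cycles(k, n) + taken_branches * (k - 1);
}
```

For example, `pipelined_cycles(3, 3)` gives 5, matching the five cycles spanned by the three instructions in Figure 10.3b, while `unpipelined_cycles(3, 3)` gives 9. With 100 instructions, the 3-stage pipeline needs 102 stage times instead of 300, a speed-up close to 3. Ten taken branches add 20 stage times on a 3-stage pipeline but 40 on a 5-stage one, which is why deeper pipelines suffer more from branches.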
10.2 | ARM Architecture

With this background, let us get started on the more intricate details of the processor.

10.2.1 | Instruction Set Architecture

It is likely that you have heard the term 'Instruction Set Architecture' (ISA) mentioned in some context or other. The term refers to the user's, that is, the programmer's, view of the processor, which comprises the instruction set, addressing modes, registers, etc. ISA is the assembly programmer's or compiler designer's view of the processor. We will base most of our discussions on ARM7, which was the first widely successful ARM processor and is still the most popular. Advanced versions may have more enhancements, but the basic architecture is more or less the same.

10.2.2 | Operating Modes

ARM has seven operating modes, which are listed here. It is not important to understand the exact function of each mode right now. But keep in mind that the user mode is the simplest mode, with the least privileges, and is the mode under which most application programs run. The system mode is a highly privileged mode. This mode is used by operating systems to manipulate and control the activities of the processor. The other modes are entered on the occurrence of exceptions; in other words, they are interrupt modes. The operating modes of ARM are:
i) User: Unprivileged mode under which most tasks run
ii) FIQ (Fast Interrupt Request): Entered on a high-priority (fast) interrupt request
iii) IRQ (Interrupt Request): Entered on a low-priority interrupt request
iv) Supervisor: Entered on reset and when a software interrupt instruction (SWI) is executed
v) Abort: Used to handle memory access violations
vi) Undef: Used to handle undefined instructions
vii) System: Privileged mode using the same registers as user mode

10.2.3 | Register Set

ARM has 37 registers, each of which is 32 bits long. They are listed as follows:
i) 1 dedicated program counter (PC)
ii) 1 dedicated current program status register (CPSR)
iii) 5 dedicated saved program status registers (SPSR)
iv) 30 general purpose registers

Now, let's go into the details of the listed registers.

10.2.3.1 | General Purpose Registers

There are 30 of them, but they are distributed among the different modes. To understand this feature, consider one particular mode, say the user mode. In this mode, the registers act as shown in Table 10.4.

Table 10.4 | Registers in the User Mode

| Register Numbers | Designations |
|---|---|
| R0-R12 | General purpose registers |
| R13 | Stack pointer (SP) |
| R14 | Link register (LR) |
| R15 | Program counter (PC) |

Figure 10.5 | Register set of ARM (columns: User and System, Fast Interrupt Request, Interrupt Request, Supervisor, Undefined, Abort; R0-R7, R15 (PC) and the CPSR are common to all modes; FIQ mode has its own R8_FIQ to R14_FIQ; the IRQ, Supervisor, Abort and Undefined modes each have their own R13 and R14; each exception mode also has its own SPSR)

Figure 10.5 shows the whole set of registers available in the processor. Look at the set of registers titled 'user and system'. Let's discuss the specific function of each of them.
R0-R12 are general purpose registers, or what may be designated as 'scratch pad' registers. These are the registers into which data and addresses are loaded. They are also 'the' registers used in computations.
R13 is the pointer to the stack, that is, the stack pointer (SP).
R15 acts as the program counter (PC), which, as in any other processor, is the register that sequences instructions as they are fetched from memory.
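The way the 30 general purpose registers are shared out among the modes can be summarised in a short C sketch. It assumes the arrangement shown in Figure 10.5: R0-R7 and the PC are common to all modes, FIQ mode has its own R8-R14, and the IRQ, Supervisor, Abort and Undefined modes each have their own R13 and R14. The enum and function names are hypothetical. Counting the distinct physical copies of R0-R14 reproduces the total of 30 given in the register list.

```c
#include <assert.h>

/* The seven operating modes of Section 10.2.2 (hypothetical names). */
typedef enum {
    MODE_USR, MODE_FIQ, MODE_IRQ, MODE_SVC, MODE_ABT, MODE_UND, MODE_SYS,
    MODE_COUNT
} arm_mode;

/* Returns the mode that owns the physical copy of register r (0..15)
 * seen by code running in `mode`. MODE_USR means the copy shared with
 * user and system mode. */
static arm_mode register_bank(arm_mode mode, int r)
{
    assert(r >= 0 && r <= 15);
    if (mode == MODE_USR || mode == MODE_SYS)
        return MODE_USR;        /* system mode uses the user registers */
    if (r <= 7 || r == 15)
        return MODE_USR;        /* R0-R7 and the PC are never banked */
    if (mode == MODE_FIQ)
        return MODE_FIQ;        /* FIQ has private R8_FIQ to R14_FIQ */
    if (r >= 13)
        return mode;            /* IRQ, SVC, ABT, UND: private SP and LR */
    return MODE_USR;            /* R8-R12 are shared outside FIQ mode */
}

/* Counts the distinct physical registers behind R0-R14 across all modes. */
static int count_general_purpose_registers(void)
{
    int count = 0;
    for (int r = 0; r <= 14; r++) {
        int seen[MODE_COUNT] = {0};
        for (int m = 0; m < MODE_COUNT; m++) {
            arm_mode owner = register_bank((arm_mode)m, r);
            if (!seen[owner]) {
                seen[owner] = 1;
                count++;
            }
        }
    }
    return count;
}
```

R0-R7 contribute 8 registers, R8-R12 contribute 2 copies each (user and FIQ), and R13-R14 contribute 6 copies each (user, FIQ, IRQ, Supervisor, Abort, Undefined): 8 + 10 + 12 = 30.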
R14 is the link register (LR), a special register. It is used whenever there is a procedure call or an interrupt, that is, a branch to another location. When branching becomes necessary, the return address is saved in the link register, and the PC takes on the new branch address. When returning to the original sequence, the PC value can be retrieved from the link register. This is a very convenient option, because the need to push the PC value onto the stack is avoided. The stack is a memory area, and saving to and retrieving from the stack is time-consuming. Having such a register, that is, the LR, to store return addresses helps to reduce the delay associated with procedure calls and interrupts.

10.2.4 | Mode Switching

We know that there are seven modes for the processor, which implies that it can be switched between modes as required. When the processor switches, say, from the user mode to another mode, some of the user mode registers are replaced by another set of registers. Take the FIQ mode, for example: in this mode, R8 to R14 are replaced by another set of registers, and the names of these registers are suffixed by FIQ, like R14_FIQ, R12_FIQ and so on.

Why is it that FIQ uses another set of registers?

Note that this mode is entered on a 'fast interrupt', which means it requires fast action. One action during interrupts would be to save the contents of the currently used registers. This 'saving' takes some time. To ensure fast operation, new registers are used when the processor switches to FIQ mode, so no time is spent on saving the contents of registers R8 to R14 of the user mode. Once FIQ mode is entered, those registers are simply swapped out and replaced by a set of new registers. Note, however, that not all registers are swapped out.

Now look at Figure 10.5 once again to note the IRQ mode. Here only R13 and R14 are replaced by new registers. In IRQ mode, the response is not expected to be as fast as in FIQ mode. Thus, there is sufficient time for the contents of the other registers to be saved as needed. This also applies to the undef, supervisor and abort modes. In these modes too, only two registers are swapped out and replaced with new ones.

CPSR

The CPSR (Current Program Status Register) is a very important register, and there is only one such register in the processor. Figure 10.6 and Table 10.5 give its details. The CPSR contains information about the current state of the processor. It has bits which specify the mode, control bits to enable/disable interrupts, and a bit which specifies whether the Thumb or ARM state is currently in use.

Figure 10.6 | Current Program Status Register (CPSR) bit configuration
Bits:   31 | 30 | 29 | 28 | 27 | 26-25 | 24 | 23-8 | 7 | 6 | 5 | 4-0
Field:   N |  Z |  C |  V |  Q |   -   |  J |   -  | I | F | T | Mode

Table 10.5 | CPSR Bits

| Bit Nos. | Notation | Interpretation |
|---|---|---|
| 0 to 4 | Mode | Specifies the current mode of operation |
| 5 | T | Specifies whether in Thumb (T = 1) or ARM (T = 0) state |
| 6 | F | Disables FIQ (F = 1) |
| 7 | I | Disables IRQ (I = 1) |
| 8 to 23, 25 to 26 | | Undefined |
| 24 | J | In Jazelle state (J = 1) |
| 27 | Q | Sticky overflow flag |
| 28 to 31 | V, C, Z, N | Conditional (condition flags) |

Bits 0 to 4 specify the current mode of operation. Since there are only 7 modes of operation, only seven mode numbers are valid.
The J bit indicates whether the processor is in Jazelle state. The T bit specifies whether the current operation is in the ARM or Thumb state.
The control bits of this register (mode, I, F and T) can be modified only in a privileged mode; in user mode, only the condition flag bits can be changed. Most of you are likely to know the relevance of the condition flag bits. But for those who might be new to the concept of flags, here is a concise description.

10.2.5 | Conditional Flags

N: Negative Flag  This flag indicates the status of the MSB of the result of an operation. If we are dealing with signed numbers, N = 1 means that the sign bit = 1, which indicates a negative result.

C: Carry Flag  This bit is set if there is a carry out of the MSB of the data being manipulated; this can happen in additions, shifts, rotates, etc. For a subtraction, C is set when no borrow occurs: if R1 - R2 gives C = 1, then R1 is greater than or equal to R2 (treating both as unsigned numbers). To be precise, 'a carry occurs if the result of an add, subtract or compare is greater than or equal to $2^{32}$, or as the result of an inline barrel shifter operation in a move or logical instruction'.

Z: Zero Flag  If the result of an arithmetic or logical operation is zero, then Z = 1.

V: Overflow Flag  This is the overflow flag, which is relevant only for signed operations. It indicates that the sign bit has possibly been corrupted because the result has gone out of range. When signed numbers are used, only 31 bits are available for the magnitude of the numbers. With 32 bits, overflow occurs if the result of an add, subtract or compare is greater than or equal to $2^{31}$ or less than $-2^{31}$, that is, outside the range $-2^{31}$ to $2^{31} - 1$ available for signed numbers.
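The flag rules above can be checked with a small C model of what an ARM ADDS or SUBS instruction does to N, Z, C and V. This is a sketch under stated assumptions: the bit positions follow Table 10.5 (N, Z, C and V in bits 31 to 28, T in bit 5, mode in bits 0 to 4); subtraction is modelled as a + NOT(b) + 1, which is how ARM defines the carry for subtraction, so C = 1 means 'no borrow'; and all helper names are hypothetical. The mode encodings quoted in the comment come from the ARM Architecture Reference Manual, not from this text.

```c
#include <assert.h>
#include <stdint.h>

/* CPSR bit positions as given in Table 10.5 and Figure 10.6. */
#define CPSR_N    (1u << 31)
#define CPSR_Z    (1u << 30)
#define CPSR_C    (1u << 29)
#define CPSR_V    (1u << 28)
#define CPSR_T    (1u << 5)
#define CPSR_MODE 0x1Fu

/* Mode field values (from the ARM Architecture Reference Manual):
 * 0x10 user, 0x11 FIQ, 0x12 IRQ, 0x13 supervisor, 0x17 abort,
 * 0x1B undefined, 0x1F system. */
static uint32_t cpsr_mode(uint32_t cpsr) { return cpsr & CPSR_MODE; }
static int cpsr_in_thumb(uint32_t cpsr) { return (cpsr & CPSR_T) != 0; }

/* N, Z and C from a 33-bit result `wide`; V differs for add and
 * subtract, so each caller computes it separately. */
static uint32_t nzc_flags(uint64_t wide)
{
    uint32_t result = (uint32_t)wide;
    uint32_t flags = 0;
    if (result & 0x80000000u) flags |= CPSR_N;   /* MSB of the result */
    if (result == 0)          flags |= CPSR_Z;
    if (wide >> 32)           flags |= CPSR_C;   /* unsigned result >= 2^32 */
    return flags;
}

/* Flags set by ADDS: V when both operands have the same sign but the
 * result's sign differs (an overflow into the sign bit). */
static uint32_t flags_after_add(uint32_t a, uint32_t b)
{
    uint64_t wide = (uint64_t)a + b;
    uint32_t result = (uint32_t)wide;
    uint32_t flags = nzc_flags(wide);
    if (~(a ^ b) & (a ^ result) & 0x80000000u) flags |= CPSR_V;
    return flags;
}

/* Flags set by SUBS or CMP: computed as a + NOT(b) + 1, so C = 1 means
 * "no borrow", i.e. a >= b when both are read as unsigned numbers.
 * V when the operands' signs differ and the result's sign differs from a. */
static uint32_t flags_after_sub(uint32_t a, uint32_t b)
{
    uint64_t wide = (uint64_t)a + (uint32_t)~b + 1u;
    uint32_t result = (uint32_t)wide;
    uint32_t flags = nzc_flags(wide);
    if ((a ^ b) & (a ^ result) & 0x80000000u) flags |= CPSR_V;
    return flags;
}
```

For instance, adding 1 to 0x7FFFFFFF (the largest positive signed number) sets N and V: the sum has overflowed into the sign bit and would be misread as negative. Subtracting 3 from 5 sets only C, confirming that the first operand is the larger one.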

To cite an example, say two positive numbers are added, and the magnitude of the sum becomes greater than 31 bits. There will be an overflow into the sign bit, which will change the MSB to '1', and the sum will be wrongly interpreted as a negative number. This overflow into the sign bit (MSB), with no overflow out of the MSB, causes the overflow (V) bit to be set.

Q: Sticky Overflow Flag  This flag indicates overflow itself, but it is 'sticky' in the sense that it remains set until explicitly cleared.

Saved Program Status Registers (SPSR)  There are five 'Saved Program Status Registers', that is, one for each of the 'exception' modes of operation. When an exception, that is, an interrupt, occurs, the current CPSR value is saved into the SPSR of the mode being entered (so that it can be restored on returning to the previous mode). The system and user modes do not have SPSRs because they are not entered through the mechanism of interrupts.

10.3 | Interrupt Vector Table

We have seen that ARM has a number of exception modes. Exceptions are a class of interrupts which are internally generated due to the occurrence of some specific conditions. For example, when an undefined instruction is detected, the processor cannot process it. The solution for such an undesired situation is to make the processor switch to another mode and generate an interrupt. This interrupt takes control to an interrupt service routine (ISR, i.e., interrupt handler) residing in a specific location in memory. This specific location is termed the 'interrupt vector' corresponding to this exception.
Besides 'exceptions', the processor can be interrupted by instructions; this is called a software interrupt (SWI). There are hardware interrupts as well, which are activated by FIQ or IRQ.
The aforesaid discussion is just to clarify the fact that associated with all exceptions, hardware and software interrupts, there is a fixed interrupt vector which leads to the ISR or the interrupt handler. See Table 10.6, which shows the predefined interrupt vectors.

Table 10.6 | List of Interrupt Vectors

| Exception | Shorthand | Vector Address |
|---|---|---|
| Reset | RESET | 0x00000000 |
| Undefined instruction | UNDEF | 0x00000004 |
| Software interrupt | SWI | 0x00000008 |
| Prefetch abort | PABT | 0x0000000C |
| Data abort | DABT | 0x00000010 |
| Reserved | | 0x00000014 |
| Interrupt request | IRQ | 0x00000018 |
| Fast interrupt request | FIQ | 0x0000001C |

Note that the first entry in the table is 'Reset'. All processors have an address, termed the 'reset vector', which is the location to which control branches when the processor is first powered on, or when it is reset in the midst of processor activity. For ARM, this is 0x0000 0000. Since this location is always fixed, RESET is usually included in the class of vectored interrupts.

10.4 | Programming the ARM Processor

Now that we have had a look at the concepts of the instruction set architecture (ISA) of ARM, we are in a position to understand it better by programming. Writing, running and testing programs is the key to understanding any processor. By programming, we become capable of understanding almost everything about how registers, memory and flags act on data. In short, we get a total feel for the processing activity inside the processor.
To get to this, we need a programming environment, that is, an Integrated Development Environment (IDE). There are many IDEs available for ARM, some of which are free of cost (and freely downloadable) and some of which are proprietary and thus have to be paid for. However, for students, an evaluation version is freely downloadable from the website www.keil.com. Here, we will use the Keil IDE, also called the RVDK (RealView Development Kit), which is very popular and easy to use. This version can be used for testing programs and also for simulation. We will do all our learning using this IDE. The step-by-step procedure for using it is detailed in Appendix A. In this part of the chapter, we will assume that you have this IDE and that you have already browsed through Appendix A.

10.4.1 | Programming: Assembly vs C

Programming can be done in assembly as well as in high level languages. In the embedded design world, high level languages are used in product design, and C is a very popular language. As such, we will also do C programming (in the next chapter). But before that, let's have a stint in assembly programming. To understand the ARM core, that is, to use its registers, do memory access and so on, we will do assembly programming. This ensures that we get a good grip on the ARM core architecture. In this context, it will turn out that we focus on the computational capabilities of the core.
And when we start using ARM as a microcontroller, i.e., the core with a number of peripherals, we will use C programming. This will allow us to use the processor in various practical applications involving peripherals and interaction with the external world. This part will be discussed in Chapter 11.

10.5 | ARM Assembly Language

As mentioned earlier, the ARM instruction set has been cleverly designed to get more than one operation done in a single instruction. Let's list some features of the ARM instruction set.
350 EMBEDDED SYSTEMS ARM- THE WORLD'S MOST POPULAR 32-BIT EMBEDDED PROCESSOR 351
i) ARM is a RISC processor, in which every instruction has a maximum size of 32 bits. Instructions are expected to be executed in one cycle. This is true for most instructions, but not for all. Therefore, it is better to say that ARM is a RISC processor with a few CISC-type instructions as well.
ii) Another feature of RISC, and therefore of ARM, is that it is a load-store architecture. This means that all computations are register based. The operands have to be brought into registers from memory using a load instruction, and after computation the result has to be stored back in memory. For the user, this means that there are no data processing instructions in which one of the operands is in memory. All operands have to be available in registers before computation can be done.
iii) A third feature of ARM is that its ALU has a barrel shifter (Figure 10.7) associated with one of its operands. A barrel shifter is a unit that can shift or rotate an operand by more than one bit position, to the right or to the left, in a single operation. As we will soon see, the barrel shifter adds some clever processing techniques to data processing and allows a shift and an arithmetic operation to be combined in the same instruction.
iv) 'Conditions' can be appended to instructions. This implies that we can choose to 'do or not do' a particular operation based on the status of a condition flag. For most other processors, only branching operations depend on flag status. Here we will see that data movement as well as data processing instructions can be made 'conditional'.

[Figure 10.7 | Data processing unit: Operand 1 goes directly to the ALU, while Operand 2 can first pass through the barrel shifter; the ALU produces the Result.]

10.5.1 | Data Types
ARM can operate on 32-bit data, which is termed a word, on 16-bit data, called a halfword, and also on byte operands. The processing tools offer the option of storing data in 'little endian' or 'big endian' format. To clarify this concept, follow the forthcoming discussion and observe Figure 10.8.

A 32-bit data word stored in memory needs 4 bytes of space. This means that 4 consecutive addresses are required, because one address can store only one byte. When the lowest byte of the 32-bit word is stored in the lowest of these four addresses, it is called the 'little endian' format. Otherwise, it is the 'big endian' format. In Figure 10.8, the 32-bit data word is 0x0E4790A3 and the storage addresses are from 0x00001200 onwards.

Figure 10.8a | The little endian format
Address       Data
0x00001200    A3
0x00001201    90
0x00001202    47
0x00001203    0E

Figure 10.8b | The big endian format
Address       Data
0x00001200    0E
0x00001201    47
0x00001202    90
0x00001203    A3

In the processor industry, both formats are used. Intel prefers the little endian format, while Motorola uses the big endian format. ARM allows both formats (the choice can be fixed by software in the initialization stage). In this book, we assume the little endian format.

10.5.2 | Data Alignment
Storing (and also loading) 4 bytes in memory can be done in one cycle, because the processor has a 32-bit data bus. When 32-bit data is stored in memory, four addresses are needed. We need to specify only one address in our instruction, but there is an aspect called 'alignment'. For 32-bit data, 'alignment' implies that the last two bits of this address are zero. For example, the address 0x00001200 is an aligned address. When this address is used to store 32-bit data, this address and the next three addresses are automatically accessed. This is because memory is organized as four banks (see Figure 10.9).

[Figure 10.9 | Memory banks: memory is organized as four byte-wide banks (Bank 0 to Bank 3), which together form the 32-bit data bus D31 to D0; addresses 0x1200 to 0x1203 lie in one row and 0x1204 to 0x1207 in the next row.]
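The byte orders of Figure 10.8 and the alignment rule can be checked with a short C sketch. It is a minimal model under stated assumptions. The helper names store_little_endian, store_big_endian, word_aligned and halfword_aligned are hypothetical. Memory is simulated by a 4-byte array in which mem[0] stands for the lowest address (0x00001200 in the figure).

```c
#include <stdbool.h>
#include <stdint.h>

/* Store a 32-bit word into four consecutive byte locations mem[0..3],
   where mem[0] stands for the lowest address (e.g. 0x00001200). */
static void store_little_endian(uint32_t word, uint8_t mem[4]) {
    for (int i = 0; i < 4; i++) {
        mem[i] = (uint8_t)(word >> (8 * i));       /* lowest byte at lowest address */
    }
}

static void store_big_endian(uint32_t word, uint8_t mem[4]) {
    for (int i = 0; i < 4; i++) {
        mem[i] = (uint8_t)(word >> (8 * (3 - i))); /* highest byte at lowest address */
    }
}

/* Word (32-bit) accesses are aligned when the two least significant address
   bits are 0; halfword (16-bit) accesses when the least significant bit is 0. */
static bool word_aligned(uint32_t addr)     { return (addr & 0x3u) == 0; }
static bool halfword_aligned(uint32_t addr) { return (addr & 0x1u) == 0; }
```

Calling store_little_endian(0x0E4790A3, mem) leaves A3, 90, 47, 0E in mem[0] to mem[3], matching Figure 10.8a. The big endian variant gives the order of Figure 10.8b. Masking the low address bits is all that the alignment test needs.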
If the address of a 32-bit number is given as 0x1200, the accessed addresses are 0x1200, 0x1201, 0x1202 and 0x1203. The 4 bytes in these addresses are considered to be in the same row, that is, aligned. In this case, one byte from each bank is accessed, and only one memory cycle is needed to access an aligned word.
For unaligned data, one more cycle is necessary. Think of the address 0x1201. The locations to be accessed will be 0x1201, 0x1202, 0x1203 and 0x1204. Note that the first three bytes will be in the same row, while the last will be in the next row, so one more cycle of access will be required.
We summarize the conditions for 'aligned data' as follows:
• For word (32-bit) data, the specified address should have its least significant two bits as 0.
• For halfword (16-bit) accesses, the specified address should have the LSB equal to 0.
Most of the tools for ARM ensure that data is stored in aligned locations, so as to avoid unnecessary extra cycles of operation.

10.5.3 | Assembly Language Rules
An assembly language line has four fields, namely, label, opcode, operand and comment. A label is positioned at the left of a line and is the symbol for the memory address which stores that line of information. There are certain rules regarding the labels allowed by the assembler being used; the manual of the specific assembler should be consulted to get this clear. The second field is the opcode or instruction field. The third is the operand field, and the last is the comment field, which starts with a semicolon. The use of comments is advised for making programs more readable.
A typical assembly language statement is

BOSE    ADD R1, R2, R3    ;add R2 and R3 and copy the sum to R1

The label is BOSE, the opcode is ADD, the operands are R1, R2 and R3, and the text after the semicolon is the comment. While writing programs, make sure you don't write instructions at the extreme left of the page, because that part is the 'label' field. In this book, we will use the assembler which is part of the RVDK supplied by Keil. The steps in using it are described in Appendix A. More details are available in the 'RealView assembler guide'.

10.6 | ARM Instruction Set
We will now discuss the ARM instruction set, and gradually move on to writing programs.
The instruction set can be broadly classified as follows:
i) Data processing instructions
ii) Load store instructions (single register, multiple register)
iii) Branch instructions
iv) Status register access instructions
The last set moves the contents of the CPSR or an SPSR to or from a general purpose register, and is mainly used in privileged modes. We will discuss the first three sets in detail.

10.6.1 | Data Processing Instructions
ARM is a RISC processor. One of its features is that it processes data, that is, performs computations on data, only when the data is in registers. There are instructions which move data from one register to another. Such instructions have only two operands: the source and the destination. Instructions which perform arithmetic/logical computations have three operands: two source operands and one destination operand.

10.6.1.1 | MOV and MVN
The 'MOV' instruction is a 'register to register' data movement instruction with the format MOV destination, source, where both the source and the destination have to be registers.
The mnemonic 'MVN' stands for 'move negated'. It moves the complemented value of the source to the destination.
Registers R0 to R12 can be used for data movement, as they are general purpose registers. The registers R13, R14 and R15, which are the stack pointer, link register and program counter respectively, can also be used with the MOV instruction. This must be done carefully and only for specific purposes.

Examples
MOV R11, R2    ;copy the contents of R2 to R11
MOV R12, R10   ;copy the contents of R10 to R12
MVN R0, R9     ;move the complemented value of R9 to R0
               ;if R9 = 0xFFF00000, R0 = 0x000FFFFF

Note: Here we have discussed only the case of the MOV instruction used for moving data between registers. The MOV instruction is also used for copying immediate data into registers. That will be discussed in Section 10.17.

10.6.1.2 | The Barrel Shifter
Now, refer to Figure 10.7. We see that there is a barrel shifter associated with data processing. The figure shows two register operands, one of which can optionally be acted upon by the barrel shifter before being admitted to the ALU. The barrel shifter can do shifting and rotation. Let us first have a general discussion on shifts and rotations.

10.6.2 | Shift and Rotate
Two types of shifts are possible: logical and arithmetic.

10.6.2.1 | Logical Shift Left (LSL)
[Figure 10.10 | Logical shift left: the register contents move left, 0 enters on the right, and the bit leaving on the left goes to the carry flag (CF).]
A Logical Shift Left of a (say) 32-bit number shifts it left (a specified number of times), and the vacant bits on the right are filled with zeros. See Figure 10.10. The last bit

shifted out from the left is copied to the carry flag. Keep in mind that a left shift by one bit position corresponds to multiplication by 2. An LSL of 5 implies multiplication by 32.

10.6.2.2 | Logical Shift Right (LSR)
Logical Shift Right does a similar thing. The vacant bit positions on the left are filled with zeros, and the last bit shifted out is retained in the carry flag. This is shown in Figure 10.11. Shifting an unsigned number right by one divides it by 2 (the remainder is discarded), and two right shifts cause a division by 4.
[Figure 10.11 | Logical shift right: 0 enters on the left, and the bit leaving on the right goes to CF.]

10.6.2.3 | Arithmetic Shift Right (ASR)
Arithmetic Shift Right is different in the sense that the vacant bit positions on the left are filled with the MSB of the original number. See Figure 10.12. This type of shift performs 'sign extension' of data: for positive numbers the MSB is 0, and for negative numbers the MSB is 1. There is no separate arithmetic shift left, because it would be identical to a logical shift left (zeros enter from the right in both cases).
[Figure 10.12 | Arithmetic shift right: copies of the MSB enter on the left, and the bit leaving on the right goes to CF.]

10.6.2.4 | Rotate Right (ROR)
In this, the data is moved right, and the bits shifted out from the right are inserted back through the left. See Figure 10.13. The last bit rotated out is available in the carry flag. There is no 'rotate left' instruction, because left rotation by n times can be achieved by rotating to the right (32 - n) times. For example, rotating 4 times to the left is achieved by rotating 32 - 4 = 28 times to the right.
[Figure 10.13 | Rotate right: the bit leaving on the right re-enters on the left and is also copied to CF.]

10.6.2.5 | Rotate Right Extended (RRX)
This corresponds to rotating right through the carry bit: the bit that drops off from the right side is moved to C, and the old carry bit enters through the left of the data. RRX always rotates by exactly one bit position. See Figure 10.14.
[Figure 10.14 | Rotate right extended: CF enters on the left, and the bit leaving on the right goes to CF.]

10.6.3 | Format of Shift and Rotate Instructions
The number of bit positions by which shifts and rotations are to be done may be specified by a constant or may be held in another register.

Examples
LSL R2, #4     ;shift left logically the content of R2 by 4 bit positions
ASR R5, #8     ;shift right arithmetically the content of R5 by 8 bit positions
ROR R1, R2     ;rotate right the content of R1 by the number specified in R2

Example 10.1
The contents of some of the registers are given as:
R1 = 0xEF00DE12, R2 = 0x0456123F, R5 = 4, R6 = 28.
Find the result (in the destination register) when each of the following instructions is executed. Each instruction acts on the register values given above.
i) LSL R1, #8
ii) ASR R1, R5
iii) ROR R2, R6
iv) LSR R2, #5

Solution
i) Shifting R1 left 8 times puts 8 zeros in the 8 positions on the right. R1 now contains 0x00DE1200.
ii) R5 contains 4. Arithmetically right shifting R1 4 times causes the MSB (1, for the given number) to be replicated 4 times on the left, thus causing a sign extension of the shifted number. R1 now contains 0xFEF00DE1.
iii) R6 contains 28. Rotating R2 28 times to the right is equivalent to rotating it 32 - 28 = 4 times to the left. After rotation, R2 contains 0x456123F0.
iv) Here, R2 is logically shifted right 5 times, and so 5 zeros enter through the left. R2 now has the value 0x0022B091.

10.6.4 | Combining the Operations of Move and Shift
Recall the barrel shifter, which is an integral part of the data processing unit of the processor. It allows shifting and data processing to be done in the same instruction cycle. We will first see how moving and shifting can be combined in one instruction.

MOV R1, R2, LSL #2
MOV R1, R2, LSR R3

In both the above instructions, R1 is the destination register. In the first instruction, the source operand (the content of R2) is logically shifted left twice and then moved to the destination register R1. In the second, the amount of shifting is specified in register R3. After the shifting is done, the result is moved to R1.
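To experiment with the shifts above without hardware, the barrel shifter can be modeled in C. This is a minimal sketch under stated assumptions. The type shift_result and the functions lsl32, lsr32, asr32, ror32 and rrx32 are hypothetical helpers. Shift amounts are limited to 0 to 31, and an amount of 0 leaves both the value and the incoming carry unchanged. Each function returns the shifted value together with the bit that would be copied into the carry flag.

```c
#include <stdint.h>

/* Output of the barrel shifter: shifted value plus the carry-out bit. */
typedef struct {
    uint32_t value;
    unsigned carry; /* 0 or 1 */
} shift_result;

/* Shift amounts are assumed to be 0..31; an amount of 0 leaves both the
   value and the incoming carry unchanged. */

static shift_result lsl32(uint32_t x, unsigned n, unsigned c_in) {
    shift_result r = { x, c_in };
    if (n > 0) {
        r.carry = (x >> (32 - n)) & 1u; /* last bit shifted out on the left */
        r.value = x << n;               /* zeros enter on the right */
    }
    return r;
}

static shift_result lsr32(uint32_t x, unsigned n, unsigned c_in) {
    shift_result r = { x, c_in };
    if (n > 0) {
        r.carry = (x >> (n - 1)) & 1u;  /* last bit shifted out on the right */
        r.value = x >> n;               /* zeros enter on the left */
    }
    return r;
}

static shift_result asr32(uint32_t x, unsigned n, unsigned c_in) {
    shift_result r = lsr32(x, n, c_in);
    if (n > 0 && (x & 0x80000000u)) {
        r.value |= ~(0xFFFFFFFFu >> n); /* copies of the MSB enter on the left */
    }
    return r;
}

static shift_result ror32(uint32_t x, unsigned n, unsigned c_in) {
    shift_result r = { x, c_in };
    if (n > 0) {
        r.value = (x >> n) | (x << (32 - n)); /* bits leaving on the right re-enter on the left */
        r.carry = r.value >> 31;              /* last bit rotated out */
    }
    return r;
}

/* RRX: always a one-bit rotate right through the carry flag. */
static shift_result rrx32(uint32_t x, unsigned c_in) {
    shift_result r;
    r.value = (x >> 1) | ((c_in & 1u) << 31);
    r.carry = x & 1u;
    return r;
}
```

For instance, MOV R1, R2, LSL #2 from Section 10.6.4 corresponds to lsl32(r2, 2, c).value. The carry field is what an S-suffixed form would copy into the C flag. Applied to the register values of Example 10.1, these functions reproduce the four results given in the solution.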

Example 10.2
Find the content of the destination registers after the execution of each of the given instructions, given that R5 = 0x72340200 and R2 = 4.
i) MOV R3, R5, LSL #3
ii) MOV R6, R5, ASR R2

Solution
The results here are similar to Example 10.1, except that the source and destination registers are not the same.
i) MOV R3, R5, LSL #3
The content of R5 is shifted left 3 times and moved to R3. R3 now contains 0x91A01000.
ii) MOV R6, R5, ASR R2
R2 = 4, and so R5 is arithmetically shifted right 4 times. Since the MSB of the number in R5 is 0, this bit is replicated 4 times at the left of the number during the right shift. After execution, R6 contains 0x07234020.

10.7 | Conditional Execution
ARM has another interesting feature, which can be called 'conditional execution'. This means that instructions are executed only if a specified condition is true. The important point is that this is not limited to branch instructions: any data processing instruction can be used in this way.
In general, all arithmetic and logic instructions are expected to affect the conditional flags. For ARM, however, we must suffix the instruction with S for this to happen; otherwise, the flags are unaffected. It is the S suffix on a data processing instruction that causes the flags in the CPSR to be updated.
In Example 10.2, the instruction MOV R3, R5, LSL #3 involves a logical operation, the left shift. This would cause the carry flag and the N flag to be set. But since the MOV instruction is not appended with the suffix 'S', the flags remain unaffected, that is, they keep whatever values they had before. The MOV instruction can be made to update the flags by writing it as MOVS R3, R5, LSL #3. After this is executed, we find the N and C flags to be set. C receives the last bit shifted out (bit 29 of R5, which is 1), and N is bit 31 of the result 0x91A01000, which is also 1. This flag setting can be used to make an instruction following it 'conditional'. We will soon see more aspects of this.
Figure 10.15 shows the format of a typical ARM instruction. In the instruction code, four bits are allotted for the condition under which the instruction is to be executed. If no condition is indicated, these bits take the 'always' condition. Table 10.7 lists the conditions, the condition codes and the flag statuses for these conditions. We will discuss the use of condition codes for instructions.
Note that the conditions used for signed numbers and unsigned numbers are different. For unsigned numbers, we use the mnemonics 'higher' or 'lower', while for signed numbers, the conditions are specified as 'greater than' or 'less than'. The flag settings are also different. This is because the same bit pattern stands for different values in the two cases. For example, 0xFFFFFFFF is higher than 0x00000001 as an unsigned number, but as a signed number it is -1, which is less than 1. Thus, it is clear that unsigned and signed numbers have to be dealt with differently.

[Figure 10.15 | Format of a typical instruction: bits 31 to 28 hold the condition field (COND), bits 27 to 20 the opcode, and the remaining fields (bits 19 to 16, 15 to 12, 11 to 4 and 3 to 0) other information.]

Table 10.7 | List of Conditions, Codes and Corresponding Flag Status
Cond   Mnemonic   Meaning                       Condition flag state
0000   EQ         Equal                         Z = 1
0001   NE         Not equal                     Z = 0
0010   CS/HS      Carry set/unsigned >=         C = 1
0011   CC/LO      Carry clear/unsigned <        C = 0
0100   MI         Minus/negative                N = 1
0101   PL         Plus/positive or zero         N = 0
0110   VS         Overflow                      V = 1
0111   VC         No overflow                   V = 0
1000   HI         Unsigned higher               C = 1 and Z = 0
1001   LS         Unsigned lower or same        C = 0 or Z = 1
1010   GE         Signed >=                     N = V
1011   LT         Signed <                      N != V
1100   GT         Signed >                      Z = 0 and N = V
1101   LE         Signed <=                     Z = 1 or N != V
1110   AL         Always                        -
1111   (NV)       Unpredictable                 -

10.8 | Arithmetic Instructions
Now let's get a feel of the arithmetic instructions of ARM and the special ways in which they can be used.

10.8.1 | Addition and Subtraction
Addition and subtraction are three-operand instructions. The destination is always a register. The source operands may both be registers, or one of them may be immediate data. There are some issues in using immediate data greater than 8 bits (refer to Section 10.17). See Table 10.8, which gives examples of how the different addition and subtraction instructions work. Any of the general purpose registers may be used as operands, though only R3, R4 and R5 have been used in the table.
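The condition codes of Table 10.7 can be made concrete with a minimal C sketch of a flag-setting subtraction followed by a condition check. It rests on several assumptions. The names flags, subs, cond and cond_passed are hypothetical. Only SUBS and ten of the conditions are modeled. The carry follows the ARM convention for subtraction: C = 1 means no borrow, that is, Rn >= Rm as unsigned numbers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Condition flags as held in the CPSR. */
typedef struct { bool n, z, c, v; } flags;

/* Model of SUBS Rd, Rn, Rm: returns Rn - Rm and sets N, Z, C and V. */
static uint32_t subs(uint32_t rn, uint32_t rm, flags *f) {
    uint32_t d = rn - rm;
    f->n = (d >> 31) & 1u;  /* sign bit of the result */
    f->z = (d == 0);        /* result is zero */
    f->c = (rn >= rm);      /* no borrow (unsigned Rn >= Rm) */
    /* Signed overflow: operands have different signs and the result's sign differs from rn. */
    f->v = (((rn ^ rm) & (rn ^ d)) >> 31) & 1u;
    return d;
}

/* Ten of the conditions of Table 10.7. */
typedef enum { EQ, NE, HS, LO, HI, LS, GE, LT, GT, LE } cond;

static bool cond_passed(cond cc, flags f) {
    switch (cc) {
    case EQ: return f.z;
    case NE: return !f.z;
    case HS: return f.c;
    case LO: return !f.c;
    case HI: return f.c && !f.z;
    case LS: return !f.c || f.z;
    case GE: return f.n == f.v;
    case LT: return f.n != f.v;
    case GT: return !f.z && (f.n == f.v);
    case LE: return f.z || (f.n != f.v);
    }
    return false;
}
```

Subtracting 1 from 0xFFFFFFFF leaves C = 1 and Z = 0, so the unsigned condition HI passes. The same subtraction leaves N = 1 and V = 0, so the signed condition GT fails. This is exactly the difference between 'higher' and 'greater than' described in Section 10.7.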

Table 10.8 | List of Arithmetic Instructions
Instruction         Operation                        Calculation
ADD R3, R4, R5      Add                              R3 = R4 + R5
ADC R3, R4, R5      Add with carry                   R3 = R4 + R5 + C
SUB R3, R4, R5      Subtract                         R3 = R4 - R5
SBC R3, R4, R5      Subtract with carry              R3 = R4 - R5 - NOT(C)
RSB R3, R4, R5      Reverse subtract                 R3 = R5 - R4
RSC R3, R4, R5      Reverse subtract with carry      R3 = R5 - R4 - NOT(C)

Remember the concept of suffixing data processing instructions. It can be used ingeniously to make operations conditional. For example, the ADD instruction (like any other data processing instruction) does not affect the conditional flags unless it is suffixed with S. Following such an ADDS instruction, we can have instructions with conditions appended to them. The set of possible conditions is listed in Table 10.7. For any instruction, the upper 4 bits are used to specify the condition (Figure 10.15).
Consider these program lines:

SUBS R1, R2, R3    ;the suffix 'S' has been used
MOVEQ R2, R1       ;the EQ notation tests the Z = 1 condition

Here the move instruction is executed only if the subtraction produces a zero result and sets the zero flag. The condition EQ implies that the zero flag is set (refer to Table 10.7). Let's use this concept in a simple example.

Example 10.3
It is required to compare two numbers which are in registers R1 and R2. The bigger number is to be placed in R10. If the two numbers are equal, then the number is to be moved to R9.

Solution
Here we use the subtraction operation to do the comparison, treating the numbers as unsigned.

SUBS R3, R1, R2     ;R3 = R1 - R2
MOVEQ R9, R1        ;if R1 and R2 are equal (Z = 1), move R1 to R9
MOVHI R10, R1       ;if R1 > R2 (C = 1 and Z = 0), R1 is moved to R10
MOVLO R10, R2       ;otherwise, i.e., if R1 is less than R2 (C = 0), move R2 to R10

The salient points of this program are as follows:
i) First, the operation R1 - R2 is performed and the result is placed in R3.
ii) Since the SUB instruction has been appended with S, the flags are set accordingly.
iii) If the two numbers are equal, the zero flag gets set and the MOVEQ instruction is executed. Otherwise, it becomes a NOP (no operation) instruction. Here one of the numbers (R1) is moved to R9 (as both numbers are equal).
iv) The next line checks whether the carry flag has been set. If R1 > R2, the carry flag is set (C = 1) and the MOVHI (move if higher) instruction is executed. Otherwise, this also becomes a NOP. This move instruction gets the bigger number into R10.
v) If the carry flag is not set (and so the Z flag is also not set), it means that R2 is bigger.

Note: The last line must keep its condition. If it were written as a plain MOV, it would execute every time and overwrite the value that MOVHI placed in R10 when R1 is bigger.

Example 10.3 might seem much too simple to need so much explanation, but the intention is to make the idea of conditional execution very clear, so that you can tackle more difficult problems with ease.
One question that may come to your mind is: why have such conditional execution? The answer is that it lets us avoid branch instructions in many instances. This is a great saving, as branching causes stalling of the pipeline (Section 10.2). It allows very dense code without many branches. Conditional instructions that are not executed still take time, but this penalty is less than the overhead of a branch.

Example 10.4
Find the result of the following instructions. What do these instructions accomplish?
i) ADD R1, R2, R2, LSL #3
ii) RSB R3, R3, R3, LSL #3
iii) RSB R3, R2, R2, LSL #4
iv) SUB R0, R0, R0, LSL #2
v) RSB R2, R1, #0

Solution
i) ADD R1, R2, R2, LSL #3
One source operand is R2, LSL #3. Left shifting 3 times accomplishes multiplication by 2^3 = 8. The result of the whole operation is R1 = R2 + 8R2 = 9R2.
ii) RSB R3, R3, R3, LSL #3
R3 = 8R3 - R3 = 7R3
iii) RSB R3, R2, R2, LSL #4
R3 = 16R2 - R2 = 15R2
iv) SUB R0, R0, R0, LSL #2
R0 = R0 - 4R0 = -3R0
v) RSB R2, R1, #0
We get R2 = 0 - R1 = -R1, i.e., the negative value of R1.
Thus, each of these instructions multiplies a register by a constant (9, 7, 15, -3 or -1) using only a shift combined with an addition or subtraction.

10.9 | Logical Instructions
Now, we will see the logical instructions of the processor. They also need to be suffixed with 'S' to have the flags updated. See Table 10.9.
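The operations of Table 10.9 map directly onto C's bitwise operators. The following minimal sketch also shows the N and Z flags that the S-suffixed forms update. The enum logic_op and the function logic_s are hypothetical names. Only N and Z are modeled; the carry (which comes from the barrel shifter) and V are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { OP_AND, OP_ORR, OP_EOR, OP_BIC } logic_op;

/* Model of ANDS, ORRS, EORS and BICS: returns the result and updates N and Z. */
static uint32_t logic_s(logic_op op, uint32_t rn, uint32_t op2, bool *n, bool *z) {
    uint32_t d = 0;
    switch (op) {
    case OP_AND: d = rn & op2;  break;  /* bits set in both operands     */
    case OP_ORR: d = rn | op2;  break;  /* bits set in either operand    */
    case OP_EOR: d = rn ^ op2;  break;  /* bits that differ              */
    case OP_BIC: d = rn & ~op2; break;  /* clear the bits that op2 marks */
    }
    *n = (d >> 31) & 1u;
    *z = (d == 0);
    return d;
}
```

With R3 = R4 = 0x0FF00FF0 and R0 = 0, as in Example 10.5 below, both logic_s(OP_EOR, r3, r4, ...) and logic_s(OP_AND, r3, r0, ...) return 0 and set Z.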

Table 10.9 | List of Logical Instructions
Instruction         Operation                          Logical result
AND R3, R4, R5      Logical AND of 32-bit values       R3 = R4 AND R5
ORR R3, R4, R5      Logical OR of 32-bit values        R3 = R4 OR R5
EOR R3, R4, R5      Logical XOR of 32-bit values       R3 = R4 XOR R5
BIC R3, R4, R5      Logical bit clear                  R3 = R4 AND (NOT R5)

Example 10.5
Given the contents of the registers as R3 = 0x0FF00FF0, R4 = 0x0FF00FF0 and R0 = 0, find the values in R1 and R5 at the end of the sequence of instructions shown.
i) EORS R1, R3, R4
ii) ANDS R5, R3, R0

Solution
The content of the destination register and the affected flag are shown alongside each executed instruction.
i) EORS R1, R3, R4    ;R1 = 0x00000000, Z = 1
ii) ANDS R5, R3, R0   ;R5 = 0x00000000, Z = 1

Note: One of the source operands may be 8-bit immediate data as well. Refer to Section 10.17 for details of how to handle data bigger than 8 bits.

10.10 | Compare Instructions
These instructions compare two operands and cause the conditional flags to be affected, but neither the destination nor the source changes. For CMP, the comparison is done by a subtraction, and the flags are set or reset according to its result. ARM has four types of compare instructions, as shown in Table 10.10. For unsigned comparisons, only two flags really matter: the zero flag and the carry flag. Refer to Table 10.11 for the flag settings after a compare instruction.
Note: Since the compare instructions always affect the flags, the suffix S is not required for them.
Comparison is a very important operation, and we will use it very frequently. A number of programs using this instruction will be discussed subsequently.

Table 10.10 | List of 'Compare' Instructions
CMP R3, R4     Compare             R3 - R4, but only flags affected
CMN R3, R4     Compare negated     R3 + R4, but only flags affected
TST R3, R4     Test                R3 AND R4, but only flags affected
TEQ R3, R4     Test equivalence    R3 XOR R4, but only flags affected

Table 10.11 | Flag Settings After a Compare Instruction (CMP R3, R4, unsigned values)
If          C    Z
R3 > R4     1    0
R3 < R4     0    0
R3 = R4     1    1

What is the use of the TST instruction?
TST is an instruction similar to compare, but it does ANDing and then sets the conditional flags. If the result of ANDing is zero, the Z flag is set. It can be used to check whether a particular bit (or at least one of a group of bits) of a data word is set. For that, 'test' the number with another number in which the required bit position has a '1'. For example, let's say we need to know if the LSB of the content of R1 is set. Use the instruction TST R1, #01 and check the status of the Z flag. If Z = 1, it implies that the LSB of R1 is not set, because the AND operation for that bit has produced a 0, not a 1.

What is the use of the TEQ instruction?
TEQ does exclusive ORing, which tests for equality. If both operands are equal, the Z flag is set. It checks whether the value in a register is equal to a specified number. The instruction TEQ R1, #45 checks whether the content of R1 is 45.

10.11 | Multiplication
Multiplication is a complex operation which needs specialized hardware and takes more than one cycle to execute. ARM has a number of multiplication instructions which use this hardware. Let's examine how these instructions are used.

10.11.1 | Multiply
The format of the multiply instruction is

MUL Rd, Rm, Rs

where Rd is the destination register, and Rm and Rs are source registers. A number of points are to be kept in mind when these instructions are used. Table 10.12 lists different types of multiplication instructions.

Table 10.12 | List of Multiply Instructions
Instruction              Operation                           Calculation
SMLAL R0, R1, R2, R3     Signed multiply and accumulate      [R0, R1] = [R0, R1] + R2 * R3
SMULL R0, R1, R2, R3     Signed multiply                     [R0, R1] = R2 * R3
UMLAL R0, R1, R2, R3     Unsigned multiply and accumulate    [R0, R1] = [R0, R1] + R2 * R3
UMULL R0, R1, R2, R3     Unsigned multiply                   [R0, R1] = R2 * R3
(In each case, R0 receives the lower 32 bits and R1 the upper 32 bits of the 64-bit result.)
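The register widths involved in Table 10.12 can be checked with a small C model. This is a minimal sketch and the function names are hypothetical. mul32 keeps only the low 32 bits of the product, as MUL does. umull and smull split a 64-bit product into low and high words, as UMULL and SMULL do. The times9, times7 and times15 helpers repeat the shift-and-add idea of Example 10.4.

```c
#include <stdint.h>

/* MUL Rd, Rm, Rs: only the lower 32 bits of the product are kept. */
static uint32_t mul32(uint32_t rm, uint32_t rs) {
    return rm * rs; /* unsigned arithmetic wraps modulo 2^32 */
}

/* UMULL RdLo, RdHi, Rm, Rs: full 64-bit unsigned product in two registers. */
static void umull(uint32_t rm, uint32_t rs, uint32_t *lo, uint32_t *hi) {
    uint64_t p = (uint64_t)rm * rs;
    *lo = (uint32_t)p;
    *hi = (uint32_t)(p >> 32);
}

/* SMULL RdLo, RdHi, Rm, Rs: the same, with the operands treated as signed. */
static void smull(int32_t rm, int32_t rs, uint32_t *lo, uint32_t *hi) {
    uint64_t p = (uint64_t)((int64_t)rm * rs);
    *lo = (uint32_t)p;
    *hi = (uint32_t)(p >> 32);
}

/* Multiplying by a constant with shift-and-add (compare Example 10.4). */
static uint32_t times9(uint32_t x)  { return x + (x << 3); } /* x + 8x  */
static uint32_t times7(uint32_t x)  { return (x << 3) - x; } /* 8x - x  */
static uint32_t times15(uint32_t x) { return (x << 4) - x; } /* 16x - x */
```

For example, 0x10000 x 0x10000 overflows 32 bits, so mul32 returns 0. UMULL of 0xFFFFFFFF and 2 gives a high word of 1 and a low word of 0xFFFFFFFE. The low word of a product is the same whether the operands are read as signed or unsigned, which is why a single MUL suffices for both.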
i) The source and destination registers are 32 bits in length. If the product is longer than 32 bits, only the lower 32 bits are preserved in the destination register.
ii) Immediate data cannot be used as a source operand.
iii) If the multiplicand and multiplier are signed numbers, it is up to the programmer to work out how to interpret the sign of the product. (The lower 32 bits of a product are the same whether the operands are treated as signed or unsigned.)
iv) The instruction can be made conditional.

Examples
MUL R1, R2, R3       ;R1 = R2 x R3
MULS R1, R2, R3      ;R1 = R2 x R3, and the flags are also set
MULSEQ R3, R2, R1    ;R3 = R2 x R1 is done only if Z = 1 (because of the EQ suffix);
                     ;because of the S suffix, the flags are updated
MULEQ R4, R3, R5     ;if Z = 1, R4 = R3 x R5

10.11.2 | Multiply and Accumulate
The format of this instruction is

MLA Rd, Rm, Rs, Rn    ;Rd = (Rm * Rs) + Rn

This instruction does multiplication and accumulation (addition), as seen above. All the conditions specified for the MUL instruction are applicable here as well.

Example
MLA R0, R1, R2, R3    ;R0 = R1 x R2 + R3

10.11.3 | Long Multiply/Long Multiply and Accumulate
In these instructions, 32-bit data are multiplied to get 64-bit results, and the upper 32 bits are saved in a specified register. For signed data, the sign bit is also preserved in the upper register. The format is

Instruction <RdLo>, <RdHi>, <Rm>, <Rs>

Note that in all these cases, two registers function as the destination. Table 10.12 lists the available long multiply and long multiply-accumulate instructions.
Since multiplication is a complex instruction, it takes many cycles for execution. So, where possible (for example, when multiplying by a constant), it is better to realize multiplication using shifting and adding rather than using one of the multiply instructions.

10.12 | Division
Division is another complex operation requiring specialized hardware and extra clock cycles. As a policy, the basic ARM architecture does not have a 'divide' instruction. Division can be realized using repeated subtraction. Compilers are given the responsibility of accomplishing division using the simple instructions of the processor.

10.13 | Starting Assembly Language Programming
If you have any previous experience of assembly language programming, you will know that two kinds of items are used in it: instructions and directives. Instructions are executable statements, which are 'executed' by the processor. Directives are non-executable statements meant for the assembler. They give the assembler the information it needs to perform the assembly process smoothly. For some processors, directives are also called pseudo instructions. For ARM, however, pseudo instructions are special directives that the assembler translates into one or more real instructions. Thus, they also produce executable statements. So, for ARM, an assembly language line will contain an instruction, a directive or a pseudo instruction.
Writing and testing a program for ARM is done on a computer, usually a PC, which is called the host computer. The host computer should have the program development tools for ARM. Since the program written in ARM assembly language is assembled on a PC that has a different processor (usually some version of Pentium), the process is called 'cross assembly'. After the program is tested, it is converted to a hex file and burned into the memory of our processor (i.e., ARM).
The output of an assembly or compilation process has at least two areas:
i) A code area. This is usually a read-only area.
ii) A data area. This is usually a read-write area.
The default area for code is Read-Only and for data it is Read-Write.
Let's understand some fundamental directives first.

10.13.1 | The AREA Directive
The first thing we do when we start assembly language programming is to define an area. There is a directive named 'AREA' for this. This directive names the area and sets its attributes. The attributes are placed after the name, separated by commas.

Examples
        AREA SORT, CODE, READONLY
        AREA TABLE, DATA

The first area defined above is given the name SORT; it contains code and is read only. The word READONLY is optional. The second AREA directive has the name TABLE and contains data. Though this is not stated, it will be a read-write area, because it is a data area.

10.13.2 | The ENTRY Directive
The ENTRY directive marks the first instruction to be executed within an application. Because an application cannot have more than one entry point, the ENTRY directive can appear in only one of the source modules.


10.13.3 | The END Directive
This directive tells the assembler to stop reading. Anything written after the END directive will be ignored by the assembler. So every assembly language source module must finish with an END directive, on a line by itself.

10.14 | General Structure of an Assembly Language Line
The general form of source lines in assembly language is:

{label} {instruction|directive|pseudo-instruction} {;comment}

Some points to keep in mind are listed as follows:
i) Instructions, pseudo-instructions and directives must be preceded by white space, such as a space or a tab, even if there is no label. This means that they should not be written in the label space (extreme left of the line).
ii) Instruction mnemonics and register names can be written in uppercase or lowercase, but not mixed.
iii) Labels are symbols that represent addresses. The address given by a label is calculated during assembly. The assembler calculates the address of a label relative to the origin of the area where the label is defined. Assigning labels eases the programmer's burden, as he does not have to concern himself with numerical values. The location counter in the assembler keeps advancing as instructions and data are assembled, and each label takes the value of the location counter at the point where it is defined.

Typical assembly language lines are

NOO     MOV R1, R2, LSL #2    ;copy the content of R2, left shifted, to R1
NUMS    DCW 2354, 5678        ;define two data halfwords

In the above two lines, NOO and NUMS are the labels, MOV (with its operands) is an instruction, and DCW is a directive, which is explained in the following section.

10.14.1 | Directives for Defining Data
Before we go deep into programming, we need to understand a few assembler directives which define and describe different kinds of data. Data used in a program can be bytes, words or halfwords. We define data and assign labels to their corresponding addresses. Defining data means allocating space for it. The space we allocate corresponds to memory addresses, which are identified by labels. Data to be stored in memory is defined accordingly, using directives.
DCB defines a data byte, DCW defines 16 bits or a halfword, and DCD defines a word (32 bits).

Examples
NUMS     DCB 9, 82, 71
NUMB     DCW 0x6787, 0x4564
NUMBR    DCD 0x00000123, 0x67890900

In the above, the first line shows data which are bytes. The first byte, 9, has the address NUMS, 82 has the address NUMS + 1, and 71 has the address NUMS + 2.
The second line has the address NUMB for the first halfword, and NUMB + 2 for the second halfword. In the last case, the addresses of the words are NUMBR and NUMBR + 4.
Keep in mind that a byte needs only one address, a halfword needs two, and a word requires four memory addresses.

10.14.2 | The EQU Directive
This is a frequently used directive, and is used to equate a numeric constant to a label. The constant may be data or an address. Examples are as follows:

FACTR        EQU 35
BASE_ADDR    EQU 0x40000000

10.14.3 | Constants Allowed
The constants that can be used are numbers (decimal, hex or in any other base), characters, strings and Booleans.
• Decimal: for example, 346, 6748, etc.
• Hexadecimal: for example, 0x12345678, 0xFCE45, etc.
• n_xxx, where n is a base between 2 and 9 and xxx is a number in that base
• Characters: these are enclosed within single quotes, as in 'e', 'R', etc.
• Strings: these are characters enclosed within double quotes, as in "mine", "non", etc.
• Boolean: TRUE or FALSE

10.14.4 | The RN Directive
The general purpose registers have been introduced by the names R0, R1, R2, etc. When we use them for loading operands, we may get confused about which data has been loaded into which register. To ease this problem, there is a way of giving variable names to registers. Suppose we need to use R0 for loading the value of X, and R1 for loading Y. We use the directive RN as follows:

X    RN 0
Y    RN 1

This method can be used for any of the registers.

DAT1    RN 8
DET     RN 10

10.15 | Writing Assembly Programs
Now that we have got used to writing some instructions, let's get down to writing a complete program. This will give us a feel for the programming process, after which we can learn more important instructions and write bigger and better programs.
366 EMBEDDED SYSTEMS ARM- THE WORLD'S MOST POPULAR 32-BIT EMBEDDED PROCESSOR 367

Example 10.6

Write a program to find the sum 3X + 4Y + 9Z, where X = 2, Y = 3 and Z = 4.

Solution

        AREA SUMM, CODE, READONLY
X       RN 1                     ;register R1 is named X
Y       RN 2                     ;register R2 is named Y
Z       RN 3                     ;register R3 is named Z
        ENTRY
        MOV X, #2                ;load X = 2 into register R1
        MOV Y, #3                ;load Y = 3 into register R2
        MOV Z, #4                ;load Z = 4 into register R3
        ADD R1, R1, R1, LSL #1   ;R1 = 3X
        MOV R2, R2, LSL #2       ;R2 = 4Y
        ADD R3, R3, R3, LSL #3   ;R3 = 9Z
        ADD R1, R1, R2           ;R1 = R1 + R2, i.e. 3X + 4Y
        ADD R1, R1, R3           ;R1 = R1 + R3, i.e. 3X + 4Y + 9Z
STOP    B STOP                   ;continue branching at STOP
        END                      ;end of the assembly file

Since this is the first complete program we are writing, it is important to make some observations regarding it.

i) SUMM is the name of the code AREA defined. The attribute READONLY is optional, as by default a code area corresponds to read-only memory.
ii) An assembly language line has the label field at the left, and the opcode field to its right. With the Keil assembler, you may find that writing the word AREA in the label field generates an error message. But a register name defined with the RN directive (X, Y and Z here) must be written in the label field itself. No instructions should be in the label field.
iii) The ENTRY directive should be followed by an instruction or pseudo-instruction.
iv) The program involves multiplication and addition. Since the multiply instruction is a 'complex' one involving special hardware (more power dissipation and more clock cycles), it is not used. Instead, multiplication is achieved by shift and add instructions.
v) The last instruction is an unconditional branch instruction (mnemonic 'B') which continually branches to the same label STOP. This is done so that control does not go to any instruction beyond this location. Any code finally has to be burned into ROM. Many embedded programs have this kind of self-branching as their last line, since we don't want the next locations in code memory to be accessed.

10.16 | Branch Instructions

For any processor, branching is a very important operation. The power to change the sequence of execution is obtained by branching, which may be conditional or unconditional. Most processors have 'jump' and 'call' instructions for changing the sequence of execution; ARM does all this with different forms of a 'branch' instruction, which has the mnemonic 'B'. The four different forms of the branch instruction are given in Table 10.13.

Table 10.13 | List of Branch Instructions

Mnemonic   Instruction
B          Branch
BL         Branch and Link
BX         Branch and Exchange
BLX        Branch with Link and Exchange

Let's see the usage of each of them.

Branching implies transferring control to a new memory location, which is expressed as a 'label'. Hence the format of any branch instruction is B label. Branching is made conditional by appending the necessary condition to the mnemonic B.

Examples
        B   NEW      ;transfer control unconditionally to location NEW
STOP    B   STOP     ;continually branch to its own label STOP
        BNE NOO      ;branch to NOO if the Z flag is not set
        BHI LUX      ;branch if higher, i.e., if C = 1 and Z = 0

The format of a branch instruction is as shown in Figure 10.16.

Figure 10.16 | Format of a branch instruction (bits 31-28: cond, bits 27-25: 101, bit 24: L, bits 23-0: signed_immed_24)

Target addresses are 'relative'. This means that when a branch instruction is taken, the PC (program counter) value and the value specified by the instruction are algebraically added. The target is specified as a 24-bit signed number; this number is shifted left (logically) twice, so that its two LSBs are zero. This makes all target addresses 'aligned' (Ref. Section 10.5.2). The left shift also multiplies the number by 4, which gives the offset 26 bits, so the maximum range is ±2^25 bytes (one bit is for the sign, remember). In short, the processor shifts the 24-bit immediate number left by two bits, sign-extends it to 32 bits, and adds it to the PC. Thus, the maximum range for branching is only ±32 MB (2^25 = 2^5 × 2^20, where 2^20 bytes = 1 MB and 2^5 = 32). For branch addresses beyond this range, the PC can be loaded directly with the target address.

Now see this simple program which calculates the factorial of 10.

Example 10.7

        AREA FACTO, CODE         ;define the code area
        ENTRY                    ;entry point
        MOV R1, #10              ;R1 = 10
        MOV R2, #1               ;R2 = 1
REPT    MUL R2, R1, R2           ;R2 = R1 x R2
        SUBS R1, R1, #1          ;R1 = R1 - 1, setting the flags
        BNE REPT                 ;branch to REPT if Z = 0, i.e. R1 is not zero
STOP    B STOP                   ;last line
        END

This is a very simple program which finds the factorial of 10. It can be used to find the factorial of any other number (except 0), provided the factorial does not exceed 32 bits in size (that is, up to 12! = 479001600). The technique is to multiply the number repeatedly by 'number - 1'. Meanwhile, a counter is decremented by 1 (by subtraction), and when the counter reaches 0, the Z flag is set and the multiplication stops. The factorial is then available in register R2. The branch instruction used is a conditional one, BNE, which tests the Zero flag. The instruction before it, SUB, has been appended with the 'S' suffix (SUBS) to ensure that the flags are set.

Now let's see another example which uses conditional branching. This program performs division by repeated subtraction.

Example 10.8

        AREA DIV, CODE
        ENTRY
        MOV R1, #500             ;move the dividend to R1
        MOV R2, #16              ;move the divisor to R2
        MOV R3, #0               ;R3 = 0 (quotient)
        MOV R4, R1               ;copy the dividend to R4
REPT    SUBS R4, R4, R2          ;subtract and set flags
        ADDPL R3, R3, #1         ;add if N = 0, i.e. R4 is +ve
        BPL REPT                 ;repeat the loop if R4 is +ve
        ADDMI R4, R4, R2         ;if R4 is -ve, add R2 to R4
STOP    B STOP
        END

This program performs division by repeated subtraction. Here 500 is to be divided by 16. The method is to subtract 16 from 500 repeatedly until the result becomes negative.

The branch instruction BPL REPT means 'branch to label REPT if plus (PL)', i.e., if N = 0.

Besides the conditional branch, the ADD instructions (ADDPL and ADDMI) are also conditional; the condition used is the status of the sign flag N.

The steps of the program are as follows:
i) Subtract 16 from 500, and check whether the result is +ve or -ve. This can be verified by checking the N flag, which corresponds to the MSB of the result. The condition flags are updated by the subtraction operation (using the suffix S).
ii) If the number (in R4) is +ve, subtraction can be repeated. Each time this is verified, the quotient register (R3) is incremented by 1.
iii) When the result of subtraction becomes -ve (the condition 'MI' for minus), add the divisor to this negative number (in R4).
iv) In this problem, when 16 is subtracted 31 (0x1F) times from 500, the value in R4 is still +ve (500 - 31 × 16 = 4). One more subtraction sets the N flag and makes R4 negative (4 - 16 = -12).
v) Adding the divisor to this -ve number makes it equal to the remainder, which is 4 in this case (-12 + 16 = 4).
vi) Thus, we get 31 (0x1F) as the quotient (in R3) and 4 as the remainder (in R4).

10.16.1 | Subroutines/Procedures

In Table 10.13, there is another form of the branch instruction, BL, standing for 'Branch and Link'. Recollect that a procedure (also called a subroutine, function, etc.) means that a new program sequence is taken up, but control returns to the original point after that. Most processors (including ARM) use stacks to store the return addresses, and return instructions to handle procedure calls. ARM has an additional feature to handle procedures in a simpler manner. Recollect the register named the 'Link Register' (LR). When a BL instruction is encountered, the PC value is changed to that of the target, but the return address (the address of the instruction following BL) is copied to LR. At the end of the procedure, the LR value can be copied back to the PC.

Now let's write a program which calls a procedure.

Example 10.9

Write a program to calculate 3X^2 + 5Y^2, where X = 8 and Y = 5.

Solution

        AREA PROCED, CODE
        ENTRY
        MOV R2, #8               ;to calculate 3X^2 + 5Y^2, first load X = 8
        BL SQUARE                ;call the SQUARE procedure
        ADD R1, R3, R3, LSL #1   ;R1 = 3X^2
        MOV R2, #5               ;R2 = 5
        BL SQUARE                ;call the SQUARE procedure
        ADD R0, R3, R3, LSL #2   ;R0 = 5Y^2
        ADD R4, R1, R0           ;R4 = R1 + R0, i.e. 3X^2 + 5Y^2
STOP    B STOP                   ;last line in the execution
SQUARE  MUL R3, R2, R2           ;the SQUARE procedure
        MOV PC, LR               ;copy LR back to PC (return)
        END

The salient points of this program are as follows:

i) A procedure named SQUARE has been used. This procedure uses the multiply instruction to find the square of any number. The number to be squared is passed to the procedure using register R2. The square of the number is returned to the main program in R3.
ii) There are two numbers, X and Y, whose squares are to be found. Calling the procedure amounts to just writing the instruction BL SQUARE. This instruction causes a branch to the procedure named SQUARE. It also copies the return address (the address of the next instruction) to the link register (LR).
iii) The procedure has only two instructions: one to perform squaring, and the other to copy the LR content back to PC. The second instruction causes a return to the main program.
iv) We need two multiplications, in addition to the squaring operation. These two, that is, 3X^2 and 5Y^2, are achieved by shifting and adding. The MUL instruction is used as little as possible because it takes more time and causes higher power dissipation.
v) The last step is adding 3X^2 and 5Y^2, which are now in R1 and R0. The sum (192 + 125 = 317) is available in R4.
vi) Note that the last program line to be executed is STOP B STOP, even though it is not the last line in the assembly file.

In Table 10.13 there are two more forms of the branch instruction. BX stands for Branch and Exchange; BLX is for Branch with Link and Exchange. The Exchange feature is applicable when ARM and THUMB instructions are being used, and it is needed to switch from one instruction set to the other.

10.17 | Loading Constants

How is it different for ARM?

An important addressing mode for any processor is the 'immediate' mode. In this, a constant which is specified in the instruction itself is copied into a register, or is used as one operand in an arithmetic or logic operation.

Examples
        MOV R1, #07867
        ADD R1, R2, #567

This seems very obvious and direct. For CISC machines, this is fine, because the immediate data can be another byte or word. But ARM has the limitation that its instruction size must not exceed 32 bits, which means that the constant has to fit in the word length of 32 bits along with the opcode, condition code, register codes and other information that the instruction must carry. It is thus apparent that we can't have a 32-bit constant embedded in the instruction.

So then what is the maximum size of the constant that can be used in the immediate mode?

We would like to be able to use immediate constants as large as 32 bits. How is this done? ARM uses an ingenious technique: rotating a small number to generate a large number. We have already seen that there is a barrel shifter in the ALU. Any data processing instruction has a format as shown in Figure 10.17a. The data processing instruction format has 12 bits available for operand 2. Figure 10.17b shows the instruction format which has been modified for using the immediate mode.

Figure 10.17a | Format of a typical data processing instruction (bits 31-28: Cond, bits 27-26: 00, bit 25: I, bits 24-21: opcode, bit 20: S, bits 19-16: Rn, bits 15-12: Rd, bits 11-0: shifter_operand)

Figure 10.17b | Modification of the 'shifter operand' (bits 11-8: ROT, multiplied by 2; bits 7-0: IMMED_8, rotated right to give the 32-bit constant)

When the immediate mode is needed, the 12-bit field is modified such that there is an 8-bit immediate constant which is subject to a ROR (rotate right) operation. The rotate field has only 4 bits. But since rotations of up to 32 bit positions are needed, the 4-bit 'rotate' value is multiplied by 2 to give the rotation amount. Hence, it becomes a case of '8 bits rotated by an even number of bit positions'. The rotated 8-bit number becomes a 32-bit number during data processing.

Let's try to understand how this is done.

10.17.1 | Generating a 32-bit Constant Using Rotation

Consider the steps in rotating the number 0xF0 to the right by 2, after expanding it to fill a 32-bit space.

Case 1
The original 8-bit number is 1111 0000.
Expanding it to fill 32 bits makes it
00000000 00000000 00000000 11110000
Rotating it right by 2 makes it
00000000 00000000 00000000 00111100
i.e. 0x3C

Case 2
We can also use the MVN instruction for generating new numbers. If 0 is loaded into a register and then moved into another or the same register using the MVN instruction, we get 0xFFFFFFFF.
You can try this code to verify.
        MOV R1, #0x0
        MVN R1, R1               ;R1 = 0xFFFFFFFF

Table 10.14 | The Range of Constants That Can Be Generated by the Rotation Scheme

Decimal Values                  Equivalent Hexadecimal Values   Step Between Values   Rotate
0-255                           0x0-0xFF                        1                     No rotate
256, 260, 264, ..., 1020        0x100-0x3FC                     4                     Right by 30 bits
1024, 1040, 1056, ..., 4080     0x400-0xFF0                     16                    Right by 28 bits
4096, 4160, 4224, ..., 16320    0x1000-0x3FC0                   64                    Right by 26 bits

Let's summarize the points regarding the generation of constants using the ARM rotation scheme.

i) A class of constants can be generated by this scheme (Table 10.14 shows the range of constants that can be generated), but not all constants can be. Those that cannot be generated will have to be loaded from memory, using the concept of 'literal pools'. We will come to that soon.
ii) To generate the constant needed, the programmer need not specify the 8-bit immediate number and the number of rotations to be done. He just has to write an instruction in the immediate mode, and the assembler converts it to the required scheme. For example, when the constant 0x20000002 is needed in MOV R1, #0x20000002, the assembler encodes it as the 8-bit value 0x22 rotated right by 4 bits. The effect is the same as that of the two instructions
        MOV R1, #0x22
        MOV R1, R1, ROR #4
which create the required 32-bit constant for us.
iii) The processor does not have an instruction for rotation to the left. But rotating n times to the left is achieved by rotating (32 - n) times to the right.

Example 10.9

Find the 32-bit constant generated by each of the following rotations.
i) Rotate 0x40 to the right 30 times
ii) Rotate 0x56 to the left 12 times
iii) Rotate 0x6D to the right 16 times
iv) Rotate 0x05 to the right 6 times
v) Rotate 0x3E to the right 2 times

Solution
i) The 8-bit number is 01000000.
00000000 00000000 00000000 01000000 ; the 8-bit number in 32-bit format
00000000 00000000 00000001 00000000 ; the number after rotation
Thus, the constant obtained is 0x100.
ii) The 8-bit number is 01010110.
00000000 00000000 00000000 01010110 ; the 8-bit number in 32-bit format
Rotating 12 times to the left is equivalent to rotating 20 times to the right.
00000000 00000101 01100000 00000000 ; the number after rotation
The constant generated is 0x56000.
The answers for iii, iv and v are in Table 10.15.

Table 10.15
      8-Bit Number   ROR   Constant
iii   0x6D           16    0x006D0000
iv    0x05           6     0x14000000
v     0x3E           2     0x8000000F

From Example 10.9, we note that many 32-bit numbers can be generated by the ARM rotation scheme, but there are constants which cannot be obtained by this method. For example, the number 0x11111111 cannot be generated by rotation.

How then are such constants obtained for use in the immediate mode of addressing?

10.17.2 | Literal Pools

In computer science, specifically in compiler and assembler design, a literal pool is a lookup table used to hold literals during assembly and execution. But first, what exactly is a literal? In programming, a literal is a value written exactly as it is meant to be interpreted. A literal can be a number, a character or a string. For example, in the expression x = 145, x is a variable and 145 is a literal. Thus, literals are constants. For ARM, when it is required to load a constant into a register, the assembler can help by creating space in memory and placing the constant in that space. From this memory space, the processor can fetch it using load instructions. But assemblers are not guided by instructions; they use what are called pseudo-instructions. For this case of making a literal pool and taking a constant from the pool, there is a specific pseudo-instruction

        LDR Rd, =const

This pseudo-instruction can construct any 32-bit numeric constant. Suppose we need the constant 0x33333333; we are likely to write the instruction MOV R1, #0x33333333. With this, the assembler will give an error message that such a constant cannot be generated. To avoid such a situation, we write

        LDR R1, =0x33333333

This is a pseudo-instruction (don't confuse it with the LDR instruction, which we will come to soon; there is a difference in format between the two). It causes the assembler to check the following possibilities.

i) Can the constant be constructed with a MOV or MVN instruction combined with rotation? If this is possible, the assembler generates the appropriate instruction, that is, an 8-bit number is rotated appropriately to get the constant in question.
ii) If the constant cannot be constructed this way, the assembler places the value in a literal pool and generates an LDR (load register) instruction with a program-relative address that reads the constant from this literal pool.
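The rotation arithmetic of Example 10.9 and the decision just described for LDR Rd, =const can be checked with a small Python model. This is a sketch under stated assumptions, not the assembler's actual implementation: the function names ror32, rol32, encode_immediate and ldr_pseudo are hypothetical, and where several (imm8, rotate) pairs give the same constant, a real assembler may pick a different but equivalent pair.

```python
MASK32 = 0xFFFFFFFF


def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by `amount` bit positions."""
    value &= MASK32
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def rol32(value: int, amount: int) -> int:
    """Rotate left by n, done as a rotate right by (32 - n)."""
    return ror32(value, (32 - amount) % 32)


def encode_immediate(constant: int):
    """Return (imm8, rot_field) with ror32(imm8, 2 * rot_field) == constant,
    or None if the rotation scheme cannot produce the constant."""
    constant &= MASK32
    for rot_field in range(16):          # the 4-bit rotate field
        # Undo a right rotation of 2*rot_field by rotating left by the same amount.
        imm8 = rol32(constant, 2 * rot_field)
        if imm8 <= 0xFF:                 # fits in the 8-bit immediate field
            return imm8, rot_field
    return None


def ldr_pseudo(constant: int) -> str:
    """Hypothetical model of the choice made for LDR Rd, =const."""
    constant &= MASK32
    if encode_immediate(constant) is not None:
        return "MOV"
    if encode_immediate(~constant & MASK32) is not None:
        return "MVN"                     # load the bitwise complement instead
    return "literal pool"
```

For example, encode_immediate(0x20000002) returns (0x22, 2), that is, 0x22 rotated right by 2 × 2 = 4 bits, and ldr_pseudo(0x00555555) reports that a literal pool is needed, as in Example 10.10a that follows.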
Example 10.10a

        AREA PROG1, CODE, READONLY
        ENTRY
        LDR R1, =0x12400000
        LDR R2, =0x00555555
        ADD R3, R1, R2
STOP    B STOP
        END

In Example 10.10a, two constants are needed. If you run this program and check the disassembly file, you will find two interesting facts, which relate to the different ways in which these two constants are generated. The assembler realizes that the first constant can be obtained by the rotation scheme, but the second one cannot. So a literal pool is created just after the last instruction, and the constant 0x00555555 is placed in it. Then a 'load register' instruction is generated to load the constant into the register.

How is the literal pool accessed?
The literal we need is accessed from the literal pool using PC-relative mode. In this mode, only 12 bits are allowed for the 'relative number', which can be positive or negative. Thus, the literal in the pool has to be within ±4 KB of the current PC value.

Where should the literal pool be placed?
Normally the literal pool is placed at the END directive, which means just after the end of the program area. This is okay for normal-size programs. But sometimes programs are very large, and if the literal pool is placed after the end of the program, it may be out of range (±4 KB) of the LDR instruction. Such a situation implies that there should be the flexibility of placing literal pools anywhere in memory. This is done by the LTORG directive, which allows us to define the origin of a literal pool.

When a pseudo-instruction LDR Rd, =const is encountered, the assembler checks whether the constant is available and addressable in the nearest literal pool. If it is, it takes it from that pool. Otherwise, it attempts to place the constant in the next literal pool. If the next literal pool is out of range, the assembler generates an error message. In this case, the LTORG directive is used to place an additional literal pool in the code. Place the LTORG directive after the failed LDR pseudo-instruction, and within 4 KB of it.

Literal pools must be placed in locations where the processor does not attempt to execute them as instructions. It is best to place them after unconditional branch instructions, or after the return instruction at the end of a subroutine. Let us see an example of a case where the LTORG directive becomes necessary.

Example 10.10b

        AREA PROG1, CODE, READONLY
        ENTRY
        LDR R1, =0x12400000
        LDR R2, =0x00555555
        ADD R3, R1, R2
        SPACE 4400
STOP    B STOP
        END

This is a modified version of Example 10.10a. Recollect that we need a literal pool for the constant 0x00555555. Here, a directive called SPACE 4400 has been inserted. This directive creates an empty area of 4400 bytes. Because of this, the total space occupied by the program becomes large (greater than 4400 bytes, anyway). The literal pool is usually after the program area. In this case, this places the literal pool beyond the range (greater than 4 KB) of LDR R2, =0x00555555. Hence, on assembling, the following message is seen:

error: A1284E: Literal pool too distant, use LTORG to assemble it within 4KB

To avoid such a situation, we can place the literal pool closer to the instruction which needs the constant. See the modified version of the program.

Example 10.10c

        AREA PROG1, CODE, READONLY
        ENTRY
        LDR R1, =0x12400000
        LDR R2, =0x00555555
        ADD R3, R1, R2
STOP    B STOP                   ;execution ends here, before the pool
        LTORG                    ;literal pool, within 4 KB of the LDR
        SPACE 4400
        END

Now the program assembles without error, because a literal pool has been created before the 'free space' of 4400 bytes. By the use of the LTORG directive, the required constant is placed in this pool by the assembler. Since the pool follows the unconditional branch STOP B STOP, the processor never tries to execute it. Thus, we see that the directive LTORG can be used to place literal pools wherever we want. This will become useful as programs become larger.

10.18 | Load and Store Instructions

ARM is a RISC architecture, and one of the features of RISC is that of being a 'load-store' architecture. Loading is the process of getting data from memory into a register, and storing is just the reverse process. In ARM, data is brought into registers using a
load instruction, and only then can it be used for data processing. After computation, the result can be 'stored' in memory. The memory in question is RAM, the read/write memory. RAM is volatile and is used for temporary storage of data in the course of computations. The only instructions which access RAM are 'load' and 'store'. All registers can be accessed using these instructions, but programmers are advised to exercise caution when accessing critical registers like the PC, SP, etc.

The syntax for load or store is

        LDR|STR{<cond>} <Rd>, <addressing mode>

Rd is the source register for a store and the destination register for a load. The addressing mode gives us the necessary information to get the 'effective address', which is the actual memory address to be accessed. The addressing mode is indirect, because the memory address is not specified directly in the instruction; rather, a base register is mandatorily used. For the simplest case, examples of LOAD and STORE instructions are as follows:

        LDR R1, [R2]    ;copy into R1 the content of the memory address specified in R2
        STR R1, [R2]    ;store the content of R1 at the memory address specified in R2

This implies that the load/store instruction must be preceded by an instruction which copies the address into R2. We will soon get to know how this is done. There are various ways of specifying the effective address. The barrel shifter can be part of the address-specifying mechanism.

Example 10.11

How is the effective memory address calculated in the following load and store instructions?
i) LDR R3, [R2, LSL #2]
ii) STR R9, [R1, R2, ROR #2]
iii) LDR R4, [R3, R2]
iv) STR R5, [R4, R3, ASL #4]

Solution
i) LDR R3, [R2, LSL #2]
In this, the effective address is the content of R2 left shifted by 2, i.e. multiplied by 4. (Strictly, ARM requires a base register in this form, as in [R0, R2, LSL #2], which adds R0 to the shifted R2.)
ii) STR R9, [R1, R2, ROR #2]
Here, the effective address is specified by R1, R2 and a right rotation. To calculate it, the content of R2 is rotated right by 2 bits and then added to the content of R1.
iii) LDR R4, [R3, R2]
The effective address here is the sum of R3 and R2.
iv) STR R5, [R4, R3, ASL #4]
The effective address is the sum of the content of R4 and the content of R3 arithmetically shifted left by 4.

10.18.1 | Bytes, Half Words and Words

Now, let's see another aspect of load and store instructions. ARM has instructions to transfer specifically a word (32 bits), a half word (16 bits) or a byte (8 bits) between memory and registers. There are also instructions which differentiate between signed and unsigned data.

There are instructions which clearly indicate the kind of data to be moved; see Table 10.16. From the table, we understand that we can load and store parts of a 32-bit word by using B for byte and H for halfword, along with the load and store instructions.

Table 10.16 | List of Load and Store Instructions

LDR     Load Word                STR     Store Word
LDRH    Load Half Word           STRH    Store Half Word
LDRSH   Load Signed Half Word
LDRB    Load Byte                STRB    Store Byte
LDRSB   Load Signed Byte

If a memory location contains a 32-bit word, we can move its least significant byte (assuming little-endian format) into a register by using LDRB, or the lower half of the word by using LDRH. Let's clarify this with an example.

Example 10.12

Two memory areas are being referenced and two registers are used as pointers:
R1 = 0x00000100
R2 = 0x40001200
Figures 10.18a and b show the data addresses and the corresponding data.

Figure 10.18a | Address and data
Address       Byte Stored
0x00000100    56
0x00000101    23
0x00000102    0D
0x00000103    AE

Figure 10.18b | Address and data
Address       Byte Stored
0x40001200    00
0x40001201    00
0x40001202    00
0x40001203    00

Show the content of the registers or memory after the execution of the following instructions:
i) LDR R3, [R1]
ii) LDRB R3, [R1]
iii) LDRH R3, [R1]
iv) STRB R3, [R2], given that R3 = 0xAE0D2356
For this case, show halfword and word storage as well.
378 EMBEDDED SYSTEMS ARM-THE WORLD'S MOST POPULAR 32-IT EMBEDDED PROCESSOR 379

Solution LDR Rl, [R7] ;Rl = OxCDEF8204 Case 1


LDRSH R2,R7] ;R2 = OxFFFF8204 Case 2
i) LDR R3,[R 1]
In this, the com plete 32-bit data in the address pointed to by Rl is copied to R3.
LDRSB R3, [R7] ;R3 = 0x00000004 Case 3
So R3 =0AE0D2356 only
For case 1, th e 32 bits are copied to R1. For case 2, the lower 16 bits are to be copied.
4
ii) LDRB R3,[R 1] The MSB of the 16-bit half word is' 1', and this is ext ended to 32 bits while copying to
In this, the byt e (LSB) of the word alone is copied to R3. Since it is an unsigned -~
R2. That's how the upper 16 bits of R2 becom e FFFF.
byt e, the rem aining byt es ofR3 contain 0. So R3 = 0x00000056 For case 3, th e lowest byt e alone is copied. Its MSB is 0. As such, the rest of R3 is il
iii) LDRH R3,[R 1] filled with zeros, i.e., the sign bit '0' is extended to fill the upper 24 bits. You can also #
I '~

In this, the half word (lower two byt es) of the address is copied to R3. observe in Table 10.16 that th ere are no store instructions for signed byt es or signed half l
R3 = 0x00002356 words. This is because storing sim ply means placing num bers in m em ory. These num bers li
iv) STRB R3,[R2] given that R3 = 0RAE 0D2356 may be signed, unsigned data or code- it is only when the user brings it to a register, 11t
1
In this, the byt e corresponding to the LSB of the data in R3 is copied to the address is the processing on that num ber done. Onl y then it is necessary for that num ber to be
1·1
pointed by R2. See Tables 10.17a, b and c for byt e, half word storage and word interpreted as signed or unsigned.
111··
storage as well.
Table 10.17a, b and c
10.18.3 I Indexed Addressing Modes i '.1

i!
In this mode, the effective address calculation can be done before a load/ store is executed };lj''
STRB R3, [R2] or afterwards. Let's see what it is all about.
0 x40001200 56
00 10.18.3.1 ] Pre-indexed Addressing Mode
Observe the instruction LDR R0, [R7, #4]. Here R7 is the base register and the effective address is R7 + 4. The data at this effective address is copied to R0.
Next, see the instruction STR R1, [R5, R6, LSL #2]. The effective address = R5 + (R6 left shifted twice), that is, R5 + 4 × R6.
In the above two instructions, there is a notable feature, however. After the load/store is done, the base register content remains unchanged; that is, the effective address is not copied to the base register. If we want the base register to contain the effective address, we just suffix the instruction with the character '!', and then 'write back' occurs.
Consider the instruction LDR R2, [R6, #-8]!. After the loading operation is done, R6 has the effective address (R6 - 8) written back into it.

Example 10.13
Calculate the effective addresses and explain what each instruction does.
i) STRB R2, [R6, R7, #0x24]!
ii) LDRSH R4, [R10, R11, ASR #4]

Solution
i) STRB R2, [R6, R7, #0x24]!
The effective address is the sum of the contents of R6, R7 and the number 0x24. The least significant byte of R2 is stored at the effective address. After that, the effective address is copied to R6.
ii) LDRSH R4, [R10, R11, ASR #4]
Here, the effective address is the sum of the content of R10 and the content of R11 arithmetically shifted right by 4 positions. The half word at this address is sign-extended and loaded into R4. The content of the base register remains unchanged.
Note that in ARM state neither of these two forms can actually be encoded: a load/store offset is either an immediate or a (possibly shifted) register, never both, and the halfword and signed loads/stores (LDRH, STRH, LDRSB, LDRSH) accept only an immediate or an unshifted register offset. The example is therefore an exercise in effective-address arithmetic.

(Figure: memory at address 0x40001200 after STRH R3, [R2] and after STR R3, [R2], with R2 = 0x40001200; the STRH panel shows the bytes 56 and 23, the STR panel the bytes 56, 23, 0D and AE.)

10.18.2 | Loading Signed Numbers
Signed numbers are those whose MSB is the sign bit. For positive numbers, the sign bit is '0', whereas negative numbers are in two's complement form and have their MSBs set to '1'. When a 32-bit number is available in memory, parts of it can be loaded into registers as signed bytes and signed half words. In these cases, the MSB of the byte or the half word is checked, and sign extension is done while loading it into a register.
Consider the case of a word 0xCDEF8204 in memory. Let R7 be used as a pointer to that memory location. Then, observe the result of execution of the following instructions, as given in the comments column.
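What sign extension does to the word 0xCDEF8204 can be checked with a small C model (a sketch, not ARM code). It assumes little-endian byte order, so the byte at the address in R7 is 0x04 and the byte at R7 + 3 is 0xCD; the array `word_bytes` and the helpers `ldrsb` and `ldrsh` are hypothetical names chosen to mirror the mnemonics.

```c
#include <assert.h>
#include <stdint.h>

/* The word 0xCDEF8204 as it sits in little-endian memory,
   lowest address first (assumed byte order). */
const uint8_t word_bytes[4] = {0x04, 0x82, 0xEF, 0xCD};

/* Model of LDRSB Rd, [R7, #offset]: fetch one byte and
   copy its bit 7 into bits 8-31 of the 32-bit result. */
uint32_t ldrsb(const uint8_t *mem, unsigned offset)
{
    uint32_t byte = mem[offset];
    return (byte & 0x80u) ? (byte | 0xFFFFFF00u) : byte;
}

/* Model of LDRSH Rd, [R7, #offset]: fetch two bytes (low byte first),
   then copy bit 15 into bits 16-31. Halfword accesses must be aligned. */
uint32_t ldrsh(const uint8_t *mem, unsigned offset)
{
    assert(offset % 2 == 0);                  /* halfword alignment */
    uint32_t half = (uint32_t)mem[offset] | ((uint32_t)mem[offset + 1] << 8);
    return (half & 0x8000u) ? (half | 0xFFFF0000u) : half;
}
```

For instance, `ldrsb(word_bytes, 0)` gives 0x00000004 because bit 7 of 0x04 is 0, while `ldrsh(word_bytes, 0)` gives 0xFFFF8204 and `ldrsb(word_bytes, 3)` gives 0xFFFFFFCD, because bit 15 of 0x8204 and bit 7 of 0xCD are both 1.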
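The pre-indexed address arithmetic described earlier (LDR R0, [R7, #4], STR R1, [R5, R6, LSL #2] and the write-back form LDR R2, [R6, #-8]!) can be checked with a small C model. This is a sketch under stated assumptions: registers are plain 32-bit unsigned integers, offsets are added modulo 2^32 as the hardware does, and the names `pre_index`, `lsl32` and `asr32` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Outcome of one pre-indexed address calculation. */
typedef struct {
    uint32_t address;   /* effective address used for the load/store */
    uint32_t base_out;  /* base register value after the instruction  */
} pre_index_result;

/* Rn + offset is used for the transfer; with '!' it is also written back. */
pre_index_result pre_index(uint32_t base, uint32_t offset, int write_back)
{
    pre_index_result r;
    r.address = base + offset;               /* wraps modulo 2^32, like the hardware */
    r.base_out = write_back ? r.address : base;
    return r;
}

/* Scaled register offset "Rm, LSL #n" (0 <= n <= 31). */
uint32_t lsl32(uint32_t rm, unsigned n)
{
    assert(n <= 31);
    return rm << n;
}

/* Scaled register offset "Rm, ASR #n" (1 <= n <= 31). */
uint32_t asr32(uint32_t rm, unsigned n)
{
    assert(n >= 1 && n <= 31);
    uint32_t result = rm >> n;
    if (rm & 0x80000000u)
        result |= ~(0xFFFFFFFFu >> n);       /* refill the vacated top bits with the sign bit */
    return result;
}
```

With R6 = 0x40001000, LDR R2, [R6, #-8]! corresponds to `pre_index(0x40001000, (uint32_t)-8, 1)`: the address is 0x40000FF8 and the same value is left in R6. STR R1, [R5, R6, LSL #2] corresponds to `pre_index(R5, lsl32(R6, 2), 0)`, which leaves R5 unchanged.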

10.18.3.2 | Post-indexed Addressing Mode
In this mode, the transfer uses the address in the base register as it is, and the offset is added to the base register only after the transfer has been done.
Take the case of the instruction LDR R0, [R4], #4
Here the data pointed to by the content of R4 is first copied to R0. After that, the content of R4 is changed to R4 + 4. There is no need for the '!' operator, because updating the base register is exactly what post-indexing does.

Example 10.14
Let's add 10 numbers which are in memory. The numbers are 16 bits long, that is, half words, and each uses two byte locations. The pre-indexed mode of addressing with write back is used to index the half words, whose addresses are spaced 2 apart. The instruction LDRH R2, [R7, #2]! does the indexing of the 16-bit numbers.

Solution
        AREA  DADD, CODE, READONLY
        ENTRY
        LDR   R7, =TABLE        ; copy the address of TABLE to R7
STRT    MOV   R0, #9            ; R0 = 9
        LDRH  R1, [R7]          ; load 1st number from memory to R1
REPT    LDRH  R2, [R7, #2]!     ; pre-indexed with write back
        ADD   R1, R1, R2        ; R1 = R1 + R2
        SUBS  R0, R0, #1        ; R0 = R0 - 1
        BNE   REPT              ; repeat the addition until R0 = 0
STOP    B     STOP              ; last line
TABLE   DCW   3456,7859,1234,9876,3452,3214,7864,0987,2032
        END

Let's examine the salient features of this program.
i) The numbers to be added are stored in code memory, just after the last line of the program.
ii) There are 10 numbers to be added. The first number is loaded into register R1 and the rest are loaded one by one into R2.
iii) R0 is used as a counter for the numbers. In a general case, if there are N numbers to be added, R0 = N - 1. Here N = 10.
iv) The loading of the numbers into R2 is done using a loop. Since the numbers are half words, their addresses are to be incremented by 2. This is done very efficiently by the pre-indexed addressing with write back scheme. After one half word is accessed, the effective address is written back to the base register R7 in readiness for accessing the next half word.
v) The address corresponding to TABLE is a 32-bit constant. It is calculated by the assembler, and loaded into R7 using the techniques mentioned in Section 10.17.

10.19 | Readonly and Read/Write Memory
The two kinds of memory area defined in an assembly program are 'Readonly' for code and 'Read/write' for data. Usually these correspond to ROM and RAM in a physical system. RAM is used for intermediate results, for temporary storage, etc., as it is volatile memory. We can store data permanently in the readonly memory and copy it to RAM for processing. In the readonly memory, data is written using directives like DCD, DCW, etc. From there, it is copied to readwrite memory using load and store instructions.

Example 10.15
        AREA  FIRST, CODE, READONLY
        ENTRY
        LDR   R7, =NUMS         ; load the address of NUMS in R7
        LDR   R8, =NUMS1        ; load the address of NUMS1 in R8
        LDR   R9, =NUMS2        ; load the address of NUMS2 in R9
        LDR   R1, [R7]          ; load the word to R1
        STR   R1, [R9]          ; store the word in R1 in NUMS2
        STR   R1, [R8]          ; store the word in R1 in NUMS1
STOP    B     STOP
NUMS    DCD   653451134
        AREA  SECOND, DATA, READWRITE
NUMS2   SPACE 60
NUMS1   DCD   0
        END

In Example 10.15, three storage locations have been defined: one in readonly memory, and two in readwrite memory. What is accomplished is just the transfer of a word from readonly memory to readwrite memory. In readwrite memory, the first part (NUMS2) is a space of 60 bytes; the next (NUMS1) is a word which is initialized to 0. After the execution of the program, the number 653451134 is copied to both of these places: into the first word of NUMS2, and into NUMS1.

Example 10.16
        AREA  STRIN1, CODE, READONLY
        ENTRY
STRT    LDR   R1, =SOURCE       ; pointer to source string
        LDR   R0, =DESTIN       ; pointer to destination string
        BL    COPY              ; call procedure for copying
STOP    B     STOP              ; last line of execution
COPY    LDRB  R2, [R1], #1      ; load byte and update address
        STRB  R2, [R0], #1      ; store byte and update address
        CMP   R2, #0            ; check for 0
        BNE   COPY              ; repeat until the string is over
        MOV   PC, LR            ; return to calling program
SOURCE  DCB   "I am sam", 0
        AREA  STRIN2, DATA, READWRITE
DESTIN  SPACE 9                 ; room for the 8 characters and the terminating 0
        END
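The COPY procedure of Example 10.16 is the assembly form of a familiar C idiom: post-incremented pointers play the role of the post-indexed addresses [R1], #1 and [R0], #1. A minimal sketch follows; the function name `copy_string` is hypothetical, and the comments map each C statement to the instruction it mirrors.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors COPY: src plays R1, dst plays R0, c plays R2.
   Returns the number of bytes copied, including the terminating 0. */
size_t copy_string(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t count = 0;
    char c;
    do {
        c = *src++;       /* LDRB R2, [R1], #1 : load, then advance the source      */
        *dst++ = c;       /* STRB R2, [R0], #1 : store, then advance the destination */
        count++;
    } while (c != 0);     /* CMP R2, #0 / BNE COPY : stop after copying the 0        */
    return count;
}
```

Copying "I am sam" moves 9 bytes (8 characters plus the terminator), which is why the destination needs at least 9 bytes of space.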
Example 10.16 uses many of the programming aspects that we have been discussing so far. Let's have a look at the important features of this program.
i) There is an ASCII string written in readonly memory using the DCB directive. Such a string is enclosed in double quotes and each character is a byte.
ii) One readonly and one read/write memory area have been defined.
iii) After the ASCII string, a 0 is used as a terminating character. The arrival of this 0 in R2 is used to check whether the required transfer of the string is done.
iv) The instructions for loading and storing are suffixed by 'B', which indicates that only a byte is to be transferred.
v) Post-indexed mode of addressing is used for load and store. The addresses need to be incremented only by 1, as only a byte is transferred.
vi) The instructions for loading and storing are in a procedure named COPY. The procedure is 'called' by the BL instruction, which branches and also copies the return address (the address of the instruction following BL) to the link register. The last line of the procedure copies LR back to PC. This constitutes the 'return' to the main program.

10.20 | Multiple Register Load and Store
We have seen in Section 10.18 the LDR and STR instructions, which transfer data (in the form of bytes, halfwords or words) between a register and memory. Now, let's see an extended form of loading and storing, wherein multiple registers are involved. Only data in the form of words (32 bits) can be handled by these instructions. The mnemonics for multiple load and store are LDM and STM.

10.20.1 | The LDM Instruction
Let's talk about LDM first. It has the syntax
LDM{cond}address-mode Rn{!}, reg-list{^}
Rn is the base register for the load operation. The address stored in this register is the starting address for the load operation. There are a number of modes for specifying the address. reg-list is a comma-delimited list of symbolic register names and register ranges enclosed in braces. There must be at least one register in the list. Register ranges are specified with a dash. For example, {R0-R5, R9} is a list. The ! option is for 'write back', and the ^ option is relevant for interrupts. We will not discuss the second option here. Write back is not to be specified if the base register Rn is in reg-list.
Multiple register load means that multiple memory locations are to be accessed and loaded into multiple registers. There is a 'base register' acting as a pointer to the first memory location to be accessed. This register is then incremented or decremented to point to the next memory addresses. There are four options for handling this. The base register can be incremented or decremented by 4 (one word needs four addresses) for each register in the operation, and the increment or decrement can occur before or after the operation. The suffixes for these options are as follows:
IA - increment after
IB - increment before
DA - decrement after
DB - decrement before

Figure 10.19 | The LDM and STM instructions (register set R0-R15 and memory locations M to M+15)

Consider the instruction LDMDA R0, {R4-R9}
The base register here is R0. Let us assume it holds the number 0x45000000. Six words are loaded and, since the mode is 'decrement after', they come from the six word addresses that end at the address in R0, that is, 0x44FFFFEC to 0x45000000. As with every LDM, the lowest-numbered register receives the word at the lowest address: R9 gets the word at 0x45000000 (the address in R0), R8 the word at 0x44FFFFFC (R0 - 4), and so on down to R4, which gets the word at 0x44FFFFEC (R0 - 20).
It is obvious that a single instruction replaces six LDR instructions. Is there any advantage in this? As far as the data transfers are concerned, the answer is 'No': all six load operations still have to be done. But note that only one instruction fetch is needed for the six load operations together, so there is definitely some saving in terms of time.

What is the difference in operation of the following instruction?
LDMIA R10, {R9, R1-R5}
Here the base address is in R10, and the address is incremented by 4 after each data transfer (R10 itself is not changed, since there is no '!'). In the destination register list, R9 is specified first, but the processor has a particular way of handling the list: the lowest register will always be loaded from the lowest address in memory, and the highest register from the highest address. Here R1 gets the data at the address pointed to by R10, R2 to R5 get the next four words, and R9 gets the word at R10 + 20.
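The four suffixes, together with the rule that the lowest-numbered register always uses the lowest address, reduce to a little address arithmetic. The C sketch below only computes addresses (it moves no data); the names `block_mode`, `block_start` and `block_writeback` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { IA, IB, DA, DB } block_mode;

/* Lowest address touched by an LDM/STM of 'count' registers with base 'rn'.
   The registers are then transferred in ascending register order,
   from this address upwards, 4 bytes apart. */
uint32_t block_start(block_mode mode, uint32_t rn, uint32_t count)
{
    assert(count >= 1 && count <= 16);
    switch (mode) {
    case IA: return rn;                   /* first word at Rn     */
    case IB: return rn + 4;               /* first word at Rn + 4 */
    case DA: return rn - 4 * count + 4;   /* last word at Rn      */
    case DB: return rn - 4 * count;       /* last word at Rn - 4  */
    }
    return rn;                            /* not reached */
}

/* Value left in Rn when the write back operator '!' is used. */
uint32_t block_writeback(block_mode mode, uint32_t rn, uint32_t count)
{
    assert(count >= 1 && count <= 16);
    return (mode == IA || mode == IB) ? rn + 4 * count : rn - 4 * count;
}
```

For LDMDA R0, {R4-R9} with R0 = 0x45000000, `block_start(DA, 0x45000000, 6)` returns 0x44FFFFEC, the address R4 is loaded from; R9, the sixth register, is loaded from 0x44FFFFEC + 20 = 0x45000000.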
10.20.2 | The STM Instruction
This has the same format as the LDM instruction. Consider the instruction
STMIA R1, {R2-R4}
This will be equivalent to the instructions
STR R2, [R1]
STR R3, [R1, #4]
STR R4, [R1, #8]
After this sequence of three stores is over, the content of the base register R1 does not change, however. If you need it to be advanced past the stored block (to R1 + 12), the write back operator '!' is to be used. So write the instruction as STMIA R1!, {R2-R4}.
Now let's use the LDM and STM instructions to simplify Example 10.16, which transfers bytes from one portion of memory (readonly) to another portion (read/write). But the multiple load/store instructions can be used only for words (32 bits). So Example 10.16 has been modified into Example 10.17, which moves 6 words.

Example 10.17
        AREA  STRIN1, CODE, READONLY
        ENTRY
        LDR   R1, =SOURCE       ; pointer to source
        LDR   R0, =DESTIN       ; pointer to destination
        LDMIA R1, {R2-R7}       ; load six words to R2-R7
        STMIA R0, {R2-R7}       ; store six words in destination
STOP    B     STOP
SOURCE  DCD   0x675889, 0x1234568, 0x9876543, 0x2345678, 0x8907653
        AREA  STRIN2, DATA, READWRITE   ; define the R/W memory area
DESTIN  SPACE 24                ; room for six words
        END

Here 6 words from the source memory have been copied to six registers using just one instruction. In the next instruction, these six words are stored in the destination memory. Note how simple the program is.
Example 10.17 illustrates the idea that block data transfers can be simplified using the multiple register instructions. But their real importance is for stack implementation. Stacks are a necessity for any processor; they are needed for storing data temporarily and also for storing return addresses and register values during procedure calls. We will see this now. For those who are not very familiar with the concept of stacks, here is a brief review.

10.20.3 | Stack
A stack is an area in memory which is accessed in a special way. Most stacks are Last-In First-Out (LIFO) type stacks. This means that the last data that was stored is the first one that can be taken out. It is sequential access that is done, and not random access. Two operations are defined for a stack: PUSH, in which data is written into the stack, and POP, in which data is read out and loaded into registers. The stack has a pointer to its top, which is called the stack pointer (SP). For ARM, this is register R13. This means that the address of the top of the stack is to be available in SP.

10.20.3.1 | Types of Stacks
Ascending/Descending and Empty/Full
An ascending stack grows upwards. It starts from a low memory address and, as items are pushed onto it, progresses to higher memory addresses. A descending stack grows downwards. It starts from a high memory address and, as items are pushed onto it, progresses to lower memory addresses.
In an empty stack, the stack pointer points to the next free (empty) location on the stack, i.e., to the place where the next item to be pushed will be stored. In a full stack, the stack pointer points to the topmost item in the stack, that is, the location of the last item pushed onto the stack. In practice, most stacks are of the 'full descending' type.
Let's consider a descending stack in which SP is first decremented and then data is pushed in. The reverse occurs for the POP operation. Stacks allow data to be pushed or popped only as words (32 bits for ARM). Consider that SP = 0x50002000, and the contents of R1 and R2 are pushed in. At the end of the operation we find that SP = SP - 8 = 0x50001FF8. The ARM instruction set has no separate PUSH instruction; instead, the STM instruction is used. To simplify the use of the STM/LDM instructions corresponding to PUSH and POP for different types of stacks, Table 10.18 can be referred to.

For the kind of stack that we are talking about now, what is the instruction we can use for pushing the contents of registers R1 to R3?
The answer is STMDB SP!, {R1-R3}. We need SP to be used as the base register. For pushing in, SP is first decremented, and then storing is done. So we use the suffix 'DB' with SP. The operator '!' is used so that the decremented value is available in SP.
Consider this simple program (Example 10.18), in which SP is initialized to 0x40000200. Some values are loaded into registers R1 to R3. Using the STMDB instruction, the contents of the three registers, that is, 3 words, are pushed to the stack and will be available in memory. At the end of the program, SP will be found to have the value 0x400001F4.

Table 10.18 | Types of Stacks and Corresponding Instructions to be Used
Stack Type           Push          Pop
Full Descending      STMFD (DB)    LDMFD (IA)
Full Ascending       STMFA (IB)    LDMFA (DA)
Empty Descending     STMED (DA)    LDMED (IB)
Empty Ascending      STMEA (IA)    LDMEA (DB)
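Every row of Table 10.18 answers two questions: does the stack grow down or up, and does SP point at the last item pushed (full) or at the next free slot (empty)? The C sketch below models only the full descending type (STMFD/STMDB to push, LDMFD/LDMIA to pop) on a small word array; the struct `fd_stack` and the functions `fd_init`, `fd_push` and `fd_pop` are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64u

/* Hypothetical full descending stack occupying the addresses
   [top - 4*STACK_WORDS, top). SP points at the last word pushed. */
typedef struct {
    uint32_t top;                 /* initial SP value, just above the stack area */
    uint32_t sp;                  /* current stack pointer (R13)                 */
    uint32_t words[STACK_WORDS];  /* words[i] models address top - 4*(i+1)       */
} fd_stack;

/* Map a simulated address to its array slot (must lie inside the stack area). */
static unsigned slot(const fd_stack *s, uint32_t addr)
{
    unsigned i = (unsigned)((s->top - addr) / 4u) - 1u;
    assert(i < STACK_WORDS);
    return i;
}

void fd_init(fd_stack *s, uint32_t top) { s->top = top; s->sp = top; }

/* STMFD SP!, {regs}: SP is lowered first; the lowest-numbered register
   ends up at the lowest address, i.e. at the new SP. */
void fd_push(fd_stack *s, const uint32_t *regs, unsigned count)
{
    s->sp -= 4u * count;
    for (unsigned k = 0; k < count; k++)
        s->words[slot(s, s->sp + 4u * k)] = regs[k];
}

/* LDMFD SP!, {regs}: read upwards starting at SP, then raise SP. */
void fd_pop(fd_stack *s, uint32_t *regs, unsigned count)
{
    for (unsigned k = 0; k < count; k++)
        regs[k] = s->words[slot(s, s->sp + 4u * k)];
    s->sp += 4u * count;
}
```

Pushing R1-R3 from SP = 0x40000200 leaves SP = 0x400001F4, the value quoted for Example 10.18. The other three rows of the table differ only in the direction of the SP update and in whether SP is moved before or after each word is transferred.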
Example 10.18
        AREA  STCK, CODE, READONLY
        ENTRY
        LDR   SP, =0x40000200
        MOV   R1, #1
        MOV   R2, #2
        MOV   R3, #3
        STMDB SP!, {R1-R3}
STOP    B     STOP
        END

Now, in this program, if the STM instruction is changed to STMIA, the stack becomes an ascending stack, and the value of SP will be 0x4000020C after program execution.
Thus, it is obvious that a stack is a data structure which can be defined by software.

10.20.4 | Stacks and Subroutines/Procedures
For most processors, procedures use a stack to store the return address. A procedure is taken up by a 'CALL' instruction. This causes the action of pushing the current value of PC onto the stack. The procedure ends with a 'RETURN' instruction. This causes the PC value to be popped back.
For ARM, so far (Section 10.16.1) we have used procedures without the necessity of a stack. That is because the Link Register (LR) keeps the return address when a procedure is called. But think of the case of nested procedures. There is only one link register for a mode, and a new procedure call will overwrite the link register content, which holds the return address of the previous procedure; very soon things may go out of hand.
In such cases, a stack is a necessity. Each time a procedure is called, the return address is saved in LR, as is the usual case. When a nested procedure comes in, the content of the link register is pushed onto the stack, and popped out from the stack when exiting the procedure. Figure 10.20 shows the sequence of actions needed to take care of a nested procedure.
Now, let's try to understand the sequence of actions indicated by Figure 10.20. In the main program, we define a stack by giving a value to the stack pointer (SP).

Figure 10.20 | Sequence of actions needed for a nested procedure (Main Program: LDR SP, ...; BL PROC1. PROC1: STMDB SP!, {REGS, LR}; BL PROC2; LDMIA SP!, {REGS, LR}; MOV PC, LR. PROC2: MOV PC, LR)

The main program has a procedure named PROC1, which is called by the instruction BL PROC1. This instruction causes the return address to be copied to LR. In PROC1, since we anticipate a nested procedure, we push LR and the working registers to the stack using the instruction STMDB SP!, {REGS, LR}. Thus the content of LR is safely stored in the stack.
In the procedure PROC1, another procedure is called by the instruction BL PROC2. This instruction again copies the return address to LR. In PROC2, there is the instruction MOV PC, LR at the end. This gets the PC value back from LR, and thus execution goes back to PROC1.
At the end of PROC1, there is the stack-based instruction LDMIA SP!, {REGS, LR}, which retrieves the contents of LR. This is given back to PC by the instruction MOV PC, LR. Thus, execution goes back to the main program.
Any part of memory can be defined as a stack, by simply defining the content of the stack pointer register. Let's write a procedure using a stack.

Example 10.19
        AREA  PROCS, CODE, READONLY   ; area name illegible in the source; PROCS is a placeholder
        ENTRY
        LDR   R7, =0x40000000
        LDR   SP, =0x40000210   ; define SP
        MOV   R1, #1
        MOV   R2, #2
        MOV   R3, #3
        BL    PROC1             ; call PROC1
        LDR   R6, [R7]          ; load R6
STOP    B     STOP
PROC1   STMDB SP!, {R1-R3, LR}  ; save registers and LR on stack
        MOV   R1, #0x34
        MOV   R2, #0x45
        MOV   R3, #0xDC
        BL    PROC2             ; call PROC2
        STR   R5, [R7]          ; store R5
        LDMIA SP!, {R1-R3, LR}  ; retrieve registers from stack
        MOV   PC, LR            ; copy LR to PC
PROC2   ADD   R4, R2, R1        ; the nested procedure PROC2
        ADD   R5, R4, R3
        MOV   PC, LR            ; go back to PROC1
        END

Example 10.19 shows the instance of a nested procedure and the use of the stack. The program is in tune with the sequence outlined by Figure 10.20. Nothing very important is achieved by the program, but it shows how any nested procedure can be written.
PROC1 changes the contents of registers R1, R2 and R3, but since they have already been saved on the stack by the STMDB instruction, their contents can be retrieved while returning to the main program.
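The data flow of Example 10.19 can be traced with a C sketch. It is a hypothetical model, not a translation: the C call stack performs the actual returns, `saved[]` stands for the four stacked words, and the values given to LR are dummy stand-ins for the return addresses written by BL. What it shows is that R1-R3 and LR come back intact even though PROC1 and BL PROC2 overwrite them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file for the trace: r[13] is SP, r[14] is LR. */
typedef struct {
    uint32_t r[16];
    uint32_t saved[4];     /* the four words stacked by STMDB, lowest address first */
    uint32_t word_at_r7;   /* the memory word addressed by R7 */
} cpu_state;

enum { SP = 13, LR = 14 };

/* PROC2: R4 = R2 + R1, R5 = R4 + R3, then MOV PC, LR. */
static void proc2(cpu_state *c)
{
    c->r[4] = c->r[2] + c->r[1];
    c->r[5] = c->r[4] + c->r[3];
}

/* PROC1: save R1-R3 and LR, reuse R1-R3, call PROC2, store R5, restore. */
static void proc1(cpu_state *c)
{
    c->r[SP] -= 16;                                   /* STMDB SP!, {R1-R3, LR} */
    c->saved[0] = c->r[1];  c->saved[1] = c->r[2];
    c->saved[2] = c->r[3];  c->saved[3] = c->r[LR];
    c->r[1] = 0x34;  c->r[2] = 0x45;  c->r[3] = 0xDC;
    c->r[LR] = 0xDEADu;                               /* BL PROC2 overwrites LR (dummy value) */
    proc2(c);
    c->word_at_r7 = c->r[5];                          /* STR R5, [R7] */
    c->r[1] = c->saved[0];  c->r[2] = c->saved[1];    /* LDMIA SP!, {R1-R3, LR} */
    c->r[3] = c->saved[2];  c->r[LR] = c->saved[3];
    c->r[SP] += 16;
}

void run_example_10_19(cpu_state *c)
{
    c->r[7] = 0x40000000u;
    c->r[SP] = 0x40000210u;                           /* LDR SP, =0x40000210 */
    c->r[1] = 1;  c->r[2] = 2;  c->r[3] = 3;
    c->r[LR] = 0x1234u;                               /* dummy for the address written by BL PROC1 */
    proc1(c);
    c->r[6] = c->word_at_r7;                          /* LDR R6, [R7] */
}
```

After the run, R6 holds 0x34 + 0x45 + 0xDC = 0x155, R1-R3 are back to 1, 2 and 3, LR holds the value it had before PROC1 called PROC2, and SP is back at 0x40000210.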
PROC2 adds the new contents of R1, R2 and R3, and returns. In PROC1, the sum in R5 is stored in the memory location pointed to by R7. Later, in the main program, this content is loaded to R6.

Example 10.20
Write a program which arranges numbers (stored in readonly memory) in ascending order, and places them in the R/W memory.
        AREA  NUM, DATA, READONLY
ARRAY   DCD   2,7,4,5,11,17,3,15,8,6,9,19,10,23,20

        AREA  COD, CODE
        ENTRY
        LDR   R0, =ARRAY          ; load the address of ARRAY to R0
        LDMIA R0, {R1-R10}        ; load 10 numbers to R1-R10
        MOV   SP, #0x40000000     ; location of R/W memory
        STMIA SP, {R1-R10}        ; store the 10 numbers
        ADD   SP, #40
        ADD   R0, #40
        LDMIA R0, {R1-R5}         ; load next set of 5 numbers
        STMIA SP, {R1-R5}         ; store them
        MOV   SP, #0x40000000     ; address of read-write memory
        MOV   R1, #0              ; initialize counter R1 to zero
        MOV   R3, SP
LOOP1   MOV   R2, #0              ; outer loop, counter from one end
        MOV   R4, SP
LOOP2   CMP   R2, #14
        BEQ   OUTER               ; branch to OUTER
        ADD   R2, #1              ; increment the counter
        LDR   R0, [R4]            ; values are stored as 4 bytes,
        LDR   R5, [R4, #4]        ; hence a jump of 4
        ADD   R4, #4
        CMP   R0, R5              ; comparing nearby values
        BLT   LOOP2
        MOV   R6, R0              ; swapping and storing them
        MOV   R0, R5
        MOV   R5, R6
        STR   R5, [R4]
        SUB   R4, #4
        STR   R0, [R4]
        ADD   R4, #4
        B     LOOP2
OUTER   ADD   R1, #1
        CMP   R1, #15
        BNE   LOOP1
STOP    B     STOP
        END

This program uses the concept of 'bubble sorting'. First, the 15 numbers stored in the variable ARRAY are loaded and copied to R/W memory in two steps (10 words, then 5 words). After that, two nested loops are used. The outer loop (counter R1) runs 15 times. The inner loop (counter R2, which goes from 0 up to 14) traverses the array from one end to the other, comparing adjacent values (14 comparisons per pass) and swapping them whenever the first is not less than the second. Hence, each pass of the outer loop carries the largest remaining value to the end of the array, while smaller values move one position towards the first element; when all the passes are over, the array is in ascending order.

Conclusion
With this, we come to the end of our discussion on the architecture and assembly language programming for ARM. There are a few more instructions, pseudo instructions and directives that haven't been dealt with, but these can be learned by referring to a book fully dedicated to this processor.

KEY POINTS OF THIS CHAPTER
o ARM is the most popular of the 32-bit processors in the market.
o ARM, the company, does not fabricate chips; instead it sells its designs as 'Intellectual Property'.
o The ARM family consists of the ARM7, 9, 10, 11 and Cortex versions.
o Low power dissipation and good computational capability are the chief attributes of the ARM processor.
o The processor has a large set of registers, and operates in seven modes.
o It can be programmed in assembly language, and one of the IDEs available is Keil RVDK.
o The barrel shifter in the ALU has a lot of relevance, as it simplifies computations.
o ARM can execute instructions conditionally by suffixing a condition code (such as EQ or GT) to them; data processing instructions update the flags only when the suffix 'S' is added.
o There is a link register (LR) for simplifying procedure calls.
o It has a special mechanism for handling immediate data which is bigger than 8 bits.
o It has multiple register load and store instructions.
o Stacks are needed when nested procedures are used.

QUESTIONS
1. List out the important features that make ARM ideal for embedded applications.
2. Name two aspects in the design of ARM which have made it a processor with 'low-power dissipation'.
3. What is the use of a cache for any processor?
4. What does the acronym ISA mean to you?
5. What are the advantages and disadvantages of 'pipelining'?
6. What is the penalty incurred in the case of 'unaligned' data?
7. How is 'rotation to the left' achieved in ARM?
8. How is the instruction LDR different from the pseudo instruction LDR?
9. Why is it that compare instructions don't need the suffix 'S'?
10. How is the write back operator used in the 'pre-indexed' mode of addressing?

EXERCISES
1. Write instructions for the following, without using any CISC type instruction.
a) move into R7, a byte multiplied by 8
b) move into R6, a word multiplied by 17
c) move into R5, a number divided by 8
2. What do the following instructions mean and what is accomplished?
a) ANDEQ R1, R2, R4
b) ADDHI R2, R4, R2
c) MOVAL R7, R5
d) SUBMI R1, R2, R7
e) CMP R1, R2
f) TST R1, R3
g) MOVGT R2, R5
h) ADDLT R5, R6, R7
3. Write assembly language programs for the following.
a) Find the factorial of any number (the factorial should fit in a 32-bit register)
b) Do division using repeated subtraction
c) Find the sum of the first 100 natural numbers. Save the result in memory
d) Find the sum of 10 numbers stored in read only memory; the result should be in read/write memory
e) Store 15 numbers in memory and arrange them in
- descending order
- ascending order
f) Write a program with a procedure call
- without using stack
- using stack

Chapter-opening image: An ARM7 LPC2146 board.

In this chapter, you will learn
o The internal architecture of LPC 2148, a typical and popular ARM7 MCU
o The buses in this MCU
o The list of peripherals inside the chip
o The memory map of the peripherals
o The programming of the GPIO
o The programming of the timer unit
o The programming of the PWM unit
o How to use the serial communication unit
o The internal structure of ARM9 and Cortex-M3

Introduction
In the previous chapter, we made a thorough study of the core of ARM. The study was not exhaustive; there are many more aspects, features and advancements introduced with each new version of the architecture. The trick is to learn more as and when you need to use the chip for a specific application. What we did in the last chapter was assembly language programming, so that the computational capabilities of ARM, the processor, are clear.
In this chapter, we take a different approach: we examine ARM as a microcontroller, also known as an SoC (System on Chip). The application domain of this processor is the embedded field. A number of peripherals are added inside the chip so as to make it a 'microcontroller', and as the peripherals increase in number and complexity, the chip becomes sufficient to make a complete system. The number and kind of peripherals needed depends on the application, but there are some peripherals which are more or less a standard feature in


most microcontrollers; examples are timers/counters, serial ports, general purpose I/O, etc. More advanced MCUs have I2C, SPI, RTC, PWM units and so on as internal peripherals. MCUs with more advanced cores have peripherals such as LCD controllers, CAN controllers, USB controllers, etc.
In this chapter, we will take a look at the internal block diagram and internal buses of some ARM MCUs, study a few selected peripherals and write a few programs for these peripherals. All the programs presented in the chapter have been tested and verified on the LPC 2148 MCU using the Keil RVDK.
ARM MCUs are manufactured by different firms, and so there are a variety of MCUs and peripheral boards in the market. We choose a few popular ones for our study.
We start with an MCU based on the ARM7 core. NXP (founded by Philips) has manufactured and popularized the LPC 21xx series, which is a set of MCUs with sufficient peripherals for a moderately complex application. Let's begin with the LPC 2148 MCU, which is one member of the LPC 214x series, the other members being LPC 2141/42/44/46. Data sheets and user manuals for the series are available which give all the fine details of each and every peripheral. Some important details of the chip are given in Appendix D. The data sheet is available on the site www.pcarson/lyladas/.

11.1 | Block Diagram
Figure 11.1a is the photograph of an MCB 2140 board, with the LPC 2148 and other interfaces marked on it. Figure 11.1b shows the internal block diagram of LPC 2148. Let's take a look at this block diagram, and discuss some aspects of this very complex chip.

Figure 11.1a | Photograph of the LPC 2148 MCU on a MCB 2140 development board (marked: SD adapter, Pot1, LPC214x, JTAG, RS-232, speaker, LEDs, Reset, INT1, USB)

Figure 11.1b | Internal block diagram of LPC 2148 (blocks shown include the ARM7TDMI-S core with its test/debug interface, the AHB bridge, the VPB (VLSI Peripheral Bus), external interrupts EINT0 to EINT3, capture/compare Timer 0/Timer 1, I2C-bus serial interfaces 0 and 1, A/D converters 0 and 1, SPI and SSP serial interfaces, and USB)

11.2 | Features of the LPC 214x Family
The different functional blocks in this SoC family (which includes the 2141/42/44/46/48) are shown in Figure 11.1; let us attempt to understand some of them. The details of the ARM7 core were thoroughly covered in the previous chapter and so are not repeated here.
The main features provided by this family are listed below. It is not necessary to go through the list comprehensively right now. Once the most important features are studied in detail, this list may be used as a back reference.
i) The ARM7TDMI-S core in a tiny LQFP64 package
ii) 8 KB to 40 KB of on-chip static RAM
iii) 32 KB to 512 KB of on-chip flash memory
iv) 128-bit wide interface/accelerator enables high-speed 60 MHz operation
v) USB 2.0 full-speed compliant device controller with 2 KB of endpoint RAM. In addition, 8 KB of on-chip RAM is provided that is accessible to USB by DMA
vi) 10-bit ADCs provide a total of 6/14 analog inputs
vii) Single 10-bit DAC provides variable analog output (LPC2142/44/46/48 only)
viii) Two 32-bit timers/external event counters (with four capture and four compare channels each), PWM unit (six outputs) and watchdog
ix) Low power real-time clock (RTC) with independent power and 32 kHz clock input
x) Multiple serial interfaces including two UARTs (16C550), two fast I2C-bus (400 kbit/s), SPI and SSP with buffering and variable data length capabilities
xi) Vectored interrupt controller (VIC) with configurable priorities and vector addresses
xii) Up to 45 of 5 V tolerant fast general purpose I/O pins in a tiny LQFP64 package
xiii) Up to 21 external interrupt pins available
xiv) 60 MHz maximum CPU clock available from programmable on-chip PLL
xv) On-chip integrated oscillator operates with an external crystal from 1 MHz to 25 MHz
xvi) Power saving modes include idle and power-down
xvii) Processor wake-up from power-down mode via external interrupt or BOD
xviii) Single power supply chip with POR (power-on reset) and BOD (brown-out detection) circuits

We now take a more detailed look at the important features of the chip. It may be necessary to refer to Figure 11.1 while discussing these features.

11.2.1 | Memory
The memory available includes up to 40 KB of static RAM and 512 KB of flash. In the case of LPC2146/48 only, an 8 KB SRAM block intended to be utilized mainly by the USB can also be used as general purpose RAM for data storage and for code storage and execution.

11.2.2 | Memory Map
The total memory space is 4 GB (corresponding to an internal address bus of 32 bits, i.e., 2^32 bytes = 4 GB). It is a 'memory mapped I/O' system, in which peripherals and memory share the same memory space.

11.2.3 | System Functions
The system functions include a crystal oscillator and a PLL (Phase Locked Loop). The oscillator frequency can be in the range of 10 to 25 MHz, which can be multiplied up, using the PLL, to get a system frequency of up to 60 MHz. There is also the possibility of changing the system frequency dynamically (using the PLL): when the system is idling, the frequency can be scaled down to reduce power dissipation.

11.2.3.1 | Reset
There are two ways of resetting: a hard reset using the active low reset pin, and a soft reset on account of the watchdog timer. In either case, the reset vector is the address 0x0. Reset also starts a timer designated as the 'wake up timer'. This timer ensures that a minimum delay is allowed for the system to stabilize, and only then is code execution allowed to start.

11.2.3.2 | Power Control
One of the main features of ARM processors is their low power dissipation. Besides a basic low power design in the technological aspects, low power modes are also available: they are the idle mode and the power-down mode.

Idle Mode
In the idle mode, instruction execution is suspended until either a reset or an interrupt occurs. But peripheral functions can continue operation and may generate interrupts to cause the processor to resume execution. The idle mode eliminates the power used by the processor, memory systems, related controllers and internal buses.

Power-down Mode
In the power-down mode, the oscillator is shut down and the chip receives no internal clocks. This mode can be terminated and normal operation resumed by either a reset or certain specific interrupts that are able to function without clocks. Since all dynamic operation of the chip is suspended, this mode reduces chip power consumption to almost zero.

11.2.4 | Internal Buses

11.2.4.1 | AMBA
'Advanced Microcontroller Bus Architecture' or AMBA is a standard defined by ARM in 1996 for on-chip buses in its SoC designs. In Figure 11.2, a number of buses can be seen which form part of AMBA. The figure shows the bus structure re-drawn to emphasize the functionality of the constituent buses of the AMBA standard. Three buses with different protocols and speeds have been defined, for catering to the different kinds of components present inside the chip.
The fastest bus is the system or local bus, which connects the processor core with memory, as memory accesses have to be very fast. In the LPC 21xx series, serious thought has been given to the idea of speeding up special peripherals, and as such, a GPIO (General Purpose I/O) block is also connected to the local bus. This permits peripherals connected to this fast GPIO block to use the high speed of the local bus.

Figure 11.2 | The internal bus structure (ARM7 core, fast GPIO, AHB bridge, VIC, AMBA AHB, SRAM and flash, VPB divider, VPB peripherals)
EMBEDDED SYSTEMS
396 ARM-THE WORLD'S MOST POPULAR 32-BIT EMBEDDED PROCESSOR 397
Along with the core, an AHB bridge is seen, which defines the high-speed AMBA 'Advanced High-performance Bus' (AHB) to which the 'Vectored Interrupt Controller' (VIC) is connected. The third bus is the VPB bus, which stands for VLSI Peripheral Bus; a bridge communicates between the low-speed VPB bus and the higher-speed AHB bus. The VPB is the bus that connects to all the peripherals of the LPC214x SoC.

11.2.4.2 | The VPB Bus and Divider
Figure 11.2 shows that there is a bridge that interfaces between the VPB and the AHB. There is also a VPB divider. This is a register whose settings can be used to divide the output frequency of the PLL, so as to get a reduced clock frequency (1/2 or 1/4 of it) for the VPB peripherals, which need to operate at a frequency lower than that of the processor. The processor clock is designated as CCLK, while the peripheral clock is called PCLK. On reset, PCLK is 1/4 of CCLK.

11.2.5 | Memory Accelerator Module
Instructions are executed by fetching them from memory. In a typical MCU, the program code, that is, the instructions, is stored in flash memory. Flash memory is rather slow, which means that program execution is slowed down, and the very purpose of having a high-speed processor is defeated. The easiest solution is to have a shadow RAM, as in PCs, where the content of the flash is copied to RAM (on startup) so that this fast RAM is accessed rather than the slow flash. Such a solution could be considered for our MCU as well, but here we are dealing with on-chip flash and RAM, which are limited in size, so copying program code to on-chip RAM is not a feasible solution. Another possibility is to have a fast cache on the chip, but that needs additional hardware and increases the complexity of the chip.

The final solution has come in the form of a module named the 'memory accelerator module'. In simple terms, the memory accelerator module (MAM) attempts to have the next ARM instruction that will be needed in its latches in time to prevent CPU fetch stalls (see Figure 11.3).

Figure 11.3 | Simplified view of the memory accelerator module (blocks shown: memory address, 128-bit flash memory bank, buffers, bus, ARM local bus interface)

In this setup, the flash memory is arranged as a bank of 128 bits, so that each access to the flash fetches 128 bits (four 32-bit ARM instructions). This requires the flash to be organized as 4 memory modules, each 32 bits wide, giving effectively 128 bits at a time. In practice, the speed of memory is not multiplied by 4, but it improves over the case of a flash memory with 32-bit access. All the extra hardware needed for this is in the memory controller, i.e., the MAM.

11.3 | Peripherals
Section 11.2 contains a list of the peripherals available in this chip. Each peripheral has addresses, and the peripherals use the memory mapped I/O scheme of addressing: both memory and I/O share the same address space. The total address space is from 0x0 to 0xFFFF FFFF, i.e., a 4 GB space. The memory map of the system is as shown in Figure 11.4.

What are the notable features in this memory map?
i) The 512 KB of flash (non-volatile) memory has addresses starting from 0x0.
ii) The 32 KB of on-chip static RAM has addresses starting at 0x4000 0000.
iii) There is an 8 KB RAM allotted for USB with DMA applications, at 0x7FD0 0000 (LPC2146/2148).
iv) The peripherals attached to the AHB have addresses from 0xF000 0000.
v) The peripherals attached to the lower-speed VPB bus have addresses from 0xE000 0000.

Next, we will learn how to use the peripherals of the chip. Each peripheral has a number of special function registers (SFRs) associated with it, and each SFR has a specific address. To use a specific peripheral in the way we want, the associated registers should be written with the appropriate bits.

11.3.1 | GPIO (General Purpose I/O)
If you look at the pin configuration, which is available in the manual of the chip, you will see that most pins have more than one function. There are three to four designations for each pin, and which of these is valid at a time depends on how the pin has been 'programmed' using the pin select block.

There are two general purpose 32-bit ports, P0 and P1, with restrictions and features as explained below. These are the pins to which external peripherals can be connected.

11.3.1.1 | Port 0
Port 0 is a 32-bit I/O port with individual direction controls for each bit. 28 pins of Port 0 can be used as general purpose bi-directional digital I/Os, while P0.31 provides digital output functions only. The operation of Port 0 pins depends upon the pin function selected via the pin connect block. Pins P0.24, P0.26 and P0.27 are 'reserved' and not available for use.
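The Port 0 restrictions just listed can be captured in a small host-side C sketch, which also confirms the count of 28 bi-directional pins. The enum and function names are hypothetical; the pin numbers are taken directly from the text (P0.24, P0.26 and P0.27 reserved, P0.31 output only).

```c
#include <assert.h>

typedef enum { P0_BIDIRECTIONAL, P0_OUTPUT_ONLY, P0_RESERVED } p0_pin_kind;

// Classify a Port 0 pin according to the restrictions given in the text.
static p0_pin_kind port0_pin_kind(unsigned pin)
{
    assert(pin < 32);                  // Port 0 is a 32-bit port
    switch (pin) {
    case 24: case 26: case 27:
        return P0_RESERVED;            // not available for use
    case 31:
        return P0_OUTPUT_ONLY;         // digital output functions only
    default:
        return P0_BIDIRECTIONAL;
    }
}

// Count the pins that can serve as bi-directional digital I/O.
static unsigned port0_bidirectional_count(void)
{
    unsigned n = 0;
    for (unsigned pin = 0; pin < 32; pin++)
        if (port0_pin_kind(pin) == P0_BIDIRECTIONAL)
            n++;
    return n;
}
```

The count works out as 32 - 3 - 1 = 28, in agreement with the figure quoted above.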

Figure 11.4 | Memory map of the SoC

0xF000 0000 to 0xFFFF FFFF   AHB peripherals (3.75 GB to 4.0 GB)
0xE000 0000 to 0xEFFF FFFF   APB (VPB) peripherals (3.5 GB to 3.75 GB)
0x8000 0000 to 0xDFFF FFFF   Reserved address space (2.0 GB to 3.5 GB)
0x7FFF D000 to 0x7FFF FFFF   Boot block (12 kB remapped from on-chip flash memory)
0x7FD0 2000 to 0x7FFF CFFF   Reserved address space
0x7FD0 0000 to 0x7FD0 1FFF   8 kB on-chip USB DMA RAM (LPC2146/2148)
0x4000 8000 to 0x7FCF FFFF   Reserved address space
0x4000 0000 to 0x4000 7FFF   32 kB on-chip static RAM (LPC2146/2148)
0x4000 0000 to 0x4000 3FFF   16 kB on-chip static RAM (LPC2142/2144)
0x4000 0000 to 0x4000 1FFF   8 kB on-chip static RAM (LPC2141)
0x0008 0000 to 0x3FFF FFFF   Reserved address space
0x0000 0000 to 0x0007 FFFF   512 kB on-chip non-volatile memory (LPC2148)
0x0000 0000 to 0x0003 FFFF   256 kB on-chip non-volatile memory (LPC2146)
0x0000 0000 to 0x0001 FFFF   128 kB on-chip non-volatile memory (LPC2144)
0x0000 0000 to 0x0000 FFFF   64 kB on-chip non-volatile memory (LPC2142)
0x0000 0000 to 0x0000 7FFF   32 kB on-chip non-volatile memory (LPC2141)

11.3.1.2 | Port 1
Port 1 is a 32-bit bi-directional I/O port with individual direction controls for each bit. The operation of Port 1 pins depends upon the pin function selected via the pin connect block. Pins 0 through 15 are not available. Of the remaining pins, 16 to 31, pins 16 to 25 double as the trace port and pins 26 to 31 as the JTAG interface. In effect, only a few pins of Port 1 are available, and they can be used for GPIO only, because their other pin functions are used for debugging (trace and JTAG).

11.3.1.3 | Pin Connect Block
The purpose of this block is to configure the pins for the desired functions. It acts like a multiplexer.

Let's look at it this way. Each pin of the chip has a maximum of four functions. To select one specific function for a pin, a multiplexer with two select lines is necessary. The select lines are provided by the bits of the PINSEL registers.

Only three pin select registers are available (and needed), because only Port 0 and half of Port 1 are available as peripheral pins. See Appendix D for details of the PINSEL registers. As a sample, pin selection for pin No. 29, P0.5, is shown in Table 11.1, and the selection of the pin using the pin select logic is shown in Figure 11.5.

Table 11.1 | Pin Select Logic for P0.5

Bits of PINSEL0   Port Pin   Value of PINSEL Bits   Function Selected     Reset Value
11:10             P0.5       00                     GPIO                  0
                             01                     MISO0 (SPI0)
                             10                     Match 0.1 (Timer 0)
                             11                     AD0.7

Figure 11.5 | Pinselect mux of pin P0.5 (inputs GPIO, SPI0, Match 0.1 and AD0.7, selected by bits 11:10 of the PINSEL0 register)
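Since every pin owns a 2-bit field in a PINSEL register, the value to write can be computed instead of being assembled by hand. The host-side C sketch below assumes that P0.n, for n = 0 to 15, uses bits 2n+1:2n of PINSEL0, which agrees with bits 11:10 for P0.5 in Table 11.1 and with the bit positions used in Example 11.1. The helper names are hypothetical, and nothing is written to real hardware.

```c
#include <assert.h>
#include <stdint.h>

// Function codes from Table 11.2
enum { PINFUNC_GPIO = 0, PINFUNC_ALT1 = 1, PINFUNC_ALT2 = 2 };

// Return 'reg' with the 2-bit field of P0.pin (pin 0..15) set to 'func' (0..3).
static uint32_t pinsel0_set(uint32_t reg, unsigned pin, uint32_t func)
{
    assert(pin < 16 && func <= 3u);
    unsigned shift = 2u * pin;          // e.g. P0.5 -> bits 11:10
    reg &= ~(3u << shift);              // clear the old selection
    reg |= func << shift;               // insert the new selection
    return reg;
}

// Read back the function code currently selected for P0.pin.
static uint32_t pinsel0_get(uint32_t reg, unsigned pin)
{
    assert(pin < 16);
    return (reg >> (2u * pin)) & 3u;
}
```

Selecting the second alternate function (10) for P0.0, P0.1 and P0.7 in turn yields 0x0000800A, and clearing and then setting only one field leaves the selections for the other pins untouched.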

Table 11.1 shows that bits 11 and 10 of the PINSEL0 register should be 00 if pin No. 29 (P0.5) is to be a GPIO, 01 if it is to be used as MISO0 (SPI0), 10 if it is to be a match pin for Timer 0, and 11 if it is used as AD0.7.

In general, pin selection is as shown in Table 11.2.

Table 11.2 | Generalization of the Function of the Pinselect Register Bits

PINSEL0 and PINSEL1 Bits   Function
00                         Primary function, typically GPIO
01                         First alternate function
10                         Second alternate function
11                         Reserved (a third alternate function on some pins, e.g., AD0.7 on P0.5)

The description of a pin shows that it can have 4 possible functions. The logic on the select lines decides the functionality of the port pin. For any pin, the bit configuration '00' of the corresponding pin select register bits programs the pin to act as a general purpose I/O pin. By default, on reset, all port pins act as GPIO pins.

Example 11.1
Use the PINSEL0 register to activate the PWM1, PWM2 and PWM3 outputs, which are at pins P0.0, P0.7 and P0.1, respectively.

Solution
Refer to Appendix D, which gives the complete listing of the pin functions of the chip. The PWM outputs are listed as the second alternate function of these port pins, so the corresponding bits of the PINSEL0 register must be given the value 10 to make the pins act as PWM outputs. See Table 11.3.

Table 11.3 | PINSEL0 Bits for the PWM Outputs of Example 11.1

Output Function   Output Pin   Bits of PINSEL0 Register
PWM1              P0.0         1:0
PWM2              P0.7         15:14
PWM3              P0.1         3:2

Thus, the PINSEL0 register has the value 0000 0000 0000 0000 1000 0000 0000 1010 = 0x0000800A.

11.3.1.4 | Using GPIO Pins
These pins can be used for applications for which specific controllers/drivers are not available inside the chip: driving an LCD display, relays, motor controls, ON/OFF functions and so on. Four registers are available for this. They are shown in Figure 11.6 and listed as follows:

i) IODIR (IO Direction register): The bit setting of this register decides whether a pin is to be an input (0) or output (1).
ii) IOSET (IO Set register): This register is used to set the output pins of the chip. To make a pin '1', the corresponding bit in this register is written with '1'. Writing zeros has no effect.
iii) IOCLR (IO Clear register): To make an output pin '0', i.e., to clear it, the corresponding bit in this register is written with '1'. Writing zeros has no effect.
iv) IOPIN (IO Pin register): From this register, the value of the corresponding pin can be read, irrespective of whether the pin is an input or output pin.

Figure 11.6 | Registers pertaining to one GPIO pin (bit n of IODIR, IOSET, IOCLR and IOPIN, for bits 0 to 31)

Example 11.2
Let us make the lower 16 GPIO pins output pins.
Then IODIR = 0x0000 FFFF.
To send zeros to all these pins, IOCLR = 0x0000 FFFF.
Now, check the pin values in the register IOPIN; its lower 16 bits will be 0.
Now, if these pins are to be set, IOSET = 0x0000 FFFF.
Check the pin values in the register IOPIN; its lower 16 bits will be 1.

Example 11.3
Generate an asymmetrical square wave at the lowest four pins of Port 0.

#include <LPC214x.h>

int main(void)
{
    volatile unsigned int x;        //volatile so the delay loops are not optimized away

    IODIR0 = 0xFFFFFFFF;            //Make all pins outputs
    for (;;)
    {
        IOSET0 = 0x0000000F;        //Set the lowest four bits, P0.0 to P0.3
        for (x = 0; x < 4; x++);    //Delay for the high part
        IOCLR0 = 0x0000000F;        //Clear the lowest four bits, P0.0 to P0.3
        for (x = 0; x < 12; x++);   //Delay for the low part
    }
}
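The write-one-to-set and write-one-to-clear behaviour of IOSET and IOCLR described in Section 11.3.1.4 can be tried out on a host with ordinary variables. This is a minimal model, not the hardware: the struct and function names are hypothetical, and it assumes that IOPIN simply reflects the output latch for output pins (input pins read as 0 in this model). The replay function walks through the steps of Example 11.2.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical host model of one GPIO port (not the real hardware).
typedef struct {
    uint32_t dir;    // IODIR: 1 = output, 0 = input
    uint32_t latch;  // output latch, changed only through IOSET/IOCLR
} gpio_port;

static void gpio_write_dir(gpio_port *p, uint32_t v) { p->dir = v; }

// Writing 1s sets the matching bits; 0s have no effect.
static void gpio_write_set(gpio_port *p, uint32_t v) { p->latch |= v; }

// Writing 1s clears the matching bits; 0s have no effect.
static void gpio_write_clr(gpio_port *p, uint32_t v) { p->latch &= ~v; }

// Model of IOPIN: output pins read back the latch, input pins read 0 here.
static uint32_t gpio_read_pin(const gpio_port *p) { return p->latch & p->dir; }

// Replays Example 11.2 on the model and checks the predicted IOPIN values.
static void replay_example_11_2(void)
{
    gpio_port p0 = {0u, 0u};
    gpio_write_dir(&p0, 0x0000FFFFu);                   // IODIR = 0x0000FFFF
    gpio_write_clr(&p0, 0x0000FFFFu);                   // IOCLR = 0x0000FFFF
    assert((gpio_read_pin(&p0) & 0xFFFFu) == 0u);       // lower 16 bits are 0
    gpio_write_set(&p0, 0x0000FFFFu);                   // IOSET = 0x0000FFFF
    assert((gpio_read_pin(&p0) & 0xFFFFu) == 0xFFFFu);  // lower 16 bits are 1
}
```

Using separate set and clear registers means a program can change some pins without a read-modify-write of the others, which is why Examples 11.3 and 11.4 only ever write IOSET and IOCLR.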

Example 11.3 is a simple example to show the function of the four GPIO registers corresponding to Port 0. The use of Embedded C has been discussed in Chapter 9 and is not repeated here. This program sets and clears the lowest four bits of Port 0 at an asymmetric rate (note that the two delays are different).

Figure 11.7 shows the output at port pins P0.0 to P0.3, viewed in the logic analyser which is available in the simulator of Keil RVDK. LEDs may be connected to the port pins, and then they will go ON and OFF at the rate determined by the delay. In the program, the OFF time is three times the ON time.

Figure 11.7 | Output at the lowest four pins of Port0 for Example 11.3

The contents of the registers in Figure 11.6 can be observed in the 'peripherals' part of the simulator.

Example 11.4

#include <LPC214x.h>

void wait(void)
{
    volatile int d;                 //volatile so the delay loop is not optimized away
    for (d = 0; d < 1000000; d++);
}

int main(void)
{
    IODIR1 = 0x00010000;            //Make P1.16 an output
    while (1)
    {
        IOSET1 = 1 << 16;           //P1.16 = 1
        wait();
        wait();                     //ON time = two calls of wait
        IOCLR1 = 1 << 16;           //P1.16 = 0
        wait();                     //OFF time = one call of wait
    }
}

Example 11.4 is another program which generates a waveform at a GPIO pin. Here only one pin, P1.16, is chosen. That pin is made 1 by writing to the IOSET1 register a '1' shifted left 16 times (1 << 16). Next, the same bit is written to the IOCLR1 register to make the pin 0. This program generates a rectangular wave at P1.16. The delay is created in a function named wait, and this function is then called. To make the ON time twice the OFF time, the wait function is called twice when the pin is '1'.

Note: For both the above programs we haven't used the pin select block for choosing the pin function. This is not necessary, because on reset, all port pins behave as GPIO.

11.3.2 | The Timer Unit
Next, let's study the timers/counters of the LPC2148. A timer and a counter are functionally equivalent, except that a timer uses PCLK for its timing, while a counter uses an external source. This means that a counter is used to count external events via the capture inputs. A counter is also called a 'capture timer'.

Here, we discuss the timer function alone. There are two such units, Timer 0 and Timer 1. There are a number of registers associated with timer operations. Let's discuss the functionality of each of them for Timer 0, whose registers we denote with the prefix T0. When we use Timer 1, we use similar registers, but with the prefix T1.

The first step in timer operation is to load a number into what is called a 'match register'. Then a timer count register is started. This register keeps incrementing for each PCLK cycle, or at a lower, prescaled rate. When the content of this timer count register becomes equal to the value in the match register, i.e., a match occurs, the delay from the starting time can be used for our 'timing'. Figure 11.8 illustrates the idea of timing done using the timer unit.

Now, let us understand the special function registers associated with Timer 0. Keep in mind that a similar set of registers exists for Timer 1 as well.

11.3.2.1 | Important SFRs of Timer 0

Timer Count Register-T0TC
This is a 32-bit register, which gives it a range of counting from 0 to 0xFFFF FFFF, after which it wraps back to 0x0000 0000. This register is incremented on every tick of the clock (i.e., PCLK) if the prescale register is 0 (the use of the prescaler is explained subsequently).
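Because T0TC wraps from 0xFFFF FFFF back to 0x0000 0000, the time elapsed between two readings is best computed with unsigned 32-bit subtraction, which is itself carried out modulo 2^32. The host-side sketch below uses hypothetical helper names and assumes at most one wrap between the two readings and no prescaling.

```c
#include <assert.h>
#include <stdint.h>

// Ticks elapsed between two T0TC readings; correct across one wrap-around
// because unsigned 32-bit arithmetic is modulo 2^32.
static uint32_t tc_elapsed(uint32_t start, uint32_t now)
{
    return now - start;
}

// Convert a tick count to microseconds for a given PCLK (in Hz).
static uint64_t ticks_to_us(uint32_t ticks, uint32_t pclk_hz)
{
    assert(pclk_hz != 0);
    return ((uint64_t)ticks * 1000000u) / pclk_hz;
}
```

A reading of 0x0000 0010 taken after 0xFFFF FFF0 gives 0x20 ticks, not a huge or negative number. At the PCLK of 15 MHz used later in this chapter, the full 32-bit range corresponds to 286 331 153 µs, so the counter wraps roughly every 286 seconds.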
Figure 11.8 | Simplified block diagram illustrating the operation of the timer unit (blocks shown: PCLK, prescaler, timer count register, timer control register, match register, match control register with stop, reset and interrupt control)

Timer Control Register-T0TCR
This is an 8-bit register (Figure 11.9) in which only the lowest two bits need be used.
Bit 0-E: This is the Enable bit. When this bit is '1', the counter is enabled and starts. Then, the count in T0TC is incremented for every cycle of PCLK (if prescaling is not used).
Bit 1-R: This is the Reset bit. When '1', the counter is reset on the next positive edge of PCLK.

Figure 11.9 | The important bits of T0TCR (bit 1: R, bit 0: E)

Match Registers (MR0 to MR3)
There are four 32-bit match registers available: MR0 to MR3. For the operation of one timer, one of the match registers may be sufficient, and it is used by loading a number into it. During timer operation, the timer count register starts incrementing, and at some time, its count 'matches' the number in the match register. When this match occurs, some action can be programmed to be done by 'configuring' the bits of the 'match control register'.

Match Control Register-T0MCR
This is a 16-bit register (Figure 11.10) used to specify the event to occur when the match occurs. The lowest three bits are for controlling the operations related to match register 0. The next three groups of three bits are for MR1, MR2 and MR3, in that order. Remember that there are four match registers.

These are the bits of the register related to timer operation (for MR0):
Bit 0-I: When '1', an interrupt is activated when a match occurs. When '0', the interrupt is disabled.
Bit 1-R: When '1', the timer count register is reset when a match occurs. When '0', this feature is disabled.
Bit 2-S: When '1', the timer count (T0TC) and the prescale counter are stopped when a match occurs, and the Enable bit of T0TCR is made '0'.

Figure 11.10 | Important bits of T0MCR (bit 2: S, bit 1: R, bit 0: I)

Let's make a simple timer, for which the steps are listed in the following section.

11.3.2.2 | Timer Operation
i) Load a number in a match register.
ii) Start the timer by setting the 'E' bit in T0TCR.
iii) The timer count register (T0TC) starts incrementing for every tick of the peripheral clock PCLK (no prescaling is done).
iv) When the content of T0TC equals the value in the match register, timing is said to have occurred.
v) One of many possibilities can be made to occur when this happens: the timer count register can be reset, the timer can be stopped, or an interrupt can be generated. This 'setting' is done in the T0MCR register.

Now let's design a very simple timer for generating a symmetric square wave at P1.16, using Timer 0.

Example 11.5

#include <LPC214x.h>

void wait(void);

int main(void)
{
    T0MR0 = 0x000000FF;             //Load a number in the match register
    T0MCR = 4;                      //Stop when match occurs
    IODIR1 = 0x00010000;            //Make P1.16 an output
    while (1)
    {
        IOSET1 = 1 << 16;           //P1.16 = 1
        wait();                     //Call the wait function
        IOCLR1 = 1 << 16;           //P1.16 = 0
        wait();
    }
}

void wait(void)                     //The wait function
{
    T0TCR = 1;                      //Start the timer
    while (T0TC != T0MR0);          //Wait until T0TC = MR0
    T0TCR = 2;                      //Reset the counter
    T0TC = 0;                       //Make the timer count register 0
}

Figure 11.11 | Square wave from the timer with (a) T0MR0 = 0xFF and (b) T0MR0 = 0xFFFF (period 8.7 ms)

Explanation of Example 11.5
i) In the program, Timer 0 is used. The timer match control register (T0MCR) is written with the value 4, which indicates that the timer count register is to stop when a match is obtained. The match register (of Timer 0) is loaded with the number 0xFF. These two steps are the initial conditions.
ii) The pin P1.16 is chosen as the output pin. It is set to '1' initially and then the wait function (which creates a delay) is called. After one call of the wait function, P1.16 is made '0' and the wait function is called again. This causes a symmetric square wave to be obtained at this pin. Since this is a GPIO pin, any other GPIO pin may also be used here to get the same functionality.
iii) The wait function creates the needed delay. The timer control register (T0TCR) is loaded with the value '1', so as to enable (start) the timer counter, so that the timer register T0TC increments for every clock tick of PCLK. When T0TC increments to the value 0xFF (stored in T0MR0), a 'match' occurs. At this point, the counting is stopped, and the timer counter register is cleared.
iv) This whole sequence of events is repeated to get a continuous square wave.

11.3.2.3 | Calculation of Timer Output Frequency
Now let's calculate the frequency of the square wave generated. The above program has been tested on a board in which the peripheral clock (PCLK) is obtained by dividing the 60 MHz processor clock (CCLK) by 4 (using the VPB divider settings). Thus PCLK is 15 MHz, and it has a period of T = 0.067 µs.

The function wait creates a delay equal to 256 periods of PCLK (as T0MR0 = 0xFF), i.e., 256 × 0.0667 µs ≈ 17.07 µs. This delay is half the period of the square wave generated at P1.16. The period of the signal is thus about 34 µs and the frequency about 29.4 kHz.

Next, change the match register value to 0xFFFF. A similar calculation gives a frequency of 114 Hz. Figures 11.11(a) and (b) show the square waves generated for match values of 0xFF, with T = 34 µs, and 0xFFFF, with T = 8.7 ms.

11.3.2.4 | Using the Prescaler
To get a lower frequency output, there is the facility of using the prescaler. There are two registers associated with prescaling: the prescale counter and the prescale register.

The prescale counter increments on every PCLK, and when it reaches the value in the prescale register (T0PR), it allows the timer counter (T0TC) to increment its value by 1. This causes T0TC to increment on every PCLK when T0PR = 0, every 2 PCLKs when T0PR = 1, every 3 PCLKs when T0PR = 2, every 4 PCLKs when T0PR = 3, and so on. In effect, loading a number N into T0PR causes the output frequency to be divided by N + 1.

The mechanism is like this.
i) Say the prescale register contains the number 2. When the timer is started, the prescale counter counts 0, 1, 2, and only when it reaches 2 is the timer count in T0TC incremented by 1; the prescale counter then restarts from 0. Thus, the timer count is incremented once in every 3 clock periods of PCLK.
ii) This continues as long as the timer is enabled.
iii) For the case in Example 11.5, not using a prescale value amounts to loading the number 0 into the prescale register.

See Example 11.6, which shows the part of Example 11.5 that has been modified by the additional instruction T0PR = 1. We now get a timer frequency of 57 Hz for T0MR0 = 0xFFFF. Without prescaling, the frequency would have been 114 Hz.

Example 11.6

#include <LPC214x.h>

void wait(void);

int main(void)
{
    T0MR0 = 0x0000FFFF;             //Load a number in the match register
    T0MCR = 4;                      //Stop when match occurs
    T0PR = 1;                       //Prescale: T0TC increments every 2 PCLKs
    while (1)
    {
        //Loop body and wait function as in Example 11.5
    }
}
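Sections 11.3.2.3 and 11.3.2.4 together reduce to one formula: each half period lasts (MR0 + 1)(PR + 1) PCLK cycles, so the square wave frequency is PCLK / (2 (MR0 + 1)(PR + 1)). The host-side sketch below follows the text's own count of 256 PCLK periods for T0MR0 = 0xFF and ignores the software overhead of the wait function and the ISR; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

// Square wave frequency when the output pin toggles once per timer match.
// Each half period is assumed to be (mr0 + 1) * (pr + 1) PCLK cycles.
static double square_wave_hz(uint32_t pclk_hz, uint32_t mr0, uint32_t pr)
{
    assert(pclk_hz > 0);
    double half_period_ticks = ((double)mr0 + 1.0) * ((double)pr + 1.0);
    return (double)pclk_hz / (2.0 * half_period_ticks);
}
```

With PCLK = 15 MHz and T0MR0 = 0xFFFF, this gives about 114, 57, 38 and 22.9 Hz for T0PR = 0, 1, 2 and 4. For T0MR0 = 0xFF and T0PR = 0 it gives about 29.3 kHz, slightly below the 29.4 kHz obtained in Section 11.3.2.3 by rounding the period to 34 µs.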

Table 11.4(a) | T0MR0 = 0xFFFF

T0PR   Division Factor   Frequency at P1.16
0      None              114 Hz
1      2                 57 Hz
2      3                 38 Hz
4      5                 22.8 Hz

Table 11.4(b) | T0MR0 = 0xFF

T0PR   Division Factor   Frequency at P1.16
0      None              29.4 kHz
1      2                 14.7 kHz
2      3                 9.8 kHz
3      4                 7.35 kHz

Tables 11.4(a) and 11.4(b) show the frequencies generated by Example 11.6 for match values of 0xFFFF and 0xFF, for different prescaling factors.

11.3.3 | Timer 0 in the Interrupt Mode
Next, we write a program to operate Timer 0 in the interrupt mode. Example 11.7 is such a program. To understand how the interrupt mechanism is incorporated here, a brief introduction to the interrupt structure of the chip is necessary.

The discussion starts on a general note and converges to the use of Timer 0 in the interrupt mode. This will help us to use any other peripheral in the interrupt mode by using the associated registers of the peripheral and its related interrupt registers. You can, for instance, try to write programs for the PWM unit and the UARTs in the interrupt mode.

Example 11.7 is a program for generating a square wave at pin P1.16. The calculation of the frequency at this pin is the same as presented in Section 11.3.2.3.

To make the interrupt mechanism clear, instructions of this program are referred to at every step of the forthcoming discussion.

Example 11.7

#include <LPC214x.h>

unsigned int x = 0;

__irq void Timer0_ISR(void)
{
    x ^= 1;                              //Toggle the output state
    if (x)
        IOSET1 = 1 << 16;                //P1.16 = 1
    else
        IOCLR1 = 1 << 16;                //P1.16 = 0
    T0IR = 0x01;                         //Clear match 0 interrupt
    VICVectAddr = 0x00000000;            //End of interrupt
}

int main(void)
{
    IODIR1 = 0x00FF0000;                 //P1.16 to P1.23 defined as outputs
    IOCLR1 = 0x00FF0000;                 //P1.16 to P1.23 = 0
    T0TCR = 0x00000000;                  //Disable timer counter 0
    T0PR = 0x00000002;                   //Prescaler value
    T0MCR = 0x00000003;                  //Enable interrupt and reset on match
    T0MR0 = 0xFF;                        //MR0 value
    VICVectAddr4 = (unsigned)Timer0_ISR; //Set the timer ISR vector address
    VICVectCntl4 = 0x00000024;           //Set channel
    VICIntEnable = 0x00000010;           //Enable the TIMER0 interrupt
    T0TCR = 0x00000001;                  //Enable timer counter 0
    for (;;);
}

11.3.3.1 | Vectored Interrupt Controller (VIC)
See Figure 11.1, in which the block diagram of the LPC2148 shows a VIC as a peripheral. It is the VIC that manages all the interrupts of the ARM core (IRQs and FIQs) as well as interrupt requests from the peripherals.

Features of VIC
• 32 interrupt request inputs
• 16 vectored IRQ interrupts
• 16 priority levels dynamically assigned to interrupt requests
• Software interrupt generation

The vectored interrupt controller (VIC) takes 32 interrupt request inputs and programmably assigns them into 3 categories: FIQ, vectored IRQ and non-vectored IRQ. The programmable assignment scheme means that priorities of interrupts from various peripherals can be dynamically assigned and adjusted.

Fast interrupt requests (FIQ) have the highest priority. If more than one request is assigned to FIQ, the VIC ORs the requests to produce the FIQ signal to the ARM processor.

Vectored IRQs have the middle priority, but only 16 of the 32 requests can be assigned to this category. Any of the 32 requests can be assigned to any of the 16 vectored IRQ slots, among which slot 0 has the highest priority and slot 15 has the lowest.

Non-vectored IRQs have the lowest priority.
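The slot priority rule for vectored IRQs (slot 0 highest, slot 15 lowest) can be illustrated with a small host-side model. It is a conceptual sketch only: the struct, field and function names are hypothetical, FIQ and non-vectored IRQs are not modelled, and it does not reproduce the exact behaviour of the VIC registers.

```c
#include <assert.h>
#include <stdint.h>

#define VIC_SLOTS 16

// Hypothetical model of one vectored IRQ slot.
typedef struct {
    int      enabled;  // slot in use
    unsigned source;   // interrupt source number, 0..31
} vic_slot;

// Return the highest-priority vectored slot whose source is pending in the
// 32-bit 'pending' mask, or -1 if no vectored slot matches.
static int vic_pick_slot(const vic_slot slots[VIC_SLOTS], uint32_t pending)
{
    for (int i = 0; i < VIC_SLOTS; i++) {  // slot 0 is checked first
        if (!slots[i].enabled)
            continue;
        assert(slots[i].source < 32);
        if (pending & (1u << slots[i].source))
            return i;
    }
    return -1;
}
```

If Timer 0 (source 4) is placed in slot 4 and UART0 (source 6) in slot 0, a simultaneous request from both is serviced through slot 0 first; the source number alone does not decide the priority.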
Figure 11.12 | The VIC's connection to ARM (interrupt requests IRQ1, IRQ2, ..., IRQn enter the VIC, whose IRQ output goes to the ARM core)

The VIC ORs the requests from all the vectored and non-vectored IRQs to produce the IRQ signal to the ARM processor. Figure 11.12 illustrates this.

11.3.3.2 | Interrupt Sources for the VIC
Each peripheral device has one interrupt line connected to the VIC, but may have several internal interrupt flags. Individual interrupt flags may also represent more than one interrupt source. (See Table 11.5, which shows a part of the list of interrupt sources for the VIC.) For example, Timer 0 is assigned the number 4 and has eight interrupt flags, i.e., interrupts are generated for matches on the four match registers and for the four capture registers, but the VIC associates Timer 0 with just one interrupt line.

Table 11.5 | Connection of Interrupt Sources to the VIC

Block      Flag(s)                                   VIC Channel No.
WDT        Watchdog Interrupt (WDINT)                0
-          Reserved for Software Interrupts only     1
ARM Core   Embedded ICE, DbgCommRx                   2
ARM Core   Embedded ICE, DbgCommTx                   3
TIMER 0    Match 0-3 (MR0, MR1, MR2, MR3)            4
           Capture 0-3 (CR0, CR1, CR2, CR3)
TIMER 1    Match 0-3 (MR0, MR1, MR2, MR3)            5
           Capture 0-3 (CR0, CR1, CR2, CR3)
UART0      Rx Line Status (RLS)                      6
           Transmit Holding Register Empty (THRE)
           Rx Data Available (RDA)
           Character Time-out Indicator

Next, we will discuss briefly some of the important registers associated with the VIC.

11.3.3.3 | Interrupt Enable Register (VICIntEnable)
This is a read/write accessible register. It controls which of the 32 interrupt requests and software interrupts are allowed to contribute to the generation of an interrupt.

For example, see Table 11.6, which is part of this register's bit definitions for the interrupting peripherals. The bit position for Timer 0 is 4, and hence bit 4 of this register is to be set if Timer 0 is to be operated in the interrupt mode.

Table 11.6 | Bit Definitions in the VIC Interrupt Enable Register for a Few Interrupt Sources

Bit      7       6       5        4        3          2          1     0
Symbol   UART1   UART0   TIMER1   TIMER0   ARMCore1   ARMCore0   -     WDT
Access   R/W     R/W     R/W      R/W      R/W        R/W        R/W   R/W

Example 11.7 uses the instruction

VICIntEnable = 0x00000010;   //Enable the TIMER0 interrupt by making bit 4 = 1

11.3.3.4 | Vector Control Registers (VICVectCntl0-15)
Only 6 bits of these registers are used: the lower 6 bits, as listed in Table 11.7. Thus, in Example 11.7, where Timer 0 (interrupt number 4) is assigned to slot 4,

VICVectCntl4 = 0x00000024;   //Set channel

Table 11.7 | The Lower 6 Bits of the Vector Control Register

Bits   Function                               As Used in Example 11.7
4:0    The number of the selected interrupt   00100
5      Enable the chosen IRQ slot             1

11.3.3.5 | Vector Address Registers (VICVectAddr0-15)
These are read/write accessible registers. They hold the addresses of the interrupt service routines (ISRs) for the vectored IRQ slots. In Example 11.7,

VICVectAddr4 = (unsigned)Timer0_ISR;   //Set the timer ISR vector address

This instruction indicates that the ISR is at the address 'Timer0_ISR', which is the starting location (address) of the ISR.

11.3.3.6 | Using Timer 0 in the Interrupt Mode
Besides adding the instructions corresponding to the VIC, the timer program for operating in the interrupt mode has some minor differences from its operation in the status check mode.

First of all, T0MCR is to be programmed to generate an interrupt on match. Hence bit 0 of this register is to be set (Section 11.3.2.1). So in Example 11.7,

T0MCR = 3;   //Generate interrupt and reset T0TC on match
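The register values used in Example 11.7 follow directly from Tables 11.5 to 11.7 and the T0MCR bits of Section 11.3.2.1. The host-side sketch below derives them from the interrupt source number and the desired match actions; the helper and macro names are hypothetical, and nothing is written to hardware.

```c
#include <assert.h>
#include <stdint.h>

#define VIC_SRC_TIMER0  4u          // Table 11.5
#define VECTCNTL_ENABLE (1u << 5)   // bit 5 of VICVectCntlN, Table 11.7

// Bit mask to write to VICIntEnable for one interrupt source (Table 11.6).
static uint32_t vic_int_enable_mask(uint32_t source)
{
    assert(source < 32u);
    return 1u << source;
}

// Value for VICVectCntlN: source number in bits 4:0, slot enable in bit 5.
static uint32_t vic_vect_cntl(uint32_t source)
{
    assert(source < 32u);
    return VECTCNTL_ENABLE | source;
}

// T0MCR bits for MR0: I = bit 0, R = bit 1, S = bit 2.
static uint32_t t0mcr_mr0(int interrupt, int reset, int stop)
{
    return (interrupt ? 1u : 0u) | (reset ? 2u : 0u) | (stop ? 4u : 0u);
}
```

These helpers reproduce VICIntEnable = 0x10, VICVectCntl4 = 0x24 and T0MCR = 3 from Example 11.7, as well as T0MCR = 4 (stop on match) from Example 11.5.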

11.3.3.7 | Timer 0 Interrupt Register (T0IR)
This register has bits for each of the matching states of MR0 to MR3. When a timer operates in the interrupt mode and a match occurs, an interrupt is generated, and the corresponding flag bit in T0IR is set. To 'clear' it, a '1' must be written into the same bit of this register; only then will the interrupt flag be reset.

Table 11.8 shows that the bit corresponding to a match on MR0 of Timer 0 is bit 0 of T0IR. In Example 11.7, the instruction used is

T0IR = 0x01;   //Clear match 0 interrupt

Table 11.8 | Bits of T0IR for the Interrupts Generated When 'Match' Occurs

Bit   Symbol          Description
0     MR0 Interrupt   Interrupt flag for match channel 0.
1     MR1 Interrupt   Interrupt flag for match channel 1.
2     MR2 Interrupt   Interrupt flag for match channel 2.
3     MR3 Interrupt   Interrupt flag for match channel 3.

Steps of the Program (Example 11.7)
i) The operation of the timer is quite straightforward. When a match occurs, the Timer 0 interrupt is activated.
ii) Program control goes to the ISR Timer0_ISR, in which pin P1.16 is complemented and the timer interrupt flag is reset.
iii) To signal the end of the interrupt, a dummy write to the VICVectAddr register is also done.
iv) Then, control goes back to the main program.

11.3.4 | The Pulse Width Modulation Unit
Pulse width modulation is basically a scheme in which it is possible to control the period and duty cycle of a square wave. The duty cycle is defined as the ratio of the ON time of the pulse to the period, expressed as a percentage. See Figure 11.13, which shows pulse trains at 25, 50 and 75% duty cycles.

When MCUs have dedicated PWM units, they have registers which can be programmed for the required frequency of the pulse train as well as the duty cycle. For this ARM7 MCU, the PWM unit works similarly to a timer unit. Also, if its PWM mode is not enabled, it works only as a timer.

There is a free-running timer register (PWMTC), which increments (with prescaler) when started, and whose count is continuously compared with the values in seven 32-bit match registers (MR0 to MR6). One match register (MR0) is dedicated to deciding the frequency of the pulse train, by resetting the count upon match. The other match registers can be used for fixing the duty cycle. Thus there are six PWM output pins from which PWM signals can be simultaneously generated.

11.3.4.1 | Single Edge Controlled PWM
Here only the falling edge of a pulse train is controlled. Two match registers can be used to provide a single edge controlled PWM output. One match register (PWMMR0) controls the PWM cycle rate, by resetting the count upon match. The other match register controls the PWM edge position, and thus the duty cycle.

Multiple PWMs can be obtained by using more than one match register. For example, refer to Figure 11.14. Here, PWMMR0 is used for specifying T, and any of MR1 to MR6 can be used for specifying P. Note that for all pulse trains generated at different PWM output pins, the pulse repetition rate is the same, as it is decided by the number in match register MR0, which is common to all the pulse trains.

Rules for single edge controlled PWM outputs:
i) All single edge controlled PWM outputs go high at the beginning of a PWM cycle.
ii) Each PWM output goes low when its match value (in MR1 to MR6) is reached. If no match occurs (i.e., the match value is greater than the PWM rate), the PWM output remains continuously high.
iii) When a match occurs, actions can be triggered automatically. The possible actions are to generate an interrupt, reset the PWM timer counter, or stop the timer. Actions are controlled by the settings in the PWMMCR register.

Note: In double edge controlled PWM, both the rising and falling edges of the PWM waveform are controlled by the match registers. We limit the discussion here to just the single edge controlled PWM.
[Figure 11.13 | Pulse trains at different duty cycles (25%, 50% and 75%)]
[Figure 11.14 | A pulse train showing the period T and ON time P]

Table 11.11 | List of the PWM Output Channels and Corresponding Port Pins
PWM No   Channel No   Output Pin
PWM1     1            P0.0
PWM2     2            P0.7
PWM3     3            P0.1
PWM4     4            P0.8
PWM5     5            P0.21
PWM6     6            P0.9

11.3.4.2 | Calculating the Frequency
MR0 is used to decide the frequency of the pulse train. As calculated for the timer (Section 11.3.2.3), PCLK (15 MHz) has a period of approximately 0.067 µsecs. If PWMMR0 is loaded with a number N, the timer counts from 0 to N. A match occurs after 0.067 µsec × (N+1), and this is the period T of the pulse train generated. Refer to Table 11.9 for a sample set of calculations.

Table 11.9 | Calculation of the Period of the Pulse for Different Values of PWMMR0
N in PWMMR0   T (µsecs)                  Frequency (f) = 1/T
0xFF          256 × 0.067 = 17.152       58.3 KHz
0x5E          95 × 0.067 = 6.365         157 KHz
0xFFFF        65,536 × 0.067 = 4390.9    227 Hz

11.3.4.3 | Calculating the Duty Cycle
The duty cycle is the ratio of the ON period (P) to the total period T. As per Figure 11.14, it is P/T, expressed as a percentage.
There are six match registers for deciding the pulse ON time. We will consider the simplest case of single edge controlled PWM. Let's use the match register PWMMR3. In this case, when the timer count value matches the value in PWMMR3, the PWM output pin goes low. This ends the high period of the pulse. See Table 11.10 for sample calculations based on the value of MR3, for T = 6.365 µsecs.

Table 11.10 | Calculating the Duty Cycle for Different Values of PWMMR3
N in PWMMR0   Value in PWMMR3   P (µsecs)   Duty cycle = P/T
0x5E          0xF               1.004       15.5%
0x5E          0x2E              3.15        49.4%

Example 11.8
Calculate the values to be given in PWMMR0 and PWMMR3 to get a pulse train of period 5 ms and duty cycle of 25%.
Solution
5000 µsecs = (N+1) × 0.067 µsecs
(N+1) = 74,626 = 0x12382
The number to be loaded in PWMMR0 is 1 less than this, i.e., 74,625 = 0x12381.
A 25% duty cycle corresponds to 1.25 msecs:
(N1+1) × 0.067 µsecs = 1250 µsecs
Calculating, N1 = 18,655.
The number to be loaded in PWMMR3 = 18,655 = 0x48DF.

11.3.4.4 | The PWM Output Pins
Corresponding to the six match registers, there are six PWM output pins, and they are called the PWM channels. The port pins for the channels are given in Table 11.11; the PWMPCR bits that enable them are described under PWMPCR below.

11.3.4.5 | Control Registers of the PWM Unit
There are a number of registers for this application, and the usage of each of them can be found in the manual of the chip. Here, we discuss only a few.

PWMTCR This is the PWM Timer Control Register. This is an 8-bit register. Only the lower 4 bits of this register need to be used (Figure 11.15).
Bit 0: CE (Counter Enable). When '1', the PWM timer counter and Prescale counter are enabled.
Bit 1: CR (Counter Reset). When '1', the PWM timer counter and Prescale counter are reset on the next positive going edge of PCLK. They remain reset until this bit is returned to '0'.
Bit 2: R (Reserved).
Bit 3: PE (PWM Enable). When '1', the PWM mode is enabled. Otherwise the PWM unit acts as just a timer.
In Example 11.8, PWMTCR = 0x9 (CE and PE set).

[Figure 11.15 | Important bits of PWMTCR: R (bits 7 to 4), PE (bit 3), R (bit 2), CR (bit 1), CE (bit 0)]

PWMPCR This is the PWM Control Register. This is a 16-bit register and is used to enable and select the type of each PWM channel.
This register enables or disables the six PWM outputs, and also chooses between double and single edge control. Bits 0, 1, 7, 8 and 15 are unused. Table 11.12 shows the state of the bits of PWMPCR for choosing between single and double edge control.

Table 11.12 | Choosing Between Single and Double Edge Control Using Bits of PWMPCR
Bit No of PWMPCR   When Bit = 0                   When Bit = 1
PWMPCR.2           Single edge control for PWM2   Double edge control for PWM2
PWMPCR.3           Single edge control for PWM3   Double edge control for PWM3
PWMPCR.4           Single edge control for PWM4   Double edge control for PWM4
PWMPCR.5           Single edge control for PWM5   Double edge control for PWM5
PWMPCR.6           Single edge control for PWM6   Double edge control for PWM6

[Figure 11.17 | The output waveform obtained from Example 11.10]

Next, let's consider the generation of more than one PWM pulse. Example 11.11 is a program which gets PWM outputs from channels 1, 2 and 3. Note that all PWM outputs will occur at the same repetition rate.

Example 11.11
#include <LPC214x.h>

void PWM_Init(void)
{
    PINSEL0 |= 0x0000800A;   // Enable PWM1, PWM2 and PWM3 outputs
    PWMPR = 0x00000001;      // Prescale value = 1
    PWMPCR = 0x00000E00;     // Pins of PWM1, PWM2 and PWM3 enabled
    PWMMR0 = 0xFF;           // Set PWM frequency
    PWMTCR = 0x00000009;     // Enable the counter and PWM mode
    PWMMCR = 0x00000002;     // Reset on PWMMR0
}

int main()
{
    PWM_Init();
    while (1)
    {
        PWMMR1 = 0x30;       // Pulse ON time at PWM1 channel
        PWMMR2 = 0x50;       // Pulse ON time at PWM2 channel
        PWMMR3 = 0x23;       // Pulse ON time at PWM3 channel
        PWMLER = 0xE;        // Latch the new MR1 to MR3 values
    }
}

Figure 11.18 shows the output pins on which the PWM signals are obtained, and Figure 11.19 shows the PWM output waveforms. Since the prescale register holds 1, the timer counter advances once every two PCLK cycles, so the basic time T (calculated with a count of 0xFF) is multiplied by 2 to get T = 34.3 µsecs. The pulse ON times are also multiplied by 2 to get the values shown in Table 11.14.

[Figure 11.18 | Output pins for the PWM channels: P0.0 (PWM1), P0.7 (PWM2), P0.1 (PWM3)]
[Figure 11.19 | Output waveforms from three PWM channels, with the common period T and the ON time of each channel]

Table 11.14 | Values of the Duty Cycles Obtained from Example 11.11 (T = 34.3 µsecs)
Channel   Output Pin   P (ON Time) µsecs    Duty Cycle
PWM1      P0.0         3.38 × 2 = 6.76      19.7%
PWM2      P0.7         5.42 × 2 = 10.64     31.02%
PWM3      P0.1         2.41 × 2 = 4.42      12.8%

11.3.5 | The UART
This chip has two UARTs, namely UART0 and UART1. To understand their operation, first observe the simplified block diagram of the UARTs of the chip (Figure 11.20). For any of the registers referred to herein, you must add the prefix U0 or U1 depending on which unit (UART0 or UART1) is being used. The three important units are the transmitter, the receiver and the baud rate generator blocks.

[Figure 11.20 | Simplified block diagram of a UART on the chip: transmitter (TX, clocked by TCLK, output TXD), receiver (RX, clocked by RCLK, input RXD) and baud rate generator (BRG, fed by PCLK), connected over a bus to the APB interface]

11.3.5.1 | The Transmitter
Figure 11.20 shows two registers in this block. They are the Transmitter Holding Register (THR) and the Transmitter Shift Register (TSR). When a data byte arrives in the THR (from the CPU through the bus), it is 'framed' (by adding start and stop bits), transferred to the TSR and sent out through the TxD pin one bit at a time, by clocking the TSR at the baud rate decided by the transmitter clock TCLK.

11.3.5.2 | The Receiver
There are two registers in the receiver block. They are the Receiver Shift Register (RSR) and the Receiver Buffer Register (RBR). The data received serially through the RxD line (at the baud rate decided by RCLK) is moved bit by bit into the RSR, and then transferred to the RBR after de-framing. From the RBR, it is copied to the CPU registers through the bus.

11.3.5.3 | The Baud Rate Generator (BRG)
This takes PCLK as input, and generates the baud rates for the transmitter and receiver, by using the numbers in the registers DLL and DLM, which function as dividers.

11.3.5.4 | Registers of UART0
Let us use UART0 for transferring a character string from the LPC2148 board to a PC, using the 'HyperTerminal', at a baud rate of 9600. Example 11.12 transmits the string 'sushmita LPC2148'. The string is moved one character at a time to the THR. Between the loading of one character and the next one, a delay is given.
To understand the program fully, it may be necessary to understand the registers it uses, so refer to the description of each register, given after the program, while going through it.

Example 11.12
#include "LPC214x.h"

void init(void);
void delay_ms(int);

int main()
{
    int i;
    unsigned char c[] = "sushmita LPC2148 \n";

    init();
    for (;;)
    {
        for (i = 0; i <= 17; i++)
        {
            U0THR = c[i];              // Send data by writing to U0THR
            while (!(U0LSR & 0x20));   // Wait until bit 5 (THRE) of U0LSR is set
            delay_ms(250);
        }
    }
}

void init()
{
    PINSEL0 = 0x05;    // Select TxD (P0.0) and RxD (P0.1) of UART0
    U0FCR = 0x07;      // Enable and clear FIFOs
    U0LCR = 0x83;      // 8-N-1, enable divisors
    U0DLL = 0x62;      // 9600 baud
    U0DLM = 0x00;
    U0LCR = 0x03;      // 8-N-1, disable divisors
    ...
}


void delay_ms(int x)
{
    int a, b;
    for (a = 0; a < x; a++)
    {
        for (b = 0; b < 3000; b++);   // Software delay loop
    }
}

Pinselect Register (PINSEL0)
This register has been discussed earlier. In this context, only the pin selection for the TxD and RxD pins of UART0 is relevant. Table 11.15 shows that P0.0 and P0.1 are the relevant pins; writing PINSEL0 = 0x5 selects P0.0 as TxD and P0.1 as RxD.

Table 11.15 | Relevant Bits of the PINSEL0 Register
Bits   Port Pin   Value   Function
1:0    P0.0       00      GPIO
                  01      TxD (UART0)
                  10      PWM1
                  11      Reserved
3:2    P0.1       00      GPIO
                  01      RxD (UART0)
                  10      PWM3
                  11      EINT0

UART0 Transmit Holding Register (U0THR)
This is an 8-bit register and part of the transmit buffer; in fact, it is the topmost byte of this buffer, and new characters are loaded into this register for transmission. The data to be transmitted is written into this 'write only' register. In Example 11.12, the characters to be transmitted are loaded into this register one byte at a time, with a delay.

UART0 Divisor Latch Registers (U0DLL and U0DLM)
The UART0 Divisor Latch is part of the UART0 Fractional Baud Rate Generator and holds the value used to divide the clock supplied by the fractional prescaler in order to produce the baud rate clock, which must be 16× the desired baud rate. The U0DLL and U0DLM registers together form a 16-bit divisor, where U0DLL contains the lower 8 bits of the divisor and U0DLM contains the higher 8 bits.
Example 11.12 uses U0DLL = 0x62 and U0DLM = 0x00 to get a baud rate of 9600.
The calculation for these values is as follows:

$$\text{UART0 baud rate} = \frac{PCLK \times MulVal}{16 \times (256 \times U0DLM + U0DLL) \times (MulVal + DivAddVal)}$$

In our case, PCLK = 15 MHz, and the fractional divider is left at its reset setting (MulVal = 1, DivAddVal = 0). With U0DLM = 0 and U0DLL = 0x62 (98 in decimal notation), the formula gives 15,000,000 / (16 × 98) = 9566.33, i.e., 9600 (approximately).

UART0 FIFO Control Register (U0FCR)
This is an 8-bit register (Figure 11.21), in which the important bits are as follows.
Bit 0: E: This bit must be set to enable the Tx and Rx FIFOs.
Bit 1: Rx FIFO Reset: This must be set to clear all bytes in the UART0 Rx FIFO and reset the pointer logic. This bit is self-clearing.
Bit 2: Tx FIFO Reset: This must be set to clear all bytes in the UART0 Tx FIFO and reset the pointer logic. This bit is self-clearing.
Bits 7 and 6: Rx trigger level: These two bits determine how many characters must be received in the receiver FIFO before an interrupt is activated. We have chosen 00.

[Figure 11.21 | Bits of U0FCR: RxT (bits 7 and 6), Reserved (bits 3 to 5), TxR (bit 2), RxR (bit 1), E (bit 0)]

UART0 Line Control Register (U0LCR)
The U0LCR is an 8-bit register which determines the format of the data character that is to be transmitted or received. The bits actively used here are:
i) Bits 1:0: These two bits have been set to '11' to select an 8-bit character length.
ii) Bit 2: This is made '0' to select one stop bit.
iii) Bit 7: This is the Divisor Latch Access Bit (DLAB) and is set to enable the use of the divisor latch.
iv) Other bits of this register pertain to parity and break control. These have been disabled by making these bits '0'.
Thus we have U0LCR = 1000 0011 = 0x83. After the divisor values are loaded, the program writes U0LCR = 0x03 to clear DLAB again, so that U0THR becomes accessible.

UART0 Line Status Register (U0LSR)
The U0LSR is a read-only register that provides status information regarding the UART0 TX and RX blocks.
In Example 11.12, only bit 5 of this register is used. Bit 5 shows a '1' when the transmitter holding register (U0THR) is empty. Only if the register is empty can the next byte be sent for transmission. To confirm this, U0LSR is ANDed with 0x20, and the next character is sent only after this status bit is confirmed to be high.


With this, we conclude our discussion of serial communication. For more details of the registers used and the interrupt mode of transmission, refer to the user manual of the LPC2148.

11.3.6 | The SSP Unit
This unit performs serial communication using the SPI protocol (refer to Section 5.2.2). Appendix I contains a program which interfaces an SD card to the LPC2148 using the SSP unit. It may be necessary to refer to the manual of the chip, and gain an understanding of the registers of the SSP unit, to understand the interfacing program clearly. Section 19.2 discusses a complete product developed using the LPC2148 MCU.

With this, our discussion of a typical ARM7 MCU ends. Note that the one that we have used is only one among the numerous versions of ARM7 MCUs available in the market. But a basic understanding of this chip, its peripherals and its programming will help in understanding any other ARM7 MCU chip.
Next we will take a look at typical ARM9 and Cortex MCUs. Going beyond the periphery of these is beyond the scope of this book. We will look at them from a block diagram point of view, just to observe their complexity and power.

11.4 | ARM9
The ARM9 core is a more advanced member of the ARM family (compared to ARM7). It has a 5-stage pipeline and operates at approximately double the frequency range of ARM7. Many ARM9 cores have DSP instructions and thus are 'Enhanced' ARM9E processors. Because the core is so powerful, it is used for more complex operations, and an MCU based on ARM9 typically has more peripherals than an ARM7 based MCU.
Here we will take a look at a particular ARM9 board developed by NXP; it contains an MCU of the LPC29xx series. The user manual describes the features of this board in this way:

'The LPC29xx combine a 125 MHz ARM968E-S CPU core, Full Speed USB 2.0 host and device (LPC2927/29 only), CAN and LIN, 56 KB SRAM, up to 768 KB flash memory, external memory interface, three 10-bit ADCs, and multiple serial and parallel interfaces in a single chip'.

It is obvious that this is a very powerful chip with many more peripherals than the ARM7 MCU that we have just studied. Figure 11.22 shows the internal block diagram of the chip, in which you can observe its advanced features and peripheral structure.

[Figure 11.22 | Internal block diagram of the LPC2917/19/01: JTAG interface, vectored interrupt controller, GPDMA controller, AHB to DTL and AHB to APB bridges, PWM0/1/2/3, 3.3 V ADC1/2, 5 V ADC0, event router, chip feature ID, general purpose I/O ports 0/1/2/3, quadrature encoder and WDT]

11.5 | ARM Cortex-M3
To complete our discussion, let us look at a Cortex-based MCU as well. The LPC17xx series is an MCU series with ARM Cortex-M3 as its core. The following paragraphs, quoted from the user manual, illustrate its main features, which are also evident from the block diagram of Figure 11.23.

'The LPC17xx is an ARM Cortex-M3 based microcontroller for embedded applications requiring a high level of integration and low power dissipation. The ARM Cortex-M3 is a next generation core that offers system enhancements such as modernized debug features and a higher level of support block integration.
High speed versions (LPC1769 and LPC1759) operate at up to a 120 MHz CPU frequency. Other versions operate at up to a 100 MHz CPU frequency. The ARM Cortex-M3 CPU incorporates a 3-stage pipeline and uses a Harvard architecture with separate local instruction and data buses as well as a third bus for peripherals. It also includes an internal prefetch unit that supports speculative branches.
The peripheral complement of the LPC17xx includes up to 512 kB of flash memory, up to 64 kB of data memory, Ethernet MAC, a USB interface that can be configured as either Host, Device, or OTG, 8 channel general purpose DMA controller, 4 UARTs, 2 CAN channels, 2 SSP controllers, SPI interface, 3 I2C interfaces, 2-input plus 2-output I2S interface, 8 channel 12-bit ADC, 10-bit DAC, motor control PWM, Quadrature Encoder interface, 4 general purpose timers, 6-output general purpose PWM, ultra-low power RTC with separate battery supply, and up to 70 general purpose I/O pins.'

[Figure 11.23 | Internal block diagram of the LPC17xx series of ARM Cortex-M3 MCUs (blocks include Ethernet PHY, USB, APB slave groups 0 and 1, and the RTC power domain)]

Conclusion
With this, we conclude our discussion of ARM peripheral interfacing. The programs in the chapter have been tested and confirmed to be working as per the specifications for which they were designed (i.e. frequency, pulse width, etc.). Only a few peripherals of ARM7 have been discussed, but the methodology used for those blocks is expected to help in understanding the rest of them. The important point is that the registers of the unit to be used must be understood with a high degree of clarity. For ARM9 and Cortex MCUs, only the degree of complexity has been shown; programming them can be done on similar lines, as has been done for ARM7.

KEY POINTS OF THIS CHAPTER
• The LPC2148 MCU belongs to the series of ARM7 MCUs of NXP, and is very popular.
• It operates with a system clock of 60 MHz, and has a large set of peripherals.
• It has three internal buses operating at different frequencies, conforming to AMBA specifications.
• There is a 'memory accelerator module' to allow fast access to program lines.
• It has two ports, the pins of which act as GPIO pins.
• Each pin is multifunctional, and can be configured for a specific function by using the bits of a 'Pinselect Register'.
• There are two timers which can be used as free running interval timers or as capture timers.
• The system clock is named CCLK, and is divided to get a lower frequency peripheral clock called PCLK.
• The output frequencies of the timers and PWMs used in the programs in the chapter are based on a PCLK of 15 MHz.
• The PWM unit has 6 output pins from which 6 PWM pulse trains can be obtained simultaneously.

• There are two serial communication units named UART0 and UART1.
• Using the SSP unit, an SD card can be interfaced to the chip.
• ARM9 and Cortex MCUs are much more complex and powerful, and have more peripherals.

QUESTIONS
1. Name five peripherals in the LPC2148 MCU.
2. What is the difference between PCLK and CCLK?
3. What is the necessity for having the MAM module? How does it function?
4. Distinguish between the power-down and idle modes of this MCU.
5. What is meant by the term 'AMBA'?
6. Differentiate between the different internal buses in terms of speed and function.
7. Look at the memory map and find out the extent of memory locations for static RAM and flash ROM.
8. For a GPIO pin to be made to act as an ON/OFF switch, which registers are to be used? Give an example to illustrate the use of these registers.
9. How does the prescaler in a timer unit function?
10. Distinguish between single and double edge PWM.

EXERCISES
Write programs to obtain the following waveforms:
1. Generate a symmetrical square wave at four pins of Port 1, using software delay.
2. Generate an asymmetric square wave at four pins of Port 0, using software delay.
3. Using Timer 1, obtain a symmetric square wave of frequency 10 KHz at one pin of Port 1, and another square wave of frequency 90 KHz at one pin of Port 0 using Timer 0. Both waveforms should be simultaneously present.
4. Using Timer 0, generate an asymmetric square waveform at four pins of Port 1. The square wave should have an ON time of 0.1 msec and an OFF time of 0.35 msec.
5. Generate PWMs at the six output pins of the PWM unit, with duty cycles of 10, 20, 30, 40, 50 and 70%.

In this chapter, you will learn
• The history and application range of PSoC devices
• The distinct and special features of PSoC
• The differences between PSoC1, PSoC3 and PSoC5
• The internal architecture of PSoC1
• The GIB of PSoC Designer
• The digital blocks of PSoC1 and PSoC5
• The working principle of Switched Capacitor circuits
• The finer details of the analog blocks
• How to do the interconnections on the GIB for digital and analog blocks
• The programming of PSoC1
• The enhancements available for PSoC3

Introduction
The term SoC was mentioned in Chapter 1. So we know that an MCU with a large number of peripherals is called an SoC, for 'System on Chip'. Each of the peripherals on an SoC is usually programmable, and so the term PSoC can have a general meaning. But in this chapter, we discuss a very specific product line of Cypress Semiconductor designated as PSoC. We discuss the special features of Cypress's PSoC, which has become very popular in the embedded systems world and has found many applications. PSoC is a family of embedded processors with a simple 8-bit M8C core in PSoC1, a more sophisticated 8-bit 8051 core in PSoC3, and an advanced 32-bit ARM core in PSoC5.
In this chapter, we will concentrate more on the PSoC1 architecture and usage. The aim is to introduce the reader to this series of MCUs, which are versatile, easy to understand and use, and have many features that other MCUs do not possess. The best way to learn is to get a PSoC development kit and do a project based on one of the chips belonging to this family. This chapter introduces you to PSoC and analyses why PSoC is a good point to 'take off' into the embedded design world.

Chapter-opening image: A PSoC5 development board.
