ARM Processor Unit 5
10 | ARM: The World's Most Popular 32-bit Embedded Processor (Part I)
Introduction
This chapter introduces ARM, the very popular 32-bit processor, with a short account of its history, followed by details of where it stands in the embedded processor market now. ARM stands for 'Advanced RISC Machine'. The name explicitly states its characteristic of being a RISC processor. The first ARM processor was originally named the 'Acorn RISC Machine', as it was developed by Acorn Computers Ltd., Cambridge, England, in 1985.
10.1 | History of the ARM Processor
In 1985, Acorn Computers Ltd. was in search of a new processor to launch in the desktop market. While its engineers were contemplating various design options, they came across a few papers published by a group of students at the University of California, Berkeley (USA), outlining a very simple processor design based on RISC principles. The computer architects of Acorn Computers found the design very attractive and decided to build
Chapter-opening image: An ARM7 LPC2148 board.
336 EMBEDDED SYSTEMS ARM- THE WORLD'S MOST POPULAR 32 BIT EMBEDDED PROCESSOR 337
a new processor using some of these principles. This led to the development of ARM1, which had fewer than 25,000 transistors and operated at 6 MHz.

This was followed by ARM2 (in 1987) with 30,000 transistors. Compared with an Intel or Motorola processor of that time having 70,000 transistors, this was a beauty in terms of its smaller die size and lower power dissipation. It was thus the first ARM processor to be produced in bulk. It had a 32-bit data bus, a 26-bit address space and sixteen 32-bit registers, and was clocked at 8 to 12 MHz. It dissipated much less power, and performed much better, than Intel's 80286, which came up around the same time (but focused on the desktop market).

ARM3, ARM4 and ARM5 were also designed, but never produced, because around this time, in 1990, Acorn Computers teamed up with Apple Computer and VLSI Technology to form a company named Advanced RISC Machines Ltd. This company continued with ARM6, ARM7, etc. The latter was the processor which became very popular and led to ARM being used in products such as mobile phones, PDAs, iPods, computer hard disks, etc. After this, ARM made rapid strides in the 32-bit embedded market, accounting for a very high percentage of applications in the high-end embedded systems market.

As of 2011, ARM processors account for approximately 90 per cent of all embedded 32-bit RISC processors. ARM processors are used extensively in consumer electronics, including PDAs, mobile phones, digital media and music players, handheld game consoles, calculators and computer peripherals such as hard drives and routers.

The subsequent and more advanced processors of the ARM family (ARM9, ARM10, ARM11, Cortex) have been built on the success of the ARM7 processor, which is still the most popular and widely used member of the ARM family.

Over the years, many advanced features have been added to the ARM processor, but the core has remained more or less the same.

10.1.1 | The ARM Core

What is meant by the 'core'? The core is the 'processing unit' or the 'computing engine' which has all the computing power, and this aspect is decided by the architecture, which represents the basic design of the processor.

One special and unique feature of ARM as a company is that it designs the core and licenses this IP (Intellectual Property) to others. This simply means that the company does not 'fabricate' the chip, but sells only the design. This design is taken by the licensee, who may or may not add more features (usually peripherals) to the design. Sometimes the buyer can also modify the basic design to a minor extent. The buyer company fabricates the design and sells it or uses it in its own products.

There are various ways in which ARM sells its IP. It could be in the form of a soft IP. In this case, the design is sold as RTL (VHDL/Verilog code), and this allows the buyer to modify the design to a certain extent. If the design is sold as a hard IP, the buyer gets only the layout or the netlist (the connection of nets or electronic wires). Thus, the buyer can add only peripherals to the 'black box' design he has purchased.

We can thus understand that ARM the company does not 'fabricate' ARM chips. (In contrast, Intel fabricates its processors and sells them as chips.) It is because of this that we have ARM chips and boards from various companies: Samsung, Philips, Atmel, Texas Instruments, ST Microelectronics and so on; the list is very long.

10.1.2 | The ARM Microcontroller

ARM has been designated as a 'microprocessor' and indeed it is a processor which has very high computing capabilities. It has a rich set of features for handling complex computations.

However, for use as an embedded processor, it needs many more capabilities, and these come in the form of on-chip peripherals. Peripherals are added to the ARM core, and thus it becomes a 'microcontroller' or an MCU (microcontroller unit), rather than an MPU (microprocessor unit). Figure 10.1 shows the ARM MCU. The number and kind of peripherals added depend on the requirements of the buyer of the IP. It is because of this that ARM processors supplied by different companies have varying numbers of peripherals. To support more peripherals, the core has to be more powerful. That is why we generally find more peripherals around an ARM9 core than around an ARM7 core. But as a rule, users have to spell out their requirements for the peripherals of an MCU.

When a chip has the core and the necessary peripherals to perform as a system, it is called a System on Chip (SoC), and the term 'ARM SoC' is very commonly used; understandably, such a chip has some version of the ARM core and a large set of peripherals.

[Figure 10.1 | ARM SoC core with peripherals: the ARM core (developed by ARM) connected over an internal bus to the peripherals of the chip (developed by licensees and chip manufacturers)]

10.1.3 | RISC vs CISC

The differences between these two schools of thought in computer architecture have been discussed in Section 0.3.

But to put the idea in a proper perspective in the context of ARM, some specific features of RISC are listed here. These apply to most of the instructions of ARM, but not necessarily to all.

i) Instructions are of the same size, that is, 32 bits
ii) Instructions are executed in one cycle
iii) Only the load and store instructions access memory
[Figure 10.2 | Two ARM cores: ARM7TDMI (ARM-7 core, ARMv4T, Thumb, EmbeddedICE-RT, ETM7 interface) and ARM920T (ARM-9 core, ARMv4T, Thumb, MMU, dual 16K caches, EmbeddedICE, ETM9 interface, ASB interface)]

Subsequently, it was decided to do away with these complex naming schemes, as the features corresponding to TDMI were expected to be mandatorily available in all ARM processors. But some numbers were added to imply the presence of memory interfaces, cache, tightly coupled memory and so on. For example, ARM processors with cache and MMU are now given the suffix 26 or 36, whereas processors with MPUs are suffixed with 46. Over the years, this type of naming convention has also changed. Refer to Table 10.2 for some more variants of ARM.

Table 10.2 | Variants of the ARM Processor

Processor Name    Architecture Version   Memory Management Features           Other Features
ARM7TDMI          ARMv4T                 -                                    -
ARM7TDMI-S        ARMv4T                 -                                    -
ARM7EJ-S          ARMv5E                 -                                    DSP, Jazelle
ARM920T           ARMv4T                 MMU                                  -
ARM922T           ARMv4T                 MMU                                  -
ARM926EJ-S        ARMv5E                 MMU                                  DSP, Jazelle
ARM946E-S         ARMv5E                 MPU                                  DSP
ARM966E-S         ARMv5E                 -                                    DSP
ARM968E-S         ARMv5E                 -                                    DMA, DSP
ARM966HS          ARMv5E                 MPU (optional)                       DSP
ARM1020E          ARMv5E                 MMU                                  DSP
ARM1022E          ARMv5E                 MMU                                  DSP
ARM1026EJ-S       ARMv5E                 MMU or MPU                           DSP, Jazelle
ARM1136J(F)-S     ARMv6                  MMU                                  DSP, Jazelle
ARM1176JZ(F)-S    ARMv6                  MMU + TrustZone                      DSP, Jazelle
ARM11 MPCore      ARMv6                  MMU + multiprocessor cache support   DSP, Jazelle
ARM1156T2(F)-S    ARMv6                  MPU                                  DSP
Cortex-M0         ARMv6-M                -                                    NVIC
Cortex-M1         ARMv6-M                FPGA TCM interface                   NVIC
Cortex-M3         ARMv7-M                MPU (optional)                       NVIC

(Courtesy: The Definitive Guide to ARM Cortex-M3 by Joseph Yiu, Newnes Publications)
ii) The R profile: This profile, which has the ARMv7-R architecture, has been designed for high-end applications which require real-time capabilities. Typical applications are automatic braking systems and other safety-critical applications.
iii) The M profile: This profile, which has the ARMv7-M architecture, has been designed for deeply embedded microcontroller-type systems. It is meant for industrial control applications where a large number of peripherals may have to be handled and controlled.

Now that we have surveyed the range of ARM processors, let's discuss the features which have made ARM a very popular processor in the high-end embedded market.

i) Data bus width: The processor has a 32-bit data bus, which means that it can read and write 32 bits in one cycle. For high-end applications, a wide data bus corresponds to a high data bandwidth and is very important. When ARM first made its entry into the field, there were very few embedded processors which had such a wide bus.

ii) Computational capability: The instruction set of ARM has been cleverly designed to facilitate very good computational capability. Many unique and new methods of fast computation, which do not need extensive hardware, are used. The design of the processor used the RISC approach, but over the years, this philosophy has been diluted to enable the addition of specialized hardware for computationally intensive tasks. In essence, ARM is a RISC processor which has a few CISC features as well.

iii) Low power: In the embedded field, power saving is very important, because a large number of devices operate on battery power. Designing low-power processor cores is thus a matter of high priority. How is a processor designed to have low power consumption? Embedded processors operate at low clock frequencies compared to desktop processors. While 3.3 GHz is commonly used in the desktop processor field, ARM operates at relatively low frequencies, from 60 MHz to at most 1 GHz. The other techniques in low-power design are explained in Section 2.4.

iv) Pipelining: Pipelining is a fundamental idea in computer architecture for increasing the speed of operation. The idea is to get many activities done in tandem, by dividing the whole instruction processing stage into sub-stages. The basic task that any processor does is 'fetch, decode and execute'. In the simplest form of pipelining (3-stage), all three stages are active all the time. While the first stage is fetching an instruction, the next stage, that is, the decode stage, is busy decoding the previously fetched instruction, and the execute stage is executing the instruction which had been previously decoded. Thus at any time, there are three instructions simultaneously present in the pipeline, at different levels of processing.

If the processor clock frequency is f, the clock period (T) of the processor is divided by 3 to give a time of T/3 for each of the stages. In each sub-cycle (of period T/3), one instruction is obtained as throughput, which is essentially 3 instructions in the period T. It means that, in the ideal case, the processing speed is multiplied by 3.

[Figure 10.3a | A three-stage pipeline: Fetch, Decode, Execute]

Cycle      1       2       3        4        5
INSTR 1    Fetch   Decode  Execute
INSTR 2            Fetch   Decode   Execute
INSTR 3                    Fetch    Decode   Execute

Figure 10.3b | The three-stage pipeline with 3 instructions in operation

[Figure 10.4 | A five-stage pipeline: Fetch, Decode, Execute, Buffer, Write]

Figure 10.3a shows a three-stage pipeline, while Figure 10.3b shows three instructions in the pipeline. Each instruction needs three sub-cycles to come out of the pipeline but, once the pipeline is full, one instruction completes in every sub-cycle, which translates to a throughput of three instructions per clock period (T).

ARM7 has a 3-stage pipeline, while ARM9 has a 5-stage pipeline with more finely divided stages (Figure 10.4), which are 'fetch, decode, execute, buffer data and write back'. As a general rule, more advanced processors have more pipeline stages. For example, ARM10 has 6 stages.

Pipelining is a great idea, but it has the drawback that when a branch instruction appears, the instructions following it no longer need to be executed in the normal sequence. So the instructions in the earlier stages have to be discarded; we say that the pipeline has to be flushed. This causes a loss of speed, and the penalty is higher for pipelines with more stages.

v) Multiple register instructions: Since ARM is a RISC processor, it has instructions which process data held in registers only. This simply means that data processing instructions do not use addressing modes in which one operand is in memory. But there are instructions which access memory and load data into multiple registers; likewise, the contents of multiple registers can be stored in memory with a single instruction.

vi) DSP enhancements: The processor has RISC as its basic policy, but the more advanced members of the family have DSP (Digital Signal Processing) instructions as an enhanced feature. This is where ARM departs from its RISC philosophy, but it is necessary for surviving in the embedded market. These DSP enhancements are signified by an 'E' in the name, as in the ARMv5TE and ARMv5TEJ architectures.
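The pipeline timing described in item iv) can be sketched in a few lines of Python. This is only an illustrative model, not ARM's implementation: it treats each pipeline sub-cycle as one time step, assumes every stage takes exactly one step, and assumes (a simplification introduced here) that each flush costs one pipeline refill of `num_stages - 1` steps. The function names are hypothetical.

```python
STAGES = ("Fetch", "Decode", "Execute")  # the three stages of Figure 10.3a


def pipeline_schedule(num_instructions, stages=STAGES):
    """Return {instruction number: {cycle: stage}} for an ideal pipeline.

    Assumes one stage per cycle and no stalls or flushes.
    """
    schedule = {}
    for i in range(num_instructions):
        # Instruction i + 1 enters the first stage in cycle i + 1.
        schedule[i + 1] = {i + 1 + s: name for s, name in enumerate(stages)}
    return schedule


def total_cycles(num_instructions, num_stages=len(STAGES), flushes=0):
    """Cycles needed to complete all instructions.

    Assumption (illustrative only): filling the pipeline costs
    num_stages - 1 cycles, and every flush costs one more refill.
    """
    if num_instructions == 0:
        return 0
    return num_instructions + (num_stages - 1) * (1 + flushes)


if __name__ == "__main__":
    for instr, timing in pipeline_schedule(3).items():
        print(f"INSTR {instr}: {timing}")
    print("3-stage, no flush :", total_cycles(3))
    print("3-stage, 10 flushes:", total_cycles(100, 3, flushes=10))
    print("5-stage, 10 flushes:", total_cycles(100, 5, flushes=10))
```

Under these assumptions, the same number of flushes costs more cycles on the deeper pipeline, which is the penalty mentioned for pipelines with more stages.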
the exact functions of each mode right now. But keep in mind that the user mode corresponds to the simplest mode, with the least privileges, and is the mode under which most application programs run. The system mode is a highly privileged mode. This mode is used by operating systems to manipulate and control the activities of the processor. The other modes are entered on the occurrence of exceptions; that is, they are interrupt modes. See the list of the operating modes of ARM.

i) User: Unprivileged mode under which most tasks run
ii) FIQ (Fast Interrupt Request): Entered on a high priority (fast) interrupt request
iii) IRQ (Interrupt Request): Entered on a low priority interrupt request
iv) Supervisor: Entered on reset and when a software interrupt instruction (SWI) is executed
v) Abort: Used to handle memory access violations
vi) Undef: Used to handle undefined instructions
vii) System: Privileged mode using the same registers as user mode

ARM has 37 registers, each of which is 32 bits long. They are listed as follows:

i) 1 dedicated program counter (PC)
ii) 1 dedicated current program status register (CPSR)
iii) 5 dedicated saved program status registers (SPSR)
iv) 30 general purpose registers

Now, let's go into the details of the listed registers.

10.2.3.1 | General Purpose Registers

There are 30 of them, but they are distributed among different modes. To understand this feature, see the case of one particular mode, say the user mode. In this mode, the registers act as shown in Table 10.4.

[Figure 10.5 | Register set of ARM: the user and system modes use R0 to R12, R13 (SP), R14 (LR), R15 (PC) and the CPSR; the FIQ mode has its own R8_FIQ to R14_FIQ; the IRQ, supervisor, undefined and abort modes each have their own R13 and R14 (for example R13_IRQ, R14_IRQ, R13_SVC, R14_SVC, R13_ABT, R14_ABT)]

Figure 10.5 shows the whole set of registers available for the processor. Look at the set of registers titled 'user and system'. Let's discuss the specific functions of each of them.

R0-R12 are general purpose registers, or what may be designated as 'scratch pad' registers. These are the registers into which data and addresses are loaded. They are also 'the' registers used in computations.

R13 points to the stack, that is, it is the stack pointer (SP).

R15 acts as the program counter (PC), which, as in any other processor, is the register which sequences instructions as they are fetched from memory.
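As a quick check on the register count given earlier (30 general purpose registers, 1 PC, 1 CPSR and 5 SPSRs, making 37), the banking arrangement of Figure 10.5 can be modelled as a small lookup. This is a sketch under stated assumptions: the mode keys and the `_UND` suffix are names chosen here for illustration, and `physical_register` is a hypothetical helper, not part of any ARM tool.

```python
# Registers that each exception mode replaces with its own private copy,
# following the arrangement described for Figure 10.5.
BANKED = {
    "FIQ": range(8, 15),   # R8_FIQ to R14_FIQ
    "IRQ": range(13, 15),  # R13_IRQ, R14_IRQ
    "SVC": range(13, 15),  # supervisor
    "ABT": range(13, 15),  # abort
    "UND": range(13, 15),  # undefined (suffix chosen here for illustration)
}
MODES = ("User", "System") + tuple(BANKED)


def physical_register(mode, n):
    """Return the name of the physical register seen as Rn in the given mode."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    if not 0 <= n <= 15:
        raise ValueError("register number must be 0..15")
    if mode in BANKED and n in BANKED[mode]:
        return f"R{n}_{mode}"
    return f"R{n}"  # shared with the user and system modes


def count_registers():
    """Return (general purpose, program counter, status) register counts."""
    names = {physical_register(m, n) for m in MODES for n in range(16)}
    general = len(names) - 1  # everything except R15, the PC
    status = 1 + len(BANKED)  # CPSR plus one SPSR per exception mode
    return general, 1, status


if __name__ == "__main__":
    general, pc, status = count_registers()
    print(general, pc, status, "total:", general + pc + status)
```

Counting this way makes it clear that the figure of 30 includes the private copies of R8 to R14 used by the exception modes, not 30 registers visible at once.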
R14 is the link register (LR), a special register. It is used whenever there is a procedure call or an interrupt, that is, branching to a location. When branching becomes necessary, the value of the PC is saved in the link register, and the PC takes on the new branch address. When returning to the original sequence, the PC value can be retrieved from the link register. This is a very convenient option, because the need to push the PC value to the stack is avoided. The stack is a memory area, and saving to and retrieving from the stack is time consuming. Having such a register, that is, the LR, to store return addresses helps to reduce the delay associated with procedure calls and interrupts.

We know that there are seven modes for the processor, which implies that it can be switched to different modes, as decided by the requirement. When the processor switches, say, from the user mode to another mode, some of the user mode registers are replaced by another set of registers. Take the FIQ mode, for example: in this mode, R8 to R14 are replaced by another set of registers, and the names of these registers are suffixed by FIQ, like R14_FIQ, R12_FIQ and so on.

Why is it that FIQ uses another set of registers?

Note that this mode is entered on a 'fast interrupt', which means it requires fast action. One action during interrupts would be to save the contents of the currently used registers. This 'saving' takes some time. To ensure fast operation, new registers are used when the processor switches to the FIQ mode. No time is spent on saving the contents of registers R8 to R14 of the user mode. Once the FIQ mode is entered, those registers are just swapped out and replaced by a set of new registers. Note, however, that not all registers are swapped out.

Now look at Figure 10.5 once again to note the IRQ mode. Here only R13 and R14 are replaced by new registers. In the IRQ mode, the response is not expected to be as fast as in the FIQ mode. Thus, there is sufficient time to allow the contents of most of the registers to be saved before mode switching is done. This also applies to the modes 'undef, supervisor and abort'. In these modes too, only two registers are swapped out and replaced with new ones.

CPSR

The CPSR (Current Program Status Register) is a very important register, and there is only one such register for the processor. Figure 10.6 and Table 10.5 give its details.

The CPSR contains the information about the current state of the processor. It has bits which specify the mode, control bits to enable/disable interrupts, and a bit which specifies whether the Thumb or ARM state is currently in use.

[Figure 10.6 | Bit layout of the CPSR: N, Z, C and V in bits 31 to 28, Q in bit 27, J in bit 24, I, F and T in bits 7, 6 and 5, and the mode field in bits 4 to 0]

Table 10.5 | CPSR Bits

Bit Nos.    Notation      Interpretation
0 to 4      Mode          Specifies the current mode of operation
5           T             Specifies whether in Thumb (T = 1) or ARM (T = 0) state
6           F             Disables (F = 1) FIQ
7           I             Disables (I = 1) IRQ
24          J             In Jazelle state (J = 1)
27          Q             Sticky overflow flag
28 to 31    V, C, Z, N    Condition flags

Bits 0 to 4 specify the current mode of operation. Since there are only 7 modes of operation, only seven mode numbers are valid.

The J bit indicates whether the processor is in the Jazelle state. The T bit specifies whether the current operation is in the ARM or the Thumb state.

The control bits of this register can be modified only in a privileged mode, such as the system mode. The register also contains the condition flag bits. Most of you are likely to know the relevance of the condition flag bits. But for those who might be new to the concept of flags, here is a concise description.

10.2.5 | Conditional Flags

N: Negative Flag. This flag indicates the status of the MSB of the result of an operation. If we are dealing with signed numbers, N = 1 means that the sign bit is 1, which indicates a negative result.

C: Carry Flag. This bit is set if there is an overflow out of the MSB of the data being manipulated; this can happen in additions, shifts, rotates, etc. It is also set when a subtraction does not need a borrow. If R1 - R2 gives a non-negative result, C = 1, which indicates that R1 is greater than or equal to R2 (as unsigned numbers). To be precise, let's say that 'a carry occurs if the result of an add, subtract or compare is greater than or equal to 2^32, or as the result of an inline barrel shifter operation in a move or logical instruction'.

Z: Zero Flag. If the result of an arithmetic or logical operation is zero, then Z = 1.

V: Overflow Flag. This is the overflow flag, which is relevant only for signed operations. It indicates that the sign bit has possibly been corrupted because the result has gone out of range. When signed numbers are used, only 31 bits are available for the magnitude of the number.
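The bit positions in Table 10.5 and the flag definitions above can be tried out with a short Python sketch. It is a teaching model under stated assumptions, not a description of the ARM hardware: `decode_cpsr` and `add_flags` are hypothetical helper names, and only 32-bit addition is modelled.

```python
MASK32 = 0xFFFFFFFF  # keep values within 32 bits


def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into the fields listed in Table 10.5."""
    return {
        "mode": cpsr & 0x1F,      # bits 0 to 4
        "T": (cpsr >> 5) & 1,     # Thumb state
        "F": (cpsr >> 6) & 1,     # FIQ disabled when 1
        "I": (cpsr >> 7) & 1,     # IRQ disabled when 1
        "J": (cpsr >> 24) & 1,    # Jazelle state
        "Q": (cpsr >> 27) & 1,    # sticky overflow
        "V": (cpsr >> 28) & 1,
        "C": (cpsr >> 29) & 1,
        "Z": (cpsr >> 30) & 1,
        "N": (cpsr >> 31) & 1,
    }


def add_flags(a, b):
    """Return (result, flags) for the 32-bit addition a + b."""
    a &= MASK32
    b &= MASK32
    full = a + b                  # may need 33 bits
    result = full & MASK32
    flags = {
        "N": result >> 31,                 # MSB of the result
        "Z": int(result == 0),             # result is zero
        "C": int(full > MASK32),           # carry out of the MSB
        # Signed overflow: both operands have the same sign bit,
        # but the sign bit of the result differs from it.
        "V": (((a ^ result) & (b ^ result)) >> 31) & 1,
    }
    return result, flags


if __name__ == "__main__":
    print(decode_cpsr(0x600000D3))
    print(add_flags(0x7FFFFFFF, 1))   # largest positive number plus one
    print(add_flags(0xFFFFFFFF, 1))   # wraps around to zero
```

Adding 1 to 0x7FFFFFFF sets N and V but not C, while adding 1 to 0xFFFFFFFF sets Z and C but not V, which separates the roles of the carry and overflow flags.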
To cite an example, say two positive numbers are added, and the magnitude of the sum needs more than 31 bits. There will be an overflow into the sign bit, which will change the MSB to '1', so that the result gets wrongly interpreted as a negative number. This overflow into the sign bit (MSB), with no overflow out of the MSB, causes the overflow (V) bit to be set.

Q: Sticky Overflow Flag. This flag also indicates overflow, but it is 'sticky' in the sense that it remains set until explicitly cleared.

Saved Program Status Registers (SPSR). There are five 'Saved Program Status Registers', that is, one for each of the 'exception' modes of operation. When an exception, that is, an interrupt, occurs, the corresponding SPSR saves the current CPSR value (so that it can be restored on returning to the previous mode). The system and user modes do not have SPSRs because they are not entered through the mechanism of interrupts.

10.3 | Interrupt Vector Table

We have seen that ARM has a number of exception modes. Exceptions are a class of interrupts which are internally generated due to the occurrence of some specific conditions. For example, when an undefined instruction is detected, the processor cannot process it. The solution for such an undesired situation is to make the processor switch to another mode and generate an interrupt. This interrupt takes control to an interrupt service routine (ISR, i.e., interrupt handler) residing in a specific location in memory. This specific location is termed the 'interrupt vector' corresponding to this exception.

Besides 'exceptions', the processor can be interrupted by instructions, and this is called a software interrupt (SWI). There are hardware interrupts as well, which are activated by FIQ or IRQ.

The aforesaid discussion is just to clarify the fact that, associated with all exceptions, hardware interrupts and software interrupts, there is a fixed interrupt vector which leads to the ISR or the interrupt handler.

See Table 10.6, which shows the pre-defined interrupt vectors.

Table 10.6 | List of Interrupt Vectors

Exception                Shorthand    Vector Address
Reset                    RESET        0x00000000
Undefined instruction    UNDEF        0x00000004
Software interrupt       SWI          0x00000008
Prefetch abort           PABT         0x0000000C
Data abort               DABT         0x00000010
Reserved                 -            0x00000014
Interrupt request        IRQ          0x00000018
Fast interrupt request   FIQ          0x0000001C

Note that the first entry in the table is 'Reset'. All processors have an address, termed the 'reset vector', which is the location to which control branches when the processor is first powered on, or when it is reset in the midst of processor activity. For ARM, this is 0x00000000. Since this location is always fixed, RESET is usually included in the class of vectored interrupts.

10.4 | Programming the ARM Processor

Now that we have had a look at the concepts regarding the instruction set architecture (ISA) of ARM, we are in a position to understand it better by programming. Writing, running and testing programs is the key to understanding any processor. By programming, we become capable of understanding almost everything about how registers, memory and flags act on data. In short, we get a total feel for the processing activity done inside the processor.

To get to this, we need a programming environment, that is, an Integrated Development Environment (IDE). There are many IDEs available for ARM, some of which are free of cost (and freely downloadable) and some of which are proprietary and thus have to be paid for. However, for students, an evaluation version is available which is freely downloadable from the website www.keil.com. Here, we will use the Keil IDE, also called the RVDK (Real View Development Kit), which is very popular and easy to use. This version can be used for testing programs and also for simulation. We will do all our learning using this IDE. The step-by-step procedure for using it is detailed in Appendix A. In this part of the chapter, we will assume that you have this IDE and also that you have already browsed through Appendix A.

10.4.1 | Programming: Assembly vs C

Programming can be done in assembly as well as in high-level languages. In the embedded design world, high-level languages are used in product design, and C is a very popular language. As such, we will also do C programming (in the next chapter). But before that, let's have a stint in assembly programming. Our approach is that, to understand the ARM core, that is, to use its registers, do memory accesses and so on, we will do assembly programming. This ensures that we get a good grip on the ARM core architecture. In this context, we will focus on the computational capabilities of the core.

And when we start using ARM as a microcontroller, i.e., the core with a number of peripherals, we will use C programming. This will allow us to use the processor in various practical applications involving peripherals and interaction with the external world. This part will be discussed in Chapter 11.

10.5 | ARM Assembly Language

As mentioned earlier, the ARM instruction set has been cleverly designed to get more than one operation done in a single instruction. Let's list out some features of the ARM instruction set.
[Figure 10.7 | Data processing unit: Operand 1 and Operand 2 feed the ALU, one of them through the barrel shifter, and the ALU produces the Result]

i) ARM is a RISC processor, in which every instruction has a maximum size of 32 bits. Instructions are expected to be executed in one cycle. This is true for most instructions, but not for all. Therefore it is better to say that ARM is a RISC processor with a few CISC-type instructions as well.

ii) Another feature of RISC, and therefore of ARM, is that it is a load-store architecture. This means that all computations are register based, that is, the operands have to be brought into registers from memory, using a load instruction. After computation, the result is stored in memory. For the user, this means that there are no data processing instructions in which one of the operands is in memory. All operands must be available in registers before computation can be done.

iii) A third feature of ARM is that its ALU has a barrel shifter (Figure 10.7) associated with one of its operands. A barrel shifter is a unit that can shift or rotate an operand by more than one bit, to the right or to the left. As we will soon see, the barrel shifter adds some clever processing techniques to data processing and allows shifting and an arithmetic operation to be combined in the same instruction.

iv) 'Conditions' can be appended to instructions: this implies that we can choose to 'do or not do' a particular operation based on the status of a condition flag. For most other processors, only branching operations depend on flag status. Here we will see that data movement as well as data processing instructions can be made 'conditional'.

10.5.1 | Data Types

ARM can operate on 32-bit data, which is termed a word, on 16-bit data, called a half word, and also on byte operands. The processing tools offer the option of storing data as 'little endian' or 'big endian'. To clarify this concept, follow the forthcoming discussion, and observe Figure 10.8.

Address       Data
0x00001200    A3
0x00001201    90
0x00001202    47
0x00001203    0E

Figure 10.8a | The little endian format

Address       Data
0x00001200    0E
0x00001201    47
0x00001202    90
0x00001203    A3

Figure 10.8b | The big endian format

A 32-bit data word stored in memory needs 4 bytes of space, which means 4 consecutive addresses are required, as one address can store only one byte. When the lowest byte of the 32-bit word is stored in the lowest of these four addresses, it is called the 'little endian' format. Otherwise, it is the 'big endian' format. See Figure 10.8. The 32-bit data word is 0x0E4790A3. The storage addresses are from 0x00001200 onwards.

In the processor industry, both formats are used. Intel prefers the little endian format, while Motorola uses the big endian format. ARM allows both formats (the choice can be fixed by software, in the initialization stage). In this book, we assume the little endian format.

10.5.2 | Data Alignment

Storing (and also loading) of 4 bytes in memory can be done in one cycle, because the processor has a 32-bit data bus. When 32-bit data is stored in memory, four addresses are needed. We need to specify only one address in our instruction, but there is an aspect called 'alignment'. For 32-bit data, 'alignment' implies that the last two bits of this address are zero. For example, the address 0x00001200 is an aligned address. When this address is used to store 32-bit data, this address and the next three addresses are automatically accessed. This is because of the way memory is organized, as four banks (see Figure 10.9).

[Figure 10.9 | Memory banks: four byte-wide banks, Bank 3 on D31-D24, Bank 2 on D23-D16, Bank 1 on D15-D8 and Bank 0 on D7-D0, together forming the 32-bit data bus; addresses 0x1200 to 0x1203 lie in one row and 0x1204 to 0x1207 in the next]
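The two byte orders of Figure 10.8 and the word-alignment rule stated above can be reproduced with a small Python sketch. The helper names `memory_image` and `is_word_aligned` are introduced here purely for illustration; the data word and base address are the ones used in the text.

```python
WORD = 0x0E4790A3   # the 32-bit data word used in Figure 10.8
BASE = 0x00001200   # first of the four storage addresses


def memory_image(word, base, byte_order):
    """Map each byte address to the byte stored there.

    byte_order is "little" or "big", as accepted by int.to_bytes.
    """
    data = word.to_bytes(4, byte_order)
    return {base + offset: value for offset, value in enumerate(data)}


def is_word_aligned(address):
    """A word address is aligned when its last two bits are zero."""
    return address & 0b11 == 0


if __name__ == "__main__":
    for order in ("little", "big"):
        print(order, "endian:")
        for address, value in memory_image(WORD, BASE, order).items():
            print(f"  0x{address:08X}  {value:02X}")
    print(is_word_aligned(0x00001200), is_word_aligned(0x00001201))
```

In the little endian image the lowest address holds A3, the lowest byte of the word; in the big endian image it holds 0E, the highest byte.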
If the address of a 32-bit number is given as 0x1200, the accessed addresses are 0x1200, 0x1201, 0x1202 and 0x1203. The 4 bytes in these addresses are considered to be in the same row, that is, aligned. In this case, one byte from each bank is accessed and only one memory cycle is needed to access an aligned word.
For unaligned data, one more cycle is necessary. Think of the address 0x1201. The locations to be accessed will be 0x1201, 0x1202, 0x1203 and 0x1204. Note that the first three bytes will be in the same row, while the last will be in a different row, and so one more cycle of access will be required.
We summarize the conditions for 'aligned data' as follows:
• For word (32-bit) data, the specified address should have its least significant two bits as 0.
• For halfword (16-bit) accesses, the specified address should have the LSB equal to 0.
Most of the tools for ARM ensure that data is stored in aligned locations, so as to avoid unnecessary extra cycles of operation.

10.5.3 I Assembly Language Rules
An assembly language line has four fields, namely, label, opcode, operand and comment. A label is positioned at the left of a line and is the symbol for the memory address which stores that line of information. The rules for valid labels depend on the assembler being used; the manual of the specific assembler should be referred to for the details. The second field is the opcode or instruction field. The third is the operand field, and the last is the comment field which starts with a semicolon. The use of comments is advised for making programs more readable.
A typical assembly language statement is
BOSE    ADD R1, R2, R3      ;add R2 and R3 and copy the sum to R1
The label is BOSE, the opcode is ADD, the operands are R1, R2 and R3 and the text after the semicolon is the comment. While writing programs, make sure you don't write instructions at the extreme left of the page; that part is the 'label' field. In this book, we will use the assembler which is part of the RVDK supplied by Keil. The steps in using it have been clearly described in Appendix A. More details are available in the 'RealView assembly guide'.

10.6 I ARM Instruction Set
We will now discuss the ARM instruction set, and gradually move on to writing programs.
The instruction set can be broadly classified as follows:
i) Data processing instructions
ii) Load store instructions - single register, multiple register

10.6.1 I Data Processing Instructions
ARM is a RISC processor, one of the features of which is that it processes, i.e., performs computations, on data which are in registers only. There are instructions which move data from one register to another. Such instructions have only two operands, that is, the source and the destination. Instructions which perform arithmetic/logical computations have three operands: two source operands and one destination operand.

10.6.1.1 I MOV and MVN
The 'MOV' instruction is a 'register to register' data movement instruction with the format MOV destination, source where both the source and destination have to be registers.
The mnemonic 'MVN' stands for 'move negated' which implies moving the complemented value of the source to the destination.
Registers R0 to R12 can be used for data movement as they are general purpose registers. The registers R13, R14 and R15, which are the stack pointer, link register and the program counter respectively, can also be used with the MOV instruction, but this must be done carefully and only for specific purposes.
Examples
        MOV R11, R2         ;copy the contents of R2 to R11
        MOV R12, R10        ;copy the contents of R10 to R12
        MVN R0, R9          ;move the complemented value of R9 to R0
                            ;if R9 = 0xFFF00000, R0 = 0x000FFFFF
Note Here we have discussed only the case of the MOV instruction used for moving data between registers. The MOV instruction is also used for copying immediate data into registers. That will be discussed in Section 10.17.

10.6.1.2 I The Barrel Shifter
Now, refer to Figure 10.7. We see that there is a barrel shifter associated with data processing. The figure shows two register operands, one of which can optionally be acted upon by a barrel shifter, before being admitted to the ALU. The barrel shifter can do shifting and rotation. Let us first have a general discussion on shifts and rotations.

10.6.2 I Shift and Rotate
Two types of shifts are possible: logical and arithmetic.

10.6.2.1 I Logical Shift Left (LSL)
Logical Shift Left of a (say) 32-bit number causes it to shift left (a specified number of times) and the vacant bits on the right are filled with zeros. See Figure 10.10. The last bit
shifted out from the left is copied to the carry flag. Keep in mind that a left shift by one bit position corresponds to multiplication by 2. An LSL of 5 implies multiplication by 32.

10.6.2.2 I Logical Shift Right (LSR)
Logical Shift Right does a similar thing. The vacant bit positions on the left are filled with zeros, and the last bit shifted out is retained in the carry flag. This is shown in Figure 10.11. Shifting right by one divides the number by 2. Two right shifts cause a division by 4.

10.6.2.3 I Arithmetic Shift Right (ASR)
Arithmetic Shift Right is different in the sense that the vacant bit positions on the left are filled with the MSB of the original number. See Figure 10.12. This type of shift has the function of doing 'sign extension' of data, because for positive numbers the MSB is 0, and for negative numbers, the MSB is 1. There is no instruction for arithmetic shift left, because it would give the same result as a logical shift left.

10.6.2.4 I Rotate Right (ROR)
In this, the data is moved right, and the bits shifted out from the right are inserted back through the left. See Figure 10.13. The last bit rotated out is available in the carry flag. There is no 'rotate left' instruction, because left rotation by n times can be achieved by rotating to the right (32 - n) times. For example, rotating 4 times to the left is achieved by rotating 32 - 4 = 28 times to the right.

10.6.2.5 I Rotate Right Extended (RRX)
This corresponds to rotating right through the carry bit, meaning that the bit that drops off from the right side is moved to C and the carry bit enters through the left of the data. This should be obvious from Figure 10.14.

Figure 10.11 I Logical shift right
Figure 10.12 I Arithmetic shift right
Figure 10.13 I Rotate right
Figure 10.14 I Rotate right extended

10.6.3 I Format of Shift and Rotate Instructions
The number of bit positions by which shifts and rotations are to be done may be specified by a constant or may be indicated in another register.
Examples
        LSL R2, #4          ;shift left logically, the content of R2 by 4 bit positions
        ASR R5, #8          ;shift right arithmetically, the content of R5 by 8 bit positions
        ROR R1, R2          ;rotate the content of R1, by the number specified in R2

Example 10.1
The content of some of the registers are given as:
R1 = 0xEF00DE12, R2 = 0x0456123F, R5 = 4, R6 = 28.
Find the result (in the destination register), when the following instructions are executed.
i) LSL R1, #8
ii) ASR R1, R5
iii) ROR R2, R6
iv) LSR R2, #5

Solution
i) Shifting R1 left 8 times causes 8 zeros in the 8 positions on the right. R1 now contains 0x00DE1200.
ii) R5 contains 4. Arithmetically right shifting R1 4 times causes the MSB (1, for the given number) to be replicated 4 times on the left, thus causing a sign extension of the shifted number. R1 now contains 0xFEF00DE1.
iii) R6 contains 28. Rotating R2 28 times to the right is equivalent to rotating it 32 - 28 = 4 times to the left. After rotation, R2 contains 0x456123F0.
iv) Here, R2 is logically shifted right 5 times, and so 5 zeros enter through the left. R2 now has the value 0x0022B091.

10.6.4 I Combining the Operations of Move and Shift
Recollect the barrel shifter which is an integral part of the data processing unit of the processor. This allows shifting and data processing to be done in the same instruction cycle. We will first see how moving and shifting can be combined in one instruction itself.
        MOV R1, R2, LSL #2
        MOV R1, R2, LSR R3
In both the above instructions, R1 is the destination register. In the first instruction, the source operand, that is, the content of R2, is logically shifted left by two bit positions and then moved to the destination register R1. In the second, the amount of 'shifting' is specified in register R3. After the shifting is done, the result is moved to R1.
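The shift and rotate operations of Sections 10.6.2 to 10.6.4 can be checked with a small software model. What follows is a minimal Python sketch, not ARM code: the function names (lsl, lsr, asr, ror) and the 32-bit mask are illustrative assumptions, and the carry flag is not modelled. The values are those of Example 10.1, each instruction being applied to the original register contents.

```python
MASK = 0xFFFFFFFF  # registers are assumed to be 32 bits wide


def lsl(value, n):
    """Logical shift left: zeros enter from the right."""
    return (value << n) & MASK


def lsr(value, n):
    """Logical shift right: zeros enter from the left."""
    return (value & MASK) >> n


def asr(value, n):
    """Arithmetic shift right: the original MSB is copied into vacant bits."""
    value &= MASK
    if value & 0x80000000:
        return ((value >> n) | (MASK << (32 - n))) & MASK
    return value >> n


def ror(value, n):
    """Rotate right: bits leaving on the right re-enter on the left."""
    n %= 32
    value &= MASK
    return ((value >> n) | (value << (32 - n))) & MASK


# Register contents from Example 10.1
R1, R2, R5, R6 = 0xEF00DE12, 0x0456123F, 4, 28

print(hex(lsl(R1, 8)))    # LSL R1, #8  -> 0x00DE1200
print(hex(asr(R1, R5)))   # ASR R1, R5  -> 0xFEF00DE1
print(hex(ror(R2, R6)))   # ROR R2, R6  -> 0x456123F0
print(hex(lsr(R2, 5)))    # LSR R2, #5  -> 0x0022B091

# MOV R1, R2, LSL #2 amounts to "shift the source, then copy the result"
print(hex(lsl(R2, 2)))
```

A left rotation by n can be obtained in the same model as ror(value, 32 - n), which is the reason given above for the absence of a 'rotate left' instruction.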
Example 10.2
Find the content of the destination registers after the execution of each of the given instructions, given that the content of R5 = 0x72340200 and R2 = 4.

Solution
The results here are similar to Example 10.1, except that the source and destination registers are not the same.
i) MOV R3, R5, LSL #3
The content of R5 is shifted left 3 times, and moved to R3. R3 now contains 0x91A01000.
ii) MOV R6, R5, ASR R2
R2 = 4, and so R5 is arithmetically shifted right 4 times. Since the MSB of the

Table 10.7 I List of Conditions, Codes and Corresponding Flag Status
Cond    Mnemonic    Meaning                     Condition Flag State
0000    EQ          Equal                       Z = 1
0001    NE          Not equal                   Z = 0
0010    CS/HS       Carry set/unsigned >=       C = 1
0011    CC/LO       Carry clear/unsigned <      C = 0
0100    MI          Minus/Negative              N = 1
1111    (NV)        Unpredictable

flags. But for ARM, we must suffix the instruction by S for this to happen. Otherwise the flags are unaffected. It is the S suffix on a data processing instruction that causes the flags in the CPSR to be updated.
In Example 10.2, in the instruction MOV R3, R5, LSL #3, there is a logical operation involved, that is, the left shift operation. This should cause the carry flag and N flag to be set. But since the MOV instruction is not appended with the suffix 'S', the flags remain unaffected, that is, reset. The MOV instruction can be made to update the flags by writing it as MOVS R3, R5, LSL #3. After this is executed, we find the N and C flags to be set. This flag setting can be used to make an instruction following it 'conditional'. We will soon see more aspects of this.
Figure 10.15 shows the format of a typical ARM instruction. In the instruction code, four bits are allotted for the condition under which the instruction is to be executed. If no condition is indicated, these bits assume the 'always' condition.
Table 10.7 lists the conditions, condition codes and the flag statuses for these conditions. We will discuss the use of condition codes for instructions.
Note that the conditions used for signed numbers and unsigned numbers are different. For unsigned numbers, we use the mnemonic 'higher' or 'lower', while for signed numbers, the conditions are specified as 'greater than' or 'less than'. The flag settings are also different. The logic of this is very simple, that is, we know that 6 is higher than 3, but -3 is greater than -6. Thus, it is clear that unsigned and signed numbers have to be dealt with differently.

10.8 I Arithmetic Instructions
Now let's get a feel of the arithmetic instructions of ARM and the special ways in which they can be used.

10.8.1 I Addition and Subtraction
Addition and subtraction are three operand instructions. The destination is always a register. The source operands may both be registers or one of them may be immediate data. There are some issues in using immediate data greater than 8 bits (refer to Section 10.17).
See Table 10.8 which gives examples of how the different addition and subtraction instructions work. Any of the general purpose registers may be used as operands, though in the table, only R3, R4 and R5 have been mentioned.
ADD R3, R4, R5      Add      R3 = R4 + R5

is suffixed by S. Following such an ADD instruction, we can have instructions with conditions appended to them. The set of possible conditions is listed in Table 10.7. For any instruction, the upper 4 bits are used to specify the condition (Figure 10.15).
Consider these program lines
        SUBS R1, R2, R3     ;the suffix 'S' has been used
        MOVEQ R2, R1        ;the EQ notation tests the Z = 1 condition
Here the move instruction is executed only if the result of the subtraction produces a zero and sets the zero flag. The condition EQ implies the setting of the zero flag (refer to Table 10.7). Let's use this concept in a simple example.

Example 10.3
It is required to compare two numbers which are in registers R1 and R2. The bigger number is to be placed in R10. If the two numbers are equal, then the number is to be moved to R9.

Solution
Here we use the subtraction operation to do the comparison.
        SUBS R3, R1, R2     ;R3 = R1 - R2
        MOVEQ R9, R1        ;if R1 and R2 are equal (Z = 1), move R1 to R9
        MOVHI R10, R1       ;if R1 > R2 (C = 1, Z = 0), R1 is moved to R10
        MOVLO R10, R2       ;otherwise, i.e., if R1 is less than R2 (C = 0), move R2 to R10

The salient points of this program are as follows:
i) First the operation R1 - R2 is performed and the result is placed in R3.
ii) Since the SUB instruction has been appended with S, the flags will be set accordingly.
iii) If the two numbers are equal, the zero flag gets set and the instruction MOVEQ will get executed. Otherwise it becomes a NOP (no operation) instruction. Here one of the numbers (R1) is to be moved to R9 (as both numbers are equal).
iv) The next line checks whether the carry flag has been set. If R1 > R2, the carry flag is set (C = 1) and the MOVHI (move if higher) instruction gets executed.
v) If the carry flag is not set, and the Z flag is also not set, it means that R2 is bigger.

instances. This is a very great saving, as branching causes stalling of the pipeline (Section 10.2). It allows very dense code, without many branches. Not executing some of the conditional instructions does affect the speed, but the penalty is less than the overhead due to a branch.

Example 10.4
Find the result of the following instructions. What do these instructions accomplish?
i) ADD R1, R2, R2, LSL #3
ii) RSB R3, R3, R3, LSL #3
iii) RSB R3, R2, R2, LSL #4
iv) SUB R0, R0, R0, LSL #2
v) RSB R2, R1, #0

Solution
i) ADD R1, R2, R2, LSL #3
One source operand is R2, LSL #3. Left shifting 3 times accomplishes multiplication by 2^3 = 8.
The result of the whole operation is R1 = R2 + 8R2 = 9R2.
ii) RSB R3, R3, R3, LSL #3
R3 = 8R3 - R3 = 7R3
iii) RSB R3, R2, R2, LSL #4
R3 = 16R2 - R2 = 15R2
iv) SUB R0, R0, R0, LSL #2
R0 = R0 - 4R0 = -3R0
v) RSB R2, R1, #0
We get R2 = 0 - R1 = -R1, i.e., we get the negative value of R1.

10.9 I Logical Instructions
Now, we will see the logical instructions of the processor. They also need to be suffixed with 'S' to have the flags updated. See Table 10.9.
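Looking back at Examples 10.3 and 10.4, the arithmetic can be checked with a short software model. This is a minimal Python sketch under stated assumptions, not ARM code: the helper names (lsl, add, sub, rsb, subs, example_10_3, to_signed) are illustrative, registers are taken to be 32 bits wide, and the last conditional move of Example 10.3 is assumed to be a 'lower' move of R2 into R10, as the problem statement requires.

```python
MASK = 0xFFFFFFFF  # registers are assumed to be 32 bits wide


def lsl(value, n):
    """Barrel shifter: logical shift left by n bit positions."""
    return (value << n) & MASK


def add(rn, op2):
    """ADD Rd, Rn, Op2 -> Rn + Op2."""
    return (rn + op2) & MASK


def sub(rn, op2):
    """SUB Rd, Rn, Op2 -> Rn - Op2."""
    return (rn - op2) & MASK


def rsb(rn, op2):
    """RSB Rd, Rn, Op2 -> Op2 - Rn (reverse subtract)."""
    return (op2 - rn) & MASK


def to_signed(value):
    """Interpret a 32-bit pattern as a two's complement number."""
    return value - (1 << 32) if value & 0x80000000 else value


def subs(rn, op2):
    """SUBS: return the result with the Z and C flags (C = 1 means no borrow)."""
    result = (rn - op2) & MASK
    return result, int(result == 0), int((rn & MASK) >= (op2 & MASK))


def example_10_3(r1, r2):
    """Return (R9, R10) after the conditional moves; None means 'not written'."""
    _, z, c = subs(r1, r2)
    r9 = r10 = None
    if z == 1:                # MOVEQ R9, R1
        r9 = r1
    if c == 1 and z == 0:     # MOVHI R10, R1
        r10 = r1
    if c == 0:                # assumed 'lower' move: R2 into R10
        r10 = r2
    return r9, r10


x = 5
print(add(x, lsl(x, 3)))              # ADD R1, R2, R2, LSL #3 -> 9x
print(rsb(x, lsl(x, 3)))              # RSB R3, R3, R3, LSL #3 -> 7x
print(rsb(x, lsl(x, 4)))              # RSB R3, R2, R2, LSL #4 -> 15x
print(to_signed(sub(x, lsl(x, 2))))   # SUB R0, R0, R0, LSL #2 -> -3x
print(to_signed(rsb(x, 0)))           # RSB R2, R1, #0         -> -x
print(example_10_3(9, 4))             # R1 is the bigger number
```

Each conditional move in example_10_3 either writes its register or does nothing, which mirrors the way an ARM instruction whose condition fails behaves as a NOP.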
Table 10.9 I List of Logical Instructions
Note One of the source operands may be 8-bit immediate data as well. Refer to Section 10.17 for details of how to handle data bigger than 8 bits.

10.10 I Compare Instructions
This instruction compares two operands and causes the conditional flags to be affected, but neither the destination nor the source changes. Comparison is done by a subtraction operation, and the flags are set/reset according to the result of this. (ARM has four types of compare instructions as shown in Table 10.10.) However, only two flags really matter and they are the zero flag and the carry flag. Refer to Table 10.11 to get an idea of the flag settings after a compare instruction.
Note Since the compare instructions explicitly affect the flags, the suffix S is not required for them.
Comparison is a very important operation, and we will use it very frequently. A number of programs using this instruction will be discussed subsequently.

Table 10.10 I List of 'Compare' Instructions
CMP R3, R4      Compare             R3 - R4, but only flags affected
CMN R3, R4      Compare negated     R3 + R4, but only flags affected
TST R3, R4      Test                R3 AND R4, but only flags affected
TEQ R3, R4      Test equivalence    R3 EOR R4, but only flags affected

Table 10.11 I Flag Settings After a Compare Instruction

10.11 I Multiplication
Multiplication is a complex operation which needs specialized hardware and takes more than one cycle to execute. ARM has a number of multiplication instructions, which use this hardware. Let's examine how these instructions are used.

10.11.1 I Multiply
The format of the multiply instruction is
        MUL Rd, Rm, Rs
where Rd is the destination register. Rm and Rs are source registers. A number of points are to be kept in mind when these instructions are used. Table 10.12 lists different types of multiplication instructions.

Table 10.12 I List of Multiply Instructions
Instruction               Operation                          Calculation
SMLAL R0, R1, R2, R3      Signed multiply and accumulate     [R0, R1] = [R0, R1] + R2 * R3
SMULL R0, R1, R2, R3      Signed multiply                    [R0, R1] = R2 * R3
UMLAL R0, R1, R2, R3      Unsigned multiply and accumulate   [R0, R1] = [R0, R1] + R2 * R3
UMULL R0, R1, R2, R3      Unsigned multiply                  [R0, R1] = R2 * R3
i) The source and destination registers are 32 bits in length. If the product is longer than 32 bits, only the lower 32 bits are preserved in the destination register.
ii) Immediate data cannot be used as a source operand.
iii) If the multiplicand and multiplier are signed numbers, it is up to the programmer to identify a logic to interpret the sign of the product.
iv) The instruction can be made conditional.

Example
        MUL R1, R2, R3      ;R1 = R2 x R3
        MULS R1, R2, R3     ;R1 = R2 x R3 and flags are also set
        MULSEQ R3, R2, R1   ;R3 = R1 x R2 is done only if Z = 1
                            ;(because of the EQ suffix)
                            ;because of the S suffix, flags are updated
        MULEQ R4, R3, R5    ;if Z = 1, R4 = R3 x R5

10.11.2 I Multiply and Accumulate
The format of this instruction is shown by the following example.
Example
        MLA R0, R1, R2, R3  ;R0 = R1 x R2 + R3

10.11.3 I Long Multiply/Long Multiply and Accumulate
In this, when 32-bit data are multiplied to get 64-bit results, the upper 32 bits are saved in a specified register. For signed data, the sign bit is also preserved in the upper register. The format is
        Instruction <RdLo>, <RdHi>, <Rm>, <Rs>
Note that in all the above cases, two registers function as the destination.
Since multiplication is a complex instruction, it takes many cycles for execution. So, it is best to realize multiplication using shifting and adding, rather than using any of the multiply instructions. Table 10.12 lists the available 'multiply and accumulate' instructions.

10.12 I Division
Division is another complex instruction requiring specialized hardware and extra clock cycles. As a policy, basic ARM architecture does not have a 'divide' instruction. Division can be realized using repeated subtraction. Compilers are given the responsibility of accomplishing division using the simple instructions of the processor.

10.13 I Starting Assembly Language Programming
If you have any previous experience of assembly language programming, you will know that there are two items used therein: instructions and directives. The former are executable statements which are 'executed' by the processor. The latter, that is, directives, are non-executable statements relating to the assembler. They are used to give the assembler necessary information to perform the assembly process smoothly. For some processors, directives are also called pseudo instructions. For ARM, pseudo instructions are special directives to the assembler which cause it to generate certain instructions for the processor. Thus, they are also executable statements. Thus for ARM, an assembly language line will contain an instruction, directive or pseudo instruction.
Writing and testing a program for ARM is done in a computer, usually a PC, which is called the host computer. The host computer should have the program development tools for ARM. Since the program written in ARM assembly language is assembled in a PC which has a different processor (usually some version of Pentium), the process is called 'cross assembly'. After the program is tested, it is converted to a hex file and
The default area for code is Read-Only and for data it is Read-Write.
Let's understand some fundamental directives first.

10.13.1 I The AREA Directive
The first thing we do when we start assembly language programming is to define an area. There is a directive named 'AREA' for this. This directive names the area and sets its attributes. The attributes are placed after the name, separated by commas.
Example
        AREA SORT, CODE, READONLY
        AREA TABLE, DATA
The first area defined above is given the name SORT; it contains code, and is read only. The word 'READONLY' is optional. The second AREA directive has the name TABLE; it contains data and, though not mentioned, will correspond to the Read-Write area (as it is a data area).

10.13.2 I The ENTRY Directive
The ENTRY directive marks the first instruction to be executed within an application. Because an application cannot have more than one entry point, the ENTRY directive can appear in only one of the source modules.
10.13.3 I The END Directive
This directive tells the assembler to stop reading. Anything written after the END directive will be ignored by the assembler. So every assembly language source module must finish with an END directive, on a line by itself.

10.14 I General Structure of an Assembly Language Line
The general form of source lines in assembly language is:
{label} {instruction|directive|pseudo-instruction} {;comment}

The second line has address NUMB for the first half word, and NUMB + 2 for the second half word. In the last case, the addresses of the words are NUMBR and NUMBR + 4.
Keep in mind that a byte needs only one address, a halfword needs two, and a word requires four memory addresses.

10.14.2 I The EQU Directive
This is a frequently used directive, and is used to equate a numeric constant to a label. The constant may be data or address. Examples are as follows:

10.16 I Branch Instructions
For any processor, branching is a very important operation. The power to change the sequence of execution is obtained by branching, which may be conditional or

        AREA FACTO, CODE    ;define the code area
        ENTRY               ;entry point
        MOV R1, #10         ;R1 = 10
        MOV R2, #1          ;R2 = 1
REPT    MUL R2, R1, R2      ;R2 = R1 x R2
        SUBS R1, R1, #1     ;R1 = R1 - 1
        BNE REPT            ;branch to REPT if Z = 0
STOP    B STOP              ;last line
        END

This is a very simple program which finds the factorial of 10. It can be used to find the factorial of any other number (except 0), provided the factorial does not exceed 32 bits in size. The technique is to multiply the number with the 'number - 1' repeatedly. Meanwhile, a counter also decrements by 1 (which is done by subtraction), and when the counter is 0, the Z flag is set. The multiplication is then stopped. The factorial is available in the register R2. The branch instruction used is a conditional one, that is, BNE which tests the Zero flag. The instruction before it, that is, SUB has been appended with the 'S' suffix to ensure the setting of flags.
Now let's see another example which uses conditional branching. This program performs division by repeated subtraction.

Example 10.8
        AREA DIV, CODE
        ENTRY
        MOV R1, #500        ;move the dividend to R1
        MOV R2, #16         ;move the divisor to R2
        MOV R3, #0          ;R3 = 0
        MOV R4, R1          ;copy the dividend to R4
REPT    SUBS R4, R4, R2     ;subtract and set flags
        ADDPL R3, R3, #1    ;add if N = 0, i.e. the MSB of R4 is 0 (+ve)
        BPL REPT            ;repeat the loop if the MSB is +ve
        ADDMI R4, R4, R2    ;if MSB of R4 is -ve, add R2 to R4
STOP    B STOP
        END

This program performs division by repeated subtraction. Here 500 is to be divided by 16. The method is to subtract 16 from 500 repeatedly until the result becomes negative.
The branch instruction BPL REPT means Branch to label REPT if plus (PL), i.e., if N = 0.
Besides conditional branching, there are the ADD and SUB instructions also, which are conditional; the condition used is the status of the sign flag N.
The steps of the program are as follows:
i) Subtract 16 from 500, and check if the result is +ve or -ve. This can be verified by checking the N flag which corresponds to the MSB of the resultant number. The condition flags are updated by the subtraction operation (using the suffix S).
ii) If the number (in R4) is +ve, it means that subtraction can be repeated unhindered. Each time this is verified, the quotient register (R3) is incremented by 1.
iii) When the result of subtraction becomes -ve (the condition 'MI' for minus), add the divisor to this negative number (in R4).
iv) In this problem, when 16 is subtracted 31 (0x1F) times from 500, the value in R4 is +ve. One more subtraction makes the N flag to be set, and the number in R4 to be negative.
v) To this -ve number add the divisor. This makes it equal to the remainder which is 4, in this case.
vi) Thus, we get 31 (0x1F) as the quotient (in R3) and 4 as the remainder (in R4).

10.16.1 I Subroutines/Procedures
In Table 10.13, there is another form of the branch instruction which is BL standing for 'Branch and Link'. Recollect that a procedure (also called subroutine, function, etc.) means that a new program sequence is taken up, but control returns to the original point after that. Most processors (including ARM) use stacks to store the return addresses and return instructions to handle procedure calls. ARM has an additional feature to handle procedures in a simpler manner. Recollect a register named the 'Link Register'. When a BL instruction is encountered, the PC value is changed to that of the target, but the old PC value is copied to the LR register. At the end of the procedure, the LR value can be copied back to the PC.
Now let's write a program which calls a procedure.

Example 10.9
Write a program to calculate 3X² + 5Y², where X = 8 and Y = 5.

Solution
        AREA PROCED, CODE
        ENTRY
        MOV R2, #8              ;to calculate 3X² + 5Y²
        BL SQUARE               ;call the SQUARE procedure
        ADD R1, R3, R3, LSL #1  ;3X²
        MOV R2, #5              ;R2 = 5
        BL SQUARE               ;call the SQUARE procedure
        ADD R0, R3, R3, LSL #2  ;5Y²
        ADD R4, R1, R0          ;R4 = R1 + R0, i.e. 3X² + 5Y²
STOP    B STOP                  ;last line in the execution
SQUARE  MUL R3, R2, R2          ;the SQUARE procedure
        MOV PC, LR              ;return: copy LR back to PC
        END

The salient points of this program are as follows:
i) A procedure named SQUARE has been used. This procedure uses the multiply instruction to find the square of any number. The number to be squared is passed to the procedure using the register R2. The square of the number is returned to the main program in R3.
ii) There are two numbers, X and Y, whose squares are to be found. Calling the procedure amounts to just writing the instruction BL SQUARE. This instruction will cause a branching to the procedure named SQUARE. It also copies the current PC value to the link register (LR).
iii) The procedure has only two instructions: one to perform squaring, and the other to copy the LR content back to PC. The second instruction causes a return to the main program.
iv) We need two multiplications, in addition to the squaring operation. These two, that is, 3X² and 5Y², are achieved by shifting and adding. The MUL instruction is used as little as possible because it takes more time, and causes higher power dissipation.
v) The last step is adding 3X² and 5Y² which are now in R1 and R0. The sum is available in R4.
vi) Note that the last program line to be executed is STOP B STOP, even though it is not the last line in the assembly file.
In Table 10.13 there are two more forms for the branch instruction. BX stands for Branch and Exchange. BLX is for Branch Link and Exchange. The Exchange feature is applicable when ARM and THUMB instructions are being used, and it is needed to switch from one set to another.

Figure 10.17a I Format of a typical data processing instruction (bits 31 to 0; fields include Cond, Rn, Rd and Shifter_operand)

The data processing instruction format has 12 bits available for operand 2. Figure 10.17b shows the instruction format which has been modified for using the immediate mode.

Figure 10.17b I Modification of the 'shifter operand' (bits 11 to 8: ROT, taken x2; bits 7 to 0: IMMED-8; result: 32-bit constant)

We can also use the MVN instruction for generating new numbers. If 0 is loaded into a register and moved into another or the same register after using the MVN instruction, we get 0xFFFFFFFF.
You can try this code to verify.
ARM-TH E WORLD'S MOST POPULAR 32-BIT EMBEDDED PROCESSOR 373
Table 10.14 I The Range of Constants That Can Be Generated By the Rotation Scheme iii 0x6D 16 0x6D0000
Decimal Values Equivalent Step Between Rotate iv 0x05 6 0x14000000
Hexadecimal Values V 0x3E 2 0x8000000F
0- 255 0-0ff 1 No rotate
0x100 - 0x3fc 4 Right by 30 bits
256, 260, 264, ... ,1020
0x400 -- 0ff0 16 Right by 28 bits
1024, 1040, 1056, ... , 4080
- -.
4096, 4160, 4224, ... ,16320 0xl 000- 03fc0
nrrvm m
MOVRl,#Ox0
MVNRl,Rl
SI.ti
64
- --
Right by 26 bits The answers for iii, iv and v are in Table 10.15
From Example 10.9, we note that many 32-bit numbers can be generated by the
ARM rotation scheme, but there are constants which cannot be obtained by this method.
For example, the number 0xllllllll cannot be generated by rotation.
Let's summarize the points regarding the generation of constants using the ARM
rotation scheme.
i) A class of constants can be generated by this scheme, (Table 10.14) but all constants
How then are such constants obtained for use in the
immediate mode ofaddressing?
10.17.2 I Literal Pools
, i
cannot be generated. Those that cannot be, will have to loaded directly into memory,
In computer science, specifically in compiler and assembler design, a literal pool is I
by using the concept of'literal pools'. We will come to that soon. (Table 10.14 shows (i
a lookup table used to hold literals during assembly and execution. But first, what I,
the range of constants that can be generated by the rotation scheme) exactly is a literal? In programming, a literal is a value written exactly as it is meant
d,
ii) To generate the constant needed, the programmer need not specify the 8-bit imme- to be interpreted. A literal can be a number, a character or a string. For example, in
~1'
diate number, and the number of rotations to be done. He just has to write an
the expression, x= 145,x is a variable, and 145 is a literal. Thus, literals are constants
instruction in the immediate mode. The assembler converts this instruction to the
required scheme. When a constant Ox.200000002 is needed, the assembler converts
it to the instructions.
MOVR1, #0x22
for ARM. When it is required to load a constant in a register, the assembler can help
by creating a space in memory and then placing this constant in the space. From this
memory space, the processor can take it and use it using load instructions. But assemblers
are not guided by instructions; they use what are called pseudo instructions. In this case
j
MOVR1, Rl, ROR#4, which creates the require 32-bit constants for us.
iii) The processor does not have an instruction for rotation to the left. However, rotating n times to the left is achieved by rotating (32-n) times to the right.

Example 10.9
Find the 32-bit constant generated by each of the following rotations.
i) Rotate 0x40 to the right 30 times
ii) Rotate 0x56 to the left 12 times
iii) Rotate 0x6D to the right 16 times
iv) Rotate 0x05 to the right 6 times
v) Rotate 0xFC to the right 2 times

Solution
i) The 8-bit number is 01000000.
00000000 00000000 00000000 01000000 ; the 8-bit number in 32-bit format
00000000 00000000 00000001 00000000 ; the number after rotation
Thus, the constant obtained is 0x100.
ii) The 8-bit number is 01010110.
00000000 00000000 00000000 01010110 ; the 8-bit number in 32-bit format
Rotating 12 times to the left is equivalent to rotating 20 times to the right.
00000000 00000101 01100000 00000000 ; the number after rotation
The constant generated is 0x56000.

of making a literal pool, and taking a constant from the pool; there is a specific pseudo instruction
        LDR Rd, =const
This pseudo instruction can construct any 32-bit numeric constant. Suppose we need the constant 0x33333333; it is likely that we write the instruction MOV R1, #0x33333333. With this, the assembler will give an error message that such a constant cannot be generated. To avoid such a situation, we write
        LDR R1, =0x33333333
This is a pseudo instruction (don't confuse it with the LDR instruction, which we will soon come to; there is a difference in format between the two). This will cause the assembler to check the following possibilities.
i) Can the constant be constructed with a MOV or MVN instruction combined with rotation? If this is possible, the assembler generates the appropriate instruction, that is, an 8-bit number is rotated appropriately to get the constant in question.
ii) If the constant cannot be constructed this way, the assembler places the value in a literal pool and generates an LDR (load register) instruction with a program-relative address that reads the constant from this literal pool.
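The rotation scheme and the assembler's choice between MOV, MVN and a literal pool can be checked with a short Python sketch. This is only an illustrative model, not the assembler's actual algorithm: the function names are hypothetical, and it assumes the classic ARM rule that an immediate is an 8-bit value rotated right by an even number of positions.

```python
MASK32 = 0xFFFFFFFF

def ror32(value, n):
    """Rotate a 32-bit value right by n positions."""
    n %= 32
    value &= MASK32
    return ((value >> n) | (value << (32 - n))) & MASK32

def rol32(value, n):
    """Rotate left by n, done as a right rotation by (32 - n)."""
    return ror32(value, (32 - n) % 32)

def encode_immediate(constant):
    """Return (imm8, rotation) with ror32(imm8, rotation) == constant,
    or None if no 8-bit value with an even rotation gives the constant."""
    constant &= MASK32
    for rotation in range(0, 32, 2):
        imm8 = rol32(constant, rotation)  # undo the right rotation
        if imm8 <= 0xFF:
            return imm8, rotation
    return None

def ldr_pseudo(constant):
    """Hypothetical helper: which route LDR Rd, =const would take."""
    if encode_immediate(constant) is not None:
        return "MOV"
    if encode_immediate(~constant & MASK32) is not None:
        return "MVN"
    return "literal pool"

print(hex(ror32(0x40, 30)))          # Example 10.9 (i)
print(hex(rol32(0x56, 12)))          # Example 10.9 (ii)
print(ldr_pseudo(0x33333333))        # constant that MOV cannot build
```

Under these assumptions, a constant such as 0x33333333 falls through both checks and ends up in the literal pool, which matches the behaviour described for the pseudo instruction.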
Example 10.10a
        AREA PROG1, CODE, READONLY
        ENTRY
        LDR R1, =0x12400000
        LDR R2, =0x00555555
        ADD R3, R1, R2
STOP    B STOP
        END

In Example 10.10a, two constants are needed. If you run this program and check the disassembly file, you will find two interesting facts, which relate to the different ways in which these two constants are generated. The assembler realizes that the first constant can be obtained by the rotation scheme, but the second one cannot. So a literal pool is created just after the last instruction, and the constant 0x00555555 is placed therein. Then a 'load register' instruction is generated to load the constant into the register.

How is the literal pool accessed?
The literal we need is accessed from the literal pool using a PC relative mode. In this mode, only 12 bits are allowed for the 'relative number', which can be positive or negative. Thus, the literal in the pool has to be within +/- 4KB of the current PC value.

Where should the literal pool be placed?
Normally the literal pool is placed just after the END directive, which means just after the end of the program area. This is okay for normal size programs. But sometimes, programs are very large and if the literal pool is placed after the end of the program, it may be out of range (of +/- 4KB) of the LDR instruction. Such a situation implies that there should be the flexibility of placing literal pools anywhere in memory. This is done by the LTORG directive which allows us to define the origin of a literal pool.

When a pseudo instruction LDR Rd, =const is encountered, the assembler checks if the constant is available and addressable in the nearest literal pool. If it is so, it takes it from the pool. Otherwise, it attempts to place the constant in the next literal pool. If the next literal pool is out of range, the assembler generates an error message. In this case, the LTORG directive is to be used to place an additional literal pool in the code. Place the LTORG directive after the failed LDR pseudo instruction, and within 4KB memory space.

Literal pools are to be placed in locations where the processor does not attempt to execute them as instructions. It is best to place them after unconditional branch instructions, or after the return instruction at the end of a subroutine. Let us see an example of a case where the LTORG directive becomes necessary.

Example 10.10b
        AREA PROG1, CODE, READONLY
        ENTRY
        LDR R1, =0x12400000
        LDR R2, =0x00555555
        ADD R3, R1, R2
        SPACE 4400
STOP    B STOP
        END

This is a modified version of Example 10.10a. Recollect that we need to have a literal pool for the constant 0x00555555. Here, a directive called SPACE 4400 has been inserted. This directive creates an empty area of 4400 bytes. Because of this, the total space occupied by the program becomes large (greater than 4400 bytes, anyway). The literal pool is usually after the program area. In this case, this places the literal pool beyond the range (greater than 4KB) of the LDR R2, =0x00555555. Hence, on assembling, the following message is seen.
error: A1284E: Literal pool too distant, use LTORG to assemble it within 4KB
To avoid such a situation, we can place the literal closer to the instruction which needs the constant. See the modified version of the program.

Example 10.10c
        AREA PROG1, CODE, READONLY
        ENTRY
        LDR R1, =0x12400000
        LDR R2, =0x00555555
        ADD R3, R1, R2
        LTORG
        SPACE 4400
STOP    B STOP
        END

Now the program assembles without error, because a literal pool has been created before the 'free space' of 4400 bytes. By the use of the LTORG directive, the required constant is found to have been placed in this pool by the assembler. Thus, we see that the directive LTORG can be used to place literal pools wherever we want. This will become useful as programs become larger.

10.18 | Load and Store Instructions
ARM is a RISC architecture, and one of the features of RISC is that of being a 'load store' architecture. Loading is the process of getting data from memory into a register, and storing is just the reverse process. In ARM, data is brought into registers using a
load instruction, and only then can it be used for data processing. After computation, the result can be 'stored' in memory. The memory in question is 'RAM' which is the read/write memory. RAM is volatile and is used for temporary storage of data in the course of computations. The only instructions which access RAM are 'load' and 'store'. All registers can be accessed using these instructions, but programmers are advised to exercise caution when accessing critical registers like the PC, SP, etc.
The syntax for load or store is
        LDR/STR{<cond>} <Rd>, <addressing mode>
Rd is the source register for store and destination register for load.
The addressing mode gives us the necessary information to get the 'effective address', which is the actual memory address to be accessed. The addressing mode is indirect because the memory address is not specified directly in the instruction; rather, a base register must be used. For the simplest case, examples of LOAD and STORE instructions are as follows:
        LDR R1, [R2]    ;copy into R1 the content of memory specified in R2
        STR R1, [R2]    ;store the content of R1 into the memory address specified in R2
This implies that the load/store instruction must be preceded by an instruction which copies the address into R2. We will soon get to know how this is done. There are various ways of specifying the effective address. The barrel shifter can be part of the address specifying mechanism.

Example 10.11
How is the effective memory address calculated in the following load and store instructions?
i) LDR R3, [R2, LSL #2]
ii) STR R9, [R1, R2, ROR #2]
iii) LDR R4, [R3, R2]
iv) STR R5, [R4, R3, ASL #4]

Solution
i) LDR R3, [R2, LSL #2]
In this, the effective address is the content of R2 left shifted by 2, i.e. multiplied by 4.
ii) STR R9, [R1, R2, ROR #2]
Here, the effective address is specified by R1, R2 and a right rotation. To calculate it, the content of R2 is rotated right by 2, and then added to the content of R1.
iii) LDR R4, [R3, R2]
The effective address here is the sum of R3 and R2.
iv) STR R5, [R4, R3, ASL #4]
The effective address is the sum of the content of R4 and the arithmetically left shifted (by 4) content of R3.

10.18.1 | Bytes, Half Words and Words
Now, let's see another aspect of load and store instructions. ARM has instructions to transfer specifically a word (32 bits), half word (16 bits) or a byte (8 bits) between memory and registers. There are also instructions which differentiate between signed and unsigned data.
There are instructions which clearly indicate the kind of data to be moved. See Table 10.16. From the table, we understand that we can load and store parts of a 32-bit word by using B for byte and H for halfword, along with the load and store instructions. If a memory location contains a 32-bit word, we can move the LSB (assuming little endian format) into a register by using LDRB, or the lower half of the word by using LDRH. Let's clarify this by an example.

Table 10.16 | List of Load and Store Instructions
LDR     Load Word                STR     Store Word
LDRH    Load Half Word           STRH    Store Half Word
LDRSH   Load Signed Half Word
LDRB    Load Byte                STRB    Store Byte
LDRSB   Load Signed Byte

Example 10.12
Two memory areas are being referenced and two registers are used as pointers:
R1 = 0x00000100
R2 = 0x40001200
Figures 10.18a and b show the data addresses and corresponding data.
Show the content of the register or memory after the execution of each of the following instructions:

Address       Byte Stored
0x00000100    56
0x00000101    23
0x00000102    0D
0x00000103    AE
Figure 10.18a | Address and data

Address       Byte Stored
0x40001200    00
0x40001201    00
0x40001202    00
0x40001203    00
Figure 10.18b | Address and data

i) LDR R3, [R1]
ii) LDRB R3, [R1]
iii) LDRH R3, [R1]
iv) STRB R3, [R2] given that R3 = 0xAE0D2356
For this case, show halfword and word storage as well.
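The loads and stores asked for in Example 10.12 can be checked with a small Python model of byte-addressed, little endian memory. This is a sketch only: the `load` and `store` helpers are hypothetical names, and the memory is modelled as a plain dictionary from address to byte.

```python
def load(memory, address, size):
    """Read size bytes (1, 2 or 4) starting at address, little endian."""
    raw = bytes(memory[address + i] for i in range(size))
    return int.from_bytes(raw, "little")

def store(memory, address, value, size):
    """Write the low size bytes of value starting at address, little endian."""
    raw = (value & 0xFFFFFFFF).to_bytes(4, "little")[:size]
    for i, byte in enumerate(raw):
        memory[address + i] = byte

# Contents of Figures 10.18a and 10.18b
memory = {0x100: 0x56, 0x101: 0x23, 0x102: 0x0D, 0x103: 0xAE}
memory.update({0x40001200 + i: 0x00 for i in range(4)})
r1, r2 = 0x00000100, 0x40001200

ldr_value = load(memory, r1, 4)    # LDR  R3, [R1]
ldrb_value = load(memory, r1, 1)   # LDRB R3, [R1]
ldrh_value = load(memory, r1, 2)   # LDRH R3, [R1]
store(memory, r2, 0xAE0D2356, 1)   # STRB R3, [R2]

print(hex(ldr_value), hex(ldrb_value), hex(ldrh_value))
```

Replacing the last `store` size with 2 or 4 models STRH and STR on the same destination.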
iii) LDRH R3, [R1]
In this, the half word (lower two bytes) at the address is copied to R3.
R3 = 0x00002356
iv) STRB R3, [R2] given that R3 = 0xAE0D2356
In this, the byte corresponding to the LSB of the data in R3 is copied to the address pointed to by R2. See Tables 10.17a, b and c for byte, half word and word storage.

Table 10.17a, b and c
a) STRB R3, [R2]
Address       Byte Stored
0x40001200    56
0x40001201    00
0x40001202    00
0x40001203    00

b) STRH R3, [R2]
Address       Byte Stored
0x40001200    56
0x40001201    23
0x40001202    00
0x40001203    00

c) STR R3, [R2]
Address       Byte Stored
0x40001200    56
0x40001201    23
0x40001202    0D
0x40001203    AE

10.18.2 | Loading Signed Numbers
Signed numbers are those whose MSB is the sign bit. For positive numbers, the sign bit is '0', whereas negative numbers are in the two's complement form and have their MSB as '1'. When a 32-bit number is available in memory, it can be loaded into registers as signed bytes and signed half words. In these cases, the MSB of the byte part or the half word part is checked, and sign extension is done while loading it into registers.
Consider the case of a word 0xCDEF8204 in memory. Let R7 be used as a pointer to that memory location. Then, observe the result of execution of the following instructions, as given in the comments column.

Observe in Table 10.16 that there are no store instructions for signed bytes or signed half words. This is because storing simply means placing numbers in memory. These numbers may be signed data, unsigned data or code; it is only when the user brings a number into a register that processing is done on it. Only then is it necessary for that number to be interpreted as signed or unsigned.

10.18.3 | Indexed Addressing Modes
In this mode, the effective address calculation can be done before a load/store is executed or afterwards. Let's see what it is all about.

10.18.3.1 | Pre-indexed Addressing Mode
Observe the instruction LDR R0, [R7, #4]. Here R7 is the base register and the effective address is R7 + 4. The data at this effective address is copied to R0.
Next, see the instruction STR R1, [R5, R6, LSL #2]. The effective address = R5 + (R6 shifted left by 2).
In the above two instructions, there is a notable feature, however. After the load/store is done, the content of the base register remains unchanged, that is, the effective address is not copied to the base register. But if we want the base register to contain the effective address, just suffix the instruction with the character '!' and then 'write back' occurs.
Consider the instruction LDR R2, [R6, #-8]!. In this, after the loading operation is done, R6 has the effective address written back into it.

Example 10.13
Calculate the effective addresses and explain what each instruction does.
i) STRB R2, [R6, R7, #0x24]!
ii) LDRSH R4, [R10, R11, ASR #4]

Solution
i) STRB R2, [R6, R7, #0x24]!
The effective address is the sum of the contents of R6, R7 and the number 0x24. The content of R2 is stored in the effective address. After that, the effective address is copied to R6.
ii) LDRSH R4, [R10, R11, ASR #4]
Here, the effective address is the sum of the content of R10 and the content of R11 after arithmetically shifting it right by 4 positions. The half word at this address is loaded to R4. The content of the base register remains unchanged.
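The signed loads of Section 10.18.2 can be illustrated with a short Python sketch applied to the word 0xCDEF8204. The helper name `sign_extend` is hypothetical, and the sketch assumes little endian storage, so the byte and half word loads see the low-order part of the word.

```python
def sign_extend(value, bits):
    """Treat the low `bits` bits of value as two's complement and widen to 32 bits."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):   # sign bit of the byte / half word is set
        value -= 1 << bits
    return value & 0xFFFFFFFF

word = 0xCDEF8204                     # word pointed to by R7

ldrb_result = word & 0xFF             # LDRB:  zero-extended byte
ldrsb_result = sign_extend(word, 8)   # LDRSB: sign-extended byte
ldrh_result = word & 0xFFFF           # LDRH:  zero-extended half word
ldrsh_result = sign_extend(word, 16)  # LDRSH: sign-extended half word

for name, value in [("LDRB", ldrb_result), ("LDRSB", ldrsb_result),
                    ("LDRH", ldrh_result), ("LDRSH", ldrsh_result)]:
    print(f"{name:6s} 0x{value:08X}")
```

The byte 0x04 has a clear sign bit, so both byte loads agree; the half word 0x8204 has its MSB set, so only the signed load fills the upper 16 bits with ones.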
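The pre-indexed calculations described above, with and without write back, can be traced with a small Python sketch. The register values used here are arbitrary illustrations, not values from the text, and the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def lsl(value, n):
    """Logical shift left within 32 bits."""
    return (value << n) & MASK32

def asr(value, n):
    """Arithmetic shift right of a 32-bit two's complement value."""
    value &= MASK32
    if value & 0x80000000:
        value -= 1 << 32
    return (value >> n) & MASK32

def pre_indexed(regs, base, offset, writeback=False):
    """Effective address = base + offset; with writeback ('!') the base is updated."""
    address = (regs[base] + offset) & MASK32
    if writeback:
        regs[base] = address
    return address

# Arbitrary register contents, chosen only for illustration
regs = {"R5": 0x1000, "R6": 0x2000, "R7": 0x3000, "R10": 0x4000, "R11": 0xFFFFFF00}

ea_ldr = pre_indexed(regs, "R7", 4)                       # LDR R0, [R7, #4]
ea_str = pre_indexed(regs, "R5", lsl(regs["R6"], 2))      # STR R1, [R5, R6, LSL #2]
ea_wb = pre_indexed(regs, "R6", -8, writeback=True)       # LDR R2, [R6, #-8]!
ea_asr = pre_indexed(regs, "R10", asr(regs["R11"], 4))    # LDRSH R4, [R10, R11, ASR #4]

print(hex(ea_ldr), hex(ea_str), hex(ea_wb), hex(ea_asr), hex(regs["R6"]))
```

Only the instruction with '!' changes its base register; in the other three cases the base keeps its original value after the access.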
Example 10.14
Let's add 10 numbers which are in memory. The numbers are 16 bits long, that is, half words, and use two byte spaces each. The pre-indexed mode of addressing with write back is used to index the half words, which have addresses with a spacing of 2 between them. The instruction LDRH R2, [R7, #2]! does the indexing of the 16-bit numbers.

Solution
        AREA DADD, CODE, READONLY
        ENTRY
        LDR R7, =TABLE          ;copy the address of TABLE to R7
STRT    MOV R0, #9              ;R0 = 9
        LDRH R1, [R7]           ;load 1st number from memory to R1
REPT    LDRH R2, [R7, #2]!      ;pre-indexed with writeback
        ADD R1, R1, R2          ;R1 = R1+R2
        SUBS R0, R0, #1         ;R0 = R0-1
        BNE REPT                ;repeat until R0 becomes zero
STOP    B STOP
        END

Example 10.15
        AREA FIRST, CODE, READONLY
        ENTRY
        LDR R7, =NUMS           ;load the address of NUMS in R7
        LDR R8, =NUMS1          ;load the address of NUMS1 in R8
        LDR R9, =NUMS2          ;load the address of NUMS2 in R9
        LDR R1, [R7]            ;load the word to R1
        STR R1, [R9]            ;store the word in R1 in NUMS2
        STR R1, [R8]            ;store the word in R1 in NUMS1
STOP    B STOP
NUMS    DCD 653451134
        AREA SECOND, DATA, READWRITE
NUMS2   SPACE 60
NUMS1   DCD 0
        END
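The loop of Example 10.14 (load the first half word, then add nine more using pre-indexed addressing with write back) can be mirrored in Python. This is a sketch under stated assumptions: the table contents below are made-up sample values, the table is placed at address 0 of a byte buffer, and the function name is hypothetical.

```python
import struct

MASK32 = 0xFFFFFFFF

def sum_halfwords(memory, table, count):
    """Mirror of the LDRH / ADD / SUBS / BNE loop; assumes count >= 2."""
    r7 = table                                        # LDR  R7, =TABLE
    r0 = count - 1                                    # MOV  R0, #9 for ten numbers
    r1 = struct.unpack_from("<H", memory, r7)[0]      # LDRH R1, [R7]
    while True:
        r7 += 2                                       # write back: R7 = R7 + 2
        r2 = struct.unpack_from("<H", memory, r7)[0]  # LDRH R2, [R7, #2]!
        r1 = (r1 + r2) & MASK32                       # ADD  R1, R1, R2
        r0 -= 1                                       # SUBS R0, R0, #1
        if r0 == 0:                                   # BNE  REPT falls through
            break
    return r1, r7

sample = list(range(1, 11))                           # illustrative data only
memory = struct.pack("<10H", *sample)                 # ten little endian half words
total, last_address = sum_halfwords(memory, 0, len(sample))
print(total, last_address)
```

Because of the write back, R7 finishes pointing at the last half word that was read rather than one element past it.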
Example 10.16 uses many of the programming aspects that we have been discussing so far. Let's have a look at the important features of this program.
i) There is an ASCII string written in readonly memory using the DCB directive. Such a string is enclosed in double quotes and each character is a byte.
ii) One readonly and one read/write memory area have been defined.
iii) After the ASCII string, a 0 is used as a terminating character. The arrival of this 0 in R2 is used to check whether the required transfer of the string is done.
iv) The instructions for loading and storing are suffixed by 'B', which indicates that only a byte is to be transferred.
v) Post-indexed mode of addressing is used for load and store. The addresses need to be incremented only by 1, as only a byte is transferred.
vi) The instructions for loading and storing are in a procedure named COPY. The procedure is 'called' by the BL instruction, which does the branching and also copies the current PC to the link register. The last line of the procedure copies the LR back to the PC. This constitutes the 'return' to the main program.

[Figure: the register set (R0 to R15) on one side and memory locations (M+1 to M+15) on the other, with STM moving register contents to memory]
Figure 10.19 | The LDM and STM instructions

each register in the operation, and the increment or decrement can occur before or after the operation. The suffixes for these options are as follows:
10.20.2 | The STM instruction
This has the same format as the LDM instruction. Consider the instruction
        STMIA R1, {R2-R4}
This will be equivalent to the instructions
        STR R2, [R1]
        STR R3, [R1, #4]
        STR R4, [R1, #8]
After the sequence of three stores is over, the base content does not vary, however. If you need it to be changed to that of the final address, the writeback operator '!' is to be used. So write the instruction as STMIA R1!, {R2-R4}
Now let's use the LDM and STM instructions to simplify Example 10.16, which transfers bytes from one portion of memory (Readonly) to another portion (Read/write). But the multiple load/store instructions can be used only for words (32 bits). So the example has been modified, as Example 10.17, to move 6 words.

Example 10.17
        AREA STRIN1, CODE, READONLY
        ENTRY
        LDR R1, =SOURCE         ;pointer to source
        LDR R0, =DESTIN         ;pointer to destination
        LDMIA R1, {R2-R7}       ;Load six words to R2-R7
        STMIA R0, {R2-R7}       ;Store six words in destination
STOP    B STOP
SOURCE  DCD 0x675889,0x1234568,0x9876543,0x2345678,0x8907653
        AREA STRIN2, DATA, READWRITE    ;define the R/W memory area
DESTIN  DCD 0
        END

Here 6 words from the source memory have been copied to six registers using just one instruction. In the next instruction, these six words are stored in the destination memory.
Note how simple the program is.
Example 10.17 illustrates the idea that block data transfers can be simplified using the multiple register instructions. But their real importance is for stack implementation. Stacks are a necessity for any processor; stacks are needed for storing data temporarily and also for storing return addresses and register values during procedure calls. We will see this now. For those who are not very familiar with the concept of stacks, here is a brief review.

10.20.3 | Stack
A stack is an area in memory, the accessing of which is done in a special way. Most stacks are Last-In First-Out (LIFO) type stacks. This means that the last data that was stored is the first one that can be taken out. It is sequential access that is done, and not random access. Two operations are defined for a stack, that is, PUSH, in which data is written into the stack, and POP, in which data is read out and loaded into registers. The stack has a pointer to its top which is called the Stack Pointer (SP). For ARM, this is register R13. This means that the address of the top of the stack is to be available in SP.

10.20.3.1 | Types of Stacks
Ascending/Descending and Empty/Full
An ascending stack grows upwards. It starts from a low memory address and, as items are pushed onto it, progresses to higher memory addresses. A descending stack grows downwards. It starts from a high memory address and, as items are pushed onto it, progresses to lower memory addresses.
In an empty stack, the stack pointer points to the next free (empty) location on the stack, i.e., to the place where the next item to be pushed will be stored. In a full stack, the stack pointer points to the topmost item in the stack, that is, the location of the last item pushed onto the stack. In practice, stacks are almost always of the 'Full descending' type.
Let's consider a descending stack in which SP is first decremented and then data is pushed in. The reverse occurs for the POP operation. Stacks allow data to be pushed or popped only as words (32 bits for ARM). Consider that SP = 0x50002000, and the contents of R1 and R2 are pushed in. At the end of the operation we find that SP = SP-8 = 0x50001FF8. ARM does not have a mnemonic for PUSH; instead it uses the STM instruction. To simplify the use of the STM/LDM instructions corresponding to PUSH and POP for different types of stacks, Table 10.18 can be referred to.

For the kind of stack that we are talking about now, what is the instruction we can use for pushing the contents of registers R1 to R3?

The answer is STMDB SP!, {R1-R3}. We need SP to be used as the base register. For pushing in, SP is first decremented, and then storing is done. So we use the suffix 'DB' with SP. The operator '!' is used so that the decremented value is available in SP. See this simple program (Example 10.18) in which SP is initialized to 0x40000200. Some values are loaded into registers R1 to R3. Using the STMDB instruction, the contents of the three registers, that is, 3 words, are pushed to the stack, and will be available in memory. At the end of the program, SP will be found to have the value of 0x400001F4.

Table 10.18 | Types of Stacks and Corresponding Instructions to be Used
Stack Type          Push          Pop
Full Descending     STMFD (DB)    LDMFD (IA)
Full Ascending      STMFA (IB)    LDMFA (DA)
Empty Descending    STMED (DA)    LDMED (IB)
Empty Ascending     STMEA (IA)    LDMEA (DB)
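The pairing of push and pop suffixes in Table 10.18 can be verified with a small Python model of the four addressing modes. This is a sketch, not a full emulator: memory is a dictionary of word addresses, the function names are hypothetical, and it relies on the ARM rule that the lowest-numbered register always goes to the lowest address.

```python
WORD = 4

def block_addresses(base, count, mode):
    """Return (addresses in register order, written-back base) for LDM/STM."""
    starts = {
        "IA": base,                        # increment after
        "IB": base + WORD,                 # increment before
        "DA": base - WORD * count + WORD,  # decrement after
        "DB": base - WORD * count,         # decrement before
    }
    start = starts[mode]
    step = WORD * count
    new_base = base + step if mode[0] == "I" else base - step
    return [start + WORD * i for i in range(count)], new_base

def stm(memory, base, values, mode):
    """Store values (lowest register first); return the base after write back."""
    addresses, new_base = block_addresses(base, len(values), mode)
    for address, value in zip(addresses, values):
        memory[address] = value
    return new_base

def ldm(memory, base, count, mode):
    """Load count words; return (values, base after write back)."""
    addresses, new_base = block_addresses(base, count, mode)
    return [memory[a] for a in addresses], new_base

# Push/pop suffix pairs from Table 10.18
STACKS = {"FD": ("DB", "IA"), "FA": ("IB", "DA"),
          "ED": ("DA", "IB"), "EA": ("IA", "DB")}

memory = {}
sp = stm(memory, 0x40000200, [1, 2, 3], "DB")   # STMDB SP!, {R1-R3}
print(hex(sp))
```

For each stack type, popping with the paired suffix reads back exactly the addresses that the push wrote and returns SP to its starting value.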
Example 10.18
        AREA STCK, CODE, READONLY
        ENTRY
        LDR SP, =0x40000200
        MOV R1, #1
        MOV R2, #2
        MOV R3, #3
        STMDB SP!, {R1-R3}
STOP    B STOP
        END

Now, in this program, if the STM instruction is changed to STMIA, the stack becomes an ascending stack, and the value of SP will be 0x4000020C after program execution. Thus, it is obvious that a stack is a data structure which can be defined by software.

10.20.4 | Stacks and Subroutines/Procedures
For most processors, procedures use a stack to store the return address. A procedure is taken up by a 'CALL' instruction. This causes the action of pushing the current value of PC onto the stack. The procedure ends with a 'RETURN' instruction. This causes the PC value to be popped back.
For ARM, so far (Section 10.16.1) we have used procedures without the necessity of a stack. That is because the Link Register (LR) keeps the return address when a procedure is called. But think of the case of nested procedures. There is only one link register for a mode, and a new procedure call will overwrite the existing link register content, which stores the return address of the previous procedure, and very soon things may get out of hand.
In such cases, a stack is a necessity. Each time a procedure is called, the PC value is saved in the LR, as is the usual case. When a nested procedure comes in, the content of the link register is pushed on to the stack, and popped out from the stack when exiting the procedure. Figure 10.20 shows the sequence of actions needed to take care of a nested procedure.
Now, let's try to understand the sequence of actions indicated by Figure 10.20.
In the main program, we define a stack by giving a value to the stack pointer (SP).

[Figure: three columns. Main Program: load SP, then BL PROC1. PROC1: STMDB SP!,{REGS, LR}; BL PROC2; LDMIA SP!,{REGS, LR}; MOV PC, LR. PROC2: MOV PC, LR.]
Figure 10.20 | Sequence of actions needed for a nested procedure

The main program has a procedure named PROC1 which is called by the instruction BL PROC1. This instruction causes the current PC value to be copied to LR. In PROC1, since we anticipate a nested procedure, we push LR and the working registers to the stack using the instruction STMDB SP!,{REGS,LR}. Thus the content of LR is safely stored in the stack.
In the procedure PROC1, another procedure is called by the instruction BL PROC2. This instruction causes the copying of the present PC to LR. In PROC2, there is the instruction MOV PC, LR at the end. This will get the PC value back from LR, and thus execution goes back to PROC1.
At the end of PROC1, there is the stack based instruction LDMIA SP!,{REGS,LR} which retrieves the contents of LR. This is given back to PC by the instruction MOV PC, LR. Thus, execution goes back to the main program.
Any part of memory can be defined as a stack, by simply defining the content of the pointer register. Let's write a procedure using a stack.

Example 10.19
        AREA ..., CODE, READONLY
        ENTRY
        LDR R7, =0x40000000
        LDR SP, =0x40000210     ;define SP
        MOV R1, #1
        MOV R2, #2
        MOV R3, #3
        BL PROC1                ;call PROC1
        LDR R6, [R7]            ;load R6
STOP    B STOP
PROC1   STMDB SP!, {LR,R1-R3}   ;save registers and LR on stack
        MOV R1, #0x34
        MOV R2, #0x45
        MOV R3, #0xDC
        BL PROC2                ;call PROC2
        STR R5, [R7]            ;store R5
        LDMIA SP!, {R1-R3,LR}   ;retrieve registers from stack
        MOV PC, LR              ;copy LR to PC
PROC2   ADD R4, R2, R1          ;the nested procedure PROC2
        ADD R5, R4, R3
        MOV PC, LR              ;go back to PROC1
        END

Example 10.19 shows the instance of a nested procedure and the use of the stack. The program is in tune with the sequence outlined by Figure 10.20. Nothing very important is achieved by the program, but it shows how any nested procedure can be written.
PROC1 changes the contents of registers R1, R2 and R3, but since they have already been saved on the stack by the STMDB instruction, their contents can be retrieved while returning to the main program.
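The nested call sequence of Example 10.19 can be traced with a Python sketch. This is a model under stated assumptions, not the processor's behaviour in full: return addresses are represented by descriptive strings rather than numeric PC values, memory is a dictionary, and all function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
SAVED = ["R1", "R2", "R3", "LR"]   # lowest-numbered register at lowest address

def stmdb_sp(regs, memory, names):
    """STMDB SP!, {names}: decrement SP first, then store the registers."""
    regs["SP"] -= 4 * len(names)
    for i, name in enumerate(names):
        memory[regs["SP"] + 4 * i] = regs[name]

def ldmia_sp(regs, memory, names):
    """LDMIA SP!, {names}: load the registers, then increment SP."""
    for i, name in enumerate(names):
        regs[name] = memory[regs["SP"] + 4 * i]
    regs["SP"] += 4 * len(names)

def proc2(regs):
    regs["R4"] = (regs["R2"] + regs["R1"]) & MASK32   # ADD R4, R2, R1
    regs["R5"] = (regs["R4"] + regs["R3"]) & MASK32   # ADD R5, R4, R3
    return regs["LR"]                                 # MOV PC, LR

def proc1(regs, memory):
    stmdb_sp(regs, memory, SAVED)                     # save R1-R3 and LR
    regs["R1"], regs["R2"], regs["R3"] = 0x34, 0x45, 0xDC
    regs["LR"] = "PROC1, after BL PROC2"              # BL PROC2 overwrites LR
    proc2(regs)
    memory[regs["R7"]] = regs["R5"]                   # STR R5, [R7]
    ldmia_sp(regs, memory, SAVED)                     # restore R1-R3 and LR
    return regs["LR"]                                 # MOV PC, LR

def main():
    regs = {"R1": 1, "R2": 2, "R3": 3, "R4": 0, "R5": 0, "R6": 0,
            "R7": 0x40000000, "SP": 0x40000210, "LR": None}
    memory = {}
    regs["LR"] = "main, after BL PROC1"               # BL PROC1
    returned_to = proc1(regs, memory)
    regs["R6"] = memory[regs["R7"]]                   # LDR R6, [R7]
    return regs, returned_to

regs, returned_to = main()
print(returned_to, hex(regs["R6"]), hex(regs["SP"]))
```

Without the push and pop of LR in `proc1`, the value written by the second BL would still be in LR at the final return, and control would not get back to the main program.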
PROC2 adds the new contents of R1, R2 and R3, and returns. In PROC1, the sum in R5 is stored in the memory location pointed to by R7. Later, in the main program, this content is loaded to R6.

Example 10.20
Write a program which arranges numbers (stored in readonly memory) in ascending order, and places them in the R/W memory.

        AREA NUM, DATA, READONLY
ARRAY   DCD 2,7,4,5,11,17,3,15,8,6,9,19,10,23,20
        AREA COD, CODE
        ENTRY
        LDR R0, =ARRAY          ;load the address of ARRAY to R0
        LDMIA R0, {R1-R10}      ;load 10 numbers to R1 to R10
        MOV SP, #0x40000000     ;location of R/W memory
        STMIA SP, {R1-R10}      ;store the 10 numbers
        ADD SP, #40
        ADD R0, #40
        LDMIA R0, {R1-R5}       ;load next set of 5 numbers
        STMIA SP, {R1-R5}       ;store them
        MOV SP, #0x40000000     ;address of Read-write memory
        MOV R1, #0              ;initialize counter R1 to zero
        MOV R3, SP
LOOP1   MOV R2, #0              ;outer loop, counter from one end
        MOV R4, SP
LOOP2   CMP R2, #14
        BEQ OUTER               ;branch to OUTER
        ADD R2, #1              ;increment the counter
        LDR R0, [R4]            ;stored as 4 bytes
        LDR R5, [R4, #4]        ;hence a jump of 4
        ADD R4, #4
        CMP R0, R5              ;comparing nearby values
        BLT LOOP2
        MOV R6, R0
        MOV R0, R5              ;swapping and storing them
        MOV R5, R6
        STR R5, [R4]
        SUB R4, #4
        STR R0, [R4]
        ADD R4, #4
        B LOOP2
OUTER
        ADD R1, #1
        CMP R1, #15
        BNE LOOP1
STOP    B STOP
        END

This program uses the concept of 'bubble sorting'. First the 15 numbers stored in the variable ARRAY are loaded and copied to R/W memory, in two steps. After that, two loops are used, in which the first (outer) loop uses a counter from 0 to 14. The outer loop repeats the traversal from one end of the array to the other, and the second (inner) loop compares neighbouring values and swaps them according to which value is smaller. Hence, with each iteration of the outer loop, the lowest value in the array slowly comes to the first element, and this process continues.

Conclusion
With this, we come to the end of our discussion on the architecture and assembly language programming for ARM. There are a few more instructions, pseudo instructions and directives that haven't been dealt with, but those can be learned by referring to a book fully dedicated to this processor.

KEY POINTS OF THIS CHAPTER
o ARM is the most popular of the 32-bit processors in the market.
o ARM, the company, does not fabricate chips; instead it sells the design as 'Intellectual Property'.
o The ARM family consists of members ARM7, 9, 10, 11 and Cortex versions.
o Lower power dissipation and good computational capability are the chief attributes of the ARM processor.
o The processor has a large set of registers, and operates in seven modes.
o It can be programmed in assembly and one of the IDEs available is Keil RVDK.
o The barrel shifter in the ALU has a lot of relevance, as it simplifies computations.
o ARM data processing instructions update the condition flags when 'S' is suffixed to them, which allows later instructions to execute conditionally.
o There is a link register (LR) for simplifying procedure calls.
o It has a special mechanism for handling immediate data which is bigger than 8 bits.
o It has multiple register load and store instructions.
o Stacks are needed when nested procedures are used.

QUESTIONS
1. List out the important features that make ARM ideal for embedded applications.
2. Name two aspects in the design of ARM which have made it a processor with 'low-power dissipation'.
3. What is the use of a cache for any processor?
8. How is the instruction LDR different from the pseudo instruction LDR?
9. Why is it that compare instructions don't need the suffix 'S'?
10. How is the write back operator used in the 'pre-indexed' mode of addressing?

EXERCISES
1. Write instructions for the following, without using any CISC type instruction.
a) move into R7, a byte multiplied by 8
b) move into R6, a word multiplied by 17
c) move into R5, a number divided by 8
2. What do the following instructions mean and what is accomplished?
a) ANDEQ R1, R2, R4
b) ADDHI R2, R4, R2
c) MOVAL R7, R5
d) SUBME R1, R2, R7
e) CMP R1, R2
f) TEST R1, R3
g) MOVGT R2, R5
h) ADDLT R5, R6, R7

a) Find the factorial of any number (the factorial should fit in a 32-bit register)
b) Do division using repeated subtraction
c) Find the sum of the first 100 natural numbers. Save the result in memory
d) Find the sum of 10 numbers stored in Read only memory; the result should be in Read/write memory
e) Store 15 numbers in memory and arrange them in
- descending order
- ascending order
f) Write a program with a procedure call
- without using stack
- using stack

In this chapter, you will learn
o The internal architecture of LPC 2148, a typical and popular ARM7 MCU
o The buses in this MCU
o The list of peripherals inside the chip
o The memory map of the peripherals
o The programming of the GPIO
o The programming of the timer unit
o The programming of the PWM unit
o How to use the serial communication unit
o The internal structure of ARM9 and Cortex-M3

Introduction
In the previous chapter, we made a thorough study of the core of ARM. The study was not exhaustive; there are many more aspects, features and advancements introduced with each new version of the architecture. The trick is to learn more as and when you need to use the chip for a specific application. What we did in the last chapter was assembly language programming, so that the computational capabilities of ARM, the processor, are clear.
In this chapter, we take a different approach: we examine ARM as a microcontroller, aka SoC (System on Chip). The application domain of this processor is in the embedded field. A number of peripherals are added inside the chip so as to make it a 'microcontroller', and as the peripherals increase in number and complexity, the chip is sufficient to make a complete system. The number and kind of peripherals needed depends on the application, but there are some peripherals which are more or less a standard feature in
most microcontrollers; examples are timers/counters, serial ports, general purpose I/O, etc. More advanced MCUs have I2C, SPI, RTC, PWM units and so on as internal peripherals. MCUs with more advanced cores have peripherals such as LCD controllers, CAN controllers, USB controllers, etc.
In this chapter, we will take a look at the internal block diagram and internal buses of some ARM MCUs, study a few selected peripherals and write a few programs for these peripherals.
All the programs presented in the chapter have been tested and verified on the LPC 2148 MCU using the Keil RVDK.
ARM MCUs are manufactured by different firms and so there are a variety of MCUs and peripheral boards in the market. We choose a few popular ones for our study.
We start with an MCU based on the ARM7 core. NXP (founded by Philips) has manufactured and popularized the LPC 21xx series, which is a set of MCUs with sufficient peripherals for a moderately complex application. Let's begin with the LPC 2148 MCU, which is one member of the LPC 214x series, the other members being LPC 2141/42/44/46. Data sheets and user manuals for the series are

[Photograph labels: SD adapter, Pot1, LPC214x, JTAG, RS-232, Speaker, LEDs, Reset, INT1, USB]
Figure 11.1a | Photograph of the LPC 2148 MCU on a MCB 2140 development board

[Block diagram of the LPC2141/42/44/46/48: ARM7TDMI-S core with test/debug interface (TMS, TDI, TCK, TDO, TRST), XTAL1, XTAL2 and RST pins, AHB bridge, VPB (VLSI Peripheral Bus), ports P0[31:28], P0[25:0] and P1[31:16], external interrupts up to EINT3, and USB lines D+ and D-]

11.2 | Features of the LPC 214x Family
The different functional blocks in this SoC family (which includes the 2141/42/44/46/48) are shown in Figure 11.1; let us attempt to understand some of them. The details of the ARM7 core were thoroughly covered in the previous chapter and so they are not repeated here.
The main features provided by this family are listed below. It is not necessary to go through the list comprehensively right now. Once the most important features are studied in detail, this list may be used as a back reference.
i) The core ARM7TDMI-S in a tiny LQFP64 package
ii) 8 KB to 40 KB of on-chip static RAM
iii) 32 KB to 512 KB of on-chip flash memory
iv) 128-bit wide interface/accelerator enables high-speed 60 MHz operation
v) USB 2.0 Full-speed compliant device controller with 2 KB of endpoint RAM. In addition, provides 8 KB of on-chip RAM accessible to USB by DMA
vi) 10-bit ADCs provide a total of 6/14 analog inputs
vii) Single 10-bit DAC provides variable analog output (LPC2142/44/46/48 only)
394 EMBEDDED SYSTEMS ARM-THE WORLD'S MOST POPULAR 32-BIT EMBEDDED PROCESSOR 395
iii) Two 32-bit tim ers/external event counters (with four captu re and four com pare ± /die Mode
channels each), PWM unit (six outp uts) and watchdog
In the idle m ode, instruction execution is suspended until either a reset or interrupt occurs.
iv) Low power real-tim e clock (RTC) with independent power and 32 kHz clock input
But peripheral functions can continue opera tion and may generate interrupts to
v) Multiple serial interfa ces including two UARTs (16C550), two fast 12C-bus
cause the processor to resum e execution. The idle mode elim inates power used by the
(400 Kbit/s), SPI and SSP with buffering and variable data lengt h capabilities
processor, m em ory system s, related controllers and internal buses.
vi) Vectored interrupt controller (VlC) with configurable priorities and vector addresses
vii) Up to 45 of 5 V tolera nt fast general purpose I/O pin s in a tiny LQF P64 packa ge Power-down Mode
viii) Up to 21 external interrupt pins available
In the power-down mode, the oscillator is shut down and the chip receives no internal
ix) 60 MHz maxim um CPU clock available from program m able on-chip PLL
clocks.
x) On-chip integrated oscill ator operates with an external crystal from 1 MHz to 25 MHz
This mode can be term inated and norm al operation resum ed by either a reset or
xi) Power saving m odes include idle and power-down
certain specific interrupts that are able to function without clocks. Since all dynam ic oper-
xii) Processor wake-up from power-down mode via external interrupt or BOD
ation of the chip is suspended, this m ode reduces chip power consum ption to alm ost zero.
xiii) Single power supply chip with POR and BOD circuits
We now take a more detailed look at the im portant featu res of the chip. It may be 11.2.4 ] Internal Buses
necessar y to refer to Figure 11.1 while discussing these features.
11.2.4.T ] AMBA
11.2.1 ] Memory 'Advanced Microcontroller Bus Architectu re' or AM BA is a standard defined by ARM
in 1996, for on-chip buses in its SoC designs. In Figure 11.2, a num ber of buses can be
The mem ory available includes up to 40KB static RAM and 512KB flash. In the case of
seen which form part of AM BA. The figure shows the bus structu re re-drawn to em pha-
LPC2146/48 only, an 8 KB SRAM block intended to be utilized mainly by the USB,can
size the functionality of the constitu ent buses of the AM BA standard. Three buses with
also be used as a general purpose RAM for data storage and code storage and execution.
different protocols and speeds have been defined, for catering to the different kinds of
com ponents present inside the chip.
11.2.2 ] Memory Map The fastest bus is the system or the local bus, which connects the processor core
The total memory space is 4 GB (corresponding to an internal address bus of 32 bits, i.e., with mem ory, as mem ory accesses have to be very fast. In the LPC 21xx series, serious
2= 4GB).It is a 'm em ory m apped 1/0' system in which peripherals and mem ory share thought has been given to the idea of speeding up special peripherals, and as such, a
the sam e memory space. GPIO (General Purpose 1/0) block is also connected to the local bus. This perm its
peripherals connected to this fast GPIO block, to use the high speed of the local bus.
11.2.3 I System Functions
The system functions include a crystal oscillator and a PLL (Phase Locked Loop). The Fast
oscillator frequency can be in the range of 10 to 25 MHz which can be multiplied up, ARM 7
GPIO
Core
to get a system frequency up to 60 MHz using the PLL. There is also the possibility of
AHB VIC
changing the system frequency dynam ically (using the PLL). When the system is idling,
Bridge
the frequency can be scaled down to reduce power dissipation.
1- - - - - - - - - - - - - - - - 1
11.2.3.2 ] Power Control I I
: VPB Peripherals :
One of the main featu res of ARM processors are their low-power dissipation Besides a
I I
basic low power design in the technological aspects, low power modes are also available:
they are the idle m ode and power-down m ode.
Figure 11.2 I The internal bus structure
Along with the core, an AHB bridge is seen. It connects to AMBA's high speed 'Advanced High-performance Bus' (AHB), which serves the 'Vectored Interrupt Controller'. The third bus is the VPB, which stands for VLSI Peripheral Bus; a bridge communicates between the low speed VPB and the higher speed AHB. The VPB is the one that connects to all the peripherals of the LPC 214x SoC.

11.2.4.2 | The VPB Bus and Divider
Figure 11.2 shows that there is a bridge that interfaces between the VPB and the AHB. There is also a VPB divider. This is a register whose settings can be used to divide the output frequency of the PLL so as to get a reduced clock frequency (1/2 or 1/4) for the VPB peripherals which need to operate at a frequency lower than the processor. The processor clock is designated as CCLK, while the peripheral clock is called PCLK. On reset, PCLK is 1/4 of CCLK.

11.2.5 | Memory Accelerator Module
Instructions are executed by fetching them from memory. In a typical MCU, the program code, that is, the instructions, is stored in flash memory. Flash memory is rather slow, which means that program execution gets slowed down, and so the very purpose of having a high speed processor is defeated. The easiest solution is to have a shadow RAM as in PCs, where the content of flash is copied to RAM (on startup) so that this (fast) RAM is accessed rather than the slow flash. Such a solution can be considered for our MCU as well, but here we are dealing with on-chip flash and RAM which are limited in size, so copying program code to on-chip RAM is not a feasible solution. Another possibility is to have a fast cache on the chip, but that will need additional hardware and increase the complexity of the chip.
The final solution to this has come in the form of a module named the 'memory accelerator module'. In simple terms, the memory accelerator module (MAM) attempts to have the next ARM instruction that will be needed in its latches in time to prevent CPU fetch stalls (see Figure 11.3).

Figure 11.3 | Simplified view of the memory accelerator module

In this setup, the flash memory is arranged as a bank of 128 bits such that each access to flash allows 128 bits to be accessed. This requires the flash to be organized as 4 memory modules, where each module has a width of 32 bits, giving effectively 128 bits at a time. In practice, the speed of memory does not get multiplied by 4, but it improves over the case of a flash memory with 32-bit access. All the extra hardware to get this done is in the memory controller, i.e., the MAM.

11.3 | Peripherals
Section 11.2 contains a list of the peripherals available in this chip. Each of the peripherals has addresses, and the peripherals use the memory mapped I/O scheme of addressing; this means that both memory and I/O share the same address space. The total address space is from 0x0 to 0xFFFF FFFF, i.e., a 4 GB space.
The memory map of the system is as shown in Figure 11.4.
What are the notable features in this memory map?
i) The 512 KB of flash (non-volatile) memory has addresses starting from 0x0.
ii) The 40 KB of static RAM has addresses starting at 0x4000 0000.
iii) There is a RAM allotted for USB with DMA applications.
iv) The peripherals attached to the AHB have addresses from 0xF000 0000.
v) The peripherals attached to the lower speed VPB bus have addresses from 0xE000 0000.

VPB Peripherals
Next, we will learn how to use the peripherals of the chip. Each peripheral has a number of special function registers (SFRs) associated with it, and each SFR has a specific address. To use a specific peripheral in the way we want, the associated registers should be written with the appropriate bits.

11.3.1.1 | Port 0
Port 0 is a 32-bit I/O port with individual direction controls for each bit. 28 pins of Port 0 can be used as general purpose bi-directional digital I/Os, while P0.31 provides digital output functions only. The operation of Port 0 pins depends upon the pin function selected via the pin connect block. Pins P0.24, P0.26 and P0.27 are 'reserved' and not available for use.
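The pin availability of Port 0 just described can be captured as two bit masks. The following is only a host-side sketch for checking the bookkeeping, not code for the target; the type and function names (p0_pin_class, p0_classify, p0_count) are hypothetical and the masks are built purely from the pins named in the text.

```c
#include <assert.h>
#include <stdint.h>

/* Classes of Port 0 pins as described in the text (names are illustrative). */
typedef enum {
    P0_BIDIRECTIONAL, /* usable as general purpose input or output */
    P0_OUTPUT_ONLY,   /* P0.31: digital output functions only */
    P0_RESERVED       /* P0.24, P0.26 and P0.27: not available */
} p0_pin_class;

/* Masks over the 32 bits of Port 0, one bit per pin. */
#define P0_RESERVED_MASK \
    ((UINT32_C(1) << 24) | (UINT32_C(1) << 26) | (UINT32_C(1) << 27))
#define P0_OUTPUT_ONLY_MASK (UINT32_C(1) << 31)

/* Classify pin P0.<pin>, where pin is 0..31. */
p0_pin_class p0_classify(unsigned pin)
{
    assert(pin < 32);
    uint32_t bit = UINT32_C(1) << pin;
    if (bit & P0_RESERVED_MASK)
        return P0_RESERVED;
    if (bit & P0_OUTPUT_ONLY_MASK)
        return P0_OUTPUT_ONLY;
    return P0_BIDIRECTIONAL;
}

/* Count how many of the 32 pins fall in the given class. */
unsigned p0_count(p0_pin_class wanted)
{
    unsigned n = 0;
    for (unsigned pin = 0; pin < 32; pin++)
        if (p0_classify(pin) == wanted)
            n++;
    return n;
}
```

Counting the bi-directional class gives 28, which agrees with the figure quoted above: 32 pins, less three reserved pins and one output-only pin.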
Figure 11.4 | Memory map of the SoC (on-chip non-volatile memory, starting at 0.0 GB)
Address range                 Total on-chip non-volatile memory
0x0000 0000 to 0x0000 7FFF    32 kB (LPC2141)
0x0000 0000 to 0x0000 FFFF    64 kB (LPC2142)
0x0000 0000 to 0x0001 FFFF    128 kB (LPC2144)
0x0000 0000 to 0x0003 FFFF    256 kB (LPC2146)
0x0000 0000 to 0x0007 FFFF    512 kB (LPC2148)

Figure 11.5 | Pinselect mux of pin P0.5 (mux inputs: GPIO, SPI0, Match 0.1 and AD0.7; selected by bits 11:10 of the PINSEL0 register)
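The two-bit select field of Figure 11.5 can be expressed in a few lines of C. The sketch below is host-side arithmetic only and is not one of the chapter's programs. It assumes that each pin P0.n (n = 0 to 15) owns bits 2n+1:2n of PINSEL0, which agrees with bits 11:10 for P0.5; the helper name pinsel0_with is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Return a copy of 'pinsel0' in which the two-bit function field of pin
   P0.<pin> is replaced by 'function' (0 = GPIO, 1 to 3 = alternate functions).
   Assumption: pin n (0..15) uses bits 2n+1:2n of PINSEL0. */
uint32_t pinsel0_with(uint32_t pinsel0, unsigned pin, unsigned function)
{
    assert(pin < 16 && function < 4);
    unsigned shift = 2u * pin;            /* position of the field */
    uint32_t mask = UINT32_C(3) << shift; /* the two bits of this pin */
    return (pinsel0 & ~mask) | ((uint32_t)function << shift);
}
```

For P0.5 the field sits at bits 11:10, so selecting function 3 in an otherwise cleared register gives 0x00000C00. On the target, the returned value would be written back to PINSEL0, so that the other pins keep their functions.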
Table 11.2 | Generalization of the Function of the Pinselect Register Bits
PINSEL0 and PINSEL1 bits    Function
00                          Primary function, typically GPIO
01                          First alternate function
10                          Second alternate function
11                          Third alternate function (reserved on some pins)

What Table 11.1 shows is that bits 11 and 10 of the PINSEL0 register should be 00 if pin No. 29 (P0.5) is to be a GPIO, 01 if it is to be used for SPI0, 10 if it is to be a match pin for Timer 0, and 11 if it is to be used as AD0.7.
In general, pin selection is as shown in Table 11.2.
The description of a pin shows that it can have 4 possible functions. The logic on the select bits decides the functionality of the port pin. For any pin, the bit configuration '00' of the corresponding pinselect register bits programs the pin to act as a general purpose I/O pin. By default, on reset, all port pins act as GPIO pins.

Example 11.1
Use the PINSEL0 register for activating the PWM1, PWM2 and PWM3 outputs, which are at pins P0.0, P0.7 and P0.1, respectively.

Solution
Refer to Appendix D, which gives the complete listing of the pin functions of the chip. PWM output pins are listed as the second alternate function of the port pins. The corresponding bits of the PINSEL0 register for activating these port pins to act as PWM outputs should be given the value 10. See Table 11.3.

Table 11.3
Output Function    Output Pin    Bits of PINSEL0 Register
PWM1               P0.0          1:0
PWM2               P0.7          15:14
PWM3               P0.1          3:2

Thus, the PINSEL0 register has the value 0000 0000 0000 0000 1000 0000 0000 1010 = 0x0000800A.

11.3.1.4 | Using GPIO Pins
These pins can be used for applications for which specific 'controllers/drivers' are not available inside the chip: for driving an LCD display, relays, motor controls, ON/OFF functions and so on. Four registers are available for this. They are shown in Figure 11.6 and listed as follows:
i) IODIR (IO Direction register): The bit setting of this register decides whether a pin is to be an input (0) or an output (1).
ii) IOSET (IO Set register): This register is used to set the output pins of the chip. To make a pin '1', the corresponding bit in the register is to be '1'. Writing zeros has no effect.
iii) IOCLR (IO Clear register): This register is used to make an output pin have a '0' value, i.e., to clear it. The corresponding bit in this register has to be '1'. Writing zeros has no effect.
iv) IOPIN (IO Pin register): From this register, the value of the corresponding pin can be read, irrespective of whether the pin is an input or output pin.

Figure 11.6 | Registers pertaining to one GPIO pin (SFRs IODIR, IOSET, IOCLR and IOPIN, for GPIO pins 0 to 31)

Example 11.2
Let us attempt to make the lower 16 GPIO pins output pins.
Then IODIR = 0x0000 FFFF
To send zeros to all these pins, IOCLR = 0x0000 FFFF.
Now, check the pin values in the register IOPIN, which will have the lower 16 bits as 0.
Now, if these pins are to be set, IOSET = 0x0000 FFFF
Check the pin values in the register IOPIN, which will have the lower 16 bits as 1.

Example 11.3
Generate an asymmetrical square wave at the lowest four pins of Port 0.

#include <LPC214x.h>
int main(void)
{
    unsigned int x;
    for (;;)
    {
        IODIR0 = 0xFFFFFFFF;          //Make all pins as outputs
        for (x = 0; x < 4; x++)       //Delay for the high part
            IOSET0 = 0x0000000F;      //Set the lowest four bits
        for (x = 0; x < 12; x++)      //Delay for the low part (three times the high part)
            IOCLR0 = 0x0000000F;      //Clear the lowest four bits
    }
}

Example 11.3 is a simple example to show the function of the four GPIO registers corresponding to Port 0. The use of Embedded C has been discussed in Chapter 9 and is not repeated here. This program sets and clears the lowest four bits of Port 0 at a certain asymmetric rate (note that the delays are different).
Figure 11.7 shows the output at port pins P0.0 to P0.3, viewed in the logic analyser which is available in the 'simulator' of Keil RVDK. LEDs may be connected to the port pins, and they will then go ON and OFF at the rate determined by the delay. In the program, the OFF time is three times the ON time.

Figure 11.7 | Output at the lowest four pins (P0.0 to P0.3) of Port 0 for Example 11.3

The contents of the registers in Figure 11.6 can be observed in the 'peripherals' part of the simulator.

Example 11.4

#include <LPC214x.h>
void wait(void)
{
    int d;
    for (d = 0; d < 1000000; d++);    //Software delay
}
int main(void)
{
    IODIR1 = 0x00010000;              //Make P1.16 an output
    while(1)
    {
        IOSET1 = 1<<16;               //P1.16 = 1
        wait();
        wait();
        IOCLR1 = 1<<16;               //P1.16 = 0
        wait();
    }
}

In Example 11.4, the delay is written as a separate function, wait, and this function is then called. To make the ON time twice the OFF time, the wait function is called twice when the pin is '1'.

Note: For both the above programs, we haven't used the pin select block for choosing the pin function. This is not necessary because, on reset, all port pins behave as GPIO pins.

11.3.2 | The Timer Unit
Next let's study the timers/counters of the LPC 2148. A timer and a counter are functionally equivalent, except that a timer uses the PCLK for its timing, while a counter uses an external source. This means that a counter is used to count external events via the capture inputs. A counter is also called a 'capture timer'.
Here, we discuss the timer function alone. There are two such units: Timer 0 and Timer 1. There are a number of registers associated with timer operations. Let's discuss the functionality of each of them for Timer 0, which we denote as T0. When we use Timer 1, we use similar registers, but with the notation T1.
The first step in timer operation is to load a number into what is called a 'match register'. Then a timer count register is started. This register keeps incrementing for each PCLK cycle or at a lower, pre-scaled rate. When the content of this timer count register becomes equal to the value in the match register, i.e., a match occurs, the delay that has elapsed from the starting time can be used for our 'timing'. Figure 11.8 illustrates the idea of timing done using the timer unit.
Now, let us understand the special function registers associated with Timer 0. Keep in mind that a similar set of registers exists for Timer 1 as well.
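The idea of timing by a match can be put into numbers before looking at the registers. The sketch below is a host-side calculation, not target code. It assumes that the count starts at 0, that a match value N is reached after N + 1 counts, and that a prescale value PR makes the count advance once every PR + 1 cycles of PCLK; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* PCLK cycles from timer start until the count equals the match value.
   Assumptions: counting starts at 0, a match value N takes N + 1 counts,
   and each count takes (prescale + 1) PCLK cycles. */
uint64_t timer_match_ticks(uint32_t match, uint32_t prescale)
{
    /* Both values at their maximum would overflow the 64-bit product. */
    assert(!(match == UINT32_MAX && prescale == UINT32_MAX));
    return ((uint64_t)match + 1u) * ((uint64_t)prescale + 1u);
}

/* The same delay in nanoseconds (rounded down) for a clock of pclk_hz. */
uint64_t timer_match_delay_ns(uint32_t pclk_hz, uint32_t match, uint32_t prescale)
{
    assert(pclk_hz > 0);
    uint64_t ticks = timer_match_ticks(match, prescale);
    assert(ticks <= UINT64_MAX / UINT64_C(1000000000)); /* keep the product in range */
    return ticks * UINT64_C(1000000000) / pclk_hz;
}
```

With a PCLK of 15 MHz (taken here only as an example value) and a match value of 0xFF without prescaling, this gives 256 cycles, or about 17 µs.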
Figure 11.8 | Simplified block diagram illustrating the operation of the timer unit (PCLK drives the timer through the prescaler)

Timer Control Register (T0TCR)
This is an 8-bit register (Figure 11.9) in which only the lowest two bits need be used.
Bit 0-E: This bit is the Enable bit. When this bit is '1', the counter is enabled and starts. Then, the count in T0TC is incremented for every cycle of PCLK (if prescaling is not used).
Bit 1-R: This bit is the Reset bit. When '1', the counter is reset on the next positive edge of PCLK.

Match Registers (MR0 to MR3)
There are four 32-bit match registers available: MR0 to MR3. For the operation of one timer, one of the match registers may be sufficient, and it is used by loading a number into it.
During timer operation, the timer count register starts incrementing, and at some time, its count 'matches' the number in the match register. When this match occurs, some action can be programmed to be done by 'configuring' the bits of the 'match control register'.

Match Control Register (T0MCR)
This is a 16-bit register (Figure 11.10) used to specify the event to occur when the match occurs.

11.3.2.2 | Timer Operation
i) Load a number in a match register.
ii) Start the timer by enabling the 'E' bit in T0TCR.
iii) The timer count register (T0TC) starts incrementing for every tick of the peripheral clock PCLK (if no prescaling is done).
iv) When the content of T0TC equals the value in the match register, timing is said to have occurred.
v) One of many possibilities can be made to occur when this happens.
vi) The possibilities are to reset the timer count register, stop the timer, or generate an interrupt. This 'setting' is done in the T0MCR register.
Now let's design a very simple timer for generating a symmetric square wave at P1.16, using Timer 0.

Example 11.5

#include <LPC214x.h>
void wait(void);
int main(void)
{
    T0MR0 = 0x000000FF;               //Load a number in the match register
    T0MCR = 4;                        //Stop when match occurs
    while(1)
    {
        IODIR1 = 0x00010000;          //Make P1.16 an output
        IOSET1 = 1<<16;               //P1.16 = 1
        wait();
        IOCLR1 = 1<<16;               //P1.16 = 0
        wait();
    }
}
void wait(void)
{
    T0TCR = 1;                        //Start the timer
    while(!(T0TC == T0MR0));          //Until T0TC = MR0
    T0TCR = 2;                        //Reset the counter
    T0TC = 0;                         //Make the timer count reg = 0
}

Now let's calculate the frequency of the square wave generated. The above program has been tested on a board in which the peripheral clock (PCLK) is obtained by dividing the processor clock of 60 MHz by 4 (using the VPB divider register settings). Thus PCLK is 15 MHz, and it has a period of T = 0.067 µs.
The function 'wait' creates a delay which is equal to 256 periods of PCLK (as T0MR0 = 0xFF), i.e., 256 × 0.067 µs, which is about 17 µs. This delay is half the period of the square wave generated at P1.16. The period of the signal is thus about 34 µs and the frequency is about 29.4 kHz.
Next, change the match value to 0xFFFF. A similar calculation gives a frequency of 114 Hz. Figure 11.11(a) and (b) show the square waves generated for match values of 0xFF with T = 34 µs and 0xFFFF with T = 8.7 ms.

Figure 11.11 | Square wave from the timer with (a) T0MR0 = 0xFF and (b) T0MR0 = 0xFFFF

See Example 11.6, which shows the part of Example 11.5 which has been modified by the additional instruction T0PR = 1. We now get a square wave frequency of 57 Hz for T0MR0 = 0xFFFF. Without prescaling, the frequency would have been 114 Hz.

Example 11.6

#include <LPC214x.h>
void wait(void);
int main(void)
{
    T0MR0 = 0x0000FFFF;               //Load a number in the match register
    T0MCR = 4;                        //Stop when the match is reached
    T0PR = 1;                         //Prescale value
    while(1)
    //The rest of the program is the same as in Example 11.5

Table 11.4a | MR0 = 0xFFFF
T0PR    Division Factor    Frequency at P1.16
0       None               114 Hz
1       2                  57 Hz
2       3                  38 Hz
4       5                  22.8 Hz

Table 11.4b | MR0 = 0xFF
T0PR    Division Factor    Frequency at P1.16
0       None               29.4 kHz
1       2                  14.7 kHz
2       3                  9.8 kHz
3       4                  7.35 kHz

Tables 11.4(a) and 11.4(b) show the frequencies generated by Example 11.6 for match values of 0xFFFF and 0xFF, for different pre-scaling factors.

11.3.3 | Timer 0 in the Interrupt Mode
Next, we write a program to operate Timer 0 in the interrupt mode. Example 11.7 is such a program. To understand how the interrupt mechanism is incorporated here, a brief introduction to the interrupt structure of the chip is necessary.
The discussion starts on a general note, and converges to the use of Timer 0 in the interrupt mode. This will help us to use any other peripheral in the interrupt mode by using the associated registers of the peripheral and its related interrupt registers. You can, for instance, try to write programs for PWM and UARTs in the interrupt mode. Example 11.7 is a program for generating a square wave at pin P1.16. The calculation for the frequency at this pin is the same as presented in Section 11.3.2.3.
To help in understanding interrupts clearly, instructions of this program are referred to at every step of the forthcoming discussion.

Example 11.7

#include <LPC214x.h>
unsigned int x = 0;
__irq void Timer0_ISR (void)
{
    x ^= 1;                           //Toggle the flag on every match
    if (x)
        IOSET1 = 1<<16;               //P1.16 = 1
    else
        IOCLR1 = 1<<16;               //P1.16 = 0
    T0IR = 0x01;                      //Clear match 0 interrupt
    VICVectAddr = 0;                  //Signal the end of the ISR to the VIC
}
int main(void)
{
    IODIR1 = 0x00FF0000;              //P1.16..23 defined as outputs
    IOCLR1 = 0x00FF0000;              //Clear P1.16..23
    T0TCR = 0x00000000;               //Disable timer counter 0
    T0PR = 0x00000002;                //Prescaler value
    T0MCR = 0x00000003;               //Enable interrupt and reset on match
    T0MR0 = 0xFF;                     //MR0 value
    VICVectAddr4 = (unsigned)Timer0_ISR; //Set the timer ISR vector address
    VICVectCntl4 = 0x00000024;        //Set channel
    VICIntEnable = 0x00000010;        //Enable the TIMER-0 interrupt
    T0TCR = 0x00000001;               //Enable timer counter 0
    for (;;);
}

11.3.3.1 | Vectored Interrupt Controller (VIC)
See Figure 11.1, in which the block diagram of the LPC 2148 has a VIC as a peripheral. It is the VIC that manages all the interrupts of the ARM core (IRQs and FIQs) as well as interrupt requests from the peripherals.

Features of VIC
• 32 interrupt request inputs
• 16 vectored IRQ interrupts
• 16 priority levels dynamically assigned to interrupt requests
• Software interrupt generation

The vectored interrupt controller (VIC) takes 32 interrupt request inputs and programmably assigns them into 3 categories: FIQ, vectored IRQ and non-vectored IRQ. The programmable assignment scheme means that priorities of interrupts from various peripherals can be dynamically assigned and adjusted.
Fast interrupt requests (FIQ) have the highest priority. If more than one request is assigned to FIQ, the VIC ORs the requests to produce the FIQ signal to the ARM processor.
Vectored IRQs have the middle priority, but only 16 of the 32 requests can be assigned to this category. Any of the 32 requests can be assigned to any of the 16 vectored IRQ slots, among which slot 0 has the highest priority and slot 15 has the lowest.
Non-vectored IRQs have the lowest priority.
Figure 11.12 | The VIC's connection to ARM (interrupt requests IRQ 1 to IRQ N enter the VIC, which drives the IRQ input of the ARM core)

registers, but the VIC associates Timer 0 with just one interrupt line.
Next, we will briefly discuss some of the important registers associated with the VIC.

11.3.3.3 | Interrupt Enable Register (VICIntEnable)
This is a read/write accessible register. This register controls which of the 32 interrupt requests and software interrupts are allowed to contribute to the generation of an interrupt.

Table 11.6 | Bit Definitions in the VIC Interrupt Enable Register for a Few Interrupt Sources
Bit       7         6         5          4          3             2             1      0
Symbol    UART 1    UART 0    TIMER 1    TIMER 0    ARM Core 1    ARM Core 0    -      WDT
Access    R/W       R/W       R/W        R/W        R/W           R/W           R/W    R/W

For example, see Table 11.6, which is part of this register's bit definitions for the interrupting peripherals. It is seen that the bit position for Timer 0 is '4', and hence bit 4 of this register is to be set if Timer 0 is to be operated in the interrupt mode. Example 11.7 uses the instruction

VICIntEnable = 0x00000010; //Enable the TIMER-0 interrupt

The vector address registers (VICVectAddr0 to VICVectAddr15) are read/write accessible registers. They hold the addresses of the interrupt service routines (ISRs) for the 16 vectored IRQ slots.
In Example 11.7,

VICVectAddr4 = (unsigned)Timer0_ISR; //Set the timer ISR vector address

This instruction indicates that the ISR is at the address 'Timer0_ISR', which is the starting location (address) of the ISR.
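The VIC register values used for Timer 0 in Example 11.7 can be derived rather than memorised. The following host-side sketch is not part of the chapter's programs, and its function names are hypothetical. It assumes that Timer 0 is interrupt source 4, as in Table 11.6, and that a vector control register holds the source number in bits 4:0 with bit 5 acting as the slot enable, which is consistent with the value 0x00000024 used in the example.

```c
#include <assert.h>
#include <stdint.h>

/* Mask to write to VICIntEnable to enable one interrupt source:
   bit n of the register corresponds to source n. */
uint32_t vic_enable_mask(unsigned source)
{
    assert(source < 32);
    return UINT32_C(1) << source;
}

/* Value for a vector control register (for example VICVectCntl4).
   Assumption: bits 4:0 carry the source number, bit 5 enables the slot. */
uint32_t vic_vect_cntl(unsigned source)
{
    assert(source < 32);
    return UINT32_C(0x20) | (uint32_t)source;
}
```

For source 4 the two helpers return 0x00000010 and 0x00000024, the values written to VICIntEnable and VICVectCntl4 in Example 11.7. Another peripheral in the interrupt mode would differ only in its source number and in the slot chosen for it.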
11.3.3.7 | Timer 0 Interrupt Register (T0IR)
This register has bits for each of the matching states of MR0 to MR3. When a timer operates in the interrupt mode and a match occurs, an interrupt is generated, and the corresponding flag bit in T0IR is set. To 'clear' it, a '1' must be written into this same register. Only then will the interrupt flag be 'reset'.
Table 11.8 shows that the bit corresponding to match channel 0 of Timer 0 in T0IR is bit 0. In Example 11.7, the instruction used is

T0IR = 0x01; //Clear match 0 interrupt

Table 11.8 | Bits of T0IR for the Interrupts Generated When 'Match' Occurs
Bit    Symbol           Description
0      MR0 Interrupt    Interrupt flag for match channel 0.
1      MR1 Interrupt    Interrupt flag for match channel 1.
2      MR2 Interrupt    Interrupt flag for match channel 2.
3      MR3 Interrupt    Interrupt flag for match channel 3.

When MCUs have dedicated PWM units, they have registers which can be programmed for the required frequency of the pulse train as well as the duty cycle. For this ARM7 MCU, the PWM unit works in a manner similar to a timer unit. Also, if its PWM mode is not enabled, it works only as a timer.
There is a free running timer register (PWMTC), whose count is matched against any of the seven 32-bit match registers (MR0 to MR6). The match register values are continuously compared to the count in the timer register, which increments (with pre-scaler) when started. One match register (MR0) is dedicated to deciding the frequency of the pulse train, by resetting the count upon match. The other match registers can be used to decide the pulse ON time.
i) All single edge controlled PWM outputs go high at the beginning of a PWM cycle.
ii) Each PWM output will go low when its match value (in MR1 to MR6) is reached. If no match occurs (i.e., the match value is greater than the PWM rate), the PWM output remains continuously high.
iii) When a match occurs, actions can be triggered automatically. The possible actions are to generate an interrupt, reset the PWM timer counter, or stop the timer. Actions are controlled by the settings in the PWMMCR register.

Note: In double edge controlled PWM, both the rising and falling edges of the PWM waveform are controlled by the match registers. We limit the discussion here to just the single edge controlled PWM.

Figure 11.13 | Pulse trains at different duty cycles (25%, 50% and 75%)
Figure 11.14 | A pulse train showing the period T and ON time P

11.3.4.2 | Calculating the Frequency
MR0 is used to decide the frequency of the pulse train. As calculated for the timer (Section 11.3.2.3), PCLK (15 MHz) has a period of 0.067 µs. If PWMMR0 is loaded
with a number N, it counts from 0 to N. A match occurs after 0.067 µs × (N + 1), and this is the period T of the pulse train generated. Refer to Table 11.9 for a sample set of calculations.

Table 11.9 | Calculation of the Period of the Pulse for Different Values of PWMMR0
N in PWMMR0    T (µs)                       Frequency (f) = 1/T
0xFF           256 × 0.067 = 17.152         58.3 kHz
0x5E           95 × 0.067 = 6.365           157 kHz
0xFFFF         65,536 × 0.067 = 4390.9      227 Hz

11.3.4.3 | Calculating the Duty Cycle
The duty cycle is the ratio of the ON period (P) to the total period T. As per Figure 11.14, it is P/T, expressed as a percentage.
There are six match registers for deciding the pulse ON time. We will consider the simplest case of single edge controlled PWM. Let's use the match register PWMMR3. In this case, when the timer count value matches the value in PWMMR3, the PWM output pin goes low. This ends the high period of the pulse. See Table 11.10 for sample calculations based on the value of MR3, for T = 6.365 µs.

Table 11.10 | Calculating the Duty Cycle for Different Values of PWMMR3
N in PWMMR0    Value in PWMMR3    P (µs)    Duty cycle = P/T
0x5E           0xF                1.004     15.5 %
0x5E           0x2E               3.15      49.4 %

Example 11.8
Calculate the values to be given in PWMMR0 and PWMMR3 to get a pulse train of period 5 ms and duty cycle of 25%.

Solution
5000 µs = (N + 1) × 0.067 µs
(N + 1) = 74,626 = 0x12382
The number to be loaded in PWMMR0 is 1 less than this, i.e., it is 0x12381.
A 25% duty cycle corresponds to 1.25 ms
(N1 + 1) × 0.067 µs = 1.25 ms
Calculating, N1 = 18,655
The number to be loaded in PWMMR3 = 18,655 = 0x48DF.

11.3.4.4 | The PWM Output Pins
Corresponding to the six match registers, there are six PWM output pins, and they are called the PWM channels. The pins and the PWMPCR register bits for enabling each of them are given in Table 11.11.

Table 11.11 | List of the PWM Output Channels and Corresponding Port Pins
PWM No    Channel No    Output Pin
PWM1      1             P0.0
PWM2      2             P0.7
PWM3      3             P0.1
PWM4      4             P0.8
PWM5      5             P0.21
PWM6      6             P0.9

11.3.4.5 | Control Registers of the PWM Unit
There are a number of registers for this application, and the usage of each of them can be found in the manual of the chip. Here, we only discuss a few.

PWMTCR: This is the PWM Timer Control Register. This is an 8-bit register. Only the lower 4 bits of this register need to be used (Figure 11.15).
Bit 0-CE: COUNTER ENABLE. When '1', the PWM timer counter and prescale counter are enabled.
Bit 1-CR: COUNTER RESET. When '1', the PWM timer counter and prescale counter are reset on the next positive edge of PCLK. They remain reset until this bit is made '0'.
Bit 2-R: Reserved.
Bit 3-PE: PWM ENABLE. When '1', the PWM mode is enabled. Otherwise the PWM unit acts as just a timer.
In Example 11.11, PWMTCR = 0x9, i.e., the CE and PE bits are set.

Figure 11.15 | Important bits of PWMTCR (bit 3: PE, bit 2: R, bit 1: CR, bit 0: CE)

PWMPCR: This is the PWM Control Register. This is a 16-bit register and is used to enable and select the type of each PWM channel.
This register enables or disables the six PWM outputs, and also chooses between double and single edge control. Bits 0, 1, 7, 8 and 15 are unused. Table 11.12 shows the state of the bits of PWMPCR for choosing between single and double edge control.

Table 11.12 | Choosing Between Single and Double Edge Control Using Bits of PWMPCR
Bit No of PWMPCR    When Bit = 0                    When Bit = 1
PWMPCR.2            Single edge control for PWM2    Double edge control for PWM2
PWMPCR.3            Single edge control for PWM3    Double edge control for PWM3
PWMPCR.4            Single edge control for PWM4    Double edge control for PWM4
PWMPCR.5            Single edge control for PWM5    Double edge control for PWM5
PWMPCR.6            Single edge control for PWM6    Double edge control for PWM6
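The arithmetic of Example 11.8 can also be done from the PCLK frequency directly, which avoids rounding the clock period to 0.067 µs. The sketch below is a host-side calculation under stated assumptions, not the chapter's method: it takes PCLK as exactly 15 MHz, applies the same N + 1 rule used above to both the period and the ON time, and uses hypothetical function names.

```c
#include <assert.h>
#include <stdint.h>

/* Number of PCLK cycles in 'time_us' microseconds, rounded to the nearest cycle. */
uint32_t pclk_ticks(uint32_t pclk_hz, uint32_t time_us)
{
    uint64_t ticks = ((uint64_t)pclk_hz * time_us + 500000u) / 1000000u;
    assert(ticks <= UINT32_MAX);
    return (uint32_t)ticks;
}

/* Match value N for a required time: the counter runs from 0 to N,
   so N is one less than the number of cycles (the N + 1 rule). */
uint32_t pwm_match_for_time(uint32_t pclk_hz, uint32_t time_us)
{
    uint32_t ticks = pclk_ticks(pclk_hz, time_us);
    assert(ticks >= 1);
    return ticks - 1u;
}
```

For a 5 ms period this gives 74,999 for PWMMR0, and for a 1.25 ms ON time it gives 18,749 for PWMMR3. These differ by about 0.5% from the 74,625 and 18,655 of Example 11.8 only because the example rounds the PCLK period to 0.067 µs.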
418 EMBEDDED SYSTEMS ARM-THE WORLD'S MOST POPULAR 32-BIT EMBEDDED PROCESSOR 419
°
Ht- Ty«T »
L
p
C
Po o[_] ► PWM1
PO.7 I PWM2
Figure 11.17 I The output waveform obtained from Example 11.1 O 2
1
4
8
Next, let's consider the genera tion of m ore than one PWM pulse. Exam ple 11.11 »» PWM3
P0.1 ~
is a progra m which gets PWM outp uts from channels 1, 2 and 3. Note that all PWM
outputs will occur at the sam e repetition rate. I
Example 11.11
#include <LPC214x.h>

void PWM_Init(void)
{
    PINSEL0 |= 0x0000800A;   //Enable PWM 1, 2 and 3 outputs
    PWMPR = 0x00000001;      //Prescale value = 1
    PWMPCR = 0x00000E00;     //Pins of PWM 1, 2 and 3 enabled
    PWMMR0 = 0xFF;           //Set PWM frequency
    PWMTCR = 0x00000009;     //Enable the counter and PWM mode
    PWMMCR = 0x00000002;     //Reset on PWMMR0
}

int main(void)
{
    PWM_Init();
    while(1)
    {
        PWMMR1 = 0x30;       //Pulse on time at PWM1 channel
        PWMMR2 = 0x50;       //Pulse on time at PWM2 channel
        PWMMR3 = 0x23;       //Pulse on time at PWM3 channel
        PWMLER = 0xE;        //Latch the new values of PWMMR1 to PWMMR3
    }
}

Figure 11.18 | Output pins for the PWM channels (PWM1 on P0.0, PWM2 on P0.7, PWM3 on P0.1)
Figure 11.19 | Output waveforms from three PWM channels

Figure 11.18 shows the output pins on which the PWM signals are obtained and
Figure 11.19 shows the PWM output waveforms. Since the prescale value is 1, the
PWM timer counter advances once every two PCLK cycles, so the basic time T
(calculated with a count of 0xFF) is multiplied by 2 to get T = 34.3 µsecs.
The pulse ON times are also multiplied by 2 to get the values shown in Table 11.14.

Table 11.14 | Values of the duty cycles obtained from Example 11.11
Channel   Output pin   P (ON time), µsecs    Duty cycle
PWM1      P0.0         3.38 x 2 = 6.76       19.7%
PWM2      P0.7         5.32 x 2 = 10.64      31.02%
PWM3      P0.1         2.21 x 2 = 4.42       12.8%
T = 34.3 µsecs

11.3.5 | The UART
This chip has two UARTs, namely, UART0 and UART1. To understand the operation
of these, first observe the simplified block diagram of the UARTs of the chip. For any
of the registers referred to here, you must add the prefix U0 or U1 depending on which
unit (UART0 or UART1) is being used. The three important units are the transmitter,
receiver and the baud rate generator blocks.
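The role of the baud rate generator block can be made concrete with a little arithmetic. On the LPC2148, with the fractional divider left at its reset default, the baud rate is PCLK / (16 x divisor), where the 16-bit divisor is split across the divisor latch registers DLL (low byte) and DLM (high byte). The following is a minimal host-side sketch of that calculation. The structure and function names are hypothetical, and the clock and baud values used with it are illustrative assumptions rather than values from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two divisor latch bytes
   (U0DLL/U0DLM for UART0, U1DLL/U1DLM for UART1). */
struct uart_divisor {
    uint8_t dll;  /* low byte of the 16-bit divisor  */
    uint8_t dlm;  /* high byte of the 16-bit divisor */
};

/* Hypothetical helper: nearest divisor for baud = PCLK / (16 * divisor),
   assuming the fractional divider is left at its reset default. */
struct uart_divisor uart_divisor_for(uint32_t pclk_hz, uint32_t baud)
{
    assert(baud != 0u);
    uint32_t denom = 16u * baud;
    uint32_t divisor = (pclk_hz + denom / 2u) / denom;  /* round to nearest */
    assert(divisor >= 1u && divisor <= 0xFFFFu);
    struct uart_divisor d = { (uint8_t)(divisor & 0xFFu), (uint8_t)(divisor >> 8) };
    return d;
}

/* Baud rate actually produced by a divisor pair (integer division). */
uint32_t uart_actual_baud(uint32_t pclk_hz, struct uart_divisor d)
{
    uint32_t divisor = ((uint32_t)d.dlm << 8) | d.dll;
    assert(divisor != 0u);
    return pclk_hz / (16u * divisor);
}
```

For an assumed PCLK of 15 MHz and a target of 9600 baud, uart_divisor_for(15000000u, 9600u) yields DLL = 98 and DLM = 0, which corresponds to an actual rate of about 9566 baud. On the chip, the two bytes would be written to U0DLL and U0DLM (or U1DLL and U1DLM) while the divisor latch access bit of the line control register is set.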
With this, we conclude our discussion of serial communication. For more details of
the registers used and the interrupt mode of transmission, do refer to the user manual
of LPC2148.
11.3.6 | The SSP Unit
This unit performs serial communication using the SPI protocol (refer to Section 5.2.2).
Appendix I contains a program which interfaces an SD card to the LPC2148 using the SSP
unit. It may be necessary to refer to the manual of the chip, and gain an understanding
of the registers of the SSP unit, to get a clear understanding of the interfacing program.
Section 19.2 discusses a complete product developed using the LPC2148 MCU.
With this, our discussion of a typical ARM7 MCU ends. Note that the one that we
have used is only one among the numerous versions of ARM7 available in the market.
But a basic understanding of this chip, its peripherals and programming will help in
understanding any other ARM7 MCU chip.
Next we will take a look at typical ARM9 and Cortex MCUs. Going beyond an overview
of these is outside the scope of this book. We will look at them from a block
diagram point of view, just to observe their complexity and power.

11.4 | ARM9
The ARM9 core is a more advanced member of the ARM family (compared to ARM7).
It has a 5-stage pipeline and operates at a frequency range of approximately double that
of ARM7. Many ARM9 cores have DSP instructions and thus are 'Enhanced' ARM9E
processors. Because the core is so powerful, it is used for more complex operations, and
an MCU based on ARM9 typically has more peripherals than an ARM7-based MCU.
Here we will take a look at a particular ARM9 board developed by NXP; it contains
an MCU of the LPC29xx series. The user manual describes the features of this
board in this way:
'The LPC 29xx combine an 125 MHz ARM968E-S CPU core, Full Speed USB
2.0 host and device (LPC 2927/29 only), CAN and LIN, 56 KB SRAM, up to 768 KB
flash memory, external memory interface, three 10-bit ADCs, and multiple serial and
parallel interfaces in a single chip'.
It is obvious that this is a very powerful chip with many more peripherals than the
ARM7 MCU that we have just studied. Figure 11.22 shows the internal block diagram
of the chip, in which you can observe its advanced features and peripheral structure.

Figure 11.22 | Internal block diagram of the LPC2917/19/01 (blocks labelled in the figure include the JTAG interface, GPDMA controller and registers, vectored interrupt controller, AHB to DTL and AHB to APB bridges, PWM0/1/2/3, event router, chip feature ID, 3.3 V ADC1/2, 5 V ADC0, general purpose I/O ports 0/1/2/3, quadrature encoder and WDT)
Conclusion
With this, we conclude our discussion of ARM peripheral interfacing. The programs in
the chapter have been tested and confirmed to be working as per the specifications for
which they have been designed (i.e., frequency, pulse width, etc.). Only a few peripherals
of ARM7 have been discussed, but the methodology used for those blocks is expected to
help in understanding the rest of them. The important point is that the registers of the
unit to be used must be understood with a high degree of clarity. For ARM9 and Cortex
MCUs, only the degree of complexity has been shown; programming them can be done
on similar lines, as has been done for ARM7.
o There are two serial communication units named UART0 and UART1.
o Using the SSP unit, an SD card can be interfaced to the chip.
o ARM9 and Cortex MCUs are much more complex and powerful, and have a larger number
  of peripherals.
QUESTIONS
1. Name five peripherals in the LPC2148 MCU.
2. What is the difference between PCLK and CCLK?
3. What is the necessity for having the MAM module? How does it function?
4. Distinguish between the power-down and idle modes of this MCU.
5. What is meant by the term 'AMBA'?
6. Differentiate between the different internal buses in terms of speed and function.
7. Look at the memory map and find out the extent of memory locations for static RAM and
   flash ROM.
8. For a GPIO pin to be made to act as an ON/OFF switch, which are the registers to be used?
   Give an example to illustrate the use of these registers.
9. How does the prescaler in a timer unit function?
10. Distinguish between single and double edge PWM.

EXERCISES
Write programs to obtain the following waveforms:
1. Generate a symmetrical square wave at four pins of Port 1, using software delay.
2. Generate an asymmetric square wave at four pins of Port 0 using software delay.
3. Using Timer 1, obtain a symmetric square wave of frequency 10 kHz at one pin of Port 1,
   and another square wave of frequency 90 kHz at one pin of Port 0 using Timer 0. Both
   waveforms should be simultaneously present.
4. Using Timer 0, generate an asymmetric square waveform at four pins of Port 1. The square
   wave should have an ON time of 0.1 msec and an OFF time of 0.35 msec.
5. Generate PWMs at the six output pins of the PWM unit, with duty cycles of 10, 20, 30, 40,
   50 and 70%.

In this chapter, you will learn
o The history and application range of PSoC devices
o The distinct and special features of PSoC
o The differences between PSoC1, PSoC3 and PSoC5
o The internal architecture of PSoC1
o The GUI of PSoC Designer
o The digital blocks of PSoC1
o The working principle of Switched Capacitor circuits
o The finer details of the analog blocks
o How to do the interconnections on the GUI for digital and analog blocks
o The programming of PSoC1
o The enhancements available for PSoC3 and PSoC5

Introduction
The term SoC was mentioned in Chapter 1. So we know that an MCU with a large
number of peripherals is called an SoC, for 'System on Chip'. Each of the peripherals on
an SoC is usually programmable, and so the term PSoC can have a general meaning.
But in this chapter, we discuss a very specific product line of Cypress Semiconductors
designated as PSoC. We discuss the special features of Cypress's PSoC, which has
become very popular in the embedded systems world and has found many applications
for itself. PSoC is a family of embedded processors with a simple 8-bit M8C core in
PSoC1, a more sophisticated 8-bit 8051 core in PSoC3, and an advanced 32-bit ARM
core in PSoC5.
In this chapter, we will concentrate more on the PSoC1 architecture and usage.
The aim is to introduce the reader to this series of MCUs which are versatile, easy to
understand and use, and have many features that other MCUs do not possess. The best
way to learn is to get a PSoC development kit and do a project based on one of the chips
belonging to this family. This chapter introduces you to PSoC and analyses why PSoC is
a good point to 'take off' into the embedded design world.