Module 5 Notes
CACHES:
• A cache is a small, fast array of memory placed between the processor core and main
memory that stores portions of recently referenced main memory.
• The goal of a cache is to reduce the memory access bottleneck imposed on the processor
core by slow memory.
• Often used with a cache is a write buffer—a very small first-in-first-out (FIFO) memory
placed between the processor core and main memory. The purpose of a write buffer is to
free the processor core and cache memory from the slow write time associated with
writing to main memory.
• Since cache memory only represents a very small portion of main memory, the cache fills
quickly during program execution.
• Once full, the cache controller frequently evicts existing code or data from cache memory
to make more room for the new code or data.
• This eviction process tends to occur randomly, leaving some data in cache and removing
other data.
Memory Hierarchy
• The figure reviews some of this information to show where a cache and write buffer fit in the
hierarchy.
• The innermost level of the hierarchy is at the processor core.
• This memory is so tightly coupled to the processor that in many ways it is difficult to
think of it as separate from the processor. This memory is known as a register file.
• Also at the primary level is main memory. It includes volatile components such as SRAM
and DRAM.
ARM processors Module - 5
• The figure shows the relationship that a cache has with the main memory system and the
processor core.
• The upper part shows the system without a cache, and the lower part shows it with a cache.
• If a cached core supports virtual memory, the cache can be located between the core and the
memory management unit (MMU), or between the MMU and physical memory.
• Placement of the cache before or after the MMU determines the addressing realm the
cache operates in and how a programmer views the cache memory system.
Cache Architecture
• ARM uses two bus architectures in its cached cores, the Von Neumann and the Harvard.
• A different cache design is used to support the two architectures.
• In processor cores using the Von Neumann architecture, there is a single cache used for
instruction and data. This type of cache is known as a unified cache.
• The Harvard architecture has separate instruction and data buses to improve overall
system performance, but supporting the two buses requires two caches.
• In processor cores using the Harvard architecture, there are two caches: an instruction
cache (I-cache) and a data cache (D-cache). This type of cache is known as a split cache.
• The tag field is the portion of the address that is compared to the cache-tag value found in
the directory store.
• The comparison of the tag with the cache-tag determines whether the requested data is in
cache or represents another of the million locations in main memory with an ending
address of 0x824.
• During a cache line fill the cache controller may forward the loading data to the core at
the same time it is copying it to cache; this is known as data streaming.
• If valid data exists in this cache line but represents another address block in main memory,
the entire cache line is evicted and replaced by the cache line containing the requested
address. This process of removing an existing cache line as part of servicing a cache miss is
known as eviction.
• A direct-mapped cache is a simple solution, but there is a design cost inherent in having a
single location available to store a value from main memory.
• Direct-mapped caches are subject to high levels of thrashing—a software battle for the
same location in cache memory.
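The tag/index/offset split described above can be sketched in C. The geometry here is an assumption for illustration only (a 4 KB direct-mapped cache with 16-byte lines, so 256 lines): bits [3:0] are the byte offset within a line, bits [11:4] are the set index, and the remaining bits are the tag compared against the cache-tag in the directory store.

```c
#include <stdint.h>

/* Assumed geometry (illustrative, not a specific ARM core):
   4 KB direct-mapped cache, 16-byte lines -> 256 lines. */
#define LINE_BYTES 16u
#define NUM_LINES  256u

static uint32_t cache_offset(uint32_t addr) { return addr & (LINE_BYTES - 1); }
static uint32_t cache_index(uint32_t addr)  { return (addr / LINE_BYTES) % NUM_LINES; }
static uint32_t cache_tag(uint32_t addr)    { return addr / (LINE_BYTES * NUM_LINES); }

/* Two addresses thrash in a direct-mapped cache when they share a set
   index but differ in tag: each access evicts the other's line. */
static int will_thrash(uint32_t a, uint32_t b)
{
    return cache_index(a) == cache_index(b) && cache_tag(a) != cache_tag(b);
}
```

For example, addresses 0x824 and 0x1824 share the same index but carry different tags, so under this assumed geometry they would repeatedly evict each other.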
Set Associativity
• Set associativity is a structural design feature that divides the cache memory into smaller
equal units, called ways.
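A sketch of how ways change the indexing, with an assumed 4 KB cache split into 4 ways of 64 lines each (illustrative numbers, not a specific ARM core). The 256 lines divide into 64 sets, so the set index shrinks from 8 bits to 6, and an address may now live in any of the 4 lines of its set rather than in exactly one.

```c
#include <stdint.h>

/* Assumed geometry: 4 KB cache, 16-byte lines, 4 ways.
   256 lines / 4 ways = 64 sets. */
#define LINE_BYTES 16u
#define NUM_WAYS    4u
#define NUM_SETS   64u

static uint32_t set_index(uint32_t addr) { return (addr / LINE_BYTES) % NUM_SETS; }
static uint32_t set_tag(uint32_t addr)   { return addr / (LINE_BYTES * NUM_SETS); }

/* Every address has NUM_WAYS candidate lines (one per way), so two
   addresses with the same set index no longer must evict each other
   as they would in a direct-mapped cache. */
static unsigned candidate_lines(uint32_t addr)
{
    (void)addr;             /* the same for every address */
    return NUM_WAYS;
}
```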
Write Buffers
• A write buffer is a very small, fast FIFO memory buffer that temporarily holds data that
the processor would normally write to main memory.
• In a system with a write buffer, data is written at high speed to the FIFO and then emptied
to slower main memory.
• The write buffer reduces the processor time taken to write small blocks of sequential data
to main memory. The FIFO memory of the write buffer is at the same level in the memory
hierarchy as the L1 cache and is shown in Figure
• The efficiency of the write buffer depends on the ratio of main memory writes to the
number of instructions executed.
• A write buffer also improves cache performance; the improvement occurs during cache
line evictions. If the cache controller evicts a dirty cache line, it writes the cache line to the
write buffer instead of main memory.
• Data written to the write buffer is not available for reading until it has exited the write
buffer to main memory.
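The FIFO behavior above can be modelled in a few lines of C. The depth and the word-indexed memory array are assumptions for illustration only; real write buffers are tiny hardware FIFOs with core-specific depths.

```c
#include <stdint.h>

/* Minimal FIFO write-buffer model (illustrative depth, not a real core).
   The core enqueues writes at full speed; wb_drain models the slow
   write to main memory. Note that data is not readable again until it
   has drained, matching the behavior described in the text. */
#define WB_DEPTH 8
typedef struct { uint32_t addr, data; } wb_entry;
typedef struct { wb_entry e[WB_DEPTH]; int head, tail, count; } write_buffer;

static int wb_put(write_buffer *wb, uint32_t addr, uint32_t data)
{
    if (wb->count == WB_DEPTH) return 0;      /* full: the core must stall */
    wb->e[wb->tail] = (wb_entry){addr, data};
    wb->tail = (wb->tail + 1) % WB_DEPTH;
    wb->count++;
    return 1;
}

/* Drain one entry to main memory (modelled as a plain word array). */
static int wb_drain(write_buffer *wb, uint32_t *mem)
{
    if (wb->count == 0) return 0;
    mem[wb->e[wb->head].addr] = wb->e[wb->head].data;
    wb->head = (wb->head + 1) % WB_DEPTH;
    wb->count--;
    return 1;
}
```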
• The miss rate has a similar form: the total cache misses divided by the total number of
memory requests, expressed as a percentage over a time interval. Note that the miss rate
also equals 100 minus the hit rate.
• hit time—the time it takes to access a memory location in the cache.
• miss penalty—the time it takes to load a cache line from main memory into cache.
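These definitions combine into the usual average-access-time estimate: hit time plus miss rate times miss penalty. The numbers below are purely illustrative, not measurements from any ARM core.

```c
/* Miss rate as a percentage: total misses / total memory requests. */
static double miss_rate_pct(unsigned misses, unsigned requests)
{
    return 100.0 * misses / requests;
}

/* Average access time = hit time + miss rate * miss penalty.
   E.g. (assumed numbers) a 90% hit rate, 1-cycle hit time, and a
   10-cycle miss penalty give 1 + 0.1 * 10 = 2 cycles on average. */
static double avg_access_cycles(double hit_rate_pct, double hit_time,
                                double miss_penalty)
{
    return hit_time + (100.0 - hit_rate_pct) / 100.0 * miss_penalty;
}
```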
• There are two methods to control access to system resources, unprotected and protected.
• An unprotected system relies solely on software to protect the system resources.
• A protected system relies on both hardware and software to protect the system resources.
• An unprotected embedded system has no hardware dedicated to enforcing the use of
memory and peripheral devices during operation.
• A protected system has dedicated hardware to check and restrict access to system
resources. It can enforce resource ownership.
• A protected system is proactive in preventing one task from using the resources of
another.
Cache Policy
There are three policies that determine the operation of a cache: the write policy, the replacement
policy, and the allocation policy. The cache write policy determines where data is stored during
processor write operations. The replacement policy selects the cache line in a set that is used for
the next line fill during a cache miss. The allocation policy determines when the cache controller
allocates a cache line.
When the processor core writes to memory, the cache controller has two alternatives for its write
policy. The controller can write to both the cache and main memory, updating the values in both
locations; this approach is known as writethrough. Alternatively, the cache controller can write to
cache memory and not update main memory; this is known as writeback or copyback.
12.3.1.1 Writethrough
When the cache controller uses a writethrough policy, it writes to both cache and main memory
when there is a cache hit on write, ensuring that the cache and main memory stay coherent at all
times. Under this policy, the cache controller performs a write to main memory for each write to
cache memory. Because of the write to main memory, a writethrough policy is slower than a
writeback policy.
12.3.1.2 Writeback
When a cache controller uses a writeback policy, it writes to valid cache data memory and not to
main memory. Consequently, valid cache lines and main memory may contain different data. The
cache line holds the most recent data, and main memory contains older data, which has not been
updated.
Caches configured as writeback caches must use one or more of the dirty bits in the cache line
status information block. When a cache controller using writeback writes a value to cache memory, it
sets the dirty bit true. If the core accesses the cache line at a later time, it knows by the state of the
dirty bit that the cache line contains data not in main memory. If the cache controller evicts a dirty
cache line, it is automatically written out to main memory. The controller does this to prevent the
loss of vital information held in cache memory and not in main memory.
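The dirty-bit protocol just described can be sketched in C. A single cache line stands in for the whole cache here, and the word-indexed memory array is an assumption for brevity; the point is only the ordering: a write hit touches cache alone and sets the dirty bit, and memory is updated only when a dirty line is evicted.

```c
#include <stdint.h>

/* One cache line with the status bits named in the text. */
typedef struct { uint32_t tag, data; int valid, dirty; } cache_line;

/* Writeback write hit: update only the cached copy and mark it dirty,
   so the controller knows cache holds newer data than main memory. */
static void cache_write_hit(cache_line *l, uint32_t data)
{
    l->data  = data;
    l->dirty = 1;
}

/* Eviction: a valid, dirty victim is written back to main memory
   first, preventing the loss of data held only in cache. */
static void evict(cache_line *l, uint32_t *mem)
{
    if (l->valid && l->dirty)
        mem[l->tag] = l->data;
    l->valid = l->dirty = 0;
}
```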
One performance advantage a writeback cache has over a writethrough cache is in the frequent use
of temporary local variables by a subroutine. These variables are transient in nature and never
really need to be written to main memory. An example of one of these
transient variables is a local variable that overflows onto a cached stack because there are not
enough registers in the register file to hold the variable.
On a cache miss, the cache controller must select a cache line from the available set in cache
memory to store the new information from main memory. The cache line selected for replacement
is known as a victim. If the victim contains valid, dirty data, the controller must write the dirty data
from the cache memory to main memory before it copies new data into the victim cache line. The
process of selecting and replacing a victim cache line is known as eviction.
The strategy implemented in a cache controller to select the next victim is called its replacement
policy. The replacement policy selects a cache line from the available associative member set; that
is, it selects the way to use in the next cache line replacement. To summarize the overall process,
the set index selects the set of cache lines available in the ways, and the replacement policy selects
the specific cache line from the set to replace.
ARM cached cores support two replacement policies, either pseudorandom or round-robin.
§ Round-robin or cyclic replacement simply selects the next cache line in a set to replace.
The selection algorithm uses a sequential, incrementing victim counter that increments
each time the cache controller allocates a cache line. When the victim counter reaches a
maximum value, it is reset to a defined base value.
§ Pseudorandom replacement randomly selects the next cache line in a set to replace. The
selection algorithm uses a nonsequential incrementing victim counter. In a pseudorandom
replacement algorithm the controller increments the victim counter by randomly selecting
an increment value and adding this value to the victim counter. When the victim counter
reaches a maximum value, it is reset to a defined base value.
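The two victim counters can be contrasted in a short sketch. The way count of 4 is assumed, and the linear congruential generator merely stands in for the core's hardware randomness source, purely for illustration.

```c
/* Round-robin: a sequential victim counter that increments by 1 on
   each allocation and wraps back to the base value. */
static unsigned rr_next_victim(unsigned counter, unsigned ways)
{
    return (counter + 1) % ways;
}

/* Pseudorandom: the counter advances by a randomly selected increment.
   The LCG here is an illustrative stand-in for the hardware source. */
static unsigned pr_next_victim(unsigned counter, unsigned ways,
                               unsigned *seed)
{
    *seed = *seed * 1103515245u + 12345u;        /* assumed LCG step */
    return (counter + 1 + (*seed >> 16) % ways) % ways;
}
```

The round-robin counter makes the next victim fully predictable, which is why the text notes its predictability advantage, while the pseudorandom counter deliberately is not.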
Most ARM cores support both policies (see Table 12.1 for a comprehensive list of ARM
cores and the policies they support). The round-robin replacement policy has greater
predictability, which is desirable in an embedded system. However, a round-robin
replacement policy is subject to large changes in performance given small changes in
memory access.
There are two strategies ARM caches may use to allocate a cache line after the occurrence of a
cache miss. The first strategy is known as read-allocate, and the second strategy is known as
read-write-allocate.
A read allocate on cache miss policy allocates a cache line only during a read from main memory.
If the victim cache line contains valid data, then it is written to main memory before the cache line
is filled with new data.
Under this strategy, a write of new data to memory does not update the contents of the cache
memory unless a cache line was allocated on a previous read from main memory. If the cache line
contains valid data, then a write updates the cache and may update main memory if the cache
write policy is writethrough. If the data is not in cache, the controller writes to main memory only.
A read-write allocate on cache miss policy allocates a cache line for either a read or write to
memory. Any load or store operation made to main memory, which is not in cache memory,
allocates a cache line. On memory reads the controller uses a read-allocate policy.
On a write, the controller also allocates a cache line. If the victim cache line contains valid data,
then it is first written back to main memory before the cache controller fills the victim cache line
with new data from main memory. If the cache line is not valid, it simply does a cache line fill.
After the cache line is filled from main memory, the controller writes the data to the corresponding
data location within the cache line. The cached core also updates main memory if it is a
writethrough cache.
The ARM7, ARM9, and ARM10 cores use a read-allocate on miss policy; the Intel XScale
supports both read-allocate and write-allocate on miss. Table 12.1 provides a listing of the policies
supported by each core.
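The allocation decision above reduces to a simple predicate. The enum and function names are illustrative, not an ARM API:

```c
/* Allocation-policy decision sketch: given a cache miss, does the
   controller allocate (fill) a cache line? */
typedef enum { READ_ALLOCATE, READ_WRITE_ALLOCATE } alloc_policy;

static int allocates_on_miss(alloc_policy p, int is_write)
{
    if (p == READ_ALLOCATE)
        return !is_write;   /* only a read miss fills a line;
                               a write miss goes to memory alone */
    return 1;               /* read-write-allocate fills on either */
}
```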
There are several coprocessor 15 registers used to specifically configure and control ARM cached
cores. Table 12.2 lists the coprocessor 15 registers that control cache configuration. Primary CP15
registers c7 and c9 control the setup and operation of cache. The secondary CP15:c7 registers are
write only and are used to clean and flush the caches. The CP15:c9 register defines the victim pointer base address,
which determines the number of lines of code or data that are locked in cache. We discuss these
commands in more detail in the sections that follow. To review the general use of coprocessor 15
instructions and syntax, see Section 3.5.2.
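As a sketch of the CP15:c7 flush commands mentioned above: the exact secondary register and opcode fields are core dependent, and the encoding shown here is the one commonly documented for ARMv4/v5 cores such as the ARM920T, so it is an assumption to check against the specific core's technical reference manual, not a portable routine.

```asm
; Sketch: flush (invalidate) both caches via CP15:c7.
; The Rd value should be zero for this operation.
        MOV   r0, #0
        MCR   p15, 0, r0, c7, c7, 0   ; flush I-cache and D-cache
        MOV   pc, lr                  ; return to caller
```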
There are other CP15 registers that affect cache operation; the definition of these registers is core
dependent. These other registers are explained in Chapter 13 in Sections 13.2.3 and 13.2.4 on
initializing the MPU, and in Chapter 14 in Section 14.3.6 on initializing the MMU.
In the next several sections we use the CP15 registers listed in Table 12.2 to provide example
routines to clean and flush caches, and to lock code or data in cache. The control system usually
calls these routines as part of its memory management activities.