CDT25 Cache Memory
Motivation (Why should students learn these topics?): Cache memory is used to reduce the average time to access data from the main memory.
Lecture Learning Outcomes (LLOs): After completion of this lecture, you should be able to…
LLO1: Compare various mapping techniques of cache memory
Direct Mapping:
[Figure: Direct-mapped cache. Main-memory blocks 0 through 4095 map onto the 128 cache blocks, each stored with a tag; e.g., block 128 maps to cache block 0 and block 129 to cache block 1.]
Block j of the main memory maps onto block j modulo 128 of the cache: block 0 maps to cache block 0, block 129 maps to cache block 1, and so on. More than one memory block is mapped onto the same position in the cache, which may lead to contention for cache blocks even if the cache is not full. The contention is resolved by allowing the new block to replace the old block, leading to a trivial replacement algorithm.
Memory address is divided into three fields:
- The low-order 4 bits select one of the 16 words in a block.
- When a new block is brought into the cache, the next 7 bits determine which cache block position it is placed in.
- The high-order 5 bits identify which of the 32 main-memory blocks that map to this position is currently present in the cache. These are the tag bits.
This technique is simple to implement but not very flexible. A sketch of the address decomposition follows this list.
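As an illustration (not from the lecture notes; the address value is arbitrary), the split of a 16-bit word address into 5 tag, 7 block, and 4 word bits can be expressed with shifts and masks:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint16_t addr = 0x1234;               /* arbitrary 16-bit word address */
    unsigned word  = addr & 0xF;          /* low-order 4 bits: word within block */
    unsigned block = (addr >> 4) & 0x7F;  /* next 7 bits: cache block position   */
    unsigned tag   = addr >> 11;          /* high-order 5 bits: tag              */
    /* Main-memory block number j = addr >> 4; it maps to cache block
       j % 128, which equals the 'block' field extracted above. */
    printf("tag=%u block=%u word=%u\n", tag, block, word);
    return 0;
}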
Associative Mapping:
In associative mapping, a main-memory block can be placed into any cache block position; the 12-bit tag identifies which memory block currently occupies a cache block, and the tags of all cache blocks must be examined to check for a hit.
[Figure: Associative-mapped cache. Any of the 4096 main-memory blocks can reside in any of the 128 cache blocks. The address is divided into a 12-bit Tag field and a 4-bit Word field.]
Set-Associative Mapping:
[Figure: Set-associative-mapped cache with two blocks per set. The 128 cache blocks are grouped into 64 sets; main-memory blocks 0, 64, 128, ... map into set 0.]
Blocks of the cache are grouped into sets, and the mapping function allows a block of the main memory to reside in any block of a specific set. Here the cache is divided into 64 sets, with two blocks per set. Memory blocks 0, 64, 128, etc. map into set 0, and they can occupy either of the two block positions within the set.
Memory address is divided into three fields:
- The low-order 4 bits select one of the 16 words in a block, as before.
- A 6-bit set field determines the set number.
- The high-order 6 bits form the tag, which is compared with the tag fields of the two blocks in the set.
Set-associative mapping is a combination of direct and associative mapping. The number of blocks per set is a design parameter:
- One extreme is to have all the blocks in one set, requiring no set bits (fully associative mapping).
- The other extreme is to have one block per set, which is the same as direct mapping.
A sketch showing how the field widths change with the number of blocks per set follows this list.
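The sketch below (illustrative, not from the lecture notes) computes the Tag/Set/Word widths of the 16-bit address for the 128-block cache for several choices of blocks per set:

#include <stdio.h>

/* Number of bits needed to index n entries (n a power of two). */
static int log2i(int n) { int b = 0; while (n > 1) { n >>= 1; b++; } return b; }

int main(void) {
    const int cache_blocks = 128, word_bits = 4, addr_bits = 16;
    int ways[] = {1, 2, 4, 128};   /* 1 = direct-mapped, 128 = fully associative */
    for (int i = 0; i < 4; i++) {
        int set_bits = log2i(cache_blocks / ways[i]);
        int tag_bits = addr_bits - set_bits - word_bits;
        printf("%3d blocks/set: Tag=%2d Set=%d Word=%d\n",
               ways[i], tag_bits, set_bits, word_bits);
    }
    return 0;
}

For two blocks per set this gives the 6-bit set and 6-bit tag fields described above; for one block per set it reproduces the direct-mapped 7/5 split, and for 128 blocks per set the 12-bit associative tag.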
REPLACEMENT ALGORITHMS
In a direct-mapped cache, the position of each block is predetermined by its address;
hence, the replacement strategy is trivial. In associative and set-associative caches there
exists some flexibility. When a new block is to be brought into the cache and all the
positions that it may occupy are full, the cache controller must decide which of the old
blocks to overwrite. This is an important issue, because the decision can be a strong
determining factor in system performance.
In general, the objective is to keep blocks in the cache that are likely to be referenced in
the near future. But, it is not easy to determine which blocks are about to be referenced.
The property of locality of reference in programs gives a clue to a reasonable strategy.
Because program execution usually stays in localized areas for reasonable periods of
time, there is a high probability that the blocks that have been referenced recently will
be referenced again soon. Therefore, when a block is to be overwritten, it is sensible to overwrite the one that has gone the longest time without being referenced. This block is called the least recently used (LRU) block, and the technique is called the LRU replacement algorithm.
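One simple way to realize LRU (a sketch under the assumption of a per-block "last used" stamp; real hardware typically uses more compact counters) is shown below for a single four-block set:

#include <stdio.h>

#define WAYS 4

int main(void) {
    int tag[WAYS], last_used[WAYS], valid[WAYS] = {0};
    int refs[] = {5, 9, 5, 3, 7, 1};   /* arbitrary sequence of block tags */
    int n = sizeof refs / sizeof refs[0];

    for (int now = 0; now < n; now++) {
        int t = refs[now], hit = 0, victim = 0;
        for (int w = 0; w < WAYS; w++)             /* search the set */
            if (valid[w] && tag[w] == t) { last_used[w] = now; hit = 1; }
        if (hit) { printf("tag %d: hit\n", t); continue; }
        for (int w = 0; w < WAYS; w++) {           /* miss: pick a victim */
            if (!valid[w]) { victim = w; break; }  /* empty block first   */
            if (last_used[w] < last_used[victim]) victim = w;  /* else LRU */
        }
        printf("tag %d: miss, placed in block %d of the set\n", t, victim);
        valid[victim] = 1; tag[victim] = t; last_used[victim] = now;
    }
    return 0;
}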
The following example shows how the mapping technique and the replacement algorithm affect cache performance. A 4 × 10 array A is stored in main memory in column order, and the data cache can hold eight blocks, each consisting of a single word. Note that no bits are needed to identify a word within a block, as was done in Figures 8.16 through 8.18, because we have assumed that each block contains only one word. The application normalizes the elements of the first row of A with respect to the average value of the elements in the row. Hence, we need to compute the average of the elements in the row and divide each element by that average. The required task can be expressed as

A(0, i) ← A(0, i) / AVE, for i = 0, 1, ..., 9, where AVE = (A(0, 0) + A(0, 1) + ... + A(0, 9)) / 10.

We use the variables SUM and AVE to hold the sum and average values, respectively. These variables, as well as index variables i and j, are held in processor registers during the computation.
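A minimal sketch of the corresponding two-loop program (the loops referred to below as Figure 8.20; the sample data is made up):

#include <stdio.h>

#define N 10

int main(void) {
    double A[4][N];                      /* 4 x 10 array, as in the example */
    for (int j = 0; j < N; j++) A[0][j] = j + 1.0;   /* arbitrary sample data */

    double SUM = 0.0;
    for (int j = 0; j < N; j++)          /* first loop: j = 0, 1, ..., 9  */
        SUM = SUM + A[0][j];
    double AVE = SUM / N;
    for (int i = N - 1; i >= 0; i--)     /* second loop: i = 9, 8, ..., 0 */
        A[0][i] = A[0][i] / AVE;

    for (int j = 0; j < N; j++) printf("%.3f ", A[0][j]);
    printf("\n");
    return 0;
}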
Direct-Mapped Cache
In a direct-mapped data cache, the contents of the cache change as shown in Figure 8.21. The
columns in the table indicate the cache contents after various passes through the two program
loops in Figure 8.20 are completed. For example, after the second pass through the first loop (j =
1), the cache holds the elements A(0, 0) and A(0, 1). These elements are in block positions 0 and
4, as determined by the three least-significant bits of the address. During the next pass, the A(0,
0) element is replaced by A(0, 2), which maps into the same block position. Note that the
desired elements map into only two positions in the cache, thus leaving the contents of the
other six positions unchanged from whatever they were before the normalization task started.
Elements A(0, 8) and A(0, 9) are loaded into the cache during the ninth and tenth passes
through the first loop (j = 8, 9). The second loop reverses the order in which the elements are
handled. The first two passes through this loop (i = 9, 8) find the required data in the cache.
When i = 7, element A(0, 9) is replaced with A(0, 7). When i = 6, element A(0, 8)
is replaced with A(0, 6), and so on. Thus, eight elements are replaced while the second loop is
executed. In total, there are only two hits during execution of this task. The reader should keep in
mind that the tags must be kept in the cache for each block. They are not shown to keep the figure
simple.
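To make the count concrete, the access pattern can be replayed in a small simulation (my sketch; it assumes A is stored in column order starting at word address 0, so that A(0, j) is at address 4j and maps to block (4j) mod 8):

#include <stdio.h>

#define BLOCKS 8
#define N 10

int main(void) {
    int cache_tag[BLOCKS], valid[BLOCKS] = {0}, hits = 0, misses = 0;

    int trace[2 * N], n = 0;
    for (int j = 0; j < N; j++)      trace[n++] = 4 * j;  /* first loop  */
    for (int i = N - 1; i >= 0; i--) trace[n++] = 4 * i;  /* second loop */

    for (int k = 0; k < n; k++) {
        int block = trace[k] % BLOCKS;   /* three least-significant bits */
        int tag   = trace[k] / BLOCKS;   /* remaining high-order bits    */
        if (valid[block] && cache_tag[block] == tag) hits++;
        else { misses++; valid[block] = 1; cache_tag[block] = tag; }
    }
    printf("hits=%d misses=%d\n", hits, misses);   /* prints hits=2 misses=18 */
    return 0;
}

The two hits are the first two passes of the second loop (i = 9 and i = 8), matching the discussion above.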
Associative-Mapped Cache
Figure 8.22 presents the changes in cache contents for the case of an associative-mapped cache.
During the first eight passes through the first loop, the elements are brought into consecutive block
positions, assuming that the cache was initially empty. During the ninth pass (j = 8), the LRU
algorithm chooses A(0, 0) to be overwritten by A(0, 8). In the next and last pass through the j loop,
element A(0, 1) is replaced with A(0, 9). Now, for the first eight passes through the second loop (i =
9, 8,..., 2) all the required elements are found in the cache.
When i = 1, the element needed is A(0, 1), so it replaces the least recently used element, A(0, 9).
During the last pass, A(0, 0) replaces A(0, 8). In this case, when the second loop is executed, only
two elements are not found in the cache. In the direct-mapped case, eight of the elements had to be
reloaded during the second loop. Obviously, the associative-mapped cache benefits from the
complete freedom in mapping a memory block into any position in the cache.
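The same trace can be replayed against a fully associative cache with LRU replacement (again a sketch, under the same address assumptions as before):

#include <stdio.h>

#define BLOCKS 8
#define N 10

int main(void) {
    int tag[BLOCKS], last_used[BLOCKS], valid[BLOCKS] = {0};
    int hits = 0, misses = 0;

    int trace[2 * N], n = 0;
    for (int j = 0; j < N; j++)      trace[n++] = 4 * j;  /* first loop  */
    for (int i = N - 1; i >= 0; i--) trace[n++] = 4 * i;  /* second loop */

    for (int now = 0; now < n; now++) {
        int t = trace[now], hit = 0, victim = 0;
        for (int b = 0; b < BLOCKS; b++)           /* search all blocks */
            if (valid[b] && tag[b] == t) { last_used[b] = now; hit = 1; }
        if (hit) { hits++; continue; }
        misses++;
        for (int b = 0; b < BLOCKS; b++) {         /* LRU victim choice */
            if (!valid[b]) { victim = b; break; }
            if (last_used[b] < last_used[victim]) victim = b;
        }
        valid[victim] = 1; tag[victim] = t; last_used[victim] = now;
    }
    printf("hits=%d misses=%d\n", hits, misses);   /* prints hits=8 misses=12 */
    return 0;
}

Changing the second loop of the trace to ascending order makes every access of that loop a miss under both policies, which is the effect noted in the next paragraph.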
In both cases, better utilization of the cache is achieved by reversing the order in which the elements
are handled in the second loop of the program. It is interesting to consider what would happen if
the second loop dealt with the elements in the same order as in the first loop. Using either direct
mapping or the LRU algorithm, all elements would be overwritten before they are used in the
second loop (see Problem 8.10).
Set-Associative-Mapped Cache
For this example, we assume that a set-associative data cache is organized into two sets, each
capable of holding four blocks. Thus, the least-significant bit of an address determines which set a
memory block maps into, but the memory data can be placed in any of the four blocks of the set.
The high-order 15 bits of the address constitute the tag.
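As a quick illustration (my sketch; the address value is arbitrary), extracting the set and tag for this organization takes one mask and one shift:

#include <stdio.h>

int main(void) {
    unsigned addr = 0x7A25;      /* arbitrary 16-bit word address     */
    unsigned set  = addr & 0x1;  /* least-significant bit: set number */
    unsigned tag  = addr >> 1;   /* high-order 15 bits: tag           */
    printf("set=%u tag=0x%X\n", set, tag);
    return 0;
}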
A block-set-associative cache consists of a total of 64 blocks, divided into 4-block sets. The main memory
contains 4096 blocks, each consisting of 32 words. Assuming a 32-bit byte-addressable address space,
how many bits are there in each of the Tag, Set, and Word fields?
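A sketch of the solution, assuming each word is 4 bytes (the word size is not stated in the problem): a block holds 32 words × 4 bytes = 128 bytes, so 7 address bits locate a byte within a block (5 bits select the word and 2 bits the byte within the word). The cache has 64 / 4 = 16 sets, so the Set field needs 4 bits. The Tag field takes the remaining 32 − 4 − 7 = 21 bits of the byte address. (If the address were treated as word-addressable instead, the Word field would be 5 bits and the Tag 32 − 4 − 5 = 23 bits.)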