
Department of CSN, KITSW | U18CN305 COMPUTER ARCHITECTURE AND ORGANIZATION | 3CSN2 | AY: 2023-24

CDT25 - LECTURE SUMMARY

CDT25 Cache Memory


Topics Covered

Motivation (Why should you learn these topics?): Cache memory is used to reduce the average time to access data from the main memory.

Lecture Learning Outcomes (LLOs): After completion of this lecture, you should be able to...
LLO1 (on Topic 1): Compare various mapping techniques of cache memory.

CDT25 – Lecture Summary – Key Takeaways


Mapping functions:
Mapping functions determine how memory blocks are placed in the cache.
A simple processor example:
The cache consists of 128 blocks of 16 words each, for a total cache size of 2048 (2K) words. Main memory is addressable by a 16-bit address, so it contains 64K words, organized as 4K blocks of 16 words each.
Three mapping functions:
- Direct mapping
- Associative mapping
- Set-associative mapping

Direct Mapping:
[Figure: Direct-mapped cache. Main memory blocks 0 to 4095 map onto the 128 cache blocks (block 0, block 1, ..., block 127), each of which stores a tag. Main memory address: Tag (5 bits) | Block (7 bits) | Word (4 bits).]
Block j of the main memory maps onto block j modulo 128 of the cache; for example, memory block 0 maps to cache block 0, and memory block 129 maps to cache block 1. More than one memory block is therefore mapped onto the same position in the cache, which may lead to contention for cache blocks even if the cache is not full. Contention is resolved by letting the new block replace the old one, so the replacement algorithm is trivial.
Memory address is divided into three fields:
- Low-order 4 bits select one of the 16 words in a block.
- When a new block is brought into the cache, the next 7 bits determine which cache block it is placed in.
- High-order 5 bits are the tag bits; they identify which of the 32 possible memory blocks is currently present in that cache position.
Direct mapping is simple to implement but not very flexible. The sketch below illustrates the field split.
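To illustrate, a minimal Python sketch of this three-field split (the function name and the example value are ours, not part of the lecture):

# Split a 16-bit address for the direct-mapped example above:
# 16-word blocks, 128 cache blocks, 5-bit tags.
def split_direct(addr):
    word = addr & 0xF             # low-order 4 bits: word within block
    block = (addr >> 4) & 0x7F    # next 7 bits: cache block position
    tag = (addr >> 11) & 0x1F     # high-order 5 bits: tag
    return tag, block, word

# Memory block 129 starts at address 129 * 16 = 2064 and maps to cache block 1.
print(split_direct(2064))         # prints (1, 1, 0)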

Associative Mapping:
[Figure: Associative-mapped cache. Any main memory block (0 to 4095) can be placed in any of the 128 cache blocks, each of which stores a 12-bit tag. Main memory address: Tag (12 bits) | Word (4 bits).]

A main memory block can be placed into any cache position.

Memory address is divided into two fields:
- Low-order 4 bits identify the word within a block.
- High-order 12 bits are the tag bits, which identify a memory block when it is resident in the cache.
Associative mapping is flexible and uses cache space efficiently. A replacement algorithm is needed to choose which existing block to evict when the cache is full. Cost is higher than for a direct-mapped cache because all 128 tag patterns must be searched to determine whether a given block is in the cache, as the sketch below illustrates.
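A minimal Python sketch of this search, assuming a list cache_tags that holds the tag stored in each of the 128 positions (the names are ours):

def split_assoc(addr):
    word = addr & 0xF      # low-order 4 bits: word within block
    tag = addr >> 4        # high-order 12 bits: tag
    return tag, word

def lookup(cache_tags, addr):
    tag, word = split_assoc(addr)
    for pos, t in enumerate(cache_tags):   # every stored tag must be compared
        if t == tag:
            return pos, word               # hit: block found at this position
    return None                            # miss: block not in the cache

cache_tags = [None] * 128
print(lookup(cache_tags, 2064))            # miss on an empty cache: None

In hardware this comparison is performed in parallel over all 128 tags, which is what makes the associative organization costly.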
Set-Associative Mapping:

[Figure: Set-associative-mapped cache with two blocks per set (64 sets). Main memory address: Tag (6 bits) | Set (6 bits) | Word (4 bits).]

Blocks of the cache are grouped into sets, and the mapping function allows a block of the main memory to reside in any block of a specific set. For the example above, divide the cache into 64 sets with two blocks per set. Memory blocks 0, 64, 128, etc. then map to set 0, and each can occupy either of the two positions within that set.
Memory address is divided into three fields (a sketch of this split follows the lists below):
- Low-order 4 bits identify the word within a block.
- The next 6 bits determine the set number.
- High-order 6 bits are the tag bits, which are compared with the tag fields of the two blocks in the selected set.
Set-associative mapping is a combination of direct and associative mapping.
The number of blocks per set is a design parameter:
- One extreme is to have all the blocks in one set, requiring no set bits (fully associative mapping).
- The other extreme, one block per set, is the same as direct mapping.
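A minimal Python sketch of the field split (the names are ours):

def split_set_assoc(addr):
    word = addr & 0xF             # low-order 4 bits: word within block
    set_no = (addr >> 4) & 0x3F   # next 6 bits: set number
    tag = (addr >> 10) & 0x3F     # high-order 6 bits: tag
    return tag, set_no, word

# Memory blocks 0, 64 and 128 all land in set 0, with distinct tags:
for block in (0, 64, 128):
    print(split_set_assoc(block * 16))   # (0, 0, 0), (1, 0, 0), (2, 0, 0)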

REPLACEMENT ALGORITHMS
In a direct-mapped cache, the position of each block is predetermined by its address;
hence, the replacement strategy is trivial. In associative and set-associative caches there
exists some flexibility. When a new block is to be brought into the cache and all the
positions that it may occupy are full, the cache controller must decide which of the old
blocks to overwrite. This is an important issue, because the decision can be a strong
determining factor in system performance.
In general, the objective is to keep blocks in the cache that are likely to be referenced in
the near future. However, it is not easy to determine which blocks are about to be referenced.
The property of locality of reference in programs gives a clue to a reasonable strategy.
Because program execution usually stays in localized areas for reasonable periods of
time, there is a high probability that the blocks that have been referenced recently will
be referenced again soon. Therefore, when a block is to be overwritten, it is sensible to

overwrite the one that has gone the longest time without being referenced. This block is
called the least recently used (LRU) block, and the technique is called the LRU
replacement algorithm.
To use the LRU algorithm, the cache controller must track references to all blocks as
computation proceeds. Suppose it is required to track the LRU block of a four-block set
in a set-associative cache. A 2-bit counter can be used for each block. When a hit occurs,
the counter of the block that is referenced is set to 0. Counters with values originally lower than the referenced one are incremented by one,
and all others remain unchanged. When a miss occurs and the set is not full, the counter
associated with the new block loaded from the main memory is set to 0, and the values
of all other counters are increased by one. When a miss occurs and the set is full, the
block with the counter value 3 is removed, the new block is put in its place, and its
counter is set to 0. The other three block counters are incremented by one. It can be
easily verified that the counter values of occupied blocks are always distinct.
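A minimal Python sketch of this counter scheme for one four-block set (the class and its names are ours; the update rules follow the description above):

class LRUSet:
    def __init__(self):
        self.tags = [None] * 4    # block identity held in each position
        self.counters = [0] * 4   # 0 = most recently used, 3 = least recently used

    def access(self, tag):
        if tag in self.tags:                  # hit
            i = self.tags.index(tag)
            ref = self.counters[i]
            for j in range(4):                # counters lower than the referenced
                if self.tags[j] is not None and self.counters[j] < ref:
                    self.counters[j] += 1     # one are incremented by one
            self.counters[i] = 0              # referenced block's counter -> 0
            return "hit"
        if None in self.tags:                 # miss, set not full
            i = self.tags.index(None)
        else:                                 # miss, set full: remove the block
            i = self.counters.index(3)        # whose counter value is 3
        for j in range(4):                    # all other occupied blocks
            if j != i and self.tags[j] is not None:
                self.counters[j] += 1         # age by one
        self.tags[i] = tag
        self.counters[i] = 0
        return "miss"

s = LRUSet()
print([s.access(b) for b in ("A", "B", "A", "C", "D", "E")])
# ['miss', 'miss', 'hit', 'miss', 'miss', 'miss'] -- "E" evicts "B", the LRU block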
The LRU algorithm has been used extensively. Although it performs well for many
access patterns, it can lead to poor performance in some cases. For example, it produces
disappointing results when accesses are made to sequential elements of an array that is
slightly too large to fit into the cache.

Performance of the LRU algorithm can be improved by introducing a small amount of randomness in deciding which block to replace. Several other replacement algorithms
are also used in practice. An intuitively reasonable rule would be to remove the
“oldest” block from a full set when a new block must be brought in. However, because
this algorithm does not take into account the recent pattern of access to blocks in the
cache, it is generally not as effective as the LRU algorithm in choosing the best blocks to
remove. The simplest algorithm is to randomly choose the block to be overwritten.
Interestingly enough, this simple algorithm has been found to be quite effective in
practice.
We now consider a detailed example to illustrate the effects of different cache mapping
techniques. Assume that a processor has separate instruction and data caches. To keep the
example simple, assume the data cache has space for only eight blocks of data. Also assume
that each block consists of only one 16-bit word of data and the memory is word-addressable
with 16-bit addresses. (These parameters are not realistic for actual computers, but they allow
us to illustrate mapping techniques clearly.)
Finally, assume the LRU replacement algorithm is used for block replacement in the cache. Let
us examine changes in the data cache entries caused by running the following application. A 4
× 10 array of numbers, each occupying one word, is stored in main memory locations 7A00
through 7A27 (hex). The elements of this array, A, are stored in column order, as shown in
Figure 8.19. The figure also indicates how tags for different cache mapping techniques are
derived from the memory address.


Note that no bits are needed to identify a word within a block, as was done in Figures 8.16
through 8.18, because we have assumed that each block contains only one word. The
application normalizes the elements of the first row of A with respect to the average value of
the elements in the row. Hence, we need to compute the average of the elements in the row
and divide each element by that average. The required task can be expressed as

A(0, i) ← A(0, i) / ((A(0, 0) + A(0, 1) + ... + A(0, 9)) / 10), for i = 0, 1, ..., 9.
We use the variables SUM and AVE to hold the sum and average values, respectively. These
variables, as well as index variables i and j, are held in processor registers during the
computation.
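Figure 8.20 (the program itself) is not reproduced in this summary; as a stand-in, here is a Python sketch of the two loops it describes, with the second loop stepping through the elements in reverse order, as the discussion below assumes:

A = [float(k) for k in range(1, 11)]   # hypothetical values for row 0 of A

SUM = 0.0
for j in range(10):                    # first loop: j = 0, 1, ..., 9
    SUM = SUM + A[j]
AVE = SUM / 10

for i in range(9, -1, -1):             # second loop: i = 9, 8, ..., 0
    A[i] = A[i] / AVE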
Direct-Mapped Cache
In a direct-mapped data cache, the contents of the cache change as shown in Figure 8.21. The
columns in the table indicate the cache contents after various passes through the two program
loops in Figure 8.20 are completed. For example, after the second pass through the first loop (j =
1), the cache holds the elements A(0, 0) and A(0, 1). These elements are in block positions 0 and
4, as determined by the three least-significant bits of the address. During the next pass, the A(0,
0) element is replaced by A(0, 2), which maps into the same block position. Note that the
desired elements map into only two positions in the cache, thus leaving the contents of the
other six positions unchanged from whatever they were before the normalization task started.
Elements A(0, 8) and A(0, 9) are loaded into the cache during the ninth and tenth passes
through the first loop (j = 8, 9). The second loop reverses the order in which the elements are
handled. The first two passes through this loop (i = 9, 8) find the required data in the cache.
When i = 7, element A(0, 9) is replaced with A(0, 7). When i = 6, element A(0, 8) is replaced with A(0, 6), and so on. Thus, eight elements are replaced while the second loop is
executed. In total, there are only two hits during execution of this task. The reader should keep in
mind that the tags must be kept in the cache for each block. They are not shown to keep the figure
simple.
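To make the hit count concrete, here is a minimal Python simulation of this trace (the address layout follows the statement above that A is stored in column order from 7A00, so A(0, j) sits at address 7A00 + 4j):

BASE = 0x7A00                 # A(0, j) is at BASE + 4*j (column order, 4 rows)

cache = [None] * 8            # address of the block held in each position
hits = 0
refs = [BASE + 4 * j for j in range(10)]          # first loop: j = 0..9
refs += [BASE + 4 * i for i in range(9, -1, -1)]  # second loop: i = 9..0

for addr in refs:
    pos = addr & 0x7          # three least-significant bits pick the position
    if cache[pos] == addr:
        hits += 1
    else:
        cache[pos] = addr     # trivial replacement: overwrite the old block
print(hits)                   # 2, matching the count stated above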

Associative-Mapped Cache
Figure 8.22 presents the changes in cache contents for the case of an associative-mapped cache.
During the first eight passes through the first loop, the elements are brought into consecutive block
positions, assuming that the cache was initially empty. During the ninth pass (j = 8), the LRU
algorithm chooses A(0, 0) to be overwritten by A(0, 8). In the next and last pass through the j loop,
element A(0, 1) is replaced with A(0, 9). Now, for the first eight passes through the second loop (i =
9, 8,..., 2) all the required elements are found in the cache.
When i = 1, the element needed is A(0, 1), so it replaces the least recently used element, A(0, 9).
During the last pass, A(0, 0) replaces A(0, 8). In this case, when the second loop is executed, only
two elements are not found in the cache. In the direct-mapped case, eight of the elements had to be
reloaded during the second loop. Obviously, the associative-mapped cache benefits from the
complete freedom in mapping a memory block into any position in the cache.
In both cases, better utilization of the cache is achieved by reversing the order in which the elements
are handled in the second loop of the program. It is interesting to consider what would happen if
the second loop dealt with the elements in the same order as in the first loop. Using either direct
mapping or the LRU algorithm, all elements would be overwritten before they are used in the
second loop (see Problem 8.10).

Set-Associative-Mapped Cache
For this example, we assume that a set-associative data cache is organized into two sets, each
capable of holding four blocks. Thus, the least-significant bit of an address determines which set a
memory block maps into, but the memory data can be placed in any of the four blocks of the set.
The high-order 15 bits of the address constitute the tag.
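A quick sketch of this field split (the function name is ours); note that every address in the first row of A is even, so all of its elements contend for set 0:

def split_two_sets(addr):
    return addr >> 1, addr & 1    # (tag, set): the LSB selects the set

print(split_two_sets(0x7A00))     # A(0, 0): tag 15616 (0x3D00), set 0
print(split_two_sets(0x7A04))     # A(0, 1): tag 15618 (0x3D02), set 0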


CDT25 - LECTURE LEVEL PRACTICE PROBLEMS (LLPs) to test the LLOs


To test whether you achieved the learning outcomes of this lecture, you should be able to solve the following LLPs in the class itself, after completion of the lecture. At least one question/problem (LLP) is designed to test each expected LLO.

LLP1 (on LLO1):

A block-set-associative cache consists of a total of 64 blocks, divided into 4-block sets. The main memory
contains 4096 blocks, each consisting of 32 words. Assuming a 32-bit byte-addressable address space,
how many bits are there in each of the Tag, Set, and Word fields?
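A sketch of the arithmetic, assuming 32-bit (4-byte) words, since the problem does not state the word size:
- Set field: 64 blocks / 4 blocks per set = 16 sets, so 4 bits.
- Word field: each block holds 32 words, so 5 bits; with byte addressing and 4-byte words, 2 further low-order bits select a byte within a word.
- Tag field: 32 - 4 - 5 - 2 = 21 bits.
(The main memory size of 4096 blocks does not enter the calculation once the address width is fixed at 32 bits.)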

Prepared by: P.PRIYANKA, Dept. of CSN, KITSW
