
ELT3047 Computer Architecture

Lecture 11: Associative cache

Hoang Gia Hung


Faculty of Electronics and Telecommunications
University of Engineering and Technology, VNU Hanoi
Last lecture review
❑ Memory hierarchy: have multiple levels of storage & ensure the
data the processor needs is kept in the fast(er) level(s).
➢ Temporal Locality: if address 𝑋 is accessed, it is likely to be accessed again in the near future.
➢ Spatial Locality: if address 𝑋 is accessed, data stored in nearby locations are likely to be accessed in the near future.

❑ Three major categories of cache misses:
1. Compulsory misses: sad facts of life. Example: cold start misses.
2. Conflict misses: multiple memory locations mapping to the same cache location. Nightmare scenario: the ping-pong effect.
3. Capacity misses: the cache is not big enough to contain all the blocks required by the program. Solution: increase cache size.

❑ Cache design space:


➢ total size, block size
➢ write-hit policy (write-through, write-back)
➢ write-miss policy (write allocate, write buffers)
Measuring Cache Performance
❑ The processor stalls on a cache miss
➢ When fetching instructions from the Instruction Cache (I-cache)
➢ When loading or storing data into the Data Cache (D-cache)
➢ Miss penalty is assumed equal for I-cache & D-cache
➢ Miss penalty is assumed equal for Load and Store

❑ Components of CPU time:


➢ Program execution cycles (includes cache hit time)
➢ Memory stall cycles (mainly from cache misses)
➢ CPU time = IC × CPI × CC = IC × (CPIideal + Memory-stall cycles per instruction) × CC
▪ CPIideal = CPI for an ideal cache (no cache misses)
▪ CPIstall = CPIideal + Memory-stall cycles per instruction, i.e. the CPI in the presence of memory stalls
▪ Memory stall cycles increase the CPI!
Memory Stall Cycles
❑ Sum of read-stalls and write-stalls (due to cache misses)
➢ Read-stall cycles = reads/program × read miss rate × read miss penalty
➢ Write-stall cycles = (writes/program × write miss rate × write miss penalty) + write buffer stalls

❑ Memory stall cycles = (I-Cache Misses + D-Cache Misses) × Miss Penalty
➢ I-Cache Misses = I-Count × I-Cache Miss Rate
➢ D-Cache Misses = LS-Count × D-Cache Miss Rate
▪ LS-Count (Loads & Stores) = I-Count × LS Frequency

❑ With simplifying assumptions:
Memory stall cycles = I-Count × misses/instruction × miss penalty
➢ misses/instruction = I-Cache Miss Rate + LS Frequency × D-Cache Miss Rate
➢ Memory stall cycles/instruction = I-Cache Miss Rate × Miss Penalty + LS Frequency × D-Cache Miss Rate × Miss Penalty
➢ For write-through caches (negligible write buffer stalls): Memory-stall cycles = memory accesses/program × miss rate × miss penalty
Memory Stall Cycles: example
❑ Example: Compute misses/instruction and memory stall cycles
for a program with the given characteristics
▪ Instruction count (I-Count) = 10^6 instructions
▪ 30% of instructions are loads and stores
▪ D-cache miss rate is 5% and I-cache miss rate is 1%
▪ Miss penalty is 100 clock cycles for instruction and data caches

❑ Solution:
➢ misses/instruction = 1% + 30% × 5% = 0.025
➢ memory stall cycles/instruction = 0.025 × 100 = 2.5 cycles
➢ total memory stall cycles = 2.5 × 10^6 = 2,500,000 cycles
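This arithmetic can be reproduced with a short Python sketch (a minimal check; the variable names are ours, not from the lecture):

i_count      = 10**6     # instructions
ls_frequency = 0.30      # fraction of loads & stores
i_miss_rate  = 0.01
d_miss_rate  = 0.05
miss_penalty = 100       # clock cycles

# misses/instruction = I-cache miss rate + LS frequency * D-cache miss rate
misses_per_instr = i_miss_rate + ls_frequency * d_miss_rate   # 0.025
stalls_per_instr = misses_per_instr * miss_penalty            # 2.5 cycles
total_stalls     = i_count * stalls_per_instr                 # 2,500,000 cycles
print(misses_per_instr, stalls_per_instr, total_stalls)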
Impacts of Cache Performance
❑ Relative cache penalty increases as processor performance
improves (faster clock rate and/or lower CPI)
➢ When calculating CPIstall, the cache miss penalty is measured in processor
clock cycles needed to handle a miss.
➢ The lower the CPIideal, the more pronounced the impact of stalls

❑ Example: Given
▪ I-cache miss rate = 2%, D-cache miss rate = 4%
▪ Miss penalty = 100 cycles
▪ Base CPI (ideal cache) = 2
▪ Loads & stores are 36% of instructions
Questions:
➢ What is CPIstall? 2 + (2% + 36% × 4%) × 100 = 5.44; % time on memory stalls = 3.44/5.44 ≈ 63%
➢ What if CPIideal is reduced to 1? CPIstall = 4.44; % time on memory stalls ≈ 77%
➢ What if the processor clock rate is doubled? Miss penalty = 200 cycles, CPIstall = 8.88
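The same calculations, written as a small Python sketch (function and variable names are ours):

def cpi_with_stalls(cpi_ideal, i_miss, d_miss, ls_freq, penalty):
    # memory-stall cycles per instruction from I-cache and D-cache misses
    stalls = (i_miss + ls_freq * d_miss) * penalty
    return cpi_ideal + stalls, stalls

cpi, stalls = cpi_with_stalls(2, 0.02, 0.04, 0.36, 100)
print(cpi, stalls / cpi)                            # 5.44, ~63% of time stalled
print(cpi_with_stalls(1, 0.02, 0.04, 0.36, 100))    # CPI 4.44 -> ~77% stalled
print(cpi_with_stalls(2, 0.02, 0.04, 0.36, 200))    # doubled clock: CPI 8.88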
Average Memory Access Time (AMAT)
❑ Hit time is also important for performance
➢ A larger cache will have a longer access time → an increase in hit time will
likely add another stage to the pipeline.
➢ At some point, the increase in hit time for a larger cache will outweigh the improvement in hit rate, leading to a decrease in performance.

❑ Average Memory Access Time (AMAT) is the average time to access memory, considering both hits and misses:
AMAT = Hit time + Miss rate × Miss penalty
❑ Example: Find the AMAT for a cache with
▪ Cache access time (Hit time) of 1 cycle = 2 ns
▪ Miss penalty of 20 clock cycles
▪ Miss rate of 0.05 per access

❑ Solution:
➢ AMAT = 1 + 0.05 × 20 = 2 cycles = 4 ns
➢ Without the cache, AMAT will be equal to miss penalty = 20 cycles = 40 ns
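A small helper reproducing this example (a sketch; the names are ours and all quantities use the numbers above):

def amat(hit_time, miss_rate, miss_penalty):
    # all arguments in cycles (or all in ns) -> result in the same unit
    return hit_time + miss_rate * miss_penalty

cycle_ns = 2.0
print(amat(1, 0.05, 20))             # 2.0 cycles
print(amat(1, 0.05, 20) * cycle_ns)  # 4.0 ns
print(20 * cycle_ns)                 # without the cache: 40 ns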
Reducing cache miss rates #1: cache associativity
❑ Allow more flexible block placement
➢ In a direct mapped cache a memory block maps to exactly one cache block
➢ At the other extreme, could allow a memory block to be mapped to any cache
block → fully associative cache (no indexing)

❑ A compromise is to divide the cache into sets, each of which consists of n “ways” (n-way set associative).
➢ A memory block maps to a unique set (specified by the index field) and can be placed in any way of that set (so there are n choices):
Set index = (block address) modulo (# sets in the cache)
❑ Example: consider the main memory word reference for the
following string
0 4 0 4 0 4 0 4
➢ Start with an empty cache - all blocks initially marked as not valid
Set Associative Cache: Example
[Figure: a 2-way set-associative cache (2 sets, Way 0 and Way 1, one-word blocks) next to a 16-word main memory with addresses 0000xx–1111xx; the two low-order bits select the byte in the 32-bit word.]
➢ Q1: Is it there? Compare all the cache tags in the set against the high-order 3 bits of the memory address to tell if the memory block is in the cache.
➢ Q2: How do we find it? Use the next low-order memory address bit (above the byte offset) to determine which cache set the block maps to (i.e., block address modulo the number of sets in the cache).
Set associative cache example: reference string mapping
0 4 0 4 0 4 0 4

➢ 0 miss: set 0, way 0 ← Mem(0) (tag 000)
➢ 4 miss: set 0, way 1 ← Mem(4) (tag 010)
➢ 0 hit, 4 hit, 0 hit, 4 hit, 0 hit, 4 hit – both blocks stay resident in set 0
❑ 8 requests, 2 misses
➢ Solves the ping pong effect in a direct mapped cache since now 2 memory
locations that map into the same cache set can co-exist!
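This trace can be replayed with a toy LRU cache simulator (a minimal sketch, not from the lecture; addresses are word/block addresses as in the figure, and the direct-mapped run assumes the same 4-block capacity):

from collections import OrderedDict

def count_misses(trace, num_sets, ways):
    # each set is an OrderedDict used as an LRU list: oldest entry first
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for block in trace:
        s = sets[block % num_sets]      # set index = block address mod #sets
        if block in s:
            s.move_to_end(block)        # hit: mark as most recently used
        else:
            misses += 1
            if len(s) == ways:
                s.popitem(last=False)   # evict the least recently used block
            s[block] = None
    return misses

trace = [0, 4, 0, 4, 0, 4, 0, 4]
print(count_misses(trace, num_sets=4, ways=1))  # direct mapped, 4 blocks: 8 misses (ping-pong)
print(count_misses(trace, num_sets=2, ways=2))  # 2-way set associative: 2 misses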
Four-Way Set Associative Cache Organization
[Figure: a four-way set-associative cache organization with 2^8 = 256 sets, each with four ways (one block per way).]

❑ Content Addressable Memory (CAM): a circuit that combines comparison and storage in a single device – supply the data and it looks for a copy and returns the index of the matching row → CAM allows much higher set associativity (8-way and above) than the standard HW of SRAMs + comparators.
Range of Set Associative Caches
[Figure: the address is split into Tag | Index | Block offset | Byte offset – the tag is used for the tag compare, the index selects the set, the block offset selects the word in the block, and the byte offset selects the byte in the word.]
➢ Increasing associativity: fewer index bits, larger tags. Fully associative (only one set): the tag is all the address bits except the block and byte offsets.
➢ Decreasing associativity: more index bits, smaller tags. Direct mapped (only one way): smallest tags, only a single comparator.

❑ For a fixed-size cache, each increase by a factor of two in associativity doubles the number of blocks per set (= the number of ways) and halves the number of sets – it decreases the size of the index by 1 bit and increases the size of the tag by 1 bit.
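A small sketch of this bit accounting (the 4KB cache / 16B block parameters are our own illustration; 32-bit addresses and 32-bit words are assumed):

import math

def address_fields(cache_bytes, block_bytes, ways, addr_bits=32):
    num_sets     = (cache_bytes // block_bytes) // ways
    byte_offset  = 2                                   # byte within a 32-bit word
    block_offset = int(math.log2(block_bytes // 4))    # word within the block
    index        = int(math.log2(num_sets)) if num_sets > 1 else 0
    tag          = addr_bits - index - block_offset - byte_offset
    return tag, index, block_offset, byte_offset

# Fixed 4KB cache with 16B blocks: each doubling of associativity
# moves one bit from the index into the tag.
for ways in (1, 2, 4, 8):
    print(ways, address_fields(4096, 16, ways))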
Replacement Policies
❑ A miss occurred – which way’s block do we pick for replacement?
➢ Direct mapped: no choice.
➢ Set associative: prefer a non-valid entry; otherwise choose among the entries in the set.

❑ First In First Out (FIFO): replace the oldest block in the set
➢ Use one counter per set to specify the oldest block. On a cache miss, replace the block specified by the counter & increment the counter.

❑ Least Recently Used (LRU): replace the one that has been unused for the longest time
➢ Requires hardware to keep track of when each way’s block was used relative to the other blocks in the set. For 2-way set associative, this takes one bit per set → set the bit when a block is referenced (and reset the other way’s bit).
➢ Manageable for 4-way, too hard beyond that.

❑ Random
➢ Gives approximately the same performance as LRU for high associativity.
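The one-bit-per-set LRU bookkeeping for a 2-way set-associative cache mentioned above might look like this (a hedged sketch; the class and method names are ours):

class TwoWayLRU:
    """One LRU bit per set: the bit records which way was referenced last."""
    def __init__(self, num_sets):
        self.mru_way = [0] * num_sets     # most-recently-used way of each set

    def touch(self, set_idx, way):
        # call on every hit or fill
        self.mru_way[set_idx] = way

    def victim(self, set_idx):
        # on a miss, replace the way that was NOT used most recently
        return 1 - self.mru_way[set_idx]

lru = TwoWayLRU(num_sets=2)
lru.touch(0, way=0)       # block brought into way 0 of set 0
lru.touch(0, way=1)       # later, way 1 of set 0 is referenced
print(lru.victim(0))      # -> 0: way 0 is the LRU candidate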
Caching Example: Write-Back Fully Associative with LRU
❑ Matrix multiplication, cold start
➢ B has been transposed into Bt to optimize efficiency
➢ Lower LRU number = more recently used

❑ Compute C[0]
➢ Access A[0] = 0x1000:
▪ Miss, copy block A[0:3] to cache, set Tag, V, LRU bits
➢ Access Bt[0] = 0x2000:
▪ Miss, copy block Bt[0:3] to cache, set new Tag, V, LRU bits, update existing LRU bit
[Figure: the 4×4 operand matrices A, Bt and result C, and the cache contents table (Tag, V, Dirty, LRU, Data): the ways holding tags 0x100 (block A[0:3]) and 0x200 (block Bt[0:3]) are now valid; the other two ways (tags 0x156, 0x4E9) are still invalid.]
Caching Example: Write-Back Fully Associative with LRU
❑ Compute C[0] (cont.)
➢ Access A[1] = 0x1004: hit; Access Bt[1] = 0x2004: hit
➢ Access A[2] = 0x1008: hit; Access Bt[2] = 0x2008: hit
➢ Access A[3] = 0x100C: hit; Access Bt[3] = 0x200C: hit

❑ Write to C[0] = 0x3000:
➢ Miss + Write, so set the Dirty bit as well as the Tag, V, & LRU bits
➢ Update existing LRU bits
➢ Main memory not updated yet
[Figure: cache contents – a third way (previously tag 0x156) now holds tag 0x300 (block C[0:3]) with V and Dirty set; C[0] = 0x28 exists only in the cache.]
Caching Example: Write-Back Fully Associative with LRU
❑ Compute C[1]
➢ Access A[0] = 0x1000: hit; Access Bt[4] = 0x2010:
▪ Miss, copy block Bt[4:7] to cache
➢ Access A[1] = 0x1004: hit; Access Bt[5] = 0x2014: hit
➢ Access A[2] = 0x1008: hit; Access Bt[6] = 0x2018: hit
➢ Access A[3] = 0x100C: hit; Access Bt[7] = 0x201C: hit

❑ Write to C[1] = 0x3004:
➢ Hit, update LRU bits
➢ Main memory not updated yet
[Figure: cache contents – the last way (previously tag 0x4E9) now holds tag 0x201 (block Bt[4:7]); the dirty block 0x300 holds C[0] = 0x28 and C[1] = 0x1E.]
Caching Example: Write-Back Fully Associative with LRU
❑ Compute C[2]
➢ Access A[0] = 0x1000: hit; Access Bt[8] = 0x2020:
▪ Miss, cache full → evict block 0x200
▪ Copy block Bt[8:11] to cache, update necessary bits
➢ Access A[1] = 0x1004: hit; Access Bt[9] = 0x2024: hit
➢ Access A[2] = 0x1008: hit; Access Bt[10] = 0x2028: hit
➢ Access A[3] = 0x100C: hit; Access Bt[11] = 0x202C: hit

❑ Write to C[2] = 0x3008:
➢ Hit, update LRU bits; main memory not updated yet.
[Figure: cache contents – tag 0x200 has been replaced by tag 0x202 (block Bt[8:11]); the dirty block 0x300 now also holds C[2] = 0x14.]
Caching Example: Write-Back Fully Associative with LRU
❑ Compute C[3]
➢ Access A[0] = 0x1000: hit; Access Bt[12] = 0x2030:
▪ Miss, cache full → evict block 0x201
▪ Copy block Bt[12:15] to cache, update necessary bits
➢ Access A[1] = 0x1004: hit; Access Bt[13] = 0x2034: hit
➢ Access A[2] = 0x1008: hit; Access Bt[14] = 0x2038: hit
➢ Access A[3] = 0x100C: hit; Access Bt[15] = 0x203C: hit

❑ Write to C[3] = 0x300C:
➢ Hit, update LRU bits; main memory not updated yet.
[Figure: cache contents – tag 0x201 has been replaced by tag 0x203 (block Bt[12:15]); the dirty block 0x300 now holds C[0:3] = 0x28, 0x1E, 0x14, 0x0A.]
Caching Example: Write-Back Fully Associative with LRU
❑ Compute C[4]
➢ Access A[4] = 0x1010:
▪ Miss, cache full → evict block 0x202
▪ Copy block A[4:7] to cache, update necessary bits
➢ Access Bt[0] = 0x2000:
▪ Miss, cache full → evict block 0x100
▪ Copy block Bt[0:3] to cache, update necessary bits
➢ Other accesses: hits
➢ Write to C[4] = 0x3010:
▪ Miss → evict block 0x203; copy block C[4:7] to cache (tag 0x301, Dirty set); main memory not updated yet.
[Figure: cache contents – the four ways now hold tags 0x101 (A[4:7]), 0x200 (Bt[0:3]), 0x300 (dirty, C[0:3]) and 0x301 (dirty, C[4] = 0x68).]
Caching Example: Write-Back Fully Associative with LRU
❑ Compute C[5]
➢ Access A[4] = 0x1010: hit; Access Bt[4] = 0x2010:
▪ Miss, cache full → evict block 0x300: only at this point does main memory get updated (the dirty block C[0:3] is written back).
▪ Copy block Bt[4:7] to cache, update necessary bits
➢ Access A[5] = 0x1014: hit; Access Bt[5] = 0x2014: hit
➢ Access A[6] = 0x1018: hit; Access Bt[6] = 0x2018: hit
➢ Access A[7] = 0x101C: hit; Access Bt[7] = 0x201C: hit
➢ Write to C[5] = 0x3014:
▪ Hit, update LRU bits; main memory not updated yet.
[Figure: cache contents – tag 0x300 has been written back and replaced by tag 0x201 (block Bt[4:7]); main memory now holds C[0:3] = 40, 30, 20, 10, and the dirty block 0x301 holds C[4] = 0x68 and C[5] = 0x4E.]
How Much Associativity?
❑ Increased associativity decreases miss rate
➢ But with diminishing returns
[Figure: miss rate (%) versus associativity (1-way, 2-way, 4-way, 8-way) for cache sizes from 4KB to 512KB.]

❑ The choice of direct mapped or set associative depends on the cost of a miss versus the cost of implementation.

❑ N-way set associative cache costs
➢ N comparators (delay and area)
➢ MUX delay (set selection) before data is available
➢ Data available only after set selection and the Hit/Miss decision (c.f. direct mapped cache: the cache block is available before the Hit/Miss decision) → can be an important consideration (why?).
Reducing Cache Miss Rates #2: multi-level caches
❑ Use multiple levels of caches
➢ Primary (L1) cache attached to the CPU
➢ A larger, slower L2 cache services misses from the primary cache. With advancing technology there is more than enough room on the die for an L2, normally a unified cache (i.e., it holds both instructions and data), and in some cases even a unified L3 cache.

❑ Example: Given
▪ CPU base CPI = 1, clock rate = 4GHz
▪ Miss rate/instruction = 2%
▪ Main memory access time = 100ns
Questions:
➢ Compute the actual CPI with just primary cache.
➢ Compute the performance gain if we add L2 cache with
▪ Access time = 5ns
▪ Global miss rate to main memory = 0.5%
Multi-level cache: example solution
❑ With just primary cache
➢ Miss penalty = 100ns / 0.25ns = 400 cycles
➢ CPIstall = 1 + 0.02 × 400 = 9

❑ With added L2 cache
➢ Primary miss with L2 hit: penalty = 5ns / 0.25ns = 20 cycles
➢ Primary miss with L2 miss: penalty = L2 access stall + main memory stall = 20 + 400 = 420 cycles
➢ CPIstall = 1 + (0.02 − 0.005) × 20 + 0.005 × 420 = 3.4
➢ [Alternatively, CPIstall = 1 + L1 stalls/instruction + L2 stalls/instruction = 1 + 0.02 × 20 + 0.005 × 400 = 3.4]
➢ Performance gain = 9 / 3.4 ≈ 2.6 times.
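The same solution as a short Python sketch (names are ours):

cycle_ns   = 0.25                      # 4 GHz clock
l2_hit_pen = round(5 / cycle_ns)       # 20 cycles
mem_pen    = round(100 / cycle_ns)     # 400 cycles
l1_miss    = 0.02                      # misses per instruction reaching L2
l2_global  = 0.005                     # misses per instruction reaching memory

cpi_l1_only = 1 + l1_miss * mem_pen                            # 9.0
cpi_l1_l2   = 1 + l1_miss * l2_hit_pen + l2_global * mem_pen   # 3.4
print(cpi_l1_only, cpi_l1_l2, cpi_l1_only / cpi_l1_l2)         # speedup ~2.6x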
Multilevel Cache Design Considerations
❑ Design considerations for L1 and L2 caches are very different
➢ Primary cache should focus on minimizing hit time in support of a shorter
clock cycle → smaller with smaller block sizes.
➢ Secondary cache(s) should focus on reducing miss rate to reduce the
penalty of long main memory access times → larger with larger block sizes &
higher levels of associativity.

❑ The miss penalty of the L1 cache is significantly reduced by the presence of an L2 cache – so L1 can be smaller (i.e., faster) but have a higher miss rate.
❑ For the L2 cache, hit time is less important than miss rate
➢ The L2$ hit time determines the L1$’s miss penalty
➢ The L2$ local miss rate >> the global miss rate
▪ Local miss rate = fraction of references to one level of a cache that miss in that level
▪ Global miss rate = fraction of references that miss in all levels of a multi-level cache → dictates how often we must access main memory.
Multi-level cache parameters: two real-life examples

Intel Nehalem:
➢ L1 organization & size: split I$ and D$; 32KB each per core; 64B blocks
➢ L1 associativity: 4-way (I), 8-way (D) set assoc.; ~LRU replacement
➢ L1 write policy: write-back, write-allocate
➢ L2 organization & size: unified; 256KB (0.25MB) per core; 64B blocks
➢ L2 associativity: 8-way set assoc.; ~LRU
➢ L2 write policy: write-back, write-allocate
➢ L3 organization & size: unified; 8192KB (8MB) shared by all cores; 64B blocks
➢ L3 associativity: 16-way set assoc.
➢ L3 write policy: write-back, write-allocate

AMD Barcelona:
➢ L1 organization & size: split I$ and D$; 64KB each per core; 64B blocks
➢ L1 associativity: 2-way set assoc.; LRU replacement
➢ L1 write policy: write-back, write-allocate
➢ L2 organization & size: unified; 512KB (0.5MB) per core; 64B blocks
➢ L2 associativity: 16-way set assoc.; ~LRU
➢ L2 write policy: write-back, write-allocate
➢ L3 organization & size: unified; 2048KB (2MB) shared by all cores; 64B blocks
➢ L3 associativity: 32-way set assoc.; evict the block shared by the fewest cores
➢ L3 write policy: write-back, write-allocate
The Cache Design Space
❑ Several interacting dimensions
➢ cache size
➢ block size
➢ associativity
➢ replacement policy
➢ write-through vs write-back
➢ write allocation
[Figure: the cache design space sketched along the cache size, associativity and block size axes, plus a generic good/bad trade-off curve between two competing factors.]

❑ The optimal choice is a compromise
➢ depends on access characteristics
▪ workload
▪ use (I-cache, D-cache, TLB)
➢ depends on technology / cost

❑ Simplicity often wins