Cache Overview

- vardhamana.hegde@wipro.com
Agenda
• Introduction
• Cache structure
• Cache Organization
• Principle of locality
• Hit and Miss!
• Block placement
• Block identification
• Block replacement
• Interaction policies with main memory
• Cache coherency
• MESI protocol
• Some terms
• Benefits of larger cache – the Xeon case
• Comparing Intel Processors
• Cache in AMD64
• References
Introduction
• Pronounced as – “cash”
• It is also a memory!
• Contains the most recently accessed pieces of main memory
• Slower memories are the bottleneck in processors
• Benefits?
– For a typical desktop application on a Pentium with a 16 Kbyte cache, the
cache contains about 90% of the addresses requested by the processor!
• Basic Model



Introduction – contd..

• Memory hierarchy considerations


– Make the Common Case Fast
• Amdahl’s Law – “the performance improvement to be gained from using some
faster mode of execution is limited by the fraction of the time the faster mode can
be used”
– Principle of locality
• The properties of programs that you want to exploit
– Smaller is Faster
• Generally and also true for memories!
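
In formula form (the standard statement, not spelled out on the slide): if f is
the fraction of execution time that can use the faster mode and s is the speedup
of that mode, then

    Speedup_overall = 1 / ((1 - f) + f / s)

Even as s grows without bound, the overall speedup is capped at 1 / (1 - f) –
hence, make the common case fast.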



Introduction – contd..

– Block Placement
• Where should a block be placed in a cache?
– Block Identification
• How do we find whether a block is present in the cache?
– Block Replacement
• Which block should be replaced on a miss?
– Interaction Policies with Main Memory
• What happens on reads and writes to the cache?
• In reality? The memory hierarchy – the top is faster/smaller/costlier/power
hungry/less dense, the bottom is bigger/slower/cheaper/less power/denser;
access time decreases and cost per byte increases as you move up:

– Registers (FF)
– Cache (SRAM)
– Main Memory (DRAM)
– Virtual Memory (Disk)
– Storage (Disk/Tape)





Cache structure
• Functional Blocks
– SRAM
• The block that holds the data
• Size determines the size of the
cache
– Tag RAM
• Small piece of SRAM
• Stores the address of the data
that is stored in SRAM
– Cache Controller
• Performs snoops and snarfs
• Updates the SRAM and TRAM
• Implements the write policy
• Determines if a memory request is
cacheable (not all requests need
to be cacheable!)
• Determines if a memory access is
a cache hit or a miss



Cache Organization

• Cache Page
– Main memory is divided into “equal” pieces called cache pages
– Page size depends on the cache size

• Cache Line
– Smaller pieces of a cache page
– Size is determined by both the processor design and the cache design
– In the Pentium, a cache line is 32 bytes



Principle of Locality
• Locality of reference
– Exploit the properties of programs
– Programs tend to reuse data and instructions they have used recently
– 90/10 rule: “A program spends 90% of its time in 10% of its code”

• Temporal Locality
– Recently accessed items are likely to be accessed in the near future

• Spatial Locality
– Items whose addresses are near one another tend to be referenced close
together “in time”
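
A minimal C sketch (not from the slides) of both kinds of locality: the
row-major loop walks memory sequentially, so each fetched line is fully used
(spatial locality), and sum is reused every iteration (temporal locality);
the column-major loop strides a whole row apart and misses far more often.

#include <stdio.h>

#define N 1024
static double a[N][N];

/* Row-major traversal: consecutive accesses touch consecutive
 * addresses, exploiting spatial locality. */
double sum_rows(void) {
    double sum = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}

/* Column-major traversal: consecutive accesses are N * 8 bytes apart,
 * so nearly every access touches a different cache line. */
double sum_cols(void) {
    double sum = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += a[i][j];
    return sum;
}

int main(void) {
    printf("%f %f\n", sum_rows(), sum_cols());
    return 0;
}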



Hit and Miss!
• Cache Hit – a memory access that finds the requested data in the cache

• Cache Miss – a memory access where the cache does not contain the
information requested

– Compulsory – the first reference to a block

• Larger cache → compulsory misses form a greater fraction of all misses
• Smaller lines → more compulsory misses

– Capacity – a block was evicted for lack of space

• Larger cache → fewer capacity misses
• Associativity has little effect (capacity misses are measured against a
fully associative cache of the same size)

– Conflict – another block with the same mapping was loaded

• Increased size → fewer conflict misses
• Increased associativity → fewer conflict misses
• Sensitive to code/data placement
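
These miss categories feed directly into the standard average-memory-access-time
formula (textbook form, not shown on the slide):

    AMAT = hit time + miss rate × miss penalty

For example, a 1-cycle hit, a 5% miss rate and a 50-cycle miss penalty give
AMAT = 1 + 0.05 × 50 = 3.5 cycles.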



Block Placement
Where do you want to place the data in the
“cache”?

• Fully Associative
– A block of data can be placed anywhere
in cache
– Main memory and cache memory are
both divided into lines of equal size
– Provides the best performance – store
the line anywhere!
– Complex to implement
• Must determine whether the data is present at all
• Needs to compare the address against every TRAM entry (done in
parallel) within the timing requirements
– Hence used only in smaller caches –
typically less than 4K



Block Placement – contd..

Where do you want to place the


data in the “cache”?

• Direct Map
– A block of data can be placed in
exactly one location in the cache
– Equivalent to one-way set associative
– Main memory = n × (cache size), i.e.
main memory is divided into n
cache-sized pages
– Least complex
– Less flexible for jump-style access
patterns – lower performance



Block Placement – contd..

Where do you want to place the


data in the “cache”?

• Set Associative
– A block of data can be placed in
restrictive “set” of places in the
cache
– A combination of “Fully
Associative” and “Direct Mapped”
schemes
– The cache is divided into equal
“cache ways”
– Cache page = cache way
– Each cache way is itself direct mapped
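
All three placement schemes can be viewed as one mechanism with a different
number of sets. Below is a minimal C sketch (illustrative only; the 16K cache
size is assumed, the 32-byte line matches the Pentium figure above): direct
mapped is 1-way, and fully associative is a single set holding every line.

#include <stdint.h>
#include <stdio.h>

#define CACHE_BYTES (16 * 1024)   /* assumed cache size              */
#define LINE_BYTES  32            /* Pentium-style 32-byte line      */
#define NUM_LINES   (CACHE_BYTES / LINE_BYTES)

/* For an N-way set-associative cache, a block may live in any of the
 * N lines of one set, chosen by block address modulo number of sets. */
static unsigned set_index(uint32_t addr, unsigned ways) {
    unsigned sets  = NUM_LINES / ways;
    uint32_t block = addr / LINE_BYTES;
    return block % sets;
}

int main(void) {
    uint32_t addr = 0x12345678;   /* arbitrary example address       */
    printf("direct mapped:    set %u of %u\n", set_index(addr, 1), NUM_LINES);
    printf("4-way set assoc.: set %u of %u\n", set_index(addr, 4), NUM_LINES / 4);
    printf("fully assoc.:     set %u of 1\n",  set_index(addr, NUM_LINES));
    return 0;
}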



Block Placement – contd..

• Most common cache organizations


– direct mapped / two way set associative / four way set associative



Block Identification
How do we find out whether a block is present in the cache?

• Directory
– Address tags – checked to match the block address
• Checked in parallel for speed of operation
• Address from CPU = f (block offset, index, tag)
– Control bits – indicate that the content of a block of data is valid

Block address = | Tag | Index | Block Offset |
– Tag – stored in the cache and compared with the CPU address
– Index – selects the set
– Block Offset – selects the data within the block
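
A small C sketch of the field split (the geometry is assumed for illustration:
32-byte lines and 128 sets; the bit widths follow from those two parameters).

#include <stdint.h>
#include <stdio.h>

#define LINE_BYTES 32      /* => 5 offset bits */
#define NUM_SETS   128     /* => 7 index bits  */

/* Decompose an address into the three directory fields. */
static void split(uint32_t addr,
                  uint32_t *tag, uint32_t *index, uint32_t *offset) {
    *offset = addr % LINE_BYTES;               /* selects data in block */
    *index  = (addr / LINE_BYTES) % NUM_SETS;  /* selects the set       */
    *tag    = addr / (LINE_BYTES * NUM_SETS);  /* compared with TRAM    */
}

int main(void) {
    uint32_t tag, index, offset;
    split(0x0001ABCDu, &tag, &index, &offset);
    printf("tag=%u index=%u offset=%u\n", tag, index, offset);
    return 0;
}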



Block Identification – contd.

• Basic Algorithm

// search the cache directory (tag RAM) for the tag
if “hit” then
    use the offset to fetch the data from the cache SRAM
else
    // access main memory
    if “hit” then
        store the block in the cache and pass the data to the CPU
    else
        // page fault – context switch while the I/O from disk is serviced
end


Block Replacement
Which block should be replaced?
• Random
• Least Recently Used (LRU)
• First In First Out (FIFO)
• Most Recently Used (MRU)
• Least Frequently Used (LFU)
• Most Frequently Used (MFU)
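
A minimal C sketch of LRU for one 4-way set (an illustrative software model,
not how the slides' hardware necessarily implements it): each way carries an
age counter, a touched way resets to age 0, and the victim is the oldest way.

#define WAYS 4

/* Age counters for one cache set; higher age = less recently used. */
static unsigned age[WAYS];

/* Call on every hit (or fill) of `way` in this set. */
void lru_touch(int way) {
    for (int w = 0; w < WAYS; w++)
        age[w]++;          /* everyone else gets older          */
    age[way] = 0;          /* the accessed way is newest        */
}

/* Pick the replacement victim: the way with the largest age. */
int lru_victim(void) {
    int victim = 0;
    for (int w = 1; w < WAYS; w++)
        if (age[w] > age[victim])
            victim = w;
    return victim;
}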



Main Memory Interaction Policies
What happens on a read to cache?

• Read policies

– Look Aside
• Less expensive
• Better response to a cache miss
(the main memory access starts in
parallel with the cache lookup)
• Processor cannot access the cache
while another bus master is
accessing main memory

– Look Through
• More complex
• Processor runs out of the cache
• Memory access on a cache miss
is slower
• Good when
– cache hit rates are high, and
– there are other bus masters



Main Memory Interaction Policies
What happens on a write to cache?

• Write policies on “write hit” - Write Through and Write Back


– Write Back – the cache acts like a buffer
• Advantages
– Writes occur at the speed of the cache – greater performance
– Writes to main memory can happen when the system bus is available
– Multiple writes within a block require only one write to main memory
– As a result, requires less memory bandwidth
• Disadvantages
– Harder to implement
– Main memory is not always consistent with the cache
– Reads that cause a replacement may force dirty blocks to be written to main memory



Main Memory Interaction Policies
– Write Through
• Advantages
– Easy to implement
– Main memory always has the most current copy of the data
– A read miss never results in a write to main memory
• Disadvantages
– Writes are slower
– Every write needs a main memory access
– Requires more bandwidth to main memory
• Write policies on a “write miss”
– Write allocate – load the block and write into the cache
• Usually associated with write-back caches
– Write around (no-write allocate) – write directly to main memory
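
A hedged C sketch contrasting the two write-hit policies on a store; the
helper functions (mem_write_line, line_store) are hypothetical stand-ins for
the hardware paths, not real APIs.

#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t tag;
    bool     valid;
    bool     dirty;              /* used only by write-back         */
    uint8_t  data[32];
} line_t;

/* Hypothetical helpers standing in for real hardware paths. */
void mem_write_line(uint32_t tag, const uint8_t *data);
void line_store(line_t *ln, uint32_t offset, uint8_t byte);

/* Write-through: cache and main memory updated together, so memory
 * is always current but every store uses bus bandwidth. */
void store_write_through(line_t *ln, uint32_t off, uint8_t b) {
    line_store(ln, off, b);
    mem_write_line(ln->tag, ln->data);
}

/* Write-back: only the cache is updated; the dirty line is written
 * to memory once, when it is evicted. */
void store_write_back(line_t *ln, uint32_t off, uint8_t b) {
    line_store(ln, off, b);
    ln->dirty = true;
}

void evict(line_t *ln) {
    if (ln->valid && ln->dirty)
        mem_write_line(ln->tag, ln->data);   /* single deferred write */
    ln->valid = ln->dirty = false;
}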



Cache Coherency

• Case of shared memory, multiprocessor systems


• Goal – reduce memory accesses and also bus/network traffic
• Problem – caches must be kept consistent with main memory and with each other
• To decide – should a stale entry be updated or invalidated?





Cache Coherency – contd.

• Pentium – MESI protocol to maintain cache consistency/coherency


– Each cache line is in one of four states: “Modified”, “Exclusive”, “Shared” or “Invalid”
• Modified – the line has been modified (main memory is stale)
• Exclusive – the line is stored in this cache only and has not been changed by a write access yet
• Shared – the line may be present in other caches – cannot be modified directly
• Invalid – the line is invalid; a fetch is needed to satisfy any access

• Other protocols
– MSI – PowerPC755
– MESI (Illinois) – Pentium Class
– MOESI – UltraSPARC and AMD64
• MESI + “Owned” – this cache owns the line and has to supply it to others
– Berkeley
– Firefly
– Futurebus+



Cache Coherency – MESI
• “Simplified” MESI protocol
– Start with “Invalid” (the state-transition diagram is a figure, summarized in the sketch below)
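
Since the diagram itself is lost, here is a hedged C sketch of the simplified
transitions (my reading of standard MESI, not the slide's exact diagram):
local reads and writes move a line up from Invalid, and snooped traffic from
other processors moves it down.

typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_t;

typedef enum {
    LOCAL_READ,     /* this CPU reads the line        */
    LOCAL_WRITE,    /* this CPU writes the line       */
    SNOOP_READ,     /* another CPU is seen reading it */
    SNOOP_WRITE     /* another CPU is seen writing it */
} event_t;

/* Simplified MESI next-state function.  `others_have_copy` matters
 * only on a read miss: it decides between Exclusive and Shared. */
mesi_t mesi_next(mesi_t s, event_t e, int others_have_copy) {
    switch (e) {
    case LOCAL_READ:
        if (s == INVALID)                    /* read miss: fetch line */
            return others_have_copy ? SHARED : EXCLUSIVE;
        return s;                            /* read hit: no change   */
    case LOCAL_WRITE:
        return MODIFIED;   /* S must invalidate other copies first;
                              E and I upgrade to M; M stays M        */
    case SNOOP_READ:
        if (s == MODIFIED || s == EXCLUSIVE) /* M also writes back    */
            return SHARED;
        return s;
    case SNOOP_WRITE:
        return INVALID;    /* another writer invalidates our copy    */
    }
    return s;
}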



Some Terms
• Locality of reference
– Make use of the properties of the programs
– 90/10 rule: “A program spends 90% of its time in 10% of its code”
– Tend to reuse data and instructions used recently
• Temporal Locality
– Recently accessed items are likely to be accessed in the near future
• Spatial Locality
– Items whose addresses are near one another tend to be referenced close
together “in time”
• Fetch rate
– the number of bits fetched from the cache per memory access
• Memory stall cycles is a function of
– Instruction Count
– Memory references per instruction
– The fraction of accesses that are not in the cache
– The additional time to service the miss
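
These four factors multiply into the standard formula (textbook form, not
spelled out on the slide):

    Memory stall cycles = IC × (memory references per instruction)
                             × miss rate × miss penalty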



Some Terms – contd..

• Cache Hit
– memory accesses that result in finding the data in the cache = f (size, fetch
rate, locality of reference)
• Cache Miss
– the memory access, where the cache does not contain the information
requested
• Main memory
– Physical memory such as RAM and ROM (but not cache memory) that is
installed in a particular computer system
• Physical memory
– Actual memory, consisting of main memory and cache
• Virtual Address cache
– High speed buffer between CPU and MMU
– Uses virtual address to decide the presence of data in the cache
– MMU translation can be avoided
– Susceptible to cache aliasing problems



Some Terms – contd..

• Physical Address Cache


– high speed buffer between the MMU and physical memory
– CPU → (virtual address) → MMU → cache → (physical address) → physical memory
– Uses the physical address to determine data in cache
– For every access an MMU translation must be completed, regardless of
whether the data is present in the cache
– Greatly reduces potential cache aliasing problems
• Cache Aliasing
– Two or more sets of data or instruction addresses that have the same lower
order bits and therefore occupy the same cache address
• Cache Consistency
– Cache has a copy of the content of the main memory – must reflect the
content of (be consistent with) the main memory
• Probe
– A check for an address in a processor’s caches or internal buffers. “External
probes” originate outside the processor and “internal probes” originate within
the processor



Some Terms – contd..

• Snoop
– watching the address lines of memory transactions to check whether the
cache contains the addressed data
• Snarf
– taking the data from the data lines – to update the cache and maintain
consistency
• Dirty data
– Data held in cache that is more recent than the copy held in main memory
• Stale Data
– The data available in cache, when data is modified within main memory but
not modified in cache
• Flush
– for a cache line: write it back if modified, then invalidate it – “flush the
cache line”
• Write Merging
– Blocks are often larger than a machine “word”
– Merging multiple writes to the same block saves write buffer space and
memory traffic
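
A hedged C sketch of write merging (illustrative, not from the slides): a
write-buffer entry covers a whole block with a per-word valid mask, so several
word writes to the same block merge into one entry and one memory transfer.

#include <stdbool.h>
#include <stdint.h>

#define WORDS_PER_BLOCK 8

typedef struct {
    uint32_t block_addr;               /* block-aligned address         */
    uint8_t  valid_mask;               /* one bit per word in the block */
    uint32_t data[WORDS_PER_BLOCK];
    bool     in_use;
} wbuf_entry_t;

/* Try to merge a word write into an existing buffer entry; returns
 * false if the write must allocate a new entry instead. */
bool wbuf_merge(wbuf_entry_t *e, uint32_t addr, uint32_t word) {
    uint32_t block = addr & ~(uint32_t)(WORDS_PER_BLOCK * 4 - 1);
    unsigned idx   = (addr / 4) % WORDS_PER_BLOCK;

    if (!e->in_use || e->block_addr != block)
        return false;                  /* different block: no merge     */

    e->data[idx]   = word;             /* merged: no new entry, and     */
    e->valid_mask |= 1u << idx;        /* later, one memory transfer    */
    return true;
}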



Some Terms – contd..

• Split Cache vs. Unified Cache


– Unified Cache
• All memory requests to a single cache
• Less hardware, but less bandwidth and more opportunity for collisions
– I & D cache
• Because they have different access patterns – can customize
• Separate memory for instruction and data
• Requires additional hardware, I cache is read only
• Shadow Cache
– Transient accesses sometimes evict useful cache values
– A small separate cache stores recently evicted values
– The replacement policy checks the shadow cache first and will not evict
values that were recently evicted and reloaded
• Write Buffers
– To avoid stalls on writes
– A small cache that holds a few values waiting to go to memory
– Helps when writes are clustered
– Does not entirely eliminate stalls (the buffer can become full)
Some Terms – contd..

• Prefetch
– Increase in line size → more conflict misses but fewer compulsory misses
– Tradeoff?
– Prefetch buffer → holds an additional line beyond the one recently fetched
– Instructions for prefetch (see the sketch below)
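
As an example of prefetch instructions, GCC and Clang expose the
__builtin_prefetch builtin; this sketch hints the line a few iterations ahead
of its use (the look-ahead distance of 16 is an assumption, tuned per machine).

#define N     4096
#define AHEAD 16   /* assumed look-ahead distance */

long sum_with_prefetch(const long *a) {
    long sum = 0;
    for (int i = 0; i < N; i++) {
        if (i + AHEAD < N)
            __builtin_prefetch(&a[i + AHEAD]);  /* hint: fetch early */
        sum += a[i];
    }
    return sum;
}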
• Non-Blocking Cache
– The case of superscalar processors
– No need to wait when one of the pipelines faces a miss
– Must make sure interdependencies are maintained correctly
• Synonym Problem
– Virtual memory maps multiple logical addresses to the same physical memory
– The same data can sit at different logical addresses in the cache at the same time
– The CPU can get unexpected (stale) values
– With a physical cache, invalidation hits the correct (single) copy



Benefits of Larger Cache – the Xeon Case

(figure-only slide)
Comparing Intel Processors

                 Pentium    P-II     P-III    P-4
  Split?         Yes        Yes      Yes      Yes
  Data size      8K         16K      16K      8K
  Instruction    8K         16K      16K      ~96K
  Associativity  2-way      D4/I2    2-way    D4 / I-trace
  Line size      32B        32B      32B      32B
  Level 2        Off chip   256K     256K     256K (same package)
  L2 assoc.      –          –        8-way    8-way



Thank You!

References
– An Overview of Cache – http://www.intel.com/design/intarch/papers/cache6.htm
– Memory Hierarchy Design – http://www.cs.iastate.edu/~prabhu/Tutorial/CACHE/mem_title.html
– Memory Hierarchy in Cache Based Systems – http://www.sun.com/blueprints/1102/817-0742.pdf
– Cache coherency issues for real time multiprocessing – http://www.embedded.com/97/feat9702.htm
