Cache Overview
- vardhamana.hegde@wipro.com
Agenda
• Introduction
• Cache structure
• Cache Organization
• Principle of locality
• Hit and Miss!
• Block placement
• Block identification
• Block replacement
• Interaction policies with main memory
• Cache coherency
• MESI protocol
• Some terms
• Benefits of larger cache – the Xeon case
• Comparing Intel Processors
• Cache in AMD64
• References
Introduction
• Pronounced as – “cash”
• It is also a memory!
• Contains the most recently accessed pieces of main memory
• Slower memories are the bottleneck in processors
• Benefits?
– For a typical desktop application on a Pentium with a 16 Kbyte cache, the cache contains about 90% of the addresses requested by the processor!
• Basic Model
– Block Placement
• Where should a block be placed in a cache?
– Block Identification
• How is a block found if it is in the cache?
– Block Replacement
• Which block should be replaced on a cache miss?
– Interaction Policies with Main Memory
• What happens on reads and writes to the cache?
• In reality? The memory hierarchy:
– Registers (FF)
– Cache (SRAM)
– Main Memory (DRAM)
– Virtual Memory (Disk)
– Storage (Disk/Tape)
– Moving down the hierarchy, access time increases and cost per byte decreases: the top levels are faster/smaller/costlier/power hungry/less dense, the bottom levels bigger/slower/cheaper/less power/denser
Cache Structure
• Cache Page
– “equal” pieces of main memory
– Size is dependent on cache size
• Cache Line
– Smaller pieces of a cache page
– Size is determined by both processor design and cache design
– In the Pentium, a cache line is 32 bytes
Principle of Locality
• Temporal Locality
– Recently accessed items are likely to be accessed in the near future
• Spatial Locality
– Items whose addresses are near one another tend to be referenced close together in time
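A minimal C sketch of both kinds of locality (the array size and stride are illustrative): the first loop reuses sum on every iteration (temporal) and walks consecutive addresses (spatial); the strided loop touches a new cache line on nearly every access.

    #include <stdio.h>

    #define N (1 << 20)

    static int data[N];

    int main(void) {
        long sum = 0;

        /* Spatial + temporal locality: consecutive elements share a
           cache line, and sum is reused on every iteration. */
        for (int i = 0; i < N; i++)
            sum += data[i];

        /* Poor spatial locality: a stride of 16 ints (64 bytes) lands
           on a new cache line on nearly every access. */
        for (int s = 0; s < 16; s++)
            for (int i = s; i < N; i += 16)
                sum += data[i];

        printf("%ld\n", sum);
        return 0;
    }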
Hit and Miss!
• Cache Miss – a memory access where the cache does not contain the information requested
Block Placement
• Fully Associative
– A block of data can be placed anywhere in the cache
– Main memory and cache memory are both divided into lines of equal size
– Provides the best performance – store the line anywhere!
– Complexity in implementing
• Determining whether the data is present or not
• Need to compare the address against the tag RAM (done in parallel) within the timing requirements
– Hence used in caches of smaller size – typically less than 4K
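A software sketch of a fully associative lookup (the structure and sizes below are illustrative, not any particular processor's): hardware compares every tag in parallel, and the sequential scan shows why that comparator cost grows with cache size.

    #include <stdint.h>
    #include <stdbool.h>

    #define NUM_LINES  128         /* illustrative size */
    #define LINE_BYTES 32          /* Pentium-style 32-byte line */

    struct line {
        bool     valid;
        uint32_t tag;              /* full block number: addr / LINE_BYTES */
        uint8_t  data[LINE_BYTES];
    };

    static struct line cache[NUM_LINES];

    /* Returns the matching line, or NULL on a miss. Hardware performs
       all NUM_LINES tag comparisons in parallel. */
    struct line *fa_lookup(uint32_t addr) {
        uint32_t tag = addr / LINE_BYTES;
        for (int i = 0; i < NUM_LINES; i++)
            if (cache[i].valid && cache[i].tag == tag)
                return &cache[i];
        return NULL;
    }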
• Direct Mapped
– A block of data can be placed in exactly one location in the cache
– Equivalent to one-way set associative
– Main memory = n × (cache size): memory is divided into n pages, each the size of the cache
– Least complex
– Less flexible for jump-type instructions – lower performance
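The mapping is simple arithmetic; a sketch in C with assumed parameters (16 Kbyte cache, 32-byte lines, hence 512 lines):

    #include <stdint.h>

    #define LINE_BYTES  32
    #define CACHE_BYTES (16 * 1024)
    #define NUM_LINES   (CACHE_BYTES / LINE_BYTES)   /* 512 lines */

    /* A block can live in exactly one line: its block number
       modulo the number of lines. */
    uint32_t dm_index(uint32_t addr) {
        uint32_t block = addr / LINE_BYTES;
        return block % NUM_LINES;
    }

    /* The tag stored with the line is the rest of the block number. */
    uint32_t dm_tag(uint32_t addr) {
        return (addr / LINE_BYTES) / NUM_LINES;
    }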
• Set Associative
– A block of data can be placed in a restricted “set” of places in the cache
– A combination of the “Fully Associative” and “Direct Mapped” schemes
– Cache is divided into equal “cache ways”
– Cache Page = Cache Way
– Each Cache Way is Direct Mapped
Block Identification
• Directory
– Address tags – checked to match the block address
• Checked in parallel for speed of operation
• Address from CPU = f (block offset, index, tag)
– Control bits – indicate that the content of a block of data is valid
• Address layout: | Tag | Index | Block Offset | – Tag and Index together form the Block Address
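A sketch of that address split for a set-associative directory check, with assumed parameters (4 ways, 128 sets, 32-byte lines): the index selects a set, the tags of all ways in the set are compared (in parallel in hardware), and the offset picks the byte within the line.

    #include <stdint.h>
    #include <stdbool.h>

    #define WAYS       4
    #define NUM_SETS   128          /* illustrative: 16 KB / (4 * 32 B) */
    #define LINE_BYTES 32

    struct way { bool valid; uint32_t tag; uint8_t data[LINE_BYTES]; };
    static struct way sets[NUM_SETS][WAYS];

    /* addr = | tag | index (7 bits) | offset (5 bits) | */
    bool sa_hit(uint32_t addr, uint8_t *out) {
        uint32_t offset = addr % LINE_BYTES;
        uint32_t index  = (addr / LINE_BYTES) % NUM_SETS;
        uint32_t tag    = (addr / LINE_BYTES) / NUM_SETS;

        for (int w = 0; w < WAYS; w++) {   /* parallel in hardware */
            if (sets[index][w].valid && sets[index][w].tag == tag) {
                *out = sets[index][w].data[offset];
                return true;
            }
        }
        return false;
    }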
Interaction Policies with Main Memory
• Basic Algorithm
• Read policies
– Look Aside
• Less expensive
• Better response to a cache miss
• Processor cannot access the cache when another bus master is accessing the main memory
– Look Through (see the sketch after this list)
• More complex
• Processor runs out of the cache, leaving the memory bus free
• Memory access on a cache miss is slower
• Good when
– cache hits are higher, and
– there are other bus masters
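A minimal direct-mapped sketch of the look-through read path (the sizes and the flat memory array are assumptions of the model): the cache is consulted first, and main memory is touched only on a miss.

    #include <stdint.h>
    #include <string.h>

    #define LINE_BYTES 32
    #define NUM_LINES  512

    struct line { int valid; uint32_t tag; uint8_t data[LINE_BYTES]; };
    static struct line cache[NUM_LINES];
    static uint8_t memory[1 << 20];   /* stand-in main memory */

    /* Look-through: the cache is checked first; main memory is read
       only on a miss, and the fetched line fills the cache.
       addr is assumed to fall inside the stand-in memory array. */
    uint8_t read_byte(uint32_t addr) {
        uint32_t block = addr / LINE_BYTES;
        struct line *l = &cache[block % NUM_LINES];
        if (!(l->valid && l->tag == block)) {    /* miss: slow path */
            memcpy(l->data, &memory[block * LINE_BYTES], LINE_BYTES);
            l->valid = 1;
            l->tag = block;
        }
        return l->data[addr % LINE_BYTES];       /* hit path */
    }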
Cache Coherency
• Other protocols
– MSI – PowerPC 755
– MESI (Illinois) – Pentium class
– MOESI – UltraSPARC and AMD64
• MESI + Owned – the owning cache holds the valid copy and has to supply it to other caches
– Berkeley
– Firefly
– Futurebus+
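A reduced sketch of MESI snooping in C (only the two bus events that change state in this model are shown; write-backs and data supply are left as comments):

    /* Reduced MESI model: how one cache's line state reacts to
       snooped traffic from other bus masters. */
    typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_t;

    /* Another master reads an address this cache holds. */
    mesi_t snoop_read(mesi_t s) {
        switch (s) {
        case MODIFIED:   /* write back (or supply the data), then demote */
        case EXCLUSIVE:
            return SHARED;
        default:
            return s;    /* SHARED and INVALID are unchanged */
        }
    }

    /* Another master writes an address this cache holds: the local
       copy becomes stale, so it is invalidated (a MODIFIED line is
       written back first). */
    mesi_t snoop_write(mesi_t s) {
        (void)s;
        return INVALID;
    }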
Some Terms
• Cache Hit
– a memory access that finds the data in the cache = f (size, fetch rate, locality of reference)
• Cache Miss
– a memory access where the cache does not contain the information requested
• Main memory
– Physical memory such as RAM and ROM (but not cache memory) that is installed in a particular computer system
• Physical memory
– Actual memory, consisting of main memory and cache
• Virtual Address cache
– High speed buffer between CPU and MMU
– Uses virtual address to decide the presence of data in the cache
– MMU translation can be avoided
– Susceptible to cache aliasing problems
• Snoop
– to watch the address lines of memory transactions to check whether the cache contains the addressed data
• Snarf
– taking the data from the data lines – to update and maintain consistency
• Dirty data
– Data held in cache that is more recent than the copy held in main memory
• Stale Data
– Data held in cache that is out of date because the copy in main memory has been modified but the cache has not
• Flush
– when used with a cache: write the line back if modified, then invalidate it – “flush the cache line”
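A sketch of that operation, assuming a per-line dirty bit and a tag that stores the full block number:

    #include <stdint.h>

    #define LINE_BYTES 32

    struct line { int valid, dirty; uint32_t tag; uint8_t data[LINE_BYTES]; };

    /* Stand-in for a bus write of one full line. */
    static void memory_write_line(uint32_t line_addr, const uint8_t *d) {
        (void)line_addr; (void)d;
    }

    /* "Flush the cache line": write back only if modified, then invalidate. */
    void flush_line(struct line *l) {
        if (l->valid && l->dirty)
            memory_write_line(l->tag * LINE_BYTES, l->data);
        l->valid = 0;
        l->dirty = 0;
    }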
• Write Merging
– Blocks are often larger than a machine “word”
– Writes to words of the same block are merged into one buffer entry, to save write buffer space and memory traffic
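A write-buffer sketch with assumed geometry (4 entries, 32-byte blocks, 4-byte words): a write that falls in a block already buffered is merged into that entry instead of taking a new slot.

    #include <stdint.h>
    #include <string.h>

    #define BLOCK_BYTES 32
    #define BUF_ENTRIES 4

    struct wb_entry {
        int      used;
        uint32_t block_addr;   /* aligned to BLOCK_BYTES */
        uint32_t word_mask;    /* which 4-byte words are valid */
        uint8_t  data[BLOCK_BYTES];
    };
    static struct wb_entry wbuf[BUF_ENTRIES];

    /* Buffer a 4-byte write; merge into an existing entry when the
       word falls in a block that is already waiting to be written. */
    int buffer_write(uint32_t addr, uint32_t value) {
        uint32_t block = addr & ~(uint32_t)(BLOCK_BYTES - 1);
        uint32_t word  = (addr % BLOCK_BYTES) / 4;

        for (int i = 0; i < BUF_ENTRIES; i++) {
            if (wbuf[i].used && wbuf[i].block_addr == block) {
                memcpy(&wbuf[i].data[word * 4], &value, 4);
                wbuf[i].word_mask |= 1u << word;   /* merged: no new slot */
                return 1;
            }
        }
        for (int i = 0; i < BUF_ENTRIES; i++) {
            if (!wbuf[i].used) {
                wbuf[i].used = 1;
                wbuf[i].block_addr = block;
                wbuf[i].word_mask = 1u << word;
                memcpy(&wbuf[i].data[word * 4], &value, 4);
                return 1;
            }
        }
        return 0;   /* buffer full: the processor must stall */
    }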
• Prefetch
– An increase in line size increases conflict misses but reduces compulsory misses – a tradeoff?
– A prefetch buffer holds an additional line beyond the one most recently fetched
– Dedicated instructions exist for prefetch
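A sequential-prefetch sketch with an assumed one-entry prefetch buffer: on a demand fetch of line n, line n + 1 is fetched as well, so a sequential access pattern finds it ready.

    #include <stdint.h>

    #define LINE_BYTES 32

    /* Stand-in for starting a line fetch on the memory bus. */
    static void memory_fetch_line(uint32_t line_addr) { (void)line_addr; }

    static uint32_t pf_next;   /* one-entry prefetch buffer (line address) */
    static int      pf_valid;

    /* On a demand fetch of line n, also fetch line n + 1 into the
       prefetch buffer so a sequential pattern finds it ready. */
    void demand_fetch(uint32_t addr) {
        uint32_t line = addr & ~(uint32_t)(LINE_BYTES - 1);

        if (!(pf_valid && pf_next == line))   /* not already prefetched */
            memory_fetch_line(line);

        memory_fetch_line(line + LINE_BYTES); /* prefetch the next line */
        pf_next = line + LINE_BYTES;
        pf_valid = 1;
    }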
• Non-Blocking Cache
– Used with superscalar processors
– No need to wait if one of the pipelines faces a miss
– Interdependencies must be maintained correctly
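A sketch of the bookkeeping this needs (the miss table below, often called MSHRs, is an assumption of the model): outstanding misses are recorded so independent accesses can proceed, and the processor stalls only when the table fills or an access depends on an in-flight miss.

    #include <stdint.h>

    #define MAX_OUTSTANDING 4

    /* One record per in-flight miss. */
    static uint32_t pending[MAX_OUTSTANDING];
    static int      npending;

    /* A dependent access must wait while its block is in flight. */
    int miss_in_flight(uint32_t block) {
        for (int i = 0; i < npending; i++)
            if (pending[i] == block)
                return 1;
        return 0;
    }

    /* Record a miss and keep the pipeline running; returns 0 when the
       table is full and the processor must stall after all. */
    int start_miss(uint32_t block) {
        if (npending == MAX_OUTSTANDING)
            return 0;
        pending[npending++] = block;
        return 1;
    }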
• Synonym Problem
– Virtual memory maps multiple logical locations to the same physical memory
– Different logical copies of the same data can sit in the cache at the same time
– The CPU gets unexpected values
– An invalidation can hit one alias while a stale copy survives
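The aliasing arithmetic, under assumed sizes (4 Kbyte pages, an 8 Kbyte direct-mapped virtually indexed cache, 32-byte lines; the two virtual addresses are illustrative): two virtual addresses with the same page offset can still land on different cache lines.

    #include <stdio.h>
    #include <stdint.h>

    #define PAGE_BYTES  4096
    #define CACHE_BYTES 8192   /* larger than a page: aliasing risk */
    #define LINE_BYTES  32

    int main(void) {
        /* Two virtual pages mapped to the same physical page. */
        uint32_t va1 = 0x00010000, va2 = 0x00023000;

        uint32_t line1 = (va1 / LINE_BYTES) % (CACHE_BYTES / LINE_BYTES);
        uint32_t line2 = (va2 / LINE_BYTES) % (CACHE_BYTES / LINE_BYTES);

        /* Different indexes => two cached copies of one physical byte. */
        printf("synonym lines: %u vs %u\n", line1, line2);
        return 0;
    }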
References
- An Overview of Cache – http://www.intel.com/design/intarch/papers/cache6.htm
- Memory Hierarchy Design – http://www.cs.iastate.edu/~prabhu/Tutorial/CACHE/mem_title.html
- Memory Hierarchy in Cache Based Systems – http://www.sun.com/blueprints/1102/817-0742.pdf
- Cache coherency issues for real time multiprocessing – http://www.embedded.com/97/feat9702.htm