Chapter 5 - Memory
Caches
Memory
What Happens at Boot?
When the computer switches ON, the CPU executes instructions from some start
address stored in Flash ROM, which is memory mapped (e.g., at 0x0002000:
code that copies the firmware into regular memory and jumps into it).
Spatial locality
If a data location is referenced, data locations with nearby addresses will tend to
be referenced soon.
Useful to pre-load data that is close (in address) to other recently accessed data
E.g., sequential instruction access, array data
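Spatial locality can be sketched numerically: with a fixed block size, sequential array accesses touch far fewer distinct cache blocks than large-stride accesses. The block and element sizes below are illustrative assumptions, not values from the text.

```python
# Sketch (assumed parameters): count how many distinct cache blocks a
# traversal touches. Sequential access reuses each block for several
# elements (spatial locality); a large stride lands in a new block each time.
BLOCK_BYTES = 64      # assumed cache block (line) size
ELEM_BYTES = 4        # e.g., a 32-bit word

def blocks_touched(indices):
    """Distinct block numbers covered by the byte addresses of the elements."""
    return len({(i * ELEM_BYTES) // BLOCK_BYTES for i in indices})

n = 1024
sequential = range(n)             # a[0], a[1], a[2], ...
strided = range(0, n * 16, 16)    # a[0], a[16], a[32], ...

print(blocks_touched(sequential)) # 1024 * 4 / 64 = 64 blocks
print(blocks_touched(strided))    # 1024 blocks: one per access
```

With 16 elements per block, the sequential scan gets 15 "free" hits per block once the block is loaded, which is exactly the benefit of pre-loading nearby data.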
Great Idea #3: Principle of Locality/Memory Hierarchy
Taking Advantage of Locality
Memory hierarchy
Store everything on disk
Copy recently accessed (and nearby) items from disk to
smaller DRAM memory
Main memory
Copy more recently accessed (and nearby) items from
DRAM to smaller SRAM memory
Cache memory attached to CPU
Memory Hierarchy Levels
Block (aka line): unit of copying
May be multiple words
SRAM Technology
SRAM does not need to be refreshed, so the access time is very close to the
cycle time. SRAMs typically use six to eight transistors per bit to prevent
the information from being disturbed when it is read. SRAM needs only minimal
power to retain the charge in standby mode.
In the past, most PCs and server systems used separate SRAM chips for either
their primary, secondary, or even tertiary caches.
Today, thanks to Moore's Law, all levels of caches are integrated onto the
processor chip, so the market for separate SRAM chips has nearly
evaporated.
DRAM Technology
Data stored as a charge in a capacitor
Single transistor used to access the charge
Must periodically be refreshed
Read contents and write back
Advanced DRAM Organization
Bits in a DRAM are organized as a rectangular array
DRAM accesses an entire row
Burst mode: supply successive words from a row with
reduced latency
Double data rate (DDR) DRAM
Get twice as much bandwidth based on the clock rate and
the data width
Quad data rate (QDR) DRAM
Separate DDR inputs and outputs
DRAM Performance Factors
Row buffer
Allows several words to be read and refreshed in parallel
Synchronous DRAM
Allows for consecutive accesses in bursts without needing
to send each address
Improves bandwidth
DRAM banking
Allows simultaneous access to multiple DRAMs
Improves bandwidth
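The DDR bandwidth claim can be made concrete with a worked example. The clock rate and bus width below are assumptions chosen for illustration; the only fact used is that DDR transfers data on both clock edges.

```python
# Assumed numbers: peak bandwidth of a DDR interface.
# DDR transfers on both the rising and falling clock edge, so
# peak bandwidth = 2 x clock rate x data-bus width.
clock_hz = 400_000_000        # assumed 400 MHz memory clock
bus_bytes = 8                 # assumed 64-bit data bus
ddr_bandwidth = 2 * clock_hz * bus_bytes
print(ddr_bandwidth / 1e9)    # 6.4 GB/s peak
```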
Flash Types
NOR flash: bit cell like a NOR gate
Random read/write access
Used for instruction memory in embedded systems
Memory Terms
Memory hierarchy A structure that uses multiple levels of memories; as the distance from
the processor increases, the size of the memories and the access time both increase.
Block (or line): The minimum unit of information that can be either present or not present
in a cache.
Hit rate: The fraction of memory accesses found in a level of the memory hierarchy.
Miss rate: The fraction of memory accesses not found in a level of the memory hierarchy.
Hit time: The time required to access a level of the memory hierarchy, including the time
needed to determine whether the access is a hit or a miss.
Miss penalty: The time required to fetch a block into a level of the memory hierarchy from
the lower level, including the time to access the block, transmit it from one level to the
other, insert it in the level that experienced the miss, and then pass the block to the
requestor.
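The terms above combine into the standard average memory access time (AMAT) formula. A minimal sketch, with illustrative numbers (the hit time, miss rate, and miss penalty are assumptions, not values from the text):

```python
# AMAT = hit time + miss rate x miss penalty,
# using the hit time, miss rate, and miss penalty defined above.
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time in cycles for a single cache level."""
    return hit_time + miss_rate * miss_penalty

# Assumed: 1-cycle hit, 5% miss rate, 100-cycle miss penalty.
print(amat(hit_time=1, miss_rate=0.05, miss_penalty=100))  # 6.0 cycles
```

Note how a small miss rate still dominates: 95% of accesses cost 1 cycle, yet the average is 6 cycles because the miss penalty is so large.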
The Basics of Caches
Cache: the name chosen to represent the level of the memory hierarchy between
the processor and main memory in the first commercial computer to have this
extra level. The memories in the datapath are simply replaced by caches.
Cache Memory
The level of the memory hierarchy closest to the CPU
Given a sequence of accesses X1, ..., Xn-1, Xn (references)
Tag: A field in a table used for a memory hierarchy that contains the address
information required to identify whether the associated block in the hierarchy
corresponds to a requested word.
The tag from the cache is compared against the upper portion
of the address to determine whether the entry in the cache
corresponds to the requested address.
Because the cache has 2^10 (or 1024) words and a block size of
one word, 10 bits are used to index the cache, leaving
32 - 10 - 2 = 20 bits to be compared against the tag.
If the tag and upper 20 bits of the address are equal and the
valid bit is on, then the request hits in the cache, and the word
is supplied to the processor. Otherwise, a miss occurs.
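The address split described above can be sketched directly. This assumes the cache described in the text: 2^10 one-word (4-byte) blocks and 32-bit addresses, so bits [1:0] are the byte offset, bits [11:2] the index, and bits [31:12] the 20-bit tag. The sample address is a made-up value for illustration.

```python
# Direct-mapped cache address split: | 20-bit tag | 10-bit index | 2-bit offset |
OFFSET_BITS = 2     # 4 bytes per one-word block
INDEX_BITS = 10     # 2^10 = 1024 blocks

def split_address(addr):
    """Return (tag, index) for a 32-bit byte address."""
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index

tag, index = split_address(0x0000_3004)
print(tag, index)   # tag 3, index 1
```

On an access, the hardware reads the entry at `index`, and the request hits only if that entry's stored tag equals `tag` and its valid bit is set.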
Initial state of the cache after power-on: 8 blocks, 1 word/block, direct mapped.
Set associative
A cache that has a fixed number of locations (at least two) where each block can be
placed.
Direct Mapped - There is a direct mapping from any block address in memory to a single
location in the upper level of the hierarchy.
Associative Cache Example
In direct-mapped placement, there is only
one cache block where memory block 12
can be found, and that block is given by
(12 modulo 8) = 4.
Associativity Example
Compare 4-block caches on the block-access sequence 0, 8, 0, 6, 8.
Direct mapped: block address 0 maps to index (0 modulo 4) = 0;
block address 6 maps to index (6 modulo 4) = 2.
Direct mapped

Block address | Cache index | Hit/miss | Cache content after access (index 0..3)
0             | 0           | miss     | Mem[0]
8             | 0           | miss     | Mem[8]
0             | 0           | miss     | Mem[0]
6             | 2           | miss     | Mem[0], Mem[6]
8             | 0           | miss     | Mem[8], Mem[6]
Associativity Example
2-way set associative

Block address | Cache set
0             | (0 modulo 2) = 0
6             | (6 modulo 2) = 0
8             | (8 modulo 2) = 0

Block address | Cache index | Hit/miss | Set 0 content after access | Set 1
0             | 0           | miss     | Mem[0]                     |
8             | 0           | miss     | Mem[0], Mem[8]             |
0             | 0           | hit      | Mem[0], Mem[8]             |
6             | 0           | miss     | Mem[0], Mem[6]             |
8             | 0           | miss     | Mem[8], Mem[6]             |
Fully associative

Block address | Hit/miss | Cache content after access
0             | miss     | Mem[0]
8             | miss     | Mem[0], Mem[8]
0             | hit      | Mem[0], Mem[8]
6             | miss     | Mem[0], Mem[8], Mem[6]
8             | hit      | Mem[0], Mem[8], Mem[6]
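The three tables above can be reproduced with a small simulator. This is a sketch, assuming LRU replacement within each set (which is what the tables follow); it replays the sequence 0, 8, 0, 6, 8 through a 4-block cache at each associativity and counts hits.

```python
# Replay a block-access sequence through a cache of `num_blocks` blocks
# organized as sets of `ways` blocks each, with LRU replacement per set.
def hits(accesses, num_blocks, ways):
    sets = num_blocks // ways
    cache = [[] for _ in range(sets)]  # each set lists blocks in LRU order
    count = 0
    for block in accesses:
        s = cache[block % sets]
        if block in s:
            count += 1
            s.remove(block)            # re-insert as most recently used
        elif len(s) == ways:
            s.pop(0)                   # evict the least recently used block
        s.append(block)
    return count

seq = [0, 8, 0, 6, 8]
print(hits(seq, 4, 1))   # direct mapped: 0 hits (5 misses)
print(hits(seq, 4, 2))   # 2-way set associative: 1 hit
print(hits(seq, 4, 4))   # fully associative: 2 hits
```

Blocks 0 and 8 conflict in the direct-mapped and 2-way cases because they map to the same set; full associativity removes the conflict, which is why the hit count rises with associativity here.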
Set Associative Cache Organization
The comparators determine which element of
the selected set (if any) matches the tag. The
output of the comparators is used to select the
data from one of the four blocks of the
indexed set, using a multiplexor with a
decoded select signal.
Hardware caches
Reduce comparisons to reduce cost
Virtual memory
Full table lookup makes full associativity feasible
Benefit in reduced miss rate
Cache Design Trade-offs
[Figure: CPU-cache-memory interface. The CPU and the cache exchange
Read/Write, Valid, 32-bit Address, 32-bit Write Data, 32-bit Read Data,
and Ready signals; the cache and main memory exchange the same control
signals with 128-bit Write Data and Read Data. Each access takes
multiple cycles.]