Lecture 2


COMPUTER ORGANIZATION AND DESIGN
5th Edition
The Hardware/Software Interface

Chapter 5
Large and Fast: Exploiting Memory Hierarchy
§5.1 Introduction
Principle of Locality
 Programs access a small proportion of their address space at any time
 Temporal locality
 Items accessed recently are likely to be accessed again soon
 e.g., instructions in a loop, induction variables
 Spatial locality
 Items near those accessed recently are likely to be accessed soon
 e.g., sequential instruction access, array data (see the sketch below)
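
A minimal C sketch of both kinds of locality (the function name and array are illustrative, not from the slides): the accumulator and loop index are reused on every iteration (temporal locality), while the array elements are touched at consecutive addresses (spatial locality), which is exactly what multi-word cache blocks exploit.

    /* sum_array: illustrative only -- not from the slides */
    long sum_array(const int *a, int n)
    {
        long sum = 0;                 /* reused every iteration: temporal locality */
        for (int i = 0; i < n; i++)   /* i reused every iteration: temporal too    */
            sum += a[i];              /* consecutive addresses: spatial locality   */
        return sum;
    }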
Taking Advantage of Locality
 Memory hierarchy
 Store everything on disk
 Copy recently accessed (and nearby) items from disk to smaller DRAM memory
 Main memory
 Copy more recently accessed (and nearby) items from DRAM to smaller SRAM memory
 Cache memory attached to CPU



Memory Hierarchy Levels
 Block (aka line): the unit of copying
 May be multiple words (takes advantage of spatial locality)
 If accessed data is present in the upper (fast) level
 Hit: access satisfied by the upper level
 Hit ratio: hits/accesses
 If accessed data is absent
 Miss: block copied from the lower (slow) level
 Time taken: miss penalty
 Miss ratio: misses/accesses = 1 – hit ratio
 Then the accessed data is supplied from the upper level



§5.3 The Basics of Caches
Cache Memory
 Cache memory (important subject!)
 The level of the memory hierarchy closest to the CPU
 Much smaller than main memory, for fast access
 Given accesses X1, …, Xn–1, Xn
 How do we know if the data is present?
 Where do we look?
 Not byte-addressable, unlike main memory



Direct Mapped Cache
 Location determined by address mapping
 Direct mapped cache: only one choice
 (Block address) modulo (#Blocks in cache)
 The cache retains a subset of the memory blocks
 Memory is byte-addressable
 Use the low-order block address bits as the index (see the sketch below)
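
A one-function sketch of the mapping rule (names are mine). Because the block count is a power of two, (block address) modulo (#blocks) is exactly the low-order bits of the block address, so the hardware needs only a bit-field, not a divider.

    #include <stdint.h>

    #define NUM_BLOCKS 8u   /* a power of two, as in the examples below */

    /* Index of the single cache location a block can occupy. */
    uint32_t cache_index(uint32_t block_addr)
    {
        /* block_addr % NUM_BLOCKS == block_addr & (NUM_BLOCKS - 1)
           whenever NUM_BLOCKS is a power of two */
        return block_addr & (NUM_BLOCKS - 1u);
    }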



Memory Addressing
 Memory is byte-addressable; addresses here are 32-bit (bits 0 through 31)
 Byte address
 Word address + 2-bit offset within the word
 (Align to word: offset bits = ‘00’)
 Double-word address + 3-bit offset within the double word
 (Align to double word: offset bits = ‘000’)
 Note: the offset bits are ‘0’ to align on the word or double-word boundary (see the sketch below)
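
A sketch of the address arithmetic above (helper names are mine): shifting a byte address right by 2 gives the word address, and the 2 low bits are the offset within the word; for double words it is 3 bits.

    #include <stdint.h>

    uint32_t word_addr(uint32_t byte_addr)  { return byte_addr >> 2; }  /* drop the 2 offset bits */
    uint32_t word_off(uint32_t byte_addr)   { return byte_addr & 0x3; } /* '00' when word-aligned  */
    uint32_t dword_addr(uint32_t byte_addr) { return byte_addr >> 3; }  /* drop the 3 offset bits */
    uint32_t dword_off(uint32_t byte_addr)  { return byte_addr & 0x7; } /* '000' when DW-aligned   */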

Tags and Valid Bits
 How do we know which particular block is stored in a cache location?
 Store the block address (ID) as well as the data
 Actually, only the high-order bits are needed
 Called the address tag
 A matching tag determines a cache hit
 What if there is no data in a location?
 Valid bit: 1 = present, 0 = not present
 Initially 0 (see the struct sketch below)
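
One cache line as a C struct (a sketch; field widths are illustrative, not from the slides). A hit requires both that the valid bit is set and that the stored tag matches the high-order bits of the requested address.

    #include <stdint.h>

    struct cache_line {
        uint8_t  valid;  /* 1 = block present, 0 = empty (the reset state) */
        uint32_t tag;    /* high-order bits of the block address           */
        uint32_t data;   /* one 32-bit word per block in this sketch       */
    };

    int is_hit(const struct cache_line *line, uint32_t tag)
    {
        return line->valid && line->tag == tag;  /* valid AND tag match */
    }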



Cache Example – Direct-Mapped
 8 blocks, 1 word/block, direct mapped
 Initial state

Index V Tag Data


000 N
001 N
010 N
011 N
100 N
101 N
110 N
111 N



Cache Example – Direct-Mapped
Word addr   Binary addr (tag | index)   Hit/miss   Cache block
22          10 | 110                    Miss       110

Index V Tag Data


000 N
001 N
010 N
011 N
100 N
101 N
110 Y 10 Mem[10110]
111 N



Cache Example
Word addr Binary addr Hit/miss Cache block
26 11 010 Miss 010

Index V Tag Data


000 N
001 N
010 Y 11 Mem[11010]
011 N
100 N
101 N
110 Y 10 Mem[10110]
111 N



Cache Example
Word addr Binary addr Hit/miss Cache block
22 10 110 Hit 110
26 11 010 Hit 010

Index V Tag Data


000 N
001 N
010 Y 11 Mem[11010]
011 N
100 N
101 N
110 Y 10 Mem[10110]
111 N



Cache Example
Word addr Binary addr Hit/miss Cache block
16 10 000 Miss 000
3 00 011 Miss 011
16 10 000 Hit 000

Index V Tag Data


000 Y 10 Mem[10000]
001 N
010 Y 11 Mem[11010]
011 Y 00 Mem[00011]
100 N
101 N
110 Y 10 Mem[10110]
111 N



Cache Example
Word addr Binary addr Hit/miss Cache block
18 10 010 Miss 010

Index V Tag Data


000 Y 10 Mem[10000]
001 N
010 Y 10 Mem[10010] (replaces the old block, tag 11; see the simulation sketch after the table)
011 Y 00 Mem[00011]
100 N
101 N
110 Y 10 Mem[10110]
111 N
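
The whole sequence above can be replayed with a few lines of C (a minimal sketch of the slides' 8-block, 1-word/block direct-mapped cache; variable names are mine). Running it prints miss, miss, hit, hit, miss, miss, hit, miss, matching the tables, with the final access to 18 evicting 26 because both map to index 010.

    #include <stdio.h>
    #include <stdint.h>

    #define NUM_BLOCKS 8

    struct line { int valid; uint32_t tag; };

    int main(void)
    {
        struct line cache[NUM_BLOCKS] = {0};   /* all valid bits start at 0 */
        uint32_t trace[] = {22, 26, 22, 26, 16, 3, 16, 18};

        for (int i = 0; i < 8; i++) {
            uint32_t addr  = trace[i];
            uint32_t index = addr % NUM_BLOCKS;   /* low-order 3 bits */
            uint32_t tag   = addr / NUM_BLOCKS;   /* high-order bits  */
            int hit = cache[index].valid && cache[index].tag == tag;
            printf("addr %2u -> index %u: %s\n", (unsigned)addr,
                   (unsigned)index, hit ? "hit" : "miss");
            if (!hit) {                           /* fill (or replace) on a miss */
                cache[index].valid = 1;
                cache[index].tag   = tag;
            }
        }
        return 0;
    }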



Four Basic Questions
• Consider accesses to the levels of a memory hierarchy.
– Memory: byte-addressable
– Caches: use the block (also called a line) as the unit of data transfer, satisfying the Principle of Locality; but blocks must be located (caches are not byte-addressable)
• Blocks transfer between cache levels, and to/from memory
• Cache design is described by four behaviors:
– Block Placement: where can a new block be placed in the level?
– Block Identification: how is a block found if it is in the level?
– Block Replacement: which existing block should be replaced if necessary?
– Write Strategy: how are writes to the block handled?
Address Subdivision
 A 32-bit address splits into tag, index, and offset; the offset width depends on block size (4-byte blocks here, so a 2-bit byte offset)
 With 1K blocks (i.e., 1K sets), the index is 10 bits, leaving a 20-bit tag (see the sketch below)
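
A bit-field sketch matching this figure (function names are mine): with 4-byte blocks the byte offset is 2 bits, 1K blocks give a 10-bit index, and the remaining 20 bits of a 32-bit address form the tag.

    #include <stdint.h>

    uint32_t byte_offset(uint32_t addr) { return addr & 0x3; }          /* bits 1-0   */
    uint32_t index_field(uint32_t addr) { return (addr >> 2) & 0x3FF; } /* bits 11-2  */
    uint32_t tag_field(uint32_t addr)   { return addr >> 12; }          /* bits 31-12 */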





Example: Larger Block Size
 64 blocks, 16 bytes/block (a 1KB direct-mapped cache)
 To what block number does address 1200 map?
 Block address = 1200/16 = 75
 Block number = 75 modulo 64 = 11
 Address fields: Tag = bits 31–10 (22 bits), Index = bits 9–4 (6 bits), Offset = bits 3–0 (4 bits, the byte within the block); tag + index form the block address (see the check below)
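
The same arithmetic as a quick check (a small standalone program, not from the slides):

    #include <stdio.h>

    int main(void)
    {
        unsigned addr = 1200, block_bytes = 16, num_blocks = 64;
        unsigned block_addr = addr / block_bytes;       /* 1200/16 = 75   */
        unsigned block_num  = block_addr % num_blocks;  /* 75 mod 64 = 11 */
        printf("block address %u maps to cache block %u\n", block_addr, block_num);
        return 0;
    }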



Block Size Considerations
 Larger blocks should reduce the miss rate
 Due to spatial locality
 But in a fixed-sized cache
 Larger blocks ⇒ fewer of them
 More competition ⇒ increased miss rate
 Larger blocks ⇒ pollution (bringing in unwanted data)
 Larger miss penalty
 Can override the benefit of the reduced miss rate
 Early restart and critical-word-first can help
Cache Miss Rate – Block Sizes

FIGURE 5.11 Miss rate versus block size. Note that the miss rate actually goes up if the block size is too large relative to the cache size. Each line represents a cache of a different size. (This figure is independent of associativity, discussed soon.) Unfortunately, SPEC CPU2000 traces would take too long if block size were included, so this data is based on SPEC92.



Cache Misses
 On cache hit, CPU proceeds normally
 On cache miss
 Stall the CPU pipeline
 Fetch block from next level of hierarchy
 Instruction cache miss
 Restart instruction fetch
 Data cache miss
 Complete data access



Write Policy – Write-Through
 On a data-write hit, we could just update the block in the cache
 But then cache and memory would be inconsistent
 Write through: also update memory
 But this makes writes take longer
 e.g., if base CPI = 1, 10% of instructions are stores, and a write to memory takes 100 cycles:
 Effective CPI = 1 + 0.1×100 = 11 (see the sketch below)
 Solution: write buffer
 Holds data waiting to be written to memory
 CPU continues immediately
 Only stalls on a write if the write buffer is already full
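
The CPI arithmetic as a tiny helper (a sketch; the function name is mine). Every instruction pays the base CPI, and the store fraction additionally pays the full memory write latency when each store stalls:

    /* effective_cpi(1.0, 0.10, 100.0) == 1 + 0.1*100 = 11.0 */
    double effective_cpi(double base_cpi, double store_frac, double write_cycles)
    {
        return base_cpi + store_frac * write_cycles;
    }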



Write Policy – Write-Back
 Alternative: on a data-write hit, just update the block in the cache
 Keep track of whether each block is dirty
 When a dirty block is replaced
 Write it back to memory
 A write buffer can let the replacing block be read first: a replacement involves two memory operations (reading the replacing block and writing back the replaced one), and starting with the read speeds up execution, so the replaced block is parked in the buffer until its write can proceed (see the sketch below)
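
A sketch of that replacement order in C (the struct layout and the two helpers are hypothetical, not from the slides): the dirty victim is parked in the write buffer so the read of the incoming block can start immediately.

    #include <stdint.h>

    #define BLOCK_BYTES 16

    struct line {
        int      valid, dirty;
        uint32_t tag;
        uint8_t  data[BLOCK_BYTES];
    };

    void buffer_writeback(uint32_t tag, const uint8_t *data);  /* assumed helper */
    void fetch_block(uint32_t tag, uint8_t *data);             /* assumed helper */

    void replace_block(struct line *l, uint32_t new_tag)
    {
        if (l->valid && l->dirty)
            buffer_writeback(l->tag, l->data);  /* park the victim in the write buffer */
        fetch_block(new_tag, l->data);          /* start the read without waiting      */
        l->tag   = new_tag;
        l->valid = 1;
        l->dirty = 0;                           /* a freshly fetched block is clean    */
    }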



Write Allocation, Write Around
 What should happen on a write miss?
 Alternatives for write-through
 Allocate on miss: fetch the block
 Write around: don’t fetch the block (called no-write-allocate)
 Useful because programs often write a whole block before reading it (e.g., initialization)
 For write-back
 Usually fetch the block (write allocate); see the sketch below
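
A sketch of the two write-miss alternatives for a write-through cache (the enum and helpers are illustrative, not from the slides). Under write-through, memory is updated either way; the policies differ only in whether the block is also brought into the cache.

    enum write_miss_policy { WRITE_ALLOCATE, WRITE_AROUND };

    void write_to_memory(unsigned addr, unsigned val);  /* assumed helper */
    void fetch_into_cache(unsigned addr);               /* assumed helper */
    void write_in_cache(unsigned addr, unsigned val);   /* assumed helper */

    void handle_write_miss(enum write_miss_policy p, unsigned addr, unsigned val)
    {
        if (p == WRITE_ALLOCATE) {
            fetch_into_cache(addr);     /* bring the block in first    */
            write_in_cache(addr, val);  /* then update the cached copy */
        }
        write_to_memory(addr, val);     /* write-through: memory is always updated */
    }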



Example: Intrinsity FastMATH
 Embedded MIPS processor
 12-stage pipeline
 Instruction and data access on each cycle
 Split cache: separate I-cache and D-cache
 Each 16KB: 256 blocks × 16 words/block
 D-cache: write-through or write-back
 SPEC2000 miss rates
 I-cache: 0.4%
 D-cache: 11.4%
 Weighted average: 3.2% (see the check below)
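
A quick check on that weighted average (the data-access fraction here is my assumption; the slide does not give the SPEC2000 instruction mix): every instruction references the I-cache and some fraction also references the D-cache, so the per-reference miss rate is a weighted average.

    /* combined_miss_rate(0.004, 0.114, 0.35) ~= 0.0325, close to the 3.2%
       quoted; the exact figure depends on the real instruction mix */
    double combined_miss_rate(double i_rate, double d_rate, double data_frac)
    {
        return (i_rate + data_frac * d_rate) / (1.0 + data_frac);
    }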



Example: Intrinsity FastMATH
 Direct-mapped 16KB cache with 64-byte blocks
 Cache size = 256 blocks × 64 bytes = 16 KB
 Address fields: tag, index, a 4-bit block offset (also called the word offset, selecting one of the 16 words in a 64-byte block), and a 2-bit byte offset (4 bytes per word)
