Lecture 2


COMPUTER ORGANIZATION AND DESIGN
5th Edition
The Hardware/Software Interface

Chapter 5
Large and Fast: Exploiting Memory Hierarchy
§5.1 Introduction
Principle of Locality
 Programs access a small proportion of their address space at any time
 Temporal locality
 Items accessed recently are likely to be accessed again soon
 e.g., instructions in a loop, induction variables
 Spatial locality
 Items near those accessed recently are likely to be accessed soon
 e.g., sequential instruction access, array data (see the sketch below)
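
A minimal C sketch of both kinds of locality (the function name and array are illustrative, not from the slides): the accumulator and loop index are reused on every iteration (temporal locality), while the array elements are touched at consecutive addresses (spatial locality), which is exactly what multi-word cache blocks exploit.

    /* sum_array: illustrative only -- not from the slides */
    long sum_array(const int *a, int n)
    {
        long sum = 0;                 /* reused every iteration: temporal locality */
        for (int i = 0; i < n; i++)   /* i reused every iteration: temporal too    */
            sum += a[i];              /* consecutive addresses: spatial locality   */
        return sum;
    }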
Taking Advantage of Locality
 Memory hierarchy
 Store everything on disk
 Copy recently accessed (and nearby) items from disk to smaller DRAM memory
 Main memory
 Copy more recently accessed (and nearby) items from DRAM to smaller SRAM memory
 Cache memory attached to CPU



Memory Hierarchy Levels
 Block (aka line): the unit of copying
 May be multiple words (takes advantage of spatial locality)
 If accessed data is present in the upper (fast) level
 Hit: access satisfied by the upper level
 Hit ratio: hits/accesses
 If accessed data is absent
 Miss: block copied from the lower (slow) level
 Time taken: miss penalty
 Miss ratio: misses/accesses = 1 – hit ratio
 Then the accessed data is supplied from the upper level



§5.3 The Basics of Caches
Cache Memory
 Cache memory (important subject!)
 The level of the memory hierarchy closest to the CPU
 Much smaller than main memory, for fast access
 Given accesses X1, …, Xn–1, Xn
 How do we know if the data is present?
 Where do we look?
 Not byte-addressable, unlike main memory



Direct Mapped Cache
 Location determined by address mapping
 Direct mapped cache: only one choice
 (Block address) modulo (#Blocks in cache)
 The cache retains a subset of the memory blocks
 Memory is byte-addressable
 Use the low-order block address bits as the index (see the sketch below)
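
A one-function sketch of the mapping rule (names are mine). Because the block count is a power of two, (block address) modulo (#blocks) is exactly the low-order bits of the block address, so the hardware needs only a bit-field, not a divider.

    #include <stdint.h>

    #define NUM_BLOCKS 8u   /* a power of two, as in the examples below */

    /* Index of the single cache location a block can occupy. */
    uint32_t cache_index(uint32_t block_addr)
    {
        /* block_addr % NUM_BLOCKS == block_addr & (NUM_BLOCKS - 1)
           whenever NUM_BLOCKS is a power of two */
        return block_addr & (NUM_BLOCKS - 1u);
    }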



Memory Addressing
 Memory is byte-addressable; addresses here are 32-bit (bits 0 through 31)
 Byte address
 Word address + 2-bit offset within the word
 (Align to word: offset bits = ‘00’)
 Double-word address + 3-bit offset within the double word
 (Align to double word: offset bits = ‘000’)
 Note: the offset bits are ‘0’ to align on the word or double-word boundary (see the sketch below)
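
A sketch of the address arithmetic above (helper names are mine): shifting a byte address right by 2 gives the word address, and the 2 low bits are the offset within the word; for double words it is 3 bits.

    #include <stdint.h>

    uint32_t word_addr(uint32_t byte_addr)  { return byte_addr >> 2; }  /* drop the 2 offset bits */
    uint32_t word_off(uint32_t byte_addr)   { return byte_addr & 0x3; } /* '00' when word-aligned  */
    uint32_t dword_addr(uint32_t byte_addr) { return byte_addr >> 3; }  /* drop the 3 offset bits */
    uint32_t dword_off(uint32_t byte_addr)  { return byte_addr & 0x7; } /* '000' when DW-aligned   */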

Tags and Valid Bits
 How do we know which particular block is stored in a cache location?
 Store the block address (ID) as well as the data
 Actually, only the high-order bits are needed
 Called the address tag
 A matching tag determines a cache hit
 What if there is no data in a location?
 Valid bit: 1 = present, 0 = not present
 Initially 0 (see the struct sketch below)
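
One cache line as a C struct (a sketch; field widths are illustrative, not from the slides). A hit requires both that the valid bit is set and that the stored tag matches the high-order bits of the requested address.

    #include <stdint.h>

    struct cache_line {
        uint8_t  valid;  /* 1 = block present, 0 = empty (the reset state) */
        uint32_t tag;    /* high-order bits of the block address           */
        uint32_t data;   /* one 32-bit word per block in this sketch       */
    };

    int is_hit(const struct cache_line *line, uint32_t tag)
    {
        return line->valid && line->tag == tag;  /* valid AND tag match */
    }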



Cache Example – Direct-Mapped
 8 blocks, 1 word/block, direct mapped
 Initial state

Index V Tag Data


000 N
001 N
010 N
011 N
100 N
101 N
110 N
111 N



Cache Example – Direct-Mapped
Word addr   Binary addr (tag | index)   Hit/miss   Cache block
22          10 | 110                    Miss       110

Index V Tag Data


000 N
001 N
010 N
011 N
100 N
101 N
110 Y 10 Mem[10110]
111 N



Cache Example
Word addr Binary addr Hit/miss Cache block
26 11 010 Miss 010

Index V Tag Data


000 N
001 N
010 Y 11 Mem[11010]
011 N
100 N
101 N
110 Y 10 Mem[10110]
111 N



Cache Example
Word addr Binary addr Hit/miss Cache block
22 10 110 Hit 110
26 11 010 Hit 010

Index V Tag Data


000 N
001 N
010 Y 11 Mem[11010]
011 N
100 N
101 N
110 Y 10 Mem[10110]
111 N



Cache Example
Word addr Binary addr Hit/miss Cache block
16 10 000 Miss 000
3 00 011 Miss 011
16 10 000 Hit 000

Index V Tag Data


000 Y 10 Mem[10000]
001 N
010 Y 11 Mem[11010]
011 Y 00 Mem[00011]
100 N
101 N
110 Y 10 Mem[10110]
111 N



Cache Example
Word addr Binary addr Hit/miss Cache block
18 10 010 Miss 010

Index V Tag Data


000 Y 10 Mem[10000]
001 N
010 Y 10 Mem[10010] (replaces the old block, tag 11; see the simulation sketch after the table)
011 Y 00 Mem[00011]
100 N
101 N
110 Y 10 Mem[10110]
111 N
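
The whole sequence above can be replayed with a few lines of C (a minimal sketch of the slides' 8-block, 1-word/block direct-mapped cache; variable names are mine). Running it prints miss, miss, hit, hit, miss, miss, hit, miss, matching the tables, with the final access to 18 evicting 26 because both map to index 010.

    #include <stdio.h>
    #include <stdint.h>

    #define NUM_BLOCKS 8

    struct line { int valid; uint32_t tag; };

    int main(void)
    {
        struct line cache[NUM_BLOCKS] = {0};   /* all valid bits start at 0 */
        uint32_t trace[] = {22, 26, 22, 26, 16, 3, 16, 18};

        for (int i = 0; i < 8; i++) {
            uint32_t addr  = trace[i];
            uint32_t index = addr % NUM_BLOCKS;   /* low-order 3 bits */
            uint32_t tag   = addr / NUM_BLOCKS;   /* high-order bits  */
            int hit = cache[index].valid && cache[index].tag == tag;
            printf("addr %2u -> index %u: %s\n", (unsigned)addr,
                   (unsigned)index, hit ? "hit" : "miss");
            if (!hit) {                           /* fill (or replace) on a miss */
                cache[index].valid = 1;
                cache[index].tag   = tag;
            }
        }
        return 0;
    }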



Four Basic Questions
• Consider accesses to the levels of a memory hierarchy.
– Memory: byte-addressable
– Caches: use the block (also called a line) as the unit of data transfer, satisfying the Principle of Locality; but blocks must be located (caches are not byte-addressable)
• Blocks transfer between cache levels, and to/from memory
• Cache design is described by four behaviors:
– Block Placement: where can a new block be placed in the level?
– Block Identification: how is a block found if it is in the level?
– Block Replacement: which existing block should be replaced if necessary?
– Write Strategy: how are writes to the block handled?
Address Subdivision
 A 32-bit address splits into tag, index, and offset; the offset width depends on block size (4-byte blocks here, so a 2-bit byte offset)
 With 1K blocks (i.e., 1K sets), the index is 10 bits, leaving a 20-bit tag (see the sketch below)
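
A bit-field sketch matching this figure (function names are mine): with 4-byte blocks the byte offset is 2 bits, 1K blocks give a 10-bit index, and the remaining 20 bits of a 32-bit address form the tag.

    #include <stdint.h>

    uint32_t byte_offset(uint32_t addr) { return addr & 0x3; }          /* bits 1-0   */
    uint32_t index_field(uint32_t addr) { return (addr >> 2) & 0x3FF; } /* bits 11-2  */
    uint32_t tag_field(uint32_t addr)   { return addr >> 12; }          /* bits 31-12 */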





Example: Larger Block Size
 64 blocks, 16 bytes/block (a 1KB direct-mapped cache)
 To what block number does address 1200 map?
 Block address = 1200/16 = 75
 Block number = 75 modulo 64 = 11
 Address fields: Tag = bits 31–10 (22 bits), Index = bits 9–4 (6 bits), Offset = bits 3–0 (4 bits, the byte within the block); tag + index form the block address (see the check below)
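
The same arithmetic as a quick check (a small standalone program, not from the slides):

    #include <stdio.h>

    int main(void)
    {
        unsigned addr = 1200, block_bytes = 16, num_blocks = 64;
        unsigned block_addr = addr / block_bytes;       /* 1200/16 = 75   */
        unsigned block_num  = block_addr % num_blocks;  /* 75 mod 64 = 11 */
        printf("block address %u maps to cache block %u\n", block_addr, block_num);
        return 0;
    }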



Block Size Considerations
 Larger blocks should reduce the miss rate
 Due to spatial locality
 But in a fixed-sized cache
 Larger blocks ⇒ fewer of them
 More competition ⇒ increased miss rate
 Larger blocks ⇒ pollution (bringing in unwanted data)
 Larger miss penalty
 Can override the benefit of the reduced miss rate
 Early restart and critical-word-first can help
Cache Miss Rate – Block Sizes

FIGURE 5.11 Miss rate versus block size. Note that the miss rate actually goes up if the block size is too large relative to the cache size. Each line represents a cache of a different size. (This figure is independent of associativity, discussed soon.) Unfortunately, SPEC CPU2000 traces would take too long if block size were included, so this data is based on SPEC92.



Cache Misses
 On cache hit, CPU proceeds normally
 On cache miss
 Stall the CPU pipeline
 Fetch block from next level of hierarchy
 Instruction cache miss
 Restart instruction fetch
 Data cache miss
 Complete data access



Write Policy – Write-Through
 On a data-write hit, we could just update the block in the cache
 But then cache and memory would be inconsistent
 Write through: also update memory
 But this makes writes take longer
 e.g., if base CPI = 1, 10% of instructions are stores, and a write to memory takes 100 cycles:
 Effective CPI = 1 + 0.1×100 = 11 (see the sketch below)
 Solution: write buffer
 Holds data waiting to be written to memory
 CPU continues immediately
 Only stalls on a write if the write buffer is already full
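
The CPI arithmetic as a tiny helper (a sketch; the function name is mine). Every instruction pays the base CPI, and the store fraction additionally pays the full memory write latency when each store stalls:

    /* effective_cpi(1.0, 0.10, 100.0) == 1 + 0.1*100 = 11.0 */
    double effective_cpi(double base_cpi, double store_frac, double write_cycles)
    {
        return base_cpi + store_frac * write_cycles;
    }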



Write Policy – Write-Back
 Alternative: on a data-write hit, just update the block in the cache
 Keep track of whether each block is dirty
 When a dirty block is replaced
 Write it back to memory
 A write buffer can let the replacing block be read first: a replacement involves two memory operations (reading the replacing block and writing back the replaced one), and starting with the read speeds up execution, so the replaced block is parked in the buffer until its write can proceed (see the sketch below)
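
A sketch of that replacement order in C (the struct layout and the two helpers are hypothetical, not from the slides): the dirty victim is parked in the write buffer so the read of the incoming block can start immediately.

    #include <stdint.h>

    #define BLOCK_BYTES 16

    struct line {
        int      valid, dirty;
        uint32_t tag;
        uint8_t  data[BLOCK_BYTES];
    };

    void buffer_writeback(uint32_t tag, const uint8_t *data);  /* assumed helper */
    void fetch_block(uint32_t tag, uint8_t *data);             /* assumed helper */

    void replace_block(struct line *l, uint32_t new_tag)
    {
        if (l->valid && l->dirty)
            buffer_writeback(l->tag, l->data);  /* park the victim in the write buffer */
        fetch_block(new_tag, l->data);          /* start the read without waiting      */
        l->tag   = new_tag;
        l->valid = 1;
        l->dirty = 0;                           /* a freshly fetched block is clean    */
    }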



Write Allocation, Write Around
 What should happen on a write miss?
 Alternatives for write-through
 Allocate on miss: fetch the block
 Write around: don’t fetch the block (called no-write-allocate)
 Useful because programs often write a whole block before reading it (e.g., initialization)
 For write-back
 Usually fetch the block (write allocate); see the sketch below
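
A sketch of the two write-miss alternatives for a write-through cache (the enum and helpers are illustrative, not from the slides). Under write-through, memory is updated either way; the policies differ only in whether the block is also brought into the cache.

    enum write_miss_policy { WRITE_ALLOCATE, WRITE_AROUND };

    void write_to_memory(unsigned addr, unsigned val);  /* assumed helper */
    void fetch_into_cache(unsigned addr);               /* assumed helper */
    void write_in_cache(unsigned addr, unsigned val);   /* assumed helper */

    void handle_write_miss(enum write_miss_policy p, unsigned addr, unsigned val)
    {
        if (p == WRITE_ALLOCATE) {
            fetch_into_cache(addr);     /* bring the block in first    */
            write_in_cache(addr, val);  /* then update the cached copy */
        }
        write_to_memory(addr, val);     /* write-through: memory is always updated */
    }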



Example: Intrinsity FastMATH
 Embedded MIPS processor
 12-stage pipeline
 Instruction and data access on each cycle
 Split cache: separate I-cache and D-cache
 Each 16KB: 256 blocks × 16 words/block
 D-cache: write-through or write-back
 SPEC2000 miss rates
 I-cache: 0.4%
 D-cache: 11.4%
 Weighted average: 3.2% (see the check below)
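
A quick check on that weighted average (the data-access fraction here is my assumption; the slide does not give the SPEC2000 instruction mix): every instruction references the I-cache and some fraction also references the D-cache, so the per-reference miss rate is a weighted average.

    /* combined_miss_rate(0.004, 0.114, 0.35) ~= 0.0325, close to the 3.2%
       quoted; the exact figure depends on the real instruction mix */
    double combined_miss_rate(double i_rate, double d_rate, double data_frac)
    {
        return (i_rate + data_frac * d_rate) / (1.0 + data_frac);
    }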



Example: Intrinsity FastMATH
 Direct-mapped 16KB cache with 64-byte blocks
 Cache size = 256 blocks × 64 bytes = 16 KB
 Address fields: tag, index, a 4-bit block offset (also called the word offset, selecting one of the 16 words in a 64-byte block), and a 2-bit byte offset (4 bytes per word)
