
Improving Cache Performance: Reducing Misses
CSE 240A — Dean Tullsen

How To Measure

Average memory-access time = Hit time + Miss rate × Miss penalty (ns or clocks)

Improving cache performance means one of three things:
1. Reduce the miss rate,
2. Reduce the miss penalty, or
3. Reduce the time to hit in the cache.

Classifying Misses: 3 Cs

• Compulsory—The first access to a block cannot be in the cache, so the block must be brought into the cache. These are also called cold start misses or first reference misses. (They are the misses that would occur even in an infinite cache.)
• Capacity—If C is the size of the cache in blocks, a miss is a capacity miss when more than C unique blocks have been accessed since this block was last referenced. (They are the non-compulsory misses that would still occur in a fully associative cache of the same size.)
• Conflict—Any miss that is neither a compulsory miss nor a capacity miss is a byproduct of the cache mapping algorithm: a conflict miss occurs because too many active blocks are mapped to the same cache set. (They are the non-compulsory, non-capacity misses.)


How To Reduce Misses?

• Compulsory Misses?
• Capacity Misses?
• Conflict Misses?
• What can the compiler do?

Reduce Misses via Larger Block Size

• Example: a 16K cache; the miss penalty is 42 for a 16-byte block, 44 for a 32-byte block, and 48 for a 64-byte block. The corresponding miss rates are 3.94%, 2.87%, and 2.64%. Which block size gives the best performance (lowest AMAT)? (A worked sketch follows below.)
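A minimal sketch of the arithmetic, assuming the miss penalties are in clock cycles and a 1-cycle hit time (the hit time is not stated on the slide), using the miss rates given above:

#include <stdio.h>

int main(void) {
    /* Data from the example: block size, miss penalty, miss rate.
       The 1-clock hit time is an assumption, not stated on the slide. */
    const double hit_time       = 1.0;
    const int    block_size[]   = {16, 32, 64};
    const double miss_penalty[] = {42.0, 44.0, 48.0};
    const double miss_rate[]    = {0.0394, 0.0287, 0.0264};

    for (int i = 0; i < 3; i++) {
        /* AMAT = Hit time + Miss rate x Miss penalty */
        double amat = hit_time + miss_rate[i] * miss_penalty[i];
        printf("%2d-byte block: AMAT = %.3f clocks\n", block_size[i], amat);
    }
    return 0;
}

With these numbers the 32-byte block wins narrowly (about 2.26 clocks, vs. 2.27 for the 64-byte block and 2.65 for the 16-byte block); the assumed hit time does not change the ordering, since it is added equally to all three.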
Example: Avg. Memory Access Time vs. Miss Rate

• Beware: execution time is the only final measure!
  – Will clock cycle time increase?
  – Hill [1988] suggested hit time grows by about +10% for an external cache and +2% for an internal cache for 2-way vs. 1-way.

Reduce Misses via Higher Associativity

• Example: assume the cycle time (CT) is 1.10× the direct-mapped CT for 2-way, 1.12× for 4-way, and 1.14× for 8-way.

  AMAT
  Cache Size (KB)   1-way   2-way   4-way   8-way
        1           7.65    6.60    6.22    5.44
        2           5.90    4.90    4.62    4.09
        4           4.60    3.95    3.57    3.19
        8           3.30    3.00    2.87    2.59
       16           2.45    2.20    2.12    2.04
       32           2.00    1.80    1.77    1.79
       64           1.70    1.60    1.57    1.59
      128           1.50    1.45    1.42    1.44


Reducing Misses by Emulating Associativity: Victim Cache

• Can we get the hit rate of an associative cache with the access time of a direct-mapped cache?
• Add a small buffer to hold data recently discarded from the cache. (A software-level sketch appears after the prefetching slide below.)
• Jouppi [1990]: a 4-entry victim cache removed 20% to 95% of the conflict misses for a 4 KB direct-mapped data cache.

Reducing Misses by HW Prefetching of Instructions & Data

• E.g., instruction prefetching
  – Alpha 21064 fetches 2 blocks on a miss
  – The extra block is placed in a stream buffer
  – On a miss, check the stream buffer
• Works with data blocks too:
  – Jouppi [1990]: 1 data stream buffer caught 25% of the misses from a 4KB cache; 4 streams caught 43%
  – Palacharla & Kessler [1994]: for scientific programs, 8 streams caught 50% to 70% of the misses from two 64KB, 4-way set associative caches
• Prefetching relies on extra memory bandwidth that can be used without penalty
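A minimal, software-level sketch of the victim-cache idea (illustrative only; a real victim cache is a small fully associative hardware structure probed on a main-cache miss). The entry count follows Jouppi's 4-entry example; the field names and FIFO replacement here are assumptions:

#include <stdbool.h>
#include <stdint.h>

#define VC_ENTRIES 4                 /* 4-entry victim cache, as in Jouppi [1990] */

struct vc_entry { bool valid; uint64_t block_addr; };
static struct vc_entry victim[VC_ENTRIES];

/* On a main-cache miss, probe the victim cache.  A hit means the block was
   recently discarded; it can be swapped back into the main cache instead of
   paying the full miss penalty.  Returns true on a victim-cache hit. */
bool victim_lookup(uint64_t block_addr) {
    for (int i = 0; i < VC_ENTRIES; i++)
        if (victim[i].valid && victim[i].block_addr == block_addr)
            return true;
    return false;
}

/* When the main cache evicts a block, place it in the victim cache (FIFO). */
void victim_insert(uint64_t evicted_block_addr) {
    static int next = 0;
    victim[next].valid = true;
    victim[next].block_addr = evicted_block_addr;
    next = (next + 1) % VC_ENTRIES;
}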



Reducing Misses by SW Prefetching Data

• Data prefetch
  – Load data into a register (HP PA-RISC, IA-64, Tera)
  – Cache prefetch: load into the cache (MIPS IV, PowerPC, SPARC)
  – Special prefetching instructions cannot cause faults; this is a form of speculative execution
• Issuing prefetch instructions (including the address calculation) takes time
  – Is the cost of issuing prefetches < the savings in reduced misses? (See the prefetch sketch after the compiler-optimization list below.)

Reducing Misses by Various Compiler Optimizations

• Instructions
  – Reorder procedures in memory so as to reduce misses
  – Use profiling to look at conflicts
  – McFarling [1989] reduced cache misses by 75% on an 8KB direct-mapped cache with 4-byte blocks
• Data
  – Merging Arrays: improve spatial locality with a single array of compound elements instead of 2 arrays
  – Loop Interchange: change the nesting of loops to access data in the order it is stored in memory
  – Loop Fusion: combine 2 independent loops that have the same looping and some variables in common
  – Blocking: improve temporal locality by accessing "blocks" of data repeatedly instead of going down whole columns or rows
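The slide names register and cache prefetch instructions on specific ISAs; as a portable stand-in, here is a minimal sketch using the GCC/Clang __builtin_prefetch hint, which (like the instructions above) cannot fault. The prefetch distance of 16 elements is an arbitrary assumption that would be tuned per machine:

/* Software prefetching sketch: prefetch a[i + DIST] while working on a[i]. */
#define DIST 16                      /* assumed prefetch distance */

double sum_with_prefetch(const double *a, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        if (i + DIST < n)
            __builtin_prefetch(&a[i + DIST], /*rw=*/0, /*locality=*/1);
        sum += a[i];                 /* the prefetch itself costs an issue slot */
    }
    return sum;
}

Whether this pays off is exactly the question on the slide: the extra issued instructions (the prefetch and its address calculation) must cost less than the misses they remove.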


Merging Arrays Example

/* Before */
int val[SIZE];
int key[SIZE];

/* After */
struct merge {
  int val;
  int key;
};
struct merge merged_array[SIZE];

Reduces conflicts between val & key and improves spatial locality.

Loop Interchange Example

/* Before */
for (k = 0; k < 100; k = k+1)
  for (j = 0; j < 100; j = j+1)
    for (i = 0; i < 5000; i = i+1)
      x[i][j] = 2 * x[i][j];

/* After */
for (k = 0; k < 100; k = k+1)
  for (i = 0; i < 5000; i = i+1)
    for (j = 0; j < 100; j = j+1)
      x[i][j] = 2 * x[i][j];

Sequential accesses instead of striding through memory every 100 words.



Loop Fusion Example

/* Before */
for (i = 0; i < N; i = i+1)
  for (j = 0; j < N; j = j+1)
    a[i][j] = 1/b[i][j] * c[i][j];
for (i = 0; i < N; i = i+1)
  for (j = 0; j < N; j = j+1)
    d[i][j] = a[i][j] + c[i][j];

/* After */
for (i = 0; i < N; i = i+1)
  for (j = 0; j < N; j = j+1)
  { a[i][j] = 1/b[i][j] * c[i][j];
    d[i][j] = a[i][j] + c[i][j]; }

Before fusion: 2 misses per access to a & c; after: one miss per access.

Blocking Example

/* Before */
for (i = 0; i < N; i = i+1)
  for (j = 0; j < N; j = j+1)
  { r = 0;
    for (k = 0; k < N; k = k+1)
      r = r + y[i][k]*z[k][j];
    x[i][j] = r;
  };

• Two inner loops:
  – Read all N×N elements of z[]
  – Read N elements of 1 row of y[] repeatedly
  – Write N elements of 1 row of x[]
• Capacity misses are a function of N & cache size:
  – worst case => 2N³ + N² misses.
• Idea: compute on a B×B submatrix that fits in the cache.

Blocking Example, cont.

/* After */
for (jj = 0; jj < N; jj = jj+B)
  for (kk = 0; kk < N; kk = kk+B)
    for (i = 0; i < N; i = i+1)
      for (j = jj; j < min(jj+B-1,N); j = j+1)
      { r = 0;
        for (k = kk; k < min(kk+B-1,N); k = k+1)
          r = r + y[i][k]*z[k][j];
        x[i][j] = x[i][j] + r;
      };

• Capacity misses fall from 2N³ + N² to 2N³/B + N²
• B is called the Blocking Factor
• Conflict misses are not as easy...

Key Points

• CPU time = IC × (CPI_Execution + Memory accesses per instruction × Miss rate × Miss penalty) × Clock cycle time (a worked sketch follows below)
• 3 Cs: Compulsory, Capacity, Conflict Misses
• Reducing Miss Rate
  – 1. Reduce Misses via Larger Block Size
  – 2. Reduce Misses via Higher Associativity
  – 3. Reducing Misses via Victim Cache
  – 4. Reducing Misses by HW Prefetching of Instructions & Data
  – 5. Reducing Misses by SW Prefetching Data
  – 6. Reducing Misses by Compiler Optimizations
• Remember the danger of concentrating on just one parameter when evaluating performance
• Next: reducing miss penalty
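A minimal sketch applying the Key Points equation; every parameter value below (instruction count, base CPI, accesses per instruction, miss rate, miss penalty, clock) is hypothetical, chosen only to show the shape of the calculation:

#include <stdio.h>

int main(void) {
    /* All values are hypothetical, for illustration only. */
    double IC            = 1e9;     /* instruction count */
    double CPI_exec      = 1.0;     /* base CPI, ignoring cache misses */
    double mem_per_instr = 1.3;     /* memory accesses per instruction */
    double miss_rate     = 0.03;
    double miss_penalty  = 50.0;    /* clocks */
    double clock_cycle   = 1e-9;    /* seconds (1 GHz clock) */

    /* CPU time = IC x (CPI_Execution + mem/instr x miss rate x miss penalty)
                     x clock cycle time */
    double cpu_time = IC * (CPI_exec + mem_per_instr * miss_rate * miss_penalty)
                         * clock_cycle;
    printf("CPU time = %.2f s\n", cpu_time);   /* about 2.95 s with these numbers */
    return 0;
}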



Improving Cache Performance

1. Reduce the miss rate,
2. Reduce the miss penalty, or
3. Reduce the time to hit in the cache.

Reducing Miss Penalty: Read Priority over Write on Miss

• The easiest way to resolve RAW hazards (and other ordering issues) between loads and stores is to send them all to memory in instruction order.
• If we always wait for the write buffer to empty, the read miss penalty might increase by 50%.
• Instead, check the write buffer contents before the read; if there are no conflicts, let the memory access continue. (A sketch of the check follows below.)
• Write-back caches?
  – A read miss may require writing back a dirty block.
  – Normal: write the dirty block to memory, and then do the read.
  – Instead: copy the dirty block to a write buffer, then do the read, and then do the write.
  – The CPU stalls less since it can restart as soon as the read completes.
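A minimal software-level sketch of "check the write buffer before the read". In hardware this is an associative comparison done in parallel; the buffer depth and field names here are assumptions:

#include <stdbool.h>
#include <stdint.h>

#define WB_ENTRIES 8                 /* assumed write-buffer depth */

struct wb_entry { bool valid; uint64_t block_addr; };
static struct wb_entry write_buf[WB_ENTRIES];

/* Before sending a read miss to memory, scan the write buffer.  If the
   missing block matches a pending write, the read must forward from the
   buffer or wait; otherwise it may bypass the buffered writes and go to
   memory immediately, giving reads priority over writes. */
bool read_may_bypass_writes(uint64_t read_block_addr) {
    for (int i = 0; i < WB_ENTRIES; i++)
        if (write_buf[i].valid && write_buf[i].block_addr == read_block_addr)
            return false;            /* conflict: forward from buffer or stall */
    return true;                     /* no conflict: service the read first */
}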


Early Restart and Critical Word First

• Don't wait for the full block to be loaded before restarting the CPU
  – Early restart—As soon as the requested word of the block arrives, send it to the CPU and let the CPU continue execution.
  – Critical Word First—Request the missed word first from memory and send it to the CPU as soon as it arrives; let the CPU continue execution while filling the rest of the words in the block. Also called wrapped fetch and requested word first. (A small sketch of the wrapped fill order follows below.)
• Most useful with large blocks.
• Spatial locality is a problem: we often want the next sequential word soon, so early restart is not always a benefit.

Non-blocking Caches to Reduce Stalls on Misses

• A non-blocking cache (or lockup-free cache) allows the data cache to continue to supply cache hits during a miss.
• "Hit under miss" reduces the effective miss penalty by being helpful during a miss instead of ignoring the requests of the CPU.
• "Hit under multiple miss" or "miss under miss" can further lower the effective miss penalty by overlapping multiple misses.
  – Significantly increases the complexity of the cache controller, as there can be multiple outstanding memory accesses.
• Assumes "stall on use" rather than "stall on miss", which works naturally with dynamic scheduling but can also work with static scheduling.
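A minimal sketch of the wrapped-fetch fill order used by critical word first: the block is filled starting at the requested word and wrapping around, so with early restart the CPU can resume after the first word arrives. The 8-word block size is an assumption:

#include <stdint.h>

#define WORDS_PER_BLOCK 8            /* assumed block size in words */

/* Fill 'block' from 'mem_block', delivering the requested word first and
   then wrapping around the block (critical word first / wrapped fetch).
   With early restart, the CPU may resume as soon as iteration 0 completes. */
void wrapped_fetch(uint64_t block[WORDS_PER_BLOCK],
                   const uint64_t mem_block[WORDS_PER_BLOCK],
                   unsigned requested_word) {
    for (unsigned i = 0; i < WORDS_PER_BLOCK; i++) {
        unsigned w = (requested_word + i) % WORDS_PER_BLOCK;
        block[w] = mem_block[w];     /* word 'requested_word' arrives first */
    }
}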



But…

• The primary way to reduce miss penalty…

[Figure: a CPU connected to a single cache backed by memory, vs. a CPU connected to a lowest-level cache backed by a next-level cache and then memory.]

Miss Penalty Reduction: Second Level Cache

• L2 Equations
  AMAT = Hit Time_L1 + Miss Rate_L1 × Miss Penalty_L1
  Miss Penalty_L1 = Hit Time_L2 + Miss Rate_L2 × Miss Penalty_L2
  AMAT = Hit Time_L1 + Miss Rate_L1 × (Hit Time_L2 + Miss Rate_L2 × Miss Penalty_L2)
• Definitions:
  – Local miss rate—misses in this cache divided by the total number of memory accesses to this cache (Miss Rate_L2).
  – Global miss rate—misses in this cache divided by the total number of memory accesses generated by the CPU (Miss Rate_L1 × Miss Rate_L2).

Multi-level Caches, cont.

• L1 cache local miss rate 10%, L2 local miss rate 40%. What are the global miss rates? (Worked in the sketch below.)
• L1's highest priority is a fast hit time; L2 typically targets a low miss rate.
• Design the L1 and L2 caches in concert.
• Property of inclusion—if a block is in the L1 cache, it is guaranteed to be in the L2 cache—simplifies the design of consistent caches.
• The L2 cache can have a different associativity (good idea?) or block size (good idea?) than the L1 cache.

Reducing Miss Penalty Summary

• Four techniques
  – Read priority over write on miss
  – Early Restart and Critical Word First on a miss
  – Non-blocking Caches (Hit Under Miss)
  – Multi-level Caches
• These principles can continue to be applied recursively to multilevel caches
  – The danger is that the time to DRAM will grow with multiple levels in between
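A worked answer to the question above, as a minimal sketch that just applies the local/global definitions from the previous slide:

#include <stdio.h>

int main(void) {
    /* Local miss rates from the slide. */
    double l1_local = 0.10;
    double l2_local = 0.40;

    /* Every CPU access goes to L1, so L1's global miss rate equals its local
       miss rate.  L2's global miss rate is the fraction of all CPU accesses
       that miss in both levels. */
    double l1_global = l1_local;
    double l2_global = l1_local * l2_local;

    printf("L1 global miss rate = %.0f%%\n", l1_global * 100);   /* 10% */
    printf("L2 global miss rate = %.0f%%\n", l2_global * 100);   /*  4% */
    return 0;
}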



Review: Improving Cache Performance

1. Reduce the miss rate,
2. Reduce the miss penalty, or
3. Reduce the time to hit in the cache.

Fast Hit Times via Small and Simple Caches

• This is why the Alpha 21164 has an 8KB instruction cache and an 8KB data cache plus a 96KB second-level cache.
• I and D caches used to be typically direct mapped, on chip.


DM Hit Time + Associative Hit Rate -> Way Prediction

• Add bits (?) to each cache line to predict which way is going to hit.
• How is that going to help?
  – Read one tag & compare
  – Speculatively read data from that one block
• Next cycle
  – Read the other tags and compare
• Used on the Pentium 4
  [Each set holds tag/data for each way, plus lru and way-prediction (wp) bits.]

Fast Hits by Avoiding Address Translation: Virtual Cache

• Send the virtual address to the cache? Called a Virtually Addressed Cache, or just Virtual Cache, vs. a Physical Cache.
  – Every time the process is switched, the cache logically must be flushed; otherwise we get false hits.
    – The cost is the time to flush plus the "compulsory" misses from the empty cache.
  – Dealing with aliases (sometimes called synonyms): two different virtual addresses map to the same physical address.
  – I/O must interact with the cache…
• Solutions to aliases
  – HW that guarantees that every cache block has a unique physical address.
  – SW guarantee: the lower n bits of the two addresses must be the same; as long as n covers the index field and the cache is direct mapped, the blocks must be unique. This is called page coloring. (See the sketch below.)
• Solution to the cache flush
  – Add a process-identifier tag that identifies the process as well as the address within the process: a wrong-process entry can't produce a hit.
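A small sketch of the indexing constraint behind page coloring: if the cache index and block-offset bits all fall within the page offset, a virtually indexed cache behaves like a physically indexed one, so aliases cannot land in different sets. The example numbers in the comment are assumptions:

#include <stdbool.h>

/* True if a virtually indexed cache with this geometry has no aliasing
   problem: the bits used for index + offset must lie inside the page offset,
   i.e. (cache size / associativity) <= page size. */
bool virtually_indexed_is_alias_free(unsigned cache_size_bytes,
                                     unsigned associativity,
                                     unsigned page_size_bytes) {
    return (cache_size_bytes / associativity) <= page_size_bytes;
}

/* Example (assumed numbers): a 32KB, 8-way cache with 4KB pages gives
   32K/8 = 4K per way, so it is alias-free; a 32KB, 2-way cache (16K per way)
   is not, and needs page coloring or a hardware alias check. */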



Virtual Cache

[Figure: in a physical cache, the CPU's virtual address goes through the TLB and the resulting physical address indexes the cache; in a virtual cache, the virtual address indexes the cache directly, and the TLB produces the physical address only on the way to memory.]

Cache Bandwidth: Trace Caches

• Fetch bottleneck—you cannot execute instructions faster than you can fetch them into the processor.
• You cannot typically fetch more than about one taken branch per cycle, at best (why? why one taken branch?).
• A trace cache is an instruction cache that stores instructions in dynamic execution order rather than program/address order.
• Implemented on the Pentium 4.


Trace Cache

[Figure: a program containing A, B, C, beq J:, D, E, F, …, J: G, H, jsr W, …, and W, X, ret. A conventional cache stores the lines in address order (A B C beq D E F G; H jsr I J K L M N; …), while the trace cache stores the executed path (A B C beq G H jsr W X ret I; …).]

Cache Optimization Summary

Each technique is rated by its effect on miss rate (MR), miss penalty (MP), hit time (HT), and complexity:
• Larger Block Size
• Higher Associativity
• Victim Caches
• HW Prefetching of Instr/Data
• Compiler Controlled Prefetching
• Compiler Reduce Misses
• Priority to Read Misses
• Early Restart & Critical Word 1st
• Non-Blocking Caches
• Second Level Caches
• Small & Simple Caches
• Way Prediction
• Avoiding Address Translation
• Trace Cache?



Cache Research at UCSD

• Hardware prefetching of complex data structures (e.g., pointer chasing)
• Fetch Target Buffer
  – Let the branch predictor run ahead of the fetch engine
• Runtime identification of cache conflict misses
• Speculative Precomputation (helper-thread prefetching)
  – Spawn threads at runtime to calculate the addresses of delinquent (problematic) loads and prefetch → creates a prefetcher from application code.
• Code layout to reduce Icache conflict misses
  – Also, for multithreaded processors
• Code layout to reduce Dcache conflict misses
  – Also, for multithreaded processors

Cache Research at UCSD, cont.

• Event-driven compilation—while the main thread runs, hardware monitors identify problematic loads, then fork a new compilation thread (on an SMT or CMP) to alter the code.
  – Dynamic value specialization
  – Inline software prefetching
  – Helper-thread prefetching (speculative precomputation)
• Software Data Spreading
  – Insert migration calls in loops with large data sets, spreading the data over multiple private caches.
• Inter-core Prefetching
  – A prefetch thread runs ahead of the main thread, but in another core. After an interval, they swap cores. The main thread finds all of its data preloaded into the new cache, and the prefetcher starts prefilling the next cache.

