Assignment (G)
1. Consider only the Level 1 cache. We compare the performance of a pair of separate 16 KB instruction / 16 KB data caches against a 32 KB unified cache. A benchmark suite contains 25% data transfer instructions. Each data transfer instruction consists of one instruction fetch and one data transfer. Assume a hit takes 1 clock cycle and the miss penalty is 100 clock cycles. A load or store hit takes 1 extra cycle on the unified cache. The following table shows misses per 1000 instructions for instruction, data, and unified caches of different sizes (only the values used below are reproduced here):

Cache                     Misses per 1000 instructions
16 KB instruction cache   3.62
16 KB data cache          40.9
32 KB unified cache       43.2
(1) Find the percentage of instruction references among all memory references.
From the benchmark suite, we know that 25% of the instructions are data transfer instructions. Thus, assuming there are 100 instructions, there will be 100 instruction fetches and 25 data transfers. The percentage of instruction references among all memory references is

Instruction references / All memory references = 100 / (100 + 25) = 0.80 = 80%
(2) Find the miss rate of the 16 KB instruction cache and that of the 16 KB data cache, respectively.
Since every instruction access involves exactly one memory access to fetch the instruction, the instruction miss rate is

Miss rate (16 KB instruction) = (3.62 / 1000) / 1.00 = 0.00362
Since 25% of the instructions are data transfers, the data miss rate is

Miss rate (16 KB data) = (40.9 / 1000) / 0.25 = 0.1636
Since each instruction generates 1.25 memory accesses on average (equivalently, 80% of the memory accesses are instruction references), the overall miss rate of the unified cache is

Miss rate (32 KB unified) = (43.2 / 1000) / (1.00 + 0.25) = 0.03456
Average memory access time = Hit time + Miss rate × Miss penalty

Average memory access time (separate) = 80% × (1 + 0.00362 × 100) + 20% × (1 + 0.1636 × 100)
                                      = (80% × 1.362) + (20% × 17.36)
                                      = 4.5616 clock cycles

Average memory access time (unified) = 80% × (1 + 0.03456 × 100) + 20% × (1 + 1 + 0.03456 × 100)
                                     = (80% × 4.456) + (20% × 5.456)
                                     = 4.656 clock cycles
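As a quick sanity check, the following minimal C sketch reproduces the numbers above from the misses-per-1000-instructions values. The helper function amat() and the variable names are my own, not part of the assignment.

#include <stdio.h>

/* Average memory access time = hit time + miss rate * miss penalty */
static double amat(double hit_time, double miss_rate, double miss_penalty)
{
    return hit_time + miss_rate * miss_penalty;
}

int main(void)
{
    const double miss_penalty = 100.0;          /* clock cycles           */
    const double instr_frac   = 100.0 / 125.0;  /* 80% instruction refs   */
    const double data_frac    =  25.0 / 125.0;  /* 20% data refs          */

    /* Misses per 1000 instructions, converted to miss rates per access. */
    double mr_instr   = (3.62 / 1000.0) / 1.00;          /* 16 KB I-cache */
    double mr_data    = (40.9 / 1000.0) / 0.25;          /* 16 KB D-cache */
    double mr_unified = (43.2 / 1000.0) / (1.00 + 0.25); /* 32 KB unified */

    double amat_separate = instr_frac * amat(1.0, mr_instr, miss_penalty)
                         + data_frac  * amat(1.0, mr_data, miss_penalty);

    /* The unified cache charges one extra cycle on a load/store hit. */
    double amat_unified  = instr_frac * amat(1.0, mr_unified, miss_penalty)
                         + data_frac  * amat(2.0, mr_unified, miss_penalty);

    printf("separate: %.4f cycles, unified: %.4f cycles\n",
           amat_separate, amat_unified);
    return 0;
}

Compiling and running this prints 4.5616 cycles for the separate caches and 4.6560 cycles for the unified cache, matching the hand calculation.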
From the results shown above, we can see that the overall miss rate of the separate caches is higher than that of the unified cache. This is because the unified cache has the flexibility to decide how to fill its entire capacity with either instructions or data, which lowers the chance of a miss. However, when we examine the average memory access time, the separate caches turn out to be better, since they show the lower value. This is because the separate caches offer more ports per clock cycle (one for instructions and one for data) and therefore avoid the structural hazard that costs the unified cache an extra cycle on every load or store hit.
2. Multicore processors are popular in current microprocessor designs. Some multicore processors on the market have separate L2 (or L3) caches, i.e. one cache per core, while others have L2 (L3) caches shared by all cores. (Note that we assume the L2/L3 caches are all unified, i.e. both instructions and data are in the same cache.)
Discuss the pros and cons of the separate cache architecture, using the knowledge you have acquired in the lectures and reliable information found in books, technical papers, on the web, etc.
When the number of processor cores increases, for example to 4, 8, or more, what would be the major problem(s) with the shared cache? Explain a possible solution if you have an idea.
I would like to begin with what cache memory is and why it is important. In the past, the CPU and memory had only slightly different clock speeds. As they developed, it turned out that CPU speed increased much faster than memory speed. This was largely a cost constraint: engineers stuck with slow DRAM to keep the cost low, as opposed to migrating to fast SRAM, which is very expensive. This is where cache memory first appeared; it is in fact a small SRAM. Building a small SRAM does not cost that much, yet it gives a significant performance improvement.
Cache memory works as a bridge between the CPU and main memory. The cache stores particular blocks from main memory so that they can be readily used by the CPU, thus minimizing the waiting time. As a comparison, accessing the cache is roughly 100 times faster than accessing main memory. However, since the cache is small, only a very limited amount of data can be stored in it. This is where the terms "cache hit" and "cache miss" appear: a "hit" means the data requested by the CPU is available in the cache, while a "miss" means the reverse. When a miss takes place, the cache has to retrieve the data from main memory, which is far slower. This leads to the term "miss penalty", which is the time spent waiting for the cache to fetch the data from main memory.
One way to reduce this is to introduce a second-level (L2) or third-level (L3) cache, which has a much larger capacity but is slower than the first-level (L1) cache, while still being fast compared to main memory. Since in recent days there have also been several implementations of an L4 cache, I will simply refer to the last-level cache here, whether it is L2, L3, or L4. Thus, our current discussion is about the pros and cons of having a shared or a separate last-level cache for a multicore processor.
In spite of the many advantages offered by separate L3 caches, many vendors prefer a shared L3 cache. This is perhaps due to the easier hardware implementation of a shared L3 cache, which reduces the price. However, as the number of processor cores in a single chip increases, this leads to some problems for a shared L3 cache. One major problem is cache contention: several cores compete for cache capacity, and in the worst case one core may evict vital data previously placed in the cache by another core. Another problem comes from access latency caused by bus congestion, especially if the data required by a particular core is located far away within the L3 cache. To overcome these problems, my idea would be to give each core a localized region of the shared cache, placed nearest to that core. This prevents cores from interfering with each other's cache regions. Furthermore, the size of each core's region need not be the same; it can be optimized by assessing each core's workload. When each workload is well approximated, a percentage of the cache can be assigned to each processor core. This makes the best use of the L3 cache capacity, in addition to preventing cache contention as well as bus congestion, thanks to the locality it provides.
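To make the idea concrete, here is a small C sketch of how a shared last-level cache might be partitioned by ways among cores in proportion to an estimated per-core workload. The function name partition_ways(), the weights, and the choice of partitioning by ways rather than by sets are my own illustrative assumptions, not something prescribed by the assignment.

#include <stdio.h>

#define NUM_CORES 4
#define LLC_WAYS  16   /* total associativity of the shared last-level cache */

/* Assign each core a share of the cache ways proportional to its estimated
   workload weight. Every core gets at least one way so it is never starved. */
static void partition_ways(const double weight[NUM_CORES], int ways[NUM_CORES])
{
    double total = 0.0;
    int assigned = 0;

    for (int c = 0; c < NUM_CORES; c++)
        total += weight[c];

    for (int c = 0; c < NUM_CORES; c++) {
        ways[c] = (int)(LLC_WAYS * weight[c] / total);
        if (ways[c] < 1)
            ways[c] = 1;
        assigned += ways[c];
    }

    /* Give any leftover ways (from rounding down) to the busiest core. */
    int busiest = 0;
    for (int c = 1; c < NUM_CORES; c++)
        if (weight[c] > weight[busiest])
            busiest = c;
    ways[busiest] += LLC_WAYS - assigned;
}

int main(void)
{
    /* Hypothetical workload weights, e.g. estimated misses per core. */
    double weight[NUM_CORES] = { 4.0, 2.0, 1.0, 1.0 };
    int ways[NUM_CORES];

    partition_ways(weight, ways);
    for (int c = 0; c < NUM_CORES; c++)
        printf("core %d: %d ways\n", c, ways[c]);
    return 0;
}

As far as I know, real processors implement essentially this idea in hardware as way-based cache partitioning (for example Intel's Cache Allocation Technology), although the details differ from this sketch.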
3. Discuss briefly how a "spinlock" built from the TS (Test and Set) instruction may cause performance deterioration on a bus-based multiprocessor system.
We can understand the Test and Set spinlock by imagining people queueing for a toilet. If someone is using the toilet at the moment, other people cannot use it. This is implemented by assigning a variable x that can be accessed by every processor in a shared memory system. Whenever the variable x is 1, requests from other processors are held in a loop and cannot
proceed further while x is still 1. On a bus-based multiprocessor system, this can deteriorate performance badly. We can return to the toilet analogy. A public toilet is commonly contained in a larger room where people can wait. Imagine that we have only one western-style toilet along with many Japanese-style toilets. If a lot of people want that single western-style toilet, they will normally queue inside the room. The more people queue for that toilet, the more crowded the room becomes, disturbing the flow of people using the Japanese-style toilets. In technical terms, this is referred to as "high bus traffic": while a particular processor holds the lock, the other processors keep incurring bus transactions in their attempts to acquire it. This keeps the bus busy and disturbs the flow of other processes. Another problem is referred to as "fairness". We can imagine people queueing for a toilet in no particular order. Suppose a group of big and small guys are competing to use the toilet; in this case, the big guys will most likely win. A small guy may never get to use the toilet if big guys keep coming and taking his turn. This happens in a multiprocessor system when a processor does not get a fair chance of obtaining the lock when it is released. This becomes even worse on a NUMA architecture, where the processors do not have symmetric access to main memory.
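To make the bus-traffic problem concrete, here is a minimal C11 sketch of a test-and-set spinlock, assuming a compiler with <stdatomic.h>; the names are mine, and the atomic flag plays the role of the variable x described above.

#include <stdatomic.h>

/* The shared flag plays the role of the "occupied" variable x. */
static atomic_flag lock = ATOMIC_FLAG_INIT;

static void spin_lock(void)
{
    /* atomic_flag_test_and_set is a read-modify-write, so every failed
       attempt still claims the cache line exclusively: while one core
       holds the lock, the waiting cores keep bouncing the line between
       their caches and the bus, which is the "high bus traffic" above. */
    while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
        ;  /* spin: each iteration is another bus transaction */
}

static void spin_unlock(void)
{
    atomic_flag_clear_explicit(&lock, memory_order_release);
}

A common refinement of this scheme is test-and-test-and-set: spin on an ordinary load until the flag looks free, and only then retry the atomic operation, so that waiting cores mostly hit in their own caches instead of flooding the bus.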