
COE 301 – Computer Organization

Assignment 8: Memory Hierarchy

Solution
1. (4 pts) Consider a processor with a 2 ns clock cycle, a miss penalty of 20 clock cycles, a miss rate
of 0.05 misses per instruction, and a cache access time (hit time) of 1 clock cycle. Assume that the
read and write miss penalties are the same.
a) (1 pt) Find the average memory access time (AMAT).
b) (1 pt) Suppose we can improve the miss rate to 0.03 misses per instruction by doubling the cache
size. However, this causes the cache access time to increase to 1.2 cycles. Using the AMAT as a
metric, determine if this is a good trade-off.
c) (2 pts) If the cache access time determines the processor’s clock cycle time, which is often the
case, AMAT may not correctly indicate whether one cache organization is better than another. If
the processor’s clock cycle time must be changed to match that of a cache, is this a good trade-off?
Assume that the processors in parts (a) and (b) are identical, except for the clock rate and the
cache miss rate. Assume 1.5 references per instruction (for both I-cache and D-cache) and a CPI
without cache misses of 2. The miss penalty is 20 cycles for both processors.

Solution:
a) AMAT = Hit time + Miss rate × Miss penalty
= 2 ns + 0.05 × (20 × 2 ns) = 4 ns
b) AMAT = 1.2 × 2 ns + 0.03 × 20 × 2 ns = 2.4 ns + 1.2 ns = 3.6 ns
Yes, this is a good trade-off, since the AMAT drops from 4 ns to 3.6 ns.
c) CPU time = Clock cycle × IC × (CPIideal-cache + cache stall cycles per instruction)
CPU time(a) = 2 ns × IC × (2 + 1.5 × 20 × 0.05) = 7 ns × IC
CPU time(b) = 2.4 ns × IC × (2 + 1.5 × 20 × 0.03) = 6.96 ns × IC
The CPU times in parts (a) and (b) are almost identical. Hence, doubling the cache
size to improve the miss rate at the expense of stretching the clock cycle results in
essentially no net gain.
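
As a quick cross-check, the arithmetic above can be written as a short Python sketch; the constants simply restate the problem's figures, and the helper names (amat, cpu_time_per_instr) are only illustrative.

    # Numeric check of parts (a)-(c); all figures come from the problem statement above.
    CYCLE_A, CYCLE_B = 2.0, 2.4      # ns: original clock vs. clock stretched by the larger cache
    MISS_PENALTY = 20                # cycles
    REFS_PER_INSTR = 1.5             # combined I-cache and D-cache references per instruction
    CPI_IDEAL = 2                    # CPI without cache misses

    def amat(hit_cycles, miss_rate, cycle_ns):
        # AMAT = hit time + miss rate * miss penalty, converted to ns
        return hit_cycles * cycle_ns + miss_rate * MISS_PENALTY * cycle_ns

    def cpu_time_per_instr(cycle_ns, miss_rate):
        # CPU time per instruction = clock cycle * (ideal CPI + memory stall cycles)
        return cycle_ns * (CPI_IDEAL + REFS_PER_INSTR * MISS_PENALTY * miss_rate)

    print(amat(1.0, 0.05, CYCLE_A))           # ~4.0 ns  (part a)
    print(amat(1.2, 0.03, CYCLE_A))           # ~3.6 ns  (part b)
    print(cpu_time_per_instr(CYCLE_A, 0.05))  # ~7.0 ns per instruction  (part c, original cache)
    print(cpu_time_per_instr(CYCLE_B, 0.03))  # ~6.96 ns per instruction (part c, doubled cache)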

2. (5 pts) Consider three processors with three cache configurations:
Processor 1: Direct-mapped i-cache and d-cache with one-word blocks
Instruction miss-rate = 4%, data miss-rate = 6%
Processor 2: Direct-mapped i-cache and d-cache with four-word blocks
Instruction miss-rate = 2%, data miss-rate = 4%
Processor 3: Two-way set associative i-cache and d-cache with four-word blocks
Instruction miss-rate = 2%, data miss-rate = 3%
a) (3 pts) For these processors, 50% of the instructions contain a data reference. Assume that the
cache miss penalty is 6 + Block size in words. Determine which processor spends the most cycles on
cache misses.
b) (2 pts) The cycle time is 420 ps for Processors 1 and 2, and 310 ps for Processor 3.
Determine which processor is the fastest and which one is the slowest.

Solution:

a) For Processor 1:
Miss penalty = 6 + 1 = 7 cycles
Stall cycles per instruction = 4% × 7 + 50% × 6% × 7 = 0.28 + 0.21 = 0.49

For Processor 2:
Miss penalty = 6 + 4 = 10 cycles
Stall cycles per instruction = 2% × 10 + 50% × 4% × 10 = 0.2 + 0.2 = 0.4

For Processor 3:
Miss penalty = 6 + 4 = 10 cycles
Stall cycles per instruction = 2% × 10 + 50% × 3% × 10 = 0.2 + 0.15 = 0.35

Therefore, Processor 1 spends the most cycles on cache misses.

b) CPU time = IC × CPI × Clock cycle

The instruction count is the same for all processors

CPI = CPIideal-cache + Stall cycles per instruction

CPIideal-cache is the same for all processors

For Processor 1: CPU time = IC × (CPIideal-cache + 0.49) × 420 ps
For Processor 2: CPU time = IC × (CPIideal-cache + 0.4) × 420 ps
For Processor 3: CPU time = IC × (CPIideal-cache + 0.35) × 310 ps

Clearly, Processor 1 is the slowest and Processor 3 is the fastest.
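
The same comparison as a short Python sketch; the tuples restate the block sizes, miss rates, and cycle times given above, and the ideal CPI is left symbolic in the printout since it is common to all three processors.

    # Stall cycles per instruction and relative CPU time for the three processors.
    DATA_REF_FRAC = 0.5              # 50% of instructions contain a data reference

    configs = [
        # (name, block size in words, I-miss rate, D-miss rate, cycle time in ps)
        ("Processor 1", 1, 0.04, 0.06, 420),
        ("Processor 2", 4, 0.02, 0.04, 420),
        ("Processor 3", 4, 0.02, 0.03, 310),
    ]

    for name, block_words, imr, dmr, cycle_ps in configs:
        penalty = 6 + block_words                               # miss penalty in cycles
        stalls = imr * penalty + DATA_REF_FRAC * dmr * penalty  # stall cycles per instruction
        print(f"{name}: {stalls:.2f} stall cycles/instr, "
              f"CPU time/instr = (CPI_ideal + {stalls:.2f}) * {cycle_ps} ps")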

3. (5 pts) A computer system has a 1 GB main memory. It also has a 4 KB, 4-way set-associative
cache with 4 blocks per set and 64 bytes per block.
a) (2 pts) Calculate the number of bits in the Tag, Set Index, and Byte Offset fields of the memory
address format.
b) (3 pts) Assume that the cache is initially empty. Suppose the processor fetches 4352 consecutive
bytes from memory starting at address 0. The same fetch sequence is then repeated 9 more times
for a total of 10 iterations. What is the hit rate assuming that the LRU algorithm is used for block
replacement?

Solution:

a) Number of sets = (4 * 1024) / (4 * 64) = 16 sets, Set index = 4 bits, Byte offset = 6 bits

Address = 30 bits (1 GB main memory), Tag = 30 – 4 – 6 = 20 bits

b) 4352 / 64 = 68 consecutive blocks are fetched in each iteration

The cache holds 64 blocks (16 sets × 4 blocks per set). Blocks 0 to 63 fill the cache; blocks 64 to 67 map to sets 0 to 3, which are already full, so they evict the LRU blocks 0 to 3.

Sets 4 to 15 each hold exactly the 4 blocks that map to them, so from the second iteration onward every access to these sets hits.

Sets 0 to 3 each receive 5 distinct blocks per iteration (set 0, for example, receives blocks 0, 16, 32, 48, and 64). Cycling 5 blocks through a 4-way LRU set evicts, on every miss, the block that will be needed next, so all 5 accesses to each of these sets miss in every iteration.

First iteration: 68 misses (all compulsory). Each of the remaining 9 iterations: 4 sets × 5 misses = 20 misses.

Therefore, total cache misses = 68 + 9 * 20 = 248

If the processor is loading individual bytes from the cache, then

Total accesses to the cache = 4352 * 10 = 43520

Miss Rate = 248 / 43520 ≈ 0.57%, Hit Rate = 100% – Miss Rate ≈ 99.43%

If the processor is loading words from the cache and word size = 4 bytes, then

Total accesses to the cache = (4352 / 4) * 10 = 10880

Miss Rate = 248 / 10880 ≈ 2.28%, Hit Rate = 100% – Miss Rate ≈ 97.72%
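
A small simulation sketch of the 16-set, 4-way, 64-byte-block cache with true LRU replacement confirms the total of 248 misses; it issues one cache access per block fetched, which is enough for counting misses, and the byte-load and word-load hit rates then follow from the access counts above.

    from collections import OrderedDict

    NUM_SETS, ASSOC, BLOCK = 16, 4, 64
    sets = [OrderedDict() for _ in range(NUM_SETS)]   # each set: tag -> None, ordered by recency

    def access_block(addr):
        # Returns True on a hit and updates the LRU order either way.
        block_no = addr // BLOCK
        idx, tag = block_no % NUM_SETS, block_no // NUM_SETS
        s = sets[idx]
        if tag in s:
            s.move_to_end(tag)            # refresh recency on a hit
            return True
        if len(s) == ASSOC:
            s.popitem(last=False)         # evict the least-recently-used block
        s[tag] = None
        return False

    misses = 0
    for _ in range(10):                        # 10 passes over the same 4352 bytes
        for addr in range(0, 4352, BLOCK):     # one access per 64-byte block fetched
            if not access_block(addr):
                misses += 1

    print(misses)                              # 248
    print(1 - misses / 43520)                  # hit rate for byte loads,   ~0.9943
    print(1 - misses / 10880)                  # hit rate for 4-byte loads, ~0.9772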

4. (3 pts) Consider a main memory constructed with Synchronous DRAM chips that have the
following timing requirements: 1 bus cycle to transfer the address, 10 bus cycles access latency,
and 1 bus cycle to transfer a word. Assume that 32-bits of data can be transferred in parallel. If a
200-MHz clock is used for the bus and memory, and burst mode is used to transfer a block, how
long does it take to access and transfer 32 bytes of data, 64 bytes of data, and 128 bytes of data?

Solution:

Clock Rate = 200 MHz, Clock cycle for bus and memory = 5 ns

Time to transfer 32 bytes (8 words) = 1 cycle (address) + 10 cycles (latency to get first word)
+ 7 cycles (7 remaining words) = 18 bus cycles = 18 * 5 ns = 90 ns

Time to transfer 64 bytes (16 words) = 1 + 10 + 15 = 26 bus cycles = 26 * 5 ns = 130 ns

Time to transfer 128 bytes (32 words) = 1 + 10 + 31 = 42 bus cycles = 42 * 5 ns = 210 ns
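
The burst-timing arithmetic as a short Python sketch; the cycle breakdown mirrors the reasoning above (one address cycle, the first word arriving at the end of the 10-cycle access latency, and one cycle for each remaining word).

    BUS_CYCLE_NS = 5                   # 200 MHz bus and memory clock
    ADDR_CYCLES, LATENCY_CYCLES = 1, 10
    WORD_BYTES = 4                     # 32 bits transferred in parallel

    def burst_time_ns(num_bytes):
        words = num_bytes // WORD_BYTES
        # The first word arrives at the end of the access latency; the rest take 1 cycle each.
        cycles = ADDR_CYCLES + LATENCY_CYCLES + (words - 1)
        return cycles * BUS_CYCLE_NS

    for size in (32, 64, 128):
        print(size, "bytes:", burst_time_ns(size), "ns")   # 90, 130, 210 ns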

5. (3 pts) Assume a memory system that supports interleaving of either four reads or four writes.
Given the following memory addresses in order as they appear on the memory address bus: 3, 9,
17, 2, 51, 37, 13, 4, 8, 41, 67, 10, which ones will result in a bank conflict?

Solution:

With four banks, Bank = Address mod 4.

Address   Bank   Bank Conflict
   3        3    No
   9        1    No
  17        1    Yes (with 9)
   2        2    No
  51        3    No
  37        1    Yes (with 17)
  13        1    Yes (with 37)
   4        0    No
   8        0    Yes (with 4)
  41        1    No
  67        3    No
  10        2    No
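
The table can also be generated mechanically. The sketch below assumes four banks with Bank = Address mod 4, one new request presented per cycle, each bank busy for four cycles per access, and a conflicting request (together with everything behind it) waiting until its bank is free; this is only a simple timing model, but it reproduces the conflicts listed above.

    # Bank assignment and conflict detection for the address stream above.
    # Assumptions: four banks (bank = address mod 4), one new request per cycle,
    # each bank busy for four cycles per access, and a conflicting request
    # (plus all requests behind it) waiting until its bank becomes free.
    BANKS, BUSY_CYCLES = 4, 4
    addresses = [3, 9, 17, 2, 51, 37, 13, 4, 8, 41, 67, 10]

    free_at = [0] * BANKS        # first cycle at which each bank can accept a new request
    cycle = 0                    # cycle at which the next request is presented
    for addr in addresses:
        bank = addr % BANKS
        conflict = free_at[bank] > cycle       # bank still busy with an earlier access?
        start = max(cycle, free_at[bank])      # wait for the bank if necessary
        free_at[bank] = start + BUSY_CYCLES
        cycle = start + 1                      # later requests are pushed back by any stall
        print(f"address {addr:3d} -> bank {bank}  conflict: {'Yes' if conflict else 'No'}")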

Prepared by Dr. Muhamed Mudawar
