A8 Solution 2
Solution
1. (4 pts) Consider a processor with a 2 ns clock cycle, a miss penalty of 20 clock cycles, a miss rate
of 0.05 misses per instruction, and a cache access time (hit time) of 1 clock cycle. Assume that the
read and write miss penalties are the same.
a) (1 pt) Find the average memory access time (AMAT).
b) (1 pt) Suppose we can improve the miss rate to 0.03 misses per instruction by doubling the cache
size. However, this causes the cache access time to increase to 1.2 cycles. Using the AMAT as a
metric, determine if this is a good trade-off.
c) (2 pts) If the cache access time determines the processor’s clock cycle time, which is often the
case, AMAT may not correctly indicate whether one cache organization is better than another. If
the processor’s clock cycle time must be changed to match that of a cache, is this a good trade-
off? Assume that the processors in part (a) and (b) are identical, except for the clock rate and the
cache miss rate. Assume 1.5 references per instruction (for both I-cache and D-cache) and a CPI
without cache misses of 2. The miss penalty is 20 cycles for both processors.
Solution:
a) AMAT = Hit time + Miss rate × Miss penalty
= 2 ns + 0.05 × (20 × 2 ns) = 4 ns
b) AMAT = 1.2 × 2 ns + 0.03 × 20 × 2 ns = 2.4 ns + 1.2 ns = 3.6 ns
Yes, by the AMAT metric this is a good trade-off, since 3.6 ns < 4 ns.
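The AMAT arithmetic in parts (a) and (b) can be checked with a short Python sketch. The function name amat_ns and its parameter names are illustrative only and do not come from the assignment; the numbers are the ones given in the problem.

```python
def amat_ns(hit_cycles, miss_rate, miss_penalty_cycles, cycle_ns):
    """AMAT = hit time + miss rate * miss penalty, converted from cycles to ns."""
    return (hit_cycles + miss_rate * miss_penalty_cycles) * cycle_ns


# Part (a): 1-cycle hit, 0.05 miss rate, 20-cycle penalty, 2 ns clock.
amat_a = amat_ns(hit_cycles=1.0, miss_rate=0.05, miss_penalty_cycles=20, cycle_ns=2.0)

# Part (b): larger cache, 1.2-cycle hit, 0.03 miss rate, same clock.
amat_b = amat_ns(hit_cycles=1.2, miss_rate=0.03, miss_penalty_cycles=20, cycle_ns=2.0)

print(f"AMAT (a) = {amat_a:.2f} ns")
print(f"AMAT (b) = {amat_b:.2f} ns")
print("Larger cache wins on AMAT:", amat_b < amat_a)
```

The clock cycle stays at 2 ns in both calls because parts (a) and (b) compare AMAT only; the case where the clock itself is stretched is handled in part (c).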
c) CPU time = Clock cycle × IC × (CPI with ideal cache + cache stall cycles per instruction)
CPU time(a) = 2 ns × IC × (2 + 1.5 × 20 × 0.05) = 2 ns × 3.5 × IC = 7 ns × IC
CPU time(b) = 2.4 ns × IC × (2 + 1.5 × 20 × 0.03) = 2.4 ns × 2.9 × IC = 6.96 ns × IC
The CPU times in parts (a) and (b) differ by less than 1% (7 ns versus 6.96 ns per
instruction). Hence, doubling the cache size to improve the miss rate at the expense
of stretching the clock cycle results in essentially no net gain.
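As a rough check of part (c), the sketch below computes CPU time per instruction (CPU time divided by IC) for both designs. The helper name cpu_time_per_instr_ns is hypothetical. The 2.4 ns cycle for design (b) is the 1.2-cycle access time expressed at the original 2 ns clock, as used in the solution.

```python
def cpu_time_per_instr_ns(cycle_ns, base_cpi, refs_per_instr, miss_rate, miss_penalty_cycles):
    """Clock cycle * (CPI without misses + memory stall cycles per instruction)."""
    stall_cycles = refs_per_instr * miss_rate * miss_penalty_cycles
    return cycle_ns * (base_cpi + stall_cycles)


# Design (a): 2 ns clock, miss rate 0.05.
t_a = cpu_time_per_instr_ns(2.0, base_cpi=2.0, refs_per_instr=1.5,
                            miss_rate=0.05, miss_penalty_cycles=20)

# Design (b): clock stretched to 2.4 ns, miss rate 0.03.
t_b = cpu_time_per_instr_ns(2.4, base_cpi=2.0, refs_per_instr=1.5,
                            miss_rate=0.03, miss_penalty_cycles=20)

print(f"CPU time per instruction (a) = {t_a:.2f} ns")
print(f"CPU time per instruction (b) = {t_b:.2f} ns")
print(f"Speedup of (b) over (a) = {t_a / t_b:.4f}")
```

Because IC is the same for both processors, it cancels in the comparison, which is why the sketch works per instruction.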
Solution:
a) For Processor 1:
Miss penalty = 6 + 1 = 7 cycles
Stall cycles per instruction = 4% × 7 + 50% × 6% × 7 = 0.28 + 0.21 = 0.49
For Processor 2:
Miss penalty = 6 + 4 = 10 cycles
Stall cycles per instruction = 2% × 10 + 50% × 4% × 10 = 0.2 + 0.2 = 0.4
For Processor 3:
Miss penalty = 6 + 4 = 10 cycles
Stall cycles per instruction = 2% × 10 + 50% × 3% × 10 = 0.2 + 0.15 = 0.35
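The three stall-cycle calculations above follow one pattern, sketched here in Python. The problem statement is not reproduced in this document, so the parameter names are an assumed reading of the solution: the first percentage is taken as the instruction miss rate, 50% as the fraction of instructions that access data, and the second percentage as the data miss rate.

```python
def stall_cycles_per_instr(i_miss_rate, d_miss_rate, miss_penalty, data_refs_per_instr=0.5):
    """Instruction-fetch stalls plus data-access stalls, per instruction."""
    instr_stalls = i_miss_rate * miss_penalty
    data_stalls = data_refs_per_instr * d_miss_rate * miss_penalty
    return instr_stalls + data_stalls


# (instruction miss rate, data miss rate, miss penalty in cycles), copied from the solution.
processors = {
    "Processor 1": (0.04, 0.06, 6 + 1),
    "Processor 2": (0.02, 0.04, 6 + 4),
    "Processor 3": (0.02, 0.03, 6 + 4),
}

stalls = {name: stall_cycles_per_instr(*params) for name, params in processors.items()}

for name, value in stalls.items():
    print(f"{name}: {value:.2f} stall cycles per instruction")
```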
Solution:
a) Number of sets = (4 * 1024) / (4 * 64) = 16 sets, Set index = 4 bits, Byte offset = 6 bits
The cache can store only 64 blocks. The last 4 fetched blocks will replace the first 4 blocks.
The first iteration causes 68 cache misses to fetch the 68 blocks from memory.
Each later iteration then causes only 8 cache misses, to fetch the first 4 and the last 4 blocks.
Miss Rate = 140 / 43520 = 0.32%, Hit Rate = 100% – Miss Rate = 99.68%
If the processor is loading words from the cache and word size = 4 bytes, then
Miss Rate = 140 / 10880 = 1.287%, Hit Rate = 100% – Miss Rate = 98.713%
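The cache geometry and miss-rate totals can be reproduced with the sketch below. The problem statement is not included here, so several values are assumptions inferred from the solution's numbers. In particular, 10 loop iterations and one access per byte are the values that yield 140 misses and 43520 accesses. The per-iteration miss counts (68 and 8) are copied from the solution, not derived by simulating a replacement policy.

```python
cache_bytes = 4 * 1024   # 4 KB cache (from the set-count formula)
ways = 4                 # 4-way set associative
block_bytes = 64         # 64-byte blocks

num_blocks = cache_bytes // block_bytes      # blocks the cache can hold
num_sets = num_blocks // ways
index_bits = num_sets.bit_length() - 1       # log2 of a power of two
offset_bits = block_bytes.bit_length() - 1

# Assumed loop parameters, inferred from the totals in the solution.
blocks_touched = 68
iterations = 10
first_iter_misses = 68
later_iter_misses = 8
word_bytes = 4

total_misses = first_iter_misses + (iterations - 1) * later_iter_misses
byte_accesses = blocks_touched * block_bytes * iterations
word_accesses = byte_accesses // word_bytes

byte_miss_rate = total_misses / byte_accesses
word_miss_rate = total_misses / word_accesses

print(f"sets={num_sets}, index bits={index_bits}, offset bits={offset_bits}")
print(f"byte accesses: miss rate = {byte_miss_rate:.2%}")
print(f"word accesses: miss rate = {word_miss_rate:.3%}")
```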
4. (3 pts) Consider a main memory constructed with Synchronous DRAM chips that have the
following timing requirements: 1 bus cycle to transfer the address, 10 bus cycles access latency,
and 1 bus cycle to transfer a word. Assume that 32 bits of data can be transferred in parallel. If a
200-MHz clock is used for the bus and memory, and burst mode is used to transfer a block, how
long does it take to access and transfer 32 bytes of data, 64 bytes of data, and 128 bytes of data?
Solution:
Clock Rate = 200 MHz, Clock cycle for bus and memory = 5 ns
Time to transfer 32 bytes (8 words) = 1 cycle (address) + 10 cycles (latency to get first word)
+ 7 cycles (7 remaining words) = 18 bus cycles = 18 * 5 ns = 90 ns
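The same counting applies to the other block sizes in the question. The sketch below uses the convention of the 32-byte case above: the 10-cycle latency delivers the first word, and each remaining word takes one more bus cycle. The function name burst_transfer is illustrative, and the 64-byte and 128-byte results are what this formula yields under that convention.

```python
def burst_transfer(block_bytes, bus_mhz=200, word_bytes=4, addr_cycles=1, latency_cycles=10):
    """Return (bus cycles, time in ns) to access and burst-transfer one block."""
    words = block_bytes // word_bytes
    # Address cycle + latency (which delivers the first word) + remaining words.
    cycles = addr_cycles + latency_cycles + (words - 1)
    cycle_ns = 1000 / bus_mhz   # 200 MHz gives a 5 ns bus cycle
    return cycles, cycles * cycle_ns


results = {size: burst_transfer(size) for size in (32, 64, 128)}

for size, (cycles, time_ns) in results.items():
    print(f"{size:3d} bytes: {cycles} bus cycles = {time_ns:.0f} ns")
```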
Solution: