Architecture
Faculty: SHIBU V S
Module: 2
Topic: Memory Hierarchy Technology
Hierarchical Memory Technology
• Storage devices such as registers, caches, main memory, disk
devices, and backup storage are often organized as a hierarchy
• The memory technology and storage organization at each level are
characterized by five parameters:
• the access time (ti),
• the memory size (si),
• the cost per byte (ci),
• the transfer bandwidth (bi),
• the unit of transfer (xi).
• The access time ti refers to the round-trip time from the CPU to the
ith-level memory.
• The memory size si, is the number of bytes or words in level i.
• The cost of the ith-level memory is estimated by the product ci·si.
• The bandwidth bi refers to the rate at which information is
transferred between adjacent levels.
• The unit of transfer xi refers to the grain size for data transfer
between levels i and i + 1.
• Memory devices at a lower level are faster to access, smaller in size,
and more expensive per byte, having a higher bandwidth and using
a smaller unit of transfer as compared with those at a higher level.
• ti-1 < ti, si-1 < si, ci-1 > ci, bi-1 > bi, xi-1 < xi,
• for i = 1, 2, 3, and 4 in the hierarchy, where i = 0 corresponds to the
CPU register level. The cache is at level 1, main memory at level 2,
the disks at level 3, and backup storage at level 4.
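These relations can be made concrete with a small sketch. The per-level values below are illustrative numbers chosen only to satisfy the orderings, not figures from these notes:

```python
# A minimal sketch of the five parameters at each level. The values are
# made up for illustration; only the orderings between levels matter.
levels = [
    # (name, access time ti [ns], size si [bytes], cost ci [$/byte],
    #  bandwidth bi [MB/s], unit of transfer xi [bytes])
    ("register",    0.5,   512,          100.0, 80000,     8),
    ("cache",       5.0,   512 * 2**10,    1.0,  8000,    32),
    ("main memory", 50.0,  512 * 2**20,   0.01,   800,  4096),
    ("disk",        5e6,   100 * 2**30,   1e-6,    80, 2**16),
]

# Check the hierarchy relations: ti-1 < ti, si-1 < si, ci-1 > ci,
# bi-1 > bi, xi-1 < xi for every pair of adjacent levels.
for (_, t0, s0, c0, b0, x0), (_, t1, s1, c1, b1, x1) in zip(levels, levels[1:]):
    assert t0 < t1 and s0 < s1 and c0 > c1 and b0 > b1 and x0 < x1

# The cost of level i is estimated by the product ci * si.
for name, t, s, c, b, x in levels:
    print(f"{name:12s} estimated cost ${c * s:,.2f}")
```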
Registers
• The registers are part of the processor; multi-level caches are built
either on the processor chip or on the processor board.
• Register assignment is made by the compiler.
• Register transfer operations are directly controlled by the processor
after instructions are decoded.
• Register transfer is conducted at processor speed, in one clock
cycle.
Caches
• The cache is controlled by the MMU and is programmer-
transparent.
• The cache can also be implemented at one or multiple levels,
depending on the speed and application requirements.
• Over the last two or three decades, processor speeds have
increased at a much faster rate than memory speeds.
• Therefore multi-level cache systems have become essential to deal
with memory access latency.
Main Memory
• The main memory is sometimes called the primary memory of a
computer system.
• It is usually much larger than the cache and often implemented with
the most cost-effective RAM chips, such as DDR SDRAMs, i.e. double
data rate synchronous dynamic RAMs.
• The main memory is managed by an MMU in cooperation with the
operating system.
Disk Drives and Backup Storage
• The disk storage is considered the highest level of on-line memory.
• It holds the system programs such as the OS and compilers, and
user programs and their data sets.
• Optical disks and magnetic tape units are off-line memory for use as
archival and backup storage.
• They hold copies of present and past user programs and processed
results and files.
• Disk drives are also available in the form of RAID arrays.
• A typical workstation computer has the cache and main memory on
a processor board and hard disks in an attached disk drive.
Peripheral Technology
• Besides disk drives and backup storage, peripheral devices include
printers, plotters, terminals, monitors, graphics displays, optical
scanners, image digitizers, output microfilm devices, etc.
• Some l/O devices are tied to special-purpose or multimedia
applications.
Inclusion, Coherence and Locality
• Information stored in a memory hierarchy (M1, M2, …, Mn) satisfies
three important properties: inclusion, coherence, and locality.
• We consider cache memory the innermost level M1, which directly
communicates with the CPU registers.
• The outermost level Mn contains all the information words stored.
The collection of all addressable words in Mn forms the virtual
address space of a computer.
Inclusion Property
• The inclusion property is stated as M1 ⊂ M2 ⊂ M3 ⊂ … ⊂ Mn.
• The inclusion relationship implies that all information items are
originally stored in the outermost level Mn .
• During processing, subsets of Mn are copied into Mn-1. Similarly,
subsets of Mn-1 are copied into Mn-2, and so on.
• In other words, if an information word is found in Mi, then copies of
the same word can also be found in all upper levels Mi+1, Mi+2, …, Mn.
• A word stored in Mi+1 may not be found in Mi.
• A word miss in Mi implies that it is also missing from all lower levels
Mi-1, Mi-2, …, M1. The highest level is the backup storage, where
everything can be found.
• Information transfer between the CPU and cache is in terms of
words (4 or 8 bytes each depending on the word length of a
machine).
• The cache (M1) is divided into cache blocks, also called cache lines
by some authors. Each block may be typically 32 bytes (8 words).
Blocks are the units of data transfer between the cache and main
memory, or between L1 and L2 cache, etc.
• The main memory (M2) is divided into pages, say, 4 Kbytes each.
Each page contains 128 blocks. Pages are the units of information
transferred between disk and main memory.
• Scattered pages are organized as a segment in the disk memory; for
example, segment F contains page A, page B, and other pages. The
size of a segment varies depending on the user's needs.
• Data transfer between the disk and backup storage is handled at
the file level, such as segments F and G.
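A small sketch can tie the transfer units and the inclusion property together. The three-level depth and the address ranges below are made up for illustration; only the subset relations and the quoted unit sizes (4-byte words, 32-byte blocks, 4-Kbyte pages) come from the text:

```python
# Transfer granularities between adjacent levels, using the figures
# quoted in the text.
WORD, BLOCK, PAGE = 4, 32, 4 * 1024
print(BLOCK // WORD, "words per block")   # 8:   CPU <-> cache unit
print(PAGE // BLOCK, "blocks per page")   # 128: cache <-> memory unit

# Inclusion modeled with Python sets: each level holds a subset of the
# addresses held by the next outer level (toy address ranges).
M3 = set(range(64))                  # outermost level: all information words
M2 = {a for a in M3 if a < 32}       # a subset of M3 copied inward
M1 = {a for a in M2 if a < 8}        # a subset of M2 copied inward
assert M1 <= M2 <= M3                # M1 ⊂ M2 ⊂ M3

# A hit in M1 implies copies exist in all outer levels...
assert all(w in M2 and w in M3 for w in M1)
# ...and a miss in M2 implies a miss in M1 as well.
assert all(w not in M1 for w in M3 - M2)
```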
Coherence Property
• The coherence property requires that copies of the information
item at successive memory levels be consistent.
• If a word is modified in the cache, copies of that word must be
updated immediately or eventually at all higher levels, so that the
hierarchy remains consistent.
• Frequently used information is often found in the lower levels in
order to minimize the effective access time of the memory
hierarchy.
• In general, there are two strategies for maintaining coherence
in a memory hierarchy.
• The first method is called write-through (WT), which demands
immediate update in Mi+1 if a word is modified in Mi,
for i = 1, 2, …, n-1.
• The second method is write-back (WB), which delays the update in
Mi+1 until the word being modified in Mi is replaced or removed
from Mi.
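A minimal sketch of the two policies for a two-level hierarchy, with M1 and M2 modeled as Python dictionaries. The class and its interface are hypothetical, and real caches track dirty state per block rather than per word:

```python
# Toy two-level memory (M1 = cache, M2 = main memory) contrasting the
# write-through (WT) and write-back (WB) coherence strategies.
class TwoLevelMemory:
    def __init__(self, write_through):
        self.m1, self.m2 = {}, {}
        self.write_through = write_through
        self.dirty = set()            # words modified in M1 but not yet in M2

    def write(self, addr, value):
        self.m1[addr] = value
        if self.write_through:
            self.m2[addr] = value     # WT: immediate update in M2
        else:
            self.dirty.add(addr)      # WB: delay the update

    def evict(self, addr):
        if addr in self.dirty:        # WB: update M2 only on replacement
            self.m2[addr] = self.m1[addr]
            self.dirty.discard(addr)
        self.m1.pop(addr, None)

wt = TwoLevelMemory(write_through=True)
wb = TwoLevelMemory(write_through=False)
wt.write(0x10, 7)
wb.write(0x10, 7)
print(wt.m2.get(0x10), wb.m2.get(0x10))  # 7 None -- WB's M2 is stale
wb.evict(0x10)
print(wb.m2.get(0x10))                   # 7 -- written back on eviction
```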
Locality of Reference
• The memory hierarchy was developed based on a program behavior
known as locality of reference.
• Memory references are generated by the CPU for either instruction or
data access. These accesses tend to be clustered in certain regions in
time, space, and ordering.
• In other words, most programs act in favor of a certain portion of their
address space during any time window.
• Hennessy and Patterson have pointed out a 90-10 rule, which states that a
typical program may spend 90% of its execution time on only 10% of the
code, such as the innermost loop of a nested looping operation.
• There are three dimensions of the locality property: temporal,
spatial, and sequential.
Temporal Locality
• Recently referenced items (instructions or data) are likely to be
referenced again in the near future. This is often caused by special
program constructs such as iterative loops, process stacks,
temporary variables, or subroutines.
• Once a loop is entered or a subroutine is called, a small code
segment will be referenced repeatedly many times. Thus temporal
locality tends to cluster the access in the recently used areas.
Spatial Locality
• This refers to the tendency for a process to access items whose
addresses are near one another. For example, operations on tables
or arrays involve accesses of a certain clustered area in the address
space.
• Program segments, such as routines and macros, tend to be stored
in the same neighborhood of the memory space.
Sequential Locality
• In typical programs, the execution of instructions follows a
sequential order (the program order) unless branch instructions
create out-of-order execution.
• The ratio of in-order execution to out-of-order execution is roughly
5 to 1 in ordinary programs. In addition, the access of a large data
array also follows a sequential order.
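The effect of these localities on hit ratio can be illustrated with a toy direct-mapped cache simulator. The parameters (64 lines of 32 bytes) and the two address streams are hypothetical; the point is that a sequential walk enjoys spatial and sequential locality, while random access defeats the cache:

```python
# Toy direct-mapped cache: sequential vs. random reference streams.
import random

BLOCK, NBLOCKS = 32, 64               # 32-byte blocks, 64 cache lines

def hit_ratio(addresses):
    cache = [None] * NBLOCKS          # one block tag per cache line
    hits = 0
    for a in addresses:
        block = a // BLOCK
        line = block % NBLOCKS        # direct-mapped placement
        if cache[line] == block:
            hits += 1
        else:
            cache[line] = block       # miss: fetch the whole block
    return hits / len(addresses)

N = 4 * 65536                         # walk a 256-Kbyte array, word by word
sequential = list(range(0, N, 4))
scattered = [random.randrange(N) for _ in sequential]
print(f"sequential: {hit_ratio(sequential):.2f}")  # ~0.88 (7 of 8 words hit)
print(f"random:     {hit_ratio(scattered):.2f}")   # near 0
```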
Memory Design Implications
• The sequentiality in program behavior also contributes to the
spatial locality because sequentially coded instructions and array
elements are often stored in adjacent locations.
• Each type of locality affects the design of the memory hierarchy.
• The temporal locality leads to the popularity of the least recently used
(LRU) replacement algorithm.
• The spatial locality assists us in determining the size of unit data transfers
between adjacent memory levels.
• The temporal locality also helps determine the size of memory at
successive levels.
• The sequential locality affects the determination of grain size for optimal
scheduling (grain packing).
• Prefetch techniques are heavily affected by the locality properties.
• The principle of locality guides the design of cache, main memory, and
even virtual memory organization.
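As one illustration of temporal locality driving replacement policy, here is a minimal LRU sketch; the LRUCache class, its capacity of 3 blocks, and the access trace are hypothetical:

```python
# Minimal LRU replacement: keep the most recently used blocks and evict
# the least recently used one when the cache is full.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity=3):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def access(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)   # hit: mark as most recently used
            return "hit"
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used
        self.blocks[block] = True
        return "miss"

c = LRUCache()
print([c.access(b) for b in ["A", "B", "C", "A", "D", "B"]])
# ['miss', 'miss', 'miss', 'hit', 'miss', 'miss'] -- accessing D evicts B
```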
Memory Capacity Planning
• The performance of a memory hierarchy is determined by the
effective access time Teff to any level in the hierarchy.
• It depends on the hit ratios and access frequencies at successive
levels.
Hit Ratios: The hit ratio is a concept defined for any two adjacent levels
of a memory hierarchy.
• When an information item is found in Mi, we call it a hit, otherwise,
a miss.
• Consider memory levels Mi and Mi-1 in a hierarchy, i = 1, 2, …, n. The
hit ratio hi at Mi is the probability that an information item will be
found in Mi.
• It is a function of the characteristics of the two adjacent levels Mi
and Mi-1. The miss ratio at Mi is defined as 1 - hi.
• The hit ratios at successive levels are a function of memory
capacities, management policies, and program behavior.
• Successive hit ratios are independent random variables with values
between 0 and 1. To simplify the derivation, we assume h0 = 0 and
hn = 1, which means the CPU always accesses M1 first, and the
access to the outermost memory Mn is always a hit.
• The access frequency to Mi is defined as
fi = (1 - h1)(1 - h2) … (1 - hi-1)·hi, and f1 = h1.
• Due to the locality property, the access frequencies decrease very
rapidly from low to high levels; that is, f1 ≫ f2 ≫ … ≫ fn.
• The effective access time of the hierarchy is the frequency-weighted
sum Teff = Σ fi·ti, which is to be minimized subject to a bound on the
total cost C = Σ ci·si.
• The unit cost ci and capacity si at each level Mi depend on the
speed ti required.
• Therefore, the above optimization involves tradeoffs among ti, ci, si,
and fi (or hi) at all levels i = 1, 2, …, n.
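These formulas can be checked numerically. The sketch below uses the hit ratios and access times of the worked example that follows (25 ns cache, 1250 ns main memory, 4 ms disk, h1 = 0.98, h2 = 0.99, h3 = 1):

```python
# Access frequencies f[i] and effective access time Teff for a
# three-level hierarchy.
h = [0.98, 0.99, 1.0]     # hit ratios h1, h2, h3
t = [25e-9, 1250e-9, 4e-3]  # access times t1, t2, t3 in seconds

f, miss = [], 1.0
for hi in h:
    f.append(miss * hi)   # f_i = (1-h1)...(1-h_{i-1}) * h_i
    miss *= 1.0 - hi

teff = sum(fi * ti for fi, ti in zip(f, t))
print([round(fi, 6) for fi in f])   # [0.98, 0.0198, 0.0002]
print(f"Teff = {teff * 1e9:.1f} ns")  # ~849 ns, close to the 850 ns target
```

Note how the disk term dominates Teff (0.0002 × 4 ms ≈ 800 ns) even at a 0.02% access frequency, which is why the frequencies must fall off so sharply for the hierarchy to work.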
Example: The design of a memory hierarchy
• Consider the design of a three-level memory hierarchy with the
following specifications for memory characteristics:
• The design goal is to achieve an effective memory-access time t =
850 ns with a cache hit ratio h1 = 0.98 and a hit ratio h2 = 0.99 in
main memory. Also, the total cost of the memory hierarchy is
upper-bounded by $1,500.
The memory hierarchy cost is calculated as:
C = c1·s1 + c2·s2 + c3·s3 ≤ $1,500
The maximum capacity of the disk is thus obtained as s3 = 40 Gbytes
without exceeding the budget.
• Next, we want to choose the access time t2 of the RAM used to build
the main memory. The effective memory-access time is calculated as:
Teff = f1·t1 + f2·t2 + f3·t3 ≤ 850 ns
850 × 10^-9 = 0.98 × 25 × 10^-9 + 0.02 × 0.99 × t2 + 0.02 × 0.01 × 1 × 4 × 10^-3
• t2 = 1250 ns
• Suppose one wants to double the main memory to 64 Mbytes at
the expense of reducing the disk capacity under the same budget
limit. This change will not affect the cache hit ratio. But it may
increase the hit ratio in the main memory, and thereby, the
effective memory-access time will be reduced.
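For reference, the equation can also be solved for t2 directly. The exact root is about 1288 ns; the 1250 ns quoted above appears to follow from rounding the coefficient of t2 to 0.02 and the remaining terms to 825 ns (25/0.02 = 1250), so treat the figure as an approximation:

```python
# Solve the effective-access-time equation of the example for t2.
t1, t3 = 25e-9, 4e-3
h1, h2 = 0.98, 0.99
target = 850e-9

f1 = h1
f2 = (1 - h1) * h2
f3 = (1 - h1) * (1 - h2) * 1.0        # h3 = 1: the disk always hits
t2 = (target - f1 * t1 - f3 * t3) / f2
print(f"t2 = {t2 * 1e9:.0f} ns")      # ~1288 ns exact; ~1250 ns with rounding
```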
Problem No: 1
a) The average access time is
ta = h1·t1 + (1 - h1)·h2·t2 = h·t1 + (1 - h)·10·t1 = (10 - 9h)·t1
(with h1 = h, h2 = 1, t2 = 10·t1, and t1 = 20 ns).
If h = 0.7, ta = 3.7·t1 = 74 ns
If h = 0.9, ta = 1.9·t1 = 38 ns
If h = 0.98, ta = 1.18·t1 = 23.6 ns
b) The average byte cost, with c1 = 20·c2, c2 = 0.2, and s2 = 4000, is
c = (c1·s1 + c2·s2) / (s1 + s2) = (20 × 0.2 × s1 + 0.2 × 4000) / (s1 + 4000)
= (4·s1 + 800) / (s1 + 4000)
If s1 = 64, c = 0.26
If s1 = 128, c = 0.32
If s1 = 256, c = 0.43
c) For the three designs, the products of average access time and
average byte cost are:
1) 74 ns × 0.26 = 19.24
2) 38 ns × 0.32 = 12.16
3) 23.6 ns × 0.43 = 10.15
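A quick check of this arithmetic, under the assumptions reconstructed above (t1 = 20 ns, t2 = 10·t1, c1 = 20·c2, c2 = 0.2, s2 = 4000):

```python
# Verify Problem 1: average access time, average byte cost, and their
# product for the three (h, s1) design points.
t1 = 20.0  # ns

for h, s1 in [(0.7, 64), (0.9, 128), (0.98, 256)]:
    ta = (10 - 9 * h) * t1                       # average access time [ns]
    c = round((4 * s1 + 800) / (s1 + 4000), 2)   # byte cost, rounded as in the notes
    print(f"h={h}, s1={s1}: ta={ta:.1f} ns, c={c:.2f}, product={ta * c:.2f}")
# h=0.7,  s1=64:  ta=74.0 ns, c=0.26, product=19.24
# h=0.9,  s1=128: ta=38.0 ns, c=0.32, product=12.16
# h=0.98, s1=256: ta=23.6 ns, c=0.43, product=10.15
```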