ACA - Memory

MEMORY

Memory Hierarchy:

Objectives: To match the processor speed with the information transfer rate (the bandwidth) of the memory at the lowest level, at a reasonable cost.
Major differences exist between the hierarchical memory structures of a
multiprogrammed uniprocessor system and a parallel processor system, because
of their different memory reference characteristics. In a parallel processing
system, concurrent memory requests come from different processors at the same
level. A conflict occurs when two or more of these concurrent requests reference
the same section of memory at the same level. This type of conflict degrades the
performance of the system. It can be reduced by partitioning the memory at a
given level into several modules to achieve some degree of concurrent access.

The characteristics of a memory system are:

 Access time ta

 Capacity of storage S

 Cost per bit C

Design Goals:

 Fastest access time (ta should be minimum)

 Maximum capacity (S should be maximum)

 Minimum cost per bit (C should be minimum)

For high speed memories:

 ta is small

 S is small

 C is high

For high capacity memories:

 ta is large

 S is large

 C is low

[Figure: memory hierarchy pyramid — CPU registers at the top, then cache, main memory and disk; cost and speed increase toward the top, capacity increases toward the bottom.]

Register: highest speed, a few in number

Cache: higher speed, small capacity, higher cost

Main Memory: high speed, moderate capacity, high cost

Disk: low speed, large capacity, low cost

Classification of memory based on access method:

 Random Access Memory (RAM) – in RAM the access time ta of a memory
word is independent of its location.

 Sequential Access Memory (SAM) – in SAM information is accessed serially
or sequentially.

 Direct Access Storage Device (DASD) – DASDs are rotational devices made
of magnetic materials where any block of information can be accessed directly.

Commonly used DASDs are Drums, Fixed-Head Disks and Movable-Arm Disks.
The time to transfer a block of information is ta + tb, where ta is the access time
and tb is the block transfer time. For Drums and Fixed-Head Disks, ta is the time
it takes for the initial word of the desired block to rotate into position. For
Movable-Arm Disks, an additional “Seek Time” ts is required to move the arms
into track position.

Characteristics of memory devices in a memory hierarchy

Level i | Memory type               | Technology    | Size Si       | Avg access time ti             | Unit of transfer
1       | Cache                     | Bipolar, HMOS | 2K-128K bytes | 30-100 ns                      | 1 word
2       | Main or primary           | MOS, core     | 4K-16M bytes  | .25-1 µs (MOS), .5-1 µs (core) | 2-32 words
3       | Bulk memory (LCS, ECS)    | Core          | 64K-16M bytes | 5-10 µs                        | 2-32 words
4       | Fixed head disks or drums | Magnetic      | 8M-256M bytes | 5-15 ms                        | 1K-4K bytes
5       | Movable arm disk          | Magnetic      | 8M-500M bytes | 25-75 ms                       | 4K bytes
6       | Tape                      | Magnetic      | 50M bytes     | 1-5 s                          | 1K-16K bytes

The memory hierarchy is structured in such a way that the level i is “higher” than
those at level i+1.

If ci, ti and si are the cost per byte, average access time and total memory size at
level i respectively, then

ci > ci+1

ti < ti+1

si < si+1

where i>=1.

Cache Memory: A special very high-speed memory called a Cache is sometimes
used to increase the speed of processing by making current programs and data
available to the CPU at a rapid rate. The cache is used in computer systems to
compensate for the speed differential between main memory access time and
processor logic. CPU logic is usually faster than main memory access time, with
the result that processor speed is limited primarily by the speed of main memory.
A technique used to compensate for the mismatch in operating speeds is to
employ an extremely fast, small cache between the CPU and main memory
whose access time is close to the processor logic clock cycle time. The cache is
used for storing segments of programs currently being executed in the CPU and
temporary data frequently needed in the present calculations. Information transfer
between the processor and the cache is on a word basis. By introducing the
cache memory, the performance of the computer is improved.

[Figure: memory hierarchy in a computer system — the CPU is connected to the cache memory and main memory; an I/O processor connects main memory to the auxiliary memory (magnetic tapes and magnetic disks).]
[Figure: three-level memory hierarchy — processors P0 and P1 each have a local memory (cache) M1,0 and M1,1, connected through a processor-memory interconnection network to main memory modules M2,0-M2,3, which in turn are connected through channels to fixed head disks or drums M3,0 and M3,1.]

Level 1 2 3

Access time t1 t2 t3

Memory capacity (bytes) s1 s2 s3

Cost per byte c1 c2 c3

M2,0 – M2,3: main memory, designed either with Metal Oxide Semiconductor
(MOS) or with ferromagnetic (core) technology. The unit of information transfer
between main memory and cache is a block of contiguous information. The primary
memory can be extended with Large Core Storage (LCS) or with Extended Core
Storage (ECS).

The processor usually references an item in the memory by providing the address
of that item in the memory. The address space at level i is a subset of that at level
i+1, but address Ak at level i does not necessarily correspond to address Ak at
level i+1. Any information that exists at level i also exists at level i+1.

Data Inconsistency or Coherence problem: Some of the information in
level i may be more current than that in level i + 1. Data consistency problem
arises between adjacent levels because they have different copies of the same
information. Usually level i+1 is eventually updated with the modified information
from level i. Data consistency problems may also exist between the local memories
and caches when two concurrent processes executing on separate processors
interact via one or more shared variables. One process may have the updated
value of the shared variable in its local memory, while the other process may
continue with the old value of the shared variable in its local memory.

Hit Ratio & Miss Ratio: Hit Ratio (H) is the probability of finding the requested
information in the memory of a given level. In general, H depends on the
granularity of information transfer, the capacity of the memory at that level, the
management strategy etc. Usually H is most sensitive to the memory size s.

The Miss Ratio F(s) = 1 - H(s),

Where H(s) is the success function.

Since copies of information at level i are assumed to exist at levels greater than i,
the probability of a hit at level i and of misses at levels 1 through i-1 is:

hi = H(si) – H(si-1)

Where hi is the access frequency at level i, indicating the relative number of
successful accesses to level i. The missing-item fault frequency at level i is then

fi = 1 - hi

Optimization of Memory Hierarchy: The performance of the hierarchy may be
indicated by the effective hierarchy access time per memory reference. Other
factors affecting the performance are the access time and memory size of each
level, the granularity of information transfer (block size) etc.

The effective access time Ti from the processor to the ith level of the memory
hierarchy is the sum of the individual average access times tk of each level from
k = 1 to i:

Ti = Σ(k=1 to i) tk

tk includes the wait time due to memory conflicts at level k and the delay in the
switching network between levels k-1 and k. The degree of conflict depends on

 Number of processors

 The number of memory modules

 Interconnection network between processor and memory modules

The effective access time for each memory reference in an n-level memory
hierarchy is

T = Σ(i=1 to n) hi ti
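
For illustration, a minimal Python sketch of these two formulas, using assumed
(made-up) access times and success-function values for a three-level hierarchy:

# Hypothetical numbers, for illustration only: a three-level hierarchy
# with cumulative success function H(s_i); H = 1.0 at the last level.
t = [100e-9, 1e-6, 10e-3]   # average access time t_i of each level (seconds)
H = [0.95, 0.999, 1.0]      # cumulative hit ratio H(s_i)

# Access frequency h_i = H(s_i) - H(s_{i-1}), with H(s_0) = 0.
h = [H[0]] + [H[i] - H[i - 1] for i in range(1, len(H))]

# Effective access time per reference: T = sum of h_i * t_i.
T = sum(hi * ti for hi, ti in zip(h, t))
print("access frequencies:", [round(x, 3) for x in h])
print(f"effective access time T = {T * 1e9:.1f} ns")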

Miss Penalty: The extra time needed to bring the desired information into the
cache is called the miss penalty. In general, the miss penalty is the time needed to
bring a block of data from a slower unit in the memory hierarchy to a faster unit.
The miss penalty is reduced if efficient mechanisms for transferring data between
the various units of the hierarchy are implemented.

Impact of Cache on performance:

Average access time experienced by the CPU:

tave = hC + (1-h)M

where h = hit rate

C = time to access the cache

M = miss penalty

Enhancement of performance by Secondary Cache: Since the size of a cache on
the CPU chip is limited by space constraints, a good strategy for designing a
high-performance system is to use such a cache as a primary cache. An external
secondary cache is added to provide the desired capacity. The primary cache is
directly accessed by the CPU. The secondary cache can be considerably slower,
but it should be much larger to ensure a high hit rate. Including a secondary cache
further reduces the impact of the main memory speed on the performance of the
computer. The average access time experienced by the CPU in a system with two
levels of caches is

tave = h1C1 + (1-h1)h2C2 + (1-h1)(1-h2)M

Where the parameters are defined as follows

h1 = the hit rate of primary cache

h2 = the hit rate of secondary cache

C1= the time to access information in the primary cache

C2= the time to access information in the secondary cache

M = the time to access information in the main memory
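
As an illustration, a small Python sketch that evaluates both the single-cache and
two-level formulas with assumed parameter values (the numbers are illustrative,
not from any particular machine):

# Illustrative parameters (assumed): access times in clock cycles.
C1, C2, M = 1.0, 10.0, 100.0   # primary cache, secondary cache, main memory
h1, h2 = 0.95, 0.90            # hit rates of primary and secondary caches

# Single cache: tave = h*C + (1-h)*M
t_one = h1 * C1 + (1 - h1) * M

# Two cache levels: tave = h1*C1 + (1-h1)*h2*C2 + (1-h1)*(1-h2)*M
t_two = h1 * C1 + (1 - h1) * h2 * C2 + (1 - h1) * (1 - h2) * M

print(f"one cache level : {t_one:.2f} cycles per access")   # 5.95
print(f"two cache levels: {t_two:.2f} cycles per access")   # 1.90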

Virtual Memory: Virtual Memory may be described as a hierarchy of two memory
systems – one of them is a low cost, large capacity, low speed system and the
other is a high cost, small capacity, high speed system. The operating system
manages the two memories in such a way that a user feels that he has access to
a single, large, directly addressable and fast main memory. A virtual memory
system allows its users to use a large addressable memory space without
worrying about the size limitations of the physical main memory.
In order to implement a virtual memory system, the main memory is divided into
fixed size contiguous areas, called Page frames. In addition, the on-line disk
storage is divided into pieces of the same size, called either Pages or Segments.
Only those program pages or segments that are actually needed at a particular
time in the processing need to be in primary storage. The remaining pages may
be kept temporarily in virtual memory, from where they can be loaded into the
main memory as and when required. The binary addresses that the processor
issues for either instructions or data are called Virtual or Logical addresses.
These addresses are translated into physical addresses by a combination of
hardware and software components. If a virtual address refers to a part of a
program or data space that is currently in the physical memory, then the content
of the address is accessed immediately; otherwise it is brought into the physical
memory before use.

[Figure: virtual memory organization — the processor sends virtual addresses to the MMU, which translates them into physical addresses; physical addresses go to the cache and main memory, and data moves between main memory and disk storage by DMA transfer.]

A special hardware unit, called the Memory Management Unit (MMU), translates
virtual addresses into physical addresses. When the desired data or instructions
are in the main memory, they are fetched; if they are not in the main memory, the
MMU causes the operating system to bring the data into the memory from the
disk.

PAGING: A 16-bit address is used.

[Figure: a 64K virtual address space (addresses 0-65535) mapped onto a 4K main memory (words 0-4095).]

Virtual addresses 4096 to 8191 are mapped onto main memory addresses 0 to 4095:

Address 4096 -> memory word 0

Address 8191 -> memory word 4095

Problem: What happens if a program jumps to an address between 8192 and
12287?

On a machine without virtual storage, it will be an error.

On a machine with virtual storage, it will not be an error. The following steps will
take place:

 The content of main memory would be saved in the secondary storage.

 Words 8192 to 12287 would be located in the secondary memory.

 Words 8192 to 12287 would be loaded into main memory.

 The address map is updated.

 Execution will continue.


IMPLEMENTATION OF PAGING: The virtual address space is broken up into a
number of equal-sized pages. The page size varies from 512 to 4096 addresses
per page. A page table is maintained to record the physical locations of the
different pages. When a program tries to reference its memory, whether to fetch
data, store data, fetch an instruction or jump, it first generates a 16-bit address
corresponding to a virtual address between 0 and 65535.

Virtual address space | Page frame
0-4K     | 2
4K-8K    | 1
8K-12K   | 6
12K-16K  | 0
16K-20K  | 4
20K-24K  | 3
24K-28K  | X
28K-32K  | X
32K-36K  | X
36K-40K  | 5
40K-44K  | X
44K-48K  | 7
48K-52K  | X
52K-56K  | X
56K-60K  | X
60K-64K  | X

(X = virtual page not present in memory; the physical memory holds eight page
frames, covering physical addresses 0 to 32K.)

The relation between virtual addresses and physical memory addresses is given
in the page table.
In the following figure a virtual address 8196 (0010000000000100 in binary) is
mapped using the MMU.

Incoming 16-bit virtual address (8196):

0010 000000000100

4-bit virtual page number = 2; 12-bit address (offset) within the selected virtual page = 4

Page table:

V. Page | P. Frame | Present/absent
0   | 010 | 1
1   | 001 | 1
2   | 110 | 1
3   | 000 | 1
4   | 100 | 1
5   | 011 | 1
6   | 000 | 0
7   | 000 | 0
8   | 000 | 0
9   | 101 | 1
10  | 000 | 0
11  | 111 | 1
12  | 000 | 0
13  | 000 | 0
14  | 000 | 0
15  | 000 | 0

Outgoing physical address (24580):

110 000000000100

In this example, the 16-bit virtual address 8196 is split into a 4-bit virtual page
number (2) and a 12-bit address (4) within the selected page; page table entry 2
supplies page frame 110, giving the physical address 24580.
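
This translation can be reproduced with a short Python sketch (the page table
below is copied from the figure; the fault handling is only indicative):

# 16-bit virtual address: 4-bit virtual page number + 12-bit offset.
# Page table from the figure: (page frame, present bit) per virtual page.
page_table = [
    (0b010, 1), (0b001, 1), (0b110, 1), (0b000, 1),
    (0b100, 1), (0b011, 1), (0b000, 0), (0b000, 0),
    (0b000, 0), (0b101, 1), (0b000, 0), (0b111, 1),
    (0b000, 0), (0b000, 0), (0b000, 0), (0b000, 0),
]

def translate(vaddr):
    page = vaddr >> 12          # high order 4 bits: virtual page number
    offset = vaddr & 0xFFF      # low order 12 bits: offset within the page
    frame, present = page_table[page]
    if not present:
        raise RuntimeError(f"page fault on virtual page {page}")
    return (frame << 12) | offset

print(translate(8196))          # virtual page 2 -> frame 6: prints 24580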

PAGE FAULT: It is tempting to assume that the referenced virtual page is always
in main memory. This assumption is not always true, because there is not enough
room in main memory for all virtual pages. When a reference is made to an
address on a page which is not present in the main memory, it is called a page
fault.

DEMAND PAGING: It is possible to start a program running on a machine with
virtual memory even when none of the program is in the main memory. When the
CPU tries to fetch the first instruction, it immediately gets a page fault, which
causes the page containing the first instruction to be loaded and entered in the
page table. Then the first instruction can begin. If the first instruction references
two operands on different pages, both different from the instruction page, two
more page faults will occur, and two more pages will be brought in before the
instruction can finally execute. The next instruction may cause some more page
faults, and so on. This method of operating a virtual memory is called Demand
Paging.

PAGE SIZE & FRAGMENTATION: If a program and its data need 26000 words
on a machine with 4096 words per page, the first 6 pages will be filled and the last
page will contain 26000 – 24576 = 1424 words. When the seventh page is present
in the memory, the remaining 2672 words of that page frame waste valuable
memory space. This problem of wasted words is called Fragmentation.

If the page size is n words, the average amount of space wasted in the last page
of a program by fragmentation will be n/2 – a situation that suggests using a small
page size to minimize waste. A small page size, however, means many pages, as
well as a large page table. If the page table is maintained in hardware, a large
number of registers are required. In addition, more time will be required to load
and save these registers whenever a program is started or stopped. Furthermore,
small pages make inefficient use of secondary memories with long access times,
such as disk, because the transfer time is usually shorter than the combined seek
and rotational delay.
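
The arithmetic of this example as a small Python sketch:

# Internal fragmentation for the example above: 26000 words, 4096-word pages.
program_words, page_size = 26000, 4096

pages_needed = -(-program_words // page_size)      # ceiling division -> 7 pages
words_in_last_page = program_words - (pages_needed - 1) * page_size
wasted = page_size - words_in_last_page

print(pages_needed, words_in_last_page, wasted)    # 7 1424 2672
# On average, the waste in the last page is about page_size/2 words.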

PAGE REPLACEMENT ALGORITHMS:

 Optimal Page Replacement Algorithm: Each page is labeled with the number
of instructions that will be executed before that page is next referenced; this
algorithm says that the page with the highest label should be removed. If one
page will not be used for the next 8 million instructions and another page will
not be used for 6 million instructions, removing the former pushes the page
fault as far into the future as possible.

 Not Recently Used (NRU) Page Replacement Algorithm: In order to allow the
operating system to collect useful statistics about which pages are being used
and which ones are not, most computers with virtual memory have two status
bits associated with each page. R is set whenever the page is referenced
(read or written). M is set when the page is written to (i.e., modified).

 FIFO Page Replacement Algorithm: The OS maintains a list of all pages
currently in the memory, with the page at the head of the list the oldest one
and the page at the tail the most recent arrival. On a page fault, the page at
the head is removed and the new page is added to the tail.

 Second Chance Page Replacement Algorithm: A simple modification to
FIFO that avoids the problem of throwing out a heavily used page is to
inspect the R bit of the oldest page. If it is 0, the page is both old and unused,
so it is replaced immediately. If the R bit is 1, the bit is cleared, the page is put
at the end of the list of pages, and its load time is updated as though it had
just arrived in the memory. Then the search continues.

 Clock Page Replacement Algorithm: This approach keeps all the pages on a
circular list in the form of a clock; a hand points to the oldest page. When a
page fault occurs, the page pointed to by the hand is inspected. If its R bit is 0,
the page is removed, the new page is inserted and the hand is advanced by
one position. If R is 1, it is cleared and the hand is advanced to the next page.
This process is repeated until a page is found with R = 0. (A sketch of this
algorithm follows the list.)

 Least Recently Used (LRU) Page Replacement Algorithm: This is a good
approximation to the Optimal Algorithm. It is based on the observation that
pages that have been heavily used in the last few instructions will probably
be heavily used again in the next few, while pages that have not been used
for ages will probably remain unused for a long time. When a page fault
occurs, throw out the page that has been unused for the longest time.
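
As referenced above, a minimal Python sketch of the clock algorithm (the frame
count and reference string are made up for illustration):

def clock_replace(reference_string, n_frames):
    frames = [None] * n_frames   # circular list of resident pages
    r_bit = [0] * n_frames       # reference bit per frame
    hand = 0
    faults = 0
    for page in reference_string:
        if page in frames:                  # hit: set the R bit
            r_bit[frames.index(page)] = 1
            continue
        faults += 1                         # page fault
        while r_bit[hand] == 1:             # referenced pages get a second chance
            r_bit[hand] = 0
            hand = (hand + 1) % n_frames
        frames[hand] = page                 # R bit is 0: replace this page
        r_bit[hand] = 1
        hand = (hand + 1) % n_frames
    return faults

print(clock_replace([0, 1, 2, 3, 0, 1, 4, 0, 1, 2, 3, 4], 3))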

SEGMENTATION: The virtual memory discussed so far is basically a
one-dimensional memory, because the virtual addresses go from 0 to some
maximum address, one after another. A compiler, for example, has many tables
that are built up as compilation proceeds. In this case it is better to have many
virtual address spaces instead of one. As the space requirements grow and
shrink rapidly, there may be space allocation problems. A straightforward and
extremely general solution is to provide the machine with many completely
independent address spaces, called Segments. Each segment consists of a
linear sequence of addresses from 0 to some maximum value. Different
segments may have different lengths; moreover, a segment's length may change
during execution. As each address space is independent, different segments
may grow or shrink independently, without affecting each other. To specify an
address in this segmented or two-dimensional memory, the program must supply
a two-part address: a segment number and an address within the segment.
[Figure: three independent segments, each with its own linear address space starting at 0 — Segment 0 (12K), Segment 1 (8K), Segment 2 (4K).]
Because each segment forms a logical entity of which the programmer is aware,
such as a procedure, or an array, or a stack, different segments can have different
kinds of protection. A procedure segment can be specified as execute only,
prohibiting attempts to read from it or store into it.

IMPLEMENTATION OF PURE SEGMENTATION: The implementation of
segmentation differs from paging in an essential way: pages are of fixed size and
segments are not.

[Figure: (A)-(D) development of checkerboarding — as segments of different sizes (Seg 0 4K, Seg 1 8K, Seg 2 5K, ..., Seg 7 5K) are swapped in and out of physical memory, unusable holes of 3K-10K accumulate between segments; (E) removal of the checkerboarding by compaction.]

Consideration | Paging | Segmentation
Need the programmer be aware that this technique is used? | No | Yes
How many linear address spaces are there? | 1 | Many
Can the total address space exceed the size of physical memory? | Yes | Yes
Can procedures and data be distinguished and separately protected? | No | Yes
Can tables whose size fluctuates be accommodated easily? | No | Yes
Is sharing of procedures between users facilitated? | No | Yes
Why was the technique invented? | To get a large linear address space without having more physical memory. | To allow programs and data to be broken up into logically independent address spaces.
ADDRESS TRANSLATION:

[Figure: address translation — the contents of the page table base register are added to the virtual page number (the high order bits of the virtual address from the processor) to locate the corresponding page table entry; the entry holds control bits and the page frame number in memory, and the page frame combined with the offset (the low order bits) forms the physical address in main memory.]

A simple method for translating virtual addresses into physical addresses is to
assume that all programs and data are composed of fixed-length units called
pages, each of which consists of a block of words that occupy contiguous
locations in the main memory. Pages are generally 2K to 16K bytes in length.
Pages should not be too small, because the access time of a magnetic disk is
much longer than the access time of main memory. If the page size is too big, a
major portion of the page may be left unused.

Each virtual address generated by the processor is interpreted as a Virtual Page
Number (the high order bits) followed by an offset (the low order bits) that
specifies the location of a particular byte (or word) within a page. Information
about the main memory location of each page is kept in a page table. This
information includes the main memory address where the page is stored and the
current status of the page. The starting address of the page table is kept in a
Page Table Base Register. By adding the virtual page number to the contents of
this register, the address of the corresponding entry in the page table is obtained.
The contents of this location give the starting address of the page if the page is
currently in the main memory.

Each entry in the page table also includes some control bits that describe the
status of the page while it is in main memory. One bit indicates whether the page
is available in the main memory. Another bit indicates whether the page is
modified during its residency in the main memory.
The page table information is used by the MMU for every read and write access.
The MMU is normally implemented as part of the CPU chip. It is impossible to
accommodate the entire page table within the MMU, so the page table is kept in
the main memory and a copy of a small portion of it is kept in the MMU. This
portion consists of the entries that correspond to the most recently accessed
pages. A small cache, usually called the Translation Lookaside Buffer (TLB), is
incorporated into the MMU for this purpose.
Address translation proceeds as follows. Given a virtual address, the MMU looks
in the TLB for the referenced page. If the page table entry for this page is found in
the TLB, the physical address is obtained immediately. If there is a miss in the
TLB, the required entry is obtained from the page table in the main memory, and
the TLB is updated. The address translation time in the MMU depends mostly on
the time needed to look up entries in the TLB. We can reduce the average
translation time by including one or more special registers that retain the virtual
page number and the physical page frame of the most recently performed
translations. The information in these registers can be accessed more quickly
than the TLB.
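
A minimal Python sketch of this lookup sequence (the page table contents and
page size are assumed for illustration; a real MMU does this in hardware):

PAGE_BITS = 12                       # assumed 4K-byte pages

page_table = {2: 6, 5: 3}            # virtual page -> page frame, in main memory
tlb = {}                             # small cache of recent translations

def translate(vaddr):
    vpage, offset = vaddr >> PAGE_BITS, vaddr & ((1 << PAGE_BITS) - 1)
    if vpage in tlb:                 # TLB hit: physical address immediately
        frame = tlb[vpage]
    else:                            # TLB miss: read the page table, update TLB
        frame = page_table[vpage]    # a KeyError here would be a page fault
        tlb[vpage] = frame
    return (frame << PAGE_BITS) | offset

print(translate(8196))               # TLB miss, then 24580
print(translate(8200))               # TLB hit for the same page: 24584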

CACHE MAPPING:
Considerations: 1) A cache consisting of 128 block frames of 16 words each, for a
total of 2048 (2K) words.

2) Main memory is addressable by a 16-bit address.

3) Main memory has 64K words, i.e. 4K blocks of 16 words each.

DIRECT MAPPING:

[Figure: direct-mapped cache — main memory block j maps to cache block frame (j mod 128): block frame 0 holds block 0, 128, 256, ...; block frame 1 holds block 1, 129, 257, ...; block frame 127 holds block 127, 255, ..., up to block 4095. The 16-bit main memory address is divided into a 5-bit Tag, a 7-bit Block field and a 4-bit Word field.]
The high order 5 bits of the memory address of the block are stored in 5 tag bits
associated with its location in the cache; they identify which of the 32 main
memory blocks that map to this block frame is currently resident. The 7-bit Block
field points to a particular block frame location in the cache, and the 4-bit Word
field selects the word within the block.
ADVANTAGE: Easy to implement.

DISADVANTAGE: Not flexible (contention may arise).
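
A short Python sketch of the 5/7/4 address decomposition described above (field
names follow the figure):

def split_direct(addr):
    word = addr & 0xF            # low order 4 bits: word within the block
    block = (addr >> 4) & 0x7F   # next 7 bits: cache block frame number
    tag = addr >> 11             # high order 5 bits: tag
    return tag, block, word

# Memory blocks 1, 129 and 257 all map to cache block frame 1,
# distinguished only by their tags -- the source of contention.
for addr in (1 * 16, 129 * 16, 257 * 16):
    print(split_direct(addr))    # (0, 1, 0), (1, 1, 0), (2, 1, 0)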

ASSOCIATIVE – MAPPED CACHE:

[Figure: associative-mapped cache — any of the 4096 main memory blocks can be placed in any of the 128 cache block positions; the 16-bit main memory address is divided into a 12-bit Tag and a 4-bit Word field.]

A main memory block can be placed into any cache block position. 12 tag bits are
required to identify a memory block when it is resident in the cache. The tag bits
of a CPU-generated address are compared to the tag bits of each block of the
cache to see if the desired block is present. This is called the Associative
Mapping technique.

ADVANTAGE: Freedom in choosing the cache location. Better space utilization.

DISADVANTAGE: Cost is higher, because it requires associative search.

SET - ASSOCIATIVE – MAPPED CACHE:

[Figure: set-associative-mapped cache — the 128 cache blocks are grouped into 64 sets of two blocks each, and main memory block j maps to set (j mod 64): Set 0 holds blocks 0, 64, 128, ..., 4032; Set 1 holds blocks 1, 65, 129, ...; Set 63 holds blocks 63, 127, ..., 4095. The 16-bit main memory address is divided into a 6-bit Tag, a 6-bit Set field and a 4-bit Word field.]

The 6-bit Set field of the address determines which of the 64 sets of the cache
might contain the desired block; the Tag field is then compared with the tags of
the blocks in that set. One more control bit, called the Valid Bit, must be provided
for each block. The valid bits are all set to 0 when power is initially applied to the
system or when the main memory is loaded with new programs and data from
the disk. The valid bit of a particular cache block is set to 1 the first time this block
is loaded from the main memory. Whenever a main memory block is updated by
a source that bypasses the cache, a check is made to determine whether the
block being updated is currently in the cache. If it is, its valid bit is cleared to 0.

ADVANTAGES: 1) Providing a few choices for block placement reduces the
contention problem.

2) Hardware cost is reduced by decreasing the size of the associative search.
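
A companion Python sketch of the 6/6/4 decomposition for this set-associative
organization:

def split_set_assoc(addr):
    word = addr & 0xF            # low order 4 bits: word within the block
    s = (addr >> 4) & 0x3F       # next 6 bits: set number (0-63)
    tag = addr >> 10             # high order 6 bits: tag
    return tag, s, word

# Blocks 0, 64 and 128 all fall in set 0; any two of them can be
# resident at once, since each set holds two blocks.
for addr in (0 * 16, 64 * 16, 128 * 16):
    print(split_set_assoc(addr))  # (0, 0, 0), (1, 0, 0), (2, 0, 0)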

SECTOR MAPPING CACHE:

[Figure: sector-mapped cache — main memory is partitioned into 1024 sectors of 16 blocks each (Sector 0 holds blocks 0-15, Sector 1 holds blocks 16-31, ..., Sector 1023 holds blocks 16368-16383), and the cache is divided into 8 sector frames of 16 block frames each (sector frame 7 holds block frames 112-127), with a tag per sector frame and a valid bit per block frame. The main memory address is divided into a 10-bit Sector (Tag) field, a 4-bit Block field and a 4-bit Word field.]

Memory is partitioned into a number of sectors and the cache is divided into a
number of sector frames. If a request is made for a block not in the cache, the
sector to which this block belongs is brought into a sector frame of the cache. A
valid bit is associated with each block frame to indicate which blocks of the sector
have been referenced and retrieved from memory.

ADVANTAGE: Reduces the cost of the mapping, since relatively few tags are
required, which permits simultaneous comparison with all tags.

MEMORY BANDWIDTH: The bandwidth of a system is defined as the number of
operations performed per unit time. In the case of a main memory system, the
memory bandwidth is measured by the number of memory words that can be
accessed per unit time. Let W be the number of words delivered per memory
cycle tm.

Maximum memory bandwidth: Bm = W / tm (words/s or bytes/s)

Utilized CPU rate: Bpu = Rw / Tp (words/s), where Tp is the total CPU time
required to generate the Rw results.

PERFORMANCE CONSIDERATIONS: Two key factors in the commercial
success of a computer are performance and cost; a common measure of
success is the PRICE/PERFORMANCE RATIO.

INTERLEAVING: If the main memory of a computer is structured as a collection
of physically separate modules, each with its own address buffer register (ABR)
and data buffer register (DBR), memory access operations may proceed in more
than one module at the same time. In this way the aggregate rate of transmission
of words to and from the main memory system can be increased.
Consecutive words in a module [High order]:

[Figure: high-order interleaving — the memory address is divided into a k-bit Module field (high order) and an m-bit Address-in-module field; Modules 0 through n-1 each have their own ABR and DBR.]

The high order k bits name one of n modules, and the low order m bits name a
particular word in that module. When consecutive locations are accessed, as
happens when a block of data is transferred to a cache, only one module is
involved. At the same time other devices with direct memory access (DMA) can
access information in other memory modules.

ADVANTAGE: Better system reliability, since a failed module affects only a
localized area of the address space.

DISADVANTAGE: In an array processor, elements of a vector cannot be fetched
simultaneously by all processors if they reside in the same module.

Consecutive words in Consecutive Module [Low order]:

m bits k bits
Address in module Module
MM address

ABR DBR ABR DBR ABR DBR

Module
Module 0 Module i k
2 -1

The more efficient way to address the modules is low-order interleaving. The low
order k bits of the memory address select a module, and the high order m bits
name a location within that module. In this way consecutive addresses are
located in successive modules. Thus, any component of the system that
generates requests for access to consecutive memory locations can keep
several modules busy at any one time. To implement this interleaving, there must
be 2^k modules.

ADVANTAGE: Reduces memory interference on accesses to a segment of
shared data.
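
A small Python sketch contrasting the two decompositions for consecutive
addresses, assuming 16-bit addresses and 2^k = 4 modules:

K, M_BITS = 2, 14                       # k = 2 module bits, m = 14 address bits

def high_order(addr):                   # module field in the high order k bits
    return addr >> M_BITS, addr & ((1 << M_BITS) - 1)

def low_order(addr):                    # module field in the low order k bits
    return addr & ((1 << K) - 1), addr >> K

for addr in range(4):                   # consecutive addresses 0..3
    print(addr, high_order(addr), low_order(addr))
# High order: all four addresses fall in module 0 (one module busy).
# Low order: they fall in modules 0, 1, 2, 3 (four modules busy at once).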

WRITE BUFFER: When the write-through protocol is used, each write operation
results in writing a new value into the main memory, and the CPU is slowed down
by the write requests. To improve performance, a write buffer can be included for
temporary storage of write requests. The CPU places each write request into the
buffer and continues the execution of the next instruction. The write requests
stored in the write buffer are sent to the main memory whenever the main
memory is not responding to read requests.

PREFETCHING: To avoid stalling the CPU, it is possible to prefetch data into the
cache before they are needed. A special prefetch instruction may be provided in
the instruction set of the processor; executing this instruction causes the
addressed data to be loaded into the cache. A prefetch instruction is inserted in a
program to cause the data to be loaded in the cache by the time they are needed
in the program. The prefetching takes place while the CPU is busy executing
instructions. Prefetch instructions can be inserted into the program by the
programmer or by the compiler. Prefetching can also be done in hardware. Some
prefetches may load data into the cache that will not be used by the instructions
that follow.

LOCKUP-FREE CACHE: A prefetch, like any miss, can stop other accesses to
the cache until it is completed; a cache of this type is said to be locked while it
services a miss. A cache that can support multiple outstanding misses is called a
lockup-free cache. Since it can service more than one miss at a time, it must
include circuitry that keeps track of all outstanding misses. This bookkeeping can
be done with registers.

SOLVING THE CACHE COHERENCE PROBLEM:

1. WRITE-THROUGH PROTOCOL: This is implemented in two ways:

a) Write-Through With Update Protocol: When a processor writes a new value
into its cache, the new value is also written into the memory module that holds
the cache block being changed. The written data are broadcast to all processor
modules in the system. As each processor module receives the broadcast data, it
updates the contents of the affected cache block if this block is present in its
cache (primary or secondary).

b) Write-Through With Invalidation Protocol: When a processor writes a new value
into its cache, this value is written into the memory module, and all copies in other
caches are invalidated. Broadcasting can be used to send the invalidation
requests throughout the system.
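
A toy Python sketch of the two write-through variants (the caches are modeled as
simple dictionaries; this only illustrates the broadcast behavior, not a real bus
protocol):

memory = {0: 10}
caches = [{0: 10}, {0: 10}, {}]          # three processors; P2 has no copy

def write_through(pid, block, value, invalidate):
    caches[pid][block] = value
    memory[block] = value                 # write through to main memory
    for other, cache in enumerate(caches):
        if other != pid and block in cache:
            if invalidate:
                del cache[block]          # invalidation protocol: drop the copy
            else:
                cache[block] = value      # update protocol: refresh the copy

write_through(0, 0, 99, invalidate=False)
print(caches, memory)   # every existing copy now holds 99
write_through(0, 0, 77, invalidate=True)
print(caches, memory)   # only P0 retains a copy; the others are invalidated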

2. Write-Back Update Protocol: Multiple copies of a cache block may exist if
different processors have loaded (read) the block into their caches. To change
the block, a processor must be the exclusive owner of the block. When the ownership
is granted to this processor by the memory module that is the home location of the
block, all other copies, including the one in the memory module, are invalidated.
Now the owner may change the content. When another processor wants to read
this block, the data are sent to the processor by the current owner. The data are
also sent to the home memory module to update the values.

ADVANTAGE: Less traffic than the write-through protocol, because a processor
is likely to perform several writes to a cache block before another processor
needs this block.

Snoopy Cache: In a single-bus system, all transactions between processors and
memory modules take place via the bus. If two processors want to write to the
same cache block at the same time, one of the processors is granted the use of
the bus and becomes the owner; the other processor's copy of the cache block is
invalidated, and that processor repeats its write request. This sequential handling
of write requests ensures that the two processors change cache blocks correctly.
Such a technique is called the Snoopy Cache technique.

