Distributed Shared Memory - Revised


Distributed Shared Memory
Distributed shared memory
 The DSM paradigm provides processes with a shared address
space.
 Primitives for shared memory:
1. Read(address)
2. Write(address, data)
 The shared memory paradigm gives the system the illusion of
physically shared memory.
 DSM refers to the shared memory paradigm applied to
loosely coupled distributed memory systems.
Cont….
 The shared memory exists only virtually.
 Similar in concept to virtual memory.
 DSM is also known as DSVM (distributed shared virtual memory).
 DSM provides a virtual address space shared among
processes on loosely coupled processors.
 DSM is basically an abstraction that integrates the local
memory of different machines into a single logical entity
shared by cooperating processes.
DSM Architecture
 Each node of the system consists of one or more CPUs
and a memory unit.
 Nodes are connected by a high-speed communication
network.
 A simple message-passing system lets nodes exchange
information.
 The main memory of individual nodes is used to cache pieces
of the shared memory space.
 This reduces network latency.
Cont….
 A memory-mapping manager routine maps the local memory
onto the shared virtual memory space.
 The shared memory space is partitioned into blocks.
 Data caching is used in DSM systems to reduce network
latency.
 The basic unit of caching is a memory block.
Cont….
 If the data is not available in local memory, a network block
fault is generated.
 On a network block fault:
 The missing block is migrated from the remote node to the
client process’s node, and the OS maps it into the application’s
address space.
 Data blocks keep migrating from one node to another on
demand, but no communication is visible to the user processes.
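The fault-and-migrate behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the class and method names (`Node`, `DSM`, `access`) are invented for the example.

```python
# Hypothetical sketch: servicing a network block fault by migrating the
# missing block from its current owner to the faulting node.

class Node:
    def __init__(self, name):
        self.name = name
        self.local_blocks = {}   # block_id -> data; the node's local cache

    def access(self, block_id, dsm):
        """Read a block, triggering a network block fault if it is not local."""
        if block_id not in self.local_blocks:        # network block fault
            owner = dsm.owner_of(block_id)
            data = owner.local_blocks.pop(block_id)  # block migrates away
            self.local_blocks[block_id] = data       # mapped into our space
            dsm.set_owner(block_id, self)
        return self.local_blocks[block_id]

class DSM:
    def __init__(self):
        self.owners = {}                             # block_id -> owning Node

    def owner_of(self, block_id):
        return self.owners[block_id]

    def set_owner(self, block_id, node):
        self.owners[block_id] = node

# Example: block "B0" starts at node A; an access from node B migrates it.
dsm = DSM()
a, b = Node("A"), Node("B")
a.local_blocks["B0"] = [1, 2, 3]
dsm.set_owner("B0", a)
value = b.access("B0", dsm)
```

Note that the user code only calls `access`; the migration happens invisibly inside the fault path, matching the slide's point that no communication is visible to user processes.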
Design and implementation issues
1. Granularity
2. Structure of Shared memory
3. Memory coherence and access synchronization
4. Data location and access
5. Replacement strategy
6. Thrashing
7. Heterogeneity
Granularity

 Refers to the block size of a DSM system.
 The block is the unit of sharing and the unit of data transfer
across the network when a network block fault occurs.
 Possible units are a few words, a page, or a few pages.
Memory Coherence & Access Synchronization
 Replicated and shared data items may
simultaneously be available in the main memories of
a number of nodes.
 Memory coherence problem:
 Deals with the consistency of a piece of shared data lying
in the main memories of two or more nodes.
Replacement strategy
 If the local memory of a node is full, a cache miss at that
node implies not only a fetch of the accessed data block from
a remote node but also a replacement.
 An existing data block must be replaced by the new data block.
Thrashing

 Data blocks migrate between nodes on demand.
 Therefore, if two nodes compete for write access to a
single data item, the corresponding data block may be
transferred back and forth in quick succession.
Granularity

 The most visible parameter in the design of a DSM system is
the block size.
 Because of fixed per-message overheads, sending a large
packet of data is not much more expensive than sending a
small one, which argues for larger block sizes.
Cont…
 Factors influencing block size selection:
1. Paging overhead
2. Directory size
3. Thrashing
4. False sharing
Paging overhead
 A process is likely to access a large region of its
shared address space in a small amount of time.
 The paging overhead is therefore less for a large block size
than for a small block size.

Directory size
 The larger the block size, the smaller the
directory.
 Result: reduced directory management overhead
for larger block sizes.
Thrashing

 The problem of thrashing may occur:
 when data items in the same data block are being
updated by multiple nodes at the same time.
 with any block size, though it is more likely with larger
block sizes.
False sharing
 Occurs when two different processes access
two unrelated variables that reside in the
same data block.
 The larger the block size, the higher the probability of
false sharing.
 False sharing of a block may lead to a thrashing problem.
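False sharing can be made concrete with a small simulation. The sketch below is illustrative (not from the slides): it counts how often a block "ping-pongs" between nodes because two unrelated variables happen to land in the same block.

```python
# Hypothetical sketch: two nodes update unrelated variables that share one
# block, so every write invalidates the other node's cached copy even though
# the variables never actually conflict.

BLOCK_SIZE = 4  # variables per block (stand-in for bytes per page)

def block_of(var_index):
    return var_index // BLOCK_SIZE

def count_invalidations(writes):
    """writes: list of (node, var_index). A write invalidates the block on
    the other node whenever that other node wrote the block last."""
    last_writer = {}            # block_id -> node that holds it writable
    invalidations = 0
    for node, var in writes:
        blk = block_of(var)
        if blk in last_writer and last_writer[blk] != node:
            invalidations += 1  # false sharing: block ping-pongs
        last_writer[blk] = node
    return invalidations

# Variables 0 and 1 share block 0, so alternating writers thrash:
trace = [("N1", 0), ("N2", 1)] * 3
ping_pongs = count_invalidations(trace)
```

With a larger `BLOCK_SIZE` more unrelated variables fall into the same block, which is exactly why a larger block size raises the probability of false sharing.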
Possible Solutions
Using page size as block size
 The relative advantages and disadvantages of small
and large block sizes make it difficult for a DSM
designer to decide on a proper block size.
 On an Intel Core 2 Duo 64-bit system, the page size
is 4 KB, which is typical of almost all desktop
PC architectures.
 Linux kernel 2.6.x versions support large pages
(4 MB).
Advantages of using page size as
block size

 Use of existing page-fault schemes to trigger a DSM page
fault.
 Allows access-right control.
 Page size does not impose undue communication overhead
at the time of a network page fault.
 Page size is a suitable data-entity unit with respect to
memory contention.
Structure of Shared-Memory Space

 The structure defines the abstract view of the shared
memory space for application programmers.
 The structure and granularity of a DSM system are
closely related.
Consistency Models
 Consistency requirements vary from application to
application.
 A consistency model refers to the degree of consistency that
has to be maintained for the shared memory data.
 If a system supports a stronger consistency model, any
weaker consistency model is automatically supported,
but the converse is not true.
Types of Consistency Models
1. Strict consistency model
2. Sequential consistency model
3. Causal consistency model
4. Pipelined Random Access Memory (PRAM) consistency
model
5. Processor consistency model
6. Weak consistency model
7. Release consistency model
Strict Consistency Model
 The strongest form, with the most stringent consistency
requirement.
 The value returned by a read operation on a memory
address is always the same as the value written by
the most recent write operation to that address.
 All writes instantaneously become visible to all
processes.
 Implementation requires the existence of an absolute
global time to synchronize the clocks of all nodes.
 Practically impossible.
Sequential Consistency Model
 Proposed by Lamport in 1979.
 All processes see the same order of all memory access
operations on the shared memory.
 The exact interleaving of the access operations does not
matter, as long as all processes see the same one.
Example:
 Operations performed, in program order:
1. Read(r1)
2. Write(w1)
3. Read(r2)
Cont…

 The only acceptable ordering
 for a strictly consistent memory:
(r1, w1, r2)
 For the sequential consistency model:
any of the orderings (r1,w1,r2), (r1,r2,w1), (w1,r1,r2),
(w1,r2,r1), (r2,r1,w1), (r2,w1,r1) is correct,
provided all processes see the same ordering.


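The rule above is easy to state as code. The following sketch (illustrative, not from the slides) enumerates the six orderings of {r1, w1, r2} and checks the sequential-consistency condition: any single total order is acceptable, provided every process observes the same one.

```python
# Hypothetical sketch: checking the sequential consistency condition for
# the three operations r1, w1, r2 used in the slide example.

from itertools import permutations

ops = ("r1", "w1", "r2")
legal_orders = set(permutations(ops))   # all six interleavings are legal

def sequentially_consistent(views):
    """views: the order each process observed, one tuple per process.
    Acceptable iff all processes saw the same legal total order."""
    return len(set(views)) == 1 and views[0] in legal_orders

same_view = sequentially_consistent([("w1", "r1", "r2"),
                                     ("w1", "r1", "r2")])
split_view = sequentially_consistent([("r1", "w1", "r2"),
                                      ("w1", "r1", "r2")])
```

`same_view` holds because both processes agree on one interleaving; `split_view` fails because the processes disagree, even though each order is individually legal.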
Cont…
 The consistency requirement of the sequential
consistency model is weaker than that of the strict
consistency model.
 A sequentially consistent memory provides one-copy /
single-copy semantics.
 Acceptable to most applications.
Causal Consistency Model
 Proposed by Hutto and Ahamad in 1990.
 All processes see those memory reference operations
that are potentially causally related in the same
(correct) order.
 Operations that are not causally related may be seen by
different processes in different orders.
 Requires constructing and maintaining a dependency graph
for memory access operations.
Pipelined Random Access Memory
(PRAM) Consistency model
 Proposed by Lipton and Sandberg in 1988.
 Provides weaker consistency semantics.
 Ensures that all write operations performed by a single
process are seen by all other processes in the order in
which they were performed, as if all write operations
performed by a single process were in a pipeline.
PRAM
 P1 performs W11 and W12.
 P2 performs W21 and W22.
 P3 may see these as ((W11,W12), (W21,W22)).
 P4 may see these as ((W21,W22), (W11,W12)).
 Advantages:
 Simple and easy to implement.
 Good performance.
 Limitation:
 All processes may not agree on the same order of memory
reference operations.
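The PRAM condition from the example can be checked mechanically: a process's view is legal if it preserves each writer's own program order, while different processes are free to interleave the pipelines differently. The sketch below is illustrative and uses the W11/W12/W21/W22 names from the slide.

```python
# Hypothetical sketch of the PRAM condition: each writer's pipeline must
# appear in order within every view, but views need not agree with each
# other on how the pipelines interleave.

def pram_consistent(view, program_orders):
    """view: sequence of writes one process observed.
    program_orders: {process: [its writes, in issue order]}."""
    for writer, writes in program_orders.items():
        seen = [w for w in view if w in writes]
        if seen != writes:            # writer's pipeline was reordered
            return False
    return True

orders = {"P1": ["W11", "W12"], "P2": ["W21", "W22"]}

p3_ok = pram_consistent(["W11", "W12", "W21", "W22"], orders)  # P3's view
p4_ok = pram_consistent(["W21", "W22", "W11", "W12"], orders)  # P4's view
bad   = pram_consistent(["W12", "W11", "W21", "W22"], orders)  # W12 before W11
```

P3 and P4 disagree on the global order yet both views are PRAM-consistent, which is exactly the limitation the slide notes.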
Processor Consistency model
 Proposed by Goodman in 1989.
 Very similar to the PRAM model, with the additional restriction
of memory coherence.
 Memory coherence:
 For any memory location, all processes agree on the same order
of all write operations to that location.
 Processor consistency ensures that all write operations
performed on the same location are seen by all
processes in the same order.
Weak Consistency Model
 Many applications do not demand that:
 the change in memory made by every write operation
be shown immediately to other processes.
 Idea of weak consistency:
 Better performance can be achieved if consistency
is enforced on a group of memory reference
operations rather than on individual memory
reference operations.
 Uses a special variable, called a
synchronization variable, to synchronize
memory.
Cont…
Requirements:
1. All accesses to synchronization variables must obey
sequential consistency semantics.
2. All previous write operations must be completed
everywhere before an access to a synchronization
variable is allowed.
3. All previous accesses to synchronization variables must
be completed before access to a non-synchronization
variable is allowed.
 Gives better performance, at the cost of putting an extra
burden on the programmer.
Release consistency model

 Enhancement of weak consistency model.

 Use of two synchronization variables:


 Acquire
 Release
Cont…
 Acquire
 Used to tell the system that a process is
entering a critical section.
 Results in propagating changes made by other
nodes to the process's node.
 Release
 Used to tell the system that a process has just
exited a critical section.
 Results in propagating changes made by the
process to other nodes.
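The acquire/release behaviour can be sketched as follows. This is a minimal, single-threaded illustration (not from the slides): a dict stands in for the DSM's propagation mechanism, writes are buffered until release, and acquire pulls in changes published by earlier releases.

```python
# Hypothetical sketch of release consistency: writes made inside the
# critical section stay local until release; acquire imports changes
# that other nodes have already released.

class RCNode:
    def __init__(self, shared):
        self.shared = shared      # stand-in for the DSM propagation channel
        self.local = {}           # this node's (possibly stale) view
        self.dirty = {}           # writes buffered until release

    def acquire(self):
        self.local = dict(self.shared)   # propagate others' changes to us

    def write(self, key, value):
        self.local[key] = value
        self.dirty[key] = value          # not yet visible elsewhere

    def release(self):
        self.shared.update(self.dirty)   # propagate our changes outward
        self.dirty.clear()

shared = {"x": 0}
n1, n2 = RCNode(shared), RCNode(shared)
n1.acquire()
n1.write("x", 42)        # buffered: invisible to n2 until release
n2.acquire()
stale = n2.local["x"]    # n1 has not released yet
n1.release()
n2.acquire()
fresh = n2.local["x"]    # n1's release has propagated
```

The point of the sketch is the timing: `n2` sees the old value until `n1` releases, which is precisely the consistency-on-synchronization idea of the weak and release models.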
Cont…
 A variation of release consistency is lazy
release consistency, proposed by Keleher in
1992.
 Modifications are not sent to other nodes at the
time of release, but only on demand.
 Gives better performance.
Implementing Seq. Consistency Model

 The protocol for implementing sequential consistency
depends on whether the DSM system allows:
 replication, and/or
 migration
of shared memory data blocks.
Strategies:
1. Nonreplicated, nonmigrating blocks (NRNMB)
2. Nonreplicated, migrating blocks (NRMB)
3. Replicated, migrating blocks (RMB)
4. Replicated, nonmigrating blocks (RNMB)
NRNMBS
 The simplest strategy.
 Each block of the shared memory has a single
copy, whose location is always fixed.
Cont…
 Enforcing sequential consistency is trivial.
 The method is simple and easy to implement.
 Drawbacks:
 Serializing data accesses creates a bottleneck.
 Parallelism is not possible.
Cont…
Data locating in the NRNMB strategy:
 There is a single copy of each block in the entire
system.
 The location of a block never changes.
 Hence, a simple mapping function suffices to map a block
onto a node.
NRMBS
 Each block of the shared memory has a single
copy in the entire system.
 Migration is allowed.
 The owner node of a block changes as soon as the
block migrates to a new node.
 Only the processes executing on the owner node
can read or write a given data item at any
time.
Cont…
 Advantages:
 No communication costs are incurred when a
process accesses data currently held locally.
 Takes advantage of data access locality.
 Drawbacks:
 Prone to the thrashing problem.
 No parallelism.
Data locating in the NRMB strategy

 There is a single copy of each block, but the location
of a block keeps changing dynamically.
Cont…
The following methods are used:
1. Broadcasting
2. Centralized-server algorithm
3. Fixed distributed-server algorithm
4. Dynamic distributed-server algorithm
Broadcasting

 Each node maintains an owned-blocks table
that contains an entry for each block for
which the node is the current owner.
Cont…

 On a fault:
 the fault handler of the faulting node broadcasts a
read/write request on the network.
Disadvantage:
 Not scalable.
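The scalability problem is visible even in a toy version. The sketch below (illustrative, not from the slides) locates a block by querying every node's owned-blocks table; the message count grows with the number of nodes, which is what makes broadcasting unscalable.

```python
# Hypothetical sketch of data locating by broadcasting: the faulting node
# sends the request to every node; the current owner answers. Cost is one
# message per node, i.e. O(n) per fault.

def locate_by_broadcast(block_id, nodes):
    """nodes: {name: set of owned block ids}.
    Returns (owner_name, messages_sent)."""
    messages = 0
    for name, owned in nodes.items():
        messages += 1                 # one broadcast message reaches this node
        if block_id in owned:
            return name, messages
    return None, messages

nodes = {"A": {"B1"}, "B": {"B2"}, "C": {"B3"}}
owner, msgs = locate_by_broadcast("B2", nodes)
```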
Centralized server algorithm
Cont…
 A centralized server maintains a block table
that contains the location information for all
blocks in the shared memory space.
Drawbacks:
 The centralized server serializes location
queries, reducing parallelism.
 The failure of the centralized server will cause
the DSM system to stop functioning.
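A minimal sketch of the centralized-server idea (illustrative; the class and method names are invented): a single block table maps every block to its current owner, and the server must be told whenever a block migrates.

```python
# Hypothetical sketch of the centralized-server algorithm: one server holds
# the location information for every block. Simple, but it serializes all
# location queries and is a single point of failure.

class CentralServer:
    def __init__(self):
        self.block_table = {}            # block_id -> owner node name

    def register(self, block_id, owner):
        self.block_table[block_id] = owner

    def locate(self, block_id):
        """Answer a location query from a faulting node."""
        return self.block_table[block_id]

    def migrate(self, block_id, new_owner):
        """The server must track every migration to stay correct."""
        self.block_table[block_id] = new_owner

server = CentralServer()
server.register("B7", "node1")
before = server.locate("B7")
server.migrate("B7", "node2")
after = server.locate("B7")
```

Each fault costs a fixed two messages (query and reply), but every fault in the whole system funnels through this one object, which is the bottleneck the slide describes.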
Fixed distributed server algorithm
Cont..
 A direct extension of the centralized-server
scheme.
 There is a block manager on each of several nodes.
 Each block manager is given a predetermined
subset of the data blocks to manage.
 On a fault, a mapping function is applied to the address of
the accessed block to find the block manager responsible
for it.
Dynamic distributed server algorithm
Cont…

 Does not use any block manager; instead it
attempts to keep track of the ownership
information of all blocks at each node.
 Each node has a block table.
 It contains the ownership information for all blocks.
 Uses the notion of a probable owner.
 When a fault occurs, the faulting node extracts
from its block table the probable-owner information for the
block and sends its request to that node; if that node is no
longer the owner, the request is forwarded along the chain
of probable owners until the true owner is reached.
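The probable-owner chain can be sketched as follows. This is an illustrative simplification (names invented): each node holds a hint about who probably owns a block; a request follows the hints until it reaches the true owner, and the faulting node updates its hint so the next lookup is shorter.

```python
# Hypothetical sketch of the dynamic distributed-server algorithm: follow
# the chain of probable-owner hints to the true owner, then cache the
# answer so later faults take fewer hops.

def find_owner(block_id, start, probable, true_owner):
    """probable: {node: {block_id: probable_owner_hint}}.
    Returns (owner, hops taken)."""
    node, hops = start, 0
    while node != true_owner[block_id]:
        node = probable[node][block_id]   # follow the hint
        hops += 1
    probable[start][block_id] = node      # shorten the chain for next time
    return node, hops

# Node A thinks B owns the block; B thinks C does; C is the true owner.
probable = {"A": {"B9": "B"}, "B": {"B9": "C"}, "C": {"B9": "C"}}
true_owner = {"B9": "C"}
first = find_owner("B9", "A", probable, true_owner)
second = find_owner("B9", "A", probable, true_owner)  # hint now direct
```

The first lookup takes two hops; after the hint update, the second goes straight to the owner.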
RMB - Replicated, migrating blocks

 Replication is used to increase parallelism.
 Read operations are carried out in parallel at multiple
nodes by accessing the local copy of the data.
 Replication tends to increase the cost of write
operations.
 It creates the problem of maintaining consistency.
 The extra expense pays off only if the read/write ratio is large.
Cont…
 Protocols to ensure sequential consistency:
1. Write-invalidate
2. Write-update

Write-invalidate
 All copies of a piece of data except one are invalidated
before a write can be performed on it.
 After invalidation of a block, only the node that performs
the write operation on the block holds the modified
version of the block.
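The invalidate-before-write step can be sketched directly. This is an illustrative simplification (not the slides' implementation): per-node caches are dicts, and a write first removes the block from every other cache.

```python
# Hypothetical sketch of write-invalidate: before a write, every copy of
# the block except the writer's is invalidated, so only the writer holds
# a valid (modified) copy afterwards.

def write_invalidate(copies, writer, block_id, value):
    """copies: {node: {block_id: value}} — each node's cached blocks."""
    for node, cache in copies.items():
        if node != writer:
            cache.pop(block_id, None)    # invalidate the remote copy
    copies.setdefault(writer, {})[block_id] = value

copies = {"A": {"B1": 1}, "B": {"B1": 1}, "C": {}}
write_invalidate(copies, "A", "B1", 2)
```

After the call, only node A holds block B1; nodes B and C will take a read fault on their next access and fetch the modified version, which is how updates propagate lazily in this protocol.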
Cont…
Write-update
 A write operation is carried out by updating all copies of
the data on which the write is performed.
 On a write fault, the fault handler copies the accessed
block from one of the block’s current nodes to its own
node.
 The write operation completes only after all the copies of
the block have been successfully updated.
Cont…
 A global sequencer assigns a sequence number to each
modification and multicasts it to all the nodes where a
replica of the data block to be modified is located.
 The write operations are processed at each node in
sequence-number order.
 If the sequence-number verification fails, the node requests
the sequencer to retransmit the missing modification.
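The sequence-number check at each replica can be sketched as follows (illustrative; class and method names invented): a replica applies an update only if it carries the next expected sequence number, and treats a gap as a missing modification to be retransmitted.

```python
# Hypothetical sketch of sequenced write-update: replicas apply updates
# strictly in sequence-number order; a gap means a modification was lost
# and must be retransmitted by the sequencer.

class Replica:
    def __init__(self):
        self.data = {}
        self.expected = 1           # next sequence number we can apply

    def deliver(self, seq, key, value):
        """Apply the update only if it is next in sequence; otherwise
        signal the gap so the sequencer can retransmit."""
        if seq != self.expected:
            return False            # verification failed: request retransmit
        self.data[key] = value
        self.expected += 1
        return True

r = Replica()
ok1 = r.deliver(1, "x", 10)
gap = r.deliver(3, "x", 30)   # sequence number 2 is missing: rejected
ok2 = r.deliver(2, "x", 20)
```

Because every replica applies the same updates in the same sequence-number order, all copies converge on the same state, which is what gives this scheme sequential consistency.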
Cont…
 The write-update approach is very expensive.
 In the write-invalidate approach, updates are only
propagated when data is read, and several updates
can take place before communication is necessary.
 A status tag is associated with each block.
 It indicates whether the block is valid / shared / read-only / writable.
Data Locating in the RMB strategy

The following algorithms may be used:
1. Broadcasting
2. Centralized-server algorithm
3. Fixed distributed-server algorithm
4. Dynamic distributed-server algorithm
Replicated, Non-Migrating Block
 A shared memory block may be replicated at multiple
nodes of the system, but the location of each replica is
fixed.
 All replicas of a block are kept consistent by updating
them all in case of a write access.
 Sequential consistency is ensured by using a global
sequencer to sequence the write operations of all nodes.
Data locating in the RNMB strategy
The strategy has the following characteristics:
1. The replica locations of a block never change.
2. All replicas of a data block are kept consistent.
3. A read request can be sent directly to one of the
nodes holding a replica, but all write requests have to
be sent to the sequencer.
