Unit 1 DC

Uploaded by

subbulakshmi R

CS3551- DISTRIBUTED COMPUTING

UNIT I INTRODUCTION

Introduction: Definition – Relation to Computer System Components – Motivation – Message-Passing
Systems versus Shared Memory Systems – Primitives for Distributed Communication –
Synchronous versus Asynchronous Executions – Design Issues and Challenges; A Model of
Distributed Computations: A Distributed Program – A Model of Distributed Executions – Models of
Communication Networks – Global State of a Distributed System.

UNIT II LOGICAL TIME AND GLOBAL STATE

Logical Time: Physical Clock Synchronization: NTP – A Framework for a System of Logical Clocks
– Scalar Time – Vector Time; Message Ordering and Group Communication: Message Ordering
Paradigms – Asynchronous Execution with Synchronous Communication – Synchronous Program
Order on Asynchronous System – Group Communication – Causal Order – Total Order; Global
State and Snapshot Recording Algorithms: Introduction – System Model and Definitions – Snapshot
Algorithms for FIFO Channels

UNIT III DISTRIBUTED MUTEX AND DEADLOCK

Distributed Mutual Exclusion Algorithms: Introduction – Preliminaries – Lamport’s Algorithm –
Ricart-Agrawala’s Algorithm – Token-Based Algorithms – Suzuki-Kasami’s Broadcast Algorithm;
Deadlock Detection in Distributed Systems: Introduction – System Model – Preliminaries – Models
of Deadlocks – Chandy-Misra-Haas Algorithm for the AND model and OR Model.

UNIT IV CONSENSUS AND RECOVERY

Consensus and Agreement Algorithms: Problem Definition – Overview of Results – Agreement in a
Failure-Free System (Synchronous and Asynchronous) – Agreement in Synchronous Systems with
Failures; Checkpointing and Rollback Recovery: Introduction – Background and Definitions – Issues
in Failure Recovery – Checkpoint-based Recovery – Coordinated Checkpointing Algorithm –
Algorithm for Asynchronous Checkpointing and Recovery

UNIT V CLOUD COMPUTING

Definition of Cloud Computing – Characteristics of Cloud – Cloud Deployment Models – Cloud
Service Models – Driving Factors and Challenges of Cloud – Virtualization – Load Balancing –
Scalability and Elasticity – Replication – Monitoring – Cloud Services and Platforms: Compute
Services – Storage Services – Application Services
UNIT I
INTRODUCTION

Computation began on a single processor. This uniprocessor computing can be termed
centralized computing.
A distributed system is a collection of independent computers, interconnected via a
network, capable of collaborating on a task. Distributed computing is computing
performed in a distributed system.

A distributed system is a collection of independent entities that cooperate to solve a problem
that cannot be individually solved. Distributed computing is widely used because of
advances in machines and faster, cheaper networks. In a distributed system, the entire
network is viewed as a single computer: the multiple systems connected to the network
appear to the user as a single system.
Features of Distributed Systems:
No common physical clock – This introduces the element of “distribution” in the system and
gives rise to the inherent asynchrony amongst the processors.
No shared memory – A key feature that requires message-passing for communication. This
feature implies the absence of the common physical clock.
Geographical separation – The more geographically spread out the processors are, the
more representative the system is of a distributed system.
Autonomy and heterogeneity – The processors are “loosely coupled” in that they have
different speeds and each can be running a different operating system.

Issues in distributed systems

Heterogeneity
Openness
Security
Scalability
Failure handling
Concurrency
Transparency
Quality of service

1.2 Relation to Computer System Components

Fig 1.1: Example of a Distributed System

As shown in Fig 1.1, each computer has a memory-processing unit, and the computers are
connected by a communication network. Each system connected to the distributed network
hosts distributed software, which is a middleware technology. This middleware drives the
Distributed System (DS) while preserving the heterogeneity of the DS. The term computation
or run in a distributed system refers to the execution of processes to achieve a common goal.

Fig 1.2: Interaction of layers of network

The interaction of the layers of the network with the operating system and
middleware is shown in Fig 1.2. The middleware contains important library functions for
facilitating the operations of DS.
The distributed system uses a layered architecture to break down the complexity of system
design. The middleware is the distributed software that drives the distributed system, while
providing transparency of heterogeneity at the platform level.

Examples of middleware: the Object Management Group’s (OMG) Common Object Request
Broker Architecture (CORBA) [36], Remote Procedure Call (RPC), and Message Passing
Interface (MPI).

1.3 Motivation

The following key points act as the driving force behind DS:

Inherently distributed computations: DS can process computations at geographically
remote locations.
Resource sharing: Hardware, databases, and special libraries can be shared between
systems without each owning a dedicated copy or replica. This is cost effective and reliable.
Access to geographically remote data and resources: Resources such as centralized
servers can also be accessed from distant locations.
Enhanced reliability: DS provide enhanced reliability, since they run on multiple copies of
resources.
The term reliability comprises:
1. Availability: The resource, or the service provided by the resource, should be accessible
at all times.
2. Integrity: The value/state of the resource should be correct and consistent.
3. Fault-tolerance: The ability to recover from system failures.
Increased performance/cost ratio: The resource sharing and remote access features of DS
naturally increase the performance/cost ratio.
Scalability: The number of systems operating in a distributed environment can be increased as
the demand increases.

1.4 MESSAGE-PASSING SYSTEMS VERSUS SHARED MEMORY SYSTEMS

Communication between the tasks in multiprocessor systems takes place through two main
modes:

Message passing systems:

• This allows multiple processes to read and write data to the message queue
without being connected to each other.
• Messages are stored on the queue until their recipient retrieves them.
Shared memory systems:
• The shared memory is the memory that can be simultaneously accessed by
multiple processes. This is done so that the processes can communicate with each
other.
• Communication among processors takes place through shared data variables, and
control variables for synchronization among the processors.
• Semaphores and monitors are common synchronization mechanisms on shared
memory systems.
• When the shared memory model is implemented in a distributed environment, it is
termed distributed shared memory.

Emulating message-passing on a shared memory system (MP → SM)

• The shared memory system can be made to act as message passing system. The
shared address space can be partitioned into disjoint parts, one part being
assigned to each processor.
• Send and receive operations are implemented by writing to and reading from the
destination/sender processor’s address space, respectively. The read and write
operations are synchronized.
• Specifically, a separate location can be reserved as the mailbox for each ordered
pair of processes.
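
The mailbox idea above can be sketched in Python, using threads in one address space as a stand-in for processors on a shared memory system. This is only an illustrative sketch: the class name `SharedMemoryMailboxes` and its methods are hypothetical, and a condition variable is assumed as the mechanism that synchronizes the reads and writes.

```python
import threading
from collections import deque


class SharedMemoryMailboxes:
    """Hypothetical sketch: message passing emulated on shared memory."""

    def __init__(self, num_processes):
        # One mailbox is reserved for each ordered pair (sender, receiver).
        self._boxes = {
            (src, dst): deque()
            for src in range(num_processes)
            for dst in range(num_processes)
            if src != dst
        }
        # The condition variable synchronizes writes (send) and reads (receive).
        self._cond = threading.Condition()

    def send(self, src, dst, message):
        """Send = write into the mailbox owned by the (src, dst) pair."""
        with self._cond:
            self._boxes[(src, dst)].append(message)
            self._cond.notify_all()

    def receive(self, dst, src, timeout=None):
        """Receive = read from the (src, dst) mailbox, waiting if it is empty."""
        with self._cond:
            box = self._boxes[(src, dst)]
            if not self._cond.wait_for(lambda: len(box) > 0, timeout):
                raise TimeoutError("no message arrived in time")
            return box.popleft()
```

Because each ordered pair has its own mailbox, a receiver can pick the sender it wants to hear from, regardless of the order in which messages from different senders were written.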

Emulating shared memory on a message-passing system (SM → MP)

• This is also implemented through read and write operations. Each shared
location can be modeled as a separate process. Write to a shared location is
emulated by sending an update message to the corresponding owner process and
read operation to a shared location is emulated by sending a query message to the
owner process.
• This emulation is expensive because a process has to gain access to another
process’s memory location. The latencies involved in read and write operations
may be high even when using shared memory emulation, because the read and
write operations are implemented using network-wide communication.
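
A minimal Python sketch of this emulation follows, with a thread and a queue standing in for the owner process and the network. The names `owner_process` and `SharedLocation` are hypothetical, and the acknowledgement sent back after an update is an assumption added so that a write returns only once it has taken effect.

```python
import queue
import threading


def owner_process(inbox, initial_value):
    """Owner of one shared location: serves update and query messages."""
    value = initial_value
    while True:
        kind, payload, reply_to = inbox.get()
        if kind == "update":      # emulated write
            value = payload
            reply_to.put("ack")   # assumption: writes are acknowledged
        elif kind == "query":     # emulated read
            reply_to.put(value)
        else:                     # any other message stops the owner
            return


class SharedLocation:
    """Hypothetical client-side view of the emulated shared location."""

    def __init__(self, initial_value=0):
        self._inbox = queue.Queue()
        threading.Thread(
            target=owner_process,
            args=(self._inbox, initial_value),
            daemon=True,
        ).start()

    def write(self, value):
        reply = queue.Queue()
        self._inbox.put(("update", value, reply))
        reply.get()               # wait for the owner's acknowledgement

    def read(self):
        reply = queue.Queue()
        self._inbox.put(("query", None, reply))
        return reply.get()        # the owner replies with the current value
```

Every read and write costs a message round trip to the owner, which is the source of the latency noted above.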

1.5 PRIMITIVES FOR DISTRIBUTED COMMUNICATION

Blocking / Non blocking / Synchronous / Asynchronous

• Message send and message receive communication primitives are done through
Send() and Receive(), respectively.
• A Send primitive has two parameters: the destination, and the buffer in the user
space that holds the data to be sent.
• The Receive primitive also has two parameters: the source from which the data is
to be received and the user buffer into which the data is to be received.
There are two ways of sending data when the Send primitive is called:

• Buffered: The standard option copies the data from the user buffer to the kernel
buffer. The data later gets copied from the kernel buffer onto the network. For the
Receive primitive, the buffered option is usually required because the data may
already have arrived when the primitive is invoked, and needs a storage place in
the kernel.
• Unbuffered: The data gets copied directly from the user buffer onto the network.

Blocking primitives
• The primitive commands wait for the message to be delivered. The execution of
the processes is blocked.
• The sending process must wait after a send until an acknowledgement is made
by the receiver.
• The receiving process must wait for the expected message from the sending
process.
• A primitive is blocking if control returns to the invoking process after the
processing for the primitive completes.
Non Blocking primitives
• If send is nonblocking, it returns control to the caller immediately, before the
message is sent.
• The advantage of this scheme is that the sending process can continue computing
in parallel with the message transmission, instead of having the CPU go idle.
• This is a form of asynchronous communication.
• A primitive is non-blocking if control returns back to the invoking process
immediately after invocation, even though the operation has not completed.
• For a non-blocking Send, control returns to the process even before the data
is copied out of the user buffer.
• For a non-blocking Receive, control returns to the process even before the data
may have arrived from the sender.
Synchronous
• A Send or a Receive primitive is synchronous if both the Send() and Receive()
handshake with each other.
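• The processing for the Send primitive completes only after the invoking
processor learns that the corresponding Receive primitive has also been invoked
and that the receive operation has been completed.
• The processing for the Receive primitive completes when the data to be
received is copied into the receiver’s user buffer.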
Asynchronous
• A Send primitive is said to be asynchronous, if control returns back to the
invoking process after the data item to be sent has been copied out of the user-
specified buffer.
• For non-blocking primitives, a return parameter on the primitive call returns a
system-generated handle which can be later used to check the status of
completion of the call.
• The process can check for completion in two ways:
o by checking whether the handle has been flagged or posted
o by issuing a Wait with a list of handles as parameters; the Wait usually
blocks until one of the parameter handles is posted.
The send and receive primitives can be implemented in four modes:
• Blocking synchronous
• Non-blocking synchronous
• Blocking asynchronous
• Non-blocking asynchronous

Four modes of send operation

Blocking synchronous Send:
• The data gets copied from the user buffer to the kernel buffer and is then sent over
the network.
• After the data is copied to the receiver’s system buffer and a Receive call has been
issued, an acknowledgement back to the sender causes control to return to the
process that invoked the Send operation and completes the Send.
Non-blocking synchronous Send:
• Control returns back to the invoking process as soon as the copy of data from the user
buffer to the kernel buffer is initiated.
• A parameter in the non-blocking call also gets set with the handle of a location that
the user process can later check for the completion of the synchronous send
operation.
• The location gets posted after an acknowledgement returns from the receiver.
• The user process can keep checking for the completion of the non-blocking
synchronous Send by testing the returned handle, or it can invoke the blocking Wait
operation on the returned handle
Blocking asynchronous Send:
• The user process that invokes the Send is blocked until the data is copied from the
user’s buffer to the kernel buffer.
Non-blocking asynchronous Send:
• The user process that invokes the Send is blocked until the transfer of the data from
the user’s buffer to the kernel buffer is initiated.
• Control returns to the user process as soon as this transfer is initiated, and a parameter
in the non-blocking call also gets set with the handle of a location that the user
process can check later using the Wait operation for the completion of the
asynchronous Send.
The asynchronous Send completes when the data has been copied out of the user’s
buffer. The checking for the completion may be necessary if the user wants to reuse the
buffer from which the data was sent.
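
The four Send modes can be compared in a small Python sketch. It is a simplified model under stated assumptions, not a real kernel interface: `Handle` and `Channel` are hypothetical names, a queue stands in for the kernel buffer and the network together, and a helper thread performs the transfer.

```python
import queue
import threading


class Handle:
    """Completion handle returned by Send (hypothetical)."""

    def __init__(self):
        self._posted = threading.Event()

    def post(self):
        self._posted.set()

    def test(self):
        # Non-blocking check: has the operation completed?
        return self._posted.is_set()

    def wait(self, timeout=None):
        # Blocking Wait on the handle.
        return self._posted.wait(timeout)


class Channel:
    def __init__(self):
        # Stand-in for the kernel buffer plus the network.
        self._kernel_buffer = queue.Queue()

    def send(self, user_buffer, *, blocking, synchronous):
        handle = Handle()

        def transfer():
            data = bytes(user_buffer)        # copy out of the user buffer
            ack = threading.Event()
            self._kernel_buffer.put((data, ack))
            if synchronous:
                ack.wait()                   # complete only after the receiver acks
            handle.post()                    # otherwise complete after the copy

        threading.Thread(target=transfer, daemon=True).start()
        if blocking:
            handle.wait()                    # caller is held until completion
        return handle

    def receive(self):
        data, ack = self._kernel_buffer.get()
        ack.set()                            # acknowledgement back to the sender
        return data
```

In this model the `synchronous` flag decides when the Send completes (after the acknowledgement, or after the copy out of the user buffer), while the `blocking` flag decides whether the caller waits for that completion or receives a handle to check later.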
Modes of receive operation
Blocking Receive:
The Receive call blocks until the data expected arrives and is written in the specified
user buffer. Then control is returned to the user process.
Non-blocking Receive:
• The Receive call will cause the kernel to register the call and return the handle
of a location that the user process can later check for the completion of the
non-blocking Receive operation.
• This location gets posted by the kernel after the expected data arrives and is
copied to the user-specified buffer. The user process can check for the
completion of the non-blocking Receive by invoking the Wait operation on the
returned handle.
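
The non-blocking Receive described above can be sketched as follows. The `Kernel` class, its method names, and the use of a Python list as the user buffer are assumptions made for illustration; `threading.Event` plays the role of the handle.

```python
import threading


class Kernel:
    """Hypothetical sketch of kernel-side bookkeeping for non-blocking Receive."""

    def __init__(self):
        self._lock = threading.Lock()
        self._arrived = []   # data that arrived before any Receive was issued
        self._pending = []   # registered Receive calls: (user_buffer, handle)

    def receive_nonblocking(self, user_buffer):
        """Register the call and return a handle immediately."""
        handle = threading.Event()
        with self._lock:
            if self._arrived:
                # Data was already buffered in the kernel: copy it and post.
                user_buffer.append(self._arrived.pop(0))
                handle.set()
            else:
                self._pending.append((user_buffer, handle))
        return handle

    def deliver(self, message):
        """Called when data arrives from the network."""
        with self._lock:
            if self._pending:
                user_buffer, handle = self._pending.pop(0)
                user_buffer.append(message)  # copy into the user buffer
                handle.set()                 # post the handle
            else:
                self._arrived.append(message)
```

A caller would invoke `handle.wait()` for the blocking Wait, or `handle.is_set()` to poll. The `_arrived` list also shows why buffering is usually needed on the receive side: data may arrive before the Receive is issued.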
Processor Synchrony

Processor synchrony indicates that all the processors execute in lock-step with their clocks
synchronized.

In practice, this is achieved through some form of barrier synchronization, which ensures
that no processor begins executing the next step of code until all the processors have
completed executing the previous steps of code assigned to each of the processors.
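
A minimal sketch of such a barrier, using Python threads as stand-ins for processors and the standard `threading.Barrier`. The function name `run_lockstep` and the shared log are hypothetical, added only to make the step ordering observable.

```python
import threading


def run_lockstep(num_processors, num_steps):
    """Run workers in steps; nobody starts step k+1 before all finish step k."""
    barrier = threading.Barrier(num_processors)
    log = []
    log_lock = threading.Lock()

    def worker(pid):
        for step in range(num_steps):
            with log_lock:
                log.append((step, pid))   # the "work" of this step
            barrier.wait()                # wait until every worker finished the step

    threads = [threading.Thread(target=worker, args=(pid,))
               for pid in range(num_processors)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Within a step the order of workers in the log is arbitrary, but every entry for step k appears before any entry for step k+1.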

Libraries and standards

There exists a wide range of primitives for message-passing. The message-passing interface
(MPI) library and the PVM (parallel virtual machine) library are used largely by the
scientific community.
• Message Passing Interface (MPI): This is a standardized and portable message-passing
system designed to function on a wide variety of parallel computers. MPI primarily
addresses the message-passing parallel programming model: data is moved from the
address space of one process to that of another process through cooperative
operations on each process.
• Parallel Virtual Machine (PVM): This is a software tool for parallel networking of
computers. It is designed to allow a network of heterogeneous Unix and/or Windows
machines to be used as a single distributed parallel processor.
• Remote Procedure Call (RPC): RPC is a common model of request-reply protocol and a
powerful technique for constructing distributed, client-server based applications. In
RPC, the procedure need not exist in the same address space as the calling procedure.
The two processes may be on the same system, or they may be on different systems
with a network connecting them.
• Remote Method Invocation (RMI): RMI lets a programmer write object-oriented
programs in which objects on different computers interact in a distributed network.
• Common Object Request Broker Architecture (CORBA): CORBA describes a
messaging mechanism by which objects distributed over a network can communicate with
each other irrespective of the platform and language used to develop those objects.
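
The request-reply structure of RPC can be illustrated with a small Python sketch. It does not use any real RPC library: `RpcServer` and `RpcClientStub` are hypothetical names, JSON is assumed as the wire format, and the "network" is just a function that hands request bytes to the server.

```python
import json


class RpcServer:
    """Hypothetical server side: unmarshals a request and calls the procedure."""

    def __init__(self):
        self._procedures = {}

    def register(self, name, func):
        self._procedures[name] = func

    def handle(self, request_bytes):
        request = json.loads(request_bytes)
        try:
            func = self._procedures[request["method"]]
            reply = {"result": func(*request["params"]), "error": None}
        except Exception as exc:
            reply = {"result": None, "error": str(exc)}
        return json.dumps(reply).encode()


class RpcClientStub:
    """Hypothetical client stub: marshals the call and waits for the reply."""

    def __init__(self, transport):
        # transport: function taking request bytes and returning reply bytes.
        self._transport = transport

    def call(self, method, *params):
        request = json.dumps({"method": method, "params": list(params)}).encode()
        reply = json.loads(self._transport(request))
        if reply["error"] is not None:
            raise RuntimeError(reply["error"])
        return reply["result"]
```

For example, after `server.register("add", lambda a, b: a + b)`, a stub built with `RpcClientStub(server.handle)` can run `stub.call("add", 2, 3)`. The caller sees an ordinary function call, while the procedure itself runs on the server side of the transport.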
1.6 SYNCHRONOUS VS ASYNCHRONOUS EXECUTIONS
The execution of processes in distributed systems may be synchronous or asynchronous.

Asynchronous Execution:
Communication among processes is considered asynchronous when every
communicating process can have a different observation of the order of the messages being
exchanged. In an asynchronous execution:
• there is no processor synchrony and there is no bound on the drift rate of processor
clocks
• message delays are finite but unbounded
• there is no upper bound on the time taken by a process to execute a step
Fig: Asynchronous execution in message passing system

Synchronous Execution:
Communication among processes is considered synchronous when every process
observes the same order of messages within the system. In a synchronous execution:
• processors are synchronized and the clock drift rate between any two processors is
bounded
• message delivery times are such that they occur in one logical step or round
• there is a known upper bound on the time taken by a process to execute a step.
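
The round structure of a synchronous execution can be sketched with a small simulation. The example task (every process learns the maximum value by exchanging values with neighbours) and the function name `run_synchronous_rounds` are assumptions chosen for illustration. The point is only that all messages sent in a round are delivered before the next round begins.

```python
def run_synchronous_rounds(neighbors, initial, rounds):
    """Simulate lock-step rounds: send, deliver everything, then compute."""
    state = dict(initial)
    for _ in range(rounds):
        # Send phase: every process sends its current value to each neighbour.
        inbox = {p: [] for p in state}
        for p, value in state.items():
            for q in neighbors[p]:
                inbox[q].append(value)
        # Compute phase: all messages of this round have been delivered.
        state = {p: max([state[p]] + inbox[p]) for p in state}
    return state


# Four processes connected in a line: 0 - 1 - 2 - 3
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
values = {0: 5, 1: 1, 2: 9, 3: 2}
print(run_synchronous_rounds(line, values, 1))  # {0: 5, 1: 9, 2: 9, 3: 9}
print(run_synchronous_rounds(line, values, 2))  # {0: 9, 1: 9, 2: 9, 3: 9}
```

In an asynchronous execution no such round boundary exists, so a process could not know when it has heard from all of its neighbours.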

Emulating an asynchronous system by a synchronous system (A → S)

An asynchronous program can be emulated on a synchronous system fairly trivially as the
synchronous system is a special case of an asynchronous system – all communication
finishes within the same round in which it is initiated.

Emulating a synchronous system by an asynchronous system (S → A)

A synchronous program can be emulated on an asynchronous system using a tool called a
synchronizer.

Emulation for a fault free system

Fig 1.15: Emulations in a failure free message passing system
If system A can be emulated by system B, denoted A/B, and if a problem is not solvable in
B, then it is also not solvable in A. Likewise, if a problem is solvable in A, it is also
solvable in B. Hence, in a sense, all four classes (synchronous and asynchronous,
message-passing and shared memory) are equivalent in terms of computability in
failure-free systems.

1.7 DESIGN ISSUES AND CHALLENGES IN DISTRIBUTED SYSTEMS

The design of distributed systems has numerous challenges. They can be categorized
into:
• Issues related to system and operating systems design
• Issues related to algorithm design
• Issues arising due to emerging technologies
The above three classes are not mutually exclusive.

1.7.1 Issues related to system and operating systems design

The following are some of the common challenges to be addressed in designing a
distributed system from system perspective:
➢ Communication: This task involves designing suitable communication mechanisms
among the various processes in the networks.
Examples: RPC, RMI

➢ Processes: The main challenges involved are: process and thread management at
both client and server environments, migration of code between systems, design of software
and mobile agents.
➢ Naming: Devising easy to use and robust schemes for names, identifiers, and
addresses is essential for locating resources and processes in a transparent and scalable
manner. The remote and highly varied geographical locations make this task difficult.
➢ Synchronization: Mutual exclusion, leader election, deploying physical clocks,
global state recording are some synchronization mechanisms.
➢ Data storage and access schemes: Designing file systems for easy and efficient data
storage with implicit access mechanisms is essential for distributed operation.
➢ Consistency and replication: The notion of distributed systems goes hand in hand
with replication of data, to provide a high degree of scalability. The replicas should be
handled with care since data consistency is a prime issue.

➢ Fault tolerance: This requires maintenance of fail proof links, nodes, and processes.
Some of the common fault tolerant techniques are resilience, reliable communication,
distributed commit, checkpointing and recovery, agreement and consensus, failure detection,
and self-stabilization.
➢ Security: Cryptography, secure channels, access control, key management
(generation and distribution), authorization, and secure group management are some of the
security measures imposed on distributed systems.
➢ Applications Programming Interface (API) and transparency: User
friendliness and ease of use are very important if distributed services are to be used by a
wide community. Transparency, which is hiding inner implementation policy from users, is
of the following types:

▪ Access transparency: hides differences in data representation.
▪ Location transparency: hides differences in locations by providing uniform access to
data located at remote locations.
▪ Migration transparency: allows relocating resources without changing names.
▪ Replication transparency: makes the user unaware of whether they are working on
original or replicated data.
▪ Concurrency transparency: masks the concurrent use of shared resources from the
user.
▪ Failure transparency: hides failures, so that the system appears reliable and
fault-tolerant.
➢ Scalability and modularity: The algorithms, data and services must be as distributed
as possible. Various techniques such as replication, caching and cache management, and
asynchronous processing help to achieve scalability.
1.7.2 Algorithmic challenges in distributed computing
➢ Designing useful execution models and frameworks
The interleaving model, partial order model, input/output automata model and the Temporal
Logic of Actions (TLA) are some examples of models that provide different degrees of
infrastructure for reasoning formally about distributed programs and proving them correct.
➢ Dynamic distributed graph algorithms and distributed routing algorithms
• The distributed system is generally modeled as a distributed graph.
• Hence graph algorithms are the base for a large number of higher level
communication, data dissemination, object location, and object search functions.
• These algorithms must have the capacity to deal with highly dynamic graph
characteristics. They are expected to function like routing algorithms.
• The performance of these algorithms has direct impact on user-perceived latency, data
traffic and load in the network.
➢ Time and global state in a distributed system

• Geographically remote resources demand synchronization based on logical time.
• Logical time is relative and eliminates the overheads of providing physical time for
applications. Logical time can
(i) capture the logic and inter-process dependencies
(ii) track the relative progress at each process
• Maintaining the global state of the system across space involves the role of the time
dimension for consistency. This can be done with extra effort in a coordinated manner.
• Deriving appropriate measures of concurrency also involves the time dimension, as
the execution and communication speed of threads may vary a lot.
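
One well-known way to realize logical time is a Lamport scalar clock, which is not described in this section and is shown here only as an illustrative sketch. The class and method names are hypothetical; the rules are the usual ones (increment before each local or send event, and on receipt take the maximum of the local and received values, then increment).

```python
class LamportClock:
    """Scalar logical clock kept by one process (illustrative sketch)."""

    def __init__(self):
        self.time = 0

    def local_event(self):
        # Every local event advances the clock.
        self.time += 1
        return self.time

    def send(self):
        # A send is an event; its timestamp travels with the message.
        return self.local_event()

    def receive(self, message_timestamp):
        # Move past both the local history and the sender's history.
        self.time = max(self.time, message_timestamp) + 1
        return self.time


p, q = LamportClock(), LamportClock()
p.local_event()          # p's clock becomes 1
stamp = p.send()         # p's clock becomes 2, message carries 2
print(q.receive(stamp))  # 3: the receive is ordered after the send
```

The timestamps capture the dependency between the send and the receive without any reference to physical time.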
➢ Synchronization/coordination mechanisms
• Synchronization is essential for the distributed processes to facilitate concurrent
execution without affecting other processes.

• The synchronization mechanisms also involve resource management and
concurrency management mechanisms.
• Some techniques for providing synchronization are:
✓ Physical clock synchronization: Physical clocks usually diverge in their values due
to hardware limitations. Keeping them synchronized is a fundamental challenge to maintain
common time.
✓ Leader election: All the processes need to agree on which process will play the
role of a distinguished process or a leader process. A leader is necessary for many
distributed algorithms because there is often some asymmetry.
✓ Mutual exclusion: Access to the critical resource(s) has to be coordinated.

✓ Deadlock detection and resolution: Deadlock detection should be coordinated to
avoid duplicate work, and deadlock resolution should be coordinated to avoid unnecessary
aborts of processes.
✓ Termination detection: cooperation among the processes to detect the specific global
state of quiescence.
✓ Garbage collection: Detecting garbage requires coordination among the processes.
➢ Group communication, multicast, and ordered message delivery
• A group is a collection of processes that share a common context and collaborate on a
common task within an application domain. Group management protocols are needed for
group communication wherein processes can join and leave groups dynamically, or fail.
➢ Monitoring distributed events and predicates
• Predicates defined on program variables that are local to different processes are used
for specifying conditions on the global system state.
• On-line algorithms for monitoring such predicates are hence important.
• The specification of such predicates uses physical or logical time relationships.
➢ Distributed program design and verification tools
Methodically designed and verifiably correct programs can greatly reduce the overhead of
software design, debugging, and engineering. Designing these is a big challenge.
➢ Debugging distributed programs
Debugging distributed programs is much harder because of concurrency and replication.
Adequate debugging mechanisms and tools are needed.
➢ Data replication, consistency models, and caching
• Fast access to data and other resources is important in distributed systems.
Managing replicas and their updates faces concurrency problems.
• Placement of the replicas in the systems is also a challenge because resources
usually cannot be freely replicated.
➢ World Wide Web design – caching, searching, scheduling
• The WWW is a commonly known distributed system.
• Object replication, caching, and prefetching of objects also have to be handled on
the WWW.
• Object search and navigation on the web are important functions in the operation of
the web.
➢ Distributed shared memory abstraction
• A shared memory abstraction is easier to program against, since the programmer does
not have to manage the communication tasks.
• The communication is done by the middleware through message passing.
• The overhead of shared memory has to be dealt with by the middleware technology.
• Some of the methodologies that handle communication in shared memory
distributed systems are:
✓ Wait-free algorithms: Wait-freedom is the ability of a process to complete its
execution irrespective of the actions of other processes. Wait-free algorithms control the
access to shared resources in the shared memory abstraction. They are expensive.
✓ Mutual exclusion: Concurrent access of processes to a shared resource or data is
executed in a mutually exclusive manner. Only one process is allowed to execute the critical
section at any given time. In a distributed system, shared variables or a local kernel cannot
be used to implement mutual exclusion. Message passing is the sole means for implementing
distributed mutual exclusion.
✓ Register constructions: Architectures must be designed in such a way that
registers allow concurrent access without any restrictions on the concurrency permitted.
➢ Reliable and fault-tolerant distributed systems
The following are some of the fault tolerant strategies:
✓ Consensus algorithms: Consensus algorithms allow correctly functioning processes
to reach agreement among themselves in spite of the existence of malicious processes. The
goal of the malicious processes is to prevent the correctly functioning processes from
reaching agreement. The malicious processes operate by sending messages with misleading
information, to confuse the correctly functioning processes.
✓ Replication and replica management: The Triple Modular Redundancy (TMR)
technique is used in software and hardware implementation. TMR is a fault-tolerant form of
N-modular redundancy, in which three systems perform a process and that result is
processed by a majority-voting system to produce a single output.
✓ Voting and quorum systems: Providing redundancy in the active or passive
components in the system and then performing voting based on some quorum criterion is a
classical way of dealing with fault-tolerance. Designing efficient algorithms for this
purpose is the challenge.
✓ Distributed databases and distributed commit: The distributed databases should
also follow atomicity, consistency, isolation and durability (ACID) properties.
✓ Self-stabilizing systems: A self-stabilizing algorithm guarantees to take the system to
a good state even if a bad state were to arise due to some error. Self-stabilizing algorithms
require some in-built redundancy to track additional variables of the state and do extra work.
✓ Checkpointing and recovery algorithms: Checkpointing is periodically recording
the current state on secondary storage so that, in case of a failure, the entire computation is
not lost but can be recovered from one of the recently taken checkpoints. Checkpointing in
a distributed environment is difficult because if the checkpoints at the different processes are
not coordinated, the local checkpoints may become useless because they are inconsistent with
the checkpoints at other processes.
✓ Failure detectors: Asynchronous distributed systems do not have a bound on the message
transmission time. This makes it very difficult to tell whether a process has failed, since the
receiver does not know how long to wait. Failure detectors probabilistically suspect another
process as having failed and then converge on a determination of the up/down status of the
suspected process.
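
The majority-voting step of Triple Modular Redundancy, mentioned in the list above, can be sketched in a few lines of Python. The function name `tmr_vote` and the choice to raise an error when all three outputs disagree are assumptions made for this illustration.

```python
from collections import Counter


def tmr_vote(outputs):
    """Return the value produced by at least two of the three replicas."""
    if len(outputs) != 3:
        raise ValueError("TMR expects exactly three replica outputs")
    value, count = Counter(outputs).most_common(1)[0]
    if count < 2:
        # Assumption: with three different answers there is no majority to trust.
        raise RuntimeError("no majority among replica outputs")
    return value


print(tmr_vote([42, 42, 17]))  # 42: the single faulty replica is outvoted
```

With three replicas, the voter masks any single faulty output; two simultaneous faults that agree with each other would defeat it.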
➢ Load balancing
The objective of load balancing is to gain higher throughput and reduce the user
perceived latency. Load balancing may be necessary because of a variety of factors, such
as high network traffic or a high request rate causing the network connection to be a
bottleneck, or high computational load. The following are some forms of load balancing:
✓ Data migration: The ability to move data around in the system, based on the access
pattern of the users.
✓ Computation migration: The ability to relocate processes in order to perform
a redistribution of the workload.
✓ Distributed scheduling: This achieves a better turnaround time for the users by
using idle processing power in the system more efficiently.
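
As one simple illustration of using idle processing power, the sketch below assigns each incoming task to the node with the least accumulated load. This greedy policy, the function name `assign_tasks`, and the assumption that task costs are known in advance are all choices made for the example, not a method prescribed by these notes.

```python
import heapq


def assign_tasks(task_costs, num_nodes):
    """Greedy scheduling: each task goes to the currently least-loaded node."""
    # Heap of (current load, node id); the least-loaded node is always on top.
    heap = [(0, node) for node in range(num_nodes)]
    assignment = {node: [] for node in range(num_nodes)}
    for task, cost in enumerate(task_costs):
        load, node = heapq.heappop(heap)
        assignment[node].append(task)
        heapq.heappush(heap, (load + cost, node))
    return assignment


print(assign_tasks([5, 3, 2, 4], 2))  # {0: [0, 3], 1: [1, 2]}
```

In a real distributed scheduler the load information itself is distributed and possibly stale, which is what makes the problem harder than this centralized sketch suggests.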
➢ Real-time scheduling
Real-time scheduling becomes more challenging when a global view of the system state is
absent and on-line or dynamic changes are more frequent. Message propagation delays,
which are network-dependent, are hard to control or predict. This is a hindrance to meeting
the QoS requirements of the network.

➢ Performance
User perceived latency in distributed systems must be reduced. The common issues in
performance are:
✓ Metrics: Appropriate metrics must be defined for measuring the performance of
theoretical distributed algorithms and their implementations.
✓ Measurement methods/tools: Since the distributed system is a complex entity,
appropriate methodology and tools must be developed for measuring the performance
metrics.
1.7.3 Applications of distributed computing and newer challenges
The deployment environment of distributed systems ranges from mobile systems to
cloud storage. All the environments have their own challenges:
➢ Mobile systems
o Mobile systems which use wireless communication in shared broadcast
medium have issues related to physical layer such as transmission range,
power, battery power consumption, interfacing with wired internet, signal
processing and interference.
o The issues pertaining to other higher layers include routing, location
management, channel allocation, localization and position estimation, and
mobility management.
o Apart from the above mentioned common challenges, the architectural
differences of mobile networks demand varied treatment. The two
architectures are:
✓ Base-station approach (cellular approach): The geographical region is divided into
hexagonal physical locations called cells. The powerful base station transmits signals to all
other nodes in its range.
✓ Ad-hoc network approach: This is an infrastructure-less approach which does not
have any base station to transmit signals. Instead, all the responsibility is distributed among
the mobile nodes.
It is evident that the two approaches work in different environments with different
principles of communication. Designing a distributed system to cater to these varied needs is a
great challenge.

➢ Sensor networks
o A sensor is a processor with an electro-mechanical interface that is capable of
sensing physical parameters.
o They are low-cost devices with limited computational power and battery
life. They are designed to handle streaming data and route it to external
computer networks and processes.
o They are susceptible to faults and have to reconfigure themselves.
o These features introduce a whole new set of challenges, such as position
estimation and time estimation, when designing a distributed system.
➢ Ubiquitous or pervasive computing
o In ubiquitous systems, the processors are embedded in the environment to
perform application functions in the background.
o Examples: intelligent devices, smart homes, etc.
o They are distributed systems that, owing to recent advancements, operate in wireless
environments through actuator mechanisms.
o They can be self-organizing and network-centric, with limited resources.
➢ Peer-to-peer computing
o Peer-to-peer (P2P) computing is computing over an application-layer
network where all interactions among the processors are at the same level.
o This is a form of symmetric computation, in contrast to the client-server paradigm.
o P2P networks are self-organizing, with or without a regular structure to the network.
o Some of the key challenges include: object storage mechanisms; efficient object lookup and retrieval in a
scalable manner; dynamic reconfiguration with nodes as well as objects joining and leaving the network
randomly; replication strategies to expedite object search; tradeoffs between object size, latency and table
sizes; anonymity, privacy, and security.
➢ Publish-subscribe, content distribution, and multimedia
o Present-day users require only the information of interest.
o In a dynamic environment where the information constantly fluctuates, there
is great demand for:
✓ Publish: an efficient mechanism for distributing this information
✓ Subscribe: an efficient mechanism to allow end users to indicate interest in
receiving specific kinds of information
✓ An efficient mechanism for aggregating large volumes of published
information and filtering it as per the user’s subscription filter.
o Content distribution refers to a mechanism that categorizes the information
based on parameters.
o Publish-subscribe and content distribution overlap each other.
o Multimedia data introduces special issues because of its large size.
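The publish, subscribe and filter mechanisms described above can be sketched in a few lines of Python. This is only an illustrative, single-machine sketch: the `Broker` class, its method names and the sample stock data are hypothetical and are not taken from any particular publish-subscribe system.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


class Broker:
    """Hypothetical in-memory broker that matches published items to subscriptions."""

    def __init__(self) -> None:
        # topic -> list of (subscriber name, subscription filter)
        self._subs: Dict[str, List[Tuple[str, Callable[[dict], bool]]]] = defaultdict(list)
        # subscriber name -> items delivered so far
        self.inbox: Dict[str, List[dict]] = defaultdict(list)

    def subscribe(self, who: str, topic: str,
                  accept: Callable[[dict], bool] = lambda item: True) -> None:
        """Subscribe: a user indicates interest in a topic, optionally with a filter."""
        self._subs[topic].append((who, accept))

    def publish(self, topic: str, item: dict) -> int:
        """Publish: distribute an item to every subscriber whose filter accepts it."""
        delivered = 0
        for who, accept in self._subs[topic]:
            if accept(item):
                self.inbox[who].append(item)
                delivered += 1
        return delivered


broker = Broker()
broker.subscribe("alice", "stocks", lambda item: item["price"] > 100)  # filtered
broker.subscribe("bob", "stocks")                                      # unfiltered
broker.publish("stocks", {"symbol": "X", "price": 90})
broker.publish("stocks", {"symbol": "Y", "price": 150})
```

Here alice receives only the item that passes her subscription filter, while bob receives both items. A real system would distribute the broker itself, which is where the scalability challenges arise.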
➢ Distributed agents
o Agents are software processes, or sometimes robots, that move around the
system to do the specific tasks for which they are programmed.
o Agents collect and process information and can exchange such
information with other agents.
o Challenges in distributed agent systems include coordination mechanisms
among the agents, controlling the mobility of the agents, and their software design
and interfaces.
➢ Distributed data mining
o Data mining algorithms process large amounts of data to detect patterns and
trends in the data, in order to mine or extract useful information.
o The mining can be done by applying database and artificial intelligence
techniques to a data repository.
➢ Grid computing
o Grid computing is deployed to manage resources. For instance, idle CPU
cycles of machines connected to the network will be available to others.
o The challenges include: scheduling jobs, a framework for implementing quality
of service, real-time guarantees, and security.
➢ Security in distributed systems
The challenges of security in a distributed setting include confidentiality,
authentication and availability. These need to be addressed using efficient and scalable solutions.

1.8 A MODEL OF DISTRIBUTED COMPUTATIONS: DISTRIBUTED PROGRAM

• A distributed program is composed of a set of asynchronous processes that
communicate by message passing over the communication network. Each process
may run on a different processor.
• The processes do not share a global memory and communicate solely by passing
messages. They also do not share a global clock that is instantaneously
accessible to all of them.
• Process execution and message transfer are asynchronous: a process may execute an
action spontaneously, and a process sending a message does not wait for the delivery
of the message to be complete.
• The global state of a distributed computation is composed of the states of the
processes and the communication channels. The state of a process is characterized by
the state of its local memory and depends upon the context.
• The state of a channel is characterized by the set of messages in transit in the channel.
A MODEL OF DISTRIBUTED EXECUTIONS

• The execution of a process consists of a sequential execution of its actions.
• The actions are atomic, and the actions of a process are modeled as three types of
events: internal events, message send events, and message receive events.
• The execution of process pi produces a sequence of events e1, e2, e3, …, and it is
denoted by Hi, where Hi = (hi, →i). Here hi is the set of events produced by pi and →i is the
linear order that expresses the causal dependencies among the events of pi.
• →msg indicates the dependency that exists due to message passing between two events.

Fig Space time distribution of distributed systems

• An internal event changes the state of the process at which it occurs.
• A send event changes the state of the process that sends the message and the state of
the channel on which the message is sent.
• A receive event changes the state of the process that receives the message and the
state of the channel on which the message is received.
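The effect of the three event types on process and channel state can be illustrated with a small Python sketch. The `Process` and `Channel` classes, the integer used as a stand-in for local memory, and the FIFO removal order are all assumptions made for illustration.

```python
from collections import deque


class Channel:
    """The state of a channel is the collection of messages currently in transit."""

    def __init__(self):
        self.in_transit = deque()


class Process:
    def __init__(self, name):
        self.name = name
        self.state = 0      # hypothetical stand-in for the local memory
        self.history = []   # the sequence of events produced by this process

    def internal(self):
        # An internal event changes only the state of this process.
        self.state += 1
        self.history.append(("internal", None))

    def send(self, channel, msg):
        # A send event changes the sender's state and the channel's state.
        self.state += 1
        channel.in_transit.append(msg)
        self.history.append(("send", msg))

    def receive(self, channel):
        # A receive event changes the receiver's state and the channel's state.
        msg = channel.in_transit.popleft()  # FIFO order assumed for simplicity
        self.state += 1
        self.history.append(("receive", msg))
        return msg


p1, p2 = Process("p1"), Process("p2")
c12 = Channel()                       # channel from p1 to p2
p1.internal()
p1.send(c12, "m1")
in_flight = list(c12.in_transit)      # m1 is in transit at this point
p2.receive(c12)                       # the channel is empty again afterwards
```

Each process only appends to its own history, which mirrors the fact that the events of a single process are totally ordered while events of different processes are related only through messages.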
Causal Precedence Relations
Causal message ordering is a partial ordering of messages in a distributed computing
environment. It requires that messages be delivered to a process in the order in which they were
transmitted to that process.

It places a restriction on communication between processes by requiring that if the
transmission of message mi to process pk necessarily preceded the transmission of message
mj to the same process, then the delivery of these messages to that process must be ordered
such that mi is delivered before mj.
Happened-Before Relation
The partial ordering obtained by generalizing the relationship between events of two processes is called the
happened-before relation, causal ordering or potential causal ordering. This term
was coined by Lamport. Happened-before defines a partial order of events in a distributed
system; some pairs of events cannot be placed in this order. We write A → B if A happens before B. A → B
is defined using the following rules:
✓ Local ordering: A and B occur on the same process and A occurs before B.
✓ Messages: send(m) → receive(m) for any message m.
✓ Transitivity: e → e’’ if e → e’ and e’ → e’’.
• Ordering can be based on two situations:
1. If two events occur in the same process, then they occurred in the order observed.
2. During message passing, the event of sending a message occurred before the event of
receiving it.

Lamport’s ordering is the happened-before relation, denoted by →:

• a→b, if a and b are events in the same process and a occurred before b.
• a→b, if a is the event of sending a message m in a process and b is the event of the
same message m being received by another process.
• If a→b and b→c, then a→c. Lamport’s relation is transitive.

When a→b follows from the above rules, a and b are said to be causally
related. For two events c and d, if neither c→d nor d→c holds (i.e., they are not causally
related), then c and d are said to be concurrent events, denoted as c||d.

Fig Communication between processes

Fig 1.22 shows the communication of messages m1 and m2 between three processes p1, p2
and p3, where a, b, c, d, e and f are events. It can be inferred from the diagram that a→b; c→d;
e→f; b→c; d→f; a→d; a→f; b→d; b→f. Also, a||e and c||e.
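The relations listed for the figure can be reproduced mechanically. The sketch below assumes a layout that is consistent with those relations, since the figure itself is not reproduced here: events a, b on p1, events c, d on p2, events e, f on p3, with m1 sent at b and received at c, and m2 sent at d and received at f. The function names are hypothetical.

```python
from itertools import product

# Assumed layout of the figure (not taken from the figure itself).
processes = {"p1": ["a", "b"], "p2": ["c", "d"], "p3": ["e", "f"]}
messages = [("b", "c"), ("d", "f")]   # (send event, receive event) for m1, m2


def happened_before(processes, messages):
    """Return the set of pairs (x, y) such that x -> y."""
    events = [e for seq in processes.values() for e in seq]
    hb = set(messages)                      # rule 2: send(m) -> receive(m)
    for seq in processes.values():          # rule 1: local ordering
        for i in range(len(seq)):
            for j in range(i + 1, len(seq)):
                hb.add((seq[i], seq[j]))
    # Rule 3: transitivity, computed as a Warshall-style closure.
    # product() varies its first component slowest, so k is the outer loop.
    for k, i, j in product(events, repeat=3):
        if (i, k) in hb and (k, j) in hb:
            hb.add((i, j))
    return hb


def concurrent(x, y, hb):
    """x || y holds when neither x -> y nor y -> x."""
    return x != y and (x, y) not in hb and (y, x) not in hb


hb = happened_before(processes, messages)
```

Under this assumed layout, `("a", "f") in hb` holds by transitivity through b, c and d, while `concurrent("a", "e", hb)` and `concurrent("c", "e", hb)` hold because no chain of local steps and messages connects those events.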

Logical vs physical concurrency

Physical and logical concurrency are two notions that are often confused in
distributed systems.
Physical concurrency: Several program units from the same program execute
simultaneously on multiple processors, which provide actual concurrency.
Logical concurrency: Several program units appear to execute simultaneously, while the actual
execution of the program takes place in interleaved fashion on a single processor.

Differences between logical and physical concurrency

Logical concurrency                          | Physical concurrency
---------------------------------------------+----------------------------------------------
Several units of the same program appear to  | Several program units of the same program
execute simultaneously on the same           | execute at the same time on different
processor, giving the programmer the         | processors.
illusion that they are executing on          |
multiple processors.                         |
---------------------------------------------+----------------------------------------------
They are implemented through interleaving.   | They are implemented on a uni-processor with
                                             | I/O channels, multiple CPUs, or a network of
                                             | uni- or multi-CPU machines.
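Logical concurrency through interleaving can be illustrated with a toy round-robin scheduler. In this sketch, each program unit is modeled as a Python generator and the single processor as one loop that advances one unit at a time. The names `unit` and `run_interleaved` are hypothetical.

```python
def unit(name, steps):
    """A program unit that performs `steps` actions, yielding after each one."""
    for i in range(steps):
        yield f"{name}{i}"


def run_interleaved(units):
    """Run all units on one 'processor' by taking one step from each in turn."""
    trace = []
    pending = list(units)
    while pending:
        for u in list(pending):      # iterate over a copy so removal is safe
            try:
                trace.append(next(u))
            except StopIteration:
                pending.remove(u)    # this unit has finished
    return trace


trace = run_interleaved([unit("A", 2), unit("B", 3)])
# trace == ["A0", "B0", "A1", "B1", "B2"]
```

Only one step runs at any instant, yet both units make progress, which is the illusion of simultaneity described above. Physical concurrency would need a second processor to execute steps of A and B at the same time.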
MODELS OF COMMUNICATION NETWORK
The three main types of communication models in distributed systems are:
FIFO (first-in, first-out): each channel acts as a FIFO message queue, so message ordering is preserved by the channel.
Non-FIFO (N-FIFO): a channel acts like a set to which the sender process adds messages and from which the
receiver process removes messages in random order.
Causal Ordering (CO): based on Lamport’s happened-before relation.
o The relation between the three models is given by CO ⊂ FIFO ⊂ N-FIFO.
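The difference between a FIFO and a non-FIFO channel can be sketched as follows. The class names are hypothetical, and the seeded random generator is used only to make the non-FIFO behaviour repeatable.

```python
import random
from collections import deque


class FifoChannel:
    """Channel that behaves as a first-in, first-out message queue."""

    def __init__(self):
        self._q = deque()

    def send(self, msg):
        self._q.append(msg)

    def receive(self):
        return self._q.popleft()     # oldest message is delivered first


class NonFifoChannel:
    """Channel that behaves like a set: messages are removed in random order."""

    def __init__(self, seed=0):
        self._msgs = []
        self._rng = random.Random(seed)

    def send(self, msg):
        self._msgs.append(msg)

    def receive(self):
        i = self._rng.randrange(len(self._msgs))
        return self._msgs.pop(i)     # any message in transit may be delivered


fifo, nfifo = FifoChannel(), NonFifoChannel()
for m in ["m1", "m2", "m3"]:
    fifo.send(m)
    nfifo.send(m)

fifo_order = [fifo.receive() for _ in range(3)]     # always m1, m2, m3
nfifo_order = [nfifo.receive() for _ in range(3)]   # some permutation of them
```

Both channels deliver every message exactly once. Only the FIFO channel guarantees the delivery order.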

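A system that supports the causal ordering model satisfies the following property: for any two messages
mij and mkj sent to the same process pj, if send(mij) → send(mkj), then rec(mij) → rec(mkj).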

GLOBAL STATE

A distributed snapshot represents a state in which the distributed system might have been. A snapshot
of the system is a single configuration of the system.

• The global state of a distributed system is a collection of the local states of its components, namely,
the processes and the communication channels.
• The state of a process at any time is defined by the contents of
processor registers, stacks, local memory, etc. and depends on the local context of the distributed
application.
• The state of a channel is given by the set of messages in transit in the channel.
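As a rough sketch, a global state can be assembled from the local states together with, for each channel, the messages that have been sent but not yet received. The function `global_state`, the dictionary layout and the assumption that message identifiers are unique are illustrative choices, not part of the model itself.

```python
def global_state(local_states, sent, received):
    """Combine process states with channel states (messages still in transit).

    local_states: process name -> local state
    sent / received: (sender, receiver) -> list of message ids, assumed unique
    """
    channels = {}
    for ch, msgs in sent.items():
        got = received.get(ch, [])
        # In transit = sent on this channel but not yet received.
        channels[ch] = [m for m in msgs if m not in got]
    return {"processes": dict(local_states), "channels": channels}


local_states = {"p1": {"x": 5}, "p2": {"y": 7}}
sent = {("p1", "p2"): ["m1", "m2"], ("p2", "p1"): ["m3"]}
received = {("p1", "p2"): ["m1"]}

gs = global_state(local_states, sent, received)
# gs["channels"] == {("p1", "p2"): ["m2"], ("p2", "p1"): ["m3"]}
```

In a real system no single observer has instantaneous access to all of these inputs, which is why recording a global state requires a dedicated snapshot algorithm.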
UNIT II

LOGICAL TIME & GLOBAL STATE

Logical clocks are based on capturing chronological and causal relationships of processes and
ordering events based on these relationships.

Three types of logical clock are maintained in distributed systems:

• Scalar clock
• Vector clock
• Matrix clock

In a system of logical clocks, every process has a logical clock that is advanced using a set
of rules. Every event is assigned a timestamp and the causality relation between events can
be generally inferred from their timestamps.
The timestamps assigned to events obey the fundamental monotonicity property; that is, if
an event a causally affects an event b, then the timestamp of a is smaller than the timestamp
of b.
A Framework for a system of logical clocks

A system of logical clocks consists of a time domain T and a logical clock C. Elements of T form a
partially ordered set over a relation <. This relation is usually called happened before or
causal precedence.

The logical clock C is a function that maps an event e in a distributed system to an element
in the time domain T, denoted as C(e),
such that for any two events ei and ej,
ei → ej ⟹ C(ei) < C(ej).
This monotonicity property is called the clock consistency condition. When T and C
satisfy the following condition,
ei → ej ⟺ C(ei) < C(ej),
then the system of clocks is strongly consistent.
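The difference between consistency and strong consistency can be seen with a scalar clock. The sketch below assumes Lamport's usual update rules with an increment of 1: tick before every event, attach the clock value to each message, and on receipt take the maximum of the local and received values before ticking. The class name `ScalarClock` is hypothetical.

```python
class ScalarClock:
    """Scalar logical clock of one process (Lamport's rules, increment of 1)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Internal event: advance the local clock.
        self.time += 1
        return self.time

    def send(self):
        # Send event: advance the clock; the result is piggybacked on the message.
        return self.tick()

    def receive(self, ts):
        # Receive event: catch up with the sender's timestamp, then advance.
        self.time = max(self.time, ts) + 1
        return self.time


c1, c2 = ScalarClock(), ScalarClock()
a = c1.tick()        # internal event on p1, timestamp 1
b = c1.send()        # send event on p1, timestamp 2
e = c2.tick()        # internal event on p2, timestamp 1, concurrent with a and b
c = c2.receive(b)    # receive event on p2, timestamp max(1, 2) + 1 = 3
```

Here a → b → c and the timestamps 1 < 2 < 3 respect the consistency condition. However, C(e) = 1 < C(b) = 2 even though e and b are concurrent, so a smaller timestamp does not imply causal precedence: scalar clocks are consistent but not strongly consistent.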

Implementing logical clocks

The two major issues in implementing logical clocks are:
Data structures: data structures local to each process that represent logical time
Protocols: rules for updating the data structures to ensure the consistency condition.

Data structures:
Each process pi maintains data structures with the following capabilities:
• A local logical clock (lci), which helps process pi measure its own progress.
• A logical global clock (gci), which is a representation of process pi’s local view of the
logical global time. It allows this process to assign consistent timestamps to its local events.
