Distributed Computing # Introduction: Module-2
# Introduction
Usually causality is tracked using physical time. However, in distributed systems, it is not possible to
have global physical time; it is possible to realize only an approximation of it. The knowledge of the causal
precedence relation among the events of processes helps solve a variety of problems in distributed systems.
Some examples of these problems are as follows:
Distributed algorithms design: The knowledge of the causal precedence relation among events helps ensure
liveness and fairness in mutual exclusion algorithms, helps maintain consistency in replicated databases, and
helps design correct deadlock detection algorithms to avoid phantom and undetected deadlocks.
Tracking of dependent events: In distributed debugging, the knowledge of the causal dependency among
events helps construct a consistent state for resuming reexecution; in failure recovery, it helps build a
checkpoint; in replicated databases, it aids in the detection of file inconsistencies in case of a network
partitioning.
Knowledge about the progress: The knowledge of the causal dependency among events helps measure the
progress of processes in the distributed computation. This is useful in discarding obsolete information,
garbage collection, and termination detection.
Concurrency measure: The knowledge of how many events are causally dependent is useful in measuring
the amount of concurrency in a computation. All events that are not causally related can be executed
concurrently. Thus, an analysis of the causality in a computation gives an idea of the concurrency in the
program.
In a system of logical clocks, every process has a logical clock that is advanced using a set of rules. Every
event is assigned a timestamp and the causality relation between events can be generally inferred from their
timestamps. The timestamps assigned to events obey the fundamental monotonicity property; that is, if an event
a causally affects an event b, then the timestamp of a is smaller than the timestamp of b.
# Scalar Time
• Proposed by Lamport in 1978 as an attempt to totally order events in a distributed system.
• Time domain is the set of non-negative integers.
• The logical local clock of a process pi and its local view of the global time are squashed into one
integer variable Ci .
• Rules R1 and R2 to update the clocks are as follows:
• R1: Before executing an event (send, receive, or internal), process pi executes the following: Ci := Ci + d (d > 0)
• In general, every time R1 is executed, d can have a different value; however, typically d is kept at 1.
• R2: Each message piggybacks the clock value of its sender at sending time. When a process pi
receives a message with timestamp Cmsg, it executes the following actions:
  1. Ci := max(Ci, Cmsg)
  2. Execute R1.
  3. Deliver the message.
The figure shows the evolution of scalar time.
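Rules R1 and R2 above can be sketched in a few lines. This is a minimal illustration; the class and method names are assumptions for the sketch, not part of Lamport's formulation.

```python
# A minimal sketch of Lamport's scalar clock (rules R1 and R2, d = 1).

class ScalarClock:
    def __init__(self):
        self.c = 0  # integer clock Ci, initially 0

    def tick(self, d=1):
        # R1: advance the clock before every event (d > 0, typically 1)
        self.c += d
        return self.c

    def send(self):
        # A send event executes R1; the message piggybacks Ci
        return self.tick()

    def receive(self, c_msg):
        # R2: Ci := max(Ci, Cmsg), then execute R1, then deliver
        self.c = max(self.c, c_msg)
        return self.tick()

p1, p2 = ScalarClock(), ScalarClock()
t = p1.send()           # p1's clock becomes 1; the message carries 1
p2.tick(); p2.tick()    # two internal events at p2: clock = 2
t2 = p2.receive(t)      # max(2, 1) + 1 = 3
```

The receive rule is what enforces monotonicity: the receiver's clock jumps past the sender's timestamp before the receive event is stamped.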
# Vector Time
• The system of vector clocks was developed independently by Fidge, Mattern and Schmuck.
• In the system of vector clocks, the time domain is represented by a set of n-dimensional non-negative
integer vectors.
• Each process pi maintains a vector vti[1..n], where vti[i] is the local logical clock of pi and describes
the logical time progress at process pi. vti[j] represents process pi's latest knowledge of process pj's
local time.
• If vti[j] = x, then process pi knows that the local time at process pj has progressed up to x.
• The entire vector vti constitutes pi ’s view of the global logical time and is used to timestamp events.
• Process pi uses the following two rules R1 and R2 to update its clock: R1: Before executing an event,
process pi updates its local logical time as follows: vti [i] := vti [i] + d ; (d > 0)
• R2: Each message m is piggybacked with the vector clock vt of the sender process at sending time. On
the receipt of such a message (m,vt), process pi executes the following sequence of actions:
1. Update its global logical time as follows: 1 ≤ k ≤ n : vti[k] := max(vti[k], vt[k])
2. Execute R1.
3. Deliver the message m
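The vector clock rules R1 and R2 can be sketched as follows; 0-based indexing and the class name are conveniences of the sketch, not part of the original formulation.

```python
# A minimal sketch of vector clocks for n processes (rules R1 and R2, d = 1).

class VectorClock:
    def __init__(self, i, n):
        self.i = i            # this process's index (0-based here)
        self.vt = [0] * n     # vector vti, initially all zeros

    def tick(self, d=1):
        # R1: vti[i] := vti[i] + d before every event
        self.vt[self.i] += d

    def send(self):
        # A send event executes R1; the whole vector is piggybacked
        self.tick()
        return list(self.vt)

    def receive(self, vt_msg):
        # R2: componentwise max, then execute R1, then deliver
        self.vt = [max(a, b) for a, b in zip(self.vt, vt_msg)]
        self.tick()

p0, p1 = VectorClock(0, 2), VectorClock(1, 2)
m = p0.send()      # p0's vector becomes [1, 0]; the message carries it
p1.tick()          # internal event at p1: [0, 1]
p1.receive(m)      # componentwise max gives [1, 1], then R1 -> [1, 2]
```

After the receive, p1's entry for p0 reflects p0's progress at the time of the send, which is exactly the "latest knowledge" described above.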
# Election Algorithm
• To perform coordination, distributed systems employ the concept of a coordinator.
• An algorithm for choosing a unique process to play a particular role (the coordinator) is called an election
algorithm.
• Election algorithms assume that every active process in the system has a unique priority number.
• The process with the highest priority number is chosen as the coordinator.
• When a coordinator fails, the algorithm elects the active process with the highest priority number.
• This number is then sent to every active process in the distributed system.
Bully Algorithm
There are 3 types of messages in bully algorithm
Election message – announces an election
Ok message – response to an election message
Coordinator message – announce the identity of the elected process
Steps:-
1. When a process P notices that the coordinator no longer responds to requests within a time interval T,
it assumes the coordinator has failed and begins an election.
2. P sends an election message to every process with a higher priority number and waits for ok
messages in response.
3. If no ok message arrives within time T, P considers itself the coordinator and sends a coordinator
message to all processes with lower priority numbers, announcing this.
4. If an ok message does arrive, a higher-priority process has taken over; that process starts its own
election in the same way, and the procedure repeats until the highest-priority active process elects
itself.
5. If a process that was previously down/failed comes back up, it holds an election; if it has the highest
priority number, it takes over the coordinator job.
6. The biggest guy always wins; hence the name "bully algorithm".
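The election steps can be modeled in a toy, synchronous form. A real implementation relies on timeouts and asynchronous messages; here "no ok message arrives within T" is approximated by checking which higher-priority processes are alive. The function name and arguments are assumptions of this sketch.

```python
# A toy sketch of the bully algorithm's outcome (synchronous approximation).

def bully_election(initiator, alive, ids):
    """Return the id elected when `initiator` starts an election.
    `alive` is the set of live process ids; `ids` is every process id."""
    higher = [p for p in ids if p > initiator and p in alive]
    if not higher:
        # No ok message within T: the initiator declares itself coordinator
        # and (conceptually) sends a coordinator message to all lower ids.
        return initiator
    # Otherwise each live higher-priority process runs its own election;
    # ultimately the highest-priority live process bullies its way in.
    return max(higher)

ids = [1, 2, 3, 4, 5]
alive = {1, 2, 3, 4}                   # previous coordinator 5 has failed
print(bully_election(2, alive, ids))   # -> 4
```

Note the recursion is collapsed into a single `max` here: in the real protocol each ok-sender repeats the election upward, and the same highest live id wins.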
The reason we need to record both the local states and the states of the communication channels is that
messages in transit at the instant of the snapshot belong to neither the sender's nor the receiver's local
state; a snapshot that omits them would not correspond to a consistent global state.
Chandy-Lamport algorithm
• The Chandy-Lamport algorithm uses a control message, called a marker, whose role in a FIFO system is to
separate messages in the channels.
• After a site has recorded its snapshot, it sends a marker along all of its outgoing channels before sending
out any more messages.
• A marker separates the messages in the channel into those to be included in the snapshot from those not
to be recorded in the snapshot.
• A process must record its snapshot no later than when it receives a marker on any of its incoming
channels.
• The algorithm can be initiated by any process by executing the “Marker Sending Rule” by which it records
its local state and sends a marker on each outgoing channel.
• A process executes the “Marker Receiving Rule” on receiving a marker. If the process has not yet recorded
its local state, it records the state of the channel on which the marker is received as empty and executes
the “Marker Sending Rule” to record its local state.
• The algorithm terminates after each process has received a marker on all of its incoming channels.
• All the local snapshots get disseminated to all other processes and all the processes can determine the
global state.
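The marker rules can be traced on a two-process system with simulated FIFO channels. The process names, message strings, and channel representation below are assumptions of this sketch, not part of the algorithm's specification.

```python
# Tracing the Chandy-Lamport marker rules between two processes p and q
# connected by FIFO channels, simulated with in-memory queues.
from collections import deque

MARKER = "MARKER"
c_pq, c_qp = deque(), deque()      # FIFO channels p->q and q->p

# Two basic messages from q are already in flight toward p.
c_qp.extend(["m1", "m2"])

# p initiates: Marker Sending Rule -> record local state, send a marker
# on the outgoing channel before any further basic messages.
p_state = "p-local-state"
c_pq.append(MARKER)

# q receives its first marker: it records its own state, records the
# channel p->q as empty, and executes the Marker Sending Rule itself.
assert c_pq.popleft() == MARKER
q_state = "q-local-state"
q_channel_from_p = []              # recorded as empty
c_qp.append(MARKER)

# p drains channel q->p, recording every basic message that arrives
# before the marker as that channel's state.
p_channel_from_q = []
while True:
    m = c_qp.popleft()
    if m == MARKER:
        break
    p_channel_from_q.append(m)

print(p_channel_from_q)            # -> ['m1', 'm2']
```

The point of the trace: because the channels are FIFO, the marker cleanly splits each channel's traffic into "before the snapshot" (recorded as channel state) and "after" (not recorded), which is exactly the separation described above.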
# Termination Detection
A fundamental problem in distributed systems is to determine if a distributed computation has terminated.
The detection of the termination of a distributed computation is non-trivial since no process has complete
knowledge of the global state, and global time does not exist. A distributed computation is considered to be
globally terminated if every process is locally terminated and there is no message in transit between any
processes. A “locally terminated” state is a state in which a process has finished its computation and will not
restart any action unless it receives a message. In the termination detection problem, a particular process (or
all of the processes) must infer when the underlying computation has terminated.
Messages used in the underlying computation are called basic messages, and messages used for the purpose
of termination detection (by a termination detection algorithm) are called control messages.
A termination detection (TD) algorithm must ensure the following:
1. Execution of a TD algorithm cannot indefinitely delay the underlying computation; that is, execution of the
termination detection algorithm must not freeze the underlying computation.
2. The termination detection algorithm must not require addition of new communication channels between
processes.
In termination detection based on distributed snapshots, the last process to terminate will have the largest
clock value. Therefore, every process will take a snapshot for it; however, it will not take a snapshot for any
other process.
Formal description:
In this weight-throwing scheme, a controlling agent monitors the computation. Initially, the controlling agent
has weight 1 and every process has weight 0. The algorithm is defined by the following four rules:
1. The controlling agent or an active process may send a basic message to a process P by splitting its
weight W into W1 and W2 such that W1 + W2 = W, W1 > 0, and W2 > 0; it keeps W1 for itself and
piggybacks W2 on the message to P.
2. On receipt of a basic message carrying weight W′, process P adds W′ to its weight (W := W + W′); if P
was idle, it becomes active.
3. When a process becomes idle, it sends its weight to the controlling agent in a control message and sets
its own weight to 0.
4. On receipt of a control message carrying weight W′, the controlling agent adds W′ to its weight
(W := W + W′). If its weight becomes 1, it concludes that the computation has terminated.
Correctness of Algorithm:
A: set of weights on all active processes
B: set of weights on all basic messages in transit
C: set of weights on all control messages in transit
Wc: weight on the controlling agent
The rules preserve two invariants: Wc plus the sum of all weights in A ∪ B ∪ C equals 1, and every weight
in A ∪ B ∪ C is positive. Hence Wc = 1 implies that A ∪ B ∪ C is empty, i.e., no process is active and no
message is in transit, so the computation has terminated.
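Reading the sets A, B, C and Wc as a weight-throwing scheme, the bookkeeping can be simulated in a toy form. Exact rational weights avoid floating-point drift; the class and function names, and the half-and-half weight split, are assumptions of this sketch.

```python
# A toy simulation of weight-throwing termination detection.
from fractions import Fraction

class Agent:
    def __init__(self):
        self.w = Fraction(1)       # controlling agent starts with weight 1

class Proc:
    def __init__(self):
        self.w = Fraction(0)       # processes start with weight 0
        self.active = False

def send_basic(sender, receiver_inbox):
    # Split the sender's weight; part of it travels with the message.
    half = sender.w / 2
    sender.w -= half
    receiver_inbox.append(half)

def receive_basic(proc, dw):
    # Add the piggybacked weight; an idle process becomes active.
    proc.w += dw
    proc.active = True

def go_idle(proc, agent):
    # Return the process's weight to the controlling agent.
    agent.w += proc.w
    proc.w = Fraction(0)
    proc.active = False

agent, p, q = Agent(), Proc(), Proc()
inbox_p, inbox_q = [], []

send_basic(agent, inbox_p)         # the agent starts the computation
receive_basic(p, inbox_p.pop())
send_basic(p, inbox_q)             # p activates q with a basic message
receive_basic(q, inbox_q.pop())
go_idle(p, agent)
go_idle(q, agent)
print(agent.w == 1)                # -> True: termination detected
```

Because every unit of weight that leaves the agent eventually returns via a control message, Wc reaching 1 certifies that no weight remains on any process or in-transit message.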
# Spanning-Tree-Based Termination Detection Algorithm
• There are N processes Pi, 0 ≤ i < N, which are modeled as the nodes i, 0 ≤ i < N, of a fixed connected
undirected graph.
• The edges of the graph represent the communication channels.
• The algorithm uses a fixed spanning tree of the graph with process P0 at its root which is responsible
for termination detection.
• Process P0 communicates with other processes to determine their states through signals.
• All leaf nodes report to their parents, if they have terminated.
• A parent node will similarly report to its parent when it has completed processing and all of its
immediate children have terminated, and so on.
• The root concludes that termination has occurred, if it has terminated and all of its immediate
children have also terminated.
• Two waves of signals are generated, one moving inward and the other moving outward through the
spanning tree. Initially, a contracting wave of signals, called tokens, moves inward from the leaves to the root.
• If this token wave reaches the root without discovering that termination has occurred, the root
initiates a second, outward wave of repeat signals.
• As this repeat wave reaches the leaves, the token wave gradually forms and starts moving inward again;
this sequence of events is repeated until termination is detected.
• Initially, each leaf process is given a token.
• Each leaf process, after it has terminated sends its token to its parent.
• When a parent process terminates and after it has received a token from each of its children, it sends
a token to its parent.
• This way, each process indicates to its parent process that the subtree below it has become idle.
• In a similar manner, the tokens get propagated to the root.
• The root of the tree concludes that termination has occurred, after it has become idle and has
received a token from each of its children.
The main idea is to color the processes and tokens, and to change the colors when basic messages are
involved, as shown in the figure.
The algorithm works as follows:
• Initially, each leaf process is provided with a token. The set S is used for book-keeping, to know which
processes have the token; hence S will initially be the set of all leaves in the tree.
• Initially, all processes and tokens are colored white.
• When a leaf node terminates, it sends the token it holds to its parent process.
• A parent process will collect the token sent by each of its children. After it has received a token from
all of its children and after it has terminated, the parent process sends a token to its parent.
• A process turns black when it sends a message to some other process. When a process terminates, if
its color is black, it sends a black token to its parent.
• A black process turns back to white, after it has sent a black token to its parent.
• A parent process holding a black token (from one of its children), sends only a black token to its
parent, to indicate that a message-passing was involved in its subtree.
• Tokens are propagated to the root in this fashion. The root, upon receiving a black token, will know
that a process in the tree had sent a message to some other process. Hence, it restarts the algorithm
by sending a Repeat signal to all its children.
• Each child of the root propagates the Repeat signal to each of its children and so on, until the signal
reaches the leaves.
• The leaf nodes restart the algorithm on receiving the Repeat signal.
• The root concludes that termination has occurred, if it is white, it is idle, and it received a white
token from each of its children.
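One round of the token-and-color scheme can be sketched as follows. The tree shape (a root with two leaves) and the method names are assumptions of this sketch.

```python
# One round of the colored-token scheme on a tiny fixed spanning tree.

WHITE, BLACK = "white", "black"

class Node:
    def __init__(self):
        self.color = WHITE
        self.tokens = []    # tokens received from children this round

    def on_send_basic_message(self):
        # Sending a basic message blackens the process.
        self.color = BLACK

    def send_token_to(self, parent):
        # On termination: forward a token whose color reflects whether
        # any message-passing occurred in this subtree, then turn white.
        token = BLACK if self.color == BLACK or BLACK in self.tokens else WHITE
        self.color = WHITE
        parent.tokens.append(token)

root, left, right = Node(), Node(), Node()
left.on_send_basic_message()   # left sent a basic message this round

left.send_token_to(root)       # leaves terminate and pass tokens up
right.send_token_to(root)

# Root rule: white, idle, and all-white tokens => announce termination;
# any black token => send Repeat signals and run another round.
done = root.color == WHITE and all(t == WHITE for t in root.tokens)
print(done)                    # -> False: the black token forces a repeat
```

The black token from the left subtree is what prevents a premature announcement: a basic message sent during the round may have reactivated a process that already reported idle.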
Performance
• The best-case message complexity of the algorithm is O(N), where N is the number of processes in the
computation; it occurs when all nodes send all computation messages in the first round.
• The worst-case message complexity of the algorithm is O(N*M), where M is the number of computation
messages exchanged; it occurs when only one computation message is exchanged between successive
executions of the algorithm.