Unit 4
Distributed Mutual Exclusion
In a distributed system, processes need to coordinate their access to shared resources to avoid conflicts or data inconsistency. Since there is no central controller, mutual exclusion must be achieved through communication between processes, ensuring that only one process at a time can execute the critical section (the code that accesses the shared resource).
Here’s a breakdown of Distributed Mutual Exclusion, including key algorithms used to solve it:
Key Concepts and Requirements:
1. Critical Section (CS): A section of code where shared resources are accessed and which must be
protected to avoid conflicts.
2. Mutual Exclusion: Only one process should be allowed to enter the critical section at a time.
3. No Central Coordinator: Since distributed systems don’t rely on a single point of control,
processes communicate with each other to achieve mutual exclusion.
4. No Deadlock: The algorithm must ensure that processes never wait for one another in a cycle,
a situation in which none of them can proceed.
5. No Starvation: All processes that request access to the critical section should eventually be
allowed to enter.
1. Lamport’s Algorithm:
One of the earliest and most famous algorithms for achieving distributed mutual exclusion.
Basic Idea: Processes exchange timestamped messages to request access to the critical
section. These timestamps help in deciding which process should get access to the critical
section.
Steps:
1. Request: When a process Pi wants to enter the critical section, it sends a "Request"
message with its timestamp to all other processes.
2. Reply: Each process replies to Pi only if it is not requesting or is requesting with a
later timestamp.
3. Enter CS: Pi enters the critical section when it has received replies from all other
processes.
4. Release: When Pi exits the critical section, it sends a "Release" message to all
processes, allowing others to proceed.
Advantages:
o Simple, and requests are served in timestamp order, which guarantees fairness and
freedom from starvation.
Drawbacks:
o Requires O(n) messages for each critical section access, where n is the number of
processes: 3(n-1) in total (n-1 Request, n-1 Reply, and n-1 Release messages).
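The four steps above can be sketched as a single-machine simulation. This is illustrative only (the class and method names `Process`, `request`, `on_request`, and so on are invented for this sketch); a real implementation would exchange these messages over a network.

```python
import heapq

class Process:
    """Illustrative sketch of one Lamport mutual-exclusion participant."""

    def __init__(self, pid, peers):
        self.pid = pid
        self.peers = set(peers)   # ids of all other processes
        self.clock = 0            # Lamport logical clock
        self.queue = []           # min-heap of pending (timestamp, pid) requests
        self.replies = set()      # peers that replied to our current request

    def _tick(self, received_ts=0):
        self.clock = max(self.clock, received_ts) + 1
        return self.clock

    def request(self):
        # Step 1: timestamp a Request and (conceptually) broadcast it.
        ts = self._tick()
        heapq.heappush(self.queue, (ts, self.pid))
        self.replies.clear()
        return ('REQUEST', ts, self.pid)

    def on_request(self, ts, sender):
        # Step 2: queue the incoming request, answer with a timestamped Reply.
        self._tick(ts)
        heapq.heappush(self.queue, (ts, sender))
        return ('REPLY', self._tick(), self.pid)

    def on_reply(self, ts, sender):
        self._tick(ts)
        self.replies.add(sender)

    def can_enter(self):
        # Step 3: enter when our request heads the queue and all peers replied.
        return (bool(self.queue) and self.queue[0][1] == self.pid
                and self.replies == self.peers)

    def release(self):
        # Step 4: drop our own request and (conceptually) broadcast a Release.
        heapq.heappop(self.queue)
        return ('RELEASE', self._tick(), self.pid)

    def on_release(self, ts, sender):
        self._tick(ts)
        self.queue = [(t, p) for t, p in self.queue if p != sender]
        heapq.heapify(self.queue)
```

With three such processes, if P1 requests first and P2 second, P2's `can_enter()` stays false until P1's Release removes the older request from every queue.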
2. Ricart-Agrawala Algorithm:
The Ricart-Agrawala Algorithm is a distributed mutual exclusion algorithm used to ensure that only
one process at a time can enter the critical section in a distributed system. This algorithm is
message-passing based and allows processes to request access to the critical section without
needing a centralized coordinator. It was developed by Glenn Ricart and Ashok Agrawala.
1. Requesting the Critical Section:
o When a process Pi wants to enter the critical section, it increments its logical clock,
timestamps the request, and sends a Request message to all other processes.
o The process waits for a Reply message from all other processes before entering the
critical section.
2. Receiving Requests:
o When a process Pj receives Pi's request:
If Pj is not in the critical section and doesn't want to enter, it immediately
sends a Reply.
If Pj is in the critical section, or is itself requesting with an earlier
timestamp, it defers its Reply until it leaves the critical section.
3. Entering the Critical Section:
o Once a process Pi receives a Reply from all other processes, it can enter the critical
section.
4. Releasing the Critical Section:
o After leaving the critical section, Pi sends the deferred Reply messages to all
processes whose requests it postponed, allowing them to proceed.
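The key difference from Lamport's algorithm is that conflicting Replies are deferred rather than queued, so no separate Release message is needed. A minimal sketch (class and method names are ours, not from the original paper):

```python
class RAProcess:
    """Illustrative sketch of one Ricart-Agrawala participant."""

    def __init__(self, pid, peers):
        self.pid = pid
        self.peers = set(peers)
        self.clock = 0
        self.requesting = False
        self.request_ts = None
        self.in_cs = False
        self.deferred = []        # senders whose Reply we postponed
        self.replies = set()

    def request(self):
        self.clock += 1
        self.requesting = True
        self.request_ts = self.clock
        self.replies.clear()
        return ('REQUEST', self.request_ts, self.pid)

    def on_request(self, ts, sender):
        self.clock = max(self.clock, ts) + 1
        mine, theirs = (self.request_ts, self.pid), (ts, sender)
        # Defer the Reply if we are in the CS, or our own pending request
        # has a smaller (timestamp, pid) pair; ties break on process id.
        if self.in_cs or (self.requesting and mine < theirs):
            self.deferred.append(sender)
            return None
        return ('REPLY', self.pid)

    def on_reply(self, sender):
        self.replies.add(sender)
        if self.replies == self.peers:
            self.in_cs = True     # all n-1 permissions collected

    def release(self):
        # Exiting the CS: send the postponed Replies, nothing else.
        self.in_cs = False
        self.requesting = False
        out, self.deferred = self.deferred, []
        return out                # ids to send a deferred Reply to now
```

Per critical-section entry this costs 2(n-1) messages (n-1 Requests plus n-1 Replies), compared with 3(n-1) for Lamport's algorithm.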
Properties
Fairness: The Ricart-Agrawala algorithm ensures that requests are granted in the order they
were made (based on timestamps).
Deadlock-Free: Because all requests are totally ordered by their (timestamp, process ID)
pairs, processes cannot end up waiting on each other in a cycle.
No Starvation: Every request will eventually be granted since timestamps ensure a strict
order of requests.
3. Maekawa’s Algorithm:
This algorithm reduces the number of messages by dividing processes into groups (or quorums),
where each process only communicates with a subset of processes instead of all.
Basic Idea: Each process communicates with a subset (quorum) of other processes to
request access to the critical section. Quorums are designed so that any two quorums share
at least one common process (to ensure mutual exclusion).
Steps:
1. Request: A process requests permission to enter the critical section from all the
processes in its quorum.
2. Grant: Each process in the quorum grants permission only if it hasn’t already
granted it to another process.
3. Enter CS: The process enters the critical section when it has received permission
from all members of its quorum.
4. Release: After exiting the critical section, the process sends "Release" messages to
its quorum members.
Advantages:
o Reduces the number of messages, as each process communicates with only a subset
of processes (its quorum, typically of size about √n), not all processes.
Drawbacks:
o Prone to deadlock unless extra control messages (Failed, Inquire, Yield) are added,
because processes may lock their quorum members in conflicting orders.
o Constructing quorums that pairwise intersect and are of roughly equal size is
nontrivial.
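One common way to construct such intersecting quorums (an illustration, not the only choice) is to arrange the n processes in a √n × √n grid: a process's quorum is its whole row plus its whole column, so any two quorums overlap in at least one process and each has size 2√n − 1.

```python
import math

def grid_quorums(n):
    """Arrange n processes in a k x k grid (k = sqrt(n)); a process's
    quorum is its whole row plus its whole column."""
    k = math.isqrt(n)
    assert k * k == n, "this sketch assumes n is a perfect square"
    quorums = {}
    for p in range(n):
        r, c = divmod(p, k)
        row = {r * k + j for j in range(k)}     # everyone in p's row
        col = {i * k + c for i in range(k)}     # everyone in p's column
        quorums[p] = row | col
    return quorums

# Any two quorums intersect, which is what guarantees mutual exclusion:
qs = grid_quorums(9)
assert all(qs[a] & qs[b] for a in qs for b in qs)
assert all(len(q) == 5 for q in qs.values())    # 2*sqrt(9) - 1 = 5
```

Because each quorum has only about 2√n members, the per-entry message count drops from O(n) to O(√n).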
4. Token-Based Algorithms:
In token-based algorithms, a unique token is used to grant access to the critical section. The token
circulates among processes, and only the process holding the token can enter the critical section.
Token Ring Algorithm: A simple implementation where processes are organized in a logical
ring. The token is passed around the ring. A process can enter the critical section if it holds
the token. After using the token, the process passes it to the next process in the ring.
o Advantages:
Simple to implement.
o Drawbacks:
If the token is lost (e.g., due to a crash), the system must have mechanisms
to regenerate the token.
A process may have to wait a long time to get the token if it’s far away in the
ring.
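The ring mechanics can be sketched in a few lines (the class and its methods are invented for this sketch; a real system would pass the token as a network message and handle token loss):

```python
class TokenRing:
    """Sketch: n processes in a logical ring; only the current token
    holder may enter the critical section."""

    def __init__(self, n):
        self.n = n
        self.holder = 0           # process currently holding the token

    def can_enter(self, pid):
        return pid == self.holder

    def pass_token(self):
        # After using (or declining) the token, hand it to the next process.
        self.holder = (self.holder + 1) % self.n
        return self.holder

    def hops_until(self, pid):
        # How far the token must travel before pid may enter; this is the
        # "long wait" drawback when pid is far from the current holder.
        return (pid - self.holder) % self.n
```

In a 5-process ring with the token at process 0, process 4 must wait 4 hops; in general the worst case is n − 1 hops.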
Coordination and Agreement in Group Communication
Coordination and agreement in group communication are essential problems in distributed systems,
where multiple processes must work together to reach a consensus or coordinate actions. This is
challenging because distributed systems are often asynchronous, with processes running
independently, network delays, potential failures, and no global clock.
Here’s an overview of coordination and agreement in group communication, common issues, and
approaches to solving them.
1. Coordination:
Coordination in distributed systems refers to managing and synchronizing the actions of distributed
processes to achieve a common goal. Effective coordination is crucial to ensure that processes do
not act inconsistently or at conflicting times.
Mutual Exclusion: Ensuring that only one process at a time accesses a shared resource (e.g.,
using algorithms like Lamport's or Ricart-Agrawala's).
Leader Election: Selecting a single process to act as a coordinator or leader for a specific
task. For example, Ring-Based Algorithm can be used.
Group Membership: Keeping track of which processes are part of a group, even as nodes
join or leave.
Synchronization: Ensuring that processes execute operations in the correct order, often
involving clock synchronization (e.g., logical clocks or vector clocks).
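The logical and vector clocks mentioned under Synchronization follow simple update rules, sketched below (the function names are ours):

```python
def lamport_recv(clock, msg_ts):
    """Lamport clock rule on receive: jump past both histories, then tick."""
    return max(clock, msg_ts) + 1

def vector_recv(vc, msg_vc, me):
    """Vector clock rule on receive: element-wise max, then bump own slot."""
    merged = [max(a, b) for a, b in zip(vc, msg_vc)]
    merged[me] += 1
    return merged
```

For example, a process with Lamport clock 3 that receives a message stamped 7 moves to 8, so the receive event is ordered after the send.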
2. Agreement:
Agreement is the ability of processes in a distributed system to agree on a common decision, value,
or course of action, even in the presence of failures. Agreement problems are especially important
for tasks like maintaining consistency in databases, achieving consensus, and ensuring that
distributed processes act in concert.
Consensus: All non-faulty processes must agree on a single value. The value could be, for
example, a proposed action or the state of the system.
Atomic Broadcast: Ensuring that a message is delivered to all group members in the same
order, even if some members fail.
Byzantine Agreement: Ensuring agreement even when some processes are faulty or
malicious (Byzantine faults).
3. Challenges in Coordination and Agreement:
Network Latency: Communication delays between processes can result in inconsistent views
of the system’s state.
Process Failures: Processes may fail or crash, and it may be difficult for other processes to
detect the failure immediately.
Asynchrony: Without a global clock, processes cannot rely on synchronized timing.
Message Loss: Messages may be delayed or lost due to network issues, leading to
inconsistent states or delays in agreement.
4. Coordination Approaches:
A. Leader Election:
Leader election algorithms are used to designate a single process as the "leader" or coordinator. This
leader is responsible for coordinating the actions of the group or making final decisions.
Bully Algorithm: In this algorithm, each process has a unique ID. When a process notices
that the current leader has failed, it initiates an election by sending messages to all
processes with higher IDs. The highest-ID process that is still alive becomes the new leader.
Ring Algorithm: Processes are arranged in a logical ring, and election messages are passed
around the ring until a leader is elected.
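The outcome of a Bully election can be sketched as below. This models only the takeover chain (each challenged higher process answers and takes over), not the real message timeouts; the function name and `alive` map are our own invention.

```python
def bully_election(alive, initiator):
    """Sketch of the Bully algorithm's outcome. `alive` maps pid -> bool.
    The initiator challenges all higher ids; any alive higher process
    answers and takes over the election, until the highest alive id wins."""
    candidate = initiator
    while True:
        higher = [p for p, up in alive.items() if up and p > candidate]
        if not higher:
            return candidate      # no alive process outranks the candidate
        candidate = min(higher)   # a higher process answers and takes over
```

For instance, with processes 1..5 where 3 and 5 have crashed, an election started by process 1 ends with process 4 as coordinator.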
5. Agreement Approaches:
A. Consensus Algorithms:
Consensus algorithms help processes agree on a single value or decision, even if some processes fail
or there is network unreliability.
Paxos: Paxos is a widely used consensus algorithm that tolerates process failures. It
guarantees that even if some processes fail or messages are lost, the remaining processes
can reach a consensus on a single value. Paxos is particularly important in distributed
databases and distributed systems like Google’s Chubby and Amazon’s Dynamo.
Raft: Raft is another consensus algorithm, designed to be easier to understand than
Paxos. It divides the consensus problem into two subproblems: leader election and log
replication. Once a leader is elected, it coordinates the replication of log entries across all
nodes.
B. Byzantine Fault Tolerance:
In some distributed systems, processes may act maliciously or unpredictably (called Byzantine
faults). Agreement under Byzantine faults is significantly more challenging because some processes
may send conflicting or incorrect information to different processes.
Byzantine Generals Problem: This classical problem illustrates the difficulty of achieving
agreement in a distributed system with malicious processes. It asks how a group of generals,
each commanding an army, can agree on a coordinated attack plan if some of them may
betray the group.
Practical Byzantine Fault Tolerance (PBFT): PBFT is an algorithm that ensures consensus in a
distributed system even when some processes are faulty or malicious. It is used in
blockchain and financial systems, where security and fault tolerance are critical.
C. Atomic and Total Order Broadcast:
Atomic Broadcast: Guarantees that all correct processes receive the same messages in the
same order, even in the presence of process failures. This is essential for maintaining
consistency in replicated databases.
Total Order Broadcast: A special case of atomic broadcast where processes not only receive
the same messages but in the same total order. Total order broadcast is often implemented
using consensus algorithms like Paxos or Raft.
6. Group Communication:
A. Multicast Communication:
Reliable Multicast: Ensures that all processes in the group receive the message, even if
some processes fail. If a message is sent to a group, either all members receive it, or none
do.
Ordered Multicast: Guarantees that all processes receive messages in the same order,
ensuring consistency. This is useful in systems like distributed databases, where the order of
operations is critical.
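One common way to realize ordered (total order) multicast is a sequencer: every message passes through a single process that stamps it with a global sequence number, and members deliver strictly in stamp order, holding back anything that arrives early. A minimal sketch (class names are ours):

```python
class Sequencer:
    """All multicasts pass through one sequencer, which assigns
    consecutive global sequence numbers."""

    def __init__(self):
        self.next_seq = 0

    def stamp(self, msg):
        seq, self.next_seq = self.next_seq, self.next_seq + 1
        return seq, msg

class Member:
    """Delivers messages strictly in sequence order, holding back
    anything that arrives out of order."""

    def __init__(self):
        self.expected = 0
        self.held = {}        # hold-back queue: seq -> message
        self.delivered = []

    def receive(self, seq, msg):
        self.held[seq] = msg
        # Deliver every consecutively numbered message we now have.
        while self.expected in self.held:
            self.delivered.append(self.held.pop(self.expected))
            self.expected += 1
```

Even if the network reorders messages, every member ends up delivering them in the same sequencer-assigned order; the sequencer itself, of course, is a single point of failure in this simple form.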
B. Group Membership:
Group membership protocols maintain information about which processes are currently part of a
group. This is crucial for agreement and coordination tasks since processes need to know which
other processes are active and should be considered for agreement.
View Synchronous Multicast: Ensures that processes in a distributed group have the same
"view" of which processes are active, and all messages are delivered consistently to
processes in the group.
Conclusion:
Coordination and agreement are critical challenges in distributed systems to ensure consistency,
reliability, and fault tolerance. Algorithms like Paxos, Raft, and Byzantine Fault Tolerance have been
developed to solve these issues, and they play a crucial role in distributed databases, cloud services,
and large-scale systems.