7.DS Leader Election
Requirement of Leader Election
Typically leader election is used:
To ensure exclusive access by a single node to shared data, or
To ensure a single node coordinates the work in a system.
Applications of LEAs
Radio networks:
In radio network protocols, leader election is often used as a first step to approach more
advanced communication primitives, such as message gathering or broadcasts.
In wireless networks, adjacent nodes that transmit at the same time cause collisions, which
happens naturally; electing a leader helps coordinate transmissions.
While the diameter D of a network is a natural lower bound for the time needed to elect a
leader, upper and lower bounds for the leader election problem depend on the specific radio
model.
RDBMS:
RDBMSs rely on leader election to pick a leader database that handles all writes and
sometimes all reads. The election may be automated, but it is frequently done manually by
a human operator.
Election Algorithms
Many distributed algorithms need one process to act as a coordinator for all the
activities in a distributed system.
Election algorithms pick a unique coordinator/leader based on a criterion such as the
largest identifier.
Examples:
(1) Taking over the role of a failed process (Fault Tolerance)
(2) Picking a master in the Berkeley clock synchronization algorithm (Physical Clock Synchronization)
(3) A powerful tool used in systems across Amazon to make them more fault-tolerant and easier to operate.
(4) In ZooKeeper, the process that owns the smallest znode is chosen as leader. All application processes watch the current
smallest znode, which is ephemeral (it exists only while the session that created it is alive), and check whether they are the
new leader when the smallest znode goes away.
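The ZooKeeper pattern in example (4) can be sketched without a ZooKeeper client. This is a minimal illustration, assuming sequential znode names of the hypothetical form `candidate-<sequence number>`; the helper names `sequence_number` and `current_leader` are invented for this sketch and are not part of any ZooKeeper API.

```python
from typing import List, Optional


def sequence_number(znode_name: str) -> int:
    """Extract the numeric suffix from a name such as 'candidate-0000000007'."""
    return int(znode_name.rsplit("-", 1)[1])


def current_leader(znodes: List[str]) -> Optional[str]:
    """Return the znode with the smallest sequence number, or None if none exist."""
    return min(znodes, key=sequence_number, default=None)


live = ["candidate-0000000007", "candidate-0000000003", "candidate-0000000012"]
leader_before = current_leader(live)

# The leader's session ends, so its ephemeral znode disappears
# and the remaining candidates re-check who owns the smallest znode.
live.remove(leader_before)
leader_after = current_leader(live)

print(leader_before)  # candidate-0000000003
print(leader_after)   # candidate-0000000007
```

In a real deployment the list of znodes would come from the coordination service and each candidate would be notified by a watch instead of polling a local list.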
Leader Election Algorithms
Once the leader is elected, the nodes reach a particular state known as terminated state.
The states are partitioned into elected states & non-elected states.
Liveness condition: Every node will eventually enter an elected state or a non-elected state.
Safety condition: Only a single node enters the elected state and eventually becomes the leader.
Once a decision is made, a node is elected as the leader and all the other nodes
acknowledge the role of that node as the leader.
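The two conditions can be written as simple checks over a snapshot of node states. This is a minimal sketch; the `State` enum, the node names, and the helper functions are assumptions made for illustration only.

```python
from enum import Enum


class State(Enum):
    UNDECIDED = "undecided"
    ELECTED = "elected"
    NOT_ELECTED = "not-elected"


def is_live(states):
    """Liveness: every node has entered an elected or a non-elected state."""
    return all(s is not State.UNDECIDED for s in states.values())


def is_safe(states):
    """Safety: at most one node is in the elected state."""
    return sum(s is State.ELECTED for s in states.values()) <= 1


final_states = {
    "n1": State.NOT_ELECTED,
    "n2": State.ELECTED,
    "n3": State.NOT_ELECTED,
}
two_leaders = {"n1": State.ELECTED, "n2": State.ELECTED}

print(is_live(final_states), is_safe(final_states))  # True True
print(is_safe(two_leaders))                          # False
```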
Validity of LEA
A LEA is valid if it meets the following conditions:
Termination: the algorithm should finish within a finite
time once the leader is selected. In randomized
approaches this condition is sometimes weakened (for
example, requiring termination with probability 1).
Size of the network: the algorithm may or may not use knowledge of the number
of processes in the system.
Types of Leader Election Algorithms
The most prominent LE algorithms are:
a. Bully Algorithm
Modified Bully Algorithm
b. Ring Algorithm
Modified Ring Algorithm
Bully Algorithm: Basic Assumptions
1. The system is synchronous
5. The processes may fail at any time including during execution of algorithm
Types of Messages:
Coordinator (Victory) Message: Announces the winner of the election
Election Message: Initiates the election process
Alive (Answer/OK) Message: Sent in reply to an Election message to indicate that the responder is alive
Bully Algorithm
When a process P recovers from failure, or the failure detector indicates that the current coordinator has failed, P
performs the following actions:
Step 1: If P has the highest process ID, it sends a Victory message to all other processes and becomes the new
Coordinator. Otherwise, P broadcasts an Election message to all other processes with higher process IDs than
itself.
Step 2: If P receives no Answer after sending an Election message, then it broadcasts a Victory message to all
other processes and becomes the Coordinator.
Step 3: If P receives an Answer from a process with a higher ID, it sends no further messages for this election and
waits for a Victory message. (If there is no Victory message after a period of time, it restarts the process at the
beginning.)
Step 4: If P receives an Election message from another process with a lower ID it sends an Answer message back
and starts the election process at the beginning, by sending an Election message to higher-numbered processes.
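The four steps can be traced with a small single-machine simulation. This is a sketch under stated assumptions: the system is synchronous, no process fails while the election is running, a crashed process is simply an id missing from `alive`, and the function name `bully_election` and the six example ids are illustrative.

```python
from collections import deque


def bully_election(all_ids, alive, initiator):
    """Simulate one bully election and return (leader, message_counts).

    A crashed process is modelled as an id missing from `alive`; its silence
    stands in for the timeout that a real implementation would wait for.
    """
    counts = {"election": 0, "answer": 0, "victory": 0}
    started = set()              # processes that already ran their election
    pending = deque([initiator])
    leader = None
    while pending:
        p = pending.popleft()
        if p in started:
            continue
        started.add(p)
        higher = [q for q in all_ids if q > p]
        counts["election"] += len(higher)        # Election goes to every higher id
        responders = [q for q in higher if q in alive]
        counts["answer"] += len(responders)      # each live higher process answers
        if responders:
            pending.extend(responders)           # and starts its own election (Step 4)
        else:
            leader = p                           # no Answer arrived: P wins (Step 2)
            counts["victory"] += len(alive) - 1  # Victory to every other live process
    return leader, counts


ids = [3, 5, 6, 12, 32, 80]
alive = {3, 5, 6, 12, 32}        # process 80, the old coordinator, has crashed
leader, counts = bully_election(ids, alive, initiator=6)
print(leader, counts)  # 32 {'election': 6, 'answer': 3, 'victory': 4}
```

The simulation always probes higher ids before declaring victory, so even the highest live process sends one Election message to the crashed coordinator; a process that already knows it has the highest live id could skip that probe, as Step 1 describes.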
Algorithm (Bully)
Step 1: Process P sends a message to the coordinator C (P → C).
Step 2: If the coordinator does not respond within a time interval T, it is assumed that the coordinator has
failed.
Step 3: Process P then sends an election message to every process with a higher priority number.
Step 4: P waits for responses; if no one responds within time interval T, process P elects itself as coordinator.
Step 5: P then sends a message to all lower-priority processes announcing that it is their new coordinator.
Step 6: If instead a higher-priority process Q responds within T, process P waits for a further time interval T to receive a message from Q announcing that Q has been elected
as coordinator.
If Q does not respond within time interval T, Q is assumed to have failed and the algorithm is
restarted.
Example Bully Algorithm
[Figure: processes N3, N5, N6, N12, N32 and N80. N80 has failed; N6 detects the failure and sends ELECT messages.]
N12 and N32, which have higher ids than N6, reply to N6 with OK (Alive) messages.
N32 knows the ids of all other processes (every process knows the ids of all other processes).
N32 sends a Coordinator (Victory) message to all lower-id processes and the election is
complete.
If failures stop, the algorithm eventually elects a leader.
Best case:
The process with the second-highest id detects the leader failure.
It sends (N-2) coordinator messages.
Completion time: 1 message transmission time.
Impossibility
Since timeouts are built into the protocol, in an asynchronous system model:
Disadvantages of Bully Algorithm
(a) Space Complexity is very large since every process should know the identity
of every other process in the system.
Improved Bully Algorithm
Presented by A. Arghavani, E. Ahmadi, and A. T. Haghighat in 2011.
The main concept: the algorithm declares the new coordinator before the current
coordinator crashes (this needs extra stages).
Before it fails, the current coordinator gathers information about the
processes in the system and announces the next possible coordinator to them.
Once a process has learned the ids of all other processes, a process with a larger id
attempts to execute the bully algorithm.
If the coordinator fails, each process that notices the failure compares its own id with the id
it received from the coordinator.
Disadvantages of Modified Bully Algorithm
It is better than the bully algorithm but still has O(n^2) complexity in the worst case.
MODIFIED ELECTION ALGORITHM
Presented by M. S. Kordafshari, M. Gholipour, M. Jahanshahi, and A. T. Haghighat in 2005.
1. When a process P notices that the coordinator is not responding, it initiates an election and sends an election
message to all processes with a higher priority number.
2. If no process responds, process P wins the election and becomes the new coordinator.
3. Otherwise, each process with a higher priority sends an OK message containing its priority number to process P.
4. When process P has received all the responses, it selects the process with the highest priority number as the new coordinator
and sends a grant message to it.
5. The new coordinator then broadcasts a coordinator message to all other processes, announcing itself
as the coordinator.
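The five steps can be sketched as a single round driven by the initiator. This is an illustration under assumptions: no process fails during the round, a crashed process is an id missing from `alive`, and the function name `modified_election`, the message labels, and the example ids are hypothetical.

```python
def modified_election(all_ids, alive, initiator):
    """Run one round of the modified election; return (coordinator, message_counts)."""
    counts = {"election": 0, "ok": 0, "grant": 0, "coordinator": 0}

    # Step 1: the initiator contacts every process with a higher priority number.
    higher = [q for q in all_ids if q > initiator]
    counts["election"] = len(higher)

    # Step 3: every live higher-priority process replies with its priority number.
    responders = [q for q in higher if q in alive]
    counts["ok"] = len(responders)

    if responders:
        # Step 4: the initiator picks the highest responder and sends it a grant.
        coordinator = max(responders)
        counts["grant"] = 1
    else:
        # Step 2: nobody answered, so the initiator wins.
        coordinator = initiator

    # Step 5: the new coordinator announces itself to every other live process.
    counts["coordinator"] = len(alive) - 1
    return coordinator, counts


ids = [3, 5, 6, 12, 32, 80]
alive = {3, 5, 6, 12, 32}        # process 80, the old coordinator, has crashed
coordinator, counts = modified_election(ids, alive, initiator=6)
print(coordinator, counts)
# 32 {'election': 3, 'ok': 2, 'grant': 1, 'coordinator': 4}
```

Only the initiator sends election messages in this variant, which is why the election count stays at the number of higher-priority processes instead of growing with each responder.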
Ring Algorithm
The algorithm applies to system organized as a ring (logically or physically)
Assumptions: The links between the processes are unidirectional, and every
process can send messages only to the process on its right (clockwise)
[Figure: processes 0 to 4 arranged in a ring]
Algorithm: Ring
Step 1: If process P1 detects a coordinator failure, it creates a new active list, which is empty
initially.
It sends an election message to its neighbor on the right and adds the number 1 to its active list.
Step 2: When process P2 receives an election message from the process on its left, it responds in one of three ways:
(i) If the received message does not already contain 2 in its active list, P2 adds 2 to the active list and forwards
the message.
(ii) If this is the first election message it has received or sent, P2 creates a new active list with the
numbers 1 and 2.
It then sends election message 1 followed by 2.
(iii) If process P1 receives its own election message 1, the active list for P1 now contains the
numbers of all the active processes in the system. P1 then picks the highest priority
number from the list and elects that process as the next coordinator.
Example: Ring Algorithm
Processes 0-7 are participating in the network.
A process P that thinks the coordinator has crashed builds an election message that contains its own id
number.
P sends the message to its first live successor (e.g., node 5 sends id 5 to node 6).
Each process adds its own number and forwards the message to the next process.
It is acceptable to have two elections running at once.
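[Figure: ring of processes 0-7. The previous coordinator, process 7, has crashed. Two elections run at once: the election message started by node 2 grows as [2], [2,3], [2,3,4], ... and the one started by node 5 grows as [5], [5,6], [5,6,0], .... Active list at node 5: [5,6,0,1,2,3,4]. Active list at node 2: [2,3,4,5,6,0,1].]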
When the message returns to P, P sees its own process ID in the list and
knows that the circuit is complete.
Both completed lists, [5, 6, 0, 1, 2, 3, 4] and [2, 3, 4, 5, 6, 0, 1], contain the same ids, so both initiators pick process 6, the highest id, as coordinator.
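The ring example can be reproduced with a short simulation. This is a sketch under assumptions: the ring order is given as a Python list in clockwise order, crashed processes are skipped when forwarding, no process fails during the election, and the function name `ring_election` is illustrative.

```python
def ring_election(ring, alive, initiator):
    """Circulate one election message; return (coordinator, active_list, messages)."""
    n = len(ring)
    active = [initiator]               # the initiator puts its own id in the list
    messages = 0
    i = (ring.index(initiator) + 1) % n
    while ring[i] != initiator:
        if ring[i] in alive:           # crashed successors are skipped
            messages += 1              # one hop to the next live process
            active.append(ring[i])     # which adds its own id and forwards
        i = (i + 1) % n
    messages += 1                      # final hop back to the initiator
    return max(active), active, messages


ring = [0, 1, 2, 3, 4, 5, 6, 7]
alive = {0, 1, 2, 3, 4, 5, 6}          # process 7, the previous coordinator, has crashed

print(ring_election(ring, alive, initiator=2))
# (6, [2, 3, 4, 5, 6, 0, 1], 7)
print(ring_election(ring, alive, initiator=5))
# (6, [5, 6, 0, 1, 2, 3, 4], 7)
```

Both initiators end up with the same set of ids, so two concurrent elections agree on the coordinator at the cost of extra messages.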
MODIFIED RING ALGORITHM
When a node notices that the leader has crashed, it sends its ID number to its neighboring node in the ring. Thus,
it is not necessary for all nodes to send their IDs into the ring.
The receiving node compares the received ID with its own, and forwards whichever is the greatest. This
comparison is done by all the nodes such that only the greatest ID remains in the ring.
If the ID a node receives equals its own, that node declares itself the leader by sending a coordinator message
into the ring.
It can be observed that this method dramatically reduces the overhead involved in message passing.
Thus, if many nodes notice the absence of the leader at the same time, only the message of the node with the
greatest ID keeps circulating in the ring, which prevents smaller IDs from being forwarded.
If $n_{\{i_1, i_2, \ldots, i_m\}}$ is the number of nodes that concurrently detect the absence of the crashed coordinator and $n$ is
the number of nodes in the ring, then the total number of messages passed, which is of order $O(n^2)$, is:
$T = n_{\{i_1, i_2, \ldots, i_m\}} \times n$.
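The forwarding rule of the modified ring algorithm (compare the received ID with your own and pass on the greater one) can be sketched as follows. Assumptions in this illustration: ids are unique, crashed nodes are skipped, only one initiator is simulated, and the function name `modified_ring_election` is hypothetical.

```python
def modified_ring_election(ring, alive, initiator):
    """Forward the greatest ID seen so far until a node receives its own ID back."""
    live = [p for p in ring if p in alive]   # crashed nodes are skipped
    pos = live.index(initiator)
    token = initiator                        # the initiator sends its own ID
    while True:
        pos = (pos + 1) % len(live)
        receiver = live[pos]
        if receiver == token:
            return receiver                  # its own ID came back: it is the leader
        token = max(token, receiver)         # forward whichever ID is greater


ring = [0, 1, 2, 3, 4, 5, 6, 7]
alive = {0, 1, 2, 3, 4, 5, 6}                # process 7 has crashed

print(modified_ring_election(ring, alive, initiator=2))  # 6
print(modified_ring_election(ring, alive, initiator=5))  # 6
```

The message in the ring carries a single ID instead of a growing list, which is the saving the slide refers to; the coordinator announcement that follows is not modelled here.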
• Leader election is an important component of many cloud computing systems
Applications of Leader Election
In Wireless Networks:
Key distribution,
Routing coordination,
Sensor coordination, and
General control.
In Cloud Computing:
Resolving Conflicts During Resource sharing
How Amazon elects a leader?
There are many ways to elect a leader, ranging from algorithms like Paxos, to software like
Apache ZooKeeper, to custom hardware, to leases.
Leases:
Leases are the most widely used leader election mechanism at Amazon.
A lease requires that the leader heartbeat periodically to show that it’s still the leader.
If the existing leader fails to heartbeat after some time, other leader candidates can try to take
over.
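The lease mechanism can be illustrated with an in-memory stand-in for the shared store that holds the lease. This is a minimal sketch, not Amazon's implementation: the class name `LeaseStore`, its method, the 10-second lifetime, and the explicit `now` argument (used instead of a real clock so the behaviour is easy to follow) are all assumptions.

```python
class LeaseStore:
    """In-memory stand-in for a shared store that records the current lease holder."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, candidate, now):
        """Grant or renew the lease if it is free, expired, or already held by candidate."""
        free = self.holder is None or now >= self.expires_at
        if free or self.holder == candidate:
            self.holder = candidate
            self.expires_at = now + self.ttl   # a heartbeat extends the lease
            return True
        return False


store = LeaseStore(ttl_seconds=10)
print(store.try_acquire("A", now=0))    # True: A becomes leader until t=10
print(store.try_acquire("B", now=5))    # False: A's lease is still valid
print(store.try_acquire("A", now=8))    # True: heartbeat, lease now runs until t=18
print(store.try_acquire("B", now=17))   # False: the heartbeat kept A in charge
print(store.try_acquire("B", now=18))   # True: A stopped heartbeating, B takes over
```

A real store would have to make the check and the update a single atomic operation so that two candidates cannot both acquire an expired lease.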
Examples of systems using leader election at Amazon
Leader election is a widely deployed pattern across Amazon.
For example:
RDBMSs rely on leader election to pick a leader database which handles all
writes and sometimes all reads.
Amazon EBS (Elastic Block Store) distributes reads and writes for a volume (Solid
State Drives/Hard Disk Drives) over many storage servers.
To ensure consistency, it uses leader election to elect a primary for each area of
the volume, which orders the reads and writes.
If the primary fails, a follower copy steps in using the same leader election
mechanism.
To tolerate failures, Amazon distributed systems don’t have a single leader. Instead,
leadership is a property that passes from server to server, or process to process.
In distributed systems, it’s not possible to guarantee that there is exactly one leader in the
system. Instead, there is usually one leader, and there can be either zero leaders or two
leaders during failures.
Systems that process work idempotently can often tolerate two leaders with minimal loss of efficiency.
Network Latency: Consider that slow networking, timeouts, retries, and garbage collection
pauses can cause the remaining lease time to expire before the code expects it to.
Correctness: Avoid heartbeating leases in a background thread. This can cause correctness
issues if the thread can’t interrupt the code when the lease expires or the heartbeating thread
dies.
Availability: Availability issues can occur if the work thread dies or stops while the heartbeating
thread holds on to the lease.
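One way to act on the latency and correctness advice is to have the work thread itself check the remaining lease time before each unit of work. This is a hedged sketch; the function name `can_start_work`, its parameters, and the numbers are hypothetical, and it assumes all times come from the same monotonic clock.

```python
def can_start_work(now, lease_expires_at, expected_duration, safety_margin):
    """Return True only if the work should finish, with margin, before the lease expires."""
    return now + expected_duration + safety_margin <= lease_expires_at


# Lease valid until t=18; a unit of work is expected to take 5 seconds, and
# 2 seconds of margin cover slow networking, retries, and garbage collection pauses.
print(can_start_work(now=10, lease_expires_at=18, expected_duration=5, safety_margin=2))  # True
print(can_start_work(now=12, lease_expires_at=18, expected_duration=5, safety_margin=2))  # False
```

Because the check runs in the thread that does the work, a stalled or dead work thread cannot keep the lease alive through a separate heartbeating thread.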
Characteristics of a Good Leader Election
Reliability: Have reliable metrics that show how much work a leader can do versus how much
it is doing now.
Scalability: Review the metrics often and make sure that there are plans for scaling in advance
of running out of capacity.
Flexibility: Make it easy to find which host is the current leader and which host was the leader
at any given time. Keep an audit trail or log of leadership changes.
Formal Verification Tools: Model and formally verify the correctness of distributed algorithms
using tools like TLA+.
Bug Tolerance: Formal verification catches subtle, difficult-to-observe, and rare bugs that can creep in when
an application assumes too much about the guarantees provided by the leader election
protocol.