Chapter 6
Introduction
6.1 Clock Synchronization
when each machine has its own clock, an event that occurred after
another event may nevertheless be assigned an earlier time
Physical Clocks
is it possible to synchronize all clocks in a distributed
system?
no; even if all computers are initially set to the same time, their clocks
drift apart because the quartz crystals in different computers oscillate
at slightly different frequencies; the resulting difference in clock
values is called clock skew
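a brief quantitative note (the standard bound, not stated on the slide): if every clock has a maximum drift rate ρ, so that

1 − ρ ≤ dC/dt ≤ 1 + ρ

then two clocks can drift apart by at most 2ρ·Δt during an interval Δt, so keeping the skew below some bound δ requires resynchronizing at least every δ/(2ρ) time units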
Clock Synchronization Algorithms
two situations:
if one machine has a receiver of UTC time, how do we synchronize all
the other machines to it?
if no machine has a receiver and each machine keeps track of its own
time, how do we synchronize them with one another?
Cristian’s Algorithm
a machine asks a time server (which has a UTC receiver) for the current
time and resets its clock to the reply
problem: message delays make the reply outdated by the time it arrives;
the delay has to be estimated, e.g., from the measured round-trip time
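a minimal sketch of this round-trip estimate (the ask_server_for_time callback and the use of a monotonic timer are assumptions made for illustration, not part of the slide):

import time

def cristian_sync(ask_server_for_time):
    # ask_server_for_time() is assumed to return the server's UTC time in seconds
    t0 = time.monotonic()              # client time just before sending the request
    server_time = ask_server_for_time()
    t1 = time.monotonic()              # client time just after receiving the reply
    round_trip = t1 - t0
    # assume the reply spent roughly half the round trip in transit, so the
    # server's clock has advanced by about round_trip / 2 since it answered
    return server_time + round_trip / 2.0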
The Berkeley Algorithm
in the previous algorithm, the time server is passive; the other
machines merely ask it for the time periodically
in Berkeley UNIX, a time daemon polls every machine from time to time
and asks for its time
it then calculates the average and sends messages to all
machines so that they will adjust their clocks accordingly
suitable when no machine has a UTC receiver
the time daemon's own clock must be set manually from time to time
if the machines do not communicate with the outside world, it may be
enough that the clocks are internally synchronized rather than close to
real time; in that case manual adjustment of the daemon's clock is not
required
a) the time daemon asks all the other machines for their clock values
b) the machines answer how far ahead or behind the time daemon they are
c) the time daemon tells everyone how to adjust their clock
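a minimal sketch of the averaging step just described (the offset bookkeeping and the returned adjustments are assumptions made for illustration):

def berkeley_round(reported_offsets):
    # reported_offsets: dict mapping machine -> (machine clock - daemon clock),
    # i.e., how far ahead (+) or behind (-) the daemon each machine says it is
    offsets = list(reported_offsets.values()) + [0.0]   # include the daemon itself
    average = sum(offsets) / len(offsets)
    # each machine should shift its clock so that everyone ends up at
    # daemon time + average; the daemon itself adjusts by `average`
    return {m: average - off for m, off in reported_offsets.items()}

# e.g., one machine 25 s ahead and one 10 s behind the daemon:
# berkeley_round({"A": +25.0, "B": -10.0}) -> {"A": -20.0, "B": +15.0}
# and the daemon moves its own clock forward by 5 s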
6.2 Logical Clocks
for some applications, it is sufficient if all machines agree on a
common time, even if it is not the real time
we need internal consistency of the clocks rather than being
close to the real time
hence the concept of logical clocks
what matters is the order in which events occur (e.g., our
make example)
Lamport’s Logical Clocks
Lamport defined a relation called happens-before
the expression a → b, read as “a happens before b”, means
all processes agree that first event a occurs, then event b
occurs
this relation can be observed in two situations
if a and b are events in the same process, and a occurs
before b, then a → b is true
if a is the event of a message being sent by one process,
and b is the event of the message being received by
another process, then a → b is also true
happens-before is a transitive relation
if a → b and b → c, then a → c
if two events, x and y, happen in different processes that do not
exchange messages (not even indirectly), then neither x → y nor y → x
is true; such events are said to be concurrent, meaning that nothing
can be said about the order in which they occurred
for every event a, we can assign a time value C(a) on which
all processes agree; if a → b, then C(a) < C(b)
to rephrase the previous situations
1. if a → b in the same process, then C(a) < C(b)
2. if a and b represent the sending and receiving of a
message, respectively, then C(a) < C(b)
Lamport's proposed algorithm for assigning times to events
consider three processes, each running on a different machine and
each with its own clock
the solution follows the happens-before relation
each message carries the sending time
if the receiver's clock shows a value earlier than the time the
message was sent, the receiver fast-forwards its clock to one more
than the sending time
a) three processes, each with its own clock; the clocks run at constant
rates, but the rates are different
b) Lamport's algorithm corrects the clocks
Implementation
each process maintains a local counter Ci. These
counters are updated as follows
before executing an event (i.e., sending a message over
the network or delivering a message to an application,
etc.) Pi executes Ci ← Ci + 1
this is required because between two events, the clock
must tick at least once, i.e., a process that sends or
receives two messages in succession must advance
its clock by one tick
when process Pi sends a message m to Pj, it sets m’s
timestamp ts(m) equal to Ci after having executed the
previous step
upon the receipt of a message m, process Pj adjusts its
own local counter as
Cj ← max{Cj , ts(m)}, after which it then executes the first
step and delivers the message to the application
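a minimal sketch of these update rules (the class and method names are illustrative assumptions):

class LamportClock:
    def __init__(self):
        self.counter = 0                     # Ci, the local logical clock

    def tick(self):
        # before executing an event: Ci <- Ci + 1
        self.counter += 1
        return self.counter

    def send(self):
        # sending is an event: increment first, then use Ci as ts(m)
        return self.tick()

    def receive(self, ts_m):
        # Cj <- max(Cj, ts(m)), then increment before delivering the message
        self.counter = max(self.counter, ts_m)
        return self.tick()

# e.g., p1 sends a message timestamped 1; p2 (still at 0) delivers it at time 2
p1, p2 = LamportClock(), LamportClock()
assert p2.receive(p1.send()) == 2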
additional requirement
no two events should ever have exactly the same time; i.e., for all
distinct events a and b, C(a) ≠ C(b)
to achieve this, attach the process number to the clock value
e.g., if events happen in processes 1 and 2 at time 40, then we have
40.1 and 40.2
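a minimal sketch of this tie-breaking rule, treating 40.1 and 40.2 as (counter, process id) pairs (the helper name is an assumption):

def total_order_ts(counter, pid):
    # tuples compare first by counter, then by process id, so ties are broken
    return (counter, pid)

assert total_order_ts(40, 1) < total_order_ts(40, 2)   # 40.1 < 40.2
assert total_order_ts(39, 2) < total_order_ts(40, 1)   # lower counter always wins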
6.3 Mutual Exclusion
when a process has to read or update shared data (quite
often to use a shared resource such as a printer, a file, etc.), it
first enters a critical region to achieve mutual exclusion
in single processor systems, critical regions are protected
using semaphores, monitors, and similar constructs
how are critical regions and mutual exclusion implemented in
distributed systems?
distributed mutual exclusion algorithms can be classified into
two: token-based and permission-based
a. Permission-based approach: based on getting permission
from other processes
three algorithms: centralized, decentralized, and
distributed
A Centralized Algorithm
a coordinator is appointed and is in charge of granting
permissions
three messages are required: request, grant, release
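a minimal sketch of the coordinator's side of this algorithm (the queue and the send callback are assumptions made for illustration):

from collections import deque

class Coordinator:
    def __init__(self):
        self.holder = None              # process currently in the critical region
        self.waiting = deque()          # queued requests

    def on_request(self, process, send):
        if self.holder is None:
            self.holder = process
            send(process, "GRANT")      # region is free: grant permission at once
        else:
            self.waiting.append(process)  # region is busy: queue, no reply yet

    def on_release(self, process, send):
        # the holder is done; grant the region to the next queued process, if any
        if self.waiting:
            self.holder = self.waiting.popleft()
            send(self.holder, "GRANT")
        else:
            self.holder = None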
shortcomings of the centralized algorithm: the coordinator is a single
point of failure and can become a performance bottleneck
two solutions: a decentralized algorithm, which is probabilistic (has a
probability of failure), and a distributed algorithm, which is
deterministic
A Distributed Algorithm
assume that there is a total ordering of all events in the
system, such as using Lamport timestamps
when a process wants to enter a critical region it builds a
message (containing name of the critical region, its
process number, current time) and sends it to everyone
including itself
the sending of a message is assumed to be reliable; i.e.,
every message is acknowledged
when a process receives a request message
1. if the receiver is already in the critical region, it does not
reply; instead it queues the request
2. if the receiver is not in a critical region and does not
want to enter it, it sends back an OK message to the
sender
3. if the receiver wants to enter the critical region but has
not yet done so, it compares the timestamp of the
message it sent with the incoming one; the lowest wins;
if the incoming message is lower, it sends an OK
message; otherwise it queues the incoming message
and does not do anything
when the sender gets a reply from all processes, it may
enter into the critical region
when it finishes it sends an OK message to all processes
in its queue
is it possible that two processes can enter into the critical
region at the same time if they initiate a message at the
same time? NO
a) two processes (0 and 2) want to enter the same critical
region at the same moment
b) process 0 has the lowest timestamp, so it wins
c) when process 0 is done, it sends an OK message, so
process 2 can now enter into its critical region
mutual exclusion is guaranteed; no deadlock, no starvation
the total number of messages to enter a critical region is
increased to 2(n-1), where n is the number of processes
no single point of failure; unfortunately, there are now n points of failure
there are also n bottlenecks, since every process is involved in
granting every request
hence, it is slower, more complicated, more expensive, and less robust
than the centralized algorithm; but it shows that a fully distributed
algorithm is possible
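a minimal sketch of the request-handling rules above (state names, the send_ok callback, and the use of (Lamport time, process id) pairs as timestamps are assumptions made for illustration):

class DistributedMutex:
    RELEASED, WANTED, HELD = "released", "wanted", "held"

    def __init__(self, pid):
        self.pid = pid
        self.state = self.RELEASED
        self.my_ts = None            # (lamport time, pid) of our pending request
        self.deferred = []           # senders queued instead of being answered

    def request_entry(self, lamport_time):
        # build our own request; broadcasting it and counting OKs is done elsewhere
        self.state = self.WANTED
        self.my_ts = (lamport_time, self.pid)

    def on_all_oks_received(self):
        self.state = self.HELD       # every other process has said OK: enter

    def on_request(self, req_ts, sender, send_ok):
        # rule 1: in the critical region -> queue the request, do not reply
        # rule 3: we also want in -> reply only if the incoming request is older
        defer = (self.state == self.HELD or
                 (self.state == self.WANTED and self.my_ts < req_ts))
        if defer:
            self.deferred.append(sender)
        else:
            send_ok(sender)          # rule 2, or rule 3 when the other side wins

    def on_exit(self, send_ok):
        # leaving the critical region: send OK to everyone we kept waiting
        self.state = self.RELEASED
        for sender in self.deferred:
            send_ok(sender)
        self.deferred.clear()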
b. Token-based solutions
a token is passed between the processes; a process that wants to
enter its critical region waits for the token, accesses the shared
resource, and then passes the token on
assume a bus network (e.g., Ethernet); no physical ordering of the
processes is required, but a logical ring is constructed in software
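a minimal sketch of one step of this token-ring scheme (the ring list and the two callbacks are assumptions made for illustration):

def token_ring_step(ring, holder_index, wants_to_enter, enter_critical_region):
    # ring: list of process ids forming the logical ring
    # holder_index: position in the ring of the process currently holding the token
    pid = ring[holder_index]
    if wants_to_enter(pid):
        enter_critical_region(pid)          # safe: nobody else holds the token
    return (holder_index + 1) % len(ring)   # pass the token to the next process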
6.4 Election Algorithms
there are situations where one process must act as a
coordinator, initiator, or perform some special task
assume that
each process has a unique number (say, its network address, assuming
there is one process per machine)
every process knows the process number of every other
process, but not the state of each process (which ones are
currently running and which ones are down)
the goal of an election algorithm is to ensure that all
processes agree on the new coordinator
two traditional algorithms: the Bully algorithm and the Ring
algorithm
The Bully Algorithm (the biggest person wins)
when a process (say P4) notices that the coordinator is no
longer responding to requests, it initiates an election as
follows
1. P4 sends an ELECTION message to all processes with
higher numbers (P5, P6, P7)
if a process gets an ELECTION message from one of
its lower-numbered colleagues, it sends an OK
message to the sender and holds an election
2. if no one responds, P4 wins the election and becomes the
coordinator
3. if one of the higher-ups answers, it takes over and P4's job is
done
a) Process 4 holds an election
b) Processes 5 and 6 respond, telling 4 to stop
c) Now 5 and 6 each hold an election
d) Process 6 tells 5 to stop
e) Process 6 wins and tells everyone
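a minimal sketch of these election steps (the messaging callbacks and the timeout handling are assumptions made for illustration):

def bully_election(my_id, all_ids, send_election, wait_for_any_ok, announce):
    # send_election(p): send an ELECTION message to higher-numbered process p
    # wait_for_any_ok(): True if at least one OK arrives before a timeout
    # announce(winner): broadcast the COORDINATOR message
    higher = [p for p in all_ids if p > my_id]
    for p in higher:
        send_election(p)            # step 1: challenge all higher-numbered processes
    if not higher or not wait_for_any_ok():
        announce(my_id)             # step 2: nobody objected, this process wins
        return my_id
    # step 3: a higher-numbered process answered OK and now runs its own
    # election; this process just waits for the eventual COORDINATOR message
    return None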
The Ring Algorithm
e.g., processes 2 and 5 simultaneously discover that the previous
coordinator has crashed
each builds an ELECTION message and circulates it around the ring, and
a coordinator is chosen
although the election is run twice, this does no harm; it only costs a
little extra bandwidth
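a minimal sketch of a standard ring election consistent with this example (the ring layout and the simulation-style message passing are assumptions; each process adds its own id to the circulating ELECTION message and the highest id becomes coordinator):

def ring_election(ring, initiator_index):
    # ring: ids of the live processes, in logical ring order
    collected = []                          # ids gathered as the message circulates
    i = initiator_index
    while True:
        collected.append(ring[i])           # each process appends its own id
        i = (i + 1) % len(ring)             # forward to the next live neighbour
        if i == initiator_index:
            break                           # the message is back at the initiator
    return max(collected)                   # a COORDINATOR message naming this id
                                            # would now be circulated

# e.g., elections started at two different processes pick the same coordinator
assert ring_election([3, 6, 0, 2, 5], 3) == ring_election([3, 6, 0, 2, 5], 0) == 6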