Synchronization: CS403/534 Distributed Systems Erkay Savas Sabanci University
Part II
Election Algorithms
• Issue:
– Many distributed algorithms require that one process
act as a coordinator (initiator, etc).
– In many systems this is done manually, which leads to
centralized solutions with a single point of failure.
– The question is how to select this special process
dynamically.
• Advantage of dynamic selection
– If the coordinator dies, then another coordinator can
be elected dynamically.
• Assumptions:
– Every process has a unique ID and processes know each
other’s ID
– A running process with the highest ID becomes the
coordinator
The Bully Algorithm
• Algorithm: When a process P notices that the
coordinator is not responding, it holds an election
as follows:
1. Process P sends an ELECTION message to all processes
with higher ID numbers
2. If no one responds, P wins the election and becomes
coordinator
3. If an OK message is returned by a higher-numbered process, P
gives up and the responding process holds its own election.
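The three steps above can be sketched as a small simulation. This is only an illustration, not a networked implementation: message passing is replaced by a set lookup, and the function name `bully_election` and the scenario (process 4 notices that coordinator 7 has crashed) are assumptions made for the example.

```python
def bully_election(initiator, alive):
    """Return the ID of the coordinator elected when `initiator` starts an election.

    `alive` is the set of IDs of the processes that are currently running.
    """
    # Step 1: ELECTION goes to all processes with a higher ID; only live ones answer.
    responders = [p for p in alive if p > initiator]
    if not responders:
        return initiator  # Step 2: nobody responded, so the initiator wins.
    # Step 3: a responder sent OK and takes over. Which responder goes first
    # does not change the outcome, so this sketch simply picks the lowest.
    return bully_election(min(responders), alive)


# Assumed scenario: processes 0..7, coordinator 7 has crashed, 4 notices.
alive = {0, 1, 2, 3, 4, 5, 6}
print(bully_election(4, alive))  # prints 6
```

The highest running ID always wins, which is where the algorithm gets its name.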
The Bully Algorithm: Example (1)
[Figure: bully election among processes 0–7, shown in several steps. Message labels: election, OK, Coordinator.]
A Ring Algorithm (1)
• Principle:
– Processes are arranged on a ring and the process with the
highest ID number is elected as coordinator
• Algorithm:
– Any process that notices the coordinator is down can
start the election.
– It first builds an ELECTION message containing its
own ID number
– Sends this message to its successor
– Each process that receives this message adds its own ID
number and passes it along
– Eventually, the message comes back to the initiator
with a list of alive processes.
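A minimal sketch of one trip of the ELECTION message around the ring is given below. The function name `ring_election`, the ring layout 0..7, and the choice of 5 as initiator are assumptions for illustration; crashed processes are modelled simply as IDs missing from `alive`.

```python
def ring_election(ring, alive, initiator):
    """Circulate an ELECTION message once around the ring.

    `ring` lists process IDs in successor order; `alive` is the set of running IDs.
    Returns (IDs collected by the message, elected coordinator).
    """
    collected = [initiator]              # the message starts with the initiator's ID
    i = (ring.index(initiator) + 1) % len(ring)
    while ring[i] != initiator:          # stop when the message is back at the initiator
        if ring[i] in alive:             # crashed processes are skipped
            collected.append(ring[i])    # each receiver adds its own ID
        i = (i + 1) % len(ring)
    return collected, max(collected)     # the highest ID on the list becomes coordinator


ring = [0, 1, 2, 3, 4, 5, 6, 7]
alive = {0, 1, 2, 3, 4, 5, 6}            # assumed: the old coordinator 7 has crashed
members, coordinator = ring_election(ring, alive, initiator=5)
print(members, coordinator)              # [5, 6, 0, 1, 2, 3, 4] 6
```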
A Ring Algorithm (2)
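[Figure: ring of processes 0–7 with ELECTION messages circulating. One message grows as [5], [5,6], [5,6,0], [5,6,0,1]; another grows as [2], [2,3]. One process gives no response; 6 is the new coordinator.]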
[Figure: mutual exclusion example with processes 1 and 2. Labels shown: OK; 2 enters critical region.]
A Token Ring Algorithm
[Figure: processes 0–7 arranged in a ring.]
Transaction Primitives
Primitive Description
(a)
BEGIN_TRANSACTION
    reserve ANK -> IST;
    reserve IST -> CHI;
    reserve CHI -> LAX;
END_TRANSACTION

(b)
1. BEGIN_TRANSACTION
2. reserve ANK -> IST; OK
3. reserve IST -> CHI; OK
4. reserve CHI -> LAX; full
5. ABORT_TRANSACTION
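The all-or-nothing behaviour of the reservation example can be sketched in a few lines. This is a single-process illustration under simple assumptions: the helper `reserve_all`, the `Full` exception, and the seat counts are all hypothetical, and the transaction works on a copy that is only published on success.

```python
class Full(Exception):
    """Raised when a flight has no free seats (hypothetical error type)."""


def reserve_all(seats, legs):
    """Reserve every leg or none of them; return True on commit, False on abort."""
    tentative = dict(seats)          # BEGIN_TRANSACTION: work on a copy
    try:
        for leg in legs:
            if tentative[leg] == 0:
                raise Full(leg)
            tentative[leg] -= 1      # reserve one seat on this leg
    except Full:
        return False                 # ABORT_TRANSACTION: the copy is discarded
    seats.update(tentative)          # END_TRANSACTION: changes become visible
    return True


# Seat counts are made up; the last leg is full, as in case (b).
seats = {"ANK->IST": 2, "IST->CHI": 1, "CHI->LAX": 0}
ok = reserve_all(seats, ["ANK->IST", "IST->CHI", "CHI->LAX"])
print(ok, seats)  # False {'ANK->IST': 2, 'IST->CHI': 1, 'CHI->LAX': 0}
```

Although the first two reservations succeeded, none of them is visible after the abort.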
Transaction: ACID Properties
• Atomic:
– To the outside world, a transaction is indivisible.
• Consistent:
– The transaction does not violate system invariants.
– e.g., the law of conservation of money in a bank.
– A transaction establishes a valid state transition.
– Invalid, intermediate states during the transaction’s
execution are possible
• Isolated:
– Concurrent transactions do not interfere with each
other.
– Transactions are isolated or serializable.
• Durable:
– Once a transaction commits, the changes are permanent
Classification: Flat Transaction
• The simplest and most familiar one. Satisfies the
ACID properties.
• Problem:
– It does not allow partial results to be committed or
aborted.
• Example: flight reservation
– It would be useful to commit whichever subset of the
three reservations succeeds and deal with the rest
later.
Classification: Nested Transactions
• A nested transaction is constructed from a
number of subtransactions
• Parent transaction may fork off subtransactions
that may execute in isolation (possibly on
different machines)
• Once a subtransaction commits, its state will
become visible to the parent transaction.
• If the parent transaction aborts, all child
subtransactions have to abort as well
– Durability therefore applies only to the top-level
transaction, not to individual subtransactions
• Conceptually, a subtransaction is given a private copy of
all the data in the entire system to manipulate.
Classification: Distributed Transactions
• A distributed transaction is a flat, indivisible
transaction that operates on distributed data.
• Nested transactions generally follow logical
division of the work of the original transaction.
– A subtransaction can still be distributed
• The main problem with distributed transactions is
that they require distributed locking algorithms.
Implementing Transactions
• Solution 1: private workspace
• Case Study: Transactions on file system
• When a process starts a transaction, it is given a
private workspace to work in
• All the reads and writes are done on this private
workspace, not on the original disk blocks.
• All the other processes see the original blocks of
file until the transaction commits
• The transaction will have a private view of the
file system
• After committing, this view becomes the globally visible one
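The private-workspace idea can be illustrated with a small in-memory sketch. The class `Workspace` and the dictionary standing in for disk blocks are assumptions for the example; a real file system would copy index blocks and data blocks rather than Python objects.

```python
class Workspace:
    """Private view of a block store: reads fall through, writes stay private."""

    def __init__(self, blocks):
        self.blocks = blocks      # shared "disk": block number -> contents
        self.private = {}         # blocks written by this transaction

    def read(self, n):
        if n in self.private:
            return self.private[n]
        return self.blocks[n]     # unmodified blocks are read from the original

    def write(self, n, data):
        self.private[n] = data    # the original block is left untouched

    def commit(self):
        self.blocks.update(self.private)  # the private view replaces the original
        self.private = {}

    def abort(self):
        self.private = {}         # the workspace is simply thrown away


disk = {0: "a", 1: "b", 2: "c"}
ws = Workspace(disk)
ws.write(0, "A")                  # modify block 0
ws.write(3, "d")                  # append a new block 3
print(ws.read(0), disk[0])        # A a
ws.commit()
print(disk)                       # {0: 'A', 1: 'b', 2: 'c', 3: 'd'}
```

Until `commit` is called, other users of `disk` keep seeing the original blocks.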
Private Workspace
[Figure: the original index (blocks 0, 1, 2) next to the private workspace index, which refers to private blocks 0’ and 3’ taken from the free blocks.]
• a) A transaction
• b) – d) The log before each statement is executed
Concurrency Control (1)
• Problem
– Increase efficiency by allowing several transactions to
execute at the same time
– This must be done in such a way that the consistency and
isolation properties are not violated.
• Constraint
– Effect should be the same as if the transactions were
executed in some serial (sequential) order.
– Different transactions must therefore access the same
data items in a specific order,
– so that the final result is the same as if all
transactions had run sequentially.
Concurrency Control: Layered Organization
[Figure: layers from top to bottom]
• Transactions
• Transaction manager: atomicity
• Scheduler: controlling concurrency (isolation & consistency)
• Data manager: execute read/write
• Labels between layers: READ/WRITE; BEGIN_TRANSACTION, END_TRANSACTION; LOCK/RELEASE or timestamp operations
• General organization of managers for handling distributed transactions
[Figure: a transaction manager and several schedulers.]
Serializability: Example
(a) T1:               (b) T2:               (c) T3:
BEGIN_TRANSACTION     BEGIN_TRANSACTION     BEGIN_TRANSACTION
x = 0;                x = 0;                x = 0;
x = x + 1;            x = x + 2;            x = x + 3;
END_TRANSACTION       END_TRANSACTION       END_TRANSACTION

(d)
Schedule 1:  x = 0; x = x + 1; x = 0; x = x + 2; x = 0; x = x + 3;   Legal/serialized
Schedule 2:  x = 0; x = 0; x = x + 1; x = x + 2; x = 0; x = x + 3;   Legal/unserialized
Schedule 3:  x = 0; x = 0; x = x + 1; x = 0; x = x + 2; x = x + 3;   Illegal
• a) – c) Three transactions T1, T2, and T3
• d) Possible schedules
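The three schedules can be checked mechanically. The sketch below replays each schedule and compares the final value of x with the values that the serial orders of T1, T2, and T3 can produce. Comparing only the final state is a simplifying assumption that is enough for this example; the names `run` and `serial_results` are illustrative.

```python
from itertools import permutations

# Each transaction is a list of steps; a step maps the old x to the new x.
T1 = [lambda x: 0, lambda x: x + 1]
T2 = [lambda x: 0, lambda x: x + 2]
T3 = [lambda x: 0, lambda x: x + 3]


def run(schedule):
    """Apply the steps of a schedule in order and return the final x."""
    x = None
    for step in schedule:
        x = step(x)
    return x


# Final values reachable by running T1, T2, T3 one after another in any order.
serial_results = {run([step for txn in order for step in txn])
                  for order in permutations([T1, T2, T3])}

schedules = {
    "Schedule 1": T1 + T2 + T3,
    "Schedule 2": [T1[0], T2[0], T1[1], T2[1], T3[0], T3[1]],
    "Schedule 3": [T1[0], T2[0], T1[1], T3[0], T2[1], T3[1]],
}
for name, schedule in schedules.items():
    x = run(schedule)
    print(name, x, "legal" if x in serial_results else "illegal")
# Schedule 1 3 legal
# Schedule 2 3 legal
# Schedule 3 5 illegal
```

Schedule 3 ends with x = 5, a value no serial execution can produce, which is why it is illegal.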
Serializability : Conflicts
• Model of computation:
– Since we are not interested in the specific
computations of each transaction, a transaction can be
modeled as a series of read and write operations on
data items.
• Conflicting operations:
– Two operations oper(Ti, x) and oper(Tj, x) of two
transactions Ti and Tj on the same data item x may
conflict at the data manager.
– read-write conflict: One operation is a read and the
other is a write
– write-write conflict: Both are write operations
– Two read operations never conflict
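The conflict rules above translate directly into a predicate. The `Op` record and the function name `conflicts` are assumptions introduced for this sketch.

```python
from typing import NamedTuple


class Op(NamedTuple):
    txn: str     # transaction name, e.g. "T1"
    kind: str    # "read" or "write"
    item: str    # data item, e.g. "x"


def conflicts(a: Op, b: Op) -> bool:
    """Two operations conflict if they belong to different transactions,
    touch the same data item, and at least one of them is a write."""
    return (a.txn != b.txn
            and a.item == b.item
            and "write" in (a.kind, b.kind))


print(conflicts(Op("T1", "read", "x"), Op("T2", "write", "x")))   # True  (read-write)
print(conflicts(Op("T1", "write", "x"), Op("T2", "write", "x")))  # True  (write-write)
print(conflicts(Op("T1", "read", "x"), Op("T2", "read", "x")))    # False (two reads)
print(conflicts(Op("T1", "write", "x"), Op("T2", "write", "y")))  # False (different items)
```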
Synchronization
• Concurrency control algorithms can generally be
classified by looking at the way read and write
operations are synchronized
• Synchronization can be achieved through
– A mutual exclusion mechanism on shared data (i.e.,
locking). Before a data item is read or written, a
lock must be obtained.
– Explicit ordering of operations using timestamps. Data
managers are forced to execute reads and writes
in timestamp order
• Optimistic and pessimistic concurrency control
Two-Phase Locking (2PL)
• Transactions indicate their intentions by
requesting locks from the scheduler (a.k.a. lock
manager).
• Locks are either read locks (a.k.a. shared locks) or
write locks (a.k.a. exclusive locks)
• Read locks and write locks conflict (because read
and write operations are incompatible)
             Read lock   Write lock
Read lock    Yes         No
Write lock   No          No
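The compatibility table can be turned into a very small lock table. This is a sketch under simplifying assumptions: requests that cannot be granted just return False instead of being queued, and the names `COMPATIBLE` and `LockTable` are made up for the example.

```python
# Compatibility table: may the requested lock be granted while another
# transaction holds the given lock on the same item? Key: (held, requested).
COMPATIBLE = {
    ("read", "read"): True,
    ("read", "write"): False,
    ("write", "read"): False,
    ("write", "write"): False,
}


class LockTable:
    """Minimal lock table: item -> list of (transaction, mode) currently held."""

    def __init__(self):
        self.held = {}

    def request(self, txn, item, mode):
        """Grant the lock and return True, or return False if txn must wait."""
        for holder, held_mode in self.held.get(item, []):
            if holder != txn and not COMPATIBLE[(held_mode, mode)]:
                return False
        self.held.setdefault(item, []).append((txn, mode))
        return True

    def release_all(self, txn):
        """Drop every lock held by txn, e.g. when it commits."""
        for item in self.held:
            self.held[item] = [(t, m) for t, m in self.held[item] if t != txn]


locks = LockTable()
print(locks.request("T1", "x", "read"))    # True
print(locks.request("T2", "x", "read"))    # True: read locks are shared
print(locks.request("T2", "x", "write"))   # False: T1 still holds a read lock
locks.release_all("T1")
print(locks.request("T2", "x", "write"))   # True
```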
Strict-2PL
• Strict two-phase locking.
Strict-2PL: Advantages
1. A transaction only reads values written by
committed transactions;
– therefore a transaction never has to abort because
another transaction aborts
– the extreme case avoided this way is cascaded aborts.
2. All lock acquisitions and releases can be handled
by the system automatically;
– locks are acquired whenever a data item is to be
accessed and released when the transaction commits.
2PL: Example (1)
• Initial values: x = y = z = 50;
• Two Transactions:
T1: T2:
x = x + 100 z = 2z + x
y = y + 150 x = 2x
z = y + z y = 2y
• (Serialized) order T1 → T2:
After T1: x = 150 y = 200 z = 250
After T2: x = 300 y = 400 z = 650
• (Serialized) order T2 → T1:
After T2: x = 100 y = 100 z = 150
After T1: x = 200 y = 250 z = 400
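The two serial orders can be recomputed directly from the statements of T1 and T2. The helper `run_serial` is an assumption for this sketch; the transaction bodies are copied from the example.

```python
def t1(s):
    s["x"] = s["x"] + 100
    s["y"] = s["y"] + 150
    s["z"] = s["y"] + s["z"]


def t2(s):
    s["z"] = 2 * s["z"] + s["x"]
    s["x"] = 2 * s["x"]
    s["y"] = 2 * s["y"]


def run_serial(*transactions):
    """Run the transactions one after another on the initial state."""
    state = {"x": 50, "y": 50, "z": 50}
    for txn in transactions:
        txn(state)
    return state


print(run_serial(t1, t2))  # {'x': 300, 'y': 400, 'z': 650}
print(run_serial(t2, t1))  # {'x': 200, 'y': 250, 'z': 400}
```

Any schedule produced under 2PL must end in one of these two states.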
Ordering without 2PL: Example
• Initial values: x = y = z = 50
T1                             T2
lock x                         lock z
modify x (x=150) x=x+100       wait for lock on x
unlock x                       …
lock y                         lock x
modify y (y=200) y=y+150       modify z (z=250) z=2z+x
unlock y                       unlock z
lock z                         modify x (x=300) x=2x
modify z (z=450) z=y+z         lock y
unlock z                       modify y (y=400) y=2y
                               unlock x and y
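Replaying one interleaving that the lock and unlock steps above permit shows why releasing locks early is a problem. The order of statements in `interleaving` is an assumption chosen to reproduce the values in the table; the two serial results are computed from the statements of T1 and T2.

```python
state = {"x": 50, "y": 50, "z": 50}

# (transaction, statement) pairs in an order allowed by the locks in the table.
interleaving = [
    ("T1", lambda s: s.update(x=s["x"] + 100)),         # T1 holds x, then unlocks it
    ("T2", lambda s: s.update(z=2 * s["z"] + s["x"])),  # T2 holds z and x, sees x = 150
    ("T2", lambda s: s.update(x=2 * s["x"])),
    ("T1", lambda s: s.update(y=s["y"] + 150)),         # T1 holds y, then unlocks it
    ("T1", lambda s: s.update(z=s["y"] + s["z"])),      # T1 locks z after T2 unlocked it
    ("T2", lambda s: s.update(y=2 * s["y"])),           # T2 locks y after T1 unlocked it
]
for _, statement in interleaving:
    statement(state)

print(state)  # {'x': 300, 'y': 400, 'z': 450}

# Results of the two serial orders (T1 then T2, T2 then T1).
serial = [{"x": 300, "y": 400, "z": 650}, {"x": 200, "y": 250, "z": 400}]
print(state in serial)  # False
```

The final state matches neither serial order, so this schedule is not serializable.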
Timestamp Ordering (1)
• Basic idea:
– Transaction manager assigns a unique timestamp
ts(Ti) to each transaction Ti.
– Every operation in Ti is timestamped with ts(Ti).
– Each data item x also has a read timestamp,
tsRD(x), and a write timestamp, tsWR(x).
– tsRD(x) is set to the timestamp of the transaction
that has most recently read x.
– Similarly, tsWR(x) is set to the timestamp of the
transaction that has most recently changed x.
– In case of conflicting operations on data item x, the
data manager processes the one with the lowest
timestamp first.
Timestamp Ordering (2)
• Suppose: two conflicting operations
read(Ti, x), write(Tj, x)
– ts(Ti) < ts(Tj), but write(Tj, x) has already
been processed by the data manager
– Then, the scheduler must reject read(Ti, x)
• Note: Timestamp ordering is rather aggressive:
if a single operation is rejected, the whole
transaction has to be aborted
• Example: read(T, x)
– if ts(T) < tsWR(x) then T must be aborted
– if ts(T) > tsWR(x) then read(T, x) can be
executed (tsRD(x):= max(ts(T), tsRD(x)))
Timestamp Ordering (3)
• Example: write(T, x)
– if ts(T) < tsRD(x) then T must be aborted
– if ts(T) > tsRD(x) then write(T, x) can be
executed (tsWR(x):= max(ts(T), tsWR(x)))
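The read and write rules can be sketched as two functions on a data item. The class names `Item` and `Abort` are assumptions for the example. Two details go beyond what the rules state: a timestamp equal to the one stored on the item is accepted, and a write updates tsWR(x) in the same way a read updates tsRD(x).

```python
class Abort(Exception):
    """Raised when an operation arrives too late and its transaction must abort."""


class Item:
    def __init__(self, value, ts=0):
        self.value = value
        self.ts_rd = ts   # tsRD(x): timestamp of the most recent reader
        self.ts_wr = ts   # tsWR(x): timestamp of the most recent writer


def read(ts, item):
    if ts < item.ts_wr:                   # a younger transaction already wrote x
        raise Abort(f"read by ts={ts} rejected")
    item.ts_rd = max(ts, item.ts_rd)
    return item.value


def write(ts, item, value):
    if ts < item.ts_rd:                   # a younger transaction already read x
        raise Abort(f"write by ts={ts} rejected")
    item.ts_wr = max(ts, item.ts_wr)      # assumed update, mirroring the read rule
    item.value = value


x = Item(value=50, ts=1)
write(3, x, 60)          # accepted: tsWR(x) becomes 3
try:
    read(2, x)           # ts = 2 < tsWR(x) = 3, so the reader must abort
except Abort as e:
    print(e)             # read by ts=2 rejected
print(read(4, x))        # 60
```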
Pessimistic Timestamp Ordering: Example (1)
• ts(T1) << ts(T2) < ts(T3).
• Initially, tsRD(x) = tsWR(x) = ts(T1)
• T2 is trying to write x
[Figure: timelines comparing ts(T2) with tsRD(x), where x was last accessed by T2 or T3. Outcomes shown: write is accepted; write is not accepted.]
Pessimistic Timestamp Ordering: Example (2)
• ts(T1) << ts(T4) < ts(T2) < ts(T3)
• Initially, tsRD(x) = tsWR(x) = ts(T1)
• T2 is trying to read x
[Figure: timelines for T2's read of x. With tsWR(x) before ts(T2): no conflict. With ts(T2) before tsWR(x): abort. Further cases shown: no conflict, T2 just waits; abort.]
Optimistic Concurrency Control
• Observation:
1. Maintaining locks is expensive
2. In practice, there are not many conflicts
• Alternative
– Go ahead immediately with all operations, use
tentative writes everywhere (shadow or private
copies), solve conflicts while committing
• Phases:
1. Allow operations tentatively
2. Validate effects (through checking all transactions)
3. Make updates permanent
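The three phases can be sketched with a versioned in-memory store. Validating by per-item version numbers is only one possible way to check for conflicts and is an assumption of this sketch, as are the class names `Store` and `Transaction`.

```python
class Store:
    """Key-value store in which every key carries a version counter."""

    def __init__(self, data):
        self.data = dict(data)
        self.version = {key: 0 for key in data}


class Transaction:
    def __init__(self, store):
        self.store = store
        self.read_versions = {}   # key -> version seen at first read
        self.writes = {}          # tentative (private) writes

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        self.read_versions.setdefault(key, self.store.version[key])
        return self.store.data[key]

    def write(self, key, value):
        self.writes[key] = value  # phase 1: tentative, nothing is locked

    def commit(self):
        # Phase 2: validate that nothing this transaction read has changed.
        for key, seen in self.read_versions.items():
            if self.store.version[key] != seen:
                return False      # conflict: the caller should retry
        # Phase 3: make the updates permanent.
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.version[key] = self.store.version.get(key, 0) + 1
        return True


store = Store({"x": 50})
a, b = Transaction(store), Transaction(store)
a.write("x", a.read("x") + 1)
b.write("x", b.read("x") + 2)
print(a.commit(), b.commit(), store.data)  # True False {'x': 51}
```

Both transactions run without waiting; the conflict is detected only when the second one tries to commit.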