Consistency and Replication
Part I
CS403/534
Distributed Systems
Erkay Savas
Sabanci University
1
Consistency and Replication
• Reasons
1. Enhance reliability
2. Improve performance
• Major problems:
– Keep the replicated data consistent
– Extra bandwidth is needed between the replicas to
communicate updates
• Overview
– Consistency models: data-centric, client-centric
– Implementation of consistency models
2
Replication as a Scaling Technique
• Main problem: to keep replicas consistent.
– Updates must be propagated to every replica
– All conflicting (update) operations must be performed in
the same order everywhere
• Issues involved:
– Negative impact on bandwidth requirements
– Enforcing an ordered relationship between events
requires synchronization: physical or logical clocks
– How seriously we should take the strict (tight)
consistency approach
– To what extent consistency can be loosened depends highly
• on the purpose for which the data are used
• on the access & update patterns of the replicated
data
3
Data-Centric Consistency Models
• Data store:
– A term used to refer to physically distributed
shared data
– Read and write operations are of concern
– Each process has a replica nearby
[Figure: the logical data store is physically distributed and replicated; each process operates on its local (nearby) copy]
5
Strong Models: Strict Consistency
• Any read on a shared data item x returns a value
corresponding to the result of the most recent write
on x.
• All writes are instantaneously visible to all processes
• It is impossible to implement in distributed systems
since strict consistency relies on absolute global time
7
Sequential Consistency: Example
8
Sequential Consistency: Linearizability
• Linearizability is similar to sequential
consistency, but stronger than it (and weaker
than strict consistency)
• Operations are assumed to receive a timestamp
using a globally available clock
– one with finite precision
• If tsOP1(x) < tsOP2(x) then OP1(x) should
precede OP2(x) in the sequence
• Linearizable data is also sequentially consistent
• Processes use loosely synchronized clocks
9
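• A minimal Python sketch of this ordering rule (the Op class and the
linearize function below are illustrative assumptions, not from the slides):
each operation carries a timestamp from the globally available clock, and
the single agreed-upon sequence applies operations in timestamp order.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Op:
    ts: float                  # timestamp from the (finite-precision) global clock
    kind: str                  # 'W' for write, 'R' for read
    var: str                   # shared data item, e.g. 'x'
    value: Optional[int] = None

def linearize(ops):
    """Apply operations in timestamp order: tsOP1(x) < tsOP2(x) => OP1 first."""
    store, reads = {}, []
    for op in sorted(ops, key=lambda o: o.ts):
        if op.kind == 'W':
            store[op.var] = op.value
        else:
            reads.append((op.ts, op.var, store.get(op.var)))
    return reads

# e.g. linearize([Op(2.0, 'R', 'x'), Op(1.0, 'W', 'x', 1)]) returns [(2.0, 'x', 1)]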
Sequential Consistency : Example (1)
Process P1: x = 1; print(y, z);
Process P2: y = 1; print(x, z);
Process P3: z = 1; print(x, y);
12
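• A small Python sketch of what this example permits (the code and the
assumption that x, y, z start at 0 are illustrative, not from the slides):
it enumerates every interleaving that respects each process's program order
and collects the print-output signatures that sequential consistency allows.

from itertools import permutations

# each process issues one assignment followed by one print (program order)
procs = {
    'P1': [('assign', 'x'), ('print', ('y', 'z'))],
    'P2': [('assign', 'y'), ('print', ('x', 'z'))],
    'P3': [('assign', 'z'), ('print', ('x', 'y'))],
}

def valid_interleavings():
    labels = [(p, i) for p in procs for i in range(2)]
    for order in permutations(labels):
        # keep only orderings in which each process's statements stay in issue order
        if all(order.index((p, 0)) < order.index((p, 1)) for p in procs):
            yield order

signatures = set()
for order in valid_interleavings():
    mem = {'x': 0, 'y': 0, 'z': 0}        # assumed initial values
    out = {}
    for p, i in order:
        op, arg = procs[p][i]
        if op == 'assign':
            mem[arg] = 1
        else:
            out[p] = ''.join(str(mem[v]) for v in arg)
    signatures.add(out['P1'] + out['P2'] + out['P3'])

print(len(signatures), 'distinct output signatures are sequentially consistent')

• Any signature outside this set (for example one in which every process
prints before seeing any other write) cannot be produced by a sequentially
consistent data store.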
Causal Consistency
• Necessary condition:
– Writes that are potentially causally related must be
seen by all processes in the same order.
– Concurrent writes (that are not causally related)
may be seen in a different order on different
machines.
• Two writes become causally related when one process
reads a value written by another and then issues a
write that may depend on what it read
• Vector timestamps are used to implement
causal consistency
13
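• A minimal vector-timestamp sketch (class and function names are
illustrative assumptions, not from the slides): each process advances its
own entry on a write and merges in the timestamp of any value it reads, so
a later write becomes causally dependent on the write it read.

class VectorClock:
    def __init__(self, num_procs, pid):
        self.v = [0] * num_procs
        self.pid = pid

    def local_write(self):
        self.v[self.pid] += 1             # a write advances this process's entry
        return list(self.v)               # timestamp attached to the write

    def observe(self, remote_ts):
        # reading a remotely written value merges its timestamp in
        self.v = [max(a, b) for a, b in zip(self.v, remote_ts)]

def causally_precedes(ts_a, ts_b):
    """ts_a happened-before ts_b iff ts_a <= ts_b componentwise and they differ."""
    return all(a <= b for a, b in zip(ts_a, ts_b)) and ts_a != ts_b

• Writes whose timestamps are ordered by causally_precedes must be applied
in that order at every replica; writes with incomparable timestamps are
concurrent and may be applied in different orders.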
Causal Consistency: Example (1)
14
Causal Consistency: Example (2)
15
FIFO Consistency
• Necessary Condition:
– Writes done by a single process are seen by all
other processes in the order in which they
were issued, but writes from different
processes may be seen in a different order by
different processes.
• Weaker form of strong consistency
• All writes generated by different
processes are concurrent
16
FIFO Consistency: Example (1)
[Figure: FIFO consistency event diagram; P1: W(x)a ...]
17
FIFO Consistency: Example (2)
Process P1: x = 1; print(y, z);
Process P2: y = 1; print(x, z);
Process P3: z = 1; print(x, y);
Statement execution as seen by each process:
As seen by P1: x = 1; print(y, z); y = 1; print(x, z); z = 1; print(x, y);
As seen by P2: x = 1; y = 1; print(x, z); print(y, z); z = 1; print(x, y);
As seen by P3: y = 1; print(x, z); z = 1; print(x, y); x = 1; print(y, z);
Process P1: x = 1; if (y == 0) kill(P2);
Process P2: y = 1; if (x == 0) kill(P1);
• Under sequential consistency at most one process can be
killed; under FIFO consistency both may be killed, since
each process may see the other's write only after its own test
19
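• A minimal Python sketch of the check behind Example (2) (the function and
the string encoding of statements are illustrative assumptions, not from the
slides): every view lists each process's statements in the order they were
issued, which is what the figure illustrates; FIFO consistency itself only
requires this for the writes of each single process.

def fifo_ok(programs, views):
    """Each view must list every process's statements in issue order."""
    for view in views.values():
        for prog in programs.values():
            pos = [view.index(stmt) for stmt in prog]
            if pos != sorted(pos):
                return False
    return True

programs = {                      # the three programs from the example
    'P1': ['x = 1', 'print(y, z)'],
    'P2': ['y = 1', 'print(x, z)'],
    'P3': ['z = 1', 'print(x, y)'],
}
views = {                         # statement order as seen by each process
    'P1': ['x = 1', 'print(y, z)', 'y = 1', 'print(x, z)', 'z = 1', 'print(x, y)'],
    'P2': ['x = 1', 'y = 1', 'print(x, z)', 'print(y, z)', 'z = 1', 'print(x, y)'],
    'P3': ['y = 1', 'print(x, z)', 'z = 1', 'print(x, y)', 'x = 1', 'print(y, z)'],
}
print(fifo_ok(programs, views))   # True, even though no single interleaving produces these views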
Weak Consistency Model
• Basic idea:
– individual read & write operations are not
immediately made known to other processes.
– Final effect is communicated (as in transactions)
– synchronization variable (S) (a lock or barrier)
– A synchronization variable has only one
operation: synchronize(S)
– A process gains exclusive access to a critical
region through synchronization operation
– While a process is in the critical region, its local
copy is allowed to be inconsistent with the others
– The synchronize operation pushes local updates to the
other replicas and brings remote updates into the
local replica
20
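• A minimal sketch of this idea (the LocalReplica class and its method
names are illustrative assumptions, not from the slides): reads and writes
touch only the local copy, and synchronize() both pushes pending local
updates to the other copies and pulls remote updates back in.

class LocalReplica:
    def __init__(self, shared_store):
        self.shared = shared_store        # dict standing in for all other copies
        self.local = dict(shared_store)   # this process's (possibly stale) view
        self.pending = {}                 # local writes not yet propagated

    def write(self, key, value):          # not immediately visible elsewhere
        self.local[key] = value
        self.pending[key] = value

    def read(self, key):                  # may return an outdated value
        return self.local.get(key)

    def synchronize(self):                # the single operation on S
        self.shared.update(self.pending)  # push local updates out
        self.pending.clear()
        self.local.update(self.shared)    # bring remote updates in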
Weak Consistency: Properties
1. Accesses to synchronization variables are sequentially consistent
• If processes P1 and P2 call synchronize(S), the
execution order of these operations will be the same
everywhere
2. Synchronization flushes the pipeline
• It forces all writes that are in progress, partially
completed, or completed at some local copies but not
all, to complete everywhere.
3. When data items are accessed, either for
reading or writing, all previous synchronizations
will have been completed.
• By doing a synchronization before reading shared
data, a process can be sure of getting the most
recent values.
21
Weak Consistency: Properties
• A good consistency model when isolated accesses
to shared data are rare
• With weak consistency, sequential consistency is
enforced between groups of operations
• Synchronization variables are used to delimit those
groups
• The weak consistency model tolerates a greater
degree of inconsistency for a limited amount of
time.
22
Weak Consistency: Example
[Figure: weak consistency example event diagram; P2: S R(x)b S]
23
Release Consistency (1)
• Drawbacks of weak consistency: When a synchronization
variable is accessed by a process, the data store does not
know if
– the process is finished writing data (exiting critical region)
– or it is just about to read data (entering critical region)
• Therefore, whenever an access to a synchronization variable
is made, the local data store must do two things:
1. Propagate all locally initiated writes to all other
copies
2. Gather in all writes from other copies
• If the data store could distinguish entering a critical
region from exiting one, only one of the two would be
needed each time
24
Release Consistency: Example
• Two types of synchronization operations are used:
1. Acquire
2. Release
• The programmer is responsible for calling acquire when
entering and release when exiting the critical region
[Figure: release consistency example event diagram; P3: R(x)a]
26
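• A minimal release-consistency sketch (class and method names are
illustrative assumptions, not from the slides): acquire() gathers other
copies' updates before the critical region, release() propagates local
updates when leaving it, and an ordinary lock provides the mutual exclusion.

from threading import Lock

class ReleaseConsistentReplica:
    def __init__(self, shared_store, lock: Lock):
        self.shared = shared_store
        self.lock = lock
        self.local = {}
        self.dirty = {}

    def acquire(self):
        self.lock.acquire()
        self.local.update(self.shared)    # gather writes from other copies

    def write(self, key, value):
        self.local[key] = value
        self.dirty[key] = value           # remember what must be propagated

    def read(self, key):
        return self.local.get(key)

    def release(self):
        self.shared.update(self.dirty)    # propagate locally initiated writes
        self.dirty.clear()
        self.lock.release()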
Entry Consistency
• With release consistency, all local updates are
made available to all copies during the release of
the lock
• With entry consistency, each individual shared
data item is associated with some
synchronization variable (e.g. lock or barrier)
• When acquiring the synchronization variable, the
most recent values of its associated shared data
item must be fetched
• Note: where release consistency affects all
shared data, entry consistency affects only
those associated with a synchronization variable.
27
Entry Consistency: Conditions
1. At an acquire, all remote changes to the
guarded data must be made visible
2. Before updating a shared data item, a
process must enter the critical region in
exclusive mode
3. If a process wants to enter a critical region
in nonexclusive mode, it must first check
with the owner of the synchronization
variable to fetch the most recent copies of
shared data.
28
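• A minimal entry-consistency sketch (class and method names are
illustrative assumptions, not from the slides): each shared item has its own
synchronization variable that records the current owner, acquiring it
fetches only that item's most recent value (conditions 1 and 3), and
updating requires exclusive mode (condition 2).

class SyncVariable:
    def __init__(self, item, value=None):
        self.item = item            # the single data item this variable guards
        self.value = value          # most recent released value of the item
        self.owner = None           # replica that last updated the item

class Replica:
    def __init__(self):
        self.cache = {}             # item name -> locally cached value
        self.exclusive = set()      # items currently held in exclusive mode

    def acquire(self, sv, exclusive=False):
        if sv.owner is not None and sv.owner is not self:
            self.cache[sv.item] = sv.value   # fetch the most recent copy
        if exclusive:
            sv.owner = self                  # take ownership before updating
            self.exclusive.add(sv.item)

    def write(self, sv, value):
        assert sv.item in self.exclusive     # condition 2: exclusive mode
        self.cache[sv.item] = value

    def release(self, sv):
        if sv.item in self.exclusive:
            sv.value = self.cache[sv.item]   # make the update available
            self.exclusive.discard(sv.item)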
Entry Consistency: Example
29
Summary of Consistency Models
Models that do not use synchronization operations.
Consistency | Description
Strict | Absolute time ordering of all shared accesses.
Linearizability | All processes must see all shared accesses in the same
order; accesses are furthermore ordered according to a global timestamp.
Sequential | All processes see all shared accesses in the same order;
accesses are not ordered in time.
Causal | All processes see causally-related shared accesses in the same
order.
FIFO | All processes see writes from each other in the order they were
used; writes from different processes may not always be seen in that order.
30
Summary of Consistency Models
Models with synchronization operations.
Consistency | Description
Weak | Shared data can be counted on to be consistent only after a
synchronization is done.
Release | Shared data are made consistent when a critical region is exited.
Entry | Shared data pertaining to a critical region are made consistent
when a critical region is entered.
31
Client-Centric Consistency Models
• Data-centric consistency: guarantees systemwide
consistency on data store
• Client-centric consistency: provides guarantee of
consistency for accesses of a single client
– Most large-scale distributed systems apply replication
for scalability,
– Simultaneous updates are rare, and when they happen
they are easy to resolve
– DNS: updates are done by a single process in a domain (no
write-write conflicts) and propagate slowly
– WWW: caches all over the place, but there is no
guarantee that you are reading the most recent version
of a page
32
Eventual Consistency
• Large-scale distributed and replicated
databases can tolerate a relatively high degree
of inconsistency.
• If no updates take place for a long time, all
replicas will gradually become 100% consistent.
• This model only guarantees that updates are
eventually propagated to all replicas.
• Write-write conflicts are often relatively easy
to solve assuming only a small group of processes
can perform updates.
• Very inexpensive to implement.
33
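• A minimal sketch of how convergence can be reached (the function and the
(timestamp, value) encoding are illustrative assumptions, not from the
slides): replicas periodically exchange state and keep the newer entry for
every key, so once updates stop, repeated exchanges make all replicas
identical.

def anti_entropy(replica_a, replica_b):
    """Each replica maps key -> (timestamp, value); keep the newer entry."""
    for key in set(replica_a) | set(replica_b):
        newest = max(replica_a.get(key, (0, None)),
                     replica_b.get(key, (0, None)),
                     key=lambda entry: entry[0])
        replica_a[key] = replica_b[key] = newest

a = {'x': (3, 'new')}
b = {'x': (1, 'old'), 'y': (2, 'only at b')}
anti_entropy(a, b)     # both now hold {'x': (3, 'new'), 'y': (2, 'only at b')}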
Consistency for Mobile Client
The client moves to another location and (transparently)
connects to another replica
[Figure: a mobile client accessing different replicas of a distributed database over a wide-area network; writes W(x1) and W(x2) occur at different local copies L1 and L2]
43
Monotonic Reads with Vector Timestamps
[Figure: monotonic reads with vector timestamps across server 1, server 2, and server 3; the client's Read_Set grows monotonically from (0, 0, 0) to (2, 1, 1) and then (2, 3, 1) as it reads, and a server synchronizes (step 4: Sync) before serving a read it cannot yet cover]
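• A minimal sketch of the protocol the figure illustrates (class and method
names are illustrative assumptions, not from the slides): the client carries
its read set as a vector timestamp, and a server must first catch up to
that vector before it is allowed to answer a read.

class Server:
    def __init__(self, num_servers):
        self.version = [0] * num_servers   # writes known here, per origin server
        self.data = {}

    def covers(self, read_set):
        return all(mine >= seen for mine, seen in zip(self.version, read_set))

    def sync_from(self, other):            # step 4 ("Sync") in the figure
        self.data.update(other.data)
        self.version = [max(a, b) for a, b in zip(self.version, other.version)]

class Client:
    def __init__(self, num_servers):
        self.read_set = [0] * num_servers  # writes relevant to the reads so far

    def read(self, key, server, other_servers):
        if not server.covers(self.read_set):
            for peer in other_servers:
                server.sync_from(peer)
        # record the writes this read now depends on
        self.read_set = [max(a, b) for a, b in zip(self.read_set, server.version)]
        return server.data.get(key)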