Consistency and Replication
Replication
Why Replicate Data?
Enhance reliability.
Improve performance.
Data-Centric Consistency Models
A data-store can be read from or written to by any process in a distributed
system.
A local copy of the data-store (replica) can support “fast reads”.
However, a write to a local replica needs to be propagated to all remote
replicas.
Consistency Model Diagram
Notation
Wi(x)a – a write by process ‘i’ to item ‘x’
with a value of ‘a’. That is, ‘x’ is set to ‘a’.
Sequential Consistency Diagrams
In other words: all processes see the same interleaving of
operations, regardless of what that interleaving is.
Problem with Sequential Consistency
With this consistency model, adjusting the
protocol to favour reads over writes (or
vice-versa) can have a devastating impact
on performance (refer to the textbook for
the gory details).
Process P1       Process P2       Process P3
x = 1;           y = 1;           z = 1;
print(y, z);     print(x, z);     print(x, y);
Linearizability and Sequential Consistency (2)
Four valid execution sequences for the processes
of the previous slide. The vertical axis is time.
(a)              (b)              (c)              (d)
x = 1;           x = 1;           y = 1;           y = 1;
print(y, z);     y = 1;           z = 1;           x = 1;
y = 1;           print(x, z);     print(x, y);     z = 1;
print(x, z);     print(y, z);     print(x, z);     print(x, z);
z = 1;           z = 1;           x = 1;           print(y, z);
print(x, y);     print(x, y);     print(y, z);     print(x, y);
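To see why these four sequences are valid, it can help to enumerate every interleaving that keeps each process's own statements in program order. The sketch below does that, assuming all three variables start at 0 (an assumption, not stated on the slide). The names `PROGRAMS`, `interleavings` and `run` are illustrative only.

```python
# Three processes, each running its two statements in program order.
# Assumption: x, y and z all start at 0.
PROGRAMS = [
    [("write", "x"), ("print", ("y", "z"))],  # P1
    [("write", "y"), ("print", ("x", "z"))],  # P2
    [("write", "z"), ("print", ("x", "y"))],  # P3
]


def interleavings(programs):
    """Yield every merge of the programs that preserves each program's order."""
    if not any(programs):  # every program is exhausted
        yield []
        return
    for i, prog in enumerate(programs):
        if prog:
            rest = programs[:i] + [prog[1:]] + programs[i + 1:]
            for tail in interleavings(rest):
                yield [prog[0]] + tail


def run(schedule):
    """Execute one interleaving on a single store; return the printed bits."""
    store = {"x": 0, "y": 0, "z": 0}
    printed = ""
    for kind, arg in schedule:
        if kind == "write":
            store[arg] = 1
        else:
            printed += "".join(str(store[v]) for v in arg)
    return printed


schedules = list(interleavings(PROGRAMS))
outputs = {run(s) for s in schedules}
```

Each element of `outputs` is the six-bit string printed by one legal interleaving, in the order the prints happened. A string that is missing from `outputs` cannot be produced by any sequentially consistent execution of these three processes.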
Introducing Weak Consistency
Not all applications need to see all writes,
let alone see them in the same order.
Weak Consistency Properties
The three properties of Weak Consistency:
1. Accesses to synchronization variables
associated with a data-store are
sequentially consistent.
2. No operation on a synchronization
variable is allowed to be performed until
all previous writes have been completed
everywhere.
3. No read or write operation on data items
is allowed to be performed until all
previous operations on synchronization
variables have been performed.
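A toy model can make the three properties more concrete. The sketch below is only an illustration under strong assumptions: a single shared dictionary stands in for "everywhere", and `synchronize()` plays the role of an access to a synchronization variable. The class and method names are hypothetical.

```python
class WeakReplica:
    """One process's local copy of a data-store shared via `store`."""

    def __init__(self, store):
        self.store = store        # stands in for all remote replicas
        self.local = dict(store)  # local copy used for reads and writes
        self.pending = {}         # local writes not yet propagated

    def write(self, item, value):
        self.local[item] = value
        self.pending[item] = value

    def read(self, item):
        return self.local.get(item)

    def synchronize(self):
        # Complete all previous local writes "everywhere" ...
        self.store.update(self.pending)
        self.pending.clear()
        # ... and bring the local copy up to date with everyone else's writes.
        self.local = dict(self.store)
```

In this model a write by one process becomes visible to another only after the writer has synchronized and the reader has synchronized too; reads made between synchronizations may return stale values.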
Weak Consistency: What It Means
So …
Weak Consistency Examples
Wrong!!
Answer: It doesn’t!
Consistency      Description
Linearizability  All processes must see all shared accesses in the same order. Accesses are furthermore ordered according to a (nonunique) global timestamp.
Sequential       All processes see all shared accesses in the same order. Accesses are not ordered in time.
Causal           All processes see causally-related shared accesses in the same order.
FIFO             All processes see writes from each other in the order they were used. Writes from different processes may not always be seen in that order.
(a)

Consistency      Description
Weak             Shared data can be counted on to be consistent only after a synchronization is done.
Release          Shared data are made consistent when a critical region is exited.
Entry            Shared data pertaining to a critical region are made consistent when a critical region is entered.
(b)
Client-Centric Consistency Models
The previously studied consistency models
concern themselves with maintaining a
consistent (globally accessible) data-store in the
presence of concurrent read/write operations.
Toward Eventual Consistency
The only requirement is that all replicas will
eventually be the same.
Monotonic-Read Consistency
Monotonic-Write Consistency
Read-Your-Writes Consistency
Writes-Follow-Reads Consistency
More on Bayou, 1 of 4
Monotonic Reads: if a process reads the
value of a data item ‘x’, any successive read
operation on ‘x’ by that process will always
return that same value or a more recent
value.
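One way to picture this guarantee is a client that remembers the newest version it has read and refuses older answers. The sketch below assumes each write carries an integer version number; the classes `Replica`, `MonotonicReadClient` and `StaleReplicaError` are hypothetical and are not taken from Bayou.

```python
class StaleReplicaError(Exception):
    """The replica is older than what this client has already read."""


class Replica:
    def __init__(self):
        self.data = {}  # item -> (version, value)

    def write(self, item, version, value):
        self.data[item] = (version, value)

    def get(self, item):
        return self.data.get(item, (0, None))


class MonotonicReadClient:
    def __init__(self):
        self.last_seen = {}  # item -> highest version read so far

    def read(self, replica, item):
        version, value = replica.get(item)
        if version < self.last_seen.get(item, 0):
            # Accepting this answer would move the client backwards in time.
            raise StaleReplicaError(item)
        self.last_seen[item] = version
        return value
```

A client that moves from an up-to-date replica to a lagging one gets an error instead of an older value; it can retry once the lagging replica has caught up.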
Replica Placement Types
There are three types of replica:
1. Permanent replicas: tend to be small in
number, organized as COWs (Clusters of
Workstations) or mirrored systems.
2. Server-initiated replicas: used to enhance
performance at the initiation of the owner of the
data-store. Typically used by web hosting
companies to geographically locate replicas close
to where they are needed most. (Often referred
to as “push caches”).
3. Client-initiated replicas: created as a result of
client requests – think of browser caches. Works
well assuming, of course, that the cached data
does not go stale too soon.
Update Propagation
When a client initiates an update to a
distributed data-store, what gets
propagated?
Issue                     Push-based                                 Pull-based
Messages sent             Update (and possibly fetch update later)   Poll and update
Response time at client   Immediate (or fetch-update time)           Fetch-update time
Epidemic Protocols
This is an interesting class of protocol that
can be used to implement Eventual
Consistency (note: these protocols are used
in Bayou).
The main concern is the propagation of
updates to all the replicas in as few
messages as possible.
Of course, here we are spreading updates,
not diseases!
With this “update propagation model”, the
idea is to “infect” as many replicas as quickly
as possible.
Epidemic Protocols: Terminology
Server types:
Infective replica: a server that holds an
update that can be spread to other replicas.
Susceptible replica: a yet to be updated
server.
Removed replica: an updated server that
will not (or cannot) spread the update to
any other replicas.
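The three server types can be seen in action in a small simulation. The sketch below uses a common rumor-spreading rule that is not spelled out on these slides: an infective server contacts a random peer, and if that peer already has the update, the server loses interest with probability 1/k and becomes removed. The function name `gossip` and its parameters are illustrative assumptions.

```python
import random


def gossip(n, k, seed=0, max_rounds=10_000):
    """Spread one update among n >= 2 servers; return each server's final state."""
    rng = random.Random(seed)
    state = ["susceptible"] * n
    state[0] = "infective"  # server 0 holds the new update
    for _ in range(max_rounds):
        infective = [i for i, s in enumerate(state) if s == "infective"]
        if not infective:
            break  # nobody is left to spread the update
        for p in infective:
            q = rng.choice([j for j in range(n) if j != p])
            if state[q] == "susceptible":
                state[q] = "infective"  # q receives the update
            elif rng.random() < 1.0 / k:
                state[p] = "removed"    # p stops spreading
    return state
```

Servers that are still "susceptible" when the run ends never received the update, which is why this style of rumor spreading alone does not guarantee that every replica is reached.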
The Gossiping Protocol
This variant is referred to as “gossiping” or “rumor
spreading”, and works as follows:
1. Sequential Consistency.
2. Weak Consistency (with sync vars).
3. Atomic Transactions (to be studied soon).
Primary-Based Protocols
Each data item is associated with a
“primary” replica.
1. Remote-Write.
2. Local-Write.
Remote-Write Protocols
With this protocol, all writes are performed at a single (remote) server.
This model is typically associated with traditional client/server systems.
Primary-Backup Protocol: A Variation
Writes are still centralised, but reads are now distributed. The
primary coordinates writes to each of the backups.
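A minimal in-memory sketch of this arrangement follows. It assumes a blocking primary that returns only after every backup has acknowledged the write; the class names `Backup` and `Primary` are hypothetical, and real systems would send these calls over the network.

```python
class Backup:
    def __init__(self):
        self.data = {}

    def apply(self, item, value):
        self.data[item] = value
        return True  # acknowledgement back to the primary

    def read(self, item):
        # Reads are served locally by whichever replica the client contacts.
        return self.data.get(item)


class Primary(Backup):
    def __init__(self, backups):
        super().__init__()
        self.backups = backups

    def write(self, item, value):
        # All writes go through the primary, which updates itself first
        # and then forwards the update to every backup.
        self.apply(item, value)
        acks = [b.apply(item, value) for b in self.backups]
        return all(acks)
```

Because `write` waits for every acknowledgement, a client that has been told its write succeeded can then read the new value from any replica.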
The Bad and Good of Primary-Backup
Bad: Performance!
1. Active Replication.
2. Majority Voting (Quorums).
Active Replication
A special process carries out the update
operations at each replica.
Lamport’s timestamps can be used to
achieve total ordering, but this does not scale
well within Distributed Systems.
An alternative/variation is to use a
sequencer, which is a process that assigns a
unique ID# to each update, which is then
propagated to all replicas.
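The sequencer idea can be sketched as follows, assuming the IDs are consecutive integers starting at 1 and that each replica holds back any update that arrives ahead of its turn. `Sequencer` and `Replica` are illustrative names, not part of any specific system.

```python
import itertools


class Sequencer:
    """Assigns a unique, increasing ID to each update."""

    def __init__(self):
        self._ids = itertools.count(1)

    def stamp(self, update):
        return (next(self._ids), update)


class Replica:
    def __init__(self):
        self.log = []     # updates applied so far, in ID order
        self.next_id = 1  # ID this replica expects to apply next
        self.held = {}    # updates that arrived early, keyed by ID

    def deliver(self, stamped):
        seq, update = stamped
        self.held[seq] = update
        # Apply every update whose turn has come, in strict ID order.
        while self.next_id in self.held:
            self.log.append(self.held.pop(self.next_id))
            self.next_id += 1
```

Even if the network delivers the stamped updates in different orders at different replicas, every replica ends up applying them in the same order. The sequencer is a single point through which all updates must pass.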
redundant invocation
Quorum Protocols: Generalization
NR + NW > N
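The point of this constraint is that every read quorum must share at least one replica with every write quorum, so a read always reaches a copy that saw the latest write. The sketch below checks that claim by brute force for small N; the function names are hypothetical.

```python
from itertools import combinations


def read_write_overlap(n, n_r, n_w):
    """The quorum constraint from the slide: NR + NW > N."""
    return n_r + n_w > n


def quorums_always_intersect(n, n_r, n_w):
    """Check directly that every read quorum meets every write quorum."""
    replicas = range(n)
    return all(
        set(r) & set(w)  # an empty intersection is falsy
        for r in combinations(replicas, n_r)
        for w in combinations(replicas, n_w)
    )
```

For example, with N = 5, choosing NR = 2 and NW = 4 satisfies the constraint, while NR = 2 and NW = 3 does not: a reader could then contact two replicas that the last write never touched.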
Quorum-Based Protocols
Consistency and Replication: Summary
Reasons for replication: improved
performance, improved reliability.
End of Summary
To distribute (or “propagate”) updates, we
draw a distinction between WHAT is
propagated, WHERE it is propagated and by
WHOM.