
Lecture 8

The document provides an overview of distributed data replication, detailing its definition, goals, types, consistency models, challenges, examples, and trade-offs. It emphasizes the importance of replication for reliability, performance, and fault tolerance, while discussing various replication methods such as synchronous, asynchronous, and multi-master replication. Additionally, it addresses challenges like the CAP theorem, conflict resolution, and the implications of performance versus consistency in real-world applications.

Uploaded by

Kashmala Alam
Copyright
© All Rights Reserved

Distributed Data Replication

Prepared By Miangul Shafiq Ahmad Jan


Contents:

• Introduction to Distributed Data Replication
• Goals of Distributed Data Replication
• Types of Replication in Distributed Systems
• Consistency Models in Distributed Replication
• Challenges in Distributed Data Replication
• Examples and Use Cases
• Trade-Offs and Real-World Considerations
1. Introduction to Distributed Data Replication
Definition:
Distributed data replication is the process of storing copies of data on multiple nodes within a
distributed system. These nodes could be located in the same data center or across different geographic
locations.
Why Replicate Data?
• Reliability and Availability: If one node fails, data is still accessible from another node.
• Performance Optimization: Replication reduces latency by bringing data closer to users.
• Fault Tolerance: Replication helps protect against data loss and enhances disaster recovery.
Example:
Think about a global social media platform like Facebook. If you access it from Europe, the server
handling your requests might be closer to Europe, while someone in Asia would connect to a server in
Asia. The data (like profile info and posts) is replicated across these servers for faster access and
reliability.
2. Goals of Distributed Data Replication
• High Availability: Ensuring data can be accessed even in the event of system
failures.
• Fault Tolerance: Allowing the system to recover from hardware failures.
• Scalability: Distributing the load across multiple nodes to handle larger amounts of
data and traffic.
• Low Latency: Bringing data closer to users to reduce delays.
3. Types of Replication in Distributed Systems
1. Synchronous Replication:
o A write is acknowledged only after every replica has confirmed that it applied the change.
o Pros: Strong consistency across replicas, as updates are immediately reflected
everywhere.
o Cons: Increased latency since changes must be confirmed by all replicas before
the system considers an operation complete.
o Example: Banks may use synchronous replication for transactions to ensure
that account balances remain consistent across all replicas.
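The behaviour described above can be sketched in a few lines. This is a minimal illustration, not a production protocol: the `Replica` class, the `healthy` flag, and the `synchronous_write` function are hypothetical names invented for this example, and the up-front health check stands in for the prepare/commit coordination a real system would need.

```python
class Replica:
    """In-memory stand-in for one storage node (hypothetical)."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy  # False simulates a node that cannot confirm writes
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


def synchronous_write(replicas, key, value):
    """Acknowledge the write only if every replica can apply it."""
    # Simplification: refuse the write up front if any replica is unreachable,
    # so that no replica ends up with a value the others lack.
    if not all(replica.healthy for replica in replicas):
        return False
    for replica in replicas:
        replica.apply(key, value)
    return True


replicas = [Replica("eu"), Replica("asia"), Replica("us")]
ok = synchronous_write(replicas, "balance:alice", 100)

# One replica goes down: the next write is rejected everywhere.
replicas[2].healthy = False
failed = synchronous_write(replicas, "balance:alice", 80)
```

The second write is rejected and all three replicas still hold the value 100, which shows the cost of this approach: a single unreachable replica blocks writes.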
3. Types of Replication in Distributed Systems (cont.)

2. Asynchronous Replication:

o Changes are made to a primary replica and then propagated to other replicas over time.

o Pros: Faster writes because the system does not wait for all replicas to confirm.

o Cons: Temporary inconsistency since other replicas might not have the latest data
immediately.

o Example: Social media posts may use asynchronous replication, where updates can
propagate over a few seconds without causing issues.
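A rough sketch of asynchronous replication follows, assuming a single primary that queues changes and ships them later. The `Primary` class and its `replicate_pending` method are hypothetical names for this illustration; backups are modelled as plain dictionaries.

```python
from collections import deque


class Primary:
    """Primary node that acknowledges writes before backups receive them."""

    def __init__(self, backups):
        self.data = {}
        self.backups = backups   # each backup is a plain dict in this sketch
        self.pending = deque()   # changes not yet shipped to the backups

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))
        return True  # acknowledged immediately, without waiting for backups

    def replicate_pending(self):
        """Ship queued changes to every backup, preserving write order."""
        while self.pending:
            key, value = self.pending.popleft()
            for backup in self.backups:
                backup[key] = value


backup_a, backup_b = {}, {}
primary = Primary([backup_a, backup_b])

primary.write("post:1", "hello")
stale_read = backup_a.get("post:1")   # backup has not caught up yet
primary.replicate_pending()
fresh_read = backup_a.get("post:1")   # backup now has the post
```

Between `write` and `replicate_pending` a read from a backup misses the new post. That window is the temporary inconsistency listed under Cons.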
3. Types of Replication in Distributed Systems (cont.)

3. Multi-Master Replication:

• Allows updates on multiple replicas simultaneously, which are then synchronized across
replicas.

• Pros: Enables high availability and distributed workloads.

• Cons: Conflict resolution is challenging, as multiple replicas might have conflicting updates.

• Example: Collaborative editing tools such as Google Docs face a similar problem: multiple users
edit the same document simultaneously, so concurrent edits must be merged.
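To make the conflict problem concrete, the sketch below shows one possible way to detect that two masters accepted conflicting updates, using vector clocks, and then to resolve the conflict by keeping the update with the later timestamp. The function names, the `(value, vector_clock, timestamp)` tuple layout, and the node names `m1` and `m2` are all assumptions made for this example.

```python
def compare(vc_a, vc_b):
    """Return 'before', 'after', 'equal', or 'concurrent' for two vector clocks."""
    nodes = set(vc_a) | set(vc_b)
    a_le_b = all(vc_a.get(n, 0) <= vc_b.get(n, 0) for n in nodes)
    b_le_a = all(vc_b.get(n, 0) <= vc_a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither update saw the other: a true conflict


def resolve(version_a, version_b):
    """Pick a surviving version; each version is (value, vector_clock, timestamp)."""
    order = compare(version_a[1], version_b[1])
    if order in ("after", "equal"):
        return version_a
    if order == "before":
        return version_b
    # Concurrent updates: fall back to keeping the later timestamp.
    return max(version_a, version_b, key=lambda version: version[2])


base = ("draft", {"m1": 1}, 10.0)
edit_on_m1 = ("draft + intro", {"m1": 2}, 11.0)              # written on master m1
edit_on_m2 = ("draft + summary", {"m1": 1, "m2": 1}, 12.0)   # written on master m2

ordering = compare(edit_on_m1[1], edit_on_m2[1])
winner = resolve(edit_on_m1, edit_on_m2)
```

Here both edits build on `base` without seeing each other, so the clocks are concurrent and the timestamp fallback keeps the edit from `m2`. The edit from `m1` is silently discarded, which is why conflict resolution is listed as the main drawback.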
3. Types of Replication in Distributed Systems (cont.)

4. Primary-Backup Replication (Master-Slave):

• A primary replica handles all writes, and other replicas (backups) receive updates from the
primary.

• Pros: Simple conflict management, as only one replica accepts writes.

• Cons: The primary node can become a write bottleneck, and it is a single point of failure for writes until a backup is promoted.

• Example: A website with a master database that replicates data to read-only replicas to
handle traffic more efficiently.
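The website example can be sketched as a store that sends every write to the primary and spreads reads across the backups. `PrimaryBackupStore` is a hypothetical class for illustration; it assumes at least one backup and, for simplicity, propagates each write to the backups immediately.

```python
import itertools


class PrimaryBackupStore:
    """All writes go to the primary; reads rotate across read-only backups."""

    def __init__(self, backup_count):
        self.primary = {}
        self.backups = [{} for _ in range(backup_count)]
        self._next_backup = itertools.cycle(range(backup_count))

    def write(self, key, value):
        # Only the primary accepts writes, so replicas never conflict.
        self.primary[key] = value
        for backup in self.backups:
            backup[key] = value  # propagated immediately in this sketch

    def read(self, key):
        # Round-robin over the backups to spread read traffic.
        index = next(self._next_backup)
        return index, self.backups[index].get(key)


store = PrimaryBackupStore(backup_count=2)
store.write("page:home", "Welcome")
reads = [store.read("page:home") for _ in range(4)]
served_by = [index for index, _ in reads]
```

Reads alternate between the two backups while the primary handles only writes, which is how this layout absorbs read-heavy traffic.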
4. Consistency Models in Distributed Replication
Consistency describes how closely the copies of data on different replicas agree, and what a read is guaranteed to return. Different models offer different levels of consistency:
1. Strong Consistency:
• Guarantees that every read returns the most recent completed write, no matter which replica serves it.
• Use Case: Critical systems where accurate data is essential, like online banking.
2. Eventual Consistency:
• Guarantees that, in the absence of new updates, all replicas will eventually become consistent.
• Use Case: Social media platforms where a few seconds of inconsistency don’t cause issues.
3. Causal Consistency:
• Ensures that if one operation causally affects another, the system respects this order in replication.
• Use Case: Messaging applications where responses should follow messages in order.
4. Read-After-Write Consistency:
• Ensures that once a write completes, the client that made it can immediately read the updated data; other clients may briefly see the old value.
• Use Case: Blog platforms, where users should see their posts immediately after publishing.
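One possible way to provide read-after-write consistency on top of a lagging replica is to hand the writer a version token and route its reads to the primary until the replica has caught up. The sketch below assumes this token scheme; the `Store` class, `min_version` parameter, and `sync_replica` method are hypothetical names.

```python
class Store:
    """One primary plus one lagging replica, with version-based read routing."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.primary_version = 0  # number of writes applied on the primary
        self.replica_version = 0  # number of writes the replica has received

    def write(self, key, value):
        self.primary[key] = value
        self.primary_version += 1
        return self.primary_version  # token the client keeps for later reads

    def sync_replica(self):
        """Simulate replication catching up."""
        self.replica = dict(self.primary)
        self.replica_version = self.primary_version

    def read(self, key, min_version=0):
        # Serve from the replica only if it has seen the client's own writes.
        if self.replica_version >= min_version:
            return self.replica.get(key)
        return self.primary.get(key)


store = Store()
token = store.write("post:42", "My first post")

author_view = store.read("post:42", min_version=token)  # served by the primary
other_view = store.read("post:42")                       # replica is still stale
store.sync_replica()
later_view = store.read("post:42")                       # replica has caught up
```

The author sees the post immediately, while another reader sees it only after replication completes. That combination is read-after-write consistency for the writer with eventual consistency for everyone else.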
5. Challenges in Distributed Data Replication
1. Consistency vs. Availability (CAP Theorem):

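o The CAP Theorem states that a distributed data system cannot provide all three of the
following guarantees at once. Because network partitions cannot be ruled out in practice,
the real choice during a partition is between consistency and availability: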

▪ Consistency: Every read receives the most recent write.

▪ Availability: Every request receives a response.

▪ Partition Tolerance: The system continues to operate even if there’s a
communication breakdown between nodes.

o Implication: The guarantee a system gives up during a partition shapes its design.
For instance, systems that prioritize availability and partition tolerance typically
relax strong consistency and offer eventual consistency instead.
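The choice a node faces during a partition can be illustrated with a toy model. This is a deliberately simplified sketch: the `Node` class and its `"CP"` / `"AP"` mode strings are hypothetical, and a real system decides this through its replication protocol rather than a flag.

```python
class Node:
    """A replica that is cut off from its peers and must choose how to respond."""

    def __init__(self, mode):
        self.mode = mode           # "CP" favours consistency, "AP" favours availability
        self.value = "v1"          # last value this node received
        self.partitioned = False   # True when the node cannot reach its peers

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Cannot confirm this is the latest value, so refuse to answer.
            raise RuntimeError("unavailable during partition")
        return self.value          # may be stale if a newer write happened elsewhere


cp_node, ap_node = Node("CP"), Node("AP")
cp_node.partitioned = ap_node.partitioned = True
# Assume a newer value "v2" was written on the other side of the partition.

ap_result = ap_node.read()         # answers, but with the stale value
try:
    cp_result = cp_node.read()
except RuntimeError:
    cp_result = "error"            # consistent, but not available
```

The AP node stays available and returns stale data, while the CP node stays consistent by refusing the request.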
5. Challenges in Distributed Data Replication (cont.)
2. Conflict Resolution:
o Techniques to resolve conflicts include:
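▪ Timestamping (last write wins): Keeping the update with the latest timestamp. This is simple, but it depends on synchronized clocks and silently discards the other update.
▪ Vector Clocks: Tracking the causal order of operations across nodes, so that concurrent (conflicting) updates can be detected.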
▪ Application Logic: Using domain-specific rules (like majority vote) to handle conflicts.
3. Data Latency:
o Replication adds network overhead, as data needs to be copied to multiple locations.
o Trade-offs must be considered between data freshness (how up to date each replica is) and the speed of access.
4. Fault Detection and Failover:
o Detecting failed nodes and rerouting requests to healthy replicas requires careful coordination.
o Load balancers with health checks, or consensus algorithms (like Paxos or Raft), help manage
failover and maintain system stability.
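Fault detection is often built on heartbeats: each node reports in periodically, and a node that stays silent longer than a timeout is treated as failed. The sketch below assumes that scheme with a fixed priority order for promotion; `pick_primary`, the node names, and the 3-second timeout are hypothetical choices for illustration.

```python
def pick_primary(nodes, last_heartbeat, now, timeout=3.0):
    """Return the first node, in priority order, whose heartbeat is recent enough."""
    for node in nodes:
        # A node that has never reported is treated as infinitely late.
        last_seen = last_heartbeat.get(node, float("-inf"))
        if now - last_seen <= timeout:
            return node
    return None  # no healthy node is available


nodes = ["db-1", "db-2", "db-3"]  # priority order: db-1 is the preferred primary

# All nodes reported recently: db-1 stays primary.
healthy = pick_primary(nodes, {"db-1": 100.0, "db-2": 100.5, "db-3": 100.2}, now=101.0)

# db-1 has been silent for 6 seconds: requests fail over to db-2.
after_failure = pick_primary(nodes, {"db-1": 100.0, "db-2": 105.0, "db-3": 104.0}, now=106.0)

# Nobody has reported within the timeout.
none_left = pick_primary(nodes, {"db-1": 100.0}, now=110.0)
```

A timeout seen by a single observer cannot tell a crashed node from a slow network, and two observers may disagree about who is primary. That is the coordination problem consensus algorithms such as Paxos and Raft are designed to solve.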
6. Examples and Use Cases
• Google Search Infrastructure: Google replicates search data across multiple data
centers globally, ensuring fast access and reliability.
• Netflix: Uses distributed data replication to store and stream media content across
the world, reducing latency and providing a seamless user experience.
• Amazon Web Services (AWS) S3: S3 stores objects redundantly across multiple Availability
Zones within a region, and can optionally replicate them to other regions for disaster recovery.
7. Trade-Offs and Real-World Considerations
• Performance vs. Consistency:
A system tuned for availability may answer quickly with data that is slightly out of date,
while a system tuned for strong consistency may respond more slowly because it waits for replicas to agree.
• Cost:
Data replication requires additional storage, bandwidth, and hardware, which can
increase operational costs.
• Geographic Distribution and Legal Considerations:
Some regions have regulations requiring data to stay within specific geographic
boundaries, influencing where and how data is replicated.
