Distributed Databases: CMP-3440 - Database Systems
Lecture 11
Distributed Databases
• A large organization with multiple locations may choose to use a central
database server or to distribute the database across local servers.
Evolution
• Merging of two diverging concepts:
1. Integration through the use of database technology; and
2. Distribution through the use of data communication technology.
Concepts
• A distributed database is a single logical database that is spread physically across
computers in multiple locations that are connected by a communication network.
Advantages
• Transparency.
• Local autonomy.
• Increased reliability and availability.
• Modular growth.
• Lower communication cost.
• Faster response.
Disadvantages
• Software cost and complexity.
• Processing overhead.
• Data integrity.
• Slow response.
Strategies for Distributing
1. Data Replication.
2. Horizontal Partitioning.
3. Vertical Partitioning.
Strategies for Distributing
Horizontal Partitioning.
• Some of the rows of a table/relation are put into a base relation at one site, and
other rows are put into a base relation at another site.
• Transactions are processed locally to minimize response time (the normal
pattern for customers using an ATM).
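As a minimal sketch of horizontal partitioning, the example below uses two in-memory SQLite databases to stand in for two sites. The Account table, its columns, and the branch names are hypothetical and chosen only for illustration; a real DDBMS would manage the fragments itself.

```python
import sqlite3

# Hypothetical rows: (account_no, branch, balance).
ACCOUNTS = [
    (1, "Lahore", 500.0),
    (2, "Karachi", 1200.0),
    (3, "Lahore", 75.0),
    (4, "Karachi", 300.0),
]

def make_site(branch):
    """Create an in-memory database holding only the rows for one branch."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Account (account_no INTEGER PRIMARY KEY, branch TEXT, balance REAL)"
    )
    rows = [r for r in ACCOUNTS if r[1] == branch]
    conn.executemany("INSERT INTO Account VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

# Each site stores one horizontal fragment (a subset of the rows).
sites = {"Lahore": make_site("Lahore"), "Karachi": make_site("Karachi")}

def local_balance(branch, account_no):
    """A local transaction touches only the fragment stored at its own site."""
    row = sites[branch].execute(
        "SELECT balance FROM Account WHERE account_no = ?", (account_no,)
    ).fetchone()
    return row[0] if row else None

def all_accounts():
    """Reconstruct the full relation as the union of the fragments."""
    rows = []
    for conn in sites.values():
        rows.extend(conn.execute("SELECT account_no, branch, balance FROM Account"))
    return sorted(rows)
```

A lookup such as local_balance("Lahore", 1) is answered entirely at the Lahore site, while all_accounts() has to visit every site, which is why horizontal partitioning suits workloads where most transactions are local.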
Strategies for Distributing
Combination of Operations.
• Almost unlimited combinations of strategies.
i. Engineering parts, accounting, customer data can be vertically partitioned?
ii. Standard parts data can be horizontally partitioned among 3 locations?
iii. Standard price list data can be replicated in all 3 locations?
Comparison of Data Distribution Strategy
Location Transparency
• Users can act as if all the data were located at a single node.
• Querying does not require the user to know where the data are physically stored.
Replication Transparency
• Users may treat a replicated data item as if it were a single item at a single node.
• The distributed DBMS consults the data dictionary to determine whether the request
can be handled locally, either because the data are stored locally or because a copy
has been replicated locally.
• If data are replicated at some sites but not at all of them, the request may have to be
routed to another site.
• The DDBMS selects the fastest route for the response without letting the user
know whether replication was done or not.
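The routing decision described above can be sketched as follows. This is only an illustration: the data dictionary contents, the site names, and the link costs are hypothetical, and a real DDBMS would use far richer statistics when choosing a route.

```python
# Hypothetical data dictionary: which sites hold a copy of each relation.
DATA_DICTIONARY = {
    "PriceList": {"Lahore", "Karachi", "Islamabad"},  # replicated at every site
    "Customer": {"Karachi"},                           # stored at one site only
}

# Hypothetical network cost (for example, milliseconds) between pairs of sites.
LINK_COST = {
    ("Lahore", "Karachi"): 40,
    ("Lahore", "Islamabad"): 15,
    ("Karachi", "Islamabad"): 50,
}

def cost(a, b):
    """Cost of reaching site b from site a; zero when the request stays local."""
    if a == b:
        return 0
    return LINK_COST.get((a, b), LINK_COST.get((b, a)))

def choose_site(relation, user_site):
    """Pick the copy that is cheapest to reach from the user's site."""
    copies = DATA_DICTIONARY[relation]
    # sorted() only makes the choice deterministic when two costs are equal.
    return min(sorted(copies), key=lambda site: cost(user_site, site))
```

The user calls choose_site with only a relation name and never sees which copy answered: a Lahore user asking for PriceList is served locally, while a request for Customer is routed to Karachi.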
Failure Transparency
• Each node in a DDBMS is subject to the same types of failure as in a centralized
system, with the additional risk of failure of a communication link.
Commit Protocol
• The Transaction Manager executes a commit protocol, which is a well-defined procedure to ensure
that a global transaction is either successfully completed at each site or else aborted.
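The slide does not name a specific protocol; the most widely taught one is two-phase commit, so the sketch below assumes it. The Site class and its fields are hypothetical, and the sketch leaves out logging, timeouts, and recovery, all of which a real transaction manager needs.

```python
class Site:
    """A hypothetical participant site holding one pending local update."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        """Phase 1: vote on whether the local part of the transaction can commit."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def run_global_transaction(sites):
    """Commit at every site only if every site votes yes; otherwise abort everywhere."""
    votes = [site.prepare() for site in sites]  # phase 1: collect all votes
    if all(votes):
        for site in sites:                      # phase 2: global commit
            site.commit()
        return "committed"
    for site in sites:                          # phase 2: global abort
        site.abort()
    return "aborted"
```

With run_global_transaction([Site("A"), Site("B", can_commit=False)]) a single "no" vote aborts the transaction at both sites, which is the all-or-nothing outcome the commit protocol exists to guarantee.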
Concurrency Transparency
• Concurrency control is more complex in a DDBMS because concurrent users
are spread out among multiple sites and the data are often replicated at several
sites.
Time-Stamping
• Even if two events occur simultaneously at different sites, each will have a unique
time-stamp.
• Time-stamps are used to ensure that transactions are processed in serial order, thus
avoiding the need for locks (and possible deadlocks).
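The slide does not say how uniqueness is achieved. One common approach, assumed here, is to pair a local counter with a site identifier so that ties between sites are broken by the site id. The class and function names below are hypothetical.

```python
class SiteClock:
    """Issues time-stamps of the form (counter, site_id) for one site."""

    def __init__(self, site_id):
        self.site_id = site_id
        self.counter = 0

    def next_timestamp(self):
        """Advance the local counter and return a globally unique time-stamp."""
        self.counter += 1
        return (self.counter, self.site_id)

    def observe(self, timestamp):
        """On seeing a remote time-stamp, catch up so later local stamps are larger."""
        self.counter = max(self.counter, timestamp[0])


def serial_order(transactions):
    """Order transactions by time-stamp; tuples compare by counter, then site id."""
    return sorted(transactions, key=lambda t: t["ts"])
```

Two sites that stamp an event at the same counter value still produce different time-stamps, for example (1, 1) and (1, 2), so every pair of transactions has a well-defined order.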
Query Optimization
• In a DDBMS, answering a query may require the DBMS to assemble data
from several different sites.
• Suppose:
Supplier_T(SupplierNumber, City) 10,000 records, stored in Lahore
Query Optimization
• A query written by a user in Lahore is:
SELECT Supplier_T.SupplierNumber
FROM Supplier_T, Shipment_T, Part_T
WHERE Supplier_T.City = 'Karachi'
AND Shipment_T.PartNumber = Part_T.PartNumber
AND Part_T.Color = 'Red';
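To see why the choice of strategy matters, the sketch below compares candidate plans by the time needed to move records between sites. Only the 10,000-record size of Supplier_T comes from the lecture; the record size, the link speed, and the number of qualifying rows are hypothetical figures used purely for illustration.

```python
def transfer_seconds(records, record_bytes, bytes_per_second):
    """Time to ship `records` records of `record_bytes` each over one link."""
    return records * record_bytes / bytes_per_second

def cheapest_strategy(strategies, record_bytes, bytes_per_second):
    """Return (name, seconds) for the strategy that moves the least data."""
    costs = {
        name: transfer_seconds(records, record_bytes, bytes_per_second)
        for name, records in strategies.items()
    }
    best = min(costs, key=costs.get)
    return best, costs[best]

# Records each strategy would move between sites.
# 10,000 is the Supplier_T size from the lecture; 200 is a hypothetical
# count of suppliers whose City matches the WHERE clause.
STRATEGIES = {
    "move all Supplier_T records": 10_000,
    "filter Supplier_T by City first, then move matches": 200,
}
```

With an assumed 100-byte record and a 10,000 bytes-per-second link, cheapest_strategy(STRATEGIES, 100, 10_000) prefers filtering first, since it moves 50 times fewer records. A real optimizer would also weigh local processing cost and the sizes of the other relations.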
Oracle Replication
• Still an emerging technology rather than an established one. Current releases do not
provide all of the features.
• Oracle GoldenGate:
• (Heterogeneous) Middleware to replicate data between Oracle and non-Oracle data stores.
• Oracle Streams:
• (Homogeneous) A built-in data replication and integration feature of the Oracle
database.
• Snapshot Replication:
• Materialized views are mostly used for unidirectional (one-way) replication, that is, for pulling data.
• Advanced Replication:
• Supports unidirectional replication, multiple masters, conflict resolution.