
Distributed Databases: CMP-3440 - Database Systems

The document discusses distributed databases and distributed processing. Key points include: 1) A distributed database is a single logical database that is physically spread across multiple connected computers. 2) Distributed databases allow for data to be replicated, partitioned horizontally or vertically across sites. 3) A distributed database management system (DDBMS) manages the distributed database and makes the distribution transparent to users.

CMP-3440 – Database Systems

Distributed Processing and Distributed Databases

Lecture 11

Distributed Databases
• A large organization with multiple locations may choose to use a central
database server or to distribute the database to local servers.

• The distributed database is still centrally administered.

• The network must allow the users to share the data.

• A distributed database requires multiple instances of a DBMS.

Evolution
• Merging of two diverging concepts:
1. Integration through the use of database technology; and
2. Distribution through the use of data communication technology.

Concepts
• A distributed database is a single logical database that is spread physically across
computers in multiple locations that are connected by a communication network.

• A distributed DBMS (DDBMS) is a software system that permits the management of a
distributed database and makes the distribution transparent to users.

• Data are split into fragments.

• Fragments may be replicated.

• Each DBMS participates in at least one global application.


Environment
1. Homogeneous: The same DBMS is used at each node. Two types:
i. Autonomous: each DBMS works independently, passing messages back and
forth to share data updates.
ii. Nonautonomous: a central, or master, DBMS coordinates database access
and updates across the nodes.

2. Heterogeneous: potentially different DBMSs are used at each node. Two types:
i. Systems: support some or all of the functionality of one logical database.
ii. Gateways: simple paths are created to other databases, without the
benefits of one logical database.

Advantages
• Transparency.
• Local autonomy.
• Increased reliability and availability.
• Modular growth.
• Lower communication cost.
• Faster response.

Disadvantages
• Software cost and complexity.
• Processing overhead.
• Data integrity.
• Slow response.

Strategies for Distributing
1. Data Replication.

2. Horizontal Partitioning.

3. Vertical Partitioning.

4. Combination of the above.

Data Replication.
• Keeps copies of data at multiple sites and propagates updates to each copy. Can use
either synchronous or asynchronous distributed database technologies.
i. Snapshot Replication: simple table copying, or updates from multiple sites
are periodically collected at a master/primary database into a snapshot; the snapshot
is then sent periodically to each site where there is a copy. A snapshot can be full,
differential or incremental.
ii. Near-Real-Time Replication: each completed transaction triggers a message
that is broadcast across the network, informing all nodes to update their data
as soon as possible, without forcing a confirmation back to the originating node.
iii. Pull Replication: the target node, not the source node, controls when a local
database is updated. The target database determines when it needs to be
refreshed and requests a snapshot.

Horizontal Partitioning.
• Some of the rows of a table/relation are put into a base relation at one site, and
other rows are put into a base relation at another site.
• Transactions are processed locally to minimize response time (the normal
pattern for people using an ATM).
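
A minimal sketch of horizontal partitioning, assuming two in-memory SQLite databases stand in for two sites. The table Customer_T, its columns, and the sample rows are hypothetical and are not taken from the lecture; the point is only that each row lives at one site, local lookups touch one fragment, and the full relation is the union of the fragments.

```python
import sqlite3

# Two in-memory databases stand in for two sites (hypothetical setup).
lahore = sqlite3.connect(":memory:")
karachi = sqlite3.connect(":memory:")
sites = {"Lahore": lahore, "Karachi": karachi}

schema = ("CREATE TABLE Customer_T "
          "(CustomerID INTEGER PRIMARY KEY, Name TEXT, Branch TEXT)")
for site in sites.values():
    site.execute(schema)

# Illustrative rows only; each row is stored at the site of its home branch.
customers = [
    (1, "Ayesha", "Lahore"),
    (2, "Bilal", "Karachi"),
    (3, "Sana", "Lahore"),
]
for row in customers:
    sites[row[2]].execute("INSERT INTO Customer_T VALUES (?, ?, ?)", row)


def local_lookup(site, customer_id):
    """A local transaction reads only the fragment stored at one site."""
    return site.execute(
        "SELECT Name FROM Customer_T WHERE CustomerID = ?", (customer_id,)
    ).fetchone()


def all_customers():
    """The full relation is the union of the horizontal fragments."""
    rows = []
    for site in sites.values():
        rows.extend(site.execute("SELECT * FROM Customer_T").fetchall())
    return sorted(rows)
```

A customer at the Lahore branch is served entirely by the Lahore fragment; only a request for all customers has to visit every site.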

Vertical Partitioning.
• Some of the columns of a table/relation are put into a base relation at one site,
and other columns are put into a base relation at another site.
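
A minimal sketch of vertical partitioning, again assuming in-memory SQLite databases as stand-ins for sites. The tables Employee_Contact and Employee_Pay, their columns, and the sample data are hypothetical. Repeating the primary key in every fragment is an assumption of this sketch; it is what allows the original row to be rebuilt.

```python
import sqlite3

# Two in-memory databases stand in for two sites (hypothetical setup).
hr_site = sqlite3.connect(":memory:")
payroll_site = sqlite3.connect(":memory:")

# Each fragment holds different columns but repeats the primary key EmpID.
hr_site.execute(
    "CREATE TABLE Employee_Contact "
    "(EmpID INTEGER PRIMARY KEY, Name TEXT, Phone TEXT)"
)
payroll_site.execute(
    "CREATE TABLE Employee_Pay (EmpID INTEGER PRIMARY KEY, Salary INTEGER)"
)

# Illustrative rows only.
employees = [
    (1, "Ayesha", "555-0101", 90000),
    (2, "Bilal", "555-0102", 75000),
]
for emp_id, name, phone, salary in employees:
    hr_site.execute(
        "INSERT INTO Employee_Contact VALUES (?, ?, ?)", (emp_id, name, phone)
    )
    payroll_site.execute(
        "INSERT INTO Employee_Pay VALUES (?, ?)", (emp_id, salary)
    )


def full_employee(emp_id):
    """Rebuild one original row by matching the fragments on EmpID."""
    contact = hr_site.execute(
        "SELECT Name, Phone FROM Employee_Contact WHERE EmpID = ?", (emp_id,)
    ).fetchone()
    pay = payroll_site.execute(
        "SELECT Salary FROM Employee_Pay WHERE EmpID = ?", (emp_id,)
    ).fetchone()
    if contact is None or pay is None:
        return None
    return (emp_id, *contact, *pay)
```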

Combination of Operations.
• Almost unlimited combinations of strategies.
i. Engineering parts, accounting, customer data can be vertically partitioned?
ii. Standard parts data can be horizontally partitioned among 3 locations?
iii. Standard price list data can be replicated in all 3 locations?

Selecting the Right Data Distribution Strategy


1. Totally centralized at one location, accessed from many geographically
distributed sites.
2. Partially or totally replicated across geographically distributed sites, with each
copy periodically updated with snapshots.
3. Partially or totally replicated across geographically distributed sites, with near-
real-time synchronization of updates.
4. Partitioned into segments at different geographically distributed sites, but still
within one logical database and one distributed DBMS.
5. Partitioned into independent, non-integrated segments spanning multiple
computers and database software.

Comparison of Data Distribution Strategy


Functions of a Distributed DBMS


i. Keep track of where data are located in a distributed data dictionary.
ii. Determine the location from which to retrieve data and the location at which
to process each part of a distributed query.
iii. Provide security, concurrency and deadlock control, and global query optimization.
iv. Provide consistency among copies of data across remote sites.
v. Present a single logical database that is physically distributed, using global
primary key control.
vi. Provide scalability and transparency.
vii. Permit different nodes to run different DBMS.

Location Transparency
• Users can act as if all the data were located at a single node.

• Querying does not require the user to know where the data are physically stored.

• The administrator does not need to create a view using the UNION operator.

• To achieve location transparency, the distributed DBMS must have access to an
accurate and current data dictionary/directory that indicates the locations of all data.
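
The role of the data dictionary can be illustrated with a small lookup, shown here as a sketch under stated assumptions. The dictionary contents reuse the table names and cities from the query optimization example in this lecture; PriceList_T and the function choose_site are hypothetical. A real DDBMS would weigh communication and processing cost rather than take the first listed copy.

```python
# Hypothetical distributed data dictionary: table name -> sites holding a copy.
DATA_DICTIONARY = {
    "Supplier_T": ["Lahore"],
    "Part_T": ["Faisalabad"],
    "Shipment_T": ["Rawalpindi"],
    "PriceList_T": ["Lahore", "Faisalabad", "Rawalpindi"],  # replicated everywhere
}


def choose_site(table, user_site):
    """Return the site to read `table` from, preferring a local copy.

    The user names only the table; the location is resolved here.
    """
    locations = DATA_DICTIONARY.get(table)
    if not locations:
        raise KeyError(f"{table} is not in the data dictionary")
    if user_site in locations:
        return user_site
    # Simplification: fall back to the first listed site.
    return locations[0]
```

A user in Lahore who queries Part_T is routed to Faisalabad without naming that site, while a query on the replicated PriceList_T is answered locally.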


Replication Transparency
• Users may treat a replicated data item as if it were a single item at a single node.

• The distributed DBMS consults the data dictionary and determines whether the request can be
handled as a local transaction because a copy has been replicated locally.

• If the data are replicated at some sites but not at the requesting site, the request must be
routed to another site.

• The DDBMS selects the fastest route for the response without letting the user
know whether the data are replicated.

Failure Transparency
• Each node in a DDBMS is subject to the same types of failure as a centralized
system, with the additional risk of failure of a communication link.

• Error detection and system reconfiguration are probably the functions of the
communications controller or processor; however, the DDBMS is responsible for
data recovery when a failure has occurred.

• The DDBMS at each node has a component called the transaction manager, which:
i. Maintains a log of transactions and before and after database images.
ii. Maintains a concurrency control scheme to ensure integrity during parallel
transactions.


Commit Protocol
• The transaction manager executes a commit protocol, which is a well-defined procedure to ensure
that a global transaction is either successfully completed at each site or else aborted.

• The most widely used is the two-phase commit protocol:
i. First, the originating site of the global transaction sends a request to each of the sites that will
process some portion of the transaction.
ii. Each site locks the portion of the database being updated and processes its sub-transaction, but
does not immediately commit to the local database. Instead, the result is stored in a temporary file.
iii. Each site notifies the originating site when it has completed its sub-transaction.
iv. When all sites have responded, the originating site broadcasts a message to all participating sites asking
whether they want to commit; each site returns an “OK” or “NOT OK” message.
v. If all sites return “OK”, the originating site broadcasts a message telling them to commit their portions; if one or more
“NOT OK” messages are received, it broadcasts a message to abort the transaction.
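
The steps above can be sketched in a few lines of Python. This is an in-process simulation under stated assumptions, not a networked implementation: the Site class, its can_commit flag (used only to simulate a site that refuses), and the dictionaries that stand in for the local database and the temporary file are all hypothetical. Timeouts, logging and crash recovery are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    can_commit: bool = True  # hypothetical switch to simulate a "NOT OK" vote
    committed: dict = field(default_factory=dict)  # stands in for the local database
    pending: dict = field(default_factory=dict)    # stands in for the temporary file
    locked: bool = False

    def prepare(self, updates):
        """Steps ii and iii: lock, process, and hold the result uncommitted."""
        self.locked = True
        self.pending = dict(updates)
        return True  # notify the originating site of completion

    def vote(self):
        """Step iv: answer the originating site's commit question."""
        return "OK" if self.can_commit else "NOT OK"

    def commit(self):
        self.committed.update(self.pending)
        self._release()

    def abort(self):
        self._release()

    def _release(self):
        self.pending = {}
        self.locked = False


def two_phase_commit(sites, updates_by_site):
    """Run one global transaction from the originating site's point of view."""
    # Phase 1: send each site its sub-transaction, then collect the votes.
    for site in sites:
        site.prepare(updates_by_site[site.name])
    votes = [site.vote() for site in sites]
    # Phase 2: commit only if every vote is "OK"; otherwise abort everywhere.
    if all(v == "OK" for v in votes):
        for site in sites:
            site.commit()
        return "COMMITTED"
    for site in sites:
        site.abort()
    return "ABORTED"


# Example run: one site votes "NOT OK", so no site keeps its update.
demo_sites = [Site("Lahore"), Site("Faisalabad", can_commit=False)]
demo_outcome = two_phase_commit(
    demo_sites, {"Lahore": {"balance": 100}, "Faisalabad": {"balance": 200}}
)
```

A single “NOT OK” vote is enough to abort at every site, which is how the protocol keeps the global transaction all-or-nothing.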

Concurrency Transparency
• Concurrency control is more complex in a DDBMS because concurrent users
are spread out among multiple sites and the data are often replicated at several
sites.

• Transaction managers at each site must cooperate to provide concurrency control using:
i. Locking.
ii. Versioning.
iii. Time-Stamping.


Time-Stamping
• Even if two events occur simultaneously at different sites, each will have a unique
time-stamp.

• The purpose is to ensure that transactions are processed in serial order, thus avoiding the need
for locks (and possible deadlocks).
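
One common way to obtain unique time-stamps is to pair a local counter (or clock reading) with the site identifier. The lecture does not say how uniqueness is achieved, so this pairing is an assumption of the sketch, and the class SiteClock is hypothetical. It shows only how ties between sites are broken and how sorting the time-stamps gives a single serial order.

```python
import itertools


class SiteClock:
    """Issues time-stamps of the form (counter, site_id).

    Assumption: the site_id component breaks ties between sites.
    """

    def __init__(self, site_id):
        self.site_id = site_id
        self._counter = itertools.count(1)

    def next_timestamp(self):
        return (next(self._counter), self.site_id)


lahore_clock = SiteClock("Lahore")
faisalabad_clock = SiteClock("Faisalabad")

# Two events with the same counter value at different sites.
t_lahore = lahore_clock.next_timestamp()
t_faisalabad = faisalabad_clock.next_timestamp()

# Tuples compare element by element, so sorting yields one agreed serial order.
serial_order = sorted([t_lahore, t_faisalabad])
```

Because every site applies the same ordering rule, all sites agree on which of two conflicting transactions comes first.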

Query Optimization
• In a DDBMS, answering a query may require the DBMS to assemble data
from several different sites.

• Suppose:
Supplier_T(SupplierNumber, City) 10,000 records, stored in Lahore

Part_T(PartNumber, Color) 100,000 records, stored in Faisalabad

Shipment_T(SupplierNumber, PartNumber) 1,000,000 records, stored in Rawalpindi

• A query written by a user in Lahore is:

SELECT Supplier_T.SupplierNumber
FROM Supplier_T, Shipment_T, Part_T
WHERE Supplier_T.City = 'Karachi'
-- link each supplier to its own shipments
AND Supplier_T.SupplierNumber = Shipment_T.SupplierNumber
AND Shipment_T.PartNumber = Part_T.PartNumber
AND Part_T.Color = 'Red';

Oracle Replication
• Still an emerging technology rather than an established one; current releases do not provide all
of the features.

• Oracle GoldenGate: (heterogeneous) middleware that replicates data between Oracle and non-Oracle data stores.
• Oracle Streams: (homogeneous) a data replication and integration feature built into the Oracle database.
• Snapshot Replication: materialized views are mostly used for unidirectional (one-way) replication, for pulling data.
• Advanced Replication: supports unidirectional replication, multiple masters and conflict resolution.


Oracle Replication Manager

• GUI for setting up, managing and monitoring a replication environment.
