ADT Notes
Features
In a homogeneous distributed database, all the sites use identical DBMS and
operating systems. Its properties are −
Architectural Models
In these systems, each peer acts both as a client and a server for imparting
database services. The peers share their resources with other peers and
coordinate their activities.
Example: Consider three departments that all use Oracle-9i as their DBMS.
If a change is made in one department, it is propagated to the other
departments as well.
2. Heterogeneous distributed database system.
MySQL
Oracle
SQL Server
dBASE
FoxPro
PostgreSQL, etc.
Types of DBMS
Hierarchical DBMS
Network DBMS is one where the relationships among data in the database are
of type many-to-many in the form of a network. The structure is generally
complicated due to the existence of numerous many-to-many relationships.
A network DBMS is modelled using a “graph” data structure.
Relational DBMS
Distributed DBMS
Operations on DBMS
The four basic operations on a database are Create, Retrieve, Update and Delete.
Example SQL command to rename the stream 'ELECTRONICS' to 'ELECTRONICS AND
COMMUNICATIONS' in the student table −
UPDATE STUDENT
SET STREAM = 'ELECTRONICS AND COMMUNICATIONS'
WHERE STREAM = 'ELECTRONICS';
Example − To delete all students who are currently in their 4th year and are
passing out, we use the SQL command −
DELETE FROM STUDENT
WHERE YEAR = 4;
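The four basic operations can be tried end to end with a small script. The following is a minimal sketch using Python's standard sqlite3 module and an in-memory database; the column names (ROLL, NAME, YEAR, STREAM) and the sample rows are assumptions made for illustration and are not taken from these notes.

```python
import sqlite3

# In-memory database; the STUDENT columns and rows are assumed for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE STUDENT (ROLL INTEGER PRIMARY KEY, NAME TEXT, YEAR INTEGER, STREAM TEXT)"
)

# Create: insert a few tuples.
conn.executemany(
    "INSERT INTO STUDENT (ROLL, NAME, YEAR, STREAM) VALUES (?, ?, ?, ?)",
    [
        (1, "Student A", 1, "ELECTRONICS"),
        (2, "Student B", 4, "COMPUTER SCIENCE"),
        (3, "Student C", 4, "ELECTRONICS"),
    ],
)

# Update: rename a stream for every matching tuple.
conn.execute(
    "UPDATE STUDENT SET STREAM = 'ELECTRONICS AND COMMUNICATIONS' "
    "WHERE STREAM = 'ELECTRONICS'"
)

# Delete: remove the students who are in their 4th year.
conn.execute("DELETE FROM STUDENT WHERE YEAR = 4")

# Retrieve: read back what is left.
remaining = conn.execute("SELECT ROLL, STREAM FROM STUDENT ORDER BY ROLL").fetchall()
print(remaining)
```

Only the first-year student survives the delete, and that row carries the renamed stream.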
• Fragmentation. The system partitions the relation into several fragments, and
stores each fragment at a different site.
Data Replication
Data replication is the process of storing separate copies of the database at two
or more sites. It is a popular fault tolerance technique of distributed databases.
Snapshot replication
Near-real-time replication
Pull replication
Fragmentation
Fragmentation is the task of dividing a table into a set of smaller tables. The
subsets of the table are called fragments. Fragmentation can be of three types:
horizontal, vertical, and hybrid (combination of horizontal and vertical).
Horizontal fragmentation can further be classified into two techniques: primary
horizontal fragmentation and derived horizontal fragmentation.
Advantages of Fragmentation
Since data is stored close to the site of usage, efficiency of the database
system is increased.
Local query optimization techniques are sufficient for most queries since
data is locally available.
Since irrelevant data is not available at the sites, security and privacy of
the database system can be maintained.
Disadvantages of Fragmentation
When data from different fragments are required, the access speeds may
be very low.
In case of recursive fragmentations, the job of reconstruction will need
expensive techniques.
Lack of back-up copies of data in different sites may render the database
ineffective in case of failure of a site.
1. Synchronous Replication:
In synchronous replication, the replica is modified immediately after a
change is made to the relation (table), so there is no difference between the
original data and the replica.
2. Asynchronous replication:
In asynchronous replication, the replica is modified only after the change has
been committed on the original database, so the replica may briefly lag behind.
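The difference between the two modes can be illustrated with a toy model. This is only a sketch: the Primary class, its single in-process replica, and the propagate() step are hypothetical names invented here, and real systems ship changes over a network.

```python
from collections import deque


class Primary:
    """Toy primary copy with one replica (hypothetical, for illustration only)."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.data = {}
        self.replica = {}
        self.pending = deque()  # committed changes not yet applied to the replica

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # Synchronous: the replica is updated as part of the same write.
            self.replica[key] = value
        else:
            # Asynchronous: the change is queued and shipped later.
            self.pending.append((key, value))

    def propagate(self):
        # Apply all queued changes to the replica, oldest first.
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value


sync_db = Primary(synchronous=True)
sync_db.write("x", 1)

async_db = Primary(synchronous=False)
async_db.write("x", 1)
lag_before = async_db.replica.get("x")  # replica has not seen the write yet
async_db.propagate()
```

In the synchronous case the replica matches the original right after the write; in the asynchronous case it matches only after propagate() has run.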
Replication Schemes
The three replication schemes are as follows:
1. Full Replication
In the full replication scheme, a copy of the database is available at almost
every location or user in the communication network.
2. No Replication
No replication means that each fragment is stored at exactly one location.
Advantages of no replication
Disadvantages of no replication
For example, let us consider that a University database keeps records of all
registered students in a Student table having the following schema.
STUDENT
Now, the fees details are maintained in the accounts section. In this case, the
designer will fragment the database as follows −
For example, in the student schema, if the details of all students of the
Computer Science course need to be maintained at the School of Computer
Science, then the designer will horizontally fragment the database as follows −
CREATE TABLE COMP_STD AS
SELECT * FROM STUDENT
WHERE COURSE = 'Computer Science';
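A horizontal fragment and its reconstruction can be checked with a short script. This is a minimal sketch using Python's sqlite3 module; the STUDENT columns (REGD_NO, NAME, COURSE), the sample rows, and the second fragment OTHER_STD are assumptions introduced for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema and rows; the notes do not list the actual columns here.
conn.execute("CREATE TABLE STUDENT (REGD_NO INTEGER PRIMARY KEY, NAME TEXT, COURSE TEXT)")
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?)",
    [
        (1, "Student A", "Computer Science"),
        (2, "Student B", "Physics"),
        (3, "Student C", "Computer Science"),
    ],
)

# Horizontal fragments: each one holds a subset of the rows with all columns.
conn.execute(
    "CREATE TABLE COMP_STD AS SELECT * FROM STUDENT WHERE COURSE = 'Computer Science'"
)
conn.execute(
    "CREATE TABLE OTHER_STD AS SELECT * FROM STUDENT WHERE COURSE <> 'Computer Science'"
)

comp_rows = conn.execute("SELECT REGD_NO FROM COMP_STD ORDER BY REGD_NO").fetchall()

# Reconstruction: the union of the fragments gives back the original relation.
rebuilt = conn.execute(
    "SELECT * FROM COMP_STD UNION SELECT * FROM OTHER_STD ORDER BY 1"
).fetchall()
original = conn.execute("SELECT * FROM STUDENT ORDER BY 1").fetchall()
```

The two fragments are disjoint and together cover every row, which is why a plain UNION is enough to rebuild STUDENT in this sketch.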
Hybrid Fragmentation
Distributed Transactions
Each high level operation can be divided into a number of low level tasks or
operations. For example, a data update operation can be divided into three tasks
read_item() − reads data item from storage to main memory.
Transaction Operations
Transaction States
Active − The initial state where the transaction enters is the active state.
The transaction remains in this state while it is executing read, write or
other operations.
Partially Committed − The transaction enters this state after the last
statement of the transaction has been executed.
Committed − The transaction enters this state after it has completed
successfully and the system checks have issued the commit signal.
Failed − The transaction goes from partially committed state or active
state to failed state when it is discovered that normal execution can no
longer proceed or system checks fail.
Aborted − This is the state after the transaction has been rolled back after
failure and the database has been restored to its state that was before the
transaction began.
The following state transition diagram depicts the states of a transaction and
the low level transaction operations that cause changes in state.
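The states listed above and the moves between them can also be written down as a small transition table. This is a sketch based only on the state descriptions in this section; the names TRANSITIONS and advance are hypothetical.

```python
# Allowed transitions between the five transaction states described above.
TRANSITIONS = {
    "active": {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "failed": {"aborted"},
    "committed": set(),  # terminal state
    "aborted": set(),    # terminal state
}


def advance(state, new_state):
    """Return new_state if the move is legal, otherwise raise ValueError."""
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition: {state} -> {new_state}")
    return new_state


# A transaction that runs to completion.
state = "active"
for nxt in ("partially committed", "committed"):
    state = advance(state, nxt)
```

A transaction that fails takes the path active, failed, aborted instead; any move out of committed or aborted is rejected.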
Desirable Properties of Transactions
T1              T2
Read(A)         Read(B)
A := A - 100    B := B + 100
Write(A)        Write(B)
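Taken together, T1 and T2 move 100 from A to B, and atomicity requires that either both updates take effect or neither does. The following is a minimal sketch using Python's sqlite3 module; the ACCOUNT table, the starting balances, and the CHECK constraint used to force a failure are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed table: balances may not go negative, which lets us trigger a failure.
conn.execute(
    "CREATE TABLE ACCOUNT (NAME TEXT PRIMARY KEY, BALANCE INTEGER CHECK (BALANCE >= 0))"
)
conn.executemany("INSERT INTO ACCOUNT VALUES (?, ?)", [("A", 500), ("B", 200)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Credit dst and debit src as one atomic unit; return True on commit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE ACCOUNT SET BALANCE = BALANCE + ? WHERE NAME = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE ACCOUNT SET BALANCE = BALANCE - ? WHERE NAME = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT NAME, BALANCE FROM ACCOUNT"))


ok = transfer(conn, "A", "B", 100)       # succeeds
failed = transfer(conn, "A", "B", 1000)  # debit violates the CHECK, so the credit is undone
final = balances(conn)
```

The credit is applied first on purpose, so the second call shows a completed write being rolled back when the later debit fails.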
Schedule
1. Serial Schedule
For example: Suppose there are two transactions T1 and T2, each with some
operations. If there is no interleaving of operations, then there are the
following two possible outcomes:
1. Execute all the operations of T1, followed by all the operations
of T2.
2. Execute all the operations of T2, followed by all the operations
of T1.
In figure (a), Schedule A shows the serial schedule where T1 is
followed by T2.
In figure (b), Schedule B shows the serial schedule where T2 is
followed by T1.
2. Non-serial Schedule
Validity — The value that’s decided upon should have been proposed by some
process
This protocol requires a coordinator. The client contacts the coordinator and
proposes a value. The coordinator then tries to establish the consensus among a
set of processes in two phases, hence the name.
1. In the first phase, the coordinator contacts all the participants, suggests
the value proposed by the client, and solicits their responses.
2. After receiving all the responses, the coordinator decides to commit if all
participants agreed upon the value, or to abort if any participant disagrees.
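The two phases can be sketched in a few lines. This is a simplified, failure-free, in-process model: the Participant class, its will_agree flag, and the function two_phase_commit are hypothetical names invented here, and a real implementation would add messaging, timeouts and logging.

```python
class Participant:
    """Hypothetical participant that votes on a proposed value."""

    def __init__(self, name, will_agree=True):
        self.name = name
        self.will_agree = will_agree
        self.state = "init"
        self.value = None

    def vote(self, value):
        # Phase 1: answer the coordinator's proposal.
        self.state = "ready" if self.will_agree else "refused"
        return self.will_agree

    def commit(self, value):
        self.value = value
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants, value):
    # Phase 1: propose the value and collect a vote from every participant.
    votes = [p.vote(value) for p in participants]
    # Phase 2: commit only if the vote was unanimous, otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit(value)
        return "commit"
    for p in participants:
        p.abort()
    return "abort"


agreeing = [Participant("p1"), Participant("p2")]
outcome_ok = two_phase_commit(agreeing, 42)

mixed = [Participant("p1"), Participant("p2", will_agree=False)]
outcome_bad = two_phase_commit(mixed, 42)
```

A single disagreeing participant is enough to abort the whole round, and no participant keeps the proposed value in that case.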
When speaking about failures, what are the types of failures of a node?
Fail-Recover Model: nodes crash, recover after a certain time, and continue
executing.
This is an extension of two-phase commit wherein the commit phase is split into
two phases as follows.
a. Prepare to commit: after unanimously receiving yes in the first phase of 2PC,
the coordinator asks all participants to prepare to commit. During this phase,
all participants acquire locks etc., but they don’t actually commit.
b. If the coordinator receives yes from all participants during the prepare to
commit phase, then it asks all participants to commit.
The pre-commit phase introduced above helps us recover when a participant
fails, or when both the coordinator and a participant fail, during the commit
phase. When a recovery coordinator takes over after the coordinator failed
during phase 2 of the earlier 2PC, the new pre-commit phase comes in handy as
follows. On querying the participants, if it learns that some nodes are in the
commit phase, then it assumes that the previous coordinator made the decision
to commit before crashing. Hence it can shepherd the protocol to commit.
Similarly, if a participant says that it did not receive prepare to commit, then
the new coordinator can assume that the previous coordinator failed even before
it started the prepare to commit phase. Hence it can safely assume that no other
participant has committed the changes, and so it can safely abort the
transaction.
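The recovery rule described above can be expressed as a small decision function. This is a sketch under stated assumptions: the state names ("voted_yes", "pre_committed", "committed") and the function recovery_decision are hypothetical, and the final branch, where every participant has pre-committed, is the usual three-phase commit choice rather than something stated in these notes.

```python
def recovery_decision(participant_states):
    """Decide what a recovery coordinator does from the participants' last states.

    Assumed state names:
      "voted_yes"     - voted yes in phase 1, never received prepare to commit
      "pre_committed" - received prepare to commit, has not committed
      "committed"     - already committed
    """
    states = set(participant_states)
    if not states:
        raise ValueError("no participants to query")
    if "committed" in states:
        # Some node is in the commit phase: the old coordinator decided to commit.
        return "commit"
    if "voted_yes" in states:
        # A node never saw prepare to commit, so no node can have committed.
        return "abort"
    # Assumption beyond the notes: everyone pre-committed, so finish the commit.
    return "commit"
```

For example, recovery_decision(["pre_committed", "committed"]) yields "commit", while recovery_decision(["voted_yes", "pre_committed"]) yields "abort".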
Concurrency Control
Concurrency control is the mechanism for controlling and managing the
concurrent execution of database operations, thereby avoiding inconsistencies
in the database. To keep the database consistent under concurrent execution,
we use concurrency control protocols.
Lock-Based Protocol
In this type of protocol, no transaction can read or write a data item until it
acquires an appropriate lock on it. There are two types of lock:
1. Shared lock:
It is also known as a read-only lock. Under a shared lock, the data item can
only be read by the transaction.
It can be shared between transactions because a transaction that holds a
shared lock cannot update the data item.
2. Exclusive lock:
Under an exclusive lock, the data item can be both read and written by
the transaction.
This lock is exclusive: only one transaction can hold it at a time, so multiple
transactions cannot modify the same data simultaneously.
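The rules for the two lock types reduce to a compatibility table: a shared lock can coexist with other shared locks, while an exclusive lock cannot coexist with any other lock. The sketch below encodes this; the names COMPATIBLE and can_grant are hypothetical.

```python
# (held mode, requested mode) -> can the request be granted to another transaction?
COMPATIBLE = {
    ("S", "S"): True,   # many readers may share an item
    ("S", "X"): False,  # a writer must wait for readers
    ("X", "S"): False,  # a reader must wait for the writer
    ("X", "X"): False,  # only one writer at a time
}


def can_grant(requested, held_modes):
    """Return True if `requested` is compatible with every lock already held
    on the item by other transactions."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)
```

For example, can_grant("S", ["S", "S"]) is True, whereas can_grant("X", ["S"]) and can_grant("S", ["X"]) are both False. With no locks held, any request is granted.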
Growing phase: In the growing phase, a new lock on the data item may be
acquired by the transaction, but none can be released.
Shrinking phase: In the shrinking phase, an existing lock held by the
transaction may be released, but no new locks can be acquired.
If lock conversion is allowed, then the following can happen:
1. Upgrading of lock (from S(a) to X (a)) is allowed in growing phase.
2. Downgrading of lock (from X(a) to S(a)) must be done in shrinking
phase.
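Whether a sequence of lock operations obeys the two phases can be checked mechanically. This is a minimal sketch; the action names ("lock_s", "lock_x", "upgrade", "unlock", "downgrade") and the function follows_two_phase_locking are assumptions made for illustration.

```python
GROWING_ACTIONS = {"lock_s", "lock_x", "upgrade"}
SHRINKING_ACTIONS = {"unlock", "downgrade"}


def follows_two_phase_locking(operations):
    """operations: list of (action, item) pairs issued by one transaction.

    Returns True if no lock is acquired or upgraded after the first
    release or downgrade, i.e. growing strictly precedes shrinking.
    """
    shrinking = False
    for action, _item in operations:
        if action in GROWING_ACTIONS:
            if shrinking:
                return False  # tried to grow after the shrinking phase began
        elif action in SHRINKING_ACTIONS:
            shrinking = True
        else:
            raise ValueError(f"unknown action: {action}")
    return True


valid = [("lock_s", "a"), ("upgrade", "a"), ("lock_x", "b"), ("unlock", "a"), ("unlock", "b")]
invalid = [("lock_s", "a"), ("unlock", "a"), ("lock_x", "b")]
```

The first sequence upgrades S(a) to X(a) while still growing and is accepted; the second acquires a lock after releasing one and is rejected.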
Example:
The following shows how locking and unlocking work with 2PL.
Transaction T1:
Transaction T2:
Where
Validation (Ti): It contains the time when Ti finishes its read phase and starts
its validation phase.
Conflict Graphs
Another method is to create conflict graphs. For this, transaction classes are
defined. A transaction class contains two sets of data items called read set and
write set. A transaction belongs to a particular class if the transaction’s read set
is a subset of the class’ read set and the transaction’s write set is a subset of the
class’ write set. In the read phase, each transaction issues its read requests for
the data items in its read set. In the write phase, each transaction issues its write
requests.
A conflict graph is created for the classes to which active transactions belong.
This contains a set of vertical, horizontal, and diagonal edges. A vertical edge
connects two nodes within a class and denotes conflicts within the class. A
horizontal edge connects two nodes across two classes and denotes a write-write
conflict among different classes. A diagonal edge connects two nodes across
two classes and denotes a write-read or a read-write conflict among two classes.
The conflict graphs are analyzed to ascertain whether two transactions within
the same class or across two different classes can be run in parallel.
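The three edge types can be derived from the read and write sets of the classes. The sketch below assumes each class contributes one read node and one write node joined by a vertical edge, which is one common way to draw such graphs; that layout, the node naming (r_ and w_ prefixes), and the function conflict_edges are assumptions and not taken from these notes.

```python
def conflict_edges(classes):
    """classes: dict mapping class name -> (read_set, write_set).

    Returns a set of (kind, node_a, node_b) tuples, where nodes are named
    "r_<class>" and "w_<class>" (assumed naming).
    """
    edges = set()
    names = sorted(classes)
    for name in names:
        # Vertical edge: conflicts within a class (assumed present for every class).
        edges.add(("vertical", f"r_{name}", f"w_{name}"))
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            read_a, write_a = classes[a]
            read_b, write_b = classes[b]
            if write_a & write_b:
                # Horizontal edge: write-write conflict across classes.
                edges.add(("horizontal", f"w_{a}", f"w_{b}"))
            if read_a & write_b:
                # Diagonal edge: read-write conflict across classes.
                edges.add(("diagonal", f"r_{a}", f"w_{b}"))
            if write_a & read_b:
                # Diagonal edge: write-read conflict across classes.
                edges.add(("diagonal", f"w_{a}", f"r_{b}"))
    return edges


example = {
    "C1": ({"x", "y"}, {"x"}),
    "C2": ({"y"}, {"x", "z"}),
}
edges = conflict_edges(example)
```

Here both classes write x, giving a horizontal edge, and C1 reads x while C2 writes it, giving one diagonal edge. C2 only reads y, which C1 never writes, so there is no second diagonal.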
Query Processing
Thus, for the system to understand the user query, the query needs to be
translated into relational algebra. We can express this query in relational
algebra form as:
After translating the given query, we can execute each relational algebra
operation using different algorithms. This is how query processing begins.
Evaluation
Optimization
The cost of the query evaluation can vary for different types of queries.
Since the system is responsible for constructing the evaluation plan,
the user does not need to write their query efficiently.
Usually, a database system generates an efficient query evaluation plan,
which minimizes its cost. This task is performed by the database
system and is known as query optimization.
For optimizing a query, the query optimizer should have an estimated
cost analysis of each operation. It is because the overall operation cost
depends on the memory allocations to several operations, execution costs,
and so on.
Finally, after selecting an evaluation plan, the system evaluates the query and
produces the output of the query.
Distributed Transactions