Dbms - r18 Unit 4 Notes
1. TRANSACTION
Definition: A transaction is a single logical unit of work consisting of one or more database access
operations.
Example: Withdrawing 1000 rupees from an ATM.
The following set of operations is performed to withdraw 1000 rupees from the database:
Read the account balance from the database.
Deduct 1000 rupees from the balance.
Write the updated balance back to the database.
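A minimal Python sketch of this withdrawal treated as one logical unit of work (the dictionary standing in for the database, the account name and the function are illustrative only, not any real DBMS API):

```python
# A minimal sketch: a withdrawal executed as one logical unit of work.
# The "database" here is just a dictionary; all names are illustrative.

database = {"A": 5000}  # account A starts with a balance of 5000 rupees

def withdraw(db, account, amount):
    """Read the balance, deduct the amount, and write it back."""
    balance = db[account]          # Read(A)
    if balance < amount:
        raise ValueError("insufficient funds")
    balance = balance - amount     # A = A - amount
    db[account] = balance          # Write(A)

withdraw(database, "A", 1000)
print(database["A"])               # 4000
```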
2. ACID PROPERTIES
ACID properties are used for maintaining the integrity of a database during transaction processing.
ACID stands for Atomicity, Consistency, Isolation, and Durability.
i. Atomicity: This property ensures that either all of the operations of a transaction are performed or
none of them. In simple words it is referred to as the “all or nothing rule”.
A transaction is said to be atomic if, when one part of the transaction fails, the entire
transaction fails. When all parts of the transaction complete successfully, the transaction is
said to be successful. (“all or nothing rule”)
ii. Consistency: The consistency property ensures that the database must be in a consistent state
before and after the transaction. There must not be any possibility that some data is incorrectly
affected by the execution of a transaction.
For example, when transferring funds from one account to another, the consistency property ensures
that the total value of funds in both accounts is the same before and after the transaction.
i.e., assume initially A balance = $400 and B balance = $700.
The total balance of A + B = $1100 (before transferring $100 from A to B)
The total balance of A + B = $1100 (after transferring $100 from A to B)
iii. Isolation: For every pair of transactions that use some common data item, one transaction should
not start execution before the other transaction has completed. That is, if transaction T1 is executing
and using the data item X, then transaction T2 should not start until T1 ends, if T2 also uses the
same data item X.
For example, Transaction T1: Transfer 100$ from account A to account B
Transaction T2: Transfer 150$ from account B to account C
Assume initially, A balance = B balance = C balance = $1000
Time       Transaction T1                                    Transaction T2
10:00 AM   Read A's balance ($1000)                          Read B's balance ($1000)
10:01 AM   A balance = A balance - $100 (1000 - 100 = 900)   B balance = B balance - $150 (1000 - 150 = 850)
10:02 AM   Read B's balance ($1000)                          Read C's balance ($1000)
10:03 AM   B balance = B balance + $100 (1000 + 100 = 1100)  C balance = C balance + $150 (1000 + 150 = 1150)
10:04 AM   Write A's balance ($900)                          Write B's balance ($850)
10:05 AM   Write B's balance ($1100)                         Write C's balance ($1150)
10:06 AM   COMMIT                                            COMMIT
In this interleaving, T1 reads B's balance as $1000 at 10:02 AM, before T2's write of $850, and then overwrites B with $1100 at 10:05 AM, so T2's deduction of $150 from B is lost. Isolation prevents such interference: T2 must not use B until T1 has finished with it.
iv. Durability: Once a transaction completes successfully, the changes it has made to the
database should be permanent even if there is a system failure. The recovery-management
component of the database system ensures the durability of transactions. For example, assume
account A's balance = $1000. If A withdraws $100 today, then A's balance = $900. After two
days or a month, A's balance should still be $900, if no other transactions are done on A.
3. STATES OF TRANSACTION
A transaction goes through many different states throughout its life cycle. These states are called
transaction states. They are:
Active State:
This is the first state in the life cycle of a transaction.
Once the transaction starts executing, then it is said to be in active state.
During this state it performs operations like READ and WRITE on some data items. All
the changes made by the transaction are stored in a buffer in main memory; they
are not yet updated in the database.
From the active state, a transaction can go into either a partially committed state or a failed
state.
Partially Committed State:
After the last operation of the transaction has been executed, the transaction enters a partially
committed state. The changes are still held in the buffer in main memory and have not yet been
written permanently to the database.
From the partially committed state, a transaction can go into either a committed state (if the
changes are written to the database successfully) or a failed state.
Committed State:
After all the changes made by the transaction have been successfully updated in the
database, it enters a committed state and the transaction is considered to be fully
committed.
After a transaction has entered the committed state, it is not possible to roll back (undo)
the transaction. This is because the system has been updated to a new consistent state and
the changes are permanent.
The only way to undo the changes is by carrying out another transaction, called a
compensating transaction, that performs the reverse operations.
Failed State:
While a transaction is being executed in the active state or partially committed state, if
some failure occurs due to which it becomes impossible to continue the execution, it
enters into a failed state.
Aborted State:
After the transaction has failed and entered into a failed state, all the changes made by it
have to be undone.
To undo the changes made by the transaction, it becomes necessary to roll back the
transaction.
After the transaction has rolled back completely, it enters into an aborted state.
Terminated State:
This is the last state in the life cycle of a transaction.
After entering the committed state or aborted state, the transaction finally enters into a
terminated state where its life cycle finally comes to an end.
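A small illustrative Python sketch of the allowed transitions between the states listed above (state names follow the list; this is not part of any DBMS API):

```python
# Allowed transitions between transaction states (illustrative only).
TRANSITIONS = {
    "active":              {"partially_committed", "failed"},
    "partially_committed": {"committed", "failed"},
    "committed":           {"terminated"},
    "failed":              {"aborted"},
    "aborted":             {"terminated"},
    "terminated":          set(),
}

def can_move(current, target):
    """Return True if a transaction may move from `current` to `target`."""
    return target in TRANSITIONS.get(current, set())

print(can_move("active", "partially_committed"))  # True
print(can_move("committed", "aborted"))            # False: committed changes cannot be rolled back
```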
i. Serial Schedules:
All the transactions execute serially one after the other.
When one transaction executes, no other transaction is allowed to execute.
Examples:
Schedule-1 (T1 executes first, then T2):
T1: Read(A)
T1: A = A - 100
T1: Write(A)
T1: Read(B)
T1: B = B + 100
T1: Write(B)
T1: COMMIT
T2: Read(A)
T2: A = A + 500
T2: Write(A)
T2: COMMIT

Schedule-2 (T2 executes first, then T1):
T2: Read(A)
T2: A = A + 500
T2: Write(A)
T2: COMMIT
T1: Read(A)
T1: A = A - 100
T1: Write(A)
T1: Read(B)
T1: B = B + 100
T1: Write(B)
T1: COMMIT
In Schedule-1 and Schedule-2, the two transactions T1 and T2 execute one after the other. The
operations of T1 and T2 are not interleaved. So, these schedules are serial schedules.
Two operations in a schedule are said to be conflicting if all three of the following rules hold: (1) they belong to different transactions, (2) they operate on the same data item, and (3) at least one of them is a write operation.
In Schedule-1, only rules (1) & (2) are true, but rule (3) does not hold. So, the operations do not conflict.
In Schedule-2, rules (1), (2) & (3) are true. So, the operations conflict.
In Schedule-3, only rules (1) & (3) are true, but rule (2) does not hold. So, the operations do not conflict.
In Schedule-4, rules (1), (2) & (3) are true. So, the operations conflict.
If there exists a cycle in the precedence graph, then the schedule S is not conflict serializable.
If there exists no cycle in the precedence graph, then the schedule S is conflict serializable.
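A minimal Python sketch of this test; a schedule is represented here as a list of (transaction, operation, data item) tuples, which is an illustrative encoding rather than a standard library API:

```python
# Conflict-serializability test: build the precedence graph and look for a cycle.
# A schedule is a list of (transaction, operation, item) tuples; names are illustrative.

def precedence_graph(schedule):
    """Add an edge Ti -> Tj whenever an operation of Ti conflicts with a later operation of Tj."""
    edges = set()
    for i, (ti, op_i, x_i) in enumerate(schedule):
        for tj, op_j, x_j in schedule[i + 1:]:
            if ti != tj and x_i == x_j and ("W" in (op_i, op_j)):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in the precedence graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
    visited, on_stack = set(), set()

    def dfs(node):
        visited.add(node)
        on_stack.add(node)
        for nxt in adj.get(node, ()):
            if nxt in on_stack or (nxt not in visited and dfs(nxt)):
                return True
        on_stack.discard(node)
        return False

    return any(dfs(n) for n in {u for u, _ in edges} if n not in visited)

# Example: conflicting accesses to A and B in opposite orders produce edges T1->T2 and T2->T1.
s = [("T1", "R", "A"), ("T2", "W", "A"), ("T2", "R", "B"), ("T1", "W", "B")]
g = precedence_graph(s)
print(g)                 # two edges: T1 -> T2 and T2 -> T1
print(has_cycle(g))      # True -> this schedule is not conflict serializable
```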
View Serializability: Two schedules S1 and S2 are said to be view equivalent if they satisfy the following three conditions.
(1) Initial Read: The first read operation on each data item must be the same in both schedules.
For each data item X, if the first read on X is done by transaction Ta in schedule S1, then in
schedule S2 also the first read on X must be done by transaction Ta.
(2) Updated Read: The "reads-from" relationship must be the same in both schedules.
If Read(X) of Ta reads the value written by Write(X) of Tb in schedule S1, then in schedule S2 also,
Read(X) of Ta must read the value written by Write(X) of Tb.
(3) Final Write: The final write operation on each data item must be the same in both schedules.
For each data item X, if X has been updated last by transaction Ti in schedule S1, then in
schedule S2 also, X must be updated last by transaction Ti.
Note: Every conflict serializable schedule is also a view serializable schedule, but not vice-versa. A sketch of the three view-equivalence checks is given below.
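A minimal Python sketch of the three checks above, applied to two schedules given as lists of (transaction, operation, item) tuples; the encoding and the sample schedules (which correspond to Problem 03 below) are illustrative:

```python
# View-equivalence test between two schedules S1 and S2.
# A schedule is a list of (transaction, operation, item) tuples, e.g. ("T1", "R", "A").

def initial_reads(schedule):
    """Map each data item to the transaction that reads it first."""
    first = {}
    for t, op, x in schedule:
        if op == "R" and x not in first:
            first[x] = t
    return first

def final_writes(schedule):
    """Map each data item to the transaction that writes it last."""
    last = {}
    for t, op, x in schedule:
        if op == "W":
            last[x] = t
    return last

def reads_from(schedule):
    """Set of (reader, writer, item): reader reads the value most recently written by writer."""
    last_writer, pairs = {}, set()
    for t, op, x in schedule:
        if op == "R" and x in last_writer and last_writer[x] != t:
            pairs.add((t, last_writer[x], x))
        if op == "W":
            last_writer[x] = t
    return pairs

def view_equivalent(s1, s2):
    return (initial_reads(s1) == initial_reads(s2)
            and reads_from(s1) == reads_from(s2)
            and final_writes(s1) == final_writes(s2))

s1 = [("T1","R","A"), ("T1","W","A"), ("T2","R","A"), ("T2","W","A"),
      ("T1","R","B"), ("T1","W","B"), ("T2","R","B"), ("T2","W","B")]
s2 = [("T1","R","A"), ("T1","W","A"), ("T1","R","B"), ("T1","W","B"),
      ("T2","R","A"), ("T2","W","A"), ("T2","R","B"), ("T2","W","B")]
print(view_equivalent(s1, s2))  # True -> S1 is view equivalent to the serial schedule S2
```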
Problem 03: Check whether the given schedule S is view serializable or not
Schedule-1 (S1):
T1: Read(A)
T1: Write(A)
T2: Read(A)
T2: Write(A)
T1: Read(B)
T1: Write(B)
T2: Read(B)
T2: Write(B)
Solution:
For the given Schedule-1, a possible serial schedule is Schedule-2 (T1 followed by T2):

Schedule-1 (S1):          Schedule-2 (S2):
T1: Read(A)               T1: Read(A)
T1: Write(A)              T1: Write(A)
T2: Read(A)               T1: Read(B)
T2: Write(A)              T1: Write(B)
T1: Read(B)               T2: Read(A)
T1: Write(B)              T2: Write(A)
T2: Read(B)               T2: Read(B)
T2: Write(B)              T2: Write(B)

Checking the three conditions: in both S1 and S2 the initial read on A and on B is done by T1; in both S1 and S2, T2 reads the values of A and B written by T1; and in both S1 and S2 the final write on A and on B is done by T2. Hence S1 is view equivalent to the serial schedule S2, so Schedule-1 is view serializable.
Note: Another way of solving this is to show that S1 is conflict serializable; then S1 is also view serializable. (Refer to the
conflict serializability problems. Every conflict serializable schedule is also view serializable, but not vice-versa.)
6. RECOVERABILITY
During execution, if any of the transactions in a schedule is aborted, this may lead
the database into an inconsistent state. If anything goes wrong, the operations already completed in
the schedule need to be undone. Sometimes it may not be possible to undo these operations. The
recoverability of a schedule depends on whether such operations can be undone.
T1                      T2
Read(A)
Write(A)
                        Read(A)    // Dirty Read
                        Write(A)
COMMIT
                        COMMIT     // Delayed till T1 commits or rolls back
Here,
T2 performs a dirty read operation.
The commit operation of T2 is delayed till T1 commits or rolls back.
T1 commits later.
T2 is then allowed to commit.
Had T1 failed, T2 would still have had a chance to recover by rolling back. Such a schedule is recoverable.
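A minimal Python sketch of this idea: a schedule is recoverable if every transaction that reads a value written by another transaction commits only after that writer commits. The schedule encoding (with explicit commit events) and all names are illustrative:

```python
# Recoverability check: for every "Tj reads from Ti", Tj must commit after Ti commits.
# A schedule is a list of (transaction, operation, item) tuples; commits use ("T1", "C", None).

def is_recoverable(schedule):
    last_writer = {}       # item -> transaction that wrote it most recently
    reads_from = set()     # (reader, writer) pairs where the writer had not committed at read time
    committed = set()
    for t, op, x in schedule:
        if op == "W":
            last_writer[x] = t
        elif op == "R" and x in last_writer and last_writer[x] != t:
            writer = last_writer[x]
            if writer not in committed:
                reads_from.add((t, writer))
        elif op == "C":
            # t may commit only if every transaction it read from has already committed.
            if any(reader == t and writer not in committed for reader, writer in reads_from):
                return False
            committed.add(t)
    return True

good = [("T1", "W", "A"), ("T2", "R", "A"), ("T1", "C", None), ("T2", "C", None)]
bad  = [("T1", "W", "A"), ("T2", "R", "A"), ("T2", "C", None), ("T1", "C", None)]
print(is_recoverable(good))  # True  (T2 commits after T1)
print(is_recoverable(bad))   # False (T2 commits before T1 -> not recoverable)
```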
7. ISOLATION LEVELS
Isolation level defines the degree to which a transaction must be isolated from the data
modifications made by other transactions in the database system. The phenomena used to
define the levels of isolation are:
a) Dirty Read
b) Non-repeatable Read
c) Phantom Read
Dirty Read: If a transaction reads a data value updated by an uncommitted transaction, then
this type of read is called a dirty read.
T1                      T2
Read(A)
Write(A)
                        Read(A)    // Dirty Read
                        Write(A)
                        COMMIT
ROLLBACK
As T1 aborted, the results produced by T2 become wrong. This is because T2 performed a dirty
read: it read the value of A updated by T1 before T1 committed.
Non-Repeatable Read: A non-repeatable read occurs when a transaction reads the same data item
twice and gets a different value each time. It happens when the transaction reads once before and
once after a committed UPDATE from another transaction.
T1                      T2                      Data item A in the database
Read(A)                                         A = 10
                        Write(A)                A = 20
Read(A)                                         A = 20
First, T1 reads data item A and gets A = 10.
Next, T2 writes data item A as A = 20.
Last, T1 reads data item A again and gets A = 20.
Phantom Read: A phantom read occurs when a transaction executes the same query twice and gets
a different set of rows each time. It happens when the query runs once before and once after
committed INSERTs and/or DELETEs from another transaction.
Non-repeatable read: when T1 performs the second read, there is no change in the number of rows
in the given table; T2 performs an UPDATE operation on the given table.
Phantom read: when T1 performs the second read, the number of rows either increases or decreases;
T2 performs INSERT and/or DELETE operations on the given table.
Based on these three phenomena, SQL defines four isolation levels. They are:
(1) Read Uncommitted: This is the lowest level of isolation. At this level, one transaction
may read a data item modified by another transaction that has not yet committed. That means dirty
reads are allowed. At this level, transactions are not isolated from each other.
(2) Read Committed: A transaction is allowed to read only data that has been committed by other
transactions, so dirty reads are prevented; non-repeatable reads and phantom reads are still possible.
(3) Repeatable Read: A transaction gets the same value every time it re-reads the same data item,
so dirty reads and non-repeatable reads are prevented; phantom reads are still possible.
(4) Serializable: This is the highest level of isolation. Transactions execute as if they were run
serially one after another, so dirty reads, non-repeatable reads, and phantom reads are all prevented.
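As an illustration of selecting an isolation level from an application, here is a sketch using the psycopg2 driver for PostgreSQL; the connection string and the accounts table are placeholders, not part of the notes:

```python
# Sketch: choosing a transaction isolation level with psycopg2 (PostgreSQL driver).
# The DSN and the accounts table below are placeholders; adjust them for a real database.
import psycopg2
from psycopg2 import extensions

conn = psycopg2.connect("dbname=test user=postgres password=secret host=localhost")

# Pick one of the four SQL isolation levels for subsequent transactions on this connection.
conn.set_isolation_level(extensions.ISOLATION_LEVEL_SERIALIZABLE)

with conn.cursor() as cur:
    cur.execute("SELECT balance FROM accounts WHERE id = %s", ("A",))
    print(cur.fetchone())
conn.commit()
conn.close()
```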
8. CONCURRENCY CONTROL
Concurrency is the ability of a database to execute multiple transactions simultaneously.
Concurrency control is a mechanism to manage multiple simultaneously executing
transactions such that no transaction interferes with any other transaction.
Executing multiple transactions concurrently improves system performance.
Concurrency control increases throughput and reduces the waiting time of transactions.
If concurrency control is not done, it may lead to problems like lost updates, dirty
reads, non-repeatable reads, phantom reads etc. (Refer to section 7 for more details.)
Lost Update: It occurs when two transactions update the same data item at the same time. Here,
the first write is lost and only the second write is visible, as simulated in the sketch below.
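A deterministic Python simulation of the lost-update interleaving described above (the account name and the values are illustrative):

```python
# Lost update: two transactions read the same balance, then both write,
# so the first write is overwritten by the second. Steps are replayed deterministically.

db = {"A": 1000}

# Step 1: both transactions read the same value of A.
t1_local = db["A"]        # T1 reads 1000
t2_local = db["A"]        # T2 reads 1000

# Step 2: each transaction updates its private copy.
t1_local -= 100           # T1 wants A = 900
t2_local -= 150           # T2 wants A = 850

# Step 3: both write back; T2's write overwrites T1's.
db["A"] = t1_local        # A = 900
db["A"] = t2_local        # A = 850 -> T1's update of -100 is lost

print(db["A"])            # 850 instead of the correct 750
```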
Concurrency control Protocols:
Concurrency can be controlled with the help of the following protocols:
(1) Lock-Based Protocol
(2) Timestamp-Based Protocol
(3) Validation-Based Protocol
[Figure: number of locks held by a transaction over time, from the beginning of the transaction to its end - rising during the growing phase and falling during the shrinking phase]
The Two Phase Locking (2PL) has two phases. They are:
Growing phase: In the growing phase, a new lock on the data item may be acquired by the
transaction, but none can be released. (Only get new locks but no release of locks).
Shrinking phase: In the shrinking phase, existing lock held by the transaction may be released,
but no new locks can be acquired. (Only release locks but no more getting new locks).
Example:
Time   T1              T2
0      LOCK-S(A)
1                      LOCK-S(A)
2      Read(A)
3                      Read(A)
4      LOCK-X(B)
5      --
6      Read(B)
7      B = B + 100
8      Write(B)
9      UNLOCK(A)
10                     LOCK-X(C)
11     UNLOCK(B)       --
12                     Read(C)
13                     C = C + 500
14                     Write(C)
15     COMMIT
16                     UNLOCK(A)
17                     UNLOCK(C)
18     COMMIT          COMMIT (at time 18, T2 commits)
The following shows how locking and unlocking work with 2PL in the above example.
Transaction T1: growing phase from time 0 to time 4 (only new locks are acquired); lock point at time 4 (no more new locks after this); shrinking phase from time 9 to time 11 (only locks are released).
Transaction T2: growing phase from time 1 to time 10 (only new locks are acquired); lock point at time 10; shrinking phase from time 16 to time 17 (only locks are released).
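A minimal Python sketch of the two-phase rule itself: a transaction object refuses new lock requests once it has released any lock. Lock-manager details, shared vs. exclusive modes and deadlock handling are omitted, and all names are illustrative:

```python
# Two-phase locking discipline: all lock acquisitions must precede the first release.

class TwoPhaseTransaction:
    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False   # becomes True after the first unlock (lock point passed)

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError(f"{self.name}: cannot acquire lock on {item} in shrinking phase")
        self.held.add(item)

    def unlock(self, item):
        self.held.discard(item)
        self.shrinking = True    # growing phase is over

t1 = TwoPhaseTransaction("T1")
t1.lock("A")
t1.lock("B")        # still in the growing phase
t1.unlock("A")      # shrinking phase begins
try:
    t1.lock("C")    # violates 2PL
except RuntimeError as e:
    print(e)        # T1: cannot acquire lock on C in shrinking phase
```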
[Figure: read and write timestamps (R-Timestamp(X), W-Timestamp(X)) recorded on data item X as transactions T2 and T3 issue Write(X) and Read(X)]
There are mainly two Timestamp Ordering Algorithms in DBMS. They are:
Basic Timestamp Ordering
Thomas Write rule
Check the following conditions whenever a transaction Ti issues a Read(X) operation:
o If W_timestamp(X) > TS(Ti), then the operation is rejected and Ti is rolled back.
o If W_timestamp(X) <= TS(Ti), then the operation is executed.
(The read is not allowed for Ti if a transaction younger than Ti has already written X.)
Check the following conditions whenever a transaction Ti issues a Write(X) operation:
o If R_timestamp(X) > TS(Ti) or W_timestamp(X) > TS(Ti), then the operation is rejected and Ti is rolled back;
otherwise the operation is executed. (The write is not allowed for Ti if a transaction younger than
Ti has already read or written X; Ti is rolled back and restarted later with a new timestamp.)
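A minimal Python sketch of the Basic Timestamp Ordering checks above; the data-item timestamps, the Rollback exception and all names are illustrative:

```python
# Basic Timestamp Ordering: every data item carries a read timestamp and a write timestamp.

class Rollback(Exception):
    pass

class Item:
    def __init__(self):
        self.r_ts = 0   # largest timestamp of any transaction that read this item
        self.w_ts = 0   # timestamp of the transaction that last wrote this item

def read(item, ts):
    """Read rule: reject if a younger transaction has already written the item."""
    if item.w_ts > ts:
        raise Rollback("read rejected: item written by a younger transaction")
    item.r_ts = max(item.r_ts, ts)

def write(item, ts):
    """Write rule: reject if a younger transaction has already read or written the item."""
    if item.r_ts > ts or item.w_ts > ts:
        raise Rollback("write rejected: item read/written by a younger transaction")
    item.w_ts = ts

x = Item()
read(x, ts=2)      # T2 (timestamp 2) reads X -> R_TS(X) = 2
try:
    write(x, ts=1) # T1 (older, timestamp 1) now tries to write X
except Rollback as e:
    print(e)       # write rejected: T1 must be rolled back and restarted later
```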
Thomas Write Rule: Check the following conditions whenever a transaction Ta issues a Write(X) operation:
(i). If R_TS(X) > TS(Ta), then abort and rollback Ta and reject the operation.
Example: Transaction T1: Arrival = 9:00 AM, TS(T1) = 9:00 AM. Transaction T2: Arrival = 9:02 AM, TS(T2) = 9:02 AM. Data item A: initially A = 100.
First, T2 issues Read(A) and reads A = 100, so R_TS(A) = 9:02 AM.
Then T1 issues Write(A) to set A = 200. Since R_TS(A) = 9:02 AM > TS(T1) = 9:00 AM, the write is rejected by rule (i), T1 is rolled back, and A remains 100.
(ii). If W_TS(X) > TS(Ta), then do not execute the Write(X) operation but continue processing; the outdated write is simply ignored.
(iii). If neither condition (i) nor condition (ii) is satisfied, then execute the Write(X) of Ta and set
W_TS(X) to TS(Ta).
Under the Thomas Write Rule, outdated writes are rejected but the transaction is allowed to continue, whereas
the Basic TO protocol rejects the write operation and rolls back (terminates) such a transaction.
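A sketch of the same write path with the Thomas Write Rule applied: the only change from the Basic TO write rule above is that an outdated write (older than the item's write timestamp) is silently ignored instead of causing a rollback. The dictionary encoding of the item is illustrative:

```python
# Thomas Write Rule: ignore outdated writes instead of rolling the transaction back.

class Rollback(Exception):
    pass

def write_thomas(item, ts):
    """item is a dict with 'r_ts' and 'w_ts' keys; ts is the writing transaction's timestamp."""
    if item["r_ts"] > ts:
        raise Rollback("write rejected: item already read by a younger transaction")
    if item["w_ts"] > ts:
        return "ignored"      # outdated write: skip it, the transaction continues
    item["w_ts"] = ts
    return "applied"

y = {"r_ts": 0, "w_ts": 0}
print(write_thomas(y, ts=3))  # "applied": W_TS(Y) becomes 3
print(write_thomas(y, ts=1))  # "ignored": T1's outdated write is skipped, T1 keeps running
```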
To perform the validation test, we need to know when the various phases of transaction Ta took
place. We therefore associate three different timestamps with transaction Ta:
(i). Start(Ta): the time when Ta started its execution.
(ii). Validation(Ta): the time when Ta finished its read phase and started its
validation phase.
(iii). Finish(Ta): the time when Ta finished its write phase.
A database contains multiple tables. Each table contains multiple records. Each record contains
multiple field values. For example, consider table D and record R2. These two are not mutually
exclusive: R2 is a part of D. So granularity means different levels of data, where smaller levels
are nested inside the higher levels. Inside the database we have tables, inside a table we have
records, and inside a record we have field values. This can be represented with a tree as shown
below.
[Figure: granularity hierarchy - the database DB at the root, tables A, B, C and D below it, records within each table, and field values within each record]
The larger the object size on which a lock is applied, the lower the degree of concurrency
permitted. On the other hand, the smaller the object size on which a lock is applied, the larger the
number of locks the system has to maintain. More locks cause higher overhead and need more
space. So, what is the best object size on which a lock can be applied? It depends on the types of
transactions involved. If a typical transaction accesses data values from a single record, it is
advantageous to lock just that one record. On the other hand, if a transaction typically
accesses many records in the same table, it may be better to lock that whole table.
Locking at higher levels needs lock details at lower levels. This information is provided by
additional types of locks called intention locks. The idea behind intention locks is for a
transaction to indicate, along the path from the root to the desired node, what type of lock
(shared or exclusive) it will require from one of the node’s descendants. There are three types of
intention locks:
(1) Intention-shared (IS): It indicates that one or more shared locks will be requested on
some descendant node(s).
(2) Intention-exclusive (IX): It indicates that one or more exclusive locks will be requested
on some descendant node(s).
(3) Shared-intention-exclusive (SIX): It indicates that the current node is locked in shared
mode but that one or more exclusive locks will be requested on some descendant node(s).
The compatibility table for the three intention lock modes, together with the shared and exclusive
lock modes, is shown below.
Mode IS IX S SIX X
IS Yes Yes Yes Yes No
IX Yes Yes No No No
S Yes No Yes No No
SIX Yes No No No No
X No No No No No
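A minimal Python sketch of this compatibility matrix as a lookup table, with a helper that decides whether a requested lock mode can be granted alongside the modes already held on a node (names are illustrative):

```python
# Lock-mode compatibility matrix for multiple-granularity locking.
COMPATIBLE = {
    "IS":  {"IS": True,  "IX": True,  "S": True,  "SIX": True,  "X": False},
    "IX":  {"IS": True,  "IX": True,  "S": False, "SIX": False, "X": False},
    "S":   {"IS": True,  "IX": False, "S": True,  "SIX": False, "X": False},
    "SIX": {"IS": True,  "IX": False, "S": False, "SIX": False, "X": False},
    "X":   {"IS": False, "IX": False, "S": False, "SIX": False, "X": False},
}

def can_grant(requested, held_modes):
    """A request is granted only if it is compatible with every mode already held on the node."""
    return all(COMPATIBLE[requested][h] for h in held_modes)

print(can_grant("IX", ["IS", "IX"]))  # True:  intention modes coexist
print(can_grant("X",  ["IS"]))        # False: exclusive conflicts with everything
```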
It uses the intention lock modes to ensure serializability. It requires that if a transaction attempts
to lock a node, then it must follow these rules:
(1) The lock compatibility matrix shown above must be observed.
(2) The root of the tree must be locked first, and it may be locked in any mode.
(3) A node can be locked in S or IS mode only if its parent is currently locked in IS or IX mode.
(4) A node can be locked in X, SIX or IX mode only if its parent is currently locked in IX or SIX mode.
(5) A transaction can lock a node only if it has not previously unlocked any node (two-phase rule).
(6) A transaction can unlock a node only if none of that node's children are currently locked by it.
When a system crashes, it may have many transactions being executed and many files may be
opened for them. When a DBMS recovers from a crash, it must maintain the following:
It must check the states of all the transactions that were being executed.
The following techniques facilitate a DBMS in recovering as well as maintaining the atomicity
of a transaction:
Log based recovery
Check point
Shadow paging
[Figure: transactions T1, T2, T3 and T4 shown on a time axis relative to a checkpoint and a subsequent system failure]
The recovery system reads the logs backwards from the end to the last checkpoint.
It maintains two lists, an undo-list and a redo-list. Transactions that have a start record in the log
but no commit record are placed in the undo-list; transactions that have both a start record and a
commit record are placed in the redo-list.
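A minimal Python sketch of building the undo-list and redo-list from a log read after a crash and then replaying it; the log record format here is illustrative, not the format of any particular DBMS:

```python
# Checkpoint-style recovery sketch: scan the log, classify transactions, then redo/undo.
# Each log record is a tuple; formats here are illustrative:
#   ("START", T), ("WRITE", T, item, old_value, new_value), ("COMMIT", T)

def recover(log, db):
    started, committed = set(), set()
    for rec in log:
        if rec[0] == "START":
            started.add(rec[1])
        elif rec[0] == "COMMIT":
            committed.add(rec[1])

    redo_list = committed
    undo_list = started - committed

    # Redo phase: reapply the new values written by committed transactions, in log order.
    for rec in log:
        if rec[0] == "WRITE" and rec[1] in redo_list:
            _, _, item, _, new = rec
            db[item] = new

    # Undo phase: restore the old values written by uncommitted transactions, in reverse log order.
    for rec in reversed(log):
        if rec[0] == "WRITE" and rec[1] in undo_list:
            _, _, item, old, _ = rec
            db[item] = old
    return redo_list, undo_list

log = [("START", "T1"), ("WRITE", "T1", "A", 100, 50), ("COMMIT", "T1"),
       ("START", "T2"), ("WRITE", "T2", "B", 200, 300)]   # crash before T2 commits
db = {"A": 100, "B": 300}                                  # B's dirty value reached disk
print(recover(log, db))    # ({'T1'}, {'T2'})
print(db)                  # {'A': 50, 'B': 200} -> T1 redone, T2 undone
```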
16. ARIES ALGORITHM (Algorithm for Recovery and Isolation Exploiting Semantics)
Algorithm for Recovery and Isolation Exploiting Semantics (ARIES) is one of the log-based
recovery methods. It uses the Write Ahead Log (WAL) protocol. Recovery proceeds in three phases:
Analysis phase: reads the log to identify the transactions that were active and the pages that were dirty at the time of the crash.
Redo phase: repeats history by reapplying the logged updates from an appropriate point in the log, bringing the database to the state it was in at the crash.
Undo phase: rolls back the updates of all transactions that had not committed at the time of the crash.
A copy of the database is created and stored at a remote site with the help of a network. This
remote database is periodically updated from the current database so that it stays in sync with the data and
other details. If the remote database is updated manually, it is called an offline backup. It can also be
backed up online, where the data is updated at the current and remote databases simultaneously. In
this case, as soon as the current database fails, the system automatically switches over to the
remote database and continues functioning. The user will not even know that there was a failure.
Full backup or Normal backup: A full backup is also known as a normal backup. In this,
an exact duplicate copy of the entire database is created and stored every time the
backup is made. The advantage of this type of backup is that restoring the lost data is very
fast compared to the other methods. The disadvantage of this method is that it takes more time to
back up.
Incremental backup: Instead of backing up the entire database every time, back up only the
files that have been updated since the last backup (the archive bit is cleared after each
backup). For this, a normal backup has to be done at least once a week. While incremental
database backups do run faster, the recovery process is a bit more complicated.
Differential backup: A differential backup is similar to an incremental backup, but the difference is
that the archive bit is not cleared, which simplifies the recovery process. So a file that is
updated after a normal backup will be archived every time a differential backup is run,
until the next normal backup runs and clears the archive bit. A sketch of how the three
backup types select files is given below.
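A small Python sketch of how the three backup types choose files, using a per-file "archive bit" that marks files modified since they were last backed up; the file names and the archive-bit representation are illustrative:

```python
# Which files does each backup type copy, and what does it do to the archive bit?
# Each file is represented as {"name": ..., "archive_bit": True if modified since last backup}.

def full_backup(files):
    copied = [f["name"] for f in files]          # copy everything
    for f in files:
        f["archive_bit"] = False                 # full backup clears the archive bit
    return copied

def incremental_backup(files):
    copied = [f["name"] for f in files if f["archive_bit"]]
    for f in files:
        f["archive_bit"] = False                 # incremental also clears the bit
    return copied

def differential_backup(files):
    # copies the same changed files, but does NOT clear the bit,
    # so the next differential copies them again until the next full backup.
    return [f["name"] for f in files if f["archive_bit"]]

files = [{"name": "data1.db", "archive_bit": True},
         {"name": "data2.db", "archive_bit": True}]
print(full_backup(files))          # ['data1.db', 'data2.db']
files[0]["archive_bit"] = True     # data1.db is modified after the full backup
print(differential_backup(files))  # ['data1.db']
print(differential_backup(files))  # ['data1.db'] again: the bit was not cleared
print(incremental_backup(files))   # ['data1.db'] and now the bit is cleared
```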