DBMS Unit-3
To identify where a problem has occurred, we classify failures into the following
categories:
1. Transaction failure
2. System crash
3. Disk failure
1. Transaction failure
A transaction failure occurs when a transaction fails to execute or reaches a point
from which it cannot proceed any further. When a transaction or process is
affected in this way, it is called a transaction failure.
2. System Crash
A system crash occurs because of a hardware or software error in the system.
The contents of volatile storage (main memory) are lost, but the data on
non-volatile storage is not affected.
3. Disk Failure
o In the early days of technology evolution, it was a common problem that
hard-disk drives or other storage drives failed frequently.
o Disk failure occurs due to the formation of bad sectors, a disk head
crash, unreachability of the disk, or any other failure that destroys all
or part of the disk storage.
Storage structure
A database system provides an abstract view of the stored data. Underneath,
however, the data is stored as bits and bytes on different storage devices.
In this section, we will take an overview of various types of storage devices that
are used for accessing and storing data.
For storing the data, there are different types of storage options available.
These storage types differ from one another in speed and accessibility.
The following types of storage devices are used for storing data:
o Primary Storage
o Secondary Storage
o Tertiary Storage
Primary Storage
It is the storage area that offers quick access to the stored data. Primary
storage is also known as volatile storage because this type of memory does not
store data permanently: if the system suffers a power cut or a crash, the data
is lost. Main memory and cache are the types of primary storage.
o Main Memory: Main memory is responsible for operating on the data that is
made available by the storage medium, and it handles every instruction executed
by the machine. It can store gigabytes of data on a system, but it is generally
too small to hold an entire database. Main memory also loses its whole content
if the system shuts down because of a power failure or for other reasons.
o Cache: Cache is one of the costliest storage media, but it is also the
fastest. It is a tiny storage area that is usually maintained by the computer
hardware. When designing algorithms and query processors for data structures,
designers take cache effects into account.
Secondary Storage
Secondary storage is also called online storage. It is the storage area that
allows the user to save and store data permanently. This type of memory does
not lose data because of a power failure or system crash, which is why it is
also called non-volatile storage.
There are some commonly described secondary storage media which are
available in almost every type of computer system:
o Flash Memory: Flash memory stores data in USB (Universal Serial Bus) keys,
which are plugged into the USB slots of a computer system. These USB keys help
transfer data to a computer system, though they vary in size limits. Unlike
main memory, flash memory does not lose the stored data during a power cut or
other failures. This type of storage is commonly used in server systems for
caching frequently used data, which gives the system high performance, and it
can hold a larger amount of data than main memory.
o Magnetic Disk Storage: This type of storage media is also known as online
storage media. A magnetic disk is used for storing data for a long time and is
capable of storing an entire database. The computer system is responsible for
bringing data from the disk into main memory for further access, and if the
system modifies any data, the modified data must be written back to the disk.
The great advantage of a magnetic disk is that its data is not affected by a
system crash or failure, but a disk failure can easily ruin or destroy the
stored data.
Tertiary Storage
It is the storage type that is external to the computer system. It has the
slowest speed but is capable of storing a large amount of data. It is also
known as offline storage and is generally used for data backup. Common tertiary
storage devices are optical storage (such as CDs and DVDs) and magnetic tapes.
Storage Hierarchy
Besides the above, various other storage devices reside in the computer system.
These storage media are organized on the basis of data-access speed, the cost
per unit of data to buy the medium, and the medium's reliability. Thus, we can
arrange the storage media in a hierarchy according to cost and speed.
In this hierarchy, the higher levels are expensive but fast. Moving down, the
cost per bit decreases while the access time increases. The storage media from
main memory upwards are volatile, while everything below main memory is
non-volatile.
Introduction
Data can be monitored, stored, and changed rapidly and effectively using a
DBMS (Database Management System). A database possesses the atomicity,
consistency, isolation, and durability properties. Durability is the ability of
a system to preserve data and the changes made to it. A database could fail for
any of the following reasons:
o System breakdowns occur as a result of hardware or software issues in
the system.
o Disk crashes may occur as a result of the system's failure to read the disk.
Even if the database system fails, the data in the database must be recoverable
to the state it was in before the failure. In such situations, database
recovery procedures in DBMS are employed to retrieve the data.
During a transaction, a log is maintained. A log record of the form
[write_item, T, X, old_value, new_value] indicates that the transaction T
changes the value of the variable X from the old value to the new value.
We may utilize these logs to see how the state of the data changes during a
transaction and recover it to the prior or new state.
An undo operation uses the [write_item, T, X, old_value, new_value] record to
restore the data to its old value. A redo operation uses the same record to
restore the new value of data that was lost due to a system failure, and it is
applied only to transactions for which a [commit, T] record exists in the log.
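A minimal sketch of how such write_item log records can drive redo and undo
(the record layout and the values below are assumptions of this illustration,
not taken from the source):

# Illustrative sketch: log records of the form
# ("write_item", T, X, old_value, new_value) drive redo/undo.
log = [
    ("start", "T1"),
    ("write_item", "T1", "A", 5000, 4000),
    ("commit", "T1"),
    ("start", "T2"),
    ("write_item", "T2", "B", 2000, 2500),
    # system crashes here: T2 never committed
]

db = {"A": 5000, "B": 2000}   # state on disk before recovery

committed = {rec[1] for rec in log if rec[0] == "commit"}

# Redo: reapply new values of committed transactions (forward scan).
for rec in log:
    if rec[0] == "write_item" and rec[1] in committed:
        _, t, x, old, new = rec
        db[x] = new

# Undo: restore old values of uncommitted transactions (backward scan).
for rec in reversed(log):
    if rec[0] == "write_item" and rec[1] not in committed:
        _, t, x, old, new = rec
        db[x] = old

print(db)   # {'A': 4000, 'B': 2000}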
Consider the following series of transactions: t1, t2, t3, and t4. The system
crashes after the fourth transaction; however, the data can still be recovered
to the state it was in at the checkpoint established during transaction t1.
What is a Transaction?
A transaction is a set of logically related operations. It mainly involves the
following two operations:
Read(A): A read operation Read(A) or R(A) reads the value of A from the
database and stores it in a buffer in main memory.
Write(A): A write operation Write(A) or W(A) writes the value back to the
database from the buffer.
(Note: A write does not always need to go back to the database immediately; it
may only write the change to the buffer, which is why dirty reads come into the
picture.)
Let us take a debit transaction from an account that consists of the following
operations:
1. R(A);
2. A=A-1000;
3. W(A);
• The first operation reads the value of A from the database and stores it in
a buffer. Assume A's balance is initially Rs. 5000.
• The second operation decreases its value by 1000, so the buffer will
contain 4000.
• The third operation writes the value from the buffer to the database, so
A's final value will be 4000.
But it may also be possible that the transaction fails after executing some of
its operations. The failure can be because of hardware, software, power, etc.
For example, if the debit transaction discussed above fails after executing
operation 2, the value of A will remain 5000 in the database, which is not
acceptable by the bank. To avoid this, the database has two important
operations:
• Commit: After all instructions of a transaction are executed successfully,
the changes made by the transaction are made permanent in the database.
• Rollback: If the transaction fails partway, all the changes made by it are
undone and the database is restored to its state before the transaction
started.
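A rough sketch of the debit example with commit and rollback (the buffer layout
and the function name are assumptions of this illustration):

# Hypothetical sketch: read into a buffer, modify, and only write back on
# commit; roll back (discard the buffer) on failure.
db = {"A": 5000}          # value of A on disk
buffer = {}               # main-memory buffer

def debit(amount, fail_after_update=False):
    buffer["A"] = db["A"]         # R(A): read A into the buffer
    buffer["A"] -= amount         # A = A - 1000 (in the buffer only)
    if fail_after_update:
        buffer.clear()            # Rollback: the database still holds 5000
        return "rolled back"
    db["A"] = buffer["A"]         # W(A) + Commit: make the change permanent
    return "committed"

print(debit(1000, fail_after_update=True), db)   # rolled back {'A': 5000}
print(debit(1000), db)                           # committed {'A': 4000}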
Properties of a transaction:
• For example, with T1 (a debit of Rs. 1000 from A) and T2 (a credit of Rs. 500
to A) executing concurrently, the database can reach an inconsistent state.
• Let us assume the account balance of A is Rs. 5000. T1 reads A (5000) and
stores the value in its local buffer space. Then T2 reads A (5000) and also
stores the value in its local buffer space.
• T1 then writes its value back, so A is updated to 4000 in the database.
After that, T2 writes the value from its buffer back to the database and A is
updated to 5500, which shows that the effect of the debit transaction is lost
and the database has become inconsistent. This interleaving is shown in
Table 1 below.
T1                          T2
R(A);  A = 5000
                            R(A);  A = 5000
A = A - 1000;
W(A);  A = 4000
                            A = A + 500;
                            W(A);  A = 5500
Table 1
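The interleaving of Table 1 can be traced in a few lines (variable names and
the per-transaction local buffers are assumptions of this sketch):

# Sketch of the lost-update interleaving from Table 1.
db = {"A": 5000}

t1_local = db["A"]        # T1: R(A) -> 5000
t2_local = db["A"]        # T2: R(A) -> 5000 (reads before T1 writes)

t1_local -= 1000          # T1: A = A - 1000
db["A"] = t1_local        # T1: W(A) -> 4000

t2_local += 500           # T2: A = A + 500 (based on the stale 5000)
db["A"] = t2_local        # T2: W(A) -> 5500, T1's debit is lost

print(db["A"])            # 5500 instead of the correct 4500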
Isolation: The result of a transaction should not be visible to others before
the transaction is committed. For example, let us assume that A's balance is
Rs. 5000 and T1 debits Rs. 1000 from A. A's new balance will be 4000. If T2
credits Rs. 500 to A's new balance, A will become 4500, and if T1 then fails,
we have to roll back T2 as well because it used the value produced by T1. So a
transaction's results are not made visible to other transactions before it
commits.
Durable: Once the database has committed a transaction, the changes made by
the transaction should be permanent. For example, if a person has credited
$500000 to his account, the bank can't say that the update has been lost. To
avoid this problem, multiple copies of the database are stored at different
locations.
What is a Schedule?
A schedule is the order in which the operations of one or more transactions are
executed in the database.
Question: Consider the following transaction involving two bank accounts x and
y:
1. read(x);
2. x := x – 50;
3. write(x);
4. read(y);
5. y := y + 50;
6. write(y);
The constraint that the sum of the accounts x and y should remain constant is
that of?
1. Atomicity
2. Consistency
3. Isolation
4. Durability
[GATE 2015]
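As a quick sanity check (the numeric values below are illustrative assumptions,
not part of the question), the six steps leave the sum x + y unchanged when the
transaction runs to completion:

# Trace of the six steps above with illustrative starting balances.
x, y = 200, 300
total_before = x + y

x = x - 50     # steps 1-3: read(x), x := x - 50, write(x)
y = y + 50     # steps 4-6: read(y), y := y + 50, write(y)

assert x + y == total_before
print(x, y, x + y)   # 150 350 500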
Advantages of Concurrency:
In general, concurrency means that more than one transaction can work on the
system at the same time.
• Waiting Time: If a process is in the ready state but does not yet get the
system to execute on, the time it spends waiting is called the waiting time.
Concurrency leads to less waiting time.
• Response Time: The time taken to get the first response from the CPU is
called the response time. Concurrency leads to a lower response time.
S.NO.   Serial Schedule                                Serializable Schedule
1       In a serial schedule, transactions are         In a serializable schedule, transactions
        executed one after the other.                  are executed concurrently.
Buffer management:
• The buffer manager sends the block address if a user requests certain data
and the data block is already present in the database buffer in main memory.
• It is also responsible for allocating space for data blocks in the database
buffer when the requested blocks are not found there.
• If the data blocks to be removed have been recently updated, the changes are
copied/written to disk storage; otherwise they are simply removed from the
database buffer.
• The buffer manager is transparent to the programs that issue requests for
disk blocks; internally it behaves much like a virtual-memory manager in the
system.
Methods
The buffer manager applies the following techniques to provide the database
system with the best possible service:
Buffer Replacement Strategy
If there is no space for a new data block in the database buffer, an existing
block must be removed from the buffer to make room for the new one. Like
several operating systems, the buffer manager typically uses the Least Recently
Used (LRU) technique: the least recently used data block is taken out of the
buffer and sent back to the disk. This kind of replacement technique is called
a Buffer Replacement Strategy.
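A minimal LRU buffer-replacement sketch (an illustration under assumed names
and structures, not the internals of any particular DBMS):

from collections import OrderedDict

class Buffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()        # block_id -> data, oldest first

    def get(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)     # mark as recently used
            return self.blocks[block_id]
        data = self._read_from_disk(block_id)
        if len(self.blocks) >= self.capacity:
            evicted_id, evicted = self.blocks.popitem(last=False)  # LRU victim
            self._write_to_disk(evicted_id, evicted)  # write back if modified
        self.blocks[block_id] = data
        return data

    def _read_from_disk(self, block_id):
        return f"data-of-{block_id}"          # placeholder for a disk read

    def _write_to_disk(self, block_id, data):
        pass                                   # placeholder for a disk write

buf = Buffer(capacity=2)
buf.get("B1"); buf.get("B2"); buf.get("B1")
buf.get("B3")            # B2 is the least recently used block and is evicted
print(list(buf.blocks))  # ['B1', 'B3']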
Pinned Blocks
For a database to be able to recover data blocks after a system crash or
failure, it is crucial to restrict the times at which a block may be
copied/written back to disk storage. Most recovery systems forbid writing a
block back to the disk while an update on that block is in progress. Data
blocks that are not allowed to be written back to the disk are called Pinned
Blocks. This ability to prevent writes while updates are being made helps the
database ensure that the correct data is persisted after all operations.
Forced Output of Blocks
Sometimes the changes made in a data block must be copied/written back to disk
storage even though the space that the block occupies in the database buffer is
not needed for reuse. This is called a Forced Output of Blocks. It is used
because a system failure can cause the data stored in the database buffer to be
lost, while disk storage is usually not affected by a system crash or failure.
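A small sketch of how pinned blocks and forced output might interact with such
a buffer (all names and structures here are assumptions for illustration):

# Illustrative buffer state: which blocks are pinned and which are dirty.
buffer = {"B1": "page-1", "B2": "page-2"}   # in-memory blocks
pinned = {"B1"}                              # blocks currently being updated
dirty = {"B1", "B2"}                         # blocks modified since last write

def force_output(block_id):
    # Write the block to disk even though its buffer space is not needed.
    if block_id in dirty:
        dirty.discard(block_id)              # placeholder for the disk write

def evict(block_id):
    if block_id in pinned:
        raise RuntimeError(f"{block_id} is pinned and cannot be written out")
    force_output(block_id)
    del buffer[block_id]

force_output("B2")   # forced output: persist B2 although it stays in the buffer
evict("B2")          # fine: B2 is not pinned
# evict("B1")        # would raise: B1 is pinned while its update is in progress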
The basic measure is to dump the entire contents of the database to stable
storage periodically.
a) Output all the log records currently present in main memory to stable
storage.
b) Write all the modified buffer blocks to the disk.
c) Copy all the data present in the database to stable storage.
Most database systems support an ‘SQL dump’ as well, which writes out all the
SQL DDL statements and SQL insert statements into a file, which can then be re-
executed to recreate the database.
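A rough outline of these dump steps in code (the function and variable names
are placeholders assumed for this sketch, not a real DBMS API):

# Illustrative outline of a periodic dump to stable storage.
def dump(log_buffer, buffer_blocks, database, stable_storage):
    stable_storage["log"] = list(log_buffer)        # a) flush in-memory log records
    for block_id, data in buffer_blocks.items():    # b) write modified buffer blocks
        database[block_id] = data
    stable_storage["db_copy"] = dict(database)      # c) copy the database contents
    stable_storage["log"].append(("dump",))         # record that the dump completed

stable = {}
dump(log_buffer=[("start", "T1")], buffer_blocks={"B1": "page-1"},
     database={"B1": "old-page-1"}, stable_storage=stable)
print(stable["log"][-1])   # ('dump',)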
There are other logging techniques that are more efficient than the above
methods. Let us see some of them below:
o Operation Logging
o Transaction Rollback
o Crash Recovery
▪ Redo Phase
▪ Undo Phase
o Check pointing
Redo for a transaction is done by following the physical log. A logical log is
not maintained for redoing the transaction, because the state of the record may
already have changed by the time the system is recovered: other transactions
may have executed in the meantime, which would make a logical redo log wrong.
Hence the physical log itself is re-executed to redo the operations.
Operation Logging
• When an operation starts, an <Ti, Oj, Operation_begin> record is logged to
mark its beginning.
• While the operation is executed, logs for it are inserted as in any other
normal logging method; these contain the physical undo and redo information.
• When the operation is complete, an <Ti, Oj, Operation_end, U> record is
logged. It carries the logical undo information U needed to revert the changes.
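A small illustration of what such operation-logging records might look like for
an insert operation O1 of transaction T1 (the record layout and values are
assumptions of this sketch, following the <Ti, Oj, ...> notation above):

# Build an example operation log for T1's operation O1.
log = []

log.append(("T1", "start"))
log.append(("T1", "O1", "operation_begin"))
# physical undo/redo information recorded while the operation executes
log.append(("T1", "X", "old_value", "new_value"))
# operation end carries the logical undo U (here: delete the inserted record)
log.append(("T1", "O1", "operation_end", "DELETE X"))

for record in log:
    print(record)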
Transaction Rollback
• While traversing the log backwards, if <Ti, Oj, Operation_end, U> is found,
the operation is rolled back using its logical undo U. This logical undo is
also logged into the log file like a normal operation execution, but at the end
<Ti, Oj, Operation_abort> is logged instead of <Ti, Oj, Operation_end, U>. Then
all the log records are skipped until <Ti, Oj, Operation_begin> is reached. In
other words, the logical undo is performed like any other normal operation and
its logs are entered into the log file, while all the physical undo records of
that operation are ignored.
Let us consider the transaction below. T1 has two operations O1 and O2, where
O1 completes fully and the system crashes while O2 is being performed. During
recovery, the log is scanned in reverse from the point of failure and log
entries for the recovery are written. The scan first finds the
<T1, Z, 'abc', 'xyz'> entry of O2, and a redo-only entry <T1, Z, 'abc'> is
written for it. It then finds the operation-end record of O1, so it uses the
logical undo to roll back the changes made by O1. Since the logical undo is
'DELETE', redo logs for performing the 'DELETE' are inserted, and these in turn
delete the changes done by operation O1. The scan then traverses back over the
physical redo records of O1 without executing them (they are ignored) until
<T1, Start> is reached, and stops. Finally, <T1, Abort> is added to the log
file to indicate the end of reverting transaction T1. We can see this in the
log file below: after the logical undo of O1, there are no further physical
undo or redo logs for it, and the log jumps to the Abort entries.
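The backward rollback scan described above can be sketched roughly as follows
(record shapes, values, and the collected undo list are illustrative
assumptions, not the actual log of the example):

# Walk the log backwards: use logical undo at an operation_end record and skip
# that operation's physical records; otherwise apply physical undo.
log = [
    ("T1", "start"),
    ("T1", "O1", "operation_begin"),
    ("T1", "X", "100", "200"),                 # physical undo/redo for O1
    ("T1", "O1", "operation_end", "DELETE X"), # logical undo for O1
    ("T1", "O2", "operation_begin"),
    ("T1", "Z", "abc", "xyz"),                 # crash happens inside O2
]

undo_actions = []
i = len(log) - 1
while i >= 0 and log[i] != ("T1", "start"):
    rec = log[i]
    if len(rec) == 4 and rec[2] == "operation_end":
        undo_actions.append(rec[3])            # logical undo of a complete operation
        while log[i][:3] != (rec[0], rec[1], "operation_begin"):
            i -= 1                              # skip the operation's physical records
    elif len(rec) == 4:
        undo_actions.append(f"restore {rec[1]} to {rec[2]}")  # physical undo
    i -= 1

print(undo_actions)   # ['restore Z to abc', 'DELETE X']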
Crash Recovery
Redo Phase
• In this phase, the log is scanned forward from the last checkpoint, and every
logged operation is redone. An undo list L is built during this scan: when a
<Ti, Start> record is found, Ti is added to the undo list, and when
<Ti, Commit> or <Ti, Abort> is found, Ti's entry is deleted from the undo
list L.
Hence the undo list contains all the transactions that were only partially
performed, while all committed transactions are redone. (Redoing a transaction
is not exactly the same as re-executing it; this forward scan assumes that
committed transactions have already been performed, and it only collects the
transactions that are not yet committed and are still in a partial execution
state.)
Undo Phase
In this phase, the log file is scanned backwards for the transactions in the
undo list. These transactions are undone as described in transaction rollback.
For each operation, the scan checks for its operation-end log record: if it is
found, a logical undo is performed, otherwise a physical undo is performed, and
the corresponding records are entered into the log file.
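A compact sketch of the two recovery passes over a small example log (record
shapes and values are assumed for illustration only):

# Redo phase: forward scan from the last checkpoint, redo every write and
# build the undo list of transactions without a commit/abort record.
# Undo phase: backward scan, restore old values for transactions in that list.
log = [
    ("checkpoint",),
    ("T1", "start"), ("T1", "A", 5000, 4000), ("T1", "commit"),
    ("T2", "start"), ("T2", "B", 2000, 2500),   # crash: T2 never finished
]
db = {"A": 5000, "B": 2000}

undo_list = set()
for rec in log:                                  # redo phase (forward)
    if rec[-1] == "start":
        undo_list.add(rec[0])
    elif rec[-1] in ("commit", "abort"):
        undo_list.discard(rec[0])
    elif len(rec) == 4:
        db[rec[1]] = rec[3]                      # redo with the new value

for rec in reversed(log):                        # undo phase (backward)
    if len(rec) == 4 and rec[0] in undo_list:
        db[rec[1]] = rec[2]                      # restore the old value

print(undo_list, db)   # {'T2'} {'A': 4000, 'B': 2000}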
Check pointing
Check pointing is the mechanism of marking the entries in the log file up to
which the changes have been permanently written to the database, so that if
there is any failure, the log file need not be traversed beyond that point.
Only the entries after the checkpoint have not yet been written to the DB and
have to be redone / undone. Check pointing is done at periodic intervals or as
per a schedule. It looks at the log records created after the last checkpoint
and outputs them to stable storage / disk. If the buffer blocks in main memory
are full, it also outputs the logs to disk. When a new checkpoint is defined,
all the entries from the last checkpoint to the new checkpoint are written to
disk. No transactions are executed while this check pointing is in progress.
• A list M is created of all the log records between the last checkpoint and
the new checkpoint, i.e. M is the list of log records that are yet to be
written to disk.
• The buffer blocks referenced in list M are written to the disk or stable
storage; no transactions are allowed to update these blocks. In addition, all
the log records in list M that describe a block are written to the disk first,
and only then is the block itself written to the disk.
• The disk holds a pointer to the last checkpoint, last_checkpoint, at a fixed
location. This helps in reading the blocks for the next update and maintaining
the new list M.
Whenever there is recovery from a failure, the logs are read starting from the
last_checkpoint stored on disk. This is because the logs before last_checkpoint
have already been applied to the DB, and only those after this point still have
to be written. These logs are then processed as described in the methods above.
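A minimal sketch of taking a checkpoint as described above (the data structures
and names, such as last_checkpoint, are illustrative assumptions):

# Flush the pending log records and buffer blocks, then record the position of
# the new checkpoint so that recovery can start from last_checkpoint.
log = [("T1", "start"), ("T1", "A", 5000, 4000), ("T1", "commit")]
disk = {"log": [], "blocks": {}, "last_checkpoint": 0}
buffer_blocks = {"B1": "page-1"}

def take_checkpoint():
    # M: log records written since the last checkpoint that are not yet on disk
    m = log[disk["last_checkpoint"]:]
    disk["log"].extend(m)                       # write the log records first
    disk["blocks"].update(buffer_blocks)        # then write the buffer blocks
    disk["last_checkpoint"] = len(log)          # remember the checkpoint position

take_checkpoint()
print(disk["last_checkpoint"], len(disk["log"]))   # 3 3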
Remote Backup
Data and log records from a primary site are continuously backed
up into a remote backup site.
a) Detection of failure: The backup site must be able to detect that the
primary site has failed.
b) Transfer of control: When the primary site fails, the backup site takes over
the processing and becomes the new primary site.