DBMS Recovery System
● An integral part of a database system is a recovery scheme that can restore the database to
the consistent state that existed before the failure.
● The recovery scheme must also provide high availability; that is, it must minimize the time
for which the database is not usable after a failure.
Failure Classification
There are various types of failure that may occur in a system, each of which needs to be dealt with in
a different manner.
Transaction failure:
There are two types of errors that may cause a transaction to fail:
◦ Logical error. The transaction can no longer continue with its normal execution because of some
internal condition, such as bad input, data not found, overflow, or resource limit exceeded.
◦ System error. The system has entered an undesirable state (for example, deadlock), as a result of which a transaction cannot
continue with its normal execution. The transaction, however, can be re-executed at a later time.
System crash. There is a hardware malfunction, or a bug in the database software or the operating
system, that causes the loss of the content of volatile storage and brings transaction processing to
a halt. The content of nonvolatile storage remains intact and is not corrupted.
Disk failure. A disk block loses its content as a result of either a head crash or a failure during a
data-transfer operation. Copies of the data on other disks, or archival backups on tertiary media
such as DVDs or tapes, are used to recover from the failure.
Storage
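Categories of Storage:
● Volatile storage. Information in volatile storage does not usually survive system crashes;
examples are main memory and cache memory.
● Nonvolatile storage. Information in nonvolatile storage survives system crashes; examples are
magnetic disks, flash storage, and tapes.
● Stable storage. Information in stable storage is, in principle, never lost; in practice it can only
be approximated, by replicating data across several nonvolatile media with independent failure modes.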
Stable Storage Implementation
Stable storage is a classification of computer data storage technology that guarantees atomicity
for any given write operation and allows software to be written that is robust against some
hardware and power failures.
To implement stable storage, we replicate the needed information in several nonvolatile
storage media (usually disks) with independent failure modes, and update the information in
a controlled manner to ensure that a failure during data transfer does not damage the needed
information.
● RAID systems guarantee that the failure of a single disk (even during data
transfer) will not result in loss of data. The simplest and fastest form of RAID is
the mirrored disk, which keeps two copies of each block, on separate disks.
● Other forms of RAID offer lower costs, but at the expense of lower performance.
● RAID systems, however, cannot guard against data loss due to disasters such
as fires or flooding.
● Many systems store archival backups on tapes off site to guard against such
disasters. However, since tapes cannot be carried off site continually, updates made
since the most recent time that tapes were carried off site could be lost in such a
disaster.
● More secure systems keep a copy of each block of stable storage at a remote
site, writing it out over a computer network, in addition to storing the block on a
local disk system.
● Since the blocks are output to a remote system as and when they are output to
local storage, once an output operation is complete, the output is not lost, even
in the event of a disaster such as a fire or flood.
Block transfer between memory and disk storage can result in:
● Successful completion. The transferred information arrived safely at its destination.
● Partial failure. A failure occurred in the midst of the transfer, and the destination
block has incorrect information.
● Total failure. The failure occurred sufficiently early during the transfer that the
destination block remains intact.
If a data-transfer failure occurs, the system detects it and invokes a recovery
procedure to restore the block to a consistent state. To do so, the system must
maintain two physical blocks for each logical database block; in the case of
mirrored disks, both blocks are at the same site, on separate disks; in the case of remote backup,
one of the blocks is local, whereas the other is at a remote site.
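The notes leave the two-copy procedure abstract. The sketch below assumes the common textbook scheme: output writes the first physical block, then the second, and counts as complete only after the second write finishes; recovery uses a per-block checksum to spot a damaged copy. The class names, the `crash_at` parameter, and the use of CRC32 are illustrative assumptions, not a real storage API.

```python
import zlib
from typing import Optional


class PhysicalBlock:
    """One physical copy of a block; a CRC32 checksum detects torn writes."""

    def __init__(self, data: bytes = b"") -> None:
        self.data = data
        self.checksum = zlib.crc32(data)

    def write(self, data: bytes, torn: bool = False) -> None:
        if torn:
            # Simulated partial failure: only half the bytes land and the
            # stored checksum is never updated, so the damage is detectable.
            self.data = data[: len(data) // 2]
            return
        self.data = data
        self.checksum = zlib.crc32(data)

    def has_error(self) -> bool:
        return zlib.crc32(self.data) != self.checksum


class StableBlock:
    """A logical block kept as two physical copies (mirror or remote site)."""

    def __init__(self, initial: bytes) -> None:
        self.first = PhysicalBlock(initial)
        self.second = PhysicalBlock(initial)

    def output(self, data: bytes, crash_at: Optional[str] = None) -> None:
        # crash_at simulates a failure: "first", "between", or "second".
        self.first.write(data, torn=(crash_at == "first"))
        if crash_at in ("first", "between"):
            return
        # The second copy is written only after the first write completes;
        # the output counts as complete only when this write finishes.
        self.second.write(data, torn=(crash_at == "second"))

    def recover(self) -> None:
        if self.first.has_error():
            self.first.write(self.second.data)   # repair from the good copy
        elif self.second.has_error():
            self.second.write(self.first.data)   # repair from the good copy
        elif self.first.data != self.second.data:
            # Both copies are readable but differ: the output never finished,
            # so make the first copy match the second (old) value.
            self.first.write(self.second.data)
```

Whatever the crash point, recovery leaves both copies identical and error-free: a crash during or right after the first write yields the old value, while a crash during the second write yields the new one.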
● Block movements between disk and main memory are initiated through the
following two operations:
1. input(A) transfers the physical block A to main memory.
2. output(B) transfers the buffer block B to the disk, and replaces the
appropriate physical block there.
• Conceptually, each transaction Ti has a private work area in which copies of data
items accessed and updated by Ti are kept. The system creates this work area when
the transaction is initiated; the system removes it when the transaction either
commits or aborts.
Each data item X kept in the work area of transaction Ti is denoted by xi. Transaction Ti
interacts with the database system by transferring data between its work area and the
system buffer.
We transfer data by these two operations:
1. read(X) assigns the value of data item X to the local variable xi. It executes this
operation as follows:
a. If block BX on which X resides is not in main memory, it issues input(BX).
b. It assigns to xi the value of X from the buffer block.
2. write(X) assigns the value of local variable xi to data item X in the buffer block. It
executes this operation as follows:
a. If block BX on which X resides is not in main memory, it issues input(BX).
b. It assigns the value of xi to X in buffer BX.
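A minimal sketch of input, output, read, and write under simplifying assumptions: the disk and the buffer are dictionaries of blocks, and a hypothetical catalog `block_of` maps each data item X to its block BX. `BufferManager`, `Transaction`, and `work` are illustrative names. Note that write(X) touches only the buffer; output(BX) is a separate step.

```python
from typing import Dict


class BufferManager:
    """Hypothetical model of disk blocks plus the main-memory buffer."""

    def __init__(self, disk: Dict[str, Dict[str, int]], block_of: Dict[str, str]) -> None:
        self.disk = disk            # block id -> {data item: value} on disk
        self.block_of = block_of    # data item X -> id of block B_X holding it
        self.buffer: Dict[str, Dict[str, int]] = {}  # blocks currently in memory

    def input(self, block_id: str) -> None:
        # input(A): transfer physical block A into main memory.
        self.buffer[block_id] = dict(self.disk[block_id])

    def output(self, block_id: str) -> None:
        # output(B): transfer buffer block B to disk, replacing the physical block.
        self.disk[block_id] = dict(self.buffer[block_id])


class Transaction:
    def __init__(self, name: str, bm: BufferManager) -> None:
        self.name = name
        self.bm = bm
        self.work: Dict[str, int] = {}  # private work area: X -> x_i

    def _fetch(self, x: str) -> str:
        bx = self.bm.block_of[x]
        if bx not in self.bm.buffer:    # step a: bring B_X into memory if needed
            self.bm.input(bx)
        return bx

    def read(self, x: str) -> None:
        bx = self._fetch(x)
        self.work[x] = self.bm.buffer[bx][x]   # step b: x_i := X

    def write(self, x: str) -> None:
        bx = self._fetch(x)
        self.bm.buffer[bx][x] = self.work[x]   # step b: X := x_i (buffer only)
```

For example, after `t.read("A")`, `t.work["A"] -= 50`, and `t.write("A")`, the buffer block holds the new value while the disk still holds the old one until `output` is called.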
Log Records
● The most widely used structure for recording database modifications is the log.
● The log is a sequence of log records, recording all the update activities in the
database.
● There are several types of log records. An update log record describes a single
database write.
● It has these fields:
• Transaction identifier, which is the unique identifier of the transaction that
performed the write operation.
• Data-item identifier, which is the unique identifier of the data item written.
Typically, it is the location on disk of the data item, consisting of the block
identifier of the block on which the data item resides, and an offset within the
block.
• Old value, which is the value of the data item prior to the write.
• New value, which is the value that the data item will have after the
write.
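We represent an update log record as <Ti, Xj, V1, V2>, where:
• Ti is the identifier of the transaction that performed the write.
• Xj is the data item written.
• V1 is the value of Xj before the write (old value).
• V2 is the value of Xj after the write (new value).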
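● Other special log records record significant events during transaction processing, such as
the start, commit, or abort of a transaction:
• <Ti start>. Transaction Ti has started.
• <Ti commit>. Transaction Ti has committed.
• <Ti abort>. Transaction Ti has aborted.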
● Note: We have the ability to undo a modification that has already been output
to the database. We undo it by using the old-value field in log records.
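One possible in-memory representation of the log record types above, as a sketch: the dataclass names and the `undo` helper are hypothetical, and a plain dictionary stands in for the database. The `__str__` methods reproduce the <Ti, Xj, V1, V2> notation.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Start:
    txn: str

    def __str__(self) -> str:
        return f"<{self.txn} start>"


@dataclass(frozen=True)
class Update:
    txn: str    # Ti
    item: str   # Xj
    old: Any    # V1, value before the write
    new: Any    # V2, value after the write

    def __str__(self) -> str:
        return f"<{self.txn}, {self.item}, {self.old}, {self.new}>"


@dataclass(frozen=True)
class Commit:
    txn: str

    def __str__(self) -> str:
        return f"<{self.txn} commit>"


@dataclass(frozen=True)
class Abort:
    txn: str

    def __str__(self) -> str:
        return f"<{self.txn} abort>"


def undo(db: Dict[str, Any], rec: Update) -> None:
    # The old-value field lets us undo a write even if it already reached disk.
    db[rec.item] = rec.old
```

With `rec = Update("T0", "A", 1000, 950)`, `str(rec)` gives `<T0, A, 1000, 950>`, and `undo(db, rec)` sets `db["A"]` back to 1000.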
Database Modification
● A transaction creates a log record prior to modifying the database.
● The log records allow the system to undo changes made by a transaction in the
event that the transaction must be aborted; they allow the system also to redo
changes made by a transaction if the transaction has committed but the system
crashed before those changes could be stored in the database on disk.
● To understand the role of these log records in recovery, we need to
consider the steps a transaction takes in modifying a data item:
1. The transaction performs some computations in its own private part of main
memory.
2. The transaction modifies the data block, held in the disk buffer in main memory,
that contains the data item.
3. The database system executes the output operation that writes the data block
to disk.
● If a transaction does not modify the database until it has committed, it is said
to use the deferred-modification technique.
● If database modifications occur while the transaction is still active, the
transaction is said to use the immediate-modification technique.
● Deferred modification has the overhead that transactions need to make local
copies of all updated data items; further, if a transaction reads a data item that
it has updated, it must read the value from its local copy.
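The contrast between the two techniques can be sketched with a dictionary standing in for the database (ignoring the buffer/disk distinction). `DeferredTxn` and `ImmediateTxn` are hypothetical names; the immediate version keeps only a private list of old values for rollback, a simplification of a real log.

```python
from typing import Any, Dict, List, Tuple


class DeferredTxn:
    """Deferred modification: the database is untouched until commit."""

    def __init__(self, db: Dict[str, Any]) -> None:
        self.db = db
        self.local: Dict[str, Any] = {}   # local copies of updated items

    def read(self, x: str) -> Any:
        # An item this transaction updated must be read from its local copy.
        return self.local[x] if x in self.local else self.db[x]

    def write(self, x: str, value: Any) -> None:
        self.local[x] = value

    def commit(self) -> None:
        self.db.update(self.local)        # updates reach the database only now
        self.local.clear()

    def abort(self) -> None:
        self.local.clear()                # nothing to undo in the database


class ImmediateTxn:
    """Immediate modification: writes go to the database while active."""

    def __init__(self, db: Dict[str, Any]) -> None:
        self.db = db
        self.undo_log: List[Tuple[str, Any]] = []   # (item, old value)

    def write(self, x: str, value: Any) -> None:
        self.undo_log.append((x, self.db[x]))       # record the old value first
        self.db[x] = value

    def abort(self) -> None:
        for x, old in reversed(self.undo_log):      # restore in reverse order
            self.db[x] = old
        self.undo_log.clear()
```

The deferred version pays for local copies and read redirection; the immediate version pays with the need to undo on abort.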
● A recovery algorithm must take into account a variety of factors, including:
• The possibility that a transaction may have committed although
some of its database modifications exist only in the disk buffer in main
memory and not in the database on disk.
• The possibility that a transaction may have modified the database
while in the active state and, as a result of a subsequent failure, may need to
abort.
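Because every database modification is preceded by a log record holding both the old value and
the new value of the data item, the system can perform undo and redo operations as appropriate: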
• Undo using a log record sets the data item specified in the log record
to the old value.
• Redo using a log record sets the data item specified in the log record
to the new value.
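A deliberately simplified recovery pass shows undo and redo working together. It assumes strict two-phase locking (so no two active transactions updated the same item), handles only start, update, and commit records, and represents records as plain tuples; aborted transactions and the extra records real systems write for them are out of scope. Committed updates are redone in log order; updates of transactions without a commit record are undone in reverse order.

```python
from typing import Any, Dict, List, Tuple

# ("start", T) | ("update", T, X, old, new) | ("commit", T)
LogRecord = Tuple[Any, ...]


def recover(db: Dict[str, Any], log: List[LogRecord]) -> Dict[str, Any]:
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    # Redo pass: scan forward, setting each committed update to its new value.
    for rec in log:
        if rec[0] == "update" and rec[1] in committed:
            _, _, item, _old, new = rec
            db[item] = new
    # Undo pass: scan backward, restoring old values of incomplete transactions.
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] not in committed:
            _, _, item, old, _new = rec
            db[item] = old
    return db
```

For instance, if T0 committed but its write of A never reached disk, while incomplete T1's write of C did, recovery restores A to T0's new value and C to its old value.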
Concurrency Control and Recovery
● If the concurrency control scheme allows a data item X that has been modified by
a transaction T1 to be further modified by another transaction T2 before T1
commits, then undoing the effects of T1 by restoring the old value of X (before T1
updated X) would also undo the effects of T2.
● To avoid such situations, recovery algorithms usually require that if a data item
has been modified by a transaction, no other transaction can modify the data
item until the first transaction commits or aborts.
● This requirement can be ensured by acquiring an exclusive lock on any updated
data item and holding the lock until the transaction commits; in other words, by
using strict two-phase locking.
● Snapshot-isolation and validation-based concurrency-control techniques also
acquire exclusive locks on data items at the time of validation, before modifying
the data items, and hold the locks until the transaction is committed; as a result,
the above requirement is satisfied even by these concurrency-control protocols.
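The strictness requirement can be sketched with a toy lock table that grants exclusive locks and releases them only when the holder commits or aborts. `StrictLockManager` and `LockConflict` are hypothetical names; a real lock manager would make the requester wait rather than raise an exception.

```python
from typing import Dict


class LockConflict(Exception):
    pass


class StrictLockManager:
    def __init__(self) -> None:
        self.x_locks: Dict[str, str] = {}   # item -> transaction holding X-lock

    def lock_x(self, txn: str, item: str) -> None:
        holder = self.x_locks.get(item)
        if holder is not None and holder != txn:
            # Another transaction modified this item and has not finished yet.
            raise LockConflict(f"{txn} must wait: {item} is locked by {holder}")
        self.x_locks[item] = txn

    def release_all(self, txn: str) -> None:
        # Called only at commit or abort, never earlier (strict 2PL).
        for item in [i for i, t in self.x_locks.items() if t == txn]:
            del self.x_locks[item]
```

Because T2 cannot lock X until T1 has committed or aborted, undoing T1 by restoring the old value of X can never wipe out an update made by T2.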
Transaction Commit
● We say that a transaction has committed when its commit log record, which is
the last log record of the transaction, has been output to stable storage; at
that point all earlier log records have already been output to stable storage.
● Thus, there is enough information in the log to ensure that even if there is a
system crash, the updates of the transaction can be redone. If a system crash
occurs before the log record <Ti commit> is output to stable storage,
transaction Ti will be rolled back.
● Thus, the output of the block containing the commit log record is the single
atomic action that results in a transaction getting committed.
● With most log-based recovery techniques, including the ones we describe in
this chapter, blocks containing the data items modified by a transaction do not
have to be output to stable storage when the transaction commits, but can be
output some time later.
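A sketch of commit as a log force, under simplifying assumptions: the log buffer and stable storage are modeled as two Python lists, and a crash simply discards the buffer. The `Log` class and its methods are hypothetical. Committing appends <Ti commit> and flushes every buffered record in order, so all earlier records of the transaction reach stable storage too; data blocks are not written at all here, matching the point that they may be output later.

```python
from typing import Any, List, Tuple

Record = Tuple[Any, ...]


class Log:
    def __init__(self) -> None:
        self.buffer: List[Record] = []   # log records still in main memory
        self.stable: List[Record] = []   # log records already on stable storage

    def append(self, record: Record) -> None:
        self.buffer.append(record)

    def flush(self) -> None:
        # Forcing the log writes every buffered record, in order.
        self.stable.extend(self.buffer)
        self.buffer.clear()

    def commit(self, txn: str) -> None:
        self.append(("commit", txn))
        self.flush()   # the transaction counts as committed once this completes

    def crash(self) -> None:
        self.buffer.clear()              # volatile contents are lost

    def is_committed(self, txn: str) -> bool:
        return ("commit", txn) in self.stable
```

A transaction whose commit record was still in the buffer at the crash is not committed and would be rolled back; one whose commit record was flushed survives the crash with all its earlier update records.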