
Transactions (Database Engine)


• Transactions have the “ACID” properties
  – Atomicity - all or nothing
  – Consistency - preserves database integrity
  – Isolation - execute as if they were run alone
  – Durability - results aren’t lost by a failure

A transaction is a sequence of operations performed as a single logical unit of work. A
logical unit of work must exhibit four properties, called the atomicity, consistency, isolation, and
durability (ACID) properties, to qualify as a transaction.

Atomicity

A transaction must be an atomic unit of work; either all of its data modifications are
performed, or none of them is performed.
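Atomicity can be observed in any transactional database. As a minimal sketch using Python's sqlite3 module (the table and account names are hypothetical), a failure partway through a transfer leaves no trace of the partial update:

```python
import sqlite3

# In-memory database with a toy accounts table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 70 WHERE name = 'alice'")
        # Simulate a failure mid-transaction: the debit above must not survive.
        raise RuntimeError("crash before the matching credit")
except RuntimeError:
    pass

balance = conn.execute("SELECT balance FROM accounts WHERE name = 'alice'").fetchone()[0]
print(balance)  # 100: the partial debit was rolled back
```

Because the exception aborts the transaction, either both the debit and the credit would have been applied, or neither is.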

Consistency

When completed, a transaction must leave all data in a consistent state. In a relational
database, all rules must be applied to the transaction's modifications to maintain all data
integrity. All internal data structures, such as B-tree indexes or doubly-linked lists, must
be correct at the end of the transaction.

Isolation

Modifications made by concurrent transactions must be isolated from the modifications
made by any other concurrent transactions. A transaction either recognizes data in the
state it was in before another concurrent transaction modified it, or it recognizes the data
after the second transaction has completed, but it does not recognize an intermediate
state. This is referred to as serializability because it results in the ability to reload the
starting data and replay a series of transactions to end up with the data in the same state it
was in after the original transactions were performed.

Durability
After a transaction has completed, its effects are permanently in place in the system. The
modifications persist even in the event of a system failure.

Specifying and Enforcing Transactions

SQL programmers are responsible for starting and ending transactions at points that enforce the
logical consistency of the data. The programmer must define the sequence of data modifications
that leave the data in a consistent state relative to the organization's business rules. The
programmer includes these modification statements in a single transaction so that the SQL
Server Database Engine can enforce the physical integrity of the transaction.

It is the responsibility of an enterprise database system, such as an instance of the Database
Engine, to provide mechanisms ensuring the physical integrity of each transaction. The Database
Engine provides:

 Locking facilities that preserve transaction isolation.
 Logging facilities that ensure transaction durability. Even if the server hardware,
operating system, or the instance of the Database Engine itself fails, the instance uses the
transaction logs upon restart to automatically roll back any uncompleted transactions to
the point of the system failure.
 Transaction management features that enforce transaction atomicity and consistency.
After a transaction has started, it must be successfully completed, or the instance of the
Database Engine undoes all of the data modifications made since the transaction started.

Deadlock in Database Systems


August 10th, 2009 Angus Macdonald

I first studied deadlock in my second year as an undergraduate, which is now around five years
ago. It took a recent supervisor meeting to realise how little of that I remembered, so I felt now
was a good time to refresh my memory a little. Hopefully it’ll help you too.

What is deadlock?

Deadlock occurs when two or more transactions are each waiting for the other to release a
lock. Neither can proceed, and so both stall.

To combat this, a system can either prevent deadlock from occurring, or detect when it does
happen and act accordingly.

Models for Concurrency Control

To understand how deadlock occurs in database systems it helps to understand the role of various
concurrency control techniques. I’m going to discuss two popular approaches.
Two-phase locking (2PL) is commonly used to guarantee serializability in database systems. In
this model a transaction can either obtain a shared or exclusive lock for an item, but all locking
operations must occur before the first unlock operation. The name refers to the two phases that
result from this: the expanding phase where locks are acquired, and the shrinking phase where
locks are released.

There are a number of variations on this model. Conservative 2PL requires that all locks are
taken out at the beginning of the transaction, whereas Rigorous 2PL requires that all locks are
held until after the transaction commits (or aborts). The former collapses the expanding phase,
while the latter collapses the shrinking phase.
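The two phases can be sketched as a small state machine. This is an illustrative model only, not any particular database's lock manager:

```python
class TwoPhaseTransaction:
    """Toy transaction enforcing the two-phase rule: all lock
    acquisitions (expanding phase) must precede the first release
    (shrinking phase)."""

    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False  # flips to True on the first unlock

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError(f"{self.name}: cannot lock {item!r} after unlocking")
        self.locks.add(item)

    def unlock(self, item):
        self.shrinking = True  # entering the shrinking phase
        self.locks.discard(item)

t = TwoPhaseTransaction("T1")
t.lock("x")
t.lock("y")        # still expanding: fine
t.unlock("x")      # shrinking phase begins
try:
    t.lock("z")    # violates 2PL: lock request after an unlock
except RuntimeError as e:
    print("rejected:", e)
```

Conservative 2PL would call `lock` for every item before doing any work; Rigorous 2PL would never call `unlock` until commit or abort.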

Timestamp-based concurrency control involves using unique transaction timestamps in place of
conventional locks. Concurrency control is based on the ordering of timestamps. So, for
example, when a transaction accesses an item, the system checks whether this transaction is older
than the last one which accessed the item. If this is the case the transaction proceeds; otherwise
ordering is violated and the transaction is aborted. Such strict timestamp-based approaches can
lead to the cyclic restart of transactions and starvation.
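The basic check described above can be sketched as follows. This is a simplification that tracks only a single last-access timestamp per item; real implementations keep separate read and write timestamps:

```python
def timestamp_check(item_last_ts, txn_ts):
    """Basic timestamp ordering: a transaction may access an item only
    if no younger (higher-timestamp) transaction accessed it already;
    otherwise it is aborted (and would restart with a new timestamp)."""
    if txn_ts < item_last_ts:
        return "abort"    # ordering violated: a younger txn got there first
    return "proceed"

# A transaction with timestamp 5 accessing an item last touched at timestamp 3:
print(timestamp_check(3, 5))  # proceed
# An older transaction (timestamp 2) arriving after timestamp 3 is aborted:
print(timestamp_check(3, 2))  # abort
```

The "cyclic restart" problem is visible here: an aborted transaction restarts with a fresh timestamp and may be aborted again by the same pattern of accesses.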

Multi-version concurrency control (MVCC) also incorporates timestamps by allowing several
versions of an item to be stored. This allows the system to present a consistent (but potentially
historical) version of the database to queries, meaning fewer reads are rejected than with basic
timestamp ordering.

Optimistic concurrency control (OCC) allows multiple transactions to read and update items
without blocking. However, before a transaction is committed the database must check for
conflicts – if any are found one of the conflicting transactions is rolled back.
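The commit-time conflict check can be sketched as follows. This is a simplified backward-validation model; the map of committed writes and the timestamps are hypothetical:

```python
def occ_validate(txn_read_set, txn_start, committed_writes):
    """Sketch of backward validation in OCC: a committing transaction is
    valid only if no item it read was overwritten by a transaction that
    committed after it started. `committed_writes` maps item -> commit time."""
    for item in txn_read_set:
        if committed_writes.get(item, 0) > txn_start:
            return False  # conflict: someone committed a newer version of item
    return True

committed = {"x": 7, "y": 3}
# Read "y" (last committed at 3) by a transaction that started at 5: no conflict.
print(occ_validate({"y"}, 5, committed))  # True
# Read "x" (committed at 7, after our start at 5): conflict, roll back.
print(occ_validate({"x"}, 5, committed))  # False
```

When validation fails, the transaction is rolled back and re-executed, matching the abort-and-retry behaviour described above.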

Deadlock Prevention

For deadlock to occur four conditions need to be true (meaning you need to break one to prevent
deadlock):

Mutual Exclusion – a resource cannot be held by more than one transaction at a time. This
condition is true of database systems using 2PL, where an exclusive lock is required on
updates. Systems using OCC don’t hold locks, and so break this condition.

Hold and Wait – transactions already holding resources can request further resources.
Conservative 2PL breaks this condition, since it requires all locks to be acquired from the outset.
However this isn’t always desirable as it limits concurrency.

No pre-emption – a resource cannot be forcibly removed from a transaction. Pre-emption is
used in timestamp-based approaches. Two of the most commonly used schemes are wait-die and
wound-wait. In wait-die (non-preemptive), if a transaction tries to lock an item which is already
locked, it waits if the holder of the lock is a younger transaction (based on timestamp); otherwise
it aborts. In wound-wait (pre-emptive), instead of waiting, the transaction aborts (“wounds”) the
younger transaction holding the lock; if it is itself the younger transaction, it waits.
Circular wait – a number of transactions form a circular chain where each transaction is waiting
for a resource that a later transaction (in the chain) holds. This can be prevented by imposing a
total ordering on resources, requiring that each transaction requests locks on resources in an
agreed order. This may not be possible in some forms of 2PL where locks are not taken out at a
single point in time (e.g. rigorous 2PL).
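The resource-ordering idea can be sketched as follows (the resource names and the global ranking are hypothetical):

```python
def acquire_in_order(requested, order):
    """Break circular wait by always locking resources in an agreed
    total order (`order` is a hypothetical global ranking)."""
    return sorted(requested, key=order.index)

ORDER = ["accounts", "orders", "audit_log"]
# Two transactions that each need the same pair of tables lock them in
# the same sequence, so neither can hold one while waiting on the other:
print(acquire_in_order({"orders", "accounts"}, ORDER))  # ['accounts', 'orders']
print(acquire_in_order({"accounts", "orders"}, ORDER))  # ['accounts', 'orders']
```

Since every transaction requests locks in the same sequence, no chain of transactions can close into a cycle.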

Many of these approaches aren’t ideal because they result in transactions being aborted at the
slightest chance of deadlock. In situations where deadlock will rarely occur (for example, when
transactions are mostly short-lived and lightweight) detection is more practical.

Deadlock Detection

Deadlock detection involves periodically checking whether the system is in a state of deadlock.
There are two basic methods of detection: timeouts and wait-for graphs.

Timeouts represent the simplest method of detection. If a transaction waits for a period longer
than some constant timeout period it will be aborted. This method has a low overhead, but may
end up aborting transactions even in the absence of deadlock.

Another approach is for the system to construct a wait-for graph. Each node in the graph
represents an active transaction. A directed edge is drawn between two transactions when one
transaction is waiting for a lock on an item held by the other transaction. Deadlock exists (and is
detected) when there is a cycle in the graph. At this point the system engages in victim selection,
where one of the transactions is chosen to be aborted. The challenge in this technique is deciding
when and how often to check for deadlock in the graph.
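Cycle detection in a wait-for graph can be sketched with a depth-first search (a minimal illustration, not a production detector):

```python
def has_deadlock(wait_for):
    """Detect deadlock as a cycle in the wait-for graph, where
    wait_for[t] is the set of transactions that t is waiting on."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {t: WHITE for t in wait_for}

    def dfs(t):
        color[t] = GREY
        for u in wait_for.get(t, ()):
            if color.get(u, WHITE) == GREY:        # back edge: cycle found
                return True
            if color.get(u, WHITE) == WHITE and dfs(u):
                return True
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and dfs(t) for t in wait_for)

# T1 waits on T2 and T2 waits on T1: classic deadlock.
print(has_deadlock({"T1": {"T2"}, "T2": {"T1"}}))  # True
print(has_deadlock({"T1": {"T2"}, "T2": set()}))   # False
```

Once a cycle is found, the system would pick one transaction on it as the victim and abort it to break the deadlock.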

Serialization

Serialization is the process of translating data structures or object state into a format that can be stored
(for example, in a file or memory buffer, or transmitted across a network connection link) and
resurrected later in the same or another computer environment. [1] When the resulting series of bits is
reread according to the serialization format, it can be used to create a semantically identical clone of the
original object. For many complex objects, such as those that make extensive use of references, this
process is not straightforward. Serialization of object-oriented objects does not include any of their
associated methods, with which they were previously inextricably linked.

Serialization provides:

• a method of persisting objects, for example writing their properties to a file on disk, or
saving them to a database.
• a method of remote procedure calls, e.g., as in SOAP.
• a method for distributing objects, especially in software componentry such as COM,
CORBA, etc.
• a method for detecting changes in time-varying data.
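As a minimal sketch in Python, an object's state can be serialized and later resurrected with the standard pickle (byte stream) and json (portable text) modules; the sample state is hypothetical:

```python
import json
import pickle

state = {"user": "alice", "cart": [101, 202], "total": 34.5}

# Serialize to a Python-specific byte stream and to portable JSON text:
blob = pickle.dumps(state)
text = json.dumps(state)

# "Resurrect" semantically identical clones, potentially in another process:
assert pickle.loads(blob) == state
assert json.loads(text) == state
print(text)
```

Note that, as the text above says, only state is captured: unpickling a class instance restores its attributes, while its methods come from the class definition available in the receiving environment.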
Recovery
Recovery from Instance Failure

Crash or instance recovery recovers a database to its transaction-consistent state just before
instance failure. Crash recovery recovers a database in a single-instance configuration and
instance recovery recovers a database in an Oracle Parallel Server configuration. If all instances
of an Oracle Parallel Server database crash, then Oracle performs crash recovery.

Recovery from instance failure is automatic, requiring no DBA intervention. For example, when
using the Oracle Parallel Server, another instance performs instance recovery for the failed
instance. In single-instance configurations, Oracle performs crash recovery for a database when
the database is restarted, that is, mounted and opened to a new instance. The transition from a
mounted state to an open state automatically triggers crash recovery, if necessary.

Crash or instance recovery consists of the following steps:

1. Rolling forward to recover data that has not been recorded in the datafiles, yet has been
recorded in the online redo log, including the contents of rollback segments. This is called cache
recovery.

2. Opening the database. Instead of waiting for all transactions to be rolled back before making the
database available, Oracle allows the database to be opened as soon as cache recovery is
complete. Any data that is not locked by unrecovered transactions is immediately available.

3. Marking all transactions system-wide that were active at the time of failure as DEAD and
marking the rollback segments containing these transactions as PARTLY AVAILABLE.

4. Rolling back dead transactions as part of SMON recovery. This is called transaction recovery.

5. Resolving any pending distributed transactions undergoing a two-phase commit at the time of
the instance failure.

6. As new transactions encounter rows locked by dead transactions, they can automatically roll
back the dead transaction to release the locks. If you are using Fast-Start Recovery, just the data
block is immediately rolled back, as opposed to the entire transaction.
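The roll-forward/roll-back idea behind steps 1 and 4 can be illustrated with a toy write-ahead log. This is a deliberately simplified model, not Oracle's actual log format or recovery code (real recovery restores prior values from undo records rather than deleting items):

```python
# Toy write-ahead log: (txn, operation, item, value) entries.
log = [
    ("T1", "write", "x", 10),
    ("T2", "write", "y", 20),
    ("T1", "commit", None, None),
    ("T2", "write", "z", 30),
    # crash here: T2 never committed, so it is a "dead" transaction
]

def crash_recover(log):
    db, committed = {}, set()
    # 1. Roll forward: redo every logged write (cache recovery).
    for txn, op, item, value in log:
        if op == "write":
            db[item] = (txn, value)
        elif op == "commit":
            committed.add(txn)
    # 2. Roll back: undo the writes of transactions that never
    #    committed (transaction recovery of the dead transactions).
    return {item: value for item, (txn, value) in db.items() if txn in committed}

print(crash_recover(log))  # {'x': 10}
```

After recovery, only T1's committed write survives; T2's writes are undone exactly as if T2 had never run.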

Logical Security

Logical security consists of software safeguards for an organization’s systems, including user
identification and password access, authentication, access rights and authority levels. These measures
ensure that only authorized users are able to perform actions or access information in a network
or a workstation. It is a subset of computer security.

Elements of logical security include:
 User IDs, also known as logins, user names, logons or accounts, are unique personal
identifiers for agents of a computer program or network that is accessible by more than
one agent. These identifiers are based on short strings of alphanumeric characters, and are
either assigned or chosen by the users.
 Authentication is the process used by a computer program, computer, or network to
attempt to confirm the identity of a user. Blind credentials (anonymous users) have no
identity, but are allowed to enter the system. The confirmation of identities is essential to
the concept of access control, which gives access to the authorized and excludes the
unauthorized.
 Biometrics authentication is the measuring of a user’s physiological or behavioral
features to attempt to confirm his/her identity. Physiological aspects that are used include
fingerprints, eye retinas and irises, voice patterns, facial patterns, and hand
measurements. Behavioral aspects that are used include signature recognition, gait
recognition, speaker recognition and typing pattern recognition. When a user registers
with the system which he/she will later attempt to access, one or more of his/her
physiological characteristics are obtained and processed by a numerical algorithm. This
number is then entered into a database, and the features of a user later attempting access
must match the stored features to within a certain error rate.
 Difference between logical and physical security:

The term Logical Security is colloquially used to refer to electronic measures such as permissions
within the operating system or access rules at the network layers such as the firewall, routers and
switches.

Physical security is traditionally used to describe controlled entry doors, video surveillance and other
tangible measures.

The overlap between the two is ever increasing: systems that provide logical security often include
physical measures such as key-lock panels on server face plates, while physical systems such as HID
card readers for door entry use a server to maintain user and card records, as do the ever-increasing
number of PC-based DVRs for video surveillance cameras.

Concurrency control mechanisms


Categories

The main categories of concurrency control mechanisms are:

 Optimistic - Delay the checking of whether a transaction meets the isolation and other integrity
rules (e.g., serializability and recoverability) until its end, without blocking any of its (read, write)
operations ("...and be optimistic about the rules being met..."), and then abort the transaction if
committing it would violate the desired rules. An aborted transaction is immediately restarted
and re-executed, which incurs an obvious overhead (versus executing it to the end only once). If
not too many transactions are aborted, then being optimistic is usually a good strategy.
 Pessimistic - Block an operation of a transaction if it may cause a violation of the rules, until the
possibility of violation disappears. Blocking operations typically reduces performance.
 Semi-optimistic - Block operations in some situations, if they may cause violation of some rules,
and do not block in other situations, while delaying rules checking (if needed) to the
transaction's end, as in the optimistic approach.

Different categories provide different performance, i.e., different average transaction completion
rates (throughput), depending on the mix of transaction types, the level of computing parallelism,
and other factors. If knowledge about the trade-offs is available, then the category and method
should be chosen to provide the highest performance.

Mutual blocking between two or more transactions (where each one blocks another) results
in a deadlock, where the transactions involved are stalled and cannot reach completion. Most
non-optimistic mechanisms (with blocking) are prone to deadlocks, which are resolved by an
intentional abort of a stalled transaction (which releases the other transactions in that deadlock)
and its immediate restart and re-execution. The likelihood of a deadlock is typically low.

Blocking, deadlocks, and aborts all result in performance reduction, and hence the trade-offs
between the categories.

Methods

Many methods for concurrency control exist. Most of them can be implemented within either
main category above. The major methods,[1] each of which has many variants, and which in some
cases may overlap or be combined, are:

1. Locking (e.g., Two-phase locking - 2PL) - Controlling access to data by locks assigned to the data.
Access of a transaction to a data item (database object) locked by another transaction may be
blocked (depending on lock type and access operation type) until lock release.
2. Serialization graph checking (also called Serializability, or Conflict, or Precedence graph
checking) - Checking for cycles in the schedule's graph and breaking them by aborts.
3. Timestamp ordering (TO) - Assigning timestamps to transactions, and controlling or checking
access to data by timestamp order.
4. Commitment ordering (or Commit ordering; CO) - Controlling or checking transactions'
chronological order of commit events to be compatible with their respective precedence order.
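Serialization graph checking (method 2) can be sketched by building the precedence graph from a schedule of read/write steps; a cycle among the resulting edges means the schedule is not conflict-serializable. The schedule and transaction names below are illustrative:

```python
def precedence_edges(schedule):
    """Build precedence-graph edges from a schedule of (txn, op, item)
    steps: an edge Ti -> Tj exists when an operation of Ti precedes a
    conflicting operation of Tj (same item, different transactions,
    and at least one of the two operations is a write)."""
    edges = set()
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            if ti != tj and item_i == item_j and "w" in (op_i, op_j):
                edges.add((ti, tj))
    return edges

# Schedule r1(x) w2(x) w1(x): T1->T2 (read before write) and T2->T1
# (write before write) form a cycle, so it is not conflict-serializable.
s = [("T1", "r", "x"), ("T2", "w", "x"), ("T1", "w", "x")]
print(sorted(precedence_edges(s)))  # [('T1', 'T2'), ('T2', 'T1')]
```

A scheduler using this method would detect the cycle in those edges and break it by aborting one of the two transactions.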

Other major concurrency control types that are utilized in conjunction with the methods above
include:

 Multiversion concurrency control (MVCC) - Increasing concurrency and performance by
generating a new version of a database object each time the object is written, and allowing
transactions' read operations of several last relevant versions (of each object), depending on
scheduling method.
 Index concurrency control - Synchronizing access operations to indexes, rather than to user
data. Specialized methods provide substantial performance gains.
 Private workspace model (Deferred update) - Each transaction maintains a private workspace
for its accessed data, and its changed data become visible outside the transaction only upon its
commit (e.g., Weikum and Vossen 2001). This model provides a different concurrency control
behavior with benefits in many cases.

The most common mechanism type in database systems since their early days in the 1970s has
been Strong strict Two-phase locking (SS2PL; also called Rigorous scheduling or Rigorous 2PL)
which is a special case (variant) of both Two-phase locking (2PL) and Commitment ordering
(CO). It is pessimistic. In spite of its long name (for historical reasons) the idea of the SS2PL
mechanism is simple: "Release all locks applied by a transaction only after the transaction has
ended." SS2PL (or Rigorousness) is also the name of the set of all schedules that can be
generated by this mechanism; these are the SS2PL (or Rigorous) schedules, which have the
SS2PL (or Rigorousness) property.

Major goals of concurrency control mechanisms

Concurrency control mechanisms firstly need to operate correctly, i.e., to maintain each
transaction's integrity rules (as related to concurrency; application-specific integrity rules are out
of scope here) while transactions are running concurrently, and thus the integrity of the entire
transactional system. Correctness needs to be achieved with as good performance as possible. In
addition, a need increasingly exists to operate effectively while transactions are distributed over
processes, computers, and computer networks. Other subjects that may affect concurrency
control are recovery and replication.

Correctness

Serializability
Main article: Serializability

For correctness, a common major goal of most concurrency control mechanisms is generating
schedules with the Serializability property. Without serializability undesirable phenomena may
occur, e.g., money may disappear from accounts, or be generated from nowhere. Serializability
of a schedule means equivalence (in the resulting database values) to some serial schedule with
the same transactions (i.e., in which transactions are sequential with no overlap in time, and thus
completely isolated from each other: No concurrent access by any two transactions to the same
data is possible). Serializability is considered the highest level of isolation among database
transactions, and the major correctness criterion for concurrent transactions. In some cases,
relaxed forms of serializability are allowed for better performance (e.g., the popular Snapshot
isolation mechanism) or to meet availability requirements in highly distributed systems (see
Eventual consistency), but only if the application's correctness is not violated by the relaxation
(e.g., no relaxation is allowed for money transactions, since by relaxation money can disappear,
or appear from nowhere).
Almost all implemented concurrency control mechanisms achieve serializability by providing
Conflict serializability, a broad special case of serializability (i.e., it covers most serializable
schedules and does not impose significant additional delay-causing constraints), which can be
implemented efficiently.

Recoverability
See Recoverability in Serializability

Comment: While in the general area of systems the term "recoverability" may refer to the ability
of a system to recover from failure or from an incorrect/forbidden state, within concurrency
control of database systems this term has received a specific meaning.

Concurrency control typically also ensures the Recoverability property of schedules for
maintaining correctness in cases of aborted transactions (which can always happen for many
reasons). Recoverability (from abort) means that no committed transaction in a schedule has
read data written by an aborted transaction. Such data disappear from the database (upon the
abort) and are parts of an incorrect database state. Reading such data violates the consistency rule
of ACID. Unlike Serializability, Recoverability cannot be relaxed in any case, since any
relaxation quickly results in database integrity violation upon aborts. The major methods
listed above provide serializability mechanisms. None of them in its general form automatically
provides recoverability, and special considerations and mechanism enhancements are needed to
support recoverability. A commonly utilized special case of recoverability is Strictness, which
allows efficient database recovery from failure (but excludes optimistic implementations; e.g.,
Strict CO (SCO) cannot have an optimistic implementation, but has semi-optimistic ones).

Comment: Note that the Recoverability property is needed even if no database failure occurs
and no database recovery from failure is needed. It is rather needed to correctly automatically
handle transaction aborts, which may be unrelated to database failure and recovery from it.

Distribution

With the fast technological development of computing, the difference between local and
distributed computing over low-latency networks or buses is blurring. Hence, the quite effective
utilization of local techniques in such distributed environments is common, e.g., in computer
clusters and multi-core processors. However, local techniques have their limitations, and they use
multiple processes (or threads) supported by multiple processors (or cores) to scale. This often
turns transactions into distributed ones when they themselves need to span multiple processes. In
these cases most local concurrency control techniques do not scale well.

Distributed serializability and Commitment ordering
