Unit 2
Transaction Processing:
Consistency:
1. Definition: Consistency ensures that any transaction will bring the database from one
valid state to another, maintaining the predefined rules and constraints, such as
integrity constraints, cascades, triggers, etc.
2. Integrity Constraints: Consistency rules are often implemented through integrity
constraints, which can include:
o Primary Keys: Ensuring each record can be uniquely identified.
o Foreign Keys: Ensuring referential integrity between related tables.
o Unique Constraints: Ensuring that values in certain columns are unique
across the dataset.
o Check Constraints: Ensuring that values in a column meet a specific
condition.
3. Transaction Validation: During a transaction, the DBMS checks the consistency
rules. If a transaction would violate any of these rules, it is rolled back to maintain the
database's integrity. For example, if a transaction tries to insert a duplicate value in a
column with a unique constraint, the transaction will fail.
4. Examples (see the SQL sketch at the end of this section):
o Bank Transfer: When transferring money from Account A to Account B, the
transaction ensures that the total amount of money in the system remains the
same before and after the transaction. If $100 is deducted from Account A,
then $100 must be added to Account B.
o Inventory Management: If an order is placed that reduces the inventory
count, the system ensures that the count cannot go below zero, maintaining the
consistency of inventory data.
5. Ensuring Consistency:
o Atomicity: By ensuring that a transaction is all or nothing, the system
prevents partial updates that could violate consistency.
o Isolation: By ensuring that transactions are processed in isolation from one
another, the system prevents concurrent transactions from leading to an
inconsistent state.
o Durability: By ensuring that once a transaction is committed, it remains so
even in the event of a system failure, the system ensures that the database
remains consistent.
Consistency is crucial for the reliability and trustworthiness of a DBMS, ensuring that all data
remains accurate and adheres to business rules and constraints throughout all transactions.
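To make these rules concrete, here is a minimal sketch of how such constraints are declared in SQL; the accounts and transfers tables and their columns are illustrative:
CREATE TABLE accounts (
    id           INT PRIMARY KEY,                    -- primary key: unique identification
    owner_email  VARCHAR(100) UNIQUE,                -- unique constraint
    balance      DECIMAL(12,2) CHECK (balance >= 0)  -- check constraint: no negative balance
);
CREATE TABLE transfers (
    transfer_id  INT PRIMARY KEY,
    from_account INT REFERENCES accounts(id),        -- foreign key: referential integrity
    to_account   INT REFERENCES accounts(id),
    amount       DECIMAL(12,2) CHECK (amount > 0)
);
-- Any INSERT or UPDATE that violates one of these constraints is rejected,
-- and the surrounding transaction can be rolled back.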
Atomicity:
Atomicity is one of the key properties of a transaction in a Database Management System
(DBMS), often referred to by the acronym ACID (Atomicity, Consistency, Isolation,
Durability). Atomicity ensures that a series of database operations (which constitute a
transaction) are treated as a single, indivisible unit. This means that either all operations
within the transaction are completed successfully, or none of them are applied at all.
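As a minimal sketch, the bank-transfer example can be written as a single SQL transaction (reusing the illustrative accounts table sketched above; BEGIN is START TRANSACTION in some dialects):
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- debit Account A
UPDATE accounts SET balance = balance + 100 WHERE id = 2;  -- credit Account B
COMMIT;  -- both updates become permanent together
-- If either UPDATE fails (for example, the CHECK (balance >= 0) constraint is
-- violated), the application issues ROLLBACK and neither change is applied.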
In addition to Atomicity, Isolation and Durability are two other key properties of database
transactions within the ACID framework. Here’s an explanation of each:
Isolation
Isolation ensures that concurrent transactions do not interfere with each other. Each
transaction operates as if it is the only transaction in the system, even though multiple
transactions may be occurring simultaneously. This prevents the operations of one transaction
from being visible to other transactions until the transaction is committed, ensuring data
consistency and correctness.
Durability
Durability ensures that once a transaction has been committed, its changes are permanent and
survive any subsequent system failures. This property guarantees that committed data will not
be lost, even in the event of power outages, crashes, or other failures. Durability is typically achieved through the following mechanisms:
1. Transaction Logs: The DBMS maintains transaction logs that record all changes
made by a transaction. These logs are written to stable storage before the transaction
is considered committed.
2. Write-Ahead Logging (WAL): This technique ensures that all changes are logged
before they are applied to the database. In case of a system failure, the logs can be
used to redo the changes made by committed transactions, ensuring durability.
3. Checkpointing: Periodically, the DBMS takes a snapshot of the database’s current
state. This reduces the amount of work needed during recovery, as the system can
start from the last checkpoint and use the transaction logs to restore the database to a
consistent state.
4. Redundant Storage: To enhance durability, DBMSs often use redundant storage
systems such as RAID (Redundant Array of Independent Disks) to protect against
hardware failures.
5. Crash Recovery: In the event of a system crash, the DBMS uses the transaction logs
and checkpoint information to recover the database to a consistent state. This involves
undoing any changes made by incomplete transactions and redoing changes made by
committed transactions.
Serializable Schedule:
In a Database Management System (DBMS), a serializable schedule is one that ensures the
correctness and consistency of transactions. A schedule is a sequence of operations from one
or more transactions. The goal of a serializable schedule is to maintain the same results as if
the transactions were executed serially (one after the other), even though they may actually
be executed concurrently (interleaved). This ensures that the outcome of the concurrent
transactions is predictable and reliable.
Key Concepts of Serializable Schedules
1. Serial Schedule: A schedule in which transactions are executed one after another
without any interleaving of their operations. It is the simplest form of a schedule, but
not efficient for concurrent processing.
2. Serializable Schedule: A schedule that, although it may allow interleaving of
operations from different transactions, produces the same result as some serial
schedule. This ensures that the interleaving of operations does not introduce
inconsistency or unexpected results.
Types of Serializability
1. Conflict Serializability: A schedule is conflict-serializable if it can be transformed into a serial schedule by swapping non-conflicting operations (two operations conflict if they belong to different transactions, access the same data item, and at least one is a write).
Example: the schedule S1: R1(A), R2(A), W1(A), W2(A) is not conflict-serializable, because its conflicts require both T1 before T2 and T2 before T1; no sequence of swaps can make it equivalent to either serial order.
2. View Serializability: A schedule is view-serializable if it is view-equivalent to some serial schedule (each transaction reads the same values and the final write on each data item is the same). Every conflict-serializable schedule is view-serializable, but not vice versa.
Ensuring Serializability
DBMSs use various concurrency control techniques to ensure that schedules are serializable, such as lock-based protocols (e.g., two-phase locking), timestamp ordering, and validation-based (optimistic) protocols.
Recoverable Schedule:
A recoverable schedule in a database context ensures that the database can return to a
consistent state after a transaction failure. This concept is critical for maintaining data
integrity in the event of system crashes, power failures, or other unforeseen issues.
Key Concepts:
1. Commit Order: A transaction that reads data written by another transaction must commit after the transaction it read from commits.
2. Cascadeless: Ideally, schedules should be cascadeless, meaning no transaction reads data written by another uncommitted transaction. A cascadeless schedule is always recoverable and additionally avoids cascading rollbacks.
Example:
Consider the schedule S: W1(A), R2(A), C1, C2, in which T2 reads a value written by T1. This schedule is recoverable because T2 only commits after T1 has committed.
Non-Recoverable Schedule:
If T2 commits before T1, and T1 then fails, T2 has read uncommitted data, leading to inconsistency. This situation should be avoided.
Concurrency Control:
1. Transactions: A transaction is a sequence of database operations (reads and writes) that is executed as a single logical unit of work.
2. ACID Properties:
Atomicity: Ensures that all operations within a transaction are completed; if not, the
transaction is aborted.
Consistency: Ensures that the database remains in a consistent state before and after
the transaction.
Isolation: Ensures that transactions are isolated from each other until they are
completed.
Durability: Ensures that once a transaction is committed, it will remain so, even in
the event of a system failure.
3. Concurrency Problems:
Lost Updates: When two transactions simultaneously update the same data, and the
final value reflects only one of the updates (see the commented timeline after this list).
Temporary Inconsistencies: When one transaction reads data that is being modified
by another transaction, leading to incorrect or intermediate results.
Deadlock: A situation where two or more transactions are waiting for each other to
release locks, causing all of them to wait indefinitely.
Non-repeatable Reads: When a transaction reads the same data twice and gets
different results each time.
Phantom Reads: When a transaction re-executes a query and finds that the set of
rows satisfying the query condition has changed.
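As an illustration of the lost-update problem, here is a commented SQL timeline of two interleaved sessions; the accounts table and the values are illustrative:
-- Time flows downward; T1 and T2 both add to the same balance, initially 100.
-- T1: SELECT balance FROM accounts WHERE id = 1;       -- reads 100
-- T2: SELECT balance FROM accounts WHERE id = 1;       -- reads 100
-- T1: UPDATE accounts SET balance = 110 WHERE id = 1;  -- writes 100 + 10
-- T2: UPDATE accounts SET balance = 120 WHERE id = 1;  -- writes 100 + 20, overwriting T1
-- The final balance is 120 instead of 130: T1's update is lost. Locking, or a
-- single atomic UPDATE (SET balance = balance + 10), prevents this.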
4. Concurrency Control Techniques:
Lock-Based Protocols: Transactions must acquire locks (shared locks for reads, exclusive locks for writes) on data items before accessing them. Protocols such as two-phase locking (2PL) guarantee serializability.
Timestamp-Based Protocols: Each transaction is assigned a unique timestamp, and the DBMS orders conflicting operations according to these timestamps, rolling back transactions that would violate the timestamp order.
Optimistic (Validation-Based) Protocols: Transactions execute without restrictions, but before committing, they check whether they have conflicted with other transactions. If a conflict is detected, the transaction is rolled back and restarted.
Multiversion Concurrency Control (MVCC): Maintains multiple versions of data items to allow read operations to proceed without waiting for write operations. Provides snapshot isolation, ensuring that each transaction sees a consistent snapshot of the database.
5. Isolation Levels:
Read Uncommitted: Lowest isolation level, where transactions may read
uncommitted changes made by other transactions.
Read Committed: Ensures that a transaction can only read committed changes made
by other transactions.
Repeatable Read: Ensures that if a transaction reads a value, it will read the same
value throughout its execution.
Serializable: Highest isolation level, ensuring full isolation from other transactions.
Concurrency control is a critical aspect of DBMS to ensure data integrity, consistency, and
performance, especially in environments where multiple users are accessing and modifying
the database simultaneously.
Example
Under timestamp ordering, each transaction Ti is assigned a timestamp TS(Ti) when it starts. A write by Ti on a data item is rejected, and Ti is rolled back and restarted, if a transaction with a later timestamp has already read or written that item. By using timestamps, the DBMS ensures that all operations on data items occur in a consistent order, thus maintaining database integrity and isolation of transactions.
Isolation Levels:
Isolation levels in Database Management Systems (DBMS) define the degree to which the
operations in one transaction are isolated from those in other transactions. They help control
the visibility of changes made by one transaction to other concurrent transactions. The
isolation levels in SQL databases, as defined by the SQL standard (SQL-92), include:
1. Read Uncommitted
Description: The lowest isolation level, where transactions can see changes made by
other transactions even if they are not yet committed.
Advantages: Offers the highest level of concurrency and the least amount of
overhead.
Disadvantages: Can lead to dirty reads, non-repeatable reads, and phantom reads.
2. Read Committed
Description: A transaction can only read data that has been committed. It cannot read
data that is currently being modified by another transaction.
Advantages: Prevents dirty reads.
Disadvantages: Can still have non-repeatable reads and phantom reads.
3. Repeatable Read
Description: Ensures that if a transaction reads a value, any subsequent reads of that
value will see the same data, even if other transactions update the data in the
meantime.
Advantages: Prevents dirty reads and non-repeatable reads.
Disadvantages: Does not prevent phantom reads.
4. Serializable
Description: The highest isolation level, where transactions are completely isolated
from one another. It ensures that the outcome of executing transactions concurrently
is the same as if the transactions were executed serially, one after the other.
Advantages: Prevents dirty reads, non-repeatable reads, and phantom reads.
Disadvantages: Can significantly reduce concurrency and increase the overhead on
the system.
Read Phenomena:
1. Dirty Reads: A transaction reads data written by another transaction that has not yet
been committed.
2. Non-repeatable Reads: A transaction reads the same row twice and gets different
data each time because another transaction has modified the row and committed the
change.
3. Phantom Reads: A transaction re-executes a query returning a set of rows that
satisfies a condition and finds that the set of rows satisfying the condition has changed
due to another recently committed transaction.
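A minimal sketch of choosing an isolation level in SQL (MySQL-style statement ordering is shown; in PostgreSQL the SET TRANSACTION statement goes inside the transaction block; the accounts table is illustrative):
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION;
SELECT balance FROM accounts WHERE id = 1;  -- first read
-- Even if another transaction commits an update to this row here,
-- Repeatable Read guarantees the next read sees the same value.
SELECT balance FROM accounts WHERE id = 1;  -- returns the same value as before
COMMIT;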
Summary Table
Isolation Level   | Dirty Reads | Non-repeatable Reads | Phantom Reads
Read Uncommitted  | Possible    | Possible             | Possible
Read Committed    | Prevented   | Possible             | Possible
Repeatable Read   | Prevented   | Prevented            | Possible
Serializable      | Prevented   | Prevented            | Prevented
Understanding and choosing the appropriate isolation level is crucial for balancing between
data integrity and system performance in a DBMS.
OLAP:
Online Analytical Processing (OLAP) is a category of data processing that enables analysts to
extract and query data in order to observe it from different perspectives. OLAP is often
contrasted with Online Transaction Processing (OLTP), which is oriented more toward
transaction-based processing.
Key Concepts of OLAP:
1. Multidimensional Data Model: OLAP systems are based on a multidimensional data model, which allows data to be modeled and viewed in multiple dimensions.
2. Cubes: Data is organized into OLAP cubes (or hypercubes), where each cell holds a measure value for one combination of dimension values.
3. Dimensions: Each dimension represents a different perspective on the data, such as time, geography, product lines, etc.
4. Measures: Measures are the numeric values that users are interested in analyzing (e.g., sales figures, profit margins).
5. OLAP Operations:
Roll-Up: Aggregating data along a dimension, reducing the level of detail (e.g.,
monthly to quarterly sales).
Drill-Down: Breaking down data into more detailed views (e.g., quarterly to monthly
sales).
Slice: Extracting a subset of the cube by fixing a dimension to a specific value (e.g.,
sales data for a specific region).
Dice: Extracting a subcube by specifying a range of values on multiple dimensions.
Pivot: Rotating the data to view it from a different perspective.
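A roll-up and a slice can be expressed directly in SQL. The following is a minimal sketch; the sales table and its columns are illustrative, and GROUP BY ROLLUP is the SQL-standard form (PostgreSQL, Oracle, SQL Server), while MySQL uses WITH ROLLUP:
-- Roll-up: aggregate at (region, quarter), per-region, and grand-total levels.
SELECT region, quarter, SUM(amount) AS total_sales
FROM sales
GROUP BY ROLLUP (region, quarter);
-- Slice: fix the region dimension to a single value.
SELECT quarter, SUM(amount) AS total_sales
FROM sales
WHERE region = 'West'
GROUP BY quarter;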
Types of OLAP:
ROLAP (Relational OLAP): Data is stored in relational databases, and the OLAP middleware converts OLAP queries into SQL queries. Suitable for handling large amounts of data but might have slower query performance compared to MOLAP.
MOLAP (Multidimensional OLAP): Data is stored in optimized multidimensional array structures, giving fast, pre-aggregated query performance, though it typically handles smaller data volumes than ROLAP.
Benefits of OLAP:
Fast Query Performance: Pre-aggregated and indexed data allows for quick query
responses.
Interactive Analysis: Users can interactively analyze data from multiple
perspectives.
Complex Calculations: Supports complex calculations and data modeling.
Data Consistency: Ensures data consistency across different views and dimensions.
Use Cases:
Business Intelligence: Helps organizations make informed business decisions by
analyzing historical and current data.
Financial Reporting: Supports financial planning, budgeting, and forecasting.
Sales and Marketing Analysis: Analyzes sales performance, market trends, and
customer behavior.
OLAP plays a critical role in data warehousing and business intelligence by enabling efficient
and effective data analysis and reporting.
Query Tree:
In the context of database management systems (DBMS), a Query Tree is a tree data structure
that represents the operations of a query in a hierarchical manner. It is used in query
optimization and execution.
Example Query:
SELECT E.name
FROM Employee E, Department D
WHERE E.dept_id = D.dept_id AND D.name = 'Sales';
The query tree for this query consists of:
1. Leaf Nodes:
o Employee table (E)
o Department table (D)
2. Selection:
o D.name = 'Sales' on Department table
3. Join:
o E.dept_id = D.dept_id
4. Projection:
o E.name
The tree is evaluated bottom-up:
1. Selection: Apply the selection on the Department table to filter departments named
'Sales'.
2. Join: Perform a join between the filtered Department table and the Employee table
based on dept_id.
3. Projection: Select the name column from the resulting table.
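The corresponding query tree can be sketched with base tables at the leaves and the final projection at the root:
            π E.name
               |
     ⋈ E.dept_id = D.dept_id
        /             \
  Employee (E)   σ D.name = 'Sales'
                       |
                 Department (D)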
Query Optimization:
Query trees are useful in query optimization. The DBMS can transform the query tree into equivalent forms to minimize the cost of query execution, for example by pushing selections down toward the leaves, performing projections as early as possible, and reordering joins.
Understanding query trees and their optimization can significantly improve the performance
of database queries.
Cost of a Query:
The cost of a query in a Database Management System (DBMS) generally refers to the
resources required to execute the query. This can include various factors such as:
1. Time Complexity: The amount of time it takes to execute the query, which can be
affected by the query's complexity, the size of the dataset, and the efficiency of the
query plan.
2. I/O Cost: The amount of data that needs to be read from or written to disk. This can
be influenced by factors like table scans, index usage, and the size of the result set.
3. CPU Cost: The amount of CPU time required to process the query. This includes
operations like joins, aggregations, and sorting.
4. Memory Usage: The amount of RAM needed to execute the query, which can be
affected by the size of intermediate results and the complexity of operations.
5. Network Cost: For distributed databases, the cost of transferring data between nodes
or between the database and the client.
The DBMS often uses a cost-based optimizer to estimate and minimize these costs by
choosing the most efficient execution plan based on statistics and available indexes.
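Most DBMSs expose the optimizer's estimates through an EXPLAIN statement. As a sketch (PostgreSQL/MySQL-style syntax, reusing the earlier example query):
EXPLAIN
SELECT E.name
FROM Employee E
JOIN Department D ON E.dept_id = D.dept_id
WHERE D.name = 'Sales';
-- The output shows the chosen plan (scan types, join methods) together with
-- the optimizer's estimated cost and row counts.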
Join:
In a Database Management System (DBMS), a "join" operation is used to combine rows from
two or more tables based on a related column between them. Joins are fundamental for
querying relational databases effectively. Here are the most common types of joins:
1. INNER JOIN: Returns records that have matching values in both tables. For
example:
SELECT *
FROM table1
INNER JOIN table2
ON table1.common_column = table2.common_column;
2. LEFT JOIN (or LEFT OUTER JOIN): Returns all records from the left table and
the matched records from the right table. The result is NULL from the right side if
there is no match. For example:
SELECT *
FROM table1
LEFT JOIN table2
ON table1.common_column = table2.common_column;
3. RIGHT JOIN (or RIGHT OUTER JOIN): Returns all records from the right table
and the matched records from the left table. The result is NULL from the left side
when there is no match. For example:
SELECT *
FROM table1
RIGHT JOIN table2
ON table1.common_column = table2.common_column;
4. FULL JOIN (or FULL OUTER JOIN): Returns all records when there is a match in
either left or right table. The result is NULL from the table that does not have a
match. For example:
SELECT *
FROM table1
FULL JOIN table2
ON table1.common_column = table2.common_column;
5. CROSS JOIN: Returns the Cartesian product of both tables, i.e., all possible
combinations of rows. This can result in a large number of rows. For example:
SELECT *
FROM table1
CROSS JOIN table2;
6. SELF JOIN: Joins a table with itself to combine rows based on a related column. For
example:
SELECT a.name AS employee, b.name AS manager
FROM employees a
INNER JOIN employees b
ON a.manager_id = b.employee_id;
Database Security:
Access Control:
Access control in a Database Management System (DBMS) is crucial for ensuring that only
authorized users can access or manipulate data. The sections below first cover access-control concepts, and then the implementation algorithms and optimizations for the selection and projection operations:
1. Authentication:
o Purpose: To verify the identity of users accessing the database.
o Methods: Username/password, multi-factor authentication, biometric
verification.
2. Authorization:
o Purpose: To determine which resources a user can access and what actions
they can perform.
o Types:
Discretionary Access Control (DAC): Users are granted permissions
by the owner of the data.
Mandatory Access Control (MAC): Access is controlled by the
system based on security labels or classifications.
Role-Based Access Control (RBAC): Permissions are assigned to
roles, and users are assigned to roles.
Attribute-Based Access Control (ABAC): Access decisions are
based on attributes (user attributes, resource attributes, environmental
conditions).
Selection and Projection Implementation Algorithms and Optimization:
1. Selection (σ):
o Purpose: To retrieve rows that satisfy a specified condition.
o Algorithm: Typically implemented using a linear scan of the table or an index
scan if an index is available on the condition’s attribute.
o Optimization:
Indexing: Using indexes on columns involved in the selection
condition can speed up retrieval.
Query Optimization: Query optimizers may re-order selections and
use cost-based approaches to choose the most efficient execution plan.
2. Projection (π):
o Purpose: To retrieve specific columns from a table.
o Algorithm: Involves reading the relevant columns and ignoring others, which
can be done efficiently using columnar storage or by creating a temporary
table with the required columns.
o Optimization:
Columnar Storage: Storing data in columns rather than rows can
improve projection performance.
Materialized Views: Pre-computing projections and storing the results
can speed up retrieval.
Optimization Techniques:
1. Indexes:
o Purpose: Speed up data retrieval operations (selection) by providing quick
access paths.
o Types: B-trees, hash indexes, bitmap indexes.
2. Query Optimization:
o Cost-Based Optimization: Uses statistical information to estimate the cost of
different execution plans and chooses the most efficient one.
o Rule-Based Optimization: Applies a set of predefined rules to transform
queries into more efficient forms.
3. Database Tuning:
o Purpose: Adjust system parameters to improve performance.
o Techniques: Adjusting buffer sizes, query caching, optimizing schema design.
4. Data Partitioning:
o Purpose: Improve performance by distributing data across multiple storage
locations.
o Types: Horizontal partitioning (based on rows) and vertical partitioning
(based on columns).
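As a minimal sketch of how an index speeds up a selection (table and column names are illustrative):
-- Without an index, this selection requires a full scan of Employee.
CREATE INDEX idx_employee_dept ON Employee (dept_id);
-- With the index, the optimizer can answer the selection via an index scan:
SELECT name FROM Employee WHERE dept_id = 42;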
MAC:
In the context of database management systems (DBMS), MAC stands for Mandatory Access
Control. It is a security model used to enforce access controls in a database by defining
access rights and permissions based on the classification of information and the security
clearance of users.
Mandatory Access Control (MAC): This model is often used in environments where
security and confidentiality are paramount. It restricts access to information based on
the sensitivity of the data and the classification level assigned to it. Users are granted
access according to their clearance levels and the sensitivity of the data, and these
permissions cannot be altered by users.
Labeling: In MAC, data is classified into categories or levels (e.g., confidential,
secret, top secret), and users are assigned a security label or clearance level. Access to
data is governed by these labels and the rules defined by the system.
Use Case: MAC is typically used in military and government databases where data
sensitivity is critical and cannot be controlled on an individual basis by users.
In contrast, Discretionary Access Control (DAC) allows users to control access to their own
data and resources, which can be more flexible but less strict than MAC.
Role-Based Access Control (RBAC):
1. Roles: Define different roles based on job functions or responsibilities. For instance,
roles could be "Admin," "Developer," "Analyst," etc.
2. Permissions: Specify what actions each role can perform on the database.
Permissions might include things like SELECT (read), INSERT (write), UPDATE,
DELETE, and so on.
3. Users: Assign users to one or more roles based on their job requirements.
4. Role Assignment: Users inherit permissions through the roles they are assigned to.
For example, a user assigned the "Analyst" role might have read-only access to
certain tables.
5. Policy Enforcement: The DBMS enforces access policies based on the roles and
permissions. This ensures that users can only perform actions and access data allowed
by their roles.
Example:
A user assigned the "Manager" role would be able to perform SELECT and UPDATE
operations on employee records but not DELETE.
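In SQL, this example can be sketched with role and grant statements (PostgreSQL-style syntax; the role, table, and user names are illustrative):
CREATE ROLE manager;
GRANT SELECT, UPDATE ON employee_records TO manager;  -- DELETE is deliberately not granted
GRANT manager TO alice;  -- the user inherits the role's permissions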
RBAC helps in simplifying management, ensuring that permissions are consistent with job
responsibilities, and improving security by minimizing the risk of unauthorized access.
Authorization:
1. User Roles and Privileges: Users are assigned roles, and each role has specific
privileges. Privileges determine what actions a user can perform, such as reading data,
writing data, or executing administrative tasks.
2. Access Control: This involves defining and enforcing rules to manage access to
database objects. Access control can be implemented at different levels:
o Database Level: Controls access to the entire database.
o Table Level: Controls access to specific tables within the database.
o Row Level: Controls access to specific rows within a table.
o Column Level: Controls access to specific columns within a table.
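As a sketch of access control at different granularities in SQL (column-level GRANT is supported by, e.g., PostgreSQL and MySQL; names are illustrative):
GRANT SELECT ON employees TO analyst;                 -- table level: read the whole table
GRANT SELECT (name, dept_id) ON employees TO intern;  -- column level: only these columns
-- Row-level control is typically implemented with views or, in some systems,
-- row-level security policies.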
SQL Injection:
SQL injection is a type of security vulnerability that occurs when an attacker is able to insert
or manipulate malicious SQL queries through user inputs. This can allow them to gain
unauthorized access to data, manipulate or delete data, and potentially take control of the
entire database. Here’s a breakdown of SQL injection attacks and how to mitigate them:
How SQL Injection Works:
1. User Input Manipulation: An attacker inserts malicious SQL code into a form field,
URL, or any other input method.
2. Execution of Malicious SQL: If the application doesn't properly validate or escape
this input, the database executes the injected SQL code.
3. Exploitation: The attacker can exploit the SQL query to gain access to unauthorized
data, perform operations on the database, or execute commands that compromise the
database server.
Types of SQL Injection Attacks:
1. Classic SQL Injection: The attacker inserts SQL code directly into an input field to
manipulate the query.
2. Blind SQL Injection: The attacker is unable to see the results of the query but can
infer information based on the application's behavior or error messages.
3. Error-Based SQL Injection: The attacker uses database errors to gain information
about the structure of the database.
4. Union-Based SQL Injection: The attacker uses the UNION SQL operator to
combine the results of the original query with the results of a malicious query.
5. Boolean-Based SQL Injection: The attacker uses boolean conditions to infer
information about the database.
Examples:
1. Classic Example: The application builds the following query from user input:
SELECT * FROM users WHERE username = 'admin' AND password =
'password';
If the attacker enters ' OR '1'='1 in the password field, the query becomes:
SELECT * FROM users WHERE username = 'admin' AND password = '' OR
'1'='1';
2. Union-Based Example:
SELECT id, name FROM users WHERE id = 1 UNION SELECT 1,
'malicious_data';
Mitigation Strategies:
1. Use Parameterized Queries (Prepared Statements): Treat user input strictly as data by using placeholders instead of string concatenation. For example:
SELECT * FROM users WHERE username = ? AND password = ?;
2. Input Validation and Sanitization: Validate and sanitize all user inputs to ensure
they conform to expected formats.
3. Escape User Inputs: Properly escape special characters in user inputs to prevent
them from being interpreted as SQL code.
4. Use ORM (Object-Relational Mapping): Many ORMs handle SQL injection risks
internally by using parameterized queries.
5. Least Privilege Principle: Ensure that the database user has the minimum
permissions necessary for the application to function.
6. Regular Security Audits: Regularly audit your code and database for vulnerabilities.
7. Error Handling: Avoid displaying detailed error messages to users. Instead, log
errors and display generic messages.
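As a sketch of a parameterized query in SQL itself (PostgreSQL-style PREPARE/EXECUTE syntax; names are illustrative):
PREPARE login_check (TEXT, TEXT) AS
    SELECT * FROM users WHERE username = $1 AND password = $2;
-- The inputs are bound as data values and never parsed as SQL, so an input
-- such as ' OR '1'='1 cannot change the structure of the query.
EXECUTE login_check('admin', 'password');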