DBMS CHAP-4
Cost: Includes disk writes for intermediate results (writing b_r result blocks costs b_r block transfers plus ⌈b_r / b_b⌉ seeks, where b_b is the number of buffer blocks used for the output; see the worked example after this list).
Use Cases: Reusable aggregations, dashboards, or complex queries with shared sub-results.
Trade-off: Faster read times but higher storage/maintenance overhead.
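As a quick worked instance of the cost formula above (the numbers are assumed for illustration): if an intermediate result occupies b_r = 100 blocks and b_b = 4 buffer blocks are used for the output, materializing it costs 100 block transfers plus ⌈100 / 4⌉ = 25 seeks, in addition to the cost of computing the result itself.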
Pipelining
Pipelining processes data incrementally by streaming results between operations without
materializing intermediates. Stages execute concurrently, passing output directly to
subsequent steps.
Example:
A query execution plan for SELECT * FROM orders WHERE total > 100 ORDER BY date:
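A possible pipelined plan for this query (operator names are illustrative, not from any specific DBMS):

Scan(orders) → Filter(total > 100) → Sort(date) → Output

Each tuple produced by the scan is passed immediately to the filter, and qualifying tuples are fed into the sort as they arrive, so no intermediate relation is written to disk. (The sort itself must consume all of its input before emitting results, so full overlap occurs only in the stages before it.)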
Characteristics:
Performance: Reduces latency by overlapping operations (e.g., filtering while sorting).
Resource Use: Minimizes disk I/O but risks pipeline stalls if stages are unbalanced.
Use Cases: High-throughput OLTP systems, real-time analytics.
2] List the desirable properties of decomposition. Explain lossless join with an example.
Desirable Properties of Decomposition: Decomposition in DBMS involves breaking a
relation into smaller relations to improve database design. A good decomposition must
satisfy the following properties:
Lossless Join: Ensures that no information is lost when decomposed relations are joined
back together.
Guarantees that the original relation can be reconstructed from the smaller relations using a
natural join operation.
Dependency Preservation: All functional dependencies of the original relation should be
preserved in the decomposed relations.
This ensures that constraints on data integrity can be enforced without needing to
recombine relations.
Attribute Preservation: Every attribute from the original relation must appear in at least one
of the decomposed relations.
This prevents any loss of attributes during decomposition.
Minimization of Redundancy: Reduces duplicate data, thereby minimizing storage
requirements and preventing anomalies like insertion, deletion, and update issues.
A decomposition is lossless if joining the decomposed tables reconstructs the original table
without losing any information.
Example:
R(A, B, C) with functional dependencies:
A → B and B → C
We decompose R into two relations: R1(A, B) with A → B, and R2(B, C) with B → C.
Now, if we perform a natural join between R1 and R2 on attribute B, we get back the original relation R(A, B, C). This ensures that no data is lost during decomposition.
Key Condition for Lossless Join:
The decomposition of R into R1 and R2 is lossless if the common attributes form a superkey of at least one of the decomposed relations, i.e., R1 ∩ R2 → R1 or R1 ∩ R2 → R2 holds. In the example above, the common attribute B is a key of R2 (since B → C), so the condition is satisfied.
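A minimal SQL sketch of this decomposition (table and column types are assumed for illustration):

-- Original relation R(A, B, C) with A → B and B → C, decomposed as:
CREATE TABLE R1 (A INT PRIMARY KEY, B INT);  -- holds A → B; A is a key of R1
CREATE TABLE R2 (B INT PRIMARY KEY, C INT);  -- holds B → C; B is a key of R2

-- The common attribute B is the primary key (a superkey) of R2, so the
-- lossless-join condition holds and R is reconstructed exactly:
SELECT A, B, C FROM R1 NATURAL JOIN R2;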
5] Define query processing. What are the steps involved in query processing?
Query Processing in DBMS
1) Query processing refers to the systematic approach of converting a user's high-level query
(e.g., SQL) into an executable form that the database system can understand and process
efficiently.
2) It involves multiple steps to ensure accurate data retrieval or manipulation while
optimizing performance.
Steps Involved in Query Processing
Parsing and Translation: The query is first parsed to check its syntax and semantics. This
step ensures that the query is syntactically correct and meaningful.
The parser converts the query into an internal representation, often in the form of relational
algebra or a parse tree.
Checks performed during parsing include:
Syntax Check: Verifies the syntactic correctness of the query.
Semantic Check: Ensures that the query references valid database objects (e.g., tables,
attributes).
Shared Pool Check: Determines whether the query has already been processed (soft
parsing) or needs full processing (hard parsing).
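As an illustration of the translation step (the query below is an assumed example), the parser might produce the following internal form:

SELECT * FROM Employee WHERE Salary > 50000;
-- internal representation (relational algebra): σ_(Salary > 50000)(Employee)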
Optimization: The query optimizer evaluates multiple execution strategies and selects the
one with the lowest cost.
Optimization considers factors such as available indexes, table sizes, and statistical data
stored in the database catalog.
The output of this phase is an optimal query execution plan.
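As an illustrative sketch of plan comparison (assuming an index on Salary exists; no real cost figures are shown):

-- Candidate plans for: SELECT * FROM Employee WHERE Salary > 50000
-- Plan 1: full table scan (read every block of Employee)
-- Plan 2: index scan on Salary (traverse the index, then fetch matching rows)
-- The optimizer estimates each plan's cost from catalog statistics
-- (table size, predicate selectivity) and selects the cheaper plan.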
Row Source Generation: The optimizer's selected execution plan is transformed into an
iterative execution plan by the row source generator.
This iterative plan is essentially a binary program that can be executed by the SQL engine to
retrieve or manipulate data.
Execution
The execution engine runs the query using the generated plan.
During this phase, actual data retrieval or manipulation occurs, and results are produced.
Result Formatting
Once execution is complete, the results are formatted according to user requirements (e.g.,
simple lists or complex reports) before being presented.
Insertion Anomaly: Occurs when certain data cannot be inserted into the database without other, unrelated data being present.
Example: A database requires a primary key for every record, but if no value is provided, the record cannot be inserted.
Deletion Anomaly: Happens when deleting a record unintentionally removes related data.
Example: Deleting a customer record might also delete all associated orders.
Update/Modification Anomaly:
Arises when updating data leads to inconsistencies because related records are not updated
simultaneously.
Example: Changing an employee's salary in one record but not in others can cause errors in
reporting.
How Normalization Resolves Anomalies
Database normalization uses normal forms to address these anomalies:
First Normal Form (1NF):
Ensures atomicity (each field holds a single value) and eliminates repeating groups.
Fixes insertion anomalies by requiring unique identifiers (primary keys) for records.
Second Normal Form (2NF):
Eliminates partial dependencies (attributes dependent on part of a composite key).
Resolves update anomalies by ensuring all attributes depend entirely on the primary key.
Third Normal Form (3NF):
Removes transitive dependencies (non-key attributes depending on other non-key
attributes).
Prevents deletion anomalies by organizing data into separate tables linked by foreign keys.
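As a brief sketch of the 3NF step (the schema is illustrative):

-- Employee(EmpID, DeptID, DeptName) with EmpID → DeptID and DeptID → DeptName:
-- DeptName depends on EmpID only transitively, via DeptID, so we split it out.
CREATE TABLE Department (
    DeptID   INT PRIMARY KEY,
    DeptName VARCHAR(50)
);
CREATE TABLE Employee (
    EmpID  INT PRIMARY KEY,
    DeptID INT REFERENCES Department(DeptID)  -- foreign key linking the tables
);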
14] Describe the concept of transitive dependency. Explain how this concept is used to define 3NF.
17] Define Boyce-Codd normal form. How does it differ from 3NF? Why is it considered a stronger form of 3NF?
Boyce-Codd Normal Form (BCNF) is a normal form that requires a table to be in Third Normal Form (3NF) and, additionally, that for every non-trivial functional dependency X → Y, X must be a superkey of the table. This means that whenever a set of attributes determines another attribute, it must uniquely identify a row in the table.
Difference Between BCNF and 3NF
Dependency on Candidate Keys:
3NF ensures that there are no transitive dependencies, meaning a non-key attribute cannot
depend on another non-key attribute.
BCNF is stricter, requiring that every determinant in a non-trivial functional dependency be a superkey. This eliminates dependencies whose determinants are not keys and ensures that every dependency is directly tied to a key.
Redundancy Elimination:
3NF reduces redundancy by eliminating transitive dependencies but may still allow some redundancy when a prime (key) attribute depends on a non-key determinant.
BCNF further reduces redundancy by ensuring that all dependencies are tied to candidate
keys, thus eliminating more types of redundancy.
Normalization Hierarchy:
BCNF is considered a stronger form of 3NF because it imposes additional constraints to
ensure that every determinant is a key, which helps in maintaining data integrity and
reducing anomalies.
Why BCNF is Considered a Stronger Form of 3NF
BCNF is considered stronger than 3NF for several reasons:
Stricter Conditions:
BCNF requires that every determinant in a functional dependency be a superkey, which is
not a requirement in 3NF. This stricter condition helps eliminate more types of redundancy
and anomalies.
Improved Data Integrity:
By ensuring that all dependencies are tied to keys, BCNF maintains better data integrity and
consistency compared to 3NF.
Reduced Anomalies: BCNF reduces the risk of update, insertion, and deletion anomalies more effectively than 3NF by removing dependencies whose determinants are not keys, ensuring that all dependencies are directly tied to keys.
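A classic textbook illustration: consider Enrolment(Student, Course, Instructor) with the dependencies {Student, Course} → Instructor and Instructor → Course. The relation is in 3NF (Course is a prime attribute), but Instructor → Course violates BCNF because Instructor is not a superkey. Decomposing into (Instructor, Course) and (Student, Instructor) achieves BCNF, although the dependency {Student, Course} → Instructor is then no longer preserved; this is why BCNF is stronger but can sacrifice dependency preservation.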
18] What is query processing? Explain query processing steps with neat
sketch.
1) Query processing refers to the sequence of activities involved in retrieving data from a
database based on a user's query.
2) It converts high-level queries (e.g., SQL) into low-level instructions that the database can
execute efficiently. The goal is to ensure accurate and optimized data retrieval.
Steps of Query Processing
Parsing and Translation:
The query is checked for syntax and semantics.
It is translated into an internal representation, often in the form of a parse tree or relational
algebra expression.
Example: the SQL query SELECT * FROM Employee WHERE Salary > 50000 is converted into the relational algebra expression σ_(Salary > 50000)(Employee).
Optimization:
Multiple query execution plans are considered.
The system selects the optimal plan based on cost estimation (e.g., I/O operations, CPU
usage).
Example: Choosing between index-based access or sequential scan.
Evaluation:
The chosen execution plan is run to retrieve the required data.
The database engine executes the query efficiently and returns the result.
Sketch of Query Processing Steps
Below is a simplified diagram illustrating query processing:
User Query (SQL)
↓
Parsing & Translation
↓
Relational Algebra Expression
↓
Query Optimization
↓
Execution Plan
↓
Query Evaluation (Execution Engine)
↓
Query Result
20] State the need for normalization. Explain 2NF with a suitable example.
Need for Normalization: Normalization is essential in database design for several reasons:
Eliminating Redundancy: Reduces duplicate data and saves storage space.
Improving Data Integrity: Ensures consistency and accuracy by organizing data logically.
Minimizing Anomalies: Prevents insertion, deletion, and update anomalies that can lead to
errors or data loss.
Optimizing Query Performance: Simplifies queries and improves execution speed by
structuring data efficiently.
Enhancing Scalability: Makes the database adaptable to changing business needs.
Explanation of 2NF with Example
Second Normal Form (2NF) builds upon the First Normal Form (1NF) by eliminating partial dependencies. A relation is in 2NF if:
It is in 1NF (all attributes are atomic).
Every non-prime attribute is fully functionally dependent on the entire primary key.
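Since the question asks for an example, here is a minimal illustrative sketch (the schema and names are assumed): consider Order_Item(OrderID, ProductID, Quantity, ProductName) with composite primary key (OrderID, ProductID). Quantity depends on the full key, but ProductName depends only on ProductID, a partial dependency that violates 2NF. Decomposing removes it:

-- 2NF decomposition of the illustrative schema
CREATE TABLE Order_Item (
    OrderID   INT,
    ProductID INT,
    Quantity  INT,
    PRIMARY KEY (OrderID, ProductID)
);
CREATE TABLE Product (
    ProductID   INT PRIMARY KEY,
    ProductName VARCHAR(50)
);

Now every non-prime attribute depends on the whole key of its table, so both relations are in 2NF.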