DDM 3
UNIT-III
RELATIONAL DATABASE DESIGN AND NORMALIZATION
SYLLABUS:
The two main options are to map the whole specialization into a single table, or to map it into multiple
tables. Within each option are variations that depend on the constraints on the specialization/generalization.
Step 8: Options for Mapping Specialization or Generalization. Convert each specialization with m
subclasses {S1, S2, …, Sm} and (generalized) superclass C, where the attributes of C are {k, a1, …, an}
and k is the (primary) key, into relation schemas.
MAPPING OF SHARED SUBCLASSES (MULTIPLE INHERITANCE):
A shared subclass, such as ENGINEERING_MANAGER, is a subclass of several superclasses, indicating
multiple inheritance. These superclasses must all have the same key attribute; otherwise, the shared subclass
would be modeled as a category (union type).
3.2 UPDATE ANOMALIES:
Anomalies in the relational model refer to inconsistencies or errors that can arise when working with
relational databases, specifically in the context of data insertion, deletion, and modification.
Insertion Anomalies: These anomalies occur when it is not possible to insert data into a database because
the required fields are missing or because the data is incomplete.
For example, if a database requires that every record has a primary key, but no value is provided for a
particular record, it cannot be inserted into the database.
Deletion anomalies: These anomalies occur when deleting a record from a database and can result in the
unintentional loss of data.
For example, if a database contains information about customers and orders, deleting a customer record
may also delete all the orders associated with that customer.
Update anomalies: These anomalies occur when modifying data in a database and can result in
inconsistencies or errors.
For example, if a database contains information about employees and their salaries, updating an employee’s
salary in one record but not in all related records could lead to incorrect calculations and reporting.
FOR EXAMPLE:
Insertion Anomaly: If a tuple is inserted into the referencing relation and its referencing attribute value is not
present in the referenced attribute, the insertion into the referencing relation is not allowed.
Example: If we try to insert a record into STUDENT_COURSE with STUD_NO = 7, the insertion is not allowed.
Deletion and Updating Anomaly: If a tuple is deleted from or updated in the referenced relation and the
referenced attribute value is used by the referencing attribute in the referencing relation, the deletion
or update of the tuple in the referenced relation is not allowed.
Example: If we want to update a record in STUDENT_COURSE with STUD_NO = 1, we have to update
it in both rows of the table. If we try to delete the record with STUD_NO = 1 from STUDENT, the deletion
is not allowed.
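The behaviour described above can be reproduced with a small sketch using Python's built-in sqlite3 module. The column names STUD_NAME and COURSE_NO and all row values are assumptions made for illustration; only the table names and the STUD_NO values 1 and 7 come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Referenced relation and referencing relation (column names are illustrative).
conn.execute("CREATE TABLE STUDENT (STUD_NO INTEGER PRIMARY KEY, STUD_NAME TEXT)")
conn.execute(
    "CREATE TABLE STUDENT_COURSE ("
    " STUD_NO INTEGER REFERENCES STUDENT(STUD_NO),"
    " COURSE_NO TEXT)"
)
conn.executemany("INSERT INTO STUDENT VALUES (?, ?)", [(1, "RAM"), (2, "RAMESH")])
conn.executemany(
    "INSERT INTO STUDENT_COURSE VALUES (?, ?)",
    [(1, "C1"), (1, "C2"), (2, "C1")],  # STUD_NO = 1 appears in two rows
)

def attempt(sql):
    """Run one statement; report whether a constraint rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

# STUD_NO = 7 does not exist in STUDENT, so the insertion is refused.
insert_result = attempt("INSERT INTO STUDENT_COURSE VALUES (7, 'C3')")
# STUD_NO = 1 is still referenced by STUDENT_COURSE, so the deletion is refused.
delete_result = attempt("DELETE FROM STUDENT WHERE STUD_NO = 1")
print(insert_result, delete_result)
```

Both statements are rejected because the foreign key would otherwise point at a student that does not exist.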
REMOVAL OF ANOMALIES:
✓ These anomalies can be avoided or minimized by designing databases that adhere to the principles
of normalization.
✓ Normalization involves organizing data into tables and applying rules to ensure data is stored in a
consistent and efficient manner.
✓ By reducing data redundancy and ensuring data integrity, normalization helps to eliminate
anomalies and improve the overall quality of the database.
According to E. F. Codd, the inventor of the relational database, the goals of normalization include:
For example:
If A and B are attributes of relation R, B is functionally dependent on A (denoted A → B), if each value of
A is associated with exactly one value of B. (A and B may each consist of one or more attributes.)
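The definition can be checked mechanically on a relation instance. The following is a minimal sketch, assuming rows are stored as Python dictionaries; the function name fd_holds and the sample rows are hypothetical. Note that an instance can only refute a functional dependency; it cannot prove that the dependency holds for every legal state of the relation.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first value seen for a key is remembered; any different value breaks the FD.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical relation instance with attributes A and B.
rows = [
    {"A": 1, "B": "x"},
    {"A": 2, "B": "x"},
    {"A": 1, "B": "x"},
]
a_determines_b = fd_holds(rows, ["A"], ["B"])  # each A value has exactly one B value
b_determines_a = fd_holds(rows, ["B"], ["A"])  # "x" occurs with A = 1 and with A = 2
print(a_determines_b, b_determines_a)
```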
In this example, maf_year and color are independent of each other but both depend on car_model. These
two columns are said to be multivalued dependent on car_model.
This dependence can be represented like this:
car_model —>> maf_year
car_model —>> color
{Company} -> {CEO} (if we know the Company, we know the CEO name)
But CEO is not a subset of Company, and hence it is a non-trivial functional dependency.
Transitive dependency:
✓ There is an additional type of functional dependency, called a transitive dependency, that we need
to recognize, because its existence in a relation can potentially cause update anomalies.
Client
fd2 clientNo → cName (Primary key)
Rental
fd1 clientNo, propertyNo → rentStart, rentFinish (Primary key)
fd5' clientNo, rentStart → propertyNo, rentFinish (Candidate key)
fd6' propertyNo, rentStart → clientNo, rentFinish (Candidate key)
PropertyOwner
fd3 propertyNo → pAddress, rent, ownerNo, oName (Primary key)
fd4 ownerNo → oName (Transitive dependency)
➢ To transform the PropertyOwner relation into 3NF, we must first remove this transitive dependency
by creating two new relations called PropertyForRent and Owner, as shown in Figure 14.15.
➢ The new relations have the following form:
PropertyForRent (propertyNo, pAddress, rent, ownerNo)
Owner (ownerNo, oName)
➢ The PropertyForRent and Owner relations are in 3NF, as there are no further transitive
dependencies on the primary key.
Example: 1NF,2NF,3NF:
BCNF: It stands for Boyce-Codd Normal Form, which is a stricter version of 3NF. Sometimes, it is also
referred to as 3.5NF. A relation is said to be in BCNF if it satisfies the given conditions:
FOURTH NORMAL FORM (4NF): A relation is said to be in Fourth Normal Form (4NF) if it is in BCNF and
contains no nontrivial multivalued dependencies. A multivalued dependency exists in a table when the
following conditions hold:
1. For a multivalued dependency A —>> B, for a single value of A, there are multiple values of B.
2. The table has at least three columns.
3. For a relation with columns A, B and C, B and C are independent of each other.
JOIN DEPENDENCY:
A relation R is said to satisfy a join dependency if its schema can be decomposed into smaller relations
R1, R2, …, Rn such that joining R1, R2, …, Rn reconstructs the original relation R.
• General definition for Second Normal Form (2NF): a relation that is in first normal form and in which
every non-candidate-key attribute is fully functionally dependent on any candidate key. In this definition, a
candidate-key attribute is part of any candidate key.
• General definition for Third Normal Form (3NF): a relation that is in first and second normal form and in
which no non-candidate-key attribute is transitively dependent on any candidate key. In this definition, a
candidate-key attribute is part of any candidate key.
Proof:
1. X → Y (given)
2. X → Z (given)
3. X → XY (using IR2 on 1 by augmentation with X. Where XX = X)
4. XY → YZ (using IR2 on 2 by augmentation with Y)
5. X → YZ (using IR3 on 3 and 4)
Proof:
1. X → YZ (given)
2. YZ → Y (using IR1 Rule)
3. X → Y (using IR3 on 1 and 2)
6. Pseudo transitive Rule (IR6)
Rule 6 states that a set of attributes Y on the left-hand side of a dependency can be replaced by another
set X that functionally determines Y.
In Pseudo transitive Rule, if X determines Y and WY determines Z, then WX determines Z.
If X → Y and WY → Z then WX → Z
Proof:
1. X → Y (given)
2. WY → Z (given)
3. WX → WY (using IR2 on 1 by augmenting with W)
4. WX → Z (using IR3 on 3 and 2)
Each of the preceding inference rules can be proved from the definition of functional dependency, either
by direct proof or by contradiction. A proof by contradiction assumes that the rule does not hold and shows
that this is not possible. We now prove that the first three rules IR1 through IR3 are valid. The second proof
is by contradiction.
Proof of IR1. Suppose that X ⊇ Y and that two tuples t1 and t2 exist in some relation instance r of R such
that t1 [X] = t2 [X]. Then t1[Y] = t2[Y] because X ⊇ Y;
hence, X → Y must hold in r.
Proof of IR2 (by contradiction). Assume that X → Y holds in a relation instance r of R but that XZ →
YZ does not hold. Then there must exist two tuples t1 and t2 in r such that (1) t1 [X] = t2 [X], (2) t1 [Y] =
t2 [Y], (3) t1 [XZ] = t2 [XZ], and (4) t1 [YZ] ≠ t2 [YZ]. This is not possible because from (1) and (3) we
deduce (5) t1 [Z] = t2 [Z], and from (2) and (5) we deduce (6) t1 [YZ] = t2 [YZ], contradicting (4).
Proof of IR3. Assume that (1) X → Y and (2) Y → Z both hold in a relation r. Then for any two tuples t1
and t2 in r such that t1 [X] = t2 [X], we must have (3) t1 [Y] = t2 [Y], from assumption (1); hence we must
also have (4) t1 [Z] = t2 [Z] from (3) and assumption (2); thus X → Z must hold in r.
There are three other inference rules that follow from IR1, IR2 and IR3. They are as follows:
1. IR4 (decomposition, or projective, rule): {X → YZ} |= X → Y.
2. IR5 (union, or additive, rule): {X → Y, X → Z} |= X → YZ.
3. IR6 (pseudo transitivity rule): {X → Y, WY → Z} |= WX → Z.
✓ The decomposition rule (IR4) says that we can remove attributes from the right-hand side of a
dependency; applying this rule repeatedly can decompose the FD X → {A1, A2, …, An} into the
set of dependencies {X → A1, X → A2, …, X → An}.
✓ The union rule (IR5) allows us to do the opposite; we can combine a set of dependencies {X →
A1, X → A2, …, X → An} into the single FD X → {A1, A2, …, An}.
✓ The pseudo transitivity rule (IR6) allows us to replace a set of attributes Y on the left-hand side of
a dependency with another set X that functionally determines Y, and can be derived from IR2 and
IR3 if we augment the first functional dependency X → Y with W (the augmentation rule) and then
apply the transitive rule.
✓ One important cautionary note regarding the use of these rules: Although X → A and X → B
implies X → AB by the union rule stated above, X → A and Y → B does imply that XY → AB.
Also, XY → A does not necessarily imply either X → A or Y → A.
Definition. For each such set of attributes X, we determine the set X+ of attributes that are functionally
determined by X based on F; X+ is called the closure of X under F.
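The closure X+ can be computed by repeatedly applying the dependencies in F until no new attribute is added. The sketch below is one common way to do this; the function name closure and the representation of attribute sets as strings of single-letter attribute names are assumptions made for illustration. The sample set E = {B → A, D → A, AB → D} is the one used in Example 1 of these notes.

```python
def closure(attrs, fds):
    """Return X+ for the attribute set attrs under fds.

    fds is a list of (lhs, rhs) pairs; each side is an iterable of
    single-letter attribute names, e.g. ("AB", "D") for AB -> D.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs when the whole left-hand side is already determined.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

E = [("B", "A"), ("D", "A"), ("AB", "D")]
b_plus = closure("B", E)  # B gives A, then AB gives D
d_plus = closure("D", E)  # D gives only A
print(sorted(b_plus), sorted(d_plus))
```

Here B+ = {A, B, D} and D+ = {A, D}, so B is a key of a relation with attributes A, B and D, while D is not.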
The formal definition is: A set of FDs F is said to be minimal if it satisfies the following conditions–
1. Every dependency in F has a single attribute for its right-hand side.
2. We cannot replace any dependency X->A in F with a dependency Y->A, where Y is a proper subset
of X, and still have a set of dependencies that is equivalent to F.
3. We cannot remove any dependency from F and still have a set of dependencies that are equivalent
to F.
A canonical cover is also called a minimal cover, that is, a minimal set of FDs equivalent to F. A set of
FDs FC is called a canonical cover of F if each FD in FC is a −
1. Simple FD.
2. Left reduced FD.
3. Non-redundant FD.
Example:
Consider an example to find canonical cover of F.
Minimal cover: The minimal cover is the set of FDs which are equivalent to the given FDs.
Canonical cover: In canonical cover, the LHS (Left Hand Side) must be unique.
First of all, we will find the minimal cover and then the canonical cover.
Now, we will convert the above set of FDs into canonical cover.
The canonical cover for the above set of FDs will be as follows −
A -> BC
B -> C
Example 1: Let the given set of FDs be E: {B → A, D → A, AB → D}. We have to find the minimal cover
of E.
■ All above dependencies are in canonical form (that is, they have only one
attribute on the right-hand side), so we have completed step 1 of Algorithm 15.2 and can proceed to step 2.
In step 2 we need to determine if AB → D has any redundant (extraneous) attribute on the left-hand side;
that is, can it be replaced by B → D or A → D?
■ Since B → A, by augmenting with B on both sides (IR2), we have BB → AB, or B → AB (i). However,
AB → D as given (ii).
■ Hence by the transitive rule (IR3), we get from (i) and (ii), B → D. Thus AB → D may be replaced by B → D.
■ We now have a set equivalent to the original E, say E′: {B → A, D → A, B → D}. No further reduction
is possible in step 2 since all FDs have a single attribute on the left-hand side.
■ In step 3 we look for a redundant FD in E′. By using the transitive rule on B → D and D → A, we derive
B → A. Hence B → A is redundant in E′ and can be eliminated.
■ Therefore, the minimal cover of E is F: {B → D, D → A}.
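The three steps of Example 1 can be sketched in code as follows. This is an illustrative implementation under stated assumptions, not the textbook algorithm verbatim: the function names closure and minimal_cover are hypothetical, attribute names are assumed to be single letters, and the result can depend on the order in which dependencies are examined, since a minimal cover is not always unique.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def minimal_cover(fds):
    # Step 1: give every dependency a single attribute on its right-hand side.
    cover = [(frozenset(lhs), a) for lhs, rhs in fds for a in rhs]
    # Step 2: remove extraneous attributes from each left-hand side.
    reduced = []
    for lhs, a in cover:
        for b in sorted(lhs):
            if len(lhs) > 1 and a in closure(lhs - {b}, cover):
                lhs = lhs - {b}
        reduced.append((lhs, a))
    # Step 3: remove dependencies that follow from the remaining ones.
    result = list(dict.fromkeys(reduced))  # drop duplicates, keep order
    for fd in list(result):
        rest = [g for g in result if g != fd]
        if fd[1] in closure(fd[0], rest):
            result = rest
    return result

E = [("B", "A"), ("D", "A"), ("AB", "D")]
cover = sorted("".join(sorted(lhs)) + "->" + a for lhs, a in minimal_cover(E))
print(cover)
```

For E this yields B → D and D → A, matching the minimal cover F derived by hand above.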
EXAMPLE:
Identifying the minimal set of functional dependencies of the StaffBranch relation
We apply the three conditions described previously on the set of functional dependencies
for the StaffBranch relation listed in Example 14.5 to produce the following functional dependencies:
staffNo → sName
staffNo → position
staffNo → salary
staffNo → branchNo
branchNo → bAddress
bAddress → branchNo
branchNo, position → salary
bAddress, position → salary
These functional dependencies satisfy the three conditions for producing a minimal set of functional
dependencies for the StaffBranch relation.
1. Condition 1 ensures that every dependency is in a standard form with a single attribute on the right-
hand side.
2. Conditions 2 and 3 ensure that there are no redundancies in the dependencies, either by having
redundant attributes on the left-hand side of a dependency (Condition 2) or by having a dependency
that can be inferred from the remaining functional dependencies in X (Condition 3).
Definition of BCNF:
✓ BCNF is based on functional dependencies that take into account all candidate keys in a relation;
however, BCNF also has additional constraints compared with the general definition of 3NF.
✓ Boyce–Codd Normal Form (BCNF)--A relation is in BCNF if and only if every determinant is a
candidate key.
Types of Decomposition
Lossless Decomposition:
✓ If the information is not lost from the relation that is decomposed, then the decomposition will be
lossless.
✓ A lossless decomposition guarantees that joining the decomposed relations yields the same relation
that was decomposed.
✓ A decomposition is said to be lossless if the natural join of all the decomposed relations gives the
original relation.
Example:
The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT:
DEPARTMENT table:
Now, when these two relations are joined on the common column "EMP_ID", then the resultant relation
will look like:
Employee ⋈ Department:
Hence, the decomposition is Lossless join decomposition.
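The join test can be sketched with plain Python sets. The attribute names EMP_NAME and DEPT_NAME and all row values below are assumptions made for illustration; only the relation names and the common column EMP_ID come from the example. The sketch also shows, for contrast, a decomposition that is not lossless because it joins on a column that is not a key.

```python
def project(rows, attrs):
    """Project rows (dicts) onto attrs, eliminating duplicate tuples."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}

def natural_join(r1, r2):
    """Natural join of two relations stored as sets of frozensets of (attr, value)."""
    joined = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            # Tuples combine only when they agree on every common attribute.
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                joined.add(frozenset({**d1, **d2}.items()))
    return joined

# Hypothetical original relation.
rows = [
    {"EMP_ID": 1, "EMP_NAME": "Asha", "DEPT_NAME": "Sales"},
    {"EMP_ID": 2, "EMP_NAME": "Ravi", "DEPT_NAME": "Sales"},
    {"EMP_ID": 3, "EMP_NAME": "Meena", "DEPT_NAME": "HR"},
]
original = {frozenset(row.items()) for row in rows}

# Joining on the key EMP_ID reconstructs the original relation.
employee = project(rows, ["EMP_ID", "EMP_NAME"])
department = project(rows, ["EMP_ID", "DEPT_NAME"])
lossless = natural_join(employee, department)

# Joining on the non-key DEPT_NAME produces spurious tuples.
lossy = natural_join(
    project(rows, ["EMP_ID", "DEPT_NAME"]),
    project(rows, ["DEPT_NAME", "EMP_NAME"]),
)
print(lossless == original, len(lossy))
```

The second join contains every original tuple plus two spurious ones that pair each Sales employee with the other's name, so information about which name belongs to which EMP_ID is lost.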
Dependency Preserving:
✓ It is an important constraint of the database.
✓ In dependency preservation, every dependency must be satisfied by at least one decomposed table.
✓ If a relation R is decomposed into relations R1 and R2, then the dependencies of R must either be a
part of R1 or R2 or be derivable from the combination of the functional dependencies of R1 and
R2.
✓ For example, suppose there is a relation R (A, B, C, D) with functional dependency set {A->BC}.
The relation R is decomposed into R1(ABC) and R2(AD), which is dependency preserving
because FD A->BC is a part of relation R1(ABC).
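Dependency preservation can be tested without computing any join. The sketch below uses a standard closure-based test restricted to the attributes of each decomposed relation; the function names are hypothetical and attribute names are assumed to be single letters. The second call uses a well-known textbook counterexample, R(A, B, C) with {AB → C, C → B}, which is not part of these notes.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def preserves(fds, decomposition):
    """Return True if every FD in fds can be enforced using the decomposed relations alone."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                # Attributes derivable from the part of result that lies inside Ri.
                gained = closure(result & set(ri), fds) & set(ri)
                if not gained <= result:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False
    return True

example = preserves([("A", "BC")], ["ABC", "AD"])
counterexample = preserves([("AB", "C"), ("C", "B")], ["AC", "BC"])
print(example, counterexample)
```

The first result confirms the example above; in the counterexample, AB → C cannot be checked inside either AC or BC, so the decomposition does not preserve it.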
Properties of Decomposition:
Decomposition must have the following properties:
1. Decomposition Must be Lossless
2. Dependency Preservation
3. Lack of Data Redundancy
2. Dependency Preservation–
Dependency is a crucial constraint on a database, and every dependency must be satisfied by at least one
decomposed table. If {P → Q} holds, then the two attribute sets are functionally dependent, and the
dependency is easiest to check when both sets lie in the same relation. A decomposition has this property
only when it maintains the functional dependencies. In addition, this property allows us to check updates
without having to compute the natural join of the decomposed relations.
For example:
R = (A, B, C)
F = {A ->B, B->C}
Key = {A}
R is not in BCNF.
Decomposition R1 = (A, B), R2 = (B, C)
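The claim that R is not in BCNF, while R1 and R2 are, can be checked with a brute-force sketch. It tries every proper subset of a schema as a determinant, which is only practical for small relations; the function names are hypothetical and attribute names are assumed to be single letters.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Return determinants within schema that determine something new but are not superkeys."""
    schema = set(schema)
    bad = []
    for size in range(1, len(schema)):
        for combo in combinations(sorted(schema), size):
            x = set(combo)
            determined = closure(x, fds) & schema
            # Nontrivial (determines more than itself) yet not the whole schema.
            if determined != x and determined != schema:
                bad.append("".join(combo))
    return bad

F = [("A", "B"), ("B", "C")]
in_r = bcnf_violations("ABC", F)   # B determines C, but B is not a key of R
in_r1 = bcnf_violations("AB", F)
in_r2 = bcnf_violations("BC", F)
print(in_r, in_r1, in_r2)
```

Only the determinant B violates BCNF in R; after the decomposition, B becomes the key of R2 and no violation remains.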
➢ The StaffPropertyInspection relation is transformed into second normal form by removing the
partial dependency from the relation and creating two new relations called Property and
PropertyInspection with the following form:
Property (propertyNo, pAddress)
PropertyInspection (propertyNo, iDate, iTime, comments, staffNo, sName, carReg)
➢ These relations are in 2NF, as every non-primary-key attribute is fully functionally dependent on the
primary key of the relation.
Property Relation
fd2 propertyNo → pAddress
PropertyInspection Relation
fd1 propertyNo, iDate → iTime, comments, staffNo, sName, carReg
fd3 staffNo → sName
fd4 staffNo, iDate → carReg
fd5 carReg, iDate, iTime → propertyNo, comments, staffNo, sName
fd6 staffNo, iDate, iTime → propertyNo, comments
➢ To transform the PropertyInspection relation into 3NF, we remove the transitive dependency
(staffNo → sName) by creating two new relations called Staff and PropertyInspect with the form:
Staff (staffNo, sName)
PropertyInspect (propertyNo, iDate, iTime, comments, staffNo, carReg)
➢ The Staff and PropertyInspect relations are in 3NF as no non-primary-key attribute is wholly
functionally dependent on another non-primary-key attribute.
➢ Thus, the StaffPropertyInspection relation shown in Figure 15.5 has been transformed by the
process of normalization into three relations in 3NF with the following form:
Property (propertyNo, pAddress)
Staff (staffNo, sName)
PropertyInspect (propertyNo, iDate, iTime, comments, staffNo, carReg)
The functional dependencies for the Property, Staff, and PropertyInspect relations are as follows:
Property Relation
fd2 propertyNo → pAddress
Staff Relation
fd3 staffNo → sName
PropertyInspect Relation
fd1 propertyNo, iDate → iTime, comments, staffNo, carReg
fd4 staffNo, iDate → carReg
fd5 carReg, iDate, iTime → propertyNo, comments, staffNo
fd6 staffNo, iDate, iTime → propertyNo, comments
➢ To transform the PropertyInspect relation into BCNF, we must remove the dependency that violates
BCNF by creating two new relations called StaffCar and Inspection with the form:
StaffCar (staffNo, iDate, carReg)
Inspection (propertyNo, iDate, iTime, comments, staffNo)
➢ The StaffCar and Inspection relations are in BCNF, as the determinant in each of these relations
is also a candidate key.
The resulting BCNF relations have the following form:
Property (propertyNo, pAddress)
Staff (staffNo, sName)
Inspection (propertyNo, iDate, iTime, comments, staffNo)
StaffCar (staffNo, iDate, carReg)
MULTI-VALUED DEPENDENCY:
✓ The possible existence of multi-valued dependencies in a relation is due to 1NF, which disallows
an attribute in a tuple from having a set of values.
✓ For example, if we have two multi-valued attributes in a relation, we have to repeat each value of
one of the attributes with every value of the other attribute, to ensure that tuples of the relation are
consistent. This type of constraint is referred to as a multivalued dependency and results in data
redundancy.
✓ Multi-Valued Dependency (MVD)----Represents a dependency between attributes (for example,
A, B, and C) in a relation, such that for each value of A there is a set of values for B and a set of
values for C. However, the set of values for B and C are independent of each other. We represent
a MVD between attributes A, B, and C in a relation using the following notation:
A —>> B
A —>> C
✓ A multivalued dependency constraint potentially exists in the BranchStaffOwner relation, because
two independent 1:* relationships are represented in the same relation. We specify the MVD
constraint in the BranchStaffOwner relation shown in Figure 15.8(a) as follows:
branchNo —>> sName
branchNo —>> oName
✓ A multi-valued dependency can be further defined as being trivial or nontrivial. A MVD A —>>
B in relation R is defined as being trivial if (a) B is a subset of A or (b) A ∪ B = R. A MVD is
defined as being nontrivial if neither (a) nor (b) is satisfied. A trivial MVD does not specify a
constraint on a relation; a nontrivial MVD does specify a constraint.
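Whether an MVD holds in a given relation instance can be checked by grouping tuples on A and testing that, within each group, every B value is paired with every value of the remaining attributes. The sketch below assumes rows are stored as dictionaries; the function name mvd_holds and the sample staff and owner names are illustrative assumptions, since the data of Figure 15.8(a) is not reproduced in these notes.

```python
from itertools import product

def mvd_holds(rows, a, b):
    """Check A ->> B on rows (a non-empty list of dicts); C is every attribute outside A and B."""
    c = [x for x in rows[0] if x not in a and x not in b]
    groups = {}
    for row in rows:
        key = tuple(row[x] for x in a)
        pair = (tuple(row[x] for x in b), tuple(row[x] for x in c))
        groups.setdefault(key, set()).add(pair)
    for pairs in groups.values():
        b_values = {p[0] for p in pairs}
        c_values = {p[1] for p in pairs}
        # Independence of B and C means all combinations must be present.
        if pairs != set(product(b_values, c_values)):
            return False
    return True

# Hypothetical BranchStaffOwner tuples.
rows = [
    {"branchNo": "B003", "sName": "Ann Beech", "oName": "Carol Farrel"},
    {"branchNo": "B003", "sName": "David Ford", "oName": "Carol Farrel"},
    {"branchNo": "B003", "sName": "Ann Beech", "oName": "Tina Murphy"},
    {"branchNo": "B003", "sName": "David Ford", "oName": "Tina Murphy"},
]
complete = mvd_holds(rows, ["branchNo"], ["sName"])
incomplete = mvd_holds(rows[:3], ["branchNo"], ["sName"])  # one combination missing
print(complete, incomplete)
```

Dropping a single tuple breaks the constraint, which is exactly the redundancy the MVD forces on the relation: every staff name must be repeated with every owner name of the same branch.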
LOSSLESS-JOIN DEPENDENCY:
✓ A property of decomposition that ensures that no spurious tuples are generated when relations are
reunited through a natural join operation.
✓ In splitting relations by projection, we are very explicit about the method of decomposition. In
particular, we are careful to use projections that can be reversed by joining the resulting relations,
so that the original relation is reconstructed.
✓ Such a decomposition is called a lossless-join (also called a non-loss- or non-additive-join)
decomposition, because it preserves all the data in the original relation and does not result in the
creation of additional spurious tuples.
To identify the type of constraint on the PropertyItemSupplier relation in Figure 15.9(a), consider the
following statement:
If Property PG4 requires Bed (from data in tuple 1)
Supplier S2 supplies property PG4 (from data in tuple 2)
Supplier S2 provides Bed (from data in tuple 3)
Then Supplier S2 provides Bed for property PG4
This is an example of a type of update anomaly and we say that this relation contains a nontrivial join
dependency (JD) constraint.
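The join dependency can be illustrated by projecting the relation onto its three pairs of attributes and joining the projections back together. In the sketch below, the values PG4, Bed and S2 follow the statement above; the remaining values (S1, Chair, PG16) and the function name are assumptions made for illustration.

```python
def join_of_projections(rows):
    """Join the three binary projections of a set of (property, item, supplier) tuples."""
    prop_item = {(p, i) for p, i, s in rows}
    item_supp = {(i, s) for p, i, s in rows}
    prop_supp = {(p, s) for p, i, s in rows}
    return {
        (p, i, s)
        for p, i in prop_item
        for i2, s in item_supp
        if i == i2 and (p, s) in prop_supp
    }

rows = {
    ("PG4", "Bed", "S1"),    # tuple 1: property PG4 requires Bed
    ("PG4", "Chair", "S2"),  # tuple 2: supplier S2 supplies property PG4
    ("PG16", "Bed", "S2"),   # tuple 3: supplier S2 provides Bed
}
required = join_of_projections(rows) - rows
# Once the required tuple is stored, the relation equals the join of its projections.
stable = join_of_projections(rows | required) == rows | required
print(required, stable)
```

The join produces the extra tuple (PG4, Bed, S2), so a relation state that obeys the join dependency must also store that tuple.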