Normalization and FD
UNIT-5
NORMALIZATION
Table of Contents
Introduction
5.1 Data redundancy and the associated issues
5.1.1 Insertion Anomaly
5.1.2 Updation Anomaly
5.1.3 Deletion Anomaly
5.2 FIRST NORMAL FORM (1NF)
5.3 SECOND NORMAL FORM (2NF)
5.4 THIRD NORMAL FORM (3NF)
5.5 BOYCE CODD NORMAL FORM (BCNF)
5.6 FINDING THE HIGHEST NORMAL FORM OF A GIVEN RELATION
5.7 FOURTH NORMAL FORM (4NF)
5.8 FIFTH NORMAL FORM (5NF)
5.9 CONDITIONS FOR RELATION DECOMPOSITION
5.9.1 LOSSLESS (NON-ADDITIVE) JOIN DECOMPOSITION
5.9.2 DEPENDENCY PRESERVING DECOMPOSITION
5.10 DECOMPOSITION OF RELATIONS TO CONVERT THEM INTO HIGHER NORMAL FORM
5.11 Denormalization (topic to be added)
Introduction
In unit 2 and unit 3, we presented various aspects of creating a relational model and its
associated terminologies. So far, we have seen that attributes are grouped to form a
relation schema by mapping a database schema design from a conceptual data model (ER
model). ER model makes the designer identify entity types and relationship types and their
respective attributes, which leads to a natural and logical grouping of the attributes into
relations. Each relation schema consists of several attributes, and the relational database
schema consists of several relation schemas.
However, we still need some formal way of analyzing why one grouping of attributes into a
relation schema may be better than another. While discussing database design in unit 2 and
unit 3, we did not develop any measure of appropriateness or goodness to measure the
quality of the design, other than the designer's intuition. This unit will discuss some of the
theories developed to evaluate relational schemas for design quality, that is, to formally
measure why one set of groupings of attributes into relation schemas is better than
another.
Relational database design ultimately produces a set of relations. The implicit goals of the
design activity are information preservation and minimum redundancy. Information is very
hard to quantify—hence we consider information preservation in terms of maintaining all
concepts, including attribute types, entity types, relationship types, and
generalization/specialization relationships, described using a model such as the ER model.
Thus, the relational design must preserve all of these concepts, which are originally
captured in the conceptual design after the conceptual to logical design mapping.
Minimizing redundancy implies minimizing redundant storage of the same data and
reducing the need for multiple updates to maintain consistency across multiple copies of the
same information in response to real-world events requiring an update.
5.1 Data redundancy and the associated issues
Data redundancy occurs when the same piece of data is stored in two or more separate
places. Suppose we create a Relation to store sales records and, in the record for each sale,
we enter the customer address as one of the attributes. If there are multiple sales to the
same customer, the same address is entered multiple times. The address that is
repeatedly entered is redundant data.
Data redundancy normally happens when we try to combine attributes from multiple entity
types and relationship types into a single Relation.
To understand data redundancy and the problems associated with it, let us take a scenario
where we have a FACULTY_DETAIL Relation that stores all the faculty attributes and the
department they work for. The department data is not stored separately.
In the FACULTY_DETAIL Relation of Figure 5.1, the dept_location column holds redundant
data: for each faculty member, the dept_location information has to be repeated along with
the department. This Relation therefore suffers from insertion, updation, and deletion
anomalies.
5.1.1 Insertion Anomaly
Suppose the college starts a new department (CSE-DS at Bhabha Block) that is yet to have
any faculty. If we want to insert this new department into the above Relation, faculty_Id
will be NULL for the new record/tuple because the department has no faculty yet. This is
not possible because faculty_Id is the Primary Key. This is called insertion anomaly.
5.1.2 Updation Anomaly
Suppose the location of a department is changed. The new location needs to be updated for
this particular department in all the rows/tuples where it appears. If we miss any row/tuple
where this department appears while carrying out the update, this department's data will
be inconsistent in the Relation. This is called updation anomaly.
5.1.3 Deletion Anomaly
Suppose faculty_Id 2765 (Girish) leaves the college and his record is deleted from the
database. We can see that he is the only faculty in the ME department. The moment we
delete the record/tuple of faculty_Id 2765 from the Relation, the ME department information
is also lost. This is called deletion anomaly.
If we decompose the above FACULTY_DETAIL Relation into two separate relations, say
faculty and department shown in Figure 5.2, we will eliminate the data redundancy and
related anomalies discussed above.
When we convert the ER model into a relational model, in most cases, substantial
normalization is already achieved by virtue of implicit and explicit constraints discussed in
Unit 3. However, we will discuss all the normal forms in detail to understand the
normalization process.
5.2 FIRST NORMAL FORM (1NF)
The first normal form (1NF) imposes a fundamental requirement on relations. We say that a
relation schema R is in first normal form (1NF) if the domains of all attributes of R are
atomic. A domain of an attribute is atomic if elements of the domain are considered to be
indivisible units.
It means that multivalued attributes, composite attributes, and their combinations are not
allowed in a Relation that is in first normal form.
Multivalued attribute: A multivalued attribute may have one or more values for a particular
entity. Example – Phone Number. In our SMS case study, the phone number attribute in the
STUDENT entity type is a multivalued attribute. It means that a student can have multiple
phone numbers. If you remember, this also comes from the implicit constraint applied to
relational databases.
Composite attribute: Composite attributes are not atomic because they are assembled
using some other atomic attributes. A typical example of a composite attribute is a person's
address, composed of atomic attributes, such as House No., Street, City, State, Pincode.
In the case of a composite attribute, we can still store it in the database without violating
any database constraint; however, it is not a good database design. Storing a composite
attribute in the database will make data querying and analysis on its constituent atomic
attributes very complex. It can also result in the redundancy of data.
Let us try and understand it with an example shown in Figure 5.3 below:
For handling a Composite attribute, we need to create a separate column for each part of
the composite attribute as shown in Figure 5.4, as the number of parts in a composite
attribute will be fixed for most of the cases.
For handling a multivalued attribute such as phone_no, there are three options.
Option 1: Expand the key of this Relation to include phone_no with roll_no. The Relation
will now have a composite primary key consisting of roll_no and phone_no.
This arrangement achieves the first normal form (1NF); however, it is not a good design as it
introduces data redundancy (shown in Figure 5.5) into the Relation. For each additional
phone number of a student, the data in other columns is repeated.
Figure 5.5: Representation of converting student relation in 1NF by expanding the KEY (phone_no with roll_no)
Option 2: If the maximum number of values is known for phone_no, that many
columns can be added to the existing Relation.
Let us assume a student can have a maximum of two phone numbers, as shown in Figure 5.6.
We can then design the Relation with two separate columns for the two possible
student phone numbers to achieve the first normal form (1NF). This is not a good design as
it limits the number of phone numbers a student can have. If we want to allow more phone
numbers, the relation design would need a change, which is not a good design practice.
Figure 5.6: Representation of converting student relation in 1NF by adding columns for maximum no. of phone nos.
Option 3: Decompose this Relation into two relations, STUDENT and STUDENT_PHONE_NO.
They are linked to each other with a Primary Key (PK) - Foreign Key (FK) relationship. This
is a good design as it takes care of data redundancy and does not limit the number of phone
numbers a student can have. It is represented in Figure 5.7 below.
Figure 5.7: Representation of converting student relation in 1NF by decomposing the Relation in STUDENT and
STUDENT_PHONE_NO relations
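As a rough illustration of Option 3, the following Python sketch splits records that carry a multivalued phone_no attribute into STUDENT and STUDENT_PHONE_NO rows. The roll numbers, names, and phone numbers are hypothetical sample data (they are not taken from the figures), and the function name decompose_student is only an assumption made for this example.

```python
# Hypothetical un-normalized records: phone_no holds a list of values.
students = [
    {"roll_no": 101, "name": "Asha", "phone_no": ["9000000001", "9000000002"]},
    {"roll_no": 102, "name": "Ravi", "phone_no": ["9000000003"]},
]


def decompose_student(rows):
    """Split rows with a multivalued phone_no into two 1NF relations."""
    student = []           # one row per student, phone_no removed
    student_phone_no = []  # one row per (roll_no, phone_no) pair
    for row in rows:
        student.append({k: v for k, v in row.items() if k != "phone_no"})
        for phone in row["phone_no"]:
            # roll_no acts as the foreign key back to STUDENT
            student_phone_no.append({"roll_no": row["roll_no"], "phone_no": phone})
    return student, student_phone_no


student, student_phone_no = decompose_student(students)
print(student)
print(student_phone_no)
```

Each student appears once in the first list, and a student with several phone numbers simply gets several rows in the second list, so neither the redundancy of Option 1 nor the fixed column limit of Option 2 arises.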
5.3 SECOND NORMAL FORM (2NF)
We have seen that the First Normal Form (1NF) does not focus much on eliminating
redundancy due to functional dependency, but rather, it focuses on eliminating repeating
groups.
The Second Normal Form (2NF) is based on the concept of full functional dependency. The
Second Normal Form applies to relations with composite keys, that is, relations with a
primary key composed of two or more attributes. A Relation with a single-attribute primary
key is automatically in at least 2NF. A Relation not in 2NF may suffer from inconsistency
problems arising during insert, delete and update operations.
Definition:
For a Relation to be in 2NF, it should fulfill the below two conditions:
1. The Relation should be in 1NF
2. The Relation should have No Partial Dependency, i.e., no non-prime attribute
(attributes that are not part of any Primary/candidate key) is dependent on any
proper subset of any candidate key of the Relation.
How to check:
• 2NF applies to relations with composite candidate keys. A Relation whose candidate
keys are all single attributes is automatically in at least 2NF.
• No FD of the form (proper subset of a CK/PK) → (non-prime attribute) should hold.
Example 1:
Let us assume a school stores the data of teachers and the subjects they teach. In a
school, a teacher can teach more than one subject. The tabular representation of the TEACHER
Relation is shown in Figure 5.8.
Figure 5.8: Representation of TEACHER relation with data redundancy
The FD in the above Relation, teacher_id → teacher_age, can be depicted as:
Relation R(ABC) with FD = A→C
Let us find the candidate key of the above Relation. Candidate Key is (AB).
Prime Attributes – A, B
Non-prime Attributes – C
We have a composite candidate key (AB), and its proper subset (A) can determine a non-
prime attribute (C), FD (A→C). So this is a case of partial dependency. Therefore the Relation
is not in 2NF.
To convert this Relation into 2NF, we need to remove the partially dependent attribute(s)
from the Relation by placing them in a new Relation along with a copy of their determinant
which is shown in Figure 5.9.
Figure 5.9: Decomposition of TEACHER relation into TEACHER_DETAIL and TEACHER_SUBJECT to remove redundancy
Example 2:
In the previous section, when we converted the Relation into 1NF using Option 1, we got the
Relation shown in Figure 5.10:
Figure 5.10: Representation of STUDENT relation
Now let us analyze this Relation from a functional dependency point of view and find out if
this is in 2NF or not.
We can re-write the above as Relation R(ABCDEFGHIJKL) with FDs = A→BCDEFGHIJK, I→J.
Let us find the candidate key of the above Relation. The candidate key is (AL).
Prime Attributes – A, L
Non-prime Attributes – B, C, D, E, F, G, H, I, J, K
We have a composite candidate key (AL), and its proper subset (A) can determine non-prime
attributes (B, C, D, E, F, G, H, I, J, K), FD (A→BCDEFGHIJK). So this is a case of partial
dependency. Therefore the Relation is not in 2NF.
To convert this Relation into 2NF, we need to remove the partially dependent attribute(s)
from the Relation by placing them in a new relation along with a copy of their determinant
which is shown in Figure 5.11.
Example 3:
Let's take Relation R(A,B,C,D,E) with FD set = (A→B, B→C, C→D, D→E). Let us find if this
Relation is in 2NF or not.
The candidate key of the above Relation is (A). As the candidate key is not composite, the
case of partial dependency does not arise. Therefore the Relation is in 2NF.
Example 4:
Let’s take Relation R(A,B,C,D) with FD set = (AB→CD, C→A, D→B). Let us find if this Relation
is in 2NF or not.
The candidate keys of the above Relation are (AB), (BC), (CD), (AD).
Prime Attributes – A, B, C, D
Non-prime Attributes – NIL
In this case, we have composite candidate keys but no non-prime attribute. So the
case of partial dependency does not arise. Therefore the Relation is in 2NF.
Example 5:
Let’s take Relation R(A,B,C,D) with FD set = (A→B, B→D). Let us find if this Relation is in 2NF
or not.
In this case, we have a composite candidate key (AC), and its proper subset (A) can
determine a non-prime attribute (B), FD (A→B). So this is a case of partial dependency.
Therefore the Relation is not in 2NF.
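The 2NF check used in Examples 4 and 5 can be sketched in Python as follows. This is only an illustrative sketch, not a standard library routine: the function names closure, candidate_keys, and partial_dependencies are assumptions made here, attributes are single letters, and each FD is written as a (left-hand side, right-hand side) pair of strings. Candidate keys are found by brute force, which is adequate for small textbook relations.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    attrs = sorted(set(relation))
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(attrs):
                keys.append(candidate)
    return keys


def partial_dependencies(relation, fds):
    """Return (key subset, dependent non-prime attributes) pairs that violate 2NF."""
    keys = candidate_keys(relation, fds)
    non_prime = set(relation) - set().union(*keys)
    found = []
    for key in keys:
        for size in range(1, len(key)):  # proper subsets only
            for subset in combinations(sorted(key), size):
                dependent = closure(subset, fds) & non_prime
                if dependent:
                    found.append(("".join(subset), "".join(sorted(dependent))))
    return found


# Example 4: every attribute is prime, so no partial dependency can exist.
print(partial_dependencies("ABCD", [("AB", "CD"), ("C", "A"), ("D", "B")]))
# Example 5: A is a proper subset of the key (AC) and determines non-prime attributes.
print(partial_dependencies("ABCD", [("A", "B"), ("B", "D")]))
```

An empty list means the Relation is in 2NF (assuming it is already in 1NF); a non-empty list names the proper subsets of candidate keys that determine non-prime attributes.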
5.4 THIRD NORMAL FORM (3NF)
Although Second Normal Form (2NF) relations have less redundancy than those in 1NF, they
may still suffer from inconsistency problems arising during insert, delete and update
operations.
Definition:-
For a Relation to be in 3NF, it should fulfill both of the conditions below.
1. The Relation should be in 2NF.
2. There should be no non-prime attribute that is transitively dependent on the primary
key or any candidate key
or
A non-prime attribute should not functionally depend on the other non-prime
attribute.
For example, consider a Relation R(A,B,C,D) with FDs = A→BCD, B→C. In this Relation, (A) is
the candidate key and we have a transitive dependency, A→B, B→C.
The non-prime attribute (C) is transitively dependent on the candidate key (A), therefore
this Relation is not in 3NF. Equivalently, the non-prime attribute (C) is dependent on
another non-prime attribute (B); hence the Relation violates the 3NF condition.
How to check:-
A Relation is in 3NF if at least one of the following conditions holds for every non-trivial
functional dependency X→Y:
• X is a super key
• Y is a prime attribute
The normalization of 2NF relations to 3NF involves the removal of transitive dependencies.
If a transitive dependency exists, we remove transitively dependent attribute(s) from the
Relation by placing the attribute(s) in a new Relation along with a copy of the determinant.
The remaining attributes of the Relation along with the determinant above remain part of
the base Relation.
Example 1:
In the previous section, in example 2, we converted the STUDENT Relation from 1NF to 2NF
by decomposing it into two separate relations STUDENT_DETAIL and STUDENT_PHONE_NO.
Now let us analyze the STUDENT_DETAIL Relation, which is already in 2NF shown in Figure
5.12.
The candidate key of the Relation is roll_no. In this Relation, we have a transitive
dependency roll_no → city, city → state. This transitive dependency is causing data
redundancy in the Relation. Therefore this Relation is not in 3NF.
The normalization of this Relation to 3NF will involve the removal of transitive
dependencies. We need to remove the transitively dependent attribute(s) from the Relation
by placing the attribute(s) in a new Relation (CITY_STATE_MASTER) along with a copy of the
determinant which is shown in Figure 5.13.
Example 2:
Let us take Relation R(A, B, C, D) with FD set = (A→B, B→C, C→D). Let us find if this Relation
is in 3NF or not.
Example 3:
Let’s take Relation R(A,B,C,D,E,F) with FD set = (AB→CDEF, BD→F). Let's find if this Relation
is in 3NF or not.
AB→CDEF, AB is a super key (we know all candidate keys are super keys) – 3NF condition
met.
BD→F, BD is not a super key, and F is not a prime attribute – 3NF condition failed.
Therefore we can conclude that the above Relation is not in 3NF.
Example 4:
Let's take Relation R(A,B,C,D,E) with FD set = (A→B, B→C, C→D, D→A). Let us find if this
Relation is in 3NF or not.
The candidate keys of the above Relation are (AE), (DE), (CE), (BE).
Prime attributes – A, B, C, D, E
Non-prime attributes – NIL
A→B, A is not a super key, but B is a prime attribute – 3NF condition met.
B→C, B is not a super key, but C is a prime attribute – 3NF condition met.
C→D, C is not a super key, but D is a prime attribute – 3NF condition met.
D→A, D is not a super key, but A is a prime attribute – 3NF condition met.
Therefore we can conclude that the above Relation is in 3NF.
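The FD-by-FD test applied in Examples 3 and 4 can be sketched in Python as follows. The sketch assumes single-letter attributes and FDs written as (left-hand side, right-hand side) string pairs; the function names closure, candidate_keys, and third_nf_violations are illustrative assumptions, and candidate keys are found by brute force.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    attrs = sorted(set(relation))
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(attrs):
                keys.append(candidate)
    return keys


def third_nf_violations(relation, fds):
    """Return the FDs X→Y for which X is not a super key and Y is not prime."""
    all_attrs = set(relation)
    prime = set().union(*candidate_keys(relation, fds))
    violations = []
    for lhs, rhs in fds:
        extra = set(rhs) - set(lhs)  # drop the trivial part of Y
        if not extra:
            continue  # trivial FD
        if closure(lhs, fds) == all_attrs:
            continue  # X is a super key
        if extra <= prime:
            continue  # every remaining attribute of Y is prime
        violations.append((lhs, rhs))
    return violations


# Example 3: BD→F fails both conditions.
print(third_nf_violations("ABCDEF", [("AB", "CDEF"), ("BD", "F")]))
# Example 4: every right-hand side is a prime attribute.
print(third_nf_violations("ABCDE", [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]))
```

An empty result means every FD satisfies at least one of the two conditions, so the Relation is in 3NF.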
5.5 BOYCE CODD NORMAL FORM (BCNF)
Boyce-Codd Normal Form or BCNF is an extension to the third normal form and is also
known as the 3.5 Normal Form. Some redundancies might still remain even after a Relation
is in 3NF.
Definition:
For a Relation to be in BCNF, it should fulfill both of the conditions below.
1. The Relation should be in 3NF.
2. For each non-trivial functional dependency X→Y, X should be a Super Key
or
The Relation has no non-trivial functional dependency, i.e., the Relation is an all-key
Relation (all attributes together make the only candidate key)
Example 1:
This Relation is in 3NF (use the concepts learned in the previous section). Now let us analyze
each FD for BCNF condition:
A→B, A is a super key – BCNF condition met.
B→C, B is a super key – BCNF condition met.
C→A, C is a super key – BCNF condition met.
All FDs are meeting the BCNF condition; therefore, we can conclude that the above Relation
is in BCNF.
Example 2:
Prime attributes – A, B, C
Non-prime attributes – NIL
This Relation is in 3NF (use the concepts learned in the previous section). Now let us analyze
each FD for BCNF condition:
AB→C, AB is a super key – BCNF condition met.
C→B, C is not a super key – BCNF condition not met.
Not all FDs meet the BCNF condition; therefore, we can conclude that the above
Relation is not in BCNF.
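The BCNF condition checked FD by FD in Examples 1 and 2 can be sketched in Python as follows. The relation R(A, B, C) is an assumption inferred from the attributes that appear in the listed FDs and prime attributes; the function names closure and bcnf_violations are likewise illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Return the non-trivial FDs whose determinant is not a super key."""
    all_attrs = set(relation)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)          # non-trivial FD
        and closure(lhs, fds) != all_attrs   # determinant is not a super key
    ]


# FDs of Example 1 (relation assumed to be R(A, B, C)): every determinant is a super key.
print(bcnf_violations("ABC", [("A", "B"), ("B", "C"), ("C", "A")]))
# FDs of Example 2 (relation assumed to be R(A, B, C)): C→B violates BCNF.
print(bcnf_violations("ABC", [("AB", "C"), ("C", "B")]))
```

Unlike the 3NF test, no candidate key search is needed here: an FD passes only if the closure of its left-hand side covers the whole Relation.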
Example 3:
This Relation satisfies the 1st Normal form because all the values are atomic, column names
are unique, and all the values stored in a particular column are of the same domain.
This Relation also satisfies the 2nd Normal Form as there is no Partial Dependency.
And, there is no Transitive Dependency; hence the Relation also satisfies the 3rd Normal
Form.
But this Relation is not in Boyce-Codd Normal Form, as the FD professor → subject does not
meet the BCNF condition. Here the LHS (professor) is not a super key.
To make this Relation satisfy BCNF, we decompose it into two relations,
STUDENT_PROFESSOR and PROFESSOR_SUBJECT, which are shown in Figure 5.15.
5.6 FINDING THE HIGHEST NORMAL FORM OF A GIVEN RELATION
3NF: First, it should be in 2NF, and at least one of the following conditions must hold for
every non-trivial functional dependency X→Y:
• X is a super key
• Y is a prime attribute
BCNF: First, it should be in 3NF, and for every non-trivial functional dependency X→Y
between two sets of attributes X and Y, X must be a super key.
Figure 5.16: Venn diagram showing the relationship between various normal forms
The Venn diagram in Figure 5.16 shows the relationship between the various normal forms.
If a Relation is in BCNF, it is already in 3NF, 2NF and 1NF. That is why we start by checking
a Relation for BCNF and then move on to 3NF and so on.
Now let us work on a few examples to find the highest normal form of a given Relation.
Example 1:
Relation R(ABCDEFGH) with FDs = {ABC→DE, E→GH, H→G, G→H, ABCD→EF}
Step 1:
Candidate key of this Relation is (ABC)
Step 2:
Prime attributes: A, B, C
Non-prime attributes: D, E, F, G, H
Step 3:
Check for BCNF
ABC→DE, ABC is a super key – BCNF condition met.
E→GH, E is not a super key – BCNF condition not met.
H→G, H is not a super key – BCNF condition not met.
G→H, G is not a super key – BCNF condition not met.
ABCD→EF, ABCD is a super key – BCNF condition met.
As not all FDs meet the BCNF condition, this Relation is not in BCNF.
Example 2:
Step 1:
Candidate keys of this Relation are (A), (BC), (CD).
Step 2:
Prime attributes: A, B, C, D
Non-prime attributes: NIL
Step 3:
Check for BCNF
A→BCD, A is a super key – BCNF condition met.
BC→AD, BC is a super key – BCNF condition met.
D→B, D is not a super key – BCNF condition not met.
As not all FDs meet the BCNF condition, this Relation is not in BCNF.
Example 3:
Step 1:
Candidate key of this Relation is (AB)
Step 2:
Prime attributes: A, B
Non-prime attributes: C, D
Step 3:
Check for BCNF
AB→C, AB is a super key – BCNF condition met.
ABD→C, ABD is a super key – BCNF condition met.
ABC→D, ABC is a super key – BCNF condition met.
AC→D, AC is not a super key – BCNF condition not met.
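As not all FDs meet the BCNF condition, this Relation is not in BCNF.
AC→D, AC is not a super key and D is not a prime attribute – 3NF condition not met.
As not all FDs meet the 3NF condition, this Relation is not in 3NF.
Neither A nor B alone determines C or D, so there is no partial dependency and this
Relation is in 2NF.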
Example 4:
Step 1:
Candidate keys of this Relation are (AB), (AD)
Step 2:
Prime attributes: A, B, D
Non-prime attributes: C, E
Step 3:
Check for BCNF
AB→CDE, AB is a super key – BCNF condition met.
D→BE, D is not a super key – BCNF condition not met.
As not all FDs meet the BCNF condition, this Relation is not in BCNF.
D→BE, D is a proper subset of the candidate key (AD) and E is a non-prime attribute – 2NF
condition not met.
As not all FDs meet the 2NF condition, this Relation is not in 2NF.
So this Relation is in 1NF.
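The procedure of this section (check BCNF first, then 3NF, then 2NF) can be sketched in Python as follows. The sketch assumes the Relation is already in 1NF, uses single-letter attributes, and writes each FD as a (left-hand side, right-hand side) string pair; the function names are illustrative assumptions, and candidate keys are found by brute force.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    attrs = sorted(set(relation))
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(attrs):
                keys.append(candidate)
    return keys


def highest_normal_form(relation, fds):
    """Return 'BCNF', '3NF', '2NF' or '1NF' for a relation assumed to be in 1NF."""
    all_attrs = set(relation)
    keys = candidate_keys(relation, fds)
    prime = set().union(*keys)
    non_prime = all_attrs - prime
    # Keep only the non-trivial part of each FD.
    nontrivial = [(set(l), set(r) - set(l)) for l, r in fds if set(r) - set(l)]
    if all(closure(l, fds) == all_attrs for l, _ in nontrivial):
        return "BCNF"
    if all(closure(l, fds) == all_attrs or r <= prime for l, r in nontrivial):
        return "3NF"
    # 2NF fails if a proper subset of a candidate key determines a non-prime attribute.
    partial = any(
        closure(subset, fds) & non_prime
        for key in keys
        for size in range(1, len(key))
        for subset in combinations(sorted(key), size)
    )
    return "1NF" if partial else "2NF"


# Example 4 of this section: D→BE is a partial dependency on the key (AD).
print(highest_normal_form("ABCDE", [("AB", "CDE"), ("D", "BE")]))
```

For Example 4 the function reports 1NF, in line with the conclusion above. Applied to the FD sets of Example 1 and Example 2 of this section, the same sketch reports 2NF and 3NF respectively.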
5.7 FOURTH NORMAL FORM (4NF)
The Fourth Normal Form (4NF) comes into the picture when a Multivalued Dependency
(MVD) occurs in a Relation. Such relations need to be identified and decomposed further
into 4NF relations to improve the database design, as shown in Figure 5.17.
Definition:
For a Relation to be in 4NF, it should fulfill the two conditions below:
1. The Relation should be in BCNF
2. The Relation should not have any non-trivial Multivalued Dependency (MVD) whose
determinant is not a super key.
Multivalued Dependency (MVD):
• Multivalued dependencies are a consequence of 1NF, which disallows multivalued
attributes in a tuple and the accompanying process of converting an un-normalized
Relation into 1NF.
• Suppose we have two or more independent multivalued attributes in the same
Relation. In that case, we have to repeat every value of one attribute with
every value of the other attribute to keep the relation state consistent and maintain
the independence among the attributes involved.
• A non-trivial multivalued dependency specifies this constraint.
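The "every value of one attribute with every value of the other" requirement described above can be tested on a relation instance. The Python sketch below is an illustration only: the column names (roll_no, phone_no, hobby) and the rows are hypothetical, and mvd_holds is an assumed helper name. It checks whether X →→ Y holds by verifying that, for each X value, the stored (Y, rest) pairs form the full cross product of the Y values and the remaining values.

```python
from collections import defaultdict
from itertools import product


def mvd_holds(rows, x, y):
    """Check X ->> Y on a list of dict rows; x and y are tuples of column names."""
    if not rows:
        return True
    # Z is every column that is in neither X nor Y.
    z = tuple(col for col in rows[0] if col not in x and col not in y)
    groups = defaultdict(set)
    for row in rows:
        x_val = tuple(row[c] for c in x)
        groups[x_val].add((tuple(row[c] for c in y), tuple(row[c] for c in z)))
    for pairs in groups.values():
        y_vals = {p[0] for p in pairs}
        z_vals = {p[1] for p in pairs}
        # Independence of Y and Z requires every (Y value, Z value) combination.
        if pairs != set(product(y_vals, z_vals)):
            return False
    return True


# Hypothetical data: student 101 has two phone numbers and two hobbies.
rows = [
    {"roll_no": 101, "phone_no": "P1", "hobby": "chess"},
    {"roll_no": 101, "phone_no": "P1", "hobby": "music"},
    {"roll_no": 101, "phone_no": "P2", "hobby": "chess"},
    {"roll_no": 101, "phone_no": "P2", "hobby": "music"},
]
print(mvd_holds(rows, ("roll_no",), ("phone_no",)))      # all four combinations stored
print(mvd_holds(rows[:3], ("roll_no",), ("phone_no",)))  # one combination missing
```

Two phone numbers and two hobbies already force four rows for a single student, which is the redundancy that a 4NF decomposition into separate (roll_no, phone_no) and (roll_no, hobby) relations removes.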
Example:
4NF normalization process:
5.8 FIFTH NORMAL FORM (5NF)
Definition:
A Relation R is in 5NF if and only if it satisfies the following conditions:
1. R should be already in 4NF.
2. It should not have any non-trivial join dependency that is not implied by its candidate
keys.
Join dependency – Let R1(A, B, C) and R2(C, D) be a decomposition of a given relation
R(A, B, C, D). If the join of R1 and R2 over C is equal to relation R, then we can say
that a join dependency (JD) exists. Otherwise, R1 and R2 form a lossy decomposition
of R.
A JD ⋈ {R1, R2, …, Rn} is said to hold over a relation R if R1, R2, …, Rn is a lossless-join
decomposition of R.
A Relation R is in 5NF if and only if it cannot be decomposed further into two or more
relations with a loss-less join Property, ensuring that no spurious or extra tuples are
generated when relations are reunited through a natural join.
Example:
Multiple professors can teach each subject. For example, Java is taught by Amit,
Mohit & Payal.
Each professor can teach multiple subjects. For example, Amit can teach Java & C++.
From the ER modeling perspective, the above Relation is the outcome of a ternary
relationship type between student, subject, and professor, shown in Figure 5.20.
Figure 5.20: Representation of Ternary relationship between STUDENT, SUBJECT and PROFESSOR
If we decompose the above Relation into three separate binary relations as shown below in
Figure 5.21:
Figure 5.21: Decomposition of STUDENT, SUBJECT and PROFESSOR relation into three separate binary relations
We can see from the above decomposition that there is a loss of information.
From the above information, it is impossible to decipher who is teaching C++ to student 101.
Hence there is a loss of information; therefore, this decomposition is not lossless. There is
no join dependency between the base Relation and the decomposed relations.
5.9 CONDITIONS FOR RELATION DECOMPOSITION
We have gone through the details of the normalization process to help reduce data
redundancy and hence avoid insertion, updation, and deletion anomalies. One thing
common across the normalization process is the decomposition of base relations into two
or more relations to achieve a higher normal form.
When we decompose a Relation into two or more relations to achieve a higher normal form,
we need to make sure that the decomposition is:
1. Lossless (non-additive) join decomposition
2. Dependency preserving decomposition (optional in case of BCNF decomposition)
Lossless (non-additive) join decomposition ensures that no spurious tuples are generated
when a natural join operation is applied to the relations resulting from the decomposition.
The condition of no spurious tuples should hold on every legal relation state. The lossless
join property is always defined for a specific set F of dependencies. The word loss in lossless
refers to loss of information, not to the loss of tuples.
If we decompose a Relation r(R) into r1(R1) and r2(R2) such that R1 ∪ R2 = R (attribute
preservation condition), then the decomposition is said to be lossless if it satisfies
r1 ⋈ r2 = r, with no new tuples added and no tuples eliminated.
In general, if we decompose a Relation r(R) into r1(R1), r2(R2), …, rk(Rk) such that
R1 ∪ R2 ∪ … ∪ Rk = R (attribute preservation condition), then the decomposition is said to be
lossless if it satisfies r1 ⋈ r2 ⋈ … ⋈ rk = r, with no new tuples added and no tuples eliminated.
Example 1:
Case 1:
In case 1, we can see that R1 U R2 = R and r1 ⋈ r2 = r. It is a lossless join decomposition.
Case 2:
r(R):            r1(R1):      r2(R2):      r1(R1) ⋈ r2(R2):
A    B    C      A    B       A    C       A    B    C
a1   b1   c1     a1   b1      a1   c1      a1   b1   c1   √
a2   b2   c1     a2   b2      a2   c1      a1   b1   c2   X
a1   b2   c2     a1   b2      a1   c2      a1   b1   c3   √
a3   b2   c3     a3   b2      a3   c3      a2   b2   c1   √
a1   b1   c3     a2   b1      a1   c3      a2   b2   c4   X
a2   b1   c4                  a2   c4      a1   b2   c1   X
                                           a1   b2   c2   √
                                           a1   b2   c3   X
                                           a3   b2   c3   √
                                           a2   b1   c1   X
                                           a2   b1   c4   √
√ Correct tuple
X Spurious tuple
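The Case 2 table can be reproduced with a few lines of Python. The sketch below projects r(R) onto R1(A, B) and R2(A, C), joins the projections on the common attribute A, and lists the tuples that were not in r. The helper names project and natural_join are assumptions made for this illustration; the tuples are those of the table above.

```python
R = ("A", "B", "C")
r = {
    ("a1", "b1", "c1"), ("a2", "b2", "c1"), ("a1", "b2", "c2"),
    ("a3", "b2", "c3"), ("a1", "b1", "c3"), ("a2", "b1", "c4"),
}


def project(rows, schema, attrs):
    """Projection of rows onto attrs; duplicates disappear because a set is used."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(row[i] for i in idx) for row in rows}


def natural_join(r1, s1, r2, s2):
    """Natural join of two relations given as sets of tuples plus their schemas."""
    common = [a for a in s1 if a in s2]
    extra = [a for a in s2 if a not in s1]
    joined = set()
    for t1 in r1:
        for t2 in r2:
            if all(t1[s1.index(a)] == t2[s2.index(a)] for a in common):
                joined.add(t1 + tuple(t2[s2.index(a)] for a in extra))
    return tuple(s1) + tuple(extra), joined


r1 = project(r, R, ("A", "B"))
r2 = project(r, R, ("A", "C"))
schema, joined = natural_join(r1, ("A", "B"), r2, ("A", "C"))
spurious = joined - r  # tuples produced by the join that were never in r

print(len(r1), len(r2), len(joined))
print(sorted(spurious))
```

The join returns eleven tuples for the six original ones; the five extra tuples are exactly the rows marked X in the table, so this decomposition is lossy.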
For lossless join decomposition using the FD set, the following conditions must hold:
1. The union of the attributes of R1 and R2 must be equal to the attributes of R. Each
attribute of R must be either in R1 or in R2.
Att(R1) ∪ Att(R2) = Att(R)
2. The intersection of the attributes of R1 and R2 must not be empty.
Att(R1) ∩ Att(R2) ≠ Φ
3. The common attribute(s) must be a key for at least one Relation (R1 or R2).
Att(R1) ∩ Att(R2) → Att(R1) or Att(R1) ∩ Att(R2) → Att(R2)
In the above example in case 1, the common attribute of R1 and R2, i.e., A, is the candidate
key of R1, and we can see that the decomposition is lossless (non-additive).
In the above example in case 2, the common attribute of R1 and R2, i.e., A, is not a candidate
key of either R1 or R2, and we can see that the decomposition is not lossless (lossy).
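The three conditions can be checked mechanically with an attribute closure. The Python sketch below is a minimal illustration; the function names are assumptions, and because the text does not list an FD set for cases 1 and 2, the FD A→B used in the first call is a hypothetical FD chosen so that A becomes a key of R1(A, B).

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(relation, r1, r2, fds):
    """Apply the three conditions to a decomposition of relation into r1 and r2."""
    s1, s2 = set(r1), set(r2)
    if s1 | s2 != set(relation):
        return False  # condition 1: attribute preservation
    common = s1 & s2
    if not common:
        return False  # condition 2: non-empty intersection
    reachable = closure(common, fds)
    # condition 3: the common attributes determine all of R1 or all of R2
    return s1 <= reachable or s2 <= reachable


# R(A, B, C) split into R1(A, B) and R2(A, C).
print(is_lossless_binary("ABC", "AB", "AC", [("A", "B")]))  # hypothetical FD A→B
print(is_lossless_binary("ABC", "AB", "AC", []))            # A determines nothing
```

With A→B the common attribute A determines all of R1 and the test succeeds; with no FD on A, as in the case 2 instance where a1 appears with both b1 and b2, the test fails.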
Example 2:
A Relation R(A,B,C,D,E,F) with FD set {AB→C, C→D, D→EF, F→A, D→B} is decomposed into
R1(A, B, C), R2(C, D, E), R3(E, F)
As join (⋈) is a binary operation, we will take two relations at a time.
For each functional dependency X→Y in F
{
    For all rows in S that have the same symbols in the columns corresponding to
    attributes in X
    {
        Make the symbols in each column that corresponds to an attribute in Y the same
        in all these rows, as follows:
        If any of the rows has an 'a' symbol for the column, set the other rows to that
        same 'a' symbol in the column.
        If no 'a' symbol exists for the attribute in any of the rows, choose one of the
        'b' symbols that appears in one of the rows for the attribute and set the other
        rows to that same 'b' symbol in the column.
    }
}
4. If a row is made up entirely of 'a' symbols, then the decomposition has the non-
additive join property; otherwise, it does not.
Example 3:
R(A,B,C,D,E)
Decomposition is:
R1(AD) ; R2(AB) ; R3(BE) ; R4(CDE) ; R5(AE)
Set of functional dependencies F = {A→C, B→C, C→D, DE→C, CE→A}. Verify whether this
decomposition is lossless or lossy.
The matrix S after applying each functional dependency in turn is as follows:
1. A → C                                  2. B → C
       A    B    C    D    E                     A    B    C    D    E
AD     a1   b12  b13  a4   b15            AD     a1   b12  b13  a4   b15
AB     a1   a2   b13  b24  b25            AB     a1   a2   b13  b24  b25
BE     b31  a2   b33  b34  a5             BE     b31  a2   b13  b34  a5
CDE    b41  b42  a3   a4   a5             CDE    b41  b42  a3   a4   a5
AE     a1   b52  b13  b54  a5             AE     a1   b52  b13  b54  a5

3. C → D                                  4. DE → C
       A    B    C    D    E                     A    B    C    D    E
AD     a1   b12  b13  a4   b15            AD     a1   b12  b13  a4   b15
AB     a1   a2   b13  a4   b25            AB     a1   a2   b13  a4   b25
BE     b31  a2   b13  a4   a5             BE     b31  a2   a3   a4   a5
CDE    b41  b42  a3   a4   a5             CDE    b41  b42  a3   a4   a5
AE     a1   b52  b13  a4   a5             AE     a1   b52  a3   a4   a5

5. CE → A                                 6. A → C
       A    B    C    D    E                     A    B    C    D    E
AD     a1   b12  b13  a4   b15            AD     a1   b12  a3   a4   b15
AB     a1   a2   b13  a4   b25            AB     a1   a2   a3   a4   b25
BE     a1   a2   a3   a4   a5             BE     a1   a2   a3   a4   a5
CDE    a1   b42  a3   a4   a5             CDE    a1   b42  a3   a4   a5
AE     a1   b52  a3   a4   a5             AE     a1   b52  a3   a4   a5

Final matrix after applying A → C, B → C, C → D, DE → C, CE → A, A → C:
       A    B    C    D    E
AD     a1   b12  a3   a4   b15
AB     a1   a2   a3   a4   b25
BE     a1   a2   a3   a4   a5     All 'a' symbols are in this row
CDE    a1   b42  a3   a4   a5
AE     a1   b52  a3   a4   a5
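The matrix procedure used in Example 3 can be sketched in Python as follows. This is an illustrative sketch rather than a reference implementation: the function name is_lossless_chase is an assumption, and the symbols are simplified to a plain "a" per column and row-specific "b" strings. One deliberate design choice differs slightly from the hand-worked matrices: when two symbols are equated, the sketch replaces them in the whole column rather than only in the matching rows, which does not change the final verdict.

```python
def is_lossless_chase(relation, decomposition, fds):
    """Matrix ('chase') test for the non-additive join property."""
    attrs = list(relation)
    # One row per decomposed relation: 'a' where the relation holds the
    # attribute, otherwise a 'b' symbol that is unique to that row and column.
    table = [
        {col: "a" if col in part else f"b{i}{col}" for col in attrs}
        for i, part in enumerate(decomposition, start=1)
    ]
    changed = True
    while changed:  # repeat until a full pass makes no change
        changed = False
        for lhs, rhs in fds:
            # Group the rows that agree on every attribute of the left-hand side.
            groups = {}
            for row in table:
                groups.setdefault(tuple(row[c] for c in lhs), []).append(row)
            for rows in groups.values():
                for col in rhs:
                    symbols = {row[col] for row in rows}
                    if len(symbols) < 2:
                        continue
                    # Prefer the 'a' symbol; otherwise pick one 'b' symbol.
                    target = "a" if "a" in symbols else min(symbols)
                    for row in table:
                        if row[col] in symbols:
                            row[col] = target  # equate the symbols everywhere
                    changed = True
    return any(all(value == "a" for value in row.values()) for row in table)


# Example 3: R(A,B,C,D,E) decomposed into AD, AB, BE, CDE, AE.
fds = [("A", "C"), ("B", "C"), ("C", "D"), ("DE", "C"), ("CE", "A")]
print(is_lossless_chase("ABCDE", ["AD", "AB", "BE", "CDE", "AE"], fds))
```

For Example 3 the row that starts as BE ends up consisting only of 'a' symbols, so the function reports that the decomposition is lossless, in agreement with the final matrix above.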
check that the constraints are not violated in case of any update in any of the decomposed
relations. Dependency preservation is optional for BCNF decomposition.
Definition:
A decomposition D = {R1, R2, R3, …, Rn} of R is dependency preserving w.r.t. a set F of
functional dependencies if (F1 ∪ F2 ∪ … ∪ Fn)+ = F+.
How to check:
Consider a Relation R with a functional dependency set F. If R is decomposed or divided
into R1 with FD set F1 and R2 with FD set F2, then there can be three cases:
Solution:
Step 1: For decomposed Relation R1(A, B, C) and R2(C, D), let us find the functional
dependency of each sub Relation as F1 and F2 using closure property.
To find FD’s for Relation R1, i.e., F1, we will consider all combination of attributes that
belong to Relation R1(ABC), i.e., find closure of A, B, C, AB, BC, and AC using original FD set F
(Note: ABC is not considered as it is always ABC due to triviality) and then eliminate such
FD’s in which any attribute appears which is not part of R1 Relation. No need to add trivial
functional dependencies
(A)+ = {A} // Trivial, hence ignore
(B)+ = {B} // Trivial, hence ignore
(C)+ = {C, A, D}, but D cannot be part of the closure because D is not present in R1.
= {C, A}. We can write the FD as C→CA, but C on the RHS is a trivial attribute, so we remove it
from the RHS. Finally, the FD derived from (C)+ is C→A ... (1)
(AB)+ = {A, B, C, D}, but D can't be in the closure as D is not present in R1.
= {A, B, C}. Therefore the FD will be AB→C // Removing trivial attributes (AB) from RHS ... (2)
(BC)+ = {B, C, D, A}, but D can't be in the closure as D is not present in R1.
= {A, B, C}. Therefore the FD will be BC→A // Removing trivial attributes (BC) from RHS ... (3)
(AC)+ = {A, C, D}, but D can't be in the closure as D is not present in R1.
= {A, C}. Ignoring AC (trivial). Therefore no new FD is derived using AC.
To find the FDs for Relation R2, i.e., F2, we will consider all combinations of attributes that
belong to Relation R2(CD), i.e., C and D (Note: CD is not considered, as its closure within R2 is
always CD, which is trivial).
Similarly, we can derive F2 = {C→D}.
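Step 1 can also be carried out mechanically. The following is a minimal Python sketch, assuming FDs are written as (LHS, RHS) string pairs; the helper names closure and project are our own and are used only for illustration. It computes the closure of every proper subset of a sub-relation and keeps only the non-trivial attributes that lie inside that sub-relation.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of a set of attributes under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def project(fds, sub_relation):
    """FDs of `fds` restricted to `sub_relation`, as {lhs: rhs} strings."""
    attrs = sorted(sub_relation)
    projected = {}
    # Proper, non-empty subsets only: the full attribute set is always trivial.
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            # Drop attributes outside the sub-relation and trivial ones.
            rhs = (closure(combo, fds) & set(attrs)) - set(combo)
            if rhs:
                projected["".join(combo)] = "".join(sorted(rhs))
    return projected

F = [("AB", "C"), ("C", "D"), ("D", "A")]
F1 = project(F, "ABC")
F2 = project(F, "CD")
print(F1)  # {'C': 'A', 'AB': 'C', 'BC': 'A'}
print(F2)  # {'C': 'D'}
```

The sketch enumerates every subset, so it is only practical for the small relations used in these examples.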
Step 2: Test whether each of the original functional dependencies {AB→C, C→D, D→A} can be
derived from {F1 ∪ F2}, i.e., whether F+ = (F1 ∪ F2)+.
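The test in Step 2 can be sketched in Python without listing F1 and F2 explicitly. This is one standard way to do it, not necessarily the procedure these notes follow, and the names closure and is_preserved are our own. For each FD X→Y, we start from X and repeatedly add whatever the closure of the attributes we hold can reach inside each sub-relation; the FD is preserved if Y is eventually covered.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_preserved(fd, fds, parts):
    """True if fd = (lhs, rhs) follows from the FDs projected onto `parts`."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for part in parts:
            part = set(part)
            # What the attributes we hold in this part determine inside it.
            gained = closure(result & part, fds) & part
            if not gained <= result:
                result |= gained
                changed = True
    return set(rhs) <= result

F = [("AB", "C"), ("C", "D"), ("D", "A")]
parts = ["ABC", "CD"]
report = {f"{lhs}->{rhs}": is_preserved((lhs, rhs), F, parts) for lhs, rhs in F}
print(report)  # {'AB->C': True, 'C->D': True, 'D->A': False}
```

Under these assumptions the sketch reports that D→A cannot be derived from {F1 ∪ F2}, so the decomposition into R1(A, B, C) and R2(C, D) is not dependency preserving.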
To find the FDs for Relation R1, i.e., F1, we will consider all combinations of attributes that
belong to Relation R1(ABC), i.e., find the closure of A, B, C, AB, BC and AC using the original FD set F.
(A)+ = {A,B,C,D}, but we will ignore A (trivial) & D (D is not part of R1). Therefore, {A}+ = {B,C}.
We can write the functional dependency derived from A as A→BC ... (1)
(B)+ = {B,C,D,A}. Ignoring B (trivial) & D (D is not part of R1). Therefore, {B}+ = {C,A}.
We can write the functional dependency derived from AB as AB→C. But please note
that this is a duplicate FD, because attribute A alone can derive C in equation (1)
above. In other words, we do not check any combination of attributes that includes
an attribute which is itself capable of acting as a key of the Relation R. Hence we
will ignore this FD as part of the F1 set.
Similarly, (A)+, (B)+, (C)+ derive all attributes of Relation R; hence testing combinations like
AC & BC will not add any new functional dependency to the set F1.
Therefore final F1 = {A→BC, B→CA, C→BA}
To find the FDs for Relation R2, i.e., F2, we will consider all combinations of the attributes of
R2(CDE), i.e., C, D, E, CD, CE, DE, using the original functional dependency set F = {A→B, B→C,
C→D, D→A}.
(C)+ = {C,D,A,B}. Ignoring C (trivial) & AB (AB is not part of R2). Therefore, {C}+ = {D}.
We can write the functional dependency derived from C as C→D ... (1)
(D)+ = {D,A,B,C}. Ignoring D (trivial) & AB (AB is not part of R2). Therefore, {D}+ = {C}.
We can write the functional dependency derived from D as D→C ... (2)
We can write the functional dependency derived from DE as DE→C. But please note this
is a duplicate FD because D alone can derive C in equation (2). Hence we will ignore
this FD.
(CE)+ = {C,E,D,A,B}. Ignoring CE (trivial) & AB (AB is not part of R2). Hence {CE}+ = {D}.
We can write the functional dependency derived from CE as CE→D. But please note this
is a duplicate FD because C alone can derive D in equation (1). Hence we will ignore
this FD.
Step 2: Test whether original Relation functional dependency F = {A→B, B→C, C→D, D→A}
exist in {F1 U F2} or F = {F1 U F2}.
D→A can be derived using the axioms on {F1 ∪ F2}: using D→C & C→BA, we can derive
D→BA (transitivity rule), and then, applying the decomposition rule, we can infer D→B &
D→A. Hence, D→A is present in the closure of {F1 ∪ F2}. This means F+ = (F1 ∪ F2)+, so the
decomposition is dependency preserving.
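The same conclusion can be cross-checked with an attribute closure. The sketch below is illustrative only: the helper name closure is our own, and we assume F2 = {C→D, D→C}, as indicated by equations (1) and (2) of the F2 derivation. It computes the closure of D under F1 ∪ F2 and confirms that every original FD is covered.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

F = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
F1 = [("A", "BC"), ("B", "CA"), ("C", "BA")]
F2 = [("C", "D"), ("D", "C")]  # assumed from equations (1) and (2)
union = F1 + F2

# D -> C (from F2), then C -> BA (from F1), so A is reachable from D.
d_closure = closure("D", union)
preserved = all(set(rhs) <= closure(lhs, union) for lhs, rhs in F)
print(sorted(d_closure), preserved)  # ['A', 'B', 'C', 'D'] True
```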
We have understood the concepts of 1NF, 2NF, 3NF & BCNF and how to find the highest normal
form of a given Relation. Let us use this knowledge to convert a given Relation into a higher
normal form. We will do this with a set of examples to bring more clarity.
Example 1:
We will create two separate relations to handle the two partial dependencies A→B
(including B→E, as E is dependent on B) & C→D,
i.e., R1(A,B,E) and R2(C,D). After removing the partially dependent attributes (and their
dependents), the base Relation will be reduced to R3(A, C).
Step 4 – Check again if the above-decomposed relations have achieved the highest normal
form.
Relation R1(A, B, E)
Candidate Key – (A)
Prime attributes – A
Non-prime attributes – B, E
We see there is a transitive dependency here, B→E (E depends on the key A only through B);
therefore, this Relation is not in 3NF.
R2(C, D) & R3(A, C) are both in BCNF (you can check by concepts learned in the
earlier sections).
Step 5 – Decompose the Relation R1(A, B, E) to remove the anomalies identified above.
We have identified a transitive dependency in the above Relation, thus violating 3NF.
From previous sections, we know:
Conclusion: Relation R(A,B,C,D,E) with FDs = {A→B, B→E, C→D} is in 1NF. It can be
decomposed into 4 separate relations - R11(A,B), R12(B,E), R2(C,D) & R3(A,C) - to achieve the
highest normal form of BCNF.
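The BCNF claims for Example 1 can be checked with a short Python sketch. It assumes the usual closure-based test: a relation is in BCNF if every subset X of its attributes either determines nothing new inside the relation or determines the whole relation. The helper names closure and is_bcnf are our own.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of a set of attributes under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_bcnf(relation, fds):
    """BCNF test for a (sub-)relation against the original FD set."""
    attrs = set(relation)
    for size in range(1, len(attrs)):
        for combo in combinations(sorted(attrs), size):
            inside = closure(combo, fds) & attrs
            # Violation: X determines something new but is not a superkey.
            if inside != set(combo) and inside != attrs:
                return False
    return True

F = [("A", "B"), ("B", "E"), ("C", "D")]
results = {r: is_bcnf(r, F) for r in ["ABE", "AB", "BE", "CD", "AC"]}
print(results)
# {'ABE': False, 'AB': True, 'BE': True, 'CD': True, 'AC': True}
```

R1(A, B, E) fails because B determines E without being a superkey of R1, which is the transitive dependency removed in Step 5.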
Example 2:
We will create two separate relations to handle the two transitive dependencies B→C and
C→D,
i.e., R1(BC) & R2(CD). After removing the transitively dependent attributes, the base
Relation will be reduced to R3(AB).
Step 4 – Check again if the above-decomposed relations have achieved the highest normal
form.
R1(BC), R2(CD) & R3(AB) are all in BCNF (you can check by concepts learned in the
earlier sections).
Step 5 – After carrying out the decomposition, we need to make sure that one of the
decomposed relations contains the candidate key of the Relation R(A, B, C, D), i.e.,
(A). Here R3(A, B) meets the condition.
Conclusion: Relation R(A,B,C,D) with FDs = {A→B, B→C, C→D} is in 2NF. It can be
decomposed into 3 separate relations - R1(BC), R2(CD) & R3(AB) - to achieve the highest
normal form of BCNF.
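The candidate-key check in Step 5 of Example 2 can be sketched as a brute-force search. The helper names closure and candidate_keys are our own, and the enumeration of all attribute subsets is only practical for small relations like this one.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of a set of attributes under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(relation, fds):
    """Minimal attribute sets whose closure covers the whole relation."""
    attrs = sorted(relation)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            # Skip supersets of a key already found: they are not minimal.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if closure(combo, fds) >= set(attrs):
                keys.append("".join(combo))
    return keys

F = [("A", "B"), ("B", "C"), ("C", "D")]
keys = candidate_keys("ABCD", F)
decomposition = ["BC", "CD", "AB"]
# Decomposed relations that contain a full candidate key of R.
holders = [r for r in decomposition if any(set(k) <= set(r) for k in keys)]
print(keys, holders)  # ['A'] ['AB']
```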
Example 3:
Step 4 – Check again if the above-decomposed relations have achieved the highest normal
form.
R1(D, B) & R2(A, C, D) are both in BCNF (you can check by concepts learned in the
earlier sections).
Step 5 – BCNF decompositions are not always dependency preserving; therefore, we don't
need to make sure that all candidate keys of the base Relation are there in the
decomposed relations.
Conclusion: Relation R(A,B,C,D) with FDs = {A→BCD, BC→AD, D→B} is in 3NF. It can be
decomposed into two separate relations - R1(D,B) & R2(A,C,D) to achieve the highest
normal form of BCNF.
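Although this decomposition need not preserve every dependency, it should still be lossless. A minimal sketch of the binary lossless-join test from Section 5.9.1 is given below: the attributes common to both relations must functionally determine all of R1 or all of R2. The helper names closure and is_lossless_binary are our own.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless_binary(r1, r2, fds):
    """Lossless-join test for a decomposition into exactly two relations."""
    common = set(r1) & set(r2)
    reach = closure(common, fds)
    return set(r1) <= reach or set(r2) <= reach

F = [("A", "BCD"), ("BC", "AD"), ("D", "B")]
# R1(D, B) and R2(A, C, D) share only D, and D+ = {D, B} covers R1.
lossless = is_lossless_binary("DB", "ACD", F)
print(lossless)  # True
```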