Normalization and FD


DATABASE MANAGEMENT SYSTEM

DBMS CORE GROUP

UNIT-5
NORMALIZATION

Table of Contents
Introduction
5.1 Data redundancy and the associated issues
5.1.1 Insertion Anomaly
5.1.2 Updation Anomaly
5.1.3 Deletion Anomaly
5.2 FIRST NORMAL FORM (1NF)
5.3 SECOND NORMAL FORM (2NF)
5.4 THIRD NORMAL FORM (3NF)
5.5 BOYCE CODD NORMAL FORM (BCNF)
5.6 FINDING THE HIGHEST NORMAL FORM OF A GIVEN RELATION
5.7 FOURTH NORMAL FORM (4NF)
5.8 FIFTH NORMAL FORM (5NF)
5.9 CONDITIONS FOR RELATION DECOMPOSITION
5.9.1 LOSSLESS (NON-ADDITIVE) JOIN DECOMPOSITION
5.9.2 DEPENDENCY PRESERVING DECOMPOSITION
5.10 DECOMPOSITION OF RELATIONS TO CONVERT THEM INTO HIGHER NORMAL FORM
5.11 Denormalization (topic to be added)

Introduction

In unit 2 and unit 3, we presented various aspects of creating a relational model and its
associated terminologies. So far, we have seen that attributes are grouped to form a
relation schema by mapping a database schema design from a conceptual data model (ER
model). The ER model makes the designer identify entity types and relationship types and their
respective attributes, which leads to a natural and logical grouping of the attributes into
relations. Each relation schema consists of several attributes, and the relational database
schema consists of several relation schemas.

However, we still need some formal way of analyzing why one grouping of attributes into a
relation schema may be better than another. While discussing database design in unit 2 and
unit 3, we did not develop any measure of appropriateness or goodness to measure the
quality of the design, other than the designer's intuition. This unit will discuss some of the
theories developed to evaluate relational schemas for design quality, that is, to formally
measure why one grouping of attributes into relation schemas is better than
another.

Relational database design ultimately produces a set of relations. The implicit goals of the
design activity are information preservation and minimum redundancy. Information is very
hard to quantify—hence we consider information preservation in terms of maintaining all
concepts, including attribute types, entity types, relationship types, and
generalization/specialization relationships, described using a model such as the ER model.
Thus, the relational design must preserve all of these concepts, which are originally
captured in the conceptual design after the conceptual to logical design mapping.
Minimizing redundancy implies minimizing redundant storage of the same data and
reducing the need for multiple updates to maintain consistency across multiple copies of the
same information in response to real-world events requiring an update.

5.1 Data redundancy and the associated issues

Data redundancy occurs when the same piece of data is stored in two or more separate
places. Suppose we create a Relation to store sales records, and in the records for each sale,
we enter the customer address as one of the attributes. Now we have multiple sales to the
same customer, so the same address is entered multiple times. The address that is
repeatedly entered is redundant data.

Data redundancy normally happens when we try to combine attributes from multiple entity
types and relationship types into a single Relation.

To understand data redundancy and the problems associated with it, consider a scenario in
which a FACULTY_DETAIL Relation stores all the faculty attributes together with the
department they work for. The department data is not stored separately.

Figure 5.1: Representation of FACULTY_DETAIL Relation

In the FACULTY_DETAIL Relation of Figure 5.1, the column dept_location holds redundant
data. For each faculty member, the dept_location information is repeated along with the
department. This Relation suffers from insertion, updation, and deletion anomalies.

5.1.1 Insertion Anomaly

Suppose the college starts a new department (CSE-DS at Bhabha Block) that is yet to have
any faculty. If we want to insert this new department into the above Relation, the faculty_Id
of the new record/tuple would have to be NULL because the department has no faculty yet.
This is not possible because faculty_Id is the Primary Key. This is an insertion anomaly.

5.1.2 Updation Anomaly

Suppose the location of a department is changed. The new location needs to be updated for
this particular department in all the rows/tuples where it appears. While carrying out this
updation process, if we miss any row/tuple where this department appears, this
department's data will be inconsistent in the Relation. This is called an updation anomaly.

5.1.3 Deletion Anomaly

Suppose faculty_Id 2765 (Girish) leaves the college and his record is deleted from the
database. We can see that he is the only faculty in the ME department. The moment we delete
the faculty_Id 2765 record/tuple from the Relation, the ME department information is also
lost. This is called a deletion anomaly.

If we decompose the above FACULTY_DETAIL Relation into two separate relations, say
faculty and department shown in Figure 5.2, we will eliminate the data redundancy and
related anomalies discussed above.

Figure 5.2: Representation of Faculty and Department Relation

This process of minimizing redundancy from a Relation or set of relations is called
normalization.

When we convert the ER model into a relational model, in most cases, substantial
normalization is already achieved by virtue of implicit and explicit constraints discussed in
Unit 3. However, we will discuss all the normal forms in detail to understand the
normalization process.
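
The effect of decomposing FACULTY_DETAIL on the three anomalies can be sketched in a few lines of Python. The rows below are hypothetical stand-ins for the data in Figures 5.1 and 5.2: only faculty_Id 2765 (Girish, ME department) and the CSE-DS department at Bhabha Block come from the text, and every other name and location is an assumption made for illustration.

```python
# Hypothetical rows of the single FACULTY_DETAIL Relation (dept_location repeats).
faculty_detail = [
    {"faculty_Id": 2761, "name": "Asha", "dept": "CSE", "dept_location": "Block A"},
    {"faculty_Id": 2762, "name": "Ravi", "dept": "CSE", "dept_location": "Block A"},
    {"faculty_Id": 2765, "name": "Girish", "dept": "ME", "dept_location": "Block B"},
]

# Decompose: each department's location is now stored exactly once.
department = {row["dept"]: row["dept_location"] for row in faculty_detail}
faculty = [
    {key: row[key] for key in ("faculty_Id", "name", "dept")}
    for row in faculty_detail
]

# Insertion: a department without any faculty can be recorded.
department["CSE-DS"] = "Bhabha Block"

# Updation: a single change relocates the department for all its faculty.
department["CSE"] = "Block C"

# Deletion: removing Girish no longer removes the ME department.
faculty = [row for row in faculty if row["faculty_Id"] != 2765]

print(department)
```

After these three operations the ME department is still present in department, although it no longer has any faculty row referring to it.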

5.2 FIRST NORMAL FORM (1NF)

The first normal form (1NF) imposes a fundamental requirement on relations. We say that a
relation schema R is in first normal form (1NF) if the domains of all attributes of R are
atomic. A domain of an attribute is atomic if elements of the domain are considered to be
indivisible units.

It means that multivalued attributes, composite attributes, and their combinations are not
allowed in a Relation that is in first normal form.

Multivalued attribute: A multivalued attribute may have more than one value for a particular
entity. Example – Phone Number. In our SMS case study, the phone number attribute in the
STUDENT entity type is a multivalued attribute, which means that a student can have multiple
phone numbers. Recall that the rule against multivalued attributes also follows from the
implicit constraints applied to relational databases.

Composite attribute: Composite attributes are not atomic because they are assembled
using some other atomic attributes. A typical example of a composite attribute is a person's
address, composed of atomic attributes, such as House No., Street, City, State, Pincode.
In the case of a composite attribute, we can still store it in the database without violating
any database constraint; however, it is not a good database design. Storing a composite
attribute in the database will make data querying and analysis on its constituent atomic
attributes very complex. It can also result in the redundancy of data.

Let us try and understand it with an example shown in Figure 5.3 below:

Figure 5.3: Representation of STUDENT relation with an address as composite attribute

For handling a Composite attribute, we need to create a separate column for each part of
the composite attribute as shown in Figure 5.4, as the number of parts in a composite
attribute will be fixed for most of the cases.

Figure 5.4: Representation of STUDENT relation with address as simple attribute

For handling a multivalued attribute, we have three options:

Option 1 : Expand the Key of this Relation to include phone_no with roll_no. The Relation
will now have a composite primary key consisting of roll_no & phone_no.

This arrangement achieves the first normal form (1NF); however, it is not a good design as it
introduces data redundancy (shown in Figure 5.5) into the Relation. For each additional
phone number of a student, the data in other columns is repeated.

Figure 5.5: Representation of converting student relation in 1NF by expanding the KEY (phone_no with roll_no)

Option 2: If the maximum number of values for phone_no is known, that many columns can
be added to the existing Relation.
Let us assume a student can have a maximum of two phone numbers, as shown in Figure 5.6.
We can create the relation design below, with two separate columns to store the two possible
student phone numbers, to achieve the first normal form (1NF). This is not a good design as
it limits the phone numbers a student can have. If we want to allow more phone numbers,
the relation design would need to change, which is not a good design practice.

Figure 5.6: Representation of converting student relation in 1NF by adding columns for maximum no. of phone nos.

Option 3 : Decompose this Relation into two relations – STUDENT & STUDENT_PHONE_NO.
They are linked to each other with the Primary Key (PK) - Foreign Key (FK) relationship. This
is a good design as it takes care of data redundancy and does not limit the number of phone
numbers a student can have. It is represented in Figure 5.7 below.

Figure 5.7: Representation of converting student relation in 1NF by decomposing the Relation into STUDENT and
STUDENT_PHONE_NO relations
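
Option 3 can be sketched in Python as follows. This is only an illustration: the two student rows, their names, and their phone numbers are hypothetical, and the column first_name stands in for all the other single-valued STUDENT columns.

```python
# Hypothetical un-normalized rows: phone_no holds a list, so it is not atomic.
students_raw = [
    {"roll_no": 1, "first_name": "Anil", "phone_no": ["9000000001", "9000000002"]},
    {"roll_no": 2, "first_name": "Bina", "phone_no": ["9000000003"]},
]

# STUDENT keeps the single-valued columns; roll_no is its primary key.
student = [
    {key: value for key, value in row.items() if key != "phone_no"}
    for row in students_raw
]

# STUDENT_PHONE_NO gets one row per (roll_no, phone_no) pair;
# roll_no acts as the foreign key back to STUDENT.
student_phone_no = [
    {"roll_no": row["roll_no"], "phone_no": phone}
    for row in students_raw
    for phone in row["phone_no"]
]

print(student)
print(student_phone_no)
```

Every value is now atomic, the other student columns are stored once per student, and a student can have any number of phone numbers.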

5.3 SECOND NORMAL FORM (2NF)

We have seen that the First Normal Form (1NF) does not focus much on eliminating
redundancy due to functional dependency, but rather, it focuses on eliminating repeating
groups.

The Second Normal Form (2NF) is based on the concept of full functional dependency. The
Second Normal Form is a concern only for relations with composite keys, that is, relations
with a candidate key composed of two or more attributes. A Relation in which every candidate
key is a single attribute is automatically in at least 2NF. A Relation not in 2NF may suffer
from inconsistency problems arising during insert, delete and update operations.

Definition:
For a Relation to be in 2NF, it should fulfill the below two conditions:
1. The Relation should be in 1NF
2. The Relation should have No Partial Dependency, i.e., no non-prime attribute
(attributes that are not part of any Primary/candidate key) is dependent on any
proper subset of any candidate key of the Relation.
How to check:
• 2NF is a concern only for relations with a composite candidate key. A Relation in which
every candidate key is a single attribute is automatically in at least 2NF.
• No FD of the form Proper Subset (CK/PK) → any non-prime attribute should hold.

How to convert 1NF to 2NF:


The normalization of 1NF relations to 2NF involves the removal of partial functional
dependencies. If a partial dependency exists, we remove partially dependent attribute(s)
(along with their dependents, if any) from the Relation by placing them in a new Relation
along with a copy of their determinant. The remaining attributes of the Relation along with
the determinant above remain part of the base Relation.

What is Partial Functional Dependency?


If a proper subset of a Candidate Key determines any non-prime attribute, it means we
have a Partial Functional Dependency.
Proper Subset (CK/PK) → any non-prime attribute

Example 1:
Let us assume a school stores the data of teachers and the subjects they teach. In a
school, a teacher can teach more than one subject. The tabular representation of the TEACHER
Relation is shown in Figure 5.8.

Figure 5.8: Representation of TEACHER relation with data redundancy

The FD in the above Relation, teacher_id → teacher_age, can be depicted as:
Relation R(ABC) with FD = A→C

Let us find the candidate key of the above Relation. Candidate Key is (AB).
Prime Attributes – A, B
Non-prime Attributes – C

We have a composite candidate key (AB), and its proper subset (A) can determine a non-
prime attribute (C), FD (A→C). So this is a case of partial dependency. Therefore the Relation
is not in 2NF.

To convert this Relation into 2NF, we need to remove the partially dependent attribute(s)
from the Relation by placing them in a new Relation along with a copy of their determinant
which is shown in Figure 5.9.

Figure 5.9: Decomposition of TEACHER relation into TEACHER_DETAIL and TEACHER_SUBJECT to remove redundancy
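
The candidate key used in this example can be found mechanically by computing attribute closures. The following is a minimal sketch, assuming single-letter attribute names and FDs written as (LHS, RHS) pairs of strings; the function names closure and candidate_keys are illustrative choices, not part of any library.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Return all minimal attribute sets whose closure is the whole relation."""
    keys = []
    # Smaller sets are tried first, so any superset of a found key is skipped.
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == set(attrs):
                keys.append(candidate)
    return keys


fds = [("A", "C")]  # teacher_id -> teacher_age
print(["".join(sorted(key)) for key in candidate_keys("ABC", fds)])  # prints ['AB']
print(sorted(closure("A", fds)))  # prints ['A', 'C']
```

The second line of output shows the partial dependency: A alone, a proper subset of the key (AB), already determines the non-prime attribute C.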

Example 2:

In the previous section, when we converted the Relation into 1NF using Option 1, we got the
Relation shown in Figure 5.10:

Figure 5.10: Representation of STUDENT relation

Now let us analyze this Relation from a functional dependency point of view and find out if
this is in 2NF or not.

We can re-write the above as Relation R(ABCDEFGHIJKL) with FDs = A→BCDEFGHIJK, I→J.
Let us find the candidate key of the above Relation. The candidate key is (AL).

Prime Attributes – A, L
Non-prime Attributes – B, C, D, E, F, G, H, I, J, K
We have a composite candidate key (AL), and its proper subset (A) can determine non-prime
attributes (B, C, D, E, F, G, H, I, J, K), FD (A→BCDEFGHIJK). So this is a case of partial
dependency. Therefore the Relation is not in 2NF.

To convert this Relation into 2NF, we need to remove the partially dependent attribute(s)
from the Relation by placing them in a new relation along with a copy of their determinant
which is shown in Figure 5.11.

Figure 5.11: Representation of STUDENT relation in 2NF by removing partial dependency

Example 3:

Let's take Relation R(A,B,C,D,E) with FD set = (A→B, B→C, C→D, D→E). Let us find if this
Relation is in 2NF or not.

The candidate key of the above Relation is (A), because A→B, B→C, C→D and D→E together
give A→ABCDE. As the candidate key is not composite, the case of partial dependency does
not arise. Therefore the Relation is in 2NF.

Example 4:

Let’s take Relation R(A,B,C,D) with FD set = (AB→CD, C→A, D→B). Let us find if this Relation
is in 2NF or not.

The candidate keys of the above Relation are (AB), (BC), (CD), (AD).
Prime Attributes – A, B, C, D
Non-prime Attributes – None

In this case, we have composite candidate keys but no non-prime attribute, so the
case of partial dependency does not arise. Therefore the Relation is in 2NF.

Example 5:

Let’s take Relation R(A,B,C,D) with FD set = (A→B, B→D). Let us find if this Relation is in 2NF
or not.

The candidate key of the above Relation is (AC).


Prime Attributes – A, C
Non-prime Attributes – B, D

In this case, we have a composite candidate key (AC), and its proper subset (A) can
determine a non-prime attribute (B), FD (A→B). So this is a case of partial dependency.
Therefore the Relation is not in 2NF.
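
The 2NF test applied in these examples can be sketched as a small Python function. It is only an illustration under the assumptions of single-letter attribute names and FDs given as (LHS, RHS) string pairs; closure, candidate_keys and is_2nf are hypothetical helper names.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Return all minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # a superset of a known key is not minimal
            if closure(candidate, fds) == set(attrs):
                keys.append(candidate)
    return keys


def is_2nf(attrs, fds):
    """True if no proper subset of a candidate key determines a non-prime attribute."""
    keys = candidate_keys(attrs, fds)
    prime = set().union(*keys)
    non_prime = set(attrs) - prime
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                if closure(part, fds) & non_prime:
                    return False  # partial dependency found
    return True


print(is_2nf("ABCD", [("AB", "CD"), ("C", "A"), ("D", "B")]))  # Example 4: True
print(is_2nf("ABCD", [("A", "B"), ("B", "D")]))                # Example 5: False
```

The check uses closures rather than the listed FDs alone, so it also catches partial dependencies that are only implied by the FD set.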

5.4 THIRD NORMAL FORM (3NF)

Although Second Normal Form (2NF) relations have less redundancy than those in 1NF, they
may still suffer from inconsistency problems arising during insert, delete and update
operations.

These inconsistency problems are caused by transitive dependencies, which introduce
redundancy into the Relation. We need to remove such dependencies by progressing
to the Third Normal Form (3NF).

Definition:-
For a Relation to be in 3NF, it should fulfill both of the conditions below.
1. The Relation should be in 2NF.
2. There should be no non-prime attribute that is transitively dependent on the primary
key or any candidate key
or
A non-prime attribute should not functionally depend on another non-prime
attribute.

For example, consider a Relation R(A,B,C,D) with FDs = A→BCD, B→C. In this Relation, (A) is
the candidate key and we have a transitive dependency, A→B, B→C.
The non-prime attribute (C) is transitively dependent on the candidate key (A);
equivalently, the non-prime attribute (C) is dependent on another non-prime attribute (B).
Hence the Relation violates the 3NF condition and is not in 3NF.

How to check:-
A Relation is in 3NF if at least one of the following conditions holds for every non-trivial
functional dependency X→Y:
• X is a super key
• Y is a prime attribute

How to convert 2NF to 3NF:-

The normalization of 2NF relations to 3NF involves the removal of transitive dependencies.
If a transitive dependency exists, we remove transitively dependent attribute(s) from the
Relation by placing the attribute(s) in a new Relation along with a copy of the determinant.
The remaining attributes of the Relation along with the determinant above remain part of
the base Relation.

Example 1:

In the previous section, in example 2, we converted the STUDENT Relation from 1NF to 2NF
by decomposing it into two separate relations STUDENT_DETAIL and STUDENT_PHONE_NO.

Now let us analyze the STUDENT_DETAIL Relation, which is already in 2NF shown in Figure
5.12.

Figure 5.12: Representation of STUDENT_DETAIL relation in 2NF with data redundancy

FDs in the above Relation are:
roll_no → first_name, middle_name, last_name, dob, gender, house_no, street_name, city,
state, pincode
city → state

The candidate key of the Relation is roll_no. In this Relation, we have a transitive
dependency roll_no → city, city → state. This transitive dependency is causing data
redundancy in the Relation. Therefore this Relation is not in 3NF.

The normalization of this Relation to 3NF will involve the removal of transitive
dependencies. We need to remove the transitively dependent attribute(s) from the Relation
by placing the attribute(s) in a new Relation (CITY_STATE_MASTER) along with a copy of the
determinant which is shown in Figure 5.13.

Figure 5.13: Representation of STUDENT_DETAIL relation in 3NF by removing transitive dependencies

Example 2:

Let us take Relation R(A, B, C, D) with FD set = (A→B, B→C, C→D). Let us find if this Relation
is in 3NF or not.

The candidate key of the above Relation is (A).


Prime attributes – A
Non-prime attributes – B, C, D

Now let us analyze each FD for the 3NF condition:


A relation is in 3NF if at least one of the following conditions holds for every non-trivial
functional dependency X→Y:
• X is a super key
• Y is a prime attribute

A→B, A is a super key (we know all candidate keys are super keys) – 3NF condition met.
B→C, B is not a super key, and C is not a prime attribute – 3NF condition failed.
C→D, C is not a super key, and D is not a prime attribute – 3NF condition failed.
Therefore we can conclude that the above Relation is not in 3NF.

Example 3:

Let’s take Relation R(A,B,C,D,E,F) with FD set = (AB→CDEF, BD→F). Let's find if this Relation
is in 3NF or not.

The candidate key of the above Relation is (AB).


Prime attributes – A, B
Non-prime attributes – C, D, E, F

Now let us analyze each FD for the 3NF condition:


A relation is in 3NF if at least one of the following conditions holds for every non-trivial
functional dependency X→Y:
• X is a super key
• Y is a prime attribute

AB→CDEF, AB is a super key (we know all candidate keys are super keys) – 3NF condition
met.
BD→F, BD is not a super key, and F is not a prime attribute – 3NF condition failed.
Therefore we can conclude that the above Relation is not in 3NF.

Example 4:

Let's take Relation R(A,B,C,D,E) with FD set = (A→B, B→C, C→D, D→A). Let us find if this
Relation is in 3NF or not.

The candidate keys of the above Relation are (AE), (DE), (CE), (BE).
Prime attributes – A, B, C, D, E
Non-prime attributes – None

Now let us analyze each FD for the 3NF condition:


A relation is in 3NF if at least one of the following conditions holds for every non-trivial
functional dependency X→Y:
• X is a super key
• Y is a prime attribute

A→B, A is not a super key, but B is a prime attribute – 3NF condition met.
B→C, B is not a super key, but C is a prime attribute – 3NF condition met.
C→D, C is not a super key, but D is a prime attribute – 3NF condition met.
D→A, D is not a super key, but A is a prime attribute – 3NF condition met.
Therefore we can conclude that the above Relation is in 3NF.
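
The FD-by-FD check used in Examples 2 to 4 can be sketched in Python. This is a minimal illustration, assuming single-letter attribute names and FDs given as (LHS, RHS) string pairs; closure, candidate_keys and is_3nf are hypothetical helper names.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Return all minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # a superset of a known key is not minimal
            if closure(candidate, fds) == set(attrs):
                keys.append(candidate)
    return keys


def is_3nf(attrs, fds):
    """For every non-trivial X -> Y: X is a super key or Y consists of prime attributes."""
    prime = set().union(*candidate_keys(attrs, fds))
    for lhs, rhs in fds:
        dependents = set(rhs) - set(lhs)  # drop the trivial part of the FD
        is_super_key = closure(lhs, fds) == set(attrs)
        if dependents and not is_super_key and not dependents <= prime:
            return False
    return True


print(is_3nf("ABCD", [("A", "B"), ("B", "C"), ("C", "D")]))               # Example 2: False
print(is_3nf("ABCDEF", [("AB", "CDEF"), ("BD", "F")]))                    # Example 3: False
print(is_3nf("ABCDE", [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]))  # Example 4: True
```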

5.5 BOYCE CODD NORMAL FORM (BCNF)

Boyce-Codd Normal Form or BCNF is an extension to the third normal form and is also
known as the 3.5 Normal Form. Some redundancies might still remain even after a Relation
is in 3NF.

Definition:
For a Relation to be in BCNF, it should fulfill both of the conditions below.
1. The Relation should be in 3NF.
2. For each non-trivial functional dependency X→Y, X should be a super key
or
The Relation has no non-trivial functional dependency, i.e., the Relation is an all-key
Relation (all attributes together make up the only candidate key)

How to convert 3NF to BCNF:

The normalization of 3NF relations to BCNF involves creating a new Relation for every
dependency that violates the BCNF condition. Each new Relation holds the determinant and
the dependent attribute(s) of the violating FD. The remaining attributes of the Relation,
along with the determinant of the FD violating the BCNF condition, remain part of
the base Relation.

Example 1:

Relation R(A,B,C) with FD set = (A→B, B→C, C→A).


The candidate keys of the above Relation are (A), (B), (C).
Prime attributes – A, B, C
Non-prime attributes – None

This Relation is in 3NF (use the concepts learned in the previous section). Now let us analyze
each FD for BCNF condition:
A→B, A is a super key – BCNF condition met.
B→C, B is a super key – BCNF condition met.
C→A, C is a super key – BCNF condition met.

All FDs are meeting the BCNF condition; therefore, we can conclude that the above Relation
is in BCNF.

Example 2:

Relation R(A,B,C) with FD set = (AB→C, C→B).


The candidate keys of the above Relation are (AB), (AC).
Prime attributes – A, B, C
Non-prime attributes – None
This Relation is in 3NF (use the concepts learned in the previous section). Now let us analyze
each FD for BCNF condition:
AB→C, AB is a super key – BCNF condition met.
C→B, C is not a super key – BCNF condition not met.

Not all FDs meet the BCNF condition; therefore, we can conclude that the above
Relation is not in BCNF.
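
The BCNF test used in Examples 1 and 2 needs only attribute closures. A minimal sketch follows, assuming single-letter attribute names and FDs given as (LHS, RHS) string pairs; the names closure and bcnf_violations are illustrative.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Return the non-trivial FDs whose left-hand side is not a super key."""
    violations = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial FD, always allowed
        if closure(lhs, fds) != set(attrs):
            violations.append((lhs, rhs))
    return violations


print(bcnf_violations("ABC", [("A", "B"), ("B", "C"), ("C", "A")]))  # Example 1: []
print(bcnf_violations("ABC", [("AB", "C"), ("C", "B")]))             # Example 2: [('C', 'B')]
```

An empty list means the Relation is in BCNF; otherwise the returned FDs are the ones to decompose on.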

Example 3:

Below in Figure 5.14 we have a STUDENT_SUBJECT_PROFESSOR Relation with columns
student_id, subject, and professor.

Figure 5.14: Representation of STUDENT_SUBJECT_PROFESSOR relation with data redundancy

In the above Relation:
• One student can enroll in multiple subjects. For example, a student with student_id
101 has opted for subjects - Java & C++.
• For each subject, a professor is assigned to the student.
• There can be multiple professors teaching one subject, as we have for Java.
• One professor teaches only one subject.

FDs for this Relation:
student_id, subject → professor
professor → subject
Candidate keys for the Relation – (student_id, subject) and (student_id, professor)

This Relation satisfies the 1st Normal Form because all the values are atomic, column names
are unique, and all the values stored in a particular column are of the same domain.
This Relation also satisfies the 2nd Normal Form: every attribute is a prime attribute, so
no partial dependency of a non-prime attribute can arise.
It also satisfies the 3rd Normal Form: in professor → subject, the determinant is not a
super key, but subject is a prime attribute.

But this Relation is not in Boyce-Codd Normal Form because the FD professor → subject does
not meet the BCNF condition: its LHS (professor) is not a super key.

To make this Relation satisfy BCNF, we will decompose this Relation into two relations
STUDENT_PROFESSOR and PROFESSOR_SUBJECT which is shown in Figure 5.15.

Figure 5.15: Representation of STUDENT_SUBJECT_PROFESSOR relation with no data redundancy (BCNF)
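
The decomposition step used here can be sketched as a small Python function that splits a Relation on one BCNF-violating FD. The function name split_on_violation is a hypothetical helper chosen for this illustration; the attribute names are those of the example.

```python
def split_on_violation(attrs, lhs, rhs):
    """Split a relation on a BCNF-violating FD lhs -> rhs.

    Returns (new_relation, base_relation):
      new_relation  = determinant plus its dependent attributes
      base_relation = all attributes except the dependents (determinant is kept)
    """
    attrs, lhs, rhs = set(attrs), set(lhs), set(rhs)
    new_relation = lhs | rhs
    base_relation = attrs - (rhs - lhs)
    return new_relation, base_relation


relation = {"student_id", "subject", "professor"}
professor_subject, student_professor = split_on_violation(
    relation, {"professor"}, {"subject"}
)
print(sorted(professor_subject))  # prints ['professor', 'subject']
print(sorted(student_professor))  # prints ['professor', 'student_id']
```

The shared attribute professor is a key of PROFESSOR_SUBJECT, which is what makes the join of the two relations lossless. The FD student_id, subject → professor can no longer be checked within a single relation after this split.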

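Note: BCNF is stronger than 3NF. Every Relation in BCNF is also in 3NF, but the converse
does not hold: 3NF accepts a non-trivial FD X→Y in which X is not a super key as long as Y is
a prime attribute, whereas BCNF requires X to be a super key in every non-trivial FD.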

5.6 FINDING THE HIGHEST NORMAL FORM OF A GIVEN RELATION

Steps to find the highest normal form of a Relation:

1. Find all possible candidate keys of the Relation.
2. Divide all attributes into two categories: prime attributes and non-prime attributes.
3. Check for BCNF, then 3NF, and so on. By definition (implicit
constraints), a Relation will always be in 1NF.

Summary of definition of Normal forms:

2NF: No non-prime attribute should be partially dependent on a Candidate Key (CK),
i.e., Proper Subset (CK/PK) → any non-prime attribute should not hold.

3NF: First, it should be in 2NF, and at least one of the following conditions should hold
for every non-trivial functional dependency X→Y:
• X is a super key
• Y is a prime attribute

BCNF: First, it should be in 3NF, and for every non-trivial functional dependency X→Y
between two sets of attributes X and Y, X should be a super key.

Figure 5.16: Venn diagram showing the relationship between various normal forms

The Venn diagram in Figure 5.16 shows the relationship between the various normal forms.
If a Relation is in BCNF, it is already in 3NF, 2NF & 1NF. That is why we start checking a
Relation for BCNF and then move to 3NF and so on.

Now let us work on a few examples to find the highest normal form of a given Relation.

Example 1:

Relation R(ABCDEFGH) with FDs = {ABC→DE, E→GH, H→G, G→H, ABCD→EF}

Step 1:
Candidate key of this Relation is (ABC)

Step 2:
Prime attributes: A, B, C
Non-prime attributes: D, E, F, G, H

Step 3:
Check for BCNF
ABC→DE, ABC is a super key – BCNF condition met.
E→GH, E is not a super key – BCNF condition not met.
H→G, H is not a super key – BCNF condition not met.
G→H, G is not a super key – BCNF condition not met.
ABCD→EF, ABCD is a super key – BCNF condition met.
Since not all FDs meet the BCNF condition, this Relation is not in BCNF.

Check for 3NF


ABC→DE, ABC is a super key – 3NF condition met.
E→GH, E is not a super key, and G and H are non-prime attributes – 3NF condition not met.
H→G, H is not a super key, and G is a non-prime attribute – 3NF condition not met.
G→H, G is not a super key, and H is a non-prime attribute – 3NF condition not met.
ABCD→EF, ABCD is a super key – 3NF condition met.
Since not all FDs meet the 3NF condition, this Relation is not in 3NF.

Check for 2NF


ABC→DE, LHS not a proper subset of candidate key (ABC) – 2NF condition met.
E→GH, LHS not a proper subset of candidate key (ABC) – 2NF condition met.
H→G, LHS not a proper subset of candidate key (ABC) – 2NF condition met.
G→H, LHS not a proper subset of candidate key (ABC) – 2NF condition met.
ABCD→EF, LHS not a proper subset of candidate key (ABC) – 2NF condition met.
As all FDs are meeting 2NF conditions, this Relation is in 2NF.

Example 2:

Relation R(A,B,C,D) with FDs = {A→BCD, BC→AD, D→B}

Step 1:
Candidate keys of this Relation are (A), (BC), (CD).

Step 2:
Prime attributes: A, B, C, D
Non-prime attributes: None

Step 3:
Check for BCNF
A→BCD, A is a super key – BCNF condition met.
BC→AD, BC is a super key – BCNF condition met.
D→B, D is not a super key – BCNF condition not met.
Since not all FDs meet the BCNF condition, this Relation is not in BCNF.

Check for 3NF


A→BCD, A is a super key – 3NF condition met.
BC→AD, BC is a super key – 3NF condition met.
D→B, D is not a super key, but B is a prime attribute – 3NF condition met.
As all FDs are meeting 3NF conditions, this Relation is in 3NF.
There is no need to check for 2NF, as all 3NF relations are also in 2NF.

Example 3:

Relation R(A,B,C,D) with FDs = {AB→C, ABD→C, ABC→D, AC→D}

Step 1:
Candidate key of this Relation is (AB)

Step 2:
Prime attributes: A, B
Non-prime attributes: C, D

Step 3:
Check for BCNF
AB→C, AB is a super key – BCNF condition met.
ABD→C, ABD is a super key – BCNF condition met.
ABC→D, ABC is a super key – BCNF condition met.
AC→D, AC is not a super key – BCNF condition not met.
Since not all FDs meet the BCNF condition, this Relation is not in BCNF.

Check for 3NF


AB→C, AB is a super key – 3NF condition met.
ABD→C, ABD is a super key – 3NF condition met.
ABC→D, ABC is a super key – 3NF condition met.
AC→D, AC is not a super key, and D is not a prime attribute – 3NF condition not met.
Since not all FDs meet the 3NF condition, this Relation is not in 3NF.

Check for 2NF


AB→C, LHS not a proper subset of candidate key (AB) – 2NF condition met.
ABD→C, LHS not a proper subset of candidate key (AB) – 2NF condition met.
ABC→D, LHS not a proper subset of candidate key (AB) – 2NF condition met.
AC→D, LHS not a proper subset of candidate key (AB) – 2NF condition met.

As all FDs are meeting 2NF conditions, this Relation is in 2NF.

Example 4:

Relation R(A,B,C,D,E) with FDs = {AB→CDE, D→BE}

Step 1:
Candidate keys of this Relation are (AB), (AD)

Step 2:
Prime attributes: A, B, D
Non-prime attributes: C, E

Step 3:
Check for BCNF
AB→CDE, AB is a super key – BCNF condition met.
D→BE, D is not a super key – BCNF condition not met.
Since not all FDs meet the BCNF condition, this Relation is not in BCNF.

Check for 3NF


AB→CDE, AB is a super key – 3NF condition met.
D→BE can be written as D→B, D→E.
D→B, D is not a super key, but B is a prime attribute – 3NF condition met.
D→E, D is not a super key, and E is not a prime attribute – 3NF condition not met.
Since not all FDs meet the 3NF condition, this Relation is not in 3NF.

Check for 2NF


AB→CDE, LHS not a proper subset of candidate key (AB) – 2NF condition met.
D→B, LHS is a proper subset of candidate key (AD), but B is not a non-prime attribute – 2NF
condition met.
D→E, LHS is a proper subset of candidate key (AD), and E is a non-prime attribute – 2NF
condition not met.

Since not all FDs meet the 2NF condition, this Relation is not in 2NF.
So the highest normal form of this Relation is 1NF.
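
The three steps of this section can be combined into one Python sketch that reports the highest normal form up to BCNF. It is an illustration only, assuming single-letter attribute names and FDs given as (LHS, RHS) string pairs; the helper names closure, candidate_keys and highest_normal_form are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Step 1: all minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # a superset of a known key is not minimal
            if closure(candidate, fds) == set(attrs):
                keys.append(candidate)
    return keys


def highest_normal_form(attrs, fds):
    all_attrs = set(attrs)
    keys = candidate_keys(attrs, fds)
    prime = set().union(*keys)          # Step 2: prime / non-prime split
    non_prime = all_attrs - prime

    def is_super_key(lhs):
        return closure(lhs, fds) == all_attrs

    # Keep only the non-trivial part of each FD.
    nontrivial = [(set(l), set(r) - set(l)) for l, r in fds if set(r) - set(l)]

    # Step 3: check the strongest form first.
    if all(is_super_key(l) for l, _ in nontrivial):
        return "BCNF"
    if all(is_super_key(l) or r <= prime for l, r in nontrivial):
        return "3NF"
    partial = any(
        closure(part, fds) & non_prime
        for key in keys
        for size in range(1, len(key))
        for part in combinations(sorted(key), size)
    )
    return "1NF" if partial else "2NF"


example_1 = [("ABC", "DE"), ("E", "GH"), ("H", "G"), ("G", "H"), ("ABCD", "EF")]
print(highest_normal_form("ABCDEFGH", example_1))                            # prints 2NF
print(highest_normal_form("ABCD", [("A", "BCD"), ("BC", "AD"), ("D", "B")]))  # prints 3NF
print(highest_normal_form("ABCDE", [("AB", "CDE"), ("D", "BE")]))             # prints 1NF
```

The candidate-key search tries every subset of attributes, which is fine for textbook-sized relations but grows exponentially with the number of attributes.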

5.7 FOURTH NORMAL FORM (4NF)

The Fourth Normal Form (4NF) comes into the picture when a Multivalued Dependency
(MVD) occurs in a Relation. Such relations need to be identified and decomposed further
into 4NF to improve the database design. The concept is illustrated in Figure 5.17.

Definition:
For a Relation to be in 4NF, it should fulfill the below two conditions:
1. The Relation should be in BCNF
2. The Relation should not have any non-trivial Multivalued Dependency (MVD) X →→ Y
unless X is a super key.
Multivalued Dependency (MVD):
• Multivalued dependencies are a consequence of 1NF, which disallows multivalued
attributes in a tuple and the accompanying process of converting an un-normalized
Relation into 1NF.
• Suppose we have two or more independent multivalued attributes in the same
Relation. In that case, we have to repeat every value of one attribute with
every value of the other attribute to keep the relation state consistent and maintain
the independence among the attributes involved.
• A non-trivial multivalued dependency specifies this constraint.
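
Whether an MVD X →→ Y holds in a given relation state can be tested directly from its definition: for any two tuples that agree on X, the tuple that takes X and Y from the first and the remaining attributes from the second must also be present. The sketch below assumes a non-empty relation instance given as a list of dicts; the column names student_id, phone_no and hobby, the data values, and the function name mvd_holds are hypothetical and are not taken from Figure 5.17.

```python
def mvd_holds(rows, x, y):
    """Check the MVD X ->> Y on a non-empty relation instance (list of dicts)."""
    attrs = sorted(rows[0])  # fixed column order for building tuples
    rest = [a for a in attrs if a not in x and a not in y]
    present = {tuple(row[a] for a in attrs) for row in rows}
    for t1 in rows:
        for t2 in rows:
            if any(t1[a] != t2[a] for a in x):
                continue  # the pair does not agree on X
            # Required tuple: X and Y from t1, all remaining attributes from t2.
            needed = dict(t1)
            needed.update({a: t2[a] for a in rest})
            if tuple(needed[a] for a in attrs) not in present:
                return False
    return True


# One student with two phone numbers and two hobbies: every combination is stored.
rows = [
    {"student_id": 1, "phone_no": phone, "hobby": hobby}
    for phone in ("P1", "P2")
    for hobby in ("Chess", "Music")
]

print(mvd_holds(rows, ["student_id"], ["phone_no"]))       # prints True
print(mvd_holds(rows[:-1], ["student_id"], ["phone_no"]))  # prints False
```

In the first call the MVD student_id →→ phone_no holds although student_id is not a super key, which is the situation 4NF removes by storing phone numbers and hobbies in two separate relations. The second call drops one combination, so the relation state no longer satisfies the MVD.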

Example:

Figure 5.17: Representation to show the concept of multivalued dependency

4NF normalization process:

Figure 5.18: Representation of 4NF normalization process

Explanation of above example is required

5.8 FIFTH NORMAL FORM (5NF)

Fifth Normal Form in Database Normalization is generally not implemented in real-life
database design; however, we should know what it is. It is also known as Project Join
Normal Form (PJNF).

Definition:
A Relation R is in 5NF if and only if it satisfies the following conditions:
1. R should be already in 4NF.
2. It should not have any non-trivial join dependency that is not implied by its
candidate keys.

Join dependency – Let R1(A, B, C) and R2(C, D) be a decomposition of a given relation
R(A, B, C, D). If the join of R1 and R2 over C is equal to relation R, then we can say
that a join dependency (JD) exists, and R1 and R2 form a lossless decomposition of R.
Otherwise, the decomposition is lossy.
A JD ⋈ {R1, R2, …, Rn} is said to hold over a relation R if R1, R2, …, Rn is a lossless-join
decomposition of R.

A Relation R is in 5NF if and only if it cannot be decomposed further into two or more
relations with a loss-less join Property, ensuring that no spurious or extra tuples are
generated when relations are reunited through a natural join.

Example:

Figure 5.19: Representation of STUDENT_SUBJECT_PROFESSOR relation in 4NF

In the 4NF Relation of Figure 5.19 above:
• One student can enroll in multiple subjects. For example, the student with
student_id 101 has opted for subjects – Java, C++ & C#.
• Multiple professors can teach each subject. For example, Java is taught by Amit,
Mohit & Payal.
• Each professor can teach multiple subjects. For example, Amit can teach Java & C++.

From the ER modeling perspective, the above Relation is the outcome of a ternary
relationship type between student, subject, and professor, shown in Figure 5.20.

Figure 5.20: Representation of Ternary relationship between STUDENT, SUBJECT and PROFESSOR

Suppose we decompose the above Relation into three separate binary relations, as shown in
Figure 5.21:

Figure 5.21: Decomposition of STUDENT, SUBJECT and PROFESSOR relation into three separate binary relations

We can see from the above decomposition that there is a loss of information:

Student 101 is studying subjects – Java, C++ & C#.
Student 101 is being taught by two professors – Amit & Rajan.
Amit can teach – Java & C++, and Rajan can teach – C# & C++.

From the above information, it is impossible to decipher who is teaching C++ to student 101.
Hence there is a loss of information; therefore, this decomposition is not lossless. The
join dependency over the three decomposed relations does not hold on the base Relation.

Hence we can conclude that the base Relation student_subject_professor is in 5NF as it
cannot be further non-loss decomposed.
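
The loss of information can be reproduced by projecting a small instance onto the three binary relations and joining them back. This is a sketch on hypothetical tuples: the four rows below are invented to agree with the facts quoted above (they are not the contents of Figure 5.19), and project is an illustrative helper.

```python
def project(rows, idx):
    """Projection of a set of tuples onto the given column positions."""
    return {tuple(t[i] for i in idx) for t in rows}

# Hypothetical (student_id, subject, professor) tuples.
r = {
    (101, "Java", "Amit"),
    (101, "C++", "Rajan"),
    (101, "C#", "Rajan"),
    (102, "C++", "Amit"),
}

student_subject = project(r, (0, 1))
student_professor = project(r, (0, 2))
subject_professor = project(r, (1, 2))

# Natural join of the three binary relations.
rejoined = {
    (s, sub, p)
    for (s, sub) in student_subject
    for (s2, p) in student_professor
    if s == s2 and (sub, p) in subject_professor
}

spurious = rejoined - r
print(spurious)  # {(101, 'C++', 'Amit')}
```

The join cannot tell whether Amit or Rajan teaches C++ to student 101, so it reports both; the extra tuple is exactly the ambiguity described in the text.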

5.9 CONDITIONS FOR RELATION DECOMPOSITION

We have gone through the details of the normalization process to help reduce data
redundancy and hence avoid insertion, updation, and deletion anomalies. One thing
common across the normalization process is the decomposition of base relations into two
or more relations to achieve a higher normal form.

When we decompose a Relation into two or more relations to achieve a higher normal form,
we need to make sure that the decomposition is:
1. Lossless (non-additive) join decomposition
2. Dependency preserving decomposition (optional in case of BCNF decomposition)

Let us understand both these concepts in detail.

5.9.1 LOSSLESS (NON-ADDITIVE) JOIN DECOMPOSITION

Lossless (non-additive) join decomposition ensures that no spurious tuples are generated
when a natural join operation is applied to the relations resulting from the decomposition.
The condition of no spurious tuples should hold on every legal relation state. The lossless
join property is always defined for a specific set F of dependencies. The word loss in lossless
refers to loss of information, not to the loss of tuples.

If we decompose a Relation r(R) into r1(R1) and r2(R2) such that R1 U R2 = R (attribute
preservation condition), then the decomposition is said to be lossless if it satisfies
r1 ⋈ r2 = r, with no new tuples added and no tuples eliminated.

In general, if we decompose a Relation r(R) into r1(R1), r2(R2), …, rk(Rk) such that
R1 U R2 U … U Rk = R (attribute preservation condition), then the decomposition is said to be
lossless if it satisfies r1 ⋈ r2 ⋈ … ⋈ rk = r, with no new tuples added and no tuples eliminated.

Example 1:

Case 1:

r(R):              r1(R1):        r2(R2):        r1(R1) ⋈ r2(R2):
A    B    C        A    B         A    C         A    B    C
a1   b1   c1       a1   b1        a1   c1        a1   b1   c1
a2   b2   c1       a2   b2        a2   c1        a1   b1   c2
a1   b1   c2       a3   b2        a1   c2        a1   b1   c3
a3   b2   c3                      a3   c3        a2   b2   c1
a1   b1   c3                      a1   c3        a2   b2   c4
a2   b2   c4                      a2   c4        a3   b2   c3

In case 1, we can see that R1 U R2 = R and r1 ⋈ r2 = r. It is a lossless join decomposition.

Case 2:
r(R):              r1(R1):        r2(R2):        r1(R1) ⋈ r2(R2):
A    B    C        A    B         A    C         A    B    C
a1   b1   c1       a1   b1        a1   c1        a1   b1   c1   √
a2   b2   c1       a2   b2        a2   c1        a1   b1   c2   X
a1   b2   c2       a1   b2        a1   c2        a1   b1   c3   √
a3   b2   c3       a3   b2        a3   c3        a2   b2   c1   √
a1   b1   c3       a2   b1        a1   c3        a2   b2   c4   X
a2   b1   c4                      a2   c4        a1   b2   c1   X
                                                 a1   b2   c2   √
                                                 a1   b2   c3   X
                                                 a3   b2   c3   √
                                                 a2   b1   c1   X
                                                 a2   b1   c4   √

√ Correct tuple
X Spurious tuple

In case 2, we can see that R1 U R2 = R and r1 ⋈ r2 ≠ r. It is not a lossless join decomposition.
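
Both cases can be replayed mechanically. The sketch below uses the tuples of r(R) from Case 1 and Case 2 as given above; project, join_on_a and spurious_tuples are illustrative helper names.

```python
def project(rows, idx):
    """Projection of a set of tuples onto the given column positions."""
    return {tuple(t[i] for i in idx) for t in rows}

def join_on_a(r1, r2):
    """Natural join of r1(A, B) and r2(A, C) on the common attribute A."""
    return {(a, b, c) for (a, b) in r1 for (a2, c) in r2 if a == a2}

def spurious_tuples(r):
    """Tuples produced by rejoining the projections that were not in r."""
    r1 = project(r, (0, 1))  # R1(A, B)
    r2 = project(r, (0, 2))  # R2(A, C)
    return join_on_a(r1, r2) - r

case1 = {("a1", "b1", "c1"), ("a2", "b2", "c1"), ("a1", "b1", "c2"),
         ("a3", "b2", "c3"), ("a1", "b1", "c3"), ("a2", "b2", "c4")}
case2 = {("a1", "b1", "c1"), ("a2", "b2", "c1"), ("a1", "b2", "c2"),
         ("a3", "b2", "c3"), ("a1", "b1", "c3"), ("a2", "b1", "c4")}

print(len(spurious_tuples(case1)))  # 0, so the decomposition is lossless
print(len(spurious_tuples(case2)))  # 5, matching the rows marked X
```

Projection never drops a tuple of r from the join, so comparing the join with r only ever reveals extra tuples. This is why the decomposition is called non-additive.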

For a lossless join decomposition of R into R1 and R2 under an FD set, the following conditions must hold:
1. The union of the attributes of R1 and R2 must be equal to the attributes of R. Each attribute of R
must be either in R1 or in R2.
Att(R1) U Att(R2) = Att(R)
2. The intersection of the attributes of R1 and R2 must not be empty.
Att(R1) ∩ Att(R2) ≠ Φ
3. The common attribute(s) must be a key for at least one Relation (R1 or R2).
Att(R1) ∩ Att(R2) → Att(R1) or Att(R1) ∩ Att(R2) → Att(R2)

In the above example in case 1, the common attribute of R1 and R2, i.e., A, is the candidate
key of R1, and we can see that the decomposition is lossless (non-additive).
In the above example in case 2, the common attribute of R1 and R2, i.e., A, is not a candidate
key of either R1 or R2, and we can see that the decomposition is not lossless (lossy).

Example 2:

A Relation R(A,B,C,D,E,F) with FD set {AB→C, C→D, D→EF, F→A, D→B} is decomposed into
R1(A, B, C), R2(C, D, E), R3(E, F)

Condition 1:- Att(R1) U Att(R2) U Att(R3) = (A,B,C,D,E,F) = R(A,B,C,D,E,F) – condition met

As Join (⋈) is a binary operation, we will take two relations at a time.

Att(R1) ∩ Att(R2) = (C) ≠ Φ – condition met

Let us check if (C) is a Key in either R1 or R2.
Find C+ = {C,D,E,F,A,B}, so we can see (C) can determine all attributes of both R1 & R2, hence
it is a Key in both R1 & R2 – condition met
So, R1(A,B,C) ⋈ R2(C,D,E) = R12(A,B,C,D,E) is a lossless join

Att(R12) ∩ Att(R3) = (E) ≠ Φ – condition met

Let us check if (E) is a Key in either R12 or R3.
Find E+ = {E}, so we can see (E) cannot determine all attributes of either R12 or R3 –
condition not met.
So, R12(A,B,C,D,E) ⋈ R3(E,F) is not a lossless join.
Therefore we can conclude that the whole decomposition R1(ABC), R2(CDE) & R3(EF) is
not a lossless join.
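
The two pairwise checks of Example 2 can be reproduced with an attribute closure routine. This is a minimal sketch; closure and binary_lossless are illustrative names, and FDs are written as (lhs, rhs) pairs of attribute strings.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def binary_lossless(r1, r2, fds):
    """Conditions 2 and 3: the common attributes must be a key of R1 or of R2."""
    common = set(r1) & set(r2)
    if not common:
        return False
    reachable = closure(common, fds)
    return set(r1) <= reachable or set(r2) <= reachable

fds = [("AB", "C"), ("C", "D"), ("D", "EF"), ("F", "A"), ("D", "B")]

print(sorted(closure("C", fds)))              # ['A', 'B', 'C', 'D', 'E', 'F']
print(binary_lossless("ABC", "CDE", fds))     # True
print(sorted(closure("E", fds)))              # ['E']
print(binary_lossless("ABCDE", "EF", fds))    # False
```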

Algorithm to test for lossless (Non-additive) Join Property

Input: A universal Relation R, a decomposition D = {R1, R2, …, Rm} of R, and a set F of
functional dependencies.
Output: A decision whether decomposition is lossless or not.
1. Create an initial matrix S with one row i for each Relation Ri in D, and one column j
for each attribute Aj in R.
2. For each row i representing Relation schema Ri
{For each column j representing attribute Aj
{If Relation Ri includes attribute Aj:
Put the symbol aj i.e. S(i, j): = aj
Otherwise
Put the symbol bij i.e. S(i, j): = bij
}}
3. Repeat the following loop until a complete loop execution results in no changes to S

{For each functional dependency X→Y in F
{For all rows in S that have the same symbols in the columns corresponding
to attributes in X
{Make the symbols in each column that correspond to an attribute in
Y be the same in all these rows as follows:
If any of the rows have an 'a' symbol for the column, set the other
rows to that same 'a' symbol in the column.
If no 'a' symbol exists for the attribute in any of the rows, choose one
of the 'b' symbols that appears in one of the rows for the attribute
and set the other rows to that same 'b' symbol in the column ;};
}}}
4. If a row is made up entirely of 'a' symbols, then the decomposition has the
non-additive join property; otherwise, it does not.

Example 3:

R(A,B,C,D,E)
Decomposition is:
R1(AD) ; R2(AB) ; R3(BE) ; R4(CDE) ; R5(AE)
Set of functional dependencies FD = {A→C, B→C, C→D, DE→C, CE→A}. Verify whether this
decomposition is lossless or lossy.

Solution: Initialization of matrix:

              1     2     3     4     5
              A     B     C     D     E
1  R1(AD)     a1    b12   b13   a4    b15
2  R2(AB)     a1    a2    b23   b24   b25
3  R3(BE)     b31   a2    b33   b34   a5
4  R4(CDE)    b41   b42   a3    a4    a5
5  R5(AE)     a1    b52   b53   b54   a5

Now consider the set of functional dependencies F = {A→C, B→C, C→D, DE→C, CE→A}.
The FDs are applied one at a time; the matrix after each step is shown below.

1. A → C
       A     B     C     D     E
AD     a1    b12   b13   a4    b15
AB     a1    a2    b13   b24   b25
BE     b31   a2    b33   b34   a5
CDE    b41   b42   a3    a4    a5
AE     a1    b52   b13   b54   a5

2. B → C
       A     B     C     D     E
AD     a1    b12   b13   a4    b15
AB     a1    a2    b13   b24   b25
BE     b31   a2    b13   b34   a5
CDE    b41   b42   a3    a4    a5
AE     a1    b52   b13   b54   a5

3. C → D
       A     B     C     D     E
AD     a1    b12   b13   a4    b15
AB     a1    a2    b13   a4    b25
BE     b31   a2    b13   a4    a5
CDE    b41   b42   a3    a4    a5
AE     a1    b52   b13   a4    a5

4. DE → C
       A     B     C     D     E
AD     a1    b12   b13   a4    b15
AB     a1    a2    b13   a4    b25
BE     b31   a2    a3    a4    a5
CDE    b41   b42   a3    a4    a5
AE     a1    b52   a3    a4    a5

5. CE → A
       A     B     C     D     E
AD     a1    b12   b13   a4    b15
AB     a1    a2    b13   a4    b25
BE     a1    a2    a3    a4    a5
CDE    a1    b42   a3    a4    a5
AE     a1    b52   a3    a4    a5

6. A → C (second pass)
       A     B     C     D     E
AD     a1    b12   a3    a4    b15
AB     a1    a2    a3    a4    b25
BE     a1    a2    a3    a4    a5
CDE    a1    b42   a3    a4    a5
AE     a1    b52   a3    a4    a5

Final matrix after applying A → C, B → C, C → D, DE → C, CE → A, A → C:
       A     B     C     D     E
AD     a1    b12   a3    a4    b15
AB     a1    a2    a3    a4    b25
BE     a1    a2    a3    a4    a5     All 'a' symbols are in this row
CDE    a1    b42   a3    a4    a5
AE     a1    b52   a3    a4    a5

Thus, the decomposition of R(A,B,C,D,E) into R1(AD) ; R2(AB) ; R3(BE) ; R4(CDE) ; R5(AE) is a lossless
decomposition.
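
The matrix algorithm is mechanical enough to sketch in code. The version below follows the four steps listed above. The tuple encoding of the symbols (("a", j) for aj and ("b", i, j) for bij) and the name is_lossless are choices made for this sketch. When no 'a' symbol is present, it always picks the smallest 'b' symbol, which is one valid way to "choose one of the 'b' symbols".

```python
def is_lossless(attrs, decomposition, fds):
    """Matrix test for the non-additive join property."""
    col = {a: j for j, a in enumerate(attrs)}
    # Steps 1 and 2: S(i, j) = aj if Ri contains Aj, otherwise bij.
    S = [[("a", j) if a in ri else ("b", i, j) for j, a in enumerate(attrs)]
         for i, ri in enumerate(decomposition)]
    # Step 3: repeat until a complete pass makes no change.
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Group the rows that agree on all columns of X.
            groups = {}
            for row in S:
                key = tuple(row[col[x]] for x in lhs)
                groups.setdefault(key, []).append(row)
            for rows in groups.values():
                for y in rhs:
                    j = col[y]
                    # ("a", j) sorts before any ("b", i, j), so min() prefers
                    # the 'a' symbol and otherwise picks one 'b' symbol.
                    target = min(r[j] for r in rows)
                    for r in rows:
                        if r[j] != target:
                            r[j] = target
                            changed = True
    # Step 4: lossless if some row consists only of 'a' symbols.
    return any(all(sym[0] == "a" for sym in row) for row in S)

fds3 = [("A", "C"), ("B", "C"), ("C", "D"), ("DE", "C"), ("CE", "A")]
print(is_lossless("ABCDE", ["AD", "AB", "BE", "CDE", "AE"], fds3))  # True

fds2 = [("AB", "C"), ("C", "D"), ("D", "EF"), ("F", "A"), ("D", "B")]
print(is_lossless("ABCDEF", ["ABC", "CDE", "EF"], fds2))            # False
```

The first call is Example 3; the second is the decomposition of Example 2, which the pairwise test also found to be lossy.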

5.9.2 DEPENDENCY PRESERVING DECOMPOSITION

Dependency preserving or preserving functional dependencies

For a Relation R to be recoverable, its decomposition must be lossless, as explained in
the earlier section. In addition to this, the decomposition should satisfy another property known
as dependency preservation. It states that if a Relation R is decomposed into relations R1 and
R2, then every functional dependency of R must either be part of R1 or R2 or be
derivable from the combination of the FDs of R1 and R2.

Need of dependency preservation:

The set of FDs on the original Relation defines the integrity constraints that the Relation needs to
meet. If a decomposition does not preserve the dependencies of the original Relation,
it imposes an unnecessary burden on the RDBMS, which must join the decomposed relations to
check that the constraints are not violated whenever any of the decomposed
relations is updated. Dependency preservation is optional for BCNF decomposition.

Definition:
A Decomposition D = {R1, R2, R3, …, Rn} of R is dependency preserving w.r.t. a set F of
functional dependencies if (F1 U F2 U … U Fn)+ = F+, where Fi is the set of FDs in F+ that
involve only attributes of Ri.
How to check:

Consider a Relation R with a functional dependency set F. R is decomposed
into R1 with FD set F1 and R2 with FD set F2. Comparing closures, there can be three cases:

1. {F1 U F2}+ = F+ -----> Decomposition is dependency preserving.
2. {F1 U F2}+ is a proper subset of F+ -----> Decomposition is not dependency preserving.
3. {F1 U F2}+ is a proper superset of F+ -----> This case is not possible.
Example 1:
Let a Relation R (ABCD) and functional dependency set F= {AB→C, C→D, D→A}. Relation R is
decomposed into R1(ABC) and R2(CD). Check whether decomposition is dependency
preserving or not.

Solution:
Step 1: For decomposed Relation R1(A, B, C) and R2(C, D), let us find the functional
dependency of each sub Relation as F1 and F2 using closure property.

To find FD's for Relation R1, i.e., F1, we will consider all combinations of attributes that
belong to Relation R1(ABC), i.e., find the closure of A, B, C, AB, BC, and AC using the original FD set F
(Note: ABC is not considered, as it can only yield trivial dependencies) and then eliminate from
each closure any attribute which is not part of Relation R1. There is no need to add trivial
functional dependencies.
(A)+ = {A} // Trivial, hence ignore
(B)+ = {B} // Trivial, hence ignore
(C)+ = {C, D, A}, but D cannot be kept because D is not present in R1.
Restricted to R1 this gives {C, A}, i.e., C→CA. C on the RHS is trivial, hence remove it
from the RHS. Finally, the FD from (C)+ is C→A …… (1)
(AB)+ = {A,B,C,D}, but D cannot be kept as D is not present in R1.
Restricted to R1 this gives {A,B,C}. Therefore the FD will be AB→C // Removing trivial attributes (AB) from RHS …… (2)
(BC)+ = {B,C,D,A}, but D cannot be kept as D is not present in R1.
Restricted to R1 this gives {A,B,C}. Therefore the FD will be BC→A // Removing trivial attributes (BC) from RHS …… (3)
(AC)+ = {A,C,D}, but D cannot be kept as D is not present in R1.
Restricted to R1 this gives {A,C}. Ignoring AC (trivial). Therefore no new FD is derived using AC.

Therefore F1 = {C→A, AB→C, BC→A} using (1), (2) & (3)

To find FD's for Relation R2, i.e., F2, we will consider all combinations of attributes that
belong to Relation R2(CD), i.e., C and D (Note: CD is not considered, as it can only yield
trivial dependencies).
(C)+ = {C,D,A} gives C→D once A (not part of R2) is dropped, while (D)+ = {D,A} gives
nothing within R2.
Therefore F2 = {C→D}

Step 2: Test whether every functional dependency of the original Relation, {AB→C, C→D, D→A}, is
present in {F1 U F2} or can be derived from it, i.e., whether {F1 U F2}+ = F+.

{F1 U F2} = {C→A, AB→C, BC→A, C→D}

AB→C is present in {F1 U F2}.
C→D is present in {F1 U F2}.
D→A is not present in F1 or F2, nor in {F1 U F2}+, since the closure of D under {F1 U F2} is
just {D}. Hence this dependency is not preserved, and {F1 U F2}+ is a proper subset of F+.

So the given decomposition is not dependency preserving.
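
The two steps of this solution can be sketched as follows. The names closure, project_fds and is_dependency_preserving are illustrative. Unlike the hand derivation, project_fds keeps every non-trivial projected FD without pruning redundant ones, which does not affect the result of the test.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def project_fds(sub, fds):
    """Step 1: non-trivial FDs on sub-relation sub, from closures of its proper subsets."""
    projected = []
    attrs = sorted(sub)
    for size in range(1, len(attrs)):
        for x in combinations(attrs, size):
            # Keep only attributes of the sub-relation and drop the trivial part.
            rhs = (closure(x, fds) & set(sub)) - set(x)
            if rhs:
                projected.append(("".join(x), "".join(sorted(rhs))))
    return projected

def is_dependency_preserving(decomposition, fds):
    """Step 2: every FD of F must follow from F1 U F2 U ... U Fn."""
    union = [fd for sub in decomposition for fd in project_fds(sub, fds)]
    return all(set(rhs) <= closure(lhs, union) for lhs, rhs in fds)

F = [("AB", "C"), ("C", "D"), ("D", "A")]
print(project_fds("ABC", F))  # [('C', 'A'), ('AB', 'C'), ('BC', 'A')]
print(project_fds("CD", F))   # [('C', 'D')]
print(is_dependency_preserving(["ABC", "CD"], F))  # False: D -> A is lost
```

Enumerating all subsets is exponential in the size of each sub-relation, so this sketch is meant for small examples only.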


Example 2:
Let a relation R(A,B,C,D,E) and functional dependency set F = {A→B, B→C, C→D, D→A}.
Relation R is decomposed into R1(ABC) and R2(CDE). Check whether decomposition is
dependency preserving or not.
Solution:
Step 1: For decomposed Relation R1(ABC) and R2(CDE), let us find the functional
dependency of each sub Relation as F1 and F2 using closure property.

To find FD's for Relation R1, i.e., F1, we will consider all combinations of attributes that
belong to Relation R1(ABC), i.e., find the closure of A, B, C, AB, BC and AC using the original FD set F.

(A)+ = {A,B,C,D}. Ignoring A (trivial) & D (D is not part of R1) leaves {B,C}.
We can write the functional dependency derived from A as A→BC …… (1)
(B)+ = {B,C,D,A}. Ignoring B (trivial) & D (D is not part of R1) leaves {C,A}.
We can write the functional dependency derived from B as B→CA …… (2)
(C)+ = {C,D,A,B}. Ignoring C (trivial) & D (D is not part of R1) leaves {B,A}.
We can write the functional dependency derived from C as C→BA …… (3)
(AB)+ = {A,B,C,D}. Ignoring AB (trivial) & D (D is not part of R1) leaves {C}.
We can write the functional dependency derived from AB as AB→C. But please note
that this is a redundant FD because attribute A alone can derive C in equation (1)
above. In other words, we need not check any combination of attributes that contains an
attribute which by itself determines every attribute of R1. Hence we will not add this FD
to the F1 set.

Similarly, since (A)+, (B)+ and (C)+ each cover all attributes of Relation R1, testing combinations
like AC & BC is not going to add any new functional dependency to the set F1.
Therefore final F1 = {A→BC, B→CA, C→BA}

To find FD's for Relation R2, i.e., F2, we will consider all the combinations of attributes of
R2(CDE), i.e., C, D, E, CD, CE, DE using the original functional dependency set F = {A→B, B→C,
C→D, D→A}.
(C)+ = {C,D,A,B}. Ignoring C (trivial) & AB (AB not part of R2) leaves {D}.
We can write the functional dependency derived from C as C→D …… (1)
(D)+ = {D,A,B,C}. Ignoring D (trivial) & AB (AB not part of R2) leaves {C}.
We can write the functional dependency derived from D as D→C …… (2)
(E)+ = {E}. Ignoring the trivial attribute E, there is no FD using E.
(CD)+ = {C,D,A,B}. Ignoring CD (trivial) & AB (AB not part of R2), no new
FD is derived using CD.
(DE)+ = {D,E,A,B,C}. Ignoring DE (trivial) & AB (AB not part of R2) leaves {C}.
We can write the functional dependency derived from DE as DE→C. But please note this
is a redundant FD because D alone can derive C in equation (2). Hence we will ignore this FD.
(CE)+ = {C,E,D,A,B}. Ignoring CE (trivial) & AB (AB not part of R2) leaves {D}.
We can write the functional dependency derived from CE as CE→D. But please note this
is a redundant FD because C alone can derive D in equation (1). Hence we will ignore this FD.

Therefore final F2 = {C→D, D→C}

Step 2: Test whether every functional dependency of the original Relation, F = {A→B, B→C, C→D, D→A},
is present in {F1 U F2} or can be derived from it, i.e., whether {F1 U F2}+ = F+.

F1 = {A→BC, B→CA, C→BA}
F2 = {C→D, D→C}
{F1 U F2} = {A→BC, B→CA, C→BA, C→D, D→C}
A→B is present in {F1 U F2} (applying the decomposition rule on A→BC)
B→C is present in {F1 U F2} (applying the decomposition rule on B→CA)
C→D is present in {F1 U F2}
D→A can be derived using axioms on {F1 U F2}, i.e., using D→C & C→BA, we can derive
D→BA (using the transitivity rule) & then applying the decomposition rule, we can infer D→B &
D→A. Hence, D→A is present in {F1 U F2}+. This means {F1 U F2}+ = F+.

So the given decomposition of Relation R is dependency preserving.
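
Deriving F1 and F2 by hand grows quickly with the number of attributes. A commonly taught alternative, which is not the method used in the solution above, tests each FD X→Y directly: start with Z = X and keep adding (Z ∩ Ri)+ ∩ Ri for every Ri until Z stops growing; X→Y is preserved if Y ends up inside Z. A minimal sketch, with illustrative function names:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def fd_preserved(lhs, rhs, decomposition, fds):
    """Test one FD without computing the projected FD sets explicitly."""
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for sub in decomposition:
            # What the attributes of Z inside Ri determine, restricted to Ri.
            gained = closure(z & set(sub), fds) & set(sub)
            if not gained <= z:
                z |= gained
                changed = True
    return set(rhs) <= z

def is_dependency_preserving(decomposition, fds):
    return all(fd_preserved(lhs, rhs, decomposition, fds) for lhs, rhs in fds)

F2 = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
print(is_dependency_preserving(["ABC", "CDE"], F2))  # True (Example 2)

F1 = [("AB", "C"), ("C", "D"), ("D", "A")]
print(fd_preserved("D", "A", ["ABC", "CD"], F1))     # False (Example 1)
```

For D→A in Example 2, Z grows from {D} to {C,D} through R2 and then to {A,B,C,D} through R1, mirroring the derivation D→C, C→BA used above.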

5.10 DECOMPOSITION OF RELATIONS TO CONVERT THEM INTO HIGHER NORMAL FORM

We have understood the concepts of 1NF, 2NF, 3NF & BCNF and how to find the highest normal
form of a given Relation. Let us use this knowledge to convert a given Relation into a higher
normal form. We will do this with a set of examples to bring more clarity.

Example 1:

Given Relation R(A,B,C,D,E) with FDs = {A→B, B→E, C→D}

Step 1 – Find the current normal form of the Relation
Candidate Key – (AC)
Prime attributes – A, C
Non-prime attributes – B, D, E
Using the process learned in the section above, we can find that R is in 1NF.

Step 2 – Find the FDs that are creating a problem
A→B (This is a partial dependency, as (A), being a proper subset of candidate key (AC),
is determining a non-prime attribute (B)) – Thus violating 2NF
C→D (This is a partial dependency, as (C), being a proper subset of candidate key (AC),
is determining a non-prime attribute (D)) – Thus violating 2NF

Step 3 – Decompose the Relation to remove the anomalies identified above


As we have identified two partial dependencies in the above Relation, it violates
2NF. From previous sections, we know:
How to convert 1NF to 2NF:
The normalization of 1NF relations to 2NF involves the removal of partial
functional dependencies. If a partial dependency exists, we remove the partially
dependent attribute(s) (along with their dependents, if any) from the Relation by
placing them in a new relation along with a copy of their determinant. The
remaining attributes of the Relation along with the determinant above remain
part of the base relation.

We will create two separate relations to handle the two partial dependencies A→B
(including B→E, as E is dependent on B) & C→D,
i.e., R1(A,B,E), R2(C,D). After removing the partially dependent attributes (and their
dependents), the base Relation will be reduced to R3(A, C).

Step 4 – Check again if the above-decomposed relations have achieved the highest normal
form.
Relation R1(A, B, E)
Candidate Key – (A)
Prime attributes – A
Non-prime attributes – B, E
We see there is transitive dependency here B→E; therefore, this Relation is not in
3NF.
R2(C, D) & R3(A, C) are both in BCNF (you can check by concepts learned in the
earlier sections).

Step 5 – Decompose the Relation R1(A, B, E) to remove the anomalies identified above.
We have identified a transitive dependency in the above Relation, thus violating 3NF.
From previous sections, we know:

How to convert 2NF to 3NF:


The normalization of 2NF relations to 3NF involves the removal of transitive
dependencies. If a transitive dependency exists, we remove the transitively
dependent attribute(s) from the Relation by placing the attribute(s) in a new
relation along with a copy of the determinant. The remaining attributes of the
Relation along with the determinant above remain part of the base relation.

We will create a separate Relation to handle the transitive dependency B→E,
i.e., R12(B,E). After removing the transitively dependent attribute, the base Relation
will be reduced to R11(A, B).
Step 6 – Check again if the above-decomposed relations have achieved the highest normal
form.
R11(A, B) & R12(B, E) are both in BCNF (you can check by concepts learned in the
earlier sections).
Step 7 – After carrying out the decomposition, we need to make sure that one of the
decomposed relations contains the candidate key of the Relation R(A, B, C, D, E), i.e.,
(AC). Here R3(A, C) meets the condition.

Conclusion: Relation R(A,B,C,D,E) with FDs = {A→B, B→E, C→D} is in 1NF. It can be
decomposed into 4 separate relations - R11(A,B), R12(B,E), R2(C,D) & R3(A,C) to achieve the
highest normal form of BCNF.

Example 2:

Given Relation R(A,B,C,D) with FDs = {A→B, B→C, C→D}


Step 1 – Find the current normal form of the Relation
Candidate Key – (A)
Prime attributes – A
Non-prime attributes – B, C, D
Using the process learned in the section above, we can find that Relation R is in 2NF.

Step 2 – Find the FDs that are creating a problem


B→C, transitive dependency– Thus violating 3NF
C→D, transitive dependency – Thus violating 3NF

Step 3 – Decompose the Relation to remove the anomalies identified above


As we have identified two transitive dependencies in the above Relation, it
violates 3NF. From previous sections, we know:
How to convert 2NF to 3NF:
The normalization of 2NF relations to 3NF involves the removal of transitive
dependencies. If a transitive dependency exists, we remove the transitively
dependent attribute(s) (along with their dependents, if any) from the Relation by
placing the attribute(s) in a new relation along with a copy of the determinant.
The remaining attributes of the Relation along with the determinant above
remain part of the base relation.

We will create two separate relations to handle the two transitive dependencies B→C and
C→D,
i.e., R1(BC) & R2(CD). After removing the transitively dependent attributes, the base
Relation will be reduced to R3(AB).

Step 4 – Check again if the above-decomposed relations have achieved the highest normal
form.
R1(BC), R2(CD) & R3(AB) are all in BCNF (you can check by concepts learned in the
earlier sections).

Step 5 – After carrying out the decomposition, we need to make sure that one of the
decomposed relations contains the candidate key of the Relation R(A, B, C, D) i.e
(A). Here R3(A, B) meets the condition.

Conclusion: Relation R(A,B,C,D) with FDs = {A→B, B→C, C→D} is in 2NF. It can be
decomposed into 3 separate relations - R1(BC), R2(CD) & R3(AB) to achieve the highest
normal form of BCNF.

Example 3:

Relation R(A,B,C,D) with FDs = {A→BCD, BC→AD, D→B}

Step 1 – Find the current normal form of the Relation


Candidate Keys – (A), (BC), (CD)
Prime attributes – A, B, C, D
Non-prime attributes – None
Using the process learned in the section above, we can find that Relation R is in 3NF
and not in BCNF.

Step 2 – Find the FDs that are creating a problem


D→B, D is not a super key – Thus violating BCNF

Step 3 – Decompose the Relation to remove the anomalies identified above


We have identified one dependency violating the BCNF condition. From previous
sections, we know:
How to convert 3NF to BCNF:-
The normalization of 3NF relations to BCNF involves creating new Relation for
every dependency, which is violating the BCNF condition. The remaining
attributes of the Relation, along with the determinant (of the FD violating the
BCNF condition) above, remain part of the base relation.

We will create one separate Relation to handle the dependency D→B


i.e., R1(D, B). After removing the dependent attributes of the above dependency
from the base Relation, it will be reduced to R2(A, C, D).

Step 4 – Check again if the above-decomposed relations have achieved the highest normal
form.
R1(D, B) & R2(A, C, D) are both in BCNF (you can check by concepts learned in the
earlier sections).

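Step 5 – BCNF decompositions are not always dependency preserving; therefore, we do not
need to make sure that all candidate keys of the base Relation are present in the
decomposed relations. Here, for instance, the candidate key (BC) is split across R1 and R2,
so BC→AD is not preserved.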

Conclusion: Relation R(A,B,C,D) with FDs = {A→BCD, BC→AD, D→B} is in 3NF. It can be
decomposed into two separate relations - R1(D,B) & R2(A,C,D) to achieve the highest
normal form of BCNF.
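
The "split on a violating dependency" rule can be applied repeatedly until every relation is in BCNF. The sketch below is one illustrative way to do that, not a definitive algorithm. The names find_violation and bcnf_decompose are made up for this example, and the violation search enumerates attribute subsets, so it only suits small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_violation(rel, fds):
    """Return (X, X+ within rel) for the first X that violates BCNF, else None."""
    attrs = set(rel)
    for size in range(1, len(attrs)):
        for x in combinations(sorted(attrs), size):
            within = closure(x, fds) & attrs
            # X determines something new inside rel but is not a super key of rel.
            if within != set(x) and within != attrs:
                return set(x), within
    return None

def bcnf_decompose(rel, fds):
    violation = find_violation(rel, fds)
    if violation is None:
        return [set(rel)]
    x, within = violation
    new_relation = within                  # determinant plus its dependents
    base_relation = (set(rel) - within) | x  # remaining attributes plus determinant
    return bcnf_decompose(new_relation, fds) + bcnf_decompose(base_relation, fds)

fds = [("A", "BCD"), ("BC", "AD"), ("D", "B")]
result = bcnf_decompose("ABCD", fds)
print(["".join(sorted(r)) for r in result])  # ['BD', 'ACD']
```

For Example 3 the only violation found is D→B, which yields R1(D,B) and R2(A,C,D) as in the worked solution.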

5.11 Denormalization topic to be added
