Normalization and Normal Form

The document discusses normalization in databases. Normalization is the process of organizing data to minimize duplication and dependency. It involves decomposing relations and dividing them into smaller, simpler relations. The document outlines various normal forms including 1NF, 2NF, 3NF, BCNF and discusses their definitions and purposes.

Normalization

A large database defined as a single relation may result in data duplication. This repetition of
data may result in:

o Relations that become very large.
o Data that is hard to maintain and update, because a change involves searching many records in
the relation.
o Wasted and poorly utilized disk space and resources.
o A higher likelihood of errors and inconsistencies.

To handle these problems, we should analyze and decompose relations with redundant
data into smaller, simpler, and well-structured relations that satisfy desirable properties.
Normalization is the process of decomposing relations into relations with fewer attributes.

What is Normalization?
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize the redundancy in a relation or set of relations. It
is also used to eliminate undesirable characteristics such as insertion, update, and deletion
anomalies.
o Normalization divides a larger table into smaller tables and links them using relationships.
o Normal forms are the criteria used to reduce redundancy in a database table.

Why do we need Normalization?

The main reason for normalizing relations is to remove insertion, update, and deletion
anomalies. Left in place, the redundancy behind these anomalies can cause data integrity and
other problems as the database grows. Normalization consists of a series of guidelines that
help you create a good database structure.

Data modification anomalies can be categorized into three types:

o Insertion Anomaly: An insertion anomaly occurs when a new tuple cannot be inserted
into a relation because some other data is missing.
o Deletion Anomaly: A deletion anomaly occurs when deleting
data results in the unintended loss of some other important data.
o Update Anomaly: An update anomaly occurs when changing a single data value
requires multiple rows of data to be updated.
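As a rough illustration of the update anomaly, the sketch below keeps a teacher's age in every row of an unnormalized table and then updates only one of those rows. The rows and the helper names (`update_first_match`, `ages_per_teacher`) are hypothetical and exist only for this example.

```python
# Hypothetical unnormalized rows: the teacher's age is repeated for every subject.
rows = [
    {"teacher_id": 25, "subject": "Chemistry", "age": 30},
    {"teacher_id": 25, "subject": "Biology", "age": 30},
    {"teacher_id": 47, "subject": "English", "age": 35},
]


def update_first_match(rows, teacher_id, new_age):
    """Careless update: changes only the first matching row."""
    for row in rows:
        if row["teacher_id"] == teacher_id:
            row["age"] = new_age
            return


def ages_per_teacher(rows):
    """Collect the set of ages recorded for each teacher."""
    ages = {}
    for row in rows:
        ages.setdefault(row["teacher_id"], set()).add(row["age"])
    return ages


update_first_match(rows, 25, 31)

# Teachers with more than one recorded age are now inconsistent.
inconsistent = {
    teacher: sorted(ages)
    for teacher, ages in ages_per_teacher(rows).items()
    if len(ages) > 1
}
print(inconsistent)  # {25: [30, 31]}
```

Storing the age once per teacher, in a separate table, would make this kind of inconsistency impossible.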
Types of Normal Forms:
Normalization works through a series of stages called normal forms. The normal forms apply
to individual relations. A relation is said to be in a particular normal form if it satisfies
that form's constraints.

Following are the various types of Normal forms:

Normal Form   Description

1NF           A relation is in 1NF if every attribute holds only atomic values.

2NF           A relation is in 2NF if it is in 1NF and all non-key attributes are fully
              functionally dependent on the primary key.

3NF           A relation is in 3NF if it is in 2NF and no transitive dependency exists.

BCNF          Boyce-Codd normal form is a stronger definition of 3NF.

4NF           A relation is in 4NF if it is in Boyce-Codd normal form and has no
              multi-valued dependency.

5NF           A relation is in 5NF if it is in 4NF and has no join dependency other than
              those implied by its candidate keys; joining the decomposed relations
              must be lossless.
Important Points Regarding Normal Forms in DBMS
o First Normal Form (1NF): This is the most basic level of normalization. In 1NF,
each table cell should contain only a single value, and each column should have a
unique name. The first normal form helps to eliminate repeating groups and simplify
queries.
o Second Normal Form (2NF): 2NF eliminates redundant data by requiring that
each non-key attribute depend on the whole primary key. This means that no
column may depend on only part of a composite key.
o Third Normal Form (3NF): 3NF builds on 2NF by requiring that no non-key
attribute depend on another non-key attribute. This means that each column should be
directly related to the key, and not through any other column in the same table.
o Boyce-Codd Normal Form (BCNF): BCNF is a stricter form of 3NF that ensures
that each determinant in a table is a candidate key. In other words, every
non-trivial dependency in the table comes from a candidate key.
o Fourth Normal Form (4NF): 4NF is a further refinement of BCNF that ensures
that a table does not contain any multi-valued dependencies.
o Fifth Normal Form (5NF): 5NF is the highest of the normal forms listed here and involves
decomposing a table into smaller tables to remove data redundancy and improve
data integrity.

Advantages of Normalization
o Normalization helps to minimize data redundancy.
o Greater overall database organization.
o Data consistency within the database.
o Much more flexible database design.
o Enforces the concept of relational integrity.

Disadvantages of Normalization
o You cannot start building the database before knowing what the user needs.
o Query performance can degrade at higher normal forms such as 4NF and 5NF, because
reassembling the data requires more joins.
o It is very time-consuming and difficult to normalize relations of a higher degree.
o Careless decomposition may lead to a bad database design, leading to serious problems.

1NF
o A relation is in 1NF if every attribute contains only atomic values.
o An attribute of a table cannot hold multiple values. It must hold only a single
value.
o First normal form disallows multi-valued attributes, composite attributes, and their
combinations.

Example: Relation EMPLOYEE is not in 1NF because of the multi-valued attribute
EMP_PHONE.

EMPLOYEE table:

EMP_ID   EMP_NAME   EMP_PHONE                EMP_STATE
14       John       7272826385, 9064738238   UP
20       Harry      8574783832               Bihar
12       Sam        7390372389, 8589830302   Punjab

The EMPLOYEE table converted to 1NF is shown below:

EMP_ID   EMP_NAME   EMP_PHONE    EMP_STATE
14       John       7272826385   UP
14       John       9064738238   UP
20       Harry      8574783832   Bihar
12       Sam        7390372389   Punjab
12       Sam        8589830302   Punjab
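The conversion to 1NF can be sketched in a few lines of Python. This is a minimal illustration, assuming the multi-valued EMP_PHONE attribute is held as a list; the function name `to_first_normal_form` is hypothetical.

```python
def to_first_normal_form(rows, multi_valued_column):
    """Emit one output row per value of the multi-valued column."""
    flat = []
    for row in rows:
        for value in row[multi_valued_column]:
            new_row = dict(row)            # copy the other attributes unchanged
            new_row[multi_valued_column] = value
            flat.append(new_row)
    return flat


# Phone numbers are strings; the list models the multi-valued attribute.
employees = [
    {"EMP_ID": 14, "EMP_NAME": "John",
     "EMP_PHONE": ["7272826385", "9064738238"], "EMP_STATE": "UP"},
    {"EMP_ID": 20, "EMP_NAME": "Harry",
     "EMP_PHONE": ["8574783832"], "EMP_STATE": "Bihar"},
    {"EMP_ID": 12, "EMP_NAME": "Sam",
     "EMP_PHONE": ["7390372389", "8589830302"], "EMP_STATE": "Punjab"},
]

flat = to_first_normal_form(employees, "EMP_PHONE")
for row in flat:
    print(row["EMP_ID"], row["EMP_NAME"], row["EMP_PHONE"], row["EMP_STATE"])
```

After flattening, EMP_ID alone no longer identifies a row; the pair (EMP_ID, EMP_PHONE) does.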

2NF
o For 2NF, the relation must be in 1NF.
o In the second normal form, all non-key attributes are fully functionally dependent on the
primary key.

Example: Suppose a school stores data about its teachers and the subjects they teach. In
this school, a teacher can teach more than one subject.

TEACHER table:

TEACHER_ID   SUBJECT     TEACHER_AGE
25           Chemistry   30
25           Biology     30
47           English     35
83           Math        38
83           Computer    38

In the given table, the candidate key is {TEACHER_ID, SUBJECT}. The non-prime attribute
TEACHER_AGE depends on TEACHER_ID alone, which is a proper subset of that candidate key.
That's why the table violates the rule for 2NF.

To convert the given table into 2NF, we decompose it into two tables:

TEACHER_DETAIL table:

TEACHER_ID   TEACHER_AGE
25           30
47           35
83           38

TEACHER_SUBJECT table:

TEACHER_ID   SUBJECT
25           Chemistry
25           Biology
47           English
83           Math
83           Computer
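The 2NF decomposition above can be reproduced with a small sketch. The helpers `fd_holds` and `project` are hypothetical names; note that checking a dependency against sample rows only shows that the data is consistent with it, not that the dependency is a rule of the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no lhs value maps to two different rhs values in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def project(rows, columns):
    """Project rows onto columns, dropping duplicates but keeping order."""
    out, seen = [], set()
    for row in rows:
        values = tuple(row[c] for c in columns)
        if values not in seen:
            seen.add(values)
            out.append(dict(zip(columns, values)))
    return out


teacher = [
    {"TEACHER_ID": 25, "SUBJECT": "Chemistry", "TEACHER_AGE": 30},
    {"TEACHER_ID": 25, "SUBJECT": "Biology", "TEACHER_AGE": 30},
    {"TEACHER_ID": 47, "SUBJECT": "English", "TEACHER_AGE": 35},
    {"TEACHER_ID": 83, "SUBJECT": "Math", "TEACHER_AGE": 38},
    {"TEACHER_ID": 83, "SUBJECT": "Computer", "TEACHER_AGE": 38},
]

# TEACHER_AGE depends on TEACHER_ID alone: a partial dependency on the key.
partial = fd_holds(teacher, ["TEACHER_ID"], ["TEACHER_AGE"])
print(partial)  # True

# Decompose so that each remaining dependency is on a whole key.
teacher_detail = project(teacher, ["TEACHER_ID", "TEACHER_AGE"])
teacher_subject = project(teacher, ["TEACHER_ID", "SUBJECT"])
print(len(teacher_detail), len(teacher_subject))  # 3 5
```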

3NF
o A relation is in 3NF if it is in 2NF and contains no transitive dependency of a
non-prime attribute on a key.
o 3NF is used to reduce data duplication. It is also used to achieve data integrity.
o If a relation in 2NF has no transitive dependency for non-prime attributes, then it is
in third normal form.

A relation is in third normal form if it satisfies at least one of the following conditions for every
non-trivial functional dependency X → Y:

1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.

Example:

EMPLOYEE_DETAIL table:

EMP_ID   EMP_NAME    EMP_ZIP   EMP_STATE   EMP_CITY
222      Harry       201010    UP          Noida
333      Stephan     02228     US          Boston
444      Lan         60007     US          Chicago
555      Katharine   06389     UK          Norwich
666      John        462007    MP          Bhopal

Super keys in the table above:

1. {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on.

Candidate key: {EMP_ID}

Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.

Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends
on EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) are therefore transitively
dependent on the super key (EMP_ID), which violates the rule of third normal form.

That's why we need to move EMP_CITY and EMP_STATE to a new
EMPLOYEE_ZIP table, with EMP_ZIP as the primary key.

EMPLOYEE table:

EMP_ID   EMP_NAME    EMP_ZIP
222      Harry       201010
333      Stephan     02228
444      Lan         60007
555      Katharine   06389
666      John        462007

EMPLOYEE_ZIP table:

EMP_ZIP   EMP_STATE   EMP_CITY
201010    UP          Noida
02228     US          Boston
60007     US          Chicago
06389     UK          Norwich
462007    MP          Bhopal
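One way to check that this decomposition loses no information is to join the two tables back on EMP_ZIP and compare the result with the original EMPLOYEE_DETAIL rows. The sketch below assumes ZIP codes are stored as strings so that leading zeros survive; `natural_join` is a hypothetical helper written for this example.

```python
def natural_join(left, right, key):
    """Join two lists of dicts on one shared column that is unique in `right`."""
    index = {row[key]: row for row in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]


employee_detail = [
    {"EMP_ID": 222, "EMP_NAME": "Harry", "EMP_ZIP": "201010", "EMP_STATE": "UP", "EMP_CITY": "Noida"},
    {"EMP_ID": 333, "EMP_NAME": "Stephan", "EMP_ZIP": "02228", "EMP_STATE": "US", "EMP_CITY": "Boston"},
    {"EMP_ID": 444, "EMP_NAME": "Lan", "EMP_ZIP": "60007", "EMP_STATE": "US", "EMP_CITY": "Chicago"},
    {"EMP_ID": 555, "EMP_NAME": "Katharine", "EMP_ZIP": "06389", "EMP_STATE": "UK", "EMP_CITY": "Norwich"},
    {"EMP_ID": 666, "EMP_NAME": "John", "EMP_ZIP": "462007", "EMP_STATE": "MP", "EMP_CITY": "Bhopal"},
]

# The two 3NF tables, obtained by projecting the original rows.
employee = [
    {k: row[k] for k in ("EMP_ID", "EMP_NAME", "EMP_ZIP")} for row in employee_detail
]
employee_zip = [
    {k: row[k] for k in ("EMP_ZIP", "EMP_STATE", "EMP_CITY")} for row in employee_detail
]

joined = natural_join(employee, employee_zip, "EMP_ZIP")
print(joined == employee_detail)  # True
```

The join is lossless here because EMP_ZIP, the shared column, is the key of EMPLOYEE_ZIP.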
Example-2:
Consider relation R(A, B, C, D, E) with functional dependencies
A -> BC, CD -> E, B -> D, E -> A
The candidate keys of this relation are A, E, CD, and BC. Every attribute on the
right side of a functional dependency is therefore prime, so the relation is in 3NF.
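The candidate keys in Example-2 can be checked mechanically with the standard attribute-closure computation. The sketch below is a brute-force search over attribute subsets, which is fine for five attributes but grows exponentially; the function names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Find all minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            subset = set(combo)
            if any(key <= subset for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(subset, fds) == set(attributes):
                keys.append(subset)
    return keys


# R(A, B, C, D, E) with A -> BC, CD -> E, B -> D, E -> A
fds = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]
keys = candidate_keys("ABCDE", fds)
key_names = sorted("".join(sorted(key)) for key in keys)
print(key_names)  # ['A', 'BC', 'CD', 'E']

# Every attribute belongs to some candidate key, so all attributes are prime.
prime = set().union(*keys)
print(sorted(prime))  # ['A', 'B', 'C', 'D', 'E']
```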
Note:
Third Normal Form (3NF) is considered adequate for normal relational database
design because most 3NF tables are free of insertion, update, and deletion
anomalies. Moreover, any relation can be decomposed into 3NF relations in a way that is
both lossless and dependency preserving.

BCNF
Rules for BCNF
Rule 1: The table should be in the 3rd Normal Form.
Rule 2: X should be a super key for every non-trivial functional dependency (FD) X -> Y in a given
relation.
Note: To test whether a relation is in BCNF, we identify all the determinants and make sure
that they are candidate keys (or, more generally, super keys).
It follows that every relation in BCNF is also in 3NF. The converse does not hold: a
relation in 3NF need not be in BCNF.
To determine the highest normal form of a given relation R with functional dependencies, the
first step is to check whether the BCNF condition holds. If R is in BCNF, it is also in 3NF,
2NF, and 1NF, because each normal form includes the requirements of the ones below it.
1NF has the least restrictive constraint: it only requires a relation R to have atomic values in
each tuple. 2NF is slightly more restrictive.
3NF is more restrictive than the first two normal forms but less
restrictive than BCNF. In this manner, the restriction increases as we move from 1NF
towards BCNF.
Examples
The following examples illustrate the properties of BCNF.
Example 1
Consider a student database with the following table.

Stu_ID   Stu_Branch                                Stu_Course             Branch_Number   Stu_Course_No
101      Computer Science & Engineering            DBMS                   B_001           201
101      Computer Science & Engineering            Computer Networks      B_001           202
102      Electronics & Communication Engineering   VLSI Technology        B_003           401
102      Electronics & Communication Engineering   Mobile Communication   B_003           402

The functional dependencies of this table are:
Stu_ID -> Stu_Branch
Stu_Course -> {Branch_Number, Stu_Course_No}
Candidate key of the above table: {Stu_ID, Stu_Course}
Why this Table is Not in BCNF?
The table above is not in BCNF because neither Stu_ID nor
Stu_Course alone is a super key. For a table to be in
BCNF, X must be a super key in every functional dependency X -> Y. That property fails
here, so the table is not in BCNF.
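The BCNF test for this table can be sketched as follows: compute the closure of each determinant and report the dependencies whose left side does not determine every attribute. The function names are illustrative, and attribute sets are written as tuples of column names.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose left side is not a super key."""
    all_attrs = set(attributes)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != all_attrs
    ]


attributes = ("Stu_ID", "Stu_Branch", "Stu_Course", "Branch_Number", "Stu_Course_No")
fds = [
    (("Stu_ID",), ("Stu_Branch",)),
    (("Stu_Course",), ("Branch_Number", "Stu_Course_No")),
]

violations = bcnf_violations(attributes, fds)
for lhs, rhs in violations:
    print(lhs, "->", rhs, "violates BCNF")

# The pair {Stu_ID, Stu_Course} does determine every attribute.
print(closure(("Stu_ID", "Stu_Course"), fds) == set(attributes))  # True
```

Both dependencies are reported, which matches the reasoning above: each determinant is only part of the candidate key.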
How to Satisfy BCNF?
To bring this table into BCNF, we have to decompose it into further tables. Let us first
divide the main table into two tables, Stu_Branch and Stu_Course, and then add a third table
that maps Stu_ID to Stu_Course_No.

Stu_Branch Table

Stu_ID   Stu_Branch
101      Computer Science & Engineering
102      Electronics & Communication Engineering

Candidate Key for this table: Stu_ID.

Stu_Course Table

Stu_Course             Branch_Number   Stu_Course_No
DBMS                   B_001           201
Computer Networks      B_001           202
VLSI Technology        B_003           401
Mobile Communication   B_003           402

Candidate Key for this table: Stu_Course.

Stu_ID to Stu_Course_No Table

Stu_ID   Stu_Course_No
101      201
101      202
102      401
102      402

Candidate Key for this table: {Stu_ID, Stu_Course_No}.

After this decomposition, every table is in BCNF, because in each functional dependency
X -> Y that holds within a table, X is a super key.
Example 2
Find the highest normal form of a relation R(A, B, C, D, E) with FD set:
{ BC->D, AC->BE, B->E }
Explanation:
o Step-1: (AC)+ = {A, C, B, E, D}, but none of its proper subsets can determine
all attributes of the relation, so AC is a candidate key. A and C can't be derived
from any other attribute of the relation, so every key must contain both, and {AC} is the
only candidate key.
o Step-2: Prime attributes are those that are part of a candidate key, {A, C} in
this example; the others, {B, D, E}, are non-prime.
o Step-3: The relation R is in 1st normal form, as a relational DBMS does not allow
multi-valued or composite attributes.
The relation is in 2nd normal form because BC->D satisfies 2NF (BC is not a proper
subset of candidate key AC), AC->BE satisfies 2NF (AC is the candidate key), and
B->E satisfies 2NF (B is not a proper subset of candidate key AC).
The relation is not in 3rd normal form because in BC->D neither is BC a super key nor is D
a prime attribute, and in B->E neither is B a super key nor is E a prime attribute. To
satisfy 3rd normal form, either the LHS of an FD must be a super key or the RHS must be a prime
attribute. So the highest normal form of the relation is the 2nd Normal Form.
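The steps of Example 2 can be combined into one small classifier. This is a sketch under the same assumption as Step-3 (1NF is taken for granted); it uses brute-force key search, so it only suits small relations, and all function names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Find all minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            subset = set(combo)
            if any(key <= subset for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(subset, fds) == set(attributes):
                keys.append(subset)
    return keys


def highest_normal_form(attributes, fds):
    """Return 'BCNF', '3NF', '2NF' or '1NF'; 1NF itself is assumed."""
    all_attrs = set(attributes)
    keys = candidate_keys(attributes, fds)
    prime = set().union(*keys)

    def is_superkey(attrs):
        return closure(attrs, fds) == all_attrs

    # Keep only the non-trivial part of each dependency.
    nontrivial = [(set(l), set(r) - set(l)) for l, r in fds if set(r) - set(l)]

    if all(is_superkey(lhs) for lhs, _ in nontrivial):
        return "BCNF"
    if all(is_superkey(lhs) or rhs <= prime for lhs, rhs in nontrivial):
        return "3NF"
    # 2NF fails if a proper subset of a key determines a non-prime attribute.
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                if (closure(part, fds) - set(part)) - prime:
                    return "1NF"
    return "2NF"


fds = [("BC", "D"), ("AC", "BE"), ("B", "E")]
print(candidate_keys("ABCDE", fds))       # [{'A', 'C'}] (set order may vary)
print(highest_normal_form("ABCDE", fds))  # 2NF
```

Checking BCNF first mirrors the order suggested earlier: if the strictest condition holds, the weaker ones follow.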
Note: A prime attribute cannot be transitively dependent on a key in a BCNF relation.
Consider these functional dependencies of some relation R:
AB -> C
C -> B
AB -> B
Suppose AB is a candidate key of R. The prime attribute B
transitively depends on the key AB through C (AB -> C and C -> B). The first and the third FD
satisfy BCNF because they have the key AB on their left sides (the third is also trivial). The
second dependency, C -> B, does not satisfy BCNF because C is not a super key, but it does
satisfy 3NF because its right side is a prime attribute. So the highest normal form of R is
3NF, as all three FDs satisfy the conditions for 3NF.
Example 3
Consider relation R(A, B, C) with functional dependencies
A -> BC,
B -> A
A and B are both super keys, so the relation is in BCNF.
Note: A BCNF decomposition is not always dependency preserving,
although it can always satisfy the lossless join condition. For example, take relation
R(V, W, X, Y, Z) with functional dependencies:
V, W -> X
Y, Z -> X
W -> Y
The standard BCNF algorithm can split on W -> Y first, giving (W, Y) and (V, W, X, Z).
Y, Z -> X then spans two relations and is not preserved by that decomposition.
Note: Redundancies are sometimes still present in a BCNF relation as it is not always
possible to eliminate them completely.
There are also some higher-order normal forms, like the 4th Normal Form and the 5th
Normal Form.
