Ch12 Normalization
UCS310: DBMS 1
Topics to be Covered
Functional Dependency
Armstrong’s Axioms
Closure of a set of FDs
Canonical Cover
Closure of a set of attributes to decide the key
Anomaly detection
Decomposition
Normalization and Normal Forms
Functional Dependency (FD) and its types
What is Functional Dependency (FD)?
• Let R be a relation schema having n attributes A1, A2, A3, …, An.
• Let X and Y be two subsets of the attributes of relation R.
• If the values of the X component of a tuple uniquely (or functionally)
determine the values of the Y component, then there is a functional
dependency from X to Y. This is denoted by X → Y.
• E.g. RollNo → {Name, SPI, BL} in the Student relation.
• It is read as: Y is functionally dependent on X, or X functionally
determines Y.
Student
RollNo  Name    SPI  BL
101     Raju    8    0
102     Mitesh  7    1
103     Jay     7    0
Diagrammatic representation of FD
[Figure: arrow diagrams for X → Y, {X1, X2} → Y and X → {Y1, Y2}]
• Example
• Consider the relation Account(account_no, balance, branch).
• account_no can determine balance and branch.
• So, there is a functional dependency from account_no to balance and branch.
• This can be denoted by account_no → {balance, branch}.
[Figure: arrows from account_no to balance and to branch]
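The definition can be tried out on a relation instance. The sketch below is only an illustration: the helper name `fd_holds` and the list-of-dicts encoding are assumptions, and the rows are those of the Student table. Note that an instance can only refute an FD; an FD is a constraint on every legal instance, so passing this check on one table does not prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Check whether lhs -> rhs holds in a relation instance (list of dicts)."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y for a new x, otherwise returns the stored value
        if seen.setdefault(x, y) != y:
            return False  # same X value paired with a different Y value
    return True


student = [
    {"RollNo": 101, "Name": "Raju", "SPI": 8, "BL": 0},
    {"RollNo": 102, "Name": "Mitesh", "SPI": 7, "BL": 1},
    {"RollNo": 103, "Name": "Jay", "SPI": 7, "BL": 0},
]

print(fd_holds(student, ["RollNo"], ["Name", "SPI", "BL"]))  # True
print(fd_holds(student, ["SPI"], ["Name"]))                  # False
```

The second call fails because SPI 7 appears with two different names, so SPI does not determine Name.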
Types of Functional Dependency (FD)
• Full Functional Dependency
• In a relation, attribute B is fully functionally dependent on A if B is
functionally dependent on A, but not on any proper subset of A.
• E.g. {Roll_No, Semester, Department_Name} → SPI
• We need all three of {Roll_No, Semester, Department_Name} to find SPI.
• Partial Functional Dependency
• In a relation, attribute B is partially functionally dependent on A if B is functionally
dependent on A as well as on some proper subset of A.
• If some attribute can be removed from A and the dependency still holds,
then it is a partial functional dependency.
• E.g. {Enrollment_No, Department_Name} → SPI
• Enrollment_No is sufficient to find SPI; Department_Name is not required.
• Transitive Functional Dependency
• In a relation, if A → B and B → C hold, then A → C also holds (C is transitively
dependent on A via B).
• E.g. in Sub_Fac, Subject → Faculty and Faculty → Age, so Subject → Age.
Sub_Fac
Subject  Faculty  Age
DS       Shah     35
DBMS     Patel    32
DF       Shah     35
Inference rules
• Transitivity: If A → B and B → C, then A → C.
• Pseudo-transitivity: If A → B and BD → C, then AD → C.
• Decomposition: If A → BC, then A → B and A → C.
• Union: If A → B and A → C, then A → BC.
• Composition: If A → B and C → D, then AC → BD.
Closure of a set of FDs
What is closure of a set of FDs?
• Given a set F of functional dependencies, there are certain other
functional dependencies that are logically implied by F.
• E.g.: if F = {A → B, B → C}, then we can infer that A → C (by the
transitivity rule).
• The set of all functional dependencies (FDs) that are logically implied by
F is called the closure of F. It is denoted by F+.
Closure of a set of FDs [Example]
Suppose we are given a relation schema R(A,B,C,G,H,I) and the set of functional
dependencies:
F = {A → B, A → C, CG → H, CG → I, B → H}
Some members of F+ can be derived as follows:
• A → H: from A → B and B → H, by the transitivity rule.
• CG → HI: from CG → H and CG → I, by the union rule.
• AG → I: from A → C and CG → I, by the pseudo-transitivity rule.
• AG → I (alternative derivation): A → C gives AG → CG by the augmentation rule;
then AG → CG and CG → I give AG → I by the transitivity rule.
So F+ includes, among others, A → H, CG → HI and AG → I.
Closure of a set of FDs [Example]
Compute the closure of the following set F of functional dependencies for the relational
schema R = (A,B,C,D,E):
F = {AB → C, D → AC, D → E}
Find the closure of F.
Besides the FDs in F and the trivial ones, F+ includes D → A, D → C and D → ACE.
Closure of attribute sets to decide the Key
What is a closure of attribute sets?
• Given a set of attributes α, the closure of α under F is the set of attributes that
are functionally determined by α under F.
• It is denoted by α+.
Algorithm
Algorithm to compute α+, the closure of α under F
Steps
1. result = α
2. while (changes to result) do
       for each β → γ in F do
       begin
           if β ⊆ result then result = result ∪ γ
           else result = result
       end
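The steps above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name `closure` and the encoding of each FD as a pair of strings of single-letter attribute names are assumptions made here. The FD set used is the one from the closure-of-FDs example.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs; each side is a string of
    single-letter attribute names (an encoding assumed for this sketch).
    """
    result = set(attrs)          # step 1: result = alpha
    changed = True
    while changed:               # step 2: repeat while result keeps growing
        changed = False
        for lhs, rhs in fds:
            # if beta is a subset of result, add gamma to result
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print("".join(sorted(closure("AG", F))))  # ABCGHI
print("".join(sorted(closure("A", F))))   # ABCH
```

The loop stops as soon as a full pass over F adds nothing, which is the "while (changes to result)" condition.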
Closure of attribute sets [Example]
• Consider the relation schema R = (A, B, C, G, H, I).
• For this relation, a set of functional dependencies F can be given as
F = {A → B, A → C, CG → H, CG → I, B → H}
• Find the closure of AG, i.e. (AG)+.
Step 1. result = α, so result = AG
Step 2. Scan the FDs in F:
FD       Test            result
A → B    A ⊆ AG          ABG
A → C    A ⊆ ABG         ABCG
CG → H   CG ⊆ ABCG       ABCGH
CG → I   CG ⊆ ABCGH      ABCGHI
B → H    B ⊆ ABCGHI      ABCGHI
A further pass makes no change, so (AG)+ = ABCGHI.
Closure of attribute sets [Exercise]
• Given functional dependencies (FDs) for relational schema R = (A,B,C,D,E):
• F = {A → BC, CD → E, B → D, E → A}
• Find Closure for A
• Find Closure for CD
• Find Closure for B
• Find Closure for BC
• Find Closure for E
Answer
A+ = ABCDE
CD+ = ABCDE
B+ = BD
BC+ = ABCDE
E+ = ABCDE
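The answers can be checked mechanically, and the same closure test decides the keys: an attribute set is a superkey when its closure is the whole schema, and a candidate key when no proper subset has that property. The sketch below is an assumption-laden illustration (the names `closure` and `candidate_keys` and the string encoding of attributes are made up for this example), using a brute-force search that is fine for five attributes but exponential in general.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the whole schema."""
    all_attrs = set(attributes)
    keys = []
    # Smaller sets are tried first, so supersets of a found key can be skipped.
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == all_attrs:
                keys.append(candidate)
    return keys


R = "ABCDE"
F = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]

for x in ["A", "CD", "B", "BC", "E"]:
    print(x + "+ =", "".join(sorted(closure(x, F))))

print(["".join(sorted(k)) for k in candidate_keys(R, F)])  # ['A', 'E', 'BC', 'CD']
```

So A, E, BC and CD are the candidate keys of R under F, while B alone is not a key because B+ = BD.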
Canonical cover
What are extraneous attributes?
• Consider a relation schema R = (A, B, C) with the set of functional
dependencies F = {AB → C, A → C}.
• In AB → C, B is an extraneous attribute: since A → C is also in F,
A alone can determine C, so B is unnecessary (extra).
• An attribute of a functional dependency is said to be extraneous if we can
remove it without changing the closure of the set of functional dependencies.
What is canonical cover?
• A canonical cover of F is a minimal set of functional dependencies equivalent
to F, having no redundant dependencies or redundant parts of dependencies.
• It is denoted by Fc
• A canonical cover for F is a set of dependencies Fc such that
• F logically implies all dependencies in Fc and
• Fc logically implies all dependencies in F and
• No functional dependency in Fc contains an extraneous attribute and
• Each left side of functional dependency in Fc is unique.
• Example: F = {A → B, A → C} gives Fc = {A → BC} by the union rule; the
decomposition rule recovers F from Fc.
Algorithm to find canonical cover
• Fc = F
• Repeat
    • Use the union rule to replace any dependencies in Fc of the form α1 → β1 and α1 → β2
      with α1 → β1β2
    • Find a functional dependency α → β in Fc with an extraneous attribute either in α or in β
      /* Note: the test for extraneous attributes is done using Fc, not F */
    • If an extraneous attribute is found, delete it from α → β
• until Fc does not change
/* Note: the union rule may become applicable after some extraneous
attributes have been deleted, so it has to be re-applied */
Canonical cover [Example]
Consider the relation schema R = (A, B, C) with FDs
F = {A → BC, B → C, A → B, AB → C}
Find canonical cover.
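One way to work this example is to run the canonical cover algorithm directly. The sketch below is a hedged illustration: the function names and the encoding of FDs as pairs of strings are assumptions, and the extraneous-attribute tests are the standard ones (an attribute a in α is extraneous if β ⊆ (α − a)+ under Fc; an attribute a in β is extraneous if a ∈ α+ once a is dropped from β).

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (pairs of attribute collections)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def merge_same_lhs(fds):
    """Union rule: combine a -> b1 and a -> b2 into a -> b1b2."""
    merged = {}
    for lhs, rhs in fds:
        merged.setdefault(frozenset(lhs), set()).update(rhs)
    # FDs whose right side became empty are dropped here
    return [(set(lhs), rhs) for lhs, rhs in merged.items() if rhs]


def remove_one_extraneous(fds):
    """Delete one extraneous attribute in place; return True if one was found."""
    for i, (lhs, rhs) in enumerate(fds):
        for a in sorted(lhs):
            # a is extraneous on the left if (lhs - a)+ already contains rhs
            if len(lhs) > 1 and rhs <= closure(lhs - {a}, fds):
                fds[i] = (lhs - {a}, rhs)
                return True
        for a in sorted(rhs):
            # a is extraneous on the right if lhs+ still reaches a without it
            trial = fds[:i] + [(lhs, rhs - {a})] + fds[i + 1:]
            if a in closure(lhs, trial):
                fds[i] = (lhs, rhs - {a})
                return True
    return False


def canonical_cover(fds):
    fc = [(set(lhs), set(rhs)) for lhs, rhs in fds]
    while True:
        fc = merge_same_lhs(fc)          # re-apply the union rule each round
        if not remove_one_extraneous(fc):
            return fc


F = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]
for lhs, rhs in canonical_cover(F):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
```

For this F the sketch prints `A -> B` and `B -> C`, i.e. Fc = {A → B, B → C}: C is extraneous in A → BC (it follows from A → B and B → C), and A is extraneous in AB → C (because B → C). A canonical cover is not always unique; a different scan order can yield a different but equivalent result for other inputs.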
Delete anomaly
• Now suppose a department (IT) has only one employee, and that
employee leaves the organization.
• So we need to delete the tuple of that employee (Jay).
• But in addition to that, the information about the department is also deleted.
• This kind of problem, where deleting some tuples leads to the loss of
other data that was not intended to be removed, is known as a delete anomaly.
Update anomaly
• Consider a relation Emp_Dept(EID, Ename, City, Dname, Manager) with EID as the primary key.
• An update anomaly exists when one or more records (instances) of duplicated data are updated,
but not all.
Emp_Dept
EID  Ename  City    Dname     Manager
1    Raj    Rajkot  CE        Sah
2    Meet   Surat   C.E       Shah
3    Jay    Baroda  Computer  Shaah
4    Hari   Rajkot  IT        Dave
• Suppose we want to update the manager of the CE department. This requires that the
Manager in all the tuples corresponding to that department be changed to
reflect the new status.
• If we fail to update all the tuples of the given department, then two different records of
employees working in the same department might show different Managers, leading to
inconsistency in the database.
How to deal with insert, delete and update anomaly
Emp_Dept
EID   Ename  City    DID  Dname  Manager
1     Raj    Rajkot  1    CE     Shah
2     Meet   Surat   1    C.E    Shah
3     Jay    Baroda  2    IT     Dave
NULL  NULL   NULL    3    EC     NULL
Emp
EID  Ename  City    DID
1    Raj    Rajkot  1
2    Meet   Surat   1
3    Jay    Baroda  2
Dept
DID  Dname  Manager
1    CE     Shah
2    IT     Dave
3    EC     NULL
Customer
Ano Balance Bname
A01 5000 Rajkot
A02 5000 Surat
Normalization and normal forms
What is normalization?
• Normalization is the process of removing redundant data from tables to
improve data integrity, scalability and storage efficiency.
• data integrity (completeness, accuracy and consistency of data)
• scalability (ability of a system to continue to function well in a growing amount of work)
• storage efficiency (ability to store and manage data that consumes the least amount of
space)
• What do we do in normalization?
• Normalization generally involves splitting an existing table into multiple (more than one)
tables, which can be re-joined or linked each time a query is issued (executed).
How many normal forms are there?
• Normal forms:
• 1NF (First normal form)
• 2NF (Second normal form)
• 3NF (Third normal form)
• BCNF (Boyce–Codd normal form)
• 4NF (Fourth normal form)
• 5NF (Fifth normal form)
As we move from 1NF to 5NF, the number of tables and the complexity increase, but
redundancy decreases.
Normal forms
1NF (First Normal Form)
• Conditions for 1NF
Each cell of a table should contain a single value.
• A relation R is in first normal form (1NF) if and only if it does not contain any
composite attribute or multi-valued attributes or their combinations.
OR
• A relation R is in first normal form (1NF) if and only if all underlying domains
contain atomic values only.
1NF (First Normal Form) [Example - Composite attribute]
Customer
CID  Name    Address
C01  Raju    Jamnagar Road, Rajkot
C02  Mitesh  Nehru Road, Jamnagar
C03  Jay     C.G Road, Ahmedabad
• In the Customer relation, Address is a composite attribute,
which is further divided into the sub-attributes “Road” and “City”.
• So the Customer relation is not in 1NF.
• Problem: It is difficult to retrieve the list of customers living in ’Jamnagar’ city from
customer table.
• The reason is that address attribute is composite attribute which contains road name
as well as city name in single cell.
• It is possible that city name word is also there in road name.
• In our example, ’Jamnagar’ word occurs in both records, in first record it is a part of
road name and in second one it is the name of city.
1NF (First Normal Form) [Example - Composite attribute]
Customer (not in 1NF)
CID  Name    Address
C01  Raju    Jamnagar Road, Rajkot
C02  Mitesh  Nehru Road, Jamnagar
C03  Jay     C.G Road, Ahmedabad
Customer (in 1NF)
CID  Name    Road           City
C01  Raju    Jamnagar Road  Rajkot
C02  Mitesh  Nehru Road     Jamnagar
C03  Jay     C.G Road       Ahmedabad
• Solution (composite attribute): Divide the composite attribute into its sub-attributes
(here Road and City) and store each value in its own column.
• Solution (multi-valued attribute): Split the table into two tables in such a way that
• the first table contains all attributes except the multi-valued attribute, with the same primary
key, and
• the second table contains the multi-valued attribute, with a primary key placed in it, and
• the primary key of the first table is inserted in the second table as a foreign key.
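The retrieval problem with the composite Address attribute can be reproduced in a few lines. This is only a sketch: it assumes every address has the form "road, city", which holds for the three sample rows but is not guaranteed in general.

```python
customers = [
    ("C01", "Raju", "Jamnagar Road, Rajkot"),
    ("C02", "Mitesh", "Nehru Road, Jamnagar"),
    ("C03", "Jay", "C.G Road, Ahmedabad"),
]

# Substring search on the composite column also matches the road name.
naive = [cid for cid, _, address in customers if "Jamnagar" in address]

# 1NF version: Road and City become separate atomic columns.
# Assumes every address has the form "<road>, <city>".
customers_1nf = [
    (cid, name, *address.rsplit(", ", 1)) for cid, name, address in customers
]
exact = [cid for cid, _, _, city in customers_1nf if city == "Jamnagar"]

print(naive)  # ['C01', 'C02']
print(exact)  # ['C02']
```

With atomic columns the query compares against City only, so customer C01 (who merely lives on Jamnagar Road) is no longer returned.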
2NF (Second Normal Form)
• Conditions for 2NF
It is in 1NF and every non-prime attribute is fully functionally dependent on the primary
key (there is no partial dependency).
• Problem: For example, in the case of a joint account, multiple (more than one)
customers share a common (single) account.
• If an account ’A01’ is operated jointly by two customers, say ’C01’ and ’C02’,
then the data values for the attributes Balance and BranchName will be duplicated in
the two different tuples of customers ’C01’ and ’C02’.
2NF (Second Normal Form) [Example]
Customer
CID  ANO  AccessDate  Balance  BranchName
C01  A01  01-01-2017  50000    Rajkot
C02  A01  01-03-2017  50000    Rajkot
C01  A02  01-05-2017  25000    Surat
C03  A02  01-07-2017  25000    Surat
Table-1
ANO  Balance  BranchName
A01  50000    Rajkot
A02  25000    Surat
Table-2
CID  ANO  AccessDate
C01  A01  01-01-2017
C02  A01  01-03-2017
C01  A02  01-05-2017
C03  A02  01-07-2017
• Solution: Decompose the relation in such a way that the resultant relations do not have any
partial FD.
• Remove the partially dependent attributes from the relation that violates 2NF.
• Place them in a separate relation along with the prime attribute on which they are fully
dependent.
• The primary key of the new relation will be the attribute on which they are fully dependent.
• Keep the other attributes as they are in the original table, with the same primary key.
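The decomposition can be checked on the sample rows. The sketch below assumes the primary key of Customer is {CID, ANO} and that Balance and BranchName depend on ANO alone, which is what the joint-account discussion implies; the variable names are made up for this illustration.

```python
customer = [
    ("C01", "A01", "01-01-2017", 50000, "Rajkot"),
    ("C02", "A01", "01-03-2017", 50000, "Rajkot"),
    ("C01", "A02", "01-05-2017", 25000, "Surat"),
    ("C03", "A02", "01-07-2017", 25000, "Surat"),
]

# Table-1: attributes that depend only on ANO (the partial dependency is removed)
account = sorted({(ano, bal, branch) for _, ano, _, bal, branch in customer})
# Table-2: the composite key plus the attribute that needs the whole key
access = [(cid, ano, date) for cid, ano, date, _, _ in customer]

# A natural join on ANO rebuilds the original rows, so nothing is lost.
by_ano = {ano: (bal, branch) for ano, bal, branch in account}
rejoined = [(cid, ano, date) + by_ano[ano] for cid, ano, date in access]

print(account)               # [('A01', 50000, 'Rajkot'), ('A02', 25000, 'Surat')]
print(rejoined == customer)  # True
```

Each balance and branch name is now stored once per account instead of once per account holder.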
3NF (Third Normal Form)
• Conditions for 3NF
It is in 2NF and there is no transitive dependency.
(Transitive dependency: if A → B and B → C, then A → C.)
• Problem: In this relation, the branch address will be stored repeatedly for each
account of the same branch, which occupies more space.
3NF (Third Normal Form) [Example]
Customer
ANO  Balance  BranchName  BranchAddress
A01  50000    Rajkot      Kalawad Road
A02  40000    Rajkot      Kalawad Road
A03  35000    Surat       C.G Road
A04  25000    Surat       C.G Road
Table-1
BranchName  BranchAddress
Rajkot      Kalawad Road
Surat       C.G Road
Table-2
ANO  Balance  BranchName
A01  50000    Rajkot
A02  40000    Rajkot
A03  35000    Surat
A04  25000    Surat
• Solution: Decompose the relation in such a way that the resultant relations do not have any
transitive FD.
• Remove the transitively dependent attributes from the relation that violates 3NF.
• Place them in a new relation along with the non-prime attributes due to which the transitive
dependency occurred.
• The primary key of the new relation will be the non-prime attributes due to which the transitive
dependency occurred.
• Keep the other attributes as they are in the table with the same primary key, and add the prime
attributes of the other relation into it as a foreign key.
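A small sketch using Python's built-in `sqlite3` module shows the two 3NF tables and why the split helps with updates. The table and column names (`branch`, `account`, `branch_name`, and so on) and the new address "University Road" are hypothetical, chosen only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE branch (
        branch_name    TEXT PRIMARY KEY,
        branch_address TEXT
    );
    CREATE TABLE account (
        ano         TEXT PRIMARY KEY,
        balance     INTEGER,
        branch_name TEXT REFERENCES branch(branch_name)
    );
""")
conn.executemany(
    "INSERT INTO branch VALUES (?, ?)",
    [("Rajkot", "Kalawad Road"), ("Surat", "C.G Road")],
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [("A01", 50000, "Rajkot"), ("A02", 40000, "Rajkot"),
     ("A03", 35000, "Surat"), ("A04", 25000, "Surat")],
)

# The branch address is stored once, so a single UPDATE changes it everywhere.
# "University Road" is a made-up value for the demonstration.
conn.execute(
    "UPDATE branch SET branch_address = ? WHERE branch_name = ?",
    ("University Road", "Rajkot"),
)

rows = conn.execute("""
    SELECT a.ano, a.balance, a.branch_name, b.branch_address
    FROM account AS a
    JOIN branch AS b ON a.branch_name = b.branch_name
    ORDER BY a.ano
""").fetchall()
for row in rows:
    print(row)
```

After the update, both Rajkot accounts (A01 and A02) show the new address in the joined result, without any per-account change.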
BCNF (Boyce-Codd Normal Form)
• Conditions for BCNF
BCNF is based on the concept of a determinant.
E.g. in AccountNO → {Balance, Branch}, AccountNO is the determinant (here the primary key)
and {Balance, Branch} is the dependent.
It is in 3NF and every determinant should be a primary key (more precisely, a candidate key).
• In the Student relation of the example, one faculty teaches only one subject, but a subject may be
taught by more than one faculty.
• A student can learn a subject from only one faculty.
BCNF (Boyce-Codd Normal Form) [Example]
Student
RNO  Subject  Faculty
101  DS       Patel
102  DBMS     Shah
103  DS       Jadeja
104  DBMS     Dave
105  DBMS     Shah
102  DS       Patel
101  DBMS     Dave
105  DS       Jadeja
Table-1
Faculty  Subject
Patel    DS
Shah     DBMS
Jadeja   DS
Dave     DBMS
Table-2
RNO  Faculty
101  Patel
102  Shah
103  Jadeja
104  Dave
105  Shah
102  Patel
101  Dave
105  Jadeja
• Solution: Decompose the relation in such a way that the resultant relations
do not have any transitive FD.
• Remove the transitively dependent prime attribute from the relation that violates
BCNF.
• Place it in a separate new relation along with the non-prime attribute
due to which the transitive dependency occurred.
• The primary key of the new relation will be this non-prime attribute due to which
the transitive dependency occurred.
• Keep the other attributes as they are in that table with the same primary key, and add a
prime attribute of the other relation into it as a foreign key.
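The BCNF condition can be tested with attribute closure: a non-trivial FD violates BCNF when the closure of its left side is not the whole schema. The sketch below is illustrative only. It assumes the FDs {RNO, Subject} → Faculty and Faculty → Subject, read off from the two statements about faculty and students, and a single-letter encoding (R = RNO, S = Subject, F = Faculty) invented for this example.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose left side is not a superkey."""
    all_attrs = set(attributes)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)            # skip trivial FDs
        and closure(lhs, fds) != all_attrs     # left side is not a superkey
    ]


# Assumed encoding: R = RNO, S = Subject, F = Faculty
student_fds = [("RS", "F"), ("F", "S")]

print(bcnf_violations("RSF", student_fds))   # [('F', 'S')]
print(bcnf_violations("FS", [("F", "S")]))   # []
```

Faculty → Subject is the offending dependency in Student, since Faculty alone does not determine RNO. In Table-1 (Faculty, Subject) the same FD has a key on its left side, so no violation remains.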
4NF (Fourth Normal Form)
Multivalued dependency (MVD)
• For a dependency X → Y, if multiple values of Y exist for a single value of X, then the table
may have a multi-valued dependency.
Student
RNO Subject Faculty
101 DS Patel
101 DBMS Patel
101 DS Shah
101 DBMS Shah
• The redundancy in this example is due to the constraint that the subjects of a
student are independent of the faculty, which cannot be expressed in terms of FDs.
• This constraint is an example of a multivalued dependency, or MVD.
Multivalued Dependencies:
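• Let R be a relation schema and let α ⊆ R and β ⊆ R. The multivalued
dependency α →→ β
holds on R if in any legal relation r(R), for all pairs of tuples t1 and t2 in r such
that t1[α] = t2[α], there exist tuples t3 and t4 in r such that:
t1[α] = t2[α] = t3[α] = t4[α]
t3[β] = t1[β]
t3[R − β] = t2[R − β]
t4[β] = t2[β]
t4[R − β] = t1[R − β]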
• Above student table has multivalued dependency. So student table is not in 4NF.
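The pairwise condition for RNO →→ Subject can be checked directly on the Student rows: whenever two tuples agree on RNO, the tuple that takes Subject from the first and Faculty from the second must also be in the relation. The sketch below is an illustration under that reading; the function name `mvd_holds` and the tuple layout (RNO, Subject, Faculty) are assumptions.

```python
student = {
    (101, "DS", "Patel"),
    (101, "DBMS", "Patel"),
    (101, "DS", "Shah"),
    (101, "DBMS", "Shah"),
}


def mvd_holds(rows):
    """Test RNO ->> Subject on a set of (RNO, Subject, Faculty) tuples.

    For every pair of rows with the same RNO, the row combining Subject
    from the first with Faculty from the second must also be present.
    """
    return all(
        (r1, s1, f2) in rows
        for (r1, s1, _) in rows
        for (r2, _, f2) in rows
        if r1 == r2
    )


print(mvd_holds(student))                              # True
print(mvd_holds(student - {(101, "DBMS", "Shah")}))    # False
```

Dropping any one of the four rows breaks the dependency, which is exactly why the table has to store every Subject and Faculty combination and is therefore redundant.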
Functional dependency & Multivalued dependency
• A table can have both functional dependency as well as multi-valued
dependency together.
• RNO → Address
• RNO →→ Subject
• RNO →→ Faculty
Student
RNO  Address             Subject  Faculty
101  C. G. Road, Rajkot  DS       Patel
101  C. G. Road, Rajkot  DBMS     Patel
101  C. G. Road, Rajkot  DS       Shah
101  C. G. Road, Rajkot  DBMS     Shah
Subject
RNO  Subject
101  DS
101  DBMS
Faculty
RNO  Faculty
101  Patel
101  Shah
Address
RNO  Address
101  C. G. Road, Rajkot
5NF (Fifth Normal Form)
Join Dependency- 5NF
• Conditions for 5NF
• A relation R is in fifth normal form (5NF)
• if and only if it is in 4NF and
• it cannot be losslessly decomposed into any number of smaller tables (relations).
Student_Result
RID  RNO  Name    Subject  Result
1    101  Raj     DBMS     Pass
2    101  Raj     DS       Pass
3    101  Raj     DF       Pass
4    102  Meet    DBMS     Pass
5    102  Meet    DS       Fail
6    102  Meet    DF       Pass
7    103  Suresh  DBMS     Fail
8    103  Suresh  DS       Pass
The Student_Result relation can be further decomposed into
sub-relations. So the above relation is not in 5NF.
Decomposition of Student_Result:
Student
RNO  Name
101  Raj
102  Meet
103  Suresh
Subject
SID  Name
1    DBMS
2    DS
3    DF
Result
RID  RNO  SID  Result
1    101  1    Pass
2    101  2    Pass
3    101  3    Pass
4    102  1    Pass
5    102  2    Fail
6    102  3    Pass
7    103  1    Fail
8    103  2    Pass
None of the above relations can be further decomposed into sub-relations. So
the above database is in 5NF.
How to normalize database?
• A software contract and consultancy firm maintains details of all the various
projects in which its employees are currently involved. These details comprise:
Employee Number, Employee Name, Date of Birth, Department Code,
Department Name, Project Code, Project Description, Project Supervisor.
• Assume the following:
• Each employee number is unique.
• Each department has a single department code.
• Each project has a single code and supervisor.
• Each employee may work on one or more projects.
• Employee names need not necessarily be unique.
• Project Code, Project Description and Project Supervisor are repeating fields.
• Normalize this data to Third Normal Form.
UNF
Employee Number  Employee Name  Date of Birth  Department Code  Department Name  Project Code  Project Description  Project Supervisor
1                Raj            1-1-85         1                CE               1             IOT                  Patel
2                Meet           4-4-86         2                EC               2             PHP                  Shah
3                Suresh         2-2-85         1                CE               1             IOT                  Patel
1                Raj            1-1-85         1                CE               2             PHP                  Shah
1NF
Employee Number  Employee Name  Date of Birth  Department Code  Department Name
1                Raj            1-1-85         1                CE
2                Meet           4-4-86         2                EC
3                Suresh         2-2-85         1                CE

Employee Number  Project Code  Project Description  Project Supervisor
1                1             IOT                  Patel
2                2             PHP                  Shah
3                1             IOT                  Patel
1                2             PHP                  Shah
2NF
Employee Number  Employee Name  Date of Birth  Department Code  Department Name
1                Raj            1-1-85         1                CE
2                Meet           4-4-86         2                EC
3                Suresh         2-2-85         1                CE

Project Code  Project Description  Project Supervisor
1             IOT                  Patel
2             PHP                  Shah

Employee Number  Project Code
1                1
2                2
3                1
1                2
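3NF
Department Name depends on Department Code, which in turn depends on Employee Number, so
this transitive dependency is moved to its own table; the other 2NF tables are unchanged.
Employee Number  Employee Name  Date of Birth  Department Code
1                Raj            1-1-85         1
2                Meet           4-4-86         2
3                Suresh         2-2-85         1

Department Code  Department Name
1                CE
2                EC

Project Code  Project Description  Project Supervisor
1             IOT                  Patel
2             PHP                  Shah

Employee Number  Project Code
1                1
2                2
3                1
1                2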