ch 14-Final-normalization
Chapter
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition 10-5
Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Informal Design Guidelines for
Relational Databases (2)
● We first discussed informal guidelines for good
relational design, followed by the inference rules for FDs and their derivation
● Now we will discuss the formal concepts of functional
dependencies and normal forms
- 1NF (First Normal Form)
- 2NF (Second Normal Form)
- 3NF (Third Normal Form)
- BCNF (Boyce-Codd Normal Form)
Reviewing: Informal Design
Guidelines for Relation Schemas
● Measures of quality
● Making sure attribute semantics are clear
● Reducing redundant information in tuples
● Reducing NULL values in tuples
● Disallowing possibility of generating spurious
tuples
● Functional dependencies (FDs) are used to specify
formal measures of the "goodness" of relational
designs
● FDs and keys are used to define normal forms for
relations
● FDs are constraints that are derived from the
meaning and interrelationships of the data
attributes
● A set of attributes X functionally determines a set
of attributes Y if the value of X determines a
unique value for Y
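The definition of X -> Y can be tried out on a concrete relation state. The sketch below is only an illustration: the function name `fd_holds` and the sample EMP rows are assumptions, not part of the slides. Note that a relation state can only refute an FD; an FD is a constraint on every legal state, so a passing check on one state does not prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first Y value seen for each X value and compare later ones.
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical sample state, for illustration only.
emp = [
    {"SSN": "111", "ENAME": "Smith", "DNUMBER": 5},
    {"SSN": "222", "ENAME": "Wong", "DNUMBER": 5},
    {"SSN": "333", "ENAME": "Smith", "DNUMBER": 4},
]

ssn_to_name = fd_holds(emp, ["SSN"], ["ENAME"])       # not violated in this state
name_to_dept = fd_holds(emp, ["ENAME"], ["DNUMBER"])  # violated by the two Smith rows
```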
Normal forms based on primary key
● Most practical relational DB design takes one of
the following approaches:
– Make an ERD and then map it to tables
– Use external knowledge and informal methods to
design relation schemes
● Once a scheme is designed using either of these
approaches, it is evaluated and refined through normalization
● Normalization: The process of decomposing
unsatisfactory "bad" relations by breaking up their
attributes into smaller relations, on the basis of
conditions defined using keys and FDs
● Normal forms provide a formal framework for analyzing
relation schemas based on keys and FDs
There are two important properties
of decompositions:
(a) Non-additive (lossless) join property
No spurious tuples are generated when the decomposed relations are joined
(b) Dependency preservation property
Each FD in the original R must be represented in the decomposed relations
Note that property (a) is extremely important and
cannot be sacrificed. Property (b) is less stringent
and may be sacrificed.
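Property (a) can be seen on a tiny instance. The following is a minimal sketch, assuming a hypothetical relation R(A, B, C) with two made-up tuples; the helper names `as_set`, `project`, and `natural_join` are illustrative. It projects R onto two different pairs of schemas and joins the projections back together.

```python
def as_set(rows):
    """Represent each tuple as a sorted tuple of (attribute, value) pairs."""
    return {tuple(sorted(r.items())) for r in rows}

def project(rows, attrs):
    """Projection onto attrs, with duplicate elimination."""
    return as_set({a: r[a] for a in attrs} for r in rows)

def natural_join(r1, r2):
    """Natural join: combine tuples that agree on all shared attributes."""
    out = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(tuple(sorted({**d1, **d2}.items())))
    return out

# Hypothetical state of R(A, B, C), for illustration only.
r = [
    {"A": "a1", "B": "b1", "C": "c1"},
    {"A": "a2", "B": "b1", "C": "c2"},
]
original = as_set(r)

# Joining on B, which does not identify a tuple, adds tuples that were never in R.
lossy = natural_join(project(r, ["A", "B"]), project(r, ["B", "C"]))
spurious = lossy - original

# Joining on A, which identifies each tuple in this state, gives R back.
rejoined = natural_join(project(r, ["A", "B"]), project(r, ["A", "C"]))
```

In this state the first decomposition yields two spurious tuples, (a1, b1, c2) and (a2, b1, c1), while the second reproduces the original relation exactly.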
Practical Use of Normal Forms
● Normalization is carried out in practice so that the
resulting designs are of high quality and meet the desirable
properties
● The practical utility of these normal forms becomes
questionable when the constraints on which they are based
are hard to understand or to detect
● Database designers need not normalize to the highest
possible normal form (usually 3NF, BCNF, or 4NF is enough)
● Denormalization: the process of storing the join of higher
normal form relations as a base relation, which is in a
lower normal form
Definitions of Keys and Attributes
Participating in Keys (1)
● A superkey of a relation schema R = {A1, A2, ...,
An} is a set of attributes S ⊆ R with the
property that no two distinct tuples t1 and t2 in any legal
relation state r of R will have t1[S] = t2[S]
● A key K is a superkey with the additional property
that removal of any attribute from K will cause K
not to be a superkey any more
● If there is more than one key, each is called a candidate key.
One is chosen as the primary key and the rest are called secondary keys
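When all constraints on R are given as FDs, the superkey and key conditions can be tested with attribute closure: S is a superkey exactly when the closure of S under the FDs contains every attribute of R. The sketch below assumes that setting; the schema and FD set are illustrative choices (they reuse attribute names that appear later in these slides), and the function names are not from the source.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_superkey(attrs, schema, fds):
    """S is a superkey if its closure covers the whole schema."""
    return closure(attrs, fds) >= set(schema)

def is_key(attrs, schema, fds):
    """A key is a superkey from which no single attribute can be removed."""
    attrs = set(attrs)
    return is_superkey(attrs, schema, fds) and not any(
        is_superkey(attrs - {a}, schema, fds) for a in attrs
    )

# Illustrative schema and FDs (assumed for this sketch).
schema = {"SSN", "ENAME", "DNUMBER", "DMGRSSN"}
fds = [
    ({"SSN"}, {"ENAME", "DNUMBER"}),
    ({"DNUMBER"}, {"DMGRSSN"}),
]

superkey_not_key = is_superkey({"SSN", "ENAME"}, schema, fds) and not is_key({"SSN", "ENAME"}, schema, fds)
ssn_is_key = is_key({"SSN"}, schema, fds)
```

Checking only single-attribute removals is enough for minimality, because any subset of a non-superkey is also not a superkey.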
Definitions of Keys and Attributes
Participating in Keys (2)
● If a relation schema has more than one key, each is
called a candidate key. One of the candidate keys
is arbitrarily designated to be the primary key,
and the others are called secondary keys.
● A prime attribute must be a member of some
candidate key
● A nonprime attribute is not a prime attribute,
that is, it is not a member of any candidate key.
Example
Given a relation scheme R(A, B, C, D) with the FD set
F = {A -> B, B -> C, A -> D, CD -> A}
● Find the candidate keys: A, BD, CD
(BD is a key because B -> C gives BD -> CD, and CD -> A)
● Prime attributes: A, B, C, D
● Non-prime attributes: none
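The candidate keys of this example can be checked mechanically. The following is a brute-force sketch (exponential in the number of attributes, so suitable only for small schemas); the function names are illustrative. It tries attribute sets in order of increasing size and keeps those whose closure is all of R and that contain no smaller key.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Enumerate minimal attribute sets whose closure is the whole schema."""
    attrs = sorted(schema)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(s, fds) >= set(attrs):
                keys.append(s)
    return keys

# F = {A -> B, B -> C, A -> D, CD -> A} on R(A, B, C, D)
fds = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"D"}), ({"C", "D"}, {"A"})]
keys = candidate_keys("ABCD", fds)
prime = set().union(*keys)
nonprime = set("ABCD") - prime
```

For this F the enumeration reports {A}, {B, D}, and {C, D}. {B, D} qualifies because B -> C lets BD determine CD, and CD -> A then gives everything else, so every attribute is prime.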
First Normal Form
● Disallows composite attributes, multivalued
attributes, and nested relations, that is, attributes
whose values for an individual tuple are
non-atomic
● Considered to be part of the formal definition of a
relation in the relational data model, i.e. a relation is a
flat file
Normalization into 1NF
Multiple Multivalued Attributes
● PERSON(SSN, {CAR#}, {MOBILE}) is not in 1NF
● One way to normalize: expand the key to include
both MV attributes, i.e. the key is
(SSN, CAR#, MOBILE)
– Leads to redundancy
● Better way: decompose R on the basis of the two MV attributes
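The redundancy of the expanded-key approach can be made concrete with a small count. This is a sketch under assumptions: the sample values are made up, the flat relation is assumed to pair every car with every mobile of the person (the two attributes being independent), and the names `person_car` and `person_mobile` for the decomposed relations are hypothetical.

```python
from itertools import product

# One hypothetical person with 3 cars and 4 mobile numbers.
ssn = "123"
cars = ["C1", "C2", "C3"]
mobiles = ["M1", "M2", "M3", "M4"]

# Option 1: one flat relation with key (SSN, CAR#, MOBILE).
# Every car is paired with every mobile.
flat = [(ssn, c, m) for c, m in product(cars, mobiles)]

# Option 2: decompose on the two multivalued attributes.
person_car = [(ssn, c) for c in cars]
person_mobile = [(ssn, m) for m in mobiles]

flat_rows = len(flat)
decomposed_rows = len(person_car) + len(person_mobile)
```

Here the flat relation needs 12 tuples for one person, while the two decomposed relations need 7 in total, and the gap grows with the product of the two value counts.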
3.3 Second Normal Form (1)
● Uses the concepts of FDs and primary key
Definitions:
● Prime attribute - an attribute that is a member of the
primary key K
● Full functional dependency - an FD Y -> Z where
removal of any attribute from Y means the FD
does not hold any more
Examples:
- {SSN, PNUMBER} -> HOURS is a full FD since
neither SSN -> HOURS nor PNUMBER -> HOURS holds
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called
a partial dependency) since SSN -> ENAME also holds
Second Normal Form (2)
● A relation schema R is in second normal
form (2NF) if every non-prime attribute A
in R is fully functionally dependent on the
primary key
● R can be decomposed into 2NF relations via
the process of 2NF normalization
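A 2NF test against the primary key amounts to looking for partial dependencies: a proper subset of the key that determines a non-prime attribute. The sketch below uses only the attributes and FDs named in the examples above, {SSN, PNUMBER} -> HOURS and SSN -> ENAME; the function name `partial_dependencies` is illustrative, and "prime" here follows the primary-key-based definition.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(schema, primary_key, fds):
    """List (subset, attribute) pairs where a proper, non-empty subset of the
    primary key determines a non-prime attribute."""
    key = sorted(primary_key)
    nonprime = set(schema) - set(key)
    found = []
    for size in range(1, len(key)):
        for combo in combinations(key, size):
            for a in sorted(closure(combo, fds) & nonprime):
                found.append((set(combo), a))
    return found

schema = {"SSN", "PNUMBER", "HOURS", "ENAME"}
fds = [({"SSN", "PNUMBER"}, {"HOURS"}), ({"SSN"}, {"ENAME"})]
violations = partial_dependencies(schema, {"SSN", "PNUMBER"}, fds)
in_2nf = not violations
```

The only violation found is SSN -> ENAME, so this relation is not in 2NF; HOURS depends fully on the key and causes no problem.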
Third Normal Form
Definition:
● Transitive functional dependency - an FD X -> Z
that can be derived from two FDs X -> Y and Y -> Z
Examples:
- SSN -> DMGRSSN is a transitive FD since
SSN -> DNUMBER and DNUMBER -> DMGRSSN
hold
- SSN -> ENAME is non-transitive since there is no set
of attributes X where SSN -> X and X -> ENAME
General Normal Form Definitions
(For Multiple Candidate Keys)
A relation schema R is in second normal form
(2NF) if every non-prime attribute A in R is fully
functionally dependent on every key of R
General Normal Form Definitions
Definition:
● Superkey of relation schema R - a set of attributes
S of R that contains a key of R
● A relation schema R is in third normal form
(3NF) if whenever a nontrivial FD X -> A holds in R, then
either:
(a) X is a superkey of R, or
(b) A is a prime attribute of R
NOTE: Boyce-Codd normal form disallows condition (b)
above
Interpreting the General 3NF Definition
An FD X -> A that violates both conditions (a) and (b) means that either:
- A nonprime attribute determines another nonprime attribute (a transitive dependency), or
- A proper subset of a key K determines a nonprime attribute (a partial dependency)
● A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever a nontrivial FD X -> A holds in
R, then X is a superkey of R
● Each normal form is strictly stronger than the previous one
– Every 2NF relation is in 1NF
– Every 3NF relation is in 2NF
– Every BCNF relation is in 3NF
● There exist relations that are in 3NF but not in BCNF
● The goal is to have each relation in BCNF (or 3NF)
Boyce-Codd normal form
A relation TEACH that is in 3NF but not in BCNF
Achieving the BCNF by Decomposition
● Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course
● {student, course} is a candidate key for this relation, and
the dependencies shown follow the pattern in Figure 10.12
(b). In fd2 the left side (instructor) is not a superkey, but the right
side (course) is a prime attribute, so this relation is in 3NF but not in BCNF
● A relation that is not in BCNF should be decomposed so as to
meet this property, while possibly forgoing the preservation of
all functional dependencies in the decomposed relations.
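The 3NF and BCNF conditions can be applied to TEACH step by step. The sketch below is a brute-force illustration, not a definitive implementation: the function names are assumptions, candidate keys are found by exhaustive search, and only the listed FDs are examined (which is assumed to be sufficient when testing a single relation against its own FD set).

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Enumerate minimal attribute sets whose closure is the whole schema."""
    attrs = sorted(schema)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= set(attrs):
                keys.append(s)
    return keys

def normal_form_violations(schema, fds):
    """Return (bcnf_violations, third_nf_violations) as lists of (lhs, attribute)."""
    prime = set().union(*candidate_keys(schema, fds))
    bcnf, third = [], []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= set(schema):
            continue                      # condition (a): lhs is a superkey
        for a in sorted(rhs - lhs):       # ignore the trivial part of the FD
            bcnf.append((lhs, a))
            if a not in prime:
                third.append((lhs, a))    # condition (b) fails as well
    return bcnf, third

teach = {"student", "course", "instructor"}
fds = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
bcnf_violations, third_nf_violations = normal_form_violations(teach, fds)
```

For TEACH the search finds two candidate keys, {course, student} and {instructor, student}, so every attribute is prime. fd2 is reported as a BCNF violation but not as a 3NF violation.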
Achieving the BCNF by Decomposition
● Three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}
● All three decompositions lose fd1. We have to settle for sacrificing
functional dependency preservation, but we cannot sacrifice the non-additivity
property after decomposition.
● Of the three, only the 3rd decomposition does not generate spurious
tuples after a join (and hence has the non-additivity property).
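The claim about the three decompositions can be checked with the standard test for a binary decomposition: {R1, R2} has the non-additive join property if the common attributes R1 ∩ R2 functionally determine either R1 - R2 or R2 - R1. The sketch below applies that test using attribute closure; the function names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_nonadditive(r1, r2, fds):
    """Binary lossless-join test: the shared attributes must determine
    all of r1 or all of r2."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure

fds = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
decompositions = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]
results = [is_nonadditive(r1, r2, fds) for r1, r2 in decompositions]
```

Only the third decomposition passes: its common attribute, instructor, determines course through fd2, whereas student alone and course alone determine nothing else.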