7. Normalization

The document discusses normalization in database design, emphasizing the importance of eliminating redundancy and avoiding anomalies related to insertion, deletion, and modification. It outlines the steps of normalization, including defining normal forms (1NF, 2NF, 3NF, BCNF) and functional dependencies, which are essential for improving relational database schemas. Additionally, it provides examples and exercises to illustrate the process of normalizing a given relation.

Uploaded by

mohamedmt

24CSIS03C

Database Systems

Lecture 7

Normalization
Literature
▪ The material presented here is based on:

▪ Lecture notes of M. Tamer Özsu, CS448 Introduction to Database Management

▪ PP slides of Elmasri, R. and Navathe, S., “Fundamentals of Database Systems”, 5th Edition


Database Design - Where are we?

Step 1: ER-to-Relational Mapping

Step 2: Normalization:
“Improving” the design
Relational Design Principles
• Relations should have semantic unity
• Information repetition should be avoided
– Anomalies: insertion, deletion, modification
• Avoid null values as much as possible
– Difficulties with interpretation
• don’t know, don’t care, known but unavailable, does not apply

• Specification of joins

– Spurious joins
• A spurious tuple is a record produced by a join of two or more
tables where the joining fields are neither primary nor foreign keys
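A minimal sketch of how spurious tuples arise, assuming two made-up tables that share only a LOCATION column, which is neither a primary nor a foreign key. The table contents and the helper name `join_on_location` are hypothetical and chosen only for illustration.

```python
# Facts we started from: (ENO, LOCATION, PNO). Hypothetical data.
original = {("E1", "Cairo", "P1"), ("E2", "Cairo", "P2")}

# A poor decomposition that keeps only LOCATION in common.
emp_loc = [("E1", "Cairo"), ("E2", "Cairo")]    # (ENO, LOCATION)
loc_proj = [("Cairo", "P1"), ("Cairo", "P2")]   # (LOCATION, PNO)

def join_on_location(left, right):
    """Join (ENO, LOCATION) with (LOCATION, PNO) on equal LOCATION values."""
    return {
        (eno, loc, pno)
        for eno, loc in left
        for loc2, pno in right
        if loc == loc2
    }

joined = join_on_location(emp_loc, loc_proj)

# Tuples that the join produces but that were never true facts.
spurious = joined - original
print(sorted(spurious))
```

The join returns every original tuple plus two extra ones that pair each employee with the other employee's project; those extras are the spurious tuples.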
Redundancy
• The TITLE, SALARY, BUDGET attribute values
are repeated for each project that the
engineer is involved in.
– Waste of space
– Complicates updates
Insertion Anomaly
• It is difficult (impossible?) to store information
about a new project until an employee is
assigned to it. Why?
Deletion Anomaly
• If an engineer, who is the only employee on a project, leaves
the company, his personal information cannot be deleted, or
the information about that project is lost.
• May have to delete many tuples.
Modification Anomaly
• If any attribute of project (say BUDGET of P1) is
modified, all the tuples for all employees who work on
that project need to be modified.
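The modification anomaly can be illustrated with a small sketch. The rows and the helper `budgets_per_project` are hypothetical; only the attribute names ENO, PNO and BUDGET come from the slides.

```python
# Hypothetical EMP-PROJ rows; BUDGET is repeated for every employee on a project.
emp_proj = [
    {"ENO": "E1", "PNO": "P1", "BUDGET": 150000},
    {"ENO": "E2", "PNO": "P1", "BUDGET": 150000},
    {"ENO": "E2", "PNO": "P2", "BUDGET": 135000},
]

def budgets_per_project(rows):
    """Collect the distinct BUDGET values recorded for each project."""
    seen = {}
    for row in rows:
        seen.setdefault(row["PNO"], set()).add(row["BUDGET"])
    return seen

# A careless update touches only one of the two P1 tuples.
emp_proj[0]["BUDGET"] = 200000

# Projects that now carry more than one budget value are inconsistent.
inconsistent = {
    pno for pno, budgets in budgets_per_project(emp_proj).items() if len(budgets) > 1
}
print(inconsistent)
```

Because the budget of P1 is stored once per employee, updating a single tuple leaves the relation with two different budgets for the same project.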
What to do?
• Take each relation individually and “improve” it in
terms of the desired characteristics
– Normal forms
• Atomic values (1NF)
• Keys and dependencies (2NF, 3NF, BCNF)
• Normalization
– Normalization is a process of concept separation which
applies a top-down methodology for producing a schema
by subsequent refinements and decompositions.
– Do not combine unrelated sets of facts in one table; each
relation should contain an independent set of facts.
Normal Forms
Functional Dependence

• Functional dependencies (FDs) are used to specify formal
measures of the "goodness" of relational designs

• FDs and keys are used to define normal forms for relations

• FDs are constraints that are derived from the meaning and
interrelationships of the data attributes

• A set of attributes X functionally determines a set of attributes
Y if the value of X determines a unique value for Y
Functional Dependence
• X ―› Y holds if whenever two tuples have the same value for
X, they must have the same value for Y

• For any two tuples t1 and t2 in any relation instance r(R): if
t1[X] = t2[X], then t1[Y] = t2[Y]

• X ―› Y in R specifies a constraint on all relation instances r(R)

• Written as X ―› Y; can be displayed graphically on a relation
schema as in the figures (denoted by an arrow).

• FDs are derived from the real-world constraints on the
attributes
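The tuple-based definition translates directly into a check over one relation instance. This is a sketch with made-up rows; the function name `fd_holds` is hypothetical. Since an FD is a constraint on all instances, a single instance can refute an FD but cannot prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not violated by this list of dict-shaped tuples."""
    seen = {}
    for t in rows:
        x = tuple(t[a] for a in lhs)
        y = tuple(t[a] for a in rhs)
        # The first tuple with this X value fixes the expected Y value.
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical instance of an employee relation.
rows = [
    {"ENO": "E1", "TITLE": "Engineer", "SALARY": 40000},
    {"ENO": "E2", "TITLE": "Analyst", "SALARY": 34000},
    {"ENO": "E3", "TITLE": "Engineer", "SALARY": 40000},
]

print(fd_holds(rows, ["TITLE"], ["SALARY"]))  # no two tuples disagree
print(fd_holds(rows, ["SALARY"], ["ENO"]))    # E1 and E3 share a salary
```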
Examples of FD constraints
• In relation EMP-PROJ
– (ENO, PNO) ―› (ENAME, TITLE, SALARY, DURATION, RESP)

– ENO ―›(ENAME, TITLE, SALARY)

– PNO ―› (PNAME, BUDGET)

– TITLE ―› SALARY
Examples of FD constraints
• An FD is a property of the attributes in the schema R

• The constraint must hold on every relation instance r(R)

• If K is a key of R, then K functionally determines all attributes
in R (since we never have two distinct tuples with t1[K] = t2[K]).


Inference Rules for FDs
• Given a set of FDs F, we can infer additional FDs that hold
whenever the FDs in F hold
Armstrong's inference rules:
IR1. (Reflexive) If Y ⊆ X, then X ―› Y
IR2. (Augmentation) If X ―› Y, then XZ ―› YZ
(Notation: XZ stands for X ∪ Z)
IR3. (Transitive) If X ―› Y and Y ―› Z, then X ―› Z
• IR1, IR2, IR3 form a sound and complete set of inference
rules
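In practice, whether an FD can be inferred from F is usually decided with the attribute-closure procedure rather than by applying the rules by hand. The slides do not show this procedure; the sketch below is the standard textbook algorithm, applied to the EMP-PROJ FDs listed earlier. The names `closure` and `implies` are hypothetical.

```python
def closure(attrs, fds):
    """Return all attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def implies(fds, lhs, rhs):
    """True if lhs -> rhs follows from fds."""
    return set(rhs) <= closure(lhs, fds)

# FDs of EMP-PROJ as given in the slides.
F = [
    ({"ENO", "PNO"}, {"ENAME", "TITLE", "SALARY", "DURATION", "RESP"}),
    ({"ENO"}, {"ENAME", "TITLE", "SALARY"}),
    ({"PNO"}, {"PNAME", "BUDGET"}),
    ({"TITLE"}, {"SALARY"}),
]

print(sorted(closure({"ENO"}, F)))
print(implies(F, {"ENO", "PNO"}, {"BUDGET"}))
print(implies(F, {"PNO"}, {"ENAME"}))
```

An FD X ―› Y follows from F exactly when Y is contained in the closure of X, which is why one closure computation replaces a chain of rule applications.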
Inference Rules for FDs
Some additional inference rules that are useful:

(Decomposition) If X ―› YZ, then X ―› Y and X ―› Z

(Union) If X ―› Y and X ―› Z, then X ―› YZ

(Pseudotransitivity) If X ―› Y and WY ―› Z, then WX ―› Z

• The last three inference rules, as well as any other inference
rules, can be deduced from IR1, IR2, and IR3 (completeness
property)
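As a worked illustration, each of the three additional rules can be derived from IR1 to IR3 in a few steps. The derivations below are one possible route and are written with → for the FD arrow.

```latex
\begin{align*}
\textbf{Decomposition: } & X \to YZ && \text{(given)} \\
& YZ \to Y && \text{(IR1, since } Y \subseteq YZ\text{)} \\
& X \to Y && \text{(IR3)} \\[4pt]
\textbf{Union: } & X \to Y,\; X \to Z && \text{(given)} \\
& X \to XY && \text{(IR2: augment } X \to Y \text{ with } X\text{, and } XX = X\text{)} \\
& XY \to YZ && \text{(IR2: augment } X \to Z \text{ with } Y\text{)} \\
& X \to YZ && \text{(IR3)} \\[4pt]
\textbf{Pseudotransitivity: } & X \to Y,\; WY \to Z && \text{(given)} \\
& WX \to WY && \text{(IR2: augment } X \to Y \text{ with } W\text{)} \\
& WX \to Z && \text{(IR3)}
\end{align*}
```

The second half of Decomposition (X → Z) follows the same way, using Z ⊆ YZ.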
Some Basics
• Attributes
– A prime attribute is a member of some candidate key
– A non-prime attribute is any attribute that is not prime

• Full functional dependency
– An FD X ―› Y is a full functional dependency if X is minimal, i.e., removal
of any attribute A from X means the dependency no longer holds.

• Partial functional dependency
– A partial functional dependency is a functional dependency X → Y
where some attribute A ∈ X can be removed from X and the
dependency still holds, i.e., for some A ∈ X, (X − {A}) → Y.

• Transitive dependency
– Transitive dependency is a condition where A, B, and C are attributes
of a relation such that if A → B and B → C, then C is transitively
dependent on A via B (provided that A is not functionally dependent
on B or C)
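These definitions can be tried out on EMP-PROJ. The sketch below finds candidate keys by brute force over attribute subsets, derives the prime attributes, and tests whether a dependency is full. It assumes the four EMP-PROJ FDs from the slides are the complete set; the function names are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return all attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            subset = set(combo)
            if any(k <= subset for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(subset, fds) == set(attrs):
                keys.append(subset)
    return keys

def is_full(lhs, rhs, fds):
    """True if lhs -> rhs holds and no attribute can be dropped from lhs."""
    if not rhs <= closure(lhs, fds):
        return False
    return all(not rhs <= closure(lhs - {a}, fds) for a in lhs)

ATTRS = {"ENO", "PNO", "ENAME", "TITLE", "SALARY", "DURATION", "RESP", "PNAME", "BUDGET"}
F = [
    ({"ENO", "PNO"}, {"ENAME", "TITLE", "SALARY", "DURATION", "RESP"}),
    ({"ENO"}, {"ENAME", "TITLE", "SALARY"}),
    ({"PNO"}, {"PNAME", "BUDGET"}),
    ({"TITLE"}, {"SALARY"}),
]

keys = candidate_keys(ATTRS, F)
prime = set().union(*keys)
print(keys, sorted(prime))
print(is_full({"ENO", "PNO"}, {"DURATION"}, F))  # needs both ENO and PNO
print(is_full({"ENO", "PNO"}, {"ENAME"}, F))     # ENO alone is enough
```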
First Normal Form
• All attribute values are atomic
• 1NF relation cannot have an attribute value that is:
– a set of values (Multivalued)
– a tuple of values (nested relation)

• This is a standard assumption in relational DBMSs and
in the rest of this section
• In object-oriented DBMSs this assumption is relaxed.
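One common way to reach 1NF is to move a multivalued attribute into its own relation, keyed by the original key. The sketch below assumes a made-up PHONES attribute and a hypothetical helper `to_1nf`; it is an illustration, not the only possible treatment.

```python
# Hypothetical non-1NF rows: PHONES holds a set of values per employee.
unnormalized = [
    {"ENO": "E1", "ENAME": "J. Doe", "PHONES": ["555-0100", "555-0101"]},
    {"ENO": "E2", "ENAME": "M. Smith", "PHONES": ["555-0102"]},
]

def to_1nf(rows, key, multivalued):
    """Split a multivalued attribute into a separate relation of atomic values."""
    # Base relation: everything except the multivalued attribute.
    base = [{a: v for a, v in r.items() if a != multivalued} for r in rows]
    # Detail relation: one tuple per (key, single value) pair.
    detail = [{key: r[key], multivalued: v} for r in rows for v in r[multivalued]]
    return base, detail

employee, employee_phone = to_1nf(unnormalized, "ENO", "PHONES")
print(employee)
print(employee_phone)
```

Every value in both resulting relations is atomic, and the original rows can be rebuilt by grouping the detail relation on ENO.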
Second Normal Form
• All non-prime attributes should fully depend on the primary
key
Definitions:
• Prime attribute - an attribute that is a member of the
primary key K
• Full functional dependency - an FD Y ―› Z where
removal of any attribute from Y means the FD no
longer holds
Examples:
- {SSN, PNUMBER} ―› HOURS is a full FD since
neither SSN ―› HOURS nor PNUMBER ―› HOURS holds
- {SSN, PNUMBER} ―› ENAME is not a full FD (it is called a
partial dependency) since SSN ―› ENAME also holds
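The 2NF test on the SSN/PNUMBER example can be sketched as a search for non-prime attributes that are determined by a proper subset of the key. The FD set below contains only the two dependencies named in the examples, and the function name `partial_dependencies` is hypothetical.

```python
def closure(attrs, fds):
    """Return all attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, non_prime, fds):
    """Map each non-prime attribute to the key subsets (key minus one attribute)
    that already determine it."""
    found = {}
    for attr in sorted(non_prime):
        for dropped in sorted(key):
            subset = key - {dropped}
            if attr in closure(subset, fds):
                found.setdefault(attr, []).append(sorted(subset))
    return found

KEY = {"SSN", "PNUMBER"}
NON_PRIME = {"HOURS", "ENAME"}
FDS = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
]

violations = partial_dependencies(KEY, NON_PRIME, FDS)
print(violations)
```

Checking the subsets obtained by dropping one key attribute is enough, because any smaller subset that determines an attribute is contained in one of them. An empty result means the relation is in 2NF with respect to this key.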
Third Normal Form
• Intuitively: A relation R is in 3NF iff
– R is in 2NF (i.e., every non-prime attribute is fully
functionally dependent on every key)
– No non-prime attribute of R is transitively dependent on
the primary key.
• The issue is to remove the transitive dependencies
• N.B.: The absence of transitive dependencies
guarantees absence of partial functional dependencies.
3NF – Example

• EMP is not in 3NF because of fd2


• TITLE ―› SALARY but TITLE is not a superkey and SALARY is
not prime
• Problem is that ENO transitively determines SALARY (as well
as directly determining it)
• Solution:
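A usual way to remove the transitive dependency is to split EMP on TITLE ―› SALARY. The sketch below assumes EMP has the attributes ENO, ENAME, TITLE and SALARY, uses made-up rows, and names the second relation PAY as a hypothetical choice. It checks that joining the two projections gives back the original relation.

```python
# Hypothetical instance of EMP(ENO, ENAME, TITLE, SALARY).
emp = [
    {"ENO": "E1", "ENAME": "J. Doe", "TITLE": "Engineer", "SALARY": 40000},
    {"ENO": "E2", "ENAME": "M. Smith", "TITLE": "Analyst", "SALARY": 34000},
    {"ENO": "E3", "ENAME": "A. Lee", "TITLE": "Engineer", "SALARY": 40000},
]

def project(rows, attrs):
    """Projection onto attrs with duplicate elimination."""
    seen = []
    for r in rows:
        t = {a: r[a] for a in attrs}
        if t not in seen:
            seen.append(t)
    return seen

def natural_join(left, right):
    """Natural join on all attributes the two relations have in common."""
    out = []
    for l in left:
        for r in right:
            common = set(l) & set(r)
            if all(l[a] == r[a] for a in common):
                out.append({**l, **r})
    return out

emp_new = project(emp, ["ENO", "ENAME", "TITLE"])  # ENO is the key
pay = project(emp, ["TITLE", "SALARY"])            # TITLE is the key

rejoined = natural_join(emp_new, pay)
print(pay)
print(rejoined == emp)
```

Each salary is now stored once per title, and because TITLE is the key of PAY the join introduces no spurious tuples.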
Boyce-Codd Normal Form
• You can still have transitive dependencies in 3NF if the
dependent attribute(s) are prime.
• A 1NF relation scheme R is in BCNF if for every non-trivial
functional dependency X ―› Y, X is a superkey.
• Properties of BCNF
– All non-prime attributes are fully dependent on every key.
– All prime attributes are fully dependent on the keys that
they do not belong to.
– No attribute is non-trivially dependent on any set of
non-prime attributes.
BCNF – Example
• Assume the following definition of the PROJECT relation with:
– Each employee on a project has a unique location and
responsibility with respect to that project, and
– Only one project can be found at each location
• FDs would be
– (ENO, PNO) ―› (LOCATION, RESP)
– LOCATION ―› PNO

– which makes PROJECT in 3NF but not in BCNF
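The 3NF-but-not-BCNF claim can be checked mechanically. The sketch below assumes the schema PROJECT(ENO, PNO, LOCATION, RESP) and reads the two stated rules as the FDs (ENO, PNO) → (LOCATION, RESP) and LOCATION → PNO; the attribute names and helper functions are assumptions made for illustration.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return all attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            subset = set(combo)
            if any(k <= subset for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(subset, fds) == set(attrs):
                keys.append(subset)
    return keys

def violations(attrs, fds):
    """Return (BCNF violations, 3NF violations) among the given FDs."""
    prime = set().union(*candidate_keys(attrs, fds))
    bcnf, third = [], []
    for lhs, rhs in fds:
        if closure(lhs, fds) == set(attrs):
            continue  # determinant is a superkey: fine for both forms
        nontrivial = rhs - lhs
        if nontrivial:
            bcnf.append((sorted(lhs), sorted(nontrivial)))
        if nontrivial - prime:
            # 3NF only objects when a non-prime attribute is determined.
            third.append((sorted(lhs), sorted(nontrivial - prime)))
    return bcnf, third

ATTRS = {"ENO", "PNO", "LOCATION", "RESP"}
F = [
    ({"ENO", "PNO"}, {"LOCATION", "RESP"}),
    ({"LOCATION"}, {"PNO"}),
]

print(candidate_keys(ATTRS, F))
bcnf_bad, third_bad = violations(ATTRS, F)
print(bcnf_bad, third_bad)
```

Under these assumptions there are two candidate keys, {ENO, PNO} and {ENO, LOCATION}, so PNO is prime. LOCATION → PNO therefore passes the 3NF test but fails BCNF, because LOCATION is not a superkey.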


Exercise
Given the following relation and its functional
dependencies, normalize it up to 3NF.
• STUDENT (Student#, Module#, S_name, Age, {Address
(St.#, City, Zip)}, M_title, Inst_code, Inst_name, Sem,
Grade)
– Student# → S_name, Age, Address
– Module# → M_title, Inst_code
– Inst_code → Inst_name
– Student#, Module# → Grade
Exercise
• 1NF:
STUDENT (Student#, Module#, S_name, Age, M_title,
Inst_code, Inst_name, Sem, Grade)
ADDRESS (Student#, Zip, St.#, City)

• 2NF:
STUDENT (Student#, S_name, Age)
ADDRESS (Student#, Zip, St.#, City)
MODULE (Module#, M_title, Inst_code, Inst_name, Sem)
STUDY (Student#, Module#, Grade)
Exercise
• 3NF:
STUDENT (Student#, S_name, Age)
ADDRESS (Student#, Zip, St.#, City)
MODULE (Module#, M_title, Inst_code, Sem)
INSTRUCTOR (Inst_code, Inst_name)
STUDY (Student#, Module#, Grade)
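The reasoning behind the 2NF and 3NF steps can be sketched by classifying each FD of the 1NF STUDENT relation against its key {Student#, Module#}. The sketch assumes this is the only key and that Address has already been moved out; the function name `classify` is hypothetical, and the simple rule it uses is only adequate for this exercise.

```python
KEY = frozenset({"Student#", "Module#"})

# FDs of the exercise, with Address already moved to its own relation.
FDS = [
    (frozenset({"Student#"}), {"S_name", "Age"}),
    (frozenset({"Module#"}), {"M_title", "Inst_code"}),
    (frozenset({"Inst_code"}), {"Inst_name"}),
    (KEY, {"Grade"}),
]

def classify(lhs, key):
    """Label an FD by how its determinant relates to the (single) key."""
    if key <= lhs:
        return "key"         # determined by the whole key: no violation
    if lhs < key:
        return "partial"     # part of the key: removed in the 2NF step
    return "transitive"      # non-key determinant: removed in the 3NF step

for lhs, rhs in FDS:
    print(sorted(lhs), "->", sorted(rhs), ":", classify(lhs, KEY))

labels = [classify(lhs, KEY) for lhs, _ in FDS]
```

The two partial dependencies explain why STUDENT and MODULE are split off in 2NF, and the transitive dependency Inst_code → Inst_name is what the 3NF step has to separate from MODULE.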
