7. Normalization
Database Systems
Lecture 7
Normalization
Literature
▪ The material presented here is based on:
Management
Step 2: Normalization:
“Improving” the design
Relational Design Principles
• Relations should have semantic unity
• Information repetition should be avoided
– Anomalies: insertion, deletion, modification
• Avoid null values as much as possible
– Difficulties with interpretation
• don’t know, don’t care, known but unavailable, does not apply
• Specification of joins
– Spurious joins
• A spurious tuple is a record produced by a join of two or more
tables where the joining fields are neither primary nor foreign keys
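A minimal Python sketch of how spurious tuples arise. The relation EMP_PROJ(ENAME, PNAME, CITY) and its rows are hypothetical and not taken from the lecture; the point is only that joining two projections on CITY, which is neither a primary nor a foreign key, produces tuples that were never in the original relation.

```python
# Hypothetical relation EMP_PROJ(ENAME, PNAME, CITY); rows are illustrative only.
original = {
    ("Smith", "Database", "Paris"),
    ("Jones", "Network", "Paris"),
}

# Decompose into two projections that share only CITY (not a key of either).
emp_city = {(ename, city) for ename, _, city in original}
proj_city = {(pname, city) for _, pname, city in original}

# Natural join of the two projections on CITY.
joined = {
    (ename, pname, c1)
    for ename, c1 in emp_city
    for pname, c2 in proj_city
    if c1 == c2
}

# Tuples in the join result that were never stored are spurious.
spurious = joined - original
print(sorted(spurious))
```

Here the join returns four tuples for two stored facts, so the decomposition is not lossless.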
Redundancy
• The TITLE, SALARY, BUDGET attribute values
are repeated for each project that the
engineer is involved in.
– Waste of space
– Complicates updates
Insertion Anomaly
• It is difficult (impossible?) to store information
about a new project until an employee is
assigned to it. Why?
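One way to see the answer is to try the insert. The sketch below uses Python's sqlite3 module and assumes a single table emp_proj whose primary key is (eno, pno); the table name, column names, and values are hypothetical. Because the employee number is part of the key, it cannot be null, so a project without an employee has no legal row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed single-table design: the key combines employee and project numbers.
conn.execute("""
    CREATE TABLE emp_proj (
        eno    TEXT NOT NULL,
        pno    TEXT NOT NULL,
        title  TEXT,
        salary INTEGER,
        budget INTEGER,
        PRIMARY KEY (eno, pno)
    )
""")

# Try to record a new project that has no employee yet.
try:
    conn.execute(
        "INSERT INTO emp_proj (eno, pno, budget) VALUES (NULL, 'P5', 100000)"
    )
    inserted = True
except sqlite3.IntegrityError as exc:
    inserted = False
    print("rejected:", exc)

row_count = conn.execute("SELECT COUNT(*) FROM emp_proj").fetchone()[0]
```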
Deletion Anomaly
• If an engineer who is the only employee on a project leaves
the company, his personal information cannot be deleted without
also losing the information about that project.
• May have to delete many tuples.
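A small Python sketch of the deletion anomaly, assuming one unnormalized relation (ENO, TITLE, PNO, BUDGET). The attribute ENO and all rows are hypothetical.

```python
# Hypothetical rows of a single relation (ENO, TITLE, PNO, BUDGET).
emp_proj = [
    ("E1", "Elect. Eng.", "P1", 150000),
    ("E2", "Syst. Anal.", "P1", 150000),
    ("E3", "Mech. Eng.", "P2", 135000),  # E3 is the only employee on P2
]

def projects(rows):
    """Set of project numbers that still appear in the relation."""
    return {pno for _, _, pno, _ in rows}

# E3 leaves the company: remove every tuple that mentions E3.
after_delete = [row for row in emp_proj if row[0] != "E3"]

# Projects that disappeared together with the employee.
lost = projects(emp_proj) - projects(after_delete)
print(lost)
```

Deleting the employee also removes the only record of P2 and its budget.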
Modification Anomaly
• If any attribute of project (say BUDGET of P1) is
modified, all the tuples for all employees who work on
that project need to be modified.
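The effect can be sketched in Python. The rows and attribute names below are hypothetical; the sketch first applies a careless update that touches only one tuple of P1, then the correct update that must visit every tuple of P1.

```python
# Hypothetical rows: the budget of P1 is stored once per assigned employee.
emp_proj = [
    {"eno": "E1", "pno": "P1", "budget": 150000},
    {"eno": "E2", "pno": "P1", "budget": 150000},
    {"eno": "E2", "pno": "P2", "budget": 135000},
]

def budgets_for(rows, pno):
    """Distinct budget values recorded for one project."""
    return {r["budget"] for r in rows if r["pno"] == pno}

# Careless update: only the first tuple of P1 is changed.
for r in emp_proj:
    if r["pno"] == "P1":
        r["budget"] = 200000
        break
inconsistent = budgets_for(emp_proj, "P1")  # two different budgets for P1

# Correct update: every tuple of P1 has to be modified.
updated = 0
for r in emp_proj:
    if r["pno"] == "P1":
        r["budget"] = 200000
        updated += 1
consistent = budgets_for(emp_proj, "P1")
```

With the budget stored in a separate project relation, the same change would touch exactly one tuple.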
What to do?
• Take each relation individually and “improve” it in
terms of the desired characteristics
– Normal forms
• Atomic values (1NF)
• Keys and dependencies (2NF, 3NF, BCNF)
• Normalization
– Normalization is a process of concept separation which
applies a top-down methodology for producing a schema
by successive refinements and decompositions.
– Do not combine unrelated sets of facts in one table; each
relation should contain an independent set of facts.
Normal Forms
Functional Dependence
• FDs and keys are used to define normal forms for relations
• FDs are constraints that are derived from the meaning and
interrelationships of the data attributes
– TITLE → SALARY
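A minimal sketch of checking an FD against a relation instance. The helper name fd_holds, the attribute ENO, and the sample rows are assumptions made for illustration. Note that an instance can only refute an FD; whether the FD really holds follows from the meaning of the attributes, not from the current data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores val the first time key is seen, then returns the stored value.
        if seen.setdefault(key, val) != val:
            return False
    return True

# Hypothetical instance.
rows = [
    {"ENO": "E1", "TITLE": "Elect. Eng.", "SALARY": 40000},
    {"ENO": "E2", "TITLE": "Syst. Anal.", "SALARY": 34000},
    {"ENO": "E3", "TITLE": "Elect. Eng.", "SALARY": 40000},
]

print(fd_holds(rows, ["TITLE"], ["SALARY"]))  # True
print(fd_holds(rows, ["SALARY"], ["ENO"]))    # False
```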
Examples of FD constraints
• An FD is a property of the attributes in the schema R
• Transitive dependency
– Transitive dependency is a condition where A, B, and C are attributes
of a relation such that if A → B and B → C, then C is transitively
dependent on A via B (provided that A is not functionally dependent
on B or C)
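The definition above can be sketched with attribute closures. The function names and the FD set (ENO → TITLE, TITLE → SALARY) are assumptions for illustration; ENO does not appear in the lecture text.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_transitive(a, b, c, fds):
    """True if C depends on A via B and A is not determined by B or by C."""
    return (
        b <= closure(a, fds)
        and c <= closure(b, fds)
        and not a <= closure(b, fds)
        and not a <= closure(c, fds)
    )

# Assumed FDs: ENO -> TITLE and TITLE -> SALARY.
fds = [({"ENO"}, {"TITLE"}), ({"TITLE"}, {"SALARY"})]

print(sorted(closure({"ENO"}, fds)))
print(is_transitive({"ENO"}, {"TITLE"}, {"SALARY"}, fds))
```

Under these assumptions SALARY is transitively dependent on ENO via TITLE.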
First Normal Form
• All attribute values are atomic
• A 1NF relation cannot have an attribute value that is:
– a set of values (multivalued attribute)
– a tuple of values (nested relation)
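A minimal sketch of bringing a relation with a multivalued attribute into 1NF by emitting one row per value. The helper to_1nf and the sample rows are hypothetical.

```python
# Hypothetical non-1NF rows: "Module#" holds a set of values per student.
unnormalized = [
    {"Student#": "S1", "S_name": "Ann", "Module#": ["M1", "M2"]},
    {"Student#": "S2", "S_name": "Bob", "Module#": ["M1"]},
]

def to_1nf(rows, multivalued):
    """Emit one row per value of the multivalued attribute."""
    flat = []
    for row in rows:
        for value in row[multivalued]:
            new_row = {k: v for k, v in row.items() if k != multivalued}
            new_row[multivalued] = value
            flat.append(new_row)
    return flat

flat = to_1nf(unnormalized, "Module#")
for row in flat:
    print(row)
```

Every value is now atomic, but S_name is repeated for each module, which is the kind of redundancy the higher normal forms address. A row whose list is empty would disappear in this sketch.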
• 2NF:
STUDENT (Student#, S_name, Age)
ADDRESS (Student#, Zip, St.#, City)
MODULE (Module#, M_title, Inst_code, Inst_name, Sem)
STUDY (Student#, Module#, Grade)
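A sketch of why the 2NF step separates STUDENT and MODULE from STUDY. It assumes a composite key (Student#, Module#) and a set of FDs inferred from the relations above; neither is stated explicitly in the lecture, and ADDRESS is left out for brevity. The helper looks for non-key attributes determined by only part of the key.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, fds):
    """Map each proper non-empty subset of key to the non-key attributes it determines."""
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds) - set(key)
            if determined:
                found[subset] = determined
    return found

# Assumed FDs, read off the keys of the 2NF relations.
fds = [
    ({"Student#"}, {"S_name", "Age"}),
    ({"Module#"}, {"M_title", "Inst_code", "Sem"}),
    ({"Inst_code"}, {"Inst_name"}),
    ({"Student#", "Module#"}, {"Grade"}),
]
partial = partial_dependencies({"Student#", "Module#"}, fds)
for subset, attrs in partial.items():
    print(subset, "->", sorted(attrs))
```

Under these assumptions only Grade depends on the whole key, so it alone stays in STUDY.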
Exercise
• 3NF:
STUDENT (Student#, S_name, Age)
ADDRESS (Student#, Zip, St.#, City)
MODULE (Module#, M_title, Inst_code, Sem)
INSTRUCTOR (Inst_code, Inst_name)
STUDY (Student#, Module#, Grade)
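A sketch of the 3NF step for MODULE, assuming the transitive dependency Module# → Inst_code → Inst_name and assuming the instructor name moves to a separate relation, called INSTRUCTOR here as a hypothetical name. The sample rows are illustrative. The check rejoins the two projections on Inst_code and compares the result with the original rows.

```python
# Hypothetical rows of the 2NF relation MODULE.
module_2nf = [
    {"Module#": "M1", "M_title": "Databases", "Inst_code": "I1", "Inst_name": "Lee", "Sem": 1},
    {"Module#": "M2", "M_title": "Networks", "Inst_code": "I1", "Inst_name": "Lee", "Sem": 2},
    {"Module#": "M3", "M_title": "Algorithms", "Inst_code": "I2", "Inst_name": "Kim", "Sem": 1},
]

def project(rows, attrs):
    """Relational projection with duplicate elimination."""
    return {tuple((a, r[a]) for a in attrs) for r in rows}

module_3nf = project(module_2nf, ["Module#", "M_title", "Inst_code", "Sem"])
instructor = project(module_2nf, ["Inst_code", "Inst_name"])

# Natural join of the two projections on Inst_code.
rejoined = set()
for m in module_3nf:
    for i in instructor:
        m_d, i_d = dict(m), dict(i)
        if m_d["Inst_code"] == i_d["Inst_code"]:
            rejoined.add(frozenset({**m_d, **i_d}.items()))

original = {frozenset(r.items()) for r in module_2nf}
lossless = rejoined == original
print(len(instructor), lossless)
```

Each instructor name is now stored once, and the join on Inst_code (a key of INSTRUCTOR) reproduces the original tuples without spurious ones.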