Data Normalization
Objectives
Why Normalization?
Definitions
Keys
Primary key
Every table (object) must have a primary key
Uniquely identifies a row (one-to-one)
Concatenated (or composite) key
Multiple columns needed for primary key
Identify repeating relationships (1 : M or M : N)
Key columns are underlined
First step
Collect user documents
Identify possible keys: unique or repeating relationships
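A minimal sketch of how a DBMS enforces a primary key and a concatenated key. It assumes Python's standard sqlite3 module, with SQLite standing in for the DBMS (the chapter itself targets Access, Oracle, and SQL Server). The Item and OrderItem tables follow the order system used in this chapter; the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database; Item and OrderItem mirror the chapter's order system.
conn = sqlite3.connect(":memory:")

# Single-column primary key: each ItemID identifies exactly one row.
conn.execute("CREATE TABLE Item (ItemID INTEGER PRIMARY KEY, Description TEXT)")

# Concatenated (composite) key: OrderID and ItemID may each repeat on their own,
# but each (OrderID, ItemID) pair may appear only once.
conn.execute("""
    CREATE TABLE OrderItem (
        OrderID  INTEGER,
        ItemID   INTEGER,
        Quantity INTEGER,
        PRIMARY KEY (OrderID, ItemID)
    )""")

conn.execute("INSERT INTO Item VALUES (15, 'Air Tank')")
conn.executemany("INSERT INTO OrderItem VALUES (?, ?, ?)",
                 [(1, 15, 2), (1, 27, 1), (2, 15, 4)])  # hypothetical orders


def accepted(sql, params):
    """Return True if the DBMS accepts the row, False if a key constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


dup_item = accepted("INSERT INTO Item VALUES (?, ?)", (15, "Regulator"))
dup_pair = accepted("INSERT INTO OrderItem VALUES (?, ?, ?)", (1, 15, 9))
new_pair = accepted("INSERT INTO OrderItem VALUES (?, ?, ?)", (2, 27, 1))
print(dup_item, dup_pair, new_pair)  # False False True
```

The repeated ItemID and the repeated (1, 15) pair are both rejected, while (2, 27) is accepted even though 2 and 27 each already appear in the table.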
Notation
Identifying Key Columns
Surrogate Keys
Common Order System
[Class diagram: Customer (1) to (*) Order; Salesperson (1) to (*) Order; Order (1) to (*) OrderItem; Item (1) to (*) OrderItem]
Customer(CustomerID, Name, Address, City, Phone)
Salesperson(EmployeeID, Name, Commission, DateHired)
Order(OrderID, OrderDate, CustomerID, EmployeeID)
OrderItem(OrderID, ItemID, Quantity)
Item(ItemID, Description, ListPrice)
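The five tables above can be sketched as SQL DDL and then joined back together to rebuild an order document. This assumes Python's sqlite3 module and SQLite's type names; the sample rows (customer 1, salesperson 10, order 100) are hypothetical. Note that ORDER is an SQL keyword, so the Order table name has to be quoted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE Customer    (CustomerID INTEGER PRIMARY KEY, Name TEXT, Address TEXT,
                          City TEXT, Phone TEXT);
CREATE TABLE Salesperson (EmployeeID INTEGER PRIMARY KEY, Name TEXT,
                          Commission REAL, DateHired TEXT);
CREATE TABLE Item        (ItemID INTEGER PRIMARY KEY, Description TEXT, ListPrice REAL);
-- ORDER is an SQL keyword, so the table name must be quoted.
CREATE TABLE "Order"     (OrderID INTEGER PRIMARY KEY, OrderDate TEXT,
                          CustomerID INTEGER REFERENCES Customer(CustomerID),
                          EmployeeID INTEGER REFERENCES Salesperson(EmployeeID));
CREATE TABLE OrderItem   (OrderID  INTEGER REFERENCES "Order"(OrderID),
                          ItemID   INTEGER REFERENCES Item(ItemID),
                          Quantity INTEGER,
                          PRIMARY KEY (OrderID, ItemID));
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO Customer VALUES (1, 'Jones', '111 Elm', 'Chicago', NULL)")
conn.execute("INSERT INTO Salesperson VALUES (10, 'Smith', 0.05, '2005-03-01')")
conn.executemany("INSERT INTO Item VALUES (?, ?, ?)",
                 [(15, "Air Tank", 192.00), (27, "Regulator", 251.00)])
conn.execute('INSERT INTO "Order" VALUES (?, ?, ?, ?)', (100, "2004-07-15", 1, 10))
conn.executemany("INSERT INTO OrderItem VALUES (?, ?, ?)", [(100, 15, 2), (100, 27, 1)])

# Reassemble the order document by joining through the key columns.
rows = conn.execute("""
    SELECT o.OrderID, c.Name, i.Description, oi.Quantity * i.ListPrice AS Value
    FROM "Order" AS o
    JOIN Customer  AS c  ON c.CustomerID = o.CustomerID
    JOIN OrderItem AS oi ON oi.OrderID   = o.OrderID
    JOIN Item      AS i  ON i.ItemID     = oi.ItemID
    ORDER BY i.ItemID""").fetchall()
for row in rows:
    print(row)
```

Each fact is stored once (the customer name in Customer, the price in Item), and the join recomputes the line values on demand.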
Database Normalization Rules
Atomic Values for Phone Numbers
Repeating Values for Phone Numbers
Repeating Values for Phone Numbers
Simple Form
Initial Design
Sample Database for Sales
Sale ID, Date
Customer: First Name, Last Name, Address, City, State, ZIPCode
Items (one line per item): ItemID, Description, List Price, Quantity, QOH, Value
Total
Initial Objects
Initial Form Evaluation
SaleForm(SaleID, SaleDate, CustomerID, FirstName, LastName,
Address, City, State, ZIPCode,
(ItemID, Description, ListPrice, Quantity, QuantityOnHand) )
Identify potential keys.
Identify repeating groups.
[Figure: the sales form annotated with its potential keys (such as the Sale ID) and its repeating group of item lines]
Problems with Repeating Sections
Annotations: Repeating section, Duplication, Not atomic
SaleID  Date  CID    FirstName  LastName  Address  City     State  ZIP    ItemID  Description  ListPrice  Quantity  QOH
11851   7/15  15023  Mary       Jones     111 Elm  Chicago  IL     60601  15      Air Tank     192.00     2         15
                                                                          27      Regulator    251.00     1         5
                                                                          32      Mask 1557    65.00      1         6
11852   7/15  63478  Miguel     Sanchez   222 Oro  Madrid                 15      Air Tank     192.00     4         15
                                                                          33      Mask 2020    91.00      1         3
11853   7/16  15023  Mary       Jones     111 Elm  Chicago  IL     60601  41      Snorkel 71   44.00      2         15
                                                                          75      Wet suit-S   215.00     1         3
11854   7/17  94552  Madeline   O’Reilly  333 Tam  Dublin                 75      Wet suit-S   215.00     2         3
                                                                          32      Mask 1557    65.00      1         6
                                                                          57      Snorkel 95   83.00      1         17
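Removing the repeating section to reach first normal form amounts to copying the parent key onto every item line. The sketch below assumes the sample data is held as Python dictionaries and keeps only SaleID, Date, CID, ItemID, and Quantity for brevity; the function name to_first_normal_form is hypothetical.

```python
# Each sale carries a repeating group of (ItemID, Quantity) pairs.
sales = [
    {"SaleID": 11851, "Date": "7/15", "CID": 15023, "Items": [(15, 2), (27, 1), (32, 1)]},
    {"SaleID": 11852, "Date": "7/15", "CID": 63478, "Items": [(15, 4), (33, 1)]},
    {"SaleID": 11853, "Date": "7/16", "CID": 15023, "Items": [(41, 2), (75, 1)]},
    {"SaleID": 11854, "Date": "7/17", "CID": 94552, "Items": [(75, 2), (32, 1), (57, 1)]},
]


def to_first_normal_form(sales):
    """Repeat the sale-level columns on every item row so each cell holds one value."""
    rows = []
    for sale in sales:
        for item_id, qty in sale["Items"]:
            rows.append((sale["SaleID"], sale["Date"], sale["CID"], item_id, qty))
    return rows


rows = to_first_normal_form(sales)
keys = {(sale_id, item_id) for sale_id, _, _, item_id, _ in rows}
print(len(rows), len(keys))  # 10 10: (SaleID, ItemID) identifies every row

# The flat table is in 1NF, but sale data now repeats: customer 15023 appears 5 times.
print(sum(1 for row in rows if row[2] == 15023))  # 5
```

The composite key (SaleID, ItemID) falls out of the flattening, and the duplicated sale and customer columns are exactly what the higher normal forms remove.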
First Normal Form
Current Design
Multiple Repeating: Independent Groups
Nested Repeating Sections
First Normal Form Problems (Data)
Second Normal Form Definition
Second Normal Form Example
Second Normal Form Example (Data)
Second Normal Form in DB Design
Second Normal Form Problems (Data)
Duplication
Third Normal Form Definition
Depend on SaleID
Depend on CustomerID
Third Normal Form Example
Third Normal Form Tables
Third Normal Form Tables
3NF Rules/Procedure
Checking Your Work (Quality Control)
Boyce-Codd Normal Form (BCNF)
Employee-Specialty(EID, Specialty, Manager)
Business rules.
Each employee may have many specialties.
Each specialty has many managers.
Employee has only one manager for each specialty.
Each manager has only one specialty.
Employee(EID, Manager)
Manager(Manager, Specialty)
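The business rules above translate into two functional dependencies: (EID, Specialty) determines Manager, and Manager determines Specialty. The sketch below, an assumption-laden illustration rather than a standard library routine, computes attribute closures and flags any attribute set that determines part of a relation without being a superkey. It confirms that only Employee-Specialty violates BCNF; the helper names closure and bcnf_violations are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Every attribute functionally determined by attrs under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Attribute sets that determine part, but not all, of the relation.

    Such a determinant is not a superkey, so the relation is not in BCNF.
    """
    violations = []
    attrs = sorted(relation)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            implied = closure(combo, fds) & relation
            if implied != set(combo) and implied != relation:
                violations.append((combo, tuple(sorted(implied - set(combo)))))
    return violations


# FDs taken from the business rules above.
fds = [
    ({"EID", "Specialty"}, {"Manager"}),  # one manager per employee per specialty
    ({"Manager"}, {"Specialty"}),         # each manager has only one specialty
]

print(bcnf_violations({"EID", "Specialty", "Manager"}, fds))  # [(('Manager',), ('Specialty',))]
print(bcnf_violations({"EID", "Manager"}, fds))               # []
print(bcnf_violations({"Manager", "Specialty"}, fds))         # []
```

Checking every subset is exponential in the number of attributes, which is acceptable for the small relations in this chapter.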
Fourth Normal Form (Keys)
Business rules.
Each employee has many specialties.
Each employee has many tools.
Tools and specialties are unrelated.
EmployeeSpecialty(EID, Specialty)
EmployeeTools(EID, ToolID)
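Because tools and specialties are unrelated, a single table holding both would have to pair every specialty with every tool. The sketch below uses hypothetical rows for one employee ("E1") to show that the two 4NF tables lose nothing and that recording one new tool costs one row instead of one row per specialty.

```python
# Hypothetical rows for a single employee, E1.
specialties = {("E1", "Welding"), ("E1", "Plumbing"), ("E1", "Wiring")}  # EmployeeSpecialty
tools = {("E1", "T10"), ("E1", "T20")}                                 # EmployeeTools


def join_on_eid(spec_rows, tool_rows):
    """Natural join of EmployeeSpecialty and EmployeeTools on EID."""
    return {(eid, spec, tool)
            for eid, spec in spec_rows
            for eid2, tool in tool_rows
            if eid == eid2}


# A single EmpSpecTools(EID, Specialty, ToolID) table must pair every specialty
# with every tool, because the two facts are unrelated.
combined = join_on_eid(specialties, tools)

# Projecting the combined table back onto the two 4NF tables loses nothing.
assert {(eid, spec) for eid, spec, _ in combined} == specialties
assert {(eid, tool) for eid, _, tool in combined} == tools

# Recording one new tool: 1 new row in EmployeeTools, 3 new rows in the combined table.
with_new_tool = join_on_eid(specialties, tools | {("E1", "T30")})
print(len(combined), len(with_new_tool) - len(combined))  # 6 3
```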
Domain-Key Normal Form (DKNF)
Need to add:
RequiredTools(TaskID, ToolID)
No Hidden Dependencies
Data Rules and Integrity
Sale
Simple business rules
SaleID SaleDate CID …
Limits on data ranges 1173 Jan-04 321
Price > 0 1174 Jan-05 938
Salary < 100,000 1185 Jan-08 337
DateHired > 1/12/2005 1190 Jan-09 321
1192 Jan-09 776
Choosing from a set
Gender = M, F, Unknown
Jurisdiction=City, County, State, Federal No data for this
customer yet!
Referential Integrity
Foreign key values in one table must exist
Customer
in the master table.
CID Name Phone …
Sale(SaleID, SaleDate, CID,…) 321 Jones 9983-
CID must exist in the customer table. 337 Sanchez 7738-
938 Carson 8738-
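These rules map onto CHECK and FOREIGN KEY constraints. The sketch below assumes SQLite through Python's sqlite3 module; the Item and Employee tables and their rows are hypothetical, while the Sale and Customer rows come from the sample data above (names only, since the phone numbers are truncated in the source). Sale 1192 is rejected because customer 776 does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys unenforced by default
conn.executescript("""
CREATE TABLE Customer (CID INTEGER PRIMARY KEY, Name TEXT);
-- Referential integrity: every Sale.CID must match a Customer.CID.
CREATE TABLE Sale (SaleID INTEGER PRIMARY KEY, SaleDate TEXT,
                   CID INTEGER REFERENCES Customer(CID));
-- Limits on data ranges.
CREATE TABLE Item (ItemID INTEGER PRIMARY KEY, Price REAL CHECK (Price > 0));
-- A range limit plus a choice from a set.
CREATE TABLE Employee (EID    INTEGER PRIMARY KEY,
                       Salary REAL CHECK (Salary < 100000),
                       Gender TEXT CHECK (Gender IN ('M', 'F', 'Unknown')));
""")
conn.executemany("INSERT INTO Customer VALUES (?, ?)",
                 [(321, "Jones"), (337, "Sanchez"), (938, "Carson")])
conn.executemany("INSERT INTO Sale VALUES (?, ?, ?)",
                 [(1173, "Jan-04", 321), (1174, "Jan-05", 938),
                  (1185, "Jan-08", 337), (1190, "Jan-09", 321)])


def accepted(sql, params):
    """True if the row satisfies every constraint, False if the DBMS rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    "sale for unknown customer 776": accepted("INSERT INTO Sale VALUES (?, ?, ?)",
                                              (1192, "Jan-09", 776)),
    "zero price": accepted("INSERT INTO Item VALUES (?, ?)", (1, 0)),
    "salary too high": accepted("INSERT INTO Employee VALUES (?, ?, ?)", (1, 120000, "F")),
    "gender not in set": accepted("INSERT INTO Employee VALUES (?, ?, ?)", (2, 50000, "X")),
    "gender missing": accepted("INSERT INTO Employee VALUES (?, ?, ?)", (3, 50000, None)),
}
for rule, ok in results.items():
    print(f"{rule}: {'accepted' if ok else 'rejected'}")
```

The last case is accepted: a CHECK condition that evaluates to NULL does not count as a violation, so a missing Gender passes unless the column is also declared NOT NULL.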
SQL Foreign Key (Oracle, SQL Server)
Effect of Business Rules
Business Rules 1
Business Rules 2
Business Rules 2: Normalized
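Converting a Class Diagram to Normalized Tables
[Class diagram: Supplier (1) to (*) Purchase Order (*) to (1) Employee; Employee (*) to (1) Manager, a recursive association on Employee; Purchase Order (*) to (*) Item]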
One-to-Many Relationships
[Diagram: Supplier (1) to (*) Purchase Order (*) to (1) Employee]
One-to-Many Sample Data
[Supplier and Employee tables, linked to Purchase Order through SID and EID]
Purchase Order
POID   Date       SID   EID
22234  9-9-2004   5676  221
22235  9-10-2004  5676  554
22236  9-10-2004  7831  221
22237  9-11-2004  8872  335
Many-to-Many Relationships
[Diagram: the many-to-many association Purchase Order (*) to (*) Item is split by POItem into Purchase Order (1) to (*) POItem (*) to (1) Item]
PurchaseOrder(POID, Date, SID, EID)
POItem(POID, ItemID, Quantity, PricePaid)
Item(ItemID, Description, ListPrice)
Many-to-Many Sample Data
Purchase Order
POID   Date  SID   EID
22234  9/9   5676  221
22235  9/10  5676  554
22236  9/10  7831  221
22237  9/11  8872  335

POItem
POID   ItemID  Quantity  Price
22234  444098  3         2.00
22234  444185  1         25.00
22235  444185  4         24.00
22236  555828  10        150.00
22236  555982  1         5800.00

Item
ItemID  Description     ListPrice
444098  Staples         2.00
444185  Paper           28.00
555828  Wire            158.00
555982  Sheet steel     5928.00
888371  Brake assembly  152.00
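With POItem in place, the many-to-many association can be navigated in both directions. The sketch below loads the sample data into SQLite via Python's sqlite3 module (an assumption; the column name PricePaid follows the POItem definition) and computes the amount paid on each purchase order, then lists every order that includes Paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PurchaseOrder (POID INTEGER PRIMARY KEY, "Date" TEXT, SID INTEGER, EID INTEGER);
CREATE TABLE Item (ItemID INTEGER PRIMARY KEY, Description TEXT, ListPrice REAL);
-- The intermediate table holds one row per item on each order.
CREATE TABLE POItem (
    POID      INTEGER REFERENCES PurchaseOrder(POID),
    ItemID    INTEGER REFERENCES Item(ItemID),
    Quantity  INTEGER,
    PricePaid REAL,
    PRIMARY KEY (POID, ItemID)
);
""")
conn.executemany("INSERT INTO PurchaseOrder VALUES (?, ?, ?, ?)",
                 [(22234, "9/9", 5676, 221), (22235, "9/10", 5676, 554),
                  (22236, "9/10", 7831, 221), (22237, "9/11", 8872, 335)])
conn.executemany("INSERT INTO Item VALUES (?, ?, ?)",
                 [(444098, "Staples", 2.00), (444185, "Paper", 28.00),
                  (555828, "Wire", 158.00), (555982, "Sheet steel", 5928.00),
                  (888371, "Brake assembly", 152.00)])
conn.executemany("INSERT INTO POItem VALUES (?, ?, ?, ?)",
                 [(22234, 444098, 3, 2.00), (22234, 444185, 1, 25.00),
                  (22235, 444185, 4, 24.00), (22236, 555828, 10, 150.00),
                  (22236, 555982, 1, 5800.00)])

# One direction: the total paid on each purchase order (None if it has no items yet).
totals = conn.execute("""
    SELECT po.POID, SUM(li.Quantity * li.PricePaid)
    FROM PurchaseOrder AS po
    LEFT JOIN POItem AS li ON li.POID = po.POID
    GROUP BY po.POID
    ORDER BY po.POID""").fetchall()

# Other direction: every purchase order that includes a given item.
paper_orders = conn.execute("""
    SELECT li.POID
    FROM POItem AS li
    JOIN Item AS i ON i.ItemID = li.ItemID
    WHERE i.Description = 'Paper'
    ORDER BY li.POID""").fetchall()

print(totals)        # [(22234, 31.0), (22235, 96.0), (22236, 7300.0), (22237, None)]
print(paper_orders)  # [(22234,), (22235,)]
```

Storing PricePaid separately from ListPrice matters here: Paper was bought at 25.00 and 24.00 even though its list price is 28.00.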
N-ary Associations
[Diagram: Employee (1), Component (1), and Product (1) each linked to (*) Assembly]
Employee: Name, ...
Component: CompID, Type, Name
Product: ProductID, Type, Name
Assembly: EmployeeID, CompID, ProductID
Generalization or Subtypes
[Diagram: Item and its subtypes]

Subtypes Sample Data
Item
ItemID  Description     ListPrice
444098  Staples         2.00
444185  Paper           28.00
555828  Wire            158.00
555982  Sheet steel     5928.00
888371  Brake assembly  152.00

RawMaterials
ItemID  Weight  StrengthRating
555828  57      2000
555982  2578    8321

AssembledComponents
ItemID  Width  Height  Depth
888371  1      3       1.5

OfficeSupplies
ItemID  BulkQuantity  Discount
444098  20            10%
444185  10            15%
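In these tables each subtype reuses the supertype key, so ItemID is both the primary key of a subtype table and a foreign key back to Item. The sketch below assumes SQLite via Python's sqlite3 module and stores the discounts as fractions (10% as 0.10), which is an assumption of this sketch. It joins common and subtype attributes and then reports the subtype of each item.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Item (ItemID INTEGER PRIMARY KEY, Description TEXT, ListPrice REAL);
-- Each subtype table reuses the supertype key: ItemID is its primary key
-- and also a foreign key back to Item.
CREATE TABLE RawMaterials (ItemID INTEGER PRIMARY KEY REFERENCES Item(ItemID),
                           Weight INTEGER, StrengthRating INTEGER);
CREATE TABLE AssembledComponents (ItemID INTEGER PRIMARY KEY REFERENCES Item(ItemID),
                                  Width REAL, Height REAL, Depth REAL);
CREATE TABLE OfficeSupplies (ItemID INTEGER PRIMARY KEY REFERENCES Item(ItemID),
                             BulkQuantity INTEGER, Discount REAL);
""")
conn.executemany("INSERT INTO Item VALUES (?, ?, ?)",
                 [(444098, "Staples", 2.00), (444185, "Paper", 28.00),
                  (555828, "Wire", 158.00), (555982, "Sheet steel", 5928.00),
                  (888371, "Brake assembly", 152.00)])
conn.executemany("INSERT INTO RawMaterials VALUES (?, ?, ?)",
                 [(555828, 57, 2000), (555982, 2578, 8321)])
conn.execute("INSERT INTO AssembledComponents VALUES (?, ?, ?, ?)", (888371, 1, 3, 1.5))
# Discounts stored as fractions (10% -> 0.10), an assumption of this sketch.
conn.executemany("INSERT INTO OfficeSupplies VALUES (?, ?, ?)",
                 [(444098, 20, 0.10), (444185, 10, 0.15)])

# Common attributes come from Item; subtype attributes come from the matching table.
raw = conn.execute("""
    SELECT i.Description, r.Weight
    FROM Item AS i JOIN RawMaterials AS r ON r.ItemID = i.ItemID
    ORDER BY i.ItemID""").fetchall()

# Which subtype does each item belong to?
kinds = conn.execute("""
    SELECT i.ItemID,
           CASE WHEN r.ItemID IS NOT NULL THEN 'RawMaterials'
                WHEN a.ItemID IS NOT NULL THEN 'AssembledComponents'
                WHEN o.ItemID IS NOT NULL THEN 'OfficeSupplies'
           END
    FROM Item AS i
    LEFT JOIN RawMaterials        AS r ON r.ItemID = i.ItemID
    LEFT JOIN AssembledComponents AS a ON a.ItemID = i.ItemID
    LEFT JOIN OfficeSupplies      AS o ON o.ItemID = i.ItemID
    ORDER BY i.ItemID""").fetchall()

print(raw)  # [('Wire', 57), ('Sheet steel', 2578)]
print(kinds)
```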
Composition
[Diagram: a Bicycle (Size, Model Type, …) is composed of Wheels, a Crank, and a Stem]
Bicycle: SerialNumber, ModelType, WheelID, CrankID, StemID, …
Components: ComponentID, Category, Description, Weight, Cost
Recursive Relationships
[Diagram: recursive association on Employee: many employees (*) report to one Manager (1), who is also an Employee]
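A recursive relationship becomes a foreign key that points back into the same table. The sketch below assumes SQLite via Python's sqlite3 module; the column name ManagerID and the four employees are hypothetical. A self-join pairs each employee with a manager, and a recursive query (WITH RECURSIVE) walks the chain of managers upward.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    CREATE TABLE Employee (
        EID       INTEGER PRIMARY KEY,
        Name      TEXT,
        ManagerID INTEGER REFERENCES Employee(EID)  -- refers back to the same table
    )""")
# Hypothetical staff: Ortiz manages Lee and Kim; Lee manages Park; Ortiz has no manager.
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)",
                 [(1, "Ortiz", None), (2, "Lee", 1), (3, "Kim", 1), (4, "Park", 2)])

# Self-join: the same table plays both roles, employee (e) and manager (m).
pairs = conn.execute("""
    SELECT e.Name, m.Name
    FROM Employee AS e
    LEFT JOIN Employee AS m ON m.EID = e.ManagerID
    ORDER BY e.EID""").fetchall()

# Recursive query: follow the chain of managers upward from Park (EID 4).
chain = conn.execute("""
    WITH RECURSIVE up_chain(EID, Name, ManagerID, Depth) AS (
        SELECT EID, Name, ManagerID, 0 FROM Employee WHERE EID = 4
        UNION ALL
        SELECT e.EID, e.Name, e.ManagerID, u.Depth + 1
        FROM Employee AS e JOIN up_chain AS u ON e.EID = u.ManagerID
    )
    SELECT Name FROM up_chain ORDER BY Depth""").fetchall()

print(pairs)  # [('Ortiz', None), ('Lee', 'Ortiz'), ('Kim', 'Ortiz'), ('Park', 'Lee')]
print(chain)  # [('Park',), ('Lee',), ('Ortiz',)]
```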
Normalization Examples
Possible topics
Auto repair
Auto sales
Department store
Hair stylist
HRM department
Law firm
Manufacturing
National Park Service
Personal stock portfolio
Pet shop
Restaurant
Social club
Sports team
Multiple Views & View Integration
The Pet Store: Sales Form
Sales: SaleID, Date
Customer: Name, Address, City, State, ZIP
Employee: EmployeeID, Name
Animal Sale: ID, Name, Category, Breed, DoB, Gender, Reg., Color, Donation, Group
Animal SubTotal
Merchandise Sale: Item, Description, Category, List Price, Sale Price, Quantity, Value
Merchandise Subtotal
Tax
Total
The Pet Store: Purchase Merchandise
Pet Store Normalization
Pet Store View Integration
Pet Store Class Diagram
Rolling Thunder Integration Example
Initial Tables for Bicycle Assembly
BicycleAssembly(
SerialNumber, Model, Construction, FrameSize, TopTube, ChainStay, HeadTube, SeatTube,
PaintID, PaintColor, ColorStyle, ColorList, CustomName, LetterStyle, EmpFrame, EmpPaint,
BuildDate, ShipDate,
(Tube, TubeType, TubeMaterial, TubeDescription),
(CompCategory, ComponentID, SubstID, ProdNumber, EmpInstall, DateInstall, Quantity, QOH) )
Rolling Thunder: Purchase Order
RT Purchase Order: Initial Tables
Rolling Thunder: Transactions
RT Transactions: Initial Tables
Rolling Thunder: Components
RT Components: Initial Tables
RT: Integrating Tables
RT Example: Integrated Tables
Rolling Thunder Tables
View Integration (FEMA Example 1)
Team Roster
Team# Date Formed Leader
Home Base Name Fax Phone
Response time (days) Address, C,S,Z Home phone
Team Members/Crew
ID Name Home phone Specialty DoB SSN Salary
Total Salary
This first form is kept for each team that can be called on to help in
emergencies.
View Integration (FEMA Example 2)
SubProblem Details
Sub Prob# Category Description Action Est. Cost
View Integration (FEMA Example 3)
View Integration (FEMA Example 3a)
View Integration (FEMA Example 4)
Task Completion Report Date
Disaster Name Disaster Rating HQ Phone
Total Expenses
Problem# Supervisor Date
SubProblem Team# Team Specialty CompletionStatus Comment Expenses
Total Expenses
View Integration (FEMA Example 4a)
DBMS Table Definition
Key Table Definition in Access
SQL Developer
Graphical Table Definition in SQL Server
SQL Table Definition
CREATE TABLE Animal
(
    AnimalID     INTEGER,
    Name         NVARCHAR2(50),
    Category     NVARCHAR2(50),
    Breed        NVARCHAR2(50),
    DateBorn     DATE,
    -- A CHECK condition that evaluates to unknown (NULL) also passes,
    -- so the explicit "Gender Is Null" test is optional.
    Gender       NVARCHAR2(50) CHECK (Gender='Male' Or
                 Gender='Female' Or Gender='Unknown' Or Gender Is Null),
    Registered   NVARCHAR2(50),
    Color        NVARCHAR2(50),
    Photo        BLOB,            -- LONG RAW is deprecated in Oracle; use BLOB instead
    ImageFile    NVARCHAR2(250),
    ImageHeight  INTEGER,
    ImageWidth   INTEGER,
    AdoptionID   INTEGER,
    Donation     NUMBER(10,2),
    CONSTRAINT pk_Animal PRIMARY KEY (AnimalID),
    -- The Breed and Category tables must be created before Animal.
    -- ON DELETE CASCADE: deleting a Breed or Category row also deletes its animals.
    CONSTRAINT fk_BreedAnimal FOREIGN KEY (Category,Breed)
        REFERENCES Breed(Category,Breed)
        ON DELETE CASCADE,
    CONSTRAINT fk_CategoryAnimal FOREIGN KEY (Category)
        REFERENCES Category(Category)
        ON DELETE CASCADE
);
Oracle Databases
For Oracle and SQL Server, it is best to create a text file that contains all of the SQL statements needed to create the tables.
It is usually easier to modify the text table definition.
The text file can be used to recreate the tables for backup or for transfer to another system.
To make major modifications to a table, you usually create a new table, copy the data from the old table, delete the old table, and rename the new one. Creating the new table is much easier with the text file definition.
Be sure to specify Primary Key and Foreign Key constraints.
Be sure to create tables in the correct order: any table referenced in a Foreign Key constraint must be created first. For example, create Customer before creating Order.
In Oracle, gathering optimizer statistics once all tables have been created can substantially improve performance:
Analyze table Animal compute statistics;
Current Oracle releases recommend the DBMS_STATS package instead of ANALYZE for optimizer statistics, for example:
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'ANIMAL');
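The rule about creating tables in foreign-key order is a topological sort of the "references" graph. A minimal sketch, assuming Python 3.9+ and its standard graphlib module; the dictionary below is hand-written for the chapter's order system rather than read from a real schema.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each table maps to the tables named in its FOREIGN KEY constraints
# (the chapter's order system; a real script would list every table).
references = {
    "Customer": set(),
    "Salesperson": set(),
    "Item": set(),
    "Order": {"Customer", "Salesperson"},
    "OrderItem": {"Order", "Item"},
}

# static_order() lists every table after all of the tables it references,
# and raises graphlib.CycleError if the references form a cycle.
create_order = list(TopologicalSorter(references).static_order())
drop_order = list(reversed(create_order))  # drop referencing tables first

print("CREATE order:", create_order)
print("DROP order:  ", drop_order)
```

Several valid orders usually exist (Customer, Salesperson, and Item can be created in any order), so the script should rely only on the dependency guarantee, not on one particular sequence.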
Data Volume
Data Volume Example
Appendix: Formal Definitions: Terms
Appendix: Functional Dependency
A functional dependency from X to Y holds when any two rows of data that have identical values for their X attributes also have identical values for their Y attributes:
If t1[X] = t2[X], then t1[Y] = t2[Y]
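The definition can be checked directly against a set of rows. The sketch below assumes rows are Python dictionaries and uses a few rows from the sales sample data earlier in the chapter; fd_holds is a hypothetical helper name. Keep in mind that sample data can only refute a dependency: a dependency that holds in these rows still has to be confirmed by the business rules.

```python
def fd_holds(rows, x, y):
    """True if X -> Y holds in rows: any two rows that agree on X also agree on Y."""
    y_for_x = {}
    for row in rows:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        # setdefault stores the first Y seen for this X and returns it afterwards
        if y_for_x.setdefault(x_val, y_val) != y_val:
            return False
    return True


# A few rows from the sales sample data earlier in the chapter.
COLUMNS = ("SaleID", "CID", "LastName", "ItemID", "ListPrice", "Quantity")
sales = [dict(zip(COLUMNS, values)) for values in [
    (11851, 15023, "Jones",    15, 192.00, 2),
    (11851, 15023, "Jones",    27, 251.00, 1),
    (11852, 63478, "Sanchez",  15, 192.00, 4),
    (11853, 15023, "Jones",    75, 215.00, 1),
    (11854, 94552, "O'Reilly", 75, 215.00, 2),
]]

print(fd_holds(sales, ["CID"], ["LastName"]))      # True
print(fd_holds(sales, ["ItemID"], ["ListPrice"]))  # True
print(fd_holds(sales, ["ItemID"], ["Quantity"]))   # False: item 15 sold in quantities 2 and 4
```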
Appendix: Keys
Keys are attributes that are ultimately used to identify rows of data.
Appendix: First Normal Form
A relation is in first normal form (1NF) if and only if all attributes are
atomic.
Example:
Customer(CID, Name: First + Last, Phones, Address)
CID  Name: First + Last  Phones    Address
111  Joe Jones           111-2223  123 Main
                         111-3393
                         112-4582
Appendix: Second Normal Form
A relation is in second normal form (2NF) if it is in 1NF and each non-key attribute is fully functionally dependent on the primary key.
Example:
OrderProduct(OrderID, ProductID, Quantity, Description)
Description depends only on ProductID, which is just part of the key, so OrderProduct is not in 2NF.
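A partial dependency can be found by computing what each proper subset of the composite key determines. The sketch below assumes the FDs for OrderProduct are given as Python sets (Quantity depends on the whole key, Description on ProductID alone); closure and partial_dependencies are hypothetical helper names.

```python
from itertools import combinations


def closure(attrs, fds):
    """Every attribute functionally determined by attrs under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, nonkey, fds):
    """Map each non-key attribute that depends on only part of the key to that part."""
    found = {}
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            for attr in sorted(nonkey & closure(part, fds)):
                found.setdefault(attr, part)
    return found


# Assumed FDs for OrderProduct(OrderID, ProductID, Quantity, Description).
fds = [
    ({"OrderID", "ProductID"}, {"Quantity"}),  # quantity depends on the whole key
    ({"ProductID"}, {"Description"}),          # description depends on the product only
]
print(partial_dependencies({"OrderID", "ProductID"}, {"Quantity", "Description"}, fds))
# {'Description': ('ProductID',)}
```

Each reported attribute moves to a table keyed by the partial key it depends on, here a Product(ProductID, Description) table.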
Appendix: Transitive Dependency
Example:
There is an FD from OrderID to CustomerID: given the OrderID key attribute, you always know the CustomerID.
Appendix: Third Normal Form
Example:
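Order(OrderID, OrderDate, CustomerID, Name, Phone)
Name and Phone depend on CustomerID rather than directly on OrderID (a transitive dependency), so Order is not in 3NF.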
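The 3NF fix moves the attributes that depend on CustomerID into their own table. The sketch below uses hypothetical rows (the phone numbers are invented) held as Python tuples. It checks that joining the two tables on CustomerID reproduces the original rows, and counts how many copies of one customer's data exist before and after.

```python
# Hypothetical rows for Order(OrderID, OrderDate, CustomerID, Name, Phone).
orders = {
    (1, "2004-07-15", 15023, "Jones",   "555-0101"),
    (2, "2004-07-15", 63478, "Sanchez", "555-0102"),
    (3, "2004-07-16", 15023, "Jones",   "555-0101"),
}

# 3NF decomposition: move the attributes that depend on CustomerID
# (not directly on OrderID) into their own table keyed by CustomerID.
order_3nf = {(oid, odate, cid) for oid, odate, cid, _, _ in orders}
customer = {(cid, name, phone) for _, _, cid, name, phone in orders}

# Joining on CustomerID reproduces the original rows, so no data is lost.
rejoined = {(oid, odate, cid, name, phone)
            for oid, odate, cid in order_3nf
            for cid2, name, phone in customer
            if cid == cid2}

jones_before = sum(1 for row in orders if row[2] == 15023)  # copies of Jones's data
jones_after = sum(1 for row in customer if row[0] == 15023)
print(rejoined == orders, jones_before, jones_after)  # True 2 1
```

After the split, changing Jones's phone number touches one row instead of one row per order.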
Appendix: Boyce-Codd Normal Form
Appendix: Multi-Valued Dependency
Example:
Employees have many specialties and many tools, but tools
and specialties are not directly related.
Appendix: Fourth Normal Form
Example:
EmpSpecTools(EID, Specialty, ToolID)
EmpSpec(EID, Specialty)
EmpTools(EID, ToolID)