SQIT3043 Chapter 5 - Data Normalization
[Figure: Data sources (users, users' requirements specification, forms/reports that are used or generated by the enterprise, and sources describing the enterprise such as the data dictionary and corporate data model) feed two approaches that both lead to a set of well-designed relations.
Approach 1: use normalization as a bottom-up technique to create a set of relations (covered in this chapter).
Approach 2: use a top-down approach such as ER modeling, then use normalization as a validation technique to check the structure of relations (not covered).]
Figure 5.1 How normalization can be used to support database design
Figure 5.2 Staff and Branch relations

Staff
staffNo  sName        position    salary  branchNo
SL21     John White   Manager     3000    B005
SG37     Ann Beech    Assistant   12000   B003
SG14     David Ford   Supervisor  18000   B003
SA9      Mary Howe    Assistant   9000    B007
SG5      Susan Brand  Manager     24000   B003
SL41     Julie Lee    Assistant   9000    B005

Branch
branchNo  bAddress
B005      22 Deer Rd, London
B007      16 Argyll St, Aberdeen
B003      163 Main St, Glasgow

Figure 5.3 StaffBranch relation

StaffBranch
staffNo  sName        position    salary  branchNo  bAddress
SL21     John White   Manager     3000    B005      22 Deer Rd, London
SG37     Ann Beech    Assistant   12000   B003      163 Main St, Glasgow
SG14     David Ford   Supervisor  18000   B003      163 Main St, Glasgow
SA9      Mary Howe    Assistant   9000    B007      16 Argyll St, Aberdeen
SG5      Susan Brand  Manager     24000   B003      163 Main St, Glasgow
SL41     Julie Lee    Assistant   9000    B005      22 Deer Rd, London
In the StaffBranch relation, the details of a branch (i.e. branchNo and
bAddress) are repeated for every member of staff located at that branch.
In contrast, the branch details appear only once for each branch in
the Branch relation, and only branch number (branchNo) is repeated
in the Staff relation to represent where each member of staff is
located.
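The redundancy described above can be made concrete with a minimal sketch. It assumes the Staff and Branch tuples are held as plain Python dictionaries (only staffNo and branchNo are kept for Staff, to keep the example short); the helper name `join_staff_branch` is hypothetical and used only for illustration.

```python
from collections import Counter

# Staff tuples (cut down to staffNo and branchNo) and Branch tuples.
staff = [
    {"staffNo": "SL21", "branchNo": "B005"},
    {"staffNo": "SG37", "branchNo": "B003"},
    {"staffNo": "SG14", "branchNo": "B003"},
    {"staffNo": "SA9", "branchNo": "B007"},
    {"staffNo": "SG5", "branchNo": "B003"},
    {"staffNo": "SL41", "branchNo": "B005"},
]
branch = {
    "B005": "22 Deer Rd, London",
    "B007": "16 Argyll St, Aberdeen",
    "B003": "163 Main St, Glasgow",
}


def join_staff_branch(staff_rows, branch_addresses):
    """Join Staff with Branch on branchNo, giving StaffBranch-style rows."""
    return [
        dict(row, bAddress=branch_addresses[row["branchNo"]])
        for row in staff_rows
    ]


staff_branch = join_staff_branch(staff, branch)

# Count how many times each branch address is stored in the joined relation.
address_copies = Counter(row["bAddress"] for row in staff_branch)
for address, copies in address_copies.items():
    print(f"{address}: stored {copies} time(s) in StaffBranch, once in Branch")
```

Each address is stored once in `branch`, while the joined rows hold one copy per member of staff at that branch.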
Relations that contain redundant data may suffer from problems called update
anomalies, which are classified as insertion, deletion, or modification
anomalies.
Based on figure 5.3, there are two main types of insertion anomalies:
- To insert the details of new members of staff into the
StaffBranch relation, we must include the details of the branch
at which the staff are to be located. For example, to insert the
details of new staff located at branch B007, we must enter
the correct details of branch B007 so that the
branch details are consistent with the values for branch B007 in
other tuples of the StaffBranch relation. The relations shown
in figure 5.2 do not suffer from this potential inconsistency,
because we enter only the appropriate branch number for
each staff member in the Staff relation. Instead, the details of branch B007 are recorded only once, as a single tuple in the Branch relation.
Assume that a relational schema has attributes (A, B, C, ..., Z) and that
the database is described by a single universal relation called R = (A,
B, C, ..., Z). This assumption means that every attribute in the database
has a unique name.
Functional dependency describes the relationship between
attributes in a relation. For example, if A and B are attributes of relation
R, B is functionally dependent on A (denoted A → B) if each value of
A is associated with exactly one value of B. (A and B may each consist
of one or more attributes.)
If we know the value of A and we examine the relation that holds this
dependency, we find only one value of B in all the tuples that have a
given value of A, at any moment in time. Thus when two tuples have
the same value of A, they also have the same value of B. However, for
a given value of B, there may be several different values of A. This
dependency between attributes A and B can be represented
diagrammatically as shown in figure 5.4.
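[Diagram: A → B (B is functionally dependent on A).]
[Diagram: staffNo → position. The staffNo value SL21 is associated with the single position value Manager.]
[Diagram: position → staffNo does not hold. The position value Manager is associated with two staffNo values, SL21 and SG5.]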
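The definition of functional dependency can be checked mechanically against a sample of tuples. The following is a minimal sketch, assuming each tuple is a Python dictionary; the function name `fd_holds` is hypothetical. Note that a functional dependency is a property of the relation's meaning, so a check like this can only show that a dependency is violated by the sample, never prove that it holds for every possible state of the relation.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every value of lhs maps to one value of rhs in rows."""
    seen = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[a] for a in rhs)
        # setdefault stores the first dependent value seen for a determinant;
        # a later, different value means the dependency is violated.
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


# Sample tuples taken from the Staff data in this chapter.
rows = [
    {"staffNo": "SL21", "position": "Manager"},
    {"staffNo": "SG37", "position": "Assistant"},
    {"staffNo": "SG5", "position": "Manager"},
]

print(fd_holds(rows, ["staffNo"], ["position"]))  # True
print(fd_holds(rows, ["position"], ["staffNo"]))  # False
```

The second call fails because the position Manager is associated with both SL21 and SG5.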
[Figure: the normalization process. Data from the data sources (users, users' requirements specification, forms/reports that are used or generated by the enterprise, and sources describing the enterprise such as the data dictionary and corporate data model) is transferred into Unnormalized Form (UNF).
Remove repeating groups to reach First Normal Form (1NF).
Remove partial dependencies to reach Second Normal Form (2NF).
Remove transitive dependencies to reach Third Normal Form (3NF).]
First Normal Form (1NF): a relation in which the intersection of each row
and column contains one and only one value.
We begin by transferring the data from the source (e.g. a standard data
entry form) into table format (an unnormalized table).
To transform an unnormalized table to 1NF, we identify and remove repeating
groups within the table. There are two approaches:
- Approach 1: enter appropriate data in the empty columns of rows
containing the repeating data.
We fill in the blanks by duplicating the non-repeating data
where required (i.e. flattening the table).
This approach introduces more redundancy into the original
UNF table as part of the flattening process.
- Approach 2: place the repeating data, along with a copy of the original
key attribute(s), in a separate relation.
When an unnormalized table has more than one repeating group, this
approach is applied repeatedly until no repeating group remains.
This approach creates two or more relations with less
redundancy than in the original UNF table.
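The two approaches can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes the unnormalized table is held as nested dictionaries with the repeating group stored under a `rentals` key, uses a cut-down version of the DreamHome ClientRental data from this chapter's example, and the function names `flatten` and `split` are hypothetical.

```python
# Unnormalized data: each client carries a repeating group of rentals.
unf = [
    {"clientNo": "CR76", "cName": "John Kay",
     "rentals": [{"propertyNo": "PG4", "rent": 350},
                 {"propertyNo": "PG16", "rent": 450}]},
    {"clientNo": "CR56", "cName": "Aline Stewart",
     "rentals": [{"propertyNo": "PG4", "rent": 350},
                 {"propertyNo": "PG36", "rent": 375},
                 {"propertyNo": "PG16", "rent": 450}]},
]


def flatten(records, group):
    """Approach 1: one row per repeating item, duplicating the other data."""
    rows = []
    for rec in records:
        base = {k: v for k, v in rec.items() if k != group}
        for item in rec[group]:
            rows.append({**base, **item})
    return rows


def split(records, group, key):
    """Approach 2: move the repeating group, plus a copy of the key, out."""
    parent = [{k: v for k, v in rec.items() if k != group} for rec in records]
    child = [{key: rec[key], **item} for rec in records for item in rec[group]]
    return parent, child


flat = flatten(unf, "rentals")
clients, rentals = split(unf, "rentals", "clientNo")

print(len(flat))                    # 5 rows, cName repeated in each
print(len(clients), len(rentals))   # 2 client rows, 5 rental rows
```

In the flattened result each client name is stored once per rental, whereas after the split it is stored once per client.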
Example 6: First Normal Form
- Figure 5.7 shows a collection of DreamHome leases. For this
example we assume that a client rents a given property only once
and cannot rent more than one property at any one time.
[Figure 5.7: a collection of DreamHome Lease forms. Fields visible on the top form:
Client Number: CR76 (if known)
Full Name: John Kay (please print)
Monthly Rent: 350]
ClientRental
clientNo  cName          propertyNo  pAddress                rentStart  rentFinish  rent  ownerNo  oName
CR76      John Kay       PG4         6 Lawrence St, Glasgow  1-Jul-07   31-Aug-08   350   C040     Tina Murphy
                         PG16        5 Novar Dr, Glasgow     1-Sep-08   1-Sep-09    450   C093     Tony Shaw
CR56      Aline Stewart  PG4         6 Lawrence St, Glasgow  1-Sep-06   10-Jun-07   350   C040     Tina Murphy
                         PG36        2 Manor Rd, Glasgow     10-Oct-07  1-Dec-08    375   C093     Tony Shaw
                         PG16        5 Novar Dr, Glasgow     1-Nov-09   10-Aug-10   450   C093     Tony Shaw
Using approach 1:
ClientRental
clientNo  cName          propertyNo  pAddress                rentStart  rentFinish  rent  ownerNo  oName
CR76      John Kay       PG4         6 Lawrence St, Glasgow  1-Jul-07   31-Aug-08   350   C040     Tina Murphy
CR76      John Kay       PG16        5 Novar Dr, Glasgow     1-Sep-08   1-Sep-09    450   C093     Tony Shaw
CR56      Aline Stewart  PG4         6 Lawrence St, Glasgow  1-Sep-06   10-Jun-07   350   C040     Tina Murphy
CR56      Aline Stewart  PG36        2 Manor Rd, Glasgow     10-Oct-07  1-Dec-08    375   C093     Tony Shaw
CR56      Aline Stewart  PG16        5 Novar Dr, Glasgow     1-Nov-09   10-Aug-10   450   C093     Tony Shaw
Using approach 2:
Client
clientNo  cName
CR76      John Kay
CR56      Aline Stewart

PropertyRentalOwner
clientNo  propertyNo  pAddress                rentStart  rentFinish  rent  ownerNo  oName
CR76      PG4         6 Lawrence St, Glasgow  1-Jul-07   31-Aug-08   350   C040     Tina Murphy
CR76      PG16        5 Novar Dr, Glasgow     1-Sep-08   1-Sep-09    450   C093     Tony Shaw
CR56      PG4         6 Lawrence St, Glasgow  1-Sep-06   10-Jun-07   350   C040     Tina Murphy
CR56      PG36        2 Manor Rd, Glasgow     10-Oct-07  1-Dec-08    375   C093     Tony Shaw
CR56      PG16        5 Novar Dr, Glasgow     1-Nov-09   10-Aug-10   450   C093     Tony Shaw
Client
clientNo  cName
CR76      John Kay
CR56      Aline Stewart

Rental
clientNo  propertyNo  rentStart  rentFinish
CR76      PG4         1-Jul-07   31-Aug-08
CR76      PG16        1-Sep-08   1-Sep-09
CR56      PG4         1-Sep-06   10-Jun-07
CR56      PG36        10-Oct-07  1-Dec-08
CR56      PG16        1-Nov-09   10-Aug-10

PropertyOwner
propertyNo  pAddress                rent  ownerNo  oName
PG4         6 Lawrence St, Glasgow  350   C040     Tina Murphy
PG16        5 Novar Dr, Glasgow     450   C093     Tony Shaw
PG36        2 Manor Rd, Glasgow     375   C093     Tony Shaw
Third Normal Form (3NF): a relation that is in 1NF and 2NF and in which no
non-primary-key attribute is transitively dependent on the primary key.
Normalizing to 3NF involves removing transitive dependencies from the
relation by placing the attributes in a new relation along with a copy of the
determinant.
Example 8: Third Normal Form
- The six functional dependencies for the Client, Rental and
PropertyOwner relations are:
a. Client
FD2: clientNo → cName
b. Rental
FD1: clientNo, propertyNo → rentStart, rentFinish (Primary key)
FD5: clientNo, rentStart → propertyNo, rentFinish (Candidate key)
FD6: propertyNo, rentStart → clientNo, rentFinish (Candidate key)
c. PropertyOwner
FD3: propertyNo → pAddress, rent, ownerNo, oName (Primary key)
FD4: ownerNo → oName (Transitive dependency)
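To see why FD4 is classed as a transitive dependency, the dependencies of PropertyOwner can be examined with an attribute-closure computation. The sketch below is a simplified illustration, not a complete normalization algorithm: it assumes each dependency is given as a pair of Python sets, and the function names `closure` and `transitive_fds` are hypothetical. It flags a dependency when its determinant is reachable from the key, is not part of the key, and does not itself determine the key.

```python
def closure(attrs, fds):
    """Return all attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def transitive_fds(key, fds):
    """Return the dependencies that hang off a non-key determinant."""
    key = set(key)
    reachable = closure(key, fds)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if lhs <= reachable               # the key determines the determinant
        and not lhs <= key                # the determinant is not (part of) the key
        and not key <= closure(lhs, fds)  # the determinant does not determine the key
    ]


# FD3 and FD4 for the PropertyOwner relation.
property_owner_fds = [
    ({"propertyNo"}, {"pAddress", "rent", "ownerNo", "oName"}),  # FD3
    ({"ownerNo"}, {"oName"}),                                    # FD4
]

print(transitive_fds({"propertyNo"}, property_owner_fds))
```

Only FD4 is reported: propertyNo determines ownerNo, ownerNo determines oName, and ownerNo does not determine propertyNo.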
Client
clientNo  cName
CR76      John Kay
CR56      Aline Stewart

Rental
clientNo  propertyNo  rentStart  rentFinish
CR76      PG4         1-Jul-07   31-Aug-08
CR76      PG16        1-Sep-08   1-Sep-09
CR56      PG4         1-Sep-06   10-Jun-07
CR56      PG36        10-Oct-07  1-Dec-08
CR56      PG16        1-Nov-09   10-Aug-10

PropertyForRent
propertyNo  pAddress                rent  ownerNo
PG4         6 Lawrence St, Glasgow  350   C040
PG16        5 Novar Dr, Glasgow     450   C093
PG36        2 Manor Rd, Glasgow     375   C093

Owner
ownerNo  oName
C040     Tina Murphy
C093     Tony Shaw
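The 3NF step, moving the transitively dependent attribute into a new relation together with a copy of its determinant, can be sketched as two projections. This is a minimal illustration assuming the PropertyOwner tuples are Python dictionaries; the helper name `project` is hypothetical. Joining the two projections back on ownerNo reproduces the original tuples, which suggests that no information is lost by the decomposition.

```python
property_owner = [
    {"propertyNo": "PG4", "pAddress": "6 Lawrence St, Glasgow",
     "rent": 350, "ownerNo": "C040", "oName": "Tina Murphy"},
    {"propertyNo": "PG16", "pAddress": "5 Novar Dr, Glasgow",
     "rent": 450, "ownerNo": "C093", "oName": "Tony Shaw"},
    {"propertyNo": "PG36", "pAddress": "2 Manor Rd, Glasgow",
     "rent": 375, "ownerNo": "C093", "oName": "Tony Shaw"},
]


def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples."""
    seen = set()
    result = []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result


# oName moves to Owner along with a copy of its determinant, ownerNo.
property_for_rent = project(
    property_owner, ["propertyNo", "pAddress", "rent", "ownerNo"]
)
owner = project(property_owner, ["ownerNo", "oName"])

# Rejoin on ownerNo to check that the original tuples are recovered.
owner_name = {o["ownerNo"]: o["oName"] for o in owner}
rejoined = [dict(p, oName=owner_name[p["ownerNo"]]) for p in property_for_rent]

print(len(property_for_rent), len(owner))  # 3 properties, 2 owners
print(rejoined == property_owner)          # True
```

Because projection removes duplicate tuples, each owner's name is stored once in `owner` rather than once per property.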
General definitions:
2NF: a relation that is in first normal form and in which every
non-candidate-key attribute is fully functionally dependent on any candidate key.
3NF: a relation that is in first and second normal form and in which no
non-candidate-key attribute is transitively dependent on any candidate
key.
SOURCE:
Connolly, T., & Begg, C. (2010). Database Systems: A Practical Approach to
Design, Implementation, and Management (5th ed.). Boston: Pearson.