Normalization www.thecodexpert.com
Normalization is the process of efficiently organizing data in a database. This includes creating tables and
establishing relationships between those tables according to rules designed both to protect the data and to
make the database more flexible by eliminating redundancy and inconsistent dependency. Generally there are
two goals of the normalization process:
1. Eliminating redundant data (for example, storing the same data in more than one table).
2. Ensuring data dependencies make sense (only storing related data in a table).
Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is
logically stored.
Redundant data wastes disk space and creates maintenance problems. If data that exists in more than one
place must be changed, the data must be changed in exactly the same way in all locations. A customer address
change is much easier to implement if that data is stored only in the Customers table and nowhere else in the
database.
Preliminary Definitions
In this section I introduce several definitions that are common jargon in the world of database administration
and normalization.
Entity: The word ‘entity’ as it relates to databases can simply be defined as the general name for the
information that is to be stored within a single table. For example, if I were interested in storing information
about the school’s students, then ‘student’ would be the entity. The student entity would likely be composed
of several pieces of information, for example: student identification number, name, and email address. These
pieces of information are better known as attributes.
Primary key: A primary key uniquely identifies a row of data found within a table. Referring to the school
system, the student identification number would be the primary key for the student table since an ID would
uniquely identify each student.
Note that a primary key might not necessarily correspond to one specific attribute. In fact, it could be the
result of a combination of several components of the entity. For example, while a location could not be a
primary key for a class, since there might be several classes held there throughout the day, the combined time
and location would make a satisfactory primary key, since no two classes could be held at the same time in the
same location. When multiple attributes are used to derive a primary key, this key is known as a concatenated
primary key.
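The time-and-location example can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a recommended schema: the table name class, its column names, and the sample rows are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE class (
        start_time TEXT NOT NULL,
        location   TEXT NOT NULL,
        title      TEXT NOT NULL,
        PRIMARY KEY (start_time, location)  -- key built from two attributes
    )
    """
)
conn.execute("INSERT INTO class VALUES ('09:00', 'Room 101', 'Algebra')")
# The same location at a different time is a different key value.
conn.execute("INSERT INTO class VALUES ('10:00', 'Room 101', 'Biology')")


def try_insert(row):
    """Return True if the row was accepted, False if it violated the key."""
    try:
        conn.execute("INSERT INTO class VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


# Same time AND same location as an existing class: rejected.
clash_accepted = try_insert(("09:00", "Room 101", "Chemistry"))
```

Neither column alone has to be unique; only the combination does.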
Relationship: Understanding of the various relationships both between the data items forming the various
entities and between the entities themselves forms the crux of database normalization. There are three types of
data relationships that you should be aware of:
One-to-one (1:1) - A one-to-one relationship signifies that each instance of a given entity relates to exactly
one instance of another entity. For example, each student would have exactly one grade record, and each
grade record would be specific to one student.
One-to-many (1:N) - A one-to-many relationship signifies that each instance of a given entity relates to one
or more instances of another entity. For example, one professor entity could be found teaching several classes,
and each class could in turn be mapped to one professor.
Many-to-many (M:N) - A many-to-many relationship signifies that many instances of a given entity relate
to many instances of another entity. To illustrate, a schedule could comprise many classes, and a class
could be found within many schedules.
Foreign key: A foreign key forms the basis of a 1:N relationship between two tables. The foreign key can
be found within the N table (the "many" side), and maps to the primary key found in the 1 table. To
illustrate, the primary key in the professor table (probably a unique identification number) would be
introduced as the foreign key within the class entity, since it would be necessary to map a particular
professor to several classes.
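The professor and class example might look like the following sketch, again using Python's sqlite3 module. The column names and sample rows are assumptions for illustration. Note that SQLite only enforces foreign keys once the foreign_keys pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute(
    "CREATE TABLE professor (professor_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    """
    CREATE TABLE class (
        class_id     INTEGER PRIMARY KEY,
        title        TEXT NOT NULL,
        -- foreign key on the "many" side, pointing at the "one" side
        professor_id INTEGER NOT NULL REFERENCES professor (professor_id)
    )
    """
)
conn.execute("INSERT INTO professor VALUES (1, 'Jones')")
conn.executemany(
    "INSERT INTO class VALUES (?, ?, ?)",
    [(10, "Algebra", 1), (11, "Geometry", 1)],
)

# One professor maps to several classes.
classes_for_jones = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM class WHERE professor_id = 1 ORDER BY class_id"
    )
]

# A class that points at a professor who does not exist is refused.
try:
    conn.execute("INSERT INTO class VALUES (12, 'Physics', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```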
Functional Dependency
1. The expression X → Y means 'if I know the value of X, then I can obtain the value of Y' (in a table or
somewhere).
2. In the expression X → Y, X is the determinant and Y is the dependent attribute.
3. The value X determines the value of Y.
4. The value Y depends on the value of X.
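One way to make the idea of a determinant concrete is to test whether a dependency holds in some sample rows. The helper below is a hypothetical sketch (the function name holds and the dictionary layout are my own). Sample data can only disprove a dependency, never prove that it holds in general.

```python
def holds(rows, determinant, dependent):
    """Return True if every value of `determinant` maps to one value of `dependent`."""
    seen = {}
    for row in rows:
        x = tuple(row[attr] for attr in determinant)
        y = tuple(row[attr] for attr in dependent)
        # setdefault stores y the first time x is seen, then returns the stored value
        if seen.setdefault(x, y) != y:
            return False
    return True


rows = [
    {"student_id": 1022, "advisor": "Jones", "class": "101-07"},
    {"student_id": 1022, "advisor": "Jones", "class": "143-01"},
    {"student_id": 4123, "advisor": "Smith", "class": "201-01"},
]

advisor_depends_on_student = holds(rows, ["student_id"], ["advisor"])
class_depends_on_student = holds(rows, ["student_id"], ["class"])
```

Here the student number determines the advisor, but it does not determine the class, because student 1022 appears with two different classes.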
Transitive Dependency
1. An attribute is transitively dependent if its value is determined by another attribute which is not a key.
2. If X → Y and X is not a key then this is a transitive dependency.
3. A transitive dependency exists when A → B → C, so that C depends on A only indirectly, through B.
Multi-Valued Dependencies (MVD)
1. A table involves a multi-valued dependency if it may contain multiple values for an entity.
2. A multi-valued dependency may arise as a result of enforcing 1st normal form.
3. X →→ Y, i.e. X multi-determines Y, when for each value of X we can have more than one value of Y.
4. If A →→ B and A →→ C then we have a single attribute A which multi-determines two other independent
attributes, B and C.
5. If A →→ (B,C) then we have an attribute A which multi-determines a set of associated attributes, B and
C.
Join Dependencies
If a table can be decomposed into three or more smaller tables, it must be capable of being joined again on
common keys to form the original table.
Modification Anomalies
A major objective of data normalization is to avoid modification anomalies. These come in three flavours:
1. An insertion anomaly is a failure to place information about a new database entry into all the places
in the database where information about that new entry needs to be stored. In a properly normalized
database, information about a new entry needs to be inserted into only one place in the database. In an
inadequately normalized database, information about a new entry may need to be inserted into more
than one place, and, human fallibility being what it is, some of the needed additional insertions may be
missed. There are circumstances in which certain facts cannot be recorded at all. For example, each
record in a "Faculty and Their Courses" table might contain a Faculty ID, Faculty Name, Faculty Hire
Date, and Course Code—thus we can record the details of any faculty member who teaches at least
one course, but we cannot record the details of a newly-hired faculty member who has not yet been
assigned to teach any courses. This phenomenon is known as an insertion anomaly.
An insertion anomaly. Until the new faculty member is assigned to teach at least one course, his
details cannot be recorded.
2. A deletion anomaly is a failure to remove information about an existing database entry when it is time
to remove that entry. In a properly normalized database, information about an old, to-be-gotten-rid-of
entry needs to be deleted from only one place in the database. In an inadequately normalized database,
information about that old entry may need to be deleted from more than one place, and, human
fallibility being what it is, some of the needed additional deletions may be missed. There are
circumstances in which the deletion of data representing certain facts necessitates the deletion of data
representing completely different facts. The "Faculty and Their Courses" table described in the
previous example suffers from this type of anomaly, for if a faculty member temporarily ceases to be
assigned to any courses, we must delete the last of the records on which that faculty member appears.
This phenomenon is known as a deletion anomaly.
A deletion anomaly. All information about Dr. Giddens is lost when he temporarily ceases to be assigned to
any courses.
3. The same information can be expressed on multiple records; therefore updates to the table may result
in logical inconsistencies. For example, each record in an "Employees' Skills" table might contain an
Employee ID, Employee Address, and Skill; thus a change of address for a particular employee will
potentially need to be applied to multiple records (one for each of his skills). If the update is not
carried through successfully—if, that is, the employee's address is updated on some records but not
others—then the table is left in an inconsistent state. Specifically, the table provides conflicting
answers to the question of what this particular employee's address is. This phenomenon is known as an
update anomaly.
An update anomaly. As shown in the figure, the same employee has different addresses on different records.
All three kinds of anomalies are highly undesirable, since their occurrence constitutes corruption of the
database. Properly normalised databases are much less susceptible to corruption than are unnormalised
databases.
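The update anomaly is easy to reproduce. The sketch below uses Python's sqlite3 module with an "Employees' Skills" style table. The table names, column names, employee number, and addresses are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Unnormalised: the address is repeated on every skill row.
conn.execute(
    "CREATE TABLE employee_skill (employee_id INTEGER, address TEXT, skill TEXT)"
)
conn.executemany(
    "INSERT INTO employee_skill VALUES (?, ?, ?)",
    [(426, "87 Sycamore Grove", "Typing"), (426, "87 Sycamore Grove", "Shorthand")],
)

# An incomplete update: only one of the employee's rows is changed.
conn.execute(
    "UPDATE employee_skill SET address = '23 Oak Lane' "
    "WHERE employee_id = 426 AND skill = 'Typing'"
)
addresses = sorted(
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT address FROM employee_skill WHERE employee_id = 426"
    )
)

# Normalised: the address lives in exactly one row, so one UPDATE is enough.
conn.execute("CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, address TEXT)")
conn.execute("INSERT INTO employee VALUES (426, '87 Sycamore Grove')")
rows_touched = conn.execute(
    "UPDATE employee SET address = '23 Oak Lane' WHERE employee_id = 426"
).rowcount
```

After the partial update, the first table gives two answers to the question "where does employee 426 live?", while the second table cannot get into that state.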
Normal Forms
The database community has developed a series of guidelines for ensuring that databases are normalized.
These are referred to as normal forms and are numbered from one (the lowest form of normalization, referred
to as first normal form or 1NF) through five (fifth normal form or 5NF). In practical applications, you'll often
see 1NF, 2NF, and 3NF along with the occasional 4NF. Fifth normal form is very rarely seen.
First Normal Form (1NF)
In order to create a table that is in first normal form we must extract the repeating groups and place them in a
separate table, which I shall call ORDER_LINE.
I have removed 'product1', 'product2' and 'product3', so there are no repeating groups.
Each row contains one product for one order, so this allows an order to contain any number of products.
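A rough sketch of that extraction in Python follows. The product1, product2 and product3 columns and the ORDER_LINE table come from the text. The order_id column, the product names, and the choice of (order_id, product) as the key are assumptions made for the example.

```python
import sqlite3

# Hypothetical unnormalised rows: one column per product slot.
wide_orders = [
    {"order_id": 1, "product1": "widget", "product2": "gadget", "product3": None},
    {"order_id": 2, "product1": "gizmo", "product2": None, "product3": None},
]

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE order_line (
        order_id INTEGER NOT NULL,
        product  TEXT NOT NULL,
        PRIMARY KEY (order_id, product)
    )
    """
)

# Turn each filled product slot into its own row.
for order in wide_orders:
    for slot in ("product1", "product2", "product3"):
        if order[slot] is not None:
            conn.execute(
                "INSERT INTO order_line VALUES (?, ?)",
                (order["order_id"], order[slot]),
            )

# Order 1 can now grow past three products without a schema change.
conn.execute("INSERT INTO order_line VALUES (1, 'doohickey')")
conn.execute("INSERT INTO order_line VALUES (1, 'thingamajig')")

lines_per_order = dict(
    conn.execute("SELECT order_id, COUNT(*) FROM order_line GROUP BY order_id")
)
```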
Second Normal Form (2NF)
This form further addresses the concept of removing duplicative data:
1. Anomalies can occur when attributes are dependent on only part of a multi-attribute (composite) key.
2. A relation is in second normal form when all non-key attributes are dependent on the whole key. That
is, no attribute is dependent on only a part of the key.
3. Any relation having a key with a single attribute is in second normal form.
Here we should realize that cust_address and cust_contact are functionally dependent on cust but not on
order_id, therefore they are not dependent on the whole key. To make this table 2NF these attributes must be
removed and placed somewhere else.
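The 2NF rule can be expressed as a small check over a list of declared dependencies. This is only a sketch: the function name and the way dependencies are written down are my own, and order_date is a hypothetical attribute added so that one dependency uses the whole key.

```python
def partial_dependencies(key, fds):
    """List dependencies whose determinant is only part of the composite key.

    `fds` is a list of (determinant_attributes, dependent_attribute) pairs.
    """
    key = frozenset(key)
    found = []
    for determinant, dependent in fds:
        determinant = frozenset(determinant)
        # "<" on sets means "proper subset": part of the key, but not all of it
        if determinant and determinant < key and dependent not in key:
            found.append((sorted(determinant), dependent))
    return found


key = ("order_id", "cust")
fds = [
    (("cust",), "cust_address"),
    (("cust",), "cust_contact"),
    (("order_id", "cust"), "order_date"),  # hypothetical attribute
]

violations = partial_dependencies(key, fds)
```

Every attribute reported in violations is a candidate for moving into its own table, keyed on the part of the key that determines it.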
Third Normal Form (3NF)
1. Anomalies can occur when a relation contains one or more transitive dependencies.
2. A relation is in 3NF when it is in 2NF and has no transitive dependencies.
3. A relation is in 3NF when 'All non-key attributes are dependent on the key, the whole key and nothing
but the key'.
Here we should realise that cust_address and cust_contact are functionally dependent on cust which is
not a key. To make this table 3NF these attributes must be removed and placed somewhere else.
Values in a record that are not part of that record's key do not belong in the table. In general, any time the
contents of a group of fields may apply to more than a single record in the table, consider placing those fields
in a separate table.
For example, in an Employee Recruitment table, a candidate's university name and address may be included.
But you need a complete list of universities for group mailings. If university information is stored in the
Candidates table, there is no way to list universities with no current candidates. Create a separate Universities
table and link it to the Candidates table with a university code key.
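A minimal sketch of the Universities and Candidates split, using Python's sqlite3 module. The table and column names, the codes U1 and U2, and all sample rows are placeholders invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE university (
        university_code TEXT PRIMARY KEY,
        name            TEXT NOT NULL,
        address         TEXT NOT NULL
    )
    """
)
conn.execute(
    """
    CREATE TABLE candidate (
        candidate_id    INTEGER PRIMARY KEY,
        name            TEXT NOT NULL,
        university_code TEXT NOT NULL REFERENCES university (university_code)
    )
    """
)
conn.executemany(
    "INSERT INTO university VALUES (?, ?, ?)",
    [
        ("U1", "Example State University", "1 Campus Way"),
        ("U2", "Sample Institute of Technology", "2 College Road"),
    ],
)
conn.execute("INSERT INTO candidate VALUES (1, 'A. Candidate', 'U1')")

# The mailing list no longer depends on who is currently a candidate.
mailing_list = [
    row[0]
    for row in conn.execute("SELECT name FROM university ORDER BY university_code")
]

# Universities with no current candidates can still be listed.
without_candidates = [
    row[0]
    for row in conn.execute(
        """
        SELECT u.name
        FROM university u
        LEFT JOIN candidate c ON c.university_code = u.university_code
        WHERE c.candidate_id IS NULL
        """
    )
]
```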
Tables should have only two dimensions. Since one student has several classes, these classes should be
listed in a separate table. Fields Class1, Class2, and Class3 in the above records are indications of design
trouble.
Spreadsheets often use the third dimension, but tables should not. Another way to look at this problem:
with a one-to-many relationship, do not put the one side and the many side in the same table. Instead, create
another table in first normal form by eliminating the repeating group (Class#), as shown below:
Student#   Advisor   Adv-Room   Class#
1022       Jones     412        101-07
1022       Jones     412        143-01
1022       Jones     412        159-02
4123       Smith     216        201-01
4123       Smith     216        211-02
4123       Smith     216        214-01
Second Normal Form: Eliminate Redundant Data
Note the multiple Class# values for each Student# value in the above table. Class# is not functionally
dependent on Student# (primary key), so this relationship is not in second normal form.
Students:
Student#   Advisor   Adv-Room
1022       Jones     412
4123       Smith     216
Registration:
Student#   Class#
1022       101-07
1022       143-01
1022       159-02
4123       201-01
4123       211-02
4123       214-01
Third Normal Form: Eliminate Data Not Dependent On Key
In the last example, Adv-Room (the advisor's office number) is functionally dependent on the Advisor
attribute. The solution is to move that attribute from the Students table to the Faculty table, as shown
below:
Students:
Student#   Advisor
1022       Jones
4123       Smith
Faculty:
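The three-table design can be sketched with Python's sqlite3 module as follows. The student, advisor, room, and class values are taken from the tables above. The SQL table and column names are my own, the Faculty table is assumed to hold just the advisor's name and room, and room 500 is a made-up value used only to show an update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE faculty (
        name TEXT PRIMARY KEY,
        room TEXT NOT NULL
    );
    CREATE TABLE students (
        student_no INTEGER PRIMARY KEY,
        advisor    TEXT NOT NULL REFERENCES faculty (name)
    );
    CREATE TABLE registration (
        student_no INTEGER NOT NULL REFERENCES students (student_no),
        class_no   TEXT NOT NULL,
        PRIMARY KEY (student_no, class_no)
    );
    INSERT INTO faculty VALUES ('Jones', '412'), ('Smith', '216');
    INSERT INTO students VALUES (1022, 'Jones'), (4123, 'Smith');
    INSERT INTO registration VALUES
        (1022, '101-07'), (1022, '143-01'), (1022, '159-02'),
        (4123, '201-01'), (4123, '211-02'), (4123, '214-01');
    """
)

# Joining on the common keys rebuilds the original four-column table.
rejoined = conn.execute(
    """
    SELECT s.student_no, s.advisor, f.room, r.class_no
    FROM students s
    JOIN faculty f ON f.name = s.advisor
    JOIN registration r ON r.student_no = s.student_no
    ORDER BY s.student_no, r.class_no
    """
).fetchall()

# Moving an advisor to a new office now changes exactly one row.
rows_touched = conn.execute(
    "UPDATE faculty SET room = '500' WHERE name = 'Jones'"
).rowcount
```

No information is lost by the split, and the advisor's room is stored once instead of once per class registration.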