Fundamental Concepts of Databases
CONCEPTS
The term database refers to a collection of related data from which the users can efficiently
retrieve the desired information. In addition to the storage and retrieval of data, certain other
operations can also be performed on a database. These operations include adding, updating
and deleting data. All these operations on a database are performed using a database
management system (DBMS). Essentially, a DBMS is a computerized record-keeping
system. This topic introduces the basic terminology used in database
management systems (such as normalisation, entities, attributes, keys, relational database
management systems and Structured Query Language).
Table of Contents
I. DATA, INFORMATION AND KNOWLEDGE
II. DATABASE: DEFINITION
III. DATA CONCEPTS
IV. DATA INTEGRITY
V. DATABASE MANAGEMENT SYSTEM
VI. DATABASE MODELS
VII. DATABASE NORMALIZATION
VIII. INTRODUCTION TO QUERIES
Topic : Database Design systems and modelling
I.1 Definitions
Data can be anything: a number, a person's name, an image, a sound and so on. Hence,
data can be defined as a set of isolated and unrelated raw facts (represented by values), which
have little or no meaning because they lack a context for evaluation (e.g. 'Monica', '35',
'chef' …). When data are processed and converted into a meaningful and useful form, the result
is known as information. Hence, information can be defined as an organized and
validated collection of data. For example, 'Monica is 35 years old and she is a chef'.
Strictly speaking, data refer to the values physically recorded in the database, whereas
information refers to the conclusion or meaning drawn from them. In database practice,
however, the two terms are often used interchangeably.
Other than data and information, one more term, knowledge, is frequently used with database
technology. Knowledge is the act of understanding the context in which the information is
used. It can be based on learning through information, experience and/or intuition.
Data Model: A data model is a representation of a real world situation about which data is
to be collected and stored in a database. A data model depicts the dataflow and logical
interrelationships among different data elements.
To be useful, information should have the following characteristics:
- Accurate: To be useful, information must be accurate at all levels because all further
developments are based on the available information.
- Timely: Information is appreciated only if it is available on time.
- Complete: Complete information tends to be comprehensive in covering the issue or
topic of interest.
- Precise: Information should be to the point, containing all the essential elements of
the relevant subject areas.
- Relevant: Information is relevant if it can be applied to a specific situation, problem
or issue of interest.
Decision-making is the process of identifying, selecting and implementing the best possible
alternative. The right information, in the right form and at the right time is essential to make
correct decisions.
Information is vital for communication and is a critical resource for performing work in
organizations. Business managers spend most of their day communicating with other
managers, subordinates, customers, vendors and so on. A manager must keep track of the
information flowing from sources inside and outside the organization.
The future is shaped by our actions today, and these actions are based upon our knowledge.
Therefore, for achieving higher levels of success, one must be well informed and should have
clarity of information.
Information helps in making sense of our environment, which assists in achieving the
performance objectives. In fact, productivity is directly related to the availability and value of
the information and its application in the related context.
II.2.1 Field
A field represents one related part of a table and is the smallest logical structure of storage in
a database. It holds one piece of information about an item or a subject. For example, in a
database maintaining information about employees, the fields can be Code, Deptt, Name,
Address, City and Phone (see Figure 2).
II.2.2 Record
A record is a collection of multiple related fields that can be treated as a unit. For example,
fields Code, Deptt, Name, Address, City and Phone for a particular employee form a record.
Figure 2 contains nine records (0101–0109) and each record has six fields.
II.2.3 Table
A table is a named collection of logically related records. For example, the collection
of all the employee records of a company forms the employee table. Note that every record in a
table has the same set of fields. Depending on the database software, a table can also be
referred to as a file. The collection of multiple related files (tables) forms the database.
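The relationship between fields, records and a table can be sketched with Python's built-in sqlite3 module. This is only an illustration: the column types and the two sample records are hypothetical, since Figure 2 is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# The table: a named collection of records sharing the same six fields.
conn.execute(
    "CREATE TABLE employee ("
    " code TEXT, deptt TEXT, name TEXT,"
    " address TEXT, city TEXT, phone TEXT)"
)

# Each tuple is one record; each position in the tuple is one field.
# The values below are made up for the sketch.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("0101", "RD01", "Asha", "12 Hill Road", "Pune", "555-0101"),
        ("0102", "RD01", "Ravi", "7 Lake View", "Delhi", "555-0102"),
    ],
)

cursor = conn.execute("SELECT * FROM employee")
field_names = [col[0] for col in cursor.description]  # names of the fields
records = cursor.fetchall()                           # list of records
```

Here `field_names` holds the six field names and every entry of `records` has exactly six values, one per field.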
A data type determines the type of data that can be stored in a column. Although many data
types are available, the four most commonly used are Character, Numeric,
Boolean and DateTime. The names and permitted values of these data types vary widely
depending on the database management software being used.
Apart from the data, the database also stores metadata, which describes the tables, columns,
indexes, constraints and other items that make up the database. In simple words, metadata is
the data about data. This metadata is stored in an area called the data dictionary. Hence, a
data dictionary defines the basic organization of a database.
Most database systems keep the data dictionary hidden from users to prevent them from
accidentally destroying its contents. Different users use the dictionary in different ways.
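As an illustration of metadata, the sketch below reads the catalog that SQLite keeps about its own tables. It assumes SQLite specifically: `sqlite_master` and `PRAGMA table_info` are SQLite features, and other database systems expose their data dictionary differently. The `employee` table is a hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (code INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)

# sqlite_master is SQLite's catalog: one row per table, index, view or trigger.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    )
]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [
    (row[1], row[2]) for row in conn.execute("PRAGMA table_info(employee)")
]
```

The data in `tables` and `columns` describes the structure of the database rather than its contents, which is what "data about data" means.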
Physical concepts of data refer to the manner in which the data are physically stored on the
hardware (such as a hard disk). Fundamentally, this involves the physical organization of the
records of a file for convenient storage and retrieval of data. Usually, files are organized
in one of three ways: sequential, direct and indexed sequential.
In sequential files, the data are stored and retrieved in a logical order, that is, in a
sequence. The records are stored one after the other in ascending or descending order,
based on the key field (which is unique for each record). Generally, these files
are stored on sequential storage devices such as magnetic tapes and punched cards. In such
files, to retrieve a record, all the records that precede it must be traversed sequentially.
An audio cassette is a useful analogy for a sequential file.
Direct files facilitate accessing any record directly or randomly without having to traverse the
sequence of records. These files are also known as random or relative files. Even though only
one item can be accessed at a time, that item may be stored anywhere in the file. For
example, in case of CDs
III.2.1 Entity
An entity is any object in the system that we want to model and store information about.
Entities are usually recognizable concepts, either concrete or abstract, such as persons, places,
things or events that have relevance to the database. Some specific examples of entities are
Employee, Student and Lecturer. An entity is analogous to a table in the relational model.
An entity occurrence is an instance of an entity. For example, in the student entity, the
details of each individual student form an entity occurrence. An entity
occurrence can also be referred to as a record. By convention, entities are represented by
rectangles:
III.2.2 Attributes
An attribute is an item of information which is stored about an entity. For example, the
entity 'lecturer' could have attributes such as staff id, surname, forename, date of birth,
telephone number, etc. By convention, an attribute is represented by an oval linked to the
corresponding entity:
III.2.3 Relationship
Even though a relationship may involve more than two entities, the most commonly
encountered relationships are binary, involving exactly two entities. Binary relationships
are classified by their cardinality into three types: one-to-one, one-to-many and
many-to-many.
One-to-one is where one occurrence of an entity relates to only one occurrence of another
entity, e.g. if a man marries only one woman and a woman marries only one man, marriage is a
one-to-one (1:1) relationship.
Fig 5 : One-to-One
Fig 6: One-to-Many
In a many-to-many relationship, one record in a table can be related to one or more records in a
second table, and one record in the second table can be related to one or more
records in the first table. For example, one teacher teaches many students and a student is
taught by many teachers.
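In relational databases a many-to-many relationship is usually implemented with a third, linking table. The sketch below shows this for the teacher and student example using Python's sqlite3 module. The table names, column names and sample rows are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE teacher (teacher_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);

-- Linking table: one row per (teacher, student) pair.
CREATE TABLE teaches (
    teacher_id INTEGER REFERENCES teacher(teacher_id),
    student_id INTEGER REFERENCES student(student_id),
    PRIMARY KEY (teacher_id, student_id)
);

INSERT INTO teacher VALUES (1, 'T1'), (2, 'T2');
INSERT INTO student VALUES (10, 'S1'), (11, 'S2');
INSERT INTO teaches VALUES (1, 10), (1, 11), (2, 10);
""")

# One teacher teaches many students ...
students_of_t1 = [
    row[0]
    for row in conn.execute(
        "SELECT s.name FROM teaches t "
        "JOIN student s ON s.student_id = t.student_id "
        "WHERE t.teacher_id = 1 ORDER BY s.name"
    )
]

# ... and one student is taught by many teachers.
teachers_of_s1 = [
    row[0]
    for row in conn.execute(
        "SELECT te.name FROM teaches t "
        "JOIN teacher te ON te.teacher_id = t.teacher_id "
        "WHERE t.student_id = 10 ORDER BY te.name"
    )
]
```

The linking table turns one many-to-many relationship into two one-to-many relationships, which plain tables can represent directly.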
III.4 Keys
A key is a data item that allows us to uniquely identify individual occurrences of an entity
type. You can sort and quickly retrieve information from a database by choosing one or more
fields (i.e. attributes) to act as keys. For instance, in a student table you could use a
combination of the last name and first name fields (or perhaps last name, first name and birth
date, to ensure that each student is identified uniquely) as a key. There are many types of
keys:
a) Primary Key:
A field or a set of fields that uniquely identifies each record in a table is known as a primary
key. This implies that no two records in the relation can have the same value for the primary key.
For example, your student number is a primary key as this uniquely identifies you within the
college student records system. An employee number uniquely identifies a member of staff
within a company. An IP address uniquely addresses a PC on the internet.
A primary key is mandatory. That is, each entity occurrence must have a value for its
primary key.
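A minimal sketch of how a DBMS enforces primary key uniqueness, using Python's sqlite3 module. The `student` table, its column names and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_no INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute("INSERT INTO student VALUES (1001, 'Monica')")

# A second record with the same primary key value is rejected.
try:
    conn.execute("INSERT INTO student VALUES (1001, 'Another student')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

After the failed insert the table still holds a single record, so every student number continues to identify exactly one student.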
b) Candidate Key:
In a table, there can be more than one field that can uniquely identify each record. All such
fields are known as candidate keys. One of these candidate keys is chosen as a primary key;
the other keys that are not chosen as primary key are known as alternate keys or secondary
keys.
c) Foreign Key:
A field of a table that references the primary key of another table is referred to as a foreign key.
Figure 13.3 illustrates how a foreign key constraint is related to a primary key constraint.
Here, the field Item_Code in the PURCHASE table references the field Item_Code in the
ITEM table. Thus, Item_Code in the PURCHASE table is the foreign key.
NOTE: The key composed of more than one field is known as composite key. Sometimes, it is
also known as concatenated key or structured key.
d) Simple Key
Any of the keys described before (i.e. primary, secondary or foreign) may have one or more
attributes. A simple key consists of a single attribute that uniquely identifies an entity
occurrence, for example, a student number, which uniquely identifies a particular student. No
two students would have the same student number.
e) Compound Key
A compound key consists of more than one attribute to uniquely identify an entity
occurrence. Each attribute, which makes up the key, is also a simple key in its own right.
For example, we have an entity named enrolment, which holds the courses on which a
student is enrolled. In this scenario a student is allowed to enrol on more than one course.
This has a compound key of both student number and course number, which is required to
uniquely identify a student on a particular course.
Student number and course number combined is a compound primary key for the enrolment
entity.
f) Composite Key
A composite key consists of more than one attribute to uniquely identify an entity
occurrence. This differs from a compound key in that one or more of the attributes, which
make up the key, are not simple keys in their own right.
For example, you have a database holding your CD collection. One of the entities is called
tracks, which holds details of the tracks on a CD. This has a composite key of CD name,
track number.
CD name in the track entity is a simple key, linking to the CD entity, but track number is not
a simple key in its own right.
Application exercise
For each of the following entities, list possible primary keys. Then, suggest secondary keys,
if any: Student, Course, Unit, Result, Classroom, Lecturer, Department, Attendance
Integrity ensures that the data in a database are both accurate and complete; in other words,
that the data make sense. There are at least five different types of integrity that need to be
considered: domain constraints, entity integrity, column constraints, user-defined
integrity constraints and referential integrity. The data analysis stage will identify the
requirements for these.
Domain Constraints: A domain is defined as the set of all unique values permitted
for an attribute. For example, a domain of Date is the set of all possible valid dates, a
domain of Integer is all possible whole numbers, and a domain of day-of-week is
Monday, Tuesday ... Sunday.
Entity Integrity: It implies that no component of a primary key is allowed to have a
NULL value.
Column Constraints: During the data analysis phase, business rules will identify any
column constraints. For example, a salary cannot be negative; an employee number
must be in the range 1000 - 2000, etc.
User-Defined Integrity Constraints: Business rules may dictate that when a specific
action occurs, further actions should be triggered. For example, deletion of a record
automatically writes that record to an audit table.
Referential Integrity: It implies that if a foreign key is defined in one table, each of
its values must exist as a primary key value in the referenced table.
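Several of these integrity rules can be declared directly in the table definitions. The following sketch uses Python's sqlite3 module and is only one possible arrangement: the table and column names are hypothetical, and note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.executescript("""
CREATE TABLE department (dept_code TEXT PRIMARY KEY NOT NULL);

CREATE TABLE employee (
    emp_no    INTEGER PRIMARY KEY,                    -- entity integrity
    salary    REAL CHECK (salary >= 0),               -- column constraint
    dept_code TEXT REFERENCES department(dept_code),  -- referential integrity
    CHECK (emp_no BETWEEN 1000 AND 2000)              -- column constraint
);

-- User-defined rule: every deleted employee is copied to an audit table.
CREATE TABLE employee_audit (emp_no INTEGER, salary REAL, dept_code TEXT);
CREATE TRIGGER audit_delete AFTER DELETE ON employee
BEGIN
    INSERT INTO employee_audit VALUES (OLD.emp_no, OLD.salary, OLD.dept_code);
END;

INSERT INTO department VALUES ('RD01');
INSERT INTO employee VALUES (1001, 15000, 'RD01');
""")


def violates(sql):
    """Return True if the statement is rejected by an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


negative_salary_rejected = violates("INSERT INTO employee VALUES (1002, -5, 'RD01')")
out_of_range_rejected = violates("INSERT INTO employee VALUES (5000, 100, 'RD01')")
unknown_dept_rejected = violates("INSERT INTO employee VALUES (1003, 100, 'XX99')")

conn.execute("DELETE FROM employee WHERE emp_no = 1001")
audit_rows = conn.execute("SELECT emp_no FROM employee_audit").fetchall()
```

Declaring the rules in the schema means the DBMS rejects invalid data no matter which program tries to store it.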
V.1 Definition
To carry out operations like insertion, deletion and retrieval, the database needs to be
managed by a software package. This software is called a database management system
(DBMS). Hence, DBMS can be defined as a collection of interrelated data and a set of
programs to access that data.
Database system: Database system is a general term that refers to the combination of a
database, a database management system and a data model. This system is responsible for the
following data manipulation activities: data control, data retrieval, data maintenance and
data definition.
Advantages
carried out whenever access is attempted to sensitive data. To ensure security, DBMS
provides security tools such as user codes and passwords.
→ Maintenance of Data Integrity: Data integrity means the consistency and accuracy
of the data in the database.
→ Better Interaction with Users: Centralizing the data in a database also means that
users can obtain new and combined information that would have been impossible to
obtain otherwise. In addition, use of a DBMS allows the users, who do not know
programming, to interact with the data more easily.
Disadvantages
A database model or simply a data model is an abstract model that describes how the data
are organized and represented. A data model consists of two parts, which are as follows:
Every database and DBMS is based on a particular database model. There are four basic
types of database models: hierarchical, network, relational and object-oriented. These
models provide different conceptualizations of the database and they have different outlooks
and perspectives.
The main advantage of the hierarchical data model is that the data access is quite predictable
in structure, and therefore, both retrieval and updates can be highly optimized by a DBMS.
However, the main drawback of this model is that the links are 'hard coded' into the data
structure, that is, the links are permanently established and cannot be modified.
The main limitation of the network data model is that it can be quite complicated to maintain
all the links and a single broken link can lead to problems in the database. In addition, since
there are no restrictions on the number of links, the database design can become
overwhelmingly complex.
Like the other models, the object model assumes that objects can conceptually be collected
together into meaningful groups known as classes.
The relational data model represents the database as a collection of simple two-dimensional
tables called relations. The rows of a relation are referred to as tuples and the
columns are referred to as attributes. A relationship between two relations is
implemented through a common attribute in the relations and not by physical links or
pointers.
Normalisation is a process by which we analyze and alter a database relation in order to get more
concise and organized data structures. Normalised data is stable and has a natural structure.
We call a relation normalized if:
VII.1 Dependencies
In order to be able to normalise a relation according to the three normal forms, we must first
understand the concept of dependency between attributes within a relation.
Example:
ID    Name
S1    Meier
S2    Weber
The attribute Name is functionally dependent on the attribute ID (ID --> Name).
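Whether a given set of rows is consistent with a functional dependency such as ID --> Name can be checked mechanically. The helper below is a hypothetical sketch, not part of any library. It can only show that sample data does not contradict a dependency; the dependency itself is a rule about the relation, not about one particular set of rows.

```python
def holds(rows, determinant, dependent):
    """Return True if no two rows agree on `determinant` but differ on `dependent`."""
    seen = {}
    for row in rows:
        key = tuple(row[attr] for attr in determinant)
        value = tuple(row[attr] for attr in dependent)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


students = [
    {"ID": "S1", "Name": "Meier"},
    {"ID": "S2", "Name": "Weber"},
]
id_determines_name = holds(students, ["ID"], ["Name"])

# Adding a second name for S1 would break the dependency.
conflicting = students + [{"ID": "S1", "Name": "Keller"}]
conflict_detected = not holds(conflicting, ["ID"], ["Name"])
```

Passing several attribute names as the determinant checks a dependency on a combination of attributes in the same way.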
Example:
Example
IDStudent    Name     IDProfessor    Grade
S1           Meier    P2             5
S2           Weber    P1             6
The attribute Grade is fully functionally dependent on the attributes IDStudent and
IDProfessor.
Example:
b) Create separate tables for each group of related data and identify each row with a
unique column or set of columns (the primary key).
Example
Student(Surname, Name, Skills)
The attribute Skills can contain multiple values and therefore the relation is not in the first
normal form.
But the attributes Name and Surname are atomic attributes that can contain only one value.
Example First normal form
To get to the first normal form (1NF) we must create a separate tuple for each value of
the multivalued attribute.
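The step of creating one tuple per value can be sketched in a few lines of Python. The function name and the sample students are assumptions for illustration only; the original figure is not reproduced here.

```python
def to_first_normal_form(rows, multivalued):
    """Return one row per value of the multivalued attribute."""
    flat = []
    for row in rows:
        for value in row[multivalued]:
            new_row = dict(row)          # copy the atomic attributes
            new_row[multivalued] = value  # keep a single value per tuple
            flat.append(new_row)
    return flat


# Hypothetical sample data for Student(Surname, Name, Skills).
students = [
    {"Surname": "Meier", "Name": "Hans", "Skills": ["SQL", "Java"]},
    {"Surname": "Weber", "Name": "Anna", "Skills": ["Python"]},
]
students_1nf = to_first_normal_form(students, "Skills")
```

After the conversion every Skills value is atomic, at the cost of repeating Surname and Name in several tuples.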
Example
A university uses the following relation:
Student(IDSt, StudentName, IDProf, ProfessorName, Grade)
The attributes IDSt and IDProf are the identification keys. All attributes are single-valued
(1NF).
Example
A bank uses the following relation:
Vendor(ID, Name, Account_No, Bank_Code_No, Bank)
The attribute ID is the primary key. All attributes are single valued (1NF). The table is also in
2NF. The following dependencies exist:
1. Name, Account_No, Bank_Code_No are functionally dependent on ID (ID -->
Name, Account_No, Bank_Code_No)
2. Bank is functionally dependent on Bank_Code_No (Bank_Code_No --> Bank)
Example Third normal form
The table in this example is in 1NF and in 2NF. But Bank depends on ID only transitively,
through Bank_Code_No (ID --> Bank_Code_No --> Bank), because Bank_Code_No is not the
primary key of this relation.
To get to the third normal form (3NF), we have to put the bank name in a separate table
together with the clearing number (Bank_Code_No) that identifies it.
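One way to picture the resulting 3NF design is with two tables, sketched here using Python's sqlite3 module. The lower-case table names, the column types and the sample rows are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The bank name now lives in exactly one place.
CREATE TABLE bank (
    bank_code_no TEXT PRIMARY KEY,
    bank         TEXT NOT NULL
);

-- Every remaining attribute depends directly on the key id.
CREATE TABLE vendor (
    id           INTEGER PRIMARY KEY,
    name         TEXT,
    account_no   TEXT,
    bank_code_no TEXT REFERENCES bank(bank_code_no)
);

INSERT INTO bank VALUES ('B1', 'First Bank'), ('B2', 'Second Bank');
INSERT INTO vendor VALUES
    (1, 'Vendor A', 'A-100', 'B1'),
    (2, 'Vendor B', 'A-200', 'B1'),
    (3, 'Vendor C', 'A-300', 'B2');
""")

# A join rebuilds the original five-column relation when it is needed.
original_view = conn.execute(
    "SELECT v.id, v.name, v.account_no, v.bank_code_no, b.bank "
    "FROM vendor v JOIN bank b ON b.bank_code_no = v.bank_code_no "
    "ORDER BY v.id"
).fetchall()
bank_count = conn.execute("SELECT COUNT(*) FROM bank").fetchone()[0]
```

Because each bank name is stored once, renaming a bank means updating a single row instead of every vendor that uses it.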
A report is a file or view of data formatted in a way that allows the user to see a large
amount of data from documents or a database. This helps the user get a quick overview of
their data or allows them to present a lot of data easily.
The selection operation retrieves certain records from a relation based on the user-
specified criteria.
The projection operation extracts fields from a relation, permitting the user to create
new relations that contain only the required information.
The join operation combines the data from two relations based on a common
column, providing the user with more information than is available in the individual
relations.
Together, these three operations are part of relational algebra. Relational database systems
use a query language called Structured Query Language (SQL) to implement relational
algebra operations.
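The three operations can be sketched in plain Python by treating a relation as a list of dictionaries. This is an illustrative model only, not how a DBMS implements them; the function names and the sample relations are hypothetical.

```python
def select_rows(relation, predicate):
    """Selection: keep the records that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, fields):
    """Projection: keep only the named fields, dropping duplicate results."""
    result = []
    for row in relation:
        projected = {field: row[field] for field in fields}
        if projected not in result:
            result.append(projected)
    return result


def join_on(left, right, column):
    """Join: combine records whose values in the common column are equal."""
    return [
        {**l_row, **r_row}
        for l_row in left
        for r_row in right
        if l_row[column] == r_row[column]
    ]


employees = [
    {"Code": 101, "Name": "Prince", "Deptt": "RD01"},
    {"Code": 102, "Name": "Pankaj", "Deptt": "RD01"},
    {"Code": 103, "Name": "Meena", "Deptt": "HR01"},
]
departments = [
    {"Deptt": "RD01", "DeptName": "Research"},
    {"Deptt": "HR01", "DeptName": "Human Resources"},
]

rd_staff = select_rows(employees, lambda row: row["Deptt"] == "RD01")
deptt_codes = project(employees, ["Deptt"])
staff_with_dept = join_on(employees, departments, "Deptt")
```

In SQL terms, selection corresponds to the WHERE clause, projection to the column list after SELECT, and join to the JOIN clause.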
When a table is defined, every field in it is assigned a data type. The type of a data value
both defines and constrains the kinds of operations that may be performed on it. Some of
the most commonly used SQL data types are as follows:
• Data Definition Language: DDL is used to create and delete a database and its objects.
These commands are primarily used by the database administrator (DBA) during the building
and removal phases of a database project. The most important DDL statements in SQL are as
follows:
- CREATE TABLE: To create a new table.
- ALTER TABLE: To modify the structure of a table.
- DROP TABLE: To delete a table.
• Data Manipulation Language: DML is used to retrieve, insert, modify and delete
database information. These commands will be used by all database users during the
routine operation of the database. The most important DML statements in SQL are the
following:
- INSERT: To insert data into a table.
- UPDATE: To update data in a table.
- DELETE: To delete data from a table.
- SELECT: To retrieve data from a table.
NOTE: Every SQL statement must be terminated by a semicolon (;), even if the statement
extends over many lines.
The CREATE TABLE command is used to define the structure of the table.
Syntax:
CREATE TABLE <tablename> (
<field1> <data type>,
<field2> <data type>,
Example:
CREATE TABLE EMPLOYEE(
Code NUMBER(5),
Deptt CHAR(10),
• The table and column names must start with a letter followed by letters, numbers or
underscores.
• Avoid using SQL keywords as names for tables or columns (such as SELECT,
CREATE, and INSERT).
• For each column, a name and a data type must be specified and the column name
must be unique within the table definition.
• Each column definition should be separated with a comma.
The ALTER TABLE command allows a user to change the structure of an existing table.
Syntax:
ALTER TABLE <tablename>
<ADD | MODIFY | DROP column(s)>;
Examples:
1 ALTER TABLE EMPLOYEE
  ADD Email CHAR(25);
  Explanation: This command adds a new column named Email, with a maximum width of 25
  characters, to the EMPLOYEE table.
2 ALTER TABLE EMPLOYEE
  MODIFY Name CHAR(25);
  Explanation: This command changes the maximum width of the Name column to 25 characters
  in the EMPLOYEE table.
3 ALTER TABLE EMPLOYEE
  DROP Deptt;
  Explanation: This command deletes the Deptt column from the EMPLOYEE table.
The DROP TABLE command removes the table definition (with all records).
Syntax:
DROP TABLE <tablename>;
Example:
DROP TABLE EMPLOYEE;
The above SQL command will delete the EMPLOYEE table.
The INSERT command is used to insert or add rows (records) into the specified table.
Syntax:
INSERT INTO <tablename> (
column1, column2, ..., columnN)
VALUES (value1, value2, ..., valueN);
Examples:
1 INSERT INTO EMPLOYEE (
  Code, Deptt, Name, Address, Salary)
  VALUES (101, 'RD01', 'Prince', 'Park Way', 15000);
  Explanation: Example 1 adds a new record to the EMPLOYEE table consisting of the values
  in parentheses.
2 INSERT INTO EMPLOYEE
  VALUES (102, 'RD01', 'Pankaj', 'Pitampura', 26062700, 8000);
Note that for each of the listed columns, a matching value must be specified. In case no
column list is specified, then a value must be given for each column and in the same order as
specified in the CREATE TABLE command.
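Both forms of INSERT can be tried out with Python's sqlite3 module. The sketch assumes a six-column EMPLOYEE table: the column types and the name of the fifth column (Phone) are guesses, because the full CREATE TABLE statement is not shown in this document.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed structure; types and the Phone column are not given in the text.
conn.execute(
    "CREATE TABLE EMPLOYEE ("
    " Code INTEGER, Deptt TEXT, Name TEXT,"
    " Address TEXT, Phone INTEGER, Salary INTEGER)"
)

# With a column list: columns that are not listed (here Phone) stay NULL.
conn.execute(
    "INSERT INTO EMPLOYEE (Code, Deptt, Name, Address, Salary) "
    "VALUES (101, 'RD01', 'Prince', 'Park Way', 15000)"
)

# Without a column list: one value per column, in CREATE TABLE order.
conn.execute(
    "INSERT INTO EMPLOYEE "
    "VALUES (102, 'RD01', 'Pankaj', 'Pitampura', 26062700, 8000)"
)

rows = conn.execute(
    "SELECT Code, Phone, Salary FROM EMPLOYEE ORDER BY Code"
).fetchall()
```

The first record has no Phone value because that column was left out of the column list; the second record supplies all six values.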
The UPDATE command is used for modifying attribute values of records in a table.
Syntax:
UPDATE <tablename>
SET column1 = value1
[, column2 = value2]
... ... ... ... ...
[, columnN = valueN]
[WHERE <condition>];
Note that components specified inside the square brackets [] are optional.
Examples:
1 UPDATE EMPLOYEE
  SET Salary = Salary + 1000;
  Explanation: Example 1 updates (in our case, increments) the Salary field by 1000 for all
  the records.
2 UPDATE EMPLOYEE
  SET Salary = Salary + 1000
  WHERE Deptt = 'RD01';
  Explanation: Example 2 increments the Salary column by 1000 for only those rows that
  satisfy the condition specified in the WHERE clause (Deptt = 'RD01').
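The effect of the WHERE clause on UPDATE can be checked with a small sqlite3 sketch. The reduced four-column EMPLOYEE table and the third employee (Meena, HR01) are assumptions added for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (Code INTEGER, Deptt TEXT, Name TEXT, Salary INTEGER)"
)
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)",
    [
        (101, "RD01", "Prince", 15000),
        (102, "RD01", "Pankaj", 8000),
        (103, "HR01", "Meena", 9000),  # hypothetical extra record
    ],
)

# Only the rows of department RD01 receive the increment.
cursor = conn.execute(
    "UPDATE EMPLOYEE SET Salary = Salary + 1000 WHERE Deptt = 'RD01'"
)
rows_updated = cursor.rowcount

salaries = conn.execute(
    "SELECT Code, Salary FROM EMPLOYEE ORDER BY Code"
).fetchall()
```

The salary of employee 103 is unchanged because the row does not satisfy the condition.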
The DELETE command is used to delete all or selected records from the specified table.
Syntax:
DELETE FROM <tablename>
[WHERE <condition>];
Example:
DELETE FROM EMPLOYEE
WHERE Salary > 8000;
NOTE: If WHERE condition is not used in the DELETE command, then all the records from
the specified table will be deleted.
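The difference between a DELETE with and without a WHERE clause can be seen in the following sqlite3 sketch. The reduced EMPLOYEE table and the third sample record are assumptions for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (Code INTEGER, Name TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [
        (101, "Prince", 15000),
        (102, "Pankaj", 8000),
        (103, "Meena", 9000),  # hypothetical extra record
    ],
)

# With WHERE: only the matching records are removed.
conn.execute("DELETE FROM EMPLOYEE WHERE Salary > 8000")
remaining_codes = [
    row[0] for row in conn.execute("SELECT Code FROM EMPLOYEE ORDER BY Code")
]

# Without WHERE: every record is removed, but the empty table still exists.
conn.execute("DELETE FROM EMPLOYEE")
count_after_full_delete = conn.execute(
    "SELECT COUNT(*) FROM EMPLOYEE"
).fetchone()[0]
```

Unlike DROP TABLE, a DELETE without WHERE leaves the table definition in place, so new records can still be inserted afterwards.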
The SELECT statement is used to query the database and retrieve selected data.
Syntax:
SELECT <column1, column2, column3,...., columnN>
FROM <tablename>
[WHERE <condition>]
[GROUP BY <column1, column2, column3,...., columnN>]
[HAVING <condition>]
[ORDER BY <column1, column2, column3,...., columnN [ASC|DESC]>];
To select all the columns of a table, use * instead of column list with SELECT.
Examples:
1 SELECT Code, Name, Salary
  FROM EMPLOYEE;
  Explanation: The SELECT statement selects the values of the three specified columns from
  the EMPLOYEE table. This operation is called projection.
2 SELECT *
  FROM EMPLOYEE
  WHERE Salary > 7500;
  Explanation: The SELECT statement selects all columns of those rows of the EMPLOYEE table
  in which the Salary column contains a value greater than 7500. This operation is called
  selection.
3 SELECT *
  FROM EMPLOYEE
  WHERE Salary > 7500
  ORDER BY Name DESC;
  Explanation: The SELECT statement displays the same rows sorted in descending order by the
  attribute Name.
The ORDER BY clause specifies a sorting order in which the result tuples of a query are to
be displayed; DESC specifies a descending order. By default, ORDER BY arranges the
result set in ascending order (whether one uses ASC or not).
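The three kinds of SELECT shown above can be run against a small sample table with Python's sqlite3 module. The reduced EMPLOYEE table and the third record are assumptions made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (Code INTEGER, Name TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [
        (101, "Prince", 15000),
        (102, "Pankaj", 8000),
        (103, "Meena", 7000),  # hypothetical extra record
    ],
)

# Projection: only some columns of every row.
names = conn.execute("SELECT Name FROM EMPLOYEE ORDER BY Code").fetchall()

# Selection plus descending sort: rows with Salary > 7500, Name from Z to A.
well_paid_desc = conn.execute(
    "SELECT Name FROM EMPLOYEE WHERE Salary > 7500 ORDER BY Name DESC"
).fetchall()

# ORDER BY without ASC or DESC sorts in ascending order.
names_asc = conn.execute("SELECT Name FROM EMPLOYEE ORDER BY Name").fetchall()
```

Each result is a list of tuples, one tuple per row, even when only one column is selected.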
The INNER JOIN creates a new result table by combining column values of two tables
(table1 and table2) based upon the join-predicate. The query compares each row of table1
with each row of table2 to find all pairs of rows which satisfy the join-predicate. When the
join-predicate is satisfied, column values for each matched pair of rows of table1 and table2
are combined into a result row.
Syntax:
SELECT table1.column1, table2.column2, ...
FROM table1
INNER JOIN table2
ON table1.common_field = table2.common_field;
Example:
Inner joins eliminate the rows that do not match with a row from the other table. Outer joins,
however, return all rows from at least one of the tables or views mentioned in the FROM
clause, as long as those rows meet any WHERE or HAVING search conditions. All rows are
retrieved from the left table referenced with a left outer join, and all rows from the right table
referenced in a right outer join. All rows from both tables are returned in a full outer join.
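The difference between an inner join and a left outer join can be sketched with Python's sqlite3 module. The region and branch tables below are hypothetical: only the join column region_nbr is taken from this document, and the other columns and all rows are made up. Right and full outer joins are left out because older SQLite versions do not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE region (region_nbr INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE branch (
    branch_nbr  INTEGER PRIMARY KEY,
    branch_name TEXT,
    region_nbr  INTEGER
);
INSERT INTO region VALUES (100, 'East'), (200, 'West'), (300, 'North');
INSERT INTO branch VALUES (1, 'Downtown', 100), (2, 'Harbor', 200);
""")

# Inner join: the region without a branch (North) is eliminated.
inner_rows = conn.execute(
    "SELECT r.region_name, b.branch_name "
    "FROM region r INNER JOIN branch b ON r.region_nbr = b.region_nbr "
    "ORDER BY r.region_nbr"
).fetchall()

# Left outer join: every region is kept; missing branch values become NULL.
left_rows = conn.execute(
    "SELECT r.region_name, b.branch_name "
    "FROM region r LEFT OUTER JOIN branch b ON r.region_nbr = b.region_nbr "
    "ORDER BY r.region_nbr"
).fetchall()
```

In Python the NULL in the unmatched row appears as `None`, which makes the rows that exist only in the left table easy to spot.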
SQL Server uses the following ISO keywords for outer joins specified in a FROM clause:
The following example JOINs the region and branch tables on the region_nbr column. Here
are the contents of the tables:
Exercise normalization
The following table is already in first normal form (1NF). Convert this table to the third
normal form (3NF) using the techniques you learned in this topic.