DB Lecture Note All in ONE

Database Systems Lecture Note
Finster 1
Introduction to Database System
Database systems are designed to manage large data
sets in an organization. Data management involves
both the definition and the manipulation of the data,
ranging from simple representation of the data
to considerations of structures for the storage of
information. Data management also considers the
provision of mechanisms for the manipulation of
information.
Today, databases are essential to every business.
They are used to maintain internal records, to present
data to customers and clients on the World Wide
Web, and to support many other commercial
processes. Databases are likewise found at the core of
many modern organizations.
The power of databases comes from a body of
knowledge and technology that has developed over
several decades and is embodied in specialized
software called a database management system, or
DBMS. A DBMS is a powerful tool for creating and
managing large amounts of data efficiently and
allowing it to persist safely over long periods of time.
These systems are among the most complex types of
software available.
Compiled by: Adane Kasie, Faculty of Informatics, BDU, Sep't 19/2009

Thus, for our question: What is a database? In
essence, a database is nothing more than a collection
of shared information that exists over a long period of
time, often many years. In common parlance, the term
database refers to a collection of data that is managed
by a DBMS.
Thus the DB course is about:

 How to organize data
 Supporting multiple users
 Efficient and effective data retrieval
 Secured and reliable storage of data
 Maintaining consistent data
 Making information useful for decision
making
Data management has passed through different levels
of development along with developments in
technology and services. These can best be
described by grouping them into three levels
of development. Even though each new level brought
an advantage and overcame a problem, all
methods of data handling are still in use to some extent.
The three major levels are:
1. Manual Approach
2. Traditional File Based Approach
3. Database Approach
1. Manual Approach

In the manual approach, data storage and retrieval
follow the primitive and traditional way of
information handling, where cards and paper are used
for the purpose. Data storage and retrieval are
performed using human labour.
➢ Files for as many events and objects as the
organization has are used to store information.
➢ Each of the files containing various kinds of
information is labelled and stored in one or
more cabinets.
➢ The cabinets could be kept in safe places for
security purposes based on the sensitivity of the
information contained in them.
➢ Insertion and retrieval are done by searching first
for the right cabinet, then for the right file,
then for the information.
➢ One could have an indexing system to facilitate
access to the data.
Limitations of the Manual approach

➢ Prone to error
➢ Difficult to update, retrieve, integrate
➢ You have the data but it is difficult to compile the
information
➢ Limited to small size information
➢ Cross referencing is difficult

An alternative approach to data handling is a
computerized way of dealing with the information.
The computerized approach can be either
decentralized or centralized, based on where the data
resides in the system.

2. Traditional File Based Approach

After the introduction of computers for data
processing to the business community, the need to
use the device for data storage and processing
increased. There were, and still are, several computer
applications with file based processing used for the
purpose of data handling. Even though the
approach evolved over time, the basic structure is
still similar if not identical.
➢ File based systems were an early attempt to
computerize the manual filing system.
➢ This approach is the decentralized computerized
data handling method.
➢ A collection of application programs performs
services for the end-users. In such systems, every
application program that provides service to end
users defines and manages its own data.
➢ Such systems have a number of programs for each
of the different applications in the organization.
➢ Since every application defines and manages its
own data, the system is subject to a serious data
duplication problem.
➢ A file, in the traditional file based approach, is a
collection of records which contain logically
related data.

Limitations of the Traditional File Based approach

As business applications became more complex,
demanding more flexible and reliable data handling
methods, the shortcomings of the file based system
became evident. These shortcomings include, but are not
limited to:
➢ Separation or Isolation of Data: information
available in one application may not be
known to the others. Data synchronisation is done manually.
➢ Limited data sharing: every application
maintains its own data.
➢ Lengthy development and maintenance time
➢ Duplication or redundancy of data (cost in money
and time, and loss of data integrity)
➢ Data dependency on the application: the data
structure is embedded in the application; hence, a
change in the data structure requires a change in the
application as well.
➢ Incompatible file formats or data structures (e.g.
“C” and COBOL) between different applications
and programs, creating inconsistency and
making it difficult to process the data jointly.
➢ Fixed query processing, which is defined during
application development
The limitations of the traditional file based data
handling approach arise from two basic reasons.
1. The definition of the data is embedded in the
application program, which makes it difficult
to modify the database definition easily.
2. There is no control over the access and manipulation
of the data beyond that imposed by the
application programs.
The most significant problem experienced by the
traditional file based approach of data handling can
be formalized by what is called “update anomalies”.
We have three types of update anomalies:
1. Modification Anomalies: a problem experienced
when one or more data values are modified in one
application program but not in others containing
the same data set.
2. Deletion Anomalies: a problem encountered
when a record set is deleted from one
application but remains untouched in others.
3. Insertion Anomalies: a problem experienced
whenever there is a new data item to be recorded
and the recording is not made in all the
applications. And when the same data item is
inserted in different applications, there could be
errors in encoding which cause the new data
item to be considered as a totally different object.
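The three anomalies can be illustrated with a minimal sketch. It assumes two
hypothetical applications (billing and shipping) that each keep a private copy
of the same customer data; the names and values are made up for illustration
and do not come from the lecture.

```python
# Two hypothetical applications, each keeping its own copy of customer data,
# as happens in the file based approach.
billing_file = {"C001": {"name": "Abebe", "city": "Bahir Dar"}}
shipping_file = {"C001": {"name": "Abebe", "city": "Bahir Dar"}}

# Modification anomaly: the city is updated in one application only.
billing_file["C001"]["city"] = "Addis Ababa"
modification_anomaly = (
    billing_file["C001"]["city"] != shipping_file["C001"]["city"]
)

# Deletion anomaly: the customer is deleted from one application only.
del billing_file["C001"]
deletion_anomaly = "C001" in shipping_file and "C001" not in billing_file

# Insertion anomaly: the same new customer is keyed in twice, once with a
# spelling difference, so the two files seem to describe different objects.
billing_file["C002"] = {"name": "Sara", "city": "Gondar"}
shipping_file["C002"] = {"name": "Sarah", "city": "Gondar"}
insertion_anomaly = billing_file["C002"] != shipping_file["C002"]

print(modification_anomaly, deletion_anomaly, insertion_anomaly)
```

Each flag ends up true because nothing outside the applications keeps the two
copies in step.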

3. Database Approach
Following a famous paper written by Dr. Edgar
Frank Codd in 1970, database systems changed
significantly. Codd proposed that database systems
should present the user with a view of data organized
as tables called relations. Behind the scenes, there
might be a complex data structure that allowed rapid
response to a variety of queries. But, unlike the user
of earlier database systems, the user of a relational
system would not be concerned with the storage
structure. Queries could be expressed in a very high-
level language, which greatly increased the efficiency
of database programmers. The database approach
emphasizes the integration and sharing of data
throughout the organization.
Thus in Database Approach:

➢ Database is just a computerized record keeping
system or a kind of electronic filing cabinet.
➢ Database is a repository for collection of
computerized data files.
➢ Database is a shared collection of logically related
data and description of data designed to meet the
information needs of an organization. Since it is a
shared corporate resource, the database is
integrated with minimum amount of or no
duplication.
➢ Database is a collection of logically related data
where these logically related data comprise the
entities, attributes, relationships, and business
rules of an organization's information.
➢ In addition to containing data required by an
organization, a database also contains a description
of the data, which is known as “Metadata”,
“Data Dictionary”, “Systems Catalogue”,
“Data about Data” or sometimes “Data
Directory”.
➢ Since a database contains information about the
data (metadata), it is called a self-describing
collection of integrated records.
➢ The purpose of a database is to store information
and to allow users to retrieve and update that
information on demand.
➢ Database is designed once and used
simultaneously by many users.
➢ Unlike the traditional file based approach, in the
database approach there is program-data
independence, that is, the separation of the data
definition from the application. Thus the
application is not affected by changes made in the
data structure and file organization.
➢ Each database application will perform a
combination of: creating the database, and reading,
updating and deleting data.
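The idea that a database stores a description of its own data can be seen in a
small sketch using Python's built-in sqlite3 module. SQLite is only one possible
DBMS, chosen here for convenience, and the student table is a made-up example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# SQLite keeps its catalogue in a built-in table named sqlite_master.
catalogue = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE type = 'table'"
).fetchall()

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, default value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]

print(catalogue)
print(columns)
conn.close()
```

The application never declared the structure of student in its own code; it
asked the database for it, which is what makes the collection self-describing.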

Benefits of the database approach

➢ Data can be shared: two or more users can access
and use same data instead of storing data in
redundant manner for each user.
➢ Improved accessibility of data: by using structured
query languages, the users can easily access data
without programming experience.
➢ Redundancy can be reduced: isolated data is
integrated in database to decrease the redundant
data stored at different applications.
➢ Quality data can be maintained: the different
integrity constraints in the database approach
will maintain the quality leading to better
decision making
➢ Inconsistency can be avoided: controlled data
redundancy will avoid inconsistency of the data
in the database to some extent.
➢ Transaction support can be provided: basic demands
of any transaction support system are implemented
in a full scale DBMS.
➢ Integrity can be maintained: data at different
applications will be integrated together with
additional constraints to facilitate validity and
consistency of shared data resource.

➢ Security measures can be enforced: the shared data
can be secured by having different levels of
clearance and other data security mechanisms.
➢ Improved decision support: the database will
provide information useful for decision making.
➢ Standards can be enforced: the different ways of
using and dealing with data by different units of
an organization can be balanced and
standardized by using the database approach.
➢ Compactness: since it is an electronic data
handling method, the data is stored compactly
(no voluminous papers).
➢ Speed: data storage and retrieval is fast as it will
be using the modern fast computer systems.
➢ Less labour: unlike the other data handling
methods, data maintenance will not demand
much resource.
➢ Centralized information control: since relevant data
in the organization will be stored at one
repository, it can be controlled and managed at
the central level.

Limitations and risk of Database Approach

➢ Introduction of new professional and specialized
personnel.
➢ Complexity in designing and managing data
➢ The cost and risk during conversion from the old
to the new system
➢ High cost to be incurred to develop and maintain
the system
➢ Complex backup and recovery services from the
user's perspective
➢ Reduced performance due to centralization and
data independence
➢ High impact on the system when failure occurs
to the central system.

Database Management System (DBMS)

A Database Management System (DBMS) is a software
package used for providing EFFICIENT,
CONVENIENT and SAFE MULTI-USER (many
people/programs accessing the same database, or even
the same data, simultaneously) storage of and access to
MASSIVE amounts of PERSISTENT (data outlives
the programs that operate on it) data. A DBMS also
provides a systematic method for creating, updating,
storing and retrieving data in a database. It also
provides the services of controlling data access,
enforcing data integrity, managing concurrency
control, and recovery. With this in mind, a full
scale DBMS should provide at least the following
services to the user.

1. Data storage, retrieval and update in the database
2. A user accessible catalogue
3. Transaction support service: ALL or NONE
transactions, which minimize data inconsistency.
4. Concurrency Control Services: simultaneous access
and update on the database by different users
should be implemented correctly.
5. Recovery Services: a mechanism for recovering the
database after a failure must be available.
6. Authorization Services (Security): must support
the implementation of access and authorization
services for the database administrator and users.
7. Support for Data Communication: should provide
the facility to integrate with data transfer software
or data communication managers.
8. Integrity Services: rules about the data and the changes
that take place on the data, ensuring correctness and
consistency of stored data, and quality of data
based on business constraints.
9. Services to promote data independence between
the data and the application
10. Utility services: sets of utility service facilities
like
➢ Importing data
➢ Statistical analysis support
➢ Index reorganization
➢ Garbage collection
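The ALL or NONE transaction service in item 3 can be sketched with Python's
built-in sqlite3 module. The account table, the transfer function and the
amounts are hypothetical, and SQLite stands in for a full scale DBMS only for
the sake of a runnable example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move amount between two accounts as one all-or-none transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "A", "B", 30)       # both updates are applied
failed = transfer(conn, "A", "B", 500)  # CHECK fails, so neither is applied
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, failed, balances)
```

In the second call the credit to B has already been executed when the debit
from A violates the constraint; the rollback undoes the credit as well, so the
data never shows a half-finished transfer.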

DBMS and Components of DBMS Environment
Fig. General architecture of a DBMS
A DBMS is a software package used to design,
manage, and maintain databases. Each DBMS
should have facilities to define the database,
manipulate the content of the database and control
the database. These facilities help the designer,
the user as well as the database administrator to
discharge their responsibilities in designing, using
and managing the database. It provides the
following facilities:
➢ Data Definition Language (DDL):
o Language used to define each data
element required by the organization.
o Commands for setting up the schema or the
intension of the database
o These commands are used to set up a
database and to create, delete and alter tables, with
the facility of handling constraints
➢ Data Manipulation Language (DML):
o Core commands used by end-users
and programmers to store, retrieve, and
access the data in the database, e.g. SQL
o Since the data required by the user (the query)
is extracted using this type of
language, it is also called "Query Language"
➢ Data Dictionary:
o Because a database is a self-describing
system, this tool, the Data Dictionary,
is used to store and organize information
about the data stored in the database.
➢ Data Control Language:
o A database is a shared resource that
demands control of data access and usage.
The database administrator should have the
facility to control the overall operation of the
system.
o Data Control Languages are commands
that help the Database Administrator to
control the database.
o The commands include granting or revoking
privileges to access the database or a
particular object within the database, and
storing or removing database transactions
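As a rough illustration of the difference between DDL and DML commands, the
sketch below runs both kinds of SQL statement through Python's built-in sqlite3
module. The employee table and its columns are invented for the example. Data
control statements are not shown, since SQLite does not implement GRANT and
REVOKE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the schema, including constraints.
conn.execute("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        salary REAL CHECK (salary >= 0)
    )
""")
conn.execute("ALTER TABLE employee ADD COLUMN dept TEXT")

# DML: store, update, delete and retrieve data.
conn.execute(
    "INSERT INTO employee (emp_id, name, salary, dept) "
    "VALUES (1, 'Almaz', 5000, 'IT')"
)
conn.execute(
    "INSERT INTO employee (emp_id, name, salary, dept) "
    "VALUES (2, 'Kebede', 4000, 'HR')"
)
conn.execute("UPDATE employee SET salary = salary * 1.1 WHERE dept = 'IT'")
conn.execute("DELETE FROM employee WHERE emp_id = 2")
rows = conn.execute("SELECT name, dept FROM employee").fetchall()
print(rows)
conn.close()
```

The CREATE and ALTER statements change the structure of the database, while
INSERT, UPDATE, DELETE and SELECT only touch its content.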
The DBMS is a software package that helps to design,
manage, and use data using the database approach.
Taking a DBMS as a system, one can describe it with
respect to its environment or other systems interacting
with the DBMS. The DBMS environment has five
components. To design and use a database, there will
be the interaction or integration of Hardware,
Software, Data, Procedure and People.
1. Hardware: components that one can touch
and feel. These components comprise
various types of personal computers, mainframes
or any server computers to be used in a multi-user
system, network infrastructure, and other
peripherals required in the system.
2. Software: a collection of commands and
programs used to manipulate the hardware to
perform a function. These include components
like the DBMS software, application programs,
operating systems, network software, language
software and other relevant software.
3. Data: since the goal of any database system is to
have better control of the data and to make data
useful, data is the most important component to
the user of the database. There are two categories
of data in any database system: Operational data
and Metadata. Operational data is the data
actually stored in the system to be used by the
user. Metadata is the data that is used to store
information about the database itself.
The structure of the data in the database is called
the schema, which is composed of the entities,
properties of entities, relationships between
entities and business constraints.
4. Procedure: the rules and regulations on
how to design and use a database. It includes
procedures like how to log on to the DBMS, how
to use facilities, how to start and stop the DBMS,
how to make backups, how to treat hardware and
software failures, and how to change the structure of
the database.
5. People: this component is composed of the
people in the organization who are responsible for or
play a role in designing, implementing,
managing, administering and using the resources
in the database. This component ranges from groups
of people with a high level of knowledge about the
database and the design technology to others with
no knowledge of the system except using the
data in the database.

Database Development Life Cycle (DDLC)
As it is one component in most information system
development tasks, there are several steps in
designing a database system. Here more emphasis is
given to the design phases of the system development
life cycle. The major steps in database design are:
1. Planning: identifying the information gap in
an organization and proposing a database solution
to solve the problem.
2. Analysis: concentrates on fact finding
about the problem or the opportunity. Feasibility
analysis, requirement determination and
structuring, and selection of the best design method
are also performed at this phase.
3. Design: in database development more emphasis
is given to this phase. The phase is further
divided into three sub-phases.
a. Conceptual Design: a concise description of
the data, data types, relationships between
data and constraints on the data.
• There is no implementation or physical
detail consideration.
• Used to elicit and structure all
information requirements
b. Logical Design: a higher level conceptual
abstraction with a selected specific data model to
implement the data structure.
• It is independent of any particular DBMS and
involves no other physical considerations.
c. Physical Design: physical implementation of
the logical design of the database with
respect to internal storage and file structure
of the database for the selected DBMS.
• Used to develop all technology and
organizational specifications.
4. Implementation: the testing and deployment of
the designed database for use.
5. Operation and Support: administering and
maintaining the operation of the database system
and providing support to users, including tuning the
database operations for best performance.

Roles in Database Design and Use
As people are one of the components in the DBMS
environment, there are groups of roles played by
different stakeholders in the design and operation
of a database system.
1. Database Administrator (DBA)

➢ Responsible to oversee, control and manage the
database resources (the database itself, the DBMS
and other related software)
➢ Authorizing access to the database
➢ Coordinating and monitoring the use of the
database
➢ Responsible for determining and acquiring
hardware and software resources
➢ Accountable for problems like poor security,
poor performance of the system
➢ Involved in all steps of database development
We can have further classifications of this role in
big organizations having huge amounts of data and
user requirements.
1. Data Administrator (DA): is responsible for the
management of data resources. This involves
database planning, development, and
maintenance of standards, policies and
procedures at the conceptual and logical
design phases.
2. Database Administrator (DBA): This is a
more technically oriented role. The DBA is
responsible for the physical realization of the
database and is involved in physical design,
implementation, security and integrity
control of the database.
2. Database Designer (DBD)

➢ Identifies the data to be stored and chooses the
appropriate structures to represent and store the
data.
➢ Should understand the user requirements and
should choose how the user views the database.
➢ Is involved in the design phase before the
implementation of the database system.
We have two kinds of database designers,
one involved in the logical and conceptual design
and another involved in physical design.
1. Logical and Conceptual DBD
➢ Identifies data (entities, attributes and
relationships) relevant to the
organization
➢ Identifies constraints on each data item
➢ Understands the data and business rules in
the organization
➢ Sees the database independent of any
data model at the conceptual level and
considers one specific data model at the
logical design phase.
2. Physical DBD
➢ Take logical design specification as input
and decide how it should be physically
realized.
➢ Map the logical data model on the
specified DBMS with respect to tables and
integrity constraints. (DBMS dependent
designing)
➢ Select specific storage structure and access
path to the database
➢ Design security measures required on the
database

3. Application Programmer and Systems Analyst

➢ System analyst determines the user
requirement and how the user wants to view
the database.
➢ The application programmer implements these
specifications as programs; code, test, debug,
document and maintain the application
program.
➢ The application programmer determines the
interface on how to retrieve, insert, update and
delete data in the database.
➢ The application could use any high level
programming language according to the
availability, the facility and the required
service.
4. End Users
Workers whose jobs require accessing the
database frequently for various purposes. There
are different groups of users in this category.
1. Naïve Users:
➢ Sizable proportion of users
➢ Unaware of the DBMS
➢ Only access the database based on their
access level and demand
➢ Use standard and pre-specified types of
queries.

2. Sophisticated Users
➢ Users familiar with the structure of the
Database and facilities of the DBMS.
➢ Have complex requirements
➢ Have higher level queries
➢ Are most of the time engineers, scientists,
business analysts, etc
3. Casual Users
➢ Users who access the database
occasionally.
➢ Need different information from the
database each time.
➢ Use sophisticated database queries to
satisfy their needs.
➢ Are most of the time middle to high level
managers.
These users can again be classified as “Actors on the
Scene” and “Workers Behind the Scene”.
Actors on the Scene:

➢ Data Administrator
➢ Database Administrator
➢ Database Designer
➢ End Users
Workers behind the scene

➢ DBMS designers and implementers: who design
and implement different DBMS software.
➢ Tool Developers: experts who develop software
packages that facilitate database system
design and use. Developers of prototyping, simulation
and code generation tools are examples.
Independent software vendors could also be
categorized in this group.
➢ Operators and Maintenance Personnel: system
administrators who are responsible for actually
running and maintaining the hardware and
software of the database system and the
information technology facilities.

ANSI-SPARC Architecture
The purpose and origin of the Three-Level database
architecture
 All users should be able to access same data.
This is important since the database has a
shared data feature: all the data is stored
in one location, and each user has their
own customized way of interacting with the
data.
 A user's view is unaffected or immune to
changes made in other views. Since the
requirement of one user is independent of the
other, a change made in one user’s view
should not affect other users.
 Users should not need to know physical
database storage details. As there are naïve
users of the system, hardware level or physical
details should be a black-box for such users.
 DBA should be able to change database storage
structures without affecting the users' views. A
change in file organization, access method
should not affect the structure of the data
which in turn will have no effect on the users.
 Internal structure of database should be
unaffected by changes to physical aspects of
storage, such as change of hard disk
 DBA should be able to change conceptual
structure of database without affecting all
users. In any database system, the DBA will
have the privilege to change the structure of
the database, such as adding tables, adding and
deleting attributes, or changing the
specification of the objects in the database.
All of the above, and many more functionalities,
are possible due to the three-level ANSI-SPARC
architecture.
Three-level ANSI-SPARC Architecture of a Database

ANSI-SPARC Architecture and Database Design
Phases
External Level: Users' view of the database. It
describes that part of the database that is relevant to a
particular user. Different users have their own
customized view of the database independent of
other users.
Conceptual Level: Community view of the database.
It describes what data is stored in the database and
the relationships among the data along with the
business constraints.
Internal Level: Physical representation of the
database on the computer. It describes how the data
is stored in the database.

The following example can be taken as an illustration
of the difference between the three levels in the
ANSI-SPARC database architecture, where:
• The first level is concerned with the groups
of users and their respective data
requirements, each independent of the others.
• The second level describes the whole
content of the database, where each piece of
information is represented once.
• The third level describes how the data is
physically stored on the computer.

Differences between Three Levels of ANSI-SPARC
Architecture
Defines DBMS schemas at three levels:
Internal schema: at the internal level to describe
physical storage structures and access paths.
Typically uses a physical data model, i.e. one
specific to the DBMS.
Conceptual schema: at the conceptual level to
describe the structure and constraints for the whole
database for a community of users. It uses a
conceptual or an implementation data model.
External schema: at the external level to describe the
various user views. Usually uses the same data
model as the conceptual level.
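One common way to picture external schemas is as views defined over a shared
table. The sketch below assumes that reading and uses Python's built-in sqlite3
module; the staff table and the two view names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: one community-wide definition of the data.
conn.execute(
    "CREATE TABLE staff (staff_id INTEGER PRIMARY KEY, name TEXT, "
    "salary INTEGER, branch TEXT)"
)
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?, ?)",
    [(1, "Hana", 6000, "B1"), (2, "Dawit", 4500, "B2")],
)

# External level: each user group gets its own view of the same data.
conn.execute("CREATE VIEW payroll_view AS SELECT staff_id, salary FROM staff")
conn.execute("CREATE VIEW directory_view AS SELECT name, branch FROM staff")

payroll = conn.execute("SELECT * FROM payroll_view ORDER BY staff_id").fetchall()
directory = conn.execute("SELECT * FROM directory_view ORDER BY name").fetchall()
print(payroll)
print(directory)
conn.close()
```

Both views draw on the same stored rows, so each piece of information is kept
once while every user group sees only the part relevant to it. How SQLite lays
the rows out on disk (the internal level) is not visible in either view.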
Data Independence
Logical Data Independence:
 Refers to immunity of external schemas to
changes in conceptual schema.
 Conceptual schema changes e.g.
addition/removal of entities should not require
changes to external schemas or rewrites of
application programs.
 The capacity to change the conceptual schema
without having to change the external schemas
and their application programs.

Physical Data Independence

 The ability to modify the physical schema
without changing the logical schema
 Applications depend on the logical schema
 In general, the interfaces between the various
levels and components should be well defined
so that changes in some parts do not seriously
influence others.
 The capacity to change the internal schema
without having to change the conceptual
schema
 Refers to immunity of conceptual schema to
changes in the internal schema
 Internal schema changes e.g. using different file
organizations, storage structures/devices
should not require change to conceptual or
external schemas.
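Both kinds of data independence can be tried out on a small scale. The sketch
below is only an illustration under stated assumptions: it uses Python's
built-in sqlite3 module, a made-up staff table, and a view standing in for an
external schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staff (staff_id INTEGER PRIMARY KEY, name TEXT, branch TEXT)"
)
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [(1, "Hana", "B1"), (2, "Dawit", "B2")],
)
conn.execute("CREATE VIEW directory_view AS SELECT name, branch FROM staff")

# The application's query is written once, against the external view.
query = "SELECT name FROM directory_view WHERE branch = 'B1'"
before = conn.execute(query).fetchall()

# Internal-level change: add an index, i.e. a new access path.
conn.execute("CREATE INDEX idx_staff_branch ON staff (branch)")
after_index = conn.execute(query).fetchall()

# Conceptual-level change: add a new attribute to the table.
conn.execute("ALTER TABLE staff ADD COLUMN salary INTEGER")
after_column = conn.execute(query).fetchall()

print(before == after_index == after_column)
conn.close()
```

The query text never changes. The index affects only how the rows are found
(physical data independence), and the new column does not disturb the view the
application reads from (logical data independence).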

Data Independence and the ANSI-SPARC Three-level
Architecture
The distinction between a Data Definition
Language (DDL) and a Data Manipulation
Language (DML)
Database Languages
Data Definition Language (DDL)

 Allows DBA or user to describe and name
entities, attributes and relationships required
for the application.
 Specification notation for defining the database
schema
Data Manipulation Language (DML)

 Provides basic data manipulation operations
on data held in the database.
 Language for accessing and manipulating the
data organized by the appropriate data model
 DML also known as query language
Procedural DML: user specifies what data is
required and how to get the data.
Non-Procedural DML: user specifies what data
is required but not how it is to be
retrieved
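The contrast between the two styles can be sketched as follows. The loop plays
the role of a procedural request, and an SQL query run through Python's
built-in sqlite3 module plays the role of a non-procedural one. The student
data is invented for the example.

```python
import sqlite3

students = [("Meron", 3.6), ("Yonas", 2.9), ("Selam", 3.8)]

# Procedural style: the program spells out how to find the data, step by step.
procedural = []
for name, gpa in students:
    if gpa > 3.5:
        procedural.append(name)
procedural.sort()

# Non-procedural style: the query states what is wanted; the DBMS decides how.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, gpa REAL)")
conn.executemany("INSERT INTO student VALUES (?, ?)", students)
declarative = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM student WHERE gpa > 3.5 ORDER BY name"
    )
]
conn.close()

print(procedural, declarative)
```

Both produce the same list of names; only the second leaves the choice of
scanning and sorting strategy to the DBMS.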
Data Control Language (DCL)
 Allows a DBA to define access control and
privileges for users.
 It is a mechanism for implementing security at
a database object level.
 Uses the Grant and Revoke SQL Statements
SQL is the most widely used non-procedural query
language
Fourth Generation Language (4GL)

 Query Languages
 Forms Generators
 Report Generators
 Graphics Generators
 Application Generators

A Classification of data models

Data Model
A specific DBMS has its own specific Data Definition
Language to define a database schema, but this type
of language is too low level to describe the data
requirements of an organization in a way that is
readily understandable by a variety of users.
We need a higher-level language.
Such a higher-level description of the database
schema is called a data model.
Data Model: a set of concepts to describe the structure
of a database, and certain constraints that the database
should obey.
A data model is a description of the way that data is
stored in a database. A data model helps to understand
the relationships between entities and to create the
most effective structure to hold data.
A Data Model is a collection of tools or concepts for
describing
 Data
 Data relationships
 Data semantics
 Data constraints

The main purpose of a Data Model is to represent the
data in an understandable way.
Categories of data models include:
 Object-based
 Record-based
 Physical
Record-based Data Models

Consist of a number of fixed format records.
Each record type defines a fixed number of fields.
Each field is typically of a fixed length.
 Hierarchical Data Model
 Network Data Model
 Relational Data Model

1. Hierarchical Model
• The simplest data model
• Record type is referred to as node or
segment
• The top node is the root node
• Nodes are arranged in a hierarchical
structure as a sort of upside-down tree
• A parent node can have more than one child
node
• A child node can only have one parent node
• The relationship between parent and child is
one-to-many
• Relation is established by creating physical
link between stored records (each is stored
with a predefined access path to other
records)
• To add new record type or relationship, the
database must be redefined and then stored
in a new form.
Fig. Example hierarchy with the record types Department,
Employee, Job, Time Card and Activity
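The parent and child rules listed above can be sketched as a small tree
structure. The Segment class is hypothetical, and the parent-child layout below
(Employee and Job under Department, Time Card under Employee, Activity under
Job) is an assumption made for illustration, since the original figure is not
reproduced here.

```python
class Segment:
    """A node (segment) in a hierarchy; each node has at most one parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Return the single access path from the root down to this node."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


# Assumed parent-child layout, for illustration only.
department = Segment("Department")
employee = Segment("Employee", parent=department)
job = Segment("Job", parent=department)
time_card = Segment("Time Card", parent=employee)
activity = Segment("Activity", parent=job)

print(time_card.path())
print([child.name for child in department.children])
```

Because every node stores exactly one parent, there is only one path to any
record, which is what makes access navigational and predefined.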

ADVANTAGES of Hierarchical Data Model:

 Hierarchical Model is simple to construct and
operate on
 Corresponds to a number of natural
hierarchically organized domains - e.g.,
assemblies in manufacturing, personnel
organization in companies
 Language is simple; uses constructs like GET,
GET UNIQUE, GET NEXT, GET NEXT
WITHIN PARENT etc.
DISADVANTAGES of Hierarchical Data Model:

 Navigational and procedural nature of
processing
 Database is visualized as a linear arrangement
of records
 Little scope for "query optimization"
2. Network Model
• Allows record types to have more than one
parent, unlike the hierarchical model
• A network data model sees records as set
members
• Each set has an owner and one or more
members
• Does not allow many-to-many relationships
between entities
• Like the hierarchical model, the network model is a
collection of physically linked records.
• Allows member records to have more than
one owner
Fig. Example network with the record types Department,
Job, Employee, Activity and Time Card
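The owner and member idea can be sketched with two small classes. Record and
SetType are hypothetical names, and the two sets shown (works_in and holds) are
assumed for illustration rather than taken from the original figure.

```python
class Record:
    """A record that may be a member of several sets."""

    def __init__(self, name):
        self.name = name
        self.owners = {}  # set type name -> owner record


class SetType:
    """A named owner-member link: one owner, one or more members."""

    def __init__(self, name, owner):
        self.name = name
        self.owner = owner
        self.members = []

    def connect(self, member):
        """Add a member and remember its owner under this set's name."""
        self.members.append(member)
        member.owners[self.name] = self.owner


department = Record("Department")
job = Record("Job")
employee = Record("Employee")

# Assumed sets: Employee is a member of one set owned by Department
# and of another set owned by Job.
works_in = SetType("works_in", owner=department)
holds = SetType("holds", owner=job)
works_in.connect(employee)
holds.connect(employee)

owner_names = sorted(owner.name for owner in employee.owners.values())
print(owner_names)
```

Here Employee has two owners, something a strict hierarchy cannot express, yet
each individual set still links a single owner to its members.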
ADVANTAGES of Network Data Model:

 Network Model is able to model complex
relationships and represents semantics of
add/delete on the relationships.
 Can handle most situations for modeling using
record types and relationship types.
 Language is navigational; uses constructs like
FIND, FIND member, FIND owner, FIND
NEXT within set, GET etc. Programmers can
do optimal navigation through the database.
DISADVANTAGES of Network Data Model:

 Navigational and procedural nature of
processing
 Database contains a complex array of pointers
that thread through a set of records.
 Little scope for automated "query
optimization"

3. Relational Data Model

• Developed by Dr. Edgar Frank Codd in 1970
(famous paper, 'A Relational Model of Data for
Large Shared Data Banks')
• Terminology originates from the branches of
mathematics called set theory and predicate
logic and is based on the mathematical concept
called Relation
• Can define more flexible and complex
relationship
• Viewed as a collection of tables called
“Relations”, equivalent to a collection of record
types
• Relation: Two dimensional table
• Stores information or data in the form of tables
 rows and columns
• A row of the table is called tuple equivalent
to record
• A column of a table is called attribute
equivalent to fields
• Data value is the value of the Attribute
• Records are related by the data stored jointly in
the fields of records in two tables or files. The
related tables contain information that creates
the relation
• The tables seem to be independent but are
related somehow.
• No physical consideration of the storage is
required by the user
• Many tables are merged together to come up
with a new virtual view of the relationship
Alternative terminologies:
Relation    Table    File
Tuple       Row      Record
Attribute   Column   Field
• The rows represent records (collections of
information about separate items)
• The columns represent fields (particular
attributes of a record)
• Conducts searches by using data in specified
columns of one table to find additional data in
another table
• In conducting searches, a relational database
matches information from a field in one table
with information in a corresponding field of
another table to produce a third table that
combines requested data from both tables
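The matching described in the last two bullets can be sketched with Python's built-in sqlite3 module. The table names, column names and sample rows below are assumptions made for illustration only; they do not come from the notes.

```python
import sqlite3

# In-memory database; table names and sample rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employee   (emp_id INTEGER PRIMARY KEY, emp_name TEXT, dept_id INTEGER);
    INSERT INTO department VALUES (1, 'Finance'), (2, 'Research');
    INSERT INTO employee   VALUES (10, 'Abebe', 1), (11, 'Sara', 2), (12, 'Kebede', 1);
""")

# Match the dept_id field of employee with the dept_id field of department.
# The result is a third (virtual) table combining data from both tables.
joined = conn.execute("""
    SELECT e.emp_name, d.dept_name
    FROM employee AS e
    JOIN department AS d ON e.dept_id = d.dept_id
    ORDER BY e.emp_id
""").fetchall()

for row in joined:
    print(row)
```

The two base tables stay unchanged; the combined rows exist only as the result of the query.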

Chapter Two
Relational Data Model
Important terms:
Relation: a table with rows and columns
Attribute: a named column of a relation
Domain: a set of allowable values for one or more
attributes
Tuple: a row of a relation
Degree: the degree of a relation is the number of
attributes it contains
A relation of degree one is a Unary relation, of degree
two a Binary relation, of degree three a Ternary
relation, and of degree n an N-ary relation
Cardinality: of a relation is the number of tuples
the relation has
Relational Database: a collection of normalized
relations with distinct relation names.
Relation Schema: a named relation defined by a set
of attribute-domain name pairs
Let A1, A2, ..., An be attributes with domains D1, D2, ..., Dn.
Then the set {A1:D1, A2:D2, ..., An:Dn} is a Relation
Schema. A relation R, defined by a relation schema
S, is a set of mappings from attribute names to their
corresponding domains. Thus a relation is a set of
n-tuples of the form
(A1:d1, A2:d2, ..., An:dn) where d1 ∈ D1, d2 ∈ D2, ..., dn ∈ Dn.
Eg.
Student (studentId char(10), studentName char(50),
DOB date) is a relation schema for the student
entity in SQL
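As a rough illustration of the terms Degree and Cardinality, the Student schema above can be created in an in-memory SQLite database and inspected. The two inserted rows are made-up sample data, and SQLite is only one possible DBMS for trying this out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema taken from the example above; the inserted rows are made up.
conn.execute("CREATE TABLE Student (studentId CHAR(10), studentName CHAR(50), DOB DATE)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [("S001", "Almaz", "1990-01-15"), ("S002", "Dawit", "1991-07-02")],
)

# Degree = number of attributes (columns) in the relation.
degree = len(conn.execute("PRAGMA table_info(Student)").fetchall())
# Cardinality = number of tuples (rows) currently in the relation.
cardinality = conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0]

print(degree, cardinality)
```

The degree is fixed by the schema, while the cardinality changes whenever tuples are inserted or deleted.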
Relational Database schema: a set of relation
schemas, each with a distinct name.
Suppose R1, R2, ..., Rn is the set of relation schemas
in a relational database; then the relational database
schema (R) can be stated as
R = {R1, R2, ..., Rn}
Properties of Relational Databases
• A relation has a name that is distinct from all

other relation names in the relational schema.
• Each tuple in a relation must be unique
• All tables are LOGICAL ENTITIES
• Each cell of a relation contains exactly one
atomic (single) value.
• Each column (field or attribute) has a distinct
name.
• The values of an attribute are all from the same
domain.
• A table is either a BASE TABLE (Named Relation) or
a VIEW (Unnamed Relation)
• Only Base Tables are physically stored
• VIEWS are derived from BASE TABLES with
SQL statements like: [SELECT .. FROM ..
WHERE .. ORDER BY]
• Relational database is the collection of tables
o Each entity in one table
o Attributes are fields (columns) in table
• The order of rows and columns is theoretically
immaterial (although in practice it can have an
impact on performance)
• Entries with repeating groups are said to be
un-normalized
• All values in a column represent the same attribute
and have the same data format

Building Blocks of the Relational Data Model
The building blocks of the relational data model are:
➢ Entities: real world physical or logical object

➢ Attributes: properties used to describe each
Entity or real world object.
➢ Relationship: the association between Entities
➢ Constraints: rules that should be obeyed while
manipulating the data.
1. The ENTITIES (persons, places, things etc.) which
the organization has to deal with. Relations can also
describe relationships
The name given to an entity should always be a
singular noun descriptive of each item to be
stored in it. E.g.: student NOT students.
Every relation has a schema, which describes the
columns, or fields. The relation itself corresponds
to our familiar notion of a table:
A relation is a collection of tuples, each of which
contains values for a fixed number of attributes
 Existence Dependency: the dependence of an
entity on the existence of one or more entities.
 Weak entity : an entity that can not exist without
the entity with which it has a relationship – it is
indicated by a double rectangle

2. The ATTRIBUTES - the items of information which

characterize and describe these entities.
Attributes are pieces of information ABOUT

entities. The analysis must of course identify
those which are actually relevant to the proposed
application. Attributes will give rise to recorded
items of data in the database
At this level we need to know such things as:
• Attribute name (should be explanatory words or
phrases)
• The domain from which attribute values are
taken (A DOMAIN is a set of values from
which attribute values may be taken.)
Each attribute has values taken from a
domain. For example, the domain of Name
is string and that for salary is real. However,
these are not shown on E-R models
• Whether the attribute is part of the entity
identifier (attributes which just describe
an entity and those which help to identify
it uniquely)
• Whether it is permanent or time-varying
(which attributes may change their values
over time)

• Whether it is required or optional for the

entity (whose values will sometimes be
unknown or irrelevant)
Types of Attributes
(1) Simple (atomic) Vs Composite attributes
• Simple : contains a single value (not
divided into sub parts)
E.g. Age, gender
• Composite: Divided into sub parts
(composed of other attributes)
E.g. Name, address
(2) Single-valued Vs multi-valued attributes

• Single-valued : have only single
value(the value may change but has
only one value at one time)
E.g. Name, Sex, Id. No., color_of_eyes
• Multi-Valued: have more than one
value
E.g. Address, dependent-name
Person may have several college
degrees
(3) Stored vs. Derived Attribute

• Stored : not possible to derive or
compute

E.g. Name, Address

• Derived: The value may be derived
(computed) from the values of other
attributes.
E.g. Age (current year – year of birth)
Length of employment (current
date- start date)
Profit (earning-cost)
G.P.A (grade point/credit hours)
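The derived attributes listed above can be sketched as small functions that compute a value from stored attributes instead of storing it. The function and parameter names are assumptions chosen for this sketch.

```python
from datetime import date

def age(year_of_birth, current_year):
    """Age derived as current year minus year of birth."""
    return current_year - year_of_birth

def length_of_employment(start_date, current_date):
    """Length of employment in days: current date minus start date."""
    return (current_date - start_date).days

def gpa(grade_points, credit_hours):
    """G.P.A derived as total grade points divided by total credit hours."""
    return grade_points / credit_hours

print(age(1990, 2009))
print(length_of_employment(date(2009, 1, 1), date(2009, 9, 19)))
print(gpa(54.0, 18))
```

Because these values are recomputed on demand, they cannot become inconsistent with the stored attributes they depend on.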
(4) Null Values
• NULL applies to attributes which are
not applicable or which do not have
values.
• You may enter the value NA (meaning
not applicable)
• Value of a key attribute can not be null.
Default value: the value assumed if no explicit
value is given
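A minimal sketch of NULL and default values, using SQLite through Python. The employee table and its columns are hypothetical. NOT NULL is written explicitly on the key column because SQLite does not always imply it for primary keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is written explicitly on the key; 'status' has a default value;
# 'phone' is optional and may hold NULL.
conn.execute("""
    CREATE TABLE employee (
        emp_id TEXT PRIMARY KEY NOT NULL,
        phone  TEXT,
        status TEXT DEFAULT 'active'
    )
""")

# No explicit phone or status: phone becomes NULL, status takes its default.
conn.execute("INSERT INTO employee (emp_id) VALUES ('E1')")
row = conn.execute("SELECT emp_id, phone, status FROM employee").fetchone()
print(row)

# A NULL value for the key attribute is rejected by the DBMS.
try:
    conn.execute("INSERT INTO employee (emp_id) VALUES (NULL)")
    null_key_rejected = False
except sqlite3.IntegrityError:
    null_key_rejected = True
print(null_key_rejected)
```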
Entity versus Attributes

When designing the conceptual specification of the
database, one should pay attention to the distinction
between an Entity and an Attribute.
 Consider designing a database of employees for
an organization:

 Should address be an attribute of Employees or an

entity (connected to Employees by a
relationship)?
• If we have several addresses per
employee, address must be an entity
(attributes cannot be set-valued/multi
valued)
 If the structure (city, Woreda, Kebele, etc) is
important, e.g. want to retrieve employees in a
given city, address must be modeled as an entity
(attribute values are atomic)

3. The RELATIONSHIPS between entities which

exist and must be taken into account when
processing information. In any business processing
one object may be associated with another object
due to some event. Such kind of association is what
we call a RELATIONSHIP between entity objects.
• One external event or process may affect

several related entities.
• Related entities require setting of LINKS
from one part of the database to another.
• A relationship should be named by a word
or phrase which explains its function
• Role names are different from the names of
entities forming the relationship: one entity
may take on many roles, the same role may
be played by different entities
• For each RELATIONSHIP, one can talk
about the Number of Entities and the
Number of Tuples participating in the
association. These two concepts are called
DEGREE and CARDINALITY of a
relationship respectively.
Degree of a Relationship
• An important point about a relationship is
how many entities participate in it. The
number of entities participating in a

relationship is called the DEGREE of the

relationship.
Among the Degrees of relationship, the

following are the basic:
o UNARY/RECURSIVE
RELATIONSHIP: Tuples/records of a
single entity are related with each other.
o BINARY RELATIONSHIPS:
Tuples/records of two entities are associated
in a relationship
o TERNARY RELATIONSHIP:
Tuples/records of three different entities are
associated
o And a generalized one:
▪ N-ARY RELATIONSHIP: Tuples
from an arbitrary number of entity sets
participate in a relationship.

Cardinality of a Relationship
• Another important concept about
relationship is the number of
instances/tuples that can be associated with
a single instance from one entity in a single
relationship. The number of instances
participating or associated with a single
instance from an entity in a relationship is
called the CARDINALITY of the
relationship. The major cardinalities of a
relationship are:
o ONE-TO-ONE: one tuple is associated
with only one other tuple.
▪ E.g. Building – Location as a
single building will be located in a
single location and as a single
location will only accommodate a
single Building.
o ONE-TO-MANY, one tuple can be
associated with many other tuples, but
not the reverse.
▪ E.g. Department-Student as one
department can have multiple
students.
o MANY-TO-ONE, many tuples are
associated with one tuple but not the
reverse.

▪ E.g. Employee – Department: as

many employees belong to a single
department.
o MANY-TO-MANY: one tuple is
associated with many other tuples and
from the other side, with a different role
name one tuple will be associated with
many tuples
▪ E.g. Student – Course, as a student
can take many courses and a single
course can be attended by many
students.
However, the degree and cardinality of a relation

are different from degree and cardinality of a
relationship.
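The four cardinalities of a binary relationship can be illustrated by a small function that inspects a set of associated pairs. This is only a sketch: it classifies one observed set of pairs, whereas a cardinality constraint states what is allowed in every state of the database. The function name and the sample identifiers are assumptions.

```python
def classify_cardinality(pairs):
    """Classify a binary relationship given as a set of (left, right) pairs."""
    left_partners = {}
    right_partners = {}
    for left, right in pairs:
        left_partners.setdefault(left, set()).add(right)
        right_partners.setdefault(right, set()).add(left)
    # Does any left instance relate to several right instances, and vice versa?
    left_has_many = any(len(s) > 1 for s in left_partners.values())
    right_has_many = any(len(s) > 1 for s in right_partners.values())
    if left_has_many and right_has_many:
        return "MANY-TO-MANY"
    if left_has_many:
        return "ONE-TO-MANY"
    if right_has_many:
        return "MANY-TO-ONE"
    return "ONE-TO-ONE"

# Department - Student: one department has several students.
print(classify_cardinality({("CS", "s1"), ("CS", "s2")}))
# Student - Course: students take many courses, courses have many students.
print(classify_cardinality({("s1", "c1"), ("s1", "c2"), ("s2", "c1")}))
```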

• Key constraints
If tuples need to be unique in the database,
then we need to make each tuple distinct. To
do this we need relational keys that
uniquely identify each record.
Super Key: an attribute or set of attributes that

uniquely identifies a tuple within a
relation.
Candidate Key: a super key such that no proper
subset of that collection is a Super Key
within the relation.
A candidate key has two properties:
1. Uniqueness
2. Irreducibility
If a super key has only one
attribute, it is automatically a Candidate
key.
If a candidate key consists of more than
one attribute it is called Composite Key.
Primary Key: the candidate key that is selected to
identify tuples uniquely within the
relation.
The entire set of attributes in a relation
can be considered as a primary key in the
worst case.

Foreign Key: an attribute, or set of attributes,

within one relation that matches the
candidate key of some relation.
A foreign key is a link between different
relations to create a view or an unnamed
relation
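The definitions of Super Key and Candidate Key (uniqueness plus irreducibility) can be sketched as a brute-force check over sample tuples. Note the assumption: a real key is a property of the schema that must hold for every valid state, so a check on one sample can only suggest candidates. The sample rows and function names are made up.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if the values of attrs are unique across all rows."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def candidate_keys(rows, all_attrs):
    """Return the irreducible superkeys: no proper subset is itself a superkey."""
    keys = []
    # Smaller attribute sets are tried first, so any reducible superkey
    # is recognised because it contains an already found key.
    for size in range(1, len(all_attrs) + 1):
        for attrs in combinations(all_attrs, size):
            if is_superkey(rows, attrs) and not any(set(k) < set(attrs) for k in keys):
                keys.append(attrs)
    return keys

rows = [
    {"id": 1, "name": "Almaz", "dept": "CS"},
    {"id": 2, "name": "Dawit", "dept": "CS"},
    {"id": 3, "name": "Almaz", "dept": "IS"},
]
print(candidate_keys(rows, ["id", "name", "dept"]))
```

In this sample, (name, dept) is a Composite Key, and the full attribute set is a super key but not a candidate key because it is reducible.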
Relational Constraints/Integrity Rules

• Relational Integrity
➢ Domain Integrity: No value of the
attribute should be beyond the
allowable limits
➢ Entity Integrity: In a base relation, no
attribute of a Primary Key can assume
a value of NULL
➢ Referential Integrity: If a Foreign Key
exists in a relation, either the Foreign
Key value must match a Candidate
Key value in its home relation or the
Foreign Key value must be NULL
➢ Enterprise Integrity: Additional rules
specified by the users or database
administrators of a database are
incorporated
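The Referential Integrity rule can be tried out in SQLite. The department and employee tables are hypothetical. SQLite enforces foreign keys only after the foreign_keys pragma is switched on, which is why it appears first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        dept_id INTEGER REFERENCES department(dept_id)
    );
    INSERT INTO department VALUES (1, 'Finance');
""")

def try_insert(emp_id, dept_id):
    """Return True if the row is accepted, False if an integrity rule rejects it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?)", (emp_id, dept_id))
        return True
    except sqlite3.IntegrityError:
        return False

matching = try_insert(10, 1)    # foreign key matches an existing department
null_fk = try_insert(11, None)  # NULL foreign key is allowed
dangling = try_insert(12, 99)   # no department 99 exists
print(matching, null_fk, dangling)
```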
• Relational Views

Relations are perceived as tables from the users’
perspective. Actually, there are two kinds of relations
in a relational database: Named and Unnamed
Relations. The basic difference is in how the relation
is created, used and updated:
1. Base Relation
A Named Relation corresponding to an entity in
the conceptual schema, whose tuples are
physically stored in the database.
2. View (Unnamed Relation)
A View is the dynamic result of one or more
relational operations operating on the base
relations to produce another virtual relation that
does not actually exist as presented. So a view is
a virtually derived relation that does not
necessarily exist in the database but can be
produced upon request by a particular user at the
time of request. The virtual table or relation can
be created from a single relation or from several
relations by extracting some attributes and records
with or without conditions.
Purpose of a view
➢ Hides unnecessary information from users:
since only part of the base relation (Some
collection of attributes, not necessarily all)
are to be included in the virtual table.

➢ Provide powerful flexibility and security:

since unnecessary information will be
hidden from the user there will be some sort
of data security.
➢ Provide customized view of the database for
users: each user is going to be interfaced
with their own preferred data set and format
by making use of the Views.
➢ A view of one base relation can be updated.
➢ Update on views derived from various
relations is not allowed since it may violate
the integrity of the database.
➢ Update on a view with aggregation and
summary is not allowed, since aggregation
and summary results are computed from a
base relation and do not actually exist.
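A short sketch of a view hiding an attribute and being recomputed on request, using SQLite. The employee table, the view name employee_public and the rows are assumptions. SQLite itself treats views as read-only, so the update rules listed above are not demonstrated here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT, salary REAL, dept TEXT);
    INSERT INTO employee VALUES (1, 'Abebe', 5000, 'Finance'), (2, 'Sara', 7000, 'Research');
    -- The view hides salary and exposes only selected attributes.
    CREATE VIEW employee_public AS SELECT emp_id, emp_name, dept FROM employee;
""")

before = conn.execute("SELECT * FROM employee_public ORDER BY emp_id").fetchall()

# The view is computed on request, so a change to the base table shows up in it.
conn.execute("INSERT INTO employee VALUES (3, 'Kebede', 6000, 'Finance')")
after = conn.execute("SELECT * FROM employee_public ORDER BY emp_id").fetchall()

print(before)
print(after)
```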

Schemas and Instances and Database State
When a database is designed using a Relational data

model, all the data is represented in a form of a table.
In such definitions and representation, there are two
basic components of the database. The two
components are the definition of the Relation or the
Table and the actual data stored in each table. The
data definition is what we call the Schema or the
skeleton of the database and the Relations with some
information at some point in time is the Instance or
the flesh of the database.
Schemas
 Schema describes how data is to be structured,
defined at setup/Design time (also called
"metadata")
 Since it is defined during the database development
phase, the schema rarely changes unless system
maintenance demands a change to the
definition of a relation.
 Database Schema (Intension): specifies name of

relation and the collection of the attributes
(specifically the Name of attributes).
➢ refers to a description of the database (or
intension)
➢ specified during database design
➢ should not be changed except during
maintenance
 Schema Diagrams
➢ convention to display some aspect of a
schema visually
 Schema Construct
➢ refers to each object in the schema (e.g.
STUDENT)
E.g.: STUDENT
(FName, LName, Id, Year, Dept, Sex)

Instances
 Instance: is the collection of data in the database

at a particular point of time (snap-shot).
➢ Also called State or Snap Shot or Extension
of the database
➢ Refers to the actual data in the database at a
specific point in time
➢ State of database is changed any time we
add, delete or update an item.
➢ Valid state: the state that satisfies the
structure and constraints specified in the
schema and is enforced by DBMS
 Since an Instance is the actual data of the database
at some point in time, it changes rapidly
 To define a new database, we specify its database
schema to the DBMS (database is empty)
 The database is initialized when we first load it with
data
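The difference between schema and instance can be sketched with the STUDENT construct above in SQLite. The column types and the sample rows are assumptions added for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Defining the schema: the database exists but is empty.
conn.execute(
    "CREATE TABLE STUDENT (FName TEXT, LName TEXT, Id TEXT, Year INTEGER, Dept TEXT, Sex TEXT)"
)

def schema():
    """Attribute names of STUDENT (the intension)."""
    return [col[1] for col in conn.execute("PRAGMA table_info(STUDENT)")]

def state():
    """Current tuples of STUDENT (the extension, or snapshot)."""
    return conn.execute("SELECT * FROM STUDENT ORDER BY Id").fetchall()

schema_before, empty_state = schema(), state()

# Loading data initializes the database; each insert, delete or update changes the state.
conn.execute("INSERT INTO STUDENT VALUES ('Almaz', 'Tesfaye', 'S001', 1, 'CS', 'F')")
conn.execute("INSERT INTO STUDENT VALUES ('Dawit', 'Bekele', 'S002', 2, 'IS', 'M')")
conn.execute("DELETE FROM STUDENT WHERE Id = 'S001'")

schema_after, later_state = schema(), state()
print(schema_before == schema_after, empty_state, later_state)
```

The schema is the same before and after the data changes, while the state differs at each point in time.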

Chapter Three
Database Design
Database design is the process of coming up with
different kinds of specification for the data to be
stored in the database. The database design part is
one of the middle phases we have in information
systems development where the system uses a
database approach. Design is the part on which we
would be engaged to describe how the data should be
perceived at different levels and finally how it is
going to be stored in a computer system.
Information System with Database application

consists of several tasks which include:
➢ Planning of Information systems Design

➢ Requirements Analysis,
➢ Design (Conceptual, Logical and Physical
Design)
➢ Implementation
➢ Testing and deployment
➢ Operation and Support
From these different phases, the prime interest of a

database system will be the Design part which is
again sub divided into other three sub-phases. These
sub-phases are:
1. Conceptual Design

2. Logical Design, and

3. Physical Design
➢ In general, one has to go back and forth between

these tasks to refine a database design, and
decisions in one task can influence the choices in
another task.
➢ In developing a good design, one should answer
such questions as:
▪ What are the relevant Entities for the
Organization
▪ What are the important features of each
Entity
▪ What are the important Relationships
▪ What are the important queries from the
user
▪ What are the other requirements of the
Organization and the Users
The Three levels of Database Design
Conceptual Design
Logical Design
Physical Design
Conceptual Database Design

 Conceptual design is the process of constructing

a model of the information used in an enterprise,
independent of any physical considerations.
❖ It is the source of information for the logical
design phase.
❖ Mostly uses an Entity Relationship Model to
describe the data at this level.
 After the completion of Conceptual Design one

has to go for refinement of the schema, which is
verification of Entities, Attributes, and
Relationships
Logical Database Design

 Logical design is the process of constructing a
model of the information used in an enterprise
based on a specific data model (e.g. relational,
hierarchical or network or object), but
independent of a particular DBMS and other
physical considerations.
❖ Normalization process
 Collection of Rules to be maintained
 Discover new entities in the process
 Revise attributes based on the rules and
the discovered Entities
Physical Database Design

 Physical design is the process of producing a
description of the implementation of the database
on secondary storage. It defines the specific storage
or access methods used by the database
 Describes the storage structures and access
methods used to achieve efficient access to
the data.
 Tailored to a specific DBMS system --
Characteristics are function of DBMS and
operating systems
 Includes estimate of storage space

Conceptual Database Design
 Conceptual design revolves around discovering

and analyzing organizational and user data
requirements
 The important activities are to identify
➢ Entities
➢ Attributes
➢ Relationships
➢ Constraints
 And based on these components develop the ER
model using
➢ ER diagrams
The Entity Relationship (E-R) Model
 Entity-Relationship modeling is used to represent

conceptual view of the database
 The main components of ER Modeling are:
o Entities
▪ Corresponds to entire table, not row
▪ Represented by Rectangle
o Attributes
▪ Represents the property used to
describe an entity or a relationship
▪ Represented by Oval
o Relationships
▪ Represents the association that exist
between entities

▪ Represented by Diamond
o Constraints
▪ Represent the constraint in the data
• Cardinality and Participation
Constraints
Before working on the conceptual design of the

database, one has to know and answer the
following basic questions.
• What are the entities and relationships in the
enterprise?
• What information about these entities and
relationships should we store in the database?
• What are the integrity constraints that hold?
Constraints on each data with respect to update,
retrieval and store.
• Represent this information pictorially in ER
diagrams, then map ER diagram into a relational
schema.

Developing an E-R Diagram
 Designing a conceptual model for the database is not
a linear process but an iterative activity where the
design is refined again and again.
 To identify the entities, attributes, relationships,
and constraints on the data, there are different set of
methods used during the analysis phase. These
include information gathered by…
➢ Interviewing end users individually and in a
group
➢ Questionnaire survey
➢ Direct observation
➢ Examining different documents
 Analysis of requirements gathered
➢ Nouns -- prospective entities
➢ Adjectives--prospective attributes
➢ Verbs/verb phrases-prospective
relationships
 The basic E-R model is graphically depicted and

presented for review.
 The process is repeated until the end users and
designers agree that the E-R diagram is a fair
representation of the organization’s activities and
functions.
 Checking for Redundant Relationships in the ER
Diagram. Relationships between entities indicate
access from one entity to another. It is therefore
possible to access one entity occurrence from another
entity occurrence even if there are other entities and
relationships that separate them. This is often referred
to as 'Navigation' of the ER diagram
 The last phase in ER modeling is validating an ER
Model against the requirements of the user.

Graphical Representations in ER Diagramming
 Entity is represented by a RECTANGLE
containing the name of the entity.
[Figure: rectangle symbols for a Strong Entity and a Weak Entity]
 Connected entities are called relationship
participants
 Attributes are represented by OVALS and are
connected to the entity by a line.
[Figure: oval symbols for an Attribute, a Multi-valued Attribute and a Composite Attribute]
 A derived attribute is indicated by a DOTTED
LINE. (……..)
 PRIMARY KEYS are underlined.
[Figure: oval containing an underlined Key attribute]
 Relationships are represented by DIAMOND
shaped symbols
 Weak Relationship is a relationship between
Weak and Strong Entities
 Strong Relationship is a relationship
between two strong Entities
[Figure: diamond symbols for a Strong Relationship and a Weak Relationship]

Example 1: Build an ER Diagram for the following

information:
 A student record management system will have
the following two basic data object categories
with their own features or properties: Students
will have an Id, Name, Dept, Age, GPA and
Course will have an Id, Name, Credit Hours
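 Whenever a student enrolls in a course in a
specific Academic Year and Semester, the
Student will have a grade for the course
[Figure: ER diagram. Entity Students (Id, Name, Dept, DoB, Age, Gpa) and entity Courses (Id, Name, Credit) are connected by the relationship Enrolled In, which carries the attributes Academic Year, Semester and Grade]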
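One possible mapping of Example 1 into a relational schema is sketched below with SQLite. This is an assumed design, not the only valid one: the relationship Enrolled In becomes its own table, Age is left out because it can be derived from DoB, and all identifiers and sample rows are invented for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id TEXT PRIMARY KEY NOT NULL, name TEXT, dept TEXT, dob TEXT, gpa REAL);
    CREATE TABLE course  (id TEXT PRIMARY KEY NOT NULL, name TEXT, credit_hours INTEGER);
    -- The Enrolled In relationship becomes its own table; Academic Year,
    -- Semester and Grade are attributes of the relationship, not of either entity.
    CREATE TABLE enrolled_in (
        student_id    TEXT NOT NULL REFERENCES student(id),
        course_id     TEXT NOT NULL REFERENCES course(id),
        academic_year INTEGER NOT NULL,
        semester      INTEGER NOT NULL,
        grade         TEXT,
        PRIMARY KEY (student_id, course_id, academic_year, semester)
    );
    INSERT INTO student VALUES ('S001', 'Almaz', 'CS', '1990-01-15', 3.5);
    INSERT INTO course  VALUES ('C101', 'Database Systems', 3), ('C102', 'Programming', 4);
    INSERT INTO enrolled_in VALUES ('S001', 'C101', 2009, 1, 'A'), ('S001', 'C102', 2009, 1, 'B');
""")

grades = conn.execute("""
    SELECT s.name, c.name, e.grade
    FROM enrolled_in AS e
    JOIN student AS s ON s.id = e.student_id
    JOIN course  AS c ON c.id = e.course_id
    ORDER BY c.id
""").fetchall()
print(grades)
```

The composite primary key lets the same student take the same course again in a different year or semester.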
Example 2: Build an ER Diagram for the following

information:
 A Personnel record management system will
have the following two basic data object

categories with their own features or properties:

Employee will have an Id, Name, DoB, Age, Tel
and Department will have an Id, Name, Location
 Whenever an Employee is assigned in one
Department, the duration of his stay in the
respective department should be registered.

Structural Constraints on Relationship
1. Constraints on Relationship / Multiplicity/

Cardinality Constraints
➢ Multiplicity constraint is the number or range of
possible occurrence of an entity type/relation that
may relate to a single occurrence/tuple of an entity
type/relation through a particular relationship.
➢ Mostly used to ensure appropriate enterprise
constraints.
One-to-one relationship:
➢ A customer is associated with at most one loan via the
relationship borrower
➢ A loan is associated with at most one customer via
borrower
E.g.: Relationship Manages between STAFF and
BRANCH
The multiplicity of the relationship is:
➢ One branch can only have one manager
➢ One employee could manage either one or no
branches
Employee 1..1 --- Manages --- 0..1 Branch

One-To-Many Relationships
➢ In the one-to-many relationship a loan is associated
with at most one customer via borrower, a customer is
associated with several (including 0) loans via
borrower
E.g.: Relationship Leads between STAFF and
PROJECT
The multiplicity of the relationship
➢ One staff may Lead one or more project(s)
➢ One project is Led by one staff
Employee 1..1 --- Leads --- 0..* Project
Many-To-Many Relationship
➢ A customer is associated with several (possibly 0)
loans via borrower
➢ A loan is associated with several (possibly 0)
customers via borrower

E.g.: Relationship “Teaches” between INSTRUCTOR
and COURSE
The multiplicity of the relationship
➢ One Instructor Teaches one or more Course(s)
➢ One Course is Taught by Zero or more Instructor(s)
Instructor 0..* --- Teaches --- 1..* Course
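A multiplicity such as 1..1 or 0..* can be read as a lower and an upper bound on how often each entity occurrence takes part in the relationship. The sketch below checks such bounds against a set of relationship pairs. The function name, its parameters and the sample data are assumptions, and the reading of which end a range belongs to follows the Manages example above.

```python
def satisfies_multiplicity(pairs, entities, side, low, high=None):
    """Check that every entity occurs between low and high times in the pairs.

    side is 0 for the left element of each pair, 1 for the right.
    high=None stands for '*' (no upper bound).
    """
    counts = {e: 0 for e in entities}
    for pair in pairs:
        counts[pair[side]] += 1
    return all(n >= low and (high is None or n <= high) for n in counts.values())

employees = {"e1", "e2", "e3"}
projects = {"p1", "p2", "p3"}
leads = {("e1", "p1"), ("e1", "p2"), ("e2", "p3")}

# Each project is led by exactly one employee (1..1).
print(satisfies_multiplicity(leads, projects, side=1, low=1, high=1))
# Each employee leads zero or more projects (0..*).
print(satisfies_multiplicity(leads, employees, side=0, low=0))
```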

Participation of an Entity Set in a Relationship Set
Participation constraint of a relationship is involved

in identifying and setting the mandatory or optional
feature of an entity occurrence to take a role in a
relationship. There are two distinct participation
constraints with this respect, namely: Total
Participation and Partial Participation
➢ Total participation: every tuple in the entity or

relation participates in at least one relationship by
taking a role. This means, every tuple in a relation
will be attached with at least one other tuple. The
entity with total participation in a relationship will be
connected to the relationship using a double line.
➢ Partial participation: some tuple in the entity or
relation may not participate in the relationship. This
means, there is at least one tuple from that Relation
not taking any role in that specific relationship. The
entity with partial participation in a relationship will
be connected to the relationship using a single line.
➢ E.g. 1: Participation of EMPLOYEE in “belongs to”
relationship with DEPARTMENT is total
since every employee should belong to a
department.
Participation of DEPARTMENT in “belongs
to” relationship with EMPLOYEE is total
since every department should have at
least one employee.
Employee 1..* --- BelongsTo --- 1..1 Department
➢ E.g. 2: Participation of EMPLOYEE in “manages”
relationship with DEPARTMENT is partial
participation since not all employees are
managers.
Participation of DEPARTMENT in
“Manages” relationship with EMPLOYEE is
total since every department should have a
manager.
Employee 1..1 --- Manages --- 0..1 Department
Problem in ER Modeling
The Entity-Relationship Model is a conceptual data

model that views the real world as consisting of
entities and relationships. The model visually
represents these concepts by the Entity-Relationship
diagram. The basic constructs of the ER model are
entities, relationships, and attributes. Entities are

concepts, real or abstract, about which information is

collected. Relationships are associations between the
entities. Attributes are properties which describe the
entities.
While designing the ER model one could face a
design problem called a connection
trap. Connection traps are problems arising from
misinterpreting certain relationships.
There are two types of connection traps:

1. Fan trap:
Occurs where a model represents a relationship
between entity types, but the pathway between
certain entity occurrences is ambiguous.
May exist where two or more one-to-many (1:M)
relationships fan out from an entity. The problem
could be avoided by restructuring the model so
that there would be no 1:M relationships fanning
out from a single entity and all the semantics of
the relationship is preserved.
Example:
EMPLOYEE 1..* --- Works For --- 1..1 BRANCH 1..1 --- IsAssigned --- 1..* CAR
Semantic description of the problem:
[Figure: semantic net linking employee occurrences Emp1 to Emp5, branch occurrences Br1 to Br4, and car occurrences Car1 to Car5]
Problem: Which car (Car1 or Car3 or Car5) is used by
Employee 6 (Emp6) working in Branch 1 (Br1)? Thus
from this ER Model one cannot tell which car is used
by which staff since a branch can have more than one
car and also a branch is populated by more than one
employee. Thus we need to restructure the model to
avoid the connection trap.
To avoid the Fan Trap problem we can go for

restructuring of the E-R Model. This will result in the
following E-R Model.
BRANCH 1..1 --- Has --- 1..* CAR 1..* --- Used By --- 1..* EMPLOYEE
Semantic description of the restructured model:
[Figure: semantic net linking branch occurrences Br1 to Br4, car occurrences Car1 to Car6, and employee occurrences Emp1 to Emp6]
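The fan trap and its fix can be sketched with plain dictionaries standing in for the relationships. The occurrence names and their links are made-up sample data, not a transcription of the figures.

```python
# First model: EMPLOYEE works for one BRANCH, and each CAR is assigned to one BRANCH.
works_for = {"Emp1": "Br1", "Emp2": "Br1", "Emp3": "Br2"}
assigned_to = {"Car1": "Br1", "Car2": "Br1", "Car3": "Br2"}

def cars_reachable_via_branch(employee):
    """Cars found by navigating EMPLOYEE -> BRANCH -> CAR."""
    branch = works_for[employee]
    return sorted(car for car, b in assigned_to.items() if b == branch)

# Two candidate cars come back: the model cannot say which one Emp1 uses.
ambiguous = cars_reachable_via_branch("Emp1")

# Restructured model: BRANCH has CARs, and CAR is used by EMPLOYEE directly.
used_by = {"Car1": ["Emp1"], "Car2": ["Emp2"], "Car3": ["Emp3"]}

def cars_used_by(employee):
    """Cars found through the direct CAR -> EMPLOYEE relationship."""
    return sorted(car for car, emps in used_by.items() if employee in emps)

resolved = cars_used_by("Emp1")
print(ambiguous, resolved)
```

In the first model the path through BRANCH returns every car of the branch; in the restructured model the question has a definite answer.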

2. Chasm Trap:
Occurs where a model suggests the existence of a
relationship between entity types, but the
pathway does not exist between certain entity
occurrences.
Chasm trap may exist when there are one or
more relationships with a minimum multiplicity
(cardinality) of zero forming part of the
pathway between related entities.
Example:
BRANCH 1..1 --- Has --- 1..* EMPLOYEE 0..1 --- Manages --- 0..* PROJECT
If we have a set of projects that are not active

currently then we can not assign a project
manager for these projects. So there are project
with no project manager making the
participation to have a minimum value of zero.
Problem:
How can we identify which BRANCH is
responsible for which PROJECT? We know that
whether the PROJECT is active or not there is a
responsible BRANCH. But which branch is a
question to be answered, and since we have a
minimum participation of zero between
EMPLOYEE and PROJECT we can’t identify the
BRANCH responsible for each PROJECT.
The solution for this Chasm Trap problem is to
add another relationship between the extreme
entities (BRANCH and PROJECT)
BRANCH 1..1 --- Has --- 1..* EMPLOYEE 0..1 --- Manages --- 0..* PROJECT
BRANCH 1..1 --- Responsible for --- 1..* PROJECT
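The chasm trap can be sketched in the same dictionary style. The occurrence names are invented sample data. Proj2 stands for an inactive project without a manager.

```python
# BRANCH has EMPLOYEEs; an EMPLOYEE may manage PROJECTs (minimum multiplicity zero).
branch_of = {"Emp1": "Br1", "Emp2": "Br2"}
manager_of = {"Proj1": "Emp1", "Proj2": None}  # Proj2 is inactive: no manager

def branch_via_manager(project):
    """Navigate PROJECT -> EMPLOYEE -> BRANCH; returns None when the path is broken."""
    manager = manager_of.get(project)
    return branch_of.get(manager) if manager is not None else None

# Added relationship: BRANCH is Responsible for PROJECT, recorded directly.
responsible_branch = {"Proj1": "Br1", "Proj2": "Br2"}

lost = branch_via_manager("Proj2")
found = responsible_branch["Proj2"]
print(lost, found)
```

The path through EMPLOYEE breaks for the unmanaged project, while the added direct relationship still gives its branch.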

Enhanced E-R (EER) Models

 Object-oriented extensions to E-R model

 EER is important when we have a relationship between
two entities and the participation is partial between entity
occurrences. In such cases EER is used to reduce the
complexity in participation and relationship complexity.
 ER diagrams consider entity types to be primitive objects
 EER diagrams allow refinements within the structures of
entity types
 EER Concepts
 Generalization
 Specialization
 Sub classes
 Super classes
 Attribute Inheritance
 Constraints on specialization and generalization

 Generalization
➢ Generalization occurs when two or more entities
represent categories of the same real-world object.
➢ Generalization is the process of defining a more
general entity type from a set of more specialized
entity types.
➢ A generalization hierarchy is a form of abstraction
that specifies that two or more entities that share
common attributes can be generalized into a higher
level entity type.
➢ Is considered as bottom-up definition of entities.
➢ Generalization hierarchy depicts relationship
between higher level superclass and lower level
subclass.
Generalization hierarchies can be nested. That is, a
subtype of one hierarchy can be a supertype of
another. The level of nesting is limited only by the
constraint of simplicity.
Example: Account is a generalized form for Saving and
Current Accounts

 Specialization
➢ Is the result of taking a subset of a higher level
entity set to form a lower level entity set.
➢ The specialized entities will have additional set of
attributes (distinguishing characteristics) that
distinguish them from the generalized entity.
➢ Is considered as Top-Down definition of entities.
➢ Specialization process is the inverse of the
Generalization process. Identify the distinguishing
features of some entity occurrences, and specialize
them into different subclasses.
➢ Reasons for Specialization
o Attributes only partially applying to
superclasses
o Relationship types only partially applicable to
the superclass
➢ In many cases, an entity type has numerous sub-
groupings of its entities that are meaningful and
need to be represented explicitly. This need requires
the representation of each subgroup in the ER
model. The generalized entity is a superclass and
the set of specialized entities will be subclasses for
that specific Superclass.
o Example: Saving Accounts and Current
Accounts are Specialized entities for the
generalized entity Accounts. Manager, Sales,
Secretary: are specialized employees.
 Subclass/Subtype

➢ An entity type whose tuples have attributes that
distinguish its members from tuples of the
generalized or Superclass entities.
➢ When one generalized Superclass has various
subgroups with distinguishing features and these
subgroups are represented by specialized form, the
groups are called subclasses.
➢ Subclasses can be either mutually exclusive
(disjoint) or overlapping (inclusive).
➢ A single subclass may inherit attributes from two
distinct superclasses.
➢ A mutually exclusive category/subclass is when an
entity instance can be in only one of the subclasses.
E.g.: An EMPLOYEE can either be SALARIED
or PART-TIMER but not both.
➢ An overlapping category/subclass is when an entity
instance may be in two or more subclasses.
E.g.: A PERSON who works for a university
can be both EMPLOYEE and a
STUDENT at the same time.
 Superclass /Supertype
➢ An entity type whose tuples share common
attributes. Attributes that are shared by all entity
occurrences (including the identifier) are associated
with the supertype.
➢ Is the generalized entity
 Relationship Between Superclass and Subclass

➢ The relationship between a superclass and any of
its subclasses is called a superclass/subclass or
class/subclass relationship.
➢ An instance cannot be a member of a subclass
only, i.e. every instance of a subclass is also
an instance of the Superclass.
➢ A member of a subclass is represented as a distinct
database object, a distinct record that is related via
the key attribute to its super-class entity.
➢ An entity cannot exist in the database merely by
being a member of a subclass; it must also be a
member of the super-class.
➢ An entity occurrence of a superclass need not
belong to any of the subclasses unless there
is total participation in the specialization.
➢ The relationship between a subclass and a
Superclass is an “IS A” or “IS PART OF” type.
▪ Subclass IS PART OF Superclass
▪ Manager IS AN Employee
➢ All subclasses or specialized entity sets should be
connected with the superclass using a line to a
circle where there is a subset symbol indicating the
direction of subclass/superclass relationship.

➢ We can also have subclasses of a subclass forming
a hierarchy of specialization.
➢ Superclass attributes are shared by all subclasses
of that superclass
➢ Subclass attributes are unique for the subclass.
 Attribute Inheritance
➢ An entity that is a member of a subclass inherits all
the attributes of the entity as a member of the
superclass.
➢ The entity also inherits all the relationships in
which the superclass participates.
➢ An entity may belong to more than one subclass
category.
➢ All entities/subclasses of a generalized entity or
superclass share a common unique identifier
attribute (primary key), i.e. the primary keys of the
superclass and subclasses are always identical.
• Consider the EMPLOYEE supertype entity shown
above. This entity can have several different subtype
entities (for example: HOURLY and SALARIED),
each with distinct properties not shared by other
subtypes. But whether the employee is HOURLY or
SALARIED, the same attributes (EmployeeId, Name, and
DateHired) are shared.
• The Supertype EMPLOYEE stores all properties that
the subclasses have in common. HOURLY
employees have the unique attribute Wage (hourly
wage rate), while SALARIED employees have two
unique attributes, StockOption and Salary.
Constraints on specialization and generalization
 Completeness Constraint.

• The Completeness Constraint addresses the issue of
whether or not an occurrence of a Superclass must
also have a corresponding Subclass occurrence.
• The completeness constraint requires that all instances
of the subtype be represented in the supertype.
• The Total Specialization Rule specifies that an entity
occurrence should at least be a member of one of the
subclasses. Total Participation of superclass instances
on subclasses is diagrammed with a double line from
the Supertype to the circle as shown below.
E.g.: If we have EXTENSION and REGULAR as
subclasses of a superclass STUDENT, then it is
mandatory for each student to be either an
EXTENSION or a REGULAR student. Thus the
participation of instances of STUDENT in the
EXTENSION and REGULAR subclasses will
be total.
• The Partial Specialization Rule specifies that it is not
necessary for all entity occurrences in the superclass to
be a member of one of the subclasses. Here we have
an optional participation in the specialization. Partial
Participation of superclass instances in subclasses is
diagrammed with a single line from the Supertype to
the circle.
E.g.: If we have MANAGER and SECRETARY as
subclasses of a superclass EMPLOYEE, then it
is not the case that all employees are either
manager or secretary. Thus the participation of
instances of employee in MANAGER and
SECRETARY subclasses will be partial.
 Disjointness Constraints.
• Specifies whether one entity occurrence can
be a member of more than one subclass, i.e. it is a
type of business rule that deals with the situation
where an entity occurrence of a Superclass may also
have more than one Subclass occurrence.
• The Disjoint Rule restricts one entity occurrence of a
superclass to be a member of only one of the
subclasses. Example: an EMPLOYEE can either be
SALARIED or PART-TIMER, but not both at the
same time.
• The Overlap Rule allows one entity occurrence to be
a member of more than one
subclass. Example: a PERSON working at the
university can be both a STUDENT and an
EMPLOYEE at the same time.
• This is diagrammed by placing either the letter "d" for
disjoint or "o" for overlapping inside the circle on the
Generalization Hierarchy portion of the E-R diagram.

The two types of constraints on generalization and
specialization (Disjointness and Completeness constraints)
are not dependent on one another. That is, being disjoint
does not determine whether the tuples in the superclass
have Total or Partial participation in that specific
specialization.
From the two types of constraints we can have four
possible combinations:
 Disjoint AND Total
 Disjoint AND Partial
 Overlapping AND Total
 Overlapping AND Partial
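The four combinations can be illustrated with a small Python sketch. It is only an illustration: the function name classify_specialization and the sample identifiers are assumptions, and the check reports what a given set of instances happens to satisfy, whereas the constraints themselves are business rules stated at design time.

```python
def classify_specialization(superclass, subclasses):
    """Return (disjointness, completeness) for one specialization.

    superclass: set of instance identifiers in the superclass
    subclasses: dict mapping subclass name -> set of identifiers
    """
    # Every subclass instance must also be a superclass instance.
    for name, ids in subclasses.items():
        if not ids <= superclass:
            raise ValueError(f"{name} has members outside the superclass")

    covered = set().union(*subclasses.values())
    memberships = sum(len(ids) for ids in subclasses.values())

    # Disjoint: no identifier appears in two subclasses.
    disjointness = "disjoint" if memberships == len(covered) else "overlapping"
    # Total: every superclass instance is in at least one subclass.
    completeness = "total" if covered == superclass else "partial"
    return disjointness, completeness


# Hypothetical instances for the STUDENT and EMPLOYEE examples in the text.
students = {"s1", "s2", "s3"}
student_kinds = {"REGULAR": {"s1", "s2"}, "EXTENSION": {"s3"}}

employees = {"e1", "e2", "e3"}
employee_roles = {"MANAGER": {"e1"}, "SECRETARY": {"e2"}}

print(classify_specialization(students, student_kinds))
print(classify_specialization(employees, employee_roles))
```

With these sample sets the STUDENT specialization comes out as disjoint and total, and the EMPLOYEE specialization as disjoint and partial, since employee e3 is neither a manager nor a secretary.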

Chapter Four
Logical Database Design
The whole purpose of database design is to create an
accurate representation of the data, the relationships between
the data, and the business constraints pertinent to the
organization. One can use one or more techniques
to design a database. One such technique was the E-R
model. In this chapter we use another technique known as
“Normalization”, which puts a different emphasis on database
design: it defines the structure of a database with a specific
data model.
Logical design is the process of constructing a model of the
information used in an enterprise based on a specific data
model (e.g. relational, hierarchical, network or object),
but independent of a particular DBMS and other physical
considerations.
The focus in logical database design is the Normalization
Process.
❖ Normalization process
 Collection of Rules (Tests) to be applied on
relations to obtain the minimal, non-
redundant set of attributes.
 Discover new entities in the process
 Revise attributes based on the rules and the
discovered Entities
 Works by examining the relationship
between attributes known as functional
dependency.
The purpose of normalization is to find a suitable set of
relations that supports the data requirements of an
enterprise.
A suitable set of relations has the following characteristics:
• Minimal number of attributes to support the data
requirements of the enterprise
• Attributes with close logical relationship (functional
dependency) should be placed in the same relation.
• Minimal redundancy, with each attribute represented
only once, with the exception of the attributes which
form the whole or part of a foreign key, which are
used for joining related tables.
The first step before applying the rules of the relational data
model is converting the conceptual design to a form
suitable for the relational logical model, which is a set of
tables.

Converting ER Diagram to Relational Tables
Three basic rules to convert ER into tables or relations:

Rule 1: Entity Names will automatically be table names
Rule 2: Mapping of attributes: attributes will be columns of
the respective tables.
✓ Atomic or single-valued or derived or stored
attributes will be columns
✓ Composite attributes: the parent attribute will be
ignored and the decomposed attributes (child
attributes) will be columns of the table.
✓ Multi-valued attributes: will be mapped to a new
table where the primary key of the main table will
be posted for cross referencing.
Rule 3: Relationships: a relationship will be mapped by
using a foreign key attribute. A foreign key is a primary or
candidate key of one relation used to create an association
between tables.
✓ For a relationship with One-to-One Cardinality:
post the primary or candidate key of one of the tables
into the other as a foreign key. In cases where one
entity has partial participation in the
relationship, it is recommended to post the
candidate key of the partial participant to the total
participant so as to avoid storing null
values in the foreign key attribute. E.g.: for
a relationship between Employee and Department
where an employee manages a department, the
cardinality is one-to-one as one employee will
manage only one department and one department
will have one manager. Here the PK of the
Employee can be posted to the Department or the
PK of the Department can be posted to the
Employee. But the Employee has partial
participation in the relationship "Manages" as not
all employees are managers of departments. Thus,
even though both ways are possible, it is
recommended to post the primary key of the
Employee to the Department table as a foreign key.
✓ For a relationship with One-to-Many Cardinality:
post the primary key or candidate key from the
“one” side as a foreign key attribute to the
“many” side. E.g.: For a relationship called
“Belongs To” between Employee (Many) and
Department (One), the primary or candidate key of
the one side, which is Department, should be posted
to the many side, which is the Employee table.
✓ For a relationship with Many-to-Many
Cardinality: one has to create a new table (which is
the associative entity) and post the primary key or
candidate key from each participant entity as
foreign key attributes in the new table, along with
some additional attributes (if applicable). The same
approach should be used for relationships with
degree greater than binary.
✓ For a relationship having Associative Entity
property: in cases where the relationship has its
own attributes (associative entity), one has to create
a new table for the associative entity and post the
primary key or candidate key from the participating
entities as foreign key attributes in the new table.

Example to illustrate the major rules in mapping ER to
relational schema:
The following ER diagram has been designed to represent the
requirement of an organization to capture Employee,
Department and Project information. An employee works
for a department, and an employee might be assigned to
manage a department. Employees might participate in
different projects within the organization. An employee
might as well be assigned to lead a project, where the
starting and ending dates of his/her project leadership and
the bonus will be registered.

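[Figure: ER diagram. Entities: Employee (EID, Name composed of FName and LName,
Salary, Tel), Department (DID, DName, DLoc) and Project (PID, PName, PFund).
Relationships: Manages (Employee 1 : 1 Department), WorksFor (Employee M : 1
Department), Participate (Employee M : M Project) and Leads (between Employee
and Project, with attributes StartDate, EndDate and PBonus).]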

After we have drawn the ER diagram, the next step is to
map the ER diagram into a relational schema so that the rules of the
relational data model can be tested for each relation
schema. The mapping is done for the entities first, followed
by the relationships, based on the mapping rules. The mapping
has been done as follows.
✓ Mapping EMPLOYEE Entity:
There will be an Employee table with EID, Salary,
FName and LName being the columns. The
composite attribute Name will be ignored as its
decomposed attributes (FName and LName) are
columns in the Employee table. The Tel attribute will
be mapped to a new table as it is multi-valued.
Employee
EID FName LName Salary
Telephone
EID Tel
✓ Mapping DEPARTMENT Entity:
There will be a Department table with DID, DName,
and DLoc being the columns.
Department
DID DName DLoc
✓ Mapping PROJECT Entity:
There will be a Project table with PID, PName, and
PFund being the columns.
Project
PID PName PFund
✓ Mapping the MANAGES Relationship:
As the relationship has one-to-one cardinality,
the PK or CK of one of the tables can be posted into the
other. But based on the recommendation, the PK or CK
of the partial participant (Employee) should be posted
to the total participant (Department). This will require
adding the PK of Employee (EID) to the Department
table as a foreign key. We can give the foreign key
another name, MEID, to mean "manager's
employee id". This will increase the degree of the
Department table.
Department
DID DName DLoc MEID
✓ Mapping the WORKSFOR Relationship:
As the relationship has one-to-many cardinality,
the PK or CK of the "One" side (PK or CK of the
Department table) should be posted to the many side
(Employee table). This will require adding the PK of
Department (DID) to the Employee table as a foreign
key. We can give the foreign key another name,
EDID, to mean "Employee's Department id". This
will increase the degree of the Employee table.
Employee
EID FName LName Salary EDID
✓ Mapping the PARTICIPATES Relationship:
As the relationship has many-to-many
cardinality, we need to create a new table and post the
PK or CK of the Employee and Project tables into the
new table. We can give a descriptive name to the
new table, like Emp_Partc_Project, to mean "Employee
participates in a project".
Emp_Partc_Project
EID PID
✓ Mapping the LEADS Relationship:
As the relationship has its own attributes (it is an associative
entity), we create a table for the associative entity
where the PKs of the Employee and Project tables will be
posted in the new table as foreign keys. The new table
will have the attributes of the associative entity as
columns. We can give a descriptive name to the
new table, like Emp_Lead_Project, to mean "Employee
leads a project".
Emp_Lead_Project
EID PID PBonus StartDate EndDate
At the end of the mapping we will have the following
relational schema (tables) for the logical database design
phase.

Department
DID DName DLoc MEID
Project
PID PName PFund
Telephone
EID Tel
Employee
EID FName LName Salary EDID
Emp_Partc_Project
EID PID
Emp_Lead_Project
EID PID PBonus StartDate EndDate
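As a rough illustration, the mapped schema could be declared in SQLite through Python's standard sqlite3 module. The table and column names come from the mapping above; the column types, the UNIQUE constraint on MEID (to keep Manages one-to-one) and the composite primary keys of the three linking tables are assumptions made for this sketch, not part of the original design.

```python
import sqlite3

# Column types and key choices below are assumptions for illustration.
SCHEMA = """
CREATE TABLE Department (
    DID   INTEGER PRIMARY KEY,
    DName TEXT,
    DLoc  TEXT,
    MEID  INTEGER UNIQUE REFERENCES Employee(EID)  -- Manages (1:1)
);
CREATE TABLE Employee (
    EID    INTEGER PRIMARY KEY,
    FName  TEXT,
    LName  TEXT,
    Salary REAL,
    EDID   INTEGER REFERENCES Department(DID)      -- WorksFor (M:1)
);
CREATE TABLE Telephone (                           -- multi-valued Tel
    EID INTEGER REFERENCES Employee(EID),
    Tel TEXT,
    PRIMARY KEY (EID, Tel)
);
CREATE TABLE Project (
    PID   INTEGER PRIMARY KEY,
    PName TEXT,
    PFund REAL
);
CREATE TABLE Emp_Partc_Project (                   -- Participate (M:M)
    EID INTEGER REFERENCES Employee(EID),
    PID INTEGER REFERENCES Project(PID),
    PRIMARY KEY (EID, PID)
);
CREATE TABLE Emp_Lead_Project (                    -- Leads (associative entity)
    EID       INTEGER REFERENCES Employee(EID),
    PID       INTEGER REFERENCES Project(PID),
    PBonus    REAL,
    StartDate TEXT,
    EndDate   TEXT,
    PRIMARY KEY (EID, PID)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# List the tables that were created.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(sorted(tables))
```

Each foreign key corresponds to one mapping step: MEID to Manages, EDID to WorksFor, and the EID/PID pairs to the Participate and Leads relationships.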
After converting the ER diagram into table form, the next
phase is implementing the process of normalization, which
is a collection of rules each table should satisfy.
Normalization
A relational database is merely a collection of data,
organized in a particular manner. As the father of the
relational database approach, Codd created a series of rules
(tests) called normal forms that help define that
organization.

One of the best ways to determine what information should
be stored in a database is to clarify what questions will be
asked of it and what data would be included in the answers.
Database normalization is a series of steps followed to
obtain a database design that allows for consistent storage
and efficient access of data in a relational database. These
steps reduce data redundancy and the risk of data becoming
inconsistent.
NORMALIZATION is the process of identifying the
logical associations between data items and designing a
database that will represent such associations without
suffering from the update anomalies, which are:
1. Insertion Anomalies
2. Deletion Anomalies
3. Modification Anomalies
Normalization may reduce system performance since data
will be cross referenced from many tables. Thus
denormalization is sometimes used to improve
performance, at the cost of reduced consistency guarantees.
A normalization step is considered “good” if it is a
lossless decomposition.
The normalization rules are designed to remove the
update anomalies that may otherwise occur during data
manipulation after implementation. The types of problems
that can occur in an insufficiently normalized table are
called update anomalies, and they include:
(1) Insertion anomalies
An "insertion anomaly" is a failure to place information
about a new database entry into all the places in the
database where information about that new entry needs
to be stored. Additionally, we may have difficulty
inserting some data at all. In a properly normalized database,
information about a new entry needs to be inserted into
only one place in the database; in an inadequately
normalized database, information about a new entry may
need to be inserted into more than one place and, human
fallibility being what it is, some of the needed additional
insertions may be missed.
(2) Deletion anomalies
A "deletion anomaly" is a failure to remove information
about an existing database entry when it is time to
remove that entry. Additionally, deletion of one piece of data may
result in the loss of other information. In a properly
normalized database, information about an old, to-be-
gotten-rid-of entry needs to be deleted from only one
place in the database; in an inadequately normalized
database, information about that old entry may need to
be deleted from more than one place, and, human
fallibility being what it is, some of the needed additional
deletions may be missed.
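(3) Modification anomalies
A modification of a database involves changing some
value of an attribute of a table. In a properly normalized
database table, whatever information is modified by the
user, the change needs to be made in only one place and
takes effect wherever that information is used. In an
inadequately normalized table the same fact may be stored
in several rows, and some of them may be missed.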
In order to avoid the update anomalies in a given
table, the solution is to decompose it into smaller tables
based on the rules of normalization. However, the
decomposition should have two important properties:
a. The Lossless-join property ensures that any instance
of the original relation can be identified from the
instances of the smaller relations.
b. The Dependency preservation property implies
that a constraint on the original relation can be
maintained by enforcing some constraints on the
smaller relations, i.e. we don’t have to perform a
Join operation to check whether a constraint on the
original relation is violated or not.
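The lossless-join property can be tried out on a small instance with the following Python sketch. The helper names (project, natural_join, is_lossless) and the three sample rows are assumptions for illustration. Note that the sketch only tests one particular instance, whereas the property itself must hold for every legal instance of the relation.

```python
def project(rows, attrs):
    """Projection without duplicates; each row is a dict."""
    return {tuple((a, r[a]) for a in attrs) for r in rows}


def natural_join(left, right):
    """Natural join of two projections produced by project()."""
    joined = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            common = ld.keys() & rd.keys()
            # Rows combine only if they agree on all shared attributes.
            if all(ld[a] == rd[a] for a in common):
                merged = {**ld, **rd}
                joined.add(tuple(sorted(merged.items())))
    return joined


def is_lossless(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly rows."""
    original = {tuple(sorted(r.items())) for r in rows}
    return natural_join(project(rows, attrs1), project(rows, attrs2)) == original


# Hypothetical sample rows.
rows = [
    {"EmpID": 12, "Skill": "SQL", "SkillType": "Database"},
    {"EmpID": 24, "Skill": "Oracle", "SkillType": "Database"},
    {"EmpID": 16, "Skill": "C++", "SkillType": "Programming"},
]

# Splitting on Skill keeps the information intact.
print(is_lossless(rows, ["EmpID", "Skill"], ["Skill", "SkillType"]))
# Splitting on SkillType produces spurious rows when rejoined.
print(is_lossless(rows, ["EmpID", "SkillType"], ["Skill", "SkillType"]))
```

In the second decomposition the join pairs employee 12 with Oracle and employee 24 with SQL, rows that were never in the original table, so the decomposition is lossy.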
The purpose of normalization is to reduce the chances for
anomalies to occur in a database.

Example of problems related with Anomalies
EmpID  FName   LName    SkillID  Skill   SkillType    School  SchoolAdd
12     Abebe   Mekuria  2        SQL     Database     AAU     Sidist_Kilo
16     Lemma   Alemu    5        C++     Programming  Unity   Gerji
28     Chane   Kebede   2        SQL     Database     AAU     Sidist_Kilo
25     Abera   Taye     6        VB6     Programming  Helico  Piazza
65     Almaz   Belay    2        SQL     Database     Helico  Piazza
24     Dereje  Tamiru   8        Oracle  Database     Unity   Gerji
51     Selam   Belay    4        Prolog  Programming  Jimma   Jimma City
94     Alem    Kebede   3        Cisco   Networking   AAU     Sidist_Kilo
18     Girma   Dereje   1        IP      Programming  Jimma   Jimma City
13     Yared   Gizaw    7        Java    Programming  AAU     Sidist_Kilo
Deletion Anomalies:
If the employee with ID 16 is deleted, then every piece of
information about the skill C++ and the type of the skill is
deleted from the database. Then we will not have any
information about C++ and its skill type.
Insertion Anomalies:
What if we have a new employee with a skill called
Pascal? We cannot decide whether Pascal is allowed
as a value for skill and we have no clue about the type
of skill that Pascal should be categorized as.
Modification Anomalies:
What if the address for Helico is changed from Piazza
to Mexico? We need to look for every occurrence of
Helico and change the value of SchoolAdd from
Piazza to Mexico, which is prone to error.
A database management system can work only with the
information that we put explicitly into its tables for a given
database and into its rules for working with those tables,
where such rules are appropriate and possible.

Functional Dependency (FD)

Before moving to the definition and application of
normalization, it is important to have an understanding of
"functional dependency."
Data Dependency
The logical associations between data items that point the
database designer in the direction of a good database design
are referred to as determinant or dependent relationships.
Two data items A and B are said to be in a determinant or
dependent relationship if certain values of data item B
always appear with certain values of data item A. If
data item A is the determinant and B the
dependent data item, then the direction of the association is
from A to B and not vice versa.
The essence of this idea is that if the existence of
something, call it A, implies that B must exist and have a
certain value, then we say that "B is functionally
dependent on A." We also often express this idea by
saying that "A functionally determines B," or that "B is a
function of A," or that "A functionally governs B." Often,
the notions of functionality and functional dependency are
expressed briefly by the statement, "If A, then B." It is
important to note that the value of B must be unique for a
given value of A, i.e., any given value of A must imply just
one and only one value of B, in order for the relationship to
qualify for the name "function." (However, this does not
necessarily prevent different values of A from implying the
same value of B.)

However, for the purpose of normalization, we are
interested in finding 1..1 (one to one) dependencies, lasting
for all times (intension rather than extension of the
database), and the determinant having the minimal number
of attributes.
X → Y holds if, whenever two tuples have the same value
for X, they also have the same value for Y.
The notation is A → B, which is read as: B is functionally
dependent on A.
In general, a functional dependency is a relationship
among attributes. In relational databases, we can have a
determinant that governs one or several other attributes.
FDs are derived from the real-world constraints on the
attributes, and they are properties of the database intension,
not its extension.
Example
Dinner Course   Type of Wine
Meat            Red
Fish            White
Cheese          Rose
Since the type of Wine served depends on the type of
Dinner, we say Wine is functionally dependent on Dinner.
Dinner → Wine
Dinner Course   Type of Wine   Type of Fork
Meat            Red            Meat fork
Fish            White          Fish fork
Cheese          Rose           Cheese fork
Since both Wine type and Fork type are determined by the
Dinner type, we say Wine is functionally dependent on
Dinner and Fork is functionally dependent on Dinner.
Dinner → Wine
Dinner → Fork
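The definition of a functional dependency can be checked against sample data with a few lines of Python. This is a minimal sketch: the function name fd_holds is an assumption, and because FDs are properties of the intension, sample rows can only refute a dependency, never prove that it holds for all time.

```python
def fd_holds(rows, lhs, rhs):
    """True if lhs -> rhs is satisfied by rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and
        # returns the stored one on later occurrences.
        if seen.setdefault(key, value) != value:
            return False
    return True


menu = [
    {"Dinner": "Meat", "Wine": "Red", "Fork": "Meat fork"},
    {"Dinner": "Fish", "Wine": "White", "Fork": "Fish fork"},
    {"Dinner": "Cheese", "Wine": "Rose", "Fork": "Cheese fork"},
]

print(fd_holds(menu, ["Dinner"], ["Wine"]))
print(fd_holds(menu, ["Dinner"], ["Fork"]))

# A second Meat row with a different wine violates Dinner -> Wine.
conflicting = menu + [{"Dinner": "Meat", "Wine": "White", "Fork": "Meat fork"}]
print(fd_holds(conflicting, ["Dinner"], ["Wine"]))
```

The first two checks succeed on the dinner table, while the extra row makes Dinner → Wine fail because the same Dinner value now appears with two different Wine values.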
Partial Dependency
If an attribute which is not a member of the primary key is
dependent on some part of the primary key (if we have a
composite primary key) then that attribute is partially
functionally dependent on the primary key.
Let {A,B} be the Primary Key and C a non-key attribute.
Then if {A,B} → C and B → C (or A → C),
then C is partially functionally dependent on {A,B}.
Full Functional Dependency
If an attribute which is not a member of the primary key is
dependent not on some part of the primary key but on the
whole key (if we have a composite primary key) then that
attribute is fully functionally dependent on the primary key.
Let {A,B} be the Primary Key and C a non-key attribute.
Then if {A,B} → C holds and neither B → C nor A → C holds,
then C is fully functionally dependent on {A,B}.
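The difference between partial and full dependency can be sketched in Python by testing every proper subset of the key. The function names and the sample rows with attributes A, B, C and D are assumptions for illustration, and the result only reflects the given sample data.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """True if lhs -> rhs is satisfied by rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def dependency_kind(rows, key, attr):
    """Classify attr as 'partial' or 'full' with respect to a composite key."""
    if not fd_holds(rows, key, [attr]):
        return "not dependent"
    # Try every proper, non-empty subset of the key as a determinant.
    for size in range(1, len(key)):
        for subset in combinations(key, size):
            if fd_holds(rows, list(subset), [attr]):
                return "partial"
    return "full"


# Hypothetical rows with primary key {A, B}.
rows = [
    {"A": 1, "B": "x", "C": "c1", "D": 10},
    {"A": 1, "B": "y", "C": "c2", "D": 20},
    {"A": 2, "B": "x", "C": "c1", "D": 30},
]

print(dependency_kind(rows, ["A", "B"], "C"))
print(dependency_kind(rows, ["A", "B"], "D"))
```

Here C is determined by B alone, so it is partially dependent on {A,B}, while D needs both A and B and is therefore fully dependent.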
Transitive Dependency
In mathematics and logic, a transitive relationship is a
relationship of the following form: "If A implies B, and if
also B implies C, then A implies C."
Example:
If Mr X is a Human, and if every Human is an Animal,
then Mr X must be an Animal.
A generalized way of describing transitive dependency is
that:
If A functionally governs B, AND
If B functionally governs C
THEN A functionally governs C
Provided that neither C nor B determines A, i.e. (B ↛ A
and C ↛ A)
In the normal notation:
{(A → B) AND (B → C)} ==> A → C provided that B ↛ A and
C ↛ A


Steps of Normalization:
We have various levels or steps in normalization called
Normal Forms.
The level of complexity, strength of the rule and
decomposition increases as we move from a lower level
Normal Form to a higher one.
A table in a relational database is said to be in a certain
normal form if it satisfies certain constraints.
Each normal form listed below represents a stronger condition than
the previous one.
Normalization towards a logical design consists of the
following steps:
UnNormalized Form(UNF):
Identify all data elements
First Normal Form(1NF):
Find the key with which you can find all data i.e.
remove any repeating group
Second Normal Form(2NF):
Remove part-key dependencies (partial dependency).
Make all data dependent on the whole key.
Third Normal Form(3NF)
Remove non-key dependencies (transitive
dependencies). Make all data dependent on nothing but
the key.
For most practical purposes, databases are considered
normalized if they adhere to the third normal form (there is
no transitive dependency).

First Normal Form (1NF)

Requires that all column values in a table are atomic
(e.g., a number is an atomic value, while a list or a set
is not).
We have two ways of achieving this:
1. Putting each repeating group into a separate table
and connecting them with a primary key-foreign
key relationship
2. Moving these repeating groups to new rows by
repeating the non-repeating attributes, known as
“flattening” the table. If so, then find the key
with which you can find all data.
Definition: a table (relation) is in 1NF
if
➢ There are no duplicated rows in the table
(there is a unique identifier).
➢ Each cell is single-valued (i.e., there are no
repeating groups).
➢ Entries in a column (attribute, field) are of the
same kind.

Example for First Normal Form (1NF)
UNNORMALIZED
EmpID  FirstName  LastName  Skill    SkillType     School   SchoolAdd
12     Abebe      Mekuria   SQL,     Database,     AAU,     Sidist_Kilo,
                            VB6      Programming   Helico   Piazza
16     Lemma      Alemu     C++,     Programming,  Unity,   Gerji,
                            IP       Programming   Jimma    Jimma City
28     Chane      Kebede    SQL      Database      AAU      Sidist_Kilo
65     Almaz      Belay     SQL,     Database,     Helico,  Piazza,
                            Prolog,  Programming,  Jimma,   Jimma City,
                            Java     Programming   AAU      Sidist_Kilo
24     Dereje     Tamiru    Oracle   Database      Unity    Gerji
94     Alem       Kebede    Cisco    Networking    AAU      Sidist_Kilo
FIRST NORMAL FORM (1NF)
Remove all repeating groups. Distribute the multi-valued
attributes into different rows and identify a unique
identifier for the relation, so that it can be called a relation
in a relational database. Flatten the table.
EmpID  FirstName  LastName  SkillID  Skill   SkillType    School  SchoolAdd
12     Abebe      Mekuria   1        SQL     Database     AAU     Sidist_Kilo
12     Abebe      Mekuria   3        VB6     Programming  Helico  Piazza
16     Lemma      Alemu     2        C++     Programming  Unity   Gerji
16     Lemma      Alemu     7        IP      Programming  Jimma   Jimma City
28     Chane      Kebede    1        SQL     Database     AAU     Sidist_Kilo
65     Almaz      Belay     1        SQL     Database     Helico  Piazza
65     Almaz      Belay     5        Prolog  Programming  Jimma   Jimma City
65     Almaz      Belay     8        Java    Programming  AAU     Sidist_Kilo
24     Dereje     Tamiru    4        Oracle  Database     Unity   Gerji
94     Alem       Kebede    6        Cisco   Networking   AAU     Sidist_Kilo
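Flattening can be sketched in Python as follows. The nested "Skills" list is an assumed in-memory representation of the repeating group, and the function name flatten is hypothetical; only two employees from the example are used, with the school columns left out to keep the sketch short.

```python
def flatten(records, fixed, repeating):
    """Turn each element of a repeating group into its own row."""
    rows = []
    for rec in records:
        for item in rec[repeating]:
            # Repeat the non-repeating attributes on every new row.
            row = {a: rec[a] for a in fixed}
            row.update(item)
            rows.append(row)
    return rows


unnormalized = [
    {"EmpID": 12, "FirstName": "Abebe", "LastName": "Mekuria",
     "Skills": [
         {"SkillID": 1, "Skill": "SQL", "SkillType": "Database"},
         {"SkillID": 3, "Skill": "VB6", "SkillType": "Programming"},
     ]},
    {"EmpID": 28, "FirstName": "Chane", "LastName": "Kebede",
     "Skills": [
         {"SkillID": 1, "Skill": "SQL", "SkillType": "Database"},
     ]},
]

rows = flatten(unnormalized, ["EmpID", "FirstName", "LastName"], "Skills")
for row in rows:
    print(row)

# After flattening, {EmpID, SkillID} identifies each row uniquely.
keys = {(r["EmpID"], r["SkillID"]) for r in rows}
print(len(keys) == len(rows))
```

EmpID alone is no longer unique after flattening, which is why the composite key {EmpID, SkillID} has to be identified.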

Second Normal Form (2NF)

No partial dependency of a non-key attribute on part of the
primary key. This will result in a set of relations in
Second Normal Form.
Any table that is in 1NF and has a single-attribute (i.e., a
non-composite) key is automatically also in 2NF.
Definition: a table (relation) is in 2NF

If
➢ It is in 1NF and
➢ If all non-key attributes are dependent on the
entire primary key. i.e. no partial dependency.
Example for 2NF:
EMP_PROJ
EmpID EmpName ProjNo ProjName ProjLoc ProjFund ProjMangID Incentive
EMP_PROJ rearranged
EmpID ProjNo EmpName ProjName ProjLoc ProjFund ProjMangID Incentive
Business rule: Whenever an employee participates in a
project, he/she will be entitled to an incentive.
This schema is in 1NF since we don’t have any
repeating groups or attributes with multi-valued property.
To convert it to 2NF we need to remove all partial
dependencies of non-key attributes on part of the primary
key.
{EmpID, ProjNo} → EmpName, ProjName, ProjLoc,
ProjFund, ProjMangID, Incentive
But in addition to this we have the following dependencies:
FD1: {EmpID} → EmpName
FD2: {ProjNo} → ProjName, ProjLoc, ProjFund,
ProjMangID
FD3: {EmpID, ProjNo} → Incentive
As we can see, some non-key attributes are partially
dependent on some part of the primary key. This can be
witnessed by analyzing the first two functional
dependencies (FD1 and FD2). Thus, each functional
dependency, with its dependent attributes, should be
moved to a new relation where the determinant will be the
primary key.
EMPLOYEE
EmpID EmpName
PROJECT
ProjNo ProjName ProjLoc ProjFund ProjMangID
EMP_PROJ
EmpID ProjNo Incentive
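The decomposition above amounts to three projections of EMP_PROJ, one per functional dependency. A minimal Python sketch follows; the sample values are invented for illustration and the helper name project is an assumption.

```python
def project(rows, attrs):
    """Distinct projection of rows (dicts) onto attrs, keeping first-seen order."""
    seen, out = set(), []
    for r in rows:
        t = tuple(r[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(attrs, t)))
    return out


# Hypothetical EMP_PROJ rows; the values are made up for this sketch.
emp_proj = [
    {"EmpID": 1, "ProjNo": "P1", "EmpName": "Abebe", "ProjName": "Payroll",
     "ProjLoc": "Gerji", "ProjFund": 5000, "ProjMangID": 9, "Incentive": 100},
    {"EmpID": 1, "ProjNo": "P2", "EmpName": "Abebe", "ProjName": "Library",
     "ProjLoc": "Piazza", "ProjFund": 8000, "ProjMangID": 7, "Incentive": 200},
    {"EmpID": 2, "ProjNo": "P1", "EmpName": "Lemma", "ProjName": "Payroll",
     "ProjLoc": "Gerji", "ProjFund": 5000, "ProjMangID": 9, "Incentive": 150},
]

# One relation per functional dependency (FD1, FD2, FD3).
employee = project(emp_proj, ["EmpID", "EmpName"])
project_table = project(
    emp_proj, ["ProjNo", "ProjName", "ProjLoc", "ProjFund", "ProjMangID"])
emp_proj_2nf = project(emp_proj, ["EmpID", "ProjNo", "Incentive"])

print(len(employee), len(project_table), len(emp_proj_2nf))
```

The employee name and the project details are now stored once each instead of once per assignment, which is the redundancy that 2NF removes.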
Third Normal Form (3NF)

Eliminate columns dependent on another non-primary-key column:
if attributes do not contribute to a description of the key,
move them to a separate table.
This level avoids update and delete anomalies.
Definition: a Table (Relation) is in 3NF
if
➢ It is in 2NF and
➢ There are no transitive dependencies between
a primary key and non-primary key
attributes.
Example for (3NF)
Assumption: Students of the same batch (same year) live in
one building or dormitory.
STUDENT
StudID  Stud_F_Name  Stud_L_Name  Dept    Year  Dormitary
125/97  Abebe        Mekuria      InfoSc  1     401
654/95  Lemma        Alemu        Geog    3     403
842/95  Chane        Kebede       CompSc  3     403
165/97  Alem         Kebede       InfoSc  1     401
985/95  Almaz        Belay        Geog    3     403
This schema is in 2NF since the primary key is a single
attribute and there are no repeating groups (multi-valued
attributes).
Let’s take StudID, Year and Dormitary and see the
dependencies.
StudID → Year AND Year → Dormitary
And Year cannot determine StudID and Dormitary cannot
determine StudID. Then, transitively,
StudID → Dormitary
To convert it to 3NF we need to remove all transitive
dependencies of non-key attributes on another non-key
attribute.
The non-primary key attributes that are dependent on each other
will be moved to another table and linked with the main
table using a Candidate Key-Foreign Key relationship.
STUDENT
StudID  Stud_F_Name  Stud_L_Name  Dept    Year
125/97  Abebe        Mekuria      InfoSc  1
654/95  Lemma        Alemu        Geog    3
842/95  Chane        Kebede       CompSc  3
165/97  Alem         Kebede       InfoSc  1
985/95  Almaz        Belay        Geog    3
DORM
Year  Dormitary
1     401
3     403
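The same decomposition can be reproduced in a short Python sketch using the rows of the STUDENT example. The tuple layout and variable names are assumptions; the final check shows that joining STUDENT with DORM on Year gives back the original rows.

```python
# Rows of the original STUDENT table:
# (StudID, Stud_F_Name, Stud_L_Name, Dept, Year, Dormitary)
student = [
    ("125/97", "Abebe", "Mekuria", "InfoSc", 1, 401),
    ("654/95", "Lemma", "Alemu", "Geog", 3, 403),
    ("842/95", "Chane", "Kebede", "CompSc", 3, 403),
    ("165/97", "Alem", "Kebede", "InfoSc", 1, 401),
    ("985/95", "Almaz", "Belay", "Geog", 3, 403),
]

# DORM holds the dependency Year -> Dormitary exactly once per year.
dorm = {}
for *_, year, dormitary in student:
    if dorm.setdefault(year, dormitary) != dormitary:
        raise ValueError("Year does not determine Dormitary in this data")

# STUDENT in 3NF keeps Year as the foreign key and drops Dormitary.
student_3nf = [row[:-1] for row in student]

# Joining the two tables on Year reconstructs the original rows.
rejoined = [row + (dorm[row[4]],) for row in student_3nf]

print(dorm)
print(rejoined == student)
```

Changing the dormitory of a whole batch now means updating a single DORM row rather than every student of that year.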
Generally,
even though there are four additional levels of
normalization, a table is said to be normalized if it reaches
3NF. A database with all tables in 3NF is said to be a
Normalized Database.
A mnemonic for remembering the rationale for normalization
up to 3NF could be the following:
1. No Repeating or Redundancy: no repeating fields in the
table.
2. The Fields Depend Upon the Key: the fields should solely
depend on the key.
3. The Whole Key: no partial key dependency.
4. And Nothing But the Key: no inter-data dependency.
5. So Help Me Codd: since Codd came up with these rules.

Other Levels of Normalization

Boyce-Codd Normal Form (BCNF):
BCNF is based on functional dependencies that take into
account all the candidate keys in a relation.
So, a table is in BCNF if it is in 3NF and if every
determinant is a candidate key. Violation of BCNF is
very rare. The potential sources for violation of this rule are:
1. The relation contains two (or more) composite
candidate keys
2. The candidate keys overlap, i.e. have a common
attribute.
The issue is related to:

Isolating Independent Multiple Relationships - No table may
contain two or more 1:N or N:M relationships that are not
directly related.
The correct solution, to cause the model to be in 4th normal
form, is to ensure that all M:M relationships are resolved
independently if they are indeed independent.
Fourth Normal Form (4NF)

Isolate Semantically Related Multiple Relationships - There
may be practical constraints on information that justify
separating logically related many-to-many relationships.
MVD (Multi-Valued Dependency): represents a
dependency between attributes (for example A, B, C) in a
relation such that for every value of A there is a set of values
for B and a set of values for C, but the sets B and C are
independent of each other.
An MVD between attributes A, B, and C in a relation is
represented as follows:
A ->> B
A ->> C
Def: A table is in 4NF if it is in BCNF and if it has no
non-trivial multi-valued dependencies.
Fifth Normal Form (5NF)

Sometimes called the Project-Join Normal Form (PJNF).
5NF is based on the join dependency.
Join Dependency: a property of decomposition that ensures
that no spurious tuples are generated when rejoining to obtain
the original relation.
Def: A table is in 5NF, also called "Projection-Join
Normal Form" (PJNF), if it is in 4NF and if every join
dependency in the table is a consequence of the
candidate keys of the table.
Domain-Key Normal Form (DKNF)

A model free from all modification anomalies.

Def: A table is in DKNF if every constraint on the table is

a logical consequence of the definition of keys and
domains.
The underlying ideas in normalization are simple enough.

Through normalization we want to design for our relational
database a set of tables that;
(1) Contain all the data necessary for the purposes that the
database is to serve,
(2) Have as little redundancy as possible,
(3) Accommodate multiple values for types of data that
require them,
(4) Permit efficient updates of the data in the database, and
(5) Avoid the danger of losing data unknowingly.

Pitfalls of Normalization
Problems associated with normalization

• Requires data to see the problems
• May reduce performance of the system
• Is time consuming,
• Difficult to design and apply and
• Prone to human error


Chapter Five
Physical Database Design
Methodology for Relational Database
We have established that there are three
levels of database design:
• Conceptual design: producing a data
model which accounts for the
relevant entities and relationships
within the target application domain;
• Logical design: ensuring, via
normalization procedures and the
definition of integrity rules, that the
stored database will be non-redundant
and properly connected;
• Physical design: specifying how
database records are stored, accessed
and related to ensure adequate
performance.
It is considered desirable to keep these
three levels quite separate -- one of
Codd's requirements for an RDBMS is
that it should maintain logical-physical
data independence. The generality of the
relational model means that RDBMSs
are potentially less efficient than those
based on one of the older data models
where access paths were specified once
and for all at the design stage. However
the relational data model does not
preclude the use of traditional techniques
for accessing data - it is still essential to
exploit them to achieve adequate
performance with a database of any size.
We can consider the topic of physical
database design from three aspects:
• What techniques for storing and
finding data exist
• Which are implemented within a
particular DBMS
• Which might be selected by the
designer for a given application
knowing the properties of the data
Thus physical database design aims to
determine:
1. How to map the logical database
design to a physical database design.
2. How to design base relations for
target DBMS.
3. How to design enterprise
constraints for target DBMS.
4. How to select appropriate file
organizations based on analysis of
transactions.
5. When to use secondary indexes to
improve performance.
6. How to estimate the size of the
database.
7. How to design user views.
8. How to design security
mechanisms to satisfy user
requirements.
9. How to design procedures and
triggers.
Physical database design is the process
of producing a description of the
implementation of the database on
secondary storage.
Physical design describes the base
relations, file organizations, and indexes
used to achieve efficient access to the
data, and any associated integrity
constraints and security measures.
 Sources of information for the
physical design process include the global
logical data model, the documentation
that describes the model, and the set of
normalized relations.
 Logical database design is concerned
with the what; physical database design
is concerned with the how.
 Physical design describes the storage
structures and access methods used to
achieve efficient access to the data.
Steps in physical database design

1. Translate logical data model for
target DBMS
1.1. Design base relation
1.2. Design representation of
derived data
1.3. Design enterprise constraint
2. Design physical representation
2.1. Analyze transactions

2.2. Choose file organization
2.3. Choose indexes
2.4. Estimate disk space and
system requirement
3. Design user view
4. Design security mechanisms
5. Consider controlled redundancy
6. Monitor and tune the operational
system

1. Translate logical data model for

target DBMS
This phase is the translation of the global

logical data model to produce a relational
database schema in the target DBMS.
This includes creating the data dictionary
based on the logical model and
information gathered.
After the creation of the data dictionary,
the next activity is to understand the
functionality of the target DBMS so that
all necessary requirements are fulfilled
for the database intended to be
developed.
Knowledge of the DBMS includes:

➢ how to create base relations
➢ whether the system supports:
o definition of Primary key
o definition of Foreign key
o definition of Alternate
key(Unique keys)
o definition of Domains
o Referential integrity constraints
o definition of enterprise level
constraints
1.1. Design base relation
To decide how to represent base relations

identified in global logical model in
target DBMS.
Designing base relation involves
identification of all necessary
requirements about a relation starting
from the name up to the referential
integrity constraints.
For each relation, need to define:
• The name of the relation;
• A list of simple attributes in brackets;

• The PK and, where appropriate, AKs

and FKs.
• A list of any derived attributes and
how they should be computed;
• Referential integrity constraints for
any FKs identified.
For each attribute, need to define:
• Its domain, consisting of a data type,
length, and any constraints on the
domain;
• An optional default value for the
attribute;
• Whether the attribute can hold nulls.
• Whether the attribute can be derived
and, if so, how it should be computed.
The implementation of the physical
model is dependent on the target DBMS
since some DBMSs have more facilities
than others for defining database
definitions.
The base relation design, along with
every justifiable reason, should be fully
documented.
1.2. Design representation of

derived data
While analyzing the requirement of

users, we may encounter that there are
some attributes holding data that will be
derived from existing or other attributes.
A decision on how to represent any
derived data present in the global logical
data model in the target DBMS should
be devised.
Examine logical data model and data

dictionary, and produce list of all derived
attributes. Most of the time derived
attributes are not expressed in the logical
model but will be included in the data
dictionary. Whether to store derived

attributes in a base relation or calculate
them when required is a decision to be
made by the designer considering the
performance impact.
Option selected is based on:
• Additional cost to store the derived
data and keep it consistent with
operational data from which it is
derived;
• Cost to calculate it each time it is
required.
Less expensive option is chosen subject
to performance constraints.
The representation of derived attributes
should be fully documented.
1.3. Design enterprise constraint
Data in the database is subject not only
to constraints of the database and the
data model used but also to some
enterprise-dependent constraints. These
constraint definitions also depend
on the DBMS selected and on enterprise-
level requirements.
One needs to know the functionalities of
the DBMS, since in designing the
enterprise constraints for the target
DBMS some DBMSs provide more
facilities than others.
All the enterprise-level constraints and
the definition method in the target
DBMS should be fully documented.
2. Design physical representation
This phase is the level for determining

the optimal file organizations to store the
base relations and the indexes that are
required to achieve acceptable
performance; that is, the way in which

relations and tuples will be held on
secondary storage.
Number of factors that may be used to
measure efficiency:
• Transaction throughput: number of
transactions processed in given time
interval.
• Response time: elapsed time for
completion of a single transaction.
• Disk storage: amount of disk space
required to store database files.
However, no one factor is always
correct.
Typically, have to trade one factor off
against another to achieve a reasonable
balance.
2.1. Analyze transactions

The objective here is to understand the
functionality of the transactions that will
run on the database and to analyze the

important transactions.
Attempt to identify performance criteria,
e.g.:
• Transactions that run frequently and
will have a significant impact on
performance;
• Transactions that are critical to the
business;
• Times during the day/week when
there will be a high demand made on
the database (called the peak load).
Use this information to identify the parts
of the database that may cause
performance problems.
To select appropriate file organizations
and indexes, also need to know high-
level functionality of the transactions,
such as:
• Attributes that are updated in an
update transaction;
• Criteria used to restrict tuples that are

retrieved in a query.
Often not possible to analyze all
expected transactions, so investigate
most ‘ important’ ones.
To help identify which transactions to
investigate, can use:
• Transaction/relation cross-reference
matrix, showing relations that each
transaction accesses, and/or
• Transaction usage map, indicating
which relations are potentially
heavily used.
To focus on areas that may be
problematic:
1. Map all transaction paths to
relations.
2. Determine which relations are
most frequently accessed by
transactions.

3. Analyze the data usage of selected

transactions that involve these
relations.
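The three steps above can be sketched as a weighted transaction/relation cross-reference. The transaction names, relations, and daily frequencies below are hypothetical and serve only to show the bookkeeping.

```python
from collections import Counter

# Hypothetical transactions: (relations accessed, runs per day).
transactions = {
    "T1 register student": (["Student", "Dorm"], 50),
    "T2 list students by dept": (["Student"], 400),
    "T3 assign dormitory": (["Dorm"], 20),
}

def relation_load(txs):
    """Total daily accesses per relation (a weighted cross-reference matrix)."""
    load = Counter()
    for relations, runs_per_day in txs.values():
        for rel in relations:
            load[rel] += runs_per_day
    return load

load = relation_load(transactions)
# The most heavily accessed relation is the first candidate for closer analysis.
busiest = load.most_common(1)[0][0]
```

In this sketch Student receives 450 accesses per day and Dorm 70, so the transactions touching Student would be analyzed first.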
2.2. Choose file organization

The objective here is to determine an
efficient file organization for each base
relation
File organizations include Heap, Hash,
Indexed Sequential Access
Method (ISAM), B+-Tree, and Clusters.
Most DBMSs provide little or no option
to select file organization. However, they
provide the user with an option to select an
index for every relation.
2.3. Choose indexes

The objective here is to determine
whether adding indexes will improve the
performance of the system.
One approach is to keep tuples unordered

and create as many secondary indexes as
necessary.
Another approach is to order tuples in
the relation by specifying a primary or
clustering index.
In this case, choose the attribute for
ordering or clustering the tuples as:
• Attribute that is used most often for
join operations - this makes join
operation more efficient, or
• Attribute that is used most often to
access the tuples in a relation in order
of that attribute.
If ordering attribute chosen is on the
primary key of a relation, index will be a
primary index; otherwise, index will be a
clustering index.
Each relation can only have either a
primary index or a clustering index.

Secondary indexes provide a mechanism

for specifying an additional key for a
base relation that can be used to retrieve
data more efficiently.
Overhead involved in maintenance and
use of secondary indexes that has to be
balanced against performance
improvement gained when retrieving
data.
This includes:
• Adding an index record to every
secondary index whenever tuple is
inserted;
• Updating a secondary index when
corresponding tuple is updated;
• Increase in disk space needed to store
the secondary index;
• Possible performance degradation
during query optimization to consider
all secondary indexes.

Guidelines for Choosing Indexes

(1) Do not index small relations.
(2) Index PK of a relation if it is not a
key of the file organization.
(3) Add secondary index to a FK if it
is frequently accessed.
(4) Add secondary index to any
attribute that is heavily used as a
secondary key.
(5) Add secondary index on attributes
that are involved in: selection or
join criteria; ORDER BY; GROUP
BY; and other operations involving
sorting (such as UNION or
DISTINCT).
(6) Add secondary index on attributes
involved in built-in functions.
(7) Add secondary index on attributes
that could result in an index-only
plan.
(8) Avoid indexing an attribute or
relation that is frequently updated.
(9) Avoid indexing an attribute if the
query will retrieve a significant
proportion of the tuples in the
relation.
(10) Avoid indexing attributes that
consist of long character strings.
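The trade-off behind these guidelines can be illustrated with a minimal in-memory sketch. It is not how any particular DBMS stores indexes; the list position stands in for a record address and all names are illustrative assumptions.

```python
from collections import defaultdict

# Unordered (heap) file: the list position plays the role of a record address.
employees = [
    (12, "Abebe", "SQL"),
    (16, "Lemma", "C++"),
    (28, "Chane", "SQL"),
    (24, "Dereje", "Oracle"),
]

def build_secondary_index(rows, column):
    """Map each value of `column` to the positions of the rows that hold it."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[column]].append(position)
    return index

skill_index = build_secondary_index(employees, 2)

def find_by_skill(skill):
    # Index lookup instead of scanning every row.
    return [employees[pos] for pos in skill_index.get(skill, [])]

def insert(row):
    # Maintenance cost: every insert must also update the index.
    employees.append(row)
    skill_index[row[2]].append(len(employees) - 1)
```

Retrieval by Skill no longer scans the whole file, but each insert now does extra work and the index itself takes space, which is the overhead the guidelines ask the designer to weigh.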
2.4. Estimate disk space and

system requirement
The objective here is to estimate the

amount of disk space that will be
required by the database.
Purpose is to answer the following
questions:
• If system already exists: is there
adequate storage?
• If procuring new system: what
storage will be required?
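A rough estimate can be worked out per relation from the expected number of tuples and the average tuple size. The sketch below assumes fixed-size pages that are only partly filled; the page size, fill factor, and sample figures are illustrative assumptions, and real DBMSs add per-page and per-tuple overhead that is ignored here.

```python
import math

def estimate_relation_bytes(num_tuples, tuple_bytes, page_bytes=4096, fill_factor=0.8):
    """Rough size of one relation: whole tuples per page, pages partly filled."""
    tuples_per_page = max(1, math.floor(page_bytes * fill_factor / tuple_bytes))
    pages = math.ceil(num_tuples / tuples_per_page)
    return pages * page_bytes

# Example: 100,000 tuples of about 120 bytes each.
size = estimate_relation_bytes(num_tuples=100_000, tuple_bytes=120)
```

With these figures 27 tuples fit on a page, so the relation needs 3704 pages, or about 15 MB. Summing such estimates over all relations and indexes gives the figure to compare with the available storage.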
3. Design user view

To design the user views that were
identified during the Requirements
Collection and Analysis stage of the
relational database application
development lifecycle.
Define views in DDL to provide the user
views identified in the data model.
Map them onto objects in the physical data model.
4. Design security mechanisms

To design the security measures for the
database as specified by the users.
System security – Authentication
Data security-authorizations
5. Consider the Introduction of

Controlled Redundancy
The objective here is to determine

whether introducing redundancy in a
controlled manner by relaxing the
normalization rules will improve the
performance of the system. This is
sometimes known as denormalization.
Informally speaking, denormalization is
the merging of relations.
Result of normalization is a logical
database design that is structurally
consistent and has minimal redundancy.
However, sometimes a normalized
database design does not provide
maximum processing efficiency.
It may be necessary to accept the loss of
some of the benefits of a fully
normalized design in favor of
performance.
Also consider that denormalization:
• Makes implementation more
complex;
• Often sacrifices flexibility;

• May speed up retrievals but it slows
down updates.
Denormalization refers to a refinement to
relational schema such that the degree of
normalization for a modified relation is
less than the degree of at least one of the
original relations.
The term is also used more loosely to refer to
situations where two relations are
combined into one new relation, which is
still normalized but contains more nulls
than the original relations. There is no fixed
rule for when to denormalize, but
consider denormalization in the following
situations, specifically to speed up
frequent or critical transactions:
• Step 1 Combining 1:1 relationships
• Step 2 Duplicating non-key attributes
in 1:* relationships to reduce joins

• Step 3 Duplicating foreign key

attributes in 1:* relationships to
reduce joins
• Step 4 Introducing repeating groups
• Step 5 Merging lookup tables with
base relations
• Step 6 Creating extract tables.
6. Monitoring and Tuning the

operational system
The objective here is to monitor
operational system and improve
performance of system to correct
inappropriate design decisions or reflect
changing requirements.
Importance of monitoring and tuning the
operational system:
• Avoids procurement of additional
hardware
• Downsizes the hardware
configuration: less and cheaper
hardware means less expensive
maintenance.
• Faster response time and higher
throughput make users more productive
• Faster response time improves staff
morale and customer satisfaction

Chapter Six
Relational Query Languages
In addition to the structural component

of any data model equally important is
the manipulation mechanism. This
component of any data model is called
the “ query language” .
 Query languages: Allow

manipulation and retrieval of data
from a database.
 Query languages != programming
languages!
➢ QLs are not intended to be used
for complex calculations.
➢ QLs support easy, efficient
access to large data sets.
 Relational model supports simple,
powerful query languages.

Formal Relational Query Languages

 There are varieties of Query
languages used by relational DBMS
for manipulating relations.
 Some of them are procedural

➢ User tells the system exactly
what and how to manipulate the
data
 Others are non-procedural
➢ User states what data is
needed rather than how it is to be
retrieved.
Two mathematical Query Languages

form the basis for Relational Query
Languages
➢ Relational Algebra:
➢ Relational Calculus:

 We may describe the relational

algebra as procedural language: it
can be used to tell the DBMS how to
build a new relation from one or
more relations in the database.
 We may describe relational

calculus as a non procedural
language: it can be used to formulate
the definition of a relation in terms of
one or more database relations.
 Formally the relational algebra
and relational calculus are equivalent
to each other. For every expression
in the algebra, there is an equivalent
expression in the calculus.
 Both are non-user friendly
languages. They have been used as
the basis for other, higher-level data
manipulation languages for relational
databases.
A query is applied to relation instances,

and the result of a query is also a
relation instance.
➢ Schemas of input relations for
a query are fixed
➢ The schema for the result of a
given query is also fixed!
Determined by definition of query
language constructs.
Relational Algebra
The basic set of operations for the

relational model is known as the
relational algebra. These operations
enable a user to specify basic retrieval
requests.
The result of the retrieval is a new

relation, which may have been formed
from one or more relations. The algebra

operations thus produce new relations,
which can be further manipulated using
operations of the same algebra.
A sequence of relational algebra

operations forms a relational algebra
expression, whose result will also be a
relation that represents the result of a
database query (or retrieval request).
 Relational algebra is a theoretical

language with operations that work
on one or more relations to define
another relation without changing the
original relation.
 The output from one operation
can become the input to another
operation (nesting is possible)

 There are different basic

operations that could be applied on
relations on a database based on
the requirement.
 Selection ( σ ) Selects a
subset of rows from a relation.
 Projection ( π ) Deletes
unwanted columns from a
relation.
 Renaming: assigning a name to the
intermediate relation of a single
operation
 Cross-Product ( x ) Concatenates
each tuple from one
relation with all the tuples from
the other relation.
 Set-Difference ( - ) Tuples
in relation R1, but not in relation
R2.
 Union ( ∪ ) Tuples in relation
R1, or in relation R2.
 Intersection ( ∩ ) Tuples in
relation R1 and in relation R2
 Join ( ⋈ ) Tuples joined from two
relations based on a condition
Join and intersection are derivable
from the rest.
 Using these, we can build up
sophisticated database queries.

Table 1:
Sample table used to illustrate different
kinds of relational operations. The
relation contains information about
employees, the IT skills they have, and the
school where they attend each skill.
(The School and SkillLevel attributes used in
later examples are not shown here.)
Employee
EmpID  FName   LName    SkillID  Skill   SkillType
12     Abebe   Mekuria  2        SQL     Database
16     Lemma   Alemu    5        C++     Programming
28     Chane   Kebede   2        SQL     Database
25     Abera   Taye     6        VB6     Programming
65     Almaz   Belay    2        SQL     Database
24     Dereje  Tamiru   8        Oracle  Database
51     Selam   Belay    4        Prolog  Programming
94     Alem    Kebede   3        Cisco   Networking
18     Girma   Dereje   1        IP      Programming
13     Yared   Gizaw    7        Java    Programming

1. Selection
 Selects subset of tuples/rows in a
relation that satisfy selection
condition.
 Selection operation is a unary
operator (it is applied to a single
relation)
 The Selection operation is applied
to each tuple individually
 The degree of the resulting
relation is the same as the original
relation but the cardinality (no. of
tuples) is less than or equal to the
original relation.
 The Selection operator is
commutative.
 Set of conditions can be combined
using Boolean operations (∧ (AND),
∨ (OR), and ~ (NOT))
 No duplicates in result!

 Schema of result identical to

schema of (only) input relation.
 Result relation can be the input for
another relational algebra
operation!(Operator composition.)
 It is a filter that keeps only those
tuples that satisfy a qualifying
condition (those satisfying the
condition are selected while others
are discarded.)
Notation:
σ <Selection Condition> (<Relation Name>)
Example: Find all Employees with skill
type of Database.
σ <SkillType = "Database"> (Employee)
This query will extract every tuple, with all
of its attributes, from a relation called
Employee where the SkillType attribute
has a value of "Database".
The resulting relation will be the
following.
EmpID  FName   LName    SkillID  Skill   SkillType
12     Abebe   Mekuria  2        SQL     Database
28     Chane   Kebede   2        SQL     Database
65     Almaz   Belay    2        SQL     Database
24     Dereje  Tamiru   8        Oracle  Database
If the query is all employees with a
SkillType Database and School Unity,
the relational algebra operation and the
resulting relation will be as follows.
σ <SkillType = "Database" AND School = "Unity">
(Employee)
EmpID  FName   LName    SkillID  Skill   SkillType
24     Dereje  Tamiru   8        Oracle  Database
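As an illustration, selection can be sketched in Python as a row filter. The `select` helper and the dictionary representation of tuples are assumptions made for this example, and only a few Employee rows and columns are included.

```python
# A few rows of the Employee relation as dictionaries.
employee = [
    {"EmpID": 12, "FName": "Abebe", "Skill": "SQL", "SkillType": "Database"},
    {"EmpID": 16, "FName": "Lemma", "Skill": "C++", "SkillType": "Programming"},
    {"EmpID": 28, "FName": "Chane", "Skill": "SQL", "SkillType": "Database"},
    {"EmpID": 65, "FName": "Almaz", "Skill": "SQL", "SkillType": "Database"},
    {"EmpID": 24, "FName": "Dereje", "Skill": "Oracle", "SkillType": "Database"},
]

def select(relation, condition):
    """Selection: keep the tuples for which `condition` is true; columns unchanged."""
    return [row for row in relation if condition(row)]

database_staff = select(employee, lambda r: r["SkillType"] == "Database")
# The output of one selection can feed another (operator composition).
sql_only = select(database_staff, lambda r: r["Skill"] == "SQL")
```

Applying the two conditions in the opposite order gives the same rows, which is the commutative property noted above.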

2. Projection
 Selects certain attributes while
discarding the other from the base
relation.
 The PROJECT creates a vertical
partitioning – one with the needed
columns (attributes) containing
results of the operation and other
containing the discarded Columns.
 Deletes attributes that are not in
projection list.
 Schema of result contains exactly
the fields in the projection list, with
the same names that they had in the
(only) input relation.
 Projection operator has to
eliminate duplicates!
 Note: real systems typically
don't do duplicate elimination
unless the user explicitly asks for
it.
 If the Primary Key is in the
projection list, then duplication will
not occur.
 Duplicate removal is necessary
to ensure that the resulting table is
also a relation.
Notation:
π <Selected Attributes> (<Relation Name>)
Example: To display Name, Skill, and
Skill Level of an employee, the
query and the resulting relation
will be:
π <FName, LName, Skill, SkillLevel>
(Employee)
FName   LName    Skill   SkillLevel
Abebe   Mekuria  SQL     5
Lemma   Alemu    C++     6
Chane   Kebede   SQL     10
Abera   Taye     VB6     8
Almaz   Belay    SQL     9
Dereje  Tamiru   Oracle  5
Selam   Belay    Prolog  8
Alem    Kebede   Cisco   7
Girma   Dereje   IP      4
Yared   Gizaw    Java    6
If we want to have the Name, Skill, and
Skill Level of an employee with Skill
SQL and SkillLevel greater than 5 the
query will be:
π <FName, LName, Skill, SkillLevel> (σ <Skill="SQL" ∧
SkillLevel>5> (Employee))
FName  LName   Skill  SkillLevel
Chane  Kebede  SQL    10
Almaz  Belay   SQL    9
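Projection with duplicate elimination can be sketched as follows. The `project` helper is an assumption for illustration; the SkillLevel values are taken from the result table above, and only four rows are used.

```python
employee = [
    {"FName": "Abebe", "LName": "Mekuria", "Skill": "SQL", "SkillLevel": 5},
    {"FName": "Chane", "LName": "Kebede", "Skill": "SQL", "SkillLevel": 10},
    {"FName": "Almaz", "LName": "Belay", "Skill": "SQL", "SkillLevel": 9},
    {"FName": "Dereje", "LName": "Tamiru", "Skill": "Oracle", "SkillLevel": 5},
]

def project(relation, attributes):
    """Projection: keep only `attributes` and drop duplicate tuples, in order."""
    seen, result = set(), []
    for row in relation:
        values = tuple(row[a] for a in attributes)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attributes, values)))
    return result

# Projecting on Skill alone collapses the three SQL rows into one.
skills = project(employee, ["Skill"])

# Projection applied to the output of a selection (Skill = SQL and SkillLevel > 5).
experts = project(
    [r for r in employee if r["Skill"] == "SQL" and r["SkillLevel"] > 5],
    ["FName", "LName", "Skill", "SkillLevel"],
)
```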

3. Rename Operation
 We may want to apply several
relational algebra operations one after
the other. The query could be written
in two different forms:
1. Write the operations as a
single relational algebra
expression by nesting the
operations.
2. Apply one operation at a time
and create intermediate result
relations. In the latter case, we
must give names to the relations
that hold the intermediate
results (the Rename Operation).
If we want to have the Name, Skill, and
Skill Level of an employee with salary
greater than 1500 and working for
department 5, we can write the
expression for this query using the two
alternatives:
1. A single algebraic expression:
The query is written as a single nested
relational algebra expression, which is:
π <FName, LName, Skill, SkillLevel> (σ <DeptNo=5 ∧
Salary>1500> (Employee))
2. Using an intermediate relation by
the Rename Operation:
Step1: Result1 ← σ <DeptNo=5 ∧ Salary>1500>
(Employee)
Step2: Result ← π <FName, LName, Skill,
SkillLevel> (Result1)
Then Result will be equivalent to the
relation we get using the first alternative.

4. Set Operations
The three main set operations are the
Union, Intersection and Set Difference.
The properties of these set operations are
similar to the concepts we have in
mathematical set theory. The difference
is that, in the database context, the elements
of each set, which is a Relation in the
database, will be tuples. The set
operations are binary operations which
demand that the two operand relations
be type compatible.
Type Compatibility
Two relations R1 and R2 are said to be
Type Compatible if:
1. The operand relations R1(A1, A2,
..., An) and R2(B1, B2, ..., Bn) have
the same number of attributes, and
2. The domains of corresponding
attributes are compatible; that
is, Dom(Ai)=Dom(Bi) for i=1, 2, ...,
n.
To illustrate the three set operations, we
will make use of the following two
relations, both derived from the Employee
relation:
RelationOne: Employees who attend
Database Course
EmpID  FName   LName    SkillID  Skill   SkillType
12     Abebe   Mekuria  2        SQL     Database
28     Chane   Kebede   2        SQL     Database
65     Almaz   Belay    2        SQL     Database
24     Dereje  Tamiru   8        Oracle  Database

RelationTwo: Employees who attend a
course in AAU
EmpID  FName   LName    SkillID  Skill   SkillType
12     Abebe   Mekuria  2        SQL     Database
94     Alem    Kebede   3        Cisco   Networking
28     Chane   Kebede   2        SQL     Database
13     Yared   Gizaw    7        Java    Programming
a. UNION Operation
The result of this operation, denoted
by R ∪ S, is a relation that includes
all tuples that are either in R or in S
or in both R and S. Duplicate tuples are
eliminated.
The two operands must be "type
compatible".
Eg: RelationOne ∪ RelationTwo
Employees who attend Database in any
School or who attend any course at AAU
EmpID  FName   LName    SkillID  Skill   SkillType
12     Abebe   Mekuria  2        SQL     Database
28     Chane   Kebede   2        SQL     Database
65     Almaz   Belay    2        SQL     Database
24     Dereje  Tamiru   8        Oracle  Database
94     Alem    Kebede   3        Cisco   Networking
13     Yared   Gizaw    7        Java    Programming
b. INTERSECTION Operation
The result of this operation, denoted
by R ∩ S, is a relation that includes
all tuples that are in both R and S.
The two operands must be "type
compatible".
Eg: RelationOne ∩ RelationTwo
Employees who attend Database Course
at AAU
EmpID  FName  LName    SkillID  Skill  SkillType
12     Abebe  Mekuria  2        SQL    Database
28     Chane  Kebede   2        SQL    Database
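c. Set Difference (or MINUS)
Operation
The result of this operation, denoted
by R - S, is a relation that includes all
tuples that are in R but not in S.
The two operands must be "type
compatible".
Eg: RelationOne - RelationTwo
Employees who attend Database Course
but didn't take any course at AAU
EmpID  FName   LName   SkillID  Skill   SkillType
65     Almaz   Belay   2        SQL     Database
24     Dereje  Tamiru  8        Oracle  Database
Eg: RelationTwo - RelationOne
Employees who attend a course at AAU
but didn't attend the Database Course
EmpID  FName  LName   SkillID  Skill  SkillType
94     Alem   Kebede  3        Cisco  Networking
13     Yared  Gizaw   7        Java   Programming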
The resulting relation for R1 ∪ R2, R1
∩ R2, or R1 - R2 has the same attribute
names as the first operand relation R1
(by convention).
Some Properties of the Set Operators
Notice that both union and intersection
are commutative operations; that is
R ∪ S = S ∪ R, and R ∩ S = S ∩ R
Both union and intersection can be
treated as n-ary operations applicable to
any number of relations as both are
associative operations; that is
R ∪ (S ∪ T) = (R ∪ S) ∪ T, and (R ∩ S)
∩ T = R ∩ (S ∩ T)
The minus operation is not commutative;
that is, in general
R - S ≠ S - R
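Because relations are sets of tuples, the three operations map directly onto Python's built-in set operators. The sketch below keeps only three attributes per tuple (EmpID, FName, Skill) for brevity; RelationOne is taken to be the four Database rows of the Employee table.

```python
# Tuples are (EmpID, FName, Skill); both relations share the same schema,
# so they are type compatible.
relation_one = {(12, "Abebe", "SQL"), (28, "Chane", "SQL"),
                (65, "Almaz", "SQL"), (24, "Dereje", "Oracle")}
relation_two = {(12, "Abebe", "SQL"), (94, "Alem", "Cisco"),
                (28, "Chane", "SQL"), (13, "Yared", "Java")}

union = relation_one | relation_two          # duplicates are eliminated
intersection = relation_one & relation_two
one_minus_two = relation_one - relation_two
two_minus_one = relation_two - relation_one

def ids(relation):
    """Sorted EmpIDs of a relation, for easy comparison."""
    return sorted(t[0] for t in relation)
```

Here `ids(union)` has six entries, the intersection holds employees 12 and 28, and the two differences are not equal, which illustrates that minus is not commutative.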
5. CARTESIAN (cross product)
Operation
This operation is used to combine tuples
from two relations in a combinatorial
fashion. That means, every tuple in
Relation (R) will be related with every
tuple in Relation (S).
• In general, the result of R(A1, A2, . . .,
An) x S(B1, B2, . . ., Bm) is a relation Q
with degree n + m attributes Q(A1,
A2, . . ., An, B1, B2, . . ., Bm), in that
order.
• Where R has n attributes and S has m
attributes.
• The resulting relation Q has one tuple
for each combination of tuples: one
from R and one from S.
• Hence, if R has p tuples and S has q
tuples, then R x S will have p * q
tuples.
Example:
Employee
ID FName LName
123 Abebe Lemma
567 Belay Taye
822 Kefle Kebede
Dept
DeptID DeptName MangID
2 Finance 567
3 Personnel 123
Then the Cartesian product between
Employee and Dept relations will be of
the form:
Employee X Dept:
ID   FName  LName   DeptID  DeptName   MangID
123  Abebe  Lemma   2       Finance    567
123  Abebe  Lemma   3       Personnel  123
567  Belay  Taye    2       Finance    567
567  Belay  Taye    3       Personnel  123
822  Kefle  Kebede  2       Finance    567
822  Kefle  Kebede  3       Personnel  123

Even though it is very
important in query processing, the
Cartesian Product is not useful by itself
since it relates every tuple in the first
relation with every tuple in the
second relation. Thus, to make use of
the Cartesian Product, one has to use it
with the Selection Operation, which
discriminates tuples of a relation by
testing whether each satisfies the
selection condition.
In our example, to extract employee
information about managers of the
departments (managers of each
department), the algebra query and the
resulting relation will be:
π <ID, FName, LName, DeptName>
(σ <ID=MangID> (Employee X Dept))
ID   FName  LName  DeptName
123  Abebe  Lemma  Personnel
567  Belay  Taye   Finance
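The same query can be traced in a short Python sketch using the Employee and Dept tuples of the example. `itertools.product` builds the Cartesian product; the variable names are illustrative.

```python
from itertools import product

employee = [(123, "Abebe", "Lemma"), (567, "Belay", "Taye"), (822, "Kefle", "Kebede")]
dept = [(2, "Finance", 567), (3, "Personnel", 123)]

# Cartesian product: concatenate every Employee tuple with every Dept tuple.
cross = [e + d for e, d in product(employee, dept)]

# Selection ID = MangID, then projection on ID, FName, LName, DeptName.
managers = [
    (emp_id, fname, lname, dept_name)
    for (emp_id, fname, lname, dept_id, dept_name, mang_id) in cross
    if emp_id == mang_id
]
```

The product has 3 * 2 = 6 tuples of degree 3 + 3 = 6, and only the two tuples satisfying the condition survive the selection.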
6. JOIN Operation
The sequence of Cartesian product
followed by select is used quite
commonly to identify and select related
tuples from two relations, so it is given
its own special operation, called JOIN.
Thus in the JOIN operation, the Cartesian
Product and the Selection operations are
used together.
The JOIN operation is denoted by the ⋈
symbol.
This operation is very important for any
relational database with more than a
single relation, because it allows us to
process relationships among relations.
The general form of a join operation on
two relations
R(A1, A2, . . ., An) and S(B1, B2, . . ., Bm)
is:
R ⋈ <join condition> S is equivalent to
σ <selection condition> (R X S)
where <join condition> and
<selection condition> are the same,
and R and S can be any relations that
result from general relational algebra
expressions.
Since JOIN is an operation that needs
two relations, it is a binary operation.
This type of JOIN is called a THETA
JOIN (θ-JOIN),
where θ is the comparison operator used in
the join condition.
θ could be one of { <, ≤, >, ≥, ≠, = }
Example:
In the above example, where we want to
extract employee information about
managers of the departments, the
algebra query using the JOIN operation
will be:
Employee ⋈ <ID=MangID> Dept
a. EQUIJOIN Operation
The most common use of join involves
join conditions with equality
comparisons only (=). Such a join, where
the only comparison operator used is the
equal sign is called an EQUIJOIN. In the
result of an EQUIJOIN we always have
one or more pairs of attributes (whose
names need not be identical) that have
identical values in every tuple since we
used the equality logical operator.

For example, the above JOIN

expression is an EQUIJOIN since the
logical operator used is the equal to
operator (=).
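A theta join can be sketched as a nested loop that applies the comparison to each pair of tuples. The `theta_join` helper is an assumption for illustration and relies on the two relations having distinct attribute names, as Employee and Dept do here.

```python
import operator

def theta_join(r, s, r_attr, theta, s_attr):
    """Theta join: pair tuples of r and s where r[r_attr] theta s[s_attr] holds."""
    return [{**a, **b} for a in r for b in s if theta(a[r_attr], b[s_attr])]

employee = [
    {"ID": 123, "FName": "Abebe"},
    {"ID": 567, "FName": "Belay"},
    {"ID": 822, "FName": "Kefle"},
]
dept = [
    {"DeptID": 2, "DeptName": "Finance", "MangID": 567},
    {"DeptID": 3, "DeptName": "Personnel", "MangID": 123},
]

# EQUIJOIN: theta is "=". Both ID and MangID appear in every result tuple.
managers = theta_join(employee, dept, "ID", operator.eq, "MangID")

# A non-equality theta: pairs where the employee ID is smaller than MangID.
smaller = theta_join(employee, dept, "ID", operator.lt, "MangID")
```

In the equijoin result every tuple carries the pair ID and MangID with identical values, which is the redundancy the natural join removes.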
b. NATURAL JOIN Operation
We have seen that in EQUIJOIN one of
each pair of attributes with identical
values is extra. A new operation called
natural join was created to get rid of the
second (or extra) attribute that we will
have in the result of an EQUIJOIN
condition.
The standard definition of natural join
requires that the two join attributes, or
each pair of corresponding join
attributes, have the same name in both
relations. If this is not the case, a
renaming operation on the attributes is
applied first.
R1 ← R ⋈ S represents a natural join
between R and S. The degree of R1 is the
degree of R plus the degree of S less the
number of common attributes.
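A natural join can be sketched by matching on all same-named attributes and keeping a single copy of each. The relations below are hypothetical (Employee is given a DeptID attribute for the purpose of the example) and `natural_join` is an illustrative helper.

```python
def natural_join(r, s):
    """Natural join: equate all same-named attributes and keep one copy of each."""
    result = []
    for a in r:
        for b in s:
            common = a.keys() & b.keys()
            if all(a[k] == b[k] for k in common):
                result.append({**a, **b})
    return result

# Hypothetical relations sharing the attribute DeptID.
employee = [
    {"ID": 123, "FName": "Abebe", "DeptID": 3},
    {"ID": 567, "FName": "Belay", "DeptID": 2},
]
dept = [
    {"DeptID": 2, "DeptName": "Finance"},
    {"DeptID": 3, "DeptName": "Personnel"},
]

joined = natural_join(employee, dept)
```

Each result tuple has 3 + 2 - 1 = 4 attributes, matching the degree rule stated above.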
c. OUTER JOIN Operation
OUTER JOIN is another version of the
JOIN operation where non matching
tuples from a relation are also included
in the result with NULL values for
attributes in the other relation.
There are two major types of OUTER
JOIN.
1. RIGHT OUTER JOIN: where
non matching tuples from the second
(Right) relation are included in the
result with NULL value for attributes
of the first (Left) relation.
2. LEFT OUTER JOIN: where non-
matching tuples from the first (Left)
relation are included in the result with
NULL values for attributes of the
second (Right) relation.
Notation for Left Outer Join:
R ⟕<Join Condition> S (theta left outer join)
R ⟕ S (natural left outer join)
When two relations are joined by a JOIN
operator, there could be some tuples in
the first relation that have no matching
tuple in the second relation, or the
reverse. When a query also needs to
display these non-matching tuples from
the first or second relation, it is
expressed with an OUTER JOIN.
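The left outer join can be sketched as follows. This is an illustrative, equality-based version only: Python's None stands in for NULL, and the function name and sample rows are hypothetical.

```python
def left_outer_join(r, s, left_attr, right_attr):
    """Equality-based left outer join: tuples of r without a match in s
    are kept and padded with None for the attributes of s."""
    s_attrs = list(s[0]) if s else []
    result = []
    for a in r:
        matches = [b for b in s if a[left_attr] == b[right_attr]]
        if matches:
            result.extend({**a, **b} for b in matches)
        else:
            result.append({**a, **{attr: None for attr in s_attrs}})
    return result

# Hypothetical sample data.
employee = [
    {"ID": 1, "Name": "Abebe"},
    {"ID": 2, "Name": "Sara"},
    {"ID": 3, "Name": "Kedir"},
]
dept = [
    {"DeptName": "Finance", "MangID": 2},
    {"DeptName": "Sales", "MangID": 3},
]

rows = left_outer_join(employee, dept, "ID", "MangID")
```

A right outer join can be obtained from the same function by swapping the two relations and the two attribute names.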
d. SEMIJOIN Operation
SEMIJOIN is another version of the
JOIN operation where the resulting
relation contains the attributes of only
one of the relations, for those tuples that
are related to tuples in the other
relation. The following notation depicts
the inclusion of only the attributes from
the first relation (R) in the result, for
the tuples that actually participate in
the relationship.
R ⋉<Join Condition> S
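A short Python sketch of the semijoin, restricted to an equality join condition for simplicity. The function name semijoin and the sample rows are assumptions for illustration.

```python
def semijoin(r, s, left_attr, right_attr):
    """Return the tuples of r, with r's attributes only, that match
    at least one tuple of s on left_attr = right_attr."""
    s_values = {b[right_attr] for b in s}
    return [a for a in r if a[left_attr] in s_values]

# Hypothetical sample data.
employee = [
    {"ID": 1, "Name": "Abebe"},
    {"ID": 2, "Name": "Sara"},
    {"ID": 3, "Name": "Kedir"},
]
dept = [
    {"DeptName": "Finance", "MangID": 2},
    {"DeptName": "Sales", "MangID": 3},
]

# Employees who manage some department; no Dept attribute appears in the result.
managing = semijoin(employee, dept, "ID", "MangID")
```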
Aggregate functions and Grouping statements
Some queries may involve aggregate
functions (scalar aggregates such as
totals in a report, or vector aggregates
such as subtotals in a report).
a) ℑ AL (R): scalar aggregate functions
on relation R, with AL as a list of
(<aggregate function>, <attribute>)
pairs.
b) GA ℑ AL (R): vector aggregate
functions on relation R, with AL as a
list of (<aggregate function>,
<attribute>) pairs and with a grouping
attribute GA.
Example (a): the number of
employees in an organization
(assume you have an Employee table).
This is a scalar aggregate.
PR(Num_Employees) ℑ COUNT EmpId
(Employee), where PR = produce
relation R
Example (b): the number of
employees in each department of an
organization (assume you have an
Employee table).
This is a vector aggregate.
PR(DeptId, Num_Employees) DeptId ℑ
COUNT EmpId (Employee), where PR
= produce relation R
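Examples (a) and (b) can be mirrored in Python as a rough sketch. The Employee rows below are hypothetical; only the attribute names EmpId, DeptId and Num_Employees come from the examples.

```python
from collections import Counter

# Hypothetical Employee relation.
employee = [
    {"EmpId": 1, "DeptId": 10},
    {"EmpId": 2, "DeptId": 10},
    {"EmpId": 3, "DeptId": 20},
]

# Example (a), scalar aggregate: one value for the whole relation.
num_employees = len([e["EmpId"] for e in employee])

# Example (b), vector aggregate: one value per value of the
# grouping attribute DeptId.
counts = Counter(e["DeptId"] for e in employee)
per_dept = [
    {"DeptId": dept_id, "Num_Employees": n}
    for dept_id, n in sorted(counts.items())
]
```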
Relational Calculus
A relational calculus expression creates a
new relation, which is specified in terms
of variables that range over rows of the
stored database relations (in tuple
calculus) or over columns of the stored
relations (in domain calculus).

In a calculus expression, there is no order

of operations to specify how to retrieve
the query result. A calculus expression
specifies only what information the
result should contain rather than how to
retrieve it.
In Relational calculus, there is no

description of how to evaluate a query;
this is the main distinguishing feature
between relational algebra and relational
calculus.
Relational calculus is considered to be a

nonprocedural language. This differs
from relational algebra, where we must
write a sequence of operations to specify
a retrieval request; hence relational
algebra can be considered as a
procedural way of stating a query.
When applied to relational databases, the
calculus is not the calculus of
derivatives and integrals but a form of
first-order logic, or predicate calculus.
A predicate is a truth-valued function
with arguments.
When we substitute values for the

arguments in the predicate, the function
yields an expression, called a
proposition, which can be either true or
false.
If a predicate contains a variable, as in
'x is a member of staff', there must be
a range for x. When we substitute some
values of this range for x, the proposition
may be true; for other values, it may be
false.

If COND is a predicate, then the set of

all tuples evaluated to be true for the
predicate COND will be expressed as
follows:
{t | COND(t)}
Where t is a tuple variable and
COND (t) is a conditional expression
involving t. The result of such a
query is the set of all tuples t that
satisfy COND (t).
If we have a set of predicates to evaluate
for a single query, the predicates can be
connected using ∧ (AND), ∨ (OR), and
~ (NOT).
Tuple-oriented Relational Calculus

➢ The tuple relational calculus is
based on specifying a number of
tuple variables. Each tuple variable
usually ranges over a particular
database relation, meaning that the
variable may take as its value any
individual tuple from that relation.
➢ Tuple relational calculus is
interested in finding the tuples of a
relation for which a predicate is true.
It is based on the use of tuple variables.
➢ A tuple variable is a variable that
'ranges over' a named relation:
that is, a variable whose only
permitted values are tuples of the
relation.

➢ If E is a tuple variable that ranges
over the relation Employee, this is
represented as EMPLOYEE(E), i.e. the
range of E is EMPLOYEE.
➢ Then, to extract all tuples that
satisfy a certain condition, we write
the set of all tuples E such that
COND(E) evaluates to true:
{E | COND(E)}
The predicates can be connected using
the Boolean operators:
∧ (AND), ∨ (OR), ¬ (NOT)
COND(t) is a formula, and is called a
Well-Formed Formula (WFF) if:
➢ COND is composed of n-ary
predicates (a formula composed of n
single predicates) and the predicates
are connected by any of the Boolean
operators.
➢ Each predicate is of the form
A θ B, where θ is one of the
comparison operators
{ <, ≤, >, ≥, ≠, = }, so that it can be
evaluated to either true or false, and
A and B are either constants or
variables.
➢ Formulae are unambiguous and
make sense.
Example (Tuple Relational Calculus)

➢ Extract all employees whose skill
level is greater than or equal to 8
{E | Employee(E) ∧ E.SkillLevel >= 8}

25 Abera Taye 6 VB6 Program

65 Almaz Belay 2 SQL Databas
51 Selam Belay 4 Prolog Program
➢ To find only the EmpId, FName,
LName, Skill and the School where
the skill was attended, for
employees with skill level greater
than or equal to 8, the tuple-based
relational calculus expression will be:
{E.EmpId, E.FName, E.LName, E.Skill,
E.School | Employee(E) ∧ E.SkillLevel >= 8}
EmpID  FName  LName   Skill   School
28     Chane  Kebede  SQL     AAU
25     Abera  Taye    VB6     Helico
65     Almaz  Belay   SQL     Helico
51     Selam  Belay   Prolog  Jimma
➢ E.FName means the value of the

First Name (FName) attribute for the
tuple E.
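A tuple relational calculus expression maps closely onto a Python list comprehension, where the loop variable plays the role of the tuple variable E. The sketch below uses hypothetical rows (not the rows of the tables above) and assumes a SkillLevel attribute on each tuple.

```python
# Hypothetical Employee relation; each dictionary is one tuple.
employee = [
    {"EmpId": 1, "FName": "Hana", "LName": "Tesfaye",
     "Skill": "SQL", "School": "AAU", "SkillLevel": 9},
    {"EmpId": 2, "FName": "Dawit", "LName": "Alemu",
     "Skill": "Java", "School": "BDU", "SkillLevel": 5},
    {"EmpId": 3, "FName": "Meron", "LName": "Girma",
     "Skill": "Prolog", "School": "Jimma", "SkillLevel": 8},
]

# {E | Employee(E) AND E.SkillLevel >= 8}
skilled = [E for E in employee if E["SkillLevel"] >= 8]

# {E.EmpId, E.FName, E.LName, E.Skill, E.School |
#      Employee(E) AND E.SkillLevel >= 8}
projected = [
    (E["EmpId"], E["FName"], E["LName"], E["Skill"], E["School"])
    for E in employee
    if E["SkillLevel"] >= 8
]
```

As in the calculus, the comprehension states which tuples qualify and says nothing about the order in which the relation is scanned.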

Quantifiers in Relational Calculus

➢ To tell how many instances the
predicate applies to, we can use the
two quantifiers of predicate logic.
➢ A relational calculus expression
written with the existential quantifier
can also be written with the universal
quantifier, since (∃x)(F) is equivalent
to ~(∀x)(~F).
1. Existential quantifier ∃
('there exists')
The existential quantifier is used in
formulae that must be true for at
least one instance, such as:
There is an employee with skill level
greater than or equal to 8:
{E | Employee(E) ∧
(∃E)(E.SkillLevel >= 8)}
This means that there exists at least
one tuple of the relation Employee
where the value of SkillLevel is
greater than or equal to 8.
2. Universal quantifier ∀ ('for all')
The universal quantifier is used in
statements about every instance,
such as:
Every employee has a skill level
greater than or equal to 8:
{E | Employee(E) ∧
(∀E)(E.SkillLevel >= 8)}
This means that, for all tuples of the
relation Employee, the value of the
SkillLevel attribute is greater than or
equal to 8.
Example:
Let's say that we have the following
schema (set of relations):
Employee(EID, FName, LName, EDID)
Project(PID, PName, PDID)
Dept(DID, DName, DMangID)
WorksOn(WEID, WPID)
To find employees who work on projects
controlled by department 5, the query will
be:
{E | Employee(E) ∧ (∃P)(Project(P) ∧
(∃w)(WorksOn(w) ∧ P.PDID = 5 ∧
E.EID = w.WEID ∧ w.WPID = P.PID))}
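The two quantifiers correspond to Python's built-in any() (existential) and all() (universal). The sketch below evaluates the department 5 query over hypothetical rows for the schema above. Note one assumption: it also requires the WorksOn tuple to refer to the same project (WPID = PID), since without that link any project of department 5 would match any WorksOn tuple of the employee.

```python
# Hypothetical tuples for the Employee, Project and WorksOn relations.
employee = [
    {"EID": 1, "FName": "Hana", "LName": "Tesfaye", "EDID": 5},
    {"EID": 2, "FName": "Dawit", "LName": "Alemu", "EDID": 4},
    {"EID": 3, "FName": "Meron", "LName": "Girma", "EDID": 5},
]
project = [
    {"PID": 100, "PName": "Payroll", "PDID": 5},
    {"PID": 200, "PName": "Website", "PDID": 4},
]
works_on = [
    {"WEID": 1, "WPID": 100},
    {"WEID": 2, "WPID": 200},
    {"WEID": 3, "WPID": 100},
    {"WEID": 3, "WPID": 200},
]

def works_on_project(E, P):
    """True if some WorksOn tuple links employee E to project P."""
    return any(w["WEID"] == E["EID"] and w["WPID"] == P["PID"] for w in works_on)

# Existential: employees for whom there exists a project of department 5
# that they work on.
on_dept5 = [
    E for E in employee
    if any(P["PDID"] == 5 and works_on_project(E, P) for P in project)
]

# Universal: employees who work on every project.
on_all = [
    E for E in employee
    if all(works_on_project(E, P) for P in project)
]
```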
Domain Relational Calculus

In tuple relational calculus, we use
variables that range over the tuples of a
relation. In domain relational calculus,
we use variables that range over
domain elements (field variables).
• An expression in the domain relational
calculus has the following general form:
{(x1, x2, x3, ..., xn) | P(x1, x2, x3, ..., xn, xm)}
where (x1, x2, x3, ..., xn) represents the
domain variables and
P(x1, x2, x3, ..., xn, xm) represents the
formula.
• Atomic formulas are of the form
R(x1, x2, x3, ..., xn), x1 θ x2, or
xi θ C, where θ ∈ {<, >, <=, >=, =, ≠},
R is a relation of degree n, each xi is a
domain variable, and C is a constant.
• If f1 and f2 are formulas, then so are
f1 ∧ f2, f1 ∨ f2, ~f1, (∃x)f1, and (∀x)f1.
• The answer to such a query includes
all tuples with attributes
(x1, x2, x3, ..., xn) that make the formula
P(x1, x2, x3, ..., xn, xm) true.
• A formula is defined recursively,
starting with simple atomic formulas
(getting tuples from relations or making
comparisons of values) and building
bigger formulas using the logical
connectives; i.e. the predicate P can be
a set of formulas combined by Boolean
operators.
Example: Consider the schema of
relations given in the previous example
(Employee, Project, Dept, WorksOn).
Query1: List employees
{FName, LName | (∃EID, EDID)
(Employee(EID, FName, LName, EDID))}
Query2: Find the list of employees who
work in the department of IS
Domain relational calculus expression
for the query:
{EID, FName, LName |
(∃DName, EDID, DID, DMangID)
(Employee(EID, FName, LName, EDID) ∧
Dept(DID, DName, DMangID) ∧
DID = EDID ∧ DName = 'IS')}
where (∃DName, EDID, DID, DMangID) is
shorthand for
(∃DName)(∃EDID)(∃DID)(∃DMangID).
Query3: List the names of employees
that do not manage any department
{FName, LName | (∃EID, EDID)
(Employee(EID, FName, LName, EDID) ∧
~(∃DID, DName, DMangID)
(Dept(DID, DName, DMangID) ∧
EID = DMangID))}
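In domain relational calculus the variables stand for individual attribute values, which in Python corresponds to unpacking each tuple into one variable per attribute. The sketch below evaluates Query2 and Query3 over hypothetical rows, assuming Employee has the four attributes (EID, FName, LName, EDID) and Dept the three attributes (DID, DName, DMangID) of the schema.

```python
# Hypothetical relations stored as plain tuples, one value per attribute.
employee = [
    (1, "Hana", "Tesfaye", 5),   # (EID, FName, LName, EDID)
    (2, "Dawit", "Alemu", 4),
    (3, "Meron", "Girma", 5),
]
dept = [
    (4, "IS", 2),                # (DID, DName, DMangID)
    (5, "CS", 3),
]

# Query2: employees who work in the IS department.
in_is = [
    (eid, fname, lname)
    for (eid, fname, lname, edid) in employee
    for (did, dname, dmangid) in dept
    if did == edid and dname == "IS"
]

# Query3: names of employees that do not manage any department.
# "not any(...)" plays the role of the negated existential quantifier.
non_managers = [
    (fname, lname)
    for (eid, fname, lname, edid) in employee
    if not any(eid == dmangid for (did, dname, dmangid) in dept)
]
```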

Chapter Seven
Advanced Concepts in Database Systems
• Database Security and Integrity
• Distributed Database Systems
• Data warehousing
1. Database Security and Integrity
A database represents an essential corporate resource that

should be properly secured using appropriate controls.
• Database security encompasses hardware, software,
people and data
In a multi-user database system, the DBMS must provide a
database security and authorization subsystem to enforce
limits on individual and group access rights and privileges.
Database security and integrity is about protecting the
database from becoming inconsistent and from being
disrupted. Such inconsistency and disruption can be called
database misuse.
Database misuse could be intentional or accidental, where
accidental misuse is easier to cope with than intentional
misuse.
Accidental inconsistency could occur due to:
➢ System crash during transaction processing
➢ Anomalies due to concurrent access
➢ Anomalies due to redundancy
➢ Logical errors
Likewise, although various threats fall into this group,
intentional misuse could be:
➢ Unauthorized reading of data
➢ Unauthorized modification of data or
➢ Unauthorized destruction of data
Most systems implement good database integrity measures to
protect the system from accidental misuse, while the many
computer-based measures that protect the system from
intentional misuse are termed database security measures.
• Database security is considered in relation to the

following situations:
➢ Theft and fraud
➢ Loss of confidentiality (secrecy)
➢ Loss of privacy
➢ Loss of integrity
➢ Loss of availability
Security Issues and general considerations

• Legal, ethical and social issues regarding the right to
access information
• Physical control
• Policy issues regarding the privacy of individuals at
enterprise and national level
• Operational considerations on the techniques used
(passwords, etc.)
• System-level security, including operating system and
hardware control
• Security levels and security policies at enterprise level
• Database security: the mechanisms that protect the
database against intentional or accidental threats.
Database security encompasses hardware, software,
people and data
• Threat – any situation or event, whether intentional or

accidental, that may adversely affect a system and
consequently the organization
• A threat may be caused by a situation or event involving

a person, action, or circumstance that is likely to bring
harm to an organization
• The harm to an organization may be tangible or

intangible
Tangible – loss of hardware, software, or data
Intangible – loss of credibility or client confidence
Examples of threats:
✓ Using another person's means of access
✓ Unauthorized amendment/modification or copying
of data
✓ Program alteration
✓ Inadequate policies and procedures that allow a mix
of confidential and normal output
✓ Wire-tapping
✓ Illegal entry by hacker
✓ Blackmail
✓ Creating a 'trapdoor' into the system
✓ Theft of data, programs, and equipment
✓ Failure of security mechanisms, giving greater
access than normal
✓ Staff shortages or strikes
✓ Inadequate staff training
✓ Viewing and disclosing unauthorized data
✓ Electronic interference and radiation
✓ Data corruption owing to power loss or surge
✓ Fire (electrical fault, lightning strike, arson), flood,
bomb
✓ Physical damage to equipment
✓ Breaking cables or disconnection of cables
✓ Introduction of viruses
Levels of Security Measures

Security measures can be implemented at several levels and
for different components of the system. These levels are:
1. Physical Level: concerned with physically securing the
site containing the computer system. The backup systems
should also be physically protected from access except
by authorized users.
2. Human Level: concerned with the authorization of
database users to access the content at different levels
and with different privileges.
3. Operating System: concerned with the weaknesses and
strengths of the operating system's security on data files.
A weakness may serve as a means of unauthorized access
to the database. This also includes protection of data in
primary and secondary memory from unauthorized
access.
4. Database System: concerned with the data access limits
enforced by the database system, such as passwords,
isolated transactions, etc.

Even though we can have different levels of security and
authorization on data objects and users, deciding who
accesses which data is a policy matter rather than a
technical one.
These policies
➢ should be known by the system: they should be encoded
in the system
➢ should be remembered: they should be saved somewhere
(the catalogue)
• An organization needs to identify the types of threat it

may be subjected to and initiate appropriate plans and
countermeasures, bearing in mind the costs of
implementing them

Countermeasures: Computer based controls

• The types of countermeasure to threats on computer
systems range from physical controls to administrative
procedures
• Despite the range of computer-based controls that are

available, it is worth noting that, generally, the security
of a DBMS is only as good as that of the operating
system, owing to their close association
• The following are computer-based security controls for a

multi-user environment:
➢ Authorization
▪ The granting of a right or privilege that enables a
subject to have legitimate access to a system or a
system's object
▪ Authorization controls can be built into the
software, and govern not only what system or
object a specified user can access, but also what
the user may do with it
▪ Authorization controls are sometimes referred to
as access controls
▪ The process of authorization involves
authentication of subjects (i.e. a user or program)
requesting access to objects (i.e. a database table,
view, procedure, trigger, or any other object that
can be created within the system)
➢ Views

▪ A view is the dynamic result of one or more
relational operations operating on the base
relations to produce another relation
▪ A view is a virtual relation that does not actually
exist in the database, but is produced upon
request by a particular user
▪ The view mechanism provides a powerful and
flexible security mechanism by hiding parts of
the database from certain users
▪ Using a view is more restrictive than simply
having certain privileges granted to a user on the
base relation(s)
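The hiding effect of a view can be illustrated with Python's built-in sqlite3 module. This is a sketch only: the table, column and view names are hypothetical, and SQLite has no user accounts, so in a multi-user DBMS one would additionally give the user access to the view but not to the base relation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (EmpId INTEGER PRIMARY KEY, Name TEXT, Salary REAL);
    INSERT INTO Employee VALUES (1, 'Hana', 9000.0), (2, 'Dawit', 7000.0);
    -- The view exposes EmpId and Name only; Salary stays hidden.
    CREATE VIEW EmployeeDirectory AS SELECT EmpId, Name FROM Employee;
""")

# The view is evaluated on request, against the current base relation.
cursor = conn.execute("SELECT * FROM EmployeeDirectory ORDER BY EmpId")
view_columns = [col[0] for col in cursor.description]
view_rows = cursor.fetchall()
```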
➢ Integrity
▪ Integrity constraints contribute to maintaining a
secure database system by preventing data from
becoming invalid and hence giving misleading or
incorrect results
▪ Domain Integrity
▪ Entity integrity
▪ Referential integrity
▪ Key constraints
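The four kinds of constraint listed above can be tried out with Python's built-in sqlite3 module. The sketch below is illustrative: the schema and values are hypothetical, and the mapping of each constraint type to a SQL clause (CHECK for domain, PRIMARY KEY for entity, REFERENCES for referential, UNIQUE for key) is one reasonable reading rather than the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Dept (
        DID   INTEGER PRIMARY KEY,          -- entity integrity
        DName TEXT NOT NULL UNIQUE          -- key constraint
    );
    CREATE TABLE Employee (
        EID        INTEGER PRIMARY KEY,
        FName      TEXT NOT NULL,
        SkillLevel INTEGER CHECK (SkillLevel BETWEEN 1 AND 10),  -- domain
        EDID       INTEGER REFERENCES Dept(DID)                  -- referential
    );
    INSERT INTO Dept VALUES (5, 'IS');
""")

def rejected(sql):
    """Return True if the DBMS refuses the statement with an integrity error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

domain_violation = rejected("INSERT INTO Employee VALUES (1, 'Hana', 99, 5)")
entity_violation = rejected("INSERT INTO Dept VALUES (5, 'CS')")
referential_violation = rejected("INSERT INTO Employee VALUES (2, 'Dawit', 7, 42)")
key_violation = rejected("INSERT INTO Dept VALUES (6, 'IS')")
valid_insert = not rejected("INSERT INTO Employee VALUES (3, 'Meron', 8, 5)")
```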
➢ Backup and recovery

▪ Backup is the process of periodically taking a
copy of the database and log file (and possibly
programs) on to offline storage media

▪ A DBMS should provide backup facilities to

assist with the recovery of a database following
failure
▪ Database recovery is the process of restoring the
database to a correct state in the event of a failure
▪ Journaling is the process of keeping and
maintaining a log file (or journal) of all changes
made to the database to enable recovery to be
undertaken effectively in the event of a failure
▪ The advantage of journaling is that, in the event
of a failure, the database can be recovered to its
last known consistent state using a backup copy
of the database and the information contained in
the log file
▪ If no journaling is enabled on a failed system, the
only means of recovery is to restore the database
using the latest backup version of the database
▪ However, without a log file, any changes made
after the last backup to the database will be lost
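The difference between recovery with and without a log file can be sketched in Python. This is a deliberately simplified model, not a real recovery manager: the database is a dictionary, the journal is an in-memory list, and all names and values are hypothetical.

```python
import copy

backup = {"acct1": 100, "acct2": 50}   # last backup copy (hypothetical data)
log = []                               # journal of changes made since the backup

def apply_change(db, key, value):
    """Record the change in the journal, then apply it to the database."""
    log.append((key, value))
    db[key] = value

database = copy.deepcopy(backup)
apply_change(database, "acct1", 80)
apply_change(database, "acct3", 20)

# Simulated failure: the live database is lost.
database = None

def recover(backup_copy, journal=None):
    """Restore the backup copy, then replay the journal if one exists."""
    restored = copy.deepcopy(backup_copy)
    for key, value in journal or []:
        restored[key] = value
    return restored

with_journal = recover(backup, log)    # backup plus logged changes
without_journal = recover(backup)      # changes after the backup are lost
```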
➢ Encryption
▪ The encoding of the data by a special algorithm
that renders the data unreadable by any program
without the decryption key
▪ If a database system holds particularly sensitive
data, it may be deemed necessary to encode it as
a precaution against possible external threats or
attempts to access it
▪ The DBMS can access data after decoding it,
although there is a degradation in performance
because of the time taken to decode it
▪ Encryption also protects data transmitted over

communication lines
▪ To transmit data securely over insecure networks
requires the use of a Cryptosystem, which
includes:
Authentication
➢ All users of the database will have different access
levels and permissions for different data objects, and
authentication is the process of checking whether the
user is the one with the privilege for the access level.
➢ It is the process of checking that users are who they
say they are.
➢ Each user is given a unique identifier, which is used
by the operating system to determine who they are.
➢ Thus the system will check whether the user with a
specific username and password is the one trying to use
the resource.
➢ Associated with each identifier is a password, chosen
by the user and known to the operating system, which
must be supplied to enable the operating system to
authenticate who the user claims to be.
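A minimal sketch of identifier-plus-password authentication using only the Python standard library. One deliberate difference from the description above: the sketch stores a salted hash of the password rather than the password itself, which is common practice but is not something these notes specify. The function names, the iteration count and the sample account are assumptions.

```python
import hashlib
import hmac
import os

accounts = {}  # user identifier -> (salt, password hash)

def hash_password(password, salt):
    """Derive a hash from the password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

def register(user_id, password):
    """Create an account for a unique identifier."""
    salt = os.urandom(16)
    accounts[user_id] = (salt, hash_password(password, salt))

def authenticate(user_id, password):
    """Check that the user is who they claim to be."""
    if user_id not in accounts:
        return False
    salt, stored = accounts[user_id]
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(stored, hash_password(password, salt))

register("abebe", "s3cret-pass")
```

Calling authenticate("abebe", "s3cret-pass") succeeds, while a wrong password or an unknown identifier is rejected.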
Any database access request will have the following three

major components
1. Requested Operation: what kind of operation is
requested by a specific query?
2. Requested Object: on which resource or data of

the database is the operation sought to be applied?
3. Requesting User: who is the user requesting the
operation on the specified object?
The database should be able to check for all the three
components before processing any request. The checking is
performed by the security subsystem of the DBMS.
Forms of user authorization

There are different forms of user authorization on the
resource of the database. These forms are privileges on
what operations are allowed on a specific data object.
User authorization on the data/extension

1. Read Authorization: the user with this privilege is
allowed only to read the content of the data object.
2. Insert Authorization: the user with this privilege is

allowed only to insert new records or items to the data
object.
3. Update Authorization: users with this privilege are

allowed to modify content of attributes but are not
authorized to delete the records.

4. Delete Authorization: users with this privilege are only

allowed to delete a record and not anything else.
➢ Different users, depending on the power of the user, can

have one or the combination of the above forms of
authorization on different data objects.
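The check of the three components of a request (user, operation, object) against the four forms of authorization can be sketched as a simple lookup. The users, the table names and the grants below are hypothetical.

```python
from enum import Enum

class Privilege(Enum):
    READ = "read"
    INSERT = "insert"
    UPDATE = "update"
    DELETE = "delete"

# (requesting user, requested object) -> privileges granted on that object.
grants = {
    ("clerk", "Employee"): {Privilege.READ, Privilege.INSERT},
    ("manager", "Employee"): {Privilege.READ, Privilege.UPDATE, Privilege.DELETE},
}

def is_authorized(user, operation, obj):
    """Allow the request only if this user holds this privilege on this object."""
    return operation in grants.get((user, obj), set())
```

A security subsystem along these lines would call is_authorized before processing each request and reject the request when it returns False.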
Role of DBA in Database Security
The database administrator is responsible for making the
database as secure as possible. For this, the DBA
should have more powerful privileges than any other
user. The DBA grants database users the capabilities they
need when accessing the content of the database.
The major responsibilities of DBA in relation to

authorization of users are:
1. Account Creation: involves creating different accounts
for different USERS as well as USER GROUPS.
2. Security Level Assignment: involves assigning
different users to different categories of access levels.
3. Privilege Grant: involves giving different levels of
privileges to different users and user groups.
4. Privilege Revocation: involves denying or cancelling
previously granted privileges of users for various
reasons.
5. Account Deletion: involves deleting an existing
account of a user or user group. It is similar to denying
all privileges of the user on the database.

2. Distributed Database Systems
 Database development facilitates the integration of

data available in an organization and enforces security
on data access. However, organizational data does not
always reside at one site. This demands that
databases at different sites be integrated and
synchronized with all the facilities of the database
approach. This leads to Distributed Database Systems.
 In a distributed database system, the database is stored

on several computers. The computers in a distributed
system communicate with each other through various
communication media, such as high-speed buses or
telephone lines.
 A distributed database system consists of a collection

of sites, each of which maintains a local database
system and also participates in global transaction
where different databases are integrated together.
 Even though integration of data implies centralized

storage and control, in distributed database systems
the intention is different. Data is stored in different
database systems in a decentralized manner, but the
systems act as if they were centralized, through the
use of computer networks.
 A distributed database system consists of loosely

coupled sites that share no physical component and

database systems that run on each site are independent

of each other.
 Transactions may access data at one or more sites
 Organization may implement their database system on

a number of separate computer systems rather than a
single, centralized mainframe. The computer systems
may be located at each local branch office.
The functionalities of a DDBMS will include: Extended

Communication Services, Extended Data Dictionary,
Distributed Query Processing, Extended Concurrency
Control and Extended Recovery Services.
Concepts in DDBMS
 Replication: System maintains multiple copies of
data, stored in different sites, for faster retrieval and
fault tolerance.
 Fragmentation: Relation is partitioned into several
fragments stored in distinct sites
 Data transparency: Degree to which system user
may remain unaware of the details of how and where
the data items are stored in a distributed system
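The three concepts can be illustrated together with a small in-memory model. This is a toy sketch, not a DDBMS: the sites are dictionaries, the relation is fragmented horizontally by a Branch attribute, and the site names and rows are hypothetical.

```python
# Hypothetical Employee relation.
employee = [
    {"EID": 1, "Name": "Hana", "Branch": "Addis"},
    {"EID": 2, "Name": "Dawit", "Branch": "BahirDar"},
    {"EID": 3, "Name": "Meron", "Branch": "Addis"},
]

# Fragmentation: the relation is partitioned into one fragment per branch.
fragments = {}
for row in employee:
    fragments.setdefault(row["Branch"], []).append(row)

# Replication: the Addis fragment is stored at both sites.
site_storage = {
    "Addis": {"Addis": fragments["Addis"]},
    "BahirDar": {
        "BahirDar": fragments["BahirDar"],
        "Addis": list(fragments["Addis"]),
    },
}

def all_employees(available_sites):
    """Data transparency: the caller asks for all employees and never names
    a site or a fragment. Every copy found at an available site is read,
    and duplicates coming from replicas are merged by EID."""
    seen = {}
    for site in available_sites:
        for fragment in site_storage[site].values():
            for row in fragment:
                seen[row["EID"]] = row
    return sorted(seen.values(), key=lambda r: r["EID"])
```

With both sites available the function returns all three employees. If the Addis site fails, the replica at BahirDar still makes every tuple reachable. If the BahirDar site fails, employee 2 is unavailable because that fragment is not replicated.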

Advantages of DDBMS
1. Data sharing and distributed control:
➢ A user at one site may be able to access data that is
available at another site.
➢ Each site can retain some degree of control over
local data
➢ We will have local as well as global database
administrator
2. Reliability and availability of data

➢ If one site fails, the rest can continue operating as
long as the transaction does not need data from the
failed site, or the data it needs is replicated at other
sites
3. Speedup of query processing

➢ If a query involves data from several sites, it may be
possible to split the query into sub-queries that can
be executed at several sites in parallel
Disadvantages of DDBMS
1. Software development cost
2. Greater potential for bugs (parallel processing
may endanger correctness)
3. Increased processing overhead (due to
communication between sites)
4. Communication problems
Homogeneous and Heterogeneous Distributed Databases
 In a homogeneous distributed database

 All sites have identical software
 Are aware of each other and agree to cooperate in
processing user requests.
 Each site surrenders part of its autonomy in terms
of right to change schemas or software
 Appears to user as a single system
 In a heterogeneous distributed database
 Different sites may use different schemas and
software
 Difference in schema is a major problem for
query processing
 Difference in software is a major problem
for transaction processing
 Sites may not be aware of each other and may
provide only limited facilities for cooperation in
transaction processing

3. Data warehousing
 Data warehouse is an integrated, subject-oriented,

time-variant, non-volatile database that provides
support for decision making.
✓ Integrated: a centralized, consolidated database
that integrates data derived from the entire
organization.
➢ Consolidates data from multiple and diverse
sources with diverse formats.
➢ Helps managers to better understand the
company's operations.
✓ Subject-oriented: the data warehouse contains
data organized by topic, e.g. sales, marketing,
finance, etc.
✓ Time-variant: in contrast to operational data,
which focus on current transactions, the
warehouse data represent the flow of data
through time.
➢ The data warehouse contains data that reflect
what happened last week, last month, in the
past five years, and so on.
✓ Non-volatile: once data enter the data
warehouse, they are never removed, because the
data in the warehouse represent the company's
entire history.
Differences between database and data warehouse

✓ Because data is added all the time, the warehouse
keeps growing.
✓ The data warehouse and operational
environments are separated. Data warehouse
receives its data from operational databases.
✓ Data warehouse environment is characterized by
read-only transactions to very large data sets.
✓ Operational environment is characterized by
numerous update transactions to a few data
entities at a time.
✓ Data warehouse contains historical data over a
long time horizon.
 Ultimately Information is created from data
warehouses. Such Information becomes the basis for
rational decision making.
 The data found in data warehouse is analyzed to

discover previously unknown data characteristics,
relationships, dependencies, or trends.
