DB Lecture Note All in ONE
Finster 1
Introduction to Database System
1. Manual Approach
2. Traditional File Based Approach
3. Database Approach
1. Manual Approach
3. Database Approach
Following a famous paper written by Dr. Edgar
Frank Codd in 1970, database systems changed
significantly. Codd proposed that database systems
should present the user with a view of data organized
as tables called relations. Behind the scenes, there
might be a complex data structure that allowed rapid
response to a variety of queries. But, unlike the user
of earlier database systems, the user of a relational
system would not be concerned with the storage
structure. Queries could be expressed in a very high-
level language, which greatly increased the efficiency
of database programmers. The database approach
emphasizes the integration and sharing of data
throughout the organization.
➢ Data Dictionary:
o Because a database is a self-describing
system, the Data Dictionary is used to
store and organize information about the
data stored in the database.
2. Physical DBD
➢ Take logical design specification as input
and decide how it should be physically
realized.
➢ Map the logical data model on the
specified DBMS with respect to tables and
integrity constraints. (DBMS dependent
designing)
➢ Select specific storage structure and access
path to the database
➢ Design security measures required on the
database
4. End Users
Workers whose jobs require accessing the
database frequently for various purposes. There
are different groups of users in this category.
1. Naïve Users:
➢ Sizable proportion of users
➢ Unaware of the DBMS
➢ Only access the database based on their
access level and demand
➢ Use standard and pre-specified types of
queries.
2. Sophisticated Users
➢ Users familiar with the structure of the
Database and facilities of the DBMS.
➢ Have complex requirements
➢ Have higher level queries
➢ Are most of the time engineers, scientists,
business analysts, etc
3. Casual Users
➢ Users who access the database
occasionally.
➢ Need different information from the
database each time.
➢ Use sophisticated database queries to
satisfy their needs.
➢ Are most of the time middle to high level
managers.
ANSI-SPARC Architecture
Data Independence
Logical Data Independence:
Refers to immunity of external schemas to
changes in conceptual schema.
Conceptual schema changes e.g.
addition/removal of entities should not require
changes to external schema or rewrites of
application programs.
The capacity to change the conceptual schema
without having to change the external schemas
and their application programs.
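A minimal sketch of logical data independence, using Python's built-in sqlite3 module. The table, view, and column names are hypothetical: the view plays the role of an external schema and the base tables play the role of the conceptual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual schema: a base table (hypothetical names and data).
conn.execute(
    "CREATE TABLE employee (eid INTEGER PRIMARY KEY, fname TEXT, salary REAL)"
)
conn.execute("INSERT INTO employee VALUES (12, 'Abebe', 5000.0)")

# External schema: the view that the application program queries.
conn.execute("CREATE VIEW emp_names AS SELECT eid, fname FROM employee")


def app_query(connection):
    """The application only knows about the external view."""
    sql = "SELECT eid, fname FROM emp_names ORDER BY eid"
    return connection.execute(sql).fetchall()


before = app_query(conn)

# Change the conceptual schema: add an attribute and a new entity.
conn.execute("ALTER TABLE employee ADD COLUMN dept TEXT")
conn.execute("CREATE TABLE project (pid INTEGER PRIMARY KEY, pname TEXT)")

# The application query is not rewritten and still returns the same rows.
after = app_query(conn)
print(before == after)
```

The application function is never edited, even though the conceptual schema gained an attribute and an entity.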
Database Languages
1. Hierarchical Model
• The simplest data model
• Record type is referred to as node or
segment
• The top node is the root node
• Nodes are arranged in a hierarchical
structure as sort of upside-down tree
• A parent node can have more than one child
node
• A child node can only have one parent node
• The relationship between parent and child is
one-to-many
• Relation is established by creating physical
link between stored records (each is stored
with a predefined access path to other
records)
• To add new record type or relationship, the
database must be redefined and then stored
in a new form.
[Figure: hierarchical model example with Department as the root node and Employee and Job as its child nodes]
2. Network Model
• Allows record types to have more than one
parent unlike hierarchical model
• The network data model sees records as set
members
• Each set has an owner and one or more
members
[Figure: network model example with the record types Department, Job, Employee, Activity, and Time Card]
Alternative terminologies
Relation    Table    File
Tuple       Row      Record
Chapter Two
Important terms:
Relation: a table with rows and columns
Attribute: a named column of a relation
Domain: a set of allowable values for one or more
attributes
Tuple: a row of a relation
Degree: the degree of a relation is the number of
attributes it contains
Unary relation, Binary relation, Ternary relation, N-ary
relation
Cardinality: the cardinality of a relation is the
number of tuples it contains
Relational Database: a collection of normalized
relations with distinct relation names.
Relation Schema: a named relation defined by a set
of attribute-domain name pairs
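The terms above can be illustrated with a small sketch. It models a relation as a heading (the attribute names) plus a set of tuples; this representation is an assumption made for illustration, not how a DBMS stores relations.

```python
# A relation as a heading (attribute names) and a body (a set of tuples).
heading = ("EmpID", "FirstName", "LastName")
body = {
    (12, "Abebe", "Mekuria"),
    (16, "Lemma", "Alemu"),
    (28, "Chane", "Kebede"),
}

# Degree: the number of attributes. Cardinality: the number of tuples.
degree = len(heading)
cardinality = len(body)

DEGREE_NAMES = {1: "unary", 2: "binary", 3: "ternary"}
degree_name = DEGREE_NAMES.get(degree, "n-ary")

print(degree, degree_name, cardinality)
```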
Types of Attributes
Degree of a Relationship
• An important point about a relationship is
how many entities participate in it. The
number of entities participating in a
relationship is called the DEGREE of the
relationship.
Cardinality of a Relationship
• Another important concept about
relationship is the number of
instances/tuples that can be associated with
a single instance from one entity in a single
relationship. The number of instances
participating or associated with a single
instance from an entity in a relationship is
called the CARDINALITY of the
relationship. The major cardinalities of a
relationship are:
o ONE-TO-ONE: one tuple is associated
with only one other tuple.
▪ E.g. Building – Location as a
single building will be located in a
single location and as a single
location will only accommodate a
single Building.
o ONE-TO-MANY, one tuple can be
associated with many other tuples, but
not the reverse.
▪ E.g. Department-Student as one
department can have multiple
students.
o MANY-TO-ONE, many tuples are
associated with one tuple but not the
reverse.
• Key constraints
If tuples need to be unique in the database,
we need to make each tuple distinct. To
do this we need relational keys that
uniquely identify each record.
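As a hedged illustration of a key constraint, the sketch below uses Python's sqlite3 module with a hypothetical student table. Declaring Id as the primary key makes the DBMS reject a second tuple with the same key value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical relation: id is declared as the primary key.
conn.execute("CREATE TABLE student (id TEXT PRIMARY KEY, fname TEXT)")
conn.execute("INSERT INTO student VALUES ('125/97', 'Abebe')")

try:
    # Same key value again: this violates the key constraint.
    conn.execute("INSERT INTO student VALUES ('125/97', 'Lemma')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(duplicate_rejected, row_count)
```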
• Relational Views
Purpose of a view
➢ Hides unnecessary information from users:
only part of the base relation (some
collection of attributes, not necessarily all)
is included in the virtual table.
Schemas
Schema describes how data is to be structured,
defined at setup/Design time (also called
"metadata")
Since the schema is defined during the database
development phase, it rarely changes unless
system maintenance demands a change to the
definition of a relation.
Schema Diagrams
➢ convention to display some aspect of a
schema visually
Schema Construct
➢ refers to each object in the schema (e.g.
STUDENT)
E.g.: STUDENT
(FName, LName, Id, Year, Dept, Sex)
Instances
Chapter Three
Database Design
Database design is the process of producing the
different kinds of specification for the data to be
stored in the database. Database design is one of
the middle phases of information systems
development when the system uses a database
approach. In the design phase we describe how the
data should be perceived at different levels and,
finally, how it is going to be stored in a computer
system.
Conceptual Design
Logical Design
Physical Design
▪ Represented by Diamond
o Constraints
▪ Represent the constraint in the data
• Cardinality and Participation
Constraints
[Figure: ER diagram with the entities Students and Courses, the relationship Enrolled In, and the attributes Id (key), Gpa, Age, Semester, Academic Year, and Grade]
One-to-one relationship:
➢ A customer is associated with at most one loan via the
relationship borrower
➢ A loan is associated with at most one customer via
borrower
One-To-Many Relationships
➢ In the one-to-many relationship a loan is associated
with at most one customer via borrower, a customer is
associated with several (including 0) loans via
borrower
Many-To-Many Relationship
➢ A customer is associated with several (possibly 0)
loans via borrower
➢ A loan is associated with several (possibly 0)
customers via borrower
1..1 0..1
Employee Manages Department
Problem in ER Modeling
Example:
2. Chasm Trap:
Occurs where a model suggests the existence of a
relationship between entity types, but the
pathway does not exist between certain entity
occurrences.
A chasm trap may exist when there are one or
more relationships with a minimum multiplicity
(cardinality) of zero forming part of the
pathway between related entities.
Example:
Problem:
How can we identify which BRANCH is
responsible for which PROJECT? We know that
whether the PROJECT is active or not there is a
responsible BRANCH. But which branch is a
question to be answered, and since we have a
minimum participation of zero between
EER Concepts
Generalization
Specialization
Sub classes
Super classes
Attribute Inheritance
Constraints on specialization and generalization
Generalization
➢ Generalization occurs when two or more entities
represent categories of the same real-world object.
➢ Generalization is the process of defining a more
general entity type from a set of more specialized
entity types.
➢ A generalization hierarchy is a form of abstraction
that specifies that two or more entities that share
common attributes can be generalized into a higher
level entity type.
➢ Is considered as bottom-up definition of entities.
➢ Generalization hierarchy depicts relationship
between higher level superclass and lower level
subclass.
Generalization hierarchies can be nested. That is, a
subtype of one hierarchy can be a supertype of
another. The level of nesting is limited only by the
constraint of simplicity.
Specialization
➢ Is the result of taking a subset of a higher level
entity set to form a lower level entity set.
➢ The specialized entities will have additional set of
attributes (distinguishing characteristics) that
distinguish them from the generalized entity.
➢ Is considered as Top-Down definition of entities.
➢ Specialization process is the inverse of the
Generalization process. Identify the distinguishing
features of some entity occurrences, and specialize
them into different subclasses.
➢ Reasons for Specialization
o Attributes only partially applying to
superclasses
o Relationship types only partially applicable to
the superclass
➢ In many cases, an entity type has numerous sub-
groupings of its entities that are meaningful and
need to be represented explicitly. This need requires
the representation of each subgroup in the ER
model. The generalized entity is a superclass and
the set of specialized entities will be subclasses for
that specific Superclass.
o Example: Saving Accounts and Current
Accounts are Specialized entities for the
generalized entity Accounts. Manager, Sales,
Secretary: are specialized employees.
Subclass/Subtype
Superclass /Supertype
➢ An entity type whose tuples share common
attributes. Attributes that are shared by all entity
occurrences (including the identifier) are associated
with the supertype.
➢ Is the generalized entity
Attribute Inheritance
➢ An entity that is a member of a subclass inherits all
the attributes of the entity as a member of the
superclass.
➢ The entity also inherits all the relationships in
which the superclass participates.
➢ An entity may belong to more than one subclass
category.
➢ All entities/subclasses of a generalized entity or
superclass share a common unique identifier
Completeness Constraint.
Chapter Four
Logical
Database Design
The whole purpose of database design is to create an
accurate representation of the data, the relationships between
the data, and the business constraints pertinent to the
organization. One can use one or more techniques
to design a database. One such technique is the E-R
model. In this chapter we use another technique known as
"Normalization", with a different emphasis on database
design: it defines the structure of a database with a specific
data model.
[Figure: ER diagram relating Employee, Department, and Project through the relationships Leads (StartDate, EndDate, PBonus) and Participate; a many-to-many (M:M) cardinality is marked]
Department(DID, DName, DLoc, MEID)
Project(PID, PName, PFund)
Telephone(EID, Tel)
Employee(EID, FName, LName, Salary, EDID)
Emp_Partc_Project(EID, PID)
Emp_Lead_Project(EID, PID, PBonus, StartDate, EndDate)
Deletion Anomalies:
If the employee with ID 16 is deleted, then every
piece of information about the skill C++ and its skill
type is deleted from the database. We will then not have
any information about C++ and its skill type.
Insertion Anomalies:
What if we have a new employee with a skill called
Pascal? We cannot decide whether Pascal is allowed
as a value for skill, and we have no clue about the type
of skill that Pascal should be categorized as.
Modification Anomalies:
Data Dependency
The logical associations between data items that point the
database designer in the direction of a good database design
are referred to as determinant or dependent relationships.
Example
Dinner Course | Type of Wine
Meat | Red
Fish | White
Cheese | Rose
Dinner → Wine
Since both Wine type and Fork type are determined by the
Dinner type, we say Wine is functionally dependent on
Dinner and Fork is functionally dependent on Dinner.
Dinner → Wine
Dinner → Fork
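A minimal sketch of checking whether a functional dependency holds in a given set of rows, using the dinner and wine values from the example. The function name holds_fd and the extra violating row are hypothetical. Note that a check like this can only show that sample data does not contradict a dependency; the dependency itself is a rule about all valid data.

```python
def holds_fd(rows, lhs, rhs):
    """Return True if every lhs value maps to exactly one rhs value in rows."""
    seen = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[a] for a in rhs)
        # setdefault stores the first dependent value seen for a determinant.
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


dinner = [
    {"Dinner": "Meat", "Wine": "Red"},
    {"Dinner": "Fish", "Wine": "White"},
    {"Dinner": "Cheese", "Wine": "Rose"},
]

# Dinner -> Wine is consistent with the sample rows.
fd_ok = holds_fd(dinner, ["Dinner"], ["Wine"])

# Hypothetical extra row: Meat served with two different wines breaks it.
fd_broken = holds_fd(dinner + [{"Dinner": "Meat", "Wine": "White"}],
                     ["Dinner"], ["Wine"])
print(fd_ok, fd_broken)
```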
Partial Dependency
If an attribute which is not a member of the primary key is
dependent on some part of the primary key (if we have
composite primary key) then that attribute is partially
functionally dependent on the primary key.
Transitive Dependency
In mathematics and logic, a transitive relationship is a
relationship of the following form: "If A implies B, and if
also B implies C, then A implies C."
Example:
If Mr X is a Human, and if every Human is an Animal,
then Mr X must be an Animal.
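In database terms, if A determines B and B determines C, then A transitively determines C. The sketch below computes everything a set of attributes determines by repeatedly applying functional dependencies. The two dependencies used (StudID determines Year, Year determines Dormitory) are an assumption made for illustration.

```python
def closure(attrs, fds):
    """Return all attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know the whole left side, we learn the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Assumed dependencies: StudID -> Year and Year -> Dormitory.
fds = [(("StudID",), ("Year",)), (("Year",), ("Dormitory",))]

determined = closure({"StudID"}, fds)
# Dormitory depends on StudID only through Year: a transitive dependency.
print(sorted(determined))
```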
Steps of Normalization:
UnNormalized Form(UNF):
Identify all data elements
First Normal Form(1NF):
Find the key with which you can find all data i.e.
remove any repeating group
Second Normal Form(2NF):
Remove part-key dependencies (partial dependency).
Make all data dependent on the whole key.
Third Normal Form(3NF)
Remove non-key dependencies (transitive
dependencies). Make all data dependent on nothing but
the key.
For most practical purposes, databases are considered
normalized if they adhere to the third normal form (there is
no transitive dependency).
UNNORMALIZED
EmpID | FirstName | LastName | Skill | SkillType | School | SchoolAdd
12 | Abebe | Mekuria | SQL, VB6 | Database, Programming | AAU, Helico | Sidist_Kilo, Piazza
16 | Lemma | Alemu | C++, IP | Programming, Programming | Unity, Jimma | Gerji, Jimma City
28 | Chane | Kebede | SQL | Database | AAU | Sidist_Kilo
65 | Almaz | Belay | SQL, Prolog, Java | Database, Programming, Programming | Helico, Jimma, AAU | Piazza, Jimma City, Sidist_Kilo
24 | Dereje | Tamiru | Oracle | Database | Unity | Gerji
94 | Alem | Kebede | Cisco | Networking | AAU | Sidist_Kilo
EMP_PROJ(EmpID, EmpName, ProjNo, ProjName, ProjLoc, ProjFund, ProjMan)
EMP_PROJ rearranged:
EMP_PROJ(EmpID, ProjNo, EmpName, ProjName, ProjLoc, ProjFund, ProjMan)
FD1: {EmpID} → EmpName
FD2: {ProjNo} → ProjName, ProjLoc, ProjFund, ProjMangID
FD3: {EmpID, ProjNo} → Incentive
EMPLOYEE(EmpID, EmpName)
PROJECT(ProjNo, ProjName, ProjLoc, ProjFund, ProjMangID)
EMP_PROJ(EmpID, ProjNo, Incentive)
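The decomposition of EMP_PROJ into EMPLOYEE, PROJECT, and EMP_PROJ can be sketched as three projections, one per functional dependency. The sample rows and the helper name project_rows are hypothetical, and only some of the project attributes are included to keep the sketch short.

```python
def project_rows(rows, attrs):
    """Keep only attrs from each row and drop duplicate results."""
    seen = set()
    result = []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result


# Hypothetical rows of the original EMP_PROJ relation.
emp_proj = [
    {"EmpID": 12, "EmpName": "Abebe", "ProjNo": 1,
     "ProjName": "Payroll", "Incentive": 500},
    {"EmpID": 12, "EmpName": "Abebe", "ProjNo": 2,
     "ProjName": "Library", "Incentive": 300},
    {"EmpID": 16, "EmpName": "Lemma", "ProjNo": 1,
     "ProjName": "Payroll", "Incentive": 400},
]

# One relation per determinant: EmpID, ProjNo, and the whole key.
employee = project_rows(emp_proj, ["EmpID", "EmpName"])
project = project_rows(emp_proj, ["ProjNo", "ProjName"])
assignment = project_rows(emp_proj, ["EmpID", "ProjNo", "Incentive"])

print(len(employee), len(project), len(assignment))
```

After the split, each employee name and each project name is stored once instead of once per assignment.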
STUDENT
StudID | Stud F_Name | Stud L_Name | Dept | Year
125/97 | Abebe | Mekuria | InfoSc | 1
654/95 | Lemma | Alemu | Geog | 3
842/95 | Chane | Kebede | CompSc | 3
165/97 | Alem | Kebede | InfoSc | 1
985/95 | Almaz | Belay | Geog | 3
DORM
Year | Dormitory
1 | 401
3 | 403
Generally,
even though there are four additional levels of
normalization, a table is said to be normalized if it reaches
3NF. A database with all tables in 3NF is said to be a
Normalized Database.
A mnemonic for remembering the rationale for normalization
up to 3NF could be the following:
1. No Repeating or Redundancy: no repeating fields in the
table.
2. The Fields Depend Upon the Key: the table should solely
depend on the key.
3. The Whole Key: no partial key dependency.
4. And Nothing But the Key: no inter-data dependency.
A------>>B
A------->>C
Pitfalls of Normalization
Chapter Five
Physical Database Design
Methodology for Relational Database
o definition of Alternate
key(Unique keys)
o definition of Domains
o Referential integrity constraints
o definition of enterprise level
constraints
Chapter Six
Relational Query Languages
Relational Algebra
Table1:
Sample table used to illustrate different
kinds of relational operations. The
relation contains information about
employees, IT skills they have and the
school where they attend each skill.
Employee
EmpID  FName   LName    SkillID  Skill   SkillType
12     Abebe   Mekuria  2        SQL     Database
16     Lemma   Alemu    5        C++     Programming
28     Chane   Kebede   2        SQL     Database
25     Abera   Taye     6        VB6     Programming
65     Almaz   Belay    2        SQL     Database
24     Dereje  Tamiru   8        Oracle  Database
51     Selam   Belay    4        Prolog  Programming
1. Selection
Selects subset of tuples/rows in a
relation that satisfy selection
condition.
Selection operation is a unary
operator (it is applied to a single
relation)
The Selection operation is applied
to each tuple individually
The degree of the resulting
relation is the same as the original
relation but the cardinality (no. of
tuples) is less than or equal to the
original relation.
The Selection operator is
commutative.
Set of conditions can be combined
using Boolean operations (∧ (AND),
∨ (OR), and ~ (NOT))
No duplicates in result!
Notation:
σ <Selection Condition> (<Relation Name>)
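The properties listed above can be sketched in Python using the Employee sample table. The function name select and the representation of tuples as dictionaries are assumptions made for illustration.

```python
COLUMNS = ("EmpID", "FName", "LName", "SkillID", "Skill", "SkillType")
ROWS = [
    (12, "Abebe", "Mekuria", 2, "SQL", "Database"),
    (16, "Lemma", "Alemu", 5, "C++", "Programming"),
    (28, "Chane", "Kebede", 2, "SQL", "Database"),
    (25, "Abera", "Taye", 6, "VB6", "Programming"),
    (65, "Almaz", "Belay", 2, "SQL", "Database"),
    (24, "Dereje", "Tamiru", 8, "Oracle", "Database"),
    (51, "Selam", "Belay", 4, "Prolog", "Programming"),
]
employee = [dict(zip(COLUMNS, row)) for row in ROWS]


def select(relation, condition):
    """Apply the condition to each tuple individually and keep the matches."""
    return [t for t in relation if condition(t)]


def is_database(t):
    return t["SkillType"] == "Database"


def is_sql(t):
    return t["SkillID"] == 2


database_skills = select(employee, is_database)

# Selection is commutative: the order of the two conditions does not matter.
first = select(select(employee, is_database), is_sql)
second = select(select(employee, is_sql), is_database)

print(len(database_skills), first == second)
```

The result keeps all six attributes (same degree) but has at most as many tuples as the input.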
2. Projection
Selects certain attributes while
discarding the other from the base
relation.
The PROJECT creates a vertical
partitioning – one with the needed
columns (attributes) containing
results of the operation and other
containing the discarded Columns.
Deletes attributes that are not in
projection list.
Schema of result contains exactly
the fields in the projection list, with
the same names that they had in the
(only) input relation.
Projection operator has to
eliminate duplicates!
Note: real systems typically
don’t do duplicate elimination
unless the user explicitly asks for
it.
Adane kasie Faculty of Informatics, BDU 2009
Database Systems Lecture Note
Notation:
π <Selected Attributes> (<Relation Name>)
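A short sketch of projection with duplicate elimination, using the Skill and SkillType columns of the Employee sample table. The function name project is hypothetical.

```python
employee = [
    {"EmpID": 12, "Skill": "SQL", "SkillType": "Database"},
    {"EmpID": 16, "Skill": "C++", "SkillType": "Programming"},
    {"EmpID": 28, "Skill": "SQL", "SkillType": "Database"},
    {"EmpID": 25, "Skill": "VB6", "SkillType": "Programming"},
    {"EmpID": 65, "Skill": "SQL", "SkillType": "Database"},
    {"EmpID": 24, "Skill": "Oracle", "SkillType": "Database"},
    {"EmpID": 51, "Skill": "Prolog", "SkillType": "Programming"},
]


def project(relation, attrs):
    """Keep only the listed attributes and eliminate duplicate tuples."""
    seen = set()
    result = []
    for t in relation:
        values = tuple(t[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            result.append(values)
    return result


skills = project(employee, ["Skill", "SkillType"])
print(len(employee), len(skills))
```

Seven input tuples produce fewer output tuples because SQL appears three times in the Skill column.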
3. Rename Operation
We may want to apply several
relational algebra operations one after
the other. The query could be written
in two different forms:
1. Write the operations as a
single relational algebra
expression by nesting the
operations.
2. Apply one operation at a time
and create intermediate result
relations. In the latter case, we
must give names to the relations
that hold the intermediate
results.
4. Set Operations
The three main set operations are the
Union, Intersection and Set Difference.
The properties of these set operations are
similar to those of mathematical set
theory. The difference is that, in the
database context, the elements of each
set, which is a relation, are tuples. The
set operations are binary operations that
require the two operand relations to be
type compatible.
Type Compatibility
Two relations R1 and R2 are said to be
Type Compatible if:
1. The operand relations R1(A1, A2,
..., An) and R2(B1, B2, ..., Bn) have
the same number of attributes, and
2. The domains of corresponding
attributes must be compatible; that
a. UNION Operation
The result of this operation, denoted
by R ∪ S, is a relation that includes
all tuples that are either in R or in S
or in both R and S. Duplicate tuples
are eliminated.
b. INTERSECTION Operation
The result of this operation, denoted
by R ∩ S, is a relation that includes
all tuples that are in both R and S.
The two operands must be "type
compatible"
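A minimal sketch of UNION and INTERSECTION on type-compatible relations. Each relation is modeled as a heading of (attribute name, Python type) pairs plus a set of tuples; the relation contents and the function names are hypothetical, and comparing Python types is only a rough stand-in for comparing domains.

```python
def type_compatible(r_heading, s_heading):
    """Same number of attributes and pairwise matching domains (types)."""
    if len(r_heading) != len(s_heading):
        return False
    return all(rt == st for (_, rt), (_, st) in zip(r_heading, s_heading))


# Attribute names may differ; only the count and the domains must match.
R_HEADING = (("FName", str), ("LName", str))
S_HEADING = (("FirstName", str), ("LastName", str))

R = {("Abebe", "Mekuria"), ("Lemma", "Alemu"), ("Chane", "Kebede")}
S = {("Chane", "Kebede"), ("Almaz", "Belay")}

compatible = type_compatible(R_HEADING, S_HEADING)

# Python sets eliminate duplicate tuples automatically.
union = R | S
intersection = R & S

print(compatible, len(union), intersection)
```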
Example:
Employee
ID   FName  LName
123  Abebe  Lemma
567  Belay  Taye
822  Kefle  Kebede
Dept
DeptID  DeptName   MangID
2       Finance    567
3       Personnel  123
Employee × Dept:
ID   FName  LName   DeptID  DeptName   MangID
123  Abebe  Lemma   2       Finance    567
123  Abebe  Lemma   3       Personnel  123
567  Belay  Taye    2       Finance    567
567  Belay  Taye    3       Personnel  123
822  Kefle  Kebede  2       Finance    567
822  Kefle  Kebede  3       Personnel  123
6. JOIN Operation
The sequence of a Cartesian product
followed by a select is used quite
commonly to identify and select related
tuples from two relations. It is therefore
defined as a special operation called
JOIN. Thus, in the JOIN operation, the
Cartesian product and Selection
operations are used together.
The JOIN operation is denoted by the ⋈
symbol.
Example:
Thus in the above example we want to
extract employee information about
managers of the departments, the
algebra query using the JOIN operation
will be.
a. EQUIJOIN Operation
The most common use of join involves
join conditions with equality
comparisons only (=). Such a join, where
the only comparison operator used is the
equal sign is called an EQUIJOIN. In the
result of an EQUIJOIN we always have
one or more pairs of attributes (whose
names need not be identical) that have
identical values in every tuple since we
used the equality logical operator.
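An EQUIJOIN can be sketched as a Cartesian product followed by a selection with an equality condition. The sketch uses the Employee and Dept rows from the Cartesian product example and joins them on ID = MangID; representing tuples as plain Python tuples is an assumption made for brevity.

```python
from itertools import product

# Employee(ID, FName, LName) and Dept(DeptID, DeptName, MangID).
employee = [
    (123, "Abebe", "Lemma"),
    (567, "Belay", "Taye"),
    (822, "Kefle", "Kebede"),
]
dept = [
    (2, "Finance", 567),
    (3, "Personnel", 123),
]

# Step 1: Cartesian product, every employee paired with every department.
cartesian = [e + d for e, d in product(employee, dept)]

# Step 2: selection with an equality condition, ID (index 0) = MangID (index 5).
managers = [t for t in cartesian if t[0] == t[5]]

for row in managers:
    print(row)
```

Each result tuple carries the pair of attributes ID and MangID with identical values, as described above.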
d. SEMIJOIN Operation
R ⋉ <Join Condition> S
Relational Calculus
A relational calculus expression creates a
new relation, which is specified in terms
of variables that range over rows of the
stored database relations (in tuple
calculus) or over columns of the stored
relations (in domain calculus).
1. Existential quantifier ∃
('there exists')
Existential quantifier used in
formulae that must be true for at
least one instance, such as:
An employee with skill level
greater than or equal to 8 will be:
{E | Employee(E) ∧
(∃E)(E.SkillLevel >= 8)}
Example:
MangID) ∧ DID=EDID ∧ DName='IS')
}
, Where DName, EDID, DID
DName, EDID, DID
Query3: List the names of employees
that do not manage any department
{Fname, Lname | (∃EID)(Employee(EID, Fname, Lname) ∧
(~(∃DMangId)(Dept(DID, Dname, DMangId) ∧ (EID=DMangId))))}
Chapter Seven
Advanced
Concepts in Database
Systems
• Database Security and Integrity
• Distributed Database Systems
• Data warehousing
Likewise, even though there are various threats that could
be categorized in this group, intentional misuse could be:
➢ Unauthorized reading of data
➢ Unauthorized modification of data or
➢ Unauthorized destruction of data
Examples of threats:
✓ Using another person's means of access
✓ Unauthorized amendment/modification or copying
of data
✓ Program alteration
✓ Inadequate policies and procedures that allow a mix
of confidential and normal output
✓ Wire-tapping
✓ Illegal entry by hacker
✓ Blackmail
✓ Creating a 'trapdoor' into the system
✓ Theft of data, programs, and equipment
✓ Failure of security mechanisms, giving greater
access than normal
✓ Staff shortages or strikes
✓ Inadequate staff training
✓ Viewing and disclosing unauthorized data
✓ Electronic interference and radiation
✓ Data corruption owing to power loss or surge
✓ Fire (electrical fault, lightning strike, arson), flood,
bomb
✓ Physical damage to equipment
✓ Breaking cables or disconnection of cables
✓ Introduction of viruses
These policies
➢ should be known by the system: should be encoded in
the system
➢ should be remembered: should be saved somewhere
(the catalogue)
➢ Views
➢ Encryption
▪ The encoding of the data by a special algorithm
that renders the data unreadable by any program
without the decryption key
▪ If a database system holds particularly sensitive
data, it may be deemed necessary to encode it as
a precaution against possible external threats or
attempts to access it
▪ The DBMS can access data after decoding it,
although there is a degradation in performance
because of the time taken to decode it
Authentication
➢ All users of the database will have different access
levels and permission for different data objects, and
authentication is the process of checking whether the
user is the one with the privilege for the access level.
➢ Is the process of checking the users are who they say
they are.
➢ Each user is given a unique identifier, which is used
by the operating system to determine who they are
➢ Thus the system will check whether the user with a
specific username and password is trying to use the
resource.
➢ Associated with each identifier is a password, chosen
by the user and known to the operating system, which
must be supplied to enable the operating system to
authenticate who the user claims to be
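A minimal sketch of the identifier-and-password check described above, using only Python's standard library. Storing a salted hash instead of the password itself is an added assumption that these notes do not state; the user name, password, and function names are hypothetical.

```python
import hashlib
import hmac
import os


def make_record(password, salt=None):
    """Return (salt, digest) for a password; a new random salt if none given."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest


def authenticate(users, user_id, password):
    """Check that the supplied password matches the record for user_id."""
    record = users.get(user_id)
    if record is None:
        return False  # unknown identifier
    salt, stored = record
    _, candidate = make_record(password, salt)
    # Constant-time comparison of the stored and recomputed digests.
    return hmac.compare_digest(stored, candidate)


# Hypothetical user table: identifier -> (salt, digest).
users = {"abebe": make_record("s3cret")}

print(authenticate(users, "abebe", "s3cret"))
print(authenticate(users, "abebe", "wrong"))
```

This only establishes who the user is; which data objects that user may access is a separate check against their access level.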
Concepts in DDBMS
Replication: System maintains multiple copies of
data, stored in different sites, for faster retrieval and
fault tolerance.
Fragmentation: Relation is partitioned into several
fragments stored in distinct sites
Data transparency: Degree to which system user
may remain unaware of the details of how and where
the data items are stored in a distributed system
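Fragmentation and replication can be sketched with plain Python lists standing in for sites. The site names and the choice of splitting by SkillType (a horizontal fragmentation) are assumptions made for illustration; the rows are taken from the Employee sample table.

```python
employee = [
    {"EmpID": 12, "Skill": "SQL", "SkillType": "Database"},
    {"EmpID": 16, "Skill": "C++", "SkillType": "Programming"},
    {"EmpID": 24, "Skill": "Oracle", "SkillType": "Database"},
    {"EmpID": 51, "Skill": "Prolog", "SkillType": "Programming"},
]


def by_emp_id(t):
    return t["EmpID"]


# Fragmentation: each site stores only the tuples matching its predicate.
site_a = [t for t in employee if t["SkillType"] == "Database"]
site_b = [t for t in employee if t["SkillType"] != "Database"]

# The original relation is reconstructed as the union of the fragments.
reconstructed = sorted(site_a + site_b, key=by_emp_id)

# Replication: a second site keeps a full copy of one fragment.
site_c = list(site_a)

print(len(site_a), len(site_b), reconstructed == sorted(employee, key=by_emp_id))
```

With data transparency, a user would query the reconstructed relation without needing to know which site holds which tuples.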
Advantages of DDBMS
1. Data sharing and distributed control:
➢ A user at one site may be able to access data that is
available at another site.
➢ Each site can retain some degree of control over
local data
➢ We will have local as well as global database
administrator
Disadvantages of DDBMS
1. Software development cost
2. Greater potential for bugs (parallel processing
may endanger correctness)
3. Increased processing overhead (due to
communication jargons)
4. Communication problems
3. Data warehousing