Unit 1 Introduction To Dbms
1. A computer file contains information arranged in an electronic format. It facilitates easy
storage, retrieval, and manipulation of data.
2. Files are stored in the form of bits and bytes. Each file has a name, and the computer recognizes a file
by this name.
3. A programmer working with this file can give instructions to the computer to open the file, read
from it, write to it, modify its contents, close it, and so on.
4. A program passes control to another in a sequence. This is called batch processing, where no or
minimum human interaction is required.
5. In many situations the program needs to be conversational. These days, the computer performs
both the searching and the answering operations in an automated manner.
6. A search that can take place at any time is called an online query.
7. When an instantaneous answer is expected, it is called online processing or real-time
processing. Example: Airlines reservations
8. Data can be classified into two types:
Master data: does not change with time.
Transaction data: can change from time to time.
9. Example: Library Management
10. There is a library of books and a librarian to maintain it. The librarian has created one card per
book, which contains details such as book number, title, author, price and date of purchase.
11. For this, the librarian has used the conceptual record layout shown in Figure 1.1.
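The file operations and the library card described above can be sketched in Python. This is only an illustration: the class name BookCard, the CSV storage format, and the sample book are assumptions, not part of Figure 1.1.

```python
import csv
import tempfile
from dataclasses import asdict, dataclass, fields
from pathlib import Path


@dataclass
class BookCard:
    """One card per book, with the fields listed in the notes (names are assumed)."""
    book_number: int
    title: str
    author: str
    price: float
    date_of_purchase: str  # kept as ISO text for simplicity


def write_cards(path, cards):
    # Open the file, write one record per card, and close it on leaving the block.
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=[fl.name for fl in fields(BookCard)])
        writer.writeheader()
        for card in cards:
            writer.writerow(asdict(card))


def read_cards(path):
    # Open the file, read every record back, and rebuild the cards.
    with open(path, newline="") as f:
        return [
            BookCard(int(r["book_number"]), r["title"], r["author"],
                     float(r["price"]), r["date_of_purchase"])
            for r in csv.DictReader(f)
        ]


def demo():
    # Hypothetical sample card, written to a temporary file and read back.
    cards = [BookCard(1, "Database Basics", "A. Author", 450.0, "2020-01-15")]
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "books.csv"
        write_cards(path, cards)
        return read_cards(path)
```

Every program that uses such a file must know this layout, which is the dependence between programs and data that the database approach aims to remove.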
4. Database Characteristics
The main characteristics of the database approach are the following:
1. Self-describing nature of a database system
2. Insulation between programs and data, and data abstraction
3. Support of multiple views of the data
4. Sharing of data and multiuser transaction processing
External level
The users' view of the database. The external level describes that part of the database
that is relevant to each user. The external level consists of a number of different
external views of the database. Each user has a view of the 'real world' represented
in a form that is familiar for that user. The external view includes only those
entities, attributes, and relationships in the real world that the user is interested in.
The use of external models has several major advantages:
Makes application programming much easier.
Simplifies the database designer's task.
Helps in ensuring the database security.
Conceptual level
The community view of the database. The conceptual level describes what data is
stored in the database and the relationships among the data. The middle
level in the three-level architecture is the conceptual level. This level
contains the logical structure of the entire database as seen by the DBA. It is a
complete view of the data requirements of the organization that is
independent of any storage considerations.
The conceptual level represents:
All entities, their attributes and their relationships
The constraints on the data
Semantic information about the data
Security and integrity information.
The conceptual level supports each external view. However, this level must
not contain any storage-dependent details. For instance, the description of an
entity should contain only the data types of attributes and their length, but not
any storage consideration such as the number of bytes occupied.
Internal level
The physical representation of the database on the computer. The internal level describes
how the data is stored in the database. The internal level covers the physical
implementation of the database to achieve optimal runtime performance and storage
space utilization. It covers the data structures and file organizations used to store
data on storage devices.
The internal level is concerned with:
• Storage space allocation for data and indexes.
• Record descriptions for storage.
• Record placement.
• Data compression and data encryption techniques.
Below the internal level there is a physical level that may be managed by the
operating system under the direction of the DBMS.
Physical level
The physical level below the DBMS consists of items only the operating system
knows, such as exactly how the sequencing is implemented and whether the fields
of internal records are stored as contiguous bytes on the disk.
Instances and Schemas
Schemas and instances are similar to types and variables in programming languages.
Schema: the logical structure of the database (e.g., the database consists of
information about a set of customers and accounts and the relationship between
them). It is analogous to the type information of a variable in a program.
Instance: the actual content of the database at a particular point in time. It is
analogous to the value of a variable.
Physical schema: database design at the physical level
Logical schema: database design at the logical level
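The analogy with types and variables can be sketched in Python. The Customer class and the sample names are hypothetical; the point is only that the schema stays fixed while the instance changes.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Customer:
    """Schema: attribute names and types, like the type of a variable."""
    customer_id: int
    name: str


# Instance: the content at one moment, like the current value of a variable.
customers = [Customer(1, "Asha")]

# Inserting a row changes the instance; the schema is untouched.
customers.append(Customer(2, "Ravi"))

schema_attributes = [f.name for f in fields(Customer)]
```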
DATA MODELS
The data model is a collection of conceptual tools for describing data, data relationships, data
semantics, and consistency constraints. A data model provides a way to describe the design of
a database at the physical, logical, and view levels.
The purpose of a data model is to represent data and to make the data understandable.
According to the types of concepts used to describe the database structure, there are three
data models:
1. An external data model, to represent each user's view of the organization.
2. A conceptual data model, to represent the logical view that is DBMS independent.
3. An internal data model, to represent the conceptual schema in such a way that it can
be understood by the DBMS.
The relational data model is based on the concept of mathematical relations. The relational model
stores data in the form of tables. Each table corresponds to an entity, and each row represents
an instance of that entity. Tables, also called relations, are related to each other through the
sharing of a common entity characteristic.
Example
Relational DBMSs: DB2, Oracle, MS SQL Server.
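How two tables are related through a shared characteristic can be sketched as follows. The tables, the attribute dept_no, and the helper join_on are assumptions made for illustration; a real relational DBMS would express this in SQL.

```python
# Each table is a list of rows; each row is one instance of the entity.
departments = [
    {"dept_no": 1, "dept_name": "Research"},
    {"dept_no": 2, "dept_name": "Sales"},
]
employees = [
    {"emp_no": 10, "name": "Asha", "dept_no": 1},
    {"emp_no": 11, "name": "Ravi", "dept_no": 2},
    {"emp_no": 12, "name": "Meena", "dept_no": 1},
]


def join_on(left, right, attr):
    """Pair up rows of two tables that agree on the common attribute."""
    return [{**l, **r} for l in left for r in right if l[attr] == r[attr]]


# Every employee row combined with the row of the department it refers to.
employee_departments = join_on(employees, departments, "dept_no")
```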
Object-Based Data Models
Object-based data models use concepts such as entities, attributes, and relationships. An entity
is a distinct object in the organization that is to be represented in the database. An attribute is a
property that describes some aspect of the object, and a relationship is an association between
entities. Common types of object-based data model are:
• Entity-Relationship model
• Object-oriented model
• Semantic model
Entity Relationship Model:
The ER model is based on the following components:
• Entity: An entity is defined as anything about which data are to be collected and stored.
Each row in the relational table is known as an entity instance or entity occurrence in the ER
model. Each entity is described by a set of attributes that describes particular characteristics of
the entity.
Object oriented model:
In the object-oriented data model (OODM) both data and their relationships are contained in
a single structure known as an object. An object is described by its factual content. An object
includes information about relationships between the facts within the object, as well as
information about its relationships with other objects. Therefore, the facts within the object
are given greater meaning. The OODM is said to be a semantic data model because semantic
indicates meaning.
The OO data model is based on the following components:
An object is an abstraction of a real-world entity.
Attributes describe the properties of an object.
Data abstraction:
o Suppression of details of data organization and Storage.
o Highlighting the essential features for an improved understanding of data.
Data model:
Collection of concepts that describe the structure of a database.
Provides means to achieve data abstraction.
Basic operations
Specify retrievals and updates on the database
Dynamic aspect or behavior of a database application
Allows the database designer to specify a set of valid operations allowed on
database objects.
Categories of Data Models
High-level or conceptual data models
Close to the way many users perceive data.
Conceptual data models use concepts such as entities, attributes, and relationships.
• Entity: Represents a real-world object or concept.
• Attribute: Represents some property of interest that further describes an entity.
• Relationship: Represents an association among two or more entities.
Low-level or physical data models
Describe the details of how data is stored on computer storage media.
Representational data models
• Easily understood by end users.
• Also similar to how data is organized in computer storage.
Relational data model
Used most frequently in traditional commercial DBMSs.
Object data model
New family of higher-level implementation data models that are closer to
conceptual data models.
Physical data models
• Describe how data is stored as files in the computer.
• Access path: Structure that makes the search for particular database records efficient.
• Index: Example of an access path that allows direct access to data using an index term
or keyword.
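The idea of an index as an access path can be sketched with a dictionary that maps an index term to record positions. The record fields and function names below are hypothetical, and a real DBMS would typically use disk-based structures such as B-trees or hashing.

```python
from collections import defaultdict

# Hypothetical stored records; the list position stands in for a disk address.
records = [
    {"book_number": 101, "author": "Rao"},
    {"book_number": 102, "author": "Iyer"},
    {"book_number": 103, "author": "Rao"},
]


def build_index(rows, key):
    """Map each value of the index term to the positions of matching records."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[key]].append(position)
    return index


def lookup(rows, index, value):
    """Use the index to reach matching records directly, without scanning all rows."""
    return [rows[position] for position in index.get(value, [])]


author_index = build_index(records, "author")
books_by_rao = lookup(records, author_index, "Rao")
```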
6. DBMS Components
A DBMS is a complex software system.
Figure illustrates, in a simplified form, the typical DBMS components.
DATABASE SYSTEM ARCHITECTURE
Transaction Management
A transaction is a collection of operations that performs a single logical function in a
database application. Transaction-management component ensures that the database
remains in a consistent (correct) state despite system failures (e.g. power failures and
operating system crashes) and transaction failures. Concurrency-control manager controls
the interaction among the concurrent transactions, to ensure the consistency of the database.
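The all-or-nothing behaviour of a transaction can be sketched with Python's built-in sqlite3 module, used here only as a convenient stand-in for a DBMS. The account table, the transfer function, and the balances are assumptions for illustration.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move an amount between two accounts as a single logical operation."""
    try:
        # As a context manager, the connection commits if the block succeeds
        # and rolls back if an exception escapes it.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
            (balance,) = conn.execute(
                "SELECT balance FROM account WHERE id = ?", (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        return True
    except ValueError:
        return False


def balances(conn):
    """Return the current balances as {account id: balance}."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


# Hypothetical starting state in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()
```

A failed transfer leaves both balances exactly as they were, because the partial debit is rolled back; this is the consistent state the transaction-management component is meant to preserve.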
Storage Management
a. A storage manager is a program module that provides the interface between the low-level
data stored in the database and the application programs and queries submitted
to the system.
b. The storage manager is responsible for the following tasks:
• Interaction with the file manager
• Efficient storing, retrieving, and updating of data
Database Administrator
Coordinates all the activities of the database system; the database administrator has a good
understanding of the enterprise’s information resources and needs:
a. Schema definition
b. Storage structure and access method definition
c. Schema and physical organization modification
d. Granting user authority to access the database
e. Specifying integrity constraints
f. Acting as liaison with users
g. Monitoring performance and responding to changes in requirements
Database Users
Users are differentiated by the way they expect to interact with the system.
a. Application programmers: interact with the system through DML calls.
b. Sophisticated users: form requests in a database query language.
c. Specialized users: write specialized database applications that do not fit into the
traditional data processing framework.
d. Naive users: invoke one of the permanent application programs that have been
written previously.
File manager
Manages allocation of disk space and data structures used to represent information on disk.
Database manager
The interface between low-level data and application programs and queries.
Query processor
Translates statements in a query language into low-level instructions the database
manager understands.
DML precompiler
Converts DML statements embedded in an application program to normal procedure calls in
a host language. The precompiler interacts with the query processor.
DDL compiler
Converts DDL statements to a set of tables containing metadata stored in a data dictionary.
In addition, several data structures are required for physical system implementation:
Data files: store the database itself.
Data dictionary: stores information about the structure of the database. It is used heavily.
Great emphasis should be placed on developing a good design and efficient implementation of
the dictionary.
Indices: provide fast access to data items holding particular values
7. Relational Algebra
1. A set of operators (unary and binary) that take relation instances as arguments and return
new relations.
2. Gives a procedural method of specifying a retrieval query.
3. Forms the core component of a relational query engine.
4. SQL queries are internally translated into Relational Algebra expressions.
5. Provides a framework for query optimization.
6. A sequence of relational algebra operations forms a relational algebra expression
Notation:
σ : select operator ( read as sigma)
R: relation name
Examples of select expressions
Obtain information about a professor with name "giridhar":
σ name = "giridhar" (professor)
Obtain information about professors who joined the university between 1980 and 1985:
σ startYear ≥ 1980 ∧ startYear < 1985 (professor)
To select the tuples for all employees who either work in department 4 and make over
$25,000 per year, or work in department 5 and make over $30,000, the following
SELECT operation is given:
σ(Dno=4 AND Salary>25000) OR (Dno=5 AND Salary>30000)(EMPLOYEE)
The result is shown in Figure
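The SELECT operation above can be sketched in Python as a filter over tuples. The function name select and the sample EMPLOYEE rows are hypothetical; only the condition is taken from the notes.

```python
def select(relation, predicate):
    """σ: keep the tuples that satisfy the condition; the attributes are unchanged."""
    return [t for t in relation if predicate(t)]


# Hypothetical EMPLOYEE state, not the one shown in the figure.
EMPLOYEE = [
    {"Fname": "Asha", "Lname": "Rao", "Dno": 4, "Salary": 26000},
    {"Fname": "Ravi", "Lname": "Iyer", "Dno": 4, "Salary": 24000},
    {"Fname": "Meena", "Lname": "Das", "Dno": 5, "Salary": 38000},
    {"Fname": "John", "Lname": "Paul", "Dno": 5, "Salary": 30000},
    {"Fname": "Lata", "Lname": "Nair", "Dno": 1, "Salary": 55000},
]

# σ (Dno=4 AND Salary>25000) OR (Dno=5 AND Salary>30000) (EMPLOYEE)
result = select(
    EMPLOYEE,
    lambda t: (t["Dno"] == 4 and t["Salary"] > 25000)
    or (t["Dno"] == 5 and t["Salary"] > 30000),
)
```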
Notation:
π: project operator (read as pi)
R: relation name
Examples of project expressions
To list each employee’s first and last name and salary, the PROJECT operation is
used as follows:
πLname, Fname, Salary(EMPLOYEE)
The result is shown in figure
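The PROJECT operation can be sketched in Python in the same style. The function name project and the sample rows are hypothetical. Because a relation is a set of tuples, the sketch removes duplicate rows from the result.

```python
def project(relation, attributes):
    """π: keep only the listed attributes and drop duplicate result tuples."""
    seen = set()
    result = []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:
            seen.add(row)
            result.append(dict(zip(attributes, row)))
    return result


# Hypothetical EMPLOYEE state, not the one shown in the figure.
EMPLOYEE = [
    {"Fname": "Asha", "Lname": "Rao", "Dno": 4, "Salary": 26000},
    {"Fname": "Ravi", "Lname": "Iyer", "Dno": 4, "Salary": 24000},
    {"Fname": "Meena", "Lname": "Das", "Dno": 5, "Salary": 38000},
]

# π Lname, Fname, Salary (EMPLOYEE)
names_and_salaries = project(EMPLOYEE, ["Lname", "Fname", "Salary"])

# Projecting on Dno alone shows duplicate elimination: two rows share Dno = 4.
department_numbers = project(EMPLOYEE, ["Dno"])
```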
9. Entity-Relationship model
Entity-Relationship (ER) model: Popular high-level conceptual data model.
ER diagrams: Diagrammatic notation associated with the ER model.
Entity: Thing in the real world with independent existence.
Attributes: Particular properties that describe an entity. For example, an
EMPLOYEE entity may be described by the employee's
name, age, address, salary, and job.
Several types of attributes occur in the ER model: simple, composite,
single-valued, multivalued, stored, and derived.
Simple or atomic attributes: Attributes that are not divisible.
Composite attributes: It can be divided into smaller subparts, which
represent more basic attributes with independent meanings. Composite
attributes can form a hierarchy.
Example: Address attribute of the EMPLOYEE entity can be subdivided
into Street_address, City, State, and Zip.
Fig 9.2 Two entity types, EMPLOYEE and COMPANY, and some member entities of
each
Key or uniqueness constraint: Attributes whose values are distinct for each individual
entity in the entity set.
Key attribute: The uniqueness property must hold for every entity set of the entity type.
Value sets (or domains of values): Specify the set of values that may be assigned to
an attribute for each individual entity.
Relationship: Exists when an attribute of one entity type refers to another entity type. Represent
such references as relationships, not attributes.
Relationship Types, Sets, and Instances:
Relationship type R among n entity types E1, E2, ..., En: Defines a set of
associations among entities from these entity types.
Relationship instances ri: Each ri associates n individual entities (e1, e2, ..., en), and
each entity ej in ri is a member of entity set Ej.
Relationship Degree
Degree of a relationship type:
1. Number of participating entity types.
2. A relationship type of degree two is called binary, and one of degree three is called
ternary.
Relationships as attributes: Think of a binary relationship type in terms of
attributes.
Fig 9.3 Some instances in the WORKS_FOR relationship set, which represents a relationship
type WORKS_FOR between EMPLOYEE and DEPARTMENT
Role names: Role name signifies the role that a participating entity plays in
each relationship instance.
Recursive relationships: The same entity type participates more than once in a relationship
type in different roles.
Cardinality ratio for a binary relationship: Specifies maximum number of relationship
instances that entity can participate in.
Participation constraint: Specifies whether existence of entity depends on its being
related to another entity.
Types: total and partial.
Attributes of Relationship Types
Attributes of 1:1 or 1:N relationship types: Can be migrated to one of the participating entity types.
For a 1:N relationship type: The relationship attribute can be migrated only to the
entity type on the N-side of the relationship.
For M:N relationship types: 1. Some attributes may be determined by the combination of
participating entities. 2. Such attributes must be specified as relationship attributes.
Weak Entity Types
Do not have key attributes of their own.
Identified by being related to specific entities from another entity type.
Regular entity types that do have a key attribute are called strong entity types.
Identifying relationship of the weak entity type: The relationship type that relates a
weak entity type to its owner.
Summary of the notation for ER diagram:
Fig 9.4 ER Design for the COMPANY Database
10. Functional dependencies
1. The whole database is described by a single universal relation schema R = {A1, A2, ...,
An}.
Definition:
2. A functional dependency, denoted by X → Y, between two sets of attributes X and Y that
are subsets of R specifies a constraint on the possible tuples that can form a relation state
r of R.
3. The constraint is that, for any two tuples t1 and t2 in r that have t1[X] = t2[X], they must
also have t1[Y] = t2[Y].
4. The values of the Y component of a tuple in r depend on, or are determined by, the values
of the X component.
5. The values of the X component of a tuple uniquely (or functionally) determine the values
of the Y component.
6. We say that there is a functional dependency (FD or f.d.) from X to Y, or that Y is functionally
dependent on X.
7. X functionally determines Y in a relation schema R if, and only if, whenever two tuples of
r(R) agree on their X-value, they must necessarily agree on their Y value. Note the
following:
If a constraint on R states that there cannot be more than one tuple with a given X-value
in any relation instance r(R), that is, X is a candidate key of R, then X → Y holds
for any subset of attributes Y of R.
If X → Y in R, this does not say whether or not Y → X in R.
8. A functional dependency is a property of the semantics or meaning of the attributes.
9. Whenever the semantics of two sets of attributes in R indicate that a functional
dependency should hold, specify the dependency as a constraint.
10. Relation extensions r(R) that satisfy the functional dependency constraints are called
legal relation states (or legal extensions) of R.
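The definition in points 2 and 3 can be checked mechanically on a given relation state. The sketch below assumes tuples are stored as dictionaries; the function name satisfies_fd and the sample rows are hypothetical. Since a functional dependency is a property of the meaning of the attributes, a state can show that a dependency is violated, but a passing check does not prove that the dependency holds in general.

```python
def satisfies_fd(relation, X, Y):
    """Return True if every pair of tuples agreeing on X also agrees on Y."""
    y_value_for = {}
    for t in relation:
        x_value = tuple(t[a] for a in X)
        y_value = tuple(t[a] for a in Y)
        # setdefault stores the first Y-value seen for this X-value;
        # any later tuple with the same X-value must match it.
        if y_value_for.setdefault(x_value, y_value) != y_value:
            return False
    return True


# Hypothetical relation state r(R).
r = [
    {"Ssn": 1, "Ename": "Asha", "Dnumber": 5},
    {"Ssn": 2, "Ename": "Ravi", "Dnumber": 5},
    {"Ssn": 3, "Ename": "Asha", "Dnumber": 4},
]

ssn_determines_name = satisfies_fd(r, ["Ssn"], ["Ename"])       # Ssn → Ename
name_determines_dept = satisfies_fd(r, ["Ename"], ["Dnumber"])  # Ename → Dnumber
```

In this state Ssn → Ename is satisfied, while Ename → Dnumber is violated by the two tuples named "Asha", which also illustrates that X → Y says nothing about other dependencies among the same attributes.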
Fig 10.3 Sample state of relation DEPARTMENT
Fig 10.4 1NF version of the same relation with redundancy
The relation in Fig 10.2 is not in 1NF because Dlocations is not an atomic attribute.
There are three main techniques to achieve first normal form:
First technique:
1. Remove the attribute Dlocations and place it in a separate relation
DEPT_LOCATIONS, along with the primary key Dnumber.
2. The primary key of this relation is the combination {Dnumber, Dlocation}.
3. A distinct tuple in DEPT_LOCATIONS exists for each location of a department.
4. This decomposes the non-1NF relation into two 1NF relations.
Second Technique:
5. Expand the key so that there will be a separate tuple, in the original
DEPARTMENT relation for each location of a DEPARTMENT.
6. The primary key becomes the combination {Dnumber, Dlocation}.
7. Disadvantage: introducing redundancy in the relation.
Third technique:
8. If a maximum number of values is known for the attribute—for example, if it is
known that at most three locations can exist for a department—replace the
Dlocations attribute by three atomic attributes: Dlocation1, Dlocation2, and
Dlocation3.
9. Disadvantage: Introducing NULL values if most departments have fewer than
three locations.
The first solution is considered best because it does not suffer from redundancy and it is
completely general, having no limit placed on a maximum number of values.
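The first technique can be sketched in Python as follows. The function name to_1nf and the sample DEPARTMENT rows are hypothetical; the attribute names Dnumber, Dlocations, and Dlocation follow the notes.

```python
def to_1nf(department):
    """Split the multivalued Dlocations attribute into DEPT_LOCATIONS."""
    # DEPARTMENT keeps every attribute except the non-atomic one.
    dept = [
        {k: v for k, v in row.items() if k != "Dlocations"}
        for row in department
    ]
    # DEPT_LOCATIONS gets one tuple per (Dnumber, Dlocation) pair.
    dept_locations = [
        {"Dnumber": row["Dnumber"], "Dlocation": location}
        for row in department
        for location in row["Dlocations"]
    ]
    return dept, dept_locations


# Hypothetical non-1NF state: Dlocations holds a set of values per tuple.
DEPARTMENT = [
    {"Dname": "Research", "Dnumber": 5, "Dlocations": ["Bellaire", "Houston"]},
    {"Dname": "Headquarters", "Dnumber": 1, "Dlocations": ["Houston"]},
]

DEPT, DEPT_LOCATIONS = to_1nf(DEPARTMENT)
```

Neither result repeats Dname for each location, and a department may have any number of locations, which matches the reasons given for preferring this technique.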
10.3 Second Normal Form
1. It is based on the concept of full functional dependency.
2. A functional dependency X → Y is a full functional dependency if removal of any
attribute A from X means that the dependency does not hold any more.
3. A functional dependency X → Y is a partial dependency if some attribute A ∈ X can be
removed from X and the dependency still holds.
4. In the following figure, {Ssn, Pnumber} → Hours is a full dependency (neither Ssn →
Hours nor Pnumber → Hours holds).
5. However, the dependency {Ssn, Pnumber} → Ename is partial because Ssn → Ename
holds.
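The difference between a full and a partial dependency can be tested on a sample state by removing one attribute at a time from the left-hand side. The function names and the rows below are hypothetical; the attribute names follow the example in points 4 and 5. As with any check on a single state, the result illustrates the definition and does not prove a dependency for the schema.

```python
def holds(relation, X, Y):
    """True if tuples that agree on X also agree on Y in this state."""
    y_value_for = {}
    for t in relation:
        x_value = tuple(t[a] for a in X)
        y_value = tuple(t[a] for a in Y)
        if y_value_for.setdefault(x_value, y_value) != y_value:
            return False
    return True


def is_full_dependency(relation, X, Y):
    """X → Y is full if it holds and stops holding when any attribute leaves X."""
    if not holds(relation, X, Y):
        return False
    return all(
        not holds(relation, [a for a in X if a != removed], Y)
        for removed in X
    )


# Hypothetical state with the attributes used in the example.
EMP_PROJ = [
    {"Ssn": 1, "Pnumber": 10, "Hours": 5.0, "Ename": "Asha"},
    {"Ssn": 1, "Pnumber": 20, "Hours": 7.5, "Ename": "Asha"},
    {"Ssn": 2, "Pnumber": 10, "Hours": 3.0, "Ename": "Ravi"},
]

hours_is_full = is_full_dependency(EMP_PROJ, ["Ssn", "Pnumber"], ["Hours"])
ename_is_full = is_full_dependency(EMP_PROJ, ["Ssn", "Pnumber"], ["Ename"])
```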
Fig 10.10 The EMP relation with two MVDs: Ename →→ Pname and Ename →→ Dname
Decompose EMP into EMP_PROJECTS and EMP_DEPENDENTS, shown in figure.
Both EMP_PROJECTS and EMP_DEPENDENTS are in 4NF, because the MVDs
Ename →→ Pname in EMP_PROJECTS and Ename →→ Dname in
EMP_DEPENDENTS are trivial MVDs.
No other nontrivial MVDs hold in either EMP_PROJECTS or EMP_DEPENDENTS. No
FDs hold in these relation schemas either.
Fig 10.11 Decomposing the EMP relation into two 4NF relations EMP_PROJECTS
and EMP_DEPENDENTS
10.6 Join Dependencies
Let A, B, C, ... be subsets of the attributes of a relation R. Then R satisfies the join
dependency (JD), written *(A, B, C, ...), if and only if every possible legal value of R is
equal to the join of its projections on A, B, C, ...
10.7.1 Definition of 5NF:
A relation R is in 5NF (or project-join normal form, PJNF) if for all join dependencies of
the form *(R1, R2, ..., Rn), where each Ri is a subset of the set of attributes of R and R =
R1 ∪ R2 ∪ ... ∪ Rn, at least one of the following holds:
*(R1, R2, ..., Rn) is a trivial join dependency (i.e., one of the Ri is R).
Every Ri is a superkey for R.
Example:
Department    Subject   Student
Comp. Sc.     CP1000    John Smith
Mathematics   MA1000    John Smith
Comp. Sc.     CP2000    Arun Kumar
Comp. Sc.     CP3000    Reena Rani
Physics       PH1000    Raymond Chew
Chemistry     CH2000    Albert Garcia
1. The above relation says that Comp. Sc. offers subjects CP1000, CP2000 and CP3000 which are
taken by a variety of students.
2. No student takes all the subjects and no subject has all students enrolled in it and therefore all
three fields are needed to represent the information.
3. The above relation does not show MVDs since the attributes subject and student are not
independent; they are related to each other and the pairings have significant information in them.
4. The relation therefore cannot be decomposed into the two relations
(dept, subject) and (dept, student)
without losing some important information.
The relation can, however, be decomposed into the following three relations:
(dept, subject)
(dept, student)
(subject, student)
It can be shown that this decomposition is lossless.
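The claim can be checked for the sample rows with a small Python sketch. The helper names project, natural_join, and as_set are assumptions for illustration, and the check covers only this one relation state, whereas a join dependency is a constraint on every legal state.

```python
def project(relation, attrs):
    """Projection with duplicate elimination."""
    unique = {tuple(row[a] for a in attrs) for row in relation}
    return [dict(zip(attrs, values)) for values in unique]


def natural_join(left, right):
    """Combine rows that agree on all attributes the two relations share."""
    result = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in l.keys() & r.keys()):
                merged = {**l, **r}
                if merged not in result:
                    result.append(merged)
    return result


def as_set(relation):
    """Order-independent form of a relation, for comparison."""
    return {tuple(sorted(row.items())) for row in relation}


# The rows of the example relation above.
R = [
    {"dept": "Comp. Sc.", "subject": "CP1000", "student": "John Smith"},
    {"dept": "Mathematics", "subject": "MA1000", "student": "John Smith"},
    {"dept": "Comp. Sc.", "subject": "CP2000", "student": "Arun Kumar"},
    {"dept": "Comp. Sc.", "subject": "CP3000", "student": "Reena Rani"},
    {"dept": "Physics", "subject": "PH1000", "student": "Raymond Chew"},
    {"dept": "Chemistry", "subject": "CH2000", "student": "Albert Garcia"},
]

dept_subject = project(R, ["dept", "subject"])
dept_student = project(R, ["dept", "student"])
subject_student = project(R, ["subject", "student"])

# Joining only two projections pairs every Comp. Sc. subject with every
# Comp. Sc. student, which creates tuples that were never in R.
two_way = natural_join(dept_subject, dept_student)

# Joining the third projection filters those spurious tuples out again.
three_way = natural_join(two_way, subject_student)
```

For these rows the two-way join contains spurious tuples, while the three-way join returns exactly the original relation.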