DATA BASE MANAGEMENT SYSTEMS Unit 1
Introduction:
A database is a collection of related data. A database management system is software designed to assist in
the maintenance and utilization of large collections of data. DBMSs came into existence in the 1960s, when
Charles Bachman designed the Integrated Data Store (IDS), which is regarded as the first general-purpose DBMS.
In the late 1960s IBM brought out IMS (Information Management System). In 1970 Edgar Codd at IBM proposed
the relational model, the basis of relational DBMSs (RDBMS). In the 1980s SQL (Structured Query Language)
became the standard query language for relational systems, and from 1980 to 1990 there were major advances
in commercial DBMSs, e.g. DB2 and ORACLE.
A Database Management System (DBMS) is a collection of programs that enables users to create,
maintain and manipulate a database.
The DBMS is hence a general-purpose software system that facilitates the process of defining,
constructing and manipulating databases for various applications.
Characteristics of DBMS
ADVANTAGES OF DBMS
1. Controlling Redundancy- In traditional file processing, each user group maintains its own files, so the
same data is often stored multiple times. This redundancy leads to several problems.
First, a single logical update, such as entering data on a new student, must be performed multiple times:
once for each file where student data is recorded. This leads to duplication of effort.
Second, storage space is wasted when the same data is stored repeatedly, and this problem may be
serious for large databases.
Third, files that represent the same data may become inconsistent. This may happen because an update is
applied to some of the files but not to others. Techniques used are data normalization and, where
performance requires it, controlled denormalization.
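The sketch below illustrates controlled redundancy, using Python's built-in sqlite3 module as a stand-in DBMS. The table and column names are hypothetical. Student data is stored once, and enrollments refer to it by key, so a single update cannot leave copies inconsistent.

```python
import sqlite3

# Hypothetical normalized design: the student's name is stored once in STUDENT,
# and ENROLLMENT refers to the student by student_id instead of repeating the name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    CREATE TABLE enrollment (
        student_id INTEGER REFERENCES student(student_id),
        course     TEXT
    );
    INSERT INTO student VALUES (1, 'Asha');
    INSERT INTO enrollment VALUES (1, 'DBMS'), (1, 'Networks');
""")

# One update changes the name everywhere it is used.
conn.execute("UPDATE student SET name = 'Asha K' WHERE student_id = 1")

rows = conn.execute("""
    SELECT s.name, e.course
    FROM enrollment e JOIN student s ON s.student_id = e.student_id
    ORDER BY e.course
""").fetchall()
print(rows)  # [('Asha K', 'DBMS'), ('Asha K', 'Networks')]
```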
2. Restricting Unauthorized Access-When multiple users share a large database, it is likely that most users
will not be authorized to access all information in the database. Only authorized persons are allowed to
access such data. In addition, some users may only be permitted to retrieve data, whereas others are allowed
to retrieve and update. Hence, the type of access operation—retrieval or update—must also be controlled.
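A DBMS therefore provides a security and authorization subsystem that controls which users can access
which data and which operations they may perform.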
3. Techniques for Efficient Query Processing-Database systems must provide capabilities for efficiently
executing queries and updates. Because the database is typically stored on disk, the DBMS must provide
specialized data structures and search techniques to speed up disk search for the desired records. Techniques
used include indexing, buffering and caching, and query processing and optimization.
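As a minimal sketch of indexing, the example below creates an index in SQLite (via Python's sqlite3) and asks the query planner how it would run a lookup. The table, data and index name are hypothetical. The wording of `EXPLAIN QUERY PLAN` output differs between SQLite versions, so the code only checks that the index name appears in it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO student (name, city) VALUES (?, ?)",
    [(f"student{i}", "Pune" if i % 2 else "Delhi") for i in range(1000)],
)

# Without an index, a lookup on name must scan every row.
# With an index, the DBMS can search the index structure instead.
conn.execute("CREATE INDEX idx_student_name ON student(name)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM student WHERE name = ?", ("student42",)
).fetchall()
# The last column of each plan row is a human-readable description of the step.
uses_index = any("idx_student_name" in row[-1] for row in plan)
print(uses_index)
```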
4. Providing Backup and Recovery- A DBMS must provide facilities for recovering from hardware or
software failures. The backup and recovery subsystem of the DBMS is responsible for recovery.
5. Providing Multiple User Interfaces-Because many types of users with varying levels of technical
knowledge use a database, a DBMS should provide a variety of user interfaces. These include apps for
mobile users, query languages for casual users, programming language interfaces for application
programmers, etc.
6. Representing Complex Relationships among Data-A database may include numerous varieties of data
that are interrelated in many ways. A DBMS must have the capability to represent a variety of complex
relationships among the data, to define new relationships as they arise, and to retrieve and update related
data easily and efficiently.
7. Enforcing Integrity Constraints- Most database applications have certain integrity constraints that must
hold for the data. A DBMS should provide capabilities for defining and enforcing these constraints. The
simplest type of integrity constraint involves specifying a data type for each data item.
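A small sketch of declaring constraints so that the DBMS enforces them, again with SQLite through Python's sqlite3. SQLite is dynamically typed, so this example assumes a `CHECK (typeof(...))` clause as the way to enforce the data type; other DBMSs enforce declared column types directly. The EMPLOYEE table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,                              -- value required
        age    INTEGER CHECK (typeof(age) = 'integer')     -- data type enforced
    )
""")

conn.execute("INSERT INTO employee VALUES (1, 'Ravi', 30)")   # satisfies every constraint

rejected = []
for row in [(2, None, 25), (3, 'Meena', 'thirty')]:
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError as exc:
        # The DBMS refuses the row and reports which constraint failed.
        rejected.append((row[0], str(exc)))

print(len(rejected))  # 2
```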
o Naive or parametric end users: for example, reservation clerks for airlines, hotels, etc., who check
availability of seats/rooms and make reservations.
o Sophisticated end users: engineers, scientists, business analysts who implement their own
applications to meet their complex needs.
o Stand-alone users: Use "personal" databases, possibly employing a special-purpose (e.g.,
financial) software package. Mostly maintain personal databases using ready-to-use packaged
applications.
o An example is a tax program user that creates its own internal database.
o System Analysts: determine needs of end users, especially naive and parametric users, and
develop specifications for canned transactions that meet these needs.
o Application Programmers: Implement, test, document, and maintain programs that
satisfy the specifications mentioned above.
Data models
1. Hierarchical Model
The hierarchical data model is one of the oldest data models, developed by IBM in the 1960s. In this data model,
the data is organized in a hierarchical tree-like structure. This data model can be easily visualized because each
record has exactly one parent (except the root) and can have many children.
Advantages:
1. The representation of records is done using an ordered tree, which is a natural method of
implementing one-to-many relationships.
2. Proper ordering of the tree results in easier and faster retrieval of records.
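A minimal sketch of hierarchical data in plain Python, assuming hypothetical record names: every record except the root has exactly one parent, and a fixed parent-before-children (preorder) traversal gives an ordered way to retrieve the records.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    name: str
    children: list["Record"] = field(default_factory=list)

    def add(self, child: "Record") -> "Record":
        # A child is attached to exactly one parent.
        self.children.append(child)
        return child

def preorder(node: Record) -> list[str]:
    """Parent-before-children traversal of the tree."""
    names = [node.name]
    for child in node.children:
        names.extend(preorder(child))
    return names

college = Record("College")
cse = college.add(Record("Dept:CSE"))
cse.add(Record("Student:Asha"))
cse.add(Record("Student:Ravi"))
college.add(Record("Dept:ECE")).add(Record("Student:Meena"))

print(preorder(college))
# ['College', 'Dept:CSE', 'Student:Asha', 'Student:Ravi', 'Dept:ECE', 'Student:Meena']
```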
Network Model
A network model is a generalization of the hierarchical data model: it allows many-to-many relationships,
so a record can have more than one parent.
The network model can be represented as a graph, replacing the hierarchical tree with a graph
in which object types are the nodes and relationships are the edges.
Advantages-
Data access and Data manipulation can be done easily with this model.
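The difference from the hierarchical model can be sketched in plain Python: records are nodes, relationships are edges, and a record may be reached from more than one parent. The course and student names are hypothetical.

```python
from collections import defaultdict

# Each edge links an owner record to a member record.
edges = [
    ("Course:DBMS", "Student:Asha"),
    ("Course:DBMS", "Student:Ravi"),
    ("Course:OS",   "Student:Asha"),   # Asha now has two parents
]

children = defaultdict(list)
parents = defaultdict(list)
for owner, member in edges:
    children[owner].append(member)
    parents[member].append(owner)

print(children["Course:DBMS"])   # ['Student:Asha', 'Student:Ravi']
print(parents["Student:Asha"])   # ['Course:DBMS', 'Course:OS']
```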
Relational Model
This is the most widely accepted data model. In this model, the database is represented as a collection of
relations in the form of rows and columns of a two-dimensional table. Each row is known as a tuple (a tuple
contains all the data for an individual record) while each column represents an attribute.
Advantages
1. The main advantage of this model is its ability to represent data in a simplified format.
2. The process of manipulating records is simplified with the use of certain key attributes used to
retrieve data.
3. Representation of different types of relationship is possible with this model.
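A relation can be sketched in Python as a set of tuples over named attributes; a set also rules out duplicate rows. The STUDENT attributes and rows are hypothetical, and `select` is a hypothetical helper showing retrieval by a key attribute or by any other attribute.

```python
# Each tuple is one row; each position corresponds to one attribute (column).
attributes = ("roll_no", "name", "branch")
student = {
    (1, "Asha", "CSE"),
    (2, "Ravi", "ECE"),
    (3, "Meena", "CSE"),
}

def select(relation, attrs, **conditions):
    """Return the tuples whose named attributes equal the given values."""
    idx = {a: i for i, a in enumerate(attrs)}
    return sorted(t for t in relation
                  if all(t[idx[a]] == v for a, v in conditions.items()))

print(select(student, attributes, roll_no=2))      # [(2, 'Ravi', 'ECE')]
print(select(student, attributes, branch="CSE"))   # [(1, 'Asha', 'CSE'), (3, 'Meena', 'CSE')]
```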
An Entity-Relationship model is a high-level data model that describes the structure of the database in a pictorial
form which is known as ER-diagram.
Entity
Entity Set
Attributes
Relationships
The data in the database at a particular moment in time is called a database state or snapshot. It is also
called the current set of occurrences or instances in the database.
Initial database state-refers to the database state when it is initially loaded into the system
Valid state- A state that satisfies the structure and constraints of the database.
The schema is sometimes called the intension, and a database state is called an extension of the schema.
The schema does not change frequently, but the database state changes every time the database is updated.
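The distinction between schema and state can be seen in a short SQLite session through Python's sqlite3 (a stand-in DBMS; the table is hypothetical). The stored table definition stays the same while the set of rows changes with each update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")

def schema():
    # Schema (intension): the table definition kept in SQLite's catalog.
    return conn.execute("SELECT sql FROM sqlite_master WHERE name = 'student'").fetchone()[0]

def state():
    # Database state (extension): the rows present at this moment.
    return conn.execute("SELECT * FROM student ORDER BY roll_no").fetchall()

schema_before, state_before = schema(), state()   # state is empty right after creation
conn.execute("INSERT INTO student VALUES (1, 'Asha')")
conn.execute("INSERT INTO student VALUES (2, 'Ravi')")

print(schema() == schema_before)   # True: the schema did not change
print(state_before, state())       # [] [(1, 'Asha'), (2, 'Ravi')]
```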
1. Internal level: The internal level has an internal schema, which describes the physical storage structure
of the database. It is also called the physical level.
This is the lowest level of database abstraction. It describes how the data is actually stored in the database
and provides methods to access data from the database.
2. Conceptual level: The conceptual level has a conceptual schema, which describes the structure of the
whole database for a community of users.
The conceptual schema hides the details of physical storage structures and concentrates on describing
entities, data types, relationships, user operations, and constraints
3. External or View level: This is the highest level of database abstraction. External or view level describes the
actual view of data that is relevant to the particular user.
This level also provides different views of the same database for a specific user or a group of users. An
external view provides a powerful and flexible security mechanism by hiding the parts of the database from
a particular user.
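An external view can be sketched as an SQL view that exposes only part of a table. The example below uses SQLite via Python's sqlite3, with a hypothetical EMPLOYEE table. SQLite has no GRANT command, so the security part is an assumption: in a multi-user DBMS, the restricted users would be granted access to the view but not to the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employee VALUES (1, 'Asha', 'CSE', 50000), (2, 'Ravi', 'HR', 42000);
    -- External view for a hypothetical group of users who must not see salaries.
    CREATE VIEW employee_directory AS SELECT emp_id, name, dept FROM employee;
""")

cursor = conn.execute("SELECT * FROM employee_directory ORDER BY emp_id")
cols = [d[0] for d in cursor.description]   # column names visible through the view
rows = cursor.fetchall()
print(cols)   # ['emp_id', 'name', 'dept']
print(rows)   # [(1, 'Asha', 'CSE'), (2, 'Ravi', 'HR')]
```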
Data Independence
Data independence can be defined as the capacity to change the schema at one level without changing
the schema at next higher level.
It also means the internal structure of database should be unaffected by changes to physical aspects of
storage. Because of data independence, the Database administrator can change the database storage
structures without affecting the users view.
There are two types of data independence:
1. Physical data independence is the capacity to change the internal schema without having to
change the conceptual (logical) schema.
2. Logical data independence is the capacity to change the conceptual schema without having
to change the external schemas (user views) or application programs.
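Both kinds of data independence can be illustrated in SQLite through Python's sqlite3 (a sketch; table, view and index names are hypothetical). Adding an index changes only the physical storage, and adding a column changes the conceptual schema, yet the user's query against the view keeps working and returns the same result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT);
    INSERT INTO employee VALUES (1, 'Asha', 'CSE'), (2, 'Ravi', 'HR');
    CREATE VIEW cse_staff AS SELECT name FROM employee WHERE dept = 'CSE';
""")

query = "SELECT name FROM cse_staff"   # what an external user runs
before = conn.execute(query).fetchall()

# Internal-level change (physical independence): add a new access path.
conn.execute("CREATE INDEX idx_emp_dept ON employee(dept)")
# Conceptual-level change (logical independence): add a new attribute.
conn.execute("ALTER TABLE employee ADD COLUMN phone TEXT")

after = conn.execute(query).fetchall()
print(before == after, after)   # True [('Asha',)]
```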
DATABASE LANGUAGES
Data Definition Language
o DDL stands for Data Definition Language. It is used to define database structure or pattern.
o It is used to create schema, tables, indexes, constraints, etc. in the database.
o Using the DDL statements, you can create the skeleton of the database.
o Data definition language is used to store the information of metadata like the number of tables and
schemas, their names, indexes, columns in each table, constraints, etc.
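A short DDL sketch in SQLite via Python's sqlite3 (hypothetical DEPARTMENT table). It defines a table, constraints and an index, then reads back the metadata that the DBMS stored about them; `sqlite_master` and `PRAGMA table_info` are SQLite's own ways of exposing that catalog.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DDL: create the skeleton of the database (table, constraints, index).
conn.executescript("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL UNIQUE
    );
    CREATE INDEX idx_department_name ON department(name);
""")

# Metadata recorded by the DBMS: which objects exist, and which columns a table has.
objects = conn.execute("SELECT type, name FROM sqlite_master ORDER BY type, name").fetchall()
columns = [row[1] for row in conn.execute("PRAGMA table_info(department)")]
print(objects)   # also lists an automatic index SQLite creates for the UNIQUE constraint
print(columns)   # ['dept_id', 'name']
```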
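DML stands for Data Manipulation Language. It is used for accessing and manipulating data in a database. It
handles user requests.
Here are some tasks that come under DML:
o SELECT: retrieve data from a table.
o INSERT: insert data into a table.
o UPDATE: update existing data in a table.
o DELETE: delete records from a table.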
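The four DML tasks can be sketched against a hypothetical STUDENT table in SQLite (via Python's sqlite3). INSERT, UPDATE and DELETE change the data; SELECT retrieves it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, marks INTEGER)")

conn.executemany("INSERT INTO student VALUES (?, ?, ?)",            # INSERT
                 [(1, "Asha", 78), (2, "Ravi", 64), (3, "Meena", 91)])
conn.execute("UPDATE student SET marks = marks + 5 WHERE roll_no = 2")  # UPDATE
conn.execute("DELETE FROM student WHERE roll_no = 3")                   # DELETE

rows = conn.execute(                                                     # SELECT
    "SELECT roll_no, name, marks FROM student ORDER BY roll_no"
).fetchall()
print(rows)   # [(1, 'Asha', 78), (2, 'Ravi', 69)]
```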
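TCL (Transaction Control Language) is used to manage the changes made by DML statements. DML statements can be
grouped into a logical transaction, which is either made permanent with COMMIT or undone with ROLLBACK.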
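A sketch of grouping two DML statements into one transaction, using a hypothetical ACCOUNT table in SQLite. With its default settings, Python's sqlite3 opens a transaction before the first DML statement, and `commit()` and `rollback()` play the roles of COMMIT and ROLLBACK. The `transfer` function and its overdraft rule are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(101, 500), (102, 300)])
conn.commit()   # COMMIT: make the initial rows permanent

def transfer(amount, src, dst):
    """Debit and credit as one logical transaction: both happen or neither does."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE acc_no = ?", (amount, src))
        (bal,) = conn.execute("SELECT balance FROM account WHERE acc_no = ?", (src,)).fetchone()
        if bal < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE account SET balance = balance + ? WHERE acc_no = ?", (amount, dst))
        conn.commit()     # COMMIT both updates together
    except ValueError:
        conn.rollback()   # ROLLBACK: undo the partial debit

transfer(200, 101, 102)   # succeeds
transfer(900, 101, 102)   # fails and is rolled back
print(conn.execute("SELECT * FROM account ORDER BY acc_no").fetchall())
# [(101, 300), (102, 500)]
```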
DBMS Interfaces
Menu-based Interfaces for Web Clients or Browsing-These interfaces present the user with lists of
options (called menus) that lead the user through the formulation of a request.
Menus do away with the need for the user to memorize the specific commands and syntax of a query language;
rather, the query is composed step by step by picking options from a menu that is displayed by the system.
Pull-down menus are a very popular technique in Web-based user interfaces.
Apps for Mobile Devices- These interfaces present mobile users with access to their data. For example,
banking, reservations, and insurance companies, among many others, provide apps that allow users to
access their data through a mobile phone or mobile device.
The apps have built-in programmed interfaces that typically allow users to log in using their account name
and password; the apps then provide a limited menu of options for mobile access to the user data.
Forms-based Interfaces- A forms-based interface displays a form to each user. Users can fill out all of the
form entries to insert new data, or they can fill out only certain entries, in which case the DBMS will
retrieve matching data for the remaining entries. Forms are usually designed and programmed for naive
users as interfaces to canned transactions. Many DBMSs have forms specification languages, which are
special languages that help programmers to implement these interfaces.
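The forms behaviour described above (filled entries restrict the search, blank entries are retrieved) can be sketched as a hypothetical form handler over SQLite. `submit_form` and the STUDENT fields are illustrative assumptions, not a real forms specification language. Field names come only from a fixed list, and values are passed as query parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, branch TEXT, city TEXT);
    INSERT INTO student VALUES
        (1, 'Asha', 'CSE', 'Pune'), (2, 'Ravi', 'ECE', 'Pune'), (3, 'Meena', 'CSE', 'Delhi');
""")

FORM_FIELDS = ("roll_no", "name", "branch", "city")   # fixed, trusted field names

def submit_form(form):
    """Filled entries become conditions; matching rows supply the blank entries."""
    filled = {f: v for f, v in form.items() if f in FORM_FIELDS and v not in (None, "")}
    where = " AND ".join(f"{f} = ?" for f in filled) or "1"   # no entries: match all rows
    sql = f"SELECT roll_no, name, branch, city FROM student WHERE {where} ORDER BY roll_no"
    return conn.execute(sql, tuple(filled.values())).fetchall()

print(submit_form({"branch": "CSE", "city": "Pune", "name": ""}))
# [(1, 'Asha', 'CSE', 'Pune')]
```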
Graphical User Interfaces-A GUI typically displays a schema to the user in diagrammatic form. The user
then can specify a query by manipulating the diagram. In many cases, GUIs utilize both menus and forms.
Natural Language Interfaces- These interfaces accept requests written in English or some other language
and attempt to understand them. A natural language interface usually has its own schema and dictionary of
important words. The natural language interface refers to the words in its schema, as well as to the set of
standard words in its dictionary, that are used to interpret the request. If the interpretation is successful, the
interface generates a high-level query corresponding to the natural language request and submits it to the
DBMS for processing; otherwise, a dialogue is started with the user to clarify the request.
Keyword-based Database Search. Keywords, also commonly called search terms, are the words that you
enter into the database search boxes. They represent the main concepts of your research topic and are the
words used in everyday life to describe the topic. Without the right keywords, you may have difficulty
finding the articles that you need.
Speech Input and Output. Limited use of speech as an input query and speech as an answer to a
question or result of a request is becoming commonplace. Applications with limited vocabularies, such as
inquiries for telephone directory, flight arrival/departure, and credit card account information, are
allowing speech for input and output to enable customers to access this information. The speech input is
detected using a library of predefined words and used to set up the parameters that are supplied to the
queries. For output, a similar conversion from text or numbers into speech takes place.
Interfaces for Parametric Users. Parametric users, such as bank tellers, often have a small set of
operations that they must perform repeatedly. For example, a teller is able to use single function keys to
invoke routine and repetitive transactions such as account deposits or withdrawals, or balance inquiries.
Systems analysts and programmers design and implement a special interface for each known class of naive
users.
Interfaces for the DBA. Most database systems contain privileged commands that can be used only by the
DBA staff. These include commands for creating accounts, setting system parameters, granting account
authorization, changing a schema, and reorganizing the storage structures of a database.
The Database System Environment
In addition to possessing the software modules just described, most DBMSs have database utilities that help
the DBA manage the database system. Common utilities have the following types of functions:
■ Loading. A loading utility is used to load existing data files—such as text files or sequential files—into the
database. Usually, the current (source) format of the data file and the desired (target) database file structure are
specified to the utility, which then automatically reformats the data and stores it in the database. With the
proliferation of DBMSs, transferring data from one DBMS to another is becoming common in many
organizations. Some vendors offer conversion tools that generate the appropriate loading programs, given the
existing source and target database storage descriptions (internal schemas).
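A minimal loading sketch: records in a source text format (CSV here, held in an in-memory string instead of a real file) are reformatted into the target table structure and stored. The file contents and STUDENT table are hypothetical; SQLite through Python's sqlite3 stands in for the DBMS.

```python
import csv
import io
import sqlite3

# Hypothetical source data file in CSV format.
source = io.StringIO("roll_no,name,marks\n1,Asha,78\n2,Ravi,64\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, marks INTEGER)")

# Reformat each source record (all text) into the target types, then store it.
reader = csv.DictReader(source)
records = [(int(r["roll_no"]), r["name"], int(r["marks"])) for r in reader]
with conn:   # commits the whole load as one transaction
    conn.executemany("INSERT INTO student VALUES (?, ?, ?)", records)

print(conn.execute("SELECT * FROM student ORDER BY roll_no").fetchall())
# [(1, 'Asha', 78), (2, 'Ravi', 64)]
```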
■ Backup. A backup utility creates a backup copy of the database, usually by dumping the entire database onto
tape or other mass storage medium. The backup copy can be used to restore the database in case of catastrophic
disk failure. Incremental backups are also often used, where only changes since the previous backup are
recorded. Incremental backup is more complex, but saves storage space.
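A full backup and restore can be sketched with `Connection.backup` from Python's sqlite3 module, which copies an entire SQLite database into another connection. Both databases are in memory here only to keep the example self-contained; a real backup would target a separate file or storage medium. The ACCOUNT table is hypothetical, and incremental backup is not shown.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE account (acc_no INTEGER PRIMARY KEY, balance INTEGER);
    INSERT INTO account VALUES (101, 500), (102, 300);
""")

# Full backup: copy the entire database into a second database.
backup_copy = sqlite3.connect(":memory:")
source.backup(backup_copy)

# Simulate losing the data, then restore it from the backup copy.
source.execute("DROP TABLE account")
backup_copy.backup(source)

print(source.execute("SELECT * FROM account ORDER BY acc_no").fetchall())
# [(101, 500), (102, 300)]
```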
■ Database storage reorganization. This utility can be used to reorganize a set of database files into different
file organizations and create new access paths to improve performance.
■ Performance monitoring. Such a utility monitors database usage and provides statistics to the DBA. The
DBA uses the statistics in making decisions such as whether or not to reorganize files or whether to add or drop
indexes to improve performance. Other utilities may be available for sorting files, handling data compression,
monitoring access by users, interfacing with the network, and performing other functions.
DBMS CLASSIFICATION
A DBMS can be classified by the number of users supported by the system:
Single-user systems support only one user at a time and are mostly used with PCs.
Multiuser systems, which include the majority of DBMSs, support concurrent multiple users.
A DBMS is centralized if the data is stored at a single computer site. A centralized DBMS can support
multiple users, but the DBMS and the database reside totally at a single computer site.
A distributed DBMS (DDBMS) can have the actual database and DBMS software distributed over
many sites connected by a computer network. Big data systems are often massively distributed, with
hundreds of sites. The data is often replicated on multiple sites so that failure of a site will not make
some data unavailable.
Homogeneous DDBMSs use the same DBMS software at all the sites, whereas heterogeneous DDBMSs can use
different DBMS software at each site.
Centralized | Distributed
Database is maintained at one site. | Database is maintained at a number of different sites.
If the centralized system fails, the entire system is halted. | If one site fails, the system continues to work with the other sites.
Basis | File System | DBMS
Structure | The file system is a way of arranging the files in a storage medium within a computer. | DBMS is software for managing the database.
Data Redundancy | Redundant data can be present in a file system. | A DBMS controls redundancy, so redundant data is minimized.
Backup and Recovery | It doesn't provide an inbuilt mechanism for backup and recovery of data if it is lost. | It provides in-house tools for backup and recovery of data even if it is lost.
Query processing | There is no efficient query processing in the file system. | A DBMS provides efficient query processing.
Security Constraints | File systems provide less security in comparison to DBMS. | DBMS has more security mechanisms as compared to file systems.
User Access | Only one user can access data at a time. | Multiple users can access data at a time.
Responsibilities of the DBA and Database Designer
It is the responsibility of the DBA to tune the database performance.
1) Installing and upgrading the DBMS Servers: - The DBA is responsible for installing a new DBMS server for
new projects. The DBA is also responsible for upgrading these servers as new versions are released or
requirements change.
2) Performance Tuning:- Since a database is huge and has many tables, data, constraints and indices, its
performance will vary from time to time, so the DBA must monitor and tune it.
3) Backup & Recovery: - Proper backup and recovery programs need to be developed and maintained by the DBA.
This is one of the main responsibilities of the DBA. Data should be backed up regularly so that if there is
any crash, it can be recovered without much effort and data loss.
4) Documentation:- The DBA should document all installation, backup, recovery and security methods, and
should keep various reports about database performance.
5) Security:-DBA is responsible for creating various database users and roles, and giving them different levels of
access rights.
Chapter 2 Entity-Relationship Model
Entity-Relationship Diagram
An E-R diagram is a graphical representation of the entity-relationship model. It is also called an ERD.
Entity
An entity is an object that exists and which is distinguishable from other objects.
Entity Types and Entity Sets. Entity set- A database usually contains groups of entities that are similar. For
example, a company employing hundreds of employees may want to store similar information concerning each
of the employees. These employee entities share the same attributes, but each entity has its own value(s) for
each attribute.
An entity type defines a collection (or set) of entities that have the same attributes. Each entity type in the
database is described by its name and attributes. Figure shows two entity types: EMPLOYEE and COMPANY
and a list of some of the attributes for each. A few individual entities of each type are also illustrated, along with
the values of their attributes. The collection of all entities of a particular entity type in the database at any point in
time is called an entity set or entity collection.
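The difference between an entity type and an entity set can be sketched in Python: the class plays the role of the entity type (a name plus attributes), and a set of its instances is the entity set at one point in time. The attributes and values are hypothetical.

```python
from dataclasses import dataclass

# Entity type: a name plus the attributes shared by all its entities.
@dataclass(frozen=True)
class Employee:
    name: str
    age: int
    salary: int

# Entity set: the entities of that type currently in the database.
employee_set = {
    Employee("John Smith", 55, 80000),
    Employee("Fred Brown", 40, 30000),
}

# Same attributes, but each entity has its own values.
print(sorted(e.name for e in employee_set))   # ['Fred Brown', 'John Smith']

# The entity set changes over time; the entity type does not.
employee_set.add(Employee("Judy Clark", 25, 20000))
print(len(employee_set))   # 3
```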
Attributes
An attribute is a property that describes an entity. All attributes have values. For example, a
student entity may have name, class, age as attributes.
There exists a domain or range of values that can be assigned to attributes. For example, a
student's name cannot be a numeric value. It has to be alphabetic. A student's age cannot be
negative, etc.
Types of attributes:
o Simple attribute: Simple attributes are atomic values, which cannot be divided
further. For example, student's phone-number is an atomic value of 10 digits.
o Composite attribute: Composite attributes are made of more than one simple attribute. For
example, a student's name may have First name and Last name.
o Derived attribute: Derived attributes are attributes which do not exist physically in the database,
but their values are derived from other attributes present in the database. For example, Age
can be derived from DOB.
o Stored attribute: An attribute whose value cannot be derived from the values of other
attributes is called a stored attribute. For example, DOB.
o Single-valued attribute: Single-valued attributes contain a single value. For example:
SocialSecurityNumber.
o Multi-value attribute: Multi-value attribute may contain more than one value. For example, a
person can have more than one phone numbers, Email ID etc.
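The attribute types above can be sketched as a Python data class. The STUDENT fields are hypothetical. `age` is computed from `dob` rather than stored, and it takes the reference date as a parameter so the result does not depend on when the code runs.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Name:                  # composite attribute: made of simple attributes
    first: str
    last: str

@dataclass
class Student:
    roll_no: int             # simple, single-valued
    name: Name               # composite
    dob: date                # stored attribute
    phones: list[str] = field(default_factory=list)   # multi-valued

    def age(self, today: date) -> int:
        """Derived attribute: computed from DOB, never stored."""
        had_birthday = (today.month, today.day) >= (self.dob.month, self.dob.day)
        return today.year - self.dob.year - (0 if had_birthday else 1)

s = Student(1, Name("Asha", "Kulkarni"), date(2004, 8, 15), ["9876543210", "9123456780"])
print(s.age(date(2024, 8, 14)))   # 19 (birthday not yet reached)
print(s.age(date(2024, 8, 15)))   # 20
```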
Keys
o Keys play an important role in the relational database. A key is an attribute or set of attributes used to
uniquely identify any record or row of data (tuple) in the table. Keys are also used to establish and identify
relationships between tables.
For example, ID is used as a key in the Student table because it is unique for each student. In the PERSON
table, passport_number, license_number, SSN are keys since they are unique for each person.
In the above table, Emp-id, Aadhar-no and email_id are keys, and name cannot be a key.
Types of keys:
1. Super Key
A super key is any set of attributes that can uniquely identify a tuple. A super key is a
superset of a candidate key (every candidate key is also a super key). A table can have multiple super keys.
2. Candidate key
o A candidate key is an attribute or set of attributes that can uniquely identify a tuple.
o Candidate key is a minimal super key: no proper subset of it can uniquely identify a tuple. Candidate key
cannot have NULL values.
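The definitions of super key and candidate key can be checked mechanically against one state of a hypothetical EMPLOYEE relation. A key is really a constraint on every valid state, so data like this can only rule out a key (here it shows that name is not one). The result is the set of candidates consistent with these rows, not proof of the real keys.

```python
from itertools import combinations

# Hypothetical EMPLOYEE relation state.
attrs = ("emp_id", "aadhar_no", "email_id", "name")
rows = [
    (1, "1111", "asha@x.com", "Asha"),
    (2, "2222", "ravi@x.com", "Ravi"),
    (3, "3333", "asha.k@x.com", "Asha"),
]

def is_superkey(cols):
    """True if no two rows agree on all attributes in cols."""
    idx = [attrs.index(c) for c in cols]
    projected = [tuple(r[i] for i in idx) for r in rows]
    return len(set(projected)) == len(projected)

def candidate_keys():
    """Minimal super keys: no proper subset is itself a super key."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for cols in combinations(attrs, size):
            if is_superkey(cols) and not any(set(k) <= set(cols) for k in keys):
                keys.append(cols)
    return keys

print(is_superkey(("emp_id", "name")))   # True, but not minimal
print(candidate_keys())                  # [('emp_id',), ('aadhar_no',), ('email_id',)]
```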
3. Primary key
o Primary key is one of the candidate keys chosen by the database designer to uniquely identify the tuple.
In the EMPLOYEE table, ID can be the primary key since it is unique for each employee. In the
EMPLOYEE table, we could instead select License_Number or Passport_Number as the primary key since
they are also unique.
o For each entity, the primary key selection is based on requirements and developers.
o The value of the primary key cannot be NULL. The value of primary key is always unique and cannot
have duplication.
o The value of the primary key should not be changed. There can be only one primary key in a table.
4. Alternate key
Out of all the candidate keys, only one is selected as the primary key, and the remaining candidate keys are
called alternate keys.
From the candidate keys, Emp-id is the primary key, and Aadhar-no and email_id are alternate keys.
For example, employee relation has two attributes, Employee_Id and PAN_No, that act as candidate keys. In
this relation, Employee_Id is chosen as the primary key, so the other candidate key, PAN_No, acts as the
Alternate key.
5. Foreign key
o A foreign key is a column (or set of columns) of a table used to point to the primary key of another table.
o It is used to link two tables together. For example, we add the primary key of the DEPARTMENT table,
Department_Id, as a new attribute in the EMPLOYEE table.
o In the EMPLOYEE table, Department_Id is the foreign key, and both the tables are related.
o A foreign key may have NULL values, and a foreign key need not be unique.
o The referenced table may be called the master table or primary table, and the referencing table is called the
foreign table.
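The rules for primary and foreign keys can be seen in SQLite through Python's sqlite3 (hypothetical DEPARTMENT and EMPLOYEE tables). SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`. A duplicate primary key and a reference to a missing department are rejected, while a NULL or repeated foreign key value is allowed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE department (
        department_id INTEGER PRIMARY KEY,
        name          TEXT NOT NULL
    );
    CREATE TABLE employee (
        emp_id        INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        department_id INTEGER REFERENCES department(department_id)  -- foreign key
    );
    INSERT INTO department VALUES (10, 'Research');
    -- Foreign key values may repeat (Asha, Ravi) and may be NULL (Meena).
    INSERT INTO employee VALUES (1, 'Asha', 10), (2, 'Ravi', 10), (3, 'Meena', NULL);
""")

def try_insert(sql):
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

print(try_insert("INSERT INTO employee VALUES (1, 'Dup', 10)"))    # duplicate primary key
print(try_insert("INSERT INTO employee VALUES (4, 'Kiran', 99)"))  # department 99 does not exist
```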
6. Composite key
Whenever a primary key consists of more than one attribute, it is known as a composite key. This key is also
known as Concatenated Key.
For example, in employee relations, we assume that an employee may be assigned multiple roles, and an
employee may work on multiple projects simultaneously. So the primary key will be composed of all three
attributes, namely Emp_ID, Emp_role, and Proj_ID in combination. So these attributes act as a composite key
since the primary key comprises more than one attribute.
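The composite key from this example can be declared in SQLite with a table-level PRIMARY KEY clause (sketch via Python's sqlite3; the table name is hypothetical). Rows may repeat any one or two of the attributes, but not all three together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE emp_assignment (
        emp_id   INTEGER,
        emp_role TEXT,
        proj_id  INTEGER,
        PRIMARY KEY (emp_id, emp_role, proj_id)   -- composite key
    )
""")
conn.executemany("INSERT INTO emp_assignment VALUES (?, ?, ?)", [
    (1, "Developer", 100),
    (1, "Tester",    100),   # same employee and project, different role
    (1, "Developer", 200),   # same employee and role, different project
])

try:
    conn.execute("INSERT INTO emp_assignment VALUES (1, 'Developer', 100)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True   # the full combination already exists

print(duplicate_rejected)   # True
```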
Value Sets (Domains) of Attributes. Each simple attribute of an entity type is associated with a value set (or
domain of values), which specifies the set of values that may be assigned to that attribute for each individual
entity.
Say, for example, in an employee table the range of ages allowed for employees is between 20 and 60; we can then
specify the value set of the Age attribute of EMPLOYEE to be the set of integer numbers between 20 and 60.
Similarly, we can specify the value set for the Name attribute to be the set of strings of alphabetic characters
separated by blank characters, and so on.
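Mathematically, an attribute A of entity set E whose value set is V can be defined as a function from E to the
power set P(V) of V:
A : E → P(V)
The value of attribute A for entity e is written A(e). The power set is used so that multivalued attributes are
covered: for a single-valued attribute, A(e) is a set with exactly one element.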
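The value sets described above can be enforced in SQLite with CHECK constraints (sketch via Python's sqlite3; the EMPLOYEE table is hypothetical). Age is restricted to integers between 20 and 60, and Name is restricted to letters and blanks with a GLOB pattern that rejects any other character.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL CHECK (name NOT GLOB '*[^A-Za-z ]*'),  -- letters and blanks only
        age    INTEGER CHECK (age BETWEEN 20 AND 60)                -- value set of Age
    )
""")

def accepts(name, age):
    try:
        conn.execute("INSERT INTO employee (name, age) VALUES (?, ?)", (name, age))
        return True
    except sqlite3.IntegrityError:
        return False

print(accepts("John Smith", 35))   # True
print(accepts("John Smith", 65))   # False: age outside 20..60
print(accepts("R2D2", 30))         # False: digits are not in the Name value set
```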
ER DIAGRAM NOTATIONS
Relationship
Any association between two entity types is called a relationship. Entities take part in the relationship. It is
represented by a diamond shape.
For example, a teacher teaches students. Here, "teaches" is a relationship, and this is the relationship between a
Teacher entity and a Student entity.
Degree of Relationship
In DBMS, a degree of relationship represents the number of entity types that associate in a relationship.
For example, we have two entities, one is a student and the other is a teacher, and they are connected with a
primary key and a foreign key. So here the degree of relationship is 2, as 2 entities are
associating in a relationship.
Types of degree
Now, based on the number of linked entity types, we have 4 types of degrees of relationships.
1. Unary
2. Binary
3. Ternary
4. N-ary
Unary (degree 1)- In this type of relationship, both associating entity types are the same, so the degree of
relationship is 1. In other words, when only one entity set participates in a relationship, it is known as a
unary relationship.
Example: In a particular class, we have many students, there are monitors too. So, here class monitors
are also students. Thus, we can say that only students are participating here. So the degree of such type
of relationship is 1.
Binary (degree 2)- In a binary relationship, two entity types are associated, so the degree of relationship is 2.
In other words, when two entity sets participate in a relationship, it is known as a binary relationship. This
is the most commonly used relationship, and it can easily be converted into a relational table.
Example: We have two entity types ‘Student’ and ‘ID’ where each ‘Student’ has his ‘ID’. Here two
entity types are associating, so it is a binary relationship. Similarly, one ‘Father’ can have many
‘daughters’ but each ‘daughter’ belongs to only one ‘Father’. This is a one-to-many
binary relationship.
Ternary (degree 3)- In a ternary relationship, three entity types are associated, so the degree of
relationship is 3. As the number of participating entities increases, it becomes more complex to convert the
E-R diagram into relational tables. Now let's understand with an example.
Example: We have three entity types ‘Teacher’, ‘Course’, and ‘Class’. The relationship between these
entities is defined as the teacher teaching a particular course, also the teacher teaches a particular class. So,
here three entity types are associating we can say it is a ternary relationship.
N-ary (n degree)- In an N-ary relationship, n entity types are associated. Its limitation is that, with so many
participating entities, it is very hard to convert into relational tables. So it is very uncommon, unlike
binary relationships, which are very popular.
Example: We have 5 entities Teacher, Class, Location, Salary, Course. Here five entity types are
associating, so this is an n-ary relationship of degree 5.
For example, consider a relationship type WORKS_FOR between the two entity types EMPLOYEE and DEPARTMENT,
which associates each employee with the department for which the employee works. Each relationship instance
in the relationship set WORKS_FOR associates one EMPLOYEE entity and one DEPARTMENT entity.
Structural constraints
There are two main types
1. Cardinality ratio
2. Participation
Mapping Cardinalities:
Cardinality defines the number of entities in one entity set which can be associated to the
number of entities of other set via relationship set.
Types of Cardinality Ratios-
There are 4 types of cardinality ratios-
1. Many-to-Many Cardinality-
An entity in set A can be associated with any number (zero or more) of entities in set B.
An entity in set B can be associated with any number (zero or more) of entities in set A.
2. Many-to-One Cardinality-
An entity in set A can be associated with at most one entity in set B.
An entity in set B can be associated with any number (zero or more) of entities in set A.
3. One-to-Many Cardinality-
An entity in set A can be associated with any number (zero or more) of entities in set B.
An entity in set B can be associated with at most one entity in set A.
4. One-to-One Cardinality-
An entity in set A can be associated with at most one entity in set B.
An entity in set B can be associated with at most one entity in set A.
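The four cardinality ratios can be sketched as a check on a relationship set written as (A entity, B entity) pairs. The sets and names are hypothetical. As with keys, a cardinality ratio is a constraint on the schema; one set of instances can only show which ratios it is consistent with.

```python
from collections import Counter

def cardinality(pairs):
    """Classify a binary relationship set of (a, b) instances between sets A and B.

    The A side is 'One' if every b is related to at most one a, and the
    B side is 'One' if every a is related to at most one b."""
    max_b_per_a = max(Counter(a for a, _ in pairs).values(), default=0)
    max_a_per_b = max(Counter(b for _, b in pairs).values(), default=0)
    side_a = "One" if max_a_per_b <= 1 else "Many"
    side_b = "One" if max_b_per_a <= 1 else "Many"
    return f"{side_a}-to-{side_b}"

# Hypothetical relationship sets.
works_for = [("Asha", "CSE"), ("Ravi", "CSE"), ("Meena", "HR")]   # A = employee, B = department
manages   = [("Asha", "CSE"), ("Meena", "HR")]
enrolled  = [("Asha", "DBMS"), ("Asha", "OS"), ("Ravi", "DBMS")]

print(cardinality(works_for))                        # Many-to-One
print(cardinality([(b, a) for a, b in works_for]))   # One-to-Many
print(cardinality(manages))                          # One-to-One
print(cardinality(enrolled))                         # Many-to-Many
```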
Participation Constraints and Existence Dependencies. The participation constraint specifies whether
the existence of an entity depends on its being related to another entity via the relationship type.
This constraint specifies the minimum number of relationship instances that each entity can participate in and
is sometimes called the minimum cardinality constraint.
There are two types of participation constraints, total and partial, which we illustrate by example.
If a company policy states that every employee must work for a department, then an employee
entity can exist only if it participates in at least one WORKS_FOR relationship instance. Thus, the
participation of EMPLOYEE in WORKS_FOR is called total participation,
meaning that every entity in the total set of employee entities must be related to a department entity via
WORKS_FOR. Total participation is also called existence dependency.
In the figure below, we do not expect every employee to manage a department, so the participation of
EMPLOYEE in the MANAGES relationship type is partial, meaning that some or part of the set of
employee entities are related to some department entity via MANAGES, but not necessarily all.
In ER diagrams, total participation (or existence dependency) is displayed as a double
line connecting the participating entity type to the relationship, whereas partial participation
is represented by a single line (see Figure 3.2).
1. Total Participation-
1. It specifies that each entity in the entity set must compulsorily participate in at least one relationship
instance in that relationship set. That is why, it is also called as mandatory participation.
2. Total participation is represented using a double line between the entity set and relationship set.
Example-
Double line between the entity set “Student” and relationship set “Enrolled in” signifies total participation. It
specifies that each student must be enrolled in at least one course.
2. Partial Participation-
1. It specifies that each entity in the entity set may or may not participate in the relationship instance in that
relationship set. That is why, it is also called as optional participation.
2. Partial participation is represented using a single line between the entity set and relationship set.
Example-
Single line between the entity set “Course” and relationship set “Enrolled in” signifies partial participation. It
specifies that there might exist some courses for which no enrollments are made.
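Total and partial participation can be checked on the Student/Course example with plain Python sets. The entity names are hypothetical, and `is_total` is an illustrative helper: it tests whether every entity of a set appears in at least one relationship instance.

```python
# Hypothetical entity sets and ENROLLED_IN relationship set of (student, course) pairs.
students = {"Asha", "Ravi", "Meena"}
courses = {"DBMS", "OS", "Compilers"}
enrolled_in = {("Asha", "DBMS"), ("Ravi", "DBMS"), ("Meena", "OS")}

def is_total(entity_set, relationship_set, position):
    """Total participation: every entity appears in at least one relationship instance."""
    participating = {instance[position] for instance in relationship_set}
    return entity_set <= participating

print(is_total(students, enrolled_in, 0))   # True: every student is enrolled
print(is_total(courses, enrolled_in, 1))    # False: no enrollment in Compilers
```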
Strong entity- A strong entity always has a primary key. Its existence is not dependent on any other entity,
i.e., it is independent of other entities.
A set of strong entities is known as a strong entity set.
A strong entity is represented by a single rectangle.
Weak entity- A weak entity does not have sufficient attributes to form a primary key, i.e., a weak entity does
not have a primary key of its own.
A weak entity is dependent on a strong entity to ensure its existence. It is represented by double rectangle.
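One common way to store a weak entity, sketched in SQLite via Python's sqlite3: the weak entity's table borrows its owner's primary key and combines it with its own partial key (here a dependent's name, which is unique only per employee). The EMPLOYEE and DEPENDENT tables are hypothetical, and ON DELETE CASCADE models the existence dependency.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE employee (              -- strong entity: has its own primary key
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL
    );
    CREATE TABLE dependent (             -- weak entity: identified through its owner
        emp_id   INTEGER NOT NULL REFERENCES employee(emp_id) ON DELETE CASCADE,
        dep_name TEXT NOT NULL,          -- partial key: unique only per employee
        kinship  TEXT,
        PRIMARY KEY (emp_id, dep_name)
    );
    INSERT INTO employee VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO dependent VALUES (1, 'Anu', 'Daughter'), (2, 'Anu', 'Sister');
""")

# Deleting the owner removes its dependents: they cannot exist on their own.
conn.execute("DELETE FROM employee WHERE emp_id = 1")
print(conn.execute("SELECT emp_id, dep_name FROM dependent").fetchall())   # [(2, 'Anu')]
```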
Example 2