DBMS All Units
UNIT-I
Gn 1
DDBMS
INTRODUCTION
Data - Data is the raw material that can be processed by any computing machine.
For example − Employee name, Product name, Name of the student, Marks of the student,
Mobile number, Image etc.
Information - Information is data that has been converted into a more useful or intelligible
form.
For example: Report card sheet.
The information is needed for the following reasons −
To gain knowledge about the surroundings.
To keep the system up to date.
To know about the rules and regulations of the society.
Knowledge - The human mind purposefully organizes the information and evaluates it to
produce knowledge.
Example of data, information and knowledge - A student secures 450 marks. Here 450
is data, marks of the student is the information and hard work required to get the marks
is knowledge.
The major differences between Data and Information are as follows:
Data                                           | Information
Data is the raw fact.                          | It is a processed form of data.
It is not significant to a business.           | It is significant to a business.
Data is an atomic level piece of information.  | It is a collection of data.
Example: Product name, Name of student.        | Example: Report card of student.
It is a phenomenal fact.                       | It is organized data.
This is the primary level of intelligence.     | It is a secondary level of intelligence.
May or may not be meaningful.                  | Always meaningful.
Understanding is difficult.                    | Understanding is easy.
The diagram given below depicts the use of data and information in a database
CHARACTERISTICS OF DBMS
The characteristics of a DBMS are as follows:
Reduce Redundancy
Storing of Data
Concurrent Access
Data Consistency
Transaction Support
Security
Support to SQL
Way of Storing the Data - In a database management system, data is stored in tables. The
structure of each table is created first; this table structure is also known as the schema in a
DBMS.
The schema provides information about the table, such as its name, the names of its
attributes and the data type of each attribute. A DBMS also provides a facility to represent
relationships among related tables.
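As a rough illustration of a schema, the sketch below uses Python's built-in sqlite3 module to create two related tables and then read back the attribute names and data types. The table and column names (department, student, dept_id and so on) are assumptions made for this example only.

```python
import sqlite3

# In-memory database; all table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")

# The schema: table names, attribute names and the data type of each attribute.
conn.executescript("""
CREATE TABLE department (
    dept_id   INTEGER PRIMARY KEY,
    dept_name TEXT NOT NULL
);
CREATE TABLE student (
    roll_no INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    dept_id INTEGER REFERENCES department(dept_id)  -- relationship to department
);
""")

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, default value, pk).
student_columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]
print(student_columns)
```

The schema is defined once, before any data is inserted; the rows added later must fit this structure.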
Reduced Redundancy - An important feature of a DBMS is that it reduces redundancy. Here
the term redundancy means unnecessary repetition or duplication of data in the database.
To reduce redundancy, a DBMS uses the concept of normalization, which decomposes a
given table into smaller tables in order to minimize the redundancy.
Note that a DBMS does not guarantee 100% removal of redundancy; it can only
minimize it.
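The idea of decomposing a table can be sketched in plain Python. The rows and names below are made-up sample data, not taken from a real system: the branch name is repeated for every student in the original table, and after decomposition it is stored once per branch.

```python
# Unnormalized rows: the branch name is repeated for every student (sample data).
enrolment = [
    (1, "Rahul", "CS", "Computer Science"),
    (2, "Ravi", "CS", "Computer Science"),
    (3, "Mohan", "IT", "Information Technology"),
]

# Decompose into two smaller tables.
student = [(roll_no, name, code) for roll_no, name, code, _ in enrolment]
branch = sorted({(code, branch_name) for _, _, code, branch_name in enrolment})

# Joining the two tables on the branch code reproduces the original rows,
# so no information is lost by the decomposition.
branch_name_by_code = dict(branch)
rejoined = [(r, n, c, branch_name_by_code[c]) for r, n, c in student]

print(branch)    # each branch name now appears only once
print(rejoined)
```

The branch code is still repeated in the student table, which matches the note above: redundancy is minimized, not removed entirely.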
Concurrent Access - A DBMS supports concurrent access to the database by multiple
users. Multiple users can work on the database at the same time while consistency is still
maintained. Here the term consistency represents the correctness of the database.
Data Consistency - Data consistency means that the state of the data should be correct at
any instant of time. The result of any manipulation or update should be reflected in the
database.
Support for Structured Query Language - A DBMS supports SQL. SQL queries provide an
easy way for the user to create, insert, update and delete data in the database.
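A minimal sketch of these four operations, run through Python's sqlite3 module. The product table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Create a table (hypothetical product table).
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")

# Insert two rows.
conn.execute("INSERT INTO product (id, name, price) VALUES (?, ?, ?)", (1, "Pen", 10.0))
conn.execute("INSERT INTO product (id, name, price) VALUES (?, ?, ?)", (2, "Notebook", 45.0))

# Update one row and delete the other.
conn.execute("UPDATE product SET price = ? WHERE id = ?", (12.5, 1))
conn.execute("DELETE FROM product WHERE id = ?", (2,))

remaining = conn.execute("SELECT id, name, price FROM product").fetchall()
print(remaining)
```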
Security - A DBMS gives the facility to protect the database from unauthorized users.
Different user accounts may have different access permissions, with which users can
secure their data from unauthorized users.
Transaction Support - A DBMS supports transactions, which help the user to maintain the
integrity of the database.
Some DBMS software used in the software industry are Oracle, MySQL and SQL Server.
The university clerk now has two choices: either obtain the list of all students and extract the
needed information manually or ask a programmer to write the necessary application
program. Both alternatives are obviously unsatisfactory. Suppose that such a program is
written, and that, several days later, the same clerk needs to trim that list to include only those
students who have taken at least 60 credit hours. As expected, a program to generate such a
list does not exist. Again, the clerk has the preceding two options, neither of which is
satisfactory. The point here is that conventional file-processing environments do not allow
needed data to be retrieved in a convenient and efficient manner. More responsive data-
retrieval systems are required for general use.
Data isolation - Because data are scattered in various files, and files may be in different formats,
writing new application programs to retrieve the appropriate data is difficult.
Integrity problems - The data values stored in the database must satisfy certain types of
consistency constraints. Suppose the university maintains an account for each department,
and records the balance amount in each account. Suppose also that the university requires that
the account balance of a department may never fall below zero. Developers enforce these
constraints in the system by adding appropriate code in the various application programs.
However, when new constraints are added, it is difficult to change the programs to enforce
them. The problem is compounded when constraints involve several data items from different
files.
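For contrast, a database system lets such a constraint be declared once, with the data, instead of being coded into every program. The sketch below is one possible illustration using Python's sqlite3 module; the table name department_account and the amounts are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The rule "balance may never fall below zero" is declared in the table itself.
conn.execute("""
CREATE TABLE department_account (
    dept_name TEXT PRIMARY KEY,
    balance   INTEGER NOT NULL CHECK (balance >= 0)
)""")
conn.execute("INSERT INTO department_account VALUES ('A', 1000)")
conn.commit()

# Any program that tries to violate the rule is stopped by the database.
try:
    conn.execute("UPDATE department_account SET balance = balance - 1500 WHERE dept_name = 'A'")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

balance = conn.execute(
    "SELECT balance FROM department_account WHERE dept_name = 'A'"
).fetchone()[0]
print(rejected, balance)
```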
Atomicity problems - A computer system, like any other device, is subject to failure. In
many applications, it is crucial that, if a failure occurs, the data be restored to the consistent
state that existed prior to the failure.
Consider a program to transfer $500 from the account balance of department A to the account
balance of department B. If a system failure occurs during the execution of the program, it is
possible that the $500 was removed from the balance of department A but was not credited to
the balance of department B, resulting in an inconsistent database state. Clearly, it is essential
to database consistency that either both the credit and debit occur, or that neither occur.
That is, the funds transfer must be atomic—it must happen in its entirety or not at all. It is
difficult to ensure atomicity in a conventional file-processing system.
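The funds transfer can be sketched as a database transaction. This is only an illustration using Python's sqlite3 module: the account table and starting balances are assumptions, and the "system failure" is simulated by raising an exception between the debit and the credit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (dept TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 200)])
conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move amount from src to dst; both updates persist or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance - ? WHERE dept = ?", (amount, src))
            if fail_midway:
                raise RuntimeError("simulated system failure")
            conn.execute("UPDATE account SET balance = balance + ? WHERE dept = ?", (amount, dst))
    except RuntimeError:
        pass  # the debit has already been undone by the rollback


def balances(conn):
    return dict(conn.execute("SELECT dept, balance FROM account"))


transfer(conn, "A", "B", 500, fail_midway=True)
after_failure = balances(conn)   # unchanged: the debit was rolled back

transfer(conn, "A", "B", 500)
after_success = balances(conn)   # both the debit and the credit were applied

print(after_failure, after_success)
```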
Concurrent-access anomalies - For the sake of overall performance of the system and faster
response, many systems allow multiple users to update the data simultaneously. Indeed,
today, the largest Internet retailers may have millions of accesses per day to their data by
shoppers. In such an environment, interaction of concurrent updates is possible and may
result in inconsistent data. Consider department A, with an account balance of $10,000. If
two department clerks debit the account balance (by say $500 and $100, respectively) of
department A at almost exactly the same time, the result of the concurrent executions may
leave the budget in an incorrect (or inconsistent) state. Suppose that the programs executing
on behalf of each withdrawal read the old balance, reduce that value by the amount being
withdrawn, and write the result back. If the two programs run concurrently, they may both
read the value $10,000, and write back $9500 and $9900, respectively. Depending on which
one writes the value last, the account balance of department A may contain either $9500 or
$9900, rather than the correct value of $9400. To guard against this possibility, the system
must maintain some form of supervision.
But supervision is difficult to provide because data may be accessed by many different
application programs that have not been coordinated previously.
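The anomaly in this example can be reproduced in a few lines of Python. The first part replays the uncoordinated interleaving by hand; the second part is one possible form of supervision, a lock around the read-modify-write step. The Account class is a hypothetical helper, not part of any DBMS.

```python
import threading

# Uncoordinated interleaving: both clerks read before either one writes.
balance = 10_000
read_by_clerk_1 = balance
read_by_clerk_2 = balance
balance = read_by_clerk_1 - 500   # clerk 1 writes 9500
balance = read_by_clerk_2 - 100   # clerk 2 overwrites it with 9900
lost_update_result = balance


class Account:
    """Hypothetical account whose debit is protected by a lock."""

    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def debit(self, amount):
        # Only one thread at a time may read, compute and write back.
        with self._lock:
            current = self.balance
            self.balance = current - amount


account = Account(10_000)
threads = [threading.Thread(target=account.debit, args=(amount,)) for amount in (500, 100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
supervised_result = account.balance

print(lost_update_result, supervised_result)
```

In the first part one debit is lost; with the lock, both debits are applied whichever clerk goes first.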
Security problems - Not every user of the database system should be able to access all the
data. Data should be secured from unauthorised access. For example, a student in a college
should not be able to see the payroll details of the teachers. Such security constraints
are difficult to apply in file-processing systems.
These difficulties, among others, prompted the development of database systems. In what
follows, we shall see the concepts and algorithms that enable database systems to solve the
problems with file-processing systems.
ADVANTAGES OF DBMS:
Controlling of Redundancy: Data redundancy refers to the duplication of data (i.e., storing
the same data multiple times). In a database system, a centralized database and
centralized control of data by the DBA avoid unnecessary duplication of data. This also
eliminates the extra time for processing the large volume of data and saves
storage space.
Improved Data Sharing: DBMS allows a user to share the data in any number of application
programs.
Data Integrity: Integrity means that the data in the database is accurate. Centralized control
of the data permits the administrator to define integrity constraints on the data in
the database. For example, in a customer database we can enforce the constraint that it
accepts customers only from the cities of Noida and Meerut.
Data Security: It is easier to apply access constraints in database systems so that only
authorized users are able to access the data. Each user has a different set of access rights, so data is
secured from issues such as identity theft, data leaks and misuse of data.
Data Consistency: By eliminating data redundancy, we greatly reduce the opportunities for
inconsistency. For example, if a customer address is stored only once, we cannot have
disagreement on the stored values. Also updating data values is greatly simplified when each
value is stored in one place only. Finally, we avoid the wasted storage that results from
redundant data storage.
Easy access to data: Database systems manage data in such a way that the data is easily
accessible with fast response times. Even if the database size is huge, the DBMS can still
provide fast access to and updating of data.
Easy recovery: Since database systems keep backups of data, it is easier to do a full
recovery of data in case of a failure. This is useful for almost all
organizations, as the data maintained over time should not be lost during a system crash or
failure.
Flexible: Database systems are more flexible than file processing systems. DBMS systems
are scalable: the database size can be increased or decreased based on the amount of storage
required. A DBMS also allows the addition of new tables as well as the removal of existing tables
without disturbing the consistency of data.
Reduced Application Development and Maintenance Time: A DBMS supports many
important functions that are common to many applications accessing data stored in the
DBMS, which facilitates the quick development of applications.
Disadvantages of DBMS
It is somewhat complex. Since it supports many functions to serve the user well, the
underlying software has become complex. Designers and developers should have
thorough knowledge of the software to get the most out of it.
Because of its complexity and functionality, it needs a large amount of memory to run
efficiently.
A DBMS works as a centralized system, i.e., all the users from all over the world
access the same database. Hence any failure of the DBMS will impact all the users.
A DBMS is generalized software, i.e., it is written to work on all systems rather than a specific
one. Hence some applications will run slowly.
HISTORY OF DATABASES
In the early 1960s, Charles Bachman developed the Integrated Data
Store (IDS), which was based on the network data model.
In the late 1960s, IBM (International Business Machines Corporation) developed the
Information Management System (IMS), which is still in use
in many places. It was based on the hierarchical database model.
In 1970, the relational database model was proposed by Edgar F.
Codd. Many of the database systems we use today are relational. It has been considered
the standard database model since then.
In 1976, Peter Chen developed the Entity-Relationship (ER) model, which is widely used
in database design.
During the 1970s, IBM developed the Structured Query Language (SQL) as a part
of the System R project. In the 1980s it was adopted as a standard query language by ANSI and ISO.
doesn’t show the data present in those tables. Schema is only a structural view (design) of a
database.
The data stored in a database at a particular moment of time is called an instance of the database.
The database schema defines the attributes of the tables that belong to a particular database. The
values of these attributes at a moment of time are called the instance of that database.
For example, suppose we have a single table student in the database. Today the table has 100
records, so today the instance of the database has 100 records. If we add another
100 records to this table by tomorrow, the instance of the database tomorrow will have 200
records in the table. In short, the data stored in the database at a particular moment is called the
instance; it changes over time as we add, delete or update data in the database.
Another example: the concept of database schemas and instances can be understood by
analogy to a program written in a programming language. A database schema corresponds to
the variable declarations in a program. Each variable has a particular value at a given instant.
The values of the variables in a program at a point in time correspond to an instance of a
database schema.
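The student table example can be replayed with Python's sqlite3 module as a rough sketch. The column names and the generated student names are assumptions; the point is that the schema text stays the same while the instance grows from 100 to 200 records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")


def schema(conn):
    # SQLite keeps the CREATE statement of every table in sqlite_master.
    return conn.execute("SELECT sql FROM sqlite_master WHERE name = 'student'").fetchone()[0]


def instance_size(conn):
    return conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]


schema_before = schema(conn)

# "Today": 100 records.
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(i, f"student{i}") for i in range(1, 101)])
size_today = instance_size(conn)

# "Tomorrow": another 100 records.
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(i, f"student{i}") for i in range(101, 201)])
size_tomorrow = instance_size(conn)

schema_after = schema(conn)
print(size_today, size_tomorrow, schema_before == schema_after)
```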
Difference between Schema and Instance:
Schema | Instance
It is the overall description of the database. | It is the collection of information stored in a database at a particular moment.
Schema is same for whole database. | Data in instances can be changed using addition, deletion, updation.
Does not change frequently. | Changes frequently.
Defines the basic structure of the database, i.e. how the data will be stored in the database. | It is the set of information stored at a particular time.
External level - It is also called view level. The reason this level is called “view” is because
several users can view their desired data from this level which is internally fetched from
database with the help of conceptual and internal level mapping.
This is the highest level of database abstraction. It includes a number of external schemas or
user views. This level provides different views of the same database for a specific user or a
group of users. An external view provides a powerful and flexible security mechanism by
hiding the parts of the database from a particular user.
Conceptual level - It is also called logical level. This level describes the structure of the
whole database. It acts as a middle layer between the physical storage and user view. It
describes what data is stored in the database, what the data types are, and what relationships
exist among those data. There is only one conceptual schema per database.
Database constraints and security are also implemented at this level of the architecture. This level
is maintained by the DBA (database administrator).
Internal level - This level is also known as physical level. This level describes how the data
is actually stored in the storage devices. This level is also responsible for allocating space to
the data. This is the lowest level of the architecture.
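The external level can be illustrated with an SQL view. The sketch below uses Python's sqlite3 module; the teacher table, the view name teacher_public and the sample row are assumptions. SQLite itself has no user accounts, so the example only shows how a view hides part of the data, not how access to the base table is restricted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the full table, including the salary column.
conn.execute("CREATE TABLE teacher (teacher_id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.execute("INSERT INTO teacher VALUES (1, 'Asha', 50000)")

# External level: a view that exposes only part of the table.
conn.execute("CREATE VIEW teacher_public AS SELECT teacher_id, name FROM teacher")

cursor = conn.execute("SELECT * FROM teacher_public")
visible_columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()
print(visible_columns, rows)
```

A user who is given only the view never sees the salary column, even though it is stored in the same database.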
DATA MODEL
A data model gives us an idea of what the final system will look like after its complete
implementation. It defines the data elements and the relationships between the data elements.
Data models are used to show how data is stored, connected, accessed and updated in the
database management system.
Some of the Data Models in DBMS are:
Hierarchical Model
Network Model
Entity-Relationship Model
Relational Model
Object-Oriented Data Model
Hierarchical Model - Hierarchical Model was the first DBMS model. This model organises
the data in a hierarchical tree structure.
Each child node has one parent node but a parent node can have more than one child node.
Multiple parents are not allowed.
This model has the ability to manage one-to-one relationships as well as one-to-many
relationships.
For Example
Network Model - This model is an extension of the hierarchical model. It was the most
popular model before the relational model. It replaces the hierarchical tree with a graph.
A parent node can have more than one child node and a child node can also have more than
one parent node.
This model has the ability to manage one-to-one relationships as well as many-to-many
relationships.
For Example: In the example below we can see that the node student has two parents, i.e. CSE
Department and Library. This was not possible in the hierarchical model.
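The difference between the two models can be sketched with plain Python dictionaries. The node names follow the example in the text; the root node "College" is an assumption added so that the tree has a top.

```python
# Hierarchical model: every node has exactly one parent, so the data forms a tree.
hierarchical_parent = {
    "CSE Department": "College",
    "Library": "College",
    "Student": "CSE Department",   # only one parent is allowed
}

# Network model: a node may have several parents, so the data forms a graph.
network_parents = {
    "CSE Department": ["College"],
    "Library": ["College"],
    "Student": ["CSE Department", "Library"],   # two parents
}


def path_to_root(node):
    """Follow the single parent link upwards in the hierarchical model."""
    path = [node]
    while node in hierarchical_parent:
        node = hierarchical_parent[node]
        path.append(node)
    return path


def children_of(parent):
    """All nodes in the network model that list the given node as a parent."""
    return sorted(node for node, parents in network_parents.items() if parent in parents)


print(path_to_root("Student"))
print(children_of("Library"))
```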
Relational Model - Relational Model is the most widely used model. In this model, the data
is maintained in the form of two-dimensional tables. All the information is stored in the
form of rows and columns. The basic structure of the relational model is the table, so tables
are also called relations in the relational model.
For Example, we have an Employee table.
For Example:
In the above diagram, the entities are Teacher and Department. The attributes
of the Teacher entity are Teacher_Name, Teacher_id, Age, Salary, Mobile_Number. The
attributes of the Department entity are Dept_id, Dept_name. The two entities are
connected using the relationship. Here, each teacher works for a department.
Object-Oriented Data Model - In this model, both the data and relationship are present in a
single structure known as an object.
In this model, two or more objects are connected through links. We use this link to relate
one object to other objects. This can be understood by the example given below.
In the above example, we have two objects Employee and Department. All the data and
relationships of each object are contained as a single unit. The attributes like Name,
Job_title of the employee and the methods which will be performed by that object are
stored as a single object. The two objects are connected through a common attribute, i.e. the
Department_id, and the communication between these two is done with the help of this
common id.
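The Employee and Department objects can be sketched as Python classes. The attribute names follow the text; the describe method, the Dept_name field and the sample values are assumptions added for illustration.

```python
from dataclasses import dataclass


@dataclass
class Department:
    department_id: int
    dept_name: str


@dataclass
class Employee:
    name: str
    job_title: str
    department_id: int  # common attribute that links the employee to a department

    def describe(self) -> str:
        # A method stored together with the data, as in the object-oriented model.
        return f"{self.name} ({self.job_title})"


# Sample objects (made-up values).
departments = {10: Department(10, "Sales")}
employee = Employee("Ravi", "Analyst", 10)

# Follow the link from the employee to its department through the common id.
employee_department = departments[employee.department_id]
print(employee.describe(), "works in", employee_department.dept_name)
```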
DATABASE USERS
Database users are categorized based on their interaction with the database. There are
seven types of database users in DBMS.
Database Administrator (DBA): The Database Administrator (DBA) is a person/team who
defines the schema and also controls the 3 levels of the database. The DBA creates a
new account id and password for a user who needs to access the database. The DBA is also
responsible for providing security to the database and allows only authorized users to
access/modify the database. The DBA is responsible for problems such as security breaches
and poor system response time.
The DBA also monitors recovery and backup and provides technical support.
The DBA has a DBA account in the DBMS, which is called a system or superuser
account.
The DBA repairs damage caused by hardware and/or software failures.
The DBA is the one having privileges to perform DCL (Data Control Language)
operations such as GRANT and REVOKE, to allow/restrict a particular user from
accessing the database.
Naive / Parametric End Users: Parametric end users are unsophisticated users who don't
have any DBMS knowledge but frequently use database applications in their daily
life to get the desired results.
For example, users of a railway ticket booking system are naive users. Clerks in a bank are naive
users because they don't have any DBMS knowledge but they still use the database and
perform their given tasks.
System Analyst: A System Analyst is a user who analyzes the requirements of parametric end
users. They check whether all the requirements of end users are satisfied.
Sophisticated Users: Sophisticated users can be engineers, scientists or business analysts who
are familiar with the database. They can develop their own database applications according to
their requirements. They don't write program code but interact with the database by
writing SQL queries directly through the query processor.
Database Designers: Database designers are the users who design the structure of the database,
which includes tables, indexes, views, triggers, stored procedures and constraints, which are
usually defined before the database is created or populated with data. They control what
data must be stored and how the data items are to be related.
Application Programmers: Application Programmers, also referred to as System Analysts or
simply Software Engineers, are the back-end programmers who write the code for the
application programs. They are computer professionals. These programs could be written
in programming languages such as Visual Basic, Developer, C, FORTRAN, COBOL etc.
Casual Users / Temporary Users: Casual users are the users who access the database only
occasionally, but each time they access it they require new information,
for example, middle or higher level managers.
Two tier architecture - In two-tier architecture, the database system is present at the server
machine and the DBMS application is present at the client machine. These two machines are
connected with each other through a reliable network as shown in the below diagram.
Whenever the client machine makes a request to access the database present at the server using a
query language like SQL, the server performs the request on the database and returns the result
back to the client. Application connection interfaces such as JDBC and ODBC are used for the
interaction between server and client.
Three tier architecture - In three-tier architecture, another layer is present between the
client machine and server machine. In this architecture, the client application doesn’t
communicate directly with the database systems present at the server machine, rather the
client application communicates with server application and the server application internally
communicates with the database system present at the server.
UNIT-II
ER MODEL
An Entity–relationship model (ER model) describes the structure of a database with the help
of a diagram, which is known as Entity Relationship Diagram (ER Diagram).
The ER model defines the conceptual view of a database.
An ER model is a design or blueprint of a database that can later be implemented as a
database.
Components of ER Diagram
Weak Entity - An entity that depends on another entity is called a weak entity. The weak entity
doesn't contain any key attribute of its own. (or)
An entity that cannot be uniquely identified by its own attributes and relies on its
relationship with another entity is called a weak entity.
The weak entity is represented by a double rectangle.
For example – a bank account cannot be uniquely identified without knowing the bank to
which the account belongs, so bank account is a weak entity.
Attribute
An attribute describes the property of an entity. An attribute is represented as Oval in an ER
diagram.
There are four types of attributes:
Key attribute
Composite attribute
Multivalued attribute
Derived attribute
Key attribute - A key attribute can uniquely identify an entity from an entity set. For
example, student roll number can uniquely identify a student from a set of students. A key
attribute is represented by an oval, the same as other attributes; however, the text of a key attribute is
underlined.
Multivalued attribute - An attribute that can hold multiple values is known as a multivalued
attribute. It is represented with double ovals in an ER Diagram.
For example – A person can have more than one phone number, so the phone number
attribute is multivalued.
Derived attribute - A derived attribute is one whose value is dynamic and derived from
another attribute. It is represented by dashed oval in an ER Diagram. For example – Person
age is a derived attribute as it changes over time and can be derived from another attribute
(Date of birth).
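A derived attribute can be sketched in Python as a value that is computed on demand rather than stored. The Person class, its field names and the sample date of birth are assumptions made for this illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Person:
    name: str
    date_of_birth: date  # stored attribute

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, never stored."""
        had_birthday = (today.month, today.day) >= (self.date_of_birth.month,
                                                    self.date_of_birth.day)
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)


person = Person("Rahul", date(2005, 6, 15))
print(person.age(date(2024, 6, 14)))  # the day before the birthday
print(person.age(date(2024, 6, 15)))  # on the birthday
```

Because the age is recomputed from the date of birth each time, it never goes out of date.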
For Example, the complete entity type Student with its attributes can be represented as
Relationship
A relationship is represented by a diamond shape in an ER diagram; it shows the relationship
among entities.
There are four types of relationships:
One to One
One to Many
Many to One
Many to Many
One to One Relationship - When a single instance of an entity is associated with a single
instance of another entity then it is called one to one relationship.
For example, a person has only one passport and a passport is given to one person.
One to Many Relationship - When a single instance of an entity is associated with more
than one instance of another entity, it is called a one to many relationship.
For example – a customer can place many orders but an order cannot be placed by many
customers.
Many to One Relationship - When more than one instance of an entity is associated with a
single instance of another entity, it is called a many to one relationship.
For example – many students can study in a single college but a student cannot study in many
colleges at the same time.
Many to Many Relationship - When more than one instance of an entity is associated with
more than one instance of another entity, it is called a many to many relationship.
For example, a student can be assigned to many projects and a project can be assigned to many
students.
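In a relational database, a many to many relationship such as the student and project example is commonly stored in a third table that pairs the keys of the two entities. The sketch below uses Python's sqlite3 module; the table names, the assignment table and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE project (project_id INTEGER PRIMARY KEY, title TEXT);

-- One row per (student, project) pair represents the relationship.
CREATE TABLE assignment (
    student_id INTEGER REFERENCES student(student_id),
    project_id INTEGER REFERENCES project(project_id),
    PRIMARY KEY (student_id, project_id)
);

INSERT INTO student VALUES (1, 'Rahul'), (2, 'Ravi');
INSERT INTO project VALUES (10, 'Library System'), (20, 'Payroll');
INSERT INTO assignment VALUES (1, 10), (1, 20), (2, 10);
""")

# One student, many projects.
projects_of_rahul = [row[0] for row in conn.execute(
    "SELECT p.title FROM project p "
    "JOIN assignment a ON a.project_id = p.project_id "
    "WHERE a.student_id = 1 ORDER BY p.title")]

# One project, many students.
students_on_library = [row[0] for row in conn.execute(
    "SELECT s.name FROM student s "
    "JOIN assignment a ON a.student_id = s.student_id "
    "WHERE a.project_id = 10 ORDER BY s.name")]

print(projects_of_rahul, students_on_library)
```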
RELATIONAL MODEL
The relational model was proposed by E.F. Codd to model data in the form of relations or
tables. After designing the conceptual model of the database using an ER diagram, we need to
convert the conceptual model into a relational model, which can be implemented using any
RDBMS such as Oracle, MySQL, etc.
The relational model represents how data is stored in Relational Databases. A relational
database stores data in the form of relations (tables).
Consider a relation STUDENT with attributes ROLL_NO, NAME, ADDRESS, PHONE, and
AGE shown in the following table.
ROLL_NO   NAME     ADDRESS          PHONE        AGE
1         Rahul    Hyderabad        9455123451   18
2         Ravi     Vijayawada       9652431543   18
3         Mohan    Tadepalligudem   9156253131   20
4         Satish   Tanuku                        18
Attribute: Attributes are the properties that define a relation, e.g., ROLL_NO, NAME.
Relation Schema: A relation schema represents the name of the relation with its attributes,
e.g., STUDENT (ROLL_NO, NAME, ADDRESS, PHONE, AGE) is the relation
schema for STUDENT.
Tuple: Each row in the relation is known as a tuple. The above relation contains 4 tuples, one
of which is shown as:
1 Rahul Hyderabad 9455123451 18
Relation Instance: The set of tuples of a relation at a particular instance of time is called a
relation instance. The above table shows the relation instance of STUDENT at a particular
time. It can change whenever there is an insertion, deletion, or update in the database.
Degree: The number of attributes in the relation is known as the degree of the relation.
The STUDENT relation defined above has degree 5.
Cardinality: The number of tuples in a relation is known as cardinality.
The STUDENT relation defined above has cardinality 4.
Column: The column represents the set of values for a particular attribute. The
column ROLL_NO is extracted from the relation STUDENT.
ROLL_NO
1
2
3
4
NULL Values: A value which is not known or unavailable is called a NULL value. It is
represented by a blank space, e.g., PHONE of the STUDENT having ROLL_NO 4 is NULL.
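The terms above can be checked against the STUDENT relation with a short sketch using Python's sqlite3 module. The rows are the ones shown in the table; lower-case column names and the storage of PHONE as text are assumptions of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, "
    "address TEXT, phone TEXT, age INTEGER)"
)
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?, ?)", [
    (1, "Rahul", "Hyderabad", "9455123451", 18),
    (2, "Ravi", "Vijayawada", "9652431543", 18),
    (3, "Mohan", "Tadepalligudem", "9156253131", 20),
    (4, "Satish", "Tanuku", None, 18),   # None is stored as NULL
])

cursor = conn.execute("SELECT * FROM student ORDER BY roll_no")
degree = len(cursor.description)        # number of attributes
tuples = cursor.fetchall()
cardinality = len(tuples)               # number of tuples
roll_no_column = [t[0] for t in tuples]

# NULL is tested with IS NULL, not with "=".
missing_phone = [row[0] for row in conn.execute(
    "SELECT roll_no FROM student WHERE phone IS NULL")]

print(degree, cardinality, roll_no_column, missing_phone)
```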
Referential Constraints - When an attribute of a relation can only take values from an
attribute of the same relation or of another relation, this is called a referential constraint.
Let us suppose we have 2 relations
STUDENT
ROLL_NO   NAME     ADDRESS          PHONE        AGE   BRANCH_CODE
1         Rahul    Hyderabad        9455123451   18    CS
2         Ravi     Vijayawada       9652431543   18    CS
3         Mohan    Tadepalligudem   9156253131   20    ECT
4         Satish   Tanuku                        18    ECE
BRANCH
BRANCH_CODE   BRANCH_NAME
CS            Computer Science
IT            Information Technology
ECT           Electronics and Communication Technology
ECE           Electronics and Communication Engineering
BRANCH_CODE of STUDENT can only take the values which are present in
BRANCH_CODE of BRANCH which is called referential integrity constraint. The relation
which is referencing another relation is called REFERENCING RELATION (STUDENT in
this case) and the relation to which other relations refer is called REFERENCED RELATION
(BRANCH in this case).
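A referential integrity constraint can be declared as a foreign key. The sketch below uses Python's sqlite3 module with a cut-down version of the two relations; the branch code 'ME' is a made-up value that does not exist in BRANCH. Note that SQLite checks foreign keys only after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on

conn.executescript("""
-- Referenced relation
CREATE TABLE branch (branch_code TEXT PRIMARY KEY, branch_name TEXT);

-- Referencing relation
CREATE TABLE student (
    roll_no     INTEGER PRIMARY KEY,
    name        TEXT,
    branch_code TEXT REFERENCES branch(branch_code)
);

INSERT INTO branch VALUES ('CS', 'Computer Science'), ('IT', 'Information Technology');
INSERT INTO student VALUES (1, 'Rahul', 'CS');
""")

# 'ME' is not present in branch, so the insert is rejected.
try:
    conn.execute("INSERT INTO student VALUES (2, 'Ravi', 'ME')")
    violation_rejected = False
except sqlite3.IntegrityError:
    violation_rejected = True

student_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(violation_rejected, student_count)
```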
Advantages of using the relational model
The advantages and reasons due to which the relational model in DBMS is widely accepted
as a standard are:
Simple and Easy To Use - Storing data in tables is much easier to understand and
implement as compared to other storage techniques.
Manageability - Because of the independent nature of each relation in a relational
database, it is easy to manipulate and manage. This improves the performance of the
database.
Query capability - With the introduction of relational algebra, relational databases
provide easy access to data via a high-level query language such as SQL.
Data integrity - With the introduction and implementation of relational constraints, the
relational model can maintain data integrity in the database.
Schema Refinement - This is like checking whether our plan is perfect or not. Refinement means
removing unwanted elements. In this step we analyze the collection of relations in our
relational database schema to identify potential problems and to refine it. In contrast to the
requirements analysis and conceptual design steps, which are essentially subjective, schema
refinement can be guided by some elegant and powerful theory.
Physical Database Design - Based on the plan we start constructing the house. So we should
focus on arranging the required cement, clay, sand, iron, wood etc., and we construct the
house according to the capacity of the basement. In this step we must consider typical
expected workloads that our database must support and further refine the database design to
ensure that it meets desired performance criteria. This step may simply involve building
indexes on some tables and clustering some tables, or it may involve a substantial redesign of
parts of the database schema obtained from the earlier design steps.
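Building an index for an expected workload can be sketched with Python's sqlite3 module. The student table, the index name idx_student_name and the query are assumptions; the exact wording of the query plan differs between SQLite versions, so the example only checks whether the index is mentioned in it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(i, f"student{i}", 18 + i % 5) for i in range(1, 1001)],
)


def plan(conn, sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Expected workload (assumed): frequent lookups of a student by name.
query = "SELECT roll_no FROM student WHERE name = 'student500'"

plan_before = plan(conn, query)   # without an index the whole table is scanned
conn.execute("CREATE INDEX idx_student_name ON student(name)")
plan_after = plan(conn, query)    # now the index is used for the lookup

print(plan_before)
print(plan_after)
```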
Applications and Security Design - Our building is almost completed. Now we don't allow
everybody into our home. There are some restrictions. If the person is a stranger, we talk
to him outside the gate. If he is a friend, we let him come inside the home and sit in
the hall and talk. If he is a relative, we allow him to stay with us. In this step, we identify
different user groups and different roles played by various users (e.g., the development team
for a product, the customer support representatives, the product manager). For each role and
user group, we must identify the parts of the database that they must be able to access and the
parts of the database that they should not be allowed to access, and take steps to ensure that
they can access only the necessary parts.
Entity Type
An entity type is a collection of entities having similar attributes. In the above Student
table example, each row is an entity and the rows have common attributes, i.e. each
row has its own value for the attributes Roll_no, Age, Student_name and Mobile_no. So, we can
define the above STUDENT table as an entity type because it is a collection of entities having
the same attributes.
The table below shows how the data of different entities (different students) are stored.
The E-R representation of the above Student Entity Type is shown below.
Note: We use a rectangle to represent an entity type in the E-R diagram, not an entity.
above example, Roll_no identifies each element of the table uniquely and hence, we can say
that STUDENT is a strong entity type.
For Example:
Weak Entity Type - Weak entity type doesn't have a key attribute. Weak entity type can't be
identified on its own. It depends upon some other strong entity for its distinct identity. This
can be understood with a real-life example. There can be children only if the parent exists.
There can be no independent existence of children. There can be a room only if the building
exists. There can be no independent existence of a room.
A weak entity is represented by a double outlined rectangle. The relationship between a weak
entity type and strong entity type is called an identifying relationship and shown with a
double outlined diamond instead of a single outlined diamond. This representation can be
seen in the diagram below.
Example: Suppose we have two tables, Customer(Customer_id, Name, Mobile_no, Age, Gender)
and Address(Locality, Town, State). Here we cannot identify an address uniquely as there
can be many customers from the same locality. So, we need an attribute of the Strong
Entity Type, i.e., 'Customer' here, to uniquely identify entities of the 'Address' Entity Type.
Entity Set
Entity Set is a collection of entities of the same entity type. In the above example of
STUDENT entity type, a collection of entities from the Student entity type would form an
entity set.
For example
Attributes
An attribute is a property or characteristic of an entity. An entity may contain any number of
attributes. One of the attributes is considered as the primary key. In an Entity-Relation model,
attributes are represented in an elliptical shape.
For Example: Student has attributes like name, age, roll number, and many more. To
uniquely identify the student, we use the roll number as the primary key, as it is not repeated.
There are different types of attributes: Simple, Composite, Single-valued, Multi-valued,
Derived attribute, Stored attribute and Key attribute. One more attribute is there, i.e. Complex
Attribute; this is a rarely used attribute.
Simple Attribute: It is also known as an atomic attribute. When an attribute cannot be divided
further, then it is called a simple attribute.
For example, the roll number of a student, the id number of an employee.
Composite Attribute: Composite attributes are those that are made up of the composition of
more than one attribute. When any attribute can be divided further into more sub-attributes,
then that attribute is called a composite attribute.
For example, the address can be further split into house number, street number, city, state,
country, and pin code; the name can also be split into first name, middle name, and last name.
Single-valued Attribute: Those attributes which can have exactly one value are known as
single valued attributes. They contain singular values, so more than one value is not allowed.
For example, the DOB of a student can be a single valued attribute. Another example is
gender because one person can have only one gender.
Multi-valued Attribute: Those attributes which can have more than one entry or which
contain more than one value are called multi valued attributes.
In the Entity Relationship (ER) diagram, we represent the multi valued attribute by double
oval representation.
For example, one person can have more than one phone number, so that it would be a multi
valued attribute.
Derived Attribute: When one attribute can be derived from the other attribute, then it is
called a derived attribute.
For example, the age of a student can be a derived attribute because we can get it by the DOB
of the student.
Another example can be of working experience, which can be obtained by the date of joining
of an employee.
In the ER diagram, we represent the derived attributes by a dotted oval shape.
Stored Attributes: Values of stored attributes remain constant and fixed for an entity
instance, and they help in deriving the derived attributes.
For example, the Age attribute can be derived from the Date of Birth attribute, and the Date
of Birth attribute has a fixed and constant value throughout the life of an entity.
Hence, the Date of Birth attribute is a stored attribute.
Key Attribute: Key attributes are those attributes that can uniquely identify the entity in the
entity set.
For example, Roll-No is the key attribute because it can uniquely identify the student.
Complex Attribute: If an attribute combines the properties of multi-valued and composite
attributes, then it is called a complex attribute. It means that if one attribute is made up of
more than one attribute and each of them can have more than one value, then it is called a
complex attribute.
For example, suppose a person has more than one office and each office has an address made
from a street number and city. The address is a composite attribute and the offices are
multi-valued, so the combination is a complex attribute.
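The attribute kinds described above can be sketched in Python. This is only an illustration: the class names Address and Student, the method age_on and all sample values are assumptions made for the example, not part of the notes. Here roll_no is a simple key attribute, address is composite, phones is multi-valued, dob is stored and the age is derived from dob.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Address:
    # Composite attribute: it splits into sub-attributes.
    house_no: str
    city: str
    state: str


@dataclass
class Student:
    roll_no: int        # simple, single-valued key attribute
    name: str           # simple attribute
    dob: date           # stored attribute
    address: Address    # composite attribute
    phones: list = field(default_factory=list)  # multi-valued attribute

    def age_on(self, today: date) -> int:
        # Derived attribute: computed from dob instead of being stored.
        had_birthday = (today.month, today.day) >= (self.dob.month, self.dob.day)
        return today.year - self.dob.year - (0 if had_birthday else 1)


# Hypothetical sample entity.
s = Student(1, "Amar", date(2004, 5, 17), Address("12", "Delhi", "Delhi"),
            ["9000000001", "9000000002"])
print(s.age_on(date(2024, 5, 17)))
```

Keeping the age as a method rather than a field mirrors the idea that a derived attribute is recomputed from a stored one whenever it is needed.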
Relationship Set
A relationship set is a set of relationships of same type.
For Example, Set representation of above ER diagram is
Degree of a Relationship Set - The number of entity sets that participate in a relationship set
is termed as the degree of that relationship set. Thus,
Degree of a relationship set = Number of entity sets participating in a relationship set
On the basis of degree of a relationship set, a relationship set can be classified into the
following types
Unary relationship set
Binary relationship set
Ternary relationship set
N-ary relationship set
Unary Relationship Set - Unary relationship set is a relationship set where only one entity
set participates in a relationship set.
For Example,
Binary Relationship Set - Binary relationship set is a relationship set where two entity sets
participate in a relationship set.
For Example,
Ternary Relationship Set - Ternary relationship set is a relationship set where three entity
sets participate in a relationship set.
For Example,
For example, a person has only one passport and a passport is given to one person.
For example – a customer can place many orders but an order cannot be placed by many
customers.
For example – many students can study in a single college but a student cannot study in many
colleges at the same time.
For example, a student can be assigned to many projects and a project can be assigned to many
students.
Entity versus Attribute - While identifying the attributes of an entity set, it is sometimes not
clear whether a property should be modeled as an attribute or as an entity set.
In such cases we can treat the attribute as an entity. For example, consider the entity set
employee with attributes employee-name and telephone-number. It can easily be argued that a
telephone is an entity in its own right with attributes telephone-number and location (the
office where the telephone is located).
If we take this point of view, we must redefine the employee entity set as:
The employee entity set with attribute employee-name.
The telephone entity set with attributes telephone-number and location.
The relationship set emp-telephone, which denotes the association between employees
and the telephones that they have. Such a conversion of an attribute helps to give extra
information about it when required.
Entity versus Relationship - Sometimes, an entity set can be better expressed in relationship
set. Thus, it is not always clear whether an object is best expressed by an entity set or a
relationship set.
Placing Relationship Attributes - The cardinality ratio in DBMS can help us determine where
to place relationship attributes. It is recommended to associate the attributes of one-to-one
or one-to-many relationship sets with one of the participating entity sets rather than with the
relationship set.
However, if an attribute cannot be determined by a single entity but only by the combination
of participating entities, it is better to associate the attribute with the many-to-many
relationship set.
Key Constraints
A key constraint is a constraint that requires certain attributes to uniquely identify each
tuple of a relation. A relation should have at least one key.
If an attribute is defined as a key, its value must be different for each tuple of the
relation.
In the following Employee table, Employee_ID is a key attribute. No two tuples can have the
same value for this attribute, and it cannot have a NULL value.
Super Key - A super key is a set of attributes or single attribute that uniquely identify the
rows (tuples) in a table.
In the given Student table, we can have the following super keys.
1. {Roll_no}
2. {Registration_no}
3. {Roll_no, Registration_no}
4. {Roll_no, Name}
5. {Name, Registration_no}
6. {Roll_no, Name, Registration_no}
All the above sets of attributes are able to uniquely identify each row. So, each of them is a
super key.
Candidate Key - A candidate key is a minimal super key, i.e., a super key with no redundant
attribute. It is called minimal because removing any attribute from it leaves a set that no
longer uniquely identifies the rows of the table. Candidate keys are selected from the set of
super keys, which means that every candidate key is a super key. Candidate keys are not
allowed to have NULL values.
In the above example, we had 6 super keys, but not all of them can become candidate keys.
Only those super keys which have no redundant attributes are candidate keys.
1. {Roll_no}: This key doesn't have any redundant or repeating attribute. So, it can be
considered as a candidate key.
2. {Registration_no}: This key also doesn't have any repeating attribute. So, it can be
considered as a candidate key.
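The step from super keys to candidate keys can be sketched in Python. The three rows below are hypothetical (the Student table itself is not reproduced here), and the helper is_super_key is an assumed name. Note that checking uniqueness on one sample instance is only an illustration: a real key is a property that must hold for every legal instance of the table.

```python
from itertools import combinations

# Hypothetical rows; two students share a name, so Name alone is not a key.
COLUMNS = ("Roll_no", "Name", "Registration_no")
ROWS = [
    (1, "Amar", "R-101"),
    (2, "Ramesh", "R-102"),
    (3, "Amar", "R-103"),
]


def is_super_key(attrs):
    """True if no two rows agree on every attribute in attrs."""
    idx = [COLUMNS.index(a) for a in attrs]
    projected = [tuple(row[i] for i in idx) for row in ROWS]
    return len(set(projected)) == len(ROWS)


# Try every non-empty subset of the columns.
super_keys = [
    set(c)
    for n in range(1, len(COLUMNS) + 1)
    for c in combinations(COLUMNS, n)
    if is_super_key(c)
]

# A candidate key is a super key with no proper subset that is also a super key.
candidate_keys = [k for k in super_keys if not any(o < k for o in super_keys)]

print(len(super_keys))
print(candidate_keys)
```

On this sample the six super keys listed earlier are found, and only {Roll_no} and {Registration_no} survive the minimality test.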
Primary Key - The primary key is the minimal set of attributes which uniquely identifies any
row of a table. It is selected from a set of candidate keys. Any candidate key can become a
primary key. It depends upon the requirements and is done by the Database Administrator
(DBA). The primary key cannot have a NULL value. It cannot have a duplicate value.
In the above example, we saw that we have two candidate keys, i.e., {Roll_no} and
{Registration_no}. From this set, we can select any key as the primary key for our table. It
depends upon our requirement. Here, if we are talking about a class, then selecting 'Roll_no'
as the primary key is more logical than 'Registration_no'.
Specifying Key Constraints - We can specify constraints at the time of creating the table
using CREATE TABLE statement. We can also specify the constraints after creating a table
using ALTER TABLE statement.
Consider the creation of the Students table:
CREATE TABLE Students ( sid CHAR(20),
name CHAR(30),
login CHAR(20),
age INTEGER,
gpa REAL,
UNIQUE (name, age),
CONSTRAINT StudentsKey PRIMARY KEY (sid))
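To see the key constraints being enforced, the statement can be run against an in-memory SQLite database from Python. This is a minimal sketch: SQLite is chosen here only because it ships with Python, and the sample rows are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Students (
        sid   CHAR(20),
        name  CHAR(30),
        login CHAR(20),
        age   INTEGER,
        gpa   REAL,
        UNIQUE (name, age),
        CONSTRAINT StudentsKey PRIMARY KEY (sid)
    )
""")
# Hypothetical first row; it is accepted.
conn.execute("INSERT INTO Students VALUES ('53666', 'Jones', 'jones@cs', 18, 3.4)")

rejected = []
for row in [
    ("53666", "Smith", "smith@ee", 19, 3.2),  # repeats the primary key sid
    ("53688", "Jones", "jones@ee", 18, 3.8),  # repeats the (name, age) pair
]:
    try:
        conn.execute("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", row)
    except sqlite3.IntegrityError as exc:
        rejected.append(str(exc))

count = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
print(count, rejected)
```

Both later inserts are refused, so the table still holds a single row: one insert violates the primary key and the other violates the UNIQUE (name, age) constraint.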
One thing to note here is that the foreign key need not be the primary key of the table in
which it appears, but it should refer to the primary key of another table. In the above
example, Course_id is not a primary key in the Student table but it is a primary key in the
Course table.
Specifying Foreign Key Constraints –
CREATE TABLE Course (Course_id INTEGER,
Course_name CHAR(20),
Duration_months INTEGER,
PRIMARY KEY (Course_id))
ALTER TABLE Student ADD FOREIGN KEY (Course_id) REFERENCES Course (Course_id)
The foreign key constraint states that every Course_id value in Student must also appear in
Course, that is, Course_id in Student is a foreign key referencing Course.
General Constraints
Domain, primary key, and foreign key constraints are considered to be a fundamental part of
the relational data model and are given special attention in most commercial systems.
Sometimes, however, it is necessary to specify more general constraints.
For example, we may require that student ages be within a certain range of values; given such
an IC specification, the DBMS will reject inserts and updates that violate the constraint. This
is very useful in preventing data entry errors.
If we specify that all students must be at least 20 years old, then only instances in which
every student is aged 20 or above are valid, i.e., legal instances. Any instance containing a
student younger than 20 is invalid, i.e., an illegal instance.
Current relational database systems support such general constraints in the form of table
constraints and assertions. Table constraints are associated with a single table and are
checked whenever that table is modified. In contrast, assertions involve several tables and are
checked whenever any of these tables is modified.
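A table constraint of this kind can be sketched with a CHECK clause. The example below runs on an in-memory SQLite database; the cut-down Students table and its rows are assumptions for the illustration, and the bound of 20 follows the age example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table constraint: every row must satisfy age >= 20.
conn.execute("""
    CREATE TABLE Students (
        sid  CHAR(20) PRIMARY KEY,
        name CHAR(30),
        age  INTEGER,
        CHECK (age >= 20)
    )
""")
conn.execute("INSERT INTO Students VALUES ('s1', 'Latha', 21)")  # legal row

try:
    conn.execute("INSERT INTO Students VALUES ('s2', 'Amar', 18)")  # violates CHECK
    accepted = True
except sqlite3.IntegrityError:
    accepted = False

try:
    conn.execute("UPDATE Students SET age = 19 WHERE sid = 's1'")  # also violates CHECK
    updated = True
except sqlite3.IntegrityError:
    updated = False

ages = [r[0] for r in conn.execute("SELECT age FROM Students")]
print(accepted, updated, ages)
```

The constraint is checked on both the insert and the update, so the table never enters an illegal instance.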
For example, we have two entities, Student and Teacher. The attributes of the entity Student
are Name, Address and Grade. The attributes of the entity Teacher are Name, Address and Salary.
These two entities have two common attributes, Name and Address, so we can make a
generalized entity with these common attributes.
We have created a new generalized entity Person, and this entity has the common attributes of
both the entities. As you can see in the following ER diagram, after the generalization
process the entities Student and Teacher have only the specialized attributes Grade and Salary
respectively, and their common attributes (Name and Address) are now associated with a new
entity Person which is in a relationship with both the entities (Student and Teacher).
Specialization
In specialization, an entity is divided into sub-entities based on their characteristics. It is a
top-down approach where higher level entity is specialized into two or more lower level
entities.
For Example, there is an entity in the School database, whose name is Teacher.
The Teacher entity contains three attributes, whose names are Name, Age, and Salary.
This Teacher entity can be further broken into three entities, i.e., Math_Teacher,
English_Teacher, and Science_Teacher. These sub-entities are the three types of teachers
working in the school, and all have common attributes which are associated with the parent
entity Teacher.
Aggregation
In aggregation, the relationship between two entities is treated as a single entity: a
relationship together with its participating entities is aggregated into a higher level entity.
For example, the relationship in which the Center entity offers the Course entity acts as a
single entity, which is in turn in a relationship with another entity, Visitor. In the real
world, if a visitor visits a coaching center, he will not enquire about the Course alone or
about the Center alone; instead he will enquire about both.
Generalization vs Specialization
Generalization Specialization
Generalization is a bottom-up approach. Specialization is a top-down approach.
Generalization collects the common features of multiple entities to form a new entity. Specialization divides an entity to form multiple new entities that inherit some features of the splitting entity.
In Generalization, the higher level entity must have lower level entities. In Specialization, the higher level entity may not have lower level entities.
In Generalization, schema size reduces. In Specialization, schema size increases.
Generalization is applied to a group of entities. Specialization is applied to a single entity.
Generalization forms a single entity from multiple entities. Specialization forms multiple entities from a single entity.
Inheritance is not used in generalization. Inheritance can be used in specialization.
UNIT-III
RELATIONAL ALGEBRA
Relational Algebra is a procedural query language, which takes relations as input and
generates a relation as output.
Relational algebra works on the relational model. The purpose of a query language is to
retrieve data from a database or perform various operations such as insert, update, delete on
the data.
When we say that relational algebra is a procedural query language, it means that it tells what
data is to be retrieved and how it is to be retrieved.
On the other hand, relational calculus is a non-procedural query language, which means it tells
what data is to be retrieved but doesn't tell how to retrieve it.
Project Operator (∏) - Project operator is denoted by ∏ symbol and it is used to select
desired columns (or attributes) from a table (or relation). It eliminates duplicates.
ROLL NAME
1 Amar
2 Ramesh
3 Latha
4 Sandya
RENAME
Rename (ρ) - Rename (ρ) operation can be used to rename a relation or an attribute of a
relation. Rename operation is denoted by "Rho"(ρ).
Suppose we are fetching the names of students from STUDENT relation. We would like to
rename this relation as STUDENT_NAME.
ρ(STUDENT_NAME,∏ NAME(STUDENT))
STUDENT_NAME
NAME
Amar
Ramesh
Latha
Sandya
As we can see, this output relation is named "STUDENT_NAME".
The NAME and AGE columns of the STUDENT table are renamed as SNAME and SAGE respectively:
ρ SNAME,SAGE (∏ NAME,AGE(STUDENT))
SET OPERATIONS
Set Operations are Union (∪), Set Intersection (∩), Set Difference (-) and Cartesian product
(X).
Union Operator (∪)
Union operation is done by the Union Operator, which is represented by "union" (∪). It is the
same as the union operator from set theory, i.e., it selects all tuples from both relations,
with the restriction that both relations must have the same set of attributes. It is a binary
operator as it requires two operands.
If the relations don't have the same set of attributes, they are not union-compatible and
their union is not defined.
The Syntax of Union Operator (∪) is
table_name1 ∪ table_name2
The rows (tuples) that are present in both the tables will only appear once in the union set. In
short we can say that there are no duplicates present after the union operation.
For Example, Consider two relations STUDENT and EMPLOYEE
STUDENT
ROLL NAME AGE
1 Amar 20
2 Ramesh 18
3 Latha 19
4 Sandya 20
EMPLOYEE
EMPLOYEE_NO NAME AGE
E-1 Amar 30
E-2 Ramya 33
E-3 Lokesh 29
E-4 Harsha 35
Suppose we want all the names from STUDENT and EMPLOYEE relation.
∏ NAME(STUDENT) ∪ ∏ NAME(EMPLOYEE)
Then the output is
NAME
Amar
Ramesh
Latha
Sandya
Ramya
Lokesh
Harsha
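The union above can be reproduced with Python sets, which drop duplicates in the same way a relation does. The rows are copied from the STUDENT and EMPLOYEE tables shown earlier; the helper name project_name is an assumption made for this sketch.

```python
STUDENT = [(1, "Amar", 20), (2, "Ramesh", 18), (3, "Latha", 19), (4, "Sandya", 20)]
EMPLOYEE = [("E-1", "Amar", 30), ("E-2", "Ramya", 33),
            ("E-3", "Lokesh", 29), ("E-4", "Harsha", 35)]


def project_name(relation):
    """Projection on NAME (column 1); building a set removes duplicate tuples."""
    return {(row[1],) for row in relation}


# Both projections have the single attribute NAME, so they are union-compatible.
names = project_name(STUDENT) | project_name(EMPLOYEE)
print(sorted(names))
```

Amar occurs in both relations but appears only once in the result, which is why the output has seven names rather than eight.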
BB 200 ZZ 101
CC 300 XX 99
CC 300 YY 11
CC 300 ZZ 101
DEPARTMENT
D_NO D_NAME E_NO MIN_EXPERIENCE
D-1 HR E-1 03
D-2 IT E-2 05
D-3 Marketing E-3 02
It will be much easier to understand Join Operations when we have the Cartesian product.
The Cartesian Product of the above two relations is
E_NO E_NAME CITY EXPERIENCE D_NO D_NAME E_NO MIN_EXPERIENCE
E-1 Ram Delhi 04 D-1 HR E-1 03
E-1 Ram Delhi 04 D-2 IT E-2 05
E-1 Ram Delhi 04 D-3 Marketing E-3 02
E-2 Varun Chandigarh 09 D-1 HR E-1 03
E-2 Varun Chandigarh 09 D-2 IT E-2 05
E-2 Varun Chandigarh 09 D-3 Marketing E-3 02
E-3 Ravi Noida 03 D-1 HR E-1 03
E-3 Ravi Noida 03 D-2 IT E-2 05
E-3 Ravi Noida 03 D-3 Marketing E-3 02
E-4 Amit Bangalore 07 D-1 HR E-1 03
E-4 Amit Bangalore 07 D-2 IT E-2 05
E-4 Amit Bangalore 07 D-3 Marketing E-3 02
Theta Join (⋈θ) or Conditional Join (⋈c): Conditional Join is used when you want to join
two relations based on some condition.
Notation: R ⋈θ S, where R is the first relation, S is the second relation and θ is the join condition.
Example: we want a relation where EXPERIENCE from EMPLOYEE >=
MIN_EXPERIENCE from DEPARTMENT.
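E_NO E_NAME CITY EXPERIENCE D_NO D_NAME E_NO MIN_EXPERIENCE
E-1 Ram Delhi 04 D-1 HR E-1 03
E-1 Ram Delhi 04 D-3 Marketing E-3 02
E-2 Varun Chandigarh 09 D-1 HR E-1 03
E-2 Varun Chandigarh 09 D-2 IT E-2 05
E-2 Varun Chandigarh 09 D-3 Marketing E-3 02
E-3 Ravi Noida 03 D-1 HR E-1 03
E-3 Ravi Noida 03 D-3 Marketing E-3 02
E-4 Amit Bangalore 07 D-1 HR E-1 03
E-4 Amit Bangalore 07 D-2 IT E-2 05
E-4 Amit Bangalore 07 D-3 Marketing E-3 02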
Equijoin(⋈): Equijoin is a special case of conditional join where only equality condition
holds between a pair of attributes.
A non-equijoin is the inverse of an Equi join, which occurs when you join on a condition
other than "=".
Example: we would like to join EMPLOYEE and DEPARTMENT relation where E_NO
from EMPLOYEE = E_NO from DEPARTMENT.
Natural Join (⋈): A Natural Join can be performed only if two relations share at least one
common attribute. Furthermore, the attributes must share the same name and domain.
Natural join combines the tuples whose values for the common attributes are the same in both
relations, and keeps only one copy of each common attribute in the result.
Notation: R ⋈ S, where R is the first relation and S is the second relation.
EMPLOYEE ⋈ DEPARTMENT
Result is:
E_NO E_NAME CITY EXPERIENCE D_NO D_NAME MIN_EXPERIENCE
E-1 Ram Delhi 04 D-1 HR 03
E-2 Varun Chandigarh 09 D-2 IT 05
E-3 Ravi Noida 03 D-3 Marketing 02
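The natural join can be sketched in Python with each tuple held as a dictionary keyed by attribute name. The rows are those of the EMPLOYEE and DEPARTMENT relations above; the function name natural_join is an assumption, and this nested-loop version is meant to show the definition rather than an efficient algorithm.

```python
EMPLOYEE = [
    {"E_NO": "E-1", "E_NAME": "Ram", "CITY": "Delhi", "EXPERIENCE": 4},
    {"E_NO": "E-2", "E_NAME": "Varun", "CITY": "Chandigarh", "EXPERIENCE": 9},
    {"E_NO": "E-3", "E_NAME": "Ravi", "CITY": "Noida", "EXPERIENCE": 3},
    {"E_NO": "E-4", "E_NAME": "Amit", "CITY": "Bangalore", "EXPERIENCE": 7},
]
DEPARTMENT = [
    {"D_NO": "D-1", "D_NAME": "HR", "E_NO": "E-1", "MIN_EXPERIENCE": 3},
    {"D_NO": "D-2", "D_NAME": "IT", "E_NO": "E-2", "MIN_EXPERIENCE": 5},
    {"D_NO": "D-3", "D_NAME": "Marketing", "E_NO": "E-3", "MIN_EXPERIENCE": 2},
]


def natural_join(r, s):
    """Pair tuples that agree on every shared attribute name, keeping one copy of each."""
    result = []
    for t in r:
        for u in s:
            common = t.keys() & u.keys()
            if all(t[a] == u[a] for a in common):
                result.append({**t, **u})  # merging dicts keeps a single E_NO
    return result


joined = natural_join(EMPLOYEE, DEPARTMENT)
for row in joined:
    print(row)
```

E-4 has no department with a matching E_NO, so it does not appear, and each output row has seven attributes because the shared E_NO column is kept once.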
Outer Join
Unlike Inner Join which includes the tuple that satisfies the given condition, Outer Join also
includes some/all the tuples which don't satisfy the given condition.
It is also of three types:
Left Outer Join
Right Outer Join
Full Outer Join
Left Outer Join: Left Outer Join returns the matching tuples (tuples present in both
relations) and the tuples which are only present in the Left Relation, here R.
However, if a tuple of R has no matching tuple in S, then the attributes/columns of the Right
Relation, here S, are made NULL for it in the output relation.
Example:
EMPLOYEE ⟕EMPLOYEE.E_NO = DEPARTMENT.E_NO DEPARTMENT
Here we are combining EMPLOYEE and DEPARTMENT relation with the constraint that
EMPLOYEE's E_NO must be equal to DEPARTMENT's E_NO.
Result is:
E_NO E_NAME CITY EXPERIENCE D_NO D_NAME MIN_EXPERIENCE
E-1 Ram Delhi 04 D-1 HR 03
E-2 Varun Chandigarh 09 D-2 IT 05
E-3 Ravi Noida 03 D-3 Marketing 02
E-4 Amit Bangalore 07 NULL NULL NULL
We can see here that all the tuples from the left relation, i.e., EMPLOYEE, are present. E-4
does not satisfy the given condition, i.e., E_NO from EMPLOYEE must be equal to E_NO from
DEPARTMENT, yet it is still included in the output relation. This is because Outer Join also
includes tuples which don't satisfy the condition. That is why the DEPARTMENT attributes in
E-4's row are marked as NULL.
Right Outer Join: Right Outer Join returns the matching tuples and the tuples which are
only present in the Right Relation, here S.
As with the Left Outer Join, if a tuple of S has no matching tuple in R, then the
attributes of the Left Relation, here R, are made NULL for it in the output relation.
We will combine EMPLOYEE and DEPARTMENT relations with the same constraint as
above.
EMPLOYEE ⟖EMPLOYEE.E_NO = DEPARTMENT.E_NO DEPARTMENT
Result is:
E_NO E_NAME CITY EXPERIENCE D_NO D_NAME MIN_EXPERIENCE
E-1 Ram Delhi 04 D-1 HR 03
E-2 Varun Chandigarh 09 D-2 IT 05
E-3 Ravi Noida 03 D-3 Marketing 02
As all the tuples from the DEPARTMENT relation have a corresponding E_NO in the EMPLOYEE
relation, no EMPLOYEE attribute is NULL in the output.
Full Outer Join: Full Outer Join returns all the tuples from both relations. However, if a
tuple has no matching tuple in the other relation, the attributes of the other relation are
made NULL in the output relation.
Again, combine the EMPLOYEE and DEPARTMENT relation with the same constraint.
EMPLOYEE ⟗EMPLOYEE.E_NO = DEPARTMENT.E_NO DEPARTMENT
Result is:
E_NO E_NAME CITY EXPERIENCE D_NO D_NAME MIN_EXPERIENCE
E-1 Ram Delhi 04 D-1 HR 03
E-2 Varun Chandigarh 09 D-2 IT 05
E-3 Ravi Noida 03 D-3 Marketing 02
E-4 Amit Bangalore 07 NULL NULL NULL
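The three outer joins differ only in which side keeps its unmatched tuples, which a single function can illustrate. The sketch below reuses the EMPLOYEE and DEPARTMENT rows from above and uses None in place of NULL; the function outer_join and its two flags are assumed names for this illustration.

```python
EMPLOYEE = [
    {"E_NO": "E-1", "E_NAME": "Ram", "CITY": "Delhi", "EXPERIENCE": 4},
    {"E_NO": "E-2", "E_NAME": "Varun", "CITY": "Chandigarh", "EXPERIENCE": 9},
    {"E_NO": "E-3", "E_NAME": "Ravi", "CITY": "Noida", "EXPERIENCE": 3},
    {"E_NO": "E-4", "E_NAME": "Amit", "CITY": "Bangalore", "EXPERIENCE": 7},
]
DEPARTMENT = [
    {"D_NO": "D-1", "D_NAME": "HR", "E_NO": "E-1", "MIN_EXPERIENCE": 3},
    {"D_NO": "D-2", "D_NAME": "IT", "E_NO": "E-2", "MIN_EXPERIENCE": 5},
    {"D_NO": "D-3", "D_NAME": "Marketing", "E_NO": "E-3", "MIN_EXPERIENCE": 2},
]
EMP_COLS = ("E_NO", "E_NAME", "CITY", "EXPERIENCE")
DEPT_COLS = ("D_NO", "D_NAME", "MIN_EXPERIENCE")


def outer_join(keep_left, keep_right):
    """Equijoin on E_NO; unmatched tuples from a kept side are padded with None."""
    rows, matched_depts = [], set()
    for e in EMPLOYEE:
        partners = [d for d in DEPARTMENT if d["E_NO"] == e["E_NO"]]
        for d in partners:
            rows.append({**e, **d})
            matched_depts.add(d["D_NO"])
        if keep_left and not partners:
            # No department matches: pad the DEPARTMENT columns with None.
            rows.append({**e, **dict.fromkeys(DEPT_COLS)})
    if keep_right:
        for d in DEPARTMENT:
            if d["D_NO"] not in matched_depts:
                # No employee matches: pad the EMPLOYEE columns with None.
                rows.append({**dict.fromkeys(EMP_COLS), **d})
    return rows


left = outer_join(keep_left=True, keep_right=False)
right = outer_join(keep_left=False, keep_right=True)
full = outer_join(keep_left=True, keep_right=True)
print(len(left), len(right), len(full))
```

On this data the left and full outer joins return four rows (E-4 padded with None), while the right outer join returns three because every department has a matching employee.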
DIVISION
Division (÷) - Division Operation is represented by "division"(÷ or /) operator and is used in
queries that involve keywords "every", "all", etc.
Notation : R(X,Y)/S(Y)
Here, R is the first relation from which data is retrieved.
S is the second relation that will help to retrieve the data.
X and Y are the attributes/columns present in the relations. We can have multiple attributes
in a relation, but keep in mind that the attributes of S must be a proper subset of the
attributes of R.
The above notation returns each value of X for which a tuple <X,Y> exists in R for every
value of Y in S.
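Division can be sketched directly from this definition. In the Python below, R is a set of (X, Y) pairs and S is a set of Y values; the function name divide and the course-completion data are hypothetical and chosen only to show an "every" style query.

```python
def divide(r, s):
    """R(X, Y) / S(Y): the X values that are paired in R with every Y value of S."""
    xs = {x for x, _ in r}
    return {x for x in xs if all((x, y) in r for y in s)}


# Hypothetical data: which student (X) has completed which course (Y).
COMPLETED = {
    ("Amar", "DBMS"), ("Amar", "OS"),
    ("Latha", "DBMS"),
    ("Ramesh", "OS"), ("Ramesh", "DBMS"), ("Ramesh", "CN"),
}
REQUIRED = {"DBMS", "OS"}

# "Find the students who have completed every required course."
print(sorted(divide(COMPLETED, REQUIRED)))
```

Latha is excluded because there is no (Latha, OS) pair, while Ramesh is included even though he has an extra course: division only requires that every value of S is covered.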
Right Outer Join (⟖) returns the matching tuples and tuples which are only present in the
right relation.
Full Outer Join (⟗) returns all the tuples present in the left and right relations.
(Q1) Find the names of sailors who have reserved boat 103.
We first compute the set of tuples in Reserves with bid = 103 and then take the natural join of
this set with Sailors. Evaluated on the instances R2 and S3, it yields a relation that contains
just one field, called sname, and three tuples Dustin, Horatio, and Lubber.
This query involves a series of two joins. First we choose (tuples describing) red boats. Then
we join this set with Reserves (natural join, with equality specified on the bid column) to
identify reservations of red boats. Next we join the resulting intermediate relation with
Sailors (natural join, with equality specified on the sid column) to retrieve the names of
sailors who have made reservations of red boats. Finally, we project the sailors' names. The
answer, when evaluated on the instances B1, R2 and S3, contains the names Dustin, Horatio,
and Lubber.
This query is very similar to the query we used to compute sailors who reserved red boats. On
instances B1, R2, and S3, the query will return the colors green and red.
The join of Sailors and Reserves creates an intermediate relation in which tuples consist of a
Sailors tuple attached to a Reserves tuple. A Sailors tuple appears in (some tuple of) this
intermediate relation only if at least one Reserves tuple has the same sid value, that is, the
sailor has made some reservation. The answer, when evaluated on the instances B1, R2 and
S3, contains the three tuples Dustin, Horatio, and Lubber. Even though there are two sailors
called Horatio who have reserved a boat, the answer contains only one copy of the tuple
Horatio, because the answer is a relation, i.e., a set of tuples, without any duplicates.
We identify the set of all boats that are either red or green (Tempboats, which contains boats
with the bids 102, 103, and 104 on instances B1, R2, and S3). Then we join with Reserves to
identify sids of sailors who have reserved one of these boats; this gives us sids 22, 31, 64, and
74 over our example instances. Finally, we join (an intermediate relation containing this set
of sids) with Sailors to find the names of Sailors with these sids. This gives us the names
Dustin, Horatio, and Lubber on the instances B1, R2, and S3.
First we compute tuples of the form 〈sid, sname, bid〉, where sailor sid has made a
reservation for boat bid; this set of tuples is the temporary relation Reservations. Next we find
all pairs of Reservations tuples where the same sailor has made both reservations and the
boats involved are distinct. The central idea is that, in order to show that a sailor has reserved
two boats, we must find two Reservations tuples involving the same sailor but distinct boats.
Over instances B1, R2, and S3, the sailors with sids 22, 31, and 64 have each reserved at least
two boats. Finally, we project the names of such sailors to obtain the answer, containing the
names Dustin, Horatio, and Lubber.
This query illustrates the use of the set-difference operator. We use the fact that sid is the key
for Sailors. We first identify sailors aged over 20 (over instances B1, R2, and S3, sids 22, 29,
31, 32, 58, 64, 74, 85, and 95) and then discard those who have reserved a red boat (sids 22,
31, and 64), to obtain the answer (sids 29, 32, 58, 74, 85, and 95). If we want to compute the
names of such sailors, we must first compute their sids (as shown above), and then join with
Sailors and project the sname values.
The intermediate relation Tempsids is defined using division, and computes the set of sids of
sailors who have reserved every boat (over instances B1, R2, and S3, this is just sid 22). We
define the two relations that the division operator (/) is applied to—the first relation has the
schema (sid,bid) and the second has the schema (bid). Division then returns all sids such that
there is a tuple 〈sid,bid〉 in the first relation for each bid in the second. Joining Tempsids
with Sailors is necessary to associate names with the selected sids; for sailor 22, the name is
Dustin.
The only difference with respect to the previous query is that now we apply a selection to
Boats, to ensure that we compute only bids of boats named Interlake in defining the second
argument to the division operator. Over instances B1, R2, and S3, Tempsids evaluates to sids
22 and 64, and the answer contains their names, Dustin and Horatio.
RELATIONAL CALCULUS
Relational calculus is a non-procedural query language that tells the system what data is to
be retrieved but doesn't tell how to retrieve it. It uses mathematical predicate calculus.
There are two types of Relational Calculus:
Tuple Relational Calculus
Domain Relational Calculus
TUPLE RELATIONAL CALCULUS (TRC)
Tuple relational calculus was originally proposed by Codd in the year 1972. Tuple
relational calculus is used for selecting those tuples that satisfy the given condition.
1: Find the First_Name, Last_Name and Age of students whose age is greater than or equal to 27.
{t| t ∈ STUDENT ∧ t[Age]>=27} or { t | STUDENT(t) AND t.age >=27 }
Resulting relation:
First_Name Last_Name Age
Narendra Chari 28
Swapna Kumari 30
Navya Kumari 29
2: Query to display the first name of those students where age is greater than 29
{ t.First_Name | STUDENT(t) AND t.age > 29 }
Resulting relation:
First_Name
Swapna
3: Query to display all the details of students where Last name is ‘Kumari’
{ t | STUDENT(t) AND t.Last_Name = 'Kumari' }
Resulting relation:
First_Name Last_Name Age
Swapna Kumari 30
Navya Kumari 29
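Tuple relational calculus expressions read much like set comprehensions, so the three queries above can be mirrored in Python. The STUDENT instance below is an assumption: the three rows with ages 28 to 30 follow the results shown, while the row for "Kiran Rao" is invented so that at least one tuple fails each condition.

```python
from collections import namedtuple

Student = namedtuple("Student", ["First_Name", "Last_Name", "Age"])

# Hypothetical instance of the STUDENT relation.
STUDENT = {
    Student("Kiran", "Rao", 25),
    Student("Narendra", "Chari", 28),
    Student("Swapna", "Kumari", 30),
    Student("Navya", "Kumari", 29),
}

# { t | STUDENT(t) AND t.Age >= 27 }
q1 = {t for t in STUDENT if t.Age >= 27}

# { t.First_Name | STUDENT(t) AND t.Age > 29 }
q2 = {t.First_Name for t in STUDENT if t.Age > 29}

# { t | STUDENT(t) AND t.Last_Name = 'Kumari' }
q3 = {t for t in STUDENT if t.Last_Name == "Kumari"}

print(sorted(q1))
print(q2)
print(sorted(q3))
```

Each comprehension states only the condition a tuple must satisfy, not how to search for it, which is the non-procedural character of the calculus.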
Example: Query to find the first name and age of students where student age is greater than
27
{< First_Name, Age > | ∃ Last_Name (< First_Name, Last_Name, Age > ∈ STUDENT ∧ Age > 27)}
Resulting relation:
First_Name Age
Narendra 28
Swapna 30
Navya 29
We will present a number of sample queries using the following table definitions:
Sailors(sid: integer, sname: string, rating: integer, age: real)
Boats(bid: integer, bname: string, color: string)
Reserves(sid: integer, bid: integer, day: date)
Example:
1. Find the names and ages of all Sailors
SELECT DISTINCT sname, age FROM Sailors (or)
SELECT DISTINCT S.sname, S.age FROM Sailors S
Answer is:
The answer is a set of rows, each of which is a pair (sname, age). If two or more sailors have
the same name and age, the answer still contains just one pair with that name and age. This
query is equivalent to applying the projection operator of relational algebra.
3. Find the names of sailors who have reserved boat number 103.
It can be expressed in SQL as follows.
SELECT S.sname FROM Sailors S, Reserves R WHERE S.sid = R.sid AND R.bid=103
Let us compute the answer to above query on the instances R3 of Reserves and S4 of Sailors
shown below
The first step is to construct the cross-product S4 x R3, which is shown below
The second step is to apply the qualification S.sid = R.sid AND R.bid=103.
Then the Result is:
sname
rusty
5. Find the names of sailors who have reserved at least one boat.
SELECT S.sname FROM Sailors S, Reserves R WHERE S.sid = R.sid
Ex: Find the ages of sailors whose name begins and ends with B and has at least three
characters.
SELECT S.age FROM Sailors S WHERE S.sname LIKE 'B_%B'
Union
The SQL Union operation is used to combine the result of two or more SQL SELECT
queries. The union operation eliminates the duplicate rows from its result set.
Syntax is
SELECT column_list FROM table1 where condition
UNION
SELECT column_list FROM table2 where condition
For Example,
SELECT * FROM STUDENT1
UNION
SELECT * FROM STUDENT2
The Result is:
ID NAME
1 Ravi
2 Gopal
3 Swapna
4 Kalyan
5 Ram
Union All
Union All operation is similar to the Union operation, but it returns the result without
removing duplicates and without sorting the data.
Syntax is
SELECT column_list FROM table1 where condition
UNION ALL
SELECT column_list FROM table2 where condition
For Example,
SELECT * FROM STUDENT1
UNION ALL
SELECT * FROM STUDENT2
The Result is:
ID NAME
1 Ravi
2 Gopal
3 Swapna
3 Swapna
4 Kalyan
5 Ram
Intersect
It is used to combine two SELECT statements. The Intersect operation returns the common
rows from both the SELECT statements.
Syntax is
SELECT column_list FROM table1 where condition
INTERSECT
SELECT column_list FROM table2 where condition
For Example,
SELECT * FROM STUDENT1
INTERSECT
SELECT * FROM STUDENT2
The Result is:
ID NAME
3 Swapna
Minus
It combines the result of two SELECT statements. Minus operator is used to display the rows
which are present in the first query but absent in the second query.
Syntax is
SELECT column_list FROM table1 where condition
MINUS
SELECT column_list FROM table2 where condition
For Example,
SELECT * FROM STUDENT1
MINUS
SELECT * FROM STUDENT2
The Result is:
ID NAME
1 Ravi
2 Gopal
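The four set operations can be tried side by side. The following is a minimal sketch using Python's built-in sqlite3 module; the contents of STUDENT1 and STUDENT2 are an assumption inferred from the result tables above, and SQLite writes MINUS as EXCEPT.

```python
import sqlite3

# Hypothetical table contents, inferred from the result tables in the text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STUDENT1 (ID INTEGER, NAME TEXT);
    CREATE TABLE STUDENT2 (ID INTEGER, NAME TEXT);
    INSERT INTO STUDENT1 VALUES (1, 'Ravi'), (2, 'Gopal'), (3, 'Swapna');
    INSERT INTO STUDENT2 VALUES (3, 'Swapna'), (4, 'Kalyan'), (5, 'Ram');
""")

def run(op):
    # ORDER BY only makes the row order deterministic for comparison.
    sql = f"SELECT * FROM STUDENT1 {op} SELECT * FROM STUDENT2 ORDER BY ID"
    return conn.execute(sql).fetchall()

union_rows = run("UNION")          # duplicates removed
union_all_rows = run("UNION ALL")  # duplicates kept
intersect_rows = run("INTERSECT")  # rows common to both
minus_rows = run("EXCEPT")         # SQLite's spelling of MINUS

for name, rows in [("UNION", union_rows), ("UNION ALL", union_all_rows),
                   ("INTERSECT", intersect_rows), ("MINUS", minus_rows)]:
    print(name, rows)
```

Only UNION ALL keeps the row (3, Swapna) twice; the other three operators treat their inputs as sets.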
Examples:
1. Find the names of sailors who have reserved a red or a green boat.
SELECT S1.sname FROM Sailors S1, Boats B1, Reserves R1
WHERE S1.sid = R1.sid AND R1.bid = B1.bid AND B1.color = 'red'
UNION
SELECT S2.sname FROM Sailors S2, Boats B2, Reserves R2
WHERE S2.sid = R2.sid AND R2.bid = B2.bid AND B2.color = 'green'
2. Find the names of sailors who have reserved a red and a green boat.
SELECT S1.sname FROM Sailors S1, Boats B1, Reserves R1
WHERE S1.sid = R1.sid AND R1.bid = B1.bid AND B1.color = 'red'
INTERSECT
SELECT S2.sname FROM Sailors S2, Boats B2, Reserves R2
WHERE S2.sid = R2.sid AND R2.bid = B2.bid AND B2.color = 'green'
3. Find the sids of all sailors who have reserved red boats but not green boats.
The ALL operator compares a value from the outer query with every value in the inner
query's result and returns the row only if the comparison holds for all of those values.
The ANY operator compares a value from the outer query with the values in the inner
query's result and returns the row if the comparison holds for at least one of them.
Example : IN
Find the names of sailors who have reserved boat 103.
SELECT S.sname FROM Sailors S
WHERE S.sid IN
( SELECT R.sid FROM Reserves R WHERE R.bid = 103 )
Result is
sname
Dustin
Lubber
Horatio
Example : NOT IN
Find the names of sailors who have not reserved boat 103.
SELECT S.sname FROM Sailors S
WHERE S.sid NOT IN
( SELECT R.sid FROM Reserves R WHERE R.bid = 103 )
Result is
sname
Brutus
Andy
Rusty
Horatio
Zorba
Art
Bob
Find the names of sailors who have not reserved a red boat.
SELECT S.sname FROM Sailors S
WHERE S.sid NOT IN
( SELECT R.sid FROM Reserves R
WHERE R.bid IN
( SELECT B.bid FROM Boats B WHERE B.color = 'red' ) )
Result is
sname
Brutus
Andy
Rusty
Zorba
Horatio
Art
Bob
Set-Comparison Operators
SQL also supports op ANY and op ALL, where op is one of the arithmetic comparison
operators {<, <=, =, <>, >=, >}.
Example : ALL
Find the Sailors with the highest rating.
SELECT S.sname FROM Sailors S
WHERE S.rating >= ALL
( SELECT S2.rating FROM Sailors S2 )
Result is
sname
Rusty
Zorba
Example : ANY
Find sailors whose rating is better than some sailor called Andy.
SELECT S.sname FROM Sailors S
WHERE S.rating > ANY
( SELECT S2.rating FROM Sailors S2 WHERE S2.sname = 'Andy' )
Result is
sname
Rusty
Zorba
Horatio
Example : EXISTS
Find the names of sailors who have reserved boat number 103.
SELECT S.sname FROM Sailors S
WHERE EXISTS
( SELECT * FROM Reserves R WHERE R.bid = 103 AND R.sid = S.sid )
Result is
sname
Dustin
Lubber
Horatio
COUNT() - The COUNT function is used to count the number of rows in a database table. It
can work on both numeric and non-numeric data types.
The Syntax is
COUNT(*) OR COUNT([DISTINCT] COLUMN_NAME)
COUNT(*) returns the total number of rows in a given table.
COUNT(COLUMN_NAME) returns the total number of non-null values present in the
column which is passed as an argument to the function.
For Example,
SELECT COUNT(*) FROM EMP - Returns the total number of records, i.e., 6.
SELECT COUNT(Salary) FROM EMP - Returns the number of non-null values in the
column Salary, i.e., 5.
SELECT COUNT(DISTINCT Salary) FROM EMP - Returns the number of distinct non-null
values in the column Salary, i.e., 4.
SUM() - The SUM function is used to calculate the sum of the values in the selected column.
It works on numeric fields only.
The Syntax is
SUM([DISTINCT]COLUMN_NAME)
For Example,
SELECT SUM(Salary) FROM EMP - Sum all Non Null values of Column salary i.e., 310
SELECT SUM(DISTINCT Salary) FROM EMP - Sum of all distinct Non-Null values
i.e., 250.
AVG() - The AVG function is used to calculate the average value of the numeric type. AVG
function returns the average of all non-Null values.
The Syntax is
AVG([DISTINCT]COLUMN_NAME)
For Example,
SELECT AVG(Salary) FROM EMP - Average of all non-null values of the column Salary, i.e.,
310/5 = 62.
SELECT AVG(DISTINCT Salary) FROM EMP - Average of all distinct non-null values,
i.e., 250/4 = 62.5.
MIN() - MIN function is used to find the minimum value of a certain column. This function
determines the smallest value of all selected values of a column.
The Syntax is
MIN(COLUMN_NAME)
For Example,
SELECT MIN(Salary) FROM EMP - Minimum value in the salary column except NULL i.e.,
40.
MAX() - MAX function is used to find the maximum value of a certain column. This
function determines the largest value of all selected values of a column.
The Syntax is
MAX(COLUMN_NAME)
For Example,
SELECT MAX(Salary) FROM EMP - Maximum value in the salary i.e., 80.
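The figures quoted for the aggregate functions can be checked with a short script. This is a sketch under an assumption: the six salaries below are hypothetical values chosen so that they reproduce the counts, sums and averages stated above (one NULL and one duplicate value).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (Id INTEGER, Salary INTEGER)")
# Hypothetical salaries: six rows, one NULL, one duplicate (60).
conn.executemany("INSERT INTO EMP VALUES (?, ?)",
                 [(1, 80), (2, 40), (3, 60), (4, 70), (5, 60), (6, None)])

row = conn.execute("""
    SELECT COUNT(*), COUNT(Salary), COUNT(DISTINCT Salary),
           SUM(Salary), SUM(DISTINCT Salary),
           AVG(Salary), AVG(DISTINCT Salary),
           MIN(Salary), MAX(Salary)
    FROM EMP
""").fetchone()
print(row)
```

Every aggregate except COUNT(*) skips the NULL salary, which is why COUNT(*) and COUNT(Salary) differ by one.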
We use aggregate functions such as COUNT(), MAX(), MIN(), SUM(), AVG(), etc., in
the SELECT query. The GROUP BY clause returns a single row for each distinct value
of the GROUP BY column.
The Syntax is
SELECT column1, function_name(column2) FROM table_name
WHERE condition
GROUP BY column1;
For example consider the following table EMP
Id Name Salary
1 A 80
2 B 40
3 A 60
4 B 70
5 C 60
6 B 100
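As an illustration of grouping on the EMP table above, the sketch below counts the rows and totals the salaries for each Name. The query itself is a hypothetical example, not necessarily the one the original notes used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (Id INTEGER, Name TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO EMP VALUES (?, ?, ?)", [
    (1, "A", 80), (2, "B", 40), (3, "A", 60),
    (4, "B", 70), (5, "C", 60), (6, "B", 100),
])

# One output row per distinct Name; the aggregates are computed per group.
groups = conn.execute(
    "SELECT Name, COUNT(*), SUM(Salary) FROM EMP GROUP BY Name ORDER BY Name"
).fetchall()
print(groups)
```

Six input rows collapse into three groups, one for each of the names A, B and C.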
ORDER BY - The SQL ORDER BY clause is used to sort the result set in either ascending
or descending order.
By default ORDER BY sorts the data in ascending order.
We can use the keyword DESC to sort the data in descending order and the keyword ASC to
sort in ascending order.
Sort according to one column
The Syntax is
SELECT * FROM table_name ORDER BY column_name ASC|DESC
Sort according to multiple columns
The Syntax is
SELECT * FROM table_name ORDER BY column1 ASC|DESC , column2 ASC|DESC
Consider a relation STUDENT
Roll_No First_Name Last_Name Age
1 Narendra Chari 28
2 Swapna Kumari 28
3 Jagan Mohan 26
4 Navya Kumari 26
We can see that the result is first sorted in ascending order according to Age. There are
multiple rows having the same Age. Sorting this result set further according to ROLL_NO
arranges the rows with the same Age in descending order of ROLL_NO.
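A two-column sort of this kind can be reproduced on the STUDENT relation as follows. The query is an assumed reconstruction (Age ascending, then Roll_No descending), written as a small sqlite3 sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (Roll_No INTEGER, First_Name TEXT, "
             "Last_Name TEXT, Age INTEGER)")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?, ?)", [
    (1, "Narendra", "Chari", 28),
    (2, "Swapna", "Kumari", 28),
    (3, "Jagan", "Mohan", 26),
    (4, "Navya", "Kumari", 26),
])

# Age decides the order first; Roll_No only breaks ties between equal ages.
ordered = conn.execute(
    "SELECT Roll_No, Age FROM STUDENT ORDER BY Age ASC, Roll_No DESC"
).fetchall()
print(ordered)
```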
HAVING - The HAVING clause places a condition on the groups defined by the GROUP
BY clause in the SELECT statement. It is written after the GROUP BY clause and is applied
once the groups have been formed.
The HAVING clause in SQL is used if we need to filter the result set based on aggregate
functions such as MIN() and MAX(), SUM() and AVG() and COUNT().
The Syntax is
SELECT column_name, aggregate_function_name(column_name) FROM table_name
GROUP BY column_name HAVING condition ORDER BY column1, column2;
Now, suppose that we want to show those cities whose total salary of employees is more than
5000. For this case, we have to type the following query with the HAVING clause in SQL
SELECT SUM(Emp_Salary), Emp_City FROM Employee GROUP BY Emp_City
HAVING SUM(Emp_Salary)>5000;
The Result is
SUM(Emp_Salary) Emp_City
9000 Delhi
8000 Jaipur
NULL VALUES
The SQL NULL is the term used to represent a missing value. A NULL value in a table is a
value in a field that appears to be blank.
We use null when the column value is either unknown or inapplicable.
A field with a NULL value is a field with no value. It is very important to understand that a
NULL value is different than a zero value or a field that contains spaces.
For Example consider the following table EMPLOYEE
Emp_Id Emp_Name Emp_Salary Emp_City
201 Abhay 2000 Goa
202 Ankit 4000 Delhi
203 Bheem NULL Jaipur
204 Ram 2000 NULL
205 Sumit NULL Delhi
IS NOT NULL - IS NOT NULL operator is used to test for non-null values in the specified
column.
The Syntax is
SELECT * FROM tableName WHERE columnName IS NOT NULL;
For Example,
SELECT * FROM EMPLOYEE WHERE Emp_Salary IS NOT NULL;
Result is
Emp_Id Emp_Name Emp_Salary Emp_City
201 Abhay 2000 Goa
202 Ankit 4000 Delhi
204 Ram 2000 NULL
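The behaviour of NULL tests can be checked against the EMPLOYEE table above. The sketch below uses sqlite3; the helper name ids is only for illustration. It also shows why IS NULL is needed: a comparison written with = NULL never evaluates to true, so it selects no rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (Emp_Id INTEGER, Emp_Name TEXT, "
             "Emp_Salary INTEGER, Emp_City TEXT)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)", [
    (201, "Abhay", 2000, "Goa"),
    (202, "Ankit", 4000, "Delhi"),
    (203, "Bheem", None, "Jaipur"),   # None is stored as NULL
    (204, "Ram", 2000, None),
    (205, "Sumit", None, "Delhi"),
])

def ids(where):
    sql = f"SELECT Emp_Id FROM EMPLOYEE WHERE {where} ORDER BY Emp_Id"
    return [row[0] for row in conn.execute(sql)]

not_null = ids("Emp_Salary IS NOT NULL")
is_null = ids("Emp_Salary IS NULL")
equals_null = ids("Emp_Salary = NULL")   # comparison with NULL is never true
print(not_null, is_null, equals_null)
```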
Outer Joins
An outer join is a special case of the join operator that takes unmatched rows into account,
padding them with null values.
Outer joins return the matched rows and also the unmatched rows from either or both
tables. There are a few types of outer joins:
Left Join returns the matched rows from both tables, as well as the unmatched rows
from the left table.
Right Join returns the matched rows from both tables, as well as the unmatched rows
from the right table.
Full Outer Join returns the matched rows from both tables, as well as the unmatched
rows from both tables.
Consider the following two tables DEPARTMENT and PROJECT
DEPARTMENT
DEPT_MID DNO PNO
101 2 11
97 5 22
120 4 33
PROJECT
PNO PNAME
44 D
11 A
22 B
Left Join - The SQL left join returns all the rows from the left table together with the
matching values from the right table; where there is no matching join value, it returns NULL.
The Syntax for Left Join:
SELECT table1.column1, table2.column2.... FROM table1
LEFT JOIN table2 ON table1.column_field = table2.column_field;
Join the two tables with LEFT JOIN:
SELECT D1.*, P1.* FROM DEPARTMENT D1
LEFT JOIN PROJECT P1 ON D1.PNO = P1.PNO;
The result is
DEPT_MID DNO PNO PNO PNAME
101 2 11 11 A
97 5 22 22 B
120 4 33 NULL NULL
Right Join - The SQL right join returns all the rows from the right table together with the
matching values from the left table; where there is no matching join value, it returns NULL.
The Syntax for Right Join:
SELECT table1.column1, table2.column2..... FROM table1
RIGHT JOIN table2 ON table1.column_field = table2.column_field;
We will join the two tables with RIGHT JOIN:
SELECT D1.*, P1.* FROM DEPARTMENT D1
RIGHT JOIN PROJECT P1 ON D1.PNO = P1.PNO;
The result is
DEPT_MID DNO PNO PNO PNAME
NULL NULL NULL 44 D
101 2 11 11 A
97 5 22 22 B
Full Join - The SQL full join combines the results of both the left and the right outer join, so
the joined table has all the records from both tables. It puts NULL in place of the matches
that are not found.
The Syntax for full outer join:
SELECT table1.column1, table2.column2.... FROM table1
FULL JOIN table2 ON table1.column_name = table2.column_name;
We will join the two tables with FULL JOIN:
SELECT D1.*, P1.* FROM DEPARTMENT D1
FULL JOIN PROJECT P1 ON D1.PNO = P1.PNO;
The result is
DEPT_MID DNO PNO PNO PNAME
101 2 11 11 A
97 5 22 22 B
120 4 33 NULL NULL
NULL NULL NULL 44 D
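The join results above can be reproduced with sqlite3. This sketch makes one assumption about the environment: older SQLite builds lack RIGHT JOIN and FULL JOIN, so the right join is written here as a LEFT JOIN with the two tables swapped, which returns the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPARTMENT (DEPT_MID INTEGER, DNO INTEGER, PNO INTEGER);
    CREATE TABLE PROJECT (PNO INTEGER, PNAME TEXT);
    INSERT INTO DEPARTMENT VALUES (101, 2, 11), (97, 5, 22), (120, 4, 33);
    INSERT INTO PROJECT VALUES (44, 'D'), (11, 'A'), (22, 'B');
""")

# Every DEPARTMENT row is kept; PROJECT columns are NULL where PNO has no match.
left_rows = conn.execute("""
    SELECT D1.*, P1.* FROM DEPARTMENT D1
    LEFT JOIN PROJECT P1 ON D1.PNO = P1.PNO
    ORDER BY D1.PNO
""").fetchall()

# Same rows as DEPARTMENT RIGHT JOIN PROJECT, written with the tables swapped.
right_rows = conn.execute("""
    SELECT D1.*, P1.* FROM PROJECT P1
    LEFT JOIN DEPARTMENT D1 ON D1.PNO = P1.PNO
    ORDER BY P1.PNO
""").fetchall()

print(left_rows)
print(right_rows)
```

Python shows SQL NULL as None, so the unmatched department (PNO 33) and the unmatched project (PNO 44) appear padded with None values.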
For Example,
CREATE TABLE Student
( ID int NOT NULL,
LastName varchar(15) NOT NULL,
FirstName varchar(15),
Age int,
PRIMARY KEY (ID)
);
INTEGER is the source type for the domain ratingval. The optional DEFAULT keyword is
used to associate a default value with a domain. If no value is entered for this column in an
inserted tuple, the default value 1 associated with ratingval is used.
As another example, we can create a new domain for salary with the following SQL
statement
CREATE DOMAIN salary INTEGER DEFAULT 15000
CHECK (VALUE>=15000 AND VALUE<=40000)
Assertions: ICs over Several Tables
An assertion is a constraint that is applied over a group of tables. Unlike table constraints,
which are attached to a single table, assertions can involve multiple tables.
As an example, suppose that we wish to enforce the constraint that the number of boats plus
the number of sailors should be less than 100.
We could try the following table constraint:
CREATE TABLE Sailors ( sid INTEGER,
sname CHAR(10),
rating INTEGER,
age REAL,
PRIMARY KEY (sid),
CHECK ( rating >= 1 AND rating <= 10 ),
CHECK ( (SELECT COUNT (S.sid) FROM Sailors S)
+ (SELECT COUNT (B.bid) FROM Boats B) < 100 ));
The disadvantage of the above solution is that the constraint is attached only to the Sailors
table, although the Boats table is involved in exactly the same way.
A better solution is to create an assertion, as follows
CREATE ASSERTION total
CHECK ((SELECT COUNT (S.sid) FROM Sailors S)
+ (SELECT COUNT (B.bid) FROM Boats B) < 100);
Active Databases
An active database is a database that has a set of triggers associated with it. Such databases
can be difficult to maintain because of the complexity that arises in understanding the effect
of these triggers.
In such a database, before executing a statement that modifies the database, the DBMS first
checks whether the statement activates any trigger. If a trigger is activated, the DBMS
evaluates the condition part and executes the action part only if the condition evaluates to
true. A single statement can activate more than one trigger. In such a situation, the DBMS
processes the triggers in some arbitrary order.
The execution of the action part of a trigger may activate other triggers, or even the same
trigger that initiated the action. A trigger that activates itself in this way is called a
‘recursive trigger’.
There are several uses of triggers
Triggers can be used to maintain data integrity
Triggers can be used to identify unusual events that occur in a database
Triggers can be used for security checks and also for auditing.
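The event, condition and action parts of a trigger can be seen in a small auditing example. This is a sketch in SQLite's trigger syntax, run through sqlite3; the RatingAudit table, the trigger name and the sample sailor row are hypothetical, and other DBMSs use somewhat different trigger syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER);
    CREATE TABLE RatingAudit (sid INTEGER, old_rating INTEGER, new_rating INTEGER);

    -- Event: an UPDATE of rating. Condition: the value really changed.
    -- Action: record the old and new values in the audit table.
    CREATE TRIGGER log_rating_change
    AFTER UPDATE OF rating ON Sailors
    WHEN OLD.rating <> NEW.rating
    BEGIN
        INSERT INTO RatingAudit VALUES (OLD.sid, OLD.rating, NEW.rating);
    END;

    INSERT INTO Sailors VALUES (22, 'Dustin', 7);
    UPDATE Sailors SET rating = 9 WHERE sid = 22;  -- condition true: logged
    UPDATE Sailors SET rating = 9 WHERE sid = 22;  -- condition false: not logged
""")

audit = conn.execute("SELECT * FROM RatingAudit").fetchall()
print(audit)
```

Both UPDATE statements activate the trigger, but the action runs only for the first one, because the condition is false when the rating does not change.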
UNIT-IV
From the above table we can conclude some valid functional dependencies:
roll_no → { name, dept_name, dept_building }. Here, roll_no determines the values of the
fields name, dept_name and dept_building, hence it is a valid functional dependency.
roll_no → dept_name. Since roll_no determines the whole set {name, dept_name,
dept_building}, it also determines its subset dept_name.
dept_name → dept_building. dept_name identifies the dept_building uniquely, since
all rows with the same dept_name have the same dept_building.
More valid functional dependencies: roll_no → name, {roll_no, name} → {dept_name,
dept_building}, etc.
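Whether a functional dependency holds in a given table instance can be tested mechanically: any two rows that agree on the left-hand side must agree on the right-hand side. The sketch below uses a hypothetical student instance (the original table is not reproduced here) and a helper named fd_holds that is introduced only for illustration.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Two rows that agree on lhs must also agree on rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical instance used only to exercise the check.
students = [
    {"roll_no": 1, "name": "Ravi",   "dept_name": "CSE", "dept_building": "A4"},
    {"roll_no": 2, "name": "Gopal",  "dept_name": "IT",  "dept_building": "A3"},
    {"roll_no": 3, "name": "Swapna", "dept_name": "CSE", "dept_building": "A4"},
    {"roll_no": 4, "name": "Ravi",   "dept_name": "ECE", "dept_building": "B2"},
]

print(fd_holds(students, ["roll_no"], ["name", "dept_name", "dept_building"]))
print(fd_holds(students, ["dept_name"], ["dept_building"]))
print(fd_holds(students, ["name"], ["dept_name"]))   # two students named Ravi
```

A check on one instance can only refute a dependency; whether it holds for every legal instance is a design decision about the schema.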
The above table is in 1NF since it contains only atomic values. Assume RNo is the primary
key. To remove the partial functional dependency and bring the table into 2NF, decompose it
into two smaller relations according to the corresponding FDs, i.e., RNo → Name, Course,
Fee and RNo → Lang-Known, as follows.
From the above table, the FD set is RNo → Name, RNo → Course, RNo → Fee, Course → Fee.
RNo is the primary key.
If A → B and B → C are two FDs, then A → C is called a transitive dependency.
For the above relation, RNo → Course and Course → Fee hold, so Fee is transitively
dependent on RNo. This violates the third normal form.
Course Table
Course Fee
ECT 45K
CSE 50K
IT 40K
Lang Table
RNo Lang-Known
1 Telugu
1 English
2 Telugu
2 Hindi
3 English
All the anomalies that were present in R are now removed in the above two relations.
Common attribute must be a key for at least one relation (R1 or R2), i.e., Att(R1) ⋂
Att(R2) → Att(R1) or Att(R1) ⋂ Att(R2) → Att(R2)
For Example,
A relation R(A,B,C,D) with FD set {A → BC} is decomposed into R1(ABC) and R2(AD).
This is a lossless join decomposition because
First rule holds true as Att(R1) ⋃ Att(R2) = (ABC) ⋃ (AD) = (ABCD) = Att(R)
Second rule holds true as Att(R1) ⋂ Att(R2) = (ABC) ⋂ (AD) = (A) ≠ Ø
Third rule holds true as Att(R1) ⋂ Att(R2) = A is a key of R1(ABC) because
A → BC is given
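The three rules can be checked with an attribute-closure computation. The sketch below is one possible way to code the test for a decomposition into two relations; the function names closure and is_lossless are illustrative, and the second decomposition, R1(ABC) and R2(CD), is a hypothetical counter-example.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever lhs is already inside the closure.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless(r, r1, r2, fds):
    """Check the three rules for a decomposition of r into r1 and r2."""
    common = r1 & r2
    if r1 | r2 != r or not common:        # rule 1 and rule 2
        return False
    common_closure = closure(common, fds)
    # Rule 3: the common attributes determine all of r1 or all of r2.
    return r1 <= common_closure or r2 <= common_closure

fds = [({"A"}, {"B", "C"})]
ok = is_lossless(set("ABCD"), set("ABC"), set("AD"), fds)
bad = is_lossless(set("ABCD"), set("ABC"), set("CD"), fds)
print(ok, bad)
```

In the second case the common attribute C determines neither R1 nor R2, so the third rule fails.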
UNIT-V
TRANSACTION
A transaction is a set of logically related operations used to perform a logical unit of work.
A transaction usually changes data in a database, which can be done by inserting new data,
updating the existing data, or deleting the data that is no longer required.
For example, you are transferring money from your bank account to your friend’s account,
the set of operations would be like below
Read your account balance
Deduct the amount from your balance
Write the remaining balance to your account
Read your friend’s account balance
Add the amount to his account balance
Write the new updated balance to his account
This whole set of operations can be called a transaction.
In a DBMS, the above six-step transaction is written as follows.
Let’s say your account is A and your friend’s account is B, and you are transferring 10000
from A to B. The steps of the transaction are
Read(A);
A = A - 10000;
Write(A);
Read(B);
B = B + 10000;
Write(B);
Atomicity
This property ensures that either all the operations of a transaction reflect in database or none.
The logic here is simple, transaction is a single unit, it can’t execute partially. Either it
executes completely or it doesn’t, there shouldn’t be a partial execution.
Consider an example of banking system to understand this
Suppose Account A has a balance of Rs.400 & B has Rs.700. Account A is transferring
Rs.100 to Account B.
This is a transaction that has two operations
a) Debiting Rs.100 from A’s balance
b) Crediting Rs.100 to B’s balance.
Let’s say first operation passed successfully while second failed, in this case A’s balance
would be Rs.300 while B would be having Rs.700 instead of Rs.800. This is unacceptable in
a banking system. Either the transaction should fail without executing any of the operation or
it should process both the operations. The Atomicity property ensures that.
Two key operations are involved in a transaction to maintain its atomicity.
Abort: If there is a failure in the transaction, abort the execution and rollback the changes
made by the transaction.
Commit: If transaction executes successfully, commit the changes to the database.
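The banking example can be sketched with sqlite3 to show abort and commit in action. The Account table, the transfer function and its fail_midway flag are hypothetical; the flag simply simulates a failure between the debit and the credit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO Account VALUES (?, ?)", [("A", 400), ("B", 700)])
conn.commit()

def transfer(src, dst, amount, fail_midway=False):
    """Debit src and credit dst as one atomic unit."""
    try:
        conn.execute("UPDATE Account SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        if fail_midway:
            raise RuntimeError("simulated failure after the debit")
        conn.execute("UPDATE Account SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        conn.commit()      # Commit: both changes become permanent
        return True
    except Exception:
        conn.rollback()    # Abort: the debit is undone as well
        return False

def balances():
    return dict(conn.execute("SELECT name, balance FROM Account"))

failed = transfer("A", "B", 100, fail_midway=True)
after_failure = balances()
succeeded = transfer("A", "B", 100)
after_success = balances()
print(failed, after_failure, succeeded, after_success)
```

After the simulated failure A still holds Rs.400 and B Rs.700; the state in which A has Rs.300 and B has Rs.700 is never left behind in the database.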
Consistency
The database must be in a consistent state before and after the execution of the transaction.
This ensures that there are no errors in the database at any point of time. The application
programmer is responsible for maintaining the consistency of the database.
For Example,
A transferring Rs.1000 to B. A’s initial balance is Rs.2000 and B’s initial balance is Rs.5000.
Before the transaction:
Total of A+B = 2000 + 5000 = Rs.7000
After the transaction:
Total of A+B = 1000 + 6000 = Rs.7000
The data is consistent before and after the execution of the transaction so this example
maintains the consistency property of the database.
Isolation
A transaction shouldn’t interfere with the execution of another transaction. To preserve the
consistency of the database, each transaction should execute in isolation (that is, its effect
must be the same as if no other transaction were running concurrently).
For example, account A has a balance of Rs.400 and is transferring Rs.100 each to accounts
B and C. So we have two transactions here. Let’s say these transactions run concurrently
and both read the Rs.400 balance; in this case the final balance of A would be Rs.300
instead of Rs.200. This is wrong.
If the transactions were to run in isolation, the second transaction would have read the
correct balance of Rs.300 (before debiting Rs.100) once the first transaction had completed
successfully.
Durability
Once a transaction completes successfully, the changes it has made into the database should
be permanent even if there is a system failure. The recovery-management component of
database systems ensures the durability of transaction.
ACID properties are the backbone of a database management system. These properties ensure
that even though there are multiple transactions reading and writing the data in the database,
the data is always correct and consistent.
TRANSACTION LOG
A DBMS uses a transaction log to keep track of all transactions that update the database. The
information stored in this log is used by the DBMS for a recovery requirement triggered by a
ROLLBACK statement, a program’s abnormal termination, or a system failure such as a
network discrepancy or a disk crash.
Some RDBMSs use the transaction log to recover a database forward to a currently consistent
state. After a server failure, for example, Oracle automatically rolls back uncommitted
transactions and rolls forward transactions that were committed but not yet written to the
physical database.
While the DBMS executes transactions that modify the database, it also automatically
updates the transaction log.
The transaction log stores
A record for the beginning of the transaction.
For each transaction component (SQL statement):
The type of operation being performed (update, delete, insert).
The names of the objects affected by the transaction (the name of the table).
The “before” and “after” values for the fields being updated.
Pointers to the previous and next transaction log entries for the same transaction.
The ending (COMMIT) of the transaction.
Thus, two rows from the table would be deleted and the SELECT statement would produce
the following result
ID NAME AGE ADDRESS
1 Ramu 32 Tanuku
3 Komali 23 Bhimavaram
The SAVEPOINT Command - A SAVEPOINT is a point in a transaction to which you can roll
the transaction back without rolling back the entire transaction.
The syntax for a SAVEPOINT command is as shown below.
SAVEPOINT SAVEPOINT_NAME;
This command serves only to create a SAVEPOINT among the transactional statements. The
ROLLBACK command is then used to undo the statements issued after a SAVEPOINT.
The syntax for rolling back to a SAVEPOINT is as shown below.
ROLLBACK TO SAVEPOINT_NAME;
Following is an example where you plan to delete the three different records from the
STUDENT table. You want to create a SAVEPOINT before each delete, so that you can
ROLLBACK to any SAVEPOINT at any time to return the appropriate data to its original
state.
For Example, Consider the STUDENT table having the following records.
ID NAME AGE ADDRESS
1 Ramu 32 Tanuku
2 Chitti 25 Tadepalligudem
3 Komali 23 Bhimavaram
4 Vishal 25 Eluru
The following code block contains the series of operations.
SQL> SAVEPOINT SP1;
Savepoint created.
SQL> DELETE FROM STUDENT WHERE ID=1;
1 row deleted.
SQL> SAVEPOINT SP2;
Savepoint created.
SQL> DELETE FROM STUDENT WHERE ID=2;
1 row deleted.
SQL> SAVEPOINT SP3;
Savepoint created.
SQL> DELETE FROM STUDENT WHERE ID=3;
1 row deleted.
Now that the three deletions have taken place, let us assume that you have changed your
mind and decided to ROLLBACK to the SAVEPOINT that you identified as SP2. Because
SP2 was created after the first deletion, the last two deletions are undone.
SQL> ROLLBACK TO SP2;
Rollback complete.
Notice that only the first deletion took place since you rolled back to SP2.
SQL> SELECT * FROM STUDENT;
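The same sequence can be replayed with sqlite3 to confirm which rows survive. This is a sketch, assuming SQLite's SAVEPOINT behaviour matches the session shown above for this simple case; the connection is opened with isolation_level=None so that the script controls BEGIN and COMMIT itself.

```python
import sqlite3

# isolation_level=None lets the script issue BEGIN/COMMIT explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE STUDENT (ID INTEGER, NAME TEXT, AGE INTEGER, "
             "ADDRESS TEXT)")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?, ?)", [
    (1, "Ramu", 32, "Tanuku"),
    (2, "Chitti", 25, "Tadepalligudem"),
    (3, "Komali", 23, "Bhimavaram"),
    (4, "Vishal", 25, "Eluru"),
])

conn.execute("BEGIN")
for savepoint, student_id in [("SP1", 1), ("SP2", 2), ("SP3", 3)]:
    conn.execute(f"SAVEPOINT {savepoint}")
    conn.execute("DELETE FROM STUDENT WHERE ID = ?", (student_id,))

conn.execute("ROLLBACK TO SP2")   # undoes the deletions of IDs 2 and 3
conn.execute("COMMIT")

remaining = [row[0] for row in conn.execute("SELECT ID FROM STUDENT ORDER BY ID")]
print(remaining)
```

Only the row with ID 1 stays deleted, because its DELETE was issued before SP2 was created.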
The SET TRANSACTION Command - The SET TRANSACTION command can be used
to initiate a database transaction. This command is used to specify characteristics for the
transaction that follows. For example, you can specify a transaction to be read only or read
write.
The syntax for a SET TRANSACTION command is as follows.
SET TRANSACTION [ READ WRITE | READ ONLY ];
CONCURRENCY CONTROL
Concurrency control is the process of managing simultaneous execution of transactions in a
multiprocessing database system without having them interfere with one another.
This property of DBMS allows many transactions to access the same database at the same
time without interfering with each other.
Concurrency problems in DBMS Transactions
When multiple transactions execute concurrently in an uncontrolled or unrestricted manner,
then it might lead to several problems. These problems are commonly referred to as
concurrency problems in a database environment.
The concurrency problems that can occur in the database are
Lost update problem (write-write conflict)
Temporary update or dirty read problem or Uncommitted Update (write-read
conflict).
Unrepeatable read or incorrect analysis or inconsistent retrievals (read-write conflict).
Lost update problem – The Lost Update problem arises when two different transactions
update the same data item and one update overwrites the other. For Example, consider two
transactions A and B performing read/write operations on a data item DT in the database
DB. The current value of DT is 1000. The following table shows the read/write operations in
transactions A and B.
Time A B
t1 READ(DT) ------
t2 DT=DT+500 ------
t3 WRITE(DT) ------
t4 ------ DT=DT+300
t5 ------ WRITE(DT)
t6 READ(DT) ------
Transaction A initially reads the value of DT as 1000 and modifies it to 1500; transaction B
then modifies the value to 1800. When transaction A reads DT again, it finds 1800 rather
than the 1500 it wrote, so from A's point of view its own update has been overwritten (lost).
Dirty Read Problem - The dirty read problem arises when a transaction reads the data that
has been updated by another transaction that is still uncommitted. It arises due to multiple
uncommitted transactions executing simultaneously. For Example, consider two transactions
A and B performing read/write operations on a data DT in the database DB. The current
value of DT is 1000. The following table shows the read/write operations in A and B
transactions.
Time A B
t1 READ(DT) ------
t2 DT=DT+500 ------
t3 WRITE(DT) ------
t4 ------ READ(DT)
t5 ------ COMMIT
t6 ROLLBACK ------
Transaction A reads the value of DT as 1000 and modifies it to 1500, which gets stored
in the temporary buffer. Transaction B reads DT as 1500 and commits. Then a server error
occurs in transaction A and it rolls back, restoring DT to its initial value, i.e., 1000.
Transaction B has therefore read and used a value (1500) that was never committed; this is
the dirty read problem.
Unrepeatable Read Problem - The unrepeatable read problem occurs when two or more
different values of the same data are read during the read operations in the same transaction.
For Example, consider two transactions A and B performing read/write operations on a data
DT in the database DB. The current value of DT is 1000. The following table shows the
read/write operations in A and B transactions.
Time A B
t1 READ(DT) ------
t2 ------ READ(DT)
t3 DT=DT+500 ------
t4 WRITE(DT) ------
t5 ------ READ(DT)
Transaction A and B initially read the value of DT as 1000. Transaction A modifies the value
of DT from 1000 to 1500 and then again transaction B reads the value and finds it to be 1500.
Transaction B finds two different values of DT in its two different read operations.
Concurrency control is the technique that ensures that the above three conflicts don’t occur in
the database. There are certain rules to avoid problems in concurrently running transactions
and these rules are defined as the concurrency control protocols.
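A lost update is easiest to see when both transactions read the item before either one writes it. The sketch below replays a serial and an interleaved order of the same steps on a single item DT; the run function and the tuple encoding of a schedule are illustrative choices, and the interleaving is a common textbook variant rather than the exact table above.

```python
def run(schedule, initial=1000):
    """Execute (transaction, action) steps against a single data item DT."""
    db = {"DT": initial}
    local = {}                      # each transaction's private copy of DT
    for txn, action in schedule:
        if action == "READ":
            local[txn] = db["DT"]
        elif action == "WRITE":
            db["DT"] = local[txn]
        else:                       # a numeric increment applied locally
            local[txn] += action
    return db["DT"]

serial = [("A", "READ"), ("A", 500), ("A", "WRITE"),
          ("B", "READ"), ("B", 300), ("B", "WRITE")]

# B reads DT before A writes it, so B's write is based on a stale value.
interleaved = [("A", "READ"), ("B", "READ"), ("A", 500), ("A", "WRITE"),
               ("B", 300), ("B", "WRITE")]

print(run(serial), run(interleaved))
```

The serial order ends with 1800, while the interleaved order ends with 1300: the 500 added by transaction A has disappeared.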
SCHEDULER
Transactions are a set of instructions that perform operations on databases. When multiple
transactions are running concurrently, then a sequence is needed in which the operations are
to be performed because at a time, only one operation can be performed on the database. This
sequence of operations is known as Schedule, and this process is known as Scheduling.
When multiple transactions execute simultaneously in an uncontrolled manner, it
might lead to several problems, which are known as concurrency problems. In order to
overcome these problems, scheduling is required.
The scheduler establishes the order in which the operations within concurrent transactions are
executed. The scheduler interleaves the execution of database operations to ensure
serializability. The scheduler bases its actions on concurrency control algorithms, such as
locking or time stamping methods.
The scheduler also ensures efficient utilization of the central processing unit (CPU) of the
computer system. It can be observed that the schedule does not contain an ABORT or
COMMIT action for either transaction. A schedule that contains either an ABORT or a
COMMIT action for each transaction whose actions are listed in it is called a complete
schedule.
If the actions of different transactions are not interleaved, that is, transactions are executed
one by one from start to finish, the schedule is called a serial schedule.
A non-serial schedule is a schedule where the operations from a group of concurrent
transactions are interleaved.
A serial schedule never gives up correctness, but it forgoes the benefits of concurrent
execution. Its disadvantage is that it represents inefficient processing because no
interleaving of operations from different transactions is permitted. This can lead
to low CPU utilization while a transaction waits for disk input/output (I/O), or for another
transaction to terminate, thus slowing down processing considerably.
Serializable Schedules
A serializable schedule is a schedule that allows a set of transactions to execute in some
interleaved order such that the effects are equivalent to executing them in some serial
order. The execution of transactions in a serializable schedule is a sufficient condition for
preventing conflicts.
The serial execution of transactions always leaves the database in a consistent state.
Serializability describes the concurrent execution of several transactions.
The objective of serializability is to find the non-serial schedules that allow transactions to
execute concurrently without interfering with one another, thereby producing a database
state that could be produced by a serial execution.
Serializability must be guaranteed to prevent inconsistency from transactions interfering with
one another. The order of Read and Write operations is important in serializability.
The serializability rules are as follows:
If two transactions T1 and T2 only Read a data item, they do not conflict and the order is
not important.
If two transactions T1 and T2 either Read or Write completely separate data items, they
do not conflict and the execution order is not important.
If one transaction T1 Writes a data item and another transaction T2 either Reads or Writes
the same data item, the order of execution is important.
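The three rules can be turned into a test for conflict serializability. The sketch below uses the standard precedence-graph method, which the text above does not spell out: an edge Ti → Tj is added whenever an operation of Ti conflicts with a later operation of Tj, and the schedule is conflict serializable if the graph has no cycle. The tuple encoding of operations is an assumption made for the example.

```python
def conflicts(op1, op2):
    """Operations conflict if they belong to different transactions,
    touch the same item, and at least one of them is a write."""
    t1, kind1, item1 = op1
    t2, kind2, item2 = op2
    return t1 != t2 and item1 == item2 and "W" in (kind1, kind2)

def is_conflict_serializable(schedule):
    # Edge Ti -> Tj when an operation of Ti conflicts with a later one of Tj.
    edges = {}
    for i, op1 in enumerate(schedule):
        for op2 in schedule[i + 1:]:
            if conflicts(op1, op2):
                edges.setdefault(op1[0], set()).add(op2[0])

    # Depth-first search for a cycle in the precedence graph.
    visiting, done = set(), set()

    def has_cycle(node):
        if node in done:
            return False
        if node in visiting:
            return True
        visiting.add(node)
        found = any(has_cycle(n) for n in edges.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return not any(has_cycle(t) for t in list(edges))

reads_only = [("T1", "R", "X"), ("T2", "R", "X")]
one_after_other = [("T1", "R", "X"), ("T1", "W", "X"),
                   ("T2", "R", "X"), ("T2", "W", "X")]
interleaved = [("T1", "R", "X"), ("T2", "R", "X"),
               ("T1", "W", "X"), ("T2", "W", "X")]

for s in (reads_only, one_after_other, interleaved):
    print(is_conflict_serializable(s))
```

In the last schedule each transaction reads X before the other writes it, which produces edges in both directions and therefore a cycle.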
Lock Types
The DBMS mainly uses following types of locking techniques.
Binary Locking
Shared / Exclusive Locking
Two-Phase Locking (2PL)
Binary Locking - A binary lock can have two states or values: locked and unlocked (or 1 and
0, for simplicity). A distinct lock is associated with each database item X. If the value of the
lock on X is 1, item X cannot be accessed by a database operation that requests the item. If
the value of the lock on X is 0, the item can be accessed when requested. We refer to the
current value (or state) of the lock associated with item X as LOCK(X).
Two operations, lock_item and unlock_item, are used with binary locking.
Lock_item(X) - A transaction requests access to an item X by first issuing a lock_item(X)
operation.
If LOCK(X) = 1, the transaction is forced to wait. If LOCK(X) = 0, it is set to 1 (the
transaction locks the item) and the transaction is allowed to access item X.
Unlock_item (X) - When the transaction is through using the item, it issues an
unlock_item(X) operation, which sets LOCK(X) to 0 (unlocks the item) so that X may be
accessed by other transactions.
Hence, a binary lock enforces mutual exclusion on the data item; i.e., at a time only one
transaction can hold a lock.
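The lock_item and unlock_item operations can be sketched as a small lock table. This is a simplified, single-threaded illustration: instead of actually blocking, lock_item returns False when the transaction would have to wait, and the holder bookkeeping is an addition made for the example.

```python
class BinaryLockTable:
    """LOCK(X) is 1 (locked) or 0 (unlocked) for each item X."""

    def __init__(self):
        self.lock = {}        # item -> 0 or 1
        self.holder = {}      # item -> transaction currently holding the lock

    def lock_item(self, txn, item):
        """Return True if txn obtains the lock, False if it must wait."""
        if self.lock.get(item, 0) == 1:
            return False
        self.lock[item] = 1
        self.holder[item] = txn
        return True

    def unlock_item(self, txn, item):
        """Set LOCK(item) back to 0; only the holder may unlock."""
        if self.holder.get(item) != txn:
            raise ValueError(f"{txn} does not hold the lock on {item}")
        self.lock[item] = 0
        del self.holder[item]

locks = BinaryLockTable()
first = locks.lock_item("T1", "X")     # T1 gets the lock
second = locks.lock_item("T2", "X")    # T2 must wait
locks.unlock_item("T1", "X")
third = locks.lock_item("T2", "X")     # now T2 can proceed
print(first, second, third)
```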
Deadlocks
A deadlock is a condition in which two (or more) transactions in a set are waiting
simultaneously for locks held by some other transaction in the set. Neither transaction can
continue because each transaction in the set is on a waiting queue, waiting for one of the
other transactions in the set to release the lock on an item. Thus, a deadlock is an impasse
that may result when two or more transactions are each waiting for locks to be released that
are held by the other. Transactions whose lock requests have been refused are queued until
the lock can be granted.
A deadlock is also called a circular waiting condition where two transactions are waiting
(directly or indirectly) for each other. Thus in a deadlock, two transactions are mutually
excluded from accessing the next record required to complete their transactions, also called a
deadly embrace.
For example, a deadlock arises between two transactions A and B in the following case:
Transaction A = access data items X and Y
Transaction B = access data items Y and X
Here, Transaction A has acquired a lock on X and is waiting to acquire a lock on Y, while
Transaction B has acquired a lock on Y and is waiting to acquire a lock on X. Neither of them
can proceed.
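The circular wait can be made concrete by recording which transaction waits for which and searching that "wait-for" relation for a cycle. The sketch below is one common way to do this (a depth-first search); the function name find_deadlock and the dictionary layout are assumptions made for illustration.

```python
def find_deadlock(waits_for):
    """Return a list of transactions that form a cycle, or None.

    waits_for maps each transaction to the set of transactions
    whose locks it is waiting for.
    """
    visiting, done = set(), set()

    def visit(txn, path):
        if txn in visiting:                 # came back to a transaction on the path
            return path[path.index(txn):]
        if txn in done:
            return None
        visiting.add(txn)
        for other in sorted(waits_for.get(txn, ())):
            cycle = visit(other, path + [txn])
            if cycle:
                return cycle
        visiting.discard(txn)
        done.add(txn)
        return None

    for txn in sorted(waits_for):
        cycle = visit(txn, [])
        if cycle:
            return cycle
    return None


# A holds X and waits for B (which holds Y); B waits for A.
print(find_deadlock({"A": {"B"}, "B": {"A"}}))
# No cycle: A waits for B, but B is not waiting for anyone.
print(find_deadlock({"A": {"B"}, "B": set()}))
```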
Deadlock Prevention - The deadlock prevention technique avoids the conditions that lead to
deadlocking. One approach requires every transaction to lock all the data items it needs in
advance; if any of the items cannot be obtained, none of the items are locked. More generally,
a transaction requesting a new lock is aborted if there is a possibility that a deadlock can
occur. Alternatively, a timeout may be used to abort transactions that have been idle for too
long; this is a simple but indiscriminate approach. If a transaction is aborted, all the
changes made by the transaction are rolled back and all locks obtained by the transaction are
released. The transaction is then rescheduled for execution. The deadlock prevention technique
is used with two-phase locking.
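The "lock everything in advance or nothing" rule can be sketched as follows. The function name lock_all_or_none and the dictionary held (item to owning transaction) are hypothetical names used only for this example.

```python
def lock_all_or_none(held, txn, items):
    """Lock every item in `items` for txn, or lock none of them.

    held maps each locked item to the transaction that owns it.
    Returns True if all locks were granted, False otherwise.
    """
    # First check: is any requested item owned by another transaction?
    if any(held.get(item, txn) != txn for item in items):
        return False                 # nothing is locked, so no partial hold
    for item in items:
        held[item] = txn
    return True


held = {}
print(lock_all_or_none(held, "A", ["X", "Y"]))   # A gets both X and Y
print(lock_all_or_none(held, "B", ["Y", "Z"]))   # B gets neither Y nor Z
print(held)
```

Because a transaction never holds some of its locks while waiting for others, the hold-and-wait situation from the deadlock example cannot arise.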
All database operations (Read and Write) within the same transaction must have the same
time stamp. The DBMS executes conflicting operations in time stamp order, thereby ensuring
serializability of the transactions. If two transactions conflict, one is stopped, rolled back,
rescheduled, and assigned a new time stamp value.
The disadvantage of the time stamping approach is that each value stored in the database
requires two additional time stamp fields: one for the last time the field was read and one for
the last update. Time stamping thus increases memory needs and the database’s processing
overhead. Time stamping demands a lot of system resources because many transactions
might have to be stopped, rescheduled, and restamped.
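The two extra time stamp fields per value can be illustrated with a small sketch. The checks below follow the usual textbook rules for basic time stamp ordering, which these notes do not spell out, so treat them as an assumption; the names Item, read and write are chosen only for this example.

```python
class Item:
    """A stored value with its two additional time stamp fields."""

    def __init__(self, value):
        self.value = value
        self.read_ts = 0    # time stamp of the youngest transaction that read it
        self.write_ts = 0   # time stamp of the youngest transaction that wrote it


def read(item, ts):
    """Return True if a transaction with time stamp ts may read the item."""
    if ts < item.write_ts:          # a younger transaction already wrote it
        return False                # caller must roll back and restamp
    item.read_ts = max(item.read_ts, ts)
    return True


def write(item, ts, value):
    """Return True if a transaction with time stamp ts may write the item."""
    if ts < item.read_ts or ts < item.write_ts:
        return False                # conflict: roll back and restamp
    item.value = value
    item.write_ts = ts
    return True


x = Item(100)
print(read(x, 20))        # the younger transaction (time stamp 20) reads X
print(write(x, 10, 5))    # the older transaction (time stamp 10) arrives too late
print(x.value)            # X is unchanged
```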
An example illustrates the difference. Assume that we have two conflicting transactions: T1
and T2, each with a unique time stamp. Suppose T1 has a time stamp of 11548789 and T2
has a time stamp of 19562545. We can deduce from the time stamps that T1 is the older
transaction (the lower time stamp value) and T2 is the newer transaction. Given that scenario,
the four possible outcomes are shown in the following Table.
Validation phase: In this phase, a validation check is done on the temporary variables to see
whether they violate the rules of serializability.
Write phase: This is the final phase of the validation based protocol. In this phase, if the
validation of the transaction is successful, then the values of the temporary local variables
are written to the database and the transaction is committed. If validation fails in the second
phase, then the updates are discarded and the transaction is rolled back to be restarted later.
DATABASE RECOVERY
Database recovery techniques are used in database management systems (DBMS) to restore a
database to a consistent state after a failure or error has occurred. The main goal of recovery
techniques is to ensure data integrity and consistency and prevent data loss.
There are mainly two types of recovery techniques used in DBMS.
Rollback/Undo Recovery Technique: The rollback/undo recovery technique is based on the
principle of backing out or undoing the effects of a transaction that has not completed
successfully due to a system failure or error. This technique is accomplished by undoing the
changes made by the transaction using the log records stored in the transaction log. The
transaction log contains a record of all the transactions that have been performed on the
database. The system uses the log records to undo the changes made by the failed transaction
and restore the database to its previous state.
Commit/Redo Recovery Technique: The commit/redo recovery technique is based on the
principle of reapplying to the database the changes made by a transaction that has completed
successfully. This technique is accomplished by using the log records stored
in the transaction log to redo the changes of transactions that had committed before the
failure or error but whose changes may not yet have reached the database. The system uses the
log records to reapply those changes and restore the database to its most recent consistent
state.
In addition to these two techniques, there is also a third technique called checkpoint recovery.
Checkpoint recovery is a technique used to reduce the recovery time by periodically saving
the state of the database in a checkpoint file. In the event of a failure, the system can use the
checkpoint file to restore the database to the most recent consistent state before the failure
occurred, rather than going through the entire log to recover the database.
Overall, recovery techniques are essential to ensure data consistency and availability in
DBMS, and each technique has its own advantages and limitations that must be considered in
the design of a recovery system.
Database systems, like any other computer system, are subject to failures, but the data stored
in them must be available as and when required. When a database fails, it must possess the
facilities for fast recovery. It must also guarantee atomicity, i.e. either a transaction is
completed successfully and committed (its effect is recorded permanently in the database) or
the transaction has no effect on the database. There are both automatic and non-automatic
ways both of backing up data and of recovering from failure situations. The
techniques used to recover data lost due to system crashes, transaction errors, viruses,
catastrophic failures, incorrect command execution, etc. are database recovery techniques. So,
to prevent data loss, recovery techniques based on deferred update and immediate update, or
backing up data, can be used.
Recovery techniques are heavily dependent upon the existence of a special file known as
a system log. It contains information about the start and end of each transaction and any
updates which occur during the transaction. The log keeps track of all transaction operations
that affect the values of database items. This information is needed to recover from
transaction failure.
The log is kept on disk. It contains entries such as the following:
start_transaction(T): This log entry records that transaction T
starts execution.
read_item(T, X): This log entry records that transaction T reads the value of database
item X.
write_item(T, X, old_value, new_value): This log entry records that transaction T
changes the value of the database item X from old_value to new_value. The old value is
sometimes known as the before-image of X, and the new value is known as the
after-image of X.
commit(T): This log entry records that transaction T has completed all accesses to the
database successfully and its effect can be committed (recorded permanently) to the
database.
abort(T): This records that transaction T has been aborted.
checkpoint: Checkpoint is a mechanism where all the previous logs are removed from the
system and stored permanently in a storage disk. Checkpoint declares a point before
which the DBMS was in a consistent state, and all the transactions were committed.
A transaction T reaches its commit point when all its operations that access the database have
been executed successfully, i.e. the transaction has reached the point at which it will
not abort (terminate without completing). Once committed, the transaction is permanently
recorded in the database. Commitment always involves writing a commit entry to the log and
writing the log to disk. After a system crash, the log is searched backward for all
transactions T that have written a start_transaction(T) entry into the log but have not written a
commit(T) entry yet; these transactions may have to be rolled back to undo their effect on the
database during the recovery process.
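The search for transactions that started but never committed can be sketched as below. Representing each log entry as a Python tuple whose first two fields are the entry type and the transaction is an assumption made for this example, as is the function name uncommitted.

```python
def uncommitted(log):
    """Return transactions with a start_transaction entry but no commit entry."""
    started, committed = [], set()
    for entry in log:
        kind, txn = entry[0], entry[1]
        if kind == "start_transaction":
            started.append(txn)
        elif kind == "commit":
            committed.add(txn)
    return [txn for txn in started if txn not in committed]


log = [
    ("start_transaction", "T1"),
    ("write_item", "T1", "X", 5, 6),     # (type, T, X, old_value, new_value)
    ("start_transaction", "T2"),
    ("commit", "T1"),
    ("write_item", "T2", "Y", 1, 2),
    # crash happens here: T2 never wrote commit(T2)
]
print(uncommitted(log))   # candidates for rollback
```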
Undoing – If a transaction fails, the recovery manager may undo it, i.e.
reverse the operations of the transaction. This involves examining the log for each
write_item(T, X, old_value, new_value) entry of the transaction and setting the value of item
X in the database back to old_value. There are two major techniques for recovery from
non-catastrophic transaction failures: deferred updates and immediate updates.
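A minimal sketch of the undo step, assuming the database is a Python dictionary and each write_item log entry is a tuple (type, T, X, old_value, new_value). The entries are processed in reverse order so that the oldest before-image is the one left in place; the function name undo is hypothetical.

```python
def undo(db, log, txn):
    """Reverse every write of txn by restoring old_value, newest write first."""
    for entry in reversed(log):
        if entry[0] == "write_item" and entry[1] == txn:
            _, _, item, old_value, _new_value = entry
            db[item] = old_value


db = {"X": 7, "Y": 2}
log = [
    ("start_transaction", "T1"),
    ("write_item", "T1", "X", 5, 6),
    ("write_item", "T1", "X", 6, 7),
]
undo(db, log, "T1")
print(db)   # X is back to the value it had before T1 started
```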
Deferred update – This technique does not physically update the database on disk until a
transaction has reached its commit point. Before reaching commit, all transaction updates are
recorded in the local transaction workspace. If a transaction fails before reaching its commit
point, it will not have changed the database in any way, so UNDO is not needed. If a failure
occurs after the commit point, it may be necessary to REDO the effect of the transaction's
operations, because their effect may not yet have been written to the database. Hence,
deferred update is also known as the No-undo/redo algorithm.
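The deferred update idea can be sketched as a transaction that buffers its writes in a local workspace and touches the database only at commit. The class DeferredTxn, the function redo_committed and the tuple layout of the log are assumptions made for this illustration.

```python
class DeferredTxn:
    """Buffers writes locally; the database changes only at commit."""

    def __init__(self, name, db, log):
        self.name, self.db, self.log = name, db, log
        self.workspace = {}                      # local transaction workspace
        log.append(("start_transaction", name))

    def write(self, item, value):
        self.workspace[item] = value             # database is not touched

    def read(self, item):
        return self.workspace.get(item, self.db.get(item))

    def commit(self):
        for item, value in self.workspace.items():
            self.log.append(("write_item", self.name, item, value))
        self.log.append(("commit", self.name))   # commit point
        self.db.update(self.workspace)           # deferred physical update


def redo_committed(db, log):
    """After a crash, reapply the logged writes of committed transactions."""
    committed = {e[1] for e in log if e[0] == "commit"}
    for e in log:
        if e[0] == "write_item" and e[1] in committed:
            db[e[2]] = e[3]


db, log = {"X": 1}, []
t1 = DeferredTxn("T1", db, log)
t1.write("X", 2)
print(db)          # still the old value: nothing to undo if T1 fails now
t1.commit()
print(db)          # updated only after the commit point
```

Only the new value is logged here, since under deferred update no before-image is ever needed for UNDO.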
Immediate update – In the immediate update technique, the database may be updated by some
operations of a transaction before the transaction reaches its commit point. However, these
operations are recorded in a log on disk before they are applied to the database, making
recovery still possible. If a transaction fails before reaching its commit point, the effect of
its operations must be undone, i.e. the transaction must be rolled back. Since the changes of
committed transactions may also have to be reapplied, both undo and redo are required.
This technique is known as the undo/redo algorithm.
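A minimal sketch of immediate update with undo/redo recovery, assuming a dictionary as the database and tuple log entries of the form (type, T, X, old_value, new_value). The function names write and recover are hypothetical.

```python
def write(db, log, txn, item, new_value):
    """Log the change first, then apply it to the database immediately."""
    log.append(("write_item", txn, item, db.get(item), new_value))
    db[item] = new_value


def recover(db, log):
    """Undo writes of uncommitted transactions, then redo committed ones."""
    committed = {e[1] for e in log if e[0] == "commit"}
    for e in reversed(log):                       # UNDO: newest first
        if e[0] == "write_item" and e[1] not in committed:
            db[e[2]] = e[3]                       # restore old_value
    for e in log:                                 # REDO: oldest first
        if e[0] == "write_item" and e[1] in committed:
            db[e[2]] = e[4]                       # reapply new_value


db, log = {"X": 1, "Y": 1}, []
write(db, log, "T1", "X", 5)
log.append(("commit", "T1"))
write(db, log, "T2", "Y", 9)      # T2 has not committed when the crash occurs
recover(db, log)
print(db)                         # T1's change is kept, T2's change is undone
```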
Caching/Buffering – In this technique, one or more disk pages that include data items to be
updated are cached into main memory buffers and then updated in memory before being written
back to disk. A collection of in-memory buffers called the DBMS cache is kept under the control
of the DBMS for holding these pages. A directory is used to keep track of which database items
are in the buffers. A dirty bit is associated with each buffer; it is 0 if the buffer has not
been modified and 1 if it has.
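The directory and dirty bit can be sketched as follows. The class name BufferCache and the use of a dictionary to stand in for the disk are assumptions made for this example.

```python
class BufferCache:
    """A tiny DBMS cache: a directory of buffered pages plus a dirty bit each."""

    def __init__(self, disk):
        self.disk = disk        # page_id -> contents (stands in for the disk)
        self.buffers = {}       # directory: page_id -> contents in memory
        self.dirty = {}         # page_id -> 0 (unmodified) or 1 (modified)

    def fetch(self, page_id):
        if page_id not in self.buffers:          # bring the page into memory
            self.buffers[page_id] = self.disk[page_id]
            self.dirty[page_id] = 0
        return self.buffers[page_id]

    def update(self, page_id, contents):
        self.fetch(page_id)
        self.buffers[page_id] = contents         # change happens in memory only
        self.dirty[page_id] = 1

    def flush(self):
        """Write back only the dirty buffers and return their page ids."""
        written = [p for p, bit in self.dirty.items() if bit == 1]
        for p in written:
            self.disk[p] = self.buffers[p]
            self.dirty[p] = 0
        return written


disk = {1: "a", 2: "b"}
cache = BufferCache(disk)
cache.fetch(1)
cache.update(2, "B")
print(disk)            # disk still holds the old page 2
print(cache.flush())   # only page 2 needs to be written back
print(disk)
```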
Shadow paging – It provides atomicity and durability. A directory with n entries is
constructed, where the ith entry points to the ith database page on disk. When a
transaction begins executing, the current directory is copied into a shadow directory. When a
page is to be modified, a shadow page is allocated in which the changes are made, and when it
is ready to become durable, all pages that refer to the original are updated to refer to the
new replacement page.
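A minimal sketch of the directory mechanism, following the common textbook variant in which a modified page is written to a new location, the current directory is pointed at it, and the shadow directory keeps pointing at the untouched original. The class ShadowPagedDB and its method names are hypothetical.

```python
class ShadowPagedDB:
    """Current directory is updated on write; shadow directory is kept for abort."""

    def __init__(self, pages):
        self.pages = dict(enumerate(pages))       # slot -> contents ("disk")
        self.current = list(range(len(pages)))    # entry i -> slot of page i
        self.shadow = None

    def begin(self):
        self.shadow = list(self.current)          # copy the current directory

    def read(self, i):
        return self.pages[self.current[i]]

    def write(self, i, contents):
        new_slot = max(self.pages) + 1            # never overwrite the original
        self.pages[new_slot] = contents
        self.current[i] = new_slot

    def commit(self):
        self.shadow = None                        # new pages become durable

    def abort(self):
        self.current = self.shadow                # fall back to the old pages
        self.shadow = None


db = ShadowPagedDB(["a", "b"])
db.begin()
db.write(1, "B")
print(db.read(1))   # the transaction sees its own change
db.abort()
print(db.read(1))   # the original page was never touched
```

The sketch does not reclaim the slots left behind by aborted or superseded pages; a real system would need to garbage-collect them.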
Backward Recovery – The terms “Rollback” and “UNDO” also refer to backward
recovery. When a backup of the data is not available and previous modifications need to be
undone, this technique can be helpful. With the backward recovery method, unwanted
modifications are removed and the database is returned to its prior condition. All adjustments
made during the previous transaction are reversed during the backward recovery. In other
words, it reprocesses valid transactions and undoes the erroneous database updates.
Forward Recovery – “Roll forward” and “REDO” refer to forward recovery. When a
database needs to be updated with all verified changes, this forward recovery technique is
helpful.
The changes of transactions that were lost in the failure are reapplied to the database to roll
those modifications forward. In other words, the database is restored using preserved data and
the valid transactions recorded since it was last saved.