DBMS All Units



UNIT-I


INTRODUCTION
Data - Data is the raw material that can be processed for any computing machine.
For example − Employee name, Product name, Name of the student, Marks of the student,
Mobile number, Image etc.
Information - Information is the data that has been converted into more useful or intelligent
form.
For example: Report card sheet.
The information is needed for the following reasons −
 To gain knowledge about the surroundings.
 To keep the system up to date.
 To know about the rules and regulations of the society.
Knowledge - The human mind purposefully organizes the information and evaluates it to
produce knowledge.
Example of data, information and knowledge - A student secures 450 marks. Here 450 is
data; the statement that the student secured 450 marks is information; and the insight that
hard work was needed to secure those marks is knowledge.
The major differences between Data and Information are as follows
Data | Information
Data is the raw fact. | It is a processed form of data.
It is not significant to a business. | It is significant to a business.
Data is an atomic-level piece of information. | It is a collection of data.
Example: Product name, Name of student. | Example: Report card of student.
It is a phenomenal fact. | It is organized data.
This is the primary level of intelligence. | It is a secondary level of intelligence.
May or may not be meaningful. | Always meaningful.
Understanding is difficult. | Understanding is easy.
The diagram given below depicts the use of data and information in a database


Database Management System


 A database management system (DBMS) is software that is used to manage databases.
For example, MySQL, Oracle, etc. are very popular commercial database systems used
in different applications.
 DBMS provides an interface to perform various operations like database creation,
storing data in it, updating data, creating a table in the database and a lot more.
 It provides protection and security to the database. In the case of multiple users, it also
maintains data consistency.

CHARACTERISTICS OF DBMS
Characteristics of DBMS are as follows
 Reduce Redundancy
 Storing of Data
 Concurrent Access
 Data Consistency
 Transaction Support
 Security
 Support to SQL
Way of Storing the data - In a database management system, data is stored in tables; the
structure of each table is created first. This table structure is also known as a schema in
DBMS.
A schema in DBMS provides information about the table name, its various attributes and the
data type of each attribute. A DBMS also provides a facility to represent relationships among
related tables.
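
As a small sketch (table and column names are only illustrative, not taken from the text
above), a table structure/schema can be defined in SQL like this; the exact syntax varies
slightly between DBMSs such as MySQL and Oracle:

    -- Schema of a STUDENT table: attribute names and their data types.
    CREATE TABLE STUDENT (
        Roll_no      INT,
        Student_name VARCHAR(50),
        Age          INT,
        Mobile_no    VARCHAR(15)
    );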
Reduced Redundancy - This is one of the important features of a DBMS: it reduces
redundancy. Here the term redundancy means unnecessary repetition or duplication of data
in the database.
To reduce redundancy, a DBMS uses the concept of normalization, which decomposes the
given table into smaller tables in order to minimize the redundancy.
Note that a DBMS does not guarantee 100% removal of redundancy; it can only minimize it.
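
As a hypothetical illustration (names are made up), a table that repeats department details for
every employee can be decomposed into two smaller tables so that each department is stored
only once, which is the basic idea behind normalization:

    -- Redundant design: Dept_name is repeated for every employee of a department.
    CREATE TABLE EMPLOYEE_FLAT (
        Emp_id    INT,
        Emp_name  VARCHAR(50),
        Dept_id   INT,
        Dept_name VARCHAR(50)
    );

    -- Decomposed design: department details are stored only once.
    CREATE TABLE DEPARTMENT (
        Dept_id   INT PRIMARY KEY,
        Dept_name VARCHAR(50)
    );
    CREATE TABLE EMPLOYEE (
        Emp_id   INT PRIMARY KEY,
        Emp_name VARCHAR(50),
        Dept_id  INT REFERENCES DEPARTMENT(Dept_id)
    );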
Concurrent Access - A DBMS supports concurrent access to the database by multiple
users. Multiple users can work on the database at the same time while consistency is still
maintained. Here the term consistency represents the correctness of the database.


Data Consistency - The term data consistency means that the state of the data should be
consistent, i.e. correct, at any instant of time. The result of any manipulation or update should
be reflected wherever that data is used.
Support for Structured Query Language - A DBMS supports SQL. SQL queries provide an
easy way for the user to create, insert, update and delete data in the database.
Security - A DBMS gives the facility to protect the database from unauthorized users.
Different user accounts may have different access permissions, using which users can easily
secure their data from unauthorized access.
Transaction Support - A DBMS supports transactions, which help the user to maintain the
integrity of the database.
Some DBMS software used in the software industry are Oracle, MySQL and SQL Server.

FILE-PROCESSING SYSTEM VS DBMS


This typical file-processing system is supported by a conventional operating system. The
system stores permanent records in various files, and it needs different application programs
to extract records from, and add records to, the appropriate files. Before database
management systems (DBMSs) were introduced, organizations usually stored information in
such systems. Keeping organizational information in a file-processing system has a number
of major disadvantages:
Data redundancy and inconsistency - Since different programmers create the files and
application programs over a long period, the various files are likely to have different
structures and the programs may be written in several programming languages. Moreover, the
same information may be duplicated in several places (files). For example, if a student has a
double major (say, music and mathematics) the address and telephone number of that student
may appear in a file that consists of student records of students in the Music department and
in a file that consists of student records of students in the Mathematics department. This
redundancy leads to higher storage and access cost. In addition, it may lead to data
inconsistency; that is, the various copies of the same data may no longer agree. For example,
a changed student address may be reflected in the Music department records but not
elsewhere in the system.
Difficulty in accessing data - Suppose that one of the university clerks needs to find out the
names of all students who live within a particular postal-code area. The clerk asks the data-
processing department to generate such a list. Because the designers of the original system
did not anticipate this request, there is no application program on hand to meet it. There is,
however, an application program to generate the list of all students.


The university clerk has now two choices: either obtain the list of all students and extract the
needed information manually or ask a programmer to write the necessary application
program. Both alternatives are obviously unsatisfactory. Suppose that such a program is
written, and that, several days later, the same clerk needs to trim that list to include only those
students who have taken at least 60 credit hours. As expected, a program to generate such a
list does not exist. Again, the clerk has the preceding two options, neither of which is
satisfactory. The point here is that conventional file-processing environments do not allow
needed data to be retrieved in a convenient and efficient manner. More responsive data-
retrieval systems are required for general use.
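
With a DBMS, such ad-hoc requests become simple queries. A sketch in SQL, assuming a
hypothetical student table with postal_code and tot_cred columns:

    -- Students who live in a particular postal-code area.
    SELECT name FROM student WHERE postal_code = '500001';

    -- The same list, trimmed to students with at least 60 credit hours.
    SELECT name FROM student
    WHERE  postal_code = '500001' AND tot_cred >= 60;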
Data isolation - Because data is scattered in various files, and files may be in different formats,
writing new application programs to retrieve the appropriate data is difficult.
Integrity problems - The data values stored in the database must satisfy certain types of
consistency constraints. Suppose the university maintains an account for each department,
and records the balance amount in each account. Suppose also that the university requires that
the account balance of a department may never fall below zero. Developers enforce these
constraints in the system by adding appropriate code in the various application programs.
However, when new constraints are added, it is difficult to change the programs to enforce
them. The problem is compounded when constraints involve several data items from different
files.
Atomicity problems - A computer system, like any other device, is subject to failure. In
many applications, it is crucial that, if a failure occurs, the data be restored to the consistent
state that existed prior to the failure.
Consider a program to transfer $500 from the account balance of department A to the account
balance of department B. If a system failure occurs during the execution of the program, it is
possible that the $500 was removed from the balance of department A but was not credited to
the balance of department B, resulting in an inconsistent database state. Clearly, it is essential
to database consistency that either both the credit and debit occur, or that neither occur.
That is, the funds transfer must be atomic—it must happen in its entirety or not at all. It is
difficult to ensure atomicity in a conventional file-processing system.
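
In a DBMS the transfer can be wrapped in a transaction so that either both updates happen or
neither does. A sketch with an assumed account table:

    BEGIN;  -- some systems write START TRANSACTION or BEGIN TRANSACTION
    UPDATE account SET balance = balance - 500 WHERE dept_name = 'A';
    UPDATE account SET balance = balance + 500 WHERE dept_name = 'B';
    COMMIT; -- both changes become permanent together
    -- If a failure occurs before COMMIT, the DBMS rolls back the partial work.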
Concurrent-access anomalies - For the sake of overall performance of the system and faster
response, many systems allow multiple users to update the data simultaneously. Indeed,
today, the largest Internet retailers may have millions of accesses per day to their data by
shoppers. In such an environment, interaction of concurrent updates is possible and may
result in inconsistent data. Consider department A, with an account balance of $10,000. If
two department clerks debit the account balance (by say $500 and $100, respectively) of


department A at almost exactly the same time, the result of the concurrent executions may
leave the budget in an incorrect (or inconsistent) state. Suppose that the programs executing
on behalf of each withdrawal read the old balance, reduce that value by the amount being
withdrawn, and write the result back. If the two programs run concurrently, they may both
read the value $10,000, and write back $9500 and $9900, respectively. Depending on which
one writes the value last, the account balance of department A may contain either $9500 or
$9900, rather than the correct value of $9400. To guard against this possibility, the system
must maintain some form of supervision.
But supervision is difficult to provide because data may be accessed by many different
application programs that have not been coordinated previously.
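
The lost update can be sketched with an assumed account table. Both clerks read the same old
balance and write back their own result, so one update is lost; letting the DBMS do the
arithmetic inside each UPDATE (together with its concurrency control) avoids the problem:

    -- Problematic interleaving: both clerks first read 10000, then write back.
    --   Clerk 1: UPDATE account SET balance = 9500 WHERE dept_name = 'A';
    --   Clerk 2: UPDATE account SET balance = 9900 WHERE dept_name = 'A';
    -- Safer form: each UPDATE reads and writes the row as one atomic step.
    UPDATE account SET balance = balance - 500 WHERE dept_name = 'A';
    UPDATE account SET balance = balance - 100 WHERE dept_name = 'A';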
Security problems - Not every user of the database system should be able to access all the
data. Data should be secured from unauthorised access; for example, a student in a college
should not be able to see the payroll details of the teachers. Such security constraints are
difficult to apply in file-processing systems.
These difficulties, among others, prompted the development of database systems. In what
follows, we shall see the concepts and algorithms that enable database systems to solve the
problems with file-processing systems.
ADVANTAGES OF DBMS:
Controlling of Redundancy: Data redundancy refers to the duplication of data (i.e. storing the
same data multiple times). In a database system, by having a centralized database and
centralized control of data by the DBA, unnecessary duplication of data is avoided. It also
eliminates the extra time needed for processing a large volume of data, and it saves
storage space.
Improved Data Sharing: DBMS allows a user to share the data in any number of application
programs.
Data Integrity: Integrity means that the data in the database is accurate. Centralized control
of the data helps in permitting the administrator to define integrity constraints to the data in
the database. For example, in a customer database we can enforce an integrity constraint that
customers are accepted only from Noida and Meerut city.
Data Security: It is easier to apply access constraints in database systems so that only
authorized users are able to access the data. Each user has a different set of access rights, thus
data is secured from issues such as identity theft, data leaks and misuse of data.
Data Consistency: By eliminating data redundancy, we greatly reduce the opportunities for
inconsistency. For example, a customer address is stored only once, so we cannot have
disagreement on the stored values. Also, updating data values is greatly simplified when each


value is stored in one place only. Finally, we avoid the wasted storage that results from
redundant data storage.
Easy access to data: Database systems manage data in such a way that the data is easily
accessible with fast response times. Even if the database size is huge, the DBMS can still
provide fast access to and updating of data.
Easy recovery: Since database systems keep backups of data, it is easier to do a full
recovery of data in case of a failure. This is very useful for almost all
organizations, as the data maintained over time should not be lost during a system crash or
failure.
Flexible: Database systems are more flexible than file-processing systems. DBMS systems
are scalable: the database size can be increased or decreased based on the amount of storage
required. They also allow the addition of new tables as well as the removal of existing tables
without disturbing the consistency of data.
Reduced Application Development and Maintenance Time: A DBMS supports many
important functions that are common to applications accessing data stored in the
DBMS, which facilitates the quick development of applications.
Disadvantages of DBMS
 It is a bit complex. Since it supports multiple functionalities to give the user the best, the
underlying software has become complex. The designers and developers should have
thorough knowledge of the software to get the most out of it.
 Because of its complexity and functionality, it uses a large amount of storage space. It also
needs a large amount of memory to run efficiently.
 A DBMS often works as a centralized system, i.e. all the users from all over the world
access this database. Hence any failure of the DBMS will impact all the users.
 A DBMS is generalized software, i.e. it is written to work for a wide range of systems
rather than one specific system. Hence some applications will run slower than a tailored solution.

HISTORY OF DATABASES
 In early 1960’s, Charles Bachman was the first person to develop the Integrated Data
Store (IDS) which was based on network data model.
 In the late 1960’s, IBM (International Business Machines Corporation) developed the
Information Management System (IMS), which is still used in many places to this
date. It was developed based on the hierarchical database model.


 It was during the year 1970 that the relational database model was developed by Edgar F.
Codd. Many of the database models we use today are relational. It has been considered
the standard database model ever since.
 In 1976, Peter Chen developed the Entity-Relationship (ER) model, which is widely used
in database design.
 In the 1970’s, IBM developed the Structured Query Language (SQL) as a part
of the System R project. It was later declared a standard query language by ANSI and ISO.

DATABASE LANGUAGES IN DBMS


 A DBMS has appropriate languages and interfaces to express database queries and
updates.
 Database languages can be used to read, store and update the data in the database.
 Structured Query Language (SQL) is the database language by the use of which we can
perform certain operations on the existing database and also we can use this language to
create a database.
Types of Database Languages

Data Definition Language (DDL)


 DDL stands for Data Definition Language. It is used to define database structure
 It is used for creating tables, schema, indexes, constraints etc. in database.
 DDL is a set of SQL commands used to create, modify, and delete database structures
but not data.
List of DDL commands:
 CREATE: This command is used to create the database or its objects
 DROP: This command is used to delete objects from the database.
 ALTER: This is used to alter the structure of the database.
 TRUNCATE: This is used to remove all records from a table.
 COMMENT: This is used to add comments to the data dictionary.
 RENAME: This is used to rename an object existing in the database.
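
A brief illustration of these DDL commands (object and column names are made up; the
RENAME and COMMENT syntax varies between DBMSs):

    CREATE TABLE student (roll_no INT, name VARCHAR(50)); -- create an object
    ALTER TABLE student ADD age INT;                      -- change its structure
    COMMENT ON TABLE student IS 'List of students';       -- Oracle/PostgreSQL syntax
    TRUNCATE TABLE student;                               -- remove all rows, keep the table
    RENAME TABLE student TO student_master;               -- MySQL syntax
    DROP TABLE student_master;                            -- delete the object itself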


Data Manipulation Language (DML)


 DML stands for Data Manipulation Language. It is used for accessing and
manipulating data in a database. It handles user requests.
List of DML commands:
 SELECT: It is used to retrieve data from a database.
 INSERT: It is used to insert data into a table.
 UPDATE: It is used to update existing data within a table.
 DELETE: It is used to delete records from a table.
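
A short illustration of the DML commands on an assumed student table:

    INSERT INTO student (roll_no, name, age) VALUES (1, 'Rahul', 18);
    SELECT roll_no, name FROM student WHERE age >= 18;
    UPDATE student SET age = 19 WHERE roll_no = 1;
    DELETE FROM student WHERE roll_no = 1;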
Data Control Language (DCL)
 DCL stands for Data Control Language. The Data Control Language (DCL) is used to
control privileges in databases.
List of DCL commands:
 GRANT: This command gives users access privileges to the database.
 REVOKE: This command withdraws the user’s access privileges given by using the
GRANT command.
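
For example, privileges on an assumed student table can be given to, and taken back from, a
user account named exam_clerk (a hypothetical user):

    GRANT SELECT, INSERT ON student TO exam_clerk;  -- give access privileges
    REVOKE INSERT ON student FROM exam_clerk;       -- withdraw one of them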
Transaction Control Language (TCL)
 The changes that we make in the database using DML commands are either made
permanent or rolled back using TCL.
List of TCL commands:
 COMMIT: It is used to save the transaction on the database.
 ROLLBACK: It is used to restore the database to its state as of the last COMMIT.
 SAVEPOINT: It is a point in a transaction in which you can roll the transaction back
to a certain point without rolling back the entire transaction.
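
A small sketch of the TCL commands inside one transaction (table name assumed; the exact
statement that starts a transaction varies between DBMSs):

    BEGIN;                                   -- start a transaction
    UPDATE student SET age = 20 WHERE roll_no = 2;
    SAVEPOINT after_age_update;              -- a point we can return to
    DELETE FROM student WHERE roll_no = 3;
    ROLLBACK TO SAVEPOINT after_age_update;  -- undo only the DELETE
    COMMIT;                                  -- make the remaining change permanent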

INSTANCE AND SCHEMA IN DBMS


Schema is the overall description of the database. The basic structure of how the data will be
stored in the database is called schema.
A database schema defines how data is organized within a relational database; this is
inclusive of logical constraints such as table names, fields, data types, and the relationships
between these entities.
A database schema is considered the “blueprint” of a database which describes how the data
may relate to other tables or other data models. However, the schema does not actually
contain data.
In the following diagram, we have a schema that shows the relationship between three tables:
Teacher, Subject and Department. The diagram only shows the design of the database, it


doesn’t show the data present in those tables. Schema is only a structural view (design) of a
database.

The data stored in database at a particular moment of time is called instance of database.
Database schema defines the attributes in tables that belong to a particular database. The
value of these attributes at a moment of time is called the instance of that database.
For example, consider that we have a single table student in the database. Today the table has
100 records, so today the instance of the database has 100 records. If we add another
100 records to this table by tomorrow, the instance of the database tomorrow will have 200
records in the table. In short, the data stored in the database at a particular moment is called the
instance; this changes over time as and when we add, delete or update data in the database.
Another Example The concept of database schemas and instances can be understood by
analogy to a program written in a programming language. A database schema corresponds to
the variable declarations in a program. Each variable has a particular value at a given instant.
The values of the variables in a program at a point in time correspond to an instance of a
database schema.
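
In SQL terms (hypothetical names), the CREATE TABLE statement defines the schema, while
the rows present at any moment form the instance:

    -- Schema: structure only, no data.
    CREATE TABLE student (roll_no INT, name VARCHAR(50));

    -- Instance: the data stored at this moment.
    INSERT INTO student VALUES (1, 'Rahul');
    INSERT INTO student VALUES (2, 'Ravi');
    -- After further INSERTs or DELETEs the instance changes,
    -- but the schema stays the same unless we ALTER it.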
Difference between Schema and Instance:
Schema | Instance
It is the overall description of the database. | It is the collection of information stored in a database at a particular moment.
Schema is the same for the whole database. | Data in instances can be changed using addition, deletion, updation.
Does not change frequently. | Changes frequently.
Defines the basic structure of the database, i.e. how the data will be stored in the database. | It is the set of information stored at a particular time.


THREE-SCHEMA ARCHITECTURE (OR) DBMS THREE LEVEL ARCHITECTURE (OR) LEVELS OF ABSTRACTION
Database systems are made-up of complex data structures. To ease the user interaction with
database, the developers hide internal irrelevant details from users. This process of hiding
irrelevant details from user is called data abstraction.
For example when you are booking a train ticket, you are not concerned how data is
processing at the back end when you click “book ticket”, what processes are happening when
you are doing online payments. You are just concerned about the message that pops up when
your ticket is successfully booked. This doesn’t mean that the process happening at the back
end is not relevant; it just means that you as a user are not concerned what is happening in the
database.

This architecture has three levels:


1. External/View Schema or External/View level
2. Logical/Conceptual Schema or Logical/Conceptual level
3. Physical/Internal Schema or Physical/Internal level

External level - It is also called view level. The reason this level is called “view” is because
several users can view their desired data from this level which is internally fetched from
database with the help of conceptual and internal level mapping.
This is the highest level of database abstraction. It includes a number of external schemas or
user views. This level provides different views of the same database for a specific user or a
group of users. An external view provides a powerful and flexible security mechanism by
hiding the parts of the database from a particular user.
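
For instance, an external view can expose only part of a table to a particular group of users; a
sketch in SQL with assumed table and column names:

    -- The conceptual level stores the full employee record; this view
    -- shows only the contact columns and hides salary from its users.
    CREATE VIEW emp_contact AS
    SELECT emp_id, emp_name, phone
    FROM   employee;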

Conceptual level - It is also called logical level. This level describes the structure of the
whole database. It acts as a middle layer between the physical storage and user view. It
explains what data to be stored in the database, what the data types are, and what relationship
exists among those data. There is only one conceptual schema per database.
Database constraints and security are also implemented in this level of architecture. This level
is maintained by DBA (database administrator).

Internal level - This level is also known as physical level. This level describes how the data
is actually stored in the storage devices. This level is also responsible for allocating space to
the data. This is the lowest level of the architecture.


DATA INDEPENDENCE OF DBMS


The ability to modify the schema definition of a DBMS at one level, without affecting the
schema definition of the next higher level is called data independence.
In DBMS there are two types of data independence
1. Physical data independence
2. Logical data independence.
Physical Data Independence - This is defined as the ability to modify the physical schema
of the database without the modification causing any changes in the logical/conceptual or
view/external level.
Due to physical data independence, any of the below changes will not affect the conceptual or
external levels:
 Using a new storage device like Hard Drive or Magnetic Tapes
 Modifying the file organization technique in the Database
 Switching to different data structures.
 Changing the access method.
 Modifying indexes.
 Changes to compression techniques or hashing algorithms.
 Change of Location of Database from say C drive to D Drive
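
For example, adding an index is a purely physical change; queries written against the
conceptual schema keep working unchanged (names assumed):

    -- Physical-level change: a new access path for faster lookups.
    CREATE INDEX idx_student_name ON student (name);

    -- This query is unaffected; it neither knows nor cares whether an
    -- index, a different file organization, or a new storage device
    -- is used underneath.
    SELECT * FROM student WHERE name = 'Rahul';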
Logical Data Independence - Logical data independence is the ability to modify the logical
schema without causing unwanted modifications to the external schema or requiring the
application programs to be rewritten.
Due to logical data independence, any of the below changes will not affect the external level:


 Add/Modify/Delete an attribute, entity or relationship is possible without a rewrite of
existing application programs
 Merging two records into one
 Breaking an existing record into two or more records

Difference between Physical and Logical Data Independence


Physical Data Independence | Logical Data Independence
It is concerned with the internal schema of the database. | It is concerned with the conceptual schema of the database.
It is easier to achieve as compared to logical data independence. | Logical data independence is difficult to achieve as compared to physical data independence.
Physical data independence is mostly concerned with how data is saved in the system. | It is mostly focused on the structure of or updates to data definitions.
Changes at the internal level may or may not be required to increase the overall performance of the database. | When the database's logical structure needs to be modified, the changes made at the logical level are crucial.
In most cases, a change at the physical level does not necessitate a change at the application program level. | If new fields are added or removed from the database, then updates are required to be made in the application software.

DATA MODEL
Data Model gives us an idea that how the final system will look like after its complete
implementation. It defines the data elements and the relationships between the data elements.
Data Models are used to show how data is stored, connected, accessed and updated in the
database management system.
Some of the Data Models in DBMS are:
 Hierarchical Model
 Network Model
 Entity-Relationship Model
 Relational Model
 Object-Oriented Data Model


Hierarchical Model - The Hierarchical Model was the first DBMS model. This model organises
the data in a hierarchical tree structure.
Each child node has one parent node, but a parent node can have more than one child node.
Multiple parents are not allowed.
This model has the ability to manage one-to-one relationships as well as one-to-many
relationships.
For Example

Network Model - This model is an extension of the hierarchical model. It was the most
popular model before the relational model. It replaces the hierarchical tree with a graph.
A parent node can have more than one child node and a child node can also have more than
one parent node.
This model has the ability to manage one-to-one relationships as well as many-to-many
relationships.
For Example: In the example below we can see that node student has two parents i.e. CSE
Department and Library. This was earlier not possible in the hierarchical model.


Relational Model - Relational Model is the most widely used model. In this model, the data
is maintained in the form of a two-dimensional table. All the information is stored in the
form of rows and columns. The basic structure of a relational model is tables. So, the tables
are also called relations in the relational model.
For Example, we have an Employee table.

Entity-Relationship Model - Entity-Relationship Model or simply ER Model is a high-level
data model diagram. In this model, we represent the real-world problem in the
pictorial form to make it easy for the stakeholders to understand. It is also very easy for the
developers to understand the system by just looking at the ER diagram. We use the ER
diagram as a visual tool to represent an ER Model.

ER diagram has the following three components:


 Entities: Entity is a real-world thing. It can be a person, place, or even a concept.
Example: Teachers, Students, Course, Building, Department, etc are some of the
entities of a School Management System.
 Attributes: An entity contains real-world properties called attributes. These are the
characteristics of that entity.
Example: The entity teacher has the property like teacher id, salary, age, etc.
 Relationship: Relationship tells how two entities are related.
Example: Teacher works for a department.

For Example:


In the above diagram, the entities are Teacher and Department. The attributes
of Teacher entity are Teacher_Name, Teacher_id, Age, Salary, Mobile_Number. The
attributes of entity Department entity are Dept_id, Dept_name. The two entities are
connected using the relationship. Here, each teacher works for a department.

Object-Oriented Data Model - In this model, both the data and relationship are present in a
single structure known as an object.
In this model, two or more objects are connected through links. We use these links to relate
one object to other objects. This can be understood by the example given below.

In the above example, we have two objects Employee and Department. All the data and
relationships of each object are contained as a single unit. The attributes like Name,
Job_title of the employee and the methods which will be performed by that object are
stored as a single object. The two objects are connected through a common attribute i.e the
Department_id and the communication between these two will be done with the help of this
common id.


DATABASE USERS
Database users are categorized based upon their interaction with the database. There are
seven types of database users in DBMS.
Database Administrator (DBA): Database Administrator (DBA) is a person/team who
defines the schema and also controls the 3 levels of database. The DBA will then create a
new account id and password for the user if he/she needs to access the database. DBA is also
responsible for providing security to the database and he allows only the authorized users to
access/modify the data base. DBA is responsible for the problems such as security breaches
and poor system response time.
 DBA also monitors the recovery and backup and provide technical support.
 The DBA has a DBA account in the DBMS, which is called a system or superuser
account.
 DBA repairs damage caused due to hardware and/or software failures.
 DBA is the one having privileges to perform DCL (Data Control Language)
operations such as GRANT and REVOKE, to allow/restrict a particular user from
accessing the database.
Naive / Parametric End Users: Parametric End Users are unsophisticated users who don’t
have any DBMS knowledge but frequently use database applications in their daily
life to get the desired results.
For example, Railway’s ticket booking users are naive users. Clerks in any bank are naive
users because they don’t have any DBMS knowledge but they still use the database and
perform their given tasks.
System Analyst: System Analyst is a user who analyzes the requirements of parametric end
users. They check whether all the requirements of end users are satisfied.
Sophisticated Users: Sophisticated users can be engineers, scientists, business analyst, who
are familiar with the database. They can develop their own database applications according to
their requirement. They don’t write the program code but they interact with the database by
writing SQL queries directly through the query processor.
Database Designers: Data Base Designers are the users who design the structure of database
which includes tables, indexes, views, triggers, stored procedures and constraints which are
usually enforced before the database is created or populated with data. He/she controls what
data must be stored and how the data items are to be related.
Application Programmers: Application Programmers, also referred to as System Analysts or
simply Software Engineers, are the back-end programmers who write the code for the


application programs. They are the computer professionals. These programs could be written
in Programming languages such as Visual Basic, Developer, C, FORTRAN, COBOL etc.
Casual Users / Temporary Users: Casual Users are the users who occasionally use/access
the database but each time when they access the database they require the new information,
for example, Middle or higher level manager.

ARCHITECTURE OR STRUCTURE OF DBMS


Query Processor - The query processor accepts queries from users and processes them by
accessing the database.
Parts of Query processor:
 DDL interpreter - This interprets DDL statements and records the definitions in the
data dictionary.
 DML compiler - This translates DML statements in a query language into low-level
instructions that the query evaluation engine understands.
A query can usually be translated into any of a number of alternative evaluation plans
that give the same result; the DML compiler performs query optimization by selecting
the best plan.
 Query evaluation engine - This engine executes the low-level instructions generated
by the DML compiler.
Storage Manager/Storage Management - A storage manager is a program module which
acts like interface between the data stored in a database and the application programs and
queries submitted to the system. Thus, the storage manager is responsible for storing,
retrieving and updating data in the database.
The storage manager components include:
 Authorization and integrity manager - Checks for integrity constraints and authority
of users to access data.
 Transaction manager - Ensures that the database remains in a consistent state although
there are system failures.
 File manager - Manages the allocation of space on disk storage and the data structures
used to represent information stored on disk.
 Buffer manager - It is responsible for retrieving data from disk storage into main
memory. It enables the database to handle data sizes that are much larger than the size
of main memory.
 Data structures implemented by the storage manager:
Data files: Store the database itself.


Data dictionary: Stores metadata about the structure of the database.


Indices: Provide fast access to data items.
Statistical Data: It stores statistical information about the data stored in the
database, like the number of records, blocks, etc. in a table. This information can
be used to execute a query efficiently.

DBMS CLIENT – SERVER ARCHITECTURE


 The DBMS design depends upon its architecture. The basic client/server architecture is
used to deal with a large number of PCs, web servers, database servers and other
components that are connected with networks.
 The client/server architecture consists of many PCs and a workstation which are
connected via the network.
 DBMS architecture depends upon how users are connected to the database to get their
request done.


Types of DBMS Architecture - There are three types of DBMS architecture


 Single tier architecture
 Two tier architecture
 Three tier architecture
Single tier architecture - In this type of architecture, the database is readily available on the
client machine, any request made by client doesn’t require a network connection to perform
the action on the database.
For example, let’s say you want to fetch the records of employee from the database and the
database is available on your computer system, so the request to fetch employee details will
be done by your computer and the records will be fetched from the database by your
computer as well. This type of system is generally referred to as a local database system.

Two tier architecture - In two-tier architecture, the Database system is present at the server
machine and the DBMS application is present at the client machine, these two machines are
connected with each other through a reliable network as shown in the below diagram.
Whenever the client machine makes a request to access the database present at the server using a
query language like SQL, the server performs the request on the database and returns the result
back to the client. Application connection interfaces such as JDBC and ODBC are used for the
interaction between server and client.

Three tier architecture - In three-tier architecture, another layer is present between the
client machine and server machine. In this architecture, the client application doesn’t
communicate directly with the database systems present at the server machine, rather the
client application communicates with server application and the server application internally
communicates with the database system present at the server.


UNIT-II


ER MODEL
An Entity–relationship model (ER model) describes the structure of a database with the help
of a diagram, which is known as Entity Relationship Diagram (ER Diagram).
The ER model defines the conceptual view of a database.
An ER model is a design or blueprint of a database that can later be implemented as a
database.
Component of ER Diagram

An ER diagram has three main components:


 Entity
 Attribute
 Relationship
Entity
An entity is an object or component of data. Entity is a real-world thing. It can be a person,
place, or even a concept.
An entity is represented as rectangle in an ER diagram.
Example: Teachers, Students, Course, Building, Department, etc are some of the entities of a
School Management System.
For example: In the following ER diagram we have two entities Student and College.


Weak Entity - An entity that depends on another entity is called a weak entity. The weak entity
doesn't contain any key attribute of its own. (or)
An entity that cannot be uniquely identified by its own attributes and relies on a
relationship with another entity is called a weak entity.
The weak entity is represented by a double rectangle.
For example – a bank account cannot be uniquely identified without knowing the bank to
which the account belongs, so bank account is a weak entity.

Attribute
An attribute describes the property of an entity. An attribute is represented as Oval in an ER
diagram.
There are four types of attributes:
 Key attribute
 Composite attribute
 Multivalued attribute
 Derived attribute
Key attribute - A key attribute can uniquely identify an entity from an entity set. For
example, student roll number can uniquely identify a student from a set of students. A key
attribute is represented by an oval, the same as other attributes, however the text of the key
attribute is underlined.

Composite attribute - An attribute that is a combination of other attributes is known as a
composite attribute. For example, in the student entity, the student address is a composite
attribute as an address is composed of other attributes such as pin code, state, country.


Multivalued attribute - An attribute that can hold multiple values is known as multivalued
attribute. It is represented with double ovals in an ER Diagram.
For example – A person can have more than one phone numbers so the phone number
attribute is multivalued.

Derived attribute - A derived attribute is one whose value is dynamic and derived from
another attribute. It is represented by dashed oval in an ER Diagram. For example – Person
age is a derived attribute as it changes over time and can be derived from another attribute
(Date of birth).

For Example, the complete entity type Student with its attributes can be represented as


Relationship
A relationship is represented by diamond shape in ER diagram, it shows the relationship
among entities.
There are four types of relationships:
 One to One
 One to Many
 Many to One
 Many to Many

One to One Relationship - When a single instance of an entity is associated with a single
instance of another entity then it is called one to one relationship.
For example, a person has only one passport and a passport is given to one person.

One to Many Relationship - When a single instance of an entity is associated with more
than one instances of another entity then it is called one to many relationship.
For example – a customer can place many orders but an order cannot be placed by many
customers.


Many to One Relationship - When more than one instance of an entity is associated with a
single instance of another entity then it is called a many to one relationship.
For example – many students can study in a single college but a student cannot study in many
colleges at the same time.

Many to Many Relationship - When more than one instance of an entity is associated with
more than one instance of another entity then it is called a many to many relationship.
For example, a student can be assigned to many projects and a project can be assigned to many
students.

RELATIONAL MODEL
The relational Model was proposed by E.F. Codd to model data in the form of relations or
tables. After designing the conceptual model of the Database using ER diagram, we need to
convert the conceptual model into a relational model which can be implemented using any
RDBMS language like Oracle SQL, MySQL, etc.
The relational model represents how data is stored in Relational Databases. A relational
database stores data in the form of relations (tables).
Consider a relation STUDENT with attributes ROLL_NO, NAME, ADDRESS, PHONE, and
AGE shown in the following table.
ROLL_NO NAME ADDRESS PHONE AGE
1 Rahul Hyderabad 9455123451 18
2 Ravi Vijayawada 9652431543 18
3 Mohan Tadepalligudem 9156253131 20
4 Satish Tanuku 18
Attribute: Attributes are the properties that define a relation. e.g.; ROLL_NO, NAME
Relation Schema: A relation schema represents the name of the relation with its attributes.
 e.g.; STUDENT (ROLL_NO, NAME, ADDRESS, PHONE, and AGE) is the relation
schema for STUDENT.


Tuple: Each row in the relation is known as a tuple. The above relation contains 4 tuples, one
of which is shown as:
1 Rahul Hyderabad 9455123451 18
Relation Instance: The set of tuples of a relation at a particular instance of time is called a
relation instance. The above table shows the relation instance of STUDENT at a particular
time. It can change whenever there is an insertion, deletion, or update in the database.
Degree: The number of attributes in the relation is known as the degree of the relation.
The STUDENT relation defined above has degree 5.
Cardinality: The number of tuples in a relation is known as cardinality.
The STUDENT relation defined above has cardinality 4.
Column: The column represents the set of values for a particular attribute. The
column ROLL_NO is extracted from the relation STUDENT.
ROLL NO
1
2
3
4
NULL Values: The value which is not known or unavailable is called a NULL value. It is
represented by blank space. e.g.; PHONE of STUDENT having ROLL_NO 4 is NULL.
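
The STUDENT relation above could be created and populated as follows (a sketch; the data
types are assumed). NULL is used where the phone number is not known:

    CREATE TABLE STUDENT (
        ROLL_NO INT,
        NAME    VARCHAR(50),
        ADDRESS VARCHAR(50),
        PHONE   VARCHAR(10),
        AGE     INT
    );  -- degree 5: the relation has five attributes

    INSERT INTO STUDENT VALUES (1, 'Rahul', 'Hyderabad', '9455123451', 18);
    INSERT INTO STUDENT VALUES (4, 'Satish', 'Tanuku', NULL, 18); -- unknown PHONE
    -- After inserting all four tuples, the cardinality of the relation is 4.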

Constraints in Relational Model


While designing the relational model, we define some conditions which must hold for data
present in the database; these conditions are called constraints. These constraints are checked before
performing any operation (insertion, deletion, and updation) in the database. If there is a
violation of any of the constraints, the operation will fail.
Domain Constraints - These are attribute-level constraints. An attribute can only take values
that lie inside the domain range.
e.g; If a constraint AGE>0 is applied to STUDENT relation, inserting a negative value of
AGE will result in failure.
Key Constraints - Every relation in the database should have at least one set of attributes
that defines a tuple uniquely. Such a set of attributes is called a key.
e.g.; ROLL_NO in STUDENT is a key. No two students can have the same roll number. So a
key has two properties:
 It should be unique for all tuples.
 It can’t have NULL values.
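
Domain and key constraints can be declared directly in the table definition; a sketch with
assumed data types:

    CREATE TABLE STUDENT (
        ROLL_NO INT PRIMARY KEY,      -- key constraint: unique and never NULL
        NAME    VARCHAR(50),
        AGE     INT CHECK (AGE > 0)   -- domain constraint: rejects invalid ages
    );

    INSERT INTO STUDENT VALUES (1, 'Rahul', 18);  -- succeeds
    INSERT INTO STUDENT VALUES (1, 'Ravi', 20);   -- fails: duplicate ROLL_NO (key constraint)
    INSERT INTO STUDENT VALUES (2, 'Mohan', -5);  -- fails: AGE violates the domain constraint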

Referential Constraints - When one attribute of a relation can only take values from another
attribute of the same relation or of any other relation, it is called a referential constraint.
Let us suppose we have 2 relations
STUDENT
ROLL_NO NAME ADDRESS PHONE AGE BRANCH_CODE
1 Rahul Hyderabad 9455123451 18 CS
2 Ravi Vijayawada 9652431543 18 CS
3 Mohan Tadepalligudem 9156253131 20 ECT
4 Satish Tanuku 18 ECE

BRANCH
BRANCH_CODE BRANCH_NAME
CS Computer Science
IT Information Technology
ECT Electronics and Communication Technology
ECE Electronics and Communication Engineering

BRANCH_CODE of STUDENT can only take the values which are present in
BRANCH_CODE of BRANCH which is called referential integrity constraint. The relation
which is referencing another relation is called REFERENCING RELATION (STUDENT in
this case) and the relation to which other relations refer is called REFERENCED RELATION
(BRANCH in this case).
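
The referential constraint between STUDENT and BRANCH can be declared with a
FOREIGN KEY; a sketch with assumed data types:

    CREATE TABLE BRANCH (
        BRANCH_CODE VARCHAR(5) PRIMARY KEY,
        BRANCH_NAME VARCHAR(60)
    );

    CREATE TABLE STUDENT (
        ROLL_NO     INT PRIMARY KEY,
        NAME        VARCHAR(50),
        BRANCH_CODE VARCHAR(5),
        FOREIGN KEY (BRANCH_CODE) REFERENCES BRANCH (BRANCH_CODE)
    );

    -- Fails: 'EEE' does not exist in the referenced relation BRANCH.
    INSERT INTO STUDENT VALUES (5, 'Kiran', 'EEE');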
Advantages of using the relational model
The advantages and reasons due to which the relational model in DBMS is widely accepted
as a standard are:
 Simple and Easy To Use - Storing data in tables is much easier to understand and
implement as compared to other storage techniques.
 Manageability - Because of the independent nature of each relation in a relational
database, it is easy to manipulate and manage. This improves the performance of the
database.
 Query capability - With the introduction of relational algebra, relational databases
provide easy access to data via high-level query language like SQL.
 Data integrity - With the introduction and implementation of relational constraints, the
relational model can maintain data integrity in the database.


Disadvantages of using the relational model


 The performance of the relational model depends upon the number of relations present in
the database.
 Hence, as the number of tables increases, the requirement of physical memory increases.
 The structure becomes complex and there is a decrease in the response time for the
queries.
 Because of all these factors, the cost of implementing a relational database increases.

INTRODUCTION TO DATABASE DESIGN


The Entity-Relationship(ER) data model allows us to describe the data involved in a real-
world enterprise in terms of objects and their relationships and is widely used to develop an
initial database design. It provides useful concepts that allow us to move from an informal
description of what users want from their database to a more detailed, precise description that
can be implemented in a DBMS.
Database Design and ER Diagrams or Major Steps in Database Design or Database
Design Process
The database design process can be divided into six steps.
Requirements Analysis - Before building the house, we have to analyze our requirements.
How many bedrooms, how many bathrooms, how many kitchens and what will be the area
etc... In simple words, we analyze our requirements. This is the very first step of designing
process. In this we have to analyze that what data to be stored in the database, what
applications must be built on top of it and what operations are most frequent and subject to
performance requirements. In this stage we must find out what the users want from the
database.
Conceptual Database Design - After analyzing our requirements, we will come to a basic
idea of the building according to our requirements. That means drawing a rough sketch. In
this step the information gathered in the requirements analysis step is used to develop a high-
level description of the data to be stored in the database, along with the constraints that are
known to hold over this data. This step is often carried out using E-R model or a similar high-
level data model.
Logical Database Design - Now based on the above design we make a building plan.
Simply, In our above step we focused on what to do. Here in logical database design we
focus on how to do. We must choose a DBMS to implement our database design and convert
the conceptual database design into a database schema in the data model of the chosen
DBMS. Task here is to convert E-R Schema into relational database schema.


Schema Refinement - To check whether our plan is perfect or not. Refinement means to
remove unwanted elements. In this step we analyze the collections of relations in our
relational database schema to identify potential problems and to refine it. In contrast to the
requirements analysis and conceptual design steps which are essentially subjective, schema
refinement can be guided by some elegant and powerful theory.
Physical Database Design - Based on the plan we start constructing the house. So we should
focus on arranging the required cement, clay, sand, iron and wood etc. and we construct the
house according to the capacity of the basement. In this step we must consider typical
expected workloads that our database must support and further refine the database design to
ensure that it meets desired performance criteria. This step may simply involve building
indexes on some tables and clustering some tables, or it may involve a substantial redesign of
parts of the database schema obtained from the earlier design steps.
Applications and Security Design - Our building is almost completed. Now we don't allow
everybody into our home. There are some restrictions. If the person is a stranger then we talk
to him outside the gate. If he is a friend, we let him come inside the home and make him sit in
the hall and talk, and if he is a relative we will allow him to stay with us... In this step, we identify
different user groups and different roles played by various users (e.g., the development team
for a product, the customer support representatives, the product manager). For each role and
user group, we must identify the parts of the database that they must be able to access and the
parts of the database that they should not be allowed to access and take steps to ensure that
they can access only the necessary parts.

ENTITIES, ATTRIBUTES AND ENTITY SETS


Entity
An entity is a real-world thing which can be distinctly identified like a person, place or a
concept. It is an object which is distinguishable from others. If we cannot distinguish it from
others then it is an object but not an entity.
An entity can be of two types:
Tangible Entity - Tangible Entities are those entities which exist in the real world
physically.
Example: Person, car, etc.
Intangible Entity - Intangible Entities are those entities which exist only logically and have
no physical existence.
Example: Bank Account, etc.


Example: If we have a table of a Student (Roll_no, Student_name, Age, Mobile_no) then


each student in that table is an entity and can be uniquely identified by their Roll Number i.e
Roll_no.

Entity Type
The entity type is a collection of the entity having similar attributes. In the above Student
table example, we have each row as an entity and they are having common attributes i.e each
row has its own value for attributes Roll_no, Age, Student_name and Mobile_no. So, we can
define the above STUDENT table as an entity type because it is a collection of entities having
the same attributes.
The table below shows how the data of different entities( different students) are stored.

The E-R representation of the above Student Entity Type is done below.

Note: We use a rectangle to represent an entity type in the E-R diagram, not entity.

Types of Entity type


 Strong Entity Type
 Weak Entity Type
Strong Entity Type - Strong entity types are those entity types which have a key attribute. The
primary key helps in identifying each entity uniquely. It is represented by a rectangle. In the


above example, Roll_no identifies each element of the table uniquely and hence, we can say
that STUDENT is a strong entity type.
For Example:

Weak Entity Type - Weak entity type doesn't have a key attribute. Weak entity type can't be
identified on its own. It depends upon some other strong entity for its distinct identity. This
can be understood with a real-life example. There can be children only if the parent exists.
There can be no independent existence of children. There can be a room only if the building
exists. There can be no independent existence of a room.
A weak entity is represented by a double outlined rectangle. The relationship between a weak
entity type and strong entity type is called an identifying relationship and shown with a
double outlined diamond instead of a single outlined diamond. This representation can be
seen in the diagram below.
Example : If we have two tables of Customer(Customer_id, Name, Mobile_no, Age, Gender)
and Address(Locality, Town, State). Here we cannot identify the address uniquely as there
can be many customers from the same locality. So, for this, we need an attribute of Strong
Entity Type i.e ‘Customer’ here to uniquely identify entities of 'Address' Entity Type.

Entity Set
Entity Set is a collection of entities of the same entity type. In the above example of
STUDENT entity type, a collection of entities from the Student entity type would form an
entity set.


For example

Differences between Entity, Entity Type and Entity Set


Entity | Entity Type | Entity Set
A thing in the real world with independent existence. | A category of a particular entity. | Set of all entities of a particular entity type.
Any particular row (a record) in a relation (table) is known as an entity. | The name of a relation (table) in RDBMS is an entity type. | All rows of a relation (table) in RDBMS form an entity set.

Attributes
An attribute is a property or characteristic of an entity. An entity may contain any number of
attributes. One of the attributes is considered as the primary key. In an Entity-Relationship
model, attributes are represented in an elliptical shape.
For Example: Student has attributes like name, age, roll number, and many more. To
uniquely identify the student, we use the primary key as a roll number as it is not repeated.
There are different types of attributes: Simple, Composite, Single-valued, Multi-valued,
Derived attribute, Stored attribute and Key attribute. One more attribute is there, i.e. the
Complex attribute; this is a rarely used attribute.
Simple Attribute: It is also known as an atomic attribute. When an attribute cannot be divided
further, then it is called a simple attribute.
For example, the roll number of a student, the id number of an employee.
Composite Attribute: Composite attributes are those that are made up of the composition of
more than one attribute. When any attribute can be divided further into more sub-attributes,
then that attribute is called a composite attribute.
For example, the address can be further split into house number, street number, city, state,
country, and pin code; the name can also be split into first name, middle name, and last name.
Single-valued Attribute: Those attributes which can have exactly one value are known as
single-valued attributes. They contain singular values, so more than one value is not allowed.
For example, the DOB of a student can be a single-valued attribute. Another example is
gender because one person can have only one gender.


Multi-valued Attribute: Those attributes which can have more than one entry or which
contain more than one value are called multi valued attributes.
In the Entity Relationship (ER) diagram, we represent the multi valued attribute by double
oval representation.
For example, one person can have more than one phone number, so that it would be a multi
valued attribute.
Derived Attribute: When one attribute can be derived from the other attribute, then it is
called a derived attribute.
For example, the age of a student can be a derived attribute because we can get it by the DOB
of the student.
Another example can be of working experience, which can be obtained by the date of joining
of an employee.
In the ER diagram, we represent the derived attributes by a dotted oval shape.
Stored Attributes: Values of stored attributes remain constant and fixed for an entity
instance, and they help in deriving the derived attributes.
For example, the Age attribute can be derived from the Date of Birth attribute, and also,
the Date of birth attribute has a fixed and constant value throughout the life of an entity.
Hence, the Date of Birth attribute is a stored attribute.
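
A derived attribute such as Age is usually not stored; it is computed from the stored Date of
Birth whenever it is needed, for example in a query (MySQL-style date functions shown;
other DBMSs use different functions):

    -- Age is derived on the fly from the stored Date_of_birth attribute.
    SELECT Name,
           TIMESTAMPDIFF(YEAR, Date_of_birth, CURDATE()) AS Age
    FROM   Employee;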
Key Attribute: Key attributes are those attributes that can uniquely identify the entity in the
entity set.
For example, Roll-No is the key attribute because it can uniquely identify the student.
Complex Attribute: If any attribute has the combining property of multi values and
composite attributes, then it is called a complex attribute. It means if one attribute is made up
of more than one attribute and each attribute can have more than one value, then it is called a
complex attribute.
For example, if a person has more than one office and each office has an address made from a
street number and city. So the address is a composite attribute, and offices are multi valued
attributes, so combining them is called complex attributes.

We can understand the attributes by the following example


In the given example, we have an ER diagram of a table named Employee. We have a lot of
attributes from the above table.
Department is a single valued attribute that can have only one value.
Name is a composite attribute because it is made up of the first name, middle name, and last
name attributes.


Work Experience attribute is a derived attribute, and it is represented by a dotted oval. We


can get the work experience by the other attribute date of joining.
Phone number is a multi-valued attribute because one employee can have more than one
phone number, which is represented by a double oval representation.

RELATIONSHIP AND RELATIONSHIP SETS


Relationship
A relationship is defined as an association among several entities. In ER diagram, the
relationship type is represented by a diamond and connecting the entities with lines.
For Example, ‘Enrolled in’ is a relationship that exists between entities Student and Course.

Relationship Set
A relationship set is a set of relationships of same type.
For Example, Set representation of above ER diagram is


Degree of a Relationship Set - The number of entity sets that participate in a relationship set
is termed as the degree of that relationship set. Thus,
Degree of a relationship set = Number of entity sets participating in a relationship set
On the basis of degree of a relationship set, a relationship set can be classified into the
following types
 Unary relationship set
 Binary relationship set
 Ternary relationship set
 N-ary relationship set
Unary Relationship Set - Unary relationship set is a relationship set where only one entity
set participates in a relationship set.
For Example,

Binary Relationship Set - Binary relationship set is a relationship set where two entity sets
participate in a relationship set.
For Example,

Ternary Relationship Set - Ternary relationship set is a relationship set where three entity
sets participate in a relationship set.
For Example,


N-ary Relationship Set - N-ary relationship set is a relationship set where 'n' entity sets
participate in a relationship set.

Mapping Cardinalities or Cardinality Ratio or Cardinality - Cardinality expresses the
number of entities to which another entity can be associated via a relationship.
Types of cardinality in between tables are:
 one-to-one
 one-to-many
 many-to-one
 many-to-many
One-to-one - An entity in A is associated with at most one entity in B, and an entity in B is
associated with at most one entity in A.

For example, a person has only one passport and a passport is given to one person.

One-to-many - An entity in A is associated with any number of entities in B, but an entity in
B is associated with at most one entity in A.


For example – a customer can place many orders, but an order cannot be placed by many
customers.

Many-to-one - An entity in A is associated with at most one entity in B, but an entity in B can be
associated with any number of entities in A.

For example – many students can study in a single college but a student cannot study in many
colleges at the same time.

Many-to-many - An entity in A is associated with any number of entities in B, and an entity
in B is associated with any number of entities in A.

For example, a student can be assigned to many projects and a project can be assigned to many
students.
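In a relational schema such a many-to-many association is usually realized with a separate
linking table; the following is a minimal sketch, in which the table and column names
(Student, Project, Assigned_To, student_id, project_id) are illustrative assumptions and not
part of the original example.
CREATE TABLE Student ( student_id INTEGER PRIMARY KEY );
CREATE TABLE Project ( project_id INTEGER PRIMARY KEY );
-- Assigned_To links students and projects; one row per (student, project) pair.
CREATE TABLE Assigned_To ( student_id INTEGER REFERENCES Student,
project_id INTEGER REFERENCES Project,
PRIMARY KEY (student_id, project_id) );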


CONCEPTUAL DESIGN WITH THE ER MODEL


Developing an ER diagram presents several design issues, including the following:
 Entity versus Attribute.
 Entity versus Relationship.
 Binary versus N-ary relationships.
 Placing Relationship Attributes.

Entity versus Attribute - While identifying the attributes of an entity set, it is sometimes not
clear whether a property should be modeled as an attribute or as an entity set.
In such cases we may treat the attribute as an entity. For example, consider the entity set employee
with attributes employee-name and telephone-number. It can easily be argued that a
telephone is an entity in its own right with attributes telephone-number and location (the
office where the telephone is located).

If we take this point of view, we must redefine the employee entity set as:
 The employee entity set with attribute employee-name.
 The telephone entity set with attributes telephone-number and location
 The relationship set emp-telephone, which denotes the association between employees
and the telephones that they have.
Such a conversion of an attribute into an entity helps to give extra information about it when
required; a relational sketch of this design is given below.
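A minimal sketch of the resulting tables (the key column emp_id and the data types are our
own illustrative assumptions; the original example lists only employee-name, telephone-number
and location):
-- emp_id is an assumed surrogate key, not part of the original attribute list.
CREATE TABLE Employee ( emp_id INTEGER PRIMARY KEY,
employee_name CHAR(30) );
CREATE TABLE Telephone ( telephone_number CHAR(15) PRIMARY KEY,
location CHAR(30) );
-- Emp_Telephone records which employee uses which telephone.
CREATE TABLE Emp_Telephone ( emp_id INTEGER REFERENCES Employee,
telephone_number CHAR(15) REFERENCES Telephone,
PRIMARY KEY (emp_id, telephone_number) );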


Entity versus Relationship - Sometimes, an entity set can be better expressed as a relationship
set. Thus, it is not always clear whether an object is best expressed by an entity set or a
relationship set.

Binary versus N-ary Relationships - It is always possible to replace a non-binary
relationship set by a number of distinct binary relationship sets.
For example, consider a ternary relationship R associated with three entity sets A, B and C.
We can replace the relationship set R by an entity set E and create three relationship sets as:
 R1, relating E and A
 R2, relating E and B
 R3, relating E and C
If the relationship set R had any attributes, these are assigned to entity set E. A special
identifying attribute is created for E.


Placing Relationship Attributes - The cardinality ratio in DBMS can help us determine in
which scenarios we need to place relationship attributes. It is recommended to represent the
attributes of one-to-one or one-to-many relationship sets with one of the participating entity
sets rather than with the relationship set.
For a many-to-many relationship set, however, an attribute cannot be placed with any single
participating entity set; such an attribute must be associated with the many-to-many
relationship set itself.

INTEGRITY CONSTRAINTS OVER RELATIONS


A database is only as good as the information stored in it, and a DBMS must therefore help
prevent the entry of incorrect information. An integrity constraint (IC) is a condition that is
specified on a database schema, and restricts the data that can be stored in an instance of the
database. If a database instance satisfies all the integrity constraints specified on the database
schema, it is a legal instance.
Integrity constraints are specified and enforced at different times:
 When the DBA or end user defines a database schema, he or she specifies the ICs that
must hold on any instance of this database.
 When a database application is run, the DBMS checks for violations and disallows
changes to the data that violate the specified ICs.
There are different types of integrity constraints in DBMS:
 Key Constraints
 Foreign Key Constraints or Referential Integrity Constraints
 General Constraints

Key Constraints
A key constraint is a constraint which uniquely identifies each tuple of a relation. A relation
should have at least one key constraint.
If any attribute is defined as a key attribute, its values must be different for each tuple of the
relation.
In the following Employee table, Employee_ID is a key attribute. No two tuples can have the
same value for this attribute, and this attribute cannot have any Null value.


Super Key - A super key is a set of attributes or single attribute that uniquely identify the
rows (tuples) in a table.
In the given Student Table we can have the following keys as the super key.

1. {Roll_no}
2. {Registration_no}
3. {Roll_no, Registration_no},
4. {Roll_no, Name}
5. {Name, Registration_no}
6. {Roll_no, Name, Registration_no}
All the above keys are able to uniquely identify each row. So, each of these keys is a super
key.
Candidate Key - A candidate key is a minimal super key, i.e., a super key with no redundant
attribute. It is called a minimal super key because we select a candidate key from the set of
super keys such that the selected candidate key has the minimum set of attributes required to
uniquely identify the table. Since it is selected from the set of super keys, all candidate
keys are super keys. Candidate keys are not allowed to have NULL values.
In the above example, we had 6 super keys, but not all of them can become candidate keys.
Only those super keys which have no redundant attributes become candidate keys.
1. {Roll_no}: This key doesn't have any redundant or repeating attribute. So, it can be
considered as a candidate key.
2. {Registration_no}: This key also doesn't have any repeating attribute. So, it can be
considered as a candidate key.


3. {Roll_no, Registration_no}: This key cannot be considered as a candidate key
because when we take the subsets of this key we get two attributes, i.e., Roll_no and
Registration_no. Each of these attributes is itself a candidate key. So, it is not a minimal
super key. Hence, this key is not a candidate key.
4. {Roll_no, Name}: This key cannot be considered as a candidate key because when we
take the subset of this key we get two attributes i.e. Roll_no or Name. Roll_no is a
candidate key. So, it is not a minimal super key. Hence, this key is not a candidate
key.
5. {Name, Registration_no}: This key cannot be considered as a candidate key because
when we take the subset of this key we get two attributes i.e Registration_no or
Name. Registration_no is a candidate key. So, it is not a minimal super key. Hence,
this key is not a candidate key.
6. {Roll_no, Name, Registration_no}: This key cannot be considered as a candidate key
because when we take the subset of this key we get three attributes i.e Roll_no,
Registration_no and Name. Two of these attributes i.e Roll_no and Registration_no
are the candidate key. So, it is not a minimal superkey. Hence, this key is not a
candidate key.
So, we conclude that only 2 out of the above 6 super keys are candidate keys, i.e.,
{Roll_no} and {Registration_no}.

Primary Key - The primary key is the minimal set of attributes which uniquely identifies any
row of a table. It is selected from a set of candidate keys. Any candidate key can become a
primary key. It depends upon the requirements and is done by the Database Administrator
(DBA). The primary key cannot have a NULL value. It cannot have a duplicate value.

In the above example, we saw that we have two candidate keys i.e (Roll_no) and
(Registration_no). From this set, we can select any key as the primary key for our table. It
depends upon our requirement. Here, if we are talking about a class, then selecting 'Roll_no' as
the primary key is more logical than 'Registration_no'.


Specifying Key Constraints - We can specify constraints at the time of creating the table
using CREATE TABLE statement. We can also specify the constraints after creating a table
using ALTER TABLE statement.
Consider Creation of “Student” table,
CREATE TABLE Students ( sid CHAR(20),
name CHAR(30),
login CHAR(20),
age INTEGER,
gpa REAL,
UNIQUE (name, age),
CONSTRAINT StudentsKey PRIMARY KEY (sid) )

Foreign Key Constraints or Referential Integrity Constraints


The foreign key of a table is the attribute which establishes the relationship among tables.
The foreign key is the attribute which points to the primary key of another table.
If we have two tables of Student and Course then we can establish a relationship between
these two tables using a foreign key. The ‘Course_id’ in the Student table is the foreign key
as it establishes the link between the Student and Course tables. So, if we need to find the
information about any course opted by a student, then we can go to the Course table using the
foreign key.


One thing that is to be noted here is that the foreign key of one table may or may not be the
primary key. But it should be the primary key of another table. In the above
example, Course_id is not a primary key in the Student table but it is a primary key in the
Course table.
Specifying Foreign Key Constraints –
CREATE TABLE Course ( Course_id INTEGER,
Course_name CHAR(20),
Duration_months INTEGER,
PRIMARY KEY (Course_id),
FOREIGN KEY (Course_id) REFERENCES Student (Course_id) )
The foreign key constraint states that every Course_id value in Course must also appear in
the Student table; that is, Course_id in Course is a foreign key referencing Student.
General Constraints
Domain, primary key, and foreign key constraints are considered to be a fundamental part of
the relational data model and are given special attention in most commercial systems.
Sometimes, however, it is necessary to specify more general constraints.
For example, we may require that student ages be within a certain range of values; given such
an IC specification, the DBMS will reject inserts and updates that violate the constraint. This
is very useful in preventing data entry errors.
If we specify that all students must be at least 20 years old, then only those students who are
20 years old or above are valid cases, i.e., a legal instance. All others, having an age of less
than 20 years, are invalid cases, i.e., an illegal instance.
Current relational database systems support such general constraints in the form of table
constraints and assertions. Table constraints are associated with a single table and are


checked whenever that table is modified. In contrast, assertions involve several tables and are
checked whenever any of these tables is modified.
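For instance, the age restriction mentioned above can be written as a table constraint using
CHECK; the following sketch simply repeats the Students definition used earlier with the
extra condition added.
CREATE TABLE Students ( sid CHAR(20),
name CHAR(30),
login CHAR(20),
age INTEGER,
gpa REAL,
PRIMARY KEY (sid),
CHECK (age >= 20) )
An assertion, in contrast, is created with CREATE ASSERTION and may refer to several
tables, which is why it is checked whenever any of those tables is modified.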

ADDITIONAL FEATURES OF ER MODEL OR EXTENDED ER FEATURES


Generalization
Generalization is the process of extracting common properties from a set of entities and
creating a generalized entity from them. It is a bottom-up approach in which two or more
entities can be generalized to a higher level entity if they have some attributes in common.

For Example we have two entities Student and Teacher. Attributes of Entity Student are
Name, Address & Grade. Attributes of Entity Teacher are: Name, Address & Salary

These two entities have two common attributes: Name and Address, we can make a
generalized entity with these common attributes.
We have created a new generalized entity Person, and this entity has the common attributes of
both the entities. As you can see in the following ER diagram, after the generalization
process the entities Student and Teacher have only the specialized attributes Grade and Salary
respectively, and their common attributes (Name & Address) are now associated with the new
entity Person, which is in a relationship with both the entities (Student & Teacher).


Specialization
In specialization, an entity is divided into sub-entities based on their characteristics. It is a
top-down approach where higher level entity is specialized into two or more lower level
entities.

For Example, there is an entity in the School database, whose name is Teacher.
The Teacher entity contains three attributes, whose names are Name, Age, and Salary.
This Teacher entity can be further broken into three entities, i.e., Math_Teacher,
English_Teacher, and Science_Teacher. These sub-entities are the three type of teacher
working in the school, and all have common attributes which are associated with the parent
entity Teacher.

Aggregation
In aggregation, the relationship between two entities is treated as a single entity: the
relationship, together with its corresponding entities, is aggregated into a higher level entity.

For example, the Center entity, the 'offers' relationship, and the Course entity together act as a
single entity, which is in a relationship with another entity, Visitor. In the real world, if a
visitor visits a coaching center, he will never enquire about the Course alone or just about the
Center; instead he will enquire about both.


Generalization vs Specialization
Generalization Specialization
Generalization is a bottom-up approach. Specialization is a top-down approach.
Generalization collects the common features of multiple entities to form a new entity. Specialization divides an entity to form multiple new entities that inherit some features of the splitting entity.
In Generalization, the higher level entity must have lower level entities. In Specialization, the higher level entity may not have lower level entities.
In Generalization, schema size reduces. In Specialization, schema size increases.
Generalization is applied to a group of entities. Specialization is applied to a single entity.
Generalization forms a single entity from multiple entities. Specialization forms multiple entities from a single entity.
Inheritance is not used in generalization. Inheritance can be used in specialization.


College Management System ER Diagram

Banking Management System ER Diagram


Library Management System ER Diagram

Online Shopping Management System or E-Commerce ER Diagram


Online Airline Reservation System ER Diagram

Hotel Management System ER Diagram


UNIT-III


RELATIONAL ALGEBRA
Relational Algebra is a procedural query language, which takes Relation as input and
generates relation as output.
Relational algebra works on relational model. The purpose of a query language is to retrieve
data from database or perform various operations such as insert, update, delete on the data.
When we say that relational algebra is a procedural query language, it means that it tells what
data is to be retrieved and how to retrieve it.
On the other hand, relational calculus is a non-procedural query language, which means it tells
what data is to be retrieved but doesn't tell how to retrieve it.

Types of operations/operators in relational algebra


We have divided these operations in two categories:
1. Basic/Fundamental Operations
2. Additional/Derived Operations
Basic/Fundamental Operations:
 Select (σ)
 Project (∏)
 Rename (ρ)
 Union (∪)
 Set Difference (-)
 Cartesian product (X)
Derived Operations:
 Join
 Set Intersection (∩)
 Division (÷)
 Assignment(←)
The selection, projection and rename operations are called unary operations because they
operate only on one relation.
The other operations operate on pairs of relations and are therefore called binary operations.

SELECTION AND PROJECTION


Select Operator (σ) - Select Operator is denoted by sigma (σ) and it is used to find the tuples
(or rows) in a relation (or table) which satisfy the given condition. It is a unary
operator, which means it requires only one operand.


The Syntax of Select Operator (σ) is


σ Condition(Relation/Table_name)
For example consider the following table student
ROLL NAME AGE
1 Amar 20
2 Ramesh 18
3 Latha 19
4 Sandya 20
Suppose we want the row(s) from STUDENT Relation where "AGE" is 20
σ AGE=20 (STUDENT)
This will return the following output:
ROLL NAME AGE
1 Amar 20
4 Sandya 20

Project Operator (∏) - Project operator is denoted by ∏ symbol and it is used to select
desired columns (or attributes) from a table (or relation). It eliminates duplicates.

The Syntax of Project Operator (∏) is


∏ column_name1, column_name2, ...., column_nameN (Table_name)

Suppose we want the names of all students from STUDENT Relation.


∏ NAME(STUDENT)
This will return the following output:
NAME
Amar
Ramesh
Latha
Sandya

For multiple attributes, we can separate them using a ",".


∏ ROLL,NAME(STUDENT)
Above code will return two columns, ROLL and NAME.


ROLL NAME
1 Amar
2 Ramesh
3 Latha
4 Sandya

Display the ROLL and NAME of the students whose AGE is 20


∏ ROLL,NAME(σ AGE=20 (STUDENT))
This will return the following output:
ROLL NAME
1 Amar
4 Sandya
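For comparison, the same request can be written in SQL (SQL is covered later in this unit);
this is a sketch assuming a table STUDENT with the columns shown above, where DISTINCT
mirrors the duplicate elimination performed by ∏.
SELECT DISTINCT ROLL, NAME FROM STUDENT WHERE AGE = 20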

RENAME
Rename (ρ) - Rename (ρ) operation can be used to rename a relation or an attribute of a
relation. Rename operation is denoted by "Rho"(ρ).

The Syntax of Rename (ρ) is


ρ(new_relation_name, old_relation_name)

We can use the rename operator to rename STUDENT relation to STUDENT1.


ρ(STUDENT1, STUDENT)

Suppose we are fetching the names of students from STUDENT relation. We would like to
rename this relation as STUDENT_NAME.
ρ(STUDENT_NAME,∏ NAME(STUDENT))
STUDENT_NAME
NAME
Amar
Ramesh
Latha
Sandya
As we can see, this output relation is named "STUDENT_NAME".


The name and age columns of the STUDENT table are renamed as SNAME and SAGE respectively
ρ SNAME,SAGE (∏ NAME,AGE(STUDENT))

Given a relation 'STUDENT', the expression ρx (STUDENT) returns the same relation
'STUDENT' under a new name 'x'.
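SQL offers a similar facility for renaming columns with the AS keyword; a small sketch on
the same STUDENT table:
SELECT NAME AS SNAME, AGE AS SAGE FROM STUDENT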

SET OPERATIONS
Set Operations are Union (∪), Set Intersection (∩), Set Difference (-) and Cartesian product
(X).
Union Operator (∪)
Union operation is done by Union Operator which is represented by "union"(∪). It is the
same as the union operator from set theory, i.e., it selects all tuples from both relations but
with the exception that for the union of two relations/tables both relations must have the same
set of Attributes. It is a binary operator as it requires two operands.
If the relations do not have the same set of attributes, then the union of such relations is not
defined (it cannot be computed).
The Syntax of Union Operator (∪) is
table_name1 ∪ table_name2
The rows (tuples) that are present in both the tables will only appear once in the union set. In
short we can say that there are no duplicates present after the union operation.
For Example, Consider two relations STUDENT and EMPLOYEE
STUDENT
ROLL NAME AGE
1 Amar 20
2 Ramesh 18
3 Latha 19
4 Sandya 20
EMPLOYEE
EMPLOYEE_NO NAME AGE
E-1 Amar 30
E-2 Ramya 33
E-3 Lokesh 29
E-4 Harsha 35


Suppose we want all the names from STUDENT and EMPLOYEE relation.
∏ NAME(STUDENT) ∪ ∏ NAME(EMPLOYEE)
Then the output is
NAME
Amar
Ramesh
Latha
Sandya
Ramya
Lokesh
Harsha

Intersection Operator (∩)


Intersection operator is denoted by ∩ symbol and it is used to select common rows (tuples)
from two tables (relations).
Only those rows that are present in both the tables will appear in the result set. In
Intersection, duplicates are automatically removed.
The Syntax of Intersection Operator (∩) is
table_name1 ∩ table_name2
Suppose we want common names from STUDENT and EMPLOYEE relation.
∏ NAME(STUDENT) ∩ ∏ NAME(EMPLOYEE)
Then the output is
NAME
Amar

Set Difference Operator (-)


Set Difference is denoted by – symbol. Suppose we have two relations R1 and R2 and we
want to select all those tuples (rows) that are present in Relation R1 but not present in
Relation R2, this can be done using Set difference R1 – R2.
The Syntax of Set Difference (-) is
table_name1 - table_name2
Let's take an example where we would like to know the names of students who are in
STUDENT Relation but not in EMPLOYEE Relation.
∏ NAME(STUDENT) - ∏ NAME(EMPLOYEE)


This will give us the following output:


NAME
Ramesh
Latha
Sandya

Cartesian product Operator (X)


Cartesian product is denoted by X symbol. Suppose we have two relations R1 and R2 then
the Cartesian product of these two relations (R1 X R2) would combine each tuple of first
relation R1 with the each tuple of second relation R2.
The number of rows in the output will always be the cross product of number of rows in each
table. For example table 1 has n rows and table 2 has m rows so the output has n×m rows.
The Syntax of Cartesian product (X) is
R1 X R2
For Example, Consider two tables R and S
R
A B
AA 100
BB 200
CC 300
S
X Y
XX 99
YY 11
ZZ 101
Let’s find the Cartesian product of table R and S.
RXS
Then the output is
A B X Y
AA 100 XX 99
AA 100 YY 11
AA 100 ZZ 101
BB 200 XX 99
BB 200 YY 11


BB 200 ZZ 101
CC 300 XX 99
CC 300 YY 11
CC 300 ZZ 101
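In SQL, the same cross product can be obtained by listing both tables in the FROM clause; a
sketch assuming R and S exist as tables as shown:
SELECT * FROM R, S
An equivalent form is SELECT * FROM R CROSS JOIN S.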

JOINS OR JOIN OPERATIONS


Join operations in DBMS are binary operations that allow us to combine two or more
relations.
They are further classified into two types:
 Inner Join
 Outer Join
Inner Join
When we perform an Inner Join, only those tuples are returned that satisfy a certain condition.
It is also classified into three types:
 Theta Join or Conditional Join
 Equi Join
 Natural Join
Consider two relations EMPLOYEE and DEPARTMENT
EMPLOYEE
E_NO E_NAME CITY EXPERIENCE
E-1 Ram Delhi 04
E-2 Varun Chandigarh 09
E-3 Ravi Noida 03
E-4 Amit Bangalore 07

DEPARTMENT
D_NO D_NAME E_NO MIN_EXPERIENCE
D-1 HR E-1 03
D-2 IT E-2 05
D-3 Marketing E-3 02

It will be much easier to understand Join Operations when we have the Cartesian product.
The Cartesian Product of the above two relations is


E_NO E_NAME CITY EXPERIENCE D_NO D_NAME E_NO MIN_EXPERIENCE
E-1 Ram Delhi 04 D-1 HR E-1 03
E-1 Ram Delhi 04 D-2 IT E-2 05
E-1 Ram Delhi 04 D-3 Marketing E-3 02
E-2 Varun Chandigarh 09 D-1 HR E-1 03
E-2 Varun Chandigarh 09 D-2 IT E-2 05
E-2 Varun Chandigarh 09 D-3 Marketing E-3 02
E-3 Ravi Noida 03 D-1 HR E-1 03
E-3 Ravi Noida 03 D-2 IT E-2 05
E-3 Ravi Noida 03 D-3 Marketing E-3 02
E-4 Amit Bangalore 07 D-1 HR E-1 03
E-4 Amit Bangalore 07 D-2 IT E-2 05
E-4 Amit Bangalore 07 D-3 Marketing E-3 02

Theta Join (⋈Ɵ) or Conditional Join (⋈c): Conditional Join is used when you want to join
two or more relations based on some condition.
Notation: R ⋈θ S Where R is the first relation S is the second relation
Example: we want a relation where EXPERIENCE from EMPLOYEE >=
MIN_EXPERIENCE from DEPARTMENT.

EMPLOYEE ⋈ EMPLOYEE.EXPERIENCE>=DEPARTMENT.MIN_EXPERIENCE DEPARTMENT


Result is:
E_NO E_NAME CITY EXPERIENCE D_NO D_NAME E_NO MIN_EXPERIENCE
E-1 Ram Delhi 04 D-1 HR E-1 03
E-1 Ram Delhi 04 D-3 Marketing E-3 02
E-2 Varun Chandigarh 09 D-1 HR E-1 03
E-2 Varun Chandigarh 09 D-2 IT E-2 05
E-2 Varun Chandigarh 09 D-3 Marketing E-3 02
E-3 Ravi Noida 03 D-1 HR E-1 03
E-3 Ravi Noida 03 D-3 Marketing E-3 02

E-4 Amit Bangalore 07 D-1 HR E-1 03
E-4 Amit Bangalore 07 D-2 IT E-2 05
E-4 Amit Bangalore 07 D-3 Marketing E-3 02

Equijoin(⋈): Equijoin is a special case of conditional join where only equality condition
holds between a pair of attributes.
A non-equijoin is the inverse of an Equi join, which occurs when you join on a condition
other than "=".
Example: we would like to join EMPLOYEE and DEPARTMENT relation where E_NO
from EMPLOYEE = E_NO from DEPARTMENT.

EMPLOYEE ⋈EMPLOYEE.E_NO = DEPARTMENT.E_NO DEPARTMENT


Result is:
E_NO E_NAME CITY EXPERIENCE D_NO D_NAME E_NO MIN_EXPERIENCE
E-1 Ram Delhi 04 D-1 HR E-1 03
E-2 Varun Chandigarh 09 D-2 IT E-2 05
E-3 Ravi Noida 03 D-3 Marketing E-3 02

Natural Join (⋈): A Natural Join can be performed only if two relations share at least one
common attribute. Furthermore, the attributes must share the same name and domain.
Natural join matches tuples on the common attributes where the values in both
relations are the same, and removes the duplicate columns from the result.
Notation: R ⋈ S Where R is the first relation S is the second relation

Example: we want to join EMPLOYEE and DEPARTMENT relations with E_NO as a
common attribute.
Notice, here E_NO has the same name in both the relations and also consists of the same
domain, i.e., in both relations E_NO is a string.

EMPLOYEE ⋈ DEPARTMENT


Result is:
E_NO E_NAME CITY EXPERIENCE D_NO D_NAME MIN_EXPERIENCE
E-1 Ram Delhi 04 D-1 HR 03
E-2 Varun Chandigarh 09 D-2 IT 05
E-3 Ravi Noida 03 D-3 Marketing 02
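For comparison, the equi join and natural join above can be written in SQL roughly as follows
(a sketch assuming EMPLOYEE and DEPARTMENT exist as tables with the columns shown):
SELECT * FROM EMPLOYEE E JOIN DEPARTMENT D ON E.E_NO = D.E_NO
SELECT * FROM EMPLOYEE NATURAL JOIN DEPARTMENT
NATURAL JOIN keeps a single copy of the common column E_NO, mirroring the
duplicate-column removal described above.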

Outer Join
Unlike Inner Join which includes the tuple that satisfies the given condition, Outer Join also
includes some/all the tuples which don't satisfy the given condition.
It is also of three types:
 Left Outer Join
 Right Outer Join
 Full Outer Join
Left Outer Join: Left Outer Join returns the matching tuples (tuples present in both
relations) and the tuples which are only present in the Left Relation, here R.
For the tuples of R that have no matching tuple in S, the attributes/columns of the Right
Relation, here S, are made NULL in the output relation.
Example:
EMPLOYEE ⟕EMPLOYEE.E_NO = DEPARTMENT.E_NO DEPARTMENT
Here we are combining EMPLOYEE and DEPARTMENT relation with the constraint that
EMPLOYEE's E_NO must be equal to DEPARTMENT's E_NO.
Result is:
E_NO E_NAME CITY EXPERIENCE D_NO D_NAME MIN_EXPERIENCE
E-1 Ram Delhi 04 D-1 HR 03
E-2 Varun Chandigarh 09 D-2 IT 05
E-3 Ravi Noida 03 D-3 Marketing 02
E-4 Amit Bangalore 07 NULL NULL NULL

We can see here, all the tuples from left, i.e., EMPLOYEE relation are present. But E-4 is not
satisfying the given condition, i.e., E_NO from EMPLOYEE must be equal to E_NO from
DEPARTMENT, still it is included in the output relation. This is because Outer Join also
includes some/all the tuples which don't satisfy the condition. That's why Outer Join marked
E-4's corresponding tuple/row from DEPARTMENT as NULL.


Right Outer Join: Right Outer Join returns the matching tuples and the tuples which are
only present in the Right Relation, here S.
Similarly, for the tuples of S that have no matching tuple in R, the attributes of the Left
Relation, here R, are made NULL in the output relation.

We will combine EMPLOYEE and DEPARTMENT relations with the same constraint as
above.
EMPLOYEE ⟖EMPLOYEE.E_NO = DEPARTMENT.E_NO DEPARTMENT
Result is:
E_NO E_NAME CITY EXPERIENCE D_NO D_NAME MIN_EXPERIENCE
E-1 Ram Delhi 04 D-1 HR 03
E-2 Varun Chandigarh 09 D-2 IT 05
E-3 Ravi Noida 03 D-3 Marketing 02
As all the tuples from DEPARTMENT relation have a corresponding E_NO in EMPLOYEE
relation, therefore no tuple from EMPLOYEE relation contains a NULL.

Full Outer Join: Full Outer Join returns all the tuples from both relations. However, if there
are no matching tuples then, their respective attributes are made NULL in output relation.
Again, combine the EMPLOYEE and DEPARTMENT relation with the same constraint.
EMPLOYEE ⟗EMPLOYEE.E_NO = DEPARTMENT.E_NO DEPARTMENT
Result is:
E_NO E_NAME CITY EXPERIENCE D_NO D_NAME MIN_EXPERIENCE
E-1 Ram Delhi 04 D-1 HR 03
E-2 Varun Chandigarh 09 D-2 IT 05
E-3 Ravi Noida 03 D-3 Marketing 02
E-4 Amit Bangalore 07 NULL NULL NULL
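In SQL the three outer joins are written with LEFT, RIGHT and FULL OUTER JOIN; a
sketch on the same two tables:
SELECT * FROM EMPLOYEE E LEFT OUTER JOIN DEPARTMENT D ON E.E_NO = D.E_NO
SELECT * FROM EMPLOYEE E RIGHT OUTER JOIN DEPARTMENT D ON E.E_NO = D.E_NO
SELECT * FROM EMPLOYEE E FULL OUTER JOIN DEPARTMENT D ON E.E_NO = D.E_NO
Not every system supports FULL OUTER JOIN (MySQL, for example, does not); where it is
missing, it can be simulated with a UNION of the left and right outer joins.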

DIVISION
Division (÷) - Division Operation is represented by "division"(÷ or /) operator and is used in
queries that involve keywords "every", "all", etc.
Notation : R(X,Y)/S(Y)
Here, R is the first relation from which data is retrieved.
S is the second relation that will help to retrieve the data.


X and Y are the attributes/columns present in relation. We can have multiple attributes in
relation, but keep in mind that attributes of S must be a proper subset of attributes of R.
The above notation returns those values of X for which a tuple <X, Y> exists in R for every
value of Y present in S.

We have two relations, ENROLLED and COURSE.


ENROLLED consist of two attributes STUDENT_ID and COURSE_ID. It denotes the map
of students who are enrolled in given courses.
COURSE contains the list of courses available.
Here attributes/columns of COURSE relation are a proper subset of attributes/columns of
ENROLLED relation. Hence Division operation can be used here.
ENROLLED
STUDENT_ID COURSE_ID
Student_1 DBMS
Student_2 DBMS
Student_1 OS
Student_3 OS
COURSE
COURSE_ID
DBMS
OS
Now the query is to return the STUDENT_ID of students who are enrolled in every course.
ENROLLED(STUDENT_ID, COURSE_ID)/COURSE(COURSE_ID)
This will return the following relation as output.
STUDENT_ID
Student_1
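SQL has no direct division operator; the "enrolled in every course" query above is usually
written with a double NOT EXISTS. A sketch using the ENROLLED and COURSE tables
shown:
SELECT DISTINCT E1.STUDENT_ID
FROM ENROLLED E1
WHERE NOT EXISTS
( SELECT * FROM COURSE C
WHERE NOT EXISTS
( SELECT * FROM ENROLLED E2
WHERE E2.STUDENT_ID = E1.STUDENT_ID AND E2.COURSE_ID = C.COURSE_ID ) )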
Takeaway
 Theta Join (θ) combines two relations based on a condition.
 Equi Join is a type of Theta Join where only equality condition (=) is used.
 Natural Join (⋈) combines two relations based on a common attribute (preferably foreign
key).
 Left Outer Join (⟕) returns the matching tuples and tuples which are only present in the
left relation.


 Right Outer Join (⟖) returns the matching tuples and tuples which are only present in the
right relation.
 Full Outer Join (⟗) returns all the tuples present in the left and right relations.

MORE EXAMPLES OF RELATIONAL ALGEBRA QUERIES


We now present several examples to illustrate how to write queries in relational algebra. We
use the Sailors, Reserves, and Boats schema for all our examples in this section.

(Q1) Find the names of sailors who have reserved boat 103.

We first compute the set of tuples in Reserves with bid = 103 and then take the natural join of
this set with Sailors. Evaluated on the instances R2 and S3, it yields a relation that contains
just one field, called sname, and three tuples Dustin, Horatio, and Lubber.
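One possible way to write this in the algebra notation used earlier (the temporary relation
name Temp1 is our own choice):
ρ (Temp1, σ bid=103 (Reserves))
∏ sname (Temp1 ⋈ Sailors)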


(Q2) Find the names of sailors who have reserved a red boat.

This query involves a series of two joins. First we choose (tuples describing) red boats. Then
we join this set with Reserves (natural join, with equality specified on the bid column) to
identify reservations of red boats. Next we join the resulting intermediate relation with
Sailors (natural join, with equality specified on the sid column) to retrieve the names of
sailors who have made reservations of red boats. Finally, we project the ‘sailors' names. The
answer, when evaluated on the instances B1, R2 and S3, contains the names Dustin, Horatio,
and Lubber.

(Q3) Find the colors of boats reserved by Lubber.

This query is very similar to the query we used to compute sailors who reserved red boats. On
instances B1, R2, and S3, the query will return the colors green and red.

(Q4) Find the names of sailors who have reserved at least one boat.

The join of Sailors and Reserves creates an intermediate relation in which tuples consist of a
Sailors tuple attached to a Reserves tuple. A Sailors tuple appears in (some tuple of) this
intermediate relation only if at least one Reserves tuple has the same sid value, that is, the
sailor has made some reservation. The answer, when evaluated on the instances B1, R2 and
S3, contains the three tuples Dustin, Horatio, and Lubber. Even though there are two sailors
called Horatio who have reserved a boat, the answer contains only one copy of the tuple
Horatio, because the answer is a relation, i.e., a set of tuples, without any duplicates.

(Q5) Find the names of sailors who have reserved a red or a green boat.

We identify the set of all boats that are either red or green (Tempboats, which contains boats
with the bids 102, 103, and 104 on instances B1, R2, and S3). Then we join with Reserves to
identify sids of sailors who have reserved one of these boats; this gives us sids 22, 31, 64, and
74 over our example instances. Finally, we join (an intermediate relation containing this set
of sids) with Sailors to find the names of Sailors with these sids. This gives us the names
Dustin, Horatio, and Lubber on the instances B1, R2, and S3.
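One possible algebraic form of this query, using the Tempboats idea described above:
ρ (Tempboats, (σ color='red' (Boats)) ∪ (σ color='green' (Boats)))
∏ sname (Tempboats ⋈ Reserves ⋈ Sailors)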


(Q6) Find the names of sailors who have reserved at least two boats.

First we compute tuples of the form 〈sid, sname, bid〉, where sailor sid has made a
reservation for boat bid; this set of tuples is the temporary relation Reservations. Next we find
all pairs of Reservations tuples where the same sailor has made both reservations and the
boats involved are distinct. Here is the central idea, in order to show that a sailor has reserved
two boats, we must find two Reservations tuples involving the same sailor but distinct boats.
Over instances B1, R2, and S3, the sailors with sids 22, 31, and 64 have each reserved at least
two boats. Finally, we project the names of such sailors to obtain the answer, containing the
names Dustin, Horatio, and Lubber.

(Q7) Find the sids of sailors whose age is over 20 and who have not reserved a red boat.

This query illustrates the use of the set-difference operator. We use the fact that sid is the key
for Sailors. We first identify sailors aged over 20 (over instances B1, R2, and S3, sids 22, 29,
31, 32, 58, 64, 74, 85, and 95) and then discard those who have reserved a red boat (sids 22,
31, and 64), to obtain the answer (sids 29, 32, 58, 74, 85, and 95). If we want to compute the
names of such sailors, we must first compute their sids (as shown above), and then join with
Sailors and project the sname values.

(Q8) Find the names of sailors who have reserved all boats.

The intermediate relation Tempsids is defined using division, and computes the set of sids of
sailors who have reserved every boat (over instances B1, R2, and S3, this is just sid 22). We
define the two relations that the division operator (/) is applied to—the first relation has the
schema (sid,bid) and the second has the schema (bid). Division then returns all sids such that
there is a tuple 〈sid,bid〉 in the first relation for each bid in the second. Joining Tempsids


with Sailors is necessary to associate names with the selected sids, for sailor 22, the name is
Dustin.
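One possible algebraic form of this query, following the description above:
ρ (Tempsids, (∏ sid,bid (Reserves)) / (∏ bid (Boats)))
∏ sname (Tempsids ⋈ Sailors)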

(Q9) Find the names of sailors who have reserved all boats called Interlake.

The only difference with respect to the previous query is that now we apply a selection to
Boats, to ensure that we compute only bids of boats named Interlake in defining the second
argument to the division operator. Over instances B1, R2, and S3, Tempsids evaluates to sids
22 and 64, and the answer contains their names, Dustin and Horatio.

RELATIONAL CALCULUS
Relational calculus is a non-procedural query language that tells the system what data to be
retrieved but doesn’t tell how to retrieve it. It uses mathematical predicate calculus.
There are two types of Relational Calculus:
 Tuple Relational Calculus
 Domain Relational Calculus
TUPLE RELATIONAL CALCULUS (TRC)
Tuple relational calculus was originally proposed by Codd in the year 1972. Tuple
relational calculus is used for selecting those tuples that satisfy the given condition.

The general syntax for Tuple Relational Calculus is:


{ t | P (t) } or { t | Condition (t) }
In the above syntax, t is the resulting tuples, and P(t) is the condition used to get t.
P(t) may have various conditions logically combined with OR (∨), AND (∧), NOT(¬).
It also uses quantifiers:
∃ t ∈ r (P(t)) = “there exists” a tuple in t in relation r such that predicate P(t) is true.
∀ t ∈ r (P(t)) = P(t) is true “for all” tuples in relation r.
Consider a relation STUDENT
First_Name Last_Name Age
Narendra Chari 28
Swapna Kumari 30
Jagan Mohan 26
Navya Kumari 29


1: Find the First_Name, Last_Name, Age of students whose age is greater than or equal to 27.
{t| t ∈ STUDENT ∧ t[Age]>=27} or { t | STUDENT(t) AND t.age >=27 }
Resulting relation:
First_Name Last_Name Age
Narendra Chari 28
Swapna Kumari 30
Navya Kumari 29
2: Query to display the first name of those students where age is greater than 29
{ t.First_Name | STUDENT(t) AND t.age > 29 }
Resulting relation:
First_Name
Swapna

3: Query to display all the details of students where Last name is ‘Kumari’
{ t | STUDENT(t) AND t.Last_Name = 'Kumari' }
Resulting relation:
First_Name Last_Name Age
Swapna Kumari 30
Navya Kumari 29

DOMAIN RELATIONAL CALCULUS (DRC)


In domain relational calculus, filtering is done based on the domain of the attributes and not
based on the tuple values.
The general syntax for Domain Relational Calculus is:
{ c1, c2, c3, ..., cn | P(c1, c2, c3, ... ,cn)}
where c1, c2... etc represents domain of attributes(columns) and P defines the formula
including the condition for fetching the data.

Example: Query to find the first name and age of students where student age is greater than
27
{ <First_Name, Age> | ∃ Last_Name ( <First_Name, Last_Name, Age> ∈ STUDENT ∧ Age > 27 ) }
Resulting relation:


First_Name Age
Narendra 28
Swapna 30
Navya 29

SQL (STRUCTURED QUERY LANGUAGE)


SQL was initially developed at IBM by Donald D. Chamberlin and Raymond F. Boyce after
learning about the relational model from Edgar F. Codd in the early 1970s.
It is used in programming and managing data held in relational database management
systems such as MySQL, MS SQL Server, Oracle, Sybase, etc., as a medium (instructions) for
accessing and interacting with data.

Types of SQL Commands


The following are different types of SQL commands:
DDL (Data Definition Language): In SQL DDL commands are used to create and modify the
structure of a database and database objects. These commands
are CREATE, DROP, ALTER, TRUNCATE, and RENAME.
DML (Data Manipulation Language): Once the tables are created and the database is
generated using DDL commands, manipulation inside those tables and databases is done
using DML commands. DML is used for inserting, deleting, and updating data in a database.
It is used to retrieve and manipulate data in a relational database. It includes INSERT,
UPDATE, and DELETE.
DQL (Data Query Language): DQL commands are used for fetching data from a relational
database. They perform read-only queries of data. Its only command, SELECT, is comparable
to the projection operation in relational algebra. It selects attributes based on the condition
described by the WHERE clause and returns them.
DCL (Data Control Language): DCL includes commands such
as GRANT and REVOKE which mainly deal with the rights, permissions, and other controls
of the database system.
TCL (Transaction Control Language): Transaction Control Language as the name suggests
manages the issues and matters related to the transactions in any database. They are used to
roll back or commit the changes in the database.


THE FORM OF A BASIC SQL QUERY


The basic form of an SQL query is as follows:
SELECT [ DISTINCT ] select-list
FROM from-list
WHERE qualification

Consider the syntax of a basic SQL query in more detail.


 The from-list in the FROM clause is a list of table names. A table name can be followed
by a range variable; a range variable is particularly useful when the same table name
appears more than once in the from-list.
 The select-list is a list of column names of tables named in the from-list. Column names
can be prefixed by a range variable.
 The qualification in the WHERE clause is a boolean combination (i.e., an expression
using the logical connectives AND, OR, and NOT) of conditions of the form expression
op expression, where op is one of the comparison operators {<, <=, =, <>, >=, >}. An
expression is a column name, a constant, or an (arithmetic or string) expression.
 The DISTINCT keyword is optional. It indicates that the table computed as an answer to
this query should not contain duplicates, that is, two copies of the same row. The default
is that duplicates are not eliminated.

We will present a number of sample queries using the following table definitions:
Sailors(sid: integer, sname: string, rating: integer, age: real)
Boats(bid: integer, bname: string, color: string)
Reserves(sid: integer, bid: integer, day: date)


Example:
1. Find the names and ages of all Sailors
SELECT DISTINCT sname, age FROM Sailors (or)
SELECT DISTINCT S.sname, S.age FROM Sailors S
Answer is:
The answer is a set of rows, each of which is a pair (sname, age). If two or more sailors have
the same name and age, the answer still contains just one pair with that name and age. This
query is equivalent to applying the projection operator of relational algebra.
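In the algebra notation of the previous sections, that projection is:
∏ sname,age (Sailors)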


If we omit the keyword DISTINCT then the Query is


SELECT S.sname, S.age FROM Sailors S
Answer is:

2. Find all sailors with a rating above 7.


SELECT sid, sname, rating, age FROM Sailors WHERE rating >7 (or)
SELECT * FROM Sailors WHERE rating >7 (or)
SELECT S.sid, S.sname, S.rating, S.age FROM Sailors AS S WHERE S.rating > 7 -
This query uses the optional keyword AS to introduce a range variable.

Conceptual evaluation strategy


 Compute the cross-product of the tables in the from-list.
 Delete those rows in the cross-product that fail the qualification conditions.
 Delete all columns that do not appear in the select-list.
 If DISTINCT is specified, eliminate duplicate rows.
We illustrate the conceptual evaluation strategy using the following query:

3. Find the names of sailors who have reserved boat number 103.
It can be expressed in SQL as follows.
SELECT S.sname FROM Sailors S, Reserves R WHERE S.sid = R.sid AND R.bid=103

Let us compute the answer to above query on the instances R3 of Reserves and S4 of Sailors
shown below


The first step is to construct the cross-product S4 x R3, which is shown below

The second step is to apply the qualification S.sid = R.sid AND R.bid=103.
Then the Result is:
sname
rusty

Examples of Basic SQL Queries


1. Find the names of sailors who have reserved boat number 103
It can be expressed in SQL as,
SELECT S.sname FROM Sailors S, Reserves R WHERE S.sid = R.sid AND R.bid=103
(or)
SELECT sname FROM Sailors, Reserves WHERE Sailors.sid = Reserves.sid AND bid=103.

2. Find the sids of sailors who have reserved a red boat.


SELECT R.sid FROM Boats B, Reserves R WHERE B.bid = R.bid AND B.color = ‘red’

3. Find the names of sailors who have reserved a red boat.


SELECT S.sname FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid AND R.bid
= B.bid AND B.color = 'red'


4. Find the colors of boats reserved by Lubber.


SELECT B.color FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid AND R.bid =
B.bid AND S.sname = `Lubber'

5. Find the names of sailors who have reserved at least one boat.
SELECT S.sname FROM Sailors S, Reserves R WHERE S.sid = R.sid

Expressions and Strings in the SELECT Command


 SQL supports a more general version of the select-list than just a list of columns.
 Each item in a select-list can be of the form expression AS column name, where
expression is any arithmetic or string expression over column names and constants.
 SQL provides support for pattern matching through the LIKE operator, along with the use
of the wild-card symbols % (which stands for zero or more arbitrary characters) and _
(which stands for exactly one, arbitrary, character). Thus, '_AB%' denotes a pattern
matching every string that contains at least three characters, with the second and third
characters being A and B respectively.

Ex: Find the ages of sailors whose name begins and ends with B and has at least three
characters.
SELECT S.age FROM Sailors S WHERE S.sname LIKE 'B_%B'

The only such sailor is Bob, and his age is 63.5.

UNION, INTERSECT, AND EXCEPT (OR) SQL SET OPERATIONS


SQL set operations are used for combining data from one or more tables.
SQL Set operation is used to combine the two or more SQL SELECT statements.
Types of Set Operation
 Union
 Union All
 Intersect
 Except or Minus
Each SELECT statement used with these set operators must follow these conditions:
 The same number of columns


 The columns must also have similar data types


 The columns in each SELECT statement must also be in the same order
General Syntax for all SET operators
SELECT column name(s) FROM table1
UNION/UNION ALL/INTERSECT/MINUS
SELECT column name(s) FROM table2
Consider the following two tables
STUDENT1
ID NAME
1 Ravi
2 Gopal
3 Swapna
STUDENT2
ID NAME
3 Swapna
4 Kalyan
5 Ram

Union
The SQL Union operation is used to combine the result of two or more SQL SELECT
queries. The union operation eliminates the duplicate rows from its resultset.
Syntax is
SELECT column_list FROM table1 where condition
UNION
SELECT column_list FROM table2 where condition
For Example,
SELECT * FROM STUDENT1
UNION
SELECT * FROM STUDENT2
The Result is:
ID NAME
1 Ravi
2 Gopal
3 Swapna


4 Kalyan
5 Ram
Union All
Union All operation is similar to the Union operation, but it returns the result set without
removing duplicates and without sorting the data.
Syntax is
SELECT column_list FROM table1 where condition
UNION ALL
SELECT column_list FROM table2 where condition
For Example,
SELECT * FROM STUDENT1
UNION ALL
SELECT * FROM STUDENT2
The Result is:
ID NAME
1 Ravi
2 Gopal
3 Swapna
3 Swapna
4 Kalyan
5 Ram

Intersect
It is used to combine two SELECT statements. The Intersect operation returns the common
rows from both the SELECT statements.
Syntax is
SELECT column_list FROM table1 where condition
INTERSECT
SELECT column_list FROM table2 where condition
For Example,
SELECT * FROM STUDENT1
INTERSECT
SELECT * FROM STUDENT2
The Result is:


ID NAME
3 Swapna

Minus
It combines the result of two SELECT statements. Minus operator is used to display the rows
which are present in the first query but absent in the second query.
Syntax is
SELECT column_list FROM table1 where condition
MINUS
SELECT column_list FROM table2 where condition
For Example,
SELECT * FROM STUDENT1
MINUS
SELECT * FROM STUDENT2
The Result is:
ID NAME
1 Ravi
2 Gopal

Examples:
1. Find the names of sailors who have reserved a red or a green boat.
SELECT S1.sname FROM Sailors S1, Boats B1, Reserves R1
WHERE S1.sid = R1.sid AND R1.bid = B1.bid AND B1.color = 'red'
UNION
SELECT S2.sname FROM Sailors S2, Boats B2, Reserves R2
WHERE S2.sid = R2.sid AND R2.bid = B2.bid AND B2.color = 'green'

2. Find the names of sailors who have reserved a red and a green boat.
SELECT S1.sname FROM Sailors S1, Boats B1, Reserves R1
WHERE S1.sid = R1.sid AND R1.bid = B1.bid AND B1.color = 'red'
INTERSECT
SELECT S2.sname FROM Sailors S2, Boats B2, Reserves R2
WHERE S2.sid = R2.sid AND R2.bid = B2.bid AND B2.color = 'green'

3. Find the sids of all sailors who have reserved red boats but not green boats.


SELECT S1.sid FROM Sailors S1, Boats B1, Reserves R1


WHERE S1.sid = R1.sid AND R1.bid = B1.bid AND B1.color = 'red'
EXCEPT
SELECT S2.sid FROM Sailors S2, Boats B2, Reserves R2
WHERE S2.sid = R2.sid AND R2.bid = B2.bid AND B2.color = 'green'

NESTED QUERIES (OR) SUBQUERIES


One of the most powerful features of SQL is nested queries. A nested query is a query that
has another query embedded within it; the embedded query is called a subquery.
A subquery typically appears within the WHERE clause of a query. Subqueries can
sometimes appear in the FROM clause or the HAVING clause.
A nested query in SQL contains a query inside another query. The result of the inner query
will be used by the outer query.
Subqueries are most frequently used with the SELECT statement.

The basic syntax is


SELECT column_name [, column_name ] FROM table1 [, table2 ]
WHERE column_name OPERATOR
(
SELECT column_name [, column_name ] FROM table1 [, table2 ]
[WHERE]
)
Nested queries in SQL can be classified into two different types:
 Independent Nested Queries
 Co-related Nested Queries or Correlated Nested Queries
Independent Nested Queries
In independent nested queries, the execution order is from the innermost query to the outer
query. An outer query won't be executed until its inner query completes its execution. The
result of the inner query is used by the outer query. Operators such as IN, NOT IN, ALL,
and ANY are used to write independent nested queries.
 The IN operator checks if a column value in the outer query's result is present in the
inner query's result. The final result will have rows that satisfy the IN condition.
 The NOT IN operator checks if a column value in the outer query's result is not
present in the inner query's result. The final result will have rows that satisfy the NOT
IN condition.


 The ALL operator compares a value of the outer query's result with all the values of
the inner query's result and returns the row only if the comparison holds for every value.
 The ANY operator compares a value of the outer query's result with the inner
query's result values and returns the row if the comparison holds for at least one value.
Example : IN
 Find the names of sailors who have reserved boat 103.
SELECT S.sname FROM Sailors S
WHERE S.sid IN
( SELECT R.sid FROM Reserves R WHERE R.bid = 103 )
Result is
sname
Dustin
Lubber
Horatio

 Find the names of sailors who have reserved a red boat.


SELECT S.sname FROM Sailors S
WHERE S.sid IN
( SELECT R.sid FROM Reserves R
WHERE R. bid IN
( SELECT B.bid FROM Boats B WHERE B.color = 'red' ) )
Result is
sname
Dustin
Lubber
Horatio

Example : NOT IN
 Find the names of sailors who have not reserved boat 103.
SELECT S.sname FROM Sailors S
WHERE S.sid NOT IN
( SELECT R.sid FROM Reserves R WHERE R.bid = 103 )
Result is


sname
Brutus
Andy
Rusty
Horatio
Zorba
Art
Bob
 Find the names of sailors who have not reserved a red boat.
SELECT S.sname FROM Sailors S
WHERE S.sid NOT IN
( SELECT R.sid FROM Reserves R
WHERE R. bid IN
( SELECT B.bid FROM Boats B WHERE B.color = 'red' ) )
Result is
sname
Brutus
Andy
Rusty
Zorba
Horatio
Art
Bob

Set-Comparison Operators
SQL also supports op ANY and op ALL, where op is one of the arithmetic comparison
operators {<, <=, =, <>, >=, >}.
Example : ALL
 Find the Sailors with the highest rating.
SELECT S.sname FROM Sailors S
WHERE S.rating >= ALL
( SELECT S2.rating FROM Sailors S2 )
Result is


sname
Rusty
Zorba
Example : ANY
 Find sailors whose rating is better than some sailor called Andy.
SELECT S.sname FROM Sailors S
WHERE S.rating > ANY
( SELECT S2.rating FROM Sailors S2 WHERE S2.sname = 'Andy' )

Result is
sname
Rusty
Zorba
Horatio

Co-related Nested Queries


In co-related nested queries, the inner query uses the values from the outer query so that the
inner query is executed for every row processed by the outer query. The co-related nested
queries run slowly because the inner query is executed for every row of the outer query's
result.

Example:
 Find the names of sailors who have reserved boat number 103.
SELECT S.sname FROM Sailors S
WHERE EXISTS
( SELECT * FROM Reserves R WHERE R.bid = 103 AND R.sid = S.sid )

Result is
sname
Dustin
Lubber
Horatio
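The complementary query, "Find the names of sailors who have not reserved boat number 103",
can be written in the same correlated style with NOT EXISTS; a sketch:
SELECT S.sname FROM Sailors S
WHERE NOT EXISTS
( SELECT * FROM Reserves R WHERE R.bid = 103 AND R.sid = S.sid )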


SQL AGGREGATE FUNCTIONS


SQL aggregation function is used to perform the calculations on multiple rows of a single
column of a table. It returns a single value.
It is also used to summarize the data.

Various Aggregate Functions are


 Count()
 Sum()
 Avg()
 Min()
 Max()
Consider the following table EMP
Id Name Salary
1 A 80
2 B 40
3 C 60
4 D 70
5 E 60
6 F Null

COUNT() - Count function is used to Count the number of rows in a database table. It can
work on both numeric and non-numeric data types.
The Syntax is
COUNT(*) OR COUNT([DISTINCT]COLUMN_NAME)
COUNT(*) returns the total number of rows in a given table.
COUNT(COLUMN_NAME) returns the total number of non-null values present in the
column which is passed as an argument to the function.
For Example,
 SELECT COUNT(*) FROM EMP - Returns the total number of records, i.e., 6.
 SELECT COUNT(Salary) FROM EMP - Returns the number of non-null values in the
column Salary, i.e., 5.
 SELECT COUNT(DISTINCT Salary) FROM EMP - Returns the number of distinct non-null
values in the column Salary, i.e., 4.


SUM() - Sum function is used to calculate the sum of all values of the selected column. It
works on numeric fields only.
The Syntax is
SUM([DISTINCT]COLUMN_NAME)
For Example,
 SELECT SUM(Salary) FROM EMP - Sum all Non Null values of Column salary i.e., 310
 SELECT SUM(DISTINCT Salary) FROM EMP - Sum of all distinct Non-Null values
i.e., 250.

AVG() - The AVG function is used to calculate the average value of the numeric type. AVG
function returns the average of all non-Null values.
The Syntax is
AVG([DISTINCT]COLUMN_NAME)
For Example,
 SELECT AVG(Salary) FROM EMP - Average of all non-null values of the column Salary,
i.e., 310/5 = 62.
 SELECT AVG(DISTINCT Salary) FROM EMP - Average of all distinct non-null values,
i.e., 250/4 = 62.5.

MIN() - MIN function is used to find the minimum value of a certain column. This function
determines the smallest value of all selected values of a column.
The Syntax is
MIN(COLUMN_NAME)
For Example,
SELECT MIN(Salary) FROM EMP - Minimum value in the salary column except NULL i.e.,
40.

MAX() - MAX function is used to find the maximum value of a certain column. This
function determines the largest value of all selected values of a column.
The Syntax is
MAX(COLUMN_NAME)
For Example,
SELECT MAX(Salary) FROM EMP - Maximum value in the salary i.e., 80.


The GROUP BY, ORDER BY and HAVING Clauses


GROUP BY - In SQL, the GROUP BY clause is used to group rows by one or more
columns.
Important Points are
 GROUP BY clause is used with the SELECT statement.
 In the query, GROUP BY clause is placed after the WHERE clause.
 In the query, GROUP BY clause is placed before the ORDER BY clause, if one is used.
 In the query, GROUP BY clause is placed before the HAVING clause.

We use the aggregate functions such as COUNT(), MAX(), MIN(), SUM(), AVG(), etc., in
the SELECT query. The result of the GROUP BY clause returns a single row for each value
of the GROUP BY column.

The Syntax is
SELECT column1, function_name(column2) FROM table_name
WHERE condition
GROUP BY column1, column2;
For example consider the following table EMP
Id Name Salary
1 A 80
2 B 40
3 A 60
4 B 70
5 C 60
6 B 100

Consider the query


 SELECT Name, SUM(Salary) FROM EMP GROUP BY Name;
The Result is
Name SUM(Salary)
A 140
B 210
C 60


Consider another table STUDENT


Subject Year Name
English 1 A
English 1 B
English 1 C
English 2 D
English 2 E
Maths 1 F
Maths 1 G
Consider the query
 SELECT Subject, Year, COUNT(*) FROM STUDENT GROUP BY Subject, Year;
The Result is
Subject Year COUNT
English 1 3
English 2 2
Maths 1 2

ORDER BY - The SQL ORDER BY clause is used to sort the result set in either ascending
or descending order.
By default ORDER BY sorts the data in ascending order.
We can use the keyword DESC to sort the data in descending order and the keyword ASC to
sort in ascending order.
Sort according to one column
The Syntax is
SELECT * FROM table_name ORDER BY column_name ASC|DESC
Sort according to multiple columns
The Syntax is
SELECT * FROM table_name ORDER BY column1 ASC|DESC , column2 ASC|DESC
Consider a relation STUDENT
Roll_No First_Name Last_Name Age
1 Narendra Chari 28
2 Swapna Kumari 28
3 Jagan Mohan 26
4 Navya Kumari 26


Consider the Query


 SELECT * FROM STUDENT ORDER BY Roll_No DESC;
The Result is
Roll_No First_Name Last_Name Age
4 Navya Kumari 26
3 Jagan Mohan 26
2 Swapna Kumari 28
1 Narendra Chari 28

Consider another Query


 SELECT * FROM STUDENT ORDER BY Age ASC , ROLL_NO DESC;
The Result is
Roll_No First_Name Last_Name Age
4 Navya Kumari 26
3 Jagan Mohan 26
2 Swapna Kumari 28
1 Narendra Chari 28

We can see that first the result is sorted in ascending order according to Age. There are
multiple rows having the same Age. Sorting this result set further according to
ROLL_NO will sort the rows with the same Age according to ROLL_NO in descending
order.

HAVING - The HAVING clause places the condition in the groups defined by the GROUP
BY clause in the SELECT statement. This SQL clause is implemented after the GROUP BY
clause in the SELECT statement.
The HAVING clause in SQL is used if we need to filter the result set based on aggregate
functions such as MIN() and MAX(), SUM() and AVG() and COUNT().

The Syntax is
SELECT column_name, aggregate_function_name(column_name) FROM table_name
GROUP BY column_name HAVING condition ORDER BY column1, column2;


Consider the following table


Emp_Id Emp_Name Emp_Salary Emp_City
201 Abhay 2000 Goa
202 Ankit 4000 Delhi
203 Bheem 8000 Jaipur
204 Ram 2000 Goa
205 Sumit 5000 Delhi
If we want to add the salary of employees for each city, we have to write the following query
SELECT SUM(Emp_Salary), Emp_City FROM Employee GROUP BY Emp_City;
The Result is
SUM(Emp_Salary) Emp_City
4000 Goa
9000 Delhi
8000 Jaipur

Now, suppose that we want to show those cities whose total salary of employees is more than
5000. For this case, we have to type the following query with the HAVING clause in SQL
SELECT SUM(Emp_Salary), Emp_City FROM Employee GROUP BY Emp_City
HAVING SUM(Emp_Salary)>5000;
The Result is
SUM(Emp_Salary) Emp_City
9000 Delhi
8000 Jaipur

HAVING Vs WHERE Clause


HAVING Clause WHERE Clause
The HAVING clause checks the condition on a group of rows. The WHERE clause checks the condition on each individual row.
The HAVING clause is used with aggregate functions. The WHERE clause cannot be used with aggregate functions.
The HAVING clause is executed after the GROUP BY clause. The WHERE clause is executed before the GROUP BY clause.


NULL VALUES
The SQL NULL is the term used to represent a missing value. A NULL value in a table is a
value in a field that appears to be blank.
We use null when the column value is either unknown or inapplicable.
A field with a NULL value is a field with no value. It is very important to understand that a
NULL value is different than a zero value or a field that contains spaces.
For Example consider the following table EMPLOYEE
Emp_Id Emp_Name Emp_Salary Emp_City
201 Abhay 2000 Goa
202 Ankit 4000 Delhi
203 Bheem NULL Jaipur
204 Ram 2000 NULL
205 Sumit NULL Delhi

Comparisons Using Null Values


It is not possible to test for NULL values with comparison operators, such as =, <, or <>.
We will have to use the IS NULL and IS NOT NULL operators instead.
IS NULL - IS NULL operator is used to test whether a NULL is present in the specified
column.
The Syntax is
SELECT * FROM tableName WHERE columnName IS NULL;
For Example,
SELECT * FROM EMPLOYEE WHERE Emp_Salary IS NULL;
Result is
Emp_Id Emp_Name Emp_Salary Emp_City
203 Bheem NULL Jaipur
205 Sumit NULL Delhi

IS NOT NULL - IS NOT NULL operator is used to test for non-null values in the specified
column.
The Syntax is
SELECT * FROM tableName WHERE columnName IS NOT NULL;
For Example,
SELECT * FROM EMPLOYEE WHERE Emp_Salary IS NOT NULL;


Result is
Emp_Id Emp_Name Emp_Salary Emp_City
201 Abhay 2000 Goa
202 Ankit 4000 Delhi
204 Ram 2000 NULL

Logical Connectives AND, OR, and NOT


When NULL values take part in logical expressions, the connectives AND, OR and NOT follow three-valued logic.
Normally the result of a logical expression is TRUE or FALSE. However, when NULL is involved in the evaluation, the result may also be UNKNOWN, so a logical expression can return one of three values: TRUE, FALSE or UNKNOWN.
The basic rules are: TRUE AND UNKNOWN = UNKNOWN, FALSE AND UNKNOWN = FALSE, TRUE OR UNKNOWN = TRUE, FALSE OR UNKNOWN = UNKNOWN, and NOT UNKNOWN = UNKNOWN. A row satisfies a WHERE condition only if the condition evaluates to TRUE; rows for which the condition is UNKNOWN are not returned.
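As an illustration against the EMPLOYEE table shown above (a sketch only), consider the query
SELECT Emp_Name FROM EMPLOYEE WHERE Emp_Salary > 1000 OR Emp_City = 'Delhi';
For Bheem the condition evaluates to UNKNOWN OR FALSE = UNKNOWN, so his row is not returned; for Sumit it evaluates to UNKNOWN OR TRUE = TRUE, so his row is returned along with Abhay, Ankit and Ram.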

Impact on SQL Constructs


If we compare two null values using =, the result is unknown! In the context of duplicates,
this comparison is implicitly treated as true, which is an anomaly.
The arithmetic operations +, -, *, and / all return null if one of their arguments is null.
For example, null+2 = null
10*null = null
However, nulls can cause some unexpected behavior with aggregate operations. COUNT(*)
handles null values just like other values; that is, they get counted. All the other aggregate
operations (COUNT, SUM, AVG, MIN, MAX, and variations using DISTINCT) simply
discard null values.
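For example, against the same EMPLOYEE table, the query below would be expected (on most SQL implementations) to return 5 for COUNT(*) but 3 for COUNT(Emp_Salary), 8000 for SUM(Emp_Salary) and about 2666.67 for AVG(Emp_Salary), since the two NULL salaries are ignored by every aggregate except COUNT(*):
SELECT COUNT(*), COUNT(Emp_Salary), SUM(Emp_Salary), AVG(Emp_Salary) FROM EMPLOYEE;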


Outer Joins
This is a special case of the join operator which takes null values into account.
Outer joins return matched rows together with unmatched rows from one or both tables. There are three types of outer joins:
 Left Join returns all rows from the left table, along with the matching rows from the right table.
 Right Join returns all rows from the right table, along with the matching rows from the left table.
 Full Outer Join returns matched and unmatched rows from both tables.
Consider the following two tables DEPARTMENT and PROJECT
DEPARTMENT
DEPT_MID DNO PNO
101 2 11
97 5 22
120 4 33
PROJECT
PNO PNAME
44 D
11 A
22 B
Left Join - The SQL left join returns all the rows from the left table and also includes the matching values from the right table; where there is no matching value it returns NULL for the right table's columns.
The Syntax for Left Join:
SELECT table1.column1, table2.column2.... FROM table1
LEFT JOIN table2 ON table1.column_field = table2.column_field;
Join the two tables with LEFT JOIN:
SELECT D1.*, P1.* FROM DEPARTMENT D1
LEFT JOIN PROJECT P1 ON D1.PNO = P1.PNO;
The result is
DEPT_MID DNO PNO PNO PNAME
101 2 11 11 A
97 5 22 22 B
120 4 33 NULL NULL


Right Join - The SQL right join returns all the rows from the right table and also includes the matching values from the left table; where there is no match it returns NULL for the left table's columns.
The Syntax for Right Join:
SELECT table1.column1, table2.column2..... FROM table1
RIGHT JOIN table2 ON table1.column_field = table2.column_field;
We will join the two tables with RIGHT JOIN:
SELECT D1.*, P1.* FROM DEPARTMENT D1
RIGHT JOIN PROJECT P1 ON D1.PNO = P1.PNO;

The result is
DEPT_MID DNO PNO PNO PNAME
NULL NULL NULL 44 D
101 2 11 11 A
97 5 22 22 B

Full Join - The SQL full join combines the results of both the left and the right outer join, so the joined result contains all the records from both tables. It puts NULL in place of the matches that are not found.
The Syntax for full outer join:
SELECT table1.column1, table2.column2.... FROM table1
FULL JOIN table2 ON table1.column_name = table2.column_name;
We will join the two tables with FULL JOIN:
SELECT D1.*, P1.* FROM DEPARTMENT D1
FULL JOIN PROJECT P1 ON D1.PNO = P1.PNO;

The result is
DEPT_MID DNO PNO PNO PNAME
101 2 11 11 A
97 5 22 22 B
120 4 33 NULL NULL
NULL NULL NULL 44 D

Disallowing Null Values


We can disallow null values by specifying NOT NULL as part of the field definition.


For Example,
CREATE TABLE Student
( ID int NOT NULL,
LastName varchar(15) NOT NULL,
FirstName varchar(15),
Age int,
PRIMARY KEY (ID)
);

COMPLEX INTEGRITY CONSTRAINTS IN SQL


Integrity constraints need not be applied only to single columns; they can also be applied to a single table or to a group of tables (the latter are called assertions).
Constraints over a Single Table
We can specify complex constraints over a single table using table constraints, which have
the form CHECK conditional-expression.
For example, to ensure that rating must be an integer in the range 1 to 10, we could use:
CREATE TABLE Sailors ( sid INTEGER,
sname CHAR(10),
rating INTEGER,
age REAL,
PRIMARY KEY (sid),
CHECK (rating >= 1 AND rating <= 10 ));
When a row is inserted into Sailors or an existing row is modified, the conditional expression
in the CHECK constraint is evaluated. If it evaluates to false, the command is rejected.
Domain Constraints and Distinct Types
A domain is essentially a data type with optional constraints (restrictions on the allowed set
of values).
A user can define a new domain using the CREATE DOMAIN statement, which uses
CHECK constraints.
The syntax for creating a new domain is
CREATE DOMAIN domain_name source_data_type [DEFAULT default_value] [CHECK (condition)];
For Example,
CREATE DOMAIN ratingval INTEGER DEFAULT 1
CHECK ( VALUE >= 1 AND VALUE <= 10 );


INTEGER is the source type for the domain ratingval. The optional DEFAULT keyword is
used to associate a default value with a domain. If no value is entered for this column in an
inserted tuple, the default value 1 associated with ratingval is used.
Another Example, we can create a new domain for salary by stating the following SQL
statement
CREATE DOMAIN salary INTEGER DEFAULT 15000
CHECK (VALUE>=15000 AND VALUE<=40000)
Assertions: ICs over Several Tables
Assertions are constraints that are applied over a group of tables. Unlike table constraints, which are applied on a single table, assertions are applied on multiple tables.
As an example, suppose that we wish to enforce the constraint that the number of boats plus
the number of sailors should be less than 100.
We could try the following table constraint:
CREATE TABLE Sailors ( sid INTEGER,
sname CHAR ( 10) ,
rating INTEGER,
age REAL,
PRIMARY KEY (sid),
CHECK ( rating >= 1 AND rating <= 10)
CHECK ( (SELECT COUNT (S.sid) FROM Sailors S )
+ (SELECT COUNT (B. bid) FROM Boats B) < 100));
The disadvantage of the above solution is that the constraint is attached only to the Sailors table, even though it involves the Boats table equally; it is checked only when the Sailors table is modified.
The best solution is to create an assertion, as follows
CREATE ASSERTION total
CHECK ((SELECT COUNT (S.sid) FROM Sailors S)
+ (SELECT COUNT (B. bid) FROM Boats B) < 100);

TRIGGERS AND ACTIVE DATABASES


Triggers
A trigger is a procedure that is automatically invoked by the DBMS in response to specified
changes to the database, and is typically specified by the DBA. A database that has a set of
associated triggers is called an active database.


A trigger description contains three parts


 Event: Event describes the modifications done to the database which lead to activation of
trigger.
 Condition: Conditions are used to specify whether the particular action must be
performed or not. If the condition is evaluated to true then the respective action is taken
otherwise the action is rejected.
 Action: Action specifies the action to be taken when the corresponding event occurs and
the condition evaluates to true. An action is collection of SQL statements that are
executed as a part of trigger activation.
It is possible to activate the trigger before the event or after the event.
The Syntax is
create trigger [trigger_name]
[before | after]
{insert | update | delete}
on [table_name]
[for each row]
[trigger_body]
Explanation of syntax is
 create trigger [trigger_name]: Creates or replaces an existing trigger with the
trigger_name.
 [before | after]: This specifies when the trigger will be executed.
 {insert | update | delete}: This specifies the DML operation.
 on [table_name]: This specifies the name of the table associated with the trigger.
 [for each row]: This specifies a row-level trigger, i.e., the trigger will be executed for
each row being affected.
 [trigger_body]: This provides the operation to be performed as trigger is fired

The following example shows trigger activation after the event


CREATE TRIGGER count AFTER INSERT ON Students /* Event*/
WHEN (new.age < 18) /* Condition*/
FOR EACH ROW
BEGIN /* Action*/
count := count + 1;
END
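The example above follows the generic textbook notation. As a rough sketch of how the same idea could be written in MySQL-style syntax (the one-row summary table Stats(minors) and the Students.age column are assumptions, not part of the original example):
DELIMITER //
CREATE TRIGGER incr_minor_count AFTER INSERT ON Students
FOR EACH ROW
BEGIN
-- Stats(minors) is an assumed one-row summary table
IF NEW.age < 18 THEN
UPDATE Stats SET minors = minors + 1;
END IF;
END //
DELIMITER ;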


The different types of triggers are


 Statement level trigger − It is executed only once per DML statement, irrespective of the number of rows affected by the statement. Statement-level triggers are the default type of trigger.
 Row-level triggers − It is executed for each row that is affected by DML command. For
example, if an UPDATE command updates 150 rows then a row-level trigger is fired 150
times whereas a statement-level trigger is fired only for once.

Active Databases
An active Database is a database consisting of a set of triggers. These databases are very
difficult to be maintained because of the complexity that arises in understanding the effect of
these triggers.
In such a database, before executing a statement that modifies the database, the DBMS first verifies whether any trigger associated with that statement is activated. If a trigger is active, the DBMS evaluates its condition part and executes the action part only if the condition evaluates to true. A single statement can activate more than one trigger; in that situation the DBMS processes the triggers in some arbitrary order.
The execution of an action part of a trigger may either activate other triggers or the same
trigger that Initialized this action. Such types of trigger that activates itself is called as
‘recursive trigger’.
There are several uses of triggers
 Triggers can be used to maintain data integrity
 Triggers can be used to identify unusual events that occur in a database
 Triggers can be used for security checks and also for auditing.


UNIT-IV


PURPOSE OF NORMALIZATION OR SCHEMA REFINEMENT


Normalization is the process of reducing data redundancy in a table and improving data
integrity.
(or)
Normalization is a process of organizing the data in database to avoid data redundancy,
insertion anomaly, update anomaly & deletion anomaly.
Anomalies in DBMS
There are three types of anomalies that occur when the database is not normalized. These are:
Insertion, update and deletion anomaly.
For Example, A manufacturing company stores the employee details in a table Employee that
has four attributes: Emp_Id for storing employee’s id, Emp_Name for storing employee’s
name, Emp_Address for storing employee’s address and Emp_Dept for storing the
department details in which the employee works.
The Employee table is given below
Emp_Id Emp_Name Emp_Address Emp_Dept
101 Anand Delhi D001
101 Anand Delhi D002
145 Maggie Hyderabad D050
177 Kavya Chennai D003
177 Kavya Chennai D004
The above table is not normalized. The problems that we face when a table in database is not
normalized are
Update anomaly - In the above table we have two rows for employee Anand as he belongs to two departments of the company. If we want to update the address of Anand then we have to update it in both rows, or the data will become inconsistent. If the correct address gets updated for one department but not for the other, then as per the database Anand would have two different addresses, which is not correct and leads to inconsistent data.
Insert anomaly - Suppose a new employee joins the company, who is under training and
currently not assigned to any department then we would not be able to insert the data into the
table if Emp_Dept field doesn’t allow null.
Delete anomaly - Let’s say in future, company closes the department D050 then deleting the
rows that are having Emp_Dept as D050 would also delete the information of
employee Maggie since she is assigned only to this department.
To overcome the above anomalies we need to normalize the data.


CONCEPT OF FUNCTIONAL DEPENDENCY


A functional dependency is a constraint that specifies the relationship between two sets of
attributes where one set can accurately determine the value of other sets. It is denoted as X →
Y, where X is a set of attributes that is capable of determining the value of Y. The attribute
set on the left side of the arrow, X is called Determinant, while on the right side, Y is called
the Dependent.
For Example, Consider the following table
Roll_no Name Dept_name Dept_building
42 abc CSE A4
43 pqr IT A3
44 xyz CSE A4
45 xyz IT A3
46 mno ECT B2
47 jkl ME B2

From the above table we can conclude some valid functional dependencies:
 roll_no → { name, dept_name, dept_building }, Here, roll_no can determine values of
fields name, dept_name and dept_building, hence a valid Functional dependency
 roll_no → dept_name, Since, roll_no can determine whole set of {name, dept_name,
dept_building}, it can determine its subset dept_name also.
 dept_name → dept_building , Dept_name can identify the dept_building accurately,
since departments with different dept_name will also have a different dept_building
 More valid functional dependencies: roll_no → name, {roll_no, name} → {dept_name, dept_building}, etc.

Here are some invalid functional dependencies:


 name → dept_name Students with the same name can have different dept_name, hence
this is not a valid functional dependency.
 dept_building → dept_name There can be multiple departments in the same building. For example, in the above table the departments ME and ECT are in the same building B2, hence dept_building → dept_name is an invalid functional dependency.
 More invalid functional dependencies: name → roll_no, {name, dept_name} → roll_no,
dept_building → roll_no, etc.


Armstrong’s axioms/properties of functional dependencies or Inference rules


 Reflexivity - If Y is a subset of X, then X→Y holds by reflexivity rule.
For example, {roll_no, name} → name is valid.
 Augmentation - If X → Y is a valid dependency, then XZ → YZ is also valid by the
augmentation rule.
For example, If {roll_no, name} → dept_building is valid, hence
{roll_no, name, dept_name} → {dept_building, dept_name} is also valid.
 Transitivity - If X → Y and Y → Z are both valid dependencies, then X→Z is also valid
by the Transitivity rule.
For example, roll_no → dept_name & dept_name → dept_building, then
roll_no → dept_building is also valid.
Additional Rules - Additional rules are
 Union Rule: If X→Y and X→Z then X→YZ.
 Decomposition Rule: If X→YZ then X→Y & X→Z.
 Pseudo Transitivity Rule: If X→Y and YZ→W then XZ→W where W is a set of
attributes of R.
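As a short worked illustration of these rules (the FDs below are hypothetical, chosen only to show the derivations): suppose a relation R(A, B, C, D) has the FDs A → B and B → C. Then
 A → C holds by the transitivity rule,
 A → BC holds by the union rule (combining A → B and A → C), and
 AD → BCD holds by the augmentation rule (augmenting A → BC with D).
Hence the closure A+ = {A, B, C} and (AD)+ = {A, B, C, D}, so AD is a candidate key of R.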

Types of Functional dependencies in DBMS


 Trivial functional dependency
 Non-Trivial functional dependency
 Multivalued functional dependency
 Transitive functional dependency
Trivial Functional Dependency - In Trivial Functional Dependency, a dependent is always
a subset of the determinant i.e. If X → Y and Y is the subset of X, then it is called trivial
functional dependency.
For example,
roll_no name age
42 abc 17
43 pqr 18
44 xyz 18
Here, {roll_no, name} → name is a trivial functional dependency, since the
dependent name is a subset of determinant set {roll_no, name}
Similarly, roll_no → roll_no is also an example of trivial functional dependency.


Non-trivial Functional Dependency - In Non-trivial functional dependency, the dependent


is strictly not a subset of the determinant i.e. If X → Y and Y is not a subset of X, then it is
called Non-trivial functional dependency.
For example,
roll_no name age
42 abc 17
43 pqr 18
44 xyz 18

Here, roll_no → name is a non-trivial functional dependency, since the


dependent name is not a subset of determinant roll_no.
Similarly, {roll_no, name} → age is also a non-trivial functional dependency, since age is not
a subset of {roll_no, name}.

Multivalued Functional Dependency - In Multivalued functional dependency, entities of


the dependent set are not dependent on each other i.e. If a → {b, c} and there exists no
functional dependency between b and c, then it is called a multivalued functional
dependency.
For example,
roll_no name age
42 abc 17
43 pqr 18
44 xyz 18
45 abc 19

Here, roll_no → {name, age} is a multivalued functional dependency, since the


dependents name & age are not dependent on each other (i.e. name → age or age → name
doesn’t exist !)

Transitive Functional Dependency - In transitive functional dependency, dependent is


indirectly dependent on determinant i.e. If a → b & b → c, then according to axiom of
transitivity, a → c. This is a transitive functional dependency.
For example,


enrol_no name dept building_no


42 abc CSE 4
43 pqr ECT 2
44 xyz IT 1
45 abc ECT 2
Here, enrol_no → dept and dept → building_no, Hence, according to the axiom of
transitivity, enrol_no → building_no is a valid functional dependency. This is an indirect
functional dependency, hence called Transitive functional dependency.

NORMAL FORMS BASED ON FUNCTIONAL DEPENDENCY (1NF, 2NF, 3NF)


A normal form is defined as a set of rules that are framed in order to convert a relation into a standard form.
The process of normalization helps us divide a larger table in the database into several smaller tables and then link them using relationships.
Normal forms are basically useful for reducing the overall redundancy (repeating data) in the tables present in a database, so as to ensure logical storage.

Types of Normal Forms in DBMS


Different types of normal forms are as follows
1NF: We can say that a relation is in 1NF when every attribute value is atomic. Any multivalued attributes (also called repeating groups) have been removed, so there is a single value at the intersection of each row and column of the table.
2NF: We can say that a relation is in 2NF when it is already in 1NF and all the non-key attributes fully and functionally depend on the primary key. (or)
We can say that a relation is in 2NF when it is already in 1NF and all partial functional dependencies have been removed.
3NF: We can say that a relation is in 3NF when it is already in 2NF and it does not contain any transitive dependencies.
BCNF: We can say that a relation is in BCNF when it is already in 3NF and any remaining anomalies that result from functional dependencies have been removed.
4NF: We can say that a relation is in 4NF when it is in BCNF (Boyce-Codd Normal Form) and it does not have any multi-valued dependencies.
5NF: We can say that a relation is in 5NF when it is already in 4NF and it does not contain any join dependencies; all joins must also be lossless.


First Normal Form (1NF)


If a relation contains a composite or multi-valued attribute, it violates first normal form; conversely, a relation is in first normal form if it does not contain any composite or multi-valued attribute.
A relation is in first normal form if every attribute in that relation is a single-valued attribute.
For example, consider the following table
RNo Name Lang-Known Course Fee
1 Gopal Telugu ECT 45K
English
2 RamaRao Telugu CSE 50K
Hindi
3 Chitti English IT 40K
The above table is not in 1NF because of multi-valued attribute Lang-Known. Its
decomposition into 1NF has shown below table.
RNo Name Lang-Known Course Fee
1 Gopal Telugu ECT 45K
1 Gopal English ECT 45K
2 RamaRao Telugu CSE 50K
2 RamaRao Hindi CSE 50K
3 Chitti English IT 40K

Second Normal Form (2NF)


A table is in 2NF, only if a relation is in 1NF and every non-key attribute is fully dependent
on primary key.
The Second Normal Form eliminates partial dependencies on primary keys.
Full Functional Dependency - A functional dependency denoted as X→Y, where X and Y are attribute sets of a relation, is a full dependency if all the attributes present in X are required to maintain the dependency.
X→Y is a full FD if removing any attribute of X violates the FD.
For example, if (A,B) → C is a full FD, then neither A → C nor B → C holds.
Partial Functional Dependency - A functional dependency denoted as X→Y, where X and Y are attribute sets of a relation, is a partial dependency if some attribute A∈X can be removed and the dependency still holds.
X→Y is a partial FD if removing some attribute of X does not violate the FD.
For example, (A,B) → C is a partial FD if A → C holds (or B → C holds).


Consider the following table Student which is not in 2NF


RNo Name Lang-Known Course Fee
1 Gopal Telugu ECT 45K
1 Gopal English ECT 45K
2 RamaRao Telugu CSE 50K
2 RamaRao Hindi CSE 50K
3 Chitti English IT 40K

The above table is in 1NF since it contains atomic values. The primary key here is the composite key (RNo, Lang-Known). The FD RNo → Name, Course, Fee is a partial dependency, because Name, Course and Fee depend on only a part of the key. To remove the partial functional dependency and bring the above table into 2NF, decompose it as follows.

The Student table decompose into Std and Lang tables


Std Table
RNo Name Course Fee
1 Gopal ECT 45K
2 RamaRao CSE 50K
3 Chitti IT 40K
Lang table
RNo Lang-Known
1 Telugu
1 English
2 Telugu
2 Hindi
3 English
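One possible way to realize this decomposition in SQL is sketched below; the column types are assumptions, and the attribute Lang-Known is written as Lang_Known to form a valid column name:
CREATE TABLE Std (
RNo INT PRIMARY KEY,
Name VARCHAR(20),
Course VARCHAR(10),
Fee VARCHAR(10)
);
CREATE TABLE Lang (
RNo INT,
Lang_Known VARCHAR(20),
PRIMARY KEY (RNo, Lang_Known),
FOREIGN KEY (RNo) REFERENCES Std (RNo)
);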

Third Normal Form (3NF)


A relation is in 3NF when it is in 2NF and there is no transitive dependency.
(or)
A relation is in third normal form, if there is no transitive dependency for non-prime
attributes as well as it is in second normal form.

Consider the above Std Table shown below


RNo Name Course Fee


1 Gopal ECT 45K
2 RamaRao CSE 50K
3 Chitti IT 40K

From the above table FD Set is RNo → Name, RNo → Course, RNo → Fee, Course → Fee
RNo is the Primary Key.
If A->B and B->C are the two FD’s, then A->C is called the Transitive Dependency.
For the above relation, RNo → Course and Course → Fee are true. So Fee is transitively
dependent on RNo. It violates the third normal form.

To convert it in third normal form, we will decompose the relation


Std (RNo, Name, Course, Fee) as Std (RNo, Name, Course) and Course (Course, Fee)
So divide the Std table into two tables
Std Table
RNo Name Course
1 Gopal ECT
2 RamaRao CSE
3 Chitti IT

Course Table
Course Fee
ECT 45K
CSE 50K
IT 40K
Lang Table
RNo Lang-Known
1 Telugu
1 English
2 Telugu
2 Hindi
3 English


BOYCE-CODD NORMAL FORM (BCNF)


BCNF (Boyce Codd Normal Form) is the advanced version of 3NF. A table is in BCNF if, for every functional dependency X->Y, X is a super key of the table.
That is, for BCNF the table should be in 3NF and the left-hand side (LHS) of every FD must be a super key.
For Example, Consider a relation R with attributes (student, subject, teacher).
Student Teacher Subject
Chitti Gopinath Database
Chitti Subbarayudu CN
Manoj Gopinath Database
Manoj Eswar CN
FD’s are { (student, Teacher) -> subject
(student, subject) -> Teacher
Teacher -> subject}
Candidate keys are (student, teacher) and (student, subject).
The above relation is in 3NF since there is no transitive dependency. The above relation is
not in BCNF, because in the FD (teacher->subject), teacher is not a key.
This relation suffers from anomalies − for example, if we try to delete the student Manoj, we will lose the information that Eswar teaches CN. These difficulties are caused by the fact that teacher is a determinant but not a candidate key.
Decomposition for BCNF
Teacher-> subject violates BCNF since teacher is not a candidate key.
If X->Y violates BCNF then divide R into R1(X, Y) and R2(R-Y).
So R is divided into two relations R1(Teacher, subject) and R2(student, Teacher).
R1 Table
Teacher Subject
Gopinath Database
Subbarayudu CN
Eswar CN
R2 Table
Student Teacher
Chitti Gopinath
Chitti Subbarayudu
Manoj Gopinath
Manoj Eswar


All the anomalies which were present in R, now removed in the above two relations.

CONCEPT OF SURROGATE KEY


A surrogate key, also called a synthetic primary key, is a value generated automatically by the database when a new record is inserted into a table, and it can be declared as the primary key of that table.
It is usually a sequential number that carries no business meaning; it may be made available to the user and the application, or it may exist in the database while remaining invisible to the user or application.
We can say that, in case we do not have a natural primary key in a table, we need to artificially create one in order to uniquely identify a row in the table; this key is called the surrogate key or synthetic primary key of the table. However, the surrogate key is not always chosen as the primary key.
Features of the surrogate key
 It is automatically generated by the system.
 It typically holds an anonymous integer value.
 It contains unique value for all records of the table.
 The value can never be modified by the user or application.
 Surrogate key is called the factless key as it is added just for our ease of identification
of unique values and contains no relevant fact (or information) that is useful for the
table.
Consider the following example for a better understanding of the SURROGATE key in SQL.
The below data includes information on each well's geographic location and depth, as
indicated in the Well table
Longitude Latitude Depth
220 140 5.6
220 160 5.6
220 170 7.5
340 170 8.2
340 510 9.4
The table above shows that different wells can share the same longitude, latitude or depth. As a result, you cannot select the primary key from one of these three columns, because individually they do not uniquely identify a row.
You will need to create a surrogate key column, which can be a unique auto-number column. Here is an example of an auto-number surrogate field called WellId in the table:


WellId Longitude Latitude Depth


1 220 140 5.6
2 220 160 5.6
3 220 170 7.5
4 340 170 8.2
5 340 510 9.4
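One common way to generate such a surrogate key is to let the system number the rows automatically. The sketch below uses MySQL-style AUTO_INCREMENT (other systems use IDENTITY columns or sequences); the column types are assumptions:
CREATE TABLE Well (
WellId INT AUTO_INCREMENT PRIMARY KEY, -- surrogate key generated by the system
Longitude INT,
Latitude INT,
Depth DECIMAL(4,1)
);
INSERT INTO Well (Longitude, Latitude, Depth) VALUES (220, 140, 5.6);
-- WellId is assigned automatically; the user never supplies it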

FOURTH NORMAL FORM (4NF)


A relation is said to be in 4NF if the relation is in Boyce Codd Normal Form (BCNF) and has
no multi-valued dependency.
Multi-valued Dependency
If the following requirements are met, a table is said to have a multi-valued dependency,
 A table should have at least 3 columns.
 For a single value of A in the dependency A -> B, multiple values of B exist.
 For the relation R(A,B,C), if A and B have a multi-valued dependency,
then B and C should be independent of each other.
Decomposition for 4NF
 If R(X,Y,Z) has X->Y and X->Z then, R is decomposed to R1(X,Y) and R2(X,Z).
 If R(A,B,C,D) has A −> B and A −> C then, R is decomposed
to R1(A,B) and R2(A,C,D).

For Example, Consider a relation R with attributes (EMP_ID, DEPT, HOBBY).


EMP_ID DEPT HOBBY
E1 ECT Badminton
E1 AI & ML Reading
E2 CSE Cricket
E3 ECE Football
In the above relation, you can see that for the Employee E1 multiple records exist in the
DEPT and the HOBBY attribute.
Hence the multi-valued dependencies are,
EMP_ID −> DEPT and
EMP_ID −> HOBBY
Also, the DEPT and HOBBY attributes are independent of each other thus leading to a multi-
valued dependency in the above relation.


Therefore, the above relation is not in 4NF.


To satisfy the fourth normal form, we can decompose the above relation into two tables,
R1(EMP_ID,DEPT) and
R2(EMP_ID,HOBBY)
R1 Table
EMP_ID DEPT
E1 ECT
E1 AI & ML
E2 CSE
E3 ECE
R2 Table
EMP_ID HOBBY
E1 Badminton
E1 Reading
E2 Cricket
E3 Football
In addition to multi-valued dependency, a table can have functional dependency too. In that
case, the functionally dependent columns are moved to a different table, while the multi-
valued dependent columns are moved to other tables.

LOSSLESS JOIN AND DEPENDENCY PRESERVING DECOMPOSITION


Decomposition of a relation in relational model is done to convert it into appropriate normal
form.
A relation R is decomposed into two or more only if the decomposition is both lossless join
and dependency preserving.

Lossless join decomposition


There are two possibilities when a relation R is decomposed into R1 and R2.They are
 Lossy decomposition i.e., R1⋈R2⊃R
 Lossless decomposition i.e., R1⋈R2=R
For a decomposition to be lossless, it should hold the following conditions
 Union of attributes of R1 and R2 must be equal to the attributes of R, i.e., each attribute of R must be either in R1 or in R2: Att(R1) ⋃ Att(R2) = Att(R)
 Intersection of attributes of R1 and R2 must not be null i.e., Att(R1) ⋂ Att(R2) ≠ Ø


 Common attribute must be a key for at least one relation(R1 or R2) i.e., Att(R1) ⋂
Att(R2) -> Att(R1) or Att(R1) ⋂ Att(R2)->Att(R2)
For Example,
A relation R(A,B,C,D) with FD set {A->BC} is decomposed into R1(ABC) and R2(AD).
This is lossless join decomposition because
 First rule holds true as Att(R1) ⋃ Att(R2)=(ABC) ⋃ (AD)= (ABCD) = Att(R)
 Second rule holds true as Att(R1) ⋂ Att(R2) = (ABC) ⋂ (AD) ≠ Ø
 Third rule holds true as Att(R1) ⋂ Att(R2) = A is a key of R1(ABC) because A-
>BC is given

Dependency Preserving Decomposition


If we decompose a relation R into relations R1 and R2, all dependencies of R must be part
of either R1 or R2 or must be derivable from combination of functional dependencies (FD) of
R1 and R2
Suppose a relation R(A,B,C,D) with FD set {A->BC} is decomposed into R1(ABC) and
R2(AD) which is dependency preserving because FD A->BC is a part of R1(ABC)
For Example,
Consider a schema R(A,B,C,D) and functional dependencies A->B and C->D which
is decomposed into R1(AB) and R2(CD)
This decomposition is dependency preserving decomposition because
 A->B can be ensured in R1(AB)
 C->D can be ensured in R2(CD)


UNIT-V


TRANSACTION
Transactions are a set of operations that are used to perform some logical set of work. A
transaction is made to change data in a database which can be done by inserting new data,
updating the existing data, or by deleting the data that is no longer required.
A transaction is a set of logically related operations.
For example, you are transferring money from your bank account to your friend’s account,
the set of operations would be like below
 Read your account balance
 Deduct the amount from your balance
 Write the remaining balance to your account
 Read your friend’s account balance
 Add the amount to his account balance
 Write the new updated balance to his account
This whole set of operations can be called a transaction.
In DBMS, we write the above 6 steps transaction like below
Lets say your account is A and your friend’s account is B, you are transferring 10000 from A
to B, the steps of the transaction are
Read(A);
A = A - 10000;
Write(A);
Read(B);
B = B + 10000;
Write(B);
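The same transfer can also be written directly in SQL. This is only a sketch; an Account table with columns Acc_No and Balance is assumed, and the statement that starts a transaction varies between systems (START TRANSACTION, BEGIN, or implicit):
START TRANSACTION;
UPDATE Account SET Balance = Balance - 10000 WHERE Acc_No = 'A';
UPDATE Account SET Balance = Balance + 10000 WHERE Acc_No = 'B';
COMMIT; -- makes both updates permanent; ROLLBACK here would undo both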

PROPERTIES OF TRANSACTIONS (or) ACID PROPERTIES


To ensure the integrity and consistency of data during a transaction, the database system
maintains four properties. These properties are widely known as ACID properties.

The properties are


 Atomicity
 Consistency
 Isolation
 Durability


Atomicity
This property ensures that either all the operations of a transaction are reflected in the database or none are.
The logic here is simple: a transaction is a single unit, so it cannot execute partially. Either it executes completely or it does not; there should not be a partial execution.
Consider an example of banking system to understand this
Suppose Account A has a balance of Rs.400 & B has Rs.700. Account A is transferring
Rs.100 to Account B.
This is a transaction that has two operations
a) Debiting Rs.100 from A’s balance
b) Crediting Rs.100 to B’s balance.
Let’s say first operation passed successfully while second failed, in this case A’s balance
would be Rs.300 while B would be having Rs.700 instead of Rs.800. This is unacceptable in
a banking system. Either the transaction should fail without executing any of the operation or
it should process both the operations. The Atomicity property ensures that.
There are two key operations involved in a transaction to maintain its atomicity.
Abort: If there is a failure in the transaction, abort the execution and rollback the changes
made by the transaction.
Commit: If transaction executes successfully, commit the changes to the database.

Consistency
Database must be in consistent state before and after the execution of the transaction. This
ensures that there are no errors in the database at any point of time. Application programmer
is responsible for maintaining the consistency of the database.
For Example,
A transferring Rs.1000 to B. A’s initial balance is Rs.2000 and B’s initial balance is Rs.5000.
Before the transaction:
Total of A+B = 2000 + 5000 = Rs.7000
After the transaction:
Total of A+B = 1000 + 6000 = Rs.7000
The data is consistent before and after the execution of the transaction so this example
maintains the consistency property of the database.


Isolation
A transaction shouldn’t interfere with the execution of another transaction. To preserve the
consistency of database, the execution of transaction should take place in isolation (that
means no other transaction should run concurrently when there is a transaction already
running).
For example account A is having a balance of Rs.400 and it is transferring Rs.100 to account
B & C both. So we have two transactions here. Let’s say these transactions run concurrently
and both the transactions read Rs.400 balance, in this case the final balance of A would be
Rs.300 instead of Rs.200. This is wrong.
If the transaction were to run in isolation then the second transaction would have read the
correct balance Rs.300 (before debiting Rs.100) once the first transaction went successful.

Durability
Once a transaction completes successfully, the changes it has made into the database should
be permanent even if there is a system failure. The recovery-management component of
database systems ensures the durability of transaction.
ACID properties are the backbone of a database management system. These properties ensure that even though there are multiple transactions reading and writing data in the database, the data is always correct and consistent.

TRANSACTION LOG
A DBMS uses a transaction log to keep track of all transactions that update the database. The
information stored in this log is used by the DBMS for a recovery requirement triggered by a
ROLLBACK statement, a program’s abnormal termination, or a system failure such as a
network discrepancy or a disk crash.
Some RDBMSs use the transaction log to recover a database forward to a currently consistent
state. After a server failure, for example, Oracle automatically rolls back uncommitted
transactions and rolls forward transactions that were committed but not yet written to the
physical database.
While the DBMS executes transactions that modify the database, it also automatically
updates the transaction log.
The transaction log stores
 A record for the beginning of the transaction.
 For each transaction component (SQL statement):
 The type of operation being performed (update, delete, insert).


 The names of the objects affected by the transaction (the name of the table).
 The “before” and “after” values for the fields being updated.
 Pointers to the previous and next transaction log entries for the same transaction.
 The ending (COMMIT) of the transaction.

TRANSACTION MANAGEMENT WITH SQL USING COMMIT ROLLBACK AND


SAVE POINT
Transaction Control Commands
The following commands are used to control transactions.
 COMMIT − to save the changes.
 ROLLBACK − to roll back the changes.
 SAVEPOINT − creates points within the groups of transactions in which to
ROLLBACK.
 SET TRANSACTION − Places a name on a transaction.
Transactional control commands are only used with the DML Commands such as - INSERT,
UPDATE and DELETE only. They cannot be used while creating tables or dropping them
because these operations are automatically committed in the database.
The COMMIT Command - The COMMIT command is the transactional command used to save the changes invoked by a transaction to the database. The COMMIT command saves all the transactions to the database since the last COMMIT or ROLLBACK command.
The syntax for the COMMIT command is as follows
COMMIT;
For Example, Consider the STUDENT table having the following records
ID NAME AGE ADDRESS
1 Ramu 32 Tanuku
2 Chitti 25 Tadepalligudem
3 Komali 23 Bhimavaram
4 Vishal 25 Eluru
Following is an example which would delete those records from the table which have age =
25 and then COMMIT the changes in the database.
SQL> DELETE FROM STUDENT WHERE AGE = 25;
SQL> COMMIT;


Thus, two rows from the table would be deleted and the SELECT statement would produce
the following result
ID NAME AGE ADDRESS
1 Ramu 32 Tanuku
3 Komali 23 Bhimavaram

The ROLLBACK Command - The ROLLBACK command is the transactional command


used to undo transactions that have not already been saved to the database. This command
can only be used to undo transactions since the last COMMIT or ROLLBACK command was
issued.
The syntax for a ROLLBACK command is as follows
ROLLBACK;
For Example, Consider the STUDENT table having the following records
ID NAME AGE ADDRESS
1 Ramu 32 Tanuku
2 Chitti 25 Tadepalligudem
3 Komali 23 Bhimavaram
4 Vishal 25 Eluru
Following is an example, which would delete those records from the table which have the age
= 25 and then ROLLBACK the changes in the database.
SQL> DELETE FROM STUDENT WHERE AGE = 25;
SQL> ROLLBACK;
Thus, the delete operation would not impact the table and the SELECT statement would
produce the following result.
ID NAME AGE ADDRESS
1 Ramu 32 Tanuku
2 Chitti 25 Tadepalligudem
3 Komali 23 Bhimavaram
4 Vishal 25 Eluru

The SAVEPOINT Command - A SAVEPOINT is a point in a transaction when you can roll
the transaction back to a certain point without rolling back the entire transaction.
The syntax for a SAVEPOINT command is as shown below.
SAVEPOINT SAVEPOINT_NAME;


This command serves only in the creation of a SAVEPOINT among all the transactional
statements. The ROLLBACK command is used to undo a group of transactions.
The syntax for rolling back to a SAVEPOINT is as shown below.
ROLLBACK TO SAVEPOINT_NAME;
Following is an example where you plan to delete the three different records from the
STUDENT table. You want to create a SAVEPOINT before each delete, so that you can
ROLLBACK to any SAVEPOINT at any time to return the appropriate data to its original
state.
For Example, Consider the STUDENT table having the following records.
ID NAME AGE ADDRESS
1 Ramu 32 Tanuku
2 Chitti 25 Tadepalligudem
3 Komali 23 Bhimavaram
4 Vishal 25 Eluru
The following code block contains the series of operations.
SQL> SAVEPOINT SP1;
Savepoint created.
SQL> DELETE FROM STUDENT WHERE ID=1;
1 row deleted.
SQL> SAVEPOINT SP2;
Savepoint created.
SQL> DELETE FROM STUDENT WHERE ID=2;
1 row deleted.
SQL> SAVEPOINT SP3;
Savepoint created.
SQL> DELETE FROM STUDENT WHERE ID=3;
1 row deleted.
Now that the three deletions have taken place, let us assume that you have changed your
mind and decided to ROLLBACK to the SAVEPOINT that you identified as SP2. Because
SP2 was created after the first deletion, the last two deletions are undone
SQL> ROLLBACK TO SP2;
Rollback complete.
Notice that only the first deletion took place since you rolled back to SP2.
SQL> SELECT * FROM STUDENT;


ID NAME AGE ADDRESS


2 Chitti 25 Tadepalligudem
3 Komali 23 Bhimavaram
4 Vishal 25 Eluru
3 rows selected.

The RELEASE SAVEPOINT Command - The RELEASE SAVEPOINT command is used


to remove a SAVEPOINT that you have created.
The syntax for a RELEASE SAVEPOINT command is as follows.
RELEASE SAVEPOINT SAVEPOINT_NAME;
Once a SAVEPOINT has been released, you can no longer use the ROLLBACK command to
undo transactions performed since the last SAVEPOINT.

The SET TRANSACTION Command - The SET TRANSACTION command can be used
to initiate a database transaction. This command is used to specify characteristics for the
transaction that follows. For example, you can specify a transaction to be read only or read
write.
The syntax for a SET TRANSACTION command is as follows.
SET TRANSACTION [ READ WRITE | READ ONLY ];
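For example, a report that must not modify any data could be run as a read-only transaction in Oracle-style SQL (a sketch):
SET TRANSACTION READ ONLY;
SELECT * FROM STUDENT;
COMMIT; -- ends the read-only transaction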

CONCURRENCY CONTROL
Concurrency control is the process of managing simultaneous execution of transactions in a
multiprocessing database system without having them interfere with one another.
This property of DBMS allows many transactions to access the same database at the same
time without interfering with each other.
Concurrency problems in DBMS Transactions
When multiple transactions execute concurrently in an uncontrolled or unrestricted manner,
then it might lead to several problems. These problems are commonly referred to as
concurrency problems in a database environment.
The concurrency problems that can occur in the database are
 Lost update problem (write-write conflict)
 Temporary update or dirty read problem or Uncommitted Update (write-read
conflict).
 Unrepeatable read or incorrect analysis or inconsistent retrievals (read-write conflict).


Lost update problem – The lost update problem arises when two different transactions update the same data item and one update overwrites the other. For Example, consider two transactions
A and B performing read/write operations on a data DT in the database DB. The current
value of DT is 1000. The following table shows the read/write operations in A and B
transactions.
Time A B
t1 READ(DT) ------
t2 ------ READ(DT)
t3 DT=DT+500 ------
t4 WRITE(DT) ------
t5 ------ DT=DT+300
t6 ------ WRITE(DT)

Transactions A and B both read the initial value of DT as 1000. Transaction A modifies DT from 1000 to 1500 and writes it. Transaction B, which had also read 1000, computes 1300 and writes it, overwriting the value written by A. The update made by transaction A has therefore been lost.

Dirty Read Problem - The dirty read problem arises when a transaction reads the data that
has been updated by another transaction that is still uncommitted. It arises due to multiple
uncommitted transactions executing simultaneously. For Example, consider two transactions
A and B performing read/write operations on a data DT in the database DB. The current
value of DT is 1000. The following table shows the read/write operations in A and B
transactions.
Time A B
t1 READ(DT) ------
t2 DT=DT+500 ------
t3 WRITE(DT) ------
t4 ------ READ(DT)
t5 ------ COMMIT
t6 ROLLBACK ------


Transaction A reads the value of DT as 1000 and modifies it to 1500, which is held in a temporary buffer (written but not yet committed). Transaction B then reads DT as 1500 and commits, so work based on the value 1500 becomes permanent. A server error then occurs in transaction A and it is rolled back to its initial value, i.e., 1000; transaction B has therefore read a "dirty" value that was never made permanent, which is the dirty read problem.

Unrepeatable Read Problem - The unrepeatable read problem occurs when two or more
different values of the same data are read during the read operations in the same transaction.
For Example, consider two transactions A and B performing read/write operations on a data
DT in the database DB. The current value of DT is 1000. The following table shows the
read/write operations in A and B transactions.
Time A B
t1 READ(DT) ------
t2 ------ READ(DT)
t3 DT=DT+500 ------
t4 WRITE(DT) ------
t5 ------ READ(DT)
Transaction A and B initially read the value of DT as 1000. Transaction A modifies the value
of DT from 1000 to 1500 and then again transaction B reads the value and finds it to be 1500.
Transaction B finds two different values of DT in its two different read operations.

Concurrency control is the technique that ensures that the above three conflicts don’t occur in
the database. There are certain rules to avoid problems in concurrently running transactions
and these rules are defined as the concurrency control protocols.

SCHEDULER
Transactions are a set of instructions that perform operations on databases. When multiple
transactions are running concurrently, then a sequence is needed in which the operations are
to be performed because at a time, only one operation can be performed on the database. This
sequence of operations is known as Schedule, and this process is known as Scheduling.
When multiple transactions execute simultaneously in an unmanageable manner, then it
might lead to several problems, which are known as concurrency problems. In order to
overcome these problems, scheduling is required.


The scheduler establishes the order in which the operations within concurrent transactions are
executed. The scheduler interleaves the execution of database operations to ensure
serializability. The scheduler bases its actions on concurrency control algorithms, such as
locking or time stamping methods.
The scheduler also helps to ensure efficient utilization of the central processing unit (CPU) of the computer system. A schedule need not contain an ABORT or COMMIT action for every transaction; schedules which contain either an ABORT or a COMMIT action for each transaction whose actions are listed in them are called complete schedules.
If the actions of different transactions are not interleaved, that is, transactions are executed
one by one from start to finish, the schedule is called a serial schedule.
A non-serial schedule is a schedule where the operations from a group of concurrent
transactions are interleaved.
A serializable schedule gives the benefits of concurrent execution without giving up any correctness. The disadvantage of a serial schedule is that it represents inefficient processing, because no interleaving of operations from different transactions is permitted. This can lead to low CPU utilization while a transaction waits for disk input/output (I/O), or for another transaction to terminate, thus slowing down processing considerably.

Serializable Schedules
A serializable schedule is a schedule in which a set of transactions executes in some interleaved order such that the effect is equivalent to executing them in some serial order, like a serial schedule. Executing transactions according to a serializable schedule is a sufficient condition for preventing conflicts.
The serial execution of transactions always leaves the database in a consistent state. Serializability describes the concurrent execution of several transactions.
The objective of serializability is to find the non-serial schedules that allow transactions to
execute concurrently without interfering with one another and thereby producing a database
state that could be produced by a serial execution.
Serializability must be guaranteed to prevent inconsistency from transactions interfering with
one another. The order of Read and Write operations are important in serializability.
The serializability rules are as follows:
 If two transactions T1 and T2 only Read a data item, they do not conflict and the order is
not important.
 If two transactions T1 and T2 either Read or Write completely separate data items, they
do not conflict and the execution order is not important.


 If one transaction T1 Writes a data item and another transaction T2 either Reads or Writes
the same data item, the order of execution is important.

Serializability can also be depicted by constructing a precedence graph.


A precedence relationship can be defined as follows: transaction T1 precedes transaction T2 if there are two non-permutable (conflicting) actions A1 and A2 such that A1 is executed by T1 before A2 is executed by T2.
Given the existence of non-permutable actions and the sequence of actions in a transaction it
is possible to define a partial order of transactions by constructing a precedence graph.
A precedence graph is a directed graph in which:
 The set of vertices is the set of transactions.
 An arc exists between transactions T1 and T2 if T1 precedes T2.
 A schedule is serializable if the precedence graph is acyclic.
 The serializability property of transactions is important in multi-user and distributed
databases, where several transactions are likely to be executed concurrently.
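As a small hypothetical illustration: suppose T1 writes a data item X before T2 reads X, so we draw an arc T1 → T2; suppose T2 also writes a data item Y before T1 reads Y, so we draw an arc T2 → T1. The precedence graph now contains the cycle T1 → T2 → T1, so this schedule is not serializable. If the second conflict did not exist, the graph would be acyclic and the schedule would be equivalent to the serial order T1 followed by T2.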

Difference Between Serial Schedule and Serializable Schedule

Serial Schedule Serializable Schedule


In a serial schedule, transactions are executed one after the other. In a serializable schedule, transactions are executed concurrently.
A serial schedule is less efficient. A serializable schedule is more efficient.
In a serial schedule, only one transaction executes at a time. In a serializable schedule, multiple transactions can execute at a time.
A serial schedule takes more time for execution. In a serializable schedule, execution is fast.

METHODS FOR CONCURRENCY CONTROL


There are main three methods for concurrency control. They are as follows
1. Locking Methods
2. Time-stamp Methods
3. Optimistic Methods


1. Locking Methods of Concurrency Control


Locks are an integral part of maintaining concurrency control in a DBMS. A transaction in any system implementing lock-based concurrency control cannot read or write a data item until it has obtained the required locks on it.
A lock is a variable, associated with the data item, which controls the access of that data item.
Locking is the most widely used form of the concurrency control.
Locks are further divided into three fields:
 Lock Granularity
 Lock Types
 Deadlocks
Lock Granularity
A database is basically represented as a collection of named data items. The size of the data
item chosen as the unit of protection by a concurrency control program is called
GRANULARITY.
Locking can take place at the following level
 Database level.
 Table level.
 Page level.
 Row (Tuple) level.
 Attributes (fields) level.
Database level Locking - At database level locking, the entire database is locked. Thus, it
prevents the use of any tables in the database by transaction T2 while transaction T1 is being
executed.
Database level of locking is suitable for batch processes. Being very slow, it is unsuitable for
on-line multi-user DBMSs.
Table level Locking - At table level locking, the entire table is locked. Thus, it prevents the
access to any row (tuple) by transaction T2 while transaction T1 is using the table.
If a transaction requires access to several tables, each table may be locked. However, two
transactions can access the same database as long as they access different tables. Table level
locking is less restrictive than database level. Table level locks are not suitable for multi-user
DBMS
Page level Locking - At page level locking, the entire disk-page (or disk-block) is locked. A
page has a fixed size such as 4 K, 8 K, 16 K, 32 K and so on. A table can span several pages,
and a page can contain several rows (tuples) of one or more tables.


Page level of locking is most suitable for multi-user DBMSs.


Row (Tuple) level Locking - At row level locking, particular row (or tuple) is locked. A
lock exists for each row in each table of the database. The DBMS allows concurrent
transactions to access different rows of the same table, even if the rows are located on the
same page.
The row level lock is much less restrictive than database level, table level, or page level
locks. The row level locking improves the availability of data. However, the management of
row level locking requires high overhead cost.
Attributes (fields) level Locking - At attribute level locking, particular attribute (or field) is
locked. Attribute level locking allows concurrent transactions to access the same row, as
long as they require the use of different attributes within the row. The attribute level lock
yields the most flexible multi-user data access. It requires a high level of computer
overhead.

Lock Types
The DBMS mainly uses following types of locking techniques.
 Binary Locking
 Shared / Exclusive Locking
 Two - Phase Locking (2PL)
Binary Locking - A binary lock can have two states or values: locked and unlocked (or 1 and
0, for simplicity). A distinct lock is associated with each database item X. If the value of the
lock on X is 1, item X cannot be accessed by a database operation that requests the item. If
the value of the lock on X is 0, the item can be accessed when requested. We refer to the
current value (or state) of the lock associated with item X as LOCK(X).

Two operations, lock_item and unlock_item, are used with binary locking.
Lock_item(X) - A transaction requests access to an item X by first issuing a lock_item(X)
operation.
If LOCK(X) = 1, the transaction is forced to wait. If LOCK(X) = 0, it is set to 1 (the
transaction locks the item) and the transaction is allowed to access item X.
Unlock_item (X) - When the transaction is through using the item, it issues an
unlock_item(X) operation, which sets LOCK(X) to 0 (unlocks the item) so that X may be
accessed by other transactions.
Hence, a binary lock enforces mutual exclusion on the data item; i.e., at a time only one
transaction can hold a lock.


Shared / Exclusive Locking


Shared lock (S) - These locks are referred to as read locks, and denoted by 'S'.
If a transaction T has obtained a shared lock on data item X, then T can read X but cannot write X. Multiple shared locks can be placed simultaneously on a data item.
For example, consider a case where two transactions are reading the account balance of a
person. The database will let them read by placing a shared lock. However, if another
transaction wants to update that account’s balance, shared lock prevent it until the reading
process is over.
Exclusive lock (X) - These locks are referred to as write locks, and denoted by 'X'.
If a transaction T has obtained an exclusive lock on data item X, then T can both read and write X. Only one exclusive lock can be placed on a data item at a time, which means multiple transactions cannot modify the same data simultaneously.
For example, when a transaction needs to update the account balance of a person, this is allowed by placing an exclusive (X) lock on that item. Therefore, when a second transaction wants to read or write the same item, the exclusive lock prevents this operation until the lock is released.
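These two lock modes can also be requested explicitly from SQL. The sketch below uses MySQL/InnoDB-style syntax (LOCK IN SHARE MODE and FOR UPDATE); the Account table is an assumption:
START TRANSACTION;
-- shared (read) lock: other transactions may also read this row, but cannot modify it
SELECT Balance FROM Account WHERE Acc_No = 'A' LOCK IN SHARE MODE;
COMMIT;
START TRANSACTION;
-- exclusive (write) lock: other transactions can neither share-lock nor modify this row until COMMIT
SELECT Balance FROM Account WHERE Acc_No = 'A' FOR UPDATE;
UPDATE Account SET Balance = Balance - 100 WHERE Acc_No = 'A';
COMMIT;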

Two-Phase Locking (2PL)


Two Phase Locking Protocol also known as 2PL protocol is a method of concurrency control
in DBMS that ensures serializability by applying a lock to the transaction data which blocks
other transactions to access the same data simultaneously. Two Phase Locking protocol helps
to eliminate the concurrency problem in DBMS.
This locking protocol divides the execution phase of a transaction into three different parts.
 In the first phase, when the transaction begins to execute, it requires permission for
the locks it needs.
 The second part is where the transaction obtains all the locks. When a transaction
releases its first lock, the third phase starts.
 In this third phase, the transaction cannot demand any new locks. Instead, it only
releases the acquired locks.
The Two-Phase Locking protocol allows each transaction to make a lock or unlock request in
two steps
Growing Phase: In this phase transaction may obtain locks but may not release any locks.
Shrinking Phase: In this phase, a transaction may release locks but not obtain any new lock


Time Transaction Remarks


t0 Lock - X (A) acquire Exclusive lock on A.
t1 Read A read original value of A
t2 A = A - 100 subtract 100 from A
t3 Write A write new value of A
t4 Lock - X (B) acquire Exclusive lock on B.
t5 Read B read original value of B
t6 B = B + 100 add 100 to B
t7 Write B write new value of B
t8 Unlock (A) release lock on A
t9 Unlock (B) release lock on B
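
As a rough illustration (not from the original notes), the following Python sketch checks whether a single transaction's sequence of lock and unlock operations respects the two-phase rule; the tuple format used for the operations is an assumed convention.

def obeys_two_phase_rule(operations):
    # operations: list of ("lock", item) / ("unlock", item) tuples for one transaction.
    # Returns True if no lock is requested after the first unlock
    # (a growing phase followed by a shrinking phase).
    shrinking = False
    for op, _item in operations:
        if op == "unlock":
            shrinking = True
        elif op == "lock" and shrinking:
            return False      # a new lock demanded in the shrinking phase violates 2PL
    return True

# The transfer shown in the table above obeys the rule:
print(obeys_two_phase_rule([("lock", "A"), ("lock", "B"),
                            ("unlock", "A"), ("unlock", "B")]))   # True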

Deadlocks
A deadlock is a condition in which two (or more) transactions in a set are waiting
simultaneously for locks held by some other transaction in the set. Neither transaction can
continue because each transaction in the set is on a waiting queue, waiting for one of the
other transactions in the set to release the lock on an item. Thus, a deadlock is an impasse
that may result when two or more transactions are each waiting for locks to be released that
are held by the other. Transactions whose lock requests have been refused are queued until
the lock can be granted.

A deadlock is also called a circular waiting condition where two transactions are waiting
(directly or indirectly) for each other. Thus in a deadlock, two transactions are mutually
excluded from accessing the next record required to complete their transactions, also called a
deadly embrace.

For Example,
A deadlock exists when two transactions A and B access data items as in the following example:
Transaction A = access data items X and Y
Transaction B = access data items Y and X
Here, Transaction-A has acquired a lock on X and is waiting to acquire a lock on Y, while
Transaction-B has acquired a lock on Y and is waiting to acquire a lock on X. As a result,
neither of them can execute further.


Transaction-A Time Transaction-B


--- t0 ---
Lock (X) (acquired lock on X) t1 ---
--- t2 Lock (Y) (acquired lock on Y)
Lock (Y) (request lock on Y) t3 ---
Wait t4 Lock (X) (request lock on X)
Wait t5 Wait
Wait t6 Wait
Wait t7 Wait

Deadlock Detection and Prevention


Deadlock detection - This technique allows deadlocks to occur, but then detects and resolves
them. Here, the database is periodically checked for deadlocks. If a deadlock is detected,
one of the transactions involved in the deadlock cycle is aborted; the other transactions
continue their execution. The aborted transaction is rolled back and restarted.

Deadlock Prevention - Deadlock prevention technique avoids the conditions that lead to
deadlocking. It requires that every transaction lock all data items it needs in advance. If any
of the items cannot be obtained, none of the items are locked. In other words, a transaction
requesting a new lock is aborted if there is the possibility that a deadlock can occur. Thus, a
timeout may be used to abort transactions that have been idle for too long. This is a simple
but indiscriminate approach. If the transaction is aborted, all the changes made by this
transaction are rolled back and all locks obtained by the transaction are released. The
transaction is then rescheduled for execution. Deadlock prevention techniques are typically
used together with two-phase locking.
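
One common way to implement the periodic check mentioned under deadlock detection is to build a waits-for graph and look for a cycle. The following Python sketch is an illustrative assumption about how that check might look, not something the notes prescribe.

def has_deadlock(waits_for):
    # waits_for: dict mapping a transaction to the set of transactions it is waiting on.
    # Returns True if the waits-for graph contains a cycle, i.e., a circular wait.
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {t: WHITE for t in waits_for}

    def visit(t):
        colour[t] = GREY
        for u in waits_for.get(t, ()):
            if colour.get(u, WHITE) == GREY:      # back edge: circular wait found
                return True
            if colour.get(u, WHITE) == WHITE and visit(u):
                return True
        colour[t] = BLACK
        return False

    return any(colour[t] == WHITE and visit(t) for t in waits_for)

# The example above: A waits for B (to release Y) and B waits for A (to release X).
print(has_deadlock({"A": {"B"}, "B": {"A"}}))     # True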

2. Time-Stamp Methods for Concurrency control


The time stamping approach to scheduling concurrent transactions assigns a global, unique
time stamp to each transaction. The time stamp value produces an explicit order in which
transactions are submitted to the DBMS.

Time stamps must have two properties: uniqueness and monotonicity.


Uniqueness ensures that no equal time stamp values can exist, and monotonicity ensures that
time stamp values always increase.


All database operations (Read and Write) within the same transaction must have the same
time stamp. The DBMS executes conflicting operations in time stamp order, thereby ensuring
serializability of the transactions. If two transactions conflict, one is stopped, rolled back,
rescheduled, and assigned a new time stamp value.
The disadvantage of the time stamping approach is that each value stored in the database
requires two additional time stamp fields: one for the last time the field was read and one for
the last update. Time stamping thus increases memory needs and the database’s processing
overhead. Time stamping demands a lot of system resources because many transactions
might have to be stopped, rescheduled, and restamped.
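
The notes mention the two extra time stamp fields kept per item (last read and last write). The sketch below shows the standard basic time-stamp ordering checks built on those fields; the Item class and the True/False return convention (False meaning the transaction must be rolled back and restarted with a new time stamp) are illustrative assumptions.

class Item:
    # One database item with the two extra fields mentioned above.
    def __init__(self):
        self.read_ts = 0      # time stamp of the youngest transaction that read the item
        self.write_ts = 0     # time stamp of the youngest transaction that wrote the item

def ts_read(item, ts):
    # Return True if a transaction with time stamp ts may read the item.
    if ts < item.write_ts:            # a younger transaction has already written the item
        return False
    item.read_ts = max(item.read_ts, ts)
    return True

def ts_write(item, ts):
    # Return True if a transaction with time stamp ts may write the item.
    if ts < item.read_ts or ts < item.write_ts:
        return False
    item.write_ts = ts
    return True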

The wait/die scheme and the wound/wait scheme


We have learned that time stamping methods are used to manage concurrent transaction
execution. In this section, we will learn about two schemes used to decide which transaction
is rolled back and which continues executing.

An example illustrates the difference. Assume that we have two conflicting transactions: T1
and T2, each with a unique time stamp. Suppose T1 has a time stamp of 11548789 and T2
has a time stamp of 19562545. We can deduce from the time stamps that T1 is the older
transaction (the lower time stamp value) and T2 is the newer transaction. Given that scenario,
the four possible outcomes are described below.

In the wait/die scheme


If the transaction requesting the lock is the older of the two transactions, it will wait until the
other transaction is completed and the locks are released.
If the transaction requesting the lock is the younger of the two transactions, it will die (roll
back) and be rescheduled using the same time stamp.
In short, in the wait/die scheme, the older transaction waits for the younger one to complete
and release its locks.


In the wound/wait scheme


If the transaction requesting the lock is the older of the two transactions, it will preempt
(wound) the younger transaction (by rolling it back). T1 preempts T2 when T1 rolls back T2.
The younger, preempted transaction is rescheduled using the same time stamp.
If the transaction requesting the lock is the younger of the two transactions, it will wait until
the other transaction is completed and the locks are released.
In short, in the wound/wait scheme, the older transaction rolls back the younger transaction
and reschedules it.
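
Both schemes reduce to a simple comparison of time stamps. The following Python sketch captures the two decisions described above; the function names and the convention that a lower time stamp means an older transaction are assumptions made for the example.

def wait_die(requester_ts, holder_ts):
    # The requester asks for a lock held by the holder; lower time stamp = older transaction.
    if requester_ts < holder_ts:
        return "wait"     # older requester waits for the younger holder
    return "die"          # younger requester is rolled back (restarted with the same time stamp)

def wound_wait(requester_ts, holder_ts):
    if requester_ts < holder_ts:
        return "wound"    # older requester preempts (rolls back) the younger holder
    return "wait"         # younger requester waits for the older holder

# T1 (time stamp 11548789) requests an item held by T2 (19562545): T1 is the older transaction.
print(wait_die(11548789, 19562545))    # 'wait'
print(wound_wait(11548789, 19562545))  # 'wound'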

3. Optimistic Methods of Concurrency Control


The validation-based protocol does not restrict transactions with locks while they run; instead
it works on the optimistic assumption that transactions rarely interfere with one another. This
is why it is also called the Optimistic Concurrency Control technique.
In this protocol, a transaction doesn’t make any changes to the database directly; instead it
performs all the changes on local copies of the data items that are maintained by the
transaction itself. At the end of the transaction, a validation is performed. If the transaction
does not violate any serializability rule, it commits its changes to the database; otherwise it
is rolled back and restarted.

Three phases of Validation based Protocol


Read phase: In this phase, a transaction reads the values of data items from the database and
stores them in temporary local variables. The transaction then starts executing, but it does not
update the data items in the database; instead, it performs all operations on the temporary
local variables.

Validation phase: In this phase, a validation check is performed to see whether applying the
transaction's updates to the database would violate serializability.

Write phase: This is the final phase of the validation-based protocol. In this phase, if the
validation of the transaction is successful, the values of the temporary local variables are
written to the database and the transaction is committed. If the validation fails in the second
phase, the updates are discarded and the transaction is rolled back, to be restarted later.
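
A minimal Python sketch of the three phases follows. It uses a simplified validation rule (fail if any item this transaction read was overwritten by a concurrently committed transaction); the class name, the dictionary-based database, and that particular validation rule are assumptions, not something the notes specify.

class OptimisticTxn:
    # Read into local copies, validate against concurrently committed writes, then write.
    def __init__(self, db):
        self.db = db
        self.local = {}          # local copies of the items this transaction touched
        self.read_set = set()
        self.write_set = set()

    def read(self, key):                       # read phase
        if key not in self.local:
            self.local[key] = self.db[key]
            self.read_set.add(key)
        return self.local[key]

    def write(self, key, value):               # still the read phase: only local copies change
        self.local[key] = value
        self.write_set.add(key)

    def commit(self, concurrently_committed_writes):
        # Validation phase: fail if something we read was overwritten meanwhile.
        if self.read_set & concurrently_committed_writes:
            return False                        # caller rolls the transaction back and restarts it
        # Write phase: copy the local values into the database.
        for key in self.write_set:
            self.db[key] = self.local[key]
        return True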


DATABASE RECOVERY
Database recovery techniques are used in database management systems (DBMS) to restore a
database to a consistent state after a failure or error has occurred. The main goal of recovery
techniques is to ensure data integrity and consistency and prevent data loss.
There are mainly two types of recovery techniques used in DBMS.
Rollback/Undo Recovery Technique: The rollback/undo recovery technique is based on the
principle of backing out or undoing the effects of a transaction that has not completed
successfully due to a system failure or error. This technique is accomplished by undoing the
changes made by the transaction using the log records stored in the transaction log. The
transaction log contains a record of all the transactions that have been performed on the
database. The system uses the log records to undo the changes made by the failed transaction
and restore the database to its previous state.
Commit/Redo Recovery Technique: The commit/redo recovery technique is based on the
principle of reapplying the changes made by a transaction that has been completed
successfully to the database. This technique is accomplished by using the log records stored
in the transaction log to redo the changes made by the transaction that was in progress at the
time of the failure or error. The system uses the log records to reapply the changes made by
the transaction and restore the database to its most recent consistent state.
In addition to these two techniques, there is also a third technique called checkpoint recovery.
Checkpoint recovery is a technique used to reduce the recovery time by periodically saving
the state of the database in a checkpoint file. In the event of a failure, the system can use the
checkpoint file to restore the database to the most recent consistent state before the failure
occurred, rather than going through the entire log to recover the database.
Overall, recovery techniques are essential to ensure data consistency and availability in
DBMS, and each technique has its own advantages and limitations that must be considered in
the design of a recovery system.

Database systems, like any other computer system, are subject to failures but the data stored
in them must be available as and when required. When a database fails it must possess the
facilities for fast recovery. It must also guarantee atomicity, i.e., either a transaction is
completed successfully and committed (its effect is recorded permanently in the database) or it
has no effect on the database at all. There are both automatic and non-automatic ways of backing
up data and of recovering from failure situations. The techniques used to recover data lost due
to system crashes, transaction errors, viruses, catastrophic failures, incorrect command
execution, etc., are database recovery techniques. So


to prevent data loss, recovery techniques based on deferred update and immediate update, or on
backing up data, can be used.
Recovery techniques are heavily dependent upon the existence of a special file known as
a system log. It contains information about the start and end of each transaction and any
updates which occur during the transaction. The log keeps track of all transaction operations
that affect the values of database items. This information is needed to recover from
transaction failure.
The log is kept on disk. The main types of log entries are:
 start_transaction(T): This log entry records that transaction T starts its execution.
 read_item(T, X): This log entry records that transaction T reads the value of database
item X.
 write_item(T, X, old_value, new_value): This log entry records that transaction T
changes the value of the database item X from old_value to new_value. The old value is
sometimes known as the before-image of X, and the new value as the after-image of X.
 commit(T): This log entry records that transaction T has completed all accesses to the
database successfully and its effect can be committed (recorded permanently) to the
database.
 abort(T): This records that transaction T has been aborted.
 checkpoint: Checkpoint is a mechanism where all the previous logs are removed from the
system and stored permanently in a storage disk. Checkpoint declares a point before
which the DBMS was in a consistent state, and all the transactions were committed.
A transaction T reaches its commit point when all its operations that access the database have
been executed successfully i.e. the transaction has reached the point at which it will
not abort (terminate without completing). Once committed, the transaction is permanently
recorded in the database. Commitment always involves writing a commit entry to the log and
writing the log to disk. At the time of a system crash, the log is searched backward for all
transactions T that have written a start_transaction(T) entry into the log but have not yet
written a commit(T) entry; these transactions may have to be rolled back to undo their effect on
the database during the recovery process.
Undoing – If a transaction crashes, the recovery manager may undo the transaction, i.e., reverse
its operations. This involves examining the log for every write_item(T, X, old_value, new_value)
entry of the transaction and setting the value of item X in the database back to old_value.
There are two major techniques for recovery from non-catastrophic transaction failures: deferred
updates and immediate updates.
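
As a rough illustration of the undo step, the following Python sketch scans a log backward and restores the before-image of every write made by a transaction with no commit entry; the tuple layout of the log records is an assumption based on the entry types listed earlier.

def undo_uncommitted(log, db):
    # log: list of tuples such as ("start", T), ("write", T, X, old, new), ("commit", T).
    # Restores the before-image of every write made by a transaction with no commit entry.
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in reversed(log):                       # scan the log backward
        if rec[0] == "write" and rec[1] not in committed:
            _, _txn, item, old_value, _new_value = rec
            db[item] = old_value                    # undo: put the old value back

db = {"A": 400, "B": 600}
log = [("start", "T1"), ("write", "T1", "A", 500, 400)]   # T1 never committed
undo_uncommitted(log, db)
print(db["A"])   # 500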


Deferred update – This technique does not physically update the database on disk until a
transaction has reached its commit point. Before reaching commit, all transaction updates are
recorded in the local transaction workspace. If a transaction fails before reaching its commit
point, it will not have changed the database in any way so UNDO is not needed. It may be
necessary to REDO the effect of the operations that are recorded in the local transaction
workspace, because their effect may not yet have been written in the database. Hence, a
deferred update is also known as the No-undo/redo algorithm.
Immediate update – In the immediate update, the database may be updated by some
operations of a transaction before the transaction reaches its commit point. However, these
operations are recorded in a log on disk before they are applied to the database, making
recovery still possible. If a transaction fails to reach its commit point, the effect of its
operation must be undone i.e. the transaction must be rolled back hence we require both undo
and redo. This technique is known as undo/redo algorithm.
Caching/Buffering – In this, one or more disk pages that include the data items to be updated
are cached into main memory buffers and updated in memory before being written back to disk. A
collection of in-memory buffers called the DBMS cache is kept under the control of the DBMS for
holding these buffers. A directory is used to keep track of which database items are in the
buffers. A dirty bit is associated with each buffer; it is 0 if the buffer has not been modified
and 1 if it has.
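
A minimal Python sketch of such a cache follows (the BufferPool class, its dictionary-based "disk", and its method names are illustrative assumptions): pages are fetched into memory, updated there with the dirty bit set, and written back to disk only when flushed.

class BufferPool:
    # Illustrative DBMS cache: pages are updated in memory and flushed only if their dirty bit is set.
    def __init__(self, disk):
        self.disk = disk          # page_id -> page contents on "disk"
        self.frames = {}          # page_id -> page contents held in memory
        self.dirty = {}           # page_id -> 0 (unmodified) or 1 (modified)

    def fetch(self, page_id):
        if page_id not in self.frames:            # cache miss: read the page from disk
            self.frames[page_id] = self.disk[page_id]
            self.dirty[page_id] = 0
        return self.frames[page_id]

    def update(self, page_id, contents):
        self.fetch(page_id)
        self.frames[page_id] = contents
        self.dirty[page_id] = 1                   # mark the buffer as modified

    def flush(self, page_id):
        if self.dirty.get(page_id) == 1:          # only modified pages are written back
            self.disk[page_id] = self.frames[page_id]
            self.dirty[page_id] = 0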
Shadow paging – It provides atomicity and durability. A directory with n entries is
constructed, where the ith entry points to the ith database page on disk. When a transaction
begins executing, the current directory is copied into a shadow directory, which is never
modified during the transaction. When a page is to be modified, a new page is allocated and the
changes are made there; the current directory entry is updated to point to this replacement
page. When the transaction is ready to become durable, the updated current directory becomes
the directory of the database, while the unchanged shadow directory can still be used to
recover the old consistent state.
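
The following Python sketch illustrates the idea (the function signature and the dictionary-based page store are assumptions made for the example): updates go to freshly allocated pages referenced by a new directory, while the shadow directory keeps pointing to the unchanged originals.

def shadow_paging_update(current_dir, pages, updates):
    # current_dir: list where entry i holds the id of the page serving as database page i.
    # pages: dict mapping page id -> page contents (the "disk"); page ids are integers.
    # updates: dict mapping a directory slot i -> the new contents for database page i.
    shadow_dir = list(current_dir)            # copy taken when the transaction starts (never modified)
    new_dir = list(current_dir)
    for slot, contents in updates.items():
        new_page_id = max(pages) + 1          # allocate a fresh page; the original page is not overwritten
        pages[new_page_id] = contents
        new_dir[slot] = new_page_id           # the new directory points at the replacement page
    # Installing new_dir as the current directory is the atomic commit step; if the system
    # crashes before that, shadow_dir still describes the old, consistent database state.
    return new_dir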
Backward Recovery – The terms “rollback” and “UNDO” can also refer to backward recovery. When a
backup of the data is not available and previous modifications need to be undone, this technique
can be helpful. With the backward recovery method, unwanted modifications are removed and the
database is returned to its prior state. All changes made during the erroneous transaction are
reversed; in other words, backward recovery undoes the erroneous database updates.
Forward Recovery – The terms “roll forward” and “REDO” refer to forward recovery. When a
database needs to be rebuilt with all verified changes reapplied, this technique is helpful.
The changes of committed transactions recorded in the log are applied to the restored database
to roll those modifications forward. In other words, the database is restored from the
preserved backup data, and the valid transactions recorded since that backup are reapplied.

Some of the backup techniques are as follows


Full database backup – In this, the full database, including the data and the metadata needed
to restore the whole database (such as full-text catalogs), is backed up at predefined
intervals.
Differential backup – It stores only the data changes that have occurred since the last full
database backup. When some data has changed many times since the last full database backup, a
differential backup stores only the most recent version of the changed data. To restore from a
differential backup, the last full database backup must be restored first.
Transaction log backup – In this, all events that have occurred in the database (a record of
every single statement executed) are backed up. It is a backup of the transaction log entries
and contains all transactions that have happened to the database. Through this, the database
can be recovered to a specific point in time. It is even possible to back up the transaction
log when the data files are destroyed, so that not a single committed transaction is lost.

