CMP374 Introduction To RDBMS
Open University
P34
Diploma in Computer applications
Yashwantrao Chavan Maharashtra Open University
Vice-Chancellor: Dr. R. Krishnakumar
SCHOOL OF COMPUTER SCIENCE : SCHOOL COUNCIL
Dr. Ramchandra Tiwari, Director, School of Computer Science, Y.C.M. Open University, Nashik
Shri. Pramod Khandare, Assistant Professor, School of Computer Science, Y.C.M. Open University, Nashik
Prof. M.S. Karyakate, Associate Professor, Computer Department, VIIT, Pune
Production
Shri. Anand Yadav, Manager, Print Production Centre, YCMOU, Nashik-422101
Unit 5 Normalizations    3    10
Overview
Relational DB design
Decomposition (Small schema)
Lossy Decomposition
Lossless Decomposition
Functional Dependency: Full Dependency, Partial Dependency, Transitive Dependency
Normal Forms: Un-normalized Form, 1NF, 2NF, 3NF
De-normalization
Unit 6 SQL    4    10
Introduction
SQL Statements: DML, DDL, DCL
Data Types in SQL
Basic Types structure
SELECT: SQL SELECT DISTINCT Statement, SQL WHERE Clause, AND, OR, IN, BETWEEN, LIKE Operators, SQL ORDER BY Keyword, Aggregate Functions, GROUP BY, HAVING Clause
CREATE TABLE, DROP TABLE, Constraints
INSERT, UPDATE, DELETE, ALTER
Data Control Language (DCL)
Different operations on tables: Rename, Tuple Variables, Set Operations (UNION, UNION ALL, INTERSECT and MINUS Operators), String Operations
Null Values
Unit 7 Transaction Management    3    10
Introduction
Transaction Concept
Properties of Transactions
Transaction Terminology
Transaction States
Concurrent Execution of Transactions
Operations on Transactions
Concurrency Control
Schedules
Recoverability
Topics To Learn
0.0 Overview
0.1 Introduction
0.2 Operations on file
0.3 Introduction to Database
0.0 Overview
In this chapter we will study the concepts of data and files, and then the different operations that we can perform on a file. In section 0.1 you will study data and files. Data is a collection of facts, such as values or measurements. It can be numbers, words, measurements, observations or even just descriptions of things.
After understanding data we will study the concept of files. A file is a container for data; we use files to keep our data safe and secure. For example, in an office we keep all office information documents (office data) in different files, and each file is given a particular name. Similarly, in a computer we use files to keep our data. We will see these terms in section 0.2.
After understanding data and files we will study databases. A database is also a type of data storage, like a file, but it has more advanced facilities and features than a regular file, and these help us to manage the data easily. Just as the market offers different types of file folders for keeping paper documents, databases are available in different structures and models that make it easy to retrieve the data.
0.1 Introduction
Figure: Data → Processing → Information
For example, suppose you are going to a shop to purchase some items. Before you go, you make a list of the items you want to purchase. This list contains the name and quantity of each item (i.e. data about the items) you are purchasing.
Item List
Item    Quantity
Sugar   5 kg
Wheat   10 kg
Oil     5 ltr
Rice    10 kg
Soap    4 qty
E.g. the name and quantity of the items on the item list are data about the items to be purchased.
So here the name and quantity of each item are simply the data about that item.
Data can exist in a variety of forms such as numbers, text or symbols on pieces of
paper, as bits and bytes stored in electronic memory, or as facts stored in a person's
mind.
In computing terms, data are symbols or signals that are input to a computer, stored in it, and processed by it to produce usable information as output.
For example, the bit pattern 011000110011… is binary data in a computer.
Hence, from the above discussion, we can differentiate data, information and programs as follows:
Data: raw facts.
Information: processed data.
Programs: sets of instructions.
Files: We use files to save data. In general, we keep our documents (papers on which something is written) in a folder or file where they are safe and secure. For example, after making a list of the items to purchase, you may keep that list in your pocket or in an envelope where it is safe. In the same manner, we store computer data in a file (i.e. a computer file) where the data is safe and secure.
In computing terms, a file is a collection of bytes stored as an individual unit (entity). All data on a disk is stored in files, each with an assigned filename that is unique within the directory in which it resides.
Almost all information stored in a computer must be in a file.
There are many different types of files: data files, text files, program files,
directory files, and so on. Different types of files store different types of
information.
For example, program files store programs, whereas text files store text.
To the computer, a file is nothing more than a series of bytes; only the software that manipulates the file knows its structure.
For example, database files are made up of a series of records, whereas word-processing files (documents) are made up of a continuous flow of text.
When we want to extract data from a file for reading, we perform a retrieval operation on the file. Retrieval operations do not change the contents (data) of the file; they only locate records in the file.
Before we go further, let us see what a record is.
In database management systems, a complete set of information about one item is called a ‘record’.
For example, consider the file below, which contains library information.
Figure: a library information file, with its fields labelled
We will see this term in more detail in further chapters.
Update operations, on the other hand, change the file by modifying records, deleting records and inserting new records; that is, an update operation actually modifies the contents of the file.
The operations for locating, modifying, deleting and inserting records vary from system to system, but some operations are used in most systems. These are given below:
Find (Locate): The goal of this operation is to locate the record or records that satisfy the search criteria. The block that contains the records is transferred into main memory and the records are searched. The first record that matches the search criteria is located, and the search continues for other matching records until the end of the file is reached.
Read: In a read operation, the contents of the record are copied from memory into a program variable or work area. In some cases this command also advances the pointer to the next record in the result set.
Read Next: Searches for the next record that matches the selection condition and, when it is found, copies the contents of the record into the program variable or work area.
Modify: Also known as update. This command modifies the field values (data) of the current record and then writes the modified record back to the disk.
Insert: Inserts a new record into the file.
Delete: Deletes the current record and updates the file on the disk to reflect the deletion.
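The operations above can be sketched in a few lines of Python. This is only a toy model built on stated assumptions. The “file” is a list held in memory rather than blocks on a disk, and each record is a Python dictionary. The class name RecordFile and its method names are hypothetical; they were chosen to mirror Find, Read Next, Modify, Insert and Delete. Retrieval returns a copy of the record, just as a read copies it into a program variable, so it never changes the file.

```python
# Toy, in-memory model of a record file with a "current record" pointer.
# RecordFile and its method names are hypothetical, chosen to mirror the
# Find / Read Next / Modify / Insert / Delete operations in the text.
from dataclasses import dataclass, field
from typing import Callable, Optional

Record = dict
Condition = Callable[[Record], bool]


@dataclass
class RecordFile:
    records: list = field(default_factory=list)
    current: int = -1  # index of the current record; -1 means "none"

    def find(self, cond: Condition) -> Optional[Record]:
        """Locate the first record that satisfies cond and make it current."""
        self.current = -1
        return self.read_next(cond)

    def read_next(self, cond: Condition) -> Optional[Record]:
        """Search after the current record for the next match.

        A copy is returned (like copying into a program variable),
        so retrieval never changes the file itself.
        """
        for i in range(self.current + 1, len(self.records)):
            if cond(self.records[i]):
                self.current = i
                return dict(self.records[i])
        return None

    def modify(self, **changes) -> None:
        """Change field values of the current record (write-back is implicit here)."""
        self._require_current()
        self.records[self.current].update(changes)

    def insert(self, record: Record) -> None:
        """Insert a new record (appended at the end of the file)."""
        self.records.append(dict(record))

    def delete(self) -> None:
        """Delete the current record; the pointer moves back one position."""
        self._require_current()
        del self.records[self.current]
        self.current -= 1

    def _require_current(self) -> None:
        if self.current < 0:
            raise LookupError("no current record")


# Usage: a small library file.
library = RecordFile()
library.insert({"book": "DBMS", "author": "Korth", "issued": False})
library.insert({"book": "OS", "author": "Stallings", "issued": True})
library.insert({"book": "JAVA", "author": "Balguru", "issued": True})

is_issued = lambda r: r["issued"]
first = library.find(is_issued)        # locates "OS"
second = library.read_next(is_issued)  # locates "JAVA"
library.modify(issued=False)           # JAVA is returned to the library
library.find(lambda r: r["book"] == "DBMS")
library.delete()                       # the DBMS record is removed
print([r["book"] for r in library.records])  # ['OS', 'JAVA']
```

A real file system works on disk blocks and keeps the pointer in the file-access layer. The list-and-index model above only shows the order in which the operations act on records.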
be answered quickly, changes made to the data by different users must be applied consistently, and access to certain parts of the data (e.g., fees) must be restricted.
Maintaining such a large amount of data in a conventional file system is a time-consuming and difficult job, and we probably do not have 30 GB of main memory to hold all the data.
Hence we can deal with this data management problem by storing the data in a
database.
There are different types of database, but the most popular is the relational database, which stores data in tables where each row in a table holds the same sort of information. In the early 1970s, Ted Codd, an IBM researcher, devised the rules of normalization. These rules govern how data is stored and how different tables are related, as we will see in later chapters.
For example, consider the column constraint (i.e. a constraint on a column) “CHECK”. A “CHECK” constraint on a column limits the values allowed in that column.
For example, you could specify the data type of a column as tinyint, which in SQL Server can store values from 0 to 255, and then specify a CHECK constraint that limits the values in that column to 1-99. We will see these constraints in detail in later chapters.
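Here is a minimal sketch of the CHECK idea. It assumes SQLite, used through Python's built-in sqlite3 module; the text itself does not name a DBMS. The table stud_result is hypothetical. SQLite has no tinyint type with a fixed 0 to 255 range, so only the CHECK part of the example is shown. A value outside 1-99 is refused by the DBMS itself, not by application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the CHECK constraint is declared with the column itself.
conn.execute("""
    CREATE TABLE stud_result (
        roll_no INTEGER PRIMARY KEY,
        marks   INTEGER CHECK (marks BETWEEN 1 AND 99)
    )
""")

conn.execute("INSERT INTO stud_result VALUES (1, 75)")  # inside 1-99: accepted

try:
    conn.execute("INSERT INTO stud_result VALUES (2, 150)")  # outside 1-99
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the DBMS refused the row

rows = conn.execute("SELECT roll_no, marks FROM stud_result").fetchall()
print(rejected, rows)  # True [(1, 75)]
```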
A schema describes the organization of data and the relationships within the database. In short, a schema is the logical structure (design) of a database.
Attribute:
Now we will discuss Attributes in more detail.
Domain:
etc., so these are the domains for the attributes stud_marks and stud_division respectively.
Instance:
An instance of an entity is represented by a set of specific values for each of its attributes at a particular point in time.
For example, suppose there is a furniture shop with a variety of furniture. Here furniture is the entity, and the attributes of the furniture entity could be Furniture_type, item_color, item_price etc.
The attributes are the same for all kinds of furniture in the shop, but the values of the attributes differ from instance to instance.
Thus we have (chair, black, 2000 Rupees) as one instance and (bed, brown, 10000 Rupees) as another instance.
These two cases represent the attribute values of two instances of the entity “furniture”.
Record/ Tuple:
Entity = Customer
Cust_name   Cust_id   Cust_add   Balance    (attributes)
Jyoti       A123      Nashik     50000      (a tuple/record)
Domains = cust_name (Jyoti, Sarika, Manisha), cust_id (A123, B125, C222), etc.
Instance = (Jyoti, A123, Nashik, 50000) is one instance at a particular time T1. If Jyoti then withdraws 10000 Rupees from her account, the instance at a later time T2 will be (Jyoti, A123, Nashik, 40000).
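The difference between the fixed structure of a relation and its changing instance can be sketched with SQLite through Python's sqlite3 module. SQLite is our choice of engine, not the text's. The table and column names follow the Customer example, and storing the balance as a whole number of Rupees is an assumption. The CREATE TABLE statement (the schema) runs once. Only the values change between T1 and T2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The schema (structure) of the relation: defined once and not changed below.
conn.execute(
    "CREATE TABLE customer (cust_name TEXT, cust_id TEXT PRIMARY KEY, "
    "cust_add TEXT, balance INTEGER)"
)
conn.execute("INSERT INTO customer VALUES ('Jyoti', 'A123', 'Nashik', 50000)")

query = "SELECT * FROM customer WHERE cust_id = 'A123'"
instance_t1 = conn.execute(query).fetchone()

# Jyoti withdraws 10000 Rupees: only the values change, not the schema.
conn.execute("UPDATE customer SET balance = balance - 10000 WHERE cust_id = 'A123'")
instance_t2 = conn.execute(query).fetchone()

print(instance_t1)  # ('Jyoti', 'A123', 'Nashik', 50000)
print(instance_t2)  # ('Jyoti', 'A123', 'Nashik', 40000)
```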
Summary:
1. Retrieval Operation
2. Update Operation
*********************
Chapter 1
1.0 Overview
In the previous chapter we studied data, files and databases. Now we need to study a software tool that manages and handles the database efficiently.
In this chapter we will study the Database Management System (DBMS), which is a software package designed to store and manage databases. You will also study the different services provided by the DBMS, such as transaction management, concurrency control and recovery management. You will also see the different applications of the DBMS (i.e. where DBMSs are mostly used), such as banking, airlines, universities and telecommunications. After this we will learn the differences between conventional file systems and databases, i.e. why a database is more convenient than a conventional file system.
The next topic is abstraction levels, in which you will study the three levels of database abstraction (i.e. hiding specific data from specific users). Sometimes you may need to hide some data in the database from specific users, both to avoid complexity and to provide security to the database. Hence we use database abstraction levels, which suggest how much data should be hidden from which users. Who the database users are and what their roles are will be covered in the next topic, “database users”.
To write data into the database we must use specific languages that the DBMS can understand. The types of these languages are covered in the next topic, “DDL and DML”. After that we will learn the structure of the database, which describes how the data is stored in the database and how it is managed by the DBMS.
1.1 Definition of DBMS
As we have studied, a database can contain a huge amount of data; in short, it is storage for data. If a database contains a large amount of data, then there should be something to manage or handle it. Just as our mother manages all household work, in a computer we need software or a mechanism that handles all database-related work. Here we are going to study that mechanism, which is nothing but the Database Management System (DBMS).
A Database Management System (DBMS) is a collection of programs (i.e. software) that enables you to store, modify and extract information from a database. A DBMS is software designed to assist in maintaining and utilizing large collections of data in a database.
Apart from this, a DBMS provides the following five services, which a conventional file system does not provide.
1. Transaction Management:
A transaction is a sequence of database operations that represents a logical unit of work (delete a record, update a record, modify a set of records etc.).
The DBMS manages these transactions. When the DBMS performs a “commit”, the changes made by the transaction are made permanent. If you don’t want to make the changes permanent, you can roll back the transaction, and the database remains in its original state. You will study this transaction management process in detail in a later chapter.
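Commit and rollback can be tried out with SQLite through Python's sqlite3 module. The sketch below is an illustration under assumptions, not a description of any particular DBMS. The student table is hypothetical. The connection is opened in autocommit mode, so the BEGIN, COMMIT and ROLLBACK statements are issued explicitly.

```python
import sqlite3

# isolation_level=None: autocommit mode; statements outside an explicit
# BEGIN are committed immediately, so we control transactions ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO student VALUES (13, 'Atul'), (22, 'Manisha')")


def count() -> int:
    return conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]


# Transaction 1: delete a record, then commit; the change becomes permanent.
conn.execute("BEGIN")
conn.execute("DELETE FROM student WHERE roll_no = 13")
conn.execute("COMMIT")
after_commit = count()       # 1

# Transaction 2: delete the remaining record, then roll back.
conn.execute("BEGIN")
conn.execute("DELETE FROM student WHERE roll_no = 22")
inside = count()             # 0, visible inside the transaction
conn.execute("ROLLBACK")
after_rollback = count()     # 1: the database is back in its previous state

print(after_commit, inside, after_rollback)  # 1 0 1
```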
2. Concurrency Control:
In a database, a number of processes or actions (such as deleting a record, updating a record or inserting a new record) may execute simultaneously, i.e. concurrently. This concurrent execution of actions must be managed somehow.
The DBMS provides a concurrency control mechanism that coordinates the actions of the database manipulation processes.
3. Recovery Management:
A simple meaning of recovery is to return to a normal/stable condition. The recovery mechanism in a DBMS ensures that the database is returned to a consistent state after a transaction fails or aborts, i.e. the database should not be left in an inconsistent state.
Inconsistency means that two files may contain different data for the same entity. For example, consider a “student Admission” database with attributes stud_name and stud_address, and another database “student account” with attributes stud_id and stud_address. If the address of a student is changed in the “student Admission” database, it must also be changed in the “student account” database. If it is changed in the “student Admission” database but not in the “student account” database, the data of the student becomes inconsistent. The recovery management service of the DBMS helps maintain consistency in the database.
4. Security Management:
The security mechanism of a DBMS ensures that only authorized users are given access to the data in the database.
5. Language Interface:
To handle data in a database, a user has to use some kind of language that the database understands. That is why a DBMS provides language support, which is used to define and manipulate the data in the database.
Basically, a DBMS uses two languages for interaction between the user and the database: DDL and DML.
The data structures are created using data definition language (DDL) commands.
Data manipulation is done using data manipulation language (DML) commands.
We will see these languages in detail in later topics.
1.4 Drawbacks of File system
Here we will see why a conventional file system is less effective than a database.
1) Data redundancy and inconsistency
In a file processing system, the same data may be duplicated in several files.
For example, suppose there are two files, “Students” and “Library”. The file “Students” contains the Roll No, name, address, telephone number and other details of all students in a college. The file “Library” contains the Roll No and name of those students who borrow books from the library, along with information about the borrowed books. The data of one student thus appears in two files. This is known as data redundancy, and it increases the storage required.
This situation can also result in data inconsistency. Inconsistency means that two files may contain different data about the same student. For example, if the address of a student changes, it must be changed in both files. It is possible that it is changed in the “Students” file but not in the “Library” file; in this case, the data of the student becomes inconsistent.
3) Data Isolation:
Because data are scattered in various files, and the files may be in different formats, writing new application programs to retrieve the required data is difficult.
4) Integrity Problems
The data values may need to satisfy some integrity constraints. For example, consider a “student info” database that contains information about all the students in a college, and suppose it has a field ‘Stud_marks’. The value of this field must be greater than or equal to 0, because a student’s marks can never be less than 0.
If we use a conventional file system, we have to enforce this integrity constraint through program code. In a database, however, we can declare the integrity constraints along with the data definition itself (you will study this in the SQL chapter).
5) Atomicity Problem
It is difficult to ensure atomicity in a file processing system. For example, suppose you are transferring $1000 from Account A to Account B in an online banking system. If a failure occurs during the transaction, $1000 could be deducted from Account A without being credited to Account B.
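A DBMS avoids this situation by treating the debit and the credit as one transaction. The following minimal sketch uses SQLite through Python's sqlite3 module. The account table, the starting balances and the simulated failure are all made up for illustration. When the failure happens after the debit, the whole transaction is rolled back, so neither account changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 5000), ('B', 2000)")
conn.commit()


def transfer(amount: int, fail_midway: bool = False) -> None:
    # Used as a context manager, the connection commits if the block finishes
    # and rolls back if an exception escapes it (the exception is re-raised).
    with conn:
        conn.execute("UPDATE account SET balance = balance - ? WHERE acc_no = 'A'", (amount,))
        if fail_midway:
            raise RuntimeError("simulated failure after debiting A")
        conn.execute("UPDATE account SET balance = balance + ? WHERE acc_no = 'B'", (amount,))


def balances() -> dict:
    return dict(conn.execute("SELECT acc_no, balance FROM account ORDER BY acc_no"))


try:
    transfer(1000, fail_midway=True)
except RuntimeError:
    pass
after_failure = balances()  # {'A': 5000, 'B': 2000}: the debit was undone

transfer(1000)
after_success = balances()  # {'A': 4000, 'B': 3000}
print(after_failure, after_success)
```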
7) Security Problems
Enforcing Security Constraints in file processing system is very difficult.
1.5 Applications of DBMS
Databases are widely used all around the world in different sectors such as:
6. Finance: For storing information about holdings, sales and purchases of financial instruments such as stocks and bonds.
10. Web based services: For collecting web users’ feedback and responses, resource sharing, online shopping etc.
Data Abstraction:
In our day-to-day life we often need to hide specific information from different people for their convenience.
For example, suppose you are interested in purchasing a new car. The car manufacturer hides some details from you to keep things simple for you, such as where the car is manufactured, how many engineers designed it, or who manufactured it. Similarly, a DBMS hides certain data in the database from different users.
So abstraction means hiding specific data from different database users.
A major aim of a DBMS is to provide users with an abstract view of data.
Data abstraction hides certain details of how the data are stored and maintained in the database.
Because most database users are not computer trained, developers hide complexity through several levels of abstraction to simplify users’ interaction with the system.
Three levels of data abstraction are:
Physical level:
This is the lowest level of abstraction, which describes how data are actually stored.
It also describes complex low-level data structures in detail.
The physical schema (physical level) specifies additional storage details for
data in database. Essentially, the physical schema summarizes how the relations
described in the conceptual schema are actually stored on secondary storage
devices such as disks and tapes.
The process of arriving at a good physical schema is called physical
database design.
Logical (conceptual) level:
This level describes what data are stored in the database and what relationships exist among those data.
The conceptual schema (sometimes called the logical schema) describes the
stored data in terms of the data model of the DBMS. In a relational DBMS, the
conceptual schema describes all relations that are stored in the database.
For example, consider a University database that contains relations (tables) such as “student”, “faculty” and “students’ enrollment in courses”. These relations contain information about entities, such as students and faculty, and about relationships, such as students’ enrollment in courses. We can define the following conceptual schema for these relations:
In this example we can say that “Student” contains sid of type string, age of type integer, fname of type string, etc.
1) Application Programmers:
An application programmer is the person responsible for implementing the database functionality required by end users. Application programmers work according to the specification provided by the system analyst.
user interface. They perform all operations by using simple commands (menus and
buttons) provided in the user interface.
The Data Definition Language (DDL) is used to create and destroy databases and database objects. Database administrators primarily use these commands during the setup and removal phases of a database project.
DML is used to change the data in database tables. Its best-known instructions are insert, update and delete.
DML statements are used for managing data within schema objects, i.e. the data within the database.
Some examples:
SELECT - retrieves data from a database
INSERT - inserts data into a table
UPDATE - updates existing data within a table
DELETE - deletes records from a table (all records if no condition is given); the space for the records remains
CALL - calls a PL/SQL or Java subprogram
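Here is a short sketch of DDL and DML statements, run against SQLite through Python's sqlite3 module. SQLite is an assumption here. The statements are standard SQL, but SQLite has no CALL statement or stored subprograms, so CALL is not shown. The book table and its values are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: create a database object (a table).
conn.execute("CREATE TABLE book (book_code TEXT PRIMARY KEY, book_name TEXT, price INTEGER)")

# DML: manage the data inside that object.
conn.executemany(
    "INSERT INTO book VALUES (?, ?, ?)",
    [("A123", "DBMS", 450), ("A223", "OS", 400), ("A144", "JAVA", 350)],
)
conn.execute("UPDATE book SET price = 500 WHERE book_code = 'A123'")
conn.execute("DELETE FROM book WHERE book_code = 'A144'")  # WHERE limits which rows go
rows = conn.execute("SELECT book_code, book_name, price FROM book ORDER BY book_code").fetchall()
print(rows)  # [('A123', 'DBMS', 500), ('A223', 'OS', 400)]

# DDL again: destroy the object.
conn.execute("DROP TABLE book")
```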
Procedural DML:
A low-level or procedural DML allows the user (i.e. the programmer) to specify what data is needed and how to obtain it. This type of DML typically retrieves individual records from the database and processes each one separately.
Low-level DML is used by programmers.
An example of a procedural DML is relational algebra.
Nonprocedural DML:
A high-level or non-procedural DML allows the user to specify what data is
required without specifying how it is to be obtained. Many DBMSs allow high-level
DML statements either to be entered interactively from a terminal or to be
embedded in a general-purpose programming language.
The end-users use a high-level DML to specify their requests to DBMS to
retrieve data. Usually a single statement is given to the DBMS to retrieve or update
multiple records. The DBMS translates a DML statement into a procedure that
manipulates the set of records. The examples of non-procedural DMLs are SQL and
QBE (Query-By-Example) that are used by relational database systems. These
languages are easier to learn and use. The part of a non-procedural DML, which is
related to data retrieval from database, is known as query language.
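The contrast between the two styles can be sketched in Python with SQLite, an illustrative choice. The loop stands in for a low-level, record-at-a-time DML: the program fetches every record and decides for itself which ones qualify. This is an analogy only, not relational algebra. The single SQL statement is the non-procedural version. It states the condition and leaves the method of finding the records to the DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stud_info (id INTEGER PRIMARY KEY, name TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO stud_info VALUES (?, ?, ?)",
    [(1, "Pooja", 89), (2, "Vivek", 90), (3, "Mohini", 79), (4, "Vivek", 86)],
)

# Procedural (record-at-a-time) style: the program states HOW to get the answer,
# fetching every record and testing each one itself.
high_scorers_procedural = []
for _id, name, marks in conn.execute("SELECT id, name, marks FROM stud_info ORDER BY id"):
    if marks >= 85:
        high_scorers_procedural.append(name)

# Non-procedural style: one SQL statement states WHAT is wanted;
# the DBMS decides how to find the matching records.
high_scorers_sql = [
    name
    for (name,) in conn.execute("SELECT name FROM stud_info WHERE marks >= 85 ORDER BY id")
]

print(high_scorers_procedural)  # ['Pooja', 'Vivek', 'Vivek']
print(high_scorers_sql)         # ['Pooja', 'Vivek', 'Vivek']
```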
Figure 1.2 Structure of DBMS
2. DML Compiler and Query optimizer - The DML commands in an application program (such as insert, update, delete and retrieve) are sent to the DML compiler, which compiles them into object code for database access. The query optimizer then optimizes this object code so that the query is executed in the best way, and sends it to the data manager.
3. Data Manager - The Data Manager is the central software component of the DBMS, also known as the Database Control System.
The main functions of the data manager are:
• Converting the operations in users' queries from the user's logical view to the physical file system. These queries come from application programs or from the query processor (the combination of the DML compiler and the query optimizer).
• Controlling access to the DBMS information that is stored on disk.
• Controlling the handling of buffers in main memory.
• Enforcing constraints to maintain the consistency and integrity of the data.
• Synchronizing the simultaneous operations performed by concurrent users.
• Controlling backup and recovery operations.
4. Data Dictionary –
Data Dictionary is a repository of description of data in the database. It contains
information about
• Data - names of the tables, names of attributes of each table, length of attributes,
and number of rows in each table.
• Relationships between database transactions and data items referenced by them,
which is useful in determining which transactions are affected when certain data
definitions are changed.
• Constraints on data i.e. range of values permitted.
• Detailed information on physical database design such as storage structure,
access paths, files and record sizes.
• Access authorization: a description of database users, their responsibilities and their access rights.
• Usage statistics such as the frequency of queries and transactions.
The data dictionary is used to control data integrity, database operation and accuracy, and it is an important part of the DBMS.
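Most DBMSs keep their data dictionary in tables that can be queried like any other data. As an illustration, SQLite records every table in a catalog table named sqlite_master, and PRAGMA table_info lists the columns of a table. SQLite stores only part of what is listed above: names, column types and constraints. It has no user accounts, so there is no access-authorization information, and it keeps no usage statistics there. The stud_info table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE stud_info (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        marks INTEGER CHECK (marks >= 0)
    )
""")

# sqlite_master is SQLite's catalog: one row per table, index, view or trigger,
# holding its type, name and the CREATE statement (so constraints are kept too).
catalog = conn.execute("SELECT type, name, sql FROM sqlite_master").fetchall()
for obj_type, name, sql in catalog:
    print(obj_type, name)
    print(sql)

# PRAGMA table_info gives one row per column:
# (cid, name, declared type, notnull flag, default value, primary-key position).
columns = conn.execute("PRAGMA table_info(stud_info)").fetchall()
for cid, name, col_type, notnull, default, pk in columns:
    print(cid, name, col_type, "NOT NULL" if notnull else "", "PK" if pk else "")
```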
6. Compiled DML - The DML compiler converts high-level queries into low-level file access commands, known as compiled DML.
1.10 Metadata:
Metadata is loosely defined as data about data.
For example: a web page may include metadata specifying what language it's
written in, what tools were used to create it, and where to go for more on the
subject, allowing browsers to automatically improve the experience of users.
For example, a digital image may include metadata that describes how large
the picture is, the color depth, the image resolution, when the image was
created, and other data. A text document's metadata may contain information
about how long the document is, who the author is, when the document was
written, and a short summary of the document.
Metadata is data about data.
As such, metadata can be stored and managed in a database, often called a
registry or repository. However, it is impossible to identify metadata just by
looking at it because a user would not know when data is metadata or just data.
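File systems keep metadata for every file. The sketch below uses Python's standard pathlib module to write a small text document; the file name notes.txt is hypothetical. It then reads both the document's data and some of its metadata: name, size in bytes and last-modified time. An author or a summary, as in the example above, would have to be stored by the application, because the file system does not record it.

```python
import tempfile
from datetime import datetime
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    doc = Path(folder) / "notes.txt"  # hypothetical document name
    doc.write_bytes("Metadata is data about data.\n".encode("utf-8"))

    # The data: what the document itself contains.
    data = doc.read_bytes().decode("utf-8")

    # Metadata: facts ABOUT the document, kept by the file system, not in the text.
    info = doc.stat()
    metadata = {
        "name": doc.name,
        "size_bytes": info.st_size,
        "modified": datetime.fromtimestamp(info.st_mtime).isoformat(timespec="seconds"),
    }

print(repr(data))
print(metadata)
```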
Summary
DBMS is a collection of programs that enables you to store, modify and extract
information from a database.
Applications of DBMS:
Following are the applications of DBMS
-Banking
-Airlines
-Universities
-Credit card transactions
-Telecommunications
-Finance
-Sales
-Manufacturing
-Human Resources
-Web based services
Drawbacks of File system:
-Data redundancy and inconsistency
-Difficulty in Accessing Data
-Data Isolation
-Integrity Problems
-Atomicity Problem
-Concurrent Access anomalies
-Security Problems
Database users:
1. Application Programmers
2. End Users:
I. Naive Users:
II. Sophisticated Users:
3. Database Administrator:
DDL: The Data Definition Language (DDL) is used to create and destroy databases and database objects.
4. …… and …… are two main types of DML.
5. Data Files:
a. It contains the data portion of the database.
b. It contains Metadata
c. It contains DML commands
d. It contains DDL commands
9. Metadata is….
a. Complex data item in database b. Data about data
c. Data about entity d. Hidden data in database.
**********************
Chapter 2
2.0 Overview
In this chapter we are going to study different data models, such as object based data models and record based data models, and their subtypes. A data model (sometimes called a database model) describes how a database is structured and used. Different data models therefore give us different database designs and corresponding uses.
When we design a database we should provide some restrictions and rules for it. These rules and restrictions are known as constraints; in a DBMS they are called integrity constraints. The integrity constraints that we are going to see in this chapter are NOT NULL, entity integrity, referential integrity and unique key constraints.
We will also cover relational algebra. It is a collection of operations that take relations as input and produce new relations as output, and these operations are used to retrieve data from the database.
2.1 Introduction to Database
As we know, a database is basically used to store data, and it can present the data in different views, sometimes in table format and sometimes using graphical structures. In short, a database may have different structures, and each structure presents the data with a different view.
A data model is a collection of concepts that describes the structure of a database. It also provides the necessary means to achieve data abstraction in a database. We have already seen what data abstraction is, and why it is required, in the last chapter.
A data model also defines the method of storing and retrieving data in a database.
Three types of data models are available:
Object based logical model:
This model describes data at the conceptual and view levels, which we have already seen in the “Data Abstraction” topic. Databases designed under this model have a more flexible structure.
By using this model one can specify data constraints explicitly. Data (domain) constraints mean that each column contains the same type of data. Once you select a data type for a particular domain, the DBMS will not accept a value of any other data type.
Object based logical model includes following sub-data models:
For example, suppose there are two relations, “cust_details” and “acc_details”. A third relation, say “cust_acc”, then specifies the relationship between a customer and each account he or she has.
The set of all entities or relationships of the same type is called an entity set or a relationship set, respectively. (We will study these terms in detail in the chapter on E-R diagrams.)
Another essential element of the E-R diagram is the mapping cardinalities, which express the number of entities to which another entity can be associated via a relationship set. This topic will also be covered in the chapter on the Entity Relationship Model.
Now let’s see some basic terms that the Entity Relationship Model uses while representing data in a database.
An E-R diagram can express the overall logical structure of a database graphically, and in the E-R model:
For example, consider an entity “student” which has attributes Roll_no, name, class etc., and another entity “book” which has attributes book_code, book_name, price, author etc. A relationship called “Buy” shows the relationship between these two entities, because a student can buy a book and a book can be bought by a student.
Here ‘M’ indicates many-to-many cardinality, i.e. one student can purchase more than one book, and one book can be purchased by more than one student.
-Object databases have been considered since the early 1980s, but they have made little impact on mainstream commercial data processing, though they are used in some specialized areas.
2. Unlike entities in the E-R model, each object has its own unique identity,
independent of the values it contains:
Record based logical models describe data at the conceptual and view levels.
Unlike object based models, these models are used to:
1. Specify the overall logical structure of the database, and
2. Provide a higher-level description of the implementation.
We are familiar with the E-R model, which is an example of an object based logical model. There are other data models, and one of these is the record based logical model.
There are three types of record based model.
A network model is similar to a hierarchical model in that data and the relationships among them are represented in the form of records and links. However, the hierarchical model organizes records as a collection of trees, whereas the network model organizes them as arbitrary graphs.
In the relational model, each table in a database holds fixed-format records, each with a fixed number of attributes or fields.
A relational model provides the basis for a relational database. A relational model has
three aspects:
Structures
Operations
Integrity rules
Operations are used to manipulate data and structures in a database. When using operations, you must adhere to a predefined set of integrity rules.
Integrity rules are laws that govern the operations allowed on data in a database. This
ensures data accuracy and consistency.
A Foreign key is a column or set of columns that refers to a primary key in the
same table or another table. You use foreign keys to establish connections between, or
within, tables. A foreign key must either match a primary key or else be NULL.
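Here is a minimal sketch of a foreign key, assuming SQLite through Python's sqlite3 module. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON has been issued on the connection. The student and buy tables are simplified, hypothetical versions of the Student/Buy example. As the rule above says, a foreign key value must either match an existing primary key or be NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on

conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE buy (
        roll_no   INTEGER REFERENCES student (roll_no),  -- foreign key
        book_name TEXT
    )
""")
conn.execute("INSERT INTO student VALUES (13, 'Atul')")

conn.execute("INSERT INTO buy VALUES (13, 'DBMS')")  # matches a primary key: accepted
conn.execute("INSERT INTO buy VALUES (NULL, 'OS')")  # NULL foreign key: accepted

try:
    conn.execute("INSERT INTO buy VALUES (99, 'JAVA')")  # there is no student 99
    accepted_99 = True
except sqlite3.IntegrityError:
    accepted_99 = False

print(accepted_99)  # False
```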
Figure: a relation named R, whose columns are the attributes (fields) A1 … An and whose cells hold values
Example: The E-R diagram shown in Figure 2.1 can be represented in the relational model as shown in Figure 2.5 below.
“Student”
Roll_no   Name
13        Atul
22        Manisha
23        Shital

“Book”
Book_name   Book_code   Author      Price
DBMS        A123        Korth       450
OS          A223        Stallings   400
JAVA        A144        Balguru     350

“Buy”
Roll_no   Book_name
13        DBMS
22        OS
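As a sketch, the three relations above can be created and queried with SQLite through Python's sqlite3 module. One assumption is needed: because “Buy” refers to a book by its Book_name, that column is declared UNIQUE in the book table. The join at the end follows the many-to-many relationship from students, through Buy, to books.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (book_code TEXT PRIMARY KEY, book_name TEXT UNIQUE,
                       author TEXT, price INTEGER);
    -- "Buy" turns the many-to-many relationship into a table of its own.
    CREATE TABLE buy (roll_no INTEGER REFERENCES student (roll_no),
                      book_name TEXT REFERENCES book (book_name));

    INSERT INTO student VALUES (13, 'Atul'), (22, 'Manisha'), (23, 'Shital');
    INSERT INTO book VALUES ('A123', 'DBMS', 'Korth', 450),
                            ('A223', 'OS', 'Stallings', 400),
                            ('A144', 'JAVA', 'Balguru', 350);
    INSERT INTO buy VALUES (13, 'DBMS'), (22, 'OS');
""")

# Follow the relationship: which student bought which book, at what price?
rows = conn.execute("""
    SELECT s.name, b.book_name, b.price
    FROM buy AS y
    JOIN student AS s ON s.roll_no = y.roll_no
    JOIN book    AS b ON b.book_name = y.book_name
    ORDER BY s.roll_no
""").fetchall()
print(rows)  # [('Atul', 'DBMS', 450), ('Manisha', 'OS', 400)]
```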
Name    Street     City     Number
Jiten   Maple      Mumbai   990
Jolly   West       Nashik   336
Sudha   Sidehill   Pune     800
Jolly   West       Nashik   644
Sudha   Sidehill   Pune     644

Number   Balance
990      10000
336      25000
644      15000
800      40000
type acc_detail = record
    account-number: string;
    balance: integer;
end

account-number   balance
990              10000
644              15000
For example, the data type used for a column may differ between MySQL and SQL Server.
This model is used to describe data at the lowest level. Only a few such models
exist, e.g. the unifying model and the frame-memory model.
Entity integrity and referential integrity constraints are identified as the two minimum
constraints that must be enforced by the DBMS.
By default, all columns in a relation allow null values. Null means the absence
of a value. A NOT NULL constraint requires that a column of a table contain no null
values. For example, consider the relation “Stud_info” below, in which a NOT NULL
constraint is applied on the column “Marks”; hence no row may contain a NULL value in
this column. The NOT NULL constraint is not applied on “ID”, hence any row may
contain a NULL value in that column.
“Stud_info”
ID    Name    Marks
1     Pooja   89
2     Vivek   90
NULL  Mohini  79
4     Vivek   86
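A minimal sketch of the NOT NULL rule, assuming Python's built-in sqlite3 module and an in-memory SQLite database (SQLite is used only because it needs no server; other DBMSs word their errors differently). The table follows “Stud_info”; the row for Ravi is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is declared on Marks only; ID stays nullable, as in "Stud_info".
conn.execute("CREATE TABLE Stud_info (ID INTEGER, Name TEXT, Marks INTEGER NOT NULL)")
conn.execute("INSERT INTO Stud_info VALUES (1, 'Pooja', 89)")
conn.execute("INSERT INTO Stud_info VALUES (NULL, 'Mohini', 79)")  # accepted: ID may be NULL

try:
    conn.execute("INSERT INTO Stud_info VALUES (5, 'Ravi', NULL)")  # rejected: Marks is NOT NULL
    error = None
except sqlite3.IntegrityError as exc:
    error = str(exc)

rows = conn.execute("SELECT ID, Name, Marks FROM Stud_info ORDER BY Name").fetchall()
print(rows)   # [(None, 'Mohini', 79), (1, 'Pooja', 89)]
print(error)  # NOT NULL constraint failed: Stud_info.Marks
```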
The entity integrity constraint states that no primary key value can be null. This is
because the primary key value is used to identify individual tuples in a relation; a
null primary key would mean that some tuples cannot be identified. For the same reason,
primary key values must also be unique.
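A relation that violates entity integrity (one ID is NULL and another is repeated):
ID    Name    Marks
1     Pooja   89
2     Vivek   90
NULL  Mohini  79
2     Vivek   86

The same data with valid primary key values:
ID    Name    Marks
1     Pooja   89
2     Vivek   90
3     Mohini  79
4     Vivek   86
Figure: 2.11 example relation “stud_info1”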
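A sketch of entity integrity in SQLite, assuming Python's sqlite3 module. One SQLite-specific detail matters here: SQLite accepts NULL in most PRIMARY KEY columns unless NOT NULL is also declared (a documented historical quirk), so the sketch declares it explicitly to get standard behaviour:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out because of the SQLite quirk described above.
# INT (not INTEGER) avoids SQLite's special rowid handling of "INTEGER PRIMARY KEY".
conn.execute("CREATE TABLE stud_info1 (ID INT NOT NULL PRIMARY KEY, Name TEXT, Marks INT)")

def try_insert(row):
    """Insert one (ID, Name, Marks) tuple; return None or the error message."""
    try:
        conn.execute("INSERT INTO stud_info1 VALUES (?, ?, ?)", row)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)

results = [try_insert(r) for r in [(1, "Pooja", 89), (2, "Vivek", 90),
                                   (None, "Mohini", 79),  # NULL key: rejected
                                   (2, "Vivek", 86)]]     # duplicate key: rejected
for r in results:
    print(r)
# None
# None
# NOT NULL constraint failed: stud_info1.ID
# UNIQUE constraint failed: stud_info1.ID
```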
Referential Integrity or foreign key constraint
Often we wish to ensure that a value appearing in a relation for a given set of
attributes also appears for a certain set of attributes in another relation. This is
called referential integrity.
Different tables in a relational database can be related by common columns,
and the rules that govern the relationship of the columns must be maintained.
Referential integrity rules guarantee that these relationships are preserved.
This constraint asserts that a reference in one data item indeed leads to another
data item. A foreign key is a field (or set of fields) whose values refer to the primary key of another table.
Enforcing referential integrity means:
1) Not inserting a record if the value of its foreign key does not match the primary
key of an existing record in the related table.
2) Not deleting a record whose primary key value is referenced by a foreign key in
another table, and
3) Not modifying a primary key value that is referenced by a foreign key.
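The three rules can be checked with a small sketch, assuming Python's sqlite3 module. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`; the “Course” column names and the row for student 9 are assumptions made for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on
conn.execute("CREATE TABLE Stud (Stud_id INTEGER PRIMARY KEY, Stud_name TEXT)")
conn.execute("""CREATE TABLE Course (
                  Stud_id     INTEGER REFERENCES Stud (Stud_id),
                  Course_id   TEXT PRIMARY KEY,
                  Course_name TEXT)""")
conn.executemany("INSERT INTO Stud VALUES (?, ?)",
                 [(1, "Pooja"), (2, "Jaya"), (3, "Akash"), (4, "Viren")])
conn.execute("INSERT INTO Course VALUES (4, 'A2', '.Net')")

def attempt(sql):
    """Run one statement; return 'ok' or the integrity error it raised."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)

rule1 = attempt("INSERT INTO Course VALUES (9, 'A9', 'C')")        # no student 9
rule2 = attempt("DELETE FROM Stud WHERE Stud_id = 4")              # Viren still referenced
rule3 = attempt("UPDATE Stud SET Stud_id = 40 WHERE Stud_id = 4")  # referenced key changed
print(rule1, rule2, rule3, sep="\n")  # each: FOREIGN KEY constraint failed
```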
Consider the example below of a database that has not enforced referential
integrity.
“Stud”
Stud_id  Stud_name
1        Pooja
2        Jaya
3        Akash
4        Viren      <- this record is deleted from “Stud”

“Course”
Stud_id  Course_id  Course_name
3        A1         Java
4        A2         .Net
3        A3         PHP
In this example, there is a foreign key (Stud_id) value ‘4’ in the “Course” table
that references a non-existent student; in other words, there is a foreign key value
with no corresponding primary key value in the referenced table. This anomaly arose
when the record for the student "Viren", with a Stud_id of 4, was deleted from the
“Stud” table even though the course ".Net" referred to this student. If referential
integrity had been enforced, either the deletion of the record (4, Viren) from “Stud”
would have been refused (rule 2 above) or, if a cascading delete had been defined, its
associated record in “Course” would have been deleted as well.
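The cascading alternative can be sketched with SQLite's `ON DELETE CASCADE`, assuming Python's sqlite3 module (the Course_name column name is an assumption, as above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Stud (Stud_id INTEGER PRIMARY KEY, Stud_name TEXT)")
# ON DELETE CASCADE: deleting a student also deletes the courses that refer to it.
conn.execute("""CREATE TABLE Course (
                  Stud_id     INTEGER REFERENCES Stud (Stud_id) ON DELETE CASCADE,
                  Course_id   TEXT PRIMARY KEY,
                  Course_name TEXT)""")
conn.executemany("INSERT INTO Stud VALUES (?, ?)",
                 [(1, "Pooja"), (2, "Jaya"), (3, "Akash"), (4, "Viren")])
conn.executemany("INSERT INTO Course VALUES (?, ?, ?)",
                 [(3, "A1", "Java"), (4, "A2", ".Net"), (3, "A3", "PHP")])

conn.execute("DELETE FROM Stud WHERE Stud_id = 4")  # Viren's .Net row goes with him
remaining = conn.execute("SELECT * FROM Course ORDER BY Course_id").fetchall()
print(remaining)  # [(3, 'A1', 'Java'), (3, 'A3', 'PHP')]
```

Unlike the anomaly above, no foreign key value is left pointing at a missing student.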
A UNIQUE key integrity constraint states that every value in a column or set
of columns (key) is unique—that is, no two rows of a table have duplicate values in a
specified column or set of columns.
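For example, consider the “Course” relation below, in which a UNIQUE constraint is applied on the column “Course_id”:
Stud_id  Course_id  Course_name
1        A1         Java
2        A2         .Net
3        A3         PHP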
Suppose we insert a new row (4, A3, PHP). This row violates the UNIQUE
key constraint because “A3” is already present in another row; therefore this insertion
is not allowed in the course relation.
Now suppose we insert the row (4, NULL, PHP). This row is allowed because a
NULL value is entered for the “Course_id” column. However, if a NOT NULL constraint is
also applied on the “Course_id” column, this row is not allowed.
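A sketch of both cases, assuming Python's sqlite3 module. SQLite, like most DBMSs, does not treat NULLs as duplicates of each other in a UNIQUE column (SQL Server is an exception and allows only one NULL); the second table with NOT NULL is a hypothetical variant of the same relation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (Stud_id INT, Course_id TEXT UNIQUE, Course_name TEXT)")
# Hypothetical variant: Course_id is additionally NOT NULL.
conn.execute("CREATE TABLE course_nn (Stud_id INT, Course_id TEXT UNIQUE NOT NULL, Course_name TEXT)")
for table in ("course", "course_nn"):
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)",
                     [(1, "A1", "Java"), (2, "A2", ".Net"), (3, "A3", "PHP")])

def insert(table, row):
    """Insert one row; return 'ok' or the integrity error message."""
    try:
        conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)

duplicate = insert("course", (4, "A3", "PHP"))     # A3 already present
null_ok = insert("course", (4, None, "PHP"))       # NULL is not a duplicate
null_nn = insert("course_nn", (4, None, "PHP"))    # blocked by NOT NULL
print(duplicate)  # UNIQUE constraint failed: course.Course_id
print(null_ok)    # ok
print(null_nn)    # NOT NULL constraint failed: course_nn.Course_id
```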
Id Name Rating Age
28 John 9 40
55 Smith 10 35
44 Angel 7 25
50 Grek 9 35
Figure 2.16 Instance S2
In the selection expression σrating>8(S2), the subscript rating > 8 specifies the
selection criterion to be applied while retrieving tuples.
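Selection can be sketched in plain Python by treating a relation as a set of tuples. This is an illustration of the algebra, not how a DBMS implements it; the field position constant is an assumption of the sketch:

```python
# Instance S2 (Figure 2.16) as a set of (Id, Name, Rating, Age) tuples.
S2 = {(28, "John", 9, 40), (55, "Smith", 10, 35), (44, "Angel", 7, 25), (50, "Grek", 9, 35)}
RATING = 2  # position of the Rating field in each tuple

def select(relation, predicate):
    """Selection: the tuples of `relation` for which `predicate` is true."""
    return {t for t in relation if predicate(t)}

result = select(S2, lambda t: t[RATING] > 8)  # sigma_{rating > 8}(S2)
print(sorted(result))
# [(28, 'John', 9, 40), (50, 'Grek', 9, 35), (55, 'Smith', 10, 35)]
```

The result keeps the schema of S2; only tuples are filtered out.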
The projection operator π allows us to extract desired columns from a
relation; the remaining columns are left out.
We list the column names (attributes) that we wish to appear in the
result as a subscript to π.
For example, we can find out all sailor names and ratings by using π.
The expression
πname, rating(S2)
evaluates to the relation shown in Figure 2.18. The subscript name, rating
specifies the fields to be retained; the other fields are ‘projected out.’
Figure 2.18 πname,rating(S2)
The schema of the result of a projection is determined by the fields that are
projected in the obvious way.
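Suppose that we want to find out only the ages of sailors.
The expression
πage(S2)
returns the distinct ages in S2, namely 40, 35 and 25. Because the result of a
projection is a set, the age 35, which appears in two tuples of S2, appears only once.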
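Projection, including the removal of duplicate tuples, can be sketched the same way; the FIELDS tuple naming S2's schema is an assumption of this sketch:

```python
S2 = {(28, "John", 9, 40), (55, "Smith", 10, 35), (44, "Angel", 7, 25), (50, "Grek", 9, 35)}
FIELDS = ("Id", "Name", "Rating", "Age")  # schema of S2, in order

def project(relation, *wanted):
    """Projection: keep only the wanted fields. Building a set drops duplicates."""
    positions = [FIELDS.index(name) for name in wanted]
    return {tuple(t[p] for p in positions) for t in relation}

names_ratings = project(S2, "Name", "Rating")  # pi_{name, rating}(S2)
ages = project(S2, "Age")                      # pi_{age}(S2)
print(sorted(names_ratings))  # [('Angel', 7), ('Grek', 9), ('John', 9), ('Smith', 10)]
print(sorted(ages))           # [(25,), (35,), (40,)]
```

Four tuples go in, but πage(S2) has only three, because the two tuples with age 35 collapse into one.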
The following standard operations on sets are also available in relational algebra:
Union ( U ),
intersection (∩),
set-difference (−),
cross-product (×).
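Union:
R U S returns a relation instance containing all tuples that occur in either relation
instance R or relation instance S (or both). R and S must be union-compatible, and the
schema of the result is defined to be identical to the schema of R. Two relation
instances are union-compatible if:
– They have the same number of fields (attributes are referred to here as
fields), and
– Corresponding fields, taken in order from left to right, have the same
domains.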
Example:
The union of S1 and S2 is shown in Figure 2.20.
The fields are listed in order, and the field names are inherited from S1.
S2 has the same field names, of course, since it is also an instance of
Sailors.
In general, the fields of S2 may have different names; recall that we
require only the domains to match.
Note that the result is a set of tuples. Tuples that appear in both S1 and
S2 appear only once in S1 U S2.
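Intersection:
R ∩ S returns a relation instance containing all tuples that occur in both R and S.
The relations R and S must be union-compatible, and the schema of the result is
defined to be identical to the schema of R.
Example:
The intersection of S1 and S2 is shown in Figure 2.21.
Figure 2.21 S1 ∩ S2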
Set-difference:
R−S returns a relation instance containing all tuples that occur
in R but not in S.
The relations R and S must be union-compatible, and the
schema of the result is defined to be identical to the schema of R.
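Python's set operators map directly onto these three operations. In this sketch S1 is a hypothetical instance invented here (the S1 figure is not reproduced in this section), and "same domain" is approximated by "same Python type", which is an assumption:

```python
# S2 is Figure 2.16; S1 is a hypothetical instance of the same schema.
S1 = {(22, "Dustin", 7, 45), (28, "John", 9, 40), (58, "Rusty", 10, 35)}
S2 = {(28, "John", 9, 40), (55, "Smith", 10, 35), (44, "Angel", 7, 25), (50, "Grek", 9, 35)}

def field_types(relation):
    """Types of the fields, left to right (assumes all tuples have the same shape)."""
    return tuple(type(v) for v in next(iter(relation)))

def union_compatible(r, s):
    """Same number of fields, and corresponding fields have the same type."""
    return field_types(r) == field_types(s)

assert union_compatible(S1, S2)
union = S1 | S2         # tuples in S1 or S2; John's tuple appears only once
intersection = S1 & S2  # tuples in both
difference = S1 - S2    # tuples in S1 but not in S2
print(len(union), sorted(intersection), sorted(difference))
# 6 [(28, 'John', 9, 40)] [(22, 'Dustin', 7, 45), (58, 'Rusty', 10, 35)]
```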
Cross-product:
S1×S2 returns a relation instance whose schema contains all the fields of S1
(in the same order as they appear in S1) followed by all the fields of S2 (in the
same order as they appear in S2).
The cross-product operation is sometimes called Cartesian product.
We will use the convention that the fields of S1×S2 inherit names from the
corresponding fields of S1 and S2.
It is possible for both S1 and S2 to contain one or more fields with the same
name; this situation creates a naming conflict.
In that case, the corresponding fields in S1×S2 are unnamed and are referred to
solely by position.
Because R1 and S1 both have a field named id, by our convention on field names
the corresponding two fields in S1×R1 are unnamed and are referred to solely by the
position in which they appear in Figure 2.24.
The fields in S1×R1 have the same domains as the corresponding fields in R1 and
S1.
In Figure 2.24, id is listed in parentheses to emphasize that it is not an inherited
field name; only the corresponding domain is inherited.
Figure 2.24 S1 × R1
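A sketch of the cross-product using `itertools.product`. Both S1 and R1 here are hypothetical instances invented for the illustration; they share a field named id, so the two id values can only be told apart by position:

```python
from itertools import product

# Hypothetical instances: S1(id, name, rating, age) and R1(id, bid, day).
S1 = {(22, "Dustin", 7, 45), (58, "Rusty", 10, 35)}
R1 = {(22, 101, "10/10/96"), (58, 103, "11/12/96")}

def cross_product(r, s):
    """Each tuple of r followed by each tuple of s (fields of r come first)."""
    return {a + b for a, b in product(r, s)}

S1xR1 = cross_product(S1, R1)
print(len(S1xR1))  # 4, i.e. 2 x 2 tuples
first = min(S1xR1)
print(first)       # (22, 'Dustin', 7, 45, 22, 101, '10/10/96')
# The two id values sit at positions 0 and 4: referenced by position, not by name.
print(first[0], first[4])  # 22 22
```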
2.4.3 Joins
The join operation is one of the most useful operations in relational algebra
and is the most commonly used way to combine information from two or more
relations.
The join operation, as the name suggests, combines two
relations to form a single new relation.
It is denoted by the symbol ⋈.
The join is very important for any relational database with more than a single
relation, because it allows us to process relationships among relations.
In its simplest form, the JOIN operator is just the cross product of the two
relations.
As the join becomes more complex, tuples are removed from the cross
product to make the result of the join more meaningful.
JOIN allows you to evaluate a join condition between the attributes of the
relations on which the join is undertaken:
R ⋈c S, where the subscript c is the join condition
The difference between the JOIN operation and the CARTESIAN PRODUCT is
that only the combinations of tuples satisfying the join condition (for example,
being equal on their common attribute names) appear in the result of a JOIN,
whereas the CARTESIAN PRODUCT includes every combination of tuples in its
result.
Natural join:
Natural join (⋈) is a binary operator that is written as (R ⋈ S),
where R and S are relations.
The result of the natural join is the set of all combinations of tuples
in R and S that are equal on their common attribute names.
For example, consider the tables Employee and Department and their natural join:
Name     Id   Dept
Archana  123  Finance
Pramod   124  Sales
Pooja    125  Finance
Vivek    126  Sales
Figure 2.25: Relation for “Employee”

Dept        Manager
Finance     Shah
Sales       Verma
Production  Mitali
Figure 2.26: Relation for “Department”

The relation “Employee” contains the name, id, and department of each employee,
and the relation “Department” contains the names of departments and the name of the
manager of each department. Performing the natural join on these two relations gives
the relation shown in Figure 2.27, Employee ⋈ Department.
In the “Employee” relation, the first record contains the name “Archana”, whose id is
123 and who belongs to the department “Finance”. In the relation “Department”, the
first entry (i.e. dept name) is Finance, and the manager of Finance is Shah. Hence, in
Employee ⋈ Department we have Archana, whose id is 123 and dept is Finance, and
her manager is Shah.
In practice the JOIN most often involves an equality test, and thus is often described
as an equi-join. Such joins result in two attributes in the resulting relation having
exactly the same value.
A 'natural join' removes the duplicate attribute(s).
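The natural join of Figures 2.25 and 2.26 can be sketched in Python as "cross product, keep tuples that agree on the common attributes, drop the repeated column". Pairing each relation with a tuple of attribute names is an assumption of this sketch:

```python
from itertools import product

# Relations from Figures 2.25 and 2.26: (attribute names, set of tuples).
EMP_ATTRS, EMPLOYEE = ("Name", "Id", "Dept"), {
    ("Archana", 123, "Finance"), ("Pramod", 124, "Sales"),
    ("Pooja", 125, "Finance"), ("Vivek", 126, "Sales")}
DEPT_ATTRS, DEPARTMENT = ("Dept", "Manager"), {
    ("Finance", "Shah"), ("Sales", "Verma"), ("Production", "Mitali")}

def natural_join(r_attrs, r, s_attrs, s):
    """Combine tuples that agree on all common attributes; keep one copy of each."""
    common = [a for a in r_attrs if a in s_attrs]
    extra = [a for a in s_attrs if a not in common]  # attributes of s not already in r
    rows = set()
    for t, u in product(r, s):  # start from the cross product (4 x 3 = 12 pairs)...
        if all(t[r_attrs.index(a)] == u[s_attrs.index(a)] for a in common):  # ...keep matches
            rows.add(t + tuple(u[s_attrs.index(a)] for a in extra))
    return tuple(r_attrs) + tuple(extra), rows

attrs, joined = natural_join(EMP_ATTRS, EMPLOYEE, DEPT_ATTRS, DEPARTMENT)
print(attrs)  # ('Name', 'Id', 'Dept', 'Manager')
for row in sorted(joined):
    print(row)
```

Of the 12 combinations in the cross product only 4 survive, and Production disappears because no employee works in it.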
There are different types of join operations, such as outer join and inner join, which
we will cover in the chapter “SQL”.
Summary:
2. Record-based Logical Models: Record-based logical models describe data at the
conceptual and view levels. The data models that come under this model are-
i. The Relational Model
ii. The Network Model
iii. The Hierarchical Model
3. Physical Data Models: This model is used to describe data at the lowest level.
Integrity constraints
An integrity constraint is a condition that is enforced automatically by the DBMS and
that prevents invalid data from being stored in the database.
4. Unique Key Constraint: A UNIQUE key integrity constraint states that every
value in a column or set of columns (key) is unique—that is, no two rows of a
table have duplicate values in a specified column or set of columns
column from a table. That means the select operation selects tuples that
satisfy a given predicate (condition).
Projection Operation: The projection operator π allows us to extract
desired columns from a relation and remaining columns are left out.
Set Operations
Union ( U ),
Intersection (∩),
Set-difference (−),
Cross-product (×).
Joins: The join operation is the most commonly used way to combine information
from two or more relations.
4. The number of entities to which another entity can be associated via a relationship
set is called...
a. The mapping cardinalities
b. The entity Relationship
c. Integrity Constraint
d. None of the above
7. Selection Operation in relational algebra is denoted by a symbol
a. σ
b. π
c. >
d. <!
a. Union
b. Intersection
c. Cross Product
d. Join
a. ∑
b. ∞
c. ∆
d.
10. S1 ∩ S2 returns-
a. A relation instance containing all tuples that are present in both S1 and S2.
b. A relation instance containing all tuples that are not present in S1.
c. A relation instance containing all tuples that are not present in S2.
d. A relation instance containing all tuples that are not present in both S1 and S2.
******************
Chapter 3
Overview: -
In any organization we need to maintain the organization's data in a
systematic manner so that it can be maintained easily; SQL (Structured Query
Language) serves this purpose. Managing, updating, accessing and performing various
other operations on databases requires a language that supports all of this
functionality, and SQL was developed for it. Fundamentally, SQL is a collection of a
number of different elements: queries, statements, expressions, predicates, and
clauses. By using different statements, expressions, constraints and queries we can
easily preserve data and perform operations on it, effectively and simply.
In this chapter we learn how to retrieve, modify and delete data; how to create
and delete tables; how to define constraints on tables and how constraints help
restrict data; how joins of multiple tables are useful; and so on. We will see how SQL
helps in managing and accessing data efficiently. Let’s start with an introduction to SQL.
3.0. Introduction: -
Structured Query Language (SQL) is standardized by the American National
Standards Institute (ANSI) and the International Organization for Standardization
(ISO). SQL consists of queries, expressions and statements, which are required for
giving instructions to the database. Let’s see how SQL works.
Example: -Suppose you want to filter rows, and for that purpose you use a condition.
All rows satisfying the condition are retrieved in a single step: you state which rows
you want, not how to find them. SQL statements are processed by the optimizer, a
component of the database management system that considers the possible query plans
for a given input query and tries to decide which of those plans is best for accessing
the specified data.
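The optimizer's choice can be observed with SQLite's `EXPLAIN QUERY PLAN`, assuming Python's sqlite3 module. The 1000 generated rows and the index name are made up for this sketch, and the exact wording of the plan text varies between SQLite versions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (RollNo INTEGER, Studname TEXT, Class TEXT)")
conn.executemany("INSERT INTO Student VALUES (?, ?, ?)",
                 [(i, f"student{i}", "FY") for i in range(1000)])  # made-up rows

def plan(sql):
    """Return the optimizer's chosen plan (the 'detail' text of each plan row)."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT * FROM Student WHERE Studname = 'paresh'"
before = plan(query)  # no index: the only plan is a full table scan
conn.execute("CREATE INDEX idx_studname ON Student (Studname)")
after = plan(query)   # same query text, but now the optimizer can use the index
print(before)  # e.g. ['SCAN Student']
print(after)   # e.g. ['SEARCH Student USING INDEX idx_studname (Studname=?)']
```

The query itself never mentions the index; choosing to use it is entirely the optimizer's decision.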
Definition: - A data type is a set of data values having predefined characteristics;
it determines what kind of value a column can hold. SQL provides a wide range of data
types. By using different data types we can store data in different forms, including
character strings, numbers, binary data (files) and dates.
Example: -For the name of a student, we may use a string data type for the student’s first
and last name.
The following table shows data types with their syntax and explanation.
Data type   Syntax        Size / limit           Description
char(n)     char(n)       Max 8,000 characters   Fixed-length character string.
varchar(n)  varchar(n)    Max 8,000 characters   Variable-length character string.
text        text          Max 2 GB of text data  Variable-length character string.
bit         bit(x)        -                      Allows 0, 1, or NULL.
date        date          3 bytes                Stores year, month, and day values.
datetime    datetime      8 bytes                Stores date and time in numeric format, e.g. 2007-10-08 12:35:30.123
time        time          3-5 bytes              Stores the hour, minute, and second values.
Image       Binary value  Max 2 GB               Image variables store up to 2 gigabytes of data and can hold any type of data file (not just images).
Currency    Currency      8 bytes                Used for currency values.
AutoNumber  AutoNumber    4 bytes                AutoNumber fields automatically give each record its own number, usually starting at 1.
OLE Object  OLE Object    Up to 1 GB             Can store pictures, audio, video, or other BLOBs (Binary Large Objects).
Note: Currency, AutoNumber and OLE Object are Microsoft Access data types; the other
types listed are SQL Server data types.
Syntax:
Select A1, A2…An
From R1…Rn
Where C
In the select clause we can use the asterisk (*) to select all columns of the tables.
Example: -
SELECT RollNo, Studname, Class
FROM Student WHERE Studname='paresh'
The above query returns only those rows where the student name is paresh.
In general, attributes are referenced in the form R.A, which means attribute A of relation R.
3.4. SELECT: -
It is used to retrieve data from an SQL database. This process is also known as
a query. The asterisk (*) can be used to SELECT all columns of a table. The SQL
SELECT command is used as follows:
SELECT RollNo, Name FROM Student
RollNo Name
101 Sunita
201 Ramesh
202 Poonam
203 Trupta
Similarly, if we use the asterisk (*) in place of the column names, the query
retrieves all columns of all rows from the Student table.
4 Ravi Pune Station Pune 1000
5 Nayana Nashik Road Nashik 2600
6 Sachin Juhu Mumbai 2600
City
Nashik
Pune
Mumbai
Operator Description
= Equal
<> Not Equal
> Greater Than
< Less Than
>= Greater Than Or Equal
<= Less Than Or Equal
Example: -Select all rows from the Student table given in Figure. 3.4 where the student
name is Trupta.
Select * From Student Where Name='Trupta'
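A sketch of column selection, `*`, and an equality condition, assuming Python's sqlite3 module and the four rows of the Student table shown above. `ORDER BY` is added only because SQL does not otherwise guarantee row order; the last query shows why the string literal must not contain a stray leading space:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (RollNo INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany("INSERT INTO Student VALUES (?, ?)",
                 [(101, "Sunita"), (201, "Ramesh"), (202, "Poonam"), (203, "Trupta")])

names = conn.execute("SELECT Name FROM Student ORDER BY RollNo").fetchall()         # one column
every_column = conn.execute("SELECT * FROM Student WHERE Name = 'Trupta'").fetchall()
padded = conn.execute("SELECT * FROM Student WHERE Name = ' Trupta'").fetchall()
print(names)         # [('Sunita',), ('Ramesh',), ('Poonam',), ('Trupta',)]
print(every_column)  # [(203, 'Trupta')]
print(padded)        # [] because ' Trupta' (leading space) is a different string
```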
3.4.3. Where Clause with Logical Operator: -
AND, OR, IN, BETWEEN and LIKE are logical operators which we can use with the
Where clause.
1. And, OR Operator:
In Where we can specify a condition; to specify multiple criteria or multiple
conditions we use the And, Or, In, Between and Like operators.
With the AND operator, both conditions must be satisfied: if either condition is
false for a row, that row is not retrieved.
With the OR operator, a row is retrieved if at least one of the conditions is
satisfied for it.
Example: -Select all rows from the Student table given in Figure. 3.4 where RollNo is
greater than or equal to 101 and less than 202.
Query: -
Select * From Students
Where RollNo>=101 and RollNo<202
RollNo Name
101 Sunita
201 Ramesh
Select all rows from the Student table given in Figure. 3.4 where Name is Sunita
or Poonam.
Query: -
Select * From Students
Where Name='Sunita' Or Name='Poonam'
RollNo Name
101 Sunita
202 Poonam
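Both examples can be run with a small sketch, assuming Python's sqlite3 module and the Student rows from Figure 3.4:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (RollNo INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany("INSERT INTO Students VALUES (?, ?)",
                 [(101, "Sunita"), (201, "Ramesh"), (202, "Poonam"), (203, "Trupta")])

# AND: a row is kept only when both conditions are true for that row.
both = conn.execute("SELECT * FROM Students WHERE RollNo >= 101 AND RollNo < 202 "
                    "ORDER BY RollNo").fetchall()
# OR: a row is kept when at least one condition is true for that row.
either = conn.execute("SELECT * FROM Students WHERE Name = 'Sunita' OR Name = 'Poonam' "
                      "ORDER BY RollNo").fetchall()
print(both)    # [(101, 'Sunita'), (201, 'Ramesh')]
print(either)  # [(101, 'Sunita'), (202, 'Poonam')]
```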
2. In Operator: -
The syntax for using the IN keyword is as follows: -
SELECT column_name(s) FROM table_name
WHERE column_name IN (value1, value2, ...)
You can use one or more values in the parentheses in the where clause of a select
query. Each value is separated by a comma. We can use numeric or character values.
If only one value is present inside the parentheses, the condition is equivalent to
checking column_name = value.
Example: -Select all rows from the Student table given in Figure. 3.4 where Name is
Sunita or Poonam.
Query: -
Select * From Students Where Name In ('Sunita', 'Poonam')
RollNo Name
101 Sunita
202 Poonam
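A sketch, assuming Python's sqlite3 module, confirming that IN is shorthand for a chain of OR conditions and that a one-value list is the same as `=`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (RollNo INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany("INSERT INTO Students VALUES (?, ?)",
                 [(101, "Sunita"), (201, "Ramesh"), (202, "Poonam"), (203, "Trupta")])

def rows(sql):
    return conn.execute(sql).fetchall()

in_list = rows("SELECT * FROM Students WHERE Name IN ('Sunita', 'Poonam') ORDER BY RollNo")
or_form = rows("SELECT * FROM Students WHERE Name = 'Sunita' OR Name = 'Poonam' ORDER BY RollNo")
single = rows("SELECT * FROM Students WHERE Name IN ('Ramesh')")  # same as Name = 'Ramesh'
print(in_list)             # [(101, 'Sunita'), (202, 'Poonam')]
print(in_list == or_form)  # True
print(single)              # [(201, 'Ramesh')]
```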
3. BETWEEN Operator: -
The IN operator selects rows matching one or more distinct values; if we want to
select values from a range, we use the BETWEEN operator. It checks whether a value
lies within the range we specify; both end values are included.
SELECT column_name(s) FROM table_name
WHERE column_name BETWEEN value1 AND value2
This will select all rows whose column has a value between 'value1' and
'value2'.
Example: - Select all rows from the Student table given in Figure. 3.4 where RollNo is
between 101 and 202.
Query: -
Select * From Students Where RollNo Between 101 And 202
RollNo Name
101 Sunita
201 Ramesh
202 Poonam
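A sketch, assuming Python's sqlite3 module, showing that both end values of BETWEEN are included and what NOT BETWEEN leaves:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (RollNo INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany("INSERT INTO Students VALUES (?, ?)",
                 [(101, "Sunita"), (201, "Ramesh"), (202, "Poonam"), (203, "Trupta")])

def roll_numbers(sql):
    return [row[0] for row in conn.execute(sql)]

inside = roll_numbers("SELECT RollNo FROM Students WHERE RollNo BETWEEN 101 AND 202 ORDER BY RollNo")
outside = roll_numbers("SELECT RollNo FROM Students WHERE RollNo NOT BETWEEN 101 AND 202")
print(inside)   # [101, 201, 202]  (both end values are included)
print(outside)  # [203]
```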
4. LIKE Operator: -
Sometimes we want to retrieve data whose column values match a specified pattern;
in that case we use the LIKE operator. It is a method to check for matching strings.
We can use wildcards before the pattern as well as after it: % matches any sequence
of zero or more characters, and _ matches exactly one character.
In SQL Server, to match any single character within a particular range or set we use
the wildcard [], giving the range or set inside the square brackets. For the opposite
condition, i.e. any single character not within the specified range or set, we use [^].
Example: -Consider the Students table. Select the names of students that start with
‘S’ from the Students table.
SELECT Name FROM Students WHERE Name LIKE 'S%'
It is also possible to select the students living in a city that does NOT contain the
pattern "shik" from the "Students" table, by using the NOT keyword.
SELECT * FROM Students WHERE City NOT LIKE '%shik%'
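A sketch of `%` and `_`, assuming Python's sqlite3 module and the names and cities used in the ORDER BY example of this chapter. Note that SQLite's LIKE is case-insensitive for ASCII letters by default and does not support the SQL Server `[]` ranges:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (Name TEXT, City TEXT)")
conn.executemany("INSERT INTO Students VALUES (?, ?)",
                 [("Sachin", "Mumbai"), ("Nayana", "Nashik"), ("Suresh", "Nashik"),
                  ("Ramesh", "Nashik"), ("Kavita", "Pune"), ("Ravi", "Pune")])

def names(sql):
    return [row[0] for row in conn.execute(sql)]

starts_with_s = names("SELECT Name FROM Students WHERE Name LIKE 'S%' ORDER BY Name")
not_shik = names("SELECT Name FROM Students WHERE City NOT LIKE '%shik%' ORDER BY Name")
one_char = names("SELECT Name FROM Students WHERE Name LIKE 'R_vi'")  # _ = exactly one character
print(starts_with_s)  # ['Sachin', 'Suresh']
print(not_shik)       # ['Kavita', 'Ravi', 'Sachin']
print(one_char)       # ['Ravi']
```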
3.4.4. SQL Order by Keyword: -
Sometimes we want to sort the result set by a specified column; for this we use
the ORDER BY keyword. By default, ORDER BY sorts the records in ascending order.
To sort in descending order, use the DESC keyword. In this way ORDER BY lets us
sort the result set in either ascending or descending order.
Name City
Sachin Mumbai
Nayana Nashik
Suresh Nashik
Ramesh Nashik
Kavita Pune
Ravi Pune
Nayana Nashik
Suresh Nashik
Sachin Mumbai
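A sketch of ascending and descending sorts, assuming Python's sqlite3 module and the names and cities in the table above. Name is added as a second sort key, an assumption of this sketch, because rows with equal City may otherwise come back in any order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (Name TEXT, City TEXT)")
conn.executemany("INSERT INTO Students VALUES (?, ?)",
                 [("Sachin", "Mumbai"), ("Nayana", "Nashik"), ("Suresh", "Nashik"),
                  ("Ramesh", "Nashik"), ("Kavita", "Pune"), ("Ravi", "Pune")])

ascending = conn.execute("SELECT Name, City FROM Students ORDER BY City, Name").fetchall()
descending = conn.execute("SELECT Name, City FROM Students ORDER BY City DESC, Name").fetchall()
print([city for _, city in ascending])   # ['Mumbai', 'Nashik', 'Nashik', 'Nashik', 'Pune', 'Pune']
print([city for _, city in descending])  # ['Pune', 'Pune', 'Nashik', 'Nashik', 'Nashik', 'Mumbai']
```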
Select Last (Amount) From Student Output: - 18000
Syntax: -Now ()
SELECT NOW () FROM table_name
Example: -
SELECT RollNo, Amount, Now () as Date
FROM Student where RollNo=1021
Output: -
Example: -
SELECT RollNo, ROUND (Percentage, 0) as Percentage
FROM Student Where RollNo=1021
Example: -
SELECT RollNo, Amount, FORMAT (Now (),'YYYY-MM-DD') as Date
From Student Where RollNo =1021
Output: -
RollNo Amount Date
1021 150000 2010-01-20
When we want to group the result set by certain table columns, we use the
GROUP BY statement along with the SQL aggregate functions.
Example: -Consider the following table (Figure. 3.4.6). If we want to select the student
records whose fees is the maximum among all students, and group students by their
name, we use the group by statement as follows.
Studname Fees
Deepak 18000
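A sketch of GROUP BY with aggregate functions, assuming Python's sqlite3 module. The FeeInfo rows below are hypothetical (the figure's full data is not reproduced here); the last query shows one way to get the single student with the maximum fees:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FeeInfo (ChallenNo INTEGER PRIMARY KEY, Studname TEXT, Fees INTEGER)")
# Hypothetical fee payments; a student may appear more than once.
conn.executemany("INSERT INTO FeeInfo (Studname, Fees) VALUES (?, ?)",
                 [("Deepak", 18000), ("Suresh", 12000), ("Suresh", 15000), ("Kavita", 10000)])

per_student = conn.execute("""
    SELECT Studname, COUNT(*), SUM(Fees), MAX(Fees)
    FROM FeeInfo
    GROUP BY Studname          -- one output row per distinct name
    ORDER BY Studname""").fetchall()
for row in per_student:
    print(row)
# ('Deepak', 1, 18000, 18000)
# ('Kavita', 1, 10000, 10000)
# ('Suresh', 2, 27000, 15000)

top_payer = conn.execute("SELECT Studname, Fees FROM FeeInfo "
                         "WHERE Fees = (SELECT MAX(Fees) FROM FeeInfo)").fetchall()
print(top_payer)  # [('Deepak', 18000)]
```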
3.4.7.Having Clause: -
We can’t use the where clause with aggregate functions, so the SQL HAVING clause
is used to filter groups. It is used with the GROUP BY clause to specify a search
condition for a group or an aggregate. The HAVING clause applies to groups, whereas
WHERE is limited to individual rows.
Syntax: -
SELECT column1, column2, ... column_n, aggregate_function (expression)
FROM tables WHERE predicates
GROUP BY column1, column2, ... column_n
HAVING condition1 ... condition_n;
Query :-SELECT Studname, count (*) as "Total count of name"
FROM FeeInfo
WHERE Studname ='Suresh'
group by studname
HAVING count (*)>1
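The query above can be run in a sketch, assuming Python's sqlite3 module and the same hypothetical FeeInfo rows as the GROUP BY sketch. It also shows that putting the aggregate in WHERE is rejected (the exact error text depends on the SQLite version):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FeeInfo (ChallenNo INTEGER PRIMARY KEY, Studname TEXT, Fees INTEGER)")
conn.executemany("INSERT INTO FeeInfo (Studname, Fees) VALUES (?, ?)",  # hypothetical rows
                 [("Deepak", 18000), ("Suresh", 12000), ("Suresh", 15000), ("Kavita", 10000)])

# WHERE filters rows before grouping; HAVING then filters the groups.
having = conn.execute("""SELECT Studname, COUNT(*) AS "Total count of name"
                         FROM FeeInfo WHERE Studname = 'Suresh'
                         GROUP BY Studname HAVING COUNT(*) > 1""").fetchall()
print(having)  # [('Suresh', 2)]

try:
    conn.execute("SELECT Studname FROM FeeInfo WHERE COUNT(*) > 1")  # aggregate in WHERE
    where_error = None
except sqlite3.OperationalError as exc:
    where_error = str(exc)
print(where_error)  # an error message about misuse of an aggregate
```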
3.5. CREATE: -
In SQL, data is stored in a basic structure called a table, which consists of rows and
columns. To create a new table we use the CREATE TABLE statement, which adds a new
table to an SQL database.
Syntax: -
CREATE TABLE "table_name"
("column 1" "data_type_for_column_1",
"column 2" "data_type_for_column_2",...)
Example: - If we have a table for recording student information, then the columns
may include information such as RollNo, Name, Address, City, Birth Date, and so on.
We use the create statement as follows:-
If we want to check whether a table exists before dropping it, we use the IF
EXISTS clause. This clause drops the table only if it exists; if the table does not
exist, no error is raised (without IF EXISTS, dropping a table that does not exist
causes an error).
DROP TABLE IF EXISTS Student
Before actually dropping the Student table, this statement checks whether the Student table exists.
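A sketch of CREATE TABLE and both forms of DROP TABLE, assuming Python's sqlite3 module. The column list is a hypothetical version of the student table described above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical column list for the student-information example.
conn.execute("""CREATE TABLE Students (
                  RollNo    INTEGER,
                  Name      VARCHAR(30),
                  Address   VARCHAR(60),
                  City      VARCHAR(30),
                  BirthDate DATE)""")

conn.execute("DROP TABLE IF EXISTS Students")  # table exists: it is dropped
conn.execute("DROP TABLE IF EXISTS Students")  # table is gone: no error, nothing happens

try:
    conn.execute("DROP TABLE Students")         # without IF EXISTS: error
    plain_drop_error = None
except sqlite3.OperationalError as exc:
    plain_drop_error = str(exc)
print(plain_drop_error)  # no such table: Students
```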
3.5.2. Constraints: -
In general, constraints are like rules. Sometimes we want to restrict certain
operations on a database, so constraints are used to enforce integrity between
columns and tables and to control the values allowed in columns. By using constraints
we can prevent users from inserting certain types of mismatched data values into the
column(s). Constraints ensure the correctness and reliability of the data in the database.
Entity Integrity: -It ensures that no duplicate rows exist in a table.
Domain Integrity: -It allows only legal entries for a specified column by checking the
type, format, or range of possible values.
Referential integrity: -It ensures that rows which are referenced by other records
cannot be deleted.
User-Defined Integrity: -It enforces specific business rules that must be followed and
that no one may break.
To enforce these categories of data integrity we use the appropriate
constraints.
SQL supports the following constraints:
PRIMARY KEY
UNIQUE
FOREIGN KEY
CHECK
NOT NULL
1. Not Null: -
Often a user does not enter data for a field in a table. If we want a column not to
accept NULL values, we use the NOT NULL constraint. The NOT NULL constraint forces
the field to always contain a value. We specify NOT NULL for a column when the user
must enter data for that field. This means that you cannot insert a new record, or
update a record, without adding a value to this field.
Example: -The following example shows the Student table with a NOT NULL constraint.
Here the user can’t enter RollNo as NULL; the user must enter the student's RollNo,
i.e. RollNo is required in our table.
2. Primary Key: -
A primary key is used to uniquely identify each row in a table. It can be specified
either when the table is created using the CREATE TABLE statement, or by changing the
existing table structure using ALTER TABLE. It consists of one or more fields of a table.
When multiple fields are used as a primary key, they form a composite key.
Example: -
CREATE TABLE Student
(RollNo integer PRIMARY KEY,
FirstName varchar (30),
LastName varchar (30));
3.Foreign Key: -
A FOREIGN KEY in one table points to a PRIMARY KEY in another table. It
is used to ensure the referential integrity of the data. To create a foreign key we need
two tables, a parent table and a child table, which are related to each other through a
common column.
Syntax: -
CREATE TABLE ChildTableName
(Column_Name DataType,
...,
FOREIGN KEY (Common_Key)
REFERENCES ParentTableName (Common_Key));
Example: -Consider two tables Student and FeeInfo having RollNo as a common field.
In the following example, Student has RollNo as its primary key, and the RollNo field
of the FeeInfo table references it. In the FeeInfo table, ChallenNo is the primary key
and RollNo is a foreign key.
4. UNIQUE constraint: -
It is used to check the uniqueness of the values in a column or set of columns, so
that no duplicate values are entered. It is typically applied to columns that are not
the primary key but must still hold unique values.
One major difference between the UNIQUE and PRIMARY KEY constraints: neither allows
repeated values, but a primary key column can’t contain NULL, while a UNIQUE column
can contain NULL values.
Example: -
CREATE TABLE DEPARTMENT
(DEPTNO integer UNIQUE,
DEPTName varchar (30))
5. CHECK constraint: -
A CHECK constraint is used to limit the values that can be placed in a
column. It is used to enforce domain integrity.
In SQL Server, a constraint can be disabled with ALTER TABLE ... NOCHECK CONSTRAINT
and enabled again with ALTER TABLE ... CHECK CONSTRAINT.
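A sketch of a CHECK constraint, assuming Python's sqlite3 module; the Emp table and constraint name are hypothetical. SQLite has no NOCHECK CONSTRAINT, which is SQL Server syntax:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: a named CHECK constraint keeps Salary within its domain.
conn.execute("""CREATE TABLE Emp (
                  EmpId  INTEGER PRIMARY KEY,
                  Salary INTEGER CONSTRAINT chk_salary CHECK (Salary > 0))""")
conn.execute("INSERT INTO Emp VALUES (1, 25000)")  # passes the check

try:
    conn.execute("INSERT INTO Emp VALUES (2, -500)")  # fails the check
    check_error = None
except sqlite3.IntegrityError as exc:
    check_error = str(exc)
print(check_error)  # e.g. "CHECK constraint failed: chk_salary"
```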
3.6. INSERT: -
When we want to insert records into a table in the database, we use the INSERT
command. This statement comes in three forms.
Syntax: -
INSERT INTO table_name [(Col1,Col2,....)]
VALUES (Expression1,Expression2, .... )
Example of first form: -Consider the Students table in Figure. 3.4. Insert RollNo 105 and
name Satish into it.
Query as follows: -
INSERT INTO Students (RollNo, Name) VALUES (105,'Satish')
The second form is used to copy information from a SELECT query into the
table specified in the INSERT statement.
Syntax: -
INSERT INTO table_name1 [(Col1, Col2, .... )]
SELECT Col1, Col2, ... ColN From Table_Name2
Example of second form:-Insert records into Students from the StudInfo table where the
student's name is Soham.
The third form (supported by MySQL) sets each column by name.
Syntax: -
INSERT INTO table_name
SET Col1 = expression1, Col2 = expression2, ....
Example:-Insert a row with RollNo 205 and name Satish into the Students table.
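A sketch of the first two forms, assuming Python's sqlite3 module. The StudInfo table and its rows are hypothetical. SQLite does not accept the MySQL-only `INSERT ... SET` form, so the RollNo 205 row is inserted with the first form instead:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (RollNo INTEGER PRIMARY KEY, Name TEXT)")
# Hypothetical source table for the INSERT ... SELECT form.
conn.execute("CREATE TABLE StudInfo (RollNo INTEGER, Name TEXT)")
conn.executemany("INSERT INTO StudInfo VALUES (?, ?)", [(301, "Soham"), (302, "Ravi")])

# First form: INSERT ... VALUES with an explicit column list.
conn.execute("INSERT INTO Students (RollNo, Name) VALUES (105, 'Satish')")
# Second form: INSERT ... SELECT copies the rows that a query returns.
conn.execute("INSERT INTO Students (RollNo, Name) "
             "SELECT RollNo, Name FROM StudInfo WHERE Name = 'Soham'")
# Third form (INSERT ... SET) is MySQL syntax; the same row via the first form:
conn.execute("INSERT INTO Students (RollNo, Name) VALUES (205, 'Satish')")

result = conn.execute("SELECT * FROM Students ORDER BY RollNo").fetchall()
print(result)  # [(105, 'Satish'), (205, 'Satish'), (301, 'Soham')]
```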
3.7. UPDATE: -
When we want to update information in a table, we use the UPDATE query. It can
set the values of expressions to multiple columns at once, and it can be combined with
a where clause so that only the rows meeting a condition are updated.
Examples: -Consider the Students table; increase every student’s Fees by 100.
UPDATE Students SET Fees = Fees+100
Examples: -Update the Students table, setting the name to Sumit where Fees is 15000.
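Both updates can be sketched with Python's sqlite3 module on hypothetical rows. The rename runs first so that its `WHERE Fees = 15000` still matches before the fees are raised; `rowcount` reports how many rows each statement changed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (RollNo INTEGER PRIMARY KEY, Name TEXT, Fees INTEGER)")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?)",  # hypothetical rows
                 [(1, "Ravi", 15000), (2, "Disha", 12000)])

# With a WHERE clause, only matching rows change.
renamed = conn.execute("UPDATE Students SET Name = 'Sumit' WHERE Fees = 15000").rowcount
# Without WHERE, every row changes; the new value is computed from the old one.
raised = conn.execute("UPDATE Students SET Fees = Fees + 100").rowcount

print(renamed, raised)  # 1 2
final = conn.execute("SELECT * FROM Students ORDER BY RollNo").fetchall()
print(final)  # [(1, 'Sumit', 15100), (2, 'Disha', 12100)]
```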
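3.8. DELETE: -
The DELETE statement removes rows from a table; a WHERE clause selects which rows
are removed.
Example: -Consider the table in Figure. 3.7. Delete the record from the Students table
where the student’s name is Trupta and RollNo is 202.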
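A sketch of DELETE with a two-part condition, assuming Python's sqlite3 module and hypothetical rows (the data of Figure 3.7 is not reproduced here). Because both conditions must hold, only one of the two Trupta rows is removed; a DELETE without WHERE would remove every row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (RollNo INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany("INSERT INTO Students VALUES (?, ?)",  # hypothetical rows
                 [(201, "Ramesh"), (202, "Trupta"), (203, "Trupta")])

deleted = conn.execute("DELETE FROM Students WHERE Name = 'Trupta' AND RollNo = 202").rowcount
print(deleted)  # 1
left = conn.execute("SELECT * FROM Students ORDER BY RollNo").fetchall()
print(left)     # [(201, 'Ramesh'), (203, 'Trupta')]
```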
3.9. ALTER: -
ALTER TABLE table_name ALTER [COLUMN] column_name DROP DEFAULT
ALTER Constraint:
Example1: -Suppose you created the Students table, and after creation you want to modify
it so that the RollNo field becomes the primary key of the table. We alter the
Students table as follows:-
Example2:-Suppose we created an Employee table having a primary key, and another
table EmpINFO having the field EmpId. We want to relate these two tables through
their common field, so we give the child table EmpINFO a reference to the parent
table. We alter the structure of the child table as follows.
SQL table columns can be altered and changed using the MODIFY
COLUMN clause of ALTER TABLE (MySQL syntax; SQL Server uses ALTER COLUMN).
Example1: -Consider an Order table with a NOT NULL constraint on the order's
Quantity, where OrderID is the primary key, created as follows:
After creating this Order table, if we want to delete the constraint on the order's
Quantity, we drop it as follows:
Example2: - If you have a check constraint on salary and you want to drop it:
1. ROLE: -
SQL offers the CREATE ROLE statement to create a role. A role is a set of privileges that
can be granted to users or to other roles. To add privileges to a role we use
the GRANT statement. Privileges express the access rights that are provided to a user
on a database object. When you first create a role it is empty; after creating it, you
add privileges (rights) to the role.
2. DROP ROLE: -
To remove the specified role(s), we use DROP ROLE.
3. GRANT: -
To give users access or rights on database objects, we use the GRANT statement.
Syntax: -
GRANT privilege_name
ON object_name TO {user_name | PUBLIC | role_name}
privilege_name :-It is the access right granted to the user, such as ALL, EXECUTE, or
SELECT.
object_name :-It is the name of a database object such as a TABLE, VIEW,
STORED PROCEDURE or SEQUENCE.
user_name: -name of the user to whom an access right is being granted.
PUBLIC: - used to grant access rights to all users.
role_name :-It is the name of a role, i.e. a set of privileges grouped together.
The above query assigns all privileges to the user John, so John can perform any
operation, such as insert, update, and delete, on the Student table.
4. REVOKE: -
To remove user access rights or privileges from the database objects, we use
REVOKE command.
Syntax: -REVOKE privilege_name ON object_name
FROM {user_name | PUBLIC | role_name}
1. Rename: -
We can rename any table by using the RENAME SQL command. When we rename a
table, the data is not lost; only the table name changes to the new name.
Example: -We will change the name of our Student table to StudInfo (MySQL syntax).
Rename Table Student To StudInfo
A column can be renamed with ALTER TABLE:
Syntax: -ALTER TABLE table_name
RENAME COLUMN old_name to new_name;
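2. Tuple Variables: -
A tuple is a row of a relation, i.e. one value for each of its attributes. Tuple
variables are defined in the FROM clause of an SQL query, usually with the AS keyword,
and give a short name by which the query refers to the rows of a relation, for example
FROM Students AS S.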
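Tuple variables are what make a self-join possible. A sketch, assuming Python's sqlite3 module and the names and cities used earlier in this chapter, that lists pairs of students living in the same city:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (Name TEXT, City TEXT)")
conn.executemany("INSERT INTO Students VALUES (?, ?)",
                 [("Sachin", "Mumbai"), ("Nayana", "Nashik"), ("Suresh", "Nashik"),
                  ("Ramesh", "Nashik"), ("Kavita", "Pune"), ("Ravi", "Pune")])

# S and T are two tuple variables ranging over the same relation, so the query
# can compare one row of Students with another row of Students.
pairs = conn.execute("""
    SELECT S.Name, T.Name
    FROM Students AS S, Students AS T
    WHERE S.City = T.City AND S.Name < T.Name   -- same city, each pair once
    ORDER BY S.Name, T.Name""").fetchall()
print(pairs)
# [('Kavita', 'Ravi'), ('Nayana', 'Ramesh'), ('Nayana', 'Suresh'), ('Ramesh', 'Suresh')]
```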
3. Set Operations: -
SQL set operators combine the results of two or more SELECT statements, i.e. they
combine rows from different queries. The four set operators UNION, UNION ALL,
INTERSECT and MINUS combine the results of more than one SELECT
statement.
UNION Operator: -
UNION combines the results of two SQL queries into a single table containing all
distinct rows returned by either query. The two queries must return the same number
of columns, with compatible data types, in order to be combined.
Syntax: -
SELECT * FROM Table1
UNION
SELECT * FROM Table2
UNION ALL Operator: -
The UNION operator selects only distinct values by default. To allow duplicate
values, use UNION ALL.
Syntax: -
SELECT Col1, Col2, Col3 From Table1
Union ALL
SELECT Col1, Col2, Col3 From Table2
T1 T2
RollNo Name
01 Ravi
02 Vinod
03 Sudhir
01 Jayesh
02 Deepak
03 Sudhir
04 Kalpesh
INTERSECT Operator: -
The INTERSECT operator takes the results of two queries and returns only the rows
that appear in both result sets. When comparing rows, it treats NULL values as equal
to each other. It removes duplicate rows from the final result set; the INTERSECT ALL
operator does not remove duplicate rows.
Syntax:-
SELECT Col1, Col2, Col3 From Table1
Intersect
SELECT Col1, Col2, Col3 From Table2
Example: -Consider two tables T1 and T2 as given below:
T1
Roll_No  Name
101      Sachin
103      Nayan
105      Disha
106      Deepak

T2
Roll_No  Name
101      Sachin
102      Trupta
103      Nayan
104      Tanvi

Output of T1 INTERSECT T2: -
RollNo  Name
101     Sachin
103     Nayan
Minus Operator: -
The EXCEPT operator (called MINUS in Oracle) returns the rows that appear in the
result of the first query but not in the result of the second, and it eliminates
duplicates from its result. The EXCEPT ALL operator does not remove duplicates.
When removing rows and duplicates, EXCEPT treats NULL values as equal to each other.
Output of T2 EXCEPT T1 (T2 MINUS T1) is as follows: -
RollNo Name
102 Trupta
104 Tanvi
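All four operators can be run on T1 and T2 with a sketch, assuming Python's sqlite3 module. SQLite supports UNION, UNION ALL, INTERSECT and EXCEPT, but not INTERSECT ALL or EXCEPT ALL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for name, rows in {
    "T1": [(101, "Sachin"), (103, "Nayan"), (105, "Disha"), (106, "Deepak")],
    "T2": [(101, "Sachin"), (102, "Trupta"), (103, "Nayan"), (104, "Tanvi")],
}.items():
    conn.execute(f"CREATE TABLE {name} (Roll_No INTEGER, Name TEXT)")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?)", rows)

def combine(first, op, second):
    """Combine two full-table SELECTs with a set operator, sorted by the first column."""
    sql = f"SELECT * FROM {first} {op} SELECT * FROM {second} ORDER BY 1"
    return conn.execute(sql).fetchall()

union_rows = combine("T1", "UNION", "T2")          # distinct rows from either table
union_all_rows = combine("T1", "UNION ALL", "T2")  # every row, duplicates kept
common = combine("T1", "INTERSECT", "T2")          # rows present in both
t2_minus_t1 = combine("T2", "EXCEPT", "T1")        # MINUS in Oracle
print(len(union_rows), len(union_all_rows))  # 6 8
print(common)       # [(101, 'Sachin'), (103, 'Nayan')]
print(t2_minus_t1)  # [(102, 'Trupta'), (104, 'Tanvi')]
```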
4. String Operations: -
A string is a finite sequence of symbols chosen from a set, or alphabet.
1. Substring: -
The Substring function in SQL is used to extract a piece of the stored data.
SUBSTR (string, position): Selects all characters from <string> starting at
position <position>. Note that this syntax is not supported in SQL Server.
SUBSTR (string, position, length): Returns the number of characters given by the
length parameter, starting at the specified position of the string.
Output: -ssein
Output :-Ishra
2. TRIM: -
The TRIM function in SQL is used to remove particular prefix or suffix from a
string. The most common pattern being removed is white spaces.
LTRIM (str): Removes all white spaces from the beginning of the string.
RTRIM (str): Removes all white spaces at the end of the string.
3. LENGTH: -
The Length function in SQL is used to get the length of a string.
Length (str): Finds the length of the string str.
4. REPLACE: -
Replaces all occurrences of the string2 in the string1 with string3.
Output: -Patel
5. CONCATENATE: -
CONCAT (str1, str2, str3,...): Concatenate str1, str2, str3, and any other strings
together.
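Example: -
SELECT CONCAT (Last_Name, First_Name)
FROM Students WHERE Last_Name = 'Hussein'
Output: -HusseinZakir
Example: -The standard concatenation operator || (used in Oracle, PostgreSQL and
SQLite; SQL Server uses +) gives the same result with a space in between:
SELECT Last_Name || ' ' || First_Name
FROM Students WHERE Last_Name = 'Hussein'
Output: -Hussein Zakir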
6. LEFT: -
LEFT (str, n): Returns the left part of a string with the specified number (n) of characters.
7. RIGHT: -
RIGHT (str, n): Returns the right part of a string with the specified number (n) of characters.
Output: -hah
8. REPLICATE: -
REPLICATE (str, n): Repeats a string the specified number (n) of times.
Output: -ShahShah
9. REVERSE: -
Returns the reverse of a string.
Example: -SELECT REVERSE (Last_Name)
FROM Students WHERE Last_Name = 'Shah'
Output: -hahS
10. UPPER: -
Converts a string to uppercase.
11. LOWER: -
Converts a string to lowercase.
Output: -Shan
12. UNICODE: -
Returns the Unicode standard integer value (code point) of the first character of a string.
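UPPER, LOWER and UNICODE are also available in SQLite, so a minimal check can again use Python's sqlite3 module (an assumption; the sample value 'Shah' is hypothetical here). UNICODE looks only at the first character, so it returns the code point of 'S', which is 83.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
upper, lower, code = conn.execute(
    "SELECT UPPER(?), LOWER(?), UNICODE(?)", ("Shah", "Shah", "Shah")
).fetchone()

print(upper)  # SHAH
print(lower)  # shah
print(code)   # 83, the code point of the first character 'S'
```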
13. STUFF: -
STUFF (str, start, length, new_str): Deletes < length > characters from str starting at
index < start >, and inserts new_str at that position.
Output: -Patel
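14. MID: -
MID (str, position, length): Returns < length > characters of str starting at < position >
(in MySQL it is a synonym for SUBSTRING).
Example: - SELECT MID (First_Name, 1, 3) FROM Students
WHERE First_Name = 'Ramesh'
Output: -Ram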
3.12. Null Values: -
NULL values represent missing or unknown data. The NULL value is different from all
valid values for a given data type. A NULL is unknown in the sense that the value is
missing from the system. By default, a column of a table can hold NULL values, so NULL
can be used as a placeholder for unknown values.
Example: -Consider the following Students table, in which the email ID is optional. When
no email ID is entered, the EmailID column is stored as NULL.
Output of query: -
Studname  Class  EmailID
Ravi      TYBSC
Disha     FYBSC
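The Students example can be reproduced to show how NULL behaves in a WHERE clause. A sketch assuming SQLite via Python's sqlite3 module: a comparison with NULL yields unknown, so EmailID = NULL matches no rows and IS NULL must be used instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (Studname TEXT, Class TEXT, EmailID TEXT)")
# EmailID is left out of the INSERT, so it is stored as NULL.
conn.executemany(
    "INSERT INTO Students (Studname, Class) VALUES (?, ?)",
    [("Ravi", "TYBSC"), ("Disha", "FYBSC")],
)

all_rows = conn.execute(
    "SELECT Studname, Class, EmailID FROM Students ORDER BY Studname"
).fetchall()
# Correct test for a missing value.
missing_email = conn.execute(
    "SELECT Studname FROM Students WHERE EmailID IS NULL ORDER BY Studname"
).fetchall()
# '= NULL' is never true, so this returns no rows.
equals_null = conn.execute(
    "SELECT Studname FROM Students WHERE EmailID = NULL"
).fetchall()

print(all_rows)       # [('Disha', 'FYBSC', None), ('Ravi', 'TYBSC', None)]
print(missing_email)  # [('Disha',), ('Ravi',)]
print(equals_null)    # []
```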
Summary: -
In this chapter we learnt the different types of statements supported by SQL: DML,
DCL and DDL. We learnt the syntax of the different queries supported by SQL, and how to
use queries to retrieve data, insert new records, delete records and update records in a
database. We have also learned how to create databases, tables, indexes and views with
SQL, how to alter tables, and how to drop constraints, tables, views and indexes. We have
also learned the different aggregate functions in SQL.
Multiple Choice Questions: -
4. To provide access or privileges on database objects to users, we use the
__________ statement.
a) Revoke
b) Role
c) Grant
d) None of above
6. When we want to select duplicate values from two tables we use _________
operator.
a) Union
b) Intersect
c) Union All
d) Minus
7) To ensure that no duplicate rows exist in a table, we use a _________.
a) Foreign Key
b) Primary Key
c) Null Value
d) Distinct
8) A _________ constraint is used to limit the values that can be placed in a column.
a) Check
b) Unique
c) Not Null
d) Foreign key
9) The _________operator takes the results of two queries and returns only rows that
appear in both result sets.
a) Union
b) Union All
c) Intersect
d) Minus
****************************************************
Chapter 4
Entity Relationship Model
Topics To Learn: -
4.0 Overview
4.1 Modeling
4.2 ER Model
4.3 Entity Types
4.4 Attributes Type
4.5 Relationship
4.6 Relationship Set
4.7. Case Studies on E-R diagrams
4.0 Overview: -
When we try to solve any difficult problem, we first try to visualize that problem
and then plan the solution. Similarly, when anyone wants to design a database, it is
necessary to represent all the data so that users can easily visualize the database and
the relationships that exist between the tables; this can be done with the help of the ER
(Entity-Relationship) model. This chapter introduces a technique that is used for
representing data requirements and for organizing data in various views.
The ER model gives a graphical picture of the data for a particular database. To
represent a real-world object that is distinguishable from other objects we use
entities, and to describe the properties of entities we use attributes. In this chapter
you will learn how to represent any database in terms of entities and attributes, and how
to represent the relationships between these entities. We will also learn extended
features of the ER model such as specialization, generalization, attribute inheritance
and aggregation. To design any scenario effectively in the ER model, let's start with
modeling.
4.1 Modeling: -
A modeling language is used to define a schema (data, data relationships, data
semantics, and constraints on the data). To design a database, its structure must be
designed correctly, so we use data modeling.
Let’s learn about Data Modeling.
Data Modeling: -
Data modeling is the process used to produce a data model that describes data, its
relationships, and its constraints. It also defines data elements, their structures and
the relationships between them. We use data modeling to define and analyze the data
requirements that are needed. It is a technique for modeling data in a standard,
consistent and predictable manner so that resources are managed efficiently. It is
generally used in business and information technology to describe what information
exists, how it should be organized, and how it is stored and used inside a business
system.
Importance of Data models: -
It defines and analyzes data requirements.
It makes communication easy between application programmers, end users, and
designers.
It defines data elements, their structures and the relationships between them.
It provides different views of data for end users.
It organizes data for various users.
Figure. 4.1.1 Data Model
4.2 ER Model: -
The Entity-Relationship model was developed by Peter Chen in 1976 to design
databases more effectively. It is a conceptual data model, which explains the structure
of a database. The fundamental concepts of the ER model are entity types, relationship
types and attributes. An entity type represents a set of real-world objects with
similar properties. Let's see all the components of the ER model.
Example: Student is an entity having many properties, such as name, address and roll no.
Example: In a bank system, a customer withdraws money from the bank. In the cash
withdrawal process the entities are Account and Customer, so the process involves only
Customer and Account.
Figure: - Entities Customer and Account
b. Attributes: -
An attribute is a property or characteristic of an entity. An entity is represented by a
set of attributes, and each attribute corresponds to a field in a table. For each attribute
there is a set of allowable values called the domain or value set of the attribute.
Attributes are represented by ovals and are linked to the entity with a line. Each oval
contains the name of the attribute it represents.
Figure: - Entity Course with attributes CourseName, CourseFees and Duration
c. Entity Set: -
In the ER model a specific table row is referred to as an entity instance or entity
occurrence. Each entity set has a key, and all entities in an entity set have the same
set of attributes. Thus an entity set is a set of entities of the same type that share
the same properties or attributes.
Example: In a bank system we deal with five entity sets as follows:
1. Customer entity set, which includes all people having an account at the bank.
Its attributes are customer-name, address and Phoneno.
2. Employee entity set, with attributes employee-name, EmpId and phone-number.
3. Branch entity set, which includes all branches of a particular bank. The attributes
branch_name, branch-city and assets describe each branch.
4. Account entity set, the set of all accounts created and maintained in the bank. Its
attributes are account_number and balance.
5. Transaction entity set, the set of all account transactions executed in the bank. Its
attributes are transaction_number, date and amount.
d. Domain: -
An attribute is a characteristic or property that corresponds to a field in a table.
Each attribute is associated with a set of values known as its domain. The domain
defines the possible values that an attribute may take.
Example: Consider a result grading system in which a student must obtain 60% or more
for a first class grade. The domain of the percentage for first class is then the
numbers from 60 to 100.
Example: The name attributes of Student and Teacher share the same domain: a name
is composed of a set of characters.
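In SQL, a domain such as "a percentage from 60 to 100" can be enforced with a CHECK constraint. The sketch below assumes SQLite via Python's sqlite3 module; the table FirstClass and the helper add_first_class are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE FirstClass (
        RollNo     INTEGER PRIMARY KEY,
        Percentage REAL CHECK (Percentage BETWEEN 60 AND 100)  -- the domain
    )
""")


def add_first_class(roll_no, percentage):
    """Insert a row; return False if a constraint (such as the domain) rejects it."""
    try:
        conn.execute("INSERT INTO FirstClass VALUES (?, ?)", (roll_no, percentage))
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
        return False


print(add_first_class(1, 72.5))  # True
print(add_first_class(2, 45))    # False
print(add_first_class(3, 100))   # True (BETWEEN includes both ends)
```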
An entity that does not depend on any other entity is termed not existence
dependent.
Example: In a college, the entity elective subject does not depend upon the entity
student. It may happen that no student selects a particular elective subject for the
exam; the subject still exists and is offered for examination, but no student appears
for it. This means the entity elective subject is not existence dependent on student.
We underline the discriminator of a weak entity set. In the above example, the
discriminator of the weak entity, Amount, is underlined.
An entity set that has a primary key is known as a strong entity set. A strong
entity is independent of other entities and can exist on its own.
Example: Student is a strong entity type because it has the primary key Roll_No.
Figure: - Strong entity Student with primary key Roll_No
Figure.4.3.3.Recursive Entity
4.4.1. Simple Attribute: -
When an attribute consists of a single atomic value, it is a simple attribute,
i.e. a simple attribute cannot be subdivided.
Example: - The attributes age, gender, etc. are simple attributes.
Figure: - Simple attribute Gender; composite attribute Address made up of Street, City and State
Example: - A person has an 'age'. Age has a single value at a time; multiple values are
not possible for age, so age is a single-valued attribute. A single-valued attribute can
be simple or composite.
4.4.4. Multi Valued Attribute: -
Multivalued attributes can have multiple values. Multivalued attributes are
shown by a double oval (double ellipse) connected to the entity in the ER diagram.
Example: A customer can have multiple phone numbers, email IDs, etc.
A person may have several college degrees.
Figure: - Multivalued attribute Phone no
Example: In a result system, the first rank student is derived from the highest
percentage, i.e. the student who obtained the highest percentage is the first ranker,
so First Rank is a derived attribute.
Example: - The Person entity has a Phone_No attribute, which is both composite and
multivalued. One person may have more than one phone number, and each Phone_No may
be composed of an STD code and a telephone number.
Figure. 4.4.7. Complex Attribute
4.5 Relationship: -
A relationship is an association among several entities. The association or
relationship that exists between the entities relates data items to each other in a
meaningful way. The existence of such an association means that a relationship exists
between two or more entities. To represent a relationship, an additional entity, called
a relation entity, is used.
4.6 Relationship Set: -
Example: - The following figure shows that the Student table is related to the College
table. Student S.M.Pawar is associated with Karve College; each such association is
called a relationship instance.
Figure.4.6 Relationship Instances
1. One-to-one Relationship: -
Example: One manager in a company manages a single department (a one-to-one
connectivity).
2. One-to-Many Relationship: -
Example: One professor teaches many classes (a one-to-many connectivity).
3. Many-to-one Relationship: -
4. Many-to-Many Relationship: -
Example: Many colleges can offer courses to many students as a part of a student
development program (a many-to-many connectivity).
4.6.2 Types of Relationship:-
a. Unary Relationship: -
An entity can be related to itself; such a relationship is known as a unary
or recursive relationship. A unary relationship exists when an association is
maintained within a single entity.
Figure.4.6.2.a.Unary Relationship
b. Binary Relationship: -
A binary relationship exists when two entities are associated.
Figure.4.6.2.b.One-to-One
Figure.4.6.2.c. Many-to-many
Figure.4.6.2.d. Ternary Relationship
The lines between the diamonds and the rectangles represent the multiplicity
of the relationships. A single line represents a one-to-one relationship. A line that
branches into three segments (a crow's foot) where it connects to an entity represents
the 'many' side of a one-to-many or many-to-many relationship.
a. Degree of Relationship: -
The degree of a relationship is the number of entities among which the
relationship exists. The most common degree of relationship is binary, which exists
between two entities. The Degree of binary relationship is two (number of entities in
binary relation is two).
Example: - In the following figure the Student entity is related to the College entity.
Many students study in one college; this is a binary relationship. Two entities are
present, so the degree of this binary relationship is two.
b. Multiplicity: -
Multiplicity is the number of instances of one entity that can be related to an
instance of another entity.
Example: - A binary relationship exists between students and colleges. Many students
study in one college, or one student may study for several degrees from several
colleges. So the multiplicity of this relationship is many-to-one or many-to-many.
c. Existence: -
Existence describes whether the existence of an entity instance depends upon
the existence of another related entity instance.
The existence of an entity in a relationship may be either mandatory or
optional.
If an instance of an entity must always occur for an entity to be included in a
relationship, then it is mandatory. An example of mandatory existence is "A student
must select at least one subject from the optional subject list": this is mandatory for
every student.
If the instance of the entity is not required, it is optional. An example of
optional existence is "Some activities in colleges are not compulsory for students";
for example, participating in the annual gathering is not compulsory for all students.
Cardinality:-
Cardinality is used in database relationships to indicate the number of occurrences of
data on either side of the relationship. Let's see mapping cardinalities, also called
cardinality ratios.
In a relationship, one entity is associated with another entity. Mapping
cardinalities (cardinality ratios) express the number of entities to which
another entity can be associated via a relationship set. Mapping
cardinalities are most useful in describing relationship sets.
One to one: -
An entity in A is associated with at most one entity in B, and an entity in B is
associated with at most one entity in A.
Example: Every University has exactly one Vice Chancellor.
Every Country has only one Prime minister.
One to many: -
An entity in A is associated with any number (zero or more) of entities in B.
An entity in B, however, can be associated with at most one entity in A.
Figure. One-to-many Relationship.
Many to one: -
An entity in A is associated with at most one entity in B. An entity in B,
however, can be associated with any number (zero or more) of entities in A.
Many to many: -
An entity in A is associated with any number (zero or more) of entities in B,
and an entity in B is associated with any number (zero or more) of entities in A.
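When an ER design is turned into tables, the mapping cardinality is usually enforced with foreign keys: a plain foreign key on the "many" side allows one-to-many, while adding UNIQUE to that foreign key limits it to one-to-one. This is a sketch under those assumptions, using SQLite via Python's sqlite3 module; the tables follow the Manager/Department and Professor/Class examples above, and the names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE Manager (MgrId INTEGER PRIMARY KEY, Name TEXT);
    -- One-to-one: UNIQUE lets each manager manage at most one department.
    CREATE TABLE Department (DeptId INTEGER PRIMARY KEY,
                             MgrId  INTEGER UNIQUE REFERENCES Manager(MgrId));
    CREATE TABLE Professor (ProfId INTEGER PRIMARY KEY, Name TEXT);
    -- One-to-many: no UNIQUE, so one professor can teach many classes.
    CREATE TABLE Class (ClassId INTEGER PRIMARY KEY,
                        ProfId  INTEGER REFERENCES Professor(ProfId));
    INSERT INTO Manager VALUES (1, 'Asha');
    INSERT INTO Professor VALUES (1, 'Kale');
""")


def try_insert(sql, params):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


one_to_one = [try_insert("INSERT INTO Department VALUES (?, ?)", (d, 1)) for d in (10, 11)]
one_to_many = [try_insert("INSERT INTO Class VALUES (?, ?)", (c, 1)) for c in (100, 101, 102)]
print(one_to_one)   # [True, False]  a second department for manager 1 is rejected
print(one_to_many)  # [True, True, True]
```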
4.6.5 Keys: -
A key is an attribute or a combination of attributes that is used to
identify records. Sometimes we have to retrieve data from more than one table;
in those cases the tables are joined with the help of keys. The purpose of a key
is to bind data together across tables without repeating all of the data in every table.
4.6.5.1 Keys for Relationship set: -
a. Super Key: -
A super key is a set of one or more attributes that, taken collectively, identify each
record uniquely (and hence determine all other attributes). A super key is a superset of
a candidate key.
b. Candidate Key: -
Candidate Keys are super keys for which no proper subset is a super key.
Example: Suppose we have a Student table with attributes
('RollNo', 'Name', 'Address', 'Class', 'Phone_No').
In the above table RollNo can identify a record uniquely, and similarly the
combination of Name and Address can identify a record uniquely. Neither Name
nor Address alone can identify records uniquely, as we might have two students with
the same name or two students from the same address. So for the above table we have
only two candidate keys that identify records uniquely:
1. RollNo
2. (Name, Address)
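Candidate keys can also be found mechanically from functional dependencies: a set of attributes is a super key if its attribute closure contains every attribute, and it is a candidate key if no smaller candidate key lies inside it. The sketch below assumes the two dependencies described above for the Student table (RollNo determines all attributes, and so does the pair Name, Address); the function names are hypothetical.

```python
from itertools import combinations

ATTRS = ("RollNo", "Name", "Address", "Class", "Phone_No")
# Assumed dependencies for the Student table described above.
FDS = [
    ({"RollNo"}, set(ATTRS)),
    ({"Name", "Address"}, set(ATTRS)),
]


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs):
    return closure(attrs, FDS) == set(ATTRS)


def candidate_keys():
    """Super keys containing no smaller candidate key (sizes are tried smallest first)."""
    keys = []
    for size in range(1, len(ATTRS) + 1):
        for combo in combinations(ATTRS, size):
            candidate = set(combo)
            if is_superkey(candidate) and not any(k < candidate for k in keys):
                keys.append(candidate)
    return keys


print(candidate_keys())
```

Because sizes are tried in increasing order, any super key that contains a smaller super key is skipped, which is exactly the "no proper subset is a super key" condition.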
c. Secondary Key: - A secondary key is an attribute (or set of attributes) other than the
primary key that is used to retrieve records; its values might not be unique.
d. Compound Key: -
A compound key, also known as a composite key or concatenated key, consists of
two or more attributes. If no single attribute in a table identifies each record uniquely,
we can form a compound key by combining multiple attributes to identify each
record.
e. Alternate Key: -
Any candidate key that is not chosen as the primary key is called an
alternate key.
Example: Consider the Student table with attributes 'RollNo', 'Name', 'Address',
'Class', 'Phone_No'. If RollNo is chosen as the primary key, then the candidate key
(Name, Address), which is not the primary key, is an alternate key.
f. Primary Key: -
A primary key is an attribute (or set of attributes) in a table that uniquely identifies
each record in the table; a primary key can contain neither NULL nor duplicate values.
In the ER diagram, the primary key is represented by underlining the primary key
attribute. Usually a primary key is composed of a single attribute, for example OrderID.
g. Foreign key: -
A foreign key (FK) is a field in a database record that points to the key field of
another database record in a different table. A foreign key in one table
points to the primary key (PK) of another table. In this way references can be made to
bind the information together.
In this case 'Purches_no' is the primary key of Purchase_Info, and it is a foreign key
in the Order table. Purchase_Info is the parent (referenced) table and Order is the
child (referencing) table. A foreign key value must match an existing value in the
parent table or be NULL.
So a value cannot be present in the foreign key of the child table unless the same value
is present in the parent table. Consequently, before a value is deleted from the parent
table, the child rows that refer to it must be deleted (or changed) first, unless the
foreign key is declared with ON DELETE CASCADE.
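The parent/child rule can be seen in SQLite, which enforces foreign keys once PRAGMA foreign_keys is switched on. This sketch assumes SQLite via Python's sqlite3 module; because ORDER is an SQL keyword, the child table is named Orders here, and the column Item and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on
conn.executescript("""
    CREATE TABLE Purchase_Info (Purches_no INTEGER PRIMARY KEY, Item TEXT);
    CREATE TABLE Orders (
        OrderID    INTEGER PRIMARY KEY,
        Purches_no INTEGER REFERENCES Purchase_Info(Purches_no)
    );
    INSERT INTO Purchase_Info VALUES (1, 'Pen');
    INSERT INTO Orders VALUES (501, 1);
""")


def attempt(sql, params=()):
    """Run one statement and report whether the constraints accepted it."""
    try:
        conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


# Child row pointing at a parent value that does not exist.
child_without_parent = attempt("INSERT INTO Orders VALUES (?, ?)", (502, 99))
# A NULL foreign key is allowed.
null_fk = attempt("INSERT INTO Orders VALUES (?, ?)", (503, None))
# Parent row is still referenced by order 501.
delete_parent_first = attempt("DELETE FROM Purchase_Info WHERE Purches_no = ?", (1,))
# Delete the child rows first, then the parent row.
delete_child = attempt("DELETE FROM Orders WHERE Purches_no = ?", (1,))
delete_parent_after = attempt("DELETE FROM Purchase_Info WHERE Purches_no = ?", (1,))

print(child_without_parent, null_fk, delete_parent_first, delete_child, delete_parent_after)
# rejected ok rejected ok ok
```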
4.6.5.2. E-R Diagrams: -
An ER diagram (ERD) is also known as a data schema map or data map. It is used to
capture data requirements: the ERD captures the entities, their attributes and the
relationships between the entities, and pictorially represents the interrelationships
of the database schema. The major components of an ERD are entities, attributes,
relationships and the cardinality of associations.
Steps in ER Modeling: -
1. Identify the Entities.
2. Find relationships.
3. Identify the key attributes for every Entity.
4. Identify other relevant attributes.
5. Draw complete E-R diagram with all attributes including Primary Key
3 Relationship
4 Identifying Relationship
5 Attribute
6 Multivalued Attribute
7 Primary Key
8 Derived Attribute
9 Composite Attribute
10 Connector
We can use different notations for cardinality, such as drawing a directed line (an
arrow) to show 'one' and an undirected line to show 'many' between the relationship
set and the entity set.
I. One to One: -
A one-to-one relationship: one college has only one principal, so it is a
one-to-one (1:1) relationship.
Figure: - Many to one relationship
We will see only the Information Engineering style and the Martin style, which are widely used.
Symbol Meaning
One-to-One
One-to-many (mandatory)
Many
One-or-more (mandatory)
One and only one
Symbol Meaning
1 One and only one (mandatory)
Example: - One college has many professors, and many professors teach many
students.
Example: - One university has many colleges, and many colleges have many
programs.
4.6.5.6 Weak Entity Sets: -
An entity set that does not contain sufficient attributes to form a primary key is
called as weak entity set and an entity set that contains a primary key is called a
strong entity set.
Example: - Consider the entity Student with attributes Name and Roll_No. Students are
part of a university, which means many students are connected to the university. Many
students from different colleges may have the same name, and even the same Roll_No, so
the entity set Student does not have sufficient attributes to form a primary key. It is
a weak entity set.
Case Study 1: - Design a database for a student management system that has a set of
students. Each student is associated with one or more colleges. Colleges are private
or government.
Solution: -
ER Diagram
Tabular representation
Table Student
RollNo Name Address CName
Table College
Case Study 2: - Consider a courier services system. The Administrator is the person
who handles the administration of the system and is the actual owner of the courier
services shop. A Client is a person who sends documents or things by courier.
Workers are people who work in the courier office to handle enquiries, the dispatching
of couriers, etc. A Payment_Mode option is available to the Client, so a Client can make
payments using different payment modes. Draw an ER diagram for the Courier Service
System.
Step 2: Find the relationships that are present in Courier services system
Summary: -
In this chapter we learnt that an ERD (Entity Relationship Diagram) is used to easily
visualize a database and the relationships that exist between its tables.
When an entity is related to itself, the relationship is called a unary or recursive
relationship.
When two entities are associated, the relationship is a binary relationship.
When three entities are associated, it is a ternary relationship.
Cardinality means how many instances of an entity relate to one instance of
another entity.
A key is an attribute or a combination of attributes that is used to identify
records.
3> The ER model refers to a specific table ________as an entity instance or entity
occurrence.
a) Column b) Group of Rows
c) Row d) Group of Fields
4> __________attribute is an attribute that can be further subdivided.
a) Multivalued attribute
b) Simple Attribute
c) Composite Attribute
d) Null Attribute.
6> An attribute whose value is derived from a stored attribute is known as a _______
attribute.
a) Simple attribute
b) Derived Attribute
c) Composite Attribute
d) Null Attribute
*****************
Chapter 5
Normalization
Topics To Learn
5.1 Overview:
5.2 Relational DB design
5.3 Decomposition (Small Schema)
5.4 Lossy Decomposition
5.5 Loss less Decomposition
5.6. Functional Dependency
5.7 Normalized Forms
5.8. De-normalization
5.1 Overview:-
SQL offers different types of queries, but sometimes the data we get is redundant or
inconsistent. To avoid such problems, it is necessary to convert the database into a
normalized form, that is, a form that conforms to a standard or norm. So we need
normalization of the database.
This chapter explains what normalization is, why we use it, and the different normal
forms that help you to keep the database in normalized form.
We also learn the denormalization process, which is sometimes useful to avoid the
drawbacks of normalization. Let's start with relational database design.
redundancy. When we decompose a relation, we must check that the decomposition
does not lead to a bad design.
R = R1 U R2
Decomposition
Example: - Given two relations R1 and R2, we combine them by natural join, R1 ⋈ R2,
into a single relation as follows: -
R1 ⋈ R2
In the previous example, additional (spurious) tuples are obtained along with the
original tuples. Although there are more tuples, they carry less information, because we
can no longer tell which tuples belonged to the original relation. Due to this loss of
information, the decomposition in the previous example is called a lossy
decomposition or lossy-join decomposition.
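The effect of a lossy decomposition can be reproduced in plain Python. This is a minimal sketch: project and natural_join are hypothetical helpers written for illustration, and the relation R(Name, Class, City) is a made-up example in which the common attribute Class is not a key of either part, so the join creates spurious tuples.

```python
def project(rows, attrs):
    """Projection: keep only attrs and drop duplicate tuples."""
    result = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in result:
            result.append(t)
    return result


def natural_join(r1, r2):
    """Natural join: combine tuples that agree on all common attributes."""
    common = set(r1[0]) & set(r2[0])
    result = []
    for t1 in r1:
        for t2 in r2:
            if all(t1[a] == t2[a] for a in common):
                joined = {**t1, **t2}
                if joined not in result:
                    result.append(joined)
    return result


# Hypothetical relation R(Name, Class, City).
R = [
    {"Name": "Sumit", "Class": "FY", "City": "Nashik"},
    {"Name": "John", "Class": "FY", "City": "Pune"},
]
# Decompose on Class, which is not a key of either part.
R1 = project(R, ["Name", "Class"])
R2 = project(R, ["Class", "City"])
joined = natural_join(R1, R2)
spurious = [t for t in joined if t not in R]

print(len(R), len(joined))  # 2 4
for t in spurious:
    print(t)
```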
Definition: - When we decompose a relation R into {R1, R2, ..., Rn} and the
natural join of R1, R2, ..., Rn produces exactly the relation R, the decomposition is
called a loss less decomposition.
Example: - Show that the following decomposition of R into R1 and R2 is a loss less decomposition.
R
Roll_No  Name   Address  Fees    Class  Grade
11       Sumit  Nashik   150000  FY     A
21       Smith  Pune     200000  SY     B
17       John   Nashik   150000  FY     A+
15       Sonam  Pune     300000  TY     B+
R1
Roll_No  Name   Address  Grade
11       Sumit  Nashik   A
21       Smith  Pune     B
17       John   Nashik   A+
15       Sonam  Pune     B+
R2
Roll_No  Fees    Class
11       150000  FY
21       200000  SY
17       150000  FY
15       300000  TY
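Solution: - To determine the primary key of a table, we use the column that uniquely
identifies each record. To identify FDs, we look for relationships between attributes that
allow us to uniquely determine the corresponding attribute's value. Using Roll_No we can
uniquely determine Name and Address (and also Fees, Class and Grade) in the above
relation, so Roll_No is the key. The common attribute of R1 and R2 is Roll_No, which is a
key of both R1 and R2; therefore the natural join R1 ⋈ R2 matches each R1 tuple with
exactly one R2 tuple and reproduces exactly the original relation R. Hence the
decomposition is loss less.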
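The loss less property of this particular decomposition can be checked by joining R1 and R2 back together and comparing the result with the original relation. A minimal sketch in plain Python; the helpers project, natural_join and same_relation are hypothetical, and the rows are taken from the relation and its decomposition above.

```python
def project(rows, attrs):
    """Projection: keep only attrs and drop duplicate tuples."""
    result = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in result:
            result.append(t)
    return result


def natural_join(r1, r2):
    """Natural join: combine tuples that agree on all common attributes."""
    common = set(r1[0]) & set(r2[0])
    result = []
    for t1 in r1:
        for t2 in r2:
            if all(t1[a] == t2[a] for a in common):
                joined = {**t1, **t2}
                if joined not in result:
                    result.append(joined)
    return result


def same_relation(a, b):
    """Two relations are equal if each contains every tuple of the other."""
    return all(t in b for t in a) and all(t in a for t in b)


R = [
    {"Roll_No": 11, "Name": "Sumit", "Address": "Nashik", "Fees": 150000, "Class": "FY", "Grade": "A"},
    {"Roll_No": 21, "Name": "Smith", "Address": "Pune", "Fees": 200000, "Class": "SY", "Grade": "B"},
    {"Roll_No": 17, "Name": "John", "Address": "Nashik", "Fees": 150000, "Class": "FY", "Grade": "A+"},
    {"Roll_No": 15, "Name": "Sonam", "Address": "Pune", "Fees": 300000, "Class": "TY", "Grade": "B+"},
]
R1 = project(R, ["Roll_No", "Name", "Address", "Grade"])
R2 = project(R, ["Roll_No", "Fees", "Class"])
joined = natural_join(R1, R2)

print(len(joined), same_relation(joined, R))  # 4 True
```

Note that this only confirms the result for these four rows; the general argument is the one in the solution, namely that the common attribute Roll_No is a key of R1 and R2.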
Functional Dependency: -
It is divided into subtypes as follows:
1. Full Functional Dependency
2. Partial Dependency
3. Transitive Dependency
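1. Full Functional Dependency: -
An attribute is fully functionally dependent on a set of attributes if it depends on the
whole set but not on any proper subset of it.
In this case (RollNo, Subject_Code) together form a composite key (a primary key
with more than one attribute): to find the marks for a particular subject we
need both RollNo and Subject_Code.
That is, the following full functional dependency exists in this relation: -
RollNo, Subject_Code -> Marks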
2. Partial Dependency: -
If a relation has a functional dependency A -> B, and we can remove some attribute
from A and the dependency still holds, this type of dependency is known as a
partial dependency. It is a type of functional dependency where an attribute is
functionally dependent on only part of the primary key (so the primary key must be a
composite key). Partial dependency exists when the value of a non-key attribute of a
table depends on the value of some part of the table's primary key.
1. Insertion: - Suppose we want to enter a new record having only a student's basic
information, such as RollNo and Name, without entering the Fees and Course details. This
is not possible: the user must also enter the information about Fees and Course.
RollNo -> Course, Fees
Partial Dependency
To remove a partial functional dependency, split the table into new relations according
to the functional dependencies.
3. Transitive Dependency: -
for part-time work the company offers only 75 rupees per hour. If an employee works
more hours than the daily time limit, that amount is added to the salary at
the end of the month based on the WorkType of the employee. So in this case a transitive
dependency appears as follows: -
EmpId -> WorkType, ExtraHours
WorkType -> ExtraHours -> Salary
Solution: - To remove a transitive dependency, you first have to remove the partial
dependencies and then the transitive dependency.
Personal
RollNo  Name          Class
101     P.M.Pathak    FY
102     S.M.Patil     SY
103     A.J.Pawar     FY
104     A.D.Despande  SY
SubjectInfo
RollNo  SubjectCode  SubjectName  Marks
101     C101         C            57
102     C201         DBMS         72
103     C102         C++          60
104     C202         Oracle       75
Definition of Normalization: -
Normalization is the process of decomposing a relation (table) based on its functional
dependencies and primary key.
Types of Normalization: -
Un-Normalized Form
First Normal Form (1 NF)
Second Normal Form (2 NF)
Third Normal Form (3 NF)
For the unnormalized table, we identify the repeating groups in the above table, that
is, CourseName, Course_Code and Fees. The key of this table is RollNo because all fields
of the table depend on RollNo.
From the above table you will notice that each row contains a unique combination of
values and contains only atomic values, which cannot be decomposed further. So it is in 1NF.
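Removing a repeating group means writing one row per (student, course) pair. In this minimal Python sketch the unnormalized data is an assumption, rebuilt from the Personal and Course tables shown later in this chapter; to_1nf is a hypothetical helper.

```python
# Hypothetical unnormalized data: each student row holds a repeating group of courses.
UNNORMALIZED = [
    {"RollNo": 123, "Name": "Ravi",
     "Courses": [("C102", "C", 2500), ("C103", "C++", 1200), ("C104", "OOPs", 3200)]},
    {"RollNo": 124, "Name": "Sumit",
     "Courses": [("C102", "C", 2500), ("C103", "C++", 1200)]},
    {"RollNo": 125, "Name": "Trupta",
     "Courses": [("C102", "C", 2500), ("C103", "C++", 1200), ("C104", "OOPs", 3200)]},
]


def to_1nf(rows):
    """Flatten the repeating group so every cell holds a single atomic value."""
    flat = []
    for row in rows:
        for course_code, course_name, fees in row["Courses"]:
            flat.append((row["RollNo"], row["Name"], course_code, course_name, fees))
    return flat


for r in to_1nf(UNNORMALIZED):
    print(r)  # (RollNo, Name, Course_Code, CourseName, Fees)
```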
Steps to convert 1NF into Second Normal Form: -
In the first step you have to remove partial dependencies.
If the primary key of the relation is a single attribute, the relation cannot have
a partial dependency, so it is already in 2NF and no conversion is required.
But if the primary key of the relation is not a single attribute, i.e. it is
composite, then we need to identify the primary key and the functional
dependencies that exist in the 1NF relation.
o If any attribute is partially dependent on the primary key, then
remove those dependencies by placing them in a new relation along
with a copy of their determinant.
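The steps above can be sketched as a small procedure over functional dependencies: compute the closure of each proper part of the composite key, collect the non-key attributes that part alone determines, and move them into a new relation together with a copy of that part (the determinant). The dependencies used here, RollNo -> Name and Course_Code -> CourseName, Fees with key (RollNo, Course_Code), are assumptions taken from the example that follows; the function names are hypothetical, and the sketch assumes the key is the only candidate key.

```python
from itertools import combinations

ATTRS = {"RollNo", "Name", "Course_Code", "CourseName", "Fees"}
KEY = ("RollNo", "Course_Code")
FDS = [
    ({"RollNo"}, {"Name"}),
    ({"Course_Code"}, {"CourseName", "Fees"}),
]


def closure(attrs, fds):
    """All attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(attrs, key, fds):
    """Map each proper part of the key to the non-key attributes it alone determines."""
    non_key = set(attrs) - set(key)  # assumes key is the only candidate key
    found = {}
    for size in range(1, len(key)):
        for part in combinations(key, size):
            determined = closure(part, fds) & non_key
            if determined:
                found[part] = determined
    return found


def split_for_2nf(attrs, key, fds):
    """Move each partial dependency into its own relation with a copy of its determinant."""
    relations = []
    moved = set()
    for part, determined in partial_dependencies(attrs, key, fds).items():
        relations.append(set(part) | determined)
        moved |= determined
    relations.append(set(attrs) - moved)  # the key plus fully dependent attributes
    return relations


print(partial_dependencies(ATTRS, KEY, FDS))
print(split_for_2nf(ATTRS, KEY, FDS))
```

Applying the rule to both partial dependencies gives three relations; the Personal/Course split shown later in this chapter removes only the Course_Code dependency.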
Example: - Consider the Student table in Figure 5.7.1. We reduce redundant data by
extracting it from the 1NF table and placing it in a new table, and we create a
relationship between those tables as shown below. We remove partial functional
dependencies to produce 2NF.
In the original table RollNo is the key. The other attributes, Name and Class,
depend upon RollNo, but the attribute Fees depends upon the course (CourseName), not on
RollNo. So we remove partial functional dependencies to produce 2NF.
We can observe that the Name attribute depends only on RollNo, whereas Fees
depends on Course_Code. In the 1NF table, Course_Code alone cannot identify a unique
record; we need RollNo along with Course_Code to identify a record. Since Fees depends
on only part of this key, it is a partial dependency.
Before looking at an example, recall what a transitive dependency is.
In the above example, Name depends upon the primary key RollNo, while the other
attribute Fees depends upon CourseName, which in turn depends upon Course_Code.
Implementing the split would mean: -
Personal
RollNo  Name    Course_Code
123     Ravi    C102
123     Ravi    C103
123     Ravi    C104
124     Sumit   C102
124     Sumit   C103
125     Trupta  C102
125     Trupta  C103
125     Trupta  C104
Course
Course_Code  CourseName  Fees
C102         C           2500
C103         C++         1200
C104         OOPs        3200
A foreign key is an attribute that appears as a non-key attribute in one relation and as
the primary key in another relation (here, Course_Code).
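The split into Personal and Course can be reproduced with SELECT DISTINCT projections, and the join of the two tables can be compared against the original rows to confirm that nothing is lost or invented. A minimal sketch, assuming SQLite via Python's sqlite3 module; the name Student1NF for the combined table is hypothetical, and its rows are rebuilt from the two tables above.

```python
import sqlite3

ROWS_1NF = [
    (123, "Ravi", "C102", "C", 2500), (123, "Ravi", "C103", "C++", 1200),
    (123, "Ravi", "C104", "OOPs", 3200), (124, "Sumit", "C102", "C", 2500),
    (124, "Sumit", "C103", "C++", 1200), (125, "Trupta", "C102", "C", 2500),
    (125, "Trupta", "C103", "C++", 1200), (125, "Trupta", "C104", "OOPs", 3200),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student1NF (RollNo INTEGER, Name TEXT, "
             "Course_Code TEXT, CourseName TEXT, Fees INTEGER)")
conn.executemany("INSERT INTO Student1NF VALUES (?, ?, ?, ?, ?)", ROWS_1NF)

# Projections: each course's name and fees are now stored only once.
conn.executescript("""
    CREATE TABLE Personal AS SELECT DISTINCT RollNo, Name, Course_Code FROM Student1NF;
    CREATE TABLE Course AS SELECT DISTINCT Course_Code, CourseName, Fees FROM Student1NF;
""")

REJOIN = """SELECT p.RollNo, p.Name, p.Course_Code, c.CourseName, c.Fees
            FROM Personal p JOIN Course c ON p.Course_Code = c.Course_Code"""
# Rows lost by the split, and rows invented by the join: both should be empty.
lost = conn.execute(f"SELECT * FROM Student1NF EXCEPT {REJOIN}").fetchall()
invented = conn.execute(f"{REJOIN} EXCEPT SELECT * FROM Student1NF").fetchall()
courses = conn.execute("SELECT * FROM Course ORDER BY Course_Code").fetchall()

print(courses)         # [('C102', 'C', 2500), ('C103', 'C++', 1200), ('C104', 'OOPs', 3200)]
print(lost, invented)  # [] []
```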
5.8. De-normalization:-
De-normalization is the process of converting normalized form into
unnormalized form. It is a technique to move from higher to lower normal forms of
database.
Advantages of Denormalization: -
Denormalization gives Performance benefits by: -
Minimizing the need for joins.
Precomputing aggregate values, that is, computing them at data modification
time, rather than at select time
Reducing the number of tables, in some cases
Disadvantages of Denormalization: -
It makes the update process slower.
It increases the size of tables.
It makes coding more complex.
It is mostly application-specific, so it needs to be re-evaluated if the application
changes.
Summary: -
In this chapter we learnt what decomposition is, and also lossy and loss less
decomposition. We focused on normalization and the different normal forms. Along
with that we learnt de-normalization.
Decomposition: -
If R is a relation scheme then {R1, R2, ..., Rn} is a decomposition
if R = R1 U R2 U ... U Rn
Lossy Decomposition: -
If R is a relation and R' = R1 ⋈ R2 ⋈ ... ⋈ Rn (the natural join of the decomposed
relations), then if we get R ≠ R', it is called a lossy decomposition.
1NF: - A relation is said to be in 1NF if each cell of the table has a single
value.
2NF: - A relation is in 2NF if it is in 1NF and every non-key attribute is fully
functionally dependent on the primary key.
2. If R is a relation and R' = R1 ⋈ R2 ⋈ ... ⋈ Rn, then if we get R = R', it is called a
________ decomposition.
a) Loss less b) Lossy
c) Both a and b d) None of above
8. A relation is said to be in ________ if each cell of the table has a single
value.
a) UnNormalize Form b) Second Normal Form
c) First Normal Form d) None of Above
*********************************
Chapter 6
Transactions
Topics to Learn
6.0 Introduction
6.1 Transaction Concept
6.2 Properties of Transactions
6.3 Transaction Terminology
6.4 Transaction States
6.5 Concurrent Execution of Transactions
6.6 Operations on a Transaction
6.7 Concurrency Control
6.8 Schedules
6.9 Recoverability
6.0 Introduction
In our day-to-day life we perform several activities, like purchasing groceries and
vegetables, paying bills, etc. Each activity consists of many steps. For
example, while paying an electricity bill, we have to go to the bank, withdraw money
from the bank, then go to the electricity office, stand in the queue, pay the bill and
finally go home. Paying the electricity bill is one activity, but it contains many steps.
Similarly, a transaction is a single activity that consists of many
individual steps to perform the work. That is why a transaction is always considered
an atomic unit or logical unit.
In this chapter we will first see the transaction concept and how a particular
transaction is carried out. Then we will see the properties that every
transaction should satisfy. Then we will study transaction terminology, in which
we will go through the basic concepts used throughout the chapter. Before its
completion, a transaction passes through many states; these states are
explained in transaction states.
In any computing system, many programs execute at the same time. This concept and
the problems related to concurrency will be described in concurrent execution of
transactions. Various operations can be performed on the database; we will then see
the operations permissible on the database. After operations, we will see the concept of
schedules. A schedule is the order in which the operations of transactions are executed
in the system.
Serializability is the correctness criterion used to control the concurrent execution of
transactions. We will study the concept of serializability and its types. The last
topic covered in this chapter is recoverability, which is concerned with removing the
effects of incomplete transactions. Then we will study the various recoverable schedules.
So let's start with the transaction concept.
6.1 Transaction Concept
A transaction is a single unit of database operations that is either
completed entirely or not performed at all. A transaction must either complete or
abort; no intermediate state is allowed. A transaction consists of SQL
statements such as SELECT, INSERT, DELETE, etc. Most practical transactions
consist of two or more database requests.
E.g.: Consider a simple example of calculating the percentage of marks in six subjects.
We first read the marks of each subject and add them to the total. After the
total is calculated, we calculate the percentage, and the value of the
percentage is written back to the database.
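The percentage calculation described above can be sketched as a single transaction using Python's standard sqlite3 module. This is a minimal sketch, not the book's method: the table `marks`, its columns `s1` to `s6` and `percentage`, the function `compute_percentage`, and the assumption that each subject is marked out of 100 are all hypothetical.

```python
import sqlite3

# Hypothetical schema: one row of marks per student plus a column for the result.
conn = sqlite3.connect(":memory:", isolation_level=None)  # we issue BEGIN/COMMIT ourselves
conn.execute(
    "CREATE TABLE marks (student_id INTEGER PRIMARY KEY, "
    "s1 INT, s2 INT, s3 INT, s4 INT, s5 INT, s6 INT, percentage REAL)"
)
conn.execute("INSERT INTO marks VALUES (1, 70, 80, 65, 90, 75, 60, NULL)")

MAX_MARKS = 100  # assumption: every subject is marked out of 100


def compute_percentage(conn, student_id):
    """Read six marks, total them and write the percentage back as ONE transaction."""
    conn.execute("BEGIN")
    try:
        marks = conn.execute(
            "SELECT s1, s2, s3, s4, s5, s6 FROM marks WHERE student_id = ?",
            (student_id,),
        ).fetchone()
        total = 0
        for mark in marks:          # read each subject and add it to the total
            total += mark
        percentage = total * 100 / (len(marks) * MAX_MARKS)
        conn.execute(
            "UPDATE marks SET percentage = ? WHERE student_id = ?",
            (percentage, student_id),
        )
        conn.execute("COMMIT")      # make the result permanent
    except Exception:
        conn.execute("ROLLBACK")    # leave no half-finished result behind
        raise
    return percentage
```

Either all of the reads, the calculation and the write take effect together, or, if anything raises an error, the ROLLBACK leaves the row exactly as it was.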
While the transaction is executing, the DBMS must ensure that the data items it uses
are not accessed by any other transaction in a way that interferes with it.
6.2 Properties of Transactions
Every transaction should have four properties, known as the ACID properties
(Atomicity, Consistency, Isolation and Durability).
1) Atomicity: A transaction is an atomic unit of processing; it is either performed in
its entirety or not performed at all.
2) Consistency Preservation: A correct execution of a transaction must take the
database from one consistent state to another.
3) Isolation: A transaction should not make its updates visible to other transactions
until it is committed. This property is important in a multi-user database environment,
where several users can access and update the database at the same time.
4) Durability or Permanency: Once a transaction changes the database and the
changes are committed, these changes must never be lost because of software or
hardware failures.
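Atomicity can be observed directly with Python's sqlite3 module. The sketch below assumes a hypothetical `account` table holding accounts A and B and a hypothetical `transfer` function; a failure is simulated after the debit, and the ROLLBACK restores both balances, so the transfer behaves as if it had never been performed at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit BEGIN/COMMIT/ROLLBACK
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INT)")
conn.execute("INSERT INTO account VALUES ('A', 1000), ('B', 2000)")  # hypothetical balances


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst; either both updates happen or neither does."""
    conn.execute("BEGIN")
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        if fail_midway:
            raise RuntimeError("simulated failure after the debit")
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # undo the debit as well: all or nothing
        raise


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))
```

Because the sum A + B is the same before and after a successful transfer, the same example also illustrates consistency preservation.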
Figure 6.2: An inconsistent transaction.
6.3 Transaction Terminology
1) Commit: A transaction that completes its execution successfully is called a
committed transaction. A committed transaction should always take the database
from one consistent state to another. The changes made by a committed transaction
must be stored permanently in the database, even if a system failure occurs later.
2) Abort: If there are no failures, all transactions complete successfully. However, a
transaction may not always complete its execution successfully; such a transaction is
called aborted.
3) Roll back: To preserve the atomicity property, all the changes made by an
aborted transaction must be undone. When we undo the changes of a transaction,
we say that the transaction has been rolled back.
4) Restart: If a transaction is aborted because of a hardware or software failure,
it can be restarted as a new transaction.
5) Kill: A transaction is killed if there is an internal logic error in it, or a
problem with its input or output.
6) Throughput: It is the average number of transactions completed in a given amount
of time.
7) Average response time: It is the average time taken by a transaction to complete
after it has been submitted.
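The two performance measures above reduce to simple arithmetic. The sketch below assumes each transaction is logged with a submission time and a completion time; the `TxnRecord` type and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TxnRecord:
    submitted: float  # time at which the transaction was submitted (seconds)
    completed: float  # time at which the transaction finished


def throughput(records, window_start, window_end):
    """Number of transactions completed per second within [window_start, window_end]."""
    done = [r for r in records if window_start <= r.completed <= window_end]
    return len(done) / (window_end - window_start)


def average_response_time(records):
    """Mean of (completion time - submission time) over all transactions."""
    return sum(r.completed - r.submitted for r in records) / len(records)
```

For example, four transactions submitted at times 0, 1, 2 and 3 and completed at 2, 4, 3 and 9 give a throughput of 4 / 10 = 0.4 transactions per second over the window [0, 10], and an average response time of (2 + 3 + 1 + 6) / 4 = 3 seconds.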
this check is successful, the transaction is said to have reached the commit point and
enters the committed state. Once a transaction is committed, it has finished its
execution successfully and all its changes must be recorded permanently in the
database.
The database system writes enough information to disk so that, even in case
of a failure, the updates of the transaction can be re-created when the system restarts.
When this information has been written, the transaction enters the committed state.
A transaction enters the failed state when the DBMS finds that the transaction
cannot continue its normal execution. Such a transaction is rolled back, and it then
enters the aborted state. After a transaction aborts, the system may either restart it or kill it.
Note: We should be careful while writing the changes to the database. Most systems
allow such writes only after the transaction has entered the committed state.
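The state changes described above can be modelled as a small state machine. This is a sketch under assumptions: the names `State`, `Transaction` and `restart` are hypothetical, and the transition from partially committed to failed follows the usual textbook state diagram, since the figure for this section is not reproduced here.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    COMMITTED = "committed"
    FAILED = "failed"
    ABORTED = "aborted"


# Allowed transitions (partially committed -> failed assumed from the usual diagram).
TRANSITIONS = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},
    State.FAILED: {State.ABORTED},       # a failed transaction is rolled back
    State.COMMITTED: set(),              # terminal state
    State.ABORTED: set(),                # terminal; a restart is a NEW transaction
}


class Transaction:
    def __init__(self, tid):
        self.tid = tid
        self.state = State.ACTIVE        # every transaction starts in the active state

    def move_to(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.value} -> {new_state.value}")
        self.state = new_state


def restart(aborted_txn, new_tid):
    """Restarting an aborted transaction means submitting it again as a new transaction."""
    if aborted_txn.state is not State.ABORTED:
        raise ValueError("only an aborted transaction can be restarted")
    return Transaction(new_tid)
```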
program, then suspend that program and execute some commands from the next
program, and so on. A suspended program resumes from the point where it was
suspended when it next gets its turn on the CPU. In this way, the execution of many
programs is interleaved.
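Interleaving can be imitated with Python generators, where each `yield` marks a point at which a program may be suspended and later resumed. The functions `program` and `round_robin` and the fixed time slice are illustrative assumptions, not how any particular operating system schedules work.

```python
from collections import deque


def program(name, steps):
    """A program as a sequence of steps; each yield is a point where it can be suspended."""
    for i in range(1, steps + 1):
        yield f"{name}.step{i}"


def round_robin(programs, quantum=1):
    """Run up to `quantum` steps of one program, suspend it, and move on to the next.
    A suspended program later resumes exactly where it stopped."""
    ready = deque(programs)
    trace = []
    while ready:
        prog = ready.popleft()
        for _ in range(quantum):
            try:
                trace.append(next(prog))
            except StopIteration:
                break               # program finished: do not requeue it
        else:
            ready.append(prog)      # used its whole slice: back of the queue
    return trace
```

With a slice of one step, a three-step program P1 and a two-step program P2 run as P1.step1, P2.step1, P1.step2, P2.step2, P1.step3.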
Figure 6.6: Read-item (X) operation.
2) Write-item (X): Writes the value of a program variable X into the database
item named X. Write-item (X) includes the following steps:
o Find the address of the disk block that contains item X.
o Copy the contents of the disk block into a buffer in main memory.
o Copy item X from the program variable named X into its correct location in
the buffer.
o Store the updated block from the buffer back to disk.
Note: Here we assume that the program variable and the database item have the
same name.
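The Write-item steps, together with the corresponding Read-item steps, can be simulated with an in-memory "disk" of blocks and a buffer pool. Everything here is hypothetical: `disk`, `block_address` and `buffer_pool` stand in for real storage structures; the Read-item steps are the usual ones (find the block, buffer it, copy the item out) and are assumed, since Figure 6.6 is not reproduced; and the block is written back immediately, whereas a real DBMS may delay that write.

```python
# Hypothetical storage model: the "disk" is a list of blocks, each block a dict of items.
disk = [{"A": 1000, "B": 2000}, {"S": 10000, "M": 5000}]
block_address = {"A": 0, "B": 0, "S": 1, "M": 1}  # item name -> disk block number
buffer_pool = {}                                   # block number -> in-memory copy of the block


def read_item(name):
    addr = block_address[name]                 # 1. find the disk block holding the item
    if addr not in buffer_pool:                # 2. copy the block into a main-memory buffer
        buffer_pool[addr] = dict(disk[addr])   #    (skipped if it is already buffered)
    return buffer_pool[addr][name]             # 3. copy the item into the program variable


def write_item(name, value):
    addr = block_address[name]                 # 1. find the disk block holding the item
    if addr not in buffer_pool:                # 2. copy the block into a main-memory buffer
        buffer_pool[addr] = dict(disk[addr])
    buffer_pool[addr][name] = value            # 3. copy the program variable into the buffer
    disk[addr] = dict(buffer_pool[addr])       # 4. store the updated block back to disk
```

A program would use these as `S = read_item("S")`, then `S = S - 1000`, then `write_item("S", S)`.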
6.7 Concurrency Control
In multi-user systems, programs are executed concurrently, i.e. at the same
time. Concurrency control is a mechanism used to ensure that concurrently executing
programs do not interfere with one another. The major task of concurrency control
protocols is to manage the concurrent operations that executing programs perform on
the database. As we have seen, a database system manages data and allows fast
storage and retrieval of that data, and multiple users can concurrently share the
data stored in the database. The database remains consistent if all the programs
only read the data. Problems arise when some programs read data in the database
while another program updates the same data, which can leave the database in an
inconsistent state.
Several problems may occur if concurrent executions are not controlled. We
will now study some of these problems. Consider a simple example of a transfer of
money between two accounts. Suppose there is a joint account (an account operated
by two persons) of Mr. Sharma and Mrs. Sharma. Mr. Sharma wants to make a
payment of Rs. 1000/- to Mr. Malhotra, so he performs a transaction T1 to
transfer the money to Mr. Malhotra’s account. At the same time, Mrs. Sharma
performs a transaction T2 to deposit Rs. 2000/- in their joint account. We refer to
the joint account as “S” and Mr. Malhotra’s account as “M”. The problems
that can arise are as follows:
As shown above, transaction T1 first reads the value of data item S and subtracts
Rs. 1000/- from it, so T1’s value of S becomes Rs. 9000/-. At this point T1 is
suspended and transaction T2 starts its execution. T2 reads the same value of S
that T1 read, because T1 has not yet written its result. T2 adds Rs. 2000/- to it,
which makes T2’s value of S Rs. 12000/-. T2 is then suspended and T1 resumes:
T1 writes its updated value, 9000, to the database, reads the value of M and is
suspended again. T2 then performs its write operation and writes its value of S,
12000, to the database. The update made by T1 is lost, leaving the database in an
inconsistent state: the final value of S is Rs. 12000/-, whereas after transactions T1
and T2 it should be Rs. 11000/-. This is known as the lost update problem.
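The lost update can be replayed step by step. In this sketch the dictionary `db` plays the database and each transaction keeps its own local copy of S; the function names are hypothetical, the opening balance of Rs. 10000/- is inferred from the figures in the text (10000 - 1000 = 9000), and T1's read of M is omitted because it does not affect S.

```python
def run_lost_update(initial_s=10000):
    db = {"S": initial_s}
    t1_s = db["S"]      # T1: read_item(S)
    t1_s -= 1000        # T1: S := S - 1000, then T1 is suspended
    t2_s = db["S"]      # T2: read_item(S) still sees the old value
    t2_s += 2000        # T2: S := S + 2000, then T2 is suspended
    db["S"] = t1_s      # T1: write_item(S) stores 9000
    db["S"] = t2_s      # T2: write_item(S) stores 12000, overwriting T1's update
    return db["S"]


def run_serial(initial_s=10000):
    db = {"S": initial_s}
    db["S"] = db["S"] - 1000   # T1 runs to completion first
    db["S"] = db["S"] + 2000   # then T2 runs
    return db["S"]
```

The interleaved run ends with 12000 while the serial run ends with the correct 11000.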
Here, transaction T1 changes the value of S and then fails, so the system must
discard (roll back) this change. Before that can be done, transaction T2 reads the
temporary value of S, which is incorrect. Reading such an uncommitted value is
called a dirty read.
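The dirty read can be replayed in the same way. The sketch assumes the same hypothetical opening balance of Rs. 10000/- for S and models T1's rollback by restoring a saved before-image; `run_dirty_read` is a hypothetical name.

```python
def run_dirty_read(initial_s=10000):
    db = {"S": initial_s}
    before_image = db["S"]      # saved so that T1's change can be undone
    db["S"] = db["S"] - 1000    # T1: write_item(S) stores 9000 (not committed)
    t2_s = db["S"]              # T2: read_item(S) sees 9000, a dirty read
    db["S"] = before_image      # T1 fails: its change is rolled back to 10000
    return t2_s, db["S"]        # what T2 read vs. the committed value
```

T2 ends up working with 9000, a value of S that never existed in any committed state of the database.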
individually. She performs a transaction to calculate the balance of her
account and the joint account. We refer to this new account as “X”.
Here, transaction T2 reads the value of X before it is updated and the value of
S after it is updated, so the total balance it computes is incorrect. This is known
as the incorrect summary problem.
Another problem that may occur is the unrepeatable read. Here, transaction T1
reads an item twice, and the item is changed by another transaction T2 between
the two reads. As a result, T1 receives different values for its two reads of the same
item even though T1 itself has not modified the item in between.
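The unrepeatable read can be reproduced with the same style of simulation. The starting value of X (Rs. 5000/-), the Rs. 2000/- update made by T2 and the function name are hypothetical.

```python
def run_unrepeatable_read(initial_x=5000):
    db = {"X": initial_x}
    first = db["X"]             # T1: first read_item(X)
    db["X"] = db["X"] + 2000    # T2: read_item(X), X := X + 2000, write_item(X)
    second = db["X"]            # T1: second read_item(X) sees a different value
    return first, second
```

T1 obtains 5000 and then 7000 for the same item, although T1 itself never wrote X.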
6.8 Schedules
Instructions in a transaction are executed in a particular sequence to
accomplish the task. Schedules are used to represent the sequence in which the
instructions are executed. A schedule is a time-ordered sequence of the
instructions of the transactions. A schedule S of n transactions T1, T2, …, Tn is an
ordering of the operations of the transactions subject to the constraint that, for
each transaction Ti in S, the operations of Ti must appear in S in the same
order in which they occur in Ti.
Instructions from another transaction Tj may be interleaved with the
operations of Ti in S. A schedule for a set of transactions must consist of all the
instructions of those transactions and must maintain the order in which they
appear in each individual transaction.
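The two conditions on a schedule (it contains every instruction of every transaction and nothing else, and it keeps each transaction's own order) can be checked mechanically by projecting the schedule onto each transaction. The function `is_valid_schedule` and the (transaction, operation) pair encoding are assumptions made for illustration.

```python
def is_valid_schedule(schedule, transactions):
    """
    schedule: list of (txn_id, operation) pairs in execution order.
    transactions: dict txn_id -> list of operations in that transaction's own order.
    Valid iff the schedule holds every instruction of every transaction (and nothing
    else) and keeps each transaction's internal order.
    """
    projected = {tid: [] for tid in transactions}
    for tid, op in schedule:
        if tid not in projected:
            return False                 # instruction from a transaction not in the set
        projected[tid].append(op)
    # Projecting the schedule onto each Ti must give back exactly Ti.
    return all(projected[tid] == ops for tid, ops in transactions.items())
```

Interleaving T1 and T2 is allowed, but swapping two instructions of the same transaction, or leaving one out, makes the sequence invalid.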
Consider a simple banking system that has a number of accounts and a set
of transactions that access and update those accounts. Consider two
transactions T1 and T2, which transfer funds from one account to another.
Transaction T1 transfers Rs.500 from account A to account B; it is defined as follows:
Figure 6.11: Transaction T1.
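Figure 6.11 is not reproduced here, so the sketch below assumes the usual read, compute, write sequence for a transfer: read(A), A := A - 500, write(A), read(B), B := B + 500, write(B). The function name `t1` and the opening balances of Rs.1000 and Rs.2000 are hypothetical.

```python
def t1(db):
    """Transaction T1: transfer Rs.500 from account A to account B."""
    a = db["A"]      # read(A)
    a = a - 500      # A := A - 500
    db["A"] = a      # write(A)
    b = db["B"]      # read(B)
    b = b + 500      # B := B + 500
    db["B"] = b      # write(B)


db = {"A": 1000, "B": 2000}   # hypothetical opening balances
t1(db)
```

Run on its own, T1 leaves A = 500 and B = 2500, so the sum A + B stays 3000; this is the consistency that a bad interleaving of T1 with T2 can destroy.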
Figure 6.13: A serial schedule T1 followed by T2.
Figure 6.15: A concurrent schedule.
The final state is an inconsistent state. After executing the schedule S4, the values
of A and B are Rs.500 and Rs.2125 respectively.
It is the job of the concurrency control mechanism to ensure that any schedule
executed concurrently leaves the database in the same state as some serial execution
of the same transactions. This is done to ensure the consistency of the database.
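One common way to check that a concurrent schedule is equivalent to a serial one is the conflict-serializability test using a precedence graph. This technique is not described in this section and is shown only as an illustration. Operations are encoded as hypothetical (transaction, 'R' or 'W', item) triples. An edge from Ti to Tj is added whenever an operation of Ti conflicts with a later operation of Tj, and the schedule is conflict serializable if and only if the graph has no cycle.

```python
from collections import defaultdict


def conflicts(op1, op2):
    """Operations conflict if they come from different transactions,
    touch the same item, and at least one of them is a write."""
    t1, a1, x1 = op1
    t2, a2, x2 = op2
    return t1 != t2 and x1 == x2 and "W" in (a1, a2)


def precedence_graph(schedule):
    """Edge Ti -> Tj when an operation of Ti conflicts with a LATER operation of Tj."""
    graph = defaultdict(set)
    for i, earlier in enumerate(schedule):
        for later in schedule[i + 1:]:
            if conflicts(earlier, later):
                graph[earlier[0]].add(later[0])
    return graph


def is_conflict_serializable(schedule):
    """Conflict serializable iff the precedence graph has no cycle (checked by DFS)."""
    graph = precedence_graph(schedule)
    state = {}  # txn -> "visiting" or "done"

    def has_cycle(node):
        state[node] = "visiting"
        for nxt in graph[node]:
            if state.get(nxt) == "visiting":
                return True          # back edge: a cycle exists
            if nxt not in state and has_cycle(nxt):
                return True
        state[node] = "done"
        return False

    txns = {op[0] for op in schedule}
    return not any(t not in state and has_cycle(t) for t in txns)
```

Applied to the lost-update interleaving R1(S), R2(S), W1(S), W2(S), the graph has edges in both directions between T1 and T2, so that schedule is not conflict serializable.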
6.9 Recoverability
If a transaction fails, we have to undo its effects and bring the database back to
a consistent state. When a transaction aborts, every transaction that depends on it
(for example, by having read data it wrote) must also be aborted and rolled back. To
make this possible, restrictions are imposed on the types of schedules permitted in
the system. We will now see these types of schedules.
Consider two transactions T1 and T2, where T2 has just one
instruction, read(A), and commits before T1 does. Suppose T1 fails before it
commits. Since T2 has read the value of data item A written by T1, we should
abort T2 as well. But T2 has already committed and cannot be aborted, so it is
impossible to recover correctly from the failure of T1. Such a schedule is called a
non-recoverable schedule.
Figure 6.25: A non-recoverable schedule
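The recoverability condition can be checked by tracking which transactions each transaction reads from. In this sketch the name `is_recoverable` and the (transaction, action, item) encoding are assumptions; 'C' and 'A' denote commit and abort, and, as a simplification, an abort does not reset the record of the last writer. The schedule of Figure 6.25 is assumed to contain T1's write(A) before T2's read(A).

```python
def is_recoverable(schedule):
    """
    schedule: list of (txn, action, item) with action in {'R', 'W', 'C', 'A'}
    (item is None for 'C' and 'A').
    Recoverable: if Tj reads an item last written by Ti (Ti != Tj),
    then Ti must commit before Tj commits.
    """
    last_writer = {}     # item -> transaction that wrote it most recently
    reads_from = {}      # transaction -> set of transactions whose writes it read
    committed = set()
    for txn, action, item in schedule:
        if action == "W":
            last_writer[item] = txn
        elif action == "R":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                reads_from.setdefault(txn, set()).add(writer)
        elif action == "C":
            # Every transaction this one read from must already have committed.
            if not reads_from.get(txn, set()) <= committed:
                return False
            committed.add(txn)
        # 'A' (abort) needs no bookkeeping in this simplified check.
    return True
```

In the situation described above, T2 commits while T1 has not, so the check fails at T2's commit.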
If T1 aborts for some reason, then transactions T2 and T3 must also be
rolled back, because they have read the value of A that was updated by T1.
Note: Cascading rollbacks can undo a large amount of work.
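The set of transactions that one failure drags down can be computed by following reads of values written by transactions already in the set. The function `cascade_rollback` and its encoding are hypothetical. Since values flow forward in time, a single pass over the schedule also catches indirect dependencies, such as T3 reading a value written by T2 after T2 read from T1.

```python
def cascade_rollback(schedule, failed):
    """
    Return the transactions that must be rolled back when `failed` aborts: the failed
    transaction plus every transaction that, directly or indirectly, read a value
    written by a transaction already in the set.
    schedule: list of (txn, action, item) in execution order, action 'R' or 'W'.
    """
    to_rollback = {failed}
    last_writer = {}                       # item -> most recent writer
    for txn, action, item in schedule:
        if action == "W":
            last_writer[item] = txn
        elif action == "R" and last_writer.get(item) in to_rollback:
            to_rollback.add(txn)           # read a value that will be undone
    return to_rollback
```

For the schedule R1(A), W1(A), R2(A), W2(A), R3(A), R4(B), a failure of T1 forces T1, T2 and T3 to roll back, while T4 is unaffected.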
read operation of Tj. Such a schedule is called a Cascadeless Schedule. In this type
of schedule, cascading rollbacks cannot occur.
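The cascadeless condition can be checked in the same style: at each read, the transaction that wrote the value must already have committed. The name `is_cascadeless` and the encoding are again hypothetical.

```python
def is_cascadeless(schedule):
    """
    Cascadeless: whenever Tj reads an item previously written by Ti (Ti != Tj),
    Ti must already have committed at the time of that read.
    schedule: list of (txn, action, item), action in {'R', 'W', 'C'}.
    """
    last_writer = {}     # item -> most recent writer
    committed = set()
    for txn, action, item in schedule:
        if action == "W":
            last_writer[item] = txn
        elif action == "R":
            writer = last_writer.get(item)
            if writer is not None and writer != txn and writer not in committed:
                return False   # reading uncommitted data could force a cascade
        elif action == "C":
            committed.add(txn)
    return True
```

For example, W1(A), R2(A), C1, C2 is recoverable (T1 commits before T2) but not cascadeless, because T2 reads A before T1 commits; W1(A), C1, R2(A), C2 is cascadeless.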
Summary
A transaction is a unit of program execution that accesses and updates various
data items.
A transaction is a single unit of database operations that is either completed
entirely or not performed at all. A transaction must either complete or abort;
no intermediate state is allowed.
ACID Properties
o Atomicity: A transaction is an atomic unit of processing; it is either
performed in its entirety or not performed at all.
o Consistency Preservation: A correct execution of a transaction must take
the database from one consistent state to another.
o Isolation: A transaction should not make its updates visible to other
transactions until it is committed.
o Durability: Once a transaction changes the database and the changes are
committed, these changes must never be lost.
Aborted: A transaction that does not complete its execution successfully is
called aborted.
Roll Back: When we undo the changes of a transaction we say that the transaction
has been rolled back.
Transaction States
o Active: This is the initial state. The transaction is in this state while it is
executing.
o Partially Committed: The entire transaction has been executed, but not
yet committed.
o Failed: The transaction cannot execute normally.
o Aborted: The transaction has been rolled back and the database has been
restored to its prior state.
o Committed: The transaction has completed successfully.
Operations on Transaction
o Read-item (X): Reads the database item named X into a program variable.
o Write-item (X): Writes the value of a program variable X into the
database item named X.
Schedules: Schedules are used to represent the sequence in which the instructions
are executed in the system.
Recoverable Schedule: A Recoverable Schedule is a schedule in which, if a
transaction Tj reads a data item previously written by a transaction Ti, then the
commit operation of Ti should appear before the commit operation of Tj.
Cascadeless Schedule: For each pair of transactions Ti and Tj such that Tj reads a
data item previously written by Ti, the commit operation of Ti should appear
before the read operation of Tj. Such a schedule is called a Cascadeless Schedule.
Strict Schedule: In this schedule, transactions cannot read or write a data item
‘X’ until the last transaction that wrote ‘X’ has committed or
aborted.
2) Which of the following is a transaction state when the normal execution of the
transaction cannot proceed?
a) Failed b) Active c) Aborted d) Committed
*********************************