DBMS
www.gradeup.co
1 INTRODUCTION TO DBMS
1. BASICS OF DBMS
Databases are organized according to fields and records so that they are easily
searched.
Field: A field is a single unit of data that is unique within each entry/row, while the overall
data category is common to all entries. A field is the smallest entity of the table; it
contains specific information about every record in the table.
Record: Records are composed of fields, each of which contains one item of
information. A set of records constitutes a file. For example, a personnel file might
contain records that have three fields: a name field, an address field, and a phone
number field.
Database: A database is a collection of inter-related data that is used to retrieve, insert
and delete data efficiently. It also organizes the data in the form of tables, schemas,
views, reports, etc.
For example: The college Database organizes the data about the admin, staff, students
and faculty etc.
Using the database, the information can be easily retrieved, inserted, and deleted.
2. DBMS
a. Data Definition Language: DDL stands for Data Definition Language. It is used
to define database structure or pattern. Data definition language is used to store
the information of metadata like the number of tables and schemas, their names,
indexes, columns in each table, constraints, etc.
Here are some tasks that come under DDL:
Create: It is used to create objects in the database.
Alter: It is used to alter the structure of the database.
Drop: It is used to delete objects from the database.
Truncate: It is used to remove all records from a table.
Rename: It is used to rename an object.
Comment: It is used to comment on the data dictionary.
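The DDL statements above can be exercised in any SQL engine. Below is a minimal sketch using Python's built-in sqlite3 module; the table and column names are invented for the demo, and note that SQLite has no TRUNCATE, so an unqualified DELETE plays that role here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# CREATE: define a new object (a table) in the database.
cur.execute("CREATE TABLE student (roll INTEGER PRIMARY KEY, name TEXT)")

# ALTER: change the structure of an existing object.
cur.execute("ALTER TABLE student ADD COLUMN dept TEXT")

# SQLite lacks TRUNCATE; DELETE without a WHERE clause removes all rows.
cur.execute("INSERT INTO student VALUES (1, 'Asha', 'CS')")
cur.execute("DELETE FROM student")
remaining = cur.execute("SELECT COUNT(*) FROM student").fetchone()[0]

# DROP: remove the object (structure and all) from the database.
cur.execute("DROP TABLE student")
print(remaining)  # 0
```

Running the script shows the table being created, reshaped, emptied and finally dropped, mirroring the CREATE/ALTER/TRUNCATE/DROP tasks listed above.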
b. Data Manipulation Language: DML stands for Data Manipulation Language. It is
used for accessing and manipulating data in a database. It handles user requests.
Select: It is used to retrieve data from a database.
Insert: It is used to insert data into a table.
Update: It is used to update existing data within a table.
Delete: It is used to delete records from a table.
Merge: It performs UPSERT operation, i.e., insert or update operations.
Call: It is used to call a PL/SQL or Java subprogram.
Explain Plan: It describes the access path the engine will use to execute a query.
Lock Table: It controls concurrency.
c. Data Control Language: DCL stands for Data Control Language. It deals with the rights,
permissions and other access controls of the database system.
Grant: It is used to give a user access privileges to the database.
Revoke: It is used to take back the privileges granted to a user.
3. DATA INDEPENDENCY
Data independence means a user can access data without knowing how the data is stored in the database files.
In DBMS there are two types of data independence:
3.1. Logical Data Independence:
• It is the capacity to change the conceptual (logical) schema without having to change the
external schemas or application programs.
• It is difficult to achieve, as retrieval of data depends heavily on the logical structure of
the data.
3.2. Physical Data Independence:
• It is the capacity to change the physical storage of the data without having to change the
conceptual schema.
• It is comparatively easy to achieve.
• More I/O cost to access required data from database files.
• Less I/O cost to access data, because of indexing of the database file.
5. RDBMS RULES
6. RELATION
Tables: In the relational data model, relations are stored in the form of tables. This format stores
the relation among entities. A table has rows and columns, where rows represent records and
columns represent attributes.
Tuple: A single row of a table, which contains a single record for that relation is called a tuple.
Cardinality: Total number of rows present in the Table.
Relation instance: A finite set of tuples in the relational database system represents relation
instance. Relation instances do not have duplicate tuples.
Relation schema: A relation schema describes the relation name (table name), attributes,
and their names.
Column: The column represents the set of values for a specific attribute.
Relation key: Each row has one or more attributes, known as relation key, which can identify
the row in the relation (table) uniquely.
Attribute domain: Every attribute has some pre-defined value scope, known as attribute
domain.
8. DBMS ARCHITECTURE
• Security, Data Backup, Recovery, Concurrency Control and Low Data Redundancy are
some of the features of a 3-Tier architecture, which makes it the most widely used
database architecture.
9. ER DIAGRAM
10. ENTITY
11. ATTRIBUTE
d. Derived Attribute
• An attribute that can be derived from another attribute is known as a derived attribute.
• It can be represented by a dashed ellipse.
For example, A person's age changes over time and can be derived from another
attribute like Date of birth.
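Since a derived attribute can always be recomputed, it is normally not stored. A small sketch of deriving age from a stored date of birth (the function and field names here are our own, for illustration):

```python
from datetime import date

def age_from_dob(dob: date, today: date) -> int:
    """Derive the 'age' attribute from the stored 'date of birth' attribute."""
    # Subtract one year if this year's birthday has not happened yet.
    before_birthday = (today.month, today.day) < (dob.month, dob.day)
    return today.year - dob.year - before_birthday

print(age_from_dob(date(2000, 6, 15), date(2024, 6, 14)))  # 23
print(age_from_dob(date(2000, 6, 15), date(2024, 6, 15)))  # 24
```

Storing only Date of birth and deriving age on demand also avoids the update anomaly of an age column going stale.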
12. RELATIONSHIP
ii. Partial Participation: An entity in the entity set may or may NOT participate in the
relationship. If some courses are not enrolled in by any student, the participation of
Course is partial.
The diagram depicts the ‘Enrolled in’ relationship set, with the Student entity set having total
participation and the Course entity set having partial participation.
Every student in the Student entity set participates in the relationship, but there exists a course
C4 which does not take part in the relationship.
Weak Entity Type and Identifying Relationship:
• A weak entity set does not possess sufficient attributes to form a primary key. It depends
on another entity set, called the identifying (owner) entity set.
• For example, a company may store the information of dependents (parents, children,
spouse) of an employee, but a dependent has no existence without the employee.
So Dependent is a weak entity type and Employee is the identifying entity type for
Dependent.
• The participation of a weak entity type in its identifying relationship is always total.
• The relationship between a weak entity type and its identifying strong entity type is called
an identifying relationship, and it is represented by a double diamond.
****
2 DATABASE DESIGN
1. CONSTRAINTS
These are conditions that must hold for a valid relation.
Relational model constraints are restrictions specified on the data values in the relational
database.
There are many types of integrity constraints. Constraints on a relational database
management system are mostly divided into three main categories:
a. Domain Constraints:
It specifies that the value taken by an attribute must be an atomic value from its domain.
Domain constraints are violated if an attribute value does not appear in the corresponding
domain or is not of the appropriate data type.
Domain constraints can be violated if an attribute value is not appearing in the corresponding
domain or it is not of the appropriate data type.
b. Key Constraints:
Every relation in the database should have at least one set of attributes that defines a tuple
uniquely.
The values of these key attributes must be unique across the different tuples of the relation.
c. Referential Integrity Constraints:
The referential integrity constraint is based on the concept of foreign keys.
When one attribute of a relation can only take values from another attribute of the same
relation or of another relation, it is called referential integrity.
A foreign key is an attribute of a relation that refers to a key of the same or another
relation.
2. TYPES OF CONSTRAINTS
Using the default value constraint, a default value is set for an attribute. If no value is
specified for an attribute on which a default constraint is defined, the attribute takes the
specified default value.
Example:
create table instructor
(instructor_id varchar (5),
name varchar (20) not null,
depart_name varchar (5),
salary numeric (8,2) default 0);
This command specifies that while inserting a tuple in instructor relation if no value is
provided for the salary attribute then its value is set to be 0.
c. Check Clause:
Check is used to impose conditions, such as checking whether a value is greater than or
less than a particular value; thus upper and lower limits can be set.
The check clause constraint ensures that when a new tuple is inserted into the relation it
must satisfy the predicate specified in the check clause.
Example:
create table Student
(Student_id varchar (5) , name varchar (20) not null, depart_name varchar (20),
primary key (Student_id),
check (depart_name in ('Comp.Sci.', 'Elec.Eng.', 'Physics', 'Biology')));
According to the SQL standard, the predicate that is placed inside the check clause can
be a subquery. But, today’s widely used database products do not allow the predicate of
check clause to contain a subquery.
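The Student example above can be exercised directly. A sketch with Python's sqlite3 (SQLite enforces CHECK and, like the products mentioned, rejects subqueries inside it); the inserted names are invented for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Student (
        Student_id  VARCHAR(5) PRIMARY KEY,
        name        VARCHAR(20) NOT NULL,
        depart_name VARCHAR(20)
            CHECK (depart_name IN ('Comp.Sci.', 'Elec.Eng.', 'Physics', 'Biology'))
    )
""")

conn.execute("INSERT INTO Student VALUES ('S1', 'Asha', 'Physics')")  # predicate holds

try:
    # 'History' is not in the allowed list, so the predicate fails.
    conn.execute("INSERT INTO Student VALUES ('S2', 'Ben', 'History')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

stored = conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0]
print(rejected, stored)  # True 1
```

The violating tuple is rejected at insert time, so only the valid row is stored.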
2.3. Referential Integrity Constraint:
• Referential integrity constraint makes sure that the values in the column on which it
is applied are already present in the column it is referring to.
• Thus, here a column of a table refers to the other column of the same or different
table. This ensures that the values are consistent and similar in both the columns. This
is implemented using:
Foreign key:
• Foreign key normally references the primary key of same or another table. But it can
refer to other columns too.
• Whenever the same type of attribute exists in two different tables, the attribute in one of
the tables is declared the primary key and in the other it is made a foreign key, so
that the values in both remain consistent.
• A foreign key is dependent on a primary key.
• It is defined over two tables:
i. Referenced Relation: The relation to which other relations refer is called the referenced
relation.
ii. Referencing Relation: The relation which refers to another relation is called the
referencing relation.
3. ANOMALIES
An anomaly is an irregularity, or something which deviates from the expected or normal state.
When designing databases, there are three types of anomalies: Insert, Update and Delete.
With a referential integrity constraint:
• A row cannot be inserted into the referencing relation if the referencing attribute’s value
is not present in the referenced relation.
• A row of the referenced relation cannot be deleted or updated if the value of the referenced
attribute is in use in the referencing relation, unless a referential action is specified.
1. Insertion (into the referenced relation): no violation.
2. Deletion (from the referenced relation):
a. On delete no action: if the deleted value is used in the referencing relation, a
violation occurs and the deletion is rejected.
b. On delete cascade: if we delete a primary key value from the referenced relation,
the tuples using it in the referencing relation are deleted as well.
c. On delete set null: if we delete a primary key value from the referenced relation, the
corresponding values in the referencing relation are set to null.
3. Updation:
a. On update no action
b. On update cascade
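These referential actions can be observed with Python's sqlite3; note that SQLite enforces foreign keys only when the pragma is switched on, and the dept/emp tables here are invented for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.execute("CREATE TABLE dept (dept_id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE emp (
        emp_id  INTEGER PRIMARY KEY,
        dept_id INTEGER REFERENCES dept(dept_id) ON DELETE CASCADE
    )
""")
conn.execute("INSERT INTO dept VALUES (10)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", [(1, 10), (2, 10)])

# Deleting the referenced row cascades the delete to the referencing rows.
conn.execute("DELETE FROM dept WHERE dept_id = 10")
remaining = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]
print(remaining)  # 0
```

With ON DELETE SET NULL in place of ON DELETE CASCADE, the two emp rows would instead survive with dept_id set to NULL.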
Before:
A C
2 4
3 4
4 3
5 4
6 2
Result:
A C
3 4
4 3
5 4
****
3 NORMALIZATION
1. EQUALITY OF FD SET
(ii) F covers E: Every FD of E can be inferred from F, i.e. E ⊆ F⁺.
E covers F | F covers E | Conclusion
Yes | No | E ⊃ F
No | Yes | F ⊃ E
2
www.gradeup.co
Example 1:
F = {A → B, B → C, C → A}
G = {A → BC, B → AC, AB → C, BC → A}
Which is true?
A. F ⊂ G
B. F ⊃ G
C. F = G
D. None
Sol:
Step 1: Check whether every FD of G can be inferred from F (compute closures under F).
A⁺ = ABC, so A → BC holds
B⁺ = BAC, so B → AC holds
AB⁺ = ABC, so AB → C holds
BC⁺ = BCA, so BC → A holds
∴ F covers G.
Step 2: Check whether every FD of F can be inferred from G (compute closures under G).
A⁺ = ABC, so A → B holds
B⁺ = BCA, so B → C holds
C⁺ = C, so C → A does not hold under G
∴ G does not cover F.
Since F covers G but G does not cover F, F ⊃ G. Answer: B.
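Both directions of the check reduce to attribute-closure computations: an FD X → Y follows from a set of FDs iff Y ⊆ X⁺. A small sketch (a helper of our own, with FDs written as (lhs, rhs) pairs):

```python
def closure(attrs, fds):
    """X+: repeatedly add the RHS of every FD whose LHS is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

F = [("A", "B"), ("B", "C"), ("C", "A")]
G = [("A", "BC"), ("B", "AC"), ("AB", "C"), ("BC", "A")]

print(sorted(closure("A", F)))  # ['A', 'B', 'C'] -> A -> BC holds under F
print(sorted(closure("C", G)))  # ['C']           -> C -> A fails under G
```

The loop runs until no FD fires, so the result is the full closure regardless of the order in which FDs are listed.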
2. MINIMAL COVER
3. PROPERTIES OF DECOMPOSITION
The process of breaking up or dividing a single relation into two or more sub-relations is called
decomposition of a relation. Decomposition is a tool that eliminates redundancy; we must also
check whether a decomposition is able to recover the original relation.
a. Lossless Join Decomposition
b. Dependency Preserving Decomposition
3.1. Lossless Join Decomposition
A relation schema R with instance (r) is decomposed into sub-relations R1, R2, R3, …, Rn.
In general [R1 ⋈ R2 ⋈ …… ⋈Rn] ⊇ r
• Decomposition is lossless if it is feasible to reconstruct relation R from the decomposed
tables using joins. No information is lost from the relation when it is decomposed; the
join results in the same original relation:
if [R1 ⋈ R2 ⋈ …… ⋈ Rn] = r (lossless join decomposition)
• Decomposition is lossy when, after a relation is decomposed into two or more relational
schemas, the original relation cannot be exactly retrieved: the join of the sub-relations
does not result in the same relation R that was decomposed.
The natural join of the sub-relations then contains some extraneous (spurious) tuples:
if [R1 ⋈ R2 ⋈ …… ⋈ Rn] ⊃ r (lossy join decomposition)
3.1.1. To Check whether decomposition is lossy or lossless:
Consider a relation R decomposed into two sub relations R1 and R2.
Then,
• If all the following conditions satisfy, then the decomposition is lossless.
• If any of these conditions fail, then the decomposition is lossy.
5
www.gradeup.co
Step 1: Union of both the sub relations must contain all the attributes that are present
in the original relation R.
Thus:
R1 ∪ R2 = R
Step 2: Intersection of both the sub relations must not be null i.e. there must be some
common attribute which is present in both the sub relations.
Thus, R1 ∩ R2 ≠ ∅
Step 3: Intersection of both the sub relations must be a super key of either R1 or R2 or
both.
Thus:
R1 ∩ R2 = Super key of R1 or R2
Example 3:
Consider the relation R (A, B, C, D, E) with Functional Dependency set :
F = {AB → C , C → D, B → E}
Check whether lossy or lossless decomposition.
(i) Decompose R1 (ABC) and R2 (ABDE)
R1 ∪ R2 = R
(R1 ∩ R2)+ = (AB)+
= ABCDE
= super key for R2
∴ Lossless decomposition
(ii) R1 (ABC) and R2 (DE)
R1 ∪ R2 = R
R1 ∩ R2 = ϕ.
∴ Lossy decomposition.
(iii) Decompose R1 (ABC) and R2 (BDE)
R1 ∪ R2 = R
(R1 ∩ R2)+ = (B)+
= BE
(it is not a super key for any relation Hence, Lossy decomposition).
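The three-step test above can be sketched as code. A binary-decomposition checker of our own (it assumes step 1, R1 ∪ R2 = R, has been verified by the caller; FDs are (lhs, rhs) pairs):

```python
def closure(attrs, fds):
    """Attribute closure X+ under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def lossless(r1, r2, fds):
    common = set(r1) & set(r2)
    if not common:                 # step 2: intersection must be non-empty
        return False
    c = closure(common, fds)       # step 3: intersection must be a super key
    return set(r1) <= c or set(r2) <= c

F = [("AB", "C"), ("C", "D"), ("B", "E")]
print(lossless("ABC", "ABDE", F))  # True  ((AB)+ = ABCDE covers R2)
print(lossless("ABC", "DE", F))    # False (no common attribute)
print(lossless("ABC", "BDE", F))   # False (B+ = BE covers neither relation)
```

The three calls reproduce the three decompositions of Example 3.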
3.2. Dependency Preserving Decomposition
A relational schema R with functional dependency set F is decomposed into sub-relations R1,
R2, R3, …, Rn with FD sets F1, F2, …, Fn.
If we decompose a relation R into relations R1 and R2, all dependencies of R either must
be a part of R1 or R2 or must be derivable from combination of FD’s of R1 and R2.
Dependency preservation ensures-
• None of the functional dependencies that holds on the original relation are lost.
6
www.gradeup.co
• The sub relations still hold or satisfy the functional dependencies of the original relation.
In general,
F1 ∪ F2 ∪ F3 ∪ … ∪ Fn ⊆ F
The decomposition is dependency preserving iff (F1 ∪ F2 ∪ … ∪ Fn)⁺ = F⁺.
For instance, in a decomposition where F1 ∪ F2 ∪ F3 ∪ F4 = {A → B, B → C, C → D, D → BE} = F,
∴ the decomposition is dependency preserving.
4. NORMALIZATION
5. NORMAL FORMS
5.1. First Normal Form (1 NF):
A relation is in 1 NF iff every attribute value is atomic, i.e. each cell of the table
must be atomic.
Multivalued Dependency:
A multivalued dependency occurs when a table contains two or more independent multivalued
attributes.
It arises when two attributes in a table are independent of each other, but both depend
on a third attribute. It is denoted by →→.
For example:
P →→ Q
P →→ R
In the above case, a multivalued dependency exists only if Q and R are independent
attributes.
In the above table, we can see that students A and C have an interest in more than one activity.
This is a multivalued dependency, because the courses of a student are independent of the
activities, but both are dependent on the student.
Therefore, the multivalued dependencies are:
StudentName →→ CourseDiscipline
StudentName →→ Activities
5.2. Second Normal Form (2 NF)
Relation R is in 2 NF iff:
• R is in 1 NF
• R does not contain any partial dependency.
Partial Dependency:
Let R be a relational schema and X, Y, Z be non-empty sets of attributes, where
X: candidate key
Y: proper subset of X (or of any candidate key)
Z: non-prime attribute of R
Then a functional dependency Y → Z is called a partial dependency: a non-prime attribute
depends on only a part of a candidate key.
Note:
Always check the normal form from BCNF then 3 NF then 2 NF then 1 NF.
Example 5: R (A, B, C, D)
Let the FDs be
{AB → C,
C → A,
AC → D}
Find the highest normal form
Sol:
First we find the candidate keys.
A⁺ = A
AB⁺ = ABCD
BC⁺ = BCAD
∴ AB and BC are the candidate keys (prime attributes: A, B, C; non-prime: D).
Now check each FD:
FD: AB → C | C → A | AC → D
BCNF: ✓ | ✗ (C is not a super key) | ✗ (AC is not a super key)
3 NF: ✓ | ✓ (A is prime) | ✗ (AC is not a super key and D is non-prime)
2 NF: ✓ | ✓ | ✗ (C⁺ = ACD, so C → D; C is a proper part of candidate key BC and D is
non-prime, i.e. a partial dependency)
∴ The highest normal form of R is 1 NF.
****
4 SQL
1. SQL
• SQL is a standard language for storing, manipulating and retrieving data in databases.
• When the user wants to get some information from the database file, he/she can issue a
query.
• A query is a user-request to retrieve data with a certain condition.
• The user specifies a certain condition. The program will go through all the records in the
database file and select those records that satisfy the condition.
• The result of the query will be stored in the form of a table.
2. FEATURES OF SQL
select A1, A2, ..., An
from r1, r2, ..., rm
where P
Syntax:
INSERT INTO TABLE_NAME
(col1, col2, col3,.... col N)
VALUES (value1, value2, value3, .... valueN);
Or
INSERT INTO TABLE_NAME
VALUES (value1, value2, value3, .... valueN);
For example:
INSERT INTO gradeup (Author, Subject) VALUES ('Peter', 'DBMS');
b. UPDATE: This command is used to update or modify the value of a column in the
table.
Syntax:
UPDATE table_name
SET [column_name1= value1,...column_nameN = valueN]
[WHERE CONDITION]
For example:
UPDATE students
SET User_Name = 'Peter'
WHERE Student_Id = '3'
c. DELETE: It is used to remove one or more rows from a table.
Syntax:
DELETE FROM table_name
[WHERE condition];
For example:
DELETE FROM gradeup
WHERE Author = 'Peter';
4.3. Data Control Language: DCL stands for Data Control Language. It deals with the rights,
permissions and other access controls of the database system (e.g. GRANT and REVOKE).
4.4. SQL Commands:
a. SELECT: SELECT statements are used to fetch data from a database. Every query will
begin with SELECT.
SELECT column_name
FROM table_name;
b. SELECT DISTINCT: SELECT DISTINCT specifies that the statement is going to be a
query that returns unique values in the specified column(s).
SELECT DISTINCT column_name
FROM table_name;
4
www.gradeup.co
c. WHERE: WHERE is a clause that indicates you want to filter the result set to include
only rows where the following condition is true.
SELECT column_name(s)
FROM table_name
WHERE column_name operator value;
d. WITH: WITH clause lets you store the result of a query in a temporary table using an
alias. You can also define multiple temporary tables using a comma and with one instance
of the WITH keyword.
WITH temporary_name AS (
SELECT *
FROM table_name)
SELECT *
FROM temporary_name
WHERE column_name operator value;
e. LIKE: LIKE is a special operator used with the WHERE clause to search for a specific
pattern in a column.
SELECT column_name(s)
FROM table_name
WHERE column_name LIKE pattern;
f. DELETE: DELETE statements are used to remove rows from a table.
DELETE FROM table_name
WHERE some_column = some_value;
g. GROUP BY: GROUP BY is a clause in SQL that is typically used with aggregate functions.
It is used in collaboration with the SELECT statement to arrange identical data into
groups.
SELECT column_name, COUNT(*)
FROM table_name
GROUP BY column_name;
h. HAVING: HAVING was added to SQL because the WHERE keyword could not be used
with aggregate functions.
SELECT column_name, COUNT(*)
FROM table_name
GROUP BY column_name
HAVING COUNT(*) > value;
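GROUP BY and HAVING can be tried end-to-end with Python's sqlite3; the enrol table and its rows are invented for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrol (student TEXT, course TEXT)")
conn.executemany("INSERT INTO enrol VALUES (?, ?)",
                 [("A", "DBMS"), ("B", "DBMS"), ("C", "DBMS"),
                  ("A", "OS"), ("B", "OS"),
                  ("C", "CN")])

# GROUP BY forms one group per course; HAVING then filters whole groups.
rows = conn.execute("""
    SELECT course, COUNT(*)
    FROM enrol
    GROUP BY course
    HAVING COUNT(*) > 1
    ORDER BY course
""").fetchall()
print(rows)  # [('DBMS', 3), ('OS', 2)] -- CN (count 1) is filtered out
```

Note the division of labour: WHERE would filter individual rows before grouping, while HAVING filters the aggregated groups afterwards.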
i. INSERT: INSERT statements are used to add a new row to a table.
INSERT INTO table_name (column_1, column_2, column_3)
VALUES (value_1, 'value_2', value_3);
j. ORDER BY: ORDER BY is a clause that indicates you want to sort the result set by a
particular column either alphabetically or numerically.
SELECT column_name
FROM table_name
ORDER BY column_name ASC | DESC;
k. Aggregate Functions:
An aggregate function allows you to perform a calculation on a set of values to return a
single scalar value.
i. AVG () Function:
The AVG () function returns the average value of a numeric column.
Syntax: SELECT AVG (column_name) FROM table_name
ii. COUNT () Function:
The COUNT () function returns the number of rows that match a specified criterion.
• COUNT (column_name) syntax:
It returns the number of values of the specified column (NULL values are not counted):
SELECT COUNT (column_name) FROM table_name;
• COUNT(*) Syntax:
It returns the number of records in a table:
SELECT COUNT (*) FROM table_name;
• COUNT (DISTINCT column_name)Syntax:
It returns the number of distinct values of the specified column:
SELECT COUNT (DISTINCT column_name) FROM table_name;
iii. MAX () Function:
The MAX() function returns the largest value of the selected column.
Syntax: SELECT MAX (column_name) FROM table_name;
iv. MIN () Function:
The MIN () function returns the smallest value of the selected column.
Syntax: SELECT MIN (column_name) FROM table_name;
v. SUM () Function:
The SUM () function returns the total sum of a numeric column.
vi. LEN () Function:
The LEN () function returns the length of the value in a text field.
Syntax: SELECT LEN (column_name) FROM table_name;
vii. HAVING Clause: The HAVING clause was added to SQL because the WHERE keyword
could not be used with aggregate functions.
viii. GROUP BY Statement: The GROUP BY statement is used in conjunction with the
aggregate functions to group the result set by one or more columns.
• Left Outer Join
SELECT table1.column1,table1.column2,table2.column1,....
FROM table1
LEFT JOIN table2
ON table1.matching_column = table2.matching_column;
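The behaviour of LEFT JOIN, with unmatched left-side rows padded with nulls, can be seen with Python's sqlite3 (the emp and dept tables are invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, dept_id INTEGER)")
conn.execute("CREATE TABLE dept (dept_id INTEGER, dname TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", [("Ana", 1), ("Bo", 3)])
conn.executemany("INSERT INTO dept VALUES (?, ?)", [(1, "CS"), (2, "EE")])

# Bo's dept_id (3) has no match in dept, so dname comes back as NULL (None).
rows = conn.execute("""
    SELECT emp.ename, dept.dname
    FROM emp
    LEFT JOIN dept ON emp.dept_id = dept.dept_id
    ORDER BY emp.ename
""").fetchall()
print(rows)  # [('Ana', 'CS'), ('Bo', None)]
```

An inner join on the same data would drop Bo's row entirely; the left join keeps it and fills the missing dname with NULL.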
• Right Outer Join
RIGHT JOIN is similar to LEFT JOIN. This join returns all the rows of the table on the right
side of the join and the matching rows from the table on the left side. For rows with no
matching row on the left side, the result set contains null. RIGHT JOIN is also known as
RIGHT OUTER JOIN.
SELECT table1.column1,table1.column2,table2.column1,....
FROM table1
RIGHT JOIN table2
ON table1.matching_column = table2.matching_column;
• FULL JOIN:
The SQL FULL JOIN combines the results of both left and right outer joins.
The joined table will contain all records from both the tables and fill in NULLs for missing
matches on either side.
SELECT table1.column1, table2.column2...
FROM table1
FULL JOIN table2
ON table1.common_field = table2.common_field;
4.5. SQL Constraints:
• SQL Constraints are rules used to limit the type of data that can go into a table, to
maintain the accuracy and integrity of the data inside table.
• Constraints can be divided into the following two types,
o Column level constraints: The column level constraints are applied only to one
column.
o Table level constraints: Table level constraints are applied to the whole table.
Constraints are used to make sure that the integrity of data is maintained in the database.
Following are the most used constraints that can be applied to a table.
4.5.1. NOT NULL:
• Not Null constraint restricts a column from having a NULL value. Once NOT NULL
constraint is applied to a column, you cannot pass a null value to that column. It enforces
a column to contain a proper value.
• It cannot be defined at table level.
Write a query to select the persons with first name ‘Max’ and last name ‘Han’?
Sol:
SELECT *
FROM Persons
WHERE firstname = 'Max'
AND lastname = 'Han';
****
5 RELATIONAL MODEL
1. NoSQL
• NoSQL (originally referring to "non-SQL" or "non-relational") is a database that provides a
mechanism for the storage and retrieval of data.
• This data is modelled by means other than the tabular relations used in relational databases.
• NoSQL databases are used for distributed data stores with very large data storage needs.
1.1. NoSQL Features:
NoSQL Databases can have a common set of features such as:
• Non-relational data model.
• Runs well on clusters.
• Mostly open-source.
• Built for the new generation Web applications.
• Is schema-less.
1.2. Advantages of NoSQL:
There are many advantages of working with NoSQL databases such as MongoDB and
Cassandra. The main advantages are high scalability and high availability.
a. High scalability:
Partitioning of data and placing it on multiple machines in such a way that the order of
the data is preserved is sharding. NoSQL database use sharding for horizontal scaling.
Vertical scaling means adding more resources to the existing machine whereas horizontal
scaling means adding more machines to handle the data. Vertical scaling is not that easy
to implement but horizontal scaling is easy to implement. Examples of horizontal scaling
databases are MongoDB, Cassandra etc. NoSQL can handle huge amount of data because
of scalability, as the data grows NoSQL scale itself to handle that data in efficient manner.
b. High availability:
The auto-replication feature in NoSQL databases makes them highly available, because in
case of any failure the data replicates itself to the last consistent state.
2. PostgreSQL
3. RELATIONAL ALGEBRA
4. SELECTION (σp)
• It is a unary operator
• It is denoted by σp where σ is the selection operator and p is the predicate condition.
• This operator retrieves the records of relation R which satisfies the predicate condition p.
• It is commutative.
Example:
R: A B C
5 7 9
5 7 9
6 9 7
7 9 7
8 10 6
σA>5(R) = A B C
6 9 7
7 9 7
8 10 6
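The same selection can be sketched in Python, with a relation represented as a list of dicts (a representation chosen purely for illustration; the duplicate row is dropped since a relation is a set):

```python
def select(predicate, relation):
    """sigma_p(R): keep exactly the tuples of R that satisfy predicate p."""
    return [t for t in relation if predicate(t)]

R = [{"A": 5, "B": 7, "C": 9},
     {"A": 6, "B": 9, "C": 7},
     {"A": 7, "B": 9, "C": 7},
     {"A": 8, "B": 10, "C": 6}]

result = select(lambda t: t["A"] > 5, R)
print(result)  # the three tuples with A = 6, 7, 8

# Selection is commutative: the order of two selections does not matter.
assert select(lambda t: t["C"] < 8, select(lambda t: t["A"] > 5, R)) == \
       select(lambda t: t["A"] > 5, select(lambda t: t["C"] < 8, R))
```

The final assertion illustrates the commutativity bullet above.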
5. PROJECTION (∏)
• ∏attribute list (R) is used to project the required attributes from relation R; the remaining
attributes are discarded.
• Commutativity does not hold for project.
• The number of tuples in a PROJECT result is always less than or equal to the number of
tuples in R.
• ∏List 1 (∏List 2 (R)) = ∏List 1 (R) iff List 2 ⊇ List 1.
∏Subject, Author (Books): selects and projects the columns named Subject and Author from
the relation Books.
Example:
R: A B C
5 7 9
5 7 9
6 9 7
7 9 7
8 10 6
Π B,C(R) = B C
7 9
9 7
10 6
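Projection, including its duplicate elimination, can be sketched the same way (list-of-dicts representation, as in the selection sketch; the helper is our own):

```python
def project(attrs, relation):
    """pi_attrs(R): keep only the listed attributes, eliminating duplicates."""
    seen, out = set(), []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:          # set semantics: drop duplicate tuples
            seen.add(row)
            out.append(dict(zip(attrs, row)))
    return out

R = [{"A": 5, "B": 7, "C": 9},
     {"A": 5, "B": 7, "C": 9},
     {"A": 6, "B": 9, "C": 7},
     {"A": 7, "B": 9, "C": 7},
     {"A": 8, "B": 10, "C": 6}]

result = project(["B", "C"], R)
print(result)  # [{'B': 7, 'C': 9}, {'B': 9, 'C': 7}, {'B': 10, 'C': 6}]

# pi_list1(pi_list2(R)) = pi_list1(R) when list2 is a superset of list1:
assert project(["B"], project(["B", "C"], R)) == project(["B"], R)
```

The printed result matches the Π B,C(R) table above: five input rows shrink to three because duplicates are removed.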
6. RENAME (ρ)
• The results of relational algebra expressions do not have a name that can be used to refer
to them.
• The rename operation allows us to rename the output relation.
• The rename operation is denoted by the small Greek letter rho, ρ.
7. CROSS PRODUCT(X)
• Result is all attributes of R followed by all attributes of S and each record of R pairs with
every record of S.
• It is denoted by X.
• The result R X S contains one tuple (r, s) for each r ∈ R, s ∈ S.
• It is also called Cartesian product.
Example:
R: A B C
3 4 3
4 6 9
5 7 9
S: C D
5 4
9 5
R×S: A B C C D
3 4 3 5 4
3 4 3 9 5
4 6 9 5 4
4 6 9 9 5
5 7 9 5 4
5 7 9 9 5
8. SET OPERATIONS
8.1. Union (∪):
R: A B C
4 7 4
5 7 7
6 2 4
S: A B C
4 7 4
7 2 4
R ∪ S: A B C
4 7 4
5 7 7
6 2 4
7 2 4
8.2. Intersection (∩):
• The set intersection operation contains all tuples that are in both R & S.
• It is denoted by intersection ∩.
• It defines a relation consisting of the set of all tuples that are in both A and B; however, A
and B must be union compatible.
• It is also defined in terms of set difference as: R ∩ S = R − (R − S)
Example:
R: A B C
4 7 4
5 7 7
6 2 4
S: A B C
4 7 4
7 2 4
R ∩ S: A B C
4 7 4
8.3. Set Difference (−):
R: A B C
4 7 4
5 7 7
6 2 4
S: A B C
4 7 4
7 2 4
R − S: A B C
5 7 7
6 2 4
9. JOIN
• A Join operation combines related tuples from different relations, if and only if a given join
condition is satisfied.
• Join operation is essentially a cartesian product followed by a selection criterion.
• It is denoted by ⋈.
• There are 4 types of join:
➢ Natural Join
➢ Conditional Join
➢ Equi Join
➢ Outer Join
9.1. Natural Join:
• A natural join can only be performed if there is a common attribute (column) between
the relations.
• The name and type of the attribute must be the same.
• Although the natural join is defined via a cross product, the duplicate common columns
are removed from the result by the projection:
R ⋈ S = π distinct attributes (σ equality between same name attributes from R, S (R × S))
Example:
R: A B C
3 4 3
4 6 9
5 7 9
S: C D
5 4
9 5
R ⋈ S: A B C D
4 6 9 5
5 7 9 5
9.2. Conditional Join:
A conditional join combines tuples using an arbitrary condition on the attributes. For the
relations above, a conditional join such as R ⋈ A>3 S gives:
A B C C D
4 6 9 5 4
4 6 9 9 5
5 7 9 5 4
5 7 9 9 5
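A natural join can be sketched exactly as the definition says: a cross product, an equality filter on the shared attributes, and a collapse of the duplicate columns (list-of-dicts representation, as in the earlier sketches; the helper is our own):

```python
def natural_join(R, S):
    """R |x| S: pair tuples that agree on all common attributes."""
    common = set(R[0]) & set(S[0])
    out = []
    for r in R:                # cross product ...
        for s in S:
            if all(r[a] == s[a] for a in common):   # ... with equality selection
                out.append({**r, **s})              # common columns appear once
    return out

R = [{"A": 3, "B": 4, "C": 3},
     {"A": 4, "B": 6, "C": 9},
     {"A": 5, "B": 7, "C": 9}]
S = [{"C": 5, "D": 4},
     {"C": 9, "D": 5}]

result = natural_join(R, S)
print(result)  # two tuples: (4, 6, 9, 5) and (5, 7, 9, 5)
```

Only the R rows with C = 9 find a partner in S, and C appears once in each output tuple.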
9.3. Equi Join:
• It is a special case of the inner (conditional) join.
• When the conditional join uses only an equivalence condition, it becomes an equi join.
• It is based on matched data as per the equality condition; the equi join uses the
comparison operator (=).
9.4. Outer Joins:
Let us use two tables R and S:
R: A B C
4 6 9
3 4 3
5 7 9
S: C D
9 5
5 4
i. Left Outer Join:
• The left outer join operation keeps all tuples of the left relation.
• It is denoted by ⟕.
• If no matching tuple is found in the right relation, the attributes of the right
relation in the join result are filled with null values.
R ⟕ S = A B C D
4 6 9 5
5 7 9 5
3 4 3 Null
ii. Right Outer Join:
• The right outer join operation keeps all tuples of the right relation.
• It is denoted by ⟖.
• If no matching tuple is found in the left relation, the attributes of the left
relation in the join result are filled with null values.
R ⟖ S = A B C D
4 6 9 5
5 7 9 5
Null Null 5 4
iii. Full Outer Join:
• In a full outer join, all tuples from both relations are included in the result,
irrespective of the matching condition.
• It is denoted by ⟗.
R ⟗ S = ( R ⟕ S) ∪ (R ⟖ S)
A B C D
4 6 9 5
5 7 9 5
3 4 3 Null
Null Null 5 4
10. DIVISION
• In the tuple relational calculus, you will have to find tuples for which a predicate is true. The
calculus is dependent on the use of tuple variables.
• The tuple relational calculus is specified to select the tuples in a relation. In TRC, filtering
variable uses the tuples of a relation.
• The result of the relation can have one or more tuples.
Notation:
1. {T | P (T)} or {T | Condition (T)}
Where
T is the resulting tuples
P(T) is the condition used to fetch T.
For example:
1. {T.name | Author(T) AND T.article = 'database' }
OUTPUT: This query selects the tuples from the AUTHOR relation. It returns a tuple with 'name'
from Author who has written an article on 'database'.
TRC (tuple relation calculus) can be quantified. In TRC, we can use Existential (∃) and Universal
Quantifiers (∀).
Example 1: {t | EMPLOYEE (t) and t.SALARY>10000} : Implies that it selects the tuples
from EMPLOYEE relation such that resulting employee tuples will have salary greater than
10000. It is example of selecting a range of values.
Example 2: {t | EMPLOYEE (t) AND t.DEPT_ID = 10}: This selects all the tuples of
employees who work for department 10.
The variable used in the condition is called a tuple variable. In the above examples, t.SALARY
and t.DEPT_ID are tuple variables. In the first example, we specified the condition
t.SALARY > 10000. What does it mean? For all SALARY > 10000, display the employees.
Here SALARY is called a bound variable. Any tuple variable with a ‘for all’ (∀)
or ‘there exists’ (∃) condition is called a bound variable. For any range of values of SALARY
greater than 10000, the meaning of the condition remains the same. Bound variables are those
ranges of tuple variables whose meaning does not change if the tuple variable is replaced by
another tuple variable.
In the second example, we used DEPT_ID = 10, i.e. display employee details only for
DEPT_ID = 10. Such a variable is called a free variable. Any tuple variable without a ‘for
all’ (∀) or ‘there exists’ (∃) condition is called a free variable. If we change DEPT_ID in this
condition to some other variable, say EMP_ID, the meaning of the query changes: for example,
EMP_ID = 10 results in a different result set. Free variables are those ranges of tuple variables
whose meaning changes if the tuple variable is replaced by another tuple variable.
All the conditions used in a tuple expression are called a well-formed formula (WFF). The
conditions in the expression are combined using logical operators such as AND, OR and NOT,
and quantifiers such as ‘for all’ (∀) and ‘there exists’ (∃). A WFF in which all the tuple variables
are bound is called a closed WFF; in an open WFF, there is at least one free variable.
For example:
1. {R| ∃T ∈ Authors(T.article='database' AND R.name=T.name)}
Output: This query will yield the same result as the previous one.
****
6 FILE STRUCTURE
1. FILE ORGANIZATION
• File Organization refers to the logical relationships among various records that constitute
the file, particularly with respect to the means of identification and access to any specific
record.
• In simple terms, storing the records of a file in a certain order is called file organization.
• File Structure refers to the format of the label and data blocks and of any logical control
record.
• The database is stored as a collection of files.
• File is a collection of blocks.
• A block is a collection of records.
• A record is a sequence of fields.
For storing the records within the blocks there are two ways:
a. Spanned Organization:
• Allows part of a record to be in one block and the rest of it to be on the next block.
• When (portions of) a single record may lie in different blocks, due to their large size, then
such records are called Spanned Records.
• No internal fragmentation except possibly in the last block of the file, hence no memory is wasted.
• Searching time is higher, since a single record may require accessing more than one block.
b. Unspanned Organization:
• When each record is restricted to fit entirely within one block due to its small size, such
records are called Unspanned Records.
• Each complete record is stored in a single block.
• Internal fragmentation is present (leftover space in a block is wasted).
• Searching time is lower.
The blocking factor for a file is the average number of file records stored in a disk block.
Blocking factor = ⌊Block size / Record size⌋ records/block
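The blocking-factor arithmetic above can be sketched as follows (the block and record sizes are made-up values for illustration):

```python
import math

def blocking_factor(block_size, record_size):
    """Unspanned organization: only whole records fit in a block, so floor."""
    return block_size // record_size

def blocks_needed(num_records, bfr):
    """Number of blocks required to hold the file, rounding up."""
    return math.ceil(num_records / bfr)

bfr = blocking_factor(block_size=512, record_size=100)   # 5 records/block
blocks = blocks_needed(num_records=1000, bfr=bfr)        # 200 blocks
print(bfr, blocks)
```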
3. I/O COST
Number of disk blocks (database file blocks) required to transfer from disk to main memory in
order to access required records.
• Average number of blocks accessed by linear search: ⌈(# of record blocks) / 2⌉
• Average number of blocks accessed by binary search: ⌈log2 (# of record blocks)⌉
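These two access-cost formulas can be evaluated directly (the block count below is a made-up example):

```python
import math

def linear_search_cost(num_blocks):
    # On average, half the file's blocks are scanned before a hit.
    return num_blocks / 2

def binary_search_cost(num_blocks):
    # Requires the blocks to be ordered on the search field.
    return math.ceil(math.log2(num_blocks))

print(linear_search_cost(1024), binary_search_cost(1024))  # 512.0 10
```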
4.2. Direct access file organization
• Direct access file is also known as random access or relative file organization.
• In a direct access file, all records are stored on a direct access storage device (DASD),
such as a hard disk. The records are placed randomly throughout the file.
• The records need not be in sequence, because they are updated directly and
rewritten back to the same location.
• This file organization is useful when immediate access to large amounts of information is
needed. It is used for accessing large databases.
• It is also called hashing (hashed file organization).
4.3. Indexed sequential access file organization
• Indexed sequential access file combines both sequential file and direct access file
organization.
• In an indexed sequential access file, records are stored on a direct access device
such as a magnetic disk, ordered by a primary key.
• This file can have multiple keys. The key on which the records are ordered is called the
primary key; keys may be alphanumeric.
• The data can be accessed either sequentially or randomly using the index. The index is
stored in a file and read into memory when the file is opened.
5. INDEXING
• Indexing is a data structure technique used to quickly locate and access records in a
database file.
• Indexing optimizes the performance of a database by minimizing the number of
disk accesses required when a query is processed.
• Indexes are created using a few database columns.
➢ The first column is the Search key that contains a copy of the primary key or candidate
key of the table. These values are stored in sorted order so that the corresponding data
can be accessed quickly. The data may or may not be stored in sorted order.
➢ The second column is the Data Reference or Pointer which contains a set of pointers
holding the address of the disk block where that particular key value can be found.
There are two types of indexing:
Index File:
Each index file consists of two fields: a search key and a pointer.
Hence, size of an index entry = size of search key + size of pointer.
• Blocking factor of the index file = ⌊(Block size − Block header) / (Search key + Pointer)⌋ entries/block
• Number of first (single) level index blocks = ⌈(# of record blocks) / (index blocking factor)⌉
• Number of block accesses = ⌈log2 (# of first level index blocks)⌉ + 1
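Under hypothetical sizes, the index formulas above work out as follows (all byte sizes here are made-up illustrations):

```python
import math

def index_blocking_factor(block_size, header, key_size, ptr_size):
    """Index entries that fit in one block."""
    return (block_size - header) // (key_size + ptr_size)

def first_level_index_blocks(record_blocks, ibf):
    """Sparse index: one entry per record block, rounded up to index blocks."""
    return math.ceil(record_blocks / ibf)

def access_cost(record_blocks, ibf):
    # Binary search over the first-level index, plus one data-block access.
    return math.ceil(math.log2(first_level_index_blocks(record_blocks, ibf))) + 1

ibf = index_blocking_factor(block_size=512, header=12, key_size=15, ptr_size=10)
print(ibf, first_level_index_blocks(1000, ibf), access_cost(1000, ibf))  # 20 50 7
```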
b. Sparse Index:
• One index entry exists for each block (set of records) of the database file.
• An index record appears only for some of the items in the data file; each entry points to a block.
• Instead of pointing to every record in the main table, the index points to records in the
main table with gaps in between.
• In a sparse index, number of index entries = number of blocks.
Ordered Indices: The indices are usually sorted to make searching faster. The indices
which are sorted are known as ordered indices.
Unordered Indices: The indices which are not sorted are known as unordered indices.
5.2. Indexing Models:
a. Primary Index:
• The index itself is an ordered file.
• The index is created on the primary key of the table.
• It is a sparse index on an ordered (sorted) key attribute.
• To retrieve a record with key value K, a binary search is performed on the index file to find
the appropriate index entry i, and then the data file block whose address is P(i) is retrieved.
• I/O cost to access a record using a primary index with a multilevel index of K levels is
K + 1 block accesses.
b. Clustered Index:
• A clustered index can be defined as an ordered data file.
• The records are sorted over the attribute name which is not a primary key.
• The data file is ordered on a non-key field.
• Records with the same value of the clustering field are grouped into a single cluster.
• In a clustering index, each pointer points to the first block of a cluster; subsequent blocks
are reached through the next-block pointers.
• It is a sparse index on an ordered, non-key attribute.
• I/O cost to access one cluster of records using a clustering index with a multilevel index of
K levels is K plus one or more block accesses, until the first block of the next cluster is reached.
c. Secondary Index:
• It is a dense index on unordered data file.
• In secondary indexing, to reduce the size of mapping, another level of indexing is
introduced.
• In this method, the huge range for the columns is selected initially so that the mapping
size of the first level becomes small.
• Then each range is further divided into smaller ranges.
• The mapping of the first level is stored in the primary memory, so that address fetch
is faster.
• The mapping of the second level and actual data are stored in the secondary memory
(hard disk).
d. Multi-level index:
• Index records comprise search-key values and data pointers. Multilevel index is stored
on the disk along with the actual database files.
• As the size of the database grows, so does the size of the indices. If single-level index
is used, then a large size index cannot be kept in memory which leads to multiple disk
accesses.
• Multi-level Index helps in breaking down the index into several smaller indices in order
to make the outermost level so small that it can be saved in a single disk block, which
can easily be accommodated anywhere in the main memory.
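The level-building described above can be sketched as a small loop (the function name and the example numbers are illustrative):

```python
import math

def multilevel_index_levels(first_level_blocks, ibf):
    """Keep indexing the index until the top level fits in one block.
    ibf = index blocking factor (entries per index block)."""
    levels, blocks = 1, first_level_blocks
    while blocks > 1:
        blocks = math.ceil(blocks / ibf)   # each level indexes the one below
        levels += 1
    return levels

# 50 first-level index blocks, 20 entries per index block:
# level 2 has ceil(50/20) = 3 blocks, level 3 has 1 block -> 3 levels
print(multilevel_index_levels(50, 20))  # 3
```

The record-access cost is then (number of levels) + 1: one block read per level, plus the data block itself.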
6. B+ TREE
Structure:
Let, K is the key, B is the block pointer, and R is the record pointer.
Balancing Conditions:
• Every internal node except the root contains at least ⌈p/2⌉ block pointers and ⌈p/2⌉ − 1 keys.
• An internal node can contain at most p pointers and (p − 1) keys.
• The root node has at least 2 block pointers and 1 key, and at most p block pointers and (p − 1) keys.
• A leaf node contains at least ⌈p/2⌉ − 1 keys and at most (p − 1) keys.
6.1. B+ Tree Insertion:
Descend to the leaf where the key fits.
• If the node has an empty space, insert the key/reference pair into the node.
• If the node is already full, split it into two nodes, distributing the keys evenly between
the two nodes. If the node is a leaf, take a copy of the minimum value in the second of
these two nodes and repeat this insertion algorithm to insert it into the parent node. If
the node is a non-leaf, exclude the middle value during the split and repeat this insertion
algorithm to insert this excluded value into the parent node.
Example: Insert a record 60 into the structure given below, with order P = 5.
• After insertion, the 3rd leaf node would have the values (50, 55, 60, 65, 70), which
overflows; the parent entry for this leaf is currently 50.
• Split the leaf node in the middle so that the tree's balance is not altered: group (50, 55)
and (60, 65, 70) into two leaf nodes.
• 60 is copied up to the parent level, because the two newly created leaf nodes cannot
both hang off the entry 50; hence 60 is added to the parent.
• In a non-overflowing scenario, simply find the leaf node where the key fits and place it
there.
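The leaf-split step in the worked example can be sketched as follows (the function name and the list-based leaf are illustrative, not from the text):

```python
def insert_into_leaf(keys, key, order):
    """Insert key into a sorted B+ tree leaf that holds at most order-1 keys.
    Returns (left, right, separator); right and separator are None when
    no split is needed."""
    keys = sorted(keys + [key])          # place the key in sorted position
    if len(keys) <= order - 1:
        return keys, None, None          # non-overflowing case
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    # In a B+ tree the separator is COPIED up: it also stays in the right leaf.
    return left, right, right[0]

# Insert 60 into the leaf (50, 55, 65, 70) with order P = 5:
print(insert_into_leaf([50, 55, 65, 70], 60, order=5))
# ([50, 55], [60, 65, 70], 60)
```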
6.2. B+ Tree Deletion:
Descend to the leaf where the key exists.
• Remove the required key and associated reference from the node.
• If the node still has enough keys and references to satisfy the invariants, stop.
• If the node has too few keys to satisfy the invariants, but its next oldest or next
youngest sibling at the same level has more than necessary, distribute the keys between
this node and the neighbor. Repair the keys in the level above to represent that these
nodes now have a different “split point” between them; this involves simply changing a
key in the levels above, without deletion or insertion.
• If the node has too few keys to satisfy the invariant, and the next oldest or next
youngest sibling is at the minimum for the invariant, then merge the node with its sibling;
if the node is a non-leaf, we will need to incorporate the “split key” from the parent into
our merging. In either case, we will need to repeat the removal algorithm on the parent
node to remove the “split key” that previously separated these merged nodes — unless
the parent is the root and we are removing the final key from the root, in which case the
merged node becomes the new root (and the tree has become one level shorter than
before).
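The borrow-vs-merge decision in the deletion steps above can be sketched as a small helper (the names and the key-count threshold are illustrative):

```python
def underflow_action(node_keys, sibling_keys, min_keys):
    """Decide how to repair a B+ tree leaf after a key removal.
    min_keys is the invariant minimum, e.g. ceil(p/2) - 1 for order p."""
    if len(node_keys) >= min_keys:
        return 'ok'          # invariants still hold, stop
    if len(sibling_keys) > min_keys:
        return 'borrow'      # sibling can spare a key; fix parent separator
    return 'merge'           # combine with sibling; remove parent separator

# Order p = 5, so a leaf needs at least ceil(5/2) - 1 = 2 keys:
print(underflow_action([50], [55, 60, 65], min_keys=2))  # borrow
print(underflow_action([50], [55, 60], min_keys=2))      # merge
```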
Example: Delete 60 from the diagram given below. Here, order P =5.
Sol:
• To delete 60, remove it from the 4th leaf node as well as from the intermediate node.
• After removing it from the intermediate node, the tree no longer satisfies the B+ tree
invariants, so the nodes must be re-arranged.
• After deleting node 60 from above B+ tree and re-arranging the nodes, it will show as
follows:
7. B TREE
Sol: This insertion will cause a split. The middle key will go up to the parent.
(Figure: deleting 6.)
• Case 2: If key k is in node x and x is an internal node, there are three cases to consider:
a. If the child y that precedes k in node x has at least t keys (more than the minimum),
find the predecessor key k' in the subtree rooted at y. Recursively delete k' and
replace k with k' in x.
b. Symmetrically, if the child z that follows k in node x has at least t keys, find the
successor k' and delete and replace as before. Note that finding k' and deleting it can be
performed in a single downward pass.
c. Otherwise, if both y and z have only t − 1 (the minimum number of) keys, merge k and all of
z into y, so that both k and the pointer to z are removed from x; y now contains 2t − 1
keys. Then recursively delete k from y.
(Figure: 7 deleted.)
• Case 3: If key k is not present in internal node x, determine the root of the
appropriate subtree that must contain k. If that root has only t − 1 keys, execute either
of the following two cases to ensure that we descend to a node containing at least t keys.
a. If the root of the subtree has only t − 1 keys but has a sibling with at least t keys, give it
an extra key by moving a key from x down into it, moving a key from its immediate left or
right sibling up into x, and moving the appropriate child pointer from the sibling to it.
(Figure: 2 deleted.)
b. If the root of the subtree and all of its siblings have t − 1 keys, merge it with one
sibling. This involves moving a key down from x into the new merged node to become the
median key of that node.
(Figure: 4 deleted.)
8. B TREE VS B+ TREE
1. B Tree: search keys cannot be stored repeatedly. B+ Tree: redundant search keys can be present (keys are repeated in the leaf nodes).
****
DBMS
1. TRANSACTION
A transaction is a single logical unit of work which accesses and possibly modifies the contents
of a database. Transactions access data using read and write operations.
a. Read Operation:
• Read operation reads the data from the database and then stores it in the buffer in main
memory.
• For example- Read(X) instruction will read the value of X from the database and will
store it in the buffer in main memory.
• Steps are:
➢ Find the block that contains data item X.
➢ Copy the block to a buffer in the main memory.
➢ Copy item X from the buffer to the program variable named X.
b. Write Operation:
• Write operation writes the updated data value back to the database from the buffer.
• For example- Write(X) will write the updated value of X from the buffer to the
database.
• Steps are:
➢ Find the address of the block which contains data item X.
➢ Copy the disk block into a buffer in main memory.
➢ Update data item X in the main-memory buffer.
➢ Store the updated block from the buffer back to disk.
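The read and write steps above can be sketched with a toy one-item-per-block model (the block id and values below are made up for illustration):

```python
# Hypothetical model: one data item per block.
disk = {'block_X': 100}      # persistent storage: block containing item X
buffer = {}                  # main-memory buffer pool

def read_item(block_id):
    buffer[block_id] = disk[block_id]    # copy the block into the buffer
    return buffer[block_id]              # copy the item to a program variable

def write_item(block_id, value):
    buffer[block_id] = value             # update the item in the buffer
    disk[block_id] = buffer[block_id]    # store the block back to disk

x = read_item('block_X')       # Read(X)
write_item('block_X', x + 50)  # Write(X)
print(disk['block_X'])         # 150
```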
2. ACID PROPERTIES
• To ensure the consistency of the database, certain properties are followed by all the
transactions occurring in the system.
• It is important to ensure that the database remains consistent before and after each
transaction.
• These properties are called the ACID properties of a transaction.
A = Atomicity
C = Consistency
I = Isolation
D = Durability
Atomicity:
• This property ensures that a transaction either executes all of its operations, including
commit, or none of them, i.e. the transaction occurs completely or does not occur at all.
• It is the responsibility of Transaction Control Manager to ensure atomicity of the
transactions.
Consistency:
• This means that integrity constraints must be maintained i.e. It ensures that the database
remains consistent before and after the transaction.
• Transaction must be logically correct, and correctness should be monitored by the user.
• It is the responsibility of DBMS and application programmer to ensure consistency of the
database.
Isolation:
• Execution of one transaction is isolated from that of other transactions.
• It ensures that concurrent execution of transaction results in a system state that would be
obtained if transaction were executed serially, i.e., one after the other.
• It is the responsibility of concurrency control manager to ensure isolation for all the
transactions.
Durability:
• The changes applied to the database by a successfully committed transaction must persist
in the database.
• It also ensures that these changes exist permanently and are never lost even if there occurs
a failure of any kind.
• It is the responsibility of recovery manager to ensure durability in the database.
3. SCHEDULE
A schedule is a sequence that indicates the chronological order in which the instructions of
concurrent transactions are executed.
Transaction T1 Transaction T2
R(A)
W(A)
R(B)
W(B)
Commit
R(A)
W(B)
Commit
Example:
Transaction T1 Transaction T2
R(A)
W(B)
R(A)
R(B)
W(B)
Commit
R(B)
Commit
a. Read-Write Conflict:
Ti Tj
R(A) ⋮
⋮ W(A)
b. Write-Read Conflict
Ti Tj
W(A) ⋮
⋮ R(A) → uncommitted
read
c. Write-Write Conflict:
Ti Tj
W(A) ⋮
⋮ W(A)
Transaction T1 Transaction T2
R(A)
W(B)
Failure
4. SERIALIZABILITY
• It is the classical concurrency scheme. It ensures that a schedule for executing concurrent
transactions is equivalent to one that executes the transactions serially in some order.
• Some non-serial schedules may lead to inconsistency of the database.
• Serializability is a concept that helps to identify which non-serial schedules are correct and
will maintain the consistency of the database.
Conflict Equivalent:
Two schedules are conflict equivalent if one can be transformed to another by swapping
non-conflicting operations.
Two schedules are said to be conflict equivalent if and only if:
1. They contain the same set of transactions.
2. Conflicting pairs of operations must have the same precedence in both schedules.
The conflict-equivalent serial schedules are the topological orders of the precedence graph of
schedule S.
Example: S2 is conflict equivalent to S1 (S1 can be converted to S2 by swapping non-
conflicting operations).
Schedule S1:
T1: Read(A)
T1: Write(A)
T2: Read(A)
T2: Write(A)
T1: Read(B)
T1: Write(B)
T2: Read(B)
T2: Write(B)

Schedule S2:
T1: Read(A)
T1: Write(A)
T1: Read(B)
T1: Write(B)
T2: Read(A)
T2: Write(A)
T2: Read(B)
T2: Write(B)
Schedule S2 is a serial schedule because, in this, all operations of T1 are performed before
starting any operation of T2. Schedule S1 can be transformed into a serial schedule by
swapping non-conflicting operations of S1.
After swapping of non-conflict operations, the schedule S1 becomes:
T1: Read(A)
T1: Write(A)
T1: Read(B)
T1: Write(B)
T2: Read(A)
T2: Write(A)
T2: Read(B)
T2: Write(B)
Hence, S1 is conflict serializable.
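The conflict-serializability test above can be sketched as code: build the precedence graph and check it for cycles (the schedule encoding and function names are illustrative):

```python
def precedence_graph(schedule):
    """schedule: list of (txn, op, item) with op in {'R', 'W'}.
    Edge Ti -> Tj whenever an operation of Ti conflicts with a later
    operation of Tj (same item, different txns, at least one write)."""
    edges = set()
    for i, (t1, o1, x1) in enumerate(schedule):
        for t2, o2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and 'W' in (o1, o2):
                edges.add((t1, t2))
    return edges

def is_conflict_serializable(schedule):
    """Conflict serializable iff the precedence graph is acyclic
    (checked here with Kahn's topological sort)."""
    edges = precedence_graph(schedule)
    txns = {t for t, _, _ in schedule}
    indeg = {t: 0 for t in txns}
    for _, v in edges:
        indeg[v] += 1
    queue = [t for t in txns if indeg[t] == 0]
    seen = 0
    while queue:
        u = queue.pop()
        seen += 1
        for a, b in edges:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    queue.append(b)
    return seen == len(txns)

S1 = [('T1', 'R', 'A'), ('T1', 'W', 'A'), ('T2', 'R', 'A'), ('T2', 'W', 'A'),
      ('T1', 'R', 'B'), ('T1', 'W', 'B'), ('T2', 'R', 'B'), ('T2', 'W', 'B')]
print(is_conflict_serializable(S1))  # True: the only edge is T1 -> T2
```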
4.2. View Serializability:
• A schedule is view serializable if it is view equivalent to a serial schedule.
• Every conflict-serializable schedule is view serializable; a view-serializable schedule that
is not conflict serializable must contain blind writes.
View Equivalent Schedules:
Consider two schedules S1 and S2 each consisting of two transactions T1 and T2
Schedules S1 and S2 are called view equivalent if the following three conditions hold true
for them-
i. Initial readers must be same for all the data items:
For each data item X, if transaction Ti reads X from the database initially in schedule S1,
then in schedule S2 also, Ti must perform the initial read of X from the database.
5. RECOVERABILITY
5.1. Irrecoverable Schedule: It is not possible to roll back after the commit of a
transaction, because the old value of the data is no longer available anywhere.
Example:
5.2. Recoverable Schedule: If transaction Tj reads a data item previously written by Ti,
then Ti must commit before Tj commits.
Example:
Ti Tj
R(A)
W(A)
⋮ R(A)
Commit W(B)
Commit
5.3. Cascadeless Recoverable Schedule: For each pair of transactions Ti and Tj such that
Tj reads a data item previously written by Ti the commit operation of Ti appears before
the read operation of Tj. Every cascadeless schedule is also recoverable.
Example:
Ti Tj
R(A)
W(A)
Commit
R(A)
Commit
Cascading roll back: A single transaction failure leads to a series of transaction
rollbacks.
5.4. Strict Recoverable Schedule: If transaction Ti updates data item A, no other
transaction Tj is allowed to perform R(A) or W(A) until Ti commits or rolls back.
6. CONCURRENCY CONTROL
Concurrency control is the procedure in DBMS for managing simultaneous operations without
them conflicting with each other.
Different concurrency control protocols offer different benefits between the amount of
concurrency they allow and the amount of overhead that they impose.
• Lock-Based Protocols
• Two Phase Locking Protocols
• Timestamp-Based Protocols
• In the second part, the transaction acquires all the locks. The third phase is started as
soon as the transaction releases its first lock.
• In the third phase, the transaction cannot demand any new locks. It only releases the
acquired locks.
Problems in 2PL:
• Irrecoverability
• Deadlock
• Starvation
Strict Two-Phase Locking:
• Basic 2PL in which all exclusive locks are held until commit/rollback.
• It ensures serializability and strict recoverability.
Problems in Strict 2PL:
• Starvation
• Deadlock
6.3. Time-Stamp Protocol:
• This protocol ensures that all conflicting read and write operations are executed in
timestamp order.
• The older transaction has higher priority, which is why it executes first. To determine
the timestamp of a transaction, this protocol uses system time or a logical counter.
• Lock-based protocols manage the order between conflicting pairs of transactions at
execution time, whereas timestamp-based protocols start working as soon as a transaction
is created.
Basic Timestamp ordering protocol works as follows:
Let,
TS(Ti) denotes the timestamp of the transaction Ti.
R_TS(X) denotes the Read time-stamp of data-item X.
W_TS(X) denotes the Write time-stamp of data-item X.
1. Whenever a transaction Ti issues a Read(X) operation, check the following:
• If W_TS(X) > TS(Ti), then the operation is rejected and Ti is rolled back.
• If W_TS(X) <= TS(Ti), then the operation is executed.
• R_TS(X) is updated to max(R_TS(X), TS(Ti)).
2. Whenever a transaction Ti issues a Write(X) operation, check the following:
• If TS(Ti) < R_TS(X), then the operation is rejected and Ti is rolled back.
• If TS(Ti) < W_TS(X), then the operation is rejected and Ti is rolled back.
• Otherwise, the operation is executed and W_TS(X) is set to TS(Ti).
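The basic timestamp-ordering checks above can be sketched as follows (the dictionary-based R_TS/W_TS tables and the concrete timestamps are illustrative):

```python
# In-memory tables for R_TS(X) and W_TS(X); missing entries default to 0.
read_ts, write_ts = {}, {}

def read(ts, x):
    """Read(X) issued by a transaction with timestamp ts."""
    if write_ts.get(x, 0) > ts:
        return 'rollback'                    # a younger txn already wrote X
    read_ts[x] = max(read_ts.get(x, 0), ts)  # update R_TS(X)
    return 'ok'

def write(ts, x):
    """Write(X) issued by a transaction with timestamp ts."""
    if read_ts.get(x, 0) > ts or write_ts.get(x, 0) > ts:
        return 'rollback'                    # a younger txn already read/wrote X
    write_ts[x] = ts                         # update W_TS(X)
    return 'ok'

print(write(1, 'A'))   # ok: W_TS(A) = 1
print(read(2, 'A'))    # ok: R_TS(A) = 2
print(write(1, 'A'))   # rollback: the younger transaction already read A
```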
****