
www.gradeup.co

DBMS

1 INTRODUCTION TO DBMS

1. BASICS OF DBMS

Databases are organized according to fields and records so that they are easily
searched.
Field: A field is a single unit of data: each entry/row has its own value for it, but the
data category is common to all entries. A field is the smallest entity of the table and
holds one specific piece of information about every record in the table.
Record: Records are composed of fields, each of which contains one item of
information. A set of records constitutes a file. For example, a personnel file might
contain records that have three fields: a name field, an address field, and a phone
number field.
Database: A database is a collection of inter-related data, organized so that data can be
retrieved, inserted, and deleted efficiently. It also organizes the data in the form of
tables, schemas, views, reports, etc.
For example: The college Database organizes the data about the admin, staff, students
and faculty etc.
Using the database, the information can be easily retrieved, inserted, and deleted.

2. DBMS

• Database is a collection of data, and Management System is a set of programs
to store and retrieve those data.
• DBMS provides an interface to perform various operations like database
creation, storing data in it, updating data, creating a table in the database and
a lot more.
• It provides protection and security to the database. In the case of multiple
users, it also maintains data consistency.


2.1. Tasks performed by DBMS:


• Data Definition: It is used for creation, modification, and removal of definition that
defines the organization of data in the database.
• Data Updation: It is used for the insertion, modification, and deletion of the actual
data in the database.
• Data Retrieval: It is used to retrieve the data from the database which can be used
by applications for various purposes.
• User Administration: It is used for registering and monitoring users, maintaining data
integrity, enforcing data security, dealing with concurrency control, monitoring
performance, and recovering information corrupted by unexpected failures.
2.2. Functions of a DBMS
DBMS provides the following functions:
• Concurrency: concurrent access (meaning 'at the same time') to the same database
by multiple users.
• Security: security rules to determine access rights of users.
• Backup and recovery: processes to back-up the data regularly and recover data if a
problem occurs.
• Integrity: database structure and rules improve the integrity of the data.
• Data descriptions: a data dictionary provides a description of the data.
2.3. Advantages of DBMS:
• Reduced updating errors and increased consistency
• Improved data security
• Reduced data redundancy.
• Facilitated development of new applications program
• Greater data integrity and independence from applications programs
• Improved data access to users through use of host and query languages
• Reduced data entry, storage, and retrieval costs
2.4. Disadvantages of DBMS:
• Database systems are complex, difficult, and time-consuming to design
• Extensive conversion costs in moving from a file-based system to a database system
• Initial training required for all programmers and users.
• Substantial hardware and software start-up costs
• Damage to database affects virtually all applications programs.
2.5. Database Languages:
• A DBMS has appropriate languages and interfaces to express database queries and
updates.
• Database languages can be used to read, store and update the data in the database.


a. Data Definition Language: DDL stands for Data Definition Language. It is used
to define database structure or pattern. Data definition language is used to store
the information of metadata like the number of tables and schemas, their names,
indexes, columns in each table, constraints, etc.
Here are some tasks that come under DDL:
Create: It is used to create objects in the database.
Alter: It is used to alter the structure of the database.
Drop: It is used to delete objects from the database.
Truncate: It is used to remove all records from a table.
Rename: It is used to rename an object.
Comment: It is used to add comments to the data dictionary.
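The DDL statements above can be exercised with a short sketch using Python's built-in sqlite3 module (the table and column names here are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE: define a new object; its definition is stored as metadata.
cur.execute("CREATE TABLE student (roll INTEGER, name TEXT)")

# ALTER: change the structure of an existing object.
cur.execute("ALTER TABLE student ADD COLUMN dept TEXT")

# RENAME: rename an object.
cur.execute("ALTER TABLE student RENAME TO learner")

# The data dictionary (sqlite_master in SQLite) now describes the table.
tables = [r[0] for r in
          cur.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)   # ['learner']

# DROP: delete the object from the database.
cur.execute("DROP TABLE learner")
```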
b. Data Manipulation Language: DML stands for Data Manipulation Language. It is
used for accessing and manipulating data in a database. It handles user requests.
Select: It is used to retrieve data from a database.
Insert: It is used to insert data into a table.
Update: It is used to update existing data within a table.
Delete: It is used to delete records from a table (all records if no condition is given).
Merge: It performs UPSERT operation, i.e., insert or update operations.
Call: It is used to call a PL/SQL or Java subprogram.
Explain Plan: It describes the access path the query will take.
Lock Table: It controls concurrency.
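The four everyday DML statements can be sketched with sqlite3 (sample relation and data invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")

# Insert: add new tuples to the table.
cur.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [(1, "Asha", 50000), (2, "Ravi", 40000)])

# Update: modify existing data within the table.
cur.execute("UPDATE emp SET salary = salary + 4000 WHERE id = 2")

# Select: retrieve data from the table.
rows = cur.execute("SELECT name, salary FROM emp ORDER BY id").fetchall()
print(rows)   # [('Asha', 50000), ('Ravi', 44000)]

# Delete: remove the tuples matching the predicate.
cur.execute("DELETE FROM emp WHERE id = 1")
remaining = cur.execute("SELECT COUNT(*) FROM emp").fetchone()[0]
print(remaining)   # 1
```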
c. Data Control Language: DCL stands for Data Control Language. It is used to control
access rights to the stored data.
Grant: It is used to give users access privileges to the database.
Revoke: It is used to take back privileges granted earlier.
The DCL execution is transactional. It also has rollback parameters.

3. DATA INDEPENDENCY

Data independence means users and applications can access data without knowing how the
data is stored in database files.
In DBMS there are two types of data independence:
3.1. Logical Data Independence:
• It is the capacity to change the logical structure (the data definitions, e.g. adding or
removing attributes or relations) without having to change the application programs.
• It is difficult to achieve, as the retrieval of data depends directly on the logical
structure of the data.
3.2. Physical Data Independence:
• It is the capacity to change the physical storage of the data (file organization,
indexes, storage devices) without affecting the logical schema.
• It is comparatively easy to achieve.


4. FILE BASED SYSTEM VS DBMS

File Management System vs Database Management System:
• File system is a general, easy-to-use system to store general files which require less
security and few constraints; a database management system is used when security
constraints are high.
• In a file system it is complex to maintain non-redundant data; in a DBMS, normalization
makes non-redundant data easy to maintain.
• A file system offers a lower degree of concurrency; a DBMS offers a higher degree of
concurrency.
• Data inconsistency is more likely in a file system and less likely in a DBMS.
• A file system has more I/O cost to access the required data; a DBMS has less I/O cost
because database files are indexed.
• Centralisation is hard to achieve in a file system; it is achieved in a DBMS.
• In a file system the user locates the physical address of the files to access data; in a
DBMS the user is unaware of the physical address where the data is stored.
• Security is low in a file system and high in a DBMS.
• A file system stores unstructured data as isolated data files/entities; a DBMS stores
structured data with well-defined constraints and interrelations.


5. RDBMS RULES

• Data in a database file must be in tabular format.
• No two rows of a database table may be identical.
Example:

• Relational schema: Definition or structure of database table

• Relational Instance: Set of all records of DBMS table.


• Arity: No. of attributes of relational table.
• Cardinality: No. of records of database table.

NOTE:

No two rows of a DBMS table may be identical.

6. RELATION

• Relation is used to refer to a table in a relational database. It is the defining feature of


relational databases.
• Relational data model is the primary data model, which is used widely around the world for
data storage and processing. This model is simple, and it has all the properties and
capabilities required to process data with storage efficiency.
• The relational model is concerned only with data, not with the physical structures that
can improve the performance of the model.
• A relation is a named, two-dimensional table of data.
• A table consists of rows (records) and columns (attribute or field)
Attribute: Each column in a table is called an attribute. Attributes are the properties which
define a relation, e.g., Sid, Sname, DOB.
Degree: The total number of attributes which are present in a relation is called the degree of
the relation.


Tables: In the relational data model, relations are saved in the form of tables. This format
stores the relation among entities. A table has rows and columns, where rows represent
records and columns represent the attributes.
Tuple: A single row of a table, which contains a single record for that relation is called a tuple.
Cardinality: Total number of rows present in the Table.
Relation instance: A finite set of tuples in the relational database system represents relation
instance. Relation instances do not have duplicate tuples.
Relation schema: A relation schema describes the relation name (table name), attributes,
and their names.

Column: The column represents the set of values for a specific attribute.
Relation key: Each row has one or more attributes, known as relation key, which can identify
the row in the relation (table) uniquely.
Attribute domain: Every attribute has some pre-defined value scope, known as attribute
domain.
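The terms above can be made concrete with a small sketch (sqlite3; the sample relation and data are invented). The degree is the number of attributes, the cardinality the number of tuples in the relation instance:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE student (Sid INTEGER, Sname TEXT, DOB TEXT)")
cur.executemany("INSERT INTO student VALUES (?, ?, ?)",
                [(1, "Asha", "2001-05-01"),
                 (2, "Ravi", "2000-11-23"),
                 (3, "Meena", "2002-02-14")])

tuples = cur.execute("SELECT * FROM student").fetchall()
degree = len(cur.description)   # number of attributes (columns)
cardinality = len(tuples)       # number of tuples (rows) in this instance
print(degree, cardinality)      # 3 3
```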

6.1. Properties of Relational Tables


Relational tables have six properties:
a. Values Are Atomic:
This property implies that columns in a relational table are not repeating group or arrays.
The atomic value property of relational tables is important because it is one of the
cornerstones of the relational model.
b. Column Values Are of the Same Kind:
In relational terms this means that all values in a column come from the same domain.
This property simplifies data access because developers and users can be certain of the
type of data contained in each column. It also simplifies data validation. Because all
values are from the same domain, the domain can be defined and enforced with the Data
Definition Language (DDL) of the database software.


c. Each Row is Unique:


This property ensures that no two rows in a relational table are identical; there is at least
one column, or set of columns, the values of which uniquely identify each row in the table.
This property guarantees that every row in a relational table is meaningful and that a
specific row can be identified by specifying the primary key value.
d. The Sequence of Columns is Insignificant:
Columns can be retrieved in any order and in various sequences. It also permits the
physical structure of the database to change without affecting the relational tables.
e. The Sequence of Rows is Insignificant:
The main benefit is that the rows of a relational table can be retrieved in different order
and sequences. Adding information to a relational table is simplified and does not affect
existing queries.
f. Each Column Has a Unique Name:
Because the sequence of columns is insignificant, columns must be referenced by name
and not by position.

7. OPERATIONS IN RELATIONAL MODEL

Four basic update operations performed on relational database model are


Insert, update, delete and select.
• Insert is used to insert data into the relation
• Delete is used to delete tuples from the table.
• Update allows you to change the values of some attributes in existing tuples.
• Select allows you to retrieve the data that matches given criteria.

8. DBMS ARCHITECTURE

Database architecture focuses on the design, development, implementation and maintenance
of computer programs that store and organize information for an organization.
8.1. Types of DBMS architecture:


8.1.1. 1-Tier Architecture


• In this architecture, the database is directly available to the user.
• Any changes done here will directly be done on the database itself. It doesn't provide
a handy tool for end users.
• The 1-Tier architecture is used for development of local applications; it gives
users/developers the ability to communicate directly with the database without any
intervention.
8.1.2. 2-Tier Architecture
• The 2-Tier architecture of DBMS consists of two tiers. Tier-1 being the database server
and Tier-2 being the users or clients of the application.
• In the two-tier architecture, applications on the client end can directly communicate
with the database at the server side.
• The server side is responsible to provide the functionalities like: query processing and
transaction management.
• To communicate with the DBMS, client-side application establishes a connection with
the server side.

8.1.3. 3-Tier Architecture


• The 3-Tier architecture of DBMS is a fully fledged software system that is responsible
for generating response to user queries in the most efficient and suitable manner.
• The 3-Tier architecture contains another layer between the client and server. In this
architecture, client can't directly communicate with the server.
• The application on the client-end interacts with an application server which further
communicates with the database system.
• End user has no idea about the existence of the database beyond the application
server. The database also has no idea about any other user beyond the application.
• The 3-Tier architecture is used in the case of large web applications.


• Security, Data Backup, Recovery, Concurrency Control and Low Data Redundancy are
some of the features of a 3-Tier architecture, which makes it the most widely used
database architecture.

9. ER DIAGRAM

• It is a high-level design tool used for designing the database schema.


• ER model stands for an Entity-Relationship model. It is a high-level data model. This model
is used to define the data elements and relationship for a specified system.
• It develops a conceptual design for the database. It also develops a very simple and easy
to design view of data.
• In ER modelling, the database structure is portrayed as a diagram called an entity-
relationship diagram.


9.1. Components of an ER Diagram

10. ENTITY

• An entity is a thing that has an independent existence.


• It may be any object, class, person or place.
• In the ER diagram, an entity is represented as a rectangle.
Consider an organization as an example- manager, product, employee, department etc. can be
taken as an entity.

10.1. Weak Entity


• A weak entity set does not possess sufficient attributes to form a primary key. It depends
on another entity, called the identifying (owner) entity.
• It is represented using a double rectangle.

10.2. Strong Entity Set


• A strong entity set possesses its own primary key.
• It is represented using a single rectangle.


11. ATTRIBUTE

• The attribute is used to describe the property of an entity.


• Ellipse is used to represent an attribute.
For example, id, age, contact number, name, etc. can be attributes of a student.

11.1. Types of Attributes:


i. Key Attribute
• It represents a primary key.
• The key attribute is represented by an ellipse with the text underlined.

ii. Composite Attribute


• An attribute that is composed of many other attributes is known as a composite
attribute.
• The composite attribute is represented by an ellipse, and the ellipses of its component
attributes are connected to it.


iii. Single Valued Attributes:


The attributes that have a single value for a particular entity.
For example, the loan_number attribute for a specific loan entity refers to only one loan
number.
iv. Multivalued Attribute
• An attribute can have more than one value.
• The double oval is used to represent multivalued attribute.
For example, a student can have more than one phone number.

v. Derived Attribute
• An attribute that can be derived from another attribute is known as a derived attribute.
• It can be represented by a dashed ellipse.
For example, A person's age changes over time and can be derived from another
attribute like Date of birth.

12. RELATIONSHIP

• A relationship is used to describe the relation between entities.


• Diamond or rhombus is used to represent the relationship.
• The number of entity sets that participate in a relationship is called as the degree of the
relationship set.
• Lines are used to link attributes to entity sets and entity sets to relationship sets.


12.1. Mapping Cardinality:


Mapping Cardinality or cardinality ratios expresses the number of entities to which
another entity can be associated with via a relationship set.
12.2. Degree of a relationship set:
The number of different entities sets participating in a relationship set is called as degree
of a relationship set.
i. Unary Relationship:
When there is only one entity set participating in a relation, the relationship is called as
unary relationship. For example, one person is married to only one person.

ii. Binary Relationship:


When two entity sets participate in a relation, the relationship is called a
binary relationship. For example, Student is enrolled in Course.

iii. n-ary Relationship:


When n entity sets participate in a relation, the relationship is called an n-ary
relationship.
12.3. Types of relationship:
i. One-to-One Relationship
When only one instance of an entity is associated with the relationship, then it is known
as one to one relationship. An entity in E1 is associated with at most one entity in E2,
and an entity in E2 is associated with at most one entity in E1.


ii. One-to-many relationship


When only one instance of the entity on the left, and more than one instance of an entity
on the right associates with the relationship then this is known as a one-to-many
relationship. An entity in E1 is associated with any number of entities in E2, and an entity
in E2 is associated with at most one entity in E1.

iii. Many-to-one relationship


When more than one instance of the entity on the left, and only one instance of an entity
on the right associates with the relationship then it is known as a many-to-one
relationship. An entity in E1 is associated with at most one entity in E2, and an entity in
E2 is associated with any number of entities in E1.

iv. Many-to-many relationship


When more than one instance of the entity on the left, and more than one instance of an
entity on the right associates with the relationship then it is known as a many-to-many
relationship. An entity in E1 is associated with any number of entities in E2, and an entity
in E2 is associated with any number of entities in E1.

12.4. Participation Constraint:


Participation Constraint is applied on the entity participating in the relationship set.
i. Total Participation: Each entity in the entity set must participate in the relationship.
If each student must enroll in a course, the participation of student will be total. Total
participation is shown by double line in ER diagram.


ii. Partial Participation: The entity in the entity set may or may NOT participate in the
relationship. If some courses are not enrolled by any of the student, the participation of
course will be partial.
The diagram depicts the ‘Enrolled in’ relationship set with Student Entity set having total
participation and Course Entity set having partial participation.

Using set, it can be represented as,

Every student in Student Entity set is participating in relationship but there exists a course
C4 which is not taking part in the relationship.
Weak Entity Type and Identifying Relationship:
• A weak entity set does not possess sufficient attributes to form a primary key. It depends
on another entity, called the identifying (owner) entity.
• For example, a company may store the information of dependents (parents, children,
spouse) of an employee. But the dependents have no existence without the employee.
So Dependent will be a weak entity type and Employee will be the identifying entity type
for Dependent.
• The participation of weak entity type is always total.
• The relationship between weak entity type and its identifying strong entity type is called
identifying relationship and it is represented by double diamond.

****


DBMS

2 DATABASE DESIGN

1. CONSTRAINTS

These are the conditions which must be present for a valid relation.
Relational model constraints are restrictions specified to the data values in the relational
database
There are many types of integrity constraints. Constraints in a relational database
management system are mostly divided into three main categories:
a. Domain Constraints:
It specifies that the value taken by an attribute must be an atomic value from its domain.
Domain constraints can be violated if an attribute value is not appearing in the corresponding
domain or it is not of the appropriate data type.
b. Key Constraints:
Every relation in the database should have at least one set of attributes which defines a tuple
uniquely.
The value of the attribute for different tuples in the relation has to be unique.
c. Referential Integrity Constraints:
Referential integrity constraints is based on the concept of Foreign Keys.
When one attribute of a relation can only take values from other attribute of same relation or
any other relation, it is called referential integrity.
A foreign key is an important attribute of a relation which should be referred to in other
relationships.


2. TYPES OF CONSTRAINTS

2.1. Entity Integrity Constraint:


• Entity integrity constraints, as the name implies, are applied on each entity, i.e. on
individual rows.
• The constraint here is to have a unique value for each row in the column or a group of
columns it is applied to.
• This attribute is essential when a particular record or row of data is to be accessed. It
can be accessed using the entity integrity constraint by supplying the unique value
and accessing the entire record.
There are two such constraints which ensure uniqueness of data:
Primary key: Primary key ensures that values in a column are unique so that duplicate
values are not allowed and also the primary key column cannot be null. Thus, it focuses
on two properties- uniqueness and not null. A table can contain only one primary key.
Primary key can consist of a column or a group of columns. It is used to uniquely
identify records in a table.
Unique keyword: Unique keyword is just like the primary key, but it allows null
values. Both these are defined on columns during defining the structure of table. Thus,
these are used in data definition language.
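Both behaviours can be sketched with sqlite3 (note SQLite is used only for convenience; unlike most engines it enforces the not-null part of a primary key only for INTEGER PRIMARY KEY columns or when NOT NULL is declared explicitly):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE student (
                   roll  INTEGER PRIMARY KEY,  -- unique and not null
                   email TEXT UNIQUE)          -- unique, but null is allowed
            """)
cur.execute("INSERT INTO student VALUES (1, 'a@example.com')")

# A duplicate primary key value is rejected.
try:
    cur.execute("INSERT INTO student VALUES (1, 'b@example.com')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

# The UNIQUE column still accepts null values.
cur.execute("INSERT INTO student VALUES (2, NULL)")
count = cur.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(duplicate_allowed, count)   # False 2
```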
2.2. Domain Integrity Constraints:
These are the constraints on the domain value and thus are column level constraints
unlike entity integrity constraint which are row level. The domain integrity constraints are
used to impose restrictions on some particular column.
There are three constraints under domain constraint.
a. Not Null Constraint:
Null is a legal value for every domain specified in SQL, so by default it is acceptable for
an attribute to hold a null value. But some attributes must not be null.
Consider a student tuple that has a null value in its ‘name’ attribute: it stores
information about an unknown student. In such cases, a not null constraint is specified
for that attribute of the relation.
By specifying an attribute to be not null, the domain of that attribute is restricted so
that it no longer accepts null values.
Example:
create table Student
(Student_id varchar (5) , name varchar (20) not null, depart_name varchar (20));
In SQL, the primary key attribute implicitly carries a not null constraint; an attribute
declared as a primary key need not be separately declared not null.


b. Default Value Constraint:


Default value is the value to be provided in case no value is provided by the user. Default
value for any column is set depending on its datatype. These are also used during defining
the structure of the tables, in the data definition language statements.

Using a default value constraint, a default value is set for an attribute. If no value is
specified for an attribute on which a default constraint is defined, the attribute holds
the specified default value.
Example:
create table instructor
(instructor_id varchar (5),
name varchar (20) not null,
depart_name varchar (5),
salary numeric (8,2) default 0);
This command specifies that while inserting a tuple in instructor relation if no value is
provided for the salary attribute then its value is set to be 0.
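Running the command above (here via sqlite3) shows the default being applied when no salary is supplied:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE instructor (
                   instructor_id VARCHAR(5),
                   name          VARCHAR(20) NOT NULL,
                   depart_name   VARCHAR(5),
                   salary        NUMERIC(8,2) DEFAULT 0)""")

# No salary is supplied, so the default value 0 is stored.
cur.execute("INSERT INTO instructor (instructor_id, name, depart_name) "
            "VALUES ('I1', 'Asha', 'CS')")
salary = cur.execute("SELECT salary FROM instructor").fetchone()[0]
print(salary)   # 0
```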
c. Check Clause:
Check is used to impose conditions such as whether a value is greater than or less than
a particular value; thus upper and lower limits can be set.
The check clause constraint ensures that when a new tuple is inserted in the relation it
must satisfy the predicate specified in the check clause:
Example:
create table Student
(Student_id varchar (5) , name varchar (20) not null, depart_name varchar (20),
primary key (Student_id),
check (depart_name in(‘Comp.Sci.’, ‘Elec.Eng.’, ‘Physics’, ‘Biology’)));
According to the SQL standard, the predicate that is placed inside the check clause can
be a subquery. But, today’s widely used database products do not allow the predicate of
check clause to contain a subquery.
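The check clause above can be exercised with sqlite3, which does enforce check constraints (though, like the products mentioned, not subqueries inside them):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE Student (
                   Student_id  VARCHAR(5),
                   name        VARCHAR(20) NOT NULL,
                   depart_name VARCHAR(20),
                   PRIMARY KEY (Student_id),
                   CHECK (depart_name IN ('Comp.Sci.', 'Elec.Eng.',
                                          'Physics', 'Biology')))""")

cur.execute("INSERT INTO Student VALUES ('S1', 'Asha', 'Physics')")   # passes
try:
    cur.execute("INSERT INTO Student VALUES ('S2', 'Ravi', 'History')")
    violated = False
except sqlite3.IntegrityError:
    violated = True   # 'History' fails the check predicate
print(violated)   # True
```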
2.3. Referential Integrity Constraint:
• Referential integrity constraint makes sure that the values in the column on which it
is applied are already present in the column it is referring to.
• Thus, here a column of a table refers to the other column of the same or different
table. This ensures that the values are consistent and similar in both the columns. This
is implemented using:


Foreign key:
• Foreign key normally references the primary key of same or another table. But it can
refer to other columns too.
• Whenever the same type of attribute exists in two different tables, the attribute in one
of the tables is declared as the primary key and in the other it is made a foreign key, so
that the values in both remain consistent.
• Foreign key is dependent on primary key.
• It is defined over two tables
i. Referenced Relation: The relation to which other relations refer is called referenced
relation
ii. Referencing Relation: The relation which is referencing to other relation is called
referencing relation
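A sketch of a referencing and a referenced relation (sqlite3 requires the foreign_keys pragma before foreign keys are enforced; table names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("PRAGMA foreign_keys = ON")   # sqlite3 needs this explicitly

# Referenced relation: its primary key is referred to by other relations.
cur.execute("CREATE TABLE department (dept_name TEXT PRIMARY KEY)")
# Referencing relation: dept_name is a foreign key into department.
cur.execute("""CREATE TABLE employee (
                   emp_id    INTEGER PRIMARY KEY,
                   dept_name TEXT REFERENCES department(dept_name))""")

cur.execute("INSERT INTO department VALUES ('CS')")
cur.execute("INSERT INTO employee VALUES (1, 'CS')")     # value exists: accepted
try:
    cur.execute("INSERT INTO employee VALUES (2, 'EE')") # no such department
    accepted = True
except sqlite3.IntegrityError:
    accepted = False
print(accepted)   # False
```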


3. ANOMALIES

An anomaly is an irregularity, or something which deviates from the expected or normal state.

When designing databases, there are three types of anomalies: Insert, Update and Delete.

Insertion Anomaly in Referencing Relation:

A row in referencing relation cannot be inserted if referencing attribute’s value is not present

in referenced attribute value.

Deletion/ Updation Anomaly in Referenced Relation:

A row from referenced relation cannot be deleted or updated if value of referenced attribute is

used in value of referencing attribute.

• On Delete Cascade: It will delete the tuples from referencing relation if value used by

referencing attribute is deleted from referenced relation.

• On Update Cascade: It will update the referencing attribute in referencing relation if

attribute value used by referencing attribute is updated in referenced relation

3.1. Referenced Relation:

1. Insertion: No violation

2. Deletion:

a. On delete no action: Deletion of referenced record is restricted if foreign key

violation occurs.

b. On delete cascade: If we want to delete primary key values from referenced record

then it will delete that value from referencing table also.

c. On delete set null: If we want to delete primary key from referenced table then it

will set the values as null in place of that value in referencing table.

3. Updation:

a. On update no action

b. On update cascade

c. On update set null

3.2. Referencing Relation:

1. Insertion: May Cause foreign key violation

2. Deletion: No foreign Key violation

3. Updation: May cause Foreign Key violation


A    C
2    4
3    4
4    3
5    4
6    2

C is a foreign key referencing A.

Delete (2, 4) with on delete cascade. Result:

A    C
3    4
4    3
5    4
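The example can be reproduced with sqlite3 (the foreign-key check is switched on only after loading the data, since the table references itself):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Self-referencing table: column C is a foreign key referencing column A.
cur.execute("""CREATE TABLE T (
                   A INTEGER PRIMARY KEY,
                   C INTEGER REFERENCES T(A) ON DELETE CASCADE)""")
cur.executemany("INSERT INTO T VALUES (?, ?)",
                [(2, 4), (3, 4), (4, 3), (5, 4), (6, 2)])
conn.commit()
cur.execute("PRAGMA foreign_keys = ON")   # enforce FKs from here on

# Deleting (2, 4) removes A = 2, so the cascade also deletes (6, 2),
# whose C value referred to it.
cur.execute("DELETE FROM T WHERE A = 2")
rows = cur.execute("SELECT A, C FROM T ORDER BY A").fetchall()
print(rows)   # [(3, 4), (4, 3), (5, 4)]
```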

****


DBMS

3 NORMALIZATION

1. EQUALITY OF FD SET

Let F and E be two FD sets for a relation R.

• If all FDs of F can be derived from the FDs present in E, we say that E covers F (E ⊇ F).
• If all FDs of E can be derived from the FDs present in F, we say that F covers E (F ⊇ E).
• If both the above points are true, F = E.
Two sets of functional dependencies E and F are equivalent if:
(i) E covers F: every FD in F can be inferred from E, i.e. E ⊇ F.

(ii) F covers E: every FD in E can be inferred from F, i.e. F ⊇ E.

E covers F F covers E Result

Yes Yes E≡F

Yes No E⊃F

No Yes F⊃E

No No E and F not comparable


Example 1:
F = {A → B, B → C, C → A}
G = {A → BC, B → AC, AB → C, BC → A}
Which is true?
A. F ⊂ G
B. F ⊃ G
C. F = G
D. None
Sol:
Step 1: Checking whether all FDs of G can be derived from F.

A+ = ABC, so A → BC holds.
B+ = ABC, so B → AC holds.
(AB)+ = ABC, so AB → C holds.
(BC)+ = ABC, so BC → A holds.

∴ F covers G.
Step 2: Checking whether all FDs of F can be derived from G.

A+ = ABC, so A → B holds.
B+ = ABC, so B → C holds.
C+ = C, so C does not determine A.

∴ G does not cover F.
∴ F is a superset of G, so option B is correct.
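The closure computations above can be sketched in plain Python (the helper names are invented; each FD is a (lhs, rhs) pair of attribute strings):

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the determinant is contained in the closure, add the dependent.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def covers(e, f):
    """True if every FD of f can be inferred from e (i.e. e covers f)."""
    return all(set(rhs) <= closure(lhs, e) for lhs, rhs in f)

F = [("A", "B"), ("B", "C"), ("C", "A")]
G = [("A", "BC"), ("B", "AC"), ("AB", "C"), ("BC", "A")]

print(covers(F, G), covers(G, F))   # True False, hence F ⊃ G
```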

2. MINIMAL COVER

• A minimal cover or canonical cover of a set of functional dependencies F is a minimal
(non-reducible) set of dependencies that is equivalent to F.
• A minimal cover or canonical cover is a simplified and reduced version of the given set of
functional dependencies.
• Since it is a reduced version, it is also called an irreducible set.
• Canonical cover is free from all the extraneous functional dependencies.
• The closure of canonical cover is same as that of the given set of functional dependencies.
• Canonical cover is not unique and may be more than one for a given set of functional
dependencies.
2.1. Minimal Cover Procedure:
Step 1: Split the Functional Dependencies of Functional Dependency set F such that right
hand side of FD is a single attribute.


Step 2: Remove extraneous attributes from each determinant


An attribute of a functional dependency is said to be extraneous if we can remove it
without changing the closure of the set of functional dependencies.

For example, if W → X holds, then in a determinant WXY the attribute X is extraneous, and
WXY → Z can be reduced to WY → Z.


Step 3: Remove redundant functional dependencies from the result of the above steps.
An FD X → Y of F is redundant iff it can be derived from F – {X → Y}, i.e.
(F – {X → Y})+ = F+. Redundant FDs can be removed from F.
Example 2: Given Functional Dependency set
AB → CD
BC → DE
CD → EF
BCD → F
ABE → FG
Find the minimal cover.
Sol.
Step 1: Splitting
AB → C
AB → D
BC → D
BC → E
CD → E
CD → F
BCD → F
ABE → F
ABE → G
Step 2: Remove extraneous attributes
AB → C
AB → D
BC → D
BC → E
CD → E
CD → F
BC → F ∵ (BC)+ ⊇ D, so D is extraneous in BCD → F.
AB → F and AB → G ∵ (AB)+ ⊇ E, so E is extraneous in ABE → F and ABE → G.


Step 3: Remove redundant FDs
AB → D is redundant (AB → C and BC → D give it)
BC → E is redundant (BC → D and CD → E give it)
BC → F is redundant (BC → D and CD → F give it)
AB → F is redundant (AB → C, BC → D and CD → F give it)

∴ Minimal cover is: {AB → CG, BC → D, CD → EF}
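The three steps of the minimal-cover procedure can be sketched in code. This is a minimal illustration in Python (function names are our own); note that a minimal cover is not unique in general, so the code simply follows the same split / reduce / remove order as the worked example.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under a list of FDs (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def minimal_cover(fds):
    # Step 1: split right-hand sides so each FD has a single attribute on the right.
    f = [(frozenset(lhs), frozenset(a)) for lhs, rhs in fds for a in rhs]
    # Step 2: drop extraneous attributes from each determinant.
    g = []
    for lhs, rhs in f:
        for a in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {a}, f):
                lhs = lhs - {a}          # a is extraneous: closure unchanged
        g.append((lhs, rhs))
    g = list(dict.fromkeys(g))           # drop duplicates created by step 2
    # Step 3: drop redundant FDs (derivable from the remaining ones).
    for fd in list(g):
        rest = [x for x in g if x != fd]
        if fd[1] <= closure(fd[0], rest):
            g = rest
    return g

fds = [('AB', 'CD'), ('BC', 'DE'), ('CD', 'EF'), ('BCD', 'F'), ('ABE', 'FG')]
mc = minimal_cover(fds)
for lhs, rhs in mc:
    print(''.join(sorted(lhs)), '->', ''.join(sorted(rhs)))
# AB -> C, BC -> D, CD -> E, CD -> F, AB -> G, i.e. {AB -> CG, BC -> D, CD -> EF}
```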

3. PROPERTIES OF DECOMPOSITION

The process of breaking up or dividing a single relation into two or more sub-relations is called decomposition of a relation. Decomposition helps eliminate redundancy; we must then check whether the decomposition can recover the original relation.
a. Lossless Join Decomposition
b. Dependency Preserving Decomposition
3.1. Lossless Join Decomposition
Relation schema R with instance r is decomposed into sub-relations R1, R2, R3, …, Rn.
In general, [R1 ⋈ R2 ⋈ …… ⋈ Rn] ⊇ r
• Decomposition is lossless if it is feasible to reconstruct relation R from decomposed
tables using Joins. The information will not lose from the relation when decomposed. The
join would result in the same original relation
if [R1 ⋈ R2 ⋈ …… ⋈Rn] = r (loss less join decomposition)
• Decomposition is lossy, when a relation is decomposed into two or more relational
schemas, the loss of information is unavoidable when the original relation is retrieved.
Decomposition is called lossy join decomposition when the join of the sub relations does
not result in the same relation R that was decomposed.
The natural join of the sub relations is always found to have some extraneous tuples.
if [R1 ⋈ R2 ⋈ …… ⋈Rn] ⊃ r (lossy join decomposition)
3.1.1. To Check whether decomposition is lossy or lossless:
Consider a relation R is decomposed into two sub relations R 1 and R2.
Then,
• If all the following conditions satisfy, then the decomposition is lossless.
• If any of these conditions fail, then the decomposition is lossy.


Step 1: Union of both the sub relations must contain all the attributes that are present
in the original relation R.
Thus:
R1 ∪ R2 = R
Step 2: Intersection of both the sub relations must not be null i.e. there must be some
common attribute which is present in both the sub relations.
Thus, R1 ∩ R2 ≠ ∅
Step 3: Intersection of both the sub relations must be a super key of either R1 or R2 or
both.
Thus:
R1 ∩ R2 = Super key of R1 or R2
Example 3:
Consider the relation R (A, B, C, D, E) with Functional Dependency set :
F = {AB → C , C → D, B → E}
Check whether lossy or lossless decomposition.
(i) Decompose R1 (ABC) and R2 (ABDE)
R1 ∪ R2 = R
(R1 ∩ R2)+ = (AB)+
= ABCDE
= super key for R2
∴ Loss less decomposition
(ii) R1 (ABC) and R2 (DE)
R1 ∪ R2 = R
R1 ∩ R2 = ϕ.
∴ Lossy decomposition.
(iii) Decompose R1 (ABC) and R2 (BDE)
R1 ∪ R2 = R
(R1 ∩ R2)+ = (B)+
= BE
(it is not a super key for any relation Hence, Lossy decomposition).
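The three-condition test for a binary decomposition can be sketched as follows (a minimal illustration with our own function names, assuming the caller has already verified R1 ∪ R2 = R):

```python
def closure(attrs, fds):
    """Closure of a set of attributes under a list of FDs (lhs, rhs) given as strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def lossless(r1, r2, fds):
    """Binary-decomposition test: the common attributes must be non-empty
    and their closure must cover R1 or R2 (i.e. be a super key of one of them)."""
    common = set(r1) & set(r2)
    if not common:
        return False
    c = closure(common, fds)
    return set(r1) <= c or set(r2) <= c

fds = [('AB', 'C'), ('C', 'D'), ('B', 'E')]
print(lossless('ABC', 'ABDE', fds))  # True  ((AB)+ = ABCDE covers R2)
print(lossless('ABC', 'DE',   fds))  # False (no common attribute)
print(lossless('ABC', 'BDE',  fds))  # False (B+ = BE is not a key of either)
```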
3.2. Dependency Preserving Decomposition
Relational scheme R with Functional Dependency set F decomposed into sub relation R 1,
R2, R3 … Rn with sets F1, F2 … Fn.
If we decompose a relation R into relations R1 and R2, all dependencies of R either must
be a part of R1 or R2 or must be derivable from combination of FD’s of R1 and R2.
Dependency preservation ensures-
• None of the functional dependencies that holds on the original relation are lost.


• The sub relations still hold or satisfy the functional dependencies of the original relation.
In general,
F1 ∪ F2 ∪ F3 . . . Fn ⊆ F

if [F1 ∪ F2 ∪ F3 . . . Fn ] = F (dependency preserving)


if [F1 ∪ F2 ∪ F3 . . . Fn ] ⊂ F (not dependency preserving)

Because of decomposition there should not be any loss of any dependency.


Let F1 and F2 be the functional dependency sets of sub-relations R1 and R2. Then:
(i) F1 ∪ F2 = F, then dependency preserving decomposition
(ii) F1 ∪ F2 ⊂ F, then not dependency preserving decomposition
(iii) F1 ∪ F2 ⊃ F, then not dependency preserving decomposition.
Example 4: Let R (A, B, C, D, E)
F = {A → B, B→ C, C → D, D → BE}
Decompose into {AB, BC, CD, DE}
Sol:
F1 : R1(AB): A → B   ∵ A+ = ABCDE
             (B → A does not hold, since B+ = BCDE does not contain A)
F2 : R2(BC): B → C   ∵ B+ = BCDE
             C → B   ∵ C+ = CDBE contains B
F3 : R3(CD): C → D   ∵ C+ = CDBE
             D → C   ∵ D+ = DBEC contains C
F4 : R4(DE): D → E   ∵ D+ = DBEC

As F1 ∪ F2 ∪ F3 ∪ F4 covers {A → B, B → C, C → D, D → BE} = F,
∴ the decomposition is dependency preserving.
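A brute-force sketch of this check: project F onto each sub-relation by computing the closure of every attribute subset (feasible only for small schemas), then test whether the union of the projections covers F. Function names are our own.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of a set of attributes under a list of FDs (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def project(fds, sub):
    """Project FDs onto sub-relation `sub`: for every subset X of sub,
    keep X -> (X+ ∩ sub) minus X, when non-empty."""
    sub = set(sub)
    out = []
    for k in range(1, len(sub) + 1):
        for x in combinations(sorted(sub), k):
            rhs = closure(set(x), fds) & sub - set(x)
            if rhs:
                out.append((set(x), rhs))
    return out

def dependency_preserving(fds, subs):
    union = [fd for s in subs for fd in project(fds, s)]
    return all(set(rhs) <= closure(lhs, union) for lhs, rhs in fds)

fds = [('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'BE')]
print(dependency_preserving(fds, ['AB', 'BC', 'CD', 'DE']))  # True
```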

4. NORMALIZATION

• Normalization is a process of organizing the data in database to avoid:


i. data redundancy
ii. insertion anomaly
iii. update anomaly
iv. deletion anomaly.
• Normalization divides the larger table into the smaller table and links them using relationship.
• Normal forms are used to reduce redundancy and to minimize insertion, deletion and update anomalies in database tables.


4.1. Problems caused by redundancy:


Storing same information redundantly i.e. in more than one place within a database.
• Redundant Storage: Some information is stored repeatedly
• Update Anomalies: If one copy of such repeated data is updated, an inconsistency is
created unless all copies are similarly updated.
• Insertion Anomalies: It may not be possible to store certain information unless some other, unrelated information is stored as well.
• Deletion Anomalies: It may not be possible to delete certain information without losing some other, unrelated information as well.

5. NORMAL FORMS

5.1. First Normal Form (1-NF)
• A relation is in 1NF if it contains only atomic values.
• In 1 NF an attribute of a table cannot hold multiple values; it must hold only single values.
• By default, every relation is in 1NF, because the formal definition of a relation states that the value of every attribute must be atomic.
Multivalued Dependency:
Multivalued dependency occurs when there is more than one independent multivalued attribute in a table: two attributes in the table are independent of each other, but both depend on a third attribute. A multivalued dependency therefore always requires at least three attributes.
If a table has attributes P, Q and R, where Q and R are multi-valued facts of P, the dependency is represented by a double arrow (→→):
P →→ Q
P →→ R
In the above case, the multivalued dependency exists only if Q and R are independent attributes.


Student Name Course Activities


A Mathematics Singing
A Mathematics Dancing
B Computers Cricket
C Literature Dancing
C Literature Cricket
C Literature Singing

In the above table, we can see that students A and C have interest in more than one activity.
This is a multivalued dependency because the courses of a student are independent of the activities,
but both are dependent on the student.
Therefore, multivalued dependency:
StudentName →→ CourseDiscipline
StudentName →→ Activities
5.2. Second Normal Form (2 NF)
Relation R is in 2 NF iff:
• R is in 1 NF
• R does not contain any partial dependency.
Partial Dependency:
Let R be a relational schema, where:
X: a candidate key of R
Y: a proper subset of X (or of any candidate key)
Z: a non-prime attribute of R
Y → Z is said to be a partial dependency iff:
• Y is a proper subset of a candidate key (hence every attribute of Y is prime), and
• Z is a non-prime attribute.
There is no such restriction on non-prime → non-prime dependencies.
Testing for 2 NF:
If every proper subset of every candidate key determines only prime attributes, then there is no partial dependency, i.e. R is in 2 NF.


5.3. Third Normal Form (3 NF)


Relation R is in 3 NF iff R is in 2 NF and for every non-trivial functional dependency X → Y in R:
• X is a super key of R, or
• Y is a prime attribute of R.
Equivalently, no transitive dependency exists for non-prime attributes.
Transitive Dependency:
A → B is called a transitive dependency if and only if:
a. A is not a super key.
b. B is a non-prime attribute.
If any one condition fails, then it is not a transitive dependency.
• Transitive dependency must not exist for non-prime attributes.
• However, transitive dependency can exist for prime attributes.
5.4. Boyce Codd Normal Form (BCNF)
• A relational schema R is in BCNF iff every non-trivial functional dependency X→Y is in
relation R with determinant X must be a candidate key/super key
5.5. Fourth Normal Form (4 NF)
Let R be a relational schema with a multivalued dependency X →→ Y. R is in 4 NF iff, for every such dependency, either:
• X is a candidate key or super key, or
• the dependency is trivial (Y ⊆ X or X ∪ Y = R).

Note:

Always check the normal form from BCNF then 3 NF then 2 NF then 1 NF.

Example 5: R (A, B, C, D)
Let the FDs be
{AB → C,
C → A,
AC → D}
Find the highest normal form


Sol:
First, find the candidate keys:
A+ = A, B+ = B, C+ = ACD, D+ = D
(AB)+ = ABCD
(BC)+ = ABCD
∴ AB and BC are the candidate keys, so the prime attributes are A, B and C.
Now, check each functional dependency:

          AB → C   C → A                        AC → D
BCNF      ✓        X (C is not a super key)     X (AC is not a super key)
3NF       ✓        ✓ (A is a prime attribute)   X (AC is not a super key and D is non-prime)
2NF       ✓        ✓                            X (C+ = ACD: C, a proper subset of candidate key BC, determines the non-prime attribute D, i.e. a partial dependency)

Hence, the highest normal form of the relation is 1 NF.


Important points:
• A non-prime attribute transitively determined by a super key is not allowed in 3 NF.
• A prime attribute transitively determined by a super key is allowed in 3 NF but not allowed in BCNF.
• If every candidate key of relation R is a simple (single-attribute) candidate key, then R is always in 2 NF.
• If every attribute of relation R is a prime attribute, then R is always in 3 NF but may not be in BCNF.
• If relation R has no non-trivial functional dependency, then R is always in BCNF.
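The candidate-key search and the per-FD BCNF/3NF test used in Example 5 can be sketched as follows (a minimal illustration with our own function names; the subset search is exponential, so it only suits small examples):

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of a set of attributes under a list of FDs (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(r, fds):
    r = set(r)
    keys = []
    for k in range(1, len(r) + 1):
        for x in combinations(sorted(r), k):
            x = set(x)
            # a candidate key is a minimal attribute set whose closure is R
            if closure(x, fds) == r and not any(key < x for key in keys):
                keys.append(x)
    return keys

R = 'ABCD'
fds = [('AB', 'C'), ('C', 'A'), ('AC', 'D')]
keys = candidate_keys(R, fds)          # [{'A','B'}, {'B','C'}]
prime = set().union(*keys)             # {'A', 'B', 'C'}

for lhs, rhs in fds:
    superkey = closure(lhs, fds) == set(R)
    print(lhs, '->', rhs,
          '| BCNF:', superkey,
          '| 3NF:', superkey or set(rhs) <= prime)
# e.g. "AC -> D | BCNF: False | 3NF: False", matching the table above
```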

****


DBMS

4 SQL

1. SQL

• SQL is a standard language for storing, manipulating and retrieving data in databases.
• When the user wants to get some information from the database file, he/she can issue a
query.
• A query is a user-request to retrieve data with a certain condition.
• The user specifies a certain condition. The program will go through all the records in the
database file and select those records that satisfy the condition.
• The result of the query will be stored in the form of a table.

2. FEATURES OF SQL

• SQL is easy to learn.


• SQL allows users to set permissions on tables, procedures, and views.
• SQL is used to describe the data.
• SQL is used for storing, manipulating and retrieving data stored in relational database.
• SQL can execute queries against the database.
• SQL is used to define the data in the database and manipulate it when needed.
• SQL is used to create and drop the database and table.
• SQL is used to create a view, stored procedure, function in a database.

3. SQL QUERY STRUCTURE

SELECT A1, A2, ..., An
FROM r1, r2, ..., rm
WHERE P;


4. TYPES OF SQL COMMANDS

4.1. Data Definition Language (DDL):


• DDL stands for Data Definition Language.
• It is used to define database structure or pattern.
• Data definition language is used to store the information of metadata like the number
of tables and schemas, their names, indexes, columns in each table, constraints, etc.
Different DDL commands are:
a. CREATE: It is used to create a new table in the database.
Syntax: CREATE TABLE TABLE_NAME (COLUMN_NAME DATATYPES[,....]);
Example: CREATE TABLE EMPLOYEE(Name VARCHAR2(20), Email VARCHAR2(100), DOB
DATE);
b. DROP: It is used to delete both the structure and record stored in the table.
Syntax: DROP TABLE table_name;
Example: DROP TABLE EMPLOYEE;
c. ALTER: It is used to alter the structure of the database. This change could be either
to modify the characteristics of an existing attribute or probably to add a new attribute.
Syntax:
• To add a new column in the table
ALTER TABLE table_name ADD column_name COLUMN-definition;
• To modify existing column in the table:
ALTER TABLE table_name MODIFY (COLUMN DEFINITION....);
d. TRUNCATE: It is used to delete all the rows from the table and free the space
containing the table.
Syntax:
TRUNCATE TABLE table_name;
Example:
TRUNCATE TABLE EMPLOYEE;
4.2. Data Manipulation Language (DML):
• DML commands are used to modify the database. It is responsible for all form of changes
in the database.
• Data Manipulation Language: DML stands for Data Manipulation Language. It is used
for accessing and manipulating data in a database. It handles user requests.
• DML commands are not auto-committed, which means they do not permanently save
all the changes in the database automatically. They can be rolled back.
a. INSERT: The INSERT statement is a SQL query. It is used to insert data into the row
of a table.

3
www.gradeup.co

Syntax:
INSERT INTO TABLE_NAME
(col1, col2, col3,.... col N)
VALUES (value1, value2, value3, .... valueN);
Or
INSERT INTO TABLE_NAME
VALUES (value1, value2, value3, .... valueN);
For example:
INSERT INTO gradeup (Author, Subject) VALUES ('Peter', 'DBMS');
b. UPDATE: This command is used to update or modify the value of a column in the
table.
Syntax:
UPDATE table_name
SET [column_name1= value1,...column_nameN = valueN]
[WHERE CONDITION]
For example:
UPDATE students
SET User_Name = 'Peter'
WHERE Student_Id = '3'
c. DELETE: It is used to remove one or more row from a table.
Syntax:
DELETE FROM table_name
[WHERE condition];
For example:
DELETE FROM gradeup
WHERE Author = 'Peter';
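The DML statements above can be tried end-to-end with Python's built-in sqlite3 module; the table and data below are made up for the demo:

```python
import sqlite3

con = sqlite3.connect(':memory:')
cur = con.cursor()
cur.execute("CREATE TABLE students (Student_Id INTEGER, User_Name TEXT)")

# INSERT three rows
cur.executemany("INSERT INTO students VALUES (?, ?)",
                [(1, 'Alex'), (2, 'Max'), (3, 'Pari')])

# UPDATE one row
cur.execute("UPDATE students SET User_Name = 'Peter' WHERE Student_Id = 3")

# DELETE one row
cur.execute("DELETE FROM students WHERE User_Name = 'Alex'")

rows = cur.execute("SELECT * FROM students ORDER BY Student_Id").fetchall()
print(rows)  # [(2, 'Max'), (3, 'Peter')]
```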
4.3. Data Control Language: DCL stands for Data Control Language. DCL commands such as
GRANT and REVOKE are used to control access rights and permissions on the database.
The DCL execution is transactional. It also has rollback parameters.
4.4. SQL Commands:
a. SELECT: SELECT statements are used to fetch data from a database. Every query will
begin with SELECT.
SELECT column_name
FROM table_name;
b. SELECT DISTINCT: SELECT DISTINCT specifies that the statement is going to be a
query that returns unique values in the specified column(s).
SELECT DISTINCT column_name
FROM table_name;


c. WHERE: WHERE is a clause that indicates you want to filter the result set to include
only rows where the following condition is true.
SELECT column_name(s)
FROM table_name
WHERE column_name operator value;
d. WITH: WITH clause lets you store the result of a query in a temporary table using an
alias. You can also define multiple temporary tables using a comma and with one instance
of the WITH keyword.
WITH temporary_name AS (
SELECT *
FROM table_name)
SELECT *
FROM temporary_name
WHERE column_name operator value;
e. LIKE: LIKE is a special operator used with the WHERE clause to search for a specific
pattern in a column.
SELECT column_name(s)
FROM table_name
WHERE column_name LIKE pattern;
f. DELETE: DELETE statements are used to remove rows from a table.
DELETE FROM table_name
WHERE some_column = some_value;
g. GROUP BY: GROUP BY is a clause in SQL that is only used with aggregate functions.
It is used in collaboration with the SELECT statement to arrange identical data into
groups.
SELECT column_name, COUNT(*)
FROM table_name
GROUP BY column_name;
h. HAVING: HAVING was added to SQL because the WHERE keyword could not be used
with aggregate functions.
SELECT column_name, COUNT(*)
FROM table_name
GROUP BY column_name
HAVING COUNT(*) > value;
i. INSERT: INSERT statements are used to add a new row to a table.
INSERT INTO table_name (column_1, column_2, column_3)
VALUES (value_1, 'value_2', value_3);


j. ORDER BY: ORDER BY is a clause that indicates you want to sort the result set by a
particular column either alphabetically or numerically.
SELECT column_name
FROM table_name
ORDER BY column_name ASC | DESC;
k. Aggregate Functions:
An aggregate function allows you to perform a calculation on a set of values to return a
single scalar value.
i. AVG () Function:
The AVG () function returns the average value of a numeric column.
Syntax: SELECT AVG (column_name) FROM table_name
ii. COUNT () Function:
The COUNT () function returns the number of rows that matches a specified criterion.
• COUNT (column_name) Syntax:
It returns the number of values (NULL values will not be counted) to the specified column:
SELECT COUNT (column_name) FROM table_name;
• COUNT(*) Syntax:
It returns the number of records in a table:
SELECT COUNT (*) FROM table_name;
• COUNT (DISTINCT column_name)Syntax:
It returns the number of distinct values of the specified column:
SELECT COUNT (DISTINCT column_name) FROM table_name;
iii. MAX () Function;
The MAX() function returns the largest value of the selected column.
Syntax: SELECT MAX (column_name) FROM table_name;
iv. MIN () Function:
The MIN () function returns the smallest value of the selected column.
Syntax: SELECT MIN (column_name) FROM table_name;
v. SUM () Function:
The SUM () function returns the total sum of a numeric column.
Syntax: SELECT SUM (column_name) FROM table_name;
vi. LEN () Function:
The LEN () function returns the length of the value in a text field.
Syntax: SELECT LEN (column_name) FROM table_name;
vii. HAVING Clause: The HAVING clause was added to SQL because the WHERE keyword
cannot be used with aggregate functions.
viii. GROUP BY Statement: The GROUP BY statement is used in conjunction with the
aggregate functions to group the result set by one or more columns.
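A small runnable illustration of aggregate functions, GROUP BY and HAVING, again using sqlite3 with a made-up table:

```python
import sqlite3

con = sqlite3.connect(':memory:')
cur = con.cursor()
cur.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
cur.executemany("INSERT INTO orders VALUES (?, ?)",
                [('A', 10), ('A', 20), ('B', 5), ('C', 7), ('C', 8), ('C', 9)])

# COUNT/SUM per group, filtered by HAVING (WHERE cannot filter on aggregates):
rows = cur.execute("""
    SELECT customer, COUNT(*), SUM(amount)
    FROM orders
    GROUP BY customer
    HAVING COUNT(*) > 1
    ORDER BY customer
""").fetchall()
print(rows)  # [('A', 2, 30), ('C', 3, 24)]
```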


l. IN Operator: It allows to specify multiple values in a WHERE clause.


SELECT column_name
FROM table_name
WHERE column_name IN (value1, value2,…)
m. BETWEEN Operator: It selects the values within a range.
SELECT column_name
FROM table_name
WHERE column_name BETWEEN value1 and value2
n. List all database:
show database;
o. Creating a database:
create database GradeUp;
p. JOIN in database: It is used to combine data or rows from two or more tables based
on a common field between them.
• Cartesian Join:
The Cartesian Join or Cross Join returns the Cartesian product of the sets of records from
two or more joined tables. It equates to an inner join whose join-condition always
evaluates to true, or from whose statement the join-condition is absent.
SELECT table1.column1, table2.column2...
FROM table1, table2 [, table3 ]
• Inner Join:
The most important and frequently used of the joins is the INNER JOIN; it is also
referred to as an EQUIJOIN when the join-condition is an equality.
The INNER JOIN creates a new result table by combining column values of two tables
(table1 and table2) based upon the join-predicate. The query compares each row of
table1 with each row of table2 to find all pairs of rows which satisfy the join-predicate.
When the join-predicate is satisfied, column values for each matched pair of rows of A
and B are combined into a result row.
SELECT table1.column1, table2.column2...
FROM table1
INNER JOIN table2
ON table1.common_field = table2.common_field;
• Left Outer Join:
This join returns all the rows of the table on the left side of the join and matching rows
for the table on the right side of join. The rows for which there is no matching row on
right side, the result-set will contain null. LEFT JOIN is also known as LEFT OUTER JOIN.


SELECT table1.column1,table1.column2,table2.column1,....
FROM table1
LEFT JOIN table2
ON table1.matching_column = table2.matching_column;
• Right Outer Join
RIGHT JOIN is similar to LEFT JOIN. This join returns all the rows of the table on the right
side of the join and matching rows for the table on the left side of join. The rows for which
there is no matching row on left side, the result-set will contain null. RIGHT JOIN is also
known as RIGHT OUTER JOIN.
SELECT table1.column1,table1.column2,table2.column1,....
FROM table1
RIGHT JOIN table2
ON table1.matching_column = table2.matching_column;
• FULL JOIN:
The SQL FULL JOIN combines the results of both left and right outer joins.
The joined table will contain all records from both the tables and fill in NULLs for missing
matches on either side.
SELECT table1.column1, table2.column2...
FROM table1
FULL JOIN table2
ON table1.common_field = table2.common_field;
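The difference between INNER JOIN and LEFT (OUTER) JOIN can be seen in a short sqlite3 run (FULL JOIN is omitted because older SQLite versions do not support it; the tables are made up):

```python
import sqlite3

con = sqlite3.connect(':memory:')
cur = con.cursor()
cur.execute("CREATE TABLE emp (name TEXT, dept_id INTEGER)")
cur.execute("CREATE TABLE dept (id INTEGER, dname TEXT)")
cur.executemany("INSERT INTO emp VALUES (?, ?)",
                [('Ana', 1), ('Bob', 2), ('Cid', None)])
cur.executemany("INSERT INTO dept VALUES (?, ?)",
                [(1, 'HR'), (2, 'IT'), (3, 'Ops')])

inner = cur.execute("""SELECT emp.name, dept.dname
                       FROM emp INNER JOIN dept ON emp.dept_id = dept.id
                       ORDER BY emp.name""").fetchall()
left = cur.execute("""SELECT emp.name, dept.dname
                      FROM emp LEFT JOIN dept ON emp.dept_id = dept.id
                      ORDER BY emp.name""").fetchall()
print(inner)  # [('Ana', 'HR'), ('Bob', 'IT')]           -- only matching rows
print(left)   # [('Ana', 'HR'), ('Bob', 'IT'), ('Cid', None)]  -- NULL for no match
```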
4.5. SQL Constraints:
• SQL Constraints are rules used to limit the type of data that can go into a table, to
maintain the accuracy and integrity of the data inside table.
• Constraints can be divided into the following two types,
o Column level constraints: The column level constraints are applied only to one
column.
o Table level constraints: Table level constraints are applied to the whole table.
Constraints are used to make sure that the integrity of data is maintained in the database.
Following are the most used constraints that can be applied to a table.
4.5.1. NOT NULL:
• Not Null constraint restricts a column from having a NULL value. Once NOT NULL
constraint is applied to a column, you cannot pass a null value to that column. It enforces
a column to contain a proper value.
• It cannot be defined at table level.


4.5.2. UNIQUE Constraint:


• UNIQUE constraint ensures that a field or column will only have unique values. A
UNIQUE constraint field will not have duplicate data.
• This constraint can be applied at column level or table level.
4.5.3. Primary Key Constraint:
• Primary key constraint uniquely identifies each record in a database. A Primary Key
must contain unique value and it must not contain null value.
• Usually Primary Key is used to index the data inside the table.
4.5.4. Foreign Key Constraint:
FOREIGN KEY is used to relate two tables. FOREIGN KEY constraint is also used to restrict
actions that would destroy links between tables.
4.5.5. Check constraint:
The Check constraint is used to limit the value range that can be placed in a column.
If CHECK constraint is defined on a single column it allows only certain values for this
column.
If Check constraint is defined on a table it can limit the values in certain columns based
on values in other columns in the row.
4.5.6. Default Constraint:
The Default constraint is used to provide a default value for a column.
Example: Consider the given table called Persons

P-Id Lastname Firstname Address City


1. Peter Alex Timoteivn -10 Sandnes
2. Han Max Brazil-50 Sandnes
3. Sharma Pari Storgt-20 Stavanger
4. Joseph Ole Brazil-20 Sandnes

Write a query to select the persons with first name ‘Max’ and last name ‘Han’?
Sol:
SELECT *
FROM Persons
WHERE firstname = ‘Max’
AND lastname = ‘Han’
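The query can be verified with sqlite3 (the column P-Id is renamed P_Id here, since a hyphen is not a valid unquoted identifier):

```python
import sqlite3

con = sqlite3.connect(':memory:')
cur = con.cursor()
cur.execute("""CREATE TABLE Persons
               (P_Id INTEGER, Lastname TEXT, Firstname TEXT, Address TEXT, City TEXT)""")
cur.executemany("INSERT INTO Persons VALUES (?, ?, ?, ?, ?)", [
    (1, 'Peter', 'Alex', 'Timoteivn-10', 'Sandnes'),
    (2, 'Han', 'Max', 'Brazil-50', 'Sandnes'),
    (3, 'Sharma', 'Pari', 'Storgt-20', 'Stavanger'),
    (4, 'Joseph', 'Ole', 'Brazil-20', 'Sandnes'),
])
rows = cur.execute("""SELECT * FROM Persons
                      WHERE Firstname = 'Max' AND Lastname = 'Han'""").fetchall()
print(rows)  # [(2, 'Han', 'Max', 'Brazil-50', 'Sandnes')]
```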

****


DBMS

5 RELATIONAL MODEL

1. NoSQL

• NoSQL, originally referring to "non-SQL" or "non-relational", is a database that provides a
mechanism for storage and retrieval of data.
• This data is modelled in means other than the tabular relations used in relational databases.
• The purpose of using a NoSQL database is for distributed data stores with humongous data
storage needs.
1.1. NoSQL Features:
NoSQL Databases can have a common set of features such as:
• Non-relational data model.
• Runs well on clusters.
• Mostly open-source.
• Built for the new generation Web applications.
• Is schema-less.
1.2. Advantages of NoSQL:
There are many advantages of working with NoSQL databases such as MongoDB and
Cassandra. The main advantages are high scalability and high availability.
a. High scalability:
Sharding is partitioning data and placing it on multiple machines in such a way that the
order of the data is preserved. NoSQL databases use sharding for horizontal scaling.
Vertical scaling means adding more resources to the existing machine whereas horizontal
scaling means adding more machines to handle the data. Vertical scaling is not that easy
to implement but horizontal scaling is easy to implement. Examples of horizontal scaling
databases are MongoDB, Cassandra etc. NoSQL can handle huge amount of data because
of scalability, as the data grows NoSQL scale itself to handle that data in efficient manner.
b. High availability:
Auto replication feature in NoSQL databases makes it highly available because in case of
any failure data replicates itself to the previous consistent state.


1.3. Disadvantages of NoSQL:


NoSQL has the following disadvantages.
a. Narrow focus:
NoSQL databases have a very narrow focus: they are mainly designed for storage and
provide very little functionality beyond it. Relational databases remain a better choice
for transaction management.
b. Management challenge:
The purpose of big data tools is to make management of a large amount of data as simple
as possible. But it is not so easy. Data management in NoSQL is much more complex
than a relational database. NoSQL, in particular, has a reputation for being challenging
to install and even more hectic to manage on a daily basis.
c. Backup:
Backup is a great weak point for some NoSQL databases like MongoDB. MongoDB has no
approach for the backup of data in a consistent manner.
d. Large document size:
Some database systems like MongoDB and CouchDB store data in JSON format, which
means that documents are quite large (big data, network bandwidth, speed); having
descriptive key names actually hurts, since they increase the document size.
1.4. Types of NoSQL:
a. Key-value stores: The key-value model is the simplest and easiest to implement.
Every item in the database is stored as an attribute name (or "key") together with its
value. Riak, Voldemort, and Redis are the most well-known in this category.
b. Wide-column stores: They store data together as columns instead of rows and are
optimized for queries over large datasets. The most popular are Cassandra and HBase.
c. Document databases: They pair each key with a complex data structure known as a
document. Documents can contain many different key-value pairs, or key-array pairs, or
even nested documents. MongoDB is the most popular of these databases.
d. Graph databases: They are used to store information about networks, such as social
connections. Examples are Neo4J and HyperGraphDB.


2. PostgreSQl

• PostgreSQL (pronounced as post-gress-Q-L) is an open source relational database


management system (DBMS) developed by a worldwide team of volunteers.
• PostgreSQL is not controlled by any corporation or other private entity and the source code
is available free of charge.
• PostgreSQL runs on all major operating systems, including Linux, UNIX and Windows.
• It supports text, images, sounds, and video, and includes programming interfaces for C /
C++, Java, Perl, Python, Ruby, Tcl and Open Database Connectivity (ODBC).
• PostgreSQL supports four standard procedural languages, which allows the users to write
their own code in any of the languages and it can be executed by PostgreSQL database
server.
2.1. PostgreSQL Data Types:


2.2. PostgreSQL Syntax:


a. Create Table:
Syntax:
CREATE TABLE table_name(
column1 datatype,
column2 datatype,
column3 datatype,
.....
columnN datatype,
PRIMARY KEY( one or more columns )
);
Query: Make a new table named "student" with columns sRN, sName, sEmail, sGender, sAddr,
with sRN as the primary key.
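One possible answer, sketched and checked with sqlite3 (the column types are our assumption; in PostgreSQL you might prefer SERIAL or other types for sRN):

```python
import sqlite3

# Hypothetical answer to the exercise above:
ddl = """
CREATE TABLE student (
    sRN     INTEGER PRIMARY KEY,
    sName   VARCHAR(50),
    sEmail  VARCHAR(100),
    sGender VARCHAR(10),
    sAddr   VARCHAR(100)
);
"""
con = sqlite3.connect(':memory:')
con.execute(ddl)
cols = [row[1] for row in con.execute("PRAGMA table_info(student)")]
print(cols)  # ['sRN', 'sName', 'sEmail', 'sGender', 'sAddr']
```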
b. PostgreSQL INSERT Statement
Syntax:
INSERT INTO TABLE_NAME (column1, column2, column3,...columnN)
VALUES (value1, value2, value3,...valueN);
c. PostgreSQL SELECT Query
Syntax:
SELECT "column1", "column2".."column" FROM "table_name";
d. PostgreSQL WHERE Clause
Syntax:
SELECT column1, column2, ..... columnN
FROM table_name
WHERE [search_condition]
e. PostgreSQL Outer Join
Left Outer Join:
Syntax:
SELECT table1.columns, table2.columns
FROM table1
LEFT OUTER JOIN table2
ON table1.common_field = table2.common_field;

5
www.gradeup.co

2.3. PostgreSQL Vs SQL Server:

Definition:
• PostgreSQL: an object-relational database management system that is compatible with various SQL features such as subqueries, foreign keys, and triggers. It also supports multiple user-defined types and functions.
• SQL Server: a database management system that is primarily used for data warehousing solutions and e-commerce.

Written in:
• PostgreSQL: primarily written in C.
• SQL Server: primarily written in C++.

Developed by:
• PostgreSQL: the PostgreSQL Global Development Group.
• SQL Server: Microsoft.

Released:
• PostgreSQL: 8 July 1996.
• SQL Server: 24 April 1989.

Latest release version:
• PostgreSQL: PostgreSQL 12.3 [May 2020].
• SQL Server: SQL Server 2019 [November 2019].

License:
• PostgreSQL: open-source.
• SQL Server: commercial.

Programming languages:
• PostgreSQL: compatible with C, C++, Java, .Net, Perl, Python, Tcl, JavaScript (Node.js), and PHP.
• SQL Server: compatible with C++, PHP, C#, Go, Java, Python, JavaScript (Node.js), R, Visual Basic, and Ruby.

Features:
• PostgreSQL: free to download; highly extensible; supports various programming languages; highly reliable; supports multiple features of SQL; compatible with several operating systems; compatible with data integrity.
• SQL Server: platform-independent; compatible with the SQL (SEQUEL) language; case insensitive, with a shared database; highly scalable; both command-based and GUI software; implemented from the requirements of an RDBMS; can run on a single computer system or a network of cloud servers.

Platforms:
• PostgreSQL: primarily supports Unix, Windows, Linux, FreeBSD, HP-UX, NetBSD, OpenBSD, OS X, and Solaris.
• SQL Server: primarily supports Windows and Linux operating systems.

Case sensitivity:
• PostgreSQL is case sensitive; SQL Server is not case sensitive.


3. RELATIONAL ALGEBRA

Procedural Query Language:
In procedural languages, the program code is written as a sequence of instructions: the user formulates both how to retrieve data from the database and what data to retrieve.
Example: Relational Algebra.

Non-Procedural Query Language:
In non-procedural languages, the user specifies only what to do and not how to do it.
Example: Relational Calculus, i.e.
• Tuple relational calculus (TRC)
• Domain relational calculus (DRC)

The result of relational algebra is always a set of distinct tuples.


3.1. Basic Operators:
There are 6 basic operators of relational algebra:
• Selection operator (σ)
• Projection Operator (∏)
• Rename(ρ)
• Cross Product(X)
• Union (U)
• Set Difference (-)
3.2. Derived Operators:
Derived operators are those operators which can be derived from basic operators. There
are mainly three types of extended operators in Relational Algebra:
• Intersection (∩) {derived using set difference}
• Join (⋈) {derived using cross product, selection and projection}
• Division Operator (÷) {derived using selection, set difference and cross product}

4. SELECTION (σp)

• It is a unary operator
• It is denoted by σp where σ is the selection operator and p is the predicate condition.
• This operator retrieves the records of relation R which satisfies the predicate condition p.
• It is commutative.

Example: σ subject = "Database" (Books) selects tuples from Books where subject is 'database'.


Example:
R: A B C
5 7 9
5 7 9
6 9 7
7 9 7
8 10 6

σA>5(R) = A B C
6 9 7
7 9 7
8 10 6

5. PROJECTION (∏attribute list (R))

• ∏attribute list (R) is used to project required attributes from relation R and remaining attributes
are discarded.
• Commutativity does not hold on project.
• The number of tuples in the result of PROJECT is always less than or equal to the number of tuples in R.
• ∏List 1 (∏List 2 (R)) = ∏List 1 (R) iff List 2 ⊇ List 1.

Example: Π Subject, Author (Books) selects and projects the columns named Subject and Author from the relation Books.

Example:
R: A B C
5 7 9
5 7 9
6 9 7
7 9 7
8 10 6

Π B,C(R) = B C
7 9
9 7
10 6
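Selection and projection can be modelled directly with Python sets of tuples; sets make the results duplicate-free, matching the rule that relational algebra always yields distinct tuples. The helper names below are our own:

```python
# Relation as a set of tuples plus a separate attribute list.
R_attrs = ('A', 'B', 'C')
R = {(5, 7, 9), (6, 9, 7), (7, 9, 7), (8, 10, 6)}

def select(rel, pred):                 # σ_pred(rel)
    return {t for t in rel if pred(t)}

def project(rel, attrs, wanted):       # Π_wanted(rel)
    idx = [attrs.index(a) for a in wanted]
    return {tuple(t[i] for i in idx) for t in rel}

print(sorted(select(R, lambda t: t[0] > 5)))    # [(6, 9, 7), (7, 9, 7), (8, 10, 6)]
print(sorted(project(R, R_attrs, ('B', 'C'))))  # [(7, 9), (9, 7), (10, 6)]
```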


6. RENAME OPERATION (ρ)

• The results of relational algebra do not have a name that can be used to refer to them.
• The rename operation allows us to rename the output relation.
• 'rename' operation is denoted with small Greek letter rho ρ.

a. ρ(S, R): renames table R to S.
b. ρ A, B, C (R): renames the attributes of table R to A, B, C.

7. CROSS PRODUCT(X)

• Result is all attributes of R followed by all attributes of S and each record of R pairs with
every record of S.
• It is denoted by X.
• The result R X S contains one tuple (r, s) for each r ∈ R, s ∈ S.
• It is also called Cartesian product.

Example:
R: A B C        S: C D
   3 4 3           5 4
   4 6 9           9 5
   5 7 9

R × S: A B C C D
       3 4 3 5 4
       3 4 3 9 5
       4 6 9 5 4
       4 6 9 9 5
       5 7 9 5 4
       5 7 9 9 5


8. SET OPERATIONS

• The result of a set operation always contains distinct tuples.
• The schema of the result is the same as the schema of R.
8.1. Union (U):
• Suppose there are two tuples R and S. The union operation contains all the tuples that
are either in R or S or both in R & S.
• It eliminates the duplicate tuples. It is denoted by ∪.
• For a union operation to be valid, the following conditions must hold -
i. R and S must have the same number of attributes.
ii. Attribute domains need to be union compatible i.e. they have same number of
fields and corresponding fields, taken in order from left to right, have same
domain.
Example:
R: A B C        S: A B C
   4 7 4           4 7 4
   4 7 4           4 7 4
   5 7 7           5 7 7
   6 2 4           7 2 4

R ∪ S: A B C
       4 7 4
       5 7 7
       6 2 4
       7 2 4
8.2. Intersection (∩):
• The set intersection operation contains all tuples that are in both R & S.
• It is denoted by intersection ∩.
• Defines a relation consisting of a set of all tuple that are in both A and B. However, A
and B must be union compatible.
• It is also defined in terms of set difference as: R ∩ S = R – (R – S)
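The identity R ∩ S = R – (R – S) can be checked directly with Python sets of tuples (the sample values mirror the examples in this section):

```python
# Distinct tuples of the example relations, as Python sets.
R = {(4, 7, 4), (5, 7, 7), (6, 2, 4)}
S = {(4, 7, 4), (5, 7, 7), (7, 2, 4)}

# Intersection expressed via set difference, as in the identity above.
assert R & S == R - (R - S)
print(R & S)  # both sides equal {(4, 7, 4), (5, 7, 7)}
```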


Example:
R: A B C        S: A B C
   4 7 4           4 7 4
   4 7 4           4 7 4
   5 7 7           5 7 7
   6 2 4           7 2 4

R ∩ S: A B C
       4 7 4
       5 7 7

8.3. Set Difference (-):


• The set difference operation contains all tuples that are in R but not in S.
• It is denoted by minus (–).
• The attribute name of R must match with the attribute name in S.
• The two-operand relations R and S should be either compatible or Union compatible.

R: A B C        S: A B C
   4 7 4           4 7 4
   4 7 4           4 7 4
   5 7 7           5 7 7
   6 2 4           7 2 4

R – S: A B C
       6 2 4


9. JOIN

• A Join operation combines related tuples from different relations, if and only if a given join
condition is satisfied.
• Join operation is essentially a cartesian product followed by a selection criterion.
• It is denoted by ⋈.
• There are 4 types of join:
➢ Natural Join
➢ Conditional Join
➢ Equi Join
➢ Outer Join
9.1. Natural Join:
• Natural join can only be performed if there is a common attribute (column) between
the relations.
• The name and type of the attribute must be same.
• Although the cross product repeats the common column, it is projected only once in
the final result.
R ⋈ S = π distinct attributes (σ equality between same name attributes from R, S (R × S))
Example:
R: A B C        S: C D
   3 4 3           5 4
   4 6 9           9 5
   5 7 9

R ⋈ S = πA, B, C, D (σR.C = S.C (R × S))

R ⋈ S: A B C D
       4 6 9 5
       5 7 9 5
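The natural join above can be sketched following its definition (cross product, equality selection, then projecting the shared column once); the helper name and column positions are illustrative:

```python
# R(A, B, C) and S(C, D) from the example; C is the common attribute.
R = [(3, 4, 3), (4, 6, 9), (5, 7, 9)]
S = [(5, 4), (9, 5)]

def natural_join(r, s):
    """Pair tuples with equal C values; keep S's C only implicitly (b[1:])."""
    return [a + b[1:] for a in r for b in s if a[2] == b[0]]

print(natural_join(R, S))  # [(4, 6, 9, 5), (5, 7, 9, 5)]
```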
9.2. Conditional join:
• Conditional join works like natural join, except that it accepts an explicit join
condition along with the pair of relation instances.
• In natural join, the default condition is equality between the common attributes,
while in conditional join we can specify any condition.
R: A B C        S: C D
   3 4 3           5 4
   4 6 9           9 5
   5 7 9
R ⋈c S = σc (R × S)
Let us check the condition that attribute C in R is at least 7:
R ⋈R.C≥7 S = σR.C≥7 (R × S)

A B C C D
4 6 9 5 4
4 6 9 9 5
5 7 9 5 4
5 7 9 9 5
9.3. Equi Join:
• It is also known as an inner join.
• When the conditional join uses only an equivalence condition, it becomes an equi join.
• It is based on matched data as per the equality condition. The equi join uses the
comparison operator (=).
9.4. Outer Joins:
Let us use two tables R and S:
R: A B C        S: C D
   4 6 9           9 5
   3 4 3           5 4
   5 7 9
i. Left Outer Join:
• In the left outer join, the operation keeps every tuple of the left relation.
• It is denoted by ⟕.
• If no matching tuple is found in the right relation, then the attributes of the right
relation in the join result are filled with null values.

R ⟕ S: A B C D
       4 6 9 5
       5 7 9 5
       3 4 3 Null
ii. Right Outer Join:
• In the right outer join, the operation keeps every tuple of the right relation.
• It is denoted by ⟖.
• If no matching tuple is found in the left relation, then the attributes of the
left relation in the join result are filled with null values.


R ⟖ S: A B C D
       4 6 9 5
       5 7 9 5
       Null Null 5 4
iii. Full Outer Join:
• In a full outer join, all tuples from both relations are included in the result,
irrespective of the matching condition.
• It is denoted by ⟗.
R ⟗ S = ( R ⟕ S) ∪ (R ⟖ S)
A B C D
4 6 9 5
5 7 9 5
3 4 3 Null
Null Null 5 4
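The three outer joins can be sketched in Python, padding non-matching tuples with `None` in place of Null (helper names are illustrative):

```python
# R(A, B, C) and S(C, D) from the outer-join example.
R = [(4, 6, 9), (3, 4, 3), (5, 7, 9)]
S = [(9, 5), (5, 4)]

def left_outer(r, s):
    out = []
    for a in r:
        matches = [b for b in s if b[0] == a[2]]   # join on attribute C
        if matches:
            out.extend(a + b[1:] for b in matches)
        else:
            out.append(a + (None,))                # pad right attributes
    return out

def full_outer(r, s):
    out = left_outer(r, s)
    for b in s:                                    # add right-only tuples
        if not any(a[2] == b[0] for a in r):
            out.append((None, None) + b)           # pad left attributes
    return out

print(left_outer(R, S))  # [(4, 6, 9, 5), (3, 4, 3, None), (5, 7, 9, 5)]
print(full_outer(R, S))  # additionally contains (None, None, 5, 4)
```

The right outer join is symmetric: swap the roles of R and S and reorder the padded columns.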

10. DIVISION

Division operator R÷S can be applied if and only if:


• The attributes of S are a proper subset of the attributes of R.
• The relation returned by the division operator has attributes = (all attributes of R – all
attributes of S).
• The division operator returns those tuples from relation R which are associated with
every tuple of S.
Expansion of Division
∏sid,cid(E) ÷ ∏cid(C) retrieves the sid's enrolled in every course.
(a) The sid's not enrolled in every course are those whose set of enrolled courses is a
proper subset of all courses.
(b) {sid's enrolled in every course} = {sid's enrolled for some course} – {sid's not
enrolled in every course}
∏sid,cid(E) ÷ ∏cid(C) = ∏sid(E) – ∏sid((∏sid(E) × ∏cid(C)) – ∏sid,cid(E))
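The division operator can be sketched directly from its definition; the sample enrollment data below is an assumed illustration of E(sid, cid) and C(cid):

```python
# E(sid, cid): enrollments; C(cid): the set of all courses (sample data).
E = {(1, 'c1'), (1, 'c2'), (2, 'c1'), (3, 'c1'), (3, 'c2')}
C = {('c1',), ('c2',)}

def divide(e, c):
    """E / C: sids associated with every cid in C."""
    sids = {sid for sid, _ in e}
    courses = {cid for (cid,) in c}
    return {s for s in sids
            if courses <= {cid for sid, cid in e if sid == s}}

print(divide(E, C))  # {1, 3}: only sids 1 and 3 are enrolled in every course
```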

11. TUPLE RELATIONAL CALCULUS (TRC)

• In the tuple relational calculus, you will have to find tuples for which a predicate is true. The
calculus is dependent on the use of tuple variables.
• The tuple relational calculus is specified to select the tuples in a relation. In TRC, filtering
variable uses the tuples of a relation.
• The result of the relation can have one or more tuples.
Notation:
1. {T | P (T)} or {T | Condition (T)}
Where
T is the resulting tuples
P(T) is the condition used to fetch T.
For example:
1. {T.name | Author(T) AND T.article = 'database' }
OUTPUT: This query selects the tuples from the AUTHOR relation. It returns a tuple with 'name'
from Author who has written an article on 'database'.
TRC (tuple relation calculus) can be quantified. In TRC, we can use Existential (∃) and Universal
Quantifiers (∀).
Example 1: {t | EMPLOYEE (t) and t.SALARY>10000} : Implies that it selects the tuples
from EMPLOYEE relation such that resulting employee tuples will have salary greater than
10000. It is example of selecting a range of values.
Example 2: {t | EMPLOYEE (t) AND t.DEPT_ID = 10} : This select all the tuples of
employee name who work for Department 10.
The variable which is used in the condition is called a tuple variable. In the above
examples t.SALARY and t.DEPT_ID are tuple variables. In the first example, we specified
the condition t.SALARY > 10000, meaning: for all SALARY > 10000, display the
employees. Here SALARY is called a bound variable. Any tuple variable with a 'For All'
(∀) or 'there exists' (∃) condition is called a bound variable. Here, for any range of
values of SALARY greater than 10000, the meaning of the condition remains the same.
Bound variables are those ranges of tuple variables whose meaning will not change if the
tuple variable is replaced by another tuple variable.
In the second example, we have used DEPT_ID= 10. That means only for DEPT_ID = 10 display
employee details. Such a variable is called a free variable. Any tuple variable without a
'For All' (∀) or 'there exists' (∃) condition is called a free variable. If we change
DEPT_ID in this condition to some other variable, say EMP_ID, the meaning of the query
changes; for example, EMP_ID = 10 gives a different result set. Free variables are those
ranges of tuple variables whose meaning will change if the tuple variable is replaced by
another tuple variable.
All the conditions used in the tuple expression are called a well-formed formula (WFF).
The conditions in the expression are combined using logical operators like AND, OR and
NOT, and quantifiers like 'For All' (∀) or 'there exists' (∃). A WFF in which all tuple
variables are bound is called a closed WFF; an open WFF has at least one free variable.
For example:
1. {R| ∃T ∈ Authors(T.article='database' AND R.name=T.name)}
Output: This query will yield the same result as the previous one.

Operation        Relational Algebra    Tuple Relational Calculus

Selection        σcond(R)              {T | R(T) AND Cond(T)}

Projection       ΠA1, A2, …, Ak(R)     {T.A1, T.A2, …, T.Ak | R(T)}

Cartesian        R × S                 {T | ∃T1 ∈ R, ∃T2 ∈ S (T.A1 = T1.A1 AND … AND
Product                                T.An = T1.An AND T.B1 = T2.B1 AND … AND T.Bm = T2.Bm)}

Union            R ∪ S                 {T | R(T) OR S(T)}

Set Difference   R – S                 {T | R(T) AND ∀T1 ∈ S (T1 <> T)},
                                       where T <> T1 means T.A1 <> T1.A1 OR … OR T.An <> T1.An

****


DBMS

6 FILE STRUCTURE

1. FILE ORGANIZATION

• File Organization refers to the logical relationships among various records that constitute
the file, particularly with respect to the means of identification and access to any specific
record.
• In simple terms, Storing the files in certain order is called file Organization.
• File Structure refers to the format of the label and data blocks and of any logical control
record.
• The database is stored as a collection of files.
• File is a collection of blocks.
• A block is a collection of records.
• A record is a sequence of fields.
For storing the records within the blocks there are two ways:
a. Spanned Organization:
• Allows part of a record to be in one block and the rest of it to be on the next block.
• When (portions of) a single record may lie in different blocks, due to their large size, then
such records are called Spanned Records.
• No internal fragmentation except for the last block of the file, hence no memory wasted.
• Searching time will be more, since a single record may require accessing two or more blocks.


b. Unspanned Organization:
• When many records are restricted to fit within one block due to their small size then such
records are called Unspanned Records.
• Complete record is allocated in one block.
• Internal fragmentation is present.
• Less searching time.

2. BLOCK FACTOR OF DATABASE FILE

The blocking factor for a file is the average number of file records stored in a disk block.

Block factor = ⌊Block size / Record size⌋ records/block
             = ⌊(Block size – Block header) / Record size⌋ when each block carries a header
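The blocking-factor formula can be worked through with some assumed sizes (the byte counts below are illustrative, not fixed by any standard):

```python
# Assumed sizes in bytes, for illustration only.
block_size = 4096
record_size = 500
block_header = 100

# Unspanned organization: only whole records fit in a block, so floor-divide.
bfr = block_size // record_size                        # 4096 // 500 = 8
bfr_with_header = (block_size - block_header) // record_size  # 3996 // 500 = 7

print(bfr, bfr_with_header)  # 8 7
```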

3. I/O COST

Number of disk blocks (database file blocks) required to transfer from disk to main memory in
order to access required records.


4. TYPES OF FILE ORGANIZATIONS

4.1. Sequential File:


• Storing and sorting in contiguous block within files on tape or disk is called
as sequential access file organization.
• In sequential access file organization, all records are stored in a sequential order. The
records are arranged in the ascending or descending order of a key field.
• Sequential file search starts from the beginning of the file and the records can be
added at the end of the file.
• In sequential file, it is not possible to add a record in the middle of the file without
rewriting the file.
• Blocking factor: ⌊Block size / Record size⌋
• Number of record blocks: ⌈Total no. of records / Blocking factor⌉
• Average number of blocks accessed by linear search: ⌈# of record blocks / 2⌉
• Average number of blocks accessed by binary search: ⌈log2(# of record blocks)⌉
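The access-cost formulas above can be evaluated for an assumed file (all parameter values below are illustrative):

```python
import math

# Assumed file parameters, for illustration.
num_records = 30000
block_size = 1024
record_size = 100

bfr = block_size // record_size             # 10 records per block
num_blocks = math.ceil(num_records / bfr)   # 3000 record blocks

linear_avg = num_blocks // 2                    # ~1500 block accesses on average
binary = math.ceil(math.log2(num_blocks))       # 12 block accesses

print(num_blocks, linear_avg, binary)  # 3000 1500 12
```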
4.2. Direct access file organization
• Direct access file is also known as random access or relative file organization.
• In direct access file, all records are stored in direct access storage device (DASD), such
as hard disk. The records are randomly placed throughout the file.
• The records do not need to be in sequence because they are updated directly and
rewritten back in the same location.
• This file organization is useful for immediate access to large amount of information. It
is used in accessing large databases.
• It is also called hashing.
4.3. Indexed sequential access file organization
• Indexed sequential access file combines both sequential file and direct access file
organization.
• In an indexed sequential access file, records are stored on a direct access device such
as a magnetic disk, ordered by a primary key.
• The file can have multiple keys. The key on which the records are ordered, which can
be alphanumeric, is called the primary key.
• The data can be accessed either sequentially or randomly using the index. The index is
stored in a file and read into memory when the file is opened.


5. INDEXING

• Indexing is defined as a data structure technique which allows to retrieve records from a
database file.
• Indexing is used to optimize the performance of a database by minimizing the number of
disk accesses required when a query is processed.
• The index is a type of data structure. It is used to locate and access the data in a database
table quickly.
• Indexes are created using a few database columns.
➢ The first column is the Search key that contains a copy of the primary key or candidate
key of the table. These values are stored in sorted order so that the corresponding data
can be accessed quickly. The data may or may not be stored in sorted order.
➢ The second column is the Data Reference or Pointer which contains a set of pointers
holding the address of the disk block where that particular key value can be found.
There are two types of indexing:

Index File:
Each index file consists of two fields: search key and pointer
Hence size of index entry = search key + pointer

• Index blocking factor = ⌊Block size / Index entry size⌋
• First (single) level index blocks: ⌈# of record blocks / Index blocking factor⌉
• Number of block accesses = ⌈log2(first level index blocks)⌉ + 1
Block factor of Index File:
Block factor of index file = ⌊(Block size – Block Header) / (Search Key + Pointer)⌋ entries/block
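These index-cost formulas can be combined into a small calculation (the sizes below are assumed for illustration):

```python
import math

# Assumed parameters, for illustration.
num_record_blocks = 3000
block_size = 1024
search_key = 10   # bytes
pointer = 6       # bytes

index_bfr = block_size // (search_key + pointer)               # 64 entries/block
first_level_blocks = math.ceil(num_record_blocks / index_bfr)  # 47 index blocks
# Binary search over the index blocks, plus one access for the data block.
accesses = math.ceil(math.log2(first_level_blocks)) + 1        # 7 block accesses

print(index_bfr, first_level_blocks, accesses)  # 64 47 7
```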


5.1. Categories of Ordering of Index:


a. Dense Index:
• Index entry for each record of database.
• The dense index contains an index record for every search key value in the data file.
It makes searching faster.
• In this, the number of records in the index table is same as the number of records in
the main table.
• It needs more space to store index record itself. The index records have the search
key and a pointer to the actual record on the disk.
• In dense index, Number of Index entries = Number of records in database file.

b. Sparse Index:
• For set of records of Database file there exist one index entry.
• In the data file, index record appears only for a few items. Each item points to a block.
• In this, instead of pointing to each record in the main table, the index points to the
records in the main table in a gap.
• In sparse index, Number of Index entries = Number of blocks


Ordered Indices: The indices are usually sorted to make searching faster. The indices
which are sorted are known as ordered indices.
Unordered Indices: The indices which are not sorted are known as unordered indices.
5.2. Indexing Models:
a. Primary Index:
• It is defined as an ordered file.
• The index is created based on the primary key of the table.
• It is a sparse indexing on ordered(sorted) key attribute.
• To get a record with value k, binary search is performed on the index file to find the
appropriate index entry i and then retrieve the data file block whose address is p(i).
• I/O cost to access record using primary index with multilevel index is K+1 block.

b. Clustered Index:
• A clustered index can be defined as an ordered data file.
• The records are sorted over the attribute name which is not a primary key.
• The data file is ordered on a non-key field.
• Same elements are taken in a single cluster.
• In clustering index, the pointers point to the first address of the block, next pointer is
used to access the next block.
• It is a sparse index on ordered field and non-key attribute.
• I/O cost to access one cluster of records using clustering index with multilevel index is
K+ one or more block access until 1st block of next cluster.


c. Secondary Index:
• It is a dense index on unordered data file.
• In secondary indexing, to reduce the size of mapping, another level of indexing is
introduced.
• In this method, the huge range for the columns is selected initially so that the mapping
size of the first level becomes small.
• Then each range is further divided into smaller ranges.
• The mapping of the first level is stored in the primary memory, so that address fetch
is faster.
• The mapping of the second level and actual data are stored in the secondary memory
(hard disk).


d. Multi-level index:
• Index records comprise search-key values and data pointers. Multilevel index is stored
on the disk along with the actual database files.
• As the size of the database grows, so does the size of the indices. If single-level index
is used, then a large size index cannot be kept in memory which leads to multiple disk
accesses.
• Multi-level Index helps in breaking down the index into several smaller indices in order
to make the outermost level so small that it can be saved in a single disk block, which
can easily be accommodated anywhere in the main memory.

6. B+ TREE

• It is a balanced search tree or height restricted search tree.


• The height of the tree is O(log n).
• The leaf nodes of a B+ tree denote actual data pointers.
• B+ tree ensures that all leaf nodes remain at the same height, thus balanced.
• All the leaf nodes are connected via link just like in linked list.

Structure:

• The leaf nodes are at equal distance from the root.


• The structure consists of internal and leaf nodes.
• The order p of a tree is the maximum number of block pointers that can be stored in a
node of the B+ tree.

Let, K is the key, B is the block pointer, and R is the record pointer.

Then, the node structure will be:


(a) Leaf Node:
It consists of a set of (key, record pointer) pairs and one block pointer which points to
the next leaf node.

(b) Internal Node:


It contains only keys and Block pointers


Balancing Conditions:

• Every internal node except the root contains at least ⌈p/2⌉ block pointers and
⌈p/2⌉ – 1 keys.
• An internal node can contain at most p pointers and (p – 1) keys.
• The root node has at least 2 block pointers and 1 key, and at most p block pointers
and (p – 1) keys.
• A leaf node can contain at least ⌈p/2⌉ – 1 keys and at most (p – 1) keys.
6.1. B+ Tree Insertion:
Descend to the leaf where the key fits.
• If the node has an empty space, insert the key/reference pair into the node.
• If the node is already full, split it into two nodes, distributing the keys evenly between
the two nodes. If the node is a leaf, take a copy of the minimum value in the second of
these two nodes and repeat this insertion algorithm to insert it into the parent node. If
the node is a non-leaf, exclude the middle value during the split and repeat this insertion
algorithm to insert this excluded value into the parent node.
Example: Insert a record 60 in the structure given below. With order P= 5.

Sol: order = 5, so 4 keys can be stored in a leaf node.


• As we have to insert 60 so, it will go to 3rd leaf node after 55 because 60 is greater
than 55 and less than 75.
• It is a balanced tree and that leaf node is already full, we cannot insert the record
there. So, we split the node to insert 60.

• The 3rd leaf node would then hold the values (50, 55, 60, 65, 70), one more than
allowed, with 50 as its key in the parent.


• Split the leaf node in the middle so that its balance is not altered. Now group (50, 55)
and (60, 65, 70) into 2 leaf nodes.
• 60, the minimum of the second new leaf, is copied to the upper level so that the
parent can distinguish the two newly created leaf nodes.

• In a non-overflowing scenario simply find the node where it fits and place it in that leaf
node.
6.2. B+ Tree Deletion:
Descend to the leaf where the key exists.
• Remove the required key and associated reference from the node.
• If the node still has enough keys and references to satisfy the invariants, stop.
• If the node has too few keys to satisfy the invariants, but its next oldest or next
youngest sibling at the same level has more than necessary, distribute the keys between
this node and the neighbor. Repair the keys in the level above to represent that these
nodes now have a different “split point” between them; this involves simply changing a
key in the levels above, without deletion or insertion.
• If the node has too few keys to satisfy the invariant, and the next oldest or next
youngest sibling is at the minimum for the invariant, then merge the node with its sibling;
if the node is a non-leaf, we will need to incorporate the “split key” from the parent into
our merging. In either case, we will need to repeat the removal algorithm on the parent
node to remove the “split key” that previously separated these merged nodes — unless
the parent is the root and we are removing the final key from the root, in which case the
merged node becomes the new root (and the tree has become one level shorter than
before).
Example: Delete 60 from the diagram given below. Here, order P =5.

Sol:


• To delete 60 first, remove 60 from the intermediate node as well as from the 4th leaf
node too.
• If we remove it from the intermediate node, then the tree does not satisfy the rule of
the B+ tree.
• After deleting node 60 from above B+ tree and re-arranging the nodes, it will show as
follows:

7. B TREE

• It is a balanced search tree or height restricted search tree.


• The height of the tree is O(log n).
(i) Each internal node in the B-tree is of the form
<P1, <K1, Pr1>, P2, <K2, Pr2>, …, <Kq–1, Prq–1>, Pq>
Where q ≤ p. Each Pi is a tree pointer—a pointer to another node in the B-tree. Each Pri is a
data pointer—a pointer to the record whose search key field value is equal to K i (or to the data
file block containing that record).
(ii) Within each node, K1 < K2 < … < Kq–1.
(iii) For all search key field values X in the subtree pointed at by Pi (the ith subtree), we have:
Ki–1 < X < Ki for 1 < i < q; X < Ki for i = 1; and Ki–1 < X for i = q.
(iv) Each node has at most p tree pointers.
(v) Each node, except the root and leaf nodes, has at least ⌈p/2⌉ tree pointers. The root node
has at least two tree pointers unless it is the only node in the tree.
(vi) A node with q tree pointers, q ≤ p, has q – 1 search key field values (and hence has q – 1
data pointers).

7.1. B- Tree Insertion:


• Initialize x as root.
• While x is not leaf, do following
➢ Find the child of x that is going to be traversed next. Let the child be y.


➢ If y is not full, change x to point to y.


➢ If y is full, split it and change x to point to one of the two parts of y.
If k is smaller than mid key in y, then set x as the first part of y.
Else second part of y. When we split y, we move a key from y to its parent x.
• The loop in step 2 stops when x is leaf. x must have space for 1 extra key as we have
been splitting all nodes in advance. So simply insert k to x.
Example: Insert 90 in the tree given below.

Sol: This insertion will cause a split. The middle key will go up to the parent.

7.2. B- Tree Deletion:


• Case 1: If key k is in node x and x is a leaf, and removing k does not cause the leaf
to violate the balancing property, simply delete k from x.

Deleting 6


• Case 2: If key k is in node x and x is an internal node, there are three cases to consider:
a. If the child y that precedes k in node x has at least t keys (more than the minimum),
then find the predecessor key k' in the subtree rooted at y. Recursively delete k' and
replace k with k' in x.
b. Symmetrically, if the child z that follows k in node x has at least t keys, find the
successor k' and delete and replace as before. Note that finding k' and deleting it can be
performed in a single downward pass.
c. Otherwise, if both y and z have only t − 1 (minimum number) keys, merge k and all of
z into y, so that both k and the pointer to z are removed from x. y now contains 2t − 1
keys, and subsequently k is deleted.

7 deleted

• Case 3: If key k is not present in internal node x, determine the root of the
appropriate subtree that must contain k. If that root has only t − 1 keys, execute
either of the following two cases to ensure that we descend to a node containing at
least t keys. Finally, recurse to the appropriate child of x.
a. If that root has only t − 1 keys but has a sibling with t keys, give it an extra key
by moving a key from x down into it, moving a key from its immediate left or right
sibling up into x, and moving the appropriate child from the sibling to it.


2 deleted

b. If the root and all its siblings have t−1 keys, merge the root with one sibling. This
involves moving a key down from x into the new merged node to become the median key
for that node.

4 deleted


8. B TREE VS B+ TREE

S.No.  B Tree                                     B+ Tree

1.     Search keys cannot be repeatedly stored.   Redundant search keys can be present.

2.     Data can be stored in leaf nodes as        Data can only be stored on the leaf
       well as internal nodes.                    nodes.

3.     Searching for some data is a slower        Searching is comparatively faster as
       process since data can be found on         data can only be found on the leaf
       internal nodes as well as on the leaf      nodes.
       nodes.

4.     Deletion of internal nodes is              Deletion is never a complex process
       complicated and time consuming.            since an element is always deleted
                                                  from a leaf node.

5.     Leaf nodes cannot be linked together.      Leaf nodes are linked together to make
                                                  the search operations more efficient.

****


DBMS

7 TRANSACTION & CONCURRENCY CONTROL

1. TRANSACTION

A transaction is a single logical unit of work which accesses and possibly modifies the contents
of a database. Transactions access data using read and write operations.
a. Read Operation:
• Read operation reads the data from the database and then stores it in the buffer in main
memory.
• For example- Read(X) instruction will read the value of X from the database and will
store it in the buffer in main memory.
• Steps are:
➢ Find the block that contain data item X.
➢ Copy the block to a buffer in the main memory.
➢ Copy item X from the buffer to the program variable named X.
b. Write Operation:
• Write operation writes the updated data value back to the database from the buffer.
• For example- Write(X) will write the updated value of X from the buffer to the
database.
• Steps are:
➢ Find address of block which contains data item X.
➢ Copy disk block into a buffer in the main memory
➢ Update data item X in main memory buffer.
➢ Store the updated block from the buffer back to disk.


2. ACID PROPERTIES

• To ensure the consistency of database, certain properties are followed by all the transactions
occurring in the system.
• It is important to ensure that the database remains consistent before and after the
transaction.
• These properties are called as ACID Properties of a transaction.

A = Atomicity
C = Consistency
I = Isolation
D = Durability

Atomicity:
• This property ensures that a transaction should execute all the operations including commit
or none of them i.e. the transaction occurs completely, or it does not occur at all.
• It is the responsibility of Transaction Control Manager to ensure atomicity of the
transactions.
Consistency:
• This means that integrity constraints must be maintained i.e. It ensures that the database
remains consistent before and after the transaction.
• Transaction must be logically correct, and correctness should be monitored by the user.
• It is the responsibility of DBMS and application programmer to ensure consistency of the
database.
Isolation:
• Execution of one transaction is isolated from that of another transactions.
• It ensures that concurrent execution of transaction results in a system state that would be
obtained if transaction were executed serially, i.e., one after the other.
• It is the responsibility of concurrency control manager to ensure isolation for all the
transactions.
Durability:
• The changes applied to the database by a successfully committed transaction must persist
in the database.
• It also ensures that these changes exist permanently and are never lost even if there occurs
a failure of any kind.
• It is the responsibility of recovery manager to ensure durability in the database.


3. SCHEDULE

A schedule is a sequence that indicates the chronological order in which the instructions
of concurrent transactions are executed.

3.1. Serial Schedule:

• All the transactions execute serially, one after the other, i.e. the next transaction
begins only after the commit of the previous one.
• Every serial schedule always preserves consistency.
• When one transaction executes, no other transaction is allowed to execute.
• Fewer users can use the database at a time.
• The number of serial schedules possible with n transactions is n!.
Example:

Transaction T1 Transaction T2

R(A)

W(A)

R(B)

W(B)

Commit

R(A)

W(B)

Commit

• Transaction T1 executes first and after completion of T1, T2 is executed


• So, this schedule is an example of a Serial Schedule.
3.2. Concurrent Schedule:
• Multiple transactions execute simultaneously.
• Operations of all the transactions are inter leaved or mixed with each other.
• May cause inconsistency, so to maintain consistency transaction should satisfy ACID
property.


Example:

Transaction T1 Transaction T2

R(A)

W(B)

R(A)

R(B)

W(B)

Commit

R(B)

Commit

• The operations of T1 and T2 are interleaved.


• So, this schedule is an example of a Non-Serial Schedule.
Conflicts in Concurrent schedule:
Two operations conflict if all the following conditions are satisfied:
• They belong to different transactions.
• They access the same data item.
• At least one of them is a write operation.
a. Read-Write Conflict:

Ti Tj

R(A) ⋮

⋮ W(A)

b. Write-Read Conflict

Ti Tj

W(A) ⋮

⋮ R(A) → uncommitted
read
c. Write-Write Conflict:

Ti Tj

W(A) ⋮

⋮ W(A)

Concurrency Problems in DBMS


• Dirty Read: Reading the data written by an uncommitted transaction is called as dirty
read.


Transaction T1          Transaction T2

R(A)
W(A)
                        R(A) //Dirty Read
                        W(A)
                        Commit
⋮
Failure

• T2 reads the dirty value of A written by the uncommitted transaction T1.


• T1 fails in later stages and roll backs.
• Thus, the value that T2 read now stands to be incorrect.
• Therefore, database becomes inconsistent.

4. SERIALIZABILITY

• It is the classical concurrency scheme. It ensures that a schedule for executing concurrent
transactions is equivalent to one that executes the transactions serially in some order.
• Some non-serial schedules may lead to inconsistency of the database.
• Serializability is a concept that helps to identify which non-serial schedules are correct and
will maintain the consistency of the database.

4.1. Conflict Serializability:


• A schedule is called conflict serializable if it can be transformed into a serial
schedule by swapping non-conflicting operations.
• The schedule will be a conflict serializable if it is conflict equivalent to a serial schedule.
• Conflict serializability: Instructions Ii and Ij of transactions Ti and Tj respectively,
conflict if and only if there exists some item Q accessed by both Ii and Ij, and at least one
of these instructions wrote Q.
(i) Ii = read (Q), Ij = read (Q). Ii and Ij don’t conflict.


(ii) Ii = read (Q), Ij = write (Q). They conflict.


(iii) Ii = write (Q), Ij = read (Q). They conflict.
(iv) Ii = write (Q), Ij = write (Q). They conflict.
Steps to check Conflict Serializable Schedule:
Schedule S is conflict serializable iff the precedence graph of schedule S is acyclic.
Step 1: List all the conflicting operations.
Step 2: For the precedence graph:
• Draw an edge for each conflict pair such that if X i (V) and Yj (V) forms a conflict pair
then draw an edge from Ti to Tj, in which one of the V has to be a write operation.
• This ensures that Ti gets executed before Tj.
Step 3: Check for cycles in the graph. If there is no cycle, then conflict serializable
otherwise not conflict serializable.
S : r1(A) w2(A) w2(B) r3(B) r3(C) w1(C). Test whether S is conflict serializable or not.
Sol.
Here the conflict pairs are:
r1 (A) – w2 (A)  →  edge T1 → T2
w2 (B) – r3 (B)  →  edge T2 → T3
r3 (C) – w1 (C)  →  edge T3 → T1
These edges form the cycle T1 → T2 → T3 → T1 in the precedence graph, so S is not
conflict serializable.
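The precedence-graph test above can be sketched in code. This is a minimal sketch (the function names and the schedule encoding as `(transaction, operation, item)` tuples are my own, not from the notes): list every conflicting pair, draw an edge Ti → Tj for each, then look for a cycle with depth-first search.

```python
def conflicting(op1, op2):
    """Two operations conflict if they come from different transactions,
    touch the same item, and at least one of them is a write."""
    t1, a1, x1 = op1
    t2, a2, x2 = op2
    return t1 != t2 and x1 == x2 and 'W' in (a1, a2)

def precedence_edges(schedule):
    """Edge (Ti, Tj) for every conflict pair where Ti's operation comes first."""
    edges = set()
    for i in range(len(schedule)):
        for j in range(i + 1, len(schedule)):
            if conflicting(schedule[i], schedule[j]):
                edges.add((schedule[i][0], schedule[j][0]))
    return edges

def has_cycle(edges):
    """Depth-first search over the precedence graph, three-colour marking."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {n: WHITE for n in graph}
    def dfs(n):
        colour[n] = GREY
        for m in graph[n]:
            if colour[m] == GREY or (colour[m] == WHITE and dfs(m)):
                return True        # back edge found: cycle
        colour[n] = BLACK
        return False
    return any(colour[n] == WHITE and dfs(n) for n in graph)

# S : r1(A) w2(A) w2(B) r3(B) r3(C) w1(C)
S = [('T1', 'R', 'A'), ('T2', 'W', 'A'), ('T2', 'W', 'B'),
     ('T3', 'R', 'B'), ('T3', 'R', 'C'), ('T1', 'W', 'C')]
edges = precedence_edges(S)
print(sorted(edges))      # [('T1', 'T2'), ('T2', 'T3'), ('T3', 'T1')]
print(has_cycle(edges))   # True -> S is NOT conflict serializable
```

The printed edges are exactly the three conflict pairs listed in the solution, and the cycle they form confirms the conclusion.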

Conflict Equivalent:
Two schedules are conflict equivalent if one can be transformed into the other by swapping
non-conflicting operations.
Two schedules are said to be conflict equivalent if and only if:
1. They contain the same set of the transaction.
2. Conflict pairs must have same precedence in both schedules.
The serial schedules that are conflict equivalent to S are exactly the topological orders of the precedence graph of S.
Example: S2 is conflict equivalent to S1 (S1 can be converted to S2 by swapping non-
conflicting operations).

Non-serial schedule S1:

T1          T2
Read(A)
Write(A)
            Read(A)
            Write(A)
Read(B)
Write(B)
            Read(B)
            Write(B)

Serial schedule S2:

T1          T2
Read(A)
Write(A)
Read(B)
Write(B)
            Read(A)
            Write(A)
            Read(B)
            Write(B)
Schedule S2 is a serial schedule because, in this, all operations of T1 are performed before
starting any operation of T2. Schedule S1 can be transformed into a serial schedule by
swapping non-conflicting operations of S1.
After swapping of non-conflict operations, the schedule S1 becomes:
T1 T2

Read(A)
Write(A)
Read(B)
Write(B)
Read(A)
Write(A)
Read(B)
Write(B)
Hence, S1 is conflict serializable.
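The swapping argument above can be mechanised. Below is a small sketch (the helper names and the tuple encoding are my own): bubble every out-of-order adjacent pair toward the target serial order, swapping only when the pair does not conflict. If a needed swap would involve a conflicting pair, the schedule is not conflict equivalent to that serial order.

```python
def conflicts(op1, op2):
    """Different transactions, same item, at least one write."""
    t1, a1, x1 = op1
    t2, a2, x2 = op2
    return t1 != t2 and x1 == x2 and 'W' in (a1, a2)

def swap_to_serial(schedule, order):
    """Bubble operations into the serial order `order` (e.g. ['T1', 'T2'])
    using only adjacent swaps of NON-conflicting operations. Returns the
    serial schedule, or None if a required swap hits a conflicting pair."""
    s = list(schedule)
    rank = {t: i for i, t in enumerate(order)}
    changed = True
    while changed:
        changed = False
        for i in range(len(s) - 1):
            if rank[s[i][0]] > rank[s[i + 1][0]]:   # pair is out of serial order
                if conflicts(s[i], s[i + 1]):
                    return None                     # cannot swap a conflict pair
                s[i], s[i + 1] = s[i + 1], s[i]
                changed = True
    return s

# Schedule S1 from the table above.
S1 = [('T1', 'R', 'A'), ('T1', 'W', 'A'), ('T2', 'R', 'A'), ('T2', 'W', 'A'),
      ('T1', 'R', 'B'), ('T1', 'W', 'B'), ('T2', 'R', 'B'), ('T2', 'W', 'B')]
S2 = swap_to_serial(S1, ['T1', 'T2'])
print([t for t, _, _ in S2])   # ['T1','T1','T1','T1','T2','T2','T2','T2']
```

Running this on S1 produces the serial schedule S2 from the table, confirming conflict equivalence by swaps alone.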
4.2. View Serializability:
• A schedule is view serializable if it is view equivalent to a serial schedule.
• A view serializable schedule that does not contain blind writes is also conflict
serializable.
View Equivalent Schedules:
Consider two schedules S1 and S2 each consisting of two transactions T1 and T2
Schedules S1 and S2 are called view equivalent if the following three conditions hold true
for them-
i. Initial readers must be same for all the data items:
For each data item X, if transaction Ti reads X from the database initially in schedule S1,
then in schedule S2 also, Ti must perform the initial read of X from the database.

ii. Write-read sequence must be same:


If transaction Ti reads a data item that has been updated by the transaction Tj in schedule
S1, then in schedule S2 also, transaction Ti must read the same data item that has been
updated by the transaction Tj
iii. Final writers must be same for all the data items:
For each data item X, if X has been updated at last by transaction Ti in schedule S1, then in
schedule S2 also, X must be updated at last by transaction Ti.

NOTE:

• Every conflict serializable schedule is also view serializable.
• Every view serializable schedule that is not conflict serializable
has blind writes.

5. RECOVERABILITY OF SCHEDULES

5.1. Irrecoverable Schedule: If a transaction Tj reads a data item written by an uncommitted
transaction Ti and Tj commits before Ti finishes, then a later failure of Ti cannot be undone:
Tj has already committed on the basis of a value that no longer exists.
Example:

Ti                Tj
R(A)
W(A)
                  R(A)
                  Commit
Failure (Ti rolls back, but Tj has already committed)

5.2. Recoverable Schedule: A schedule is recoverable if a transaction Tj reads a data items


previously written by transaction Ti, the commit operation of Ti appears before the commit
operation of Tj.
Example:
Ti                Tj

R(A)
W(A)
                  R(A)
Commit
                  W(B)
                  Commit
5.3. Cascadeless Recoverable Schedule: For each pair of transactions Ti and Tj such that
Tj reads a data item previously written by Ti the commit operation of Ti appears before
the read operation of Tj. Every cascadeless schedule is also recoverable.
Example:

Ti                Tj

R(A)
W(A)
Commit
                  R(A)
                  Commit
Cascading roll back: A single transaction failure leads to a series of transaction
rollbacks.
5.4. Strict Recoverable Schedule: If transaction Ti updates the data item A, no other
transaction Tj is allowed to R(A) or W(A) until Ti commits or rolls back.
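The hierarchy above (strict ⊂ cascadeless ⊂ recoverable) can be checked programmatically. The sketch below is my own illustration, not from the notes: a schedule is a list of `(transaction, operation, item)` tuples, with `'C'` standing for commit; aborts are ignored for brevity. The classifier inspects every access to a data item whose last writer is still uncommitted.

```python
def classify(schedule):
    """Return 'strict', 'cascadeless', 'recoverable' or 'irrecoverable'.
    Operations: ('Ti', 'R', 'A'), ('Ti', 'W', 'A'), ('Ti', 'C', None)."""
    commit_pos = {}
    for pos, (t, op, x) in enumerate(schedule):
        if op == 'C':
            commit_pos[t] = pos
    strict = cascadeless = recoverable = True
    last_write = {}                        # item -> (writer txn, position)
    for pos, (t, op, x) in enumerate(schedule):
        if op == 'C':
            continue
        prev = last_write.get(x)
        if prev is not None and prev[0] != t:
            writer = prev[0]
            writer_committed = writer in commit_pos and commit_pos[writer] < pos
            if not writer_committed:
                strict = False             # touched data of an uncommitted writer
                if op == 'R':
                    cascadeless = False    # dirty read
                    # still recoverable if the writer commits before the reader
                    if not (writer in commit_pos and t in commit_pos
                            and commit_pos[writer] < commit_pos[t]):
                        recoverable = False
        if op == 'W':
            last_write[x] = (t, pos)
    if strict:
        return 'strict'
    if cascadeless:
        return 'cascadeless'
    if recoverable:
        return 'recoverable'
    return 'irrecoverable'

# The recoverable example from section 5.2: Tj dirty-reads A, but Ti commits first.
S = [('Ti', 'R', 'A'), ('Ti', 'W', 'A'), ('Tj', 'R', 'A'),
     ('Ti', 'C', None), ('Tj', 'W', 'B'), ('Tj', 'C', None)]
print(classify(S))    # recoverable
```

Feeding in the section 5.3 example (Ti commits before Tj's read) yields `'strict'`, consistent with every strict schedule also being cascadeless and recoverable.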

6. CONCURRENCY CONTROL PROTOCOLS

It is the procedure in DBMS for managing simultaneous operations without them conflicting with
one another.
Different concurrency control protocols offer different benefits between the amount of
concurrency they allow and the amount of overhead that they impose.
• Lock-Based Protocols
• Two Phase Locking Protocols
• Timestamp-Based Protocols

6.1. Lock Based Protocol


A lock is a data variable associated with a data item. The lock signifies which
operations can be performed on the data item. Locks help synchronize access to the
database items by concurrent transactions.
i. Binary Locks:
• A binary lock on a data item can be in either the locked (1) or the unlocked (0) state.
• Locked objects cannot be used by any other transaction.
• Unlocked objects are open to all transactions.
• Every transaction requires a lock and unlock operation for each data item that it
accesses.
• Every transaction locks a data item before using it and releases the lock after
performing the operation.
ii. Shared/Exclusive:
In this type of protocol, no transaction can read or write a data item until it acquires an
appropriate lock on it. There are two types of lock:
• Shared lock(S):
o It is also known as a read-only lock. Under a shared lock, the data item can only be read
by the transaction.
o It can be shared between transactions, because a transaction holding only a shared lock
cannot update the data item.
• Exclusive lock(X):
o Under an exclusive lock, the data item can be both read and written by the
transaction.
o The lock is exclusive: while one transaction holds it, no other transaction can read or
modify the same data item.
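The shared/exclusive rules above amount to a compatibility matrix: S is compatible with S, and X is compatible with nothing. A tiny sketch (the names `COMPATIBLE` and `can_grant` are my own, not a standard API):

```python
# Lock compatibility matrix: (held mode, requested mode) -> grantable?
COMPATIBLE = {('S', 'S'): True, ('S', 'X'): False,
              ('X', 'S'): False, ('X', 'X'): False}

def can_grant(requested, held_locks):
    """A request is granted only if it is compatible with every lock
    currently held on the item by OTHER transactions."""
    return all(COMPATIBLE[(held, requested)] for held in held_locks)

print(can_grant('S', ['S', 'S']))   # True  - shared locks coexist
print(can_grant('X', ['S']))        # False - exclusive conflicts with shared
print(can_grant('S', []))           # True  - item is unlocked
```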
6.2. Two Phase Locking:
• The Two-Phase Locking protocol is also known as the 2PL protocol.
• In this type of locking protocol, a transaction must not acquire any new lock after it has
released one of its locks.
• There are two phases of 2PL:
Growing phase: In the growing phase, new locks on data items may be acquired by
the transaction, but none can be released.
Shrinking phase: In the shrinking phase, existing locks held by the transaction may be
released, but no new locks can be acquired.
• In the growing phase, as the execution of the transaction starts, it requests and acquires
the locks it requires.

• The point at which the transaction acquires its final lock is called the lock point. The
shrinking phase starts as soon as the transaction releases its first lock.
• In the shrinking phase, the transaction cannot demand any new locks; it only releases the
locks it has acquired.
Problems in 2PL:
• Irrecoverability
• Deadlock
• Starvation
Strict Two-Phase Locking:
• Basic 2PL, with the addition that all exclusive locks must be held until commit/roll back.
• It ensures conflict serializability and strict recoverability.
Problems in Strict 2PL:
• Deadlock
• Starvation
6.3. Time-Stamp Protocol:
• This protocol ensures that all conflicting read and write operations are executed in
timestamp order.
• The older transaction has higher priority, so it is executed first. To determine the
timestamp of a transaction, this protocol uses the system time or a logical counter.
• Lock-based protocols manage the order between conflicting pairs of transactions at
execution time, whereas timestamp-based protocols start working as soon as a transaction
is created.
Basic Timestamp ordering protocol works as follows:
Let,
TS(Ti) denotes the timestamp of the transaction Ti.
R_TS(X) denotes the Read time-stamp of data-item X.
W_TS(X) denotes the Write time-stamp of data-item X.
1. Check the following conditions whenever a transaction Ti issues a Read(X) operation:
• If W_TS(X) > TS(Ti), the operation is rejected and Ti is rolled back.
• If W_TS(X) <= TS(Ti), the operation is executed and
R_TS(X) is set to max{R_TS(X), TS(Ti)}.
2. Check the following conditions whenever a transaction Ti issues a Write(X) operation:
• If TS(Ti) < R_TS(X), the operation is rejected and Ti is rolled back.
• If TS(Ti) < W_TS(X), the operation is rejected and Ti is rolled back.
• Otherwise, the operation is executed and W_TS(X) is set to TS(Ti).
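The basic timestamp-ordering checks can be sketched directly from the rules above (the function names and the dictionary representation of R_TS/W_TS are my own; a missing entry is treated as timestamp 0):

```python
def to_read(ts_ti, item, R_TS, W_TS):
    """Basic TO rule for Read(X) by a transaction with timestamp ts_ti."""
    if W_TS.get(item, 0) > ts_ti:
        return 'rollback'                 # item was overwritten by a younger txn
    R_TS[item] = max(R_TS.get(item, 0), ts_ti)
    return 'execute'

def to_write(ts_ti, item, R_TS, W_TS):
    """Basic TO rule for Write(X) by a transaction with timestamp ts_ti."""
    if R_TS.get(item, 0) > ts_ti or W_TS.get(item, 0) > ts_ti:
        return 'rollback'                 # a younger txn already used the item
    W_TS[item] = ts_ti
    return 'execute'

R_TS, W_TS = {}, {}
print(to_write(10, 'A', R_TS, W_TS))   # execute  (W_TS[A] = 10)
print(to_read(5, 'A', R_TS, W_TS))     # rollback (T with TS 5 is older than the writer)
print(to_read(20, 'A', R_TS, W_TS))    # execute  (R_TS[A] = 20)
print(to_write(15, 'A', R_TS, W_TS))   # rollback (a younger txn with TS 20 read A)
```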

12
www.gradeup.co

Thomas Write Rule (Timestamp Ordering Protocol)

1. Transaction T issues an R(A) operation:
(a) If WTS (A) > TS (T), then roll back T.
(b) Otherwise execute successfully and
set RTS (A) = max {TS (T), RTS (A)}.
2. Transaction T issues a W(A) operation:
(a) If RTS (A) > TS (T), then roll back T.
(b) If WTS (A) > TS (T), then ignore W(A) and continue execution of transaction T.
(c) Otherwise execute W(A) and set WTS (A) = TS (T).
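The only difference from basic timestamp ordering is rule 2(b): an obsolete write is silently ignored instead of rolling the transaction back. A small sketch (function name and dictionary encoding are my own):

```python
def thomas_write(ts_t, item, R_TS, W_TS):
    """Thomas write rule for W(item) by a transaction with timestamp ts_t.
    Missing R_TS/W_TS entries are treated as timestamp 0."""
    if R_TS.get(item, 0) > ts_t:
        return 'rollback'        # a younger transaction already read the item
    if W_TS.get(item, 0) > ts_t:
        return 'ignore'          # obsolete write: skip it, T keeps running
    W_TS[item] = ts_t
    return 'execute'

# Where basic TO would roll the transaction back, Thomas's rule just skips the write:
print(thomas_write(5, 'A', {}, {'A': 10}))    # ignore
print(thomas_write(20, 'A', {}, {'A': 10}))   # execute
```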
Strict Time Stamp Ordering Protocol
A transaction T2 that issues an R(X) or W(X) such that TS(T2) > WTS(X) has its read/write
operation delayed until the transaction T1 that wrote the value of X has committed or rolled
back.
6.4. Wait-Die Protocol
• Transactions are ordered by ascending timestamp value; a smaller timestamp means an older
transaction. Let T1 be older than T2.
• If T1 requires a resource that is held by T2, T1 waits for T2 to unlock it.
• If T2 requires a resource that is held by T1, then T2 is rolled back (it "dies") and is
restarted with the same timestamp value.
6.5. Wound-Wait Protocol
• Transactions are again ordered by ascending timestamp value. Let T1 be older than T2.
• If T1 requires a resource that is held by T2, then T2 is rolled back (it is "wounded") and
restarted with the same timestamp value.
• If T2 requires a resource that is held by T1, then T2 waits for T1 to unlock it.
• Both protocols prevent deadlock. Because a rolled-back transaction restarts with its
original timestamp, it eventually becomes the oldest waiting transaction, so starvation is
also avoided.
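The two decision rules can be summarised in a few lines. This is an illustrative sketch (function names mine); `ts_req` is the timestamp of the requesting transaction and `ts_holder` that of the lock holder, with a smaller timestamp meaning an older transaction:

```python
def wait_die(ts_req, ts_holder):
    """Non-preemptive: only an OLDER requester is allowed to wait."""
    if ts_req < ts_holder:
        return 'wait'
    return 'die'          # younger requester rolls back, restarts with same TS

def wound_wait(ts_req, ts_holder):
    """Preemptive: an older requester wounds (rolls back) the younger holder."""
    if ts_req < ts_holder:
        return 'wound holder'   # holder rolls back, restarts with same TS
    return 'wait'

print(wait_die(1, 2))     # wait         - older requests from younger
print(wait_die(2, 1))     # die          - younger requests from older
print(wound_wait(1, 2))   # wound holder - older requests from younger
print(wound_wait(2, 1))   # wait         - younger requests from older
```

Note that in both schemes the younger transaction is always the one that gets rolled back; they differ only in whether it is the requester (wait-die) or the holder (wound-wait).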

****
