DBMS Tutorial

The DBMS Tutorial covers both basic and advanced concepts of Database Management Systems, suitable for beginners and professionals. It explains the functions of DBMS, including data definition, updation, retrieval, and user administration, while also discussing various types of databases such as relational, NoSQL, and cloud databases. Additionally, it highlights the advantages and disadvantages of DBMS, including data redundancy control, ease of maintenance, and the complexities involved.

Uploaded by

smdevx6

DBMS Tutorial provides basic and advanced concepts of Database Management Systems. Our DBMS Tutorial is designed for both beginners and professionals.
A database management system is software that is used to manage a database.
Our DBMS Tutorial covers all the main DBMS topics, such as the introduction, ER model, keys, relational model, join operations, SQL, functional dependency, transactions, concurrency control, etc.
What is Database
A database is a collection of interrelated data, organized so that the data can be retrieved, inserted, and deleted efficiently. The data is organized in the form of tables, schemas, views, reports, etc.
For example, a college database organizes data about the administration, staff, students, faculty, etc.
Using the database, you can easily retrieve, insert, and delete this information.
Database Management System
A database management system is software used to manage a database. For example, MySQL and Oracle are popular database management systems used in many different applications.
A DBMS provides an interface for operations such as creating a database, storing data in it, updating data, creating tables, and much more.
It provides protection and security to the database. When there are multiple users, it also maintains data consistency.
A DBMS lets users perform the following tasks:
Data Definition: the creation, modification, and removal of the definitions that describe how data is organized in the database.
Data Update: the insertion, modification, and deletion of the actual data in the database.
Data Retrieval: retrieving data from the database so that applications can use it for various purposes.
User Administration: registering and monitoring users, maintaining data integrity, enforcing data security, handling concurrency control, monitoring performance, and recovering information corrupted by unexpected failure.
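The first three tasks above can be sketched with Python's built-in `sqlite3` module. This is only a minimal illustration, not part of the original tutorial: the `student` table, its columns, and the sample values are assumptions chosen for the example, and SQLite stands in for whichever DBMS you actually use.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind (assumption: SQLite as the DBMS).
conn = sqlite3.connect(":memory:")

# Data definition: create the structure that organizes the data.
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, course TEXT)")

# Data update: insert, modify, and delete the actual data.
conn.executemany(
    "INSERT INTO student (id, name, course) VALUES (?, ?, ?)",
    [(1, "Ajeet", "B.Tech"), (2, "Aryan", "C.A"), (3, "Mahesh", "BCA")],
)
conn.execute("UPDATE student SET course = ? WHERE id = ?", ("MCA", 3))
conn.execute("DELETE FROM student WHERE id = ?", (2,))
conn.commit()

# Data retrieval: query the data so an application can use it.
rows = conn.execute("SELECT id, name, course FROM student ORDER BY id").fetchall()
print(rows)  # [(1, 'Ajeet', 'B.Tech'), (3, 'Mahesh', 'MCA')]
```

User administration (accounts, privileges, monitoring) is not shown because SQLite has no user management; server-based systems handle it with their own commands.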
Characteristics of DBMS
It uses a digital repository established on a server to store and manage information.
It provides a clear and logical view of the processes that manipulate data.
It includes automatic backup and recovery procedures.
It supports ACID properties, which keep the data in a consistent state in case of failure.
It can reduce the complexity of the relationships between data.
It supports the manipulation and processing of data.
It provides data security.
It lets users view the database from different viewpoints according to their requirements.
Advantages of DBMS
Controls data redundancy: A DBMS can control data redundancy because all the data is stored in a single database, so each item is recorded in one place.
Data sharing: In a DBMS, the authorized users of an organization can share the data among multiple users.
Easy maintenance: The centralized nature of a database system makes it easy to maintain.
Reduced time: It reduces development time and maintenance needs.
Backup: It provides backup and recovery subsystems that automatically back up data to protect against hardware and software failures and restore the data if required.
Multiple user interfaces: It provides different types of user interfaces, such as graphical user interfaces and application program interfaces.
Disadvantages of DBMS
Cost of hardware and software: Running DBMS software requires a high-speed processor and a large amount of memory.
Size: It occupies a large amount of disk space and needs a large amount of memory to run efficiently.
Complexity: A database system creates additional complexity and requirements.
Higher impact of failure: A failure has a large impact because, in most organizations, all the data is stored in a single database. If that database is damaged by a power failure or database corruption, the data may be lost permanently.
Database

What is Data?
Data is a collection of distinct small units of information. It can take a variety of forms, such as text, numbers, media, or bytes, and it can be stored on paper, in electronic memory, etc.
The word 'data' originates from the word 'datum', which means 'a single piece of information'. 'Data' is the plural of 'datum'.
In computing, data is information that has been translated into a form suitable for efficient movement and processing. Data is interchangeable.
What is Database?
A database is an organized collection of data, so that it can be easily accessed and managed.
You can organize data into tables, rows, and columns, and index it to make it easier to find relevant information.
Database designers create a database in such a way that a single set of software programs provides access to the data for all users.
The main purpose of a database is to handle a large amount of information by storing, retrieving, and managing data.
Many dynamic websites on the World Wide Web are backed by databases. For example, a site that checks the availability of rooms in a hotel is a dynamic website that uses a database.
Many database systems are available, such as MySQL, Sybase, Oracle, MongoDB, Informix, PostgreSQL, SQL Server, etc.
Modern databases are managed by a database management system (DBMS).
SQL, or Structured Query Language, is used to operate on the data stored in a database. SQL is based on relational algebra and tuple relational calculus.
In diagrams, a database is conventionally drawn as a cylinder.

Evolution of Databases
Databases have been evolving for more than 50 years, from flat-file systems to relational and object-relational systems, and have gone through several generations.
The Evolution
File-Based
1968 was the year when file-based databases were introduced. In a file-based database, data is maintained in a flat file. Although files have several advantages, they also have limitations.
One major advantage is that a file system offers various access methods, e.g., sequential, indexed, and random.
One major limitation is that it requires extensive programming in a third-generation language such as COBOL or BASIC.
Hierarchical Data Model
1968-1980 was the era of the hierarchical database. The most prominent hierarchical database was IMS (Information Management System), IBM's first DBMS.
In this model, files are related in a parent/child manner.
The diagram below represents the hierarchical data model. Small circles represent objects.
Like the file system, this model had limitations, such as complex implementation, lack of structural independence, and difficulty handling many-to-many relationships.
Network data model
Charles Bachman developed the first DBMS, called Integrated Data Store (IDS), at General Electric. It was developed in the early 1960s, and the network model was standardized in 1971 by the CODASYL group (Conference on Data Systems Languages).
In this model, files are related as owners and members, similar to a common network.
Network data model identified the following components:
Network schema (Database organization)
Sub-schema (views of database per user)
Data management language (procedural)
This model also had limitations, such as system complexity and being difficult to design and maintain.
Relational Database
1970 - Present: This is the era of the relational database and database management. In 1970, E.F. Codd proposed the relational model.
The relational database model has two main terms: instance and schema.
An instance is a table with rows and columns.
A schema specifies the structure, such as the name of the relation and the name and type of each column.
This model uses mathematical concepts such as set theory and predicate logic.
The first internet database application was created in 1995.
During the relational era, many more models were introduced, such as the object-oriented model, the object-relational model, etc.
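The difference between a schema and an instance can be made concrete with a short sketch. It assumes SQLite (via Python's `sqlite3` module) as the relational system; the `student` table and its single row are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER, name TEXT, age INTEGER)")
conn.execute("INSERT INTO student VALUES (1, 'Ajeet', 24)")

# Schema: the structure of the relation (column names and declared types).
# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
schema = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]

# Instance: the rows the relation holds at this moment.
instance = conn.execute("SELECT * FROM student").fetchall()

print(schema)    # [('id', 'INTEGER'), ('name', 'TEXT'), ('age', 'INTEGER')]
print(instance)  # [(1, 'Ajeet', 24)]
```

Inserting or deleting rows changes the instance, while the schema stays the same until the table definition itself is altered.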
Cloud database
A cloud database lets you store, manage, and retrieve structured and unstructured data via a cloud platform. The data is accessible over the Internet. Cloud databases are also called database as a service (DBaaS) because they are offered as a managed service.
Some popular cloud options are:
AWS (Amazon Web Services)
Snowflake Computing
Oracle Database Cloud Services
Microsoft SQL Server
Google Cloud Spanner
Advantages of cloud database
Lower costs
Generally, the customer company does not have to invest in its own database infrastructure. The cloud provider maintains and supports one or more data centers.
Automated
Cloud databases come with a variety of automated processes, such as recovery, failover, and auto-scaling.
Increased accessibility
You can access your cloud-based database from any location, anytime. All you need is just an internet
connection.
NoSQL Database
A NoSQL database follows an approach to database design that can accommodate a wide variety of data models. NoSQL stands for "not only SQL." It is an alternative to traditional relational databases, in which data is placed in tables and the data schema is carefully designed before the database is built.
NoSQL databases are useful for large sets of distributed data.
Some examples of NoSQL database systems, with their categories, are:
MongoDB, CouchDB, Cloudant (Document-based)
Memcached, Redis, Coherence (key-value store)
HBase, Bigtable, Accumulo (Tabular)
Advantages of NoSQL
High Scalability
NoSQL can handle an extensive amount of data because of its scalability. As the data grows, a NoSQL database scales to handle it efficiently.
High Availability
NoSQL supports automatic replication. Replication makes the database highly available because, in case of a failure, the data can be restored from a replica to the last consistent state.
Disadvantages of NoSQL
Open source
Many NoSQL databases are open-source projects, and there is no reliable standard for NoSQL yet.
Management challenge
Data management in NoSQL is much more complicated than in relational databases. It is challenging to install and even more demanding to manage day to day.
GUI is not available
GUI tools for NoSQL databases are not easily available in the market.
Backup
Backup is a notable weak point for NoSQL databases. Some of them have historically lacked robust built-in approaches to data backup.
The Object-Oriented Databases
Object-oriented databases store data in the form of objects and classes. Objects are real-world entities, and classes are collections of objects. An object-oriented database combines relational model features with object-oriented principles. It is an alternative implementation to the relational model.
Object-oriented databases follow the rules of object-oriented programming. An object-oriented database management system is a hybrid application.
The object-oriented database model contains the following properties.
Object-oriented programming properties
Objects
Classes
Inheritance
Polymorphism
Encapsulation
Relational database properties
Atomicity
Consistency
Integrity
Durability
Concurrency
Query processing
Graph Databases
A graph database is a NoSQL database that represents data as a graph. It contains nodes and edges. A node represents an entity, and each edge represents a relationship between two nodes. Every node in a graph database has a unique identifier.
Graph databases are well suited to searching the relationships between data because they highlight the relationships between related data.

Graph databases are very useful when the database contains complex relationships and a dynamic schema.
They are mostly used in areas such as supply chain management and identifying the source of IP telephony.
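The node-and-edge structure described above can be sketched in plain Python. This is a toy model under stated assumptions, not how any particular graph database stores data: the supply-chain nodes, the relationship names, and the helper functions `neighbours` and `reachable` are all hypothetical.

```python
from collections import deque

# Nodes keyed by a unique identifier; each node represents an entity (hypothetical data).
nodes = {
    "s1": {"label": "Supplier", "name": "Acme"},
    "f1": {"label": "Factory", "name": "Plant A"},
    "w1": {"label": "Warehouse", "name": "Depot 7"},
}

# Each edge is a relationship between two nodes: (source, relationship, target).
edges = [("s1", "SUPPLIES", "f1"), ("f1", "SHIPS_TO", "w1")]


def neighbours(node_id):
    """Return (relationship, target) pairs for edges leaving node_id."""
    return [(rel, dst) for src, rel, dst in edges if src == node_id]


def reachable(start):
    """Breadth-first traversal: every node reachable by following edges from start."""
    seen, queue = [], deque([start])
    while queue:
        current = queue.popleft()
        for _, dst in neighbours(current):
            if dst not in seen:
                seen.append(dst)
                queue.append(dst)
    return seen


print(neighbours("s1"))  # [('SUPPLIES', 'f1')]
print(reachable("s1"))   # ['f1', 'w1']
```

Following relationships is a direct traversal here, which is the kind of query a graph database is designed to answer without joining tables.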
DBMS (Data Base Management System)
A database management system is software used to store and retrieve data in a database. Oracle, MySQL, etc. are some popular DBMS tools.
A DBMS provides an interface for operations such as creation, deletion, modification, etc.
A DBMS allows users to create databases according to their requirements.
A DBMS accepts requests from applications and provides the requested data through the operating system.
A DBMS contains a group of programs that act according to user instructions.
It provides security to the database.
Advantage of DBMS
Controls redundancy
It stores all the data in a single database file, so it can control data redundancy.
Data sharing
An authorized user can share the data among multiple users.
Backup
It provides a backup and recovery subsystem. This subsystem automatically backs up data to protect against system failure and restores the data if required.
Multiple user interfaces
It provides different types of user interfaces, such as GUIs and application interfaces.
Disadvantage of DBMS
Size
It occupies large disk space and large memory to run efficiently.
Cost
DBMS requires a high-speed data processor and larger memory to run DBMS software, so it is costly.
Complexity
DBMS creates additional complexity and requirements.
RDBMS (Relational Database Management System)
RDBMS stands for 'Relational Database Management System.' It represents data as tables that contain rows and columns.
An RDBMS is based on the relational model, which was introduced by E. F. Codd.
A relational database contains the following components:
Table
Record/ Tuple
Field/Column name /Attribute
Instance
Schema
Keys
An RDBMS is a tabular DBMS that maintains the security, integrity, accuracy, and consistency of the data.
Types of Databases
There are various types of databases used for storing different varieties of data:

1) Centralized Database
This type of database stores data in a single, centralized database system. It lets users access the stored data from different locations through several applications. These applications include an authentication process so that users can access data securely. An example of a centralized database is a central library that holds a central database for every library in a college or university.
Advantages of Centralized Database
It decreases the risk in data management, i.e., manipulation of data will not affect the core data.
Data consistency is maintained because data is managed in a central repository.
It provides better data quality, which enables organizations to establish data standards.
It is less costly because fewer vendors are required to handle the data sets.
Disadvantages of Centralized Database
The large size of a centralized database increases the response time for fetching data.
It is not easy to update such an extensive database system.
If a server failure occurs, the entire data set may be lost, which could be a huge loss.
2) Distributed Database
Unlike in a centralized database system, in a distributed system the data is distributed among different database systems of an organization. These database systems are connected via communication links, which help end users access the data easily. Examples of distributed databases are Apache Cassandra, HBase, Ignite, etc.
We can further divide distributed database systems into:

Homogeneous DDB: database systems that run on the same operating system, use the same application process, and run on the same hardware devices.
Heterogeneous DDB: database systems that run on different operating systems, under different application procedures, and on different hardware devices.
Advantages of Distributed Database
Modular development is possible in a distributed database, i.e., the system can be expanded by including new
computers and connecting them to the distributed system.
One server failure will not affect the entire data set.
3) Relational Database
This database is based on the relational data model, which stores data in the form of rows (tuples) and columns (attributes) that together form a table (relation). A relational database uses SQL for storing, manipulating, and maintaining the data. E.F. Codd proposed the relational model in 1970. Each table in the database has a key that uniquely identifies each row. Examples of relational databases are MySQL, Microsoft SQL Server, Oracle, etc.
Properties of Relational Database
Relational databases are commonly associated with the following four properties, known as the ACID properties:
A means Atomicity: A data operation either completes entirely or has no effect at all. It follows the 'all or nothing' strategy. For example, a transaction will either be committed or aborted.
C means Consistency: Any operation must take the data from one valid state to another. For example, in a transfer between accounts, the total balance before and after the transaction should be conserved.
I means Isolation: Many users can access data in the database at the same time, so their operations must be isolated from one another. For example, when multiple transactions run at the same time, the effects of one transaction should not be visible to the other transactions.
D means Durability: Once an operation completes and the data is committed, the changes remain permanent.
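Atomicity and consistency can be illustrated with a small money-transfer sketch. It assumes SQLite through Python's `sqlite3` module; the `account` table, the balances, and the `transfer` helper are hypothetical names invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint encodes a consistency rule: no negative balances.
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates are applied or neither is."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()


failed = transfer(conn, 1, 2, 500)   # debit would go negative, so the credit is undone
after_failure = balances(conn)       # [(1, 100), (2, 50)]
succeeded = transfer(conn, 1, 2, 30)
after_success = balances(conn)       # [(1, 70), (2, 80)]
```

The credit is deliberately applied first so that the failing debit leaves a half-finished transaction; the rollback then shows the 'all or nothing' behaviour, and the total balance stays at 150 in both cases.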
4) NoSQL Database
A NoSQL (Non-SQL/Not Only SQL) database is a type of database used for storing a wide range of data sets. It is not a relational database, because it stores data not only in tabular form but in several different ways. It came into existence as the demand for building modern applications increased, and NoSQL offered a wide variety of database technologies in response. We can further divide NoSQL databases into the following four types:

Key-value storage: The simplest type of database storage, in which every item is stored as a key (or attribute name) together with its value.
Document-oriented database: A type of database that stores data as JSON-like documents. It helps developers store data in the same document-model format that the application code uses.
Graph databases: Used for storing vast amounts of data in a graph-like structure. Social networking websites commonly use graph databases.
Wide-column stores: Similar to the data representation in relational databases, except that data is stored in large columns together instead of in rows.
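To make the first, second, and fourth shapes concrete, the sketch below models the same hypothetical user record with plain Python dictionaries. It only illustrates how the data is laid out; real NoSQL systems add persistence, distribution, and their own query interfaces, and all keys and values here are assumptions.

```python
import json

# Key-value: every item is a key with an opaque value.
key_value = {"user:1:name": "Ajeet", "user:1:course": "B.Tech"}

# Document: one JSON-like document per entity; nested fields are allowed.
documents = {
    "user:1": {"name": "Ajeet", "course": "B.Tech", "skills": ["SQL", "Python"]}
}

# Wide-column: row key -> column family -> columns stored together.
wide_column = {
    "user:1": {"profile": {"name": "Ajeet"}, "enrolment": {"course": "B.Tech"}}
}

# A document survives a round trip through JSON unchanged,
# which is why it maps easily onto application objects.
round_trip = json.loads(json.dumps(documents["user:1"]))

print(key_value["user:1:name"])                   # Ajeet
print(round_trip["skills"])                       # ['SQL', 'Python']
print(wide_column["user:1"]["enrolment"]["course"])  # B.Tech
```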
Advantages of NoSQL Database
It enables good productivity in application development because data does not have to be stored in a structured format.
It is a better option for managing and handling large data sets.
It provides high scalability.
Users can quickly access data from the database through keys.
5) Cloud Database
A cloud database is a type of database in which data is stored in a virtual environment and runs on a cloud computing platform. It provides users with various cloud computing services (SaaS, PaaS, IaaS, etc.) for accessing the database. There are numerous cloud platforms; some popular options are:
Amazon Web Services (AWS)
Microsoft Azure
Kamatera
phoenixNAP
ScienceSoft
Google Cloud SQL, etc.
6) Object-oriented Databases
This type of database uses the object-based data model approach for storing data in the database system. The data is represented and stored as objects, similar to the objects used in object-oriented programming languages.
7) Hierarchical Databases
This type of database stores data in the form of parent-child relationship nodes, organizing the data in a tree-like structure.

Data is stored in the form of records that are connected via links. Each child record in the tree has only one parent, whereas each parent record can have multiple child records.
8) Network Databases
This type of database follows the network data model. Data is represented as nodes connected via links. Unlike in a hierarchical database, each record can have multiple children and multiple parents, forming a generalized graph structure.
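The structural difference between the hierarchical and network models can be sketched with two small Python mappings. The record names and the `path_to_root` helper are hypothetical and only illustrate the one-parent versus many-parents rule.

```python
# Hierarchical model: every child record has exactly one parent (a tree).
parent_of = {
    "Physics": "Science",
    "Chemistry": "Science",
    "Science": "College",
}


def path_to_root(record):
    """Follow the single parent link of each record up to the root."""
    path = [record]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path


# Network model: a record may have several parents (owners), forming a graph.
owners_of = {
    "Student:Ajeet": {"Course:BCA", "Club:Chess"},
}

print(path_to_root("Physics"))                # ['Physics', 'Science', 'College']
print(sorted(owners_of["Student:Ajeet"]))     # ['Club:Chess', 'Course:BCA']
```

A plain key-to-parent dictionary is enough for the tree because each child can appear only once as a key; the network model needs a set of owners per record.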
9) Personal Database
A personal database collects and stores data on the user's own system. It is designed for a single user.
Advantage of Personal Database
It is simple and easy to handle.
It occupies less storage space as it is small in size.
10) Operational Database
This type of database creates and updates data in real time. It is designed for executing and handling the daily data operations of a business. For example, an organization uses an operational database to manage its daily transactions.
11) Enterprise Database
Large organizations or enterprises use this database for managing a massive amount of data. It helps organizations increase and improve their efficiency. Such a database allows simultaneous access by multiple users.
Advantages of Enterprise Database:
Multiple processes are supported on an enterprise database.
It allows parallel queries to be executed on the system.
What is RDBMS
RDBMS stands for Relational Database Management System.
Many widely used database management systems, such as MS SQL Server, IBM DB2, Oracle, MySQL, and Microsoft Access, are based on the relational model.
It is called a Relational Database Management System (RDBMS) because it is based on the relational model introduced by E.F. Codd.
How it works
Data is represented in terms of tuples (rows) in an RDBMS.
The relational database is the most commonly used type of database. It contains a number of tables, and each table has its own primary key.
Because the data is organized as a collection of tables, it can be accessed easily in an RDBMS.
Brief History of RDBMS
Between 1970 and 1972, E.F. Codd published papers proposing the use of the relational database model.
RDBMS is based on E.F. Codd's relational model.
What is table
An RDBMS uses tables to store data. A table is a collection of related data entries and contains rows and columns to store data.
A table is the simplest example of data storage in an RDBMS.
Let's see an example of a student table.

| ID | Name   | Age | Course |
|----|--------|-----|--------|
| 1  | Ajeet  | 24  | B.Tech |
| 2  | Aryan  | 20  | C.A    |
| 3  | Mahesh | 21  | BCA    |
| 4  | Ratan  | 22  | MCA    |
| 5  | Vimal  | 26  | BSC    |

What is field
A field is a smaller entity of the table that holds a specific piece of information about every record in the table. In the above example, the fields of the student table are id, name, age, and course.

What is row or record


A row of a table is also called a record. It contains the specific information of each individual entry in the table. It is a horizontal entity in the table. For example, the above table contains 5 records.
Let's see one record/row in the table.

1 Ajeet 24 B.Tech

What is column
A column is a vertical entity in the table that contains all the information associated with a specific field in a table.
For example, "name" is a column in the above table that contains all the students' names.

Ajeet

Aryan

Mahesh

Ratan

Vimal
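The table, field, record, and column ideas above map directly onto SQL queries. The following sketch loads the student table into SQLite through Python's `sqlite3` module (an assumption; any RDBMS would do) and reads back each of those pieces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, course TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [
        (1, "Ajeet", 24, "B.Tech"),
        (2, "Aryan", 20, "C.A"),
        (3, "Mahesh", 21, "BCA"),
        (4, "Ratan", 22, "MCA"),
        (5, "Vimal", 26, "BSC"),
    ],
)

# Fields: the column names of the table.
fields = [d[0] for d in conn.execute("SELECT * FROM student").description]

# Record: one row, i.e. a horizontal slice of the table.
record = conn.execute("SELECT * FROM student WHERE id = 1").fetchone()

# Column: every value of one field, i.e. a vertical slice of the table.
names = [row[0] for row in conn.execute("SELECT name FROM student ORDER BY id")]

record_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

print(fields)        # ['id', 'name', 'age', 'course']
print(record)        # (1, 'Ajeet', 24, 'B.Tech')
print(names)         # ['Ajeet', 'Aryan', 'Mahesh', 'Ratan', 'Vimal']
print(record_count)  # 5
```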

NULL Values
A NULL value in a table indicates that the field was left blank when the record was created. It is entirely different from a value of zero or a field that contains spaces.
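A short sketch shows how NULL differs from zero and from a space in practice. It assumes SQLite via Python's `sqlite3` module, and the three sample rows are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("INSERT INTO student (id, name) VALUES (1, 'Ajeet')")  # age left blank -> NULL
conn.execute("INSERT INTO student VALUES (2, 'Aryan', 0)")          # age is zero, not NULL
conn.execute("INSERT INTO student VALUES (3, ' ', 21)")             # name is a space, not NULL

null_age = conn.execute("SELECT id FROM student WHERE age IS NULL").fetchall()
zero_age = conn.execute("SELECT id FROM student WHERE age = 0").fetchall()
null_name = conn.execute("SELECT id FROM student WHERE name IS NULL").fetchall()
# Comparing with "= NULL" never matches; NULL must be tested with IS NULL.
equals_null = conn.execute("SELECT id FROM student WHERE age = NULL").fetchall()

print(null_age)     # [(1,)]
print(zero_age)     # [(2,)]
print(null_name)    # []
print(equals_null)  # []
```

In Python, a NULL read from the database arrives as `None`.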
Data Integrity
Every RDBMS supports the following categories of data integrity:
Entity integrity: There should be no duplicate rows in a table.
Domain integrity: Enforces valid entries for a given column by restricting the type, the format, or the range of values.
Referential integrity: Rows that are referenced by other records cannot be deleted.
User-defined integrity: Enforces specific business rules that are defined by users. These rules are different from entity, domain, or referential integrity.
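The first three categories can be expressed as declarative constraints. The sketch below is one possible illustration, assuming SQLite through Python's `sqlite3` module; the `course` and `student` tables, the age range, and the `violates` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE course (code TEXT PRIMARY KEY)")
conn.execute(
    """CREATE TABLE student (
           id     INTEGER PRIMARY KEY,                     -- entity integrity
           name   TEXT NOT NULL,
           age    INTEGER CHECK (age BETWEEN 16 AND 100),  -- domain integrity
           course TEXT REFERENCES course(code)             -- referential integrity
       )"""
)
conn.execute("INSERT INTO course VALUES ('BCA')")
conn.execute("INSERT INTO student VALUES (1, 'Mahesh', 21, 'BCA')")
conn.commit()


def violates(sql):
    """Return True if the statement is rejected by an integrity constraint."""
    try:
        with conn:  # rolls back automatically if the statement fails
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


duplicate_key = violates("INSERT INTO student VALUES (1, 'Ratan', 22, 'BCA')")
out_of_range = violates("INSERT INTO student VALUES (2, 'Vimal', 5, 'BCA')")
still_referenced = violates("DELETE FROM course WHERE code = 'BCA'")

print(duplicate_key, out_of_range, still_referenced)  # True True True
```

User-defined integrity covers business rules beyond these three; depending on the system, such rules are typically written as additional constraints or triggers.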
Difference between DBMS and RDBMS
Although DBMS and RDBMS are both used to store information in a physical database, there are some notable differences between them.
The main differences between DBMS and RDBMS are given below:

| No. | DBMS | RDBMS |
|-----|------|-------|
| 1) | DBMS applications store data as files. | RDBMS applications store data in tabular form. |
| 2) | In DBMS, data is generally stored in either a hierarchical form or a navigational form. | In RDBMS, the tables have an identifier called a primary key, and the data values are stored in the form of tables. |
| 3) | Normalization is not present in DBMS. | Normalization is present in RDBMS. |
| 4) | DBMS does not apply any security with regard to data manipulation. | RDBMS defines integrity constraints for the purpose of the ACID (Atomicity, Consistency, Isolation, and Durability) properties. |
| 5) | DBMS uses a file system to store data, so there is no relation between the tables. | In RDBMS, data values are stored in the form of tables, so a relationship between these data values is stored in the form of a table as well. |
| 6) | DBMS has to provide some uniform methods to access the stored information. | RDBMS supports a tabular structure of the data and relationships between tables to access the stored information. |
| 7) | DBMS does not support distributed databases. | RDBMS supports distributed databases. |
| 8) | DBMS is meant for small organizations and deals with small amounts of data. It supports a single user. | RDBMS is designed to handle large amounts of data. It supports multiple users. |
| 9) | Examples of DBMS are file systems, XML, etc. | Examples of RDBMS are MySQL, PostgreSQL, SQL Server, Oracle, etc. |

After observing the differences between DBMS and RDBMS, you can say that an RDBMS is an extension of a DBMS. Many software products on the market today are compatible with both DBMS and RDBMS, which means that today an RDBMS application is a DBMS application and vice versa.
DBMS vs. File System
File System Approach
File-based systems were an early attempt to computerize manual record keeping. This is also called the traditional approach: a decentralized approach in which each department stored and controlled its own data with the help of a data processing specialist. The main role of the data processing specialist was to create the necessary computer file structures, manage the data within those structures, and design application programs that create reports based on the file data.

In the above figure:
Consider an example of a student file system. The student file contains information about the student (i.e., roll no, student name, course, etc.). Similarly, there is a subject file that contains information about the subject and a result file that contains information about the result.
Some fields are duplicated in more than one file, which leads to data redundancy. To overcome this problem, we need a centralized system, i.e., the DBMS approach.
DBMS:
The database approach uses a well-organized collection of data that is related in a meaningful way and can be accessed by different users but is stored only once in the system. The operations performed by a DBMS include insertion, deletion, selection, sorting, etc.
In the above figure, duplication of data is reduced due to centralization of data.
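The redundancy problem and its centralized fix can be sketched with plain Python dictionaries standing in for the files. The roll number, the marks value, and the `result_report` helper are hypothetical, chosen only to show how a duplicated field drifts out of sync.

```python
# File approach: each department keeps its own copy of the student's name.
student_file = {101: {"name": "Ajeet", "course": "B.Tech"}}
result_file = {101: {"name": "Ajeet", "marks": 82}}

# One department updates its copy; the other copy is now stale.
student_file[101]["name"] = "Ajeet Kumar"
inconsistent = student_file[101]["name"] != result_file[101]["name"]

# Database approach: the name is stored once and referenced by roll number.
students = {101: {"name": "Ajeet", "course": "B.Tech"}}
results = {101: {"marks": 82}}
students[101]["name"] = "Ajeet Kumar"


def result_report(roll_no):
    """Combine the single stored name with the result data for one student."""
    return {"name": students[roll_no]["name"], **results[roll_no]}


print(inconsistent)        # True
print(result_report(101))  # {'name': 'Ajeet Kumar', 'marks': 82}
```

Because the name lives in one place in the second layout, a single update is visible to every report that uses it.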
The differences between the DBMS and file system approaches are as follows:

| Basis | DBMS Approach | File System Approach |
|-------|---------------|----------------------|
| Meaning | A DBMS is a collection of data. In a DBMS, the user is not required to write the procedures. | The file system is a collection of data. In this system, the user has to write the procedures for managing the database. |
| Sharing of data | Due to the centralized approach, data sharing is easy. | Data is distributed across many files, which may be in different formats, so it isn't easy to share data. |
| Data Abstraction | A DBMS gives an abstract view of data that hides the details. | The file system exposes the details of data representation and storage. |
| Security and Protection | A DBMS provides a good protection mechanism. | It isn't easy to protect a file under the file system. |
| Recovery Mechanism | A DBMS provides a crash recovery mechanism, i.e., it protects the user from system failure. | The file system doesn't have a crash recovery mechanism, i.e., if the system crashes while data is being entered, the content of the file will be lost. |
| Manipulation Techniques | A DBMS contains a wide variety of sophisticated techniques to store and retrieve data. | The file system can't store and retrieve data efficiently. |
| Concurrency Problems | A DBMS takes care of concurrent access to data using some form of locking. | In the file system, concurrent access causes many problems, such as redirecting the file while some information is being deleted or updated. |
| Where to use | The database approach is used in large systems that interrelate many files. | The file system approach is used in small systems with few interrelated files. |
| Cost | A database system is expensive to design. | The file system approach is cheaper to design. |
| Data Redundancy and Inconsistency | Due to the centralization of the database, the problems of data redundancy and inconsistency are controlled. | The files and application programs are created by different programmers, so there is a lot of duplicated data, which may lead to inconsistency. |
| Structure | The database structure is complex to design. | The file system approach has a simple structure. |
| Data Independence | Data independence exists, and it is of two types: logical data independence and physical data independence. | In the file system approach, there is no data independence. |
| Integrity Constraints | Integrity constraints are easy to apply. | Integrity constraints are difficult to implement in a file system. |
| Data Models | In the database approach, three types of data models exist: hierarchical, network, and relational data models. | In the file system approach, there is no concept of data models. |
| Flexibility | Changes to the content of stored data are often necessary, and such changes are made more easily with a database approach. | The file system is less flexible than the DBMS approach. |
| Examples | Oracle, SQL Server, Sybase, etc. | COBOL, C++, etc. |

DBMS Architecture
The design of a DBMS depends on its architecture. The basic client/server architecture is used to deal with a large number of PCs, web servers, database servers, and other components that are connected by networks.
The client/server architecture consists of many PCs and workstations that are connected via the network.
DBMS architecture depends on how users are connected to the database to get their requests done.
Types of DBMS Architecture

Database architecture can be seen as single-tier or multi-tier. Logically, though, multi-tier database architecture is of two types: 2-tier architecture and 3-tier architecture.
1-Tier Architecture
In this architecture, the database is directly available to the user, who works directly on the DBMS.
Any changes made here are applied directly to the database itself. It doesn't provide a handy tool for end users.
The 1-tier architecture is used for developing local applications, where programmers can communicate directly with the database for a quick response.
2-Tier Architecture
The 2-Tier architecture is the same as the basic client-server architecture. In the two-tier architecture, applications on the client end can communicate directly with the database on the server side. For this interaction, APIs such as ODBC and JDBC are used.
The user interfaces and application programs run on the client side.
The server side is responsible for providing functionality such as query processing and transaction management.
To communicate with the DBMS, the client-side application establishes a connection with the server side.

Fig: 2-tier Architecture


3-Tier Architecture
The 3-Tier architecture contains another layer between the client and the server. In this architecture, the client can't communicate directly with the server.
The application on the client end interacts with an application server, which in turn communicates with the database system.
The end user has no idea about the existence of the database beyond the application server. The database also has no idea about any user beyond the application.
The 3-Tier architecture is used for large web applications.

Three schema Architecture


The three schema architecture is also called ANSI/SPARC architecture or three-level architecture.
This framework is used to describe the structure of a specific database system.
The three schema architecture is also used to separate the user applications from the physical database.
The three schema architecture contains three levels. It breaks the database down into three different categories.
The three-schema architecture is as follows:
In the above diagram:
It shows the DBMS architecture.
Mapping is used to transform the request and response between the various levels of the architecture.
Mapping is not well suited to a small DBMS because it takes more time.
In External / Conceptual mapping, the request is transformed from the external level to the conceptual schema.
In Conceptual / Internal mapping, the DBMS transforms the request from the conceptual level to the internal level.
Objectives of Three schema Architecture
The main objective of the three-level architecture is to enable multiple users to access the same data with a personalized view while storing the underlying data only once. Thus it separates the user's view from the physical structure of the database. This separation is desirable for the following reasons:
Different users need different views of the same data.
The way in which a particular user needs to see the data may change over time.
The users of the database should not have to worry about the physical implementation and internal workings of the database, such as data compression and encryption techniques, hashing, optimization of the internal structures, etc.
All users should be able to access the same data according to their requirements.
The DBA should be able to change the conceptual structure of the database without affecting the users.
The internal structure of the database should be unaffected by changes to the physical aspects of the storage.
1. Internal Level

The internal level has an internal schema which describes the physical storage structure of the database.
The internal schema is also known as a physical schema.
It uses the physical data model. It defines how the data will be stored in blocks.
The physical level is used to describe complex low-level data structures in detail.
The internal level is generally concerned with the following activities:
Storage space allocations.
For Example: B-Trees, Hashing etc.
Access paths.
For Example: Specification of primary and secondary keys, indexes, pointers and sequencing.
Data compression and encryption techniques.
Optimization of internal structures.
Representation of stored fields.
2. Conceptual Level

The conceptual schema describes the design of a database at the conceptual level. The conceptual level is also known as the logical level.
The conceptual schema describes the structure of the whole database.
The conceptual level describes what data are to be stored in the database and what relationships exist among those data.
At the conceptual level, internal details such as the implementation of the data structures are hidden.
Programmers and database administrators work at this level.
3. External Level

At the external level, a database contains several schemas, which are sometimes called subschemas. A subschema is used to describe a particular view of the database.
An external schema is also known as a view schema.
Each view schema describes the part of the database that a particular user group is interested in and hides the rest of the database from that user group.
The view schema describes the end user's interaction with the database system.
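The idea of an external schema that exposes only part of the database can be sketched with a SQL view. This is a minimal illustration using Python's built-in sqlite3 module; the employee table, the employee_directory view and the sample row are hypothetical names and data, not taken from the tutorial.

```python
import sqlite3

# Conceptual level: the full table, including a sensitive column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 52000.0)")

# External level: a view for a user group that must not see salaries.
conn.execute("CREATE VIEW employee_directory AS SELECT id, name FROM employee")

cursor = conn.execute("SELECT * FROM employee_directory")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()

print(view_columns)  # the salary column is not part of this view
print(view_rows)
```

A user group that queries only employee_directory never sees the salary column, even though it is stored once in the underlying table.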
Mapping between Views
The three levels of DBMS architecture don't exist independently of each other. There must be a correspondence between the three levels, i.e., a definition of how they relate to each other. The DBMS is responsible for the correspondence between the three types of schema. This correspondence is called mapping.
There are basically two types of mapping in the database architecture:
Conceptual/ Internal Mapping
External / Conceptual Mapping
Conceptual/ Internal Mapping
The Conceptual/ Internal Mapping lies between the conceptual level and the internal level. Its role is to define the
correspondence between the records and fields of the conceptual level and files and data structures of the
internal level.
External/ Conceptual Mapping
The External/ Conceptual Mapping lies between the external level and the conceptual level. Its role is to define the correspondence between a particular external view and the conceptual view.
Data Models
A data model is the modeling of the data description, data semantics, and consistency constraints of the data. It provides the conceptual tools for describing the design of a database at each level of data abstraction. The following four data models are used for understanding the structure of the database:

1) Relational Data Model: This type of model organizes the data in the form of rows and columns within a table. Thus, a relational model uses tables for representing data and the relationships among them. Tables are also called relations. This model was initially described by Edgar F. Codd in 1969. The relational data model is the most widely used model and is primarily used by commercial data processing applications.
2) Entity-Relationship Data Model: An ER model is the logical representation of data as objects and the relationships among them. These objects are known as entities, and a relationship is an association among these entities. This model was designed by Peter Chen and published in a 1976 paper. It has been widely used in database design. A set of attributes describes an entity. For example, student_name and student_id describe the 'student' entity. A set of entities of the same type is known as an 'entity set', and a set of relationships of the same type is known as a 'relationship set'.
3) Object-based Data Model: An extension of the ER model with notions of functions, encapsulation, and object identity. This model supports a rich type system that includes structured and collection types. In the 1980s, various database systems following the object-oriented approach were developed. Here, the objects are the data together with its properties.
4) Semistructured Data Model: This type of data model is different from the other three data models explained above. The semistructured data model allows data specifications in which individual data items of the same type may have different sets of attributes. The Extensible Markup Language, also known as XML, is widely used for representing semistructured data. Although XML was initially designed for adding markup information to text documents, it gained importance because of its application in the exchange of data.
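To make the semistructured idea concrete, here is a small sketch that parses an XML fragment in which two items of the same type carry different attribute sets. The element names and the email address are made up for illustration; only Python's standard xml.etree.ElementTree module is used.

```python
import xml.etree.ElementTree as ET

# Two <student> items of the same type with different sets of attributes.
document = """
<students>
  <student><name>Ram</name><phone>7305758992</phone></student>
  <student><name>Shyam</name><email>shyam@example.com</email><age>35</age></student>
</students>
"""

root = ET.fromstring(document)

# Collect the attribute (child element) names present for each student.
attribute_sets = [sorted(child.tag for child in student) for student in root]

for student, attributes in zip(root, attribute_sets):
    print(student.findtext("name"), "has attributes:", attributes)
```

A relational table would force both students into the same fixed set of columns, whereas here each item simply lists the attributes it has.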
Data model Schema and Instance
The data which is stored in the database at a particular moment of time is called an instance of the database.
The overall design of a database is called schema.
A database schema is the skeleton structure of the database. It represents the logical view of the entire
database.
A schema contains schema objects like table, foreign key, primary key, views, columns, data types, stored
procedure, etc.
A database schema can be represented by a visual diagram that shows the database objects and their relationships with each other.
A database schema is designed by the database designers to help programmers whose software will interact with the database. The process of database creation is called data modeling.
A schema diagram can display only some aspects of a schema, like the name of the record type, the data type, and constraints. Other aspects can't be specified through the schema diagram. For example, the given figure shows neither the data type of each data item nor the relationship among the various files.
In the database, the actual data changes quite frequently. For example, in the given figure, the database changes whenever we add a new grade or add a student. The data at a particular moment of time is called the instance of the database.
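The difference between schema and instance can be shown in a few lines. The sketch below uses Python's sqlite3 module and a hypothetical student table; the helper names schema() and instance() are introduced only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")

def schema():
    # Column names of the table: the design, which rarely changes.
    return [row[1] for row in conn.execute("PRAGMA table_info(student)")]

def instance():
    # The rows stored right now: the data at this moment in time.
    return conn.execute("SELECT roll_no, name FROM student ORDER BY roll_no").fetchall()

schema_before, instance_before = schema(), instance()
conn.execute("INSERT INTO student VALUES (14795, 'Ram')")
schema_after, instance_after = schema(), instance()

print(schema_before == schema_after)    # the schema is unchanged by the insert
print(instance_before, instance_after)  # the instance has changed
```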

Data Independence
Data independence can be explained using the three-schema architecture.
Data independence refers to the characteristic of being able to modify the schema at one level of the database system without altering the schema at the next higher level.
There are two types of data independence:
1. Logical Data Independence
Logical data independence refers to the characteristic of being able to change the conceptual schema without having to change the external schema.
Logical data independence is used to separate the external level from the conceptual view.
If we make any changes in the conceptual view of the data, the user view of the data is not affected.
Logical data independence occurs at the user interface level.
2. Physical Data Independence
Physical data independence can be defined as the capacity to change the internal schema without having to
change the conceptual schema.
If we make any changes in the storage size of the database system server, the conceptual structure of the database is not affected.
Physical data independence is used to separate conceptual levels from the internal levels.
Physical data independence occurs at the logical interface level.
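Physical data independence can be sketched by changing an internal structure and checking that a query written against the conceptual schema still gives the same answer. This is only an illustration with Python's sqlite3 module; the student table and the index name idx_student_city are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(14795, "Ram", "Noida"), (12839, "Shyam", "Delhi"), (17282, "Ganesh", "Delhi")],
)

query = "SELECT name FROM student WHERE city = 'Delhi' ORDER BY name"
result_before = conn.execute(query).fetchall()

# Internal-level change: add an access path (an index) on the city column.
conn.execute("CREATE INDEX idx_student_city ON student (city)")

# The same query, unchanged, still returns the same rows.
result_after = conn.execute(query).fetchall()
print(result_before == result_after)
```

The table definition and the query text are untouched; only the way the data can be reached internally has changed.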
Database Language
A DBMS has appropriate languages and interfaces to express database queries and updates.
Database languages can be used to read, store and update the data in the database.
Types of Database Language

1. Data Definition Language


DDL stands for Data Definition Language. It is used to define database structure or pattern.
It is used to create schema, tables, indexes, constraints, etc. in the database.
Using the DDL statements, you can create the skeleton of the database.
Data definition language is used to store the information of metadata like the number of tables and schemas,
their names, indexes, columns in each table, constraints, etc.
Here are some tasks that come under DDL:
Create: It is used to create objects in the database.
Alter: It is used to alter the structure of the database.
Drop: It is used to delete objects from the database.
Truncate: It is used to remove all records from a table.
Rename: It is used to rename an object.
Comment: It is used to comment on the data dictionary.
These commands are used to update the database schema, which is why they come under Data Definition Language.
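A few of the DDL tasks listed above can be tried out as follows. This sketch runs the statements through Python's sqlite3 module with hypothetical table names; note that SQLite itself has no TRUNCATE or COMMENT statement, so only Create, Alter, Rename and Drop are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def table_names():
    # Read the list of tables from SQLite's own catalog.
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [row[0] for row in rows]

# Create: define a new object.
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")

# Alter: change the structure by adding a column.
conn.execute("ALTER TABLE student ADD COLUMN age INTEGER")
columns_after_alter = [row[1] for row in conn.execute("PRAGMA table_info(student)")]

# Rename: give the object a new name.
conn.execute("ALTER TABLE student RENAME TO learner")
tables_after_rename = table_names()

# Drop: delete the object from the database.
conn.execute("DROP TABLE learner")
tables_after_drop = table_names()

print(columns_after_alter, tables_after_rename, tables_after_drop)
```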
2. Data Manipulation Language
DML stands for Data Manipulation Language. It is used for accessing and manipulating data in a database. It
handles user requests.
Here are some tasks that come under DML:
Select: It is used to retrieve data from a database.
Insert: It is used to insert data into a table.
Update: It is used to update existing data within a table.
Delete: It is used to delete records from a table (all records, if no condition is given).
Merge: It performs an UPSERT operation, i.e., an insert or update operation.
Call: It is used to call a PL/SQL or a Java subprogram.
Explain Plan: It describes the access path that will be used to execute a statement.
Lock Table: It controls concurrency.
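The four basic DML tasks can be sketched like this, using Python's sqlite3 module. The student table and its rows are sample data chosen for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, age INTEGER)")

# Insert: add rows to the table.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(14795, "Ram", 24), (12839, "Shyam", 35), (33289, "Laxman", 20)],
)

# Update: change existing data in the rows that match the condition.
conn.execute("UPDATE student SET age = 25 WHERE roll_no = 14795")

# Delete: remove only the rows that match the condition.
conn.execute("DELETE FROM student WHERE roll_no = 12839")

# Select: retrieve what is left.
remaining = conn.execute("SELECT roll_no, name, age FROM student ORDER BY roll_no").fetchall()
print(remaining)
```

Without the WHERE clause, the Update and Delete statements would affect every row in the table.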
3. Data Control Language
DCL stands for Data Control Language. It is used to control access to the data stored in the database by granting and revoking privileges.
In some database systems, the execution of DCL is transactional, so it can be rolled back.
(But in Oracle database, the execution of data control language does not have the feature of rolling back.)
Here are some tasks that come under DCL:
Grant: It is used to give a user access privileges to a database.
Revoke: It is used to take back permissions from the user.
The following privileges can be granted and revoked:
CONNECT, INSERT, USAGE, EXECUTE, DELETE, UPDATE and SELECT.
4. Transaction Control Language
TCL is used to manage the changes made by DML statements. With TCL, statements can be grouped into a logical transaction.
Here are some tasks that come under TCL:
Commit: It is used to save the transaction to the database.
Rollback: It is used to restore the database to its state at the last Commit.
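Commit and Rollback can be sketched with Python's sqlite3 module, whose connection object offers commit() and rollback() methods. The account table and the amounts are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 30)")
conn.commit()  # Commit: the starting balance is now saved

def balance():
    return conn.execute("SELECT balance FROM account WHERE name = 'A'").fetchone()[0]

# An uncommitted change is visible inside the transaction that made it...
conn.execute("UPDATE account SET balance = 0 WHERE name = 'A'")
balance_before_rollback = balance()

# ...but Rollback restores the state of the last Commit.
conn.rollback()
balance_after_rollback = balance()

# Once a change is committed, a later Rollback no longer undoes it.
conn.execute("UPDATE account SET balance = 20 WHERE name = 'A'")
conn.commit()
conn.rollback()
balance_after_commit = balance()

print(balance_before_rollback, balance_after_rollback, balance_after_commit)
```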
ACID Properties in DBMS
A DBMS manages data that should keep its integrity whenever changes are made to it. If the integrity of the data is affected, the whole data set becomes disturbed and corrupted. Therefore, to maintain the integrity of the data, four properties are defined in the database management system, which are known as the ACID properties. The ACID properties apply to a transaction, which goes through a group of tasks, and that is where we see the role of the ACID properties.
In this section, we will learn about the ACID properties. We will learn what these properties stand for and what each property is used for. We will also understand the ACID properties with the help of some examples.
ACID Properties
The term ACID stands for:
1) Atomicity: The term atomicity means that the data remains atomic. It means that if any operation is performed on the data, it should either be executed completely or not be executed at all. It further means that the operation should not break in between or execute partially. In the case of a transaction, all of its operations should be executed completely, not partially.
Example: Remo has account A with $30 in it, from which he wishes to send $10 to Sheero's account, B. Account B already holds $100. When $10 is transferred to account B, its balance will become $110. Two operations take place. First, the $10 that Remo wants to transfer is debited from his account A, and then the same amount is credited to account B, i.e., to Sheero's account. Now suppose the first operation, the debit, executes successfully, but the credit operation fails. Then the balance of Remo's account A becomes $20, while Sheero's account remains at $100 as before.

In the above diagram, it can be seen that after debiting $10 from account A, the amount is still $100 in account B. So, it is not an atomic transaction.
The below image shows that both debit and credit operations are done successfully. Thus the transaction is atomic.

Thus, when a transfer loses atomicity, it becomes a huge issue in bank systems, and so atomicity is a main focus in bank systems.
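The transfer from account A to account B can be sketched as a single atomic transaction. The code below is a minimal illustration with Python's sqlite3 module; the transfer() function, its fail_after_debit flag and the simulated failure are assumptions added for the example. Using the connection as a context manager commits when the block succeeds and rolls back when an exception escapes from it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 30), ("B", 100)])
conn.commit()

def transfer(amount, fail_after_debit=False):
    """Move amount from A to B as one all-or-nothing transaction."""
    try:
        with conn:  # commit on success, roll back if an exception escapes
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'A'", (amount,))
            if fail_after_debit:
                raise RuntimeError("simulated failure before the credit")
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'B'", (amount,))
        return True
    except RuntimeError:
        return False

def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))

failed = transfer(10, fail_after_debit=True)
balances_after_failure = balances()   # the debit was rolled back

succeeded = transfer(10)
balances_after_success = balances()   # both debit and credit were applied

print(failed, balances_after_failure)
print(succeeded, balances_after_success)
```

In the failing case the debit is undone, so account A never ends up at $20 while account B is still at $100.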
2) Consistency: The word consistency means that the value should always remain preserved. In DBMS, the integrity of the data should be maintained, which means that if a change is made in the database, the data must remain valid afterwards. In the case of transactions, the integrity of the data is essential so that the database remains consistent before and after the transaction. The data should always be correct.
Example:
In the above figure, there are three accounts, A, B, and C, where A makes transactions T to B and C one after the other. Two operations take place, i.e., Debit and Credit. Account A first transfers $50 to account B, and the amount in account A is read as $300 by B before the transaction. After the successful transaction T, the available amount in B becomes $150. Now, A transfers $20 to account C, and this time the value read by C is $250 (which is correct, as a debit of $50 has already been made for B). The debit and credit operations from account A to C are completed successfully. We can see that the transactions complete successfully and the values are read correctly. Thus, the data is consistent. If both B and C had read the value $300, the data would be inconsistent, because the value read by C would not reflect the debit of $50 that had already been executed.
3) Isolation: The term 'isolation' means separation. In DBMS, isolation is the property that transactions running concurrently should not affect one another. In short, the result should be the same as if one operation began only after the other was complete. It means that if two operations are being performed on two different data items, they must not affect the value of one another. In the case of transactions, when two or more transactions occur simultaneously, consistency should be maintained. Any changes made in a particular transaction will not be seen by other transactions until the change is committed.
Example: If two operations are running concurrently on two different accounts, then the values of both accounts should not be affected by each other. The values should remain consistent. As the diagram for this example shows, account A is making transactions T1 and T2 to accounts B and C, but both are executing independently without affecting each other. This is known as Isolation.
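The rule that uncommitted changes are not seen by other transactions can be sketched with two connections to the same database file. This is an illustration under assumptions: it uses Python's sqlite3 module with SQLite's default settings, a temporary file named bank.db, and a hypothetical account table; other database systems offer several isolation levels with different behavior.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "bank.db")
    writer = sqlite3.connect(path)  # plays the role of transaction T1
    reader = sqlite3.connect(path)  # plays the role of another transaction

    writer.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    writer.execute("INSERT INTO account VALUES ('A', 300)")
    writer.commit()

    def read_balance():
        rows = reader.execute("SELECT balance FROM account WHERE name = 'A'").fetchall()
        return rows[0][0]

    # The writer changes the balance but has not committed yet.
    writer.execute("UPDATE account SET balance = 250 WHERE name = 'A'")
    seen_before_commit = read_balance()

    # After the commit, the change becomes visible to the reader.
    writer.commit()
    seen_after_commit = read_balance()

    writer.close()
    reader.close()

print(seen_before_commit, seen_after_commit)
```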

4) Durability: Durability ensures the permanency of something. In DBMS, durability ensures that after the successful execution of an operation, the data becomes permanent in the database. The durability of the data should be such that even if the system fails or crashes, the database still survives. However, if data is lost, it becomes the responsibility of the recovery manager to ensure the durability of the database. To commit the values, the COMMIT command must be used every time we make changes.
Therefore, the ACID properties of DBMS play a vital role in maintaining the consistency and availability of data in the database.
Thus, this was a brief introduction to the ACID properties in DBMS. These properties are also discussed in the transaction section.
ER model
ER model stands for Entity-Relationship model. It is a high-level data model. This model is used to define the data elements and relationships for a specified system.
It develops a conceptual design for the database. It also gives a very simple and easy-to-design view of the data.
In ER modeling, the database structure is portrayed as a diagram called an entity-relationship diagram.
For example, suppose we design a school database. In this database, the student will be an entity with attributes like address, name, id, age, etc. The address can be another entity with attributes like city, street name, pin code, etc., and there will be a relationship between them.

Components of ER Diagram

1. Entity:
An entity may be any object, class, person or place. In the ER diagram, an entity is represented as a rectangle.
Consider an organization as an example: manager, product, employee, department, etc. can each be taken as an entity.

a. Weak Entity
An entity that depends on another entity is called a weak entity. The weak entity doesn't contain any key attribute of its own. The weak entity is represented by a double rectangle.
2. Attribute
The attribute is used to describe a property of an entity. An ellipse is used to represent an attribute.
For example, id, age, contact number, name, etc. can be attributes of a student.

a. Key Attribute
The key attribute is used to represent the main characteristics of an entity. It represents a primary key. The key
attribute is represented by an ellipse with the text underlined.

b. Composite Attribute
An attribute that is composed of many other attributes is known as a composite attribute. The composite attribute is represented by an ellipse, and its component attributes are shown as ellipses connected to it.

c. Multivalued Attribute
An attribute can have more than one value. Such attributes are known as multivalued attributes. A double oval is used to represent a multivalued attribute.
For example, a student can have more than one phone number.
d. Derived Attribute
An attribute that can be derived from another attribute is known as a derived attribute. It is represented by a dashed ellipse.
For example, a person's age changes over time and can be derived from another attribute like Date of birth.

3. Relationship
A relationship is used to describe the relation between entities. A diamond or rhombus is used to represent the relationship.

Types of relationship are as follows:


a. One-to-One Relationship
When only one instance of each entity is associated with the relationship, it is known as a one-to-one relationship.
For example, a female can marry one male, and a male can marry one female.

b. One-to-many relationship
When only one instance of the entity on the left and more than one instance of the entity on the right are associated with the relationship, it is known as a one-to-many relationship.
For example, a scientist can make many inventions, but each invention is made by one specific scientist.

c. Many-to-one relationship
When more than one instance of the entity on the left and only one instance of the entity on the right are associated with the relationship, it is known as a many-to-one relationship.
For example, a student enrolls in only one course, but a course can have many students.

d. Many-to-many relationship
When more than one instance of the entity on the left and more than one instance of the entity on the right are associated with the relationship, it is known as a many-to-many relationship.
For example, an employee can be assigned to many projects, and a project can have many employees.

Notation of ER diagram
A database can be represented using notations. In an ER diagram, many notations are used to express the cardinality. These notations are as follows:

Mapping Constraints
A mapping constraint is a data constraint that expresses the number of entities to which another entity can be related via a relationship set.
It is most useful in describing binary relationship sets, although it can also contribute to the description of relationship sets that involve more than two entity sets.
For a binary relationship set R between entity sets E1 and E2, there are four possible mapping cardinalities. These are as follows:
One to one (1:1)
One to many (1:M)
Many to one (M:1)
Many to many (M:M)
One-to-one
In one-to-one mapping, an entity in E1 is associated with at most one entity in E2, and an entity in E2 is
associated with at most one entity in E1.

One-to-many
In one-to-many mapping, an entity in E1 is associated with any number of entities in E2, and an entity in E2 is
associated with at most one entity in E1.

Many-to-one
In many-to-one mapping, an entity in E1 is associated with at most one entity in E2, and an entity in E2 is
associated with any number of entities in E1.
Many-to-many
In many-to-many mapping, an entity in E1 is associated with any number of entities in E2, and an entity in E2 is
associated with any number of entities in E1.

Keys
Keys play an important role in the relational database.
They are used to uniquely identify any record or row of data in a table. They are also used to establish and identify relationships between tables.
For example, ID is used as a key in the Student table because it is unique for each student. In the PERSON
table, passport_number, license_number, SSN are keys since they are unique for each person.

Types of keys:
1. Primary key
It is the first key used to identify one and only one instance of an entity uniquely. An entity can contain multiple keys, as we saw in the PERSON table. The key which is most suitable from that list becomes the primary key.
In the EMPLOYEE table, ID can be the primary key since it is unique for each employee. In the EMPLOYEE table, we could also select License_Number or Passport_Number as the primary key since they are also unique.
For each entity, the choice of primary key depends on the requirements and on the developers.

2. Candidate key
A candidate key is an attribute or set of attributes that can uniquely identify a tuple.
The primary key is chosen from among the candidate keys; the remaining attributes that can also uniquely identify a tuple are candidate keys as well. The candidate keys are as strong as the primary key.
For example: In the EMPLOYEE table, id is best suited for the primary key. The other unique attributes, like SSN, Passport_Number, License_Number, etc., are also candidate keys.
3. Super Key
Super key is an attribute set that can uniquely identify a tuple. A super key is a superset of a candidate key.

For example: In the above EMPLOYEE table, for (EMPLOYEE_ID, EMPLOYEE_NAME), the names of two employees can be the same, but their EMPLOYEE_ID can't be the same. Hence, this combination can also be a key.
The super keys would be EMPLOYEE_ID, (EMPLOYEE_ID, EMPLOYEE_NAME), etc.
4. Foreign key
A foreign key is a column of a table that is used to point to the primary key of another table.
Every employee works in a specific department in a company, and employee and department are two different entities. So we don't store the department's information in the employee table. Instead, we link these two tables through the primary key of one table.
We add the primary key of the DEPARTMENT table, Department_Id, as a new attribute in the EMPLOYEE table.
In the EMPLOYEE table, Department_Id is the foreign key, and both the tables are related.
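The link between EMPLOYEE and DEPARTMENT can be sketched as follows with Python's sqlite3 module. The column list and the sample rows are assumptions for illustration. Note that SQLite checks foreign keys only after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE department (department_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        name TEXT,
        department_id INTEGER REFERENCES department (department_id)
    )
""")

conn.execute("INSERT INTO department VALUES (10, 'Research')")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 10)")  # department 10 exists

try:
    # Department 99 does not exist, so the foreign key rejects this row.
    conn.execute("INSERT INTO employee VALUES (2, 'Ravi', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(rejected, employee_count)
```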
5. Alternate key
There may be one or more attributes or combinations of attributes that uniquely identify each tuple in a relation. These attributes or combinations of attributes are called the candidate keys. One key is chosen as the primary key from these candidate keys, and each remaining candidate key, if any exists, is termed an alternate key. In other words, the total number of alternate keys is the total number of candidate keys minus one (the primary key). An alternate key may or may not exist. If there is only one candidate key in a relation, the relation does not have an alternate key.
For example, employee relation has two attributes, Employee_Id and PAN_No, that act as candidate keys. In this
relation, Employee_Id is chosen as the primary key, so the other candidate key, PAN_No, acts as the Alternate
key.

6. Composite key
Whenever a primary key consists of more than one attribute, it is known as a composite key. This key is also
known as Concatenated Key.
For example, in an employee relation, we assume that an employee may be assigned multiple roles, and an employee may work on multiple projects simultaneously. So the primary key will be composed of all three attributes, namely Emp_ID, Emp_role, and Proj_ID, in combination. These attributes act as a composite key since the primary key comprises more than one attribute.
Generalization
Generalization is like a bottom-up approach in which two or more entities of lower level combine to form a higher
level entity if they have some attributes in common.
In generalization, an entity of a higher level can also combine with the entities of the lower level to form a further
higher level entity.
Generalization is much like a subclass and superclass system; the only difference is the approach. Generalization uses the bottom-up approach.
In generalization, entities are combined to form a more generalized entity, i.e., subclasses are combined to make a superclass.
For example, the Faculty and Student entities can be generalized to create a higher-level entity, Person.

Specialization
Specialization is a top-down approach, and it is the opposite of Generalization. In specialization, one higher-level entity can be broken down into two or more lower-level entities.
Specialization is used to identify a subset of an entity set that shares some distinguishing characteristics.
Normally, the superclass is defined first, the subclasses and their related attributes are defined next, and the relationship sets are then added.
For example: In an Employee management system, EMPLOYEE entity can be specialized as TESTER or
DEVELOPER based on what role they play in the company.
Aggregation
In aggregation, the relation between two entities is treated as a single entity. In aggregation, a relationship together with its corresponding entities is aggregated into a higher-level entity.
For example: the Center entity offering the Course entity acts as a single entity, which is in a relationship with another entity, Visitor. In the real world, if a visitor visits a coaching center, he will never enquire about the Course alone or about the Center alone; instead, he will enquire about both.

Reduction of ER diagram to Table
The database can be represented using the notations, and these notations can be reduced to a collection of
tables.
In the database, every entity set or relationship set can be represented in tabular form.
The ER diagram is given below:

There are some points to follow when converting the ER diagram to tables:
Each entity type becomes a table.
In the given ER diagram, LECTURE, STUDENT, SUBJECT and COURSE form individual tables.
Every single-valued attribute becomes a column of the table.
In the STUDENT entity, STUDENT_NAME and STUDENT_ID form the columns of the STUDENT table. Similarly, COURSE_NAME and COURSE_ID form the columns of the COURSE table, and so on.
A key attribute of the entity type is represented by the primary key.
In the given ER diagram, COURSE_ID, STUDENT_ID, SUBJECT_ID, and LECTURE_ID are the key attributes of the entities.
A multivalued attribute is represented by a separate table.
In the student table, hobby is a multivalued attribute. It is not possible to represent multiple values in a single column of the STUDENT table. Hence we create a table STUD_HOBBY with the columns STUDENT_ID and HOBBY. Using both columns, we create a composite key.
A composite attribute is represented by its components.
In the given ER diagram, the student address is a composite attribute. It contains CITY, PIN, DOOR#, STREET, and STATE. In the STUDENT table, these attributes are stored as individual columns.
Derived attributes are not stored in the table.
In the STUDENT table, Age is a derived attribute. It can be calculated at any point of time as the difference between the current date and the Date of Birth.
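The rules above can be sketched for the STUDENT entity as follows. This is a minimal illustration using Python's sqlite3 module; the column types, the name door_no (standing in for DOOR#) and the sample hobbies are assumptions, since the tutorial's figure is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Entity type -> table; key attribute -> primary key;
# composite address -> one column per component; derived Age -> not stored.
conn.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        student_name TEXT,
        date_of_birth TEXT,
        door_no TEXT, street TEXT, city TEXT, state TEXT, pin TEXT
    )
""")

# Multivalued attribute -> separate table with a composite key.
conn.execute("""
    CREATE TABLE stud_hobby (
        student_id INTEGER REFERENCES student (student_id),
        hobby TEXT,
        PRIMARY KEY (student_id, hobby)
    )
""")

conn.execute("INSERT INTO student (student_id, student_name) VALUES (1, 'Ram')")
conn.executemany("INSERT INTO stud_hobby VALUES (?, ?)", [(1, "chess"), (1, "cricket")])

hobbies = [row[0] for row in conn.execute(
    "SELECT hobby FROM stud_hobby WHERE student_id = 1 ORDER BY hobby")]

try:
    # The composite key rejects a repeated (student, hobby) pair.
    conn.execute("INSERT INTO stud_hobby VALUES (1, 'chess')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(hobbies, duplicate_rejected)
```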
Using these rules, you can convert the ER diagram to tables and columns and assign the mapping between the
tables. Table structure for the given ER diagram is as below:

Relationship of higher degree


The degree of a relationship can be defined as the number of occurrences in one entity that are associated with the number of occurrences in another entity.
There are three degrees of relationship:
One-to-one (1:1)
One-to-many (1:M)
Many-to-many (M:N)
1. One-to-one
In a one-to-one relationship, one occurrence of an entity relates to only one occurrence in another entity.
A one-to-one relationship rarely exists in practice.
For example: if an employee is allocated a company car then that car can only be driven by that employee.
Therefore, employee and company car have a one-to-one relationship.
2. One-to-many
In a one-to-many relationship, one occurrence in an entity relates to many occurrences in another entity.
For example: An employee works in one department, but a department has many employees.
Therefore, department and employee have a one-to-many relationship.

3. Many-to-many
In a many-to-many relationship, many occurrences in an entity relate to many occurrences in another entity.
Like a one-to-one relationship, a many-to-many relationship rarely exists in practice.
For example: At the same time, an employee can work on several projects, and a project has a team of many
employees.
Therefore, employee and project have a many-to-many relationship.

Relational Model Concept
The relational model represents data as a table with columns and rows. Each row is known as a tuple. Each column
of the table has a name, known as an attribute.
Domain: It contains a set of atomic values that an attribute can take.
Attribute: It contains the name of a column in a particular table. Each attribute Ai must have a domain, dom(Ai)
Relational instance: In the relational database system, the relational instance is represented by a finite set of
tuples. Relation instances do not have duplicate tuples.
Relational schema: A relational schema contains the name of the relation and name of all columns or attributes.
Relational key: A relational key is a set of one or more attributes that uniquely identifies each row in the
relation.
Example: STUDENT Relation

NAME ROLL_NO PHONE_NO ADDRESS AGE

Ram 14795 7305758992 Noida 24

Shyam 12839 9026288936 Delhi 35


Laxman 33289 8583287182 Gurugram 20

Mahesh 27857 7086819134 Ghaziabad 27

Ganesh 17282 9028 9i3988 Delhi 40

In the given table, NAME, ROLL_NO, PHONE_NO, ADDRESS, and AGE are the attributes.
The instance of schema STUDENT has 5 tuples.
t3 = <Laxman, 33289, 8583287182, Gurugram, 20>
Properties of Relations
The name of the relation is distinct from all other relations.
Each cell of the relation contains exactly one atomic (single) value.
Each attribute has a distinct name.
The order of attributes has no significance.
The relation has no duplicate tuples.
The order of tuples has no significance; tuples can appear in any sequence.
Relational Algebra
Relational algebra is a procedural query language. It gives a step by step process to obtain the result of the
query. It uses operators to perform queries.
Types of Relational operation

1. Select Operation:
The select operation selects tuples that satisfy a given predicate.
It is denoted by sigma (σ).
Notation: σ p(r)
Where:
σ is the selection operator
r is the relation
p is the selection predicate, a propositional logic formula that may use connectives such as AND, OR, and NOT,
and relational operators such as =, ≠, ≥, <, >, ≤.
For example: LOAN Relation

BRANCH_NAME LOAN_NO AMOUNT

Downtown L-17 1000

Redwood L-23 2000

Perryride L-15 1500

Downtown L-14 1500

Mianus L-13 500

Roundhill L-11 900

Perryride L-16 1300

Input:
σ BRANCH_NAME="Perryride" (LOAN)
Output:

BRANCH_NAME LOAN_NO AMOUNT

Perryride L-15 1500

Perryride L-16 1300
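Let's see how the select operation can be sketched in code. The following is a minimal Python sketch, not a definitive implementation: the list-of-dicts layout and the helper name select are assumptions made for illustration, and only the LOAN data comes from the example above.

```python
# LOAN relation from the example above, stored as a list of dicts.
# The list-of-dicts layout and the helper name `select` are assumptions of this sketch.
LOAN = [
    {"BRANCH_NAME": "Downtown", "LOAN_NO": "L-17", "AMOUNT": 1000},
    {"BRANCH_NAME": "Redwood", "LOAN_NO": "L-23", "AMOUNT": 2000},
    {"BRANCH_NAME": "Perryride", "LOAN_NO": "L-15", "AMOUNT": 1500},
    {"BRANCH_NAME": "Downtown", "LOAN_NO": "L-14", "AMOUNT": 1500},
    {"BRANCH_NAME": "Mianus", "LOAN_NO": "L-13", "AMOUNT": 500},
    {"BRANCH_NAME": "Roundhill", "LOAN_NO": "L-11", "AMOUNT": 900},
    {"BRANCH_NAME": "Perryride", "LOAN_NO": "L-16", "AMOUNT": 1300},
]


def select(relation, predicate):
    """Selection: keep only the tuples for which the predicate p holds."""
    return [t for t in relation if predicate(t)]


# σ BRANCH_NAME="Perryride" (LOAN)
perryride = select(LOAN, lambda t: t["BRANCH_NAME"] == "Perryride")

# A predicate that combines two comparisons with AND.
big_downtown = select(
    LOAN, lambda t: t["BRANCH_NAME"] == "Downtown" and t["AMOUNT"] > 1200
)

for t in perryride:
    print(t["BRANCH_NAME"], t["LOAN_NO"], t["AMOUNT"])
```

The predicate is passed as a function, so any combination of AND, OR, NOT and comparison operators can be used without changing select itself.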

2. Project Operation:
The project operation keeps only the attributes that we wish to appear in the result. The rest of the attributes
are eliminated from the table.
It is denoted by ∏.
Notation: ∏ A1, A2, ..., An (r)
Where
A1, A2, ..., An are attribute names of relation r.
Example: CUSTOMER RELATION

NAME STREET CITY

Jones Main Harrison

Smith North Rye

Hays Main Harrison


Curry North Rye

Johnson Alma Brooklyn

Brooks Senator Brooklyn

Input:
∏ NAME, CITY (CUSTOMER)
Output:

NAME CITY

Jones Harrison

Smith Rye

Hays Harrison

Curry Rye

Johnson Brooklyn

Brooks Brooklyn
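The project operation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple layout, the ATTRS header and the helper name project are hypothetical, while the CUSTOMER data is taken from the example above. Because a relation is a set, the sketch also drops duplicate rows that appear after projection.

```python
# CUSTOMER relation from the example above; ATTRS names the columns in order.
# The tuple layout and the helper name `project` are assumptions of this sketch.
ATTRS = ("NAME", "STREET", "CITY")
CUSTOMER = [
    ("Jones", "Main", "Harrison"),
    ("Smith", "North", "Rye"),
    ("Hays", "Main", "Harrison"),
    ("Curry", "North", "Rye"),
    ("Johnson", "Alma", "Brooklyn"),
    ("Brooks", "Senator", "Brooklyn"),
]


def project(relation, attrs, wanted):
    """Projection: keep only the wanted attributes and drop duplicate rows."""
    positions = [attrs.index(a) for a in wanted]
    result = []
    for row in relation:
        reduced = tuple(row[i] for i in positions)
        if reduced not in result:  # a relation has no duplicate tuples
            result.append(reduced)
    return result


# ∏ NAME, CITY (CUSTOMER)
name_city = project(CUSTOMER, ATTRS, ("NAME", "CITY"))

# ∏ CITY (CUSTOMER): six input rows collapse to three distinct cities.
cities = project(CUSTOMER, ATTRS, ("CITY",))

for row in name_city:
    print(*row)
```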

3. Union Operation:
Suppose there are two relations R and S. The union operation contains all the tuples that are in R, in S, or
in both R and S.

It eliminates duplicate tuples. It is denoted by ∪.

Notation: R ∪ S
A union operation must satisfy the following conditions:
R and S must have the same number of attributes, and corresponding attributes must have compatible domains.
Duplicate tuples are eliminated automatically.
Example:
DEPOSITOR RELATION

CUSTOMER_NAME ACCOUNT_NO

Johnson A-101

Smith A-121
Mayes A-321

Turner A-176

Johnson A-273

Jones A-472

Lindsay A-284

BORROW RELATION

CUSTOMER_NAME LOAN_NO

Jones L-17

Smith L-23

Hayes L-15

Jackson L-14

Curry L-93

Smith L-11

Williams L-17

Input:

∏ CUSTOMER_NAME (BORROW) ∪ ∏ CUSTOMER_NAME (DEPOSITOR)


Output:

CUSTOMER_NAME

Johnson

Smith

Hayes

Turner

Jones
Lindsay

Jackson

Curry

Williams

Mayes

4. Set Intersection:
Suppose there are two relations R and S. The set intersection operation contains all tuples that are in both R
and S.
It is denoted by ∩.
Notation: R ∩ S
Example: Using the above DEPOSITOR table and BORROW table
Input:
∏ CUSTOMER_NAME (BORROW) ∩ ∏ CUSTOMER_NAME (DEPOSITOR)
Output:

CUSTOMER_NAME

Smith

Jones

5. Set Difference:
Suppose there are two relations R and S. The set difference operation contains all tuples that are in R but not
in S.
It is denoted by minus (-).
Notation: R - S
Example: Using the above DEPOSITOR table and BORROW table
Input:
∏ CUSTOMER_NAME (BORROW) - ∏ CUSTOMER_NAME (DEPOSITOR)
Output:

CUSTOMER_NAME

Jackson

Hayes

Williams

Curry
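
The union, set intersection and set difference examples above map directly onto Python's built-in set type. The following sketch uses the DEPOSITOR and BORROW data from the examples; representing each relation as a list of pairs is an assumption made for illustration.

```python
# DEPOSITOR and BORROW relations from the examples above, as (name, number) pairs.
DEPOSITOR = [
    ("Johnson", "A-101"), ("Smith", "A-121"), ("Mayes", "A-321"),
    ("Turner", "A-176"), ("Johnson", "A-273"), ("Jones", "A-472"),
    ("Lindsay", "A-284"),
]
BORROW = [
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"),
    ("Jackson", "L-14"), ("Curry", "L-93"), ("Smith", "L-11"),
    ("Williams", "L-17"),
]

# ∏ CUSTOMER_NAME: building a set removes duplicates such as the second "Smith".
depositors = {name for name, _ in DEPOSITOR}
borrowers = {name for name, _ in BORROW}

union = borrowers | depositors         # in BORROW, in DEPOSITOR, or in both
intersection = borrowers & depositors  # in both relations
difference = borrowers - depositors    # in BORROW but not in DEPOSITOR

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```

Note that set difference is not symmetric: depositors - borrowers would give the customers who have an account but no loan.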

6. Cartesian product
The Cartesian product is used to combine each row in one table with each row in the other table. It is also known
as a cross product.
It is denoted by X.
Notation: E X D
Example:
EMPLOYEE

EMP_ID EMP_NAME EMP_DEPT

1 Smith A

2 Harry C

3 John B

DEPARTMENT

DEPT_NO DEPT_NAME

A Marketing

B Sales

C Legal

Input:
EMPLOYEE X DEPARTMENT
Output:

EMP_ID EMP_NAME EMP_DEPT DEPT_NO DEPT_NAME

1 Smith A A Marketing

1 Smith A B Sales

1 Smith A C Legal

2 Harry C A Marketing
2 Harry C B Sales

2 Harry C C Legal

3 John B A Marketing

3 John B B Sales

3 John B C Legal
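
The Cartesian product corresponds to itertools.product in Python's standard library. The sketch below rebuilds the nine-row result shown above; storing each relation as a list of tuples is an assumption of this example.

```python
from itertools import product

# EMPLOYEE (EMP_ID, EMP_NAME, EMP_DEPT) and DEPARTMENT (DEPT_NO, DEPT_NAME)
# from the example above, stored as lists of tuples.
EMPLOYEE = [(1, "Smith", "A"), (2, "Harry", "C"), (3, "John", "B")]
DEPARTMENT = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]

# EMPLOYEE X DEPARTMENT: every employee row is paired with every department row.
cross = [emp + dept for emp, dept in product(EMPLOYEE, DEPARTMENT)]

# Keeping only the rows where EMP_DEPT equals DEPT_NO turns the product into a join.
matched = [row for row in cross if row[2] == row[3]]

for row in cross:
    print(*row)
```

The product of a relation with m rows and a relation with n rows always has m × n rows, which is why a selection is usually applied on top of it.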

7. Rename Operation:
The rename operation is used to rename the output relation. It is denoted by rho (ρ).
Example: We can use the rename operator to rename STUDENT relation to STUDENT1.
ρ(STUDENT1, STUDENT)
Join Operations:

A Join operation combines related tuples from different relations if and only if a given join condition is
satisfied. It is denoted by ⋈.

Example:
EMPLOYEE

EMP_CODE EMP_NAME

101 Stephan

102 Jack

103 Harry

SALARY

EMP_CODE SALARY

101 50000

102 30000

103 25000

Operation: (EMPLOYEE ⋈ SALARY)


Result:
EMP_CODE EMP_NAME SALARY

101 Stephan 50000

102 Jack 30000

103 Harry 25000

Types of Join operations:

1. Natural Join:
A natural join is the set of tuples of all combinations in R and S that are equal on their common attribute names.

It is denoted by ⋈.
Example: Let's use the above EMPLOYEE table and SALARY table:
Input:

∏EMP_NAME, SALARY (EMPLOYEE ⋈ SALARY)


Output:

EMP_NAME SALARY

Stephan 50000
Jack 30000

Harry 25000
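
A natural join can be sketched as a nested loop that keeps a pair of tuples only when they agree on every attribute name they share. This is a simple illustration, not how a real DBMS evaluates joins; the dict-based layout and the helper name natural_join are assumptions.

```python
# EMPLOYEE and SALARY relations from the example above, as lists of dicts.
EMPLOYEE = [
    {"EMP_CODE": 101, "EMP_NAME": "Stephan"},
    {"EMP_CODE": 102, "EMP_NAME": "Jack"},
    {"EMP_CODE": 103, "EMP_NAME": "Harry"},
]
SALARY = [
    {"EMP_CODE": 101, "SALARY": 50000},
    {"EMP_CODE": 102, "SALARY": 30000},
    {"EMP_CODE": 103, "SALARY": 25000},
]


def natural_join(r, s):
    """Combine tuples of r and s that are equal on all common attribute names."""
    result = []
    for t1 in r:
        for t2 in s:
            common = t1.keys() & t2.keys()
            if all(t1[a] == t2[a] for a in common):
                result.append({**t1, **t2})  # the common attribute appears once
    return result


joined = natural_join(EMPLOYEE, SALARY)

# ∏ EMP_NAME, SALARY (EMPLOYEE ⋈ SALARY)
name_salary = [(t["EMP_NAME"], t["SALARY"]) for t in joined]
print(name_salary)
```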

2. Outer Join:
The outer join operation is an extension of the join operation. It is used to deal with missing information.
Example:
EMPLOYEE

EMP_NAME STREET CITY

Ram Civil line Mumbai

Shyam Park street Kolkata

Ravi M.G. Street Delhi

Hari Nehru nagar Hyderabad

FACT_WORKERS

EMP_NAME BRANCH SALARY

Ram Infosys 10000

Shyam Wipro 20000

Kuber HCL 30000

Hari TCS 50000

Input:

(EMPLOYEE ⋈ FACT_WORKERS)
Output:

EMP_NAME STREET CITY BRANCH SALARY

Ram Civil line Mumbai Infosys 10000

Shyam Park street Kolkata Wipro 20000

Hari Nehru nagar Hyderabad TCS 50000

An outer join is basically of three types:


Left outer join
Right outer join
Full outer join
a. Left outer join:
The left outer join contains all combinations of tuples in R and S that are equal on their common attribute
names.
In addition, tuples in R that have no matching tuples in S are kept, with NULL in the attributes that come from S.

It is denoted by ⟕.
Example: Using the above EMPLOYEE table and FACT_WORKERS table
Input:

EMPLOYEE ⟕ FACT_WORKERS
Output:

EMP_NAME STREET CITY BRANCH SALARY

Ram Civil line Mumbai Infosys 10000

Shyam Park street Kolkata Wipro 20000

Hari Nehru nagar Hyderabad TCS 50000

Ravi M.G. Street Delhi NULL NULL

b. Right outer join:


The right outer join contains all combinations of tuples in R and S that are equal on their common attribute
names.
In addition, tuples in S that have no matching tuples in R are kept, with NULL in the attributes that come from R.

It is denoted by ⟖.
Example: Using the above EMPLOYEE table and FACT_WORKERS Relation
Input:

EMPLOYEE ⟖ FACT_WORKERS
Output:

EMP_NAME BRANCH SALARY STREET CITY

Ram Infosys 10000 Civil line Mumbai

Shyam Wipro 20000 Park street Kolkata

Hari TCS 50000 Nehru nagar Hyderabad

Kuber HCL 30000 NULL NULL


c. Full outer join:
The full outer join combines the left and right outer joins: it contains all rows from both tables.
In addition to the matching tuples, it keeps the tuples in R that have no matching tuples in S and the tuples in S
that have no matching tuples in R on their common attribute name, with NULL in the missing attributes.

It is denoted by ⟗.
Example: Using the above EMPLOYEE table and FACT_WORKERS table
Input:

EMPLOYEE ⟗ FACT_WORKERS
Output:

EMP_NAME STREET CITY BRANCH SALARY

Ram Civil line Mumbai Infosys 10000

Shyam Park street Kolkata Wipro 20000

Hari Nehru nagar Hyderabad TCS 50000

Ravi M.G. Street Delhi NULL NULL

Kuber NULL NULL HCL 30000
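
The three outer joins can be sketched in Python with None standing in for NULL. The sketch assumes that EMP_NAME is unique in both relations, so each relation can be stored as a dict keyed by EMP_NAME; the helper names are hypothetical, and the output columns always appear in the order EMP_NAME, STREET, CITY, BRANCH, SALARY.

```python
# EMPLOYEE: EMP_NAME -> (STREET, CITY); FACT_WORKERS: EMP_NAME -> (BRANCH, SALARY).
# Keying by EMP_NAME assumes names are unique, which holds for the example data.
EMPLOYEE = {
    "Ram": ("Civil line", "Mumbai"),
    "Shyam": ("Park street", "Kolkata"),
    "Ravi": ("M.G. Street", "Delhi"),
    "Hari": ("Nehru nagar", "Hyderabad"),
}
FACT_WORKERS = {
    "Ram": ("Infosys", 10000),
    "Shyam": ("Wipro", 20000),
    "Kuber": ("HCL", 30000),
    "Hari": ("TCS", 50000),
}
NULLS = (None, None)  # padding for the side that has no matching tuple


def left_outer(r, s):
    """All tuples of r; attributes from s are NULL when there is no match."""
    return [(name, *r[name], *s.get(name, NULLS)) for name in r]


def right_outer(r, s):
    """All tuples of s; attributes from r are NULL when there is no match."""
    return [(name, *r.get(name, NULLS), *s[name]) for name in s]


def full_outer(r, s):
    """All tuples of both relations, padded with NULL on either side."""
    names = list(r) + [n for n in s if n not in r]
    return [(n, *r.get(n, NULLS), *s.get(n, NULLS)) for n in names]


left = left_outer(EMPLOYEE, FACT_WORKERS)
right = right_outer(EMPLOYEE, FACT_WORKERS)
full = full_outer(EMPLOYEE, FACT_WORKERS)

for row in full:
    print(*row)
```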

3. Equi join:
An equi join is an inner join whose join condition uses only the equality operator (=). It is the most common
join and is based on matched data as per the equality condition.
Example:
CUSTOMER RELATION

CLASS_ID NAME

1 John

2 Harry

3 Jackson

PRODUCT

PRODUCT_ID CITY

1 Delhi

2 Mumbai
3 Noida

Input:

CUSTOMER ⋈ (CUSTOMER.CLASS_ID = PRODUCT.PRODUCT_ID) PRODUCT
Output:

CLASS_ID NAME PRODUCT_ID CITY

1 John 1 Delhi

2 Harry 2 Mumbai

3 Jackson 3 Noida

Integrity Constraints
Integrity constraints are a set of rules used to maintain the quality of information.
Integrity constraints ensure that data insertion, updating, and other processes are performed in such a
way that data integrity is not affected.
Thus, integrity constraints guard against accidental damage to the database.
Types of Integrity Constraint

1. Domain constraints
Domain constraints can be defined as the definition of a valid set of values for an attribute.
The data type of domain includes string, character, integer, time, date, currency, etc. The value of the attribute
must be available in the corresponding domain.
Example:
2. Entity integrity constraints
The entity integrity constraint states that primary key value can't be null.
This is because the primary key value is used to identify individual rows in relation and if the primary key has a
null value, then we can't identify those rows.
A table can contain a null value other than the primary key field.
Example:

3. Referential Integrity Constraints


A referential integrity constraint is specified between two tables.
In the Referential integrity constraints, if a foreign key in Table 1 refers to the Primary Key of Table 2, then every
value of the Foreign Key in Table 1 must be null or be available in Table 2.
Example:
4. Key constraints
A key is an attribute, or a set of attributes, that is used to identify an entity uniquely within its entity set.
An entity set can have multiple keys, but one of them will be the primary key. A primary key must contain
unique values and cannot contain a null value in the relational table.
Example:
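Let's see how the four kinds of integrity constraint behave when a database enforces them. The sketch below uses Python's built-in sqlite3 module with an in-memory database. The department and student tables, their columns and the helper name violates are hypothetical and chosen only for illustration; other database systems report violations with different messages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE department (
    dept_no   TEXT NOT NULL PRIMARY KEY,          -- entity integrity and key constraint
    dept_name TEXT NOT NULL
);
CREATE TABLE student (
    id      INTEGER NOT NULL PRIMARY KEY,
    name    TEXT NOT NULL,
    age     INTEGER CHECK (age BETWEEN 0 AND 150),   -- domain constraint
    dept_no TEXT REFERENCES department(dept_no)      -- referential integrity
);
INSERT INTO department VALUES ('D1', 'Physics');
INSERT INTO student VALUES (1000, 'Tom', 17, 'D1');
""")


def violates(sql):
    """Return True if the statement is rejected by an integrity constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


results = {
    "domain": violates("INSERT INTO student VALUES (1001, 'Ann', -5, 'D1')"),
    "entity": violates("INSERT INTO department VALUES (NULL, 'Maths')"),
    "referential": violates("INSERT INTO student VALUES (1002, 'Bob', 20, 'D9')"),
    "key": violates("INSERT INTO student VALUES (1000, 'Eve', 19, 'D1')"),
}
print(results)
```

Each rejected statement breaks exactly one rule: an age outside the allowed domain, a null primary key, a foreign key value that does not exist in the referenced table, and a duplicate primary key.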

Relational Calculus
Relational calculus is a non-procedural query language. In a non-procedural query language, the user is
not concerned with the details of how to obtain the end results.
The relational calculus tells what to do but never explains how to do it.
Types of Relational calculus:
1. Tuple Relational Calculus (TRC)
The tuple relational calculus is used to select tuples from a relation. In TRC, the filtering variable ranges
over the tuples of a relation.
The result of the query can have one or more tuples.
Notation:
{T | P (T)} or {T | Condition (T)}
Where
T is the resulting tuples
P(T) is the condition used to fetch T.
For example:
{ T.name | Author(T) AND T.article = 'database' }
Output: This query selects tuples from the AUTHOR relation. It returns the 'name' of every author who
has written an article on 'database'.

TRC (tuple relation calculus) can be quantified. In TRC, we can use Existential (∃) and Universal Quantifiers
(∀).
For example:

{ R| ∃T ∈ Authors(T.article='database' AND R.name=T.name)}


Output: This query will yield the same result as the previous one.
2. Domain Relational Calculus (DRC)
The second form of relational calculus is known as domain relational calculus. In domain relational calculus, the
filtering variables range over the domains of attributes.

Domain relational calculus uses the same operators as tuple calculus. It uses logical connectives ∧ (and), ∨ (or)
and ¬ (not).

It uses Existential (∃) and Universal Quantifiers (∀) to bind the variable.
Notation:
{ a1, a2, a3, ..., an | P (a1, a2, a3, ... ,an)}
Where
a1, a2 are attributes
P stands for formula built by inner attributes
For example:

{<article, page, subject> | <article, page, subject> ∈ javatpoint ∧ subject = 'database'}


Output: This query will yield the article, page, and subject from the relation javatpoint, where the subject is
'database'.
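Because relational calculus only states which tuples are wanted, its queries read very much like Python set comprehensions. The sketch below mirrors the three example queries above. All of the sample rows are hypothetical; only the relation and attribute names come from the examples.

```python
from collections import namedtuple

# Hypothetical sample data; only the attribute names come from the examples above.
Author = namedtuple("Author", ["name", "article"])
AUTHORS = [
    Author("Asha", "database"),
    Author("Ben", "networks"),
    Author("Chen", "database"),
]

Page = namedtuple("Page", ["article", "page", "subject"])
JAVATPOINT = [
    Page("ER model", 12, "database"),
    Page("Threads", 40, "java"),
    Page("Joins", 57, "database"),
]

# TRC: { T.name | Author(T) AND T.article = 'database' }
trc = {t.name for t in AUTHORS if t.article == "database"}

# TRC with a quantifier:
# { R | ∃T ∈ Authors (T.article = 'database' AND R.name = T.name) }
names = {t.name for t in AUTHORS}
trc_exists = {
    r for r in names
    if any(t.article == "database" and t.name == r for t in AUTHORS)
}

# DRC: { <article, page, subject> | <article, page, subject> ∈ javatpoint ∧ subject = 'database' }
drc = {(a, p, s) for (a, p, s) in JAVATPOINT if s == "database"}

print(sorted(trc), sorted(drc))
```

The existential quantifier ∃ corresponds to any(...), and the universal quantifier ∀ would correspond to all(...).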
SQL
SQL stands for Structured Query Language. It is used for storing and managing data in a relational database
management system (RDBMS).
It is a standard language for relational database systems. It enables a user to create, read, update and delete
relational databases and tables.
RDBMSs such as MySQL, Informix, Oracle, MS Access and SQL Server all use SQL as their standard database
language.
SQL allows users to query the database in a number of ways, using English-like statements.
Rules:
SQL follows the following rules:
Structured Query Language is not case sensitive. Generally, keywords of SQL are written in uppercase.
SQL statements are not tied to text lines: a single SQL statement can be written on one or multiple text lines.
Using SQL statements, you can perform most of the actions in a database.
SQL is based on tuple relational calculus and relational algebra.
SQL process:
When an SQL command is executed for any RDBMS, the system figures out the best way to carry out the
request, and the SQL engine determines how to interpret the task.
Various components are involved in this process, such as the optimization engine, query engine,
query dispatcher, and classic query engine.
All non-SQL queries are handled by the classic query engine, but the SQL query engine does not handle logical
files.
Characteristics of SQL
SQL is easy to learn.
SQL is used to access data from relational database management systems.
SQL can execute queries against the database.
SQL is used to describe the data.
SQL is used to define the data in the database and manipulate it when needed.
SQL is used to create and drop the database and table.
SQL is used to create a view, stored procedure, function in a database.
SQL allows users to set permissions on tables, procedures, and views.
Advantages of SQL
There are the following advantages of SQL:
High speed
Using the SQL queries, the user can quickly and efficiently retrieve a large amount of records from a database.
No coding needed
In the standard SQL, it is very easy to manage the database system. It doesn't require a substantial amount of
code to manage the database system.
Well defined standards
SQL databases follow long-established standards published by ISO and ANSI.
Portability
SQL can be used on laptops, PCs, servers and even some mobile phones.
Interactive language
SQL is a domain-specific language used to communicate with the database. It can also return answers to
complex questions in seconds.
Multiple data view
Using the SQL language, the users can make different views of the database structure.
SQL Datatype
SQL Datatype is used to define the values that a column can contain.
Every column is required to have a name and data type in the database table.
Datatype of SQL:

1. Binary Datatypes
There are three types of binary datatypes, which are given below:

Data Type Description

binary It has a maximum length of 8000 bytes. It contains fixed-length binary data.

varbinary It has a maximum length of 8000 bytes. It contains variable-length binary data.

image It has a maximum length of 2,147,483,647 bytes. It contains variable-length binary data.

2. Approximate Numeric Datatype :


The subtypes are given below:

Data type From To Description


float -1.79E + 308 1.79E + 308 It is used to specify a floating-point value e.g. 6.2, 2.9 etc.

real -3.40e + 38 3.40E + 38 It specifies a single precision floating point number

3. Exact Numeric Datatype


The subtypes are given below:

Data type Description

int It is used to specify an integer value.

smallint It is used to specify small integer value.

bit It has the number of bits to store.

decimal It specifies a numeric value that can have a decimal number.

numeric It is used to specify a numeric value.

4. Character String Datatype


The subtypes are given below:

Data type Description

char It has a maximum length of 8000 characters. It contains Fixed-length non-unicode characters.

varchar It has a maximum length of 8000 characters. It contains variable-length non-unicode characters.

text It has a maximum length of 2,147,483,647 characters. It contains variable-length non-unicode characters.

5. Date and time Datatypes


The subtypes are given below:

Datatype Description

date It is used to store the year, month, and days value.

time It is used to store the hour, minute, and second values.

timestamp It stores the year, month, day, hour, minute, and the second value.

SQL Commands
SQL commands are instructions used to communicate with the database. They are also used to perform specific
tasks, functions, and queries on data.
SQL can perform various tasks such as creating a table, adding data to tables, dropping a table, modifying a
table, and setting permissions for users.
Types of SQL Commands
There are five types of SQL commands: DDL, DML, DCL, TCL, and DQL.

1. Data Definition Language (DDL)


DDL changes the structure of the table, for example by creating a table, deleting a table, or altering a table.
All DDL commands are auto-committed, which means they permanently save all their changes in the database.
Here are some commands that come under DDL:
CREATE
ALTER
DROP
TRUNCATE
a. CREATE: It is used to create a new table in the database.
Syntax:
CREATE TABLE TABLE_NAME (COLUMN_NAME DATATYPES[,....]);
Example:
CREATE TABLE EMPLOYEE(Name VARCHAR2(20), Email VARCHAR2(100), DOB DATE);
b. DROP: It is used to delete both the structure and record stored in the table.
Syntax
DROP TABLE table_name;
Example
DROP TABLE EMPLOYEE;
c. ALTER: It is used to alter the structure of the database. This change could be either to modify the
characteristics of an existing attribute or probably to add a new attribute.
Syntax:
To add a new column in the table
ALTER TABLE table_name ADD column_name column_definition;
To modify existing column in the table:
ALTER TABLE table_name MODIFY(column_definitions....);
EXAMPLE
ALTER TABLE STU_DETAILS ADD(ADDRESS VARCHAR2(20));
ALTER TABLE STU_DETAILS MODIFY (NAME VARCHAR2(20));
d. TRUNCATE: It is used to delete all the rows from the table and free the space containing the table.
Syntax:
TRUNCATE TABLE table_name;
Example:
TRUNCATE TABLE EMPLOYEE;
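Let's see the DDL commands in action. The sketch below runs them against an in-memory SQLite database through Python's built-in sqlite3 module, which is an assumption made so the example is runnable anywhere. SQLite's dialect differs from the Oracle-style syntax above: it uses VARCHAR instead of VARCHAR2, writes ADD COLUMN, and has no TRUNCATE command, so TRUNCATE is not shown. The helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def table_names():
    """List the tables that currently exist in the database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return [r[0] for r in rows]


def column_names(table):
    """List the column names of a table (second field of PRAGMA table_info)."""
    return [r[1] for r in conn.execute(f"PRAGMA table_info({table})")]


# CREATE: define a new table.
conn.execute("CREATE TABLE EMPLOYEE (Name VARCHAR(20), Email VARCHAR(100), DOB DATE)")
after_create = column_names("EMPLOYEE")

# ALTER: add a new attribute to the existing table.
conn.execute("ALTER TABLE EMPLOYEE ADD COLUMN ADDRESS VARCHAR(20)")
after_alter = column_names("EMPLOYEE")

# DROP: remove both the structure and the records of the table.
conn.execute("DROP TABLE EMPLOYEE")
after_drop = table_names()

print(after_create, after_alter, after_drop)
```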
2. Data Manipulation Language
DML commands are used to modify the data in the database. They are responsible for all forms of changes to the
data.
DML commands are not auto-committed, which means their changes are not saved permanently until they are
committed. They can be rolled back.
Here are some commands that come under DML:
INSERT
UPDATE
DELETE
a. INSERT: The INSERT statement is a SQL query. It is used to insert data into the row of a table.
Syntax:
INSERT INTO TABLE_NAME
(col1, col2, col3,.... col N)
VALUES (value1, value2, value3, .... valueN);
Or
INSERT INTO TABLE_NAME
VALUES (value1, value2, value3, .... valueN);
For example:
INSERT INTO javatpoint (Author, Subject) VALUES ('Sonoo', 'DBMS');
b. UPDATE: This command is used to update or modify the value of a column in the table.
Syntax:
UPDATE table_name SET column_name1 = value1, ... column_nameN = valueN [WHERE condition];
For example:
UPDATE students
SET User_Name = 'Sonoo'
WHERE Student_Id = '3';
c. DELETE: It is used to remove one or more row from a table.
Syntax:
DELETE FROM table_name [WHERE condition];
For example:
DELETE FROM javatpoint
WHERE Author = 'Sonoo';
3. Data Control Language
DCL commands are used to grant and take back authority from any database user.
Here are some commands that come under DCL:
Grant
Revoke
a. Grant: It is used to give user access privileges to a database.
Example
GRANT SELECT, UPDATE ON MY_TABLE TO SOME_USER, ANOTHER_USER;
b. Revoke: It is used to take back permissions from the user.
Example
REVOKE SELECT, UPDATE ON MY_TABLE FROM USER1, USER2;
4. Transaction Control Language
TCL commands can only be used with DML commands such as INSERT, DELETE and UPDATE.
DDL operations are automatically committed in the database, which is why TCL commands cannot be used while
creating tables or dropping them.
Here are some commands that come under TCL:
COMMIT
ROLLBACK
SAVEPOINT
a. Commit: Commit command is used to save all the transactions to the database.
Syntax:
COMMIT;
Example:
DELETE FROM CUSTOMERS
WHERE AGE = 25;
COMMIT;
b. Rollback: Rollback command is used to undo transactions that have not already been saved to the database.
Syntax:
ROLLBACK;
Example:
DELETE FROM CUSTOMERS
WHERE AGE = 25;
ROLLBACK;
c. SAVEPOINT: It is used to roll the transaction back to a certain point without rolling back the entire transaction.
Syntax:
SAVEPOINT SAVEPOINT_NAME;
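Let's see how COMMIT, ROLLBACK and SAVEPOINT work together. The sketch below uses Python's built-in sqlite3 module with an in-memory database. The CUSTOMERS rows and the savepoint name SP1 are hypothetical, and isolation_level=None is passed so that the transaction statements are issued explicitly rather than by the driver.

```python
import sqlite3

# isolation_level=None: the driver opens no implicit transactions, so
# BEGIN, COMMIT, ROLLBACK and SAVEPOINT below are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER)")
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?)",
    [(1, "Asha", 25), (2, "Ben", 30), (3, "Chen", 25)],  # hypothetical rows
)


def ages():
    """Return the AGE column in ID order."""
    return [r[0] for r in conn.execute("SELECT AGE FROM CUSTOMERS ORDER BY ID")]


# ROLLBACK undoes a change that has not been committed yet.
conn.execute("BEGIN")
conn.execute("DELETE FROM CUSTOMERS WHERE AGE = 25")
conn.execute("ROLLBACK")
after_rollback = ages()

# SAVEPOINT lets us undo only part of a transaction.
conn.execute("BEGIN")
conn.execute("DELETE FROM CUSTOMERS WHERE ID = 1")  # kept
conn.execute("SAVEPOINT SP1")
conn.execute("DELETE FROM CUSTOMERS WHERE ID = 2")  # undone below
conn.execute("ROLLBACK TO SP1")
conn.execute("COMMIT")  # makes the first delete permanent
after_savepoint = ages()

print(after_rollback, after_savepoint)
```

After the first transaction all three rows are still present; after the second, only the delete issued before the savepoint has taken effect.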
5. Data Query Language
DQL is used to fetch the data from the database.
It uses only one command:
SELECT
a. SELECT: This plays the role of the projection operation of relational algebra. It chooses the attributes to
return, while the WHERE clause restricts the rows to those that satisfy the given condition.
Syntax:
SELECT expressions
FROM TABLES
WHERE conditions;
For example:
SELECT emp_name
FROM employee
WHERE age > 20;
SQL Operator
There are various types of SQL operator:
SQL Arithmetic Operators
Let's assume 'variable a' and 'variable b'. Here, 'a' contains 20 and 'b' contains 10.

Operator Description Example

+ It adds the value of both operands. a+b will give 30

- It is used to subtract the right-hand operand from the left-hand operand. a-b will give 10

* It is used to multiply the value of both operands. a*b will give 200

/ It is used to divide the left-hand operand by the right-hand operand. a/b will give 2

% It is used to divide the left-hand operand by the right-hand operand and returns the remainder. a%b will give 0

SQL Comparison Operators:


Let's assume 'variable a' and 'variable b'. Here, 'a' contains 10 and 'b' contains 20.

Operator Description Example

= It checks whether the values of the two operands are equal; if they are equal, the condition becomes true. (a=b) is not true

!= It checks whether the values of the two operands are equal; if they are not equal, the condition becomes true. (a!=b) is true

<> It checks whether the values of the two operands are equal; if they are not equal, the condition becomes true. (a<>b) is true

> It checks whether the left operand is greater than the right operand; if yes, the condition becomes true. (a>b) is not true

< It checks whether the left operand is less than the right operand; if yes, the condition becomes true. (a<b) is true

>= It checks whether the left operand is greater than or equal to the right operand; if yes, the condition becomes true. (a>=b) is not true

<= It checks whether the left operand is less than or equal to the right operand; if yes, the condition becomes true. (a<=b) is true

!< It checks whether the left operand is not less than the right operand; if yes, the condition becomes true. (a!<b) is not true

!> It checks whether the left operand is not greater than the right operand; if yes, the condition becomes true. (a!>b) is true

SQL Logical Operators


There is the list of logical operator used in SQL:

Operator Description

ALL It compares a value to all values in another value set.

AND It allows the existence of multiple conditions in an SQL statement.

ANY It compares the values in the list according to the condition.

BETWEEN It is used to search for values that are within a set of values.

IN It compares a value to that specified list value.

NOT It reverses the meaning of any logical operator.

OR It combines multiple conditions in SQL statements.

EXISTS It is used to search for the presence of a row in a specified table.

LIKE It compares a value to similar values using wildcard operator.
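
Let's see a few of these logical operators in a WHERE clause. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the small EMPLOYEE table and the helper name names are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (EMP_ID INTEGER, EMP_NAME TEXT, CITY TEXT, SALARY INTEGER, AGE INTEGER)"
)
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Angelina", "Chicago", 200000, 30),
        (2, "Robert", "Austin", 300000, 26),
        (3, "Christian", "Denver", 100000, 42),
        (4, "Kristen", "Washington", 500000, 29),
        (5, "Russell", "Los angels", 200000, 36),
    ],
)


def names(where):
    """Return EMP_NAME values matching a fixed, trusted WHERE clause, in EMP_ID order."""
    sql = "SELECT EMP_NAME FROM EMPLOYEE WHERE " + where + " ORDER BY EMP_ID"
    return [r[0] for r in conn.execute(sql)]


between = names("AGE BETWEEN 29 AND 36")           # range check, bounds included
in_list = names("CITY IN ('Austin', 'Denver')")    # membership in a list of values
like = names("EMP_NAME LIKE 'R%'")                 # % matches any sequence of characters
combined = names("SALARY > 150000 AND NOT (CITY = 'Chicago')")

print(between, in_list, like, combined)
```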

SQL Table
SQL Table is a collection of data which is organized in terms of rows and columns. In DBMS, the table is known
as relation and row as a tuple.
Table is a simple form of data storage. A table is also considered as a convenient representation of relations.
Let's see an example of the EMPLOYEE table:

EMP_ID EMP_NAME CITY PHONE_NO

1 Kristen Washington 7289201223

2 Anna Franklin 9378282882

3 Jackson Bristol 9264783838

4 Kellan California 7254728346

5 Ashley Hawaii 9638482678

In the above table, "EMPLOYEE" is the table name, "EMP_ID", "EMP_NAME", "CITY", "PHONE_NO" are the
column names. The combination of data of multiple columns forms a row, e.g., 1, "Kristen", "Washington" and
7289201223 are the data of one row.
Operation on Table
Create table
Drop table
Delete table
Rename table
SQL Create Table
SQL create table is used to create a table in the database. To define the table, you should define the name of the
table and also define its columns and column's data type.
Syntax
create table "table_name"
("column1" "data type",
"column2" "data type",
"column3" "data type",
...
"columnN" "data type");
Example
SQL> CREATE TABLE EMPLOYEE (
EMP_ID INT NOT NULL,
EMP_NAME VARCHAR (25) NOT NULL,
PHONE_NO INT NOT NULL,
ADDRESS CHAR (30),
PRIMARY KEY (EMP_ID)
);
If the table is created successfully, you can verify it from the message returned by the SQL server.
Alternatively, you can use the DESC command as follows:
SQL> DESC EMPLOYEE;

Field Type Null Key Default Extra

EMP_ID int(11) NO PRI NULL

EMP_NAME varchar(25) NO NULL

PHONE_NO int(11) NO NULL

ADDRESS char(30) YES NULL

4 rows in set (0.35 sec)


Now you have an EMPLOYEE table in the database, and you can use the stored information related to the
employees.

Drop table
The SQL DROP TABLE statement is used to delete a table definition and all the data from a table. When this
command is executed, all the information available in the table is lost forever, so you have to be very careful
while using this command.
Syntax
DROP TABLE "table_name";
Firstly, you need to verify the EMPLOYEE table using the following command:
SQL> DESC EMPLOYEE;

Field Type Null Key Default Extra

EMP_ID int(11) NO PRI NULL

EMP_NAME varchar(25) NO NULL

PHONE_NO int(11) NO NULL

ADDRESS char(30) YES NULL

4 rows in set (0.35 sec)


This table shows that EMPLOYEE table is available in the database, so we can drop it as follows:
SQL> DROP TABLE EMPLOYEE;
Query OK, 0 rows affected (0.01 sec)
The message shows that the table has been dropped. Running DESC EMPLOYEE again would now report an error,
because the table no longer exists.

SQL DELETE table


In SQL, DELETE statement is used to delete rows from a table. We can use WHERE condition to delete a
specific row from a table. If you want to delete all the records from the table, then you don't need to use the
WHERE clause.
Syntax
DELETE FROM table_name WHERE condition;
Example
Suppose, the EMPLOYEE table having the following records:

EMP_ID EMP_NAME CITY PHONE_NO SALARY

1 Kristen Chicago 9737287378 150000

2 Russell Austin 9262738271 200000

3 Denzel Boston 7353662627 100000

4 Angelina Denver 9232673822 600000

5 Robert Washington 9367238263 350000

6 Christian Los angels 7253847382 260000

The following query will DELETE an employee whose ID is 3.


SQL> DELETE FROM EMPLOYEE
WHERE EMP_ID = 3;
Now, the EMPLOYEE table would have the following records.

EMP_ID EMP_NAME CITY PHONE_NO SALARY

1 Kristen Chicago 9737287378 150000

2 Russell Austin 9262738271 200000

4 Angelina Denver 9232673822 600000

5 Robert Washington 9367238263 350000

6 Christian Los angels 7253847382 260000

If you don't specify the WHERE condition, it will remove all the rows from the table.
DELETE FROM EMPLOYEE;
Now, the EMPLOYEE table would not have any records.
SQL SELECT Statement
In SQL, the SELECT statement is used to query or retrieve data from a table in the database. The returned data is
stored in a table, and the result table is known as the result set.
Syntax
SELECT column1, column2, ...
FROM table_name;
Here, column1, column2, ... are the field names of the table that you want to select data from.
Use the following syntax to select all the fields available in the table:
SELECT * FROM table_name;
Example:
EMPLOYEE

EMP_ID EMP_NAME CITY PHONE_NO SALARY

1 Kristen Chicago 9737287378 150000

2 Russell Austin 9262738271 200000

3 Angelina Denver 9232673822 600000

4 Robert Washington 9367238263 350000

5 Christian Los angels 7253847382 260000

To fetch the EMP_ID of all the employees, use the following query:
SELECT EMP_ID FROM EMPLOYEE;
Output

EMP_ID

1

2

3

4

5

To fetch the EMP_NAME and SALARY, use the following query:


SELECT EMP_NAME, SALARY FROM EMPLOYEE;
Output

EMP_NAME SALARY

Kristen 150000

Russell 200000

Angelina 600000

Robert 350000

Christian 260000

To fetch all the fields from the EMPLOYEE table, use the following query:
SELECT * FROM EMPLOYEE;
Output

EMP_ID EMP_NAME CITY PHONE_NO SALARY

1 Kristen Chicago 9737287378 150000

2 Russell Austin 9262738271 200000

3 Angelina Denver 9232673822 600000

4 Robert Washington 9367238263 350000

5 Christian Los angels 7253847382 260000
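
The three SELECT queries above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch using an in-memory database. Storing PHONE_NO as TEXT is an assumption of the sketch, and ORDER BY EMP_ID is added because SQL does not guarantee any row order without it.

```python
import sqlite3

ROWS = [
    (1, "Kristen", "Chicago", "9737287378", 150000),
    (2, "Russell", "Austin", "9262738271", 200000),
    (3, "Angelina", "Denver", "9232673822", 600000),
    (4, "Robert", "Washington", "9367238263", 350000),
    (5, "Christian", "Los angels", "7253847382", 260000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (EMP_ID INTEGER PRIMARY KEY, EMP_NAME TEXT, "
    "CITY TEXT, PHONE_NO TEXT, SALARY INTEGER)"
)
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?, ?)", ROWS)

# One column: each result row is a one-element tuple.
ids = [r[0] for r in conn.execute("SELECT EMP_ID FROM EMPLOYEE ORDER BY EMP_ID")]

# Two columns.
name_salary = conn.execute(
    "SELECT EMP_NAME, SALARY FROM EMPLOYEE ORDER BY EMP_ID"
).fetchall()

# All columns.
everything = conn.execute("SELECT * FROM EMPLOYEE ORDER BY EMP_ID").fetchall()

print(ids)
print(name_salary)
```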

SQL INSERT Statement


The SQL INSERT statement is used to insert one or more rows into a table. In SQL, you can insert the data
in two ways:
Without specifying column name
By specifying column name
Sample Table
EMPLOYEE

EMP_ID EMP_NAME CITY SALARY AGE


1 Angelina Chicago 200000 30

2 Robert Austin 300000 26

3 Christian Denver 100000 42

4 Kristen Washington 500000 29

5 Russell Los angels 200000 36

1. Without specifying column name


If you supply values for all the columns, you can either list or omit the column names.
Syntax
INSERT INTO TABLE_NAME
VALUES (value1, value2, value 3, .... Value N);
Query
INSERT INTO EMPLOYEE VALUES (6, 'Marry', 'Canada', 600000, 48);
Output: After executing this query, the EMPLOYEE table will look like:

EMP_ID EMP_NAME CITY SALARY AGE

1 Angelina Chicago 200000 30

2 Robert Austin 300000 26

3 Christian Denver 100000 42

4 Kristen Washington 500000 29

5 Russell Los angels 200000 36

6 Marry Canada 600000 48

2. By specifying column name


To insert values for only some of the columns, you must specify the column names.
Syntax
INSERT INTO TABLE_NAME
(col1, col2, col3, .... col N)
VALUES (value1, value2, value 3, .... Value N);
Query
INSERT INTO EMPLOYEE (EMP_ID, EMP_NAME, AGE) VALUES (7, 'Jack', 40);
Output: After executing this query, the table will look like:

EMP_ID EMP_NAME CITY SALARY AGE

1 Angelina Chicago 200000 30

2 Robert Austin 300000 26

3 Christian Denver 100000 42

4 Kristen Washington 500000 29

5 Russell Los angels 200000 36

6 Marry Canada 600000 48

7 Jack null null 40
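
Both forms of INSERT can be checked with Python's built-in sqlite3 module. The sketch below is an illustration using an in-memory database; it starts from an empty EMPLOYEE table and inserts only the two new rows from the examples above. Columns that are not listed in the second form receive NULL, which Python shows as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (EMP_ID INTEGER PRIMARY KEY, EMP_NAME TEXT, "
    "CITY TEXT, SALARY INTEGER, AGE INTEGER)"
)

# 1. Without column names: one value for every column, in table order.
conn.execute("INSERT INTO EMPLOYEE VALUES (6, 'Marry', 'Canada', 600000, 48)")

# 2. With column names: the unlisted columns CITY and SALARY become NULL.
conn.execute("INSERT INTO EMPLOYEE (EMP_ID, EMP_NAME, AGE) VALUES (7, 'Jack', 40)")

row6 = conn.execute("SELECT * FROM EMPLOYEE WHERE EMP_ID = 6").fetchone()
row7 = conn.execute("SELECT * FROM EMPLOYEE WHERE EMP_ID = 7").fetchone()

print(row6)
print(row7)
```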

SQL Update Statement


The SQL UPDATE statement is used to modify the data that is already in the database. The condition in the
WHERE clause decides which rows are updated.
Syntax
UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;
Sample Table
EMPLOYEE

EMP_ID EMP_NAME CITY SALARY AGE

1 Angelina Chicago 200000 30

2 Robert Austin 300000 26

3 Christian Denver 100000 42

4 Kristen Washington 500000 29

5 Russell Los angels 200000 36

6 Marry Canada 600000 48

Updating single record


Update the column EMP_NAME and set the value to 'Emma' in the row where SALARY is 500000.
Syntax
UPDATE table_name
SET column_name = value
WHERE condition;
Query
UPDATE EMPLOYEE
SET EMP_NAME = 'Emma'
WHERE SALARY = 500000;
Output: After executing this query, the EMPLOYEE table will look like:

EMP_ID EMP_NAME CITY SALARY AGE

1 Angelina Chicago 200000 30

2 Robert Austin 300000 26

3 Christian Denver 100000 42

4 Emma Washington 500000 29

5 Russell Los angels 200000 36

6 Marry Canada 600000 48

Updating multiple fields


If you want to update multiple columns, you should separate each field assigned with a comma. In the
EMPLOYEE table, update the column EMP_NAME to 'Kevin' and CITY to 'Boston' where EMP_ID is 5.
Syntax
UPDATE table_name
SET column_name = value1, column_name2 = value2
WHERE condition;
Query
UPDATE EMPLOYEE
SET EMP_NAME = 'Kevin', CITY = 'Boston'
WHERE EMP_ID = 5;
Output

EMP_ID EMP_NAME CITY SALARY AGE

1 Angelina Chicago 200000 30


2 Robert Austin 300000 26

3 Christian Denver 100000 42

4 Kristen Washington 500000 29

5 Kevin Boston 200000 36

6 Marry Canada 600000 48

Without use of WHERE clause


If you want to update all rows of a table, omit the WHERE clause. Be careful: without a WHERE clause, every row is changed. In the EMPLOYEE
table, set the column EMP_NAME to 'Harry' in every row.
Syntax
UPDATE table_name
SET column_name = value1;
Query
UPDATE EMPLOYEE
SET EMP_NAME = 'Harry';
Output

EMP_ID EMP_NAME CITY SALARY AGE

1 Harry Chicago 200000 30

2 Harry Austin 300000 26

3 Harry Denver 100000 42

4 Harry Washington 500000 29

5 Harry Los angels 200000 36

6 Harry Canada 600000 48
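The difference between an UPDATE with and without a WHERE clause can be checked with a short sketch. It uses Python's sqlite3 module and a reduced three-column EMPLOYEE table, which is an assumption made to keep the example small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced version of the EMPLOYEE table (only the columns needed here).
conn.execute("CREATE TABLE EMPLOYEE (EMP_ID INTEGER, EMP_NAME TEXT, SALARY INTEGER)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [(4, "Kristen", 500000), (5, "Russell", 200000), (6, "Marry", 600000)],
)

# With WHERE: only the rows that match the condition are changed.
targeted = conn.execute(
    "UPDATE EMPLOYEE SET EMP_NAME = 'Emma' WHERE SALARY = 500000"
).rowcount

# Without WHERE: every row in the table is changed.
blanket = conn.execute("UPDATE EMPLOYEE SET EMP_NAME = 'Harry'").rowcount

names = [row[0] for row in conn.execute("SELECT EMP_NAME FROM EMPLOYEE")]
print(targeted, blanket, names)
```

The rowcount attribute reports how many rows each statement touched, which is a quick way to notice a forgotten WHERE clause.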

SQL DELETE Statement


The SQL DELETE statement is used to delete rows from a table. Generally, the DELETE statement removes one or
more records from a table.
Syntax
DELETE FROM table_name WHERE some_condition;
Sample Table
EMPLOYEE

EMP_ID EMP_NAME CITY SALARY AGE

1 Angelina Chicago 200000 30

2 Robert Austin 300000 26

3 Christian Denver 100000 42

4 Kristen Washington 500000 29

5 Russell Los angels 200000 36

6 Marry Canada 600000 48

Deleting Single Record


Delete the row from the table EMPLOYEE where EMP_NAME = 'Kristen'. This will delete only the fourth row.
Query
DELETE FROM EMPLOYEE
WHERE EMP_NAME = 'Kristen';
Output: After executing this query, the EMPLOYEE table will look like:

EMP_ID EMP_NAME CITY SALARY AGE

1 Angelina Chicago 200000 30

2 Robert Austin 300000 26

3 Christian Denver 100000 42

5 Russell Los angels 200000 36

6 Marry Canada 600000 48

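Deleting Multiple Records


Delete the rows from the EMPLOYEE table where AGE is 30. DELETE removes every row that matches the condition. In the
sample table only the first row (Angelina) has AGE 30, so one row is removed here.
Query
DELETE FROM EMPLOYEE WHERE AGE = 30;
Output: After executing this query (following the previous deletion of Kristen), the EMPLOYEE table will look like: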
EMP_ID EMP_NAME CITY SALARY AGE

2 Robert Austin 300000 26

3 Christian Denver 100000 42

5 Russell Los angels 200000 36

6 Marry Canada 600000 48

Delete all of the records


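Delete all the rows from the EMPLOYEE table. After this, no records are left to display. The EMPLOYEE table
becomes empty, but the table itself still exists.
Syntax
DELETE FROM table_name;
The form DELETE * FROM table_name; that appears in some tutorials is not standard SQL and is rejected by most database systems.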
Query
DELETE FROM EMPLOYEE;
Output: After executing this query, the EMPLOYEE table will look like:

EMP_ID EMP_NAME CITY SALARY AGE
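The three DELETE cases above can be reproduced with a small sketch in Python's sqlite3 module. The table is cut down to three columns and four rows, and remaining_ids is a helper introduced only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EMP_ID INTEGER, EMP_NAME TEXT, AGE INTEGER)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [(1, "Angelina", 30), (2, "Robert", 26), (3, "Christian", 42), (4, "Kristen", 29)],
)

def remaining_ids():
    """Return the EMP_ID values still present in the table."""
    return [row[0] for row in conn.execute("SELECT EMP_ID FROM EMPLOYEE ORDER BY EMP_ID")]

conn.execute("DELETE FROM EMPLOYEE WHERE EMP_NAME = 'Kristen'")
after_single = remaining_ids()     # row 4 removed

conn.execute("DELETE FROM EMPLOYEE WHERE AGE = 30")
after_condition = remaining_ids()  # only Angelina has AGE 30

conn.execute("DELETE FROM EMPLOYEE")
after_all = remaining_ids()        # the table still exists, but is empty
print(after_single, after_condition, after_all)
```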

Views in SQL
A view in SQL is a virtual table defined by a stored query. Like a table, a view contains rows and columns, but an ordinary view does not store the data itself.
To create the view, we can select the fields from one or more tables present in the database.
A view can either have specific rows based on certain condition or all the rows of a table.
Sample table:
Student_Detail

STU_ID NAME ADDRESS

1 Stephan Delhi

2 Kathrin Noida

3 David Ghaziabad

4 Alina Gurugram

Student_Marks
STU_ID NAME MARKS AGE

1 Stephan 97 19

2 Kathrin 86 21

3 David 74 18

4 Alina 90 20

5 John 96 18

1. Creating view
A view can be created using the CREATE VIEW statement. We can create a view from a single table or multiple
tables.
Syntax:
CREATE VIEW view_name AS
SELECT column1, column2.....
FROM table_name
WHERE condition;
2. Creating View from a single table
In this example, we create a View named DetailsView from the table Student_Detail.
Query:
CREATE VIEW DetailsView AS
SELECT NAME, ADDRESS
FROM Student_Detail
WHERE STU_ID < 4;
Just like a table, the view can be queried to see its data.
SELECT * FROM DetailsView;
Output:

NAME ADDRESS

Stephan Delhi

Kathrin Noida

David Ghaziabad

3. Creating View from multiple tables


A view over multiple tables can be created by simply including multiple tables in the SELECT statement.
In the given example, a view named MarksView is created from two tables, Student_Detail and Student_Marks.
Query:
CREATE VIEW MarksView AS
SELECT Student_Detail.NAME, Student_Detail.ADDRESS, Student_Marks.MARKS
FROM Student_Detail, Student_Marks
WHERE Student_Detail.NAME = Student_Marks.NAME;
To display data of View MarksView:
SELECT * FROM MarksView;

NAME ADDRESS MARKS

Stephan Delhi 97

Kathrin Noida 86

David Ghaziabad 74

Alina Gurugram 90

4. Deleting View
A view can be deleted using the Drop View statement.
Syntax
DROP VIEW view_name;
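The view life cycle (create, query, drop) can be sketched with Python's sqlite3 module. The column types and the extra student row ('Maya', 'Pune') are assumptions added for illustration; they are not part of the sample tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student_Detail (STU_ID INTEGER, NAME TEXT, ADDRESS TEXT);
    INSERT INTO Student_Detail VALUES
        (1, 'Stephan', 'Delhi'), (2, 'Kathrin', 'Noida'),
        (3, 'David', 'Ghaziabad'), (4, 'Alina', 'Gurugram');

    CREATE VIEW DetailsView AS
    SELECT NAME, ADDRESS FROM Student_Detail WHERE STU_ID < 4;
""")

view_rows = conn.execute("SELECT * FROM DetailsView ORDER BY NAME").fetchall()

# A view stores the query, not the data: a new base-table row that satisfies
# the view's condition shows up the next time the view is queried.
conn.execute("INSERT INTO Student_Detail VALUES (0, 'Maya', 'Pune')")  # hypothetical row
names_after_insert = [r[0] for r in conn.execute("SELECT NAME FROM DetailsView ORDER BY NAME")]

# Dropping the view removes only the stored query; Student_Detail is untouched.
conn.execute("DROP VIEW DetailsView")
base_count = conn.execute("SELECT COUNT(*) FROM Student_Detail").fetchone()[0]
print(view_rows, names_after_insert, base_count)
```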
SQL Index
Indexes are special lookup structures that the database uses to retrieve data quickly.
An index speeds up SELECT queries and WHERE clauses, but it slows down data modification with INSERT and
UPDATE statements, because the index has to be maintained as well. Indexes can be created or dropped without affecting the data.
An index in a database is just like the index at the back of a book.
For example: to find all the pages in a book that discuss a certain topic, you first look up the topic in the
index, which lists all the topics alphabetically, and then go to the page numbers listed there.
1. Create Index statement
It is used to create an index on a table. It allows duplicate values.
Syntax
CREATE INDEX index_name
ON table_name (column1, column2, ...);
Example
CREATE INDEX idx_name
ON Persons (LastName, FirstName);
2. Unique Index statement
It is used to create a unique index on a table. It does not allow duplicate values.
Syntax
CREATE UNIQUE INDEX index_name
ON table_name (column1, column2, ...);
Example
CREATE UNIQUE INDEX websites_idx
ON websites (site_name);
3. Drop Index Statement
It is used to delete an index from a table. The exact syntax varies between database systems; some also require the table name.
Syntax
DROP INDEX index_name;
Example
DROP INDEX websites_idx;
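The effect of a unique index can be seen in a minimal sketch with Python's sqlite3 module. The websites table layout, the visits column, and the site name used here are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE websites (site_name TEXT, visits INTEGER)")  # assumed layout
conn.execute("CREATE UNIQUE INDEX websites_idx ON websites (site_name)")
conn.execute("INSERT INTO websites VALUES ('example.org', 10)")

# While the unique index exists, a second row with the same site_name is rejected.
try:
    conn.execute("INSERT INTO websites VALUES ('example.org', 20)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Dropping the index leaves the data intact and lifts the uniqueness rule.
conn.execute("DROP INDEX websites_idx")
conn.execute("INSERT INTO websites VALUES ('example.org', 20)")
row_count = conn.execute("SELECT COUNT(*) FROM websites").fetchone()[0]
print(duplicate_rejected, row_count)
```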
SQL Sub Query
A subquery is a query nested inside another SQL query, most often within the WHERE clause.
Important rules:
A subquery can be placed in a number of SQL clauses, such as the WHERE clause, FROM clause, and HAVING clause.
You can use a subquery with SELECT, UPDATE, INSERT, and DELETE statements along with operators like =, <,
>, >=, <=, IN, BETWEEN, etc.
The outer query is known as the main query, and the inner query is
known as the subquery.
In a comparison, the subquery usually appears on the right side of the comparison operator.
A subquery is enclosed in parentheses.
An ORDER BY in a subquery generally has no effect on the final result, and some database systems reject it; sort in the main query instead. GROUP BY can be used in a subquery, but it groups rows and is not a substitute for ORDER BY.
1. Subqueries with the Select Statement
SQL subqueries are most frequently used with the Select statement.
Syntax
SELECT column_name
FROM table_name
WHERE column_name expression operator
( SELECT column_name from table_name WHERE ... );
Example
Consider the EMPLOYEE table have the following records:

ID NAME AGE ADDRESS SALARY

1 John 20 US 2000.00
2 Stephan 26 Dubai 1500.00

3 David 27 Bangkok 2000.00

4 Alina 29 UK 6500.00

5 Kathrin 34 Bangalore 8500.00

6 Harry 42 China 4500.00

7 Jackson 25 Mizoram 10000.00

The subquery with a SELECT statement will be:


SELECT *
FROM EMPLOYEE
WHERE ID IN (SELECT ID
FROM EMPLOYEE
WHERE SALARY > 4500);
This would produce the following result:

ID NAME AGE ADDRESS SALARY

4 Alina 29 UK 6500.00

5 Kathrin 34 Bangalore 8500.00

7 Jackson 25 Mizoram 10000.00
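The subquery above can be run as a sketch with Python's sqlite3 module. The second query, which compares SALARY against a single value returned by the subquery (the average salary), is an extra illustration that does not appear in the tutorial; the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (ID INTEGER, NAME TEXT, AGE INTEGER, ADDRESS TEXT, SALARY REAL)"
)
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?, ?)", [
    (1, "John", 20, "US", 2000.00), (2, "Stephan", 26, "Dubai", 1500.00),
    (3, "David", 27, "Bangkok", 2000.00), (4, "Alina", 29, "UK", 6500.00),
    (5, "Kathrin", 34, "Bangalore", 8500.00), (6, "Harry", 42, "China", 4500.00),
    (7, "Jackson", 25, "Mizoram", 10000.00),
])

# Subquery with IN: the inner query returns a list of IDs for the outer query.
in_subquery = conn.execute(
    "SELECT NAME FROM EMPLOYEE "
    "WHERE ID IN (SELECT ID FROM EMPLOYEE WHERE SALARY > 4500) ORDER BY ID"
).fetchall()

# Subquery with a comparison operator: the inner query returns a single value.
above_average = conn.execute(
    "SELECT NAME FROM EMPLOYEE "
    "WHERE SALARY > (SELECT AVG(SALARY) FROM EMPLOYEE) ORDER BY ID"
).fetchall()
print(in_subquery, above_average)
```

With this data the average salary is 5000, so both queries happen to select the same three employees.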

2. Subqueries with the INSERT Statement


A SQL subquery can also be used with the INSERT statement. In the INSERT statement, the data returned from the
subquery is inserted into another table.
In the subquery, the selected data can be modified with any of the character, date, or number functions.
Syntax:
INSERT INTO table_name (column1, column2, column3....)
SELECT column1, column2, column3....
FROM source_table
WHERE condition;
Example
Consider a table EMPLOYEE_BKP with the same structure as EMPLOYEE.
Now use the following query to copy the complete EMPLOYEE table into the EMPLOYEE_BKP table.
INSERT INTO EMPLOYEE_BKP
SELECT * FROM EMPLOYEE
WHERE ID IN (SELECT ID
FROM EMPLOYEE);
3. Subqueries with the UPDATE Statement
The subquery of SQL can be used in conjunction with the Update statement. When a subquery is used with the
Update statement, then either single or multiple columns in a table can be updated.
Syntax
UPDATE table
SET column_name = new_value
WHERE VALUE OPERATOR
(SELECT COLUMN_NAME
FROM TABLE_NAME
WHERE condition);
Example
Let's assume we have an EMPLOYEE_BKP table available which is a backup of the EMPLOYEE table. The given
example multiplies the SALARY by 0.25 in the EMPLOYEE table for all employees whose AGE is greater than
or equal to 29.
UPDATE EMPLOYEE
SET SALARY = SALARY * 0.25
WHERE AGE IN (SELECT AGE FROM EMPLOYEE_BKP
WHERE AGE >= 29);
This would impact three rows, and finally, the EMPLOYEE table would have the following records.

ID NAME AGE ADDRESS SALARY

1 John 20 US 2000.00

2 Stephan 26 Dubai 1500.00

3 David 27 Bangkok 2000.00

4 Alina 29 UK 1625.00

5 Kathrin 34 Bangalore 2125.00

6 Harry 42 China 1125.00

7 Jackson 25 Mizoram 10000.00

4. Subqueries with the DELETE Statement


The subquery of SQL can be used in conjunction with the Delete statement just like any other statements
mentioned above.
Syntax
DELETE FROM TABLE_NAME
WHERE VALUE OPERATOR
(SELECT COLUMN_NAME
FROM TABLE_NAME
WHERE condition);
Example
Let's assume we have an EMPLOYEE_BKP table available which is backup of EMPLOYEE table. The given
example deletes the records from the EMPLOYEE table for all EMPLOYEE whose AGE is greater than or equal
to 29.
DELETE FROM EMPLOYEE
WHERE AGE IN (SELECT AGE FROM EMPLOYEE_BKP
WHERE AGE >= 29 );
This would impact three rows, and finally, the EMPLOYEE table would have the following records.

ID NAME AGE ADDRESS SALARY

1 John 20 US 2000.00

2 Stephan 26 Dubai 1500.00

3 David 27 Bangkok 2000.00

7 Jackson 25 Mizoram 10000.00

SQL Clauses
The following are the various SQL clauses:

1. GROUP BY
SQL GROUP BY statement is used to arrange identical data into groups. The GROUP BY statement is used with
the SQL SELECT statement.
The GROUP BY statement follows the WHERE clause in a SELECT statement and precedes the ORDER BY
clause.
The GROUP BY statement is used with aggregation function.
Syntax
SELECT column
FROM table_name
WHERE conditions
GROUP BY column
ORDER BY column
Sample table:
PRODUCT_MAST

PRODUCT COMPANY QTY RATE COST

Item1 Com1 2 10 20

Item2 Com2 3 25 75

Item3 Com1 2 30 60

Item4 Com3 5 10 50

Item5 Com2 2 20 40

Item6 Com1 3 25 75

Item7 Com1 5 30 150

Item8 Com1 3 10 30

Item9 Com2 2 25 50

Item10 Com3 4 30 120

Example:
SELECT COMPANY, COUNT(*)
FROM PRODUCT_MAST
GROUP BY COMPANY;
Output:
Com1 5
Com2 3
Com3 2
2. HAVING
HAVING clause is used to specify a search condition for a group or an aggregate.
HAVING is used together with the GROUP BY clause: WHERE filters individual rows before grouping, while HAVING filters the groups after aggregation. If GROUP BY is omitted, HAVING treats the whole
result as a single group.
Syntax:
SELECT column1, column2
FROM table_name
WHERE conditions
GROUP BY column1, column2
HAVING conditions
ORDER BY column1, column2;
Example:
SELECT COMPANY, COUNT(*)
FROM PRODUCT_MAST
GROUP BY COMPANY
HAVING COUNT(*)>2;
Output:
Com1 5
Com2 3
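The order in which WHERE, GROUP BY, and HAVING are applied can be checked with a sketch in Python's sqlite3 module. The column types are assumptions, Item6 is entered as Com1, and the second query (with a WHERE clause on RATE) is an extra illustration not taken from the tutorial.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PRODUCT_MAST (PRODUCT TEXT, COMPANY TEXT, QTY INTEGER, RATE INTEGER, COST INTEGER)"
)
conn.executemany("INSERT INTO PRODUCT_MAST VALUES (?, ?, ?, ?, ?)", [
    ("Item1", "Com1", 2, 10, 20), ("Item2", "Com2", 3, 25, 75),
    ("Item3", "Com1", 2, 30, 60), ("Item4", "Com3", 5, 10, 50),
    ("Item5", "Com2", 2, 20, 40), ("Item6", "Com1", 3, 25, 75),
    ("Item7", "Com1", 5, 30, 150), ("Item8", "Com1", 3, 10, 30),
    ("Item9", "Com2", 2, 25, 50), ("Item10", "Com3", 4, 30, 120),
])

# GROUP BY + HAVING: keep only the companies with more than two products.
having_only = conn.execute(
    "SELECT COMPANY, COUNT(*) FROM PRODUCT_MAST "
    "GROUP BY COMPANY HAVING COUNT(*) > 2 ORDER BY COMPANY"
).fetchall()

# WHERE runs first (row filter), then GROUP BY, then HAVING (group filter).
where_then_having = conn.execute(
    "SELECT COMPANY, COUNT(*) FROM PRODUCT_MAST WHERE RATE >= 20 "
    "GROUP BY COMPANY HAVING COUNT(*) > 2 ORDER BY COMPANY"
).fetchall()
print(having_only, where_then_having)
```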
3. ORDER BY
The ORDER BY clause sorts the result-set in ascending or descending order.
It sorts the records in ascending order by default. DESC keyword is used to sort the records in descending order.
Syntax:
SELECT column1, column2
FROM table_name
WHERE condition
ORDER BY column1, column2... ASC|DESC;
Where
ASC: It is used to sort the result set in ascending order by expression.
DESC: It sorts the result set in descending order by expression.
Example: Sorting Results in Ascending Order
Table:
CUSTOMER

CUSTOMER_ID NAME ADDRESS


12 Kathrin US

23 David Bangkok

34 Alina Dubai

45 John UK

56 Harry US

Enter the following SQL statement:


SELECT *
FROM CUSTOMER
ORDER BY NAME;
Output:

CUSTOMER_ID NAME ADDRESS

34 Alina Dubai

23 David Bangkok

56 Harry US

45 John UK

12 Kathrin US

Example: Sorting Results in Descending Order


Using the above CUSTOMER table
SELECT *
FROM CUSTOMER
ORDER BY NAME DESC;
Output:

CUSTOMER_ID NAME ADDRESS

12 Kathrin US

45 John UK
56 Harry US

23 David Bangkok

34 Alina Dubai

SQL Aggregate Functions


SQL aggregation function is used to perform the calculations on multiple rows of a single column of a table. It
returns a single value.
It is also used to summarize the data.
Types of SQL Aggregation Function

1. COUNT FUNCTION
COUNT function is used to Count the number of rows in a database table. It can work on both numeric and non-
numeric data types.
COUNT function uses the COUNT(*) that returns the count of all the rows in a specified table. COUNT(*)
considers duplicate and Null.
Syntax
COUNT(*)
or
COUNT( [ALL|DISTINCT] expression )
Sample table:
PRODUCT_MAST

PRODUCT COMPANY QTY RATE COST

Item1 Com1 2 10 20

Item2 Com2 3 25 75
Item3 Com1 2 30 60

Item4 Com3 5 10 50

Item5 Com2 2 20 40

Item6 Com1 3 25 75

Item7 Com1 5 30 150

Item8 Com1 3 10 30

Item9 Com2 2 25 50

Item10 Com3 4 30 120

Example: COUNT()
SELECT COUNT(*)
FROM PRODUCT_MAST;
Output:
10
Example: COUNT with WHERE
SELECT COUNT(*)
FROM PRODUCT_MAST
WHERE RATE>=20;
Output:
7
Example: COUNT() with DISTINCT
SELECT COUNT(DISTINCT COMPANY)
FROM PRODUCT_MAST;
Output:
3
Example: COUNT() with GROUP BY
SELECT COMPANY, COUNT(*)
FROM PRODUCT_MAST
GROUP BY COMPANY;
Output:
Com1 5
Com2 3
Com3 2
Example: COUNT() with HAVING
SELECT COMPANY, COUNT(*)
FROM PRODUCT_MAST
GROUP BY COMPANY
HAVING COUNT(*)>2;
Output:
Com1 5
Com2 3
2. SUM Function
The SUM function is used to calculate the sum of the values in a selected column. It works on numeric fields only.
Syntax
SUM( [ALL|DISTINCT] expression )
Example: SUM()
SELECT SUM(COST)
FROM PRODUCT_MAST;
Output:
670
Example: SUM() with WHERE
SELECT SUM(COST)
FROM PRODUCT_MAST
WHERE QTY>3;
Output:
320
Example: SUM() with GROUP BY
SELECT COMPANY, SUM(COST)
FROM PRODUCT_MAST
WHERE QTY>3
GROUP BY COMPANY;
Output:
Com1 150
Com3 170
Example: SUM() with HAVING
SELECT COMPANY, SUM(COST)
FROM PRODUCT_MAST
GROUP BY COMPANY
HAVING SUM(COST)>=170;
Output:
Com1 335
Com3 170
3. AVG function
The AVG function is used to calculate the average value of the numeric type. AVG function returns the average
of all non-Null values.
Syntax
AVG( [ALL|DISTINCT] expression )
Example:
SELECT AVG(COST)
FROM PRODUCT_MAST;
Output:
67.00
4. MAX Function
MAX function is used to find the maximum value of a certain column. This function determines the largest value
of all selected values of a column.
Syntax
MAX( [ALL|DISTINCT] expression )
Example:
SELECT MAX(RATE)
FROM PRODUCT_MAST;
Output:
30
5. MIN Function
MIN function is used to find the minimum value of a certain column. This function determines the smallest value
of all selected values of a column.
Syntax
MIN( [ALL|DISTINCT] expression )
Example:
SELECT MIN(RATE)
FROM PRODUCT_MAST;
Output:
10
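The aggregate results quoted in this section can be recomputed in one pass with a sketch using Python's sqlite3 module. Only the columns needed for the aggregates are created, the column types are assumptions, and Item6 is entered as Com1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCT_MAST (COMPANY TEXT, QTY INTEGER, RATE INTEGER, COST INTEGER)")
conn.executemany("INSERT INTO PRODUCT_MAST VALUES (?, ?, ?, ?)", [
    ("Com1", 2, 10, 20), ("Com2", 3, 25, 75), ("Com1", 2, 30, 60),
    ("Com3", 5, 10, 50), ("Com2", 2, 20, 40), ("Com1", 3, 25, 75),
    ("Com1", 5, 30, 150), ("Com1", 3, 10, 30), ("Com2", 2, 25, 50),
    ("Com3", 4, 30, 120),
])

# Each aggregate collapses all selected rows into a single value.
totals = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT COMPANY), SUM(COST), AVG(COST), MAX(RATE), MIN(RATE) "
    "FROM PRODUCT_MAST"
).fetchone()

# Combined with GROUP BY, the aggregate is computed once per group.
sum_by_company = conn.execute(
    "SELECT COMPANY, SUM(COST) FROM PRODUCT_MAST "
    "WHERE QTY > 3 GROUP BY COMPANY ORDER BY COMPANY"
).fetchall()
print(totals, sum_by_company)
```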
SQL JOIN
As the name shows, JOIN means to combine something. In case of SQL, JOIN means "to combine two or more
tables".
In SQL, JOIN clause is used to combine the records from two or more tables in a database.
Types of SQL JOIN
INNER JOIN
LEFT JOIN
RIGHT JOIN
FULL JOIN
Sample Table
EMPLOYEE

EMP_ID EMP_NAME CITY SALARY AGE

1 Angelina Chicago 200000 30

2 Robert Austin 300000 26

3 Christian Denver 100000 42

4 Kristen Washington 500000 29

5 Russell Los angels 200000 36

6 Marry Canada 600000 48

PROJECT

PROJECT_NO EMP_ID DEPARTMENT

101 1 Testing
102 2 Development

103 3 Designing

104 4 Development

1. INNER JOIN
In SQL, INNER JOIN selects records that have matching values in both tables as long as the condition is
satisfied. It returns the combination of all rows from both the tables where the condition satisfies.
Syntax
SELECT table1.column1, table1.column2, table2.column1,....
FROM table1
INNER JOIN table2
ON table1.matching_column = table2.matching_column;
Query
SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT
FROM EMPLOYEE
INNER JOIN PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;
Output

EMP_NAME DEPARTMENT

Angelina Testing

Robert Development

Christian Designing

Kristen Development

2. LEFT JOIN
The SQL left join returns all the values from left table and the matching values from the right table. If there is no
matching join value, it will return NULL.
Syntax
SELECT table1.column1, table1.column2, table2.column1,....
FROM table1
LEFT JOIN table2
ON table1.matching_column = table2.matching_column;
Query
SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT
FROM EMPLOYEE
LEFT JOIN PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;
Output

EMP_NAME DEPARTMENT

Angelina Testing

Robert Development

Christian Designing

Kristen Development

Russell NULL

Marry NULL

3. RIGHT JOIN
In SQL, RIGHT JOIN returns all the rows of the right table and the matched values
from the left table. Where a right-table row has no match in the left table, the left-table columns are returned as NULL.
Syntax
SELECT table1.column1, table1.column2, table2.column1,....
FROM table1
RIGHT JOIN table2
ON table1.matching_column = table2.matching_column;
Query
SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT
FROM EMPLOYEE
RIGHT JOIN PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;
Output

EMP_NAME DEPARTMENT

Angelina Testing

Robert Development

Christian Designing
Kristen Development

4. FULL JOIN
In SQL, FULL JOIN is the result of a combination of both left and right outer joins. The joined result has all the records
from both tables, with NULL in place of the values for which no match is found.
Syntax
SELECT table1.column1, table1.column2, table2.column1,....
FROM table1
FULL JOIN table2
ON table1.matching_column = table2.matching_column;
Query
SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT
FROM EMPLOYEE
FULL JOIN PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;
Output

EMP_NAME DEPARTMENT

Angelina Testing

Robert Development

Christian Designing

Kristen Development

Russell NULL

Marry NULL
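The INNER JOIN and LEFT JOIN results can be reproduced with a sketch in Python's sqlite3 module. Only those two join types are run here because RIGHT JOIN and FULL JOIN need SQLite 3.39 or later; the reduced table layouts and the JOIN_QUERY template are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (EMP_ID INTEGER, EMP_NAME TEXT);
    CREATE TABLE PROJECT (PROJECT_NO INTEGER, EMP_ID INTEGER, DEPARTMENT TEXT);
    INSERT INTO EMPLOYEE VALUES (1, 'Angelina'), (2, 'Robert'), (3, 'Christian'),
                                (4, 'Kristen'), (5, 'Russell'), (6, 'Marry');
    INSERT INTO PROJECT VALUES (101, 1, 'Testing'), (102, 2, 'Development'),
                               (103, 3, 'Designing'), (104, 4, 'Development');
""")

# Same query for both joins; only the join keyword changes.
JOIN_QUERY = """
    SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT
    FROM EMPLOYEE {kind} JOIN PROJECT ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID
    ORDER BY EMPLOYEE.EMP_ID
"""
inner_rows = conn.execute(JOIN_QUERY.format(kind="INNER")).fetchall()
left_rows = conn.execute(JOIN_QUERY.format(kind="LEFT")).fetchall()
print(inner_rows)
print(left_rows)
```

Employees without a project (Russell and Marry) appear only in the LEFT JOIN result, with None standing in for NULL.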

SQL Set Operation


The SQL set operations are used to combine the results of two or more SQL SELECT statements.
Types of Set Operation
Union
Union All
Intersect
Minus
1. Union
The SQL Union operation is used to combine the result of two or more SQL SELECT queries.
In the union operation, the number of columns and their datatypes must be the same in both the queries on which the UNION
operation is being applied.
The union operation eliminates the duplicate rows from its resultset.
Syntax
SELECT column_name FROM table1
UNION
SELECT column_name FROM table2;
Example:
The First table

ID NAME

1 Jack

2 Harry

3 Jackson
The Second table

ID NAME

3 Jackson

4 Stephan

5 David

Union SQL query will be:


SELECT * FROM First
UNION
SELECT * FROM Second;
The resultset table will look like:

ID NAME

1 Jack

2 Harry

3 Jackson

4 Stephan

5 David

2. Union All
The Union All operation works like the Union operation, except that it returns all rows without removing duplicates or sorting the
data.
Syntax:
SELECT column_name FROM table1
UNION ALL
SELECT column_name FROM table2;
Example: Using the above First and Second table.
Union All query will be like:
SELECT * FROM First
UNION ALL
SELECT * FROM Second;
The resultset table will look like:
ID NAME

1 Jack

2 Harry

3 Jackson

3 Jackson

4 Stephan

5 David

3. Intersect
It is used to combine two SELECT statements. The Intersect operation returns the common rows from both the
SELECT statements.
In the Intersect operation, the number of columns and their datatypes must be the same.
The result has no duplicates. Some database systems return it in ascending order, but the order is only guaranteed when an ORDER BY clause is used.
Syntax
SELECT column_name FROM table1
INTERSECT
SELECT column_name FROM table2;
Example:
Using the above First and Second table.
Intersect query will be:
SELECT * FROM First
INTERSECT
SELECT * FROM Second;
The resultset table will look like:

ID NAME

3 Jackson

4. Minus
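It combines the result of two SELECT statements. The Minus operator is used to display the rows which are present
in the first query but absent in the second query. MINUS is the Oracle keyword; standard SQL and many other database systems call the same operation EXCEPT.
The result has no duplicates; as with Intersect, a particular order is only guaranteed when an ORDER BY clause is used.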
Syntax:
SELECT column_name FROM table1
MINUS
SELECT column_name FROM table2;
Example
Using the above First and Second table.
Minus query will be:
SELECT * FROM First
MINUS
SELECT * FROM Second;
The resultset table will look like:

ID NAME

1 Jack

2 Harry
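The four set operations can be compared side by side with a sketch in Python's sqlite3 module. SQLite spells the Minus operation as EXCEPT, so that keyword is used here; run_set_op is a helper introduced only for this illustration, and the table names are quoted to avoid any clash with SQL keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "First" (ID INTEGER, NAME TEXT);
    CREATE TABLE "Second" (ID INTEGER, NAME TEXT);
    INSERT INTO "First" VALUES (1, 'Jack'), (2, 'Harry'), (3, 'Jackson');
    INSERT INTO "Second" VALUES (3, 'Jackson'), (4, 'Stephan'), (5, 'David');
""")

def run_set_op(operator):
    """Combine the two tables with the given set operator, sorted by ID."""
    sql = f'SELECT * FROM "First" {operator} SELECT * FROM "Second" ORDER BY ID'
    return conn.execute(sql).fetchall()

union_rows = run_set_op("UNION")          # duplicates removed
union_all_rows = run_set_op("UNION ALL")  # duplicates kept
intersect_rows = run_set_op("INTERSECT")  # rows present in both tables
except_rows = run_set_op("EXCEPT")        # rows only in the first table (Minus)
print(union_rows, union_all_rows, intersect_rows, except_rows, sep="\n")
```

The explicit ORDER BY makes the row order deterministic instead of relying on whatever order the database happens to produce.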

SQL Cursors
SQL Cursors are covered in the SQL tutorial of javatpoint; refer to that tutorial for the details.
SQL Trigger
SQL Triggers are covered in the SQL tutorial of javatpoint; refer to that tutorial for the details.

Functional Dependency
A functional dependency is a relationship between two attributes (or sets of attributes) of a relation. It typically exists between the
primary key and a non-key attribute within a table.
X → Y
The left side of an FD is known as the determinant, and the right side is known as the dependent.
For example:
Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address.
Here Emp_Id attribute can uniquely identify the Emp_Name attribute of employee table because if we know the
Emp_Id, we can tell that employee name associated with it.
Functional dependency can be written as:
Emp_Id → Emp_Name
We can say that Emp_Name is functionally dependent on Emp_Id.
Types of Functional dependency
1. Trivial functional dependency
A → B has trivial functional dependency if B is a subset of A.
The following dependencies are also trivial like: A → A, B → B
Example:
Consider a table with two columns Employee_Id and Employee_Name.
{Employee_id, Employee_Name} → Employee_Id is a trivial functional dependency as
Employee_Id is a subset of {Employee_Id, Employee_Name}.
Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial dependencies too.
2. Non-trivial functional dependency
A → B is a non-trivial functional dependency if B is not a subset of A.
When the intersection of A and B is empty, A → B is called completely non-trivial.
Example:
ID → Name,
Name → DOB
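The definitions above can be turned into a small checker, sketched here in Python. The functions fd_holds and is_trivial and the sample rows are assumptions introduced for illustration. Note that sample data can only refute a functional dependency; whether a dependency really holds is a rule of the application domain.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True

def is_trivial(lhs, rhs):
    """An FD is trivial when the right side is a subset of the left side."""
    return set(rhs) <= set(lhs)

# Hypothetical sample rows for the employee table described above.
employees = [
    {"Emp_Id": 1, "Emp_Name": "Asha", "Emp_Address": "Delhi"},
    {"Emp_Id": 2, "Emp_Name": "Ravi", "Emp_Address": "Pune"},
    {"Emp_Id": 3, "Emp_Name": "Asha", "Emp_Address": "Noida"},
]

id_determines_name = fd_holds(employees, ["Emp_Id"], ["Emp_Name"])
name_determines_address = fd_holds(employees, ["Emp_Name"], ["Emp_Address"])
print(id_determines_name, name_determines_address)
```

Here two employees share the name Asha but live at different addresses, so Emp_Name → Emp_Address is refuted, while Emp_Id → Emp_Name is consistent with the data.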
Normalization
Normalization is the process of organizing the data in the database.
Normalization is used to minimize the redundancy from a relation or set of relations. It is also used to eliminate
the undesirable characteristics like Insertion, Update and Deletion Anomalies.
Normalization divides a larger table into smaller tables and links them using relationships.
Normal forms are used to reduce redundancy in database tables.
Types of Normal Forms
The following normal forms are commonly used:
Normal Form Description

1NF A relation is in 1NF if every attribute contains only atomic values.

2NF A relation will be in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key.

3NF A relation will be in 3NF if it is in 2NF and no transitive dependency exists.

4NF A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued dependency.

5NF A relation is in 5NF if it is in 4NF, does not contain any join dependency, and joining is lossless.

First Normal Form (1NF)


A relation is in 1NF if every attribute contains only atomic values.
It states that an attribute of a table cannot hold multiple values. It must hold only a single value.
First normal form disallows multi-valued attributes, composite attributes, and their combinations.
Example: Relation EMPLOYEE is not in 1NF because of the multi-valued attribute EMP_PHONE.
EMPLOYEE table:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE

14 John 7272826385, 9064738238 UP

20 Harry 8574783832 Bihar

12 Sam 7390372389, 8589830302 Punjab

The decomposition of the EMPLOYEE table into 1NF has been shown below:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE

14 John 7272826385 UP

14 John 9064738238 UP

20 Harry 8574783832 Bihar

12 Sam 7390372389 Punjab

12 Sam 8589830302 Punjab

Second Normal Form (2NF)


For 2NF, the relation must be in 1NF.
In the second normal form, all non-key attributes are fully functionally dependent on the primary key.
Example: Let's assume, a school can store the data of teachers and the subjects they teach. In a school, a
teacher can teach more than one subject.
TEACHER table

TEACHER_ID SUBJECT TEACHER_AGE

25 Chemistry 30

25 Biology 30

47 English 35

83 Math 38

83 Computer 38

In the given table, the candidate key is {TEACHER_ID, SUBJECT}. The non-prime attribute TEACHER_AGE depends on TEACHER_ID alone, which is a proper subset
of the candidate key. That's why the table violates the rule for 2NF.
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table:

TEACHER_ID TEACHER_AGE

25 30
47 35

83 38

TEACHER_SUBJECT table:

TEACHER_ID SUBJECT

25 Chemistry

25 Biology

47 English

83 Math

83 Computer

Third Normal Form (3NF)


A relation is in 3NF if it is in 2NF and contains no transitive dependency of a non-prime attribute on a candidate key.
3NF is used to reduce data duplication. It is also used to achieve data integrity.
If the relation is in 2NF and there is no transitive dependency for non-prime attributes, then the relation is in third normal form.
A relation is in third normal form if it holds at least one of the following conditions for every non-trivial functional
dependency X → Y.
X is a super key.
Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Example:
EMPLOYEE_DETAIL table:

EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY

222 Harry 201010 UP Noida

333 Stephan 02228 US Boston

444 Lan 60007 US Chicago

555 Katharine 06389 UK Norwich

666 John 462007 MP Bhopal

Super key in the table above:


{EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}....so on
Candidate key: {EMP_ID}
Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on EMP_ID. The non-prime
attributes (EMP_STATE, EMP_CITY) are therefore transitively dependent on the super key (EMP_ID). This violates the rule of third
normal form.
That's why we need to move EMP_CITY and EMP_STATE to a new EMPLOYEE_ZIP table, with
EMP_ZIP as the primary key.
EMPLOYEE table:

EMP_ID EMP_NAME EMP_ZIP

222 Harry 201010

333 Stephan 02228

444 Lan 60007

555 Katharine 06389

666 John 462007

EMPLOYEE_ZIP table:

EMP_ZIP EMP_STATE EMP_CITY

201010 UP Noida

02228 US Boston

60007 US Chicago

06389 UK Norwich

462007 MP Bhopal

Boyce Codd normal form (BCNF)


BCNF is an advanced version of 3NF. It is stricter than 3NF.
A table is in BCNF if, for every non-trivial functional dependency X → Y, X is a super key of the table.
For BCNF, the table should be in 3NF, and for every FD, the LHS must be a super key.
Example: Let's assume there is a company where employees work in more than one department.
EMPLOYEE table:

EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO


264 India Designing D394 283

264 India Testing D394 300

364 UK Stores D283 232

364 UK Developing D283 549

In the above table Functional dependencies are as follows:


EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate key: {EMP_ID, EMP_DEPT}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key, yet each appears on the left side of a functional dependency.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:

EMP_ID EMP_COUNTRY

264 India

364 UK

EMP_DEPT table:

EMP_DEPT DEPT_TYPE EMP_DEPT_NO

Designing D394 283

Testing D394 300

Stores D283 232

Developing D283 549

EMP_DEPT_MAPPING table:

EMP_ID EMP_DEPT

264 Designing

264 Testing

364 Stores

364 Developing

Functional dependencies:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
Now, this is in BCNF because the left side of both functional dependencies is a key.
Fourth normal form (4NF)
A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued dependency.
For a dependency A →→ B, if a single value of A is associated with multiple values of B, independently of the other attributes, then the dependency is a multi-valued dependency.
Example
STUDENT

STU_ID COURSE HOBBY

21 Computer Dancing

21 Math Singing

34 Chemistry Dancing

74 Biology Cricket

59 Physics Hockey

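The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent entities. Hence, there is
no relationship between COURSE and HOBBY.
In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and Math, and two
hobbies, Dancing and Singing. So there is a multi-valued dependency on STU_ID. Storing independent facts in one table means that, to be complete, every combination of a student's courses and hobbies has to be recorded (the sample shows only two of the four combinations for student 21), which leads to unnecessary
repetition of data.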
So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE

STU_ID COURSE

21 Computer

21 Math
34 Chemistry

74 Biology

59 Physics

STUDENT_HOBBY

STU_ID HOBBY

21 Dancing

21 Singing

34 Dancing

74 Cricket

59 Hockey

Fifth normal form (5NF)


A relation is in 5NF if it is in 4NF and contains no join dependency other than those implied by its candidate keys; every decomposition must also be lossless.
5NF is satisfied when the tables have been broken into as many tables as possible, without loss of information, in order to avoid redundancy.
5NF is also known as Project-join normal form (PJ/NF).
Example

SUBJECT LECTURER SEMESTER

Computer Anshika Semester 1

Computer John Semester 1

Math John Semester 1

Math Akash Semester 2

Chemistry Praveen Semester 1

In the above table, John takes both the Computer and Math classes for Semester 1, but he doesn't take the Math class for
Semester 2. In this case, the combination of all three fields is required to identify valid data.
Suppose we add a new semester, Semester 3, but do not yet know the subject or who will be teaching it,
so we would have to leave Lecturer and Subject as NULL. But all three columns together act as the primary key, so we
can't leave the other two columns blank.
So to bring the above table into 5NF, we can decompose it into three relations P1, P2 & P3:
P1
SEMESTER SUBJECT

Semester 1 Computer

Semester 1 Math

Semester 1 Chemistry

Semester 2 Math

P2

SUBJECT LECTURER

Computer Anshika

Computer John

Math John

Math Akash

Chemistry Praveen

P3

SEMESTER LECTURER

Semester 1 Anshika

Semester 1 John

Semester 2 Akash

Semester 1 Praveen
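Whether the three projections really reproduce the original table can be verified with a short sketch in plain Python, using sets of tuples as relations. The variable names are assumptions introduced for this illustration.

```python
# Original relation as (SUBJECT, LECTURER, SEMESTER) tuples.
original = {
    ("Computer", "Anshika", "Semester 1"),
    ("Computer", "John", "Semester 1"),
    ("Math", "John", "Semester 1"),
    ("Math", "Akash", "Semester 2"),
    ("Chemistry", "Praveen", "Semester 1"),
}

# Projections; a set drops duplicate rows automatically.
p1 = {(sem, sub) for sub, lec, sem in original}  # SEMESTER, SUBJECT
p2 = {(sub, lec) for sub, lec, sem in original}  # SUBJECT, LECTURER
p3 = {(sem, lec) for sub, lec, sem in original}  # SEMESTER, LECTURER

# Joining only P1 and P2 on SUBJECT produces extra (spurious) rows.
two_way = {(sub, lec, sem) for sem, sub in p1 for sub2, lec in p2 if sub == sub2}
spurious = two_way - original

# Joining the result with P3 on (SEMESTER, LECTURER) removes them again.
three_way = {row for row in two_way if (row[2], row[1]) in p3}
print(len(two_way), sorted(spurious), three_way == original)
```

The two-way join contains seven rows, two of which were never in the original table; only the join of all three projections gives back exactly the original five rows.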

Relational Decomposition
When a relation in the relational model is not in an appropriate normal form, decomposition of the relation is
required.
In a database, decomposition breaks a table into multiple tables.
If a relation is not decomposed properly, it may lead to problems like loss of information.
Decomposition is used to eliminate some of the problems of bad design like anomalies, inconsistencies, and
redundancy.
Types of Decomposition

Lossless Decomposition
If no information is lost from the relation that is decomposed, then the decomposition is lossless.
A lossless decomposition guarantees that joining the decomposed relations yields the same relation that was
decomposed.
A decomposition is said to be lossless if the natural join of all the decomposed relations gives the original relation.
Example:
EMPLOYEE_DEPARTMENT table:

EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME

22 Denim 28 Mumbai 827 Sales

33 Alina 25 Delhi 438 Marketing

46 Stephan 30 Bangalore 869 Finance

52 Katherine 36 Mumbai 575 Production

60 Jack 40 Noida 678 Testing

The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT
EMPLOYEE table:

EMP_ID EMP_NAME EMP_AGE EMP_CITY

22 Denim 28 Mumbai

33 Alina 25 Delhi
46 Stephan 30 Bangalore

52 Katherine 36 Mumbai

60 Jack 40 Noida

DEPARTMENT table

DEPT_ID EMP_ID DEPT_NAME

827 22 Sales

438 33 Marketing

869 46 Finance

575 52 Production

678 60 Testing

Now, when these two relations are joined on the common column "EMP_ID", then the resultant relation will look
like:

Employee ⋈ Department

EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME

22 Denim 28 Mumbai 827 Sales

33 Alina 25 Delhi 438 Marketing

46 Stephan 30 Bangalore 869 Finance

52 Katherine 36 Mumbai 575 Production

60 Jack 40 Noida 678 Testing

Hence, the decomposition is Lossless join decomposition.


Dependency Preserving
Dependency preservation is an important constraint on the database.
In dependency preservation, every dependency must be satisfied by at least one decomposed table.
If a relation R is decomposed into relations R1 and R2, then each dependency of R must either be a part of R1 or
R2 or be derivable from the combination of the functional dependencies of R1 and R2.
For example, suppose there is a relation R(A, B, C, D) with the functional dependency set { A → BC }. The relation R
is decomposed into R1(ABC) and R2(AD), which is dependency preserving because the FD A → BC is a part of
relation R1(ABC).
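A minimal sketch of a dependency preservation test, assuming single-letter attribute names and FDs written as (left, right) string pairs. The function names closure and preserves are assumptions for this sketch. The test grows the left side of each FD using only what can be derived inside one decomposed relation at a time.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves(fds, decomposition):
    """True if every FD can be derived from the FDs that hold inside the decomposed relations."""
    for lhs, rhs in fds:
        reach = set(lhs)
        changed = True
        while changed:
            changed = False
            for rel in decomposition:
                attrs = set(rel)
                # Attributes derivable from what we already have, restricted to this relation.
                gained = (closure(reach & attrs, fds) & attrs) - reach
                if gained:
                    reach |= gained
                    changed = True
        if not set(rhs) <= reach:
            return False
    return True


# The example above: R(A, B, C, D), FD A -> BC, decomposed into R1(ABC) and R2(AD).
print(preserves([("A", "BC")], ["ABC", "AD"]))              # True
# Hypothetical counterexample: B -> C is lost when R(A, B, C) is split into AB and AC.
print(preserves([("A", "B"), ("B", "C")], ["AB", "AC"]))    # False
```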
Multivalued Dependency
A multivalued dependency occurs when two attributes in a table are independent of each other but both depend
on a third attribute.
Because a multivalued dependency involves at least two attributes that depend on a third attribute, it
always requires at least three attributes.
Example: Suppose there is a bike manufacturing company that produces two colors (white and black) of each
model every year.

BIKE_MODEL  MANUF_YEAR  COLOR
M2011       2008        White
M2001       2008        Black
M3001       2013        White
M3001       2013        Black
M4006       2017        White
M4006       2017        Black

Here the columns COLOR and MANUF_YEAR depend on BIKE_MODEL and are independent of each other.
In this case, these two columns are said to be multivalued dependent on BIKE_MODEL. These dependencies are
represented as shown below:
BIKE_MODEL → → MANUF_YEAR
BIKE_MODEL → → COLOR
This can be read as "BIKE_MODEL multidetermines MANUF_YEAR" and "BIKE_MODEL multidetermines
COLOR".
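As a rough illustration, the definition of a multivalued dependency X → → Y can be tested on a small table: whenever two rows agree on X, the row that takes its Y values from the first and everything else from the second must also be present. The function name mvd_holds is an assumption for this sketch, and the M5000 rows are hypothetical data added only to show a violation.

```python
def mvd_holds(rows, x, y):
    """Check X ->> Y on rows (a list of dicts that all share the same columns)."""
    cols = list(rows[0])
    present = {tuple(r[c] for c in cols) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[c] == t2[c] for c in x):
                # Take the Y values from t1 and every other value from t2.
                swapped = dict(t2)
                swapped.update({c: t1[c] for c in y})
                if tuple(swapped[c] for c in cols) not in present:
                    return False
    return True


bikes = [
    {"BIKE_MODEL": "M3001", "MANUF_YEAR": 2013, "COLOR": "White"},
    {"BIKE_MODEL": "M3001", "MANUF_YEAR": 2013, "COLOR": "Black"},
    {"BIKE_MODEL": "M4006", "MANUF_YEAR": 2017, "COLOR": "White"},
    {"BIKE_MODEL": "M4006", "MANUF_YEAR": 2017, "COLOR": "Black"},
]
print(mvd_holds(bikes, ["BIKE_MODEL"], ["COLOR"]))  # True

# Hypothetical rows: two years and two colors, but not every combination is stored.
broken = bikes + [
    {"BIKE_MODEL": "M5000", "MANUF_YEAR": 2019, "COLOR": "White"},
    {"BIKE_MODEL": "M5000", "MANUF_YEAR": 2020, "COLOR": "Black"},
]
print(mvd_holds(broken, ["BIKE_MODEL"], ["COLOR"]))  # False
```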
Join Dependency
Join dependency is a further generalization of multivalued dependency.
Suppose a given relation R(A, B, C, D) is decomposed into R1(A, B, C) and R2(C, D). If the join of R1 and R2 over C
is equal to the relation R, then we can say that a join dependency (JD) exists.
Alternatively, R1 and R2 are a lossless decomposition of R.

A JD ⋈ {R1, R2, ..., Rn} is said to hold over a relation R if R1, R2, ..., Rn is a lossless-join decomposition of R.
*((A, B, C), (C, D)) is a JD of R if the join of these relations over their common attribute is equal to the relation R.
Here, *(R1, R2, R3, ...) is used to indicate that the relations R1, R2, R3, and so on are a JD of R.
Inclusion Dependency
Multivalued dependencies and join dependencies can be used to guide database design, although both are less
common than functional dependencies.
Inclusion dependencies are quite common, but they typically have little influence on the design of the database.
An inclusion dependency is a statement that some columns of a relation are contained in other columns.
A foreign key is an example of an inclusion dependency: the values of the foreign key column(s) of the referring
relation are contained in the primary key column(s) of the referenced relation.
Suppose we have two relations R and S which were obtained by translating two entity sets such that every R
entity is also an S entity.
An inclusion dependency holds if projecting R on its key attributes yields a relation that is contained in
the relation obtained by projecting S on its key attributes.
When decomposing, we should not split groups of attributes that participate in an inclusion dependency.
In practice, most inclusion dependencies are key-based, that is, they involve only keys.
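The containment test behind an inclusion dependency can be sketched as a comparison of two projections. The function name inclusion_holds and the two small tables are hypothetical and serve only as an illustration.

```python
def inclusion_holds(r_rows, r_cols, s_rows, s_cols):
    """True if the projection of R on r_cols is contained in the projection of S on s_cols."""
    r_proj = {tuple(row[c] for c in r_cols) for row in r_rows}
    s_proj = {tuple(row[c] for c in s_cols) for row in s_rows}
    return r_proj <= s_proj


# Hypothetical tables: EMPLOYEE.DEPT_ID is a foreign key that refers to DEPARTMENT.DEPT_ID.
department = [{"DEPT_ID": 10, "DEPT_NAME": "Sales"}, {"DEPT_ID": 20, "DEPT_NAME": "Finance"}]
employee = [{"EMP_ID": 1, "DEPT_ID": 10}, {"EMP_ID": 2, "DEPT_ID": 20}]

print(inclusion_holds(employee, ["DEPT_ID"], department, ["DEPT_ID"]))  # True

# A row that refers to a department that does not exist violates the dependency.
employee.append({"EMP_ID": 3, "DEPT_ID": 30})
print(inclusion_holds(employee, ["DEPT_ID"], department, ["DEPT_ID"]))  # False
```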
Canonical Cover
Whenever the database is updated, the system must check whether the existing functional dependencies are
violated by the update. If the new database state violates a functional dependency, the system must roll back
the update.
A canonical cover, or irreducible set, of a set of functional dependencies FD is a simplified set of FDs that has the
same closure as the original set FD.
Extraneous attributes
An attribute of an FD is said to be extraneous if we can remove it without changing the closure of the set of FDs.
Example: Given a relational schema R(A, B, C, D) and a set of functional dependencies FD = { B → A, AD → BC,
C → ABD }, find the canonical cover.
Solution: Given FD = { B → A, AD → BC, C → ABD }, first decompose the FDs using the decomposition
rule (an inference rule derived from Armstrong's axioms).
B → A
AD → B ( using decomposition inference rule on AD → BC)
AD → C ( using decomposition inference rule on AD → BC)
C → A ( using decomposition inference rule on C → ABD)
C → B ( using decomposition inference rule on C → ABD)
C → D ( using decomposition inference rule on C → ABD)
Now set of FD = { B → A, AD → B, AD → C, C → A, C → B, C → D }
The next step is to find the closure of the left side of each FD twice, once with that FD included in the set and
once with it excluded. If the closure is the same in both cases, then that FD is redundant and we remove it from
the set; otherwise we keep it.
Calculating the closure for each FD in { B → A, AD → B, AD → C, C → A, C → B, C → D }:
1a. Closure B+ = BA using FD = { B → A, AD → B, AD → C, C → A, C → B, C → D }
1b. Closure B+ = B using FD = { AD → B, AD → C, C → A, C → B, C → D }
From 1 a and 1 b, we found that both the Closure( by including B → A and excluding B → A ) are not equivalent,
hence FD B → A is important and cannot be removed from the set of FD.
2 a. Closure AD+ = ADBC using FD = { B →A, AD → B, AD → C, C → A, C → B, C → D }
2 b. Closure AD+ = ADCB using FD = { B → A, AD → C, C → A, C → B, C → D }
From 2 a and 2 b, we found that both the Closure (by including AD → B and excluding AD → B) are equivalent,
hence FD AD → B is not important and can be removed from the set of FD.
Hence resultant FD = { B → A, AD → C, C → A, C → B, C → D }
3 a. Closure AD+ = ADCB using FD = { B →A, AD → C, C → A, C → B, C → D }
3 b. Closure AD+ = AD using FD = { B → A, C → A, C → B, C → D }
From 3 a and 3 b, we found that both the Closure (by including AD → C and excluding AD → C ) are not
equivalent, hence FD AD → C is important and cannot be removed from the set of FD.
Hence resultant FD = { B → A, AD → C, C → A, C → B, C → D }
4 a. Closure C+ = CABD using FD = { B →A, AD → C, C → A, C → B, C → D }
4 b. Closure C+ = CBDA using FD = { B → A, AD → C, C → B, C → D }
From 4 a and 4 b, we found that both the Closure (by including C → A and excluding C → A) are equivalent,
hence FD C → A is not important and can be removed from the set of FD.
Hence resultant FD = { B → A, AD → C, C → B, C → D }
5 a. Closure C+ = CBDA using FD = { B →A, AD → C, C → B, C → D }
5 b. Closure C+ = CD using FD = { B → A, AD → C, C → D }
From 5 a and 5 b, we found that both the Closure (by including C → B and excluding C → B) are not equivalent,
hence FD C → B is important and cannot be removed from the set of FD.
Hence resultant FD = { B → A, AD → C, C → B, C → D }
6 a. Closure C+ = CDBA using FD = { B →A, AD → C, C → B, C → D }
6 b. Closure C+ = CBA using FD = { B → A, AD → C, C → B }
From 6 a and 6 b, we found that both the Closure( by including C → D and excluding C → D) are not equivalent,
hence FD C → D is important and cannot be removed from the set of FD.
Hence resultant FD = { B → A, AD → C, C → B, C → D }
Now that FD = { B → A, AD → C, C → B, C → D } is the resultant set, we check for redundant attributes.
Since the left side of FD AD → C has two attributes, let's check whether both are needed or
only one.
Closure AD+ = ADCB using FD = { B → A, AD → C, C → B, C → D }
Closure A+ = A using FD = { B → A, AD → C, C → B, C → D }
Closure D+ = D using FD = { B → A, AD → C, C → B, C → D }
Since neither A+ nor D+ is equivalent to AD+, both A and D are needed in FD AD → C and cannot be
removed.
Hence resultant FD = { B → A, AD → C, C → B, C → D }, which we can rewrite as
FD = { B → A, AD → C, C → BD }, the canonical cover of FD = { B → A, AD → BC, C → ABD }.
Example 2: Given a relational schema R(W, X, Y, Z) and a set of functional dependencies FD = { W → X, Y → X, Z
→ WXY, WY → Z }, find the canonical cover.
Solution: Given FD = { W → X, Y → X, Z → WXY, WY → Z }, first decompose the FDs using the decomposition
rule (an inference rule derived from Armstrong's axioms).
W → X
Y → X
Z → W ( using decomposition inference rule on Z → WXY )
Z → X ( using decomposition inference rule on Z → WXY )
Z → Y ( using decomposition inference rule on Z → WXY )
WY → Z
Now set of FD = { W → X, Y → X, WY → Z, Z → W, Z → X, Z → Y }
The next step is to find the closure of the left side of each FD twice, once with that FD included in the set and
once with it excluded. If the closure is the same in both cases, then that FD is redundant and we remove it from
the set; otherwise we keep it.
Calculating the closure for each FD in { W → X, Y → X, Z → W, Z → X, Z → Y, WY → Z }:
1 a. Closure W+ = WX using FD = { W → X, Y → X, Z → W, Z → X, Z → Y, WY → Z }
1 b. Closure W+ = W using FD = { Y → X, Z → W, Z → X, Z → Y, WY → Z }
From 1 a and 1 b, we found that both the Closure (by including W → X and excluding W → X ) are not
equivalent, hence FD W → X is important and cannot be removed from the set of FD.
Hence resultant FD = { W → X, Y → X, Z → W, Z → X, Z → Y, WY → Z }
2 a. Closure Y+ = YX using FD = { W → X, Y → X, Z → W, Z → X, Z → Y, WY → Z }
2 b. Closure Y+ = Y using FD = { W → X, Z → W, Z → X, Z → Y, WY → Z }
From 2 a and 2 b we found that both the Closure (by including Y → X and excluding Y → X ) are not equivalent,
hence FD Y → X is important and cannot be removed from the set of FD.
Hence resultant FD = { W → X, Y → X, Z → W, Z → X, Z → Y, WY → Z }
3 a. Closure Z+ = ZWXY using FD = { W → X, Y → X, Z → W, Z → X, Z → Y, WY → Z }
3 b. Closure Z+ = ZXY using FD = { W → X, Y → X, Z → X, Z → Y, WY → Z }
From 3 a and 3 b, we found that both the Closure (by including Z → W and excluding Z → W ) are not
equivalent, hence FD Z → W is important and cannot be removed from the set of FD.
Hence resultant FD = { W → X, Y → X, Z → W, Z → X, Z → Y, WY → Z }
4 a. Closure Z+ = ZXWY using FD = { W → X, Y → X, Z → W, Z → X, Z → Y, WY → Z }
4 b. Closure Z+ = ZWYX using FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }
From 4 a and 4 b, we found that both the Closure (by including Z → X and excluding Z → X ) are equivalent,
hence FD Z → X is not important and can be removed from the set of FD.
Hence resultant FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }
5 a. Closure Z+ = ZYWX using FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }
5 b. Closure Z+ = ZWX using FD = { W → X, Y → X, Z → W, WY → Z }
From 5 a and 5 b, we found that both the Closure (by including Z → Y and excluding Z → Y ) are not equivalent,
hence FD Z → Y is important and cannot be removed from the set of FD.
Hence resultant FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }
6 a. Closure WY+ = WYZX using FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }
6 b. Closure WY+ = WYX using FD = { W → X, Y → X, Z → W, Z → Y }
From 6 a and 6 b, we found that both the Closure (by including WY → Z and excluding WY → Z) are not
equivalent, hence FD WY → Z is important and cannot be removed from the set of FD.
Hence resultant FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }
Now that FD = { W → X, Y → X, Z → W, Z → Y, WY → Z } is the resultant set, we check for redundant
attributes. Since the left side of FD WY → Z has two attributes, let's check whether both are needed or
only one.
Closure WY+ = WYZX using FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }
Closure W+ = WX using FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }
Closure Y+ = YX using FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }
Since neither W+ nor Y+ is equivalent to WY+, both W and Y are needed in FD WY → Z and cannot be
removed.
Hence resultant FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }, which we can rewrite as:
FD = { W → X, Y → X, Z → WY, WY → Z }, the canonical cover of FD = { W → X, Y → X, Z → WXY, WY → Z }.
Example 3: Given a relational schema R(V, W, X, Y, Z) and a set of functional dependencies FD = { V → W, VW →
X, Y → VXZ }, find the canonical cover.
Solution: Given FD = { V → W, VW → X, Y → VXZ }, first decompose the FDs using the decomposition rule
(an inference rule derived from Armstrong's axioms).
V → W
VW → X
Y → V ( using decomposition inference rule on Y → VXZ )
Y → X ( using decomposition inference rule on Y → VXZ )
Y → Z ( using decomposition inference rule on Y → VXZ )
Now set of FD = { V → W, VW → X, Y → V, Y → X, Y → Z }.
The next step is to find the closure of the left side of each FD twice, once with that FD included in the set and
once with it excluded. If the closure is the same in both cases, then that FD is redundant and we remove it from
the set; otherwise we keep it.
Calculating the closure for each FD in { V → W, VW → X, Y → V, Y → X, Y → Z }:
1 a. Closure V+ = VWX using FD = {V → W, VW → X, Y → V, Y → X, Y → Z}
1 b. Closure V+ = V using FD = {VW → X, Y → V, Y → X, Y → Z }
From 1 a and 1 b, we found that both the Closure( by including V → W and excluding V → W ) are not
equivalent, hence FD V → W is important and cannot be removed from the set of FD.
Hence resultant FD = { V → W, VW → X, Y → V, Y → X, Y → Z }.
2 a. Closure VW+ = VWX using FD = { V → W, VW → X, Y → V, Y → X, Y → Z }
2 b. Closure VW+ = VW using FD = { V → W, Y → V, Y → X, Y → Z }
From 2 a and 2 b, we found that both the Closure( by including VW → X and excluding VW → X ) are not
equivalent, hence FD VW → X is important and cannot be removed from the set of FD.
Hence resultant FD = { V → W, VW → X, Y → V, Y → X, Y → Z }.
3 a. Closure Y+ = YVXZW using FD = { V → W, VW → X, Y → V, Y → X, Y → Z }
3 b. Closure Y+ = YXZ using FD = { V → W, VW → X, Y → X, Y → Z }
From 3 a and 3 b, we found that both the Closure( by including Y → V and excluding Y → V ) are not equivalent,
hence FD Y → V is important and cannot be removed from the set of FD.
Hence resultant FD = { V → W, VW → X, Y → V, Y → X, Y → Z }.
4 a. Closure Y+ = YXVZW using FD = { V → W, VW → X, Y → V, Y → X, Y → Z }
4 b. Closure Y+ = YVZWX using FD = { V → W, VW → X, Y → V, Y → Z }
From 4 a and 4 b, we found that both the Closure( by including Y → X and excluding Y → X ) are equivalent,
hence FD Y → X is not important and can be removed from the set of FD.
Hence resultant FD = { V → W, VW → X, Y → V, Y → Z }.
5 a. Closure Y+ = YZVWX using FD = { V → W, VW → X, Y → V, Y → Z }
5 b. Closure Y+ = YVWX using FD = { V → W, VW → X, Y → V }
From 5 a and 5 b, we found that both the Closure( by including Y → Z and excluding Y → Z ) are not equivalent,
hence FD Y → Z is important and cannot be removed from the set of FD.
Hence resultant FD = { V → W, VW → X, Y → V, Y → Z }.
Now that FD = { V → W, VW → X, Y → V, Y → Z } is the resultant set, we check for redundant attributes.
Since the left side of FD VW → X has two attributes, let's check whether both are needed or
only one.
Closure VW+ = VWX using FD = { V → W, VW → X, Y → V, Y → Z }
Closure V+ = VWX using FD = { V → W, VW → X, Y → V, Y → Z }
Closure W+ = W using FD = { V → W, VW → X, Y → V, Y → Z }
Since the closures VW+ and V+ are equivalent, W is an extraneous attribute in FD VW → X and can be
removed.
Hence resultant FD = { V → W, V → X, Y → V, Y → Z }, which we can rewrite as
FD = { V → WX, Y → VZ }, the canonical cover of FD = { V → W, VW → X, Y → VXZ }.
CONCLUSION: From the above three examples we conclude that a canonical cover (irreducible set of functional
dependencies) can be calculated with the following steps.
STEP 1: For the given set of FDs, decompose each FD whose right side has more than one attribute, using the
decomposition rule (an inference rule derived from Armstrong's axioms).
STEP 2: Make a new set of FDs containing all the decomposed FDs.
STEP 3: For each FD, find the closure of its left side with that FD included and with it excluded. If the
closure is the same in both cases, then that FD is redundant and we remove it from the set; otherwise
we keep it.
STEP 4: Repeat STEP 3 until every FD in the set has been checked.
STEP 5: After STEP 4, the resultant set contains only FDs which are not redundant, for example
FD = { B → A, AD → C, C → B, C → D }.
STEP 6: Check for redundant attributes by selecting the FDs which have more than one attribute on the left
side. For example, FD AD → C has two attributes on its left, so we check whether both are needed
or only one.
STEP 6 a: Find Closure AD+
STEP 6 b: Find Closure A+
STEP 6 c: Find Closure D+
Compare the closures from STEP 6 a, 6 b and 6 c. If neither A+ nor D+ is equivalent to AD+, then both A and D
are needed in FD AD → C and cannot be removed; otherwise, we remove the redundant attribute.
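The procedure can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function names closure and canonical_cover are assumptions, attributes are assumed to be single letters, and the sketch removes extraneous left-side attributes before removing redundant FDs (a common textbook ordering), whereas the steps above do it afterwards. For the three examples above both orderings give the same result. Note also that a canonical cover is not unique in general, so a different processing order may produce a different but equivalent set.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def canonical_cover(fds):
    """fds is a list of (lhs, rhs) strings such as ("AD", "BC"); returns {lhs: rhs}."""
    # STEP 1-2: split every right side into single attributes (duplicates dropped).
    work = list(dict.fromkeys((frozenset(l), frozenset(a)) for l, r in fds for a in r))

    # Drop extraneous left-side attributes (STEP 6 in the text).
    for i, (lhs, rhs) in enumerate(work):
        for attr in sorted(lhs):
            reduced = work[i][0] - {attr}
            if reduced and rhs <= closure(reduced, work):
                work[i] = (reduced, rhs)
    work = list(dict.fromkeys(work))

    # STEP 3-5: drop every FD that still follows from the remaining ones.
    i = 0
    while i < len(work):
        lhs, rhs = work[i]
        others = work[:i] + work[i + 1:]
        if rhs <= closure(lhs, others):
            work = others
        else:
            i += 1

    # Merge FDs that share a left side, e.g. C -> B and C -> D become C -> BD.
    merged = {}
    for lhs, rhs in work:
        key = "".join(sorted(lhs))
        merged[key] = merged.get(key, "") + "".join(rhs)
    return {lhs: "".join(sorted(rhs)) for lhs, rhs in merged.items()}


print(canonical_cover([("B", "A"), ("AD", "BC"), ("C", "ABD")]))
# {'B': 'A', 'AD': 'C', 'C': 'BD'}
```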
File Organization
A file is a collection of records. Using the primary key, we can access the records. The type and frequency of
access that can be supported depend on the type of file organization used for a given set of records.
File organization is a logical relationship among the various records. It defines how file records are
mapped onto disk blocks.
File organization describes the way in which the records are stored in blocks and how the blocks
are placed on the storage medium.
The first approach to mapping the database to files is to use several files and store records of only one fixed
length in any given file. An alternative approach is to structure the files so that they can hold records of multiple
lengths.
Files of fixed-length records are easier to implement than files of variable-length records.
Objective of file organization
Records should be selected optimally, i.e., as fast as possible.
Insert, delete and update transactions on the records should be quick and easy.
Insert, update or delete operations should not introduce duplicate records.
Records should be stored efficiently so that the cost of storage is minimal.
Types of file organization:
There are various methods of file organization. Each method has pros and cons depending on how the records
are accessed or selected. The programmer chooses the best-suited file organization method according
to the requirements.
Types of file organization are as follows:
Sequential file organization
Heap file organization
Hash file organization
B+ tree file organization
Indexed sequential access method (ISAM)
Cluster file organization
Sequential File Organization
This is the easiest method of file organization. In this method, records are stored sequentially in the file. This
method can be implemented in two ways:
1. Pile File Method:
It is quite a simple method. In this method, we store the records in a sequence, i.e., one after another. Here, the
records are stored in the order in which they are inserted into the tables.
When a record is updated or deleted, it is first searched for in the memory blocks. When it is found,
it is marked for deletion, and the new record is inserted.
Insertion of the new record:
Suppose we have a sequence of records R1, R3 and so on up to R9 and R8. Each record is simply a row in a
table. Suppose we want to insert a new record R2 in the sequence; it will be placed at the end
of the file.

2. Sorted File Method:
In this method, the new record is always inserted at the end of the file, and then the sequence is sorted in
ascending or descending order. Records are sorted on the primary key or any other key.
When a record is modified, the record is updated and the file is sorted again, so that the updated
record ends up in the right place.
Insertion of the new record:
Suppose there is a preexisting sorted sequence of records R1, R3 and so on up to R6 and R7. Suppose a
new record R2 has to be inserted in the sequence; it will be inserted at the end of the file, and then the
sequence will be sorted.
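The two insertion methods can be compared with a small sketch. Records are represented only by their key numbers (1 stands for R1, and so on), the sequences are shortened versions of the ones described above, and the function names are assumptions made for illustration.

```python
def pile_insert(file_records, key):
    """Pile file: append the new record at the end, keeping arrival order."""
    file_records.append(key)


def sorted_insert(file_records, key):
    """Sorted file: append at the end, then re-sort the whole file on the key."""
    file_records.append(key)
    file_records.sort()


pile = [1, 3, 9, 8]          # stands for R1, R3, R9, R8
pile_insert(pile, 2)
print(pile)                  # [1, 3, 9, 8, 2]

sorted_file = [1, 3, 6, 7]   # stands for R1, R3, R6, R7
sorted_insert(sorted_file, 2)
print(sorted_file)           # [1, 2, 3, 6, 7]
```

The extra sort after every insertion is the cost that the sorted file method pays for keeping the records in key order.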
Pros of sequential file organization
It is a fast and efficient method for huge amounts of data.
In this method, files can easily be stored on cheaper storage media such as magnetic tapes.
It is simple in design. It does not require much effort to store the data.
This method is used when most of the records have to be accessed, as in calculating the grades of students,
generating salary slips, etc.
This method is used for report generation or statistical calculations.
Cons of sequential file organization
It wastes time because we cannot jump directly to a particular record; we have to move through the records
sequentially until we reach it.
The sorted file method takes more time and space for sorting the records.
Heap file organization
It is the simplest and most basic type of organization. It works with data blocks. In heap file organization, the
records are inserted at the end of the file. Inserting records does not require any sorting or ordering of
records.
When a data block is full, the new record is stored in some other block. This new data block need not be the
very next data block; the DBMS can select any data block in memory to store new records. The heap file is also
known as an unordered file.
In the file, every record has a unique id, and every page in a file is of the same size. It is the responsibility of the
DBMS to store and manage the new records.
Insertion of a new record
Suppose we have five records R1, R3, R6, R4 and R5 in a heap and we want to insert a new record R2.
If data block 3 is full, then R2 will be inserted in any data block selected by the DBMS, let's say
data block 1.

If we want to search, update or delete data in heap file organization, then we need to traverse the data from
the start of the file until we get the requested record.
If the database is very large, then searching, updating or deleting a record will be time-consuming because there
is no sorting or ordering of records. In heap file organization, we need to check all the data until we get the
requested record.
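A small sketch of heap insertion and search. The block capacity of two records, the block layout and the function names are all hypothetical. Here the "DBMS choice" is simply the first block with free space.

```python
BLOCK_CAPACITY = 2  # hypothetical number of records per data block


def heap_insert(blocks, key):
    """Place the record in the first block with free space; add a block if all are full."""
    for number, block in enumerate(blocks, start=1):
        if len(block) < BLOCK_CAPACITY:
            block.append(key)
            return number
    blocks.append([key])
    return len(blocks)


def heap_search(blocks, key):
    """Scan from the start of the file; return how many records were examined, or None."""
    examined = 0
    for block in blocks:
        for record in block:
            examined += 1
            if record == key:
                return examined
    return None


# R1 in block 1, R3 and R6 in block 2, R4 and R5 in block 3 (block 3 is full).
blocks = [[1], [3, 6], [4, 5]]
placed_in = heap_insert(blocks, 2)
print(placed_in)               # 1
print(heap_search(blocks, 5))  # 6, every record before R5 had to be examined
```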
Pros of Heap file organization
It is a very good method of file organization for bulk insertion. If a large amount of data needs to be
loaded into the database at one time, then this method is best suited.
In the case of a small database, fetching and retrieving records is faster than in sequential file organization.
Cons of Heap file organization
This method is inefficient for large databases because it takes time to search for or modify a record.
Hash File Organization
Hash file organization computes a hash function on some fields of the records. The output of the hash
function determines the location of the disk block where the records are to be placed.

When a record has to be retrieved using the hash key columns, the address is generated, and the whole
record is retrieved using that address. In the same way, when a new record has to be inserted, the address
is generated using the hash key and the record is inserted directly. The same process is applied in the case of
delete and update.
In this method, there is no need to search or sort the entire file. Each record is stored at the location computed
by the hash function, so records appear to be scattered randomly in memory.
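A minimal sketch of hash-based placement, assuming an integer key and a simple modulo hash function. The number of buckets, the field names and the function names are hypothetical. Records whose keys hash to the same address are kept together in one bucket.

```python
NUM_BUCKETS = 4  # hypothetical number of disk blocks (buckets)
buckets = [[] for _ in range(NUM_BUCKETS)]


def bucket_address(key):
    """Hash function: maps a key directly to a bucket number."""
    return key % NUM_BUCKETS


def insert(record):
    buckets[bucket_address(record["id"])].append(record)


def find(key):
    """Only the one bucket given by the hash function is examined."""
    for record in buckets[bucket_address(key)]:
        if record["id"] == key:
            return record
    return None


for emp_id, name in [(22, "Denim"), (33, "Alina"), (46, "Stephan")]:
    insert({"id": emp_id, "name": name})

print(bucket_address(33))  # 1
print(find(46)["name"])    # Stephan
```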
B+ File Organization
B+ tree file organization is an advanced form of the indexed sequential access method. It uses a tree-like
structure to store records in the file.
It uses the same key-index concept, where the primary key is used to sort the records. For each primary key,
the value of the index is generated and mapped to the record.
The B+ tree is similar to a binary search tree (BST), but a node can have more than two children. In this method,
all the records are stored only at the leaf nodes. Intermediate nodes act as pointers to the leaf nodes. They do not
contain any records.

The above B+ tree shows that:
There is one root node of the tree, i.e., 25.
There is an intermediary layer of nodes. They do not store the actual records. They have only pointers to the
leaf nodes.
The nodes to the left of the root node contain values that come before the root value, and the nodes to the right
contain values that come after it, i.e., 15 and 30 respectively.
There is a single leaf level whose nodes hold only values, i.e., 10, 12, 17, 20, 24, 27 and 29.
Searching for any record is easier as all the leaf nodes are at the same depth (the tree is balanced).
In this method, any record can be reached by traversing a single path from the root and accessed easily.
Pros of B+ tree file organization
In this method, searching becomes very easy as all the records are stored only in the leaf nodes and are sorted
in a sequential linked list.
Traversing through the tree structure is easier and faster.
The size of the B+ tree has no restrictions, so the number of records can increase or decrease and the B+ tree
structure can also grow or shrink.
It is a balanced tree structure, and inserts, updates and deletes do not degrade the performance of the tree.
Cons of B+ tree file organization
This method is inefficient for static tables.
Indexed sequential access method (ISAM)
ISAM is an advanced sequential file organization. In this method, records are stored in the file using the
primary key. An index value is generated for each primary key and mapped to the record. This index contains
the address of the record in the file.

If a record has to be retrieved based on its index value, then the address of the data block is fetched and the
record is retrieved from memory.
Pros of ISAM:
In this method, since each record has the address of its data block, searching for a record in a huge database is
quick and easy.
This method supports range retrieval and partial retrieval of records. Since the index is based on the primary key
values, we can retrieve the data for a given range of values. In the same way, a partial value can also be
easily searched, e.g., student names starting with 'JA' can be easily found.
Cons of ISAM
This method requires extra space on the disk to store the index values.
When new records are inserted, these files have to be reconstructed to maintain the sequence.
When a record is deleted, the space used by it needs to be released. Otherwise, the performance of the
database will slow down.
Cluster file organization
When records of two or more tables are stored in the same file, it is known as a cluster. These files hold two or
more tables in the same data block, and the key attributes which are used to map these tables together are stored
only once.
This method reduces the cost of searching for various records in different files.
The cluster file organization is used when there is a frequent need for joining the tables with the same condition.
These joins will give only a few records from both tables. In the given example, we are retrieving the records for
only particular departments. This method can't be used to retrieve the records for the entire department.

In this method, we can directly insert, update or delete any record. Data is sorted based on the key with which
searching is done. The cluster key is the key on which the tables are joined.
Types of Cluster file organization:
Cluster file organization is of two types:
1. Indexed Clusters:
In an indexed cluster, records are grouped based on the cluster key and stored together. The above EMPLOYEE
and DEPARTMENT relationship is an example of an indexed cluster. Here, all the records are grouped based on
the cluster key DEP_ID.
2. Hash Clusters:
It is similar to the indexed cluster. In a hash cluster, instead of storing the records based on the cluster key itself,
we generate the hash key value for the cluster key and store together the records with the same hash key value.
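The two cluster types can be sketched with plain dictionaries. The sample rows, the bucket count and the function names are hypothetical. The point of the sketch is that the cluster key DEP_ID is stored once per cluster, and that a join for one department becomes a single lookup.

```python
def build_indexed_cluster(departments, employees):
    """Group each department row with its employee rows under the cluster key DEP_ID."""
    clusters = {dep_id: {"dep_name": name, "employees": []} for dep_id, name in departments}
    for emp_id, emp_name, dep_id in employees:
        clusters[dep_id]["employees"].append((emp_id, emp_name))
    return clusters


def hash_cluster_address(dep_id, num_buckets=4):
    """Hash cluster: records are placed by the hash of the cluster key, not the key itself."""
    return dep_id % num_buckets


departments = [(10, "Sales"), (14, "Finance")]                 # hypothetical rows
employees = [(1, "Asha", 10), (2, "Ravi", 14), (3, "Meera", 10)]

clusters = build_indexed_cluster(departments, employees)
print(clusters[10])
# {'dep_name': 'Sales', 'employees': [(1, 'Asha'), (3, 'Meera')]}

# Departments 10 and 14 share a hash value, so a hash cluster stores them together.
print(hash_cluster_address(10), hash_cluster_address(14))  # 2 2
```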
Pros of Cluster file organization
The cluster file organization is used when there are frequent requests for joining the tables with the same joining
condition.
It gives efficient results when there is a 1:M mapping between the tables.
Cons of Cluster file organization
This method has low performance for very large databases.
If there is any change in the joining condition, then this method cannot be used. If we change the joining
condition, then traversing the file takes a lot of time.
This method is not suitable for tables with a 1:1 mapping.
Indexing in DBMS
Indexing is used to optimize the performance of a database by minimizing the number of disk accesses required
when a query is processed.
The index is a type of data structure. It is used to locate and access the data in a database table quickly.
Index structure:
Indexes can be created using some database columns.

The first column of the index is the search key, which contains a copy of the primary key or candidate key of the
table. The values of the primary key are stored in sorted order so that the corresponding data can be accessed
easily.
The second column of the index is the data reference. It contains a set of pointers holding the address of the
disk block where the value of the particular key can be found.
Indexing Methods
Ordered indices
Indices are usually sorted to make searching faster. Indices which are sorted are known as ordered
indices.
Example: Suppose we have an employee table with thousands of records, each of which is 10 bytes long. Suppose
their IDs start with 1, 2, 3 and so on, and we have to search for the employee with ID 543.
In the case of a database with no index, we have to search the disk blocks from the start until we reach 543. The
DBMS will read the record after reading 543*10 = 5430 bytes.
In the case of an index, assuming each index entry is 2 bytes long, the DBMS will read the record after reading
542*2 = 1084 bytes, which is far less than in the previous case.
Primary Index
If the index is created on the basis of the primary key of the table, then it is known as primary indexing. Primary
keys are unique to each record and have a 1:1 relation with the records.
As primary keys are stored in sorted order, the searching operation is quite efficient.
The primary index can be classified into two types: dense index and sparse index.
Dense index
The dense index contains an index record for every search key value in the data file. It makes searching faster.
In this, the number of records in the index table is the same as the number of records in the main table.
It needs more space to store the index records themselves. Each index record has the search key and a pointer to
the actual record on the disk.

Sparse index
In a sparse index, index records appear only for a few items in the data file. Each item points to a block.
Instead of pointing to each record in the main table, the index points to records in the main table at
intervals.
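The difference between the two index types can be sketched as follows. The block contents and names are hypothetical. The dense index holds one entry per record, while the sparse index holds only the first key of each block and finishes the search with a scan inside that block.

```python
import bisect

# Hypothetical sorted data file: three blocks with three keys each.
blocks = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]

# Dense index: one entry for every key, pointing to (block number, position in block).
dense_index = {key: (b, i) for b, block in enumerate(blocks) for i, key in enumerate(block)}

# Sparse index: one entry per block, holding only the first key of that block.
sparse_keys = [block[0] for block in blocks]


def sparse_lookup(key):
    """Find the last index entry <= key, then scan that block. Returns the block number or None."""
    position = bisect.bisect_right(sparse_keys, key) - 1
    if position < 0:
        return None
    return position if key in blocks[position] else None


print(len(dense_index), len(sparse_keys))  # 9 3
print(dense_index[50])                     # (1, 1)
print(sparse_lookup(50))                   # 1
```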

Clustering Index
A clustered index can be defined as an ordered data file. Sometimes the index is created on non-primary key
columns which may not be unique for each record.
In this case, to identify the records faster, we group two or more columns to get a unique value and create
an index out of them. This method is called a clustering index.
Records which have similar characteristics are grouped, and indexes are created for these groups.
Example: Suppose a company has several employees in each department. Suppose we use a clustering
index, where all employees who belong to the same Dept_ID are considered to be within a single cluster, and index
pointers point to the cluster as a whole. Here Dept_ID is a non-unique key.

The previous schema is a little confusing because one disk block is shared by records which belong to different
clusters. Using a separate disk block for each cluster is considered a better technique.
Secondary Index
In sparse indexing, as the size of the table grows, the size of the mapping also grows. These mappings are
usually kept in primary memory so that address fetches are fast. The actual data is then read from secondary
memory using the address obtained from the mapping. If the mapping grows too large, fetching the
address itself becomes slower. In this case, the sparse index will not be efficient. To overcome this problem,
secondary indexing is introduced.
In secondary indexing, another level of indexing is introduced to reduce the size of the mapping. In this method,
large ranges of the column values are selected initially so that the first-level mapping stays small. Then
each range is further divided into smaller ranges. The first-level mapping is stored in primary memory,
so that address fetches are fast. The second-level mapping and the actual data are stored in secondary
memory (hard disk).
For example:
If you want to find the record with roll number 111 in the diagram, the search first looks in the first-level index
for the largest entry that is smaller than or equal to 111. It gets 100 at this level.
Then, in the second-level index, it again looks for the largest entry that is smaller than or equal to 111 and gets
110. Using the address 110, it goes to the data block and scans the records sequentially until it finds 111.
This is how a search is performed in this method. Inserting, updating or deleting is done in the same
manner.
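
The two-level lookup can be sketched as follows. The layout (ten roll numbers per data block, the particular index entries) and the helper names floor_entry and find are assumptions made for illustration, since the tutorial's diagram is not reproduced here.

```python
from bisect import bisect_right

# Hypothetical layout: ten roll numbers per data block.
first_level = [100, 200]                 # small mapping, kept in primary memory
second_level = {100: [100, 110, 120],    # second-level mappings, kept on disk
                200: [200, 210]}
data_blocks = {start: list(range(start, start + 10))
               for starts in second_level.values() for start in starts}

def floor_entry(sorted_keys, target):
    """Return the largest key <= target, or None if every key is larger."""
    pos = bisect_right(sorted_keys, target)
    return sorted_keys[pos - 1] if pos else None

def find(roll):
    """Return (first-level entry, block address, record), or None if absent."""
    top = floor_entry(first_level, roll)
    if top is None:
        return None
    block_start = floor_entry(second_level[top], roll)
    if block_start is None:
        return None
    for record in data_blocks[block_start]:   # sequential scan inside the block
        if record == roll:
            return top, block_start, record
    return None

result = find(111)
```

With these assumed values, the search for 111 passes through entry 100 at the first level and entry 110 at the second level, as in the example above.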
B+ Tree
The B+ tree is a balanced multi-way search tree: unlike a binary search tree, each node can have more than two
children. It follows a multi-level index format.
In the B+ tree, the leaf nodes hold the actual data pointers. The B+ tree ensures that all leaf nodes remain at the
same height.
In the B+ tree, the leaf nodes are linked using a linked list. Therefore, a B+ tree can support random access as
well as sequential access.
Structure of B+ Tree
In the B+ tree, every leaf node is at an equal distance from the root node. The B+ tree is of order n, where n is
fixed for every B+ tree.
It contains internal nodes and leaf nodes.
Internal node
An internal node of the B+ tree, other than the root node, contains at least n/2 (rounded up) pointers to child
nodes.
At most, an internal node of the tree contains n pointers.
Leaf node
A leaf node of the B+ tree contains at least n/2 record pointers and n/2 key values.
At most, a leaf node contains n record pointers and n key values.
Every leaf node of the B+ tree contains one block pointer P that points to the next leaf node.
Searching a record in B+ Tree
Suppose we have to search for 55 in the B+ tree structure below. First, we look in the intermediary node to find
the branch that leads to the leaf node that may contain the record for 55.
In the intermediary node, 55 falls on the branch between 50 and 75, so we are directed to the third leaf node.
There the DBMS performs a sequential search to find 55.
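
A minimal sketch of this search on a two-level tree is shown below. The separator keys 25, 50 and 75 and the leaf contents are hypothetical values chosen so that 55 sits in the third leaf; they are not taken from the tutorial's diagram.

```python
from bisect import bisect_right

# Hypothetical two-level B+ tree: one intermediary node above four leaf nodes.
internal_keys = [25, 50, 75]
leaves = [
    [10, 15, 20],       # keys below 25
    [25, 30, 35, 45],   # keys from 25 up to (not including) 50
    [50, 55, 65, 70],   # keys from 50 up to (not including) 75
    [75, 80, 85, 90],   # keys from 75 upward
]

def search(key):
    """Return (leaf number, position in leaf) for key, or None if absent."""
    # Pick the branch: the number of separator keys <= key is the child number.
    leaf_no = bisect_right(internal_keys, key)
    # Sequential search inside the chosen leaf node.
    for pos, stored in enumerate(leaves[leaf_no]):
        if stored == key:
            return leaf_no, pos
    return None

found = search(55)
```

Leaf numbers are zero-based in this sketch, so a leaf number of 2 corresponds to the third leaf node.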

B+ Tree Insertion
Suppose we want to insert a record 60 in the structure below. It belongs in the 3rd leaf node, after 55. The tree is
balanced and this leaf node is already full, so we cannot insert 60 there.
In this case, we have to split the leaf node so that the record can be inserted into the tree without affecting the
fill factor, balance and order.

With 60 added, the 3rd leaf node holds the values (50, 55, 60, 65, 70), and the key in the intermediate node that
leads to it is 50. We split the leaf node in the middle so that the balance of the tree is not altered. So we can
group (50, 55) and (60, 65, 70) into 2 leaf nodes.
If these two are to be leaf nodes, the intermediate node cannot branch from 50 alone. It should have 60 added to
it, and then we can have a pointer to the new leaf node.
This is how we insert an entry when there is an overflow. In the normal case, it is easy to find the leaf node
where the entry fits and place it there.
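
The leaf split can be sketched as below. The capacity of four keys per leaf, the parent keys and the function name insert_into_leaf are assumptions for illustration; a full B+ tree would also handle splits that propagate to internal nodes, which this sketch leaves out.

```python
from bisect import insort

def insert_into_leaf(leaf, key, capacity):
    """Insert key into a sorted leaf and split it in the middle on overflow.

    Returns (left, right, separator). When no split is needed, right and
    separator are None and left is the updated leaf.
    """
    keys = list(leaf)
    insort(keys, key)                 # keep the leaf sorted
    if len(keys) <= capacity:
        return keys, None, None
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    # The first key of the new right leaf is copied up into the parent.
    return left, right, right[0]

# Hypothetical state before the insert: a full 3rd leaf and its parent's keys.
parent_keys = [25, 50, 75]
left, right, separator = insert_into_leaf([50, 55, 65, 70], 60, capacity=4)
if separator is not None:
    insort(parent_keys, separator)    # the parent now also branches at 60
```

With these assumed values the split produces the two groups named in the text, (50, 55) and (60, 65, 70), and 60 is added to the intermediate node.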
B+ Tree Deletion
Suppose we want to delete 60 from the above example. In this case, we have to remove 60 from the
intermediate node as well as from the 4th leaf node. Once it is removed from the intermediate node, the tree
no longer satisfies the rules of the B+ tree, so we need to rearrange the nodes to keep the tree balanced.
After deleting 60 from the above B+ tree and rearranging the nodes, the tree looks as follows:

Hashing
In a huge database structure, it is very inefficient to search all the index values to reach the desired data.
The hashing technique calculates the location of a data record on the disk directly, without using an index
structure.
In this technique, data is stored in data blocks whose addresses are generated by a hash function.
The memory locations where these records are stored are known as data buckets or data blocks.
A hash function can use any column value to generate the address. Most of the time, the hash
function uses the primary key to generate the address of the data block. A hash function can be anything from a
simple mathematical function to a complex one. We can even use the primary key itself as the
address of the data block, so that each row is stored in the data block whose address is the same as its primary
key.
The above diagram shows data block addresses that are the same as the primary key values. The hash function can
also be a simple mathematical function such as exponential, mod, cos, sin, etc. Suppose we use a mod (5) hash
function to determine the address of the data block. In this case, it applies the mod (5) function to the primary
keys and generates 3, 3, 1, 4 and 2 respectively, and the records are stored at those data block addresses.
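
As a small illustration of a mod (5) hash function, consider the sketch below. The primary key values are hypothetical: they were chosen only because they reproduce the addresses 3, 3, 1, 4 and 2 quoted above, and the tutorial's diagram may use different keys.

```python
def bucket_address(primary_key, num_buckets=5):
    """Hash a primary key to a data block address with the mod function."""
    return primary_key % num_buckets

# Hypothetical primary keys, picked to match the addresses in the text.
primary_keys = [98, 103, 106, 104, 102]
addresses = [bucket_address(key) for key in primary_keys]
```

Note that two of the assumed keys, 98 and 103, hash to the same address 3, which is the kind of collision that the overflow-handling methods later in this section deal with.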

Types of Hashing:
Static Hashing
Dynamic Hashing
Static Hashing
In static hashing, the resultant data bucket address for a given key is always the same. That means if we
generate an address for EMP_ID = 103 using the hash function mod (5), it will always result in the same bucket
address, 3. The bucket address does not change.
Hence, in static hashing, the number of data buckets in memory remains constant throughout. In this
example, we will have five data buckets in memory to store the data.

Operations of Static Hashing


Searching a record
When a record needs to be searched, the same hash function is used to retrieve the address of the bucket where
the data is stored.
Insert a Record
When a new record is inserted into the table, we generate an address for the new record based on the
hash key, and the record is stored at that location.
Delete a Record
To delete a record, we first locate the record to be deleted using the hash function. Then we remove the record
from that address in memory.
Update a Record
To update a record, we first search for it using the hash function, and then the data record is updated.
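
The four operations can be sketched with a small class. The class name StaticHashFile and its methods are hypothetical, and the sketch deliberately assumes that buckets never fill up, so it ignores bucket overflow.

```python
class StaticHashFile:
    """Toy static hash file: a fixed number of buckets, mod hash function."""

    def __init__(self, num_buckets=5):
        self.num_buckets = num_buckets
        # The number of buckets never changes after creation.
        self.buckets = [dict() for _ in range(num_buckets)]

    def _bucket(self, key):
        # The same hash function is used by every operation.
        return self.buckets[key % self.num_buckets]

    def insert(self, key, record):
        self._bucket(key)[key] = record

    def search(self, key):
        return self._bucket(key).get(key)

    def update(self, key, record):
        bucket = self._bucket(key)
        if key not in bucket:
            return False          # nothing to update
        bucket[key] = record
        return True

    def delete(self, key):
        # True if a record was removed, False if the key was absent.
        return self._bucket(key).pop(key, None) is not None

table = StaticHashFile()
table.insert(103, "Alice")        # 103 mod 5 = 3, so the record lands in bucket 3
table.update(103, "Alice B.")
```

Every operation starts by recomputing the same bucket address from the key, which is why the address for a given key must never change in static hashing.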
Suppose we want to insert a new record into the file, but the data bucket address generated by the hash function
is not empty, that is, data already exists at that address. This situation in static hashing is known as bucket
overflow. It is a critical situation in this method.
There are various methods to overcome this situation. Some commonly used methods are as follows:
1. Open Hashing
When a hash function generates an address at which data is already stored, the next available bucket is
allocated to the record. This mechanism is called Linear Probing.
For example: suppose R3 is a new record which needs to be inserted, and the hash function generates the address
112 for R3. But the bucket at that address is already full. So the system searches for the next available data
bucket, 113, and assigns R3 to it.
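
Linear probing can be sketched as follows. The table of five single-record slots, the mod hash function and the keys are assumptions for illustration; they do not correspond to the addresses 112 and 113 used in the example above.

```python
def insert_linear_probing(table, key, record):
    """Store record in the first free slot at or after the home address.

    Returns the slot that was used. The probe wraps around to the start
    of the table when it reaches the end.
    """
    size = len(table)
    home = key % size                     # address produced by the hash function
    for step in range(size):
        slot = (home + step) % size
        if table[slot] is None:
            table[slot] = (key, record)
            return slot
    raise OverflowError("every bucket is occupied")

table = [None] * 5
slot_r1 = insert_linear_probing(table, 103, "R1")   # home address 3 is free
slot_r3 = insert_linear_probing(table, 98, "R3")    # home address 3 is taken
```

In this sketch the second record hashes to the occupied address 3 and is placed in the next free slot, 4.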

2. Closed Hashing
When a bucket is full, a new data bucket is allocated for the same hash result and is linked after the
previous one. This mechanism is known as Overflow chaining.
For example: suppose R3 is a new record which needs to be inserted into the table, and the hash function
generates the address 110 for it. But this bucket is full and cannot store the new data. In this case, a new bucket
is added at the end of bucket 110 and is linked to it.
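
Overflow chaining can be sketched with a linked chain of buckets. The Bucket class, the capacity of two records per bucket and the record names are hypothetical choices for illustration.

```python
class Bucket:
    """A fixed-capacity bucket with a link to an optional overflow bucket."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []
        self.overflow = None      # next bucket in the chain, if any

def insert_chained(bucket, record):
    """Insert record into the first bucket of the chain that has room."""
    while len(bucket.records) >= bucket.capacity:
        if bucket.overflow is None:
            # Allocate a new bucket and link it after the full one.
            bucket.overflow = Bucket(bucket.capacity)
        bucket = bucket.overflow
    bucket.records.append(record)

home = Bucket(capacity=2)         # stands in for the bucket at one hash address
for name in ("R1", "R2", "R3"):
    insert_chained(home, name)
```

A search for a record with this hash address would have to walk the same chain, so long chains make lookups slower.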
Dynamic Hashing
The dynamic hashing method is used to overcome the problems of static hashing, such as bucket overflow.
In this method, data buckets grow or shrink as the number of records increases or decreases. This method is also
known as the extendible hashing method.
This method makes hashing dynamic, i.e., it allows insertion or deletion without resulting in poor performance.
How to search a key
First, calculate the hash address of the key.
Check how many bits are used in the directory; call this number i.
Take the least significant i bits of the hash address. This gives an index into the directory.
Using this index, go to the directory and find the address of the bucket where the record might be.
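
The lookup steps above can be sketched as a bit mask followed by a directory access. The directory contents and the function names are hypothetical; only the use of the least significant i bits comes from the text.

```python
def directory_index(hash_address, i):
    """Keep only the least significant i bits of the hash address."""
    return hash_address & ((1 << i) - 1)

def find_bucket(directory, hash_address, i):
    """Return the bucket that the directory entry for this address points to."""
    return directory[directory_index(hash_address, i)]

# Hypothetical directory that uses i = 2 bits, so it has four entries.
directory = ["B0", "B1", "B2", "B3"]
bucket = find_bucket(directory, 0b10001, 2)   # the last two bits are 01
```

The record is then searched for inside the returned bucket; the directory only narrows down where it might be.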
How to insert a new record
First, follow the same procedure as for retrieval, ending up in some bucket.
If there is still space in that bucket, place the record in it.
If the bucket is full, split the bucket and redistribute the records.
For example:
Consider the following grouping of keys into buckets, depending on the last bits of their hash address:

The last two bits of the hash addresses of keys 2 and 4 are 00, so they go into bucket B0. The last two bits for
5 and 6 are 01, so they go into bucket B1. The last two bits for 1 and 3 are 10, so they go into bucket B2. The
last two bits for 7 are 11, so it goes into B3.
Insert key 9 with hash address 10001 into the above structure:
Since key 9 has hash address 10001, whose last two bits are 01, it must go into bucket B1. But bucket B1 is full,
so it is split.
The split uses the last three bits: for 5 and 9 they are 001, so these keys go into bucket B1, while for 6 they
are 101, so it goes into bucket B5.
Keys 2 and 4 are still in B0. Bucket B0 is pointed to by the directory entries 000 and 100, because the last two
bits of both entries are 00.
Keys 1 and 3 are still in B2. Bucket B2 is pointed to by the directory entries 010 and 110, because the last two
bits of both entries are 10.
Key 7 is still in B3. Bucket B3 is pointed to by the directory entries 111 and 011, because the last two bits of
both entries are 11.
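
The insert with a bucket split can be sketched as below. This is a simplified illustration, not a complete extendible hashing implementation: the class names, the bucket capacity of two, and the 5-bit hash addresses are assumptions. The addresses were chosen only to agree with the last bits quoted in the example, since the tutorial's table is not reproduced here.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth    # number of low-order bits shared by all keys here
        self.items = {}       # key -> hash address

class ExtendibleHash:
    def __init__(self, bits, capacity, hash_of):
        self.bits = bits              # the directory uses this many bits (i)
        self.capacity = capacity      # records per bucket
        self.hash_of = hash_of        # lookup table: key -> hash address
        self.directory = [Bucket(bits) for _ in range(1 << bits)]

    def _index(self, key):
        return self.hash_of[key] & ((1 << self.bits) - 1)

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        if len(bucket.items) < self.capacity:
            bucket.items[key] = self.hash_of[key]
            return
        if all(h == self.hash_of[key] for h in bucket.items.values()):
            raise OverflowError("identical hash addresses cannot be split apart")
        if bucket.depth == self.bits:
            # Double the directory; each new entry mirrors its old counterpart.
            self.directory = self.directory + self.directory
            self.bits += 1
        # Split the full bucket on one more bit of the hash address.
        bucket.depth += 1
        sibling = Bucket(bucket.depth)
        high_bit = 1 << (bucket.depth - 1)
        for k, h in list(bucket.items.items()):
            if h & high_bit:
                sibling.items[k] = bucket.items.pop(k)
        for idx, entry in enumerate(self.directory):
            if entry is bucket and idx & high_bit:
                self.directory[idx] = sibling
        self.insert(key)              # retry now that there may be room

# Hypothetical hash addresses, consistent with the bits described above.
hash_of = {1: 0b11010, 2: 0b00000, 3: 0b11110, 4: 0b00000,
           5: 0b01001, 6: 0b10101, 7: 0b10111, 9: 0b10001}
table = ExtendibleHash(bits=2, capacity=2, hash_of=hash_of)
for key in (1, 2, 3, 4, 5, 6, 7, 9):
    table.insert(key)
```

After key 9 is inserted, the directory in this sketch uses three bits: entry 001 holds keys 5 and 9, entry 101 holds key 6, and the entries 000 and 100 still share the same bucket, as in the walkthrough above.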

Advantages of dynamic hashing


In this method, performance does not decrease as the data grows in the system. The method simply increases the
amount of memory used to accommodate the data.
In this method, memory is well utilized because it grows and shrinks with the data. No memory is left lying
unused.
This method is good for dynamic databases where data grows and shrinks frequently.
Disadvantages of dynamic hashing
In this method, as the data size increases, the number of buckets also increases. The addresses of the data are
maintained in the bucket address table, because data addresses keep changing as buckets grow
and shrink. If there is a huge increase in data, maintaining the bucket address table becomes tedious.
Bucket overflow can still occur in this method, although it usually takes longer to reach that situation than in
static hashing.
