DBMS Tutorial
What is Data?
Data is a collection of distinct, small units of information. It can take a variety of forms such as text, numbers,
media, and bytes, and it can be stored on paper, in electronic memory, and so on.
The word 'data' originates from the word 'datum', which means 'a single piece of information'; 'data' is the plural
of 'datum'.
In computing, data is information that has been translated into a form that is efficient to move and process.
What is Database?
A database is an organized collection of data, so that the data can be easily accessed and managed.
You can organize data into tables, rows, and columns, and index it to make it easier to find relevant information.
Database handlers create a database in such a way that a single set of software programs provides access to the
data for all users.
The main purpose of a database is to operate on a large amount of information by storing, retrieving, and
managing data.
There are many dynamic websites on the World Wide Web nowadays that are handled through databases. For
example, a system that checks the availability of rooms in a hotel is a dynamic website that uses a database.
There are many databases available like MySQL, Sybase, Oracle, MongoDB, Informix, PostgreSQL, SQL
Server, etc.
Modern databases are managed by the database management system (DBMS).
SQL, or Structured Query Language, is used to operate on the data stored in a database. SQL is based on
relational algebra and tuple relational calculus.
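As a sketch of the idea, here is a SQL query run through Python's built-in sqlite3 module; the in-memory database and the rooms table are hypothetical stand-ins for a real database server:

```python
import sqlite3

# In-memory database as a stand-in for a hotel's database (hypothetical example)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rooms (number INTEGER, available INTEGER)")
conn.executemany("INSERT INTO rooms VALUES (?, ?)", [(101, 1), (102, 0), (103, 1)])

# SQL operates on the stored data: here, find the available rooms
available = conn.execute(
    "SELECT number FROM rooms WHERE available = 1"
).fetchall()
print(available)  # [(101,), (103,)]
conn.close()
```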
In diagrams, a database is conventionally depicted as a cylinder.
Evolution of Databases
The database has completed more than 50 years of evolution, from flat-file systems to relational and
object-relational systems. It has gone through several generations.
The Evolution
File-Based
File-based databases were introduced in 1968. In a file-based database, data was maintained in flat files.
Files have some advantages: the file system offers various access methods, e.g., sequential, indexed, and
random.
However, there are several limitations. File-based systems require extensive programming in a third-generation
language such as COBOL or BASIC.
Hierarchical Data Model
1968-1980 was the era of the hierarchical database. The most prominent hierarchical database model was IBM's
first DBMS, called IMS (Information Management System).
In this model, files are related in a parent/child manner.
The diagram below represents the hierarchical data model; small circles represent objects.
Like the file system, this model also had limitations: complex implementation, lack of structural independence,
inability to handle many-to-many relationships easily, etc.
Network data model
Charles Bachman developed the first DBMS, called the Integrated Data Store (IDS), at General Electric in the
early 1960s. It was standardized in 1971 by the CODASYL group (Conference on Data Systems
Languages).
In this model, files are related as owners and members, following the network model.
Network data model identified the following components:
Network schema (Database organization)
Sub-schema (views of database per user)
Data management language (procedural)
This model also had limitations, such as system complexity and difficulty of design and maintenance.
Relational Database
1970 - Present: It is the era of Relational Database and Database Management. In 1970, the relational model
was proposed by E.F. Codd.
The relational database model has two main terminologies: instance and schema.
The instance is a table with rows and columns.
The schema specifies the structure, such as the name of the relation and the name and type of each column.
This model uses mathematical concepts such as set theory and predicate logic.
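A minimal sketch of the schema/instance distinction, using Python's built-in sqlite3 module (the student table here is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The schema: the structure, i.e., the relation name plus column names and types
conn.execute("CREATE TABLE student (id INTEGER, name TEXT)")

# An instance: the rows the table holds at this particular moment
conn.execute("INSERT INTO student VALUES (1, 'Ajeet')")
conn.execute("INSERT INTO student VALUES (2, 'Aryan')")

rows = conn.execute("SELECT * FROM student").fetchall()
print(rows)  # [(1, 'Ajeet'), (2, 'Aryan')]
conn.close()
```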
The first Internet database application was created in 1995.
During the era of the relational database, many more models were introduced, such as the object-oriented
model and the object-relational model.
Cloud database
A cloud database lets you store, manage, and retrieve structured and unstructured data via a cloud platform;
the data is accessible over the Internet. Cloud databases are also called database as a service
(DBaaS) because they are offered as a managed service.
Some best cloud options are:
AWS (Amazon Web Services)
Snowflake Computing
Oracle Database Cloud Services
Microsoft SQL server
Google cloud spanner
Advantages of cloud database
Lower costs
Generally, the company does not have to invest in database infrastructure of its own; the cloud provider
maintains and supports one or more data centers.
Automated
Cloud databases are enriched with a variety of automated processes such as recovery, failover, and auto-
scaling.
Increased accessibility
You can access your cloud-based database from any location, anytime. All you need is just an internet
connection.
NoSQL Database
A NoSQL database is an approach to database design that can accommodate a wide variety of data
models. NoSQL stands for "not only SQL." It is an alternative to traditional relational databases, in which data is
placed in tables and the data schema is carefully designed before the database is built.
NoSQL databases are useful for a large set of distributed data.
Some examples of NoSQL database system with their category are:
MongoDB, CouchDB, Cloudant (Document-based)
Memcached, Redis, Coherence (key-value store)
HBase, Big Table, Accumulo (Tabular)
Advantage of NoSQL
High Scalability
NoSQL can handle an extensive amount of data because of its scalability. As the data grows, a NoSQL database
scales out to handle it efficiently.
High Availability
NoSQL supports auto-replication. Auto-replication makes it highly available because, in case of a failure, the
data is restored to a previous consistent state.
Disadvantage of NoSQL
Open source
NoSQL databases are open source, and there is no reliable standard for NoSQL yet.
Management challenge
Data management in NoSQL is much more complicated than in relational databases. NoSQL databases can be
challenging to install and even more hectic to manage on a daily basis.
GUI is not available
GUI tools for NoSQL database are not easily available in the market.
Backup
Backup is a weak point for NoSQL databases. Some databases, like MongoDB, have no powerful
approach to data backup.
The Object-Oriented Databases
Object-oriented databases contain data in the form of objects and classes. Objects are real-world entities,
and classes are collections of objects. An object-oriented database combines relational model features
with object-oriented principles; it is an alternative implementation to the relational model.
Object-oriented databases follow the rules of object-oriented programming. An object-oriented database
management system is a hybrid application.
The object-oriented database model contains the following properties.
Object-oriented programming properties
Objects
Classes
Inheritance
Polymorphism
Encapsulation
Relational database properties
Atomicity
Consistency
Integrity
Durability
Concurrency
Query processing
Graph Databases
A graph database is a NoSQL database that represents data graphically: it contains nodes and edges, where a
node represents an entity and each edge represents a relationship between two nodes. Every node in a graph
database has a unique identifier.
Graph databases are beneficial for searching the relationship between data because they highlight the
relationship between relevant data.
Graph databases are very useful when the database contains a complex relationship and dynamic schema.
It is mostly used in areas such as supply chain management and identifying the source of IP telephony calls.
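The node-and-edge idea can be sketched in plain Python; this is illustrative only, since a real graph database adds indexing, a query language, and persistence on top of this structure:

```python
# Nodes are entities; edges are (source, relationship, destination) triples.
nodes = {"alice", "bob", "acme"}
edges = [
    ("alice", "works_at", "acme"),
    ("bob", "works_at", "acme"),
    ("alice", "knows", "bob"),
]

# Traversing relationships is a matter of following the edges from a node
def related(node, relation):
    return [dst for src, rel, dst in edges if src == node and rel == relation]

print(related("alice", "knows"))  # ['bob']
```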
DBMS (Data Base Management System)
A database management system is software that is used to store and retrieve a database. For example,
Oracle, MySQL, etc., are some popular DBMS tools.
DBMS provides the interface to perform the various operations like creation, deletion, modification, etc.
DBMS allows the user to create their databases as per their requirement.
DBMS accepts the request from the application and provides specific data through the operating system.
DBMS contains the group of programs which acts according to the user instruction.
It provides security to the database.
Advantage of DBMS
Controls redundancy
It stores all the data in a single database file, so it can control data redundancy.
Data sharing
An authorized user can share the data among multiple users.
Backup
It provides a backup and recovery subsystem, which creates automatic backups and restores the data after a
system failure if required.
Multiple user interfaces
It provides different types of user interfaces, such as GUIs and application program interfaces.
Disadvantage of DBMS
Size
It occupies large disk space and large memory to run efficiently.
Cost
DBMS requires a high-speed data processor and larger memory to run DBMS software, so it is costly.
Complexity
DBMS creates additional complexity and requirements.
RDBMS (Relational Database Management System)
RDBMS stands for 'Relational Database Management System.' Data is represented as tables that contain rows
and columns.
RDBMS is based on the Relational model; it was introduced by E. F. Codd.
A relational database contains the following components:
Table
Record/ Tuple
Field/Column name /Attribute
Instance
Schema
Keys
An RDBMS is a tabular DBMS that maintains the security, integrity, accuracy, and consistency of the data.
Types of Databases
There are various types of databases used for storing different varieties of data:
1) Centralized Database
This type of database stores data in a centralized database system. It lets users access the
stored data from different locations through several applications. These applications include an authentication
process so users can access the data securely. An example of a centralized database is a central library that
carries a central database of every library in a college or university.
Advantages of Centralized Database
It decreases the risk of data management, i.e., manipulation of data will not affect the core data.
Data consistency is maintained as it manages data in a central repository.
It provides better data quality, which enables organizations to establish data standards.
It is less costly because fewer vendors are required to handle the data sets.
Disadvantages of Centralized Database
The size of the centralized database is large, which increases the response time for fetching the data.
It is not easy to update such an extensive database system.
If a server failure occurs, the entire data set could be lost, which would be a huge loss.
2) Distributed Database
Unlike a centralized database system, in distributed systems, data is distributed among different database
systems of an organization. These database systems are connected via communication links. Such links help the
end-users to access the data easily. Examples of the Distributed database are Apache Cassandra, HBase,
Ignite, etc.
We can further divide a distributed database system into:
Homogeneous DDB: database systems that execute on the same operating system, use the same
application process, and run on the same hardware devices.
Heterogeneous DDB: database systems that execute on different operating systems, under different
application procedures, and on different hardware devices.
Advantages of Distributed Database
Modular development is possible in a distributed database, i.e., the system can be expanded by including new
computers and connecting them to the distributed system.
One server failure will not affect the entire data set.
3) Relational Database
This database is based on the relational data model, which stores data in the form of rows (tuples) and
columns (attributes), which together form a table (relation). A relational database uses SQL for storing,
manipulating, and maintaining the data. E.F. Codd introduced this model in 1970. Each table in the
database carries a key that makes its data unique from other tables. Examples of relational databases are
MySQL, Microsoft SQL Server, Oracle, etc.
Properties of Relational Database
There are four commonly known properties of a relational model, known as the ACID properties, where:
A means Atomicity: This ensures the data operation will complete either with success or with failure. It follows the
'all or nothing' strategy. For example, a transaction will either be committed or will abort.
C means Consistency: If we perform any operation over the data, its value before and after the operation should
be preserved. For example, the account balance before and after the transaction should be correct, i.e., it should
remain conserved.
I means Isolation: there can be concurrent users accessing data at the same time from the database, so
transactions should remain isolated from each other. For example, when multiple transactions occur at the
same time, one transaction's effects should not be visible to the other transactions in the database.
D means Durability: it ensures that once a transaction completes its operation and commits the data, the
changes remain permanent.
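Atomicity in particular can be demonstrated with a short transaction sketch using Python's built-in sqlite3 module; the account table and the simulated failure are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 100)])
conn.commit()

# Atomicity: the debit and the credit must succeed together or not at all
try:
    conn.execute("UPDATE account SET balance = balance - 10 WHERE name = 'A'")  # debit
    raise RuntimeError("simulated crash")  # failure strikes before the credit
    # conn.execute("UPDATE account SET balance = balance + 10 WHERE name = 'B'")  # credit (never reached)
except RuntimeError:
    conn.rollback()  # all or nothing: the partial debit is undone

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # {'A': 100, 'B': 100}
conn.close()
```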
4) NoSQL Database
Non-SQL/Not Only SQL is a type of database that is used for storing a wide range of data sets. It is not a
relational database as it stores data not only in tabular form but in several different ways. It came into existence
when the demand for building modern applications increased. Thus, NoSQL presented a wide variety of
database technologies in response to the demands. We can further divide a NoSQL database into the following
four types:
Key-value storage: the simplest type of database storage, in which every single item is stored as a key (or
attribute name) together with its value.
Document-oriented Database: A type of database used to store data as JSON-like document. It helps developers
in storing data by using the same document-model format as used in the application code.
Graph Databases: It is used for storing vast amounts of data in a graph-like structure. Most commonly, social
networking websites use the graph database.
Wide-column stores: It is similar to the data represented in relational databases. Here, data is stored in large
columns together, instead of storing in rows.
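The key-value flavor above can be sketched with an ordinary Python dictionary; this is illustrative only, since real key-value stores such as Redis add persistence, networking, and expiry:

```python
# Sketch of a key-value store: every item is a key holding its value,
# with no fixed schema shared across values.
store = {}

store["user:1"] = {"name": "Ajeet", "course": "B.Tech"}  # document-like value
store["session:9f2"] = "active"                          # plain string value

# Lookup is by key only; there is no query over columns
print(store["user:1"]["name"])  # Ajeet
```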
Advantages of NoSQL Database
It enables good productivity in application development, as it does not require data to be stored in a structured
format.
It is a better option for managing and handling large data sets.
It provides high scalability.
Users can quickly access data from the database through key-value.
5) Cloud Database
A type of database where data is stored in a virtual environment and executes over the cloud computing
platform. It provides users with various cloud computing services (SaaS, PaaS, IaaS, etc.) for accessing the
database. There are numerous cloud platforms, but the best options are:
Amazon Web Services(AWS)
Microsoft Azure
Kamatera
PhoenixNAP
ScienceSoft
Google Cloud SQL, etc.
6) Object-oriented Databases
The type of database that uses the object-based data model approach for storing data in the database system.
The data is represented and stored as objects which are similar to the objects used in the object-oriented
programming language.
7) Hierarchical Databases
It is the type of database that stores data in parent-child relationship nodes, organizing the data in a tree-like
structure.
Data is stored as records that are connected via links. Each child record in the tree has only one parent, while
each parent record can have multiple child records.
8) Network Databases
It is the database that typically follows the network data model. Here, the representation of data is in the form of
nodes connected via links between them. Unlike the hierarchical database, it allows each record to have multiple
children and parent nodes to form a generalized graph structure.
9) Personal Database
A personal database collects and stores data on the user's own system. This database is basically designed for
a single user.
Advantage of Personal Database
It is simple and easy to handle.
It occupies less storage space as it is small in size.
10) Operational Database
This type of database creates and updates data in real time. It is basically designed for executing and handling
the daily data operations of businesses. For example, an organization uses operational databases to manage
its day-to-day transactions.
11) Enterprise Database
Large organizations or enterprises use this database for managing a massive amount of data. It helps
organizations to increase and improve their efficiency. Such a database allows simultaneous access to users.
Advantages of Enterprise Database:
Multiple processes are supported over the enterprise database.
It allows executing parallel queries on the system.
What is RDBMS
RDBMS stands for Relational Database Management System.
All modern database management systems, like MS SQL Server, IBM DB2, Oracle, MySQL, and Microsoft
Access, are based on RDBMS.
It is called Relational Data Base Management System (RDBMS) because it is based on relational model
introduced by E.F. Codd.
How it works
Data is represented in terms of tuples (rows) in RDBMS.
The relational database is the most commonly used kind of database. It contains a number of tables, and each
table has its own primary key.
Because the data is organized as a set of tables, it can be accessed easily in an RDBMS.
Brief History of RDBMS
Between 1970 and 1972, E.F. Codd published a paper proposing the use of a relational database model.
RDBMS is originally based on E.F. Codd's relational model.
What is table
The RDBMS database uses tables to store data. A table is a collection of related data entries and contains rows
and columns to store data.
A table is the simplest example of data storage in RDBMS.
Let's see the example of student table.
id    name      age    course
1     Ajeet     24     B.Tech
2     Aryan     20     C.A
3     Mahesh    21     BCA
4     Ratan     22     MCA
5     Vimal     26     BSC
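The student table above can be created and queried with Python's built-in sqlite3 module, assuming columns id, name, age, and course (a sketch, not any specific product's syntax):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, course TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(1, "Ajeet", 24, "B.Tech"), (2, "Aryan", 20, "C.A"),
     (3, "Mahesh", 21, "BCA"), (4, "Ratan", 22, "MCA"), (5, "Vimal", 26, "BSC")],
)

rows = conn.execute("SELECT * FROM student ORDER BY id").fetchall()
for row in rows:
    print(row)  # each row is one record, e.g. (1, 'Ajeet', 24, 'B.Tech')
conn.close()
```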
What is field
A field is a smaller entity of the table that contains specific information about every record. In the
above example, the fields in the student table are id, name, age, and course.
What is record
A record, also called a row, is one complete set of field values in the table, for example:
1     Ajeet     24     B.Tech
What is column
A column is a vertical entity in the table which contains all information associated with a specific field in a table.
For example: "name" is a column in the above table which contains all information about student's name.
Ajeet
Aryan
Mahesh
Ratan
Vimal
NULL Values
A NULL value in a table means that the field was left blank during record creation. It is totally different
from a value of zero or a field that contains a space.
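The difference between NULL, zero, and a space can be seen in a short sqlite3 sketch (the table and values are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (label TEXT, val)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [("null", None), ("zero", 0), ("space", " ")])

# NULL, 0, and ' ' are three different things: only NULL means "no value"
nulls = conn.execute("SELECT label FROM t WHERE val IS NULL").fetchall()
print(nulls)  # [('null',)]
conn.close()
```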
Data Integrity
There are the following categories of data integrity exist with each RDBMS:
Entity integrity: It specifies that there should be no duplicate rows in a table.
Domain integrity: It enforces valid entries for a given column by restricting the type, the format, or the range of
values.
Referential integrity: it specifies that rows that are referenced by other records cannot be deleted.
User-defined integrity: It enforces some specific business rules that are defined by users. These rules are
different from entity, domain or referential integrity.
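The referential integrity rule above can be sketched with Python's built-in sqlite3 module; note that SQLite only enforces foreign keys when the pragma is enabled, and the tables here are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE enrolment (student_id INTEGER REFERENCES student(id))")
conn.execute("INSERT INTO student VALUES (1)")
conn.execute("INSERT INTO enrolment VALUES (1)")

# Referential integrity: a row still referenced elsewhere cannot be deleted
try:
    conn.execute("DELETE FROM student WHERE id = 1")
    deleted = True
except sqlite3.IntegrityError as e:
    deleted = False
    print("rejected:", e)  # FOREIGN KEY constraint failed
conn.close()
```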
Difference between DBMS and RDBMS
Although DBMS and RDBMS are both used to store information in a physical database, there are some
notable differences between them.
The main differences between DBMS and RDBMS are given below:
1) DBMS applications store data as files.
   RDBMS applications store data in a tabular form.
2) In DBMS, data is generally stored in either a hierarchical form or a navigational form.
   In RDBMS, the tables have an identifier called a primary key, and the data values are stored in the form of tables.
3) DBMS does not apply any security with regard to data manipulation.
   RDBMS defines integrity constraints for the purpose of the ACID (Atomicity, Consistency, Isolation, and Durability) properties.
4) DBMS uses a file system to store data, so there will be no relation between the tables.
   In RDBMS, data values are stored in the form of tables, so a relationship between these data values is stored in the form of a table as well.
5) DBMS has to provide some uniform methods to access the stored information.
   RDBMS supports a tabular structure of the data and relationships between the tables to access the stored information.
6) DBMS is meant for small organizations and deals with small data; it supports a single user.
   RDBMS is designed to handle a large amount of data; it supports multiple users.
7) Examples of DBMS are file systems, XML, etc.
   Examples of RDBMS are MySQL, PostgreSQL, SQL Server, Oracle, etc.
After observing the differences between DBMS and RDBMS, you can say that RDBMS is an extension of DBMS.
Many software products on the market today are compatible with both DBMS and RDBMS, meaning that today
an RDBMS application is a DBMS application and vice versa.
DBMS vs. File System
File System Approach
File-based systems were an early attempt to computerize manual record keeping. This traditional approach
was decentralized: each department stored and controlled its own data with the help of a data processing
specialist, whose main role was to create the necessary computer file structures, manage the data within those
structures, and design application programs that produce reports based on the file data.
Meaning: DBMS is a collection of data in which the user is not required to write procedures for managing the
database. The file system is a collection of data in which the user has to write the procedures for managing the
data.
Sharing of data: In DBMS, due to the centralized approach, data sharing is easy. In a file system, data is
distributed across many files, possibly in different formats, so it isn't easy to share data.
Data abstraction: DBMS gives an abstract view of data that hides the details. The file system exposes the
details of data representation and storage.
Security and protection: DBMS provides a good protection mechanism. It isn't easy to protect a file under the
file system.
Recovery mechanism: DBMS provides a crash recovery mechanism, i.e., it protects the user from system
failure. The file system doesn't have a crash recovery mechanism: if the system crashes while entering data,
the content of the file is lost.
Manipulation techniques: DBMS contains a wide variety of sophisticated techniques to store and retrieve data.
The file system can't store and retrieve data efficiently.
Concurrency problems: DBMS takes care of concurrent access to data using some form of locking. In the file
system, concurrent access has many problems, such as one user reading a file while another is deleting or
updating information in it.
Where to use: The database approach is used in large systems that interrelate many files. The file system
approach is used in small systems with few interrelated files.
Cost: The database system is expensive to design. The file system approach is cheaper to design.
Data redundancy and inconsistency: Due to the centralization of the database, the problems of data
redundancy and inconsistency are controlled. In the file system, files and application programs are created by
different programmers, so there is a lot of duplication of data, which may lead to inconsistency.
Structure: The database structure is complex to design. The file system approach has a simple structure.
Data independence: In the database system, data independence exists, and it can be of two types: logical data
independence and physical data independence. In the file system approach, there is no data independence.
Integrity constraints: Integrity constraints are easy to apply in DBMS. They are difficult to implement in a file
system.
Data models: In the database approach, three types of data models exist: hierarchical, network, and relational.
In the file system approach, there is no concept of data models.
Flexibility: Changes to the content of the stored data are often necessary, and such changes are easier to
make with a database approach. The file system is less flexible than the DBMS approach.
DBMS Architecture
The DBMS design depends upon its architecture. The basic client/server architecture is used to deal with a large
number of PCs, web servers, database servers and other components that are connected with networks.
The client/server architecture consists of many PCs and a workstation which are connected via the network.
DBMS architecture depends upon how users are connected to the database to get their request done.
Types of DBMS Architecture
Database architecture can be seen as single-tier or multi-tier, but logically it is of two types: 2-tier architecture
and 3-tier architecture.
1-Tier Architecture
In this architecture, the database is directly available to the user, who sits directly at the DBMS and uses it.
Any changes done here are made directly on the database itself, and this setup doesn't provide a handy tool for
end users. The 1-tier architecture is used for the development of local applications, where programmers can
communicate directly with the database for a quick response.
2-Tier Architecture
The 2-tier architecture is the same as a basic client-server setup. In the two-tier architecture, applications on
the client end communicate directly with the database on the server side. APIs like ODBC and JDBC are used
for this interaction.
The user interfaces and application programs are run on the client-side.
The server side is responsible to provide the functionalities like: query processing and transaction management.
To communicate with the DBMS, the client-side application establishes a connection with the server side.
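A rough sketch of the client's connect-then-query pattern, using Python's DB-API with sqlite3 standing in for a networked database server (a real 2-tier client would connect through an ODBC/JDBC driver with a host, port, and credentials):

```python
import sqlite3  # stand-in: a real 2-tier client would use a JDBC/ODBC driver

# Client side: establish a connection, then send SQL to the server
conn = sqlite3.connect(":memory:")  # in 2-tier: host, port, credentials
cur = conn.execute("SELECT 1 + 1")  # server side: query processing
result = cur.fetchone()[0]
print(result)  # 2
conn.close()
```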
Three Schema Architecture
1. Internal Level
The internal level has an internal schema, which describes the physical storage structure of the database.
The internal schema is also known as the physical schema.
It uses the physical data model. It is used to define that how the data will be stored in a block.
The physical level is used to describe complex low-level data structures in detail.
The internal level is generally concerned with the following activities:
Storage space allocations.
For Example: B-Trees, Hashing etc.
Access paths.
For Example: Specification of primary and secondary keys, indexes, pointers and sequencing.
Data compression and encryption techniques.
Optimization of internal structures.
Representation of stored fields.
2. Conceptual Level
The conceptual schema describes the design of a database at the conceptual level. Conceptual level is also
known as logical level.
The conceptual schema describes the structure of the whole database.
The conceptual level describes what data are to be stored in the database and also describes what relationship
exists among those data.
In the conceptual level, internal details such as an implementation of the data structure are hidden.
Programmers and database administrators work at this level.
3. External Level
At the external level, a database contains several schemas, sometimes called subschemas. A subschema is
used to describe a particular view of the database.
An external schema is also known as view schema.
Each view schema describes the part of the database that a particular user group is interested in, and hides the
remaining database from that user group.
The view schema describes the end user interaction with database systems.
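A view schema can be sketched with a SQL view via sqlite3; the table and the choice of hidden column are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER, name TEXT, grade TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'Ajeet', 'A')")

# A view schema exposes only the part of the database one user group needs,
# hiding the rest (here, grades are hidden from this external view)
conn.execute("CREATE VIEW student_public AS SELECT id, name FROM student")

public = conn.execute("SELECT * FROM student_public").fetchall()
print(public)  # [(1, 'Ajeet')]
conn.close()
```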
Mapping between Views
The three levels of DBMS architecture don't exist independently of each other; there must be a correspondence
between them. The DBMS is responsible for maintaining the correspondence between the three types of
schema. This correspondence is called mapping.
There are basically two types of mapping in the database architecture:
Conceptual/ Internal Mapping
External / Conceptual Mapping
Conceptual/ Internal Mapping
The Conceptual/ Internal Mapping lies between the conceptual level and the internal level. Its role is to define the
correspondence between the records and fields of the conceptual level and files and data structures of the
internal level.
External/ Conceptual Mapping
The external/conceptual mapping lies between the external level and the conceptual level. Its role is to define
the correspondence between a particular external view and the conceptual view.
Data Models
A data model is the modeling of the data description, data semantics, and consistency constraints of the data. It
provides the conceptual tools for describing the design of a database at each level of data abstraction. The
following four data models are used for understanding the structure of a database:
1) Relational Data Model: this model organizes data in the form of rows and columns within a table. Thus, a
relational model uses tables to represent data and the relationships among it; tables are also called relations.
This model was initially described by Edgar F. Codd in 1969. The relational data model is the most widely used
model and is primarily used by commercial data processing applications.
2) Entity-Relationship Data Model: an ER model is the logical representation of data as objects and the
relationships among them. The objects are known as entities, and a relationship is an association among these
entities. This model was designed by Peter Chen and published in a 1976 paper. It was widely used in database
design. A set of attributes describes each entity; for example, student_name and student_id describe the
'student' entity. A set of entities of the same type is known as an 'entity set', and a set of relationships of the
same type is known as a 'relationship set'.
3) Object-based Data Model: an extension of the ER model with notions of functions, encapsulation, and object
identity. This model supports a rich type system that includes structured and collection types. In the 1980s,
various database systems following the object-oriented approach were developed. Here, the objects are the
data, carrying their properties.
4) Semistructured Data Model: this type of data model is different from the other three data models described
above. The semistructured data model allows data specifications in which individual data items of the same
type may have different sets of attributes. The Extensible Markup Language (XML) is widely used for
representing semistructured data. Although XML was initially designed for adding markup information to text
documents, it gained importance because of its application in the exchange of data.
Data model Schema and Instance
The data which is stored in the database at a particular moment of time is called an instance of the database.
The overall design of a database is called schema.
A database schema is the skeleton structure of the database. It represents the logical view of the entire
database.
A schema contains schema objects like table, foreign key, primary key, views, columns, data types, stored
procedure, etc.
A database schema can be represented using a visual diagram that shows the database objects and the
relationships among them.
A database schema is designed by the database designers to help programmers whose software will interact
with the database. The process of database creation is called data modeling.
A schema diagram can display only some aspects of a schema, such as the names of record types, data types,
and constraints; other aspects can't be specified through the schema diagram. For example, the given figure
shows neither the data type of each data item nor the relationships among the various files.
In the database, actual data changes quite frequently. For example, in the given figure, the database changes
whenever we add a new grade or add a student. The data at a particular moment of time is called the instance of
the database.
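The schema/instance distinction above can be sketched with SQLite: the CREATE TABLE statement is the schema, while the rows at any moment form an instance. This is a minimal sketch; the table, columns, and data are made up for illustration.

```python
import sqlite3

# The schema is the fixed design; an instance is the data at one moment.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, grade TEXT)")

# Instance 1: the database right after two inserts.
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Asha", "A"), (2, "Ravi", "B")])
instance_1 = conn.execute("SELECT * FROM student ORDER BY student_id").fetchall()

# Adding a student changes the instance, but the schema stays the same.
conn.execute("INSERT INTO student VALUES (3, 'Mina', 'A')")
instance_2 = conn.execute("SELECT * FROM student ORDER BY student_id").fetchall()

# The stored schema definition is unchanged by the inserts.
schema = conn.execute("SELECT sql FROM sqlite_master WHERE name = 'student'").fetchone()[0]
```

Each INSERT produces a new instance of the same schema; only a statement like ALTER TABLE would change the schema itself.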
Data Independence
Data independence can be explained using the three-schema architecture.
Data independence refers to the ability to modify the schema at one level of the database system without altering
the schema at the next higher level.
There are two types of data independence:
1. Logical Data Independence
Logical data independence refers to the ability to change the conceptual schema without having to change the
external schema.
Logical data independence is used to separate the external level from the conceptual view.
If we do any changes in the conceptual view of the data, then the user view of the data would not be affected.
Logical data independence occurs at the user interface level.
2. Physical Data Independence
Physical data independence can be defined as the capacity to change the internal schema without having to
change the conceptual schema.
If we do any changes in the storage size of the database system server, then the Conceptual structure of the
database will not be affected.
Physical data independence is used to separate conceptual levels from the internal levels.
Physical data independence occurs at the logical interface level.
Database Language
A DBMS has appropriate languages and interfaces to express database queries and updates.
Database languages can be used to read, store and update the data in the database.
Types of Database Language
In the above diagram, it can be seen that after crediting $10, the amount is still $100 in account B. So, it is not an
atomic transaction.
The below image shows that both debit and credit operations are done successfully. Thus the transaction is
atomic.
Thus, if atomicity is lost, it becomes a serious issue in banking systems, which is why atomicity is a primary
concern there.
2) Consistency: The word consistency means that the value should always remain preserved. In DBMS, the
integrity of the data must be maintained: if a change is made to the database, it should be preserved, and the
database must remain consistent before and after each transaction. The data should always be correct.
Example:
In the above figure, there are three accounts, A, B, and C, where A makes a transaction T to both B and C one
by one. Two operations take place: debit and credit. Account A first transfers $50 to account B, and before the
transaction, B reads A's balance as $300. After the successful transaction T, the available amount in B becomes
$150. Next, A transfers $20 to account C, and at that time C reads A's balance as $250 (which is correct, since
the $50 debit to B has already completed). The debit and credit operations from account A to C are done
successfully, and the values are read correctly, so the data is consistent. If B and C had both read $300, the
data would be inconsistent, because the effect of the executed debit operation would not be reflected.
3) Isolation: The term 'isolation' means separation. In DBMS, isolation is the property that concurrent operations
must not affect one another: an operation on one piece of data should behave as if it began only after any
operation already running on that data completed. If two operations are performed on two different databases,
they must not affect each other's values. In the case of transactions, when two or more transactions occur
simultaneously, consistency must be maintained. Changes made by a particular transaction are not visible to
other transactions until that change is committed.
Example: If two operations are concurrently running on two different accounts, then the value of both accounts
should not get affected. The value should remain persistent. As you can see in the below diagram, account A is
making T1 and T2 transactions to account B and C, but both are executing independently without affecting each
other. It is known as Isolation.
4) Durability: Durability ensures permanency. In DBMS, durability means that after the successful execution of
an operation, the data becomes permanent in the database. Durability should be robust enough that even if the
system fails or crashes, the database survives. If data is lost, it becomes the recovery manager's responsibility
to restore it and ensure the durability of the database. To make changes permanent, the COMMIT command
must be issued every time we make changes.
Therefore, the ACID property of DBMS plays a vital role in maintaining the consistency and availability of data in
the database.
This was a brief introduction to the ACID properties in DBMS. We also discuss these properties in the
transaction section.
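The bank-transfer discussion above can be sketched with SQLite transactions: a transfer either fully commits or fully rolls back. This is a minimal sketch; the account names, balances, and the transfer helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 100)])
conn.commit()

def transfer(conn, src, dst, amount, fail_midway=False):
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        if fail_midway:
            raise RuntimeError("crash between debit and credit")
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.commit()   # durability: the change is now permanent
    except Exception:
        conn.rollback() # atomicity: the partial debit is undone

transfer(conn, "A", "B", 10, fail_midway=True)   # rolled back, nothing changes
balances_after_crash = dict(conn.execute("SELECT name, balance FROM account"))

transfer(conn, "A", "B", 10)                     # commits, both sides change
balances_after_ok = dict(conn.execute("SELECT name, balance FROM account"))
```

After the simulated crash both balances are still $100; only the committed transfer moves $10 from A to B.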
ER model
ER model stands for Entity-Relationship model. It is a high-level data model used to define the data elements
and relationships for a specified system.
It develops a conceptual design for the database. It also develops a very simple and easy to design view of data.
In ER modeling, the database structure is portrayed as a diagram called an entity-relationship diagram.
For example, suppose we design a school database. In this database, the student is an entity with attributes like
address, name, id, age, etc. The address can be another entity with attributes like city, street name, pin code,
etc., and there will be a relationship between them.
Components of ER Diagram
1. Entity:
An entity may be any object, class, person, or place. In the ER diagram, an entity is represented as a
rectangle.
Consider an organization as an example: manager, product, employee, department, etc. can be taken as
entities.
a. Weak Entity
An entity that depends on another entity is called a weak entity. A weak entity doesn't contain any key attribute
of its own. It is represented by a double rectangle.
2. Attribute
An attribute is used to describe a property of an entity. An ellipse is used to represent an attribute.
For example, id, age, contact number, name, etc. can be attributes of a student.
a. Key Attribute
The key attribute is used to represent the main characteristics of an entity. It represents a primary key. The key
attribute is represented by an ellipse with the text underlined.
b. Composite Attribute
An attribute that is composed of several other attributes is known as a composite attribute. It is represented by
an ellipse, with its component attributes shown as ellipses connected to it.
c. Multivalued Attribute
An attribute that can have more than one value is known as a multivalued attribute. A double oval is used to
represent a multivalued attribute.
For example, a student can have more than one phone number.
d. Derived Attribute
An attribute that can be derived from another attribute is known as a derived attribute. It is represented by a
dashed ellipse.
For example, A person's age changes over time and can be derived from another attribute like Date of birth.
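The age example can be sketched in Python: age is never stored, only derived from the stored Date of Birth whenever it is needed. The dates used are made up for illustration.

```python
from datetime import date

def age_on(dob: date, today: date) -> int:
    """Derive age from Date of Birth -- the derived attribute is computed, not stored."""
    years = today.year - dob.year
    # Subtract one year if this year's birthday has not yet occurred.
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years

# One day before the birthday the person is still 23; on the birthday, 24.
age_before = age_on(date(2000, 6, 15), date(2024, 6, 14))
age_after = age_on(date(2000, 6, 15), date(2024, 6, 15))
```

Because the value changes over time, storing it would make the database stale; deriving it keeps it always correct.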
3. Relationship
A relationship is used to describe the relation between entities. Diamond or rhombus is used to represent the
relationship.
b. One-to-many relationship
When only one instance of the entity on the left is associated with more than one instance of the entity on the
right, the relationship is known as a one-to-many relationship.
For example, a scientist can make many inventions, but each invention is made by only one specific scientist.
c. Many-to-one relationship
When more than one instance of the entity on the left is associated with only one instance of the entity on the
right, the relationship is known as a many-to-one relationship.
For example, a student enrolls in only one course, but a course can have many students.
d. Many-to-many relationship
When more than one instance of the entity on the left is associated with more than one instance of the entity on
the right, the relationship is known as a many-to-many relationship.
For example, an employee can be assigned to many projects, and a project can have many employees.
Notation of ER Diagram
A database can be represented using these notations. In an ER diagram, many notations are used to express
cardinality. These notations are as follows:
Mapping Constraints
A mapping constraint is a data constraint that expresses the number of entities to which another entity can be
related via a relationship set.
It is most useful in describing the relationship sets that involve more than two entity sets.
For binary relationship set R on an entity set A and B, there are four possible mapping cardinalities. These are as
follows:
One to one (1:1)
One to many (1:M)
Many to one (M:1)
Many to many (M:M)
One-to-one
In one-to-one mapping, an entity in E1 is associated with at most one entity in E2, and an entity in E2 is
associated with at most one entity in E1.
One-to-many
In one-to-many mapping, an entity in E1 is associated with any number of entities in E2, and an entity in E2 is
associated with at most one entity in E1.
Many-to-one
In many-to-one mapping, an entity in E1 is associated with at most one entity in E2, and an entity in E2 is
associated with any number of entities in E1.
Many-to-many
In many-to-many mapping, an entity in E1 is associated with any number of entities in E2, and an entity in E2 is
associated with any number of entities in E1.
Keys
Keys play an important role in the relational database. They are used to uniquely identify any record or row of
data in a table, and to establish and identify relationships between tables.
For example, ID is used as a key in the Student table because it is unique for each student. In the PERSON
table, passport_number, license_number, SSN are keys since they are unique for each person.
Types of keys:
1. Primary key
The primary key is used to identify one and only one instance of an entity uniquely. An entity can contain
multiple candidate keys, as we saw in the PERSON table; the most suitable key among them becomes the
primary key.
In the EMPLOYEE table, ID can be the primary key since it is unique for each employee. In the EMPLOYEE
table, we can even select License_Number and Passport_Number as primary keys since they are also unique.
For each entity, the primary key selection is based on requirements and developers.
2. Candidate key
A candidate key is an attribute or set of attributes that can uniquely identify a tuple.
All attributes (or attribute sets) that can uniquely identify tuples are candidate keys; apart from the one chosen
as the primary key, the remaining ones are still candidate keys. Candidate keys are as strong as the primary
key.
For example: In the EMPLOYEE table, id is best suited for the primary key. The rest of the attributes, like SSN,
Passport_Number, License_Number, etc., are considered a candidate key.
3. Super Key
Super key is an attribute set that can uniquely identify a tuple. A super key is a superset of a candidate key.
For example: In the above EMPLOYEE table, for (EMPLOYEE_ID, EMPLOYEE_NAME), the names of two
employees can be the same, but their EMPLOYEE_ID can't be the same. Hence, this combination can also be a
key.
Super keys would be EMPLOYEE_ID, (EMPLOYEE_ID, EMPLOYEE_NAME), etc.
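The key definitions above can be sketched as a uniqueness check: an attribute set is a (super)key exactly when projecting the relation onto it yields no duplicates. The rows and column names below are made up for illustration.

```python
# A tiny relation as a list of dicts; the data is illustrative only.
rows = [
    {"EMPLOYEE_ID": 1, "EMPLOYEE_NAME": "John", "DEPT": "Sales"},
    {"EMPLOYEE_ID": 2, "EMPLOYEE_NAME": "John", "DEPT": "Legal"},
    {"EMPLOYEE_ID": 3, "EMPLOYEE_NAME": "Mary", "DEPT": "Sales"},
]

def is_unique(rows, attrs):
    """True if no two rows share the same values on the given attributes."""
    projected = [tuple(r[a] for a in attrs) for r in rows]
    return len(projected) == len(set(projected))

id_is_key = is_unique(rows, ["EMPLOYEE_ID"])                  # a candidate key
name_is_key = is_unique(rows, ["EMPLOYEE_NAME"])              # not a key: two Johns
pair_is_superkey = is_unique(rows, ["EMPLOYEE_ID", "EMPLOYEE_NAME"])  # superset of a key
```

EMPLOYEE_ID alone identifies every row, so any superset of it (such as the ID/name pair) is a super key, while EMPLOYEE_NAME alone is not a key at all.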
4. Foreign key
A foreign key is a column (or set of columns) of a table that points to the primary key of another table.
Every employee works in a specific department in a company, and employee and department are two different
entities. So we can't store the department's information in the employee table. That's why we link these two
tables through the primary key of one table.
We add the primary key of the DEPARTMENT table, Department_Id, as a new attribute in the EMPLOYEE table.
In the EMPLOYEE table, Department_Id is the foreign key, and both the tables are related.
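The EMPLOYEE/DEPARTMENT link above can be sketched with SQLite. Note that SQLite requires foreign-key enforcement to be switched on explicitly; the table and column names loosely follow the example, and the data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE department (department_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name TEXT,
    department_id INTEGER REFERENCES department(department_id))""")

conn.execute("INSERT INTO department VALUES (1, 'HR')")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 1)")   # valid reference

try:
    # No department 99 exists, so the foreign key rejects this row.
    conn.execute("INSERT INTO employee VALUES (2, 'Ravi', 99)")
    violated = False
except sqlite3.IntegrityError:
    violated = True
```

The foreign key guarantees every employee row points at a real department, which is exactly how the two tables stay related.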
5. Alternate key
There may be one or more attributes or a combination of attributes that uniquely identify each tuple in a relation.
These attributes or combinations of the attributes are called the candidate keys. One key is chosen as the
primary key from these candidate keys, and the remaining candidate key, if it exists, is termed the alternate
key. In other words, the total number of the alternate keys is the total number of candidate keys minus the
primary key. The alternate key may or may not exist. If there is only one candidate key in a relation, it does not
have an alternate key.
For example, employee relation has two attributes, Employee_Id and PAN_No, that act as candidate keys. In this
relation, Employee_Id is chosen as the primary key, so the other candidate key, PAN_No, acts as the Alternate
key.
6. Composite key
Whenever a primary key consists of more than one attribute, it is known as a composite key. This key is also
known as Concatenated Key.
For example, in employee relations, we assume that an employee may be assigned multiple roles, and an
employee may work on multiple projects simultaneously. So the primary key will be composed of all three
attributes, namely Emp_ID, Emp_role, and Proj_ID in combination. So these attributes act as a composite key
since the primary key comprises more than one attribute.
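The composite key described above can be sketched with SQLite: no single attribute is unique, but the combination of all three is. Column names follow the example; the data is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE assignment (
    emp_id INTEGER, emp_role TEXT, proj_id INTEGER,
    PRIMARY KEY (emp_id, emp_role, proj_id))""")  # composite (concatenated) key

conn.execute("INSERT INTO assignment VALUES (1, 'tester', 10)")
conn.execute("INSERT INTO assignment VALUES (1, 'tester', 11)")  # same emp+role, new project: OK

try:
    # An exact duplicate of all three key attributes is rejected.
    conn.execute("INSERT INTO assignment VALUES (1, 'tester', 10)")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
```

Repeating any one or two of the attributes is fine; only repeating the whole (Emp_ID, Emp_role, Proj_ID) combination violates the key.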
Generalization
Generalization is like a bottom-up approach in which two or more entities of lower level combine to form a higher
level entity if they have some attributes in common.
In generalization, an entity of a higher level can also combine with the entities of the lower level to form a further
higher level entity.
Generalization is similar to the subclass and superclass system, but the difference is the approach:
generalization uses the bottom-up approach.
In generalization, entities are combined to form a more generalized entity, i.e., subclasses are combined to make
a superclass.
For example, Faculty and Student entities can be generalized and create a higher level entity Person.
Specialization
Specialization is a top-down approach, and it is opposite to Generalization. In specialization, one higher level
entity can be broken down into two lower level entities.
Specialization is used to identify the subset of an entity set that shares some distinguishing characteristics.
Normally, the superclass is defined first, the subclasses and their related attributes are defined next, and the
relationship sets are then added.
For example: In an Employee management system, EMPLOYEE entity can be specialized as TESTER or
DEVELOPER based on what role they play in the company.
Aggregation
In aggregation, the relation between two entities is treated as a single entity. The relationship, together with its
corresponding entities, is aggregated into a higher-level entity.
For example: The Center entity offering the Course entity is treated as a single entity that participates in a
relationship with another entity, Visitor. In the real world, a visitor to a coaching center never enquires only about
the Course or only about the Center; they enquire about both.
Reduction of ER Diagram to Table
The database can be represented using the notations, and these notations can be reduced to a collection of
tables.
In the database, every entity set or relationship set can be represented in tabular form.
The ER diagram is given below:
There are some points for converting the ER diagram to the table:
Each entity type becomes a table.
In the given ER diagram, LECTURE, STUDENT, SUBJECT, and COURSE form individual tables.
Each single-valued attribute becomes a column of the table.
In the STUDENT entity, STUDENT_NAME and STUDENT_ID form the column of STUDENT table. Similarly,
COURSE_NAME and COURSE_ID form the column of COURSE table and so on.
The key attribute of the entity type is represented by the primary key.
In the given ER diagram, COURSE_ID, STUDENT_ID, SUBJECT_ID, and LECTURE_ID are the key attributes of
their entities.
A multivalued attribute is represented by a separate table.
In the student table, hobby is a multivalued attribute, so it is not possible to represent its multiple values in a
single column of the STUDENT table. Hence we create a table STUD_HOBBY with columns STUDENT_ID and
HOBBY. Together, the two columns form a composite key.
A composite attribute is represented by its components.
In the given ER diagram, the student address is a composite attribute containing CITY, PIN, DOOR#, STREET,
and STATE. In the STUDENT table, these attributes are merged in as individual columns.
Derived attributes are not considered in the table.
In the STUDENT table, Age is a derived attribute. It can be calculated at any point in time from the difference
between the current date and the Date of Birth.
Using these rules, you can convert the ER diagram to tables and columns and assign the mapping between the
tables. Table structure for the given ER diagram is as below:
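The multivalued-attribute rule above can be sketched with SQLite: HOBBY gets its own STUD_HOBBY table whose composite key is (STUDENT_ID, HOBBY). Column names follow the text; the data is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The entity type becomes a table; single-valued attributes become columns.
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, student_name TEXT)")
# The multivalued attribute HOBBY becomes a separate table with a composite key.
conn.execute("""CREATE TABLE stud_hobby (
    student_id INTEGER, hobby TEXT,
    PRIMARY KEY (student_id, hobby))""")

conn.execute("INSERT INTO student VALUES (1, 'Asha')")
conn.executemany("INSERT INTO stud_hobby VALUES (?, ?)",
                 [(1, 'chess'), (1, 'cricket')])  # one student, many hobbies

hobbies = [h for (h,) in conn.execute(
    "SELECT hobby FROM stud_hobby WHERE student_id = 1 ORDER BY hobby")]
```

Each hobby becomes its own row, which is exactly why a multivalued attribute cannot live in a single STUDENT column.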
3. Many-to-many
In a many-to-many relationship, many occurrences in one entity relate to many occurrences in another entity.
Like a one-to-one relationship, a many-to-many relationship is rarely implemented directly in practice; it is
usually resolved into two one-to-many relationships through a junction table.
For example: At the same time, an employee can work on several projects, and a project has a team of many
employees.
Therefore, employee and project have a many-to-many relationship.
Relational Model Concept
The relational model represents data as tables of columns and rows. Each row is known as a tuple. Each
column of a table has a name, called an attribute.
Domain: It contains a set of atomic values that an attribute can take.
Attribute: It contains the name of a column in a particular table. Each attribute Ai must have a domain, dom(Ai)
Relational instance: In the relational database system, the relational instance is represented by a finite set of
tuples. Relation instances do not have duplicate tuples.
Relational schema: A relational schema contains the name of the relation and name of all columns or attributes.
Relational key: In the relational key, each row has one or more attributes. It can identify the row in the relation
uniquely.
Example: STUDENT Relation
In the given table, NAME, ROLL_NO, PHONE_NO, ADDRESS, and AGE are the attributes.
The instance of schema STUDENT has 5 tuples.
t3 = <Laxman, 33289, 8583287182, Gurugram, 20>
Properties of Relations
The name of a relation is distinct from the names of all other relations.
Each cell of a relation contains exactly one atomic (single) value.
Each attribute has a distinct name.
The order of attributes has no significance.
No tuple is a duplicate of another.
The order of tuples has no significance.
Relational Algebra
Relational algebra is a procedural query language. It gives a step by step process to obtain the result of the
query. It uses operators to perform queries.
Types of Relational operation
1. Select Operation:
The select operation selects tuples that satisfy a given predicate.
It is denoted by sigma (σ).
Notation: σ p(r)
Where:
σ is used for selection
r is the relation
p is a propositional logic formula, which may use logical connectives such as AND, OR, and NOT, and relational
operators such as =, ≠, ≥, <, >, ≤.
For example: LOAN Relation
Input:
σ BRANCH_NAME="perryride" (LOAN)
Output:
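The selection operation can be sketched in Python as a filter over tuples. This is a minimal sketch; the LOAN rows below stand in for the example relation and are made up.

```python
# A small stand-in for the LOAN relation, as a list of dicts.
loan = [
    {"LOAN_NO": "L-17", "BRANCH_NAME": "Perryride", "AMOUNT": 1000},
    {"LOAN_NO": "L-23", "BRANCH_NAME": "Downtown",  "AMOUNT": 2000},
    {"LOAN_NO": "L-15", "BRANCH_NAME": "Perryride", "AMOUNT": 1500},
]

def select(relation, predicate):
    """sigma_p(r): keep only the tuples of r that satisfy the predicate p."""
    return [t for t in relation if predicate(t)]

perryride_loans = select(loan, lambda t: t["BRANCH_NAME"] == "Perryride")
```

The predicate plays the role of p in σ p(r): every tuple is tested, and only the matching ones survive.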
2. Project Operation:
This operation shows the list of attributes that we wish to appear in the result; the rest of the attributes are
eliminated from the table.
It is denoted by ∏.
Notation: ∏ A1, A2, ..., An (r)
Where
A1, A2, ..., An are attribute names of relation r.
Example: CUSTOMER RELATION
Input:
∏ NAME, CITY (CUSTOMER)
Output:
NAME CITY
Jones Harrison
Smith Rye
Hays Harrison
Curry Rye
Johnson Brooklyn
Brooks Brooklyn
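The projection operation can be sketched in Python: keep only the listed attributes and, because relations are sets, drop duplicate result tuples. The CUSTOMER rows below are a made-up stand-in for the example table.

```python
customer = [
    {"NAME": "Jones", "STREET": "Main",  "CITY": "Harrison"},
    {"NAME": "Smith", "STREET": "North", "CITY": "Rye"},
    {"NAME": "Jones", "STREET": "Main",  "CITY": "Harrison"},  # duplicate row
]

def project(relation, attrs):
    """pi_{A1..An}(r): keep only the listed attributes, dropping duplicates."""
    seen, result = set(), []
    for t in relation:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attrs, key)))
    return result

name_city = project(customer, ["NAME", "CITY"])
```

The duplicate Jones row collapses to one result tuple, mirroring the set semantics of ∏.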
3. Union Operation:
Suppose there are two relations R and S. The union operation contains all the tuples that are in R, in S, or in
both.
Notation: R ∪ S
A union operation must hold the following condition:
R and S must have the same number of attributes.
Duplicate tuples are eliminated automatically.
Example:
DEPOSITOR RELATION
CUSTOMER_NAME ACCOUNT_NO
Johnson A-101
Smith A-121
Mayes A-321
Turner A-176
Johnson A-273
Jones A-472
Lindsay A-284
BORROW RELATION
CUSTOMER_NAME LOAN_NO
Jones L-17
Smith L-23
Hayes L-15
Jackson L-14
Curry L-93
Smith L-11
Williams L-17
Input:
∏ CUSTOMER_NAME (BORROW) ∪ ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
CUSTOMER_NAME
Johnson
Smith
Hayes
Turner
Jones
Lindsay
Jackson
Curry
Williams
Mayes
4. Set Intersection:
Suppose there are two relations R and S. The set intersection operation contains all tuples that are in both R
and S.
It is denoted by ∩.
Notation: R ∩ S
Example: Using the above DEPOSITOR table and BORROW table
Input:
∏ CUSTOMER_NAME (BORROW) ∩ ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
CUSTOMER_NAME
Smith
Jones
5. Set Difference:
Suppose there are two relations R and S. The set difference operation contains all tuples that are in R but not in
S.
It is denoted by minus (-).
Notation: R - S
Example: Using the above DEPOSITOR table and BORROW table
Input:
∏ CUSTOMER_NAME (BORROW) - ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
CUSTOMER_NAME
Jackson
Hayes
Williams
Curry
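The three set operations above can be sketched directly with Python sets of customer names; the values are taken from the DEPOSITOR and BORROW tables shown earlier.

```python
# Customer names projected out of the two example relations.
depositor = {"Johnson", "Smith", "Mayes", "Turner", "Jones", "Lindsay"}
borrower  = {"Jones", "Smith", "Hayes", "Jackson", "Curry", "Williams"}

union_names        = depositor | borrower   # names in either relation
intersection_names = depositor & borrower   # names in both relations
difference_names   = borrower - depositor   # borrowers who are not depositors
```

The intersection reproduces the {Smith, Jones} output above, and the difference reproduces the Jackson/Hayes/Williams/Curry output; this works only because the two name sets are union-compatible.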
6. Cartesian product
The Cartesian product is used to combine each row in one table with each row in the other table. It is also known
as a cross product.
It is denoted by X.
Notation: E X D
Example:
EMPLOYEE
EMP_ID EMP_NAME DEPT_NO
1 Smith A
2 Harry C
3 John B
DEPARTMENT
DEPT_NO DEPT_NAME
A Marketing
B Sales
C Legal
Input:
EMPLOYEE X DEPARTMENT
Output:
1 Smith A A Marketing
1 Smith A B Sales
1 Smith A C Legal
2 Harry C A Marketing
2 Harry C B Sales
2 Harry C C Legal
3 John B A Marketing
3 John B B Sales
3 John B C Legal
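The Cartesian product above can be sketched with itertools.product: every employee row is paired with every department row, giving 3 × 3 = 9 rows.

```python
from itertools import product

# The example tables as lists of tuples.
employee   = [(1, "Smith", "A"), (2, "Harry", "C"), (3, "John", "B")]
department = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]

# E X D: concatenate each employee tuple with each department tuple.
cross = [e + d for e, d in product(employee, department)]
```

A join is essentially this product followed by a selection that keeps only the rows where the department columns match.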
7. Rename Operation:
The rename operation is used to rename the output relation. It is denoted by rho (ρ).
Example: We can use the rename operator to rename STUDENT relation to STUDENT1.
ρ(STUDENT1, STUDENT)
Join Operations:
A Join operation combines related tuples from different relations, if and only if a given join condition is satisfied.
It is denoted by ⋈.
Example:
EMPLOYEE
EMP_CODE EMP_NAME
101 Stephan
102 Jack
103 Harry
SALARY
EMP_CODE SALARY
101 50000
102 30000
103 25000
1. Natural Join:
A natural join is the set of tuples of all combinations in R and S that are equal on their common attribute names.
It is denoted by ⋈.
Example: Let's use the above EMPLOYEE table and SALARY table:
Input:
EMPLOYEE ⋈ SALARY
Output:
EMP_NAME SALARY
Stephan 50000
Jack 30000
Harry 25000
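The natural join above can be sketched in Python: combine the tuples that agree on the common attribute EMP_CODE, keeping that attribute only once. The data matches the EMPLOYEE and SALARY tables shown earlier.

```python
employee = [{"EMP_CODE": 101, "EMP_NAME": "Stephan"},
            {"EMP_CODE": 102, "EMP_NAME": "Jack"},
            {"EMP_CODE": 103, "EMP_NAME": "Harry"}]
salary = [{"EMP_CODE": 101, "SALARY": 50000},
          {"EMP_CODE": 102, "SALARY": 30000},
          {"EMP_CODE": 103, "SALARY": 25000}]

# Natural join: merge pairs of tuples that are equal on the shared attribute.
joined = [{**e, **s} for e in employee for s in salary
          if e["EMP_CODE"] == s["EMP_CODE"]]
```

Each employee matches exactly one salary row here, so the result has three tuples pairing names with salaries.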
2. Outer Join:
The outer join operation is an extension of the join operation. It is used to deal with missing information.
Example:
EMPLOYEE
FACT_WORKERS
Input:
(EMPLOYEE ⋈ FACT_WORKERS)
Output:
a. Left outer join:
The left outer join keeps all tuples of the left relation, padding unmatched attributes with nulls. It is denoted
by ⟕.
Example: Using the above EMPLOYEE table and FACT_WORKERS table
Input:
EMPLOYEE ⟕ FACT_WORKERS
b. Right outer join:
The right outer join keeps all tuples of the right relation, padding unmatched attributes with nulls. It is denoted
by ⟖.
Example: Using the above EMPLOYEE table and FACT_WORKERS Relation
Input:
EMPLOYEE ⟖ FACT_WORKERS
Output:
c. Full outer join:
The full outer join keeps all tuples of both relations, padding unmatched attributes with nulls. It is denoted
by ⟗.
Example: Using the above EMPLOYEE table and FACT_WORKERS table
Input:
EMPLOYEE ⟗ FACT_WORKERS
Output:
3. Equi join:
It is also known as an inner join. It is the most common join. It is based on matched data as per the equality
condition. The equi join uses the comparison operator(=).
Example:
CUSTOMER RELATION
CLASS_ID NAME
1 John
2 Harry
3 Jackson
PRODUCT
PRODUCT_ID CITY
1 Delhi
2 Mumbai
3 Noida
Input:
CUSTOMER ⋈ PRODUCT
Output:
1 John 1 Delhi
2 Harry 2 Mumbai
3 Jackson 3 Noida
Integrity Constraints
Integrity constraints are a set of rules. It is used to maintain the quality of information.
Integrity constraints ensure that the data insertion, updating, and other processes have to be performed in such a
way that data integrity is not affected.
Thus, integrity constraint is used to guard against accidental damage to the database.
Types of Integrity Constraint
1. Domain constraints
Domain constraints can be defined as the definition of a valid set of values for an attribute.
The data type of domain includes string, character, integer, time, date, currency, etc. The value of the attribute
must be available in the corresponding domain.
Example:
2. Entity integrity constraints
The entity integrity constraint states that primary key value can't be null.
This is because the primary key value is used to identify individual rows in relation and if the primary key has a
null value, then we can't identify those rows.
A table can contain a null value other than the primary key field.
Example:
Relational Calculus
Relational calculus is a non-procedural query language. In a non-procedural query language, the user is
concerned with what data to retrieve, not with the details of how to obtain it.
The relational calculus tells what to do but never explains how to do.
Types of Relational calculus:
1. Tuple Relational Calculus (TRC)
The tuple relational calculus is specified to select the tuples in a relation. In TRC, filtering variable uses the tuples
of a relation.
The result of the relation can have one or more tuples.
Notation:
{T | P (T)} or {T | Condition (T)}
Where
T is the resulting tuple
P(T) is the condition used to fetch T.
For example:
{ T.name | Author(T) AND T.article = 'database' }
OUTPUT: This query selects the tuples from the AUTHOR relation. It returns a tuple with 'name' from Author who
has written an article on 'database'.
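The TRC query { T.name | Author(T) AND T.article = 'database' } reads naturally as a comprehension: tuple variable T ranges over the Author relation, and the predicate filters it. The Author data below is made up for illustration.

```python
# A made-up Author relation as a list of tuples (dicts).
author = [
    {"name": "Ana",  "article": "database"},
    {"name": "Bob",  "article": "networks"},
    {"name": "Carl", "article": "database"},
]

# { T.name | Author(T) AND T.article = 'database' } as a comprehension:
names = [T["name"] for T in author if T["article"] == "database"]
```

The comprehension declares *what* tuples qualify, not *how* to find them, which is exactly the non-procedural flavor of relational calculus.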
TRC (tuple relation calculus) can be quantified. In TRC, we can use Existential (∃) and Universal Quantifiers
(∀).
2. Domain Relational Calculus (DRC)
Domain relational calculus uses the same operators as tuple calculus. It uses the logical connectives ∧ (and),
∨ (or), and ¬ (not).
It uses Existential (∃) and Universal Quantifiers (∀) to bind the variable.
Notation:
{ a1, a2, a3, ..., an | P (a1, a2, a3, ... ,an)}
Where
a1, a2 are attributes
P stands for formula built by inner attributes
For example:
{ <name> | Author(name, article) ∧ article = 'database' }
This query returns the names of authors from the Author relation who have written an article on 'database'.
1. Binary Datatypes
There are three types of binary datatypes, which are given below:
Datatype Description
binary It has a maximum length of 8000 bytes. It contains fixed-length binary data.
varbinary It has a maximum length of 8000 bytes. It contains variable-length binary data.
image It has a maximum length of 2,147,483,647 bytes. It contains variable-length binary data.
2. Text Datatypes
Datatype Description
char It has a maximum length of 8000 characters. It contains fixed-length non-Unicode characters.
varchar It has a maximum length of 8000 characters. It contains variable-length non-Unicode characters.
text It has a maximum length of 2,147,483,647 characters. It contains variable-length non-Unicode characters.
3. Date and Time Datatypes
Datatype Description
timestamp It stores the year, month, day, hour, minute, and second values.
SQL Commands
SQL commands are instructions. It is used to communicate with the database. It is also used to perform specific
tasks, functions, and queries of data.
SQL can perform various tasks like creating a table, adding data to tables, dropping the table, modifying the
table, and setting permissions for users.
Types of SQL Commands
There are five types of SQL commands: DDL, DML, DCL, TCL, and DQL.
- It is used to subtract the right-hand operand from the left-hand operand. a-b will give 10
* It is used to multiply the values of both operands. a*b will give 200
/ It is used to divide the left-hand operand by the right-hand operand. a/b will give 2
% It is used to divide the left-hand operand by the right-hand operand and return the remainder. a%b will give 0
= It checks if the values of two operands are equal or not; if they are equal, the condition becomes true. (a=b) is not true
!= It checks if the values of two operands are equal or not; if they are not equal, the condition becomes true. (a!=b) is true
<> It checks if the values of two operands are equal or not; if they are not equal, the condition becomes true. (a<>b) is true
> It checks if the left operand value is greater than the right operand value; if yes, the condition becomes true. (a>b) is not true
< It checks if the left operand value is less than the right operand value; if yes, the condition becomes true. (a<b) is true
>= It checks if the left operand value is greater than or equal to the right operand value; if yes, the condition becomes true. (a>=b) is not true
<= It checks if the left operand value is less than or equal to the right operand value; if yes, the condition becomes true. (a<=b) is true
!< It checks if the left operand value is not less than the right operand value; if yes, the condition becomes true. (a!<b) is not true
!> It checks if the left operand value is not greater than the right operand value; if yes, the condition becomes true. (a!>b) is true
Operator Description
BETWEEN It is used to search for values that are within a set of values.
SQL Table
An SQL table is a collection of data organized in terms of rows and columns. In DBMS, the table is known as a
relation and a row as a tuple.
Table is a simple form of data storage. A table is also considered as a convenient representation of relations.
Let's see an example of the EMPLOYEE table:
In the above table, "EMPLOYEE" is the table name, "EMP_ID", "EMP_NAME", "CITY", "PHONE_NO" are the
column names. The combination of data of multiple columns forms a row, e.g., 1, "Kristen", "Washington" and
7289201223 are the data of one row.
Operation on Table
Create table
Drop table
Delete table
Rename table
SQL Create Table
SQL CREATE TABLE is used to create a table in the database. To define the table, you specify its name and
define its columns with their data types.
Syntax
create table "table_name"
("column1" "data type",
"column2" "data type",
"column3" "data type",
...
"columnN" "data type");
Example
SQL> CREATE TABLE EMPLOYEE (
EMP_ID INT NOT NULL,
EMP_NAME VARCHAR (25) NOT NULL,
PHONE_NO INT NOT NULL,
ADDRESS CHAR (30),
PRIMARY KEY (EMP_ID)
);
If the table is created successfully, the SQL server shows a confirmation message. You can also verify the table
by using the DESC command as follows:
SQL> DESC EMPLOYEE;
Drop table
The SQL DROP TABLE statement is used to delete a table definition and all the data in the table. When this
command is executed, all the information in the table is lost forever, so you have to be very careful while using
it.
Syntax
DROP TABLE "table_name";
Firstly, you need to verify the EMPLOYEE table using the following command:
SQL> DESC EMPLOYEE;
If you don't specify the WHERE condition, it will remove all the rows from the table.
DELETE FROM EMPLOYEE;
Now, the EMPLOYEE table would not have any records.
SQL SELECT Statement
In SQL, the SELECT statement is used to query or retrieve data from a table in the database. The returned data
is stored in a result table, known as the result-set.
Syntax
SELECT column1, column2, ...
FROM table_name;
Here, column1, column2, ... are the names of the fields you want to select data from.
Use the following syntax to select all the fields available in the table:
SELECT * FROM table_name;
Example:
EMPLOYEE
EMP_ID EMP_NAME SALARY
1 Kristen 150000
2 Russell 200000
3 Angelina 600000
4 Robert 350000
5 Christian 260000
To fetch the EMP_ID of all the employees, use the following query:
SELECT EMP_ID FROM EMPLOYEE;
Output
EMP_ID
1
2
3
4
5
To fetch all the fields from the EMPLOYEE table, use the following query:
SELECT * FROM EMPLOYEE;
Output
Views in SQL
A view in SQL is considered a virtual table; like a table, it contains rows and columns.
To create the view, we can select the fields from one or more tables present in the database.
A view can either have specific rows based on certain condition or all the rows of a table.
Sample table:
Student_Detail
STU_ID NAME ADDRESS
1 Stephan Delhi
2 Kathrin Noida
3 David Ghaziabad
4 Alina Gurugram
Student_Marks
STU_ID NAME MARKS AGE
1 Stephan 97 19
2 Kathrin 86 21
3 David 74 18
4 Alina 90 20
5 John 96 18
1. Creating view
A view can be created using the CREATE VIEW statement. We can create a view from a single table or multiple
tables.
Syntax:
CREATE VIEW view_name AS
SELECT column1, column2.....
FROM table_name
WHERE condition;
2. Creating View from a single table
In this example, we create a View named DetailsView from the table Student_Detail.
Query:
CREATE VIEW DetailsView AS
SELECT NAME, ADDRESS
FROM Student_Detail
WHERE STU_ID < 4;
Just like a table, we can query the view to see its data.
SELECT * FROM DetailsView;
Output:
NAME ADDRESS
Stephan Delhi
Kathrin Noida
David Ghaziabad
3. Creating View from multiple tables
A view created over both Student_Detail and Student_Marks (joined on the student name) returns:
NAME ADDRESS MARKS
Stephan Delhi 97
Kathrin Noida 86
David Ghaziabad 74
Alina Gurugram 90
4. Deleting View
A view can be deleted using the DROP VIEW statement.
Syntax
DROP VIEW view_name;
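The full view lifecycle (create, query, drop) can be sketched against the Student_Detail sample data; SQLite via Python is used here only as an illustrative engine:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Student_Detail (STU_ID INTEGER, NAME TEXT, ADDRESS TEXT)")
cur.executemany("INSERT INTO Student_Detail VALUES (?, ?, ?)", [
    (1, "Stephan", "Delhi"), (2, "Kathrin", "Noida"),
    (3, "David", "Ghaziabad"), (4, "Alina", "Gurugram")])

# CREATE VIEW stores the query; the view is evaluated each time it is read.
cur.execute("""CREATE VIEW DetailsView AS
               SELECT NAME, ADDRESS FROM Student_Detail WHERE STU_ID < 4""")
view_rows = cur.execute("SELECT * FROM DetailsView").fetchall()
print(view_rows)  # [('Stephan', 'Delhi'), ('Kathrin', 'Noida'), ('David', 'Ghaziabad')]

# DROP VIEW removes the view definition, not the underlying table.
cur.execute("DROP VIEW DetailsView")
remaining = cur.execute("SELECT COUNT(*) FROM Student_Detail").fetchone()[0]
print(remaining)  # 4
```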
SQL Index
Indexes are special lookup tables. They are used to retrieve data from the database very quickly.
An index speeds up SELECT queries and WHERE clauses, but it slows down data input with INSERT and
UPDATE statements. Indexes can be created or dropped without affecting the data.
An index in a database is just like an index at the back of a book.
For example: to find all pages in a book that discuss a certain topic, you first refer to the index, which lists the
topics alphabetically, and then turn to one or more specific page numbers.
1. Create Index statement
It is used to create an index on a table. It allows duplicate values.
Syntax
CREATE INDEX index_name
ON table_name (column1, column2, ...);
Example
CREATE INDEX idx_name
ON Persons (LastName, FirstName);
2. Unique Index statement
It is used to create a unique index on a table. It does not allow duplicate values.
Syntax
CREATE UNIQUE INDEX index_name
ON table_name (column1, column2, ...);
Example
CREATE UNIQUE INDEX websites_idx
ON websites (site_name);
3. Drop Index Statement
It is used to delete an index in a table.
Syntax
DROP INDEX index_name;
Example
DROP INDEX websites_idx;
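A short sketch of the difference between a plain index and a unique index, with SQLite via Python as the engine and the `websites` table mirroring the example above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE websites (site_name TEXT)")

# A plain index speeds up lookups and allows duplicate values.
cur.execute("CREATE INDEX idx_name ON websites (site_name)")
cur.execute("DROP INDEX idx_name")

# A unique index additionally rejects duplicate values.
cur.execute("CREATE UNIQUE INDEX websites_idx ON websites (site_name)")
cur.execute("INSERT INTO websites VALUES ('example.com')")
try:
    cur.execute("INSERT INTO websites VALUES ('example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```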
SQL Sub Query
A subquery is a query nested within another SQL query, most often embedded in the WHERE clause.
Important Rule:
A subquery can be placed in a number of SQL clauses like WHERE clause, FROM clause, HAVING clause.
You can use Subquery with SELECT, UPDATE, INSERT, DELETE statements along with the operators like =, <,
>, >=, <=, IN, BETWEEN, etc.
A subquery is a query within another query. The outer query is known as the main query, and the inner query is
known as a subquery.
Subqueries are on the right side of the comparison operator.
A subquery is enclosed in parentheses.
In a subquery, the ORDER BY clause cannot be used, although the main query can use it. The GROUP BY
clause, however, can be used inside a subquery.
1. Subqueries with the Select Statement
SQL subqueries are most frequently used with the Select statement.
Syntax
SELECT column_name
FROM table_name
WHERE column_name OPERATOR
( SELECT column_name FROM table_name WHERE ... );
Example
Consider that the EMPLOYEE table has the following records:
ID NAME AGE CITY SALARY
1 John 20 US 2000.00
2 Stephan 26 Dubai 1500.00
4 Alina 29 UK 6500.00
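A subquery in a WHERE clause can be sketched as follows. SQLite via Python serves as the engine, and the AVG-based condition is an illustrative choice, not the tutorial's own example:

```python
import sqlite3

# Hypothetical EMPLOYEE table matching the sample records above.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE EMPLOYEE (ID INTEGER, NAME TEXT, AGE INTEGER, CITY TEXT, SALARY REAL)")
cur.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?, ?)", [
    (1, "John", 20, "US", 2000.0),
    (2, "Stephan", 26, "Dubai", 1500.0),
    (4, "Alina", 29, "UK", 6500.0)])

# The inner query runs first; the outer WHERE compares against its result.
rows = cur.execute("""SELECT NAME FROM EMPLOYEE
                      WHERE SALARY > (SELECT AVG(SALARY) FROM EMPLOYEE)""").fetchall()
print(rows)  # [('Alina',)]  -- only Alina earns above the ~3333.33 average
```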
SQL Clauses
The following are the various SQL clauses:
1. GROUP BY
SQL GROUP BY statement is used to arrange identical data into groups. The GROUP BY statement is used with
the SQL SELECT statement.
The GROUP BY statement follows the WHERE clause in a SELECT statement and precedes the ORDER BY
clause.
The GROUP BY statement is used with aggregation function.
Syntax
SELECT column
FROM table_name
WHERE conditions
GROUP BY column
ORDER BY column;
Sample table:
PRODUCT_MAST
PRODUCT COMPANY QTY RATE COST
Item1 Com1 2 10 20
Item2 Com2 3 25 75
Item3 Com1 2 30 60
Item4 Com3 5 10 50
Item5 Com2 2 20 40
Item6 Com1 3 25 75
Item7 Com1 5 30 150
Item8 Com1 3 10 30
Item9 Com2 2 25 50
Item10 Com3 4 30 120
Example:
SELECT COMPANY, COUNT(*)
FROM PRODUCT_MAST
GROUP BY COMPANY;
Output:
Com1 5
Com2 3
Com3 2
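The GROUP BY example can be reproduced as follows. SQLite via Python is the engine; rows Item7 and Item10 are filled in as assumptions so that the data matches the outputs quoted in this section (COUNT(*) = 10, Com1 appearing five times), and the "Cpm1" typo is treated as "Com1":

```python
import sqlite3

# PRODUCT_MAST sample data; Item7 and Item10 are reconstructed rows.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE PRODUCT_MAST (PRODUCT TEXT, COMPANY TEXT, QTY INTEGER, RATE INTEGER, COST INTEGER)")
cur.executemany("INSERT INTO PRODUCT_MAST VALUES (?, ?, ?, ?, ?)", [
    ("Item1", "Com1", 2, 10, 20), ("Item2", "Com2", 3, 25, 75),
    ("Item3", "Com1", 2, 30, 60), ("Item4", "Com3", 5, 10, 50),
    ("Item5", "Com2", 2, 20, 40), ("Item6", "Com1", 3, 25, 75),
    ("Item7", "Com1", 5, 30, 150), ("Item8", "Com1", 3, 10, 30),
    ("Item9", "Com2", 2, 25, 50), ("Item10", "Com3", 4, 30, 120)])

# GROUP BY collapses rows with the same COMPANY into one group per company.
groups = cur.execute("""SELECT COMPANY, COUNT(*) FROM PRODUCT_MAST
                        GROUP BY COMPANY ORDER BY COMPANY""").fetchall()
print(groups)  # [('Com1', 5), ('Com2', 3), ('Com3', 2)]
```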
2. HAVING
The HAVING clause is used to specify a search condition for a group or an aggregate.
HAVING is used with the GROUP BY clause: it filters groups the way a WHERE clause filters individual rows.
Syntax:
SELECT column1, column2
FROM table_name
WHERE conditions
GROUP BY column1, column2
HAVING conditions
ORDER BY column1, column2;
Example:
SELECT COMPANY, COUNT(*)
FROM PRODUCT_MAST
GROUP BY COMPANY
HAVING COUNT(*)>2;
Output:
Com1 5
Com2 3
3. ORDER BY
The ORDER BY clause sorts the result-set in ascending or descending order.
It sorts the records in ascending order by default. DESC keyword is used to sort the records in descending order.
Syntax:
SELECT column1, column2
FROM table_name
WHERE condition
ORDER BY column1, column2... ASC|DESC;
Where
ASC: It is used to sort the result set in ascending order by expression.
DESC: It sorts the result set in descending order by expression.
Example: Sorting Results in Ascending Order
Table:
CUSTOMER
CUSTOMER_ID NAME ADDRESS
12 Kathrin US
23 David Bangkok
34 Alina Dubai
45 John UK
56 Harry US
Query:
SELECT * FROM CUSTOMER ORDER BY NAME;
Output:
CUSTOMER_ID NAME ADDRESS
34 Alina Dubai
23 David Bangkok
56 Harry US
45 John UK
12 Kathrin US
Example: Sorting Results in Descending Order
Query:
SELECT * FROM CUSTOMER ORDER BY NAME DESC;
Output:
CUSTOMER_ID NAME ADDRESS
12 Kathrin US
45 John UK
56 Harry US
23 David Bangkok
34 Alina Dubai
SQL Aggregate Functions
1. COUNT FUNCTION
The COUNT function is used to count the number of rows in a database table. It works on both numeric and non-
numeric data types.
COUNT(*) returns the count of all rows in the specified table, including duplicates and rows containing NULL
values.
Syntax
COUNT(*)
or
COUNT( [ALL|DISTINCT] expression )
Sample table:
PRODUCT_MAST
PRODUCT COMPANY QTY RATE COST
Item1 Com1 2 10 20
Item2 Com2 3 25 75
Item3 Com1 2 30 60
Item4 Com3 5 10 50
Item5 Com2 2 20 40
Item6 Com1 3 25 75
Item7 Com1 5 30 150
Item8 Com1 3 10 30
Item9 Com2 2 25 50
Item10 Com3 4 30 120
Example: COUNT()
SELECT COUNT(*)
FROM PRODUCT_MAST;
Output:
10
Example: COUNT with WHERE
SELECT COUNT(*)
FROM PRODUCT_MAST
WHERE RATE>=20;
Output:
7
Example: COUNT() with DISTINCT
SELECT COUNT(DISTINCT COMPANY)
FROM PRODUCT_MAST;
Output:
3
Example: COUNT() with GROUP BY
SELECT COMPANY, COUNT(*)
FROM PRODUCT_MAST
GROUP BY COMPANY;
Output:
Com1 5
Com2 3
Com3 2
Example: COUNT() with HAVING
SELECT COMPANY, COUNT(*)
FROM PRODUCT_MAST
GROUP BY COMPANY
HAVING COUNT(*)>2;
Output:
Com1 5
Com2 3
2. SUM Function
The SUM function is used to calculate the sum of the values in the selected column. It works on numeric fields only.
Syntax
SUM()
or
SUM( [ALL|DISTINCT] expression )
Example: SUM()
SELECT SUM(COST)
FROM PRODUCT_MAST;
Output:
670
Example: SUM() with WHERE
SELECT SUM(COST)
FROM PRODUCT_MAST
WHERE QTY>3;
Output:
320
Example: SUM() with GROUP BY
SELECT COMPANY, SUM(COST)
FROM PRODUCT_MAST
WHERE QTY>3
GROUP BY COMPANY;
Output:
Com1 150
Com3 170
Example: SUM() with HAVING
SELECT COMPANY, SUM(COST)
FROM PRODUCT_MAST
GROUP BY COMPANY
HAVING SUM(COST)>=170;
Output:
Com1 335
Com3 170
3. AVG function
The AVG function is used to calculate the average value of a numeric column. It returns the average of all
non-NULL values.
Syntax
AVG()
or
AVG( [ALL|DISTINCT] expression )
Example:
SELECT AVG(COST)
FROM PRODUCT_MAST;
Output:
67.00
4. MAX Function
MAX function is used to find the maximum value of a certain column. This function determines the largest value
of all selected values of a column.
Syntax
MAX()
or
MAX( [ALL|DISTINCT] expression )
Example:
SELECT MAX(RATE)
FROM PRODUCT_MAST;
Output:
30
5. MIN Function
MIN function is used to find the minimum value of a certain column. This function determines the smallest value
of all selected values of a column.
Syntax
MIN()
or
MIN( [ALL|DISTINCT] expression )
Example:
SELECT MIN(RATE)
FROM PRODUCT_MAST;
Output:
10
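The aggregate-function outputs quoted above (COUNT 10, SUM 670, AVG 67, MAX 30, MIN 10) can be checked in one runnable sketch. SQLite via Python is the engine, and rows Item7 and Item10 are assumptions filled in to match those outputs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE PRODUCT_MAST (PRODUCT TEXT, COMPANY TEXT, QTY INTEGER, RATE INTEGER, COST INTEGER)")
cur.executemany("INSERT INTO PRODUCT_MAST VALUES (?, ?, ?, ?, ?)", [
    ("Item1", "Com1", 2, 10, 20), ("Item2", "Com2", 3, 25, 75),
    ("Item3", "Com1", 2, 30, 60), ("Item4", "Com3", 5, 10, 50),
    ("Item5", "Com2", 2, 20, 40), ("Item6", "Com1", 3, 25, 75),
    ("Item7", "Com1", 5, 30, 150), ("Item8", "Com1", 3, 10, 30),
    ("Item9", "Com2", 2, 25, 50), ("Item10", "Com3", 4, 30, 120)])

# Each aggregate collapses the whole table into a single value.
total = cur.execute("SELECT COUNT(*) FROM PRODUCT_MAST").fetchone()[0]
cost_sum = cur.execute("SELECT SUM(COST) FROM PRODUCT_MAST").fetchone()[0]
cost_avg = cur.execute("SELECT AVG(COST) FROM PRODUCT_MAST").fetchone()[0]
rate_max, rate_min = cur.execute("SELECT MAX(RATE), MIN(RATE) FROM PRODUCT_MAST").fetchone()
print(total, cost_sum, cost_avg, rate_max, rate_min)  # 10 670 67.0 30 10
```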
SQL JOIN
As the name suggests, JOIN means to combine something. In the case of SQL, JOIN means "to combine two or
more tables".
In SQL, the JOIN clause is used to combine the records from two or more tables in a database.
Types of SQL JOIN
INNER JOIN
LEFT JOIN
RIGHT JOIN
FULL JOIN
Sample Table
EMPLOYEE
EMP_ID EMP_NAME
1 Angelina
2 Robert
3 Christian
4 Kristen
5 Russell
6 Marry
PROJECT
PROJECT_NO EMP_ID DEPARTMENT
101 1 Testing
102 2 Development
103 3 Designing
104 4 Development
1. INNER JOIN
In SQL, INNER JOIN selects records that have matching values in both tables as long as the condition is
satisfied. It returns the combination of all rows from both the tables where the condition satisfies.
Syntax
SELECT table1.column1, table1.column2, table2.column1,....
FROM table1
INNER JOIN table2
ON table1.matching_column = table2.matching_column;
Query
SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT
FROM EMPLOYEE
INNER JOIN PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;
Output
EMP_NAME DEPARTMENT
Angelina Testing
Robert Development
Christian Designing
Kristen Development
2. LEFT JOIN
The SQL left join returns all the values from left table and the matching values from the right table. If there is no
matching join value, it will return NULL.
Syntax
SELECT table1.column1, table1.column2, table2.column1,....
FROM table1
LEFT JOIN table2
ON table1.matching_column = table2.matching_column;
Query
SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT
FROM EMPLOYEE
LEFT JOIN PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;
Output
EMP_NAME DEPARTMENT
Angelina Testing
Robert Development
Christian Designing
Kristen Development
Russell NULL
Marry NULL
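The difference between INNER JOIN and LEFT JOIN can be sketched with the sample tables. SQLite via Python is the engine, and the EMPLOYEE rows are reconstructed from the join outputs shown above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE EMPLOYEE (EMP_ID INTEGER, EMP_NAME TEXT)")
cur.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)", [
    (1, "Angelina"), (2, "Robert"), (3, "Christian"),
    (4, "Kristen"), (5, "Russell"), (6, "Marry")])
cur.execute("CREATE TABLE PROJECT (PROJECT_NO INTEGER, EMP_ID INTEGER, DEPARTMENT TEXT)")
cur.executemany("INSERT INTO PROJECT VALUES (?, ?, ?)", [
    (101, 1, "Testing"), (102, 2, "Development"),
    (103, 3, "Designing"), (104, 4, "Development")])

# INNER JOIN keeps only employees that have a matching project row.
inner = cur.execute("""SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT
                       FROM EMPLOYEE INNER JOIN PROJECT
                       ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID""").fetchall()
print(len(inner))  # 4

# LEFT JOIN keeps every employee; unmatched rows get NULL (None) departments.
left = cur.execute("""SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT
                      FROM EMPLOYEE LEFT JOIN PROJECT
                      ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID""").fetchall()
print(("Russell", None) in left)  # True
```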
3. RIGHT JOIN
In SQL, RIGHT JOIN returns all the rows of the right table and the matching values from the left table. If there is
no matching row in the left table, it returns NULL.
Syntax
SELECT table1.column1, table1.column2, table2.column1,....
FROM table1
RIGHT JOIN table2
ON table1.matching_column = table2.matching_column;
Query
SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT
FROM EMPLOYEE
RIGHT JOIN PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;
Output
EMP_NAME DEPARTMENT
Angelina Testing
Robert Development
Christian Designing
Kristen Development
4. FULL JOIN
In SQL, FULL JOIN combines the results of both left and right outer joins. The joined table contains all the
records from both tables and puts NULL in the place where a match is not found.
Syntax
SELECT table1.column1, table1.column2, table2.column1,....
FROM table1
FULL JOIN table2
ON table1.matching_column = table2.matching_column;
Query
SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT
FROM EMPLOYEE
FULL JOIN PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;
Output
EMP_NAME DEPARTMENT
Angelina Testing
Robert Development
Christian Designing
Kristen Development
Russell NULL
Marry NULL
SQL Set Operation
The SQL set operations are used to combine the results of two or more SELECT statements.
1. Union
The Union operation combines the results of two SELECT statements. It returns the distinct rows, sorted in
ascending order by default.
Syntax:
SELECT column_name FROM table1
UNION
SELECT column_name FROM table2;
Example: The First table
ID NAME
1 Jack
2 Harry
3 Jackson
The Second table
ID NAME
3 Jackson
4 Stephan
5 David
Union query will be like:
SELECT * FROM First
UNION
SELECT * FROM Second;
The resultset table will look like:
ID NAME
1 Jack
2 Harry
3 Jackson
4 Stephan
5 David
2. Union All
The Union All operation is similar to Union, but it returns the result set without removing duplicates and without
sorting the data.
Syntax:
SELECT column_name FROM table1
UNION ALL
SELECT column_name FROM table2;
Example: Using the above First and Second table.
Union All query will be like:
SELECT * FROM First
UNION ALL
SELECT * FROM Second;
The resultset table will look like:
ID NAME
1 Jack
2 Harry
3 Jackson
3 Jackson
4 Stephan
5 David
3. Intersect
It is used to combine two SELECT statements. The Intersect operation returns only the rows common to both
SELECT statements.
In the Intersect operation, the number of columns and their datatypes must be the same.
It returns no duplicates, and it arranges the data in ascending order by default.
Syntax
SELECT column_name FROM table1
INTERSECT
SELECT column_name FROM table2;
Example:
Using the above First and Second table.
Intersect query will be:
SELECT * FROM First
INTERSECT
SELECT * FROM Second;
The resultset table will look like:
ID NAME
3 Jackson
4. Minus
It combines the results of two SELECT statements. The Minus operator displays the rows that are present in the
first query but absent in the second query.
It returns no duplicates, and the data is arranged in ascending order by default.
Syntax:
SELECT column_name FROM table1
MINUS
SELECT column_name FROM table2;
Example
Using the above First and Second table.
Minus query will be:
SELECT * FROM First
MINUS
SELECT * FROM Second;
The resultset table will look like:
ID NAME
1 Jack
2 Harry
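The four set operations can be sketched with the First and Second tables. SQLite via Python is the engine here; note that SQLite spells the Minus operation EXCEPT:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE First (ID INTEGER, NAME TEXT)")
cur.execute("CREATE TABLE Second (ID INTEGER, NAME TEXT)")
cur.executemany("INSERT INTO First VALUES (?, ?)",
                [(1, "Jack"), (2, "Harry"), (3, "Jackson")])
cur.executemany("INSERT INTO Second VALUES (?, ?)",
                [(3, "Jackson"), (4, "Stephan"), (5, "David")])

# UNION removes the duplicate (3, 'Jackson'); UNION ALL keeps it.
u = cur.execute("SELECT * FROM First UNION SELECT * FROM Second").fetchall()
ua = cur.execute("SELECT * FROM First UNION ALL SELECT * FROM Second").fetchall()
# INTERSECT keeps only common rows; EXCEPT (Minus) keeps First-only rows.
i = cur.execute("SELECT * FROM First INTERSECT SELECT * FROM Second").fetchall()
e = cur.execute("SELECT * FROM First EXCEPT SELECT * FROM Second").fetchall()
print(len(u), len(ua), i, sorted(e))
```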
SQL Cursors
SQL cursors are discussed in detail in the SQL tutorial of javatpoint, so you can go through the concepts again to
make things clearer.
SQL Trigger
SQL triggers are discussed in detail in the SQL tutorial of javatpoint, so you can go through the concepts again to
make things clearer.
Functional Dependency
The functional dependency is a relationship that exists between two attributes. It typically exists between the
primary key and non-key attribute within a table.
X → Y
The left side of an FD is known as the determinant; the right side is known as the dependent.
For example:
Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address.
Here, the Emp_Id attribute can uniquely identify the Emp_Name attribute of the employee table because, if we
know the Emp_Id, we can tell the employee name associated with it.
Functional dependency can be written as:
Emp_Id → Emp_Name
We can say that Emp_Name is functionally dependent on Emp_Id.
Types of Functional dependency
1. Trivial functional dependency
A → B has trivial functional dependency if B is a subset of A.
The following dependencies are also trivial like: A → A, B → B
Example:
Consider a table with two columns Employee_Id and Employee_Name.
{Employee_id, Employee_Name} → Employee_Id is a trivial functional dependency as
Employee_Id is a subset of {Employee_Id, Employee_Name}.
Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial dependencies too.
2. Non-trivial functional dependency
A → B has a non-trivial functional dependency if B is not a subset of A.
When the intersection of A and B is empty (A ∩ B = ∅), A → B is called a completely non-trivial functional dependency.
Example:
ID → Name,
Name → DOB
Normalization
Normalization is the process of organizing the data in the database.
Normalization is used to minimize the redundancy from a relation or set of relations. It is also used to eliminate
the undesirable characteristics like Insertion, Update and Deletion Anomalies.
Normalization divides the larger table into the smaller table and links them using relationship.
The normal form is used to reduce redundancy from the database table.
Types of Normal Forms
There are the following types of normal forms:
Normal Form Description
1NF A relation is in 1NF if it contains atomic values only.
2NF A relation will be in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the
primary key.
3NF A relation will be in 3NF if it is in 2NF and no transitive dependency exists for non-prime attributes.
BCNF A relation will be in BCNF if it is in 3NF and, for every functional dependency X → Y, X is a super key.
4NF A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued dependency.
5NF A relation is in 5NF if it is in 4NF, it does not contain any join dependency, and the joining should be lossless.
First Normal Form (1NF) example. EMPLOYEE table:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385, 9064738238 UP
The decomposition of the EMPLOYEE table into 1NF is shown below:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385 UP
14 John 9064738238 UP
Second Normal Form (2NF) example. TEACHER table:
TEACHER_ID SUBJECT TEACHER_AGE
25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38
In the given table, non-prime attribute TEACHER_AGE is dependent on TEACHER_ID which is a proper subset
of a candidate key. That's why it violates the rule for 2NF.
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table:
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38
TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
EMPLOYEE_ZIP table:
EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 India
EMP_DEPT table:
EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT
D394 283
D394 300
D283 232
D283 549
Functional dependencies:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
Now, this is in BCNF because left side part of both the functional dependencies is a key.
Fourth normal form (4NF)
A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued dependency.
For a dependency A → B, if for a single value of A multiple values of B exist, then the relation has a multi-valued
dependency.
Example
STUDENT
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
The given STUDENT table is in 3NF, but the COURSE and HOBBY are two independent entity. Hence, there is
no relationship between COURSE and HOBBY.
In the STUDENT relation, a student with STU_ID, 21 contains two courses, Computer and Math and two
hobbies, Dancing and Singing. So there is a Multi-valued dependency on STU_ID, which leads to unnecessary
repetition of data.
So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
Fifth normal form (5NF) example. SUBJECT LECTURER SEMESTER table:
SEMESTER SUBJECT LECTURER
Semester 1 Computer Anshika
Semester 1 Computer John
Semester 1 Math John
Semester 2 Math Akash
Semester 1 Chemistry Praveen
In the above table, John takes both Computer and Math classes for Semester 1, but he doesn't take the Math
class for Semester 2. In this case, a combination of all these fields is required to identify valid data.
Suppose we add a new semester, Semester 3, but do not yet know the subject or who will be teaching it, so we
leave LECTURER and SUBJECT as NULL. But all three columns together act as a primary key, so we cannot
leave the other two columns blank.
So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:
P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
P3
SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
Relational Decomposition
When a relation in the relational model is not in appropriate normal form then the decomposition of a relation is
required.
In a database, it breaks the table into multiple tables.
If the relation has no proper decomposition, then it may lead to problems like loss of information.
Decomposition is used to eliminate some of the problems of bad design like anomalies, inconsistencies, and
redundancy.
Types of Decomposition
Lossless Decomposition
If the information is not lost from the relation that is decomposed, then the decomposition will be lossless.
The lossless decomposition guarantees that the join of relations will result in the same relation as it was
decomposed.
The relation is said to be lossless decomposition if natural joins of all the decomposition give the original relation.
Example:
EMPLOYEE_DEPARTMENT table:
The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT
EMPLOYEE table:
EMP_ID EMP_NAME EMP_AGE EMP_CITY
22 Denim 28 Mumbai
33 Alina 25 Delhi
46 Stephan 30 Bangalore
52 Katherine 36 Mumbai
60 Jack 40 Noida
DEPARTMENT table:
DEPT_ID EMP_ID DEPT_NAME
827 22 Sales
438 33 Marketing
869 46 Finance
575 52 Production
678 60 Testing
Now, when these two relations are joined on the common column "EMP_ID", then the resultant relation will look
like:
Employee ⋈ Department
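The lossless-join property can be sketched in plain Python: natural-joining the two decomposed relations on the common column EMP_ID yields exactly the original five rows, with nothing lost and nothing spurious added. The dict-based representation is an illustrative choice:

```python
# Decomposed relations, keyed for convenience: EMPLOYEE by EMP_ID,
# DEPARTMENT by DEPT_ID with (EMP_ID, DEPT_NAME) values.
employee = {22: ("Denim", 28, "Mumbai"), 33: ("Alina", 25, "Delhi"),
            46: ("Stephan", 30, "Bangalore"), 52: ("Katherine", 36, "Mumbai"),
            60: ("Jack", 40, "Noida")}
department = {827: (22, "Sales"), 438: (33, "Marketing"), 869: (46, "Finance"),
              575: (52, "Production"), 678: (60, "Testing")}

# Natural join on the common column EMP_ID.
joined = [(emp_id, *employee[emp_id], dept_id, dept_name)
          for dept_id, (emp_id, dept_name) in department.items()]
print(len(joined))  # 5 rows: no tuples lost, no spurious tuples introduced
```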
Multivalued Dependency
A multivalued dependency occurs when two attributes in a table are independent of each other but both depend
on a third attribute. Consider, for example, a table storing bike details with the attributes BIKE_MODEL,
MANUF_YEAR and COLOR. Here the columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and
independent of each other.
In this case, these two columns are said to be multivalued dependent on BIKE_MODEL. These dependencies
can be represented as:
BIKE_MODEL →→ MANUF_YEAR
BIKE_MODEL →→ COLOR
This can be read as "BIKE_MODEL multidetermines MANUF_YEAR" and "BIKE_MODEL multidetermines
COLOR".
Join Dependency
A join dependency is a further generalization of multivalued dependency.
If the join of R1 and R2 over C is equal to relation R, then we say that a join dependency (JD) exists, where R1
and R2 are the decompositions R1(A, B, C) and R2(C, D) of a given relation R(A, B, C, D). Alternatively, R1 and
R2 form a lossless decomposition of R.
A JD ⋈ {R1, R2, ..., Rn} is said to hold over a relation R if R1, R2, ..., Rn is a lossless-join decomposition of R.
*(A, B, C), (C, D) will be a JD of R if the join over the shared attributes is equal to the relation R. Here, *(R1, R2,
R3) is used to indicate that the relations R1, R2, R3 and so on are a JD of R.
Inclusion Dependency
Multivalued dependency and join dependency can be used to guide database design although they both are less
common than functional dependencies.
Inclusion dependencies are quite common, but they typically have little influence on the design of the database.
The inclusion dependency is a statement in which some columns of a relation are contained in other columns.
An example of an inclusion dependency is a foreign key: the foreign key column(s) of the referring relation must
be contained in the primary key column(s) of the referenced relation.
Suppose we have two relations R and S, obtained by translating two entity sets such that every R entity is also
an S entity.
An inclusion dependency would hold if projecting R on its key attributes yields a relation that is contained in the
relation obtained by projecting S on its key attributes.
When designing with inclusion dependencies, we should not split groups of attributes that participate in an
inclusion dependency.
In practice, most inclusion dependencies are key-based, that is, they involve only keys.
Canonical Cover
In the case of updating the database, the responsibility of the system is to check whether the existing functional
dependencies are getting violated during the process of updating. In case of a violation of functional
dependencies in the new database state, the rollback of the system must take place.
A canonical cover (or irreducible set) of functional dependencies FD is a simplified set of FDs that has the same
closure as the original set FD.
Extraneous attributes
An attribute of an FD is said to be extraneous if we can remove it without changing the closure of the set of FD.
Example: Given a relational Schema R( A, B, C, D) and set of Function Dependency FD = { B → A, AD → BC,
C → ABD }. Find the canonical cover?
Solution: Given FD = { B → A, AD → BC, C → ABD }, now decompose the FD using decomposition
rule( Armstrong Axiom ).
B→A
AD → B ( using decomposition inference rule on AD → BC)
AD → C ( using decomposition inference rule on AD → BC)
C → A ( using decomposition inference rule on C → ABD)
C → B ( using decomposition inference rule on C → ABD)
C → D ( using decomposition inference rule on C → ABD)
Now set of FD = { B → A, AD → B, AD → C, C → A, C → B, C → D }
The next step is to find closure of the left side of each of the given FD by including that FD and excluding that
FD, if closure in both cases are same then that FD is redundant and we remove that FD from the given set,
otherwise if both the closures are different then we do not exclude that FD.
Calculating closure of all FD { B → A, AD → B, AD → C, C → A, C → B, C → D }
1a. Closure B+ = BA using FD = { B → A, AD → B, AD → C, C → A, C → B, C → D }
1b. Closure B+ = B using FD = { AD → B, AD → C, C → A, C → B, C → D }
From 1 a and 1 b, we found that both the Closure( by including B → A and excluding B → A ) are not equivalent,
hence FD B → A is important and cannot be removed from the set of FD.
2 a. Closure AD+ = ADBC using FD = { B →A, AD → B, AD → C, C → A, C → B, C → D }
2 b. Closure AD+ = ADCB using FD = { B → A, AD → C, C → A, C → B, C → D }
From 2 a and 2 b, we found that both the Closure (by including AD → B and excluding AD → B) are equivalent,
hence FD AD → B is not important and can be removed from the set of FD.
Hence resultant FD = { B → A, AD → C, C → A, C → B, C → D }
3 a. Closure AD+ = ADCB using FD = { B →A, AD → C, C → A, C → B, C → D }
3 b. Closure AD+ = AD using FD = { B → A, C → A, C → B, C → D }
From 3 a and 3 b, we found that both the Closure (by including AD → C and excluding AD → C ) are not
equivalent, hence FD AD → C is important and cannot be removed from the set of FD.
Hence resultant FD = { B → A, AD → C, C → A, C → B, C → D }
4 a. Closure C+ = CABD using FD = { B →A, AD → C, C → A, C → B, C → D }
4 b. Closure C+ = CBDA using FD = { B → A, AD → C, C → B, C → D }
From 4 a and 4 b, we found that both the Closure (by including C → A and excluding C → A) are equivalent,
hence FD C → A is not important and can be removed from the set of FD.
Hence resultant FD = { B → A, AD → C, C → B, C → D }
5 a. Closure C+ = CBDA using FD = { B →A, AD → C, C → B, C → D }
5 b. Closure C+ = CD using FD = { B → A, AD → C, C → D }
From 5 a and 5 b, we found that both the Closure (by including C → B and excluding C → B) are not equivalent,
hence FD C → B is important and cannot be removed from the set of FD.
Hence resultant FD = { B → A, AD → C, C → B, C → D }
6 a. Closure C+ = CDBA using FD = { B →A, AD → C, C → B, C → D }
6 b. Closure C+ = CBA using FD = { B → A, AD → C, C → B }
From 6 a and 6 b, we found that both the Closure( by including C → D and excluding C → D) are not equivalent,
hence FD C → D is important and cannot be removed from the set of FD.
Hence resultant FD = { B → A, AD → C, C → B, C → D }
Since FD = { B → A, AD → C, C → B, C → D } is resultant FD, now we have checked the redundancy of
attribute, since the left side of FD AD → C has two attributes, let's check their importance, i.e. whether they both
are important or only one.
Closure AD+ = ADCB using FD = { B →A, AD → C, C → B, C → D }
Closure A+ = A using FD = { B →A, AD → C, C → B, C → D }
Closure D+ = D using FD = { B →A, AD → C, C → B, C → D }
Since the closure of AD+, A+, D+ that we found are not all equivalent, hence in FD AD → C, both A and D are
important attributes and cannot be removed.
Hence resultant FD = { B → A, AD → C, C → B, C → D } and we can rewrite as
FD = { B → A, AD → C, C → BD } is Canonical Cover of FD = { B → A, AD → BC, C → ABD }.
Example 2: Given a relational Schema R( W, X, Y, Z) and set of Function Dependency FD = { W → X, Y → X, Z
→ WXY, WY → Z }. Find the canonical cover?
Solution: Given FD = { W → X, Y → X, Z → WXY, WY → Z }, now decompose the FD using decomposition
rule( Armstrong Axiom ).
W→X
Y→X
Z → W ( using decomposition inference rule on Z → WXY )
Z → X ( using decomposition inference rule on Z → WXY )
Z → Y ( using decomposition inference rule on Z → WXY )
WY → Z
Now set of FD = { W → X, Y → X, WY → Z, Z → W, Z → X, Z → Y }
The next step is to find closure of the left side of each of the given FD by including that FD and excluding that
FD, if closure in both cases are same then that FD is redundant and we remove that FD from the given set,
otherwise if both the closures are different then we do not exclude that FD.
Calculating closure of all FD { W → X, Y → X, Z → W, Z → X, Z → Y, WY → Z }
1 a. Closure W+ = WX using FD = { W → X, Y → X, Z → W, Z → X, Z → Y, WY → Z }
1 b. Closure W+ = W using FD = { Y → X, Z → W, Z → X, Z → Y, WY → Z }
From 1 a and 1 b, we found that both the Closure (by including W → X and excluding W → X ) are not
equivalent, hence FD W → X is important and cannot be removed from the set of FD.
Hence resultant FD = { W → X, Y → X, Z → W, Z → X, Z → Y, WY → Z }
2 a. Closure Y+ = YX using FD = { W → X, Y → X, Z → W, Z → X, Z → Y, WY → Z }
2 b. Closure Y+ = Y using FD = { W → X, Z → W, Z → X, Z → Y, WY → Z }
From 2 a and 2 b we found that both the Closure (by including Y → X and excluding Y → X ) are not equivalent,
hence FD Y → X is important and cannot be removed from the set of FD.
Hence resultant FD = { W → X, Y → X, Z → W, Z → X, Z → Y, WY → Z }
3 a. Closure Z+ = ZWXY using FD = { W → X, Y → X, Z → W, Z → X, Z → Y, WY → Z }
3 b. Closure Z+ = ZXY using FD = { W → X, Y → X, Z → X, Z → Y, WY → Z }
From 3 a and 3 b, we found that both the Closure (by including Z → W and excluding Z → W ) are not
equivalent, hence FD Z → W is important and cannot be removed from the set of FD.
Hence resultant FD = { W → X, Y → X, Z → W, Z → X, Z → Y, WY → Z }
4 a. Closure Z+ = ZXWY using FD = { W → X, Y → X, Z → W, Z → X, Z → Y, WY → Z }
4 b. Closure Z+ = ZWYX using FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }
From 4 a and 4 b, we found that both the Closure (by including Z → X and excluding Z → X ) are equivalent,
hence FD Z → X is not important and can be removed from the set of FD.
Hence resultant FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }
5 a. Closure Z+ = ZYWX using FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }
5 b. Closure Z+ = ZWX using FD = { W → X, Y → X, Z → W, WY → Z }
From 5 a and 5 b, we found that both the Closure (by including Z → Y and excluding Z → Y ) are not equivalent,
hence FD Z → Y is important and cannot be removed from the set of FD.
Hence resultant FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }
6 a. Closure WY+ = WYZX using FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }
6 b. Closure WY+ = WYX using FD = { W → X, Y → X, Z → W, Z → Y }
From 6 a and 6 b, we found that both the Closure (by including WY → Z and excluding WY → Z) are not
equivalent, hence FD WY → Z is important and cannot be removed from the set of FD.
Hence resultant FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }
Since FD = { W → X, Y → X, Z → W, Z → Y, WY → Z } is resultant FD now, we have checked the redundancy
of attribute, since the left side of FD WY → Z has two attributes at its left, let's check their importance, i.e.
whether they both are important or only one.
Closure WY+ = WYZX using FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }
Closure W+ = WX using FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }
Closure Y+ = YX using FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }
Since the closure of WY+, W+, Y+ that we found are not all equivalent, hence in FD WY → Z, both W and Y are
important attributes and cannot be removed.
Hence resultant FD = { W → X, Y → X, Z → W, Z → Y, WY → Z } and we can rewrite as:
FD = { W → X, Y → X, Z → WY, WY → Z } is Canonical Cover of FD = { W → X, Y → X, Z → WXY, WY → Z
}.
Example 3: Given a relational Schema R( V, W, X, Y, Z) and set of Function Dependency FD = { V → W, VW →
X, Y → VXZ }. Find the canonical cover?
Solution: Given FD = { V → W, VW → X, Y → VXZ }. now decompose the FD using decomposition rule
(Armstrong Axiom).
V→W
VW → X
Y → V ( using decomposition inference rule on Y → VXZ )
Y → X ( using decomposition inference rule on Y → VXZ )
Y → Z ( using decomposition inference rule on Y → VXZ )
Now set of FD = { V → W, VW → X, Y → V, Y → X, Y → Z }.
The next step is to find closure of the left side of each of the given FD by including that FD and excluding that
FD, if closure in both cases are same then that FD is redundant and we remove that FD from the given set,
otherwise if both the closures are different then we do not exclude that FD.
Calculating closure of all FD { V → W, VW → X, Y → V, Y → X, Y → Z }.
1 a. Closure V+ = VWX using FD = {V → W, VW → X, Y → V, Y → X, Y → Z}
1 b. Closure V+ = V using FD = {VW → X, Y → V, Y → X, Y → Z }
From 1 a and 1 b, we found that both the Closure( by including V → W and excluding V → W ) are not
equivalent, hence FD V → W is important and cannot be removed from the set of FD.
Hence resultant FD = { V → W, VW → X, Y → V, Y → X, Y → Z }.
2 a. Closure VW+ = VWX using FD = { V → W, VW → X, Y → V, Y → X, Y → Z }
2 b. Closure VW+ = VW using FD = { V → W, Y → V, Y → X, Y → Z }
From 2 a and 2 b, we found that both the Closure( by including VW → X and excluding VW → X ) are not
equivalent, hence FD VW → X is important and cannot be removed from the set of FD.
Hence resultant FD = { V → W, VW → X, Y → V, Y → X, Y → Z }.
3 a. Closure Y+ = YVXZW using FD = { V → W, VW → X, Y → V, Y → X, Y → Z }
3 b. Closure Y+ = YXZ using FD = { V → W, VW → X, Y → X, Y → Z }
From 3 a and 3 b, we found that both the Closure( by including Y → V and excluding Y → V ) are not equivalent,
hence FD Y → V is important and cannot be removed from the set of FD.
Hence resultant FD = { V → W, VW → X, Y → V, Y → X, Y → Z }.
4 a. Closure Y+ = YXVZW using FD = { V → W, VW → X, Y → V, Y → X, Y → Z }
4 b. Closure Y+ = YVZWX using FD = { V → W, VW → X, Y → V, Y → Z }
From 4 a and 4 b, we found that both the Closure( by including Y → X and excluding Y → X ) are equivalent,
hence FD Y → X is not important and can be removed from the set of FD.
Hence resultant FD = { V → W, VW → X, Y → V, Y → Z }.
5 a. Closure Y+ = YZVWX using FD = { V → W, VW → X, Y → V, Y → Z }
5 b. Closure Y+ = YVWX using FD = { V → W, VW → X, Y → V }
From 5 a and 5 b, we found that both the Closure( by including Y → Z and excluding Y → Z ) are not equivalent,
hence FD Y → Z is important and cannot be removed from the set of FD.
Hence resultant FD = { V → W, VW → X, Y → V, Y → Z }.
Since FD = { V → W, VW → X, Y → V, Y → Z } is resultant FD now, we have checked the redundancy of
attribute, since the left side of FD VW → X has two attributes at its left, let's check their importance, i.e. whether
they both are important or only one.
Closure VW+ = VWX using FD = { V → W, VW → X, Y → V, Y → Z }
Closure V+ = VWX using FD = { V → W, VW → X, Y → V, Y → Z }
Closure W+ = W using FD = { V → W, VW → X, Y → V, Y → Z }
Comparing the closures of VW+, V+ and W+, we found that the closures of VW and V are equivalent; hence, in
FD VW → X, W is not an important attribute and can be removed.
Hence resultant FD = { V → W, V → X, Y → V, Y → Z } and we can rewrite as
FD = { V → WX, Y → VZ } is Canonical Cover of FD = { V → W, VW → X, Y → VXZ }.
CONCLUSION: From the above three examples we conclude that canonical cover / irreducible set of functional
dependency follows the following steps, which we need to follow while calculating Canonical Cover.
STEP 1: For a given set of FD, decompose each FD using decomposition rule (Armstrong Axiom) if the right side
of any FD has more than one attribute.
STEP 2: Now make a new set of FD having all decomposed FD.
STEP 3: Find closure of the left side of each of the given FD by including that FD and excluding that FD, if
closure in both cases are same then that FD is redundant and we remove that FD from the given set, otherwise if
both the closures are different then we do not exclude that FD.
STEP 4: Repeat STEP 3 until every FD in the set has been checked.
STEP 5: After STEP 4, the resultant FD set (for example, FD = { B → A, AD → C, C → B, C → D } in an earlier example) contains only the non-redundant FDs.
STEP 6: Check for redundant attributes by selecting those FDs from the FD set which have more than one attribute on the left. For example, if an FD AD → C has two attributes on its left, check their importance, i.e., whether both are important or only one.
STEP 6 a: Find Closure AD+
STEP 6 b: Find Closure A+
STEP 6 c: Find Closure D+
Compare the closures from STEPS 6a, 6b, and 6c. If the closures of AD+, A+, and D+ are all different, then in FD AD → C both A and D are important attributes and cannot be removed; otherwise, we remove the redundant attribute.
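The closure test used in STEP 3 can be sketched in code. This is a minimal illustration, not part of the tutorial; representing each FD as a (left, right) pair of attribute sets is an assumption made here for clarity.

```python
def closure(attrs, fds):
    """Compute the attribute closure of `attrs` under the FD set `fds`.
    `fds` is a list of (left, right) pairs of frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for left, right in fds:
            # apply an FD whenever its left side is already in the closure
            if left <= result and not right <= result:
                result |= right
                changed = True
    return result

# FD set from the example above: { V -> W, VW -> X, Y -> V, Y -> X, Y -> Z }
fds = [(frozenset('V'), frozenset('W')),
       (frozenset('VW'), frozenset('X')),
       (frozenset('Y'), frozenset('V')),
       (frozenset('Y'), frozenset('X')),
       (frozenset('Y'), frozenset('Z'))]

# Redundancy test for Y -> X (step 4 above): compare the closure of Y
# with the FD included against the closure with it excluded.
with_fd    = closure({'Y'}, fds)
without_fd = closure({'Y'}, [fd for fd in fds
                             if fd != (frozenset('Y'), frozenset('X'))])
print(with_fd == without_fd)   # True: Y -> X is redundant
```

Running the same comparison for Y → Z gives different closures, which is why that FD must be kept.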
File Organization
A file is a collection of records. Using the primary key, we can access the records. The type and frequency of access are determined by the file organization used for a given set of records.
File organization is a logical relationship among various records. This method defines how file records are
mapped onto disk blocks.
File organization is used to describe the way in which the records are stored in terms of blocks, and the blocks
are placed on the storage medium.
The first approach to mapping the database to files is to use several files and store records of only one fixed length in any given file. An alternative approach is to structure our files so that they can contain records of multiple lengths.
Files of fixed length records are easier to implement than the files of variable length records.
Objective of file organization
It provides an optimal selection of records, i.e., records can be selected as fast as possible.
Insert, delete, and update transactions on the records should be quick and easy.
Duplicate records should not be induced as a result of an insert, update, or delete.
For a minimal cost of storage, records should be stored efficiently.
Types of file organization:
File organization contains various methods. These particular methods have pros and cons on the basis of access
or selection. In the file organization, the programmer decides the best-suited file organization method according
to his requirement.
Types of file organization are as follows:
Heap File Organization
If we want to search, update, or delete data in heap file organization, then we need to traverse the data from the start of the file until we get the requested record.
If the database is very large then searching, updating or deleting of record will be time-consuming because there
is no sorting or ordering of records. In the heap file organization, we need to check all the data until we get the
requested record.
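The full scan described above can be sketched as follows. This is a minimal illustration only; the record layout and field names are invented for the example.

```python
# Records in a heap file are unordered, so a lookup must scan from the
# start of the file until the requested record is found.
heap_file = [
    {"id": 105, "name": "Ava"},
    {"id": 101, "name": "Ben"},   # note: no ordering among records
    {"id": 109, "name": "Cal"},
]

def heap_search(records, key):
    for record in records:        # worst case: reads every record
        if record["id"] == key:
            return record
    return None                   # key not present anywhere in the file

print(heap_search(heap_file, 109))  # found only after scanning all records
```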
Pros of Heap file organization
It is a very good method of file organization for bulk insertion. If there is a large number of data which needs to
load into the database at a time, then this method is best suited.
In the case of a small database, fetching and retrieving records is faster than in sequential file organization.
Cons of Heap file organization
This method is inefficient for the large database because it takes time to search or modify the record.
Hash File Organization
When a record has to be retrieved using the hash key columns, then the address is generated, and the whole record is retrieved using that address. In the same way, when a new record has to be inserted, then the address is generated using the hash key and the record is directly inserted. The same process is applied in the case of delete and update.
In this method, there is no effort for searching and sorting the entire file. In this method, each record will be
stored randomly in the memory.
B+ File Organization
B+ tree file organization is an advanced form of the indexed sequential access method (ISAM). It uses a tree-like structure to store records in a file.
It uses the same concept of key-index where the primary key is used to sort the records. For each primary key,
the value of the index is generated and mapped with the record.
The B+ tree is similar to a binary search tree (BST), but it can have more than two children. In this method, all
the records are stored only at the leaf node. Intermediate nodes act as a pointer to the leaf nodes. They do not
contain any records.
Indexed Sequential Access Method (ISAM)
If any record has to be retrieved based on its index value, then the address of the data block is fetched and the record is retrieved from memory.
Pros of ISAM:
In this method, each record has the address of its data block, searching a record in a huge database is quick and
easy.
This method supports range retrieval and partial retrieval of records. Since the index is based on the primary key
values, we can retrieve the data for the given range of value. In the same way, the partial value can also be
easily searched, i.e., the student name starting with 'JA' can be easily searched.
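The partial retrieval described above, e.g., names starting with 'JA', works because the index is sorted, so a prefix query reduces to binary search. The following sketch (with invented names) illustrates the idea using Python's bisect module.

```python
import bisect

# A sorted index over the name column (values are illustrative).
names = sorted(["JACK", "JANE", "JASON", "JOHN", "MARY"])

def prefix_search(index, prefix):
    # binary-search the first entry >= prefix, and the first entry past
    # the prefix range, then slice between them
    lo = bisect.bisect_left(index, prefix)
    hi = bisect.bisect_left(index, prefix + "\uffff")  # just past the prefix
    return index[lo:hi]

print(prefix_search(names, "JA"))   # ['JACK', 'JANE', 'JASON']
```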
Cons of ISAM
This method requires extra space in the disk to store the index value.
When the new records are inserted, then these files have to be reconstructed to maintain the sequence.
When the record is deleted, then the space used by it needs to be released. Otherwise, the performance of the
database will slow down.
Cluster file organization
When two or more related records are stored in the same file, it is known as a cluster. These files will have two or more tables in the same data block, and the key attributes which are used to map these tables together are stored only once.
This method reduces the cost of searching for various records in different files.
The cluster file organization is used when there is a frequent need for joining the tables with the same condition.
These joins will give only a few records from both tables. In the given example, we are retrieving the record for
only particular departments. This method can't be used to retrieve the record for the entire department.
In this method, we can directly insert, update or delete any record. Data is sorted based on the key with which
searching is done. Cluster key is a type of key with which joining of the table is performed.
Types of Cluster file organization:
Cluster file organization is of two types:
1. Indexed Clusters:
In an indexed cluster, records are grouped based on the cluster key and stored together. The above EMPLOYEE and DEPARTMENT relationship is an example of an indexed cluster, where all the records are grouped based on the cluster key, DEP_ID.
2. Hash Clusters:
It is similar to the indexed cluster. In hash cluster, instead of storing the records based on the cluster key, we
generate the value of the hash key for the cluster key and store the records with the same hash key value.
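The grouping idea behind an indexed cluster can be sketched as follows. This is only an in-memory illustration (the tables, DEP_ID values, and names are invented); a real DBMS groups the records on disk blocks.

```python
from collections import defaultdict

# Two related tables sharing the cluster key DEP_ID.
departments = {10: "Sales", 20: "HR"}
employees = [("E1", 10), ("E2", 20), ("E3", 10)]

# Group both tables' records by the cluster key; the key appears once
# per cluster rather than once per record.
cluster = defaultdict(lambda: {"dept": None, "emps": []})
for dep_id, name in departments.items():
    cluster[dep_id]["dept"] = name           # DEP_ID stored once per cluster
for emp_id, dep_id in employees:
    cluster[dep_id]["emps"].append(emp_id)   # grouped with their department

print(cluster[10])   # {'dept': 'Sales', 'emps': ['E1', 'E3']}
```

A join on DEP_ID now costs a single cluster lookup instead of a search across two files.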
Pros of Cluster file organization
The cluster file organization is used when there is a frequent request for joining the tables with same joining
condition.
It provides the efficient result when there is a 1:M mapping between the tables.
Cons of Cluster file organization
This method gives low performance for very large databases.
If there is any change in the joining condition, then this method cannot be used. If we change the condition of joining, then traversing the file takes a lot of time.
This method is not suitable for a table with a 1:1 mapping.
Indexing in DBMS
Indexing is used to optimize the performance of a database by minimizing the number of disk accesses required
when a query is processed.
The index is a type of data structure. It is used to locate and access the data in a database table quickly.
Index structure:
Indexes can be created using some database columns.
The first column of the index is the search key that contains a copy of the primary key or candidate key of the table. The values of the primary key are stored in sorted order so that the corresponding data can be accessed easily.
The second column of the index is the data reference. It contains a set of pointers holding the address of the disk block where the value of the particular key can be found.
Indexing Methods
Ordered indices
The indices are usually sorted to make searching faster. The indices which are sorted are known as ordered
indices.
Example: Suppose we have an employee table with thousands of records, each of which is 10 bytes long. Their IDs start with 1, 2, 3, ... and so on, and we have to search for the employee with ID 543.
In the case of a database with no index, we have to scan the disk blocks from the start until we reach 543. The DBMS will read the record after reading 543*10 = 5430 bytes.
In the case of an index (assuming each index entry is 2 bytes), we will search using the index, and the DBMS will read the record after reading 542*2 = 1084 bytes, which is far less than in the previous case.
Primary Index
If the index is created on the basis of the primary key of the table, then it is known as primary indexing. These
primary keys are unique to each record and contain 1:1 relation between the records.
As primary keys are stored in sorted order, the performance of the searching operation is quite efficient.
The primary index can be classified into two types: Dense index and Sparse index.
Dense index
The dense index contains an index record for every search key value in the data file. It makes searching faster.
In this, the number of records in the index table is the same as the number of records in the main table.
It needs more space to store the index records themselves. The index records have the search key and a pointer to the actual record on the disk.
Sparse index
In the data file, index record appears only for a few items. Each item points to a block.
In this, instead of pointing to each record in the main table, the index points to the records in the main table in a
gap.
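The contrast between dense and sparse indexing can be sketched as follows. This is a simplified in-memory illustration (the records, keys, and a "block" of 2 records are invented): a dense index has an entry per record, while a sparse index has one entry per block and scans forward from the nearest entry.

```python
# Data file sorted by key (required for a sparse index to work).
records = [(101, "A"), (102, "B"), (103, "C"), (104, "D")]

dense_index  = {key: pos for pos, (key, _) in enumerate(records)}
sparse_index = [(101, 0), (103, 2)]      # one entry per 2-record "block"

def sparse_lookup(key):
    # find the last index entry whose key <= search key, then scan forward
    start = max(pos for k, pos in sparse_index if k <= key)
    for pos in range(start, len(records)):
        if records[pos][0] == key:
            return records[pos]

print(records[dense_index[104]])   # (104, 'D') -- direct, one probe
print(sparse_lookup(104))          # (104, 'D') -- nearest entry, then scan
```

The sparse index trades a short forward scan for a much smaller index table.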
Clustering Index
A clustered index can be defined as an ordered data file. Sometimes the index is created on non-primary key
columns which may not be unique for each record.
In this case, to identify the record faster, we will group two or more columns to get the unique value and create
index out of them. This method is called a clustering index.
The records which have similar characteristics are grouped, and indexes are created for these group.
Example: suppose a company contains several employees in each department. Suppose we use a clustering
index, where all employees which belong to the same Dept_ID are considered within a single cluster, and index
pointers point to the cluster as a whole. Here Dept_Id is a non-unique key.
The previous schema is a little confusing because one disk block is shared by records which belong to different clusters. Using a separate disk block for each cluster is a better technique.
Secondary Index
In the sparse indexing, as the size of the table grows, the size of mapping also grows. These mappings are
usually kept in the primary memory so that address fetch should be faster. Then the secondary memory
searches the actual data based on the address got from mapping. If the mapping size grows then fetching the
address itself becomes slower. In this case, the sparse index will not be efficient. To overcome this problem,
secondary indexing is introduced.
In secondary indexing, to reduce the size of mapping, another level of indexing is introduced. In this method, the
huge range for the columns is selected initially so that the mapping size of the first level becomes small. Then
each range is further divided into smaller ranges. The mapping of the first level is stored in the primary memory,
so that address fetch is faster. The mapping of the second level and actual data are stored in the secondary
memory (hard disk).
For example:
If you want to find the record of roll 111 in the diagram, then it will search for the highest entry which is smaller than or equal to 111 in the first-level index. It will get 100 at this level.
Then, in the second-level index, it again finds the highest entry smaller than or equal to 111 and gets 110. Now, using the address 110, it goes to the data block and scans each record until it gets 111.
This is how a search is performed in this method. Inserting, updating or deleting is also done in the same
manner.
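The two-level lookup for roll 111 can be sketched as follows. The numbers come from the text; the dictionary structures and labels ("L2a", "blk2", etc.) are invented placeholders for the index and block addresses.

```python
# First-level index (in primary memory) maps coarse ranges to second-level
# indexes; second level (on disk) maps finer ranges to data blocks.
first_level  = {1: "L2a", 100: "L2b"}
second_level = {"L2b": {100: "blk1", 110: "blk2"}}
blocks = {"blk2": [110, 111, 112]}

def floor_key(mapping, key):
    # highest entry smaller than or equal to `key`
    return max(k for k in mapping if k <= key)

roll = 111
k1 = floor_key(first_level, roll)                      # -> 100
k2 = floor_key(second_level[first_level[k1]], roll)    # -> 110
block = blocks[second_level[first_level[k1]][k2]]
print(roll in block)   # True: sequential scan of the data block finds 111
```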
B+ Tree
The B+ tree is a balanced search tree, similar to a binary search tree except that a node can have more than two children. It follows a multi-level index format.
In the B+ tree, leaf nodes denote actual data pointers. The B+ tree ensures that all leaf nodes remain at the same height.
In the B+ tree, the leaf nodes are linked together in a linked list. Therefore, a B+ tree can support random access as well as sequential access.
Structure of B+ Tree
In the B+ tree, every leaf node is at equal distance from the root node. The B+ tree is of the order n where n is
fixed for every B+ tree.
It contains an internal node and leaf node.
Internal node
An internal node of the B+ tree (except the root node) contains at least n/2 child pointers.
At most, an internal node of the tree contains n child pointers.
Leaf node
A leaf node of the B+ tree contains at least n/2 record pointers and n/2 key values.
At most, a leaf node contains n record pointers and n key values.
Every leaf node of the B+ tree contains one block pointer P to point to the next leaf node.
Searching a record in B+ Tree
Suppose we have to search 55 in the below B+ tree structure. First, we will fetch for the intermediary node which
will direct to the leaf node that can contain a record for 55.
So, in the intermediary node, we will find a branch between 50 and 75 nodes. Then at the end, we will be
redirected to the third leaf node. Here DBMS will perform a sequential search to find 55.
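The descent-then-scan search described above can be sketched as follows. This is an illustrative toy tree, not the one in the diagram: the node class, key values, and record labels ("r55", etc.) are invented.

```python
import bisect

class Node:
    def __init__(self, keys, children=None, records=None):
        self.keys = keys            # internal: routing keys; leaf: key values
        self.children = children    # internal nodes only
        self.records = records      # leaf nodes only (records stay in leaves)

leaf1 = Node([10, 25], records=["r10", "r25"])
leaf2 = Node([50, 55, 65], records=["r50", "r55", "r65"])
leaf3 = Node([75, 90], records=["r75", "r90"])
root  = Node([50, 75], children=[leaf1, leaf2, leaf3])

def search(node, key):
    while node.children:                       # descend to a leaf via keys
        node = node.children[bisect.bisect_right(node.keys, key)]
    for k, rec in zip(node.keys, node.records):  # sequential scan in leaf
        if k == key:
            return rec
    return None

print(search(root, 55))   # r55: branch between 50 and 75, scan the leaf
```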
B+ Tree Insertion
Suppose we want to insert a record 60 in the below structure. It will go to the 3rd leaf node after 55. It is a
balanced tree, and a leaf node of this tree is already full, so we cannot insert 60 there.
In this case, we have to split the leaf node, so that it can be inserted into tree without affecting the fill factor,
balance and order.
The 3rd leaf node has the values (50, 55, 60, 65, 70), and the key in its parent (intermediate) node is 50. We will split the leaf node of the tree in the middle so that its balance is not altered. So we can group (50, 55) and (60, 65, 70) into 2 leaf nodes.
If these two have to be leaf nodes, the intermediate node cannot branch from 50 alone. It should have 60 added to it, and then we can have a pointer to the new leaf node.
This is how we can insert an entry when there is overflow. In a normal scenario, it is very easy to find the node
where it fits and then place it in that leaf node.
B+ Tree Deletion
Suppose we want to delete 60 from the above example. In this case, we have to remove 60 from the
intermediate node as well as from the 4th leaf node too. If we remove it from the intermediate node, then the tree
will not satisfy the rule of the B+ tree. So we need to modify it to have a balanced tree.
After deleting node 60 from above B+ tree and re-arranging the nodes, it will show as follows:
Hashing
In a huge database structure, it is very inefficient to search all the index values and reach the desired data.
Hashing technique is used to calculate the direct location of a data record on the disk without using index
structure.
In this technique, data is stored at the data blocks whose address is generated by using the hashing function.
The memory location where these records are stored is known as data bucket or data blocks.
In this, a hash function can choose any of the column values to generate the address. Most of the time, the hash function uses the primary key to generate the address of the data block. The hash function can range from a simple mathematical function to any complex mathematical function. We can even consider the primary key itself as the address of the data block, i.e., each row is stored in the data block whose address is the same as its primary key.
The above diagram shows data block addresses that are the same as the primary key values. The hash function can also be a simple mathematical function like exponential, mod, cos, sin, etc. Suppose we have a mod (5) hash function to determine the address of the data block. In this case, it applies the mod (5) hash function to the primary keys and generates 3, 3, 1, 4, and 2 respectively, and the records are stored at those data block addresses.
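The mod (5) example can be sketched in code. The specific key values below are an assumption chosen so that the addresses match the 3, 3, 1, 4, 2 sequence in the text.

```python
# Bucket address of each record = primary key modulo 5.
primary_keys = [103, 108, 106, 104, 107]   # illustrative key values

addresses = [key % 5 for key in primary_keys]
print(addresses)   # [3, 3, 1, 4, 2]
```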
Types of Hashing:
Static Hashing
Dynamic Hashing
Static Hashing
In static hashing, the resultant data bucket address will always be the same. That means if we generate an address for EMP_ID = 103 using the hash function mod (5), then it will always result in the same bucket address, 3. Here, there will be no change in the bucket address.
Hence in this static hashing, the number of data buckets in memory remains constant throughout. In this
example, we will have five data buckets in the memory used to store the data.
Close Hashing
When buckets are full, then a new data bucket is allocated for the same hash result and is linked after the
previous one. This mechanism is known as Overflow chaining.
For example: Suppose R3 is a new record which needs to be inserted into the table, and the hash function generates the address 110 for it. But the bucket at that address is full. In this case, a new bucket is inserted at the end of bucket 110 and is linked to it.
Dynamic Hashing
The dynamic hashing method is used to overcome the problems of static hashing like bucket overflow.
In this method, data buckets grow or shrink as the number of records increases or decreases. This method is also known as the extendable hashing method.
This method makes hashing dynamic, i.e., it allows insertion or deletion without resulting in poor performance.
How to search a key
First, calculate the hash address of the key.
Check how many bits are used in the directory; call this number of bits i.
Take the least significant i bits of the hash address. This gives an index into the directory.
Now using the index, go to the directory and find bucket address where the record might be.
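The search steps above can be sketched as follows. This is a minimal illustration only; the directory contents and bucket labels ("B0" to "B3") are taken from the example below, and the rest is invented.

```python
# Directory uses the least significant i bits of the hash address.
i = 2                                 # bits currently used by the directory
directory = {0b00: "B0", 0b01: "B1", 0b10: "B2", 0b11: "B3"}
buckets = {"B1": [5, 6]}              # keys whose hash addresses end in 01

def bucket_for(hash_address):
    index = hash_address & ((1 << i) - 1)   # least significant i bits
    return directory[index]

print(bucket_for(0b10001))   # B1: hash address of key 9 ends in 01
```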
How to insert a new record
Firstly, you have to follow the same procedure for retrieval, ending up in some bucket.
If there is still space in that bucket, then place the record in it.
If the bucket is full, then we will split the bucket and redistribute the records.
For example:
Consider the following grouping of keys into buckets, based on the last bits of their hash addresses:
The last two bits of the hash addresses of 2 and 4 are 00, so they go into bucket B0. The last two bits of 5 and 6 are 01, so they go into bucket B1. The last two bits of 1 and 3 are 10, so they go into bucket B2. The last two bits of 7 are 11, so it goes into B3.
Insert key 9 with hash address 10001 into the above structure:
Since key 9 has hash address 10001, whose last two bits are 01, it must go into bucket B1. But bucket B1 is full, so it will get split.
The splitting will separate 5, 9 from 6 since last three bits of 5, 9 are 001, so it will go into bucket B1, and the last
three bits of 6 are 101, so it will go into bucket B5.
Keys 2 and 4 are still in B0. Bucket B0 is pointed to by the directory entries 000 and 100, because the last two bits of both entries are 00.
Keys 1 and 3 are still in B2. Bucket B2 is pointed to by the directory entries 010 and 110, because the last two bits of both entries are 10.
Key 7 is still in B3. Bucket B3 is pointed to by the directory entries 111 and 011, because the last two bits of both entries are 11.
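The split of bucket B1 can be sketched as follows. The hash addresses for keys 5, 6, and 9 are the ones given in the example; redistributing on the last three bits separates 5 and 9 (ending in 001) from 6 (ending in 101).

```python
# Hash addresses from the example: 5 and 9 end in 001, 6 ends in 101.
hash_address = {5: 0b00001, 6: 0b00101, 9: 0b10001}

overflowing = [5, 6, 9]                      # contents of the full bucket B1
b1, b5 = [], []
for key in overflowing:
    if hash_address[key] & 0b111 == 0b001:   # last three bits are 001
        b1.append(key)
    else:                                    # last three bits are 101
        b5.append(key)

print(b1, b5)   # [5, 9] [6]
```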