Introduction of DBMS (Database Management System)
DBMS stands for Database Management System. We can break it down like this: DBMS = Database + Management System. A database is a collection of data, and a management system is a set of programs to store and retrieve that data. Based on this, we can define a DBMS as a collection of interrelated data together with a set of programs to store and access that data in an easy and effective manner.
The main purpose of a database system is to manage data. Consider a university that keeps data about students, teachers, courses, books, etc. To manage this data we need to store it somewhere we can add new data, delete unused data, update outdated data, and retrieve data. To perform these operations we need a database management system that stores the data in such a way that all of these operations can be performed on it efficiently.
Disadvantages of the file system:
Redundancy of data: Data is said to be redundant if the same data is copied in many places. If a student wants to change a phone number, it has to be updated in various sections. Similarly, old records must be deleted from all sections representing that student.
Difficult data access: A user must know the exact location of a file to access its data, so the process is slow and time-consuming. Consider how difficult it would be to find one student's hostel allotment number among 10,000 unsorted student records.
No backup and recovery: The file system does not provide any backup and recovery of data if a file is lost or corrupted.
Compared to a file-based data management system, a database management system has many advantages. Some of these advantages are given below.
File-based data management systems contained multiple files stored in many different locations in a system, or even across multiple systems. Because of this, there were sometimes multiple copies of the same file, which led to data redundancy.
This is prevented in a database, as there is a single database and any change to it is reflected immediately. Because of this, there is no chance of encountering duplicate data.
Sharing of Data
In a database, the users of the database can share the data among themselves. There are various
levels of authorisation to access the data, and consequently the data can only be shared based on
the correct authorisation protocols being followed.
Many remote users can also access the database simultaneously and share the data between
themselves.
Data Integrity
Data integrity means that the data is accurate and consistent in the database. Data Integrity is
very important as there are multiple databases in a DBMS. All of these databases contain data
that is visible to multiple users. So it is necessary to ensure that the data is correct and consistent
in all the databases and for all the users.
Data Security
Data security is a vital concept in a database. Only authorised users should be allowed to access
the database, and their identity should be authenticated using a username and password.
Unauthorised users should not be allowed to access the database under any circumstances as it
violates the integrity constraints.
Privacy
The privacy rule in a database means only the authorized users can access a database according
to its privacy constraints. There are levels of database access and a user can only view the data
he is allowed to. For example - In social networking sites, access constraints are different for
different accounts a user may want to access.
Backup and Recovery
A database management system automatically takes care of backup and recovery. Users don't need to back up data periodically because this is taken care of by the DBMS. Moreover, it also restores the database to its previous condition after a crash or system failure.
Data Consistency
Data consistency is ensured in a database because there is no data redundancy. All data appears
consistently across the database and the data is same for all the users viewing the database.
Moreover, any changes made to the database are immediately reflected to all the users and there
is no data inconsistency.
Disadvantages of DBMS:
2. Conceptual level
It is also called the logical level. The whole design of the database, such as the relationships among data and the schema of the data, is described at this level.
Database constraints and security are also implemented at this level of the architecture. This level is maintained by the DBA (database administrator).
3. Internal level
This level is also known as the physical level. It describes how the data is actually stored on the storage devices and is also responsible for allocating space to the data. This is the lowest level of the architecture.
Physical DBMS Architecture
The physical architecture describes the software components used to enter and process data, and how these software components are related and interconnected. It is possible to identify a number of key functions which are common to most database management systems. Based on these functions, the database system may be partitioned into the following modules.
DDL Compiler:
The Data Definition Language (DDL) compiler processes schema definitions specified in the DDL. It stores metadata such as the names of the files, data items, storage details of each file, mapping information, and constraints.
Database Administrator:
The Database Administrator (DBA) is the person responsible for controlling, maintaining, coordinating, and operating the database management system. Managing, securing, and taking care of the database system is the DBA's prime responsibility.
The DBA manages and controls all three levels of the DBMS architecture (internal, conceptual, and external) and, in discussion with the wider user community, defines the overall view of the database. The DBA then provides the external views for different users and applications.
The DBA is responsible for maintaining the integrity and security of the database, restricting it from unauthorized users. The DBA grants permissions to users of the database and maintains a profile of each and every user.
The DBA is also accountable for keeping the database protected and secure, and for keeping any chance of data loss to a minimum.
Decides hardware –
The DBA decides on economical hardware, based on cost, performance and efficiency, that best suits the organization. The hardware is the interface between the end users and the database.
Manages data integrity and security –
Data integrity needs to be checked and managed accurately, as it protects and restricts data from unauthorized use. The DBA keeps an eye on the relationships within the data to maintain data integrity.
Database design –
The DBA is responsible and accountable for the logical design, physical design, external model design, and integrity and security control.
Database implementation –
The DBA implements the DBMS and checks database loading at the time of implementation.
Query processing performance –
The DBA enhances query processing by improving its speed, performance and accuracy.
Tuning database performance –
If users cannot get data quickly and accurately, the organization may lose business. By tuning SQL commands, the DBA can enhance the performance of the database.
Administrative DBA –
Their job is to maintain the server and keep it functional. They are concerned with data backups, security, troubleshooting, replication, migration, etc.
Data warehouse DBA –
They take on the earlier roles, but are also accountable for merging data from various sources into the data warehouse. They also design the warehouse, and clean and scrub the data prior to loading.
Development DBA –
They build and develop queries, stored procedures, etc. that meet the firm's or organization's needs. They work on a par with programmers.
Application DBA –
They manage all requirements of the application components that interact with the database, and carry out activities such as application installation and coordination, application upgrades, database cloning, data-load process management, etc.
Architect –
They are responsible for designing schemas, e.g. building tables. They work to build a structure that meets the organization's needs. The design is then used by developers and development DBAs to design and implement the real application.
OLAP DBA –
They design and build multi-dimensional cubes for decision support or OLAP systems.
TYPES OF DATABASES:
Databases can be classified according to the number of users, the location of the database, and the type of usage.
Users: The number of users determines whether the database is single-user or multi-user. A single-user database allows only one user to interact with the database at a time. Generally, a desktop system is used for a single-user database, so it can be termed a desktop database. A multi-user database allows multiple users at a time. It can be used for either a work-group or an enterprise application. A database meant for a department or a small organization is termed a work-group database, whereas an enterprise database is defined for an organization with multiple departments and a larger number of users.
Location: A database can also be classified by location. A database located at a single location is called a centralized database. A database distributed across different locations is called a distributed database.
Relational model
It is used for data storage and processing.
A relation is a table with columns and rows; it is based on the mathematical concept of a relation, which is represented physically as a table.
Some of the benefits of the relational model are:
Ease of use: The simple tabular representation of the database helps the user define and query the database conveniently.
Flexibility: New data can be added and deleted easily. Manipulation of data from various tables can also be done easily using a few basic operations.
Accuracy: In an RDBMS, relational algebraic operations are used to manipulate the database.
Table ↔ Relation
Properties of a relation:-
1. A relation has a name that is distinct from all other names in the relational schema.
2. Each cell of the relation contains exactly one value.
3. Each attribute has a distinct name.
4. Each tuple is distinct; there are no duplicate tuples in a relation.
Relational keys
KEYS in DBMS:- A key is an attribute or set of attributes which helps you to identify a row (tuple) in a relation (table). Keys allow you to find the relationship between two tables, and help you uniquely identify a row in a table by a combination of one or more columns in that table. A key is also helpful for finding a unique record or row in a table.
Consider a Student table with attributes student_id, name, phone and age.
Super Key
A super key is defined as a set of attributes within a table that can uniquely identify each record in the table. A super key is a superset of a candidate key.
In the table defined above, super keys would include student_id, (student_id, name), phone, (student_id, age) and (student_id, age, name, phone).
Confused? The first one is pretty simple: student_id is unique for every row of data, hence it can be used to identify each row uniquely.
Next comes (student_id, name): the names of two students can be the same, but their student_id can't be, hence this combination can also be a key.
Similarly, the phone number of every student will be unique, hence phone can also be a key. Next comes (student_id, age): the age of two students can be the same, but their student_id can't be, hence this combination can also be a key.
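The uniqueness test above can be sketched in a few lines of Python. This is a minimal sketch, not part of the original text: the Student rows and their values below are illustrative assumptions.

```python
# Check whether a set of attributes is a super key: the projection of the
# rows onto those attributes must contain no duplicate tuples.
def is_super_key(rows, attributes):
    projected = [tuple(row[a] for a in attributes) for row in rows]
    return len(projected) == len(set(projected))

# Illustrative Student rows (names and values are assumed for the example).
students = [
    {"student_id": 1, "name": "Akon", "phone": "9876", "age": 17},
    {"student_id": 2, "name": "Bkon", "phone": "9877", "age": 18},
    {"student_id": 3, "name": "Akon", "phone": "9878", "age": 18},
]

print(is_super_key(students, ["student_id"]))          # True: ids are unique
print(is_super_key(students, ["name"]))                # False: two students share a name
print(is_super_key(students, ["student_id", "name"]))  # True: any superset of a key is a super key
```

Note how adding any attribute to a super key yields another super key, which is why the list of super keys above grows so quickly.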
Candidate Key
Candidate keys are defined as the minimal set of fields which can uniquely identify each record in a table. A candidate key is an attribute, or a set of attributes, that can act as a primary key for a table to uniquely identify each record in that table. There can be more than one candidate key.
In our example, student_id and phone are both candidate keys for the table Student.
A candidate key can never be NULL or empty, and its value must be unique.
There can be more than one candidate key for a table.
A candidate key can be a combination of more than one column (attribute).
Primary Key
Primary key is a candidate key that is most appropriate to become the main key for any table. It
is a key that can uniquely identify each record in a table.
For the table Student we can make the student_id column as the primary key.
Composite Key
A key that consists of two or more attributes that together uniquely identify any record in a table is called a composite key. The attributes which together form the composite key are not keys independently or individually.
Consider a Score table which stores the marks scored by a student in a particular subject.
In this table, student_id and subject_id together form the primary key, hence it is a composite key.
The candidate keys which are not selected as the primary key are known as secondary keys or alternate keys.
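A composite primary key can be declared directly in SQL. The sketch below uses Python's sqlite3 module; the column names follow the text, while the rows and the third `marks` value are illustrative assumptions.

```python
import sqlite3

# Minimal sketch of the Score table: student_id + subject_id together
# form the composite primary key.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE score (
        student_id INTEGER,
        subject_id INTEGER,
        marks      INTEGER,
        PRIMARY KEY (student_id, subject_id)
    )
""")
conn.execute("INSERT INTO score VALUES (10, 1, 70)")
conn.execute("INSERT INTO score VALUES (10, 2, 75)")      # same student, different subject: allowed
try:
    conn.execute("INSERT INTO score VALUES (10, 1, 99)")  # same (student, subject) pair again
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print("duplicate (student_id, subject_id) rejected:", duplicate_rejected)
```

Neither column alone is unique here (student 10 appears twice, subject 1 appears twice), but the pair is, which is exactly the composite-key property described above.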
FOREIGN KEY
A FOREIGN KEY is a field (or collection of fields) in one table that refers to the PRIMARY
KEY in another table.
The table containing the foreign key is called the child table, and the table containing the
candidate key is called the referenced or parent table.
"Persons" table:
"Orders" table:
In the above example "PersonID" column in the "Orders" table points to the "PersonID" column
in the "Persons" table.
The "PersonID" column in the "Persons" table is the PRIMARY KEY in the "Persons" table.
The "PersonID" column in the "Orders" table is a FOREIGN KEY in the "Orders" table.
The FOREIGN KEY constraint is used to prevent actions that would destroy links between
tables.
The FOREIGN KEY constraint also prevents invalid data from being inserted into the foreign
key column, because it has to be one of the values contained in the table it points to.
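The Persons/Orders relationship can be sketched with sqlite3. This is an illustrative sketch: the sample rows are assumptions, and note that SQLite only enforces foreign keys when the `foreign_keys` pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FK constraints only when this is on
conn.execute("CREATE TABLE persons (person_id INTEGER PRIMARY KEY, last_name TEXT)")
conn.execute("""
    CREATE TABLE orders (
        order_id  INTEGER PRIMARY KEY,
        person_id INTEGER REFERENCES persons(person_id)
    )
""")
conn.execute("INSERT INTO persons VALUES (1, 'Hansen')")
conn.execute("INSERT INTO orders VALUES (101, 1)")       # valid: person 1 exists in the parent table
try:
    conn.execute("INSERT INTO orders VALUES (102, 99)")  # invalid: no person 99 in the parent table
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
print("orphan order rejected:", fk_rejected)
```

The rejected insert is exactly the "invalid data" case described above: a foreign-key value must be one of the values contained in the table it points to.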
Database Users:
1. Naive user: Need not be aware of the presence of the database system. These are end users of the database who work through menu-driven applications.
2. Application programmer: Responsible for developing application programs or user-interface applications. The application program will be written in a high-level language.
3. Sophisticated user: Interacts with the system without writing programs; they make their requests in a query language.
4. Specialized user: Writes specialized database applications that do not fit into the traditional database-processing framework.
5. Online user: May communicate with the database directly online.
Constraints in DBMS-
Relational constraints are the restrictions imposed on the database contents and operations.
They ensure the correctness of data in the database.
Every relation has some conditions that must hold for it to be a valid relation; these conditions are called relational integrity constraints.
Types of Constraints in DBMS-
In DBMS, there are the following types of relational constraints-
1. Domain constraint
2. Tuple Uniqueness constraint
3. Key constraint
4. Entity Integrity constraint
5. Referential Integrity Constraint
1. Domain Constraint-
Domain constraint defines the domain or set of values for an attribute.
It specifies that the value taken by the attribute must be the atomic value from its domain.
Example-
Student_id  Name      Age
S001        Akshay    20
S002        Abhishek  21
S003        Shashank  20
S004        Rahul     A
Here, the value ‘A’ is not allowed, since only integer values can be taken by the Age attribute.
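A domain constraint like this can be approximated in SQL with a CHECK clause. The sketch below is an assumption-laden illustration: SQLite is dynamically typed, so the `typeof` check is needed, and the 0–150 bound on age is an invented example range, not from the text.

```python
import sqlite3

# Approximate the domain constraint on Age: the stored value must
# actually be an integer (the 0-150 range is an assumed bound).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        student_id TEXT PRIMARY KEY,
        name       TEXT,
        age        INTEGER CHECK (typeof(age) = 'integer' AND age BETWEEN 0 AND 150)
    )
""")
conn.execute("INSERT INTO student VALUES ('S001', 'Akshay', 20)")
try:
    conn.execute("INSERT INTO student VALUES ('S004', 'Rahul', 'A')")  # 'A' is outside the Age domain
    domain_rejected = False
except sqlite3.IntegrityError:
    domain_rejected = True
print("non-integer age rejected:", domain_rejected)
```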
2. Tuple Uniqueness Constraint-
Tuple uniqueness constraint specifies that all the tuples in a relation must be unique.
Example-01:
Student_id  Name      Age
S001        Akshay    20
S002        Abhishek  21
S003        Shashank  20
S004        Rahul     20
This relation satisfies the tuple uniqueness constraint, since all the tuples here are unique.
Example-02:
Student_id  Name      Age
S001        Akshay    20
S001        Akshay    20
S003        Shashank  20
S004        Rahul     20
This relation does not satisfy the tuple uniqueness constraint, since the tuples here are not all unique.
3. Key Constraint-
Key constraint specifies that all the values of the primary key must be unique in a relation.
Example-01:
Student_id  Name      Age
S001        Akshay    20
S001        Abhishek  21
S003        Shashank  20
S004        Rahul     20
This relation does not satisfy the key constraint, as the values of the primary key here are not all unique.
Example-02:
Student_id  Name      Age
S001        Akshay    20
S002        Abhishek  21
S003        Shashank  20
S004        Rahul     20
This relation satisfies the key constraint, as all the values of the primary key here are unique.
4. Entity Integrity Constraint-
Entity integrity constraint specifies that no attribute of the primary key may contain a null value in any relation.
This is because the presence of a null value in the primary key violates the uniqueness property.
Example-
Student_id  Name      Age
S001        Akshay    20
S002        Abhishek  21
S003        Shashank  20
            Rahul     20
This relation does not satisfy the entity integrity constraint, as the primary key here contains a NULL value.
5. Referential Integrity Constraint-
This constraint is enforced when a foreign key references the primary key of a relation.
It specifies that all the values taken by the foreign key must either be present in the relation of the primary key or be null.
Important Results-
The following two important results emerge from the referential integrity constraint-
We cannot insert a record into a referencing relation if the corresponding record does not exist in the referenced relation.
We cannot delete or update a record of the referenced relation if the corresponding record exists in the referencing relation.
Example-
Department
Dept_no Dept_name
D10 ASET
D11 ALS
D12 ASFL
D13 ASHS
Here, the relation ‘Student’ does not satisfy the referential integrity constraint, because no value of the primary key in relation ‘Department’ specifies department D14.
Thus, the referential integrity constraint is violated.
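The two important results above can be demonstrated with sqlite3. This is a sketch under assumptions: the Student rows are invented, and SQLite must have its `foreign_keys` pragma enabled to enforce the constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE department (dept_no TEXT PRIMARY KEY, dept_name TEXT)")
conn.execute("""
    CREATE TABLE student (
        student_id TEXT PRIMARY KEY,
        dept_no    TEXT REFERENCES department(dept_no)
    )
""")
conn.execute("INSERT INTO department VALUES ('D10', 'ASET')")
conn.execute("INSERT INTO student VALUES ('S001', 'D10')")

# Result 1: cannot insert a referencing row whose department does not exist.
try:
    conn.execute("INSERT INTO student VALUES ('S002', 'D14')")
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

# Result 2: cannot delete a referenced department while a student points at it.
try:
    conn.execute("DELETE FROM department WHERE dept_no = 'D10'")
    delete_rejected = False
except sqlite3.IntegrityError:
    delete_rejected = True

print(insert_rejected, delete_rejected)
```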
Attributes:-
Attributes are the properties which define an entity type. Entities are represented by means of their properties, which are called attributes.
Entity:- An object with physical existence, e.g. a person, car, house; or an object with conceptual existence, e.g. a company, job, university.
Entity Set:- The set of all entities of the same type.
(Diagram: entities S1, S2, S3 of entity type Student together form an entity set.)
Eg. Roll no, Name, DOB, Age, Address are the attributes which define the entity type Student.
(Diagram: entity Student with attributes Roll no, Name, DOB, Age and Address.)
Note:- There exists a domain or range of values that can be assigned to attribute.
Types of attributes
1.Simple attributes.
2.composite attributes.
3.Derived attributes.
4.Single value attributes
5. Multi valued attributes
1. Simple attributes:- These have atomic values, which cannot be divided further.
Eg:- phone number, age.
2. Composite attributes:- These can be divided into smaller subparts.
Eg:- Name can be divided into F. Name, M. Name and L. Name.
3. Derived attributes:- These attributes do not exist in the physical database, but their values are derived from other attributes present in the database.
Eg:- Age can be derived from DOB.
4. Single valued attributes:- These hold a single value for a given entity.
Eg:- DOB.
5. Multi valued attributes:- These can hold more than one value for a given entity.
Eg:- Phone no.
(Diagram: entity Employee with key attribute EMP id, composite attribute Name (F. Name, L. Name), multivalued attribute Phone no, attribute salary, and derived attribute Age computed from DOB.)
ER Diagram Components:-
Mapping Cardinalities:-
1. Generalisation:-
Generalization is a process in which the common attributes of two or more entities form a new entity. This newly formed entity is called a generalized entity.
Generalization Example
These two entities have two common attributes, Name and Address, so we can make a generalized entity with these common attributes. Let's have a look at the ER model after generalization.
1. Generalization uses a bottom-up approach, where two or more lower-level entities combine to form a new higher-level entity.
2. The new generalized entity can further combine with lower-level entities to create a still higher-level generalized entity.
Specialization
Specialization is a process in which an entity is divided into sub-entities. You can think of it as a reverse
process of generalization, in generalization two entities combine together to form a new higher level entity.
Specialization is a top-down process.
The idea behind specialization is to find subsets of entities that have a few distinguishing attributes. For example, consider an entity Employee which can be further classified into the sub-entities Technician, Engineer and Accountant, because these sub-entities have some distinguishing attributes.
Specialization Example
In the above diagram, we can see that we have a higher-level entity “Employee” which we have divided into the sub-entities “Technician”, “Engineer” and “Accountant”. All of these are just employees of a company; however, their roles are completely different and they have a few different attributes. Just for the example, I have shown that the Technician handles service requests, the Engineer works on a project and the Accountant handles the credit and debit details. All three employee types have a few attributes in common, such as name and salary, which we have left associated with the parent entity “Employee”, as shown in the above diagram.
Aggregation
Aggregation is a process in which a single entity alone is not able to make sense in a relationship, so the relationship of two entities acts as one entity. I know it sounds confusing, but don't worry; the example we will take will clear all the doubts.
Aggregation Example
In the real world, we know that a manager not only manages the employees working under them but has to manage the project as well. In such a scenario, if the entity “Manager” makes a “manages” relationship with either the “Employee” or the “Project” entity alone, it will not make any sense, because he has to manage both. In these cases the relationship of two entities acts as one entity. In our example, the relationship “Works-On” between “Employee” and “Project” acts as one entity that has a relationship “Manages” with the entity “Manager”.
UNIT-II: DATABASE INTEGRITY AND NORMALISATION:
Example:
Employee number  Employee Name  Salary  City
In this example, if we know the value of Employee number, we can obtain Employee Name, City, Salary, etc. From this, we can say that City, Employee Name and Salary are functionally dependent on Employee number.
Eg:-
C → D does not hold (C cannot determine D)
A → D does not hold (A cannot determine D)
Transitive dependency:-
A functional dependency is said to be transitive if it is indirectly formed by two functional dependencies. For e.g.
X -> Z is a transitive dependency if the following three functional dependencies hold true:
X->Y
Y does not ->X
Y->Z
Note: A transitive dependency can only occur in a relation of three or more attributes. This dependency helps us normalize the database into 3NF (Third Normal Form).
{Book} -> {Author} (if we know the book, we know the author's name) and {Author} -> {Author_age} (if we know the author, we know the author's age).
Therefore, as per the rule of transitive dependency, {Book} -> {Author_age} should hold; that makes sense, because if we know the book name, we can know the author's age.
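Deriving a transitive dependency like {Book} -> {Author_age} is an instance of computing the attribute closure under a set of functional dependencies. The following is a minimal sketch of that standard algorithm, using the Book/Author example from the text:

```python
# Minimal attribute-closure sketch: repeatedly apply the functional
# dependencies until no new attributes can be derived. A transitive
# dependency such as {Book} -> {Author_age} falls out of the closure.
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already known and the right side adds
            # something new, absorb it and iterate again.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [({"Book"}, {"Author"}), ({"Author"}, {"Author_age"})]
print(closure({"Book"}, fds))  # {'Book', 'Author', 'Author_age'}
```

Since Author_age appears in the closure of {Book}, the dependency {Book} -> {Author_age} holds, exactly as the transitive rule predicts.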
Partial Dependency
For a simple table like Student, a single column like student_id can uniquely identify all the records in the table. But this is not true all the time. So now let's extend our example to see if more than one column together can act as a primary key.
Let's create another table for Subject, which will have subject_id and subject_name fields and subject_id will
be the primary key.
subject_id subject_name
1 Java
2 C++
3 Php
Now we have a Student table with student information and another table Subject for storing subject
information.
Let's create another table Score, to store the marks obtained by students in the respective subjects. We will also
be saving name of the teacher who teaches that subject along with marks.
score_id  student_id  subject_id  marks  teacher
1         10          1           70     Java Teacher
2         10          2           75     C++ Teacher
3         11          1           80     Java Teacher
In the score table we are saving the student_id to know which student's marks are these and subject_id to know
for which subject the marks are for.
Together, student_id + subject_id forms a Candidate Key(learn about Database Keys) for this table, which can
be the Primary key.
Confused about how this combination can be a primary key? If I ask you to get me the marks of the student with student_id 10, can you get them from this table? No, because you don't know for which subject. And if I give you only the subject_id, you would not know for which student. Hence we need student_id + subject_id to uniquely identify any row.
Now if you look at the Score table, we have a column named teacher which is dependent only on the subject: for Java it's Java Teacher, for C++ it's C++ Teacher, and so on.
As we just discussed, the primary key for this table is a composition of two columns, student_id and subject_id, but the teacher's name depends only on the subject (hence on subject_id) and has nothing to do with student_id.
This is partial dependency, where an attribute in a table depends on only a part of the primary key and not on the whole key.
Anomalies in DBMS
There are three types of anomalies that occur when the database is not normalized. These are – Insertion, update
and deletion anomaly. Let’s take an example to understand this.
Example: Suppose a manufacturing company stores the employee details in a table named employee that has
four attributes: emp_id for storing employee’s id, emp_name for storing employee’s name, emp_address for
storing employee’s address and emp_dept for storing the department details in which the employee works. At
some point of time the table looks like this:
Insert anomaly: Suppose a new employee joins the company who is under training and is currently not assigned to any department; then we would not be able to insert the data into the table if the emp_dept field doesn't allow nulls.
Delete anomaly: Suppose, if at a point of time the company closes the department D890 then deleting the rows
that are having emp_dept as D890 would also delete the information of employee Maggie since she is assigned
only to this department.
Normalization.
Here are the most commonly used normal forms:
Example: Suppose a company wants to store the names and contact details of its employees. It creates a table
that looks like this:
This table is not in 1NF, as the rule says “each attribute of a table must have atomic (single) values”; the emp_mobile values for employees Jon and Lester violate that rule.
To make the table comply with 1NF, we should have the data like this:
An attribute that is not part of any candidate key is known as non-prime attribute.
Example: Suppose a school wants to store the data of teachers and the subjects they teach. They create a table that looks like this: since a teacher can teach more than one subject, the table can have multiple rows for the same teacher.
The table is in 1NF because each attribute has atomic values. However, it is not in 2NF, because the non-prime attribute teacher_age is dependent on teacher_id alone, which is a proper subset of the candidate key. This violates the rule for 2NF, as the rule says “no non-prime attribute may be dependent on a proper subset of any candidate key of the table”.
To make the table comply with 2NF, we can break it into two tables like this:
teacher_details table:
teacher_id teacher_age
111 38
222 38
333 40
teacher_subject table:
teacher_id subject
111 Maths
111 Physics
222 Biology
333 Physics
333 Chemistry
Now the tables comply with Second normal form (2NF).
In other words 3NF can be explained like this: A table is in 3NF if it is in 2NF and for each functional dependency
X-> Y at least one of the following conditions hold:
An attribute that is a part of one of the candidate keys is known as prime attribute.
Example: Suppose a company wants to store the complete address of each employee, they create a table named
employee_details that looks like this:
emp_id emp_name emp_zip emp_state emp_city emp_district
1001 John 282005 UP Agra Dayal Bagh
1002 Ajeet 222008 TN Chennai M-City
1006 Lora 282007 TN Chennai Urrapakkam
1101 Lilly 292008 UK Pauri Bhagwan
1201 Steve 222999 MP Gwalior Ratan
Here, emp_state, emp_city and emp_district are dependent on emp_zip, and emp_zip is dependent on emp_id, which makes the non-prime attributes (emp_state, emp_city and emp_district) transitively dependent on the super key (emp_id). This violates the rule of 3NF.
To make this table comply with 3NF, we have to break it into two tables to remove the transitive dependency:
employee table:
The table is not in BCNF, as neither emp_id nor emp_dept alone is a key.
To make the table comply with BCNF, we can break it into three tables like this:
emp_nationality table:
emp_id emp_nationality
1001 Austrian
1002 American
emp_dept table:
emp_id emp_dept
1001 Production and planning
1001 stores
1002 design and technical support
1002 Purchasing department
Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}
Relational Decomposition
o When a relation in the relational model is not in appropriate normal form then the decomposition of a relation is required.
o In a database, it breaks the table into multiple tables.
o If the relation has no proper decomposition, then it may lead to problems like loss of information.
o Decomposition is used to eliminate some of the problems of bad design like anomalies, inconsistencies, and redundancy.
Types of Decomposition
Lossless Decomposition
o If the information is not lost from the relation that is decomposed, then the decomposition will be lossless.
o The lossless decomposition guarantees that the join of relations will result in the same relation as it was decomposed.
o The relation is said to be lossless decomposition if natural joins of all the decomposition give the original relation.
Example:
EMPLOYEE_DEPARTMENT table:
The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT
EMPLOYEE table:
EMP_ID  EMP_NAME   AGE  CITY
22      Denim      28   Mumbai
33      Alina      25   Delhi
46      Stephan    30   Bangalore
52      Katherine  36   Mumbai
60      Jack       40   Noida
DEPARTMENT table:
DEPT_ID  EMP_ID  DEPT_NAME
827      22      Sales
438      33      Marketing
869      46      Finance
575      52      Production
678      60      Testing
Now, when these two relations are joined on the common column "EMP_ID", then the resultant relation will look like:
Employee ⋈ Department
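The lossless property can be checked mechanically: project the original relation onto the two schemas, natural-join the projections back on EMP_ID, and compare with the original. The sketch below does this with plain Python sets, using only two of the rows above for brevity:

```python
# Sketch of the lossless-join check on two rows of EMPLOYEE_DEPARTMENT.
original = {
    (22, "Denim", 28, "Mumbai", 827, "Sales"),
    (33, "Alina", 25, "Delhi", 438, "Marketing"),
}
# Project onto the two decomposed schemas.
employee = {(e, n, a, c) for (e, n, a, c, d, dn) in original}
department = {(d, e, dn) for (e, n, a, c, d, dn) in original}

# Natural join of the projections on the common column EMP_ID.
joined = {
    (e, n, a, c, d, dn)
    for (e, n, a, c) in employee
    for (d, e2, dn) in department
    if e == e2
}
lossless = joined == original
print("decomposition is lossless:", lossless)
```

Because the join reproduces exactly the original tuples, with nothing missing and no spurious rows, the decomposition is lossless.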
Dependency Preserving
o It is an important constraint of the database.
o In the dependency preservation, at least one decomposed table must satisfy every dependency.
o If a relation R is decomposed into relation R1 and R2, then the dependencies of R either must be a part of R1 or R2 or
must be derivable from the combination of functional dependencies of R1 and R2.
o For example, suppose there is a relation R(A, B, C, D) with functional dependency set {A → BC}. The relation R is decomposed into R1(ABC) and R2(AD), which is dependency preserving, because the FD A → BC is a part of the relation R1(ABC).
File Organization
A file is a collection of records. Using the primary key, we can access the records. The type and frequency of access are determined by the type of file organization used for a given set of records.
File organization is a logical relationship among the various records. It defines how file records are mapped onto disk blocks.
File organization describes the way in which records are stored in terms of blocks, and how the blocks are placed on the storage medium.
The first approach to mapping the database to files is to use several files and store only one fixed-length record type in any given file. An alternative approach is to structure our files so that they can contain records of multiple lengths.
Files of fixed-length records are easier to implement than files of variable-length records.
Objective of file organization
It allows optimal selection of records, i.e., records can be selected as fast as possible.
Insert, delete and update operations on the records should be quick and easy.
Duplicate records should not be induced as a result of insert, update or delete operations.
Records should be stored efficiently, at minimal storage cost.
File organization encompasses various methods. These methods have pros and cons on the basis of access or selection. The programmer decides the best-suited file organization method according to the requirements.
1. Pile File Method:
Suppose we have records R1, R3 and so on up to R9 and R8 in a sequence. Here, a record is nothing but a row in a table. If we want to insert a new record R2 into the sequence, it will simply be placed at the end of the file.
2. Sorted File Method:
In this method, the new record is always inserted at the file's end, and then the sequence is sorted in ascending
or descending order. Sorting of records is based on the primary key or any other key.
If any record is modified, the record is updated, the file is sorted again, and the updated record is placed in the
right position.
Suppose there is a pre-existing sorted sequence of records R1, R3 and so on up to R6 and R7. If a
new record R2 has to be inserted into the sequence, it is first appended at the end of the file, and the
sequence is then sorted again.
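The same example can be sketched with Python's standard bisect module, which places each new record directly in its sorted position:

```python
import bisect

# Sorted-file method sketch: the sequence is kept ordered on the key
# (here the record name itself stands in for the key).
records = ['R1', 'R3', 'R4', 'R6', 'R7']
bisect.insort(records, 'R2')  # R2 ends up in its sorted position
print(records)                # ['R1', 'R2', 'R3', 'R4', 'R6', 'R7']
```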
Advantages of sequential file organization
It is a fast and efficient method for handling huge amounts of data.
In this method, files can be easily stored on cheaper storage mechanisms such as magnetic tapes.
It is simple in design and requires little effort to store the data.
This method is used when most of the records have to be accessed, such as grade calculation of students or
generating salary slips.
This method is used for report generation or statistical calculations.
2. Heap File Organization
Suppose we have five records R1, R3, R6, R4 and R5 in a heap, and we want to insert a new record
R2 into the heap. If data block 3 is full, then R2 will be inserted into any of the data blocks selected by the DBMS,
let's say data block 1.
If we want to search, update or delete data in heap file organization, we need to traverse the data from the
start of the file until we get the requested record.
If the database is very large then searching, updating or deleting of record will be time-consuming because
there is no sorting or ordering of records. In the heap file organization, we need to check all the data until we
get the requested record.
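A minimal sketch of that linear traversal, with nested Python lists standing in for data blocks:

```python
# Heap-file search sketch: with no ordering among records, we must
# scan from the start of the file until the requested record is found.
def heap_search(blocks, key):
    for block_no, block in enumerate(blocks):
        for record in block:
            if record['id'] == key:
                return block_no, record
    return None  # scanned every block without finding the key

blocks = [[{'id': 'R1'}, {'id': 'R3'}], [{'id': 'R6'}], [{'id': 'R4'}, {'id': 'R5'}]]
print(heap_search(blocks, 'R4'))  # (2, {'id': 'R4'})
```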
3. Hash File Organization
When a record has to be retrieved using the hash key columns, the address is generated, and the whole
record is retrieved using that address. In the same way, when a new record has to be inserted, the address
is generated using the hash key and the record is directly inserted. The same process applies to
delete and update.
In this method, there is no need to search or sort the entire file. Each record is stored at a random
location in memory, determined by the hash function.
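A toy sketch of the idea, using Python's built-in hash in place of a real DBMS hash function and lists as buckets:

```python
# Hash file organization sketch: hashing the key column gives the
# bucket (block) address directly, so no scan or sort is needed.
NUM_BUCKETS = 4
buckets = [[] for _ in range(NUM_BUCKETS)]

def address(key):
    return hash(key) % NUM_BUCKETS  # stand-in for a real DBMS hash function

def insert(record):
    buckets[address(record['id'])].append(record)

def lookup(key):
    # Only the one bucket the key hashes to is examined.
    return next((r for r in buckets[address(key)] if r['id'] == key), None)

insert({'id': 101, 'name': 'Alice'})
print(lookup(101))  # {'id': 101, 'name': 'Alice'}
```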
4. B+ File Organization
B+ tree file organization is an advanced form of the indexed sequential access method. It uses a tree-like
structure to store records in a file.
It uses the same concept of key-index where the primary key is used to sort the records. For each primary key,
the value of the index is generated and mapped with the record.
The B+ tree is similar to a binary search tree (BST), but it can have more than two children. In this method, all
the records are stored only at the leaf nodes. Intermediate nodes act as pointers to the leaf nodes and do not
contain any records.
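A much-simplified sketch of the leaf-only storage idea (a real B+ tree has many levels, linked leaves and rebalancing, all omitted here):

```python
import bisect

# Simplified B+ sketch: the internal node holds only separator keys
# that route the search to a leaf; all records live in the leaves.
leaves = [[(5, 'rec5'), (10, 'rec10')], [(20, 'rec20'), (30, 'rec30')]]
separators = [20]  # keys < 20 route to leaf 0, keys >= 20 to leaf 1

def bplus_search(key):
    leaf = leaves[bisect.bisect_right(separators, key)]
    for k, rec in leaf:  # records exist only at the leaf level
        if k == key:
            return rec
    return None

print(bplus_search(20))  # rec20
```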
5. ISAM (Indexed Sequential Access Method)
In this method, if any record has to be retrieved based on its index value, the address of the data block is
fetched and the record is retrieved from memory.
Advantages of ISAM:
In this method, since each record has the address of its data block, searching for a record in a huge database is
quick and easy.
This method supports range retrieval and partial retrieval of records. Since the index is based on the primary key
values, we can retrieve the data for a given range of values. In the same way, a partial value can also be easily
searched, e.g., a student name starting with 'JA' can be easily found.
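Range and prefix retrieval over a sorted key column can be sketched with bisect; the names below are illustrative:

```python
import bisect

# Prefix retrieval sketch over a key-sorted column: bisect finds both
# boundaries of the matching run in O(log n), then slices it out.
names = sorted(['JACK', 'JANE', 'JASON', 'JOHN', 'MARY'])

def prefix_search(prefix):
    lo = bisect.bisect_left(names, prefix)
    # '\uffff' sorts after any ordinary character, marking the range end.
    hi = bisect.bisect_left(names, prefix + '\uffff')
    return names[lo:hi]

print(prefix_search('JA'))  # ['JACK', 'JANE', 'JASON']
```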
Disadvantages of ISAM
This method requires extra space in the disk to store the index value.
When the new records are inserted, then these files have to be reconstructed to maintain the sequence.
When the record is deleted, then the space used by it needs to be released. Otherwise, the performance of the
database will slow down.
6. Cluster File Organization
In this method, we can directly insert, update or delete any record. Data is sorted based on the key with which
searching is done. The cluster key is the key on which joining of the tables is performed.
1. Indexed Clusters:
In an indexed cluster, records are grouped based on the cluster key and stored together. The above EMPLOYEE
and DEPARTMENT relationship is an example of an indexed cluster: all the records are grouped based
on the cluster key DEP_ID and stored together.
2. Hash Clusters:
It is similar to the indexed cluster, but instead of storing the records based on the cluster key directly, we
generate the hash value of the cluster key and store together the records with the same hash key value.
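A sketch of the grouping idea behind an indexed cluster, with hypothetical EMPLOYEE rows grouped by the cluster key DEP_ID:

```python
from collections import defaultdict

# Cluster file organization sketch: EMPLOYEE records are stored
# together by the cluster key DEP_ID, so a join with DEPARTMENT
# touches one group per department instead of scanning everything.
employees = [
    {'name': 'Alice', 'DEP_ID': 1},
    {'name': 'Bob',   'DEP_ID': 2},
    {'name': 'Carol', 'DEP_ID': 1},
]
cluster = defaultdict(list)
for emp in employees:
    cluster[emp['DEP_ID']].append(emp['name'])

print(dict(cluster))  # {1: ['Alice', 'Carol'], 2: ['Bob']}
```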
Types of Index
Indexing is defined based on its indexing attributes. Indexing can be of the following types-
•Primary Index - Primary index is defined on an ordered data file. The data file is ordered on a key field. The key field is generally
the primary key of the relation.
•Secondary Index - Secondary index may be generated from a field which is a candidate key and has a unique value in every record,
or from a non-key field with duplicate values.
•Clustering Index - Clustering index is defined on an ordered data file. The data file is ordered on a non-key field.
Ordered Indexing is of two types-
•Dense Index
•Sparse Index
Dense Index:-
In a dense index, there is an index record for every search key value in the database. This makes searching faster but
requires more space to store the index records themselves. Each index record contains a search key value and a pointer
to the actual record on the disk.
Sparse Index
In a sparse index, index records are not created for every search key. An index record here contains a search key and an
actual pointer to the data on the disk. To search a record, we first proceed to the index record and reach the actual
location of the data. If the data we are looking for is not where we directly reach by following the index, the system
starts a sequential search until the desired data is found.
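A minimal sketch of a sparse index with one entry per block (the entry holds the first key of its block):

```python
import bisect

# Sparse index sketch: one index entry per block (the first key in
# each block). Locate the candidate block via the index, then scan
# that block sequentially.
blocks = [[1, 3, 5], [7, 9, 11], [13, 15, 17]]
sparse = [b[0] for b in blocks]  # [1, 7, 13]

def search(key):
    i = bisect.bisect_right(sparse, key) - 1  # last index entry <= key
    return key in blocks[i]                   # sequential scan of one block

print(search(9), search(10))  # True False
```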
Multilevel Index
Index records comprise search-key values and data pointers. Multilevel index is stored on the disk along with the actual
database files. As the size of the database grows, so does the size of the indices. There is an immense need to keep
the index records in the main memory so as to speed up the search operations.
Multi-level Index helps in breaking down the index into several smaller indices in order to make the outermost level so
small that it can be saved in a single disk block, which can easily be accommodated anywhere in the main memory.
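A two-level version of this idea can be sketched as follows, with the outer index kept small enough to stay in a single block in main memory:

```python
import bisect

# Multilevel index sketch: a tiny outer index narrows the search to
# one inner index block; the inner block then points into the data
# file. Only the outer index needs to live in main memory.
inner_blocks = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
outer = [b[0] for b in inner_blocks]  # [10, 40, 70]

def find_inner_block(key):
    # Pick the last inner block whose first key is <= the search key.
    return inner_blocks[bisect.bisect_right(outer, key) - 1]

print(find_inner_block(55))  # [40, 50, 60]
```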