DBMS Notes
Database-
A database is a collection of inter-related data that is used to retrieve, insert, and delete data
efficiently. It is also used to organize data in the form of tables, schemas, views, reports, etc.
Functions of DBMS-
Data Storage Management
Data Transformation and Presentation
Security Management
Multi-user Access Control
Backup and Recovery Management
Applications of DBMS-
Database management systems are used in many different fields. Following are a few of these applications-
Railway Reservation System
Library Management System
Banking
Education Sector
Accounting
Online Shopping
Manufacturing
Advantages of DBMS over File-Oriented Systems-
Data sharing
A file system does not allow easy sharing of data, whereas in a DBMS data can be shared easily because
the system is centralized.
Data searching
A DBMS provides built-in search operations, so the user only has to write a short query to
retrieve data from the database.
System crashing
A system may crash for various reasons. A DBMS has a recovery manager that restores the data
after a crash.
Data security
A file system relies mainly on a password mechanism to protect data, and no one can guarantee how
long a password will stay protected. A DBMS has specialized features, such as user authorization
and access control, that help shield its data.
Backup
A DBMS provides a backup subsystem to restore the data if required.
Interfaces
A DBMS provides multiple user interfaces, such as graphical user interfaces and application program
interfaces (APIs).
SQL-
Structured Query Language (SQL) is the database language used to perform operations on an existing
database; it can also be used to create a database. SQL uses commands such as CREATE, DROP,
and INSERT to carry out the required tasks.
These SQL commands are mainly divided into five categories:
DDL (Data Definition Language)
DDL is a set of SQL commands, such as CREATE, ALTER, and DROP, used to create, modify, and delete
database structures but not data. These commands are normally not used by a general user who
accesses the database through an application.
DQL (Data Query Language)
DQL is the part of SQL used to retrieve data from the database and impose an order on it. It
includes the SELECT statement.
DML (Data Manipulation Language)
The SQL commands that deal with the manipulation of data present in the database, such as INSERT,
UPDATE, and DELETE, belong to DML; this includes most SQL statements.
DCL (Data Control Language)
DCL includes commands such as GRANT and REVOKE which mainly deal with the rights,
permissions, and other controls of the database system.
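TCL (Transaction Control Language)
Transactions group a set of tasks into a single unit of execution. Each transaction begins with a
specific task and ends when all the tasks in the group complete successfully. If any of the tasks
fails, the transaction fails. TCL includes commands such as COMMIT, ROLLBACK, and SAVEPOINT that
control transactions.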
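The command categories above can be seen side by side in the following minimal sketch. It assumes SQLite through Python's built-in sqlite3 module, used purely for illustration, and the student table with its rows is hypothetical. SQLite supports DDL, DML, DQL, and TCL statements but has no GRANT/REVOKE, so DCL is not shown.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# transaction below is controlled explicitly with BEGIN / ROLLBACK (TCL).
conn = sqlite3.connect(":memory:", isolation_level=None)

# DDL: define a structure (no data yet).
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")

# DML: insert and modify data.
conn.execute("INSERT INTO student (id, name, age) VALUES (1, 'Asha', 20)")
conn.execute("INSERT INTO student (id, name, age) VALUES (2, 'Ravi', 22)")
conn.execute("UPDATE student SET age = 21 WHERE id = 1")

# TCL: group a change into a transaction and undo it with ROLLBACK.
conn.execute("BEGIN")
conn.execute("DELETE FROM student WHERE id = 2")
conn.execute("ROLLBACK")  # the DELETE is undone

# DQL: retrieve data and impose an order on it.
rows = conn.execute("SELECT id, name, age FROM student ORDER BY id").fetchall()
print(rows)  # [(1, 'Asha', 21), (2, 'Ravi', 22)]
```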
Data Abstraction
Data abstraction is the process of hiding unnecessary details from the end user. Database systems
consist of complicated data structures and relations. For users to access the data easily, these
complications are kept hidden, and only the relevant part of the database is made accessible to the
users through data abstraction.
Levels of abstraction for DBMS
Physical or Internal Level
Logical or Conceptual Level
View or External Level
Physical or Internal Level
It is the lowest level of abstraction and defines how the data is actually stored in the database.
How the data is stored is decided by developers or database application programmers.
Logical or Conceptual Level
The logical level is the intermediate level. It describes what data is stored in the database and what
relationships exist among those data. It describes the structure of the entire database.
View or External Level
It is the highest level. At the view level, there can be many different views, and each view defines only a
part of the entire data. In this way, it provides multiple views of the same database.
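The view level can be illustrated with a SQL view. This minimal sketch assumes SQLite through Python's sqlite3 module, and the employee table, its rows, and the it_staff view are hypothetical. The full table belongs to the logical level. The view exposes only part of it to one group of users, and salary is hidden.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Logical level: the complete table (hypothetical schema and data).
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Asha", "HR", 50000), (2, "Ravi", "IT", 70000), (3, "Meena", "IT", 65000)],
)

# View level: IT users see only the ids and names of IT staff; salary is not exposed.
conn.execute("CREATE VIEW it_staff AS SELECT id, name FROM employee WHERE dept = 'IT'")

it_rows = conn.execute("SELECT * FROM it_staff ORDER BY id").fetchall()
print(it_rows)  # [(2, 'Ravi'), (3, 'Meena')]
```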
UNIT-2
ER model(Entity-Relationship model)-
It is a high-level data model. This model is used to define the data elements and relationships for a
specified system. It develops a conceptual design for the database and provides a simple,
easy-to-understand view of the data.
For example, suppose we design a school database. In this database, the student will be an entity
with attributes like address, name, id, age, etc. The address can be another entity with attributes like
city, street name, pin code, etc., and there will be a relationship between them.
ER-Diagram-
An ER Diagram (Entity Relationship Diagram) displays the relationships among the entity sets
stored in a database. ER diagrams are created based on three basic concepts: entities, attributes, and
relationships.
ER diagrams use rectangles to represent entities, ovals to represent attributes, and diamonds to
represent relationships. At first glance, an ER diagram looks very similar to a flowchart.
Components of ER Diagram/ER Model-
1. Entity:
An entity may be any object, class, person, or place. In an ER diagram, an entity is represented by
a rectangle. For example, in an organization, manager, product, employee, department, etc.
can be taken as entities.
Weak Entity: An entity that depends on another entity is called a weak entity. A weak entity is
represented by a double rectangle.
2. Attribute
The attribute is used to describe a property of an entity. An ellipse (oval) is used to represent an attribute.
For example, id, age, contact number, name, etc. can be attributes of a student.
Key Attribute: The key attribute uniquely identifies each instance of an entity. It
represents a primary key, and its name is underlined in the oval.
Composite Attribute: An attribute that is composed of several other attributes is known as a composite
attribute. For example, an address can be composed of city, street name, and pin code.
Multivalued Attribute: An attribute that can have more than one value is a multivalued attribute. A double oval
is used to represent a multivalued attribute. For example, a student can have more than one phone number.
Derived Attribute: An attribute that can be derived from another attribute is known as a derived
attribute. For example, a person's age changes over time and can be derived from another attribute
such as date of birth.
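The derived-attribute example can be made concrete. In this minimal sketch, only date of birth is stored, and age is computed when needed. The function name age_on is hypothetical, and the sketch assumes age means completed years as of a given day.

```python
from datetime import date

def age_on(date_of_birth: date, today: date) -> int:
    """Derive age (completed years) from the stored date of birth."""
    years = today.year - date_of_birth.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years

print(age_on(date(2000, 8, 15), date(2024, 8, 14)))  # 23
print(age_on(date(2000, 8, 15), date(2024, 8, 15)))  # 24
```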
3. Relationship
A relationship is used to describe the relation between entities. A diamond or rhombus is used to
represent a relationship.
One-to-One Relationship:
For example, a female can marry one male, and a male can marry one female.
One-to-many relationship:
For example, a scientist can invent many inventions, but each invention is made by only one specific
scientist.
Many-to-one relationship:
For example, a student enrolls in only one course, but a course can have many students.
Many-to-many relationship:
For example, an employee can be assigned to many projects, and a project can have many employees.
Aggregation –
A basic ER diagram cannot represent a relationship between an entity and a relationship, which
may be required in some scenarios. Aggregation is an abstraction through which relationships can be
treated as higher-level entity sets.
Network Model
This is an extension of the hierarchical model. In this model, data is organized more like a graph, and a
record is allowed to have more than one parent node. Because more relationships can be established in this
model, the data is more closely related, and as a result, accessing related data is easier and faster.
Entity-relationship Model
It is the high-level data model described under the ER model above: it defines the data elements and
relationships for a specified system and develops a conceptual design for the database.
Relational Model
In this model, data is organized in two-dimensional tables, and the relationship is maintained by
storing a common field. The basic structure of data in the relational model is the table. All the
information related to a particular type is stored in the rows of that table.
Hence, tables are also known as relations in the relational model.
Keys in DBMS-
Primary key
It is the key chosen to uniquely identify each instance of an entity (each row of a table). An entity can
have multiple candidate keys; the most suitable of them becomes the primary key.
For example, in the EMPLOYEE table, ID can be the primary key since it is unique for each employee.
For each entity, the primary key is selected based on requirements.
Candidate key
A candidate key is a minimal attribute or set of attributes that can uniquely identify a tuple (row).
A table can have several candidate keys; one of them is chosen as the primary key, and the others remain candidate keys.
For example, in the EMPLOYEE table, id is best suited for the primary key. Other attributes that are also
unique for each employee, like SSN, Passport_Number, License_Number, etc., are also candidate keys.
Super Key
A super key is an attribute set that can uniquely identify a tuple (row). A super key is a superset of a
candidate key.
For example, in the EMPLOYEE table, consider (EMPLOYEE_ID, EMPLOYEE_NAME): two employees
can have the same name, but their EMPLOYEE_ID can't be the same. Hence, this
combination can also be a key.
Super keys would include EMPLOYEE_ID, (EMPLOYEE_ID, EMPLOYEE_NAME), etc.
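The definitions of super key and candidate key can be checked mechanically on a sample of data. In this minimal sketch, the EMPLOYEE rows (including a DEPT column) are hypothetical. A super key is any attribute set on which no two rows agree. A candidate key is a super key that has no smaller super key inside it. The sketch only tests one snapshot of data: real keys come from the meaning of the schema, and a small sample can make extra attribute sets look unique.

```python
from itertools import combinations

# Hypothetical EMPLOYEE instance (illustrative data only).
EMPLOYEE = [
    {"EMPLOYEE_ID": 1, "EMPLOYEE_NAME": "Asha", "SSN": "111", "DEPT": "HR"},
    {"EMPLOYEE_ID": 2, "EMPLOYEE_NAME": "Ravi", "SSN": "222", "DEPT": "IT"},
    {"EMPLOYEE_ID": 3, "EMPLOYEE_NAME": "Asha", "SSN": "333", "DEPT": "IT"},
]

def is_super_key(rows, attrs):
    """True if no two rows agree on all attributes in `attrs`."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

def candidate_keys(rows):
    """Minimal super keys, found by trying attribute sets from smallest to largest."""
    attrs = list(rows[0])
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            # Supersets of an already found key are super keys but not minimal.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if is_super_key(rows, combo):
                keys.append(combo)
    return keys

print(is_super_key(EMPLOYEE, ("EMPLOYEE_NAME",)))                # False
print(is_super_key(EMPLOYEE, ("EMPLOYEE_ID", "EMPLOYEE_NAME")))  # True
print(candidate_keys(EMPLOYEE))
# [('EMPLOYEE_ID',), ('SSN',), ('EMPLOYEE_NAME', 'DEPT')]
```

Note that (EMPLOYEE_NAME, DEPT) appears only because this three-row sample happens to have no repeated pair. This illustrates why keys are decided from requirements, not from one instance.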
SQL Constraints
Constraints are rules that we apply to the data in a table. That is, they specify
limits on the type of data that can be stored in a particular column.
The available constraints in SQL are:
NOT NULL: This constraint specifies that a column cannot store a NULL value. If a column
is specified as NOT NULL, we will not be able to store NULL in that column.
UNIQUE: This constraint specifies that all the values in the column must be unique. That is, no value
may be repeated in more than one row of that column.
PRIMARY KEY: A primary key is a field that uniquely identifies each row in a table. This
constraint is used to specify a field in a table as the primary key.
FOREIGN KEY: A foreign key is a field that refers to the primary key of another table, i.e., to a field that
uniquely identifies each row in that other table. This constraint is used to specify a field as a foreign key.
CHECK: This constraint validates the values of a column; it ensures that each value
stored in the column meets a specific condition.
DEFAULT: This constraint specifies a default value for the column when no value is specified by
the user.
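Most of these constraints can be tried out directly. This minimal sketch assumes SQLite via Python's sqlite3 module, and the student table, the age >= 18 rule, and the 'Delhi' default are hypothetical. SQLite reports each violation as sqlite3.IntegrityError. FOREIGN KEY is left out here because SQLite only enforces it after a PRAGMA, as discussed under referential integrity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        id    INTEGER PRIMARY KEY,          -- PRIMARY KEY
        name  TEXT NOT NULL,                -- NOT NULL
        email TEXT UNIQUE,                  -- UNIQUE
        age   INTEGER CHECK (age >= 18),    -- CHECK
        city  TEXT DEFAULT 'Delhi'          -- DEFAULT
    )
""")
conn.execute("INSERT INTO student (id, name, email, age) VALUES (1, 'Asha', 'a@x.com', 20)")

def violates(sql):
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

INSERT = "INSERT INTO student (id, name, email, age) VALUES "
results = {
    "NOT NULL":    violates(INSERT + "(2, NULL, 'b@x.com', 20)"),
    "UNIQUE":      violates(INSERT + "(3, 'Ravi', 'a@x.com', 20)"),   # duplicate email
    "CHECK":       violates(INSERT + "(4, 'Meena', 'm@x.com', 15)"),  # age below 18
    "PRIMARY KEY": violates(INSERT + "(1, 'Dev', 'd@x.com', 19)"),    # id 1 already used
    "valid row":   violates(INSERT + "(5, 'Dev', 'd@x.com', 19)"),
}
print(results)  # every constraint case is True, 'valid row' is False

city = conn.execute("SELECT city FROM student WHERE id = 5").fetchone()
print(city)  # ('Delhi',) because no city was given, so the DEFAULT applied
```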
UNIT-3
Hashing
In a huge database structure, it is very inefficient to search all the index values to reach the desired
data. The hashing technique is used to calculate the direct location of a data record on the disk without
using an index structure. In this technique, data is stored in data blocks whose addresses are generated
by the hashing function. The memory location where these records are stored is known as a data
bucket or data block.
Types of Hashing-
Static Hashing-
In static hashing, the resulting data bucket address is always the same. That means if we generate an
address for a key using the hash function, it will always produce the same bucket address. There is no
change in the bucket address. Hence, in static hashing, the number of data buckets in memory remains
constant throughout.
Operations of Static Hashing-
Searching a Record
Deleting a Record
Inserting a Record
Updating a Record
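The four operations can be sketched with a fixed number of buckets. This is a minimal sketch that assumes integer keys and the common hash function h(key) = key mod N. The class name StaticHashFile is hypothetical. Each bucket is a growable Python list, which stands in for a disk bucket together with its overflow chain.

```python
class StaticHashFile:
    """Static hashing sketch: N buckets fixed at creation, bucket = key mod N."""

    def __init__(self, num_buckets=4):
        self.num_buckets = num_buckets                # never changes
        self.buckets = [[] for _ in range(num_buckets)]

    def _h(self, key):
        return key % self.num_buckets                 # same key -> same bucket, always

    def insert(self, key, record):
        self.buckets[self._h(key)].append((key, record))

    def search(self, key):
        for k, rec in self.buckets[self._h(key)]:
            if k == key:
                return rec
        return None

    def delete(self, key):
        bucket = self.buckets[self._h(key)]
        bucket[:] = [(k, r) for k, r in bucket if k != key]

    def update(self, key, record):
        self.delete(key)
        self.insert(key, record)

f = StaticHashFile(num_buckets=4)
f.insert(103, "Asha")   # 103 mod 4 = 3 -> bucket 3
f.insert(107, "Ravi")   # 107 mod 4 = 3 -> also bucket 3 (collision)
f.update(103, "Asha K")
f.delete(107)
print(f.search(103), f.search(107), len(f.buckets))  # Asha K None 4
```

The two colliding keys land in the same bucket while the bucket count stays at 4. This fixed count is the weakness that dynamic hashing addresses.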
Dynamic Hashing-
The dynamic hashing method is used to overcome problems of static hashing such as bucket overflow.
In this method, data buckets grow or shrink as the number of records increases or decreases. This method is also
known as the extendable (extendible) hashing method.
It allows insertion or deletion without resulting in poor performance.
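Extendible hashing can be sketched as a directory of 2^global_depth pointers, where several pointers may share one bucket. When a full bucket overflows, it is split. If the bucket already uses all global_depth bits, the directory is doubled first. This is a minimal sketch, not a production design. It assumes keys with distinct hash values (for example, distinct non-negative integers) and uses the low-order bits of the hash. The class names are hypothetical, and merging buckets on delete (the "shrink" side) is omitted.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth   # number of hash bits this bucket distinguishes
        self.keys = []

class ExtendibleHashTable:
    """Directory of 2**global_depth entries; entries may share a bucket."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]

    def _index(self, key):
        # Directory slot = low-order `global_depth` bits of the hash.
        return hash(key) & ((1 << self.global_depth) - 1)

    def search(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        if key in bucket.keys:
            return
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(key)
            return
        # Overflow: double the directory if the bucket already uses every global bit.
        if bucket.local_depth == self.global_depth:
            self.directory = self.directory + self.directory  # slot i and i + 2**g share a bucket
            self.global_depth += 1
        self._split(bucket)
        self.insert(key)  # retry; the target bucket may have room now

    def delete(self, key):
        # Merging under-full buckets (shrinking) is omitted in this sketch.
        bucket = self.directory[self._index(key)]
        if key in bucket.keys:
            bucket.keys.remove(key)

    def _split(self, bucket):
        bucket.local_depth += 1
        new_bucket = Bucket(bucket.local_depth)
        high_bit = 1 << (bucket.local_depth - 1)
        # Slots that pointed to the old bucket and have the new bit set move to the new bucket.
        for i, b in enumerate(self.directory):
            if b is bucket and i & high_bit:
                self.directory[i] = new_bucket
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            self.directory[self._index(k)].keys.append(k)

table = ExtendibleHashTable(capacity=2)
for k in range(16):
    table.insert(k)
print(table.global_depth, all(table.search(k) for k in range(16)))  # 3 True
```

With 16 keys and room for 2 per bucket, the directory grows from 2 to 8 slots. Static hashing would have kept its original buckets and relied on overflow chains instead.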
SQL Query -
A query is an operation that retrieves data from one or more tables or views. A top-level
SELECT statement is called a query, and a query nested within another SQL statement is
called a subquery.
SQL SubQuery-
A subquery (also called an inner query or a nested query) is a query within another SQL query.
A subquery is used to return data that will be used in the main query.
Subqueries can be used with the SELECT, INSERT, UPDATE, and DELETE statements.
There are a few rules that subqueries must follow (the exact rules vary between database systems) −
Subqueries must be enclosed within parentheses.
A subquery can have only one column in the SELECT clause, unless the main query has multiple
columns for the subquery to compare its selected columns.
An ORDER BY command cannot be used in a subquery, although the main query can use ORDER BY.
The SELECT list cannot include any references to values that evaluate to a BLOB, ARRAY,
CLOB, or NCLOB.
A subquery cannot be immediately enclosed in a set function.
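A typical subquery looks like the one below. This minimal sketch assumes SQLite via Python's sqlite3 module, and the employee table and salaries are hypothetical. The inner query is in parentheses and selects one column. It returns the average salary, and the outer query then uses that value in its WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Asha", 40000), (2, "Ravi", 70000), (3, "Meena", 60000)])

# Inner query: average salary (about 56667). Outer query: employees above it.
above_avg = conn.execute("""
    SELECT name FROM employee
    WHERE salary > (SELECT AVG(salary) FROM employee)
    ORDER BY name
""").fetchall()
print(above_avg)  # [('Meena',), ('Ravi',)]
```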
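Cursor in SQL-
A cursor is a temporary memory area (work area). It is allocated by the database server when a user
performs DML (Data Manipulation Language) operations on a table. Cursors hold the rows returned by a
SQL statement so that they can be processed one row at a time. There are two types of cursors:
Implicit Cursors: created automatically by the database when a DML statement (or a query returning a single row) is executed.
Explicit Cursors: declared by the programmer for a SELECT statement that returns more than one row.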
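Python's sqlite3 module provides a client-side Cursor object with a similar role, so it is used here as an analogy. It is not the server-side implicit/explicit cursor of PL/SQL-style databases. In this minimal sketch, the student table and its rows are hypothetical. The cursor is created explicitly and walks through a result set a row at a time. A connection-level execute() creates a cursor behind the scenes, loosely similar to an implicit cursor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Asha"), (2, "Ravi"), (3, "Meena")])

cur = conn.cursor()                 # cursor created explicitly by the program
cur.execute("SELECT id, name FROM student ORDER BY id")
first = cur.fetchone()              # fetch one row at a time ...
rest = cur.fetchall()               # ... or all remaining rows together
cur.close()
print(first, rest)  # (1, 'Asha') [(2, 'Ravi'), (3, 'Meena')]

# conn.execute() creates and returns a cursor automatically.
cur2 = conn.execute("UPDATE student SET name = 'Ravi K' WHERE id = 2")
print(cur2.rowcount)  # 1 row affected by the DML statement
```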
Indexing in DBMS-
Indexing is used to optimize the performance of a database by minimizing the number of disk
accesses required when a query is processed.
An index is a type of data structure. It is used to locate and access the data in a database table
quickly.
Index structure:
Indexes can be created using some database columns.
The first column of the index is the search key, which contains a copy of the primary key or
candidate key of the table.
The second column of the index is the data reference. It contains a set of pointers holding the
address of the disk block where the value of the particular key can be found.
Indexing Methods-
Ordered indices-
The indices are usually sorted to make searching faster. The indices which are sorted are known as
ordered indices.
Primary Index
If the index is created on the basis of the primary key of the table, then it is known as primary
indexing.
Dense index
The dense index contains an index record for every search key value in the data file. It makes
searching faster.
Sparse index
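In a sparse index, instead of pointing to every record in the main table, the index points to records
at intervals (for example, one entry per data block). To find a record, we locate the largest index
entry whose key is less than or equal to the search key and then scan sequentially from there.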
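The difference between dense and sparse indexes shows up clearly in a small model. This minimal sketch assumes a data file already sorted on the key and stored 3 records per block. The records and the names used are hypothetical. The dense index has one entry per record. The sparse index has one entry per block (that block's first key), and a lookup finds the last entry not greater than the search key and then scans that block.

```python
from bisect import bisect_right

# Hypothetical data file: records sorted by key, stored 3 per block.
BLOCK_SIZE = 3
records = [(k, f"rec{k}") for k in (2, 5, 8, 11, 14, 17, 20, 23, 26)]
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# Dense index: one entry (key -> block number) for every record.
dense = {key: b for b, block in enumerate(blocks) for key, _ in block}

# Sparse index: one entry per block, holding the block's first (smallest) key.
sparse_keys = [block[0][0] for block in blocks]

def dense_lookup(key):
    b = dense.get(key)
    if b is None:
        return None
    return next(rec for k, rec in blocks[b] if k == key)

def sparse_lookup(key):
    # Last index entry whose key is <= the search key, then scan that block.
    pos = bisect_right(sparse_keys, key) - 1
    if pos < 0:
        return None
    for k, rec in blocks[pos]:
        if k == key:
            return rec
    return None

print(len(dense), len(sparse_keys))         # 9 3
print(dense_lookup(14), sparse_lookup(14))  # rec14 rec14
print(sparse_lookup(15))                    # None
```

The sparse index is three times smaller here, and the price is a short scan inside one block on every lookup.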
Clustering Index
A clustering index is defined on a data file that is ordered on a non-key column, so several
records can have the same value in that column. The records that have similar characteristics (the same
value) are grouped, and an index entry is created for each
group.
Secondary Index
If the size of the mapping grows, fetching the address itself becomes slower. In this case, the sparse
index is no longer efficient. To overcome this problem, secondary indexing is introduced.
Relational Database-
A relational database is a type of database that stores and provides access to data points that are
related to one another. In a relational database, each row in the table is a record with a unique ID
called the key. The columns of the table hold attributes of the data, and each record usually has a
value for each attribute, making it easy to establish the relationships among data points.
Relational Model concept-
Domain: It contains a set of atomic values that an attribute can take.
Relational instance: In the relational database system, the relational instance is represented by a
finite set of tuples. Relation instances do not have duplicate tuples.
Relational schema: A relational schema contains the name of the relation and name of all columns
or attributes.
Relational key: A relational key is a set of one or more attributes that can uniquely identify each row in the relation.
UNIT-5
Inheritance -
Inheritance in Java is a mechanism in which one object acquires all the properties and behaviors of
a parent object. It is an important part of OOPs (Object Oriented programming system).
The idea behind inheritance in Java is that you can create new classes that are built upon existing
classes. When you inherit from an existing class, you can reuse methods and fields of the parent
class. Moreover, you can add new methods and fields in your current class also.
Sub Class/Child Class: Subclass is a class which inherits the other class. It is also called a
derived class, extended class, or child class.
Super Class/Parent Class: Superclass is the class from where a subclass inherits the features. It is
also called a base class or a parent class.
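The notes describe inheritance in Java. To keep all examples in one language, here is the same idea sketched in Python, where listing a base class in parentheses plays the role of Java's extends keyword. The Person and Student classes are hypothetical. The child class reuses the parent's field and method, adds a new field, and extends the parent's method.

```python
class Person:                       # super class / parent class
    def __init__(self, name):
        self.name = name

    def describe(self):
        return self.name

class Student(Person):              # sub class / child class (Java: class Student extends Person)
    def __init__(self, name, roll_no):
        super().__init__(name)      # reuse the parent's initialisation
        self.roll_no = roll_no      # new field added in the child

    def describe(self):             # reuse and extend the parent's method
        return f"{super().describe()} (roll no. {self.roll_no})"

s = Student("Asha", 42)
print(s.describe())                 # Asha (roll no. 42)
print(isinstance(s, Person))        # True
```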
UNIT-6
Normalization
A large database defined as a single relation can result in data duplication. This repetition of data may
result in:
Relations becoming very large.
Difficulty in maintaining and updating data.
Wastage and poor utilization of disk space and resources.
So to handle these problems, we should analyze and decompose the relations into smaller, simpler,
and well-structured relations that satisfy desirable properties. Normalization is the process of
decomposing relations into relations with fewer attributes.
Normalization
Normalization is the process of organizing the data in the database.
Normalization divides larger tables into smaller ones and links them using relationships.
Normal forms are used to reduce redundancy in database tables.
Normalization consists of a series of guidelines that help guide you in creating a good database
structure.
Normal Forms
Normalization works through a series of stages called Normal forms. The normal forms apply to
individual relations.
1NF-A relation is in 1NF if every attribute contains only atomic (indivisible) values.
2NF-A relation will be in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent
on the primary key.
3NF-A relation will be in 3NF if it is in 2NF and no transitive dependency exists.
BCNF-Boyce-Codd normal form is a stronger version of 3NF: for every non-trivial functional dependency
X → Y, X must be a super key.
4NF-A relation will be in 4NF if it is in Boyce-Codd normal form and has no non-trivial multivalued
dependency.
5NF-A relation is in 5NF if it is in 4NF and does not contain any join dependency; joining should be
lossless.
Functional Dependency-
A functional dependency is a relationship that exists between two attributes (or sets of attributes). It
typically exists between the primary key and a non-key attribute within a table.
For example:
Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address.
Here the Emp_Id attribute can uniquely identify the Emp_Name attribute of the employee table, because if
we know the Emp_Id, we can tell the employee name associated with it.
Functional dependency can be written as:
Emp_Id → Emp_Name
We can say that Emp_Name is functionally dependent on Emp_Id.
1. Trivial functional dependency
A → B has trivial functional dependency if B is a subset of A.
2. Non-trivial functional dependency
A → B has a non-trivial functional dependency if B is not a subset of A.
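Both ideas can be checked in code. In this minimal sketch, the employee rows are hypothetical, and the function names fd_holds and is_trivial are illustrative. X → Y holds on a table instance if any two rows that agree on X also agree on Y. The dependency is trivial when Y is a subset of X. A sample of data can only disprove a dependency. Whether it truly holds is a property of the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Check X -> Y on a relation instance: rows that agree on X must agree on Y."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True

def is_trivial(lhs, rhs):
    """X -> Y is trivial when Y is a subset of X."""
    return set(rhs) <= set(lhs)

# Hypothetical employee instance.
employee = [
    {"Emp_Id": 1, "Emp_Name": "Asha", "Emp_Address": "Delhi"},
    {"Emp_Id": 2, "Emp_Name": "Ravi", "Emp_Address": "Pune"},
    {"Emp_Id": 3, "Emp_Name": "Asha", "Emp_Address": "Agra"},
]
print(fd_holds(employee, ["Emp_Id"], ["Emp_Name"]))       # True
print(fd_holds(employee, ["Emp_Name"], ["Emp_Address"]))  # False (two Ashas, two addresses)
print(is_trivial(["Emp_Id", "Emp_Name"], ["Emp_Id"]))     # True
print(is_trivial(["Emp_Id"], ["Emp_Name"]))               # False
```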
Multivalued Dependency-
Multivalued dependency occurs when two attributes in a table are independent of each other but
both depend on a third attribute.
A multivalued dependency involves at least two attributes that are dependent on a third attribute;
that's why it always requires at least three attributes.
Example: Suppose there is a bike manufacturer company that produces two colors (white and
black) of each model every year.
Here the columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and independent of
each other.
In this case, these two columns are said to be multivalued dependent on BIKE_MODEL. The
representation of these dependencies is shown below:
BIKE_MODEL →→ MANUF_YEAR
BIKE_MODEL →→ COLOR
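A multivalued dependency can be tested on a table instance with its standard definition. X →→ Y holds if, for every pair of rows t1 and t2 that agree on X, the table also contains a row t3 that takes X and Y from t1 and the remaining attributes from t2. In this minimal sketch, the bike rows are hypothetical, and the function name mvd_holds is illustrative.

```python
from itertools import product

def mvd_holds(rows, attrs, x, y):
    """Check X ->-> Y on an instance: for every t1, t2 agreeing on X there must be
    a row t3 taking X and Y from t1 and the remaining attributes Z from t2."""
    z = [a for a in attrs if a not in x and a not in y]
    present = {tuple(r[a] for a in attrs) for r in rows}
    for t1, t2 in product(rows, repeat=2):
        if all(t1[a] == t2[a] for a in x):
            t3 = {**{a: t1[a] for a in x + y}, **{a: t2[a] for a in z}}
            if tuple(t3[a] for a in attrs) not in present:
                return False
    return True

ATTRS = ["BIKE_MODEL", "MANUF_YEAR", "COLOR"]
# Hypothetical rows: every year of the model comes in both colors.
bikes = [
    {"BIKE_MODEL": "M2011", "MANUF_YEAR": 2008, "COLOR": "White"},
    {"BIKE_MODEL": "M2011", "MANUF_YEAR": 2008, "COLOR": "Black"},
    {"BIKE_MODEL": "M2011", "MANUF_YEAR": 2009, "COLOR": "White"},
    {"BIKE_MODEL": "M2011", "MANUF_YEAR": 2009, "COLOR": "Black"},
]
print(mvd_holds(bikes, ATTRS, ["BIKE_MODEL"], ["MANUF_YEAR"]))      # True
print(mvd_holds(bikes[:3], ATTRS, ["BIKE_MODEL"], ["MANUF_YEAR"]))  # False: (2009, Black) missing
```

Because year and color are independent, every year must be paired with every color. Dropping one row breaks the dependency.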
Lossless Decomposition-
Lossless join decomposition is a decomposition of a relation R into relations R1 and R2 such that if we
perform a natural join of R1 and R2, it returns exactly the original relation R.
In lossless decomposition, R1 and R2 must share a common attribute, and the common attribute must
satisfy the criterion below.
Decomposition of a relation R into R1 and R2 is a lossless-join decomposition if at least one of the
following functional dependencies is in F+ (the closure of the set of functional dependencies F):
R1 ∩ R2 → R1
R1 ∩ R2 → R2
That is, the common attributes must form a super key of R1 or of R2.
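The lossless-join test above can be run mechanically. First compute the attribute closure of R1 ∩ R2 under F. Then check whether that closure covers R1 or R2. In this minimal sketch, the relation R(Emp_Id, Emp_Name, Dept_Id, Dept_Name) and its dependencies are hypothetical, and each functional dependency is written as a pair of attribute sets.

```python
def closure(attrs, fds):
    """Attribute closure X+ under a list of FDs given as (lhs_set, rhs_set)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """Binary decomposition is lossless iff (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2."""
    common = set(r1) & set(r2)
    plus = closure(common, fds)
    return set(r1) <= plus or set(r2) <= plus

# Hypothetical R(Emp_Id, Emp_Name, Dept_Id, Dept_Name) with its FDs.
F = [({"Emp_Id"}, {"Emp_Name", "Dept_Id"}), ({"Dept_Id"}, {"Dept_Name"})]
print(is_lossless({"Emp_Id", "Emp_Name", "Dept_Id"}, {"Dept_Id", "Dept_Name"}, F))   # True
print(is_lossless({"Emp_Id", "Emp_Name"}, {"Emp_Name", "Dept_Id", "Dept_Name"}, F))  # False
```

In the first split, the common attribute Dept_Id determines all of R2, so the join restores R. In the second split, the common attribute is Emp_Name, which determines nothing else, so the join can produce spurious rows.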
Select Operation -
This operation is used to select the rows of a table (relation) that satisfy a given condition, which is
called a predicate. The predicate is a user-defined condition to select rows of the user's choice. It is
denoted σ_predicate(R).
Project Operation -
If the user is interested in selecting the values of only a few attributes, rather than all attributes of
the table (relation), then one should use the PROJECT operation. It is denoted π_attributes(R).
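Select and project can be modelled on a relation represented as a list of rows. This minimal sketch assumes that representation, and the student rows are hypothetical. Select keeps whole rows that satisfy the predicate. Project keeps only the chosen columns and removes duplicate results, because a relation is a set of tuples.

```python
def select(relation, predicate):
    """σ_predicate(R): keep the rows for which the predicate is true."""
    return [row for row in relation if predicate(row)]

def project(relation, attrs):
    """π_attrs(R): keep only the listed columns, dropping duplicate results."""
    out = []
    for row in relation:
        t = {a: row[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out

student = [
    {"id": 1, "name": "Asha", "dept": "CS", "age": 20},
    {"id": 2, "name": "Ravi", "dept": "IT", "age": 22},
    {"id": 3, "name": "Meena", "dept": "CS", "age": 21},
]
print(select(student, lambda r: r["dept"] == "CS" and r["age"] > 20))
# [{'id': 3, 'name': 'Meena', 'dept': 'CS', 'age': 21}]
print(project(student, ["dept"]))  # [{'dept': 'CS'}, {'dept': 'IT'}]
```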
Join Operator in Algebra-
Join operation combines the relations R1 and R2 with respect to a condition. It is denoted by ⋈.
The different types of join operation are as follows -
Theta join
Natural join
Outer join − It is further classified into following types −
Left outer join.
Right outer join.
Full outer join.
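The natural join and the left outer join differ in how they treat rows that have no match. This minimal sketch assumes relations represented as non-empty lists of rows and uses None to stand in for NULL. The employee and dept rows are hypothetical. Right and full outer joins can be built the same way by also keeping unmatched rows from the other side.

```python
def natural_join(r, s):
    """R ⋈ S: combine rows that agree on all common attributes."""
    common = set(r[0]) & set(s[0])
    return [{**a, **b} for a in r for b in s if all(a[c] == b[c] for c in common)]

def left_outer_join(r, s):
    """R ⟕ S: natural join, plus unmatched rows of R padded with None (NULL)."""
    common = set(r[0]) & set(s[0])
    extra = [c for c in s[0] if c not in common]
    out = []
    for a in r:
        matches = [b for b in s if all(a[c] == b[c] for c in common)]
        if matches:
            out.extend({**a, **b} for b in matches)
        else:
            out.append({**a, **{c: None for c in extra}})
    return out

employee = [{"emp": "Asha", "dept_id": 1}, {"emp": "Ravi", "dept_id": 2}, {"emp": "Meena", "dept_id": 3}]
dept = [{"dept_id": 1, "dept_name": "HR"}, {"dept_id": 2, "dept_name": "IT"}]

print(len(natural_join(employee, dept)))    # 2 (Meena has no matching department)
print(left_outer_join(employee, dept)[-1])  # {'emp': 'Meena', 'dept_id': 3, 'dept_name': None}
```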
Transaction-
A transaction can be defined as a group of tasks. A single task is the minimum processing unit which
cannot be divided further.
Let’s take an example of a simple transaction. Suppose a bank employee transfers Rs 500 from A's
account to B's account. This very simple and small transaction involves several low-level tasks.
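ACID Properties:
Atomicity: either all the tasks of a transaction are performed or none of them are.
Consistency: a transaction takes the database from one consistent state to another.
Isolation: concurrently running transactions do not interfere with each other; the result is as if they ran one after another.
Durability: once a transaction commits, its changes survive later system failures.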
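The Rs 500 transfer from A to B makes atomicity concrete. This minimal sketch assumes SQLite via Python's sqlite3 module. The account table is hypothetical, and so is the fail_midway switch, which simulates a crash between the debit and the credit. The CHECK rule stands in for a consistency requirement. If either step fails, ROLLBACK undoes everything done so far in the transaction.

```python
import sqlite3

# isolation_level=None: autocommit mode, so BEGIN/COMMIT/ROLLBACK are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.execute("INSERT INTO account VALUES ('A', 1000), ('B', 200)")

def transfer(src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst as one transaction: all steps happen or none do.
    `fail_midway` is a hypothetical switch that simulates a crash after the debit."""
    conn.execute("BEGIN")
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        if fail_midway:
            raise RuntimeError("simulated failure between debit and credit")
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.execute("COMMIT")
        return True
    except (sqlite3.IntegrityError, RuntimeError):
        conn.execute("ROLLBACK")  # undo the debit if it already happened
        return False

def balances():
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))

ok1 = transfer("A", "B", 500)
print(ok1, balances())  # True {'A': 500, 'B': 700}
ok2 = transfer("A", "B", 100, fail_midway=True)
print(ok2, balances())  # False {'A': 500, 'B': 700}  (debit rolled back)
ok3 = transfer("B", "A", 5000)
print(ok3, balances())  # False {'A': 500, 'B': 700}  (CHECK rejects a negative balance)
```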
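Transaction State Diagram-
A transaction passes through the following states: Active, Partially Committed, Committed, Failed, Aborted, and Terminated.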
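Schedule-
A sequence of operations from one or more transactions, in the order in which they are executed, is known as a schedule.
Serial Schedule: the transactions execute one after another, with no interleaving of their operations.
Non-Serial Schedule: the operations of different transactions are interleaved.
Serializable Schedule: a non-serial schedule whose effect is equivalent to that of some serial schedule.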
Types of Serializable Schedules:
Conflict Equivalent Schedule or Conflict Serializability
If a schedule is conflict equivalent to some serial schedule, then it is called a Conflict Serializable
Schedule.
View Equivalent Schedule or View Serializability
If a schedule is view equivalent to some serial schedule, then it is called a View Serializable Schedule.
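Conflict serializability is usually tested with a precedence graph. Add an edge Ti → Tj whenever an operation of Ti conflicts with a later operation of Tj. Two operations conflict when they belong to different transactions, touch the same item, and at least one of them is a write. The schedule is conflict serializable exactly when this graph has no cycle. In this minimal sketch, each operation is a hypothetical (transaction, "R"/"W", item) tuple, and a depth-first search detects cycles.

```python
def precedence_graph(schedule):
    """Edge Ti -> Tj for each conflicting pair where Ti's operation comes first."""
    edges = set()
    for i, (ti, op_i, x_i) in enumerate(schedule):
        for tj, op_j, x_j in schedule[i + 1:]:
            if ti != tj and x_i == x_j and "W" in (op_i, op_j):
                edges.add((ti, tj))
    return edges

def is_conflict_serializable(schedule):
    """Conflict serializable iff the precedence graph is acyclic."""
    graph = {}
    for a, b in precedence_graph(schedule):
        graph.setdefault(a, set()).add(b)
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on current DFS path / finished
    color = {}

    def has_cycle(node):
        color[node] = GREY
        for nxt in graph.get(node, ()):
            c = color.get(nxt, WHITE)
            if c == GREY or (c == WHITE and has_cycle(nxt)):
                return True
        color[node] = BLACK
        return False

    return not any(color.get(n, WHITE) == WHITE and has_cycle(n) for n in list(graph))

s1 = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A")]  # serial
s2 = [("T1", "R", "A"), ("T2", "R", "A"), ("T1", "W", "A"), ("T2", "W", "A")]  # lost update
print(is_conflict_serializable(s1))  # True
print(is_conflict_serializable(s2))  # False: edges T1 -> T2 and T2 -> T1 form a cycle
```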
Concurrency Control-
Concurrency control comes under transactions in a database management system (DBMS).
If many transactions try to access the same data at the same time, inconsistency can arise. Concurrency
control is required to maintain data consistency.
For example, without concurrency, multiple people could not withdraw money from ATMs at different
places at the same time. Concurrency makes this possible, and concurrency control keeps the account data
consistent while it happens.
Various concurrency control techniques are:
1. Two-phase locking Protocol
2. Timestamp ordering Protocol
3. Multiversion concurrency control
4. Validation concurrency control
Why Concurrency Control Is Needed-
Most high-performance transactional systems need to run transactions concurrently to meet their
performance requirements. Thus, without concurrency control, such systems can neither provide correct
results nor keep their databases consistent.
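Lock Based Protocols –
In this protocol, all data items must be accessed in a mutually exclusive manner. Two common types of
locks are used:
Shared Lock (S): Also called a read-only lock. A data item locked in shared mode can be read but not
written, and several transactions can hold a shared lock on the same item at the same time.
Exclusive Lock (X): A data item locked in exclusive mode can be both read and written. Only one
transaction can hold an exclusive lock on an item, and no other lock can be held on it at the same time.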
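The rules for S and X locks fit in a small compatibility matrix. In this minimal sketch, the LockTable class is hypothetical. It grants a request only if the requested mode is compatible with every lock that other transactions hold on the item, and otherwise it simply refuses. A real lock manager would make the transaction wait instead, and protocols such as two-phase locking add rules about when locks may be released.

```python
# Compatibility matrix: can a request (second) be granted while a lock (first) is held?
COMPATIBLE = {
    ("S", "S"): True,   # many readers may share an item
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,  # a writer needs the item to itself
}

class LockTable:
    """Grants or refuses lock requests; no waiting queue in this sketch."""

    def __init__(self):
        self.locks = {}  # item -> {transaction: mode}

    def request(self, txn, item, mode):
        holders = self.locks.setdefault(item, {})
        others = [m for t, m in holders.items() if t != txn]
        if all(COMPATIBLE[(held, mode)] for held in others):
            if holders.get(txn) != "X":  # never downgrade an X lock already held
                holders[txn] = mode
            return True
        return False

    def release(self, txn, item):
        self.locks.get(item, {}).pop(txn, None)

lt = LockTable()
r1 = (lt.request("T1", "A", "S"), lt.request("T2", "A", "S"))  # both readers granted
r2 = lt.request("T3", "A", "X")                                # writer refused
lt.release("T1", "A")
lt.release("T2", "A")
r3 = lt.request("T3", "A", "X")                                # now granted
print(r1, r2, r3)  # (True, True) False True
```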
OTHER
Referential integrity-
Referential integrity is a term used in database design to describe the relationship between two
tables. It is important because it ensures that all data in a database remains consistent and up to date.
It helps to prevent incorrect records from being added, deleted, or modified.
For example, if we delete the row with primary key 15 in a primary table, we need to be sure that there’s no
foreign key in any related table with the value 15. We should only be able to delete a primary key
if there are no associated rows. Otherwise, we would end up with an orphaned record.
Need of Referential Integrity-
Referential integrity is a data quality concept that ensures that when you make changes to data in one
place, those changes are reflected in other related records.
Referential Integrity constraint-
A referential integrity constraint is also known as foreign key constraint. A foreign key is a key
whose values are derived from the Primary key of another table.
The table from which the values are derived is known as the Master or Referenced Table, and the table
in which the values are inserted accordingly is known as the Child or Referencing Table.
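The row-15 example can be reproduced with a foreign key constraint. This minimal sketch assumes SQLite via Python's sqlite3 module, where foreign keys are enforced only after PRAGMA foreign_keys = ON. The department (master) and employee (child) tables are hypothetical. The database refuses both an insert that points to a missing parent and a delete that would leave an orphaned child.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")   # master / referenced
conn.execute("""CREATE TABLE employee (
                    emp_id  INTEGER PRIMARY KEY,
                    dept_id INTEGER REFERENCES department(dept_id))""")       # child / referencing
conn.execute("INSERT INTO department VALUES (15, 'Sales')")
conn.execute("INSERT INTO employee VALUES (1, 15)")

def rejected(sql):
    """Return True if the statement violates referential integrity."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

bad_insert = rejected("INSERT INTO employee VALUES (2, 99)")        # no department 99
bad_delete = rejected("DELETE FROM department WHERE dept_id = 15")  # would orphan employee 1
conn.execute("DELETE FROM employee WHERE emp_id = 1")
ok_delete = not rejected("DELETE FROM department WHERE dept_id = 15")  # no child rows left
print(bad_insert, bad_delete, ok_delete)  # True True True
```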
Authorization-
Authorization is provided by the Database Administrator. Users of the database can only view the
contents they are authorized to view.
The different permissions for authorizations available are:
Primary Permission
Secondary Permission
Public Permission
Context sensitive permission
Strong Entity
A strong entity is independent of any other entity. A strong entity always has a primary key. In an ER
diagram, a strong entity is represented by a rectangle. A relationship between two strong entities is
represented by a diamond. A set of strong entities is known as a strong entity set.
Weak Entity
A weak entity depends on a strong entity and cannot exist without a corresponding strong entity. It has a
foreign key that relates it to a strong entity. A relationship between a strong entity and a weak entity
is represented by a double diamond.
Recovery Techniques
Recovery techniques for the database are as follows −
Log Based Recovery
A log is a sequence of records that keeps track of the actions performed during a transaction. Log records
are written before the actual modification and are stored on stable storage media.
The log-based recovery procedure works in three different ways, as follows −
Deferred Update
Immediate Update
Checkpoint
Deferred Update Method
In this technique, the database is not physically updated on disk until after a transaction reaches
its commit point.
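Under deferred update, recovery only needs to redo work. Uncommitted transactions never touched the disk, so nothing has to be undone. In this minimal sketch, the log-record format and the function name recover_deferred are hypothetical. Recovery scans the log and re-applies, in log order, the writes of transactions that have a commit record. The writes of the unfinished T2 are ignored.

```python
def recover_deferred(log, db):
    """Deferred-update recovery sketch: REDO the writes of committed transactions,
    ignore everything else (no UNDO needed).
    Log records: ("start", T), ("write", T, item, new_value), ("commit", T)."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in log:                          # redo in log order
        if rec[0] == "write" and rec[1] in committed:
            _, _, item, new_value = rec
            db[item] = new_value
    return db

log = [
    ("start", "T1"), ("write", "T1", "A", 500), ("write", "T1", "B", 700), ("commit", "T1"),
    ("start", "T2"), ("write", "T2", "A", 400),   # crash before T2 commits
]
print(recover_deferred(log, {"A": 1000, "B": 200}))  # {'A': 500, 'B': 700}
```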
Caching/Buffering
In this technique, one or more disk pages that include the data items to be updated are cached into
main memory buffers and then updated in memory before being written back to
disk.
Union Compatibility of Relations-
The union compatibility of relations implies that the participating relations must fulfil the following
conditions.
1. Same degree, i.e., the two relations must have the same number of attributes.
2. Same domain, i.e., each corresponding pair of attributes of relation A and relation B must have the same domain.
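The two conditions can be checked before taking a union. In this minimal sketch, a schema is a hypothetical list of (attribute name, domain) pairs, with Python types standing in for domains. The depositor, borrower, and course schemas are illustrative. Attribute names may differ between the relations. Only the degree and the domains of corresponding positions must match.

```python
def union_compatible(schema_a, schema_b):
    """Same degree, and each corresponding pair of attributes shares a domain."""
    return len(schema_a) == len(schema_b) and all(
        da == db for (_, da), (_, db) in zip(schema_a, schema_b))

def union(rows_a, rows_b):
    """A ∪ B as sets of tuples: a row present in both appears once."""
    return set(rows_a) | set(rows_b)

depositor = [("cust_name", str), ("acct_no", int)]
borrower = [("name", str), ("loan_no", int)]
course = [("code", str)]

print(union_compatible(depositor, borrower))  # True
print(union_compatible(depositor, course))    # False (different degree)
print(sorted(union([("Asha", 1), ("Ravi", 2)], [("Ravi", 2), ("Meena", 3)])))
# [('Asha', 1), ('Meena', 3), ('Ravi', 2)]
```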
Distributed Database System-
A distributed database is a database that is not limited to one system; it is spread over
different sites, i.e., over multiple computers or over a network of computers. A distributed database
system is located on various sites that don’t share physical components. This may be required when a
particular database needs to be accessed by various users globally.
1. Homogeneous Database:
In a homogeneous distributed database, all sites store the database identically, using the same schema and software.
2. Heterogeneous Database:
In a heterogeneous distributed database, different sites can use different schemas and
software. Different computers may use different operating systems and different database applications.