Unit 2 DBMS
The relational model uses logical data structures, such as data tables, views, and indexes, that are separate from the
physical storage structure. This separation means that database administrators can manage physical data storage without
affecting access to that data as a logical structure.
The relational model in DBMS has several features that make it well suited for storing data: atomicity,
consistency, isolation, durability, data consistency, data working together, data flexibility, and lookup
relationships.
Atomicity keeps data accurate by ensuring that all changes to the data are either made completely or not
made at all and that there are no partial changes to the data.
Atomicity is the property of transactions that all changes in a transaction either happen as a group, or no
changes happen at all.
Atomicity in DBMS states that either a transaction is completed or does not occur at all. There is no
in-between state, i.e., transactions do not happen partially.
Every transaction is a unit of work, and it has two key operations:
Commit: If this happens, the changes are finally made in the database, and the transaction is complete.
Abort: If this happens, changes are not made in the database, and the transaction does not occur.
Atomicity in the database ensures that the database remains consistent. Also, it makes sure that there are no
incomplete manipulations in the database.
A typical example is order processing. When a person places an order and makes the payment for it,
either the whole order is processed, and the person receives the order, or the order does not get processed at all,
and no order is shipped.
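To make commit and abort concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in DBMS. The orders/payments tables and the simulated payment failure are hypothetical. The connection's `with conn:` block commits when it finishes normally and rolls back (aborts) when an exception escapes it.

```python
import sqlite3

def place_order(conn, order_id, amount, fail_before_payment=False):
    """Record an order and its payment as one unit of work (hypothetical schema)."""
    try:
        with conn:  # COMMIT if the block finishes, ROLLBACK (abort) if an exception escapes
            conn.execute("INSERT INTO orders (id, amount) VALUES (?, ?)", (order_id, amount))
            if fail_before_payment:
                raise RuntimeError("payment service unavailable")  # simulated failure mid-transaction
            conn.execute("INSERT INTO payments (order_id, amount) VALUES (?, ?)", (order_id, amount))
        return True   # committed: both rows are stored
    except RuntimeError:
        return False  # aborted: the order row inserted above was undone as well

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.execute("CREATE TABLE payments (order_id INTEGER, amount REAL)")

ok = place_order(conn, 1, 50.0)                                # commits
failed = place_order(conn, 2, 75.0, fail_before_payment=True)  # aborts
print(conn.execute("SELECT id FROM orders").fetchall())        # [(1,)]: no half-processed order 2
```

Order 2 leaves no trace in either table, which is the "all or nothing" behaviour described above.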
Consistency ensures that the state of the database remains consistent throughout the transaction.
Data consistency is the accuracy, completeness, and correctness of the data stored in a database.
Data consistency means that data written to the database must be valid according to all defined rules.
It is used to ensure data quality and integrity in the database. It is one of the most important concepts in DBMS
because it ensures that the data in the database is correct and can be relied on for decision-making, metrics, and
analytics.
Data consistency in DBMS is defined by a set of rules that ensure every value stored in the database system is valid
and is read back correctly.
Isolation ensures that changes made by one transaction are not visible to other transactions until the
changes are committed.
Isolation in database refers to the ability of a database system to allow multiple transactions to access the same
data without interfering with each other. Isolation ensures that each transaction sees a consistent view of the
data, regardless of the concurrent activities of other transactions.
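A minimal sketch of isolation, assuming SQLite in WAL (write-ahead logging) mode through Python's sqlite3 module: a second connection keeps seeing the last committed balance until the writer commits. The accounts table and the temporary file are hypothetical, and real DBMSs offer several configurable isolation levels that this sketch does not explore.

```python
import os
import sqlite3
import tempfile

def read_balance(conn):
    return conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchall()[0][0]

def demo_isolation():
    """Return the balance a second connection sees before and after the writer commits."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bank.db")                # hypothetical database file
        writer = sqlite3.connect(path)
        writer.execute("PRAGMA journal_mode=WAL").fetchall()  # readers keep reading committed data
        writer.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
        writer.execute("INSERT INTO accounts VALUES (1, 100)")
        writer.commit()

        reader = sqlite3.connect(path)
        writer.execute("UPDATE accounts SET balance = 40 WHERE id = 1")  # open, uncommitted transaction
        before = read_balance(reader)   # the uncommitted change is invisible here
        writer.commit()
        after = read_balance(reader)    # the committed change is now visible
        reader.close()
        writer.close()
        return before, after

print(demo_isolation())  # (100, 40)
```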
Durability ensures that once changes are committed, they will remain even if there is a system failure.
Durability ensures that changes made to the database (transactions) that are successfully committed will survive
permanently, even in the case of system failures.
Durability is achieved with database backups, transaction logs, and disk storage. For example, if a transaction
updates a customer's address, durability ensures the updated address is not lost due to a hard disk failure or
power outage. The change will persist with the help of storage devices, backups, and logs.
MySQL ensures durability by maintaining transaction logs of the changes made during a transaction. In
the event of a system failure, you can recover your MySQL database to a consistent state, so committed changes are not lost.
Durability in databases is the property that ensures transactions are saved permanently and do not accidentally
disappear or get erased, even during a database crash. This is usually achieved by saving all transactions to a
non-volatile storage medium.
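The sketch below illustrates durability with Python's sqlite3 module and a temporary database file (both the file and the customer table are hypothetical). One update is committed and a second is left uncommitted; after closing and reopening the file, only the committed change remains. With SQLite's default settings, COMMIT returns only after the change has been written to disk; closing a connection merely simulates a restart and is not a real crash test.

```python
import os
import sqlite3
import tempfile

def demo_durability():
    """Commit one change, leave another uncommitted, then reopen the file as if after a restart."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "customers.db")   # hypothetical database file
        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, address TEXT)")
        conn.execute("INSERT INTO customer VALUES (1, 'Old Street 1')")
        conn.commit()
        conn.execute("UPDATE customer SET address = 'New Street 2' WHERE id = 1")
        conn.commit()                                # committed: persisted in the database file
        conn.execute("UPDATE customer SET address = 'Never Committed 3' WHERE id = 1")
        conn.close()                                 # the open transaction is rolled back, not saved

        reopened = sqlite3.connect(path)             # a fresh connection, like after a restart
        address = reopened.execute("SELECT address FROM customer WHERE id = 1").fetchall()[0][0]
        reopened.close()
        return address

print(demo_durability())  # New Street 2
```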
Data consistency means that all data in the database is consistent with itself. Data working together means
related data is stored in different tables and linked together using keys.
Consistency in database systems refers to the requirement that any given database transaction must change
affected data only in allowed ways. For a database to be consistent, data written to the database must be valid
according to all defined rules, including constraints, cascades, triggers, or any combination.
Data consistency refers to the quality of data being uniform, accurate, and coherent across various
databases, systems, and applications within an organization. It ensures that data remains the same and
aligns with the established rules and standards throughout its lifecycle, regardless of the platform or location
it’s accessed from.
Consistent data is what keeps a database running like clockwork. Established rules/values keep inconsistent
data out of primary databases and replicas, allowing its processes to run smoothly.
Types of Consistency Models:
Strong consistency - requires all nodes in the system to agree on a single consistent value.
Eventual consistency - allows for temporary inconsistencies in the data store, but eventually, all nodes
converge to the same value.
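Eventual consistency can be illustrated with a toy model. The sketch below is a hypothetical in-memory `Replica` class that resolves conflicts with "last writer wins" (the newer timestamp wins), which is one common strategy, not the only one. Two replicas briefly disagree and then converge after exchanging updates. Equal timestamps on different values would need a tie-breaker, which this sketch omits.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    """Hypothetical replica: key -> (timestamp, value), merged with last-writer-wins."""
    name: str
    data: dict = field(default_factory=dict)

    def write(self, key, value, ts):
        self.data[key] = (ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def merge_from(self, other):
        """Anti-entropy step: adopt any version of a key that is newer than the local one."""
        for key, (ts, value) in other.data.items():
            if key not in self.data or self.data[key][0] < ts:
                self.data[key] = (ts, value)

a, b = Replica("A"), Replica("B")
a.write("price", 10, ts=1)
b.write("price", 12, ts=2)                 # a later write lands on a different replica
print(a.read("price"), b.read("price"))   # 10 12: temporarily inconsistent
a.merge_from(b)
b.merge_from(a)                            # replicas exchange updates in the background
print(a.read("price"), b.read("price"))   # 12 12: converged
```

Under strong consistency, by contrast, the read on replica A would not be allowed to return the stale value 10 after the second write.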
Data flexibility means that new data can be added without having to restructure the entire database.
Lookup relationships allow you to look up data in one table based on data in another table.
Normalization is the process of reducing data redundancy and improving data integrity by dividing your data
into smaller and simpler tables that are linked by foreign keys.
Denormalization is the process of increasing data redundancy and improving query efficiency by combining
your data into larger and more complex tables that reduce the number of joins.
You should apply normalization and denormalization techniques according to your data requirements and
trade-offs. For example, if you need to ensure data consistency and avoid update anomalies, you might apply
normalization.
If you need to speed up your queries and reduce the complexity of your code, you might apply denormalization.
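The trade-off can be seen in a small sketch using Python's sqlite3 module (all table names and data are hypothetical). Renaming a customer touches one row in the normalized design but every repeated copy in the denormalized one, while reading names alongside orders needs a join only in the normalized design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized design: each customer's name is stored once; orders refer to it by key.
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customers(customer_id),
                     total REAL);
INSERT INTO customers VALUES (1, 'Asha');
INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 35.5);
-- Denormalized design: one wide table that repeats the name on every order row.
CREATE TABLE orders_wide AS
    SELECT o.order_id, o.customer_id, c.name AS customer_name, o.total
    FROM orders o JOIN customers c ON c.customer_id = o.customer_id;
""")

# A rename touches one row in the normalized design ...
norm_rows = conn.execute("UPDATE customers SET name = 'Asha K' WHERE customer_id = 1").rowcount
# ... but every copy in the denormalized one (missing a copy would be an update anomaly).
wide_rows = conn.execute(
    "UPDATE orders_wide SET customer_name = 'Asha K' WHERE customer_id = 1").rowcount
conn.commit()

# Reading names with orders needs a join in the normalized design, none in the wide table.
joined = conn.execute("""SELECT o.order_id, c.name FROM orders o
                         JOIN customers c USING (customer_id) ORDER BY o.order_id""").fetchall()
wide = conn.execute(
    "SELECT order_id, customer_name FROM orders_wide ORDER BY order_id").fetchall()
print(norm_rows, wide_rows)  # 1 2
```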
In a one-to-many relationship, one entity is related to multiple other entities. For example, a customer may have
multiple orders, a teacher may have multiple students, and a product may be sold in multiple stores. The key
feature of the one-to-many model is that the relationship between two entities is not reciprocal. That is, the
fact that a customer has multiple orders does not mean that an order has multiple customers.
Example: If the two entity types are 'Customer' and 'Account', each 'Customer' can have many 'Accounts', but
each 'Account' can be owned by only one 'Customer'. In this case, each customer is linked to any number of
accounts, so the relationship is one-to-many.
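A one-to-many relationship is usually implemented by putting the foreign key on the "many" side. The sketch below uses Python's sqlite3 module with hypothetical customer and account tables: each account row names exactly one customer, while a customer can appear in many account rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- The 'many' side carries the foreign key: each account row names exactly one owner.
CREATE TABLE account (account_id INTEGER PRIMARY KEY,
                      customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
                      balance REAL);
INSERT INTO customer VALUES (1, 'Ravi'), (2, 'Meera');
INSERT INTO account VALUES (100, 1, 500.0), (101, 1, 20.0), (102, 2, 75.0);
""")

accounts_per_customer = conn.execute("""
    SELECT c.name, COUNT(a.account_id)
    FROM customer c LEFT JOIN account a ON a.customer_id = c.customer_id
    GROUP BY c.customer_id ORDER BY c.customer_id""").fetchall()
print(accounts_per_customer)  # [('Ravi', 2), ('Meera', 1)]
```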
In systems analysis, a one-to-one relationship is a type of cardinality describing the relationship between two
entities A and B in which one element of A may be linked to only one element of B, and vice versa.
A one-to-one relationship in a Database Management System (DBMS) connects two tables so that each record
in one table is related to at most one record in the other table. This type of
relationship can be seen in real-world scenarios such as an employee and their assigned workstation.
Country - capital city: Each country has exactly one capital city. Each capital city is the capital of
exactly one country.
Person - their fingerprints. Each person has a unique set of fingerprints. Each set of fingerprints
identifies exactly one person.
Email - user account. For many websites, one email address is associated with exactly one user
account and each user account is identified by its email address.
Spouse - spouse: In a monogamous marriage, each person has exactly one spouse.
User profile - user settings. One user has one set of user settings. One set of user settings is associated
with exactly one user.
For clarity, let’s contrast these examples with relationships that are not one-to-one:
Country - city: Each city is in exactly one country, but most countries have many cities.
Parent - child: Each child has two parents, but each parent can have many children.
Employee - manager: Each employee has exactly one immediate supervisor or manager, but each
manager usually supervises many employees.
A one-to-one relationship in an ER diagram is denoted, like all relationships, with a line connecting the two
entities. The “one” cardinality is denoted with a single straight line. (The “many” cardinality is denoted with a
crow’s foot symbol.)
The one-to-one relationship between country and capital can be denoted like this:
[Diagram not reproduced: country and capital connected by a one-to-one line.]
The perpendicular straight lines mean “mandatory”. This diagram shows that it’s mandatory for a capital to
have a country and it’s mandatory for a country to have a capital.
Another possibility is for one or both sides of the relationship to be optional. An optional side is denoted
with an open circle. For example, a diagram of the one-to-one relationship between a person and their
fingerprints can show that a person is mandatory (fingerprints must be assigned to a person), but fingerprints are
optional (a person may have no fingerprints assigned in the database).
One-to-One Relationships in a Physical Database
One way to implement a one-to-one relationship in a database is to use the same primary key in both tables.
Rows with the same value in the primary key are related. In this example, France is a country with the id 1 and
its capital city is in the table capital under id 1.
country
id name
1 France
2 Germany
3 Spain
capital
[table contents not reproduced]
Technically, one of the two primary keys also has to be declared as a foreign key:
The primary key in table capital is also a foreign key which references the id column in the table country.
Since capital.id is a primary key, each value in the column is unique, so the capital can reference at most one
country. It also must reference a country – it’s a primary key, so it cannot be left empty.
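The shared-primary-key approach can be tried out with Python's sqlite3 module. In this sketch (the capital rows and the rejected 'Lyon' and 'Atlantis' inserts are illustrative data), capital.id is both the primary key and a foreign key to country.id, so a country cannot get a second capital and a capital cannot point at a non-existent country.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is enabled
conn.executescript("""
CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- capital.id is the primary key AND a foreign key to country.id (shared primary key)
CREATE TABLE capital (id INTEGER PRIMARY KEY REFERENCES country(id), name TEXT NOT NULL);
INSERT INTO country VALUES (1, 'France'), (2, 'Germany'), (3, 'Spain');
INSERT INTO capital VALUES (1, 'Paris'), (2, 'Berlin'), (3, 'Madrid');
""")

def try_insert(sql):
    """Run an INSERT and report whether a constraint rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected ({exc})"

second_capital = try_insert("INSERT INTO capital VALUES (1, 'Lyon')")      # France already has one
orphan_capital = try_insert("INSERT INTO capital VALUES (9, 'Atlantis')")  # no country with id 9
print(second_capital)
print(orphan_capital)
```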
Another way you can implement a one-to-one relationship in a database is to add a new column and make it a
foreign key.
In this example, we add the column country_id in the table capital. The capital with id 1, Madrid, is associated
with country 3, Spain.
country
id name
1 France
2 Germany
3 Spain
capital
id name country_id
1 Madrid 3
2 Berlin 2
3 Paris 1
Technically, the column country_id should be a foreign key referencing the id column in the table country.
Since you want each capital to be associated with exactly one country, you should make the foreign key
column country_id unique.
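A minimal sketch of the second approach with Python's sqlite3 module, reusing the country and capital rows from the tables above. The UNIQUE on country_id is what turns an ordinary (one-to-many) foreign key into a one-to-one link; the extra 'Barcelona' row is a hypothetical attempt to give Spain a second capital.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE capital (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    -- UNIQUE makes this foreign key one-to-one: no country can appear twice
    country_id INTEGER NOT NULL UNIQUE REFERENCES country(id)
);
INSERT INTO country VALUES (1, 'France'), (2, 'Germany'), (3, 'Spain');
INSERT INTO capital VALUES (1, 'Madrid', 3), (2, 'Berlin', 2), (3, 'Paris', 1);
""")

try:
    conn.execute("INSERT INTO capital VALUES (4, 'Barcelona', 3)")  # Spain already has a capital
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

pairs = conn.execute("""SELECT co.name, ca.name
                        FROM capital ca JOIN country co ON co.id = ca.country_id
                        ORDER BY co.id""").fetchall()
print(duplicate_rejected, pairs)
```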
One-to-one relationships are the least frequent relationship type. One of the reasons for this is that very few
one-to-one relationships exist in real life. Also, most one-to-one relationships are one-to-one only for some
period of time. If your model includes a time component and captures change history, as is very often the case,
you’ll have very few one-to-one relationships.
A monogamous relationship may split up or one of the partners may die. If you model the reality of
monogamous relationships (such as marriages or civil unions) over time, you’ll likely need to model the fact
that they last only for a certain period.
You’d think that a person and their fingerprints never change. But what if the person loses a finger or the finger
is badly burnt? Their fingerprints might change. It’s not a very frequent scenario; still, in some models, you
may need to take this into account.
Even something seemingly as stable as countries and their capitals change over time. For example, Bonn used
to be the capital of West Germany (Bundesrepublik Deutschland) after World War II, when Berlin was part of
East Germany. This changed after German reunification; the capital of Germany (Bundesrepublik Deutschland)
is now Berlin. Whether you should or should not take this into account depends on your business reality and the
application you’re working on.
I can think of one feasible scenario for a real one-to-one relationship: optional parts of a table. Imagine you
have the table user with user data. The table contains general user information, such as users’ names, email
addresses, and signup dates. It also contains user settings, such as the color theme or auto-login for that app.
However, most users don’t have any user settings; they use the default settings.
user
[table contents not reproduced]
There are a lot of empty fields in this table. You could split the user table into two
tables: user, and user_settings, which contains the settings of those users who chose to change
them.
user
user_settings
[table contents not reproduced]
Splitting data into two tables makes table querying more complex: you have to join data from both tables. On
the other hand, the main user table is simpler to manage.
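The split can be sketched with Python's sqlite3 module. The column names, sample users, and the default values 'light' and 0 are assumptions for illustration. Only users who changed their settings get a user_settings row, and a LEFT JOIN with COALESCE fills in defaults for everyone else.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                   email TEXT NOT NULL UNIQUE, signup_date TEXT);
-- At most one optional row per user: user_id is the primary key AND references user.id
CREATE TABLE user_settings (user_id INTEGER PRIMARY KEY REFERENCES user(id),
                            color_theme TEXT, auto_login INTEGER);
INSERT INTO user VALUES (1, 'Ana', 'ana@example.com', '2024-01-02'),
                        (2, 'Ben', 'ben@example.com', '2024-02-10');
INSERT INTO user_settings VALUES (2, 'dark', 1);  -- only Ben changed the defaults
""")

rows = conn.execute("""
    SELECT u.name,
           COALESCE(s.color_theme, 'light') AS color_theme,  -- assumed default theme
           COALESCE(s.auto_login, 0) AS auto_login           -- assumed default: off
    FROM user u LEFT JOIN user_settings s ON s.user_id = u.id
    ORDER BY u.id""").fetchall()
print(rows)  # [('Ana', 'light', 0), ('Ben', 'dark', 1)]
```

This is the extra join mentioned above: reads become slightly more complex, while the user table itself no longer carries mostly empty settings columns.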
A one-to-one relationship is a relationship where a record in one table is associated with exactly one record in
another table. This type of relationship is rare in real life. Keep in mind that such logical questions are popular
in entry-level SQL job interviews. If you include time in your data model, many one-to-one
relationships become one-to-many or many-to-many relationships. The most common scenario for using a
one-to-one relationship in a database is splitting one table into two: one with mandatory columns, the other with
optional columns.
Advantages:
The relational model offers a number of advantages over other models, including accuracy, ease of use, collaboration, security,
and categorization.
Attribute: An attribute is a characteristic or quality of an entity. In the context of the relational model,
an attribute is a column in a table.
Tables: A table is a collection of data organized into rows and columns. In the relational model for
database management, tables are used to store information about entities.
Tuple: A tuple is a row in a table. Tuples are composed of attributes.
Relation Schema: A relation schema is a blueprint for a table. It defines the attributes that are contained
in a table as well as the relationships between those attributes.
Degree: The degree of a relation is the number of attributes it contains.
Cardinality: The cardinality of a relation is the number of tuples (rows) it contains. In ER modeling, the term
also describes how entities are related: one-to-one, one-to-many, or many-to-many.
Column: A column is an attribute that is contained in a table. Columns are also sometimes referred to
as fields.
Relation instance: A relation instance is a set of tuples that conforms to a given relation schema. In
other words, it is populated data that has been organized into rows and columns according to the
blueprint set forth by a relation schema.
Relation key: A relation key is an attribute or combination of attributes that uniquely identifies a tuple
in a table.
Domain: The domain of an attribute is the set of values that can be stored in that attribute. For example,
if an attribute represents ages, its domain would be any age from 0 onward.
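Several of these terms can be read off a live table. This sketch uses Python's sqlite3 module with a hypothetical student relation; it takes cardinality in the relational sense (the number of tuples in the current instance) and uses a CHECK to restrict the age domain to non-negative values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Relation schema: student(roll_no, name, age); the CHECK restricts the domain of age.
conn.execute("""CREATE TABLE student (
    roll_no INTEGER PRIMARY KEY,        -- relation key
    name TEXT,
    age INTEGER CHECK (age >= 0))""")
# Relation instance: the tuples currently stored.
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Asha", 20), (2, "Ravi", 21), (3, "Meera", 19), (4, "Kiran", 22)])

cur = conn.execute("SELECT * FROM student")
degree = len(cur.description)       # number of attributes (columns)
cardinality = len(cur.fetchall())   # number of tuples (rows)
print(degree, cardinality)          # 3 4
```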
Relational Model Constraints
Constraints in DBMS (Database Management Systems) encompass rules or conditions applied to database data,
ensuring data integrity, consistency, and adherence to business rules. They define limitations and requirements
that data must meet, preventing the entry of invalid or inconsistent data. Constraints serve as pre-established
rules governing the behavior and relationships of data in a database, contributing to the maintenance of accuracy
and reliability.
Relational database constraints are rules in a database model that help maintain the integrity and consistency of
data. These rules include primary key constraints, unique constraints, foreign key constraints, check constraints,
default constraints, not null constraints, multi-column constraints, etc.
Domain constraints in DBMS are the most basic type of integrity constraint and are used to ensure that
the data in the database is valid.
For example, domain constraints in DBMS could specify that the values in a gender column must be either
male or female.
The domain of Marital Status has a set of possibilities: Married, Single, Divorced
1. Check Constraint: A check constraint in a Database Management System (DBMS) is a way to enforce
certain conditions on the values that are stored in a database. It is a rule or condition that is specified at the
time of table creation or alteration to restrict the values that can be inserted or updated in a column.
CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,
    Salary DECIMAL(10, 2) CHECK (Salary > 0),
    Department VARCHAR(50) CHECK (Department IN ('HR', 'IT', 'Finance'))
);
With this check constraint, any attempt to insert or update a record in the Employees table with a Salary less than
or equal to 0, or a Department other than HR, IT, or Finance, will result in a constraint violation error.
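The Employees table above can be exercised with Python's sqlite3 module; the `insert_employee` helper is hypothetical. The last call shows a detail worth knowing: a CHECK rejects a row only when its condition is false, and a comparison with NULL is unknown, so a NULL salary passes unless the column is also declared NOT NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,
    Salary DECIMAL(10, 2) CHECK (Salary > 0),
    Department VARCHAR(50) CHECK (Department IN ('HR', 'IT', 'Finance')))""")

def insert_employee(emp_id, salary, dept):
    """Return True if the row was accepted, False if a CHECK constraint rejected it."""
    try:
        conn.execute("INSERT INTO Employees VALUES (?, ?, ?)", (emp_id, salary, dept))
        return True
    except sqlite3.IntegrityError:
        return False

print(insert_employee(1, 5000, "IT"))     # True
print(insert_employee(2, 0, "HR"))        # False: 0 is not > 0
print(insert_employee(3, 4000, "Sales"))  # False: not an allowed department
print(insert_employee(4, None, "HR"))     # True: NULL > 0 is unknown, which CHECK lets through
```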
2. Not Null Constraint: The NOT NULL constraint in a Database Management System (DBMS) is used to
ensure that a column in a table cannot contain any NULL values. NULL is a special marker used in databases
to indicate that a data value does not exist in the database. The NOT NULL constraint ensures that a column
always has a value, and it cannot be left empty.
CREATE TABLE Employees (
    EmployeeID INT,
    FirstName VARCHAR(50) NOT NULL,
    LastName VARCHAR(50)
);
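A quick check of the NOT NULL rule with Python's sqlite3 module and the Employees table above (the sample names are hypothetical). Note that an empty string is a value, not NULL, so SQLite accepts it; some systems, notably Oracle, treat an empty string as NULL instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Employees (
    EmployeeID INT,
    FirstName VARCHAR(50) NOT NULL,
    LastName VARCHAR(50))""")

conn.execute("INSERT INTO Employees VALUES (1, 'John', NULL)")  # LastName may be NULL
conn.execute("INSERT INTO Employees VALUES (2, '', 'Roe')")     # empty string is not NULL in SQLite
try:
    conn.execute("INSERT INTO Employees VALUES (3, NULL, 'Doe')")  # FirstName may not be NULL
    null_first_name_accepted = True
except sqlite3.IntegrityError:                                     # NOT NULL constraint failed
    null_first_name_accepted = False
print(null_first_name_accepted)  # False
```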
Key constraints are rules that a DBMS uses to ensure data accuracy and consistency in a database. They
define which values in one or more columns of a table must be unique and how those values relate to the values in
other tables, making sure that the data remains correct.
1. Primary Key Constraint:
a) A primary key is a column or a set of columns that uniquely identifies each row in a table.
b) The primary key constraint ensures that the values in the specified columns are unique and not NULL.
2. Unique Constraint:
A unique constraint ensures that all values in a column or a set of columns are unique.
Example: Suppose we have a Products table where we want to ensure that each product has a unique product
code:
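A minimal sketch of such a Products table, run through Python's sqlite3 module (the column types and sample rows are assumptions). The second 'P001' is rejected. Rows with no code at all do not clash, because NULL is not equal to NULL; SQLite and PostgreSQL allow several NULLs in a UNIQUE column, while some systems, such as SQL Server, allow only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Products (
    ProductID INT PRIMARY KEY,
    ProductCode VARCHAR(20) UNIQUE,   -- no two products may share a code
    ProductName VARCHAR(100))""")

conn.execute("INSERT INTO Products VALUES (1, 'P001', 'Product A')")
try:
    conn.execute("INSERT INTO Products VALUES (2, 'P001', 'Product B')")  # duplicate code
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False

# Two rows without a code do not violate UNIQUE in SQLite.
conn.execute("INSERT INTO Products VALUES (3, NULL, 'Product C')")
conn.execute("INSERT INTO Products VALUES (4, NULL, 'Product D')")
row_count = conn.execute("SELECT COUNT(*) FROM Products").fetchall()[0][0]
print(duplicate_accepted, row_count)  # False 3
```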
3. Foreign Key Constraint:
A foreign key is a column or a set of columns in a table that refers to the primary key of another table.
The foreign key constraint ensures that values in the foreign key column(s) match values in the
referenced primary key column(s).
The foreign key constraint ensures referential integrity, meaning that relationships between tables are
maintained, and it helps prevent inconsistencies in the data. It’s a powerful tool for enforcing relationships
between tables in a relational database.
Example: Let’s consider a simple example to illustrate the use of a foreign key constraint in a relational
database. Suppose we have two tables: Customers and Orders.
CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    TotalAmount DECIMAL(10, 2),
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);
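The Orders table above needs a Customers table to refer to. This sketch assumes a minimal Customers table and runs both through Python's sqlite3 module, where foreign key checking must be switched on per connection. An order for a non-existent customer is rejected. An order with a NULL CustomerID is accepted, because foreign keys are not checked for NULL values; add NOT NULL if every order must have a customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.executescript("""
CREATE TABLE Customers (CustomerID INT PRIMARY KEY, FirstName VARCHAR(50), LastName VARCHAR(50));
CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    TotalAmount DECIMAL(10, 2),
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID));
INSERT INTO Customers VALUES (1, 'John', 'Doe');
""")

conn.execute("INSERT INTO Orders VALUES (101, 1, '2024-01-06', 150.00)")  # customer 1 exists
try:
    conn.execute("INSERT INTO Orders VALUES (102, 99, '2024-01-07', 80.00)")  # no customer 99
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False
conn.execute("INSERT INTO Orders VALUES (103, NULL, '2024-01-08', 10.00)")  # NULL is not checked
print(orphan_accepted)  # False
```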
Candidate Key
A candidate key is a column or a combination of columns that uniquely identifies each row in a table. It is used
to ensure that there are no duplicate or ambiguous records in a table. A candidate key is a minimal super key: it
contains no attributes that are unnecessary for uniquely identifying tuples.
You declare a column as a candidate key by using the keyword UNIQUE. Precede the UNIQUE keyword
with the NOT NULL specification.
CREATE TABLE employee (
    empno INTEGER NOT NULL PRIMARY KEY,
    ss_no INTEGER NOT NULL UNIQUE,
    ename CHAR(19),
    sal NUMERIC(10, 2),
    deptno INTEGER NOT NULL
);
Like a primary key, a candidate key also uniquely identifies a row in a table. Note that a table can have
only one primary key, but can have any number of candidate keys.
A column or a set of columns can be called a candidate key if it identifies each row of a table uniquely. ...
Alternate key: There can be more than one key that identifies each row of the table uniquely. One of
them is chosen as the primary key, and the rest are called alternate keys of the table.
The Entity Integrity Constraint is essentially a subset of the Key constraint in a database. While the Key
constraint ensures that Primary Key attributes are unique and non-null, the Entity Integrity Constraint
specifically emphasizes that no attribute of a Primary Key should contain null values. This constraint
highlights the perspective that allowing null values in Primary Key attributes could lead to multiple null entries,
violating the uniqueness requirement for each tuple in the Primary Key. Therefore, the Entity Integrity
Constraint reinforces the importance of non-null values within the Primary Key to maintain the uniqueness of
each record in a relational database.
Example: Let’s consider an example to illustrate entity constraints in a database. Assume we have a table named
Employees with the following structure:
CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,
    FirstName VARCHAR(50) NOT NULL,
    LastName VARCHAR(50) NOT NULL,
    Email VARCHAR(100) UNIQUE
);
Now, let’s insert some data to demonstrate the entity integrity constraints:
-- Valid data insertion
INSERT INTO Employees (EmployeeID, FirstName, LastName, Email)
VALUES (1, 'John', 'Doe', 'john.doe@example.com');
In this example, the entity integrity constraint ensures that the EmployeeID attribute, serving as the primary key,
cannot contain null values. Attempting to insert a null or duplicate EmployeeID violates entity integrity (and a null
FirstName is rejected by its NOT NULL constraint), so such inserts result in constraint violation errors.
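The same Employees table can be tested with Python's sqlite3 module; the `violates` helper and the extra rows are hypothetical. One SQLite-specific caveat: for historical reasons SQLite allows NULL in a primary key column that is not an INTEGER PRIMARY KEY unless NOT NULL is declared, so the sketch spells out NOT NULL, which standard SQL implies for every primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is explicit because SQLite would otherwise accept NULL in an INT PRIMARY KEY.
conn.execute("""CREATE TABLE Employees (
    EmployeeID INT NOT NULL PRIMARY KEY,
    FirstName VARCHAR(50) NOT NULL,
    LastName VARCHAR(50) NOT NULL,
    Email VARCHAR(100) UNIQUE)""")
conn.execute("INSERT INTO Employees VALUES (1, 'John', 'Doe', 'john.doe@example.com')")

def violates(values):
    """Return True if the INSERT is rejected by a constraint."""
    try:
        conn.execute("INSERT INTO Employees VALUES (?, ?, ?, ?)", values)
        return False
    except sqlite3.IntegrityError:
        return True

checks = {
    "null primary key": violates((None, "Jane", "Roe", "jane@example.com")),
    "duplicate primary key": violates((1, "Jim", "Poe", "jim@example.com")),
    "null FirstName": violates((2, None, "Moe", "moe@example.com")),
}
print(checks)  # every check reports True (rejected)
```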
Referential integrity in a database is a crucial concept ensuring data consistency among related tables
through primary and foreign keys.
The referential integrity constraint is established when a foreign key references the primary key of another table:
the values of the referencing attribute must be a subset of the values of the referenced attribute.
This ensures that records cannot be inserted in the referencing relation unless the referenced values exist in the
referenced relation. Furthermore, a record in the referenced relation that is still referenced cannot be deleted, nor
can its key be changed (unless a referential action such as ON DELETE CASCADE is defined), maintaining the
accuracy and coherence of the relational database.
Example: Let’s consider an example with two tables: Orders and Customers.
-- Customers table
CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    FirstName VARCHAR(50),
    LastName VARCHAR(50),
    Email VARCHAR(100) UNIQUE
    -- Other columns
);
-- Orders table
CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    TotalAmount DECIMAL(10, 2),
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);
Now, let’s perform some operations to demonstrate referential integrity:
-- Valid data insertion
INSERT INTO Customers (CustomerID, FirstName, LastName, Email)
VALUES (1, 'John', 'Doe', 'john.doe@example.com');
INSERT INTO Orders (OrderID, CustomerID, OrderDate, TotalAmount) VALUES (101, 1, '2024-01-06', 150.00);
In this example, the referential integrity constraint ensures that the relationship between the Orders and
Customers tables is maintained. It prevents inserting, updating, or deleting data in a way that would create
inconsistencies in the relationships between these tables.
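The operations that referential integrity forbids can be tried against these two tables with Python's sqlite3 module (the `attempt` helper and the order for customer 7 are hypothetical). With SQLite's default NO ACTION behaviour, deleting or re-keying a customer who still has orders is rejected, and so is an order for a customer who does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required in SQLite for foreign keys to be enforced
conn.executescript("""
CREATE TABLE Customers (CustomerID INT PRIMARY KEY, FirstName VARCHAR(50),
                        LastName VARCHAR(50), Email VARCHAR(100) UNIQUE);
CREATE TABLE Orders (OrderID INT PRIMARY KEY, CustomerID INT, OrderDate DATE,
                     TotalAmount DECIMAL(10, 2),
                     FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID));
INSERT INTO Customers VALUES (1, 'John', 'Doe', 'john.doe@example.com');
INSERT INTO Orders VALUES (101, 1, '2024-01-06', 150.00);
""")

def attempt(sql):
    """Run one statement; report whether referential integrity let it through."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

outcomes = {
    "delete a referenced customer": attempt("DELETE FROM Customers WHERE CustomerID = 1"),
    "change a referenced key": attempt("UPDATE Customers SET CustomerID = 2 WHERE CustomerID = 1"),
    "order for a missing customer": attempt("INSERT INTO Orders VALUES (102, 7, '2024-01-07', 20.00)"),
}
print(outcomes)  # every operation is rejected
```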
In a database, a Tuple Uniqueness Constraint ensures that no two tuples (rows) in a table have the same
combination of values in the specified columns. It is declared as a UNIQUE constraint over one or more columns
and helps maintain the uniqueness of data within a table.
Example: Let’s consider an example using a Products table where we want to enforce a Tuple
Uniqueness Constraint on the combination of ProductCode and Category:
The UNIQUE (ProductCode, Category) clause in the table definition specifies a Tuple Uniqueness Constraint, indicating
that the combination of ProductCode and Category must be unique for each tuple in the Products table.
Now, let’s insert some data to demonstrate the Tuple Uniqueness Constraint:
INSERT INTO Products (ProductID, ProductCode, ProductName, Category, Price)
VALUES (2, 'P002', 'Product B', 'Clothing', 29.99);
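A sketch of the Products table with the composite constraint, run through Python's sqlite3 module. The column types and every row except the 'P002'/'Clothing' one from the text are assumptions. Repeating a code in a different category is allowed; repeating the same (code, category) pair is not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Products (
    ProductID INT PRIMARY KEY,
    ProductCode VARCHAR(20),
    ProductName VARCHAR(100),
    Category VARCHAR(50),
    Price DECIMAL(10, 2),
    UNIQUE (ProductCode, Category))""")  # the pair must be unique, not each column alone

rows = [
    (1, "P001", "Product A", "Electronics", 499.00),
    (2, "P002", "Product B", "Clothing", 29.99),
    (3, "P001", "Product C", "Clothing", 15.00),  # same code, different category: allowed
    (4, "P002", "Product D", "Clothing", 9.99),   # same (code, category) as row 2: rejected
]
accepted = []
for row in rows:
    try:
        conn.execute("INSERT INTO Products VALUES (?, ?, ?, ?, ?)", row)
        accepted.append(row[0])
    except sqlite3.IntegrityError:
        pass
print(accepted)  # [1, 2, 3]
```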
Constraints play a pivotal role in Database Management Systems (DBMS), serving as vital elements to
uphold the integrity, consistency, and reliability of stored data. These constraints establish rules and
criteria that data must meet, safeguarding against the inclusion of erroneous or inconsistent values.
Super keys: A super key is a set of one or more attributes that can uniquely identify a tuple. For example, STUD_NO,
(STUD_NO, STUD_NAME), etc. Attributes that a super key adds on top of a candidate key may contain NULL
values.
Adding zero or more attributes to a candidate key generates a super key.
Every candidate key is a super key, but not every super key is a candidate key.
Role of a Super Key
In a relational database, a super key's function is to distinctly identify tuples (rows) inside a relation (table).
Essentially, a super key's job consists of the following:
1. Uniqueness: A super key must ensure that each combination of values it contains appears in at most one
tuple of the relation. This distinguishing feature is required for correct data retrieval and processing.
2. Selection: A super key can be used to locate specific tuples in a relation. It lets users and the database
management system find particular records using the values of the attributes that make up the super
key.
3. Data integrity: Because they make every tuple reliably identifiable, super keys are essential for preserving
data consistency. This helps prevent inconsistent data and duplicate entries in the database.
4. Referential integrity: Keys are frequently used to link tables. A key of one table (typically its primary key)
can be referenced by a foreign key in another table to keep references between linked tables valid.
5. Indexing: Indexes, which can greatly speed up data retrieval, are frequently built on key attributes. An index
lets the database system locate specific tuples efficiently.
order_id customer_id product_id [fourth column, name not preserved]
1 101 1 2
2 102 2 1
3 101 3 3
4 103 1 1
5 102 4 2
In this case, the three attributes order_id, customer_id, and product_id together constitute a super key, because
the combination of order ID, customer ID, and product ID uniquely identifies every order. (Since order_id alone is
already unique, it is a candidate key, and this combination is a super key that contains it.)
As this example shows, which sets of attributes form super keys depends on what uniquely identifies the data being
stored. Understanding super keys is essential for designing and managing dependable, efficient databases.
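The definitions can be checked mechanically against the order data above (only its first three columns are used). This pure-Python sketch lists every attribute set that is unique in this instance (super keys) and keeps the minimal ones (candidate keys). It also exposes a caveat: uniqueness in one instance only suggests a key. Here {customer_id, product_id} happens to be unique, although a customer could order the same product twice, so keys are a design decision about the schema, not a property of whatever rows are currently stored.

```python
from itertools import combinations

COLUMNS = ("order_id", "customer_id", "product_id")
ROWS = [(1, 101, 1), (2, 102, 2), (3, 101, 3), (4, 103, 1), (5, 102, 4)]

def is_unique(cols):
    """True if no two rows share the same values on the given columns."""
    idx = [COLUMNS.index(c) for c in cols]
    projected = [tuple(row[i] for i in idx) for row in ROWS]
    return len(set(projected)) == len(projected)

def super_keys():
    """Every non-empty attribute set that uniquely identifies the rows of this instance."""
    return [set(cols) for r in range(1, len(COLUMNS) + 1)
            for cols in combinations(COLUMNS, r) if is_unique(cols)]

def candidate_keys():
    """Super keys with no proper subset that is also a super key (minimal super keys)."""
    supers = super_keys()
    return [k for k in supers if not any(other < k for other in supers)]

print(len(super_keys()))   # 5
print(candidate_keys())    # {'order_id'} and {'customer_id', 'product_id'}
```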
Relational algebra is a procedural query language that takes relation instances as input and returns
relation instances as output. It performs queries with the help of operators, which can be unary (taking one
relation) or binary (taking two relations). Because every operator takes relations as input and produces a relation
as output, operators can be applied recursively (composed), and intermediate results are also relations.
Five basic operations in relational algebra: Selection, Projection, Cartesian product, Union, and Set Difference.
Selection:
• σ_predicate(R) – Works on a single relation R and defines a relation that contains only those tuples (rows) of
R that satisfy the specified condition (predicate).
Projection
• π_a1, ..., an(R)
– Works on a single relation R and defines a relation that contains a vertical subset of R, extracting the values
of specified attributes and eliminating duplicates.
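Selection and projection can be mimicked in plain Python if a relation is modeled as a tuple of attribute names plus a frozenset of tuples; the set is what guarantees no duplicate tuples. The `Relation`, `select`, and `project` names and the employee data are hypothetical, chosen only to illustrate σ and π.

```python
from typing import NamedTuple

class Relation(NamedTuple):
    attrs: tuple      # attribute names, e.g. ("id", "name", "dept")
    rows: frozenset   # a set of tuples, so duplicates cannot exist

def select(rel, predicate):
    """σ_predicate(R): keep the tuples for which predicate(row_as_dict) is true."""
    return Relation(rel.attrs, frozenset(
        row for row in rel.rows if predicate(dict(zip(rel.attrs, row)))))

def project(rel, *attrs):
    """π_attrs(R): keep only the named attributes; the set drops duplicate tuples."""
    idx = [rel.attrs.index(a) for a in attrs]
    return Relation(tuple(attrs), frozenset(tuple(row[i] for i in idx) for row in rel.rows))

employee = Relation(("id", "name", "dept"), frozenset({
    (1, "Asha", "IT"), (2, "Ravi", "HR"), (3, "Meera", "IT")}))

it_staff = select(employee, lambda r: r["dept"] == "IT")
depts = project(employee, "dept")
print(sorted(it_staff.rows))  # [(1, 'Asha', 'IT'), (3, 'Meera', 'IT')]
print(sorted(depts.rows))     # [('HR',), ('IT',)]: three rows projected, one duplicate removed
```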
UNION (∪)
– Union of two relations R and S defines a relation that contains all the tuples of R, or S, or both R and S,
duplicate tuples being eliminated.
– R and S must be union-compatible.
• If R and S have I and J tuples, respectively, union is obtained by concatenating them into one relation with a
maximum of (I + J) tuples.
Set Difference
• R–S
– Defines a relation consisting of the tuples that are in relation R, but not in S.
– R and S must be union-compatible.
Intersection (∩)
– Defines a relation consisting of the set of all tuples that are in both R and S.
– R and S must be union-compatible.
• Expressed using basic operations: R ∩ S = R – (R – S)
Cartesian Product
A Cartesian product is the result of combining every row in one table with every row in another table. In SQL, this
occurs when two tables are listed in the FROM clause without a join condition (no WHERE clause to restrict rows).
In mathematics, the Cartesian Product of sets A and B is defined as the set of all ordered pairs (x, y) such that x
belongs to A and y belongs to B.
For example, if A = {1, 2} and B = {3, 4, 5},
then the Cartesian Product of A and B is {(1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5)}.
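Cartesian Product in DBMS is an operation that combines every tuple of one relation with every tuple of another. The
result contains the columns of both relations, and its number of tuples is the product of their tuple counts.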
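The set operations and the Cartesian product can be sketched in plain Python with the same frozenset-of-tuples model (the `Relation` type and helper names are hypothetical). Intersection is derived from difference, exactly as in R ∩ S = R – (R – S). The union-compatibility check here only compares degrees; real union compatibility also requires compatible domains.

```python
from itertools import product
from typing import NamedTuple

class Relation(NamedTuple):
    attrs: tuple      # attribute names
    rows: frozenset   # set of tuples (no duplicates)

def _require_union_compatible(r, s):
    # This sketch only checks that the degrees match; domains are not checked.
    if len(r.attrs) != len(s.attrs):
        raise ValueError("relations are not union-compatible")

def union(r, s):
    _require_union_compatible(r, s)
    return Relation(r.attrs, r.rows | s.rows)   # set union drops duplicate tuples

def difference(r, s):
    _require_union_compatible(r, s)
    return Relation(r.attrs, r.rows - s.rows)

def intersection(r, s):
    return difference(r, difference(r, s))      # R ∩ S = R – (R – S)

def cartesian_product(r, s):
    # every tuple of R concatenated with every tuple of S; |R × S| = |R| · |S|
    return Relation(r.attrs + s.attrs, frozenset(a + b for a, b in product(r.rows, s.rows)))

A = Relation(("x",), frozenset({(1,), (2,)}))
B = Relation(("y",), frozenset({(3,), (4,), (5,)}))
T = Relation(("x",), frozenset({(2,), (7,)}))

print(sorted(union(A, T).rows))             # [(1,), (2,), (7,)]
print(sorted(difference(A, T).rows))        # [(1,)]
print(sorted(intersection(A, T).rows))      # [(2,)]
print(sorted(cartesian_product(A, B).rows)) # [(1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5)]
```

The last line reproduces the A × B example from the text.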
JOIN
A database is a collection of different tables storing different types of information. The JOIN clause is used when
retrieving data from related tables in a database. The SQL JOIN clause is more complex than a simple query
that retrieves data from a single table because it retrieves data from multiple tables.
A join operation combines rows of two or more tables based on related columns between them. The main
purpose of a join is to retrieve data from multiple tables; in other words, a join is used to perform a multiple-table
query.
You can choose among four types of SQL JOINs depending upon the results you desire; Inner JOIN, Left Outer
JOIN, Right Outer JOIN, and Full Outer JOIN.
A CROSS JOIN returns the Cartesian product of rows from the joined tables: a table whose records combine
each row from the first table with each row of the second table.
For example, if you had a table with a column called "user id" and each user id was unique to a user, you could
join that table to another table with a "user id" column to find the information associated with each user.
This example shows how to use an Inner JOIN clause to join two tables:
INNER JOIN returns records from two tables (or more, using multiple JOINs) that have matching
values in both tables. The column(s) used to join the tables are the ones that contain the matching
values. The result set of an INNER JOIN includes only the records for which the join condition is met.
[Figure not reproduced: Table 1, Table 2, and the INNER JOIN result set.]
Notice how there are more rows in the ‘Address’ table than there are in the ‘Customers’ table. The result
set renders 3 records due to the way the tables JOIN by the ‘address_id’ column. Since there are only 3
records from the ‘Customers’ table with an ‘address_id’ that matches an ‘address_id’ from the
‘Address’ table, there are only 3 rows that are output in the result set.
INNER JOIN may also be referred to as a JOIN in SQL syntax. The meaning is the same and including
‘INNER’ in the syntax is optional and will render the same result(s).
Natural JOIN
Natural Join is a type of Inner Join based on the columns that have the same name and data type
in both of the tables being joined.
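INNER JOIN and NATURAL JOIN can be compared on Customers and Address tables linked by address_id, as in the text. The rows below are hypothetical, and the queries run through Python's sqlite3 module. The natural join matches on every column name the tables share (here only address_id), so it returns the same rows as the explicit INNER JOIN. It can silently change meaning if a same-named column is added to either table later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, name TEXT, address_id INTEGER);
CREATE TABLE Address (address_id INTEGER PRIMARY KEY, city TEXT);
INSERT INTO Customers VALUES (1, 'Asha', 10), (2, 'Ravi', 11), (3, 'Meera', 12), (4, 'Kiran', NULL);
INSERT INTO Address VALUES (10, 'Pune'), (11, 'Delhi'), (12, 'Chennai'), (13, 'Kochi'), (14, 'Agra');
""")

# INNER JOIN: only pairs whose address_id values match; Kiran (NULL) and addresses 13, 14 drop out.
inner = conn.execute("""
    SELECT c.name, a.city
    FROM Customers c INNER JOIN Address a ON a.address_id = c.address_id
    ORDER BY c.customer_id""").fetchall()

# NATURAL JOIN: joins on every shared column name (here only address_id).
natural = conn.execute("""
    SELECT name, city
    FROM Customers NATURAL JOIN Address
    ORDER BY customer_id""").fetchall()

print(inner)  # [('Asha', 'Pune'), ('Ravi', 'Delhi'), ('Meera', 'Chennai')]
```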
An outer join returns both matched and unmatched rows. Outer joins are further divided into:
Left Outer
Left JOINs return all rows from the first table and only the rows in the second table that match. This example
shows how to use a Left Outer JOIN clause to join two tables:
LEFT JOIN (or LEFT OUTER JOIN) renders ALL records from the table on the left side (Table 1) and all matching
records of the table on the right side (Table 2). In a LEFT JOIN, records that do not have corresponding data from
the right table will show ‘NULL’ values from columns/fields selected from the right table.
[Figure not reproduced: Table 1, Table 2, and the LEFT JOIN result set.]
Notice how the bottom 2 rows render ‘NULL’ results from the ‘Customers’ table on the bottom right of the result set.
The result set outputs this way because of the positioning of the tables in the query using the LEFT JOIN. This will
list ALL records from the ‘Address’ table, whether they have customer information tied to the address or not. For the
two records that do not, the information from the ‘Customers’ table outputs ‘NULL’ values.
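The same behaviour can be reproduced with Python's sqlite3 module, using hypothetical Customers and Address rows linked by address_id. With Address on the left, every address appears, and the two addresses nobody lives at get NULL (None in Python) for the customer name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, name TEXT, address_id INTEGER);
CREATE TABLE Address (address_id INTEGER PRIMARY KEY, city TEXT);
INSERT INTO Customers VALUES (1, 'Asha', 10), (2, 'Ravi', 11), (3, 'Meera', 12);
INSERT INTO Address VALUES (10, 'Pune'), (11, 'Delhi'), (12, 'Chennai'), (13, 'Kochi'), (14, 'Agra');
""")

# Address is the left table, so all five addresses are kept.
rows = conn.execute("""
    SELECT a.city, c.name
    FROM Address a LEFT JOIN Customers c ON c.address_id = a.address_id
    ORDER BY a.address_id""").fetchall()
print(rows)
# [('Pune', 'Asha'), ('Delhi', 'Ravi'), ('Chennai', 'Meera'), ('Kochi', None), ('Agra', None)]
```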
Right Outer
Right JOINs are the mirror image of Left JOINs: they return all rows from the second table and only the matching
rows from the first table. This example shows how to use a Right Outer JOIN clause to join two tables:
RIGHT JOIN (or RIGHT OUTER JOIN) returns ALL records from the table on the right side (Table 2) and the
matching records from the table on the left side (Table 1). In a RIGHT JOIN, records that have no
corresponding data in the left table show ‘NULL’ values in the columns/fields selected from the left
table.
Table 1
Table 2
RIGHT JOIN Query:
Notice that the result set contains only three rows. This is because the RIGHT JOIN returns the rows of the
‘Customers’ table, and only three customer records from that table apply.
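A RIGHT JOIN can always be rewritten as a LEFT JOIN with the two tables swapped. This helps on engines such as older SQLite versions (before 3.39.0), which do not accept RIGHT JOIN. The sketch below uses Python's sqlite3 module on the same kind of hypothetical data. It runs the LEFT JOIN form and, only when the linked SQLite library is new enough, checks that the native RIGHT JOIN returns the same three rows.

```python
import sqlite3

# Hypothetical sample data: five addresses, three customers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Address (address_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, name TEXT, address_id INTEGER);
    INSERT INTO Address VALUES (1, 'Austin'), (2, 'Boston'), (3, 'Chicago'), (4, 'Denver'), (5, 'Erie');
    INSERT INTO Customers VALUES (10, 'Ann', 1), (11, 'Bob', 2), (12, 'Cid', 3);
""")

# "Address RIGHT JOIN Customers" == "Customers LEFT JOIN Address":
# every customer is kept, addresses appear only where they match.
right_rows = conn.execute("""
    SELECT a.address_id, a.city, c.name
    FROM Customers AS c
    LEFT JOIN Address AS a ON a.address_id = c.address_id
    ORDER BY c.customer_id
""").fetchall()

# Native RIGHT JOIN is only available in SQLite 3.39.0 and later.
if sqlite3.sqlite_version_info >= (3, 39, 0):
    native_rows = conn.execute("""
        SELECT a.address_id, a.city, c.name
        FROM Address AS a
        RIGHT JOIN Customers AS c ON a.address_id = c.address_id
        ORDER BY c.customer_id
    """).fetchall()
    assert native_rows == right_rows

print(right_rows)  # [(1, 'Austin', 'Ann'), (2, 'Boston', 'Bob'), (3, 'Chicago', 'Cid')]
```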
Full Outer
Full JOINs combine the effect of left and right joins: they return all rows from both tables, pairing the rows
that match and keeping the rows that do not. This example shows how to use a Full Outer JOIN clause to join two tables:
A FULL OUTER JOIN (or FULL JOIN) returns ALL records from both tables (Table 1 and Table 2). Rows that match
are combined into one row, and rows with no matching data in the other table are still returned. Where a
row has no match in the other table, the result set shows a “null” value in that table's fields.
TABLE 1
TABLE 2
RESULT SET:
Notice how the result set outputs 'null' results for Kelly Radio's record. Rows in the "employees" table that
have no match in the "employee_address" table, or vice versa, are still listed. For example, Kelly Radio may
have just been hired as a new employee and have a row in the employees table, but not yet have submitted his
address information. Kelly's address id therefore has no information stored in the employee_address table for
the query to return. However, the record is not left out, even though it lacks a full match.
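A FULL OUTER JOIN can be emulated as a LEFT JOIN plus the unmatched rows from the other side. The unmatched rows come from a second LEFT JOIN in the reverse direction, filtered to rows with no partner, and the two parts are combined with UNION ALL. This is useful on engines without native FULL JOIN support, such as SQLite before 3.39.0. The sketch below uses Python's sqlite3 module with hypothetical rows. Kelly Radio has an address_id that does not exist in employee_address, and address 102 has no employee.

```python
import sqlite3

# Hypothetical rows: Kelly Radio's address is missing, and address 102 has no employee.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT, address_id INTEGER);
    CREATE TABLE employee_address (address_id INTEGER PRIMARY KEY, city TEXT);
    INSERT INTO employees VALUES (1, 'Ann Lee', 100), (2, 'Kelly Radio', 101);
    INSERT INTO employee_address VALUES (100, 'Austin'), (102, 'Boston');
""")

# Part 1: every employee, with address columns NULL when there is no match.
# Part 2: addresses that no employee points to (the "vice versa" rows).
full_rows = conn.execute("""
    SELECT e.name, a.address_id, a.city
    FROM employees AS e
    LEFT JOIN employee_address AS a ON e.address_id = a.address_id
    UNION ALL
    SELECT e.name, a.address_id, a.city
    FROM employee_address AS a
    LEFT JOIN employees AS e ON e.address_id = a.address_id
    WHERE e.emp_id IS NULL
""").fetchall()

# On SQLite 3.39.0+ the native FULL OUTER JOIN should give the same set of rows.
if sqlite3.sqlite_version_info >= (3, 39, 0):
    native_rows = conn.execute("""
        SELECT e.name, a.address_id, a.city
        FROM employees AS e
        FULL OUTER JOIN employee_address AS a ON e.address_id = a.address_id
    """).fetchall()
    assert set(native_rows) == set(full_rows)

for row in full_rows:
    print(row)
```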
Aggregate Function
From a business point of view, different organizational levels have different information requirements. Top-level
managers are usually interested in the overall figures rather than the individual details. Aggregate
functions make it easy to produce summarized data from the database.
An aggregate function in SQL performs a calculation on multiple values and returns a single value. SQL
provides many aggregate functions, including AVG, COUNT, SUM, MIN and MAX. Aggregate functions ignore
NULL values when performing the calculation, except for COUNT(*), which counts every row.
Aggregate functions are often used with the GROUP BY and HAVING clauses of the SELECT statement. The five most
commonly used SQL aggregate functions are:
i. Count()
ii. Sum()
iii. Avg()
iv. Min()
v. Max()
COUNT FUNCTION
The COUNT function is used to count the number of rows in a database table. It works on both numeric and
non-numeric data types.
COUNT(*) returns the number of all the rows in the specified table, including duplicate rows and rows that
contain NULLs. COUNT(expression) counts only the rows where the expression is not NULL.
Syntax: COUNT(*)
or
COUNT( [ALL|DISTINCT] expression )
SELECT COUNT(*) FROM PRODUCT_MAST;
SELECT COUNT(*) FROM PRODUCT_MAST WHERE RATE>=20;
SELECT COUNT(DISTINCT COMPANY) FROM PRODUCT_MAST;
SELECT COMPANY, COUNT(*) FROM PRODUCT_MAST GROUP BY COMPANY;
SELECT COMPANY, COUNT(*) FROM PRODUCT_MAST GROUP BY COMPANY HAVING COUNT(*)>2;
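The COUNT queries above can be run against a small hypothetical PRODUCT_MAST table using Python's sqlite3 module. The column names (PRODUCT, COMPANY, QTY, RATE, COST) are assumptions inferred from the queries, and the rows are invented. One row has a NULL RATE to show the difference between COUNT(*) and COUNT(column).

```python
import sqlite3

# Hypothetical PRODUCT_MAST rows; Item6 has a NULL RATE on purpose.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PRODUCT_MAST (PRODUCT TEXT, COMPANY TEXT, QTY INTEGER, RATE INTEGER, COST INTEGER);
    INSERT INTO PRODUCT_MAST VALUES
        ('Item1', 'Com1', 2, 10, 20),
        ('Item2', 'Com2', 3, 25, 75),
        ('Item3', 'Com1', 2, 30, 60),
        ('Item4', 'Com3', 5, 10, 50),
        ('Item5', 'Com2', 2, 20, 40),
        ('Item6', 'Com1', 3, NULL, 75);
""")

def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]

count_all = scalar("SELECT COUNT(*) FROM PRODUCT_MAST")                  # every row, NULLs included
count_rate = scalar("SELECT COUNT(RATE) FROM PRODUCT_MAST")              # skips the NULL RATE
count_rate_ge_20 = scalar("SELECT COUNT(*) FROM PRODUCT_MAST WHERE RATE >= 20")
count_companies = scalar("SELECT COUNT(DISTINCT COMPANY) FROM PRODUCT_MAST")

per_company = conn.execute(
    "SELECT COMPANY, COUNT(*) FROM PRODUCT_MAST GROUP BY COMPANY ORDER BY COMPANY"
).fetchall()
busy_companies = conn.execute(
    "SELECT COMPANY, COUNT(*) FROM PRODUCT_MAST GROUP BY COMPANY HAVING COUNT(*) > 2"
).fetchall()

print(count_all, count_rate, count_rate_ge_20, count_companies)  # 6 5 3 3
print(per_company)     # [('Com1', 3), ('Com2', 2), ('Com3', 1)]
print(busy_companies)  # [('Com1', 3)]
```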
SUM FUNCTION
The SUM function is used to calculate the sum of all the values in the selected column. It works on numeric fields only.
Syntax: SUM()
or
SUM( [ALL|DISTINCT] expression )
SELECT SUM(COST) FROM PRODUCT_MAST;
SELECT SUM(COST) FROM PRODUCT_MAST WHERE QTY>3;
SELECT COMPANY, SUM(COST) FROM PRODUCT_MAST WHERE QTY>3 GROUP BY COMPANY;
AVG FUNCTION
The AVG function is used to calculate the average value of a numeric column. It returns the average
of all non-NULL values.
Syntax: AVG()
or
AVG( [ALL|DISTINCT] expression )
MIN FUNCTION
The MIN function is used to find the minimum value of a column. It determines the smallest
value among all the selected values of the column.
Syntax: MIN()
or
MIN( [ALL|DISTINCT] expression )
SELECT MIN(RATE) FROM PRODUCT_MAST;
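SUM, AVG, MIN and MAX can be compared side by side on the same kind of hypothetical PRODUCT_MAST table, again using Python's sqlite3 module. The column names and rows are assumptions. The NULL RATE in Item6 shows that AVG divides by the number of non-NULL values (5 here), not by the total row count.

```python
import sqlite3

# Hypothetical PRODUCT_MAST rows; Item6 has a NULL RATE on purpose.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PRODUCT_MAST (PRODUCT TEXT, COMPANY TEXT, QTY INTEGER, RATE INTEGER, COST INTEGER);
    INSERT INTO PRODUCT_MAST VALUES
        ('Item1', 'Com1', 2, 10, 20),
        ('Item2', 'Com2', 3, 25, 75),
        ('Item3', 'Com1', 2, 30, 60),
        ('Item4', 'Com3', 5, 10, 50),
        ('Item5', 'Com2', 2, 20, 40),
        ('Item6', 'Com1', 3, NULL, 75);
""")

def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]

total_cost = scalar("SELECT SUM(COST) FROM PRODUCT_MAST")                   # 320
cost_qty_gt_3 = scalar("SELECT SUM(COST) FROM PRODUCT_MAST WHERE QTY > 3")  # only Item4 qualifies
cost_by_company = conn.execute(
    "SELECT COMPANY, SUM(COST) FROM PRODUCT_MAST GROUP BY COMPANY ORDER BY COMPANY"
).fetchall()

avg_rate = scalar("SELECT AVG(RATE) FROM PRODUCT_MAST")  # (10+25+30+10+20) / 5, NULL skipped
min_rate = scalar("SELECT MIN(RATE) FROM PRODUCT_MAST")
max_rate = scalar("SELECT MAX(RATE) FROM PRODUCT_MAST")

print(total_cost, cost_qty_gt_3)       # 320 50
print(cost_by_company)                 # [('Com1', 155), ('Com2', 115), ('Com3', 50)]
print(avg_rate, min_rate, max_rate)    # 19.0 10 30
```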
Built In Function
A function is an operation denoted by a function name followed by zero or more input values enclosed in
parentheses. It represents a relationship between a set of input values and a set of result values. The input values
to a function are called arguments.
String Functions
Function Description
ASCII Returns the ASCII code value for the leftmost character of a character expression.
CHAR Returns a character for an ASCII value.
CHARINDEX Searches for one character expression within another character expression and returns the starting
position of the first expression.
CONCAT Concatenates two or more string values in an end to end manner and returns a single string.
LEFT Returns a given number of characters from a character string, starting from the left.
LEN Returns the number of characters in a character string (in SQL Server, trailing spaces are not counted).
LOWER Converts a string to lower case.
LTRIM Removes all the leading blanks from a character string.
NCHAR Returns the Unicode character with the specified integer code, as defined by the Unicode standard.
PATINDEX Returns the starting position of the first occurrence of the pattern in a given string.
REPLACE Replaces all occurrences of a specified string with another string value.
RIGHT Returns the right part of a string with the specified number of characters.
RTRIM Returns a string after truncating all trailing spaces.
SPACE Returns a string of repeated spaces.
STR Returns character data converted from numeric data. The character data is right justified, with a specified
length and decimal precision.
STUFF Inserts a string into another string. It deletes a specified length of characters from the first string at the
start position and then inserts the second string into the first string at the start position.
SUBSTRING Returns part of a character, binary, text, or image expression.
UPPER Converts a string to upper case.
DateTime Functions
Function Description
CURRENT_TIMESTAMP Returns the current system date and time of the computer on which the SQL Server
instance is installed. The time zone is not included.
DATEADD Returns a new datetime value by adding an interval to the specified datepart of the specified date.
DATEDIFF Returns the difference in datepart between two given dates.
DATENAME Returns a datepart as a character string.
DATEPART Returns a datepart as an integer.
DAY Returns the Day as an integer representing the Day part of a specified date.
GETDATE Returns a datetime value containing the date and time of the computer on which the SQL Server
instance is installed. It does not include the time zone.
GETUTCDATE Returns a datetime value in UTC format (Coordinated Universal Time), containing the
date and time of the computer on which the SQL Server instance is installed.
MONTH Returns the Month as an integer representing the Month part of a specified date.
YEAR Returns the Year as an integer representing the Year part of a specified date.
ISDATE Determines whether the input is a valid date, time or datetime value.
Numeric Functions
Function Description
ABS Returns the absolute value of a number.
AVG Returns the average value of an expression/column values.
CEILING Returns the smallest integer value that is greater than or equal to the specified numeric value.
COUNT Returns the number of records in the SELECT query.
FLOOR Returns the largest integer value that is less than or equal to a number. The return value is of the
same data type as the input parameter.
MAX Returns the maximum value in an expression.
MIN Returns the minimum value in an expression.
RAND Returns a random floating point value using an optional seed value.
ROUND Returns a numeric expression rounded to a specified number of places right of the decimal point.
SIGN Returns the sign of the input numeric expression: -1 if it is negative, 0 if it is zero, and 1 if it is positive.
SUM Returns the sum of all the values, or only the distinct values, in the expression. NULL values are ignored.
Set Operations
There are four fundamental set operators used in SQL:
UNION – combines two or more result sets and removes duplicate rows.
UNION ALL – combines two or more result sets, including duplicate rows.
INTERSECT – includes ONLY the rows present in all of the result sets.
EXCEPT (MINUS on Oracle) – includes ONLY the rows from the first result set that are NOT included in the second
result set.
Set operations in SQL allow the results of multiple queries to be combined into a
single result set.
UNION is used to combine the results of two or more SELECT statements, and it eliminates duplicate
rows from its result set. For UNION, the SELECT statements must return the same number of columns, with
compatible data types, in the same order.
Select * from First UNION Select * from Second
UNION ALL: This operator combines all the records from both queries.
Duplicate rows are not eliminated from the results of a UNION ALL operation.
SELECT * FROM t_employees UNION ALL SELECT * FROM t2_employees;
Here, in a single query, we have written two SELECT queries. The first SELECT query will fetch the records
from the t_employees table and perform UNION ALL operation with the records fetched by the second
SELECT query from the t2_employees table.
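The difference between UNION and UNION ALL shows up as soon as the two queries share rows. This is a minimal sketch using Python's sqlite3 module. The (id, name) columns and the rows of t_employees and t2_employees are hypothetical. Two employees appear in both tables, so UNION returns 4 rows while UNION ALL returns all 6.

```python
import sqlite3

# Hypothetical contents: ids 2 and 3 appear in both tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_employees (id INTEGER, name TEXT);
    CREATE TABLE t2_employees (id INTEGER, name TEXT);
    INSERT INTO t_employees VALUES (1, 'Asha'), (2, 'Bala'), (3, 'Chen');
    INSERT INTO t2_employees VALUES (2, 'Bala'), (3, 'Chen'), (4, 'Dev');
""")

# UNION removes the duplicate rows; the ORDER BY applies to the combined result.
union_rows = conn.execute(
    "SELECT * FROM t_employees UNION SELECT * FROM t2_employees ORDER BY id"
).fetchall()

# UNION ALL keeps every row from both queries.
union_all_rows = conn.execute(
    "SELECT * FROM t_employees UNION ALL SELECT * FROM t2_employees ORDER BY id"
).fetchall()

print(len(union_rows), len(union_all_rows))  # 4 6
```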
These SELECT statements use the UNION operator to combine the names of ‘Founders’ and ‘Employees’ into a
single result set:
The INTERSECT operation is used to combine two SELECT statements, but it returns only the records that are
common to both SELECT statements. For INTERSECT, the number of columns and their data types must be the same.
Select * from First INTERSECT Select * from Second
SELECT * FROM t_employees INTERSECT SELECT * FROM t2_employees;
Here, in a single query, we have written two SELECT queries. The first SELECT query will fetch the records
from the t_employees table and perform INTERSECT operation with the records fetched by the second
SELECT query from the t2_employees table.
The MINUS operation displays the rows that are present in the first query's result but absent from the second
query's result, with no duplicates.
It combines the results of two SELECT statements and returns only those rows that belong to the first
result set.
Select * from First MINUS Select * from Second
SELECT * FROM t_employees MINUS SELECT * FROM t2_employees;
Here, in a single query, we have written two SELECT queries. The first SELECT query will fetch the records
from the t_employees table and perform MINUS operation with the records fetched by the second SELECT
query from the t2_employees table.
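INTERSECT and MINUS can be tried with Python's sqlite3 module. Note that SQLite, like the SQL standard, spells MINUS as EXCEPT; Oracle is the system that uses MINUS. The table contents are hypothetical: ids 2 and 3 appear in both tables and id 1 appears only in t_employees.

```python
import sqlite3

# Hypothetical contents: ids 2 and 3 are common to both tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_employees (id INTEGER, name TEXT);
    CREATE TABLE t2_employees (id INTEGER, name TEXT);
    INSERT INTO t_employees VALUES (1, 'Asha'), (2, 'Bala'), (3, 'Chen');
    INSERT INTO t2_employees VALUES (2, 'Bala'), (3, 'Chen'), (4, 'Dev');
""")

# Rows present in both result sets.
common_rows = conn.execute(
    "SELECT * FROM t_employees INTERSECT SELECT * FROM t2_employees ORDER BY id"
).fetchall()

# Rows in the first result set but not the second (MINUS in Oracle, EXCEPT here).
only_first = conn.execute(
    "SELECT * FROM t_employees EXCEPT SELECT * FROM t2_employees ORDER BY id"
).fetchall()

print(common_rows)  # [(2, 'Bala'), (3, 'Chen')]
print(only_first)   # [(1, 'Asha')]
```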
SubQueries
In SQL, a subquery can be simply defined as a query within another query. A subquery is most often embedded in
the WHERE clause of another SQL query. Important rules for subqueries:
You can place a subquery in a number of SQL clauses: the WHERE clause, the HAVING clause and the FROM clause.
For example, to display the name, location and phone number of the students in section A:
Select NAME, LOCATION, PHONE_NUMBER from DATABASE WHERE ROLL_NO IN
(SELECT ROLL_NO from STUDENT where SECTION='A');
Explanation: First, the subquery “SELECT ROLL_NO from STUDENT where SECTION='A'” executes and returns the
ROLL_NO values from the STUDENT table whose SECTION is 'A'. Then the outer query uses them and returns the NAME,
LOCATION and PHONE_NUMBER from the DATABASE table for the students whose ROLL_NO was returned
by the inner subquery. Output:
NAME ROLL_NO LOCATION PHONE_NUMBER
Ravi 104 Salem 8989898989
Raj 102 Coimbatore 8877665544
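The two-step evaluation explained above can be checked with Python's sqlite3 module. This sketch makes a few assumptions. The notes' DATABASE table is replaced by a hypothetical STUDENT_INFO table, because DATABASE is an SQL keyword in SQLite. The STUDENT rows and Ram's row are invented so that roll numbers 102 and 104 are in section A, matching the output shown.

```python
import sqlite3

# Hypothetical tables; STUDENT_INFO stands in for the notes' DATABASE table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STUDENT (ROLL_NO INTEGER, SECTION TEXT);
    CREATE TABLE STUDENT_INFO (NAME TEXT, ROLL_NO INTEGER, LOCATION TEXT, PHONE_NUMBER TEXT);
    INSERT INTO STUDENT VALUES (101, 'B'), (102, 'A'), (104, 'A');
    INSERT INTO STUDENT_INFO VALUES
        ('Ram', 101, 'Chennai', '9988773344'),
        ('Raj', 102, 'Coimbatore', '8877665544'),
        ('Ravi', 104, 'Salem', '8989898989');
""")

# Step 1 on its own: the inner query yields the roll numbers in section A.
inner = conn.execute(
    "SELECT ROLL_NO FROM STUDENT WHERE SECTION = 'A' ORDER BY ROLL_NO"
).fetchall()

# Step 2: the outer query keeps only rows whose ROLL_NO is IN that list.
rows = conn.execute("""
    SELECT NAME, LOCATION, PHONE_NUMBER FROM STUDENT_INFO
    WHERE ROLL_NO IN (SELECT ROLL_NO FROM STUDENT WHERE SECTION = 'A')
    ORDER BY ROLL_NO
""").fetchall()

print(inner)  # [(102,), (104,)]
print(rows)   # [('Raj', 'Coimbatore', '8877665544'), ('Ravi', 'Salem', '8989898989')]
```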
Insert Query Example:
Table1: Student1
NAME ROLL_NO LOCATION PHONE_NUMBER
Ram 101 chennai 9988773344
Raju 102 coimbatore 9090909090
Ravi 103 salem 8989898989
Table2: Student2
NAME ROLL_NO LOCATION PHONE_NUMBER
Raj 111 chennai 8787878787
Sai 112 mumbai 6565656565
Sri 113 coimbatore 7878787878
To insert all the rows of Student2 into the Student1 table:
INSERT INTO Student1 SELECT * FROM Student2;
Output:
NAME ROLL_NO LOCATION PHONE_NUMBER
Ram 101 chennai 9988773344
Raju 102 coimbatore 9090909090
Ravi 103 salem 8989898989
Raj 111 chennai 8787878787
Sai 112 mumbai 6565656565
Sri 113 coimbatore 7878787878
To delete the students from the Student2 table whose roll number also appears in the Student1 table with the
location chennai:
DELETE FROM Student2
WHERE ROLL_NO IN ( SELECT ROLL_NO
FROM Student1
WHERE LOCATION = 'chennai');
Output:
1 row deleted successfully.
Display Student2 table:
NAME ROLL_NO LOCATION PHONE_NUMBER
Sai 112 mumbai 6565656565
Sri 113 coimbatore 7878787878
To change the name to geeks for the students in the Student2 table whose location is the same as that of Raju or Ravi
in the Student1 table:
UPDATE Student2
SET NAME='geeks'
WHERE LOCATION IN ( SELECT LOCATION
FROM Student1
WHERE NAME IN ('Raju','Ravi'));
Output:
1 row updated successfully.
Display Student2 table:
NAME ROLL_NO LOCATION PHONE_NUMBER
Sai 112 mumbai 6565656565
geeks 113 coimbatore 7878787878
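The INSERT, DELETE and UPDATE examples above can be replayed end to end with Python's sqlite3 module. The table contents come from Student1 and Student2 as listed in the notes; only the column types are assumed. Each statement's cursor.rowcount reports how many rows it affected, which corresponds to the "1 row deleted/updated" messages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student1 (NAME TEXT, ROLL_NO INTEGER, LOCATION TEXT, PHONE_NUMBER TEXT);
    CREATE TABLE Student2 (NAME TEXT, ROLL_NO INTEGER, LOCATION TEXT, PHONE_NUMBER TEXT);
    INSERT INTO Student1 VALUES
        ('Ram', 101, 'chennai', '9988773344'),
        ('Raju', 102, 'coimbatore', '9090909090'),
        ('Ravi', 103, 'salem', '8989898989');
    INSERT INTO Student2 VALUES
        ('Raj', 111, 'chennai', '8787878787'),
        ('Sai', 112, 'mumbai', '6565656565'),
        ('Sri', 113, 'coimbatore', '7878787878');
""")

# INSERT ... SELECT copies every Student2 row into Student1.
conn.execute("INSERT INTO Student1 SELECT * FROM Student2")

# Roll numbers of chennai students in Student1 are 101 and 111; only 111 exists in Student2.
deleted = conn.execute("""
    DELETE FROM Student2
    WHERE ROLL_NO IN (SELECT ROLL_NO FROM Student1 WHERE LOCATION = 'chennai')
""").rowcount

# Raju and Ravi live in coimbatore and salem; only Sri (coimbatore) matches in Student2.
updated = conn.execute("""
    UPDATE Student2 SET NAME = 'geeks'
    WHERE LOCATION IN (SELECT LOCATION FROM Student1 WHERE NAME IN ('Raju', 'Ravi'))
""").rowcount
conn.commit()

student1_count = conn.execute("SELECT COUNT(*) FROM Student1").fetchone()[0]
student2 = conn.execute("SELECT * FROM Student2 ORDER BY ROLL_NO").fetchall()

print(deleted, updated, student1_count)  # 1 1 6
print(student2)
```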
Example 1: Find all professors whose salary is greater than the average budget of all the departments.
Instructor relation:
Query:
select I.ID, I.NAME, I.DEPARTMENT, I.SALARY from (select avg(BUDGET) as averageBudget from
DEPARTMENT) as BUDGET, Instructor as I where I.SALARY > BUDGET.averageBudget;
Output
InstructorID Name Department Salary
44547 Smith Computer Science 95000
48147 Erik Mechanical 80000
Explanation: The average budget of all departments from the department relation is 70000. Erik and Smith are
the only instructors in the instructor relation whose salary is more than 70000 and therefore are present in the
output relation.
Correlated Subquery
A correlated subquery is a subquery that contains a reference to a table that also appears in the outer query.
For example:
SELECT * FROM t1 WHERE column1 = ANY (SELECT column1 FROM t2 WHERE t2.column2 = t1.column2);
SQL correlated subqueries are used to select data from a table referenced in the outer query. The subquery is
called correlated because it is related to the outer query. In this type of query, a table alias
(also called a correlation name) must be used to specify which table reference is meant.
An uncorrelated subquery is executed first and provides its value to the outer query, whereas a correlated
subquery references a column of the outer query and is executed once for each row of the outer
query.
A correlated subquery (also known as a synchronized subquery) is a subquery (a query nested inside another
query) that uses values from the outer query. Because the subquery may be evaluated once for each row
processed by the outer query, it can be slow.
Here is an example for a typical correlated subquery. In this example, the objective is to find all employees
whose salary is above average for their department.
SELECT employee_number, name FROM employees emp WHERE salary > ( SELECT AVG(salary)
FROM employees WHERE department = emp.department);
Correlated subqueries in the SELECT clause
Correlated subqueries may appear elsewhere besides the WHERE clause; for example, this query uses a
correlated subquery in the SELECT clause to print the entire list of employees alongside the average salary for
each employee's department. Again, because the subquery is correlated with a column of the outer query, it
must be re-executed for each row of the result.
SELECT employee_number, name, (SELECT AVG(salary) FROM employees WHERE
department = emp.department) AS department_average FROM employees emp
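Both correlated queries above, the one in the WHERE clause and the one in the SELECT clause, can be run with Python's sqlite3 module. The employees rows (two departments, IT and HR) are hypothetical. Inside each subquery the unqualified column department refers to the inner employees table, while emp.department refers to the current row of the outer query. This reference to the outer row is what makes the subquery correlated.

```python
import sqlite3

# Hypothetical rows: IT averages 75000, HR averages 45000.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_number INTEGER, name TEXT, department TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        (1, 'Ann', 'IT', 90000), (2, 'Bob', 'IT', 60000),
        (3, 'Cid', 'HR', 50000), (4, 'Dee', 'HR', 40000), (5, 'Eve', 'HR', 45000);
""")

# WHERE clause: the inner AVG is recomputed for each outer row's department.
above_avg = conn.execute("""
    SELECT employee_number, name
    FROM employees emp
    WHERE salary > (SELECT AVG(salary) FROM employees WHERE department = emp.department)
    ORDER BY employee_number
""").fetchall()

# SELECT clause: the same correlated subquery produces one value per result row.
with_avg = conn.execute("""
    SELECT employee_number, name,
           (SELECT AVG(salary) FROM employees WHERE department = emp.department) AS department_average
    FROM employees emp
    ORDER BY employee_number
""").fetchall()

print(above_avg)  # [(1, 'Ann'), (3, 'Cid')]
for row in with_avg:
    print(row)
```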
Correlated subqueries in the FROM clause
It is generally meaningless to have a correlated subquery in the FROM clause because the table in the FROM
clause is needed to evaluate the outer query, but the correlated subquery in the FROM clause can't be evaluated
before the outer query is evaluated, causing a chicken-and-egg problem.
Example: SQL Subqueries
The following query retrieves ord_num, ord_amount, cust_code and agent_code from the table
orders ('a' and 'b' are the aliases of the orders and agents tables) with the following conditions:
the agent_code of the orders table must be the same as the agent_code of the agents table, and the agent_name
in the agents table must be Alex.
The following SQL statement can be used:
Sample tables: orders (ord_num, ord_amount, adv_amt, ord_date, cust_code, agent_code)
agents (agent_code, agent_name, working_area, commission, country, phoneno)
SELECT a.ord_num,a.ord_amount,a.cust_code,a.agent_code FROM orders a WHERE a.agent_code=( SELECT
b.agent_code FROM agents b WHERE b.agent_name='Alex');
SELECT a.ord_num,a.ord_amount,a.cust_code,a.agent_code FROM orders a WHERE a.agent_code='A003';
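Here the inner query does not reference the outer query, so it is a simple (non-correlated) subquery. It is
evaluated once, and if Alex's agent_code is 'A003', the first statement is equivalent to the second one.
A simple subquery is evaluated once for the entire statement. A correlated subquery, by contrast, is evaluated in
loops, once for each row generated by the data set. This means that adding correlated subqueries can slow down
query performance, since the query recalculates the same information over and over.
A subquery is a query within a query. In simple words, it uses a query inside a query for specific data needs.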
SQL GROUP BY
The GROUP BY statement groups rows that have the same values into summary rows, like "find the number of
customers in each country".
The GROUP BY statement is often used with aggregate functions (COUNT(), MAX(), MIN(), SUM(), AVG())
to group the result-set by one or more columns.
SELECT COUNT(CustomerID), Country FROM Customers GROUP BY Country;
HAVING Clause:
The SQL HAVING clause is used in combination with the GROUP BY clause to restrict the returned groups
to only those for which the condition is TRUE.
SELECT department, SUM(sales) AS "Total sales" FROM order_details GROUP BY department HAVING
SUM(sales) > 1000;
SELECT department, MIN(salary) AS "Lowest salary" FROM employees GROUP BY department HAVING
MIN(salary) > 35000;
SELECT department, MAX(salary) AS "Highest salary" FROM employees GROUP BY department HAVING
MAX(salary) < 50000;
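The first HAVING query above can be run with Python's sqlite3 module to see how HAVING filters whole groups after GROUP BY has aggregated them, whereas WHERE filters individual rows before grouping. The order_details rows are hypothetical. Garden's total (500) does not exceed 1000, so only Electronics and Toys survive the HAVING condition.

```python
import sqlite3

# Hypothetical order_details rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_details (department TEXT, sales INTEGER);
    INSERT INTO order_details VALUES
        ('Electronics', 700), ('Electronics', 600),
        ('Garden', 300), ('Garden', 200),
        ('Toys', 1200);
""")

# GROUP BY alone: one summary row per department.
totals = conn.execute("""
    SELECT department, SUM(sales) AS "Total sales"
    FROM order_details
    GROUP BY department
    ORDER BY department
""").fetchall()

# HAVING keeps only the groups whose aggregate satisfies the condition.
big_departments = conn.execute("""
    SELECT department, SUM(sales) AS "Total sales"
    FROM order_details
    GROUP BY department
    HAVING SUM(sales) > 1000
    ORDER BY department
""").fetchall()

print(totals)           # [('Electronics', 1300), ('Garden', 500), ('Toys', 1200)]
print(big_departments)  # [('Electronics', 1300), ('Toys', 1200)]
```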
The ORDER BY statement in SQL is used to sort the fetched data in ascending or descending order according
to one or more columns. By default, ORDER BY sorts the data in ascending order; the DESC keyword sorts it in
descending order.
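A short illustration of ORDER BY with Python's sqlite3 module, using a hypothetical students table. Ascending is the default. A second sort column (name) is added so that rows with the same age come out in a predictable order instead of an engine-dependent one.

```python
import sqlite3

# Hypothetical rows; Asha and Chen share the same age.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (name TEXT, age INTEGER);
    INSERT INTO students VALUES ('Asha', 19), ('Bala', 17), ('Chen', 19), ('Dev', 18);
""")

# Default direction is ascending; name breaks ties between equal ages.
ascending = conn.execute(
    "SELECT name, age FROM students ORDER BY age, name"
).fetchall()

# DESC applies only to the column it follows; name is still ascending.
descending = conn.execute(
    "SELECT name, age FROM students ORDER BY age DESC, name ASC"
).fetchall()

print(ascending)   # [('Bala', 17), ('Dev', 18), ('Asha', 19), ('Chen', 19)]
print(descending)  # [('Asha', 19), ('Chen', 19), ('Dev', 18), ('Bala', 17)]
```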
A view is a database object that stores no data of its own. It is a virtual table created from the result
set of an SQL query. However, it looks similar to an actual table containing rows and columns, and its contents
are based on the base table(s). It is queried in the same way as a base table but does not
contain any data of its own. Like a table, its name must be unique. Views differ from tables in that they are
definitions created on top of other tables (or views). If any changes occur in the underlying table, the
same changes are reflected in the view.
This diagram illustrates the concept of a view that includes columns from more than one table. Here we have
two tables named 'Table A' and 'Table B', and by using an SQL statement, a view is created containing data
from both tables. Because a view is only a stored query definition, its data is not stored physically. This
feature makes views excellent for abstracting or hiding complex queries.
A primary use of views in SQL Server is to implement a security mechanism: they prevent users from seeing
specific columns and rows of tables. A view shows only the data returned by the query that was declared when the
view was created; the rest of the information is hidden from the end user.
Types of Views:
1. User-Defined Views
Users define these views to meet their specific requirements. They can be further divided into two types: the
simple view and the complex view. A simple view is based on a single base table without using
any complex queries. A complex view is based on more than one table and may use the GROUP BY clause, the
ORDER BY clause and join conditions.
CREATE VIEW course_enrolled AS SELECT first_name, last_name, course, amount_paid FROM Student
AS S INNER JOIN Fee AS F ON S.admission_no = F.admission_no;
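The course_enrolled view above can be created in SQLite through Python's sqlite3 module to show that a view holds no data of its own. In the sketch, after a row in the base table Fee is updated, the view immediately reflects the new value. The Student and Fee column sets beyond those used in the view, and all the rows, are hypothetical.

```python
import sqlite3

# Hypothetical base tables; the view definition is the one from the notes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (admission_no INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE Fee (admission_no INTEGER, course TEXT, amount_paid INTEGER);
    INSERT INTO Student VALUES (1, 'Asha', 'Rao'), (2, 'Bala', 'Iyer');
    INSERT INTO Fee VALUES (1, 'DBMS', 5000), (2, 'Networks', 4000);

    CREATE VIEW course_enrolled AS
        SELECT first_name, last_name, course, amount_paid
        FROM Student AS S INNER JOIN Fee AS F ON S.admission_no = F.admission_no;
""")

before = conn.execute("SELECT * FROM course_enrolled ORDER BY first_name").fetchall()

# Change the base table only; the view is re-evaluated when it is next queried.
conn.execute("UPDATE Fee SET amount_paid = 6000 WHERE admission_no = 1")
after = conn.execute("SELECT * FROM course_enrolled ORDER BY first_name").fetchall()

print(before)  # [('Asha', 'Rao', 'DBMS', 5000), ('Bala', 'Iyer', 'Networks', 4000)]
print(after)   # [('Asha', 'Rao', 'DBMS', 6000), ('Bala', 'Iyer', 'Networks', 4000)]
```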
2. System-Defined Views
System-defined views are predefined views that exist in SQL Server's system databases, such as master and tempdb.
Each system view has its own properties and functions, and they are automatically attached to the user-defined
databases. The system-defined views in SQL Server can be divided into three types: Information
Schema views, Catalog views and Dynamic Management views.
COMMIT command
COMMIT command is used to permanently save any transaction into the database.
When we use a DML command like INSERT, UPDATE or DELETE, the changes made by the command are not permanent:
until the current session is closed, they can be rolled back.
To avoid that, we use the COMMIT command to mark the changes as permanent.
ROLLBACK command
This command restores the database to the last committed state. It is also used with the SAVEPOINT command to jump
to a savepoint in an ongoing transaction.
If we have used the UPDATE command to make some changes to the database and realise that those changes
were not required, we can use the ROLLBACK command to roll back those changes, provided they were not
committed using the COMMIT command.
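SAVEPOINT command
The SAVEPOINT command is used to temporarily save a transaction so that you can roll back to that point whenever required.
SAVEPOINT savepoint_name;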
In short, using this command we can name the different states of our data in any table and then rollback to that
state using the ROLLBACK command whenever required.
id name
1 Abhi
2 Adam
4 Alex
Let's use some SQL queries on the above table and see the results.
COMMIT;
SAVEPOINT A;
SAVEPOINT B;
SAVEPOINT C;
id name
1 Abhi
2 Adam
4 Alex
5 Abhijit
6 Chris
7 Bravo
Now let's use the ROLLBACK command to roll back the state of data to the savepoint B.
ROLLBACK TO B;
ROLLBACK TO A;
id name
1 Abhi
2 Adam
4 Alex
5 Abhijit
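The COMMIT / SAVEPOINT / ROLLBACK TO sequence above can be replayed in SQLite through Python's sqlite3 module. The notes do not show the table's name or the exact INSERT statements. This sketch therefore assumes a hypothetical table named students and one plausible sequence that produces the tables shown: row 5 is committed, then rows 6 and 7 are added after savepoints A and B. The connection is opened with isolation_level=None so that Python issues no implicit BEGIN and the transaction statements run exactly as written.

```python
import sqlite3

# isolation_level=None: no implicit BEGIN from Python; we control transactions explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")  # hypothetical table name
conn.execute("INSERT INTO students VALUES (1, 'Abhi'), (2, 'Adam'), (4, 'Alex')")

def ids():
    """Current ids in the table, in ascending order."""
    return [row[0] for row in conn.execute("SELECT id FROM students ORDER BY id")]

conn.execute("BEGIN")
conn.execute("INSERT INTO students VALUES (5, 'Abhijit')")
conn.execute("COMMIT")                       # row 5 is now permanent

conn.execute("BEGIN")
conn.execute("SAVEPOINT A")
conn.execute("INSERT INTO students VALUES (6, 'Chris')")
conn.execute("SAVEPOINT B")
conn.execute("INSERT INTO students VALUES (7, 'Bravo')")
conn.execute("SAVEPOINT C")
after_c = ids()                              # [1, 2, 4, 5, 6, 7]

conn.execute("ROLLBACK TO B")                # undoes everything after savepoint B (row 7)
after_b = ids()                              # [1, 2, 4, 5, 6]

conn.execute("ROLLBACK TO A")                # undoes everything after savepoint A (row 6)
after_a = ids()                              # [1, 2, 4, 5]

conn.execute("COMMIT")                       # ends the still-open transaction
final = ids()
print(after_c, after_b, after_a, final)
```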
So now you know how the COMMIT, ROLLBACK and SAVEPOINT commands work.
Relational Calculus
Before understanding relational calculus in DBMS, we need to understand procedural and declarative languages.
1. Procedural Language: Languages that clearly define how to get the required results from the database are
called procedural languages. Relational algebra is a procedural language.
2. Declarative Language: Languages that only care about what to get from the database, without getting
into how to get the results, are called declarative languages. Relational calculus is a declarative language.
Relational calculus is a non-procedural query language. In a non-procedural query language, the user is not
concerned with the details of how to obtain the end results. Relational calculus tells what to do but never
explains how to do it.
This makes it different from relational algebra, which is procedural. In other words, relational calculus
only describes the query, not the detailed method for evaluating it.
1. In tuple calculus we find tuples which are true for a given condition.
2. The predicate must be true for a tuple
3. The result obtained may be more than one tuple.
Breakdown –
∈ – represents Belongs to
∃ – called an existential quantifier represents there is at least one
r – means relation
(∨ ) – OR
(∧ ) – AND
(¬) – NOT
Example 1 :
This represents the set of tuples t that belong to the relation Employee.
Example 2 :
It returns the emp_Id of each employee whose salary is greater than or equal to 10000.
In tuple relational calculus, we evaluate predicates over tuples. In domain relational calculus, however, we
evaluate them over the domains of the attributes.
Breakdown –
Example :
The result returns the Fname and Emp_ID values for all the rows in the employee table where the
salary is greater than 10000.
Tuple Relational Calculus (TRC) is a non-procedural query language used in
relational database management systems (RDBMS) to retrieve data from tables. TRC is
based on the concept of tuples, which are ordered sets of attribute values that represent a single
row or record in a database table.
TRC is a declarative language, meaning that it specifies what data is required from the database,
rather than how to retrieve it.
TRC queries are expressed as logical formulas that describe the desired tuples.
In tuple relational calculus, we filter tuples based on the given condition. It is also known as
predicate calculus. A tuple variable t ranges over all the tuples of a relation (table) R.
Syntax: { T | Condition }
In this form of relational calculus, we define a tuple variable, specify the table (relation) in which the tuple
is to be searched for, and give a condition.
We can also specify a column name using the . (dot) operator with the tuple variable to get only a certain attribute.
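Python set comprehensions read almost like tuple relational calculus formulas, so they offer a convenient way to experiment with the idea on a finite relation. This is only an analogy, not a TRC query engine. The Employee relation, its attributes (emp_id, fname, salary) and its rows are hypothetical, and the salary condition follows Example 2 above. The loop variable t plays the role of the tuple variable, and t.emp_id is the dot-operator projection.

```python
from typing import NamedTuple


class Employee(NamedTuple):
    """One tuple of a hypothetical Employee relation."""
    emp_id: int
    fname: str
    salary: int


# A relation is a set of tuples.
EMPLOYEE = {
    Employee(1, "Asha", 12000),
    Employee(2, "Bala", 9000),
    Employee(3, "Chen", 10000),
}

# { t | t ∈ Employee ∧ t.salary ≥ 10000 }: whole tuples that satisfy the predicate
high_paid = {t for t in EMPLOYEE if t.salary >= 10000}

# { t.emp_id | t ∈ Employee ∧ t.salary ≥ 10000 }: only the emp_id attribute via the dot operator
high_paid_ids = {t.emp_id for t in EMPLOYEE if t.salary >= 10000}

print(sorted(high_paid_ids))  # [1, 3]
```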
In domain relational calculus, filtering is done based on the domain of the attributes and not based on the tuple
values.
Domain variables range over the domains (columns) of a relation or table R. They work on the
domains (columns) in the same way that tuple relational calculus works on rows.
Syntax: { c1, c2, c3, ..., cn | F(c1, c2, c3, ... ,cn)}
where c1, c2, ... represent the domains of attributes (columns) and F defines the formula, including the condition
for fetching the data.
For example,
Again, the above query will return the names and ages of the students in the table Student who are older than
17.
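The domain-calculus version of the Student example can be mimicked in Python in a similar way. This is again an analogy over a finite relation, under stated assumptions. The Student relation is assumed to have the hypothetical attributes (name, age, roll_no), and its rows are invented. Unpacking each tuple into n, a, r binds one domain variable per column. Only n and a are returned, which plays the role of an existentially quantified roll_no in the formula.

```python
# Hypothetical Student relation with attributes (name, age, roll_no).
STUDENT = {
    ("Asha", 19, 1),
    ("Bala", 17, 2),
    ("Chen", 18, 3),
}

# { <n, a> | ∃r (<n, a, r> ∈ Student ∧ a > 17) }
# Each column value is bound to its own domain variable; r is used but not returned.
older_than_17 = {(n, a) for (n, a, r) in STUDENT if a > 17}

print(sorted(older_than_17))  # [('Asha', 19), ('Chen', 18)]
```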
The second form of relational calculus is known as domain relational calculus. In domain relational calculus, the
filtering variables range over the domains of attributes. Domain relational calculus uses the same operators as tuple
calculus. It uses the logical connectives ∧ (and), ∨ (or) and ¬ (not), and it uses the existential (∃) and universal (∀)
quantifiers to bind variables. QBE (Query By Example) is a query language related to domain relational calculus.
Notation:
Where
For example: