Dbms Complete Notes 2nd Sem-1

DATABASE MANAGEMENT SYSTEMS

CHAPTER -1
DATABASES AND DATABASE USERS
Basic Definitions :
Data :
 Data is a representation of facts, figures, statistics, etc. having no
particular meaning.
 Data can be in the form of numbers, characters, symbols, or even
pictures.
 Ex: 1, ABC, 19, etc.
Information :
 Processed data, or a collection of data that has a definite
meaning, is called information.
Field:
 A field is a single piece of information, a record is one complete set of
fields and a file is a collection of records.
Ex: A telephone book is analogous to a file. It contains a list of records,
each of which consists of three fields: name, address, and telephone
number.
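The field/record/file hierarchy above can be sketched in Python (a toy illustration, not a real telephone file; all names and numbers are invented):

```python
from collections import namedtuple

# A record is one complete set of fields; here the fields are
# name, address, and telephone number.
Record = namedtuple("Record", ["name", "address", "telephone"])

# A file is a collection of records -- the whole telephone book.
telephone_book = [
    Record("Somu", "12 Main St", "555-0101"),
    Record("Raju", "34 Oak Ave", "555-0102"),
]

# A field is a single piece of information within one record.
first_name = telephone_book[0].name
```
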
Database :
 A database is a collection of related data (information) that
is organized so that it can easily be accessed, managed, and
updated.
Examples:
University database
Data: departments, students, exams, rooms, etc.
Usage: create exam plans, enter exam results, create statistics,
and build timetables
Bank database
Data: clients, accounts, credits, funds, etc.
Applications: accounting, transfers, risk management
Airline database
Data: flights, passengers, employees, airplanes, etc.
Applications: reservation, booking, creating flight schedules
Database System :
 A database system is a way of organizing information on a
computer , implemented by a set of computer programs.
Data Base Management System (DBMS) :
 It is a collection of programs that enables users to create and
maintain a database.
 In other words, it is general-purpose software that provides
users with the processes of defining, constructing, and
manipulating the database for various applications.
Ex:
Computerized library Systems
Automated teller machines
Flight reservation Systems
FUNCTIONALITIES OF A DATABASE

Defining A Database :
 Defining a database involves specifying the Data types,
structures, and constraints of the data to be stored in the
database.
Constructing A Database:
 Constructing the database is the process of storing data on
some storage medium that is controlled by the DBMS.
Manipulating A Database:
 Manipulating a database includes functions such as
querying the database to retrieve specific data, updating
the database, and generating reports from the data.
Sharing:
 Allows multiple users to access the database
simultaneously.
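The defining, constructing, and manipulating steps can be sketched with Python's built-in sqlite3 module (a minimal illustration; the student table and its columns are invented for the example):

```python
import sqlite3

con = sqlite3.connect(":memory:")  # illustrative in-memory database

# Defining: specify the types, structure, and constraints of the data.
con.execute("CREATE TABLE student (rollno INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Constructing: store data on a medium controlled by the DBMS.
con.execute("INSERT INTO student VALUES (1, 'Somu')")
con.execute("INSERT INTO student VALUES (2, 'Raju')")

# Manipulating: query to retrieve specific data, and update it.
con.execute("UPDATE student SET name = 'Radha' WHERE rollno = 2")
names = [row[0] for row in con.execute("SELECT name FROM student ORDER BY rollno")]
```
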
PROPERTIES OF THE DATABASE :
There are three properties:

1. A database represents some aspects of the real world.


2. A database is a logically coherent collection of data with
some inherent meaning.
2. A database is designed, built, and populated with data for
a specific purpose.
 Example of a simple database:

Student
Name    Rollno  Class  Department
Somu    1       A      CS
Raju    2       B      CS
Radha   3       C      ECE

Course
Subject name  Subject code  Department
CA            DCCA101       CS
DBMS          DCCA102       CS
Accounts      DCCA103       COMMERCE

Grade Report
Rollno  Course no  Grade
1       Cs01       A
2       Cs02       B
3       Com3       B
MAIN CHARACTERISTICS OF THE DATABASE APPROACH
Self-describing nature of a database system:
 A DBMS catalog stores the description of a particular
database (e.g. data structures, types, and constraints).
 The description is called meta-data.
 This allows the DBMS software to work with different
database applications.

Insulation between programs and data:
 Called program-data independence.
 Allows changing data structures and storage organization
without having to change the DBMS access programs.
Support of multiple views of the data:
 Each user may see a different view of the database, which
describes only the data of interest to that user.
 Ex: one user of the student database may be interested only
in accessing and printing the transcript of each student, while
another user may be interested only in checking which courses
the students are enrolled in.
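Multiple views over one database can be sketched with SQL views via sqlite3 (table, view, and column names are invented for the example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (rollno INTEGER, name TEXT, grade TEXT, course TEXT)")
con.execute("INSERT INTO student VALUES (1, 'Somu', 'A', 'DBMS')")

# One user's view: only transcript data (name and grade).
con.execute("CREATE VIEW transcript AS SELECT name, grade FROM student")

# Another user's view: only enrollment data (name and course).
con.execute("CREATE VIEW enrollment AS SELECT name, course FROM student")

transcript = con.execute("SELECT * FROM transcript").fetchall()
enrollment = con.execute("SELECT * FROM enrollment").fetchall()
```

Each view exposes only the data of interest to that user group, while both read from the same stored table.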
Sharing of data and multi-user transaction processing:
 Allow multiple users to access the database at the
same time.
 Concurrency control software–
 Ensure that several users trying to update the
same data do so in a controlled manner.
 Result of the updates is correct.
• OLTP (Online Transaction Processing) is a major part of
database applications. This allows hundreds of concurrent
transactions to execute per second.
• Ex: when several reservation clerks try to assign a
seat on a particular train, the DBMS should ensure that each
seat can be accessed by only one clerk at a time for assignment
to a passenger.
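The controlled seat-assignment idea can be sketched with an atomic conditional UPDATE in sqlite3 (a simplified single-connection illustration of the pattern, not a full concurrency-control implementation; table and clerk names are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE seat (seat_no INTEGER PRIMARY KEY, clerk TEXT)")
con.execute("INSERT INTO seat VALUES (12, NULL)")  # seat 12 starts free

def assign(clerk, seat_no):
    # The WHERE clause makes check-and-assign a single atomic statement,
    # so two clerks cannot both claim the same free seat.
    cur = con.execute(
        "UPDATE seat SET clerk = ? WHERE seat_no = ? AND clerk IS NULL",
        (clerk, seat_no))
    return cur.rowcount == 1  # True only for the clerk who got the seat

first = assign("clerk_a", 12)
second = assign("clerk_b", 12)  # same seat: the update matches no row
```
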
Data Abstraction:
 Data abstraction is the method of hiding the unimportant
details that are present in the database from the end users to
make the accessing of data easy and secure.
DIFFERENT PEOPLE BEHIND DBMS
 These apply to "large" databases, not "personal" databases that are defined,
constructed, and used by a single person via, say, Microsoft Access.

 There are two categories of people behind DBMS

i. Actors on the Scene

ii. Workers behind the scene

 Actors on the Scene: They actually use and control the database content;
and design, develop and maintain database applications

 Database Administrators
 Database Designers
 Software Engineers
 End-users
1.Database Administrator (DBA):
 DBA is a person who is responsible for authorizing access
to the database, coordinating and monitoring its use, and
acquiring software and hardware resources as needed.
2.Database Designers:
 Database Designers are responsible for identifying the
data to be stored in the database and for choosing
appropriate structures to represent and store the data.
 They must communicate with the end-users and
understand their needs.
3.End Users:
 These are persons who access the database for querying, updating,
and generating reports.

They are categorized as:

 Casual end-users: occasionally access the database, but they may
need different information each time.
 Naive or parametric end-users: constantly update and query
databases, using standard types of queries and updates.
 Example: bank clerks, reservation clerks, etc.
 Sophisticated end-users: include engineers, scientists, business
analysts, and others who thoroughly familiarize themselves with the
facilities of the DBMS in order to implement their applications to
meet their requirements.
 Stand-alone end-users: maintain personal databases by
using easy-to-use ready-made program packages.
4.System Analysts, Application Programmers, Software
Engineers:
 System Analysts: Determine needs of end users, especially
naive and parametric users, and develop specifications for
canned transactions that meet these needs.
 Application Programmers: Implement, test, document, and
maintain programs that satisfy the specifications mentioned
above.
 System Analysts and Application Programmers are commonly
referred to as Software Engineers.
Workers Behind the Scene:
 Those who design and develop the DBMS software and related tools.

1) DBMS system designers/implementers: persons who
design and implement the DBMS modules and interfaces as a
software package.
2) Tool developers: include persons who design and implement
tools -- the software packages that facilitate database system
design and use, and that help improve performance.
3) Operators and maintenance personnel (system
administration personnel) : Are responsible for the actual
running and maintenance of the hardware and software
environment for the database system. Responsible for the day-to-
day operation of the system.
Advantages of using DBMS Approach :

1.Controlling Redundancy
 The database approach controls redundancy, i.e. storing the
same data multiple times in the database.
 Redundantly storing the same data multiple times leads to
several problems:
 Storage space is wasted when the same data is stored repeatedly.

 Files that represent the same data may become inconsistent.

2.Restricting Unauthorized Access:


 The Database Management System restricts unauthorized access of
the database by enforcing restrictions such as providing a password
for an authorized person to access the data.
3.Providing Persistent Storage for Program Objects
 Databases can be used to provide persistent storage for program
objects and data structures.
4. Providing Storage Structures for Efficient Query Processing
 Database systems must provide capabilities for efficiently
executing queries and updates.
5. Providing Backup and Recovery
 A DBMS must provide facilities for recovering from hardware or
software failures.
6. Providing Multiple user interface
 Many types of users use the database , A DBMS should provide
variety Of User interfaces.
7.It Allows the Data Sharing
 A Database allows the sharing of data under its control by any
number of application programs or users.
8. It ensures data integrity
 Data integrity means that the data contained in the database is
both accurate and consistent.
9. Representing complex relationships among data.
Disadvantages of DBMS :
 Cost of Hardware & Software

 Cost of Data Conversion

 Cost of Staff Training

 Appointing Technical Staff

 Database Damage
A Brief History of Database Applications:
Early Database Applications Using Hierarchical and Network
Systems:
 Large numbers of records of similar structure.
 One of the main problems with early database systems was the
intermixing of conceptual relationships with the physical storage and
placement of records on disk.
 Another shortcoming of early systems was that they provided only
programming language interfaces.
Providing Data Abstraction and Application Flexibility with Relational
Databases:
 Relational databases were originally proposed to separate the
physical storage of data from its conceptual representation and to
provide a mathematical foundation for data representation and
querying.
 The relational data model also introduced high-level query
languages that provided an alternative to programming language
interfaces, making it much faster to write new queries.
 Relational databases now exist on almost all types of computers,
from small personal computers to large servers.
Object-Oriented Applications and the Need for More Complex
Databases:
 object-oriented databases (OODBs) were considered a
competitor to relational databases, since they provided more
general data structures.
 Used in specialized applications: engineering design,
multimedia publishing, and manufacturing systems.
Interchanging Data on the Web for E-Commerce Using XML:
 In the 1990s, electronic commerce (e-commerce) emerged as
a major application on the Web.
 A variety of techniques were developed to allow the
interchange of data on the Web.
 Currently, eXtensible Markup Language (XML) is considered
to be the primary standard for interchanging data among
various types of databases and Web pages.
Extending database capabilities for new applications
1. Extensions to better support specialized requirements for
applications
2. Enterprise resource planning (ERP)
3. Customer relationship management (CRM)
Databases versus information retrieval
 Information retrieval (IR)
 Deals with books, manuscripts, and various forms of library-based
articles.
When not to use a DBMS :
a) Main costs of using a DBMS:
i) High initial investment and possible need for additional
hardware.
ii) Overhead for providing generality, security, concurrency control,
recovery, and integrity functions.
b) When a DBMS may be unnecessary:
i) If the database and applications are simple, well defined and not
expected to change.
ii) If there are stringent real-time requirements that may not be met
because of DBMS overhead.
iii)If access to data by multiple users is not required.
c) When no DBMS may be sufficient:
i) If the database system is not able to handle the complexity of data
because of modeling limitations .
ii) If the database users need special operations not supported by
the DBMS.
1. Database Administrators (DBA):
 The DBA is responsible for authorizing access to the
database, for Coordinating and monitoring its use and for
acquiring software and hardware resources as needed.
 These are the people, who maintain and design the
database daily.
DBA is responsible for the following Issues:
o Design of the conceptual and physical schemas:
• The DBA is responsible for interacting with the users
of the system to understand what data is to be stored in
the DBMS and how it is likely to be used.
• The DBA creates the original schema by writing a set
of definitions, which is permanently stored in the data
dictionary.
 Security and Authorization:
 The DBA is responsible for ensuring that unauthorized data
access is not permitted.
 The granting of different types of authorization allows the
DBA to regulate which parts of the database various users can
access.
 Storage structure and Access method definition:
 The DBA creates appropriate storage structures and access
methods by writing a set of definitions, which are translated
by the DDL compiler.
 Data Availability and Recovery from Failures:
 The DBA must take steps to ensure that if the system fails,
users can continue to access as much of the uncorrupted data
as possible. The DBA also works to restore the data to a
consistent state.
 Database Tuning:
 The DBA is responsible for modifying the database to ensure
adequate Performance as requirements change.
 Integrity Constraint Specification:
 The integrity constraints are kept in a special system
structure that is consulted by the DBMS whenever an update
takes place in the system.
DATABASE SYSTEM CONCEPTS AND
ARCHITECTURE

CHAPTER-2
DATA MODEL:
 Data Model can be defined as an integrated Collection of concepts
for describing and manipulating data, relationship between data, and
constraints on the data in an organization.
 It defines the data elements and the relationships between the data
elements.
 Data Models are used to show how data is stored, connected,
accessed and updated in the database management system.
 Though many data models are in use nowadays, the
relational model is the most widely used.
Categories of Data Models.
High-level or conceptual data models :
 A high-level or conceptual model is a user-level data model.
 It provides concepts that are close to the way many users
perceive data.
 Conceptual data models use concepts such as entities, attributes, and
relationships.
Low-level or physical data models:
 Provide concepts that describe the details of how data is stored in the
computer.
 A low-level data model is intended for computer specialists, not for end-
users.
Representational data model:
 It sits between the high-level and low-level data models, and provides
concepts that may be understood by end-users but that are not too far
removed from the way data is organized within the computer.
Types of Data Models :
1. Hierarchical Data Model

2. Network Data Model


3. Relational Data Model
4. Object-oriented Data Models
5. Object-Relational Data Models
Hierarchical Model:
 Hierarchical Model was the first DBMS model. This model organizes
the data in the hierarchical tree structure.
 The hierarchy starts from the root which has root data and then it
expands in the form of a tree adding child node to the parent node.
 The data here is organized in a tree-like structure with a one-to-
many (1:N) relationship between the data types.
 Also, there can be only one path from the root to any node.

 Each child node has a parent node but a parent node can have more than
one child node.
 Multiple parents are not allowed.

 If a parent node is deleted then the child node is automatically deleted.

 Pointers are used to link the parent node with the child node and are
used to navigate between the stored data.
Advantages:
 Any change in the parent node is automatically reflected in the child
node so, the integrity of data is maintained
 Simplicity, Security, Efficiency

Disadvantages:
 Complex relationships are not supported.

 Database Management problem

 Operation Anomalies
Network Model :
 This model is an extension of the hierarchical model.

 It was the most popular model before the relational model.

 This model is the same as the hierarchical model, the only difference
is that a record can have more than one parent.
 It replaces the hierarchical tree with a graph.

 In this model, as there are more relationships so data is more related.

 This model has the ability to manage one-to-one (1:1) relationships as well
as many-to-many (N:N) relationships.
 A network structure thus allows 1:1, 1:M, and M:M relationships among
entities.
 As there are more relationships so there can be more than one path to
the same record.
 This makes data access fast and simple.
 The operations on the network model are done with the help of the
circular linked list.
 The current position is maintained with the help of a program and this
position navigates through the records according to the relationship.
Advantages of Network Model :
 The data can be accessed faster as compared to the hierarchical model.

 In the network model there can be more than one path to reach a
particular node.
 Ease to access data : data access is easier than the hierarchical model

 Data Integrity and Data independence

Disadvantages of Network Model :


 As more and more relationships need to be handled the system might
get complex.
 Operational Anomalies – large number of pointers is required so
insertion, deletion & updating is more complex.
In this diagram we can see that the node STUDENT has two parents, i.e. the CSE
Department and the Library. This was not possible in the hierarchical
model.
Relational Model :
 Relational Model is the most widely used model.

 In this model, the data is maintained in the form of a two-dimensional


table to represent its database.
 All the information is stored in the form of row and columns.

 The basic structure of a relational model is tables.

 So, the tables are also called relations in the relational model.

 Each Row in the table is called Tuple or Record.

 Each Column in the table is called Attribute or Field.

 A row contains all the information about any instance of the object.
 Attribute or field: Attributes are the property which defines the table
or relation.
 In the above example, we have different attributes of the employee like
Salary, Mobile no, etc.
Advantages of Relational Model :
 Simple: This model is more simple as compared to the network and hierarchical
model.
 Structural Independence: In the relational model, changes in the structure do
not affect the data access
Disadvantages of Relational Model :
 Hardware Overheads: For hiding the complexities and making things easier for
the user this model requires more powerful hardware computers and data storage
devices.
 Too many rules make database non-user-friendly.
Object-oriented Data Model :
 The real-world problems are more closely represented through the
object-oriented data model.
 In this model, both the data and relationship are present in a single
structure known as an object.
 We can store audio, video, images, etc in the database which was not
possible in the relational model.
 In this model, two or more objects are connected through links.

 We use this link to relate one object to other objects.

 This can be understood by the example given below.


 In the above example, we have two objects, Employee and Department.
 All the data and relationships of each object are contained as a single
unit.
 The attributes like Name, Job_title of the employee and the methods
which will be performed by that object are stored as a single object.
 The two objects are connected through a common attribute i.e the
Department_id and the communication between these two will be
done with the help of this common id.
 The Behaviour of the objects is represented using methods.

 Similar attributes and methods are grouped together using a Class.


Elements of Object oriented data model
 Objects

The real world entities and situations are represented as objects in the
Object oriented database model.
 Attributes and Methods

Every object has certain characteristics. These are represented using
attributes. The behaviour of the objects is represented using methods.
 Class

Similar attributes and methods are grouped together using a class. An
object can be called an instance of the class.
 Inheritance

A new class can be derived from the original class. The derived class
contains attributes and methods of the original class as well as its own.
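These four elements map directly onto Python's class mechanism; a minimal sketch (the Employee/Manager classes and their attributes are invented to mirror the example above):

```python
# Class: similar attributes and methods grouped together.
class Employee:
    def __init__(self, name, job_title, department_id):
        # Attributes: the characteristics of the object.
        self.name = name
        self.job_title = job_title
        self.department_id = department_id

    # Method: the behaviour of the object.
    def describe(self):
        return f"{self.name} ({self.job_title})"

# Inheritance: the derived class contains the attributes and methods
# of the original class as well as its own.
class Manager(Employee):
    def __init__(self, name, department_id, reports):
        super().__init__(name, "Manager", department_id)
        self.reports = reports  # an attribute of its own

# Object: an instance of the class.
emp = Manager("Radha", 10, reports=3)
```
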
Object-Relational Models :
 An object-relational model is a combination of an object-oriented
database model and a relational database model.
 So, it supports objects, classes, inheritance etc. just like Object
Oriented models and has support for data types, tabular structures
etc. like Relational data model.
 One of the major goals of Object relational data model is to close the
gap between relational databases and the object oriented practices
frequently used in many programming languages such as C++, C#,
Java etc.
The advantages of the Object Relational model are:
Inheritance
 The Object Relational data model allows its users to inherit objects,
tables etc.
Complex Data Types
 Complex data types can be formed using existing data types

Extensibility
 The functionality of the system can be extended in Object relational
data model.
Disadvantages of Object Relational model
 The object relational data model can get quite complicated and
difficult to handle at times as it is a combination of the Object
oriented data model and Relational data model and utilizes the
functionalities of both of them.
Schema and Instances:
 Database Schema: A database schema represents the logical view of
the entire database.
 It defines how the data is organized and how the relations among
them are associated.
 Schemas provide a logical classification of objects in the database.
 A schema can contain tables, views, triggers, functions, packages,
and other objects.
 A database schema defines its entities and the relationship among
them.
 It contains a descriptive detail of the database, which can be depicted
by means of schema diagrams.
 It’s the database designers who design the schema to help
programmers understand the database and make it useful.
Schema Diagram
 From the above schema diagram student and Grade report are related
and course and prerequisite and section are related.
DBMS Instance :
 The data stored in database at a particular moment of time is called
instance of database or snapshot or database state
 Database State: Refers to the content of a database at a moment in
time.
 Empty State : At this point, the corresponding database state is the
empty state with no data.
 Initial State: Refers to the database state when the database is first
populated or loaded with initial data.
 Valid State: A state that satisfies the structure and constraints specified
in the Schema.
 The schema is sometimes called the intension of the database, and a
database state an extension of the schema.
SIMPLIFIED DATABASE SYSTEM :
 A database system is Computer-Based system to record and maintain the
information. A Database Management System consists of a collection of
inter-related data and a set of program to access those data.
 The database and the DBMS software together is a Database system.

 It consists of the following

1. User/Programmers
2. Applications programs/Queries
3. Software to process Queries/programs
4. Software to Access stored data
5. DBMS Catalog Contains the Stored database definition (Metadata)
6. The Physical Stored database
A SIMPLIFIED DATABASE SYSTEM ENVIRONMENT
 A DBMS Catalog stores the description of the database. The description is
called Meta-data. This allows the DBMS software to work with different
databases
 The collection of data, usually referred to as the database, contains
information about one particular enterprise.
 The primary goal of DBMS is to provide an environment that is both
convenient and efficient to use in retrieving and storing database information
 This system involves the control of how databases are Created, interrogated
and maintained to provide information needed by end users and the
organization.
 DBMS acts as the interface between the application programs and database.
 There are many different types of DBMS, ranging from small systems that run
on personal computers to huge systems that run on mainframes.
 Ex: Computerized library system
DBMS Architecture :
 Every database system logically organizes data with respect to some
model is called Data model. A Data model describes how various
pieces of data in the database are logically related to each other
 The data model represents the relationship between entities. The
database model is also well known as Database Architecture.
 The structure of a DBMS may be analyzed in two separate architectures

 Logical DBMS Architecture

 Physical DBMS Architecture.


Logical DBMS Architecture (Three Schema/Level DBMS
Architecture):
 The three schema architecture is also called ANSI/SPARC architecture
or three-level architecture.
 The logical Architecture describes how data in the database is perceived
by users.
 It is not concerned with how the data is handled and processed by the
DBMS, but only with how it looks.
 The goal of the Three-schema architecture is to separate the user
application and the physical database.
 In this architecture, schemas can be defined at the following three
levels
 There are following three levels or layers of DBMS Architecture
 External Level
 Conceptual Level
 Internal Level
Physical level (or Internal View / Schema):
 The lowest level of abstraction describes how the data are actually stored.

 The physical level describes physical storage structure of the database.

 This level is also responsible for allocating space to the data.


 Essentially, the physical schema summarizes how the relations described in
the conceptual schema are actually stored on secondary storage devices such
as disks and tapes.
Logical level (or Conceptual View / Schema):
 The next-higher level of abstraction describes what data are stored in the
database, and what relationships exist among those data.
 The logical level thus describes the entire database in terms of a small
number of relatively simple structures.
 It hides physical storage details, concentrating upon describing entities, data
types, relationships, user operations, and constraints.
 Database administrators, who must decide what information to keep in the
database, use the logical level of abstraction.
 This level is maintained by DBA (database administrator).
View level (or External View / Schema):
 An external schema is also known as view schema.
 The highest level of abstraction describes the various user views.
 It describes the part of the database that a particular user group is interested in
and hides the rest of the database from that user group.
 Usually a high-level model is used here.
 The view schema describes the end user interaction with database systems.
Mapping:
 The processes of transforming requests and results between levels are called
mappings.
 In a DBMS based on the three-schema architecture, each user group refers to its
own external schema.
 Hence, the DBMS must transform a request specified on an external schema into a
request against the conceptual schema, and then into a request on the internal
schema for processing over the stored database.
 If the request is a database retrieval, the data extracted from the stored database
must be reformatted to match the user's external view.
 There are basically two types of mapping in the database
architecture:
◦ External/conceptual mapping
◦ Conceptual/ internal mapping
Conceptual/ Internal Mapping
 The Conceptual/ Internal Mapping lies between the conceptual level
and the internal level. Its role is to define the correspondence
between the records and fields of the conceptual level and files and
data structures of the internal level.
External/ Conceptual Mapping
 The external/Conceptual Mapping lies between the external level
and the Conceptual level. Its role is to define the correspondence
between a particular external and the conceptual view.
Physical DBMS Architecture
The physical Architecture describes the software components used to enter and
process data, and how these software components are related and
interconnected.
The physical DBMS Architecture divided into 2 parts:
 Back end

 Front end

Back end:
 The back end is responsible for managing the physical database and providing
necessary support and mappings for the internal, conceptual levels.
 Other benefits of DBMS, such as Security, integrity and access control are also
responsibility of the back end
Front end:
 The front end is really an application that runs on top of the DBMS. These
may be applications provided by the DBMS vendor or by a third party.
 The user interacts with the front end and may not even be aware that the back
end exists.
DATA – INDEPENDENCE
 Data independence is defined as a property of a DBMS that lets you
change the database schema at one level of a database system without
having to change the schema at the next higher level.
 Data Independence is one of the main advantages of DBMS

 The ability to modify a schema definition in one level without affecting


Schema definition in the next higher level is called “Data
Independence”
 There are two kinds of Data Independence

 1. Physical Data Independence

 2. Logical Data Independence


Physical Data Independence:
 The ability to change the internal schema without having to change the
conceptual schema. By extension, the external schema should not change as
well.
 Physical file reorganization to improve performance(such as creating access
structures) results in change to the internal schema.
 The physical data Independence allows changes in the physical storage devices
or organizations of the files to be made without requiring changes in the
conceptual view or any of the external view.
 If we do any changes in the storage size of the database system server, then
the Conceptual structure of the database will not be affected.
 Physical data independence is used to separate conceptual levels from the
internal levels.
 Physical data independence occurs at the logical interface level.

Example:
 Using new storage devices like Hard drive or Magnetic tapes

 Modifying the file organization technique in the database.
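Adding an access structure such as an index is a change to the internal schema only; a sketch via sqlite3 (table, column, and index names are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (rollno INTEGER, name TEXT)")
con.executemany("INSERT INTO student VALUES (?, ?)",
                [(i, f"s{i}") for i in range(100)])

query = "SELECT name FROM student WHERE rollno = 42"
before = con.execute(query).fetchall()

# Change the internal schema: add an access structure (an index)
# to speed up lookups by rollno.
con.execute("CREATE INDEX idx_rollno ON student (rollno)")

# The conceptual schema -- and every query written against it -- is
# unchanged, and the query returns the same result.
after = con.execute(query).fetchall()
```
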


Logical Data Independence
 Logical data independence is the ability to change the conceptual schema
without changing the external views, external APIs, or application programs.
 When data is added or removed, only the view definition and mappings need
to be changed in a DBMS that supports logical data independence.
 If the logical schema undergoes a logical reorganization, application programs
that reference the external schema constructs must work as before.
 Logical data independence is used to separate the external level from the
conceptual view.
 Logical data independence occurs at the user interface level.

 Logical data independence is more difficult to achieve than physical data


independence.
Example
 Add/modify/delete a new attribute

 Merging two records into one
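Adding an attribute to the conceptual schema while an external view keeps working can be sketched in sqlite3 (the roster view and all names are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (rollno INTEGER, name TEXT)")
con.execute("INSERT INTO student VALUES (1, 'Somu')")

# An external view that existing application programs depend on.
con.execute("CREATE VIEW roster AS SELECT rollno, name FROM student")

# Change the conceptual schema: add a new attribute to the table.
con.execute("ALTER TABLE student ADD COLUMN email TEXT")

# Programs written against the external view work exactly as before.
roster = con.execute("SELECT * FROM roster").fetchall()
```
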


DATABASE LANGUAGES:
 Database Languages: A DBMS has appropriate languages and
interfaces to express database queries and updates.
 Database languages can be used to read, store and update the data in
the database.
Types of Database Language:
A DBMS supports a variety of users and must provide appropriate
languages and interfaces for each category of users
1.DDL(Data Definition language): used (by the DBA and/or database
designers)to specify the conceptual Schema.
Ex: CREATE, DROP, TRUNCATE, ALTER, RENAME statements
 SDL(Storage Definition Language):used to specify the internal schema
 VDL(View Definition Language):used for specifying the external schemas
2.DML(Data Manipulation Language):used for performing operations
such as retrieval and update upon the populated database.
Ex: SELECT, DELETE, INSERT, and UPDATE statements
3.TCL(Transaction Control language):it is used to manage different
transactions occurring within a database
Ex: COMMIT, ROLLBACK and SAVEPOINT statements
4.DCL (Data Control Language):it is used to create roles, permission and
referential integrity as well it is used to control access to database by securing it
Ex: GRANT and REVOKE statements
1.Data Definition language (DDL)
 DDL stands for Data Definition Language. It is used to define database
structure or pattern.
 It is used to create schema, tables, indexes, constraints, etc. in the
database.
 Data definition language is used to store the information of metadata
like the number of tables and schemas, their names, indexes, columns in
each table, constraints, etc.
 The data definition language (DDL) is used by the DBA and by database
designers to define both the conceptual and internal schemas when no
strict separation of levels is maintained.
 The DBMS will have a DDL compiler whose function is to process
DDL statements in order to identify descriptions of the schema
constructs and to store the schema description in the DBMS catalog.
 Storage Definition Language (SDL): The storage definition
language (SDL), is used to specify the internal schema. The mappings
between the two schemas may be specified in either one.
 View Definition Language (VDL): View definition language is used
to specify user views and their mappings to the conceptual schema.
Here are some tasks that come under DDL:
 Create: It is used to create objects in the database.

 Alter: It is used to alter the structure of the database.

 Drop: It is used to delete objects from the database.

 Truncate: It is used to remove all records from a table.

 Rename: It is used to rename an object.

 Comment: It is used to add comments to the data dictionary.

These commands are used to update the database schema that's why they
come under Data definition language.
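The DDL statements above can be tried out from Python against SQLite. This is a hedged sketch, not a full tour: SQLite has no TRUNCATE (a DELETE without a WHERE clause plays that role), and the table and column names here are invented for illustration:

```python
import sqlite3

# Sketch of DDL statements run through SQLite (names are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")

conn.execute("ALTER TABLE student ADD COLUMN age INTEGER")   # alter the structure
conn.execute("ALTER TABLE student RENAME TO learner")        # rename an object

# Inspect the schema the DDL produced (column names, in order).
cols = [row[1] for row in conn.execute("PRAGMA table_info(learner)")]
print(cols)  # ['roll_no', 'name', 'age']

conn.execute("DROP TABLE learner")                           # delete an object
```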
2. Data Manipulation Language(DML)
 DML stands for Data Manipulation Language.

 It is used for accessing and manipulating data in a database.

 Data manipulation language (DML) statements are used to perform
manipulation operations such as retrieval, insertion, deletion, and
modification of the data.
Here are some tasks that come under DML:
 Select: It is used to retrieve data from a database.

 Insert: It is used to insert data into a table.

 Update: It is used to update existing data within a table.

 Delete: It is used to delete records from a table.

 Merge: It performs UPSERT operation, i.e., insert or update operations.

 Call: It is used to call a PL/SQL or Java subprogram.

 Explain Plan: It describes the access path the optimizer will use to
execute a statement.

 Lock Table: It controls concurrency.
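The four core DML statements (INSERT, UPDATE, DELETE, SELECT) can be sketched against SQLite from Python; the schema and data below are made up for illustration:

```python
import sqlite3

# Sketch of the core DML statements (illustrative names and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")

# Insert: add rows to the table.
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [(1, 'Ravi', 40000), (2, 'Mina', 45000)])

# Update: modify existing data.
conn.execute("UPDATE emp SET salary = salary + 5000 WHERE id = 1")

# Delete: remove matching records.
conn.execute("DELETE FROM emp WHERE id = 2")

# Select: retrieve data.
rows = conn.execute("SELECT name, salary FROM emp").fetchall()
print(rows)  # [('Ravi', 45000.0)]
```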


3. DCL (Data Control Language)
 DCL stands for Data Control Language.

 It is used to control access to the data stored in the database.

 DCL execution is transactional and has rollback parameters. (But in the
Oracle database, the execution of data control language cannot be rolled
back.)
 Ex: an employee in the admin department has nothing to do with records
related to the finances of the organisation. So, it is wise to restrict the
employee to only the required tables and data. For this purpose, the
database provides a set of commands.
Here are some tasks that come under DCL:
 Grant: It is used to give users access privileges to a database; it
allows specified users to perform specified tasks.
 Revoke: It is used to take back permissions from the user.

There are the following operations which have the authorization of Revoke:
CONNECT, INSERT, USAGE, EXECUTE, DELETE, UPDATE and SELECT
4.Transaction Control Language (TCL)
 TCL is used to manage the changes made by DML statements.

 TCL statements can be grouped into logical transactions.

Here are some tasks that come under TCL:


 Commit: It is used to save the transaction on the database.

 Rollback: It is used to restore the database to its state as of the
last Commit.
“Statements which are used to manage the changes made by DML
statements are called Transaction Control language. Used to manage
transactions in a database. It allows statements to be grouped together
into logical transactions”
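Python's sqlite3 connection exposes COMMIT and ROLLBACK directly as methods, which makes the behaviour above easy to demonstrate. A minimal sketch (the account table is an invented example):

```python
import sqlite3

# Sketch of TCL behaviour: COMMIT makes changes permanent, ROLLBACK
# undoes everything since the last commit (illustrative schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO account VALUES (1, 100.0)")
conn.commit()    # COMMIT: save the transaction on the database

conn.execute("UPDATE account SET balance = balance - 500")
conn.rollback()  # ROLLBACK: restore the database to the last commit

balance = conn.execute("SELECT balance FROM account").fetchone()[0]
print(balance)  # 100.0 -- the uncommitted update was undone
```

Note that Python's sqlite3 opens a transaction implicitly before DML statements, so the UPDATE above is pending until either commit() or rollback() is called.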
DBMS Interfaces
Interfaces are the programs which convert the system language into a
user-understandable language and the user's language into a
system-understandable language. An interface thus works like a translator,
providing an environment in which a user can easily operate the system.
Types of interfaces provided by the DBMS include
 Menu-Based interface for web clients or browsing

 Forms-based interfaces

 GUI’s

 Natural Language interfaces

 Interfaces for parametric users

 Interfaces for the DBA


 Menu-Based interface for web clients or browsing:
These interfaces are used by stand-alone users. They present a set
of options, called a menu, to the user, from which the user can select an option
 Forms-based Interfaces: These are used by parametric users. In this
type a form is displayed to the user, who has to fill in all the required
information through the form fields
 Graphical user Interfaces (GUIs): These are used by both parametric and
stand-alone users. A GUI combines form-based and menu-based interaction and
displays a schema to the user in diagram form. The user can select his/her
options using a mouse pointer.
 Natural language Interfaces: These interfaces are used by sophisticated
users. They accept requests written in English, and the interface has its own
schema against which it interprets them
 Interfaces for Parametric users: Parametric users have a small set of
operations they perform. Analysts and programmers design and implement a
special interface for each class of naive users
 Interfaces for the DBA: Systems contain privileged commands only for
DBA staff. Including commands for creating accounts, authorizing accounts,
reorganizing the storage structure etc.
Database System Environment
The term database system refers to an organisation of components that
define and regulate the collection, storage, management, and use of data
within a database environment.
In high level view the database system is composed of the following five
major components:
 Hardware Components in a database System Environment

 Software Components in a database System Environment

 People in a Database System Environment

 Procedure in a Database Environment

 Data in the Database

1. Hardware Components in a database System Environment


Hardware includes all the system’s physical devices: computers,
computer peripherals, network components, etc.
2. Software Components in the database system Environment
To make the database system work properly, three types of software are
needed: Operating system, DBMS Software, and Application programs.
 Operating System: It manages all hardware components and allows
other software to run on the computers. Ex: Windows, Linux, etc
 DBMS Software: It manages the database within the database
system. Some examples of DBMS software include Oracle, Access,
MySQL, etc
 Application programs: These are used to access and manipulate
data in the DBMS and to manage the computer environment in which
data access and manipulation take place. Application programs are
most commonly used to access data to generate reports. Most of the
application programs provide GUI.
3. People in Database System Environment:
It includes all users associated with the database system. On the basis
of primary job function we can identify five types of users in a database
system.
System Administrator: They supervise the database system’s general
operations
Database Administrators: They are also known as DBAs. They
manage the DBMS and ensure that the database functions properly.
Database Designers: They design the database structure. They are
database architects. As this role is very critical, the designer's
responsibilities are correspondingly large.
Systems Analyst and programmers: They design and implement the
application programs. They design and create the data entry screens,
reports and procedures through which end users can access and
manipulate the data
End users: They are the people who use the application. Ex: Sales clerks,
supervisors, and managers are classified as end users.
4. Procedure in a Database Environment
 Procedures are the instructions and business rules that govern the
design and use of the database system.
 Procedures are critical component of the system. Procedures play an
important role in a company because they enforce the standards by
which business is conducted in an organisation.

5. Data in the Database


 Data are the very important basic entity in a database.

 It is the collection of facts stored in the database.

 Data is the raw material from which information is generated


Centralized DBMS Architecture
 Used mainframes to provide main processing for user application
programs, user interface programs and DBMS functionality
 Users accessed the system through ‘dumb’ computer terminals that only
provided display capabilities, with no processing capabilities.
 All processing was performed remotely on the computer system, and
only display information was sent to the terminals, connected through
network.
 Dumb terminals were replaced with workstations, which led to the
client/server architecture.
 “ A Centralized DBMS in which all the DBMS functionality,
application program execution and user interface processing
were carried out on a single machine.”
Distributed DBMS
 A Distributed Database is a single logical database that is spread
physically across computers in multiple locations that are connected by a
data communications network.
 A DDB is a collection of logically related data distributed across several
machines interconnected by a computer network.
 An application program operating on a DDB may access data stored at
more than one machine
 A set of co-operating databases, each resident at a different site that the
user views and manipulates as a centralized database
“A Distributed DBMS (DDBMS) is a software system that permits the
management of the distributed database and makes the distribution
transparent to users.”
Advantages of Distributed DBMS
 Naturally distributed data

 Reliability and availability

 Controlled data sharing

 Improved performance

Disadvantages of Distributed DBMS


 Complexity

 Cost, Security

 Database design is more Complex

Functions of Distributed DBMS


 Extended Communication Services and Data Dictionary

 Distributed query processing

 Extended Concurrency control and Recovery services.


Difference between Centralized & Distributed database

Centralized DBMS                          | Distributed DBMS
------------------------------------------|------------------------------------------
Database is maintained at one site        | Database is maintained at a number of
                                          | different sites
If the centralized system fails, the      | If one site fails, the system continues
entire system is halted                   | to work with the other sites
Less reliable                             | More reliable
Low performance                           | High performance

Client – Server Architecture
 Define specialized servers with specific functionalities (file servers,
print servers, web servers, database servers, etc.)
 Many client machines can access resources provided by specialized
server
 Client machine provide user with appropriate interfaces to utilize
servers, as well as with local processing power to run local applications
 Some machines are client sites, with client software installed, and
other machines are dedicated servers
 Client – a user machine that provides user interface capabilities and
local processing
 Server – machine that provides services to client machines such as file
access, printing, and database access.
 Client – Server computer systems architecture emerged with new
computing environments; Large no of PC’s, workstations, specialized
server and equipments are connected together via a network
An architecture in which the user’s PC (the client) is the requesting
machine and the server is the supplying machine, both of which are
connected through a local area network (LAN) or a wide area network
(WAN) such as the Internet
Two Tier Client/Server Architecture for DBMS
In relational DBMSs, user interface and application programs were
first moved to the client side
 SQL provided a standard language, which was a logical dividing point
between client and server
 Query and transaction functionality remained on server side. In this
architecture, the server is called query server, or transaction server.
 In relational DBMSs, the server is often called an SQL server, because
most RDBMSs use SQL
 In such systems, the user interface and application programs run on
the client, when DBMS access is needed, the program establishes a
connection to the DBMS on the server side. Once the connection is
created, the client can communicate with the DBMS.
 ODBC (Open Database Connectivity) is a standard that provides an
application programming interface (API) which allows client-side programs
to call the DBMS, as long as both sides have the required software. Most
database vendors provide ODBC drivers for their systems
 Query requests are sent from the client to the server; the server
processes each request and sends the results back to the client



Three – Tier Client Server Architecture of DBMS
 Many web applications use a three-tier architecture, which adds an
intermediate layer between the client and the database server.
 The middle tier is called the application server or the web server. It
plays an intermediary role by storing the business rules
(procedures/constraints) used to access data from the database
 It can improve database security by checking the client’s credentials
before forwarding a request to the database server
 Clients contain GUI interfaces and application-specific rules

 The intermediate server accepts requests from the client, processes
each request, and sends the database commands to the database server;
it then passes the data from the database server back to the client,
where it may be processed further and filtered
 The three tiers are: user interface, application rules, and data access
Classification of DBMS
The DBMS can be classified into different categories on the basis of
several criteria such as the data model they are using, number of users
they support, number of sites over which the database is distributed and
purpose they serve.
1.Data Model Classification
 Relational data model
 Hierarchical data model
 Network data model
 Object-Oriented data model
 Object-Relational data model
2.Number of Users
 Single-user system – the database resides on one
computer and is accessed by only one user at a time
 Multiuser system – multiple users can access the database simultaneously. In
a multiuser DBMS, the data is both integrated and shared
3. Number of sites
 Centralized – data is stored at a single site

 Distributed – the database and DBMS software are distributed over many
sites connected by a network

 Homogeneous – the same DBMS software is used at the multiple sites

 Online Transaction Processing (OLTP) – used for data entry and
retrieval. Ex: Banking, airlines, etc
4.Cost
 Low-end system under $3000

 High-end system, over $100,000

5. Purpose
 General purpose

 Special Purpose

A DBMS is usually a general-purpose software system, but it can also be
designed for a specific purpose such as airline or railway reservation.
DATA MODELLING USING ENTITY-RELATIONSHIP MODEL

CHAPTER-3
(UNIT 2)
High Level Conceptual Data Models for Database Design
 A data model is collection of concepts for
describing the data in a database.
 A schema is a description of a particular
collection of data, using a given data model.
 The relationship between data model, schema,
and phases of design are as follow:
The process of database design is divided into different
parts. It includes the following:
 Requirement Collection and Analysis

 Conceptual Design

 Logical Design

 Physical Design
Phases of Database Design
Requirement Collection and Analysis :
 In this phase a detailed analysis of the requirement is done.

 The requirements collection and analysis phase produces
both data requirements and functional requirements.
 The data requirements are used as a source for the database
design.
 In parallel with specifying the data requirements ,it is useful
to specify the functional requirements of the application.
 The functional requirements are the source of application
software design.
 Various information-gathering methods, such as interviews and
analysis of documents, are used to get a clear
understanding of the requirements.
Conceptual design:
 Once all the requirements have been collected and
analyzed, the next step is to create a conceptual
schema for the database, using a high level
conceptual data model.
 This phase is called conceptual design.
 The Conceptual schema describes the data
requirements of the users, entity types, relationships,
and constraints.
Logical Design:
 This step involves the implementation of the data
model.
 Since the conceptual schema is transformed from high
level model into the implementation data model, this
step is known as Data model mapping.
 Logical Database Design is the process of
describing a model of the information in an
enterprise based on the chosen database
model.
 This results in a database schema in the
implementation data model such as relational
or object –relational database model.
Physical Design :
 Physical database design is the process of
describing the implementation of the
database on the disk.
 It describes the internal storage Structures,
indexes, access paths, base relations, security
issues and constraints.
Overall database design involves the following
steps:
 Identifying all the required files ( files are called
record types).
 Identifying the fields ( attributes) of each of these
record types.
 Identifying the primary key of each of these files.

 Identifying the relationship between record types.


Example of Database Application
 An example database application is COMPANY Database.

 We need to create a database schema design based on the following


(simplified) requirements of the COMPANY Database:
 The company is organized into DEPARTMENTs. Each department
has a name, number and an employee who manages the department.
We keep track of the start date of the department manager. A
department may have several locations.
 Each department controls a number of PROJECTs. Each project has
a unique name, unique number and is located at a single location.
 We store each EMPLOYEE’s social security number, address, salary,
sex, and birth date.
 Each employee works for one department but may work on several
projects.
 We keep track of the number of hours per week that an employee
currently works on each project.
 We also keep track of the direct supervisor of each employee.

 Each employee may have a number of DEPENDENTs. For each


dependent, we keep track of their name, sex, birth date, and
relationship to the employee.
Entity and Attributes
Entity: Entities are specific objects or things in the mini-world that are
represented in the database. (or)
An entity is any object in the system that we want to model and store
information about; each such individual object is called an entity.
For example: the EMPLOYEE John Smith, the Research DEPARTMENT,
the ProductX PROJECT
Tangible Entity: Tangible Entities are those entities which exist in the real
world physically.
Example: Person, car, etc.
Intangible Entity: Intangible Entities are those entities which exist only
logically and have no physical existence.
Example: Bank Account, etc.
Attributes:
 “An attribute is a property of an entity.
 For Ex: a person has an age, a car has a colour.”

Types of Attributes
There are different types of Attributes:
1.Simple Attribute 2. Composite Attribute
3.Single valued Attribute 4. Multi valued Attribute
5.Stored Attribute 6. Derived Attribute
7.Complex Attribute 8. Null value Attribute
1.Simple Attribute (Atomic Attribute): A simple attribute consists
of a single atomic value and cannot be subdivided.
Ex: the attributes age, sex, etc., are simple attributes
2.Composite Attribute: A composite attribute is an attribute that can
be further subdivided; its value is not atomic.
Ex: The attribute ADDRESS can be subdivided into street, city,
state, and pin code
3.Single Valued Attributes: A single-valued attribute can have only a
single value. For Ex: a person has one ‘date of birth’, one ‘age’, etc.
Each can have only a single value, but it can be a simple or composite
attribute: ‘date of birth’ is a composite attribute, while ‘age’ is a
simple attribute.
4. Multi-valued Attribute: Multi-valued Attributes can have multiple
values. For Ex: a person may have multiple phone numbers,
multiple degrees, etc. Multi-valued attributes are shown by a double
line connecting to the entity in the ER diagram.
5. Stored Attribute:
An attribute whose value cannot be derived from other attributes is said
to be a stored attribute.
Ex: Date of Birth.
Here Age and DOB are related attributes: using the birth date together
with the current date (today), the age can be determined.
6. Derived Attribute: An attribute whose value is derived
from a stored attribute. The value for the attribute ‘AGE’ can be
derived by subtracting the DOB from the current date.
7. Complex Attribute: A complex attribute is one that is both composite
and multi-valued
Ex: if a person can have more than one residence and each residence can
have multiple phones.
 Here, phone number and email are examples of multi-valued
attributes and address is an example of the composite attribute,
because it can be divided into house number, street, city, and state.
8. Null Value Attribute: A particular entity may not have an applicable
value for an attribute.
For Ex: The apartment number attribute of an address applies only to
the addresses that are in apartment buildings and not to other types of
residences, such as single-family homes
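The stored/derived distinction above can be made concrete: DOB is stored in the table, while AGE is computed at query time rather than stored. A sketch using SQLite's date functions; the names are illustrative, and a fixed reference date stands in for "today" so the result is reproducible:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DOB is a stored attribute; AGE is derived from it at query time.
conn.execute("CREATE TABLE person (name TEXT, dob TEXT)")
conn.execute("INSERT INTO person VALUES ('Angel', '2004-03-15')")

# Derive age relative to a fixed reference date (a hypothetical 'today').
age = conn.execute(
    "SELECT CAST((julianday('2024-01-01') - julianday(dob)) / 365.25 AS INTEGER) "
    "FROM person"
).fetchone()[0]
print(age)  # 19
```

Because AGE is derived, it never goes stale the way a stored copy would; the trade-off is that it must be recomputed on every query.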
Entity Type
 The entity type is a collection of entities having
similar attributes.

 In the above Student table example, we have each row


as an entity and they are having common attributes i.e
each row has its own value for attributes Roll_no, Age,
Student_name and Mobile_no.
 So, we can define the above STUDENT table as an
entity type because it is a collection of entities having
the same attributes.
Entity Set
 Entity Set is a collection of entities of the same entity
type.
 We can say that entity type is a superset of the entity
set as all the entities are included in the entity type.
 Example 1: In the above example, two entities
E1 (2, Angel, 19, 8709054568) and E2(4,
Analisa, 21, 9847852156) form an entity set.
Types of Entity :
 Strong Entity: A Strong entity is not dependent on any other entity
in the schema.
 A strong entity will always have a primary key.

 Strong entities are represented by a single rectangle.

 The relationship of two strong entities is represented by a single


diamond.
Weak Entity: A weak entity depends on another entity. It is an entity
that does not have a primary key attribute
 Double rectangle represents weak entity.
 The relation between one strong and one weak entity is represented by a
double diamond.
 This relationship is also known as identifying relationship.
Key Attributes of an Entity Type
 A key is an attribute or set of attributes that help to uniquely Identify
a tuple in relation (table).
 Such attribute is called Key Attribute and its value can be used to
identify each entity uniquely.
 From the above example, which attribute is the key
attribute?
 Can the name attribute be the key attribute?
 No, it cannot be the ‘key attribute’ because two or
more employees can have the same name.
 Similarly, two or more employees may live at the
same address.
 The ID attribute of the employee entity type is the
‘key attribute.’
 The ID value of no two employees can be the same.
Different types of Keys
1. Candidate Key
 A candidate key is an attribute or set of attributes that can uniquely
identify the rows of a table.
 The primary key of a table is decided based on one of
the candidate keys. So, candidate keys have the same
properties as the primary keys.
 For example: In the EMPLOYEE table, id is best suited for the
primary key. The rest of the attributes, like SSN, Passport_Number,
License_Number, etc., are considered a candidate key.
2. Primary Key :
 The Primary Key is an attribute that can uniquely
identify a table.
 A table can have only one primary key.

 Out of all the chosen candidate keys, one of the keys


should be selected as the primary key.
3. Alternate Key
 An alternate key is a secondary candidate key that has all the
properties of a candidate key but was not chosen as the primary key.
 All the candidate keys which are not selected as
primary keys are known as Alternate Keys.
 For example, the employee relation has two attributes, Employee_Id
and PAN_No, that act as candidate keys. In this relation,
Employee_Id is chosen as the primary key, so the other candidate key,
PAN_No, acts as the alternate key.
4. Foreign Key
 Foreign Key is used to establish relationships between
two tables.
 A single table can have multiple foreign keys.

 A foreign key can have NULL values.


 The table with the foreign key is called the child table, and the table
with the primary key is called the referenced or parent table.
5. Super Key
 Super Key is a single attribute or combination of
attributes that can be used to uniquely identify a row
in a table.
 A single table can have multiple super keys.

 A candidate key and primary key can be a super key,


but the reverse does not hold true.
 For example: In the above EMPLOYEE table, for (EMPLOYEE_ID,
EMPLOYEE_NAME), the names of two employees can be the same,
but their EMPLOYEE_IDs can't be the same. Hence, this combination can
also be a key.
6. Composite Key
 A composite key is a combination of multiple columns
that can uniquely identify tuples.
Real-Life Examples:
 Name (Sam Daniel Mccormick)

 First_Name = Sam

 Middle_Name = Daniel

 Last_Name = Mccormick

7. Unique Key:
 A unique key is a group of one or more than one
fields or columns of a table which uniquely identify
database record.
 A primary key cannot take a NULL value, but a unique key can
have one NULL value as its value.
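Several of the key kinds above can be seen in a single small schema. This is a hedged sketch using SQLite (table and column names are invented; note SQLite enforces foreign keys only when the pragma is switched on):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("""
    CREATE TABLE department (
        dept_id   INTEGER PRIMARY KEY,   -- primary key: unique and not null
        dept_name TEXT UNIQUE            -- unique key: may hold a NULL
    )""")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        pan_no  TEXT UNIQUE,                            -- alternate/candidate key
        dept_id INTEGER REFERENCES department(dept_id)  -- foreign key (may be NULL)
    )""")
conn.execute("INSERT INTO department VALUES (1, 'Research')")
conn.execute("INSERT INTO employee VALUES (10, 'ABCDE1234F', 1)")

# A foreign-key value with no matching parent row is rejected.
try:
    conn.execute("INSERT INTO employee VALUES (11, 'XYZAB5678K', 99)")
    ok = True
except sqlite3.IntegrityError:
    ok = False
print(ok)  # False -- department 99 does not exist
```

Here employee is the child table and department the referenced (parent) table, matching the terminology above.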
Relationship :
 A relationship describes a relation between entities.
Relationships are represented using diamonds.
 For example, employee entity has relation belongs to
with department.
Relationship Type
 Relationships of the same type are grouped
or typed into a relationship type.
 It represents the association between entity types.

 For Ex: ‘Manages’ is a relationship type that exists between the
entity types Manager & Employee.
 In ER diagram relationship diagram is represented by
diamond shape & connecting the entity with lines.

Manager --- Manages --- Employee
Relationship Instances
A relationship instance is an instance that
associates an entity from one entity type with
an entity of another entity type, in
order to establish a relationship among the
various participating entity types.
 Ex: if you have an entity set Employees and another entity
set Departments, you might define a relationship set
Works_In which associates members of those two entity
sets.
Relationship Set
 A Relationship set is a set of relationships of the same type.

Degree of a Relationship Set


 The number of different entity sets participating in a
relationship set is called the degree of a relationship
set.
Unary relationship set
Binary relationship set
Ternary relationship set
N-ary relationship set
1. Unary Relationship Set-
 Unary relationship set is a relationship set where
only one entity set participates in a relationship set.
Ex : One person is married to only one person
2. Binary Relationship Set-
 Binary relationship set is a relationship set where two
entity sets participate in a relationship set.
 EX: A Student is enrolled in a Course
3. Ternary Relationship Set-
 Ternary relationship set is a relationship set where
three entity sets participate in a relationship set.
Example –
 4. Quaternary Relationship Set: Quaternary relationship set
is a relationship set where four entity sets participate
in a relationship set.
5. N-ary Relationship Set-
 N-ary relationship set is a relationship set where ‘n’
entity sets participate in a relationship set.

Relationship between Entity Sets


The Relationship between entity sets are of four types:
 One-to-One Relationship (1:1)

 One-to-Many Relationship (1:N)


 Many-to-One Relationship (M:1)
 Many-to-Many Relationship (M:N)
1.One-to-One Relationship (1:1)
 one entity from entity set A can be associated with at most one
entity of entity set B and vice versa.
Ex: College can have only one Principal (1:1)

2. One-to-Many Relationship (1:N)


 One entity from entity set A can be associated with more than
one entity of entity set B, but one entity from entity set B can
be associated with at most one entity of entity set A
Ex: College offers many Courses like BCA, B.com, etc (1:N)
3. Many-to-One Relationship (M:1)
 More than one entity from entity set A can be associated
with at most one entity of entity set B, but one entity from
entity set B can be associated with more than one entity
from entity set A.
Ex: (M:1) Students study Course
4. Many-to-Many Relationship (M:N)
 one entity from A can be associated with more than one entity from B
and vice versa.
Ex: (M:N) Students are enrolled in classes; one lecturer teaches
many students and a student is taught by many lecturers, so it is a many-
to-many (M:N) relationship.
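In a relational schema an M:N relationship is typically implemented as a separate "junction" table whose primary key combines the keys of both sides. A hedged sketch using SQLite (all names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course  (cid INTEGER PRIMARY KEY, title TEXT);
    -- The M:N relationship becomes its own table; the composite
    -- primary key lets a student take many courses and vice versa.
    CREATE TABLE enrolled (
        sid INTEGER REFERENCES student(sid),
        cid INTEGER REFERENCES course(cid),
        PRIMARY KEY (sid, cid)
    );
    INSERT INTO student VALUES (1, 'Angel'), (2, 'Analisa');
    INSERT INTO course  VALUES (10, 'BCA'),  (20, 'B.Com');
    INSERT INTO enrolled VALUES (1, 10), (1, 20), (2, 10);
""")

# Recover the relationship by joining through the junction table.
rows = conn.execute("""
    SELECT s.name, c.title
    FROM student s JOIN enrolled e ON s.sid = e.sid
                   JOIN course   c ON c.cid = e.cid
    ORDER BY s.name, c.title
""").fetchall()
print(rows)
```

The junction table is also the natural home for relationship attributes, such as Hours Per Week on WORKS_ON in the COMPANY example.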

Relationship as Attribute
 whenever an attribute of one entity type refers to another
entity type, some relationship exists.
For example,
 the attribute Manager of DEPARTMENT refers to
an employee who manages the department;
 the attribute Controlling department of PROJECT refers
to the department that controls the project;
 the attribute Supervisor of EMPLOYEE refers to another
employee (the one who supervises this employee);
 the attribute Department of EMPLOYEE refers to
the department for which the employee works; and so on.
 Some instances of WORKS_FOR relationship set, which represents a
relationship type WORKS_FOR between EMPLOYEE and DEPARTMENT,
which associates each employee with the department for
which the employee works in the corresponding entity set.
 Relationship types can have attributes; for 1:1 (or 1:N) relationships,
the attributes can be migrated to one of the participating entity types.
 Each instance ri is shown connected to the EMPLOYEE
and DEPARTMENT entities that participate in ri. In
this miniworld, employees e1, e3, and
e6 work for department d1; employees e2 and e4
work for department d2; and employees e5 and e7
work for department d3.
Attributes of Relationship types:
 A relationship type can have attributes;
 for example, Hours Per Week of WORKS_ON; its
value for each relationship instance describes the
number of hours per week that an EMPLOYEE
works on a PROJECT.
[ER diagram: EMPLOYEE — Works_On — PROJECTS, with the attribute
Hours_Per_Week attached to Works_On]
Role Names and Recursive Relationship :
Role Name:
 Each entity type in a relationship plays a particular role.
 The role name specifies the role that a participating entity type plays in
the relationship.
Ex : In the relationship between Employee and Department, the Employee
entity type plays the employee role and the Department entity type plays
the department(or) employer role.
Recursive Relationship :
A recursive relationship is one in which an entity is associated with itself.
For example, an employee may manage many employees, and each
employee is managed by one employee.
 If an entity type appears more than once in a relationship, it is said to
be recursive.
Example : The supervision relationship is a recursive
relationship. The supervision relationship type relates an
employee to a supervisior,where both employee and
supervisor entities are members of the same Employee
entity type.
Here the Employee entity participates twice in
SUPERVISION; once in the role of supervisor and once in
the role of supervisee.
Constraints on Relationship types
Structural constraints limit the possible combinations of entities that can
participate in a relationship instance. Structural constraints of a
relationship type are:
 Cardinality ratio
 Participation
Cardinality ratio :
 The cardinality ratio for a binary relationship defines
the maximum number of times a particular instance of
an entity can be related to the instance of another
entity.
 Three types of cardinality ratios are 1:1, 1:N, and N:M,
respectively, for a One-to-One, One-to-Many, and Many-
to-Many relationship.
One-to-One Cardinality (1:1)
 An entity in A is associated with at most one entity in
B in this form of cardinality mapping.
One-to-Many Cardinality (1:M)
 An entity set A is associated with any number of
entities in B (including zero), and each entity in B is
associated with just one entity in A.
Many-to-One Cardinality (M:1)
 An entity set in A can only be associated with one entity
in B. In contrast, an entity set in B can be associated
with any number of entities in A, including zero.
Many-to-Many Cardinality (M:M)
 Any entity in A and B is associated with many entities
in B and A, respectively, which means many rows in
the first table are associated with many rows in the
second table.
 Here, this is many-to-many relationship because many
employees works on many projects.
Participation Constraints :
 In a Relationship, Participation
constraint specifies the presence of an entity
when it is related to another entity in a
relationship type. It is also called the minimum
cardinality constraint.
 This constraint specifies the number of
instances of an entity that are
participating in the relationship type.
There are two types of Participation
constraint:
 Total participation
 Partial participation
1. Partial Participation
 Each entity in the entity set may or may not participate in
the relationship is known as partial participation.”
 Partial participation represents through a single line
between the entity set and relationship set.
 Partial participation is also known as optional
participation.
2. Total Participation
 Each entity in the entity set must participate in at least
one relationship instance in that relationship set is known
as total participation.
 Total participation represents through double line between
the entity set and relationship set.
 Total participation is also known as mandatory
participation.
 Let us explain with an example:
 According to Partial participation, there may exist
some courses for which no enrollments are made.
 According to Total participation, each student must
be enrolled in at least one course.
ER Diagram Notation :
 The graphical representation of organizational system elements
and the association among the elements is called ER Diagram.
 Entities, Attributes and Relationship forms the
components of ER Diagram and the defined
symbols and shapes are summarized below in
Table.
 Entity- this represents the name of an object,
person, thing, event, or place where data is stored.
This is usually represented by rectangles.
 Weak Entity- Weak entities are distinguished by being placed
in double rectangle and by having their identifying relationship
placed in double diamond.
The partial key of the weak entity type is underlined with a dotted
line.
 Attribute- refers to the unique characteristic or
property of an entity.
 Derived Attribute- refers to an attribute derived or
based on another attribute. Derived attributes are shown in
dotted ovals.
 Multivalued Attribute- a type of attribute that can
have multiple values. Multivalued attributes are shown in
double ovals.
 Relationship- defines the interaction between two
entities.
 Cardinality- this refers to the occurrences of a
relationship. In particular, it specifies the maximum
number of relationships between two entities.
 Optionality- describes whether a relationship is
mandatory or optional. It is also used to determine
the absolute minimum number of relationships.
ER DIAGRAM FOR A BANK DATABASE
Proper Naming of Schema Constructs:
 When designing a database schema, the choice of the
names for entity types, attributes, relationship types.
 In ER diagram use the convention that entity type and
relationship types are in uppercase letters
 Attributes names are capitalized and role names are in
lower case letters.
 A narrative description of the database requirements,
the nouns appearing in the narrative tend to give rise to
entity type names and verb tends to indicate names of
the relationship types.
 Attribute names generally arise from additional nouns
that describe the nouns corresponding to entity types.
Optionality and Cardinality
Symbols at the ends of the relationship lines indicate the
optionality and the cardinality of each relationship.
•“Optionality” express whether the relationship is optional or
mandatory.
•“Cardinality” express the maximum number of relationships.
•As a relationship line is followed from an entity to another, near
the related entity two symbols will appear. The first of those is the
optionality indicator.
•The Circle(O) indicates that the relationship is optional – the
minimum number of relationships between each instance of the
first entity and instances of the related entity is zero.
•It can be taken as circle as Zero or letter O for “Optional”.
•A Stroke (|) indicates that the relationship is mandatory the minimum
number of relationships between each instances of the first entity and
instances of the related entity is one.
•The Second Symbol indicates cardinality. A stroke(|) indicates that the
maximum number of relationship is one.
•A “Crows-foot” (< ) indicates that many such relationships between
instances of the related entities might exist.
In our model we wish to indicate that each school may enroll many
students, or may not enroll any students at all. We will also wish to
indicate that each student attends exactly one school.
The following diagram indicates the optionality and cardinality
Relationship types of Degree higher than two :
 Relationship types of degree 2 are called binary.
 Relationship types of degree 3 are called ternary and
of degree n are called n-ary.
 In general, an n-ary relationship is not equivalent to
n binary relationships.
 Constraints are harder to specify for higher-degree
relationships
(n > 2) than for binary relationships.
 In general, 3 binary relationships can represent
different information than a single ternary
relationship.
 If needed, the binary and n-ary relationships can all
be included in the schema design.
 In some cases, a ternary relationship can be
represented as a weak entity if the data model allows
a weak entity type to have multiple identifying
relationships
EXAMPLE OF A TERNARY RELATIONSHIP
 Suppose that CAN_SUPPLY,
between SUPPLIER and PART, includes an
instance (s, p) whenever supplier s can
supply part p (to any project); USES,
between PROJECT and PART, includes an
instance ( j, p) whenever project j uses part p;
and SUPPLIES,
between SUPPLIER and PROJECT, includes
an instance (s, j) whenever supplier s supplies
some part to project j. The existence of the three
relationship instances (s, p), (j, p), and (s, j)
in CAN_SUPPLY, USES, and SUPPLIES,
respectively, does not necessarily imply that an instance
(s, j, p) exists in the ternary relationship SUPPLY.
Abstraction
 The main purpose of data abstraction is to hide irrelevant data and
provide an abstract view of the data. With the help of data abstraction,
developers hide irrelevant data from the user and provide them the
relevant data.
The two main aspects of abstraction are:
1.Specialization
2.Generalization
Generalization
 Generalization is a bottom-up approach in which two lower entities
combine to form a higher level entity. The higher level entity can
also combine with lower level entity to make further higher level
entity.
Specialization
• Specialization is the opposite of generalization. It is a top-down
approach in which one higher-level entity can be broken into two
lower-level entities. In this approach, some higher-level entities may
not have a lower-level entity set at all.
Aggregation
Aggregation is a process the relation between two entities is treated as a
single entity. It compiles information on an object, thereby abstracting a
higher level object.
Three types of aggregations
Aggregate attribute values of an object to form the object
Represent an aggregate relationship as an ordinary relationship
Combine objects that are related by a relationship into a higher-level
aggregate object.
Difference between Generalization and Specialization
1. Generalization works in a bottom-up approach; Specialization works
in a top-down approach.
2. In Generalization, the size of the schema gets reduced; in
Specialization, the size of the schema gets increased.
3. Generalization is normally applied to a group of entities;
Specialization can be applied to a single entity.
4. Generalization can be defined as a process of creating groups from
various entity sets; Specialization can be defined as a process of
creating subgroupings within an entity set.
5. The Generalization process takes the union of two or more lower-level
entities to produce a higher-level entity set; Specialization is the
reverse of Generalization, taking a subset of a higher-level entity
set to form a lower-level entity set.
6. There is no inheritance in Generalization; there is inheritance in
Specialization.
ER Diagram for Student Database
Relational Model and Relational
Algebra
Relational Model Concepts
Relational Model Concepts :
⚫ 1. Attribute: Each column in a Table.
Attributes are the properties which define a
relation. e.g., Student_Rollno, NAME,etc.
⚫ 2. Tables – In the relational model, relations are
saved in the table format. A table is stored along
with its entities and has two properties: rows and
columns. Rows represent records and columns
represent attributes.
⚫ 3. Tuple – It is nothing but a single row of a
table, which contains a single record.
⚫ 4. Relation Schema: A relation schema
represents the name of the relation with its
attributes.
⚫ 5. Degree: The total number of attributes
⚫ 6. Cardinality: Total number of rows
present in the Table.
⚫ 7. Column: The column represents the set
of values for a specific attribute.
⚫ 8. Relation instance – Relation instance is
a finite set of tuples in the RDBMS system.
Relation instances never have duplicate
tuples.
⚫ 9. Relation key - Every row has one, two or
multiple attributes, which is called relation
key.
⚫ 10. Attribute domain – Every attribute has
some pre-defined value and scope which is
known as attribute domain.
Example: a CUSTOMER relation. The table name, CUSTOMER, is the
relation name; with 4 rows its cardinality (number of rows) is 4, and
with 3 columns its degree (number of columns) is 3.
The characteristic of a relation are as
follows:
Ordering of tuples in a relation r(R)
⚫ Relation is defined as a set of tuples (even
though they appear to be in the tabular form).
⚫ Elements have no order among them.
No duplicate tuples
⚫ A relation cannot contain two or more tuples
which have the same values for all the
attributes. i.e., In any relation, every row is
unique.
⚫ There is exactly one value for each attribute of a
tuple; an attribute of a tuple cannot hold a set of values.
⚫ Values in tuple: All Values are considered
Atomic ( Indivisible).
A Special null value used to represent values
that are unknown.
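The two characteristics above, no ordering among tuples and no duplicate tuples, can be sketched by modeling a relation as a Python set of tuples (the STUDENT data here is a toy example, not from the notes):

```python
# A relation modeled as a set of tuples: sets are unordered and
# automatically eliminate duplicates, mirroring the characteristics above.
STUDENT = {
    ("S1", "Anu", 19),
    ("S2", "Ravi", 20),
    ("S2", "Ravi", 20),   # duplicate tuple: silently collapses into one
}

no_duplicates = len(STUDENT) == 2   # the duplicate was eliminated

# Two relations with the same tuples are equal regardless of listing order
same_relation = STUDENT == {("S2", "Ravi", 20), ("S1", "Anu", 19)}
```

Note that a Python `list` would preserve order and duplicates; the `set` type is what matches the formal definition of a relation.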
Relational Integrity Constraints
Integrity Constraints :
⚫ Integrity constraints are a set of rules. It is
used to maintain the quality of information.
⚫ Integrity constraints ensure that the data
insertion, updating, and other processes have
to be performed in such a way that data
integrity is not affected.
Categories of Integrity Constraints
1. Inherent Model Based Constraints
2. Schema based Constraints
3. Application based constraints
4. Data Dependencies Constraints
1. Inherent Model Based Constraints :
⚫ Constraints that are inherent in the data
model are called as Inherent Model Based
Constraints.
Ex: The constraints that a relation cannot have
duplicate tuples in an inherent constraints.
2. Schema based Constraints :
⚫ Constraints that can be directly expressed in
the schemas of the data model, using DDL
(Data Definition Language).
⚫ Schema-based constraints include the following:
A. Domain constraints
B. Entity integrity constraints
C. Referential Integrity Constraints
D. Key constraints
E. Constraints on NULLs
A. Domain constraints :
⚫ Domain constraints can be defined as the
definition of a valid set of values for an
attribute.
⚫ The data type of domain includes string,
character, integer, time, date, currency, etc.
The value of the attribute must be available
in the corresponding domain.
⚫ Every domain must contain atomic
values(smallest indivisible units) which
means composite and multi-valued
attributes are not allowed.
⚫ Domain constraints specify that within each
tuple, the value of each attribute must be an
atomic value from the corresponding domain.
This is specified using data types.
B. Entity integrity constraints :
⚫ The entity integrity constraint states that
primary key value can't be null.
⚫ This is because the primary key value is used
to identify individual rows in relation and if
the primary key has a null value, then we can't
identify those rows.
⚫ A table can contain a null value other than the
primary key field.
C. Referential Integrity Constraints :
⚫ A referential integrity constraint is specified
between two tables.
⚫ In the Referential integrity constraints, if a
foreign key in Table 1 refers to the Primary
Key of Table 2, then every value of the
Foreign Key in Table 1 must be null or be
available in Table 2.
D. Key constraints :
⚫ Keys are attributes (or sets of attributes) that are
used to identify an entity within its entity set uniquely.
⚫ An entity set can have multiple keys, but out of
which one key will be the primary key. A
primary key can contain a unique and not null
value in the relational table.
E. Constraints on NULLs :
⚫ We can specify whether an attribute can have
NULL or not using this constraint.
⚫ For example, if we do not want a NULL value for
the student’s name then we can specify
and constraint it using NOT NULL.
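As a rough sketch, the schema-based constraints above can be declared directly in DDL. The following hypothetical STUDENT table (names and the CHECK rule are assumptions for illustration, not from the notes) shows this in SQLite:

```python
import sqlite3

# Declare key, entity-integrity, NOT NULL, and a domain-style constraint
# in one CREATE TABLE statement (SQLite syntax).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE STUDENT (
        Rollno INTEGER NOT NULL PRIMARY KEY,          -- key + entity integrity
        Name   TEXT NOT NULL,                         -- constraint on NULLs
        Age    INTEGER CHECK (Age BETWEEN 15 AND 99)  -- domain-style rule
    )""")
conn.execute("INSERT INTO STUDENT VALUES (1, 'Anu', 19)")
row_count = conn.execute("SELECT COUNT(*) FROM STUDENT").fetchone()[0]
```

Any later INSERT that breaks one of these declared rules is rejected by the DBMS rather than silently stored.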
3. Application based constraints :
⚫ Constraints that cannot be directly applied in the
data model's schemas. These are known as
application-based or semantic constraints.
⚫ Expressed and enforced by application program.
⚫ Ex: Student can not have a phone number more
than 10 digits.
4. Data Dependencies Constraints :
⚫ It includes functional dependencies and multi
valued dependencies they are used mainly for
Integrity Constraint over relations.
⚫ Integrity constraints are a set of rules. It is used
to maintain the quality of information.
⚫ Integrity constraints ensure that the data
insertion, updating, and other processes have to
be performed in such a way that data integrity is
not affected.
⚫ Thus, integrity constraint is used to guard
against accidental damage to the database.
Relational Databases and Relational Database Schemas
Relational Databases:
Relational Database is a database system that
stores and retrieves data in a tabular format
organized in the form of rows and columns.
Therefore a relational database is a collection of
relations with distinct relation names.
Relational Database Schemas:
A relational database schema is the collection of
schemas for the relations in the database
A database state that does not obey all the integrity
constraints is called an Invalid state
A state that satisfies all the integrity constraints is
called a Valid state.
Operations on Relations
The operations of the relational model can be
categorized into:
Retrieval
Update
⚫ Retrieval operations on Relations:
Retrieval operations are performed on the relation
to extract required information from relational
database.
SELECT operation is one of the example for
retrieval operation.
⚫ Update operations on Relations:
There are three basic update operations on relations:
Insert, Delete, and Modify (Update).
Insert: Insert is used to insert a new tuple or tuples into
a relation.
The Insert operation:
The SQL INSERT statement is used to insert a single
row or multiple rows into a table. Insert can violate any of
the four types of constraints, as described below.
⚫ This insertion satisfies all constraints, so it is
acceptable.
EX: insert<5010, “Ashwini”,10, 9710185288, 40,000>
into employee
⚫ Domain constraint can be violated if an attribute
value is given that does not appear in the
corresponding domain.
EX: insert<5010, “Ashwini”,10, 9710185288, 40,000,
“Java project”> into Employee
This insertion is not possible because it violates the
domain constraint: the extra value “Java project” does
not correspond to any attribute of the original EMPLOYEE relation.
⚫ Key constraint can be violated if a key value in the
new tuple already exists in another tuple in the
relation.
EX: insert<5010, “Vidya”,10, 9720685288, 45,000>
into Employee
This insertion violates the key constraint because
another tuple with the same Eno value already
exists in the EMPLOYEE relation, so it is rejected.
⚫ Entity integrity can be violated if the primary key
of the new tuple is null.
Ex: insert< “Anuradha”,20, 9758998743> into
Employee
This insertion violates the entity integrity constraint
(null for the primary key Eno), so it is rejected.
In this operation, the DBMS could ask the user to
provide a value for Eno and could accept the insertion
if a valid Eno value were supplied.
⚫ Referential integrity can be violated if the value
of any foreign key in t refers to a tuple that
does not exists in the referenced relation.
EX: insert<5040, “Vasudha”,40, 9448050836,
30,000> into Employee
This insertion operation violates the referential
integrity constraint specified on Dno because the
Dno = 40 of EMPLOYEE does not exist in the
DEPARTMENT relation at all
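The rejected insertions above can be reproduced with a small sketch in SQLite. The table and column names follow the notes' EMPLOYEE/DEPARTMENT examples, but the exact schema here is an assumption; each violating INSERT raises an integrity error:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE DEPARTMENT (Dno INTEGER PRIMARY KEY, Dname TEXT)")
conn.execute("""CREATE TABLE EMPLOYEE (
    Eno   TEXT NOT NULL PRIMARY KEY,           -- key + entity integrity
    Ename TEXT,
    Dno   INTEGER REFERENCES DEPARTMENT(Dno),  -- referential integrity
    Salary REAL)""")
conn.execute("INSERT INTO DEPARTMENT VALUES (10, 'Accounts')")
conn.execute("INSERT INTO EMPLOYEE VALUES ('5010', 'Ashwini', 10, 40000)")

def rejected(sql):
    # True when the DBMS rejects the statement with an integrity error
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

key_viol    = rejected("INSERT INTO EMPLOYEE VALUES ('5010', 'Vidya', 10, 45000)")   # duplicate Eno
entity_viol = rejected("INSERT INTO EMPLOYEE VALUES (NULL, 'Anuradha', 20, 30000)")  # NULL primary key
ref_viol    = rejected("INSERT INTO EMPLOYEE VALUES ('5040', 'Vasudha', 40, 30000)") # Dno 40 missing
```

All three statements are rejected, matching the key, entity-integrity, and referential-integrity violations discussed above.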
The Delete operation:
The Delete operation is used to delete tuples
from the relation. The Delete operation can
violate only referential integrity, if the tuple
being deleted is referenced by the foreign keys
from other tuples in the database.
To Specify deletion, a condition on the attributes
of the relation selects the tuple(or tuples) to be
deleted.
EX: 1. Delete the EMPLOYEE tuple with Eno =
5030 and Dno = 30
This deletion is acceptable.
2. Delete the DEPARTMENT tuple with Dno
= 20
This deletion is not acceptable, because tuples in
EMPLOYEE refer to this DEPARTMENT tuple through the
foreign key Dno.
Several options are available if a deletion operation
cause a violation. Any one of the below options must
be specified during database design for each foreign
key constraint.
1. The First option is to reject the deletion.
2. The Second option is to attempt Cascade (or
propagate) the deletion by deleting tuples that
references the tuple that is being deleted.
3. The third option is to modify the referencing
attribute values that cause the violation; each such
value is either set to null or changed to reference
another valid tuple.
(The ON DELETE CASCADE constraint is used in MySQL
to delete the rows from the child table automatically
when the rows from the parent table are deleted.)
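A minimal sketch of the cascade option, here shown in SQLite rather than MySQL, with an assumed EMPLOYEE/DEPARTMENT schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # required for FK enforcement in SQLite
conn.execute("CREATE TABLE DEPARTMENT (Dno INTEGER PRIMARY KEY, Dname TEXT)")
conn.execute("""CREATE TABLE EMPLOYEE (
    Eno INTEGER PRIMARY KEY, Ename TEXT,
    Dno INTEGER REFERENCES DEPARTMENT(Dno) ON DELETE CASCADE)""")
conn.execute("INSERT INTO DEPARTMENT VALUES (10, 'Accounts')")
conn.execute("INSERT INTO EMPLOYEE VALUES (5010, 'Ashwini', 10)")
conn.execute("INSERT INTO EMPLOYEE VALUES (5020, 'Vidya', 10)")

# Deleting the parent department automatically deletes its employees
conn.execute("DELETE FROM DEPARTMENT WHERE Dno = 10")
remaining = conn.execute("SELECT COUNT(*) FROM EMPLOYEE").fetchone()[0]
```

Without ON DELETE CASCADE the same DELETE would instead be rejected with a foreign-key error, which corresponds to the first option above.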
The Modify Operation:
The Update (or modify) operation is used to change
the values of one or more attributes in a tuple or
tuples of some relation R.
It is necessary to specify a condition on the attributes
of the relation to select the tuple (or tuples) to be
modified.
⚫ Updating an attribute that is neither a primary key
nor a foreign key usually causes no problems. The
DBMS need only check to confirm that the new
value is of the correct data type and domain.
⚫ Modifying a primary key value is similar to deleting
one tuple and inserting another in its place, because
we use the primary key to identify tuples.
⚫ If a foreign key attribute is modified, the DBMS
must make sure that the new value refers to an
existing tuple in the referenced relation.
EX: 1. Update the SALARY of the EMPLOYEE tuple with
Eno = 5030
Acceptable, because SALARY is neither a primary key
nor a foreign key.
2. Update the DNO of the EMPLOYEE tuple with
Eno = 5030 to 10
Acceptable
3. Update the DNO of the EMPLOYEE tuple with
Eno = 5030 to 40
Unacceptable, because it violates referential
integrity. i.e, Dno = 40 does not exist in
the DEPARTMENT table.
4. Update the Eno of the EMPLOYEE tuple with
Eno = 5030 to 5010
Unacceptable, because it violates primary key
and referential integrity constraints. i.e, the
Eno 5010 already exists, we cannot have two
identical values for primary key.
Relational Algebra
The relational algebra is a procedural query
language. It consists of a set of operations that
take one or two relations as input and produce a
new relation as their result.
The relational algebra is often considered to be an
integral part of the relational data model, and its
operations can be divided into two groups.
⚫ One group includes set operations from
mathematical set theory. Set operations include:
1. Union 2. Intersection
3. Set Difference 4.Cartesian Product
⚫ The other group consists of operations
developed specifically for relational databases;
these include SELECT, PROJECT, and JOIN, among others.
Union (U)
The relations P and Q are said to be union compatible
if both P and Q are of the degree ‘n’ and the domain of
the corresponding ‘n’ attributes are identical.
In Simple, the union operation combines two sets
of rows into a single set composed of all the rows in
either or both of the two original sets.
Consider the relations P and Q. R is the computed
result relation.
R= P U Q
The result relation R contains tuples that are in
either P or Q or in both of them. The duplicated
tuples are eliminated.
Intersection (∩)
The result of this operation, denoted by P ∩ Q, is a relation
that includes all tuples that are in both P and Q
R = P ∩ Q
The intersection of two tables results in a third table
containing all the tuples that are in both relations.
Difference (Minus)(-)
The difference of two tables is a third table
containing all the rows, which are in the first
table but not in the second.
The result of this operation, denoted by P – Q,
is a relation that includes all tuples that are in P
but not in Q
R=P–Q
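The three set operations above can be sketched directly with Python sets of tuples (toy, union-compatible relations):

```python
# Two union-compatible relations modeled as sets of tuples (toy data).
P = {("A", 1), ("B", 2), ("C", 3)}
Q = {("B", 2), ("D", 4)}

union        = P | Q    # tuples in P or Q or both; duplicates eliminated
intersection = P & Q    # tuples that are in both P and Q
difference   = P - Q    # tuples in P but not in Q
```

Because sets eliminate duplicates automatically, the union here behaves exactly like the relational UNION described above.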
Cartesian Product (×)
The Cartesian product of two relations is the
concatenation of tuples belonging to the two
relations. A new resultant relation scheme is
created consisting of all possible combinations
of the tuples.
R = P × Q, where a tuple r belongs to R
The result relation is obtained by concatenating
each tuple in relation P with each tuple in
relation Q
⚫ The scheme of the result relation is given by R
= P || Q
⚫ The degree of the result relation is given by |R| =
|P| + |Q|
⚫ The cardinality of the result relation is the product
of the cardinalities of P and Q.
Unary Relational Operations
1. Projection (∏)
⚫ The projection operation is used to either
reduce the number of attributes in the resultant
relation or to reorder attributes.
⚫ The PROJECT operation selects certain columns
from the table and discards the other columns.
⚫ If the user is interested in only certain
attributes of a relation, then the PROJECT
operation is used to project the relation over
these attributes only.
⚫ The result of the PROJECT operation can hence
be visualized as a vertical partition of the
relation into two relations:
1. One has the needed columns (attributes), and
2. the other contains the discarded columns.
⚫ In general, the project operation is denoted by
Syntax: ∏<attribute list>(R)
Where, ∏(pi) – represent the PROJECT
operation
<attribute list> is the desired list of
attributes from the attributes of the relation R
Ex: Consider the relation S1
1.For Ex, to list each employee’s age, we can use
the PROJECT operation as follows:
∏age (S1)
2. To list every employee’s Sid and Age, the
PROJECT operation can be written as
∏Sid, Age (S1)
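The two PROJECT examples above can be sketched as follows; the S1 data is a toy relation with an assumed schema:

```python
# PROJECT as a vertical partition: keep only the named columns and
# eliminate any duplicate rows that result.
S1 = [
    {"Sid": 1, "Name": "Anu",  "Age": 19},
    {"Sid": 2, "Name": "Ravi", "Age": 19},
]

def project(relation, attributes):
    # A set comprehension removes duplicates, as relational PROJECT requires
    return {tuple(row[a] for a in attributes) for row in relation}

ages    = project(S1, ["Age"])          # pi Age (S1): one tuple, not two
sid_age = project(S1, ["Sid", "Age"])   # pi Sid, Age (S1)
```

Note how projecting on Age alone collapses the two rows into one, because both employees share the value 19.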
2. SELECT Operation
⚫ This is an operation that selects only some of the
tuples of the relation. Such an operation is
known as a selection operation.
⚫ One can consider the SELECT operation to be a
filter that keeps only those tuples that satisfy a
qualifying condition.
⚫ The SELECT operation can also be visualized as
horizontal partition of the relation into two sets
of tuples – those tuples that satisfy the condition
and are selected, and those tuples that do not
satisfy the condition and are discarded .
Syntax: σ<selection condition>(R)
Where the symbol σ (sigma) is used to denote
the SELECT operator, and the selection condition
is a Boolean expression.
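SELECT can be sketched the same way as a horizontal partition; the data and the condition Age > 20 are arbitrary illustrations:

```python
# SELECT keeps the tuples that satisfy a Boolean condition and
# discards the rest (a horizontal partition of the relation).
S1 = [
    {"Sid": 1, "Name": "Anu",  "Age": 19},
    {"Sid": 2, "Name": "Ravi", "Age": 22},
]

def select(relation, condition):
    return [row for row in relation if condition(row)]

adults = select(S1, lambda r: r["Age"] > 20)   # sigma Age > 20 (S1)
```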
Binary Relational operations
1. JOIN (⋈)
⚫ The join operation is one of the most useful
operations in relational algebra and is the most
commonly used way to combine information from
two or more relations
⚫ Rows of two tables are combined based on the
given columns(s) values. The tables being joined
must have one common column.
⚫ The generic join operator (called the theta join) is: ⋈
⚫ It takes as arguments the attributes from the two
relations that are to be joined.
⚫ The join condition can be = < > ≤ ≥ ≠
⚫ When the join condition operator is =, the join is
called an EQUIJOIN.
Types of Join
1.Cross Join 4. Non – Equi join
2. Equi Join 5.Outer Join
3.Natural Join 6. Self Join
EX: Consider the two relations EMP and DEPART to
illustrate all types of joins.
1. Cross Join : Each and every row of the first table
will combine with each and every row of the
second table.
2. Equi Join: The most used and important of the
join is the Equi join, also referred to as an INNER
JOIN.
⚫ The equi join joins two tables with a common
column in which each is usually the primary
key.
3. Natural Join: A natural join is nearly the same as
the EQUIJOIN; however, the NATURAL JOIN differs
from the EQUIJOIN by eliminating the duplicate
columns among the joining columns.
The Join condition is the same, but the columns
selected differ. The natural join operator is (*)
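A natural join can be sketched as below; the EMP/DEPART rows are toy data standing in for the notes' example tables:

```python
# Natural join: combine rows whose shared column values match, keeping
# the duplicated join column only once in the result.
EMP    = [{"Eno": 1, "Ename": "Anu",  "Dno": 10},
          {"Eno": 2, "Ename": "Ravi", "Dno": 20}]
DEPART = [{"Dno": 10, "Dname": "Accounts"},
          {"Dno": 30, "Dname": "Sales"}]

def natural_join(r, s):
    common = set(r[0]) & set(s[0])     # the shared attribute names (here: Dno)
    return [{**a, **b} for a in r for b in s
            if all(a[c] == b[c] for c in common)]

joined = natural_join(EMP, DEPART)     # only the row with Dno = 10 matches
```

Merging the two dictionaries (`{**a, **b}`) naturally de-duplicates the join column, which is exactly the difference between NATURAL JOIN and EQUIJOIN described above.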
4. Non – Equi join: A NON – EQUIJOIN joins two
or more tables based on a specified column value
not equalling a specified column value in another
table.
5. Outer Join : In the join operations so far, only
those tuples from both relations that satisfy the join
condition are included in the output relation. The
OUTER JOIN includes other tuples as well, according
to a few rules.
Three types of outer joins:
a. Left Outer Join - Left Outer Join includes all tuples
in the left hand relation and includes only those
matching tuples from the right hand relation.
b. Right Outer Join: Right Outer Join includes all
tuples in the right hand relation and includes
only those matching tuples from the left hand
relation.
c. Full Outer Join: Full Outer Join includes all
tuples in the left hand relation and from the
right hand relation.
6. Self Join
In SELF JOIN a table is joined to itself. That is, each
row of the table is joined with itself and all other rows
depending on some conditions. In other words we can
say that it is a join between two copies of the same
table.
Ex: Self join works by joining a table with itself on a
defined condition. For example, let’s assume that we
have a group of students and they have a best friend
relationship with some other student in the same
table and we want to know the name of the student
and his friend.
Now, in order to get the name of each student
along with his friend, we can perform a self-join
which will join the table something like this on the
condition that friend id is equal to student_id.
[Result table: each student’s name alongside the name of his or her friend]
2. DIVISION operation
The division operator is used for queries which
involve “all”
R1÷R2 = tuples of R1 associated with all tuples of
R2.
Ex: To retrieve the employee ID (EID) of the
employees working on all projects.
From the two relations, we need to retrieve the
employees who work on all the projects, i.e., the
employees who work on both PID1 and PID2.
Result = EMPLOYEE ÷ PROJECT
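DIVISION can be sketched like this, with toy (EID, PID) pairs modeled on the example above:

```python
# DIVISION: employees associated with ALL projects in the divisor.
EMPLOYEE = {("E1", "PID1"), ("E1", "PID2"),
            ("E2", "PID1"),
            ("E3", "PID1"), ("E3", "PID2")}
PROJECT  = {"PID1", "PID2"}

def divide(pairs, divisor):
    eids = {e for e, _ in pairs}
    # keep an EID only if it is paired with every value in the divisor
    return {e for e in eids if all((e, d) in pairs for d in divisor)}

result = divide(EMPLOYEE, PROJECT)   # E2 lacks PID2, so it drops out
```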
Additional Relational Operations
Some common database requests that are needed in
commercial query languages for relational DBMSs
cannot be performed with the basic relational algebra
operations. The following operations enhance the expressive
power of the relational algebra.
1. Aggregate Functions and grouping:
⚫ Certain type of request that cannot be expressed in
the basic relational algebra is to specify
mathematical aggregate functions on collections of
values from the database.
⚫ “ Aggregation function takes a collection of values
and returns a single value as a result.”
⚫ Examples of such function include retrieving the
average or total salary of employees or the total
number of employee tuples.
Consider a relation EMP with the following
tuples:
Grouping :
Another common type of request involves grouping, the
tuples in a relation by the value of some of their attributes
and then applying an aggregate function independently to
each group.
EX: 1. To retrieve the number of employees and their
average salary in each department.
In this relation we have two departments, dept1 and
dept2.
Syntax:
<grouping attributes> ℑ <(function name, attribute) pairs> (R)
EX: Dno ℑ COUNT Ssn, AVERAGE Salary (EMPLOYEE)
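The grouping-and-aggregation request above can be sketched in plain Python; the EMPLOYEE tuples and salary figures are assumptions for illustration:

```python
# Grouping with aggregation: partition the tuples by Dno, then apply
# COUNT and AVERAGE independently to each group.
EMPLOYEE = [
    {"Ssn": "1", "Dno": "dept1", "Salary": 40000},
    {"Ssn": "2", "Dno": "dept1", "Salary": 60000},
    {"Ssn": "3", "Dno": "dept2", "Salary": 30000},
]

groups = {}
for row in EMPLOYEE:                  # partition tuples by the grouping attribute
    groups.setdefault(row["Dno"], []).append(row["Salary"])

# one (COUNT, AVERAGE) pair per department
summary = {dno: (len(sals), sum(sals) / len(sals))
           for dno, sals in groups.items()}
```

Each key of `summary` corresponds to one group (department), exactly as the grouped result of Dno ℑ COUNT Ssn, AVERAGE Salary (EMPLOYEE) would.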
2. Recursive Closure Operations:
Another type of operation that, in general,
cannot be specified in the basic relational algebra
is recursive closure.
“ This operation is applied to a recursive
relationship between tuples of the same type,
such as the relationship between an employee
and a supervisor”
Unit – 3
Chapter - 6
Relational Database
Design
Database Design
Database design is a collection of steps that help create, implement,
and maintain a business's data management systems. The primary
purpose of designing a database is to produce physical and logical models
of designs for the proposed database system.
Database Design Strategies
There are two approaches for developing any database, the top-down
method and the bottom-up method.
• Top – down design method
• Bottom – up design method
• Top – down design method
The top-down design method starts from the general and moves
to the specific. In other words, you start with a general idea of what is
needed for the system and then work your way down to the more
specific details of how the system will interact. This process involves
the identification of different entity types and the definition of each
entity’s attributes
• Bottom – up design method
The bottom-up approach begins with the specific details and
moves up to the general. This is done by first identifying the data
elements (items) and then grouping them together in data sets. In
other words, this method first identifies the attributes, and then
groups them to form entities.
Database Anomalies
• Definition: These are problems in relations that occur due to
redundancy in the relation.
• These Anomalies affect the process of inserting, deleting, and
modifying the data in the relation.
• It is important to remove these Anomalies in order to perform
different processing on the relation without any problem.
Types of Anomalies
Insertion Anomaly
Deletion Anomaly
Modification Anomaly
Insertion Anomaly – An insertion anomaly is the inability to add
data to the database due to the absence of other data. The insertion
anomaly occurs when a new record is inserted in the relation. To
insert the information into the table, we must enter the correct
details so that they are consistent with the values for the other
rows.
An insertion anomaly: until the new faculty member Vasudha is
assigned to teach at least one course, her details cannot be
recorded.
• Deletion Anomaly - It occurs when deleting a record from the
relation also removes other information that we still need. In the
below table, attempting to delete a particular course code deletes
the entire faculty record associated with that course code.
Example: all information about Ashwini is lost when she temporarily
ceases to be assigned to any courses. This is a deletion anomaly.
Modification Anomaly – It occurs when a record is updated
in the relation. It arises from data inconsistency that results
from data redundancy or a partial update. In the below table, when a
faculty record is stored with two different addresses, an update
anomaly occurs as soon as we try to carry out operations on this
record.
Example: in an update anomaly, faculty 407 has different addresses on
different records.
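The three anomalies are easiest to see on a concrete denormalized table. The sketch below (hypothetical faculty/course data, using Python's built-in sqlite3 module) reproduces the update anomaly: because the faculty address is repeated once per course, updating only one row leaves the relation inconsistent.

```python
import sqlite3

# Hypothetical denormalized table: faculty details repeated per course.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE faculty_course "
            "(faculty_id INT, name TEXT, address TEXT, course TEXT)")
con.executemany("INSERT INTO faculty_course VALUES (?,?,?,?)", [
    (407, "Ashwini", "Mangalore", "DBMS"),
    (407, "Ashwini", "Mangalore", "OS"),
])

# Update anomaly: changing the address in only one row leaves the
# relation inconsistent -- faculty 407 now has two different addresses.
con.execute("UPDATE faculty_course SET address = 'Udupi' "
            "WHERE faculty_id = 407 AND course = 'DBMS'")
addresses = {row[0] for row in con.execute(
    "SELECT DISTINCT address FROM faculty_course WHERE faculty_id = 407")}
print(addresses)  # two conflicting addresses for the same faculty
```

Storing the address once, in a separate faculty table, removes the anomaly; that is exactly what decomposition and normalization (below) achieve.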
Decomposition
Decomposition is the process of breaking an original relation into
multiple sub relations. Decomposition helps to remove anomalies,
redundancy, and other problems in a DBMS. Decomposition can be
lossy or lossless.
• When a relation in the relational model is not in appropriate normal
form then the decomposition of a relation is required.
• In a database, it breaks the table into multiple tables.
• If the relation has no proper decomposition, then it may lead to
problems like loss of information.
• Decomposition is used to eliminate some of the problems of bad
design like anomalies, inconsistencies, and redundancy.
Types of Decomposition
Decomposition is of two major types in DBMS: Lossless and Lossy
Lossless Decomposition:
• A decomposition is said to be lossless when it is feasible to reconstruct
the original relation R using joins from the decomposed tables. It is
most preferred choice.
• This way, the information will not be lost from the relation when we
decompose it.
• A lossless join would eventually result in the original relation.
Ex: Let ‘R’ be the relational schema, with an instance ‘r’.
Consider that it is decomposed into R1, R2, R3, …, Rn with instances r1, r2,
r3, …, rn. If r1 ⋈ r2 ⋈ r3 ⋈ … ⋈ rn = r,
then it is known as “Lossless join Decomposition”.
Lossy Decomposition:
• Whenever we decompose relation into multiple relational schemas,
then the loss of data /information is unavoidable whenever we try to
retrieve the original relation.
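Whether a decomposition is lossless can be checked mechanically: decompose, join back, and compare with the original. A minimal sketch with Python's sqlite3 module and an invented relation R(sid, sname, dept); because both fragments share the key sid, the join reproduces R exactly.

```python
import sqlite3

# Invented relation R(sid, sname, dept); sid is the key.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE r (sid INT, sname TEXT, dept TEXT)")
con.executemany("INSERT INTO r VALUES (?,?,?)",
                [(1, "Asha", "CS"), (2, "Ravi", "EC")])

# Decompose R into R1(sid, sname) and R2(sid, dept) on the common key.
con.execute("CREATE TABLE r1 AS SELECT sid, sname FROM r")
con.execute("CREATE TABLE r2 AS SELECT sid, dept FROM r")

# Join the fragments back and compare with the original instance.
original = set(con.execute("SELECT * FROM r"))
rejoined = set(con.execute(
    "SELECT r1.sid, sname, dept FROM r1 JOIN r2 ON r1.sid = r2.sid"))
lossless = (rejoined == original)  # True: sid is a key, no spurious tuples
print(lossless)
```

Had the fragments shared a non-key attribute instead, the join could produce spurious tuples and the comparison would fail, which is the lossy case.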
Properties of Decomposition
1. Decomposition Must be Lossless:
• Decomposition must always be lossless, which means the
information must never get lost from decomposed relation.
• This guarantees that joining the decomposed relations yields
exactly the original relation that was decomposed.
2. Dependency Preservation:
• It is an important constraint of the database.
• In the dependency preservation, at least one decomposed table
must satisfy every dependency.
• If a relation R is decomposed into relation R1 and R2, then the
dependencies of R either must be a part of R1 or R2 or must be
derivable from the combination of functional dependencies of R1
and R2.
3. Lack of Data Redundancy:
• It is also commonly termed as repetition of data/information.
According to this property, decomposition must not suffer from data
redundancy.
• When decomposition is careless, it may cause issues with overall
data in the database.
• When we perform Normalization, we can easily achieve the
property of lack of data redundancy.
Functional Dependency
• It specifies the relationship between two sets of attributes where one
set can determine the values of other set.
• The functional dependency X → Y says that Y is functionally
dependent on X.
• “X” is called “determinant” and “Y” is called “dependent”.
Ex: We have a <Department> table with two attributes – DeptId and
DeptName.
Therefore, the functional dependency between DeptId and
DeptName can be stated as: DeptName is functionally dependent on
DeptId, written DeptId 🡪 DeptName.
Properties of Functional Dependencies
Given that X, Y, and Z are sets of attributes in a relation R, the most
important properties are Armstrong’s axioms, which are used in database
normalization.
• Subset property (Axiom of Reflexivity): if Y is a subset of X, then X 🡪 Y
• Augmentation (Axiom of Augmentation): if X🡪Y, then XZ🡪 YZ
• Transitivity (Axiom of Transitivity): if X🡪Y and Y🡪Z, then X🡪Z
From these rules, we can derive these secondary rules:
Union: if X🡪Y and X🡪Z, then X🡪YZ
Decomposition: if X🡪YZ, then X🡪Y and X🡪Z
Pseudotransitivity: if X🡪Y and WY🡪Z, then XW🡪Z
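The definition of X → Y translates directly into a check: the dependency holds iff no two tuples agree on X but disagree on Y. A small Python sketch, with invented Department rows:

```python
# Sketch of an FD check: X -> Y holds in relation r (a list of dicts)
# iff no two tuples agree on X but differ on Y.
def holds(r, X, Y):
    seen = {}
    for t in r:
        xval = tuple(t[a] for a in X)
        yval = tuple(t[a] for a in Y)
        if xval in seen and seen[xval] != yval:
            return False  # same X value, two different Y values
        seen[xval] = yval
    return True

dept = [
    {"DeptId": 1, "DeptName": "CS"},
    {"DeptId": 2, "DeptName": "EC"},
    {"DeptId": 3, "DeptName": "CS"},  # same name, different id
]
print(holds(dept, ["DeptId"], ["DeptName"]))   # True:  DeptId -> DeptName
print(holds(dept, ["DeptName"], ["DeptId"]))   # False: two ids for "CS"
```

Note that a check over one instance can only refute an FD; whether it holds in general is a property of the schema's intended meaning.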
Types of Functional Dependency
There are mainly four types of functional Dependency
1. Multivalued dependency
2. Trivial functional Dependency
3. Non-Trivial Dependency
4. Transitive Dependency
1.Trivial functional Dependency
In a trivial functional dependency, the dependent is always a subset of the
determinant; i.e., if X → Y and Y is a subset of X, then it is called a trivial
functional dependency.
Consider the Employee table
2. Non Trivial Functional Dependency:
In a non-trivial functional dependency, the dependent is strictly not a
subset of the determinant; i.e., if X → Y and Y is not a subset of X, then
it is called a non-trivial functional dependency.
3. Multi-Valued Functional Dependency:
• Multivalued dependency occurs when two attributes in a table are
independent of each other but, both depend on a third attribute.
• A multivalued dependency consists of at least two attributes that are
dependent on a third attribute that's why it always requires at least
three attributes.
4. Transitive Functional Dependency:
In a transitive functional dependency, the dependent is indirectly dependent
on the determinant; i.e., if a → b and b → c, then by the axiom of
transitivity, a → c. This is a transitive functional dependency.
A transitive dependency can be described as follows.
• An attribute is transitively dependent if its value is determined by
another attribute which is not a key
• If X → Y and X is not a key, then this is a transitive dependency
• A transitive dependency exists when A → B and B → C, so that A
determines C only through B
Full functional dependency
A full functional dependency is a functional dependency in which the
dependent attributes are determined by the whole of the determinant
and not by any proper subset of it.
For example, in the database of employees, the employee ID number
fully determines the employee's name, address, and other personal
information.
Advantages of functional dependency
• Functional dependency avoids data redundancy.
• It helps you to maintain the quality of data in the database.
• It helps you to define the meanings and constraints of databases.
• It helps you to identify bad designs.
• It helps you to find the facts regarding the database design.
Normalization
• Normalization is the process of organizing the data in the database.
• Normalization is used to minimize the redundancy from a relation or set
of relations. It is also used to eliminate undesirable characteristics like
Insertion, Update, and Deletion Anomalies.
• Normalization divides the larger table into smaller and links them using
relationships.
• The normal form is used to reduce redundancy from the database table.
Normal Forms
• The data in the database can be considered to be in one of the number of
“normal forms”. Basically the normal form of the data indicates how much
redundancy is in the that data. The normal forms have a strict ordering.
1. First Normal Form(1NF)
2. Second Normal Form (2NF)
3. Third Normal Form(3NF)
4. BCNF
5. Fourth Normal Form(4NF)
6. Fifth Normal Form(5NF)
Properties of Normalization
1. No data value should be duplicated in different rows unnecessarily
2. A value must be specified (and required )for every attribute in a row
3. Each relation should be self- contained.
4. When a row is added to a relation, other relations in the database
should not be affected
5. A value of an attribute in a tuple may be changed independent of other
tuples in the relation and other relations.
Normalization benefits
• Facilitates data integration
• Reduces data redundancy
• Provides a robust architecture for retrieving and maintaining data
• Reduces the chances of data anomalies occurring
Goals of Normalization (purposes)
There are two goals of the normalization process:
• Eliminating redundant data (for example, storing the same data in
more than one table) and
• Ensuring data dependencies make sense (only storing related data
in a table)
• Reduce the potential for data anomalies
Advantages of Normalization
• Avoids redundancy (the same data stored many times in the
same/different tables).
• Avoids update anomalies and loss of data, and keeps the data update
process efficient.
• Produces a well-organized database where all the tables are
inter-related, maintaining integrity and consistency of data.
• All data are stored efficiently since there is no redundancy.
• The entire database system remains consistent over time.
Disadvantages of Normalization
• Maintaining more tables is a bit tough.
• Nested queries over multiple tables gets tricky.
First Normal Form (1NF)
• First normal form (1NF or Minimal form) is a normal form used in
database normalization.
• A relation will be 1NF if it contains an atomic value (small piece of data
that cannot be further divided)
• It states that an attribute of a table cannot hold multiple values. It must
hold only single-valued attribute.
• First normal form disallows the multi-valued attribute, composite
attribute, and their combinations.
There are basically two rules associated with 1NF
Rule 1 : A column with atomic data cannot have several of the same type
of the data in that column.
Rule 2 : A table with atomic data cannot have multiple columns with
same type of data
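As a sketch of Rule 1, the snippet below takes hypothetical non-1NF rows whose phone column holds a comma-separated list of values and flattens them so that every attribute value is atomic:

```python
# Hypothetical non-1NF rows: a multi-valued "phones" column.
raw = [
    ("1001", "Snigdha", "111,222"),
    ("1002", "Surabhi", "333"),
]

# To reach 1NF, emit one row per atomic phone value.
flat = [(roll, name, phone)
        for roll, name, phones in raw
        for phone in phones.split(",")]
print(flat)  # one row per (roll, name, single phone)
```

After flattening, every column of every row holds exactly one value, which is what 1NF requires.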
Second Normal form (2NF)
• A table is in second normal form (2NF) if and only if it is in 1NF and
every non key attribute is fully functionally dependent on the whole
primary key.
• A 1NF table is in 2NF if and only if all its non-prime attributes are
functionally dependent on the whole of a candidate key ( A non-prime
attribute is one that does not belong to any candidate key)
Third Normal form (3NF)
• The third normal form (3NF) is a normal form used in database
normalization. A relation R in 3NF if and only if it is in 2NF and
non-key column does not depend on another non – key column.
• 3NF is used to reduce the data duplication. It is also used to achieve the
data integrity.
• If there is no transitive dependency for non-prime attributes, then the
relation must be in third normal form.
A table is in 3NF if and only if both of the following conditions hold:
✔ The relation R (table) is in second normal form (2NF)
✔ Every non-prime attribute of R is non-transitively dependent (i.e,
directly dependent) on every candidate key of R
Boyce Codd normal form (BCNF)
• Boyce Codd normal form (BCNF) is a normal form used in database
normalization. BCNF was developed in 1974 by Raymond F. Boyce and
Edgar F. Codd
• BCNF is the advance version of 3NF. It is stricter than 3NF.
• A table is in BCNF if every functional dependency X → Y, X is the super
key of the table.
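The BCNF test "X is a superkey" is usually decided by computing the attribute closure X+ under the given FDs: X is a superkey iff its closure covers all attributes. A minimal Python sketch with an invented schema R(A, B, C) and FDs A → B, B → C (so B → C violates BCNF, since B is not a superkey):

```python
# Attribute-closure sketch: compute X+ under a set of FDs, each FD a
# (lhs, rhs) pair of attribute sets. X is a superkey iff X+ = all attributes.
def closure(X, fds):
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole lhs is already determined, rhs is too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

attrs = {"A", "B", "C"}
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(closure({"A"}, fds) == attrs)  # True:  A is a superkey
print(closure({"B"}, fds) == attrs)  # False: B -> C violates BCNF
```

Running this test for the left side of every FD is exactly the BCNF check stated above.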
Fourth Normal form (4NF)
• A relation will be in 4NF if it is in Boyce Codd normal form and has no
multi-valued dependency.
• For a dependency A → B, if for a single value of A, multiple values of B
exists, then the relation will be a multi-valued dependency.
• Multivalued dependency: A type of functional dependency where the
determinant can determine more than one value.
More formally, there are 3 criteria.
1. There must be at least 3 attributes in the relation. Call them A, B, and
C, for example.
2. Given A, one can determine multiple values of B.
Given A, one can determine multiple values of C.
3. B and C are independent of one another.
Fifth Normal form (5NF)
• A relation R is in 5NF also called projection-join normal form (PJ/NF) if
and only if every join dependency in R is implied by the candidate keys
of R
• A relation is in 5NF if it is in 4NF and not contains any join dependency
and joining should be lossless.
• 5NF is satisfied when all the tables are broken into as many tables as
possible in order to avoid redundancy.
Process of Normalization
Unit 3
Chapter - 7
SQL
Introduction to SQL
• SQL stands for Structured Query Language. SQL is used to
communicate with a database.
• According to ANSI (American National Standards Institute), it is the
standard database query language for RDMS.
• It enables a user to create, read, update and delete relational databases
and tables.
• All the RDBMS like MySQL, Informix, Oracle, MS Access and SQL
Server use SQL as their standard database language.
• SQL allows users to query the database in a number of ways, using
English-like statements.
Popular RDBMS present in today’s market
There are many different vendors that currently produce RDBMS.
Advantages of SQL
• Faster Query Processing –
Large amounts of data are retrieved quickly and efficiently. Operations
like insertion, deletion, and manipulation of data are also done in almost
no time.
• No Coding Skills –
For data retrieval, a large number of lines of code is not required. All
basic keywords such as SELECT, INSERT INTO, UPDATE, etc. are used
and also the syntactical rules are not complex in SQL, which makes it a
user-friendly language.
• Standardized Language –
Due to documentation and long establishment over years, it provides a
uniform platform worldwide to all its users.
• Portable –
It can be used in programs in PCs, server, laptops independent of any
platform (Operating System, etc). Also, it can be embedded with other
applications as per need/requirement/use.
Disadvantages of SQL
• Complex Interface –
SQL has a difficult interface that makes few users uncomfortable while
dealing with the database.
• Cost –
Some versions are costly and hence, programmers cannot access it.
• Complexity: SQL databases can be complex to set up and manage,
requiring skilled database administrators to ensure optimal
performance and maintain data integrity.
• Partial Control –
Due to hidden business rules, complete control is not given to the
database.
Data Types
• When creating tables, we must specify a datatype for each column we
define. Most of RDBMS are rich with various datatypes to store
different kinds of information.
• By choosing the appropriate datatype, we will be able to store and retrieve data efficiently.
Five Major categories of SQL Datatype
Types of SQL Statements
• Almost all RDBMS use SQL for data manipulation and retrieval.
SQL is the standard language for relational database systems.
• SQL is non-procedural language, where you need to concentrate
on what you want, not on how you get it.
SQL statements are divided into 4 categories.
• DDL (Data definition language)
1. CREATE 2. ALTER 3.DROP 4. TRUNCATE
• DML (Data Manipulation language)
1. SELECT 2. INSERT 3.UPDATE 4. DELETE
• TCL (Transaction Control language)
1. COMMIT 2. ROLLBACK 3.SAVEPOINT
• DCL (Data Control language)
1. GRANT 2. REVOKE
DDL Statements
• DDL or Data Definition Language actually consists of the SQL
commands that can be used to define the database schema.
• It simply deals with descriptions of the database schema and is used
to create and modify the structure of database objects in the database.
• DDL is a set of SQL commands used to create, modify, and delete
database structures but not data.
• The most important DDL statements in SQL are:
1. CREATE TABLE – creates a new table
2. ALTER TABLE – modifies a table
3. DROP TABLE – deletes a table
4. TRUNCATE – deletes data in a table
1. The CREATE statement
• The create statement is used to create the tables.
• Tables are owned by user who create them.
• Names of tables owned by a given user must be unique. Table names
are not case sensitive.
• Table name must begin with a letter and may contain letters, digits, and
certain special characters, up to 30 characters long.
• Column names in the table must be unique.
• Table name must not be SQL reserve words like table, constraint, alter
etc.
Syntax: CREATE TABLE table_name
(
column_name1 datatype [ column_constraint],
column_name2 datatype [ column_constraint],
……
);
Ex: create a table “Suppliers” with Supplier_id(number),
supplier_name(string), and contact_name(string) as columns.
Supplier_id and supplier_name should not be null.
CREATE TABLE Suppliers
( Supplier_id number(10) not null, Supplier_name varchar2(50) not
null, Contact_name varchar2(50));
Describe Command
We use the DESCRIBE or DESC command to list all the columns in the
table, along with their datatype, size, nullity and order.
Sy: DESCRIBE <table name> or DESC <table name>
Ex: DESC employee;
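The CREATE TABLE / DESCRIBE pair can be tried outside Oracle too. The sketch below runs a similar DDL in SQLite via Python's sqlite3 module (NUMBER and VARCHAR2 are Oracle types, so close SQLite equivalents are used), with PRAGMA table_info standing in for DESC:

```python
import sqlite3

# Suppliers DDL, adapted to SQLite types; structure mirrors the example above.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE Suppliers (
    Supplier_id   INTEGER     NOT NULL,
    Supplier_name VARCHAR(50) NOT NULL,
    Contact_name  VARCHAR(50))""")

# PRAGMA table_info lists each column's name, type, and nullity,
# much like DESCRIBE does in Oracle.
for cid, name, dtype, notnull, default, pk in con.execute(
        "PRAGMA table_info(Suppliers)"):
    print(name, dtype, "NOT NULL" if notnull else "NULL")
```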
2. The ALTER Statement
The ALTER TABLE statement is used to rename an existing table. It can
also be used to add new columns, to modify existing columns, and add
new integrity constraints and can also be used to drop column from a
existing table.
Syntax:
ALTER TABLE table_name
{ADD | MODIFY(col_name datatype,[col_name datatype]….)}
{ DROP CONSTRAINT constraint_name}
{ADD CONSTRAINT constraint_name CONSTRAINT TYPE col_name
[REFERENCES table_name (col_name)]}
{RENAME to new_table_name}
EX: 1. Rename table from SUPPLIERS to DISTRIBUTORS;
ALTER TABLE SUPPLIERS RENAME to
DISTRIBUTORS;
2. ALTER TABLE DISTRIBUTORS by adding a new column “CONTACT_NO”
ALTER TABLE DISTRIBUTORS ADD CONTACT_NO
NUMBER(10)
3. ALTER TABLE DISTRIBUTORS by modifying SUPPLIER_NAME
column datatype from VARCHAR2(50) to VARCHAR2(100)
ALTER TABLE DISTRIBUTORS MODIFY SUPPLIER_NAME
varchar2(100)
4. Alter table Distributors by Dropping the column Supplier_name
ALTER TABLE DISTRIBUTORS DROP COLUMN
SUPPLIER_NAME;
3. The DROP Statement
• Removes table from the database.
Sy: DROP TABLE table_name
Ex: DROP TABLE DISTRIBUTORS
4. The TRUNCATE Statement
• Deletes all rows from a table, but the table structure remains.
Sy: TRUNCATE TABLE <Table_Name>
Ex: TRUNCATE TABLE DISTRIBUTORS
CONSTRAINTS
• SQL constraints are used to specify rules for data in a table.
• Constraints can be specified when the table is created with the CREATE
TABLE statement, or after the table is created with the ALTER TABLE
statement.
Constraint: Meaning
NOT NULL Constraint: Ensures that a column cannot have a NULL value
DEFAULT Constraint: Provides a default value for a column when none is specified while inserting the records
UNIQUE Constraint: Ensures that all values in a column are different
CHECK Constraint: Makes sure that all values in a column satisfy certain criteria
Primary key Constraint: Used to uniquely identify a row in the table
Foreign key Constraint: Used to ensure referential integrity of the data
1. NOT NULL Constraint
• By default, a column can hold NULL values. If you do not want a
column to have a NULL value, then you need to define such a
constraint on this column specifying that NULL is not allowed
for that column.
• A NULL is not the same as no data, rather, it represents unknown
data.
Ex: CREATE TABLE suppliers (
Supplier_id number(10) NOT NULL,
Supplier_name varchar2(50) NOT NULL,
Contact_name varchar2(50));
Column “Supplier_id” and “Supplier_name” cannot include NULL,
while “Contact_name” can include NULL.
2. DEFAULT CONSTRAINT
• The Default constraint provides a default value to a column when the
insert into statement does not provide a specific value.
Ex: let us consider a table below
create table student ( roll_no number(10), first_name varchar2(30),
Last_Name varchar2(30), Score Number(3) DEFAULT 80);
• When we execute a following query
Insert into student (roll_no, first_name, last_name) values (1001,
‘Snigdha’, ‘Srikanth’);
The table will look like the following
Even though a value for the “SCORE” column was not specified in the
INSERT INTO statement, the column gets its default value of 80.
3. UNIQUE CONSTRAINT
The UNIQUE constraint ensures that all values in a column
are distinct. For example, in the following CREATE TABLE
statement, column “ROLLNO” has a unique constraint, and
hence cannot include duplicate values.
CREATE TABLE STUDENT
(
ROLLNO NUMBER(10) UNIQUE,
FIRST_NAME VARCHAR2(30),
LAST_NAME VARCHAR2(30),
SCORE NUMBER(3) DEFAULT 80
);
Assume that table contains 3 records as shown below
ROLLNO FIRST_NAME LAST_NAME SCORE
1001 SNIGDHA SRIKANTH 80
1002 SURABHI KUMARI 80
1003 DHANYA RAMESH 80
Executing the following SQL statement will result in an error
because ROLLNO=1001 already exists in the ROLLNO column; thus,
trying to insert another row with that value violates the UNIQUE
constraint.
INSERT INTO STUDENT (ROLLNO, FIRST_NAME, LAST_NAME) VALUES (1001, ‘sumukh’, ‘rao’);
4. CHECK CONSTRAINT
The CHECK constraint ensures that all values in a column satisfy
certain conditions. Once defined , the database will only insert a new
row or update an existing row if the new value satisfies the CHECK
constraint. The CHECK constraint is used to ensure data quality. For
example, in the following CREATE TABLE statement, Column
“ROLLNO” has a constraint– its value must only include integers
greater than 0.
CREATE TABLE STUDENT
(
ROLLNO NUMBER(10) CHECK (ROLLNO > 0),
FIRST_NAME VARCHAR2(30 Byte) NOT NULL,
LAST_NAME VARCHAR2(30 Byte) UNIQUE,
SCORE NUMBER(3) DEFAULT 80
);
So, attempting to execute the following statement, will result in an error
because the values for ROLLNO must be greater than 0.
INSERT INTO STUDENT (ROLLNO, FIRST_NAME, LAST_NAME)
VALUES
(-1000, ‘Snigdha’ , ‘Srikanth’);
Error: CHECK constraint violated.
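The CHECK behavior described above can be reproduced in SQLite (the DDL mirrors the Oracle example; column names are lowercased here):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE student (
    rollno     INTEGER CHECK (rollno > 0),  -- CHECK constraint
    first_name TEXT NOT NULL,
    last_name  TEXT,
    score      INTEGER DEFAULT 80)""")

# This row satisfies the CHECK and is accepted.
con.execute("INSERT INTO student (rollno, first_name, last_name) "
            "VALUES (1001, 'Snigdha', 'Srikanth')")

# This row has rollno <= 0, so the database rejects it.
try:
    con.execute("INSERT INTO student (rollno, first_name, last_name) "
                "VALUES (-1000, 'Snigdha', 'Srikanth')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True: the CHECK constraint blocked the bad row
```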
5. Primary Key Constraint
A Primary key is a constraint on the contents of a table. In relation
terms, the primary key maintains Entity Integrity for the table.
1. For a given row, the set of values for the primary key columns must
be unique from all other rows in the table.
2. No primary key column can contain a null
3. A table can have only one primary key (set of primary key columns)
The primary key constraint in the CREATE STATEMENT has two
forms.
• When the primary key consists of a single column, it can be declared
as column constraint
Ex: RollNo NUMBER(6) NOT NULL PRIMARY KEY
• As a table constraint, it has the following format:
PRIMARY KEY (column-1, [column-2]……)
Column-1 and column-2 are the names of the columns of the primary
key.
6. Foreign key
• A foreign key is a way to enforce referential integrity within database.
A foreign key means that values in one table must also appear in
another table.
• The referenced table is called the parent table while the table with the
foreign key is called the child table. The foreign key in the child table
will generally reference a primary key in the parent table.
Referential integrity requires that :
1. The column of a foreign key must match in number and type the
column of the primary key in the referenced table.
2. The values of the foreign key columns in each row of the referencing
table must match the values of the corresponding primary key
columns for a row in the referenced table.
DML Statements
• DML stands for Data Manipulation Language. DML commands are
basically used to INSERT, UPDATE, and DELETE data in a database
table.
• That means DML statements affect the records in a table. These are
the basic operations that we perform on data such as inserting new
records, deleting unnecessary records, and updating/modifying
existing records.
1. The SELECT Statement
• The select statement is used to retrieve data from the tables. It can
access only data in the database.
• It cannot manipulate data in the database, although it can operate
on the accessed data.
• The select statement also allows retrieving record from more than
one table.
Syntax: To SELECT all columns from the table
SELECT * from Table_Name;
Ex: SELECT * from STUDENT
2. The INSERT statement
• The SQL INSERT INTO clause facilitates the process of inserting
data into a SQL table.
• The INSERT statement is used to insert a record into a table .
• We can insert one record or multiple records using the insert statement.
It adds new rows to the database.
Syntax: INSERT INTO table [(column, [column]….)] {values (values,
[value]….) | query}
Ex: 1. Insert a new row into SUPPLIERS table
Insert into SUPPLIERS ( SUPPLIER_ID, SUPPLIER_NAME,
CONTACT_NUMBER) values (1000, ‘TOSHIBA’, ‘1234567891’);
2. We can produce the same result with slightly modified SQL
INSERT INTO SYNTAX:
Insert into SUPPLIERS values (1000, ‘TOSHIBA’, ‘1234567891’);
3. The UPDATE statement
• The SQL UPDATE clause serves to update data in database table. The
UPDATE statement is used to update existing records in table.
• The update statement is used to update single record or multiple
record in a table.
• The update statement is used to change values that are already in a
table.
• Syntax: UPDATE <table_name> SET <column_name1> = <value1>,
<column_name2> = <value2>, …………WHERE <condition>;
4. The DELETE Statement
• The DELETE statement is used to delete rows in a table. It is possible
to delete all rows in a table without deleting the table.
Syntax: DELETE FROM <table name> WHERE <condition>;
DELETE FROM <table name>; (deletes all rows)
• The WHERE clause specifies which record or records that should be
deleted. If we omit the WHERE clause, all records will be deleted.
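A minimal round trip through the three DML statements, run in SQLite with illustrative supplier data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE suppliers "
            "(supplier_id INT, supplier_name TEXT, contact_number TEXT)")

# INSERT a row, UPDATE it, then DELETE it.
con.execute("INSERT INTO suppliers VALUES (1000, 'TOSHIBA', '1234567891')")
con.execute("UPDATE suppliers SET contact_number = '9876543210' "
            "WHERE supplier_id = 1000")
updated = con.execute("SELECT contact_number FROM suppliers "
                      "WHERE supplier_id = 1000").fetchone()[0]
con.execute("DELETE FROM suppliers WHERE supplier_id = 1000")
remaining = con.execute("SELECT COUNT(*) FROM suppliers").fetchone()[0]
print(updated, remaining)  # 9876543210 0
```

The WHERE clause scopes both UPDATE and DELETE to specific rows; omitting it would affect every row in the table.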
Transaction Control Language Statements
• TCL stands for Transaction Control Languages.
• These commands are used for maintaining consistency of the
database and for the management of transactions made by the DML
commands.
• A single unit of work in a database is formed after the consecutive
execution of commands is known as a transaction.
• There are certain commands present in SQL known as TCL
commands that help the user manage the transactions that take place
in a database.
• COMMIT, ROLLBACK and SAVEPOINT are the most commonly
used TCL commands in SQL.
1. COMMIT
• TCL statements can be used to commit a transaction, which means to
permanently save the changes made during the transaction to the
database.
2. ROLLBACK
• Rolling back means undoing any changes to data that have been
performed by SQL statements within an uncommitted transaction.
• After a transaction is rolled back the affected data is left unchanged
as if the SQL statements in the transaction were never executed.
OR
This command is used to get the data or restore the data to the last
savepoint or last committed state. If due to some reasons the data
inserted, deleted or updated is not correct, you can rollback the data
to a particular savepoint or if savepoint is not done, then to the last
committed state.
• Roll back can be of two types:
1. Rolling back of entire transaction.
2. Rolling back of transaction to a savepoint.
In case of rollback of entire transaction:
1. All changes made by all the SQL statements in the transactions are
undone using the corresponding rollback segments
2. All transactions locks of data are released
3. The transaction is ended
In case of rollback executed after the savepoint:
1. Only the statements executed after the savepoint are rolled back
2. The specified savepoint is preserved, but all savepoints that were
established after the specified one are lost
3. All tables and row locks acquired since the savepoint are released,
but all data locks acquired previous to the savepoint are retained.
4. The transaction is still active; it can be continued.
Syntax: ROLLBACK
3. SAVEPOINT
• Savepoint is a command in SQL that is used with the rollback
command.
• It is a command in Transaction Control Language that is used to
mark the transaction in a table.
• Consider you are making a very long table, and you want to roll back
only to a certain position in a table then; this can be achieved using
the savepoint.
• If you made a transaction in a table, you could mark the transaction
as a certain name, and later on, if you want to roll back to that point,
you can do it easily by using the transaction's name.
• Savepoint is helpful when we want to roll back only a small part of a
table and not the whole table. In simple words, we can say savepoint
is a bookmark in SQL.
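COMMIT, SAVEPOINT, and ROLLBACK TO can be exercised in SQLite, which supports the same TCL verbs. In this sketch only the work done after the savepoint is undone:

```python
import sqlite3

# isolation_level=None disables the driver's implicit transactions, so the
# BEGIN/COMMIT statements below control the transaction explicitly.
con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE t (x INT)")

con.execute("BEGIN")
con.execute("INSERT INTO t VALUES (1)")
con.execute("SAVEPOINT sp1")            # bookmark inside the transaction
con.execute("INSERT INTO t VALUES (2)")
con.execute("ROLLBACK TO sp1")          # undo only the work after sp1
con.execute("COMMIT")                   # the row inserted before sp1 survives

rows = con.execute("SELECT x FROM t").fetchall()
print(rows)  # [(1,)]
```

After ROLLBACK TO, the transaction is still active, which is why the final COMMIT can still persist the first insert.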
Data Control Language Statements
• Data control language (DCL) is used to access the stored data. It is
mainly used for revoke and to grant the user the required access to a
database.
• Privileges allow a user to access objects or execute programs that are
owned by another user or to perform system – level operations.
• Privileges can be granted(assigned) to a user. Once granted, privileges
can be revoked (canceled).
A Privilege is
✔ The right to execute a particular type of SQL statement
✔ The right to connect to the database
✔ The right to create a table in your schema
✔ The right to select rows from someone else’s tables
✔ The right to execute someone else’s stored procedure
QUERYING THE DATABASE
• The most common operation in SQL is the query, which is
performed with the declarative SELECT statement. SELECT retrieves
data from one or more tables, or expressions. Standard SELECT
statements have no persistent effects on the database.
• A query includes a list of columns to be included in the final result
immediately following the SELECT keyword. An asterisk (“*”) can also
be used to specify that the query should return all columns of the
queried tables.
Syntax: SELECT <column_list> FROM <table_name>
[WHERE <condition>]
[GROUP BY <expression>]
[HAVING <condition>]
[ORDER BY <expression>];
• SELECT is the most complex statement in SQL, with optional
keywords and clauses that include:
• The WHERE clause includes a conditional retrieval of rows, which
restricts the rows returned by the query. The WHERE clause
eliminates all rows from the result set for which the condition does
not evaluate to True.
• SELECT clause can also be used with DISTINCT to eliminate
duplicate values.
• The GROUP BY clause is used to project rows having common
values into smaller set of rows. GROUP BY is often used in
conjunction with SQL aggregation functions or to eliminate
duplicate rows from a result set. THE WHERE clause is applied
before the GROUP BY clause
• The HAVING clause used to filter rows resulting from the GROUP
BY clause. Because it acts on the results of the GROUP BY clause,
aggregation functions can be used in the HAVING clause.
• The ORDER BY clause identifies which columns are used to sort the
resulting data, and in which direction they should be sorted (options
are ascending or descending). Without an ORDER BY clause, the order
in which rows are returned is undefined.
SELECT WITH WHERE CLAUSE
1. The WHERE clause is used to filter the results from an SQL
statement –select, update, or delete statement. It is difficult to
explain the basic syntax for the WHERE clause, so instead, we’ll take
a look at some examples.
2. The WHERE clause in the SELECT statement is used to retrieve set
of records depending on the search condition.
3. The search condition comprises of:
• Column name or expression or constant
• Comparison operator
• Expression or column name or constant
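A single query exercising WHERE, GROUP BY, HAVING, and ORDER BY together, on invented marks data (WHERE filters rows first, GROUP BY groups them, HAVING filters the groups, ORDER BY sorts the result):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE marks (student TEXT, subject TEXT, score INT)")
con.executemany("INSERT INTO marks VALUES (?,?,?)", [
    ("Asha", "DBMS", 90), ("Asha", "OS", 70),
    ("Ravi", "DBMS", 60), ("Ravi", "OS", 65),
])

rows = con.execute("""
    SELECT student, AVG(score) AS avg_score
    FROM marks
    WHERE score > 50              -- filter rows first
    GROUP BY student              -- then group
    HAVING AVG(score) >= 70       -- then filter groups
    ORDER BY avg_score DESC       -- finally sort the result
""").fetchall()
print(rows)  # [('Asha', 80.0)]
```

Ravi's group (average 62.5) is dropped by HAVING, illustrating that aggregate conditions belong in HAVING rather than WHERE.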
Operators Used in SQL
Set Operators
Set operators combine the results of two component queries into a
single result. Queries containing set operators are called Compound
Queries. The SET operators in SQL are UNION, MINUS and
INTERSECT.
• The UNION operator selects only distinct values by default. To allow
duplicate values, use UNION ALL.
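The UNION vs. UNION ALL difference in one sketch (note that SQLite, used here, spells Oracle's MINUS operator as EXCEPT):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE a (x INT)")
con.execute("CREATE TABLE b (x INT)")
con.executemany("INSERT INTO a VALUES (?)", [(1,), (2,)])
con.executemany("INSERT INTO b VALUES (?)", [(2,), (3,)])

union = con.execute(
    "SELECT x FROM a UNION SELECT x FROM b ORDER BY x").fetchall()
union_all = con.execute(
    "SELECT x FROM a UNION ALL SELECT x FROM b ORDER BY x").fetchall()
print(union)      # [(1,), (2,), (3,)]        duplicates removed
print(union_all)  # [(1,), (2,), (2,), (3,)]  duplicates kept
```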
Extra
SQL Functions
Single-Row Functions
Types of Single row function.
1. Character Functions
2. Number Functions
3. Date Functions
• The oracle database stores dates in a numeric format: Century, year,
month, day, hours, minutes , and seconds.
• The default date display format is DD-MON-YY
SYSDATE Function
SYSDATE is a date function that returns the current database server date
and time.
Ex: we can display the current date by selecting SYSDATE from a DUAL
table.
4. Conversion Functions
• The oracle uses data of one datatype where it expects data of a
different datatype. When this happens, the oracle server can
automatically convert the data to the expected datatype.
• This data type conversion can be done implicitly by the Oracle or
explicitly by the user.
• The Oracle can automatically convert the following
• Implicit Data-Type Conversion :
• In this type of conversion the data is converted from one type to
another implicitly (by itself/automatically).
From → To
VARCHAR2 or CHAR → NUMBER
VARCHAR2 or CHAR → DATE
NUMBER → VARCHAR2
DATE → VARCHAR2
Explicit Datatype Conversion
Explicit datatype conversion can be done using to – char, to – number,
& to – date
1. TO_CHAR Function
• The to_char function converts a number or date to a string.
• The syntax for the to_char function is:
Syntax: to_char( value, [ format_mask ], [ nls_language ] )
✔ where value can either be a number or date that will be converted
to a string.
✔ format_mask is optional. This is the format that will be used to
convert value to a string
✔ nls_language is optional. This nls language is used to convert value
to a string.
EX: to_char(1210.86) would return the string “1210.86”
to_char( sysdate, ‘ yyyy/mm/dd ’ ) would return the current date as a
string such as “2024/01/15”
2. TO_NUMBER Function
• The to_number function converts a string to a number.
• The syntax for the to_number function is
Syntax : to_number ( string1, [ format_mask ], [ nls_langauge ] )
String1 is the string that will be converted to a number.
Format_mask is optional. This is the format that will be used to
convert string1 to a number
nls_language is optional. This is the nls language used to convert
string1 to a number
Ex: to_number(‘1210.85’) would return the number 1210.85
to_number(‘23’) would return the number 23
3. TO_DATE Function
The to_date function converts a string to a date.
Syntax: to_date ( string1, [ format_mask ], [ nls_langauge ] )
String1 is the string that will be converted to a date.
Format_mask is optional. This is the format that will be used to convert
string1 to a date
Ex: to_date(‘2011/07/09’, ‘yyyy/mm/dd ’) would return a date value of
July 9, 2011
to_date(‘070911’, ‘MMDDYY’ ) would return a date value of July 9,
2011
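TO_CHAR, TO_NUMBER, and TO_DATE are Oracle-specific; the sketch below shows the same conversions with portable equivalents, using sqlite3 for illustration (ANSI CAST plays the role of TO_NUMBER/TO_CHAR, and strftime formats a date much like TO_CHAR with a format mask):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# CAST is the portable analogue of Oracle's TO_NUMBER / TO_CHAR.
(num,) = con.execute("SELECT CAST('1210.85' AS REAL)").fetchone()
(txt,) = con.execute("SELECT CAST(1210.85 AS TEXT)").fetchone()

# strftime applies a format mask to a date, like TO_CHAR(date, 'yyyy/mm/dd').
(day,) = con.execute("SELECT strftime('%Y/%m/%d', '2011-07-09')").fetchone()

print(num, txt, day)  # 1210.85 1210.85 2011/07/09
```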
Multi-Row Functions ( Group Functions or Aggregate
Functions )
These are also called group functions/aggregate functions which
operates on set of rows to give one result per group. These sets may
comprise the entire table (or) the table split into groups.
Types of Group functions:
• AVG, COUNT, MAX, MIN, STDDEV, SUM, VARIANCE.
Syntax: SELECT [column], group_function (column), ……
FROM table
[WHERE condition]
[GROUP BY column]
ORDER BY clause
• The ORDER BY clause can be used to sort the rows.
• ORDER BY specifies the order in which the retrieved rows are
displayed.
• ASC order the rows in ascending order (this is the default order).
• DESC orders the rows in descending order.
GROUP BY Clause
We can divide the table of information into smaller groups using the
GROUP BY clause. We can then use group functions to return summary
information for each group.
Syntax: SELECT column, group_function (column)
FROM table
[WHERE condition]
[GROUP BY group_by_expression ]
[ORDER BY column];
HAVING Clause
• The WHERE clause is used to restrict the rows that we select; the
HAVING clause is used to restrict groups.
• When we use HAVING clause the following actions will happen.
1. Rows are grouped
2. The group function will be applied
3. Group matching the having clause are displayed
• Having clause is used to restrict groups
Syntax: SELECT column, group_function (column)
FROM table
[WHERE condition]
[GROUP BY group_by_expression ]
[HAVING group_condition]
[ORDER BY column];
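The WHERE → GROUP BY → HAVING pipeline can be run end to end with sqlite3 (the Emp table and its departments are invented for the demo):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Emp (Name TEXT, Dept TEXT, Salary INTEGER)")
con.executemany("INSERT INTO Emp VALUES (?, ?, ?)", [
    ("Asha", "Sales", 30000), ("Ravi", "Sales", 52000),
    ("Meena", "HR", 41000), ("John", "HR", 20000), ("Lee", "IT", 60000)])

# Rows are grouped by Dept, AVG is applied per group,
# then HAVING keeps only the groups whose average exceeds 35000.
rows = con.execute("""
    SELECT Dept, AVG(Salary)
    FROM Emp
    GROUP BY Dept
    HAVING AVG(Salary) > 35000
    ORDER BY Dept
""").fetchall()
print(rows)  # [('IT', 60000.0), ('Sales', 41000.0)]
```

The HR group (average 30500) is formed but then filtered out by the HAVING clause, which is exactly the grouped/applied/displayed sequence described above.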
JOINS
SQL joins are used to query data from two or more tables, based on a
relationship between certain columns in these tables.
• Tables are joined on common columns that have the same data type
and data width in table.
• Joins are used to combine columns from different tables and it also
specifies how to relate tables in query.
• In SQL, JOIN clause is used to combine the records from two or more
tables in a database.
Types of SQL JOIN
• INNER JOIN
• LEFT JOIN
• RIGHT JOIN
• FULL JOIN
1. INNER JOIN
• In SQL, INNER JOIN selects records that have matching values in
both tables as long as the condition is satisfied. It returns the
combination of all rows from both the tables where the condition
satisfies.
Syntax: SELECT table1.column1, table1.column2, table2.column1,....
FROM table1
INNER JOIN table2
ON table1.matching_column = table2.matching_column;
Query:
SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT
FROM EMPLOYEE
INNER JOIN PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;
2. LEFT JOIN
The SQL LEFT JOIN returns all the rows from the left table and the
matching values from the right table. If there is no matching join value,
it returns NULL.
Syntax: SELECT table1.column1, table1.column2, table2.column1,....
FROM table1
LEFT JOIN table2
ON table1.matching_column = table2.matching_column;
• Query:
SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT
FROM EMPLOYEE
LEFT JOIN PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;
3. RIGHT JOIN
• In SQL, RIGHT JOIN returns all the rows from the right table and the
matched values from the left table. If there is no match, it returns
NULL.
Syntax: SELECT table1.column1, table1.column2, table2.column1,....
FROM table1
RIGHT JOIN table2
ON table1.matching_column = table2.matching_column;
Query:
SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT
FROM EMPLOYEE
RIGHT JOIN PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;
4. FULL JOIN
In SQL, FULL JOIN combines the results of both left and right outer
joins: the joined result has all the records from both tables, and puts
NULL in place of matches not found.
Syntax: SELECT table1.column1, table1.column2, table2.column1,....
FROM table1
FULL JOIN table2
ON table1.matching_column = table2.matching_column;
Query:
SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT
FROM EMPLOYEE
FULL JOIN PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;
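The EMPLOYEE/PROJECT joins above can be reproduced with sqlite3 (the rows are invented; note that SQLite versions before 3.39 support only INNER and LEFT JOIN, so those two are shown):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (EMP_ID INTEGER, EMP_NAME TEXT)")
con.execute("CREATE TABLE PROJECT (EMP_ID INTEGER, DEPARTMENT TEXT)")
con.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)",
                [(1, "Asha"), (2, "Ravi"), (3, "Meena")])
con.executemany("INSERT INTO PROJECT VALUES (?, ?)",
                [(1, "Testing"), (2, "Development")])

# INNER JOIN: only employees with a matching PROJECT row.
inner = con.execute("""
    SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT
    FROM EMPLOYEE INNER JOIN PROJECT
    ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID
    ORDER BY EMPLOYEE.EMP_ID
""").fetchall()

# LEFT JOIN: every employee; NULL (None) where no project matches.
left = con.execute("""
    SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT
    FROM EMPLOYEE LEFT JOIN PROJECT
    ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID
    ORDER BY EMPLOYEE.EMP_ID
""").fetchall()

print(inner)  # [('Asha', 'Testing'), ('Ravi', 'Development')]
print(left)   # [('Asha', 'Testing'), ('Ravi', 'Development'), ('Meena', None)]
```

Meena has no PROJECT row, so she disappears from the inner join but survives the left join with a NULL department.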
VIEWS
• A view is a database object that is a logical representation of one or
more tables.
• A view does not contain data. Instead, a view is a virtual table,
deriving its data from base tables
• SQL views are data objects, like SQL tables that can be queried,
updated, and dropped.
• We can think of view as a stored query
Syntax: CREATE VIEW <View_Name> as SELECT Column_Name(s)
FROM <Table_Name>
WHERE <condition>
Functions of Views
• It can hide certain columns in a table.
• It allows using function and manipulating data.
• It represents the subset of the data contained in a base table.
Ex: Creating and Accessing view
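A small runnable sketch of creating and querying a view, using sqlite3 for illustration (the Emp table and the view name EmpPublic are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Emp (Name TEXT, Dept TEXT, Salary INTEGER)")
con.executemany("INSERT INTO Emp VALUES (?, ?, ?)",
                [("Asha", "Sales", 30000), ("Ravi", "IT", 52000)])

# The view stores the query, not the data: it hides the Salary column
# and represents only the subset of rows where Dept = 'IT'.
con.execute("CREATE VIEW EmpPublic AS SELECT Name, Dept FROM Emp WHERE Dept = 'IT'")
rows = con.execute("SELECT * FROM EmpPublic").fetchall()
print(rows)  # [('Ravi', 'IT')]

# Because the view derives its data from the base table,
# a new base-table row appears through the view automatically.
con.execute("INSERT INTO Emp VALUES ('Lee', 'IT', 60000)")
print(con.execute("SELECT COUNT(*) FROM EmpPublic").fetchone())  # (2,)
```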
Embedded SQL
• Embedded SQL is a method of combining the computing power of a
programming language and the database manipulation capabilities of
SQL.
• Embedded SQL statements are SQL statements written inline with the
program source code of the host language.
• The most popular host language is C.
• The combination of C and embedded SQL is called Pro*C in Oracle.
• The steps involved in compiling an embedded SQL program are given below.
This illustration shows the steps necessary to compile an embedded SQL program
Five steps are involved in compiling an embedded SQL
program.
Step 1:
• The embedded SQL program is submitted to the SQL pre-compiler.
• The pre-compiler scans the program and processes the embedded SQL statements.
• A different pre-compiler is required for different programming
languages.
Step 2:
• The pre-compiler produces two output files. The first file is the source
file, stripped of its embedded SQL statements.
• The second file is a copy of all the embedded SQL statements used in
the program. This file is sometimes called a database request module,
or DBRM.
Step 3:
• The source file output from the pre-compiler is submitted to the
standard compiler for the host programming language (such as a C or
COBOL compiler), which compiles it into an object module.
Step 4:
• The linker accepts the object modules generated by the compiler, links
them with various library routines, and produces an executable
program.
Step 5:
• The database request module sends its input to the binding utility.
This utility examines the SQL statements, validates and optimizes
them, and produces an access plan for each statement.
• The result is a combined access plan for the entire program which is
used to access the database.
Dynamic SQL
• It is a programming technique that is used to write SQL queries
during runtime.
• Here the SQL statements are not embedded in the source program
instead they can be entered interactively during runtime.
• Dynamic SQL could be used to create flexible SQL queries.
Dynamic SQL Concepts
• In dynamic SQL, the SQL statements are not hard coded in the
programming language. The text of the SQL statement is asked at the
run time to the user.
• In dynamic SQL, the SQL statements that are to be executed are not
known until runtime, so the DBMS cannot prepare for executing the
statements in advance.
• When the program is executed, the DBMS takes the text of the SQL
statement and executes it; a statement handled in this manner is
called a statement string.
Dynamic Statement Execution (Execute Immediate)
The Execute Immediate statement provides the simplest form of
dynamic SQL. This statement passes the text of SQL statements to
DBMS and asks the DBMS to execute the SQL statements
immediately.
To use the statement, our program goes through the following
steps.
• The program constructs a SQL statement as a string of text in one of
its data areas (called a buffer).
• The program passes the SQL statements to the DBMS with the
EXECUTE IMMEDIATE statement.
• The DBMS executes the statement and sets the SQLCODE/SQLSTATE
values to indicate its completion status, just as if the statement had
been hard-coded using static SQL.
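The three steps above can be sketched with sqlite3 (this is the dynamic-SQL pattern, not Oracle's actual EXECUTE IMMEDIATE; the table, the "user input" values, and the variable names are invented for the demo):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Emp (Name TEXT, Salary INTEGER)")
con.executemany("INSERT INTO Emp VALUES (?, ?)", [("Asha", 30000), ("Ravi", 52000)])

# Step 1: the program constructs the SQL statement as a string of text
# in one of its data areas (the buffer), here from hypothetical user input.
column, table = "Name", "Emp"
stmt = f"SELECT {column} FROM {table} WHERE Salary > ?"

# Steps 2-3: the statement string is passed to the DBMS, which executes
# it immediately and returns the result.
rows = con.execute(stmt, (40000,)).fetchall()
print(rows)  # [('Ravi',)]
```

Note the design caution: values should be passed as bound parameters (the `?` placeholder) rather than interpolated into the string, otherwise the runtime-built statement is open to SQL injection.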
Specifying Constraints as Assertions and Triggers
Specifying Constraints as Assertions
• When a constraint involves 2 (or) more tables, the table constraint
mechanism is sometimes awkward and may not give the intended result.
• To cover such situation SQL supports the creation of assertions that are
constraints not associated with only one table. And an assertion
statement should ensure a certain condition will always exist in the
database.
• Assertions are different from check constraints in that a check
constraint applies to one single row only.
• Assertions, on the other hand, may involve any number of rows in the
table, or any number of other tables.
• Assertion can check the condition, return a Boolean value.
• An assertion is a piece of SQL which makes sure a condition is
satisfied; otherwise it stops the action being taken on the database.
• An assertion is a constraint that may depend upon multiple tables.
• Domain constraints, functional dependencies, and referential integrity
are special forms of assertion. These forms of assertion involve a
single row of a table at a time.
• Any modification to a database is allowed only if it would not cause
any assertion to be violated i.e., assertions are checked only when
UPDATE or INSERT actions are performed against the table.
Syntax: create assertion <constraint name>
check (<search condition>)
(<Constraint Attributes>)
EX: Consider an Employee table; we want to create an assertion that no
employee in our database is paid more than 50,000 or less than 25,000
(the constraint name salary_check below is illustrative):
CREATE ASSERTION salary_check
CHECK ( NOT EXISTS
( SELECT ID FROM Employee E
WHERE E.Salary > 50000 OR E.Salary < 25000 ) );
Specifying Constraints as Triggers
• A trigger is a database object that is associated with a table and is
activated when an action is executed on that table.
• Triggers are sometimes called event-condition-action rules.
• Triggers are activated only when certain events occur. The usual events
are “insert”, “update”, “delete”.
• When the trigger is awakened, the trigger tests a condition. If the
condition does not hold, then nothing else associated with the trigger
happens in response to the given event.
• On the other hand, if the condition is satisfied, then a pre-defined
action is performed by the trigger.
Syntax: Trigger Creation
CREATE (or) REPLACE TRIGGER trigger_name
[ BEFORE | AFTER ] [ INSERT | UPDATE | DELETE ]
ON table _name
[FOR EACH ROW] [WHEN condition ]
BEGIN
-----------
-----------trigger body
-----------
END;
• [ BEFORE | AFTER ] – This specifies whether the trigger should
execute before the constraints are checked and the table updated, or
after.
• [ INSERT | UPDATE | DELETE ] – It specifies what type of DML
operation should activate the trigger.
• ON Table - This specifies what table the trigger is defined on.
• ON Database: - This is used to specify that the trigger is for a system
event . ( Startup, shutdown, and server error)
• ON Schema – This is used to specify a specific schema for which a
trigger is to be fired.
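A runnable event-condition-action example, using sqlite3 for illustration (the Emp/Audit tables and the trigger name emp_audit are invented; SQLite's trigger syntax is close to, but not identical with, Oracle's):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Emp (Name TEXT, Salary INTEGER);
CREATE TABLE Audit (Name TEXT, Action TEXT);

-- Event: INSERT on Emp. Action: record an audit row for each inserted row.
CREATE TRIGGER emp_audit AFTER INSERT ON Emp
FOR EACH ROW
BEGIN
    INSERT INTO Audit VALUES (NEW.Name, 'insert');
END;
""")

# The INSERT event activates the trigger automatically.
con.execute("INSERT INTO Emp VALUES ('Asha', 30000)")
print(con.execute("SELECT * FROM Audit").fetchall())  # [('Asha', 'insert')]
```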
Unit – 4
Chapter - 8
Transaction Processing
Concepts
INTRODUCTION
• Transaction processing system are systems with large databases and
hundreds of concurrent users that are executing database transactions.
• EX: System for reservations, banking, Credit card processing, stock
markets, supermarket checkout, and other similar systems.
• They require high availability and fast response time for hundreds of
concurrent users.
• In this chapter we present the concepts that are needed in transaction
processing system.
Transaction Processing Concepts
Transaction
A transaction is the basic logical unit of execution in an information
system. A transaction is a sequence of operations that must be executed
as a whole, taking a consistent (correct) database state into another
consistent (correct) database state.
A transaction is a discrete unit of work that must be completely
processed or not processed at all.
Transaction Processing System (TPS)
Transaction processing systems are systems with large databases and
hundreds of concurrent users that are executing database transactions.
Ex: Airline reservation system, Order entry/processing systems, Bank’s
account processing systems.
Characteristics of TPS are
• Large amounts of data are processed
• The TPS processes information on a regular basis: daily, weekly,
monthly, etc
• Large storage (database) capacity is required
• High processing speed is needed due to the high volume
• TPS basically monitors and collects past data
• High reliability is required
Transactional processing systems provide three functional areas.
1. System runtime functions
• Transaction processing systems provide an execution environment
that ensures the integrity, availability, and security of data.
• It also ensures fast response time and high transaction throughput .
2. System administration functions
• Transaction processing systems provide administrative support that
lets users configure, monitor, and manage their transaction system.
3. Application development functions
• Transaction processing systems provide functions for use in custom
business applications, Including function to access data, to perform
inter computer communication etc.
Single – user vs. Multi – user System
Single-User System
• A DBMS is single-user if at most one user can use the system at a
time.
• The basic components of a transaction processing system can be
found in single user systems.
• Single-user DBMSs are mostly restricted to personal computer
systems.
• Presentation Services – displays forms and handles the flow of
information to/from the screen.
Multi-User System
• A DBMS is multi-user if many users can use the system concurrently.
• In a multi-user system, many users can use the system and hence
access the database concurrently.
• Dumb terminals connected to mainframe
Application and presentation services on mainframe.
• ACID properties required
Isolation: DBMS sees an interleaved schedule.
Atomicity and durability: System supports a major enterprise.
• Transaction abstraction, implemented by DBMS, provides ACID
properties.
Transaction And Systems Concepts
Transaction Operations
• A transaction is an atomic unit of work that is either completed in its
entirety or not done at all.
• For recovery purposes the system needs to keep track of when the
transaction starts, terminates and commits (or) aborts.
The recovery manager keeps track of the following operations:
• Begin_transaction: marks the beginning of transaction execution.
• Read (or) Write : Two possible operations on the data. These specify
read or write operations on the database items that are executed as
part of a transaction.
• End_transaction: Specifies that the operations have ended and marks the
end of transaction execution. The changes can then be either committed or aborted.
Recovery techniques use the following operators:
• undo : similar to rollback except that it applies to a single operation
rather than to a whole transaction.
• redo : This specifies that certain transaction operations must be
redone to ensure that all the operations of a committed transaction
have been applied successfully to the database.
Transaction States
A transaction is any one execution of a user program in a DBMS, a
transaction moves through various types of execution states.
• Active state – the initial state, The active state is the first state of every
transaction. In this state, the transaction is being executed.
• Partially committed state - In the partially committed state, a
transaction executes its final operation, but the data is still not saved to
the database.
• Failed state – after the discovery that normal execution can no longer
proceed
• Aborted state – after the transaction has been rolled back and the
database has been restored to its state prior to the start of the transaction.
• Committed state - A transaction is said to be in a committed state if it
executes all its operations successfully. In this state, all the effects are
now permanently saved on the database system.
System Log
• Log is nothing but a file which contains a sequence of records, each
log record refers to a write operation.
• All the log records are recorded step by step in the log file. We can say,
log files store the history of all updates activities.
• This information may be needed to permit recovery from transaction
failures.
• Log contains start of transaction, transaction number, record number,
old value, new value, end of transaction etc. For example, mini
statements in bank ATMs.
• The log is kept on disk, so it is not affected by any type of failure
except for disk (or) catastrophic failure.
• In addition, the log is periodically backed up to archival storage (tape)
to guard against such catastrophic failures.
• “ T ” in the following discussion refers to a unique transaction-id that
is generated automatically by the system and is used to identify each
transaction.
Types of log records
• <Ti, Xi, V1, V2> − update log record,
where Ti=transaction, Xi=data, V1=old data, V2=new value.
• <Ti, start> − Transaction Ti starts execution.
• <Ti, commit> − Transaction Ti is committed.
• <Ti, abort> − Transaction Ti is aborted
[start_transaction, T] : Records that transaction T has started execution.
[read_item, T, X] : Records that transaction T has read the value of
database item X.
[commit, T] : Records that transaction T has completed successfully.
[abort, T] : Record that transaction T has been aborted.
Commit Point of a Transaction
A transaction T is said to reach its commit point only when it
completes all its operation that has actually accessed the database and
have been executed successfully and the effect of all transaction
operation on the database have been recorded in the log. The
transaction then writes an entry [commit, T] into the log.
• Roll Back of transaction : to erase all data modifications made from
the start of the transaction or to a savepoint.
• Redoing transactions: Transactions that have written their commit
entry in the log must also have recorded all their write operations in
the log, otherwise they would not be committed, so their effect on the
database can be redone from the log entries.
• Force writing a log: Before a transaction reaches its commit point,
any portion of the log that has not been written to the disk yet must
now be written to the disk.
This process is called force-writing the log file before committing a
transaction.
Desirable properties of Transaction
A transaction has four properties that must hold in order to maintain
consistency in a database, before and after the transaction. These are
called the ACID properties.
ACID properties are applied for maintaining the integrity of database
during transaction processing.
ACID stands for Atomicity, Consistency, Isolation, and Durability.
Atomicity
• It states that all operations of the transaction take place at once; if
not, the transaction is aborted.
• There is no midway, i.e., the transaction cannot occur partially. Each
transaction is treated as one unit and either run to completion or is
not executed at all.
Atomicity involves the following two operations:
Abort: if a transaction aborts then all the changes made are not visible.
Commit: if a transaction commits then all the changes made are visible.
Consistency
• The integrity constraints are maintained so that the database is
consistent before and after the transaction.
• The execution of transaction will leave a database in either its prior
stable state or new stable state.
• The consistent property of database states that every transaction sees
a consistent database instance.
• The transaction is used to transform the database from one consistent
state to another consistent state.
Isolation
• It shows that the data which is used at the time of execution of a
transaction cannot be used by the second transaction until the first
one is completed.
• In isolation, if the transaction T1 is being executed and using the data
item X, then that data item can't be accessed by any other transaction
T2 until transaction T1 ends.
• The concurrency control subsystem of the DBMS enforces the
isolation property.
Durability
• It guarantees that once a transaction is committed, its effects are
permanent and survive any system failures.
(optional)
• Atomicity ensures that a transaction is treated as a single
indivisible unit, either executing all its operations or none
at all.
• Consistency ensures that the database remains in a valid
state before and after a transaction.
• Isolation ensures that concurrent transactions do not
interfere with each other, maintaining data integrity.
• Durability guarantees that once a transaction is committed,
its effects are permanent and survive any system failures.
Together, these properties ensure reliability and maintain
data integrity in DBMS operations.
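Atomicity and durability can be observed directly with sqlite3 (the Account table and the simulated crash are invented for the demo):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Account (Name TEXT, Balance INTEGER)")
con.execute("INSERT INTO Account VALUES ('A', 100)")
con.commit()  # this state is now durable

# A transfer that fails mid-way is rolled back as one indivisible unit:
# the partial update does not survive (atomicity), and the database
# returns to its prior consistent state.
try:
    con.execute("UPDATE Account SET Balance = Balance - 50 WHERE Name = 'A'")
    raise RuntimeError("simulated crash before commit")
except RuntimeError:
    con.rollback()

print(con.execute("SELECT Balance FROM Account").fetchone())  # (100,)
```

There is no midway state: the balance is either updated and committed, or left exactly as the last commit recorded it.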
Transaction Support System in SQL
Transaction initiation is done implicitly when SQL statement is
executed. Every transaction must have explicit end statement, which is
either commit or rollback.
Every transaction has certain characteristics:
• Access mode: read only (or) read write.
• Diagnostic area size: option specifies an integer value n, indicating the
number of conditions that can be held simultaneously in the
diagnostic area.
(The diagnostics area contains two kinds of information:
✔ Statement information, such as the number of conditions that occurred or the
affected-rows count.
✔ Condition information, such as the error code and message.)
• The isolation level: option that can be read uncommitted, read
committed, repeatable read, serializable.
(serializable: ensures that multiple transactions can access and modify the same data without interfering with one another.)
If a transaction executes at a lower isolation level than serializable, then
one or more of the following violations may occur.
Dirty read: Reading a value that was written by a transaction which
failed.
Non repeatable read: Allowing another transaction to write a new value
between multiple reads of one transaction.
Phantom: A transaction T1 may read a set of rows from a table, then
another transaction T2 inserts new rows. If T1 repeated, then it will see
a Phantom.
(Phantom Read – Phantom Read occurs when two same queries are executed, but the
rows retrieved by the two, are different.
For example, suppose transaction T1 retrieves a set of rows that satisfy some search
criteria. Now, Transaction T2 generates some new rows that match the search criteria
for transaction T1)
Concurrency Control Techniques
• Concurrency control is a database management systems(DBMS)
concept that is used to address conflicts with the simultaneous
accessing or altering of data that can occur with a multi-user system.
• Concurrency control, when applied to a DBMS, is meant to
co-ordinate simultaneous transactions while preserving data integrity.
The concurrency is about to control the multiuser access of database.
Purpose of Concurrency control
• enforce isolation among transactions.
• preserve database consistency through consistency preserving execution of
transactions.
• resolve read-write and write-read conflicts.
Terms of Concurrency Control
Transaction: A transaction is an execution of a user program as a series of
reads and writes of database objects
Schedules: A schedule is the arrangement of transaction operations. A
schedule may contain a set of transactions (Reading, Writing, Aborting or
committing)To run transactions concurrently, we arrange or schedule their
operations in an interleaved fashion.
Serializability: Serialization helps in checking concurrency control between
multiple transactions. It also helps in maintaining consistency in the database
before and after any transaction. Serializable schedules are resource-efficient
and help in improving CPU throughput
Locking System for Concurrency Control
1. Binary locks
• A binary lock has two states: locked or unlocked. Two operations,
lock_item and unlock_item, are used with binary locking.
2. Shared / exclusive (or) read/write locks
1. Shared lock:
• It is also known as a Read-only lock. In a shared lock, the data item can
only read by the transaction.
• It can be shared between the transactions because when the transaction
holds a lock, then it can't update the data on the data item.
2. Exclusive lock:
• In the exclusive lock, the data item can be both reads as well as written
by the transaction.
• This lock is exclusive, and in this lock, multiple transactions do not
modify the same data simultaneously.
There are three locking operations:
1. read_lock(X)
2. Write_lock(X)
3. Unlock(X)
• There are three possible states read locked, write locked(or)
unlocked.
1. A read-locked item is also called shared-locked because other
transactions are allowed to read the item.
2. Write locked item is called exclusive-locked because a single
transaction exclusively holds the lock on the item.
3. To implement these three operations on a read/write lock, the
system keeps track of the number of transactions that hold a
shared (read) lock on an item in the lock table.
• Shared/ Exclusive locking scheme, the system must enforce the
following rules:
• A transaction T must issue the operation read_lock(X) or write_lock(X)
before any read_item(X) operation is performed in T.
• A transaction T must issue the operation write_lock(X) before any
write_item(X) operation is performed in T.
• A transaction T must issue the operation unlock(X) after all
read_item(X) and write_item(X) operations are completed in T.
• A transaction T will not issue a read_lock(X) operation if it already
holds a read lock or a write lock on item X.
• A transaction T will not issue a write_lock(X) operation if it already
holds a read lock (or) write lock on item X.
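The shared/exclusive rules can be sketched as a small read-write lock in Python (a minimal in-memory model of the rules, not the lock-table implementation the notes describe; the class and attribute names are invented):

```python
import threading

class ReadWriteLock:
    """Shared (read) / exclusive (write) lock: many readers OR one writer."""
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0     # transactions currently holding the shared lock
        self._writer = False  # True while one transaction holds the exclusive lock

    def read_lock(self):
        with self._cond:
            while self._writer:                  # readers must wait out a writer
                self._cond.wait()
            self._readers += 1

    def write_lock(self):
        with self._cond:
            while self._writer or self._readers:  # writer needs the item unlocked
                self._cond.wait()
            self._writer = True

    def unlock(self):
        with self._cond:
            if self._writer:
                self._writer = False
            elif self._readers:
                self._readers -= 1
            self._cond.notify_all()               # wake waiting transactions

lock = ReadWriteLock()
lock.read_lock(); lock.read_lock()   # two transactions share the read lock
print(lock._readers)                 # 2
lock.unlock(); lock.unlock()
lock.write_lock()                    # one transaction now holds it exclusively
print(lock._writer)                  # True
lock.unlock()
```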
Lock Conversion
• A transaction that already holds a lock on item X may convert the lock:
upgrading a read lock to a write lock, or downgrading a write lock to a
read lock.
Two phase Locking (2PL)
• The two-phase locking protocol divides the execution phase of the
transaction into three parts.
• In the first part, when the execution of the transaction starts, it seeks
permission for the lock it requires.
• In the second part, the transaction acquires all the locks. The third phase
is started as soon as the transaction releases its first lock.
• In the third phase, the transaction cannot demand any new locks. It only
releases the acquired locks.
Two-Phase Locking Techniques : Essential Components
1. Lock Manager: Managing locks on data items.
2. Lock Table: The lock manager uses it to store the identity of the
transaction locking a data item, the data item, the lock mode, and a
pointer to the next data item locked. One simple way to implement a
lock table is through a linked list.
• The database requires that all transactions be well-formed: a
transaction must lock a data item before it reads or writes it.
There are two phases of 2PL:
• Growing phase: In the growing phase, a new lock on the data item may
be acquired by the transaction, but none can be released.
• Shrinking phase: In the shrinking phase, existing lock held by the
transaction may be released, but no new locks can be acquired.
Two Phase policy generates two locking algorithm (a) Basic and (b)
Conservative.
• Conservative: Prevents deadlocks by locking all desired data items
before transaction begins execution.
• Basic: Transaction locks data items incrementally. This may cause
deadlock which is dealt with.
Problems caused by use of Locks
The use of locks, can cause two problems, they are
• Deadlock
• Starvation
Deadlock
• A system is in a deadlock state if there exists a set of transactions such
that every transaction in the set in waiting for another transaction in
the set.
• A deadlock is a situation in which two computer programs sharing the
same resource are effectively preventing each other from accessing
the resource, resulting in both programs ceasing function.
Ex: there exists a set of waiting transaction {T0, T1,……Tn} such that T0
is waiting for data item that is held by T1, T1 is waiting for a data item
that is held by T2, Tn-1 is waiting for a data item that is held by Tn, and
Tn is waiting for a data item that is held by T0. None of the transactions
can make progress in such a situation.
Dealing with Deadlock and Starvation
Deadlock prevention
• A transaction locks all data items it refers to before it begins execution.
• This way of locking prevents deadlock since a transaction never waits
for a data item.
• Conservative two-phase locking uses this approach.
Deadlock detection and resolution
• In this approach, deadlocks are allowed to happen. The scheduler
maintains a wait-for-graph for detecting cycles. If a cycle exists, then
one transaction involved in the cycle is selected (the victim) and
rolled back.
• A wait-for-graph is created using the lock table. As soon as a
transaction is blocked, it is added to the graph. When a chain such as
Ti waits for Tj, Tj waits for Tk, and Tk waits for Ti occurs, a cycle is
created.
Deadlock avoidance
• There are many variations of two-phase locking algorithm.
• Some avoid deadlock by not letting the cycle to complete.
• That is as soon as the algorithm discovers that blocking a transaction
is likely to create a cycle, it rolls back the transaction.
Starvation
• Starvation occurs when a particular transaction consistently waits or is
restarted and never gets a chance to proceed further.
• In a deadlock resolution it is possible that the same transaction may
consistently be selected as victim and rolled-back.
• The limitation is inherent in all priority based scheduling
mechanisms.
• In wound-wait scheme a younger transaction may always be wounded
(aborted) by a long running older transaction which may create
starvation.
DeadLock Prevention
A deadlock can be prevented by following two commonly used
schemes.
1. Wait – die
2. Wound-wait
Wait-die Scheme
• This is a non-preemptive technique: when transaction Ti requests a
data item held by Tj, Ti is allowed to wait only if it is older than Tj
(it has a smaller timestamp); otherwise Ti is rolled back ("dies") and
restarted later with the same timestamp.
2. Wound-Wait Scheme
• This is a preemptive technique: when transaction Ti requests a data
item held by Tj, Ti is allowed to wait only if it is younger than Tj;
otherwise Tj is rolled back ("wounded" by Ti) and restarted later with
the same timestamp.
Recovery Techniques
Database recovery techniques are used in database management
systems (DBMS) to restore a database to a consistent state after a
failure or error has occurred. The main goal of recovery techniques is
to ensure data integrity and consistency and prevent data loss.
Failure Classification
Failures are generally classified as transaction failures (logical or
system errors within a single transaction), system crashes (loss of the
contents of volatile storage), and disk failures (loss of non-volatile
storage).
Recovery Techniques in DBMS
The main recovery techniques are deferred update (NO-UNDO/REDO),
immediate update (UNDO/REDO), and shadow paging, all of which rely
on the system log described earlier.
Concurrency Control Based on Time Stamp Ordering
• Timestamp TS(T) is a unique identifier created by DBMS to identify
the transaction
• The timestamp values are assigned in the order in which the
transaction are submitted to the system
• The timestamp can be generated using a counter that is incremented
each time its value is assigned to a transaction
• The timestamp ordering (TO) algorithm orders the transactions
according to their timestamps.
• It guarantees serializability of schedules.
• The algorithm must ensure that, for each item accessed by conflicting
operations in the schedule, the order in which the item is accessed
does not violate the serializability order.
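The basic TO checks can be sketched in Python (a simplified model: each item keeps a read_TS and write_TS, and an operation arriving "too late" forces a rollback, signalled here as an exception; the Item class and function names are invented):

```python
class TOError(Exception):
    """Raised when an operation violates timestamp order: roll back the transaction."""

class Item:
    def __init__(self):
        self.read_ts = 0    # largest timestamp of any transaction that read the item
        self.write_ts = 0   # largest timestamp of any transaction that wrote the item

def read(item, ts):
    # Reject the read if a younger transaction has already written the item.
    if ts < item.write_ts:
        raise TOError("read too late: roll back")
    item.read_ts = max(item.read_ts, ts)

def write(item, ts):
    # Reject the write if a younger transaction has already read or written the item.
    if ts < item.read_ts or ts < item.write_ts:
        raise TOError("write too late: roll back")
    item.write_ts = ts

x = Item()
read(x, ts=5)        # TS(T1)=5 reads X
write(x, ts=7)       # TS(T2)=7 writes X -- allowed, timestamps increase
try:
    write(x, ts=6)   # TS(T3)=6 arrives too late: 6 < write_TS(X)=7
except TOError as e:
    print(e)         # write too late: roll back
```

Because every accepted operation respects the timestamp order, the resulting schedule is equivalent to the serial schedule in timestamp order.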
Database Backup and Recovery from Catastrophic Failures
• The main technique for handling catastrophic failures such as disk
crashes is database backup: the entire database and the log are
periodically copied to inexpensive storage such as tape, so that the
latest backup copy together with the backed-up log can be used to
restore the database.