Unit 1 and 2 It03

A Database Management System (DBMS) is software that efficiently manages, organizes, and retrieves structured data, playing a crucial role in modern computing. It minimizes data redundancy, ensures data integrity and security, and supports various architectures such as 1-Tier, 2-Tier, and 3-Tier for different application needs. Key features include data modeling, storage and retrieval, concurrency control, and backup mechanisms, with types including Relational, NoSQL, and Object-Oriented DBMS.

Uploaded by

Rishbah Tyagi
Copyright
© All Rights Reserved

UNIT 1

Introduction of DBMS (Database Management System)



A Database Management System (DBMS) is a software solution designed to
efficiently manage, organize, and retrieve data in a structured manner. It serves as a critical
component in modern computing, enabling organizations to store, manipulate, and secure
their data effectively. From small applications to enterprise systems, DBMS plays a vital role in
supporting data-driven decision-making and operational efficiency.
What is a DBMS?
A DBMS is a system that allows users to create, modify, and query databases while
ensuring data integrity, security, and efficient data access. Unlike traditional file systems,
DBMS minimizes data redundancy, prevents inconsistencies, and simplifies data management
with features like concurrent access and backup mechanisms. It organizes data
into tables, views, schemas, and reports, providing a structured approach to data management.
Example:
A university database can store and manage student information, faculty records, and
administrative data, allowing seamless retrieval, insertion, and deletion of information as
required.
Key Features of DBMS
1. Data Modeling: Tools to create and modify data models, defining the structure and
relationships within the database.
2. Data Storage and Retrieval: Efficient mechanisms for storing data and executing queries to
retrieve it quickly.
3. Concurrency Control: Ensures multiple users can access the database simultaneously
without conflicts.
4. Data Integrity and Security: Enforces rules to maintain accurate and secure data, including
access controls and encryption.
5. Backup and Recovery: Protects data with regular backups and enables recovery in case of
system failures.
Types of DBMS
There are several types of Database Management Systems (DBMS), each tailored to different
data structures, scalability requirements, and application needs. The most common types are as
follows:
1. Relational Database Management System (RDBMS)
RDBMS organizes data into tables (relations) composed of rows and columns. It uses primary
keys to uniquely identify rows and foreign keys to establish relationships between tables. Queries
are written in SQL (Structured Query Language), which allows for efficient data manipulation
and retrieval.
Examples: MySQL, Oracle, Microsoft SQL Server, and PostgreSQL.
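The ideas of tables, primary keys, foreign keys, and SQL queries can be tried out in a few lines. The following is only a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for a full RDBMS; the table names, column names, and sample rows are illustrative and do not come from these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")          # throwaway in-memory database
conn.execute("PRAGMA foreign_keys = ON")    # SQLite checks foreign keys only when enabled

# Two relations; dept_id in student is a foreign key to department.
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE student ("
    " student_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " dept_id INTEGER NOT NULL REFERENCES department(dept_id))"
)
conn.execute("INSERT INTO department VALUES (1, 'Computer Science')")
conn.execute("INSERT INTO student VALUES (101, 'Asha', 1)")

# SQL query that follows the foreign key to combine both tables.
joined = conn.execute(
    "SELECT s.name, d.name FROM student AS s "
    "JOIN department AS d ON s.dept_id = d.dept_id"
).fetchall()

# A row that points at a department that does not exist is rejected.
try:
    conn.execute("INSERT INTO student VALUES (102, 'Ravi', 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print(joined, fk_rejected)
```

The primary key identifies each row, and the foreign key lets the DBMS refuse a student row whose department does not exist.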
2. NoSQL DBMS
NoSQL systems are designed to handle large-scale data and provide high performance for
scenarios where relational models might be restrictive. They store data in various non-relational
formats, such as key-value pairs, documents, graphs, or columns. These flexible data models
enable rapid scaling and are well-suited for unstructured or semi-structured data.
Examples: MongoDB, Cassandra, DynamoDB and Redis.
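To make the key-value and document formats concrete, here is a toy in-memory sketch in plain Python. It does not use the API of MongoDB, Redis, or any other real product; the `find` helper, the keys, and the sample records are hypothetical and only illustrate the shape of the data.

```python
# Key-value model: an opaque value looked up by its key.
kv_store = {}
kv_store["session:42"] = "user=asha;theme=dark"

# Document model: each record is a self-describing, possibly nested document,
# and two documents in the same collection may have different fields.
students = [
    {"_id": 101, "name": "Asha", "courses": ["DBMS", "OS"], "address": {"city": "Delhi"}},
    {"_id": 102, "name": "Ravi", "email": "ravi@example.com"},
]

def find(collection, field, value):
    """Return the documents whose top-level `field` equals `value`."""
    return [doc for doc in collection if doc.get(field) == value]

by_name = find(students, "name", "Ravi")
with_email = [doc["_id"] for doc in students if "email" in doc]
print(kv_store["session:42"], by_name, with_email)
```

Note that the two student documents do not share the same fields, which a fixed relational table would not allow without NULL columns or a schema change.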

3. Object-Oriented DBMS (OODBMS)
OODBMS integrates object-oriented programming concepts into the database environment,
allowing data to be stored as objects. This approach supports complex data types and
relationships, making it ideal for applications requiring advanced data modeling and real-world
simulations.
Examples: ObjectDB, db4o.

Database System vs File System


A file system and a DBMS are both data management systems, but they serve different purposes
and have different characteristics. A file system organizes files into directories (folders) and
stores them on a storage device. It lets users and programs perform basic operations on those
files, such as reading, writing, and deleting.
A DBMS, on the other hand, is a more elaborate software application dedicated to managing
large amounts of structured data. It provides functionality such as querying, indexing,
transactions, and data integrity enforcement. A file system works well when data only needs to
be stored simply and needs little organization; a DBMS is more appropriate when data must be
structured, secured, and optimized for organizational use.
File System
A file system is a way of arranging files on a storage medium such as a hard disk. It organizes
the files and helps retrieve them when they are required. Files are grouped into directories,
which can in turn contain other directories and files. The file system performs basic operations
such as file management, file naming, and enforcing access rules.
Examples: NTFS (New Technology File System), ext (Extended File System).

Figure: File System

DBMS (Database Management System)


A Database Management System is software that manages a collection of related data. It is used
to store data and retrieve it efficiently when needed. It also provides security measures that
protect the data from unauthorized access. In a relational DBMS, data can be fetched using SQL
queries, whose semantics are grounded in relational algebra. A DBMS also provides mechanisms
for data backup and recovery.

Examples: Oracle, MySQL, Microsoft SQL Server.

Figure: DBMS
Difference Between File System and DBMS
Basis | File System | DBMS
Structure | A file system is a way of arranging files on a storage medium within a computer. | A DBMS is software for managing the database.
Data Redundancy | Redundant data can be present in a file system. | Data redundancy is minimized in a DBMS.
Backup and Recovery | It does not provide a built-in mechanism for backup and recovery of lost data. | It provides built-in tools for backup and recovery of lost data.
Query Processing | There is no efficient query processing in a file system. | Efficient query processing is available in a DBMS.
Consistency | There is less data consistency in a file system. | There is more data consistency because of the process of normalization.
Complexity | It is less complex than a DBMS. | It is more complex to handle than a file system.
Security Constraints | File systems provide less security than a DBMS. | A DBMS has more security mechanisms than a file system.
Cost | It is less expensive than a DBMS. | It has a comparatively higher cost than a file system.
Data Independence | There is no data independence. | Data independence exists, mainly of two types: 1) logical data independence and 2) physical data independence.
User Access | Only one user can access data at a time. | Multiple users can access data at a time.
Meaning | The user has to write procedures for managing the data. | The users are not required to write procedures.
Sharing | Data is distributed across many files, so it is not easy to share. | Because of its centralized nature, data sharing is easy.
Data Abstraction | It exposes the details of how data is stored and represented. | It hides the internal details of the database.
Integrity Constraints | Integrity constraints are difficult to implement. | Integrity constraints are easy to implement.
Attributes | To access data in a file, the user needs attributes such as the file name and file location. | No such attributes are required.
Example | File processing in COBOL, C++ | Oracle, SQL Server

Database System Concepts and Architecture


A database system is a structured collection of data designed for efficient storage, retrieval, and
management. Its architecture, the structural design, dictates how data is organized and accessed,
influencing efficiency and effectiveness. Key concepts include data models, schemas, and the
three-tier architecture.

Key Concepts:

• Data Models: Define how data is organized and represented within the database, influencing
how it is stored and retrieved. Examples include relational, entity-relationship, and
object-oriented models.
• Schemas: Describe the structure and constraints of a database, including tables, columns, and
relationships.
• Database Management System (DBMS): Software that manages the database, handling data
access, storage, and manipulation.

DBMS Architecture: 1-Tier, 2-Tier, and 3-Tier

A database stores a lot of critical information that must be accessed quickly and securely, so it
is important to select the right architecture for efficient data management and system
performance. DBMS architecture describes how the database is designed, built, and maintained,
and it shapes how users connect to the database and get their requests served. This section
explains the different DBMS architectures, such as client/server systems.
Types of DBMS Architecture
The type of DBMS architecture is chosen according to the usage requirements. The types
discussed here are:
• 1-Tier Architecture
• 2-Tier Architecture
• 3-Tier Architecture
1-Tier Architecture
In 1-Tier Architecture, the database is directly available to the user: the client, server, and
database are all present on the same machine, and the user works on the DBMS directly. This
setup is simple and works well for personal or standalone applications where no external server
or network connection is needed.
For example, a Microsoft Excel spreadsheet is a good illustration of one-tier architecture:
• Everything (the user interface, application logic, and data) is handled on a single system.
• The user interacts directly with the application, performs operations such as calculations or
data entry, and stores data locally on the same machine.
Figure: DBMS 1-Tier Architecture
Advantages of 1-Tier Architecture
• Simple Architecture: 1-Tier Architecture is the simplest architecture to set up, as only a
single machine is required to maintain it.
• Cost-Effective: No additional hardware is required for implementing 1-Tier Architecture,
which makes it cost-effective.
• Easy to Implement: 1-Tier Architecture can be easily deployed, and hence it is mostly used
in small projects.

2-Tier Architecture
The 2-tier architecture is similar to a basic client-server model. The application at the client
end communicates directly with the database on the server side, using APIs such as ODBC and
JDBC. The client side runs the user interfaces and application programs and establishes a
connection to the server side in order to communicate with the DBMS. The server side is
responsible for query processing and transaction management.

For Example: A Library Management System used in schools or small organizations is a
classic example of two-tier architecture.
1. Client Layer (Tier 1): This is the user interface that library staff or users interact with. For
example they might use a desktop application to search for books, issue them, or check due
dates.
2. Database Layer (Tier 2): The database server stores all the library records such as book
details, user information, and transaction logs.
The client layer sends a request (like searching for a book) to the database layer which processes
it and sends back the result. This separation allows the client to focus on the user interface, while
the server handles data storage and retrieval.
Figure: DBMS 2-Tier Architecture

Advantages of 2-Tier Architecture

• Easy to Access: The client connects to the database directly, which makes data retrieval
fast.
• Scalable: The database can be scaled easily by adding clients or upgrading hardware.
• Low Cost: 2-Tier Architecture is cheaper than 3-Tier Architecture and Multi-Tier
Architecture.
• Easy Deployment: 2-Tier Architecture is easier to deploy than 3-Tier Architecture.
• Simple: 2-Tier Architecture is easy to understand because it has only two components.
3-Tier Architecture
In 3-Tier Architecture, there is another layer between the client and the server. The client does
not communicate with the database server directly. Instead, it interacts with an application
server, which in turn communicates with the database system, where query processing and
transaction management take place. This intermediate layer acts as a medium for exchanging
partially processed data between the server and the client. This type of architecture is used in
large web applications.
For Example: E-commerce Store
User: You visit an online store, search for a product and add it to your cart.
Processing: The system checks if the product is in stock, calculates the total price and applies
any discounts.
Database: The product details, your cart and order history are stored in the database for future
reference.
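The three layers of the e-commerce example can be sketched in a single Python file. This is only an illustration under stated assumptions: sqlite3 plays the database tier, the `ApplicationServer` class, its `quote` method, the 10% discount, and the product data are all hypothetical, and in a real deployment each tier would run as a separate process or machine.

```python
import sqlite3

# Data tier: stores product details.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL, stock INTEGER)")
db.execute("INSERT INTO product VALUES (1, 'Keyboard', 1000.0, 5)")
db.commit()

class ApplicationServer:
    """Application tier: business rules live here, not in the client."""

    def __init__(self, conn, discount=0.10):
        self.conn = conn
        self.discount = discount

    def quote(self, product_id, quantity):
        row = self.conn.execute(
            "SELECT price, stock FROM product WHERE id = ?", (product_id,)
        ).fetchone()
        if row is None:
            return {"ok": False, "reason": "unknown product"}
        price, stock = row
        if quantity > stock:                      # stock check
            return {"ok": False, "reason": "out of stock"}
        total = price * quantity * (1 - self.discount)
        return {"ok": True, "total": round(total, 2)}

# Presentation tier: the client only talks to the application server.
server = ApplicationServer(db)
in_stock = server.quote(1, 2)
too_many = server.quote(1, 50)
print(in_stock, too_many)
```

The client code never issues SQL itself, which is the property that distinguishes this layout from the 2-tier case.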

Figure: DBMS 3-Tier Architecture

Advantages of 3-Tier Architecture

• Enhanced Scalability: Scalability is enhanced by the distributed deployment of application
servers. Individual connections no longer need to be made between each client and the
database server.
• Data Integrity: Because there is a middle layer between the client and the server, requests
can be checked before they reach the database, which helps avoid data corruption.
• Security: This model prevents direct interaction of the client with the database server,
thereby reducing unauthorized access to data.

Disadvantages of 3-Tier Architecture
• More Complex: 3-Tier Architecture is more complex than 2-Tier Architecture. The number of
communication points is also doubled.
• Difficult to Interact: Interaction between the client and the database becomes more difficult
because of the middle layer.

Data Models, Schemas, and Instances

A data model is a conceptual framework for organizing and defining the structure,
operations, and constraints of data in a database.

• It describes how data is stored, connected, accessed, and manipulated.

Types of Data Models:

1.) Hierarchical Data Model

The Hierarchical Data Model is the data model that organizes data in a tree-like structure,
where each record (or node) has a single parent, but can have multiple children.

• It represents one-to-many relationships, where child records are dependent on a single
parent record.
• This structure is useful for representing data with a clear hierarchical relationship, such
as organizational structures or file systems.

Example: A company database where each department is a parent and employees are
children of the department.
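The single-parent rule of the company example can be sketched in plain Python. The mapping below is a hypothetical illustration (the names are invented), not the storage format of any real hierarchical DBMS.

```python
# Each record stores exactly one parent, which is what makes the structure a tree.
parent = {
    "Sales": "Company",
    "Engineering": "Company",
    "Asha": "Engineering",
    "Ravi": "Engineering",
    "Meera": "Sales",
}

def children(node):
    """All records whose single parent is `node` (one-to-many)."""
    return sorted(child for child, p in parent.items() if p == node)

def path_to_root(node):
    """Follow parent links upward until the root is reached."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

print(children("Engineering"))
print(path_to_root("Asha"))
```

Because a dictionary key can hold only one value, an employee cannot belong to two departments here, which mirrors the restriction of the hierarchical model.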

2.) Network Data Model

The Network Data Model is an extension of the hierarchical model that allows many-to-
many relationships.

• In this model, data is represented using records (nodes) and sets (edges or links) to
connect them.
• Each record can have multiple parents and multiple children, allowing for more complex
relationships between records.

Example: A database of employees and projects where employees can work on multiple
projects, and each project can involve multiple employees.
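The employee and project example can be sketched as a set of links that is navigated in both directions. This is a plain-Python illustration with invented names, not the record and set syntax of an actual network DBMS.

```python
from collections import defaultdict

# Each link connects one employee record to one project record.
works_on = [("Asha", "Payroll"), ("Asha", "Website"), ("Ravi", "Website")]

projects_of = defaultdict(set)   # employee -> projects
members_of = defaultdict(set)    # project  -> employees
for employee, project in works_on:
    projects_of[employee].add(project)
    members_of[project].add(employee)

print(sorted(projects_of["Asha"]))
print(sorted(members_of["Website"]))
```

Unlike the hierarchical case, a record here can sit under several owners at once, which is how the many-to-many relationship is expressed.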

3.) Relational Data Model

The Relational Data Model organizes data in two-dimensional tables (relations), consisting
of rows (tuples) and columns (attributes). Each table represents a different entity, and
relationships between tables are maintained through the use of foreign keys.

• The relational model provides a highly flexible way to handle data and is the foundation
of modern database systems like MySQL, PostgreSQL, and Oracle.

Example: A student database with tables for students, courses, and enrollments where
relationships are defined through keys.
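The student database example can be written out as three tables. This is a minimal sketch that assumes Python's built-in sqlite3 module; the column names and sample rows are illustrative and not taken from these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE courses  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- enrollments links the two entities through foreign keys
CREATE TABLE enrollments (
    student_id INTEGER NOT NULL REFERENCES students(student_id),
    course_id  INTEGER NOT NULL REFERENCES courses(course_id),
    PRIMARY KEY (student_id, course_id)
);
INSERT INTO students VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO courses VALUES (10, 'DBMS'), (20, 'Networks');
INSERT INTO enrollments VALUES (1, 10), (1, 20), (2, 10);
""")

# Follow both foreign keys to list who is enrolled in what.
rows = conn.execute("""
    SELECT s.name, c.title
    FROM enrollments AS e
    JOIN students AS s ON s.student_id = e.student_id
    JOIN courses  AS c ON c.course_id  = e.course_id
    ORDER BY s.name, c.title
""").fetchall()
print(rows)
```

The enrollments table holds only keys; the names and titles are stored once and combined at query time.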

4.) Object-Oriented Data Model

The Object-Oriented Data Model is based on the principles of object-oriented programming.

• In this model, data is stored in objects, which are instances of classes, and can contain
both data (attributes) and methods (operations).
• This model allows for better representation of real-world entities by encapsulating both
the state and behavior of an entity.

Example: A product inventory system where each product is an object with properties like
name, price, and methods to update or calculate discounts.
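The product example can be sketched as an ordinary Python class. This only shows the idea of keeping state and behavior together; it is not the API of an object database such as ObjectDB or db4o, and the method names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Product:
    # State (attributes)
    name: str
    price: float

    # Behavior (methods) stored alongside the state
    def discounted_price(self, percent):
        """Price after taking `percent` off, rounded to 2 decimals."""
        return round(self.price * (1 - percent / 100), 2)

    def update_price(self, new_price):
        if new_price < 0:
            raise ValueError("price cannot be negative")
        self.price = new_price

keyboard = Product("Keyboard", 1000.0)
sale_price = keyboard.discounted_price(10)
keyboard.update_price(1200.0)
print(sale_price, keyboard.price)
```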

5.) Entity-Relationship (ER) Model

The Entity-Relationship (ER) Model is a high-level conceptual model that defines the
structure of data by describing entities (objects), attributes (properties of entities), and
relationships between entities.

• The ER model is typically represented visually using ER diagrams, which help in
designing a database by mapping out the entities and relationships in a system.

Example: A university database where students, courses, and instructors are entities, and
the relationships between them (e.g., a student enrolls in a course) are defined.

Schemas

A schema is the overall logical structure of a database that defines how the data is
organized and how relationships among data are maintained.

• It can be viewed as a blueprint or architecture of the database that defines tables, fields,
data types, and relationships.

Types of Schemas:

1.) Physical Schema:

The physical schema describes how the data is actually stored on the storage media. It
includes details about the physical storage of data, such as file structures, indexing methods,
and storage allocations.

• Determines how the logical schema is implemented on physical storage devices.
• Optimizes query performance and storage utilization.

2.) Logical Schema:

The logical schema is an abstract representation of the database’s structure, capturing the
logical relationships between data elements without concern for the physical implementation
details.

• Provides a high-level understanding of the data and its relationships.
• Ensures data consistency, integrity, and normalization.
• Serves as a blueprint for database administrators and developers.

3.) External Schema:

The external schema defines how individual users or user groups interact with the database.
It provides a customized view of the database tailored to the needs of different users or
applications.

• Ensures data security by controlling user access to specific data.
• Simplifies user interactions by presenting only the relevant data and hiding the rest.
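In SQL systems an external schema is commonly realized with views. The sketch below assumes Python's built-in sqlite3 module; the `employee` table and the `employee_directory` view are hypothetical names. SQLite has no user accounts, so the view here only demonstrates the hiding of columns, not enforced access control.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary REAL
);
INSERT INTO employee VALUES (1, 'Asha', 'Engineering', 90000),
                            (2, 'Ravi', 'Sales', 60000);
-- External view for a staff directory: the salary column is left out.
CREATE VIEW employee_directory AS
    SELECT emp_id, name, department FROM employee;
""")

cursor = conn.execute("SELECT * FROM employee_directory ORDER BY emp_id")
columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()
print(columns)
print(rows)
```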

Instances

An instance refers to the actual data stored in a database at a particular moment in time. It
is the snapshot of the database content.

Explanation:

Instance vs. Schema: While a schema defines the structure of the database (tables,
columns, etc.), an instance represents the actual content within that structure at any given
point. The schema remains relatively static, while instances can change frequently as data is
inserted, updated, or deleted.

• Example: In a relational database with a table Students (defined by the schema), an
instance would be the actual rows of data currently stored in the Students table.
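The difference can be observed directly: the schema text stays the same while the rows change. This is a minimal sketch assuming Python's built-in sqlite3 module, which keeps each table definition in its `sqlite_master` catalog; the columns of Students are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (roll_no INTEGER PRIMARY KEY, name TEXT)")

def schema():
    """The stored definition of the Students table."""
    return conn.execute(
        "SELECT sql FROM sqlite_master WHERE name = 'Students'"
    ).fetchone()[0]

def instance():
    """The rows currently stored in the Students table."""
    return conn.execute("SELECT roll_no, name FROM Students ORDER BY roll_no").fetchall()

schema_before, instance_before = schema(), instance()

conn.execute("INSERT INTO Students VALUES (1, 'Asha')")
conn.execute("INSERT INTO Students VALUES (2, 'Ravi')")
conn.execute("DELETE FROM Students WHERE roll_no = 1")

schema_after, instance_after = schema(), instance()
print(schema_before == schema_after, instance_before, instance_after)
```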

Relation Between Data Models, Schemas, and Instances:

• Data Model: Provides the abstract framework and rules for how data can be stored and
manipulated (e.g., relational model, ER model).
• Schema: Implements the data model for a specific database, defining its structure
(tables, fields, relationships).
• Instance: Represents the actual data that populates the schema at any given time.

Data Independence, Database Languages, and Interfaces

What is Data Independence in DBMS?

Data independence is a property of a database management system that allows the database
schema at one level to be changed without requiring a change to the schema at the next higher
level. Data independence makes data independent of the programs that use it, and it is easiest
to understand through the three-schema architecture.
Data independence is a form of transparency: it gives user applications a kind of immunity to
changes in how data is defined and organized. Applications should not have to deal with the
details of data representation and storage, because that reduces their quality and flexibility;
the DBMS instead presents a generalized, abstract view of the data. In short, data independence
is the ability to change the structure of a lower-level schema without affecting the
upper-level schema.

Types of Data Independence
There are two types of data independence:
• Logical data independence
• Physical data independence

Logical Data Independence


• Logical data independence is the ability to change the logical schema (conceptual level)
without changing the external schema (view level).
• It keeps the external schema separate from the logical schema.
• Changes made at the conceptual level do not affect the view level.
• It operates at the interface between the external (user view) level and the conceptual level.
• For example, entities or attributes can be added to or deleted from the conceptual schema
without changing the external schemas that do not use them.
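Logical data independence can be demonstrated with a view that stands between the application and the base table. This is a sketch under assumptions: it uses Python's built-in sqlite3 module, and the `student` table, the `student_names` view, and the `app_query` helper are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO student VALUES (1, 'Asha');
-- External schema: the application reads through this view only.
CREATE VIEW student_names AS SELECT student_id, name FROM student;
""")

def app_query():
    """The application's query, written against the view."""
    return conn.execute("SELECT student_id, name FROM student_names").fetchall()

before = app_query()

# Change to the conceptual schema: a new attribute is added.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")

after = app_query()   # same query text, same result
print(before, after)
```

The application query did not have to be edited after the new attribute was added, because it depends on the view rather than on the full table definition.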

Physical Data Independence


• Physical data independence is the ability to change the physical schema without changing
the logical schema.
• For example, changing the storage capacity of the database server does not affect the
conceptual structure of the database.
• It keeps the conceptual level separate from the internal level.
• It operates at the interface between the conceptual (logical) level and the internal level.
• Example: moving the database files from the C drive to the D drive.
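Adding an index is another physical-level change that leaves the logical schema and the queries untouched. The following sketch assumes Python's built-in sqlite3 module; the `orders` table, the index name, and the data are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Asha", 250.0), (2, "Ravi", 100.0), (3, "Asha", 75.5)],
)

QUERY = "SELECT order_id, amount FROM orders WHERE customer = ? ORDER BY order_id"

before = conn.execute(QUERY, ("Asha",)).fetchall()

# Physical-level change: a new access path is added; tables and columns are unchanged.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")

after = conn.execute(QUERY, ("Asha",)).fetchall()
print(before == after, after)
```

The index may change how fast the rows are found, but the query text and its result stay the same.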

Difference Between Physical and Logical Data Independence


Physical Data Independence | Logical Data Independence
It mainly concerns how the data is stored in the system. | It mainly concerns changes to the structure or definition of the data.
It is easier to achieve than logical independence. | It is more difficult to achieve than physical independence.
Changes at the physical level generally do not require changes at the application program level. | Changes at the logical level may require changes at the application program level.
It is concerned with the internal schema. | It is concerned with the conceptual schema.
Changes at the internal level may or may not be needed to improve the structure. | Whenever the logical structure of the database has to be changed, the changes made at the logical level are important.
Example: a change in compression technology, hashing algorithm, storage device, etc. | Example: adding, modifying, or deleting an attribute.

Database Languages

Database languages are used to read, store and update the data in the database. Specific
languages are used to perform various operations of the database.

Types of Database Languages

DDL (Data Definition Language)

Data Definition Language (DDL) is used to describe the structures in a database and the
relationships between them. It defines the database schema, tables, indexes, constraints, and so
on. This information, such as table names, column names, and indexes, is recorded as metadata.
DDL commands operate on the database structure rather than on individual data values.
The commands used in DDL are:
Create: Creates a database or table.
Alter: Changes the structure of a database object.
Drop: Completely deletes a table from the database.
Rename: Renames a table.
Truncate: Deletes all rows in a table while keeping the structure of the table.
Comment: Adds comments to the data dictionary.
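Several of these commands can be tried with Python's built-in sqlite3 module. This is a sketch with hypothetical table names; SQLite supports CREATE, ALTER, and DROP, expresses a rename as ALTER TABLE ... RENAME TO, and has no TRUNCATE or COMMENT statement, so those two are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def tables():
    """Names of the tables that currently exist."""
    return sorted(r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"))

def columns(table):
    """Column names of `table`, read from SQLite's table_info pragma."""
    return [r[1] for r in conn.execute(f"PRAGMA table_info({table})")]

conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")   # Create
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")                  # Alter
cols_after_alter = columns("student")

conn.execute("ALTER TABLE student RENAME TO learner")                      # Rename
tables_after_rename = tables()

conn.execute("DROP TABLE learner")                                         # Drop
tables_after_drop = tables()

print(cols_after_alter, tables_after_rename, tables_after_drop)
```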

DML (Data Manipulation Language)

DML is used to manipulate the data present in a table or database. It supports operations such
as inserting, retrieving, updating, and deleting data.

The commands used in DML are:
Select: Retrieves records from a table. It can be used with a WHERE clause to get particular
records.
Insert: Inserts data into database tables.
Update: Modifies existing data in database tables.
Delete: Deletes records from database tables. It can be used with a WHERE clause to delete
particular rows from a table.
Merge: Combines insert and update operations (an UPSERT).
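The four basic DML commands can be exercised as follows. This is a minimal sketch assuming Python's built-in sqlite3 module with an invented `student` table; MERGE is left out because SQLite does not provide that statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, grade TEXT)")

# Insert: add new rows.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", "B"), (2, "Ravi", "C")],
)

# Select with a WHERE clause: fetch one particular record.
ravi = conn.execute("SELECT name, grade FROM student WHERE id = ?", (2,)).fetchone()

# Update: modify existing data.
conn.execute("UPDATE student SET grade = 'A' WHERE id = 1")

# Delete with a WHERE clause: remove one particular row.
conn.execute("DELETE FROM student WHERE id = 2")

remaining = conn.execute("SELECT id, name, grade FROM student").fetchall()
print(ravi, remaining)
```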

DCL (Data Control Language)

DCL consists of the SQL commands that control which users may access, modify, and work on a
database. It grants access, revokes access, and changes permissions as required by the owner of
the database.
The commands used in DCL are:
Grant: Gives access privileges to a specific database user.
Revoke: Withdraws privileges that were previously given to a user with the Grant
command.

TCL (Transaction Control Language)

TCL manages the changes made by DML commands, which can be grouped into logical
transactions.
Commit: Saves the changes made by the transaction to the database.
Rollback: Restores the database to its state as of the last commit.
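Commit and rollback can be observed with Python's built-in sqlite3 module, whose connection object exposes `commit()` and `rollback()` methods. The `account` table and the amounts are hypothetical, and the sketch relies on the module's default behavior of opening a transaction before a data-changing statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.commit()                      # the starting balance is now saved

def balance():
    return conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]

# A change that is rolled back disappears.
conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
during_transaction = balance()
conn.rollback()
after_rollback = balance()

# A change that is committed stays.
conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
conn.commit()
after_commit = balance()

print(during_transaction, after_rollback, after_commit)
```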

Interface

An interface is a program that allows users to input queries into a database without writing the
code in the query language. An interface can be used to manipulate the database for adding,
deleting, updating, or viewing the data.

The types of interface are:

Form-based Interface

A form-based interface displays a form to each user. The user fills in the details and submits
the form to make a new entry in the database. Alternatively, the user fills in only some details
and the system retrieves the rest from the database. Form-based interfaces are built for naive
(inexperienced) users who perform a limited number of operations. Many DBMSs have a forms
specification language that helps programmers define such forms.

Example
A student enters a roll number and branch in a form to get the grade card.

Menu-based User Interface

In this interface, the user is given a list of options (called a menu) from which to build a
request. The user does not need to memorize commands and syntax; the query is composed step by
step by picking options from menus. Pull-down menus are widely used in web-based user
interfaces and in browsing interfaces, through which the database content can be looked
through.
Example
On a shopping website, categories are selected from a menu, brands from a menu of brands, and
a budget range from a menu of budget ranges.

GUI (Graphical User Interface)

A GUI presents the schema to the user in diagrammatic form, and the user specifies a query by
manipulating the diagram. In many cases a GUI uses both menus and forms. Specific parts of the
schema diagram are selected with a pointing device.
Example
Liking a video on Instagram by tapping it with a finger turns the icon red: the visual graphic
changes in response to the user's action.

Natural Language Interface

A natural language interface has its own schema, similar to the high-level conceptual schema,
as well as a dictionary of important words. It interprets the important words in the user's
input and, if the interpretation is successful, generates a query and displays the result to the
user.
Example
A user searches Google for the fastest car in India; the natural language interface looks for
the important words (fastest, car, India) and shows the result accordingly.

Speech Input and Output

Users query the interface by speech and get the answer as speech. The input is detected using
a set of predefined words, and the output is converted into speech. This type of interface has
become increasingly common.
Example
Google Assistant (OK Google), Siri on Apple devices, and Alexa are used through speech.

Interface for DBA

Database administrators (DBAs) are provided with privileged commands that only they can use,
for example to create accounts, grant account authorization, change a schema, and reorganize
storage structures.

DDL Full Form - Data Definition Language


DDL stands for Data Definition Language. It is the set of commands used to create, change, and
maintain the structure of a database and its objects. For example, DDL commands can be used to
add, remove, or modify tables within a database.
What is DDL?
DDL includes the CREATE, ALTER, DROP, TRUNCATE, and RENAME statements, which create, change
the structure of, and drop objects in the database, such as tables. DDL deals with how the data
is structured rather than with the data itself.
DML

DML, or Data Manipulation Language, is the part of a database language used to manipulate data
within a database. It is a core component of SQL, allowing users to interact with and modify
data.

Purpose:
DML statements perform operations on the data stored in a database, such as retrieving, adding,
modifying, and deleting records.

Key Commands:
• SELECT: Retrieves data from a database.
• INSERT: Adds new data to a database.
• UPDATE: Modifies existing data in a database.
• DELETE: Deletes data from a database.

Overall Database Structure

A database structure is the way data is organized and stored in a database system. This
structure includes the types of data stored and how the different data elements relate to each
other. The overall design of a database is also known as the database schema.

1. Database Schema:

• Logical Structure: The schema defines the logical organization of the data, including tables,
fields, relationships, and rules.
• Relationships: It outlines the relationships between entities, expressed through primary and
foreign keys.
• Data Organization: The schema helps resolve issues with unstructured data by organizing it
in a clear, structured way.
2. Key Components of a Database Structure:
 Tables: Data is organized into tables with rows and columns.
 Fields (Attributes): Columns within a table that define the type of data stored.
 Records (Rows): A set of field values that together describe a single entity instance.
 Data Dictionary: A centralized repository that stores metadata, including data types,
relationships, and constraints.
 Indexes: Used for faster retrieval of data.
 Constraints: Rules that ensure data integrity, such as primary keys, foreign keys, and data type
constraints.

3. DBMS Architecture (Three Levels):

 Internal Level:
Deals with the physical storage of data, including disk storage, data compression, and
indexing.
 Conceptual Level:
Represents the logical organization of the data, including tables, attributes, and relationships,
independent of any specific DBMS.
 External Level:
Defines how users interact with the database, providing customized views and interfaces.
4. Types of Database Structures:

 Hierarchical: Organizes data in a tree-like structure.


 Network: Extends the hierarchical model with more flexible relationships.
 Relational: Organizes data into tables with rows and columns.
 Object-Oriented: Stores data as objects with attributes and methods.

Data modeling using the Entity Relationship Model:


Introduction of ER Model


The Entity-Relationship Model (ER Model) is a conceptual model for designing a database. This
model represents the logical structure of a database, including entities, their attributes and
the relationships between them.
 Entity: An object that is stored as data, such as Student, Course or Company.
 Attribute: A property that describes an entity, such as StudentID, CourseName,
or EmployeeEmail.
 Relationship: A connection between entities, such as "a Student enrolls in a Course".
Components of ER Diagram

The graphical representation of this model is called an Entity-Relationship Diagram (ERD).


ER Model in Database Design Process
We typically follow the steps below when designing a database for an application.
 Gather the requirements (functional and data) by asking questions of the database users.
 Create a logical or conceptual design of the database. This is where the ER model plays a role:
it is the most widely used graphical representation of the conceptual design of a database.
 After this, focus on physical database design (such as indexing) and external design (such as
views).

Why Use ER Diagrams In DBMS?

 ER diagrams represent the E-R model in a database, making them easy to convert into
relations (tables).
 These diagrams model real-world objects directly, which makes them intuitive and
useful.
 Unlike technical schemas, ER diagrams require no technical knowledge of the underlying
DBMS used.
 They visually model data and its relationships, making complex systems easier to
understand.
Symbols Used in ER Model
ER Model is used to model the logical view of the system from a data perspective which
consists of these symbols:
 Rectangles: Rectangles represent entities in the ER Model.
 Ellipses: Ellipses represent attributes in the ER Model.
 Diamond: Diamonds represent relationships among Entities.
 Lines: Lines link attributes to entities and entity sets to relationship types.
 Double Ellipse: Double ellipses represent multi-valued attributes, such as a student's
multiple phone numbers.
 Double Rectangle: Represents weak entities, which depend on other entities for
identification.

What is an Entity?
An Entity represents a real-world object, concept or thing about which data is stored in a
database. It acts as a building block of a database. Tables in a relational database represent
these entities.
Example of entities:
 Real-World Objects: Person, Car, Employee etc.
 Concepts: Course, Event, Reservation etc.
 Things: Product, Document, Device etc.
The entity type defines the structure of an entity, while individual instances of that type
represent specific entities.
What is an Entity Set?
An entity refers to an individual object of an entity type, and the collection of all entities of a
particular type is called an entity set. For example, E1 is an entity that belongs to the entity
type "Student," and the group of all students forms the entity set.
In an ER diagram, the entity type is drawn as a rectangle labelled with its name.
Figure: Entity Set

We can represent the entity sets in an ER Diagram but we can't represent individual entities
because an entity is like a row in a table, and an ER diagram shows the structure and
relationships of data, not specific data entries (like rows and columns). An ER diagram is a
visual representation of the data model, not the actual data itself.

Types of Entity

There are two main types of entities:
1. Strong Entity
A Strong Entity is a type of entity that has a key Attribute that can uniquely identify each
instance of the entity. A Strong Entity does not depend on any other Entity in the Schema for
its identification. It has a primary key that ensures its uniqueness and is represented by a
rectangle in an ER diagram.
2. Weak Entity
A Weak Entity cannot be uniquely identified by its own attributes alone. It depends on a strong
entity, called its identifying entity, for identification. A weak entity is represented by a
double rectangle. The participation of a weak entity type in its identifying relationship is
always total. The relationship between the weak entity type and its identifying strong entity
type is called the identifying relationship, and it is represented by a double diamond.
Example:
A company may store the information of dependents (Parents, Children, Spouse) of an
Employee. But the dependents can't exist without the employee. So dependent will be a Weak
Entity Type and Employee will be identifying entity type for dependent, which means it is
Strong Entity Type.
Strong Entity and Weak Entity
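One common way to carry the Employee/Dependent example into tables is sketched below. This is an illustration under assumptions, not the only possible mapping: it uses Python's sqlite3 module, and the column names and sample people (Asha, Ravi, Meera) are hypothetical. The weak entity's primary key combines the owner's key with the dependent's name, and its foreign key is NOT NULL so a dependent cannot exist without an employee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Strong entity: identified by its own key
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Weak entity: identified by the owner's key plus a partial key (dep_name)
conn.execute("""
    CREATE TABLE dependent (
        emp_id   INTEGER NOT NULL REFERENCES employee(emp_id) ON DELETE CASCADE,
        dep_name TEXT NOT NULL,
        relation TEXT,
        PRIMARY KEY (emp_id, dep_name)
    )
""")

conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
# The same dependent name may appear under different employees
conn.executemany(
    "INSERT INTO dependent VALUES (?, ?, ?)",
    [(1, "Meera", "Child"), (2, "Meera", "Spouse")],
)

# A dependent whose employee does not exist is rejected
try:
    conn.execute("INSERT INTO dependent VALUES (99, 'Nobody', 'Child')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Removing the employee removes that employee's dependents as well
conn.execute("DELETE FROM employee WHERE emp_id = 1")
remaining = conn.execute("SELECT emp_id, dep_name FROM dependent").fetchall()
print(orphan_rejected, remaining)
```

The ON DELETE CASCADE clause is a design choice that mirrors "dependents can't exist without the employee"; a system could instead refuse to delete an employee who still has dependents.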

Attributes in ER Model
Attributes are the properties that define the entity type. For example, for a Student entity
Roll_No, Name, DOB, Age, Address, and Mobile_No are the attributes that define entity type
Student. In ER diagram, the attribute is represented by an oval.
Attribute

Types of Attributes
1. Key Attribute
The attribute which uniquely identifies each entity in the entity set is called the key attribute.
For example, Roll_No will be unique for each student. In ER diagram, the key attribute is
represented by an oval with an underline.
Key Attribute

2. Composite Attribute
An attribute composed of several other attributes is called a composite attribute. For example,
the Address attribute of the Student entity type consists of Street, City, State, and Country. In
an ER diagram, the composite attribute is represented by an oval connected to the ovals of its
component attributes.
Composite Attribute

3. Multivalued Attribute
An attribute that can hold more than one value for a given entity is called a multivalued
attribute. For example, Phone_No (a given student can have more than one). In an ER diagram, a
multivalued attribute is represented by a double oval.
Multivalued Attribute

4. Derived Attribute

An attribute that can be derived from other attributes of the entity type is known as a derived
attribute. For example, Age can be derived from DOB. In an ER diagram, the derived attribute is
represented by a dashed oval.
Derived Attribute

The Complete Entity Type Student with its Attributes can be represented as:
Entity and Attributes
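The four attribute types can be carried into tables in a fairly standard way, sketched here with Python's sqlite3 module. The table layout, sample values and the reference date are assumptions for illustration: the key attribute becomes the primary key, the composite Address is flattened into its parts, the multivalued Mobile_No moves to its own table, and the derived Age is computed from DOB instead of being stored.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE student (
        roll_no INTEGER PRIMARY KEY,   -- key attribute
        name    TEXT NOT NULL,
        dob     TEXT NOT NULL,         -- ISO date; Age is derived from this
        street  TEXT, city TEXT, state TEXT, country TEXT  -- composite Address, flattened
    )
""")
conn.execute("""
    CREATE TABLE student_phone (       -- multivalued Mobile_No gets its own table
        roll_no   INTEGER NOT NULL REFERENCES student(roll_no),
        mobile_no TEXT NOT NULL,
        PRIMARY KEY (roll_no, mobile_no)
    )
""")

conn.execute(
    "INSERT INTO student VALUES (1, 'RAM', '2005-03-15', '12 MG Road', 'DELHI', 'DELHI', 'INDIA')"
)
conn.executemany(
    "INSERT INTO student_phone VALUES (?, ?)",
    [(1, "9455123451"), (1, "9000000000")],
)

def age_on(dob_iso, today):
    """Derived attribute: whole years between the date of birth and `today`."""
    dob = date.fromisoformat(dob_iso)
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)

(dob,) = conn.execute("SELECT dob FROM student WHERE roll_no = 1").fetchone()
age = age_on(dob, date(2024, 1, 1))  # fixed reference date keeps the example repeatable
phones = [
    p for (p,) in conn.execute(
        "SELECT mobile_no FROM student_phone WHERE roll_no = 1 ORDER BY mobile_no"
    )
]
print(age, phones)
```

Storing Age directly would make it go stale every birthday, which is why derived attributes are usually computed on demand.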

Relationship Type and Relationship Set


A Relationship Type represents the association between entity types. For example, ‘Enrolled
in’ is a relationship type that exists between the entity types Student and Course. In an ER
diagram, the relationship type is represented by a diamond connected to the participating
entities with lines.
Entity-Relationship Set

A set of relationships of the same type is known as a relationship set. The following
relationship set depicts S1 as enrolled in C2, S2 as enrolled in C1, and S3 as enrolled in C3.
Relationship Set

Degree of a Relationship Set


The number of different entity sets participating in a relationship set is called the degree of
the relationship set.
1. Unary Relationship: When only one entity set participates in a relationship, the
relationship is called a unary relationship. For example, a person is married to another
person: both participants come from the same entity set, Person.
Unary Relationship

2. Binary Relationship: When two entity sets participate in a relationship, the
relationship is called a binary relationship. For example, a Student is enrolled in a Course.
Binary Relationship

3. Ternary Relationship: When three entity sets participate in a relationship, the
relationship is called a ternary relationship.
4. N-ary Relationship: When n entity sets participate in a relationship, the
relationship is called an n-ary relationship.
Cardinality in ER Model
The maximum number of times an entity of an entity set participates in a relationship set is
known as cardinality.
Cardinality can be of different types:
1. One-to-One
When each entity in each entity set can take part only once in the relationship, the cardinality is
one-to-one. Let us assume that a male can marry one female and a female can marry one male.
So the relationship will be one-to-one.

2. One-to-Many

In a one-to-many mapping, an entity in one entity set can be related to more than one entity in
the other set, while each entity in the other set is related to at most one. Let us assume that
one surgery department can accommodate many doctors. So the cardinality will be 1 to M. It
means one department has many doctors.
one to many cardinality

3. Many-to-One
When entities in one entity set can take part only once in the relationship set and entities in
other entity sets can take part more than once in the relationship set, cardinality is many to one.
Let us assume that a student can take only one course but one course can be taken by many
students. So the cardinality will be n to 1. It means that for one course there can be n students
but for one student, there will be only one course.
many to one cardinality

4. Many-to-Many
When entities in all entity sets can take part more than once in the relationship, the
cardinality is many-to-many. Let us assume that a student can take more than one course and one
course can be taken by many students. So the relationship will be many-to-many.
many to many cardinality
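In a relational database, a many-to-many relationship such as Student/Course is usually stored in a separate table that pairs the keys of both sides. The sketch below assumes Python's sqlite3 module; the table name enrolled_in and the identifiers S1, S2, C1, C2 are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE student (sid TEXT PRIMARY KEY);
    CREATE TABLE course  (cid TEXT PRIMARY KEY);

    -- One row per (student, course) pair; the composite key forbids duplicates
    CREATE TABLE enrolled_in (
        sid TEXT NOT NULL REFERENCES student(sid),
        cid TEXT NOT NULL REFERENCES course(cid),
        PRIMARY KEY (sid, cid)
    );

    INSERT INTO student VALUES ('S1'), ('S2');
    INSERT INTO course  VALUES ('C1'), ('C2');
    INSERT INTO enrolled_in VALUES ('S1', 'C1'), ('S1', 'C2'), ('S2', 'C1');
""")

# One student, many courses
courses_of_s1 = [
    c for (c,) in conn.execute(
        "SELECT cid FROM enrolled_in WHERE sid = 'S1' ORDER BY cid"
    )
]
# One course, many students
students_of_c1 = [
    s for (s,) in conn.execute(
        "SELECT sid FROM enrolled_in WHERE cid = 'C1' ORDER BY sid"
    )
]
print(courses_of_s1, students_of_c1)
```

For one-to-many and many-to-one relationships the extra table is usually unnecessary: a foreign key column on the "many" side is enough.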

Participation Constraint
Participation Constraint is applied to the entity participating in the relationship set.
1. Total Participation: Each entity in the entity set must participate in the relationship. If each
student must enroll in a course, the participation of students will be total. Total participation is
shown by a double line in the ER diagram.
2. Partial Participation: An entity in the entity set may or may not participate in the
relationship. If some courses have no students enrolled in them, the participation of Course
will be partial.

How to Draw an ER Diagram


1. Identify Entities: The very first step is to identify all the Entities. Represent these entities in
a Rectangle and label them accordingly.
2. Identify Relationships: The next step is to identify the relationship between them and
represent them accordingly using the Diamond shape. Ensure that relationships are not directly
connected to each other.
3. Add Attributes: Attach attributes to the entities by using ovals. Each entity can have
multiple attributes (such as name, age, etc.), which are connected to the respective entity.
4. Define Primary Keys: Assign primary keys to each entity. These are unique identifiers that
help distinguish each instance of the entity. Represent them with underlined attributes.
5. Remove Redundancies: Review the diagram and eliminate unnecessary or repetitive
entities and relationships.
6. Review for Clarity: Review the diagram to make sure it is clear and effectively conveys the
relationships between the entities.

UNIT 2
Relational Model in DBMS


The Relational Model represents data and their relationships through a collection of tables. Each
table, also known as a relation, consists of rows and columns. Every column has a unique name
and corresponds to a specific attribute, while each row contains a set of related data values
representing a real-world entity or relationship. This model is one of the record-based models,
which structure data in fixed-format records, each belonging to a particular type with a defined
set of attributes.
E.F. Codd introduced the Relational Model to organize data as relations or tables. After creating
the conceptual design of a database using an ER diagram, this design must be transformed into a
relational model, which can then be implemented using relational database systems like Oracle
Database or MySQL.

What is the Relational Model?

The relational model represents how data is stored and managed in Relational Databases. Data
is organized into tables, each known as a relation, consisting of rows (tuples)
and columns (attributes). Each row represents an entity or record, and each column represents a
particular attribute of that entity. A relational database consists of a collection of tables each of
which is assigned a unique name.
Example:
Consider a relation STUDENT with attributes ROLL_NO, NAME, ADDRESS,
PHONE, and AGE shown in the table.

Key Terms in the Relational Model


1. Attribute: Attributes are the properties that define an entity.
 Example: ROLL_NO, NAME, ADDRESS etc.
2. Relation Schema: A relation schema defines the structure of the relation and represents the
name of the relation with its attributes.
 Example: STUDENT (ROLL_NO, NAME, ADDRESS, PHONE, AGE) is the relation
schema for STUDENT. The set of relation schemas of a whole database is called a relational schema.
3. Tuple: A Tuple represents a row in a relation. Each tuple contains a set of attribute values
that describe a particular entity.
 Example: (1, RAM, DELHI, 9455123451, 18) is a tuple in the STUDENT table.
4. Relation Instance: The set of tuples of a relation at a particular instance of time is called
a relation instance. It can change whenever there is an insertion, deletion or update in the
database.
5. Degree: The number of attributes in the relation is known as the degree of the relation.
 Example: The STUDENT relation has a degree of 5, as it has 5 attributes.
6. Cardinality: The number of tuples in a relation is known as cardinality.

 Example: The STUDENT relation defined above has cardinality 4.
7. Column: The column represents the set of values for a particular attribute.
 Example: The column ROLL_NO is extracted from the relation STUDENT.
8. NULL Values: The value which is not known or unavailable is called a NULL value. It is
represented by NULL.
 Example: PHONE of STUDENT having ROLL_NO 4 is NULL.
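The terms above can be checked against the STUDENT example with a small sketch. It assumes Python's sqlite3 module and loads the four STUDENT tuples used in these notes, with the unknown phone number stored as NULL (None in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    "roll_no INTEGER PRIMARY KEY, name TEXT, address TEXT, phone TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?, ?)",
    [
        (1, "RAM", "DELHI", "9455123451", 18),
        (2, "RAMESH", "GURGAON", "9652431543", 18),
        (3, "SUJIT", "ROHTAK", "9156253131", 20),
        (4, "SURESH", "DELHI", None, 18),  # PHONE is unknown, so it is NULL
    ],
)

# Degree: number of attributes (columns) in the relation
cursor = conn.execute("SELECT * FROM student")
degree = len(cursor.description)

# Cardinality: number of tuples (rows) in the current relation instance
cardinality = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

# Column: the set of values of one attribute
roll_no_column = [
    r for (r,) in conn.execute("SELECT roll_no FROM student ORDER BY roll_no")
]

# NULL values are tested with IS NULL, not with "= NULL"
missing_phone = [
    r for (r,) in conn.execute("SELECT roll_no FROM student WHERE phone IS NULL")
]
print(degree, cardinality, roll_no_column, missing_phone)
```

Inserting or deleting a row changes the cardinality (and the relation instance) but leaves the degree and the schema untouched.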

Types of Keys in the Relational Model

1. Primary Key:
A Primary Key uniquely identifies each tuple in a relation. It must contain unique values and cannot
have NULL values. Example: ROLL_NO in the STUDENT table is the primary key.

2. Candidate Key
A Candidate Key is a minimal set of attributes that uniquely identifies a tuple in a relation:
removing any attribute from it would break uniqueness. There can be multiple candidate keys, and
one of them is chosen as the primary key.

3. Super Key
A Super Key is a set of attributes that can uniquely identify a tuple. It may contain extra attributes
that are not necessary for uniqueness.

4. Foreign Key
A Foreign Key is an attribute in one relation that refers to the primary key of another relation. It
establishes relationships between tables. Example: BRANCH_CODE in the STUDENT table is
a foreign key that refers to the primary key BRANCH_CODE in the BRANCH table.

5. Composite Key
A Composite Key is formed by combining two or more attributes to uniquely identify a tuple.
Example: A combination of FIRST_NAME and LAST_NAME could be a composite key if no
one in the database shares the same full name.
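The difference between a super key and a candidate key can be made concrete with a short Python sketch. The helper names is_superkey and candidate_keys are hypothetical, and the check runs against one relation instance only. A real key is a rule about every possible instance, so data can disprove a key but never prove one.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows agree on all of the given attributes."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

def candidate_keys(rows, all_attrs):
    """Minimal attribute sets that are unique in this particular instance."""
    found = []
    for size in range(1, len(all_attrs) + 1):
        for attrs in combinations(all_attrs, size):
            if any(set(k) <= set(attrs) for k in found):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(rows, attrs):
                found.append(attrs)
    return found

students = [
    {"ROLL_NO": 1, "NAME": "RAM", "ADDRESS": "DELHI", "AGE": 18},
    {"ROLL_NO": 2, "NAME": "RAMESH", "ADDRESS": "GURGAON", "AGE": 18},
    {"ROLL_NO": 3, "NAME": "SUJIT", "ROHTAK": None, "ADDRESS": "ROHTAK", "AGE": 20},
    {"ROLL_NO": 4, "NAME": "SURESH", "ADDRESS": "DELHI", "AGE": 18},
]
attributes = ["ROLL_NO", "NAME", "ADDRESS", "AGE"]

roll_and_age = is_superkey(students, ("ROLL_NO", "AGE"))     # unique, but AGE is redundant
address_and_age = is_superkey(students, ("ADDRESS", "AGE"))  # (DELHI, 18) occurs twice
keys = candidate_keys(students, attributes)
print(roll_and_age, address_and_age, keys)
```

In this sample NAME also comes out as a candidate key, but only because no two of the four students happen to share a name. That is exactly why keys are declared by the designer from the meaning of the data rather than inferred from current rows.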

Relational Model Notation


 Relation schema R of degree n is denoted by R(A1, A2, ..., An).
 Uppercase letters Q, R, S denote relation names.
 Lowercase letters q, r, s denote relation states.
 Letters t, u, v denote tuples.
 In general, the name of a relation schema such as STUDENT also indicates the current set of
tuples in that relation.
 An attribute A can be qualified with the relation name R to which it belongs by using the dot
notation R.A for example, STUDENT.Name or STUDENT.Age.
 An n-tuple t in a relation r(R) is represented as t=<v1,v2,...,vn> where vi is the value
corresponding to the attribute Ai. The value vi for attribute Ai in tuple t can be accessed
using t[Ai] or t.Ai.

Characteristics of the Relational Model
1. Data Representation: Data is organized in tables (relations), with rows (tuples) representing
records and columns (attributes) representing data fields.
2. Atomic Values: Each attribute in a table contains atomic values, meaning no multi-valued or
nested data is allowed in a single cell.
3. Unique Keys: Every table has a primary key to uniquely identify each record, ensuring no
duplicate rows.
4. Attribute Domain: Each attribute has a defined domain, specifying the valid data types and
constraints for the values it can hold.
5. Tuples as Rows: Rows in a table, called tuples, represent individual records or instances of
real-world entities or relationships.
6. Relation Schema: A table’s structure is defined by its schema, which specifies the table
name, attributes, and their domains.
7. Data Independence: The model ensures logical and physical data independence, allowing
changes in the database schema without affecting the application layer.
8. Integrity Constraints: The model enforces rules like:
 Domain constraints: Attribute values must match the specified domain.
 Entity integrity: No primary key can have NULL values.
 Referential integrity: Foreign keys must match primary keys in the referenced table or be
NULL.
9. Relational Operations: Supports operations like selection, projection, join, union, and
intersection, enabling powerful data retrieval and manipulation.
10. Data Consistency: Ensures data consistency through constraints, reducing redundancy and
anomalies.
11. Set-Based Representation: Tables in the relational model are treated as sets, and operations
follow mathematical set theory principles.

Constraints in Relational Model

While designing the relational model, we define conditions that must hold for the data in the
database. These conditions are called constraints. They are checked before any operation
(insertion, deletion, or update) is performed on the database. If an operation would violate any
of the constraints, it fails.

1. Domain Constraints
Domain Constraints ensure that the value of each attribute A in a tuple must be an atomic
value derived from its specified domain, dom(A). Domains are defined by the data types
associated with the attributes. Common data types include:
 Numeric types: Includes integers (short, regular, and long) for whole numbers and real
numbers (float, double-precision) for decimal values, allowing precise calculations.
 Character types: Consists of fixed-length (CHAR) and variable-length
(VARCHAR, TEXT) strings for storing text data of various sizes.
 Boolean values: Stores true or false values, often used for flags or conditional checks in
databases.

 Specialized types: Includes types for date (DATE), time (TIME), timestamp (TIMESTAMP),
and money (MONEY), used for precise handling of time-related and financial data.
2. Key Integrity
Every relation in the database should have at least one set of attributes that identifies a tuple
uniquely. Such a set of attributes is called a key. For example, ROLL_NO in STUDENT is a key: no
two students can have the same roll number. So a primary key has two properties:
 It should be unique for all tuples.
 It can’t have NULL values.
3. Referential Integrity Constraints
When an attribute of a relation can only take values that appear in an attribute of the same
relation or of another relation, it is called referential integrity. Suppose we have the
following two relations:

Table STUDENT
ROLL_NO   NAME     ADDRESS   PHONE        AGE   BRANCH_CODE
1         RAM      DELHI     9455123451   18    CS
2         RAMESH   GURGAON   9652431543   18    CS
3         SUJIT    ROHTAK    9156253131   20    ECE
4         SURESH   DELHI     NULL         18    IT

Table BRANCH
BRANCH_CODE   BRANCH_NAME
CS            COMPUTER SCIENCE
IT            INFORMATION TECHNOLOGY
ECE           ELECTRONICS AND COMMUNICATION ENGINEERING
CV            CIVIL ENGINEERING

Explanation: BRANCH_CODE of STUDENT can only take values that are present in
BRANCH_CODE of BRANCH. This is the referential integrity constraint. The relation
that references another relation is called the REFERENCING RELATION (STUDENT in this
case), and the relation being referred to is called the REFERENCED RELATION
(BRANCH in this case).

Anomalies in the Relational Model

An anomaly is an irregularity or something which deviates from the expected or normal state.
When designing databases, we identify three types of anomalies: Insert, Update, and Delete.

1. Insertion Anomaly in Referencing Relation

We can’t insert a row in the REFERENCING RELATION if the referencing attribute’s value is not
present among the referenced attribute’s values. e.g., insertion of a student with BRANCH_CODE
‘ME’ in the STUDENT relation will result in an error because ‘ME’ is not present in BRANCH_CODE
of BRANCH.
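The rejected insert can be reproduced with a short sketch, assuming Python's sqlite3 module and a cut-down STUDENT table. The helper try_sql and the inserted names NEW and NEW2 are hypothetical. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON; most other systems enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE branch (branch_code TEXT PRIMARY KEY, branch_name TEXT)")
conn.execute(
    "CREATE TABLE student ("
    "roll_no INTEGER PRIMARY KEY, name TEXT, "
    "branch_code TEXT REFERENCES branch(branch_code))"
)
conn.executemany(
    "INSERT INTO branch VALUES (?, ?)",
    [
        ("CS", "COMPUTER SCIENCE"),
        ("IT", "INFORMATION TECHNOLOGY"),
        ("ECE", "ELECTRONICS AND COMMUNICATION ENGINEERING"),
        ("CV", "CIVIL ENGINEERING"),
    ],
)

def try_sql(sql):
    """Run one statement and report whether the DBMS accepted it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

# 'ME' does not exist in BRANCH, so the referencing row is refused
insert_me = try_sql("INSERT INTO student VALUES (5, 'NEW', 'ME')")
# 'IT' exists in BRANCH, so this row is accepted
insert_it = try_sql("INSERT INTO student VALUES (6, 'NEW2', 'IT')")
print(insert_me, insert_it)
```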

2. Deletion/Updation Anomaly in Referenced Relation:

We can’t delete or update a row in the REFERENCED RELATION if the value of its
REFERENCED ATTRIBUTE is used as a value of the REFERENCING ATTRIBUTE. e.g., if we
try to delete the tuple from BRANCH having BRANCH_CODE ‘CS’, it will result in an error
because ‘CS’ is referenced by BRANCH_CODE of STUDENT, but if we try to delete the row
from BRANCH with BRANCH_CODE ‘CV’, it will be deleted because the value is not used by
the referencing relation. This can be handled by the following methods:

3. On Delete Cascade
It will delete the tuples from REFERENCING RELATION if the value used by REFERENCING
ATTRIBUTE is deleted from REFERENCED RELATION. e.g.; if we delete a row from
BRANCH with BRANCH_CODE ‘CS’, the rows in STUDENT relation with BRANCH_CODE
CS (ROLL_NO 1 and 2 in this case) will be deleted.

4. On Update Cascade
It will update the REFERENCING ATTRIBUTE in REFERENCING RELATION if the
attribute value used by REFERENCING ATTRIBUTE is updated in REFERENCED
RELATION. e.g., if we update a row from BRANCH with BRANCH_CODE ‘CS’ to ‘CSE’, the
rows in STUDENT relation with BRANCH_CODE CS (ROLL_NO 1 and 2 in this case) will be
updated with BRANCH_CODE ‘CSE’.
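The contrast between the default behaviour and the two cascade options can be sketched as follows, assuming Python's sqlite3 module and the STUDENT and BRANCH data from these notes (reduced to the relevant columns). The helper names build and student_branches are hypothetical.

```python
import sqlite3

def build(fk_actions):
    """Create BRANCH and STUDENT with the given foreign-key actions."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # required for enforcement in SQLite
    conn.execute("CREATE TABLE branch (branch_code TEXT PRIMARY KEY, branch_name TEXT)")
    conn.execute(
        "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, "
        "branch_code TEXT REFERENCES branch(branch_code) " + fk_actions + ")"
    )
    conn.executemany(
        "INSERT INTO branch VALUES (?, ?)",
        [
            ("CS", "COMPUTER SCIENCE"),
            ("IT", "INFORMATION TECHNOLOGY"),
            ("ECE", "ELECTRONICS AND COMMUNICATION ENGINEERING"),
            ("CV", "CIVIL ENGINEERING"),
        ],
    )
    conn.executemany(
        "INSERT INTO student VALUES (?, ?, ?)",
        [(1, "RAM", "CS"), (2, "RAMESH", "CS"), (3, "SUJIT", "ECE"), (4, "SURESH", "IT")],
    )
    return conn

def student_branches(conn):
    return conn.execute(
        "SELECT roll_no, branch_code FROM student ORDER BY roll_no"
    ).fetchall()

# Without any action clause, deleting a referenced branch is refused
plain = build("")
try:
    plain.execute("DELETE FROM branch WHERE branch_code = 'CS'")
    delete_rejected = False
except sqlite3.IntegrityError:
    delete_rejected = True
plain.execute("DELETE FROM branch WHERE branch_code = 'CV'")  # unreferenced, so allowed

# With cascades, changes to BRANCH propagate to STUDENT
cascading = build("ON DELETE CASCADE ON UPDATE CASCADE")
cascading.execute("UPDATE branch SET branch_code = 'CSE' WHERE branch_code = 'CS'")
after_update = student_branches(cascading)
cascading.execute("DELETE FROM branch WHERE branch_code = 'CSE'")
after_delete = student_branches(cascading)
print(delete_rejected, after_update, after_delete)
```

Cascading deletes are convenient but destructive: removing one branch silently removed two students here, so the choice between rejecting and cascading should follow the meaning of the data.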

5. Super Keys
Any set of attributes that uniquely identifies rows (tuples) in a given relation is known as a
super key. A super key from which no attribute can be removed without losing uniqueness is a
candidate key, and one candidate key is chosen as the primary key. If the primary key is a
combination of two or more attributes, we call it a composite key.

Codd Rules in Relational Model


E.F. Codd, the creator of the relational model, proposed 12 rules (known as Codd’s 12 Rules)
that define what constitutes a relational database system. These rules emphasize the importance
of data independence, consistency, and structure.
Key Codd’s Rules:
 Rule 1: The information rule – All information in a relational database is represented
logically in tables (relations).
 Rule 2: The guaranteed access rule – Every data element is accessible by using a
combination of table name, primary key, and attribute name.
 Rule 5: The comprehensive data sublanguage rule (the "powerful language" rule) – A relational
DBMS must support at least one language that can express data definition, data manipulation,
integrity constraints, and authorization.

Advantages of the Relational Model


1. Simple model: The relational model is simple and easy to use in comparison to other data models.
2. Flexible: The relational model is more flexible than earlier data models such as the
hierarchical and network models.
3. Secure: The relational model supports access control at the level of tables and views.
4. Data Accuracy: Keys and constraints reduce duplicate and invalid data, which improves accuracy.
5. Data Integrity: The integrity of the data is maintained through constraints.
6. Operations can be Applied Easily: Relational operations such as selection and join are
straightforward to apply.

Disadvantages of the Relational Model

1. Performance: The relational model can experience performance issues with very large
databases.
2. Complexity for Complex Data: The model struggles with hierarchical or complex data
relationships, which might be better handled with other models like the Graph or Document
model.
3. Normalization Overhead: Extensive use of normalization can result in complex queries and
slower performance.

Domain Constraints

Domain constraints, in the context of databases and data integrity, define the acceptable values
for an attribute (column). They specify the data type and any additional restrictions, ensuring
data accuracy and consistency. These constraints act as rules, preventing invalid data from being
entered into the database.

Here's a more detailed explanation:

 Purpose:
Domain constraints ensure that the values stored in a database column are valid and within a
specific range or domain.
 Data Type:
They specify the data type of the attribute, such as integer, string, date, etc.
 Restrictions:
They can include additional restrictions, such as allowed ranges, formats, or patterns.
 Examples:
 A "NOT NULL" constraint prevents a column from accepting null values.
 A "UNIQUE" constraint ensures that all values in a column are different.
 A "CHECK" constraint can enforce specific criteria or conditions on the values.
 Importance:
Domain constraints are crucial for maintaining data integrity and preventing errors.
 Application:
They are used in various contexts, including:
 Database design
 Data validation
 Data quality
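The example constraints listed above (NOT NULL, UNIQUE, CHECK) can be seen rejecting bad rows in a short sketch. It assumes Python's sqlite3 module; the email column, the age range 16 to 60 and the helper attempt are made up for illustration. SQLite is loosely typed, so the declared data type alone is not strictly enforced there, which is why the range is written as an explicit CHECK.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        roll_no INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,                         -- value is mandatory
        email   TEXT UNIQUE,                           -- no two rows may share it
        age     INTEGER CHECK (age BETWEEN 16 AND 60)  -- restricted range
    )
""")
conn.execute("INSERT INTO student VALUES (1, 'RAM', 'ram@example.com', 18)")

def attempt(values):
    """Try to insert one row and report whether the constraints allowed it."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?, ?)", values)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

results = {
    "null name": attempt((2, None, "a@example.com", 20)),
    "duplicate email": attempt((3, "SUJIT", "ram@example.com", 20)),
    "age out of range": attempt((4, "SURESH", "s@example.com", 150)),
    "valid row": attempt((5, "RAMESH", "r@example.com", 18)),
}
print(results)
```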

Relational Algebra
Introduction of Relational Algebra in DBMS


Relational Algebra is a formal language used to query and manipulate relational databases. It
consists of a set of operations such as selection, projection, union, and join, each of which
takes one or two relations as input and produces a relation as output. It serves as the
mathematical foundation for SQL queries.
Because SQL queries can be expressed as relational algebra operations, relational algebra makes
it easier to understand how a query is evaluated and how a DBMS can optimize its execution for
better performance. This is why it is essential background for learning SQL.
Key Concepts in Relational Algebra
Before explaining relational algebra operations, let's define some fundamental concepts:
1. Relations: In relational algebra, a relation is a table that consists of rows and columns,
representing data in a structured format. Each relation has a unique name and is made up of
tuples.
2. Tuples: A tuple is a single row in a relation, which contains a set of values for each attribute.
It represents a single data entry or record in a relational table.
3. Attributes: Attributes are the columns in a relation, each representing a specific characteristic
or property of the data. For example, in a "Students" relation, attributes could be "Name", "Age",
and "Grade".
4. Domains: A domain is the set of possible values that an attribute can have. It defines the type
of data that can be stored in each column of a relation, such as integers, strings, or dates.
Basic Operators in Relational Algebra
Relational algebra provides a small set of basic operators for fetching and manipulating data in
relational tables. The basic operators are selection (σ), projection (π), union (∪), set
difference (−), Cartesian product (×), and rename (ρ).
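The six basic operators are small enough to sketch directly in Python. This is a teaching sketch under assumptions, not how a DBMS implements them: a relation is modelled as a list of dictionaries (one per tuple), the function names are made up for the example, and product assumes the two relations have no attribute names in common.

```python
def dedupe(rows):
    """Relations are sets: drop duplicate tuples, keeping first-seen order."""
    seen, out = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

def select(rel, predicate):   # sigma: keep the tuples that satisfy a condition
    return [row for row in rel if predicate(row)]

def project(rel, attrs):      # pi: keep only the named attributes
    return dedupe([{a: row[a] for a in attrs} for row in rel])

def union(r, s):              # tuples that appear in r or in s
    return dedupe(r + s)

def difference(r, s):         # tuples that appear in r but not in s
    return [row for row in r if row not in s]

def product(r, s):            # Cartesian product: every pairing of tuples
    return [{**a, **b} for a in r for b in s]

def rename(rel, mapping):     # rho: rename attributes
    return [{mapping.get(k, k): v for k, v in row.items()} for row in rel]

student = [
    {"ROLL_NO": 1, "NAME": "RAM", "AGE": 18},
    {"ROLL_NO": 2, "NAME": "RAMESH", "AGE": 18},
    {"ROLL_NO": 3, "NAME": "SUJIT", "AGE": 20},
]
branch = [{"BRANCH_CODE": "CS"}, {"BRANCH_CODE": "IT"}]

older_names = project(select(student, lambda t: t["AGE"] > 18), ["NAME"])
ages = project(student, ["AGE"])                      # duplicates collapse
eighteen = select(student, lambda t: t["AGE"] == 18)
not_eighteen = difference(student, eighteen)
everyone = union(eighteen, not_eighteen)
pairs = product(student, branch)
codes = rename(branch, {"BRANCH_CODE": "CODE"})
print(older_names, ages, len(pairs), codes)
```

Every operator returns a relation, so results can be fed straight into another operator, as in project(select(...)). This closure property is what lets complex queries be built from simple steps.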

Relational Calculus

Relational calculus is a declarative query language in database theory that allows users to specify
what data they want to retrieve, rather than how to retrieve it. It's a non-procedural language,
meaning it focuses on the "what" instead of the "how," unlike relational algebra which is
procedural. Relational calculus is based on predicate calculus, a part of symbolic logic. There are
two main types: Tuple Relational Calculus (TRC) and Domain Relational Calculus (DRC).

Key Concepts:

 Declarative: Relational calculus describes the desired result without specifying the steps to
obtain it.
 Non-procedural: It focuses on what needs to be retrieved, not how.
 Based on Predicate Calculus: Relational calculus builds upon the concepts of predicate
calculus, a branch of logic.
 Two Main Types: TRC and DRC offer different ways of expressing queries.
Tuple Relational Calculus (TRC):
 TRC focuses on tuples (rows) of relations and uses predicates to define conditions.
 It describes the desired tuples based on conditions (predicates).
 For example, you can write a TRC query to retrieve all employees who earn more than a certain
salary.
 TRC is used as a theoretical foundation for optimizing queries in relational databases, according
to Naukri.com.
Domain Relational Calculus (DRC):

 DRC deals with individual values (domain values) of attributes rather than entire tuples.
 It expresses queries in terms of the values of attributes.
 While DRC can be powerful, it can also be more difficult to express complex queries compared
to TRC.
Relationship to Relational Algebra:
 Both relational algebra and relational calculus are used in database management systems
(DBMS).
 Relational algebra is procedural, specifying a sequence of operations to retrieve data.
 Relational calculus is declarative, specifying what data to retrieve without specifying the steps.
 Both are considered equivalent in expressive power, meaning they can express the same queries.
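Python set comprehensions read almost like calculus expressions, so they give a rough feel for the difference between the styles. The sketch below is an analogy only: the Employee relation, its sample rows and the salary threshold are invented for the example, and the calculus formulas appear in the comments.

```python
from collections import namedtuple

Employee = namedtuple("Employee", ["name", "dept", "salary"])
employees = {
    Employee("Asha", "HR", 40000),
    Employee("Ravi", "IT", 65000),
    Employee("Meera", "IT", 72000),
}
threshold = 60000

# TRC style: the variable t ranges over whole tuples
#   { t | t ∈ Employee ∧ t.salary > threshold }
trc_result = {t for t in employees if t.salary > threshold}

# DRC style: the variables n, d, s range over individual attribute values
#   { <n, s> | ∃d ( <n, d, s> ∈ Employee ∧ s > threshold ) }
drc_result = {(n, s) for (n, d, s) in employees if s > threshold}

# Relational algebra style: an explicit sequence of steps
selected = []                      # step 1: selection on salary
for t in employees:
    if t.salary > threshold:
        selected.append(t)
algebra_result = set()             # step 2: projection onto (name, salary)
for t in selected:
    algebra_result.add((t.name, t.salary))

print(sorted(drc_result))
```

The calculus-style expressions state the condition the answer must satisfy, while the algebra-style version spells out the order of operations; both describe the same set of employees.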

Tuple and Domain Calculus


Domain Relational Calculus is a non-procedural query language equivalent in power to Tuple
Relational Calculus. It provides only the description of the query and does not specify how to
evaluate it. In Domain Relational Calculus, a query is expressed as
{ < x1, x2, x3, ..., xn > | P (x1, x2, x3, ..., xn ) }
where < x1, x2, x3, ..., xn > represents the resulting domain variables and
P (x1, x2, x3, ..., xn ) represents the condition, a formula of predicate calculus.
A predicate calculus formula is built from:
1. Comparison operators (=, ≠, <, ≤, >, ≥)
2. Connectives such as and, or, not
3. Quantifiers (∃, "there exists", and ∀, "for all")

UNIT-3

What is SQL?
Data is at the core of every application, and SQL (Structured Query Language) manages and interacts
with this data. Whether we’re handling a small user database or analyzing terabytes of sales
records, SQL allows efficient querying, updating, and management of relational databases.

When data needs to be retrieved from a database, SQL is used to construct and send the request.
The Database Management System (DBMS) processes the SQL query, retrieves the requested
data, and returns it to the user or application. Instead of specifying step-by-step procedures, SQL
statements describe what data should be retrieved, organized, or modified, allowing the DBMS to
handle how the operations are executed efficiently.

SQL is a standardized language used to manage and manipulate relational databases. It
enables users to perform a variety of tasks such as querying data, creating and modifying database
structures, and managing access permissions. SQL is widely used across relational
database management systems such as MySQL, PostgreSQL, Oracle, and SQL Server.

Characteristics of SQL
User-Friendly and Accessible: SQL is designed for a broad range of users, including those with
minimal programming experience, making it approachable for non-technical individuals.
Declarative Language: As a non-procedural language, SQL allows users to specify what data is needed
rather than how to retrieve it, focusing on the desired results rather than the retrieval process.
Efficient Database Management: SQL enables the creation, modification, and management of
databases efficiently, saving time and simplifying complex database operations.
Standardized Language: Based on ANSI (American National Standards Institute) and ISO
(International Organization for Standardization) standards, SQL ensures consistency and stability
across various database management systems (DBMS).
Command Structure: SQL does not require a continuation character for multi-line queries, allowing
flexibility in writing commands across one or multiple lines.
Execution Mechanism: Queries are executed using a termination character (e.g., a semicolon ;),
enabling immediate and accurate command processing.
Built-in Functionality: SQL includes a rich set of built-in functions for data manipulation,
aggregation, and formatting, empowering users to handle diverse data-processing needs
effectively.

Advantages of SQL

Faster Query Processing: Large amounts of data can be retrieved quickly and efficiently. Insertion,
deletion, and modification of data are also fast.
No Coding Skills: Data retrieval does not require many lines of code. Basic keywords such as
SELECT, INSERT INTO, and UPDATE are used, and the syntax rules are simple, which makes SQL a
user-friendly language.
Standardized Language: Thanks to its documentation and long-established standards, SQL provides a
uniform platform to users worldwide.
Portable: It can be used in programs on PCs, servers, and laptops, independent of the platform
(operating system, etc.). It can also be embedded in other applications as required.
Interactive Language : It is easy to learn and understand, and answers to complex queries can be
received in seconds.
Multiple data views : One of the advantages of SQL is its ability to provide multiple data views . This
means that SQL allows users to create different views or perspectives of the data stored in a
database, depending on their needs and permissions.
Scalability : SQL databases can handle large volumes of data and can be scaled up or down as per the
requirements of the application.
Security : SQL databases have built-in security features that help protect data from unauthorized
access, such as user authentication, encryption, and access control.
Data Integrity : SQL databases enforce data integrity through constraints such as unique keys,
primary keys, and foreign keys, which help prevent data duplication and maintain data accuracy.
Backup and Recovery : SQL databases have built-in backup and recovery tools that help recover data
in case of system failures, crashes, or other disasters.
Data Consistency: SQL databases keep data consistent across multiple tables through transactions,
which ensure that a group of related changes is either applied in full or not applied at all.
Disadvantages of SQL
Although SQL has many advantages, it also has a few disadvantages:

Complex Interface : SQL has a difficult interface that makes some users uncomfortable when dealing
with the database.
Cost : Some versions are costly, which puts them out of reach for some programmers.
Partial Control : Due to hidden business rules, users are not given complete control over the database.
Limited Flexibility: SQL databases are less flexible than NoSQL databases when it comes to handling
unstructured or semi-structured data, as they require data to be structured into tables and columns.
Lack of Real-Time Analytics: Traditional SQL databases are often tuned for transactional or batch
workloads, so applications that require real-time analytics may need additional tooling.
Limited Query Performance: Queries over very large datasets may take longer on disk-based SQL
databases than on in-memory databases.
Complexity: SQL databases can be complex to set up and manage, requiring skilled database
administrators to ensure optimal performance and maintain data integrity.

What are SQL Data Types?

 Numeric Data Types


 Character and String Data Types
 Date and Time Data Types
 Binary Data Types
 Boolean Data Types
 Special Data Types
1. Numeric Data Types
Numeric data types are fundamental to database design and are used to
store numbers, whether they are integers, decimals, or floating-point
numbers. These data types allow for mathematical operations
like addition, subtraction, multiplication, and division, which makes
them essential for managing financial, scientific, and analytical data.
Exact Numeric Datatype
Exact numeric types are used when precise numeric values are needed,
such as for financial data, quantities, and counts. Some common exact
numeric types include:
Data Type | Description | Range
BIGINT | Large integer numbers | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
INT | Standard integer values | -2,147,483,648 to 2,147,483,647
SMALLINT | Small integers | -32,768 to 32,767
TINYINT | Very small integers | 0 to 255
DECIMAL | Exact fixed-point numbers (e.g., for financial values) | -10^38 + 1 to 10^38 - 1
NUMERIC | Similar to DECIMAL, used for precision data | -10^38 + 1 to 10^38 - 1
MONEY | For storing monetary values | -922,337,203,685,477.5808 to 922,337,203,685,477.5807
SMALLMONEY | Smaller monetary values | -214,748.3648 to 214,748.3647
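The integer ranges in the table follow from the storage width: a signed n-bit integer spans -2^(n-1) to 2^(n-1) - 1, and an unsigned one spans 0 to 2^n - 1. The sketch below recomputes them. The bit widths (64, 32, 16, and 8, with TINYINT treated as unsigned) are assumptions chosen to match the ranges listed above, which appear to follow SQL Server's integer types.

```python
def integer_range(bits: int, signed: bool = True) -> tuple[int, int]:
    """Return the (min, max) values representable in the given number of bits."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1

# Assumed widths: BIGINT 64-bit, INT 32-bit, SMALLINT 16-bit, TINYINT 8-bit unsigned.
ranges = {
    "BIGINT": integer_range(64),
    "INT": integer_range(32),
    "SMALLINT": integer_range(16),
    "TINYINT": integer_range(8, signed=False),
}

for name, (low, high) in ranges.items():
    print(f"{name}: {low:,} to {high:,}")
```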
Approximate Numeric Datatype
These types are used to store approximate values, such as scientific
measurements or large ranges of data that don't need exact precision.
Data Type | Description | Range
FLOAT | Approximate numeric values | -1.79E+308 to 1.79E+308
REAL | Similar to FLOAT, but with less precision | -3.40E+38 to 3.40E+38
2. Character and String Data Types
Character data types are used to store text or character-based data. The
choice between fixed-length and variable-length data types depends on
the nature of your data.
Character String Data Types
Data Type | Description
Char | Fixed-length non-Unicode characters; maximum length of 8000 characters
Varchar | Variable-length non-Unicode characters; maximum length of 8000 characters
Varchar(max) | Variable-length non-Unicode data; maximum length of 2^31 - 1 characters (SQL Server 2005 and later)
Text | Variable-length non-Unicode data; maximum length of 2,147,483,647 characters
Unicode Character String Data Types
Unicode data types are used to store characters from any language,
supporting a wider variety of characters. They are listed in the table below.
Data Type | Description
Nchar | Fixed-length Unicode characters; maximum length of 4000 characters
Nvarchar | Variable-length Unicode characters; maximum length of 4000 characters
Nvarchar(max) | Variable-length Unicode data; maximum storage of 2^31 - 1 bytes (SQL Server 2005 and later)
3. Date and Time Data Type
SQL provides several data types for storing date and time information.
They are essential for managing timestamps, events, and time-based
queries. These are given in the below table.
Data Type | Description | Storage Size
DATE | Stores a date (year, month, day) | 3 Bytes
TIME | Stores a time (hour, minute, second) | 3 Bytes
DATETIME | Stores both date and time (year, month, day, hour, minute, second) | 8 Bytes
4. Binary Data Types in SQL
Binary data types are used to store binary data such as images, videos, or
other file types. These include:
Data Type | Description | Max Length
Binary | Fixed-length binary data. | 8000 bytes
VarBinary | Variable-length binary data. | 8000 bytes
Image | Stores binary data as images. | 2,147,483,647 bytes
5. Boolean Data Type in SQL
The BOOLEAN data type is used to store logical values, typically TRUE
or FALSE. It is commonly used for flag fields or binary conditions.
Data Type | Description
BOOLEAN | Stores a logical value (TRUE/FALSE).


6. Special Data Types
SQL also supports some specialized data types for advanced use cases:
XML Data Type
The XML data type allows for the storage of XML documents and
fragments in a SQL Server database.

Data Type | Description
XML | Used to store XML data and manipulate XML structures in the database
Spatial Data Type
The geometry data type is used for storing planar spatial data, such as points, lines, and
polygons, in a database table.
Data Type | Description
Geometry | Stores planar spatial data, such as points, lines, and polygons, in a database table.

SQL Literals



There are four kinds of literal values supported in SQL: character string, bit string, exact numeric,
and approximate numeric. They are explained below.
1. Character string : Character strings are written as a sequence of characters enclosed in single quotes.
A single quote character inside a character string is represented by two single quotes. Some examples
of character strings are :
 'My String'
 'I love GeeksForGeeks'
 '16378'
2. Bit string : A bit string is written either as a sequence of 0s and 1s enclosed in single quotes and
preceded by the letter 'B', or as a sequence of hexadecimal digits enclosed in single
quotes and preceded by the letter 'X'. Some examples are given below :
 B'10001011'
 B'1'
 B'0'
 X'C 5'
 X'0'
3. Exact numeric : These literals are written as a signed or unsigned decimal number, possibly with
a decimal point. Examples of exact numeric literals are given below :
 8
 80
 80.00
 0.8
 +88.88
 -88.88
4. Approximate numeric : Approximate numeric literals are written as exact numeric literals followed by
the letter 'E', followed by a signed or unsigned integer. Some examples are :
 6E6
 66.6E6
 +66E-6
 0.66E
 -6.66E-8
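A few of these literal forms can be tried out with Python's built-in sqlite3 module. This is only a sketch under the assumption that SQLite is the engine: SQLite accepts X'...' as a binary (BLOB) literal but has no B'...' bit-string literal, so that form is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Character string: a quote inside the string is written as two single quotes.
text_value = conn.execute("SELECT 'It''s a string'").fetchone()[0]

# Hexadecimal string: SQLite reads X'...' as a binary (BLOB) literal.
blob_value = conn.execute("SELECT X'C5'").fetchone()[0]

# Exact numeric and approximate numeric literals.
exact_value = conn.execute("SELECT -88.88").fetchone()[0]
approx_value = conn.execute("SELECT 66.6E6").fetchone()[0]

print(text_value, blob_value, exact_value, approx_value)
conn.close()
```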

1. DDL - Data Definition Language
DDL, or Data Definition Language, consists of the SQL commands used for defining, altering, and
deleting database structures such as tables, indexes, and schemas. It deals with descriptions of
the database schema and is used to create and modify the structure of database objects.
Common DDL Commands
Command | Description | Syntax
CREATE | Create a database or its objects (table, index, function, view, stored procedure, and trigger) | CREATE TABLE table_name (column1 data_type, column2 data_type, ...);
DROP | Delete objects from the database | DROP TABLE table_name;
ALTER | Alter the structure of the database | ALTER TABLE table_name ADD COLUMN column_name data_type;
TRUNCATE | Remove all records from a table, including all space allocated for the records | TRUNCATE TABLE table_name;
COMMENT | Add comments to the data dictionary | COMMENT ON TABLE table_name IS 'comment_text';
RENAME | Rename an existing object in the database | RENAME TABLE old_table_name TO new_table_name;

Example:
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
first_name VARCHAR(50),
last_name VARCHAR(50),
hire_date DATE
);
In this example, a new table called employees is created with columns for employee ID, first
name, last name, and hire date.
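The CREATE, ALTER, and DROP commands can be tried end to end with Python's built-in sqlite3 module. This is a sketch that assumes SQLite as the engine; the department column and the PRAGMA table_info inspection call are not part of the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE: define the employees table from the example above.
conn.execute(
    """
    CREATE TABLE employees (
        employee_id INT PRIMARY KEY,
        first_name  VARCHAR(50),
        last_name   VARCHAR(50),
        hire_date   DATE
    )
    """
)

# ALTER: add a column to the existing table (the department column is hypothetical).
conn.execute("ALTER TABLE employees ADD COLUMN department VARCHAR(50)")

# Inspect the resulting structure; PRAGMA table_info is specific to SQLite.
columns = [row[1] for row in conn.execute("PRAGMA table_info(employees)")]
print(columns)

# DROP: remove the table and its data entirely.
conn.execute("DROP TABLE employees")
remaining_tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(remaining_tables)
conn.close()
```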
2. DQL - Data Query Language
DQL statements are used for performing queries on the data within schema objects. The
purpose of a DQL command is to return a relation (a result set) based on the query passed to
it, so that data can be taken out of the database and worked with. When a SELECT is run against
one or more tables, the result is compiled into a temporary result table, which is displayed or
received by the calling program.
DQL Command
Command | Description | Syntax
SELECT | It is used to retrieve data from the database | SELECT column1, column2, ... FROM table_name WHERE condition;

Example:
SELECT first_name, last_name, hire_date
FROM employees
WHERE department = 'Sales'
ORDER BY hire_date DESC;
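This query retrieves employees' first and last names, along with their hire dates, from the
employees table for those in the 'Sales' department, sorted by hire date from newest to oldest.
It assumes the employees table also has a department column, which the CREATE TABLE example does
not define.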
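A runnable version of this query, using Python's built-in sqlite3 module, might look as follows. The department column, the sample rows, and the use of SQLite with ISO-formatted date strings are assumptions made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: employees has a department column, as the query above requires.
conn.execute(
    """
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        first_name  TEXT,
        last_name   TEXT,
        hire_date   TEXT,
        department  TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO employees (first_name, last_name, hire_date, department) "
    "VALUES (?, ?, ?, ?)",
    [
        ("Jane", "Smith", "2021-03-15", "Sales"),
        ("Amit", "Verma", "2023-07-01", "Sales"),
        ("Lena", "Khan", "2022-01-10", "HR"),
    ],
)

# Same shape as the query above: filter with WHERE, sort with ORDER BY ... DESC.
sales_rows = conn.execute(
    """
    SELECT first_name, last_name, hire_date
    FROM employees
    WHERE department = 'Sales'
    ORDER BY hire_date DESC
    """
).fetchall()

print(sales_rows)
conn.close()
```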
3. DML - Data Manipulation Language
The SQL commands that deal with the manipulation of data present in the database belong
to DML, or Data Manipulation Language, and this includes most SQL statements. DML statements
insert, update, and delete the data stored in tables. In some classifications, DCL
statements are grouped with DML statements.
Common DML Commands
Command | Description | Syntax
INSERT | Insert data into a table | INSERT INTO table_name (column1, column2, ...) VALUES (value1, value2, ...);
UPDATE | Update existing data within a table | UPDATE table_name SET column1 = value1, column2 = value2 WHERE condition;
DELETE | Delete records from a database table | DELETE FROM table_name WHERE condition;
LOCK | Control table concurrency | LOCK TABLE table_name IN lock_mode;
CALL | Call a PL/SQL or JAVA subprogram | CALL procedure_name(arguments);
EXPLAIN PLAN | Describe the access path to data | EXPLAIN PLAN FOR SELECT * FROM table_name;

Example:
INSERT INTO employees (first_name, last_name, department)
VALUES ('Jane', 'Smith', 'HR');
This query inserts a new record into the employees table with the first name 'Jane', last name
'Smith', and department 'HR'. As with the SELECT example, it assumes the table has a department
column.
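INSERT, UPDATE, and DELETE can be seen working together in a short sketch with Python's built-in sqlite3 module. The table layout (an automatically generated employee_id and a department column) and the sample rows are assumptions, not part of the original example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: employee_id is generated automatically and a department column exists.
conn.execute(
    """
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        first_name  TEXT,
        last_name   TEXT,
        department  TEXT
    )
    """
)

# INSERT: add two rows.
conn.execute(
    "INSERT INTO employees (first_name, last_name, department) "
    "VALUES ('Jane', 'Smith', 'HR')"
)
conn.execute(
    "INSERT INTO employees (first_name, last_name, department) "
    "VALUES ('Omar', 'Ali', 'Sales')"
)

# UPDATE: change existing data; the WHERE clause limits which rows are touched.
conn.execute("UPDATE employees SET department = 'Finance' WHERE first_name = 'Jane'")

# DELETE: remove only the rows that match the condition.
conn.execute("DELETE FROM employees WHERE department = 'Sales'")

rows_after = conn.execute(
    "SELECT first_name, last_name, department FROM employees"
).fetchall()
print(rows_after)
conn.close()
```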
4. DCL - Data Control Language
DCL (Data Control Language) includes commands such as GRANT and REVOKE, which
mainly deal with the rights, permissions, and other controls of the database system. These
commands are used to control access to data in the database by granting or revoking
permissions.
Common DCL Commands
Command | Description | Syntax
GRANT | Assigns new privileges to a user account, allowing access to specific database objects, actions, or functions. | GRANT privilege_type [(column_list)] ON [object_type] object_name TO user [WITH GRANT OPTION];
REVOKE | Removes previously granted privileges from a user account, taking away their access to certain database objects or actions. | REVOKE [GRANT OPTION FOR] privilege_type [(column_list)] ON [object_type] object_name FROM user [CASCADE];

Example of DCL
GRANT SELECT, UPDATE ON employees TO user_name;
This command grants the user user_name the permissions to select and update records in the
employees table.
5. TCL - Transaction Control Language
Transactions group a set of tasks into a single execution unit. Each transaction begins with a
specific task and ends when all the tasks in the group are successfully completed. If any of
the tasks fails, the transaction fails. Therefore, a transaction has only two
results: success or failure.
Common TCL Commands
Command | Description | Syntax
BEGIN TRANSACTION | Starts a new transaction | BEGIN TRANSACTION [transaction_name];
COMMIT | Saves all changes made during the transaction | COMMIT;
ROLLBACK | Undoes all changes made during the transaction | ROLLBACK;
SAVEPOINT | Creates a savepoint within the current transaction | SAVEPOINT savepoint_name;

Example:
BEGIN TRANSACTION;
UPDATE employees SET department = 'Marketing' WHERE department = 'Sales';
SAVEPOINT before_update;
UPDATE employees SET department = 'IT' WHERE department = 'HR';
ROLLBACK TO SAVEPOINT before_update;
COMMIT;
In this example, a transaction is started, the first update is made, and a savepoint is set. The
second update is then undone by rolling back to the savepoint, so only the first update is saved
when the transaction is committed.
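The effect of rolling back to a savepoint can be checked with Python's built-in sqlite3 module. This sketch assumes SQLite as the engine; the simplified two-column table, the sample rows, and the savepoint name are hypothetical.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# BEGIN / SAVEPOINT / ROLLBACK / COMMIT statements below are sent as written.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE employees (name TEXT, department TEXT)")
conn.execute("INSERT INTO employees VALUES ('Jane', 'Sales'), ('Omar', 'HR')")

conn.execute("BEGIN TRANSACTION")
conn.execute("UPDATE employees SET department = 'Marketing' WHERE department = 'Sales'")
conn.execute("SAVEPOINT after_first_update")  # hypothetical savepoint name
conn.execute("UPDATE employees SET department = 'IT' WHERE department = 'HR'")
# Undo only the work done after the savepoint; the first update survives.
conn.execute("ROLLBACK TO SAVEPOINT after_first_update")
conn.execute("COMMIT")

final_rows = conn.execute(
    "SELECT name, department FROM employees ORDER BY name"
).fetchall()
print(final_rows)
conn.close()
```

Only the Sales-to-Marketing change is committed; the HR row keeps its original department.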
Most Important SQL Commands
The table below summarizes the SQL commands and clauses we rely on most often when writing
queries. Some of them, such as WHERE, ORDER BY, and JOIN, are clauses rather than commands and so
do not fit neatly into the five main categories, but they are essential for working with data
effectively.
Command Description

SELECT Retrieves data from one or more tables.

INSERT Adds new rows (records) to a table.

UPDATE Modifies existing data in a table.

DELETE Removes specific rows from a table.

CREATE TABLE Creates a new table in the database.

ALTER TABLE Modifies the structure of an existing table (e.g., add or remove columns).

DROP TABLE Permanently deletes a table and its data.

TRUNCATE TABLE Removes all rows from a table but keeps its structure intact.

WHERE Filters records based on a condition.

ORDER BY Sorts the result set in ascending or descending order.

GROUP BY Groups rows that have the same values in specified columns.

HAVING Filters grouped data (used with GROUP BY).

JOIN Combines rows from two or more tables based on a related column.

DISTINCT Removes duplicate values from the result set.

IN / BETWEEN / LIKE Used for advanced filtering conditions.

UNION Combines the result of two or more SELECT queries.

GRANT Gives user privileges or permissions.

REVOKE Removes user privileges.

COMMIT Saves all changes made in the current transaction.

ROLLBACK Undoes changes if something goes wrong in a transaction.

SAVEPOINT Sets a point in a transaction to roll back to if needed.

SQL operators are important in database management systems (DBMS) as they allow us to
manipulate and retrieve data efficiently. Operators in SQL perform arithmetic, logical,
comparison, bitwise, and other operations to work with database values. Understanding SQL
operators is crucial for performing complex data manipulations, calculations, and filtering
operations in queries.

SQL Arithmetic Operators


Arithmetic operators in SQL are used to perform mathematical operations on numeric data
types in SQL queries. Some common arithmetic operators:
Operator Description

+ This operator is used to add data values.

- This operator is used for the subtraction of the data values.

/ This operator is used to divide one data value by another.

* This operator is used for multiplying data values.

% Modulus is used to get the remainder when data is divided by another.

SQL Comparison Operators


Comparison Operators in SQL are used to compare one expression's value to other
expressions. SQL supports different types of comparison operator, which are described below:
Operator Description

= Equal to.

> Greater than.

< Less than.

>= Greater than or equal to.

<= Less than or equal to.

<> Not equal to.

SQL Logical Operators


Logical Operators in SQL are used to combine or manipulate conditions in SQL queries to
retrieve or manipulate data based on specified criteria.

Operator Description

AND Logical AND compares two Boolean expressions and returns true when both expressions are true.

OR Logical OR compares two Boolean expressions and returns true when at least one of the
expressions is true.

NOT Logical NOT takes a single Boolean expression and changes its value from false to true or
from true to false.
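The arithmetic, comparison, and logical operators can be tried out with Python's built-in sqlite3 module. The sketch assumes SQLite, which reports true as 1 and false as 0 and performs integer division when both operands are integers; other systems may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Arithmetic: with two integer operands SQLite performs integer division.
quotient, remainder = conn.execute("SELECT 17 / 5, 17 % 5").fetchone()

# Mixing in a decimal operand gives a fractional result.
real_quotient = conn.execute("SELECT 17.0 / 5").fetchone()[0]

# Comparison and logical operators; SQLite reports true as 1 and false as 0.
both_true, either_true, negated = conn.execute(
    "SELECT (5 > 3) AND (2 <> 2), (5 > 3) OR (2 <> 2), NOT (5 >= 3)"
).fetchone()

print(quotient, remainder, real_quotient, both_true, either_true, negated)
conn.close()
```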

SQL Bitwise Operators


Bitwise operators in SQL are used to perform bitwise operations on binary values in SQL
queries, manipulating individual bits to perform logical operations at the bit level. Some SQL
Bitwise Operators are:
Operator Description

& Bitwise AND operator

| Bitwise OR operator

^ Bitwise XOR (exclusive OR) operator


~ Bitwise NOT (complement) operator

<< Left shift operator

>> Right shift operator

SQL Compound Operators


Compound operators combine an operation with assignment. These operators modify the value
of a column and store the result in the same column in a single step. Some Compound operators
are:
Operator Description

+= Add and assign

-= Subtract and assign

*= Multiply and assign

/= Divide and assign

%= Modulo and assign

&= Bitwise AND and assign

^= Bitwise XOR and assign

|= Bitwise OR and assign

SQL Special Operators


SQL also provides several special operators that serve specific functions such as filtering data
based on a range, checking for existence, and comparing sets of values.
Operator Description

ALL Compares a value to every value in the list of results from a subquery. ALL must be preceded
by a comparison operator and evaluates to TRUE if the comparison holds for every value returned,
or if the subquery returns no rows.

ANY Compares a value to each value in the list of results from a subquery and evaluates to TRUE
if the comparison holds for at least one of them.

BETWEEN Tests an expression against a range. The range consists of a beginning expression,
followed by the AND keyword and an end expression.

IN Checks whether a value is within a set of values separated by commas and retrieves the rows
from the table that match.

EXISTS Tests whether a subquery fetches at least one row. When no rows are returned, this
operator returns FALSE.

SOME Behaves like ANY: it compares a value with each value returned by the inner query and
evaluates to TRUE if the comparison holds for at least one row. If not, it evaluates to FALSE.

UNIQUE Tests whether the rows returned by a subquery are free of duplicates.
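BETWEEN, IN, and EXISTS can be tried with Python's built-in sqlite3 module. SQLite is an assumption made for runnability; it does not implement ALL, ANY, SOME, or UNIQUE as operators, so they are left out. The products table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("pen", 10), ("book", 150), ("bag", 600), ("lamp", 300)],
)

# BETWEEN: both ends of the range are included.
mid_priced = [r[0] for r in conn.execute(
    "SELECT name FROM products WHERE price BETWEEN 100 AND 300 ORDER BY name"
)]

# IN: match against a comma-separated set of values.
selected = [r[0] for r in conn.execute(
    "SELECT name FROM products WHERE name IN ('pen', 'bag') ORDER BY name"
)]

# EXISTS: true (1 in SQLite) when the subquery returns at least one row.
has_expensive = conn.execute(
    "SELECT EXISTS (SELECT 1 FROM products WHERE price > 500)"
).fetchone()[0]

print(mid_priced, selected, has_expensive)
conn.close()
```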

What is a View in SQL?


A view in SQL is a saved SQL query that acts as a virtual table. Unlike regular tables, views do
not store data themselves. Instead, they dynamically generate data by executing the SQL query
defined in the view each time it is accessed. It can fetch data from one or more tables and present
it in a customized format, allowing developers to:

Simplify Complex Queries: Encapsulate complex joins and conditions into a single object.
Enhance Security: Restrict access to specific columns or rows.
Present Data Flexibly: Provide tailored data views for different users.
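A minimal sketch of a view, using Python's built-in sqlite3 module, is shown below. The employees table, the employee_directory view name, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Jane", "Sales", 50000), ("Omar", "HR", 45000)],
)

# The view exposes only name and department, hiding the salary column.
conn.execute(
    "CREATE VIEW employee_directory AS SELECT name, department FROM employees"
)

# A view stores no rows of its own: new base-table rows show up on the next read.
conn.execute("INSERT INTO employees VALUES ('Lena', 'Sales', 52000)")
directory = conn.execute(
    "SELECT name, department FROM employee_directory ORDER BY name"
).fetchall()

print(directory)
conn.close()
```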

What Are Indexes in SQL?
An index in SQL is a schema object that improves the speed of data retrieval operations on a
table. It works by creating a separate data structure that provides pointers to the rows in a table,
making it faster to look up rows based on specific column values. Indexes act as a table of
contents for a database, allowing the server to locate data quickly and efficiently, reducing disk
I/O operations.

Benefits of Indexes:
Faster Queries: Speeds up SELECT and JOIN operations.
Lower Disk I/O: Reduces the load on your database by limiting the amount of data scanned.
Better Performance on Large Tables: Essential when working with millions of records.
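One way to see an index change how a query is executed is to compare query plans before and after creating it. The sketch below uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN statement; the table, the data, and the index name idx_emp_department are hypothetical, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(f"emp{i}", "Sales" if i % 2 else "HR") for i in range(1000)],
)

query = "SELECT name FROM employees WHERE department = 'Sales'"

# Without an index the planner has to scan the whole table.
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()[0][-1]

# idx_emp_department is a hypothetical index name.
conn.execute("CREATE INDEX idx_emp_department ON employees (department)")
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()[0][-1]

print(plan_before)
print(plan_after)
conn.close()
```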

TABLE IN SQL
A relational database system contains one or more objects called tables. The data or information for
the database is stored in these tables. Tables are uniquely identified by their names and are
made up of columns and rows. Columns contain the column name, data type, and any other
attributes for the column. Rows contain the records or data for the columns.

Queries and sub queries TO DO

SQL Aggregate functions

SQL Aggregate Functions are used to perform calculations on a set of rows and return a single
value. These functions are particularly useful when we need to summarize, analyze, or group
large datasets in SQL databases. Whether you're working with sales data, employee records, or
product inventories, aggregate functions help us derive meaningful insights.

Commonly used aggregate functions include COUNT(), SUM(), AVG(), MIN(), and MAX().

Key Features of SQL Aggregate Functions:
 Operate on groups of rows: They work on a set of rows and return a single value.
 Ignore NULLs: Most aggregate functions ignore NULL values, except for COUNT(*).
 Used with GROUP BY: To perform calculations on grouped data, you often use
aggregate functions with GROUP BY.
 Can be combined with other SQL clauses: Aggregate functions can be used alongside
HAVING, ORDER BY, and other SQL clauses to filter or sort results.

COUNT()
The COUNT() function returns the number of rows that match a given condition or are present in
a column.

COUNT(*): Counts all rows.


COUNT(column_name): Counts non-NULL values in the specified column.
COUNT(DISTINCT column_name): Counts unique non-NULL values in the column.

SUM()
The SUM() function calculates the total sum of a numeric column.

SUM(column_name): Returns the total sum of all non-NULL values in a column.

AVG()
The AVG() function calculates the average of a numeric column. It divides the sum of the
column by the number of non-NULL values.

AVG(column_name): Returns the average of the non-NULL values in the column

MIN() and MAX()


The MIN() and MAX() functions return the smallest and largest values, respectively, from a
column.

MIN(column_name): Returns the minimum value.


MAX(column_name): Returns the maximum value
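The aggregate functions, and the way they treat NULL, can be checked with Python's built-in sqlite3 module. The orders table and its values are hypothetical; note the one NULL amount.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("A", 100), ("B", 300), ("A", None), ("C", 200)],
)

# COUNT(*) counts every row; COUNT(amount) and the others skip the NULL.
summary = conn.execute(
    """
    SELECT COUNT(*), COUNT(amount), COUNT(DISTINCT customer),
           SUM(amount), AVG(amount), MIN(amount), MAX(amount)
    FROM orders
    """
).fetchone()

# GROUP BY with HAVING: total per customer, keeping totals of at least 200.
per_customer = conn.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) >= 200
    ORDER BY customer
    """
).fetchall()

print(summary)
print(per_customer)
conn.close()
```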

Joins
Joins are used to combine rows from two or more tables based on related columns between them.
The most common type is the INNER JOIN which returns only matching rows from both tables.
LEFT JOIN returns all rows from the left table with matching rows from the right (or NULL if
no match). RIGHT JOIN does the opposite, returning all rows from the right table. FULL JOIN
returns all rows from both tables, matching where possible. CROSS JOIN produces a Cartesian
product, combining every row from the first table with every row from the second table, which is
rarely used in practice but important to understand.
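The difference between the join types is easiest to see on a small example. The sketch below uses Python's built-in sqlite3 module with two hypothetical tables; RIGHT JOIN and FULL JOIN are omitted because older SQLite versions do not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE departments (dept_id INTEGER, dept_name TEXT)")
conn.execute("CREATE TABLE employees (name TEXT, dept_id INTEGER)")
conn.executemany("INSERT INTO departments VALUES (?, ?)", [(1, "Sales"), (2, "HR")])
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Jane", 1), ("Omar", 2), ("Lena", None)],
)

# INNER JOIN: only employees with a matching department.
inner_rows = conn.execute(
    """
    SELECT e.name, d.dept_name
    FROM employees AS e
    INNER JOIN departments AS d ON e.dept_id = d.dept_id
    ORDER BY e.name
    """
).fetchall()

# LEFT JOIN: every employee, with NULL (None) where no department matches.
left_rows = conn.execute(
    """
    SELECT e.name, d.dept_name
    FROM employees AS e
    LEFT JOIN departments AS d ON e.dept_id = d.dept_id
    ORDER BY e.name
    """
).fetchall()

# CROSS JOIN: every employee paired with every department (3 x 2 rows).
cross_count = conn.execute(
    "SELECT COUNT(*) FROM employees CROSS JOIN departments"
).fetchone()[0]

print(inner_rows, left_rows, cross_count)
conn.close()
```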
Unions
UNION combines the result sets of two or more SELECT statements, removing duplicates. The
number and order of columns must match in all queries, and data types must be compatible.
UNION ALL is similar but retains duplicates and is more efficient since it doesn't need to check
for duplicates. These operations are vertical combinations, stacking result sets on top of each
other rather than joining them side-by-side.
Intersection
INTERSECT returns only the rows that appear in both result sets of two SELECT statements.
Like UNION, the queries must have the same number of columns with compatible data types.
This operation is useful for finding common elements between two datasets. Not all database
systems support INTERSECT directly, sometimes requiring alternative approaches using
EXISTS or IN clauses.
Minus (EXCEPT)
MINUS (called EXCEPT in some databases) returns rows from the first query that aren't present
in the second query's results. It essentially performs set subtraction. The operation requires the
same number of columns with compatible types in both queries. This is particularly useful for
finding differences between datasets or excluding specific records from a result set.
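UNION, UNION ALL, INTERSECT, and EXCEPT can be compared side by side with Python's built-in sqlite3 module. SQLite spells the set difference as EXCEPT rather than MINUS; the two course tables, the student names, and the run helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course_a (student TEXT)")
conn.execute("CREATE TABLE course_b (student TEXT)")
conn.executemany("INSERT INTO course_a VALUES (?)", [("Asha",), ("Ravi",), ("Meena",)])
conn.executemany("INSERT INTO course_b VALUES (?)", [("Ravi",), ("Meena",), ("John",)])

def run(set_operator: str) -> list[str]:
    """Combine the two SELECTs with the given set operator and return the names."""
    sql = (
        f"SELECT student FROM course_a {set_operator} "
        "SELECT student FROM course_b ORDER BY student"
    )
    return [row[0] for row in conn.execute(sql)]

union_rows = run("UNION")          # duplicates removed
union_all_rows = run("UNION ALL")  # duplicates kept
intersect_rows = run("INTERSECT")  # rows present in both
except_rows = run("EXCEPT")        # rows in course_a but not in course_b

print(union_rows, union_all_rows, intersect_rows, except_rows)
conn.close()
```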
Cursors
Cursors are database objects used to retrieve, manipulate, and navigate through result sets row by
row. They provide more control than standard result sets, allowing procedural processing of data.
There are implicit cursors (automatically created for SQL statements) and explicit cursors
(defined by programmers). Cursors are essential in PL/SQL for row-by-row processing, though
overuse can impact performance.
Triggers
Triggers are stored programs that automatically execute ("fire") in response to specific database
events (INSERT, UPDATE, DELETE) on particular tables or views. They can run before or
after the triggering event and are useful for enforcing business rules, maintaining audit trails, or
keeping derived data consistent. Triggers operate transparently to applications, executing
whenever the defined event occurs regardless of what caused the event.
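An audit-trail trigger can be sketched with Python's built-in sqlite3 module. The syntax shown is SQLite's, which differs in detail from PL/SQL triggers; the accounts and audit_log tables and the trigger name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute(
    "CREATE TABLE audit_log (account_id INTEGER, old_balance INTEGER, new_balance INTEGER)"
)

# Hypothetical audit trigger: fires after every UPDATE of accounts.balance.
conn.execute(
    """
    CREATE TRIGGER log_balance_change
    AFTER UPDATE OF balance ON accounts
    FOR EACH ROW
    BEGIN
        INSERT INTO audit_log VALUES (OLD.id, OLD.balance, NEW.balance);
    END
    """
)

conn.execute("INSERT INTO accounts VALUES (1, 500)")
# The application only issues the UPDATE; the trigger writes the audit row.
conn.execute("UPDATE accounts SET balance = 450 WHERE id = 1")

audit_rows = conn.execute("SELECT * FROM audit_log").fetchall()
print(audit_rows)
conn.close()
```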
Procedures
Procedures are named PL/SQL blocks that perform specific tasks, optionally accepting
parameters and returning values. They promote code reusability and modularity, encapsulating
complex operations that can be called from applications or other procedures. Unlike functions,
procedures don't have to return values and are called as standalone statements. They're stored in
the database and can include transaction control statements, exception handling, and all PL/SQL
features.

What is database design?


Database design can be defined as the set of procedures or tasks carried out to implement a
database. A good database design is important because it helps you get the right information
when you need it. The following are some critical points to keep in mind to
achieve a good database design:

Data consistency and integrity must be maintained.


Low Redundancy
Faster searching through indices
Security measures should be taken by enforcing various integrity constraints.
Data should be stored in fragmented bits of information in the most atomic format possible.

Normalization is an important process in database design that helps improve the database's
efficiency, consistency, and accuracy. It makes it easier to manage and maintain the data and
ensures that the database is adaptable to changing business needs.

Database normalization is the process of organizing the attributes of the database to reduce or
eliminate data redundancy (having the same data but at different places).

Normalization generally involves splitting a table into multiple ones which must be linked each
time a query is made requiring data from the split tables.

Functional dependency in DBMS is an important concept that describes the
relationship between attributes (columns) in a table. It shows that the value of one attribute
determines the other. Functional dependencies help maintain the quality of data in the database.

A functional dependency occurs when one attribute uniquely determines another attribute within
a relation. It is a constraint that describes how attributes in a table relate to each other. If attribute
A functionally determines attribute B, we write this as A→B.

Functional dependencies are used to mathematically express relations among database entities
and are very important to understanding advanced concepts in Relational Database Systems.
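Whether a dependency A→B holds in a given set of rows can be checked mechanically: no two rows may agree on A while disagreeing on B. The sketch below is one simple way to do that in Python. The holds function and the student records are hypothetical, and a check on sample data can only refute a dependency, not prove that it holds for every possible instance of the relation.

```python
def holds(rows: list[dict], determinant: list[str], dependent: list[str]) -> bool:
    """Return True if determinant -> dependent holds in the given rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        # The same determinant value must always map to the same dependent value.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical student records used only for illustration.
students = [
    {"roll_no": 1, "name": "Asha", "dept": "CS"},
    {"roll_no": 2, "name": "Ravi", "dept": "CS"},
    {"roll_no": 3, "name": "Asha", "dept": "IT"},
]

roll_determines_name = holds(students, ["roll_no"], ["name"])
name_determines_dept = holds(students, ["name"], ["dept"])
print(roll_determines_name, name_determines_dept)
```

Here roll_no determines name, but name does not determine dept, because two students named Asha belong to different departments.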

Benefits of Functional Dependency in DBMS


Functional dependency in a database management system offers several advantages for
businesses and organizations:

Prevents Duplicate Data:


Functional dependency helps avoid storing the same data repeatedly in the database, reducing
redundancy and saving storage space.
Improves Data Quality and Accuracy:
By organizing data efficiently and minimizing duplication, functional dependency ensures the
data is reliable, consistent, and of high quality.
Reduces Errors:
Keeping data organized and concise lowers the chances of errors in records or datasets, making it
easier to manage and update information.
Saves Time and Costs:
Properly organized data allows for quicker and easier access, improving productivity and
reducing the time and cost of managing information.
Defines Rules and Behaviors:
Functional dependency allows setting rules and constraints that control how data is stored,
accessed, and maintained, ensuring better data management.
Helps Identify Poor Database Design:
It highlights issues like scattered or missing data across tables, helping identify and fix design
flaws to maintain consistency and integrity.

Types of Functional Dependencies in DBMS


Trivial functional dependency
Non-Trivial functional dependency
Multivalued functional dependency
Transitive functional dependency
Trivial Functional Dependency
In a trivial functional dependency, the dependent is always a subset of the determinant, i.e., if
X → Y and Y is a subset of X, then it is called a trivial functional dependency.

Symbolically: A→B is trivial functional dependency if B is a subset of A.

Non-trivial Functional Dependency
In a non-trivial functional dependency, the dependent is not a subset of the determinant,
i.e., if X → Y and Y is not a subset of X, then it is called a non-trivial functional dependency.

Semi Non Trivial Functional Dependencies


A semi non-trivial functional dependency occurs when part of the dependent (right-hand side) is
included in the determinant (left-hand side), but not all of it. This is a middle ground between
trivial and non-trivial functional dependencies. X → Y is called semi non-trivial when X ∩ Y is
not empty and Y is not a subset of X.
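
The three cases above can be told apart with plain set operations. This is only a sketch; the function name `classify_fd` and the attribute names are assumptions for illustration.

```python
def classify_fd(x, y):
    """Classify the dependency X -> Y by comparing the two attribute sets."""
    x, y = set(x), set(y)
    if y <= x:
        return "trivial"            # Y is a subset of X
    if x & y:
        return "semi non-trivial"   # X and Y overlap, but Y is not contained in X
    return "non-trivial"            # X and Y share no attribute


print(classify_fd({"roll_no", "name"}, {"name"}))
print(classify_fd({"roll_no", "name"}, {"name", "dept"}))
print(classify_fd({"roll_no"}, {"name"}))
```

Under the definition given earlier, a semi non-trivial dependency is still non-trivial; the last branch covers the case where the two sides are completely disjoint.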

Multivalued Functional Dependency


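In Multivalued functional dependency, attributes of the dependent set are not dependent on each
other. i.e. If a → {b, c} and there exists no functional dependency between b and c, then it is
called a multivalued functional dependency. Strictly speaking, a multivalued dependency is
written a →→ b: each value of a is associated with a set of b values that is independent of the
other attributes of the relation.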

Transitive Functional Dependency


In transitive functional dependency, dependent is indirectly dependent on determinant. i.e. If a →
b & b → c, then according to axiom of transitivity, a → c. This is a transitive functional
dependency.
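
Transitivity is what makes the attribute closure useful: starting from a set of attributes, keep applying the known dependencies until nothing new can be added. The sketch below is one straightforward way to compute it; the function name `closure` and the representation of dependencies as pairs of sets are assumptions for this example.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given dependencies.

    fds is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [({"a"}, {"b"}), ({"b"}, {"c"})]
print(sorted(closure({"a"}, fds)))  # a reaches c through b
print(sorted(closure({"b"}, fds)))
```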

Fully Functional Dependency


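In full functional dependency, an attribute or set of attributes is determined by the whole of
the determinant and by no proper subset of it. That is, Y is fully functionally dependent on X
if X->Y holds and removing any attribute from X breaks the dependency. If a relation R has
attributes X, Y, Z with the dependencies X->Y and X->Z, where X is a single attribute, then both
dependencies are fully functional, because X cannot be reduced any further.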

Partial Functional Dependency


In partial functional dependency, a non-key attribute depends on a part of the composite key,
rather than the whole key. Suppose a relation R has attributes X, Y, Z, where {X, Y} is the
composite key and Z is a non-key attribute. If X->Z holds, then Z depends on only part of the
key, and X->Z is a partial functional dependency.
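
A partial dependency can be detected mechanically: take every proper, non-empty subset of the composite key and see whether it already determines some non-key attribute. The sketch below does this with an attribute-closure helper; the function names and the tiny X, Y, Z schema mirror the paragraph above and are otherwise assumptions.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes determined by attrs under fds (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, non_key_attrs, fds):
    """List (key_part, attribute) pairs where only part of the key is needed."""
    found = []
    key = sorted(key)
    for size in range(1, len(key)):               # proper subsets only
        for part in combinations(key, size):
            determined = closure(part, fds) & set(non_key_attrs)
            for attr in sorted(determined):
                found.append((part, attr))
    return found


fds = [({"X"}, {"Z"})]
print(partial_dependencies({"X", "Y"}, {"Z"}, fds))
```

An empty result means every non-key attribute needs the whole key, which is the condition Second Normal Form asks for.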

Normal Forms in DBMS
Normalization is a technique used in database design to reduce redundancy and improve data
integrity by organizing data into tables and ensuring proper relationships. Normal Forms are
different stages of normalization, and each stage imposes certain rules to improve the structure
and performance of a database. Let's break down the various normal forms step-by-step to
understand the conditions that need to be satisfied at each level:

1. First Normal Form (1NF): Eliminating Duplicate Records


A table is in 1NF if it satisfies the following conditions:

All columns contain atomic values (i.e., indivisible values).
Each row is unique (i.e., no duplicate rows).
Each column has a unique name.
The order in which data is stored does not matter.
Example of 1NF Violation: If a table has a column "Phone Numbers" that stores multiple phone
numbers in a single cell, it violates 1NF. To bring it into 1NF, you need to separate phone
numbers into individual rows.
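
The phone-number fix can be sketched as follows. The function name `to_1nf`, the comma separator, and the sample contacts are assumptions for illustration only.

```python
def to_1nf(rows, column, sep=","):
    """Split a multi-valued column so that each row holds one atomic value."""
    flat = []
    for row in rows:
        for value in row[column].split(sep):
            new_row = dict(row)              # copy the other columns unchanged
            new_row[column] = value.strip()
            flat.append(new_row)
    return flat


# Hypothetical sample data: the first contact stores two numbers in one cell.
contacts = [
    {"id": 1, "name": "Asha", "phone": "98100-11111, 98100-22222"},
    {"id": 2, "name": "Ravi", "phone": "98100-33333"},
]

for row in to_1nf(contacts, "phone"):
    print(row)
```

After the split, `id` alone no longer identifies a row; the pair (`id`, `phone`) does, which is why 1NF tables of this kind usually end up with a composite key.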

2. Second Normal Form (2NF): Eliminating Partial Dependency


A relation is in 2NF if it satisfies the conditions of 1NF and, additionally, no partial
dependency exists: every non-prime attribute (non-key attribute) must depend on the entire
primary key, not just a part of it.

Example: For a composite key (StudentID, CourseID), if the StudentName depends only on
StudentID and not on the entire key, it violates 2NF. To normalize, move StudentName into a
separate table where it depends only on StudentID.
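
The decomposition described in the example can be sketched with a simple projection helper. The `Grade` column, the sample rows, and the function name `project` are assumptions added so that the composite key has something that genuinely depends on all of it.

```python
def project(rows, columns):
    """Project rows onto the given columns, dropping duplicate tuples."""
    result = []
    for row in rows:
        projected = {c: row[c] for c in columns}
        if projected not in result:
            result.append(projected)
    return result


# Hypothetical table with key (StudentID, CourseID); StudentName depends on StudentID only.
enrollments = [
    {"StudentID": 1, "CourseID": "DB", "StudentName": "Asha", "Grade": "A"},
    {"StudentID": 1, "CourseID": "OS", "StudentName": "Asha", "Grade": "B"},
    {"StudentID": 2, "CourseID": "DB", "StudentName": "Ravi", "Grade": "A"},
]

students = project(enrollments, ["StudentID", "StudentName"])
grades = project(enrollments, ["StudentID", "CourseID", "Grade"])

print(students)  # each name is now stored once per student
print(grades)
```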

3. Third Normal Form (3NF): Eliminating Transitive Dependency


A relation is in 3NF if it satisfies 2NF and additionally, there are no transitive dependencies. In
simpler terms, non-prime attributes should not depend on other non-prime attributes.

Example: Consider a table (StudentID, CourseID, Instructor) whose key is StudentID. If CourseID
depends on StudentID and Instructor depends on CourseID, then Instructor depends on StudentID
only indirectly, through CourseID, which violates 3NF. To resolve this, place Instructor in a
separate table linked by CourseID.
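
The 3NF decomposition can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumed table names, column names, and sample values; it shows that after the split an instructor change touches one row, and that a join rebuilds the original view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course (
        course_id  TEXT PRIMARY KEY,
        instructor TEXT NOT NULL
    );
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        course_id  TEXT NOT NULL REFERENCES course(course_id)
    );
    INSERT INTO course VALUES ('DB', 'Dr. Rao'), ('OS', 'Dr. Mehta');
    INSERT INTO student VALUES (1, 'DB'), (2, 'DB'), (3, 'OS');
""")

# The instructor is stored once per course, so this update changes a single row.
conn.execute("UPDATE course SET instructor = 'Dr. Iyer' WHERE course_id = 'DB'")

# A join reconstructs the original (StudentID, CourseID, Instructor) view.
rows = conn.execute("""
    SELECT s.student_id, s.course_id, c.instructor
    FROM student AS s
    JOIN course AS c ON c.course_id = s.course_id
    ORDER BY s.student_id
""").fetchall()
print(rows)
```

The join in the last query is also the price of normalization: every additional split adds another join to queries that need the combined view.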

Advantages of Normal Form


1. Reduced data redundancy: Normalization helps to eliminate duplicate data in tables, reducing
the amount of storage space needed and improving database efficiency.

2. Improved data consistency: Normalization ensures that data is stored in a consistent and
organized manner, reducing the risk of data inconsistencies and errors.

3. Simplified database design: Normalization provides guidelines for organizing tables and data
relationships, making it easier to design and maintain a database.

4. Improved query performance: Normalized tables are typically easier to search and retrieve
data from, resulting in faster query performance.

5. Easier database maintenance: Normalization reduces the complexity of a database by breaking
it down into smaller, more manageable tables, making it easier to add, modify, and delete data.

Common Challenges of Over-Normalization


While normalization is a powerful tool for optimizing databases, it's important not to over-
normalize your data. Excessive normalization can lead to:

Complex Queries: Too many tables may result in multiple joins, making queries slow and
difficult to manage.
Performance Overhead: Additional processing required for joins in overly normalized databases
may hurt performance, especially in large-scale systems.

UNIT 5

What is Client-Server Architecture?


Client-server architecture is a cornerstone of modern system design, where the
network infrastructure is structured to include multiple clients and a central server.
In this model, clients are devices or programs that make requests for services or
resources, while the server is a powerful machine or software that fulfills these
requests. Communication between clients and the server follows a request-
response protocol, such as HTTP/HTTPS for web services or SQL for database
queries.

This architecture allows for efficient data management and resource allocation by
centralizing critical functions on the server, which can handle complex processing
and large-scale data storage.
Clients manage user interactions and send specific requests to the server, which
processes these requests and sends back appropriate responses.
The client-server architecture is highly scalable, as it can accommodate more
clients by scaling the server's capabilities or adding additional servers.
This design is prevalent in various applications, including web services, database
management, and email systems, providing a robust framework for developing and
managing complex, distributed systems efficiently.
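
The request-response pattern can be sketched without any real network. In the toy model below, the `Server` and `Client` classes, the request format, and the account data are all assumptions made for illustration; a direct method call stands in for what would normally travel over a connection.

```python
class Server:
    """Holds the shared data and answers requests (toy model, no networking)."""

    def __init__(self):
        self._accounts = {"alice": 120, "bob": 45}   # centrally stored data

    def handle(self, request):
        if request.get("action") != "get_balance":
            return {"status": "error", "reason": "unsupported action"}
        user = request.get("user")
        if user not in self._accounts:
            return {"status": "error", "reason": "unknown user"}
        return {"status": "ok", "balance": self._accounts[user]}


class Client:
    """Builds requests and interprets responses; it never touches the data itself."""

    def __init__(self, server):
        self._server = server   # stands in for a network connection

    def balance(self, user):
        return self._server.handle({"action": "get_balance", "user": user})


server = Server()
client_a = Client(server)
client_b = Client(server)   # many clients can share one server

print(client_a.balance("alice"))
print(client_b.balance("carol"))
```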

Client-server architecture is critically important in system design for several
reasons:

Centralized Management: By centralizing resources and services on a server, this
architecture simplifies maintenance, updates, and security management.
Administrators can efficiently monitor and manage data, apply updates, and
enforce security policies from a single location.
Scalability: Client-server architecture supports scalability. As the number of clients
grows, additional servers can be added, or existing server capacities can be
expanded to handle increased demand without significantly altering the overall
system architecture.
Resource Optimization: This model allows for optimized resource allocation.
Servers are designed to handle intensive processing and large data storage, while
clients are optimized for user interactions and requests. This separation ensures
efficient use of system resources.
Reliability and Availability: With robust server infrastructure, client-server
systems can ensure high reliability and availability. Redundancies, backups, and
load balancing techniques can be implemented on the server side to minimize
downtime and ensure continuous service availability.
Enhanced Security: Centralized servers enable better security controls and data
protection measures. Sensitive data can be securely stored on servers, and access
can be tightly controlled and monitored. Encryption and authentication
mechanisms can be more effectively implemented.

The centralized architecture is defined as every node being connected to a central
coordination system, and whatever information they desire to exchange will be
shared by that system. A centralized architecture does not automatically require
that all functions must be in a single place or circuit, but rather that most parts are
grouped and none are repeated elsewhere as would be the case in a distributed
architecture.

A centralized architecture for DBMS is one in which all data is stored on a single
server, and all clients connect to that server in order to access and manipulate the
data. This type of architecture is also known as a monolithic architecture. One of
the main advantages of a centralized architecture is its simplicity - there is only one
server to manage, and all clients use the same data.

It consists of the following types of architecture:

Client-server
Application layering

A distributed database is a database that is not limited to one
system; it is spread over different sites, i.e., on multiple computers or over a
network of computers. A distributed database system is located on various sites
that don't share physical components. This may be required when a particular
database needs to be accessed by various users globally. It needs to be managed
such that for the users it looks like one single database.

A distributed database system is a type of database management system that stores
data across multiple computers or sites that are connected by a network. In a
distributed database system, each site has its own database, and the databases are
connected to each other to form a single, integrated system.
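
The idea that several sites hold the data while users see one database can be sketched as below. The class name `DistributedTable`, the site names, and the placement rule (integer key modulo the number of sites) are assumptions chosen to keep the example short; real systems use more elaborate fragmentation and replication schemes.

```python
class DistributedTable:
    """One logical table whose rows are spread over several sites (toy model)."""

    def __init__(self, site_names):
        # Each site keeps its own local store; nothing is shared between them.
        self.sites = {name: {} for name in sorted(site_names)}
        self._names = list(self.sites)

    def _site_for(self, key):
        # Hypothetical placement rule: integer key modulo the number of sites.
        return self._names[key % len(self._names)]

    def put(self, key, row):
        self.sites[self._site_for(key)][key] = row

    def get(self, key):
        # The caller never needs to know which site holds the row.
        return self.sites[self._site_for(key)].get(key)

    def scan(self):
        # Merge all local stores into the single view the user expects.
        merged = {}
        for local in self.sites.values():
            merged.update(local)
        return dict(sorted(merged.items()))


orders = DistributedTable(["delhi", "mumbai"])
for order_id, item in [(1, "pen"), (2, "ink"), (3, "pad"), (4, "cap")]:
    orders.put(order_id, item)

print(orders.get(3))
print(orders.sites["delhi"])
print(orders.scan())
```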

Advantages of Distributed Database System :

1) There is fast data processing as several sites participate in request processing.
2) Reliability and availability of this system is high.
3) It has a reduced operating cost.
4) It is easier to expand the system by adding more sites.
5) It has improved sharing ability and local autonomy.

Disadvantages of Distributed Database System :

1) The system becomes complex to manage and control.
2) The security issues must be carefully managed.
3) The system requires deadlock handling during transaction processing;
otherwise the entire system may be left in an inconsistent state.
4) There is a need for some standardization for processing of distributed database
systems.

Applications of Distributed Database:

It is used in Corporate Management Information Systems.
It is used in multimedia applications.
It is used in military control systems, hotel chains, etc.
It is also used in manufacturing control systems.

There are several different architectures for distributed database systems, including:
Client-server architecture: In this architecture, clients connect to a central server,
which manages the distributed database system. The server is responsible for
coordinating transactions, managing data storage, and providing access control.

Peer-to-peer architecture: In this architecture, each site in the distributed database
system is connected to all other sites. Each site is responsible for managing its own
data and coordinating transactions with other sites.

Federated architecture: In this architecture, each site in the distributed database
system maintains its own independent database, but the databases are integrated
through a middleware layer that provides a common interface for accessing and
querying the data.

Distributed database systems can be used in a variety of applications, including
e-commerce, financial services, and telecommunications.

What Is an Object-Oriented Database (OODB)?


An object-oriented database (OODB) is a database that combines object-oriented
programming concepts with relational database principles. It is managed by an
object-oriented database management system (OODBMS). OODBs contain the
following elements:

Objects. The basic building block and an instance of a class. The type is either
built-in or user-defined.
Classes. A schema or blueprint that defines object structure and behavior.
Methods. Functions attached to a class that define its behavior.
Pointers. An entity that helps access elements of an object database. They also help
establish relationships between objects.
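
How objects, identity, and pointers fit together can be sketched with a toy in-memory store. Everything here (the `ObjectStore` class, the integer object identifiers, the `Department` and `Clerk` classes) is an assumption for illustration and not the interface of any particular OODBMS.

```python
import itertools


class ObjectStore:
    """Toy object store: every stored object receives its own identifier (OID)."""

    def __init__(self):
        self._objects = {}
        self._oids = itertools.count(1)

    def add(self, obj):
        oid = next(self._oids)
        self._objects[oid] = obj
        return oid

    def get(self, oid):
        return self._objects[oid]


class Department:
    def __init__(self, name):
        self.name = name


class Clerk:
    def __init__(self, name, dept_oid):
        self.name = name
        self.dept_oid = dept_oid   # pointer to a Department, held as an OID


store = ObjectStore()
sales = store.add(Department("Sales"))
c1 = store.add(Clerk("Asha", sales))
c2 = store.add(Clerk("Asha", sales))

# Identity does not depend on the stored values: same data, different objects.
print(c1 != c2)
# Following a pointer from one object to another, with no join involved.
print(store.get(store.get(c1).dept_oid).name)
```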

Components of Object-Oriented Data Model:


The OODBMS is based on three major components, namely: Object structure,
Object classes, and Object identity. These are explained below.

1. Object Structure:
The structure of an object refers to the properties that an object is made up of.
These properties of an object are referred to as attributes. Thus, an object is a
real-world entity with certain attributes that make up the object structure. Also, an
object encapsulates the data and code into a single unit, which in turn provides data
abstraction by hiding the implementation details from the user.

The object structure is further composed of three types of components: Messages,
Methods, and Variables. These are explained below.

Messages -
A message provides an interface or acts as a communication medium between an
object and the outside world. A message can be of two types:

Read-only message: If the invoked method does not change the value of a variable,
then the invoking message is said to be a read-only message.
Update message: If the invoked method changes the value of a variable, then the
invoking message is said to be an update message.

Methods -
When a message is passed then the body of code that is executed is known as a
method. Whenever a method is executed, it returns a value as output. A method
can be of two types:

Read-only method: When the value of a variable is not affected by a method, then
it is known as a read-only method.
Update method: When the value of a variable is changed by a method, then it is
known as an update method.

Variables -
Variables store the data of an object. The data stored in the variables makes
objects distinguishable from one another.

2. Object Classes:
An object which is a real-world entity is an instance of a class. Hence first we need
to define a class and then the objects are made which differ in the values they store
but share the same class definition. The objects in turn correspond to various
messages and variables stored in them.

Example -
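#include <string>
using std::string;

class CLERK
{
    // Variables: the data stored in each CLERK object (private by default)
    string name;      // a string, since a single char cannot hold a name
    string address;
    int id;
    int salary;

public:
    // Messages: the interface through which the outside world reaches the object
    string get_name();
    string get_address();
    int annual_salary();
};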

In the above example, we can see, CLERK is a class that holds the object variables
and messages.

An OODBMS also supports inheritance in an extensive manner as in a database
there may be many classes with similar methods, variables and messages. Thus,
the concept of the class hierarchy is maintained to depict the similarities among
various classes.

The concept of encapsulation, that is, data or information hiding, is also
supported by an object-oriented data model. And this data model also provides the
facility of abstract data types apart from the built-in data types like char, int, float.
ADT's are the user-defined data types that hold the values within them and can
also have methods attached to them.

Thus, OODBMS provides numerous facilities to its users, both built-in and user-
defined. It incorporates the properties of an object-oriented data model with a
database management system, and supports the concept of programming
paradigms like classes and objects along with the support for other concepts like
encapsulation, inheritance, and the user-defined ADT's (abstract data types).
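
The concepts above (read-only and update messages, encapsulation, inheritance, and a user-defined ADT) can be put together in one small sketch. It is written in Python rather than the C++-style notation of the CLERK example, and the `Address` type, the `Employee` base class, the `raise_salary` message, and the assumption that `salary` is a monthly figure are all additions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    """A user-defined abstract data type with a method attached to it."""
    street: str
    city: str

    def one_line(self):
        return f"{self.street}, {self.city}"


class Employee:
    def __init__(self, emp_id, name, address, salary):
        # Variables are kept behind the object's interface (encapsulation).
        self._id = emp_id
        self._name = name
        self._address = address
        self._salary = salary      # assumed to be a monthly salary

    # Read-only messages: they do not change any variable.
    def get_name(self):
        return self._name

    def get_address(self):
        return self._address.one_line()

    def annual_salary(self):
        return 12 * self._salary

    # Update message: it changes the value of a variable.
    def raise_salary(self, amount):
        self._salary += amount


class Clerk(Employee):
    """Inherits the variables and messages of Employee and adds its own."""

    def __init__(self, emp_id, name, address, salary, desk):
        super().__init__(emp_id, name, address, salary)
        self._desk = desk

    def get_desk(self):
        return self._desk


clerk = Clerk(7, "Asha", Address("12 MG Road", "Pune"), 30000, "D4")
print(clerk.get_name(), "|", clerk.get_address(), "|", clerk.get_desk())
print(clerk.annual_salary())
clerk.raise_salary(1000)
print(clerk.annual_salary())
```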

ODBMS stands for Object-Oriented Database Management System, which is a
type of database management system that is designed to store and manage object-
oriented data. Object-oriented data is data that is represented using objects, which
encapsulate data and behavior into a single entity.

An ODBMS stores and manages data as objects, and provides mechanisms for
querying, manipulating, and retrieving the data. In an ODBMS, the data is
typically stored in the form of classes and objects, which can be related to each
other using inheritance and association relationships.

In an ODBMS, the data is managed using an object-oriented programming
language or a specialized query language designed for object-oriented databases.
Some of the popular object-oriented database languages include Smalltalk, Java,
and C++. Some ODBMS also support standard SQL for querying the data.

ODBMS have several advantages over traditional relational databases. One of the
main advantages is that they provide a natural way to represent complex data
structures and relationships. Since the data is represented using objects, it can be
easier to model real-world entities in the database. Additionally, ODBMS can
provide better performance and scalability for applications that require a large
number of small, complex transactions.

However, there are also some disadvantages to using an ODBMS. One of the main
disadvantages is that they can be more complex and harder to use than traditional
relational databases. Additionally, ODBMS may not be as widely used and
supported as traditional relational databases, which can make it harder to find
expertise and support. Finally, some applications may not require the advanced
features and performance provided by an ODBMS, and may be better suited for a
simpler database solution.

Features of ODBMS:
Object-oriented data model: ODBMS uses an object-oriented data model to store
and manage data. This allows developers to work with data in a more natural way,
as objects are similar to the objects in the programming language they are using.

Complex data types: ODBMS supports complex data types such as arrays, lists,
sets, and graphs, allowing developers to store and manage complex data structures
in the database.

Automatic schema management: ODBMS automatically manages the schema of
the database, as the schema is defined by the classes and objects in the application
code. This eliminates the need for a separate schema definition language and
simplifies the development process.

High performance: ODBMS can provide high performance, especially for
applications that require complex data access patterns, as objects can be retrieved
with a single query.

Data integrity: ODBMS provides strong data integrity, as the relationships between
objects are maintained by the database. This ensures that data remains consistent
and correct, even in complex applications.

Concurrency control: ODBMS provides concurrency control mechanisms that
ensure that multiple users can access and modify the same data without conflicts.

Scalability: ODBMS can scale horizontally by adding more servers to the database
cluster, allowing it to handle large volumes of data.

Support for transactions: ODBMS supports transactions, which ensure that
multiple operations on the database are atomic and consistent.

Advantages:
Supports Complex Data Structures: ODBMS is designed to handle complex data
structures together with object-oriented features such as inheritance,
polymorphism, and encapsulation. This makes it easier to work with complex data
models in an object-oriented programming environment.

Improved Performance: ODBMS can provide improved performance compared to
traditional relational databases for complex data models. ODBMS can reduce the
amount of mapping and translation required between the programming language
and the database, which can improve performance.

Reduced Development Time: ODBMS can reduce development time since it
eliminates the need to map objects to tables and allows developers to work directly
with objects in the database.

Supports Rich Data Types: ODBMS supports rich data types, such as audio, video,
images, and spatial data, which can be challenging to store and retrieve in
traditional relational databases.

Scalability: ODBMS can scale horizontally and vertically, which means it can
handle larger volumes of data and can support more users.

Disadvantages:
Limited Adoption: ODBMS is not as widely adopted as traditional relational
databases, which means it may be more challenging to find developers with
experience working with ODBMS.

Lack of Standardization: ODBMS lacks standardization, which means that
different vendors may implement different features and functionality.

Cost: ODBMS can be more expensive than traditional relational databases since it
requires specialized software and hardware.

Integration with Other Systems: ODBMS can be challenging to integrate with
other systems, such as business intelligence tools and reporting software.

Scalability Challenges: ODBMS may face scalability challenges due to the
complexity of the data models it supports, which can make it challenging to
partition data across multiple nodes.

What is Spatial Data Mining?


Spatial Data Mining is the process of discovering interesting and previously
unknown, but potentially useful patterns from spatial databases. In spatial data
mining analysts use geographical or spatial information to produce business
intelligence or other results. Challenges involved in spatial data mining include
identifying patterns or finding objects that are relevant to the research project.
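
One of the simplest spatial patterns is a hotspot: an area that contains unusually many points. The sketch below finds hotspots by overlaying a square grid and counting points per cell. The function name `hotspots`, the cell size, the threshold, and the sample coordinates are assumptions for illustration; real spatial mining usually relies on proper clustering methods and spatial indexes.

```python
from collections import Counter


def hotspots(points, cell_size, min_count):
    """Return the grid cells that contain at least min_count points."""
    counts = Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in points
    )
    return sorted(cell for cell, n in counts.items() if n >= min_count)


# Hypothetical incident locations as (x, y) coordinates.
incidents = [(0.2, 0.3), (0.8, 0.9), (0.5, 0.1), (3.1, 3.2), (7.5, 2.2)]

print(hotspots(incidents, cell_size=1.0, min_count=3))
```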
Advantages of Spatial Data Mining
Insight Into Geographical Patterns: Spatial analysis helps organizations and researchers
identify area-specific trends and features that would otherwise go undetected.
Better Decision Making: Spatial data mining is applicable in areas such as urban
planning, environmental management, and logistics, where it helps organizations make
well-informed decisions.
Enhanced Visualization: Data collected at different spatial levels can be presented on
maps, which gives a better view of the trends and patterns.
Disadvantages of Spatial Data Mining
Complexity of Data: The raw data may be huge and intricate, so processing it requires
complex algorithms and substantial computing facilities.
Data Inaccuracy: Mistakes made while collecting the data, such as wrong geographical
coordinates for a place, can lead to wrong conclusions.
What is Temporal Data Mining?
Temporal data mining refers to the extraction of implicit, non-trivial, and
potentially useful abstract information from large collections of temporal data.
It is concerned with the analysis of temporal data and with finding temporal
patterns and regularities in sets of temporal data. Tasks of temporal data mining
are -
Data Characterization and Comparison
Cluster Analysis
Classification
Association rules
Prediction and Trend Analysis
Pattern Analysis
Advantages of Temporal Data Mining
Trend Identification: It helps identify temporal factors such as seasonality, cycles,
and trends, as well as long-term shifts.
Forecasting: It can be used predictively to help an organization prepare for future
outcomes or trends.
Anomaly Detection: Temporal data mining can identify departures from established
patterns, especially sharp changes that may point to either an event or a problem.
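
Trend identification and anomaly detection can be sketched with a moving average. In the example below the function names, the window size, the threshold, and the readings are assumptions; the anomaly rule (compare each value with the mean of the values just before it) is one simple choice among many.

```python
from statistics import mean


def moving_average(series, window):
    """Smooth a series by averaging each run of `window` consecutive values."""
    return [
        mean(series[i - window + 1 : i + 1])
        for i in range(window - 1, len(series))
    ]


def spikes(series, window, threshold):
    """Indices whose value differs from the mean of the previous `window`
    values by more than `threshold`."""
    flagged = []
    for i in range(window, len(series)):
        baseline = mean(series[i - window : i])
        if abs(series[i] - baseline) > threshold:
            flagged.append(i)
    return flagged


# Hypothetical daily readings with one sharp change.
readings = [10, 11, 10, 12, 11, 30, 12, 11]

print(moving_average(readings, window=3))
print(spikes(readings, window=3, threshold=10))
```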
Disadvantages of Temporal Data Mining
Handling Data Complexity: Observations in time-series data often depend on earlier
time points, which makes analysis difficult.
Requires Large Historical Data: Finding useful patterns may take a large amount of
historical data, which is not always available.
Difference Between Spatial and Temporal Data Mining

| Spatial data mining | Temporal data mining |
|---|---|
| It requires space. | It requires time. |
| Spatial mining is the extraction of knowledge, spatial relationships, and interesting measures that are not explicitly stored in a spatial database. | Temporal mining is the extraction of knowledge about the occurrence of events, for example whether they follow cyclic, random, or seasonal variations. |
| It deals with spatial (location, geo-referenced) data. | It deals with implicit or explicit temporal content, from large quantities of data. |
| Spatial databases store spatial objects described by spatial data types, together with the spatial associations among such objects. | Temporal data mining comprises the subject as well as its utilization in the modification of fields. |
| It includes finding characteristic rules, discriminant rules, association rules, evaluation rules, etc. | It aims at mining new and unknown knowledge, which takes into account the temporal aspects of data. |
| It is the method of identifying unusual and unexplored, but useful, patterns from spatial databases. | It deals with useful knowledge from temporal data. |
| Examples - Determining hotspots, unusual locations. | Examples - An association rule such as "Any person who buys a car also buys a steering lock". With the temporal aspect this rule becomes "Any person who buys a car also buys a steering lock after that". |
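
The steering-lock example can be made concrete by measuring how often the rule holds with and without the time order. The function name `rule_confidence`, the event format (customer, item, day), and the purchase log are assumptions for illustration.

```python
def rule_confidence(events, first, then, ordered):
    """Share of customers who bought `first` and also bought `then`.

    With ordered=True, a purchase of `then` must come after the customer's
    earliest purchase of `first`.
    """
    by_customer = {}
    for customer, item, day in events:
        by_customer.setdefault(customer, {}).setdefault(item, []).append(day)

    buyers = [items for items in by_customer.values() if first in items]
    if not buyers:
        return 0.0

    hits = 0
    for items in buyers:
        if then not in items:
            continue
        if not ordered or max(items[then]) > min(items[first]):
            hits += 1
    return hits / len(buyers)


# Hypothetical purchase log: (customer, item, day).
purchases = [
    ("c1", "car", 1), ("c1", "steering lock", 3),
    ("c2", "steering lock", 2), ("c2", "car", 5),   # lock bought before the car
    ("c3", "car", 4),
    ("c4", "car", 6), ("c4", "steering lock", 9),
]

print(rule_confidence(purchases, "car", "steering lock", ordered=False))
print(rule_confidence(purchases, "car", "steering lock", ordered=True))
```

Customer c2 supports the plain association rule but not the temporal one, which is why the second figure is lower.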

A Decision Support System (DSS) is a real-time decision-making tool in which
data, models, and software are used together with people to generate
efficient solutions. It combines numerous data inputs and offers methodical
approaches to evaluating, modelling, and displaying information, so that
decisions can be made on an organization's more challenging problems. A DSS
supports semi-structured and unstructured decision-making and can improve the
quality, speed, and efficiency of decisions, because new information and forecasts
that would be hard to produce manually become available.

Characteristics of a Decision Support System
Interactive Interface: A user-friendly graphical interface lets users enter their
data and obtain the desired output easily.
Data Integration: It draws information from various sources such as DBMSs, data
marts, data warehouses, and even data feeds, so that processing works on complete
data.
Support for Semi-structured and Unstructured Decisions: Unlike traditional
management information systems, a DSS is intended for cases where the
decision-making process is not highly routinized.
Analytical Models and Tools: A DSS includes tools for analysing data and making
recommendations, ranging from statistical analysis and forecasting to
optimization and simulation models.
Flexibility and Adaptability: The system can be applied to many types of
decision-making environment and can be modified to match the needs of the users
or the organization.
What-if Analysis: The assumptions or the values of the input variables can be
varied to determine the impact of a change on the result.
Timely and Relevant Information: A DSS supplies timely and relevant
information that decision-makers can use to respond appropriately to
demanding and volatile environments.
Support for Group Decision Making: Many DSSs include group support systems
for cases where more than one person is involved in the decision-making
process.
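
What-if analysis, as described in the list above, amounts to re-running a model while one input is varied and the rest stay at their baseline values. The sketch below uses a deliberately simple profit model; the function names, the model itself, and all figures are assumptions for illustration.

```python
def profit(units, price, unit_cost, fixed_cost):
    """A deliberately simple model: contribution margin minus fixed cost."""
    return units * (price - unit_cost) - fixed_cost


def what_if(model, baseline, variable, values):
    """Re-run the model with one input varied and all others left at baseline."""
    results = {}
    for value in values:
        scenario = dict(baseline)
        scenario[variable] = value
        results[value] = model(**scenario)
    return results


# Hypothetical baseline assumptions.
baseline = {"units": 1000, "price": 50, "unit_cost": 30, "fixed_cost": 12000}

print(what_if(profit, baseline, "price", [45, 50, 55]))
print(what_if(profit, baseline, "units", [800, 1000, 1200]))
```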
Purpose of a Decision Support System
Improving Decision Quality: A DSS improves the quality of decisions by basing
them on information and analysis that is accurate, comprehensive, pertinent, and
timely.
Handling Complex Problems: Its analytical and modelling tools help with
problems, whether structured or unstructured, that other approaches may not
handle efficiently.
Facilitating Rapid Decision Making: A DSS speeds up decisions by automating the
collection and analysis of data.
Supporting Strategic Planning: Every organisation needs to make long-run
forecasts and plans, and a DSS supports this with tools such as scenarios,
forecasts, and simulations.
Enhancing Efficiency: A DSS reduces the time and effort needed to amass
decision information, assemble data, and analyze it, thus enhancing organizational
productivity at the operational level.
Encouraging Collaboration: Many DSSs support collaborative decisions, making it
possible for many people to share information and arrive at agreed decisions.

Data analysis refers to the practice of examining datasets to draw conclusions
about the information they contain. It involves organizing, cleaning, and studying
the data to understand patterns or trends. Data analysis helps to answer questions
like "What is happening?" or "Why is this happening?".

Organizations use data analysis to improve decision-making, enhance efficiency,
and predict future outcomes. It is widely applied across industries such as
business, healthcare, marketing, finance, and scientific research to gain insights
and solve problems. This section explores what data analysis is, its types, and the
tools used for effective analysis.

Why Is Data Analysis Important?


Data analysis is important because it helps us understand information so we can
make better choices. Let's understand this in more detail:

Informed Decision-Making: Looking at data helps us make better choices
because we can see how things have worked in the past, what’s happening right
now, and what might happen in the future. It gives us the facts to make smart
decisions.
Business Intelligence: Analyzing data helps companies stay ahead of others. By
looking at things like what customers like, what’s trending in the market, and
where they can improve, they can plan better and make smarter moves.
Problem Solving: It helps in identifying and solving problems within a system or
process by revealing patterns or anomalies that require attention.
Performance Evaluation: If something isn’t working right, looking at data helps us
find out what’s wrong. It shows us patterns or issues we might not notice
otherwise, helping us fix problems.
Risk Management: Understanding patterns in data helps in predicting and
managing risks, allowing organizations to deal with challenges.
The Process of Data Analysis
Data analysis involves several key steps that help us get insights from raw
data. Let's walk through the process of data analysis.

Figure: Data Analysis Process
Define Objectives : Clearly define the goals of the analysis and the specific
questions you aim to answer. Establish a clear understanding of what insights or
decisions the analyzed data should inform.
Data Collection: Gather relevant data from various sources. Ensure data integrity,
quality, and completeness. Organize the data in a format suitable for analysis.
There are two types of data: qualitative and quantitative data.
Data Cleaning and Preprocessing: Address missing values, handle outliers, and
transform the data into a usable format. Cleaning and preprocessing steps are
crucial for ensuring the accuracy and reliability of the analysis.
Exploratory Data Analysis (EDA): Conduct exploratory analysis to understand the
characteristics of the data. Visualize distributions, identify patterns, and calculate
summary statistics. EDA helps in formulating hypotheses and refining the analysis
approach.
Statistical Analysis : Apply appropriate statistical methods or modeling techniques
to answer the defined questions. This step involves testing hypotheses, building
predictive models, or performing any analysis required to derive meaningful
insights from the data.
Visualization and Communication: Interpret the results in the context of the
original objectives. Communicate findings through reports, visualizations, or
presentations. Clearly articulate insights, conclusions, and recommendations based
on the analysis to support informed decision-making.
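
The middle steps of the process (cleaning, then exploratory summary statistics) can be sketched with the standard library alone. The raw records, the rule of dropping rows whose value is missing or not a number, and the function names are assumptions for illustration; other cleaning strategies, such as filling in missing values, are equally valid.

```python
from statistics import mean, median

# Hypothetical raw data as it might arrive from collection.
raw = [
    {"region": "North", "sales": "120"},
    {"region": "South", "sales": ""},       # missing value
    {"region": "North", "sales": "135"},
    {"region": "East", "sales": "n/a"},     # not a number
    {"region": "South", "sales": "90"},
]


def clean(records):
    """Data cleaning: keep only rows whose sales value parses as a number."""
    cleaned = []
    for record in records:
        try:
            value = float(record["sales"])
        except ValueError:
            continue                         # drop unusable rows
        cleaned.append({"region": record["region"], "sales": value})
    return cleaned


def summarize(records):
    """Exploratory step: a few summary statistics."""
    values = [r["sales"] for r in records]
    return {"count": len(values), "mean": mean(values), "median": median(values)}


cleaned = clean(raw)
print(len(raw) - len(cleaned), "rows dropped")
print(summarize(cleaned))
```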

Types of Data Analysis


Data analysis is mainly divided into four types, depending on the nature of the
data and the questions being addressed.

Figure: Types of Data Analysis
1. Descriptive Analysis
Descriptive analysis helps us understand what happened in the past. It looks at
historical data and summarizes it in a way that makes sense. For example, a
company might use descriptive analysis to see how much they sold last year or to
find out which product was most popular.

2. Diagnostic Analysis
Diagnostic analysis works hand in hand with descriptive analysis. Where
descriptive analysis finds out what happened in the past, diagnostic analysis
finds out why it happened, what measures were taken at the time, and how
frequently it has happened. It helps businesses figure out the reasons behind
certain outcomes.

3. Predictive Analysis
By forecasting future trends based on historical data, predictive analysis
enables organizations to prepare for upcoming opportunities and
challenges. For example, a store might use predictive analysis to figure out what
products will be popular in the upcoming season. It helps businesses prepare for
future events and make plans.

4. Prescriptive Analysis
Prescriptive Analysis is an advanced method that takes Predictive Analysis insights
and gives suggestions on the best actions to take. For example, if predictive
analysis shows that a certain product will be popular, prescriptive analysis might
suggest how much stock to buy or what marketing strategies to use. It’s about
giving businesses clear advice on how to act.
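As a rough illustration of how descriptive, predictive, and prescriptive
analysis build on one another, the sketch below summarizes past sales, fits a
straight-line trend to forecast the next month, and turns the forecast into a
stocking suggestion. The sales figures, the `fit_line` helper, and the 10%
safety margin are assumptions made for the example.

```python
import statistics

# Hypothetical monthly unit sales for one product (months 1 to 6).
months = [1, 2, 3, 4, 5, 6]
units = [100, 110, 120, 130, 140, 150]

# Descriptive: summarize what happened.
total_units = sum(units)
average_units = statistics.mean(units)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Predictive: extrapolate the fitted trend to month 7.
slope, intercept = fit_line(months, units)
forecast_month_7 = slope * 7 + intercept

# Prescriptive: turn the forecast into a suggested action.
SAFETY_MARGIN = 0.10  # assumed buffer on top of the forecast
suggested_stock = round(forecast_month_7 * (1 + SAFETY_MARGIN))

print(total_units, average_units, forecast_month_7, suggested_stock)
```

A straight-line trend is the simplest possible model; real predictive analysis
would usually account for seasonality and uncertainty as well.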


Tools for Data Analysis


Several tools are available to facilitate effective data analysis. These tools can
range from simple spreadsheet applications to complex statistical software. Some
popular tools include:

SAS, used for advanced analytics and predictive modeling;


Microsoft Excel, which is great for simple data manipulation and visualizations;
R, a free language for statistical analysis;
Python, a versatile programming language with libraries for data science;
Tableau Public, for creating interactive data visualizations;
Knime, an open-source platform for data mining and machine learning;
Power BI, a service for creating business intelligence dashboards and reports.

A mobile database is a database that a mobile computing device can connect to
over a mobile network (or other wireless network). Here the client and the
server communicate over wireless connections. Mobile computing is growing very
rapidly and has huge potential in the database field. Mobile databases run on
many different devices, for example Android-based and iOS-based mobile
databases. Common examples of mobile databases are Couchbase Lite and
ObjectBox.

Features of Mobile database :


Here, we will discuss the features of the mobile database as follows.

A cache is maintained to hold frequently used data and transactions so that
they are not lost due to connection failure.
As the use of laptops, mobile phones, and PDAs increases, more and more data
resides on mobile systems.
Mobile databases are physically separate from the central database server.
Mobile databases reside on mobile devices.
Mobile databases are capable of communicating with a central database server or
other mobile clients from remote sites.
With the help of a mobile database, mobile users can keep working when the
wireless connection is poor or even non-existent (disconnected operation).
A mobile database is used to analyze and manipulate data on mobile devices.
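The caching and disconnected-operation features above can be sketched as a
local store that applies each change on the device and queues it until a
connection is available. This is a minimal sketch, not how any particular
mobile database works: the `MobileStore` class, its table names, and the
`send` upload callable are hypothetical, and an in-memory SQLite database
stands in for the on-device file.

```python
import sqlite3


class MobileStore:
    """Local store on the device; queues writes until they can be synced."""

    def __init__(self):
        # On a real device this would be a file, not an in-memory database.
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
        self.db.execute(
            "CREATE TABLE pending (seq INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT)"
        )

    def add_note(self, body):
        # Apply the change locally and record it for later sync,
        # both inside one local transaction.
        with self.db:
            self.db.execute("INSERT INTO notes (body) VALUES (?)", (body,))
            self.db.execute("INSERT INTO pending (body) VALUES (?)", (body,))

    def pending_count(self):
        return self.db.execute("SELECT COUNT(*) FROM pending").fetchone()[0]

    def sync(self, send):
        """Push queued changes through `send`, oldest first."""
        rows = self.db.execute("SELECT seq, body FROM pending ORDER BY seq").fetchall()
        for seq, body in rows:
            send(body)  # if this raises, the row stays queued for the next attempt
            with self.db:
                self.db.execute("DELETE FROM pending WHERE seq = ?", (seq,))
        return len(rows)


store = MobileStore()
store.add_note("meter reading 42")  # works with no connection at all
store.add_note("meter reading 43")

server_log = []                      # stands in for the central database server
synced = store.sync(server_log.append)
print(synced, server_log, store.pending_count())
```

Removing a queued row only after `send` returns means a dropped connection
leaves the change in the queue rather than losing it.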
A mobile database typically involves three parties:

Fixed Hosts -
These perform the transaction and data management functions with the help of
database servers.

Mobile Units -
These are portable computers that move around a geographical region covered by
the cellular network that they use to communicate with base stations.

Base Stations -
These are two-way radio installations in fixed locations that pass
communication between the mobile units and the fixed hosts.
Limitations:
Here, we will discuss the limitations of mobile databases as follows.

It has limited wireless bandwidth.
Wireless communication speed is limited.
It consumes battery power, which is limited on mobile devices.
It is less secure.
It is hard to make theft-proof.

Object Oriented Databases


Object oriented databases are also called Object Database Management Systems
(ODBMS). Object databases store objects rather than data such as integers, strings
or real numbers. Objects are used in object oriented languages such as Smalltalk,
C++, Java, and others. Objects basically consist of the following:
Attributes - Attributes are data which defines the characteristics of an object. This
data may be simple such as integers, strings, and real numbers or it may be a
reference to a complex object.
Methods - Methods define the behavior of an object and are what were formerly
called procedures or functions.
Therefore objects contain both executable code and data. There are other
characteristics of objects such as whether methods or data can be accessed from
outside the object. We don't consider this here, to keep the definition simple and to
apply it to what an object database is. One other term worth mentioning is classes.
Classes are used in object oriented programming to define the data and methods
the object will contain. The class is like a template to the object. The class does not
itself contain data or methods but defines the data and methods contained in the
object. The class is used to create (instantiate) the object. Classes may be used in
object databases to recreate parts of the object that may not actually be stored in
the database. Methods may not be stored in the database and may be recreated by
using a class.
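The idea that only an object's data needs to be stored, while its methods are
recreated from the class, can be sketched as follows. This is an illustrative
sketch only: the `Account` class and the `persist` and `restore` helpers are
hypothetical, and JSON text stands in for whatever storage format a real
object database would use.

```python
import json


class Account:
    """The class is the template: it defines the attributes and the methods."""

    def __init__(self, owner, balance):
        self.owner = owner        # attribute
        self.balance = balance    # attribute

    def deposit(self, amount):    # method (behavior)
        self.balance += amount


def persist(obj):
    """Store only the object's attributes; the methods live in the class."""
    return json.dumps(vars(obj))


def restore(cls, stored):
    """Recreate an object by combining the stored attributes with the class."""
    obj = cls.__new__(cls)
    obj.__dict__.update(json.loads(stored))
    return obj


acct = Account("alice", 100)
acct.deposit(50)

stored = persist(acct)               # contains data only, no code
restored = restore(Account, stored)
restored.deposit(25)                 # the method is available again via the class
print(stored, restored.balance)
```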
Comparison to Relational Databases
Relational databases store data in two-dimensional tables made up of rows and
columns. Relational database tables are "normalized" so data is not repeated
more often than necessary. Every row is identified by a primary key (a column,
or set of columns, whose value is unique for each row). Once a specific row is
identified, data from one or more columns of that row may be obtained or
changed.
To put objects into relational databases, they must be described in terms of simple
string, integer, or real number data. Take an airplane, for instance: the wing
may be placed in one table with rows and columns describing its dimensions and
characteristics, the fuselage in another table, the propeller in another, the
tires in yet another, and so on.
Breaking complex information out into simple data takes time and is labor
intensive. Code must be written to accomplish this task.
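The kind of code this refers to might look like the following sketch, which
disassembles a small object into rows of two tables and reassembles it
afterwards. The `Airplane` and `Wing` classes, the table layout, and the
`save` and `load` helpers are invented for the illustration, and SQLite is
used only because it ships with Python.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Wing:
    span_m: float


@dataclass
class Airplane:
    model: str
    wings: list


def save(conn, plane):
    """Disassemble the object into rows of simple values in two tables."""
    cur = conn.execute("INSERT INTO airplane (model) VALUES (?)", (plane.model,))
    plane_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO wing (airplane_id, span_m) VALUES (?, ?)",
        [(plane_id, w.span_m) for w in plane.wings],
    )
    conn.commit()
    return plane_id


def load(conn, plane_id):
    """Reassemble the object from its rows."""
    (model,) = conn.execute(
        "SELECT model FROM airplane WHERE id = ?", (plane_id,)
    ).fetchone()
    rows = conn.execute(
        "SELECT span_m FROM wing WHERE airplane_id = ? ORDER BY id", (plane_id,)
    ).fetchall()
    return Airplane(model, [Wing(span) for (span,) in rows])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE airplane (id INTEGER PRIMARY KEY, model TEXT)")
conn.execute(
    "CREATE TABLE wing (id INTEGER PRIMARY KEY, "
    "airplane_id INTEGER REFERENCES airplane(id), span_m REAL)"
)

original = Airplane("Trainer", [Wing(5.5), Wing(5.5)])
restored = load(conn, save(conn, original))
print(restored == original)
```

Every additional part type (fuselage, propeller, tires) would need its own
table and its own pair of save and load steps, which is the overhead the text
describes.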
Object Persistence
With traditional databases, data manipulated by the application is transient and
data in the database is persisted (Stored on a permanent storage device). In object
databases, the application can manipulate both transient and persisted data.
When to Use Object Databases
Object databases should be used when there is complex data and/or complex data
relationships. This includes a many to many object relationship. Object databases
should not be used when there would be few join tables and there are large
volumes of simple transactional data.
Object databases work well with:
Computer-aided applications (CASE - computer-aided software engineering, CAD -
computer-aided design, CAM - computer-aided manufacturing)
Multimedia applications
Object projects that change over time
Commerce
Object Database Advantages over RDBMS
Objects don't require assembly and disassembly, which saves both the coding
time and the execution time otherwise spent assembling or disassembling them.
Reduced paging
Easier navigation
Better concurrency control - A hierarchy of objects may be locked.
Data model is based on the real world.
Works well for distributed architectures.
Less code required when applications are object oriented.
Object Database Disadvantages compared to RDBMS
Lower efficiency when data is simple and relationships are simple.
Relational tables are simpler.
Late binding may slow access speed.
More user tools exist for RDBMS.
Standards for RDBMS are more stable.
Support for RDBMS is more certain and change is less likely to be required.
ODBMS Standards
Object Data Management Group
Object Database Standard ODMG 2.0
Object Query Language
OQL support of SQL92
How Data is Stored
Two basic methods are used to store objects by different database vendors.
Each object has a unique ID and is defined as a subclass of a base class, using
inheritance to determine attributes.
Virtual memory mapping is used for object storage and management.

Data transfers are either done on a per object basis or on a per page (normally 4K)
basis.

XML Database
An XML database is used to store large amounts of information in XML format.
As the use of XML is increasing in every field, a secure place is needed to
store XML documents. The data stored in the database can be queried using
XQuery, serialized, and exported into a desired format.
XML Database Types
There are two major types of XML databases:
XML-enabled
Native XML (NXD)
XML-Enabled Database
An XML-enabled database is an extension provided for the conversion of XML
documents. It is a relational database, where data is stored in tables
consisting of rows and columns. The tables contain sets of records, which in
turn consist of fields.
Native XML Database
A native XML database is based on containers rather than a table format. It
can store large amounts of XML documents and data. A native XML database is
queried using XPath expressions.
A native XML database has an advantage over an XML-enabled database: it is
better able to store, query, and maintain XML documents.
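To give a feel for XPath-style querying, the sketch below runs two path
expressions over a small XML document using Python's standard
`xml.etree.ElementTree` module. Note the assumptions: ElementTree supports
only a limited subset of XPath and is an XML parser, not an XML database, and
the `library` document is made up for the example.

```python
import xml.etree.ElementTree as ET

# A small, made-up XML document.
XML_DOC = """
<library>
  <book year="2003"><title>Database Concepts</title></book>
  <book year="2015"><title>XML in Practice</title></book>
</library>
"""

root = ET.fromstring(XML_DOC)

# Path expression with an attribute predicate: books published in 2015.
recent_titles = [
    book.findtext("title") for book in root.findall("./book[@year='2015']")
]

# Path expression matching every <title> element at any depth.
all_titles = [title.text for title in root.findall(".//title")]

print(recent_titles, all_titles)
```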

A multimedia database is a collection of interrelated multimedia data that
includes text, graphics (sketches, drawings), images, animations, video, audio,
etc., and typically holds vast amounts of multisource multimedia data. The
framework that manages different types of multimedia data, which can be stored,
delivered, and utilized in different ways, is known as a multimedia database
management system. There are three classes of multimedia database: static
media, dynamic media, and dimensional media.

Contents of a multimedia database management system:

Media data - The actual data representing an object.
Media format data - Information about the format of the media data after it
goes through the acquisition, processing, and encoding phases, such as sampling
rate, resolution, and encoding scheme.
Media keyword data - Keyword descriptions relating to the generation of the
data. It is also known as content-descriptive data. Example: date, time, and
place of recording.
Media feature data - Content-dependent data such as the distribution of colors,
kinds of texture, and different shapes present in the data.
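One way to picture how the four kinds of content sit side by side is a single
table that stores the raw media as a binary large object next to its format,
keyword, and feature data. The sketch below is an assumption-laden
illustration: the `media` table, its column names, and the tiny 2x2 image are
invented, and SQLite is used only because it is bundled with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE media (
        id             INTEGER PRIMARY KEY,
        data           BLOB,     -- media data: the raw bytes
        encoding       TEXT,     -- media format data
        width          INTEGER,  -- media format data
        height         INTEGER,  -- media format data
        recorded_on    TEXT,     -- media keyword data
        dominant_color TEXT      -- media feature data
    )
    """
)

# A made-up 2x2 all-red image: four RGB pixels of three bytes each.
pixels = bytes([255, 0, 0] * 4)

conn.execute(
    "INSERT INTO media (data, encoding, width, height, recorded_on, dominant_color) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    (pixels, "raw-rgb", 2, 2, "2024-01-15", "red"),
)

# Retrieval goes through the descriptive columns, not the raw bytes.
row = conn.execute(
    "SELECT data, encoding, dominant_color FROM media WHERE recorded_on = ?",
    ("2024-01-15",),
).fetchone()
print(len(row[0]), row[1], row[2])
```

The query never inspects the bytes themselves, which is why the keyword and
feature data matter for retrieval.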

Types of multimedia applications, based on data management characteristics:

Repository applications - A large amount of multimedia data, as well as
metadata (media format data, media keyword data, media feature data), is stored
for retrieval purposes, e.g., repositories of satellite images, engineering
drawings, and radiology scans.
Presentation applications - They involve delivery of multimedia data subject to
temporal constraints. Optimal viewing or listening requires the DBMS to deliver
data at a certain rate, offering quality of service above a certain threshold.
Here data is processed as it is delivered. Example: annotation of video and
audio data, real-time editing analysis.
Collaborative work using multimedia information - It involves executing a
complex task by merging drawings and changing notifications. Example:
intelligent healthcare network.
There are still many challenges to multimedia databases, some of which are:
Modelling - Work in this area can improve database techniques versus
information retrieval techniques; documents constitute a specialized area and
deserve special consideration.
Design - The conceptual, logical, and physical design of multimedia databases
has not yet been addressed fully, as performance and tuning issues at each
level are far more complex. The data comes in a variety of formats, such as
JPEG, GIF, PNG, and MPEG, which are not easy to convert from one form to
another.
Storage - Storing a multimedia database on any standard disk presents problems
of representation, compression, mapping to device hierarchies, archiving, and
buffering during input-output operations. In a DBMS, a "BLOB" (Binary Large
Object) facility allows untyped bitmaps to be stored and retrieved.
Performance - For an application involving video playback or audio-video
synchronization, physical limitations dominate. The use of parallel processing
may alleviate some problems, but such techniques are not yet fully developed.
Apart from this, multimedia databases consume a lot of processing time as well
as bandwidth.
Queries and retrieval - For multimedia data like images, video, and audio,
accessing data through queries opens up many issues, such as efficient query
formulation, query execution, and optimization, which need to be worked upon.
Areas where multimedia databases are applied:

Documents and record management: Industries and businesses that keep detailed
records and a variety of documents. Example: insurance claim records.
Knowledge dissemination: A multimedia database is a very effective tool for
knowledge dissemination, as it provides several resources. Example: electronic
books.
Education and training: Computer-aided learning materials can be designed using
multimedia sources, which are nowadays very popular sources of learning.
Example: digital libraries.
Marketing, advertising, retailing, entertainment, and travel. Example: a
virtual tour of cities.

What is a Web Database?
A web database is a system for storing and displaying information that is
accessible from the Internet. It is a type of web application designed to be managed
and accessed through the Internet.

Web databases are ideal for situations where the information should be shared
or must be accessed from various locations or different devices, such as
tablets, computers, and cell phones. Web databases can be used for a range of
different purposes, including membership databases, client lists, inventory
databases, and more.

In a web database, each field in a table has to have a defined data type, such as
numbers, strings, and dates. Proper database design involves choosing the correct
data type for each field to reduce memory consumption and improve performance.
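A minimal sketch of declaring a data type for each field is shown below for a
hypothetical membership table. The table and column names are invented, and
SQLite is used only for convenience; it applies type affinity rather than
strict typing, so the `NOT NULL` and `CHECK` constraints here do part of the
work that stricter systems leave to the column type alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE member (
        id        INTEGER PRIMARY KEY,
        name      TEXT    NOT NULL,
        joined_on TEXT    NOT NULL,                     -- ISO 8601 date string
        fee_cents INTEGER NOT NULL CHECK (fee_cents >= 0)
    )
    """
)

conn.execute(
    "INSERT INTO member (name, joined_on, fee_cents) VALUES (?, ?, ?)",
    ("Asha", "2024-03-01", 2500),
)


def try_insert_negative_fee():
    """Return True if the invalid row was accepted, False if it was rejected."""
    try:
        conn.execute(
            "INSERT INTO member (name, joined_on, fee_cents) VALUES (?, ?, ?)",
            ("Ben", "2024-03-02", -1),
        )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert_negative_fee()
member_count = conn.execute("SELECT COUNT(*) FROM member").fetchone()[0]
print(accepted, member_count)
```

Storing the fee as an integer number of cents, rather than as text or a
floating-point value, is one example of picking the type that fits the data.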

Web databases enable collected data to be organised and catalogued thoroughly
within hundreds of parameters. They are customisable to an individual's or
business's needs and can be used for various purposes, such as creating website
polls, feedback forms, client or customer and inventory lists, and more.

Web databases can be accessed from anywhere by authorised users, allowing for
sharing and collaboration. Examples of web database software include Microsoft
Office Access, OpenOffice Base, Webex WebOffice database, FormLogix Web
database, and MySQL, which is a relational database management system often
used with web hosting for managing either personal or business website databases.

What are The Types of Web Databases?


Web databases can be categorised into several types based on different criteria,
such as data model, location, design, and hosting. Some of the common types of
web database are:

Data Model Based:


Hierarchical Databases: Data is organised in a tree-like structure, with data linked
based on a common point of linkage.
Network Databases: Similar to hierarchical databases, except that child records
can be associated with multiple parent records.
Object-Oriented Databases: These databases are designed to work well with
object-oriented programming languages such as Java, C++, and Python.
Relational Databases: Data is organised into tables, rows, and columns. SQL is
commonly used to query this type of database.
Non-Relational Databases (NoSQL): These databases are designed to handle
unstructured data and can scale horizontally.
Location-Based:
Centralised Database: The database is stored and maintained in a single location.
Distributed Database: The database is spread across different physical locations for
improved performance and reliability.
Design Based:
Operational (OLTP) Database: These databases are designed for transaction-
oriented applications.
Analytical (OLAP) Database: These databases are designed for data analysis and
reporting.
Hosting Based:
On-Premises Database: The database is hosted on the company’s own servers.
Cloud Database: The database is hosted on a cloud service provider’s platform.

Spatial data support in databases is important for efficiently storing,
indexing, and querying data on the basis of spatial location. For example,
suppose that we want to store a set of polygons in a database and to query the
database to find all polygons that intersect a given polygon. We cannot use
standard index structures, such as B-trees or hash indices, to answer such a
query efficiently. Efficient processing of the above query requires
special-purpose index structures, such as R-trees.
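R-trees work on bounding rectangles: a cheap rectangle test first narrows the
search to candidate polygons, and only those candidates need the exact (and
more expensive) polygon intersection test. The sketch below shows just that
filtering step as a plain linear scan; it is not an R-tree, which would
additionally organize the rectangles hierarchically so that most of them are
never examined. The polygon data and function names are made up.

```python
from typing import NamedTuple


class Box(NamedTuple):
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def bounding_box(polygon):
    """Smallest axis-aligned rectangle enclosing a list of (x, y) vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return Box(min(xs), min(ys), max(xs), max(ys))


def boxes_intersect(a, b):
    """True if the two rectangles overlap or touch."""
    return (
        a.min_x <= b.max_x and b.min_x <= a.max_x
        and a.min_y <= b.max_y and b.min_y <= a.max_y
    )


def candidates(polygons, query):
    """Filter step: names of polygons whose bounding box meets the query's."""
    query_box = bounding_box(query)
    return [
        name for name, poly in polygons.items()
        if boxes_intersect(bounding_box(poly), query_box)
    ]


# Hypothetical stored polygons.
polygons = {
    "park": [(0, 0), (4, 0), (4, 3), (0, 3)],
    "lake": [(10, 10), (12, 10), (11, 13)],
    "plaza": [(3, 2), (6, 2), (6, 5), (3, 5)],
}
query = [(2, 1), (5, 1), (5, 4), (2, 4)]

print(candidates(polygons, query))
```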

Two types of spatial data are particularly important:

Computer-aided design (CAD) data, which include spatial information about how
objects, such as buildings, cars, or aircraft, are constructed. Other important
examples of computer-aided-design databases are integrated-circuit and
electronic-device layouts.

CAD systems traditionally stored data in memory during editing or other
processing, and wrote the data back to a file at the end of an editing session.
The drawbacks of such a scheme include the cost (programming complexity, as
well as time cost) of transforming data from one form to another, and the need
to read in an entire file even if only parts of it are required. For large
designs, such as the design of an entire airplane, it may be impossible to hold
the complete design in memory. Designers of object-oriented databases were
motivated in large part by the database requirements of CAD systems.
Object-oriented databases represent components of the design as objects, and
the connections between the objects indicate how the design is structured.

Geographic data, such as road maps, land-usage maps, topographic elevation
maps, political maps showing boundaries, land-ownership maps, and so on.
Geographic information systems are special-purpose databases for storing
geographic data. Geographic data differ from design data in certain ways. Maps
and satellite images are typical examples of geographic data. Maps may provide
not only location information but also information associated with locations,
such as elevation, soil type, land type, and annual rainfall.

Types of geographical data:

Raster data
Vector data
1. Raster data: Raster data consist of pixels, also known as grid cells, in two
or more dimensions. Examples include satellite images, digital pictures, and
scanned maps.

2. Vector data: Vector data consist of triangles, lines, and various other
geometrical objects in two dimensions, and cylinders, cuboids, and other
polyhedrons in three dimensions. Examples include building boundaries and
roads.

Applications of Spatial databases in DBMS :

Microsoft SQL Server: Spatial data has been supported since the 2008 version of
Microsoft SQL Server.
CouchDB: A document-based database in which spatial data is enabled by a plugin
called GeoCouch.
Neo4j database.

An active database is a database that includes a set of triggers. These
databases are very difficult to maintain because of the complexity that arises
in understanding the effect of these triggers. In such a database, before
executing a statement that modifies the database, the DBMS first checks whether
any trigger specified for that statement is activated. If a trigger is
activated, the DBMS evaluates its condition part and then executes its action
part only if the condition evaluates to true. A single statement may activate
more than one trigger. In such a situation, the DBMS processes the triggers in
an arbitrary order. The execution of the action part of a trigger may activate
other triggers, or even the same trigger that initiated the action. A trigger
that activates itself is called a 'recursive trigger'. The DBMS executes such
chains of triggers in some pre-defined manner, but this makes the behaviour of
the database harder to understand.
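The event, condition, and action parts of a trigger can be seen in the small
sketch below, which uses SQLite through Python's `sqlite3` module. The
`stock` and `reorder_log` tables, the `low_stock` trigger, and the threshold
of 10 are assumptions made for the illustration; trigger syntax and firing
rules differ between database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER);
    CREATE TABLE reorder_log (item TEXT, qty_at_event INTEGER);

    -- Event:     an UPDATE of stock.qty
    -- Condition: the quantity drops from 10 or more to below 10
    -- Action:    record a reorder request
    CREATE TRIGGER low_stock
    AFTER UPDATE OF qty ON stock
    WHEN NEW.qty < 10 AND OLD.qty >= 10
    BEGIN
        INSERT INTO reorder_log VALUES (NEW.item, NEW.qty);
    END;
    """
)

conn.execute("INSERT INTO stock VALUES ('widget', 25)")
conn.execute("UPDATE stock SET qty = 15 WHERE item = 'widget'")  # condition is false
conn.execute("UPDATE stock SET qty = 8 WHERE item = 'widget'")   # condition is true

log = conn.execute("SELECT item, qty_at_event FROM reorder_log").fetchall()
print(log)
```

Both updates raise the event, but only the second satisfies the condition, so
the action runs once.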

Features of Active Database:
1. It possesses all the concepts of a conventional database, i.e. data modelling
facilities, query language, etc.
2. It supports all the functions of a traditional database, like data definition, data
manipulation, storage management, etc.
3. It supports definition and management of ECA (event-condition-action) rules.
4. It detects event occurrences.
5. It must be able to evaluate conditions and to execute actions.
6. In other words, it has to implement rule execution.
Examples of Active Databases:
1. Real-time Databases
2. In-Memory Databases
3. Transactional Databases
4. Time-series Databases
1. Real-time Databases:
• Oracle TimesTen: A relational database that runs in memory and is intended for real-
time applications that need response times of less than one millisecond.
• VoltDB: A fast in-memory database for real-time analytics and data processing.
2. In-Memory Databases:
• SAP HANA: A column-oriented, in-memory relational database management system for
processing large amounts of data and real-time analytics.
• MemSQL: Uses in-memory processing for real-time data insights, combining analytics
and transactions on a single platform.
3. Transactional Databases:
• MySQL Cluster: Offers automatic sharding and synchronous replication for high
availability and real-time data access.
• Microsoft SQL Server with Always On: Provides high availability and disaster
recovery, and enables real-time read access to replicated databases.
4. Time-series Databases:
• InfluxDB: Designed for time-stamped data and for heavy write and query loads. It is
frequently used in IoT and monitoring applications.
• Prometheus: A toolkit for alerting and monitoring that keeps track of time-series data
and is used to analyze and monitor systems in real time.
These databases and platforms support a variety of real-time data handling
requirements, including high-throughput stream processing, low-latency
transaction processing, and event-driven architectures.
Advantages :
1. Enhances traditional database functionalities with powerful rule processing capabilities.
2. Enables a uniform and centralized description of the business rules relevant to the
information system.
3. Avoids redundancy of checking and repair operations.
4. Suitable platform for building large and efficient knowledge base and expert systems.

UNIT- 4

What Does a Transaction Mean in DBMS?

A transaction refers to a sequence of one or more operations (such as read, write, update, or
delete) performed on the database as a single logical unit of work. A transaction ensures that
either all the operations are successfully executed (committed) or none of them take effect
(rolled back). Transactions are designed to maintain the integrity, consistency and reliability of
the database, even in the case of system failures or concurrent access.

All database access operations between the begin-transaction and end-transaction statements
are considered a single logical transaction. During the transaction the database may be
temporarily inconsistent. Only once the transaction is committed does the database move from
one consistent state to another.

Facts about Database Transactions


• A transaction is a program unit whose execution may or may not change the contents of a
database.
• The transaction is executed as a single unit.
• If the database operations do not update the database but only retrieve data, this type of
transaction is called a read-only transaction.
• A successful transaction can change the database from one consistent state to
another.
• DBMS transactions must be atomic, consistent, isolated and durable.
• If the database were in an inconsistent state before a transaction, it would remain in the
inconsistent state after the transaction.

Properties of Transaction
As transactions deal with accessing and modifying the contents of the database, they must have
some basic properties which help maintain the consistency and integrity of the database before
and after the transaction. Transactions follow 4 properties, namely, Atomicity, Consistency,
Isolation, and Durability.
Generally, these are referred to as ACID properties of transactions in DBMS. ACID is the
acronym used for transaction properties. A brief description of each property of the transaction is
as follows.

Atomicity

Atomicity means that a transaction is treated as a single, indivisible unit: either all of its
operations take effect or none of them do. It is achieved through commit and rollback
operations, i.e. changes are made permanent only if all operations related to a transaction are
completed, and if the transaction is interrupted, any changes made are rolled back to bring the
database to its last committed state.
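A minimal sketch of commit and rollback is shown below, using SQLite through
Python's `sqlite3` module. The `account` table, the `transfer` helper, and the
balances are hypothetical. The connection is used as a context manager, which
commits the open transaction if the block finishes normally and rolls it back
if an exception is raised.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates happen or neither."""
    try:
        with conn:  # commit on success, rollback on exception
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM account WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")  # triggers the rollback
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

ok = transfer(conn, "A", "B", 30)       # completes and commits
after_ok = balances(conn)
failed = transfer(conn, "A", "B", 500)  # interrupted and rolled back
after_failed = balances(conn)
print(ok, after_ok, failed, after_failed)
```

In the failed transfer the debit from account A has already been applied when
the error occurs, yet the rollback leaves both balances exactly as they were
after the last commit.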

Consistency
This property of a transaction keeps the database consistent before and after a transaction is
completed.
Execution of any transaction must ensure that after its execution, the database is either in its prior
stable state or a new stable state.
In other words, the result of a transaction should be the transformation of a database from one
consistent state to another consistent state.
Consistency, here means, that the changes made in the database are a result of logical operations
only which the user desired to perform and there is not any ambiguity.

Isolation
This property states that concurrently executing transactions must not interfere with each
other, i.e. if some data is being used by a transaction, then no other transaction may access
that data in a conflicting way until the first transaction has completed. It ensures that the
integrity of the database is maintained and we don't get any ambiguous values. Thus, any two
transactions are isolated from each other.

Example:
Transaction 1: Withdraw $100 from account A.

Durability
This property ensures that the changes made to the database after a transaction is completely
executed, are durable.
It indicates that permanent changes are made by the successful execution of a transaction.
In the event of any system failures or crashes, the consistent state achieved after the completion
of a transaction remains intact. The recovery subsystem of DBMS is responsible for enforcing
this property.
