Module 1
Module 1
Data is a collection of a distinct small unit of information. It can be used in a variety of forms like text, numbers,
media, bytes, etc. it can be stored in pieces of paper or electronic memory, etc.
What is Database?
A database is an organized collection of data, so that it can be easily accessed and managed.
For example: The college Database organizes the data about the admin, staff, students and faculty etc.
o DBMS provides the interface to perform the various operations like creation, deletion, modification, etc.
o DBMS allows the user to create their databases as per their requirement.
o DBMS accepts the request from the application and provides specific data through the operating system.
o DBMS contains the group of programs which acts according to the user instruction.
o It provides security to the database.
Advantage of DBMS
Controls redundancy
It stores all the data in a single database file, so it can control data redundancy.
Data sharing
Backup
• Library Management System − Now-a-days it’s become easy in the Library to track each
book and maintain it because of the database. This happens because there are thousands of
books in the library. It is very difficult to keep a record of all books in a copy or register.
Now DBMS used to maintain all the information related to book issue dates, name of the
book, author and availability of the book.
• Banking − Banking is one of the main applications of databases. We all know there will
be a thousand transactions through banks daily and we are doing this without going to the
bank. This is all possible just because of DBMS that manages all the bank transactions.
• Universities and colleges − Now-a-days examinations are done online. So, the universities
and colleges are maintaining DBMS to store Student’s registrations details, results, courses
and grade all the information in the database. For example, telecommunications. Without
DBMS there is no telecommunication company. DBMS is most useful to these companies
to store the call details and monthly postpaid bills.
.
• Finance − Now-a-days there are lots of things to do with finance like storing sales, holding
information and finance statement management etc. these all can be done with database
systems.
• Military − In military areas the DBMS is playing a vital role. Military keeps records of
soldiers and it has so many files that should be kept secure and safe. DBMS provides a high
security to military information.
• Online Shopping − Now-a-days we all do Online shopping without wasting the time by
going shopping with the help of DBMS. The products are added and sold only with the help
of DBMS like Purchase information, invoice bills and payment.
• Manufacturing − Manufacturing companies make products and sell them on a daily basis.
To keep records of all those details DBMS is used.
• Airline Reservation system − Just like the railway reservation system, airlines also need
DBMS to keep records of flights arrival, departure and delay status.
Purpose of Database systems
The purpose of database systems is to make the database user-friendly and do easy operations. Users can easily insert,
update, and delete. Actually, the main purpose is to have more control of the data.
If there are multiple copies of the same data, it just avoids it. It just maintains data in a single repository. Also, the purpose
of database systems is to make the database consistent.
A database system can easily manage to access data. Through different queries, it can access data from the database.
Data isolation:
Atomicity of updates:
In case of power failure, the database might lose data. So, this feature will automatically prevent data loss.
Concurrent access:
Users can have multiple access to the database at the same time.
Security problems:
Database systems will make the restricted access. So, the data will not be vulnerable.
It can support multiple views of data to give the required view as their needs. Only database admins can have a complete
view of the database. We cannot allow the end-users to have a view of developers.
View of Data
View of data in DBMS narrate how the data is visualized at each level of data abstraction?
Data abstraction allow developers to keep complex data structures away from the users. The
developers achieve this by hiding the complex data structures through levels of abstraction.
There is one more feature that should be kept in mind i.e. the data independence. While
changing the data schema at one level of the database must not modify the data schema at the
next level.
Data Abstraction
Data abstraction is hiding the complex data structure in order to simplify the user’s
interface of the system. It is done because many of the users interacting with the database
system are not that much computer trained to understand the complex data structures of the
database system.
Three-Schema Architecture:
The main objective of this architecture is to have an effective separation between the user
interface and the physical database. So, the user never has to be concerned regarding the
internal storage of the database and it has a simplified interaction with the database system.
The physical or the internal level schema describes how the data is stored in the hardware.
It also describes how the data can be accessed. The physical level shows the data abstraction
at the lowest level and it has complex data structures. Only the database administrator
operates at this level.
2. Logical Level/ Conceptual Level
It is a level above the physical level. Here, the data is stored in the form of the entity
set, entities, their data types, the relationship among the entity sets, user
operations performed to retrieve or modify the data and certain constraints on the data. Well
adding constraints to the view of data adds the security. As users are restricted to access some
particular parts of the database.
It is the developer and database administrator who operates at the logical or the conceptual
level.
We have to create a database of a college. Now, what entity sets would be involved? Student,
Lecturer, Department, Course and so on…
Now, the entity sets Student, Lecturer, Department, Course will be stored in the storage as
the consecutive blocks of the memory location. This is the physical or internal level and is
hidden from the programmers but the database administrator is it aware of it.
At the logical level, the programmers define the entity sets and relationship among these entity
sets using a programming language like SQL. So, the programmers work at the logical level
and even the database administrator also operates at this level.
At the view level, the users have the set of applications which they use to retrieve the data
they are interested in.
Data Independence
Data independence defines the extent to which the data schema can be changed at one level
without modifying the data schema at the next level. Data independence can be classified as
shown below:
Well, the changes to data schema at the logical level are made either to enlarge or reduce the
database by adding or deleting more entities, entity sets, or changing the constraints on data.
As we discussed above the database comprises of several entity sets and the relationship
between them. Now, the data in the database keeps on changing with time. As we keep
inserting or deleting the data to and from the database.
Now, at a particular time if we retrieve any information from the database then that
corresponds to an instance.
What is schema?
Whenever we talk about the database the developers have to deal with the definition of
database and the data in the database.
The definition of a database comprises of the description of what data it would contain what
would be the relationship between the data. This definition is the database schema.
Database languages
Database languages, also known as query languages or data query languages, are a classification of
programming languages that developers use to define and access databases, which are collections of
organized data that users can access electronically
4 categories of database languages
Here are four types of database languages and their uses:
1. Data definition language (DDL)
Data definition language (DDL) creates the framework of the database by specifying the database schema,
which is the structure that represents the organization of data. Its common uses include the creation and
alteration of tables, files, indexes and columns within the database. This language also allows users to
rename or drop the existing database or its components. Here's a list of DDL statements:
• CREATE: Creates a new database or object, such as a table, index or column
• ALTER: Changes the structure of the database or object
• DROP: Deletes the database or existing objects
• RENAME: Renames the database or existing objects
2. Data manipulation language (DML)
Data manipulation language (DML) provides operations that handle user requests, offering a way to access
and manipulate the data that users store within a database. Its common functions include inserting,
updating and retrieving data from the database. Here's a list of DML statements:
• INSERT: Adds new data to the existing database table
• UPDATE: Changes or updates values in the table
• DELETE: Removes records or rows from the table
• SELECT: Retrieves data from the table or multiple tables
3. Data control language (DCL)
Data control language (DCL) controls access to the data that users store within a database. Essentially,
this language controls the rights and permissions of the database system. It allows users to grant or revoke
privileges to the database. Here's a list of DCL statements:
• GRANT: Gives a user access to the database
• REVOKE: Removes a user's access to the database
4. Transaction control language (TCL)
Transaction control language (TCL) manages the transactions within a database. Transactions group a set
of related tasks into a single, executable task. All the tasks must succeed in order for the transaction to
work. Here's a list of TCL statements:
• COMMIT: Carries out a transaction
• ROLLBACK: Restores a transaction if any tasks fail to execute
DBMS Architecture
o The DBMS design depends upon its architecture. The basic client/server architecture is used to
deal with a large number of PCs, web servers, database servers and other components that are
connected with networks.
o The client/server architecture consists of many PCs and a workstation which are connected via the
network.
o DBMS architecture depends upon how users are connected to the database to get their request
done.
1-Tier Architecture
o In this architecture, the database is directly available
to the user. It means the user can directly sit on the
DBMS and uses it.
o Any changes done here will directly be done on the database itself. It doesn't provide a handy
tool for end users.
o The 1-Tier architecture is used for development of the local application, where programmers can
directly communicate with the database for the quick response.
2-Tier Architecture
o The 2-Tier architecture is same as basic client-server. In the two-tier architecture, applications on
the client end can directly communicate with the database at the server side. For this interaction,
API's like: ODBC, JDBC are used.
o The user interfaces and application programs are run on the
client-side.
o The server side is responsible to provide the functionalities
like: query processing and transaction management.
o To communicate with the DBMS, client-side application
establishes a connection with the server side.
3-Tier Architecture
o The 3-Tier architecture contains another layer between the client and server. In this architecture,
client can't directly communicate with the server.
o The application on the client-end interacts with an application
server which further communicates with the database system.
o End user has no idea about the existence of the database beyond
the application server. The database also has no idea about any
other user beyond the application.
o The 3-Tier architecture is used in case of large web application.
DATA MODELS
Data models in DBMS help to understand the design at the conceptual, physical, and logical levels as it
provides a clear picture of the data making it easier for developers to create a physical database.
Data models are used to describe how the data is stored, accessed, and updated in a DBMS.
It defines the data elements and the relationships between the data elements. Data Models
are used to show how data is stored, connected, accessed and updated in the database
management system. The most common types of data models are:-
1. Hierarchical Model
2. Network Model
3. Entity-Relationship Model
4. Relational Model
5. Object-Oriented Data Model
1) Hierarchical Model
Hierarchical Model was the first DBMS model. This model organises the data in the
hierarchical tree structure. The hierarchy starts from the root which has root data and then
it expands in the form of a tree adding child node to the parent node. This model easily
represents some of the real-world relationships like food recipes, sitemap of a website
etc. Example: We can represent the relationship between the shoes present on a shopping
website in the following way:
2) Network Model
This model is an extension of the hierarchical model. It was the most popular model before
the relational model. This model is the same as the hierarchical model, the only difference is
that a record can have more than one parent. It replaces the hierarchical tree with a
graph. Example: In the example below we can see that node student has two parents i.e. CSE
Department and Library. This was earlier not possible in the hierarchical model.
• The data can be accessed faster as compared to the hierarchical model. This is
because the data is more related in the network model and there can be more than
one path to reach a particular node. So the data can be accessed in many ways.
• As there is a parent-child relationship so data integrity is present. Any change in
parent record is reflected in the child record.
Disadvantages of Network Model
• As more and more relationships need to be handled the system might get complex.
So, a user must be having detailed knowledge of the model to work with the model.
• Any change like updation, deletion, insertion is very complex.
3) Entity-Relationship Model
Entity-Relationship Model or simply ER Model is a high-level data model diagram. In this
model, we represent the real-world problem in the pictorial form to make it easy for the
stakeholders to understand. It is also very easy for the developers to understand the system
by just looking at the ER diagram. We use the ER diagram as a visual tool to represent an ER
Model. ER diagram has the following three components:
In the above diagram, the entities are Teacher and Department. The attributes
of Teacher entity are Teacher_Name, Teacher_id, Age, Salary, Mobile_Number. The
attributes of entity Department entity are Dept_id, Dept_name. The two entities are
connected using the relationship. Here, each teacher works for a department.
Advantages of ER Model
4) Relational Model
Relational Model is the most widely used model. In this model, the data is maintained in the
form of a two-dimensional table. All the information is stored in the form of row and
columns. The basic structure of a relational model is tables. So, the tables are also
called relations in the relational model. Example: In this example, we have an Employee
table.
• Tuples : Each row in the table is called tuple. A row contains all the information
about any instance of the object. In the above example, each row has all the
information about any specific individual like the first row has information about
John.
• Attribute or field: Attributes are the property which defines the table or relation. The
values of the attribute should be from the same domain. In the above example, we
have different attributes of the employee like Salary, Mobile_no, etc.
Advnatages of Relational Model
• Simple: This model is more simple as compared to the network and hierarchical
model.
• Scalable: This model can be easily scaled as we can add as many rows and columns we
want.
• Structural Independence: We can make changes in database structure without
changing the way to access the data. When we can make changes to the database
structure without affecting the capability to DBMS to access the data we can say that
structural independence has been achieved.
4)Object-Oriented Data Model
The real-world problems are more closely represented through the object-oriented data
model. In this model, both the data and relationship are present in a single structure known
as an object. We can store audio, video, images, etc in the database which was not possible
in the relational model(although you can store audio and video in relational database, it is
adviced not to store in the relational database). In this model, two are more objects are
connected through links. We use this link to relate one object to other objects. This can be
understood by the example given below.
In the above example, we have two objects Employee and Department. All the data and
relationships of each object are contained as a single unit. The attributes like Name, Job_title
of the employee and the methods which will be performed by that object are stored as a single
object. The two objects are connected through a common attribute i.e the Department_id and
the communication between these two will be done with the help of this common id.
Introduction to Relational Model
The relational model represents how data is stored in Relational Databases. A relational
database stores data in the form of relations (tables). Consider a relation STUDENT with
attributes ROLL_NO, NAME, ADDRESS, PHONE, and AGE shown in.
STUDENT
IMPORTANT TERMINOLOGIES
• Attribute: Attributes are the properties that define a relation. e.g.; ROLL_NO, NAME
• Relation Schema: A relation schema represents the name of the relation with its attributes. e.g.;
STUDENT (ROLL_NO, NAME, ADDRESS, PHONE, and AGE) is the relation schema for
STUDENT. If a schema has more than 1 relation, it is called Relational Schema.
• Tuple: Each row in the relation is known as a tuple. The above relation contains 4 tuples, one of
which is shown as:
• Relation Instance: The set of tuples of a relation at a particular instance of time is called a
relation instance. Table shows the relation instance of STUDENT at a particular time. It can
change whenever there is an insertion, deletion, or update in the database.
• Degree: The number of attributes in the relation is known as the degree of the relation.
The STUDENT relation defined above has degree 5.
• NULL Values: The value which is not known or unavailable is called a NULL value. It is
represented by blank space. e.g.; PHONE of STUDENT having ROLL_NO 4 is NULL
DATABASE SCHEMA
A database schema is a blueprint or architecture of how our data will look. It doesn’t hold
data itself, but instead describes the shape of the data and how it might relate to other tables
or models. An entry in our database will be an instance of the database schema. It will
contain all of the properties described in the schema.
The size and complexity of your database schema depends on the size of your project. The
visual style of a database schema allows programmers to structure the database and its
relationships properly before jumping into the code. The process of planning a database
design is called data modeling..
In many companies, database design and DBMS responsibilities usually fall to the role of
the Database Administrator (DBA). DBAs are responsible for ensuring that data analysts
and database users can easily access information. They work alongside management teams
to plan and securely manage an organization’s database.
There are two main database schema types that define different parts of the
schema: logical and physical.
1. Logical
A logical database schema represents how the data is organized in terms of tables. It also
explains how attributes from tables are linked together. Different schemas use a different
syntax to define the logical architecture and constraints.
2. Physical
The physical database schema represents how data is stored on disk storage. In other words,
it is the actual code that will be used to create the structure of your database.
Suppose you want to create a database to store information for your company’s
accounting department. This database’s schema could outline the structure of two
simple tables:
A) Table1
• Title: Users
• Fields: ID, Full Name, Email, DOB, Dept
B) Table2
These schema tables can then be converted into SQL code by developers and
database administrators.
Let's start by creating a schema called bookstore and a table called genres:
The snippet of code above, besides defining a schema, The table has three columns:
For example, ID is used as a key in the Student table because it is unique for each student. In
the PERSON table, passport_number, license_number, SSN are keys since they are unique for
each person.
1. Candidate Key
2. Primary Key
3. Foreign Key
4. Super Key
5. Alternate Key
6. Composite Key
7. Unique Key
1. Candidate Key
The candidate keys refer to those attributes that identify rows uniquely in a table. In a table, we select
the primary key from a candidate key. Thus, a candidate key has similar properties as that of the primary
keys. In a table, there can be multiple candidate keys.
2. Primary Key
The primary key refers to a column or a set of columns of a table that helps us identify all the records
uniquely present in that table.
A table can consist of just one primary key. Also, this primary key cannot consist of the same values
reappearing/repeating for any of its rows. All the values of a primary key have to be different, and there
should be no repetitions.
3. Super Key
A super key refers to the set of all those keys that help us uniquely identify all the rows present in a
table. It means that all of these columns present in a table that can identify the columns of that table
uniquely act as the super keys.
A super key is a candidate key’s superset (candidate key has been explained below). We need to pick
the primary key of any table from the super key’s set so as to make it the table’s identity attribute.
4. Alternate Key
As we have stated above, any table can consist of multiple choices for the primary key. But, it can only
choose one. Thus, all those keys that did not become a primary key are known as alternate keys.
5. Foreign Key
We use a foreign key to establish relationships between two available tables. The foreign key would
require every value present in a column/set of columns to match the referential table’s primary key. A
foreign key helps us to maintain data as well as referential integrity.
Foreign keys are the column of the table used to point to the primary key of another table.
6. Composite Key
The composite key refers to a set of multiple attributes that help us uniquely identify every tuple present
in a table. The attributes present in a set may not be unique whenever we consider them separately.
Thus, when we take them all together, it will ensure total uniqueness.
7. Unique Key
A unique key refers to a column/a set of columns that identify every record uniquely in a table. All the
values in this key would have to be unique. Remember that a unique key is different from a primary key.
It is because it is only capable of having one null value. A primary key, on the other hand, cannot have a
null value.
Relational Algebra
Relational Algebra is a procedural query language, it is used to provide a single table / relation
as output of performing operations on more than one relations. Some of the basic relations
will be discussed here.
In our course of learning, we will use three relations (table) −
Table 1: course
Course_id Name
1 Computer science
2 Information Technology
3 mechanical
Table 2: students
Roll No. Name address age
1 Ram Delhi 18
2 Raju hyderabad 20
4 Faiz Delhi 22
5 Salman hyderabad 20
Table 3: Hostel
St. No. Name address age
1 Ram Delhi 18
2 Akash hyderabad 20
3 neha Jhansi 21
On this relations, we will perform some operation to make new relation based on operations
performed.
1) Selection operation (σ) − The selection operator denoted by sigma σ is used to select the
tuples of a relation based on some condition. Only those tuples that fall under certain
conditions are selected.
Syntax
σ(condition)(relation_name)
Example
Select the student with course id 1.
σ(course_id = 1)(student)
Result
Roll No. Name address age
4 Faiz Delhi 22
2)Projection operation (∏) The projection operator denoted by ∏ is used to select columns
from a specific reaction. Only specific columns are selected.
Syntax
∏(column1 , column2 , … , columnn)(relation_name)
Example
Let’s select all students's name and no who are in hostel.
∏( st. No. , name)(hostel)
Result
St. No. Name
1 Ram
2 Akash
3 neha
The row are always distinct in projection i.e. if their is any other student whose name is panjak
the other one is removed.
3)Cross Product(X) - Cross product is denoted using the X symbol and is used to find the value
of join of two variables. In cross product each tuple of relation1 is crossed with each tuple of
relation2. Which makes the output relation of the order nXm, where n is the number of tuples
in relation1 and m is the number of tuples in relation2.
Syntax
relation1 X relation2
Example
Let’s find cross product of course and hostel table.
student X course
4)Union (U) - The union of two relations relation1 and relation2 will gives the tuples that are
either in relation1 or in relation2 but tuples that are in both relation1 and relation2 are
considered only once.
Also both relations should be of the same domain for finding there union.
Syntax
relation1 U relation2
Example
Let’s find the union of student and hostel
student U hostel
1 Ram Delhi 18
2 Raju hyderabad 20
4 Faiz Delhi 22
5 Salman hyderabad 20
2 Akash hyderabad 20
3 neha Jhansi 21
5)Minus (-) operator - operator is denoted by - symbol. Relation1 - relation2 will result into a
relation in which the tuple in relation1 and not in relation2 are present. For calculating minus
too, the relations must be union compatible.
Syntax
relation1 - relation2
Example
Let’s find the operation student - hostel
student - hostel
2 Raju hyderabad 20
4 Faiz Delhi 22
5 Salman hyderabad 20
6)rename(ρ) − the rename operation denoted by the ρ is used to rename the given relation to
another name given.
Syntax
ρ(new_name , old_name)