Unit-1
Systems
File-based Definition
• A file processing system is a collection of programs that store and manage files on a computer's hard
disk. A file processing system has more data redundancy than a DBMS and provides less flexibility in
accessing data.
• Each program defines and manages its own data.
Types Of File Processing System:
• Relative-record-number processing
• Consecutive processing.
• Sequential-by-key processing.
• Random-by-key processing.
• Sequential-within-limits processing.
Disadvantages of file processing:
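Concurrent Access: – Accessing the same data in the same file by more than one user at the same
time is called concurrent access. In the file system, concurrent access can lead to incorrect data.
For example, two students want to borrow the last copy of a book from the library at the same
moment: both read the file, both see one copy available, and both are allowed to borrow it.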
Security: – Each file can be password protected. But what if you have to give access to only
a few records in the file? For example, the user has to be given access to view only their bank
account information in the file. This is very difficult in the file system.
Atomicity: – If an insert, update, or delete fails partway in the file system, there is no
mechanism to switch back to the previous state. Imagine that marks for one particular subject
need to be entered into the Report file and then the total needs to be recalculated. If the file
is closed without saving after the new marks are entered, the required transaction as a whole
is not performed.
Database management system (DBMS):
A Database Management System (DBMS) is software designed to store, retrieve, define, and manage data in a
database.
• Definition
• A collection of self-describing and integrated data files
• System catalog
• Meta data
• Data dictionary
• Overhead data
• Data abstraction
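The "self-describing" idea in the list above (system catalog, metadata, data dictionary) can be seen in a small sketch using Python's built-in sqlite3 module. SQLite is only one possible DBMS, and the `employees` table and its columns are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, "
    "first_name TEXT NOT NULL, salary REAL)"
)

# System catalog: SQLite keeps one row per schema object in sqlite_master.
catalog = conn.execute("SELECT type, name FROM sqlite_master").fetchall()

# Metadata about one table: PRAGMA table_info returns
# (cid, name, type, notnull, dflt_value, pk) for each column.
columns = {
    name: (col_type, bool(notnull), bool(pk))
    for _cid, name, col_type, notnull, _default, pk in conn.execute(
        "PRAGMA table_info(employees)"
    )
}

print(catalog)
print(columns)
```

The database answers questions about its own structure with ordinary queries, which is what lets a DBMS check queries against the schema without each program carrying its own file layout.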
Database Management System Facility
A Database Management System is system software for easy, efficient and reliable data processing and
management. It can be used for:
• Creation of a database.
• Retrieval of information from the database.
• Updating the database.
• Managing a database.
• Multiple User Interface
Data scalability, expandability and flexibility: We can change the schema of the database, and all dependent
schemas are updated accordingly. Overall, the time needed to develop an application is reduced.
Security: A DBMS simplifies data protection because security permissions can be assigned to restrict access
to data.
Data organization: A DBMS allows users to organize large amounts of data in a structured and systematic
way. Data is organized into tables, fields, and records, making it easy to manage, store, and retrieve
information.
Data scalability: DBMSs are designed to handle large amounts of data and can scale up to handle increasing
amounts of data and user traffic as an organization grows.
Data Organization and Management:
• One of the primary needs for a DBMS is data organization and management. DBMSs allow data to be stored in a
structured manner, which helps in easier retrieval and analysis. A well-designed database schema enables faster
access to information, reducing the time required to find relevant data. A DBMS also provides features like
indexing and searching, which make it easier to locate specific data within the database. This allows
organizations to manage their data more efficiently and effectively.
Data Security and Privacy:
• DBMSs provide a security framework that helps ensure the confidentiality, integrity, and availability of data.
They offer authentication and authorization features that control access to the database. DBMSs also provide
encryption capabilities to protect sensitive data from unauthorized access. Moreover, these features help
organizations store and manage their data in compliance with data privacy regulations such as the GDPR,
HIPAA, and CCPA.
Data Integrity and Consistency:
• Data integrity and consistency are crucial for any database. DBMSs provide mechanisms that ensure the accuracy
and consistency of data. These mechanisms include constraints, triggers, and stored procedures that enforce
data integrity rules. DBMSs also provide features like transactions that ensure that data changes are atomic,
consistent, isolated, and durable (ACID).
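The atomicity part of ACID can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a full transaction manager: the `marks` and `totals` tables, the `record_marks` function, and the simulated failure are all assumptions made for the example.

```python
import sqlite3

def record_marks(conn, student, subject_marks, fail_midway=False):
    """Insert marks and update the total as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO marks (student, marks) VALUES (?, ?)",
                (student, subject_marks),
            )
            if fail_midway:
                # Hypothetical failure between the two steps.
                raise RuntimeError("simulated crash before the total is updated")
            conn.execute(
                "UPDATE totals SET total = total + ? WHERE student = ?",
                (subject_marks, student),
            )
        return True
    except RuntimeError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marks (student TEXT, marks INTEGER)")
conn.execute("CREATE TABLE totals (student TEXT PRIMARY KEY, total INTEGER)")
conn.execute("INSERT INTO totals VALUES ('asha', 0)")
conn.commit()

ok_failed = record_marks(conn, "asha", 80, fail_midway=True)
rows_after_failure = conn.execute("SELECT COUNT(*) FROM marks").fetchone()[0]

ok_success = record_marks(conn, "asha", 80)
total_after_success = conn.execute(
    "SELECT total FROM totals WHERE student = 'asha'"
).fetchone()[0]
```

After the simulated failure the inserted marks row is rolled back, so the marks and the total never disagree; the second call applies both changes together.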
Concurrent Data Access:
• A DBMS provides a concurrent access mechanism that allows multiple users to access the same data
simultaneously. This is especially important for organizations that require real-time data access. DBMSs use
locking mechanisms to ensure that multiple users can access the same data without causing conflicts or data
corruption.
Data Analysis and Reporting:
• DBMSs provide tools that enable data analysis and reporting. These tools allow organizations to
extract useful insights from their data, enabling better decision-making. DBMSs support various data
analysis techniques such as OLAP, data mining, and machine learning. Moreover, DBMSs provide
features like data visualization and reporting, which enable organizations to present their data in a
visually appealing and understandable way.
Scalability and Flexibility:
• DBMSs provide scalability and flexibility, enabling organizations to handle increasing amounts of data.
DBMSs can be scaled horizontally by adding more servers or vertically by increasing the capacity of
existing servers. This makes it easier for organizations to handle large amounts of data without
compromising performance. Moreover, DBMSs provide flexibility in terms of data modeling, enabling
organizations to adapt their databases to changing business requirements.
Cost-Effectiveness:
• DBMSs are cost-effective compared to traditional file-based systems. They reduce storage costs by
eliminating redundancy and optimizing data storage. They also reduce development costs by
providing tools for database design, maintenance, and administration. Moreover, DBMSs reduce
operational costs by automating routine tasks and providing self-tuning capabilities.
Structure of Database Management System
• In addition to the three schema levels (physical, logical, and view), a DBMS environment also includes a
Database Administrator (DBA), who is responsible for managing the database system. The DBA handles tasks
such as database design, security management, backup and recovery, and performance tuning.
1. Physical Data Independence:
Physical Data Independence is defined as the ability to make changes in the structure of the
lowest level of the Database Management System (DBMS) without affecting the higher-level
schemas. Hence, modification in the Physical level should not result in any changes in the
Logical or View levels.
• There are 3 levels in the schema architecture of DBMS: Physical level, Logical level, and View level
(arranged from the lowest to highest level).
How is Physical Data Independence achieved?
• Physical Data Independence is achieved by modifying the physical layer to logical layer
mapping (PL-LL mapping). We must ensure that the modification we have done is localized.
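As a rough illustration of physical data independence, the sketch below changes only the physical level (it adds an index) and runs the same logical query before and after. It uses Python's built-in sqlite3 module; the `students` table, its rows, and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (stud_no INTEGER, name TEXT, major TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(1, "Asha", "Physics"), (2, "Ravi", "Maths"), (3, "Meera", "Physics")],
)

# The logical-level query: it names tables and columns, not storage structures.
QUERY = "SELECT name FROM students WHERE major = ? ORDER BY name"

before = conn.execute(QUERY, ("Physics",)).fetchall()

# Physical-level change: add an index (a new access structure on disk).
conn.execute("CREATE INDEX idx_students_major ON students (major)")

after = conn.execute(QUERY, ("Physics",)).fetchall()

print(before == after)
```

The query text and its result stay the same; only the way the DBMS locates the rows may change.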
2. Logical Data Independence:
• Logical Data Independence is defined as the ability to make changes in the structure of the
middle level of the Database Management System (DBMS) without affecting the highest-level
schema or application programs. Hence, modification in the logical level should not result in any
changes in the view levels or application programs.
Example –
Changes in the middle level (logical level) include adding new attributes to a relation, deleting
existing attributes of the relation, etc. Ideally, we would not want to change any application
programs that do not use the modified attribute.
How is Logical Data Independence achieved?
Logical Data Independence is achieved by modifying the view layer to logical layer mapping (VL-LL mapping).
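One way to picture logical data independence is with a view that stands between the application and the table. In this sketch (Python's built-in sqlite3 module; the table, view, and function names are hypothetical) a new attribute is added to the relation while the application query stays untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (stud_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO students VALUES (1, 'Asha')")

# View level: the application is written against this view only.
conn.execute("CREATE VIEW student_names AS SELECT stud_no, name FROM students")

def application_query(connection):
    """Stands in for an application program that knows only the view."""
    return connection.execute(
        "SELECT stud_no, name FROM student_names ORDER BY stud_no"
    ).fetchall()

before = application_query(conn)

# Logical-level change: add a new attribute to the relation.
conn.execute("ALTER TABLE students ADD COLUMN major TEXT")
conn.execute("INSERT INTO students VALUES (2, 'Ravi', 'Maths')")

after = application_query(conn)
```

The application still receives two-column rows after the change because the view-to-logical mapping hides the new `major` column from it.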
Basic terminologies of Database
• Entity − An entity is a specific real-world thing or idea that we wish to represent and keep data about. For
instance, students, professors, courses, and departments might all be considered entities in a university database.
• Attribute − An attribute represents a particular quality or trait of an entity. It describes the information
about the entity that we wish to store. A student entity, for instance, may include attributes like a student ID,
name, date of birth, and major.
• Key − A key is an attribute or set of attributes that uniquely identifies an entity instance. Keys are necessary
for data integrity and effective data retrieval. For instance, the student ID may act as the primary key of the
student entity, ensuring that each student has a distinct identifier.
• Table − A table is the core structure a relational database system uses to organize data into rows and columns.
Each table is made up of columns (attributes) and rows (records), and it represents a single entity type. For
instance, a table called "Students" may have columns for student data such as student ID, name, and major.
• Primary Key − A primary key uniquely identifies each row in a table. A single column or a group of columns may
serve as the primary key. The student ID column, for instance, may serve as the primary key of the "Students"
table.
• Foreign Key − A column or group of columns in one table that refers to the primary key of another table is
known as a foreign key. This creates a connection between the two tables. For instance, to link students with the
courses they are registered for, a foreign key in the "Courses" table can reference the primary key of
the "Students" table.
• Relational Database − A relational database is a kind of database system that arranges
information into tables and uses keys to create relationships between those tables. It
provides a systematic and effective method of managing data by adhering to the
fundamentals of the relational model. Popular relational database systems include
PostgreSQL, Oracle, and MySQL.
• Query − Requesting data or information from a database is known as a query. It enables
users to obtain, manipulate, and manage data and is described using a query language
like SQL. For instance, a query may return a list of every student registered for a certain
course.
• Index − An index on a database table is a data structure that accelerates data
retrieval. Based on the indexed column(s), it enables fast access to specific rows.
In the "Students" table, for instance, an index on the student ID column would
speed up searches for students by their ID.
• Normalization − Normalization is the process of arranging data in a database to reduce
duplication and strengthen data integrity. It entails breaking down tables and creating
relationships between the resulting tables.
• ACID − Atomicity, Consistency, Isolation, and Durability, or simply ACID, are properties
that guarantee the reliability and integrity of database transactions. Atomicity ensures that
a transaction is treated as a single unit of work that either completes entirely or has no
effect. Consistency ensures that a transaction moves the database from one valid state to
another. Isolation prevents concurrent transactions from interfering with one another.
Durability ensures that changes made by a committed transaction are permanent and survive
any later system failure.
• Data Warehouse − An organization's data warehouse is a sizable, integrated, and unified
collection of information from numerous sources. It is intended to be used for decision-making,
reporting, and analysis. For the purposes of business intelligence and data analytics, data
warehouses often store historical and aggregated data.
• Data Mining − Finding patterns, trends, and insights from huge databases is referred to as
data mining. In order to extract useful knowledge and information, statistical and machine-
learning approaches are applied. To find hidden patterns and provide predictions or
suggestions based on the data, data mining techniques are utilized.
• Backup and Recovery − Backup and recovery procedures are crucial for guaranteeing data
availability and guarding against data loss. In order to offer a restoration point in the event of
a system failure or data corruption, backup entails making copies of the database at regular
intervals. Recovery entails utilizing the backup copies to restore the database to a consistent
condition.
A database administrator, for instance, may schedule daily backups of a customer database so that client data can be
recovered in the event of hardware failures or unintentional deletions.
• Replication − Replication is the process of making and keeping copies of a database or specific sections of a database on
several servers. It increases fault tolerance, scalability, and data availability. Asynchronous or synchronous replication
guarantees that changes made to one duplicate are replicated to the others.
For instance, in a distributed e-commerce system, product information may be duplicated over several servers to make sure that
users can easily access product details no matter where they are.
• Data Dictionary − A data dictionary, sometimes referred to as a metadata repository, is a central repository for details on
the objects and the schema of a database. It holds metadata such as table and column names, data types, constraints, and
relationships between tables. The DBMS uses the data dictionary to verify queries, uphold data integrity, and provide details
on the database structure.
The "Employees" table, for instance, may be described in the data dictionary together with the names and data types of its
columns, such as "Employee ID," "First Name," "Last Name," and "Salary."
• Database Schema − A database schema outlines the logical organization and structure of a database. It describes the tables,
columns, data types, constraints, and relationships between the tables. The schema provides a blueprint for building and
running the database. A database schema for an online shop, for instance, would have tables like "Books,"
"Authors," and "Orders," each with specific fields, data types, and relationships.
Database system Architecture
• A Database Architecture is a representation of DBMS design. It helps to design, develop, implement, and
maintain the database management system. A DBMS architecture allows dividing the database system into
individual components that can be independently modified, changed, replaced, and altered. It also helps to
understand the components of a database.
• A Database stores critical information and helps access data quickly and securely. Therefore, selecting the
correct Architecture of DBMS helps in easy and efficient data management.
Types of DBMS Architecture
The main types of DBMS architecture are:
• One Tier Architecture (Single Tier Architecture), e.g. billing systems and commission systems
• Two Tier Architecture, e.g. a railway reservation system
• Three Tier Architecture
• N-tier Architecture
1-tier architecture:
• One-tier architecture involves putting all of the required components for a
software application or technology on a single server or platform.
• Simple Architecture: 1-Tier Architecture is the simplest architecture to
set up, as only a single machine is required to maintain it.
• Cost-Effective: No additional hardware is required for implementing 1-Tier
Architecture, which makes it cost-effective.
• Easy to Implement: 1-Tier Architecture can be easily deployed, and hence
it is mostly used in small projects.
• Basically, a one-tier architecture keeps all of the elements of an application, including the interface, middleware and
back-end data, in one place.
2-tier architecture:
• The two-tier architecture is based on the client-server model. Communication takes place directly between
client and server; there is no intermediary between them.
• APIs like ODBC and JDBC are used for this interaction. The server side is
responsible for providing query processing and transaction management
functionalities. On the client side, the user interfaces and application
programs are run. The application on the client side establishes a
connection with the server side to communicate with the DBMS.
• An advantage of this type is that it is easier to maintain and understand,
and it is compatible with existing systems. However, this model gives
poor performance when there are a large number of users.
Advantages of 2-Tier Architecture
• Easy to Access: 2-Tier Architecture gives direct access
to the database, which makes retrieval fast.
• Scalable: We can scale the database easily, by adding
clients or upgrading hardware.
• Low Cost: 2-Tier Architecture is cheaper than 3-Tier
Architecture and Multi-Tier Architecture.
• Easy Deployment: 2-Tier Architecture is easier to
deploy than 3-Tier Architecture.
• Simple: 2-Tier Architecture is easily understandable as
well as simple because of only two components.
3-tier architecture:
• A 3-tier architecture separates its tiers from each other
based on the complexity of the users and how they use
the data present in the database. It is the most widely
used architecture to design a DBMS.
This architecture is used in different ways by different applications. It can be used in web
applications and distributed applications, and it is particularly strong in distributed
systems.
• Database (Data) Tier − At this tier, the database resides along with its query processing
languages. We also have the relations that define the data and their constraints at this level.
• Application (Middle) Tier − At this tier reside the application server and the programs that
access the database. For a user, this application tier presents an abstracted view of the
database. End-users are unaware of any existence of the database beyond the application. At
the other end, the database tier is not aware of any other user beyond the application tier.
Hence, the application layer sits in the middle and acts as a mediator between the end-user
and the database.
• User (Presentation) Tier − End-users operate on this tier and they know nothing about any
existence of the database beyond this layer. At this layer, multiple views of the database can
be provided by the application. All views are generated by applications that reside in the
application tier.
n-tier architecture:
An N-tier architecture divides an application into separate tiers, most commonly three:
1. the logic tier,
2. the presentation tier, and
3. the data tier.
• It is the physical separation of the different parts of the application as opposed to the usually
conceptual or logical separation of the elements in the model-view-controller (MVC)
framework.
• Another difference from the MVC framework is that n-tier layers are connected linearly,
meaning all communication must go through the middle layer, which is the logic tier.
• In MVC, there is no actual middle layer because the interaction is triangular; the control
layer has access to both the view and model layers and the model also accesses the view; the
controller also creates a model based on the requirements and pushes this to the view.
• However, they are not mutually exclusive, as the MVC framework can be used in
conjunction with the n-tier architecture, with the n-tier being the overall architecture used and
MVC used as the framework for the presentation tier.
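The linear layering described above, where the presentation tier talks only to the logic tier and the logic tier talks only to the data tier, can be sketched in Python. The class names and the library-book scenario are hypothetical, and all three tiers run in one process here, whereas a real n-tier deployment separates them physically.

```python
class DataTier:
    """Data tier: the only layer that touches stored data (a dict stands in for a DBMS)."""

    def __init__(self):
        self._books = {1: {"title": "Database Basics", "stock": 2}}

    def get_book(self, book_id):
        return self._books.get(book_id)

    def set_stock(self, book_id, stock):
        self._books[book_id]["stock"] = stock


class LogicTier:
    """Logic tier: business rules; mediates between the other two tiers."""

    def __init__(self, data):
        self._data = data

    def borrow(self, book_id):
        book = self._data.get_book(book_id)
        if book is None or book["stock"] == 0:
            return False
        self._data.set_stock(book_id, book["stock"] - 1)
        return True


class PresentationTier:
    """Presentation tier: formats messages for the user; knows only the logic tier."""

    def __init__(self, logic):
        self._logic = logic

    def borrow_clicked(self, book_id):
        return "Borrowed" if self._logic.borrow(book_id) else "Not available"


ui = PresentationTier(LogicTier(DataTier()))
messages = [ui.borrow_clicked(1) for _ in range(3)]
unknown_book = ui.borrow_clicked(99)
```

Because `PresentationTier` holds no reference to `DataTier`, every request has to pass through the logic tier, which is the "linear" connection the text contrasts with MVC.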
Data Models
• A data model describes the data, its semantics, and its consistency constraints. It provides the
conceptual tools for describing the design of a database at each level of data abstraction. The
following four data models are used for understanding the structure of the database:
1) Relational Data Model: This type of model organizes data in the form of rows and columns within a table. Thus, a
relational model uses tables for representing data and the relationships between them. Tables are also called relations.
This model was initially described by Edgar F. Codd in 1969. The relational data model is the most widely used model
and is primarily used by commercial data processing applications.
2) Entity-Relationship Data Model: An ER model is the logical representation of data as objects and relationships
among them. These objects are known as entities, and a relationship is an association among these entities. This model
was designed by Peter Chen and published in a 1976 paper. It is widely used in database design. A set of attributes
describes each entity. For example, student_name and student_id describe the 'student' entity. A set of the same type of
entities is known as an 'entity set', and a set of the same type of relationships is known as a 'relationship set'.
3) Object-based Data Model: An extension of the ER model with the notions of functions, encapsulation, and object
identity. This model supports a rich type system that includes structured and collection types. In the 1980s,
various database systems following the object-oriented approach were developed. Here, an object is
the data together with its properties.
4) Semi-structured Data Model: This type of data model is different from the other three
data models (explained above). The semi-structured data model allows individual data items
of the same type to have different sets of attributes.
The Extensible Markup Language, also known as XML, is widely used for representing
semi-structured data. Although XML was initially designed for adding markup
information to text documents, it gained importance because of its application in the
exchange of data.
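To make the semi-structured model concrete, here is a small sketch that parses an XML fragment with Python's standard xml.etree.ElementTree module. The `<students>` document and its tags are invented for illustration; the point is only that two items of the same type carry different sets of attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: both items are <student>, but their fields differ.
XML_DATA = """
<students>
  <student id="1">
    <name>Asha</name>
    <phone>555-0101</phone>
  </student>
  <student id="2">
    <name>Ravi</name>
    <email>ravi@example.com</email>
    <major>Maths</major>
  </student>
</students>
""".strip()

root = ET.fromstring(XML_DATA)

# Collect, per student, the names of the fields that student actually has.
attribute_sets = {
    student.get("id"): sorted(child.tag for child in student)
    for student in root.findall("student")
}

print(attribute_sets)
```

A relational table would need one fixed set of columns (with NULLs for missing values), while here each record simply lists what it has.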
ER (Entity Relationship) Diagram in DBMS
• ER model stands for an Entity-Relationship model. It is a high-level data model. This model is used to define the
data elements and relationship for a specified system.
• It develops a conceptual design for the database. It also develops a very simple and easy to design view of data.
• In ER modeling, the database structure is portrayed as a diagram called an entity-relationship diagram.
• For example, suppose we design a school database. In this database, the student will be an entity with
attributes like address, name, id, age, etc. The address can be another entity with attributes like city, street
name, pin code, etc., and there will be a relationship between them.
Component of ER Diagram
1. Entity:
• An entity may be any object, class, person or place. In the ER diagram, an entity is represented by a rectangle.
• Consider an organization as an example: manager, product, employee, department, etc. can be taken as entities.
a. Weak Entity
• An entity that depends on another entity is called a weak entity. The weak entity doesn't contain any key attribute
of its own. The weak entity is represented by a double rectangle.
b. Strong Entity Type
• It is an entity that has its own existence and is independent.
• The entity relationship diagram represents a strong entity type with a single rectangle.
[Figure: ERD of the strong entity type "Customer"]
• In this example, "Customer" is the entity type with attributes such as ID, Name, Gender, and
Phone Number. Customer is a strong entity type as it has a unique ID for each customer.
2. Attribute
• The attribute is used to describe the property of an entity. An ellipse is used to represent an
attribute.
• For example, id, age, contact number, name, etc. can be attributes of a student.
a. Key Attribute
• The key attribute is used to represent the main characteristics of an entity. It represents a
primary key. The key attribute is represented by an ellipse with the text underlined.
b. Composite Attribute
• An attribute that is composed of many other attributes is known as a composite attribute. The
composite attribute is represented by an ellipse, and the ellipses of its component attributes are
connected to that ellipse.
c. Multivalued Attribute
• An attribute can have more than one value. Such an attribute is known as a multivalued attribute.
A double oval is used to represent a multivalued attribute.
• For example, a student can have more than one phone number.
d. Derived Attribute
• An attribute that can be derived from other attributes is known as a derived attribute. It can be represented by a
dashed ellipse.
• For example, a person's age changes over time and can be derived from another attribute like Date of birth.
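The age example can be written out as a short Python sketch. The function name `age_on` and the sample dates are assumptions for illustration; the idea is that only the date of birth is stored and the age is computed when needed.

```python
from datetime import date

def age_on(date_of_birth, today):
    """Derive age in whole years from the stored date of birth."""
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    return today.year - date_of_birth.year - (0 if had_birthday else 1)

dob = date(2000, 6, 15)  # stored attribute
age_before_birthday = age_on(dob, date(2024, 6, 14))  # derived attribute
age_on_birthday = age_on(dob, date(2024, 6, 15))

print(age_before_birthday, age_on_birthday)
```

Storing the age itself would make the record go stale every year, which is why it is modeled as derived rather than stored.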
3. Relationship
A relationship is used to describe the relation between entities. Diamond or rhombus is used to represent the
relationship.
c. Many-to-one relationship
• When more than one instance of the entity on the left and only one instance of the entity on the right are
associated through the relationship, it is known as a many-to-one relationship.
• For example, a student enrolls in only one course, but a course can have many students.
d. Many-to-many relationship
• When more than one instance of the entity on the left and more than one instance of the entity on the right
are associated through the relationship, it is known as a many-to-many relationship.
• For example, an employee can be assigned to many projects, and a project can have many employees.
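In a relational database, a many-to-many relationship such as the employee/project example is commonly stored in a separate linking table. The sketch below uses Python's built-in sqlite3 module; the table names (`employee`, `project`, `assignment`) and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE project  (proj_id INTEGER PRIMARY KEY, title TEXT);
    -- Linking table: one row per (employee, project) assignment.
    CREATE TABLE assignment (
        emp_id  INTEGER REFERENCES employee(emp_id),
        proj_id INTEGER REFERENCES project(proj_id),
        PRIMARY KEY (emp_id, proj_id)
    );
    INSERT INTO employee VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO project  VALUES (10, 'Payroll'), (20, 'Website');
    INSERT INTO assignment VALUES (1, 10), (1, 20), (2, 10);
""")

# One employee, many projects.
projects_of_asha = [row[0] for row in conn.execute(
    "SELECT p.title FROM project p JOIN assignment a ON a.proj_id = p.proj_id "
    "WHERE a.emp_id = 1 ORDER BY p.title")]

# One project, many employees.
employees_on_payroll = [row[0] for row in conn.execute(
    "SELECT e.name FROM employee e JOIN assignment a ON a.emp_id = e.emp_id "
    "WHERE a.proj_id = 10 ORDER BY e.name")]
```

Each side of the relationship is read by joining through `assignment`, so neither `employee` nor `project` needs a repeating column.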
Different Types of Database Keys
• Candidate Key
• Primary Key
• Super Key
• Alternate Key
• Foreign Key
• Composite Key
Candidate Key
The minimal set of attributes that can uniquely identify a tuple is known as a candidate key. For
example, STUD_NO in the STUDENT relation.
• It is a minimal super key, i.e. a super key with no redundant attributes.
• It must contain unique values.
• In SQL, a candidate key that is not chosen as the primary key may contain NULL values.
• Every table must have at least one candidate key.
• A table can have multiple candidate keys but only one primary key.
Example:
STUD_NO is the candidate key for relation STUDENT.
1 001 C001
2 056 C005
Primary Key
There can be more than one candidate key in a relation, out of which one is chosen as the
primary key. For example, STUD_NO as well as STUD_PHONE are candidate keys for the relation
STUDENT, but STUD_NO can be chosen as the primary key (only one out of many candidate keys).
• It is a unique key.
• It identifies exactly one tuple (a record).
• It has no duplicate values.
• It cannot be NULL.
• A primary key need not be a single column; a combination of columns can also be the
primary key of a table.
• Example: in the table STUDENT(STUD_NO, SNAME, ADDRESS, PHONE), STUD_NO is the primary key.
Super Key
The set of attributes that can uniquely identify a tuple is known as a super key. For example, STUD_NO, (STUD_NO,
STUD_NAME), etc. A super key is a set of one or more attributes that identifies rows in a table.
Adding zero or more attributes to a candidate key generates a super key.
Every candidate key is a super key, but not every super key is a candidate key.
Super Key values may also be NULL.
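The relationship between super keys and candidate keys can be checked mechanically on sample data. The sketch below is illustrative only: the rows and the helper names `is_super_key` and `candidate_keys` are assumptions, and a real key is a property of the schema (it must hold for every possible set of rows), not of one sample.

```python
from itertools import combinations

# Hypothetical sample of the STUDENT relation.
ROWS = [
    {"STUD_NO": 1, "STUD_NAME": "Asha", "STUD_PHONE": "555-0101"},
    {"STUD_NO": 2, "STUD_NAME": "Ravi", "STUD_PHONE": "555-0102"},
    {"STUD_NO": 3, "STUD_NAME": "Asha", "STUD_PHONE": "555-0103"},
]

def is_super_key(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projected)) == len(rows)

def candidate_keys(rows):
    """Super keys that have no proper subset which is also a super key."""
    attributes = sorted(rows[0])
    supers = [
        set(combo)
        for size in range(1, len(attributes) + 1)
        for combo in combinations(attributes, size)
        if is_super_key(rows, combo)
    ]
    return [sorted(key) for key in supers if not any(other < key for other in supers)]

keys = candidate_keys(ROWS)
print(keys)
```

Here (STUD_NO, STUD_NAME) is a super key but not a candidate key, because STUD_NO alone already identifies every row.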
Alternate Key
• All the candidate keys which are not chosen as the primary key are called alternate keys.
• It is also called a secondary key.
• Like every candidate key, it identifies each record uniquely.
Eg:- If STUD_NO is chosen as the primary key of STUDENT, STUD_PHONE is an alternate key.
Foreign Key
• A foreign key is an attribute (or set of attributes) in one relation that refers to the primary key
of another relation.
• A foreign key can be NULL and may contain duplicate values, i.e. it need not follow the
uniqueness constraint.
• For example, STUD_NO in the STUDENT_COURSE relation is not unique; the same STUD_NO can
appear in more than one tuple. However, STUD_NO in the STUDENT relation is a primary key, so
it must always be unique and cannot be NULL.
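These foreign key properties can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the column names beyond STUD_NO and the sample rows are invented, and SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE student (stud_no INTEGER PRIMARY KEY, sname TEXT);
    CREATE TABLE student_course (
        stud_no   INTEGER REFERENCES student(stud_no),
        course_no TEXT
    );
    INSERT INTO student VALUES (1, 'Asha'), (2, 'Ravi');
""")

# Duplicate values and NULL are accepted in the foreign key column.
conn.execute("INSERT INTO student_course VALUES (1, 'C001')")
conn.execute("INSERT INTO student_course VALUES (1, 'C005')")
conn.execute("INSERT INTO student_course VALUES (NULL, 'C009')")

# A value with no matching primary key in student is rejected.
try:
    conn.execute("INSERT INTO student_course VALUES (99, 'C001')")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM student_course").fetchone()[0]
```

The foreign key column may repeat or be empty, but any non-NULL value must match an existing STUD_NO in the referenced table.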
Composite Key
Sometimes, a table might not have a single column/attribute that uniquely identifies all the records of a
table. To uniquely identify rows of a table, a combination of two or more columns/attributes can be used. A poorly chosen
combination can still give duplicate values in rare cases, so we need to find a set of attributes that uniquely
identifies every row in the table.
• It acts as a primary key if there is no primary key in a table.
• Two or more attributes are used together to make a composite key.
• Different combinations of attributes may differ in how reliably they identify the rows uniquely.
• Example: FULLNAME + DOB can be combined to access the details of a student.
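The FULLNAME + DOB example can be sketched as a composite primary key using Python's built-in sqlite3 module. The `student` table, the extra `address` column, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (fullname TEXT, dob TEXT, address TEXT, "
    "PRIMARY KEY (fullname, dob))"
)
conn.execute("INSERT INTO student VALUES ('Asha Rao', '2000-06-15', 'Pune')")
# Same name, different date of birth: the combination is still unique.
conn.execute("INSERT INTO student VALUES ('Asha Rao', '2001-01-20', 'Delhi')")

try:
    # Same name AND same date of birth: the composite key rejects the row.
    conn.execute("INSERT INTO student VALUES ('Asha Rao', '2000-06-15', 'Goa')")
    clash_rejected = False
except sqlite3.IntegrityError:
    clash_rejected = True

# Both parts of the key are needed to pick out one student.
address = conn.execute(
    "SELECT address FROM student WHERE fullname = ? AND dob = ?",
    ("Asha Rao", "2001-01-20"),
).fetchone()[0]
```

The rejected third insert also shows the weakness noted above: two different students who share a name and a birth date could not both be stored under this key.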
Extended Entity-Relationship (EE-R) Model
• EER is a high-level data model that incorporates extensions to the original ER model. Enhanced ER diagrams are
high-level models that represent the requirements and complexities of a complex database.
• In addition to ER model concepts EE-R includes −
• Subclasses and Super classes.
• Specialization and Generalization.
• Category or union type.
• Aggregation.
Abstract: The ER diagram of the University management system shows the relationships
between various entities. We can also call it the blueprint of the University
management system.
Tool used: The ER diagram uses a set of standard symbols, which serve as the diagramming tool.
Users: The users of the ER diagram are university admins, applications, software, and
websites.