Unit-1-2
UNIT-I
Issues in File Processing System, Need for DBMS, Basic terminologies
of Database, Database system Architecture, Various Data models, ER
diagram basics and extensions, Case study : Construction of Database
design using Entity Relationship diagram for an application such as
University Database, Banking System, Information System
File-based Definition
• A file processing system is a collection of programs that store and manage files on a computer's hard disk.
A file processing system has more data redundancy than a DBMS, and it provides less flexibility in
accessing data.
• Each program defines and manages its own data.
Types Of File Processing System:
• Relative-record-number processing: Data is accessed using a relative position (record
number) rather than a key. It’s efficient for direct access.
• Consecutive processing: Data is processed in a fixed, sequential order, one record after
another, without skipping.
• Sequential-by-key processing: Records are processed in a sequential order based on
their key values, typically sorted.
• Random-by-key processing: Accessing records randomly based on the key, allowing
immediate retrieval of any record without following a sequence.
• Sequential-within-limits processing: Data is processed in sequence, but only within
specified range limits, useful for subset processing.
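The relative-record-number and consecutive modes above can be sketched in Python. This is only a minimal illustration, assuming fixed-length records packed with the standard struct module; the record layout (a 4-byte id plus a 10-byte name), the sample rows, and the helper names are hypothetical and not part of the notes.

```python
import io
import struct

# Hypothetical fixed-length record: 4-byte unsigned id + 10-byte name.
RECORD = struct.Struct("<I10s")

def write_records(stream, rows):
    """Write rows consecutively, one fixed-length record after another."""
    for rec_id, name in rows:
        stream.write(RECORD.pack(rec_id, name.encode("ascii")))

def read_by_number(stream, n):
    """Relative-record-number access: jump straight to record n (0-based)."""
    stream.seek(n * RECORD.size)
    rec_id, raw = RECORD.unpack(stream.read(RECORD.size))
    return rec_id, raw.rstrip(b"\x00").decode("ascii")

def read_consecutive(stream):
    """Consecutive processing: read every record in stored order."""
    stream.seek(0)
    while chunk := stream.read(RECORD.size):
        rec_id, raw = RECORD.unpack(chunk)
        yield rec_id, raw.rstrip(b"\x00").decode("ascii")

data_file = io.BytesIO()  # stands in for a file on disk
write_records(data_file, [(101, "Asha"), (102, "Ravi"), (103, "Meera")])
third = read_by_number(data_file, 2)
all_rows = list(read_consecutive(data_file))
```

Because every record has the same size, the position of record n is simply n times the record size, which is what makes direct access cheap.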
Disadvantages of file processing:
Data Redundancy
Data Inconsistency
Lack of Data Integration
Difficulty in Access
Poor Data Security
No Support for Multi-User Access
Lack of Backup and Recovery
Limited Query Capabilities
Inefficient Data Handling
Difficulty in Maintaining Relationships
No Standardization
Database management system (DBMS):
A Database Management System (DBMS) is software designed to store, retrieve, define, and manage data in a
database.
• Definition
• A collection of self-describing and integrated data files
• Self-describing: It contains metadata (data about the data), which defines the
structure, relationships, and constraints of the stored data.
• Integrated: Data is centralized, ensuring consistency and eliminating
redundancy.
• System catalog: a collection of system tables maintained by the database
management system (DBMS).
• It acts as a repository for metadata, storing details like table schemas, relationships,
indexes, users, and access permissions.
• It ensures the DBMS can manage and interpret data correctly.
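As an illustration of a self-describing database, the sketch below reads metadata back from SQLite through Python's sqlite3 module. SQLite is used here only as a convenient example DBMS; the student table is an assumption made for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (stud_no INTEGER PRIMARY KEY, sname TEXT NOT NULL)")

# sqlite_master is SQLite's catalog table: one row per table, index, view, etc.
catalog = conn.execute(
    "SELECT type, name, sql FROM sqlite_master WHERE type = 'table'"
).fetchall()

# PRAGMA table_info returns column-level metadata (cid, name, type, ...).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]
```

Other systems expose the same idea under different names, so the catalog queries above should be read as SQLite-specific.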
Database
• Meta data
• Metadata is data that describes other data, often called "data about
data." Examples include the data types of fields, constraints, and relationships
between tables.
• Stored in the data dictionary, metadata helps the DBMS understand how to
handle and query the data.
• Overhead data refers to auxiliary information maintained by the
DBMS to ensure reliability and accountability.
• Logs: Used for recovery in case of system crashes or failures.
• Audit trails: Track user actions to ensure security and compliance.
Data abstraction
• Data abstraction is a key concept in database management systems (DBMS) that
focuses on hiding the complexities of how data is stored, managed, and
maintained from the end-users. It provides a simplified view of the database and
separates the logical structure of data from the physical storage.
• Data abstraction is the process of hiding implementation details from users to
provide a simplified view of the database.
• Goal: To shield users from the complexities of physical data storage and
management.
• Levels of Abstraction:
• Physical Level: How the data is stored physically on the disk (e.g., blocks, files).
• Logical Level: How the data is structured (e.g., tables, columns, and relationships).
• View Level: How the data appears to end-users (e.g., reports, dashboards).
• Abstraction makes it easier for developers, administrators, and end-users to
interact with databases without worrying about underlying complexities.
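The logical and view levels can be illustrated with a small sketch, assuming SQLite via Python's sqlite3 module. The employee table and the employee_public view are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Logical level: the full table definition.
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 52000.0)")

# View level: expose only what a given group of users should see.
conn.execute("CREATE VIEW employee_public AS SELECT id, name FROM employee")

cursor = conn.execute("SELECT * FROM employee_public")
visible_columns = [d[0] for d in cursor.description]
visible_rows = cursor.fetchall()
```

Users of the view never see the salary column, and neither the view nor the table definition says anything about how the rows are laid out on disk (the physical level).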
Database Management System Facility
• Structured query language (SQL)
Data definition language (DDL) – Enables users to define db schema
Data manipulation language (DML) – supports querying and modifying data
• Security system – supports user authentication and authorization
• Integrity system – implements constraints to ensure the accuracy, consistency,
and reliability of data
• Concurrency control system – manages simultaneous access by multiple users
• Backup & recovery system – facilities of periodic data backups and recovery
mechanisms after failures
• View mechanism
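The DDL, DML, and integrity facilities listed above can be seen together in a short sketch, assuming SQLite via Python's sqlite3 module. The account table and its CHECK constraint are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the schema, including an integrity constraint.
conn.execute("""
    CREATE TABLE account (
        acc_no  INTEGER PRIMARY KEY,
        owner   TEXT NOT NULL,
        balance REAL NOT NULL CHECK (balance >= 0)
    )
""")

# DML: insert, update, and query data.
conn.execute("INSERT INTO account VALUES (1, 'Asha', 100.0)")
conn.execute("UPDATE account SET balance = balance + 50 WHERE acc_no = 1")
balance = conn.execute("SELECT balance FROM account WHERE acc_no = 1").fetchone()[0]

# Integrity system: the DBMS rejects a change that violates the CHECK constraint.
try:
    conn.execute("UPDATE account SET balance = -10 WHERE acc_no = 1")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```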
DBMS Environment
• Hardware
• Client-server architecture
• Software
• dbms, os, network, application
• Data
• Schema, subschema, table, attribute
• People
• Data administrator & database administrator
• Database designer: logical & physical
• Application programmer
• End-user: naive & sophisticated
• Procedure
• Start, stop, log on, log off, back up, recovery
DBMS Environment
1. Hardware
• Client-Server Architecture: The physical infrastructure that hosts the DBMS, which includes:
• Servers to host the database.
• Client systems for user interaction.
• Network components for communication.
2. Software
• DBMS Software: The database management system itself.
• Operating System (OS): The system software supporting the DBMS.
• Network Software: Facilitates communication in distributed systems.
• Application Software: Interfaces for interacting with the database (e.g., applications, front-end
tools).
• Data
• Schema: The overall structure of the database, including logical design.
• Subschema: Specific views or parts of the schema for user roles or applications.
• Tables: Collections of rows and columns storing data.
• Attributes: Columns in a table representing data fields.
• People
• Data Administrator: Defines data policies and ensures data integrity.
• Database Administrator (DBA): Manages the DBMS, including maintenance, security, and performance.
• Database Designer:
• Logical Designer: Focuses on the conceptual and logical structure.
• Physical Designer: Optimizes the storage and retrieval mechanisms.
• Application Programmer: Develops applications that interact with the database.
• End-Users:
• Naive Users: Use predefined queries and interfaces.
• Sophisticated Users: Write their own queries and interact directly with the DBMS.
• Procedure
• Start: Initialize the DBMS and its components.
• Stop: Properly shut down the DBMS.
• Log On: Authenticate and access the system.
• Log Off: Securely exit the system.
• Back Up: Create copies of the database for disaster recovery.
• Recovery: Restore the database to a consistent state after a failure.
Advantages of DBMS
• Control redundancy
• Consistency
• Integrity: accuracy, consistency, and reliability
• Security
• Concurrency control
• Backup & recovery
• Data standard
• More information
• Data sharing & conflict control
• Productivity & accessibility
• Economy of scale
• Maintenance
Need for DBMS
A Database Management System is system software for easy, efficient, and reliable data processing and
management. It can be used for:
• Creation of a database.
• Retrieval of information from the database.
• Updating the database.
• Managing a database.
• Providing multiple user interfaces.
1. Data Organization and Management:
One of the primary needs for a DBMS is data organization and management. DBMSs allow data to be stored
in a structured manner, which helps in easier retrieval and analysis. A well-designed database schema
enables faster access to information, reducing the time required to find relevant data. A DBMS also
provides features like indexing and searching, which make it easier to locate specific data within the
database. This allows organizations to manage their data more efficiently and effectively.
2. Data Security and Privacy:
DBMSs provide a security framework that helps protect the confidentiality, integrity, and availability of data.
They offer authentication and authorization features that control access to the database. Many DBMSs also provide
encryption capabilities to protect sensitive data from unauthorized access. These features help organizations store
and manage their data in compliance with data privacy regulations such as the GDPR.
3. Data Integrity and Consistency:
Data integrity and consistency are crucial for any database. DBMSs provide mechanisms that ensure the accuracy
and consistency of data. These mechanisms include constraints, triggers, and stored procedures that enforce data
integrity rules. DBMSs also provide features like transactions that ensure that data changes are atomic, consistent,
isolated, and durable (ACID).
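Atomicity, the "A" in ACID, can be demonstrated with a minimal sketch, assuming SQLite via Python's sqlite3 module. The account table and the transfer helper are hypothetical names for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (acc_no INTEGER PRIMARY KEY, balance REAL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100.0), (2, 20.0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money atomically: both updates are committed, or neither is."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE acc_no = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acc_no = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, 1, 2, 500.0)  # would overdraw account 1, so it must fail
balances = conn.execute("SELECT balance FROM account ORDER BY acc_no").fetchall()
```

The credit to account 2 succeeds first, but the debit violates the CHECK constraint, so the whole transaction is rolled back and both balances stay as they were.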
4. Concurrent Data Access:
A DBMS provides a concurrent access mechanism that allows multiple users to access the same data
simultaneously. This is especially important for organizations that require real-time data access. DBMSs use locking
mechanisms to ensure that multiple users can access the same data without causing conflicts or data corruption.
5. Data Analysis and Reporting:
DBMSs provide tools that enable data analysis and reporting. These tools allow organizations to extract useful
insights from their data, enabling better decision-making. Many DBMSs support or integrate with data analysis
techniques such as OLAP, data mining, and machine learning. Some also provide or connect to data visualization and
reporting tools, which enable organizations to present their data in a visually appealing and understandable way.
6. Scalability and Flexibility:
DBMSs provide scalability and flexibility, enabling organizations to handle increasing amounts of data.
DBMSs can be scaled horizontally by adding more servers or vertically by increasing the capacity of
existing servers. This makes it easier for organizations to handle large amounts of data without
compromising performance. Moreover, DBMSs provide flexibility in terms of data modeling, enabling
organizations to adapt their databases to changing business requirements.
7. Cost-Effectiveness:
DBMSs are cost-effective compared to traditional file-based systems. They reduce storage costs by
eliminating redundancy and optimizing data storage. They also reduce development costs by providing
tools for database design, maintenance, and administration. Moreover, DBMSs reduce operational costs
by automating routine tasks and providing self-tuning capabilities.
Structure of Database Management System
• Database Management System (DBMS) is software that allows access to data stored in a database and provides an
easy and effective method of –
• In addition to the three levels of abstraction (physical, logical, and view), a DBMS environment also includes a
Database Administrator (DBA), who is responsible for managing the database system. The DBA handles tasks such as
database design, security management, backup and recovery, and performance tuning.
1. Physical Data Independence:
Physical data independence is the ability to change the structure of the lowest (physical) level of
the Database Management System (DBMS) without affecting the higher-level schemas. Hence, a
modification at the physical level should not result in any changes at the logical or view levels.
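A rough illustration of physical data independence, assuming SQLite via Python's sqlite3 module: adding an index changes how data is stored and searched, but the table definition and the query stay the same. The student table and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (stud_no INTEGER PRIMARY KEY, sname TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ravi", "Delhi"), (3, "Meera", "Pune")],
)

QUERY = "SELECT sname FROM student WHERE city = 'Pune' ORDER BY stud_no"
before = conn.execute(QUERY).fetchall()

# Physical-level change: add an index. The logical schema and the query are untouched.
conn.execute("CREATE INDEX idx_student_city ON student (city)")
after = conn.execute(QUERY).fetchall()
```

The query returns the same rows before and after the index is created; only the access path may differ.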
• A Database Architecture is a representation of DBMS design. It helps to design, develop, implement, and
maintain the database management system. A DBMS architecture allows dividing the database system into
individual components that can be independently modified, changed, replaced, and altered. It also helps to
understand the components of a database.
• A Database stores critical information and helps access data quickly and securely. Therefore, selecting the
correct Architecture of DBMS helps in easy and efficient data management.
Types of DBMS Architecture
The common types of DBMS architecture are:
• One-Tier Architecture (Single-Tier Architecture), e.g., billing systems and commission systems
• Two-Tier Architecture, e.g., a railway reservation system
• Three-Tier Architecture
• N-Tier Architecture
1-tier architecture:
• One-tier architecture involves putting all of the required components for a
software application or technology on a single server or platform.
• Simple Architecture: 1-Tier Architecture is the simplest architecture to
set up, as only a single machine is required to maintain it.
• Cost-Effective: No additional hardware is required for implementing 1-
Tier Architecture, which makes it cost-effective.
• Easy to Implement: 1-Tier Architecture can be easily deployed, and hence
it is mostly used in small projects.
• In short, a one-tier architecture keeps all of the elements of an application, including the interface, middleware, and
back-end data, in one place.
2-tier architecture:
• The two-tier architecture is based on the client-server model. Communication takes place directly between the
client and the server, with no intermediate layer between them.
• APIs like ODBC and JDBC are used for this interaction. The server side is
responsible for providing query processing and transaction management
functionalities. On the client side, the user interfaces and application
programs are run. The application on the client side establishes a
connection with the server side to communicate with the DBMS.
• An advantage of this type is that it is easier to maintain and understand,
and it is compatible with existing systems. However, this model gives
poor performance when there are a large number of users.
Advantages of 2-Tier Architecture
• Easy to Access: 2-Tier Architecture gives the client direct access
to the database, which makes retrieval fast.
• Scalable: We can scale the database easily, by adding
clients or upgrading hardware.
• Low Cost: 2-Tier Architecture is cheaper than 3-Tier
Architecture and Multi-Tier Architecture.
• Easy Deployment: 2-Tier Architecture is easier to
deploy than 3-Tier Architecture.
• Simple: 2-Tier Architecture is easily understandable as
well as simple because of only two components.
3-tier architecture:
• A 3-tier architecture separates its tiers from each other
based on the complexity of the users and how they use
the data present in the database. It is the most widely
used architecture to design a DBMS.
This architecture is used in different ways by different applications, including web applications and distributed
applications. It is particularly well suited to distributed systems.
• Database (Data) Tier − At this tier, the database resides along with its query processing languages. We also have the
relations that define the data and their constraints at this level.
• Application (Middle) Tier − At this tier reside the application server and the programs that access the database. For a
user, this application tier presents an abstracted view of the database. End-users are unaware of any existence of the
database beyond the application. At the other end, the database tier is not aware of any other user beyond the
application tier. Hence, the application layer sits in the middle and acts as a mediator between the end-user and the
database.
• User (Presentation) Tier − End-users operate on this tier and they know nothing about any existence of the database
beyond this layer. At this layer, multiple views of the database can be provided by the application. All views are
generated by applications that reside in the application tier.
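The separation of the three tiers can be sketched in a single Python file. This is only a conceptual illustration: in a real deployment the tiers would typically run as separate processes or machines, and the class and function names here (DataTier, ApplicationTier, render) are hypothetical.

```python
import sqlite3

class DataTier:
    """Database tier: owns the connection and the SQL."""
    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE student (stud_no INTEGER PRIMARY KEY, sname TEXT)")
        self.conn.execute("INSERT INTO student VALUES (1, 'Asha')")

    def fetch_student(self, stud_no):
        return self.conn.execute(
            "SELECT stud_no, sname FROM student WHERE stud_no = ?", (stud_no,)
        ).fetchone()

class ApplicationTier:
    """Middle tier: business logic; the only tier that talks to the data tier."""
    def __init__(self, data):
        self._data = data

    def student_profile(self, stud_no):
        row = self._data.fetch_student(stud_no)
        if row is None:
            return None
        return {"id": row[0], "name": row[1]}

def render(profile):
    """Presentation tier: formats whatever the application tier returns."""
    if profile is None:
        return "Student not found"
    return f"Student {profile['id']}: {profile['name']}"

app = ApplicationTier(DataTier())
page = render(app.student_profile(1))
missing = render(app.student_profile(99))
```

The presentation code never issues SQL and the data tier never formats output, which mirrors the mediator role of the application tier described above.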
n-tier architecture:
N-tier architecture divides an application into multiple tiers. The most common form uses three tiers:
1. the presentation tier,
2. the logic tier, and
3. the data tier.
• It is the physical separation of the different parts of the application as opposed to the usually
conceptual or logical separation of the elements in the model-view-controller (MVC)
framework.
• Another difference from the MVC framework is that n-tier layers are connected linearly,
meaning all communication must go through the middle layer, which is the logic tier.
• In MVC, there is no actual middle layer because the interaction is triangular; the control
layer has access to both the view and model layers and the model also accesses the view; the
controller also creates a model based on the requirements and pushes this to the view.
• However, they are not mutually exclusive, as the MVC framework can be used in
conjunction with the n-tier architecture, with the n-tier being the overall architecture used and
MVC used as the framework for the presentation tier.
Data Models
• A data model describes the data, its semantics, and its consistency constraints.
It provides the conceptual tools for describing the design of a database at each level
of data abstraction. The following four data models are used for understanding the
structure of a database:
1) Relational Data Model: This model organizes data in the form of rows and columns within a table. Thus, a
relational model uses tables for representing data and the relationships among them. Tables are also called relations.
This model was initially described by Edgar F. Codd in 1969. The relational data model is the most widely used model
and is primarily used by commercial data processing applications.
2) Entity-Relationship Data Model: An ER model is the logical representation of data as objects and relationships
among them. These objects are known as entities, and a relationship is an association among these entities. This model
was designed by Peter Chen and published in a 1976 paper. It is widely used in database design. A set of attributes
describes each entity. For example, student_name and student_id describe the 'student' entity. A set of entities of the
same type is known as an 'entity set', and a set of relationships of the same type is known as a 'relationship set'.
3) Object-based Data Model: An extension of the ER model with notions of functions, encapsulation, and object
identity. This model supports a rich type system that includes structured and collection types. In the 1980s,
various database systems following the object-oriented approach were developed. Here, an object is simply
data together with its properties.
4) Semi-structured Data Model: This type of data model is different from the other three
data models explained above. The semi-structured data model allows data specifications
in which individual data items of the same type may have different sets of attributes.
The Extensible Markup Language (XML) is widely used for representing
semi-structured data. Although XML was initially designed for adding markup
information to text documents, it has gained importance because of its application in the
exchange of data.
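The semi-structured model described in item 4 can be illustrated with a small XML document parsed by Python's standard xml.etree.ElementTree module. The document content is made up for the example; the point is that two items of the same type carry different sets of attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: both items are <student>, but their fields differ.
DOC = """
<students>
  <student id="1"><name>Asha</name><phone>555-0101</phone></student>
  <student id="2"><name>Ravi</name><email>ravi@example.com</email><hobby>chess</hobby></student>
</students>
"""

root = ET.fromstring(DOC)

# Map each student id to the sorted list of fields that student actually has.
fields = {
    s.get("id"): sorted(child.tag for child in s)
    for s in root.findall("student")
}
```

In a relational table both rows would need the same columns, with NULLs for missing values; here each item simply omits what it does not have.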
ER (Entity Relationship) Diagram in DBMS
• ER model stands for an Entity-Relationship model. It is a high-level data model. This model is used to define the
data elements and relationship for a specified system.
• It develops a conceptual design for the database. It also develops a very simple and easy to design view of data.
• In ER modeling, the database structure is portrayed as a diagram called an entity-relationship diagram.
• For example, suppose we design a school database. In this database, the student will be an entity with
attributes like address, name, id, age, etc. The address can be another entity with attributes like city, street
name, pin code, etc., and there will be a relationship between them.
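The school example above can be mirrored in code as a rough sketch, assuming Python dataclasses as a stand-in for entities. The field names follow the attributes named in the text; the sample values are invented.

```python
from dataclasses import dataclass

@dataclass
class Address:          # entity with its own attributes
    city: str
    street_name: str
    pin_code: str

@dataclass
class Student:          # entity
    id: int
    name: str
    age: int
    address: Address    # the relationship between Student and Address

student = Student(
    id=1,
    name="Asha",
    age=20,
    address=Address(city="Pune", street_name="MG Road", pin_code="411001"),
)
```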
Components of ER Diagram
1. Entity:
• An entity may be any object, class, person, or place. In the ER diagram, an entity is represented by a rectangle.
• Consider an organization as an example: manager, product, employee, department, etc. can be taken as entities.
a. Weak Entity
• An entity that depends on another entity is called a weak entity. The weak entity doesn't contain any key attribute
of its own. The weak entity is represented by a double rectangle.
Strong Entity Type:
• It is an entity that has its own existence and is independent.
• The entity relationship diagram represents a strong entity type with the help of a single rectangle.
• For example, "Customer" is an entity type with attributes such as ID, Name, Gender, and
Phone Number. Customer is a strong entity type as it has a unique ID for each customer.
2. Attribute
• An attribute is used to describe a property of an entity. An ellipse is used to represent an attribute.
• For example, id, age, contact number, name, etc. can be attributes of a student.
a. Key Attribute
• The key attribute is used to represent the main characteristics of an entity. It represents a
primary key. The key attribute is represented by an ellipse with the text underlined.
b. Composite Attribute
• An attribute that is composed of many other attributes is known as a composite attribute. The composite
attribute is represented by an ellipse, and its component attributes are shown as ellipses connected to it.
c. Multivalued Attribute
• An attribute can have more than one value. Such attributes are known as multivalued attributes.
A double oval is used to represent a multivalued attribute.
• For example, a student can have more than one phone number.
d. Derived Attribute
• An attribute that can be derived from other attributes is known as a derived attribute. It is represented by a
dashed ellipse.
• For example, a person's age changes over time and can be derived from another attribute like date of birth.
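The age example can be written as a small function. This is a minimal sketch, assuming age means completed years; the function name age_on is hypothetical. Only the date of birth is stored, and the age is computed when needed.

```python
from datetime import date

def age_on(date_of_birth, today):
    """Derive age in completed years from the stored date of birth."""
    years = today.year - date_of_birth.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years

age_before_birthday = age_on(date(2000, 6, 15), date(2024, 6, 14))
age_on_birthday = age_on(date(2000, 6, 15), date(2024, 6, 15))
```

Storing the age directly would make it go stale, which is why it is modeled as a derived attribute.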
3. Relationship
A relationship is used to describe the relation between entities. A diamond (rhombus) is used to represent a
relationship.
c. Many-to-one relationship
• When more than one instance of the entity on the left and only one instance of the entity on the right are
associated with the relationship, it is known as a many-to-one relationship.
• For example, a student enrolls in only one course, but a course can have many students.
d. Many-to-many relationship
• When more than one instance of the entity on the left and more than one instance of the entity on the right
are associated with the relationship, it is known as a many-to-many relationship.
• For example, an employee can be assigned to many projects, and a project can have many employees.
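When an ER design is turned into tables, a many-to-many relationship is commonly stored in a separate junction table. The sketch below shows this for the employee and project example, assuming SQLite via Python's sqlite3 module. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, ename TEXT);
    CREATE TABLE project  (proj_id INTEGER PRIMARY KEY, pname TEXT);
    -- Junction table: one row per (employee, project) assignment.
    CREATE TABLE assignment (
        emp_id  INTEGER REFERENCES employee(emp_id),
        proj_id INTEGER REFERENCES project(proj_id),
        PRIMARY KEY (emp_id, proj_id)
    );
    INSERT INTO employee VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO project  VALUES (10, 'Payroll'), (20, 'Website');
    INSERT INTO assignment VALUES (1, 10), (1, 20), (2, 10);
""")

# One employee, many projects.
projects_of_asha = [r[0] for r in conn.execute(
    "SELECT p.pname FROM project p JOIN assignment a ON a.proj_id = p.proj_id "
    "WHERE a.emp_id = 1 ORDER BY p.pname")]

# One project, many employees.
staff_on_payroll = [r[0] for r in conn.execute(
    "SELECT e.ename FROM employee e JOIN assignment a ON a.emp_id = e.emp_id "
    "WHERE a.proj_id = 10 ORDER BY e.ename")]
```

A many-to-one relationship needs no junction table; a foreign key column on the "many" side is enough.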
Different Types of Database Keys
• Candidate Key
• Primary Key
• Super Key
• Alternate Key
• Foreign Key
• Composite Key
Candidate Key
The minimal set of attributes that can uniquely identify a tuple is known as a candidate key. For example,
STUD_NO in the STUDENT relation.
• It is a minimal super key: no attribute can be removed from it without losing uniqueness.
• It must contain unique values.
• Unlike a primary key, a candidate key that is not chosen as the primary key may contain NULL values in many
SQL systems.
• Every table must have at least one candidate key.
• A table can have multiple candidate keys but only one primary key.
Example:
STUD_NO is the candidate key for relation STUDENT.
1 001 C001
2 056 C005
Primary Key
There can be more than one candidate key in a relation, out of which one is chosen as the primary key. For
example, STUD_NO as well as STUD_PHONE are candidate keys for the relation STUDENT, but STUD_NO can
be chosen as the primary key (only one out of many candidate keys).
• It is a unique key.
• It identifies exactly one tuple (record) at a time.
• It has no duplicate values.
• It cannot be NULL.
• A primary key need not be a single column; more than one column together can also form the primary key of a
table.
• Example: in the STUDENT table Student(STUD_NO, SNAME, ADDRESS, PHONE), STUD_NO is the primary key.
Table STUDENT
Super Key
The set of attributes that can uniquely identify a tuple is known as a super key. For example, STUD_NO, (STUD_NO,
STUD_NAME), etc. A super key is a set of one or more attributes that together identify rows in a table.
Adding zero or more attributes to a candidate key generates a super key.
Every candidate key is a super key, but the converse is not true.
Super key values may also be NULL.
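The relationship between super keys and candidate keys can be checked mechanically on sample data. The sketch below is illustrative only: the rows are hypothetical, and a real key is a rule about every possible table state, so sample rows can only show that an attribute set fails to be a key, never prove that it is one.

```python
from itertools import combinations

ROWS = [  # hypothetical STUDENT rows
    {"STUD_NO": 1, "STUD_NAME": "Asha", "STUD_PHONE": "555-0101"},
    {"STUD_NO": 2, "STUD_NAME": "Ravi", "STUD_PHONE": "555-0102"},
    {"STUD_NO": 3, "STUD_NAME": "Asha", "STUD_PHONE": "555-0103"},
]

def is_super_key(attrs, rows):
    """True if no two rows agree on every attribute in attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

def super_keys(rows):
    """All attribute sets that identify every sample row uniquely."""
    names = sorted(rows[0])
    return [
        set(combo)
        for size in range(1, len(names) + 1)
        for combo in combinations(names, size)
        if is_super_key(combo, rows)
    ]

def candidate_keys(rows):
    """Candidate keys are the super keys with no smaller super key inside them."""
    supers = super_keys(rows)
    return [k for k in supers if not any(other < k for other in supers)]

supers = super_keys(ROWS)
cands = candidate_keys(ROWS)
```

Here (STUD_NO, STUD_NAME) is a super key but not a candidate key, because STUD_NO alone is already sufficient.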
Alternate Key
• A candidate key that is not chosen as the primary key is called an alternate key.
• It is also called a secondary key.
• Like every candidate key, it must identify records uniquely.
E.g., if STUD_NO is chosen as the primary key of STUDENT, STUD_PHONE is an alternate key.
Foreign Key
• A foreign key is an attribute in one relation that refers to the primary key of another relation.
Example:
1 005 C001
2 056 C005
• A foreign key can be NULL and may contain duplicate values, i.e., it need not follow the uniqueness
constraint.
• For example, STUD_NO in the STUDENT_COURSE relation is not unique. It has been repeated for the first
and third tuples. However, the STUD_NO in the STUDENT relation is a primary key, so it must always be
unique and cannot be null.
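The foreign key behaviour described above can be tried out in a short sketch, assuming SQLite via Python's sqlite3 module. The column layout of student_course is a simplified assumption, not the table from the notes, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE student (stud_no INTEGER PRIMARY KEY, sname TEXT);
    CREATE TABLE student_course (
        stud_no   INTEGER REFERENCES student(stud_no),
        course_no TEXT
    );
    INSERT INTO student VALUES (1, 'Asha'), (2, 'Ravi');
    -- Duplicate foreign key values are allowed: student 1 appears twice.
    INSERT INTO student_course VALUES (1, 'C001'), (2, 'C005'), (1, 'C002');
    -- A NULL foreign key is allowed as well.
    INSERT INTO student_course VALUES (NULL, 'C009');
""")

# A value with no matching primary key in STUDENT is rejected.
try:
    conn.execute("INSERT INTO student_course VALUES (99, 'C001')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM student_course").fetchone()[0]
```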
Composite Key
Sometimes, a table might not have a single column/attribute that uniquely identifies all the records of a
table. To uniquely identify rows of a table, a combination of two or more columns/attributes can be used. A given
combination can still produce duplicate values in rare cases, so we need to find a set of attributes that uniquely
identifies every row in the table.
• It acts as a primary key if there is no primary key in a table.
• Two or more attributes are used together to make a composite key.
• Different combinations of attributes may give different accuracy in terms of identifying the rows uniquely.
• Example: FULLNAME + DOB can be combined together to access the details of a student.
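The FULLNAME + DOB example can be declared as a composite primary key. This is a minimal sketch, assuming SQLite via Python's sqlite3 module; the city column and the sample rows are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        fullname TEXT NOT NULL,
        dob      TEXT NOT NULL,
        city     TEXT,
        PRIMARY KEY (fullname, dob)  -- composite key: both columns together
    )
""")
conn.execute("INSERT INTO student VALUES ('Asha Rao', '2004-03-01', 'Pune')")
# Same name but a different date of birth: the combination is still unique.
conn.execute("INSERT INTO student VALUES ('Asha Rao', '2005-07-19', 'Delhi')")

# Same name and same date of birth: the combination repeats, so it is rejected.
try:
    conn.execute("INSERT INTO student VALUES ('Asha Rao', '2004-03-01', 'Mumbai')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

stored = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

The rejected insert also shows the limitation noted above: two different students with the same name and birth date could not both be stored under this key.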
Extended Entity-Relationship (EE-R) Model
• EER is a high-level data model that incorporates extensions to the original ER model. Enhanced ER diagrams are
high-level models that represent the requirements and complexities of complex databases.
• In addition to ER model concepts EE-R includes −
Abstract: The ER diagram of the University management system shows the relationships
between its various entities. It can also be regarded as the blueprint of the University
management system.
Tool used: The ER diagram uses a set of standard symbols as its diagramming tool.
Users: The users of the ER diagram are the university admin, applications, software, and
websites.