DBMS-UNIT-1
UNIT-I
Introduction: Database system, Characteristics (Database Vs File System), Database Users,
Advantages of Database systems, Database applications. Brief introduction of different Data Models;
Concepts of Schema, Instance and data independence; Three tier schema architecture for data
independence; Database system structure, environment, Centralized and Client Server architecture
for the database.
Entity Relationship Model: Introduction, Representation of entities, attributes, entity set, relationship,
relationship set, constraints, sub classes, super class, inheritance, specialization, generalization using
ER Diagrams.
==============================================================
Introduction
Data
✓ Data is a collection of known facts and figures that can be recorded and from which useful information is
derived.
✓ Data is a plural word; its singular is datum.
Types of Data
✓ Text or numeric values: name, address, phone no
✓ Images
✓ Videos
✓ Speech
Database
✓ A database is a collection of related data.
Database Management System (DBMS): a collection of programs (software) that enables users to create and
maintain a database.
The DBMS is a general-purpose software system that facilitates the processes of defining, constructing,
manipulating, and sharing databases among various users and applications.
Definition: specifying data types (and other constraints to which the data must conform) and data
organization
Construction: the process of storing the data on some medium (e.g., magnetic disk) that is controlled
by the DBMS
Manipulation: querying, updating, and report generation
Sharing: allowing multiple users and programs to access the database simultaneously
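The first three activities (definition, construction, manipulation) can be illustrated with Python's built-in sqlite3 module. This is a minimal sketch, assuming SQLite as a stand-in for a general DBMS; the student table and its rows are hypothetical. Sharing among concurrent users is not shown.

```python
import sqlite3

# An in-memory SQLite database stands in for a general-purpose DBMS.
conn = sqlite3.connect(":memory:")

# Definition: declare data types and constraints (recorded in the DBMS catalog).
conn.execute("""
    CREATE TABLE student (
        roll_no INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        phone   TEXT
    )
""")

# Construction: store data on a medium controlled by the DBMS.
conn.executemany(
    "INSERT INTO student (roll_no, name, phone) VALUES (?, ?, ?)",
    [(1, "Asha", "9000000001"), (2, "Ravi", "9000000002")],
)
conn.commit()

# Manipulation: update a row, then query the table.
conn.execute("UPDATE student SET phone = ? WHERE roll_no = ?", ("9000000009", 2))
rows = conn.execute("SELECT roll_no, name, phone FROM student ORDER BY roll_no").fetchall()
for row in rows:
    print(row)
```

Because the NOT NULL constraint was declared at definition time, the DBMS itself rejects a row without a name; the application does not have to check it.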
Examples of Databases
✓ Traditional databases, in which most of the information that is stored and accessed is either textual or
numeric
✓ Multimedia databases, which store images, audio clips, and video streams digitally
✓ Geographic information systems (GIS), which store and analyze maps, weather data, and satellite images
✓ Data warehouses and online analytical processing (OLAP) systems, which are used in many companies to
extract and analyze useful business information from very large databases to support decision making
DATABASE SYSTEM
✓ The database and the DBMS software together are called a database system.
An application program accesses the database by sending queries or requests for data to the DBMS.
A query typically causes some data to be retrieved.
A transaction may cause some data to be read and some data to be written into the database.
DISADVANTAGES OF DBMS:
Complexity: A DBMS fulfils many requirements and solves many problems related to managing data, but all this
functionality makes it extremely complex software. Developers, designers, DBAs, and end users of the
database must have the right skills to use it properly. If they do not understand this complex system,
it may cause loss of data or database failure.
Size: Because of its many functions, a DBMS is large software that requires a lot of disk space and memory to run
efficiently, and it grows further as data is fed into it.
Cost of DBMS: A DBMS requires a high initial investment in hardware, software, and trained staff.
Additional hardware costs: A DBMS requires disk storage for the data, and you may need to purchase
extra storage space. You may also need a dedicated machine for better database performance.
These machines and storage space add extra hardware costs.
Performance: A traditional file system works well for small organizations and gives good
performance. For small-scale applications, a DBMS can be slower because of the overhead of its general-purpose features.
Higher impact of a failure: In a DBMS, all the data is stored in a single database, so a failure of that database
affects everything at once. An accidental failure of any component may cause loss of valuable data. This is a serious
concern for big firms.
View of data:
3-SCHEMA ARCHITECTURE FOR DATA INDEPENDENCE / 3-LEVEL ARCHITECTURE / DATA ABSTRACTION:
A database system is a collection of interrelated files and a set of programs that allow users to access
and modify these files. A major purpose of a database system is to provide users with an abstract view of the
data. That is, the system hides certain details of how the data are stored and maintained.
For the system to be usable, it must retrieve data efficiently. Since many database-system users are not
computer trained, developers hide the complexity from users through several levels of abstraction, to simplify
users' interactions with the system:
• Physical level (Internal level): The lowest level of abstraction describes how the data are actually stored. The
physical level describes complex low-level data structures in detail. The internal schema, at the internal level, is used
to describe physical storage structures and access paths (e.g., indexes).
• Logical level (Conceptual level): The next-higher level of abstraction describes what data are stored in the
database, and what relationships exist among those data. The logical level thus describes the entire database in
terms of a small number of relatively simple structures. Although implementation of the simple structures at the
logical level may involve complex physical-level structures, the user of the logical level does not need to be aware
of this complexity. Database administrators, who must decide what information to keep in the database, use the
logical level of abstraction. The conceptual schema, at the conceptual level, is used to describe the structure and
constraints of the whole database for a community of users.
• View level (External level): The highest level of abstraction describes only part of the entire database. Even
though the logical level uses simpler structures, complexity remains because of the variety of information stored
in a large database. Many users of the database system do not need all this information; instead, they need to
access only a part of the database. The view level of abstraction exists to simplify their interaction with the
system. The system may provide many views for the same database. External schemas, at the external level, are used to
describe the various user views.
The processes of transforming requests and results between levels are called mappings.
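The three levels can be mapped onto ordinary SQL objects. This is a minimal sketch, assuming SQLite (via Python's sqlite3) as the DBMS: the base table plays the role of the conceptual schema, an index is an internal-level access path, and a view is one external schema. The names employee, idx_employee_dept, and employee_directory are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the logical structure of the whole database.
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)"
)

# Internal level: a physical access path that users never name in their queries.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")

# External level: a view exposing only part of the database (salary is hidden).
conn.execute("CREATE VIEW employee_directory AS SELECT name, dept FROM employee")

conn.executemany(
    "INSERT INTO employee (emp_id, name, dept, salary) VALUES (?, ?, ?, ?)",
    [(1, "Meena", "Sales", 52000.0), (2, "Kiran", "IT", 61000.0)],
)

# A view-level request; the DBMS maps it onto the conceptual and internal levels.
directory = conn.execute(
    "SELECT name, dept FROM employee_directory ORDER BY name"
).fetchall()
print(directory)
```

The user of employee_directory sees only names and departments and never needs to know that an index exists underneath.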
DATA INDEPENDENCE:
The three-schema architecture can be used to further explain the concept of data independence.
✓ Data independence can be defined as the capacity to change the schema at one level of a database system
without having to change the schema at the next higher level.
✓ Two types of data independence:
1. Logical data independence is the capacity to change the conceptual schema without having to change
external schemas or application programs.
2. Physical data independence is the capacity to change the internal schema without having to change the
conceptual schema. Hence the external schemas need not be changed as well.
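Both kinds of data independence can be observed with a small experiment. This is a minimal sketch, assuming SQLite as the DBMS; the account table, the account_summary view, and the application_query function are hypothetical. The "application" depends only on the view (external schema), so it keeps working after the internal schema changes (a new index) and after the conceptual schema changes (a new column).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, owner TEXT, balance REAL)")
conn.execute("CREATE VIEW account_summary AS SELECT acc_no, balance FROM account")
conn.execute("INSERT INTO account VALUES (101, 'Asha', 500.0)")


def application_query(conn):
    """A stand-in application program that only uses the external schema."""
    return conn.execute("SELECT acc_no, balance FROM account_summary").fetchall()


before = application_query(conn)

# Physical data independence: change the internal schema (add an access path).
conn.execute("CREATE INDEX idx_account_owner ON account (owner)")
after_physical = application_query(conn)

# Logical data independence: change the conceptual schema (add an attribute).
conn.execute("ALTER TABLE account ADD COLUMN branch TEXT")
after_logical = application_query(conn)

print(before, after_physical, after_logical)
```

Logical data independence is generally harder to achieve than physical: dropping or renaming a column the view uses would break the view.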
Database Model
Data model: a collection of conceptual tools for describing data, data relationships, data semantics, and
consistency constraints.
The data models are used to describe the design of the database at the physical, logical and view levels.
A data model visually represents the nature of data, the business rules governing the data, and how the data will be
organized in the database.
1. Relational Model
The relational model uses a collection of tables to represent both data and the relationships among those
data. Each table has multiple columns, and each column has a unique name. The following table presents a
sample relational database comprising the details of bank customers.
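In the relational model, even the relationship between customers and accounts is just another table of values. The sketch below is a minimal illustration in plain Python, assuming hypothetical customer, account, and depositor relations (not the sample data from the figure). Each relation is a set of tuples, and related rows are matched through shared column values.

```python
# Each relation is a set of rows; each row has one value per named column.
customer_columns = ("cust_id", "name", "city")
customer = {
    ("C1", "Johnson", "Palo Alto"),
    ("C2", "Smith", "Rye"),
}

account_columns = ("acc_no", "balance")
account = {
    ("A-101", 500),
    ("A-215", 700),
}

# The customer-account relationship is itself stored as a table.
depositor_columns = ("cust_id", "acc_no")
depositor = {
    ("C1", "A-101"),
    ("C2", "A-215"),
    ("C1", "A-215"),  # a joint account
}


def customer_accounts():
    """Join the three relations on their shared columns: (name, acc_no, balance)."""
    names = {cid: name for cid, name, _city in customer}
    balances = {acc_no: bal for acc_no, bal in account}
    return sorted((names[cid], acc_no, balances[acc_no]) for cid, acc_no in depositor)


for row in customer_accounts():
    print(row)
```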
2. The Entity-Relationship Model
The entity-relationship (E-R) data model is based on a perception of a real world that consists of a collection
of basic objects, called entities, and of relationships among these objects. An entity is a “thing” or “object” in
the real world that is distinguishable from other objects. For example, each person is an entity, and bank accounts
can be considered as entities. A sample E-R diagram is shown below.
4. Hierarchical Model
A hierarchical database is a kind of DBMS that links records together in a tree data structure such that each
record has only one owner (parent). Ex: An order is owned by only one customer.
Hierarchical structures were widely used in the first mainframe DBMSs.
Example-1:
Example-2:
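The single-owner rule of the hierarchical model can be sketched as a tree. This is a minimal illustration, assuming a hypothetical Record class and customer/order/item records; it enforces that a record can be attached to at most one parent and reads the data by walking the tree from the root.

```python
class Record:
    """A record in a hierarchical database: exactly one owner, any number of children."""

    def __init__(self, record_type, key):
        self.record_type = record_type
        self.key = key
        self.parent = None
        self.children = []

    def add_child(self, child):
        # Enforce the hierarchical rule: a record may have only one owner.
        if child.parent is not None:
            raise ValueError(f"{child.key} is already owned by {child.parent.key}")
        child.parent = self
        self.children.append(child)
        return child


def traverse(record, depth=0):
    """Preorder traversal: a parent first, then each of its child subtrees."""
    lines = ["  " * depth + f"{record.record_type} {record.key}"]
    for child in record.children:
        lines.extend(traverse(child, depth + 1))
    return lines


customer = Record("customer", "C1")
order = customer.add_child(Record("order", "O-17"))
order.add_child(Record("item", "I-1"))
order.add_child(Record("item", "I-2"))
print("\n".join(traverse(customer)))

# A second customer cannot also own order O-17.
other = Record("customer", "C2")
try:
    other.add_child(order)
except ValueError as err:
    print("rejected:", err)
```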
5. Network Model
The network model is based on directed graph theory. The network model replaces the hierarchical tree with
a graph thus allowing more general connections among the nodes.
The main difference between the network model and the hierarchical model is the network model's ability to handle many-to-many
relationships; in other words, it allows a record to have more than one parent.
Example 1:
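The key difference from the hierarchical sketch is that a record may have several owners. This minimal sketch, using a hypothetical NetworkDB class and project/employee records, stores directed owner-to-member links in both directions. Classic network (CODASYL) systems typically implement a many-to-many relationship through an intermediate link record type; the sketch omits that detail for brevity.

```python
from collections import defaultdict


class NetworkDB:
    """Records joined by directed owner -> member links; a member may have many owners."""

    def __init__(self):
        self.members = defaultdict(set)  # owner record -> its member records
        self.owners = defaultdict(set)   # member record -> its owner records

    def link(self, owner, member):
        self.members[owner].add(member)
        self.owners[member].add(owner)


db = NetworkDB()
# Many-to-many: an employee works on several projects, a project has several employees.
db.link("Project:P1", "Employee:Ravi")
db.link("Project:P1", "Employee:Asha")
db.link("Project:P2", "Employee:Ravi")

print(sorted(db.owners["Employee:Ravi"]))  # ['Project:P1', 'Project:P2']
print(sorted(db.members["Project:P1"]))    # ['Employee:Asha', 'Employee:Ravi']
```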
DATABASE SYSTEM STRUCTURE / THE DATABASE SYSTEM ENVIRONMENT:
The figure below illustrates, in a simplified form, the typical DBMS components. The figure is divided into two
parts. The top part of the figure refers to the various users of the database environment and their interfaces. The
lower part shows the internals of the DBMS responsible for storage of data and processing of transactions.
Let us consider the top part of the figure first. It shows interfaces for the DBA staff, casual users who work with
interactive interfaces to formulate queries, application programmers who create programs using some host
programming languages, and parametric users who do data entry work by supplying parameters to
predefined transactions. The DBA staff works on defining the database and tuning it by making changes to its
definition using the DDL and other privileged commands.
The DDL compiler processes schema definitions, specified in the DDL, and stores descriptions of the schemas (meta-data) in the DBMS
catalog. The catalog includes information such as the names and sizes of files, names and data types of data
items, storage details of each file, mapping information among schemas, and constraints. In addition, the catalog
stores many other types of information that are needed by the DBMS modules, which can then look up the
catalog information as needed.
Casual users and persons with occasional need for information from the database interact using some
form of interface, which we call the interactive query interface. These queries are parsed and validated for
correctness of the query syntax, the names of files and data elements, and so on, by a query compiler that compiles
them into an internal form. This internal query is subjected to query optimization. Among other things,
the query optimizer is concerned with the rearrangement and possible reordering of operations, elimination of
redundancies, and use of correct algorithms and indexes during execution. It consults the system catalog for
statistical and other physical information about the stored data and generates executable code that performs the
necessary operations for the query and makes calls on the runtime processor.
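The optimizer's choice of access path can be observed directly in SQLite, whose EXPLAIN QUERY PLAN statement reports the plan it chose. This is a minimal sketch, assuming SQLite as the DBMS and a hypothetical account table. The same query is planned before and after an index is created; the exact wording of the plan text varies between SQLite versions, so only the presence of a scan or the index name is relied on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, branch TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO account (branch, balance) VALUES (?, ?)",
    [("Downtown" if i % 2 else "Uptown", float(i)) for i in range(1000)],
)

query = "SELECT acc_no, balance FROM account WHERE branch = 'Downtown'"


def plan(sql):
    # Each EXPLAIN QUERY PLAN row has four columns; the last is the readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


before = plan(query)  # no index on branch: the plan is a full table scan
conn.execute("CREATE INDEX idx_account_branch ON account (branch)")
after = plan(query)   # same query text, but the optimizer can now use the index

print(before)
print(after)
```

The query text never changes; only the catalog (now listing an index) differs, and the optimizer consults it to choose a cheaper plan.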
Application programmers write programs in host languages such as Java, C, or C++ that are submitted
to a precompiler. The precompiler extracts DML commands from an application program written in a host
programming language. These commands are sent to the DML compiler for compilation into object code for
database access. The rest of the program is sent to the host language compiler. The object codes for the DML
commands and the rest of the program are linked, forming a canned transaction whose executable code includes
calls to the runtime database processor. Canned transactions are executed repeatedly by parametric users, who
simply supply the parameters to the transactions. Each execution is considered to be a separate transaction. An
example is a bank withdrawal transaction where the account number and the amount may be supplied as
parameters.
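The bank withdrawal example can be sketched as a canned transaction: the SQL is written and tested once, and a parametric user supplies only the account number and amount. This is a minimal sketch, assuming SQLite as the DBMS; the withdraw function and the account table are hypothetical. Each call runs as a separate transaction that is committed on success and rolled back if an error is raised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no TEXT PRIMARY KEY, balance REAL NOT NULL)")
conn.execute("INSERT INTO account VALUES ('A-101', 500.0)")
conn.commit()


def withdraw(conn, acc_no, amount):
    """Canned transaction: fixed SQL, runtime parameters acc_no and amount."""
    with conn:  # commits on success, rolls back if an exception escapes
        row = conn.execute(
            "SELECT balance FROM account WHERE acc_no = ?", (acc_no,)
        ).fetchone()
        if row is None:
            raise ValueError(f"no such account: {acc_no}")
        if row[0] < amount:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE acc_no = ?",
            (amount, acc_no),
        )


withdraw(conn, "A-101", 200.0)  # each call is a separate transaction
try:
    withdraw(conn, "A-101", 1000.0)
except ValueError as err:
    print("rejected:", err)

balance = conn.execute("SELECT balance FROM account WHERE acc_no = 'A-101'").fetchone()[0]
print(balance)  # 300.0
```

Passing the parameters with `?` placeholders, rather than pasting them into the SQL string, keeps the transaction's text fixed and guards against SQL injection.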
In the lower part of Figure, the runtime database processor executes (1) the privileged commands, (2)
the executable query plans, and (3) the canned transactions with runtime parameters. It works with the system
catalog and may update it with statistics. It also works with the stored data manager, which in turn uses basic
operating system services for carrying out low-level input/output (read/write) operations between the disk and
main memory. The runtime database processor handles other aspects of data transfer, such as management of
buffers in the main memory. Some DBMSs have their own buffer management module while others depend on
the OS for buffer management. We have shown concurrency control and backup and recovery
systems separately as a module in this figure. They are integrated into the working of the runtime database
processor for purposes of transaction management.
Database Users
In large organizations, many people are involved in the design, use, and maintenance of a large database
with hundreds of users.
People whose jobs involve the day-to-day use of a large database are called the actors on the scene.
People who work to maintain the database system environment, but who are not actively interested
in the database contents as part of their daily job, are called workers behind the scene.
Actors on the Scene
1. Database Administrators
Administering resources (database, DBMS and related software) is the responsibility of the database
administrator (DBA).
The DBA is responsible for
Authorizing access to the database,
Coordinating and monitoring its use,
Acquiring software and hardware resources as needed.
The DBA is accountable for problems such as security breaches and poor system response time.
2. Database Designers
Database designers are responsible for identifying the data to be stored in the database and for choosing
appropriate structures to represent and store this data.
It is the responsibility of database designers to communicate with all prospective database users in order
to understand their requirements and to create a design that meets these requirements.
In many cases, the designers are on the staff of the DBA and may be assigned other staff responsibilities
after the database design is completed. Database designers typically interact with each potential group of users
and develop views of the database that meet the data and processing requirements of these groups.
Each view is then analyzed and integrated with the views of other user groups. The final database design
must be capable of supporting the requirements of all user groups.
3. End Users
End users are the people whose jobs require access to the database for querying, updating, and generating
reports; the database primarily exists for their use. There are several categories of end users:
i. Casual end users occasionally access the database, but they may need different information each time.
They use a sophisticated database query language to specify their requests and are typically middle or
high-level managers or other occasional browsers.
ii. Naive or parametric end users make up a sizable portion of database end users. Their main job function
revolves around constantly querying and updating the database, using standard types of queries and
updates—called canned transactions—that have been carefully programmed and tested. The tasks that
such users perform are varied:
✓ Bank tellers check account balances and post withdrawals and deposits.
✓ Reservation agents for airlines, hotels, and car rental companies check availability for a given
request and make reservations.
✓ Employees at receiving stations for shipping companies enter package identifications via bar
codes and descriptive information through buttons to update a central database of received and
in-transit packages.
iii. Sophisticated end users include engineers, scientists, business analysts, and others who thoroughly
familiarize themselves with the facilities of the DBMS in order to implement their own applications to
meet their complex requirements.
iv. Standalone users maintain personal databases by using ready-made program packages that provide
easy-to-use menu-based or graphics-based interfaces. An example is the user of a tax package that stores
a variety of personal financial data for tax purposes.
A typical DBMS provides multiple facilities to access a database. Naive end users need to learn very little
about the facilities provided by the DBMS; they simply have to understand the user interfaces of the standard
transactions designed and implemented for their use. Casual users learn only a few facilities that they may use
repeatedly. Sophisticated users try to learn most of the DBMS facilities in order to achieve their complex
requirements. Standalone users typically become very proficient in using a specific software package.
4. System Analysts and Application Programmers (Software Engineers)
System analysts determine the requirements of end users, especially naive and parametric end users, and
develop specifications for standard canned transactions that meet these requirements. Application programmers
implement these specifications as programs; then they test, debug, document, and maintain these canned
transactions. Such analysts and programmers—commonly referred to as software developers or software
engineers—should be familiar with the full range of capabilities provided by the DBMS to accomplish their
tasks.
Complex attributes:
An attribute that is both multivalued and composite is called a complex attribute. For example, the
address of a student is composite (it contains door no, street, city, and pin code) as well as multivalued (a student
can have a present address and a permanent address).
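A complex attribute maps naturally onto a list of structured values. This is a minimal sketch, assuming hypothetical Address and Student dataclasses and made-up address data: Address is the composite part (several component attributes), and the list makes it multivalued.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Address:
    # Composite attribute: built from smaller component attributes.
    door_no: str
    street: str
    city: str
    pin_code: str


@dataclass
class Student:
    roll_no: int
    name: str
    # Complex attribute: multivalued (a list) of composite values (Address).
    addresses: List[Address] = field(default_factory=list)


s = Student(1, "Asha")
s.addresses.append(Address("12", "MG Road", "Hyderabad", "500001"))  # present address
s.addresses.append(Address("4", "Temple St", "Warangal", "506002"))  # permanent address

print([a.city for a in s.addresses])  # ['Hyderabad', 'Warangal']
```

In a relational design, such an attribute is usually moved into its own table keyed by the student's roll_no.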
Descriptive attributes:
The attributes of any relationship set are known as descriptive attributes. Descriptive attributes are used
to record information about the relationship.
Relationship, relationship set
✓ A relationship is an association among several entities. It connects different entities through a meaningful
relation.
✓ A relationship set is a set of relationships of the same type.
✓ In E-R diagram, a relationship set is represented using Diamond symbol (Rhombus symbol).
✓ Degree of a relationship set: the total number of entity sets that participate in a relationship set is known as the
degree of that relationship set.
Fig. MANAGER and DEPARTMENT entity sets connected by the MANAGED BY relationship set
• Many to one. An entity in A is associated with at most one entity in B. An entity in B, however, can be
associated with any number (zero or more) of entities in A. (See Figure a.)
• Many to many. An entity in A is associated with any number (zero or more) of entities in B, and an entity in
B is associated with any number (zero or more) of entities in A. (See Figure b.)
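The mapping cardinalities can be checked mechanically against a set of relationships. This is a minimal sketch with a hypothetical cardinality function: given the (a, b) pairs of a binary relationship set between A and B, it reports the most restrictive mapping those pairs are consistent with. Note that a real cardinality constraint is a design decision about all allowed instances; one snapshot of data can only show which constraints it satisfies.

```python
def cardinality(pairs):
    """Classify the (a, b) pairs of a binary relationship set between A and B."""
    b_per_a = {}
    a_per_b = {}
    for a, b in pairs:
        b_per_a.setdefault(a, set()).add(b)
        a_per_b.setdefault(b, set()).add(a)
    # Each entity in A is related to at most one entity in B?
    a_side_single = all(len(bs) <= 1 for bs in b_per_a.values())
    # Each entity in B is related to at most one entity in A?
    b_side_single = all(len(as_) <= 1 for as_ in a_per_b.values())
    if a_side_single and b_side_single:
        return "one-to-one"
    if a_side_single:
        return "many-to-one"
    if b_side_single:
        return "one-to-many"
    return "many-to-many"


works_in = {("emp1", "D1"), ("emp2", "D1"), ("emp3", "D2")}
print(cardinality(works_in))  # many-to-one

enrolled = {("s1", "c1"), ("s1", "c2"), ("s2", "c1")}
print(cardinality(enrolled))  # many-to-many
```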
ii. Partial participation: If only some of the entities in an entity set participate in the relationship set, the
participation is known as partial participation. For example, not every customer borrows a loan, but every loan must be
borrowed by at least one customer. Hence participation of loan is total, whereas participation of customer is
partial in the relationship set borrower.
Fig. partial participation from customer to borrower and total participation from loan to borrower
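Total and partial participation can be checked by asking whether every entity appears somewhere in the relationship set. This is a minimal sketch with a hypothetical participation function and made-up customer and loan data for the borrower relationship set.

```python
def participation(entity_set, relationship_pairs, position):
    """Return 'total' if every entity takes part in the relationship set, else 'partial'.

    position is 0 if entity_set supplies the first component of each pair,
    or 1 if it supplies the second component.
    """
    participating = {pair[position] for pair in relationship_pairs}
    return "total" if set(entity_set) <= participating else "partial"


customers = {"Hayes", "Jones", "Smith", "Turner"}
loans = {"L-15", "L-16", "L-17"}
borrower = {("Hayes", "L-15"), ("Jones", "L-17"), ("Smith", "L-16"), ("Jones", "L-16")}

print(participation(customers, borrower, 0))  # partial: Turner has no loan
print(participation(loans, borrower, 1))      # total: every loan has a borrower
```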
The following ER diagram shows both the Manages and Works_In relationship sets and all the given constraints.
The thick lines indicate total participation, and the arrow indicates a key constraint. The key constraint is that every
department has at most one manager.
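One common way to enforce both constraints in SQL is to fold Manages into the department table. This is a minimal sketch, assuming SQLite as the DBMS and hypothetical table names employees and dept_mgr. Making did the primary key gives each department at most one manager (the key constraint). Declaring ssn NOT NULL gives each department at least one manager (total participation). The since column is a descriptive attribute of Manages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE employees (ssn TEXT PRIMARY KEY, name TEXT)")
# Manages folded into the department table:
#   did PRIMARY KEY -> each department appears once, so at most one manager
#   ssn NOT NULL    -> every department must have a manager (total participation)
#   since           -> descriptive attribute of the Manages relationship
conn.execute("""
    CREATE TABLE dept_mgr (
        did   INTEGER PRIMARY KEY,
        dname TEXT,
        ssn   TEXT NOT NULL REFERENCES employees (ssn),
        since TEXT
    )
""")
conn.execute("INSERT INTO employees VALUES ('111', 'Asha'), ('222', 'Ravi')")
conn.execute("INSERT INTO dept_mgr VALUES (1, 'Sales', '111', '2020-01-01')")


def try_insert(sql):
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as err:
        return f"rejected ({err})"


# A second manager for department 1 violates the key constraint.
print(try_insert("INSERT INTO dept_mgr VALUES (1, 'Sales', '222', '2021-01-01')"))
# A department with no manager violates total participation.
print(try_insert("INSERT INTO dept_mgr VALUES (2, 'IT', NULL, NULL)"))
```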
Weak Entities
✓ An entity set that does not have sufficient attributes to form a primary key is called a weak entity set.
✓ An entity set without a primary key is meaningless; hence, to make it meaningful, it must be associated with
a strong entity set called the identifying (owner) entity set.
✓ The relationship set through which a weak entity set is associated with its identifying entity set is called the
identifying relationship set.
✓ Participation of a weak entity set in the identifying relationship set is always total.
✓ A weak entity set is represented using a double rectangle, and an identifying relationship set is represented using a
double diamond symbol.
✓ The attribute of a weak entity set that, together with the primary key of the strong entity set, uniquely identifies each
weak entity is called the discriminator or partial key and is represented by a dotted underline inside the ellipse.
✓ In the following diagram, one employee may have many dependents. The attribute pname of the Dependents entity
set cannot be treated as a primary key because more than one dependent can have the same name. Hence,
Dependents is a weak entity set, and it must be associated with a strong entity set. In this case, Employees is the
identifying entity set, and Policy is the identifying relationship set. The attribute pname is called the discriminator or
partial key and is represented by a dotted underline inside the ellipse.
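The weak entity set Dependents and its identifying relationship Policy are commonly translated into a single table. This is a minimal sketch, assuming SQLite as the DBMS and hypothetical table name dep_policy and columns age and cost. The primary key combines the owner's key (ssn) with the discriminator (pname). ON DELETE CASCADE reflects that a dependent cannot exist without its owning employee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce ON DELETE CASCADE
conn.execute("CREATE TABLE employees (ssn TEXT PRIMARY KEY, name TEXT)")
# Weak entity set Dependents, identified through Policy:
#   primary key = discriminator (pname) + owner's primary key (ssn)
#   ssn NOT NULL plus the foreign key: every dependent has an owner (total participation)
conn.execute("""
    CREATE TABLE dep_policy (
        pname TEXT,
        age   INTEGER,
        cost  REAL,
        ssn   TEXT NOT NULL,
        PRIMARY KEY (pname, ssn),
        FOREIGN KEY (ssn) REFERENCES employees (ssn) ON DELETE CASCADE
    )
""")
conn.execute("INSERT INTO employees VALUES ('111', 'Asha'), ('222', 'Ravi')")
# Two dependents share the name 'Anu'; the owner's ssn tells them apart.
conn.execute(
    "INSERT INTO dep_policy VALUES ('Anu', 10, 500.0, '111'), ('Anu', 8, 450.0, '222')"
)

# Deleting the owner deletes its dependents as well.
conn.execute("DELETE FROM employees WHERE ssn = '111'")
remaining = conn.execute("SELECT pname, ssn FROM dep_policy").fetchall()
print(remaining)  # [('Anu', '222')]
```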
Inheritance
We use all the above features of the ER model to create classes of objects in object-oriented programming.
This makes it easier for programmers to concentrate on what they are programming. Details of entities are
generally hidden from the user; this process is known as abstraction. One of the important features of generalization
and specialization is inheritance, that is, the attributes of higher-level entities are inherited by the lower-level
entities. For example, attributes of a person such as name, age, and gender can be inherited by lower-level entities
such as student and teacher. Here, person is the superclass, and student and teacher are the subclasses. The attributes
of the superclass are inherited by the subclasses.
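The person, student, and teacher example translates directly into class inheritance. This is a minimal sketch in Python. The subclass-specific attributes roll_no and subject are hypothetical additions, included to show what specialization adds on top of the inherited name, age, and gender.

```python
class Person:
    """Superclass (higher-level entity set): attributes shared by every person."""

    def __init__(self, name, age, gender):
        self.name = name
        self.age = age
        self.gender = gender


class Student(Person):
    """Subclass: inherits name, age, gender and adds its own attribute."""

    def __init__(self, name, age, gender, roll_no):
        super().__init__(name, age, gender)
        self.roll_no = roll_no


class Teacher(Person):
    """Subclass: inherits name, age, gender and adds its own attribute."""

    def __init__(self, name, age, gender, subject):
        super().__init__(name, age, gender)
        self.subject = subject


s = Student("Asha", 19, "F", 101)
t = Teacher("Ravi", 42, "M", "DBMS")
print(s.name, s.roll_no)  # Asha 101
print(t.name, t.subject)  # Ravi DBMS
print(isinstance(s, Person), isinstance(t, Person))  # True True
```

Reading from the subclasses up to Person is generalization; reading from Person down to Student and Teacher is specialization.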