CUAP DBMS Notes Unit11
1.1 Introduction
Importance: Database systems have become an essential component of life in
modern society, in that many frequently occurring events trigger the accessing of
at least one database: bibliographic library searches, bank transactions,
hotel/airline reservations, grocery store purchases, online (Web) purchases, etc.
The applications mentioned above are all "traditional" ones for which the use of
rigidly structured textual and numeric data suffices. Recent advances have led
to the application of database technology to a wider class of data. Examples include
multimedia databases (involving pictures, video clips, and sound messages) and
geographic databases (involving maps, satellite images).
Also, database search techniques are applied by some WWW search engines.
Definitions
Data
• Facts that can be recorded or stored.
• E.g. Person Name, Age, Gender and Weight etc.
Information
• When data is processed, organized, structured or presented in a given
context so as to make it useful, it is called information.
Database
• A Database is a collection of inter-related data.
• E.g. Books Database in Library, Student Database in University etc.
DBMS (Database Management System)
• A database management system is a collection of inter-related data and
set of programs to manipulate those data.
• DBMS = Database + Set of programs
• E.g. MS SQL Server, Oracle, MySQL, SQLite, MongoDB etc.
Metadata
• Metadata is data about data.
• Data such as table name, column name, data type, authorized user and
user access privileges for any table is called metadata for that table.
Data dictionary
• Data dictionary is an information repository which contains metadata.
• It is usually a part of the system catalog.
DATABASE MANAGEMENT SYSTEMS
Data warehouse
• A data warehouse is an information repository that stores data integrated
from multiple sources, typically for analysis and reporting.
• Defining: specifying data types (and other constraints to which the data must
conform) and data organization
• Constructing: the process of storing the data on some medium (e.g.,
magnetic disk) that is controlled by the DBMS
• Manipulating: querying, updating, report generation
• Sharing: allowing multiple users and programs to access the database
"simultaneously"
• System protection: preventing database from becoming corrupted when
hardware or software failures occur
• Security protection: preventing unauthorized or malicious access to database.
• Maintain: DBMS must be able to maintain the database system by allowing
the system to evolve as requirements change over time.
1. Data Redundancy
• It is possible that the same information may be duplicated in different
files. This leads to data redundancy.
• Data redundancy results in memory wastage.
• For example, consider that some customers have both kinds of accounts -
saving and current. In this case, data about customers such as name,
address, e-mail and contact number will be duplicated in both files, saving
accounts file and current account file.
• In other words, same information will be stored in two different locations
(files). And, it wastes memory.
2. Data Inconsistency
• Due to data redundancy, it is possible that data may not be in consistent
state.
• For example, consider that an address of some customer changes. And,
that customer has both kinds of accounts. Now, it is possible that this
changed address is updated in only one file, leaving address in other file
as it is. As a result of this, same customer will have two different addresses
in two different files, making data inconsistent.
3. Difficulty in Accessing Data
• Accessing data is not convenient and efficient in file processing system.
• For example, suppose, there is a program to find information about all
customers. But, what if there is a need to find out all customers from
some particular city. In this case, there are two choices here: One, find out
all customers using available program, and then extract the needed
customers manually. Second, develop new program to get required
information. Both options are not satisfactory.
• For each and every different kind of data access, separate programs are
required. This is neither convenient nor efficient.
4. Limited Data Sharing
• Data are scattered in various files.
• Different files may have different formats, and they may be stored in
different folders (directories) on different computers belonging to different
departments.
• So, due to this data isolation, it is difficult to share data among different
applications.
5. Integrity Problems
• Data integrity means that the data contained in the database is both
correct and consistent. For this purpose, the data stored in database must
satisfy certain types of constraints (rules).
• For example, a balance for any account must not be less than zero. Such
constraints are enforced in the system by adding appropriate code in
application programs. But, when new constraints are added, such as
balance should not be less than Rs. 5000, application programs need to
be changed. But, it is not an easy task to change programs whenever
required.
6. Atomicity Problems
• Any operation on database must be atomic. This means, operation
completes either 100% or 0%.
• For example, a fund transfer from one account to another must happen in
its entirety. But computer systems are vulnerable to failures, such as a
system crash or a virus attack. If a system failure occurs during the execution
of a fund transfer operation, it is possible that the amount to be transferred,
say, Rs. 500, is debited from one account but is not credited to the other account.
• This leaves the database in an inconsistent state. It is difficult to ensure
atomicity in a file processing system.
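The all-or-nothing behaviour described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the account table, the account ids "A" and "B", the starting balances, and the fail_midway flag that simulates a crash are all hypothetical and not part of these notes.

```python
import sqlite3

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
            if fail_midway:
                # Hypothetical failure between the debit and the credit.
                raise RuntimeError("simulated crash after the debit")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
    except RuntimeError:
        pass  # by this point the debit has already been rolled back

def balances(conn):
    """Return {account id: balance} for every account."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 5000), ("B", 1000)])
conn.commit()

transfer(conn, "A", "B", 500, fail_midway=True)
after_failure = balances(conn)   # neither account has changed
transfer(conn, "A", "B", 500)
after_success = balances(conn)   # both the debit and the credit are applied
```

The point of the sketch is that the program never has to undo the debit by hand; the transaction mechanism restores the earlier state.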
7. Concurrent Access Anomalies
• Multiple users are allowed to access data simultaneously (concurrently).
This is for the sake of better performance and faster response.
• Consider an operation to debit (withdrawal) an account. The program
reads the old balance, calculates the new balance, and writes new balance
back to database. Suppose an account has a balance of Rs. 5000. Now, a
concurrent withdrawal of Rs. 1000 and Rs. 2000 may leave the balance Rs.
4000 or Rs. 3000 depending upon their completion time rather than the
correct value of Rs. 2000.
• Here, concurrent data access should be allowed under some supervision.
• But, due to lack of co-ordination among different application programs,
this is not possible in file processing systems.
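The anomaly above can be reproduced without real threads by splitting each withdrawal into its read step and its write step and then interleaving the steps by hand. This is a minimal sketch; the function names and the dictionary used as a stand-in for the account file are assumptions made for illustration.

```python
def withdraw_steps(store, amount):
    """Generator that splits one withdrawal into a read step and a write step."""
    old = store["balance"]            # step 1: read the old balance
    yield
    store["balance"] = old - amount   # step 2: write the new balance
    yield

def run_interleaved(start, first, second):
    """Both withdrawals read before either writes (uncontrolled concurrency)."""
    store = {"balance": start}
    a = withdraw_steps(store, first)
    b = withdraw_steps(store, second)
    next(a)
    next(b)   # both programs have now read the same old balance
    next(a)
    next(b)   # the second write overwrites the first
    return store["balance"]

def run_serial(start, first, second):
    """Each withdrawal finishes completely before the next one starts."""
    store = {"balance": start}
    for amount in (first, second):
        for _ in withdraw_steps(store, amount):
            pass
    return store["balance"]

lost_update_a = run_interleaved(5000, 1000, 2000)
lost_update_b = run_interleaved(5000, 2000, 1000)
correct = run_serial(5000, 1000, 2000)
```

Whichever withdrawal writes last decides the final balance in the interleaved runs, which is why the outcome depends on completion time; only the serial run applies both withdrawals.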
8. Security Problems
• Database should be accessible to users in a limited way.
• Each user should be allowed to access data concerning his application only.
• For example, a customer can check balance only for his/her own account.
He/ She should not have access for information about other accounts.
• But, in file processing system, application programs are added in an ad
hoc manner by different programmers. So, it is difficult to enforce such
kind of security constraints.
If we want to add another piece of data to each STUDENT record, say the Birth_date,
such a program will no longer work and must be changed. By contrast, in a DBMS
environment, we only need to change the description of STUDENT records in the
catalog to reflect the inclusion of the new data item Birth_date; no programs are
changed.
System catalog is used not only by users (e.g., who need to know the names of tables
and attributes, and sometimes data type information and other things), but also by the
DBMS software, which certainly needs to "know" how the data is structured/organized in
order to interpret it in a manner consistent with that structure.
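As a rough illustration of the two paragraphs above, the sketch below uses SQLite through Python's sqlite3 module. The student table, its column names, and the sample row are assumptions; PRAGMA table_info is SQLite's way of reading a table description back from its catalog.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, student_number INTEGER)")
conn.execute("INSERT INTO student VALUES ('Smith', 17)")

def find_name(conn, number):
    """An existing 'program': it names only the columns it needs."""
    row = conn.execute(
        "SELECT name FROM student WHERE student_number = ?", (number,)
    ).fetchone()
    return row[0] if row else None

def column_names(conn, table):
    """Read the table description back from the catalog."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

before = column_names(conn, "student")

# Only the description of STUDENT records changes; find_name is not edited.
conn.execute("ALTER TABLE student ADD COLUMN birth_date TEXT")

after = column_names(conn, "student")
name_after_change = find_name(conn, 17)
```

The lookup function keeps working after the change because it asks for columns by name and leaves the record layout to the DBMS.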
3. Multiple Views of Data: A database typically has many types of users, each of
whom may require a different perspective or view of the database.
A view may be a subset of the database or it may contain virtual data that is derived
from the database files but is not explicitly stored. Some users may not need to be
aware of whether the data they refer to is stored or derived.
A multiuser DBMS whose users have a variety of distinct applications must provide
facilities for defining multiple views. For example, one user of the database may be
interested only in accessing and printing the transcript of each student; the view for
this user is shown in Figure 1.5(a). A second user, who is interested only in checking
that students have taken all the prerequisites of each course for which the student
registers, may require the view shown in Figure 1.5(b).
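A transcript-style view could look roughly like the following sketch, written with Python's sqlite3 module. The table names, column names, and sample rows are assumptions for illustration and are not taken from Figure 1.5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_number INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE grade_report (student_number INTEGER, course TEXT, grade TEXT);
INSERT INTO student VALUES (17, 'Smith'), (8, 'Brown');
INSERT INTO grade_report VALUES
    (17, 'MATH2410', 'B'), (8, 'MATH2410', 'A'), (8, 'CS1310', 'A');

-- A view is stored as a query definition; its rows are derived on demand.
CREATE VIEW transcript AS
    SELECT s.name, g.course, g.grade
    FROM student AS s JOIN grade_report AS g USING (student_number);
""")

# The transcript user queries the view as if it were an ordinary table.
brown = conn.execute(
    "SELECT course, grade FROM transcript WHERE name = 'Brown' ORDER BY course"
).fetchall()

# New base-table rows show up in the view without redefining it.
conn.execute("INSERT INTO grade_report VALUES (17, 'CS1310', 'C')")
smith_count = conn.execute(
    "SELECT COUNT(*) FROM transcript WHERE name = 'Smith'"
).fetchone()[0]
```

The view holds no data of its own, so the user of transcript does not need to know whether the rows are stored or derived.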
Concurrency control is supposed to ensure that several users trying to update
the same data do so in a "controlled" manner so that the results of the updates are
as though they were done in some sequential order (rather than interleaved, which
could result in data being incorrect).
This gives rise to the concept of a transaction, which is a process that makes one or
more accesses to a database and which must have the appearance of executing in
isolation from all other transactions (even ones that access the same data at the
"same time") and of being atomic (in the sense that, if the system crashes in the
middle of its execution, the database contents must be as though it did not
execute at all).
These apply to "large" databases, not "personal" databases that are defined,
constructed, and used by a single person via, say, Microsoft Access.
Users may be divided into
◼ Those who actually use and control the database content, and those
who design, develop and maintain database applications (called
“Actors on the Scene”), and
◼ Those who design and develop the DBMS software and related
tools, and the computer systems operators (called “Workers
Behind the Scene”).
▪ An example is a tax program user that creates its own internal
database.
▪ Another example is maintaining an address book.
4. System Analysts, Application Programmers, Software Engineers:
o System Analysts: determine requirements of end users, especially
naive and parametric users, and develop specifications for canned
transactions that meet these requirements.
o Application Programmers: Implement, test, document, and
maintain programs that satisfy the specifications mentioned above.
o Such analysts and application programmers are commonly referred to as
software developers or software engineers.
2. Controlling Redundancy:
Redundancy means unnecessary duplication of data: the same data is stored
more than once, often in different files.
This redundancy in storing the same data multiple times leads to several
problems.
➢ First, a single logical update, such as entering data on a new
student, must be performed multiple times, once for each file
where student data is recorded. This leads to duplication of
effort.
➢ Second, storage space is wasted when same data is stored
repeatedly, and this problem may be serious for large databases.
➢ Third, files that represent the same data may become inconsistent.
This may happen because an update is applied to some of the files
but not to others
➢ For example, suppose an employee mobile number is stored in the employee
personal data, employee department data, and employee salary data.
If you need to change the mobile number of a particular employee,
then you have to change it in three locations. If you miss any one
location/copy, the correct mobile number is no longer certain.
➢ So data redundancy causes inconsistencies. When data redundancy
is controlled, data anomalies and inconsistencies are reduced.
3. Improved data consistency:
If the amount of data redundancy is controlled, it will reduce data
inconsistency also. It is also highly recommended to maintain the same version
of data at all the locations. For example, when a customer address is stored at
only one location, if the customer changes the address, it will be automatically
reflected in all the applications related to that particular customer.
4. Restricting Unauthorized Access: When multiple users share a large
database, it is likely that most users will not be authorized to access all
information in the database. For example, financial data such as salaries and
bonuses is often considered confidential and only authorized persons are
allowed to access such data. In addition, some users may only be permitted
to retrieve data, whereas others are allowed to retrieve and update.
A DBMS should provide a security and authorization subsystem, which is
used for specifying these restrictions on user accounts.
5. Enforcements of Standards:
To facilitate database management, the DBA establishes procedures and
enforces standards. Procedures are the
instructions and rules that govern the design and use of the database
system. Procedures are also used to ensure that there is an organized way
to monitor and audit both the data and the information that is generated
through the use of the data.
6. Providing Persistent Storage for Program Objects: Object-oriented
database systems make it easier for complex runtime objects (e.g., lists,
trees) to be saved in secondary storage so as to survive beyond program
termination and to be retrievable at a later time.
7. Providing Storage Structures for Efficient Query Processing: The DBMS
maintains indexes (typically in the form of trees and/or hash tables) that are
utilized to improve the execution time of queries and updates
8. The query processing and optimization module is responsible for choosing
an efficient query execution plan for each query submitted to the system.
9. Providing Backup and Recovery: The subsystem having this responsibility
ensures that recovery is possible in the case of a system crash during
execution of one or more transactions.
10. Providing Multiple User Interfaces: For example, query languages for
casual users, programming language interfaces for application programmers, forms
and/or command codes for parametric users, menu-driven interfaces for stand-
alone users.
11. Representing Complex Relationships Among Data: A DBMS should
have the capability to represent such relationships and to retrieve related
data quickly.
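Items 7 and 8 in the list above (indexes and the query optimizer) can be observed directly in SQLite. The sketch below is only an illustration: the customer table, its generated rows, and the index name idx_customer_city are assumptions, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER, customer_name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(i, f"name{i}", f"city{i % 50}") for i in range(1000)],
)

def plan(conn, sql, params=()):
    """Return the textual steps of the plan SQLite chooses for a query."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

query = "SELECT customer_name FROM customer WHERE city = ?"

# Without an index, the only available plan is to scan the whole table.
plan_without_index = plan(conn, query, ("city7",))

# After the index exists, the optimizer can search it instead of scanning.
conn.execute("CREATE INDEX idx_customer_city ON customer (city)")
plan_with_index = plan(conn, query, ("city7",))
```

The query text is identical in both runs; only the storage structures available to the optimizer differ, and the chosen plan changes accordingly.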
Disadvantages of DBMS
1. It is complex. Because a DBMS supports a wide range of functionality,
the underlying software is complex. Designers and
developers need thorough knowledge of the software to get
the most out of it.
2. Because of its complexity and functionality, it occupies a large amount of
storage. It also needs a large amount of memory to run efficiently.
3. A DBMS typically works as a centralized system, i.e., all users
access the same database. Hence any failure of the DBMS will
impact all the users.
4. A DBMS is generalized software, i.e., it is written to work across many systems
rather than for one specific application. Hence some applications may run slowly.
Many relational DBMSs have incorporated object database concepts, leading to a new
category called object-relational DBMSs (ORDBMSs)
Extended relational systems add further capabilities (e.g. for multimedia data, XML, and
other data types)
◼ Relational DBMS Products emerged in the 1980s
◼ Data on the Web and E-commerce Applications:
◼ Web contains data in HTML (Hypertext markup language) with
links among pages.
◼ This has given rise to a new set of applications and E-commerce is
One fundamental characteristic of the database approach is that it provides some level
of data abstraction by hiding details of data organization and storage that are
irrelevant to database users and highlighting the essential features for an improved
understanding of data.
Structure of database means the data types, relationships, and constraints that apply
to the data. Most data models also include a set of basic operations for specifying
retrievals/updates.
There are other well-known data models that have been the basis for database systems.
The best-known models pre-dating the relational model are the hierarchical (in
which the entity types form a tree) and the network (in which the entity types and
relationships between them form a graph).
• Advantages :
▪ It promotes Data Sharing.
▪ Parent / Child relationship promotes conceptual simplicity and data
integrity.
▪ Database security is provided and enforced by DBMS.
• Disadvantages:
▪ Complex implementation requires knowledge of physical data storage
characteristics.
▪ Changes in structure require changes in all application programs.
▪ There is no data definition or data manipulation language in the
DBMS.
▪ There is a lack of standards.
2. Network Data Model:
In this model, the user observes the network database as a collection of
records in 1: M relationship. This model allows a record to have more than
one parent.
In network database terminology, a relationship is called a set. Each set is
composed of at least two record types, an "owner" record and a "member"
record.
A set represents a 1: M relationship between owner and the member.
➢ Advantages:
▪ It handles more relationship types such as M:N and multi parent.
▪ Data access is more flexible than in hierarchical and file system models.
▪ Data Owner/Member relationship promotes data integrity.
➢ Advantages:
▪ Structural independence is promoted by the use of independent
tables.
▪ Ad hoc query capability is based on SQL.
▪ Powerful RDBMS isolates the end user from the physical level details.
➢ Disadvantages:
▪ The RDBMS requires substantial hardware and system software
overhead.
▪ Conceptual simplicity gives relatively untrained people the tools to
use a good system poorly.
➢ Advantages:
▪ Semantic content is added.
▪ Visual representation includes semantic
content.
▪ Inheritance promotes data integrity
➢ Disadvantages:
▪ Slow development of standards caused vendors
to supply their own enhancements, thus
eliminating a widely accepted standard.
▪ It is a complex navigational system.
▪ High system overhead slows transactions.
5. Entity-relationship (E-R) data model:
▪ The ER data model is based on a perception of a real world that
consists of a collection of basic objects, called entities, and of
relationships among these objects.
➢ Entity: A real world object
E.g. customers, accounts, bank branch
➢ Relationship: An association between entities.
E.g. pursues, works-for, manages
➢ Attribute: Property of the Entity.
E.g. empno, ename, sal.
➢ Advantages :
▪ Conceptual simplicity: ER model represents the concepts of a
database along with its entities and relationships in an easy way. It
Schema diagram displays the structure of each record type but not the actual
instances of records.
Database state or snapshot: The actual data stored in the database probably
changes often. The data in the database at a particular time is called the database
state, or a snapshot. It is also called the current set of occurrences or instances
in the database.
For example: STUDENT construct will contain the set of individual entities
(records) as its instances.
Note: The schema is sometimes called the intension, and the database state is
called an extension of the schema.
A commonly used view of data approach is the three-level architecture suggested by the
ANSI/SPARC (American National Standards Institute/Standards Planning and
Requirements Committee). ANSI/SPARC proposed an architectural framework for
databases.
The three levels of the architecture are three different views of the data:
• External schema or view level
• Conceptual schema or logical schema
• Physical schema or internal schema
Conceptual Level
• This is the next higher level of the data abstraction.
• It describes what data are stored in the database and what
relationships exist among those data.
• It is also known as a logical level.
• Conceptual view is defined by conceptual schema. It describes all
records and relationship.
External Level
• This is the highest level of data abstraction.
• It is also known as view level.
• It describes only part of the entire database that a particular end user
requires.
• External view is described by external schema.
• External schema consists of definition of logical records, relationship
in the external view and method of deriving the objects from the
conceptual view.
• This object includes entities, attributes and relationship.
The process of transforming data via mappings can be costly (performance-wise),
which is probably one reason that real-life DBMSs don't fully implement this 3-schema
architecture.
Database Languages:
Data-Definition Language:
For instance, the following statement in the SQL language defines the account
table:
Execution of the above DDL statement creates the account table. In addition, it
updates a special set of tables called the data dictionary or data directory.
We specify the storage structure and access methods used by the database system
by a set of statements in a special type of DDL called a data storage and
definition language or storage definition language. These statements define
the implementation details of the database schemas, which are usually hidden
from the users.
The data values stored in the database must satisfy certain consistency constraints.
For example, suppose the balance on an account should not fall below Rs. 1000.
The DDL provides facilities to specify such constraints. The database systems check
these constraints every time the database is updated.
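A minimal sketch of these ideas using SQLite through Python's sqlite3 module follows. The column names, the account number 'A-101', and the starting balance are assumptions for illustration; sqlite_master is SQLite's own catalog table and plays the role of the data dictionary here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the table and declare the consistency constraint along with it.
conn.execute("""
    CREATE TABLE account (
        account_number TEXT PRIMARY KEY,
        balance        INTEGER NOT NULL CHECK (balance >= 1000)
    )
""")

# Executing the DDL also recorded the definition in the catalog.
catalog_entry = conn.execute(
    "SELECT name, type FROM sqlite_master WHERE name = 'account'"
).fetchone()

conn.execute("INSERT INTO account VALUES ('A-101', 5000)")

def try_update(conn, new_balance):
    """Return True if the DBMS accepts the update, False if a constraint rejects it."""
    try:
        conn.execute(
            "UPDATE account SET balance = ? WHERE account_number = 'A-101'",
            (new_balance,),
        )
        return True
    except sqlite3.IntegrityError:
        return False

first_ok = try_update(conn, 1500)   # satisfies balance >= 1000
second_ok = try_update(conn, 500)   # violates the constraint
final_balance = conn.execute("SELECT balance FROM account").fetchone()[0]
```

Because the rule lives in the table definition, every program that updates the table is checked against it without carrying its own copy of the rule.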
Data-Manipulation Language:
Data manipulation is
• The retrieval of information stored in the database.
• The insertion of new information into the database.
• The deletion of information from the database.
• The modification of information stored in the database
Procedural DMLs or low level DMLs: require a user to specify what data are
needed and how to get those data.
This query in the SQL language finds the name of the customer whose
customer-id is 192:
Select customer_name from customer where customer_id = 192;
The query specifies that those rows from the table customer where the
customer_id is 192 must be retrieved, and the customer_name attribute of
these rows must be displayed.
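To contrast the two styles, the sketch below runs the query above through Python's sqlite3 module and then obtains the same answer with a hand-written loop that spells out how to find the row. The customer names and the other customer ids are assumptions added only to make the example runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(191, "Hayes"), (192, "Johnson"), (193, "Lindsay")],
)

# Declarative: state which rows are wanted; the DBMS decides how to find them.
declarative = conn.execute(
    "SELECT customer_name FROM customer WHERE customer_id = 192"
).fetchall()

# Procedural style: the program spells out how to get the data, row by row.
procedural = []
for customer_id, customer_name in conn.execute(
    "SELECT customer_id, customer_name FROM customer"
):
    if customer_id == 192:
        procedural.append((customer_name,))
```

Both approaches return the same row, but only the first leaves the choice of access path (for example, using the primary key) to the DBMS.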
DBMS interfaces:
A database management system (DBMS) interface is a user interface that lets
users submit queries to a database without writing the query language themselves.
User-friendly interfaces provided by a DBMS may include the following:
1. Menu-Based Interfaces for Web Clients or Browsing –
These interfaces present the user with lists of options called menus that lead the
user through the formation of a request. Pull-down menus are a very popular
technique in Web based interfaces.
2. Apps for Mobile Devices:
These interfaces present mobile users with access to their data.
For example, banking, reservations, and insurance companies, among many
others, provide apps that allow users to access their data through a mobile
phone or mobile device. The apps have built-in programmed interfaces that
typically allow users to login using their account name and password; the apps
then provide a limited menu of options for mobile access to the user data, as
well as options such as paying bills (for banks) or making reservations (for
reservation Web sites).
3. Forms-based Interfaces:
A forms-based interface displays a form to each user. Users can fill out all of the
form entries to insert new data, or they can fill out only certain entries, in which
case the DBMS will retrieve matching data for the remaining entries. Forms are
usually designed and programmed for naive users as interfaces to canned
transactions.
Example: SQL*Forms, Oracle Forms
- Provides an extensive set of features to design and build applications using
forms.
- Some systems have utilities that define a form by letting the end user
interactively construct a sample form on the screen.
• Software: To make the database system function fully, three types of software
are needed: operating system software, DBMS software, and application
programs and utilities.
a. Operating system software manages all hardware components and makes it
possible for all other software to run on the computers.
Examples: Microsoft Windows, Linux, Mac OS, UNIX.
b. DBMS software manages the database within the database system.
Examples: Microsoft’s SQL Server, Oracle Corporation’s Oracle and MySQL,
and IBM’s DB2.
c. Application programs and utility software are used to access and
manipulate data in the DBMS and to manage the computer environment in
which data access and manipulation take place. Application programs are most
commonly used to access data found within the database to generate reports,
tabulations, and other information to facilitate decision making. Utilities are
the software tools used to help manage the database system’s computer
components.
For example, all of the major DBMS vendors now provide graphical user
interfaces (GUIs) to help create database structures, control database access,
and monitor database operations.
• People: This component includes all users of the database system. On the
basis of primary job functions, five types of users can be identified in a
database system: system administrators, database administrators,
database designers, system analysts and programmers, and end users.
Each user type, described below, performs both unique and complementary
functions.
a. System administrators: oversee the database system’s general operations.
b. Database administrators: also known as DBAs manage the DBMS and ensure
that the database is functioning properly.
c. Database designers: design the database structure. They are, in effect, the
database architects. If the database design is poor, even the best application
programmers and the most dedicated DBAs cannot produce a useful database
environment.
d. System analysts and programmers: design and implement the application
programs. They design and create the data entry screens, reports, and
procedures through which end users access and manipulate the database’s
data.
e. End users: are the people who use the application programs to run the
organization’s daily operations. For example, salesclerks, supervisors,
managers, and directors are all classified as end users.
• Procedures: Procedures are the instructions and rules that govern the design
and use of the database system. Procedures play an important role in a
company because they enforce the standards by which business is conducted
within the organization and with customers. Procedures are also used to
ensure that there is an organized way to monitor and audit both the data that
enter the database and the information that is generated through the use of
those data.
• Data: The word data covers the collection of facts stored in the database.
Because data are the raw material from which information is generated, the
determination of what data are to be entered into the database and how those
data are to be organized is a vital part of the database designer’s job.
As can be seen from the above diagram, all the information for the organisation is
stored in a single database. This database is known as the centralized database.
Advantages:
Some advantages of Centralized Database Management System are:
Disadvantages:
Some disadvantages of Centralized Database Management System are:
1. Since all the data is at one location, it takes more time to search and access it.
If the network is slow, this process takes even more time.
2. Since all the data is at the same location, if multiple users try to access it
simultaneously it creates a problem. This may reduce the efficiency of the
system.
3. If there are no database recovery measures in place and a system failure
occurs, then all the data in the database will be destroyed.
Client/Server Structure:
The client/server architecture was developed to deal with computing environments
in which a large number of PCs, workstations, file servers, printers, database servers,
Web servers, e-mail servers, and other software and equipment are connected via a
network. The idea is to define specialized servers with specific functionalities.
For example, it is possible to connect a number of PCs or small workstations as
clients to a file server that maintains the files of the client machines. Another
machine can be designated as a printer server by being connected to various
printers; all print requests by the clients are forwarded to this machine.
The client machines provide the user with the appropriate interfaces to utilize these
servers, as well as with local processing power to run local applications.
An illustration of the client/server structure is as follows−
An example of a two tier client/server structure is a web server. It returns the required
web pages to the clients that requested them.
• If the client nodes are increased beyond capacity in the structure, then the
server is not able to handle the request overflow and performance of the
system degrades
Three-tier Client / Server Architecture
Three-tier client/server database architecture is a
commonly used architecture for web applications.
An intermediate layer, called the application server or web
server, stores the web connectivity software and the
business logic (constraints) part of the application used
to access the right amount of data from the
database server. This layer acts as a medium for
sending partially processed data between the
database server and the client.
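The division of work among the three tiers can be sketched in a single Python file, with plain function calls standing in for the network hops. Everything here is an assumption made for illustration: the account table, the user names, the business rule, and the reply format.

```python
import sqlite3

# Tier 3: database server (an in-memory SQLite database stands in for it).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE account (owner TEXT, balance INTEGER)")
db.executemany("INSERT INTO account VALUES (?, ?)", [("asha", 5000), ("ravi", 1200)])

# Tier 2: application server, which holds the business logic.
def get_balance(logged_in_user, requested_owner):
    """Business rule: a user may read only his or her own balance."""
    if logged_in_user != requested_owner:
        return {"status": "denied"}
    row = db.execute(
        "SELECT balance FROM account WHERE owner = ?", (requested_owner,)
    ).fetchone()
    if row is None:
        return {"status": "not found"}
    return {"status": "ok", "balance": row[0]}

# Tier 1: client, which only formats the reply and never touches the database.
def client_view(logged_in_user, requested_owner):
    reply = get_balance(logged_in_user, requested_owner)
    if reply["status"] == "ok":
        return f"Balance: Rs. {reply['balance']}"
    return f"Request {reply['status']}"

own_account = client_view("asha", "asha")
other_account = client_view("asha", "ravi")
```

Because the client reaches the data only through the middle tier, the rule about who may see which balance is enforced in one place.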
Advantages:
• The three-tier structure provides much better service and fast performance.
• The structure can be scaled according to requirements without any problem.
• Data security is much improved in the three-tier structure.
Disadvantages: