DBMS unit 1


Chapter 1

Introduction to DBMS
By
Dr.Sunil Kumar

1.1 INTRODUCTION
The Database Management System (DBMS) has evolved from a specialized computer application
into a central component of the computing environment. A database system plays a vital
role in organizing data about a particular enterprise. Consider a company that stores data
about the following:
Employees (Employee No., Name, Address, Salary)
Departments (Department No., Name, Location)
Projects (Name, Project No., Department No., Location)
These may be related as follows:
An Employee works in a Department.
An Employee works on many Projects.
A Department handles many Projects.
Therefore, a system is needed that can organize the data effectively and also use it to
analyze and guide the operations of the company.
Nowadays, the amount of information to be stored is increasing tremendously, and so is
the need for a flexible and powerful system that can not only organize and maintain a
large collection of data effectively but also provide easy access to it.
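
The company example above can be sketched as a small relational schema. This is only an illustration using Python's built-in sqlite3 module; the table names, column names and sample rows are assumptions and do not come from the text.

```python
import sqlite3

# Hypothetical table and column names modelled on the company example.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE department (
    dept_no  INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    location TEXT
);
CREATE TABLE employee (
    emp_no  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    address TEXT,
    salary  REAL,
    dept_no INTEGER REFERENCES department(dept_no)  -- works in one department
);
CREATE TABLE project (
    project_no INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    location   TEXT,
    dept_no    INTEGER REFERENCES department(dept_no)  -- handled by one department
);
CREATE TABLE works_on (  -- an employee works on many projects
    emp_no     INTEGER REFERENCES employee(emp_no),
    project_no INTEGER REFERENCES project(project_no),
    PRIMARY KEY (emp_no, project_no)
);
""")

# Made-up sample rows.
conn.execute("INSERT INTO department VALUES (1, 'Research', 'Jaipur')")
conn.execute("INSERT INTO employee VALUES (24, 'Asha', 'Jaipur', 12000, 1)")
conn.executemany("INSERT INTO project VALUES (?, ?, ?, ?)",
                 [(100, "Alpha", "Jaipur", 1), (101, "Beta", "Delhi", 1)])
conn.executemany("INSERT INTO works_on VALUES (?, ?)", [(24, 100), (24, 101)])

# Which projects does employee 24 work on?
projects_of_24 = [row[0] for row in conn.execute(
    "SELECT p.name FROM project p JOIN works_on w ON p.project_no = w.project_no "
    "WHERE w.emp_no = 24 ORDER BY p.name")]
print(projects_of_24)
```

The separate works_on table is one common way to represent "an employee works on many projects"; other designs are possible.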

1.2 CONCEPT OF DBMS


This section covers basic definitions related to DBMS and explains its various
components. A Database Management System consists of two components:
(i) Database, i.e. a collection of data
(ii) System, i.e. a set of programs used to access and manage the database.
By combining these two components, a DBMS organizes information, maintains it and
retrieves it efficiently as and when required. So, to use and understand the system, both
a well-maintained database and a set of programs are needed.
1.2.1 Concept of Data and Database
The word data is derived from a Latin word meaning "to give"; thus data are given
facts from which additional facts can be inferred. Data are facts or undisputed
information used for different computations or calculations. For example, the facts
related to an employee in a company, such as Employee No., Name, Salary and Designation,
are data. When these data are retrieved or processed to answer questions
like:
What is the Employee No. of each employee whose salary is more than 10,000?
What is the name of the employee whose Employee No. is 24?
then they become information. Thus information is a processed form of data, and a database
is a logically coherent collection of data, not of information, with some inherent meaning.
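
The distinction between data and information can be illustrated with a few lines of Python. The employee records are made up for the purpose of the sketch; only the two questions come from the text.

```python
# Raw data: stored facts about employees (values are invented for illustration).
employees = [
    {"emp_no": 24, "name": "Asha", "salary": 9000},
    {"emp_no": 31, "name": "Ravi", "salary": 12000},
    {"emp_no": 47, "name": "Meena", "salary": 15000},
]

# Information: answers obtained by processing the data.
high_paid = [e["emp_no"] for e in employees if e["salary"] > 10000]
name_of_24 = next(e["name"] for e in employees if e["emp_no"] == 24)

print(high_paid)   # [31, 47]
print(name_of_24)  # Asha
```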
A database is a collection of interrelated data which represents some aspect of the real
world. A database has some inherent meaning and is related to a particular group of users
or applications. For example, the database of a college may contain data about students,
faculty, courses etc., which are related to each other through relations such as: faculty
teach students, students are enrolled in courses etc. Thus, we can say that a database
contains the data related to a real-world enterprise, and is designed, built and populated
with data for specific applications related to the enterprise.
1.2.2 Definition of DBMS
A database management system (DBMS) is essentially a collection of interrelated data
and a set of programs to access this data. This collection of data is usually called the
database. Database systems are designed to maintain large volumes of data. Management
of data involves:
• Defining the structures for the storage of data.
• Providing the mechanisms for the security of data against unauthorized access.
The primary objective of a DBMS is to provide an environment that is convenient and
efficient to use, in retrieving information from and storing information into the database.

The user of the DBMS is provided the following facilities among others:
• Adding empty files to the database.
• Inserting new data into the existing files.
• Retrieving data from the files.
• Updating data in the files.
• Deleting data from the files.
• Removing existing files from the database.
Therefore, besides data storage, a DBMS can be used for the following purposes:
(i) Providing efficient access to data.
(ii) Avoiding data redundancy and inconsistency.
(iii) Providing security of data.
(iv) Enforcing different integrity constraints.
(v) Providing concurrent access to data for multiple users.
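
The six user facilities listed in this section map naturally onto SQL statements. The following is a minimal sketch with Python's sqlite3 module, treating a table as a "file"; the student table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Adding an empty file (table) to the database.
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")
# Inserting new data.
conn.execute("INSERT INTO student VALUES (1, 'Asha')")
conn.execute("INSERT INTO student VALUES (2, 'Ravi')")
# Updating data.
conn.execute("UPDATE student SET name = 'Ravi Kumar' WHERE roll_no = 2")
# Deleting data.
conn.execute("DELETE FROM student WHERE roll_no = 1")
# Retrieving data.
remaining = conn.execute("SELECT roll_no, name FROM student").fetchall()
# Removing an existing file (table) from the database.
conn.execute("DROP TABLE student")

tables_left = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(remaining, tables_left)
```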

1.3 HISTORY OF DBMS


Since the early days of computing, most software applications have focused on the
manipulation of data. Hence the need arose for a system that helps in storing and
manipulating data. The first general-purpose DBMS was designed by Charles
Bachman in the early 1960s and was called the Integrated Data Store. It formed the basis
for the network data model and influenced database systems through the 1960s. In the late
1960s, IBM developed IMS (Information Management System), a commercially available
DBMS based on the hierarchical data model, which assumes all data relationships to
be structured as hierarchies. The Conference on Data Systems Languages (CODASYL) set
standards for network database products in 1969.
Dr. E. F. Codd, an IBM researcher, proposed the relational data model in a theoretical
paper in 1970. The publication of Codd's paper set off a flurry of activity in
both the research and the commercial system development communities, which worked to
bring out a relational DBMS. IBM developed a relational model prototype in 1976. In
the 1980s, the relational model became established as the standard approach for DBMS. SQL
was developed as a part of IBM's System R project and became a standard query
language. IBM released its first commercially available database product based on the
relational model, SQL/DS, for interactive operating systems in 1981, and produced DB2
for its mainframes with batch operating systems in 1982. SQL was standardized and
adopted as a query language by ANSI and ISO. Many developments took place in the
1980s and 1990s in the area of database systems, including the release of Paradox,
dBase, FoxPro and Access. Researchers also worked to develop more powerful
and richer data models that can support complex data types.
Later on, Enterprise Resource Planning (ERP) and Management Resource Planning
(MRP) packages evolved. Both of these packages identify a set of common tasks of a large
organization, e.g. human resource planning, inventory management etc., and provide a
general application layer to carry out these tasks. Thereafter, DBMS entered the
revolutionary age of the internet. Data stored in a DBMS can now be accessed with the
help of web browsers from anywhere and at any time. Stored data is provided on
the web in the form of HTML and XML documents. Around 2000, the fashionable area for
innovation was the XML database. XML databases aim to remove the traditional division
between documents and data, allowing an organization's information resources to be held
in one place, whether they are highly structured or not. Database vendors try to
develop more advanced DBMSs which can support complex data like video, streaming
data and digital libraries on the web. Thus, database systems have evolved from sequential
file access to the object-oriented database systems in use today.

1.4 FILE SYSTEM V/S DBMS


Initially, the computer systems used by an enterprise mainly performed data processing
tasks, i.e. inserting information about employees, retrieving information about the
employees of a particular department, performing accounting functions on employee
salaries etc. Since these systems performed routine record-keeping functions, they were
called data processing systems. Thus a data processing system is an automated system for
processing the data of an organization. The conventional data processing approach is to
develop a program (or many programs) for each application. This results in one or more
data files for each application (fig. 1.1). Some of the data may be common between files.
However, one application may require the file to be arranged on a particular field, e.g.
amount, while another may need a different arrangement. A major
drawback of the conventional method is that the storage and access techniques are built into
the program. Therefore, though the same data may be required by two applications, the
data will have to be stored in two different places because each application depends on the
way that the data is stored.
There are various drawbacks of the conventional data file processing environment. Some
of them are listed below.
(i) Data Redundancy and inconsistency
Some data elements, like name, address and identification code, are used in various
applications. Since the data is required by multiple applications, it is stored in multiple
data files, so in most cases the same data is repeated across files. This is referred to as data
redundancy, and it leads to higher storage and access cost. In addition, it may lead to data
inconsistency; that is, the various copies of the same data may no longer agree.
(ii) Difficulty in accessing the data
Suppose one of the university departments needs to find out the names of all students
who live in a particular city, but the software as originally designed does not provide any
report of this kind. There are two ways to meet this requirement:
either obtain the list of all students and extract the required information manually, or
ask a programmer to write the necessary application program. Both
alternatives are unsatisfactory. Conventional file-processing environments therefore
do not allow needed data to be retrieved in a convenient and efficient manner. More
responsive data-retrieval systems are required for general use.
(iii) Data Isolation
When data is scattered across different files, it becomes difficult to obtain information
that depends on a combination of files.

Figure 1.1 One to one correspondences between applications and data files

(iv) Integrity problems


The data values stored in the database must satisfy certain types of consistency
constraints. Suppose a university maintains the record of each student and requires that
the enrollment number of each student should be unique. Developers enforce these
constraints in the system by adding appropriate code in the various application programs.
However, when new constraints are added, it is difficult to change the programs to
enforce them.
(v) Atomicity problems
A computer system, like any other device, is subject to failure. In many applications, it is
crucial that, if a failure occurs, the data be restored to the consistent state that existed
prior to the failure. Consider a program to transfer Rs 1000 from account A to
account B. If a system failure occurs during the execution of the program, it is possible
that the Rs 1000 was debited from account A but not credited to account B. This results
in an inconsistent database state. Clearly, it is essential to database consistency that either
both the credit and the debit occur, or that neither occurs. That is, the fund transfer must be
atomic: it must happen in its entirety or not at all. It is difficult to ensure atomicity in a
conventional file-processing system.
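
The fund-transfer scenario can be sketched with a database transaction. This is a minimal illustration using Python's sqlite3 module; the account table, the starting balances and the fail_midway flag (which simulates a failure between the debit and the credit) are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 5000), ("B", 2000)])
conn.commit()

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move amount from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            if fail_midway:
                raise RuntimeError("simulated failure after the debit")
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
    except RuntimeError:
        pass  # the debit has been rolled back

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))

transfer(conn, "A", "B", 1000, fail_midway=True)
after_failure = balances(conn)   # unchanged: the partial debit was undone
transfer(conn, "A", "B", 1000)
after_success = balances(conn)   # both the debit and the credit applied
print(after_failure, after_success)
```

A simulated exception is of course not the same as a real crash; a real DBMS relies on logging and recovery to provide the same guarantee after a system failure.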
(vi) Concurrent-access anomalies
For the sake of overall performance of the system and faster response, many systems
allow multiple users to update the data simultaneously. Today the largest internet retailers
may have millions of accesses per day to their data by shoppers. In such an environment,
interaction of concurrent updates is possible and may result in inconsistent data. Consider
a registration program that maintains a count of students registered for a course, in order
to enforce limits on the number of students registered. When a student registers, the
program reads the current count for the course, verifies that the count is not already at the
limit, adds 1 to the count, and writes the count back to the database.
Suppose two students register concurrently, with the count at (say) 39. Both program
executions may read the value 39, and both would
then write back 40, leading to an incorrect increase of only 1, even though two students
successfully registered for the course and the count should be 41.
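
The lost update described above can be reproduced deterministically by interleaving the two registrations by hand. The sketch below uses Python's sqlite3 module with a hypothetical course table; it does not use real threads, it only replays the problematic order of reads and writes and then shows one common remedy, letting the database check and increment the count in a single statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE course (code TEXT PRIMARY KEY, registered INTEGER, capacity INTEGER)")
conn.execute("INSERT INTO course VALUES ('DB101', 39, 60)")
conn.commit()

def read_count(conn):
    return conn.execute(
        "SELECT registered FROM course WHERE code = 'DB101'").fetchone()[0]

# Unsafe interleaving: both registrations read 39 before either writes.
seen_by_first = read_count(conn)
seen_by_second = read_count(conn)
conn.execute("UPDATE course SET registered = ? WHERE code = 'DB101'",
             (seen_by_first + 1,))
conn.execute("UPDATE course SET registered = ? WHERE code = 'DB101'",
             (seen_by_second + 1,))
conn.commit()
lost_update_count = read_count(conn)  # 40, although two students registered

# Safer: reset to 39, then check the limit and increment in one statement each.
conn.execute("UPDATE course SET registered = 39 WHERE code = 'DB101'")
for _ in range(2):
    conn.execute("UPDATE course SET registered = registered + 1 "
                 "WHERE code = 'DB101' AND registered < capacity")
conn.commit()
correct_count = read_count(conn)  # 41
print(lost_update_count, correct_count)
```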
(vii) Security problems
Not every user of the database system should be able to access all the data. For example,
in a university, accounts personnel need to see only the part of the database that has
financial information. They do not need access to information about academic records.
But, since application programs are added to the file-processing system in an ad hoc
manner, enforcing such security constraints is difficult.

1.4.1 Advantage of DBMS over File System


A file system stores data as records in files that are managed by the operating system,
and uses application programs to extract information from the files.
A major advantage the database approach has over the conventional approach is that a
database system provides centralized control of data.
(i) Reduced Redundancy
Unlike in the conventional approach, each application does not have to maintain its own
data files. Data can be integrated and used by multiple applications at the same time.
(ii) Ensure Consistency
It is very difficult to maintain a consistent format of files in a file system. Different
programmers may use different programming languages and file formats, and the same
information may be duplicated in several files. This duplication results in higher storage
and access cost. In addition, it may lead to data inconsistency, i.e. the various copies of
the same information may not agree. For example, in an employee management system,
if the address of an employee is stored in two places and is updated in only one of them,
then the system will give conflicting information and become inconsistent. A DBMS helps
to keep the database consistent by providing a fixed format for data and by ensuring that
a change made to any entry is automatically applied to the other copies as well. This
process is known as propagating updates.
(iii) Data Manipulation Capabilities
A file system requires an application program for processing the data stored in files
according to the needs of the user. If the user's needs change, a different application
program is required. For example, consider the employee management system. Suppose we
want to find the names of employees in "Jaipur". With a file system, either a new
application program must be developed or we have to find the names of employees whose
city is Jaipur manually. This is not efficient, as developing a new
application program takes a lot of time, and it is possible that after the program is
developed our needs change from finding employees in Jaipur to finding employees in
"Malviya Nagar, Jaipur". A database system solves such problems by simply issuing
queries to the database as needed and retrieving the answers in response.
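
With a query language, the change of requirement in the Jaipur example is a change to one query rather than a new application program. A minimal sketch with Python's sqlite3 module follows; the employee table, its locality and city columns and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT, locality TEXT, city TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", [
    (1, "Asha", "Malviya Nagar", "Jaipur"),
    (2, "Ravi", "Vaishali Nagar", "Jaipur"),
    (3, "Meena", "Karol Bagh", "Delhi"),
])

# First requirement: employees in Jaipur.
in_jaipur = [r[0] for r in conn.execute(
    "SELECT name FROM employee WHERE city = ? ORDER BY name", ("Jaipur",))]

# Changed requirement: employees in Malviya Nagar, Jaipur. Only the query changes.
in_malviya_nagar = [r[0] for r in conn.execute(
    "SELECT name FROM employee WHERE city = ? AND locality = ? ORDER BY name",
    ("Jaipur", "Malviya Nagar"))]

print(in_jaipur, in_malviya_nagar)
```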
(iv) Data Independence (Reduced Programming Efforts)
In non-database systems, the requirements of the application dictate the way in which the
data is stored and the access techniques used. Moreover, knowledge of the organization of
the data and of the access techniques is built into the logic and code of the application.
Such systems are data dependent. For example, suppose the university
(mentioned previously) has an application that processes the student file. For
performance reasons, the file is indexed on the roll number. The application would be
aware of the existing index, and the internal structure of the application would be built
around this knowledge. Now suppose that, for some reason, the file is to be indexed on
the registration date. In this case, it is impossible to change the structure of the stored data
without affecting the application too. Such an application is a data-dependent one. It is
desirable to have data-independent applications. Suppose two applications X and Y need
to access the same file, but each requires a particular field to be stored
in a different format: application X requires the field "customer-balance" to be stored in
decimal format, while application Y requires it to be stored in binary format. This would
pose a problem in the old systems. In a DBMS, differences may exist between the way that
data is actually stored and the way that it is seen and used by a given application. To conform
to the changing requirements of the enterprise, the Database Administrator (DBA) may
need to change the storage structure or access techniques. The DBA should be able to do
this without having to modify the existing applications. If applications are data
dependent, programmer effort that could otherwise go into the creation of new
applications is needed to modify existing applications to match the changes
made.
(v) Atomicity and Transaction Management
A file system does not ensure the completion of a transaction, which may cause data
inconsistency. For example, consider an employee management system where the company
wants to shift an employee from the sales department to the finance department. This
transaction consists of two operations: decrementing the number of employees in the
sales department and incrementing the number of employees in the finance department.
In a file system, the combined execution of both operations cannot be guaranteed, as we
cannot make a single unit of these two operations; if only one operation is performed and
the system then crashes, the database becomes inconsistent. This problem is solved by a
database management system. It ensures that either the whole transaction, which may
combine more than one operation, is completed, or no operation is performed on behalf of
the transaction. This property is called 'atomicity'.
(vi) Security
A file system provides little fine-grained security for the stored data: access rights
cannot be defined for individual parts of a file, so the complete file is exposed to any user
who can open it. In a DBMS, the DBA has to
guarantee that only authorized persons have access to the database. The DBA defines the
security checks to be carried out. Different checks can be applied to different operations
on the same data. For instance, a person may have the right to query a file, but
may not have the right to delete or update that file. The DBMS allows such security
checks to be established for each piece of data in the database.
(vii) Integrity
Inconsistency between two entries can lead to integrity problems. However, even if there
is no redundancy, the database can still be inconsistent. For example, a student may be
enrolled in 10 courses in a semester when the maximum number of courses a student can
enroll in is 7. Another example could be that of a student enrolling in a course that is not
being offered that semester. Such problems can be avoided in a DBMS by establishing
certain integrity checks to be carried out whenever any update operation is done.
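
Both integrity checks mentioned above can be sketched in SQL. In this illustration (Python's sqlite3 module), a foreign key rejects enrollment in a course that is not offered, and a trigger rejects an eighth course in the same semester. The table names, the trigger and the sample data are assumptions; other DBMSs offer different mechanisms for the same purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only if enabled
conn.executescript("""
CREATE TABLE offered_course (code TEXT, semester TEXT, PRIMARY KEY (code, semester));
CREATE TABLE enrollment (
    student_id INTEGER,
    code       TEXT,
    semester   TEXT,
    PRIMARY KEY (student_id, code, semester),
    FOREIGN KEY (code, semester) REFERENCES offered_course(code, semester)
);
CREATE TRIGGER max_seven_courses BEFORE INSERT ON enrollment
WHEN (SELECT COUNT(*) FROM enrollment
      WHERE student_id = NEW.student_id AND semester = NEW.semester) >= 7
BEGIN
    SELECT RAISE(ABORT, 'at most 7 courses per semester');
END;
""")

courses = [(f"C{i}", "2024A") for i in range(1, 9)]  # eight offered courses
conn.executemany("INSERT INTO offered_course VALUES (?, ?)", courses)

def try_enroll(student_id, code, semester):
    """Return True if the DBMS accepts the enrollment, False if a check rejects it."""
    try:
        conn.execute("INSERT INTO enrollment VALUES (?, ?, ?)",
                     (student_id, code, semester))
        return True
    except sqlite3.DatabaseError:
        return False

results = [try_enroll(1, code, sem) for code, sem in courses]  # eighth one is rejected
not_offered = try_enroll(2, "C99", "2024A")                    # course not offered
print(results, not_offered)
```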

1.5 DISADVANTAGES OF DBMS


In spite of its many advantages, a DBMS does not prove to be a powerful or advantageous
system in certain scenarios, for the following reasons:
(i) Overhead of providing security, data integration, transaction management,
concurrency control etc.
(ii) More investment is required in hardware and software.
(iii) Special training is required to use a DBMS.
(iv) Its performance may not be adequate for certain specialized applications.
(v) Many applications may need to manipulate the data in ways not supported
by the query language.
So, it can be advantageous to use a file system in certain situations, such as:
(i) The database and application are simple and not expected to change.
(ii) Concurrent access is not required.
(iii) Real-time applications, as strict time constraints are not easy to meet with a DBMS.

1.6 DESCRIBING AND STORING DATA IN DBMS


A DBMS is always concerned with some real-world enterprise. Data stored in a DBMS
describe real-world entities and represent relationships between these entities. For
example, there are employees, departments and projects in a company, and the data in the
company database describe these entities in terms of their attributes and relationships to
other entities. Data can be described through different data models and at different levels
of abstraction.
1.6.1 Data Abstraction
Data abstraction is one of the fundamental characteristics of any database management
system; it helps make data easier to use. Abstraction refers to
the act of representing essential features without including background details or
explanations. So, data abstraction refers to the act of representing data without giving
details of how the data are stored or maintained. Data abstraction hides information that
is irrelevant at a particular level. The complexity of data is hidden through several levels
of abstraction so as to simplify user interaction with the system. The different levels of
abstraction are:
(i) Physical Level or Internal Level
It is the lowest level of abstraction, which specifies how data are
actually stored on disks or tapes. It specifies the manner in which records are stored,
either as a collection of pages or as a collection of records. Complex low-level data
structures are described in detail at this level. The design of the data structures described
at this level is called the physical schema. The data structures at this level may include
B-trees, B+ trees, hashing etc.
(ii) Logical Level or Conceptual View
The next higher level of abstraction describes what data are stored in the database, and
what relationships exist among those data. There is only one conceptual schema per
database. This schema also contains the method of deriving the objects in the conceptual
view from the objects in the internal view. The description of data at this level is in a
format independent of its physical representation. It also includes features that specify the
checks to retain data consistency and integrity. The logical level of abstraction is used by
database administrators, who decide what information is to be kept in the database.

(iii) View Level


It is the highest level of abstraction, which describes different views of the entire
database. These views are designed according to the requirements of users who want to
access only a part of the database. A database may have several views, according to the
demands of individual users or groups of users. The data in these views are not
stored separately in the DBMS; they are computed from the view definition when needed.
An analogy to the concept of data types in programming languages may clarify the
distinction among levels of abstraction. Most high-level programming languages support
the notion of a record type.
At the physical level, a customer, account or employee record can be described as a block
of consecutive storage locations, for example words or bytes. The language compiler hides
this level of detail from programmers. Similarly, the database system hides many of
the lowest-level storage details from database programmers.
At logical level, each such record is described by a type definition and the
interrelationship among these record types is defined. Programmers using a programming
language work at this level of abstraction. Similarly, database administrators usually
work at this level of abstraction.
Finally, at the view level, computer users see a set of application programs that hide
details of the data type. Similarly, at the view level, several views of the database are
defined, and database users see these views. In addition to hiding details of the logical
level of the database, the views also provide a security mechanism to prevent users from
accessing parts of the database.
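
A view that hides part of the logical level can be sketched as follows, using Python's sqlite3 module. The employee table and the employee_directory view are hypothetical names; the view exposes names and departments but not salaries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)",
                 [(1, "Asha", "Sales", 12000), (2, "Ravi", "Finance", 15000)])

# The view is computed on demand from the base table; it stores no rows itself.
conn.execute(
    "CREATE VIEW employee_directory AS SELECT emp_no, name, dept FROM employee")

cursor = conn.execute("SELECT * FROM employee_directory ORDER BY emp_no")
view_columns = [d[0] for d in cursor.description]
view_rows = cursor.fetchall()
print(view_columns, view_rows)
```

Note that SQLite itself has no per-user privileges, so this sketch only shows what a view exposes; in a multi-user DBMS, users would typically be granted access to the view and not to the underlying table.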

Figure 1.2 Database Abstraction

1.6.2 Instances and Schemas
The collection of information stored in the database at a particular moment is called an
instance of the database. The overall design of the database is called the database
schema. Schemas are changed infrequently. Database systems have several schemas,
partitioned according to the levels of abstraction. At the lowest level is the physical
schema, at the intermediate level is the logical schema and at the highest level is a
subschema. In general, a database system supports one physical schema, one logical
schema, and several subschemas.
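
The difference between a schema and an instance can be observed directly. In this sketch (Python's sqlite3 module, hypothetical student table), the schema is read from SQLite's sqlite_master catalog table and stays the same while the instance changes as rows are inserted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")

def schema(conn):
    # SQLite keeps the text of each table definition in its catalog.
    return conn.execute(
        "SELECT sql FROM sqlite_master WHERE name = 'student'").fetchone()[0]

def instance(conn):
    # The instance is the content of the database at this moment.
    return conn.execute(
        "SELECT roll_no, name FROM student ORDER BY roll_no").fetchall()

schema_before, instance_before = schema(conn), instance(conn)
conn.execute("INSERT INTO student VALUES (1, 'Asha')")
conn.execute("INSERT INTO student VALUES (2, 'Ravi')")
schema_after, instance_after = schema(conn), instance(conn)

print(schema_before == schema_after)   # the design did not change
print(instance_before, instance_after)  # the stored data did
```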
1.7 DATA INDEPENDENCE
The three levels of abstraction, along with the mappings from internal to conceptual and
from conceptual to external, provide two distinct levels of data independence: logical and
physical data independence.
1.7.1 Logical Data Independence
Logical data independence means that the conceptual/logical schema can be changed
without affecting the existing external (view) schemas. The change is absorbed by the
mapping between the external and conceptual (logical) levels. Consider a change in the
conceptual view such as merging two records into one or adding fields to an existing
record. This would require a change in the mapping from the external view to the
conceptual view so as to leave the external view unchanged. Some changes, such as the
deletion of a conceptual view field or record, may require changes in the external view
and application programs.
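
Logical data independence can be illustrated with a view acting as the external schema. In the sketch below (Python's sqlite3 module), the conceptual schema is reorganized: a field is added and the city moves to its own table. Only the view definition, i.e. the external-to-conceptual mapping, is rewritten, and the application query stays the same. All table, view and column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT, city TEXT);
INSERT INTO employee VALUES (1, 'Asha', 'Jaipur'), (2, 'Ravi', 'Delhi');
CREATE VIEW emp_view AS SELECT emp_no, name, city FROM employee;
""")

def application_query(conn):
    # The application only ever refers to the external view.
    return conn.execute(
        "SELECT name, city FROM emp_view ORDER BY emp_no").fetchall()

before = application_query(conn)

# Conceptual change: new field 'designation', city split into a separate table.
conn.executescript("""
DROP VIEW emp_view;
CREATE TABLE staff (emp_no INTEGER PRIMARY KEY, name TEXT, designation TEXT);
CREATE TABLE staff_address (emp_no INTEGER PRIMARY KEY, city TEXT);
INSERT INTO staff (emp_no, name) SELECT emp_no, name FROM employee;
INSERT INTO staff_address SELECT emp_no, city FROM employee;
DROP TABLE employee;
-- Only the mapping (the view definition) is rewritten.
CREATE VIEW emp_view AS
    SELECT s.emp_no, s.name, a.city
    FROM staff s JOIN staff_address a ON a.emp_no = s.emp_no;
""")

after = application_query(conn)
print(before == after)
```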
1.7.2 Physical Data Independence
Physical data independence means that the physical storage structures or devices used for
storing the data can be changed without necessitating a change in the conceptual view or
any of the external views. The change is absorbed by the mapping between the conceptual
and internal levels. Modifications at the physical level are occasionally necessary to
improve performance.
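
Adding an index is a typical physical-level change. The sketch below (Python's sqlite3 module, hypothetical student table and index name) runs the same query before and after creating an index: the query text and its result are unchanged, while the access path chosen by the DBMS may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(i, f"student{i}", "Jaipur" if i % 3 == 0 else "Delhi") for i in range(1, 31)])

QUERY = "SELECT roll_no FROM student WHERE city = 'Jaipur' ORDER BY roll_no"

def run(conn):
    return [r[0] for r in conn.execute(QUERY)]

def plan(conn):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]

rows_before, plan_before = run(conn), plan(conn)
conn.execute("CREATE INDEX idx_student_city ON student(city)")  # physical-level change
rows_after, plan_after = run(conn), plan(conn)

print(rows_before == rows_after)
print(plan_before, plan_after)
```

The exact wording of the plan output depends on the SQLite version, so it is printed here for inspection only.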

1.8 DATABASE LANGUAGES


Most database management systems provide specialized languages, called database
languages, to interact with the database and get work done with it.
The commands of these languages enable the user to operate and manage the
database efficiently. The importance of a database to the user depends on the ease with
which information can be obtained from it, and one of the biggest reasons for the
popularity of relational database management systems is that they allow a rich class of
questions to be asked of the database in an easy manner. Considering our sample
Employee database, a user may ask:
(i) Who is getting the highest salary?
(ii) How many employees are working under Mr. Ramesh?
(iii) Which department has the highest employee strength?
Such questions, which involve the data stored in a database, are called queries, and
database management systems provide a specialized language, called a query language, to
pose queries to the database.
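
The three sample questions can be written as SQL queries. This is a sketch using Python's sqlite3 module; the employee table, the manager_no column used to model "working under", and all sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE employee (
    emp_no INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL, manager_no INTEGER)""")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?, ?)", [
    (1, "Ramesh", "Sales", 30000, None),
    (2, "Asha", "Sales", 18000, 1),
    (3, "Ravi", "Sales", 16000, 1),
    (4, "Meena", "Finance", 35000, None),
    (5, "Kiran", "Finance", 15000, 4),
])

# (i) Who is getting the highest salary?
highest_paid = conn.execute(
    "SELECT name FROM employee ORDER BY salary DESC LIMIT 1").fetchone()[0]

# (ii) How many employees are working under Mr. Ramesh?
under_ramesh = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE manager_no = "
    "(SELECT emp_no FROM employee WHERE name = 'Ramesh')").fetchone()[0]

# (iii) Which department has the highest employee strength?
largest_dept = conn.execute(
    "SELECT dept FROM employee GROUP BY dept ORDER BY COUNT(*) DESC LIMIT 1").fetchone()[0]

print(highest_paid, under_ramesh, largest_dept)
```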
A DBMS provides data manipulation facilities such as retrieval of data, insertion of data,
modification of existing data and deletion of data. Such facilities are provided by Data
Manipulation Language (DML) commands. The query language is only one part of the DML.
A DBMS also supports commands that make changes to the structure of the
database, i.e. the schema; such commands are known as Data Definition Language (DDL)
commands. The DML and DDL are collectively known as a data sublanguage when
embedded within a host language. DDL commands make changes to the metadata (data
about data) stored in the data dictionary, while DML commands retrieve data from a
database (query) or make changes to an instance of the database.
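
The split between DDL and DML can be sketched as follows with Python's sqlite3 module. The DDL statements change the metadata (read back here through SQLite's PRAGMA table_info), while the DML statements change only the rows. The employee table and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def columns(conn):
    # PRAGMA table_info reads metadata; field 1 of each row is the column name.
    return [row[1] for row in conn.execute("PRAGMA table_info(employee)")]

# DDL: defines or changes the schema (metadata).
conn.execute("CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("ALTER TABLE employee ADD COLUMN salary REAL")
columns_after_ddl = columns(conn)

# DML: queries or changes the instance; the schema stays the same.
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 12000)")
conn.execute("INSERT INTO employee VALUES (2, 'Ravi', 15000)")
conn.execute("UPDATE employee SET salary = salary + 1000 WHERE emp_no = 1")
conn.execute("DELETE FROM employee WHERE emp_no = 2")
columns_after_dml = columns(conn)
rows_after_dml = conn.execute("SELECT * FROM employee").fetchall()

print(columns_after_ddl, columns_after_dml, rows_after_dml)
```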
The relational model supports a powerful language based on mathematical logic, called
relational calculus, which is a nonprocedural language for defining query solutions.
Queries in this language have a precise meaning.
In short, we need special-purpose languages to
make the DBMS software understand our needs and manage the data stored in the
databases accordingly. These languages are known as database languages
or query languages.
Database languages are used to perform a variety of critical tasks that help a database
management system function correctly, such as
reading, updating, inserting, searching or deleting the data stored in the database.
1.11 DATABASE USERS
Different people use the database in different ways. The interaction between a user
and the database may be of several types, depending on the user. Users can be
differentiated by the way they expect to interact with the DBMS:
1.11.1 Database Administrator
The databases of an enterprise are typically important enough and complex enough that
the task of designing and maintaining them requires a professional, called the Database
Administrator (DBA). The DBA has central control over the system. The database
administrator is responsible for the following functions:

(i) Schema Design and Maintenance


The DBA creates the database schema after interacting with the users of the system and
analyzing what data are to be stored in the database. He or she writes a set of definitions
that are translated by the DDL compiler into a set of tables, which are stored in the data
dictionary.
(ii) Physical Schema and Organisation Modification
DBA is responsible for designing physical schema and deciding how the data will be
stored on the physical media. DBA defines storage structure and access methods by
writing a set of definitions.
(iii) Authorization and Security
The DBA is responsible for ensuring that unauthorized data access is not permitted. The
DBA also decides which parts of the database various users can access and in which mode
(read, write or both). The database administrator grants different types of authorization to
different users. For example, a clerk may be authorized to view the salaries of
employees but not to update them; this authority may be given only to
the accounts officer. So different access rights are given to different users according
to requirement.
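
The clerk and accounts officer example amounts to a table of who may access what, and in which mode. The following plain-Python sketch only illustrates that idea; the PRIVILEGES structure and the is_allowed function are hypothetical and do not correspond to the authorization mechanism of any particular DBMS.

```python
# Hypothetical role-to-privilege table, as a DBA might define it.
PRIVILEGES = {
    "clerk": {"salary": {"read"}},
    "accounts_officer": {"salary": {"read", "write"}},
}

def is_allowed(role, item, mode):
    """Return True if the role may access the data item in the given mode."""
    return mode in PRIVILEGES.get(role, {}).get(item, set())

print(is_allowed("clerk", "salary", "read"))              # True
print(is_allowed("clerk", "salary", "write"))             # False
print(is_allowed("accounts_officer", "salary", "write"))  # True
```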
(iv) Integrity Constraint Specification
Data stored in the database must satisfy certain constraints. For example, in an employee
database, Employee No. must be unique for each employee and the Address value must not
be left blank. These constraints are called integrity constraints, and it is the responsibility
of the DBA to identify all such constraints and apply them to the database.
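
The two constraints named above can be declared once in the table definition and are then enforced by the DBMS on every insert. A minimal sketch with Python's sqlite3 module follows; the table layout and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE employee (
    emp_no  INTEGER PRIMARY KEY,                     -- must be unique
    name    TEXT NOT NULL,
    address TEXT NOT NULL CHECK (address <> '')      -- must not be left blank
)""")

def try_insert(row):
    """Return 'accepted' or a message describing the violated constraint."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

outcomes = [
    try_insert((24, "Asha", "Malviya Nagar, Jaipur")),
    try_insert((24, "Ravi", "Karol Bagh, Delhi")),  # duplicate Employee No.
    try_insert((25, "Meena", None)),                # missing address
    try_insert((26, "Kiran", "")),                  # blank address
]
for outcome in outcomes:
    print(outcome)
```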
(v) Recovery From Failure
The DBA is responsible for taking the steps required to restore the database if
the system fails. The DBA should take backups of the database from time to time and
maintain logs of system activities so that recovery is possible.
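
Taking a backup and restoring from it can be sketched with the backup method of Python's sqlite3 connections (available since Python 3.7). Both databases are kept in memory here to keep the example self-contained; in practice the backup would go to a file or another machine, and log-based recovery is a separate mechanism not shown here.

```python
import sqlite3

live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT)")
live.execute("INSERT INTO employee VALUES (1, 'Asha')")
live.commit()

# Take a backup copy of the whole database.
backup = sqlite3.connect(":memory:")
live.backup(backup)

# Later, the live data is damaged (simulated here by deleting every row).
live.execute("DELETE FROM employee")
live.commit()
damaged = live.execute("SELECT COUNT(*) FROM employee").fetchone()[0]

# Restore by copying the backup over the live database.
backup.backup(live)
restored = live.execute("SELECT emp_no, name FROM employee").fetchall()
print(damaged, restored)
```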
(vi) Database Upgradation
The DBA keeps track of and analyzes the changing requirements of users and
upgrades the database accordingly.
1.11.2 Application Programmers
They are computer professionals who write application programs. They embed DML
calls in a host language program. These DML calls are converted into normal
procedure calls of the host language by the DML precompiler. The resulting program is
then run through the host language compiler, which generates the appropriate object code.
1.11.3 Sophisticated Users
They work like analysts and submit queries to a query processor or directly to the
DBMS, which breaks down each query into instructions that the storage manager
understands.
1.11.4 Specialized Users
Specialized users write specialized database applications such as computer-aided design
systems, knowledge-base and expert systems etc. Such systems differ from the
traditional data processing framework and use complex data types.

1.11.5 Naive Users


They use the DBMS only through application programs that have been written previously.
For example, the clerk at a ticket booking window uses an application program to
make reservations for passengers.
