Dbms Book
Concepts of Database
Management Systems
Structure
1.0 INTRODUCTION TO DBMS
1.1 OBJECTIVES
1.2 DATA PROCESSING – AN IMPORTANT ASPECT OF ANY BUSINESS
1.2.1 Data and Information
1.2.2 Data / Information Processing and Databases
1.2.3 Data – Types and Properties
1.2.3.1 Data Types
1.2.3.1.1 Data Representation
1.2.3.1.2 Data Size
1.2.4 Data Organization and Grouping
1.2.4.1 Character
1.2.4.2 Field
1.2.4.3 Record
1.2.4.4 File
1.2.4.5 Database
1.3 DATABASES AND THEIR MANAGEMENT
1.3.1 Objectives of DBMS
1.3.2 Components of DBMS
1.3.3 Types of Databases
1.3.3.1 Operational Database
1.3.3.2 Analytical Database
1.3.3.3 Distributed Database
1.3.3.4 Personal end user Database
1.3.3.5 Multimedia Database
1.3.3.6 Special Purpose Database
1.3.4 Database Models
1.4 Storage of information
1.4.1 Operational unit
1.4.2 Storage unit
1.4.3 External storage unit
1.5 Record and Record Organization
1.5.1 Definition and concepts
1.5.2 Record Organization
1.6 File and File Organization
1.6.1 Structure of sequential file
1.6.2 Processing of sequential file
1.7 Index Sequential file
1.7.1 Structure of Index sequential file
1.8 Direct file Organization
1.9 Summary
1.10 Questions
1.11 Further Readings
INTRODUCTION TO DBMS
INTRODUCTION :
A data structure can be defined as a specification of how data is organized. Different
data structures, such as the array, stack, queue, tree and graph, are used to organize data
in main memory. Several strategies are used to support the organization of data in secondary
memory. In this unit we will look at the different strategies available for organizing data in
secondary memory. We will also learn about data representation for files on external
storage devices, so that the required operations (e.g. retrieval, update) may be carried out
efficiently.
1.1 OBJECTIVES:
8. External storage unit ( i.e. Secondary memory)
Business organizations, big and small, generate a lot of data through the activities
they perform. Even individuals need to handle a lot of data in their day-to-day life. A
simple example is the address book that we all maintain. In this book we keep
information such as names, addresses and phone numbers for all the people with whom we
interact. Without this book, it would be very difficult for us to carry on our day-to-day
activity of contacting and communicating with our friends, relatives and business associates.
As the size of a business organization increases, the amount of data it generates increases
rapidly. Hence the need for storing and using this data also rises manyfold. Modern
businesses have recognized this need and duly stress the importance of data as a vital
resource for conducting business profitably. Two terms, Data and Information, are used in
this connection. Let us understand their scope and difference.
The observed data is usually represented by symbols such as numbers, words and codes
(composed of a mixture of numerical, alphabetical and other characters). It could even
take other forms like voice, images, pictures, drawings, etc.
If the observed / collected data is converted into a useful and meaningful form, then
it becomes Information. Data is usually subjected to a value-added process called Data
Processing or Information Processing, where
1. its form is aggregated, manipulated and organized,
2. its content is analyzed and evaluated, and
3. it is presented in a context meaningful to a human user.
Thus we see that information is processed data, placed in a context that gives it
value for specific end users, as shown below.
[Figure: raw Sales data passes through Data Processing to become information, i.e. sales
totalled by region and salesperson: North: Rs 95,000 (Ramesh Rs 60,000, Saxena Rs 35,000);
South: Rs 50,000 (Narayan Rs 4,000, ...).]
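The step from data to information in the figure above can be sketched in a few lines of Python. This is only an illustration: the individual rows and the function name total_by_region are assumptions, chosen so that the North total matches the figure.

```python
from collections import defaultdict

# Illustrative raw data: (region, salesperson, amount in rupees).
# These rows are assumptions, not the complete data behind the figure.
sales = [
    ("North", "Ramesh", 60000),
    ("North", "Saxena", 35000),
    ("South", "Narayan", 4000),
]

def total_by_region(rows):
    """Aggregate raw sales rows into one total per region."""
    totals = defaultdict(int)
    for region, _person, amount in rows:
        totals[region] += amount
    return dict(totals)

region_totals = total_by_region(sales)
print(region_totals)
```

The raw rows are data; the totals per region, presented to a manager, are information.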
Thus we see that data storage and retrieval is one of the central activities in
information processing. Such a collection and organization of information is called a “Data
bank”. In the early days of business, data banks existed in the minds of key personnel in the
business. As the volume and complexity increased, several tools like books, records,
manuals, drawings etc. were devised as “data banks”, and manual procedures and skills
evolved to retrieve information from these banks when needed.
However, these techniques were not reliable and fast enough when the information
involved was huge and complex. Hence business decisions could not be accurate and
timely. To correct this shortcoming, information systems were computerized. The speed and
accuracy of computers resulted in a tremendous improvement in the reliability and timeliness
of the information generated. This process, however, involved the development of techniques
and tools to handle data banks on computers, namely, the tools to store and retrieve
information in computers. The development of such techniques and tools resulted in what
are known as DBMS packages.
c. Database Maintenance - Add, delete, update, and correct the data in a database.
Let us try and understand these tasks in detail later. First let us start a detailed study
of Data.
All data items have certain fundamental properties. It is important to know them first
in order to create data banks. The first and foremost property of data is its form. Every
data element has a form. Data items are classified into different data types based on
their form. The form decides the way the item is stored in the computer.
1.2.3.1 Data Types:
Data can be classified as Textual, Picture or Voice data based on its form. The last 2
types, namely picture and voice, are special forms of data and are normally used less
frequently. It is textual data that is largest in volume and most used. Hence let us focus on
that first. Textual data can be numeric or alphanumeric (a combination of numeric and
alphabetic).
Numeric data consists of numbers.
Example:
As you can notice from the examples, pure numeric data items can be classified
further into 2 types. One of them is the whole number (like the number of students in a class,
or the number of vehicles in the city). These are called integers. On the other hand, we also
have numeric data that includes fractions (like the price of an item is 48.56, or the maximum
temperature today was 28.32). These data items are called real numbers. This
difference between the data types integer and real number is of importance to us because
they are represented and manipulated differently in a computer.
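As a rough illustration of why the distinction matters, the sketch below packs one integer and one real number into raw bytes using Python's standard struct module. The 4-byte integer and 8-byte floating-point layouts are assumptions chosen for the example; actual sizes depend on the system and the DBMS.

```python
import struct

students_in_class = 48   # whole number -> integer
item_price = 48.56       # number with a fraction -> real (floating point)

# Pack each value into its raw bytes: a 4-byte signed integer and an
# 8-byte IEEE 754 double, both in big-endian byte order.
int_bytes = struct.pack(">i", students_in_class)
real_bytes = struct.pack(">d", item_price)

print(int_bytes.hex())    # bit pattern of the integer, shown in hexadecimal
print(real_bytes.hex())   # bit pattern of the real number, shown in hexadecimal
```

The two values are close in magnitude, yet their stored bit patterns have different lengths and layouts, which is why the computer needs to know the type before it can manipulate the data.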
The next data type is alphabetic or alphanumeric. This type of data is made up of
alphabetic and numeric characters. (E.g.: the name of a person is HARI, the Reg. No. of
a vehicle is KA – 09 F-1234.) This type of data may contain numbers along with letters,
but the number is not used as numeric data in any calculation. This data type is called a
string of alphanumeric characters. How are these data represented inside a computer?
Example :
Letter A could be 01000001 (its 8-bit ASCII code);
Even pictorial and voice data gets coded into a large number of 0’s and 1’s.
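The coding of characters into 0's and 1's can be tried out directly. The sketch below assumes the ASCII code with 8 bits per character (the text does not name a particular code), and the helper name to_bits is hypothetical.

```python
def to_bits(text):
    """Return the 8-bit ASCII code of each character, separated by spaces."""
    return " ".join(format(byte, "08b") for byte in text.encode("ascii"))

print(to_bits("A"))      # 01000001
print(to_bits("HARI"))   # one 8-bit group per letter
```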
All data items have a size. Looking at the previous examples, we may say that the number of
students in a class needs 2 digits of space, and the price of an item may need 4 digits of space
(2 before the decimal point and 2 after; the decimal point itself need not be stored). A name
string may need a maximum of 30 character positions. Further, when it is stored inside
a computer at 8 bits per character, it needs 30 x 8 = 240 bits. Picture data may need several
thousand bit positions. The property size is of special importance to us because we need to
provide adequate space to store these items in the system. Further, DBMS packages should be
able to distinguish these data types and provide the necessary functions to manipulate them.
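The size arithmetic for the name string can be checked with a small sketch. It assumes a one-byte-per-character code such as ASCII, and the helper fixed_width is a hypothetical name used only for this illustration.

```python
NAME_WIDTH = 30     # maximum character positions reserved for a name
BITS_PER_CHAR = 8   # assumes a one-byte-per-character code such as ASCII

def fixed_width(value, width):
    """Pad (or truncate) a string so that it occupies exactly `width` characters."""
    return value[:width].ljust(width)

stored_name = fixed_width("HARI", NAME_WIDTH)
name_bits = len(stored_name.encode("ascii")) * BITS_PER_CHAR
print(name_bits)    # 240
```

Even a short name like HARI occupies the full 240 bits, because space is reserved for the longest name allowed.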
1.2.3.3 Relationship
Even though data items are individual entities, they never occur in isolation in the
real world. They are always associated with other data items. Ex: the data item price is
related to the vehicle in question, the date of transaction and the seller.
There are 3 different types of data relationships. Let us understand each one of
them.
The simplest of all is the 1 : 1 relationship. For each value of a data item there is one and
only one corresponding value of the other item.
E.g.: Student ID and the student name.
Normally all such data items are grouped and kept together as a record.
The second type of relationship is one to many (1 : M). Here, for every value of one data
item there are several values of the other data item. In the reverse direction, however, each
value of the other data item is related to a unique value of this data item.
2. A person can own several vehicles; each vehicle will have only one
owner.
One to many relationships can be represented in computers using pointers and
arrays. (Details later)
The third type of relationship is called many to many (N : M). Most of the relationships
in the real world are of this type.
E.g.: - 1. A student has several teachers; A teacher might have several
students.
The Database must maintain all the data and their relationships and allow the user to
access data based on these relations.
E.g.: Get me all vehicles owned by a person. Get me the subjects taught by a
teacher.
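The two sample requests above can be sketched with ordinary in-memory structures. This is only an illustration of how relationships might be held and queried; the owner, vehicle, teacher and subject values are made up, and a real DBMS stores and indexes such relationships quite differently.

```python
# One to many: each owner maps to a list of vehicles (values are hypothetical).
vehicles_by_owner = {
    "Hari": ["KA-09 F-1234", "KA-09 M-5678"],
    "Rama": ["KA-05 A-1111"],
}

# Many to many: held as a list of (teacher, subject) pairs.
teaches = [
    ("Watson", "Physics"),
    ("Watson", "Maths"),
    ("Baker", "Maths"),
]

def vehicles_owned_by(owner):
    """Get me all vehicles owned by a person."""
    return vehicles_by_owner.get(owner, [])

def subjects_taught_by(teacher):
    """Get me the subjects taught by a teacher."""
    return sorted(subject for t, subject in teaches if t == teacher)

def teachers_of(subject):
    """The reverse direction of the same many to many relationship."""
    return sorted(t for t, s in teaches if s == subject)
```

Holding the many to many relationship as a list of pairs lets it be followed from either side, which a simple owner-to-list mapping cannot do without duplicating data.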
The grouping of related data items from user’s view is called logical grouping.
The grouping of data items from the point of view of its storage inside the computer is
called physical grouping.
Just as writing is organized in letters, words, sentences, paragraphs and chapters,
Data can be organized as characters, fields, records, files and databases.
1.2.4.1 Character:
Character is the most basic logical data element, which consists of a single
alphabetic / numeric or other symbol.
1.2.4.2 Field:
Field is the next higher level of data. A field consists of a grouping of characters.
1.2.4.3 Record:
Related data fields are grouped to form a RECORD. A record is thus a collection of
attributes that describe an entity.
E.g.: 1. An employee record could consist of attributes like his ID, name,
the salary he draws, etc.
2. Set of subjects taught for a class during each hour.
1.2.4.4 File:
E.g.: 1. A group of all employee records, showing one record for each
employee, could be an employee file.
2. Timetable for a class for a week showing subjects taught each hour on
each day of the week.
Files are frequently classified by the application for which they are primarily used
such as payroll file, Inventory file etc.
1.2.4.5 Database:
E.g.: 1. The timetable for an entire school showing the details of classes,
subjects, rooms, teachers etc.
A Personnel database consolidates data files like, Payroll files, Personnel action files,
employee skill files etc.
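The levels of organization described above (character, field, record, file, database) can be pictured with a short sketch. The class and variable names and the sample values are assumptions made for illustration only.

```python
from dataclasses import dataclass

@dataclass
class EmployeeRecord:      # a record: related fields describing one entity
    emp_id: int            # field
    name: str              # field, itself a group of characters
    salary: float          # field

# A file: one record for each employee.
employee_file = [
    EmployeeRecord(1, "HARI", 25000.0),
    EmployeeRecord(2, "RAMA", 30000.0),
]

# A database: several related files consolidated under one name.
personnel_database = {
    "employee_file": employee_file,
    "payroll_file": [],    # left empty in this sketch
}

first_char = employee_file[0].name[0]   # a single character, the lowest level
```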
[Figure: a DATABASE consolidating files such as the Payroll File and the Inventory File;
each User accesses it through Application Programmes.]
Creation of database involves specifying data types, structures and their relationship
constraints for the data stored in database.
The basic entities in this example are subjects, courses, teachers, rooms, students
etc.; there will be associations or relationships linking these entities.
A teacher may teach several subjects. Several teachers may teach a subject.
7. Protect data from physical hardware failure and unauthorized access.
DBMS packages on personal computers allow end users to develop databases for
their personal needs. These are called single user databases. However, large organizations
with many users usually place control of enterprise database development in the hands of
DATABASE ADMINISTRATORS (DBAs) and other specialists. This improves the
integrity and security of organizational databases. Database developers use a DATA
DEFINITION LANGUAGE (DDL) to specify data structures and relationships, and to modify
these structures if needed. The detailed information about these structures is called
METADATA. It is stored in the DATA DICTIONARY component of the DBMS, which is
maintained by the DBA.
Users are allowed to insert, modify, delete and retrieve data from the database
according to their needs. They use a DATA MANIPULATION LANGUAGE (DML) for
this purpose. Further, the DBA needs to guard the database against media failures, accidental
erasure etc. For this purpose, he creates copies of the database and of the changes occurring
to it, for later recovery in case of failure. He uses DATABASE UTILITIES to handle these
functions of backup and recovery.
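The difference between DDL and DML can be seen in a small sketch using SQL through Python's built-in sqlite3 module. SQLite is chosen here only because it needs no installation; the employee table and its columns are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database

# DDL: define the structure of the data.
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL, salary REAL)"
)

# DML: insert, modify, delete and retrieve data.
conn.executemany(
    "INSERT INTO employee (id, name, salary) VALUES (?, ?, ?)",
    [(1, "HARI", 25000.0), (2, "RAMA", 30000.0)],
)
conn.execute("UPDATE employee SET salary = salary + 1000 WHERE id = 1")
conn.execute("DELETE FROM employee WHERE id = 2")
rows = conn.execute("SELECT id, name, salary FROM employee").fetchall()

# Metadata: SQLite records table definitions in its sqlite_master catalog,
# which plays a role similar to a data dictionary.
metadata = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
).fetchall()
```

After these statements, rows holds the single remaining employee with the updated salary, and metadata holds the stored definition of the employee table.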
Analytical databases contain information extracted from operational databases. They are
used by managers to study the trends and patterns emerging in the business, in order to make
strategic decisions and set policy. They are also known as data warehouses,
information databases and decision support databases. They are generally used in query
mode rather than update mode. Techniques like Online Analytical Processing (OLAP) and
Data Mining are used in these databases to generate meaningful information for business
analysis, market research etc.
Multimedia databases include non-conventional data like pictures and voice tracks along
with conventional alphanumeric data. These databases tend to be huge in size, and access is
done through specialized access language constructs. The data accessed further needs to
be interpreted and displayed by additional front-end software like browsers and media
players. From the database management viewpoint, the set of interconnected multimedia data
needs to be handled as specialized structures rather than simple records.
Special purpose databases are developed and used for certain special purpose applications.
Spatial databases, temporal databases, biological databases etc. belong to this category.
The data stored in these applications is of a different kind and needs to be interpreted
according to the ground rules of those applications. Hence special techniques are used for
the storage and access of data in these databases.
1.3.4 Database Models:
Databases are distinguished based on the conceptual model of data and the
underlying relationships among them. All models try to represent data and their
relationships using simple elegant models.
An early data model, widely used in the 1970s, was the HIERARCHICAL Model, which
captures the intuitive hierarchy of data elements. The user is allowed to navigate
through the data structures using “tree-like” hierarchies. The early generation
database from IBM, namely IMS, is based on this model. Hierarchical models cannot
represent many to many relationships in an elegant fashion. Such data relationships
resulted in cumbersome structures with a lot of duplication of data and slow access.
To get over these limitations, the CODASYL committee proposed the NETWORK MODEL,
which was in use through the 1970s and 1980s. IDMS from Cullinet and DMS 1100 from Unisys
Corporation are typical representatives of this generation of databases. While the network
model provided much more abstraction power and very good performance for large volumes of
data, it lacked elegance. It required a high level of skill to use these databases. Further,
it was difficult to alter the structures dynamically.
In 1970, E. F. Codd of IBM proposed an elegant and flexible RELATIONAL MODEL. Its
elegance, simplicity and solid theoretical foundation made it the darling of database
developers and users. Today, this is the most popular database model, available on a range
of machines from PCs to mainframes. DB2 of IBM, ORACLE, INFORMIX, ACCESS, LOTUS etc.
are all based on this popular model. DBMSs built using this model use SQL (Structured Query
Language) as the means to create and manipulate data. SQL is an elegant, simple yet powerful
interface to all relational databases. Present day RDBMSs provide support for several other
tools and utilities to ease application development. The most common utilities are:
A Report Generator to access data and present it in a printed format suitable for the end
user.
E.g.: ORACLE REPORT GENERATOR
Utilities to load and extract bulk data from the database, provided to speed up
data loading and extraction.
DBA utilities to manage security and limit access to data.
Current generation DBMS packages provide most of these above utilities along with
some more to manage Databases effectively. They in fact, create a total environment
under which the user can comfortably handle all his information processing needs.
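To make the relational model and SQL described in this section concrete, here is a minimal sketch of the many to many relationship between teachers and subjects. It uses SQLite through Python's built-in sqlite3 module; the table names, column names and sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teacher (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE subject (id INTEGER PRIMARY KEY, title TEXT);
    -- The many to many relationship becomes a table of its own.
    CREATE TABLE teaches (
        teacher_id INTEGER REFERENCES teacher(id),
        subject_id INTEGER REFERENCES subject(id),
        PRIMARY KEY (teacher_id, subject_id)
    );
    INSERT INTO teacher VALUES (1, 'Watson'), (2, 'Baker');
    INSERT INTO subject VALUES (1, 'Maths'), (2, 'Physics');
    INSERT INTO teaches VALUES (1, 1), (1, 2), (2, 1);
""")

def subjects_of(teacher_name):
    """Return the subjects taught by the named teacher, in alphabetical order."""
    query = """
        SELECT s.title
        FROM subject AS s
        JOIN teaches AS t ON t.subject_id = s.id
        JOIN teacher AS te ON te.id = t.teacher_id
        WHERE te.name = ?
        ORDER BY s.title
    """
    return [title for (title,) in conn.execute(query, (teacher_name,))]
```

Unlike the hierarchical model, no data is duplicated here: each teacher and each subject is stored once, and the teaches table records only the links between them.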
3.0
3.1 Three functions of Data Processing are
------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------
3.2 List 4 tasks handled by DBMS
------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------
3.3 List 3 important properties of data
------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------
3.4 Different data structures in the order of complexity are
------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------
3.5 Six objectives of DBMS are
------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------
ANSWERS
1.0
1.1 True
1.2 False
1.3 False
1.4 True
1.5 True
2.0
2.1 (d), 2.2 (a), 2.3 (b), 2.4 (c), 2.5 (a)
3.0
3.1 1. Aggregate, manipulate and organize the data,
2. Analyze and evaluate contents and
3. Present it in a context meaningful to user.
3.2 a) Database development
b) Database Integration
c) Database maintenance
d) Application Development
3.3 Type
Size
Relationship
3.4 Character
Field
Record
File
Database
3.5 a. Provide for mass storage of relevant data.
b. Make access to the data easier to user
c. Provide prompt response to the user’s request for data.
d. Allow for the modification of data in a consistent manner.
e. Eliminate or reduce the redundant data.
f. Allow multiple users to be active at a time.
Storage, Record and File
Organization
1.4 The Storage of information
In a digital computer there are two types of memory units, namely operational units
and storage units. The name commonly associated with an operational unit is
register. A register is used for the temporary storage and manipulation of information.
The storage type of memory unit is designed to store information that is more
permanent in nature. For example, a particular storage unit or set of storage units is
associated with a particular variable in a program; a variable is so called because the
value or quantity it holds can vary during execution.
When a program is executed, its instructions and data generally reside in storage units.
The entire set of storage units in the main frame, or main part, of the computer is often
called main memory. In some instances programs can also reside in storage units that
do not belong to main memory. Examples of such storage devices (often called
secondary storage devices) are the magnetic disk and the magnetic drum.
The data in the main memory, or internal memory, of a computer can be accessed very
quickly; a typical access time is less than 1 microsecond (10^-6 sec). Main memory
provides for the immediate storage requirements of the central processor during the
execution of a program.
The storage capacity of main memory is limited by two factors: the cost of memory
and the technical problems in developing large capacity main memory. The storage
requirements for programs and the data on which they operate exceed the capacity of
main memory in virtually all computer systems. Therefore, it is necessary to extend the
storage capabilities of a computer by using devices external to main memory.
1.5.1 Definition and concepts
This section gives a comprehensive and consistent overview of the hierarchy of information
structures associated with file processing.
A collection of records involving a set of entities with certain aspects in common and
organized for some particular purpose is called a file. For example the collection of all
passengers on a particular flight constitutes a file.
A record item that uniquely identifies a record in a file is called a key. In the passenger
file, individual passenger records can be uniquely identified by the passenger's name,
assuming duplicate names do not occur for a particular flight. The seat number item can also
be used as a key, if desired, since seat numbers are uniquely assigned for a given flight.
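The role of a key can be sketched as follows. The passenger records and the helper name build_key_lookup are hypothetical; the point is only that a key item must have a different value in every record of the file.

```python
# Passenger file for one flight; each record is a set of items (made-up data).
passenger_file = [
    {"seat": "12A", "name": "Watson", "meal": "veg"},
    {"seat": "12B", "name": "Rama", "meal": "non-veg"},
]

def build_key_lookup(records, key_item):
    """Map each key value to its record, rejecting duplicate key values."""
    lookup = {}
    for record in records:
        key = record[key_item]
        if key in lookup:
            raise ValueError(f"{key_item} {key!r} is not unique")
        lookup[key] = record
    return lookup

by_seat = build_key_lookup(passenger_file, "seat")
print(by_seat["12B"]["name"])   # Rama
```

An item such as meal could not serve as a key, because two passengers may well have the same value for it.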
ordered by customer account number with more than one occurrence of a
customer sales record type for a given account number.
Thus we have observed a hierarchy of information structures in which items are
composed to form records and records are composed to form a file. If a set of files is used
by the application programs of some particular enterprise or application area, and if
these files exhibit certain associations or relationships between their records, then
such a collection of files is often referred to as a database or data bank. Figure 1.3 shows
the information structure hierarchy as it applies to a file processing application.
[Figure 1.3: Information structure hierarchy (database, files, records, items)]
Let us examine some of the factors that affect the organization of a file. The
prime factor that determines the organization of a file is the nature of the operations that
are to be performed on the file, as dictated by the applications. The operations normally
performed are retrieval, addition, deletion and update. A particular operation
involving a record or set of records is called a transaction.
E.g.: Delete Rama from the student list for the 1st Year is a transaction.
Add Watson to the student list for the 1st Year is another.
1.5.2 Record Organization
In a relational database, records of distinct relations are generally of different sizes.
One approach to mapping the database to files is to use several files and store records of
only one fixed length in any given file. An alternative is to structure our files in such a way
that we can accommodate multiple lengths for records. Files of fixed length records are
easier to implement than files of variable length records.
If a record item has a fixed length value and its domain is too large for an efficient
encoding, a primitive data-structure format (i.e., integer, real, char) should be selected for
the representation of the item. For example, it is unreasonable to bit-encode an item
representing the net sales for the month. We can declare a record containing such an item
in the programming language being used.
Record : Monthly_Report
Because both of these items may be considered as fixed length items, they can
technically be called precoordinated. That is, a fixed length item can only have a finite set
of values, which can be enumerated a priori.
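A fixed length record built from primitive formats can be sketched with Python's standard struct module. The layout below (a 4-byte month number followed by an 8-byte net sales amount) is a hypothetical one, since the items of the original Monthly_Report declaration are not shown here.

```python
import struct

# Hypothetical layout: a 4-byte integer month followed by an 8-byte
# floating-point net sales amount, little-endian, with no padding.
MONTHLY_REPORT = struct.Struct("<id")

def pack_report(month, net_sales):
    """Encode one record into exactly MONTHLY_REPORT.size bytes."""
    return MONTHLY_REPORT.pack(month, net_sales)

def unpack_report(raw):
    """Decode one record back into (month, net_sales)."""
    return MONTHLY_REPORT.unpack(raw)

record = pack_report(3, 152340.75)
print(MONTHLY_REPORT.size)   # 12 bytes for every record
```

Because every record has the same size, the position of the n-th record in a file is simply n times the record size, which is what makes fixed length records easy to implement.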
Many applications arise in which the value associated with a record item may be a list
of entities. For example, 'the degrees held' and 'programming languages used at a computer
installation' are items which can assume multiple entities. In these instances, the item value
may be "B.Sc., M.Sc., Ph.D." or "COBOL, C, Pascal, Fortran" respectively. The most
popular method of handling repeating fields is to create an item which can accommodate
up to some maximum number of replications. If we set this maximum number to
three, then the example items can accommodate such information as the three most recent
degrees obtained and the three most often used programming languages.
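The fixed maximum for a repeating field can be sketched as below. The function name to_repeating_item and the empty-string filler are assumptions; the sketch also assumes the values arrive already ordered by importance (most recent degree or most used language first), so that keeping the first three is the right choice.

```python
MAX_REPEATS = 3   # maximum number of replications, as in the example above

def to_repeating_item(values, limit=MAX_REPEATS, filler=""):
    """Keep at most `limit` values and pad with `filler` to a fixed shape."""
    kept = list(values)[:limit]
    return kept + [filler] * (limit - len(kept))

degrees = to_repeating_item(["B.Sc.", "M.Sc.", "Ph.D."])
languages = to_repeating_item(["COBOL", "C", "Pascal", "Fortran"])
print(languages)   # the fourth language does not fit and is dropped
```

The price of this simplicity is visible in the example: space is wasted when there are fewer than three values, and information is lost when there are more.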
1.6 Files and file Organization
The technique used to represent and store the records on a file is called file
organization. The fundamental file organization techniques are Sequential and Index
sequential. The presentation of each of these organizations begins with a description of its
file structure.
File organization techniques differ in two basic ways. First, the organization
determines the file's record sequencing, which is the physical ordering of
the records in storage. Second, the file organization determines the set of operations
necessary to find a particular record. Individual records are typically identified by having
particular values in search key fields. Such a data field may or may not have duplicate
values in the file, and the field can be a group or elementary item. Some file organization
techniques provide rapid accessibility on a variety of search keys; other techniques support
direct access only on the value of a single key.
In a sequential file, records are stored one after the other on a storage device. Because
sequential allocation is conceptually simple, yet flexible enough to cope with many of the
problems associated with handling large volumes of data, the sequential file has been the
most popular basic file structure used in the data processing industry.
All types of external storage devices support a sequential file organization. Some
devices, by their physical nature, can only support sequential files. Information is stored
on magnetic tape as a continuous series of records along the length of the tape. Accessing
a particular record requires accessing all the previous records in the file. Other devices
which are strictly sequential in nature are tape cassettes and line printers.
The operations that can be performed on a sequential file may differ slightly,
depending on the storage device used. For example, a file on magnetic tape can be either
an input file or an output file, but not both at one time. A sequential file on a disk can be
used strictly for input, strictly for output, or for update. Update means that, as records are
read, the record most recently read can be rewritten on the same file.
surname, as follows
AGARKER first , BAKER second ,…………., ZIDANE last.
There are occasions in which serial processing is all that is required on a file,
irrespective of the key or item index upon which the file is ordered. For example, if we
are to add a pay increase of 1000 Rupees to the wage item of all employees, it is irrelevant
whether the file is sequenced by name or by employee identification number.
In Sequential processing, transaction records are usually grouped together and sorted
according to the same index item as records in the file. Each successive record of the file
is read, compared with an incoming record and then processed in a manner that is usually
dependent upon whether the value of the record index item is less than, equal to, or
greater than the value of the index item of the transaction record.
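The compare-and-process loop described above can be sketched as a merge of two sorted lists. This is a simplified illustration, not a prescribed algorithm: it assumes both the file and the transactions are sorted on the same key, that there is at most one transaction per key, and that transactions are of the hypothetical forms "add", "update" and "delete".

```python
def sequential_update(master, transactions):
    """Merge a sorted master file with sorted transactions into a new file.

    master: list of (key, data) pairs sorted by key.
    transactions: list of (key, action, data) triples sorted by key.
    """
    new_master = []
    i = j = 0
    while i < len(master) or j < len(transactions):
        if j == len(transactions) or (
            i < len(master) and master[i][0] < transactions[j][0]
        ):
            new_master.append(master[i])        # record key is less: copy unchanged
            i += 1
        elif i == len(master) or transactions[j][0] < master[i][0]:
            key, action, data = transactions[j]
            if action == "add":                 # record key is greater: new record
                new_master.append((key, data))
            j += 1
        else:                                   # keys are equal
            key, action, data = transactions[j]
            if action == "update":
                new_master.append((key, data))
            elif action != "delete":
                new_master.append(master[i])    # e.g. an add for an existing key
            i += 1
            j += 1
    return new_master

master = [("AGARKER", 5000), ("BAKER", 6000), ("ZIDANE", 7000)]
transactions = [("BAKER", "update", 6500), ("CLARK", "add", 4000),
                ("ZIDANE", "delete", None)]
updated = sequential_update(master, transactions)
```

Each file is read only once from start to end, which is why the transactions are batched and sorted before processing.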
Sequential and serial processing are most effective when a high percentage of the
records in the file must be processed. Since every record in the file must be scanned, a
relatively large number of transactions should be grouped together for processing. If
records are to be added to a file, it is necessary to create a new file unless the records are
to be added to the end of the file.
2. A new file should be created if there are any additions and deletions
requested.
The retrieval of a record from a sequential file is inefficient and time consuming for
large files. To improve the query response time of a sequential file, indexing
techniques can be added.
The most important aspect affecting the file structure is the type of physical medium on
which the file resides. The capability of directly accessing a record based on a key can
only be achieved if the external storage device used supports this type of access. In
particular, devices such as magnetic tape and cassette tape units allow access to a
particular record only after reading all the other records that physically appear before the
desired record in the file. Hence direct access is impossible on these types of devices.
The external storage devices that support both sequential and direct access are
magnetic disk units.
The file structure concepts relating to indexed sequential files are best exemplified when
considering a magnetic disk as the storage medium. In fact, because of their low
price/performance ratio and large total storage capacity, disks are generally chosen when
using indexed sequential files.
Indexing associates the records of a file with a set of orderable quantities (keys), which
are usually smaller in number than the records and can therefore be searched faster. The idea
of indexing is to expedite the search process. Files whose indexes are created from a
sequential (or sorted) set of primary keys are referred to as index sequential. We shall use
the term index file for the indexes, data file for the data records, and pointer for the
address of a record or block.
A sequential file that is indexed is called an index sequential file. The index provides
random access to records, while the sequential nature of the file provides easy access to
the subsequent records as well as sequential processing. An index sequential file consists
of three separate areas: the prime area, the index area and the overflow area. The overflow
area provides additional space for record addition without necessitating the creation of a
new file.
The prime area is the area into which data records are written when the file is first
created. The file is created sequentially, that is, by writing records in the prime area in a
sequence dictated by the alphabetical ordering of the keys of the records. Writing starts on
a cylinder of the disk. When this cylinder is filled, writing continues on the second track
of the next cylinder, and so on in this fashion until the file's creation is completed. If the
newly created file is accessed sequentially according to the key item, the records are
processed in the order in which they were written.
Type of Indexes
The idea behind an index access structure is similar to that behind the indexes commonly
used in textbooks. A textbook index lists important terms at the end of the book in alphabetic
order. Along with each term, a list of page numbers where the term appears is given. We can
search the index to find a list of addresses (page numbers in this case) and use these
addresses to locate the term in the textbook by searching the specified pages.
Primary Indexes
A primary index is an ordered file whose records are of fixed length with two fields.
The first field is of the same data type as the ordering key field of the data file, and the
second field is a pointer to a disk block address. The ordering key field is called the primary
key of the data file. There is one index entry in the index file for each block in the data
file. Each index entry has the value of the primary key field for the first record in a block
and a pointer to that block as its two field values. We will refer to the two field values of
index entry i as K(i), P(i).
Block 1
Aaron, Ed
Abbott, Diane
:
Acosta, Marc
Block 2
Adams, John
Adams, Robin
:
Akers, Jan
Block n
Wright, Pam
Wyatt, Charles
:
Zimmer, Byron
Figure 1.4: Some blocks on an ordered (sequential) file of
Employee records with name as the ordering field
To create a primary index on the ordered file shown in figure 1.4, we use the Name
field as the primary key, because that is the ordering key field of the file. Each entry in the
index will have a Name value and a pointer. Figure 1.5 illustrates this primary index. The
total number of entries in the index will be the same as the number of disk blocks in the
ordered data file. The first record in each block of the data file is called the anchor
record of the block, or simply the block anchor. (A scheme similar to the one described here
can be used with the last record in each block, rather than the first, as the block anchor.)
A primary index is an example of what is called a non-dense index, because it includes an
entry for each disk block of the data file rather than for every record in the data file. A
dense index, on the other hand, contains an entry for every record in the file.
The index file for a primary index needs substantially fewer blocks than the data file
for two reasons. First, there are fewer index entries than there are records in the data file
because an entry exists for each whole block of the data file rather than for each record.
Second, each index entry is typically smaller in size than a data record because it has only
two fields, so more index entries than data records will fit in one block. A binary search
on the index file will hence require fewer block accesses than a binary search on the data
file.
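The lookup described above can be sketched in Python. This is only an illustrative sketch, not a real storage engine: the blocks are held in memory as lists, the names come from figure 1.4 (the elided rows are left out), and the record payloads, the `blocks` layout and the `lookup` function are assumptions made for the example.

```python
from bisect import bisect_right

# Hypothetical data file: each block is a list of (name, payload) records,
# and the whole file is ordered by name (the ordering key field).
blocks = [
    [("Aaron, Ed", 1), ("Abbott, Diane", 2), ("Acosta, Marc", 3)],
    [("Adams, John", 4), ("Adams, Robin", 5), ("Akers, Jan", 6)],
    [("Wright, Pam", 7), ("Wyatt, Charles", 8), ("Zimmer, Byron", 9)],
]

# Non-dense primary index: one entry <K(i), P(i)> per block.
# K(i) is the block anchor (key of the first record), P(i) the block number.
index_keys = [block[0][0] for block in blocks]
index_ptrs = list(range(len(blocks)))


def lookup(name):
    """Return the payload stored under `name`, or None if it is absent."""
    # Binary search on the index for the last anchor that is <= name.
    i = bisect_right(index_keys, name) - 1
    if i < 0:
        return None  # name sorts before the first block anchor
    # Fetch the single candidate block and scan it.
    for key, payload in blocks[index_ptrs[i]]:
        if key == name:
            return payload
    return None


print(lookup("Adams, Robin"))
```

Only the small index is searched with binary search; exactly one data block is then read, which is the saving in block accesses that the paragraph above describes.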
Figure 1.5: Primary index on the ordering key field of the file
A major problem with a primary index, as with any ordered file, is the insertion and deletion
of records. With a primary index, the problem is compounded because if we attempt to
insert a record in its correct position in the data file, we not only have to move records but also
change some index entries, because moving records will change the anchor records of
some blocks. One solution is to use an unordered overflow file. Another possibility is to use a linked
list of overflow records for each block in the data file. We can keep the records within
each block and its overflow linked list sorted to improve retrieval time. Record deletion
can be handled using deletion markers.
Figure 1.6: A dense secondary index on a non-ordering key field of a file
Secondary Indexes
A secondary index is also an ordered file with two fields and, as in the other
indexes, the second field is a pointer to a disk block. The first field is of the same data type
as some non-ordering field of the data file. The field on which the secondary index is
constructed is called an indexing field of the file, whether its values are distinct for every
record or not. There can be many secondary indexes, and hence indexing fields, for the
same file.
We first consider a secondary index on a key field, that is, a field having a distinct value for
every record in the data file. Such a field is sometimes called a secondary key for the file. In
this case there is one index entry for each record in the file, which has the value of the
secondary key for the record and a pointer to the block in which the record is stored. A
secondary index on a key field is a dense index because it contains one entry for each
record in the data file.
We again refer to the two field values of index entry i as K(i), P(i). The entries are
ordered by the value of K(i), so we can use binary search on the index. Because the records
of the data file are not physically ordered by the value of the secondary key field, we cannot
use block anchors. That is why an index entry is created for each record in the data file
rather than for each block, as in the case of a primary index. Figure 1.6 illustrates a secondary
index on a key attribute of a data file. Notice that in figure 1.6 the pointers P(i) in the
index entries are block pointers, not record pointers. Once the appropriate block is transferred
to main memory, a search for the desired record within the block can be carried out.
A secondary index will usually need substantially more storage space than a primary
index because of its larger number of entries. However, the improvement in search time
for an arbitrary record is much greater for a secondary index than it is for a primary
index, because we would have to do a linear search on the data file if the secondary index
did not exist. For a primary index, we could still use binary search on the main file even if
the index did not exist because the records are physically ordered by the primary key
field.
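A dense secondary index can be sketched in the same way. The example below is a minimal illustration under assumed data: the employee numbers, names, block layout and the `find` function are hypothetical, and the blocks are deliberately not ordered by employee number.

```python
from bisect import bisect_left

# Hypothetical data file: blocks of (emp_no, name) records that are
# NOT physically ordered by emp_no, the secondary key.
blocks = [
    [(17, "Akers"), (3, "Wyatt")],
    [(21, "Aaron"), (8, "Zimmer")],
    [(5, "Adams"), (12, "Wright")],
]

# Dense secondary index: one <K(i), P(i)> entry per record, ordered by K(i).
# P(i) is a block pointer (here a block number), not a record pointer.
index = sorted(
    (emp_no, block_no)
    for block_no, block in enumerate(blocks)
    for emp_no, _name in block
)
keys = [k for k, _p in index]


def find(emp_no):
    """Return the name stored under emp_no, or None if it is absent."""
    i = bisect_left(keys, emp_no)  # binary search on the index
    if i == len(keys) or keys[i] != emp_no:
        return None
    # Bring the block into "main memory" and search within it.
    for key, name in blocks[index[i][1]]:
        if key == emp_no:
            return name
    return None


print(find(8))
```

The index has as many entries as the file has records, which is the extra storage cost noted above; without it, `find` would have to scan every block.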
pointer is provided in the logical area or associated index entry to point to the overflow
location. This is illustrated in figure 2.5: record 615 is inserted in the original logical block,
causing a record to be moved to an overflow block.
Multiple records belonging to the same logical area may be chained to maintain logical
sequencing. When records are forced into the overflow area as a result of insertion, the
insertion process is simplified, but the search time is increased. Deletions of records from
index-sequential files create logical gaps; the records are not physically removed but only
flagged as having been deleted. If there are a number of deletions, we may have a great
amount of unused space.
1. A primary data storage area. In certain systems this area may have unused
spaces embedded within it to permit the addition of records. It may also
include records that have been marked as having been deleted.
2. Overflow areas. These permit the addition of records to the file. A number
of schemes exist for the incorporation of records in these areas into the
expected logical sequence.
The records are written in data blocks in ascending key sequence. These data blocks
are in turn stored in ascending sequence in the primary data area.
organization the key value is mapped directly to the storage location. The usual method of
direct mapping is by performing some arithmetic manipulation of the key value. This
process is called hashing. Let us consider a hashing function h that maps the key value k to
the value h(k). The value h(k) is used as an address, and for our application we require
that this value be in some range. If our address area for the records lies between s1 and
s2, the requirement for the hash function h(k) is that for all values of k it should generate
values between s1 and s2.
It is obvious that a hash function that maps many different key values to a single
address, or one that does not map the key values uniformly, is a bad hash function. A
collision is said to occur when two distinct key values are mapped to the same storage
location. Collisions are handled in a number of ways. The colliding records may be
assigned to the next available space, or they may be assigned to an overflow area. We can
immediately see that with hashing schemes there is no index to traverse. With
well-designed hashing functions where collisions are few, this is a great advantage.
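As a rough sketch of these ideas, the Python fragment below maps integer keys into an address range s1..s2 and resolves collisions by placing the colliding record in the next available space. The range 100..109, the particular function `h`, and the names `insert` and `search` are assumptions chosen for illustration; deletion is not handled.

```python
S1, S2 = 100, 109                      # assumed address area for the records
SIZE = S2 - S1 + 1
slots = {addr: None for addr in range(S1, S2 + 1)}


def h(k):
    """Map a non-negative integer key to an address between S1 and S2."""
    return S1 + k % SIZE


def insert(k, record):
    """Store the record; on a collision use the next available space."""
    addr = h(k)
    for _ in range(SIZE):
        if slots[addr] is None or slots[addr][0] == k:
            slots[addr] = (k, record)
            return addr
        addr = S1 + (addr - S1 + 1) % SIZE   # step to the next address, wrapping
    raise RuntimeError("address area is full")


def search(k):
    """Return the record stored under key k, or None if it is absent."""
    addr = h(k)
    for _ in range(SIZE):
        entry = slots[addr]
        if entry is None:
            return None                      # an empty slot ends the probe
        if entry[0] == k:
            return entry[1]
        addr = S1 + (addr - S1 + 1) % SIZE
    return None


insert(25, "first")    # h(25) = 105
insert(35, "second")   # h(35) = 105 as well: a collision, stored at 106
print(search(35))
```

No index is consulted at any point: the address is computed from the key, and extra probes are needed only when a collision has occurred.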
The use of buckets reduces the problem associated with collisions. In spite
of this, a bucket may become full, and the resulting overflow can be handled by
providing overflow buckets and using a pointer from the normal bucket to an entry in the
overflow bucket. All such overflow entries are linked. Multiple overflow entries from the same
bucket result in a long list and slow down the retrieval of these records. In an alternative
scheme, the address generated by the hash function is a bucket address and the bucket is
used to store the records directly instead of using a pointer to the block containing the
record.
If s gives the number of buckets, a simple hashing function is h(k) = k mod s, where k is
the numeric representation of the key and h(k) produces a bucket address.
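The function h(k) = k mod s and the use of overflow buckets can be sketched as follows. The number of buckets, the bucket capacity, the `Bucket` class and the stored values are assumptions made for this example; the keys are sample integers.

```python
S = 4                  # assumed number of buckets
BUCKET_CAPACITY = 2    # assumed number of records a bucket can hold


class Bucket:
    def __init__(self):
        self.records = []      # (key, value) pairs stored in this bucket
        self.overflow = None   # pointer to an overflow bucket, if any


buckets = [Bucket() for _ in range(S)]


def h(k):
    """Simple hashing function: bucket address = k mod s."""
    return k % S


def insert(k, value):
    b = buckets[h(k)]
    # Follow the overflow chain until a bucket with free space is found.
    while len(b.records) >= BUCKET_CAPACITY:
        if b.overflow is None:
            b.overflow = Bucket()
        b = b.overflow
    b.records.append((k, value))


def search(k):
    b = buckets[h(k)]
    while b is not None:
        for key, value in b.records:
            if key == k:
                return value
        b = b.overflow         # continue in the linked overflow bucket
    return None


for key in (496, 209, 610, 920, 976, 176, 362, 331):
    insert(key, "record-%d" % key)

print(search(176))
```

With these sample keys, 496, 920, 976 and 176 all hash to bucket 0, so the last two land in an overflow bucket and need one extra step to retrieve, which is the slowdown the text attributes to long overflow lists.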
[Figure: keys hashed to bucket addresses (Bucket 1, Bucket 2, ..., Bucket n) that point to
blocks of records, with overflow buckets]
Advantages of hashing:
1) Key matches are extremely quick.
2) Hashing is very good for large keys, or those with multiple columns, provided
the complete key value is provided for the query.
3) No disk space is used for an index with this method.
Disadvantages of hashing:
1) It becomes difficult to predict overflow because the working of the hashing
algorithm will not be visible to the database administrator.
2) No sorting of data occurs either physically or logically, so sequential access is
poor.
3) This organization usually takes a lot of disk space to ensure that no
overflow occurs.
1.9 Summary
All businesses need to process data. As data volume increases, the data processing
becomes highly complex. Computers are used in this process. One important aspect of
this computerized data processing is the storage and retrieval of data. Databases provide
this functionality and DBMS packages are software tools to implement databases.
Data as an entity has several important properties like form, size, organization and
relationships. The forms of data, namely numeric, alphabetic, integers and real numbers,
represent the different types of data stored in databases. The size of the data plays a central
role in determining the volume of the database and the techniques needed to store it.
Organizing and grouping the data into characters, fields, records and files defines
the basic building blocks of the database. Databases are classified into different types
based on their usage. Different data models have resulted in different kinds of
databases that provide the basic service of storage and retrieval of the data.
1 Nature of operation to be performed
2 Characteristics of storage media to be used.
3 Volume and frequency of transaction to be processed
4 Response time requirements.
1.10 Review
1. What is record organization? Explain variable-length and fixed-length records.
2. How is an index-sequential file organized? Explain the deletion and addition of
records.
1. Tremblay and Sorenson, 'An Introduction to Data Structures with Applications', 2nd
Edition, 1984, McGraw-Hill.
2. Bipin Desai, 'An Introduction to Database Systems', Galgotia Publications, New Delhi,
1994.
Unit 2
Structure:
2.1.1 Introduction
2.1.2 Microsoft Access database
2.1.3 Tables and Queries
2.1.4 Forms and Reports
2.1.5 Accessing Microsoft Access
2.1.6 Opening a database
2.1.7 Database window
2.1.8 Objects of the Access database
Structure:
2.2.1 Introduction
2.2.2 Creating a Microsoft Access database
2.2.3 Creating objects
2.2.4 Customizing toolbars
2.2.5 Fields and data types
2.2.6 Creating a table
2.2.7 Field properties
2.2.8 Save and close a table
2.2.9 Add and save records
2.2.10 Edit records and close a table
2.2.11 Modify fields in a table
2.2.12 Modify columns and rows in data sheet
2.2.13 Validation rule to a field
2.3 Data Manipulation in DBMS
Structure:
2.3.1 Introduction
2.3.2 Find a value
2.3.3 Find and replace a value
2.3.4 Create and apply a filter
2.3.5 Sort records
2.3.6 Create a query
2.3.7 Query window
2.3.8 Join tables
2.3.9 Select fields
2.3.10 Specify criteria
2.3.11 Calculate totals
2.3.12 Modify and save a query
Structure:
2.4.1 Introduction
2.4.2 Creation with Form Wizard
2.4.3 View, Add, Delete and Save records
2.4.4 Save and Close a Form
2.4.5 Change Form Design
2.4.6 Select, Resize, Move and Delete controls
2.4.7 Change Fonts, Size and Color of Text
2.4.8 Showing data from more than one table
Structure:
2.5.1 Introduction
2.5.2 Create a report
2.5.3 Preview, print and save a report
2.5.4 Report in design view
2.1.1 Introduction
This unit introduces what an RDBMS is and how MS-Access, an RDBMS, differs
from other packages. You will also learn to
open an existing database and see all the objects present in an Access database.
Microsoft Access is a relational DBMS and, in that sense, a database package like any
other. Why should one choose MS-Access rather than another package such as
FoxBASE or dBASE?
A Microsoft Access database is a collection of database files, which are also known
as tables. And each table is a collection of records, and a record is a collection of fields.
If the company wants to store the employee details, they will have to form a table,
which will be part of some database. The information about an employee will make one
record of that table and the information will be stored under fields such as employee
number, Employee name, and others.
Example
Each record in a table contains the same set of fields and each field contains the
same type of information for each record.
In MS-Access, a Query is a question you ask about the data in your database. The
answer to the question can be from a single table or several tables; the query brings the
data together.
Example
Suppose in the personal information system, the manager of the company wants to
know the total basic salary of all the employees. The answer to the query may be Yes or
No. Keeping track of a large number of employees is difficult.
For Example
You create a query that describes the set of records you want. When you use the
query to access the data, you automatically get current data from the table/s.
First way:
Second way
The second way of viewing data is preferable. A query output can be viewed as
in the first way, but it can also be viewed in the second way by using Forms.
A form is a customized way of viewing, entering and editing records in the database.
You can specify how data is to be displayed when you design the form. Forms can be
created to resemble more closely the way data would be entered on paper form so that the
user feels familiar with the operation.
Forms and queries present the data on screen. Reports are used to present data on
printed paper. They provide a way to retrieve and present data as meaningful information,
which might include totals and subtotals across a set of records.
1. Open the program group that contains the Microsoft Access icon.
2. Double click the Microsoft Access icon. Microsoft Access starts and displays
Microsoft Access window, where you can create or open a database.
To open a database
1. Choose Open database from the file menu.
2. Select the directory from directories list that contains the database file.
2.1.7 Database window
When a database is opened, Microsoft Access displays its database window in the
Microsoft Access window. From this window you can create and use any object in
your database and other features of Microsoft Access.
• Title bar is located at the top of the screen and displays the name of the program.
• Menu bar is located below the title bar. It lists the various options.
• Toolbars, generally located below the menu bar, provide quick access to the most
frequently used commands and utilities. They can be customized by dragging them
to convenient positions.
• Status bar is a horizontal bar at the bottom of the screen that displays information
about commands, toolbar buttons and other options.
2.1.8 Objects of the Access database
Tables, queries, forms, reports, macros and modules are objects of the Access
database. The object buttons in the database window provide direct access to every object
in the database.
Example
Similarly all other objects in the database window can be viewed by clicking on the
appropriate object buttons.
To close a database
Select Close database from the File menu.
2.2 Working with Access database
2.2.1 Introduction
Now, we are familiar with opening an existing database and all the objects in the
database. Let us learn to create a new database and objects in the database.
A table is a collection of data stored about a particular subject. The data in a table is
presented in columns and rows. We will also learn to create the basic structure of a table,
to add rows (records) and to edit them.
5. Select the directory in which you want to create the database. In the file name box,
enter a database name, which can contain up to 8 characters but no spaces. There is no need to
give an extension because Microsoft Access automatically adds an extension to the
database name.
To modify the design of an object
1. Select the object type to modify from the database window.
2. Select the object name from the list to modify.
3. Click the Design button to display object window in design view.
Note: There is an option to create objects yourself or through the use of an Access wizard.
An Access wizard is like a database expert, which prompts you with questions about the
object and then builds the object based on your answers. Creation of objects
with the help of wizards will be covered later.
create and modify objects in the database. When you start, Microsoft Access displays
tools only for opening and creating a database. After a database is opened, new toolbars
are added to the existing ones. The toolbars gain or lose focus as and when you open any
object (forms, tables, queries, reports, etc.) in Design, Open or New view.
Initially, the toolbar appears at the top of the Microsoft Access window and the tools
are arranged in a single row. We can move the toolbar to a vertical side of the window,
the bottom of the window or the middle of the window, and change its shape.
To Customize toolbars
The toolbar customize window is displayed. Its different options allow the toolbars
to be customized.
The first step in designing the database is to make the table structure. Each table in
the database represents a single subject, for example employee information or an invoice.
Before designing a table one should be very clear about the data that is to be stored
in the table, based on which a table structure is created. For example, details of employee
information stored in a table requires employee number, employee name, date of joining,
sex, basic salary, qualification, department. These details are referred to as fields in
database terminology. Fields can be of different data types like number, character or date.
Microsoft Access uses the data type to decide how much storage to give to a field
and to ensure that the right kind of data is entered in the field. For example, text cannot
be entered in a numeric field.
Choosing the right data type for a field is important before entering data in the table.
The data type of a field that already contains data can be changed, but if the data types are not
compatible there may be loss of data.
Example
Structure of an EMPLOYEE table
Sex (SEX)
Qualification (QUALIFICATION)
Department (DEPT_CODE).
EMP_NO and BASIC_SALARY fields will have numeric data and so can be of type
‘number’
A table first created is an empty container for data. The table is designed to contain
specific type of data.
To create a table
3. Click the New table button to open table window in Design view.
We now have a window where we can specify the fields in our table and what kind
of data they will be storing. The creation of table structure begins from here. The window
below depicts the table in design view. The table window has two portions. The upper
portion has field name, data type and description of the field. The lower portion has field
properties like size, format, etc. For creating the structure:
a. Enter the first field name ‘EMP_NO’ in the field name box. A field name can
consist of up to 64 characters.
b. Press the Tab key to go to the data type box and select a data type, for example
Number.
c. Press the Tab key to go to the Description box and type, for example, ‘Employee
number’. This description appears in the status bar when data is being
entered in the field.
Press the Tab key to go to the next field.
d. Repeat steps a, b, and c to add other fields.
To set a field property
1. Select field in the upper portion of the table window in design view.
2.2.7 Field properties
You can control the appearance of data, specify default values and speed up
searching and sorting by setting field properties in table’s design view.
Field size: Suppose the EMP_NAME should not exceed 20 characters, set the field
size to 20 or limit the range of allowable values in case it is a number field.
Format: You can specify the number or date fields in any of the following formats:
Decimal places: Display a certain number of places after the decimal point when
using a format for a number or currency field.
Default value: If the user does not enter a value for a field, some value
should still be taken for that field. In such a case use the default value. For example, if DOJ is
not entered by the user, the current date should be taken as DOJ. Setting a default value will
automatically fill the current date in the DOJ field in new records.
Save the table design before you can add any records.
2. If you are saving the table for the first time, type a name for the table and click
OK. A table name can be up to 64 characters.
To close a table.
2.2.9 Add and save records
After designing, you can add records to a table.
To add records
2. Click the Open button from the database window to open table in datasheet view.
3. Enter a value in each field pressing Tab key to move to the next field.
4. After you fill in all the fields, press Tab key to move to the new blank record.
When you move to the next record, Microsoft Access saves the record added to the
datasheet. When you finish adding records, close the datasheet; you do not have to save
your work.
When you open a data sheet, the first field of the first record is selected.
Use the mouse to select the contents of the field you want to modify.
Type the new value for the field. To cancel all editing changes to a field, press Esc
key.
If any modifications to fields in a table are desired, you can rearrange them, edit
them, delete them or insert new fields also.
To edit a field
1. Select the field to edit.
2. Edit name, data type or description of the field in the upper portion of the table
window.
3. Modify the field properties in the lower portion of the table window.
4. Save it and close the table.
To move a field
1. Select the field by clicking the field selector to the left of the field name.
2. Click the field selector again and hold the mouse button and drag it to the new
location.
2.2.12 Modify columns and rows in datasheet
If the columns in a datasheet do not fit the field values they display, the width of each
column and the height of each row can be changed. You can also rearrange the datasheet
columns.
Microsoft Access automatically validates values based on a field’s data type. For
example, text cannot be entered in a number field. You can set more specific rules for
data by setting the validation rule property for the field.
When a validation rule property is set, it specifies the requirements for data that is
entered into a field. For example, a validation rule can specify that the employee name
must not be left blank.
The message to be displayed when the validation rule is violated is specified in the
validation text. This text is displayed when an entry in
the field breaks the validation rule.
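The pairing of a validation rule with its validation text can be illustrated outside Access with a small Python sketch. This is an analogy only, not the way Access stores or evaluates rules; the `rules` dictionary, the `validate` function and the message wording are assumptions made for the example.

```python
# Each field has a rule (a test on the entered value) and a validation text
# (the message shown when the rule is broken). Both are hypothetical examples.
rules = {
    "EMP_NAME": (
        lambda v: isinstance(v, str) and v.strip() != "",
        "Employee name must not be left blank.",
    ),
    "BASIC_SALARY": (
        lambda v: isinstance(v, (int, float)) and v > 0,
        "Basic salary must be greater than zero.",
    ),
}


def validate(record):
    """Return the validation text of every rule that the record breaks."""
    messages = []
    for field, (rule, text) in rules.items():
        if not rule(record.get(field)):
            messages.append(text)
    return messages


print(validate({"EMP_NAME": "", "BASIC_SALARY": 4500}))
```

An empty list means the record satisfies every rule; otherwise each returned string plays the role of the validation text for the offending field.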
Examples
EXERCISE 1:
After creating the tables, do the following:
2.3 Data Manipulation in DBMS
2.3.1 Introduction
Table is used to store data. Stored data can be retrieved whenever required. There are
many ways in which data stored in a table can be viewed based on some criteria. Let us
learn find, filter, query and sort to view data.
Suppose you require the details of an employee where employee number is ‘1234’.
One way of getting the details is to open the table in open mode and browse through all
the records one by one. The other way is to use the find option. When you want to find
the specific record or find certain values within the fields, you can use the find option to
go directly to a record. You can also use the find option to navigate through records and
find one record after another.
3. In the find what box, type the value you want to find
4. Click the Find first button to move to the record if it exists.
5. Click the Find next button to find the next occurrence of the specified value
6. At the end click the Close button to close the dialog box.
text by using the replace command. Replacements can be made either individually or
globally.
1. Select the field where you want to search and replace in the open view.
2. Select Replace from the edit menu
The replace dialog box is shown below:
Microsoft Access provides two ways to create a customized view of data in tables. A
query or a Filter for a table can be created. A filter is like a simple query except that it
applies only to an open table.
A filter is best for temporarily changing the set of records being viewed.
To filter
2. Select filter from the Records Menu.
4. Select Apply filter / Sort from the Records menu to display some filtered
records in the table.
5. To remove a filter, select Remove filter / Sort from the Records menu.
Records in a table can be sorted in a different order than they are usually displayed
by using the Sort command. Records can be sorted for display in either ascending or
descending order.
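The effect of ascending and descending sorts can be shown with a few in-memory records. The sketch below uses plain Python lists rather than Access, and the sample employees are invented for illustration.

```python
# Hypothetical employee records, in the order they were entered.
employees = [
    {"EMP_NO": 3, "EMP_NAME": "Mina", "BASIC_SALARY": 5500},
    {"EMP_NO": 1, "EMP_NAME": "Asha", "BASIC_SALARY": 5000},
    {"EMP_NO": 2, "EMP_NAME": "Ravi", "BASIC_SALARY": 6000},
]

# Ascending order on EMP_NAME.
by_name = sorted(employees, key=lambda e: e["EMP_NAME"])

# Descending order on BASIC_SALARY.
by_salary_desc = sorted(employees, key=lambda e: e["BASIC_SALARY"], reverse=True)

print([e["EMP_NAME"] for e in by_name])
print([e["EMP_NAME"] for e in by_salary_desc])
```

As with sorting for display in a datasheet, the original list `employees` is left unchanged; only the order in which the records are presented differs.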
Queries help to
1. Choose fields.
2. Choose records, that is specify criteria.
3. Sort records, that is specify order.
4. Look for data in several tables.
5. Perform calculations.
6. Make changes to data in tables.
To create a Query
2. Click the New button to display the new query dialog box.
Click the OK button to open a select query window and display the Show table
dialog box, which displays the Tables and the Queries in the database.
2. Select the table and click on Add to display a field list for each table.
Design view
Use this option to create a query or change the design of an existing query. You can use
graphical query tools to create a query.
Datasheet view
Use this option to see the data retrieved by query.
SQL view
Use this option to enter SQL (Structured Query Language) statements to create or
change a query.
The tool used to create a query in design view is called Graphical QBE (Query by
Example). With Graphical QBE, queries can be created by dragging fields from the field
list in the upper portion of the query window to the QBE grid in the lower portion of the
window.
In the QBE grid, each column contains information about a field included in the
query.
To create a query from more than one table, you add the tables you want and make
sure that the tables are joined to each other. We can join the tables by drawing the join
lines between tables, although in many cases Microsoft Access creates join lines
automatically. In most cases, a join line tells Microsoft Access: ‘Select the records from both
the tables that have the same values in the fields that are joined’. This is referred to as
an ‘inner join’. The fields joined in this way are called join fields.
Example
Suppose you have two tables: EMPLOYEE and DEPARTMENT. EMPLOYEE table
contains EMP_NO, EMP_NAME, DOJ, SEX, BASIC_SALARY, QUALIFICATION
and DEPT_CODE. DEPARTMENT table contains DEPT_CODE AND DEPT_NAME.
If you want a query that shows DEPT_NAME along with the employee details, you will have to join the two tables.
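An inner join of this kind can be written as an SQL statement. The sketch below runs it against an in-memory SQLite database through Python's standard sqlite3 module, not against Microsoft Access, so the SQL dialect may differ slightly from what Access shows in SQL view. The table and field names follow the example above; the sample rows and the reduced set of EMPLOYEE fields are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPARTMENT (DEPT_CODE TEXT PRIMARY KEY, DEPT_NAME TEXT);
    CREATE TABLE EMPLOYEE (EMP_NO INTEGER PRIMARY KEY, EMP_NAME TEXT,
                           BASIC_SALARY REAL, DEPT_CODE TEXT);
    INSERT INTO DEPARTMENT VALUES ('D1', 'Accounts');
    INSERT INTO DEPARTMENT VALUES ('D2', 'Sales');
    INSERT INTO DEPARTMENT VALUES ('D3', 'Stores');
    INSERT INTO EMPLOYEE VALUES (1, 'Asha', 5000, 'D1');
    INSERT INTO EMPLOYEE VALUES (2, 'Ravi', 6000, 'D2');
    INSERT INTO EMPLOYEE VALUES (3, 'Mina', 5500, 'D1');
    INSERT INTO EMPLOYEE VALUES (4, 'John', 4000, NULL);
""")

# Inner join: keep only rows whose DEPT_CODE values match in both tables.
rows = conn.execute("""
    SELECT E.EMP_NO, E.EMP_NAME, D.DEPT_NAME
    FROM EMPLOYEE AS E
    INNER JOIN DEPARTMENT AS D ON E.DEPT_CODE = D.DEPT_CODE
    ORDER BY E.EMP_NO
""").fetchall()

for row in rows:
    print(row)
```

Employee 4 has no department code and department D3 has no employees, so neither appears in the result: an inner join returns only the records with equal values in the join fields.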
Select a field in one table and drag it to the equivalent field in the other table.
It draws a join line from one table to another.
After adding tables to the query, fields can be included in the query. The fields
selected determine the output of the query in the datasheet view. If you add more than
one table, a field list is shown for each table.
1. Drag the field from the field list to a cell in the field row of the QBE grid.
2. Repeat the same until all the fields of the query are shown in the QBE grid.
2.3.11 Calculate totals
To calculate totals
1. Select Totals from the View menu to display the totals row in the QBE grid. It
automatically fills ‘Group By’ in each box.
2. Select the field to be totalled.
3. Select Sum from the list in the Total cell.
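The ‘Group By’ and ‘Sum’ settings in the steps above correspond to a grouped total in SQL. The following sketch again uses an in-memory SQLite database through Python's sqlite3 module rather than Access itself; the sample rows and the column alias TOTAL_BASIC are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (EMP_NO INTEGER PRIMARY KEY,
                           BASIC_SALARY REAL, DEPT_CODE TEXT);
    INSERT INTO EMPLOYEE VALUES (1, 5000, 'D1');
    INSERT INTO EMPLOYEE VALUES (2, 6000, 'D2');
    INSERT INTO EMPLOYEE VALUES (3, 5500, 'D1');
""")

# Group the records by department and total the basic salary in each group.
totals = conn.execute("""
    SELECT DEPT_CODE, SUM(BASIC_SALARY) AS TOTAL_BASIC
    FROM EMPLOYEE
    GROUP BY DEPT_CODE
    ORDER BY DEPT_CODE
""").fetchall()

# Leaving out the grouping field gives one grand total for all employees.
grand_total = conn.execute("SELECT SUM(BASIC_SALARY) FROM EMPLOYEE").fetchone()[0]

print(totals, grand_total)
```

The grouped query returns one row per department, while the second query answers the earlier question about the total basic salary of all employees.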
2.3.12 Modify and save a query
You can easily move or delete columns in the query.
1. Click the field selector (column heading) of the column in design view.
2. Click the field selector again, hold down the mouse button and drag the column to
its new location.
1. Click the field selector (column heading) of the column in design view.
2. Press DEL key
To save a query
1. Select Save from the File menu to display Save as dialog box (if first time)
2. Type name in query name box
3. Click Ok to save query in the database.
EXERCISE 2:
2.4 Creating and Customizing Forms
2.4.1 Introduction
A Query or a Filter is used to view the records in raw form from a table. To view the
data in customized way we use ‘Forms’.
A Form provides an easy way to view data and all the values for one record. Switch
to datasheet view of the form to see all the records for that form.
A Form offers the most convenient layout for entering, changing and viewing the
records in the database. The form design tools in Microsoft Access help to design forms
that present data in an attractive format with special fonts, and other effects.
3. Select a Table / Query in the list box.
4. Click OK to create the form by choosing the required fields (double-click the
required fields), a format (say, tabular) and a title for the form. At the end, click the
Finish button to save and open the form. The form displays the first record in the
table.
The above form can be used to view, change, add, and delete records in the table.
The objects on the form are called Controls. These controls are used to change and view
the data. The controls are:
To switch to datasheet view, select datasheet from the view menu to display form’s
data in datasheet view.
To switch to form view, select forms from the view menu to display records in form
view.
To move from record to record in form view, use navigation buttons to go to first,
last, next or previous records.
2.4.4 Close a Form
To close a form select close from the file menu.
To make changes of a form in the forms design view, open the form in design view
from the database window.
• Form Header contains the heading label of the form. It appears at the top of the
window
• Detail section contains the fields from the table to view data. It repeats for each
record
• Form footer appears at the bottom of the window.
All forms have a detail section but may or may not have form header and footer.
To add form header and footer, select Form Header / Form Footer from the list box.
Controls on the form are labels and text boxes. In design view, these controls can be
selected and resized.
To select a control
1. Click the text box, to display size and move handles around the control.
2. Drag the handles on the top and bottom to size the text box vertically.
3. Drag the handles on the left and right sides to size the text box horizontally.
4. Drag the handles in the corners to size the text box both vertically and
horizontally.
To resize a control
All the text box controls have attached label controls. They can be moved together or
separately.
To move a control
4. Release the mouse button when the control is placed at the desired place.
To move the attached label separately
Using a subform is one way to include information from more than one table in a
form. A subform is a form within a form. When a subform is used, relationship is made
between records from two or more tables. The main form and the subform are linked so
that the subform displays only records that are related to those in the main form.
When you create a Form/Subform using the wizard, data can be viewed in the
subform in either datasheet view or form view.
To create a query
1. Click the query button; click the new button to open the new query window.
2. Add the two tables, to display data in the form.
3. Connect the tables with join line.
4. Drag the fields from the field list to the QBE grid.
5. Save and close the query.
EXERCISE 3:
2.5 Creating Reports
2.5.1 Introduction
Reports are used to present data on paper. A report is information organized and
formatted to fit some specification. Examples are employee details, department details,
etc. With Microsoft Access different design elements such as text, data, pictures, lines,
boxes and graphs are used to create reports. You can create a design for a report and save
it. It can be used again and again. Current data at that time is printed.
3. Choose Report Wizard from the dialog box and Click OK.
4. Make the following choices through the dialog box.
a. Choose the fields you want on the report. Fields can be from more than
one table or query.
For example
b. Make a choice to view the data.
For example
By department.
d. Select the sort order and summary options for the detail records.
For example
f. Select a style for the report.
g. Give a title for the report and click on Finish button to create and open the
report in Print Preview.
Report in print Preview:
2.5.3 Preview, Print and Save a report
After the wizard creates the report, Microsoft Access displays the report, as it would
appear in print.
To see a whole page in report, position the pointer over the report in Print Preview,
click the report to display a view of the whole page. Click the report again to zoom back
and view data.
To scroll in a page, click the horizontal and vertical scroll bars and to scroll through
pages, click the page buttons to scroll in other pages.
To print a report, select print from file menu. A Print dialog box is shown. Choose
the appropriate options in the box. Click on Ok to print.
To close the report, choose the close option from the file menu.
In design view, the report is divided in sections such as report header and footer,
page header and footer, group header and footer, detail section.
Group header and footer prints information on change of every group (group by
which the report is grouped).
EXERCISE 4:
Using the tables created in EXERCISE 1 and / or related queries, generate the
following reports: