
Self Learning Material

Relational Database
Management System
(MCA-202)

Course: Masters in Computer Applications


Semester-II

Distance Education Programme


I.K. Gujral Punjab Technical University
Jalandhar
Syllabus
I.K. Gujral Punjab Technical University

MCA-202 Relational Database Management System

Section– A

Review of DBMS:
Basic DBMS terminology; Architecture of a DBMS: Data Independence- Physical and Logical
Independence, Degree of Data Abstraction, Initial Study of the Database, Database Design,
Implementation and Loading, Testing and Evaluation, Operation, Maintenance and Evaluation.

Conceptual Model:
Entity Relationship Model, Importance of ERD, Symbols (Entity: Types of Entities, weak Entity,
Composite Entity, Strong Entity, Attribute: Types of Attribute, Relationship: Type of
relationship, Connectivity, Cardinality).

Section– B
Database Models and Normalization:
Comparison of Network, Hierarchical and Relational Models, Object Oriented Database, Object
Relational Database, Comparison of OOD & ORD; Normalization and its various forms, De-
Normalization, Functional Dependencies, Multi-valued Dependencies, Database Integrity:
Domain, Entity, Referential Integrity Constraints.

Transaction Management and Concurrency Control:


Client/ Server Architecture and implementation issues, Transaction: Properties, Transaction
Management with SQL, Concurrency; Concurrency Control: Locking Methods (Lock
Granularity, Lock Types, Two-Phase Locking, Deadlocks), Time Stamping Method, Optimistic
Method, Database Recovery Management.

Section– C
Distributed Databases:
Centralized Verses Decentralized Design; Distributed Database Management Systems
(DDBMS): Advantage and Disadvantages; Characteristics, Distributed Database Structure,
Components, Distributed Database Design, Homogeneous and Heterogeneous DBMS.

Levels of Data and Process Distribution:


SPSD (Single-Site Processing, Single-Site Data), MPSD (Multiple-Site Processing, Single-Site
Data), MPMD (Multiple-Site Processing, Multiple-Site Data), Distributed Database Transaction
Features, Transaction Transparency, Client/ Server Vs DDBMS.

Section– D

Business Intelligence and Decision Support System:


The need for Data Analysis, Business Intelligence, Operational Data vs. Decision Support Data,
DSS Database properties and importance, DSS Database Requirements.
OLAP and Database Administration:
Introduction to Online Analytical Processing (OLAP), OLAP Architecture Relational, Star
Schemas, Database Security, Database administration tools, Developing a Data Administration
Strategy.

References:

1. Database Systems, Peter Rob and Carlos Coronel, Cengage Learning, 8th ed.
2. Database System Concepts, Henry F. Korth and Abraham Silberschatz, McGraw-Hill, 4th ed.
3. An Introduction to Database Systems, C. J. Date, Pearson Education, 8th ed.
4. Principles of Database Systems, Ullman, Galgotia Publications, 3rd ed.
5. An Introduction to Database Systems, Bipin C. Desai, Galgotia Publications.
Table of Contents

1. Overview of Database Management System – I (Mr. Tarun Kumar, Lecturer, St. Joseph School, Barnala)
2. Overview of Database Management System – II (Mr. Tarun Kumar, Lecturer, St. Joseph School, Barnala)
3. Entity Relationship Modelling (Mr. Tarun Kumar, Lecturer, St. Joseph School, Barnala)
4. Database Models (Mr. Tarun Kumar, Lecturer, St. Joseph School, Barnala)
5. Object Oriented Databases (Mr. Tarun Kumar, Lecturer, St. Joseph School, Barnala)
6. Normalization and Data Integrity (Mr. Tarun Kumar, Lecturer, St. Joseph School, Barnala)
7. Client/Server Architecture and Transaction Management (Mr. Tarun Kumar, Lecturer, St. Joseph School, Barnala)
8. Concurrency Control Mechanisms (Ms. Kanchan Gupta, AP, Mayur College, Kapurthala)
9. Distributed Databases (Mr. Dinesh Kumar, AP, ACET Amritsar)
10. Levels of Data Distribution (Mr. Tarun Kumar, Lecturer, St. Joseph School, Barnala)
11. Process Distribution in Databases (Mr. Tarun Kumar, Lecturer, St. Joseph School, Barnala)
12. Business Intelligence (Ms. Kanchan Gupta, AP, Mayur College, Kapurthala)
13. Decision Support Systems (Ms. Kanchan Gupta, AP, Mayur College, Kapurthala)
14. Online Analytical Processing (Ms. Kanchan Gupta, AP, Mayur College, Kapurthala)
15. Database Administration (Ms. Aarti, AP, ACET Amritsar)

Reviewed By
Gagan Kumar
DAV Institute of Engineering and Technology,
Kabir Nagar, Jalandhar.

© I.K. Gujral Punjab Technical University, Jalandhar
All rights reserved with I.K. Gujral Punjab Technical University, Jalandhar
Lesson 1 Overview of database management system
Structure
1.0 Objective
1.1 Introduction
1.2 Basic DBMS terminology
1.3 Components of DBMS
1.4 Data abstraction
1.5 Data Independence
1.6 DBMS Architecture [system structure]
1.7 Summary
1.8 Glossary
1.9 Answers to check your progress/self assessment questions
1.10 References/ Suggested Readings
1.11 Model questions

1.0 Objective
After studying this lesson, students will be able to:
1. Define database management system.
2. List various components of DBMS.
3. Explain the levels of data abstraction.
4. Discuss the concept of data independence.
5. Describe the architecture of DBMS.

1.1 Introduction
Data storage is not the only objective of a DBMS; even files were capable of storing data.
DBMSs became popular because of their ability to perform operations such as data access and
manipulation at the minimum possible cost. There is a clear separation between the schemas
used to present data to the end user and the schemas used to store data on storage devices. No
business today can survive without a complete database management solution.

1.2 Basic DBMS terminology


Data was initially stored in files, but it became very difficult to manage data in files as it grew
larger, and performing search operations on data stored in files was very expensive. Then came
the concept of the database and the database management system. A database is a collection of
interrelated data items in the form of tables. A single database may consist of a number of
tables. Let us first discuss the concept of a table.
A table is also known as an entity, and a database can be called an entity set. A table is a logical
representation of data in matrix form, i.e. rows and columns. Each column of a table represents
an attribute or property of the table. Each row in a table represents a tuple or record. A cell in
a table is where a row and a column intersect.

Consider the following example.


You may want to create a database for a college. Let the name of the database be "college".
Now, a database is a collection of tables or entities, and following are possible entities for
"college" database.
1. Student
2. Fees
3. Attendance
4. Scholarship
5. Library, etc.

Each entity can be represented in the form of a table, and one table consists of interrelated
attributes. Following is an example of attributes for the student entity:
1. Name
2. Father_Name
3. Roll_no
4. Class
5. Address
6. Contact_no, etc.

This information can be represented in the form of table as shown below:


name    father_name  roll_no  class  address  contact_no
Ajit    Ravi         12       11     ABC      123
Sachin  Jiten        34       11     DEF      456
Vikas   Anoop        2        12     GHI      789
Rupal   Raj          31       12     JKL      134
Dhruv   Pankaj       5        12     MNO      167

The table above consists of 5 records. The value 34 represents a cell, namely the one
where the attribute roll_no and the 2nd record intersect. As already stated, a database is a collection of
interrelated tables or entities. Consider the following two tables:

T1
name roll_no Class
T2
emp_code department Salary

A combination of these two tables cannot be considered a database, as the two tables are not
interrelated. It is important that the two tables have a common attribute that is
used to relate them.
Consider the following two tables for example,
T1
name roll_no Class

T2
roll_no book_title date_of_issue

Now, the two tables have roll_no as the common attribute, and hence the two tables can be
considered interrelated. They can both be part of a single database.
Interrelated tables are used to gather information about a subject, such as a student, from
multiple tables. For example, information about a student's attendance, fee payment status,
books issued, and scholarship details (if any) can be gathered from multiple tables using the
common attribute "roll_no".
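
To make the idea of interrelated tables concrete, the following is a minimal SQL sketch. The table and attribute names (student, library_record, roll_no, and so on) are hypothetical and only mirror the examples above; exact syntax may vary slightly between database products.

-- Two interrelated tables sharing the common attribute roll_no.
CREATE TABLE student (
    roll_no INT PRIMARY KEY,
    name    VARCHAR(50),
    class   INT
);

CREATE TABLE library_record (
    roll_no       INT,            -- common attribute relating the two tables
    book_title    VARCHAR(100),
    date_of_issue DATE
);

-- Gather information about one student from both tables
-- using the common attribute roll_no.
SELECT s.name, l.book_title, l.date_of_issue
FROM student s
JOIN library_record l ON s.roll_no = l.roll_no
WHERE s.roll_no = 34;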

Now that you have an idea of what a database is, it is time to get familiar with the concept of
a database management system. It is important that a set of rules and procedures is defined for
managing the database. The view of data stored on the disk and the user's view of data are
entirely different. A database management system is a set of programs that are used to manage
the database. The database management system is responsible for assisting in the faster execution
of operations such as insertion, deletion and search on the database.

Following are some of the responsibilities performed by a database
management system:
1. Providing support for operations such as inserting, updating, deleting and accessing of
data.
2. Providing security against unauthorized access to data.
3. Maintaining data integrity and consistency at all times.
4. Providing backup for the data and also the ability to recover data in case of any crash or
system failure.
5. Maintaining the catalog and directory of database objects.
6. Providing support for various user interface packages, such as the SQL interface for
relational database systems.
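
As a brief, hedged illustration of the first two responsibilities, the SQL below shows the basic data operations and an access-control statement. The student table continues the earlier sketch, and the user name clerk is hypothetical; privilege syntax differs somewhat between database products.

-- Responsibility 1: inserting, updating, deleting and accessing data.
INSERT INTO student (roll_no, name, class) VALUES (34, 'Sachin', 11);
UPDATE student SET class = 12 WHERE roll_no = 34;
SELECT name, class FROM student WHERE roll_no = 34;
DELETE FROM student WHERE roll_no = 34;

-- Responsibility 2: security against unauthorized access.
-- Grant only read access on student to a hypothetical user 'clerk'.
GRANT SELECT ON student TO clerk;
REVOKE SELECT ON student FROM clerk;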

Check your progress/ Self assessment questions- 1


Q1. Define database.
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q2. A table is also known as _____________.


Q3. Each column of a table represents an ____________ and each row in a table represents a
tuple or __________.
Q4. It is not mandatory for two tables to have a common attribute that is used to relate the two
tables. ( TRUE / FALSE )
Q5. Define DBMS.
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

1.3 Components of DBMS


Following are the four components of a DBMS:
1. Hardware: It is the most basic requirement for creating a computer-based system. You need
a client system from where requests can be made for accessing the database. Network devices
are needed in case the client and server are not on the same system. Storage space is an
important component of a DBMS, and you need storage space not only to save the operational
data, but also to save the backup and the archived data.
2. Software: As already stated, a DBMS is a set of programs that defines the rules and
procedures to access the data. Complete database management solutions are very costly
nowadays. A DBMS software is expected not only to facilitate access to data, but also to
provide other functionalities like security, regular backup, client server support, and ensuring
data integrity and consistency.
3. Data: Data is the core component of a DBMS. The term data existed even before the DBMS.
Data is interrelated pieces of information that must be stored for processing. Data must be
organized in a manner that facilitates quick operations on it.
4. Procedures: It refers to the rules and regulations that must be followed when designing the
database. Procedures may differ on the basis of end-user requirements or the objective for
which the DBMS is being designed.

1.4 Data abstraction


You must already be familiar with the fact that there is a difference in the way data is stored
on the disk and the way it is viewed by the user. There is an obvious logic for why this difference
exists, and it is all for the benefit of the end user. A database can also be viewed from
different angles based on different levels of abstraction. Any entity in a DBMS can be seen
from different perspectives or levels of complexity, each revealing a different amount of
abstraction.
For instance, when you buy a computer system and have a look at the CPU, you know very
little of what is inside it; this is what you call a high level of abstraction: very little
or no knowledge of the details. As you open the cabinet or CPU case, you are able to see its major
components and what it is made of. There is still very little knowledge of the details, and this is
called the middle level of abstraction. Finally, if you proceed to detach and open any of the
hardware units, you will be able to see more of the details, and this is what you call a low level
of abstraction.
Similarly, based on the level of abstraction, data can be viewed from three different perspectives.

Figure 1.1: DBMS view or DBMS architecture from data abstraction point of view.

A schema is concerned with the arrangement of data. As you can see in the figure above, data is
represented in a DBMS using 3 schemas that represent three different levels of abstraction.
The physical schema is also known as the internal schema, and it represents the lowest level of
abstraction. It deals with the description of how raw data items, i.e. the values for each attribute
in a record, are stored on the physical storage device (hard disk, CD, tape drive, etc.). It also
describes the location or physical address of the items and the size of the items in the storage
device. This schema is useful for database application developers and the database administrator.
The conceptual schema is also known as the logical schema, and it represents the middle level of
abstraction. It deals with the overall logical structure of the entire database. This level deals
only with the structure of the database and has nothing to do with the raw data items of the
physical schema. This level of abstraction deals with defining the attributes for each table in
the database, including the common attributes in different tables that define the
relationships between them. It also deals with the type of values that will be stored in each
attribute. The success of any database system depends on careful design of the conceptual schema.
The external schema is also known as the view schema, and it represents the user view. The
external schema represents the highest level of abstraction. It is designed keeping in mind the
end user, who has little knowledge about the working of the DBMS. The end user is least
bothered about the functionality and structure of the database and is only concerned with the
information relevant to them. A DBMS provides more than one external schema, each for a
different group of users. It is important that the user is presented with only a limited view and
not left confused by having all the data pushed in front of them. Creating different external
schemas also helps implement data security, by hiding the information not meant for that user.
Virtual tables, designed keeping in mind the needs of the end user, are used to build external
views. These are created dynamically for the end users at runtime. Some of the fields that an
end user may view might not be physically saved in the database. On the contrary, many
attributes that are part of the database and used to maintain the relations are hidden from the user.
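
The virtual tables described above correspond to SQL views. The following is a minimal sketch, assuming a hypothetical employee table; the view acts as an external schema that exposes only the columns one group of users is meant to see, hiding salary to implement data security.

-- Conceptual schema: the base table with all attributes.
CREATE TABLE employee (
    emp_code   INT PRIMARY KEY,
    name       VARCHAR(50),
    department VARCHAR(30),
    salary     DECIMAL(10, 2)
);

-- External schema: a virtual table created for one group of users.
-- The sensitive salary attribute is hidden from this view.
CREATE VIEW employee_directory AS
SELECT emp_code, name, department
FROM employee;

-- End users query the view as if it were an ordinary table.
SELECT * FROM employee_directory;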

Check your progress/ Self assessment questions- 2


Q6. ______________ devices are needed in case the client and server are not on the same
system.
Q7. A database can also be viewed from different angles based on different levels of
_____________.
Q8. Define conceptual schema.
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

1.5 Data Independence


It is a key property of a DBMS. According to this property, a DBMS must ensure that any
change made to one level of schema of the database requires little or no change in the
schema above it. Data independence does not apply to the external schema, as there is no schema
above it. Data independence can be classified into the following two types:
1. Physical Data Independence: Physical data independence exists if changes made to the
physical or internal schema require no or little change in the conceptual or logical schema.
2. Logical Data Independence: Logical data independence exists if changes made to the
conceptual or logical schema require no or little change in the external or view schema. It is
harder to achieve than physical data independence, since applications and views depend
heavily on the logical structure of the data.
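
Logical data independence can be illustrated by reusing the hypothetical employee table and employee_directory view sketched in the previous section: a change to the conceptual schema leaves the external schema untouched (ALTER TABLE syntax varies slightly between products).

-- Change the conceptual schema: add a new attribute to the base table.
ALTER TABLE employee ADD COLUMN date_of_joining DATE;

-- The external schema is unaffected: the view still returns exactly
-- the columns it was defined with, so existing user queries keep working.
SELECT * FROM employee_directory;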

1.6 DBMS Architecture [system structure]


The database may not be stored on the same machine where the end user is working. The
machine from where the end user requests data is called the client, and the machine where the
database is stored is called the server. Database applications work on either a 2-tier or a 3-tier
DBMS architecture. The application in a two-tier architecture resides at the client machine
and invokes database system functionality at the server machine through query language
statements. ODBC and JDBC (API standards) are used for interaction between the client and
server. In the case of a 3-tier architecture, the client machine simply acts as a front end.
It communicates with an application server through a forms interface, which in turn
communicates with a database system. The application server contains the business logic. The
3-tier architecture is more appropriate for applications that run on the World Wide Web.

Figure 1.2: 2-tier and 3-tier architecture


Source: Database System Concepts by Abraham Silberschatz, Henry F. Korth, S. Sudarshan,
Mc Graw Hill.

Following is the system structure of a DBMS architecture:

Figure 1.3: System structure of DBMS
Source: Database System Concepts by Abraham Silberschatz, Henry F. Korth, S. Sudarshan,
Mc Graw Hill.
There are four types of users, as you can see in the figure above:
1. Application programmers are responsible for coding the procedures and routines.
2. Naive users are the end users who have little or no knowledge of the system and often use
interfaces such as automatic teller machines to interact with the system.
3. Sophisticated users are familiar with the system and query languages, and they design
their own ad-hoc queries to interact with the system.
4. The database administrator is responsible for the well-being of the DBMS. The DBA is
responsible for defining the schema, creating new user-ids, roles, access permissions, etc.

The query processor is responsible for optimizing a query such that its processing costs the
minimum in terms of the time and resources needed. The transactional storage manager is
responsible for managing data access and data manipulation calls. The storage system
includes algorithms and data structures for organizing and accessing data on disk. A data
dictionary is a file or a set of files that contains the metadata related to the database. The data
dictionary contains records about other objects in the database, such as data ownership, data
relationships to other objects, and other data. The data dictionary is a crucial component of
any relational database. Indexing is a data structure technique used to efficiently retrieve
records from the database files based on the attributes on which the indexing has been done.
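
As a minimal sketch of the indexing technique just mentioned (the table and index names are hypothetical, continuing the earlier student example):

-- Create an index on the attribute most often used in search conditions.
-- The DBMS can then locate matching records without scanning the whole file.
CREATE INDEX idx_student_roll_no ON student (roll_no);

-- This query can now be answered through the index structure.
SELECT name, class FROM student WHERE roll_no = 34;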

Check your progress/ Self assessment questions- 3


Q9. Define Logical data independence.
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q10. Describe 3-tier architecture.


___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q11. ________________________ is responsible for the well-being of the DBMS.


Q12. DBMS is
a. Detailed base management system
b. Data base management storage
c. Data base management system
Q13. Components of DBMS
a. Hardware
b. Software
c. Data and Procedures
d. All of the above
Q14. Physical schema is also known as
a. external schema
b. internal schema
c. None of the above

Q15. ______________ is responsible for managing the data access and data manipulation
a. transactional storage manager
b. Data Dictionary
c. Indexing

1.7 Summary
A database is a collection of interrelated data items in the form of tables. A table is also
known as an entity, and a database can be called an entity set. A table is a logical representation of
data in matrix form, i.e. rows and columns. Each column of a table represents an attribute or
property of the table. Each row in a table represents a tuple or record. A cell in a table is
where a row and a column intersect.
A database management system is a set of programs that are used to manage the database.
Hardware, software, data and procedures are the four components of any DBMS. A database can
also be viewed from different angles based on different levels of abstraction. The physical
schema is also known as the internal schema, and it represents the lowest level of abstraction. It
deals with the description of how raw data items are stored on the physical storage device.
The conceptual schema is also known as the logical schema, and it represents the middle level of
abstraction. It deals with the overall logical structure of the entire database. The external schema
is also known as the view schema, and it represents the user view. The external schema represents
the highest level of abstraction. It is designed keeping the end user in mind. Data independence
means that changes made to one level of schema of the database should require little or no
change in the schema above it. Physical data independence exists if changes made to the
physical or internal schema require no or little change in the conceptual or logical schema.
Logical data independence exists if changes made to the conceptual or logical schema require
no or little change in the external or view schema. The application in a two-tier architecture
resides at the client machine and invokes database system functionality at the server machine
through query language statements. In the case of a 3-tier architecture, the client machine
communicates with an application server through a forms interface, which in turn
communicates with a database system. The application server contains the business logic.

1.8 Glossary
Database- A database is a collection of interrelated data items in the form of tables.

DBMS- A database management system is a set of programs that are used to manage the
database.
Physical schema- It represents the lowest level of abstraction and deals with the description of
how raw data items are stored on the physical storage device.
Conceptual schema- It represents the middle level of abstraction and deals with the overall
logical structure of the entire database.
External schema- It represents the highest level of abstraction and presents the user view.
Data independence- It means that changes made to one level of schema of the database
should require little or no change in the schema above it.

1.9 Answers to check your progress/self assessment questions


1. A database is a collection of interrelated data items in the form of tables. A single database
may consist of a number of tables.
2. entity.
3. attribute, record.
4. FALSE.
5. A database management system is a set of programs that are used to manage the database. It
is responsible for assisting in the faster execution of operations such as insertion, deletion and
search on the database.
6. Network.
7. abstraction.
8. The conceptual schema represents the middle level of abstraction. It deals with the overall
logical structure of the entire database. This level of abstraction deals with defining the
attributes for each table in the database. It also deals with the type of values that will be
stored in each attribute.
9. Logical data independence exists if changes made to the conceptual or logical schema
require no or little change in the external or view schema.
10. In a 3-tier architecture, the client machine simply acts as a front end that communicates with
an application server through a forms interface, which in turn communicates with a database
system. The application server contains the business logic.
11. Database administrator
12. c
13. d
14. b

15. a
1.10 References/ Suggested Readings
"1. Introduction to Database Management system by Gillenson, Ponniah, Kriegel, Trukhov,
Taylor, Powell, Miller, WILEY.
2. Fundamentals of Relational Database Management System by S. Sumathi and S.
Esakkirajan, Springer.
3. Database 1management Systems by R. Panneerselvam, PHI.
4. Database 1management system Concepts by P. K. Singh, VK Publications."

1.11 Model questions


1. What is DBMS? Explain various components of DBMS.
2. What is data independence? Explain the 2 types of data independence.
3. Explain the 3 levels of data abstraction.
4. Describe the 2-tier and 3-tier DBMS architectures.
5. Explain the concept of data, table and database.

Lesson 2 Database Lifecycle Model
Structure
2.0 Objective
2.1 Introduction
2.2 Database Life Cycle (DBLC)
2.2.1 PHASE 1. The Database Initial Study
2.2.2 PHASE 2. Database Design
2.2.3 PHASE 3. Implementation and Loading
2.2.4 PHASE 4. Testing and Evaluation
2.2.5 PHASE 5. Operation
2.2.6 PHASE 6. Maintenance and Evolution
2.3 Summary
2.4 Glossary
2.5 Answers to check your progress/self assessment questions
2.6 References/ Suggested Readings
2.7 Model questions

2.0 Objective
After studying this lesson, students will be able to:
1. Discuss the need of database lifecycle.
2. Discuss the importance of designing the database.
3. List various steps of database life cycle.
4. Explain the importance of testing and evaluation of database components.

2.1 Introduction
The process of developing a database goes through a number of phases, and each phase is
linked to the next. Database development is an incremental process. Database
development may take place independently, or it may be a part of the overall development of a
software system. It is important that you proceed by understanding the constraints and other
requirements before starting with the actual design of the database structure. Database
design also helps in defining the procedures used by other components for transforming or
updating the data values in the database. Each phase is important, and database
development should be done stepwise.

2.2 Database Life Cycle (DBLC)
The database life cycle consists of the following six phases:
1. Database initial study
2. Database design
3. Implementation and loading
4. Testing and evaluation
5. Operation
6. Maintenance and evolution.

Figure 2.1: DBLC Phases

2.2.1 PHASE 1. The Database Initial Study


A company's need for a new database design is often the direct result of shortcomings of the existing
database management system. The database design might be a part of an overall systems
development project, where the database designer is one member of an overall team
composed of a project leader, one or more senior systems analysts, and one or more junior
systems analysts. For a small project, the database designer might be the only member.
The overall purpose of the database initial study is to:
a. Analyse the company situation
b. Define the problems and constraints
c. Define the objectives
d. Define the scope and boundaries

a. Analyse the company situation
The company situation describes the organizational structure and its mission. The database designer
must discover the company's operational components, how they function, and how they interact.
The design must satisfy the operational demands created by the organization's mission. The
designer must understand the organization chart of the company and try to define the required
information flows, specific report and query formats, and so on.

b. Define problems and constraints


The designer can gather information from both formal and informal sources of
information. The problem definition process might initially appear to be unstructured. End-
users are often unable to describe precisely the larger scope of company operations, which
makes the identification of problems a difficult task.

The designer begins by collecting a very broad problem description. Even the most complete
problem definition may not lead to the perfect solution. The database design is limited by real-
life constraints. Such constraints include time, budget, personnel, and more. A designer must
be able to identify and categorize these constraints as precisely as possible.

c. Define objectives
A database design must at least be capable of providing a solution to the key problems identified
during the problem discovery process. The designer must identify the sources of these
problems, as some of the problems may have common sources. It is always good design
practice to address the sources of the problems rather than the symptoms of the problems at hand.

The designer must ensure that the database system objectives correspond to those
envisioned by the end-users. The database designer must begin by proposing the system's initial
objective, the system's interfaces, and whether the system will share data with other systems or
users.

d. Define scope and boundaries


The system's scope defines the extent of the design according to operational requirements. The
scope of the database design helps in defining data structures, the number of entities, physical
size, and other concepts about the database.
Another limit, known as "boundaries", is external to the system and is imposed by the existing
hardware and software. The decision on hardware and software components is made keeping in
mind the goals of the system. The designer is restricted within the constraints of scope and
boundaries and is yet expected to produce the best possible system design.

Check your progress/ Self assessment questions- 1


Q1. List various phases of database lifecycle model.
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q2. List the objectives of the phase: Database initial study.


___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

2.2.2 PHASE 2. Database Design


Now the focus is on designing the database to support company operations and objectives. This
phase is concerned with ensuring that the database design meets the requirements of the end
user. In this phase, the designer must concentrate on the data characteristics required to build
the database model. Data can be viewed from:
- The business view of data as a source of information
- The designer's view of the data structure, its access, and the activities required to transform the
data into information.

The following points should be kept in mind before examining the procedures required to
complete the design phase of the database life cycle model:
1. The process of database design is part of the overall system development, and the data
component is only a single element of a larger information system.
2. The systems analysts are also responsible for designing the components of other systems
and activities, to create procedures for transforming the data within the database into useful
information.

3. The database design is not necessarily a sequential process; rather, it is an iterative process
that provides continuous feedback designed to trace back to previous steps.

Conceptual Design
In the conceptual design stage, data modelling is used to create an abstract database structure
that represents real-world objects in the most realistic way possible. The conceptual model
must embody a clear understanding of the business and its functional areas.
1. Data Analysis and Requirements
The first step is to characterize the data elements. The focus of the designer is on:
a.) Information needs. The designer must analyse what information is needed and
what sort of information, in terms of reports, the system needs to produce. The designer also
needs to analyse the information being produced by the system currently in operation, and what
information is lacking in it.
b.) Information users. The designer needs to identify the potential users of the information and
how they are expected to use it. The designer is also concerned with the design
of numerous user views.
c.) Information sources. The designer needs to identify the potential credible sources of
information. Not all information may be useful, and hence the designer needs to decide on a
mechanism for extracting useful information from these sources.
d.) Information constitution. Which data elements are expected to produce the information, and
what attributes should be included? The designer needs to identify the relationships between the
data attributes. What transformations or functions will be applied to the data to produce the
expected information?

2. Entity Relationship Modelling and Normalization


It deals with the design of the conceptual model of the database using the ER model. The ER
model is used to define the entities and the relationships between those entities. Larger entities
are decomposed into smaller entities, which are related to each other through a common
attribute. The ER model is used to represent the relational data model.

3. Data Model Verification


The ER model is verified against the proposed system processes to confirm that the proposed
processes can be supported by the database model. A series of tests is performed to verify the
data model against:

a.) SELECT, INSERT, UPDATE, and DELETE operations and queries and reports.
b.) Access paths and security
c.) Business-imposed data requirements and constraints.

4. Distributed Database Design


It is not mandatory to store all of the database at one location. Generally, pieces of data are
stored at multiple sites. Also, the procedures to access the data vary from one site to another.
Data distribution and allocation strategies are needed to distribute the database across multiple
sites.

2.2.3 PHASE 3. Implementation and Loading

All the instructions specified in the design phase related to entity creation, attributes and
their domains, indexes, and constraints are implemented in this phase. The following steps are
performed in this phase:
a. Install the DBMS:
This is performed only when a new dedicated instance of the DBMS is necessary for the system.
b. Create the Database(s): Modern relational database management systems support the creation
of special storage-related constructs to store the database entities. The constructs generally
include storage groups, table spaces, and tables.
c. Load Data: Once the database has been created, the next logical step is to add data to it.
Generally, the existing data is saved in an old system, and it needs to be migrated into the new
system. Data is extracted from multiple sources with different formats, such as other
relational databases, non-relational databases, flat files, or legacy systems.
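
A minimal sketch of the loading step, assuming hypothetical legacy_student and student tables, where rows from the old system are migrated into the new one; loading from flat files or non-relational sources would instead use product-specific bulk-load utilities.

-- Migrate existing data from the old system's table into the new schema.
INSERT INTO student (roll_no, name, class)
SELECT roll_no, name, class
FROM legacy_student;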

2.2.4 PHASE 4. Testing and Evaluation

The last phase, i.e. implementation and loading, was concerned with implementing or carrying
out the decisions made during the design phase to ensure database integrity, security,
performance, and recoverability. The responsibility for this 4th phase, i.e. testing and
evaluation, lies with the database administrator. The DBA performs various tests and fine-tunes
the database to ensure that the actual performance of the DBMS is as expected. This phase is
carried out concurrently with application programming. Following are the key activities
of this phase.
a. Test the Database: Testing of the database is done to maintain database integrity and
consistency. Database integrity is a key feature of an RDBMS, and it can be enforced by proper
application of referential integrity. Changes in one relation should be properly
communicated to all relations related to it, using primary key and foreign key attributes, as the
sketch below illustrates. Proper checks should also be performed during the tests to verify
password security, access rights, etc.
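
The following is a hedged sketch of how such referential integrity might be declared, using hypothetical class and enrolment tables; the foreign key makes the DBMS itself propagate (or block) changes between the related relations.

-- Parent relation with a primary key.
CREATE TABLE class (
    class_id INT PRIMARY KEY,
    room     VARCHAR(10)
);

-- Child relation: the foreign key enforces referential integrity.
CREATE TABLE enrolment (
    roll_no  INT PRIMARY KEY,
    class_id INT,
    FOREIGN KEY (class_id) REFERENCES class (class_id)
        ON UPDATE CASCADE    -- key changes propagate to this relation
        ON DELETE RESTRICT   -- a class with enrolled students cannot be dropped
);

-- During testing, this insert must be rejected if class 99 does not exist.
INSERT INTO enrolment (roll_no, class_id) VALUES (1, 99);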
b. Fine-Tune the Database: There are no set standards for measuring the performance of a
database, and hence it is not easy to evaluate. Still, measuring the performance of
the database is a most important factor in database implementation. Performance criteria
differ from system to system. Other than technical factors, even environmental factors affect
database performance. The hardware and software in use also greatly impact the
performance of a database system.
c. Evaluate the Database and its Application Programs: A system must be tested and
evaluated with a more holistic approach. Testing of the individual components or procedures of
the database management system should be done first, and then testing should be done to
ensure that there is flawless interaction between these components. Data is important and
very costly, and every effort must be made to ensure that no data is lost. Testing must be
performed to ensure that proper backup and recovery mechanisms are put in place and
working as expected.

Check your progress/ Self assessment questions- 2


Q3. What is conceptual design?
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q4. List various steps involved in implementation and loading phase.


___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

2.2.5 PHASE 5. Operation

A database is considered to be operational once it passes the evaluation phase. Users,
application programs and management all put together constitute a complete information
system. System evaluation is the first activity of the operational phase of the database life cycle
model. When all the users become operational and start to work on the database in a real
environment, you immediately start to face problems that could not be traced during the
testing phase. Most of these problems may not seem too serious, but they are serious enough
to annoy the actual user. Other problems, of a more serious nature, need immediate repair.
For example, when you start to transfer data over the web, a massive volume of data could lead
to network congestion or a system crash. The designer then needs to identify the source of the
problem and design multiple solutions to it. Designing load-balancing software
that focuses on the problem source is one solution. Another solution is to provide an additional
cache for the DBMS. No matter how well a designer manages all the design aspects, there is
still the need for routine maintenance.

2.2.6 PHASE 6. Maintenance and Evolution


The database administrator must be prepared to perform routine maintenance activities within
the database. Some of the required periodic maintenance activities include:
Preventive maintenance (backup).
Corrective maintenance (recovery).
Adaptive maintenance (enhancing performance, adding entities and attributes, and so on).
Assignment of access permissions and their maintenance for new and old users.

2.3 Summary
Database development is an incremental process. The database life cycle is divided into
six phases. The design must satisfy the operational demands created by the organization's
mission. The designer must understand the organization chart of the company and try to
define the required information flows, specific report and query formats, and so on. The design
phase is concerned with ensuring that the database design meets the requirements of the end user.
The implementation and loading phase is concerned with installation of the DBMS, creation of
the database(s) and loading of data. The responsibility for the testing and evaluation phase lies
with the database administrator. The DBA performs various tests and fine-tunes the database to
ensure that the actual performance of the DBMS is as expected. System evaluation is the
first activity of the operational phase of the database life cycle model. When all the users become
operational and start to work on the database in a real environment, you immediately start to
face problems that could not be traced during the testing phase. The designer then needs to
identify the source of the problem and design multiple solutions to it. No matter
how well a designer manages all the design aspects, there is still the need for routine
maintenance.

2.4 Glossary
ER Model- A database model used to define the various entities and the relationships between
those entities.
Normalization- Decomposing a large entity into a number of smaller entities to reduce data
redundancy.
DBLC- It refers to the complete database development process consisting of six phases.
Procedure- Code used to transform the state or values of database entities.
DBA- It stands for database administrator, who is responsible for the overall management of the
database.

2.5 Answers to check your progress/self assessment questions


1. The database life cycle consists of the following six phases:
a. Database initial study
b. Database design
c. Implementation and loading
d. Testing and evaluation
e. Operation
f. Maintenance and evolution.
2. The objectives of the database initial study are to:
a. Analyse the company situation
b. Define the problems and constraints
c. Define the objectives
d. Define the scope and boundaries
3. In conceptual design, data modelling is used to create an abstract database structure that
represents real-world objects in the most realistic way possible. The conceptual model must
embody a clear understanding of the business and its functional areas.
4. Steps involved in implementation and loading phase:

a. Install the DBMS.


b. Create the Database(s).
c. Load or Convert the Data.

2.6 References/ Suggested Readings
"1. Introduction to Database Management system by Gillenson, Ponniah, Kriegel, Trukhov,
Taylor, Powell, Miller, WILEY.
2. Fundamentals of Relational Database Management System by S. Sumathi and S.
Esakkirajan, Springer.
3. Database 1management Systems by R. Panneerselvam, PHI.
4. Database 1management system Concepts by P. K. Singh, VK Publications."

2.7 Model questions


1. Explain the database design phase of DBLC.
2. What is the importance of testing and evaluation phase?
3. Why should maintenance be performed at the end of database development cycle?
4. What is the need to conduct database initial study?
5. Define DBLC.

Lesson 3 Entity relationship modelling
Structure
3.0 Objective
3.1 Introduction
3.2 ER Model and ER Diagram
3.3 The Concept of Keys
3.4 Entity
3.4.1 Types of Entities
3.5 Attributes
3.5.1 Types of Attributes
3.6 Relationship
3.6.1 Degree of Relationship
3.6.2 Connectivity and Cardinality
3.7 ER Diagram Symbols
3.8 Generalization, Specialization and Aggregation
3.9 Steps in creating an ER Diagram
3.10 Advantages of Entity Relationship Diagram:
3.11 Summary
3.12 Glossary
3.13 Answers to check your progress/self assessment questions
3.14 References/ Suggested Readings
3.15 Model questions

3.0 Objective
After studying this lesson, students will be able to:
1. Define the Entity relationship model.
2. Explain various components of ER model.
3. Describe various symbols used in ER Diagram.
4. List various steps used in creating an ER Diagram.
5. List various advantages of using ER Diagram.

3.1 Introduction
A picture speaks a thousand words. Visual representation of the relational data model is easy to
understand and helps in better analysis. It is an effective way of expressing relationships between
entities. It is good practice to create an ER diagram before the actual design of the relational
database. In this lesson you will get an opportunity to learn all the basic concepts that you
need to understand for designing your own ER diagram.

3.2 ER Model and ER Diagram


ER models are best suited for designing relational databases. The relational data model is the
most used database model for data storage and processing. The Entity Relationship model is used
to define the conceptual view of a database. It is used to represent real-world entities and
show the associations between them. The ER model doesn't actually provide a complete
description of the database; instead, it is an intermediate step that facilitates efficient design of
the database. The ER diagram is a tool for visual representation of the relational model. It provides
the basis for producing a data structure that helps in effective data storage and retrieval.

3.3 The Concept of Keys


A key is used to uniquely identify an instance of an entity. Following is a list of commonly
used keys in the relational data model.
Super Key: It refers to a collection of attributes that are used to uniquely identify an instance
or row of an entity or table in the database.
Candidate Key: It refers to a minimal super key. If the super key consists of a single
attribute, then the super key is also a candidate key. For a super key that is a collection of
attributes, there may exist more than one candidate key for the entity.
Primary Key: The primary key is a candidate key that uniquely identifies an entity instance.
There can be multiple candidate keys for an entity, but only one primary key. The value of the
primary key attribute must be unique, and it cannot be empty for any instance or row. For example,
aadhar_number and employee_code are two candidate keys for the employee entity, and you can
select employee_code as the primary key.
Alternate Key: Candidate keys that are not selected as the primary key still
possess the properties of a primary key. Such candidate keys are called alternate keys.
Composite Key: Sometimes a single key or attribute is not enough to identify a single row or
instance of an entity. In such cases, the key is a combination of 2 or more attributes. For
example, students in different classes can have the same roll numbers; hence, in order to uniquely
identify a student in a school entity, it is advisable to use a combination of the class and roll
number attributes.

Foreign Key: An entity may not have a primary key of its own. The primary key of a related
entity can then be used to identify the rows of such an entity. The primary key of the related
entity is called the foreign key of that entity or table.
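
These key concepts map directly onto SQL constraints. A minimal sketch with hypothetical tables: employee_code is chosen as the primary key, aadhar_number is an alternate (candidate) key declared UNIQUE, (class, roll_number) forms a composite key, and the dependant table carries a foreign key.

CREATE TABLE employee (
    employee_code INT PRIMARY KEY,     -- the chosen primary key
    aadhar_number CHAR(12) UNIQUE,     -- alternate (candidate) key
    name          VARCHAR(50)
);

-- Composite key: class and roll_number together identify a student.
CREATE TABLE school_student (
    class       INT,
    roll_number INT,
    name        VARCHAR(50),
    PRIMARY KEY (class, roll_number)
);

-- Foreign key: dependant rows are identified through the related
-- employee entity's primary key.
CREATE TABLE dependant (
    employee_code INT,
    dep_name      VARCHAR(50),
    FOREIGN KEY (employee_code) REFERENCES employee (employee_code)
);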

3.4 Entity
An entity is something that is easily identifiable and is used to represent a real-world object.
For example, in the database of an enterprise, entities may be used to represent suppliers,
stock, employees, buyers, accounts, etc. Each of these entities consists of attributes or
properties that give it its identity.
A collection of similar types of entities is called an entity set. An entity set may contain entities
with attributes sharing similar values, for example, employees that belong to the same
company, suppliers that supply the same product, stock of a single product, etc.

Check your progress/ Self assessment questions- 1

Q1. The Entity Relationship model is used to define the ________________ view of a database.

Q2. The value of the primary key attribute must be _________ and it cannot be ________ for
any instance or row.

Q3. Define super key.


___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q4. Define foreign key.


___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

3.4.1 Types of Entities


Traditional Entity
A simple entity, as explained above, is called a traditional entity. It is never used to define the
many-to-many relationship type. The many-to-many relationship is explained later in this lesson.

Composite entity
It is capable of handling the many-to-many relationship, so it overcomes the limitation of the
traditional entity. It is used to connect two simple entities by sharing the primary keys of both
the connected entities, as the sketch below shows. Besides being an entity, it also possesses the
properties of a relationship.
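
A composite entity is typically implemented as a table that holds the primary keys of both entities it connects. A hedged sketch, assuming hypothetical student and course entities with a many-to-many relationship between them:

CREATE TABLE student (
    roll_no INT PRIMARY KEY,
    name    VARCHAR(50)
);

CREATE TABLE course (
    course_id INT PRIMARY KEY,
    title     VARCHAR(50)
);

-- Composite entity: shares the primary keys of both connected entities,
-- turning one many-to-many relationship into two one-to-many ones.
CREATE TABLE student_course (
    roll_no   INT,
    course_id INT,
    PRIMARY KEY (roll_no, course_id),
    FOREIGN KEY (roll_no)   REFERENCES student (roll_no),
    FOREIGN KEY (course_id) REFERENCES course (course_id)
);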
Subtype/ Supertype Entity
It is used to represent a simple parent/child relationship between entities. The child or
subtype entity is at the bottom, and it inherits all the information of the parent or supertype
entity at the top. Moving down the hierarchy is called specialization, and moving up the
hierarchy is called generalization.
Strong Entity/ Weak Entity
A strong entity is one whose existence does not depend on any other entity. The student entity is a
perfect example of a strong entity. Most of the entities used in the ER model are strong entities.
A weak entity is one that is dependent on some other entity. A weak entity does not have a key
attribute of its own.

3.5 Attributes
Attributes are used to describe entities. Attributes are also known as the properties of an
entity. All attributes have values. For example, an employee entity may consist of
attributes such as name, age, designation, department, salary, etc. These attributes may have
values such as Raman, 27, senior architect, design, 50000.
For certain attributes, a fixed range or domain is specified, meaning the attribute can only be
assigned a value within the specified domain. For example, the department value must be one of
the pre-defined departments in the enterprise, the age of an employee cannot be less than 18, etc.
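
Attribute domains of this kind can be declared in SQL with data types and CHECK constraints. A minimal sketch, redefining a hypothetical employee table just for this example:

CREATE TABLE employee (
    emp_code      INT PRIMARY KEY,
    name          VARCHAR(50),
    age           INT CHECK (age >= 18),               -- domain: 18 or older
    department    VARCHAR(20) CHECK (department IN
                      ('design', 'sales', 'accounts')), -- pre-defined departments
    date_of_birth DATE
);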

3.5.1 Types of Attributes


 Simple attribute − A simple attribute holds an atomic value, one that cannot be further
divided. For example, an employee's age is an atomic value and cannot be further divided.
 Composite attribute − A composite attribute is made up of more than one simple attribute.
For example, an employee's address may have a house number, street number, locality, city
and state.
 Derived attribute − A derived attribute is not part of the original attribute set of an entity
and is not physically saved. It is derived from one of the attributes in the original attribute
set (see the sketch after this list). For example, age can be derived from the attribute date of
birth, and DA can be derived from the grade pay of the employee.
 Single-value attribute − A single-value attribute contains a single value. For example,
aadhar_number.
 Multi-value attribute − A multi-value attribute may contain more than one value. For
example, the email addresses of an employee, the contact numbers of an employee, etc.
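
Continuing the hypothetical employee sketch above, derived and multi-valued attributes might be handled as follows: the derived attribute is computed in a query rather than stored, and the multi-valued attribute is kept in a separate table. Date arithmetic syntax varies between database products.

-- Multi-value attribute: one employee may have several email addresses,
-- so they are stored as rows of a separate table.
CREATE TABLE employee_email (
    emp_code INT REFERENCES employee (emp_code),
    email    VARCHAR(100)
);

-- Derived attribute: age need not be stored; it can be computed
-- from the physically saved attribute date_of_birth.
SELECT name,
       EXTRACT(YEAR FROM CURRENT_DATE) - EXTRACT(YEAR FROM date_of_birth) AS age
FROM employee;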

Check your progress/ Self assessment questions- 2


Q5. Define weak entity.
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q6. ____________________________ is never used to define the many-to-many relationship type.

Q7. Define derived attribute.


___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q8. _____________ attributes may contain more than one values.

3.6 Relationship
Relationship refers to an association among entities. For example, an employee works_at a
department. Here, works_at is called a relationship.

Figure 3.1: Relation between two entities.

Relationship Set
A relationship set refers to a set of relationships of a similar type. A relationship, too, can have
attributes, called descriptive attributes.

3.6.1 Degree of Relationship


Degree of the relationship is defined as the number of entities participating in a relationship.
 Binary = degree 2 (when two entities are participating in the relationship)
 Ternary = degree 3 (when three entities are participating in the relationship)
 n-ary = degree n (generalized for any value of n, normally greater than three)

3.6.2 Connectivity and Cardinality


Connectivity is used to describe the association between the entities of one entity set
and the entities of another entity set via a relationship set. Cardinality is used to specify the
minimum and maximum number of instances of one entity set that can be associated with
an instance of the other entity set.
Connectivity
 One-to-one − In this type of connectivity, a single entity from the first entity set can be
associated with only a single entity from the second entity set, and vice versa.

 One-to-many − In this type of connectivity, a single entity from the first entity set can be
associated with multiple entities from the second entity set, but the reverse is not true.

 Many-to-one − In this type of connectivity, multiple entities from the first entity set can
be associated with a single entity from the second entity set, but the reverse is not true.

 Many-to-many − In this type of connectivity, a single entity from the first entity set can be
associated with multiple entities from the second entity set, and vice versa.

Cardinality

Figure 3.2: Connectivity and cardinality.

An ER diagram is used to represent an ER model. The ER model is capable of representing entities,
their attributes, relationship sets, and attributes of relationship sets in an effective manner.

Check your progress/ Self assessment questions- 3


Q9. Relationship refers to the __________________ among entities.
Q10. Define cardinality.
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
Q11. Explain one-to-one connectivity.

___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

3.7 ER Diagram Symbols


Following is the list of symbols used in an ER diagram:
Rectangle- The rectangle symbol is used to represent entities. Rectangles are labelled with the
name of the entity they represent.
For example,

Ellipse- The ellipse symbol is used to represent entity attributes. One ellipse represents one
attribute and is connected to the entity it belongs to.
For example,

The figure above shows simple attributes. The ellipse is also used to represent composite
attributes. Composite attributes are not directly connected to the entity, but instead are connected
to the attribute they belong to.
For example,

Double ellipse- The double ellipse is used to represent multi-valued attributes. Multi-valued
attributes are the ones for which an entity can have more than one value.
For example,

Dashed ellipse- The dashed ellipse is used to represent derived attributes. A derived attribute is
not physically stored in the database, and its value is derived from some other physically
stored attribute.

The age attribute is derived from the attribute date of birth.

Diamond-shaped box- It is used to represent the relationship between two entities, and the
entities are connected to it using simple lines. The relationship name is written inside the
diamond box, and the cardinality is labelled on the relationship lines. A binary relationship is
one where only two entities are participating, and cardinality refers to the number of instances
of an entity that can be associated with the relationship. One-to-one is represented as '1:1', and
it means that only a single instance of both entities is associated with the relationship.
One-to-many is represented as '1:N', and it means that a single instance of one entity and
multiple instances of the second entity are associated with the relationship. Many-to-many is
represented as 'N:N', and it means that multiple instances of both entities are associated with
the relationship.

Double rectangle- It is used to represent a weak entity. Weak entity is one that is dependent
on some other attribute.

Ellipse with underlined text- It is used to represent the key attribute of the entity that helps
to uniquely identify the entity.
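In a relational schema, these symbols translate naturally into table definitions. The sketch below is illustrative only, with hypothetical department, employee and dependent tables: the underlined key attribute becomes a primary key, a one-to-many relationship becomes a foreign key on the "many" side, and a weak entity borrows the key of the entity it depends on.

create table department (
    dept_code varchar(4) primary key,   -- key attribute (ellipse with underlined text)
    dept_name varchar(20)
);

create table employee (
    emp_code  varchar(4) primary key,
    emp_name  varchar(30),
    dept_code varchar(4) references department(dept_code)  -- 1:N relationship
);

create table dependent (                -- weak entity (double rectangle)
    emp_code varchar(4) references employee(emp_code),
    dep_name varchar(30),
    primary key (emp_code, dep_name)    -- identified only together with the owner's key
);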
3.8 Generalization, Specialization and Aggregation

Generalization and specialization are based on inheritance properties between the entities. In case of generalization, a bottom-up approach is followed in which lower level entities unite to form a higher level entity. Multilevel generalization can also be performed, where the higher level entities can be combined further to make a still higher level entity.

Figure 3.3: Generalization.

Specialization, in contrast to generalization, is a top-down approach in which a higher level entity is broken down into two or more lower level entities.

Figure 3.4: Specialization.

In case of aggregation, a relationship between two entities is considered to be a single entity. It is used to express relationships between the relationships in an ER model. In case of aggregation, relationships are treated as higher level entities. It is also known as generalized aggregation.

Figure 3.5: Aggregation.
Check your progress/ Self assessment questions- 4
Q12. How can you represent entity, attribute and relation in an ER Diagram?
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q13. Define aggregation. What is the advantage of using aggregation?
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
Q14. ________________ is used to represent the derived attributes.
Q15. Database Life Cycle consists of
a. five phases
b. six phases
c. seven phases
Q16. _________________ is used to create an abstract database structure that represents real-world objects in the most realistic way possible.
a. Data analysis
b. Data modelling
c. Requirements
Q17. Responsibility of testing and evaluation lies with
a. database administrator
b. Oracle
c. Legacy systems
Q18. Which is the first activity of the operational phase of the database life cycle model?
a. System evaluation
b. Application programs
c. Management
3.9 Steps in creating an ER Diagram

Following are simple steps that one should follow while constructing an ER Diagram.
• Identify the entities: The very first step in designing an ER Diagram is to identify all the possible entities that will be used to represent the relational model. An entity in an ER Diagram is represented using a rectangle and is used to store database information in the form of a table. It can be any real life object, process, etc. Identification of all possible entities is key to finding or establishing relationships between the entities.
• Identify relationships: Once the entities have been identified, the next logical step is to ascertain or find out if there is any relationship between the entities in the entity set.
• Describe the relationship: If two entities are related, it is important to describe the relationship between them. A diamond shaped symbol is used to connect two related entities, and the relationship is defined inside that diamond shape, which contains a brief description of how they are related.
• Add attributes: An entity is incomplete without its attributes or properties. Once the relationships have been established, you must proceed to identify the attributes of all entities and also define the key attributes for all entities, if they exist.
• Complete the diagram: It is more of a check to see if all the relationships have been defined and all attributes have been listed. You can even use tools like SmartDraw to draw an ER Diagram, which makes it easy to modify the structure of the ER Diagram.
Figure 3.6: Example of ER Diagram

Source:
"http://www2.amk.fi/digma.fi/www.amk.fi/opintojaksot/0303011/1142845462205/11428478
02793/1142848508953/1142848642251.html"

3.10 Advantages of Entity Relationship Diagram:

Following are some of the key advantages of an ER Diagram:
1. Graphical representation: It provides you with a visual representation of the entity relationship model. It is easy to analyse the relationships between the entities with the help of an ER diagram, and it supports effective database design. The ER Diagram focuses on the data flows and interactions between various entities of the database. ER Diagrams can also be used together with data flow diagrams.
2. Better communication: The ER Diagram is an effective tool for communicating the key entities and their relationships in a database. Relationships, entities and attributes are represented using three different symbols. It allows a clear understanding of what exactly the database will be like.
3. Easy to understand: An ER Diagram is simple to create and understand. Modification and enhancement of an ER Diagram is also easy. Even non-designers can easily understand the working and relationships in the database.
4. High flexibility: An ER Diagram can make use of already existing ER models; in other words, it is quite flexible and can be derived easily from existing models. The ER Diagram is a blueprint of the database.
3.11 Summary
The Entity Relation model is used to define the conceptual view of a database. A key is used
to uniquely identify an instance of the entity. Some of the examples of keys are primary key,
foreign key, super key, composite key, derived key, etc. An entity is something that is easily
identifiable and is used to represent a real world object. Weak entity is one that is dependent
on some other entity. Weak entity does not consist of a key attribute. Attributes are used to
represent the entities. For certain attributes, a fixed range or domain is specified. It means the
attribute can be assigned a value within the specified domain. Relationship refers to the
association among entities. For example, an employee works_at a department. Here,
works_at is called relationship. Relationship set refers to a set of relationships of similar type.
A relationship too can have attributes called descriptive attributes. Degree of the relationship
is defined as the number of entities participating in a relationship. Rectangle symbol is used
to represent an entity, ellipse symbol is used to represent the entity attribute and diamond
shaped box is used to represent the relationship between two entities. In case of
generalization, a bottom-up approach is followed in which lower level entities unite to form a
higher level entity. Specialization, in contrast to generalization, is a top-down approach in which a higher level entity is broken down into two or more lower level entities. In case of aggregation,
a relation between two entities is considered to be a single entity. It is used to express
relationship between the relationships in ER model.
3.12 Glossary

Entity- An entity is something that is easily identifiable and is used to represent a real world object.
Attribute- An attribute is also known as a property of an entity. For example, an employee entity may consist of attributes such as name, age, designation, department, salary, etc.
Relationship- Relationship refers to the association among entities.
Key- A key is used to uniquely identify an instance of the entity.
Aggregation- In case of aggregation, a relationship between two entities is considered to be a single entity.
Connectivity- Connectivity is used to define the association between the number of entities in one entity set with the number of entities of the other set via a relationship set.
Cardinality- Cardinality is used to specify the minimum and maximum number of instances of one entity set that should be associated with instances of the other entity set.
ER model- The Entity Relation model is used to define the conceptual view of a relational database.
3.13 Answers to check your progress/self assessment questions
1. conceptual.
2. unique, empty.
3. It refers to a collection of attributes that are used to uniquely identify an instance or row of
an entity or table in the database.
4. Primary attribute of a related entity can be used to identify the rows for an entity that does
not have a primary key. Primary key of the related entity is called the foreign key of such an
entity or table.
5. Weak entity is one that is dependent on some other entity. Weak entity does not consist of
a key attribute.
6. Traditional entity.
7. A derived attribute is one that is not physically saved in the database. It is derived from one of the attributes in the original attribute set.
8. Multi-valued.
9. association.
10. Cardinality is used to specify the minimum and maximum number of instances of one
entity set that should be associated with instances of other entity set.
11. In one-to-one connectivity, single entity from first entity set can be associated with only
single entity from second entity set and vice versa.

38 | P a g e
12. Rectangle symbol is used to represent an entity, ellipse symbol is used to represent the
entity attribute and diamond shaped box is used to represent the relationship between two
entities.
13. In case of aggregation, a relation between two entities is considered to be a single entity.
It is used to express relationship between the relationships in ER model.
14. Dashed ellipse.
15. b
16. b
17. a
18. a
3.14 References/ Suggested Readings
1. Introduction to Database Management system by Gillenson, Ponniah, Kriegel, Trukhov, Taylor, Powell, Miller, WILEY.
2. Fundamentals of Relational Database Management System by S. Sumathi and S. Esakkirajan, Springer.
3. Database Management Systems by R. Panneerselvam, PHI.
4. Database Management System Concepts by P. K. Singh, VK Publications.

3.15 Model questions
1. List various symbols used in ER diagram.
2. Draw ER diagram for college.
3. List various advantages of ER diagram.
4. Explain various keys used in relational data model.
5. Define attribute. List various attributes.
6. Define entity. Explain various types of entities.
7. Define relationship. Explain connectivity and cardinality.
Lesson 4 Database Models
Structure
4.0 Objective
4.1 Introduction
4.2 Data Model
4.3 Relational data model
4.3.1 ER Diagram
4.4 Network data model
4.4.1 Data-Structure Diagram
4.5 Hierarchical data model
4.5.1 Tree-Structure Diagram
4.6 Summary
4.7 Glossary
4.8 Answers to check your progress/self assessment questions
4.9 References/ Suggested Readings
4.10 Model questions
4.0 Objective
After studying this lesson, students will be able to:
1. Define the concept of data model
2. List various data models used in a DBMS.
3. Explain the relational model and use of ER Diagram.
4. Explain the network model and use of data-structure Diagram.
5. Explain the hierarchical model and use of Tree-structure Diagram.

4.1 Introduction
There is a clear separation between the logical and the physical schemas of a database. A number of data models are available to represent the logical schema of the database. In this lesson you will learn three popular data models and the diagrams used in each data model to represent the database tables and the relationships between those tables.
4.2 Data Model
Data Models are used to represent the logical schema of the database. Data models are fundamental entities that help to achieve the objective of abstraction in a database management system. Data models are used to define the relationships between data, or how two data items are connected to each other. Popular data models used in any database management system are:
1. Relational data model.
2. Network data model.
3. Hierarchical data model.
4.3 Relational data model
The relational data model is the most used database model for data storage and processing. The Entity Relation (ER) model is used to define the conceptual view of a database. An ER diagram is used to represent an ER model.
4.3.1 ER Diagram
An entity is something that is easily identifiable and is used to represent a real world object,
and is represented in ER Diagram using a rectangle. Attributes are also known as the
properties of an entity and all attributes have values. Attribute in ER Diagram is represented
using an oval shape.
Figure 4.1: Entity and its attributes

The figure above is part of an ER diagram. It represents an entity called "employee". It consists of 4 attributes that are connected to the entity "employee". These attributes are represented using 4 oval shapes.

Relationship refers to the association or link between entities. A relationship in an ER diagram is represented using a diamond shape.
Figure 4.2: Entity relationship in ER Diagram

The figure above shows a relationship between two entities: customer and account. The name of the relation is depositor and it is represented using a diamond shape. It shows that the entities customer and account are related to each other, and a customer can open and deposit money in a bank account. Also, a customer can open more than one account in a bank and an account may belong to more than one customer.
A primary key attribute uniquely identifies a record in an entity and is represented using an underline in the oval shape. Whether a relation is one-to-one or one-to-many depends on the primary key attribute of an entity.
Figure 4.3: Primary attribute

The figure above represents an entity "student" that consists of three attributes: roll_no, name and class. The attribute roll_no is the primary key attribute of the entity and is represented by underlining the attribute name; it is used to uniquely identify each record in the student entity.
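As a rough relational equivalent (a sketch, with assumed data types), the underlined roll_no becomes the primary key of the student table:

create table student (
    roll_no varchar(10) primary key,  -- key attribute, underlined in the diagram
    name    varchar(30),
    class   varchar(10)
);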
Degree of the relationship in ER model can be binary, ternary or even n-ary, and it defines
the number of entities participating in a relationship. Connectivity or cardinality is used to
define the association between the number of entities in one entity set with the number of
entities of other set via relationship set. Connectivity can be one-to-one, one-to-many, many-
to-one or many-to-many.

Figure 4.4: Connectivity
The figure above shows a binary relation, as only two entities are involved in this relation.
Labels on the lines are used to indicate that the connectivity of the relation is many-to-many.
Following is an example of an ER diagram.
Figure 4.5: Example of ER Diagram

Source:
"http://www2.amk.fi/digma.fi/www.amk.fi/opintojaksot/0303011/1142845462205/11428478
02793/1142848508953/1142848642251.html"
Generalization in the ER model refers to a bottom-up approach in which lower level entities unite to form a higher level entity. Specialization in the ER model refers to a top-down approach in which a higher level entity is broken down into two or more lower level entities. And in case of aggregation in the ER model, a relationship between two entities is considered to be a single entity; it is used to express relationships between the relationships in the ER model.
Check your progress/ Self assessment questions- 1
Q1. Define data model.
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
Q2. List three types of data models.
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
Q3. An entity is represented in ER Diagram using a___________ and attribute is represented
using an _____________ shape.
4.4 Network data model

Records in a network database are connected to one another through links. A record in the network model closely resembles an entity in the relational model. A record is a collection of multiple attributes, each attribute having an atomic value. A link represents an association between exactly two records; a link is used to represent a binary relation.
Consider the following record types:

Record name = customer
Attributes:
    customername
    customerstreet
    customercity

Record name = account
Attributes:
    accountnumber
    balance

The two records belong to a banking system and represent the customer-account relationship. Consider the following sample database that represents the relationship between customers and accounts.

Figure 4.6: Sample database
4.4.1 Data-Structure Diagram

A network database can be best represented using a schema called a data-structure diagram. Boxes are used to represent the record types and lines are used to represent the corresponding links or relations. A data-structure diagram is logically the same as an ER diagram used to represent the relational model. It specifies the overall logical structure of the database. The best way to understand the representation of a data-structure diagram is to convert an ER diagram into an equivalent data-structure diagram.

In this section, the binary relation of the sample database of Figure 4.6 is considered. There are two entities in the sample database namely, customer and account. Both the entities are related through a binary, many-to-many relationship depositor.
Figure 4.7: Relation in ER diagram

The ER diagram in the figure above shows that two entities are related to each other using the depositor relationship. It represents a many-to-many relationship, as one customer can have multiple accounts with the same bank, and one account may belong to multiple customers.

Following is the representation of its equivalent data-structure diagram.

Figure 4.8: Relation in data-structure diagram

The record type customer corresponds to the entity set customer in the ER diagram and the record type account corresponds to the entity set account in the ER diagram.
The diamond shaped relationship in the ER diagram is replaced with the link, represented using a line with the label "depositor".
No arrows are used to represent the many-to-many link or relation. A one-to-many relationship in a data-structure diagram can be represented using an arrow pointing to one of the record types. A one-to-one relationship in a data-structure diagram can be represented using two arrows pointing to both record types.

Figure 4.9: One-to-many and one-to-one relation
Transformation of an ER diagram is more complicated when the relationship includes descriptive attributes. A link cannot contain any data value, so a new record type needs to be created and links need to be established. Consider the E-R diagram of Figure 4.7. Suppose you want to add the attribute access date to the relationship depositor. The newly derived E-R diagram looks like the following.

Figure 4.10: Relation attribute in ER Diagram

The entities in the ER diagram above are converted to record types, and a new record type is created to define the attribute access date, with a single field to represent the date. Also, many-to-one links can be represented using arrows at the end of the link. The data-structure diagram for the ER diagram with the descriptive attribute will look like the following:

Figure 4.11: Relation attribute in data-structure diagram
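In relational terms, the new record type created for the descriptive attribute plays the same role as a junction table. A minimal sketch, with hypothetical column names and types:

create table customer (
    customer_name   varchar(30) primary key,
    customer_street varchar(30),
    customer_city   varchar(20)
);

create table account (
    account_number varchar(10) primary key,
    balance        numeric(10, 2)
);

-- the depositor link, carrying the descriptive attribute access_date
create table depositor (
    customer_name  varchar(30) references customer(customer_name),
    account_number varchar(10) references account(account_number),
    access_date    date,
    primary key (customer_name, account_number)
);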
Check your progress/ Self assessment questions- 2
Q4. What is data-structure diagram?
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q5. How are relationship attributes of the ER model specified using a data-structure diagram?
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
4.5 Hierarchical data model

A hierarchical model is a tree-like structure that is used to show a hierarchy of relationships between records. A hierarchical database consists of a collection of records that are connected to each other through links. A record is similar to a record type in the network model. The only difference between the network model and the hierarchical model is that the network model does not organize the links in the form of a hierarchy.
Each record is a collection of fields, known as attributes in relational model, each of which
contains only one data value. A link is an association between precisely two records, and is
similar to a link in the network model.
Let us again discuss the same database (banking system) as discussed in the network model.
Consider a database that represents a customer-account relationship in a banking system with
two record types namely, customer and account.
The sample database in the form of tree or hierarchical data model can be viewed as follows:

Figure 4.12: Hierarchical structure
It shows that ‘ravi’ has one account with the bank, ‘sachin’ has two accounts with the bank
and ‘puran’ also has one account with the bank. Note that the set of all customer and account
records are organized in the form of a rooted tree, where the root of the tree is a dummy node.
Hierarchical database is a collection of such rooted trees, and hence forms a forest.
An account may belong to several customers, which results in the duplication of particular
record in several different locations. The information pertaining to that account, or of various
customers to which that account may belong, will have to be replicated. This replication may
occur either in the same database tree or in several different trees.
4.5.1 Tree-Structure Diagram

A tree-structure diagram is the schema for a hierarchical database. The box symbol is used to represent a record type and a line is used to represent a corresponding link between two record types. A tree-structure diagram serves the same purpose as an ER diagram. It represents the overall logical structure of the database.
A tree-structure diagram is similar to a data-structure diagram of network model. In network
model, record types are organized in the form of a graph, whereas in hierarchical model,
record types are organized in the form of a rooted tree.
A rooted tree does not include cycles, and there is a record type that is designated as the root of the tree. A tree-structure diagram can only represent one-to-many or one-to-one relationships, i.e. one parent node can have one child or multiple children nodes. Arrows in a rooted tree point from children to the parent node. A parent may have an arrow pointing to a child.
The overall database schema in case of hierarchical data model is represented as a collection
of tree-structure diagrams. For each such diagram, there exists one single instance of a
database tree. The root of a tree is a dummy node. The children of the dummy node are
instances of the root record type in the tree-structure diagram. Each record instance may, in
turn, have several children. It refers to a recursive definition of the rooted tree.
Now, you will see the conversion or transformation of an ER Diagram into an equivalent
rooted tree. Here is an example of applying transformation on a single relation. Consider the
E-R diagram shown below:
Figure 4.13: Entity relation in ER Diagram
ER-Diagram in the figure above shows that two entities are related to each other using
depositor relationship. It represents many-to-many relationship, as one customer can have
multiple accounts with the same bank, and one account may belong to multiple customers.
The corresponding tree-structure diagram is shown below:

Figure 4.14: Tree-structure Diagram
The record type customer corresponds to the entity set customer and the record type account corresponds to the entity set account. Finally, the relationship depositor has been replaced with the link depositor, using an arrow pointing to the customer record type. An instance of a database corresponding to the described schema may thus contain a number of customer records linked to a number of account records. A customer can open more than a single account with the bank in the existing banking system.
If the relationship depositor is one to one, then the link depositor has two arrows: one
pointing to account record type, and one pointing to customer record type. If the relationship
depositor is many to many, then the transformation from an E-R diagram to a tree-structure
diagram is more complicated, as for a tree data structure it is not possible to have multiple
parent nodes for a single child node.
Check your progress/ Self assessment questions- 3
Q6. Define hierarchical model.
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q7. What are the limitations of rooted tree diagram?
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
Q8. ___________________ are used to represent the logical schema of the database
a. Data models
b. Tables
c. Data structure
Q9. ER diagram is used to represent
a. Hierarchical data model
b. Network data model
c. ER model
Q10. A record is a collection of multiple
a. attributes
b. links
c. diagrams
Q11. A hierarchical model is a
a. linear structure
b. tree like structure
c. array structure
4.6 Summary
Data Models are used to represent the logical schema of the database. ER data model is the
most used database model for data storage and processing. ER model represents entities and
the relationship between various entities. ER Diagram can model any degree of relationship
and can be used to represent any form of cardinality. A record in network model resembles
closely to entity in relational model. Records in a network database are connected to one
another through links. A network database can be represented using a schema called a data-structure diagram. Boxes are used to represent the record types and lines are used to represent the
corresponding links or relations. A hierarchical model is a tree like structure that is used to
show a hierarchy of relationship between records. Only difference between a network model
and hierarchical model is that, the network model does not show the links in the form of
hierarchy. A link in hierarchical model is an association between precisely two records, and is
similar to a link in the network model.
4.7 Glossary
ER Diagram- ER Diagram is used to represent the ER model.
Data-structure diagram- It is used to represent the network data model.
Tree-Structure diagram- It is used to represent the hierarchical data model.
Data model- Data Model is used to represent the logical schema of the database.
Relation degree- Degree of the relationship in ER model defines the number of entities
participating in a relationship.
Relation cardinality- Cardinality is used to define the association between the number of
entities in one entity set with the number of entities of other entity set.

4.8 Answers to check your progress/self assessment questions
1. Data Model is used to represent the logical schema of the database and to define the
relationship between data, or how the two data are connected to each other.

2. Three data models used in any database management system are:
a. Relational data model.
b. Network data model.
c. Hierarchical data model.
3. rectangle, oval.
4. Network database can be best represented using data-structure diagram. Boxes are used to
represent the record type and lines are used to represent the corresponding links or relations.
It specifies the overall logical structure of the database.
5. Relationship attributes of ER model are specified in data-structure diagram by creating a
new record type and linking them to the entities involved in the relationship or association.
6. A hierarchical model is a tree like structure that is used to show a hierarchy of relationship
between records. Only difference between a network model and hierarchical model is that,
the network model does not show the links in the form of hierarchy.
7. A rooted tree does not include cycles, and it can be used only to represent one-to-many or
one-to-one relationships.
8. a
9. c
10. a
11. b
4.9 References/ Suggested Readings

1. Introduction to Database Management system by Gillenson, Ponniah, Kriegel, Trukhov, Taylor, Powell, Miller, WILEY.
2. Fundamentals of Relational Database Management System by S. Sumathi and S. Esakkirajan, Springer.
3. Database Management Systems by R. Panneerselvam, PHI.
4. Database Management System Concepts by P. K. Singh, VK Publications.
4.10 Model questions
1. Define data model.
2. Differentiate between hierarchical and network data model.
3. Explain in detail the data-structure diagram.
4. Explain ER model and diagram used to represent it in detail.
5. Take an example of ER diagram and show its transformation into data-structure diagram.
Lesson 5 Object-Oriented Databases
Structure
5.0 Objective
5.1 Introduction
5.2 Limitations of relational databases
5.3 Need of Object-Oriented databases
5.4 Need for complex data types
5.5 Collection types
5.6 Data Definition
5.7 Type inheritance
5.8 Object Identity
5.9 Approaches to Object-Oriented design
5.9.1 Persistent Programming languages
5.9.2 ORDBMS
5.10 Comparison between object-oriented databases and object-relational databases
5.11 Summary
5.12 Glossary
5.13 Answers to check your progress/self assessment questions
5.14 References/ Suggested Readings
5.15 Model questions
5.0 Objective
After studying this lesson, students will be able to:
1. Discuss the limitations of relational database management system.
2. Explain the need of object-oriented database for large applications.
3. Describe the various components of object-oriented database.
4. Discuss the two approaches to build object-oriented database.
5. Compare the OODBMS and ORDBMS.
5.1 Introduction
With every passing day, data is growing rapidly. The relational model has limitations when it comes to representing relations and objects for very large applications. Also, making changes in a relational model is not an easy task, and it is not suited to an environment where there are regular changes in requirements by the customer. The object-oriented database tries to address all the limitations of the relational database.
5.2 Limitations of relational databases

The relational database provides support for only fixed data types and is well suited to meet the requirements of traditional applications. The relational database thinks of data as a two-dimensional table. It supports both the logical and physical database structures. But it is not suited for application domains that require complex data types. The relational database treats data and the procedures to transform the data independent of each other, whereas the object-oriented database is built on the principle that both data and procedures are related to each other. The relational database is widely used in commercial applications, but suffers from the following limitations:
1. Object Identity: The relational database model fails to provide an independent existence to entities that are used to describe real life objects. The only way you can access objects in relational databases is through the attributes which characterize them.
2. Explicit relationships: The relational database model also fails to provide an explicit representation for the identities of relationships. Query operations are used to recover the relations between the identities. You can even define a relationship of a relationship in the relational model, where a relationship may act as an entity. It is very difficult to identify and differentiate between the entities and relationships in the relational database model.
3. Structured data objects: The first normal form of the relational data model states that an attribute in each tuple must represent an atomic value. But there is a need to consider attributes with complex data types as well, where the values of domains are themselves tuples. Also, the relational model puts restrictions on representing entities as collection types such as sets, lists, etc. An entity may be represented as a list or set of attributes. For example, a department may be represented as a set of attributes such as Department_description, Department_code, Supervisor_name, Location, etc.
4. Inheritance: It is natural for entities to have a hierarchical relationship or structure, but the relational model does not support inheritance to express the concept of hierarchy.

5. Methods: Views are used to record special queries on the database, and the methods or procedures for updating the database values are independent of the data and maintained outside the relational model.
5.3 Need of Object-Oriented databases

The relational data model is well suited for use with small and medium sized projects, but larger applications need something more than what an RDBMS offers. The object-oriented database model is a natural option for such projects. An object-oriented model is preferred over the relational model when working with:
1. Embedded database management system applications: Embedded DBMSs are used to provide super-fast response times. The relational data model involves a lot of overhead in expressing the entity-relation mapping and also requires additional administration during deployment. Instead, it is wise to use self-contained and easy to deploy persistent solutions like Java or DOT NET.
2. Complex Object Relationships: It is difficult to model classes that involve cross references and use of complex data types using the relational model. It is easy in case of object-oriented databases to maintain the references between objects using reachability persistence.
3. Modification in data structures: An RDBMS finds it difficult to add new data members or relations into the structure. It requires a lot of changes in the procedures and the database schema. On the contrary, it is quite easy to implement such changes in object-oriented databases by mixing the old and new objects in the database with ease.
4. Deep data structures: Expressing a relationship that has a very lengthy parent-child relationship can be very difficult for an RDBMS. On the contrary, object-oriented databases can easily handle lengthy parent-child relationships using reachability persistence.
5. Object-oriented programming: Object-oriented databases should be preferred when working with OO programming languages. An RDBMS requires you to write additional code for passing information between the actual objects and the row objects of the database, and there is also a need to write translation code for providing the object-schema mapping. On the contrary, there is no need for any translation code when working with OO databases.
6. Objects with collections: Collections or sets within an object are used to represent a one-to-many relationship. But an RDBMS does not provide support to maintain these collections within an object, and you need to provide a link between the parent object and the objects in the collection using an intermediate table. In case of OO databases, a parent object and its member collections can be fetched using a single call.
7. Agile software development methodology: Most software development in today's era is done using the agile methodology. The agile development model is designed to handle change requests by customers more quickly than any of the traditional development models. The agile model is optimized to facilitate quick completion of the project by fitting the process to the model. An OODBMS fits into agile development much better than an RDBMS.
8. Navigation to access data: Not all users are comfortable with typing queries to access data, and an RDBMS is based on designing complex queries to access data. An OODBMS lets you navigate through deeply networked structures without having to use complex queries.
Check your progress/ Self assessment questions- 1
Q1. Relational database treats data and the procedures to transform the data,
__________________ of each other.

Q2. Relational model supports inheritance. ( TRUE / FALSE ).
___________________________________________________________________________

Q3. Define agile software development methodology.
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
5.4 Need for complex data types
Traditional database management systems worked fine with simple data types, but in recent years the importance of complex data types has increased. For instance, saving the address of an employee requires you to save the complete address using one attribute, or the address can be broken into multiple atomic attributes such as street no, city, state, etc. Also, saving a multi-valued attribute requires you to create a separate relation to keep the database in first normal form.
ER model concepts can be represented in an OODBMS using complex data types and object orientation, without any complex translations. The large object data type (supported by Oracle) is used to save objects such as images, voice, video, and much more. These objects require a large amount of space in memory. The four types of large objects supported by Oracle are:
1. CLOB- It is known as character large object and is used to store character data.
2. NCLOB- It is known as national character set large object and is used to store non-English characters in the form of multi-byte character data.
3. BLOB- It is known as binary large object and is used to store binary data.
4. BFILE- It is known as binary file large object and it specifies a binary file stored outside the database using the file system of the operating system.
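A minimal sketch of an Oracle table that uses these LOB types; the table and column names are hypothetical:

create table employee_documents (
    emp_code varchar2(4),
    resume   clob,    -- character data
    bio_data nclob,   -- multi-byte national character set data
    photo    blob,    -- binary data
    contract bfile    -- pointer to a binary file stored outside the database
);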
5.5 Collection types
A collection type is simply a collection of elements such as basic types, structured types, or even a collection itself. It is possible to edit a collection type even once it has been created. A structured type is a lot different from a collection type. It is a user defined data type with attributes and methods, and the behaviour of the structured type is defined by its methods. You cannot create user defined data types in a relational DBMS. You can create a structured type from a basic datatype, another structured type or even a reference to a structured type. Readers familiar with the structure data type in the C++ programming language will find it easy to understand structured types in an OODBMS.
You can create a structured data type as follows:

create type program as
(   Pcode    varchar(4),
    Pname    varchar(20),
    Duration number,
    UGPG     char(2) )

create type student as
(   Name     varchar(30),
    Fname    varchar(30),
    Age      number,
    Category varchar(10),
    Program  program )
method RegisterStudent()
Two structured types have been defined in the last example. The structured type 'program' is a collection of basic data types, and the structured type 'student' is a collection of both basic data types and a structured type. The Program member of the student structure belongs to the 'program' type.

You can create a table containing tuples of both 'program' and 'student' as follows:

create table Student of student
Because student structure type contains an element of program structure type, there is no
need to create a separate table for program structure type.
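Rows can then be inserted into such an object table using the constructor of the embedded structured type. A sketch, assuming Oracle-style object syntax and hypothetical values:

insert into Student
values ('Ravi', 'Mohan', 21, 'General',
        program('P001', 'MCA', 3, 'PG'));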
5.6 Data Definition
An object-oriented database consists of objects, object classes and abstraction mechanisms
used to link the two. Consider the following data definitions in OODBMS:
1. Objects: The relational model is known as a value-oriented model because a tuple of an entity in an RDBMS is called an object and two identical tuples belong to the same object. Practically, two identical tuples can refer to two different objects, and an OODBMS is able to differentiate between two objects with identical records using the notion of a unique system-generated identifier.
An object is a binding of data and procedures, where attributes are used to represent the data and methods are used to represent the procedures. The binding of both data and procedures as one is called encapsulation. Objects support the notion of encapsulation by defining the interface to package both the data and procedures.
2. Object classes: A collection of objects is known as a class, and it defines the methods, attributes and the relationships between a set of common objects. An object is known as an instance of a class.
3. Abstraction: Generalization is an abstract mechanism used to declare certain classes as
subclasses of other classes. For example, academics, administration, accounts, HR may be
defined as subclasses of Department class. Grouping of lower level objects to represent a
higher level object is called aggregation. For instance, the number of girls and boys in a school database.
4. Inheritance: Generalization and aggregation are based on the hierarchical relationship between the objects, which also gives birth to the notion of inheritance. Inheritance in an OODBMS can be used to represent structural and behavioural inheritance.
The process by which a subclass inherits the attributes or data of superclass is called
structural inheritance. For instance, if a superclass 'program' has attributes pcode, pname,
duration, then a subclass 'student' will also be defined by these attributes. The process by
which a subclass inherits the methods or procedures of superclass is called behavioural
inheritance. For instance, if the superclass employee has method compute_pay, then the
subclass manager will also be characterized by the same method. A subclass also inherits the
participation in the relationship sets of the superclass. Depending on the structural
relationship or hierarchy between the classes, inheritance may be of different types such as
single inheritance or multiple inheritance.
Check your progress/ Self assessment questions- 2
Q4. List four types of large object types supported by Oracle.
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q5. Define collection type.
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
Q6. OODBMS is able to differentiate between the two objects with identical records. ( TRUE
/ FALSE ).
___________________________________________________________________________
5.7 Type inheritance

An OODBMS must be capable of differentiating between objects based on their type. This is important because a number of objects of the same class exist in a database. Consider the following example of type inheritance:

create type secretary
under employee
( typing_speed varchar(15) )
Just like secretary, you can have other subclasses of the superclass employee, such as engineer, accountant, receptionist, etc. Type inheritance allows you to save some additional information in the database about those members of the superclass that belong to a subclass. For example, you can store additional information about an employee who is a secretary.
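Other subtypes can be declared in the same way; for instance, a hypothetical engineer subtype with its own additional attribute:

create type engineer
under employee
( specialization varchar(20) )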
5.8 Object Identity
An object in an OODBMS consists of a value and a name. The value refers to the value contained in the object's attributes, and the name component of an object is used by procedures to refer to the object. An object identifier is used to uniquely identify an object, and no two objects can have the same identifier. An object may contain other objects, which is modelled as containment of objects. A composite object is one that contains other objects.
5.9 Approaches to Object-Oriented design

There are two ways of implementing the concept of object-oriented databases.

5.9.1 Persistent Programming languages
Generally, all instances of class objects are destroyed once the program terminates. PPL refers to a set of programming languages that allow objects to exist even after the termination of the program. A PPL deals with persistent data that can be manipulated directly; there is no need to load the data into memory and save it back onto the disk. The complexity of a PPL is very high, and it does not support declarative querying, which makes it difficult to achieve high level optimization.
Persistence of objects- Persistent objects are stored on permanent storage or secondary disk, and transient objects are stored in main memory or temporary storage. You can make a transient object a persistent object using persistence by class, creation, marking or reachability.

5.9.2 ORDBMS
The object-relational database model is built directly over the relational model. An object-relational management system is similar to a relational database, but with an object-oriented database model. In an ORDBMS, the database schema and query language provide direct support for objects, classes and inheritance. It is like an intermediate level between the relational and object-oriented data models. It follows the basic principles of the relational model, i.e. data is stored inside tables and can be accessed using structured queries, whereas in an OODBMS the data is stored in the form of persistent objects. The relational and object-oriented data models are two entirely different designs, and the object-relational data model tries to bridge the gap between the two. It tries to combine the advantages of both the relational and object-oriented databases, and it supports all the features or characteristics of object-oriented databases.
Check your progress/ Self assessment questions- 3
Q7. An OODBMS must be capable of differentiating between the objects based on their type.
( TRUE / FALSE ).
___________________________________________________________________________

Q8. What do we mean by composite object?
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
Q9. What do we mean by persistent object?
___________________________________________________________________________
5.10 Comparison between object-oriented databases and object-relational databases
• Object-oriented databases based on persistent programming languages make use of very complex data types, whereas object-relational databases are built on top of the relational model.
• Object-oriented databases have poor querying capabilities, whereas the querying capabilities of object-relational databases are extremely powerful.
• Object-oriented databases do not provide high level security, whereas object-relational databases support high level security.
• The object-oriented model is a natural way of expressing objects and their relationships, whereas the object-relational model is suited to represent simple relationships.
• Data is stored in an OODBMS using persistent objects, whereas data is stored in an ORDBMS in the form of tables, as in the relational model.
5.11 Summary
Relational database treats data and the procedures to transform the data, independent of each
other. ER model concepts can be represented in OODBMS using complex data types and
object orientation, without any complex translations. An object-oriented database consists of
objects, object classes and abstraction mechanisms used to link the two. OODBMS is capable
of differentiating between the objects based on their type. PPL refers to a set of programming
languages that allows the objects to exist even after the termination of the program.
Persistence objects are stored on permanent storage or secondary disk, whereas, transient
objects are stored in main memory, or temporary storage. In ORDBMS, database schema and
query language provides direct support for objects, classes and inheritance. It is like an
intermediate level between the relational and object-oriented data models.
5.12 Glossary
Collection type- It is a collection of elements such as basic types, structured types, or even a
collection itself.
Structured type- It is a user defined data type with attributes and methods, and it can be
created from a basic datatype, another structured type or even reference to structured type.
Object- An object is binding of attributes used to represent data and methods used to
represent the procedures.
Class- It defines the methods, attributes and the relationship between a set of common
objects.
Generalization- Generalization is an abstract mechanism used to declare certain classes as
subclasses of other classes.
Aggregation- Grouping of lower level objects to represent a higher level object is called
aggregation.
Structural inheritance- The process by which a subclass inherits the attributes or data of
superclass is called structural inheritance.
Behavioural inheritance- The process by which a subclass inherits the methods or procedures
of superclass is called behavioural inheritance.
PPL- It refers to a set of programming languages that allows the objects to exist even after the
termination of the program.
ORDBMS- It is built directly over the relational model. It is similar to a relational database,
but with an object-oriented database model.
5.13 Answers to check your progress/self assessment questions
1. independent.
2. FALSE.
3. Agile development model is designed to handle the change requests by the customers more
quickly than any of the traditional development models. Agile model is optimized to facilitate
quick completion of the project by fitting the process to the model.
4. CLOB, NCLOB, BLOB, BFILE.
5. Collection type is simply a collection of elements such as basic types, structured types, or
even a collection itself.
6. TRUE.
7. TRUE.
8. A composite object is one that contains other objects.
9. Persistence objects are stored on permanent storage or secondary disk, and are not
destroyed even when the program terminates.
5.14 References/ Suggested Readings
"1. Introduction to Database Management system by Gillenson, Ponniah, Kriegel, Trukhov,
Taylor, Powell, Miller, WILEY.

65 | P a g e
2. Fundamentals of Relational Database Management System by S. Sumathi and S.
Esakkirajan, Springer.
3. Database management Systems by R. Panneerselvam, PHI.
4. Database management system Concepts by P. K. Singh, VK Publications.
5. Database systems: Design Implementation and Management by Peter Rob and Carlos
Coronel, Cengage India Pvt. Limited.
6. Advanced database management system by Rini Chakrabarti and Shibhadra Dasgupta,
Dreamtech Press".
5.15 Model questions
1. Define two types of inheritance.
2. What is a structured type? Give an example.
3. Define ORDBMS.
4. Define aggregation and generalization.
5. Differentiate between OODBMS and ORDBMS.
6. What is LOB? List 4 types of LOBs.
Lesson- 6 Normalization and Data Integrity
Structure
6.0 Objective
6.1 Introduction
6.2 Normalization
6.3 Advantages of normalization
6.4 Disadvantages of normalization
6.5 Various normal forms
6.5.1 Un-Normalized Form (UNF)
6.5.2 First Normal Form (1NF)
6.5.3 Second Normal Form (2NF)
6.5.4 Third Normal Form (3NF)
6.5.5 Boyce-Codd Normal Form (BCNF)
6.5.6 Fourth Normal Form (4NF)
6.6 Database Integrity
6.6.1 Domain Integrity
6.6.2 Referential Integrity Constraint
6.7 Denormalization
6.8 Summary
6.9 Glossary
6.10 Answers to check your progress/self assessment questions
6.11 References/ Suggested Readings
6.12 Model Questions
6.0 Objective
After studying this lesson, students will be able to:
1. Define normalization.
2. List various advantages and disadvantages of normalization.
3. Discuss the need of data warehouse.
4. Differentiate between operational and informational data stores.
5. Define Denormalization.
6.1 Introduction
The ER model is able to successfully define the relationships between the various entities within a database, but it does not address the problem of redundancy within the raw data items. Redundancy in raw data can be removed with the help of normalization. Data redundancy can result in a storage crisis and affect the performance of transactional data management.
6.2 Normalization
Normalization refers to a systematic process of removing the redundant data and various
anomalies that exist in the database. It helps to improve storage efficiency, data integrity and
scalability. Normalization may improve the storage efficiency of the database, but it then
comes with the cost of increased complexity and poor data query performance. Normalization
is done in order to optimize the performance of the operational or transactional data. Other
than removing the redundant data, normalization also ensures that data dependencies make
sense, it means that only the attributes that are interrelated and dependent on key attribute are
saved in single entity.
Because the entities in a database are related to each other, updating the data in one entity may result in data anomalies. The database structure should be flexible, i.e. it should be possible to add new data values and rows without reorganizing the database structure, and making changes in one entity should not result in data inconsistencies. Following are some of the anomalies that you need to guard against:
Insertion Anomaly- Sometimes you want to add data, but you are unable to do so. It generally happens when you need to add another piece of data along with it that may not be available at that time. For example, it may be the case that you can add a member to the library database only when he/she issues a book, but a person may not be interested in issuing a book yet.
Deletion Anomaly- A record of data can rightfully be deleted from a database, but it may result in the deletion of the only instance of other required data. For example, if you delete the membership details of a person in the library database, you will not be able to identify the member against the books issued by him.
Update Anomaly- Updating data in one entity may result in data inconsistencies. For example, if the contact number of a member has changed, and the change is not reflected in all related entities, it may result in a data integrity problem.
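To make the update anomaly concrete, consider a hypothetical unnormalized member_books table in which the member's contact number is repeated on every issued-book row; every copy must be changed together, and any copy missed (for example, in another denormalized table) leaves the database inconsistent:

-- contact_no is stored redundantly on every row for member M001,
-- so a single change must touch all of those rows
update member_books
set    contact_no = '9876500000'
where  member_id  = 'M001';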
6.3 Advantages of normalization
1. Improves storage efficiency by removing the redundant fields or columns in various entities of the database.
2. It helps to improve the flexibility of the database by removing various data anomalies, i.e. you are able to add, remove or update data without having to worry about loss of data and data inconsistency problems.
3. Normalization results in the decomposition of a large entity into a number of smaller entities, which helps in easy understanding of the entity structure and the relations between them.
4. It helps in reducing the time to record the transactional data by removing the redundant
attributes. For example, if you separate the student record from his/ her attendance record,
you can record attendance of the student without having to record student’s basic
information.
6.4 Disadvantages of normalization
1. It is not possible to start building the database before understanding the needs of the end
user.
2. The performance of the data retrieval process is affected at higher levels of normalization due to the large number of joins. Denormalization, discussed at the end of this lesson, is used to resolve this problem.
3. Decomposing the database to higher forms of normalization is a very difficult and time consuming task.
4. Bad normalization design can lead to serious problems and a cascading effect on the entire database.
Check your progress/ Self assessment questions- 1
Q1. Define Normalization.
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q2. Explain deletion anomaly.
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q3. List two disadvantages of normalization.


___________________________________________________________________________

___________________________________________________________________________
___________________________________________________________________________

6.5 Various normal forms


6.5.1 Un-Normalized Form (UNF)
If tuples in an entity consist of non-atomic values, it is considered to be in un-normalized
form. An atomic value is one that cannot be further decomposed. A non-atomic value can be
further decomposed and simplified. Consider the following table for example:
Emp_Code Emp_Name Quarter Sales Bank_Branch_Code Branch_Name
E001 Raghav 1 1234 B0001 Kapurthala
2 1231
3 11123
E002 Manpreet 1 2343 B0002 Jalandhar
2 2167
3 2188
E003 Sunny 1 1678 B0003 Phagwara
2 1345
3 1621
Table 6.1: Un-normalized form
Observe the table entries carefully: there are multiple occurrences of rows under each key
Emp_Code. Emp_Code fails to uniquely identify each tuple in the table. It means the table is in
un-normalized form.

6.5.2 First Normal Form (1NF)


A relation is said to be in 1NF if it contains no non-atomic values. A relation is in first
normal form if:
- The domain of each attribute contains only atomic values.
- The value of each attribute contains only a single value from that domain.
- And, it supports functional dependency.
Functional dependency means that one attribute uniquely determines another attribute. If R
is a relation with attributes X and Y, a functional dependency between the attributes is
represented as X->Y, which specifies Y is functionally dependent on X. Each value of X is

associated precisely with one Y value. The table 6.1 can be converted into first normal form
as follows:

Emp_Code Emp_Name Quarter Sales Bank_Branch_Code Branch_Name


E001 Raghav 1 1234 B0001 Kapurthala
E001 Raghav 2 1231 B0001 Kapurthala
E001 Raghav 3 11123 B0001 Kapurthala
E002 Manpreet 1 2343 B0002 Jalandhar
E002 Manpreet 2 2167 B0002 Jalandhar
E002 Manpreet 3 2188 B0002 Jalandhar
E003 Sunny 1 1678 B0003 Phagwara
E003 Sunny 2 1345 B0003 Phagwara
E003 Sunny 3 1621 B0003 Phagwara
Table 6.2 First Normal Form
As you can see in the table above, this relation contains only atomic values that cannot be
further decomposed and is said to be in first normal form.

6.5.3 Second Normal Form (2NF)


For a relation to be in 2NF, it must already be in 1NF and non-prime attributes should be
fully functionally dependent on the primary key of the relation. A relation is not in second
normal form if some of its attributes are not fully functionally dependent on the key attribute. If
the primary key of the table is not a composite key, then a relation in 1NF is
automatically in 2NF as well. Full functional dependency means that the non-key attributes
are dependent on the entire key and not a part of it. If (X and Y) form the composite key of a
relation and Z is a non-key attribute, the relation is fully functionally dependent
if Z is dependent on both (X and Y), and not on X or Y alone.

Consider the following table:


Order_No Order_Date Item_No Units Price
E001 01-10-15 1000 12 654
E002 03-10-15 2000 45 612
E003 07-10-15 3000 345 456

E004 08-10-15 4000 21 564
E005 10-10-15 5000 36 532
Table 6.3

The primary key for the table above is (Order_No, Item_No). The non-key attributes Price
and Order_Date are dependent on only part of the primary key, i.e. on Item_No and Order_No
respectively. It means the relation is not in 2NF. Decompose it into the following tables:
Item_No Price
1000 654
2000 612
3000 456
4000 564
5000 532
Table 6.4 (a): Second Normal Form

Order_No Order_Date
E001 01-10-15
E002 03-10-15
E003 07-10-15
E004 08-10-15
E005 10-10-15
Table 6.4 (b): Second Normal Form

Order_No Item_No Units


E001 1000 12
E002 2000 45
E003 3000 345
E004 4000 21
E005 5000 36
Table 6.4 (c): Second Normal Form

Now, the relation is in second normal form.
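The decomposition can also be written directly in SQL. Below is a minimal sketch; the table
names (Item, Orders, Order_Item) and data types are assumptions made for illustration:

-- Price depends only on Item_No (table 6.4 a)
CREATE TABLE Item (
    Item_No INT PRIMARY KEY,
    Price   DECIMAL(8,2)
);

-- Order_Date depends only on Order_No (table 6.4 b)
CREATE TABLE Orders (
    Order_No   CHAR(4) PRIMARY KEY,
    Order_Date DATE
);

-- Units depends on the full composite key (table 6.4 c)
CREATE TABLE Order_Item (
    Order_No CHAR(4) REFERENCES Orders(Order_No),
    Item_No  INT REFERENCES Item(Item_No),
    Units    INT,
    PRIMARY KEY (Order_No, Item_No)
);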

6.5.4 Third Normal Form (3NF)
For a relation to be in 3NF, it should be in 2NF and no transitive dependency should exist in
that relation. It is important to understand transitive dependency first.
Consider the following functional dependencies:
A → B, which means that B is functionally dependent on A.
And
B → C, which means that C is functionally dependent on B.
It is easy to derive that C is also functionally dependent on A, i.e. A → C. This type of derived
dependency is what you call transitive dependency.

In table 6.2, Bank_Branch_Code is dependent on the key attribute Emp_Code, and
Branch_Name is dependent on Bank_Branch_Code. Hence, Branch_Name is transitively
dependent on Emp_Code, and the relation is not in third normal form. You can convert the
relation into third normal form by decomposing it into two entities as shown below:

Emp_Code Emp_Name Quarter Sales Bank_Branch_Code


E001 Raghav 1 1234 B0001
E001 Raghav 2 1231 B0001
E001 Raghav 3 11123 B0001
E002 Manpreet 1 2343 B0002
E002 Manpreet 2 2167 B0002
E002 Manpreet 3 2188 B0002
E003 Sunny 1 1678 B0003
E003 Sunny 2 1345 B0003
E003 Sunny 3 1621 B0003
Table 6.5 (a) Third Normal Form

Bank_Branch_Code Branch_Name
B0001 Kapurthala
B0002 Jalandhar
B0003 Phagwara

Table 6.5 (b) Third Normal Form
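Once decomposed, the original view of the data can be reconstructed with a join whenever the
branch name is needed. A minimal sketch, assuming the two relations are stored as tables
named Employee_Sales and Branch (names chosen here for illustration):

SELECT e.Emp_Code, e.Emp_Name, e.Quarter, e.Sales,
       e.Bank_Branch_Code, b.Branch_Name
FROM Employee_Sales e
JOIN Branch b ON b.Bank_Branch_Code = e.Bank_Branch_Code;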

Check your progress/ Self assessment questions- 2


Q4. Define functional dependency.
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q5. If the primary key of the table is not a ___________ key, then the relation in 1NF is
automatically in 2NF as well.

Q6. Define full functional dependency.


___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
Q7. What is transitive dependency?
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

6.5.5 Boyce-Codd Normal Form (BCNF)


For a relation to be in BCNF, it must be in 3NF and the left hand side of every dependency
must be a candidate key. A relation in third normal form may not be in BCNF if the following
conditions hold true:
1. The candidate keys are composite.
2. There is more than one candidate key in the relation.
3. The candidate keys overlap, i.e. there are common attributes between them.

Professor Code Department Head of Department Percent Time


P001 Mathematics Sachin 40
P001 Physics Rahul 60
P002 Nano Singh 35
P002 Mathematics Sachin 65

P003 Physics Rahul 100
Table 6.6

Consider the relation above. It is assumed that:


1. A professor may work in more than one department.
2. The percentage of time a professor spends in each department is given.
3. Each department can have only one Head.
The relation diagram for the above relation is given as follows:

Figure 6.1 Relation Diagram

The names of Department and Head of Department are duplicated. Further, if Professor P002
resigns, rows 3 and 4 are deleted, and the information that Singh is Head of Department
(Nano) is also lost.
The relation needs to be decomposed into the following:

Professor Code Department Percent Time


P001 Mathematics 40
P001 Physics 60
P002 Nano 35
P002 Mathematics 65
P003 Physics 100
Table 6.7 (a) BCNF Normal Form

Department Head of Department
Physics Rahul
Mathematics Sachin
Nano Singh
Table 6.7 (b) BCNF Normal Form

See the dependency diagrams for these new relations.

Figure 6.2 Dependency Diagram

6.5.6 Fourth Normal Form (4NF)


A relation must be decomposed to fourth normal form in case there exists a multi-valued
dependency. In the case of a multi-valued dependency, each and every attribute within a relation
depends upon the other, and none of those satisfies the property of a primary key.
Consider the example of a supplier. A supplier supplies a number of items to different
departments within the same organization. The same items may also be supplied by different
suppliers.
A multi-valued dependency exists here because all the attributes depend upon each other and
none of those satisfies the property of a primary key.

Supplier_Code Item_Code Department_Code


S1 I001 D1
S1 I002 D1
S1 I001 D3
S1 I002 D3
S2 I002 D1
S2 I003 D1
S3 I001 D2
S3 I001 D3

Table 6.8

The problems are clearly visible. Information about item I001 is stored twice for supplier S3.
If supplier S1 has to supply to department D2 and the item is not yet decided, a tuple with a
blank entry has to be introduced.
The relation can be decomposed into two to form the Fourth Normal Form (4NF). A relation
is in 4NF if it has no more than one independent multi-valued dependency, or one independent
multi-valued dependency with a functional dependency.

Supplier_Code Item_Code
S1 I001
S1 I002
S2 I002
S2 I003
S3 I001
Table 6.9 (a) 4NF
Supplier_Code Department_Code
S1 D1
S1 D3
S2 D1
S3 D2
S3 D3
Table 6.9 (b) 4NF
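A sketch of the 4NF decomposition in SQL; note that every attribute of each relation
participates in its composite primary key. The table names are assumed for illustration:

CREATE TABLE Supplier_Item (
    Supplier_Code CHAR(2),
    Item_Code     CHAR(4),
    PRIMARY KEY (Supplier_Code, Item_Code)
);

CREATE TABLE Supplier_Department (
    Supplier_Code   CHAR(2),
    Department_Code CHAR(2),
    PRIMARY KEY (Supplier_Code, Department_Code)
);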

6.6 Database Integrity


Data integrity is enforced in a database by a series of integrity constraints or rules. Integrity
constraints must be defined before designing the database tables. Integrity helps to keep the
database in a consistent state.
6.6.1 Domain Integrity
Domain integrity specifies that all attributes in a relational database must be declared upon a
defined domain, i.e. a definition of the valid set of values for an attribute. You define the
following for each attribute of the table:
- data type,
- length,

- constraints
You may also define a default value, a range (values in between) and/or specific values
for the attribute. Some DBMSs allow you to define the output format and/or an input mask for
the attribute.
These definitions ensure that a specific attribute will always have a right and proper value in the
database.
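These definitions map directly onto SQL column declarations. The following is a minimal
sketch; the Member table and its columns are assumed purely for illustration:

CREATE TABLE Member (
    Member_Id   INT PRIMARY KEY,                      -- data type
    Member_Name VARCHAR(50) NOT NULL,                 -- length plus a NOT NULL constraint
    Member_Type CHAR(1) DEFAULT 'G'
                CHECK (Member_Type IN ('G','S','F')), -- default value and specific values
    Fine_Due    DECIMAL(6,2) CHECK (Fine_Due >= 0)    -- range constraint
);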

6.6.2 Referential Integrity Constraint


Referential integrity is used to maintain consistency between the tuples of two
interrelated tables. The foreign key of one table relates it to another table's primary key.
Following are some of the referential integrity constraints (a SQL sketch follows the list):
1. A tuple in the primary table can only be deleted if no entry for that key exists in the related
table. For example, you can delete the primary entry of a member only if he/she has not issued
any book and has no fine pending.
2. The value of a key in the primary table can only be modified if no entry for that key exists in the
related table.
3. You can't enter a value in the foreign key field of the related table that doesn't exist in the
primary key of the primary table.
4. Referential integrity does allow you to enter a NULL value in the foreign key of the related
table.
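The library example can be sketched in SQL as follows; the table and column names are
assumptions. The FOREIGN KEY clause makes the DBMS enforce rules 1 to 3 automatically,
while still permitting NULL in the foreign key column (rule 4):

CREATE TABLE Member (
    Member_Id   INT PRIMARY KEY,
    Member_Name VARCHAR(50)
);

CREATE TABLE Book_Issue (
    Issue_Id   INT PRIMARY KEY,
    Member_Id  INT REFERENCES Member(Member_Id),  -- may be NULL unless NOT NULL is added
    Issue_Date DATE
);

-- Rejected while any Book_Issue row still references member 101 (rule 1):
-- DELETE FROM Member WHERE Member_Id = 101;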

Check your progress/ Self assessment questions- 3


Q8. Define BCNF.
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q9. Define multi-valued dependency.


___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
Q10. Define domain integrity.
___________________________________________________________________________
___________________________________________________________________________

Q11. ___________________ refers to a systematic process of removing the redundant data
a. Normalization
b. Scalability
c. Redundancy
Q12. If tuples in an entity consists of non-atomic values, it is considered to be in
___________.
a. un-normalized form
b. non-atomic value
c. tuples
Q13. A relation is said to be in _________ if it contains no non-atomic values
a. 1NF
b. 2NF
c. 3NF
Q14. _________________ means that one attribute uniquely determines another attribute
a. Normalization
b. Functional dependency
c. Derived dependency
___________________________________________________________________________

6.7 Denormalization
Denormalization refers to adding back small pieces of redundancy to the normalized
database. You know that OLTP systems are based on a normalized data model. But data
warehouses are built with the objective of providing a repository of data that is responsive to
fast query processing. A normalized data model fails to achieve this objective due to the large
number of joins. The size of the data warehouse repository is huge, and often different tables
that are related to each other are stored on different disks. Processing a query that requires a
join operation on tables stored on different disks is time consuming.
Denormalization is the process of attempting to optimize the query responsiveness of
a database by adding some redundant data back to the normalized database. Denormalization
is a careful selection of attributes that need to be made redundant. It involves an in-depth study
of the querying history to find the joins with the highest frequency. Denormalization also brings
the danger of inconsistency or update anomalies back to the database design. Therefore,
denormalization needs to be performed deliberately, and any denormalization should be
documented thoroughly. Any updates performed should be properly propagated to all redundant
attributes that are added through the process of denormalization.
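As an illustration, the Branch_Name attribute separated out during 3NF decomposition could
be added back to the employee sales table to avoid a frequent join. A hedged sketch, reusing
the table names assumed in the earlier example:

-- Add the redundant attribute back
ALTER TABLE Employee_Sales ADD COLUMN Branch_Name VARCHAR(50);

-- Populate it from the Branch table
UPDATE Employee_Sales e
SET Branch_Name = (SELECT b.Branch_Name
                   FROM Branch b
                   WHERE b.Bank_Branch_Code = e.Bank_Branch_Code);

-- From now on, every change to Branch.Branch_Name must also be propagated here.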

6.8 Summary
Normalization refers to a systematic process of removing the redundant data and various
anomalies that exist in the database. If tuples in an entity consist of non-atomic values, it is
considered to be in un-normalized form. A relation is said to be in 1NF if it contains no
non-atomic values. For a relation to be in 2NF, it must already be in 1NF and non-prime
attributes should be fully functionally dependent on the primary key of the relation. For a
relation to be in 3NF, it should be in 2NF and no transitive dependency should exist in that
relation. For a relation to be in BCNF, it must be in 3NF and the left hand side of every
dependency must be a candidate key. A relation must be decomposed to fourth normal form in
case there exists a multi-valued dependency. Data integrity is enforced in a database by a series
of integrity constraints or rules. Domain integrity specifies that all attributes in a relational
database must be declared upon a defined domain. Referential integrity is used to maintain
consistency between the tuples of two interrelated tables. Denormalization refers to adding
back small pieces of redundancy to the normalized database.

6.9 Glossary
Domain integrity- Domain integrity specifies that all attributes in a relational database must
be declared upon a defined domain.
Referential integrity- Referential integrity is used to maintain consistency between the
tuples of two interrelated tables by restricting the operations on the primary keys and foreign
keys of related tables.
Normalization- Process of removing data redundancy.
Denormalization- Process of adding back redundancy to the database to improve the
performance of data retrieval.
Transitive dependency- If A → B, and B → C, it means that C is functionally dependent on
A. This type of derived dependency is what you call transitive dependency.
Functional dependency- Functional dependency means that one attribute uniquely determines
another attribute.
Fully functional dependency- Full functional dependency means that the non-key attributes are
dependent on the entire key and not a part of it.

Multi-valued dependency- In it, each and every attribute within a relation depends upon the
other, and none of those satisfies the property of a primary key.

6.10 Answers to check your progress/self assessment questions


1. Normalization refers to a systematic process of removing the redundant data and various
anomalies that exist in the database. It helps to improve storage efficiency, data integrity and
scalability.
2. A record of data can rightfully be deleted from a database, but it may result in the deletion
of the only instance of other required data.
3. Disadvantages of normalization:
a. It is not possible to start building the database before understanding the needs of the
end user.
b. The performance of the data retrieval process is affected at higher levels of normalization
due to the large number of joins.
4. Functional dependency means that one attribute uniquely determines another attribute.
5. composite.
6. Full function dependency means that the non-key attributes are dependent on the entire key
and not a part of it.
7. Consider the following functional dependencies:
A → B means that B is functionally dependent on A, and B → C means that C is
functionally dependent on B. It is easy to derive that C is also functionally dependent on A, A →
C. This type of derived dependency is what you call transitive dependency.
8. For a relation to be in BCNF, it must be in 3NF and the left hand side of every dependency
must be a candidate key.
9. In case of multi-valued dependency, each and every attribute within a relation depends
upon the other, and none of those satisfy the property of primary key.
10. Domain integrity specifies that all attributes in a relational database must be declared
upon a defined domain, or the definition of a valid set of values for an attribute.
11. a
12. a
13. a
14. b

6.11 References/ Suggested Readings
"1. Introduction to Database Management system by Gillenson, Ponniah, Kriegel, Trukhov,
Taylor, Powell, Miller, WILEY.
2. Fundamentals of Relational Database Management System by S. Sumathi and S.
Esakkirajan, Springer.
3. Database 1management Systems by R. Panneerselvam, PHI.
4. Database 1management system Concepts by P. K. Singh, VK Publications."

6.12 Model Questions


1. Explain the concept of database integrity in detail.
2. What is denormalization? Why it is needed?
3. List various advantages and disadvantages of normalization.
4. Explain multi-valued dependency with the help of an example.
5. What is BCNF? Explain with the help of an example.

Lesson 7 Client/ Server architecture and transaction management
Structure
7.0 Objective
7.1 Introduction
7.2 Understanding Client/Server Architecture
7.3 Transaction
7.4 Key Properties of Transactions
7.5 Transaction States
7.6 Concurrent Execution and Schedule
7.6.1 Serializability
7.6.1.1 Conflict Serializability
7.6.1.2 View Serializability
7.7 Transaction Management using SQL
7.8 Summary
7.9 Glossary
7.10 Answers to check your progress/self assessment questions
7.11 References/ Suggested Readings
7.12 Model questions

7.0 Objective
After studying this lesson, students will be able to:
1. Describe the client/ server architecture for database management.
2. Define transaction and its key properties.
3. Define schedule.
4. Explain the concept of serializability.
5. List important commands in SQL that help to implement the ACID properties of a transaction.

7.1 Introduction
Data is stored in a database. Where the database is stored and how a user is able to access
that database depends on the client/server architecture being used. Transaction management
is a key responsibility of any database management system. Transactions nowadays are
executed concurrently, and the DBMS needs to ensure that this does not affect the consistency
and accuracy of the database. This lesson discusses the key properties of a transaction and the
management of transactions using SQL.

7.2 Understanding Client/Server Architecture
Client/Server architecture is key to understanding the database management system. The most
traditional client/server architecture treats the server as some database and the client as some
front end machine from where requests are sent to the server for processing. The architecture
that consists of a client and a server is called a two-tier system. Both the client and
server can be on the same machine.

Figure 7.1: 2-tier web client and web server architecture

The client is responsible for submitting the request in the form of a query to the server and then
displaying the response returned by the database server. The server is where the complete
database is stored. The server receives the query from the client, implements it on the
database and returns the resultant records to the client. The main responsibilities of a server are
data storage and processing. There is also a 3-tier architecture that is designed to reduce
the load on the database server. The 3-tier architecture is the more commonly used architecture
of the two. It is mainly used when you want to create a web application for accessing the
database. The application server consists of database connectivity, web connectivity and
business logic software. The application server is used to access the right amount of data from
the database server. This layer acts as a medium for sending partially processed data between
the database server and the client.

Figure 7.2: 3-tier web client and web server architecture.

In a three-tier client/server system, the web server is divided into two distinct entities, i.e. the
processing tier and the data storage tier. The client tier stays the same: the user interface for the
user, which is a web browser. The database part of the two-tier client/server system is split into
a processing and a data storage tier. The processing tier is also known as the middle tier and is
responsible for handling the interaction between the client and the data storage tier.
Effectively, the client tier makes a request for the database to a web server. The processing
tier performs all necessary processing before the request is forwarded to the data storage tier for
a read or write operation. The processing tier is then responsible for generating the response
and returning it to the client.

7.3 Transaction
Transaction refers to a series of commands that are executed in a specified order to achieve
some task. A transaction may also refer to an event that occurs in the database. For example,
consider a transaction to transfer a sum of Rs. 100 from one account into another. The
transaction involves commands to read data from account A, deducting 100 from the same,
reading data from account B, and adding 100 to the value of account B. A transaction basically
performs either of two operations on a database: read or write. There are lots of differences in
the implementation of the two operations. A read operation does not perform any updates on
the database and does not change the image of the database at any point of time, whereas a
write operation is bound to change the image of the database via operations like insert, delete
or update. BFIM refers to the image of the database before the write operation and AFIM refers
to the image of the database after the write operation.
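In SQL terms, a read operation corresponds to a SELECT and a write operation to an UPDATE,
INSERT or DELETE. The transfer described above might look as follows; the Account table and
its columns are assumptions made for illustration:

-- Read operations: leave the database image unchanged
SELECT Balance FROM Account WHERE Acc_No = 'A';
SELECT Balance FROM Account WHERE Acc_No = 'B';

-- Write operations: change the image from BFIM to AFIM
UPDATE Account SET Balance = Balance - 100 WHERE Acc_No = 'A';
UPDATE Account SET Balance = Balance + 100 WHERE Acc_No = 'B';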

Check your progress/ Self assessment questions- 1


Q1. What is a 3-tier client server architecture?
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q2. Define transaction.


___________________________________________________________________________
__________________________________________________________________________
___________________________________________________________________________

Q3. What is BFIM and AFIM?


___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

7.4 Key Properties of Transactions


Every transaction must satisfy the four key properties called the ACID properties for it to be
considered successful. Following are the four ACID properties of a transaction for a database
management system:
1. Atomicity: Atomicity refers to either 0% or 100%. It means that either the effect of all
commands in a transaction must reflect in the database, or none. For example, think of a
transaction where you want to transfer a sum of Rs. 100 from one account into another, and
the BFIM of the two accounts before the transaction shows Rs. 1000/- each. The transaction
consists of the following commands or instructions:
Read A;
A = A – 100;
Write A;
Read B;
B = B + 100;
Write B;

Let us suppose that the transaction fails due to some reason after the first three commands are
successfully executed. The AFIM of the two accounts then represents an inconsistent state: the
balance of the first account will be Rs. 900/- and that of the second account will be Rs. 1000/-.
The solution to the problem of atomicity is the commit operation. The commit operation writes
every temporarily calculated value from the volatile storage on to the stable storage. The
post-recovery image of the database will show accounts A and B both containing Rs. 1000/-,
as if no changes were made to the database, even though the first three commands were
successfully executed. The COMMIT command at the end of the transaction can be used to
instruct the DBMS to reflect all changes permanently in the database.
2. Consistency: Concurrent execution of transactions helps to improve the performance of any
database management system. Besides all the benefits of concurrent execution of transactions,
the database management system is faced with a number of challenges as well. These
challenges are associated with the management of sharable resources, which in this case are
databases. For example, transaction T1 is supposed to deduct Rs. 100 from account A and
transaction T2 is supposed to compute 10% of the balance of A and add the same to B. If the
two transactions are executed concurrently and the schedule lets T2 read A before T1 has
written the updated value, T2 will compute 10% of 1000 (the old balance), whereas it was
supposed to compute 10% of 900. This situation will result in data inconsistency. Proper
scheduling of transactions should take place to avoid the problem of data inconsistency.
3. Isolation: Concurrent transactions try to gain access to sharable resources at the same time.
The database management system should try to make sure that each concurrent transaction is
able to gain access to, or apply operations on, any sharable data item in isolation. It means that
when one transaction is working on some data item, no other transaction should be allowed
access to the same data item. This can be achieved using various access control schemes like
applying locks on data items, or using timestamp based ordering. It helps to retain the
consistency of the database both before and after the successful execution of the transaction.
4. Durability: Changes during the execution of the transaction are made in temporary memory.
It is critical that all changes are made permanent in the database. The commit command or
operation is used to make the changes reflect permanently in the database. One simple method
of achieving atomicity and durability is the concept of shadow copying. A shadow copy is an
exact copy of the BFIM of the database. A pointer in this scheme is used to point to the BFIM
of the database. All changes are made to the shadow copy during the execution of the
transaction. If the transaction completes successfully, the commit command is executed and
the database pointer is updated to point to the shadow copy, i.e. the AFIM copy, discarding
the BFIM copy. Otherwise, the transaction is rolled back and the database pointer is not
updated, discarding the AFIM.

7.5 Transaction States


Just like a process in an Operating System, a transaction also goes through a number of stages
before it is eventually committed.
Active: It is the initial stage when the execution of the transaction starts.
Partially Committed: The values generated during the execution of a transaction are stored
in volatile storage. The partially committed state is the point where all the commands have
been executed and you have reached the commit point.
Failed: If the transaction fails for some reason, it moves to the failed state. The temporary
values are no longer required, and the transaction is set for rollback.
Aborted: Once the rollback operation is over after the transaction failed for some reason, the
database is restored to the BFIM. The transaction is now in the aborted state.
Committed: It comes after the partially committed state. Once the transaction reaches the
commit point, all temporary values are permanently written to the stable storage and the
transaction is now in the committed state.
Terminated: Whether the transaction failed or succeeded, it is terminated at the end and is
said to be in the terminated state.
The whole process can be described using the following diagram:

Figure 7.3 Transaction states

7.6 Concurrent Execution and Schedule


Two or more transactions that are executed concurrently and try to gain access to some
shared resource are called concurrent or cooperating transactions.
A schedule refers to a series of transactions which is implemented as a unit. Depending on
how the instructions from all transactions are arranged in a schedule, a schedule can be
categorized as:
 Serial: The transactions are executed one after the other in a serial order.
 Concurrent: The transactions are executed in a pre-emptive, time-shared
manner.
There is no concurrency in a serial schedule and hence no case of two transactions trying to
gain access to a shared resource. It is easy to implement, but it is inefficient, and other
transactions suffer from long waiting times and response times.
CPU time is shared among transactions in a concurrent schedule, and there are cases of
transactions trying to gain access to a single shared data item for read/write operations. This
situation needs to be handled with care as it could result in an inconsistent state.
Consider the following two transactions, T1 and T2.
T1
Read A;
A = A – 100;
Write A;
Read B;
B = B + 100;
Write B;
T2
Read A;
Temp = A * 0.1;
Read C;
C = C + Temp;
Write C;

No problems will arise if the two transactions are executed using a serial schedule, i.e.
transaction T2 will begin only after T1 has finished. But when designing a concurrent
schedule, there are cases where the sharing of data item A could cause a problem. Consider the
following case:
T1 T2

Read A;
A = A - 100;
Read A;
Temp = A * 0.1;
Read C;
C = C + Temp;
Write C;
Write A;
Read B;
B = B + 100;
Write B;

This schedule is wrong because the context switching is done at the wrong time, i.e. after the
second instruction. The amount added to C after executing transaction T2 should be 90, but in
this case it will be 100. Wondering why? Observe that Rs. 100 has been deducted from A in
instruction 2, but the write operation is completed only in instruction 3. The switching takes
place after instruction 2, i.e. before the write operation. It means that Temp is computed as 10
percent of the old value of A and not as 10 percent of the updated value; with A initially Rs.
1000/-, that is 10 percent of 1000 instead of 10 percent of 900.

Check your progress/ Self assessment questions- 2


Q4. Define atomicity.
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q5. Proper scheduling of transactions should take place to avoid the problem of
data___________.

Q6. Explain partially committed state.


___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q7. Which transactions are called concurrent transactions?
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

7.6.1 Serializability
Serializability refers to producing a concurrent schedule that is free of the issues arising from
access to shared data items and is essentially equivalent to a serial schedule.

7.6.1.1 Conflict Serializability


Instructions from concurrent transactions often try to access a shared data item. Conflict
serializability first tries to establish whether there is any conflict between two instructions
from two different transactions, and then decides the order in which the two should be
executed so that the conflict can be removed. A conflict can only arise if at least one of the
transactions wants to perform a write operation on the shared data item. The following rules
are important in conflict serializability:
1. There is no conflict when the two instructions of two concurrent transactions want to
perform a read operation, and hence the order does not matter.
2. There is a conflict when one of the instructions wants to perform a read operation and
the other instruction wants to perform a write operation, and the ordering in this case
matters. If the read instruction is performed first, it reads the old value of the data
item. If the write instruction is performed first, the read instruction reads the newly
updated value.
3. There is a conflict when the two instructions of two concurrent transactions want to
perform a write operation, and the final value of the data item depends on which
instruction was executed last.
If two schedules (S1 and S2) are made of the same set of transactions, then both S1 and S2
will yield the same result if the conflict resolution rules are maintained while creating the new
schedule. In that case the schedules S1 and S2 are called Conflict Equivalent.

7.6.1.2 View Serializability:


View serializability can also be achieved by creating a new schedule out of an existing
schedule involving the same set of transactions. You must follow the rules given below to
achieve view serializability. Let T1 and T2 be the transactions to be serialized, and S1 and S2
be the two view equivalent schedules you want to create.
1. If in S1, T1 reads the initial value of a given shared data item, then T1 should read the
initial value of the same shared data item in S2 as well.
2. If in S1, T2 reads the value updated/written by T1, then T2 should read the value
updated/written by T1 in S2 as well.
3. If in S1, the final write operation is performed by T1 on some data item, then in S2
also the final write operation should be performed by T1 on the same data item.

7.7 Transaction Management using SQL


Transaction management for handling concurrent transactions is handled differently by each
DBMS. In this section you will study the three commands that SQL uses to implement the
ACID properties of a transaction. These commands are only useful if you have made changes
to the database image, i.e. the AFIM is different from the BFIM of the database.

1. Commit: Once the transaction completes execution of all instructions, it reaches the
partially committed state. The COMMIT command is used to make all the changes in the
database permanent since the last commit state. The syntax for the COMMIT command is simple:
COMMIT;

2. Rollback: The ROLLBACK command is used to undo the changes made to the database image
after the last commit state. The ROLLBACK command cannot undo changes made to the
database image before the last commit state. The syntax for the ROLLBACK command is:
ROLLBACK;

3. Savepoint: A Savepoint, also known as a Checkpoint, is created to represent a consistent
state within a transaction. If you then execute the rollback command, the transaction is
rolled back to the last Savepoint rather than rolling back the entire transaction. You need to
specify the name of the Savepoint with the rollback command to roll back only up to
that Savepoint. The syntax for the SAVEPOINT command is:
SAVEPOINT Identifier;
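The three commands can be combined within one transaction. A minimal sketch, assuming the
same Account table as in the earlier illustration (the statement that starts a transaction, e.g.
BEGIN or START TRANSACTION, varies between DBMSs):

UPDATE Account SET Balance = Balance - 100 WHERE Acc_No = 'A';
SAVEPOINT after_debit;                -- consistent state inside the transaction
UPDATE Account SET Balance = Balance + 100 WHERE Acc_No = 'B';
ROLLBACK TO SAVEPOINT after_debit;    -- undoes only the credit, not the debit
COMMIT;                               -- makes everything since the last commit permanent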

Check your progress/ Self assessment questions- 3

Q8. Is there any conflict when an instruction in one transaction wants to read and an instruction
in another transaction wants to write on the same data item?
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q9. What is the rollback command?


___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

7.8 Summary
The most traditional client/server architecture treats the server as some database and the client
as some front end machine from where requests are sent to the server for processing. In a
three-tier client/server system, the web server is divided into two distinct entities, i.e. the
processing tier and the data storage tier. A transaction refers to a series of commands that are
executed in a specified order to achieve some task. A transaction may also refer to an event
that occurs in the database. BFIM refers to the image of the database before the write operation
and AFIM refers to the image of the database after the write operation. Atomicity refers to
either 0% or 100%. It means that either the effect of all commands in a transaction must reflect
in the database, or none. Proper scheduling of transactions should take place to avoid the
problem of data inconsistency. Isolation means that when one transaction is working on some
data item, no other transaction should be allowed access to the same data item. The commit
command or operation is used to make the changes reflect permanently in the database. Just
like a process in an Operating System, a transaction also goes through a number of stages
before it is eventually committed. Two or more transactions that are executed concurrently
and try to gain access to some shared resource are called concurrent transactions. If two
schedules (S1 and S2) are made of the same set of transactions, then both S1 and S2 will yield
the same result if the conflict resolution rules are maintained while creating the new schedule.
In that case the schedules S1 and S2 are called Conflict Equivalent. Transaction management
for handling concurrent transactions is handled differently by each DBMS.

7.9 Glossary
Transaction- It refers to a series of commands that are executed in specified order to achieve
some task.
BFIM- It refers to the image of the database before the write operation
AFIM- It refers to the image of the database after the write operation
Concurrent transaction- Two or more transactions that are executed concurrently and they try
to gain access to some shared resource are called concurrent transactions.
Serializability- It refers to producing a concurrent schedule that is free of any issues like
accessing shared data items and it is primarily similar to a serial schedule.
Commit- COMMIT command is used to make all the changes in the databases permanent
from the last commit state
Rollback- Rollback command is used to undo the changes made in the database image after
the last commit state.
Savepoint- A Savepoint is also known as a Checkpoint created to represent a consistent state
within a transaction.

7.10 Answers to check your progress/self assessment questions


1. A 3-tier client/server architecture includes a middle tier, also known as the processing tier,
which is responsible for handling the interaction between the client (front end tier) and the
data storage (back end tier).
2. Transaction refers to a series of commands that are executed in specified order to achieve
some task.
3. BFIM refers to the image of the database before the write operation and AFIM refers to the
image of the database after the write operation.
4. Atomicity refers to either 0% or 100%. It means that either effect of all commands in a
transaction must reflect in the database, or none.
5. inconsistency.
6. Partially committed state is a point where all the commands of a transaction have been
executed and you have reached the commit point.
7. Two or more transactions that are executed concurrently and they try to gain access to
some shared resource are called concurrent transactions.
8. Yes. There is a conflict when one of the instructions wants to perform a read operation and
the other instruction wants to perform a write operation, and the ordering in this case matters.

9. Rollback command is used to undo the changes made in the database image after the last
commit state. The ROLLBACK command cannot make changes to the database image before
the last commit state.
7.11 References/ Suggested Readings
"1. Introduction to Database Management system by Gillenson, Ponniah, Kriegel, Trukhov,
Taylor, Powell, Miller, WILEY.
2. Fundamentals of Relational Database Management System by S. Sumathi and S.
Esakkirajan, Springer.
3. Database 1management Systems by R. Panneerselvam, PHI.
4. Database 1management system Concepts by P. K. Singh, VK Publications."

7.12 Model questions


1. What is a transaction? Explain ACID properties of transaction.
2. Explain with the help of neat diagram the concept of 3-tier client/server architecture.
3. Explain various transactions states. Also draw the transition diagram.
4. Explain view serializability and give the three conditions necessary to achieve it.
5. Explain transaction management using SQL.

Lesson 8 Concurrency control mechanisms
Structure
8.0 Objective
8.1 Introduction
8.2 Concurrency
8.3 Concurrency control
8.3.1 Lock Based Protocol
8.3.1.1 Lock Granularity
8.3.1.2 Types of Locks
8.3.1.3 Implementing the Locks
8.3.1.4 Two Phase Locking Protocol
8.3.1.4.1 Deadlocks in two phase locking
8.3.1.4.2 Types of Two Phase Locking Protocols
8.3.2 Timestamp Ordering Protocol
8.3.3 Optimistic concurrency control
8.4 Database Recovery Management
8.4.1 Log-based Recovery
8.4.2 Checkpoint based recovery
8.5 Summary
8.6 Glossary
8.7 Answers to check your progress/self assessment questions
8.8 References/ Suggested Readings
8.9 Model questions

8.0 Objective
After studying this lesson, students will be able to:
1. Define the concept of concurrency.
2. Discuss lock based protocol for implementing concurrency control.
3. Explain deadlock and solution to deadlock.
4. Discuss the Timestamp Ordering Protocol.
5. Describe in detail the concept of Database Recovery Management.

8.1 Introduction

Database management systems today allow transactions to execute concurrently. Concurrent
execution of transactions is complicated, and it requires the DBMS to apply various
concurrency controls to ensure database consistency. A problem only arises when two
transactions concurrently try to gain access to some shared data element. Even with well
designed concurrency controls, the database may reach an inconsistent state, and you need to
apply some mechanism to recover from that state.

8.2 Concurrency
Concurrency refers to a situation where multiple transactions are being executed at the same
time and are accessing some shared data item. Transactions that do not access any shared
resource are not called concurrent transactions, even though they are being executed
concurrently.
8.3 Concurrency control
Access control is used to manage concurrent access to shared resources, which in this case is
the database. Problems generally arise when access control is not implemented properly and
multiple transactions try to access the same shared database or table. Serializability, discussed
in the transaction management lesson, is one solution to this problem, but it is not always
possible to achieve serializability. The two important access control mechanisms discussed in
this lesson are called Locks and Timestamps. The main objective of implementing access
control is to preserve the consistency and isolation properties of the database. A number of
protocols for both mechanisms are discussed in this lesson.
8.3.1 Lock Based Protocol
What is a lock? Why is it used in databases? It is only applicable when concurrent processes
are trying to access a shared database. When a transaction tries to perform a read or write
operation on the database, it locks the database or table, so that no other transaction can gain
access to that database. A lock is a mechanism that informs the database management system
whether a data item is being used by some transaction for performing a read/write operation.
Every concurrent transaction checks the status of the lock on a particular data item before it is
allowed access to the same. Read and write are two entirely different operations performed
on the database, and hence the behaviour of locks on the two operations is also entirely
different.
Implementing access control for concurrent transactions trying to read a data item is very
easy. No updates are made to the database when multiple transactions are performing the
read operation, and hence it is not dangerous to allow all the transactions to perform the read
operation concurrently on the shared resources.
A write operation is entirely different from a read operation. A write operation changes or
updates the value of the data item. The value of a data item updated by a transaction remains
in an inconsistent state from the start of the write operation to the end of the write operation.
Letting some other transaction gain access to that shared data item is dangerous, as the second
transaction would read the inconsistent value of that data item. Moreover, if you allow the
second transaction to perform a write operation on the same data item, the value written by
the first transaction is overwritten. The database management system wants to avoid both
situations.
From the above discussion, you can conclude that if a transaction is performing a read
operation on some shared data item, other transactions may be allowed to perform the read
operation concurrently on the shared data item but not the write operation. Secondly, if a
transaction is performing a write operation on some shared data item, other transactions
cannot be allowed to perform read or write operations concurrently on the shared data item.
8.3.1.1 Lock Granularity
The granularity of locks refers to the size of the data item to be locked. The size of a data
item to be locked can range from the complete database down to a single cell in a table. The
size of the data item affects the overhead involved and the performance of implementing
concurrent processes. By locking at higher levels of granularity, the amount of work required
to obtain and manage locks is reduced, but fewer transactions can proceed concurrently; at
lower levels of granularity, concurrency improves, but more overall work is required to obtain
and manage the locks.
8.3.1.2 Types of Locks
Based on the two types of operations, the locks in database management system are classified
into:
Shared Lock: A shared lock is obtained by a transaction when it wants to perform only the
read operation. It is named a shared lock because multiple concurrent transactions can share
this lock and perform the read operation simultaneously.
Exclusive Lock: An exclusive lock is obtained by a transaction when it wants to perform both
read and write operations. It is named an exclusive lock because only one transaction can
hold it on a given shared data item, and all other transactions have to wait for the first
transaction to release the lock before they can perform read/write operations on the same
shared data item.

The lock matrix below represents the compatibility between shared and exclusive locks;
TRUE means the requested lock can be granted given the lock already existing on the data item.
Lock requested    Lock already existing
                  Shared      Exclusive
Shared            TRUE        FALSE
Exclusive         FALSE       FALSE
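Many SQL databases expose these two lock types at row level through locking clauses on
SELECT. A sketch using the syntax of PostgreSQL/MySQL 8, with the Account table assumed
as in earlier examples:

-- Shared lock: other transactions may also read, but none may write
SELECT Balance FROM Account WHERE Acc_No = 'A' FOR SHARE;

-- Exclusive lock: no other transaction may lock or write the row
SELECT Balance FROM Account WHERE Acc_No = 'A' FOR UPDATE;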

Check your progress/ Self assessment questions- 1

Q1. Define concurrency.

___________________________________________________________________________

___________________________________________________________________________

___________________________________________________________________________

Q2. What is a Lock based protocol?

___________________________________________________________________________

___________________________________________________________________________

___________________________________________________________________________

Q3. What do we mean by lock granularity?

___________________________________________________________________________

___________________________________________________________________________

___________________________________________________________________________

Q4. In which case a transaction should apply for shared lock?

___________________________________________________________________________

___________________________________________________________________________

___________________________________________________________________________

8.3.1.3 Implementing the Locks


If the lock is not held by some other transaction, a transaction that wants to perform a read or
write operation on a shared data item must first lock it, and then, after performing the desired
operation, release the lock so that it can be made available for other transactions to use. Based
on the type of operation to be performed, a transaction can request either a shared or an
exclusive lock.
Let us suppose you want to transfer a sum of Rs. 255 from the account of A to the account of
B; the transaction using locks can be written as follows:
Lock-X (A); (Because you want to perform both read and write operations)
Read A;
A = A – 255;
Write A;
Unlock (A); (Release the lock after the write operation has been performed)
Lock-X (B); (Because you want to perform both read and write operations)
Read B;
B = B + 255;
Write B;
Unlock (B); (Release the lock after the write operation has been performed)
For another transaction, where you want to add to account B 50% of the amount in account A,
the transaction using locks can be written as follows:
Lock-S (A); (Because you want to perform only a read operation)
Read A;
AMOUNT = A * 0.5;
Unlock (A); (Release the lock after the read operation has been performed)
Lock-X (B); (Because you want to perform both read and write operations)
Read B;
B = B + AMOUNT;
Write B;
Unlock (B); (Release the lock after the write operation has been performed)

Now, if transactions T1 and T2 from the previous lesson are implemented concurrently and the
following schedule is executed for the two transactions, it will result in undesirable results, i.e.
an error in the balance of the two accounts.
T1 T2

Read A;
A = A - 100;
Read A;
Temp = A * 0.1;

Read C;
C = C + Temp;
Write C;
Write A;
Read B;
B = B + 100;
Write B;

It is easy to detect the error in the schedule of the two transactions above. Transaction 2
accesses the old value of A, because the write operation on A is performed after transaction 2
finishes. Transaction 2 should not have accessed the value of A before the write operation on
A was complete. To be more precise, transaction 1 must release the lock before transaction 2
can gain access to the data item and perform its read or write operation. Following is the
correct implementation using locks:

T1 T2

Lock-X (A)
Read A;
A = A - 100;
Write A;
Unlock (A)
Lock-S (A)
Read A;
Temp = A * 0.1;
Unlock (A)
Lock-X(C)
Read C;
C = C + Temp;
Write C;
Unlock (C)
Lock-X (B)
Read B;
B = B + 100;
Write B;
Unlock (B)
The placement of locks in the transactions is very important. You must not request a lock
before it is needed or hold on to a lock long after it has been used. Both will result in
degradation of performance.

8.3.1.4 Two Phase Locking Protocol


You must have already observed how locks help to maintain database consistency. But
implementing locks without any rules can create a lot of problems in terms of system
performance. Locking protocols are used to define the rules based on which transactions can
request locks and release locks. In the case of the Two Phase Locking Protocol, gaining access
to locks and the release of locks are done in two different phases called the growing phase and
the shrinking phase.
Growing Phase: It is called the growing phase because in this phase the count of locks keeps
growing. In this phase, transactions can only request locks and cannot release locks. A
transaction becomes part of the growing phase as soon as it is allocated its first lock. The
transaction is not allowed to release even a single lock before it has acquired all the locks. The
point at which a transaction obtains all its locks is called the Lock Point. After the lock point,
a transaction can start releasing the locks.
Shrinking Phase: In this phase a transaction can only release the locks obtained by it and
cannot request any new lock in the process. The shrinking phase comes after the lock point,
but it does not start immediately. It starts from the point the transaction releases its first lock.
The Two Phase Locking Protocol is further categorized into the Strict Two Phase Locking
Protocol and the Rigorous Two Phase Locking Protocol.

8.3.1.4.1 Deadlocks in two phase locking


Mutual blocking of locks between transactions can lead to deadlock. A deadlock occurs when
two or more transactions each request a lock held by another transaction, which is itself
waiting for a lock held by yet another waiting transaction, so that a cycle is formed.
For example: transaction T1 has obtained an exclusive lock on data item A and is requesting
an exclusive lock on data item B, whereas transaction T2 has obtained an exclusive lock on
data item B and is requesting an exclusive lock on data item A. This situation is called a deadlock.
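The same deadlock can be reproduced with two interleaved SQL sessions; a sketch with the
assumed Account table, where row locks play the role of the exclusive locks on A and B:

-- Session 1:
UPDATE Account SET Balance = Balance - 50 WHERE Acc_No = 'A';  -- T1 locks row A
-- Session 2:
UPDATE Account SET Balance = Balance - 50 WHERE Acc_No = 'B';  -- T2 locks row B
-- Session 1:
UPDATE Account SET Balance = Balance + 50 WHERE Acc_No = 'B';  -- T1 waits for T2
-- Session 2:
UPDATE Account SET Balance = Balance + 50 WHERE Acc_No = 'A';  -- T2 waits for T1: deadlock
-- The DBMS detects the cycle and rolls one of the two transactions back.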

8.3.1.4.2 Types of Two Phase Locking Protocols


Strict Two Phase Locking Protocol
It is used to create cascade less schedule. In case of strict phase locking protocol, a
transaction is allowed only to release the shared locks in the shrinking phase before the
transaction commits. All exclusive locks can be released only after the transaction commits.
A cascading schedule is one in which the fate of child transaction is dependent on the fate of
parent transaction. If the child transaction has performed the commit operation, but the parent
transaction fails, the child transaction will also be rolled back if needed, even though it had
successfully committed. Cascading Rollback can be prevented using Strict Two Phase
Locking Protocol.

Rigorous Two Phase Locking Protocol

In the case of rigorous two phase locking, a transaction cannot release even its shared locks
before it commits, which was allowed in the case of strict two phase locking. It means that
until the transaction commits, no other transaction can acquire even a shared lock on a data
item on which the uncommitted transaction holds a shared lock.

8.3.2 Timestamp Ordering Protocol


A timestamp is simply a tag that defines the time at which a particular operation was
performed. In the case of concurrency management, a timestamp denotes the specific time at
which a transaction or data item was activated. Timestamps can be implemented using either
of two techniques:
1. Assign the current value of the clock to the data item or transaction being activated, or
2. Assign a counter value in some ordering to the data item or transaction being
activated.
A timestamp of a data item can be:
W-timestamp (Q): The latest time at which a write operation was performed on the data item
Q.
R-timestamp (Q): The latest time at which a read operation was performed on the data item Q.

The two values need to be updated each time a read or write operation is performed on the
data item.

Check your progress/ Self assessment questions- 2

Q5. What is a lock point in 2 phase locking protocol?

___________________________________________________________________________

___________________________________________________________________________

___________________________________________________________________________

Q6. Explain deadlock in 2 phase locking protocol.

___________________________________________________________________________

___________________________________________________________________________

___________________________________________________________________________

Q7. Explain the difference between strict and rigorous two phase locking protocols.

___________________________________________________________________________

___________________________________________________________________________

___________________________________________________________________________

Q8. What is timestamp?

___________________________________________________________________________

___________________________________________________________________________

___________________________________________________________________________

How should timestamps be used?


Execution of conflicting read or write operation in their respective timestamp order is ensured
by the timestamp ordering protocol. Timestamp ordering protocol can be implemented as an
alternative to lock based protocol.
For Read operations:
1. If TS(T) < W-timestamp(Q), then transaction T is trying to read a value of data item Q that has already been overwritten by a younger transaction. T is rolled back, as it is trying to read a value of Q that no longer exists.
2. If TS(T) >= W-timestamp(Q), then transaction T is trying to read a value of data item Q that was written earlier by another transaction. The read operation is allowed, and the R-timestamp of Q is updated to the larger of R-timestamp(Q) and TS(T).
For Write operations:
1. If TS(T) < R-timestamp(Q), the system has waited so long for transaction T to write its value that it has already allowed a younger transaction to read the old value of data item Q. Hence, T must be rolled back.
2. If TS(T) < W-timestamp(Q), transaction T has been delayed so much that the system has already allowed a younger transaction to write into data item Q. Hence, T must be rolled back.
3. Otherwise, the write operation is allowed, and the W-timestamp of Q is updated to TS(T).
A minimal sketch of these checks is given below.
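The sketch assumes a simple DataItem record holding the two timestamps; all names are illustrative, not part of any real DBMS API.

    # Minimal sketch of the timestamp ordering checks.

    class Rollback(Exception):
        pass

    class DataItem:
        def __init__(self):
            self.r_ts = 0      # R-timestamp(Q): latest read
            self.w_ts = 0      # W-timestamp(Q): latest write
            self.value = None

    def read(ts, q):
        if ts < q.w_ts:
            # T would read a value a younger transaction has already overwritten.
            raise Rollback("read too late: TS(T) < W-timestamp(Q)")
        q.r_ts = max(q.r_ts, ts)  # record the latest read
        return q.value

    def write(ts, q, value):
        if ts < q.r_ts:
            # A younger transaction has already read the old value of Q.
            raise Rollback("write too late: TS(T) < R-timestamp(Q)")
        if ts < q.w_ts:
            # A younger transaction has already written Q.
            raise Rollback("write too late: TS(T) < W-timestamp(Q)")
        q.value = value
        q.w_ts = ts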

8.3.3 Optimistic concurrency control


It assumes that multiple transactions can usually complete without interfering with each other, so no locks are taken by any of the concurrent transactions under this method. Before committing, each transaction verifies that the data items it used were not modified by any other transaction in the meantime. If this check reveals conflicting modifications, the committing transaction rolls back and is restarted; otherwise it commits.
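The validation step can be sketched as follows; the read/write-set bookkeeping and all names here are illustrative assumptions, and real systems implement validation far more efficiently.

    # Minimal sketch of optimistic concurrency control (validation at commit).

    class Conflict(Exception):
        pass

    committed = []  # (commit_order, write_set) of already committed transactions

    class OptimisticTxn:
        def __init__(self, start_order):
            self.start_order = start_order
            self.read_set = set()
            self.write_set = set()

        def read(self, item):
            self.read_set.add(item)    # work on a private copy, remember the read

        def write(self, item):
            self.write_set.add(item)   # buffer the write privately

        def commit(self, commit_order):
            # Validation: no transaction that committed after we started may
            # have written anything we read; otherwise roll back and restart.
            for order, writes in committed:
                if order > self.start_order and writes & self.read_set:
                    raise Conflict("validation failed: roll back and restart")
            committed.append((commit_order, set(self.write_set)))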

8.4 Database Recovery Management


Reasons for failure
Transaction failure- Sometimes a transaction cannot complete because it has some code
error or DBMS is not able to execute it. For instance, a transaction cannot complete in case of
a deadlock or resource unavailability.
System Crash- External problems that may cause the system to stop abruptly and cause the
system to crash.
Disk Failure- Encountered less frequently nowadays, as advances in storage technology have made disks much safer.
Recovery and Atomicity
DBMS is a highly complex system with hundreds of transactions being executed every second. If it fails or crashes while transactions are in execution, there must be some mechanism for recovery. When a transaction fails for any reason, the database may be left in an inconsistent state, and the database recovery manager of the DBMS should restore the database to its last consistent state (the before-image, or BFIM). The ACID properties require that the atomicity of each transaction as a whole be maintained. The DBMS must perform the following checks during recovery from a crash:
a. DBMS must get information on the states of all transactions that were in execution.
b. DBMS must ensure the atomicity of all transactions that were in execution.
c. It should verify whether a transaction can be completed from where it left off or whether it needs to be rolled back.
d. No transaction should be allowed to leave the database in an inconsistent state.

8.4.1 Log-based Recovery


Logs maintain the history of all transactions performed by the DBMS. A log entry is written for every operation of a transaction before that operation is executed on the database (write-ahead logging). Following are some examples of log entries:
When a transaction starts,
<Ta, Start>
When the transaction modifies item X, where V1 is the old value and V2 is the new value,
<Ta, X, V1, V2>
When the transaction commits,
<Ta, commit>
Modification of the database can be performed using either of two techniques:
Deferred update- All logs are written to stable storage, and the database is updated only when the transaction commits.
Immediate update- The database is modified immediately after every operation, i.e. entry by entry in the log.
In case of immediate update, if the transaction fails, its log entries are used to reverse all the modifications it made. A minimal sketch of the two techniques follows.
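The sketch below shows <Ta, X, V1, V2> style records driving both techniques; the dictionaries and function names are illustrative assumptions.

    # Minimal sketch of log-based recovery with deferred vs. immediate update.

    db = {}    # the database: {item: value}
    log = []   # log records (txn, item, old_value, new_value), written first

    def deferred_write(txn, item, new):
        # Deferred update: only log now; the database is left untouched.
        log.append((txn, item, db.get(item), new))

    def deferred_commit(txn):
        # Apply the logged new values only at commit time.
        for t, item, old, new in log:
            if t == txn:
                db[item] = new

    def immediate_write(txn, item, new):
        # Immediate update: log first (write-ahead), then modify the database.
        log.append((txn, item, db.get(item), new))
        db[item] = new

    def immediate_abort(txn):
        # On failure, reverse the transaction's entries, newest first.
        for t, item, old, new in reversed(log):
            if t == txn:
                db[item] = old

Under deferred update a failed transaction needs no undo at all, since the database was never touched; under immediate update the old values recorded in the log make the undo possible.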

8.4.2 Checkpoint based recovery


When many transactions run concurrently, the failure of one can have a cascading effect, and searching the entire log during recovery becomes very expensive. Plain log-based recovery is therefore not recommended for concurrent transactions; to reduce the cascading effect and limit the portion of the log that must be examined, the DBMS creates checkpoints. A checkpoint declares a point before which the DBMS was in a consistent state, so all log entries before the checkpoint can be stored permanently on a storage disk.
Two lists are maintained for recovery using checkpoints, called the redo-list and the undo-list. After a system crash or failure, the recovery system reads the log backwards from the end to the last checkpoint. A transaction is added to the redo-list if the recovery system sees a log with both <Ta, Start> and <Ta, Commit> (or just <Ta, Commit> after the checkpoint), and it is added to the undo-list if the recovery system sees <Ta, Start> but no commit or abort entry. All transactions in the undo-list are reversed and their log entries are removed; transactions in the redo-list are redone before their logs are saved. A minimal sketch of the backward scan is given below.
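The backward scan can be sketched as follows. The log format (tuples such as ("start", T), ("commit", T) and ("checkpoint", [active transactions])) is an assumption made for illustration.

    # Minimal sketch of building the redo-list and undo-list after a crash.

    def recovery_lists(log):
        redo, undo = set(), set()
        # Scan backwards from the end of the log to the last checkpoint.
        for record in reversed(log):
            kind, arg = record
            if kind == "commit":
                redo.add(arg)                 # committed after the checkpoint: redo
            elif kind == "start" and arg not in redo:
                undo.add(arg)                 # started but never committed: undo
            elif kind == "checkpoint":
                # Transactions still active at the checkpoint that never
                # committed must also be undone.
                undo.update(t for t in arg if t not in redo)
                break
        return redo, undo

    # Example matching Figure 8.1: T1 commits before the checkpoint, T2 and
    # T3 commit before the failure, T4 is still active when the system fails.
    log = [("start", "T1"), ("commit", "T1"),
           ("checkpoint", ["T2"]),            # assume T2 was active here
           ("start", "T3"), ("commit", "T2"), ("commit", "T3"),
           ("start", "T4")]                   # crash occurs after this point
    print(recovery_lists(log))                # redo = {T2, T3}, undo = {T4}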

Figure 8.1 Checkpoints and transaction start and commit points.

Transaction T1 committed before the checkpoint, and hence no action is needed for it. Transactions T2 and T3 committed after the checkpoint but before the failure, and hence should be added to the redo-list. For transaction T4, the failure happened before its commit point, and hence it should be added to the undo-list.

Check your progress/ Self assessment questions- 3

Q9. Differentiate between undo-list and redo-list.

___________________________________________________________________________

___________________________________________________________________________

___________________________________________________________________________

Q10. Differentiate between deferred update and immediate update.

___________________________________________________________________________

___________________________________________________________________________

___________________________________________________________________________


8.5 Summary
Concurrency refers to a situation when multiple transactions are being executed at the same
time and accessing some shared data item. When a transaction tries to perform a read or write
operation on the database, it locks the database or table, so that no other transaction can gain
access to that database. A lock is a mechanism that informs the database management system
whether a data item is being used by some transaction for performing read/ write operation.
The granularity of locks refers to the size of the data item that is locked. Shared lock is
obtained by any transaction when it wants to perform only the read operation. Exclusive lock
is obtained by any transaction when it wants to perform both the read and write operations. In
case of two Phase Locking Protocol, gaining access to locks and release of locks is done in

two different phases called the growing phase and shrinking phase. Mutual blocking of locks
between the transactions can lead to deadlock. Deadlock occurs when two or more
transactions are requesting for a lock obtained by some other transaction which is waiting for
a lock obtained by another waiting transaction, and a cycle is formed. A timestamp is simply
a tag that defines the time at which a particular operation was performed. In case of
concurrency management, a timestamp denotes a specific time on which the transaction or
data item had been activated. Logs maintain the history of all transactions performed by
DBMS. Checkpoint declares a point before which the DBMS was in consistent state. All
transactions in undo-list are reversed and log entries are removed. Logs for all transactions in
redo-list are removed and then redone before saving their logs.

8.6 Glossary
Shared Lock- Shared lock is obtained by any transaction when it wants to perform only the
read operation.
Lock Matrix- It is used to represent the compatibility relationship between shared and exclusive locks.
Exclusive Lock- Exclusive lock is obtained by any transaction when it wants to perform both
the read and write operations.
Timestamp- A timestamp denotes a specific time on which the transaction or data item had
been activated.
Lock granularity- The granularity of locks refers to the size of the data item that is locked.
Lock- A lock is a mechanism that informs the database management system whether a data
item is being used by some transaction for performing read/ write operation.
Checkpoint- Checkpoint declares a point before which the DBMS was in consistent state.
Log- Log refers to entries of all modifications made by a transaction.
Concurrency- Concurrency refers to a situation when multiple transactions are being
executed at the same time and accessing some shared data item.

8.7 Answers to check your progress/self assessment questions


1. Concurrency refers to a situation when multiple transactions are being executed at the
same time and they try to gain access to some shared data item.
2. When a transaction tries to perform a read or write operation on the database, it locks the database or data item, so that no other transaction can gain access to it. A lock is a mechanism that informs the database management system whether a data item is being used by some transaction for performing a read/ write operation.

3. The granularity of a lock refers to the size of the data item that is locked. The size of the data item affects the overhead involved in, and the performance of, implementing concurrent processes. The smaller the size, the better the performance, but the greater the overhead.
4. Shared lock is obtained by any transaction when it wants to perform only the read
operation. It is named shared lock because multiple concurrent transactions can share this
lock and perform the read operation simultaneously.
5. The point at which a transaction obtains all its locks is called the Lock Point. After the
lock point, a transaction can start releasing the locks.
6. Deadlock occurs when two or more transactions are requesting for a lock obtained by some
other transaction which is waiting for a lock obtained by another waiting transaction, and a
cycle is formed.
7. In case of the strict two phase locking protocol, a transaction is allowed to release only its shared locks in the shrinking phase before it commits, whereas in case of rigorous two phase locking, a transaction cannot release even its shared locks before it commits.
8. A timestamp is simply a tag that defines the time at which a particular operation was
performed. In case of concurrency management, a timestamp denotes a specific time on
which the transaction or data item had been activated.
9. All transactions in undo-list are reversed and log entries are removed. Logs for all
transactions in redo-list are removed and then redone before saving their logs.
10. Deferred update- All logs are written on to the stable storage and the database is updated
when a transaction commits. If the transaction fails, the entries for the same are removed
from the log.
Immediate update- Database is modified immediately after every operation, i.e. entry by
entry in the log table. If the transaction fails, all the entries in the log table are reversed and
then removed.

8.8 References/ Suggested Readings


1. Introduction to Database Management System by Gillenson, Ponniah, Kriegel, Trukhov, Taylor, Powell, Miller, WILEY.
2. Fundamentals of Relational Database Management System by S. Sumathi and S. Esakkirajan, Springer.
3. Database Management Systems by R. Panneerselvam, PHI.
4. Database Management System Concepts by P. K. Singh, VK Publications.

8.9 Model questions
1. Explain 2 phase locking protocol in detail.
2. Explain Timestamp ordering protocol for implementing concurrency control.
3. What do we mean by deadlock in 2 phase locking?
4. Explain checkpoint based recovery in DBMS with the help of an example.
5. What is a lock? Explain two types of locks with the help of an example each.

Unit-3

Lesson -9

Distributed Database

Structure of Lesson

9.0 Introduction

9.1 Centralized Design

9.2 Centralized vs. Decentralized Design

9.3 Distributed database management system(DDBMS)

9.4 Advantages of DDBMS

9.5 Disadvantages of DDBMS

9.6 Characteristics of DDBMS

9.7 Distributed Database Structure

9.8 Components of DDBMS

9.9 Distributed Database Design

9.9.1 Data Fragmentation

9.9.2 Data Replication

9.9.3 Data Allocation

9.10 Homogeneous and Heterogeneous DBMS

9.11 Summary

9.12 Glossary

9.13 Answers to check your progress

9.14 Model Questions

9.0 Introduction

With the vast usage of the Internet and the World Wide Web, it has become difficult for a DBMS to handle the large amount of data and the large number of database users. A traditional DBMS finds it difficult to sustain I/O and CPU performance, as it runs on a single mainframe system. The amount of memory in the system, the number of hard disk drives connected to it, and the number of processors it can run in parallel also limit the traditional DBMS. As information and database users increased, so did the problem of storing and sorting data. Overcoming these limits requires a different kind of system: one way is to build a single larger and faster machine, but that is not cost effective. So a new technology emerged, called the distributed database. A DDBMS ties together a number of database systems that are connected with each other and appear to the user as a single unit. With this approach it is possible to handle a large number of users, and it is also cost effective.

9.1 Centralized Design

Centralized design is used when the data component consists of a small number of objects, procedures and entities. Such a design can be realised as a simple and small database and can be handled by a single administrator or by a small team. First the administrator defines the problem and then creates the conceptual design. This conceptual design is verified against the user views, and system processes and data constraints are defined according to the organization's requirements. Centralized design can be used by small organizations as well as large companies.

Fig 9.1 : Centralized design

Decentralized Design

Decentralized design is used when the data component consists of a large number of objects and entities with complex relations, and when the complex problem is spread among many operational sites. Here a single administrator is not sufficient; instead, a team of expert designers is required to complete the job. Different modules of the system are designed to solve parts of the complex problem. The expert team designs the conceptual model for each module and verifies it against the user views, data processes and data constraints. When all the modules are verified, they are put together to form a single conceptual model.

Fig 9.2 : Decentralized design

9.2 Centralized vs. Decentralized Design

The two designs can be compared as follows:
1. Centralized design is used when the data component has a small number of objects and entities; decentralized design is used when it has a large number of objects, entities and complex relations.
2. Centralized design produces a small and simple database; decentralized design produces a large database.
3. In centralized design a single administrator is sufficient; decentralized design requires an expert team of designers.
4. Centralized design does not divide the problem into modules; decentralized design divides the problem into modules.
5. Centralized design requires a single conceptual model; in decentralized design, different modules require different conceptual models.
6. In centralized design the problem is simple and does not spread among operational sites; in decentralized design the problem is spread among many operational sites.

9.3 Distributed database management system(DDBMS)

A distributed database is viewed as a single logical database that is physically distributed among different computers at multiple locations. In other words, a distributed database comprises a number of DBMS systems that run independently and are connected by a communication link. Each computer system in the distributed environment is capable of processing user requests that require access to local data, and also has the capability to process user requests that require access to data stored on other computers. In this way a large number of users can be served and a large number of requests can be processed. The distributed computing approach basically divides a big problem into smaller ones and solves them in coordination to improve efficiency.

The software system that is responsible for managing the distributed database and that makes the distribution transparent to the user is called a distributed database management system. By transparency we mean that data is available to the user at any location in the network as if it were stored at the user's own location. A single logical database is split into a number of fragments, where each fragment is stored on one or more computer systems. Every fragment is under the control of a separate DBMS, and the systems are connected by a communication channel.

Fig 9.3 Topology of DDBMS

Check your progress/ self assessment question


1. What is the full form of DDBMS?

9.4 Advantages of DDBMS

 Availability

In case of failure at one site, the data is still available to users at some other site, whereas in a centralized system a failure makes the whole system inaccessible. A user request is rerouted from the failed site to another working site where the required data is available. This helps improve the availability and efficiency of the system.

 Improved sharing ability

Data is available to users at any location. A user at one location can access data at his own site as well as data from other sites at the same time. For example, a customer at one branch of a bank can access the data available at another branch in a distributed banking system.

 Local autonomy

There is a global database administrator in a distributed system who is responsible for the whole system. There are also local administrators who are responsible for their own data sites, and depending on the design of the DDBMS, each local DBA may have a different degree of local autonomy.

 Improved reliability

As the same data is available at more than one site, failure of one site still allows access to the replicated data at another site. This accessibility improves the reliability of the data.

 Improved performance

Keeping the data close to the sites that use it most frequently allows users to react speedily to new developments and to interact with other network resources to find solutions to unusual problems. This reduces access delays and the contention for CPU and I/O services.

 Economics

It is more cost effective to use a system of smaller computers with equivalent combined power than a single large computer. Hence it is more economical to use coordinated mini and micro computers for processing rather than a single mainframe computer.

 Modular Growth

In distributed systems it is very easy to add new sites to the network without affecting the existing sites of the system. Distributed systems are thus very flexible in adding processing and storage power, which can easily enhance the database storage size.

 Reduced communication overhead

In a distributed DBMS, data access is mostly local or at nearby sites, so there is reduced communication overhead and better response time as compared to centralized systems.

9.5 Disadvantages of DDBMS

 Architecture complexity

In a distributed DBMS the resources are also distributed, so it becomes very difficult to manage them from a central control point, which increases the complexity. The coordination between sites and the exchange of messages among sites is an additional overhead for distributed systems.

 Lack of standards

Due to the unavailability of adequate communication standards and data access protocols, good resources may remain out of reach of some sites, which decreases the potential of distributed database management systems.

 Security

Because of the unavailability of a secure communication medium, the protection of data is at risk. The user's data and programs that are transmitted over the network are insecure. Developments such as new encryption techniques have been made to overcome this problem.

 Data integrity problem

Integrity of data in a database requires enforcing integrity constraints on the data. But here the data is placed at many sites, and it becomes very costly to enforce integrity constraints on distributed data.

 Design of database more complex

As compared to a centralized system, database design in distributed systems is more difficult, because the design of a distributed database also has to take into account data fragmentation, replication and allocation, which is not required in centralized systems.

Check your progress/ self assessment question


2. A distributed database has which of the following advantages over a centralized
database?

A. Software cost

B. Software complexity

C. Slow Response

D. Modular growth

9.6 Characteristics of DDBMS

A DDBMS has the following Characteristics:

 Logically related data is collected and shared by different systems.
 Data is split into a number of fragments.
 There is an interface to interact with end users and with other DBMSs.
 Fragments are allocated to sites.
 Communication links are provided between different sites.
 The data at each site is under the control of a local DBMS.
 Mapping is done to determine the location of data, i.e. whether the data belongs to a local or a remote fragment.
 A database administrator administers the database.

9.7 Distributed Database Structure

The structure of a distributed database is based on the logical and component architecture models of the DDB (distributed database). In the logical architecture we discuss the schemas that help in understanding the component architecture, i.e. the structure of the DDB. Each node is represented as having its own local internal schema (LIS) to accommodate the heterogeneity in the distributed database. Since some DDBs have different data models and different database software, their internal schemas cannot be the same; therefore each node will have a different local internal schema.

Fig 9.4: Logical architecture of DDB

At the local conceptual schema (LCS) level, the logical organization of data at every site is described. The global conceptual schema (GCS) provides fragmentation and replication transparency. It represents a consistent and unified view of the logical structure of data across all nodes; its purpose is therefore to provide network transparency.

Fig 9.5: Structure of Distributed Database system.

In the structure of a DDB, the global query compiler is used to verify queries and impose the defined constraints, referring to the global conceptual schema stored in the global system catalog. Optimized local queries are generated from global queries by the global query optimizer, using both the local and the global conceptual schemas. The optimizer also estimates the cost of each candidate plan based on response time (I/O, CPU and network latencies), and after computing the cost of each candidate it selects the one with the minimum cost of execution. The global transaction manager, along with the local transaction managers, is responsible for coordinating execution across multiple sites. Each local DBMS has its own local transaction manager, query optimizer and execution engine, along with a local system catalog.

Check your progress/ self assessment question


3. Transaction manager is which of the following?
A. Maintains a log of transactions

B. Maintains before and after database images

C. Maintains appropriate concurrency control

D. All of the above.

9.8 Components of DDBMS

The DDBMS consists of the following components:

1. Computer workstation: The network system is composed of sites or nodes, also known as computer workstations. The distributed system should be independent of the computer system hardware; it is concerned only with the networked sites.
2. Network hardware and software: Each workstation contains network hardware and software, which is independent of the distributed database system. All nodes interact with each other over the network.
3. Communication media: As the nodes in a distributed database system are spread geographically, some communication medium is required to transfer data between different sites. A DDBMS is able to support different types of communication media.

Fig 9.6 Components of DDBMS

4. Transaction processor: This component is responsible for receiving and processing the data requests of applications at both local and remote sites. It is also known as the transaction manager or application processor.
5. Data processor: This component is responsible for storing and retrieving the data. It is also known as the data manager. The data processor may itself be a centralized DBMS.

9.10 Homogeneous and Heterogeneous DBMS

A DDBMS is classified into two categories, called homogeneous and heterogeneous DDBMS. In a homogeneous DDBMS, all sites use the same DBMS software and have the same applications on the nodes or sites. Every site shares a common schema, though sites can have different degrees of autonomy. In a homogeneous DDBMS it is not possible to have more than one type of DBMS software in the system.

In a heterogeneous DDBMS, each site can use different DBMS software, and these need not share the same data model; any of the hierarchical, network, relational or object-oriented DBMSs may be used.

[Diagram] Homogeneous DDBMS with identical DBMSs: DB2 at Site 1, Site 2, Site 3 and Site 4, connected through the DDBMS and network.
[Diagram] Heterogeneous DDBMS with different databases and data models: DB2 at Site 1, Oracle at Site 2, IMS at Site 3 and Sybase at Site 4, connected through the DDBMS and network.

A homogeneous DDBMS has the same DBMS software and data model at every site, so it is very easy to expand and has no translation overhead. Since the database structure is the same at every site, communication is easy and efficient. A heterogeneous DDBMS, by contrast, has different database software and different data models at different sites, so it is very difficult to establish a mapping among them; the differing database structures make translations necessary before information can be exchanged.
The performance of a homogeneous DDBMS can be increased by exploiting the parallel processing capacity of multiple sites. A heterogeneous DDBMS, however, is more popular for its scalability, as it is able to mix software packages. Translations are needed when different sites have different hardware, different DBMS software, or both. If the systems at the sites differ only in hardware, the translation is simple and involves changes in code and word length. If the systems differ in DBMS software, the translation is difficult and involves mapping the data structures of one system's data model to the equivalent data structures of another system's data model. If the systems differ in both hardware and DBMS software, both of the above translations must be carried out, which makes the task very complex.

Check your progress/ self assessment question
4. A heterogeneous distributed database is which of the following?
The same DBMS is used at each location and data are not distributed across all
A.
nodes.

B. The same DBMS is used at each location and data are distributed across all nodes.

A different DBMS is used at each location and data are not distributed across all
C.
nodes.

D. A different DBMS is used at each location and data are distributed across all nodes.

9.11 Summary

In this lesson, the comparison of centralized and decentralized design has been discussed, and the advantages and disadvantages of DDBMS have been elaborated.

9.12 Glossary

DBMS: Database Management System

DDBMS: Distributed Database Management System

LCS: Local conceptual schema

GCS: Global conceptual schema

9.13 Answer to progress/ self assessment question

1. DDBMS: Distributed database management system


2. D
3. D
4. D

9.14 Model question

Question: Differentiate between centralized and decentralized design.


Question: Define Homogeneous and Heterogeneous DBMS.

Lesson 10 Levels and design of distributed database
Structure
10.0 Objective
10.1 Introduction
10.2 Levels of data and process distribution
10.2.1 Single-site processing, single-site data (SPSD)
10.2.2 Multiple-site processing, single-site data (MPSD)
10.2.3 Multiple-site processing, multiple-site data (MPMD)
10.3 Distributed database design
10.3.1 Data Fragmentation
10.3.1.1 Horizontal fragmentation
10.3.1.2 Vertical fragmentation
10.3.1.3 Mixed fragmentation
10.3.2 Data Replication
10.3.3 Data Allocation
10.4 Summary
10.5 Glossary
10.6 Answers to check your progress/self assessment questions
10.7 References/ Suggested Readings
10.8 Model questions

10.0 Objective
After studying this lesson, students will be able to:
1. Define the different levels of data and process distribution in distributed databases.
2. Explain the three type of data fragmentation strategies.
3. Discuss the need of data replication.
4. List various levels of data replication.
5. Describe the different types of data allocation strategies.

10.1 Introduction
Implementing a distributed database environment is a challenging task. A number of factors
affect the performance of distributed DBMS. It is important that you are able to select the
appropriate level of data and process distribution depending on the requirements of the
system to be implemented. Also you need to carefully decide on various issues like data

fragmentation, data replication and data allocation during the design of a DDBMS. A good design of a DDBMS is half the battle won.

10.2 Levels of data and process distribution


Database systems can be best classified on the basis of process and data distribution strategies
supported by them. For instance, a database management system may support data storage on
a single site or on multiple sites, and may support data processing at a single site or at
multiple sites.

10.2.1 Single-site processing, single-site data (SPSD)


There is only one system involved in this strategy, called the host computer. All processing is done on that host computer, and all data is stored on its local disk. Processing is never performed on the end user's system. SPSD is typically supported by most mainframe and midrange server database management systems, where dumb or thin terminals are connected to the host computer (on which the DBMS is located).

Figure 10.1 SPSD

10.2.2 Multiple-site processing, single-site data (MPSD)


In the multiple-site processing, single-site data (MPSD) strategy, processing is distributed over multiple sites, i.e. multiple processes run on different computers sharing a single data repository. It requires a network file server on which conventional applications are accessed through a network. Accounting applications are the best example of the MPSD strategy. Characteristics of MPSD:
The TP on each workstation simply routes all network data requests to the file server.
The end user sees an abstract view of the system and thinks of the file server as just another hard disk. Only limited capabilities for distributed processing are offered by MPSD.
The end user must make a direct reference to the file server in order to access remote data.
All functions concerned with data selection, search and update take place at the workstation, which requires that entire files be transferred over the network from the file server to the workstation. This is highly inefficient: it increases network traffic, slows response time and raises communication costs.
Client/server architecture is a variation of the multiple-site processing, single-site data strategy. In client/server architecture, all database processing is done at the server site, thereby reducing network traffic. Client/server systems support distributed processing, and the architecture also supports data storage at multiple sites.

Figure 10.2 MPSD

10.2.3 Multiple-site processing, multiple-site data (MPMD)


The multiple-site processing, multiple-site data (MPMD) strategy describes fully distributed
DBMS. It provides support for multiple data processors and transaction processors at
multiple sites. Depending on the level of support for centralized DBMSs, Distributed DBMSs
may be classified as either homogeneous or heterogeneous.

Homogeneous Distributed DBMS

It is used to integrate only one type of centralized DBMS over a network. Hence, it provides
an environment in which the same DBMS is running on different server platforms.

Heterogeneous Distributed DBMS


It is used to integrate different types of centralized DBMSs over a network. A fully
heterogeneous DDBMS provides support for different DBMSs that may even support
different data models running under different computer systems.

Fully heterogeneous distributed DBMSs are subject to the following restrictions:


Remote access is provided on a read-only basis.
There is restriction on the number of remote tables that may be accessed in a single
transaction.
There is restriction on the number of distinct databases that may be accessed.
There is restriction on the database model that may be accessed.
Managing data at multiple sites leads to a number of issues that must be addressed before it is
implemented. Restrictions listed above are not exhaustive by any means, and the technology
to implement DDBMS is evolving rapidly.

Check your progress/ Self assessment questions- 1


Q1. Define SPSD.
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q2. What are the benefits of using client/server architecture?


___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q3. Differentiate between homogeneous and heterogeneous distributed DBMS.


___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

10.3 Distributed database design
Design principles like the relational database model, Entity Relationship modelling and normalization are applicable to both centralized and distributed databases. In addition to these design principles, distributed database design also involves the following three issues:
 Partitioning of the database into fragments.
 Deciding which fragments to replicate.
 Deciding where the fragments and their replicas will be stored.

10.3.1 Data Fragmentation


Data fragmentation lets you partition a single database or table into multiple fragments. The fragments can then be stored at one or more sites over a network. The distributed data catalog (DDC) contains all information about the data fragmentation, and the TP can access this information to process user requests.
In our discussion, logical fragments are created only by partitioning tables. Join and union operations can be applied to the fragments to recreate the original table. Three data fragmentation strategies are discussed in this lesson.

10.3.1.1 Horizontal fragmentation


It refers to the partitioning of a table into fragments of tuples. A tuple is also known as a row or record of the table. Each fragment contains unique rows, and each fragment can be stored at a different site. All the fragments created using this technique share the same set of attributes or columns. Each fragment is the equivalent of a SELECT statement with a WHERE clause on a single attribute.
Let us consider an example of horizontal fragmentation. A university might be running its
distance education program across the state. Now, the director wants to know the information
of students enrolled in three different districts. Now you can create three different fragments
and store them on three different locations to represent the data related to local students only.
The horizontal fragmentation of the student table by district can be viewed as follows:
Fragment Name | Location | Condition | Node Name | Number of Rows
STU_F1 | JALANDHAR | DISTRICT = 'JUC' | ABM | 1012
STU_F2 | KAPURTHALA | DISTRICT = 'KPT' | JTN | 1100
STU_F3 | AMRITSAR | DISTRICT = 'ASR' | DTM | 1390

As you can observe, each fragment in horizontal fragmentation can have a different number of rows, but must have the same set of attributes or columns. The sketch below illustrates the idea.
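A minimal Python sketch of horizontal fragmentation (the table contents here are made up for illustration):

    # Horizontal fragmentation: every fragment keeps all columns but only
    # the rows that satisfy its condition.

    student = [
        {"Roll_no": 1, "Name": "A", "DISTRICT": "JUC"},
        {"Roll_no": 2, "Name": "B", "DISTRICT": "KPT"},
        {"Roll_no": 3, "Name": "C", "DISTRICT": "ASR"},
    ]

    def horizontal_fragment(table, attribute, value):
        # Equivalent of: SELECT * FROM table WHERE attribute = value
        return [row for row in table if row[attribute] == value]

    stu_f1 = horizontal_fragment(student, "DISTRICT", "JUC")
    stu_f2 = horizontal_fragment(student, "DISTRICT", "KPT")
    stu_f3 = horizontal_fragment(student, "DISTRICT", "ASR")

    # A UNION of the fragments recreates the original table.
    assert sorted(stu_f1 + stu_f2 + stu_f3,
                  key=lambda r: r["Roll_no"]) == student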

10.3.1.2 Vertical fragmentation


It refers to the partitioning of a table into fragments of attributes or columns. Each fragment
contains unique attributes (except the key attribute that will be shared by all fragments and all
fragments will be related using this key attribute only), and each fragment can be stored at a
different site. This is the equivalent of PROJECT operation.
Let us consider an example of vertical fragmentation. Let us consider the same example of
student database for distance education program being run by a university. Now, the
university might be having different departments to provide different services to the students
and each department might not be interested in all the attributes of the student table. Vertical
fragmentation in best suited in this scenario. Support team might only be interested in
knowing the contact details of the student, whereas the placement department might be
interested in knowing the academic related attributes of the student. Vertical fragmentation
for the student table based on departmental activity can be viewed as follows:

Fragment Name | Location | Node Name | Attributes
STU_F1 | Support Block | SVC | Roll_no, Name, F_Name, Address, Contact_no, Email_Id
STU_F2 | Placement Block | PVC | Roll_No, Grade_10, Grade_12, Grade_Grad, Skill_set

Each fragment in vertical fragmentation can have a different number of columns, but must have the same number of rows, and all fragments must share one common attribute, which in this example is Roll_no (the roll number of the student). A minimal sketch follows.
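A minimal Python sketch of vertical fragmentation (table contents and attribute values are made up for illustration):

    # Vertical fragmentation: every fragment keeps all rows but only a
    # subset of columns, always including the key attribute (Roll_no).

    student = [
        {"Roll_no": 1, "Name": "A", "Contact_no": "98xxx", "Grade_10": 80},
        {"Roll_no": 2, "Name": "B", "Contact_no": "97xxx", "Grade_10": 75},
    ]

    def vertical_fragment(table, attributes):
        # Equivalent of a PROJECT operation on the listed attributes.
        return [{a: row[a] for a in attributes} for row in table]

    support = vertical_fragment(student, ["Roll_no", "Name", "Contact_no"])
    placement = vertical_fragment(student, ["Roll_no", "Grade_10"])

    # A JOIN on the shared key attribute recreates the original table.
    rebuilt = [{**s, **p} for s in support for p in placement
               if s["Roll_no"] == p["Roll_no"]]
    assert rebuilt == student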

10.3.1.3 Mixed fragmentation
It refers to a combination of both horizontal and vertical techniques. Using this strategy, a
table may be partitioned into multiple horizontal subsets, each having a subset of attributes.
Let us again consider the same example of student database for distance education program
being run by a university. Now, you might need to apply horizontal fragmentation to student
table to accommodate three different districts, and within each district you want to apply
vertical fragmentation to accommodate two departments (i.e. placement and service support).
The mixed fragmentation is a two-step procedure. In step 1, horizontal fragmentation is
introduced for each site based on each DISTRICT. The horizontal fragmentation yields the
subsets of student tuples that are located at each site. Vertical fragmentation is used within
each horizontal fragment to partition the attributes, thus meeting each department’s
information needs at each sub-site. Mixed fragmentation for the student table can be viewed
as follows:

Fragment Name | Location | Condition for horizontal criteria | Node Name | Number of Rows | Vertical Criteria
STU_F1 | JUC-Support Block | DISTRICT = 'JUC' | ABM-SVC | 1022 | Roll_no, Name, F_Name, Address, Contact_no, Email_Id
STU_F2 | JUC-Placement Block | DISTRICT = 'JUC' | ABM-PVC | 1022 | Roll_No, Grade_10, Grade_12, Grade_Grad, Skill_set
STU_F3 | KPT-Support Block | DISTRICT = 'KPT' | JTN-SVC | 1345 | Roll_no, Name, F_Name, Address, Contact_no, Email_Id
STU_F4 | KPT-Placement Block | DISTRICT = 'KPT' | JTN-PVC | 1345 | Roll_No, Grade_10, Grade_12, Grade_Grad, Skill_set
STU_F5 | ASR-Support Block | DISTRICT = 'ASR' | DTM-SVC | 1212 | Roll_no, Name, F_Name, Address, Contact_no, Email_Id
STU_F6 | ASR-Placement Block | DISTRICT = 'ASR' | DTM-PVC | 1212 | Roll_No, Grade_10, Grade_12, Grade_Grad, Skill_set

Each fragment under the mixed fragmentation strategy contains student data for one district and, within that district, for one department location. A minimal sketch follows.
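A minimal Python sketch of mixed fragmentation as the composition of the two earlier steps (table contents are made up for illustration):

    # Mixed fragmentation: a horizontal split by DISTRICT, followed by a
    # vertical split of each horizontal fragment.

    student = [
        {"Roll_no": 1, "Name": "A", "Contact_no": "98xxx",
         "Grade_10": 80, "DISTRICT": "JUC"},
        {"Roll_no": 2, "Name": "B", "Contact_no": "97xxx",
         "Grade_10": 75, "DISTRICT": "KPT"},
    ]

    def mixed_fragment(table, district, attributes):
        rows = [r for r in table if r["DISTRICT"] == district]   # horizontal step
        return [{a: r[a] for a in attributes} for r in rows]     # vertical step

    # JUC support-block fragment (STU_F1 in the table above):
    stu_f1 = mixed_fragment(student, "JUC", ["Roll_no", "Name", "Contact_no"])
    print(stu_f1)  # [{'Roll_no': 1, 'Name': 'A', 'Contact_no': '98xxx'}]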

Check your progress/ Self assessment questions- 2


Q4. Define fragmentation.
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q5. What do we mean by horizontal fragmentation?


___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q6. Vertical fragmentation is the equivalent of ________________ operation.

___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

10.3.2 Data Replication


Data replication refers to the storage of data copies at multiple sites served by a computer network. Fragment copies can be stored at several sites to serve specific information requirements. Keeping fragment copies at multiple sites improves data availability, reduces response time, and also helps to reduce the overall communication and query costs.
Let us suppose that database DB1 is partitioned into two fragments: F1 and F2. One of the
examples of data replication may be described as follows:

Figure 10.3 Data Replication

Both data fragments are stored at multiple sites. The original copy of fragment F1 is stored at site1 and the original copy of fragment F2 is stored at site3, while replicated copies of both fragments are stored at site2. There must be data consistency between each original fragment and its replicated copies, which is governed by the mutual consistency rule. The mutual consistency rule states that all copies of a data fragment must be identical, so the distributed database management system must regularly propagate updates to all sites where replicated fragments are stored.
Following are the key benefits of data replication:
 Improved data availability
 better load distribution
 Improved resistance to data failure

 Reduced query costs
Following are some of the disadvantages of data replication:
 Additional processing overhead is imposed on the Distributed DBMS.
 Increased transaction time, as regular updates must be made to the replicated sites as
well.
 Increased storage costs.
Following are some of the additional processing steps that a distributed database management system must perform to maintain the data replicas:
The DDBMS must decompose a query into sub-queries in order to access each fragment of the database.
A decision is needed as to which copy of a fragment should be accessed to complete the operation. A read operation can be performed by accessing the replicated copy at the nearest site, whereas a write operation requires the DDBMS to update all replicated copies in order to satisfy the mutual consistency rule.
A TP sends a data request to each selected DP for execution.
A DP receives and executes each request and sends the response back to the TP.
The TP then assembles the DP responses.
These additional processing steps become even more complex when factors such as network topology and communication throughput are considered. A minimal routing sketch is given below.
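A minimal Python sketch of this routing, assuming a simple mapping from fragments to replica sites (all names are illustrative):

    # Read: go to one replica (e.g. the nearest site holding a copy).
    # Write: update every replica, per the mutual consistency rule.

    replicas = {"F1": ["site1", "site2"], "F2": ["site3", "site2"]}
    storage = {site: {} for site in ("site1", "site2", "site3")}

    def read(fragment, key, nearest_site):
        sites = replicas[fragment]
        site = nearest_site if nearest_site in sites else sites[0]
        return storage[site].get((fragment, key))

    def write(fragment, key, value):
        for site in replicas[fragment]:          # propagate to every replica
            storage[site][(fragment, key)] = value

    write("F1", "row7", "new value")
    assert read("F1", "row7", "site2") == "new value"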

A fragment replication can be maintained using one of the following three strategies:
Full replication- All fragments of the database are replicated and multiple copies of each
fragment are stored at multiple sites. A fully replicated database is difficult to implement due
to massive overhead it imposes on the system.
Partial replication-Only some of the selected fragments of the database are replicated and
multiple copies of each fragment are stored at multiple sites. It is slightly easy and
manageable to implement partial replication.
No replication- With no replication strategy in practice, only a single copy of the fragment is
stored at a single site. No duplicate copies of the fragments are stored.

Factors that govern the decision to use/ or not to use data replication.
Database size- Data replication not only increases the storage requirements for saving the replicas, it also increases the overhead related to data transmission. Replicating large amounts of data requires a window of time and high network bandwidth to transmit the replicated data.
Usage frequency- The frequency with which the database fragments are updated is also a key factor; the frequency of read operations does not add to this cost. It is not easy to maintain replicas of data fragments that are updated frequently.
Cost- You need to consider the cost for maintaining the performance, software overhead,
managing transaction synchronization, and other cost-benefit analysis associated with
replicated data.

Data replication can help to reduce the cost of data requests as and when the usage frequency
of remotely located data is high. Information related to the data replication is stored in the
distributed data catalog (DDC). This information is used by the TP to decide which copy of a
database fragment to access. The data replication is a powerful recovery tool which makes it
possible to restore lost data.

Check your progress/ Self assessment questions- 3


Q7. List various benefits of data replication.
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q8. Differentiate between full and partial data replication.


___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

10.3.3 Data Allocation


Data allocation is concerned with deciding where to store the data. Following are the data
allocation strategies for distributed database environment:
Centralized data allocation- The entire database is stored at a single site.
Partitioned data allocation- The database is partitioned into multiple disjoint fragments and
stored at multiple sites.
Replicated data allocation- Multiple copies of all or some of the database fragments are
stored at multiple sites.

Data distribution over a computer network is achieved through data partition, through data
replication, or through a combination of both. Data allocation and data fragmentation are
closely related to each other.
Following are some of the factors considered when designing a data allocation algorithm:
1. Data availability.
2. Database system performance.
3. Size, and degree of relations.
4. Types of transactions.
5. Disconnected operation for mobile users.

10.4 Summary
Database systems can be best classified on the basis of the process and data distribution strategies
supported by them. Design principles like relational database model, Entity Relationship
Modelling and Normalization are applicable to both the centralized or distributed databases.
In addition to these design principles, distributed database design also includes the principles
of data fragmentation, data replication and data allocation. Data fragmentation lets you
partition a single database or table into multiple fragments. You can partition the database
using either, horizontal partitioning, vertical partitioning or mixed partitioning. All or some
of the fragments of the database are replicated and multiple copies of each fragment are
stored at multiple sites. Data allocation is concerned with deciding where to store the data.

10.5 Glossary
SPSD- All processing is done on a host computer and all data is also stored on the local disk
of that same host computer.
MPSD- Processing is distributed on multiple sites or multiple processes run on different
computers sharing a single data repository.
MPMD- It provides support for multiple data processors and transaction processors at
multiple sites.
Homogeneous DDBMS- It is used to integrate only one type of centralized DBMS over a
network.
Heterogeneous DDBMS- It is used to integrate different types of centralized DBMSs over a
network.
Horizontal fragmentation- It refers to the partitioning of a table into fragments of tuples.

Vertical partitioning- It refers to the partitioning of a table into fragments of attributes or
columns.

10.6 Answers to check your progress/self assessment questions

1. In case of SPSD, all processing and data storage is done on the host computer.

2. Following are the benefits of using client/ server architecture:

a. All database processing is done at the server site and thus reducing network traffic.
b. Client/server systems support the distributed processing.
c. Client/server architecture also supports data storage at multiple sites.
3. Homogeneous Distributed DBMS integrates only one type of centralized DBMS over a
network, and heterogeneous Distributed DBMS integrates different types of centralized
DBMSs over a network.
4. Data fragmentation lets you partition a single database or table into multiple fragments.
5. Horizontal fragmentation refers to the partitioning of a table into fragments of tuples. Each
fragment contains unique rows, and each fragment can be stored at a different site.
6. PROJECT.
7. Following are some of the benefits of data replication:
a. Improved data availability
b. better load distribution
c. Improved resistance to data failure
d. Reduced query costs

8. In case of full replication, all fragments of the database are replicated and multiple copies
of each fragment are stored at multiple sites, and in case of partial replication, only some of
the selected fragments of the database are replicated and multiple copies of each fragment are
stored at multiple sites.

10.7 References/ Suggested Readings


1. Introduction to Database Management System by Gillenson, Ponniah, Kriegel, Trukhov, Taylor, Powell, Miller, WILEY.
2. Fundamentals of Relational Database Management System by S. Sumathi and S. Esakkirajan, Springer.
3. Database Management Systems by R. Panneerselvam, PHI.
4. Database Management System Concepts by P. K. Singh, VK Publications.
5. Database Systems: Design, Implementation and Management by Peter Rob and Carlos Coronel, Cengage India Pvt. Limited.

10.8 Model questions


1. Explain mixed fragmentation with the help of an example.
2. List the data allocation strategies for distributed database environment.
3. Which factors govern the decision to use/ or not to use data replication?
4. Explain the three levels of data and process distribution in DDBMS.
5. Differentiate between the horizontal and vertical fragmentation with the help of an
example.

Lesson 11 Process distribution in databases
Structure
11.0 Objective
11.1 Introduction
11.2 Distributed database transparency features
11.3 Distributed transparency
11.4 Transaction transparency
11.4.1 Distributed Requests and Distributed Transactions
11.4.2 Distributed Concurrency Control
11.5 CLIENT/SERVER VS. DDBMS
11.6 Summary
11.7 Glossary
11.8 Answers to check your progress/self assessment questions
11.9 References/ Suggested Readings
11.10 Model questions

11.0 Objective
After studying this lesson, students will be able to:
1. List various distributed database transparency features.
2. Explain distributed transparency.
3. Discuss the need of distributed transaction transparency.
4. Describe the concurrency control needed to manage distributed transactions.
5. List the key features of client/ server architecture.

11.1 Introduction
Now that you are familiar with the basic concepts related to distributed database management systems, you will learn about the various issues related to process distribution in databases. You will get a chance to know about the issues faced when implementing transactions in a distributed database environment: what levels of transparency are supported by available DDBMSs, and what features they provide to keep the distributed database in an integrated and consistent form. Despite the tremendous challenges faced in implementing distributed databases, this is the fastest growing mechanism used by multinational companies to manage their data.

11.2 Distributed database transparency features
The functional characteristics of a distributed database system, grouped together, can be described as its transparency features. A user of a distributed DBMS gets the sense that he or she is the only user of the system and that the DDBMS is dedicated to serving his or her requests; in simple words, the user feels like he or she is working on a centralized database management system. The transparency features of a DDBMS hide all the complexities of the distributed database from the user.
Following are some the key transparency features of Distributed DBMS:
1. Distribution transparency: It allows a user to view a distributed database as a single
logical or centralized database. It helps to achieve abstraction and the user is freed from the
complexities of knowing:
a. That the rows and columns of entities are partitioned horizontally and vertically and stored on multiple distributed sites.
b. That multiple sites are used to store replicas of the data.
c. The location where the requested data is stored.
2. Transaction transparency: It allows a transaction to update data at more than one network site, and it helps to achieve the atomicity and integrity features of the database. It ensures that the transaction is either entirely completed or aborted.
3. Failure transparency: The system should be functional at all times, and this feature helps to ensure that the system continues to operate even in case of a node failure.
4. Performance transparency: It ensures that system functioning is free from all
complexities related to distributed nature of the database and the system is able to function as
it were a centralized database. Performance of the system should not suffer due to
heterogeneous nature of network’s platform. Performance transparency also deals with the
optimum routing to find the most cost-effective path to access remote data.
5. Heterogeneity transparency: The schemas used at the individual sites are independent of each other, and heterogeneity transparency ensures that the data from the different local schemas is integrated into a common global schema. It is the responsibility of the DDBMS to translate data requests from the global schema to the local DBMS schemas.

11.3 Distributed transparency


It is important that the distributed database operates just like a centralized database, and distribution transparency tries to ensure exactly that. The level to which distribution transparency can be achieved varies from one DDBMS to another. The following three levels of distribution transparency are known:
1. Fragmentation transparency: It is the highest level of transparency. According to this
transparency level, the user doesn't even need to be aware of the fact that the data base is
partitioned. It is because of this transparency feature, that the user does not need to mention
the name and location of the fragments prior to data access requests.
2. Location transparency: It refers to the middle level of transparency. If a DDBMS supports
only location transparency, then the end user has to specify the database fragment names,
but does not need to specify the location of the fragments.
3. Local mapping transparency: It refers to the lowest level of transparency. If a DDBMS
supports only local mapping transparency, then the end user has to specify both the
database fragment names and the fragment locations.

Let me explain it with the help of an example. Consider an entity named employee having the
following attributes:
1. NAME
2. AGE
3. ADDRESS
4. DEPARTMENT
5. SALARY
The employee table is distributed and stored at three different locations: Punjab, Haryana and
New Delhi. The table has been partitioned into three fragments based on location,
i.e. data related to Punjab employees is saved in fragment F1, data related to Haryana
employees is saved in fragment F2, and data related to New Delhi employees is saved in
fragment F3.

Figure 11.1 Fragment locations
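
To make the fragments concrete, here is a minimal sketch of how the three horizontal
fragments might be derived from the employee table. Using the ADDRESS attribute as the
partitioning criterion is an assumption made only for illustration; a real DDBMS would define
fragments through its own fragmentation facilities.

-- Hypothetical horizontal fragmentation of the employee table by location.
CREATE TABLE F1 AS
  SELECT name, age, address, department, salary
    FROM employee WHERE address = 'Punjab';
CREATE TABLE F2 AS
  SELECT name, age, address, department, salary
    FROM employee WHERE address = 'Haryana';
CREATE TABLE F3 AS
  SELECT name, age, address, department, salary
    FROM employee WHERE address = 'New Delhi';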

All the fragments are unique, and there is no duplication of data. Suppose the end user wants
to display all employees with age greater than 30.

In case the DDBMS supports the highest level of distribution transparency, i.e.
fragmentation transparency, the query can be written in the form that conforms to a
non-distributed database.
select * from employee where age > 30;

In case the DDBMS supports the middle level of distribution transparency, i.e. location
transparency, the query cannot be written in the form that conforms to a non-distributed
database, and the fragment names must be specified in the query.
select * from F1 where age > 30
UNION
select * from F2 where age > 30
UNION
select * from F3 where age > 30;

In case the DDBMS supports the lowest level of distribution transparency, i.e. local
mapping transparency, the query must include both the fragment names and the fragment
locations.
select * from F1 NODE Punjab where age > 30
UNION
select * from F2 NODE Haryana where age > 30
UNION
select * from F3 NODE New Delhi where age > 30;

It is not difficult to understand from the last three examples how the three levels of
distribution transparency affect the way the user interacts with the distributed database.
The distributed data dictionary (DDD), or distributed data catalog (DDC), component of the
DDBMS provides support for distribution transparency. A description of the complete
distributed database is contained in the distributed data catalog. All local transaction
processors (TPs) use a common database schema, called the distributed global schema, to
translate user requests into remote requests to be processed by the different data
processors (DPs).

The DDC is distributed and replicated at the network nodes, and hence it is important to
maintain the integrity in DDC at all sites.

Most DDBMSs support distribution of a database over multiple storage sites, but not
distribution of a single table. Such distribution support amounts to location transparency,
not fragmentation transparency.

Check your progress/ Self assessment questions- 1


Q1. List various transparency features provided by DDBMS.
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q2. Define fragmentation transparency.


___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q3. In case of ________________ transparency, the end user has to specify the database
fragment names, but does not need to specify the location of the fragments.

Q4. _____________ component of DDBMS provides support for Distribution transparency.

11.4 Transaction transparency


Transaction transparency is a DDBMS property that ensures that database transactions will
maintain the distributed database's integrity and consistency. In a DDBMS, a database
transaction may update data stored at multiple sites connected in a network. Transaction
transparency ensures that a transaction is considered complete only when all database sites
that are part of the transaction have completed their part of it.
It is not easy to manage transactions and ensure the consistency and integrity of the
distributed database. A distributed database management system must provide support for
managing remote requests, remote transactions, distributed transactions, and distributed
requests.

11.4.1 Distributed Requests and Distributed Transactions

How do you differentiate between a distributed and a nondistributed transaction? A
distributed transaction is one that can update or request data from several different remote
sites on a network. To begin with distributed transaction concepts, it is important to establish
the difference between remote and distributed transactions, using the BEGIN WORK and
COMMIT WORK transaction format.

Assume that the distributed DBMS supports location transparency, to avoid having to specify
the data location. A remote request lets a single request access data to be processed by a
single remote database processor.

Figure 11.2 Remote Request.

A remote transaction is composed of several requests that access data at a single remote
site.

Figure 11.3 Remote Transaction.

Following are the key features of a remote transaction (illustrated in the sketch below):


The transaction updates tables stored at a single site.
The remote transaction is sent to and executed at a single remote site.
The transaction can reference only a single remote DP.
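
Using the BEGIN WORK and COMMIT WORK format, a remote transaction might look as follows.
This is a sketch only: the table names, columns and values are assumed, and location
transparency lets us omit the site name.

BEGIN WORK;
  -- Both requests reference tables stored at the same single remote site.
  UPDATE product SET prod_qty = prod_qty - 1 WHERE prod_num = '123456';
  INSERT INTO invoice (cust_num, inv_total) VALUES ('34567', 975.75);
COMMIT WORK;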

In sharp contrast, a distributed transaction can reference several different local or remote DP
sites. A single request can reference only one local or remote DP site, but each request in a
transaction can reference different sites.

Figure 11.4 Distributed transaction.

Following are the key features of a distributed transaction (see the sketch after this list):


A distributed transaction can reference multiple remote sites.
Distributed requests are processed by the DPs at multiple remote sites.
One request can access only one remote site at a time.
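
A minimal sketch of a distributed transaction under these rules follows; the table names and
the sites they reside at are assumptions. Note that each individual request still references
only one site.

BEGIN WORK;
  -- Request 1: the product table is assumed to be stored at site A.
  UPDATE product SET prod_qty = prod_qty - 1 WHERE prod_num = '345678';
  -- Request 2: the invoice table is assumed to be stored at site B.
  INSERT INTO invoice (cust_num, inv_total) VALUES ('98765', 120.50);
COMMIT WORK;
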
A problem may arise if a table is divided into fragments that are stored at multiple sites,
as one request can access only one remote site at a time. It is then important for a
distributed database management system to support distributed requests.

A distributed request lets a single request reference data stored at multiple local or remote
DP sites. It also facilitates a transaction accessing several sites. Support for distributed
requests lets you:
Partition a table into several fragments, and store them on multiple sites.
Implement fragmentation transparency, i.e. reference multiple fragments with only one
request.
The location and partitioning of data should be transparent to the end user. A physically
partitioned table can also be referenced using the distributed request feature, as the sketch
below shows.
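
As an illustration, a single statement that joins data residing at different sites requires
distributed request support, because the DDBMS must decompose it into sub-requests against
several DPs. The department table and the placement of the tables are assumptions for this
sketch.

-- One request; the employee fragments and the department table are
-- assumed to reside at different sites, and the DDBMS splits the work.
SELECT e.name, d.budget
  FROM employee e
  JOIN department d ON e.department = d.dept_name
 WHERE e.salary > 50000;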

11.4.2 Distributed Concurrency Control

Processing concurrent transactions in a distributed environment can be a challenging task.
Concurrency control in a distributed database environment needs to ensure that
multi-process operations on multiple sites do not create data inconsistencies or deadlocks.

It is the responsibility of the TP component of a DDBMS to ensure that all parts of the
transaction are completed at all sites before a final COMMIT is issued. If all local DPs
were simply permitted to COMMIT their transaction operations, it could lead to inconsistency
and integrity problems in the database: if some local DPs commit their part of the
transaction and one DP fails to commit its part, the database will be left in an
inconsistent state.
The two-phase commit protocol is a solution to this problem and is discussed in the next section.

Two-Phase Commit Protocol


It is easy to implement concurrency control in a centralized database environment. All
database operations are performed at a single site, and the results of database operations
are immediately known to the DBMS. In a distributed database environment, a transaction
can access and operate on data at several sites. A final COMMIT cannot be issued before it
is known that the operations at all sites have been completed.

In case one DP fails to COMMIT its part of the transaction, the two-phase commit protocol
ensures that the parts of the transaction COMMITTED by other DPs will be rolled back, and
the database will be restored to its before image.
Each DP must maintain its own transaction log. The two-phase commit protocol ensures that
the transaction log entry for each DP is written before the database fragment is actually
updated. The DO-UNDO-REDO and write-ahead protocols are used by the two-phase commit
protocol to maintain database consistency. A local DP uses the DO-UNDO-REDO protocol to
roll back and/or roll forward transactions with the help of the transaction log entries. The
following operations are performed by the DO-UNDO-REDO protocol:
DO performs the desired operation and records the "before" and "after" values in the
transaction log.
UNDO reverses an operation using the log entries.
REDO redoes an operation using the log entries.

A write-ahead protocol is used to ensure that these three operations can survive a system
crash. The write-ahead protocol forces the log entry to be written to permanent storage
before the actual operation takes place.
The two-phase commit protocol defines a coordinator node and one or more subordinate
nodes. Operations are performed between these two types of nodes only. There is one
coordinator node, and it generally is the one that initiates the transaction.
The two-phase commit protocol is implemented as follows:

Preparation Phase
1. The coordinator node sends a PREPARE TO COMMIT message to all subordinate nodes.
2. The subordinate nodes, after receiving the message, perform the write operation on the
transaction log using the write-ahead protocol. Each subordinate node then sends an
acknowledgment of PREPARED TO COMMIT or NOT PREPARED TO COMMIT to the
coordinator node.
3. The coordinator moves forward only if all nodes are ready to commit; otherwise it
broadcasts an ABORT message to all subordinate nodes.

Final COMMIT phase


1. The coordinator node broadcasts a COMMIT message to all subordinate nodes and waits for
their replies.
2. Each subordinate node then updates the database using the DO protocol.
3. The subordinates then reply with a COMMITTED or NOT COMMITTED message to the
coordinator node.
In case any subordinate node replies with a NOT COMMITTED message, the coordinator
sends an ABORT message to all subordinate nodes and forces them to UNDO all changes.
The objective of the two-phase commit is to ensure data consistency.
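
Some relational products expose the two phases directly in SQL. The following minimal
sketch uses PostgreSQL-style PREPARE TRANSACTION syntax; this is an assumption for
illustration, the statement names vary by product, and in a full DDBMS the coordinator
issues these steps automatically rather than the user.

-- Phase 1: a subordinate node does its work and prepares to commit;
-- the log entry is forced to stable storage (write-ahead protocol).
BEGIN;
UPDATE account SET balance = balance - 500 WHERE acct_num = 'A101';
PREPARE TRANSACTION 'txn_42';   -- node is now PREPARED TO COMMIT

-- Phase 2: the coordinator broadcasts the final decision.
COMMIT PREPARED 'txn_42';       -- or: ROLLBACK PREPARED 'txn_42';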

Check your progress/ Self assessment questions- 2


Q5. What is the objective of transaction transparency?
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q6. How data consistency can be ensured in DDBMS?
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________


11.5 CLIENT/SERVER VS. DDBMS


It is not correct to consider a distributed database management system to be the same as a
client/server system. Client/server architecture represents the way computers interact to
form a system. The client/server architecture consists of two types of components: the
client, the consumer of resources, and the server, the provider of resources. The client
sends a request to the server; the request is processed by the server and a reply is sent
back to the client. A server may provide resources such as processing power, storage or
applications to the client. When it comes to implementing a distributed database management
system, the client acts as the TP and the server is the DP. The client (TP) interacts with
the end user and sends a request to the server (DP). The server (DP), after receiving the
requests, prepares a schedule and executes the requests.

Advantages of client/server applications:


1. It is more economical to set up client/server solutions than alternative minicomputer or
mainframe solutions. Client/server solutions allow the client systems to be very light in
terms of processing capabilities.
2. For a client system, using the ever-present Web browser in conjunction with Java and
.NET frameworks provides a familiar end-user interface.
3. End users are more familiar with the working of PCs than with mainframe computers.
4. The Internet, coupled with security advances, tends to provide a more reliable and secure
platform for business transactions.
5. Client/server architecture is well suited for providing data analysis facilities that
interact with many of the DBMSs.
6. A cost advantage is associated with offloading application development from mainframes to
PCs.

Disadvantages of client/server applications:
1. The client/server architecture finds it difficult to manage an environment consisting of
different platforms.
2. Security can be an issue when interaction grows multi-fold with an ever increasing number
of users.
3. The IT industry is finding it difficult to rope in people with a broad knowledge of the
software applications used to manage client/server architectures, to meet the ever growing
demand for data.

11.6 Summary
Distribution transparency allows a user to view a distributed database as a single logical or
centralized database, whereas transaction transparency helps to achieve the atomicity and
integrity features of the database. The highest level of distribution transparency ensures
that the user does not need to mention the names and locations of the fragments in data
access requests. Transaction transparency is a DDBMS property that ensures that database
transactions will maintain the distributed database's integrity and consistency. Concurrency
control in a distributed database environment needs to ensure that multi-process operations
on multiple sites do not create data inconsistencies. To ensure data consistency in a
distributed environment, a final COMMIT cannot be issued before it is known that the
operations at all sites have been completed. In case one DP fails to COMMIT its part of the
transaction, the two-phase commit protocol ensures that the parts of the transaction
COMMITTED by other DPs will be rolled back.

11.7 Glossary
Transparency- The hiding of the complexities of the distributed database from the user's
view.
Fragment- A partition of a database or table is called a fragment.
Distributed data catalog- A description of the complete distributed database is contained in
the distributed data catalog.
Global schema- It is used by all local TPs to translate user requests into remote requests
to be processed by different DPs.
Two-phase commit protocol- A protocol used to maintain database consistency during the
execution of concurrent transactions in a distributed database environment.

11.8 Answers to check your progress/self assessment questions

1. Following are some of the transparency features provided by DDBMS:
a. Distribution transparency
b. Transaction transparency
c. Failure transparency
d. Performance transparency
e. Heterogeneity transparency
2. It is the highest level of transparency. If it is supported by the DDBMS, the user does
not need to mention the names and locations of the fragments in data access requests.
3. location.
4. DDD or DDC.
5. The key objective of transaction transparency is to ensure that database transactions
will maintain the distributed database's integrity and consistency.
6. The two-phase commit protocol is used to ensure data consistency in a DDBMS. A final
COMMIT cannot be issued before it is known that the operations at all sites have been
completed. In case one DP fails to COMMIT its part of the transaction, the parts of the
transaction COMMITTED by other DPs are rolled back.

11.9 References/ Suggested Readings


"1. Introduction to Database Management system by Gillenson, Ponniah, Kriegel, Trukhov,
Taylor, Powell, Miller, WILEY.
2. Fundamentals of Relational Database Management System by S. Sumathi and S.
Esakkirajan, Springer.
3. Database management Systems by R. Panneerselvam, PHI.
4. Database management system Concepts by P. K. Singh, VK Publications.
5. Database systems: Design Implementation and Management by Peter Rob and Carlos
Coronel, Cengage India Pvt. Limited".

11.10 Model questions


1. Explain the working of the two-phase commit protocol.
2. List various advantages of client/server architecture.
3. Explain the various levels of distribution transparency.
4. Explain the various transparency features of a distributed database.
5. Explain location transparency with the help of an example.

Lesson 12 Business Intelligence

Structure
12.0 Objective
12.1 Introduction
12.2 Business intelligence: An introduction and related terms
12.3 Business intelligence architecture
12.4 Data Analysis
12.4.1 Need for data analysis
12.5 Summary
12.6 Glossary
12.7 Answers to check your progress/self assessment questions
12.8 References/ Suggested Readings
12.9 Model questions

12.0 Objective
After studying this lesson, students will be able to:
1. Define the concept of business intelligence.
2. Explain various components of business intelligence architecture.
3. List various types of data analysis tools.
4. Discuss the need of data analysis.

12.1 Introduction
So far you have learned concepts related to the management of operational databases.
Operational databases are not fit for performing data analysis tasks. Operational
databases are optimized for managing transactional data, whereas you need a different type
of data model to perform effective data analysis. Also, the data in operational databases at
various locations is often inconsistent and not integrated. In this lesson you will
learn the concept of business intelligence, which deals with the extraction of data from
various operational and other sources, and the use of various data analysis tools to provide
knowledge to management for effective decision making.

12.2 Business intelligence: An introduction and related terms

Business intelligence facilitates the decision making process. Business intelligence makes
use of the large amounts of data available, and applies various mathematical models and
analysis methodologies to help decision making. A Business Intelligence (BI) system is a
combination of tools used for extracting and analysing business data. It helps in improving
the efficiency of business decisions. BI systems initially gather data from various data
sources, and then aggregate the extracted data along various dimensions. The aggregations
and summaries are then analysed to generate knowledge that supports decision making.
Visualization and reporting tools are used to present such knowledge to the decision makers
for taking effective decisions. BI allows the end users to submit their queries to a single
central integrated repository without having to worry about the location of the actual data
source. This central repository is called a data warehouse, and it contains all the
historical data. Business intelligence applications analyse patterns in sales, trends, and
customer behaviour to assist in the business decision-making process.

Any enterprise needs to take decisions continuously. The primary objective of a business
intelligence system is to provide methodologies that facilitate taking effective and timely
decisions. In-depth analysis of data provides decision makers with more dependable
information and helps in taking effective decisions. It is important that data from external
sources, and its analysis, is provided to decision makers within specified time frames, so
that an enterprise is able to react rapidly to the actions of competitors.

It is important to understand the difference between data, information and knowledge to get
a better sense of what business intelligence applications do.
Data is used to represent the structure of a single primary entity, or the structure of
multiple entities that are related to each other. Customer, supplier and sales are some
examples of primary entities that represent data.
Extraction and processing are performed on data to generate meaningful information. For
example, information related to the car models whose sales increased by more than 10%, or
the percentage of customers using debit/credit cards for making payments. This type of
information is useful from a decision making point of view.
Information is then transformed into knowledge. It is knowledge that is eventually used to
make decisions. For example, you may combine multiple pieces of information to derive
knowledge such as: purchases by a particular group of customers, in a particular area where
a competitor opened a new retail shop, have gone down.

Such knowledge is derived from the information by business intelligence applications using
various mathematical models. Mathematical models may also be used to represent the
knowledge in the form of graphs and pictures. Business intelligence systems provide a
scientific and rational approach to management.

Check your progress/ Self assessment questions- 1


Q1. Define Business Intelligence.
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q2. Business Intelligence (BI) systems is a combination of tools used for ____________ and
___________________ business data.
Q3. ________ is used to represent a structure of single primary entity or structure of multiple
entities that are related to each other.
Q4. Information is transformed into ______________.

12.3 Business intelligence architecture


Business intelligence architecture comprises three major components:
1. Data sources
2. Data warehouses and data marts
3. Business intelligence methodologies

Figure 12.1: Business intelligence architecture.

Source: Business Intelligence: Data Mining and Optimization for Decision Making by Carlo
Vercellis, WILEY.

1. Data sources: The data sources of a multinational enterprise are spread all around the
world. The data stored in individual data sources is independent of the others in regard to
structure and other rules and regulations. Also, these data sources are optimized to store
operational data, which results in a lot of inconsistencies in the data. It is important
that data from these independent data sources is integrated into a single central data
repository before it can be used to perform data analysis. Data sources may include
operational databases, data from documents like Excel sheets, files, DAT files, or even
some external sources. It is beyond the scope of this lesson to discuss all of these in detail.

2. Data warehouse or data mart: Data warehouse systems are probably the most popular
among all the DSSs. A data warehouse may be defined as an integrated collection of data that
supports decision-making processes. Data warehouses are subject-oriented, as they pivot on
enterprise-specific concepts. Data warehouses extract data from a variety of sources using
ETL (extract, transform, load) tools. A data warehouse is an enterprise level data
repository, whereas the scope of a data mart is limited to a department. A data warehouse
records all historical data and lets you analyse past data as well. Data is kept in the
warehouse forever, and regular, periodical updates are made to it from operational data
stores.

3. Business intelligence methodologies: Once the data from all data sources has been
extracted into an integrated central repository, various mathematical and business
intelligence methodologies intended to support decision making are applied to it. Some of
the decision support applications implemented by business intelligence systems are as follows:

a. Multidimensional cube analysis


A multidimensional data cube is often viewed as a lattice of cuboids. Pre-computation of
some of these cuboids is key to faster responses to queries. Pre-computation of
multidimensional aggregates is also known as materializing the cuboids. An aggregate
measure, M, is computed for all cuboids, i.e. for all combinations of dimensions. The least
generalized of all cuboids is the base cuboid, and the most generalized is the apex cuboid,
which aggregates the measure M over all dimensions. You can move downwards from the apex
cuboid using the drill down operation and move upwards from the base cuboid using the roll
up operation. Pre-computing the full cube leads to faster on-line analytical processing.
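
Materializing a cuboid can be sketched in SQL as a pre-computed summary table. The sales
fact table and its region, month and sales_amt columns are assumed names for illustration.

-- Pre-compute the (region, month) cuboid for the measure SUM(sales_amt).
CREATE TABLE cuboid_region_month AS
  SELECT region, month, SUM(sales_amt) AS total_sales
    FROM sales
   GROUP BY region, month;

Queries that roll up to the region level can then read this small summary table instead of
scanning the full fact table.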

b. Data Exploration
The data exploration method is used to search for anomalies, guided by pre-computed
indicators of exceptions at various levels of detail in the cube. If the value in a data
cube cell is considerably dissimilar from the value anticipated based on a statistical
model, it is known as an exception. The anticipated value of a given cell is a function of
the coefficients of the statistical model applied at the higher-level cuboids of the given
cell. These coefficients reflect how different the values at more detailed levels are, based
on impressions formed by looking at higher-level aggregations.

c. Time series analysis


Time series analysis refers to methodologies for analysing time series data in order to extract
meaningful statistics and other characteristics of the data. It is used for forecasting and
predicting future values based on previously observed values.

d. Data mining
Data mining may be defined as a business process, interacting with other business processes,
that explores the massive data growing with every passing day, to discover knowledge or
meaningful patterns/rules that help the business in forming strategies. Data mining is the
extraction of potentially important information from data. Data mining is not a new
concept; people have been analyzing data since the first generation of computers was
invented.

e. Optimization
It refers to determining the best solution out of alternative actions. Following are some of
the optimization techniques:
1. Operations such as sorting and grouping can be applied to dimension attributes to reorder
and cluster related tuples.
2. Higher-level aggregates can be computed from previously computed lower-level
aggregates by continuously aggregating and caching the intermediate results.
3. The Apriori pruning method should be applied, which states that if a cell fails to
satisfy a minimum support threshold, its descendants will also fail to satisfy that threshold.

In the end, it is time to take the decision, i.e. to select the best alternative from among
a number of options.

Check your progress/ Self assessment questions- 2


Q5. List three major components of business intelligence.
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q6. Define data warehouse.


___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q7. Explain time series analysis.


___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

12.4 Data Analysis


Analysis of the data includes simple query and reporting functions, statistical analysis,
data mining, and OLAP for multidimensional analysis. Following are some of the tools
and methods used for data analysis.
Queries and Reports. A query is a structured statement passed to the DBMS, which in
response generates a sub-set of the data. A well-formed query helps to extract the exact
piece of information required. Queries may be repetitive in nature, in which case they may
be saved and reused to generate reports, or they may be ad-hoc in nature, defined
spontaneously. A report is responsible for the presentation of the data retrieved by the
query. Data may be presented in the form of tables, spreadsheets, graphics, or a combination
of these.
Managed Query Environments. This term is used to describe a query and reporting package
that allows automated control over users' access to data, in accordance with each user's
level of expertise and business needs. A managed report is a report design, generation, and
processing environment that permits centralized control of reporting. A managed reporting
environment provides an intelligent report viewer that contains hyperlinks between relevant
parts of a document, or allows embedded OLE objects within the report.
Online Analytical Processing (OLAP). The most popular technology in data analysis is
OLAP. OLAP is an analysis technology that helps decision support tools analyse
multidimensional data. OLAP is used to structure the data hierarchy in such a way that it
reflects the real dimensionality of the enterprise as understood by the users. An OLAP cube
is a data structure that allows fast analysis of data; it holds data like a 3D spreadsheet
rather than a relational database. Multidimensional OLAP (MOLAP) databases are used to
create and physically store cubes, whereas relational OLAP (ROLAP) databases are used to
create virtual cubes. Data cube operations allow aggregation of data along different
dimensions. The roll-up operation can be performed either through dimension reduction or by
moving up a level in the hierarchy of a dimension. Drill down is used to present data at a
lower level of a dimension hierarchy by moving down the concept hierarchy levels. Slice is
used to form a sub-cube by selecting a single value on one dimension of a given cube. The
dice operation is performed by defining range select conditions on one or more dimensions.
Pivot means to re-orient the cube for an alternative presentation of the data; it can also
be used to transform a 3D view into a series of 2D planes. The sketch below shows how these
operations map onto SQL in a relational (ROLAP) setting.
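
Assuming a ROLAP back end with a sales fact table having region, city and month columns
(hypothetical names), the cube operations correspond to aggregate queries such as:

-- Roll-up: climb the location hierarchy from city to region.
SELECT region, SUM(sales_amt) AS total FROM sales GROUP BY region;

-- Drill-down: descend to the city level within each region.
SELECT region, city, SUM(sales_amt) AS total FROM sales GROUP BY region, city;

-- Slice: fix one dimension to a single value.
SELECT city, SUM(sales_amt) AS total FROM sales
 WHERE month = '2015-03' GROUP BY city;

-- Dice: range conditions on two or more dimensions.
SELECT region, month, SUM(sales_amt) AS total FROM sales
 WHERE region IN ('Punjab', 'Haryana')
   AND month BETWEEN '2015-01' AND '2015-06'
 GROUP BY region, month;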

12.4.1 Need for data analysis


Plenty of data is available with any business house; what is needed is thorough analysis of
it. Data analysis can help you get the information you are hungry for. Managers are
interested in key aggregations and summaries, not the entire data. Here are some of the
key reasons that force every enterprise to perform data analysis.
1. Revealing hidden facts
Data extracted from various data sources using ETL tools may be consistent and integrated,
but it is still raw data. Facts of interest from the end user's point of view are usually
hidden, and unless proper analysis is performed, they can go unrevealed. It is not possible
for the management to go through all the data, so it depends heavily on pre-computed
summaries and aggregations.
2. Effective decision making
Data analysis is able to pick out interesting facts from the data that are more reliable.
Decision makers can make effective decisions based on this more accurate and reliable data.
Data analysis helps in making important business decisions and improving business
processes.

3. Timely decision making
Quick and correct access to useful information is key to taking better decisions. Data analysis
is able to perform analysis not only on the current data, but also on the historical data. Data
analysis is a continuous process and it keeps feeding the decision takers with periodical
reports.
4. Understanding the criteria and mechanism
Data analysis methodologies explicitly describe both the criteria for evaluating alternative
choices and the mechanisms regulating the problem under investigation. It also helps to raise
the awareness of the underlying logic of the decision making process.
5. Data visualization
Most data analysis tools provide their reports in graphical form. Some of the most
popular data visualization tools are boxplots, dendrograms, histograms, quantile plots,
scatter plots, loess (regression) curves, tree maps, hyperbolic trees, etc. Depending on the
type of data available, a relevant visualization tool can be used to represent the analysis.
For example, a histogram is a graphical method for summarizing the distribution of a given
attribute.
6. Policy formulation
Data analysis is key to finding the latest market trends, industry research, sales
promotion, etc. The retail industry is able to understand the latest buying trends and
identify frequent item sets. Frequent item sets are the items that are most often bought
together. Understanding the customer is important for any business; it helps to retain the
current customer base and also to add prospective customers.

Check your progress/ Self assessment questions- 3


Q8. _____________ is a structured statement passed to the DBMS, which in response
generates a sub-set of data.
Q9. Define OLAP.
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q10. List at least 3 reasons to perform data analysis.


___________________________________________________________________________
___________________________________________________________________________

___________________________________________________________________________

12.5 Summary
Business intelligence makes use of the large amounts of data available, and applies various
mathematical models and analysis methodologies to help decision making. A Business
Intelligence (BI) system is a combination of tools used for extracting and analysing
business data. Data is used to represent the structure of a single primary entity or the
structure of multiple related entities. Extraction and processing are performed on data to
generate meaningful information, and information is then transformed into knowledge.
Business intelligence architecture comprises three major components: data sources, data
warehouses and data marts, and business intelligence methodologies. Data sources may include
operational databases, data from documents like Excel sheets, files, DAT files, or even
some external sources. A data warehouse may be defined as an integrated collection of data
that supports decision-making processes. Data warehouses are subject-oriented, as they pivot
on enterprise-specific concepts. Data warehouses extract data from a variety of sources
using ETL (extract, transform, load) tools. Mathematical and business intelligence
methodologies are applied to the data in the data warehouse to support decision making. A
query is a structured statement passed to the DBMS, which in response generates a sub-set of
the data. Some products build dictionaries of queries that allow you to query the system
using a drag-and-drop query-building interface. A report is responsible for the presentation
of the data retrieved by the query. Plenty of data is available with any business house;
what is needed is thorough analysis of it. Managers are interested in key aggregations and
summaries, not the entire data.

12.6 Glossary
Data Warehouse- A data warehouse may be defined as an integrated collection of data that
supports decision-making processes.
Data Mart- A data mart is a department level data warehouse.
OLAP- OLAP is used to structure the data hierarchy in such a way that it reflects the real
dimensionality of the enterprise as understood by the users.
Operational data sources- Operational data sources are used to store and manage
transactional data.
ETL- It refers to the extraction, transformation and loading tools used by BI applications
to bring data from operational sources into the data warehouse.
Managed Query Environments- The term describes a query and reporting package that
allows automated control over users' access to data.

12.7 Answers to check your progress/self assessment questions


1. Business intelligence makes use of the large amounts of data available, and applies
various mathematical models and analysis methodologies to help decision making.
2. extracting; analysing.
3. Data.
4. knowledge.
5. Business intelligence architecture comprises the following three major components:
a. Data sources
b. Data warehouses and data marts
c. Business intelligence methodologies.
6. Data warehouse may be defined as integrated collection of data that supports decision-
making processes. Data warehouses are subject-oriented as they pivot on enterprise-specific
concepts. Data warehouses extract data from variety of sources using the ETL ( extract,
transform, load ) tools.
7. Time series analysis refers to methodologies for analysing time series data in order to
extract meaningful statistics and other characteristics of the data. It is used for forecasting and
predicting future values based on previously observed values.
8. Query.
9. OLAP is an analysis technology that helps decision support tools analyse multidimensional
data. OLAP is used to structure the data hierarchy in such a way that it reflects the real
dimensionality of the enterprise as understood by the users.
10. Three important reasons to perform data analysis are as follows:
a. Revealing hidden facts
b. Effective and timely decision making
c. Understanding the criteria and mechanism of decision making.

12.8 References/ Suggested Readings


"1. Introduction to Database Management system by Gillenson, Ponniah, Kriegel, Trukhov,
Taylor, Powell, Miller, WILEY.
2. Fundamentals of Relational Database Management System by S. Sumathi and S.
Esakkirajan, Springer.

160 | P a g e
3. Database 1management Systems by R. Panneerselvam, PHI.
4. Database 1management system Concepts by P. K. Singh, VK Publications."

12.9 Model questions


1. What is the need for data analysis?
2. Explain the three components of business intelligence architecture in detail.
3. Define data, information and knowledge.
4. Define business intelligence.
5. Explain various tools and methods used for data analysis.

Lesson 13 Decision support data
Structure
13.0 Objective
13.1 Introduction
13.2 Operational Data vs. Decision Support Data
13.3 Decision support database requirements
13.3.1 Database Schema
13.3.2 Data Extraction and Filtering
13.3.3 End-User Analytical Interface
13.3.4 Database Size
13.4 Decision support Database properties
13.5 Summary
13.6 Glossary
13.7 Answers to check your progress/self assessment questions
13.8 References/ Suggested Readings
13.9 Model questions

13.0 Objective
After studying this lesson, students will be able to:
1. Define the concept of decision support data.
2. Differentiate between operational and decision support data.
3. List various requirements of decision support data.
4. Explain the key properties of decision support databases.

13.1 Introduction
The effectiveness of a business intelligence system depends on the quality of data gathered
at the operational level. But operational data cannot be directly used for decision making.
The next section describes some of the key differences between the original operational data
and the decision support data needed for strategic decision making.

13.2 Operational Data vs. Decision Support Data


The objectives and usage of operational and decision support data are different, so it is
natural that the structure and format of the two will also be different. Highly normalized
relational data models are used to store operational data, optimized for handling
transactional data. For example, every call made, item sold, or booking made, each and every
transaction, should be accounted for. Transactional data is updated frequently. In order to
optimize transactional processing, relational data models store data in decomposed tables
with fewer attributes, to avoid storing redundant information. A simple transaction related
to the sale of an item might be recorded in multiple tables such as invoice, invoice line,
store, department and discount. This is well suited for transactional data, but it is very
expensive from the data retrieval point of view, as you need to apply join operations on a
number of tables to fetch the sales information. Operational data are useful for capturing
day to day business transactions, whereas decision support data are used to make tactical
and strategic business decisions based on intelligent retrieval of operational data.

Difference between operational and decision support data from the data analyst’s point
of view:
Time span- Operational data cover a short time frame; in other words, they keep a record of
only the current transactional data. Decision support data, by contrast, tend to cover a
longer time frame, i.e. they also store all the historical data that may be useful for
carrying out analysis. Strategic and tactical decisions cannot be based on current
transactional data alone; they are based on data collected over a length of time.

Granularity- It refers to the level of aggregation. Decision support data must be presented
at different levels of aggregation, from highly summarized to near-atomic. For example, if
you want to analyse sales by region, then the data must be available for each region; for
the states within each region; for the cities within those states; and for the stores within
those cities. The greater the level of detail, the higher the granularity and the better the
decision making. Granularity enables a manager to drill down into more atomic components,
i.e. fine-grained data at lower levels of aggregation. Managers may also roll up to
aggregate the data to a higher level.

Dimensionality- The focus of operational data is on representing individual transactions,
rather than representing the effects of transactions over time. In contrast, data analysts
tend to include many data dimensions and are interested in how the data relate across those
dimensions. For example, an analyst may be interested in the sales performance of an item A
in a specified region, as compared to the sales of the same product in the same month last
year. This analysis is based on two dimensions: place (represented by region) and time
(represented by the current year and the last year).
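
Such a two-dimensional comparison might be expressed against decision support data as
follows; the sales_summary table and its columns are assumed names for illustration.

-- Sales of item 'A' by region: March of the current year vs March last year.
SELECT region,
       SUM(CASE WHEN sale_year = 2015 THEN sales_amt ELSE 0 END) AS sales_2015,
       SUM(CASE WHEN sale_year = 2014 THEN sales_amt ELSE 0 END) AS sales_2014
  FROM sales_summary
 WHERE item = 'A' AND sale_month = 3
 GROUP BY region;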

Difference between operational and decision support data from the designer’s point of
view: Operational data store the transactions happening in real time. Decision
support data are a snapshot of the operational data at a given point in time. Hence, decision
support data represent time slices of historic operational data.

Operational and decision support data also differ in terms of transaction type and
transaction frequency. Operational data are concerned with update transactions, whereas
decision support data are concerned with data retrieval or query transactions. The frequency
of transactions against operational data is very high compared to decision support data, as
decision support data only involves periodic updates to load new data summarized from the
operational data.

Operational data is highly normalized, and the data about a single transaction is stored in
multiple tables. On the contrary, decision support data is stored in fewer tables that hold
data derived from the operational data. Decision support data do not include the details of
each operational transaction. Instead, they store data that are integrated, aggregated, and
summarized for decision support purposes. This helps to save time and provides answers to
queries in less time.

The degree of summarization is much higher in decision support databases than in
operational databases. Therefore, you will see a great deal of derived data in decision
support databases. For example, an operational database is concerned with storing each and
every transaction that happens throughout the day, whereas decision support data is
concerned with storing only summaries such as the number of units of each product sold, the
total volume of sales in terms of units and price, etc. Decision support data is collected
to monitor aggregates, such as total sales for a product or for a single store. Summaries
are useful for evaluating sales trends, product sales comparisons, and so on, to serve
decision needs.

The data models for the two types of databases are also different. Frequent data updates
would expose an operational database to various update anomalies, so operational systems
generally require normalized structures having a large number of entities, with fewer
attributes per entity. The objective is to remove data redundancy and save time when
recording a transaction.

A decision support database focuses on querying capability. Decision support databases
require de-normalized structures with fewer tables, and a large number of attributes per
table.
Query activity in the operational database tends to be low; queries against operational data
typically are narrow in scope, low in complexity, and speed-critical. In contrast, the sole
purpose of creating decision support data is to serve query requirements. Queries against
decision support data typically are broad in scope, high in complexity, and less
speed-critical.
Decision support data consist of very large amounts of data, resulting from two factors:
1. The data model used to store the data is de-normalized and generally contains a large
amount of data redundancy.
2. The same data is categorized in many different ways to represent different snapshots. For
example, summaries of sales data may be stored along different attributes such as product,
region, and time period.

Check your progress/ Self assessment questions- 1

Q1. Which data model is used to maintain operational data?


___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q2. Why is the transactional database decomposed into smaller relations?


___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q3. Why are smaller relations not suited for the data retrieval task?
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q4. Differentiate between operational and decision support data based on Time span.
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

13.3 Decision support database requirements
A decision support database is optimized to support fast answers to complex queries.
Following are the key requirements of a decision support database:

13.3.1 Database Schema


A de-normalized schema is used to represent the decision support database. A decision
support database consists of aggregations and summarizations over various dimensions, and
queries must be able to extract multidimensional time slices. For an RDBMS, these conditions
suggest using non-normalized and even duplicated data. A simple operational data store may
serve as a decision support database for a single store, but when it comes to storing
transactional data for multiple stores, it is no longer suitable for decision support. A
decision support database becomes a factor when dealing with more than one store, each of
which has more than one department. The database must contain data for all of the stores and
all of their departments, and it must be able to support multidimensional queries that track
sales by store, by department, and over time (a de-normalized sketch of such a table appears
at the end of this subsection).

You also need to fine-tune the decision support database to support fast retrieval of data,
i.e. faster processing of queries. Query response can be optimized using the indexing and
data partitioning concepts of the DBMS. The component of the DBMS that deals with
fine-tuning the query response time is called the query optimizer, and it must be enhanced
to support the non-normalized structures in decision support databases.
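
As an illustration of the schema requirement above, a decision support table for the
multi-store scenario might be sketched as a single wide, de-normalized table keyed by the
dimensions. All names here are hypothetical.

-- Store and department names are duplicated on every row by design.
CREATE TABLE sales_dss (
  sale_date   DATE,
  store_name  VARCHAR(40),
  dept_name   VARCHAR(40),
  units_sold  INTEGER,
  sales_amt   DECIMAL(12,2)
);

-- One query, with no joins, tracks sales by store, department and time.
SELECT store_name, dept_name, sale_date, SUM(sales_amt) AS total
  FROM sales_dss
 GROUP BY store_name, dept_name, sale_date;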

13.3.2 Data Extraction and Filtering


The sources of a decision support database are operational data stores, flat files, and
other external sources. These data sources are often inconsistent with one another and are
based on different structures and formats that need to be integrated. Advanced data
extraction and data filtering tools are used to bring the data from the different data
sources into a consistent and integrated form. Operational databases are optimized to handle
a high frequency of updates, so the burden of data extraction on operational databases
should be minimized by supporting batch and scheduled data extraction. Data extraction
should support all formats and data models. Data filtering is concerned with checking the
consistency of the data and with validation of the database snapshot.

Data from external sources also requires you to solve issues like data-formatting conflicts.
For example, external sources may use different formats for fields such as
social_security_no and dates. The measurement scales may differ, as may the currency used to
represent sales figures, and attributes representing the same information may have different
names. All such issues must be handled during extraction. Decision support data cannot
tolerate such inconsistencies, and the data must be filtered to ensure that only pertinent,
consistent decision support data are stored in the database.
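
A small sketch of such an extraction step follows, converting a text date format and
filtering out invalid rows while loading. The staging table, column names and formats are
assumptions; TO_DATE is used as in Oracle or PostgreSQL.

-- Load only valid rows, normalizing the date format during extraction.
INSERT INTO dss_sales (sale_date, store_id, sales_amt)
SELECT TO_DATE(txn_date, 'DD-MM-YYYY'),  -- source stores dates as text
       store_id,
       sales_amt
  FROM staging_sales
 WHERE sales_amt >= 0                    -- filter: reject impossible values
   AND store_id IS NOT NULL;             -- filter: require a valid store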

13.3.3 End-User Analytical Interface


Presenting data to the end user for analysis is the ultimate goal of a decision support
system. The decision support DBMS must support advanced data modelling and data presentation
tools. Those tools make it easy for data analysts to define the nature and extent of
business problems. The decision support system must generate queries to retrieve, from the
decision support database, the data appropriate to the problem defined. Query results may
also be evaluated with the data analysis tools supported by the decision support DBMS.
Queries must be optimized for speedy processing. The end-user analytical interface is one of
the most critical DBMS components, and a number of visualization tools are available for
effective and easy-to-analyse representation of data. A user friendly analytical interface
facilitates easy navigation through the data for speedy decision making.

13.3.4 Database Size


Decision support databases are supposed to store all the historical data, along with all
sorts of aggregations and summaries, so they tend to be very large. Decision support
databases are optimized for fast and effective data retrieval. They are based on
de-normalized structures that contain a lot of duplicate information, i.e. data redundancy.
A DBMS for decision support must be capable of supporting very large databases (VLDBs).
Special and advanced hardware support is needed to manage very large databases, such as
multiple disk arrays and multiple-processor technologies like symmetric multiprocessor
(SMP) or massively parallel processor (MPP) systems. The complex information requirements
and the demand for sophisticated data analysis ignited the need for a new type of data
repository that facilitates data extraction, data analysis, and decision making, called the
data warehouse.

Check your progress/ Self assessment questions- 2

Q5. _______________ schema is used to represent the decision support database.
Q6. How can you optimize query response?
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q7. Which component of DBMS is responsible for optimizing query response?


___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q8. What do you mean by filtering?


___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

13.4 Decision support Database properties


Data warehouse systems are the most popular among all the DSSs. A data warehouse may be
defined as a collection of data that supports decision-making processes. Following are the
key features of the data in a data warehouse:
a. Integrated
b. Subject-oriented
c. Time-variant
d. Non-volatile

Integrated- A data warehouse is a centralized, consolidated database that integrates data
derived from multiple sources with diverse formats. A DSS DBMS should perform data
integration before loading the data. Data integration means that all business entities, data
elements, data characteristics, and business metrics are described in the same format. You
may believe that data across an enterprise must already be stored in the same format, but
the reality is far from this belief. For instance, the grades of students may be represented
using text labels such as "O", "A+", "A", "B+" and "B" in one department and as "1", "2",
"3", "4" and "5" in another. To avoid the potential confusion, the data in the DSS database
must conform to a common format, as the sketch below illustrates.
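
The grade example can be handled with a simple mapping applied while the data is extracted.
The table and column names here are assumptions for illustration.

-- Convert one department's numeric grades to the common letter format.
SELECT student_id,
       CASE grade WHEN '1' THEN 'O'
                  WHEN '2' THEN 'A+'
                  WHEN '3' THEN 'A'
                  WHEN '4' THEN 'B+'
                  WHEN '5' THEN 'B'
       END AS grade
  FROM dept_b_grades;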

Subject-oriented- Operational databases are organized around enterprise-specific
applications like orders, invoices, and discounts. Data warehouses are subject-oriented and
are organized around enterprise-specific concepts. DSS data are arranged and optimized to
provide answers to questions coming from diverse functional areas within a company. DSS data
are organized and summarized by topics such as sales, marketing, distribution, and
transportation, and each topic is specified using subjects such as products, customers,
departments, regions, promotions, and so on.

Time-variant- Data warehouse systems do add some degree of new information, but are
predominantly used for rearranging existing information. Operational data cover
transactions involving only the latest data, i.e. a very short period of time. A data
warehouse records all historical data and lets you analyse past data as well. DSS data
represent the flow of data through time. The data warehouse can even contain projected data
generated through statistical and other models.

Non-volatile- Data is kept in the data warehouse forever and is never removed from it.
Regular, periodical updates are made to it from operational data stores. DSS data keep on
growing: historical data is not removed, and new data, and even projected data, is added
continuously.

Check your progress/ Self assessment questions- 3

Q9. List key features of data in data warehouse.


___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q10. Why does the data in a decision support database keep on increasing?


___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________


13.5 Summary
Highly normalized relational data models are used to store operational data, optimized for
handling transactional data. Operational data cover a short time frame, whereas decision
support data tend to cover a longer time frame. Decision support data must be presented at
different levels of aggregation, from highly summarized to near-atomic. Operational data are
concerned with update transactions, whereas decision support data are concerned with data
retrieval or query transactions. Query activity in the operational database tends to be low,
whereas the sole purpose of creating decision support data is to serve query requirements.
Query response can be optimized using the indexing and data partitioning concepts of the
DBMS. A DBMS for decision support must be capable of supporting very large databases
(VLDBs). Special and advanced hardware support is needed to manage very large databases,
such as multiple disk arrays and multiple-processor technologies like symmetric
multiprocessor (SMP) or massively parallel processor (MPP) systems. The decision support
DBMS supports advanced data modelling and data presentation tools. A data warehouse is
integrated, subject-oriented, time-variant and non-volatile. Data warehouses are
subject-oriented and organized around enterprise-specific concepts. The data warehouse can
even contain projected data generated through statistical and other models.

13.6 Glossary
Normalization- Process of decomposing large relations into smaller relations to reduce data
redundancy.
De-Normalization- Process of adding back small pieces of redundancy into relations, to
improve the performance of data retrieval.
Granularity- It refers to the level of aggregation or detail.

Data filtering- It is concerned with checking the consistency of the database and also the
validation of the database snapshot.
Data warehouse- A data repository that facilitates data extraction, data analysis, and decision
making.

13.7 Answers to check your progress/self assessment questions


1. Highly normalized relational data models are used to store the operational data, optimized
for handling the transactional data.

2. The transactional database is decomposed into smaller relations with a lesser number of
attributes to avoid storing redundant information.

3. Smaller relations are not suited for data retrieval tasks, as they require join operations on
a number of tables to fetch the query data.

4. Operational data is used to store only the current transactional data, whereas decision
support data tends to cover a longer time frame, as it stores all the historical data that may be
useful for carrying out analysis.
5. De-Normalized.
6. Query response can be optimized using indexing and data partitioning concepts of DBMS.
7. The component of DBMS that deals with fine tuning the query response time is called
query optimizer.
8. Data filtering is concerned with checking the consistency of the database and also the
validation of the database snapshot.
9. Following are the key features of data in data warehouse:
a. Integrated
b. Subject-oriented
c. Time-variant
d. Non-volatile.
10. Data is kept in the DSS database forever and is never removed from it; new data and even
projected data are added to it continuously on a periodical basis.

13.8 References/ Suggested Readings


1. Introduction to Database Management System by Gillenson, Ponniah, Kriegel, Trukhov,
Taylor, Powell, Miller, Wiley.
2. Fundamentals of Relational Database Management System by S. Sumathi and S.
Esakkirajan, Springer.
3. Database Management Systems by R. Panneerselvam, PHI.
4. Database Management System Concepts by P. K. Singh, VK Publications.
5. Database Systems: Design, Implementation and Management by Peter Rob and Carlos
Coronel, Cengage India Pvt. Ltd.

13.9 Model questions


1. Define normalization and de-normalization.
2. Differentiate between operational data and decision support data from the data analyst's
point of view.
3. Define granularity.
4. Explain the concept of data filtering.
5. Explain various features of decision support data.
6. Explain the process of query optimization.

Lesson 14 Online Analytical Processing
Structure
14.0 Objective
14.1 Introduction
14.2 OLAP (On-Line Analytical Processing)
14.3 OLAP cube
14.4 Difference between OLTP and OLAP
14.5 OLAP Operations
14.6 OLAP Three-tier architecture
14.7 Multidimensional data model
14.8 Star schema
14.8.1 Dimension Table
14.8.2 Fact Table
14.9 Database security
14.10 Summary
14.11 Glossary
14.12 Answers to check your progress/self assessment questions
14.13 References/ Suggested Readings
14.14 Model questions

14.0 Objective
After studying this lesson, students will be able to:
1. Define the concept of Online Analytical Processing.
2. Differentiate between OLAP and OLTP.
3. Describe various OLAP operations.
4. Explain the 3-tier OLAP architecture.
5. Explain the star schema.
6. List various issues in OLAP security.

14.1 Introduction

The objective of data analysis is entirely different from that of managing operational data.
Hence the data model for data analysis is also different from the one used to manage
operational data. The OLAP cube is used to manage OLAP data and is optimized to provide
quick responses to ad-hoc queries. A number of operations can be performed on OLAP cubes to
generate summaries along different dimensions, which helps provide responses to queries in
much less time. The star schema is used to describe the multidimensional model used for
maintaining data in a warehouse. It is highly denormalized and is based on a central fact table.
Security issues for OLAP data are also different from those for operational data. Schemes like
data sanitization and access control have performed well for operational databases, but cannot
be implemented as such for OLAP systems, as the data model used is different. This lesson also
focuses on OLAP security.

14.2 OLAP (On-Line Analytical Processing)

OLAP is an analysis technology that helps decision support tools analyse multidimensional
data. OLAP is used to structure the data hierarchy in such a way that it reflects the real
dimensionality of the enterprise as understood by the users. OLAP provides a variety of views
to end users and managers that help them gain insight into the database. OLAP systems are
optimized to provide prompt responses to end users.
OLAP functionality includes:
a. Slicing to reduce the dimensions of data.
b. Drill-down to deeper levels of consolidation.
c. Getting down to the underlying detailed data.
d. Rotation to create new dimensional view so that dimensional comparisons can be
done.
e. Trend analysis for sequential time periods.

14.3 OLAP cube


An OLAP cube is a data structure that allows fast analysis of data. The OLAP cube holds data
like a 3D spreadsheet rather than a relational database. Relational databases are not
recommended for near-immediate analysis and are also not suited for the display of large
amounts of data. Instead, relational databases are better suited for recording a series of
transactions, known as OLTP or On-Line Transaction Processing. OLAP overcomes these
shortcomings of the relational data model and provides for viewing data from different
dimensions. Multidimensional OLAP (MOLAP) databases are used to create and physically
store cubes, whereas relational OLAP (ROLAP) databases are used to create virtual cubes.

Figure 14.1 OLAP cube

Dimensions in an OLAP cube are used to categorize the numeric facts; these numeric facts are
also called measures. The cube metadata is created using a star or snowflake schema of tables
in a relational database. Fact tables are used to derive the measures, and dimension tables are
used to derive the dimensions. The star and snowflake schemas are based on denormalization.

14.4 Difference between OLTP and OLAP

1. Data: OLTP consists of operational or transactional data sources; OLTP systems are the
original source of data for any kind of data processing. OLAP consists of consolidated data;
OLTP databases are the source for OLAP data.
2. Purpose: OLTP records and manages primary business operations. OLAP data provides for
planning and decision support.
3. View of data: OLTP consists of relational data. OLAP consists of multi-dimensional views.
4. Updates: In OLTP, the frequency of updates is very high and operations are frequent and
short. OLAP involves less frequent, periodic, long-running batch jobs for refreshing the data.
5. Queries: In OLTP, pre-defined or predictable operations are performed. OLAP handles
complex, ad-hoc and random queries involving aggregations that return large numbers of
records.
6. Response time: OLTP response time is extremely short. OLAP response time to ad-hoc
queries may take minutes, hours or days.
7. Space: OLTP space requirements are relatively small, as only current data is stored and
historical data is archived. OLAP space needs are massive in order to store all data history,
aggregations, etc.
8. Data model: OLTP uses a highly normalized relational data model. OLAP uses only a few
de-normalized dimension tables along with a single fact table.
9. Backup: In OLTP, backup is done on a continuous basis. For OLAP, periodic backups are
taken, or alternatively the data is simply reloaded from the OLTP source as a recovery method.

Check your progress/ Self assessment questions- 1

Q1. Define OLAP?


___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q2. OLAP are optimized to provide prompt responses to end users. ( TRUE / FALSE )
___________________________________________________________________________

Q3. OLAP cube holds data like a relational database. ( TRUE / FALSE )
___________________________________________________________________________

Q4. Relational OLAP (ROLAP) databases are used to create_______________ .


Q5. Differentiate between OLTP and OLAP based on response time and relational data
model used.
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

14.5 OLAP Operations


Multi-dimensional analysis makes it easy for the analyst to understand the knowledge stored
in databases. Multi-dimensional viewing makes it easy for the analyst to navigate through the
database: the analyst can easily change the data's orientation and define analytical calculations.
Common operations on OLAP cubes are as follows:
1. Roll-up
A roll-up operation can be performed using either dimension reduction or moving up a level in
the hierarchy of a dimension. Dimension reduction refers to reducing the cube by one or more
dimensions. A roll-up operation defines a formula to compute the data relationships for one or
more dimensions. It takes the current aggregation level of the fact values and applies further
aggregation on the dimensions.
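
As a rough sketch in SQL, a roll-up corresponds to grouping at a coarser level; GROUP BY
ROLLUP is supported by many SQL dialects, and the sales table with city, quarter and amount
columns below is hypothetical:

    -- Aggregates at (city, quarter), then per city, then a grand total.
    SELECT city, quarter, SUM(amount) AS total_sales
    FROM sales
    GROUP BY ROLLUP (city, quarter);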

2. Drill-down
It is the opposite of roll-up. It is used to present data at a lower level of a dimension hierarchy
by moving down the concept hierarchy levels. It increases the number of dimensions or the
level of detail.
3. Slice
It is used to form a sub-cube by selecting a single dimension of a given cube for which the
sub-cube is to be formed. It is similar to dimension reduction and results in decreasing the
dimensions of the cube.

4. Dice
It is different from slice. Rather than selecting a single dimension, the dice operation is
performed by defining a range-select condition on one or more dimensions. It reduces the
number of member values of the dimensions.
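
In SQL terms, slice and dice correspond to filtering on dimension values before aggregation.
Continuing with the hypothetical sales table from the roll-up sketch:

    -- Slice: fix one dimension to a single value.
    SELECT city, SUM(amount) AS total_sales
    FROM sales
    WHERE quarter = 'Q1'
    GROUP BY city;

    -- Dice: range or set conditions on one or more dimensions.
    SELECT city, quarter, SUM(amount) AS total_sales
    FROM sales
    WHERE quarter IN ('Q1', 'Q2') AND city IN ('Delhi', 'Mumbai')
    GROUP BY city, quarter;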

5. Pivot (rotate)
It means re-orienting the cube for an alternative presentation of the data. It can also be viewed
as the transformation of a 3D view into a series of 2D planes.
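
One way to approximate a pivot in plain SQL is conditional aggregation, which turns the
members of one dimension into columns (again using the hypothetical sales table):

    -- Quarters become columns; cities remain rows.
    SELECT city,
           SUM(CASE WHEN quarter = 'Q1' THEN amount ELSE 0 END) AS q1_sales,
           SUM(CASE WHEN quarter = 'Q2' THEN amount ELSE 0 END) AS q2_sales
    FROM sales
    GROUP BY city;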

14.6 OLAP Three-tier architecture


Before we discuss the three-tier architecture, let me give you a little overview of the problems
faced with the single-tier and two-tier architectures.
The primary focus of the single-tier architecture is to minimize the amount of data stored,
which it does by removing data redundancies. The only layer physically available is the source
layer; the actual data warehouse is virtual or non-existent. This architecture fails to separate
analytical and transactional processing: end-user queries are submitted to the operational
database via the middleware, which affects regular transactional workloads. The single-tier
architecture comes with two layers, namely the source layer and the analysis layer.
The two-tier architecture overcomes the separation problem faced with the single-tier
architecture by providing a physical data warehouse. It introduces two more layers, namely
the data staging layer and the data warehouse layer. The source data is first brought into a
staging area where it is extracted, cleansed to remove inconsistencies, and integrated into one
common schema using advanced ETL tools. The information is then stored in one logically
centralized repository called the data warehouse, and this central repository interacts with
end users through the analysis layer. Still, the two-tier architecture did not support a
multidimensional server to speed up query processing.
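
As a minimal sketch of the load step, assuming hypothetical staging_sales and warehouse
sales_fact tables (real ETL tools perform far richer extraction, cleansing and transformation
than this):

    -- Move cleansed rows from the staging area into the warehouse.
    INSERT INTO sales_fact (product_key, store_key, quantity_sold, sales_amount)
    SELECT product_key, store_key, quantity_sold, sales_amount
    FROM staging_sales
    WHERE sales_amount IS NOT NULL;   -- a simple consistency filter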

Nowadays, large data warehouses adopt a three-tier architecture with the following structure,
which overcomes the problems faced with both the single-tier and two-tier architectures.

Figure 14.2: Three-tier architecture

1. The bottom tier of the 3-tier architecture is usually a relational database system. Data from
operational databases and external sources is fed into the bottom tier using back-end tools.
These tools perform data extraction, cleaning, transformation, loading and update functions.
Gateway examples include ODBC, OLE DB, and JDBC. Data is stored in a central repository
called the data warehouse, which is then subdivided into small sets of data marts. Metadata is
also maintained for this tier.

2. The middle tier of the 3-tier architecture is either an extended relational model called
ROLAP or a multidimensional model that implements operations on multidimensional data
directly.

3. The top tier of the 3-tier architecture is a front-end client layer. The front-end client layer
comprises reporting and analysis tools, for example for classification, prediction, etc.

Check your progress/ Self assessment questions- 2

Q6. Differentiate between roll-up and drill-down operations on OLAP.


___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________

Q7. _________ operation is performed by defining range select condition on single or more
dimensions. It reduces the number of member values of dimensions.

Q8. What is the role of middle-tier in 3-tier OLAP architecture?

___________________________________________________________________________

___________________________________________________________________________

___________________________________________________________________________

14.7 Multidimensional data model

Data analysis is best performed using a multidimensional data model, which is based on
dimension relations. Data using the multidimensional model can be viewed as data cubes. You
can view data from multiple dimensions using a data cube, and it is defined by dimensions and
facts.

Figure 14.3: Multidimensional cube

A single dimension table is associated with each dimension in the data cube. Dimension tables
can be automatically generated or specified by modeling experts. There is a central fact table
in the multidimensional data model connected to each dimension table. The fact table is
subject-oriented, covering subjects like sales, orders, etc. You already know about the OLAP
operation called pivoting; it can be used to shift from one dimensional hierarchy to another,
allowing you to view the data from multiple dimensions.
A multi-dimensional model is extensible. Users can perform additional calculations on the
available dimensions much faster and more easily using data cubes. The multi-dimensional
schemas, star schema and snowflake schema, are based on a centralized fact table and
surrounding dimensions. A multidimensional database is optimized for the data warehouse.
Analysts are able to identify the interesting measures, dimensions and attributes that make
data meaningful, and how these dimensions need to be organized into levels and hierarchies.

14.8 Star schema


It consists of a central fact table surrounded by dimension tables (one for each dimension).
Dimension tables in a star schema are denormalized. Denormalization refers to adding back a
small amount of redundancy to the database to improve the performance of the data retrieval
process. The star schema is a relational database schema for representing multidimensional
data. It consists of a single fact table; all dimension tables are connected directly to it and no
two dimension tables are connected to each other directly. It forms the shape of a star. The
star schema may be visualized as a data "cube" or "hypercube" where each dimension table
represents a different spatial dimension.

Figure 14.4: Star schema.

14.8.1 Dimension Table


Dimension tables in the dimensional model are used to represent the business dimensions.
Some of the characteristics of a dimension table are as follows:
1. Primary key: The primary key in a dimension table is used to connect with the
corresponding foreign key in the fact table.
2. Large set of attributes: A dimension table is wide, i.e. it consists of a large set of attributes.
You have the option of keeping all the attributes in one dimension table or normalizing it to
produce smaller tables.
3. Textual attributes: Attributes in a dimension table are usually not of numerical type but of
textual or categorical type. Normally you find numerical attributes in fact tables.
4. No two dimension tables are directly connected: Normally dimensions are related to the
fact table and are not directly related to each other.
5. Multiple hierarchies: Dimension tables do provide for multiple hierarchies.

14.8.2 Fact Table


The fact table is connected to all dimension tables. Following are some of the characteristics
of a fact table:
1. Concatenated foreign key: The fact table consists of foreign keys to the dimension tables; a
row in the fact table is identified by the combined primary keys of the dimension tables.
2. Data granularity: Data granularity refers to the level of detail at which a fact table is
stored, and you try to save as much information as possible. It is better to keep the granularity
high.
3. Additive measures: The measures of a particular fact table may be fully additive,
semi-additive or non-additive. Additivity of a fact is a measure that defines the ability of the
fact to be aggregated across all dimensions and their hierarchies without changing the original
meaning of the fact:
a. Fully-additive
b. Semi-additive
c. Non-additive
4. Deep table: Contrary to a dimension table, a fact table contains fewer attributes (is less
wide), but has a large number of rows.
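
The following DDL sketch shows a minimal star schema consistent with these characteristics;
all table and column names (product_dim, store_dim, sales_fact) are hypothetical:

    -- Dimension tables: wide, denormalized, mostly textual attributes.
    CREATE TABLE product_dim (
        product_key  INTEGER PRIMARY KEY,
        product_name VARCHAR(100),
        category     VARCHAR(50),
        brand        VARCHAR(50)
    );

    CREATE TABLE store_dim (
        store_key  INTEGER PRIMARY KEY,
        store_name VARCHAR(100),
        city       VARCHAR(50),
        region     VARCHAR(50)
    );

    -- Fact table: concatenated key of dimension foreign keys,
    -- numeric additive measures, few columns but many rows ("deep").
    CREATE TABLE sales_fact (
        product_key   INTEGER REFERENCES product_dim (product_key),
        store_key     INTEGER REFERENCES store_dim (store_key),
        quantity_sold INTEGER,
        sales_amount  DECIMAL(12, 2),
        PRIMARY KEY (product_key, store_key)
    );

A typical star-join query then aggregates the measures by dimension attributes:

    SELECT s.region, p.category, SUM(f.sales_amount) AS total_sales
    FROM sales_fact f
    JOIN store_dim s ON f.store_key = s.store_key
    JOIN product_dim p ON f.product_key = p.product_key
    GROUP BY s.region, p.category;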

14.9 Database security


Providing security for an operational database is different from doing so for an OLAP
database. In the case of an operational database, server-level security is treated differently
from database security; in other words, a multitier security approach is followed. The concept
of logins helps manage server-level security and the concept of users helps manage
database-level security.
Logins typically consist of a login name, password, default database and server-level
permissions. Users typically consist of a user name, associated login, database access
permissions and object-level permissions. Passwords are not associated with users, as
validation is based on the login. You can create multiple database users for a single login.
Protection is different from security. Security focuses on controlling unauthorized access,
whereas protection tries to give granular control over database access, i.e. to give limited
access to database users based on specific access permissions. You can implement access
permissions for individual objects such as tables, views, procedures, users and roles.
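
As a hedged illustration in SQL Server-style syntax (the names are hypothetical), a login is
created at server level, mapped to a database-level user, and then granted object-level
permissions on the sales_fact table from the star schema sketch above:

    -- Server-level principal.
    CREATE LOGIN analyst_login WITH PASSWORD = 'Str0ng!Passw0rd';

    -- Database-level principal mapped to that login; note the user
    -- itself has no password, as validation happens at the login.
    CREATE USER analyst_user FOR LOGIN analyst_login;

    -- Object-level permission: read-only access to a single table.
    GRANT SELECT ON sales_fact TO analyst_user;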
Data used for analysis or strategy making is very important, and its security is a critical issue.
In OLAP systems, a key role is played by aggregation and derivation. Besides their advantages,
aggregation and derivation also pose security problems, so effective inference control methods
adapted to the special setting of OLAP systems are important from a security point of view.
OLAP and the data warehouse are used to analyse almost everything, and it is vital from the
company's point of view that the data in OLAP systems is secure. Data stored in OLAP systems
gives analysts insight into the stored data. The major threat to OLAP data is from insiders.
Schemes like data sanitization and access control have performed well for operational
databases, but cannot be implemented as such for OLAP systems, as the data model used is
different. OLAP systems are susceptible to indirect inferences, and inference control is mostly
absent in OLAP systems.
A combination of both access control and inference control should be used to make OLAP
data secure. A 3-tier security architecture of OLAP consists of a data tier, an aggregation tier
and a query tier.

Figure 14.5: 3-tier security architecture of OLAP

Inference control is used to enforce security at the aggregation tier. The inference control
problem is partitioned into blocks, and security is then ensured between each pair of
corresponding blocks in the data and aggregation layers.
Access control is used to enforce security at the query tier using the access matrix tool.

An access matrix defines, for each domain, the access rights it has on different resources,
which in this case are the different dimensions of the cube. Users or groups are included in
domains.
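
A small hypothetical access matrix might look like this, with user domains as rows and cube
dimensions as the resources:

    Domain     | Sales dimension | Customer dimension | Finance dimension
    Analysts   | read            | read               | none
    Managers   | read            | read               | read
    Clerks     | read            | none               | none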

Check your progress/ Self assessment questions- 3

Q9. Multi-dimensional schemas, star schema and snowflake schema are based on centralized
_______ table and surrounding ___________.

Q10. Define star schema.


___________________________________________________________________________

___________________________________________________________________________

___________________________________________________________________________

Q11. Define access matrix.

___________________________________________________________________________

___________________________________________________________________________

___________________________________________________________________________

Q12. You can create multiple data users for a single login. ( TRUE / FALSE )
___________________________________________________________________________

14.10 Summary
OLAP is an analysis technology that helps decision support tools analyse multidimensional
data. OLAP provides a variety of views to end users and managers that help them gain insight
into the database.
The OLAP cube holds data like a 3D spreadsheet rather than a relational database. Dimensions
in an OLAP cube are used to categorize the numeric facts; these numeric facts are also called
measures. A roll-up operation defines a formula to compute the data relationships for one or
more dimensions. Drill-down is used to present data at a lower level of a dimension hierarchy
by moving down the concept hierarchy levels. Slice is used to form a sub-cube by selecting a
single dimension of a given cube for which the sub-cube is to be formed. The dice operation is
performed by defining a range-select condition on one or more dimensions; it reduces the
number of member values of the dimensions. Nowadays, large data warehouses adopt a
three-tier architecture. The multidimensional data model is based on dimension relations. A
star schema consists of a central fact table surrounded by dimension tables, one for each
dimension. Dimension tables in a star schema are denormalized. All dimension tables are
connected directly to the central fact table and no two dimension tables are connected to each
other directly; it forms the shape of a star. In the case of an operational database, server-level
security is treated differently from database security. The concept of logins helps manage
server-level security and the concept of users helps manage database-level security. Security
focuses on controlling unauthorized access, whereas protection tries to give granular control
over database access. A combination of both access control and inference control should be
used to make OLAP data secure. An access matrix defines, for each domain, the access rights
it has on different resources.

14.11 Glossary
OLAP- OLAP is used to structure data hierarchy in such a way that it reflects the real
dimensionality of the enterprise as understood by the users.
OLAP Cube- OLAP cube holds data like a 3D spreadsheet instead of relational database.
MOLAP- Multidimensional OLAP (MOLAP) databases are used to create and physically
store cubes.
ROLAP- Relational OLAP (ROLAP) databases are used to create virtual cubes.
OLTP- It is used to manage the operational databases.
Access Matrix- An access matrix defines, for each domain, the corresponding access rights it
has on different resources.

14.12 Answers to check your progress/self assessment questions


1. OLAP is used to structure data hierarchy in such a way that it reflects the real
dimensionality of the enterprise as understood by the users. OLAP provides variety of views
to end users or managers that helps them to gain insight into the database.
2. TRUE.
3. FALSE.
4. virtual cubes.
5. In the case of OLTP, the response time is extremely short; in the case of OLAP, responses
to ad-hoc queries may take minutes, hours or days to process. A highly normalized relational
data model is used for OLTP, whereas de-normalized dimension tables along with a single
fact table are used for OLAP.
6. A roll-up operation defines a formula to compute the data relationships for one or more
dimensions. It takes the current aggregation level of the fact values and applies further
aggregation on the dimensions. A drill-down operation is used to present data at a lower
level of a dimension hierarchy by moving down the concept hierarchy levels. It increases the
number of dimensions or the level of detail.
7. Dice.
8. The middle tier of 3-tier architecture is an extended relational model called ROLAP or a
multidimensional model that implements operations on multidimensional data directly.
9. fact, dimensions.
10. Star schema consists of a central fact table surrounded by dimension tables, one for each
dimension. Dimension tables in star schema are denormalized. All dimension tables are
connected directly to the central fact table and no 2 dimension tables are connected to each
other directly. It forms the shape of a star.
11. An access matrix defines, for each domain, the access rights it has on different resources.
12. TRUE.

14.13 References/ Suggested Readings


1. Introduction to Database Management System by Gillenson, Ponniah, Kriegel, Trukhov,
Taylor, Powell, Miller, Wiley.
2. Fundamentals of Relational Database Management System by S. Sumathi and S.
Esakkirajan, Springer.
3. Database Management Systems by R. Panneerselvam, PHI.
4. Database Management System Concepts by P. K. Singh, VK Publications.

14.14 Model questions


1. Differentiate between OLAP and OLTP.
2. Explain all operations that can be performed on OLAP cube.
3. Discuss database security.
4. Explain in detail the multidimensional model called star schema.
5. Explain OLAP 3-tier architecture in detail.

Lesson – 15 Database Administration

Structure of the lesson

15.0 Objective
15.1 Introduction
15.2 Need of Database administration
15.3 Role of Database in an Organization
15.4 Database Administrator (DBA)
15.4.1 Responsibilities of the database administrator
15.4.2 Various tasks performed by DBA
15.4.3 Managerial services of DBA
15.4.4 Technical role of DBA
15.5 Difference in data administrator and database administrator
15.6 Database administration Tools
15.7 Developing a Data Administration Strategy
15.8 Summary
15.9 Glossary
15.10 Answers to check your progress/ self assessment questions
15.11 References/Suggested readings
15.12 Model Questions

15.0 Objective
After studying this lesson, students will be able to:
• Explain database administration tools.
• Perform basic administrative functions.
• Maintain and retrieve data.
• Perform security administration to protect data integrity.
• Identify and utilize sources of information to solve technical problems.

15.1 Introduction
Database administration is the function of managing and maintaining database management
system software such as Oracle, IBM DB2 and Microsoft SQL Server. It is responsible for
designing the physical database. It also deals with technical issues such as database
performance, enforcement of security, and backup and recovery. The main purpose of database
administration is to provide reliable, consistent, secure and available corporate-wide data.
Every DBMS requires database administration support, and the complexity of database
administration grows with the number of DBMS products installed. Done well, database
administration improves the chances of providing effective data management resources for
your organization. Communication between the data administrator and the database
administrator ensures effective database creation and usage.

15.2 Need of Database administration

A database management system (DBMS) is used by every organization for the management of
its data, and the effective use and assignment of the company's databases must be ensured.
For this, a database administration group is required. Most modern organizations use a DBMS
of some size, which shows the need for a database administrator (DBA).

15.3 Role of Database in an Organization


• Its main role is to support managerial decision making at different levels in the
organization:
• Top level: strategic decisions
• Middle management: tactical decisions
• Operational management: daily and operational decisions
• A separate view of the data and support for their specialized decision-making roles
must be provided by the DBMS at each level.

15.4 Database Administrator (DBA)


A person or group of persons exerts centralized control of the database under the supervision
of a high-level administrator. This person or group of persons is known as the database
administrator. These users are most familiar with the database and are responsible for
creating, modifying, and maintaining the database at its three levels. Their role is critical in
managing the databases of an organization. The database administrator ensures that the
application programs have efficient and accurate access to the corporate data. Database
administrators can interface with many different types of people, like end users, programmers,
customers, technicians and executives. The database administrator is responsible for the
definition, creation, implementation, design and maintenance of the database system, and for
the establishment of policies and procedures pertaining to the management, maintenance,
security and use of the database management system. The DBA also trains employees in
database management. The main role of the DBA is to secure the data.

15.4.1 Responsibilities of the database administrator


• Installation and upgrading of the database server and application tools
• Allocate system storage and plan future storage requirements for the database system
• Modify the database structure
• Enrol different users and maintain system security
• Ensure compliance with the database vendor's license agreement
• Control and monitor user access to the database
• Monitor and optimize database performance
• Plan the backup and recovery of the database
• Maintain archived data
• Back up and restore the database
• Contact the database vendor for technical support
• Generate various reports by querying the database
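
For instance, planning backup and recovery could involve statements like the following
hypothetical SQL Server-style sketch (the database and file names are illustrative):

    -- Full backup of a database to disk.
    BACKUP DATABASE SalesDB TO DISK = 'D:\backups\SalesDB_full.bak';

    -- Restore the same database from that backup file.
    RESTORE DATABASE SalesDB FROM DISK = 'D:\backups\SalesDB_full.bak';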

15.4.2 Various tasks performed by DBA


• Control access to the database
• Providing support services to the end users
• Manage procedures for backup and recovery of data
• Ensuring data integrity
• Control data security
• Maintain data privacy
• Create and manage databases, tables, views, macros, stored procedures, user defined
functions, journals, query logs, and other database objects.
• Grant privileges to roles and users on database objects.
• Allocate space to users and databases, and manage space usage.
• Create and manage accounts.
• Manage data load and export.
• Manage data archive and restore.
• Monitor and tune system performance.
• Troubleshoot user problems.
• Manage periodic database maintenance tasks.

15.4.3 Managerial services of DBA


• Supporting the end-user community
• Defining and enforcing policies, procedures, and standards for database functions
• Ensuring data integrity, privacy and security
• Providing backup and recovery services for data
• Monitoring the distribution and use of data in the database

15.4.4 Technical role of DBA


• Evaluating, selecting, and installing the DBMS
• Designing and implementing databases and applications
• Testing and evaluating databases and applications
• Operating the database management system, applications and utilities
• Providing training and support to users
• Maintaining the DBMS, applications and utilities

Check your progress/ Self assessment questions


Q1. What is a database administrator?
Q2. List the tasks performed by a database administrator.
Q3. A person who is responsible for whole data, metadata and the policies about data use is
the _______.
a) Data administrator  b) Database administrator  c) Database steward  d) Both A and B

15.5 Difference in data administrator and database administrator


Data Administrator (DA) versus Database Administrator (DBA):

1. DA: Responsible for managing the data resources. DBA: Responsible for designing the
physical database.
2. DA: Depends on a DBA for the logical and conceptual data models. DBA: Does not depend
on a DA for the physical model.
3. DA: Managerial orientation. DBA: Technical orientation.
4. DA: Sets long-term goals. DBA: Executes plans to reach goals.
5. DA: Long term. DBA: Short term.
6. DA: Broad scope. DBA: Narrow scope.
7. DA: Sets policies and standards. DBA: Enforces policies and procedures, as well as
programming standards.
8. DA: Strategic planning. DBA: Control and supervision.
9. DA: Managing the data repository. DBA: Managing data security, privacy, and integrity.
10. DA: DBMS independent. DBA: DBMS dependent.
11. DA: Determines data requirements. DBA: Implements data requirements.
12. DA: Involved in the requirements gathering, analysis, and design phases. DBA: Involved in
the design, development, testing, and operational phases.

15.6 Database administration Tools


A database administration tool reduces the amount of time, effort and human error involved
in maintenance and administration. Such tools are used only for those tasks that cannot be
performed using StreamStudio applications, Design Center, or Control Center. Using only the
standard features of a database management system makes the administration and
maintenance of database applications time-consuming. Many DBA tools that enhance the
functionality of relational database management systems are available from third-party
vendors. A wide variety of tools is available to reduce the burden of database management
and administration. Before using these tools you should have proper knowledge of them,
otherwise their use can have serious consequences. You can administer databases used by
StreamServer and StreamStudio applications in the Database Administration Tool.
Various tools are as follows:
1. Adminer
It is a database management tool which allows managing columns, relations, databases,
tables, indexes, permissions and different types of users. It is packaged as a single PHP file
and supports popular DBMSs like SQLite, MS SQL, MySQL, PostgreSQL, MongoDB and
Oracle.

193 | P a g e
On loading the first page, it prompts you to select a database management system to connect
to, along with the database name, server name and user credentials. Once you have logged in
successfully, the database tables are shown and you can start managing the database.

2. DBComparer
It is a database comparison tool which analyses the differences between Microsoft SQL
Server database structures through an easy-to-use user interface. Among others, database
objects such as columns, foreign keys, tables, indexes, roles, schemas, stored procedures and
users can be compared.

On launching it, you first select which database to display on the left side and which database
to display on the right side of the comparison window. You can also specify which properties
and objects you would like to compare using the Compare Options tab. On completion of the
comparison process, both databases are shown side by side and their differences are
highlighted in either red or blue.

3. Firebird
It is a lightweight, open source and powerful SQL relational database management system
for Linux and Windows. It includes features like incremental backups, full support for
triggers and stored procedures, and multiple access methods like .NET, ODBC, PHP, Perl and
Python.

A third-party application like Flame Robin or Turbo Bird is required to handle database
administration, because Firebird does not have a front-end user interface for managing
databases.

4. EMS SQL Manager Lite for SQL Server


It allows creating and editing SQL Server database objects, and executing and saving SQL
queries. It has a user-friendly interface, and most of its functionality is wizard driven. It is
also a good alternative to Microsoft SQL Server Management Studio.

Lite versions are offered for other RDBMSs; the same kind of tool is used to manage Oracle
or MySQL databases. When you run it for SQL Server, the first step is to register a database
for management. Once you have completed this, you can start navigating through the DB
Explorer window on the left-hand side.

5. SQuirreL SQL
It is a Java-based database administration tool for JDBC-compliant databases. It allows
viewing the database structure, browsing the data in tables and issuing SQL commands. It
supports databases such as Sybase, Oracle, Microsoft Access, Firebird, PostgreSQL,
Microsoft SQL Server, IBM DB2, InterBase and MySQL.

On launching it, the first step is to configure the driver definition, which specifies the JDBC
driver to use.

6. SQLite Database Browser


It is an open source tool. It allows creating, editing and designing SQLite database files,
with features like searching records; creating and modifying tables, indexes, records and
databases; and importing and exporting data. It also contains a log which shows all the SQL
commands that have been issued by the user and the application.

When you open it, you either create a new database or open an existing one. After loading a
database, you can view the database structure, execute SQL commands and browse data.

7. DBeaver
It is an open source database tool for database administrators and java developers which
support JDBC compliant databases such as Oracle, PostgreSQL, MySQL, IBM DB2,
Firebird, Sybase and SQL Server. It includes features like the ability to edit databases
and browse, export data, ER diagrams, create and execute SQL scripts and transaction
management. The use of plugins can extend its functionality.

On opening this, the first step is go to Database then New Connection and then loading a
database. Once it gets connected, the database can be viewed on the left hand side of the
main window in the Database Navigator tabs.

8. DBVisualizer Free
It is a database tool which allows the management of databases like MySQL, PostgreSQL,
Oracle, SQL Server, Sybase, Informix, SQLite, H2 and DB2. It is based on Java, and its main
aim is to help Java developers and database administrators develop and maintain databases.
It includes features like creating and editing database objects with visual support, navigation
through database objects with a database browser, an auto-complete SQL editor, importing
data from a file, visual query-building support, and database admin features like security and
managing database storage. It runs on Windows, Linux and Mac OS X. The query builder
provides an easy way to develop database queries.

On launching it, a connection wizard pops up to guide you through connecting to a database.
Once a connection has been made, the database appears on the left-hand side of the main
window. The properties of the object selected on the left-hand side are shown on the
right-hand side.

9. HeidiSQL
It is a database query tool which supports PostgreSQL, MySQL and Microsoft SQL Server
databases through a Windows-based interface. It allows you to create and edit tables, views,
triggers and procedures, browse and edit data, and schedule events. It includes features like
the management of multiple servers from one main window, exporting from one database
server to another, bulk editing, an advanced SQL syntax editor and database optimization.

On launching it, a connection to the database server needs to be set up. The navigation pane
on the left-hand side is used to view database tables, and the tabs in the right-hand pane
manage foreign keys, indexes and database options.

10. FlySpeed SQL Query


It is a database query tool whose main goal is to make working with data easy. It supports
popular database servers such as PostgreSQL, MySQL and SQL Server. Using it, one can
edit, find and browse data in a database either in grid format or via the customizable form
view.

When you launch it for the first time, a connection needs to be created via the database
connection wizard. Once this is completed, database tables and views can be navigated in the
left-hand pane, and the query builder is used to create queries.

11. DBArtisan 8.1.5


It helps DBAs maximize the security, availability and performance of their databases. It is a
cross-platform database administration tool. Its wizards and graphical editors boost
productivity by permitting the staff to manage more databases and by reducing errors. DBAs
can pinpoint performance problems, detect issues like growth over time and storage-related
bottlenecks, and track storage, using DBArtisan's performance management capabilities and
advanced space and capacity features. It can also solve the challenge of how to do more with
less, and it simplifies many database administration tasks.
The latest version gives database administrators support for Microsoft SQL Server 2005 CTP
and the MySQL network. This support means improved performance, increased levels of
security, and higher productivity and availability.

12. DB2 Recovery Expert
IBM has launched this tool. It is designed to lessen the administrative costs of databases
through a number of self-managing capabilities and to enhance their performance. Simplified
recovery features and self-managing capabilities are designed to minimise database problems.

13. DB2 Administration Tool


This tool is for administrators whose responsibility is to keep DB2 performance at peak
levels. It enables the daily tasks that are associated with the management of a DB2 database.
It simplifies the complex tasks associated with safely managing DB2 objects and schemas
throughout the application lifecycle. It allows easy and quick navigation of the DB2 catalog.
Without knowing the exact SQL syntax, users can build and execute dynamic SQL statements.
The changes made to DB2 object definitions are managed and tracked by it. It helps to build
DB2 commands for execution against tables and databases, and also enables users to alter,
create and drop DB2 objects.

14. SQLyog
It is a powerful MySQL manager and admin tool which combines the features of phpMyAdmin,
MySQL Administrator, the MySQL graphical user interface tools and other MySQL front
ends.

15. Navicat for MySQL


It is a powerful database administration and development tool for MySQL. It supports most
of the latest MySQL features, including triggers, stored procedures, functions, events, views,
and user management. It works with any MySQL database server from version 3.21 or above.

16. SQL Maestro
It is a MySQL admin tool for managing, controlling and developing MySQL databases. A
powerful set of tools lets you build visual diagrams, edit and execute SQL scripts, and
compose OLAP cubes.

17. MySQL Workbench


It is a visual database design tool developed by MySQL. It provides an integrated tools
environment for database design & modeling, SQL development and database
administration to database administrators and developers.

18. RazorSQL
It is a SQL editor, database browser, SQL query tool and database administration tool for
Solaris, Windows, Linux and Mac OS X. It provides easy-to-use visual tools and advanced
features which allow users to do database administration, programming, editing, management,
and browsing.

Check your progress/ Self assessment questions


Q4. List the various database administration tools.
Q5. What is the difference between data administrator and database administrator?
Q6. If both data and database administration exist in an organization, the database
administrator is responsible for which of the following?
a) Data modelling  b) Database design  c) Metadata  d) All of the above

15.7 Developing a Data Administration Strategy


The functions of data administration were developed in response to changes in data
management brought about by technological advances in computer systems. The strategy
behind the development of data administration is closely related to the company's goals and
objectives. Implementing the overall company strategy is a crucial process influenced by
managerial, technological, and cultural issues. There is an approach to planning and
developing systems and databases known as information engineering. It is a data-driven
approach which offers the best opportunity for success when the objective is a data-sharing
systems environment. Information engineering translates strategic company goals into the
data and applications that will help the company achieve those goals. The information
systems architecture is the output of the information engineering process and serves as the
basis for the planning, development, and control of future information systems. An
information system facilitates the transformation of data into information and manages both
data and information. For the development of a data administration strategy, a detailed
analysis of company goals, situation and business needs is required. An integrating
methodology is also required to guide the development of this overall plan; the most widely
used integrating methodology is information engineering. The development can take place
through one of several approaches, or a combination of these, which include developing
specifications and plans for a single system, or developing specifications, integration
requirements, and plans for multiple commercially available systems and components.
Implementation of information engineering is a costly process because it provides a
framework that includes the use of computerized, automated, and integrated tools.

15.8 Summary
A database administrator plays an important role in a company or organization that uses a
database to keep track of information. Database administration is a growing field that should
stay relevant for a long time. It is responsible for the data integrity, security, availability and
performance of the database management system. It also provides education and support for
the use of tools, data standards and guidelines. It deals with the physical side of the database
and is mainly responsible for developing a strategy for the control and use of corporate data.

15.9 Glossary
Database -A collection of interrelated data stored in a standardized format, designed to
be shared by multiple users.

Database Administration - It is the management of the physical realization of a database
system, which includes physical database design and implementation, setting security
and integrity controls, monitoring system performance, and reorganizing the database.
Data Administration - Planning and coordination required to define data consistently
throughout the company.
Database Administrator (DBA) - A specialist who is trained in the administration of a
particular DBMS. They are trained in the details of installing, configuring, and
operating the DBMS.
Data administrator (DA) - The person who is in charge of the data resources of a
company and also responsible for data integrity, consistency, and integration.
DBMS -A database management system is the software that allows a computer to
perform database functions of storing, retrieving, adding, deleting and modifying data.
DB2 - DB2 is a relational database system developed by IBM Corporation, originally
for use on large mainframe computer systems.
Repository - A repository is a collection of resources that can be accessed to retrieve
information.
Schema - A database schema is a collection of meta-data that describes the relations in
a database.
SQL - A standardized database language, used for data retrieval (queries), data
definition, and data manipulation.

15.10 Answers to check your progress/ self assessment questions


1. The database administrator comprises the users who are most familiar with the database
and are responsible for creating, modifying and maintaining the database at all three levels.
2. Various tasks are:
• Control data security
• Grant privileges to roles and users on database objects.
• Allocate space to users and databases, and manage space usage.
• Create and manage accounts.
• Manage data load and export.
• Manage data archive and restore.
• Monitor and tune system performance.
• Troubleshoot user problems.
• Manage periodic database maintenance tasks.
3. a.
4. Various database administration tools are:
Adminer, DBComparer, Firebird, EMS SQL Manager Lite for SQL Server, SQuirreL SQL,
SQLite Database Browser, DBeaver, DBVisualizer Free, HeidiSQL, FlySpeed SQL Query,
DBArtisan 8.1.5, DB2 Recovery Expert, DB2 Administration Tool, SQLyog, Navicat for
MySQL, SQL Maestro, MySQL Workbench and RazorSQL.
5. Differences are:
1. DA: Responsible for the management of data resources. DBA: Responsible for designing
the physical database.
2. DA: Depends on a DBA for the logical and conceptual data models. DBA: Does not depend
on the DA for the physical model.
3. DA: Managerial orientation. DBA: Technical orientation.
4. DA: Sets long-term goals. DBA: Executes plans to reach goals.
5. DA: Long term. DBA: Short term.
6. DA: Broad scope. DBA: Narrow scope.
7. DA: Sets policies and standards. DBA: Enforces policies and procedures, as well as
programming standards.
8. DA: Strategic planning. DBA: Control and supervision.
9. DA: Managing the data repository. DBA: Managing data security, privacy and integrity.

6. b. Database design

15.11 References/Suggested readings


1. The Information System Consultant's Handbook: Systems Analysis and Design, William S.
Davis, David C. Yen.
2. Database Systems: Design, Implementation & Management, Peter Rob, Carlos Coronel,
Keeley Crockett.
3. Database Systems: A Practical Approach to Design, Implementation and Management,
Thomas M. Connolly, Carolyn E. Begg, 4th Edition.

15.12 Model Questions


Q1. What is a DBA? What are the various responsibilities of a DBA?
Q2. Differentiate between a data administrator and a database administrator.
Q3. What is database administration? Why is it needed?
Q4. Explain various database administration tools.
Q5. Explain how to develop a data administration strategy.
