Database Management Systems Unit-Wise
UNIT I: Introduction to Databases: Introduction, An Example, Characteristics of the Database Approach, Actors on the Scene, Workers
behind the Scene, Advantages of Using the DBMS Approach, A Brief History of Database Applications, When Not to Use a DBMS
[Text book-3] Overview of Database Languages and Architectures: Data Models, Schemas and Instances, Three Schema
Architecture and Data Independence, Database Languages and Interfaces, The Database System Environment, Centralized and
Client/Server Architecture for DBMSs, Classification of Database Management Systems [Text book-3]
UNIT II: Introduction to Database Design: Database Design and ER Diagrams, Entities, Attributes and Entity Sets, Relationships
and Relationship Sets, Additional Features of the ER Model, Conceptual Design with the ER Model, Conceptual Design for Large
Enterprises. Relational Model: Introduction to the Relational Model, Integrity Constraints over Relations, Enforcing Integrity
Constraints, Querying Relational Data, Logical Database Design: ER to Relational, Introduction to Views, Destroying/Altering Tables
and Views
UNIT III: Relational Algebra: Selection and Projection, Set Operations, Renaming, Joins, Division, More Examples of Algebra
Queries. SQL: Queries, Constraints, Triggers: The Form of a Basic SQL Query, UNION, INTERSECT and EXCEPT, Nested
Queries, Aggregate Operators, Null Values, Complex Integrity Constraints in SQL, Triggers and Active Databases, Designing Active
Databases.
UNIT IV: Introduction to Normalization Using Functional and Multivalued Dependencies: Informal Design Guidelines for
Relation Schema, Functional Dependencies, Normal Forms Based on Primary Keys, General Definitions of Second and Third Normal
Forms, Boyce-Codd Normal Form, Multivalued Dependency and Fourth Normal Form, Join Dependencies and Fifth Normal Form.
UNIT V: Transaction Management and Concurrency Control: Transaction Concept, A Simple Transaction Model, Storage
Structure, ACID Properties, Serializability, Transaction Isolation Levels, Concurrency Control, Lock-Based Protocols, Validation-
Based Protocols [Text Book-2]
UNIT I
1.1 Introduction
Importance: Database systems have become an essential component of life in modern society, in that many frequently occurring
events trigger the accessing of at least one database: bibliographic library searches, bank transactions, hotel/airline reservations,
grocery store purchases, online (Web) purchases, and so on.
Also, database search techniques are applied by some WWW search engines.
Definitions
The term database is often used, rather loosely, to refer to just about any collection of related data. E&N say that, in addition to being
a collection of related data, a database must have the following properties:
It represents some aspect of the real (or an imagined) world, called the miniworld or universe of discourse. Changes to the
mini world are reflected in the database. Imagine, for example, a UNIVERSITY mini world concerned with students,
courses, course sections, grades, and course prerequisites.
It is a logically coherent collection of data, to which some meaning can be attached. (Logical coherency requires, in part, that
the database not be self-contradictory.)
It has a purpose: there is an intended group of users and some preconceived applications that the users are interested in
employing.
To summarize: a database has some source (i.e., the miniworld) from which (logically consistent) data are derived, some degree of
interaction with events in the represented miniworld (at least insofar as the data is updated in response to changes in the state of the
miniworld), and an audience that is interested in using it.
An Aside: data vs. information vs. knowledge: Data is the representation of "facts" or "observations" whereas information refers to
the meaning thereof (according to some interpretation). Knowledge, on the other hand, refers to the ability to use information to
achieve intended ends.
Computerized vs. manual: Not surprisingly (this being a CS course), our concern will be with computerized database systems, as
opposed to manual ones, such as the card catalog-based systems that were used in libraries in ancient times (i.e., before the year 2000).
(Some authors wouldn't even recognize a non-computerized collection of data as a database, but E&N do.)
Size/Complexity: Databases run the range from being small/simple (e.g., one person's recipe database) to being huge/complex (e.g.,
Amazon's database that keeps track of all its products, customers, and suppliers).
Definition: A database management system (DBMS) is a collection of programs enabling users to create and maintain a database.
DB Functionalities: More specifically, a DBMS is a general purpose software system facilitating each of the following (with respect
to a database):
definition: specifying data types (and other constraints to which the data must conform) and data organization
construction: the process of storing the data on some medium (e.g., magnetic disk) that is controlled by the DBMS
manipulation: querying, updating, report generation
sharing: allowing multiple users and programs to access the database "simultaneously"
system protection: preventing the database from becoming corrupted when hardware or software failures occur
security protection: preventing unauthorized or malicious access to the database.
Given all its responsibilities, it is not surprising that a typical DBMS is a complex piece of software.
A database together with the DBMS software is referred to as a database system. (See Figure 1.1, page 7.)
1.2: An Example:
Consider the UNIVERSITY database in Figure 1.2. Notice that it is relational!
Among the main ideas illustrated in this example is that each file/relation/table has a set of named fields/attributes/columns, each of
which is specified to be of some data type. (In addition to a data type, we might put further restrictions upon a field, e.g.,
the Grade field in the GRADE_REPORT table must have a value from the set {'A', 'B', ..., 'F'}.)
The idea is that, of course, each table will be populated with data in the form of records/tuples/rows, each of which represents some
entity (in the mini world) or some relationship between entities.
For example, each record in the STUDENT table represents a —surprise!— student. Similarly for
the COURSE and SECTION tables.
On the other hand, each record in GRADE_REPORT represents a relationship between a student and a section of a course. And each
record in PREREQUISITE represents a relationship between two courses.
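The idea of typed fields with further restrictions can be sketched in a few lines of Python using the standard sqlite3 module. This is only an illustration: the column names, the sample rows, and the choice of 'A', 'B', 'C', 'D', 'F' as the permitted grades are assumptions made for the sketch, not a copy of Figure 1.2.

```python
import sqlite3

# In-memory database. Column names and sample values are assumptions
# modeled on the UNIVERSITY example described above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (
    Student_number INTEGER PRIMARY KEY,
    Name           TEXT NOT NULL,
    Class          INTEGER,
    Major          TEXT
);
CREATE TABLE COURSE (
    Course_number TEXT PRIMARY KEY,
    Course_name   TEXT NOT NULL
);
CREATE TABLE SECTION (
    Section_identifier INTEGER PRIMARY KEY,
    Course_number      TEXT REFERENCES COURSE(Course_number),
    Semester           TEXT,
    Year               INTEGER
);
-- Each row relates one student to one section of a course.
CREATE TABLE GRADE_REPORT (
    Student_number     INTEGER REFERENCES STUDENT(Student_number),
    Section_identifier INTEGER REFERENCES SECTION(Section_identifier),
    Grade              TEXT CHECK (Grade IN ('A', 'B', 'C', 'D', 'F'))
);
-- Each row relates two courses.
CREATE TABLE PREREQUISITE (
    Course_number       TEXT REFERENCES COURSE(Course_number),
    Prerequisite_number TEXT REFERENCES COURSE(Course_number)
);
""")

conn.execute("INSERT INTO STUDENT VALUES (17, 'Smith', 1, 'CS')")
conn.execute("INSERT INTO COURSE VALUES ('CS3380', 'Database')")
conn.execute("INSERT INTO SECTION VALUES (135, 'CS3380', 'Spring', 2006)")
conn.execute("INSERT INTO GRADE_REPORT VALUES (17, 135, 'A')")

# A grade outside the permitted set violates the CHECK constraint.
try:
    conn.execute("INSERT INTO GRADE_REPORT VALUES (17, 135, 'Z')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

grade_rows = conn.execute("SELECT COUNT(*) FROM GRADE_REPORT").fetchone()[0]
```

Here the DBMS, not the application, rejects the invalid grade, which is the point of declaring the restriction as part of the table definition.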
Database manipulation involves querying and updating.
Examples of (informal) queries:
Retrieve the transcript(s) of student(s) named 'Smith'.
List the names of students who were enrolled in a section of the 'Database' course in Spring 2006, as well as their grades in
that course section.
List all prerequisites of the 'Database' course.
Examples of (informal) updates:
Change the CLASS value of 'Smith' to sophomore (i.e., 2).
Insert a record for a section of 'File Processing' for this semester.
Remove from the prerequisites of course 'CMPS 340' the course 'CMPS 144'.
A query/update must be conveyed to the DBMS in a precise way (via the query language of the DBMS) in order to be processed.
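As a rough sketch of what "a precise way" means, the informal requests above might be written in SQL and run through Python's sqlite3 module as follows. The table layout, the sample rows, and the assumption that 'CMPS 340' is the 'Database' course are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified tables and sample rows.
conn.executescript("""
CREATE TABLE STUDENT (Student_number INTEGER PRIMARY KEY, Name TEXT, Class INTEGER);
CREATE TABLE COURSE (Course_number TEXT PRIMARY KEY, Course_name TEXT);
CREATE TABLE PREREQUISITE (Course_number TEXT, Prerequisite_number TEXT);
INSERT INTO STUDENT VALUES (17, 'Smith', 1), (8, 'Brown', 2);
INSERT INTO COURSE VALUES ('CMPS 340', 'Database'), ('CMPS 144', 'Sample Course');
INSERT INTO PREREQUISITE VALUES ('CMPS 340', 'CMPS 144');
""")

# Query: list all prerequisites of the 'Database' course.
prereqs = conn.execute(
    """
    SELECT p.Prerequisite_number
    FROM PREREQUISITE AS p
    JOIN COURSE AS c ON c.Course_number = p.Course_number
    WHERE c.Course_name = ?
    """,
    ("Database",),
).fetchall()

# Update: change the Class value of 'Smith' to sophomore (2).
conn.execute("UPDATE STUDENT SET Class = 2 WHERE Name = ?", ("Smith",))
smith_class = conn.execute(
    "SELECT Class FROM STUDENT WHERE Name = ?", ("Smith",)
).fetchone()[0]

# Update: remove 'CMPS 144' from the prerequisites of 'CMPS 340'.
conn.execute(
    "DELETE FROM PREREQUISITE WHERE Course_number = ? AND Prerequisite_number = ?",
    ("CMPS 340", "CMPS 144"),
)
remaining = conn.execute("SELECT COUNT(*) FROM PREREQUISITE").fetchone()[0]
```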
As with software in general, developing a new database (or a new application for an existing database) proceeds in phases,
including requirements analysis and various levels of design (conceptual (e.g., Entity-Relationship Modeling), logical (e.g.,
relational), and physical (file structures)).
1.3: Characteristics of the Database Approach:
Database approach vs. File Processing approach: Consider an organization/enterprise that is organized as a collection of
departments/offices. Each department has certain data processing "needs", many of which are unique to it. In the file processing
approach, each department would control a collection of relevant data files and software applications to manipulate that data.
The result is data redundancy, which not only wastes storage space but also makes it more difficult to keep changing data items
consistent with one another, as a change to one copy of a data item must be made to all of them. Inconsistency results when one (or
more) copies of a datum are changed but not others. (E.g., If you change your address, informing the Registrar's Office should suffice
to ensure that your grades are sent to the right place, but does not guarantee that your next bill will be, as the copy of your address
maintained by the Bursar's Office might not have been changed.)
In the database approach, a single repository of data is maintained that is used by all the departments in the organization.
(Note that "single repository" is used in the logical sense. In physical terms, the data may be distributed among various sites, and
possibly mirrored.)
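The address example can be mimicked with plain Python dictionaries. This is a toy sketch: the student id, the addresses, and the function names are invented, and a dictionary merely stands in for a file or a database.

```python
# File-processing approach: each office keeps its own copy of the address.
registrar_file = {"S17": {"name": "Smith", "address": "12 Oak St"}}
bursar_file = {"S17": {"name": "Smith", "address": "12 Oak St"}}

# The student informs only the Registrar's Office of the move.
registrar_file["S17"]["address"] = "99 Elm Ave"
inconsistent = registrar_file["S17"]["address"] != bursar_file["S17"]["address"]

# Database approach: one logical copy that both offices read.
shared_db = {"S17": {"name": "Smith", "address": "12 Oak St"}}


def registrar_address(student_id):
    """Address as seen by the Registrar's Office."""
    return shared_db[student_id]["address"]


def bursar_address(student_id):
    """Address as seen by the Bursar's Office."""
    return shared_db[student_id]["address"]


# A single update is now visible to every department.
shared_db["S17"]["address"] = "99 Elm Ave"
consistent = registrar_address("S17") == bursar_address("S17")
```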
A Brief History of Database Applications:
1. Early Database Applications Using Hierarchical and Network Systems: Many early database applications maintained
records in large organizations, such as corporations, universities, hospitals, and banks. In many of these applications, there
were large numbers of records of similar structure. One of the main problems with early database systems was the
intermixing of conceptual relationships with the physical storage and placement of records on disk. Another shortcoming of
early systems was that they provided only programming language interfaces. This made it time-consuming and expensive to
implement new queries and transactions, since new programs had to be written, tested, and debugged.
2. Providing Application Flexibility with Relational Databases: Relational databases were originally proposed to separate
the physical storage of data from its conceptual representation and to provide a mathematical foundation for databases. The
relational data model also introduced high-level query languages that provided an alternative to programming language
interfaces; hence, it was much quicker to write new queries. Eventually, relational databases became the dominant type of
database systems for traditional database applications. Relational databases now exist on almost all types of computers, from
small personal computers to large servers.
3. Object-Oriented Applications and the Need for More Complex Databases: The emergence of object-oriented
programming languages in the 1980s and the need to store and share complex-structured objects led to the development of
object-oriented databases. Initially, they were considered a competitor to relational databases, since they provided more
general data structures. They also incorporated many of the useful object-oriented paradigms, such as abstract data types,
encapsulation of operations, inheritance, and object identity. However, the complexity of the model and the lack of an early
standard contributed to their limited use. They are now mainly used in specialized applications, such as engineering design,
multimedia publishing, and manufacturing systems.
4. Interchanging Data on the Web for E-Commerce: The World Wide Web provided a large network of interconnected
computers. Users can create documents using a Web publishing language, such as HTML (HyperText Markup Language),
and store these documents on Web servers where other users (clients) can access them. Documents can be linked together
through hyperlinks, which are pointers to other documents. A variety of techniques were developed to allow the interchange
of data on the Web. Currently, XML (eXtensible Markup Language) is considered to be the primary standard for
interchanging data among various types of databases and Web pages. XML combines concepts from the models used in
document systems with database modeling concepts.
5. Extending Database Capabilities for New Applications: The success of database systems in traditional applications
encouraged developers of other types of applications to attempt to use them. Such applications traditionally used their own
specialized file and data structures.
★ Scientific applications → storing large amounts of data from scientific experiments
★ Storage and retrieval of images
★ Storage and retrieval of videos
★ Data mining applications → analyzing large amounts of data
★ Spatial applications → e.g., weather information
★ Time series applications → e.g., daily sales information
When Not to Use a DBMS:
Overhead costs of using a DBMS:
★ High initial investment
★ Overhead for providing security, concurrency control, and recovery
A DBMS may be unnecessary when:
★ The database and applications are simple, well defined, and not expected to change.
★ Multiple-user access is not required.
Schemas, Instances, and Database State: In a data model, it is important to distinguish between the description of the database and the
database itself. The description of a database is called the database schema, which is specified during database design and is not
expected to change frequently. Most data models have certain conventions for displaying schemas as diagrams.
A schema diagram displays the structure of each record type but not the actual instances of records. We call each object in the schema,
such as STUDENT or COURSE, a schema construct. The data in the database at a particular moment in time is called a database
state or snapshot. It is also called the current set of occurrences or instances in the database.
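The distinction between schema and state can be observed directly. In the following sketch (Python with sqlite3; the STUDENT columns and rows are assumptions for illustration), the schema is read once before and once after two insertions, while the state is captured after each step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE STUDENT (Student_number INTEGER PRIMARY KEY, Name TEXT, Class INTEGER)"
)


def student_schema():
    """Column names of STUDENT: part of the schema, independent of the rows."""
    cursor = conn.execute("SELECT * FROM STUDENT")
    return [column[0] for column in cursor.description]


def student_state():
    """All rows currently stored: the database state (snapshot)."""
    return conn.execute("SELECT * FROM STUDENT ORDER BY Student_number").fetchall()


schema_before = student_schema()
state_empty = student_state()      # the empty state, right after definition

conn.execute("INSERT INTO STUDENT VALUES (17, 'Smith', 1)")
state_one = student_state()

conn.execute("INSERT INTO STUDENT VALUES (8, 'Brown', 2)")
state_two = student_state()

schema_after = student_schema()    # unchanged by the insertions
```

Every insertion moves the database to a new state, yet the schema read before and after is the same.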
3. The external or view level includes a number of external schemas or user views. Each external schema describes the part of the
database that a particular user group is interested in and hides the rest of the database from that user group. As in the previous level,
each external schema is typically implemented using a representational data model, possibly based on an external schema design in a
high-level conceptual data model.
Data Independence
o Data independence can be explained using the three-schema architecture.
o Data independence refers to the ability to modify the schema at one level of the database system without
altering the schema at the next higher level.
There are two types of data independence:
1. Logical Data Independence
o Logical data independence refers to the ability to change the conceptual schema without having to change the
external schema.
o Logical data independence is used to separate the external level from the conceptual view.
o If we make changes to the conceptual view of the data, the user view of the data is not affected.
o Logical data independence occurs at the user interface level.
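A small sketch of logical data independence, using Python's sqlite3 module: a view plays the role of an external schema, and the base table plays the role of the conceptual schema. The view name STUDENT_NAMES and the added Major column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (Student_number INTEGER PRIMARY KEY, Name TEXT, Class INTEGER);
INSERT INTO STUDENT VALUES (17, 'Smith', 1), (8, 'Brown', 2);
-- External schema: the only part of the database this user group sees.
CREATE VIEW STUDENT_NAMES AS SELECT Student_number, Name FROM STUDENT;
""")

# A query written against the external schema.
external_query = "SELECT Name FROM STUDENT_NAMES ORDER BY Student_number"
before = conn.execute(external_query).fetchall()

# Change the conceptual schema by adding a new attribute to STUDENT.
conn.execute("ALTER TABLE STUDENT ADD COLUMN Major TEXT")

# The same external query still runs and returns the same answer.
after = conn.execute(external_query).fetchall()
```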
2. Physical Data Independence
o Physical data independence can be defined as the capacity to change the internal schema without having to change the
conceptual schema.
o If we change the storage size of the database system server, the conceptual structure of the database is not
affected.
o Physical data independence is used to separate the conceptual level from the internal level.
o Physical data independence occurs at the logical interface level.
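Physical data independence can be illustrated in the same spirit. In this sketch (Python with sqlite3; the table contents and the index name idx_student_name are assumptions), adding an index changes how the data can be accessed internally, but the query text and its result stay the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE STUDENT (Student_number INTEGER PRIMARY KEY, Name TEXT, Class INTEGER)"
)
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?)",
    [(17, "Smith", 1), (8, "Brown", 2), (23, "Smith", 3)],
)

# A query written against the conceptual schema only.
query = "SELECT Student_number FROM STUDENT WHERE Name = ? ORDER BY Student_number"
before = conn.execute(query, ("Smith",)).fetchall()

# Internal-level change: add an access path on Name.
conn.execute("CREATE INDEX idx_student_name ON STUDENT (Name)")
indexes = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'"
)]

# The query is unchanged and so is its answer.
after = conn.execute(query, ("Smith",)).fetchall()
```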
DB INTERFACES:
A database management system (DBMS) interface is a user interface that allows for the ability to input queries to a database without
using the query language itself. User-friendly interfaces provided by DBMS may include the following:
Menu-Based Interfaces
Forms-Based Interfaces
Graphical User Interfaces
Natural Language Interfaces
Interfaces for Parametric Users
Interfaces for the Database Administrator (DBA)
Menu-Based Interfaces
These interfaces present the user with lists of options (called menus) that lead the user through the formation of a request. The basic
advantage of using menus is that they relieve the user of having to remember the specific commands and syntax of a query language. The
query is composed step by step by picking options from a menu that is shown by the system. Pull-down menus
are a very popular technique in Web-based interfaces. They are also often used in browsing interfaces which allow a user to look
through the contents of a database in an exploratory and unstructured manner.
Forms-Based Interfaces
A forms-based interface displays a form to each user. Users can fill out all of the form entries to insert new data, or they can fill out
only certain entries, in which case the DBMS will retrieve matching data for the remaining entries. These types of forms are
usually designed and programmed for naive users, who have little technical expertise. Many DBMSs have form
specification languages which are special languages that help specify such forms.
Example: SQL Forms is a form-based language that specifies queries using a form designed in conjunction with the relational
database schema.
Graphical User Interfaces
A GUI typically displays a schema to the user in diagrammatic form. The user then can specify a query by manipulating the diagram.
In many cases, GUIs utilize both menus and forms. Most GUIs use a pointing device, such as a mouse, to pick a certain part of the
displayed schema diagram.
Natural Language Interfaces
These interfaces accept requests written in English or some other language and attempt to understand them. A Natural language
interface has its own schema, which is similar to the database conceptual schema, as well as a dictionary of important words.
The natural language interface refers to the words in its schema as well as to the set of standard words in a dictionary to interpret the
request. If the interpretation is successful, the interface generates a high-level query corresponding to the natural language request and submits
it to the DBMS for processing; otherwise, a dialogue is started with the user to clarify any provided condition or request. The main
disadvantage is that the capabilities of this type of interface are still limited.
Interfaces for Parametric Users
Interfaces for parametric users provide a small set of commands that can be entered with a minimum of keystrokes. They are generally used
for operations that are performed repeatedly, such as bank transactions for transferring money.
Interfaces for Database Administrators (DBA)
Most database systems contain privileged commands that can be used only by the DBA’s staff. These include commands for creating
accounts, setting system parameters, granting account authorization, changing a schema, and reorganizing the storage structures of
databases.
The Database System Environment:
A DBMS is a complex software system. In this section we discuss the types of software components that constitute a DBMS and the
types of computer system software with which the DBMS interacts.
DBMS Component Modules: The accompanying figure shows the typical DBMS components, divided into two parts. The top part of the figure refers to the various
users of the database environment and their interfaces. The lower part shows the internal modules of the DBMS responsible for
storage of data and processing of transactions.
The database and the DBMS catalog are usually stored on disk. Access to the disk is controlled primarily by the operating system
(OS), which schedules disk read/write. Many DBMSs have their own buffer management module to schedule disk read/write, because
management of buffer storage has a considerable effect on performance. Reducing disk read/write improves performance
considerably. A higher-level stored data manager module of the DBMS controls access to DBMS information that is stored on disk,
whether it is part of the database or the catalog.
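To see why buffer management affects performance, consider the following toy buffer pool written in Python. It is only a sketch under stated assumptions: a dictionary stands in for the disk, the replacement policy is least recently used (LRU), and the class and method names are invented; real DBMS buffer managers are considerably more elaborate.

```python
from collections import OrderedDict


class BufferPool:
    """Tiny LRU buffer pool; `disk` is a dict standing in for disk pages."""

    def __init__(self, disk, capacity):
        self.disk = disk
        self.capacity = capacity
        self.frames = OrderedDict()   # page_id -> page contents, oldest first
        self.disk_reads = 0

    def get_page(self, page_id):
        if page_id in self.frames:
            # Buffer hit: no disk access, just mark the page as recently used.
            self.frames.move_to_end(page_id)
            return self.frames[page_id]
        # Buffer miss: read the page from "disk" and cache it.
        self.disk_reads += 1
        page = self.disk[page_id]
        self.frames[page_id] = page
        if len(self.frames) > self.capacity:
            self.frames.popitem(last=False)   # evict the least recently used page
        return page


disk = {n: f"page-{n}" for n in range(10)}
pool = BufferPool(disk, capacity=3)

accesses = [0, 1, 0, 2, 0, 1, 3, 0]
for page_id in accesses:
    pool.get_page(page_id)
```

With this access pattern, eight page requests cause only four disk reads, because frequently used pages stay in the buffer.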
Consider the top part of the figure first. It shows interfaces for the DBA staff, casual users who work with interactive interfaces to formulate queries,
application programmers who create programs using some host programming languages, and parametric users who do data entry
work by supplying parameters to predefined transactions. The DBA staff works on defining the database and tuning it by making
changes to its definition using the DDL and other privileged commands. The DDL compiler processes schema definitions, specified
in the DDL, and stores descriptions of the schemas (meta-data) in the DBMS catalog. The catalog includes information such as the
names and sizes of files, names and data types of data items, storage details of each file, mapping information among schemas, and
constraints. In addition, the catalog stores many other types of information that are needed by the DBMS modules, which can then
look up the catalog information as needed.
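Many systems let users inspect the catalog themselves. As one concrete illustration, SQLite keeps schema descriptions in a table named sqlite_master and reports column details through PRAGMA table_info; the STUDENT table below is an assumed example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DDL statement: its description is stored in the catalog.
conn.execute(
    "CREATE TABLE STUDENT ("
    "Student_number INTEGER PRIMARY KEY, Name TEXT NOT NULL, Class INTEGER)"
)

# sqlite_master holds one row per table, index, view, or trigger.
catalog_rows = conn.execute("SELECT type, name FROM sqlite_master").fetchall()

# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
columns = [
    (name, col_type, bool(notnull))
    for _cid, name, col_type, notnull, _default, _pk in conn.execute(
        "PRAGMA table_info(STUDENT)"
    )
]
column_names = [column[0] for column in columns]
```

The names, data types, and constraints recorded here are exactly the kind of meta-data the other DBMS modules consult at run time.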
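In an application program, the embedded DML commands are extracted and compiled separately into object code. The rest of the program is sent to the host language compiler. The object codes for the DML commands and the rest of the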
program are linked, forming a canned transaction whose executable code includes calls to the runtime database processor. It is also
becoming increasingly common to use scripting languages such as PHP and Python to write database programs. Canned transactions
are executed repeatedly by parametric users via PCs or mobile apps; these users simply supply the parameters to the transactions. Each
execution is considered to be a separate transaction.
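A canned transaction can be approximated in a scripting language as a function whose SQL is fixed and whose caller supplies only the parameters. The sketch below uses Python's sqlite3 module; the ACCOUNT table, the transfer function, and the rule that balances may not go negative are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ACCOUNT (Acct_no INTEGER PRIMARY KEY, "
    "Balance INTEGER CHECK (Balance >= 0))"
)
conn.executemany("INSERT INTO ACCOUNT VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, from_acct, to_acct, amount):
    """Canned transaction: the SQL is fixed, callers supply three parameters."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE ACCOUNT SET Balance = Balance + ? WHERE Acct_no = ?",
                (amount, to_acct),
            )
            conn.execute(
                "UPDATE ACCOUNT SET Balance = Balance - ? WHERE Acct_no = ?",
                (amount, from_acct),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit would make the balance negative; nothing was changed.
        return False


ok = transfer(conn, 1, 2, 30)          # each call is a separate transaction
too_much = transfer(conn, 1, 2, 500)   # fails the CHECK and is rolled back
balances = conn.execute(
    "SELECT Acct_no, Balance FROM ACCOUNT ORDER BY Acct_no"
).fetchall()
```

The failed call leaves both balances as they were after the first transfer, even though its credit step had already run before the debit was rejected.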
Common utilities have the following types of functions:
■ Loading. A loading utility is used to load existing data files—such as text files or sequential files—into the database.
Usually, the current (source) format of the data file and the desired (target) database file structure are specified to the utility, which
then automatically reformats the data and stores it in the database. With the proliferation of DBMSs, transferring data from one DBMS
to another is becoming common in many organizations. Some vendors offer conversion tools that generate the appropriate loading
programs, given the existing source and target database storage descriptions (internal schemas).
■ Backup. A backup utility creates a backup copy of the database, usually by dumping the entire database onto tape or other
mass storage medium. The backup copy can be used to restore the database in case of catastrophic disk failure. Incremental backups
are also often used, where only changes since the previous backup are recorded. Incremental backup is more complex, but saves
storage space.
■ Database storage reorganization. This utility can be used to reorganize a set of database files into different file
organizations and create new access paths to improve performance.
■ Performance monitoring. Such a utility monitors database usage and provides statistics to the DBA. The DBA uses the
statistics in making decisions such as whether or not to reorganize files or whether to add or drop indexes to improve performance.
Other utilities may be available for sorting files, handling data compression, monitoring access by users, interfacing with the network,
and performing other functions.
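The loading and backup utilities described above can be sketched with Python's standard library. This is a minimal illustration, assuming a small comma-separated source text and an in-memory SQLite database; a production utility would read and write real files.

```python
import csv
import io
import sqlite3

# Source data in a text format; a real loader would read a file on disk.
source_text = "Student_number,Name,Class\n17,Smith,1\n8,Brown,2\n"

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE STUDENT (Student_number INTEGER PRIMARY KEY, Name TEXT, Class INTEGER)"
)

# Loading: reformat each source record to match the target table structure.
reader = csv.DictReader(io.StringIO(source_text))
rows = [(int(r["Student_number"]), r["Name"], int(r["Class"])) for r in reader]
with db:
    db.executemany("INSERT INTO STUDENT VALUES (?, ?, ?)", rows)

# Backup: copy the entire database into a second database.
backup_db = sqlite3.connect(":memory:")
db.backup(backup_db)

# The backup copy can be queried (or used for a restore) independently.
restored = backup_db.execute(
    "SELECT Name FROM STUDENT ORDER BY Student_number"
).fetchall()
```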
Centralized DBMSs Architecture: Architectures for DBMSs have followed trends similar to those for general computer system
architectures. Older architectures used mainframe computers to provide the main processing for all system functions, including user
application programs and user interface programs, as well as all the DBMS functionality. The reason was that in older systems, most
users accessed the DBMS via computer terminals that did not have processing power and only provided display capabilities.
Therefore, all processing was performed remotely on the computer system housing the DBMS, and only display information
and controls were sent from the computer to the display terminals, which were connected to the central computer via various types of
communications networks. As prices of hardware declined, most users replaced their terminals with PCs and workstations, and more
recently with mobile devices.
At first, database systems used these computers similarly to how they had used display terminals, so that the DBMS itself
was still a centralized DBMS in which all the DBMS functionality, application program execution, and user interface processing were
carried out on one machine. The accompanying figure illustrates the physical components in a centralized architecture. Gradually, DBMSs started
to exploit the available processing power at the user side, which led to client/server DBMS architectures.
Basic Client/Server Architectures: First, we discuss client/server architecture in general; then we discuss how it is applied to
DBMSs. The client/server architecture was developed to deal with computing environments in which a large number of PCs,
workstations, file servers, printers, database servers, Web servers, e-mail servers, and other software and equipment are connected via
a network. The idea is to define specialized servers with specific functionalities. For example, it is possible to connect a number of
PCs or small workstations as clients to a file server that maintains the files of the client machines. Another machine can be designated
as a printer server by being connected to various printers; all print requests by the clients are forwarded to this machine. Web servers
or e-mail servers also fall into the specialized server category.
The resources provided by specialized servers can be accessed by many client machines. The client machines provide the
user with the appropriate interfaces to utilize these servers, as well as with local processing power to run local applications. This
concept can be carried over to other software packages, with specialized programs, such as a CAD (computer-aided design) package,
being stored on specific server machines and being made accessible to multiple clients. The accompanying figure illustrates client/server architecture at
the logical level.
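The division of labor between client and server can be imitated in a single Python script. In this sketch, a thread plays the server that owns the database, and the main program plays the client that only formats requests and reads replies. The JSON message format, the serve function, and the use of a connected socket pair in place of a real network are all assumptions made to keep the example short.

```python
import json
import socket
import sqlite3
import threading


def serve(sock):
    """Server side: owns the database and answers one query request."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE STUDENT (Student_number INTEGER, Name TEXT)")
    db.execute("INSERT INTO STUDENT VALUES (17, 'Smith')")
    with sock:
        # Messages here are tiny, so a single recv call is assumed to suffice.
        request = json.loads(sock.recv(4096).decode())
        rows = db.execute(request["sql"], request["params"]).fetchall()
        sock.sendall(json.dumps(rows).encode())
    db.close()


# A connected pair of sockets stands in for the network link.
client_sock, server_sock = socket.socketpair()
server = threading.Thread(target=serve, args=(server_sock,))
server.start()

# Client side: sends the request and receives the result for display.
with client_sock:
    message = {
        "sql": "SELECT Name FROM STUDENT WHERE Student_number = ?",
        "params": [17],
    }
    client_sock.sendall(json.dumps(message).encode())
    reply = json.loads(client_sock.recv(4096).decode())

server.join()
```

All query processing happens on the server side; the client never touches the database directly.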
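Classification of Database Management Systems:
The first criterion used to classify DBMSs is the data model on which the DBMS is based.
Relational DBMS (RDBMS): Organizes data into tables of rows and columns, with relationships established between tables. SQL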
(Structured Query Language) is typically used to manipulate and query data.
Hierarchical DBMS: Represents data in a tree-like structure, with each record having one parent record and multiple child records.
Commonly used in older systems and in environments like mainframes.
Network DBMS: Extends the hierarchical model by allowing each record to have multiple parent and child records, forming a
network-like structure. Often used in specialized applications.
Object-Oriented DBMS (OODBMS): Stores data as objects, encapsulating data and behavior. Objects can contain attributes and
methods, offering more flexibility and support for complex data structures.
The main data model used in many current commercial DBMSs is the relational data model, and the systems based on this model are
known as SQL systems. The object data model has been implemented in some commercial systems but has not had widespread use.
Recently, so-called big data systems, also known as key-value storage systems and NOSQL systems, use various data models:
document-based, graph-based, column-based, and key-value data models. Many legacy applications still run on database systems
based on the hierarchical and network data models. The relational DBMSs are evolving continuously, and, in particular, have been
incorporating many of the concepts that were developed in object databases. This has led to a new class of DBMSs called object-
relational DBMSs.
The second criterion used to classify DBMSs is the number of users supported by the system. Single-user systems support only one
user at a time and are mostly used with PCs. Multiuser systems, which include the majority of DBMSs, support concurrent multiple
users.
The third criterion is the number of sites over which the database is distributed. A DBMS is centralized if the data is stored at a
single computer site. A centralized DBMS can support multiple users, but the DBMS and the database reside totally at a single
computer site. A distributed DBMS (DDBMS) can have the actual database and DBMS software distributed over many sites
connected by a computer network. Big data systems are often massively distributed, with hundreds of sites. The data is often
replicated on multiple sites so that failure of a site will not make some data unavailable.
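The benefit of replication can be shown with a toy model in Python. The Site and ReplicatedStore classes below are invented for this sketch and ignore everything a real DDBMS must handle, such as networking, consistency between replicas, and recovery of a failed site.

```python
class Site:
    """One computer site holding a copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True


class ReplicatedStore:
    """Toy distributed store: every write goes to all sites that are up."""

    def __init__(self, sites):
        self.sites = sites

    def put(self, key, value):
        for site in self.sites:
            if site.up:
                site.data[key] = value

    def get(self, key):
        # Answer from the first available site that holds the key.
        for site in self.sites:
            if site.up and key in site.data:
                return site.data[key], site.name
        raise KeyError(key)


sites = [Site("A"), Site("B"), Site("C")]
store = ReplicatedStore(sites)
store.put("S17", "Smith")

# Site A fails, but the data is still available from another replica.
sites[0].up = False
value, served_by = store.get("S17")
```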
Homogeneous DDBMSs use the same DBMS software at all the sites, whereas heterogeneous DDBMSs can use different DBMS
software at each site. It is also possible to develop middleware software to access several autonomous preexisting databases stored
under heterogeneous DDBMSs. This leads to a federated DBMS (or multidatabase system), in which the participating DBMSs are
loosely coupled and have a degree of local autonomy. Many DDBMSs also use a client/server architecture.
The fourth criterion is cost. It is difficult to propose a classification of DBMSs based on cost. Today we have open source (free)
DBMS products like MySQL and PostgreSQL that are supported by third-party vendors with additional services. The main RDBMS
products are available as free 30-day examination copies as well as personal versions, which may cost under $100 and allow a
fair amount of functionality. The giant systems are being sold in modular form with components to handle distribution, replication,
parallel processing, mobile capability, and so on, and with a large number of parameters that must be defined for the configuration.
Furthermore, they are sold in the form of licenses—site licenses allow unlimited use of the database system with any number of copies
running at the customer site. Another type of license limits the number of concurrent users or the number of user seats at a location.
Standalone single-user versions of some systems like Microsoft Access are sold per copy or included in the overall configuration of a
desktop or laptop. In addition, data warehousing and mining features, as well as support for additional data types, are made available
at extra cost. It is possible to pay millions of dollars annually for the installation and maintenance of large database systems.
We can also classify a DBMS on the basis of the types of access path options for storing files. One well-known family of DBMSs is based
on inverted file structures.
Finally, a DBMS can be general purpose or special purpose. When performance is a primary
consideration, a special-purpose DBMS can be designed and built for a specific application; such a system cannot be used for other
applications without major changes. Many airline reservations and telephone directory systems developed in the past are special-
purpose DBMSs. These fall into the category of online transaction processing (OLTP) systems, which must support a large number of
concurrent transactions without imposing excessive delays.