Unit 1 Notes

The document provides an introduction to databases including: - Databases are used to store related data for a specific purpose and are more effective than file systems which lack data independence and require separate programs to access data. - A database management system (DBMS) allows users to define, construct, manipulate and share databases while providing protection and long-term maintenance. - Databases are widely used in enterprise systems, banking, education, transportation and more to store and access information.

Uploaded by

Ally Khan

UNIT-I

Introduction to Databases: Advantage of Database System, Database System versus File
System, View of Data, Database System Concepts and Architecture: Data Models, Schemas and
Instances, Three schema architecture and Data Independence, Database Languages and
Interfaces, Classification of Database Management Systems.

Databases and database systems are an essential component of life in modern society: most of us
encounter several activities every day that involve some interaction with a database. For
example, if we go to the bank to deposit or withdraw funds, if we make a hotel or airline
reservation, if we access a computerized library catalog to search for an item, or if we purchase
something online—such as a book, toy, or computer—chances are that our activities will involve
someone or some computer program accessing a database. Even purchasing items at a
supermarket often automatically updates the database that holds the inventory of grocery items.

Databases:

➢ In simple words, a database is a place to store data.

➢ A database is a collection of related data and it is designed, built, and populated with data
for a specific purpose.

➢ A database can be of any size and complexity. For example, the list of names and
addresses may consist of only a few hundred records, each with a simple structure. On
the other hand, the computerized catalog of a large library may contain half a million
entries.

➢ An example of a large commercial database is Amazon.com. It contains data for over 60
million active users, and millions of books, CDs, videos, DVDs, games, electronics,
apparel, and other items. The database occupies over 42 terabytes and is stored on
hundreds of computers (called servers). Millions of visitors access Amazon.com each day
and use the database to make purchases. The database is continually updated as new
books and other items are added to the inventory, and stock quantities are updated as
purchases are transacted.

➢ A database may be generated and maintained manually or it may be computerized. For
example, a library card catalog is a database that may be created and maintained
manually.

➢ A computerized database may be created and maintained either by a group of application
programs written specifically for that task or by a database management system. Of
course, we are only concerned with computerized databases in this text.

An Example of a Simple Database

Student
Name      Roll Number   Course No
Akash     34434         1
Mahesh    34345         2
Ramesh    45556         1

Course
Course No   Course Name   Department
1           BCA           CA
2           MCA           CA

Let us consider a simple example: a COLLEGE database for maintaining information concerning
students and courses in a college. The figure above shows the database structure and a few
sample data records. The database is organized as two files, each of which stores data records
of the same type.

The STUDENT file stores data on each student and the COURSE file stores data on each course.

To define this database, we must specify the structure of the records of each file by specifying
the different types of data elements to be stored in each record. In the figure, each STUDENT
record includes data to represent the student’s Name, Roll Number and Course No.

Each COURSE record includes data to represent the Course no, Course name and Department
(the department that offers the course).
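
The definition and construction of the COLLEGE database described above can be sketched in code. This is a minimal illustration that assumes Python's built-in sqlite3 module as a stand-in for a DBMS; the table and column names (student, course, roll_no and so on) are illustrative choices, not part of the notes.

```python
import sqlite3

# In-memory database, so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")

# Defining the database: the structure of each record type.
# Table and column names are illustrative assumptions.
conn.executescript("""
CREATE TABLE course (
    course_no   INTEGER PRIMARY KEY,
    course_name TEXT NOT NULL,
    department  TEXT NOT NULL
);
CREATE TABLE student (
    name      TEXT NOT NULL,
    roll_no   INTEGER PRIMARY KEY,
    course_no INTEGER NOT NULL REFERENCES course(course_no)
);
""")

# Constructing the database: storing the sample records from the figure.
conn.executemany("INSERT INTO course VALUES (?, ?, ?)",
                 [(1, "BCA", "CA"), (2, "MCA", "CA")])
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [("Akash", 34434, 1), ("Mahesh", 34345, 2), ("Ramesh", 45556, 1)])
conn.commit()

students = conn.execute(
    "SELECT name, roll_no, course_no FROM student ORDER BY name"
).fetchall()
print(students)
```

Each CREATE TABLE statement plays the role of a record-structure definition, and each inserted row is one data record of that type.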

Database management system (DBMS):


A database management system (DBMS) is a computerized system that enables users to create
and maintain a database. In other words, the DBMS is a general-purpose software system
that facilitates the processes of defining, constructing, manipulating, and sharing databases
among various users and applications. Examples include Oracle, Microsoft SQL Server,
MS Access, IBM’s DB2, MySQL, Ingres, and Informix.

➢ Defining a database involves specifying the data types, structures, and constraints of the
data to be stored in the database.
➢ Constructing the database is the process of storing the data on some storage medium that
is controlled by the DBMS.

➢ Manipulating a database includes functions such as querying the database to retrieve
specific data, updating the database to reflect changes and generating reports from the
data.

➢ Sharing a database allows multiple users and programs to access the database
simultaneously.

Other important functions provided by the DBMS include protecting the database and
maintaining it over a long period of time.

➢ Protection includes system protection against hardware or software malfunction (or
crashes) and security protection against unauthorized or malicious access.

➢ A typical large database may have a life cycle of many years, so the DBMS must be able
to maintain the database system by allowing the system to evolve as requirements change
over time.

An application program accesses the database by sending queries or requests for data to the
DBMS.

Database-System Applications:

Databases are widely used. Here are some representative applications:

• Enterprise Information
◦ Sales: For customer, product, and purchase information.
◦ Accounting: For payments, receipts, account balances, assets and other accounting
information.
◦ Human resources: For information about employees, salaries, payroll taxes, and
benefits, and for generation of paychecks.
◦ Manufacturing: For management of the supply chain and for tracking production of
items in factories, inventories of items in warehouses and stores, and orders for items.
◦ Online retailers: For sales data noted above plus online order tracking, generation of
recommendation lists, and maintenance of online product evaluations.

• Banking and Finance


◦ Banking: For customer information, accounts, loans, and banking transactions.
◦ Credit card transactions: For purchases on credit cards and generation of monthly
statements.
◦ Finance: For storing information about holdings, sales, and purchases of financial
instruments such as stocks and bonds; also for storing real-time market data to enable
online trading by customers.

• Universities: For student information, course registrations, and grades (in addition to standard
enterprise information such as human resources and accounting).

• Airlines: For reservations and schedule information. Airlines were among the first to use
databases in a geographically distributed manner.

• Telecommunication: For keeping records of calls made, generating monthly bills, maintaining
balances on prepaid calling cards, and storing information about the
communication networks.

File Processing Systems:

➢ One way to keep the information on a computer is to store it in operating system files. To
allow users to manipulate the information, the system has a number of application
programs that manipulate the files. Application programs are usually written in languages
such as COBOL, C, and C++.

➢ New application programs are added to the system as the need arises. A file-processing
system thus stores permanent records in various files, and it needs different
application programs to extract records from, and add records to, the appropriate files.

➢ Before database management systems (DBMSs) were introduced, organizations usually
stored information in such systems. Keeping organizational information in a
file-processing system has a number of major disadvantages:

1) Data redundancy and inconsistency:

In traditional software development utilizing file processing, every user group maintains its own
files for handling its data-processing applications. For example, consider the COLLEGE
database example; here, two groups of users might be the course registration personnel and the
accounting office. In the traditional approach, each group independently keeps files on students.
The accounting office keeps data on registration and related billing information, whereas the
registration office keeps track of student courses and grades. Other groups may further duplicate
some or all of the same data in their own files.

This redundancy in storing the same data multiple times leads to several problems:

➢ First, there is the need to perform a single logical update—such as entering data on a new
student—multiple times: once for each file where student data is recorded. This leads to
duplication of effort.
➢ Second, storage space is wasted when the same data is stored repeatedly, and this
problem may be serious for large databases.

➢ Third, files that represent the same data may become inconsistent. This may happen
because an update is applied to some of the files but not to others.
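
The third problem can be illustrated with a small sketch. Plain Python dictionaries stand in for the two offices' separate files here; the addresses and the helper names (update_address, inconsistent_rolls) are hypothetical and chosen only for illustration.

```python
# Two offices keep their own copy of the same student data.
# Dictionaries stand in for separate files; the addresses are made up.
registration_file = {34434: {"name": "Akash", "address": "12 Lake Road"}}
accounting_file = {34434: {"name": "Akash", "address": "12 Lake Road"}}


def update_address(file, roll_no, new_address):
    """Apply one logical update to ONE copy of the data."""
    file[roll_no]["address"] = new_address


def inconsistent_rolls(a, b):
    """Roll numbers present in both files whose records disagree."""
    return sorted(r for r in a.keys() & b.keys() if a[r] != b[r])


# The student moves, but only the registration office records the change.
update_address(registration_file, 34434, "7 Hill Street")

mismatches = inconsistent_rolls(registration_file, accounting_file)
print(mismatches)  # the two files now disagree about student 34434
```

Keeping the copies consistent requires repeating the update for every file, which is exactly the duplication of effort described in the first point.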

2) Difficulty in accessing data:

Sometimes we may have to write a new application program to satisfy an unusual request.
Suppose that one of the university clerks needs to find out the names of all students who live
within a particular postal-code area. The clerk asks the data-processing department to generate
such a list. Because the designers of the original system did not anticipate this request, there is no
application program on hand to meet it. The university clerk now has two choices: either obtain
the list of all students and extract the needed information manually or ask a programmer to write
the necessary application program. Both alternatives are obviously unsatisfactory. The point here
is that conventional file-processing systems do not allow needed data to be retrieved in a
convenient and efficient manner.
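
For contrast, here is a rough sketch of how the same unanticipated request might look when the data sits in a DBMS that offers a query language. It assumes Python's built-in sqlite3 module; the postal_code column, the sample postal codes and the function name students_in_area are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical student table with a postal_code column (sample values are made up).
conn.execute("CREATE TABLE student (name TEXT, roll_no INTEGER, postal_code TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("Akash", 34434, "500001"), ("Mahesh", 34345, "500032"), ("Ramesh", 45556, "500001")],
)


def students_in_area(conn, postal_code):
    """Answer the clerk's request with one query instead of a new program per file layout."""
    rows = conn.execute(
        "SELECT name FROM student WHERE postal_code = ? ORDER BY name",
        (postal_code,),
    )
    return [name for (name,) in rows]


names = students_in_area(conn, "500001")
print(names)
```

The request is expressed against the data itself, so no knowledge of how the student file is laid out on disk is needed.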

3) Integrity problems:

The data values stored in the database must satisfy certain types of consistency constraints. For
example, the balance of bank accounts may never fall below a prescribed amount. Developers
enforce these constraints in the system by adding appropriate code in the various application
programs. However, when new constraints are added, it is difficult to change the programs to
enforce them.

4) Atomicity problems:

A computer system, like any other device, is subject to failure. In many applications, it is crucial
that, if a failure occurs, the data be restored to the consistent state that existed prior to the failure.
Consider a program to transfer $500 from the balance of account A to the balance of
account B. If a system failure occurs during the execution of the program, it is
possible that the $500 was removed from the account A but was not credited to the account B,
resulting in an inconsistent database state. Clearly, it is essential to database consistency that
either both the credit and debit occur, or that neither occur. That is, the funds transfer must be
atomic—it must happen in its entirety or not at all. It is difficult to ensure atomicity in a
conventional file-processing system.
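
The all-or-nothing behaviour described above can be sketched with a transaction. This is a minimal illustration assuming Python's built-in sqlite3 module; the starting balances, the transfer function and its fail_midway flag are hypothetical, and the "crash" is only simulated by raising an exception between the debit and the credit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 200)])
conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst; both updates persist or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
            if fail_midway:
                raise RuntimeError("simulated failure between debit and credit")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
    except RuntimeError:
        pass  # the debit has already been rolled back


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


transfer(conn, "A", "B", 500, fail_midway=True)
after_failure = balances(conn)   # unchanged: the partial debit was undone

transfer(conn, "A", "B", 500)
after_success = balances(conn)   # both the debit and the credit were applied
print(after_failure, after_success)
```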

5) Concurrent-access anomalies:

Concurrent access means that multiple users are allowed to update the data simultaneously.

Consider account A, with an account balance of $10,000. If two people debit the account balance
(by say $500 and $100, respectively) from account A at almost exactly the same time, the result
of the concurrent executions may leave an incorrect (or inconsistent) state. Suppose that the
programs executing on behalf of each withdrawal read the old balance, reduce that value by the
amount being withdrawn, and write the result back. If the two programs run concurrently, they
may both read the value $10,000, and write back $9500 and $9900, respectively. Depending on
which one writes the value last, the balance of account A may contain either $9500
or $9900, rather than the correct value of $9400. To guard against this possibility, the system
must maintain some form of supervision. But supervision is difficult to provide because data
may be accessed by many different application programs that have not been coordinated
previously.
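
The interleaving described above can be replayed deterministically. The sketch below uses no real concurrency: it simply performs the reads and writes of the two withdrawals in the problematic order. The function names are hypothetical.

```python
def withdraw_steps(amount):
    """Split one withdrawal into its separate read and write steps."""
    def read(account):
        return account["balance"]

    def write(account, old_balance):
        account["balance"] = old_balance - amount

    return read, write


def run_interleaved(start):
    """Both programs read the old balance before either one writes."""
    account = {"balance": start}
    read1, write1 = withdraw_steps(500)
    read2, write2 = withdraw_steps(100)
    seen1 = read1(account)   # first program reads the old balance
    seen2 = read2(account)   # second program reads the same old balance
    write1(account, seen1)   # writes old balance - 500
    write2(account, seen2)   # overwrites with old balance - 100; the first debit is lost
    return account["balance"]


def run_serial(start):
    """Each withdrawal finishes before the next one begins."""
    account = {"balance": start}
    for amount in (500, 100):
        read, write = withdraw_steps(amount)
        write(account, read(account))
    return account["balance"]


interleaved = run_interleaved(10_000)
serial = run_serial(10_000)
print(interleaved, serial)
```

With a starting balance of $10,000 the interleaved run ends at $9900 while the serial run ends at the correct $9400, matching the figures in the text.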

6) Security problems:

Not every user of the database system should be able to access all the data. For example, in a
university, payroll personnel need to see only that part of the database that has financial
information. They do not need access to information about academic records. Enforcing such
security constraints is difficult.

Advantages of Using the DBMS Approach:

1) Controlling Redundancy: In a file-processing system, storing the same data multiple times
leads to several problems. First, a single logical update, such as entering data on a new
student, must be performed multiple times: once for each file where student data is recorded.
This leads to duplication of effort.

Second, storage space is wasted when the same data is stored repeatedly, and this problem may
be serious for large databases.

Third, files that represent the same data may become inconsistent. This may happen because an
update is applied to some of the files but not to others.

A DBMS controls any redundancy that exists in the database and ensures that multiple copies
of the same data remain consistent.

2) Enforcing Integrity Constraints: Most database applications have certain integrity
constraints that must hold for the data. A DBMS should provide capabilities for defining and
enforcing these constraints. The simplest type of integrity constraint involves specifying a data
type for each data item.

A more complex type of constraint that frequently occurs involves specifying that a record in one
file must be related to records in other files. This is known as a referential integrity constraint.
Another type of constraint specifies uniqueness on data item values and this is known as a key or
uniqueness constraint.

Another example of a constraint is that the value for the age of an employee must be in the
range 16 to 65.

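
The kinds of constraint listed above can be declared once and then enforced by the DBMS on every update. A minimal sketch, assuming Python's built-in sqlite3 module; the department and employee tables, the sample names and the try_insert helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE department (
    dept_no INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE                              -- uniqueness constraint
);
CREATE TABLE employee (
    emp_no  INTEGER PRIMARY KEY,                              -- key constraint
    name    TEXT NOT NULL,
    age     INTEGER NOT NULL CHECK (age BETWEEN 16 AND 65),   -- range constraint
    dept_no INTEGER NOT NULL REFERENCES department(dept_no)   -- referential integrity
);
INSERT INTO department VALUES (1, 'CA');
""")


def try_insert(sql, params):
    """Run one insert and report whether the DBMS accepted it."""
    try:
        with conn:
            conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


EMP = "INSERT INTO employee VALUES (?, ?, ?, ?)"
results = {
    "valid row": try_insert(EMP, (1, "Asha", 30, 1)),
    "age out of range": try_insert(EMP, (2, "Ravi", 70, 1)),
    "unknown department": try_insert(EMP, (3, "Meena", 40, 99)),
    "duplicate department name": try_insert("INSERT INTO department VALUES (?, ?)", (2, "CA")),
}
for case, outcome in results.items():
    print(case, "->", outcome)
```

Only the first insert is accepted; the other three are rejected by the DBMS itself, without any checking code in the application.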
3) Shared Data: A database allows the sharing of data under its control by any number of
application programs or users.

4) Restricting Unauthorized Access: When multiple users share a large database, it is likely
that most users will not be authorized to access all information in the database. For example,
financial data such as salaries and bonuses is often considered confidential and only authorized
persons are allowed to access such data. In addition, some users may only be permitted to
retrieve data, whereas others are allowed to retrieve and update. Hence, the type of access
operation—retrieval or update—must also be controlled. Typically, users or user groups are
given account numbers protected by passwords, which they can use to gain access to the
database. A DBMS should provide a security and authorization subsystem, which the DBA uses
to create accounts and to specify account restrictions. Then, the DBMS should enforce these
restrictions automatically.

5) Providing Backup and Recovery: A DBMS must provide facilities for recovering from
hardware or software failures. The backup and recovery subsystem of the DBMS is responsible
for recovery. For example, if the computer system fails in the middle of a complex update
transaction, the recovery subsystem is responsible for making sure that the database is restored to
the state it was in before the transaction started executing.

6) Representing Complex Relationships among Data: A database may include numerous
varieties of data that are interrelated in many ways. A DBMS must have the capability to
represent a variety of complex relationships among the data, to define new relationships as they
arise, and to retrieve and update related data easily and efficiently.

7) Providing Multiple User Interfaces: Because many types of users with varying levels of
technical knowledge use a database, a DBMS should provide a variety of user interfaces. These
include apps for mobile users, query languages for casual users, programming language
interfaces for application programmers, forms and command codes for parametric users, and
menu-driven interfaces and natural language interfaces for standalone users.

8) Security Management: Security rules determine which users can access the database, which
data objects each user can access and which data operations the user can perform. All database
users may be authenticated to the DBMS through a user name and password or through
biometric authentication.

View of Data:

A major purpose of a database system is to provide users with an abstract view of the data. That
is, the system hides certain details of how the data are stored and maintained.

Data Abstraction

Since many database-system users are not computer trained, developers hide the complexity
from users through several levels of abstraction, to simplify users’ interactions with the system:
➢ Physical level: The lowest level of abstraction describes how the data are actually
stored. The physical level describes the complete details of data storage and access paths
for the database.

➢ Logical level: The next-higher level of abstraction describes what data are stored in
the database, and what relationships exist among those data. The Logical level hides the
details of physical storage structure and concentrates on describing Entities, Data types,
Relationships and Constraints.

Database administrators, who must decide what information to keep in the database, use
the logical level of abstraction.

➢ View level: The highest level of abstraction describes only part of the entire database.
Many users of the database system do not need all the information; instead, they need to
access only a part of the database so the view level describes the part of the database that
a particular user group is interested in and hides the rest of the database from that user
group. The system may provide many views for the same database.
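
The view level can be illustrated with a database view that exposes only part of a table. A minimal sketch, assuming Python's built-in sqlite3 module; the employee table, its sample rows and the employee_directory view are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Logical level: the full description of what is stored (sample data is made up).
CREATE TABLE employee (
    emp_no     INTEGER PRIMARY KEY,
    name       TEXT,
    department TEXT,
    salary     INTEGER
);
INSERT INTO employee VALUES (1, 'Asha', 'CA', 52000), (2, 'Ravi', 'HR', 48000);

-- View level: one user group sees names and departments but not salaries.
CREATE VIEW employee_directory AS
    SELECT emp_no, name, department FROM employee;
""")

cursor = conn.execute("SELECT * FROM employee_directory ORDER BY emp_no")
view_columns = [col[0] for col in cursor.description]
view_rows = cursor.fetchall()
print(view_columns)
print(view_rows)
```

Several such views can be defined over the same table, one per user group, while the stored data exists only once.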

Figure shows the relationship among the three levels of abstraction:

Data Models:

A data model is a collection of conceptual tools for describing data, data relationships, data
semantics, and consistency constraints. A data model provides a way to describe the design of a
database at the physical, logical, and view levels.
A data model is a plan for building a database. Every DBMS is based on a particular data
model. A data model consists of rules and standards that define how data is organized in a
database. There are several basic types of data models:

1) Hierarchical Data Model: The hierarchical database is the oldest form of database. It was
developed by IBM for its IMS (Information Management System). This data model organizes the
data in a tree structure: at the top of the structure there is a single root node, and each child
node can have only one parent node. The model uses parent-child relationships, that is,
one-to-many relationships; a child segment is restricted to having only one parent segment.

Limitations:
1. The processing is sequential along the branches of the tree and therefore the access time
becomes longer.
2. This hierarchical tree is implemented through pointers from parents to their children. This
requires extra storage.
3. Deletion of parent deletes its children nodes.
4. Changes in relationships require changes in the entire structure of the database.
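
The tree structure and two of the limitations listed above (access along the branches, and deleting a parent removing its children) can be sketched with a small in-memory tree. This is only a toy model in Python, not how IMS stores data; the Segment class and the sample hierarchy are hypothetical.

```python
class Segment:
    """One node of a hierarchical database: exactly one parent, any number of children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []   # parent-to-child pointers

    def add_child(self, name):
        child = Segment(name, parent=self)
        self.children.append(child)
        return child

    def find(self, name):
        """Search sequentially along the branches of the tree."""
        if self.name == name:
            return self
        for child in self.children:
            hit = child.find(name)
            if hit is not None:
                return hit
        return None

    def names(self):
        result = [self.name]
        for child in self.children:
            result.extend(child.names())
        return result


def delete(segment):
    """Detach a segment from its parent; its whole subtree disappears with it."""
    if segment.parent is not None:
        segment.parent.children.remove(segment)
        segment.parent = None


root = Segment("College")
ca = root.add_child("CA")
bca = ca.add_child("BCA")
mca = ca.add_child("MCA")
bca.add_child("Akash")
bca.add_child("Ramesh")
mca.add_child("Mahesh")

before = root.names()
delete(root.find("BCA"))
after = root.names()
print(before)
print(after)
```

Deleting the BCA segment also removes Akash and Ramesh from the tree, because a student segment can only be reached through its single parent.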

2) Network Data Model: The network data model was developed as an alternative to the
hierarchical database. This model was formulated in 1971 by the Database Task Group (DBTG) of
the Conference on Data Systems Languages (CODASYL). The network data model expands on the
hierarchical data model by providing multiple paths among the segments, that is, more than one
parent-child relationship. Hence this model allows many-to-many relationships. The main
disadvantage is that it can be quite complicated to maintain all the links.

3) Relational Model: The relational model uses a collection of tables to represent both data and
the relationships among those data. Each table has multiple columns, and each column has a
unique name. Tables are also known as relations.

The relational database model uses two-dimensional tables to store data. Data are arranged in
tables made up of rows and columns, which differs markedly from the hierarchical and network
models: there are no parent and child data sets.

The relational data model is the most widely used data model, and a vast majority of current
database systems are based on the relational model.
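
In the relational model a relationship is represented by matching values in tables rather than by pointers. A minimal sketch using the Student and Course data from the earlier example, assuming Python's built-in sqlite3 module; the table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE course  (course_no INTEGER PRIMARY KEY, course_name TEXT, department TEXT);
CREATE TABLE student (name TEXT, roll_no INTEGER PRIMARY KEY, course_no INTEGER);
INSERT INTO course  VALUES (1, 'BCA', 'CA'), (2, 'MCA', 'CA');
INSERT INTO student VALUES ('Akash', 34434, 1), ('Mahesh', 34345, 2), ('Ramesh', 45556, 1);
""")

# The student-course relationship exists only as equal course_no values,
# so it is recovered by joining the two tables on that column.
enrolment = conn.execute("""
    SELECT s.name, c.course_name
    FROM student AS s
    JOIN course AS c ON c.course_no = s.course_no
    ORDER BY s.name
""").fetchall()
print(enrolment)
```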

4) Entity-Relationship Model: The entity-relationship (E-R) data model uses a collection of
basic objects, called entities, and relationships among these objects. An entity is a “thing” or
“object” in the real world that is distinguishable from other objects. The entity-relationship
model is widely used in database design.

5) Object-Based Data Model: Object-oriented programming (especially in Java, C++, or C#)
has become the dominant software-development methodology. This led to the development of an
object-oriented data model that can be seen as extending the E-R model with notions of
encapsulation, methods (functions), and object identity. The object-relational data model
combines features of the object-oriented data model and relational data model.

Instances and Schemas:

➢ The description of a database, that is, its overall design, is called the database schema.
It is specified during database design and is not expected to change frequently. A
displayed schema is called a schema diagram. The diagram displays the structure of each
record type but not the actual instances of records.

➢ The actual data in the database may change quite frequently, as information is inserted
and deleted over time. The collection of information stored in the database at a
particular moment is called an instance of the database; it is also called the database
state or snapshot.

➢ When we define a new database, we specify its database schema only to the DBMS. At
this point, the corresponding database state is the empty state with no data. We get the
initial state of the database when the database is first populated or loaded with the initial
data. From then on, every time an update operation is applied to the database, we get
another database state. At any point in time, the database has a current state.
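
The difference between the schema and the successive database states can be observed directly. A minimal sketch, assuming Python's built-in sqlite3 module, whose sqlite_master catalog table holds the stored schema; the helper names schema and state are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, roll_no INTEGER PRIMARY KEY, course_no INTEGER)")


def schema(conn):
    """The stored description of the database (the DDL text kept by SQLite)."""
    return [sql for (sql,) in conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'")]


def state(conn):
    """The data in the database at this moment."""
    return conn.execute(
        "SELECT name, roll_no, course_no FROM student ORDER BY roll_no"
    ).fetchall()


schema_at_definition = schema(conn)
empty_state = state(conn)          # schema defined, no data yet

conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [("Akash", 34434, 1), ("Mahesh", 34345, 2), ("Ramesh", 45556, 1)])
initial_state = state(conn)        # database first populated

conn.execute("DELETE FROM student WHERE roll_no = 34345")
later_state = state(conn)          # every update yields another state

schema_after_updates = schema(conn)
print(empty_state, initial_state, later_state, sep="\n")
```

The state changes three times in this run, while the schema read back at the end is identical to the one defined at the start.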

Database systems have several schemas, partitioned according to the levels of abstraction.

➢ The physical schema describes the database design at the physical level.

➢ The logical schema describes the database design at the logical level.

➢ A database may also have several schemas at the view level, sometimes called
subschemas, that describe different views of the database.

Three Schema Architecture


Data Independence:

Data Independence can be defined as the capacity to change the schema at one level of a
database system without having to change the schema at the next higher level. We can define
two types of data independence:

1) Logical data independence is the capacity to change the conceptual schema without
having to change external schemas or application programs. We may change the
conceptual schema to expand the database (by adding a record type or data item), to
change constraints, or to reduce the database (by removing a record type or data item).

2) Physical data independence is the capacity to change the internal schema without having
to change the conceptual schema. Hence, the external schemas need not be changed either.
Changes to the internal schema may be needed because some physical files were
reorganized—for example, by creating additional access structures—to improve the
performance of retrieval or update.
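
Both kinds of data independence can be demonstrated with one unchanged query. The sketch below assumes Python's built-in sqlite3 module; the student_names view plays the role of an external schema, and the index name and the added email column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (name TEXT, roll_no INTEGER PRIMARY KEY, course_no INTEGER);
INSERT INTO student VALUES ('Akash', 34434, 1), ('Mahesh', 34345, 2), ('Ramesh', 45556, 1);
-- External schema: what one application sees.
CREATE VIEW student_names AS SELECT roll_no, name FROM student;
""")

# The application's query is written once and never edited below.
QUERY = "SELECT roll_no, name FROM student_names ORDER BY roll_no"
before = conn.execute(QUERY).fetchall()

# Physical change: add an access structure (an index) to the internal schema.
conn.execute("CREATE INDEX idx_student_course ON student (course_no)")
after_index = conn.execute(QUERY).fetchall()

# Logical change: expand the conceptual schema with a new data item.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
after_new_column = conn.execute(QUERY).fetchall()

print(before == after_index == after_new_column)
```

Neither the index nor the new column requires any change to the view or to the query that uses it.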

Database Languages:

A database system provides a data-definition language to specify the database schema and a
data-manipulation language to express database queries and updates. In practice, the data-
definition and data-manipulation languages are not two separate languages; instead they simply
form parts of a single database language, such as the widely used SQL language.

1) Data-Definition Language:

We specify a database schema by a set of definitions expressed by a special language called a
data-definition language (DDL). The DDL is also used to specify additional properties of the
data. The DBMS will have a DDL Compiler whose function is to process DDL statements.

The data values stored in the database must satisfy certain consistency constraints. For example,
suppose the university requires that the account balance of a department must never be negative.
The DDL provides facilities to specify such constraints. The database system checks these
constraints every time the database is updated.

In SQL, the DDL is the set of commands (such as CREATE, ALTER, and DROP) used to create, modify, and delete database structures but not data.
These commands are normally not used by a general user but by the DBA.
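
A short sketch of DDL statements in use, including the non-negative balance constraint from the example above. It assumes Python's built-in sqlite3 module; the department table, its columns and the sample balance are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE: define a structure together with a consistency constraint.
conn.execute("""
    CREATE TABLE department (
        dept_name TEXT PRIMARY KEY,
        balance   INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
conn.execute("INSERT INTO department VALUES ('CA', 1000)")

# The constraint declared in the DDL is checked on every update.
try:
    conn.execute("UPDATE department SET balance = balance - 5000 WHERE dept_name = 'CA'")
    update_rejected = False
except sqlite3.IntegrityError:
    update_rejected = True

# ALTER: modify the structure; the existing data is left as it was.
conn.execute("ALTER TABLE department ADD COLUMN building TEXT")
columns = [row[1] for row in conn.execute("PRAGMA table_info(department)")]
balance = conn.execute("SELECT balance FROM department").fetchall()[0][0]

# DROP: delete the structure itself.
conn.execute("DROP TABLE department")
tables_left = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()

print(update_rejected, columns, balance, tables_left)
```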

2) Data-Manipulation Language

A data-manipulation language (DML) is a language that enables users to access or manipulate
data. The types of access are:

• Retrieval of information stored in the database
• Insertion of new information into the database
• Deletion of information from the database
• Modification of information stored in the database

There are basically two types of DML:

• Procedural DMLs require a user to specify what data are needed and how to get those data.
DML statements must be identified within the program so that they can be extracted by a
precompiler and processed by the DBMS. A low level or procedural DML must be embedded in
a general-purpose programming language.

This type of DML typically retrieves individual records or objects from the database and
processes each separately. Therefore, it needs to use programming language constructs, such as
looping, to retrieve and process each record from a set of records.

• Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify what data
are needed without specifying how to get those data. A high-level or nonprocedural DML can be
used on its own to specify complex database operations concisely. Declarative DMLs are usually
easier to learn and use than are procedural DMLs.
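
The contrast between the two styles can be sketched as follows. This is only an imitation of a procedural DML: the first function processes records one at a time in a host-language loop, while the second states what is wanted in a single SQL query. It assumes Python's built-in sqlite3 module, and both function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, roll_no INTEGER PRIMARY KEY, course_no INTEGER)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [("Akash", 34434, 1), ("Mahesh", 34345, 2), ("Ramesh", 45556, 1)])


def names_in_course_procedural(conn, course_no):
    """Record-at-a-time style: the program spells out HOW to find the answer."""
    names = []
    for name, student_course in conn.execute("SELECT name, course_no FROM student"):
        if student_course == course_no:   # test each record individually
            names.append(name)
    return sorted(names)


def names_in_course_declarative(conn, course_no):
    """Declarative style: the query states WHAT is wanted; the DBMS decides how."""
    rows = conn.execute(
        "SELECT name FROM student WHERE course_no = ? ORDER BY name", (course_no,)
    )
    return [name for (name,) in rows]


procedural = names_in_course_procedural(conn, 1)
declarative = names_in_course_declarative(conn, 1)
print(procedural, declarative)
```

Both functions return the same names; the difference lies in who does the searching, the application loop or the DBMS.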

A query is a statement requesting the retrieval of information. The portion of a DML that
involves information retrieval is called a query language. Although technically incorrect, it is
common practice to use the terms query language and data-manipulation language
synonymously.

Database Administrator:

One of the main reasons for using DBMSs is to have central control of both the data and the
programs that access those data. A person who has such central control over the system is called
a database administrator (DBA). The functions of a DBA include:

1) Schema definition: The DBA creates the original database schema by executing a set of
data definition statements in the DDL.

2) Security and Authorization: The Database Administrator (DBA) is responsible for
ensuring that unauthorized data access is not permitted. Users can be granted permission
to access only certain views and relations.

3) Schema and physical-organization modification: The Database Administrator (DBA)
carries out changes to the schema and physical organization to reflect the changing needs
of the organization.

4) Data Availability and Recovery from failures: The Database Administrator (DBA)
must take steps to ensure that if the system fails, users can continue to access as much of
the uncorrupted data as possible.
5) Routine maintenance: Examples of the database administrator’s routine maintenance
activities are:

• Periodically backing up the database, either onto tapes or onto remote servers, to
prevent loss of data in case of disasters such as flooding.
• Ensuring that enough free disk space is available for normal operations, and
upgrading disk space as required.

DBMS Interfaces:

User-friendly interfaces provided by a DBMS may include the following:

1) Menu-based Interfaces for Web Clients or Browsing:

These interfaces present the user with lists of options (called menus) that lead the user through
the formulation of a request. Menus do away with the need to memorize the specific commands
and syntax of a query language; rather, the query is composed step-by-step by picking options
from a menu that is displayed by the system.

2) Apps for Mobile Devices:

These interfaces present mobile users with access to their data. For example, banking,
reservations, and insurance companies, among many others, provide apps that allow users to
access their data through a mobile phone or mobile device.

3) Forms-based Interfaces:

A forms-based interface displays a form to each user. Users can fill out all of the form entries to
insert new data, or they can fill out only certain entries, in which case the DBMS will retrieve
matching data for the remaining entries.

Forms are usually designed and programmed for naive users. Many DBMSs have forms
specification languages, which are special languages that help programmers specify such forms.

SQL*Forms is a form-based language that specifies queries using a form designed in conjunction
with the relational database schema.

Oracle Forms is a component of the Oracle product suite that provides an extensive set of
features to design and build applications using forms.

4) Graphical User Interfaces:

A GUI typically displays a schema to the user in diagrammatic form. The user then can specify a
query by manipulating the diagram. In many cases, GUIs utilize both menus and forms.
5) Interfaces for Naive Users:

Parametric users, such as bank tellers, often have a small set of operations that they must perform
repeatedly. For example, a teller is able to use single function keys to invoke routine and
repetitive transactions such as account deposits or withdrawals, or balance inquiries. Systems
analysts and programmers design and implement a special interface for each known class of
naive users.

6) Interfaces for the DBA:

Most database systems contain privileged commands that can be used only by the DBA staff.
These include commands for creating accounts, setting system parameters, granting account
authorization, changing a schema, and reorganizing the storage structures of a database.

Classification of Database Management Systems:


Several criteria can be used to classify DBMSs.

➢ The first is the data model on which the DBMS is based. The main data model used in
many current commercial DBMSs is the relational data model, and the systems based on
this model are known as SQL systems. The object data model has been implemented in
some commercial systems but has not had widespread use. Many legacy applications still
run on database systems based on the hierarchical and network data models.

➢ The second criterion used to classify DBMSs is the number of users supported by the
system. Single-user systems support only one user at a time and are mostly used with
PCs. Multiuser systems, which include the majority of DBMSs, support concurrent
multiple users.

➢ The third criterion is the number of sites over which the database is distributed. A DBMS
is centralized if the data is stored at a single computer site. A centralized DBMS can
support multiple users, but the DBMS and the database reside totally at a single computer
site. A distributed DBMS (DDBMS) can have the actual database and DBMS software
distributed over many sites connected by a computer network.

➢ The fourth criterion is cost. It is difficult to propose a classification of DBMSs based on
cost. Today we have open source (free) DBMS products like MySQL and PostgreSQL
that are supported by third-party vendors with additional services.

The main RDBMS products are available as free 30-day evaluation copies as
well as personal versions, which may cost under $100 and allow a fair amount of
functionality.
Furthermore, they are sold in the form of licenses—site licenses allow unlimited use of
the database system with any number of copies running at the customer site.

➢ Finally, a DBMS can be general purpose or special purpose. When performance is a
primary consideration, a special-purpose DBMS can be designed and built for a specific
application; such a system cannot be used for other applications without major changes.
Many airline reservations and telephone directory systems developed in the past are
special-purpose DBMSs.
