University of Zimbabwe: Department of Computer Science 2013 Course Notes Contacts


UNIVERSITY OF ZIMBABWE

DEPARTMENT OF COMPUTER SCIENCE


Database Concepts CT211
2013 Course Notes
Instructor: Owen Kufandirimbwa
Contacts: kufandirimbwa@gmail.com
okufandirimbwa@science.uz.ac.zw
Cell: 0712 784 287, Office: Computer Science Office 112

Purpose of the course


This course introduces computer science students to the underlying concepts and theory of database systems.
Topics include database concepts, architectures, data models, query languages, conceptual and logical
design, physical organization and transaction management. The entity-relationship model and relational
model are discussed in detail as the basis for developing the conceptual model. The mapping from the
conceptual level to the logical level, integrity constraints, dependencies and normalization are studied as a
basis for formal design. Theoretical languages such as the relational algebra and the relational calculus are
described, and high-level languages such as SQL are discussed and practiced. File organization and access
methods are provided as a basis for discussion of query optimization techniques. Transaction processing
techniques are presented with a specific emphasis on concurrency control and database recovery. Although
the course is mainly introductory and covers many theoretical concepts, it also includes an examinable
practical component.
Aim

This course aims to theoretically and practically equip students with the necessary database concepts,
providing students with an introduction to database systems: what they are, how they are designed, and how
they are implemented.

Course objectives:

i. To enable students to understand the concepts and practical applications of database systems.
ii. To enable students to solve database-related problems.
iii. To encourage and motivate students to carry out research in the field of database systems.
Evaluation:

• Course work – 30%
o Theoretical and Practical Assignments – 12%
o Tests – 10%
o Quizzes and Tutorial Participation – 4%
o Homework – 4%
• Final Exam:
o Theory Exam – 60%
o Practical Exam – 40%
Recommended References:

• R. Elmasri and S. Navathe, Fundamentals of Database Systems, Benjamin/Cummings Publishing
Co., 1989 or later
• Rob and Coronel, Database Systems: Design, Implementation, and Management, Seventh
Edition
• Thomas Connolly and Carolyn Begg, Database Systems, Addison Wesley, 2002
• C. J. Date, Introduction to Database Systems
1. Introduction to databases
• File Systems
One way to keep information on a computer is to store it in permanent files. The
system has a number of application programs, each of which is designed to manipulate particular data
files. These application programs are written at the request of users in the
organization, and new applications are added to the system as the need arises. The system just
described is called a file-based system.
Consider a traditional banking system that uses the file-based approach to manage the
organization’s data, as shown in Figure 1. There are different departments in the
bank, each with its own applications that manage and manipulate different data
files. For a banking system, the programs might debit or credit an account, find the
balance of an account, add a new mortgage loan, or generate monthly statements.

Figure 1: File-based approach for banking system


Keeping organizational information in this way has a number of disadvantages,
including:

• Data redundancy: Because files and applications are created by different programmers in
various departments over a long period of time, several problems can arise:
o The same data may be stored in inconsistent formats.
o The same information may be kept in several different places (files), which wastes
storage space and duplicates effort.
o Data inconsistency, meaning that the various copies of the same data conflict
with one another.
• Data isolation
o It is difficult for a new application to retrieve the appropriate data, which might
be stored in various files.
• Integrity problems
o Data values must satisfy certain consistency constraints, which are specified in
the application programs.
o It is difficult to add to or change the programs to enforce a new constraint.
• Security problems
o There are constraints regarding access privileges.
o Applications are added to the system in an ad hoc manner, so it is difficult to
enforce those constraints.
• Concurrent-access anomalies
o Data may be accessed by many applications that have not been coordinated
with one another, so it is not easy to provide a strategy that lets multiple users
update data simultaneously.

These difficulties prompted the development of a new approach to managing large
amounts of organizational information: the database approach. The following section
introduces the concepts that were developed to overcome the problems mentioned.
• Database Approach

Databases and database technology play an important role in most areas where
computers are used, including business, education, and medicine. To understand the
fundamentals of database systems, we start by introducing the basic concepts in this area.

• Fundamental Concepts

A database is a shared collection of related data used to support the activities of a
particular organization. A database can be viewed as a repository of data that is defined once
and then accessed by various users. A database has the following properties:

• It is a representation of some aspect of the real world; that is, a collection of data
elements (facts) representing real-world information.
• It is logically coherent and internally consistent.
• It is designed, built, and populated with data for a specific purpose.

A Database Management System (DBMS) is a collection of programs that enables users to
create and maintain a database, and that controls all access to it. The primary goal of the
DBMS is to provide an environment that is both convenient and efficient for users to retrieve
and store information.
An application program accesses the data stored in the database by sending requests to the
DBMS.

Figure 2: The components of a database system


With the database approach, the traditional banking system can be organized as shown in
Figure 3.
Figure 3: Database approach for banking system

• Characteristics of the Database Approach

A number of characteristics distinguish the database approach from the file-based
approach. This section describes some of the most important ones.

1. Self-describing nature of a database system: A database system contains not only the
database itself but also a description of the data structures and constraints (meta-data).
This information is used by the DBMS software, and by database users when needed. This
separation makes a database system quite different from a traditional file-based system,
in which the data definition is part of the application programs.
2. Insulation between programs and data: In a file-based system, the structure of the
data files is defined in the application programs, so if a user wants to change the structure
of a file, all the programs that access that file might need to be changed. In the
database approach, by contrast, the data structure is stored in the system catalog rather than
in the programs, so such program changes are usually unnecessary.
3. Support for multiple views of data: A view is a subset of the database that is defined
for and dedicated to particular users of the system. Different users might
have different views of the database. Each view might contain only the data of interest
to a user or a group of users.
4. Sharing of data and multiuser operation: A multiuser database system must allow
multiple users to access the database at the same time. As a result, a multiuser
DBMS must have concurrency control strategies to ensure that, when several users try to
access the same data item at the same time, they do so in a manner that keeps the data
correct.
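
The self-describing nature of a database can be seen by asking a DBMS for its own catalog. The following is a minimal sketch using SQLite through Python's standard sqlite3 module; the account table and its columns are hypothetical and chosen only for illustration, and other DBMSs expose their catalogs under different names.

```python
import sqlite3

# Hypothetical in-memory database; the table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, balance REAL NOT NULL)")

# SQLite keeps its catalog (the meta-data) in a table called sqlite_master.
catalog = conn.execute("SELECT type, name FROM sqlite_master").fetchall()

# PRAGMA table_info returns one row per column: (cid, name, type, notnull, default, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(account)")]

print(catalog)
print(columns)
```

The application never declares the record layout itself; it asks the DBMS, which is what separates this approach from a file-based program with the layout compiled in.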

• Benefits of the Database Approach

1. Controlling data redundancy
o In the database approach, ideally each data item is stored in only one place in
the database.
o In some cases redundancy still exists to improve system
performance, but such redundancy is controlled and kept to a minimum.
2. Data sharing
o Integrating all the data in an organization makes it possible to
produce more information from a given amount of data.
3. Enforcing integrity constraints
o DBMSs should provide capabilities to define and enforce constraints
such as data types and data uniqueness.
4. Restricting unauthorised access
o Not all users of the system have the same access privileges.
o DBMSs should provide a security subsystem to create and control user
accounts.
5. Data independence
o The system's data descriptions are separated from the application programs.
o Changes to the data structure are handled by the DBMS and are not embedded in
the programs.
6. Transaction processing
o The DBMS must include a concurrency control subsystem to ensure that several
users trying to update the same data do so in a controlled manner, so that the
result of the updates is correct.
7. Providing multiple views of data
o A view may be a subset of the database. Different users may have different
views of the same database.
o Users need not be aware of how and where the data they refer to is
stored.
8. Providing backup and recovery facilities
o If the computer system fails in the middle of a complex update program, the
recovery subsystem is responsible for making sure that the database is restored
to the state it was in before the program started executing.
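
Integrity enforcement and recovery (benefits 3 and 8) can be illustrated together. The sketch below assumes a hypothetical account table with a rule that balances may not go negative; it uses SQLite through Python's sqlite3 module and is only one possible illustration, not a description of how any particular banking system works.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK clause is an integrity constraint enforced by the DBMS, not by the application.
conn.execute(
    "CREATE TABLE account (acc_no INTEGER PRIMARY KEY, "
    "balance REAL NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Credit dst, then debit src; if either step fails, undo both."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE account SET balance = balance + ? WHERE acc_no = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE acc_no = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, 1, 2, 30.0)       # succeeds
failed = transfer(conn, 1, 2, 500.0)  # would overdraw account 1, so it is rolled back
balances = conn.execute("SELECT acc_no, balance FROM account ORDER BY acc_no").fetchall()
print(ok, failed, balances)
```

In the failed transfer the credit to account 2 has already been applied when the debit is rejected, and the rollback removes it, so the database returns to the state it was in before the update began.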

• Database User Types

• End users

People whose jobs require access to the database for querying, updating, and generating reports.
An end user might be one of the following:

• Naïve users, who use existing application programs to perform their daily tasks.
• Sophisticated users, who access the database in their own way. They
do not use the application programs provided with the system; instead, they might
write their own applications or express their needs directly in a query language.
• Specialized users, who maintain personal databases using ready-made program
packages that provide easy-to-use menus.

• Application programmers

People who implement specific application programs to access the stored data. Users of this kind
need to be familiar with the DBMS to accomplish their task.

• Database administrators

A person or group of people in the organization responsible for authorizing
access to the database, monitoring its use, and managing all the resources that support the use of
the whole database system.
• Database Architecture

Three-level Architecture (ANSI/SPARC Architecture)

The three-level architecture for database systems was proposed to achieve the characteristics of the
database approach. The goal of this architecture is to separate the applications from the physical
database, so that the actual details of how data is organized are hidden from the users.

Figure 6: Three-level Architecture

As Figure 6 shows, there are three levels of schemas in the database
architecture.
External level:

• At this highest level there are a number of views, each of which defines a part of the
actual database.
• Each view is provided for a user or a group of users, which simplifies the
interaction between the user and the system.

Conceptual level: The conceptual schema at this level describes the logical structure of the whole
database.

• The entire database is described using simple logical concepts such as objects, their
properties, and their relationships. Thus the complexity of the implementation details of the
data is hidden from the users.

Internal level: The internal schema at this level describes how the data are actually stored and
how they are accessed.
External View

A user is anyone who needs to access some portion of the data. Users range from
application programmers to casual users with ad hoc queries. Each user has a language at
his/her disposal.

The application programmer may use a high-level language (e.g. COBOL), while the casual
user will probably use a query language.

Regardless of the language used, it will include a data sublanguage (DSL): the subset
of the language that is concerned with the storage and retrieval of information in the database,
and which may or may not be apparent to the user.

A DSL is a combination of two languages:

• a data definition language (DDL), which provides for the definition or description of database
objects
• a data manipulation language (DML), which supports the manipulation or processing of database
objects.
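
SQL is one data sublanguage that contains both parts. As a small sketch, with Python acting as the host language and a hypothetical customer table (the names and values are assumptions for illustration), the first statement below is DDL and the remaining ones are DML:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define a database object (a hypothetical customer table).
conn.execute("CREATE TABLE customer (cname TEXT PRIMARY KEY, caddr TEXT, balance REAL)")

# DML: insert, modify, delete and retrieve data.
conn.execute("INSERT INTO customer VALUES ('Moyo', 'Harare', 120.0)")
conn.execute("INSERT INTO customer VALUES ('Dube', 'Bulawayo', 0.0)")
conn.execute("UPDATE customer SET balance = balance + 30 WHERE cname = 'Dube'")
conn.execute("DELETE FROM customer WHERE cname = 'Moyo'")
rows = conn.execute("SELECT cname, balance FROM customer").fetchall()

print(rows)
```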

Each user sees the data in terms of an external view. The external view is defined by an external schema,
which consists of descriptions of each of the various types of external record in that
external view, together with a definition of the mapping between the external schema and the
underlying conceptual schema.

Conceptual View

• An abstract representation of the entire information content of the database.
• It is in general a view of the data as it actually is; that is, it is a `model' of the `real world'.
• It consists of multiple occurrences of multiple types of conceptual record, defined in the
conceptual schema.
• To achieve data independence, the definitions of conceptual records must involve information
content only:
o storage structure is ignored
o access strategy is ignored
• In addition to definitions, the conceptual schema contains authorisation and validation
procedures.

Internal View

The internal view is a low-level representation of the entire database consisting of multiple
occurrences of multiple types of internal (stored) records.

It is, however, at one remove from the physical level, since it does not deal in terms of physical
records or blocks, nor with any device-specific constraints such as cylinder or track sizes.
Details of the mapping to physical storage are highly implementation-specific and are not
expressed in the three-level architecture.

The internal view is described by the internal schema, which defines:

• the various types of stored record
• what indices exist
• how stored fields are represented
• what physical sequence the stored records are in

In effect the internal schema is the storage structure definition.

Mappings

• The conceptual/internal mapping:
o defines the correspondence between the conceptual view and the internal view
o specifies the mapping from conceptual records to their stored counterparts
• An external/conceptual mapping:
o defines the correspondence between a particular external view and the conceptual view
• A change to the storage structure definition means that the conceptual/internal mapping must be
changed accordingly, so that the conceptual schema may remain invariant, achieving physical
data independence.
• A change to the conceptual definition means that the external/conceptual mapping must be
changed accordingly, so that the external schema may remain invariant, achieving logical data
independence.
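
In a relational DBMS an external/conceptual mapping is commonly expressed as a view definition. The sketch below is an illustration under assumed names (emps, emps_v2, payroll): the conceptual definition changes, only the view is rewritten, and the application's query against the view stays the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emps (ename TEXT PRIMARY KEY, salary REAL)")
conn.execute("INSERT INTO emps VALUES ('Chipo', 900.0)")
# External view used by a hypothetical payroll application.
conn.execute("CREATE VIEW payroll AS SELECT ename, salary FROM emps")

def read_payroll(conn):
    """The application only ever queries the view, never the base tables."""
    return conn.execute("SELECT ename, salary FROM payroll ORDER BY ename").fetchall()

before = read_payroll(conn)

# Conceptual change: salary is split into basic pay and an allowance.
conn.execute("CREATE TABLE emps_v2 (ename TEXT PRIMARY KEY, basic REAL, allowance REAL)")
conn.execute("INSERT INTO emps_v2 SELECT ename, salary - 100, 100 FROM emps")
conn.execute("DROP VIEW payroll")
conn.execute("DROP TABLE emps")
# Only the external/conceptual mapping (the view definition) is rewritten.
conn.execute("CREATE VIEW payroll AS SELECT ename, basic + allowance AS salary FROM emps_v2")

after = read_payroll(conn)
print(before, after)
```

Because read_payroll is untouched and returns the same rows, the external schema has remained invariant, which is what logical data independence means.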

DBMS

The database management system (DBMS) is the software that:

• handles all access to the database
• is responsible for applying the authorisation checks and validation procedures

DBMS examples include MySQL, PostgreSQL, Microsoft Access, SQL Server, FileMaker, Oracle,
dBASE, Clipper, and FoxPro.

Conceptually what happens is:

1. A user issues an access request, using some particular DML.
2. The DBMS intercepts the request and interprets it.
3. The DBMS inspects in turn the external schema, the external/conceptual mapping, the
conceptual schema, the conceptual/internal mapping, and the storage structure definition.
4. The DBMS performs the necessary operations on the stored database.

Database Administrator

The database administrator (DBA) is the person (or group of people) responsible for overall
control of the database system. The DBA's responsibilities include the following:

• deciding the information content of the database, i.e. identifying the entities of interest to the
enterprise and the information to be recorded about those entities. This is defined by writing
the conceptual schema using the DDL
• deciding the storage structure and access strategy, i.e. how the data is to be represented, by
writing the storage structure definition. The associated conceptual/internal mapping must also
be specified using the DDL
• liaising with users, i.e. ensuring that the data they require is available and writing the
necessary external schemas and external/conceptual mappings (again using the DDL)
• defining authorisation checks and validation procedures. These
are extensions to the conceptual schema and can be specified using the DDL
• defining a strategy for backup and recovery, for example periodic dumping of the database to
a backup tape and procedures for reloading the database from the backup. A log file, in which
each log record contains the values of database items before and after a change, can also be
used for recovery purposes
• monitoring performance and responding to changes in requirements, i.e. changing details of
storage and access, thereby organising the system so as to get the performance that is `best for
the enterprise'

Facilities and Limitations

The facilities offered by DBMS vary a great deal, depending on their level of sophistication.
In general, however, a good DBMS should provide the following advantages over a
conventional system:

• Independence of data and program - This is a prime advantage of a database. Both the
database and the user program can be altered independently of each other, thus saving the time
and money that would otherwise be required to keep them consistent.
• Data shareability and non-redundancy of data - The ideal situation is to enable applications to
share an integrated database containing all the data needed by the applications, and thus
eliminate as far as possible the need to store data redundantly.
• Integrity - With many different users sharing various portions of the database, it is impossible
for each user to be responsible for the consistency of the values in the database and for
maintaining the relationships of the user's data items to all other data items, some of which may
be unknown to the user or even prohibited for the user to access.
• Centralised control - With central control of the database, the DBA can ensure that standards
are followed in the representation of data.
• Security - Having control over the database, the DBA can ensure that access to the database is
through proper channels and can define the access rights of any user to any data items or
defined subset of the database. The security system must prevent corruption of the existing
data, either accidentally or maliciously.
• Performance and Efficiency - In view of the size of databases and of demanding database
accessing requirements, good performance and efficiency are major requirements. Knowing
the overall requirements of the organisation, as opposed to the requirements of any individual
user, the DBA can structure the database system to provide an overall service that is `best for
the enterprise'.

Data Independence

• This is a prime advantage of a database. Both the database and the user program can be
altered independently of each other.
• In a conventional system applications are data-dependent. This means that the way in which
the data is organised in secondary storage and the way in which it is accessed are both
dictated by the requirements of the application, and, moreover, that knowledge of the data
organisation and access technique is built into the application logic.
• For example, if a file is stored in indexed sequential form then an application must know
o that the index exists
o the file sequence (as defined by the index)

The internal structure of the application will be built around this knowledge. If, for example,
the file were to be replaced by a hash-addressed file, major modifications would have to be
made to the application.
Such an application is data-dependent: it is impossible to change the storage structure (how
the data is physically recorded) or the access strategy (how it is accessed) without affecting
the application, probably drastically. The portions of the application requiring alteration are
those that communicate with the file handling software; the difficulties involved are quite
irrelevant to the problem the application was written to solve.

• It is undesirable to allow applications to be data-dependent, because different applications will need
different views of the same data.
• The DBA must have the freedom to change storage structure or access strategy in response to
changing requirements without having to modify existing applications.
• Data independence can be defined as
`The immunity of applications to change in storage structure and access strategy'.
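
The definition can be checked on a small scale. In the sketch below (SQLite through Python's sqlite3 module; the items table and index name are hypothetical), an index is added after the application's query has been written. The access strategy chosen by the DBMS changes, but the query text and its result do not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item_no INTEGER PRIMARY KEY, iname TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, "soap"), (2, "bread"), (3, "milk")])

# The application's query: it says what is wanted, not how to find it.
QUERY = "SELECT item_no FROM items WHERE iname = ?"

def plan(conn):
    # The last column of EXPLAIN QUERY PLAN is a human-readable description of the strategy.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("milk",)))

result_before, plan_before = conn.execute(QUERY, ("milk",)).fetchall(), plan(conn)

# The DBA changes the storage structure by adding an index; the application is not modified.
conn.execute("CREATE INDEX idx_items_iname ON items (iname)")

result_after, plan_after = conn.execute(QUERY, ("milk",)).fetchall(), plan(conn)
print(plan_before)
print(plan_after)
```

The exact wording of the plan text varies between SQLite versions, so the sketch only relies on the index name appearing in the second plan.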

Data Redundancy

In non-database systems each application has its own private files. This can often lead to
redundancy in stored data, with resultant waste in storage space. In a database the data is
integrated.

The database may be thought of as a unification of several otherwise distinct data files, with
any redundancy among those files partially or wholly eliminated.

Data integration is generally regarded as an important characteristic of a database. The
avoidance of redundancy should be an aim; however, the vigour with which this aim should
be pursued is open to question.

Redundancy is

• direct if a value is a copy of another
• indirect if the value can be derived from other values:
o storing the derived value simplifies retrieval but complicates update
o conversely, integration makes retrieval slower and updates easier
• Data redundancy can lead to inconsistency in the database unless controlled.
• The system should be aware of any data duplication, and it is responsible for ensuring that
updates are carried out correctly.
• A database with uncontrolled redundancy can be in an inconsistent state, in which it can supply
incorrect or conflicting information.
• A given fact represented by a single entry cannot result in inconsistency. Few systems are
capable of propagating updates, i.e. most systems do not support controlled redundancy.
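
The difference between duplicated entries and a single entry can be sketched with ordinary Python dictionaries standing in for files. All names, keys and addresses below are made up for illustration.

```python
# Two hypothetical departmental files, each keeping its own copy of a customer's address.
loans_file = {"C001": {"name": "T. Moyo", "address": "12 Samora Machel Ave"}}
savings_file = {"C001": {"name": "T. Moyo", "address": "12 Samora Machel Ave"}}

# Uncontrolled (direct) redundancy: only one of the copies is updated.
loans_file["C001"]["address"] = "5 Fife St"
inconsistent = loans_file["C001"]["address"] != savings_file["C001"]["address"]

# Single entry: both applications refer to one customer record by its key.
customers = {"C001": {"name": "T. Moyo", "address": "12 Samora Machel Ave"}}
loans = [{"customer": "C001", "amount": 500}]
savings = [{"customer": "C001", "balance": 80}]

customers["C001"]["address"] = "5 Fife St"
addr_via_loans = customers[loans[0]["customer"]]["address"]
addr_via_savings = customers[savings[0]["customer"]]["address"]

print(inconsistent, addr_via_loans, addr_via_savings)
```

With a single entry the update is made once and every application sees it, at the cost of an extra lookup on retrieval.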

Data Integrity

This describes the problem of ensuring that the data in the database is accurate...

• Inconsistencies between two entries representing the same `fact' are an example of a lack of
integrity (caused by redundancy in the database).
• Integrity constraints can be viewed as a set of assertions to be obeyed when updating a database,
in order to preserve an error-free state.
• Even if redundancy is eliminated, the database may still contain incorrect data.
• The important integrity checks are checks on data items and on record types.
Integrity checks on data items can be divided into 4 groups:

1. type checks
o e.g. ensuring a numeric field is numeric and not a character - this check should be
performed automatically by the DBMS.
2. redundancy checks
o direct or indirect (see data redundancy) - this check is not automatic in most cases.
3. range checks
o e.g. to ensure a data item value falls within a specified range of values, such as
checking an age so that, say, (age > 0 AND age < 110).
4. comparison checks
o in this check a function of a set of data item values is compared against a function of
another set of data item values. For example, the max salary for a given set of
employees must be less than the min salary for the set of employees on a higher
salary scale.
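
Type checks and range checks can be declared to the DBMS rather than coded in each application. The sketch below uses SQLite CHECK constraints on a hypothetical patient table; it assumes SQLite's typeof() function for the type check, and other DBMSs enforce column types in their own ways.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE patient (
        pid  INTEGER PRIMARY KEY,
        age  INTEGER NOT NULL
             CHECK (typeof(age) = 'integer')   -- type check
             CHECK (age > 0 AND age < 110)     -- range check
    )
""")

def try_insert(pid, age):
    """Attempt an insert and report whether the DBMS accepted it."""
    try:
        with conn:
            conn.execute("INSERT INTO patient VALUES (?, ?)", (pid, age))
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

results = [try_insert(1, 42), try_insert(2, "forty"), try_insert(3, 150)]
print(results)
```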

A record type may have constraints on the total number of occurrences, or on the insertion
and deletion of records. For example, in a patient database there may be a limit on the
number of X-ray results for each patient, or the details of a patient's visit to hospital may have to be
kept for a minimum of 5 years before they can be deleted.

• Centralized control of the database helps maintain integrity, and permits the DBA to define
validation procedures to be carried out whenever any update operation is attempted (update
covers modification, creation and deletion).
• Integrity is important in a database system: an application run without validation procedures
can produce erroneous data, which can then affect other applications using that data.

• Database Development Process

Database development is a systematic process that moves from concept to design to
implementation. It also takes into account the needs of potential users and the operational
and/or business processes in the organization.

Figure: A comparison between the information systems development life cycle and the database
development process

Data Models, Schemas and Instances

A data model is a collection of concepts that can be used to describe the structure of a database.
The structure of a database means its data types, relationships, and constraints. In addition, most data
models include a set of basic operations for specifying retrievals and modifications on the
database. A data model provides a means to achieve data abstraction. Data abstraction
refers to the hiding of certain details of how the data are stored and maintained. With several
levels of abstraction, the user’s view of the database is simplified, which leads to an
improved understanding of the data.
There are three levels of abstraction:

1. View level: The highest level of abstraction describes only part of the entire database.
Many users are not concerned with the whole database; they need to access
only a part of it, and the view level is defined for that purpose. There can be many views of the
same database.
2. Logical level: This level describes what data are stored in the whole database.
3. Physical level: The lowest level of abstraction describes how the data are actually
stored.

Categories of Data Model

• High-level conceptual data models: Provide concepts that are close to the way people
perceive data. A typical example is the entity-relationship
model, which uses concepts such as entities, attributes, and relationships. An entity
represents a real-world object such as an employee or a project. An entity has
attributes that represent its properties, such as an employee’s name, address, and
birthdate. A relationship represents an association among entities, for example a works-on
relationship between an employee and a project.

• Record-based logical data models: Provide concepts that can be understood by the
user but are not too far from the way data is stored in the computer. Three well-known
data models of this type are the relational data model, the network data model, and the
hierarchical data model.
o The relational model represents data as relations. Here is an example of a relational
schema for the SUPERMARKET database:

EMPS (ENAME, SALARY)                   MANAGERS (ENAME)
DEPTS (DNAME, DEPT#)                   SUPPLIERS (SNAME, SADDR)
ITEMS (INAME, ITEM#)                   ORDERS (O#, DATE)
WORKS_IN (ENAME, DNAME)                MANAGES (ENAME, DNAME)
CARRIES (INAME, DNAME)                 PLACED_BY (O#, CNAME)
CUSTOMERS (CNAME, CADDR, BALANCE)      INCLUDES (O#, INAME, QUANTITY)
SUPPLIES (SNAME, INAME, PRICES)

o The network model represents data as record types and also represents a limited type of
one-to-many relationship, called a set type. The figure below shows a schema in network
model notation.

Figure 4: Sample schema in network model

o The hierarchical model represents data as hierarchical tree structures. Each hierarchy
represents a number of related records. Here is the schema in hierarchical model notation.
Figure 5: Sample schema in hierarchical model notation

• Physical data models: Provide concepts that describe how data is actually stored in the
computer.

Database Instances and Schemas

The description of the database, which is designed at an early stage and is not expected to
change frequently, is called the database schema. A database system has several schemas.
Because information can be inserted into or deleted from the database at any time, the database changes
over time. The collection of information stored in the database at a particular moment is
called an instance of the database.
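
The distinction can be made concrete with three relations from the SUPERMARKET schema listed earlier. In this sketch the CREATE TABLE statements are the schema, and the rows present at any moment are the instance. The choice of keys, the column types and the sample rows are assumptions made for illustration, since the notes give only relation and attribute names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: defined once (key and type choices here are assumptions, not given in the notes).
conn.executescript("""
    CREATE TABLE EMPS     (ENAME TEXT PRIMARY KEY, SALARY REAL);
    CREATE TABLE DEPTS    (DNAME TEXT PRIMARY KEY, "DEPT#" INTEGER);
    CREATE TABLE WORKS_IN (ENAME TEXT REFERENCES EMPS, DNAME TEXT REFERENCES DEPTS);
""")
schema = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

def instance():
    """The rows stored at this moment, joined across the three relations."""
    return conn.execute("""
        SELECT E.ENAME, E.SALARY, D."DEPT#"
        FROM EMPS E
        JOIN WORKS_IN W ON W.ENAME = E.ENAME
        JOIN DEPTS D    ON D.DNAME = W.DNAME
        ORDER BY E.ENAME
    """).fetchall()

empty = instance()  # instance right after the schema is defined
conn.execute("INSERT INTO EMPS VALUES ('Chipo', 900)")
conn.execute("INSERT INTO DEPTS VALUES ('Bakery', 1)")
conn.execute("INSERT INTO WORKS_IN VALUES ('Chipo', 'Bakery')")
populated = instance()  # a later instance of the same schema

print(schema, empty, populated)
```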

Data Independence

Data independence is the ability to modify the schema at one level without affecting the
schema at the next higher level.
There are two levels of data independence:

• Logical data independence is the ability to change the conceptual schema
without having to change the user views or application programs.
• Physical data independence is the ability to change the internal schema
without having to change the conceptual schema or application programs.

Physical data independence is easier to achieve, since the way the data is organized
in storage affects only the performance of the system, whereas application
programs depend heavily on the logical structure of the data they access.

Database Language

Data Definition Language (DDL): This is used to define the conceptual and internal schemas
for a database system.

• It is not a procedural language, but rather a language for describing the types of entities and
relationships among them in terms of a particular data model.

Data Manipulation Language (DML): This is used to manipulate the database, which
typically includes retrieval, insertion, deletion, and modification of the data.

Classification of Database Systems

Database management systems can be classified according to several criteria.

• Based on data model: The most popular data model in today's commercial DBMSs is the
relational data model. Almost all well-known DBMSs, such as Oracle, MS SQL Server, DB2, and
MySQL, support this model. Other, traditional models are the hierarchical
data model and the network data model. In recent years the
object-oriented data model has become more familiar, but it has not had widespread use. Some
examples of object-oriented DBMSs are O2, ObjectStore, and Jasmine.
• Based on number of users: a single-user database system supports one
user at a time, while a multiuser system supports multiple users concurrently.
• Based on how the database is distributed: centralized or distributed database
systems.
o Centralized database system: Data in this kind of system is stored at a single
site.
o Distributed database system: The actual database and the DBMS software are
distributed over various sites connected by a computer network.
  • Homogeneous distributed database systems: the same DBMS software is used at
    multiple sites, so data exchange between sites can be handled easily.
  • Heterogeneous distributed database systems: different sites might use different
    DBMS software, with additional software to support data exchange between sites.

Figure 7: Centralized Database System

Figure 8: Distributed Database System

• The different database models – hierarchical, relational, network
The Hierarchical Data Model

The Hierarchical Data Model structures data in a tree of records, with each record
having at most one parent record and possibly many children. It can be represented as follows:

Figure 1 - The Hierarchical Data Model


A hierarchical database consists of the following:

1. It contains nodes connected by branches.

2. The top node is called the root.

3. If multiple nodes appear at the top level, the nodes are called root segments.

4. The parent of node nx is a node directly above nx and connected to nx by a branch.

5. Each node (with the exception of the root) has exactly one parent.

6. The child of node nx is the node directly below nx and connected to nx by a branch.

7. One parent may have many children.
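
The properties listed above can be sketched as a small tree structure in Python. The record names are hypothetical, and the class is only an illustration of the one-parent rule, not a model of any particular hierarchical DBMS.

```python
class Node:
    """A record in a hierarchy: at most one parent, any number of children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent          # None only for the root
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path_from_root(self):
        """Follow the single parent link upward, then reverse to get root-first order."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names[::-1]

root = Node("Department")
employee = Node("Employee", parent=root)
project = Node("Project", parent=root)
skill = Node("Skill", parent=employee)

print(skill.path_from_root())
```

Because every node has exactly one parent (except the root), there is exactly one path from the root to any record, which is why access in a hierarchical database follows fixed paths.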

By introducing data redundancy, complex network structures can also be represented as
hierarchical databases. This redundancy is eliminated in physical implementation by
including a 'logical child'. The logical child contains no data but uses a set of pointers to
direct the database management system to the physical child in which the data is
actually stored. Associated with a logical child are a physical parent and a logical parent.
The logical parent provides an alternative (and possibly more efficient) path to retrieve
logical child information.

The Network Data Model

The Network Data Model uses a lattice structure in which a record can have many
parents as well as many children. It can be represented as follows:

Figure 2 - The Network Data Model


Like the Hierarchical Data Model, the Network Data Model also consists of nodes
and branches, but a child may have multiple parents within the network structure instead
of being restricted to just one.
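
As a rough sketch of this difference, the links of a network structure can be held in two dictionaries, one from each owner (parent) record to its members (children) and one in the other direction. The record names are invented for illustration.

```python
from collections import defaultdict

children_of = defaultdict(list)  # owner record  -> member records
parents_of = defaultdict(list)   # member record -> owner records

def link(owner, member):
    """Record a one-to-many link from an owner to a member."""
    children_of[owner].append(member)
    parents_of[member].append(owner)

link("Supplier A", "Order 17")
link("Customer B", "Order 17")   # the same child gets a second parent
link("Customer B", "Order 18")

print(parents_of["Order 17"])
print(children_of["Customer B"])
```

In a strict hierarchy the second link to "Order 17" would not be allowed without duplicating the record.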

Hierarchical and network databases both suffer from the following deficiencies (when
compared with relational databases):

• Access to the database was not via SQL query strings, but via a specific set of APIs,
typically for FIND, CREATE, READ, UPDATE and DELETE.

• Each API would only access a single table (dataset), so it was not possible to implement a
JOIN which would return data from several tables.

• It was not possible to provide a variable WHERE clause. The only selection mechanisms
available were:

o read all entries (a full table scan).

o read a single entry using a specific primary key.

o read all entries on a child table which were associated with a selected entry on a
parent table.

Any further filtering had to be done within the application code.

• It was not possible to provide an ORDER BY clause. Data was presented in the order in
which it existed in the database. This mechanism could be tuned by specifying sort criteria
to be used when each record was inserted, but this had several disadvantages:

o Only a single sort sequence could be defined for each path (link to a parent), so all
records retrieved on that path would be provided in that sequence.

o It could make inserts rather slow when attempting to insert into the middle of a large
collection, or where a table had multiple paths each with its own set of sort criteria.
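
The contrast can be sketched in Python. The three read functions below imitate the only selection mechanisms described above, so filtering and sorting fall to the application; the same request is then expressed as a single SQL statement. The data and function names are hypothetical and do not correspond to the API of any real hierarchical or network DBMS.

```python
import sqlite3

# Navigational style: a dataset keyed by primary key, plus parent -> child links.
orders = {1: {"customer": "Moyo", "total": 40},
          2: {"customer": "Dube", "total": 95},
          3: {"customer": "Moyo", "total": 120}}
orders_by_customer = {"Moyo": [1, 3], "Dube": [2]}

def read_all():               # full scan
    return list(orders.values())

def read_by_key(key):         # single entry by primary key
    return orders[key]

def read_children(parent):    # all child entries of a selected parent entry
    return [orders[k] for k in orders_by_customer[parent]]

# Filtering and sorting had to be written in the application code.
big_navigational = sorted(
    (o["total"] for o in read_children("Moyo") if o["total"] > 50), reverse=True)

# Relational style: the WHERE and ORDER BY clauses are handled by the DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(k, o["customer"], o["total"]) for k, o in orders.items()])
big_sql = [row[0] for row in conn.execute(
    "SELECT total FROM orders WHERE customer = ? AND total > ? ORDER BY total DESC",
    ("Moyo", 50))]

print(big_navigational, big_sql)
```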

Homework One:
With the aid of examples, illustrate how the shortcomings of the file-based
system were addressed by the database approach. [30 marks]

Due: 02/09/2013 at 15:00
