University of Zimbabwe: Department of Computer Science 2013 Course Notes Contacts
This course aims to theoretically and practically equip students with the necessary database concepts,
providing students with an introduction to database systems: what they are, how they are designed, and how
they are implemented.
Course objectives:
i. To enable students to understand the concepts and practical applications of database systems.
ii. To enable students to solve database-related problems.
iii. To encourage and motivate students to carry out research in the field of database systems.
Evaluation:
Data Redundancy: Since files and applications are created by different programmers in
various departments over a long period of time, several problems may arise:
o Inconsistency in data format.
o The same information may be kept in several different places (files).
o Data inconsistency, which means that various copies of the same data
conflict; this wastes storage space and duplicates effort.
Data Isolation
o It is difficult for a new application to retrieve the appropriate data, which might
be stored in various files.
Integrity problems
o Data values must satisfy certain consistency constraints which are specified in
the application programs.
o It is difficult to add or change programs to enforce new constraints.
Security problems
o There are constraints regarding access privileges.
o Applications are added to the system in an ad hoc manner, so it is difficult to
enforce those constraints.
Concurrent-access anomalies
o Data may be accessed by many applications that have not been coordinated
beforehand, so it is not easy to provide a strategy that lets multiple users
update data simultaneously.
These difficulties have prompted the development of a new approach to managing large
amounts of organizational information: the database approach. In the following section, we shall
see the concepts that have been introduced to overcome the problems mentioned.
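The redundancy and inconsistency problems listed above can be sketched with a small example. Two departments each keep their own copy of a customer's address; the names sales_file and billing_file, the find_conflicts helper and the sample data are all assumptions made for illustration. When only one copy is updated, the two files disagree.

```python
# Hypothetical example: two departments each keep a private "file"
# (modelled here as a dictionary) holding the same customer's address.
sales_file = {"C001": {"name": "T. Moyo", "address": "12 First Street"}}
billing_file = {"C001": {"name": "T. Moyo", "address": "12 First Street"}}

# The sales application records a change of address in its own file only.
sales_file["C001"]["address"] = "45 Second Avenue"


def find_conflicts(file_a, file_b):
    """Return the keys whose records differ between the two files."""
    return [key for key in file_a if key in file_b and file_a[key] != file_b[key]]


conflicts = find_conflicts(sales_file, billing_file)
print(conflicts)
```

Nothing in either application detects the disagreement on its own; it only shows up when the two files are compared explicitly.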
Database Approach
Databases and database technology play an important role in most areas where
computers are used, including business, education and medicine. To understand the
fundamentals of database systems, we start by introducing the basic concepts in this area.
Fundamental Concepts
A database is a shared collection of related data that is used to support the activities of
a particular organization. A database can be viewed as a repository of data that is defined once
and then accessed by various users. A database has the following properties:
There are a number of characteristics that distinguish the database approach from the
file-based approach. In this section, we describe some of the most important of those
characteristics in detail.
1. Self-Describing Nature of a Database System: A database system contains not only the
database itself but also a description of the data structure and constraints (meta-data).
This information is used by the DBMS software or by database users when needed. This
separation makes a database system quite different from a traditional file-based system,
in which the data definition is part of the application programs.
2. Insulation between Program and Data: In a file-based system, the structure of the
data files is defined in the application programs, so if a user wants to change the structure
of a file, all the programs that access that file might need to be changed. In the database
approach, on the other hand, the data structure is stored in the system catalog rather than
in the programs, so such changes to the programs are often not needed.
3. Support for multiple views of data: A view is a subset of the database that is defined
for and dedicated to particular users of the system. Different users of the system might
have different views of it. Each view might contain only the data of interest
to a user or a group of users.
4. Sharing of data and multiuser systems: A multiuser database system must allow
multiple users to access the database at the same time. As a result, a multiuser
DBMS must have concurrency control strategies to ensure that when several users try to
access the same data item at the same time, they do so in a way that keeps the data
correct.
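The first two characteristics can be sketched with Python's standard sqlite3 module. This is only an illustration under stated assumptions: SQLite stands in for a DBMS, and the employee table, its row and the employee_names function are hypothetical. The catalog table sqlite_master is where SQLite keeps its stored definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO employee (name) VALUES ('Chipo')")

# Self-describing: the table definition is stored in the catalog,
# which can itself be queried like ordinary data.
catalog = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
).fetchall()


def employee_names(connection):
    """An 'application program' that names only the columns it needs."""
    rows = connection.execute("SELECT name FROM employee ORDER BY id")
    return [row[0] for row in rows]


before = employee_names(conn)

# Program-data insulation: the structure changes, the program does not.
conn.execute("ALTER TABLE employee ADD COLUMN department TEXT")
after = employee_names(conn)
```

The program keeps working after the new column is added because the structure is recorded in the catalog, not in the program itself.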
People whose jobs require access to the database for querying, updating and generating reports.
An end user might be one of the following:
Naïve users, who use the existing application programs to perform their daily tasks.
Sophisticated users, who access the database in their own way. This means
they do not use the application programs provided in the system. Instead, they might
define their own applications or describe their needs directly in a query language.
Specialized users, who maintain personal databases by using ready-made program
packages that provide easy-to-use menus.
Application Programmer
People who implement specific application programs to access the stored data. This kind of user
needs to be familiar with the DBMS to accomplish their task.
Database Administrators
A person or a group of people in the organization who is responsible for authorizing
access to the database, monitoring its use and managing all the resources that support the use of
the whole database system.
Database Architecture
A three-level architecture for database systems is proposed to achieve the characteristics of the
database approach. The goal of this architecture is to separate the applications from the physical
database so that the actual details of how data is organized are hidden from the users.
External level: At this highest level, there exist a number of views, each of which defines a part of the
actual database.
Each view is provided for a user or a group of users, which helps simplify the
interaction between the user and the system.
Conceptual level: The conceptual schema at this level describes the logical structure of the whole
database.
The entire database is described using simple logical concepts such as objects, their
properties or relationships. Thus the complexity of the implementation details of the
data will be hidden from the users.
Internal level: The internal schema at this level describes how the data are actually stored
and how they are accessed.
External View
A user is anyone who needs to access some portion of the data. Users may range from
application programmers to casual users with ad hoc queries. Each user has a language at
his/her disposal.
The application programmer may use a high-level language (e.g. COBOL), while the casual
user will probably use a query language.
Regardless of the language used, it will include a data sublanguage (DSL), that is, the subset
of the language concerned with storage and retrieval of information in the database.
The DSL may or may not be apparent to the user. A DSL combines two languages:
a data definition language (DDL) - provides for the definition or description of database
objects
a data manipulation language (DML) - supports the manipulation or processing of database
objects.
Each user sees the data in terms of an external view, defined by an external schema. The external
schema consists basically of descriptions of each of the various types of external record in that
external view, and also a definition of the mapping between the external schema and the
underlying conceptual schema.
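An external view and its mapping to the conceptual schema can be sketched as an SQL view. This is a minimal illustration using Python's standard sqlite3 module; the employee table, its rows and the staff_directory view are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        department TEXT NOT NULL,
        salary REAL NOT NULL
    );
    INSERT INTO employee (name, department, salary) VALUES
        ('Chipo', 'Sales', 900.0),
        ('Tendai', 'Accounts', 1100.0);

    -- External view for a directory application: salary is not exposed.
    CREATE VIEW staff_directory AS
        SELECT name, department FROM employee;
""")

cursor = conn.execute("SELECT * FROM staff_directory ORDER BY name")
view_columns = [col[0] for col in cursor.description]
view_rows = cursor.fetchall()
print(view_columns, view_rows)
```

The view definition plays the role of the mapping: it states how each external record is derived from the underlying table.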
Conceptual View
Internal View
The internal view is a low-level representation of the entire database consisting of multiple
occurrences of multiple types of internal (stored) records.
It is, however, one step removed from the physical level, since it does not deal in terms of physical
records or blocks, nor with any device-specific constraints such as cylinder or track sizes.
Details of the mapping to physical storage are highly implementation-specific and are not
expressed in the three-level architecture.
Mappings
DBMS
DBMS examples include MySQL, PostgreSQL, Microsoft Access, SQL Server, FileMaker, Oracle,
dBASE, Clipper, and FoxPro
Database Administrator
The database administrator (DBA) is the person (or group of people) responsible for overall
control of the database system. The DBA's responsibilities include the following:
deciding the information content of the database, i.e. identifying the entities of interest to the
enterprise and the information to be recorded about those entities. This is defined by writing
the conceptual schema using the DDL
deciding the storage structure and access strategy, i.e. how the data is to be represented, by
writing the storage structure definition. The associated internal/conceptual mapping must also
be specified using the DDL
liaising with users, i.e. to ensure that the data they require is available and to write the
necessary external schemas and conceptual/external mapping (again using DDL)
defining authorisation checks and validation procedures. Authorisation checks and validation
procedures are extensions to the conceptual schema and can be specified using the DDL
defining a strategy for backup and recovery, for example periodic dumping of the database to
a backup tape and procedures for reloading the database from backup, together with use of a log
file in which each log record contains the values of database items before and after a change and
can be used for recovery purposes
monitoring performance and responding to changes in requirements, i.e. changing details of
storage and access thereby organising the system so as to get the performance that is `best for
the enterprise'
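The periodic-dump part of a backup and recovery strategy can be sketched with Python's standard sqlite3 module, whose Connection.backup method copies one database into another. This is only an illustration: the account table and its data are hypothetical, in-memory databases stand in for the live system and the backup medium, and the log-file technique mentioned above is not shown.

```python
import sqlite3

live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
live.execute("INSERT INTO account (balance) VALUES (250.0)")
live.commit()

# Periodic dump: copy the whole database to a backup database.
backup_copy = sqlite3.connect(":memory:")
live.backup(backup_copy)

# Simulate damage to the live database after the dump was taken.
live.execute("DELETE FROM account")
live.commit()

# Recovery: reload the live database from the backup copy.
backup_copy.backup(live)
restored = live.execute("SELECT id, balance FROM account").fetchall()
print(restored)
```

Any change made after the dump and before the failure would be lost with this approach alone, which is why a log of before and after values is kept as well.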
The facilities offered by a DBMS vary a great deal, depending on its level of sophistication.
In general, however, a good DBMS should provide the following advantages over a
conventional system:
Independence of data and program - This is a prime advantage of a database. Both the
database and the user program can be altered independently of each other thus saving time
and money which would be required to retain consistency.
Data shareability and non-redundancy of data - The ideal situation is to enable applications to
share an integrated database containing all the data needed by the applications and thus
eliminate as much as possible the need to store data redundantly.
Integrity - With many different users sharing various portions of the database, it is impossible
for each user to be responsible for the consistency of the values in the database and for
maintaining the relationships of the user data items to all other data items, some of which may
be unknown or even prohibited for the user to access.
Centralised control - With central control of the database, the DBA can ensure that standards
are followed in the representation of data.
Security - Having control over the database the DBA can ensure that access to the database is
through proper channels and can define the access rights of any user to any data items or
defined subset of the database. The security system must prevent corruption of the existing
data either accidentally or maliciously.
Performance and Efficiency - In view of the size of databases and of demanding database
accessing requirements, good performance and efficiency are major requirements. Knowing
the overall requirements of the organisation, as opposed to the requirements of any individual
user, the DBA can structure the database system to provide an overall service that is `best for
the enterprise'.
Data Independence
This is a prime advantage of a database. Both the database and the user program can be
altered independently of each other.
In a conventional system, applications are data-dependent. This means that the way in which
the data is organised in secondary storage and the way in which it is accessed are both
dictated by the requirements of the application, and, moreover, that knowledge of the data
organisation and access technique is built into the application logic.
For example, if a file is stored in indexed sequential form then an application must know
o that the index exists
o the file sequence (as defined by the index)
The internal structure of the application will be built around this knowledge. If, for example,
the file was to be replaced by a hash-addressed file, major modifications would have to be
made to the application.
Such an application is data-dependent - it is impossible to change the storage structure (how
the data is physically recorded) or the access strategy (how it is accessed) without affecting
the application, probably drastically. The portions of the application requiring alteration are
those that communicate with the file handling software - the difficulties involved are quite
irrelevant to the problem the application was written to solve.
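By way of contrast, the sketch below suggests how a DBMS lets the access strategy change without touching the application. It uses Python's standard sqlite3 module; the product table, its rows and the price_of function are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (code TEXT, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [("P2", "Sugar", 1.5), ("P1", "Bread", 1.0), ("P3", "Milk", 0.8)],
)


def price_of(connection, code):
    """Application query: says what is wanted, not how to find it."""
    row = connection.execute(
        "SELECT price FROM product WHERE code = ?", (code,)
    ).fetchone()
    return row[0] if row else None


before = price_of(conn, "P3")

# Change the access strategy: add an index on the lookup column.
conn.execute("CREATE INDEX idx_product_code ON product (code)")
after = price_of(conn, "P3")
```

The function never mentions the index, so adding or dropping it changes only how the DBMS finds the row, not what the application has to know.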
Data Redundancy
In non-database systems each application has its own private files. This can often lead to
redundancy in stored data, with resultant waste in storage space. In a database the data is
integrated.
The database may be thought of as a unification of several otherwise distinct data files, with
any redundancy among those files partially or wholly eliminated.
Redundancy is
Data Integrity
This describes the problem of ensuring that the data in the database is accurate...
Inconsistencies between two entries representing the same `fact' are an example of a lack of
integrity (caused by redundancy in the database).
Integrity constraints can be viewed as a set of assertions to be obeyed when updating a DB to
preserve an error-free state.
Even if redundancy is eliminated, the DB may still contain incorrect data.
The important integrity checks are checks on data items and on record types.
Integrity checks on data items can be divided into 4 groups:
1. type checks
o e.g. ensuring a numeric field is numeric and not a character - this check should be
performed automatically by the DBMS.
2. redundancy checks
o direct or indirect (see data redundancy) - this check is not automatic in most cases.
3. range checks
o e.g. to ensure a data item value falls within a specified range of values, such as
checking that an age is plausible, say (age > 0 AND age < 110).
4. comparison checks
o in this check a function of a set of data item values is compared against a function of
another set of data item values. For example, the max salary for a given set of
employees must be less than the min salary for the set of employees on a higher
salary scale.
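Type checks and range checks of the kind listed above can be sketched as CHECK constraints. This is a minimal illustration using Python's standard sqlite3 module; the patient table and the try_insert helper are hypothetical. SQLite does not enforce column types strictly by default, so the type check is written out explicitly with typeof.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE patient (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        age  INTEGER NOT NULL
             CHECK (typeof(age) = 'integer')      -- type check
             CHECK (age > 0 AND age < 110)        -- range check
    )
""")


def try_insert(name, age):
    """Return True if the DBMS accepts the row, False if a check rejects it."""
    try:
        conn.execute("INSERT INTO patient (name, age) VALUES (?, ?)", (name, age))
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert("Rudo", 34),       # valid
    try_insert("Farai", 150),     # fails the range check
    try_insert("Tapiwa", "abc"),  # fails the type check
]
```

Because the checks live in the table definition, every application that inserts into patient is subject to them, whether or not it validates its own input.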
A record type may have constraints on the total number of occurrences, or on the insertions
and deletions of records. For example, in a patient database there may be a limit on the
number of x-ray results for each patient, or the details of a patient's visit to hospital may have to be
kept for a minimum of 5 years before they can be deleted.
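A limit on the number of occurrences of a record type could be enforced with a trigger, as in the following sketch. It uses Python's standard sqlite3 module; the xray table, the limit_xrays trigger, the add_xray helper and the limit of 3 are all assumptions made for illustration.

```python
import sqlite3

MAX_XRAYS = 3  # hypothetical limit per patient

conn = sqlite3.connect(":memory:")
conn.executescript(f"""
    CREATE TABLE xray (patient_id INTEGER NOT NULL, result TEXT NOT NULL);

    -- Reject an insert once the patient already has MAX_XRAYS results.
    CREATE TRIGGER limit_xrays BEFORE INSERT ON xray
    WHEN (SELECT COUNT(*) FROM xray WHERE patient_id = NEW.patient_id) >= {MAX_XRAYS}
    BEGIN
        SELECT RAISE(ABORT, 'too many x-ray results for this patient');
    END;
""")


def add_xray(patient_id, result):
    """Return True if the row was stored, False if the trigger rejected it."""
    try:
        conn.execute("INSERT INTO xray VALUES (?, ?)", (patient_id, result))
        return True
    except sqlite3.DatabaseError:
        return False


outcomes = [add_xray(1, f"scan {n}") for n in range(1, 5)]
stored = conn.execute("SELECT COUNT(*) FROM xray").fetchone()[0]
```

The fourth insert for the same patient is refused by the DBMS itself, so the rule holds for every application that writes to the table.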
Centralized control of the database helps maintain integrity, and permits the DBA to define
validation procedures to be carried out whenever any update operation is attempted (update
covers modification, creation and deletion).
Integrity is important in a database system - an application run without validation procedures
can produce erroneous data which can then affect other applications using that data.
A data model is a collection of concepts that can be used to describe the structure of a database.
The structure of a database means its data types, relationships and constraints. In addition, most data
models include a set of basic operations for specifying retrievals and modifications on the
database. A data model provides a means to achieve data abstraction. Data abstraction
refers to the hiding of certain details of how the data are stored and maintained. With several
levels of abstraction, the user’s view of the database is simplified, and this leads to an
improved understanding of the data.
There are three levels of abstraction:
1. View level: The highest level of abstraction describes only part of the entire database.
Many users are not concerned with the whole database. Instead, they need to access
only a part of it, so the view level of abstraction is defined for them. There can be many
views of the same database.
2. Logical level: This level describes what data are stored in the whole database.
3. Physical level: The lowest level of abstraction describes how the data are actually
stored.
High-level Conceptual Data models: Provide concepts that are close to the way people
perceive data. A typical example of this type is the entity-relationship
model, which uses main concepts such as entities, attributes and relationships. An entity
represents a real-world object such as an employee or a project. An entity has
attributes that represent its properties, such as an employee’s name, address and
birthdate. A relationship represents an association among entities, for example a works-on
relationship between employee and project.
Record-based Logical Data models: Provide concepts that can be understood by the
user but are not too far from the way data is stored in the computer. Three well-known
data models of this type are the relational data model, the network data model and
the hierarchical data model.
o The Relational model represents data as relations. Here is an example of a relational
schema for the SUPERMARKET database
o The Network model represents data as record types and also represents a limited type of
one-to-many relationship, called a set type. The figure below shows a schema in network
model notation
o The Hierarchical model represents data as hierarchical tree structures. Each hierarchy
represents a number of related records. Here is the schema in hierarchical model notation.
Figure 5: Sample schema in hierarchical model notation
Physical Data models: Provide concepts that describe how data is actually stored in the
computer.
The description of the database, which is designed at an early stage and is not expected to
change frequently, is called the database schema. A database system has several schemas.
Since information can be inserted into or deleted from the database at any time, the database changes
over time. At a particular moment, the collection of information stored in the database is
called an instance of the database.
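The difference between schema and instance can be sketched as follows, using Python's standard sqlite3 module. The project table and the helper functions schema and instance are hypothetical names used only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")


def schema(connection):
    """The stored table definitions (the schema)."""
    rows = connection.execute("SELECT sql FROM sqlite_master WHERE type = 'table'")
    return [row[0] for row in rows]


def instance(connection):
    """The rows stored at this moment (the instance)."""
    return connection.execute("SELECT id, title FROM project ORDER BY id").fetchall()


schema_before, instance_before = schema(conn), instance(conn)
conn.execute("INSERT INTO project (title) VALUES ('Payroll')")
schema_after, instance_after = schema(conn), instance(conn)
```

Inserting a row changes the instance but leaves the schema exactly as it was.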
Data Independence
Data independence is the ability to modify the schema at one level without affecting the
schema at the next higher level.
There are two levels of data independence:
Logical data independence is the ability to change the conceptual schema
without requiring a change to the user views or application programs.
Physical data independence is the ability to change the internal schema
without requiring a change to the conceptual schema or application programs.
Physical data independence seems easier to achieve, since the way the data is organized
in storage affects only the performance of the system, whereas application
programs depend heavily on the logical structure of the data that they access.
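Logical data independence can be sketched with a view that shields an application from a change to the conceptual schema. The example uses Python's standard sqlite3 module; the employee and phone tables, the employee_contact view and the contacts function are assumptions made for illustration, and a real migration would need more care than this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, phone TEXT);
    INSERT INTO employee VALUES (1, 'Chipo', '555-0101');
    CREATE VIEW employee_contact AS SELECT name, phone FROM employee;
""")


def contacts(connection):
    """Application query written against the external view only."""
    return connection.execute(
        "SELECT name, phone FROM employee_contact ORDER BY name"
    ).fetchall()


before = contacts(conn)

# Conceptual schema change: phone numbers move to their own table.
conn.executescript("""
    DROP VIEW employee_contact;
    CREATE TABLE phone (employee_id INTEGER, number TEXT);
    INSERT INTO phone SELECT id, phone FROM employee;
    CREATE TABLE employee_new (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO employee_new SELECT id, name FROM employee;
    DROP TABLE employee;
    ALTER TABLE employee_new RENAME TO employee;
    -- Only the external/conceptual mapping (the view) is redefined.
    CREATE VIEW employee_contact AS
        SELECT e.name, p.number AS phone
        FROM employee AS e JOIN phone AS p ON p.employee_id = e.id;
""")

after = contacts(conn)
```

The contacts function is unchanged across the restructuring; only the view definition had to be rewritten.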
Database Language
Data Definition Language (DDL): This is used to define the conceptual and internal schemas
for a database system.
It is not a procedural language, but rather a language for describing the types of entities and
relationships among them in terms of a particular data model.
Data Manipulation Language (DML): This is used to manipulate the database, which
typically includes retrieval, insertion, deletion, and modification of the data.
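In SQL, both roles are played by one language. The sketch below uses Python's standard sqlite3 module to show one DDL statement followed by the four kinds of DML statement; the student table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: describe the structure of the data.
conn.execute("CREATE TABLE student (reg_no TEXT PRIMARY KEY, name TEXT NOT NULL)")

# DML: insertion, modification, deletion and retrieval.
conn.execute("INSERT INTO student VALUES ('R001', 'Tendai')")
conn.execute("INSERT INTO student VALUES ('R002', 'Rudo')")
conn.execute("UPDATE student SET name = 'Tendai M.' WHERE reg_no = 'R001'")
conn.execute("DELETE FROM student WHERE reg_no = 'R002'")
students = conn.execute("SELECT reg_no, name FROM student").fetchall()
print(students)
```

The CREATE TABLE statement only describes what a student record looks like; the statements after it operate on actual records.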
Based on data model: The most popular data model in today's commercial DBMSs is the
relational data model. Almost all well-known DBMSs, such as Oracle, MS SQL Server, DB2 and
MySQL, support this model. Other, traditional models are the hierarchical
data model and the network data model. In recent years, we have become familiar with the
object-oriented data model, but this model has not had widespread use. Some
examples of object-oriented DBMSs are O2, ObjectStore and Jasmine.
Based on number of users: We can have single-user database systems, which support one
user at a time, or multiuser systems, which support multiple users concurrently.
Based on the way the database is distributed: We have centralized or distributed database
systems.
o Centralized database system: Data in this kind of system is stored at a single
site.
o Distributed database system: The actual database and DBMS software are
distributed across various sites connected by a computer network.
Homogeneous distributed Database Systems
Use the same DBMS software at multiple sites
Data exchange between the various sites can be handled easily
Heterogeneous distributed Database Systems
Different sites might use different DBMS software
Software is provided to support data exchange between sites
The Hierarchical Data Model structures data in a tree of records, with each record
having one parent record and many children. It can be represented as follows:
3. If multiple nodes appear at the top level, the nodes are called root segments.
5. Each node (with the exception of the root) has exactly one parent.
6. The child of node nx is the node directly below nx and connected to nx by a branch.
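The tree rules above can be sketched as a small Python class in which every node except the root has exactly one parent. The Node class and the record names (Department, Employee, Project, Dependant) are assumptions made for illustration, not part of any particular hierarchical DBMS.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """A record (segment) in a hierarchy: one parent, many children."""
    name: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def add_child(self, name: str) -> "Node":
        """Create a child directly below this node and return it."""
        child = Node(name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> List[str]:
        """Names from the root down to this node."""
        names = []
        node: Optional["Node"] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


department = Node("Department")              # root segment
employee = department.add_child("Employee")
project = department.add_child("Project")
dependant = employee.add_child("Dependant")
print(dependant.path())
```

Because each node stores a single parent, there is exactly one path from the root to any record, which is the defining restriction of the hierarchical model.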
The Network Data Model uses a lattice structure in which a record can have many
parents as well as many children. It can be represented as follows:
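As a rough sketch of the lattice idea, the links between record types can be held as owner/member pairs, so that one member may sit under several owners. The record names (Supplier, Part, Shipment, Invoice) are hypothetical.

```python
from collections import defaultdict

# Each link is (owner record, member record). Unlike a tree, a member
# may appear under more than one owner.
links = [
    ("Supplier", "Shipment"),
    ("Part", "Shipment"),
    ("Supplier", "Invoice"),
]

parents = defaultdict(list)
children = defaultdict(list)
for owner, member in links:
    parents[member].append(owner)
    children[owner].append(member)

print(parents["Shipment"])
print(children["Supplier"])
```

Here Shipment has two parents, which a hierarchical tree could not express without duplicating the record.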
Hierarchical and network databases both suffer from the following deficiencies (when
compared with relational databases):
Access to the database was not via SQL query strings, but by a specific set of APIs,
typically for FIND, CREATE, READ, UPDATE and DELETE.
Each API would only access a single table (dataset), so it was not possible to implement a
JOIN which would return data from several tables.
It was not possible to provide a variable WHERE clause. The only selection mechanism
available was
o read all entries on a child table which were associated with a selected entry on a
parent table
It was not possible to provide an ORDER BY clause. Data was presented in the order in
which it existed in the database. This mechanism could be tuned by specifying sort criteria
to be used when each record was inserted, but this had several disadvantages:
o Only a single sort sequence could be defined for each path (link to a parent), so all
records retrieved on that path would be provided in that sequence.
o It could make inserts rather slow when attempting to insert into the middle of a large
collection, or where a table had multiple paths each with its own set of sort criteria.
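For comparison, the relational features named above (JOIN, a variable WHERE clause and ORDER BY) can be sketched in a single SQL query. The example uses Python's standard sqlite3 module; the customer and orders tables, their rows and the orders_over function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customer VALUES (1, 'Moyo'), (2, 'Ncube');
    INSERT INTO orders VALUES (10, 1, 40.0), (11, 2, 15.0), (12, 1, 5.0);
""")


def orders_over(minimum):
    """JOIN two tables, filter with a variable WHERE, sort with ORDER BY."""
    return conn.execute(
        """
        SELECT c.name, o.total
        FROM orders AS o
        JOIN customer AS c ON c.id = o.customer_id
        WHERE o.total > ?
        ORDER BY o.total DESC
        """,
        (minimum,),
    ).fetchall()


large_orders = orders_over(10)
print(large_orders)
```

The selection value and the sort order are chosen at query time, rather than being fixed when the records were inserted.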
Homework One:
With the aid of examples, illustrate how the shortcomings of the file-based
system were addressed by the database approach. [30 marks]