Management Information Systems Lecture Notes

Management Information System (Bamenda University of Science & Technology)
MANAGEMENT INFORMATION SYSTEM
JOHN PAUL II INTERNATIONAL UNIVERSITY (UIJP-II)
HIGHER INSTITUTE OF BAFANG (ISB)
DIRECTORATE OF ACADEMIC AFFAIRS AND COOPERATION
***********
INTERNATIONAL DEGREE PROGRAMS IN PARTNERSHIP
WITH KESMONDS INTERNATIONAL UNIVERSITY (KIU)
COURSE CODE / COURSE TITLE: MANAGEMENT INFORMATION SYSTEM (MIS)
COURSE DURATION: 30 HOURS
COURSE FACILITATOR: NGWAIN NDONG BLASIUS
COURSE DESCRIPTION
The Management Information Systems (MIS) course is
designed to provide students with a comprehensive
understanding of the role of information systems in
supporting managerial decision-making and organizational
processes. The course explores the theoretical foundations,
practical applications, and strategic implications of MIS in
modern business environments.
Throughout the course, students will examine key concepts,
frameworks, and methodologies related to the design,
development, implementation, and management of
information systems. They will gain insights into how
organizations leverage technology to collect, process,
analyze, and disseminate information to enhance
operational efficiency, improve decision-making, and gain
a competitive advantage.
Learning Outcomes
Upon completion of the course, students are expected to be able to:
❖ Understand the fundamental concepts and principles of management information systems
and their role in organizations.
❖ Analyze the strategic importance of information systems in achieving organizational
objectives and gaining a competitive advantage.
❖ Identify and evaluate different types of information systems and their applications in
various business functions.
❖ Demonstrate knowledge of the system development life cycle and apply appropriate
methodologies for system analysis, design, implementation, and maintenance.
❖ Comprehend the principles and practices of database management systems, including
data modeling, storage, retrieval, and security.
❖ Explore the use of business intelligence and analytics tools to extract insights from
data and support decision-making processes.
❖ Understand the integration of enterprise systems such as ERP, supply chain management,
and customer relationship management systems to streamline business processes.
❖ Develop an awareness of information security risks and apply appropriate measures to
protect organizational information assets.
❖ Stay informed about emerging trends in management information systems, including
cloud computing, big data analytics, mobile technologies, and artificial intelligence.
❖ Apply critical thinking and problem-solving skills to analyze real-world scenarios and
propose effective information system solutions.
❖ Work collaboratively in teams to complete projects and assignments related to
management information systems.
❖ Develop effective communication skills to present and articulate information system
concepts and solutions to diverse stakeholders.
Mode of Assessment
Participation 5 Mks
Attendance 5 Mks
Assignment/Individual Work 10 Mks
Continuous Assessment 10 Mks
End Of Semester Evaluation 70 Mks


UNIT 1: INFORMATION SYSTEM
1.0 Introduction
Information is data that has been processed so that it can be used for decision-making.
Information can be of various types, such as strategic and tactical. Its quality is
judged on several factors, including completeness and accuracy. A system is a collection of
elements, such as manpower and production, that must be integrated to achieve the
organizational goals. Systems can be classified in several ways, for example as physical or
abstract, and as open or closed.
1.1 Fundamental Concept of Information System
An information system (IS) refers to a collection of people, processes, data, and technology
that work together to gather, store, process, retrieve, and distribute information within an
organization. Information systems play a vital role in managing and supporting various
aspects of an organization's operations, decision-making, and strategic planning.
An information system is a type of data processing system: it collects data from
different sources, processes the data and generates information for use in
different applications within the organization. For example, in a business context, the
information system collects data from systems such as the finance and sales systems on
the supplier side. It processes the data and generates information for
the customer. Customers then give feedback to the supplier based on that
information. Figure 1.1 shows the information system in a business context.
Fig. 1.1: An Information System

The information system helps to manage and store information to perform various functions
such as decision-making, documentation of business activities and generation of reports for
the analysis of organizational operations. There are various terms that are used in the
information system. They are as follows:

• Data: These are the raw material, which can be a number, a fact, a sound, a picture or a
statement gathered from different sources. Data represent something that exists in the
real world such as business processes and employee details.
• Information: It is meaningful or processed data. It defines the relationship
between different types of data.
• System: It is a collection of components that help in achieving a common objective.
For example, in a human-machine system, the machine element consists of hardware
and software that perform computation, and the human makes decisions based on this
computation.
1.2 Components of Information System
An information system consists of two types of components: abstract system components
and physical system components. Abstract system components perform operations such as
collecting input data, processing the data and generating information from that data. Physical
system components consist of elements such as hardware and human resources.
The components of an information system are as follows:
• Data: These are the input that the system takes to produce information.
• Hardware: A computer and its peripheral equipment such as input, output and
storage devices are called hardware.
• Software: Software consists of application programs, or sets of instructions, that process
the input data using computers, generate information and store it for future use.
• People: People are an essential component of an information system. They include
end-users who interact with the system, IT professionals who develop and maintain
the system, and other stakeholders who use the system to accomplish their tasks.
• Procedures: Procedures define the rules, guidelines, and protocols that govern how
the information system is used. They include data entry processes, security protocols,
backup and recovery procedures, and other operational guidelines.
• Graphical User Interface (GUI): This is the interface through which users of the
information system work with information on the computer system. A user can operate on,
process and retrieve information from the computer storage using the GUI.
The components of an information system describe the functioning of the system. An
information system takes input data from its users to perform
the business operations. The users interact with the computer to process the data using the GUI.
After the data is processed, the information is retrieved at the user’s end.
1.3 Functions of an Information System
Input: Gathering and capturing data or information from various sources, such as users,
sensors, or other systems.
Storage: Storing and organizing data in databases, files, or other data repositories for easy
access and retrieval.
Processing: Manipulating, analyzing, and transforming data to generate meaningful
information that can be used for decision-making or other purposes.
Output: Presenting processed information to users or other systems in a format that is easily
understandable and useful.
Retrieval: Accessing and retrieving stored information based on specific queries or requests.
Transmission: Transmitting and exchanging data or information between different
components or systems through networks or communication channels.
Feedback: Providing feedback to users or system administrators regarding the system's
performance, errors, or other relevant information.
Control: Implementing security measures, access controls, and ensuring data integrity and
confidentiality within the system.
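The functions listed above can be illustrated with a minimal sketch. The class name `TinyInformationSystem`, its method names and the sample records are all hypothetical, chosen only for illustration; a real information system would use a database and a proper user interface rather than in-memory lists.

```python
from dataclasses import dataclass, field


@dataclass
class TinyInformationSystem:
    """Hypothetical in-memory system illustrating the functions listed above."""
    records: list = field(default_factory=list)  # storage
    log: list = field(default_factory=list)      # feedback messages

    def capture(self, record: dict) -> None:
        # Input and control: accept only records that carry the required fields.
        if "item" not in record or "amount" not in record:
            self.log.append(f"rejected: {record}")  # feedback about an error
            return
        self.records.append(record)                 # storage

    def retrieve(self, item: str) -> list:
        # Retrieval: return the stored records that match a query.
        return [r for r in self.records if r["item"] == item]

    def process(self) -> dict:
        # Processing: total amount per item.
        totals = {}
        for r in self.records:
            totals[r["item"]] = totals.get(r["item"], 0) + r["amount"]
        return totals

    def output(self) -> str:
        # Output: present the processed information in a readable form.
        return "\n".join(f"{item}: {total}" for item, total in sorted(self.process().items()))


system = TinyInformationSystem()
system.capture({"item": "pens", "amount": 10})
system.capture({"item": "paper", "amount": 4})
system.capture({"item": "pens", "amount": 5})
system.capture({"amount": 3})  # missing "item", so it is rejected and logged
print(system.output())
```

Transmission is left out of the sketch because it would require a network; everything else maps to one method or attribute.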
1.4 Types of Information Systems
a) Transaction Processing Systems (TPS): Manage and process routine transactions
and operational data, such as sales, inventory, or payroll.
b) Management Information Systems (MIS): Provide reports and summaries of
operational data to support middle management's decision-making and planning.
c) Decision Support Systems (DSS): Assist managers and professionals in making
strategic decisions by providing analytical tools, models, and simulations.
d) Executive Information Systems (EIS): Provide high-level information and reports to
support top-level executives in strategic decision-making.
e) Office Automation Systems (OAS): Computer systems, such as word processing,
electronic mail and scheduling systems, that are designed to increase the
productivity of data workers in the office.
f) Expert Systems (ES): ES are designed to mimic human expertise in specific
domains. They use artificial intelligence techniques to provide solutions and make
recommendations based on a set of rules and knowledge.
1.5 Dimensions of Information System
Information presented to management has dimensions in terms of the cost, business and
technical issues involved. The various dimensions of information are as follows:
• Economic
• Business
• Technical
➢ Economic Dimension
The economic dimension of information determines the cost involved in obtaining the
information and the benefits that are derived from the information. Based on the cost and
benefits analysis of the information, economic dimensions of the information are evaluated.
Following are the factors that are determined during the cost and benefit analysis:
• Cost of information: It determines the total cost involved in obtaining the information.
The cost of information includes:
   • Cost of acquiring the data from different data sources
   • Cost of maintaining the data in the database
   • Cost of generating the accurate information from the data stored in the database
   • Cost of communicating the information to the intended receiver
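As a worked illustration of the cost and benefit analysis, the sketch below adds up the four cost components and compares the total with the benefit. All figures are hypothetical and in a single, unspecified currency unit; the `expected_benefit` value in particular is an assumption, since the notes do not say how benefits are measured.

```python
# Hypothetical cost components for obtaining one piece of management information.
costs = {
    "acquiring data": 400,
    "maintaining data": 250,
    "generating information": 200,
    "communicating information": 50,
}
expected_benefit = 1500  # assumed value of the decision the information supports

total_cost = sum(costs.values())                    # cost of information
net_benefit = expected_benefit - total_cost         # benefit left after all costs
benefit_cost_ratio = expected_benefit / total_cost  # above 1 means benefits exceed costs

print(total_cost, net_benefit, round(benefit_cost_ratio, 2))
```

On these assumed figures the information is worth obtaining, because the benefit exceeds the total cost.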
➢ Business Dimension
The business dimension of information helps determine the relevance of information at the
various levels of the management. The business dimension of information at the top-level
management is totally different from the business dimension of information at the lower-level
management. The difference in business dimension arises from the difference in the level and
nature of work performed at the various levels of the management.
➢ Technical Dimension
The technical dimension of information covers technical aspects such as
the volume of information to be stored and the type of database used to store it. It
also covers the storage capacity of the database and the time required to retrieve
information from the database.

UNIT 2: MANAGEMENT INFORMATION SYSTEM (MIS)
2.0 Introduction
Management Information System or MIS is a well-structured method which combines the
principles, theories and practices of management using an information system. MIS plays an
important role in the planning and decision-making processes of a business organization. It
provides managers with tools which help them organize, evaluate and run their departments
efficiently. MIS also provides information to the employees at various levels of management
for performing their respective jobs. In other words, MIS is an integrated computer-based
user-machine system that provides information to support the operations, management
analysis and decision-making functions in an organization.
2.1 Definition of MIS
Having looked at the concepts of management, information and system separately,
you now need to define and understand the term MIS as a whole.
MIS is an integrated system which collects, maintains, correlates and selectively displays
information to meet the specific needs of the various levels of management. It helps in
making decisions and taking actions for fulfilling the objectives of an organization. The
definition of the term ‘Management Information System’ varies from person to person. There
are various definitions of MIS given by different authors which are as follows:
According to Schwartz, ‘MIS is a system of people, equipment, procedures, documents and
communication that collects, validates, operates on, transforms, stores, retrieves, and presents
data for use in planning, budgeting, accounting, controlling and other management processes.’
According to Coleman and Riley, ‘An MIS (a) applies to all management levels; (b) is
linked to an organizational subsystem; (c) functions to measure performance, monitor
progress, evaluate alternatives or provide knowledge for change or collective action, and (d)
is flexible both internally and externally.’
According to Davis and Olson, ‘[MIS is] an integrated user-machine system designed for
providing information to support operational control, management control and decision-
making functions in an organization. The information systems make use of resources such as
hardware, software, men, procedures as well as supplies.’
2.2 Goals of MIS
MIS is the most common type of management support systems. Various goals of MIS in an
organization are as follows:
• To provide information to managerial end-users to support their day-to-day decision-
making needs

• To produce reports for specific time periods designed for managers responsible for
specific functions in an organization. For example, departmental expense reports and
performance reports
• To provide correct information to the concerned manager at the right time
• To help in carrying out regular and routine operations
• To control, organize and plan better business operations
2.3 Characteristics of MIS
MIS exhibits different characteristics which help specify the approach, design and
development of MIS. The various characteristics of MIS include the following:
System approach: MIS follows the system approach, which implies a step-by-step approach
to the study of the complete system of an organization and its performance in the light of the
objectives of the system. In doing so, MIS takes a comprehensive view of the subsystems that
operate within the organization.
Management-oriented: The management-oriented characteristic of MIS implies that a top-
down approach needs to be followed for designing MIS. The top-down approach suggests
that the system development starts with determining management requirements and overall
business objectives.
Need-based: The design and development of MIS should meet the information needs of
the managers at different levels of management such as strategic planning, management
control and operational control. This implies that MIS needs to meet the requirements of
managers throughout the management hierarchy of the organization.
Future-oriented: The design and development of MIS should not be restricted to past
information. It should also look into the future on the basis of the predictions made
for the system.
Integrated: MIS has the ability to produce meaningful information because of the integration
concept. It means taking a comprehensive view of the subsystems that operate within the
organization. An integrated system which blends information from several operational areas
is a necessary characteristic of MIS. For example, to develop an effective production
scheduling system, it is necessary to balance the integration of the following factors:
• Set-up costs
• Work force
• Overtime rates
• Production capacity
• Inventory level
• Capital requirements
• Customer services
Thus, an integrated approach blends information from several operational areas.

2.4 Functions of MIS

The prime objective of MIS in an organization is to obtain management information which
can be used by the managers of the organization for decision-making. To meet this objective,
MIS needs to perform the following functions:
➢ Collecting data: MIS helps collect data from different external and internal sources
of an organization. MIS can perform this function using both manual and
computerized techniques.
➢ Processing data: Processing the data includes converting the collected data into the
required management information. To process the data, you need to perform various
activities such as calculating, comparing, sorting, classifying and summarizing the
data. These data processing activities organize, analyze and manipulate captured data
by using various statistical, mathematical, operations research and other business
models.
➢ Storing information: MIS allows you to store both processed and unprocessed data
in an organized manner for future use. The stored data in MIS is organized into
fields, records, files and databases. MIS also allows you to store, as an
organizational record, information that is not immediately required.
➢ Retrieving information: MIS helps retrieve information from its databases when users
request it. MIS either distributes the retrieved information
or sends it for further processing, as the users require.
➢ Disseminating management information (MI): Disseminating MI involves dividing and
distributing the retrieved information to its users. MI can be disseminated
in two ways: periodically and online.
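The data processing activities named above (calculating, sorting, classifying, summarizing and comparing) can be sketched on a small set of records. The sales figures and field names below are hypothetical; each dictionary stands for a record and each key for a field.

```python
from collections import defaultdict

# Collected data: hypothetical sales records.
sales = [
    {"region": "North", "product": "A", "units": 12, "unit_price": 5.0},
    {"region": "South", "product": "B", "units": 3, "unit_price": 20.0},
    {"region": "North", "product": "B", "units": 7, "unit_price": 20.0},
    {"region": "South", "product": "A", "units": 9, "unit_price": 5.0},
]

# Calculating: derive the revenue of each record.
for record in sales:
    record["revenue"] = record["units"] * record["unit_price"]

# Sorting: order the records by revenue, highest first.
by_revenue = sorted(sales, key=lambda r: r["revenue"], reverse=True)

# Classifying: group the records by region.
by_region = defaultdict(list)
for record in sales:
    by_region[record["region"]].append(record)

# Summarizing: total revenue per region.
summary = {region: sum(r["revenue"] for r in rows) for region, rows in by_region.items()}

# Comparing: find the region with the highest total revenue.
best_region = max(summary, key=summary.get)

print(summary, best_region)
```

The `summary` dictionary is the kind of management information that would then be stored, retrieved and disseminated.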
2.5 Classification of MIS
Although MIS has changed significantly over the years, from an
elementary concept to an advanced discipline, it is still considered to be in its evolutionary
stage. Therefore, it is difficult to classify the information systems under MIS distinctly.
However, you can categorize these information systems on the basis of their roles in the
operations and management of business as follows:
• Operations support systems
• Management support systems
• General support systems
1. Operations Support Systems
In an organization, when a transaction takes place, data are produced as a by-product of the
transaction. These data are then processed to carry out operations that support the business
of the organization. The information systems used to process such data are called
operations support systems. Operations support systems allow you to perform various
tasks such as processing business transactions efficiently, supporting organizational
communications and updating the databases of the organization. You can find the following
types of operations support systems in an organization:
Transaction Processing Systems: A Transaction Processing System (TPS) allows you to
process and record data and helps produce reports from the processed data. It also represents
the automation of the general routine processes which are used to support business operations
in an organization.
Process Control Systems: A Process Control System (PCS) monitors and controls the
physical processes in an organization. While monitoring the physical processes, PCS handles
the architecture and mechanisms involved in the physical process. An example of a PCS is
the electronic sensors linked to computers used in pharmaceutical industries, which help in
monitoring the chemical process and determining the adjustments that need to be made.
Office Automation Systems: Office automation systems refer to those information systems
in which computer and communication technology applications are used to process office
transactions and office activities at all levels of an organization. Office automation
systems provide secretarial assistance and enhanced communication facilities at different
levels of management, which helps improve the productivity of the managers.
An office automation system performs various activities such as typing, mailing and
scheduling of meetings and conferences.
2. Management Support Systems
Management support systems include those information systems which emphasize
providing information and support for effective decision-making by the managers of an
organization. The various management support systems are as follows:
• Management information system
• Decision support system
• Executive support system
i. Management Information System
Management information system is the information system that takes data as input and
processes the data to convert it into information as the output. Figure 2.1 shows the
processing of management information systems:

Figure 2.1 Management Information System

A management information system uses TPS to get the input data for processing. It processes
the data received from TPS to support a wide range of organizational functions and
management processes. Therefore, the information generated from the management
information system can be used for the control of operations and management along with
various short-term and long-term planning.
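A minimal sketch of this flow from TPS data to management information is shown below. The transactions, department names and budget figures are hypothetical, and the function `monthly_expense_report` is an illustrative name rather than part of any real MIS product.

```python
from collections import defaultdict
from datetime import date

# Transactions as a TPS might record them (hypothetical data).
transactions = [
    {"date": date(2024, 1, 5), "department": "Sales", "expense": 300},
    {"date": date(2024, 1, 19), "department": "Production", "expense": 800},
    {"date": date(2024, 1, 27), "department": "Sales", "expense": 450},
    {"date": date(2024, 2, 2), "department": "Sales", "expense": 120},
]
budgets = {"Sales": 600, "Production": 1000}  # assumed monthly budgets


def monthly_expense_report(records, year, month):
    """Summarize expenses per department for one month and flag overspending."""
    totals = defaultdict(int)
    for r in records:
        if r["date"].year == year and r["date"].month == month:
            totals[r["department"]] += r["expense"]
    return {
        dept: {"spent": spent, "over_budget": spent > budgets[dept]}
        for dept, spent in totals.items()
    }


report = monthly_expense_report(transactions, 2024, 1)
for dept, line in report.items():
    print(dept, line)
```

The over-budget flag is the part a manager would act on for control; the totals themselves feed short-term planning.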
ii. Decision Support System (DSS)
Decision Support Systems are information systems that support decision-making in an
organization. DSS is also used in planning and error handling in the organization. A DSS has
three elements, namely a database, a model base and a user interface. The database holds the
data accumulated from the master files and external sources. The model base is a
library of models which help in manipulating and analysing the data in the desired way. The
user interface allows a user to communicate with the DSS.
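The three elements of a DSS can be sketched in a few lines. The demand figures, the two forecasting models and the function name `ask` are all assumptions made for illustration; a real model base would hold far richer statistical and operations research models.

```python
# Database: data accumulated from master files and external sources (hypothetical).
database = {"monthly_demand": [120, 135, 150, 160]}


# Model base: a small library of named models that manipulate the data.
def average_forecast(history):
    """Forecast the next period as the mean of all past periods."""
    return sum(history) / len(history)


def trend_forecast(history):
    """Forecast the next period as the last value plus the average past change."""
    changes = [later - earlier for earlier, later in zip(history, history[1:])]
    return history[-1] + sum(changes) / len(changes)


model_base = {"average": average_forecast, "trend": trend_forecast}


# User interface: the single entry point through which a user queries the DSS.
def ask(model_name, dataset_name):
    model = model_base[model_name]
    return model(database[dataset_name])


print(ask("average", "monthly_demand"))
print(round(ask("trend", "monthly_demand"), 2))
```

Keeping the models separate from the data lets a manager try several models on the same data and compare their answers.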
iii. Executive Support System (ESS)
Executive Support System extends the management information system and includes the
functionality of a DSS that helps support the decision-making of the chief executives in an
organization. Therefore, ESS is an extensive and broad information system that includes
different types of DSS and is more specific and person-oriented than other information
systems.
3. General Support Systems
The information systems under MIS that perform both the roles of operations and
management are categorized as general support systems. The various general support systems
are as follows:
Business Expert System: A Business Expert System (BES) is an advanced, knowledge-
based information system that acts as an expert to provide knowledge-specific solutions in
application areas which include medicine, business and engineering. The main elements of a
BES are as follows:
• Knowledge base: It contains information about the specific area for which the BES
can provide expert information.
• Inference engine: It specifies how you can deduce an inference from the stored data
and rules.
• User interface: It allows a user to communicate with BES.
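The three elements above can be sketched with a tiny rule-based example. The facts and rules are hypothetical, and forward chaining is only one possible inference strategy, used here as an assumption because the notes do not name one.

```python
# Knowledge base: known facts plus if-then rules (all hypothetical).
facts = {"payment_overdue", "order_value_high"}
rules = [
    ({"payment_overdue"}, "credit_risk"),
    ({"credit_risk", "order_value_high"}, "require_prepayment"),
    ({"loyal_customer"}, "offer_discount"),
]


def infer(facts, rules):
    """Inference engine: forward chaining until no rule adds a new fact."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # A rule fires when all its conditions are known and it adds something new.
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


def advise(facts):
    """User interface: report only the conclusions that were derived."""
    return sorted(infer(facts, rules) - set(facts))


print(advise(facts))
```

The second rule fires only after the first has added `credit_risk`, which is why the engine keeps looping until nothing changes.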
Knowledge Management Systems: A Knowledge Management System (KMS) is a
knowledge-based information system that supports the creation and distribution of
business knowledge to the managers and other employees of an organization. KMS also helps
in providing quick feedback to the employees and improving the business performance of the
organization.
Strategic Information Systems: A Strategic Information System (SIS) is an information
system that helps an organization achieve a strategic advantage over its competitors by
applying Information Technology (IT) to the products and services of the organization.
Therefore, any information system such as TPS, DSS or ESS which uses IT to give
an organization an edge over its competition can be referred to as SIS.
Functional Business Systems: A Functional Business System (FBS) supports the
various functional areas of an organization, such as production, marketing and accounting.
Examples of FBS include Financial Information System, Production Information System,
Marketing Information System, etc.
2.6 Subsystem and Upgrowth in Organizations
An information system consists of many subsystems. A subsystem is a part of an
information system that plays a specific role in the working of that system. An
information system becomes effective only when all its component subsystems work
properly. Subsystems are of great importance to the end-users of the information system
since they are an integral part of any type of information system. These
subsystems are as follows:
• Real-life business subsystem
• Production subsystem
• Marketing subsystem
• Personnel subsystem
• Material subsystem
• Financial subsystem
Real-Life Business Subsystem
The real-life business subsystem is responsible for collecting information about the
organization and its business environment. This information is very important when
implementing an information system in the organization because the information system
must be compatible with every business activity of the organization. It must
answer the following questions:
• What is the relationship between the information and business system of the
organization and the end-users of the information system?
• What are the objectives of the management behind the implementation of the
information system in the organization?
• What is the importance of the information system for the organization?
• Why is understanding the business system important to the analyst?
Answers to the above questions help the analyst of the information system to understand the
scope and complexity of the information system.
Production Subsystem: responsible for collecting all the information provided by the
production department of the organization.
Marketing Subsystem: responsible for collecting information about acquisition,
transportation, storage and delivery of the products of the organization.
Personnel Subsystem: responsible for maintaining information such as records of accounts,
employees and finance of the organization.
Material Subsystem: responsible for collecting information about materials used in different
departments of an organization.
Financial Subsystem: responsible for collecting all the information related to the financial
management of the organization.
2.7 Decomposition of System
Decomposition of a system refers to the division of the system into various subsystems.
Decomposition allows you to perform complex tasks with
greater ease. Moreover, dividing a system into subsystems leads to a
hierarchical representation of the operations. A system is divided into smaller units until the
smallest unit of the system becomes manageable at the system level. Figure 2.2 shows the
decomposition of a system.
Figure 2.2: Decomposition of System
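The hierarchical representation that results from decomposition can be sketched as a nested structure. The subsystem names below are hypothetical and are not taken from Figure 2.2; the functions `leaves` and `outline` are illustrative helpers.

```python
# Hypothetical hierarchy: each key is a (sub)system, each value its own subsystems.
system = {
    "Organization": {
        "Production": {"Scheduling": {}, "Quality control": {}},
        "Marketing": {"Sales": {}, "Distribution": {}},
        "Finance": {},
    }
}


def leaves(tree):
    """Return the smallest units: subsystems that are not divided any further."""
    result = []
    for name, children in tree.items():
        if children:
            result.extend(leaves(children))
        else:
            result.append(name)
    return result


def outline(tree, depth=0):
    """Render the hierarchy as an indented outline, one line per subsystem."""
    lines = []
    for name, children in tree.items():
        lines.append("  " * depth + name)
        lines.extend(outline(children, depth + 1))
    return lines


print("\n".join(outline(system)))
print(leaves(system))
```

Both functions are recursive because each subsystem is itself a system that can be decomposed in the same way.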

2.8 Upgrowth in Organizations
In this section, we will discuss the Nolan stage model to understand how the various
features of an information system correspond to its stages of growth.
The Nolan Stage Model: IS Planning Framework
The Nolan stage model was developed by Richard Nolan in 1974 to provide a framework for
information system planning in which the various features of an information system correspond
to the stages of growth. This model clearly explains the stage-by-stage development of
an information system in an organization. Initially, the model consisted of four stages:
initiation, expansion or contagion, formalism or control, and maturity or integration. The
basic principle behind this model is that an organization must go through each stage of
growth before progressing to the next stage.
Later on, in 1979, Nolan concluded that these four stages were not sufficient to depict the
growth of IT in an organization, so he extended his four-stage model to six stages by
introducing stage 5 and stage 6. The growth curve takes the shape of a double ‘S’. It shows
that growth rises sharply in the first and second stages and then becomes stable by the
end of the third stage. Growth increases again in the fourth stage, only to level off
at the last stage of the growth curve.
Figure 2.3: The Nolan Stage Model: IS Planning Framework

Stage 1: Initiation
The first growth stage is known as the initiation stage. During this stage, information
technology is introduced into the organization. The organization buys and installs computer
systems, and a few applications are computerized to meet the basic organizational needs.
However, at this stage few people use the computers because they are unfamiliar with the new
technology. Thus, this stage is characterized by decentralized control and minimal planning.
Since most medium and large-sized companies have already installed computer systems, this
stage has already been achieved by most organizations.
Stage 2: Contagion
This stage is also known as the expansion stage. This is the phase when most organizations
wish to have access to computer hardware, develop software and have trained
manpower working. Every organization head wishes to have some computer resources
under their control.
Stage 3: Control
This stage is also known as the formalism stage. During this stage, management notices that
the benefits derived from MIS activity are not in proportion to the expenditure on it.
So, organizations exercise control over resources by implementing various formal control
processes and standards.
Stage 4: Integration
This stage is also known as the maturity stage. By this stage, organizations have gained enough
experience and maturity in IS applications. So, this stage mainly focuses on the integration of
applications so as to avoid duplication of effort and systems. In this stage, controls are
adjusted and planning is performed in a well-organized manner. Hence, this stage is called the
‘stage of perfection’.
Stage 5: Data Administration
This stage puts emphasis on managing corporate data rather than IT. So, management of data
becomes the crucial step at this stage. Database administrator (DBA) plays an important role
in the management of data. Since the data is being stored, used, manipulated and processed
from integrated files in the database, it is the responsibility of DBA to plan, supervise, control
and secure the data.
Stage 6: Maturity
This is the final stage of the enhanced Nolan model. It shows that the application portfolio
(tasks like order entry, material requirements planning, etc.) is complete and hence information
flows within the organization. It is assumed that by this stage, the applications have been
incorporated into the organizational functioning and that they meet the strategic
requirements of the organization.
Review Questions
Short-Answer Questions
1. Write a short note on the single and the multiple process management information systems.
2. List various types of subsystem.
3. What do you understand by decomposition of subsystem?
4. Define the term ‘management support system’.
Long-Answer Questions
1. Explain the concept of MIS in detail. What is the primary goal of MIS?
2. What is office automation system? Explain with examples.
3. Explain the structure of MIS.
4. Discuss the various characteristics of MIS.
5. Explain the various functions of MIS.
6. Discuss the various approaches that help describe the structure of MIS.
7. What are the bases for classifying MIS? Explain the different categories of MIS.

UNIT 3. DATABASE MANAGEMENT SYSTEMS (DBMS)
3.0 Introduction
Database Management System (DBMS) is a software solution that allows you to create and
maintain databases in which data are stored. There are four types of data models: relational,
hierarchical, network and object-oriented that you can use to store data. Different individuals,
such as database managers, perform separate roles to manage the database. A DBMS supports
multiple-layered architecture that provides physical and logical data independence. Data
stored in the database can be accessed using different languages such as DML and SQL.
Database design is a process in which you create a logical data model for a database which
stores data of a company. You use the normalization technique to create the logical data
model for a database and eliminate data redundancy. Normalization also allows you to
organize data efficiently in a database and reduce anomalies during data operations. Various
normalization forms, such as first, second and third can be applied to create a logical data
model for a database. Second and third normal forms are based on partial dependency and
transitive dependency. A partial dependency occurs when a non-key column is determined by
only a part of a composite primary key rather than by the whole key. A transitive dependency
occurs when a non-key column is determined by the values in another non-key column of the
table. In this unit, you will learn about the concept of DBMS, its functions, types and benefits.
3.1 Unit Objectives
After going through this unit, you will be able to:
• Define the concept of DBMS
• Explain the architecture of DBMS
• Describe the different data models
• Explain the process of designing a database
• Identify the issues of DBMS
3.2 Overview of DBMS
Database Management System (DBMS) is a software solution that allows you to create and
maintain databases in which you can store data. It is a system that helps you store data in a
database and retrieve it systematically. Different participants, such as the database manager,
perform separate roles in managing the database. A DBMS supports a multiple-layered
architecture that provides physical and logical data independence.
Constructing a database is a process of storing data on some storage medium such as floppy
drive, compact disk (CD) or hard disk drive (HDD). Manipulating a database involves
performing functions such as querying the database to retrieve specific data, updating the
database to reflect changes made by the user and generating reports from the data.
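The operations named above (constructing, querying, updating and reporting) can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration: the table name products, its columns and the sample rows are hypothetical and are not taken from the course material.

```python
import sqlite3

# Construct: create an in-memory database and store some data in it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products (id, name, price) VALUES (?, ?, ?)",
    [(1, "Pen", 1.5), (2, "Notebook", 3.0), (3, "Stapler", 7.25)],
)

# Query: retrieve specific data (products cheaper than 5.0).
cheap = conn.execute(
    "SELECT name FROM products WHERE price < ? ORDER BY name", (5.0,)
).fetchall()

# Update: reflect a change made by the user.
conn.execute("UPDATE products SET price = ? WHERE id = ?", (2.0, 1))

# Report: summarize the stored data.
count, total = conn.execute("SELECT COUNT(*), SUM(price) FROM products").fetchone()
print(f"{count} products, total list price {total:.2f}")
```

An in-memory database is used here so that the sketch leaves no file behind; passing a file name to sqlite3.connect would store the data on disk instead.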

3.2.1 Features of DBMS

To understand the basics of database management systems, you must know the terms and
definitions that are used in DBMS technology. These terms and definitions constitute DBMS
terminology. DBMS is a software programme which may run on a user machine or a server
computer. The DBMS accepts queries from users and responds to these queries. A DBMS
has the following features:
• Structured data: DBMS enables you to structure the data as tables, records or objects.
• Query language: A DBMS provides a query language such as SQL to process the user
requests.
• Multi-user access: DBMS allows several users to access the data stored in a database.
At the same time, it provides security features which restrict some users from viewing
or manipulating the data.
• Data dictionary: DBMS provides a data dictionary which contains the structure of a
database.
• Data abstraction: It allows a Database Administrator (DBA) to logically separate the
data from the programmes which use the data.
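As a rough illustration of the data dictionary feature, the sketch below asks SQLite (through Python's sqlite3 module) to describe the structure of a database. The employee table is a hypothetical example; sqlite_master and PRAGMA table_info are SQLite's own catalogue facilities, and other DBMS products expose their data dictionaries differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# SQLite keeps its catalogue of tables in the built-in sqlite_master table.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, default_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(employee)")]

print(tables)
print(columns)
```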
3.2.2 Functions Performed in DBMS
In DBMS, several people play important roles in organizing and manipulating the data. These
roles are assigned to people according to the work performed by them in creating and
maintaining the DBMS. The various roles performed in DBMS are as follows:
• Database administrator
• Database designers
• Database users
• Database manager
Figure 3.1: DBMS Structure

1. Database Administrator

DBA is responsible for making the strategy and policy decisions regarding the organization
of data in the database. DBA also provides technical support in implementing the decisions
which are taken by the data administrator. DBA performs the following functions:
• Defining the conceptual schema for a database
• Defining the internal schema for a database
• Coordinating with users
• Defining security and integrity constraints for the database
• Defining dump and reload policies for a database
• Monitoring performance and responding to changing requirements
2. Database Designers
A database designer identifies the data to be stored in a database. The database designer is
also responsible for choosing the right database structure to represent and store the data in the
database. The tasks of identifying the data and choosing the structure are performed before
the implementation of a database. The database designer communicates with the various
database users to understand their requirements before selecting the database structure.
3. Database Users
The database users are the people who need to interact with DBMS. The database users can
be categorized according to their requirements of data. The people who interact with DBMS
to retrieve data are called the naïve users and the people who interact with DBMS to make
some changes in the database are called the developers.
4. Database Manager
The database manager refers to the software that helps in the use and management of the data stored
in a database. The database manager handles the requests of database users to access the data
items from database. The database manager also provides facilities such as support for a
query language, to retrieve and update the database. The facilities provided by the database
manager depend on the design of the database manager. For example, if the data manager is
designed to handle one request at a time,
3.2.3 Benefits of DBMS
DBMS provides various advantages that make it useful for storing and maintaining the data.
Following are the advantages of DBMS:
• Preventing data redundancy
• Restricting unauthorized access
• Persistent storage
• Multiple user interfaces
• Integrity constraints
• Backup and recovery of data
3.3 Types of Database

Nowadays, there also exist three newer types of database: parallel databases, distributed
databases and object-oriented databases.
3.3.1 Parallel Databases
Parallel databases use parallel processors for computing. These databases use high-speed
processors, memory and disks. Parallel databases use the relational data model, in which the
SQL query language is used to manipulate the data.
3.3.2 Distributed Databases (DDBs)
Distributed Databases (DDBs) store data over a computer network. In these databases, data
may be stored in multiple computers located at the same physical location or spread over a
network. They also help in solving many problems such as data distribution and transaction
processing.
3.3.3 Object-Oriented Databases
These databases use object-oriented programming languages such as C++ or Java. The salient
features of object-oriented databases are inheritance, encapsulation and polymorphism.
3.4 Data Models
A data model in DBMS is defined as a collection of concepts which is used to describe the
structure of a database. Data model describes logical structure of a database by considering
following concepts:
• Structure: It represents how the data are organized in a database. The data can be
organized using hierarchical, network, relational or object-oriented data model.
• Integrity: It provides a definition of rules to indicate whether or not the defined
structure can be used to organize data in a database.
• Manipulation: It provides a language in which you can update the data in a database.
• Querying data: It provides a language in which the data in the database are queried.
For DBMS implementation you can use various data models which include all database-
related concepts for describing the structure of a database. The various data models used in
DBMS are as follows:
• Hierarchical
• Network
• Relational
• Object-oriented
3.4.1 Hierarchical Model
Data models can be defined as a collection of various concepts which are used to describe the
structure of a database. Implementing a data model includes specifying data types,
relationships among data types and the constraints on the data. In the hierarchical model, also
called the hierarchical schema, data are organized in the form of a tree structure. The
hierarchical model supports the concept of data independence.
The hierarchical model uses two types of data structures, records and parent-child relationships,
to define the data and the relationships among data. A record can be defined as a set of field
values that provides information about an entity. An entity is a collection of objects in a
database which can be described by using a set of attributes. Records of the same type can be
grouped together to form a record type and assigned a name.
Parent-Child Relationship (PCR) can be defined as a 1: N relationship between two different
record types. The record type on the 1-side is called parent record type and record type on the
N-side is called child record type. The Figure below shows an example of 1: N relationship
between a finance department and the employees of finance department.
Figure 3.2: Parent-Child Relationship (PCR) between Department (parent) and Employee (child)
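The parent-child relationship described above can be sketched as a small tree structure in Python. This is an illustrative sketch only: the Record class, its add_child method and the employee names are hypothetical, and a real hierarchical DBMS stores and navigates records quite differently.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    """A record: a record-type name, its field values, and its child records."""
    record_type: str
    fields: dict
    children: list = field(default_factory=list)
    parent: Optional["Record"] = field(default=None, repr=False, compare=False)

    def add_child(self, child: "Record") -> None:
        # In the hierarchical model a child record has exactly one parent.
        if child.parent is not None:
            raise ValueError("a child record can have only one parent record")
        child.parent = self
        self.children.append(child)


# One Department record (the 1-side) owns many Employee records (the N-side).
finance = Record("Department", {"name": "Finance"})
for employee_name in ("Asha", "Ben", "Chloe"):
    finance.add_child(Record("Employee", {"name": employee_name}))

employee_names = [child.fields["name"] for child in finance.children]
print(employee_names)
```

Refusing a second parent in add_child is what distinguishes this tree from the network model, where a record may have several parent records.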

3.4.2 Network Model
The network model can be defined as a database model which is used to represent objects and
the relationships among these objects. In network model, a record can have a number of
parent records and it also can have multiple child records. Like Hierarchical model, network
model also supports the concept of data independence which can be defined as the ability to
change the representation of data at one level of a database system without the compulsion of
changing the data representation at the next higher level. In network model, Data
Manipulation Language (DML) is used for searching and retrieving records from the
database. DML can also be used for connecting records from the set of instances, deleting
and modifying records.
The network model uses two types of data structures, records and set types, to define the data
and the relationships among data. A record can be defined as a set of field values that
provides information about an entity. An entity is a collection of objects in a database which
can be described by using a set of attributes. Records of the same type can be
grouped together to form a record type and assigned a name. The structure of a record type
can be defined by using a collection of named fields or data items. Each data item or field has
a certain data type such as character, float or integer. The figure below represents a record
type employee that has data items name, sex and birth-date.
Set type is a description of a 1:N relationship between two record types. Each set type
definition has following elements:
• Name for set type
• Owner record type
• Member record type
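A set type can be sketched in Python as a named 1:N link from an owner record type to the records that belong to each owner. The class SetType, its methods and the sample data (WORKS_IN, ASSIGNED_TO, "Sales", "Website") are hypothetical names chosen for illustration. The sketch also shows how a record can have more than one parent in the network model: it takes part in more than one set type.

```python
class SetType:
    """A named 1:N relationship between two record types."""

    def __init__(self, name, owner_type, other_type):
        self.name = name
        self.owner_type = owner_type
        self.other_type = other_type
        self._owner_of = {}  # key of the N-side record -> key of its owner

    def connect(self, owner, record):
        # Within one set type a record can belong to only one owner (1:N).
        if record in self._owner_of:
            raise ValueError(f"{record} already has an owner in {self.name}")
        self._owner_of[record] = owner

    def owner_of(self, record):
        return self._owner_of.get(record)

    def records_of(self, owner):
        return [r for r, o in self._owner_of.items() if o == owner]


works_in = SetType("WORKS_IN", "Department", "Employee")
assigned_to = SetType("ASSIGNED_TO", "Project", "Employee")

works_in.connect("Sales", "Ken")
works_in.connect("Sales", "John")
assigned_to.connect("Website", "Ken")

# Ken has two parent records, one through each set type.
parents_of_ken = [s.owner_of("Ken") for s in (works_in, assigned_to)]
print(parents_of_ken)
```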
3.4.3 Relational Data Model
In a relational data model, data is stored in tables which are also called relations. The related
tables or relations in the relational data model form a database. The properties of relational
data model are as follows:
• Each row in a table is unique from every other row in the table.
• Each row contains atomic data which implies that data are not repeated and do not
contain structures such as arrays.
In a relational model, tables are used to organize data. A table consists of columns or fields
that represent attributes of an entity. Each row or tuple in a table represents occurrence of an
entity and must consist of a value that uniquely identifies the row. Such a column that
uniquely identifies the rows or tuples in table is called the primary key. The relational model
also consists of foreign keys that allow joining data of two tables. To understand the relational
model, consider the table below, which represents a customer database.
Table 1: Customer Database
Cust_name    Cust_id    Cust_city
John         1001       Xyz
Tom          1002       Pqr
Ken          1003       Abc

In the above Table, Cust_name represents the name of the customers, Cust_id is the unique
number for each customer and Cust_city represents the city of the customers.
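The customer table and the idea of joining two tables through a foreign key can be tried out with Python's built-in sqlite3 module. The customer rows are the ones from Table 1; the orders table, its columns and its rows are hypothetical additions made only to show a foreign key at work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute("""CREATE TABLE customer (
    cust_id   INTEGER PRIMARY KEY,
    cust_name TEXT,
    cust_city TEXT)""")
conn.execute("""CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    cust_id  INTEGER REFERENCES customer(cust_id),
    item     TEXT)""")

conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                 [(1001, "John", "Xyz"), (1002, "Tom", "Pqr"), (1003, "Ken", "Abc")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 1001, "Desk"), (2, 1003, "Lamp"), (3, 1001, "Chair")])

# Join the two tables through the foreign key column cust_id.
joined = conn.execute("""SELECT c.cust_name, o.item
                         FROM customer c JOIN orders o ON o.cust_id = c.cust_id
                         ORDER BY o.order_id""").fetchall()

# An order that refers to a customer who does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (4, 9999, 'Rug')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print(joined, fk_rejected)
```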
Relational data model makes use of the set theory and is based on the concept of
mathematical relation which contains several data elements. The basic characteristics of the
relational model are relational algebra and relational calculus. Relational algebra is a set of
operations for manipulating relations and specifying queries. Relational calculus provides a
declarative way to specify database queries. The relational algebra and the relational calculus
are two different means of representing the database queries. Any relational algebraic
expression can also be converted into a corresponding expression in the relational calculus
and vice versa.
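To make the contrast concrete, the sketch below writes the same query twice in Python: once as a sequence of relational algebra operations (selection followed by projection) and once in a declarative, calculus-like style. The helper functions select and project are hypothetical names written for this illustration; the rows come from Table 1.

```python
def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection: keep only the named attributes and drop duplicate tuples."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


customers = [
    {"Cust_name": "John", "Cust_id": 1001, "Cust_city": "Xyz"},
    {"Cust_name": "Tom", "Cust_id": 1002, "Cust_city": "Pqr"},
    {"Cust_name": "Ken", "Cust_id": 1003, "Cust_city": "Abc"},
]

# Algebra style: state the operations and the order in which they are applied.
algebra = project(select(customers, lambda r: r["Cust_id"] > 1001), ["Cust_name"])

# Calculus style: state what the result must look like, not how to build it.
calculus = [{"Cust_name": r["Cust_name"]} for r in customers if r["Cust_id"] > 1001]

print(algebra == calculus)
```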
Relations
A relation is a two-dimensional table which is used to represent data in the form of rows and
columns. The names of the columns are known as attributes and rows are known as tuples of
the relation. There are various parts of a relation which are as follows:
• Domain: It is a set of atomic values. The values that cannot be divided into
subcomponents are called atomic values. Generally, you specify a domain as a data
type from which the values forming the domain are taken. You should also give a
name to a domain to help interpret its values.
• Tuple: In relational data model, a row is termed as tuple that gives complete
information of an entity.
• Attribute: It is a column header in a relation that represents the attributes of an entity.
3.5 Data Mapping

Data mapping is the process of creating data element mappings between two distinct data
models. Data mapping is used as a first step for a wide variety of data integration tasks,
including data transformation or data mediation between a data source and a destination.
It is a process used in data warehousing by which different data models are linked to each
other using a defined set of methods to characterize the data in a specific definition. This
definition can be any atomic unit, such as a unit of metadata or any other semantic. This data
linking follows a set of standards, which depends on the domain value of the data model
used. Data mapping serves as the initial step in data integration.
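A very small data mapping can be sketched in Python as a dictionary that links each field of a source model to a field of a destination model. The destination field names (customer_id, full_name, city) and the function map_record are hypothetical; real integration tools also handle type conversion and validation, which this sketch leaves out.

```python
# Source field name -> destination field name (destination names are assumed).
FIELD_MAP = {
    "Cust_id": "customer_id",
    "Cust_name": "full_name",
    "Cust_city": "city",
}


def map_record(source, field_map):
    """Return a new record that uses the destination model's field names."""
    missing = [name for name in field_map if name not in source]
    if missing:
        raise KeyError(f"source record lacks fields: {missing}")
    return {target: source[name] for name, target in field_map.items()}


source_row = {"Cust_name": "John", "Cust_id": 1001, "Cust_city": "Xyz"}
mapped_row = map_record(source_row, FIELD_MAP)
print(mapped_row)
```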
3.6 Designing of Database
Database design is a process in which you create a logical data model for a database which
stores data of a company. The goal of designing a database schema is to minimize the
storage space occupied by the data stored on the hard drive. Database anomalies are
errors in the data contained in the database that reduce the performance of the Database
Management System (DBMS). Database anomalies also affect the performance of the
DBMS by increasing the size of data files. The following types of database anomalies can
increase the size of data files:
• Insertion anomalies: These occur when it becomes difficult to insert data into the
database. You cannot insert a record with a null value in a column that has a primary
key constraint. So, when you have a record that contains values for all the columns
apart from the primary key column, you cannot insert that record into the table. This
restricts the ability to insert records into the database.
• Deletion anomalies: These occur when the deletion of a particular record also removes
other information that should have been kept. For example, in a database, a table
contains the records of students. The subject column of the table contains
information about the subjects which the students have opted for. Now, if you delete all
the records for the multimedia subject, then you lose the information about the
students who are studying only multimedia.
• Modification anomalies: These occur when a database user changes the value of a
data item and the value of that data item does not change in other tables.
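The modification and deletion anomalies can be reproduced in a few lines of Python on a single unnormalized table. The rows and the lecturer column are hypothetical and serve only as an illustration; here the stale copy of the value sits in another row of the same table rather than in another table.

```python
# One unnormalized table: the lecturer is repeated on every enrolment row.
enrolments = [
    {"student": "John", "subject": "Multimedia", "lecturer": "Dr. A"},
    {"student": "Chris", "subject": "Multimedia", "lecturer": "Dr. A"},
    {"student": "Ken", "subject": "Networks", "lecturer": "Dr. B"},
]

# Modification anomaly: the lecturer is changed in one row but not in the other.
enrolments[0]["lecturer"] = "Dr. C"
multimedia_lecturers = {r["lecturer"] for r in enrolments if r["subject"] == "Multimedia"}
inconsistent = len(multimedia_lecturers) > 1

# Deletion anomaly: deleting every Networks row also deletes all trace of Ken.
remaining = [r for r in enrolments if r["subject"] != "Networks"]
ken_lost = all(r["student"] != "Ken" for r in remaining)

print(inconsistent, ken_lost)
```

Splitting the data into separate student, subject and enrolment tables, as normalization does, removes the repeated lecturer value and keeps student records independent of subject records.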
3.7 Normalization
Normalization is integral to the database design and it can be defined as the process of
eliminating the redundancy of data in a database. A relational table in a database is said to be
in a normal form if it satisfies certain constraints. The normalization process involves various
levels of normal forms that allow you to separate the data into multiple related tables. The
various normal forms are first normal form (1NF), second normal form (2NF), third normal
form (3NF), fourth normal form (4NF) and fifth normal Form (5NF).
The goals of normalization are as follows:
• Removing the redundant data

• Ensuring that only related data is stored in a table

Therefore, normalization helps you to remove data redundancy and update inconsistencies
when data are inserted, deleted or modified in a database. The benefits of normalization are
as follows:
• Provides better overall database organization and data consistency within a database
• Allows you to create tables that can be easily joined with other tables with related
information
• Helps to reduce redundant data across the tables
• Prevents data loss by assigning primary and foreign keys in a table
• Helps to reduce modification anomalies such as deletion, insertion and update
anomalies
• Defines relation constraints that are a logical consequence of keys
3.7.1 Normalization Terminology
Normalization terminology consists of various concepts frequently used in normalization
such as primary key and functional dependency.
➢ Primary Key
The primary key of a relational table uniquely identifies each row in a table. A primary key is
either a column in a table that is unique such as identification number and social security
number or it is generated by the DBMS such as a Globally Unique Identifier (GUID).
A primary key may consist of a single column or of multiple columns from a table. For example,
consider a student records database that contains tables related to student information. The
first table, STUDENTS, contains a record for each student at the university. The table
consists of various attributes such as student_id, first_name, last_name and student_stream.
The Table below lists the various attributes in the STUDENTS table.
Table 2: Students Table
Student_id    First_name    Last_name    Student_stream
S01           John          Wilkins      Computers
S02           Chris         Burton       Electronics
S03           Ken           Wilkins      Electronics
The unique Student_id number of a student is the primary key of the STUDENTS table. You
cannot make the first name, last name or stream of a student a primary key because more than
one student can have the same name or the same stream.
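The uniqueness rule for a primary key can be checked with Python's built-in sqlite3 module. In this minimal sketch the column names follow the STUDENTS table, and the second insert deliberately reuses an existing Student_id to show that the DBMS rejects it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE students (
    student_id     TEXT PRIMARY KEY,
    first_name     TEXT,
    last_name      TEXT,
    student_stream TEXT)""")

conn.execute("INSERT INTO students VALUES ('S01', 'John', 'Wilkins', 'Computers')")

# A second row with the same primary key value violates the constraint.
try:
    conn.execute("INSERT INTO students VALUES ('S01', 'Chris', 'Burton', 'Electronics')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]
print(duplicate_rejected, row_count)
```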

➢ Functional Dependency
A functional dependency is a constraint between two sets of attributes from the database.
Functional dependency is represented by X → Y between two attributes, X and Y, in a table.
The functional dependency X → Y implies that Y is functionally dependent on X. The Table
below lists the various attributes in the EMPLOYEE table.
Table 3: Employee Table
Employee_id    Employee_name    Employee_dept
K067263        John             Sales
K067264        Chris            Accounts
K067265        Ken              Sales
In the table above, the various attributes of the EMPLOYEE are Employee_id,
Employee_name and Employee_dept. You can state that:
Employee_id → Employee_name
This representation states that the Employee_name attribute is functionally dependent on
Employee_id, which implies that the name of an employee can be uniquely identified from the
id of the employee. However, you cannot uniquely identify the Employee_id from the
Employee_name column because more than one employee can have the same name,
whereas each employee has a different value in the Employee_id column.
Functional dependencies are a type of constraints based on keys such as primary key or
foreign key. For a relation table R, a column Y is said to be functionally dependent on a
column X of the same table if each value of the column X is associated with only one value
of the column Y at a given time. All the columns in the relational table R should be
functionally dependent on X if the column X is a primary key.
If the column Y is functionally dependent on the column X, the functional dependency can be
represented as:
R.x → R.y
For example, consider the functional dependency Employee_id → Salary in a table. The column
Employee_id functionally determines the Salary column because each employee has exactly one
salary, which remains the same each time the id of that employee appears in the table.
A functional dependency, represented by X → Y, between two sets of attributes, X and
Y, that are subsets of R, is called a trivial functional dependency if Y is a subset of X. For
example, {Employee_id, Project} → Project is a trivial functional dependency.

A functional dependency, represented by X → Y, between two sets of attributes, X and
Y, which are subsets of R, is called a non-trivial functional dependency if at least one of
the attributes of Y is not among the attributes of X. For example, Employee_id → Salary
is a non-trivial functional dependency.
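The definitions above can be turned into two short Python checks. The functions holds and is_trivial are hypothetical helpers written for this illustration, and the rows are those of the EMPLOYEE table. Note that checking the rows can only show that a dependency is not violated by the current data; whether it is a genuine rule of the business is a design decision.

```python
def holds(rows, lhs, rhs):
    """True if, in these rows, each value of lhs is paired with one value of rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x appears and returns the stored value.
        if seen.setdefault(x, y) != y:
            return False
    return True


def is_trivial(lhs, rhs):
    """X -> Y is trivial when Y is a subset of X."""
    return set(rhs) <= set(lhs)


employees = [
    {"Employee_id": "K067263", "Employee_name": "John", "Employee_dept": "Sales"},
    {"Employee_id": "K067264", "Employee_name": "Chris", "Employee_dept": "Accounts"},
    {"Employee_id": "K067265", "Employee_name": "Ken", "Employee_dept": "Sales"},
]

id_determines_name = holds(employees, ["Employee_id"], ["Employee_name"])
dept_determines_name = holds(employees, ["Employee_dept"], ["Employee_name"])
print(id_determines_name, dept_determines_name)
```

Here Employee_dept does not determine Employee_name because the Sales department appears with two different names.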

UNIT 4: DATA WAREHOUSING AND DATA MINING
4.1 Data Mining
4.1.1 What Is Data Mining?
Data mining refers to extracting or mining knowledge from large amounts of data. The term
is actually a misnomer: it would have been more appropriately named
knowledge mining, which emphasizes mining knowledge from large amounts of data.
It is the computational process of discovering patterns in large data sets involving methods at
the intersection of artificial intelligence, machine learning, statistics, and database systems.
The overall goal of the data mining process is to extract information from a data set and
transform it into an understandable structure for further use.
The key properties of data mining are

• Automatic discovery of patterns
• Prediction of likely outcomes
• Creation of actionable information
• Focus on large datasets and databases
4.1.2 The Scope of Data Mining
Data mining derives its name from the similarities between searching for valuable business
information in a large database — for example, finding linked products in gigabytes of store
scanner data — and mining a mountain for a vein of valuable ore. Both processes require
either sifting through an immense amount of material, or intelligently probing it to find
exactly where the value resides. Given databases of sufficient size and quality, data mining
technology can generate new business opportunities by providing these capabilities:
4.1.3 Tasks of Data Mining
Data mining involves six common classes of tasks:
• Anomaly detection (Outlier/change/deviation detection) – The identification of
unusual data records that might be interesting, or of data errors that require further
investigation.
• Association rule learning (Dependency modelling) – Searches for relationships
between variables. For example a supermarket might gather data on customer
purchasing habits. Using association rule learning, the supermarket can determine
which products are frequently bought together and use this information for marketing
purposes. This is sometimes referred to as market basket analysis.
• Clustering – is the task of discovering groups and structures in the data that are in
some way or another "similar", without using known structures in the data.
• Classification – is the task of generalizing known structure to apply to new data. For
example, an e-mail program might attempt to classify an e-mail as "legitimate" or as
"spam".
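• Regression – attempts to find a function which models the data with the least error.
• Summarization – provides a more compact representation of the data set, including
visualization and report generation.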
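The market basket example under association rule learning can be sketched in plain Python by counting how often items occur together. The baskets are hypothetical data, and the two measures shown (support and confidence) are the usual ones for judging a rule; real tools use more efficient algorithms for large data sets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase records, one set of items per customer visit.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    pair_counts.update(frozenset(pair) for pair in combinations(sorted(basket), 2))


def support(item_a, item_b):
    """Share of all baskets that contain both items."""
    return pair_counts[frozenset((item_a, item_b))] / len(baskets)


def confidence(antecedent, consequent):
    """Share of baskets containing the antecedent that also contain the consequent."""
    return pair_counts[frozenset((antecedent, consequent))] / item_counts[antecedent]


bread_butter_support = support("bread", "butter")
bread_to_butter = confidence("bread", "butter")
print(bread_butter_support, round(bread_to_butter, 2))
```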
4.1.4 Architecture of Data Mining

A typical data mining system may have the following major components.
Figure 4.1: Architecture of Data Mining

1. Knowledge Base:
This is the domain knowledge that is used to guide the search or evaluate the
interestingness of resulting patterns. Such knowledge can include concept hierarchies, used to
organize attributes or attribute values into different levels of abstraction. Knowledge such as
user beliefs, which can be used to assess a pattern’s interestingness based on its
unexpectedness, may also be included. Other examples of domain knowledge are additional
interestingness constraints or thresholds, and metadata (e.g., describing data from multiple
heterogeneous sources).
2. Data Mining Engine:
This is essential to the data mining system and ideally consists of a set of functional modules
for tasks such as characterization, association and correlation analysis, classification,
prediction, cluster analysis, outlier analysis, and evolution analysis.

3. Pattern Evaluation Module:

This component typically employs interestingness measures and interacts with the data mining
modules so as to focus the search toward interesting patterns. It may use interestingness
thresholds to filter out discovered patterns. Alternatively, the pattern evaluation module may
be integrated with the mining module, depending on the implementation of the data mining
method used. For efficient data mining, it is highly recommended to push the evaluation of
pattern interestingness as deep as possible into the mining process so as to confine the search
to only the interesting patterns.
4. User interface:
This module communicates between users and the data mining system, allowing the user to
interact with the system by specifying a data mining query or task, providing information to
help focus the search, and performing exploratory data mining based on the intermediate data
mining results. In addition, this component allows the user to browse database and data
warehouse schemas or data structures, evaluate mined patterns, and visualize the patterns in
different forms.
4.1.5 Data Mining Process:
Data Mining is a process of discovering various models, summaries, and derived values from
a given collection of data.
The general experimental procedure adapted to data-mining problems involves the following
steps:
1. State the problem and formulate the hypothesis
Most data-based modelling studies are performed in a particular application domain. Hence,
domain-specific knowledge and experience are usually necessary in order to come up with a
meaningful problem statement. Unfortunately, many application studies tend to focus on the
data-mining technique at the expense of a clear problem statement. In this step, a modeller
usually specifies a set of variables for the unknown dependency and, if possible, a general
form of this dependency as an initial hypothesis. There may be several hypotheses formulated
for a single problem at this stage. The first step requires the combined expertise of an
application domain and a data-mining model. In practice, it usually means a close interaction
between the data-mining expert and the application expert. In successful data-mining
applications, this cooperation does not stop in the initial phase; it continues during the entire
data-mining process.
2. Collect the data
This step is concerned with how the data are generated and collected. In general, there are
two distinct possibilities. The first is when the data-generation process is under the control of
an expert (modeller): this approach is known as a designed experiment. The second
possibility is when the expert cannot influence the data-generation process: this is known as
the observational approach. An observational setting, namely, random data generation, is
assumed in most data-mining applications. Typically, the sampling distribution is completely
unknown after data are collected, or it is partially and implicitly given in the data-collection
procedure. It is very important, however, to understand how data collection affects its
theoretical distribution, since such a priori knowledge can be very useful for modelling and,
later, for the final interpretation of results. Also, it is important to make sure that the data
used for estimating a model and the data used later for testing and applying a model come
from the same, unknown, sampling distribution. If this is not the case, the estimated model
cannot be successfully used in a final application of the results.
3. Pre-processing the data
In the observational setting, data are usually "collected" from the existing databases, data
warehouses, and data marts. Data pre-processing usually includes at least two common tasks:
1. Outlier detection (and removal) – Outliers are unusual data values that are not consistent
with most observations. Commonly, outliers result from measurement errors, coding and
recording errors, and, sometimes, are natural, abnormal values. Such nonrepresentative
samples can seriously affect the model produced later. There are two strategies for dealing
with outliers:
a. Detect and eventually remove outliers as a part of the pre-processing phase, or
b. Develop robust modelling methods that are insensitive to outliers.
2. Scaling, encoding, and selecting features – Data pre-processing includes several steps such
as variable scaling and different types of encoding. For example, one feature with the range
[0, 1] and the other with the range [−100, 1000] will not have the same weights in the applied
technique; they will also influence the final data-mining results differently. Therefore, it is
recommended to scale them and bring both features to the same weight for further analysis.
Also, application-specific encoding methods usually achieve dimensionality reduction by
providing a smaller number of informative features for subsequent data modelling.
These two classes of pre-processing tasks are only illustrative examples of a large
spectrum of pre-processing activities in a data-mining process.
Data-pre-processing steps should not be considered completely independent from other data-
mining phases. In every iteration of the data-mining process, all activities, together, could
define new and improved data sets for subsequent iterations. Generally, a good pre-
processing method provides an optimal representation for a data-mining technique by
incorporating a priori knowledge in the form of application-specific scaling and
encoding.
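The two pre-processing tasks described in this step can be sketched in plain Python. The sketch makes two assumptions that are not part of the text: outliers are detected with the modified z-score based on the median absolute deviation (using the commonly quoted constant 0.6745 and cut-off 3.5), and scaling is done with min-max scaling to the range [0, 1]. Other detection rules and scaling methods are equally valid choices.

```python
from statistics import median


def remove_outliers(values, threshold=3.5):
    """Keep values whose modified z-score (median/MAD based) is within the threshold."""
    centre = median(values)
    mad = median(abs(v - centre) for v in values)
    if mad == 0:
        return list(values)  # no spread to measure against, keep everything
    return [v for v in values if abs(0.6745 * (v - centre) / mad) <= threshold]


def min_max_scale(values):
    """Rescale values linearly so that the smallest is 0.0 and the largest is 1.0."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


readings = [10, 12, 11, 13, 12, 95]   # 95 is far from the other observations
cleaned = remove_outliers(readings)
scaled = min_max_scale([-100, 450, 1000])
print(cleaned, scaled)
```

After scaling, a feature that originally ranged over [−100, 1000] lies in the same [0, 1] range as the other feature in the example above, so neither dominates merely because of its units.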
4. Estimate the model
The selection and implementation of the appropriate data-mining technique is the main task
in this phase. This process is not straightforward; usually, in practice, the implementation is
based on several models, and selecting the best one is an additional task. The basic principles
of learning and discovery from data are given in Chapter 4 of this book. Later, Chapter 5
through 13 explain and analyze specific techniques that are applied to perform a successful
learning process from data and to develop an appropriate model.
5. Interpret the model and draw conclusions

In most cases, data-mining models should help in decision making. Hence, such models need
to be interpretable in order to be useful because humans are not likely to base their decisions
on complex "black-box" models. Note that the goals of accuracy of the model and accuracy
of its interpretation are somewhat contradictory. Usually, simple models are more
interpretable, but they are also less accurate. Modern data-mining methods are expected to
yield highly accurate results using high dimensional models. The problem of interpreting
these models, also very important, is considered a separate task, with specific techniques to
validate the results. A user does not want hundreds of pages of numeric results. He does not
understand them; he cannot summarize, interpret, and use them for successful decision
making.
Figure 4.2: Data Mining Process
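Step 4 notes that several models are usually estimated and that choosing the best one is a task in itself. The sketch below illustrates this with two very simple candidate models, a constant and a straight line fitted by least squares, compared on data that was held back for testing. The numbers, the function names fit_line and mse, and the choice of mean squared error as the yardstick are all assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Least-squares straight line: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on the given data."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data, split into a training part and a held-back test part.
train_x, train_y = [1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]
test_x, test_y = [5, 6], [10.1, 11.9]

slope, intercept = fit_line(train_x, train_y)
train_mean = sum(train_y) / len(train_y)

candidates = {
    "constant": lambda x: train_mean,
    "linear": lambda x: slope * x + intercept,
}
test_errors = {name: mse(model, test_x, test_y) for name, model in candidates.items()}
best = min(test_errors, key=test_errors.get)
print(best, test_errors)
```

Both models are simple enough to interpret directly, which connects to the trade-off between accuracy and interpretability discussed in step 5.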

4.2 DATA WAREHOUSE:
A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of
data in support of management's decision making process.
Subject-Oriented: A data warehouse can be used to analyze a particular subject area. For
example, "sales" can be a particular subject.
Integrated: A data warehouse integrates data from multiple data sources. For example, source
A and source B may have different ways of identifying a product, but in a data warehouse,
there will be only a single way of identifying a product.
Time-Variant: Historical data is kept in a data warehouse. For example, one can retrieve data
from 3 months, 6 months, 12 months, or even further back from a data warehouse. This
contrasts with a transaction system, where often only the most recent data is kept. For
example, a transaction system may hold only the most recent address of a customer, whereas a
data warehouse can hold all addresses associated with a customer.
Non-volatile: Once data is in the data warehouse, it will not change. So, historical data in a
data warehouse should never be altered.
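The time-variant and non-volatile properties can be illustrated with a small Python sketch, assuming a hypothetical customer-address example: the transaction system overwrites the old value, while the warehouse only appends new rows. The names `operational`, `warehouse_history`, `record_address` and `address_as_of`, and the sample addresses, are invented for this illustration.

```python
from datetime import date

# Transaction system: keeps only the most recent address per customer.
operational = {}
# Warehouse: append-only history of (customer, address, valid_from) rows.
warehouse_history = []

def record_address(customer_id, address, valid_from):
    operational[customer_id] = address  # overwrite: the old value is lost
    warehouse_history.append((customer_id, address, valid_from))  # never updated

record_address("C1", "12 Market Road, Bafang", date(2022, 1, 10))
record_address("C1", "5 Station Street, Bamenda", date(2023, 6, 1))

def address_as_of(customer_id, when):
    """Return the address that was valid on a given date, or None."""
    rows = [r for r in warehouse_history if r[0] == customer_id and r[2] <= when]
    return max(rows, key=lambda r: r[2])[1] if rows else None

print(operational["C1"])
print(address_as_of("C1", date(2022, 12, 31)))
```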
4.2.1 Data Warehouse Design Process:
A data warehouse can be built using a top-down approach, a bottom-up approach, or a
combination of both.
The top-down approach starts with the overall design and planning. It is useful in cases where
the technology is mature and well known, and where the business problems that must be
solved are clear and well understood.
The bottom-up approach starts with experiments and prototypes. This is useful in the early
stage of business modelling and technology development. It allows an organization to move
forward at considerably less expense and to evaluate the benefits of the technology before
making significant commitments.
In the combined approach, an organization can exploit the planned and strategic nature of the
top-down approach while retaining the rapid implementation and opportunistic application of
the bottom-up approach.
The warehouse design process consists of the following steps:
➢ Choose a business process to model, for example, orders, invoices, shipments,
inventory, account administration, sales, or the general ledger. If the business process
is organizational and involves multiple complex object collections, a data warehouse
model should be followed. However, if the process is departmental and focuses on the
analysis of one kind of business process, a data mart model should be chosen.
➢ Choose the grain of the business process. The grain is the fundamental, atomic level
of data to be represented in the fact table for this process, for example, individual
transactions, individual daily snapshots, and so on.
➢ Choose the dimensions that will apply to each fact table record. Typical dimensions
are time, item, customer, supplier, warehouse, transaction type, and status.
➢ Choose the measures that will populate each fact table record. Typical measures are
numeric additive quantities like dollars sold and units sold.
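The four design steps above can be sketched with Python's built-in `sqlite3` module, assuming a hypothetical "sales" business process. The table names, column names and sample rows are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions chosen for the sales process: time and item.
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, day TEXT, month TEXT);
CREATE TABLE dim_item (item_id INTEGER PRIMARY KEY, name TEXT);
-- Grain: one row per item sold per day. Measures: dollars_sold, units_sold.
CREATE TABLE fact_sales (
    time_id INTEGER REFERENCES dim_time(time_id),
    item_id INTEGER REFERENCES dim_item(item_id),
    dollars_sold REAL,
    units_sold INTEGER
);
INSERT INTO dim_time VALUES (1, '2024-01-05', '2024-01'), (2, '2024-01-06', '2024-01');
INSERT INTO dim_item VALUES (1, 'Notebook'), (2, 'Pen');
INSERT INTO fact_sales VALUES (1, 1, 30.0, 10), (1, 2, 5.0, 10), (2, 1, 15.0, 5);
""")

# Additive measures can simply be summed along any dimension.
rows = conn.execute("""
    SELECT i.name, SUM(f.dollars_sold), SUM(f.units_sold)
    FROM fact_sales f JOIN dim_item i ON f.item_id = i.item_id
    GROUP BY i.name ORDER BY i.name
""").fetchall()
print(rows)
```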

4.2.2 A Three Tier Data Warehouse Architecture:
Figure 4.3: Three Tier Data Warehouse Architecture

Tier-1:
The bottom tier is a warehouse database server that is almost always a relational database
system. Back-end tools and utilities are used to feed data into the bottom tier from
operational databases or other external sources (such as customer profile information
provided by external consultants). These tools and utilities perform data extraction, cleaning,
and transformation (e.g., to merge similar data from different sources into a unified format),
as well as load and refresh functions to update the data warehouse. The data are extracted
using application program interfaces known as gateways. A gateway is supported by the
underlying DBMS and allows client programs to generate SQL code to be executed at a
server.
Examples of gateways include ODBC (Open Database Connectivity) and OLE DB (Object
Linking and Embedding, Database) by Microsoft, and JDBC (Java Database Connectivity).
This tier also contains a metadata repository, which stores information about the data
warehouse and its contents.
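The extraction, cleaning, transformation and load functions of the back-end tools can be sketched as follows. This is a simplified illustration, assuming two hypothetical sources that identify and describe products differently; the field names, the `product_key` mapping and the sample rows are invented.

```python
# Hypothetical extracts from two operational sources.
source_a = [{"sku": "NB-001", "name": " notebook ", "price": "3.00"}]
source_b = [{"product_code": 77, "title": "PEN", "price_cents": 50}]

# Assumed mapping from source-specific identifiers to one warehouse product key.
product_key = {("A", "NB-001"): 1, ("B", 77): 2}

def transform_a(row):
    """Clean and convert a source A row into the unified format."""
    return {"product_id": product_key[("A", row["sku"])],
            "name": row["name"].strip().title(),
            "price": float(row["price"])}

def transform_b(row):
    """Clean and convert a source B row into the unified format."""
    return {"product_id": product_key[("B", row["product_code"])],
            "name": row["title"].strip().title(),
            "price": row["price_cents"] / 100}

warehouse = []  # load target

def load(rows):
    warehouse.extend(rows)

load([transform_a(r) for r in source_a])
load([transform_b(r) for r in source_b])
print(warehouse)
```

In a real warehouse the extraction step would read from the operational databases through a gateway; here the extracts are plain Python lists to keep the sketch self-contained.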
Tier-2:
The middle tier is an OLAP (Online Analytical Processing) server that is typically
implemented using either a relational OLAP (ROLAP) model or a multidimensional OLAP
(MOLAP) model.
A ROLAP model is an extended relational DBMS that maps operations on multidimensional data
to standard relational operations.
A MOLAP model is a special-purpose server that directly implements multidimensional data
and operations.
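The difference between the two OLAP styles can be hinted at with a toy Python sketch. It assumes a made-up fact list with three dimensions (month, item, city); the `rollup` function imitates the relational GROUP BY that a ROLAP server would issue, while the `cube` dictionary imitates cells addressed directly by their coordinates, as in MOLAP. Neither is how a real OLAP server is implemented.

```python
from collections import defaultdict

# Fact rows: (month, item, city, units_sold). Values are made up.
facts = [
    ("Jan", "Pen", "Bafang", 10),
    ("Jan", "Pen", "Bamenda", 4),
    ("Feb", "Pen", "Bafang", 7),
    ("Feb", "Notebook", "Bafang", 3),
]

def rollup(rows, keep):
    """ROLAP style: a roll-up is a GROUP BY over the kept dimension columns."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in keep)
        totals[key] += row[3]
    return dict(totals)

# MOLAP style: cells are stored directly, addressed by their coordinates.
cube = {(month, item, city): units for month, item, city, units in facts}

by_month = rollup(facts, keep=[0])  # roll up over item and city
pen_slice = {k: v for k, v in cube.items() if k[1] == "Pen"}  # slice on item
print(by_month, pen_slice)
```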
Tier-3:
The top tier is a front-end client layer, which contains query and reporting tools, analysis
tools, and/or data mining tools (e.g., trend analysis, prediction, and so on).
4.2.3 Data Warehouse Models:
There are three data warehouse models.
1. Enterprise warehouse:
An enterprise warehouse collects all of the information about subjects spanning the entire
organization.
It provides corporate-wide data integration, usually from one or more operational systems or
external information providers, and is cross-functional in scope.
It typically contains detailed data as well as summarized data, and can range in size from a
few gigabytes to hundreds of gigabytes, terabytes, or beyond.
An enterprise data warehouse may be implemented on traditional mainframes, computer
super servers, or parallel architecture platforms. It requires extensive business modelling and
may take years to design and build.
2. Data mart:
A data mart contains a subset of corporate-wide data that is of value to a specific group of
users. The scope is confined to specific selected subjects. For example, a marketing data mart
may confine its subjects to customer, item, and sales. The data contained in data marts tend to
be summarized.
Data marts are usually implemented on low-cost departmental servers that are
UNIX/LINUX- or Windows-based. The implementation cycle of a data mart is more likely to
be measured in weeks rather than months or years. However, it may involve complex
integration in the long run if its design and planning were not enterprise-wide.
Depending on the source of data, data marts can be categorized as independent or
dependent. Independent data marts are sourced from data captured from one or more
operational systems or external information providers, or from data generated locally within a
particular department or geographic area. Dependent data marts are sourced directly from
enterprise data warehouses.
3. Virtual warehouse:
A virtual warehouse is a set of views over operational databases. For efficient query
processing, only some of the possible summary views may be materialized.

A virtual warehouse is easy to build but requires excess capacity on operational database
servers.
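A virtual warehouse can be sketched with a database view, here using Python's built-in `sqlite3` module. The `orders` table, the `sales_by_customer` view and the sample rows are assumptions for illustration. Note that SQLite views are not materialized, so every query on the view is recomputed against the operational table, which is the reason excess capacity is needed on the operational server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
INSERT INTO orders VALUES (1, 'Alice', 20.0), (2, 'Bob', 5.0), (3, 'Alice', 10.0);
-- The "warehouse" is only a view; no data is copied.
CREATE VIEW sales_by_customer AS
    SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer;
""")

query = "SELECT customer, total FROM sales_by_customer ORDER BY customer"
before = conn.execute(query).fetchall()
# A new operational transaction is immediately visible through the view.
conn.execute("INSERT INTO orders VALUES (4, 'Bob', 15.0)")
after = conn.execute(query).fetchall()
print(before, after)
```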
UNIT 5. INFORMATION RETRIEVAL SYSTEMS
5.0 Introduction
Retrieval systems are usually in a state of continuous gradual revision: data are added or
withdrawn, new index points are inserted, and syndetic relationships are changed. The
development of effective retrieval techniques has been the core of IR research for more than 30 years.
Nowadays, multimedia indexing and retrieval techniques are being developed to access
image, video and sound databases without text descriptions.
5.1 Information Retrieval
Information Retrieval (IR) is finding material (usually documents) of an unstructured nature
(usually text) that satisfies an information need from within large collections (usually stored
on computers). Information retrieval technology has been central to the success of the Web.
Information Retrieval is the process of obtaining relevant information from a collection of
informational resources. An IR system does not return a single exact match; it returns several
objects that vary in their degree of relevance to the query. We therefore have to consider
which concepts IR systems use to model the data so that they can return all the documents
relevant to the query, ranked by some measure of importance. These concepts include
dimensionality reduction, data modelling, ranking measures and clustering. When computing
results and their relevance, programmers use these concepts to design the system and to
choose the data structures and procedures that speed up searches and improve the handling of data.
Generally, information retrieval, as the name implies, concerns retrieving
relevant information from a collection of information. It is basically concerned with
facilitating the user’s access to large amounts of (predominantly textual) information. The
process of information retrieval involves the following stages:
• Representing Collections of Documents - how to represent, identify and process the
collection of documents.
• User-initiated querying - understanding and processing of the queries.
• Retrieval of the appropriate documents - the searching mechanism used to obtain and
retrieve the relevant documents.
5.2 What is an IR system?
An IR system accepts a query from a user and responds with a set of documents. The system
returns both relevant and non-relevant material, and a document organization approach
is applied to assist the user in finding the relevant information in the retrieved set.
Let us illustrate by means of a black box what a typical IR system looks like. The
diagram shows three components: input, processor and output. Such a trichotomy may seem
a little trite, but the components constitute a convenient set of pegs upon which to hang a
discussion.
The main problem here is to obtain a representation of each document and query suitable for
a computer to use. Let me emphasize that most computer-based retrieval systems store only a
representation of the document (or query) which means that the text of a document is lost
once it has been processed for the purpose of generating its representation. A document
representative could, for example, be a list of extracted words considered to be significant.
Rather than have the computer process the natural language, an alternative approach is to
have an artificial language within which all queries and documents can be formulated.
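A document representative of the kind described above, a list of extracted words considered significant, could be produced roughly as follows. This is a minimal sketch: the stop-word list is a tiny assumed sample, and `document_representative` is a hypothetical helper name.

```python
import re

# A tiny stop-word list; real systems use much longer lists (assumption).
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}

def document_representative(text):
    """Return the sorted set of significant words in a text."""
    words = re.findall(r"[a-z]+", text.lower())
    return sorted({w for w in words if w not in STOP_WORDS})

doc = "The retrieval of information is central to the Web."
print(document_representative(doc))
```

Once only this list is stored, the original sentence can no longer be recovered, which is the loss of text mentioned above.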
Figure 5.1: IR system

Generally speaking, IR systems:
• Are systems built to retrieve documents that are highly likely to be relevant to the user
• Are systems built to reduce the user’s workload in searching through the store of documents
to find relevant ones
• Are systems that give information about the presence or absence of documents in
accordance with the query
- Automated abstracts or summaries of documents were developed to further
simplify access to search results
• Are computer-based systems (we are talking about automation)
• Are systems that attempt to find relevant documents to respond to the user’s request
• Are a set of rules and procedures, as operated by humans and/or machines, for doing some
or all of the following operations
- Indexing (or constructing representation of documents)
- Search formulation (or constructing representation of information needs)
- Searching (or matching representation of documents against representation of needs)
- Feedback (or repeating any or all of the above processes with modifications
introduced in response to an assessment of results of some process)
- Indexing Language Construction (or the generation of rules of representation)

5.3 Examples of Information Retrieval Systems

Typical examples of IR systems are search engines, which can be found on the web or in libraries:
- They concentrate on finding documents, performing full-text retrieval
- After a user types in several keywords, the system returns the documents that it
ranks as most relevant.
5.4 Importance of Information Retrieval Systems
The main benefits of IR systems are:
a. Regulatory Compliance
A well-organized information storage and retrieval system that follows compliance
regulations and tax record-keeping guidelines significantly increases a business owner’s
confidence that the business is fully compliant.
b. Efficiency and Productivity
Any time a business owner or employee spends searching through stacks of loose files or
trying to locate missing or misfiled records is inefficient, unproductive and can prove
costly to a small business. A good information storage and retrieval system, including an
effective indexing system, not only decreases the chances information will be misfiled but
also speeds up the storing and retrieval of information. The resulting time-saving benefit
increases office efficiency and productivity while decreasing stress and anxiety.
c. Improve Working Environment
It can be disheartening to anyone walking through an office area to see vital business
documents and other information stacked on top of file cabinets or in boxes next to office
workstations. Not only does this create a stressful and poor working environment, but if
customers see it, it can cause them to form a negative perception of the business.
Contrast this with an office area in which file cabinets, aisles and workstations are clear
and neatly organized to see how important it is for even a small business to have a
well-organized information storage and retrieval system.
5.5 Functions of IR systems
Major functions of IR systems
- Analyze contents of information items
- Represent the contents of the analyzed sources in a way suitable for matching with
users’ queries
- Analyze users’ information needs and represent them in a form that will be suitable for
matching with the database
- Match the search statement with the stored database
- Retrieve or generate relevant information, ranked in an order that reflects relevance
- Make necessary adjustments in the system based on feedback from users
5.6 Information retrieval process

The process of information retrieval involves the following stages:
- Representing Collections of Documents - how to represent, identify and process the
collection of documents.
- User-initiated querying - understanding and processing of the queries.
- Retrieval of the appropriate documents - the searching mechanism used to obtain and
retrieve the relevant documents.
The following steps show the IR process (see Figure 5.2):
1. The user need is specified via the user interface, in the form of a textual query, qU
(typically made of keywords).
2. The query qU is parsed and transformed by a set of textual operations; the same operations
have been previously applied to the contents indexed by the IR system
3. Query operations further transform the pre-processed query into a system-level
representation, qS.
4. The query qS is executed on top of a document source D (e.g., a text database) to retrieve a
set of relevant documents, R. Fast query processing is made possible by the index structure
previously built from the documents in the document source.
5. The set of retrieved documents R is then ordered: documents are ranked according to the
estimated relevance with respect to the user’s need.
6. The user then examines the set of ranked documents for useful information; he might
pinpoint a subset of the documents as definitely of interest and thus provide feedback to the
system.
Textual operations translate the user’s need into a logical query and create a logical view of
documents
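Steps 1 to 5 above can be sketched end to end in a few lines of Python. This is a deliberately simplified illustration: the stop-word list, the overlap-based score and the sample documents are assumptions, and for brevity the documents are scanned directly instead of going through a pre-built index. The names qU, qS, D and R follow the steps above.

```python
import re

STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}  # assumed list

def text_operations(text):
    """Step 2: lowercase, tokenize and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def query_operations(terms):
    """Step 3: build the system-level query qS (here simply a set of terms)."""
    return set(terms)

def search(q_user, documents):
    q_system = query_operations(text_operations(q_user))
    scored = []
    for doc_id, text in documents.items():
        overlap = len(q_system & set(text_operations(text)))  # step 4: match
        if overlap:
            scored.append((overlap, doc_id))
    # Step 5: rank by estimated relevance (more shared terms first).
    return [doc_id for _, doc_id in sorted(scored, key=lambda s: (-s[0], s[1]))]

D = {
    "d1": "Data warehouse design for sales analysis",
    "d2": "Information retrieval and ranking of documents",
    "d3": "Ranking documents in information retrieval systems for the Web",
}
R = search("ranking in information retrieval systems", D)  # step 1: the query qU
print(R)
```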

Figure 5.2: Textual IR Process

There are three main ingredients to the IR process
- Texts or documents
- Queries
- The process of evaluation
For texts, the main problem is to obtain a representation of the text in a form which is
amenable to automatic indexing. This is achieved (i.e., the representation) by creating an
abbreviated form of the text, known as a text surrogate. A typical surrogate would consist of a
set of index terms or keywords or descriptors.
A query arises from an information need on the part of the user.
The query is then a representation of the information need and must be expressed in a
language understood by the system. Due to the inherent difficulty of accurately representing
the information need, the query in an IR system is always regarded as approximate and
imperfect.
The evaluation process involves a comparison of the texts actually
retrieved with those the user expected to retrieve. This often leads to some modification,
typically of the query, though possibly of the information need or even of the surrogates. The
extent to which modification is required is closely linked with the process of measuring the
effectiveness of the retrieval operation (recall and precision).
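Recall and precision can be computed directly from the set of retrieved documents and the set of documents that are actually relevant: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. The sketch below uses made-up document identifiers and a hypothetical helper name.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

p, r = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d2", "d4", "d7"],
)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of the four retrieved documents are relevant, and two of the three relevant documents were found.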

It is necessary to define the text database before any of the retrieval processes are initiated.
This is usually done by the manager of the database and includes specifying the following:
– The documents to be used
– The operations to be performed on the text
– The text model to be used (the text structure and what elements can be retrieved)
The text operations transform the original documents and the information needs and generate a
logical view of them.
Once the logical view of the documents is defined, the database module builds an index of
the text:
– An index is a critical data structure
– It allows fast searching over large volumes of data
Different index structures might be used, but the most popular one is the inverted file (more
on this later). Once the document database is indexed, the retrieval process can be
initiated. The user first specifies a user need, which is then parsed and transformed
by the same text operations applied to the text. Query operations might then be applied
before the actual query, which provides the system representation of the user need, is
generated. Matching: the query is then processed to obtain the retrieved documents. Before
the retrieved documents are sent to the user, they are ranked according to
the likelihood of relevance.
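An inverted file maps each term to the documents that contain it, so a query only has to look up its own terms instead of scanning every document. The following is a minimal sketch with made-up documents; requiring every query term to be present (AND semantics) is an assumption chosen for simplicity.

```python
from collections import defaultdict

docs = {
    1: "data warehouse design",
    2: "data mining process",
    3: "warehouse architecture",
}

# Inverted file: term -> set of ids of the documents containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def lookup(query):
    """Return ids of documents containing every query term (AND semantics)."""
    postings = [index.get(term, set()) for term in query.split()]
    return sorted(set.intersection(*postings)) if postings else []

print(lookup("data warehouse"))
print(lookup("warehouse"))
```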
The user then examines the set of ranked documents in the search for useful information
• Two choices for the user
– Reformulate query, run on entire collection
– Reformulate query, run on result set
At this point, he might pinpoint a subset of the documents seen as definitely of interest and
initiate a user feedback cycle. In such a cycle, the system uses the documents selected by the
user to change the query formulation. Hopefully, this modified query is a better
representation of the real user need.
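The feedback cycle can be illustrated with a simple form of query expansion: terms that occur often in the documents the user selected are added to the query. This is only one possible strategy, shown under stated assumptions; `expand_query`, its `extra_terms` parameter and the sample texts are hypothetical.

```python
from collections import Counter

def expand_query(query_terms, selected_docs, extra_terms=2):
    """Add the most frequent new terms from the user-selected documents."""
    counts = Counter(
        term
        for text in selected_docs
        for term in text.lower().split()
        if term not in query_terms
    )
    # Sort by frequency, then alphabetically, so the result is deterministic.
    ranked = sorted(counts, key=lambda t: (-counts[t], t))
    return list(query_terms) + ranked[:extra_terms]

query = ["jaguar"]
chosen = ["jaguar car engine", "jaguar car dealer"]  # documents marked as of interest
new_query = expand_query(query, chosen)
print(new_query)
```

The expanded query now leans towards the sense of the word the user actually meant, which is the intent of the feedback cycle.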
5.7 Basic structure of IR systems

An Information Retrieval System serves as a bridge between the world of authors and the
world of readers/users. That is, writers present a set of ideas in a document using a set of
concepts,

The black box is the processing part of the information retrieval system; it includes mainly
indexing and searching
Translation from user need to query

– Usually, manually (by user himself)
– Tools available to assist the process
Translation from item to representation (surrogate)
– Often, automatically (by the system)
– Representation can be at different level:
• Full text, abstract only, index terms only, etc.
Duality of the two translations
– User query can be regarded as the representation of the ideal (sought-after) item
– Often, similar techniques are used to generate both
5.8 Challenges in Information Retrieval

The first problem or challenge in IR is the representation of information items and
information needs (document representation is one area of IR and query representation is
another).
The second problem is matching: how to match the need against the information items. This
includes modifying the representation as a result of judgment (query expansion or
reformulation). The goal is to build a system that retrieves documents that users are highly
likely to find relevant to their request (i.e., information need).
UNIT 6. INFORMATION SECURITY AND PRIVACY
6.1 What is Information Security?

Information security is defined as the preservation of confidentiality, integrity and
availability of information. It typically involves preventing or at least reducing the probability
of unauthorized/inappropriate access, use, disclosure, disruption, deletion/destruction,
corruption, modification, inspection, recording or devaluation, although it may also involve
reducing the adverse impacts of incidents. Information may take any form, e.g., electronic or
physical, tangible (e.g. paperwork) or intangible (e.g. knowledge).
Information security's primary focus is the balanced protection of the confidentiality,
integrity and availability of data (also known as the CIA triad) while maintaining a focus on
efficient policy implementation, all without hampering organization productivity.
Cyber Security, in contrast, includes not only information security, but also digital
infrastructure security, such as Supervisory Control and Data Acquisition (SCADA) systems
and Internet-of-Things (IoT) systems, which goes beyond the protection of valuable
information.
6.2 4Rs of Information Security
The 4Rs of information security are Right Information, Right People, Right Time and Right
Form. Control over the 4Rs is the most efficient way to maintain and control the value of
information.

Figure 6.1: 4Rs of Information Security

“Right Information” refers to the accuracy and completeness of information, which
guarantees the integrity of information.
“Right People” means that information is available only to authorized individuals, which
guarantees confidentiality.
“Right Time” refers to the accessibility of information and its usability upon demand by an
authorized entity. This guarantees availability.
“Right Form” refers to providing information in the right format.
To safeguard information security, the 4Rs have to be applied properly. This means that
confidentiality, integrity and availability should be observed when handling information.
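Two of the 4Rs can be illustrated with a short Python sketch: an authorization check for "Right People" and a hash comparison for "Right Information". The user list, the record contents and the function names are assumptions made for this example, not a recommended design.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """SHA-256 digest used to detect any change to the data."""
    return hashlib.sha256(data).hexdigest()

AUTHORIZED = {"alice"}  # hypothetical list of authorized users

def read_record(user, record, stored_digest):
    if user not in AUTHORIZED:  # Right People (confidentiality)
        raise PermissionError("not authorized")
    if fingerprint(record) != stored_digest:  # Right Information (integrity)
        raise ValueError("record was modified")
    return record

record = b"balance=100"
digest = fingerprint(record)  # stored when the record was written
print(read_record("alice", record, digest))
```

A plain hash only detects changes if the stored digest itself is protected; a keyed construction such as HMAC is one common alternative when that cannot be assumed.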
6.3 Information Security Trends and Directions
6.3.1. Types of Cyber Threats
External Threats
External threats are attacks that are conducted by non-employees, and the attacks are
usually done remotely from outside the organization’s office. Examples of such threats
include hacking, denial of service and malware.
Internal Threats
Internal threats are attacks that are conducted by employees or contractors who have physical
access to the organization’s systems, networks and applications. Such attacks are usually
carried out by disgruntled employees/contractors. Internal attacks can also be unknowingly
facilitated by employees/contractors, when attackers use social engineering to exploit less
security-aware employees.
6.3.2. Types of External Threats
Hacking
Hacking is the act of gaining access to a computer or computer network to obtain or modify
information without legal authorization.
Hacking can be classified as recreational, criminal or political hacking, depending on the
purpose of the attack. Recreational hacking is unauthorized modification of programs and
data simply to satisfy the hacker’s curiosity. Criminal hacking is used in fraud or espionage.
Political hacking is tampering with websites to broadcast unauthorized political messages.
Recently, hacking has become more and more implicated in cyber terrorism and
cyberwarfare, posing a major threat to national security. Another new trend shows hacking
groups targeting major sites with national interest and holding highly sensitive information.
6.3.3. Types of Internal Attacks

➢ Disgruntled employees/contractors
Internal attacks are one of the biggest threats facing our data and systems. Rogue employees,
especially members of the IT team with knowledge of and access to networks, data centres
and admin accounts, can cause serious damage to an organization’s network, systems and
data.
➢ Lack of employee security awareness
Security awareness training for employees helps to eradicate risky behaviours that could
potentially lead to cyber breaches. Training programs could address some of the threats faced
by an organization, especially attacks such as phishing emails, ransomware, and social
engineering scams via the telephone, text message, or social media channels.
➢ Social engineering
The term “social engineering” refers to a set of techniques used to manipulate people into
divulging confidential information. Although it is similar to a confidence trick or simple
fraud, the term typically applies to trickery for information gathering or computer system
access. In most cases the attacker never comes face-to-face with the victim.
6.4. Standards for Information Security Activities
Information security activities cannot be effectively performed without the mobilization of a
unified administrative, physical and technical plan.
Many organizations have recommended standards for information security activities.
Examples include the International Organization for Standardization and International
Electrotechnical Commission (ISO/IEC), the International Telecommunication Union (ITU),
the information security requirements and evaluation items of the Certified Information Systems
Auditor (CISA) of the Information Systems Audit and Control Association (ISACA), and the
Certified Information Systems Security Professional (CISSP) of the International Information
System Security Certification Consortium, known as (ISC)². These standards recommend unified
information security activities, such as the formulation of an information security policy, the
construction and operation of an information security organization, human resources
management, physical security management, technical security management, security audit
and business continuity management.
The table below lists the standards related to information security domains.
Table 5: Information security domains and related standards and certifications
Security domain: Administrative
  ISO/IEC 27001: Information Security Policy; Organization of Information Security; Asset Management; Human Resources Security; Information Security Incident Management; Information Security Aspects of Business Continuity Management; Supplier Relationships; Compliance
  CISA: Governance and Management of IT; Protection of Information Assets; Information Systems (IS) Auditing Process
  CISSP: Security Architecture and Engineering; Security and Risk Management; Security Assessment and Testing
Security domain: Physical
  ISO/IEC 27001: Physical and Environmental Security
  CISA: (none)
  CISSP: Asset Security
Security domain: Technical
  ISO/IEC 27001: Cryptography; Communications Security; Operations Security; Access Control
  CISA: Information Systems Operations and Business Resilience
  CISSP: Security Operations; Communications and Network Security; Identity and Access Management
ISO/IEC 27001 focuses on administrative security. In particular, it emphasizes documentation
and operation audit as administrative behaviour and the observance of policy/guideline and
law. Continuous confirmation and countermeasures by the administrator are required. Thus,
ISO/IEC 27001 tries to address the weak points of security systems, equipment, and the like
in an administrative way.
In contrast, there is no mention of human resources or physical security in CISA, which
focuses on audit activities and controls on information systems. Accordingly, the role of
auditors and the performance of the audit process are considered very important.
CISSP focuses mainly on technical security. It emphasizes software development,
identity and access management, communications and network security and operations
security.
6.5 Improving Security

Given the trends in security threats and attack technologies, a robust defense requires a
flexible strategy that allows adaptation to the changing environment, well-defined policies
and procedures, the use of appropriate security technologies, and constant vigilance.
It is helpful to begin a security improvement programme by determining the current state of
security. Integral to a security programme are documented policies and procedures, as well as
technology that supports their implementation.
A. Administrative Security
Administrative security consists of an information security strategy, policy and guidelines.
An information security strategy sets the direction for all information security activities.
An information security policy is a documented high-level plan for organization-wide
information security. It provides a framework for making specific decisions, such as an
administrative and physical security plan.
Information security guidelines should be established according to the information security
strategy and policy. The guidelines should specify regulations for each area related to
information security. And because the guidelines must be comprehensive and national in
scope, they must be developed and delivered by the government for observance by
organizations.
A country’s information security strategy, policy and guidelines should be in compliance
with related law. Their scope should be within the boundaries of national and international
laws.
B. Technological security
Various technologies have been developed to help organizations secure their information
systems against intruders. These technologies help to protect systems and information against
attacks, to detect unusual or suspicious activities, and to respond to events that affect security.
Today’s security systems have been designed and developed based on a Defense-In-Depth
(DID) model that leads to unified management of the technologies involved. This model is
different from perimeter defence, which has only one layer of defence against all threats. The
DID model consists of prevention, detection and tolerance, with threats being reduced at each
phase.

Prevention Technology
Prevention technologies protect against intruders and threats at the storage or system level.
These technologies include the following:
1. Cryptography – Also referred to as encryption, cryptography is a process of
translating information from its original form (called plaintext) into an encoded,
incomprehensible form (called ciphertext). Decryption refers to the process of taking
ciphertext and translating it back into plaintext.
2. One-time passwords (OTPs) – As the name implies, OTPs can be used only once.
Static passwords, by contrast, are more easily compromised through password loss,
password sniffing, brute-force password cracking, and the like.
3. Firewalls – Firewalls regulate some of the flow of traffic between computer networks
of different trust levels such as between the Internet, which is a no-trust zone, and an
internal network, which is a zone of higher trust.
4. Vulnerability analysis tool – In computer security, a vulnerability is a weakness that
allows an attacker to violate a system. Vulnerabilities may result from weak
passwords, software bugs, a computer virus, a script code injection, an SQL injection
or malware. Vulnerability analysis tools detect these vulnerabilities.
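As an illustration of item 2 above, the sketch below generates counter-based one-time passwords following the HOTP scheme of RFC 4226, which is one common way of producing OTPs (the text above does not prescribe a particular scheme). The secret is the published RFC test value and must not be used in practice.

```python
import hashlib
import hmac
import struct

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226)."""
    # HMAC-SHA-1 over the 8-byte big-endian counter.
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low 4 bits pick the offset
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

secret = b"12345678901234567890"  # RFC 4226 test secret; never reuse in practice
print(hotp(secret, 0), hotp(secret, 1))
```

Because the counter advances after every use, a sniffed password is of no value for the next login, unlike a static password.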
Detection Technology
Detection technology is used to detect and trace abnormal states and intrusion in networks or
important systems. Detection technology includes the following:
1. Antivirus – Antivirus software is a computer program for identifying, neutralizing
or eliminating malicious code, including worms, phishing attacks, rootkits, Trojan
horses and other malware.
2. Intrusion detection system (IDS) – An IDS gathers and analyses information from
various areas within a computer or a network to identify possible security breaches.
Intrusion detection functions include analysis of abnormal activity patterns and ability
to recognize attack patterns.
3. Intrusion prevention system (IPS) – Intrusion prevention attempts to identify potential
threats and respond to them before they are used in attacks. An IPS monitors network
traffic and takes immediate action against potential threats according to a set of rules
established by the network administrator. For example, an IPS might block traffic
from a suspicious IP address.

4. Malware sandbox system – A "malware sandbox" is a security system that isolates the
execution of programs, usually in an effort to keep malware from spreading. It is
often used to execute untested or untrusted programs or code, possibly from
unverified or untrusted third parties, suppliers, users or websites, in a “sandbox”
without risking harm to the host machine or operating system. The sandbox typically
tightly controls the programs, and restricts the program’s access to disk, memory and
network.
5. Network Traffic Analysis (NTA) – Network traffic analysis is an active cyber defence
activity. It is "the process of proactively and iteratively searching through networks to
detect and isolate advanced threats that evade existing security solutions".
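The rule-based behaviour described for an IPS in item 3 above can be sketched in Python. This toy example assumes two administrator rules, a blocked network range and a limit on failed logins; the addresses come from ranges reserved for documentation, and all names and thresholds are hypothetical.

```python
import ipaddress

# Rules set by the administrator (hypothetical values).
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
MAX_FAILED_LOGINS = 3

failed_logins = {}  # source IP -> number of failed logins seen so far

def inspect(source_ip, login_failed):
    """Return 'block' or 'allow' for one connection attempt."""
    addr = ipaddress.ip_address(source_ip)
    if any(addr in net for net in BLOCKED_NETWORKS):
        return "block"  # rule 1: traffic from a suspicious range
    if login_failed:
        failed_logins[source_ip] = failed_logins.get(source_ip, 0) + 1
    if failed_logins.get(source_ip, 0) >= MAX_FAILED_LOGINS:
        return "block"  # rule 2: repeated failed logins
    return "allow"

print(inspect("203.0.113.9", login_failed=False))
print([inspect("198.51.100.7", login_failed=True) for _ in range(3)])
```

An IDS would stop at reporting such events; the immediate "block" decision is what makes this an intrusion prevention rule.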
6.6. Protection of Privacy
This section aims to:
• Trace changes in the concept of privacy;
• Describe international trends in privacy protection; and
• Give an overview and examples of Privacy Impact Assessment.
6.6.1. The Concept of Privacy
Personal information is any information relating to an identified or identifiable natural
person. It includes information such as an individual’s name, phone number, address, e-mail
address, vehicle licence number, physical characteristics (facial dimensions, fingerprints,
handwriting, etc.), credit card number and family relationships.
Inappropriate access to, and collection, analysis and use of, an individual’s personal
information can affect the behaviour of others towards that individual, and can ultimately
harm his/her social standing, property and safety. Therefore, personal information should be
protected from improper access, collection, storage, analysis and use. In this sense,
personal information is the subject of protection.
When the subject of protection is the right to personal information rather than the personal
information itself, this is the concept of privacy. There are five ways to explain the right to
privacy:
• The right to be free from unwanted access (e.g., physical access and access via short
messaging service)
• The right not to allow personal information to be used in an unwanted way (e.g., sale
of information, exposure of information and matching)
• The right not to allow personal information to be collected by others without
one’s knowledge and consent (e.g., through the use of CCTV and cookies)
• The right to have personal information expressed accurately and correctly (i.e.
integrity)
• The right to get rewarded for the value of one’s own information
6.6.2. United Nations Guidelines Related to Protection of Privacy

Since the late 1960s, the world has paid attention to the effect of automated information
processing on privacy. UNESCO in particular has shown interest in privacy and privacy
protection since the “UN Guidelines for the Regulation of Computerized Personal Data Files”
were adopted by the General Assembly in 1990.
The United Nations Guidelines are applied to documents (papers) as well as computerized
data files in the public or private sectors. The Guidelines establish a series of principles
concerning minimum guarantees to be provided for national legislation or in the internal laws
of international organizations, as follows:
1. Principle of lawfulness and fairness

Information about persons should not be collected or processed in unfair or unlawful ways,
nor should it be used for ends contrary to the purposes and principles of the Charter of the
United Nations.
2. Principle of accuracy
Persons responsible for the compilation of files or those responsible for keeping them have an
obligation to conduct regular checks on the accuracy and relevance of the data recorded and
to ensure that they are kept as complete as possible in order to avoid errors of omission, and
that they are kept up to date regularly or when the information contained in a file is used, as
long as they are being processed.
3. Principle of purpose-specification
The purpose that a file is to serve and its utilization in terms of that purpose should be
specified, legitimate and, when it is established, receive a certain amount of publicity or be
brought to the attention of the person concerned.
4. Principle of interested-person access
Everyone who offers proof of identity has the right to know whether information concerning
him/her is being processed and to obtain it in an intelligible form, without undue delay or
expense, and to have appropriate rectifications or erasures made in the case of unlawful,
unnecessary or inaccurate entries and, when it is being communicated, to be informed of the
addressees.
5. Principle of non-discrimination
Subject to cases of exceptions restrictively envisaged under principle 6, data likely to give
rise to unlawful or arbitrary discrimination, including information on racial or ethnic origin,
colour, sex life, political opinions, religious, philosophical and other beliefs as well as
membership in an association or trade union, should not be compiled.
6. Power to make exceptions

Departures from principles 1 to 4 may be authorized only if they are necessary to protect
national security, public order, public health or morality, as well as, inter alia, the rights and
freedoms of others, especially persons being persecuted (humanitarian clause), provided that
such departures are expressly specified in a law or equivalent regulation promulgated in
accordance with the internal legal system that expressly states their limits and sets forth
appropriate safeguards.
7. Principle of security
Appropriate measures should be taken to protect the files against both natural dangers, such
as accidental loss or destruction, and human dangers, such as unauthorized access, fraudulent
misuse of data or contamination by computer viruses.
8. Supervision and sanctions

The law of every country shall designate the authority that, in accordance with its domestic
legal system, is to be responsible for supervising observance of the principles set forth above.
This authority shall offer guarantees of impartiality, independence vis-à-vis persons or
agencies responsible for processing and establishing data, and technical competence. In the
event of violation of the provisions of the national law implementing the aforementioned
principles, criminal or other penalties should be envisaged together with the appropriate
individual remedies.
9. Transborder data flows
When the legislation of two or more countries concerned by a transborder data flow offers
comparable safeguards for the protection of privacy, information should be able to circulate
as freely as inside each of the territories concerned. If there are no reciprocal safeguards,
limitations on such circulation may not be imposed unduly and only in so far as the
protection of privacy demands.
10. Field of application
The present principles should be made applicable, in the first instance, to all public and
private computerized files and, by means of optional extension and subject to appropriate
adjustments, to manual files. Special provisions, also optional, might be made to extend all or
part of the principles to files on legal persons particularly when they contain some
information on individuals.
6.6.3. Privacy Impact Assessment (PIA)
A Privacy Impact Assessment (PIA) is a systematic process of investigating, analysing and
evaluating how the introduction of new information systems, or the modification of existing
ones, affects the customers’ or the nation’s privacy. A PIA is based on the principle of
preliminary prevention, i.e. prevention is better than cure. It is not simply a system
evaluation but a consideration of the serious effects on privacy of introducing new systems
or changing existing ones. Thus, it differs from a privacy protection audit, which checks
that internal policy and external privacy requirements are being observed.

Because a PIA is conducted to analyse the privacy invasion factor when a new system is
built, it should be performed at the early phase of development, when adjustments to
development specifications are still possible. However, when a serious invasion risk occurs in
collecting, using and managing personal information while operating the existing service, it
would be desirable to perform a PIA and then modify the system accordingly.
UNIT 7. INFORMATION SYSTEM ANALYSIS AND DESIGN
7.0 Introduction
Information systems analysis and design is a method used by companies ranging from IBM
to PepsiCo to Sony to create and maintain information systems that perform basic business
functions such as keeping track of customer names and addresses, processing orders, and
paying employees. The main goal of systems analysis and design is to improve organizational
systems, typically through applying software that can help employees accomplish key
business tasks more easily and efficiently. As a systems analyst, you will be at the center of
developing this software.
Figure 7.1: Information System Analysis and Design

The analysis and design of information systems are based on:
• Your understanding of the organization’s objectives, structure, and processes
• Your knowledge of how to exploit information technology for advantage
To be successful in this endeavour, you should follow a structured approach.
7.1 The Systems Development Life Cycle (SDLC)
The SDLC is a structured approach used in the field of software engineering to guide the
development of information systems and software applications. It encompasses a series of
phases or stages that help ensure the successful planning, design, development,
implementation, and maintenance of a system. Here are the typical phases of the SDLC:
Figure 7.2: Stages of the Systems Development Life Cycle (SDLC)
Stage 1: Analysis – What Do We Want?
In this initial phase, the project team works closely with stakeholders to understand their
needs and gather requirements for the system. This involves identifying user requirements,
functional specifications, and any constraints or limitations.
Stage 2: Design – What Will the Finished Project Look Like?
In this phase, the system's architecture and design are created based on the requirements
gathered. The design includes components such as database structure, user interface, system
modules, and integration points. Design decisions are made to ensure that the system meets
the specified requirements and is scalable, maintainable, and efficient.
Stage 3: Development – Let’s Create the System
This phase involves the actual development of the system. Programmers write code based on
the design specifications, and databases are created and populated with data. This phase may
also involve integrating third-party components or customizing existing software. Testing is
an integral part of development to identify and fix any defects or issues.
Stage 4: Testing – Does the System Work?
In this phase, the system is thoroughly tested to ensure it meets the specified requirements
and functions correctly. Different testing techniques, such as unit testing, integration testing,
system testing, and user acceptance testing, are employed to validate the system's
functionality, performance, and reliability. Defects and issues are identified and resolved
during this phase.
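Unit testing, the first technique named above, can be sketched with Python's built-in `unittest` module. The function `order_total` and its discount rule are hypothetical; they serve only to show how a test checks one small unit of code against its requirement.

```python
import unittest


def order_total(unit_price, quantity, discount_rate=0.0):
    """Hypothetical unit under test: total price of an order line after discount."""
    if quantity < 0 or not 0.0 <= discount_rate <= 1.0:
        raise ValueError("invalid quantity or discount rate")
    return round(unit_price * quantity * (1.0 - discount_rate), 2)


class OrderTotalTest(unittest.TestCase):
    def test_total_without_discount(self):
        self.assertEqual(order_total(10.0, 3), 30.0)

    def test_total_with_discount(self):
        self.assertEqual(order_total(10.0, 3, discount_rate=0.5), 15.0)

    def test_negative_quantity_is_rejected(self):
        with self.assertRaises(ValueError):
            order_total(10.0, -1)


def run_tests():
    """Run the test case and return True if every test passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(OrderTotalTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()


if __name__ == "__main__":
    print("all tests passed:", run_tests())
```

Integration, system, and user acceptance testing follow the same idea at a larger scale: an expected result is stated in advance and compared with what the system actually does.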
Stage 5: Implementation – How Will We Use It?
Once the system has passed the testing phase, it is ready for deployment. This involves
installing the system in the production environment, configuring any necessary hardware or
software, and migrating data from legacy systems if applicable. User training and
documentation may also be provided to facilitate the smooth transition to the new system.
Stage 6: Maintenance – Let’s Make the Improvements
After the system is deployed, ongoing maintenance and support activities are performed. This
includes monitoring the system's performance, addressing user feedback, fixing defects, and
implementing changes or enhancements as needed. Regular updates, patches, and system
backups are part of the maintenance process to ensure the system's stability and security.
There are different variations and models of the SDLC, such as the Waterfall model, Agile
methodologies (e.g., Scrum, Kanban), and iterative models. Each model has its own
characteristics and emphasizes different aspects of the development process. The choice of
SDLC model depends on factors such as project requirements, timeline, team size, and
organizational preferences.
7.2 Requirement Analysis and System Design
7.2.1 Requirement Analysis
Requirement analysis involves gathering, documenting, and analyzing the needs and
expectations of stakeholders to define the system requirements. The goal is to understand
what the software system should do, its functionalities, constraints, and any specific user
requirements. Here are the key steps involved in requirement analysis:
• Elicitation: Gathering requirements from stakeholders through interviews, surveys,
workshops, and other techniques. This includes understanding user needs, business
processes, and system constraints.

• Documentation: Capturing and documenting the requirements in a clear and
understandable manner. This can be done using various tools and formats such as
requirement documents, use cases, user stories, or diagrams.
• Analysis and Prioritization: Analyzing the requirements, identifying dependencies,
and prioritizing them based on their importance and impact on the system.
• Validation: Ensuring the requirements are complete, consistent, and aligned with the
stakeholders' expectations. This may involve reviews, walkthroughs, and obtaining
feedback from stakeholders.
• Requirements Traceability: Establishing traceability between requirements and other
artifacts such as design, implementation, and testing to ensure that all requirements
are addressed.
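Requirements traceability can be pictured as a simple matrix that links each requirement to the artifacts that address it. The sketch below is illustrative only: the requirement IDs, the artifact names, and the `untraced` helper are hypothetical.

```python
# Hypothetical requirements and the artifacts linked to each of them.
requirements = {
    "REQ-1": "Customer can place an order",
    "REQ-2": "System emails an order confirmation",
    "REQ-3": "Manager can view daily sales report",
}

trace_links = {
    "REQ-1": {"design": ["OrderModule"], "tests": ["test_place_order"]},
    "REQ-2": {"design": ["NotificationModule"], "tests": []},
    # REQ-3 has no links yet.
}


def untraced(requirements, trace_links, artifact_type):
    """Return IDs of requirements with no linked artifact of the given type."""
    return sorted(
        req_id
        for req_id in requirements
        if not trace_links.get(req_id, {}).get(artifact_type)
    )


print("No design element:", untraced(requirements, trace_links, "design"))
print("No test case:", untraced(requirements, trace_links, "tests"))
```

A check like this makes gaps visible early: any requirement that appears in either list is not yet fully addressed by the design or the test plan.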
7.2.2 System Design
System design involves transforming the requirements into a detailed blueprint of the
software system. It focuses on defining the system's architecture, components, interfaces, and
data structures. The system design phase typically includes the following steps:
Architectural Design: Defining the overall structure and organization of the system. This
includes identifying modules, components, and their interactions. Architectural design
decisions consider factors such as scalability, performance, security, and maintainability.
Component Design: Designing individual components or modules that make up the system.
This involves specifying the internal structure, interfaces, and interactions of each
component. Design patterns and best practices are often applied to ensure modular, reusable,
and maintainable code.
Data Design: Designing the data structures and databases required by the system. This
includes defining the data entities, relationships, and access methods. Data integrity, security,
and efficiency considerations are taken into account.
User Interface Design: Designing the user interface (UI) of the software system. This
includes creating wireframes, prototypes, or mockups to visualize the layout, navigation, and
interaction flow. Usability and user experience (UX) principles are applied to ensure an
intuitive and user-friendly interface.
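The data design step can be sketched with Python's built-in `sqlite3` module. The `customer` and `customer_order` tables are hypothetical entities chosen for illustration; the sketch shows two entities, the relationship between them, and integrity rules enforced by the database.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
connection = sqlite3.connect(":memory:")
connection.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

connection.executescript(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        email       TEXT NOT NULL UNIQUE
    );
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        total       REAL NOT NULL CHECK (total >= 0)
    );
    """
)

connection.execute("INSERT INTO customer VALUES (1, 'Ada', 'ada@example.com')")
connection.execute("INSERT INTO customer_order VALUES (100, 1, 25.5)")

# Integrity rule in action: an order for a non-existent customer is rejected.
try:
    connection.execute("INSERT INTO customer_order VALUES (101, 999, 10.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = connection.execute(
    "SELECT c.name, o.total FROM customer c JOIN customer_order o USING (customer_id)"
).fetchall()
print(rows, "invalid order rejected:", rejected)
```

Declaring the constraints in the schema, rather than only in application code, means every program that uses the database is held to the same integrity rules.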
7.3 Software Development Methodologies
There are several software development methodologies or approaches that guide the process
of developing software. These methodologies provide frameworks, principles, and practices
to manage the software development life cycle effectively. Here are some commonly used
software development methodologies:
a. Waterfall Model:
The Waterfall model is a linear and sequential approach to software development. It follows a
strict top-down flow, where each phase (requirements, design, implementation, testing,
deployment) is completed before moving to the next. It is a plan-driven methodology and
works well for projects with well-defined and stable requirements.
b. Agile Model:
Agile methodologies, such as Scrum, Kanban, and Extreme Programming (XP), promote an
iterative and incremental approach. They emphasize flexibility, collaboration, and
adaptability to change. Agile methodologies prioritize delivering working software in short
iterations (called sprints in Scrum), gathering feedback from stakeholders, and continuously
improving the product.
c. Spiral Model:
The Spiral model is an iterative and risk-driven approach that combines elements of the
Waterfall model and prototyping. It involves multiple iterations where each iteration includes
requirements gathering, risk analysis, prototyping, development, and testing. The Spiral
model is suitable for projects with high-risk factors that require a flexible and iterative
approach.
d. Iterative Model:
The Iterative model incorporates a series of smaller “waterfalls,” where manageable portions
of code are carefully analyzed, tested, and delivered through repeating development cycles.
Getting early feedback from end users helps eliminate issues and bugs in the early stages of
software creation.
The Iterative model is often favored because it is adaptable, and changes are comparatively
easy to accommodate.
e. V-Model
8. List various methods used in on-site observation.
UNIT 8. LEGAL AND ETHICAL ISSUES IN INFORMATION MANAGEMENT
8.0 Introduction
It probably goes without saying that the security and ethical issues raised by the Information
Age, and specifically the Internet, are the most explosive to face our society in decades. It
will be many years and many court battles before socially acceptable policies and practices
are in place. You say to yourself, “Hey I don’t really care about my online privacy. Nobody
will ever care about what I do or where I go on the Internet.” Well, you might want to think
twice about that.
Organizations today maintain strict policies and measures on ethical concerns in information
management. Training helps employees learn the expected conduct and how to resolve issues
when they arise. Workplace policies are based on the company’s mission statement and
philosophy, and they help employees follow the code of conduct.
8.1 Intellectual Property Rights and Copyright Laws
Intellectual property (IP) refers to the creations of the human mind like inventions, literary
and artistic works, and symbols, names, images and designs used in commerce. Intellectual
property is divided into two categories: Industrial property, which includes inventions
(patents), trademarks, industrial designs, and geographic indications of source; and
Copyright, which includes literary and artistic works such as novels, poems and plays, films,
musical works, artistic works such as drawings, paintings, photographs and sculptures, and
architectural designs. Rights related to copyright include those of performing artists in their
performances, producers of phonograms in their recordings, and those of broadcasters in their
radio and television programs. Intellectual property rights protect the interests of creators by
giving them property rights over their creations.

The most noticeable difference between intellectual property and other forms of property,
however, is that intellectual property is intangible, that is, it cannot be defined or identified
by its own physical parameters. It must be expressed in some discernible way to be
protectable.
8.1.1 Types of Intellectual Property
The term intellectual property is usually thought of as comprising four separate legal fields:
1. Trademarks
2. Copyrights
3. Patents
4. Trade secrets
1. Trademarks and Service Marks: A trademark or service mark is a word, name, symbol,
or device used to indicate the source, quality and ownership of a product or service. In
marketing, a trademark is a recognizable sign, design or expression that distinguishes the
products or services of a particular source from those of others. The trademark owner can be
an individual, business organization, or any legal entity. A trademark may be located on a
package, a label, a voucher or on the product itself. Trademarks are also used to support
corporate identity.
2. Copyrights:
Copyright is a form of protection provided by U.S. law (17 U.S.C. § 101 et seq.) to the authors
of "original works of authorship" fixed in any tangible medium of expression. The manner
and medium of fixation are virtually unlimited. Creative expression may be captured in
words, numbers, notes, sounds, pictures, or any other graphic or symbolic media. The subject
matter of copyright is extremely broad, including literary, dramatic, musical, artistic,
audiovisual, and architectural works. Copyright protection is available to both published and
unpublished works.
3. Patents:
A patent for an invention is the grant of a property right to the inventor, issued by the United
States Patent and Trademark Office. Generally, the term of a new patent is 20 years from the
date on which the application for the patent was filed in the United States or, in special cases,
from the date an earlier related application was filed, subject to the payment of maintenance
fees. U.S. patent grants are effective only within the United States, U.S. territories, and U.S.
possessions. Under certain circumstances, patent term extensions or adjustments may be
available.
There are three types of patents:
Utility patents may be granted to anyone who invents or discovers any new and useful
process, machine, article of manufacture, or composition of matter, or any new and useful
improvement thereof.

Design patents may be granted to anyone who invents a new, original, and ornamental
design for an article of manufacture; and
Plant patents may be granted to anyone who invents or discovers and asexually reproduces
any distinct and new variety of plant.
4. Trade Secrets: A trade secret consists of any valuable business information that is
kept from competitors. There is no limit to the type of information that can be protected as
a trade secret: for example, recipes, marketing plans, financial projections, and methods of
conducting business can all constitute trade secrets.
There is no requirement that a trade secret be unique or complex; thus, even something as
simple and nontechnical as a list of customers can qualify as a trade secret as long as it
affords its owner a competitive advantage and is not common knowledge.
8.2 Ethical Considerations in Information Management
Ethical considerations in information management are crucial to ensure the responsible and
proper handling of data and information. Some key ethical considerations in information
management include:
1. Privacy: Respecting individual privacy is of utmost importance. Organizations should
collect, store, and use personal information only with the informed consent of
individuals, and they should take appropriate measures to protect the confidentiality
of that information.
2. Data Security: Safeguarding data from unauthorized access, breaches, or misuse is
essential. Organizations should implement robust security measures, including
encryption, access controls, and regular monitoring, to protect sensitive information.
3. Data Accuracy and Quality: Organizations should strive to maintain accurate and
reliable data. They should ensure that data is collected and recorded correctly, and
take steps to validate and verify its accuracy. Inaccurate or misleading information
can have serious consequences and should be avoided.
4. Transparency: Organizations should be transparent about how they collect, use, and
share data. They should provide clear and easily understandable information about
their data management practices, including the purposes for which data is collected
and any third parties with whom it is shared.
5. Consent and Control: Individuals should have control over their personal data.
Organizations should obtain informed consent from individuals before collecting their
data, and individuals should have the right to access, modify, or delete their data if
desired.
6. Avoidance of Discrimination: Information management practices should not lead to
discrimination or unfair treatment of individuals or groups. Organizations should
ensure that data is used in a fair and unbiased manner, and take steps to prevent
algorithmic biases or discriminatory profiling.
7. Compliance with Laws and Regulations: Organizations should comply with
applicable laws, regulations, and industry standards related to information
management. This includes data protection laws, intellectual property rights, and
industry-specific requirements.
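Items 1 and 5 above (informed consent, and the right to access, modify, or delete one's data) can be sketched as a small in-memory registry. The class `PersonalDataStore` and its methods are hypothetical names used only for illustration; a real system would also need authentication, audit logging, and persistent storage.

```python
class ConsentError(Exception):
    """Raised when data is collected without recorded consent."""


class PersonalDataStore:
    """Hypothetical in-memory store that refuses to collect data without consent."""

    def __init__(self):
        self._consent = set()   # IDs of people who gave informed consent
        self._records = {}      # person ID -> dict of personal data

    def give_consent(self, person_id):
        self._consent.add(person_id)

    def collect(self, person_id, data):
        if person_id not in self._consent:
            raise ConsentError(f"no consent recorded for {person_id}")
        self._records.setdefault(person_id, {}).update(data)

    def access(self, person_id):
        # Return a copy so callers cannot alter the stored record directly.
        return dict(self._records.get(person_id, {}))

    def modify(self, person_id, field, value):
        self._records[person_id][field] = value

    def delete(self, person_id):
        self._records.pop(person_id, None)
        self._consent.discard(person_id)


store = PersonalDataStore()
store.give_consent("p1")
store.collect("p1", {"email": "old@example.com"})
store.modify("p1", "email", "new@example.com")
print(store.access("p1"))
store.delete("p1")
print(store.access("p1"))
```

The design choice worth noting is that consent is checked at the point of collection, so data cannot enter the store unless consent has been recorded first.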

8.3 Privacy and Data Protection Regulations
Privacy and data protection regulations are laws and regulations designed to safeguard the
privacy and personal data of individuals. These regulations vary across countries and regions,
but they generally share common objectives in protecting the rights and freedoms of
individuals and establishing rules for the collection, use, storage, and disclosure of personal
information. They include:
General Data Protection Regulation (GDPR): The GDPR is a comprehensive privacy
regulation that has applied in the European Union (EU) since May 2018. It applies in all EU
member states and regulates the processing of personal data of individuals in the EU. The GDPR
grants individuals various rights, such as the right to access, rectify, and erase their personal
data, and imposes obligations on organizations to ensure privacy and data protection.
California Consumer Privacy Act (CCPA): The CCPA is a privacy law in California,
United States, that became effective in January 2020. It aims to enhance privacy rights and
consumer protection for California residents. The CCPA grants consumers the right to know
what personal information is collected about them, the right to opt out of the sale of their
personal information, and the right to request deletion of their data.
Personal Information Protection and Electronic Documents Act (PIPEDA): PIPEDA is a
federal privacy law in Canada that applies to the collection, use, and disclosure of personal
information in the course of commercial activities. It sets out rules for obtaining consent,
safeguarding personal information, and providing individuals with access to their data.
PIPEDA has been supplemented by provincial privacy laws in some provinces, such as the
Personal Information Protection Act (PIPA) in Alberta and British Columbia.
Health Insurance Portability and Accountability Act (HIPAA): HIPAA is a United States
federal law that sets privacy and security standards for protected health information (PHI). It
applies to health care providers, health plans, and other entities that handle PHI. HIPAA aims
to protect the privacy and confidentiality of individuals' medical information and establishes
guidelines for its use and disclosure.
Personal Data Protection Act (PDPA): The PDPA is a data protection law in Singapore that
governs the collection, use, and disclosure of personal data. It provides individuals with the
right to access and correct their data, imposes obligations on organizations to protect personal
data, and establishes rules for the transfer of data outside Singapore.
Privacy Act of 1974: The Privacy Act is a U.S. federal law that governs the collection, use,
and disclosure of personal information by federal agencies. It grants individuals certain
rights, such as the right to access and amend their records, and imposes obligations on
agencies to protect personal information.

REVIEW QUESTIONS
QUESTION ONE
a) Define the concept of a management information system (MIS) and explain its role in
supporting decision-making within organizations.
b) Discuss the different types of information systems commonly found in organizations, such
as transaction processing systems, decision support systems, and enterprise resource planning
systems. Explain their respective functions and purposes.
c) Analyze the impact of information systems on organizational structure, culture, and
business processes. Provide examples to support your analysis.
QUESTION TWO
a) Explain the role of a database management system (DBMS) in managing organizational
data. Discuss the components and functions of a DBMS.
b) Describe the process of database design, including the steps involved in conceptual,
logical, and physical database design. Discuss the importance of data integrity and security in
database management.
c) Discuss the advantages and challenges associated with implementing a relational database
management system (RDBMS) in an organization. Provide examples to illustrate your points.

FURTHER READING
Goyal, D.P. 2006. Management Information Systems – Managerial Perspectives. Noida:
Vikas Publishing.
Davis, Gordon B. and Olson, Margrethe H. 1985. Management Information Systems. New
York: McGraw-Hill.
Shajahan, S. 2007. Management Information Systems. New Delhi: New Age International.
Bagad, VS. 2009. Management Information Systems. Pune: Technical Publications.
