Chapter Three
A database management system (DBMS) consists of a group of programs used to access and
manage a database as well as provide an interface between the database and its users and other
application programs. A DBMS provides a single point of management and control over data
resources, which can be critical to maintaining the integrity and security of the data. A database, a
DBMS, and the application programs that use the data make up a database environment. Databases
and database management systems are becoming even more important to organizations as they
deal with rapidly increasing amounts of information. Most organizations have many databases;
however, without good data management, it is nearly impossible for anyone to find the relevant,
related information needed for accurate, business-critical decision making.
Organizations and individuals capture prodigious amounts of data from a myriad of sources every
day. Where does all this data come from, where does it go, how is it safeguarded, and how can you
use it to your advantage? For example, if you become a marketing manager, you can access a vast
store of data related to the Web-surfing habits, past purchases, and even social media activity of
existing and potential customers. You can use this information to create highly effective marketing
programs that generate consumer interest and increased sales. If you become a biologist, you may
use big data to study the regulation of genes and the evolution of genomes in an attempt to
understand how the genetic makeup of different cancers influences outcomes for cancer patients.
If you become a human resources manager, you will be able to use data to analyze the impact of
raises and changes in employee-benefit packages on employee retention and long-term costs.
Regardless of your field of study in school and your future career, using database systems and big
data will likely be a critical part of your job. As you read this chapter, you will see how you can
use databases and big data to extract and analyze valuable information to help you succeed. This
chapter starts by introducing basic concepts related to databases and data management systems.
Later, the topic of big data will be discussed along with several tools and technologies used to store
and analyze big data.
Without data and the ability to process it, an organization cannot successfully complete its business
activities. It cannot pay employees, send out bills, order new inventory, or produce information to
assist managers in decision making. Recall that data consists of raw facts, such as employee
numbers and sales figures. For data to be transformed into useful information, it must first be
organized in a meaningful way.
Characters are put together to form a field. A field is typically a name, a number, or a combination
of characters that describes an aspect of a business object (such as an employee, a location, or a
plant) or activity (such as a sale). In addition to being entered into a database, fields can be
computed from other fields. Common computed fields include the total, average, maximum, and
minimum of a set of values. A collection of data fields all related to one object, activity, or
individual is called a record.
By combining descriptions of the characteristics of an object, activity, or individual, a record can
provide a complete description of it. For instance, an employee record is a collection of fields
about one employee. One field includes the employee's name, another field contains the address,
and still others the phone number, pay rate, earnings made to date, and so forth. A collection of
related records is a file; for example, an employee file is a collection of all company employee
records. Likewise, an inventory file is a collection of all inventory records for a particular company
or organization.
At the highest level of the data hierarchy is a database, a collection of integrated and related files.
Together, bits, characters, fields, records, files, and databases form the hierarchy of data.
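The field, record, and file levels of the hierarchy map naturally onto program data structures. The following is a minimal Python sketch, assuming a hypothetical `EmployeeRecord` type and made-up values: each attribute of the class is a field, one instance is a record, a list of instances plays the role of a file, and the totals and averages are computed fields derived from stored ones.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class EmployeeRecord:
    """One record: a group of fields that describe a single employee."""
    employee_number: int      # field
    name: str                 # field
    pay_rate: float           # field (assumed to be dollars per hour)
    earnings_to_date: float   # field


# A file: a collection of related records (all values are made up).
employee_file = [
    EmployeeRecord(1001, "Ana Diaz", 25.0, 12000.0),
    EmployeeRecord(1002, "Ben Okafor", 30.0, 15500.0),
    EmployeeRecord(1003, "Chen Li", 22.5, 9800.0),
]

# Computed fields are derived from stored fields instead of being entered.
total_earnings = sum(r.earnings_to_date for r in employee_file)
average_pay_rate = mean(r.pay_rate for r in employee_file)
max_pay_rate = max(r.pay_rate for r in employee_file)
min_pay_rate = min(r.pay_rate for r in employee_file)

print(f"Total earnings to date: {total_earnings:.2f}")
print(f"Pay rate: avg {average_pay_rate:.2f}, max {max_pay_rate:.2f}, min {min_pay_rate:.2f}")
```

Because the computed values are recalculated from the records, they stay correct when a record changes, which is why they are usually not stored as separate entries.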
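An entity is a person, place, or thing (object) about which data is collected, stored, and
maintained. An attribute is a characteristic of an entity. For example, employee number, last name, first name,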
hire date, and department number are attributes for an employee. See Figure 3.1. The inventory
number, description, number of units on hand, and location of the inventory item in the warehouse
are attributes for items in inventory. Customer number, name, address, phone number, credit rating,
and contact person are attributes for customers. Attributes are usually selected to reflect the
relevant characteristics of entities such as employees or customers. The specific value of an
attribute, called a data item, can be found in the fields of the record describing an entity. A data
key is a field within a record that is used to identify the record.
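A key is useful only if its value identifies a single record. The sketch below is illustrative, not a DBMS feature: it uses a hypothetical helper, `index_by_key`, that builds a Python dictionary from records and refuses any field that repeats a value. Attribute names follow the examples in the text; the data items are made up.

```python
# Attribute names follow the examples in the text; the data items are made up.
employees = [
    {"employee_number": 1001, "last_name": "Diaz", "first_name": "Ana",
     "hire_date": "2019-03-04", "department_number": 10},
    {"employee_number": 1002, "last_name": "Okafor", "first_name": "Ben",
     "hire_date": "2020-07-15", "department_number": 20},
    {"employee_number": 1003, "last_name": "Li", "first_name": "Chen",
     "hire_date": "2021-01-11", "department_number": 10},
]


def index_by_key(records, key_field):
    """Map each key value to its record; fail if the field repeats a value."""
    index = {}
    for record in records:
        key = record[key_field]
        if key in index:
            raise ValueError(f"{key_field}={key!r} appears twice, so it cannot identify a record")
        index[key] = record
    return index


by_number = index_by_key(employees, "employee_number")
print(by_number[1002]["last_name"])  # prints Okafor: one data item of one entity

try:
    index_by_key(employees, "department_number")
except ValueError as err:
    print(err)  # department number is shared, so it is not a usable key here
```

Employee number works as a key because no two employees share it; department number fails because several employees can work in the same department.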
3.2.3. The Database Approach
At one time, information systems referenced specific files containing relevant data. For example,
a payroll system would use a payroll file. Each distinct operational system used data files dedicated
to that system. Today, most organizations use the database approach to data management, where
multiple information systems share a pool of related data.
A database offers the ability to share data and information resources. Federal databases, for
example, often include the results of DNA tests as an attribute for convicted criminals. The
information can be shared with law enforcement officials around the country. Often, distinct yet
related databases are linked to provide enterprise-wide databases. For example, many Walgreens
stores include in-store medical clinics for customers. Walgreens uses an electronic health records
database that stores the information of all patients across all stores. The database provides
information about customers' interactions with the clinics and pharmacies.
When organizing a database, key considerations include determining what data to collect, what the
source of the data will be, who will have access to it, how one might want to use it, and how to
monitor database performance in terms of response time, availability, and other factors. One of the
tools database designers use to show the logical relationships among data is a data model. A data
model is a diagram of entities and their relationships. Data modeling usually involves developing
an understanding of a specific business problem and then analyzing the data and information
needed to deliver a solution.
An enterprise data model involves analyzing the data and information needs of an entire
organization and provides a roadmap for building database and information systems by creating a
single definition and format for data that can ensure compatibility and the ability to exchange and
integrate data among systems. Various models have been developed to help managers and database
designers analyze data and information needs. One such data model is an entity-relationship (ER)
diagram, which uses basic graphical symbols to show the organization of and relationships
between data. In most cases, boxes in ER diagrams indicate data items or entities contained in data
tables, and lines show relationships between entities. In other words, ER diagrams show data items
in tables (entities) and the ways they are related.
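An ER diagram is drawn rather than coded, but the entities and relationships it shows usually become tables linked by keys. The following sketch assumes a one-to-many relationship (one department runs many projects) and uses Python's built-in `sqlite3` module; the table names, column names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Each entity (box) becomes a table; the relationship (line) becomes a
# foreign key in the "many" table pointing at the "one" table.
conn.executescript("""
    CREATE TABLE department (
        dept_number INTEGER PRIMARY KEY,
        dept_name   TEXT NOT NULL
    );
    CREATE TABLE project (
        project_number INTEGER PRIMARY KEY,
        description    TEXT NOT NULL,
        dept_number    INTEGER NOT NULL REFERENCES department (dept_number)
    );
""")

conn.execute("INSERT INTO department VALUES (10, 'Marketing')")
conn.execute("INSERT INTO project VALUES (1, 'Sales manual', 10)")
conn.execute("INSERT INTO project VALUES (2, 'Trade show', 10)")

# A project that points at a nonexistent department breaks the relationship.
try:
    conn.execute("INSERT INTO project VALUES (3, 'Orphan', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print("Orphan project rejected:", rejected)
```

Declaring the relationship in the schema lets the DBMS, rather than each application program, reject rows that would contradict the data model.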
After entering data into a relational database, users can make inquiries and analyze the data. Basic
data manipulations include selecting, projecting, and joining. Selecting involves eliminating rows
according to certain criteria. Suppose a company keeps a project table that contains the project
number, description, and department number for all projects the company is performing. A
department manager who wants to see only the projects assigned to that department can use
selection to eliminate every row whose department number does not match.
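Selection can be sketched with Python's built-in `sqlite3` module and an in-memory database. The project table below, its column names, and its rows are hypothetical. In SQL, the `WHERE` clause performs the selection: rows that fail the criterion are dropped, while all columns of the remaining rows are kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE project (project_number INTEGER PRIMARY KEY, "
    "description TEXT, dept_number INTEGER)"
)
conn.executemany(
    "INSERT INTO project VALUES (?, ?, ?)",
    [(1, "Sales manual", 10), (2, "Payroll upgrade", 20), (3, "Web redesign", 10)],
)

# Selection: the WHERE clause removes rows that fail the criterion,
# while every column of the surviving rows is kept.
selected = conn.execute(
    "SELECT * FROM project WHERE dept_number = ? ORDER BY project_number", (10,)
).fetchall()
print(selected)  # [(1, 'Sales manual', 10), (3, 'Web redesign', 10)]
```

Passing the department number as a `?` parameter, rather than pasting it into the SQL string, also protects the query from malformed or malicious input.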
Projecting involves eliminating columns in a table. For example, a department table might contain
the department number, department name, and Social Security number (SSN) of the manager in
charge of the project. A sales manager might want to create a new table that contains only the
department number and the Social Security number of the manager in charge of the sales manual
project. The sales manager can use projection to eliminate the department name column and create
a new table containing only the department number and Social Security number.
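Projection works on columns instead of rows. This is a minimal sketch, again using `sqlite3` with a hypothetical department table and placeholder SSNs (not real numbers): listing only the wanted columns in `SELECT` drops the department name, and `CREATE TABLE ... AS SELECT` stores the result as a new table, as the sales manager in the text wants.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE department (dept_number INTEGER PRIMARY KEY, "
    "dept_name TEXT, manager_ssn TEXT)"
)
# Placeholder SSNs; not real numbers.
conn.executemany(
    "INSERT INTO department VALUES (?, ?, ?)",
    [(10, "Marketing", "000-00-0001"), (20, "Finance", "000-00-0002")],
)

# Projection: name only the columns to keep, and store the result as a new table.
conn.execute(
    "CREATE TABLE dept_manager AS SELECT dept_number, manager_ssn FROM department"
)

columns = [info[1] for info in conn.execute("PRAGMA table_info(dept_manager)")]
rows = conn.execute("SELECT * FROM dept_manager ORDER BY dept_number").fetchall()
print(columns)  # ['dept_number', 'manager_ssn']
print(rows)     # [(10, '000-00-0001'), (20, '000-00-0002')]
```

One difference from the formal relational definition: SQL keeps duplicate rows in a projection unless `SELECT DISTINCT` is used. Here the department number is unique, so no duplicates can arise.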
Joining involves combining two or more tables. For example, you can combine the project table
and the department table to create a new table with the project number, project description,
department number, department name, and Social Security number for the manager in charge of
the project. Linking, the ability to combine two or more tables through common data attributes to
form a new table with only the unique data attributes, is one of the keys to the flexibility and power
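of relational databases. Suppose the president of a company wants to find out the name of the
manager of the sales manual project as well as the length of time the manager has been with the
company. Because the department table stores only the manager's SSN, answering this question
requires linking the project and department tables with a third table, such as an employee table
that holds each employee's SSN, name, and hire date, through their common attributes
(department number and SSN).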
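The president's question can be answered with a join. The sketch below assumes a third, hypothetical `employee` table holding each employee's SSN, name, and hire date; all rows, the placeholder SSNs, and the reporting date are made up. The project and department tables are linked on department number, and the department and employee tables are linked on SSN.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (project_number INTEGER PRIMARY KEY,
                          description TEXT, dept_number INTEGER);
    CREATE TABLE department (dept_number INTEGER PRIMARY KEY,
                             dept_name TEXT, manager_ssn TEXT);
    CREATE TABLE employee (ssn TEXT PRIMARY KEY, name TEXT, hire_date TEXT);

    INSERT INTO project VALUES (1, 'Sales manual', 10), (2, 'Payroll upgrade', 20);
    INSERT INTO department VALUES (10, 'Marketing', '000-00-0001'),
                                  (20, 'Finance', '000-00-0002');
    INSERT INTO employee VALUES ('000-00-0001', 'Ana Diaz', '2015-06-01'),
                                ('000-00-0002', 'Ben Okafor', '2018-09-17');
""")

# Join on the shared attributes: dept_number links project to department,
# and manager_ssn links department to employee.
name, hire_date = conn.execute("""
    SELECT e.name, e.hire_date
    FROM project AS p
    JOIN department AS d ON p.dept_number = d.dept_number
    JOIN employee AS e ON d.manager_ssn = e.ssn
    WHERE p.description = ?
""", ("Sales manual",)).fetchone()

# Whole years of service as of a fixed, assumed reporting date.
as_of = date(2024, 6, 1)
hired = date.fromisoformat(hire_date)
years_with_company = (
    as_of.year - hired.year - ((as_of.month, as_of.day) < (hired.month, hired.day))
)
print(name, years_with_company)  # Ana Diaz 9
```

None of the three tables answers the question on its own; the join assembles the answer from attributes that each table shares with the next.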
Data used in decision making must be accurate, complete, economical, flexible, reliable, relevant,
simple, timely, verifiable, accessible, and secure. Data cleansing (also called data cleaning or data scrubbing)
is the process of detecting and then correcting or deleting incomplete, incorrect, inaccurate, or
irrelevant records that reside in a database. The goal of data cleansing is to improve the quality of
the data used in decision making. The "bad data" may have been caused by user data-entry errors
or by data corruption during data transmission or storage. Data cleansing is different from data
validation, which involves the identification of "bad data" and its rejection at the time of data entry.
One data cleansing solution is to identify and correct data by cross-checking it against a validated
data set. For example, street number, street name, city, state, and ZIP code entries in an
organization's database may be cross-checked against the United States Postal Service ZIP Code
database. Data cleansing may also involve standardization of data, such as the conversion of various
possible abbreviations (St., St, st., st) to one standard name (Street).
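Both cleansing techniques described above, standardization and cross-checking against a validated data set, can be sketched in a few lines of Python. The suffix table and the two-entry ZIP-to-city table are hypothetical stand-ins for a full reference data set such as a postal database, and the helper names are made up for illustration. Unlike data validation, which would reject the bad entry at input time, this code repairs or flags records that are already stored.

```python
# Hypothetical lookup tables; a real system would use a full validated data set.
SUFFIX_STANDARD = {"st": "Street", "street": "Street", "ave": "Avenue", "avenue": "Avenue"}
VALID_ZIP_CITY = {"10001": "New York", "60601": "Chicago"}


def standardize_street(street: str) -> str:
    """Replace a recognized street-suffix variant with its standard name."""
    words = street.split()
    if words:
        suffix = words[-1].rstrip(".").lower()
        words[-1] = SUFFIX_STANDARD.get(suffix, words[-1])
    return " ".join(words)


def cleanse(record: dict) -> dict:
    """Standardize the street, then cross-check the city against the ZIP code."""
    cleaned = dict(record, street=standardize_street(record["street"]))
    expected_city = VALID_ZIP_CITY.get(cleaned["zip"])
    if expected_city is None:
        cleaned["needs_review"] = True   # cannot verify: flag for correction or deletion
    elif cleaned["city"] != expected_city:
        cleaned["city"] = expected_city  # correct the entry from the validated data set
    return cleaned


for raw in [
    {"street": "350 Main st.", "city": "Chicgo", "zip": "60601"},
    {"street": "12 Oak Ave", "city": "Springfield", "zip": "99999"},
]:
    print(cleanse(raw))
```

The first record is corrected automatically; the second cannot be verified against the reference data, so it is only flagged, leaving the choice between correcting and deleting it to a person.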
Creating and implementing the right database system ensures that the database will support both
business activities and goals. But how do we actually create, implement, use, and update a
database? The answer is found in the database management system (DBMS). As discussed earlier,
a DBMS is a group of programs used as an interface between a database and application programs
or between a database and the user. Database management systems come in a wide variety of types
and capabilities, ranging from small inexpensive software packages to sophisticated systems
costing hundreds of thousands of dollars.
SQL is a special-purpose programming language for accessing and manipulating data stored in a
relational database. SQL was originally defined by Donald D. Chamberlin and Raymond Boyce of
the IBM Research Center and described in their paper "SEQUEL: A Structured English Query
Language," published in 1974. Their work was based on the relational database model described
by Edgar F. Codd in his groundbreaking paper from 1970, "A Relational Model of Data for Large
Shared Data Banks."
SQL databases conform to ACID properties (atomicity, consistency, isolation, durability), defined
by Jim Gray soon after Codd's work was published. These properties guarantee database
transactions are processed reliably and ensure the integrity of data in the database. Atomicity
means that each transaction is treated as a single, indivisible unit: either all of its changes are
applied or none of them are. Consistency means that a transaction can only move the database from
one valid state to another, so the rules and constraints defined for the data are never violated.
Isolation means that a transaction's changes are hidden from other transactions until the current
transaction is finished. Durability means that once a transaction has been committed, its changes
are not lost, even if the system fails afterward.
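Atomicity and consistency can be observed directly with Python's built-in `sqlite3` module. The sketch below uses a hypothetical `account` table whose `CHECK` constraint forbids negative balances. A transfer credits one account and then debits another inside a single transaction: if the debit would break the rule, the credit that was already applied is rolled back too. (An in-memory database does not demonstrate durability, and a single connection does not demonstrate isolation.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move funds in one transaction: both updates commit, or neither does."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))


transfer(conn, 1, 2, 30)       # succeeds
try:
    transfer(conn, 1, 2, 500)  # the debit breaks the CHECK rule, so the credit is undone too
except sqlite3.IntegrityError:
    pass

balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
print(balances)  # [(1, 70), (2, 80)]
```

Without the transaction, the failed transfer would have left account 2 credited with money that never left account 1.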
SQL databases rely upon concurrency control by locking database records to ensure that other
transactions do not modify the database until the first transaction succeeds or fails. As a result, 100
percent ACID-compliant SQL databases can suffer from slow performance. In 1986, the American
National Standards Institute (ANSI) adopted SQL as the standard query language for relational
databases. Since ANSI's acceptance of SQL, interest in making SQL an integral part of relational
databases on both mainframe and personal computers has increased. SQL has many built-in
functions, such as average (AVG), the largest value (MAX), and the smallest value (MIN). Table
3.1 contains examples of SQL commands.
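The built-in functions named above can be tried with Python's `sqlite3` module. In this sketch the employee table and pay rates are hypothetical. The first query applies `AVG`, `MAX`, and `MIN` to the whole table; the second adds `GROUP BY` so the same functions are computed once per department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (employee_number INTEGER PRIMARY KEY, "
    "dept_number INTEGER, pay_rate REAL)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1001, 10, 25.0), (1002, 10, 30.0), (1003, 20, 22.5), (1004, 20, 27.5)],
)

# Aggregate over the whole table.
avg_rate, max_rate, min_rate = conn.execute(
    "SELECT AVG(pay_rate), MAX(pay_rate), MIN(pay_rate) FROM employee"
).fetchone()
print(avg_rate, max_rate, min_rate)  # 26.25 30.0 22.5

# GROUP BY applies the same functions to each department separately.
per_dept = conn.execute(
    "SELECT dept_number, AVG(pay_rate), MAX(pay_rate), MIN(pay_rate) "
    "FROM employee GROUP BY dept_number ORDER BY dept_number"
).fetchall()
print(per_dept)  # [(10, 27.5, 30.0, 25.0), (20, 25.0, 27.5, 22.5)]
```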
Database administrators (DBAs) are skilled and trained IS professionals who hold discussions with
business users to define their data needs; apply database programming languages to craft a set of
databases to meet those needs; test and evaluate databases; implement changes to improve their
performance; and ensure that data is secure from unauthorized access. Database systems require a
skilled DBA, who must have a clear understanding of the fundamental
business of the organization, be proficient in the use of selected database management systems,
and stay abreast of emerging technologies and new design approaches. The role of the DBA is to
plan, design, create, operate, secure, monitor, and maintain databases.
3.6. Big Data
Big data is the term used to describe data collections that are so enormous (terabytes or more) and
complex (from sensor data to social media data) that traditional data management software,
hardware, and analysis processes are incapable of dealing with them.
Organizations collect and use data from a variety of sources, including business applications, social
media, sensors and controllers that are part of the manufacturing process, systems that manage the
physical environment in factories and offices, media sources (including audio and video
broadcasts), machine logs that record events and customer call data, public sources (such as
government Web sites), and archives of historical records of transactions and communications.
TABLE 3.3 Portals that provide access to free sources of useful big data
Here are just a few examples of how organizations are employing big data to improve their day-
to-day operations, planning, and decision making:
Retail organizations monitor social networks such as Facebook, Google, LinkedIn, Twitter,
and Yahoo to engage brand advocates, identify brand adversaries (and attempt to reverse
their negative opinions), and even enable passionate customers to sell their products.
Advertising and marketing agencies track comments on social media to understand
consumers' responsiveness to ads, campaigns, and promotions.
Hospitals analyze medical data and patient records to try to identify patients likely to need
readmission within a few months of discharge, with the goal of engaging with those
patients in the hope of preventing another expensive hospital stay.
Consumer product companies monitor social networks to gain insight into customer
behavior, likes and dislikes, and product perception to identify necessary changes to their
products, services, and advertising.
Financial services organizations use data from customer interactions to identify customers
who are likely to be attracted to increasingly targeted and sophisticated offers.
Manufacturers analyze minute vibration data from their equipment, which changes slightly
as it wears down, to predict the optimal time to perform maintenance or replace the
equipment to avoid expensive repairs or potentially catastrophic failure.
3.6.4. Challenges of Big Data
Individuals, organizations, and society in general must find a way to deal with this ever-growing
data tsunami to escape the risks of information overload. The challenge is manifold, with a variety
of questions that must be answered, including how to choose what subset of data to store, where
and how to store the data, how to find those nuggets of data that are relevant to the decision making
at hand, how to derive value from the relevant data, and how to identify which data needs to be
protected from unauthorized access. With so much data available, business users can have a hard
time finding the information they need to make decisions, and they may not trust the validity of the data they find.
Data management is an integrated set of functions that defines the processes by which data is
obtained, certified fit for use, stored, secured, and processed in such a way as to ensure that the
accessibility, reliability, and timeliness of the data meet the needs of the data users within an
major functions of data management, as shown in Figure 3.21. Data governance is the core
component of data management; it defines the roles, responsibilities, and processes for ensuring
that data can be trusted and used by the entire organization, with people identified and in place
who are responsible for fixing and preventing issues with data.
Data life cycle management (DLM) is a policy-based approach to managing the flow of an
enterprise's data, from its initial acquisition or creation and storage to the time when it becomes
See Figure 3.22. Several vendors offer software products to support DLM, such as the IBM
Information Lifecycle Governance suite of software products.
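The phrase "policy-based" means that rules, rather than case-by-case decisions, determine what happens to data as it ages. The following Python sketch illustrates the idea; it does not describe how any vendor product works. The data classes ("web_logs", "invoices") and their archive and delete thresholds are hypothetical, and real thresholds would come from business and legal requirements.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionPolicy:
    archive_after_days: int  # age at which data moves to cheaper storage
    delete_after_days: int   # age at which data is considered outdated


# Hypothetical policies; real thresholds come from business and legal requirements.
POLICIES = {
    "web_logs": RetentionPolicy(archive_after_days=30, delete_after_days=365),
    "invoices": RetentionPolicy(archive_after_days=365, delete_after_days=7 * 365),
}


def lifecycle_action(data_class: str, created: date, today: date) -> str:
    """Return 'keep', 'archive', or 'delete' for an item, based on its age."""
    policy = POLICIES[data_class]
    age_days = (today - created).days
    if age_days >= policy.delete_after_days:
        return "delete"
    if age_days >= policy.archive_after_days:
        return "archive"
    return "keep"


today = date(2024, 1, 1)
for data_class, created in [
    ("web_logs", date(2023, 12, 15)),
    ("web_logs", date(2023, 6, 1)),
    ("web_logs", date(2022, 1, 1)),
    ("invoices", date(2022, 1, 1)),
]:
    print(data_class, created, lifecycle_action(data_class, created, today))
```

Two items of the same age can receive different actions (a two-year-old web log is deleted while a two-year-old invoice is only archived) because each data class follows its own policy.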