Instructor Materials Chapter 2: Fundamentals of Data Analysis
Chapter 2: Fundamentals of
Data Analysis
Presentation_ID © 2008 Cisco Systems, Inc. All rights reserved. Cisco Confidential 1
Chapter 2 - Sections & Objectives
2.1 What is Data Analysis
• Explain how data is used to create knowledge.
2.2 Using Big Data
• Use software tools to visualize a data analysis following the Data Analysis
Lifecycle process.
2.3 Data Acquisition and Preparation
• Configure data for analysis.
2.4 Big Data Ethics
• Explain why ethics are important when using Big Data.
2.5 Preparation for Chapter 2 Internet Meter Labs
• Analyze data by using an external application and SQLite.
2.6 Summary
• Summarize the concepts presented in this chapter.
2.1 What is Data Analysis
What is Data Analysis?
Analytics Models
What Are Analytics?
Analytics Models cont…
Using Big Data
Types of Data Analysis
Scalable technologies are enabling
data center administrators to
manage the top three aspects of Big
Data:
• Volume
• Velocity
• Variety
The Data, Information, Knowledge,
and Wisdom (DIKW) model shows
the transitions that data undergoes
until it gains enough value to inform
wise decisions. This is called
Business Intelligence
Using Big Data
Why Analyze Big Data?
Multiple types of analytics provide organizations and people with
information that can drive innovation, improve efficiency and
mitigate risk.
• Descriptive analytics - Relies solely on historical data to provide regular
reports on events that have already happened.
• Predictive analytics - Can infer missing data and establish a future trend
line based on past data. It uses simulation models and forecasting to
suggest what could happen.
• Prescriptive analytics - Recommends actions or decisions based on a
complex set of targets, constraints, and choices.
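The difference between descriptive and predictive analytics can be sketched in a few lines of Python. This is a minimal illustration, not a production method: the `history` values are made up, and the helper names `describe`, `fit_trend`, and `predict` are hypothetical. The trend line is a plain least-squares fit, which is only one of many possible forecasting models.

```python
from statistics import mean

# Hypothetical monthly download speeds in Mbps (illustrative values only).
history = [10.0, 12.0, 14.0, 16.0]

def describe(values):
    """Descriptive analytics: summarize what has already happened."""
    return {"min": min(values), "max": max(values), "mean": mean(values)}

def fit_trend(values):
    """Fit a least-squares line y = slope * x + intercept over x = 0, 1, 2, ..."""
    xs = range(len(values))
    x_bar, y_bar = mean(xs), mean(values)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar

def predict(values, steps_ahead=1):
    """Predictive analytics: extend the trend line past the last observation."""
    slope, intercept = fit_trend(values)
    return slope * (len(values) - 1 + steps_ahead) + intercept

summary = describe(history)   # report on past data
forecast = predict(history)   # estimate for the next period
print(summary, forecast)
```

Prescriptive analytics would go one step further and recommend an action based on the forecast and a set of constraints, which this sketch does not attempt.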
Using Big Data
Timely Analysis of Big Data
With Big Data, much of the value of data is derived from creating
opportunities to take action immediately.
Data-driven decisions can have the following benefits:
• Increased time to research and develop products and services
• Increased efficiency and faster manufacturing
• Faster time to market
• More effective marketing
and advertising
Using Big Data
Data Analysis Lifecycle
Gathering the data - The process of
locating data and then determining if there
is enough data to complete the analysis.
Preparing the data - This step can involve
many tasks to transform the data into a
format appropriate for the tool that will be
used.
Choosing a model - This step includes
choosing an analysis technique that will
best answer the question with the data
available.
Analyzing the data - The process of
testing the model against the data and
determining if the model and the analyzed
data are reliable. Were you able to answer
the question with the selected tool?
Presenting the results - The process of
communicating the results to decision-
makers.
Making decisions - Organizational leaders
incorporate the new knowledge as part of
the overall strategy. The process then begins again.
2.3 Data Acquisition and
Preparation
Data Acquisition and Preparation
Sources of Data
There are many different sources of data.
A vast amount of historical data can be found in files
such as:
o MS Word documents
o Emails
o Spreadsheets
o MS PowerPoints
o PDFs
o HTML
o and plaintext files
• Public and Private Archives
• CSV, JSON, and XML files use plaintext, a common
format, and are compatible with a wide range of
applications
• The Web can be mined for data using a web
scraping application
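Because CSV and JSON are plaintext formats, Python's standard library can read both without extra tools. The sketch below parses two small in-memory samples; the field names `city` and `speed_mbps` and the values are hypothetical and stand in for real file contents.

```python
import csv
import io
import json

# Hypothetical sample records; in practice these would be read from files.
csv_text = "city,speed_mbps\nLondon,42.5\nLeeds,17.0\n"
json_text = '[{"city": "London", "speed_mbps": 42.5}, {"city": "Leeds", "speed_mbps": 17.0}]'

# csv.DictReader yields one dict per row; every CSV value arrives as a string.
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# json.loads keeps the types encoded in the text (numbers stay numbers).
json_rows = json.loads(json_text)

print(type(csv_rows[0]["speed_mbps"]).__name__)   # str
print(type(json_rows[0]["speed_mbps"]).__name__)  # float
```

Note that the same value has a different type depending on the source format, which is one reason a preparation step is usually needed before analysis.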
Data Acquisition and Preparation
Sources of Data cont…
The IoT uses sensors to create
data
• Sensors in smartphones, cars,
airplanes, street lamps, and home
appliances capture raw data
Data Acquisition and Preparation
Data Preparation
Collected data may not be
compatible or formatted
correctly
• Data must be prepared before it can
be added to a data set
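As a rough illustration of data preparation, the sketch below tidies a few inconsistent records before they would be added to a data set. The records, the field names, and the `prepare` helper are all hypothetical, and the cleaning rules (trim whitespace, accept a decimal comma, skip values that are not numbers) are assumptions chosen for the example.

```python
# Hypothetical raw readings as they might arrive from different sources.
raw_rows = [
    {"city": " london ", "speed_mbps": "42.5"},
    {"city": "LEEDS", "speed_mbps": "17,0"},   # decimal comma
    {"city": "York", "speed_mbps": "n/a"},     # missing measurement
]

def prepare(rows):
    """Return rows with tidy city names and numeric speeds; skip unusable rows."""
    cleaned = []
    for row in rows:
        try:
            speed = float(row["speed_mbps"].replace(",", "."))
        except ValueError:
            continue  # cannot be converted, so leave it out of the data set
        cleaned.append({"city": row["city"].strip().title(), "speed_mbps": speed})
    return cleaned

prepared = prepare(raw_rows)
print(prepared)
```

Whether to drop, repair, or flag an unusable row depends on the analysis; dropping it here is only the simplest choice.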
Data Acquisition and Preparation
Data Structures
Relational Database Tables
• Fields (Columns)
• Records (Rows)
• Values (Cells)
Python
• Strings
• Lists
• Tuples
• Sets
• Dictionaries
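The Python structures listed above can be compared side by side. This is a small sketch with made-up values; it also shows how a dictionary can hold one table row, with field names as keys.

```python
# One table row stored as a dictionary: field names map to values.
row = {"city": "London", "speed_mbps": 42.5}

city = "London"                      # string: immutable text
speeds = [42.5, 17.0, 42.5]          # list: ordered, mutable, allows duplicates
location = (51.5, -0.12)             # tuple: ordered, immutable
unique_speeds = set(speeds)          # set: unordered, duplicates removed

speeds.append(30.1)                  # lists can grow in place
row["ping_ms"] = 12                  # dictionaries accept new keys

print(len(speeds), len(unique_speeds), sorted(row))
```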
2.4 Big Data Ethics
Big Data Ethics
What are the Ethical Concerns?
Data protection regulations vary
from country to country
Confidentiality, integrity, and
availability, known as the CIA
triad, serve as a guideline for
data security in an organization
Four general cloud security
controls:
• Deterrent
• Preventive
• Detective
• Corrective
2.5 Preparation for Ch2
Internet Meter Labs
Preparation for Chapter 2 Internet Meter Labs
Part 1
The datetime module is
part of the Python
standard library;
however, it must be
imported before it can
be used in your code.
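A minimal sketch of importing and using datetime follows. The timestamps and their "YYYY-MM-DD HH:MM:SS" layout are assumptions made for the example; the lab data may use a different format.

```python
from datetime import datetime, timedelta

# Hypothetical timestamps in the form "YYYY-MM-DD HH:MM:SS".
start = datetime.strptime("2017-03-01 08:00:00", "%Y-%m-%d %H:%M:%S")
end = datetime.strptime("2017-03-01 08:30:15", "%Y-%m-%d %H:%M:%S")

elapsed = end - start                        # subtracting datetimes gives a timedelta
next_reading = end + timedelta(minutes=30)   # a timedelta shifts a datetime

print(elapsed.total_seconds())               # 1815.0
print(next_reading.strftime("%H:%M"))        # 09:00
```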
Preparation for Chapter 2 Internet Meter Labs
Part 2
SQLite is a serverless SQL
implementation: the database
engine runs inside the
application and stores data
in a local file
• Python connects to the
database by creating an
SQL connection object
SQL can be said to be a
language composed of
three special purpose
languages
• Data Definition Language
• Data Manipulation Language
• Data Query Language
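The connection object and the three SQL sub-languages can be seen together in a short sketch that uses Python's built-in sqlite3 module. The table name `readings` and its columns are hypothetical, and an in-memory database is used so that nothing is written to disk.

```python
import sqlite3

# ":memory:" creates a temporary database; a file path would persist it instead.
conn = sqlite3.connect(":memory:")

# Data Definition Language: define the structure.
conn.execute("CREATE TABLE readings (city TEXT, speed_mbps REAL)")

# Data Manipulation Language: add and change rows.
conn.executemany(
    "INSERT INTO readings (city, speed_mbps) VALUES (?, ?)",
    [("London", 42.5), ("Leeds", 17.0)],
)
conn.execute("UPDATE readings SET speed_mbps = ? WHERE city = ?", (18.0, "Leeds"))
conn.commit()

# Data Query Language: read rows back.
rows = conn.execute(
    "SELECT city, speed_mbps FROM readings ORDER BY city"
).fetchall()
conn.close()

print(rows)
```

The `?` placeholders pass values separately from the SQL text, which avoids building statements by string concatenation.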
2.6 Summary
Chapter Summary
Summary
Data can no longer be stored on a few machines or processed with just
one tool
Decision makers will increasingly rely on data analytics to extract the
required information at the right time, in the right place, to make the right
decision
Descriptive analytics relies solely on historical data
Predictive analytics attempts to predict what may happen
Prescriptive analytics predicts outcomes and suggests courses of
action that will hold the greatest benefit for an organization
Files, the Internet, sensors, and databases are all good sources of data.
Extract, Transform and Load (ETL) is a process for collecting data from
a variety of sources, transforming the data, and then loading the data
into a database
The CIA triad is a guideline for data security for an organization
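The Extract, Transform and Load (ETL) process mentioned in the summary can be sketched as three small functions. This is an illustration under stated assumptions, not a reference pipeline: the `SOURCE` text, the `readings` table, and the function names are hypothetical, and the transform rule (skip rows whose speed is not a number) is just one possible choice.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real pipeline would read files, APIs, or sensors.
SOURCE = "city,speed_mbps\nLondon,42.5\nLeeds,n/a\nYork,30.0\n"

def extract(text):
    """Extract: read raw records from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(records):
    """Transform: convert speeds to numbers and drop rows that cannot be used."""
    rows = []
    for record in records:
        try:
            rows.append((record["city"], float(record["speed_mbps"])))
        except ValueError:
            continue
    return rows

def load(rows, conn):
    """Load: write the prepared rows into a database table."""
    conn.execute("CREATE TABLE IF NOT EXISTS readings (city TEXT, speed_mbps REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE)), conn)
loaded = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
print(loaded)
```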