Instructor Materials Chapter 2: Fundamentals of Data Analysis

Uploaded by

abdulaziz doro
Instructor Materials

Chapter 2: Fundamentals of
Data Analysis

Big Data & Analytics

Presentation_ID © 2008 Cisco Systems, Inc. All rights reserved. Cisco Confidential 1
Chapter 2 - Sections & Objectives
 2.1 What is Data Analysis
• Explain how data is used to create knowledge.
 2.2 Using Big Data
• Use software tools to visualize a data analysis following the Data Analysis
Lifecycle process.
 2.3 Data Acquisition and Preparation
• Configure data for analysis.
 2.4 Big Data Ethics
• Explain why ethics are important when using Big Data.
 2.5 Preparation for Chapter 2 Internet Meter Labs
• Analyze data by using an external application and SQLite.
 2.6 Summary
• Summarize the concepts presented in this chapter.

2.1 What is Data Analysis

What is Data Analysis?
Analytics Models

 The six-step Data Analysis Lifecycle
 Data Analytics tools should
provide:
• Ease of use
• Data manipulation
• Sharing
• Interactive exploration

What Are Analytics?
Analytics Models cont…

 The Python programming language has become a commonly used tool for
handling and manipulating data.
 Python will be used in this course to perform data cleaning, analysis,
and manipulation.
 Jupyter Notebooks will be used as both a document for written
instructions as well as a Python command interface for running code.
 The libraries that will be used in this course:
• NumPy – This library adds support for arrays and matrices. It also has
many built-in mathematical functions for use on data sets.
• Pandas – This library adds support for tables and time series. Pandas
is used to manipulate and clean data, among other uses.
• Matplotlib – This library adds support for data visualization.
Matplotlib is a plotting library capable of creating simple line
plots to complicated 3D and contour plots.
2.2 Using Big Data

Using Big Data
Types of Data Analysis
 Scalable technologies are enabling
data center administrators to
manage the top three aspects of Big
Data:
• Volume
• Velocity
• Variety
 The Data, Information, Knowledge,
and Wisdom (DIKW) model shows
the transitions that data undergoes
until it gains enough value to inform
wise decisions. Using data in this way
to support decisions is called
Business Intelligence.

Using Big Data
Why Analyze Big Data?
 Multiple types of analytics provide organizations and people with
information that can drive innovation, improve efficiency and
mitigate risk.
• Descriptive analytics - Relies solely on historical data to provide regular
reports on events that have already happened.
• Predictive analytics - Can infer missing data and establish a future trend
line based on past data. It uses simulation models and forecasting to
suggest what could happen.
• Prescriptive analytics - Recommends actions or decisions based on a
complex set of targets, constraints, and choices.
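
The three kinds of analytics above can be contrasted on one tiny data set. This is only a sketch: the sales figures, the warehouse capacity, and all variable names are made up for illustration, and a straight-line trend stands in for the simulation and forecasting models a real predictive analysis would use.

```python
from statistics import mean

# Hypothetical monthly unit sales, oldest first (illustrative data only).
sales = [100, 110, 120, 130, 140, 150]

# Descriptive: summarize what has already happened.
average_sales = mean(sales)

# Predictive: fit a least-squares trend line and extend it one month ahead.
months = list(range(len(sales)))
m_bar, s_bar = mean(months), mean(sales)
slope = sum((m - m_bar) * (s - s_bar) for m, s in zip(months, sales)) / sum(
    (m - m_bar) ** 2 for m in months
)
intercept = s_bar - slope * m_bar
forecast = intercept + slope * len(sales)

# Prescriptive: recommend an action under a constraint.
# The capacity limit is an assumed business rule, not part of the course material.
WAREHOUSE_CAPACITY = 155
recommended_order = min(round(forecast), WAREHOUSE_CAPACITY)

print(average_sales, forecast, recommended_order)
```

The descriptive step only looks backward, the predictive step extrapolates, and the prescriptive step turns the prediction into a recommended action subject to a constraint.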

Using Big Data
Timely Analysis of Big Data
 With Big Data, much of the value of data is derived from creating
opportunities to take action immediately.
 Data-driven decisions can have the following benefits:
• Reduced time to research and develop products and services
• Increased efficiency and faster manufacturing
• Faster time to market
• More effective marketing and advertising

Using Big Data
Data Analysis Lifecycle
 Gathering the data - The process of
locating data and then determining if there
is enough data to complete the analysis.
 Preparing the data - This step can involve
many tasks to transform the data into a
format appropriate for the tool that will be
used.
 Choosing a model - This step includes
choosing an analysis technique that will
best answer the question with the data
available.
 Analyzing the data - The process of
testing the model against the data and
determining if the model and the analyzed
data are reliable. Were you able to answer
the question with the selected tool?
 Presenting the results - The process of
communicating the results to decision-
makers.
 Making decisions - Organizational leaders
incorporate the new knowledge as part of
the overall strategy. The process then begins
again with gathering data.
2.3 Data Acquisition and
Preparation

Data Acquisition and Preparation
Sources of Data
There are many different sources of data.
 A vast amount of historical data can be found in files
such as:
o MS Word documents
o Emails
o Spreadsheets
o MS PowerPoint presentations
o PDFs
o HTML
o and plaintext files
• Public and Private Archives
• CSV, JSON, and XML files use plaintext, a common
format, and are compatible with a wide range of
applications
• The Web can be mined for data using a web
scraping application
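
Because CSV, JSON, and XML are all plaintext, the Python standard library can read each of them without extra software. The sketch below parses the same made-up record (a city and a temperature, invented for illustration) from all three formats; in practice the text would come from files rather than string literals.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# The same hypothetical record expressed in three plaintext formats.
csv_text = "city,temp_c\nOslo,4\n"
json_text = '{"city": "Oslo", "temp_c": 4}'
xml_text = "<reading><city>Oslo</city><temp_c>4</temp_c></reading>"

# CSV: the first line supplies the field names.
csv_record = next(csv.DictReader(io.StringIO(csv_text)))

# JSON: maps directly onto Python dictionaries and lists.
json_record = json.loads(json_text)

# XML: walk the child elements of the root and collect tag/text pairs.
root = ET.fromstring(xml_text)
xml_record = {child.tag: child.text for child in root}

# CSV and XML values arrive as strings; JSON preserves the numeric type.
print(csv_record["temp_c"], json_record["temp_c"], xml_record["temp_c"])
```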
Data Acquisition and Preparation
Sources of Data cont…
 The IoT uses sensors to create data
• Sensors in smartphones, cars,
airplanes, street lamps, and home
appliances capture raw data

 The list of things with sensors grows every year
• The IoT contributes to the growth of
Big Data

Data Acquisition and Preparation
Data Preparation
 Collected data may not be
compatible or formatted
correctly
• Data must be prepared before it can
be added to a data set

 Extract, Transform and Load (ETL)
• A process for collecting data from a
variety of sources, transforming
the data, and then loading the
data into a database
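
A minimal ETL sketch, assuming a small CSV source and an in-memory SQLite database as the destination. The column names, the sample rows, and the cleaning rules (trim and lowercase names, drop rows whose speed is not a number) are assumptions chosen for illustration.

```python
import csv
import io
import sqlite3

# Extract: read raw rows from a CSV source (a string stands in for a file).
raw_csv = "name,speed_mbps\n alice ,12.5\nBOB,n/a\ncarol,30\n"
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Transform: normalize names, convert speeds to numbers, drop unusable rows.
clean = []
for row in rows:
    try:
        speed = float(row["speed_mbps"])
    except ValueError:
        continue  # skip rows such as "n/a" that cannot be converted
    clean.append((row["name"].strip().lower(), speed))

# Load: write the cleaned rows into a database table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE speeds (name TEXT, speed_mbps REAL)")
conn.executemany("INSERT INTO speeds VALUES (?, ?)", clean)
conn.commit()

loaded = conn.execute(
    "SELECT name, speed_mbps FROM speeds ORDER BY name"
).fetchall()
print(loaded)
```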

Data Acquisition and Preparation
Data Structures
 Relational Database Tables
• Fields (Columns)
• Records (Rows)
• Values (Cells)

 Python
• Strings
• Lists
• Tuples
• Sets
• Dictionaries
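
The Python structures listed above differ mainly in ordering, mutability, and how items are looked up. A short sketch with made-up values:

```python
text = "big data"                      # string: immutable sequence of characters
readings = [3, 1, 3, 2]                # list: ordered, mutable, allows duplicates
point = (59.9, 10.7)                   # tuple: ordered, immutable
unique_readings = set(readings)        # set: unordered, duplicates removed
record = {"sensor": "s1", "value": 3}  # dictionary: key-to-value mapping

readings.append(5)    # lists can grow in place
record["value"] = 4   # dictionary values are updated by key

try:
    point[0] = 0.0    # tuples cannot be modified after creation
except TypeError:
    tuple_is_immutable = True

print(text.upper(), readings, unique_readings, record)
```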

2.4 Big Data Ethics

Big Data Ethics
What are the Ethical Concerns?
 Data protection regulations vary
from country to country
 The CIA triad (confidentiality,
integrity, and availability) is a
guideline for data security in an
organization
 Four general cloud security
controls:
• Deterrent
• Preventive
• Detective
• Corrective

2.5 Preparation for Ch2
Internet Meter Labs

Preparation for Chapter 2 Internet Meter Labs
Part 1
 The datetime module is
part of the Python
standard library;
however, it must be
imported before it can
be used in your code.

 The csv module allows
reading from and writing to
.csv files.
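
A small sketch of the two modules working together, assuming a speed-test log with a timestamp column. The column names, the timestamp format, and the sample row are hypothetical, and an in-memory buffer stands in for a real .csv file.

```python
import csv
import io
from datetime import datetime

# Write a tiny CSV. With a real file, open it with newline="" as well.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["timestamp", "download_mbps"])
writer.writerow(["2017-03-01 14:30:00", "18.2"])

# Read it back; DictReader uses the header row as the field names.
buffer.seek(0)
rows = list(csv.DictReader(buffer))

# Convert the timestamp text into a datetime object for date arithmetic.
stamp = datetime.strptime(rows[0]["timestamp"], "%Y-%m-%d %H:%M:%S")
download = float(rows[0]["download_mbps"])

print(stamp.isoformat(), stamp.hour, download)
```

Every value read by the csv module is a string, so numeric and date fields need an explicit conversion like the ones shown.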

Preparation for Chapter 2 Internet Meter Labs
Part 2
 SQLite is a self-contained SQL
database engine that runs inside
the application itself rather than
using a client-server method of
operation
• Uses connections
established between
Python and an SQL
database by creating an
SQL connection object
 SQL can be said to be a
language composed of
three special purpose
languages
• Data Definition Language
• Data Manipulation Language
• Data Query Language
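
The connection object and the three sub-languages can be seen together using Python's built-in sqlite3 module. This is a sketch only: the table name, column names, and values are invented, and an in-memory database is used so that nothing is written to disk.

```python
import sqlite3

# Create the SQL connection object, then a cursor to run statements.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Data Definition Language: define the structure of the data.
cur.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, ping_ms REAL)")

# Data Manipulation Language: insert and update rows.
cur.executemany(
    "INSERT INTO readings (ping_ms) VALUES (?)",
    [(20.0,), (35.5,), (28.5,)],
)
cur.execute("UPDATE readings SET ping_ms = 30.0 WHERE id = 3")
conn.commit()

# Data Query Language: ask questions of the stored data.
cur.execute("SELECT COUNT(*), AVG(ping_ms) FROM readings")
count, average_ping = cur.fetchone()
conn.close()

print(count, average_ping)
```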

2.6 Summary

Chapter Summary
Summary
 Data can no longer be stored on a few machines or processed with just
one tool
 Decision makers will increasingly rely on data analytics to extract the
required information at the right time, in the right place, to make the right
decision
 Descriptive analytics relies solely on historical data
 Predictive analytics attempts to predict what may happen
 Prescriptive analytics predicts outcomes and suggests courses of
action that will hold the greatest benefit for an organization
 Files, the Internet, sensors, and databases are all good sources of data.
 Extract, Transform and Load (ETL) is a process for collecting data from
a variety of sources, transforming the data, and then loading the data
into a database
 The CIA triad is a guideline for data security for an organization

