Unit-5 new

The document provides an overview of data visualization in data science, detailing its principles, types, and importance in analyzing and interpreting data. It discusses various visualization techniques, tools, and best practices, emphasizing the need for clarity, interactivity, and audience consideration. Additionally, it highlights the advantages and disadvantages of data visualization, as well as the workflow involved in creating effective visual representations of data.

Uploaded by

PARTH BHARADIA

Unit-5

Contents

• Data Visualization: Basic principles, ideas and tools for data visualization.

Data Visualization

• Data visualization in data science refers to the process of generating graphical representations
of information.
• These graphical depictions, often known as plots or charts, are pivotal in the realm of data science
for effective analysis and interpretation.
• Understanding the various types of data visualization in data science is crucial for selecting the
appropriate visual method for the dataset at hand.
• Different types serve different analytical needs, from understanding distributions with
histograms to spotting trends with line charts.
• The benefits of data visualization include communicating results or findings, monitoring a
model’s performance at the evaluation stage, tuning hyperparameters, identifying trends, patterns and
correlations between dataset features, cleaning data (for example, detecting outliers), and validating
model assumptions.

Data Visualization: Examples

• Here are some popular data visualization examples.

• Weather reports: Maps and other plot types are commonly used in weather reports.
• Internet websites: Web and social media analytics services such as Social Blade and Google Analytics use
data visualization techniques to analyze and compare the performance of websites.
• Astronomy: Space agencies use advanced data visualization techniques in their reports and
presentations.
• Geography
• Gaming industry

What Makes Data Visualization Effective?

• Clarity: Data should be visualized in a way that everyone can understand.

• Problem domain: When presenting data, the visualizations should be related to the business
problem.
• Interactivity: Interactive plots are useful to compare and highlight certain things within the plot.
• Comparability: Good plots make it easy to compare things.
• Aesthetics: Quality plots are visually appealing.
• Informative: A good plot summarizes all relevant information.

Importance of Data Visualization

1. Data cleaning
Data visualization plays an important role in data cleaning. Good examples are detecting outliers and
detecting multicollinearity. We can create scatterplots to detect outliers and generate heatmaps to
check for multicollinearity.
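
As a rough illustration of outlier detection with a scatterplot, the sketch below flags suspicious points before plotting them. The data are synthetic, and the 1.5 × IQR fence rule and the helper name `iqr_outlier_mask` are assumptions chosen for the example, not something prescribed by these notes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a window
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data (made up for illustration): y depends linearly on x.
rng = np.random.default_rng(0)
x = rng.normal(50, 10, 200)
y = 2.0 * x + rng.normal(0, 5, 200)
y[:3] += 120  # inject three artificial outliers


def iqr_outlier_mask(values, k=1.5):
    """Hypothetical helper: True for values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


mask = iqr_outlier_mask(y)

fig, ax = plt.subplots()
ax.scatter(x[~mask], y[~mask], s=12, label="within fences")
ax.scatter(x[mask], y[mask], s=30, color="red", label="flagged as outlier")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Scatterplot with flagged outliers")
ax.legend()
# Call fig.savefig("outliers.png") or plt.show() to view the figure.
```

Flagged points still need a manual look: a fence rule only marks candidates, and whether to drop or keep them depends on the problem.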

2. Data Exploration
Before building any model, we need to do some exploratory data analysis to identify dataset
characteristics. For example, we can create histograms for continuous variables to check for normality
in the data. We can create scatterplots between two features to check whether they are correlated.
Likewise, we can create a bar chart for the label column with two or more classes to identify class
imbalance.
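
The two checks described above can be sketched as follows, assuming a made-up continuous feature and a made-up binary label column (90 "no" versus 10 "yes"). The variable names are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a window
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data, invented for this example.
rng = np.random.default_rng(1)
feature = rng.normal(loc=0.0, scale=1.0, size=500)  # continuous variable
labels = np.array(["no"] * 90 + ["yes"] * 10)       # imbalanced label column

# Count how many rows fall into each class.
classes, counts = np.unique(labels, return_counts=True)

fig, (ax_hist, ax_bar) = plt.subplots(1, 2, figsize=(9, 3.5))

# Histogram: is the shape roughly bell-like?
ax_hist.hist(feature, bins=30)
ax_hist.set_title("Histogram of a continuous feature")
ax_hist.set_xlabel("feature value")
ax_hist.set_ylabel("count")

# Bar chart: are the classes of the label column balanced?
ax_bar.bar(classes.tolist(), counts)
ax_bar.set_title("Class counts of the label column")
ax_bar.set_xlabel("class")
ax_bar.set_ylabel("count")

fig.tight_layout()
```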

3. Evaluation of modeling outputs
We can create a confusion matrix to evaluate a classifier's predictions and a learning curve to track
a model's performance during training. Plots are also useful in validating model assumptions. For example, we can create a residuals
plot and a histogram of the residuals to validate the assumptions of a linear regression
model.
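
A minimal sketch of the two residual plots, assuming synthetic data and a straight-line fit obtained with `numpy.polyfit`. Any regression library could stand in for the fit; the coefficients 3.0 and 1.5 are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a window
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data: a linear relationship plus Gaussian noise.
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 200)
y = 3.0 + 1.5 * x + rng.normal(0, 1.0, 200)

# Ordinary least-squares line (degree-1 polynomial).
slope, intercept = np.polyfit(x, y, deg=1)
fitted = intercept + slope * x
residuals = y - fitted

fig, (ax_res, ax_hist) = plt.subplots(1, 2, figsize=(9, 3.5))

# Residuals vs fitted values: look for curvature or a funnel shape.
ax_res.scatter(fitted, residuals, s=10)
ax_res.axhline(0, color="gray", linewidth=1)
ax_res.set_xlabel("fitted value")
ax_res.set_ylabel("residual")
ax_res.set_title("Residuals plot")

# Histogram of residuals: look for a roughly symmetric bell shape.
ax_hist.hist(residuals, bins=25)
ax_hist.set_xlabel("residual")
ax_hist.set_ylabel("count")
ax_hist.set_title("Distribution of residuals")

fig.tight_layout()
```

A residuals plot with no visible pattern around zero is consistent with the linearity and constant-variance assumptions; the histogram gives a quick visual check of approximate normality.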

4. Identifying trends
Time and seasonal plots are useful in time series analysis to identify certain trends over time.

5. Presenting results
As a data scientist, you need to present your findings to the company or to other stakeholders who may
have little knowledge of the subject domain. So, you need to explain everything in plain language.
You can use informative plots that summarize your findings.

Types of Data Visualization in Data Science

1. Distribution plot
A distribution plot is used to visualize how the values of a variable are distributed. Examples are a
probability distribution plot and a density curve.

2. Box and whisker plot
This plot shows the variation in the values of a numerical feature. From it you can read the
minimum, maximum, median, and the lower and upper quartiles of the values.

3. Violin plot
Similar to the box and whisker plot, the violin plot is used to plot the variation of a numerical
feature. But it contains a kernel density curve in addition to the box plot. The kernel density curve
estimates the underlying distribution of data.
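
To make the idea of a kernel density curve concrete, the sketch below computes one by hand as the average of Gaussian bumps centred on each data point and draws it next to a violin plot. The sample is synthetic, `gaussian_kde_curve` is a hypothetical helper, and the bandwidth uses a common normal-reference rule of thumb; plotting libraries may choose their bandwidth differently.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)
sample = rng.normal(0, 1, 300)  # synthetic numerical feature


def gaussian_kde_curve(data, grid, bandwidth):
    """Hypothetical helper: f(x) = 1/(n*h) * sum_i K((x - x_i) / h), Gaussian K."""
    z = (grid[:, None] - data[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth


# Rule-of-thumb bandwidth (an assumption, not the only possible choice).
bandwidth = 1.06 * sample.std(ddof=1) * len(sample) ** (-1 / 5)
grid = np.linspace(sample.min() - 4 * bandwidth, sample.max() + 4 * bandwidth, 400)
density = gaussian_kde_curve(sample, grid, bandwidth)

# A density curve should enclose an area of about 1.
area = density.sum() * (grid[1] - grid[0])

fig, (ax_violin, ax_kde) = plt.subplots(1, 2, figsize=(9, 3.5))
ax_violin.violinplot(sample, showmedians=True)
ax_violin.set_title("Violin plot")
ax_kde.hist(sample, bins=30, density=True, alpha=0.3)
ax_kde.plot(grid, density)
ax_kde.set_title("Hand-computed kernel density curve")
fig.tight_layout()
```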

4. Line plot
A line plot is created by connecting a series of data points with straight lines. In a time series, the
time periods are on the x-axis.

5. Bar plot
A bar plot is used to plot how often each category occurs in categorical data. Each category is
represented by a bar. The bars can be drawn vertically or horizontally. Their heights or lengths are
proportional to the values they represent.

6. Scatter plot
Scatter plots are created to see whether there is a relationship (linear or non-linear and
positive or negative) between two numerical variables. They are commonly used in regression
analysis.

7. Histogram
A histogram represents the distribution of numerical data. Looking at a histogram, we can
judge whether the values are approximately normally distributed (a bell-shaped curve), skewed to the right or
skewed to the left. A histogram of residuals is useful for validating important assumptions in regression
analysis.

8. Pie chart
A pie chart of a categorical variable shows each category as a slice whose size is
proportional to the quantity it represents. It is a circular graph with one slice per category.

9. Area plot
The area plot is based on the line chart. We get the area plot when we fill the area between
the line and the x-axis.

10. Hexbin plot
Similar to the scatter plot, a hexbin plot represents the relationship between two numerical
variables. It is useful when there are a lot of data points, because the points overlap
when drawn in a scatter plot. A hexbin plot groups the points into hexagonal bins and colors each bin
by the number of points it contains.

11. Heatmap
A heatmap visualizes the correlation coefficients of numerical features with a color
map. Which colors stand for high and low correlation depends on the chosen color map, so the plot
should include a color bar. The heatmap is
extremely useful for identifying multicollinearity, which occurs when input features are highly
correlated with one or more of the other features in the dataset.
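
A minimal sketch of a correlation heatmap, assuming three synthetic features where `b` is almost a copy of `a` (so the pair is collinear) and `c` is independent. The feature names and the `coolwarm` color map are arbitrary choices for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a window
import matplotlib.pyplot as plt
import numpy as np

# Synthetic features, invented for illustration.
rng = np.random.default_rng(4)
n = 300
a = rng.normal(size=n)
b = a + rng.normal(scale=0.1, size=n)  # nearly identical to a
c = rng.normal(size=n)                 # unrelated to a and b
names = ["a", "b", "c"]

# np.corrcoef treats each row as one variable.
corr = np.corrcoef(np.vstack([a, b, c]))

fig, ax = plt.subplots()
im = ax.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
fig.colorbar(im, ax=ax, label="Pearson correlation")
ax.set_xticks(range(len(names)))
ax.set_xticklabels(names)
ax.set_yticks(range(len(names)))
ax.set_yticklabels(names)

# Print the coefficient inside each cell so the plot can be read without the color bar.
for i in range(len(names)):
    for j in range(len(names)):
        ax.text(j, i, f"{corr[i, j]:.2f}", ha="center", va="center")

ax.set_title("Correlation heatmap")
```

In the resulting figure the `a`/`b` cell stands out with a coefficient close to 1, which is the kind of pair that signals multicollinearity.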

Write Python code to display the types of visualizations
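
One possible answer, offered as a sketch and not as a model solution: the script below draws the eleven plot types from this unit on one figure using Matplotlib and NumPy only. All data are synthetic, and choices such as bin counts, grid size and color maps are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a window
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data, made up for illustration only.
rng = np.random.default_rng(5)
x = rng.normal(size=1000)
y = 0.6 * x + rng.normal(scale=0.8, size=1000)
z = rng.normal(size=1000)
periods = np.arange(1, 13)
series = np.cumsum(rng.uniform(0, 5, size=12))
categories = ["A", "B", "C", "D"]
values = [23, 17, 35, 25]

fig, axes = plt.subplots(3, 4, figsize=(16, 10))
ax = axes.ravel()

# 1. Distribution plot: density histogram plus a normal curve fitted by mean/std
ax[0].hist(x, bins=40, density=True, alpha=0.5)
grid = np.linspace(x.min(), x.max(), 200)
pdf = np.exp(-0.5 * ((grid - x.mean()) / x.std()) ** 2) / (x.std() * np.sqrt(2 * np.pi))
ax[0].plot(grid, pdf)

ax[1].boxplot(x)                                  # 2. Box and whisker plot
ax[2].violinplot(x, showmedians=True)             # 3. Violin plot
ax[3].plot(periods, series, marker="o")           # 4. Line plot
ax[4].bar(categories, values)                     # 5. Bar plot
ax[5].scatter(x, y, s=5)                          # 6. Scatter plot
ax[6].hist(x, bins=30)                            # 7. Histogram
ax[7].pie(values, labels=categories, autopct="%1.0f%%")  # 8. Pie chart

# 9. Area plot: a line with the area down to the x-axis filled
ax[8].plot(periods, series)
ax[8].fill_between(periods, series, alpha=0.4)

# 10. Hexbin plot: color encodes the number of points per hexagon
hb = ax[9].hexbin(x, y, gridsize=25, cmap="viridis")
fig.colorbar(hb, ax=ax[9], label="points per bin")

# 11. Heatmap of the correlation matrix of x, y and z
corr = np.corrcoef(np.vstack([x, y, z]))
im = ax[10].imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
fig.colorbar(im, ax=ax[10], label="correlation")

ax[11].axis("off")  # unused twelfth panel

titles = [
    "1. Distribution plot", "2. Box and whisker plot", "3. Violin plot",
    "4. Line plot", "5. Bar plot", "6. Scatter plot", "7. Histogram",
    "8. Pie chart", "9. Area plot", "10. Hexbin plot", "11. Heatmap",
]
for axis, title in zip(ax, titles):
    axis.set_title(title)

fig.tight_layout()
# Call fig.savefig("plot_types.png") or plt.show() to view the figure.
```

Note that Matplotlib's default box plot draws whiskers to the most extreme points within 1.5 × IQR of the quartiles and shows anything beyond as individual points, so the whisker ends are not always the minimum and maximum.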

Data Visualization Process/Workflow

The data visualization process or workflow includes the following key steps.

1. Develop your research question

This may be a business problem or any other related problem that could be solved with a
data-driven approach. You should note all the objectives and outcomes plus required resources such
as datasets, open-source software libraries, etc.

2. Get or create your data

The next step is collecting data. You can use existing datasets if they’re relevant to your
research question. Alternatively, you can download open-source datasets from the internet or do web
scraping to collect data.

3. Clean your data
Real-world data are messy. So, you need to clean them before using them for visualization.
You can identify missing values and outliers and treat them accordingly. You can perform feature
selection and remove unnecessary features from the data. You can create a new set of features based
on the original features.

4. Choose a chart type

The chart type depends on many factors. For example, it depends on the feature type
(numerical or categorical). It also depends on the type of visualization you need. Let’s say you have
two numerical features. If you want to see their distributions, you can create a histogram for each
feature. If you want to plot their variations, you can create a box and whisker plot for each feature. You
can create a scatterplot if you want to find a relationship (linear or non-linear, positive or negative)
between the two features.

5. Choose your tool
You can use open-source data visualization tools such as Matplotlib, seaborn, Plotly and
ggplot2. You can also use commercial software such as MATLAB, Minitab, SPSS, etc.

6. Prepare data
You can extract relevant features. You can do feature standardization if the values of the
features are not on the same scale. You can apply data preprocessing steps such as PCA to reduce the
dimensionality of the data. That will allow you to visualize high-dimensional data in 2D and 3D plots!
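
The standardization and PCA steps can be sketched with NumPy alone. The dataset below is synthetic (five features on very different scales, generated from two hidden factors), and the PCA is computed through a singular value decomposition of the standardized data, which is one common way to do it; a library such as scikit-learn could be used instead.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a window
import matplotlib.pyplot as plt
import numpy as np

# Synthetic 5-feature dataset driven by 2 hidden factors (an assumption for the demo).
rng = np.random.default_rng(6)
n_samples = 200
latent = rng.normal(size=(n_samples, 2))
mixing = rng.normal(size=(2, 5))
noise = 0.05 * rng.normal(size=(n_samples, 5))
scales = np.array([1.0, 10.0, 100.0, 1000.0, 0.1])  # features on different scales
X = (latent @ mixing + noise) * scales

# Feature standardization: zero mean and unit standard deviation per column.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via SVD: the rows of Vt are the principal directions.
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
explained = S ** 2 / np.sum(S ** 2)  # share of variance per component
projected = Z @ Vt[:2].T             # coordinates on the first two components

fig, ax = plt.subplots()
ax.scatter(projected[:, 0], projected[:, 1], s=10)
ax.set_xlabel(f"PC1 ({explained[0]:.0%} of variance)")
ax.set_ylabel(f"PC2 ({explained[1]:.0%} of variance)")
ax.set_title("Five features projected to 2D")
```

Without the standardization step, the feature with the largest scale would dominate the first component, which is why the two steps are usually applied together.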

7. Create a chart
This is the final step. Here, you define the title and the names for the axes. You should also
choose a proper chart background to ensure the content is easily readable.
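
A small sketch of this last step with Matplotlib, using made-up monthly sales figures: it sets the title, the axis names and a plain background.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a window
import matplotlib.pyplot as plt

# Made-up numbers, for illustration only.
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 150, 170]

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(months, sales, marker="o")

ax.set_title("Monthly sales")   # chart title
ax.set_xlabel("Month")          # x-axis name
ax.set_ylabel("Units sold")     # y-axis name
ax.set_facecolor("white")       # plain background keeps the line readable
ax.grid(True, alpha=0.3)        # faint grid as a reading aid

fig.tight_layout()
```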

Data Visualization Tools
There are multiple tools and software available for data visualization.
1. Python provides open-source libraries such as
• Matplotlib
• Seaborn
• Plotly
• Bokeh
• Altair
2. R provides open-source libraries such as
• ggplot2
• lattice
3. Other data visualization software
• Tableau
• Microsoft Power BI
• IBM SPSS
• Minitab

Write detailed notes on the following tools for data visualization:
• Tableau
• Microsoft Power BI
• IBM SPSS
• Minitab

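Data Visualization Techniques
Some of the main data visualization techniques in data science are:

1. Univariate Analysis: examines one variable at a time, for example with a histogram or box plot.

2. Bivariate Analysis: examines the relationship between two variables, for example with a scatter plot.

3. Multivariate Analysis: examines three or more variables together, for example with a correlation heatmap.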
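
The three techniques can be illustrated side by side. The sketch below assumes a synthetic dataset with three made-up variables (`height`, `weight`, `age`): a histogram looks at one variable, a scatter plot at two, and a scatter plot with color as a third channel at three.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a window
import matplotlib.pyplot as plt
import numpy as np

# Synthetic variables, invented for this example.
rng = np.random.default_rng(7)
height = rng.normal(170, 8, 300)
weight = 0.9 * height - 85 + rng.normal(0, 6, 300)
age = rng.uniform(18, 65, 300)

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(13, 3.8))

# Univariate: distribution of a single variable.
ax1.hist(height, bins=25)
ax1.set_title("Univariate: height")
ax1.set_xlabel("height")

# Bivariate: relationship between two variables.
ax2.scatter(height, weight, s=10)
ax2.set_title("Bivariate: height vs weight")
ax2.set_xlabel("height")
ax2.set_ylabel("weight")

# Multivariate: a third variable encoded as color.
sc = ax3.scatter(height, weight, c=age, s=10, cmap="viridis")
fig.colorbar(sc, ax=ax3, label="age")
ax3.set_title("Multivariate: color shows age")
ax3.set_xlabel("height")
ax3.set_ylabel("weight")

fig.tight_layout()
```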

Advantages of Data Visualization
There are many advantages of data visualization. Data visualization is used to:

• Communicate your results or findings with your audience

• Tune hyperparameters
• Identify trends, patterns and correlations between variables
• Monitor the model’s performance
• Clean data
• Validate the model’s assumptions

Disadvantages of Data Visualization
There are also some disadvantages of data visualization.

• We need to download, install and configure software and open-source libraries. The process will be difficult and
time-consuming for beginners.
• Some data visualization tools are not available for free. We need to pay for those.
• When we summarize the data, we lose some of the detail in the original values.

Data Visualization Best Practices
1. Set the context
We need to develop a research question that could be solved with a data-driven approach.

2. Know your audience

This is very important as the visualizations depend on the type of audience you have. To present your
findings to a business audience, you need to create visualizations closely related to money,
profits, and revenue: the terms that business people are familiar with.

3. Choose an effective visual
You need to create the right plot that addresses your requirement. To see the correlations between
multiple variables, you can create a scatterplot for each pair of variables. But that is not very effective.
Instead, you can create a heatmap, which is an effective way of visualizing correlations. When you have
many categories, the pie chart is not suitable. Instead, you can create a bar chart. These are some
examples of choosing an effective visual for your requirements.

4. Keep it simple
Simple plots are easily readable. We can remove unnecessary backgrounds to make things stand out.
We should not include too much content in the plot. A title, axis names, scales, and a legend are
enough.

Assignment

1. How do challenges and best practices associated with visualizing datasets with multiple dimensions or
high-dimensional characteristics affect the capability of data visualization to discern patterns, trends, or
outliers?

2. What strategies can be employed to improve the effectiveness of data visualization in complex datasets
with numerous dimensions, considering the challenges and best practices associated with such
visualizations?

Happy Learning
