Lecture 7
Statistical Data Visualization
COSC-3107 Machine Learning
Shahzad Hussain
Lecturer
2 Shahzad Hussain, Lecturer, Khawaja Fareed University of Engineering and Information Technology
Today’s Lecture Outline
• Statistical Visualization
i. What Is the Purpose of Visualization Tools - Graphs
ii. Plotting an Analytical Function
iii. Components of a Graph
iv. Creating Graphs – Mathematical Functions
v. Seaborn
vi. Which Tool Should Be Used
vii. Types of Graphs
viii. Line Graphs
ix. Creating Line Graphs Using Different Libraries
x. Pandas DataFrames and Grouped Data
• A graph that will be used for analysis and for transmitting
information, including statistics, should have a few qualities:
– Show the data
– Avoid distorting what the data has to say
– Make large datasets coherent
– Serve a reasonably clear purpose: description, exploration, tabulation, or
decoration
What Is the Purpose of Visualization Tools - Graphs
• Graphs must reveal information.
• We should keep these principles in mind whenever we
create graphs for an analysis.
• A graph should also be able to stand on its own,
outside the analysis.
• Suppose you are writing an analysis report that becomes
extensive, and you need to summarize it. A graph can make
the points of the analysis clear, but it must support the
summary without the full report behind it. To let the
graph carry that information and stand on its own, we
add more information to it, such as a title and labels.
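The point about titles and labels can be sketched in a few lines. This is a minimal illustration, not part of the original slides: the data values, the title text, and the label text are all made up for the example.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data for illustration only: 12 made-up monthly values.
months = np.arange(1, 13)
values = np.array([12, 15, 14, 18, 21, 25, 27, 26, 22, 18, 15, 13])

fig, ax = plt.subplots()
ax.plot(months, values)

# The title and axis labels let a reader interpret the graph
# without having the full analysis report at hand.
ax.set_title("Monthly average (hypothetical data)")
ax.set_xlabel("Month")
ax.set_ylabel("Value (arbitrary units)")
```

In a notebook with inline plotting the figure renders automatically; in a plain script, calling plt.show() afterwards would display it.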
3. Components of a Graph
The components of a graph are as follows:
• Figure: The base of the graph, where all the other components are
drawn.
• Axes: Contains the figure elements and sets the coordinate system.
Matplotlib calls this object Axes; an Axis is a single x or y scale within it.
• Title: The title gives the graph its name.
• X-axis label: The name of the x-axis, usually named with the units.
• Y-axis label: The name of the y-axis, usually named with the units.
• Legend: A description of the data plotted in the graph, allowing you to
identify the curves and points in the graph.
• Ticks and tick labels: Ticks mark the points of reference on a scale for
the graph, where the values of the data fall. The tick labels show the values
themselves.
• Line plots: These are the lines that are plotted with the data.
• Markers: The symbols that mark individual data points.
• Spines: The lines that delimit the area of the graph where data is plotted.
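As a rough sketch of how these components map onto Matplotlib calls, the example below builds one graph and touches each component in turn. The sine and cosine curves, the marker choices, and the tick positions are assumptions picked only for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 50)

# Figure: the base of the graph. Axes: the plotting area and coordinate system.
fig, ax = plt.subplots(figsize=(6, 4))

# Line plots with markers; each label is picked up by the legend.
ax.plot(x, np.sin(x), marker="o", markevery=5, label="sin(x)")
ax.plot(x, np.cos(x), marker="s", markevery=5, label="cos(x)")

# Title and axis labels (with units where they apply).
ax.set_title("Sine and cosine")
ax.set_xlabel("x (radians)")
ax.set_ylabel("f(x)")

# Legend: identifies which curve is which.
ax.legend()

# Ticks and tick labels: reference points on the x scale.
ax.set_xticks([0, np.pi, 2 * np.pi])
ax.set_xticklabels(["0", "pi", "2 pi"])

# Spines: the lines that delimit the plotting area; hide the top and right ones.
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
```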
Seaborn
• Seaborn (https://seaborn.pydata.org/) is part of the
PyData family of tools and is a visualization library based on
Matplotlib with the goal of creating statistical graphs more
easily.
6. Which Tool Should Be Used
7. Types of Graphs
• Line Graph: A line graph displays data as a series of interconnected
points on two axes (x and y), usually Cartesian, commonly ordered by the
x-axis. Line charts are useful for demonstrating trends in data, such
as in time series.
8. Line Graphs
• More than one line can be drawn on the same graph to
compare the behavior of each line, provided that all the
lines share the same units.
• They can also demonstrate the relationship between an
independent and a dependent variable.
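The sketch below compares two lines on one graph. It assumes two made-up temperature series (City A and City B are hypothetical) measured in the same unit, with the hour of day as the independent variable and temperature as the dependent variable.

```python
import numpy as np
import matplotlib.pyplot as plt

# Independent variable: hour of the day.
hours = np.arange(0, 24)

# Dependent variables: hypothetical temperatures, both in degrees Celsius,
# so they can share a single y-axis.
city_a = 20 + 5 * np.sin((hours - 9) * np.pi / 12)
city_b = 15 + 8 * np.sin((hours - 9) * np.pi / 12)

fig, ax = plt.subplots()
ax.plot(hours, city_a, label="City A")
ax.plot(hours, city_b, label="City B")
ax.set_xlabel("Hour of day")
ax.set_ylabel("Temperature (°C)")
ax.legend()
```

If the two series had different units, plotting them against one y-axis would make the comparison misleading.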
Usually, a time series graph has the time variable on the x-axis.
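As an illustration, the following sketch places dates on the x-axis. The date range, the random-walk values, and the seed are assumptions made for the example only.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Thirty consecutive days as the time variable.
dates = pd.date_range("2024-01-01", periods=30, freq="D")

# Made-up values: a cumulative sum of random steps (a simple random walk).
rng = np.random.default_rng(42)
series = pd.Series(rng.normal(0, 1, size=30).cumsum(), index=dates)

fig, ax = plt.subplots()
ax.plot(series.index, series.values)  # time on the x-axis
ax.set_xlabel("Date")
ax.set_ylabel("Value (arbitrary units)")
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
```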
9. Creating Line Graphs Using Different Libraries
# Required packages
# (%matplotlib inline is an IPython/Jupyter magic; omit it in a plain Python script)
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
# Generate the data: 100 x values and one random integer in [0, 200) for each
X = np.arange(0, 100)
Y = np.random.randint(0, 200, size=X.shape[0])
# Plot the data on a graph using the Pyplot API
plt.plot(X, Y)
# Create a Pandas DataFrame from the generated values:
df = pd.DataFrame({'x': X, 'y_col': Y})
# Plot it using the Pyplot interface, passing column names with the data argument:
plt.plot('x', 'y_col', data=df)
# With the same DataFrame, we can also plot directly from Pandas:
df.plot('x', 'y_col')
10. Pandas DataFrames and Grouped Data
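The slide content for this section is not present in the text, so the following is only a hedged sketch of one common way to plot grouped data: group a DataFrame with groupby and draw one line per group. The column names (year, group, value) and all numbers are hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical long-format data: one row per (year, group) observation.
df = pd.DataFrame({
    "year": [2020, 2021, 2022, 2020, 2021, 2022],
    "group": ["A", "A", "A", "B", "B", "B"],
    "value": [10, 12, 15, 8, 11, 9],
})

fig, ax = plt.subplots()

# groupby yields (group name, sub-DataFrame) pairs; draw one line per group.
for name, group_df in df.groupby("group"):
    ax.plot(group_df["year"], group_df["value"], marker="o", label=name)

ax.set_xlabel("Year")
ax.set_ylabel("Value")
ax.set_xticks([2020, 2021, 2022])
ax.legend(title="group")
```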