L4 Data Visualization Part 1 (1)
Class Outline
Introduction to Data Visualization
Salient Points
Plots and Charts
Scatter Plot
Line Plot
Distribution Plot
Box Plot
Introduction to Data Visualization
▪Data visualization is the graphical representation of data and information to visually
communicate insights, patterns, trends, and relationships within datasets.
▪It involves transforming raw data into visual elements such as charts, graphs, maps, and
diagrams, making complex information more accessible and easily understandable to a wide
range of audiences.
▪Data visualization plays a crucial role in both understanding and communicating insights
from data.
Salient Points:
Here are some key reasons why data visualization is essential in the data field:
▪Data Exploration:
•Data visualization enables analysts and data scientists to explore datasets visually, helping them
gain a deeper understanding of the data's structure, patterns, and distributions.
•This exploratory process often leads to the discovery of valuable insights that might have been
overlooked in raw data.
▪Insight Communication:
•Complex data can be difficult to comprehend through raw numbers and tables.
•Visualizations provide an intuitive and clear way to communicate findings, trends, and
relationships to stakeholders, making it easier for them to grasp the key takeaways from the data
analysis.
▪Pattern Recognition:
•Data visualization aids in the identification of patterns, trends, and outliers in datasets.
•Visual cues and patterns become apparent through graphs and charts, which can lead to
valuable business insights, process improvements, or scientific discoveries.
▪Storytelling:
•Data visualizations can be used to tell a compelling story with data.
•By combining visual elements with narrative context, data scientists can effectively convey
the significance and implications of their findings to a broader audience.
▪Decision Making:
•Well-crafted data visualizations provide decision-makers with a visual representation of
information, enabling them to make data-driven decisions more confidently and accurately.
•Visualizations help executives and managers grasp complex information quickly and take
appropriate actions.
Plots and Charts
▪"Plots" and "charts" are terms often used interchangeably to refer to visual representations of data.
However, there can be subtle differences in their usage and context:
▪Plots:
•Plots typically focus on showing the relationships, patterns, and trends within data through the
use of lines, points, bars, or other graphical elements.
•The term "plot" is often used in the context of scientific or technical fields to describe graphical
presentations of data.
▪Charts:
•"Charts" specifically refer to visual representations of data in graphical formats, often used to
convey information visually and make it easier to understand.
•The term "chart" is often associated with business and information visualization, and it's
frequently used in reports, presentations, and dashboards.
Scatter Plot
Figure: Total bill vs. tip amount graph for a restaurant
▪Scatter plots are primarily used to visualize the relationship or correlation between two
numerical variables. They help identify patterns, trends, and associations in the data.
▪The correlation between the two variables can be quantified using a statistical measure called the
correlation coefficient, which ranges from -1 to +1.
▪Identifying Correlation: Scatter plots are ideal for determining the nature and strength of the
relationship between the two variables.
•Positive correlation indicates that as one variable increases, the other also tends to
increase.
•Negative correlation indicates that as one variable increases, the other tends to decrease.
•No correlation implies that there is no clear relationship between the two variables.
▪Outlier Detection: Scatter plots are useful for identifying outliers, which are data points that
deviate significantly from the general pattern. Outliers may indicate errors or unique
observations in the dataset.
▪Comparing Data Sets: Scatter plots can be used to compare the relationship between two
different datasets or groups. For example, comparing the relationship between height and weight
for males and females.
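The correlation coefficient mentioned above can be computed by hand. The following is a minimal sketch of the Pearson correlation coefficient using only the Python standard library; the function name `pearson_r` and the bill and tip values are hypothetical and chosen purely for illustration, not taken from the restaurant data shown in the figure.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: the tip is always 15% of the bill (perfect positive correlation).
total_bill = [10.0, 20.0, 30.0, 40.0, 50.0]
tip = [1.5, 3.0, 4.5, 6.0, 7.5]

r_positive = pearson_r(total_bill, tip)
r_negative = pearson_r([1, 2, 3], [3, 2, 1])             # perfectly decreasing
r_none = pearson_r([1, 2, 3, 4, 5], [2, 1, 3, 1, 2])     # no linear relationship

print(r_positive, r_negative, r_none)
```

A value near +1 or -1 corresponds to points lying close to a rising or falling straight line in the scatter plot, and a value near 0 corresponds to no clear linear pattern. Note that this coefficient only measures linear association, so it is best read alongside the plot rather than instead of it.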
Limitations of Scatter Plot
▪Scatter plots are not suitable for visualizing categorical data or non-numeric variables; bar
charts are a better fit.
▪Scatter plots can effectively display the relationship between two numerical variables but are
not suitable for many variables at once; heat maps are a better fit.
▪While scatter plots can handle time series data, they might not be the best choice for visualizing
trends over time; a line plot is usually clearer.
▪With small sample sizes, scatter plots may not provide a reliable representation of the relationship
between the variables.
Line Plot
Figure: Airline passenger traffic over the years graph
▪Line plots are commonly used to visualize data over time or continuous intervals.
▪They are effective for showing trends, fluctuations, or seasonal patterns in time-series data.
▪Line plots, also known as line charts or time series plots, are a type of data visualization that
displays data points connected by straight lines.
▪Time Series Data: Line plots are commonly used to visualize time series data, where data is
collected at regular intervals over time.
▪Sequential Data: Line plots are suitable for visualizing any data that has a natural order or
sequence, such as data collected in a continuous process or in chronological order.
▪Comparing Trends: Line plots are effective for comparing trends in multiple data series on the
same graph.
▪Showing Patterns and Seasonality: Line plots can reveal patterns, fluctuations, and seasonality
in data.
▪They are ideal for tracking trends and patterns in data, such as stock prices, temperature
changes, sales performance, or website traffic over time.
Limitations of Line Plot
▪Line plots are not suitable for visualizing categorical data or data that cannot be ordered, such
as colors, product categories, or names of cities.
▪If the data is sporadic or has irregular time intervals, line plots might not effectively show the
trend or pattern due to missing data points.
▪Line plots are not appropriate for data without any sequential relationship or where there is no
meaningful order to the data points.
▪If the data has high variability and frequent abrupt changes, line plots may not effectively
convey the trends and patterns due to the continuous nature of the lines connecting data points.
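Before drawing a line plot, it can help to check whether the observations are evenly spaced, since the connecting lines hide gaps. The sketch below is one possible check under stated assumptions: the helper names `interval_gaps` and `has_regular_spacing` and the year values are hypothetical, and the observation times are assumed to be sorted numbers.

```python
def interval_gaps(times):
    """Return the gaps between consecutive observation times (assumed sorted)."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


def has_regular_spacing(times):
    """True when every gap between consecutive observations is the same size."""
    return len(set(interval_gaps(times))) <= 1


# Hypothetical observation years for two series.
yearly = [1949, 1950, 1951, 1952]
sporadic = [1949, 1950, 1955, 1956]

print(has_regular_spacing(yearly))    # evenly spaced observations
print(interval_gaps(sporadic))        # the middle gap is larger than the others
```

When the gaps differ, the straight segment drawn across a long gap suggests a smooth change that the data does not actually support, so marking the individual points or breaking the line may be a safer choice.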
Distribution Plot (Histogram)
Figure: Frequency of customers and total bill amount
▪Histograms, a common type of distribution plot, are a data visualization used to display the
frequency distribution of continuous numerical data.
▪The data is divided into intervals or "bins" along the X-axis, and the Y-axis represents the
frequency or count of data points falling within each bin.
▪Histograms allow us to see the shape of the data distribution, such as whether it is symmetric,
skewed, or bimodal.
▪They are useful for understanding the central tendency, spread, and presence of any unusual
data points (outliers).
▪They help visualize how the data is spread across different ranges or intervals.
▪Histograms provide insights into the central tendency of the data (mean, median, mode) and
the spread or variability of the data.
▪Histograms can help identify outliers—data points that lie far from the majority of the data—
indicating potential anomalies or errors in the dataset.
▪Histograms can be used to compare the distributions of different groups or categories within a
dataset.
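The binning step described above can be sketched in a few lines. This is a minimal illustration that assumes equal-width bins spanning the minimum to the maximum value, with the maximum counted in the last bin; the function name `histogram_counts` and the bill amounts are hypothetical.

```python
def histogram_counts(values, num_bins):
    """Split the value range into equal-width bins and count the points in each."""
    if num_bins < 1 or not values:
        raise ValueError("need at least one bin and one value")
    low, high = min(values), max(values)
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for v in values:
        if width == 0:
            index = 0  # all values identical: everything falls in the first bin
        else:
            # The maximum value would land one past the end, so clamp it
            # into the last bin.
            index = min(int((v - low) / width), num_bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(num_bins + 1)]
    return edges, counts


# Hypothetical total bill amounts.
bills = [1, 2, 2, 3, 3, 3, 4, 4, 5, 10]
edges, counts = histogram_counts(bills, 3)

# Print each bin range with a simple text bar for its frequency.
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f}: {'#' * count}")
```

The bin edges go on the X-axis and the counts on the Y-axis. Changing `num_bins` changes the apparent shape of the distribution, which is one reason small datasets can give misleading histograms.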
Limitations of Distribution Plot
▪Histograms are not suitable for visualizing categorical data or data that are not continuous.
▪Histograms discard the order of observations, so they are not an ideal choice for showing how
values change over time.
▪If the dataset is small, histograms may not provide meaningful insights as the frequency counts
within each bin could be too low to establish a clear pattern.
▪Histograms are not used for data that is qualitative or textual in nature, like survey responses or
feedback comments.
Box Plot
Figure: Distribution of bill amount in a restaurant based on gender of the customer
▪Box plots, also known as box-and-whisker plots, are a type of data visualization that provides a
concise summary of the distribution of numerical data.
▪Interpretation of Box Plot:
• The box spans the interquartile range (IQR), from the first quartile (Q1) to the third quartile (Q3).
• The line inside the box represents the median value of the data.
• The "whiskers" extend from the box to the smallest and largest values that are not outliers; a
common convention places that cutoff at 1.5 × IQR beyond the box.
• Outliers are displayed as individual points outside the whiskers.
▪Comparing Distributions: Box plots are commonly used to compare the distribution of data
between different groups or categories.
▪Box plots efficiently show potential outliers in the data, which are data points that lie far from
the bulk of the data.
▪Box plots handle skewed or asymmetric data well. Because they are built on the median and
quartiles, they give a more informative picture of the data's central tendency than mean-based
visualizations, which can be distorted by extreme values.
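The quantities a box plot draws can be computed directly. The sketch below uses the standard library's `statistics.quantiles` and assumes the common 1.5 × IQR rule for separating whisker ends from outliers; the slides do not state which rule or quartile method the plotted figure uses, and the function name `box_plot_summary` and the sample values are hypothetical.

```python
import statistics


def box_plot_summary(values, whisker_factor=1.5):
    """Return the quartiles, whisker ends, and outliers a box plot would show."""
    data = sorted(values)
    # Quartiles by linear interpolation over the data, endpoints included.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Whiskers end at the most extreme points that are still inside the fences.
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical bill amounts with one extreme value.
bills = [1, 2, 3, 4, 5, 6, 7, 8, 100]
summary = box_plot_summary(bills)

print(summary)
print("mean:", statistics.mean(bills), "median:", statistics.median(bills))
```

In this sample the single extreme value pulls the mean well above most of the data while the median stays in the middle of it, which illustrates why the median-based box is less affected by extreme values.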
Limitations of Box Plot
▪Box plots are designed for numerical data and are not suitable for visualizing categorical data or
non-numeric variables.
▪Box plots are not typically used for visualizing time series data or continuous intervals.
▪With small sample sizes, box plots may not provide a reliable representation of the data
distribution.
▪Box plots give an overview of the data distribution, but they do not show individual data points.
If the visualization requires a detailed examination of each data point, scatter plots or line plots
may be more suitable.