Charts for Visualization

The document provides an overview of various types of charts and graphs used for data visualization, including bar charts, line charts, pie charts, heatmaps, and scatter plots. It explains when to use each type, their advantages and disadvantages, and includes examples such as mosaic plots and bubble charts for visualizing proportions and relationships among data. Additionally, it discusses visualizing distributions through histograms, density plots, and box plots, as well as time series data representation.

Uploaded by

ganesh697todkari
Copyright
© All Rights Reserved
Charts and Graphs

Bar Charts
• A bar chart visually represents data using rectangular bars or columns.
• The length of each bar corresponds proportionally to its value.
• Bar charts are the most common approach to visualizing amounts (i.e., numerical
values shown for some set of categories).

• When to use a bar chart


• Compare data : Bar charts can compare the values of different groups, such as the sales of
different departments in a store.
• Analyze categorical data : Bar charts are well-suited for categorical data and to compare
them, such as survey responses
• Show rankings : Bar charts can show rankings or distribution of data across different
categories, such as the market share of different companies in an industry.
Types of Bar Charts
• Use a vertical bar chart to
easily compare different
categories with relatively
short labels
• Use a horizontal bar
chart when the labels are
lengthy
• Use a stacked bar
chart to visualize the
composition of different
parts within a category
• Use a grouped bar
chart to compare values
within different
subgroups of a category
Line Charts
• A line chart connects distinct data points through straight lines.
• This type of chart is effective for demonstrating progression.
• Its best use case is to illuminate trends, patterns, and variable
changes.
Visualising Series – Line Graphs
• They are appropriate whenever the data points have a natural order that is
reflected in the variable shown along the x axis, so that neighboring points
can be connected with a line.
• This situation arises, for example, in dose–response curves, where we
measure how changing some numerical parameter in an experiment (the
dose) affects an outcome of interest (the response).

The line graph visualization highlights how the dose–response curves have a similar shape for the three oat
varieties considered but differ in the starting point in the absence of fertilization,
i.e., some varieties have naturally higher yield than others.
Pie Charts
• It is a circular, statistical graphic that divides data into
slices.
• Each slice represents a percentage or proportion of the
whole.

• When to Use Pie Chart:


• To show how parts make up a whole.
• Useful in emphasizing a particular category by way of
highlighting a dominant slice.
• Pie charts are used to show how some group, entity,
or amount breaks down into individual pieces that
each represent a proportion of the whole.
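As a minimal sketch of how slices follow from proportions, the snippet below converts category counts into slice angles. The beverage counts and the function name `pie_slice_angles` are made up for illustration and do not come from the slides.

```python
def pie_slice_angles(counts):
    """Return {category: slice angle in degrees}; the angles sum to 360."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    # Each slice gets a share of the full circle equal to its share of the total.
    return {name: value * 360 / total for name, value in counts.items()}


# Hypothetical survey data (not from the slides).
beverages = {"tea": 50, "coffee": 30, "juice": 20}
angles = pie_slice_angles(beverages)
for name, angle in angles.items():
    print(f"{name}: {angle:.1f} degrees")
```

A plotting library would normally do this conversion internally; the sketch only makes the part-to-whole arithmetic explicit.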
Visualising Amounts : Heatmaps
• As an alternative to mapping data values onto positions via bars or
dots, we can map data values onto colors
• A heatmap is a graphical representation of data that uses a system of
color coding to represent different values.
• Each cell in the matrix is assigned a color based on the value it holds.
• Because of their reliance on color to communicate values, Heat Maps
are used to display a more generalized view of numeric values.
• This is especially true when dealing with large volumes of data, as
colors are easier to distinguish and make sense of than raw numbers.
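The idea of assigning each cell a color based on its value can be sketched as a linear interpolation between two colors. This is only one possible color scale; the matrix values, the two endpoint colors, and the name `value_to_color` are assumptions chosen for illustration.

```python
def value_to_color(value, vmin, vmax, low=(255, 255, 255), high=(128, 0, 0)):
    """Map a value to an RGB tuple by interpolating between `low` and `high`."""
    if vmax == vmin:
        t = 0.0  # all values equal: use the low color everywhere
    else:
        t = (value - vmin) / (vmax - vmin)
    t = max(0.0, min(1.0, t))  # clamp values outside [vmin, vmax]
    return tuple(round(lo + t * (hi - lo)) for lo, hi in zip(low, high))


# Hypothetical data matrix (rows and columns carry no special meaning here).
matrix = [[0, 5, 10], [2, 8, 4]]
flat = [v for row in matrix for v in row]
vmin, vmax = min(flat), max(flat)

# One color per cell, in the same layout as the matrix.
colors = [[value_to_color(v, vmin, vmax) for v in row] for row in matrix]
for row in colors:
    print(row)
```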
Heat Map Examples: Retail Matrix and Population
Consider Example of Bridges in Pittsburgh
• Consider a dataset of 106 bridges in Pittsburgh.
• This dataset contains various pieces of information about the bridges,
such as the material from which they are constructed (steel, iron, or
wood) and the year when they were constructed.
• Based on the year of construction, bridges are grouped into distinct
categories, such as crafts bridges that were constructed before 1870
and modern bridges that were constructed after 1940.
Visualising as Proportions
• Breakdown of 106 bridges in Pittsburgh
• by construction material (steel, wood, iron) and
• by date of construction (crafts, before 1870, and modern, after 1940)
• Whenever we have categories that overlap, it is best to show
explicitly how they relate to each other.
• This can be done with a mosaic plot.
• A mosaic plot (also called a Marimekko chart or Mekko chart) is a graphical
visualization of data from two or more qualitative variables.
Visualising Proportions – mosaic plot
To draw a mosaic plot,
• we place one categorical variable along the x axis (here, era of bridge construction) and
subdivide the x axis by the relative proportions that make up the categories.
• We then place the other categorical variable along the y axis (here, building material) and,
within each category along the x axis, subdivide the y axis by the relative proportions that
make up the categories of the y variable.
• The result is a set of rectangles whose areas are proportional to the number of cases
representing each possible combination of the two categorical variables.
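The construction steps for a mosaic plot can be sketched as a small geometry calculation inside a unit square. The counts below are invented for illustration (they are not the real Pittsburgh bridge counts), and `mosaic_rectangles` is a hypothetical helper name.

```python
def mosaic_rectangles(table):
    """table: {x_category: {y_category: count}}.

    Returns a list of (x_category, y_category, x0, y0, width, height)
    describing rectangles that tile a unit square.
    """
    grand_total = sum(sum(column.values()) for column in table.values())
    if grand_total <= 0:
        raise ValueError("table must contain at least one observation")
    rects = []
    x0 = 0.0
    for x_cat, column in table.items():
        col_total = sum(column.values())
        width = col_total / grand_total  # subdivide the x axis
        y0 = 0.0
        for y_cat, count in column.items():
            height = count / col_total if col_total else 0.0  # subdivide the y axis
            rects.append((x_cat, y_cat, x0, y0, width, height))
            y0 += height
        x0 += width
    return rects


# Made-up counts: era of construction on x, building material on y.
bridges = {
    "crafts": {"wood": 12, "iron": 6, "steel": 2},
    "modern": {"wood": 0, "iron": 0, "steel": 20},
}
rects = mosaic_rectangles(bridges)
for x_cat, y_cat, x0, y0, w, h in rects:
    print(f"{x_cat}/{y_cat}: area = {w * h:.3f}")
```

Because width is the column share and height is the within-column share, each rectangle's area equals that combination's share of all observations.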
• The same bridges data set can be visualised using a treemap.
• We take an enclosing rectangle and subdivide it into smaller rectangles
whose areas represent the proportions.
• The method of placing the smaller rectangles into the larger one is
different compared to the mosaic plot.
• In a treemap, we recursively nest rectangles inside each other.
• For example, in the case of the Pittsburgh bridges, we can first subdivide
the total area into three parts representing the three building materials,
wood, iron, and steel. Then we can subdivide each of those areas further to
represent the construction eras represented for each building material.
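The recursive nesting can be sketched with one simple treemap layout, often called slice-and-dice, which alternates the split direction at each level. Other layouts exist, and nothing here is claimed to be the layout used for the figure in the slides. The counts and the names `weight` and `slice_and_dice` are made up for illustration.

```python
def weight(node):
    """Total count under a node: a number (leaf) or a dict of child nodes."""
    if isinstance(node, dict):
        return sum(weight(child) for child in node.values())
    return node


def slice_and_dice(node, x=0.0, y=0.0, w=1.0, h=1.0, split_x=True, path=()):
    """Return a list of (path, x, y, width, height), one entry per leaf."""
    if not isinstance(node, dict):
        return [(path, x, y, w, h)]
    total = weight(node)
    rects = []
    offset = 0.0
    for name, child in node.items():
        frac = weight(child) / total if total else 0.0
        if split_x:  # cut the rectangle into vertical strips
            rects += slice_and_dice(child, x + offset * w, y, frac * w, h,
                                    False, path + (name,))
        else:        # cut the rectangle into horizontal strips
            rects += slice_and_dice(child, x, y + offset * h, w, frac * h,
                                    True, path + (name,))
        offset += frac
    return rects


# Made-up counts: material first, then the eras that occur for that material.
tree = {
    "wood": {"crafts": 12},
    "iron": {"crafts": 6, "emerging": 2},
    "steel": {"mature": 10, "modern": 20},
}
leaves = slice_and_dice(tree)
for path, x, y, w, h in leaves:
    print("/".join(path), f"area = {w * h:.3f}")
```

Note that each material lists only the eras that actually occur for it; the nesting does not require every combination to exist.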
Visualising Proportions – Tree Map
• enclosing rectangle is
subdivided into smaller
rectangles whose areas
represent the proportions
• recursively nest rectangles
inside each other.
• in the case of the Pittsburgh
bridges, we can first
subdivide the total area into
three parts representing the
three building materials,
wood, iron, and steel.
• Then we can subdivide each
of those areas further to
represent the construction
eras represented for each
building material
Visualising Proportions – Mosaic Plots vs Tree Maps
• Mosaic plots assume that every level of one grouping variable can be
combined with every level of another grouping variable
• For example: every bridge can be described by a choice of building material
(wood, iron, steel) and a choice of time period (crafts, emerging, mature,
modern).
• By contrast, such a requirement does not exist for treemaps.
• In fact, treemaps tend to work well when the proportions cannot
meaningfully be described by combining multiple categorical variables.
• For example, we can separate the US into four regions (West, Northeast,
Midwest, and South) and each region into distinct states, but the states in
one region have no relationship to the states in another region
Visualising Proportions – Nested Pies

A pie chart composed of an inner and an outer circle


• inner circle shows the breakdown of the data by one variable (here,
building material),
• outer circle shows the breakdown of each slice of the inner circle by the
second variable (here, era of bridge construction).
Another Option - nested color scale
• first slice the pie into pieces representing the proportions according to one
variable (e.g., material)
• then subdivide these slices further according to the other variable
(construction era)
• then use coloring to indicate the nested nature of the pie
Visualising Proportions – Parallel Sets
• to visualize
proportions
described by more
than two categorical
variables, one option
is to use Parallel Sets
• In a parallel sets plot,
we show how the
total dataset breaks
down by each
individual categorical
variable, and then we
draw shaded bands
that show how the
subgroups relate to
each other.
Scatter Plots
• Scatter plots show a
collection of data points
‘scattered’ around the graph
as dots.
• The X-axis represents the
independent variable and
the Y-axis represents the
dependent variable.

• When to use Scatter Plots:


• Scatterplots are used when we want to show one quantitative variable relative to
another
• They are ideal for exploring relationships and patterns between two continuous
variables.
• If we have three quantitative variables, we can map one onto the dot size, creating a
variant of the scatterplot called a bubble chart
Bubble Chart
• The bubble chart is a variation of the scatter chart.
• It helps you look at relationships between three or four numeric variables.
• Each dot in the chart represents a data point, with its position determined by the X and Y
values, and its size determined by the third value. An additional variable can be encoded
using colour.
• When to Use bubble chart:
• Multivariate Analysis: Bubble charts allow you to compare three or more variables in a
single visualization.
• Size and Color Encoding: They use size and color to convey extra information such as
fee or class.
• Disadvantage:
• Bubble charts have the disadvantage that they show the same type of variable
(quantitative variables) with two different types of scales, position and size. This makes it
difficult to visually ascertain the strengths of associations between the various
variables. Moreover, differences between data values encoded as bubble size are harder
to perceive than differences between data values encoded as position.
Bubble Chart Example
Considering this disadvantage, it may be preferable to show an all-against-all
matrix of scatterplots as an alternative to a bubble chart, where
each individual plot shows two data dimensions.
Visualising Distributions
Histograms
• Histograms are similar to bar graphs but are used specifically to represent the distribution of continuous
data. In histograms, the data is divided into intervals, or bins, and the height of each bar represents the
frequency or count of data points within that interval.

• Advantages of using Histogram


• Easy to understand: Histograms provide a visual representation of the distribution of data, making it easy for
viewers to grasp the overall pattern.
• Identify Patterns: Histograms allow for the identification of patterns and trends within the data, such as
skewness, peaks, or gaps.
• Compare Data Sets: Histograms enable comparisons between different datasets, helping to identify
similarities or differences in their distributions.
• Disadvantages of using Histogram
• Not for small datasets: Histograms may not be suitable for very small datasets as they require a sufficient
amount of data to accurately represent the distribution.
• Limited details: Histograms provide a summary of the data distribution but may lack detailed information
about individual data points, such as specific values or outliers.
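The binning step behind a histogram (dividing the data range into intervals and counting the values in each) can be sketched as follows. Equal-width bins are assumed, and the sample values and the name `histogram_counts` are hypothetical.

```python
def histogram_counts(values, num_bins):
    """Return (bin_edges, counts) for equal-width bins spanning the data range."""
    if not values or num_bins < 1:
        raise ValueError("need at least one value and one bin")
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        if width == 0:
            idx = 0  # all values identical: everything falls in the first bin
        else:
            # The maximum value is placed in the last bin rather than past it.
            idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


# Hypothetical sample of a continuous variable.
sample = [1, 2, 2, 3, 5, 6, 8, 9, 9, 10]
edges, counts = histogram_counts(sample, num_bins=3)
print(edges)
print(counts)
```

Changing `num_bins` changes the bar heights, which is why the apparent shape of a histogram depends on the binning chosen.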
Density Plots
• A Density Plot visualizes the distribution of data over a continuous interval or
time period.
• This chart is a variation of a Histogram that uses kernel smoothing to plot values,
allowing for smoother distributions by smoothing out the noise.
• The peaks of a Density Plot help display where values are concentrated over the
interval.
• In a density plot, we attempt to visualize the underlying probability distribution of
the data by drawing an appropriate continuous curve
• This curve needs to be estimated from the data, and the most commonly used
method for this estimation procedure is called kernel density estimation.
• An advantage density plots have over histograms is that they are better at
showing the shape of the distribution because they are not affected by the number
of bins used. (The curve does, however, depend on the kernel and bandwidth chosen for the estimate.)
• Kernel density estimate of the age distribution of passengers on the Titanic.

Note: In Excel, this can be drawn by first calculating the kernel density values and then using a scatter plot with smooth lines.
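Kernel density estimation can be sketched by placing a small bell curve on every data point and averaging them. A Gaussian kernel is assumed here, which is a common choice but not the only one. The ages and the bandwidth of 5 are invented for illustration and are not the Titanic data.

```python
import math


def gaussian_kde(data, bandwidth):
    """Return a function density(x) estimated with a Gaussian kernel."""
    if not data or bandwidth <= 0:
        raise ValueError("need data and a positive bandwidth")
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum one Gaussian bump per observation, centred on that observation.
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return density


# Hypothetical ages (not the real passenger data).
ages = [2, 18, 21, 24, 29, 30, 35, 40, 52, 64]
density = gaussian_kde(ages, bandwidth=5.0)

# Evaluate the curve on a grid from -40 to 120 in steps of 0.5.
step = 0.5
grid = [i * step for i in range(-80, 241)]
curve = [density(x) for x in grid]
print(f"area under the curve is about {sum(curve) * step:.3f}")
```

A larger bandwidth gives a smoother curve and a smaller one shows more detail, much as wider or narrower bins do for a histogram.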
Visualising Distributions - Box Plot
• Also known as a whisker plot, box-and-whisker plot,
or box-and-whisker diagram.
• It is defined as a graphical method of displaying
variation in a set of data.
• It displays key summary statistics such as the
median, quartiles, and potential outliers in a concise
and visual manner.
The procedure to develop a box and whisker plot comes from the
five statistics below.
• Minimum value: The smallest value in the data set, excluding the outliers
• First quartile (Q1): 25% of the data lies below the first (lower)
quartile.
• Median (Q2): The mid-point of the dataset. Half of the
values lie below it and half above.
• Third quartile (Q3): 75% of the data lies below the third
(upper) quartile.
• Maximum value: The largest value in the dataset,
excluding the outliers.
• The area inside the box (the middle 50% of the data) is known as the interquartile range (IQR). The IQR is
calculated as
IQR = Q3 - Q1
• Outliers are the data points below the lower limit or above the upper limit. The lower and upper limits
are calculated as
Lower Limit = Q1 - 1.5*IQR
Upper Limit = Q3 + 1.5*IQR
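The five statistics and the outlier limits can be worked through on a small example. The sketch below uses Python's `statistics.quantiles` with the "inclusive" method; quartile conventions differ between tools, so other software may give slightly different Q1 and Q3 for the same data. The sample values and the name `box_plot_summary` are hypothetical.

```python
import statistics


def box_plot_summary(values):
    """Compute box plot statistics using the 1.5 * IQR rule for outliers."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    inliers = [v for v in data if lower_limit <= v <= upper_limit]
    outliers = [v for v in data if v < lower_limit or v > upper_limit]
    return {
        "min": min(inliers),  # smallest value that is not an outlier
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(inliers),  # largest value that is not an outlier
        "iqr": iqr,
        "lower_limit": lower_limit,
        "upper_limit": upper_limit,
        "outliers": outliers,
    }


# Hypothetical sample with one unusually large value.
summary = box_plot_summary([2, 4, 4, 5, 6, 7, 8, 9, 30])
for key, value in summary.items():
    print(f"{key}: {value}")
```

In this sample the value 30 lies above the upper limit, so it would be drawn as a separate point and the upper whisker would end at 9.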
When to use Boxplot vs Histogram
• Use a histogram when you want to see the overall shape and
distribution of your data, including the density of values within
specific ranges
• Use a box plot for comparing the spread and central tendency of
single or multiple data sets side-by-side, particularly when you want
to easily identify outliers and visualize skewness in the data.
Visualising Distributions – Box Plots
Visualising Distributions – Violin Plot
Violin Plots
• Box plot helps you see the spread of data and violin allows you to visualize
spread and the shape of the data distribution

• The width of the violin at a given y value represents the point density
at that y value.

• Technically, a violin plot is a density estimate rotated by 90 degrees
and then mirrored. Violins are therefore symmetric. Violins begin and
end at the minimum and maximum data values, respectively. The
thickest part of the violin corresponds to the highest point density in
the dataset.
Visualising Distributions – Sina Plot
• Sina plots show each individual point while also visualizing the distributions.
• While violin plots depict kernel density, sina plots depict the points themselves.
Visualizing distributions along the horizontal axis – Ridge Plots
Ridge Plots
• Ridge plots have a horizontal orientation for the distribution of each group.
• The x axis represents the variable being analyzed (here, temperature) and the y
axis shows the categories.

In a violin plot, the x-axis represents the different categories or groups. The y-axis
represents the variable of interest (e.g., the numerical values).

• When to Use Ridge vs Violin:


• Ridge Plot: When you need to compare many categories and see how their
distributions differ, especially when looking for patterns in the shape of
distributions.
• Violin Plot: When you want to compare distributions and also want a clear view
of summary statistics (like median, quartiles) in addition to the density.
Visualising Time Series
• When you compare one variable against another in a data set
and one of the variables is time, it creates a unique situation.
• Time imposes additional structure on the data. Now the data points
have an inherent order; normally arranged as increasing time
• Time series data is a sequential arrangement of data points organized
in consecutive time order.
• Omitting the dots emphasizes the overall temporal trend while
deemphasizing individual observations at specific time points.
• Commonly this is depicted using a line graph
Visualising Time Series

A line graph is best for clearly showing trends and changes over time. Use a line graph in time series when you want to
clearly see the direction and rate of change over time.

We can also fill the area under the curve with a solid color. This choice further emphasizes the overarching trend in the data as
it visually separates the area above the curve from the area below. Use area to depict data when you want to emphasize the total
accumulated value over time, like total sales or cumulative growth. However, this visualization is only valid if the y axis starts at zero, so
that the height of the shaded area at each time point represents the data value at that time point.
Visualising Multiple Time Series
Monthly submissions to three preprint servers covering biomedical research.

Eliminating the
dots and directly
labeling the lines
instead of
providing a legend
reduces the
cognitive load
required to read
the figure
Area Chart
• An area graph is a
specialized form of the line
graph
• instead of connecting our
data points with a
continuous line, we fill in the
region below that line with a
solid color.
• This might seem to be a
minor cosmetic change, but
it has a significant effect on
how we perceive the data in
the chart.
Gantt Chart
• Gantt charts are
common in project
management, as
they’re useful in
illustrating a project
timeline or
progression of tasks.
• In this type of chart,
tasks to be
performed are listed
on the vertical axis
and time intervals on
the horizontal axis.
• Horizontal bars in
the body of the chart
represent the
duration of each
activity.
Waterfall Chart
• A waterfall chart is a
specific type of bar chart
that reveals the story
behind the net change in
something’s value
between two points.
• It is used to show how an
initial value is increased
and decreased by a series
of intermediate values,
leading to a final value.
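How an initial value is carried through a series of increases and decreases can be sketched by computing where each floating bar starts and ends. The balance and the transactions below are invented, and `waterfall_bars` is a hypothetical helper name.

```python
def waterfall_bars(initial, changes):
    """Return (bars, final) where bars is a list of (label, start, end)."""
    bars = []
    running = initial
    for label, delta in changes:
        start, end = running, running + delta
        bars.append((label, start, end))  # each bar floats from start to end
        running = end
    return bars, running


# Hypothetical account balance and transactions.
changes = [("salary", 200), ("rent", -150), ("refund", 50)]
bars, final_value = waterfall_bars(1000, changes)
for label, start, end in bars:
    direction = "increase" if end >= start else "decrease"
    print(f"{label}: {start} -> {end} ({direction})")
print("final value:", final_value)
```

Each bar begins where the previous one ended, which is what lets the chart show the path from the initial value to the final value.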
Bullet Graph
• A bullet graph is a
bar marked with
extra encodings to
show progress
towards a goal or
performance against
a reference line.
• Each bar focuses the
user on one
measure, bringing in
more visual
elements to provide
additional detail.
Pareto Charts
• A Pareto chart combines a bar chart and a line graph.
• The rectangular bars correspond to individual values in descending order, while
the line graph displays the cumulative percentage total.
• This type of chart follows the famous Pareto principle that emphasizes that 20
percent of causes result in 80 percent of problems.

• When to use Pareto charts?


• When analyzing data about the frequency of problems or causes in a
process
• When there are many problems or causes and you want to focus on the
most significant
• When analyzing broad causes by looking at their specific components
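The two ingredients of a Pareto chart, bars in descending order and a cumulative percentage line, can be sketched as a small table calculation. The defect counts are made up, and `pareto_table` is a hypothetical helper name.

```python
def pareto_table(counts):
    """Return rows of (cause, count, cumulative_percent), largest cause first."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows = []
    running = 0
    for cause, count in ordered:
        running += count
        rows.append((cause, count, 100 * running / total))
    return rows


# Hypothetical defect counts from a process.
defects = {"dents": 30, "other": 5, "scratches": 50, "misalignment": 15}
rows = pareto_table(defects)
for cause, count, cumulative in rows:
    print(f"{cause:<14}{count:>4}{cumulative:>8.1f}%")
```

Reading down the cumulative column shows how few causes account for most of the problems, which is the pattern the Pareto principle describes.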
Radar Charts
• Radar charts compare multiple quantitative variables and are useful for
visualizing which variables have similar values, or if there are outliers
among the variables.
• Radar charts consist of a sequence of spokes, with each spoke
representing a single variable.
• They are used to plot one or more series of values over multiple quantitative variables.
• Disadvantage:
• Having too many variables creates too many axes and makes the chart hard
to read and complicated. So it's good practice to keep radar charts simple
and limit the number of variables used.
Radar Chart Example

• Consider skill analysis for staff members.
• They could be assessed in terms of
communication, problem-solving,
teamwork, ability to meet deadlines,
punctuality, and technical
understanding.
• A radar chart immediately shows where
staff are assessed in comparison to
their colleagues.
Funnel Chart
• A Funnel Chart is a type of chart that visually represents a process or workflow
where data progressively decreases at each stage.
• It's often used to display the conversion rate of a process, showing how values
"flow" from one stage to another.
• This chart is typically used in sales, marketing, or other business processes where
the goal is to analyze and visualize the drop-off from one step to the next.
• Stages or Phases: Each stage of the process is represented by a horizontal bar,
and the width of each bar reflects the size of the data at that stage.
• Decreasing Size: The bars usually get smaller as you progress through the stages,
indicating the decrease in quantity or value (e.g., leads, customers, sales).
• Visualization of Drop-off: It's very effective at showing where values decrease,
highlighting bottlenecks or areas that need improvement.
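The drop-off between stages can be quantified with stage-to-stage conversion rates. This is a minimal sketch: the stage names and counts are invented, and `funnel_rates` is a hypothetical helper name.

```python
def funnel_rates(stages):
    """stages: ordered list of (name, value).

    Returns a list of (transition label, conversion rate) for adjacent stages.
    """
    rates = []
    for (prev_name, prev_value), (name, value) in zip(stages, stages[1:]):
        rate = value / prev_value if prev_value else 0.0
        rates.append((f"{prev_name} -> {name}", rate))
    return rates


# Hypothetical sales funnel.
stages = [("visits", 1000), ("signups", 400), ("purchases", 100)]
rates = funnel_rates(stages)
for label, rate in rates:
    print(f"{label}: {rate:.0%} converted, {1 - rate:.0%} dropped off")

overall = stages[-1][1] / stages[0][1]
print(f"overall conversion: {overall:.0%}")
```

The transition with the lowest rate is the candidate bottleneck; in this made-up data it is the step from signups to purchases.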
Maps
• Use a Map when you want to see patterns in your data by
geography.
• Data is mapped by matching the location name on the map with a
location in your data.
Index of Charts
Visualising Amounts
A bar chart or bar graph is a chart or graph that presents
categorical data with rectangular bars with heights or lengths
proportional to the values that they represent. The bars can be
plotted vertically or horizontally.
Use vertical bar charts when the number of categories is small. For a larger
number of categories, use horizontal bars.
e.g. Vertical bar chart: subjects as categories on the x axis and number
of students enrolled on the y axis
e.g. Horizontal bar chart: states on the y axis and population on the x axis

Grouped bar charts are used when two or more data sets are
displayed side-by-side and grouped together under categories on
the same axis.
e.g. the x axis has teams, each team is grouped by age range, and the y axis
shows the number of players in that team in that range

A stacked bar chart (or stacked bar graph) is a bar chart where each
bar is broken down into parts and all the parts together make up
the whole.
e.g. the x axis has teams and each bar is stacked by the number of players
in each age range
Visualising Amounts
Dot Plots
A dot plot shows data distribution by plotting dots for each
observation. Dot plots can be used for both categorical and
quantitative data. Dot plots are useful for comparing between
2-4 points on a line. If you have more than 4 points, the dot
plot will likely get too cluttered and difficult to read.
e.g. the x axis depicts life expectancy and the y axis lists the countries

A heat map chart is a specialized chart that uses colors to
represent data values in a table. It is mostly used to plot large
and complex data.
It depicts values for a main variable of interest across two axis
variables as a grid of colored squares.
Because of their reliance on colour to communicate values,
heatmaps are better suited to displaying a more
generalised view of numerical data.
e.g. a heat map of the number of customers visiting a store: the x axis
shows the time of day, the y axis shows the day of the week, and the
number of customers is encoded by the color scale
Visualising Proportions
We often want to show how some group, entity, or amount
breaks down into individual pieces that each represent a
proportion of the whole. This is most commonly done using pie
charts.
Depending on the type of data being represented, however,
proportions can often be shown as bar charts instead.

A pie chart is a type of graph representing data in a circular form, with each slice of the circle representing a fraction or
proportionate part of the whole.
e.g. a company's employees' beverage preferences, shown as percentages in a pie chart
A nested pie chart is composed of an inner and an outer circle. The inner circle shows the breakdown of the data by one
variable, and the outer circle shows the breakdown of each slice of the inner circle by the second variable.
e.g. the percentage of tourists in each continent, further broken down by country within each continent, for a particular year
Visualising Proportions

Whenever we have categories that overlap, it is best to show explicitly how they relate to each other. This can be done
with a mosaic plot.
A mosaic plot is a square subdivided into rectangular tiles, the area of which represents the conditional relative frequency
for a cell in the contingency table. In a mosaic plot, every categorical variable shown must cover all the observations in the
dataset.
Treemaps are an alternative way of visualizing the hierarchical structure of a Tree Diagram while also displaying quantities
for each category via area size. Each category is assigned a rectangle area with the subcategory rectangles nested inside.
When we want to visualize proportions described by more than two categorical variables, mosaic plots, treemaps, and pie
charts all can quickly become unwieldy. An alternative in this case can be a parallel sets plot.
In a parallel sets plot, we show how the total dataset breaks down by each individual categorical variable, and then we
draw shaded bands that show how the subgroups relate to each other
Visualising Distributions
A histogram is the most commonly used graph to show frequency distributions.
Histograms often classify data into various “bins” or “range groups” and count how many
data points belong to each of those bins.
It helps to identify how often each different value in a set of data occurs.
A histogram is a graph that shows the frequency of numerical data using rectangles. The height of
a rectangle (the vertical axis) represents the distribution frequency of a variable (the
amount, or how often that variable appears).
e.g. the scores of a particular test shown in a histogram will have mark-range bins on the x axis
and the number of students who scored in each range on the y axis

A density plot is a representation of the distribution of a numeric variable. It uses a kernel density
estimate to show the probability density function of the variable. It is a smoothed version of the
histogram.
e.g. the same test-score data shown in the histogram, now drawn as a smooth curve whose shape
depends on the kernel and on the bandwidth (which plays a role similar to the bin width)
Visualising Distributions
• Boxplots, violin plots, strip charts, and sina plots are useful when we
want to visualize many distributions at once and/or if we
are primarily interested in overall shifts among the distributions.
• Stacked histograms and overlapping densities allow a
more in-depth comparison of a smaller number of distributions,
though stacked histograms can be difficult to interpret and are best
avoided.
• Ridgeline plots can be a useful alternative to violin plots and are
often useful when visualizing very large numbers of distributions or
changes in distributions over time.
• Overlapping density plots draw the density curves of several
datasets on a single plot.
Visualising x-y Relationships (Associations)
Scatter plots are graphs that present the relationship between two
variables in a data set. They represent data points on a two-dimensional plane or
on a Cartesian system. The independent variable or attribute is plotted on the
X-axis, while the dependent variable is plotted on the Y-axis. If the points are
coded (e.g., by color, shape, or size), one additional variable can be displayed.
Scatter plots are useful to see correlation between two continuous data sets
e.g. the x axis depicts age and the y axis depicts blood pressure
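The correlation that a scatter plot lets us see can also be summarized as a number. The sketch below computes the Pearson correlation coefficient, which is one common measure of linear association. The age and blood pressure values are invented for illustration, and `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical measurements: age on the x axis, blood pressure on the y axis.
ages = [25, 35, 45, 55, 65]
blood_pressure = [118, 121, 127, 130, 138]
r = pearson_r(ages, blood_pressure)
print(f"r = {r:.3f}")
```

Values near +1 or -1 correspond to points lying close to a straight line, and values near 0 to no linear pattern. A scatter plot is still worth drawing because it can reveal non-linear relationships that this single number misses.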

A bubble chart is a variation of a scatter chart in which the data points are
replaced with bubbles, and an additional dimension of the data is represented
in the size of the bubbles.
Each entity with its triplet (v1, v2, v3) of associated data is plotted as a disk that
expresses two of the three values through the disk's xy location and the third
through its size. Bubble charts can facilitate the understanding of social,
economic, medical, and other scientific relationships.
e.g. plot sales on the x axis against number of orders on the y axis, with the size of each bubble
depicting the inventory left
Visualising Time Series
A line graph—also known as a line plot or a line chart—is a graph that uses lines to connect individual
data points. A line graph displays quantitative values over a specified time interval.
Line graphs are useful in that they show data variables and trends very clearly and can help to make
predictions about the results of data not yet recorded.
e.g: change in price of a commodity over a period of time

A multivariate or multiple time series consists of two or more interrelated variables
(or dimensions) that depend on time. Each variable changes simultaneously with
time.
e.g. change in price of a commodity over a period of time in different cities

A Gantt chart is a project management tool that illustrates work completed
over a period of time in relation to the time planned for the work. It typically
includes two sections: the left side outlines a list of tasks, while the right side
has a timeline with schedule bars that visualize work.
Visualising Time Series
A waterfall chart is a visual representation that illustrates how a value changes as it’s influenced by
different factors, such as time. The main goal of this chart is to show the viewer how a value has
grown or declined over a defined period
e.g: change in account balance over a period of time as transactions get done

An area chart is a line chart where the area between the line and the axis is
shaded with a color. These charts are typically used to represent accumulated totals
over time and are the conventional way to display stacked lines.
This type of chart is particularly effective in showcasing data trends and variations
over a specified period or across different categories.
e.g: change in share price over a period of time

A bullet graph is a bar marked with extra encodings to show progress towards a goal or
performance against a reference line. Each bar focuses the user on one measure, bringing in more
visual elements to provide additional detail. A typical bullet graph encodes three different data
elements: an observed value; a target value; and a range of values used for grading.
e.g: actual sales vs target set, employee satisfaction vs target set
