Charts for Visualization
Bar Charts
• A bar chart visually represents data using rectangular bars or columns.
• The length of each bar corresponds proportionally to its value.
• Bar charts are the most common approach to visualizing amounts (i.e., numerical
values shown for some set of categories).
Example: Bridges in Pittsburgh
• Consider a dataset of 106 bridges in Pittsburgh.
• This dataset contains various pieces of information about the bridges,
such as the material from which they are constructed (steel, iron, or
wood) and the year when they were constructed.
• Based on the year of construction, bridges are grouped into distinct
categories, such as crafts bridges that were constructed before 1870
and modern bridges that were constructed after 1940.
Visualising as Proportions
• Breakdown of the 106 bridges in Pittsburgh:
• by construction material (steel, wood, iron), and
• by date of construction (crafts, before 1870, and modern, after 1940).
• Whenever we have categories that overlap, it is best to show
explicitly how they relate to each other.
• This can be done with a mosaic plot.
• A mosaic plot (also called a Marimekko chart or Mekko chart) is a graphical
visualization of data from two or more qualitative variables.
Visualising Proportions – mosaic plot
To draw a mosaic plot:
• We place one categorical variable along the x axis (here, era of bridge
construction) and subdivide the x axis by the relative proportions that make up
the categories.
• We then place the other categorical variable along the y axis (here, building
material) and, within each category along the x axis, subdivide the y axis by the
relative proportions that make up the categories of the y variable.
• The result is a set of rectangles whose areas are proportional to the number of
cases representing each possible combination of the two categorical variables.
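The construction steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `mosaic_rectangles` and the counts in `example_counts` are assumptions made up for the example (they are not the real Pittsburgh bridge counts), and the rectangles are laid out in a unit square.

```python
def mosaic_rectangles(counts):
    """Return {(x_cat, y_cat): (x0, y0, width, height)} inside the unit square.

    counts maps each x category to a dict of {y category: number of cases}.
    """
    total = sum(sum(col.values()) for col in counts.values())
    rects = {}
    x0 = 0.0
    for x_cat, col in counts.items():
        col_total = sum(col.values())
        width = col_total / total  # share of this x category in the whole data set
        y0 = 0.0
        for y_cat, n in col.items():
            # share of this y category within the current x category
            height = n / col_total if col_total else 0.0
            rects[(x_cat, y_cat)] = (x0, y0, width, height)
            y0 += height
        x0 += width
    return rects


# Made-up counts for illustration only (NOT the real bridges data).
example_counts = {
    "crafts": {"wood": 6, "iron": 3, "steel": 1},
    "modern": {"wood": 0, "iron": 0, "steel": 10},
}
rects = mosaic_rectangles(example_counts)
```

Each rectangle's area (width times height) equals the share of cases in that combination, which is the defining property of the mosaic plot.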
Visualising Proportions – Tree Map
• The same bridges data set can be visualised using a treemap.
• We take an enclosing rectangle and subdivide it into smaller rectangles
whose areas represent the proportions.
• The method of placing the smaller rectangles into the larger one differs
from that of the mosaic plot: in a treemap, we recursively nest rectangles
inside each other.
• For example, in the case of the Pittsburgh bridges, we can first subdivide
the total area into three parts representing the three building materials
(wood, iron, and steel). Then we can subdivide each of those areas further to
represent the construction eras found for each building material.
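The recursive nesting can be illustrated with a small Python sketch. It uses the simple "slice-and-dice" layout (alternating vertical and horizontal splits), which is one possible layout and not necessarily the one used to draw the slide's figure. The function names and the counts in `example_tree` are assumptions made up for illustration, not the real bridge data.

```python
def subtree_total(node):
    """Sum of all leaf values below a node (a leaf is a plain number)."""
    if isinstance(node, dict):
        return sum(subtree_total(child) for child in node.values())
    return node


def slice_and_dice(node, x=0.0, y=0.0, w=1.0, h=1.0, path=(), vertical=True):
    """Return a list of (path, x, y, width, height) tuples, one per leaf."""
    if not isinstance(node, dict):
        return [(path, x, y, w, h)]
    leaves = []
    node_total = subtree_total(node)
    offset = 0.0  # fraction of the parent rectangle already used
    for name, child in node.items():
        share = subtree_total(child) / node_total if node_total else 0.0
        if vertical:  # split the parent along the x axis
            leaves += slice_and_dice(child, x + offset * w, y, share * w, h,
                                     path + (name,), False)
        else:         # split the parent along the y axis
            leaves += slice_and_dice(child, x, y + offset * h, w, share * h,
                                     path + (name,), True)
        offset += share
    return leaves


# Made-up hierarchy for illustration only: material -> era -> count.
example_tree = {
    "wood": {"crafts": 4},
    "iron": {"crafts": 2, "emerging": 2},
    "steel": {"mature": 6, "modern": 6},
}
leaves = slice_and_dice(example_tree)
```

The first level splits the unit square by building material, and each material's rectangle is then split by era, so every leaf's area is proportional to its count.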
Visualising Proportions – Mosaic Plots vs Tree Maps
• Mosaic plots assume that every level of one grouping variable can be
combined with every level of another grouping variable
• For example: every bridge can be described by a choice of building material
(wood, iron, steel) and a choice of time period (crafts, emerging, mature,
modern).
• By contrast, such a requirement does not exist for treemaps.
• In fact, treemaps tend to work well when the proportions cannot
meaningfully be described by combining multiple categorical variables.
• For example, we can separate the US into four regions (West, Northeast,
Midwest, and South) and each region into distinct states, but the states in
one region have no relationship to the states in another region
Visualising Proportions – Nested Pies
Note: in Excel, a density plot can be drawn by first calculating the kernel density values and then using a scatter plot with smooth lines.
Visualising Distributions - Box Plot
• Also known as a whisker plot, box-and-whisker plot,
or box-and-whisker diagram.
• It is a graphical method of displaying
variation in a set of data.
• It displays key summary statistics such as the
median, quartiles, and potential outliers in a concise
and visual manner.
A box-and-whisker plot is built from the five statistics below.
• Minimum value: the smallest value in the data set,
excluding the outliers.
• First quartile (Q1): 25% of the data lies below the first (lower)
quartile.
• Median (Q2): the mid-point of the data set. Half of the
values lie below it and half above.
• Third quartile (Q3): 75% of the data lies below the third
(upper) quartile.
• Maximum value: the largest value in the data set,
excluding the outliers.
• The box spans the middle 50% of the data; its length is known as the interquartile range (IQR). The IQR is
calculated as
IQR = Q3 - Q1
• Outliers are the data points that fall below the lower limit or above the upper limit. The lower and upper limits
are calculated as
Lower Limit = Q1 - 1.5*IQR
Upper Limit = Q3 + 1.5*IQR
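As a worked example of the formulas above, the following Python sketch computes the quartiles, the IQR, the two limits, and the outliers. The function name `box_plot_stats` and the sample data are made up for illustration. Quartile conventions differ between tools; this sketch assumes the "inclusive" method of Python's standard `statistics.quantiles`, so other software may report slightly different Q1 and Q3 values.

```python
import statistics


def box_plot_stats(values):
    """Return the statistics needed to draw a box-and-whisker plot."""
    data = sorted(values)
    # Assumption: "inclusive" quartile convention; other conventions exist.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    inside = [v for v in data if lower_limit <= v <= upper_limit]
    outliers = [v for v in data if v < lower_limit or v > upper_limit]
    return {
        "min": inside[0],    # smallest value that is not an outlier
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": inside[-1],   # largest value that is not an outlier
        "iqr": iqr,
        "lower_limit": lower_limit,
        "upper_limit": upper_limit,
        "outliers": outliers,
    }


# Made-up sample: nine small values and one extreme value.
stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

For this sample, Q1 = 3.25 and Q3 = 7.75, so IQR = 4.5, the limits are -3.5 and 14.5, and the value 100 is flagged as an outlier.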
When to use Boxplot vs Histogram
• Use a histogram when you want to see the overall shape and
distribution of your data, including the density of values within
specific ranges
• Use a box plot for comparing the spread and central tendency of
single or multiple data sets side-by-side, particularly when you want
to easily identify outliers and visualize skewness in the data.
Visualising Distributions – Box Plots
Visualising Distributions – Violin Plot
Violin Plots
• A box plot shows the spread of the data; a violin plot shows both the
spread and the shape of the data distribution.
• The width of the violin at a given y value represents the point density
at that y value.
In a violin plot, the x-axis represents the different categories or groups. The y-axis
represents the variable of interest (e.g., the numerical values).
A line graph is best for clearly showing trends and changes over time. Use a line graph for time series when you want to
clearly see the direction and rate of change over time.
We can also fill the area under the curve with a solid color. This choice further emphasizes the overarching trend in the
data, as it visually separates the area above the curve from the area below. Use an area chart when you want to emphasize
the total accumulated value over time, like total sales or cumulative growth. However, this visualization is only valid if
the y axis starts at zero, so that the height of the shaded area at each time point represents the data value at that time
point.
Visualising Multiple Time Series
Monthly submissions to three preprint servers covering biomedical research.
Eliminating the dots and directly labeling the lines instead of providing a legend reduces the cognitive load required to
read the figure.
Area Chart
• An area graph is a specialized form of the line graph.
• In addition to connecting the data points with a continuous line, we fill in
the region below that line with a solid color.
• This might seem to be a minor cosmetic change, but it has a significant
effect on how we perceive the data in the chart.
Gantt Chart
• Gantt charts are
common in project
management, as
they’re useful in
illustrating a project
timeline or
progression of tasks.
• In this type of chart,
tasks to be
performed are listed
on the vertical axis
and time intervals on
the horizontal axis.
• Horizontal bars in
the body of the chart
represent the
duration of each
activity.
Waterfall Chart
• A waterfall chart is a
specific type of bar chart
that reveals the story
behind the net change in
something’s value
between two points.
• It is used to show how an
initial value is increased
and decreased by a series
of intermediate values,
leading to a final value.
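The way a waterfall chart steps from the initial value to the final value can be sketched as follows. This is an illustrative sketch only: the function name `waterfall_bars`, the labels, and the numbers are made up. Each bar is described by where it starts and ends on the value axis.

```python
def waterfall_bars(initial, changes):
    """Return (label, bottom, top) for every bar in a waterfall chart.

    changes is a list of (label, delta) pairs; a positive delta is an
    increase and a negative delta is a decrease.
    """
    bars = [("start", 0, initial)]  # the initial value is drawn from zero
    running = initial
    for label, delta in changes:
        new_total = running + delta
        # Each intermediate bar floats between the old and new running totals.
        bars.append((label, min(running, new_total), max(running, new_total)))
        running = new_total
    bars.append(("end", 0, running))  # the final value is drawn from zero
    return bars


# Made-up numbers for illustration only.
bars = waterfall_bars(100, [("new sales", 30), ("costs", -50)])
```

Here the chart starts at 100, rises to 130, falls to 80, and ends with a full bar of height 80, so the intermediate bars explain the net change of -20.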
Bullet Graph
• A bullet graph is a
bar marked with
extra encodings to
show progress
towards a goal or
performance against
a reference line.
• Each bar focuses the
user on one
measure, bringing in
more visual
elements to provide
additional detail.
Pareto Charts
• A Pareto chart combines a bar chart and a line graph.
• The rectangular bars correspond to individual values in descending order, while
the line graph displays the cumulative percentage total.
• This type of chart reflects the Pareto principle, which holds that roughly 20
percent of causes account for about 80 percent of problems.
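The two ingredients of a Pareto chart (bars in descending order and a cumulative percentage line) can be computed with a short Python sketch. The function name `pareto_table` and the defect counts are made up for illustration.

```python
def pareto_table(counts):
    """Return rows of (category, count, cumulative_percent), largest count first."""
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    total = sum(counts.values())
    rows = []
    running = 0
    for category, n in ordered:
        running += n
        rows.append((category, n, 100 * running / total))
    return rows


# Made-up defect counts for illustration only.
rows = pareto_table({"scratches": 5, "misalignment": 50, "cracks": 25, "stains": 20})
```

The counts give the bar heights and the third column gives the points of the cumulative line, which always ends at 100 percent.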
Grouped bar charts are used when two or more data sets are
displayed side-by-side and grouped together under categories on
the same axis.
e.g.: the x axis shows teams, each team is grouped by age range, and the y axis
shows the number of players on that team in each range.
A stacked bar chart (or stacked bar graph) is a bar chart where each
bar is broken down into parts and all the parts together make up
the whole.
e.g.: the x axis shows teams and each bar is stacked by the number of players
in each age range.
Visualising Amounts
Dot Plots
A dot plot shows data distribution by plotting a dot for each
observation. Dot plots can be used for both categorical and
quantitative data. Dot plots are useful for comparing
2-4 points on a line. If you have more than 4 points, the dot
plot will likely become cluttered and difficult to read.
e.g.: the x axis depicts life expectancy and the y axis lists the countries.
A pie chart is a type of graph representing data in a circular form, with each slice of the circle representing a fraction or
proportionate part of the whole.
e.g.: a company's employees' beverage preferences, shown as percentages in a pie chart.
A nested pie chart is composed of an inner and an outer circle. The inner circle shows the breakdown of the data by one
variable, and the outer circle shows the breakdown of each slice of the inner circle by the second variable.
e.g.: the percentage of tourists in each continent, further broken down by country within each continent, for a particular year.
Visualising Proportions
Whenever we have categories that overlap, it is best to show explicitly how they relate to each other. This can be done
with a mosaic plot.
A mosaic plot is a square subdivided into rectangular tiles, the area of each representing the conditional relative frequency
for a cell in the contingency table. In a mosaic plot, every categorical variable shown must cover all the observations in the
dataset.
Treemaps are an alternative way of visualizing the hierarchical structure of a Tree Diagram while also displaying quantities
for each category via area size. Each category is assigned a rectangle area with the subcategory rectangles nested inside.
When we want to visualize proportions described by more than two categorical variables, mosaic plots, treemaps, and pie
charts all can quickly become unwieldy. An alternative in this case can be a parallel sets plot.
In a parallel sets plot, we show how the total dataset breaks down by each individual categorical variable, and then we
draw shaded bands that show how the subgroups relate to each other
Visualising Distributions
A histogram is the most commonly used graph
to show frequency distributions.
Histograms often classify data into various
“bins” or “range groups” and count how many
data points belong to each of those bins.
It helps to identify how often each different
value in a set of data occurs.
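The binning and counting step can be sketched in plain Python. The function name `histogram_counts`, the scores, and the bin edges are made up for illustration; the sketch assumes each bin includes its left edge and excludes its right edge, except the last bin, which also includes the final edge.

```python
def histogram_counts(values, bin_edges):
    """Count how many values fall into each bin defined by consecutive edges."""
    counts = [0] * (len(bin_edges) - 1)
    last = len(counts) - 1
    for v in values:
        for i in range(len(counts)):
            in_bin = bin_edges[i] <= v < bin_edges[i + 1]
            # The last bin is closed on the right so the maximum edge is counted.
            on_last_edge = i == last and v == bin_edges[-1]
            if in_bin or on_last_edge:
                counts[i] += 1
                break
    return counts


# Made-up test scores and mark ranges for illustration only.
scores = [35, 42, 58, 61, 67, 73, 88, 100]
counts = histogram_counts(scores, [0, 40, 60, 80, 100])
```

Each count becomes the height of one rectangle: one score falls in 0-40, two in 40-60, three in 60-80, and two in 80-100.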
A histogram is a graph that shows the frequency of numerical data using rectangles. The height of a rectangle (the
vertical axis) represents the frequency of a variable (the amount, or how often that variable appears).
e.g.: test scores for a particular test shown in a histogram will have an x axis with bins for the mark ranges, and the y
axis will depict the number of students who scored in each range.
A density plot is a representation of the distribution of a numeric variable. It uses a kernel density estimate to show
the probability density function of the variable. It is a smoothed version of the histogram.
e.g.: the same test scores shown in the histogram would instead be drawn as a smooth curve built from kernels (which
estimate the density of the data), with the bandwidth playing the role of the bin width.
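A kernel density estimate can be sketched in a few lines of Python. This is a minimal illustration that assumes a Gaussian kernel and a hand-picked bandwidth; the function name `make_kde`, the scores, and the bandwidth of 5 are made up for the example, and plotting libraries typically choose the bandwidth automatically.

```python
import math


def make_kde(data, bandwidth):
    """Return a function that estimates the density at x with a Gaussian kernel."""
    n = len(data)
    norm = n * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        # Sum one Gaussian bump per observation, centred on that observation.
        return sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data) / norm

    return density


# Made-up test scores for illustration only.
density = make_kde([50, 55, 70], bandwidth=5)
curve = [(x, density(x)) for x in range(0, 121)]  # points for a smooth line plot
```

A larger bandwidth gives a smoother, flatter curve and a smaller bandwidth gives a bumpier one, much like widening or narrowing the bins of a histogram.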
Visualising Distributions
• Boxplots, violin plots, strip charts, and sina plots are useful when we want to visualize many distributions at once
and/or if we are primarily interested in overall shifts among the distributions.
• Stacked histograms and overlapping densities allow a more in-depth comparison of a smaller number of distributions,
though stacked histograms can be difficult to interpret and are best avoided.
• Ridgeline plots can be a useful alternative to violin plots and are often useful when visualizing very large numbers of
distributions or changes in distributions over time.
• Overlapping density plots draw the density curves of several data sets on a single plot.
Visualising x-y Relationships (Associations)
Scatter plots are graphs that present the relationship between two
variables in a data set. They represent data points on a two-dimensional plane or
on a Cartesian system. The independent variable or attribute is plotted on the
X-axis, while the dependent variable is plotted on the Y-axis. If the points are
coded (for example, by color or shape), one additional variable can be displayed.
Scatter plots are useful for seeing the correlation between two continuous variables.
e.g.: the x axis depicts age and the y axis depicts blood pressure.
A bubble chart is a variation of a scatter chart in which the data points are
replaced with bubbles, and an additional dimension of the data is represented
in the size of the bubbles.
Each entity with its triplet (v1, v2, v3) of associated data is plotted as a disk that
expresses two of the vi values through the disk's xy location and the third
through its size. Bubble charts can facilitate the understanding of social,
economic, medical, and other scientific relationships.
e.g.: plot sales on the x axis against number of orders on the y axis, with the size of each bubble
depicting the inventory left.
Visualising Time Series
A line graph—also known as a line plot or a line chart—is a graph that uses lines to connect individual
data points. A line graph displays quantitative values over a specified time interval.
Line graphs are useful in that they show data variables and trends very clearly and can help to make
predictions about the results of data not yet recorded.
e.g: change in price of a commodity over a period of time
An area chart is a line chart where the area between the line and the axis is
shaded with a color. These charts are typically used to represent accumulated totals
over time and are the conventional way to display stacked lines.
This type of chart is particularly effective in showcasing data trends and variations
over a specified period or across different categories.
e.g: change in share price over a period of time
A bullet graph is a bar marked with extra encodings to show progress towards a goal or
performance against a reference line. Each bar focuses the user on one measure, bringing in more
visual elements to provide additional detail. A typical bullet graph encodes three different data
elements: an observed value; a target value; and a range of values used for grading.
e.g: actual sales vs target set, employee satisfaction vs target set