Chapter 3 Non Spatial Data Visualization
Visualization of one, two and multi-dimensional data, Tabular data,
quantitative values (scatter plot)
Separate, Order and align (Bar, stacked bar, dots and line charts),
Tree data, Displaying Hierarchical structures,
Graph data, Rules for graph drawing and labeling
Time series data, Characteristics of time data, Visualization of time series
data, Mapping of time
Attribute data visualization is a way of representing data through visual elements such as graphs,
charts, and diagrams.
This type of visualization is useful for displaying data that has attributes or characteristics such as
categories, time periods, and numerical values.
Attribute data visualization is an essential tool for displaying data in a graphical format that
enables easy interpretation and analysis.
Understanding the terminology associated with attribute data visualization is essential to making
meaningful interpretations of the visualizations.
Data Types
Items
An item is an individual discrete entity
e.g. a row in a table, a node in a network
Attributes
An attribute is some specific property that can be measured,
observed, or logged
a.k.a. variable, (data) dimension
Here are some of the terms associated with attribute data visualization:
Categorical Data: This is a type of data that can be sorted into categories or groups.
Examples include colors, names, and genres. Categorical data can be visualized using
techniques such as bar charts and pie charts.
Numerical Data: This is a type of data that is represented by numbers. Examples include
height, weight, and temperature. Numerical data can be visualized using techniques such
as scatter plots and line graphs.
Time-Series Data: This is a type of data that is collected over a period of time. Examples
include stock prices, weather patterns, and sales data. Time-series data can be
visualized using techniques such as line graphs and heat maps.
Data Points: This refers to individual pieces of data in a dataset. Data points can be
represented using different shapes such as dots or bars in a graph.
Axis: This is the line in a graph that represents the values of one or more
variables. A graph typically has two axes: a horizontal x-axis and a vertical y-
axis.
Legend: This is a part of a graph that explains the meaning of different colors or
shapes used to represent data points.
Trend line: This is a line that shows the general trend in a set of data. Trend lines
can be added to a graph to help identify patterns and correlations.
Correlation: This is a measure of how strongly two variables are related.
Correlation can be positive, negative, or zero (no linear relationship). Correlation
can be visualized using techniques such as scatter plots.
Heat Map: This is a type of graph that represents data as colors on a grid. Heat
maps are commonly used to display data that has two dimensions, such as time
and geography.
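To make the term correlation concrete, here is a minimal sketch using NumPy. The three small arrays and the helper name pearson_r are invented for illustration; they are not part of the original slides.

```python
import numpy as np

# Hypothetical paired observations (invented for illustration)
x = np.array([1, 2, 3, 4, 5, 6])
rises_with_x = np.array([2, 4, 5, 8, 9, 12])   # tends to increase as x increases
falls_with_x = np.array([11, 9, 8, 6, 3, 1])   # tends to decrease as x increases


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length arrays."""
    return np.corrcoef(a, b)[0, 1]


print("positive:", round(pearson_r(x, rises_with_x), 3))
print("negative:", round(pearson_r(x, falls_with_x), 3))
```

A value close to +1 or -1 indicates a strong linear relationship; a value close to 0 indicates little or none.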
Nodes
Synonym for item but in the context of networks (graphs)
Links
A link is a relation between two items
e.g. social network friends, computer network links
Positions:
A position is a location in space (usually 2D or 3D)
May be subject to projections
e.g. cities on a map, a sampled region in a CT scan
Grids:
A grid specifies how data is sampled both geometrically and topologically
e.g. how CT scan data is stored
One-dimensional data visualization involves displaying data that has a single variable.
One-dimensional data visualization is typically used for displaying categorical or
numerical data.
Examples of one-dimensional data visualization techniques include bar charts, pie
charts, and histograms.
Bar Charts: Bar charts are used to display categorical data. The height of each bar represents
the number or percentage of observations in each category. Bar charts are commonly used for
comparing the frequency of different categories.
Pie Charts: Pie charts are used to represent categorical data as a percentage of a whole. The
size of each slice is proportional to the frequency or percentage of observations in each
category. Pie charts are useful for visualizing data where the total of all categories is known and
the categories are mutually exclusive.
Histograms: Histograms are used to represent numerical data. Histograms show the distribution
of a dataset by grouping the data into bins and plotting the frequency of observations in each
bin. Histograms are commonly used for exploring the distribution of continuous data.
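The binning step behind a histogram can be sketched with NumPy. The heights and the 10 cm bin edges below are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical heights in centimetres (invented for illustration)
heights_cm = np.array([150, 152, 158, 161, 163, 164, 167, 170, 171, 175, 178, 184])

# Group the values into 10 cm wide bins: [150,160), [160,170), [170,180), [180,190]
bin_edges = np.array([150, 160, 170, 180, 190])
counts, edges = np.histogram(heights_cm, bins=bin_edges)

# Text rendering of the histogram: one row per bin, one '#' per observation
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left}-{right} cm: {'#' * int(n)} ({int(n)})")
```

Changing the bin edges changes the shape of the histogram, so bin width is a design choice worth trying at several values.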
Two-dimensional data visualization involves displaying data that has two variables.
Two-dimensional data visualization is typically used for displaying numerical data.
Examples of two-dimensional data visualization techniques include scatter plots
and line graphs.
Scatter Plots: Scatter plots are used to display the relationship between two variables.
Each data point in the scatter plot represents a pair of values for the two variables. Scatter
plots are useful for visualizing patterns and correlations in the data.
Line Graphs: Line graphs are used to display changes in a variable over time. Line graphs
show the relationship between two variables by plotting the values of one variable on the
y-axis and the time on the x-axis. Line graphs are useful for visualizing trends and
changes in the data over time.
Multi-dimensional data visualization involves displaying data that has more than two
variables.
Multi-dimensional data visualization is typically used for displaying complex
numerical data.
Examples of multi-dimensional data visualization techniques include bubble charts
and heat maps.
Bubble Charts: Bubble charts are used to display three-dimensional data by representing
the values of three variables as the x-axis, y-axis, and size of the bubble. Bubble charts are
useful for visualizing complex relationships between three variables.
Heat Maps: Heat maps are used to display data as a two-dimensional grid of colored cells.
The color of each cell represents the value of the variable for that cell. Heat maps are
useful for visualizing large datasets where patterns and trends are not immediately
apparent.
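A bubble chart can be sketched with Matplotlib's scatter function, mapping the third variable to marker area. The product data and the function name bubble_chart are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line to use an interactive window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical product data (invented for illustration)
products = pd.DataFrame({
    "price": [10, 20, 30, 40],
    "units_sold": [500, 350, 200, 120],
    "profit": [1500, 2800, 2400, 1900],
})


def bubble_chart(df):
    """Map three variables to x position, y position, and bubble area."""
    fig, ax = plt.subplots()
    # Scale the third variable so the largest bubble has an area of 1000 points^2
    sizes = df["profit"] / df["profit"].max() * 1000
    ax.scatter(df["price"], df["units_sold"], s=sizes, alpha=0.5)
    ax.set_xlabel("Price ($)")
    ax.set_ylabel("Units sold")
    ax.set_title("Profit shown as bubble size")
    return ax


ax = bubble_chart(products)
```

Call plt.show() or ax.figure.savefig("bubble.png") to view the result.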
Tabular data is data that is organized in rows and columns in a table format.
Tabular data is often used to represent quantitative values, such as numerical data or
percentages.
Scatter plots are a type of data visualization that can be used to represent quantitative
values.
Scatter plots display the relationship between two variables, where one variable is
plotted on the x-axis and the other variable is plotted on the y-axis.
Each point on the scatter plot represents a pair of values for the two variables.
In data visualization, tabular data can be used to create charts, graphs, and other
visualizations that help users to better understand the data.
For example, let's say you have a table of sales data for a company, organized by region,
product, and time period. You could use this data to create a line chart showing sales
trends over time for each region, or a bar chart showing total sales by product.
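The sales example can be sketched with pandas: the same table is reshaped once for a bar chart and once for a line chart. All records and column names below are invented for illustration.

```python
import pandas as pd

# Hypothetical sales records (invented for illustration)
sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "East", "West"],
    "product": ["A", "B", "A", "B", "A", "B"],
    "quarter": ["Q1", "Q1", "Q1", "Q1", "Q2", "Q2"],
    "sales":   [100, 80, 90, 60, 120, 70],
})

# Total sales by product: the input for a bar chart
by_product = sales.groupby("product")["sales"].sum()

# Sales per region over time, one column per region: the input for a line chart
by_region_time = sales.pivot_table(
    index="quarter", columns="region", values="sales", aggfunc="sum"
)

print(by_product)
print(by_region_time)
```

Passing either result to its .plot() method (for example by_product.plot(kind="bar")) would draw the corresponding chart.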
Quantitative values are numerical values that can be measured and
analyzed mathematically.
In data visualization, scatter plots are a useful way to visualize relationships between
two quantitative values.
A scatter plot is a graph in which the values of two quantitative variables are plotted
along two axes, and each data point is represented by a dot on the graph.
The position of each dot on the graph indicates the values of the two variables for that
data point, and patterns or trends in the data can often be seen by examining the
overall shape of the scatter plot.
Scatter plots can be very useful for exploring relationships between two quantitative
variables.
In addition to examining overall patterns in the data, you can also use techniques like
regression analysis to identify more specific relationships between the variables.
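As one possible form of regression analysis, a straight line can be fitted by least squares with NumPy. The measurements are invented, and a linear model is an assumption that should be checked against the scatter plot first.

```python
import numpy as np

# Hypothetical paired measurements (invented for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least-squares fit of y = slope * x + intercept
slope, intercept = np.polyfit(x, y, deg=1)

# R^2: share of the variance in y explained by the fitted line
predicted = slope * x + intercept
ss_res = np.sum((y - predicted) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"y = {slope:.2f}x + {intercept:.2f}, R^2 = {r_squared:.3f}")
```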
• In this example, we have a set of
data that includes the ages and
incomes of several individuals.
• We use Pandas to define this data
as a DataFrame with one column
for age and one column for income.
• We then use Matplotlib to create a
scatter plot of this data, with age on
the x-axis and income on the y-
axis.
• The resulting chart shows that there
is a general trend of increasing
income with increasing age,
although there is also some
variability in the data.
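A minimal sketch of such an example follows. The ages and incomes are invented for illustration, since the original values are not given.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line to use an interactive window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical ages and incomes (invented for illustration)
data = pd.DataFrame({
    "Age": [25, 30, 35, 40, 45, 50, 55],
    "Income": [32000, 41000, 39000, 52000, 58000, 55000, 67000],
})

# Scatter plot with age on the x-axis and income on the y-axis
fig, ax = plt.subplots()
ax.scatter(data["Age"], data["Income"])
ax.set_xlabel("Age")
ax.set_ylabel("Income")
ax.set_title("Income vs. Age")
```

Call plt.show() to display the chart. With these values, income generally rises with age but not at every step.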
Separate, order, and align are techniques used in
data visualization to improve the clarity and
effectiveness of charts and graphs.
1. Separate: This technique involves separating
different groups of data to make it easier to
compare and contrast them. For example, you
might use separate bars or colors to represent
different categories of data. This can be
especially useful when comparing data across
different time periods or locations.
Here's an example of a bar chart that separates
data by category:
In this example, we use a bar chart to compare
sales data for four different categories (A, B, C,
and D) in two different years (2020 and 2021).
We use different colors for the bars
representing each year to make it easier to
compare the data.
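A sketch of such a grouped bar chart, using pandas plotting on top of Matplotlib. The sales figures are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line to use an interactive window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sales figures (invented for illustration)
sales = pd.DataFrame(
    {"2020": [100, 150, 80, 120], "2021": [130, 140, 110, 160]},
    index=["A", "B", "C", "D"],
)

# One group of bars per category, one colored bar per year within each group
ax = sales.plot(kind="bar", color=["tab:blue", "tab:orange"], rot=0)
ax.set_xlabel("Category")
ax.set_ylabel("Sales")
ax.set_title("Sales by category, 2020 vs. 2021")
ax.legend(title="Year")
```

Call plt.show() to display the chart.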
Order: This technique involves
ordering data along an axis to
highlight patterns or trends.
For example, you might order data by
value or by date to show changes over
time.
Here's an example of a stacked bar
chart that orders data by value:
In this example, we use a stacked bar
chart to compare sales data for four
different regions (East, West, North, and
South) in four different quarters of the
year. We order the data by Q4 sales to
highlight which regions had the
highest sales.
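A sketch of such a stacked bar chart, ordered by Q4 sales before plotting. The quarterly figures are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line to use an interactive window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical quarterly sales per region (invented for illustration)
quarterly = pd.DataFrame(
    {
        "Q1": [90, 80, 70, 60],
        "Q2": [100, 85, 75, 90],
        "Q3": [110, 95, 80, 100],
        "Q4": [120, 150, 90, 130],
    },
    index=["East", "West", "North", "South"],
)

# Order the regions by Q4 sales, highest first, before drawing
ordered = quarterly.sort_values("Q4", ascending=False)

# One bar per region, with the four quarters stacked on top of each other
ax = ordered.plot(kind="bar", stacked=True, rot=0)
ax.set_xlabel("Region (ordered by Q4 sales)")
ax.set_ylabel("Sales")
```

Call plt.show() to display the chart.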
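Align: This technique involves aligning different data points along a common axis to make it
easier to compare them.

    import pandas as pd
    import matplotlib.pyplot as plt

    # Define the data as a Pandas DataFrame
    data = pd.DataFrame({
        'Date': ['2020-01-01', '2020-02-01', '2020-03-01', '2020-04-01', '2020-05-01'],
        'Sales': [100, 125, 150, 175, 200],
    })

    # Assumed completion: plot every sale against the same date axis so the points line up
    data['Date'] = pd.to_datetime(data['Date'])
    plt.plot(data['Date'], data['Sales'], marker='o')
    plt.show()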
[Figure: Stock Price ($) by date, 1/1/2021 to 1/6/2021]
Course     Number of students enrolled
Math       120
English    90
Science    80
History    60
Stacked bar charts:
Stacked bar charts are similar to bar charts, but they are used to show how different
components contribute to a whole. Each bar is divided into segments that
represent different categories or sub-groups, with each segment representing a
percentage of the whole.
Example: A stacked bar chart can be used to show how different age groups
contribute to the overall population of a city. By separating the bars into age groups
and ordering them by age, viewers can quickly see how the population is
distributed across different age groups.
Age Group    Population
0-9          50000
10-19        ...
20-29        100000
30-39        90000
40-49        80000
50-59        ...
60-69        50000
70+          40000
[Figure: Stock Price ($) by date, 1/3/2021 to 1/12/2021; value axis from 44 to 56]
In this chart, each bar would be separated into eight segments, one for each age group,
and each segment would be labeled with the age range it represents. The bars would be
ordered by age group, with the youngest age group (0-9) at the bottom and the oldest
age group (70+) at the top.
By using a stacked bar chart in this way, viewers can quickly see how the population of a
city is distributed across different age groups and how each age group contributes to the
overall population. This type of chart can be useful for demographic analysis and
planning purposes.
Dot plots:
Dot plots are used to display numerical data using a series of dots aligned along a
common axis. They are effective for displaying small datasets and identifying
outliers or anomalies.
Example: A dot plot can be used to compare the heights of different individuals. By
aligning the dots along a common baseline and ordering them by height, viewers
can quickly see how the heights of different individuals compare to one another.
Individual   Height (inches)
Alice        65
Bob          72
Carol        68
Dave         66
Eve          64
Frank        69
In this chart, each dot would represent an individual, and the dots would be ordered by height,
with the tallest individual (Bob) at the top and the shortest individual (Eve) at the bottom. By
using a dot plot in this way, viewers can quickly see how the heights of different individuals
compare to one another. This type of chart can be useful for visualizing and analyzing various
types of data, including height, weight, and other quantitative measures.
[Figure: Stock Price ($) by date; value axis from 47.5 to 53]
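The dot plot described for the heights data can be sketched as follows. The layout choices (one row per individual, height on the horizontal axis) are assumptions about how such a chart might be drawn.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line to use an interactive window
import matplotlib.pyplot as plt
import pandas as pd

# Heights in inches, taken from the table in the example
heights = pd.Series(
    {"Alice": 65, "Bob": 72, "Carol": 68, "Dave": 66, "Eve": 64, "Frank": 69}
)

# Order ascending: the first name is drawn at the bottom, the last at the top
ordered = heights.sort_values()

# One dot per individual, all aligned on the same height axis
fig, ax = plt.subplots()
ax.plot(ordered.to_numpy(), list(ordered.index), "o")
ax.set_xlabel("Height (inches)")
ax.set_title("Heights ordered from shortest to tallest")
```

Call plt.show() to display the chart. Eve appears at the bottom and Bob at the top, matching the description.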
Line charts:
Line charts are used to display trends in numerical data over time. They are
effective for identifying patterns and trends in the data, as well as highlighting
changes over time.
Example: A line chart can be used to track the performance of a company's stock
over time. By aligning the data points along a common axis and ordering them by
date, viewers can quickly see how the stock price has fluctuated over time and
identify any significant changes or trends.
Displaying hierarchical structures in data visualization is important for understanding relationships and
patterns in data. One common way to display hierarchical structures is through tree diagrams or tree
maps.
Tree data is a common type of data that represents hierarchical relationships between objects.
Tree data is used to represent a wide range of information, from organizational structures to computer file systems.
The nodes of a tree data structure represent the objects in the hierarchy, while the edges represent the
relationships between them.
In data visualization, displaying hierarchical structures is commonly done using a tree diagram.
A tree diagram is a graphical representation of a tree data structure that shows the relationships between nodes. It
typically consists of nodes, edges, and labels.
Nodes in a tree diagram represent objects or groups of objects, while edges represent the relationships between
them.
Each node can have one or more child nodes, representing the objects or groups of objects that are
"contained" within it. The root node represents the top-level object or group of objects in the hierarchy.
For example, a company's organizational structure can be displayed using a tree diagram. The top-level
node represents the CEO or president, and the child nodes represent different departments or divisions.
Each department can have further child nodes representing teams or individuals.
In this example, the "Animals" node is the root node, representing the top-level object
in the hierarchy. It has two child nodes, "Mammals" and "Birds", which represent groups
of objects that are "contained" within the "Animals" node.
The "Mammals" node has two child nodes, "Dogs" and "Cats", which represent groups
of objects that are "contained" within the "Mammals" node. Similarly, the "Birds" node
has two child nodes, "Parrots" and "Eagles".
Each node in the tree diagram can be labeled with additional information about the
objects it represents. For example, the "Dogs" node might be labeled with the number
of dogs in the group, or the breeds of dogs in the group.
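The same hierarchy can be stored and displayed in a few lines of plain Python. Nested dictionaries are only one possible representation, and the function name render is hypothetical.

```python
# The hierarchy from the example, stored as nested dictionaries:
# each key is a node and its value holds that node's children.
tree = {
    "Animals": {
        "Mammals": {"Dogs": {}, "Cats": {}},
        "Birds": {"Parrots": {}, "Eagles": {}},
    }
}


def render(node, depth=0):
    """Return the tree as indented lines, one node per line (depth-first)."""
    lines = []
    for name, children in node.items():
        lines.append("  " * depth + name)
        lines.extend(render(children, depth + 1))
    return lines


print("\n".join(render(tree)))
```

Indentation encodes depth here: the root has no indent, and each level of containment adds two spaces.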
Graph data is a type of data that represents relationships between objects.
A graph consists of nodes, also known as vertices, and edges, which connect the nodes.
In data visualization, graph data is often displayed using graph diagrams, also known as network
diagrams.
Graphs are powerful tools for data visualization because they allow us to see patterns and
relationships in data that may not be immediately obvious when looking at the raw numbers.
There are several rules for graph drawing that can help create clear and informative
visualizations:
Avoid crossing edges: Crossing edges can make a graph difficult to read and understand. It's best to try to
arrange the nodes and edges in a way that minimizes edge crossings.
Use symmetry and alignment: Symmetry and alignment can help create a sense of order in a graph. For
example, arranging nodes in a circular or radial pattern can help create symmetry, while aligning nodes or
edges can help create a sense of order.
Use color and size to distinguish nodes: Using color and size can help distinguish different types of nodes in
a graph. For example, nodes representing different categories can be color-coded, while larger nodes can
represent nodes with more connections.
Label nodes and edges: Labeling nodes and edges can provide additional information about the
objects and relationships in a graph. Labels can be used to display names, values, or any other relevant
information.
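Some of these rules can be sketched with plain Python and Matplotlib: a circular layout for symmetry, node size scaled by number of connections, and a label on every node. The network is invented for illustration, and this sketch makes no attempt to minimize edge crossings.

```python
import math

import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line to use an interactive window
import matplotlib.pyplot as plt

# Hypothetical friendship network (invented for illustration)
edges = [("Ann", "Ben"), ("Ann", "Cat"), ("Ann", "Dan"), ("Ben", "Cat"), ("Dan", "Eli")]

# Degree = number of connections per node
degree = {}
for a, b in edges:
    degree[a] = degree.get(a, 0) + 1
    degree[b] = degree.get(b, 0) + 1

# Circular layout: nodes evenly spaced on a unit circle
nodes = sorted(degree)
pos = {
    n: (math.cos(2 * math.pi * i / len(nodes)), math.sin(2 * math.pi * i / len(nodes)))
    for i, n in enumerate(nodes)
}

fig, ax = plt.subplots()
for a, b in edges:
    # Draw each link as a straight gray line behind the nodes
    ax.plot([pos[a][0], pos[b][0]], [pos[a][1], pos[b][1]], color="gray", zorder=1)
for n in nodes:
    x, y = pos[n]
    ax.scatter(x, y, s=300 * degree[n], zorder=2)      # larger node = more connections
    ax.annotate(n, (x, y), ha="center", va="center")   # label every node
ax.set_axis_off()
```

Call plt.show() to display the drawing. For larger networks, a dedicated graph layout library would be a more practical choice.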
Here are some examples of how to follow the rules for graph drawing and labeling in data
visualization:
Choose the appropriate graph type: Suppose you want to visualize the sales performance of a company
over time. In this case, a line graph would be appropriate as it allows you to show trends in sales over a
period of time.
Use clear and concise labels: When labeling the axes, use clear and concise labels such as "Year" for the x-
axis and "Sales" for the y-axis. Use a clear and descriptive title such as "Sales Performance of XYZ Company
(2015-2022)".
Scale the axes appropriately: Ensure that the scale on the y-axis is appropriate for your data. For example, if
the sales figures range from $10,000 to $100,000, set the y-axis scale to show all values in this range.
Avoid clutter: Avoid cluttering the graph with unnecessary information such as gridlines or multiple
legends. Use only the essential elements that help to convey your message.
Use color and visual aids carefully: Use color and visual aids to highlight important information, such as
using a different color for each product line. However, avoid using too many colors or visual aids, as this can
make the graph confusing.
Provide context: Provide context for the data by including a brief explanation of what the graph is showing
and why it is important. For example, you might explain that the graph shows the sales performance of XYZ
Company over the last 7 years and that it is important to understand the trends in order to make informed
business decisions.
Time series data refers to a type of data that is collected over a period of time at
regular intervals.
Time series data can be visualized using various types of graphs such as line
charts, area charts, and candlestick charts.
Time series data provides significant value to organizations because it enables
them to analyze important real-time and historical metrics.
A time series describes a sequence of measurements over time. The time between
those measurements is often constant. Compared to a timeline, it contains a much
more specific dataset, and the changes in this dataset over time have to be
determined and visualized.
Time has a special character to consider. It follows some rules which do not necessarily
apply to other data sets.
1. Time Is Involuntary: Unlike the three space dimensions, where it is possible to just stand in
one place for a while, it is not possible to stand at one point of time for a while. There is no
possibility to stop time.
2. Time Is Irreversible: Time is a sequence of events, happening one after another. The future is
influenced by the past, but not the other way around. What has happened cannot be undone.
3. Time Is Required: Nothing can happen without time. Time has to progress in order for
anything to happen.
4. Time Is Measurable: There are different units to measure time, such as seconds, minutes,
hours, and years. What matters is only that time can be measured, as watches and clocks do.
5. Time Is Absolute: Einstein showed that time is relative and depends on the speed of
movement. For everyday measurements on Earth, however, time can be treated as absolute,
which means it is the same whether it is measured in Europe or in America, at the South Pole
or at the North Pole.
Characteristics of time data:
1. Time series data is typically collected at regular intervals, such as hourly, daily, weekly,
or monthly.
2. Time series data is often subject to seasonality or trends, meaning that the data may
exhibit regular patterns or fluctuations over time.
3. Time series data may have missing values due to incomplete data collection or data
errors.
4. Time series data may exhibit autocorrelation, meaning that the values of the data
are dependent on previous values.
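These characteristics can be illustrated with a small synthetic series in pandas. The trend, the 7-day pattern, and the position of the gap are all invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series: an upward trend plus a 7-day seasonal pattern
days = pd.date_range("2021-01-01", periods=56, freq="D")
trend = np.linspace(100, 128, len(days))
weekly = 10 * np.sin(2 * np.pi * np.arange(len(days)) / 7)
series = pd.Series(trend + weekly, index=days)

# Introduce a missing value, then fill it by linear interpolation between neighbours
series.iloc[10] = np.nan
filled = series.interpolate()

# Autocorrelation: correlation of the series with a lagged copy of itself
lag1 = filled.autocorr(lag=1)   # adjacent days
lag7 = filled.autocorr(lag=7)   # same weekday, one week apart

print(f"lag 1: {lag1:.3f}, lag 7: {lag7:.3f}")
```

A high autocorrelation at lag 7 is what a weekly seasonal pattern looks like numerically.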
Visualizing time series data is important to identify patterns and trends over time, detect
anomalies, and make informed decisions based on the data.
1. Line Graph
A line graph uses points connected by lines (also called trend lines) to show how a dependent
variable and independent variable changed.
An independent variable, true to its name, remains unaffected by other parameters, whereas
the dependent variable depends on how the independent variable changes.
For temporal visualizations, time is always the independent variable, which is plotted on the
horizontal axis. Then the dependent variable is plotted on the vertical axis.
In the graph below, the populations of Europe and Ireland are the dependent variables and time
is the independent variable.
[Figure: line graph of the populations of Europe and Ireland over time]
Assignment: Find other types of time series visualizations, with examples.
Mapping time refers to the process of converting time data into a format that can
be used for analysis and visualization.
This may involve converting the data into a standardized format, such as a
timestamp or a datetime object, and accounting for missing values or outliers in
the data.
Mapping time may also involve aggregating the data into larger time intervals,
such as hourly or daily averages, to make it easier to visualize and analyze trends in
the data over longer periods of time.
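The conversion and aggregation steps can be sketched with pandas. The readings and column names are invented for illustration.

```python
import pandas as pd

# Hypothetical raw readings with timestamps stored as text (invented for illustration)
raw = pd.DataFrame({
    "timestamp": ["2021-01-01 08:00", "2021-01-01 20:00",
                  "2021-01-02 08:00", "2021-01-02 20:00",
                  "2021-01-03 08:00"],
    "temperature": [4.0, 2.0, 5.0, None, 6.0],
})

# Convert the text to datetime objects and use them as the index
raw["timestamp"] = pd.to_datetime(raw["timestamp"])
readings = raw.set_index("timestamp")["temperature"]

# Aggregate to daily averages; the missing reading is skipped by mean()
daily = readings.resample("D").mean()
print(daily)
```

Here the missing value is simply left out of the daily average. Interpolating it first would be an alternative when gaps are short.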
There are several ways to map time, including:
1. X-axis: In many types of time series visualizations, time is mapped to the x-axis of the chart. This
allows the viewer to easily see the trend of the data over time. The x-axis can be labeled with the time
period, such as year, month, or day, depending on the level of granularity of the data.
2. Color: Color can be used to represent time in data visualizations. For example, a heat map can use a
color gradient to show changes in temperature over time. In this case, the color represents the value
of the data point at a particular time.
3. Size: Size can also be used to represent time in data visualizations. For example, a scatter plot can use
the size of the data point to represent the value of the variable at a particular time. This can help
highlight outliers or trends in the data over time.
4. Animation: Animation is a powerful tool for mapping time in data visualization. By animating a chart
or graph, the viewer can see how the data changes over time. For example, an animated line chart can
show the trend of the data over a period of time, with the line moving along the x-axis as the time
changes.
5. Interactive controls: Interactive controls, such as sliders or drop-down menus, can be used to map
time in data visualizations. These controls allow the viewer to adjust the time period shown in the
chart, making it easier to explore the data at different levels of granularity.
1. What are the advantages and disadvantages of using scatter plots to visualize
quantitative values in one-dimensional data? How can these limitations be
overcome?
2. Compare and contrast different types of bar charts (e.g., horizontal, vertical,
stacked, grouped) and discuss the circumstances under which each type is most
appropriate. Provide examples of real-world applications where each type has
been successfully used to display tabular data.
3. What are the key principles that guide the visualization of hierarchical structures
using tree diagrams? Discuss the different types of tree diagrams (e.g., radial,
rectangular, sunburst) and evaluate their strengths and weaknesses in terms of
their ability to convey the structure and relationships within the data.
4. Discuss the key principles that govern the drawing and labeling of graphs in
data visualization. How can these principles be used to create effective
visualizations that accurately represent the relationships between variables?
5. What are the characteristics of time series data, and how do these impact the
choice of visualization techniques? Compare and contrast different methods for
visualizing time series data, including line charts, area charts, and stacked bar
charts.
6. How can data visualization be used to reveal patterns and insights in multi-
dimensional data? Discuss the use of techniques such as scatter plots, heat maps,
and parallel coordinate plots to identify relationships between variables and
uncover hidden trends in complex datasets.
7. What are the key considerations in designing effective data visualizations for
non-expert audiences? How can visualizations be tailored to different audiences
and contexts to ensure that they are accessible, informative, and engaging?
8. Discuss the challenges and opportunities associated with the use of data
visualization in exploratory data analysis. How can visualization be used to help
analysts identify interesting patterns and trends in large datasets, and what are
the limitations of these techniques?