02 Abstractions
Abstractions
Tamara Munzner
Department of Computer Science
University of British Columbia
Week 1, Thu 13 Jan 2021
https://www.students.cs.ubc.ca/~cs-436v/21Jan/
Introduction
Why does visualization work?
• limits of memory & cognition
– change blindness
• power of perception to reveal
– how many V's?
MTHIVLWYADCEQGHKILKMTWYN
ARDCAIREQGHLVKMFPSTWYARN
GFPSVCEILQGKMFPSNDRCEQDIFP
SGHLMFHKMVPSTWYACEQTWRN
– which of these 50 numbers appears most often?
15 19 60 33 11 75 57 34 79 18 51 92 73 22 13 71 60 22
17 10 68 73 18 55 65 46 29 60 73 22 46 92 97 10 58 46
57 17 83 26 99 33 88 92 60 91 29 57 96 12 47
Exercise
• Which gender and income level shows a different effect of age on
triglyceride levels?
Why analyze visualizations?
[Figure: SpaceTree and TreeJuxtaposer compared. Target: path between two nodes. Idioms: encode, navigate, select, arrange]
Abstractions: Nested Model
How to evaluate a visualization: So many methods, how to pick?
• Computational benchmarks?
– quant: system performance, memory
• User study in lab setting?
– quant: (human) time and error rates, preferences
– qual: behavior/strategy observations
• Field study of deployed system?
– quant: usage logs
– qual: interviews with users, case studies, observations
• Analysis of results?
– quant: metrics computed on result images
– qual: consider what structure is visible in result images
• Justification of choices?
– qual: perceptual principles, best practices
Nested model: Four levels of visualization design
• domain situation
– who are the target users?
• abstraction
– translate from specifics of domain to vocabulary of visualization
• what is shown? data abstraction
• why is the user looking at it? task abstraction
– often must transform data, guided by task
• idiom
– how is it shown?
• visual encoding idiom: how to draw
• interaction idiom: how to manipulate
• algorithm
– efficient computation
[A Nested Model of Visualization Design and Validation. Munzner. IEEE TVCG 15(6):921-928, 2009 (Proc. InfoVis 2009).]
• Domain situation: you misunderstood their needs
• Data/task abstraction: you're showing them the wrong thing
• Visual encoding/interaction idiom: the way you show it doesn't work
• Algorithm: your code is too slow
Interdisciplinary: need methods from different fields at each level
• mix of qual and quant approaches (typically)
• domain situation (anthropology/ethnography): observe target users using existing tools (qual)
• data/task abstraction
• algorithm (computer science): measure system time/memory (quant); analyze computational complexity
• analyze results qualitatively (qual)
• psychology: measure human time with lab experiment (lab study) (quant)
• problem-driven work is labeled at the domain situation level, technique-driven work at the algorithm level
• domain situation: observe target users using existing tools
• data/task abstraction
• algorithm: measure system time/memory; analyze computational complexity
– benchmarks can't confirm design
• analyze results qualitatively
• measure human time with lab experiment (lab study)
– lab studies can't confirm task abstraction
• observe target users after deployment
• measure adoption
[A Nested Model of Visualization Design and Validation. Munzner. IEEE TVCG 15(6):921-928, 2009 (Proc. InfoVis 2009).]
Abstraction: Data (What)
What does data mean?
14, 2.6, 30, 30, 15, 100001
• What does this sequence of six numbers mean?
– two points far from each other in 3D space?
– two points close to each other in 2D space, with 15 links between them, and a weight of
100001 for the link?
– something else??
Basil, 7, S, Pear
• What about this data?
– food shipment of produce (basil & pear) arrived in satisfactory condition on 7th day of
month
– Basil Point neighborhood of city had 7 inches of snow cleared by the Pear Creek Limited
snow removal service
– lab rat Basil made 7 attempts to find way through south section of maze, these trials used
pear as reward food
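A minimal sketch of why semantics matter: the same six numbers from the slide are loaded under the two readings listed above. The variable names and the `distance` helper are illustrative assumptions, not part of the lecture.

```python
import math

raw = [14, 2.6, 30, 30, 15, 100001]  # the sequence from the slide


def distance(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Reading 1: two points in 3D space.
point_a, point_b = tuple(raw[:3]), tuple(raw[3:])
far_apart = distance(point_a, point_b)

# Reading 2: two points in 2D space, a link count, and a link weight.
start, end = tuple(raw[0:2]), tuple(raw[2:4])
link_count, link_weight = raw[4], raw[5]
close_together = distance(start, end)

print(f"3D reading: distance {far_apart:.1f}")
print(f"2D reading: distance {close_together:.1f}, "
      f"{link_count} links, weight {link_weight}")
```

The numbers are identical in both readings; only the assumed structure changes, and with it what would be sensible to draw.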
Now what?
• semantics: real-world meaning
• data types: structural or mathematical interpretation of data
– item, link, attribute, position, (grid)
– different from data types in
programming!
Items & Attributes
• item: individual entity, discrete
– eg patient, car, stock, city
– "independent variable"
• attribute: property that is measured, observed, logged...
– eg height, blood pressure for patient
– eg horsepower, make for car
– "dependent variable"
[Figure: example table; item: person; attributes: name, age, shirt size, fave fruit]
Other data types
• links
– express relationship between two items
– eg friendship on facebook, interaction between proteins
• positions
– spatial data: location in 2D or 3D
– pixels in photo, voxels in MRI scan, latitude/longitude
• (grids)
– sampling strategy for continuous data
Dataset types
• flat table
– one item per row
– each column is attribute
– cell holds value
[Figure: dataset types. Tables: items (rows), attributes (columns), cell containing value. Networks: node (item), link. Fields (continuous): grid of positions, cell, value in cell. Geometry (spatial): position]
[Figure: example table with item, attribute, and cell marked; attributes: name, age, shirt size, fave fruit]
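The flat table idea (one item per row, one attribute per column, one value per cell) can be sketched as a list of dictionaries. The three people and their values are made up for illustration; only the attribute names come from the slide. The helpers `column` and `cell` are hypothetical.

```python
# Hypothetical rows: each dict is one item (a person), each key is an attribute.
table = [
    {"name": "Amy", "age": 8, "shirt_size": "S", "fave_fruit": "apple"},
    {"name": "Basil", "age": 7, "shirt_size": "S", "fave_fruit": "pear"},
    {"name": "Clara", "age": 9, "shirt_size": "M", "fave_fruit": "durian"},
]


def column(rows, attribute):
    """Return one attribute (column) as a list of cell values."""
    return [row[attribute] for row in rows]


def cell(rows, item_index, attribute):
    """Return the single value stored for one item and one attribute."""
    return rows[item_index][attribute]


print(column(table, "age"))
print(cell(table, 1, "fave_fruit"))
```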
Dataset types
• multidimensional tables
– indexing based on multiple keys
• eg genes, patients
https://bl.ocks.org/jasondavies/1341281
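A minimal sketch of indexing by multiple keys, assuming a table keyed by (gene, patient) as in the slide's example. The gene names, patient names, and values are invented for illustration, and both helper functions are hypothetical.

```python
# Hypothetical expression values indexed by two keys: (gene, patient).
expression = {
    ("geneA", "patient1"): 0.8,
    ("geneA", "patient2"): 1.9,
    ("geneB", "patient1"): 0.1,
    ("geneB", "patient2"): 0.4,
}


def values_for_gene(table, gene):
    """Fix the first key and return {patient: value} for that gene."""
    return {patient: v for (g, patient), v in table.items() if g == gene}


def values_for_patient(table, patient):
    """Fix the second key and return {gene: value} for that patient."""
    return {gene: v for (gene, p), v in table.items() if p == patient}


print(expression[("geneB", "patient2")])       # a cell needs both keys
print(values_for_gene(expression, "geneA"))    # slice along one key
```

In a flat table one key (the item) is enough to find a row; here a cell is only identified once both keys are given.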
Dataset types
• network/graph
– nodes (vertices) connected by links (edges)
– tree is special case: no cycles
• often have roots and are directed
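A small sketch of a network as nodes plus links, with a check for the tree special case. It treats links as undirected and also requires the network to be connected, which is the usual graph-theory definition of a tree and an assumption added here; the slide only states "no cycles". The function name `is_tree` and the sample data are hypothetical.

```python
from collections import defaultdict


def is_tree(nodes, links):
    """Return True if the undirected network is connected and has no cycles."""
    # A connected network with no cycles has exactly one fewer link than nodes.
    if len(links) != len(nodes) - 1:
        return False
    neighbors = defaultdict(set)
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)
    # Depth-first traversal from an arbitrary node to test connectivity.
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        for nxt in neighbors[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(nodes)


nodes = ["a", "b", "c", "d"]
tree_links = [("a", "b"), ("a", "c"), ("c", "d")]
cyclic_links = [("a", "b"), ("b", "c"), ("c", "a")]

print(is_tree(nodes, tree_links))
print(is_tree(nodes, cyclic_links))
```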
Visualizing networks
https://observablehq.com/@d3/force-directed-graph
https://bost.ocks.org/mike/miserables/
http://atlas.cid.harvard.edu/explore/?tradeDirection=import&year=2012&product=726&country=undefined&redirected=true
Spatial fields
• attribute values associated with cells
• cell contains value from continuous domain
– eg temperature, pressure, wind velocity
• measured or simulated
• beyond the scope of this class
– sampling: where attributes are measured
– interpolation: how to model attributes elsewhere
– grid types, tensors
[Figure: scalar, vector, and tensor field examples]
Geometry
• shape of items
• explicit spatial positions
• points, lines, curves, surfaces, regions
– (volumes outside scope of class)
• boundary between computer graphics
and visualization
– graphics: geometry taken as given
– vis: geometry is result of a design decision
Collections
• how we group items
• sets
– unique items, unordered
• lists
– ordered, duplicates possible
• clusters
– groups of similar items
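The three collection kinds can be sketched with built-in Python containers. The sample values are invented, and grouping by first letter is only a stand-in for a real similarity measure (an assumption made for brevity).

```python
produce_list = ["pear", "apple", "pear", "basil"]  # list: ordered, duplicates kept
produce_set = set(produce_list)                    # set: unique items, unordered

# Clusters: groups of similar items. Here "similar" just means
# "same first letter", a toy assumption standing in for a real measure.
clusters = {}
for item in produce_list:
    clusters.setdefault(item[0], []).append(item)

print(len(produce_list), len(produce_set))
print(clusters)
```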
Dataset and data types
Data and Dataset Types
• Tables: items, attributes
• Networks & Trees: items (nodes), links, attributes
• Fields: grids, positions, attributes
• Geometry: items, positions
• Clusters, Sets, Lists: items
Data Types
• items, attributes, links, positions, grids
Attribute types
• which classes of values & measurements?
• categorical (nominal)
– compare equality
– no implicit ordering
• ordered
– ordinal
• less/greater than defined
– quantitative
• meaningful magnitude
• arithmetic possible
[Figure: attribute types: categorical; ordered (ordinal, quantitative). Ordering direction: sequential, diverging, cyclic]
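One way to read the three attribute types is by which operations make sense on them. The sketch below assumes shirt size as the ordinal example, fruit names as categorical, and ages as quantitative; the level list and all function names are illustrative assumptions.

```python
SHIRT_SIZES = ["S", "M", "L"]  # ordinal levels: order is defined, spacing is not


def same_category(a, b):
    """Categorical: the only meaningful comparison is equality."""
    return a == b


def ordinal_less(a, b, levels=SHIRT_SIZES):
    """Ordinal: less/greater than is defined by position in the level list."""
    return levels.index(a) < levels.index(b)


def mean(values):
    """Quantitative: magnitudes are meaningful, so arithmetic is possible."""
    return sum(values) / len(values)


print(same_category("pear", "apple"))
print(ordinal_less("S", "L"))
print(mean([7, 8, 9]))
```

Averaging shirt sizes or sorting fruit names by "size" would run without error in many tools, which is exactly why the attribute type has to be decided by the designer rather than inferred from the storage type.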
Table
[Figure: example table with attributes marked as categorical, ordinal, or quantitative]
Quiz: What kind of variable?
• 50 meter race times
• college major
• Amazon rating for product
• product name
Other data concerns
[Figure: attribute types: categorical; ordered (ordinal, quantitative)]
Hierarchical data
• multi-level structure
– space
– time
– others
• example: zipdecode
https://benfry.com/zipdecode/
Data abstraction: Three operations
• translate from domain-specific language to generic visualization language
Data vs conceptual models
• data model
– mathematical abstraction
• sets with operations, eg floats with * / - +
• variable data types in programming languages
• conceptual model
– mental construction (semantics)
– supports reasoning
– typically based on understanding of tasks [stay tuned, next week]
Data vs conceptual model, example
• data model: floats
– 32.52, 54.06, -14.35, ...
• conceptual model
– temperature
• multiple possible data abstractions
– continuous to 2 significant figures: quantitative
• task: forecasting the weather
– hot, warm, cold: ordinal
• task: deciding if bath water is ready
– above freezing, below freezing: categorical
• task: decide if I should leave the house today
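The three abstractions above can be sketched as three functions over the same floats. Degrees Celsius and the hot/warm cutoffs are assumptions made for this illustration; the slide gives only the numbers and the level names.

```python
temperatures = [32.52, 54.06, -14.35]  # values from the slide; Celsius is assumed


def as_quantitative(t):
    """Keep a number, rounded to 2 significant figures."""
    return float(f"{t:.2g}")


def as_ordinal(t, warm_from=15.0, hot_from=30.0):
    """Bin into ordered levels; the cutoffs are illustrative assumptions."""
    if t >= hot_from:
        return "hot"
    if t >= warm_from:
        return "warm"
    return "cold"


def as_categorical(t, freezing_point=0.0):
    """Two unordered categories; exactly at the freezing point counts as below."""
    return "above freezing" if t > freezing_point else "below freezing"


for t in temperatures:
    print(as_quantitative(t), as_ordinal(t), as_categorical(t))
```

The data model (floats) never changes; the task decides which abstraction is worth deriving.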
Derived attributes
• derived attribute: compute from originals
– simple change of type
– acquire additional data
– complex transformation
• more on this next time
[Figure: derived attribute example with exports, imports, and trade balance]
https://www.thesquirrelcensus.com/
https://data.cityofnewyork.us/Environment/2018-Central-Park-Squirrel-Census-Squirrel-Data/vfnx-vebw
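A minimal sketch of a derived attribute, using the exports/imports/trade balance labels from the figure. It assumes the standard definition, trade balance = exports minus imports; the countries and amounts are invented, and `add_trade_balance` is a hypothetical helper.

```python
# Hypothetical figures for illustration only (arbitrary units).
trade = [
    {"country": "A", "exports": 120.0, "imports": 100.0},
    {"country": "B", "exports": 80.0, "imports": 95.0},
]


def add_trade_balance(rows):
    """Return new rows with a derived attribute: exports minus imports."""
    return [
        {**row, "trade_balance": row["exports"] - row["imports"]}
        for row in rows
    ]


derived = add_trade_balance(trade)
for row in derived:
    print(row["country"], row["trade_balance"])
```

The original rows are left untouched, so the derived attribute sits alongside the originals rather than replacing them.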
What?
[Figure: summary of the data abstraction. Data types: items, attributes, links, positions, grids. Dataset types: tables (items in rows, attributes in columns, cell containing value), networks (nodes, links), fields (value in cell), geometry (position). Attribute types: categorical, ordered; ordering direction includes diverging]
Design Process
Characterize Domain Situation
Abstraction: Data & task
• map what and why into generalized terms
– identify tasks that users wish to perform, or already do
– find data types that will support those tasks
• possibly transform/derive if need be
Example: Find good movies
• identify good movies in genres I like
• domain:
– general population, movie enthusiasts
• task: what is a good movie for me?
– highly rated by critics?
– highly rated by audiences?
– successful at the box office?
– similar to movies I liked?
– matches specific genres?
• data: (is it available?)
– yes! data sources IMDB, Rotten Tomatoes...
Example: Find good movies
• one possible choice for data and tasks, in domain language
– data: combine audience ratings and critic ratings
– task: find high-scoring movies for specific genre
• abstractions?
– attribute: audience & critic ratings
• ordinal
– levels: 3 or 5 or 10...
– attribute: genre
• categorical
– levels: < 20
– items: movies
• items: millions
– task: find high values?
• one possible idiom
– stacked bar chart for ratings
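The abstracted task (filter on a categorical attribute, then find high values of the combined ratings) can be sketched as below. The four movies are placeholders, not real data, and summing the two ratings is a simplifying assumption that mirrors stacking them in a bar chart; `top_movies` is a hypothetical helper.

```python
# Made-up sample rows; real data would come from sources such as
# IMDB or Rotten Tomatoes, as named on the previous slide.
movies = [
    {"title": "Movie A", "genre": "horror", "audience": 4, "critic": 3},
    {"title": "Movie B", "genre": "comedy", "audience": 5, "critic": 5},
    {"title": "Movie C", "genre": "horror", "audience": 5, "critic": 4},
    {"title": "Movie D", "genre": "horror", "audience": 2, "critic": 1},
]


def top_movies(rows, genre, n=2):
    """Filter by genre, then rank by audience plus critic rating."""
    in_genre = [m for m in rows if m["genre"] == genre]
    ranked = sorted(
        in_genre,
        key=lambda m: m["audience"] + m["critic"],
        reverse=True,
    )
    return [m["title"] for m in ranked[:n]]


print(top_movies(movies, "horror"))
```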
Example: Horrified
• same task: high-score movies
• slightly different data
– 14K rated horror movies from IMDB
• very different visual encoding idiom
– circle per item (movie)
– circle area = popularity
– stroke width/opacity = avg rating
– year made = vertical position
• interaction idiom
– lines connect movies w/ same director, on mouseover
http://alhadaqa.com/2019/10/horrified/
Task abstraction: Actions and targets
• very high-level pattern
• {action, target} pairs
– discover distribution
– compare trends
– locate outliers
– browse topology
• actions
– analyze
• high-level choices
– search
• find a known/unknown item
– query
• find out about characteristics of item
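The {action, target} pattern can be written down as a small data structure. This sketch covers only the four example pairs from the slide; the class names and the `describe` method are illustrative assumptions, not a complete vocabulary.

```python
from enum import Enum
from typing import NamedTuple


class Action(Enum):
    DISCOVER = "discover"
    COMPARE = "compare"
    LOCATE = "locate"
    BROWSE = "browse"


class Target(Enum):
    DISTRIBUTION = "distribution"
    TRENDS = "trends"
    OUTLIERS = "outliers"
    TOPOLOGY = "topology"


class Task(NamedTuple):
    """An abstract task: one action applied to one target."""
    action: Action
    target: Target

    def describe(self) -> str:
        return f"{self.action.value} {self.target.value}"


tasks = [
    Task(Action.DISCOVER, Target.DISTRIBUTION),
    Task(Action.COMPARE, Target.TRENDS),
    Task(Action.LOCATE, Target.OUTLIERS),
    Task(Action.BROWSE, Target.TOPOLOGY),
]

for task in tasks:
    print(task.describe())
```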
Actions: Analyze
• consume
– discover vs present
• classic split
• aka explore vs explain
– enjoy
• newcomer
• aka casual, social
• produce
– annotate, record
– derive
• crucial design choice
[Figure: Analyze: consume (discover, present, enjoy); produce (annotate, record, derive)]
Derive
• don’t just draw what you’re given!
– decide what the right thing to show is
– create it with a series of transformations from the original dataset
– draw that
• one of the four major strategies for handling complexity
[Figure: exports, imports, trade balance]
[Figure: Task 1 and Task 2, each described by what, why, how]
Actions: Search
• what does user know?
– target, location
• lookup
– ex: word in dictionary
• alphabetical order
• locate
– ex: keys in your house
– ex: node in network
Search             Target known   Target unknown
Location known     Lookup         Browse
Location unknown   Locate         Explore
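The search table can be sketched as a lookup on two yes/no questions: does the user know the target, and do they know its location? The function name `search_action` is a hypothetical choice for this illustration.

```python
def search_action(target_known: bool, location_known: bool) -> str:
    """Classify a search by what the user already knows."""
    table = {
        (True, True): "lookup",
        (False, True): "browse",
        (True, False): "locate",
        (False, False): "explore",
    }
    return table[(target_known, location_known)]


# Word in a dictionary: known target, known location (alphabetical order).
print(search_action(target_known=True, location_known=True))
# Keys in your house: known target, unknown location.
print(search_action(target_known=True, location_known=False))
```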
• which is better?
– depends on goals / task
• enjoy, social context, lots of time
• find 2nd-best rated movie of all time
– Jeopardy call, < 10 seconds to respond!
http://alhadaqa.com/2019/10/horrified/
Actions: Search, query
• what does user know?
– target, location
• how much of the data matters?
[Figure credit: The Economist]
Task abstraction: Targets
• all data: trends, outliers, features
• attributes
– one: distribution, extremes
– many: dependency, correlation, similarity
• network data: topology, paths
• spatial data: shape
[Figure also repeats the search table and the query actions: identify, compare, summarize]
Abstraction
• these {action, target} pairs are good starting point for vocabulary
– but sometimes you'll need more precision!
• rule of thumb
– systematically remove all domain jargon
71
Examples: Job market
• trends
– how did job market develop since recession overall?
• outliers
– real estate related jobs
https://www.nytimes.com/interactive/2014/06/05/upshot/how-the-recession-reshaped-the-economy-in-255-charts.html
Example: Task abstraction in genomics
You have been approached by a geneticist to help with a visualization
problem. She has gene expression data (data that measures the activity of
the genes) for 30 cancer tissue samples. She is applying an experimental
drug to see whether the cancer tissue dies as she hopes, but she finds
that only some samples show the desired effect. She believes that the
difference between the samples is caused by differential expression
(different activity) of genes in a particular pathway, i.e., an interaction
network of genes. She would like to understand which genes are likely to
cause the difference, and what role they play in that pathway.
Example: Task abstraction in genomics
[Figure: actions and targets vocabulary applied to the genomics example; visible labels: annotate, record, derive, locate (location unknown), identify, compare, summarize, extremes, paths]
Why?
• {action, target} pairs
[Figure: summary of actions and targets. Actions: produce (annotate, record, derive), search (target known or unknown, location). Targets: attributes; one: distribution, extremes; many: dependency, correlation, similarity]
Assignments
• Programming 0
• Foundations 1