Meet Up 1 - Introduction to Data Visualization
Introduction to Data Visualization
Anton Suhartono
AGENDA
Review
- Why Data Visualization
- Definition of Data Visualization
- Steps for Creating a Data Visualization
- Types of Data Visualization
- Choosing the Right Chart
- Tools for Data Visualization
- Introduction to Pandas
- Introduction to SQL Databases
Practice
- How to Manipulate Data Using SQL & Python
- Understanding & Importing Data
- Selecting Data Based on Criteria
- Grouping & Aggregation
- Creating a New Column Based on Criteria
Definition of Data Visualization
Why Do We Need Visualization?
Illustration 1
Report 1: Number of Calls per Month
Month        Number of Calls
January      8,994,827
February     6,942,827
March        6,742,927
April        5,273,429
May          4,275,429
June         4,070,429
July         3,900,029
August       3,500,029
September    3,495,029
October      3,422,220
November     3,375,429
December     2,075,429
Report 2: Number of Calls [column chart of monthly calls, Jan to Dec; y-axis from 0 to 10,000,000]
Easy to understand
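As a sketch of how a chart like Report 2 can be produced from the Report 1 table, the snippet below draws the monthly call counts as a column chart with matplotlib. The numbers are copied from Report 1; the figure size, color, and output filename are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a file, no window
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

# Monthly call counts copied from Report 1
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
calls = [8_994_827, 6_942_827, 6_742_927, 5_273_429, 4_275_429, 4_070_429,
         3_900_029, 3_500_029, 3_495_029, 3_422_220, 3_375_429, 2_075_429]

fig, ax = plt.subplots(figsize=(10, 4))
bars = ax.bar(months, calls, color="steelblue")
ax.set_title("Number of Calls per Month")
ax.set_ylabel("Number of Calls")
ax.set_ylim(bottom=0)  # keep the zero baseline
# Show y-axis ticks with thousands separators, as in the table
ax.yaxis.set_major_formatter(FuncFormatter(lambda v, pos: f"{v:,.0f}"))
fig.tight_layout()
fig.savefig("report2_number_of_calls.png")
```

The downward trend that takes effort to spot in the table is visible at a glance in the chart.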
Key Figures in the History of Data Visualization
Illustration 2
Steps for creating a data visualization:
- Develop your research question
- Prepare your data
- Choose the right chart
- Select data vis tools
- Test your draft
Types of Data Visualization
Narrative vs Explorative
Narrative Visual
- Usually used to explain the analyst's final results or conclusions
- Static
- Emphasizes visual appeal
- Explains rather than details
- Easy to understand
Figure 1: A heatmap showing cumulative daily transactions over one year. The chart does not show the data in detail, because its main purpose is to show at which hours the highest and lowest transactions occur.
Narrative vs Explorative
Explorative Visual
- Describes the process carried out to reach the right end result
- Complex & detailed
- Intended for a selective audience
Figure 2: A chart of daily transactions over one year. The chart uses detailed elements to show hourly performance for each day.
Static & Dynamic
Definition
Dynamic Visual
Usually used for presenting periodic reports
- Tableau, D3.js, Plotly Dash, etc.
Static Visual
Usually used for presenting a final report or exploring data
- Matplotlib, PowerPoint, Excel
Choosing the Right Data Visualization
Choosing a Chart Based on Goals
Data Table
Definition:
Data tables display information in a
grid-like format of rows and columns.
Visual Dimensions:
Columns, data values
Usage:
Detailed observation
Scatter Plot
Definition, Usage, Tips & Tricks
Definition:
This chart describes the relationship between two variables. The horizontal X axis holds the independent variable, whose values do not depend on the other variable. The vertical Y axis holds the dependent variable.
Visual Dimensions:
Length, line, dot
Usage:
- Showing the correlation between two variables
- Well suited to large data sets, such as population or epidemiology studies
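A minimal scatter plot sketch, assuming hypothetical data: advertising spend as the independent variable (X) and sales as the dependent variable (Y). It also computes the Pearson correlation coefficient with NumPy so the visual impression can be checked against a number.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
# Hypothetical data: 200 observations with a linear relationship plus noise
ad_spend = rng.uniform(0, 100, size=200)               # independent variable (X)
sales = 3 * ad_spend + rng.normal(0, 20, size=200)     # dependent variable (Y)

# Pearson correlation coefficient between X and Y
r = np.corrcoef(ad_spend, sales)[0, 1]

fig, ax = plt.subplots()
points = ax.scatter(ad_spend, sales, s=10, alpha=0.6)  # small, semi-transparent dots for many points
ax.set_xlabel("Advertising spend (independent variable)")
ax.set_ylabel("Sales (dependent variable)")
ax.set_title(f"Pearson r = {r:.2f}")
fig.savefig("scatter.png")
```

Small, semi-transparent markers keep dense regions readable when the data set is large.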
Scatter Plot
Definition, Usage, Tips & Tricks
- Use lines to show trends & relationships.
- Use as few lines as possible.
- Always start the Y-axis at 0.
Bubble Chart
Definition, Usage, Tips & Tricks
Definition:
A bubble chart is a variation of a scatter chart in which the data points are replaced with bubbles, and an additional dimension of the data is represented by the size of the bubbles.
Visual Dimensions:
Length, line, dot, size, color
Usage:
Showing the correlation between two variables, with a third dimension shown by bubble size
Bubble Chart
Definition, Usage, Tips & Tricks
- Use simple shapes. Circles work best.
- Use clear and visible labels.
- Size bubbles appropriately.
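A minimal bubble chart sketch, assuming hypothetical product data. Price and units sold are the X and Y variables, and revenue is the extra dimension encoded in both bubble size and color. Because matplotlib's `s` parameter is the marker *area*, scaling `s` linearly with revenue keeps bubble area, not radius, proportional to the value, which is one way to "size bubbles appropriately".

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Hypothetical data: one bubble per product
price = np.array([10, 25, 40, 55, 70])
units_sold = np.array([500, 320, 210, 150, 60])
revenue = price * units_sold  # third dimension

# s is an area (points^2): scale linearly so area is proportional to revenue
sizes = 1000 * revenue / revenue.max()

fig, ax = plt.subplots()
bubbles = ax.scatter(price, units_sold, s=sizes, c=revenue,
                     cmap="viridis", alpha=0.6)  # simple circles, semi-transparent
fig.colorbar(bubbles, ax=ax, label="Revenue")
# Clear labels: write the revenue value on each bubble
for x, y, rev in zip(price, units_sold, revenue):
    ax.annotate(f"{int(rev):,}", (x, y), ha="center", va="center", fontsize=8)
ax.set_xlabel("Price")
ax.set_ylabel("Units sold")
fig.savefig("bubble.png")
```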
Column Chart
Definition, Usage, Tips & Tricks
Definition:
Column charts (vertical bar charts) are used to compare a number of categories and/or their changes over a certain time period (trend). When used to display trends, they serve the same purpose as line charts.
Visual Dimensions:
Length, category, color
Usage:
Comparing a number of categories and/or their changes over a certain time period (trend)
Tips & Tricks:
• With multiple categories, use a different color for each category, or use darker colors for more prominent categories.
• This chart becomes difficult to read if it contains too many categories.
• Always use a zero baseline (zero point) on the Y axis.
• Use a consistent scale.
Bar Chart
Definition, Usage, Tips & Tricks
Definition:
Bar charts use horizontal bars to display data and are used to compare
values across categories. The lengths of the bars are proportional to
the values they represent.
Visual Dimensions:
Length, category, color
Usage:
Best suited for comparing data across multiple categories or data series
Tips & Tricks:
• For easier reading, sort the categories by value, for example from the highest to the lowest value.
• Data series are different: the data is distributed across tiered (ordered) categories, for example population by age range or education level, so the categories keep their natural order.
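A minimal sketch of the sorting tip, assuming hypothetical sales per region. `barh` draws the first item at the bottom, so sorting in ascending order places the largest bar at the top, which reads as highest to lowest.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Hypothetical sales per region (unordered categories)
sales = pd.Series({"North": 120, "South": 340, "East": 210, "West": 90, "Central": 275})

# Ascending sort: barh draws from bottom to top, so the largest ends up on top
sorted_sales = sales.sort_values(ascending=True)

fig, ax = plt.subplots()
ax.barh(sorted_sales.index, sorted_sales.values, color="steelblue")
ax.set_xlabel("Sales")
ax.set_xlim(left=0)  # bar lengths must start from zero to be comparable
fig.tight_layout()
fig.savefig("bar_sorted.png")
```

For a data series such as age ranges, skip the sort and plot the categories in their natural order instead.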
Histogram
Definition, Usage, Tips & Tricks
Definition:
A graphical display of data using bars of different heights. At first glance this chart looks similar to a bar/column chart, but there is a fundamental difference: in a histogram each bar covers an adjacent interval (bin) of a continuous variable, so the bars are placed as close as possible, even touching. Visually, this narrow spacing leads the reader's eye to connect groups of data and order them based on certain criteria.
Visual Dimensions:
Length, category, color
Usage:
Displays the shape and spread of continuous sample data
Tips & Tricks:
• Always use a zero baseline (zero point) on the Y axis.
• Leave no space between the bars (bins).
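A minimal histogram sketch, assuming a hypothetical sample of 1,000 delivery times drawn from a normal distribution. `ax.hist` splits the continuous range into 20 adjacent bins and draws touching bars; a dark edge color separates the bars without opening gaps between them.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Hypothetical continuous sample: delivery times in minutes
delivery_minutes = rng.normal(loc=30, scale=5, size=1000)

fig, ax = plt.subplots()
# 20 adjacent bins spanning min..max of the data; bars touch each other
counts, bin_edges, patches = ax.hist(delivery_minutes, bins=20, edgecolor="black")
ax.set_xlabel("Delivery time (minutes)")
ax.set_ylabel("Frequency")
fig.savefig("histogram.png")
```

The number of bins is a judgment call: too few hides the shape, too many makes it noisy.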
Column Chart vs Histogram
Box Plot
Definition, Usage, Tips & Tricks
Definition:
Box plots visually show the distribution of numerical data and its skewness by displaying the data quartiles (or percentiles) and the median.
Usage:
Showing the distribution of data and identifying outliers
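The numbers behind a box plot can be computed directly. This sketch, on a hypothetical sample with two extreme values, finds the quartiles and flags outliers with Tukey's 1.5 × IQR rule, which is also matplotlib's default whisker rule (`whis=1.5`), so the points drawn as fliers by `ax.boxplot` match the computed outliers.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Hypothetical sample with two extreme values
data = np.array([12, 15, 14, 10, 18, 16, 13, 17, 11, 15, 14, 95, 120])

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1                       # interquartile range (height of the box)
lower_fence = q1 - 1.5 * iqr        # Tukey's rule
upper_fence = q3 + 1.5 * iqr
outliers = data[(data < lower_fence) | (data > upper_fence)]

fig, ax = plt.subplots()
bp = ax.boxplot(data)               # default whis=1.5, same fences as above
ax.set_ylabel("Value")
fig.savefig("boxplot.png")

print(f"Q1={q1}, median={median}, Q3={q3}, outliers={outliers.tolist()}")
```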
Pie Chart
Definition, Usage, Tips & Tricks
Definition:
Used to describe the composition of the parts of a unified whole. Each part is usually expressed as a percentage, so all the parts add up to one hundred percent.
Visual Dimensions:
Proportion/Percentage, Category, Color
Usage:
Showing the percentage of each category in the data
Pie Chart
Definition, Usage, Tips & Tricks
- Less is more: no more than 5 categories.
- Clearly label percentages to avoid misinterpretation of the segment sizes.
- Avoid 3D pie charts; they make the data more difficult to understand.
- Order the slices so that they are quickly understood.
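A minimal sketch of these tips, assuming hypothetical market-share data with seven categories. It keeps the four largest, folds the rest into a hypothetical "Others" slice to stay within 5 categories, orders the slices clockwise from the top, and prints each percentage on its slice with `autopct`.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Hypothetical market share (%) for seven products
raw = pd.Series({"A": 40, "B": 25, "C": 15, "D": 8, "E": 5, "F": 4, "G": 3})

# Keep the 4 largest categories and group the rest into "Others" (max 5 slices)
top = raw.sort_values(ascending=False)
share = top.iloc[:4].copy()
share["Others"] = int(top.iloc[4:].sum())

fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(
    share.values,
    labels=share.index,
    autopct="%1.0f%%",      # clear percentage label on every slice
    startangle=90,          # first slice starts at 12 o'clock
    counterclock=False,     # slices run clockwise, largest first
)
ax.set_aspect("equal")      # flat 2D circle, no 3D effect
fig.savefig("pie.png")
```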
Donut Chart
Definition, Usage, Tips & Tricks
Definition:
This chart is another form of the pie chart; it also represents the proportion or composition of parts, which together total one hundred percent. Because it looks simpler, this chart is also often modified into a semicircle.
Visual Dimensions:
Proportion/Percentage, Category, Color
Usage:
Showing the percentage of each category in the data
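In matplotlib a donut chart is a pie chart whose wedges have a limited ring width. This sketch, on hypothetical sales-by-channel shares, sets `wedgeprops={"width": 0.4}` to cut out the centre and then uses the empty centre for a short title.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Hypothetical sales composition by channel (sums to 100%)
labels = ["Online", "Store", "Phone"]
shares = [60, 30, 10]

fig, ax = plt.subplots()
# A ring width below 1 (the default radius) removes the centre of the pie
wedges, texts, autotexts = ax.pie(
    shares, labels=labels, autopct="%1.0f%%",
    startangle=90, counterclock=False,
    pctdistance=0.8,               # place percentages inside the ring
    wedgeprops={"width": 0.4},
)
ax.text(0, 0, "Sales\nby channel", ha="center", va="center")  # use the empty centre
ax.set_aspect("equal")
fig.savefig("donut.png")
```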
Text & Number
Definition, Usage, Tips & Tricks
Definition:
Data does not have to be presented in graphical form. You can use text and numbers only, provided there are just 1-2 data points to display. Make the number or text you want to highlight bold or colored so that the reader's attention is focused on that part.
Visual Dimensions:
text
Usage:
Summarizing data
Tips & Tricks:
• Clear text
Line Chart
Definition, Usage, Tips & Tricks
Definition:
A type of chart that displays information as a series of data points called "markers" connected by straight line segments. The X axis usually represents the time period; the Y axis represents the value/quantity.
Visual Dimensions:
Length, Series of time, Line
Usage:
Time series Data
MultiLine Chart
Definition, Usage, Tips & Tricks
Definition:
A basic line chart with one or more additional lines that represent trends for comparison.
Visual Dimensions:
Length, Series of time, Line, color
Usage:
Comparing time series data
MultiLine Chart
Definition, Usage, Tips & Tricks
- Use a maximum of 4 lines when comparing.
- Use as few lines as possible.
- Use solid lines instead of dashed or dotted lines.
- Label each line separately.
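A minimal multi-line chart sketch that follows these tips, assuming hypothetical monthly revenue for three product lines: three solid lines (under the limit of four), each labeled directly at its last point instead of through a legend. Numeric x positions are used and then relabeled with month names, so the annotation coordinates stay plain numbers.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Hypothetical monthly revenue for three product lines
df = pd.DataFrame(
    {"Product A": [100, 120, 130, 150, 170, 180],
     "Product B": [80, 85, 95, 90, 100, 110],
     "Product C": [60, 70, 65, 75, 80, 95]},
    index=["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
)

x = range(len(df))
fig, ax = plt.subplots()
for name in df.columns:
    (line,) = ax.plot(x, df[name], linestyle="-", linewidth=2)  # solid line
    # Label each line directly at its last point, in the line's own color
    ax.annotate(name, xy=(x[-1], df[name].iloc[-1]), xytext=(5, 0),
                textcoords="offset points", color=line.get_color(), va="center")
ax.set_xticks(list(x))
ax.set_xticklabels(df.index)
ax.set_ylabel("Revenue")
fig.tight_layout()
fig.savefig("multiline.png")
```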
Area Chart
Definition, Usage, Tips & Tricks
Definition:
Displays quantitative data graphically. It is based on the line chart; the area between the axis and the line is commonly emphasized with colors, textures, and hatching.
Visual Dimensions:
Length, Category, Area, Color
Usage:
Used to illustrate total values, in numbers or percentages, over time
Tips & Tricks:
Don't let any area cover other areas.
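One way to keep areas from covering each other is to stack them. This sketch uses matplotlib's `stackplot` on hypothetical monthly orders per sales channel: each area is drawn on top of the previous one, so none is hidden, and the top edge of the stack shows the total over time.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Hypothetical monthly orders per sales channel
months = np.arange(1, 7)
online = np.array([10, 12, 15, 18, 20, 24])
store = np.array([20, 19, 18, 18, 17, 16])
phone = np.array([5, 5, 4, 4, 3, 3])

fig, ax = plt.subplots()
# Stacked areas: each series starts where the previous one ends
areas = ax.stackplot(months, online, store, phone,
                     labels=["Online", "Store", "Phone"], alpha=0.8)
total = online + store + phone
ax.plot(months, total, color="black", linewidth=1)  # outline of the total
ax.set_xlabel("Month")
ax.set_ylabel("Orders")
ax.legend(loc="upper left")
fig.savefig("area_stacked.png")
```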
Heat Map
Definition, Usage, Tips & Tricks
Definition:
Shows the relationship between two variables, one plotted on each axis. By observing how cell colors change across each axis, you can see whether there are any patterns in the values of one or both variables.
Visual Dimensions:
Color, Variables
Usage:
Showing the relationship between two variables
Heat Map
Definition, Usage, Tips & Tricks
- Use simple color gradients.
- Keep patterns to a minimum.
- Use clear map boundaries.
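A sketch of a heatmap in the spirit of Figure 1, assuming a hypothetical log of 5,000 random transaction timestamps in 2023. `pd.crosstab` counts transactions per weekday (one variable) and hour (the other), and `imshow` renders the grid with a single-hue sequential colormap, a simple color gradient.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# Hypothetical transaction log: 5,000 random timestamps during 2023
start = pd.Timestamp("2023-01-01")
offsets = pd.to_timedelta(rng.integers(0, 365 * 24 * 60, size=5000), unit="min")
tx = pd.DataFrame({"timestamp": start + offsets})
tx["weekday"] = tx["timestamp"].dt.day_name()
tx["hour"] = tx["timestamp"].dt.hour

# Count transactions per (weekday, hour) cell, in a fixed 7 x 24 layout
order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
grid = (pd.crosstab(tx["weekday"], tx["hour"])
          .reindex(index=order, columns=range(24), fill_value=0))

fig, ax = plt.subplots(figsize=(12, 4))
im = ax.imshow(grid.values, cmap="Blues", aspect="auto")  # single-hue gradient
ax.set_yticks(range(len(order)))
ax.set_yticklabels(order)
ax.set_xticks(range(24))
ax.set_xticklabels([str(h) for h in grid.columns])
ax.set_xlabel("Hour of day")
fig.colorbar(im, ax=ax, label="Number of transactions")
fig.savefig("heatmap.png")
```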
Social Network
Definition, Usage, Tips & Tricks
Definition:
A social network diagram visually displays the relationships
and interactions between people, groups, computers and
other information entities. It maps out the nodes (individuals
or groups) and the links (relationships or interactions) that
connect them.
Visual Dimensions:
Dot, size, line
Usage:
Money transactions, social media interactions
Word Cloud
Definition, Usage, Tips & Tricks
Definition:
Highlights popular values or shows the frequency of text data using font size and color. In a word cloud, more prominent values are displayed in a larger font than less prominent values.
Visual Dimensions:
Text, size
Usage:
Popular topics in social media or text
Tips & Tricks:
Filter out unnecessary words, prefixes, etc.
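The data behind a word cloud is a word-frequency table. This standard-library sketch, on a few hypothetical social media posts, tokenizes the text, filters out a small hypothetical stop-word list (the "unnecessary words" tip), counts frequencies, and maps each count linearly to a font size between 10 and 60.

```python
import re
from collections import Counter

# Hypothetical social media posts
posts = [
    "The new phone is great, the camera is amazing!",
    "Camera quality on the new phone is amazing",
    "Is the battery of the new phone good?",
]
# Hypothetical, deliberately small stop-word list; real projects use a fuller one
stop_words = {"the", "is", "on", "of", "a", "an", "and"}

# Lowercase, keep alphabetic tokens only, drop stop words
tokens = [w for post in posts
          for w in re.findall(r"[a-z]+", post.lower())
          if w not in stop_words]
freq = Counter(tokens)

# Map frequency linearly to a font size: most frequent word gets max_size
min_size, max_size = 10, 60
lo, hi = min(freq.values()), max(freq.values())
font_size = {w: min_size + (c - lo) * (max_size - min_size) / max(hi - lo, 1)
             for w, c in freq.items()}

print(freq.most_common(3))
```

To render the frequencies as an actual word cloud, the third-party `wordcloud` package offers `WordCloud().generate_from_frequencies(freq)`; it is not part of the standard library.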
Sankey Chart
Definition, Usage, Tips & Tricks
Definition:
A visualization used to depict a flow from one set of values to another. The things being connected are called nodes, and the connections are called links.
Visual Dimensions:
Nodes, link
Usage:
- A many-to-many mapping between two domains
- Multiple paths through a set of stages (for instance, Google Analytics uses Sankey diagrams to show how traffic flows from pages to other pages on your website)
Map Chart
Definition, Usage, Tips & Tricks
Definition:
Map charts allow you to position your data in a context,
often geographical, using different layers. The layers can be
either data layers, such as marker layers or feature layers,
or reference layers such as map layers.
Visual Dimensions:
marker, map, data
Usage:
Understanding the characteristics of the data in a selected region
Tools for Data Visualization
Types of Data Visualization Tools
Definition, Usage, Tips & Tricks
• EXCEL FILE
The .xlsx extension is used for files saved as Microsoft Excel worksheets.
    import pandas as pd
    # reading an Excel file (.xlsx files need the openpyxl package installed)
    data = pd.read_excel("data.xlsx", sheet_name="sheet_name")
• JSON FILE
JSON stands for JavaScript Object Notation. JSON is a lightweight format for storing and transporting data.
    # reading a JSON file
    data = pd.read_json("files/sample_file.json", orient="index")
• PICKLE FILE
Python pickle files are binary files that store the data and hierarchy of Python objects. They usually have the extension .pickle or .pkl.
    # reading a pickle file
    data = pd.read_pickle("./dummy.pkl")
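To try the JSON and pickle readers without having the original files, this sketch writes a small hypothetical DataFrame to a temporary folder and reads it back. It shows what `orient="index"` means (each row label becomes a top-level JSON key) and that a pickle restores the DataFrame exactly. The file names mirror the examples above but live in a temporary directory.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data with row labels r1, r2
df = pd.DataFrame({"name": ["Ani", "Budi"], "score": [85, 90]}, index=["r1", "r2"])

with tempfile.TemporaryDirectory() as tmp:
    json_path = os.path.join(tmp, "sample_file.json")
    pkl_path = os.path.join(tmp, "dummy.pkl")

    # orient="index" writes {"r1": {"name": "Ani", "score": 85}, "r2": {...}}
    df.to_json(json_path, orient="index")
    from_json = pd.read_json(json_path, orient="index")

    # Pickle stores the Python object itself (dtypes, index and all)
    df.to_pickle(pkl_path)
    from_pickle = pd.read_pickle(pkl_path)

print(from_json)
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.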
• Structured Query Language (SQL) is a programming language typically used in relational database management systems (RDBMS).
• We use SQL to communicate with databases directly.
• It can perform tasks such as creating, reading, updating, and deleting tables and the data in them.
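The practice topics (selecting data by criteria, grouping and aggregation, creating a new column) can be done either in SQL or in pandas. This sketch uses Python's built-in `sqlite3` with an in-memory database and hypothetical call data, runs the same filter-and-group query in SQL via `pd.read_sql_query` and in pandas, and then adds a new column based on a criterion (the 100-call threshold is an arbitrary example).

```python
import sqlite3

import pandas as pd

# Hypothetical call data
calls = pd.DataFrame({
    "region": ["North", "North", "South", "South", "East"],
    "month": ["Jan", "Feb", "Jan", "Feb", "Jan"],
    "num_calls": [120, 90, 200, 150, 80],
})

conn = sqlite3.connect(":memory:")
calls.to_sql("calls", conn, index=False)  # load the DataFrame into a SQL table

# SQL: select by criteria, then group and aggregate
sql_result = pd.read_sql_query(
    """
    SELECT region, SUM(num_calls) AS total_calls
    FROM calls
    WHERE num_calls >= 90
    GROUP BY region
    ORDER BY total_calls DESC
    """,
    conn,
)
conn.close()

# pandas: the same query with boolean filtering and groupby
pandas_result = (calls[calls["num_calls"] >= 90]
                 .groupby("region", as_index=False)["num_calls"].sum()
                 .rename(columns={"num_calls": "total_calls"})
                 .sort_values("total_calls", ascending=False)
                 .reset_index(drop=True))

# New column based on a criterion
calls["level"] = calls["num_calls"].apply(lambda n: "high" if n >= 100 else "low")
print(sql_result)
```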
Table
Commands in SQL
DDL (Data Definition Language), DML (Data Manipulation Language), DCL (Data Control Language)
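A sketch of the command groups using Python's built-in `sqlite3` and a hypothetical `customers` table. DDL statements define the table structure; DML statements insert, update, delete, and read rows (some texts put `SELECT` in a separate DQL group). SQLite has no user permissions, so DCL commands such as `GRANT` are shown only as a comment for server databases like PostgreSQL or MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: define or change the structure of the table
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)")
cur.execute("ALTER TABLE customers ADD COLUMN email TEXT")

# DML: create, update and delete rows (hypothetical data)
cur.executemany("INSERT INTO customers (name, city) VALUES (?, ?)",
                [("Ani", "Jakarta"), ("Budi", "Bandung"), ("Citra", "Surabaya")])
cur.execute("UPDATE customers SET city = ? WHERE name = ?", ("Jakarta", "Budi"))
cur.execute("DELETE FROM customers WHERE name = ?", ("Citra",))
conn.commit()

# DML (or DQL): read rows
rows = cur.execute("SELECT name, city FROM customers ORDER BY id").fetchall()
print(rows)

# DCL is not supported by SQLite; on a server RDBMS it would look like:
#   GRANT SELECT ON customers TO analyst;
#   REVOKE SELECT ON customers FROM analyst;
conn.close()
```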
Thank You
Ikan hiu makan kecap ("A shark eats soy sauce")
See you at the next meetup!