Descriptive Statistics - Tabular & Graphical
Decisions
Descriptive Statistics
Summarizing Data for a Categorical Variable
◼ Frequency Distribution
◼ Relative Frequency Distribution
◼ Percent Frequency Distribution
◼ Bar Chart
◼ Pie Chart
Frequency Distribution
◼ A frequency distribution is a tabular summary of data showing the
number (frequency) of observations in each of several non-overlapping
categories or classes.
◼ The objective is to provide insights about the data that cannot be
quickly obtained by looking only at the original data.
Frequency Distribution
Example: Marada Inn
Guests staying at Marada Inn were asked to rate the quality of their
accommodations as being excellent, above average, average, below average,
or poor. The ratings provided by a sample of 20 guests are summarized below:
Rating           Frequency
Poor                 2
Below Average        3
Average              5
Above Average        9
Excellent            1
Total               20
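A frequency distribution like this one can be tallied directly from raw responses. The sketch below is illustrative only: the slides do not list the 20 individual ratings, so the `ratings` list is a hypothetical reconstruction built to match the tabulated counts, and the function name is an assumption.

```python
from collections import Counter

# Rating categories in their natural order, worst to best.
CATEGORIES = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]

# Hypothetical raw data: the individual responses are not shown in the slides,
# so this list is rebuilt from the tabulated counts (2, 3, 5, 9, 1).
ratings = (["Poor"] * 2 + ["Below Average"] * 3 + ["Average"] * 5
           + ["Above Average"] * 9 + ["Excellent"] * 1)


def frequency_distribution(data, categories):
    """Count how many observations fall in each non-overlapping category."""
    counts = Counter(data)
    return {category: counts[category] for category in categories}


freq = frequency_distribution(ratings, CATEGORIES)
for category, count in freq.items():
    print(f"{category:<15}{count:>3}")
print(f"{'Total':<15}{sum(freq.values()):>3}")
```

Passing the category list explicitly keeps the rows in rating order and reports a zero for any category that no guest chose.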
Relative Frequency Distribution
◼ The relative frequency of a class is the fraction or proportion of the total
number of data items belonging to the class.
◼ A relative frequency distribution is a tabular summary of a set of data
showing the relative frequency for each class.
Percent Frequency Distribution
◼ The percent frequency of a class is the relative frequency multiplied by
100.
◼ A percent frequency distribution is a tabular summary of a set of data
showing the percent frequency for each class.
Relative & Percent Frequency Distribution
Example: Marada Inn
Rating           Relative Frequency   Percent Frequency
Poor                    .10                  10
Below Average           .15                  15
Average                 .25                  25
Above Average           .45                  45
Excellent               .05                   5
Total                  1.00                 100
For example, the relative frequency for Excellent is 1/20 = .05, and the
percent frequency for Poor is .10(100) = 10.
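The two definitions translate into one division and one multiplication per class. The following is a minimal sketch using the Marada Inn frequencies; the function names are illustrative assumptions.

```python
# Frequencies from the Marada Inn frequency distribution.
frequencies = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}


def relative_frequencies(freq):
    """Relative frequency = class frequency / total number of data items."""
    total = sum(freq.values())
    return {label: count / total for label, count in freq.items()}


def percent_frequencies(freq):
    """Percent frequency = relative frequency x 100."""
    total = sum(freq.values())
    return {label: 100 * count / total for label, count in freq.items()}


relative = relative_frequencies(frequencies)
percent = percent_frequencies(frequencies)
for label in frequencies:
    print(f"{label:<15}{relative[label]:>6.2f}{percent[label]:>6.0f}")
```

The relative frequencies sum to 1 and the percent frequencies to 100, which is a quick check that no class was missed.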
Bar Chart
◼ A bar chart is a graphical display for depicting qualitative data.
◼ On one axis (usually the horizontal axis), we specify the labels that are
used for each of the classes.
◼ A frequency, relative frequency, or percent frequency scale can be used
for the other axis (usually the vertical axis).
◼ Using a bar of fixed width drawn above each class label, we extend the
height appropriately.
◼ The bars are separated to emphasize the fact that each class is a
separate category.
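As a rough illustration of these rules, the sketch below draws a text-only bar chart for the Marada Inn frequencies. It is an assumption-laden stand-in for a real plotting tool: the bars run horizontally rather than vertically, and the function name and `#` symbol are arbitrary choices.

```python
# Frequencies from the Marada Inn frequency distribution.
frequencies = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}


def text_bar_chart(freq, symbol="#"):
    """Return one line per class: label, a bar as long as the frequency, and the count."""
    label_width = max(len(label) for label in freq)
    lines = []
    for label, count in freq.items():
        bar = symbol * count  # fixed bar thickness; only the length varies
        lines.append(f"{label:<{label_width}} | {bar} ({count})")
    return lines


chart_lines = text_bar_chart(frequencies)
print("\n".join(chart_lines))
```

Each class gets its own line, which mirrors the separation between bars that marks the classes as distinct categories.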
Bar Chart – Marada Inn
[Figure: bar chart of guest quality ratings for Marada Inn; vertical axis: frequency]
Pareto Chart
◼ A Pareto chart is a type of chart that contains both bars and a line
graph, where individual values are represented in descending order by
bars, and the cumulative total is represented by the line.
◼ The chart is named after Vilfredo Pareto, an Italian economist.
◼ The main purpose of a Pareto chart is to separate the “vital few” from the
“trivial many”.
◼ In quality control, Pareto charts are used to identify the most important
causes of problems.
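The numbers behind a Pareto chart are the class frequencies sorted in descending order plus a running cumulative percentage. The sketch below computes them for the Marada Inn ratings; the function name and row layout are illustrative assumptions.

```python
# Frequencies from the Marada Inn frequency distribution.
frequencies = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}


def pareto_table(freq):
    """Return (label, frequency, cumulative percent) rows, most frequent class first."""
    total = sum(freq.values())
    # sorted() is stable, so classes with equal frequency keep their input order.
    ordered = sorted(freq.items(), key=lambda item: item[1], reverse=True)
    rows = []
    running = 0
    for label, count in ordered:
        running += count
        rows.append((label, count, 100 * running / total))
    return rows


pareto_rows = pareto_table(frequencies)
for label, count, cumulative in pareto_rows:
    print(f"{label:<15}{count:>3}{cumulative:>7.0f}%")
```

The bars of the chart use the second column and the line uses the third, which always ends at 100%.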
Vilfredo Pareto (1848-1923)
Pareto Chart – Marada Inn
[Figure: Pareto chart of the ratings; left axis: Frequency, right axis: Cumulative Percentage (0% to 100%)]
Pie Chart
◼ The pie chart is a commonly used graphical display for presenting
relative frequency and percent frequency distributions for categorical
data.
◼ First draw a circle; then use the relative frequencies to subdivide the
circle into sectors that correspond to the relative frequency for each
class.
◼ Since there are 360 degrees in a circle, a class with a relative frequency
of 0.25 would consume 0.25(360) = 90 degrees of the circle.
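The sector size for every class follows the same rule as the 0.25(360) = 90 example. A minimal sketch for the Marada Inn ratings is shown below; the function name is an illustrative assumption.

```python
# Frequencies from the Marada Inn frequency distribution.
frequencies = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}


def sector_angles(freq):
    """Sector angle in degrees = relative frequency x 360."""
    total = sum(freq.values())
    return {label: 360 * count / total for label, count in freq.items()}


angles = sector_angles(frequencies)
for label, angle in angles.items():
    print(f"{label:<15}{angle:>6.0f} degrees")
```

The angles add up to 360 degrees, so the sectors fill the circle exactly.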
Pie Chart – Marada Inn
[Figure: pie chart of percent frequencies by rating; labels recovered: Excellent 5%, Poor 10%, Average 25%]
Pie Chart – Marada Inn.
Insights Gained from the Preceding Pie Chart
◼ One-half of the customers surveyed gave Marada Inn a quality rating of
“above average” or “excellent”. This might please the manager.
◼ For each customer who gave an “excellent” rating there were two
customers who gave a “poor” rating (looking at the top of the pie).
This should displease the manager.
Summarizing Data for a Quantitative Variable
◼ Frequency Distribution
◼ Relative Frequency Distribution
◼ Percent Frequency Distribution
◼ Histogram
◼ Cumulative Distributions
Frequency Distribution
Example: Hudson Auto Repair
The manager of Hudson Auto would like to gain a better understanding of
the cost of parts used in the engine tune-ups performed in the shop. She
examines 50 customer invoices for tune-ups. The costs of parts, rounded to
the nearest dollar, are listed on the next slide.
Frequency Distribution
Example: Hudson Auto Repair
Sample of Parts Cost($) for 50 Tune-ups
52 54 62 62 62 65 66 67 67 68
68 68 69 69 69 71 71 72 72 73
74 74 75 75 75 76 77 78 79 79
80 81 81 82 83 85 88 89 91 93
97 97 97 98 99 101 104 105 105 109
Frequency Distribution
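The three steps necessary to define the classes for a frequency distribution with
quantitative data are:
◼ Step 1: Determine the number of non-overlapping classes.
◼ Step 2: Determine the width of each class.
◼ Step 3: Determine the class limits.
For the number of classes, the goal is to use enough classes to show the variation
in the data, but not so many classes that some contain only a few data items.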
Frequency Distribution
Guidelines for Determining the Width of Each Class
◼ Use classes of equal width.
◼ Approximate Class Width = (Largest Data Value - Smallest Data Value) / Number of Classes
Making the classes the same width reduces the chance of inappropriate
interpretations.
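As a worked example of the approximate class width, the sketch below uses the smallest and largest Hudson Auto Repair parts costs (52 and 109). The choice of six classes is an assumption made for illustration, as is rounding the result up to a whole dollar.

```python
import math


def approximate_class_width(largest, smallest, number_of_classes):
    """(largest data value - smallest data value) / number of classes."""
    return (largest - smallest) / number_of_classes


# Smallest and largest parts costs in the Hudson Auto Repair sample.
# Six classes is an assumed choice, not something the guideline dictates.
raw_width = approximate_class_width(largest=109, smallest=52, number_of_classes=6)

# Rounding up to a convenient whole number is a judgment call by the analyst.
class_width = math.ceil(raw_width)

print(raw_width, class_width)  # 9.5 10
```

Trying other class counts with the same function is the trial-and-error process that choosing a class width usually involves.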
Frequency Distribution
Note on Number of Classes and Class Width
◼ In practice, the number of classes and the appropriate class width are
determined by trial and error.
◼ Once a possible number of classes is chosen, the appropriate class width is
found.
◼ The process can be repeated for a different number of classes.
◼ Ultimately, the analyst uses judgment to determine the combination of the
number of classes and class width that provides the best frequency
distribution for summarizing the data.
Frequency Distribution
Guidelines for Determining the Class Limits
◼ Class limits must be chosen so that each data item belongs to one and only
one class.
◼ The lower class limit identifies the smallest possible data value assigned to
the class.
◼ The upper class limit identifies the largest possible data value assigned to
the class.
◼ The appropriate values for the class limits depend on the level of accuracy
of the data.
An open-end class requires only a lower class limit or an upper class limit.
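The requirement that each data item belong to one and only one class can be checked mechanically. The sketch below bins the Hudson Auto Repair costs using assumed class limits of 50-59, 60-69, and so on up to 100-109; those limits and the function name are illustrative choices, suited to data rounded to the nearest dollar.

```python
# Parts costs ($) for the 50 tune-ups, as listed in the Hudson Auto Repair example.
parts_cost = [
    52, 54, 62, 62, 62, 65, 66, 67, 67, 68,
    68, 68, 69, 69, 69, 71, 71, 72, 72, 73,
    74, 74, 75, 75, 75, 76, 77, 78, 79, 79,
    80, 81, 81, 82, 83, 85, 88, 89, 91, 93,
    97, 97, 97, 98, 99, 101, 104, 105, 105, 109,
]

# Assumed class limits: six classes of width 10, (lower limit, upper limit).
class_limits = [(lower, lower + 9) for lower in range(50, 110, 10)]


def frequency_distribution(data, limits):
    """Tally the data by class, insisting that every value fits exactly one class."""
    counts = {limit: 0 for limit in limits}
    for value in data:
        matches = [(lo, hi) for lo, hi in limits if lo <= value <= hi]
        if len(matches) != 1:
            raise ValueError(f"{value} belongs to {len(matches)} classes")
        counts[matches[0]] += 1
    return counts


class_counts = frequency_distribution(parts_cost, class_limits)
for (lo, hi), count in class_counts.items():
    print(f"{lo}-{hi}: {count}")
```

Because the data are whole dollars, limits such as 50-59 and 60-69 leave no gap and no overlap; data recorded to the cent would need limits such as 50.00-59.99.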
Cumulative Distributions
Example: Hudson Auto Repair
Cost ($)   Cumulative Frequency   Cumulative Relative Frequency   Cumulative Percent Frequency
< 60                 2                       .04                              4
< 70                15                       .30                             30
< 80                31                       .62                             62
< 90                38                       .76                             76
< 100               45                       .90                             90
< 110               50                      1.00                            100
For the < 70 row: 15 = 2 + 13, .30 = 15/50, and 30 = .30(100).
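A cumulative distribution can also be computed straight from the sorted data by counting the values below each upper bound. The sketch below does this for the 50 parts costs as listed in the example; the function name is an illustrative assumption. One caution: tallying the listed values gives 30 costs below $80 rather than the 31 shown in the table, so the listing and the table appear to differ by one observation near the 79/80 boundary. The other rows agree.

```python
from bisect import bisect_left

# Parts costs ($) for the 50 tune-ups, as listed in the Hudson Auto Repair example.
parts_cost = sorted([
    52, 54, 62, 62, 62, 65, 66, 67, 67, 68,
    68, 68, 69, 69, 69, 71, 71, 72, 72, 73,
    74, 74, 75, 75, 75, 76, 77, 78, 79, 79,
    80, 81, 81, 82, 83, 85, 88, 89, 91, 93,
    97, 97, 97, 98, 99, 101, 104, 105, 105, 109,
])

upper_bounds = [60, 70, 80, 90, 100, 110]


def cumulative_distribution(sorted_data, bounds):
    """Return (bound, cumulative freq, cumulative relative freq, cumulative percent) rows."""
    n = len(sorted_data)
    rows = []
    for bound in bounds:
        # bisect_left gives the number of values strictly less than the bound.
        cumulative = bisect_left(sorted_data, bound)
        rows.append((bound, cumulative, cumulative / n, 100 * cumulative / n))
    return rows


cumulative_rows = cumulative_distribution(parts_cost, upper_bounds)
for bound, freq, rel, pct in cumulative_rows:
    print(f"< {bound:<4}{freq:>4}{rel:>7.2f}{pct:>6.0f}")
```

The last row always equals the sample size, a relative frequency of 1.00 and 100 percent.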
Descriptive Statistics -
Tabular and Graphical Displays
◼ Summarizing Data for Two Variables using Graphical Displays
◼ Summarizing Data for Two Variables using Tables
◼ Data Visualization: Best Practices in Creating Effective Graphical Displays
Summarizing Data for Two Variables
Using Graphical Displays
◼ In most cases, a graphical display is more useful than a table for
recognizing patterns and trends.
◼ Displaying data in creative ways can lead to powerful insights.
◼ Scatter diagrams and trendlines are useful in exploring the relationship
between two variables.
Scatter Diagram and Trendline
[Figures: three scatter diagrams plotted against x, showing a positive relationship, a negative relationship, and no apparent relationship]
Scatter Diagram & Trendline
Example: Panthers Football Team
The Panthers football team is interested in investigating the relationship, if
any, between interceptions made and points scored.
x = Number of Interceptions   y = Number of Points Scored
            1                            14
            3                            24
            2                            18
            1                            17
            3                            30
Note: In competitive team ball sports such as rugby, an interception (or pick) is a
play in which a pass, made by foot or hand depending on the rules of the sport, is
intended for a teammate but caught by a player of the opposing team, who thereby
usually gains possession of the ball for their team.
Scatter Diagram & Trendline
[Figure: scatter diagram with trendline for the Panthers data; x-axis: Number of Interceptions (0 to 4), y-axis: Number of Points Scored (0 to 35)]
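The slides do not say how the trendline is fitted. A common choice is the ordinary least-squares line, and the sketch below computes one for the five Panthers observations under that assumption; the function and variable names are illustrative.

```python
# Panthers data: x = interceptions made, y = points scored.
interceptions = [1, 3, 2, 1, 3]
points_scored = [14, 24, 18, 17, 30]


def least_squares_line(x, y):
    """Return (slope, intercept) of the ordinary least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = least_squares_line(interceptions, points_scored)
print(f"points = {intercept:.2f} + {slope:.2f} * interceptions")
```

For these data the slope is about 5.75 and the intercept about 9.1. The positive slope matches the upward pattern of a positive relationship, although five observations are far too few to draw firm conclusions.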
Cross-Tabulation
◼ The left and top margin labels define the classes for the two variables.
Cross-Tabulation
Example: Lake Homes
◼ The number of Lake homes sold for each style and price range over the past two
years is shown below. Price range is a quantitative variable; home style is a
categorical variable.
                          Home Style
Price Range     Colonial   Log   Split   A-Frame   Total
< $200,000         18        6     19       12       55
≥ $200,000         12       14     16        3       45
Total              30       20     35       15      100
Cross-Tabulation
Example: Lake Homes
Insights Gained from Preceding Crosstabulation
◼ The greatest number of homes (19) in the sample are a split-level style and
priced at less than $200,000.
◼ Only three homes in the sample are an A-Frame style and priced at
$200,000 or more.
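The margins and the two insights can be reproduced from the eight cell counts. The sketch below is one way to do so with plain dictionaries; the variable names are illustrative assumptions, and `>=` stands in for "$200,000 or more".

```python
styles = ["Colonial", "Log", "Split", "A-Frame"]

# Cell counts from the Lake Homes crosstabulation, one list per price range.
cell_counts = {
    "< $200,000": [18, 6, 19, 12],
    ">= $200,000": [12, 14, 16, 3],
}

# Row totals: frequency distribution of the price range variable.
row_totals = {price: sum(counts) for price, counts in cell_counts.items()}

# Column totals: frequency distribution of the home style variable.
column_totals = {
    style: sum(column)
    for style, column in zip(styles, zip(*cell_counts.values()))
}

grand_total = sum(row_totals.values())

# Flatten to (price range, style) -> count to find the extreme cells.
cells = {
    (price, style): count
    for price, counts in cell_counts.items()
    for style, count in zip(styles, counts)
}
largest_cell = max(cells, key=cells.get)
smallest_cell = min(cells, key=cells.get)

print(row_totals, column_totals, grand_total)
print(largest_cell, cells[largest_cell])
print(smallest_cell, cells[smallest_cell])
```

The row and column totals must both add up to the same grand total, which is a useful check on a hand-built crosstabulation.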
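Cross-Tabulation
Example: Lake Homes
◼ The Total column (55, 45) is the frequency distribution for the price range variable.
◼ The Total row (30, 20, 35, 15) is the frequency distribution for the home style variable.
Lake Homes
[Figure: bar chart of frequency by home style (Colonial, Log, Split, A-Frame), with separate series for < $200,000 and ≥ $200,000]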
Stacked Bar Chart
◼ A stacked bar chart is another way to display and compare two variables
on the same display.
◼ It is a bar chart in which each bar is broken into rectangular segments of a
different color.
◼ If percentage frequencies are displayed, all bars will be of the same height
(or length), extending to the 100% mark.
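To see why percentage-based bars all reach 100%, the sketch below converts the Lake Homes counts into percent frequencies within each home style. The function name is an illustrative assumption, and `>=` stands in for "$200,000 or more".

```python
styles = ["Colonial", "Log", "Split", "A-Frame"]

# Cell counts from the Lake Homes crosstabulation, one list per price range.
cell_counts = {
    "< $200,000": [18, 6, 19, 12],
    ">= $200,000": [12, 14, 16, 3],
}


def percent_segments(styles, counts_by_price):
    """For each home style, return the percent of its homes in each price range."""
    segments = {}
    for index, style in enumerate(styles):
        style_total = sum(counts[index] for counts in counts_by_price.values())
        segments[style] = {
            price: 100 * counts[index] / style_total
            for price, counts in counts_by_price.items()
        }
    return segments


segments = percent_segments(styles, cell_counts)
for style, parts in segments.items():
    pieces = ", ".join(f"{price}: {pct:.1f}%" for price, pct in parts.items())
    print(f"{style:<9}{pieces}")
```

Each style's segments sum to 100, so the stacked bars have equal height and only the split between price ranges varies from bar to bar.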
Lake Homes
[Figure: stacked bar chart of frequency by home style (Colonial, Log, Split, A-Frame), with segments for < $200,000 and ≥ $200,000]
Thank You !!!