Descriptive Statistics - Tabular & Graphical

PPT

Uploaded by

cbsemathsx
Copyright
© © All Rights Reserved

Data Science for Managerial Decisions
Descriptive Statistics

Tabular and Graphical Displays

Summarizing Data for a Categorical Variable
◼ Frequency Distribution
◼ Relative Frequency Distribution
◼ Percent Frequency Distribution
◼ Bar Chart
◼ Pie Chart

Frequency Distribution
◼ A frequency distribution is a tabular summary of data showing the
number (frequency) of observations in each of several non-overlapping
categories or classes.
◼ The objective is to provide insights about the data that cannot be
quickly obtained by looking only at the original data.

Frequency Distribution
Example: Marada Inn

Guests staying at Marada Inn were asked to rate the quality of their
accommodations as being excellent, above average, average, below average,
or poor. The ratings provided by a sample of 20 guests are:

Below Average    Average          Above Average
Above Average    Above Average    Above Average
Above Average    Below Average    Below Average
Average          Poor             Poor
Above Average    Excellent        Above Average
Average          Above Average    Average
Above Average    Average
Frequency Distribution
Example: Marada Inn

Rating Frequency
Poor 2
Below Average 3
Average 5
Above Average 9
Excellent 1
Total 20
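
The tally in this table can be reproduced with a short Python sketch. The list `ratings` is transcribed from the sample of 20 guest ratings, read row by row; the variable names are illustrative and do not come from the slides.

```python
from collections import Counter

# The 20 guest ratings from the Marada Inn sample, read row by row.
ratings = [
    "Below Average", "Average", "Above Average",
    "Above Average", "Above Average", "Above Average",
    "Above Average", "Below Average", "Below Average",
    "Average", "Poor", "Poor",
    "Above Average", "Excellent", "Above Average",
    "Average", "Above Average", "Average",
    "Above Average", "Average",
]

# Fixed category order, from the worst rating to the best.
categories = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]

# Count how many observations fall in each non-overlapping category.
counts = Counter(ratings)
frequency = {category: counts[category] for category in categories}

for category, count in frequency.items():
    print(f"{category:<15}{count:>3}")
print(f"{'Total':<15}{sum(frequency.values()):>3}")
```

Listing the categories explicitly keeps the table in rating order and would also show a zero for any category that no guest selected.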
Relative Frequency Distribution
◼ The relative frequency of a class is the fraction or proportion of the total
number of data items belonging to the class.
◼ A relative frequency distribution is a tabular summary of a set of data
showing the relative frequency for each class.

Percent Frequency Distribution
◼ The percent frequency of a class is the relative frequency multiplied by
100.
◼ A percent frequency distribution is a tabular summary of a set of data
showing the percent frequency for each class.

Relative & Percent Frequency Distribution
Example: Marada Inn

Rating           Relative Frequency    Percent Frequency
Poor              .10                   10
Below Average     .15                   15
Average           .25                   25
Above Average     .45                   45
Excellent         .05                    5
Total            1.00                  100

Relative frequency of Excellent: 1/20 = .05
Percent frequency of Poor: .10(100) = 10
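
As a minimal sketch, the relative and percent frequencies can be derived from the Marada Inn frequencies as follows. The dictionary names are illustrative assumptions, not part of the slides.

```python
# Frequencies from the Marada Inn frequency distribution.
rating_frequency = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}

total = sum(rating_frequency.values())

# Relative frequency: the fraction of all observations in each class.
relative_frequency = {k: v / total for k, v in rating_frequency.items()}

# Percent frequency: the relative frequency multiplied by 100.
percent_frequency = {k: 100 * v / total for k, v in rating_frequency.items()}

for rating in rating_frequency:
    print(f"{rating:<15}{relative_frequency[rating]:>6.2f}{percent_frequency[rating]:>6.0f}")
```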
Bar Chart
◼ A bar chart is a graphical display for depicting qualitative data.
◼ On one axis (usually the horizontal axis), we specify the labels that are
used for each of the classes.
◼ A frequency, relative frequency, or percent frequency scale can be used
for the other axis (usually the vertical axis).
◼ Using a bar of fixed width drawn above each class label, we extend the
height appropriately.
◼ The bars are separated to emphasize the fact that each class is a
separate category.

Bar Chart – Marada Inn.

[Figure: bar chart titled "Marada Inn Quality Ratings". Horizontal axis: Rating (Poor, Below Average, Average, Above Average, Excellent). Vertical axis: Frequency, scaled from 0 to 10.]
Pareto Chart
◼ A Pareto chart contains both bars and a line graph: individual values
are represented by bars in descending order, and the cumulative total is
represented by the line.
◼ The chart is named after Vilfredo Pareto, an Italian economist.
◼ The main purpose of a Pareto chart is to separate the “vital few” from
the “trivial many”.
◼ In quality control, Pareto charts are used to identify the most important
causes of problems.
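
The ordering and the cumulative line of a Pareto chart can be sketched in Python using the Marada Inn frequencies. This computes the values only and draws nothing; the variable names are illustrative.

```python
from itertools import accumulate

# Frequencies from the Marada Inn frequency distribution.
rating_frequency = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}

# Bars: categories sorted by frequency in descending order.
bars = sorted(rating_frequency.items(), key=lambda item: item[1], reverse=True)

# Line: running total of the sorted frequencies, expressed as a percentage.
total = sum(rating_frequency.values())
running_totals = list(accumulate(count for _, count in bars))
cumulative_percent = [100 * value / total for value in running_totals]

for (rating, count), percent in zip(bars, cumulative_percent):
    print(f"{rating:<15}{count:>3}{percent:>7.0f}%")
```

For these data the cumulative line reaches 70% after the first two categories (Above Average and Average).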

Vilfredo Pareto [1848-1923]

80% of the land in Italy was owned by about 20% of the population.
Pareto Chart – Marada Inn.

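[Figure: Pareto chart titled "Marada Inn Quality Ratings". Horizontal axis: Rating in descending order of frequency (Above Average, Average, Below Average, Poor, Excellent). Left vertical axis: Frequency, scaled from 0 to 10. Right vertical axis: Cumulative Percentage, scaled from 0% to 120%.]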
Pie Chart
◼ The pie chart is a commonly used graphical display for presenting
relative frequency and percent frequency distributions for categorical
data.
◼ First draw a circle; then use the relative frequencies to subdivide the
circle into sectors that correspond to the relative frequency for each
class.
◼ Since there are 360 degrees in a circle, a class with a relative frequency
of 0.25 would consume 0.25(360) = 90 degrees of the circle.
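
The sector angles for the Marada Inn ratings follow directly from this rule. The sketch below is illustrative only and uses the frequencies from the frequency distribution.

```python
# Frequencies from the Marada Inn frequency distribution.
rating_frequency = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}

total = sum(rating_frequency.values())

# Each sector's angle is its relative frequency times 360 degrees.
sector_degrees = {k: 360 * v / total for k, v in rating_frequency.items()}

for rating, degrees in sector_degrees.items():
    print(f"{rating:<15}{degrees:>6.0f} degrees")
```

The Average class, with a relative frequency of 0.25, takes up 90 degrees, matching the rule stated above.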

Pie Chart – Marada Inn.

[Figure: pie chart of the Marada Inn quality ratings. Above Average 45%, Average 25%, Below Average 15%, Poor 10%, Excellent 5%.]
Pie Chart – Marada Inn.
Insights Gained from the Preceding Pie Chart
◼ One-half of the customers surveyed gave Marada a quality rating of
“above average” or “excellent”. This might please the manager.
◼ For each customer who gave an “excellent” rating, there were two
customers who gave a “poor” rating (looking at the top of the pie).
This should displease the manager.

Summarizing Data for a Quantitative Variable
◼ Frequency Distribution
◼ Relative Frequency Distribution
◼ Percent Frequency Distribution
◼ Histogram
◼ Cumulative Distributions

Frequency Distribution
Example: Hudson Auto Repair
The manager of Hudson Auto would like to gain a better understanding of
the cost of parts used in the engine tune-ups performed in the shop. She
examines 50 customer invoices for tune-ups. The costs of parts, rounded to
the nearest dollar, are listed on the next slide.
Frequency Distribution
Example: Hudson Auto Repair
Sample of Parts Cost($) for 50 Tune-ups

52 54 62 62 62 65 66 67 67 68
68 68 69 69 69 71 71 72 72 73
74 74 75 75 75 76 77 78 79 79
80 81 81 82 83 85 88 89 91 93
97 97 97 98 99 101 104 105 105 109
Frequency Distribution
The three steps necessary to define the classes for a frequency distribution with
quantitative data are:

1. Determine the number of non-overlapping classes.
2. Determine the width of each class.
3. Determine the class limits.
Frequency Distribution
Guidelines for Determining the Number of Classes

◼ Use between 5 and 20 classes


◼ Data sets with a larger number of elements usually require a larger
number of classes
◼ Smaller data sets usually require fewer classes

The goal is to use enough classes to show the variation in the data, but
not so many classes that some contain only a few data items.
Frequency Distribution
Guidelines for Determining the Width of Each Class
◼ Use classes of equal width
◼ Approximate Class Width = (Largest Data Value − Smallest Data Value) / Number of Classes

Making the classes the same width reduces the chance of inappropriate
interpretations.
Frequency Distribution
Note on Number of Classes and Class Width

◼ In practice, the number of classes and the appropriate class width are
determined by trial and error.
◼ Once a possible number of classes is chosen, the appropriate class width is
found.
◼ The process can be repeated for a different number of classes.
◼ Ultimately, the analyst uses judgment to determine the combination of the
number of classes and class width that provides the best frequency
distribution for summarizing the data.
Frequency Distribution
Guidelines for Determining the Class Limits

◼ Class limits must be chosen so that each data item belongs to one and only
one class.
◼ The lower class limit identifies the smallest possible data value assigned to
the class.
◼ The upper class limit identifies the largest possible data value assigned to
the class.
◼ The appropriate values for the class limits depend on the level of accuracy
of the data.

An open-end class requires only a lower class limit or an upper class limit.
Frequency Distribution
Example: Hudson Auto Repair

◼ If we choose six classes:
◼ Approximate Class Width = (109 - 52)/6 = 9.5 ≈ 10

Parts Cost ($)    Frequency
50<C<=60           2
60<C<=70          13
70<C<=80          16
80<C<=90           7
90<C<=100          7
100<C<=110         5
Total             50
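
The class-width calculation and the resulting counts can be checked with a short Python sketch. The starting limit of 50 and the rounded width of 10 are taken from the table; the variable names are illustrative assumptions.

```python
# Parts cost ($) for the 50 tune-ups in the Hudson Auto Repair sample.
costs = [
    52, 54, 62, 62, 62, 65, 66, 67, 67, 68,
    68, 68, 69, 69, 69, 71, 71, 72, 72, 73,
    74, 74, 75, 75, 75, 76, 77, 78, 79, 79,
    80, 81, 81, 82, 83, 85, 88, 89, 91, 93,
    97, 97, 97, 98, 99, 101, 104, 105, 105, 109,
]

num_classes = 6

# Approximate class width = (largest value - smallest value) / number of classes.
approximate_width = (max(costs) - min(costs)) / num_classes

class_width = 10   # the approximate width rounded up to a convenient value
first_lower = 50   # lower limit of the first class, as in the table

# Each class covers lower < C <= upper, so every cost falls in exactly one class.
classes = [
    (first_lower + i * class_width, first_lower + (i + 1) * class_width)
    for i in range(num_classes)
]
class_frequency = {
    (lower, upper): sum(1 for c in costs if lower < c <= upper)
    for lower, upper in classes
}

print(f"Approximate class width: {approximate_width}")
for (lower, upper), count in class_frequency.items():
    print(f"{lower}<C<={upper:<6}{count:>3}")
```

Repeating the run with a different `num_classes` and `class_width` is one way to carry out the trial-and-error process described earlier.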
Relative Frequency and
Percent Frequency Distributions

Parts Cost ($)    Relative Frequency    Percent Frequency
50<C<=60           .04                    4
60<C<=70           .26                   26
70<C<=80           .32                   32
80<C<=90           .14                   14
90<C<=100          .14                   14
100<C<=110         .10                   10
Total             1.00                  100

Relative frequency of the first class: 2/50 = .04
Percent frequency of the first class: .04(100) = 4
Percent frequency is the relative frequency multiplied by 100.
Relative Frequency and
Percent Frequency Distributions
Example: Hudson Auto Repair

Insights Gained from the Percent Frequency Distribution:

➢ Only 4% of the parts costs are in the $(50<C<=60) class
➢ 30% of the parts costs are $70 or less
➢ The greatest percentage (32% or almost one-third) of the parts costs are in
the $(70<C<=80) class
➢ 10% of the parts costs are more than $100
Histogram
◼ Another common graphical display of quantitative data is a histogram.
◼ The variable of interest is placed on the horizontal axis.
◼ A rectangle is drawn above each class interval with its height
corresponding to the interval’s frequency, relative frequency, or percent
frequency.
◼ Unlike a bar graph, a histogram has no natural separation between
rectangles of adjacent classes.
Histogram
Example: Hudson Auto Repair
Cumulative Distributions
◼ Cumulative frequency distribution - shows the number of items with
values less than or equal to the upper limit of each class.
◼ Cumulative relative frequency distribution – shows the proportion of items
with values less than or equal to the upper limit of each class.
◼ Cumulative percent frequency distribution – shows the percentage of items
with values less than or equal to the upper limit of each class.
◼ The last entry in a cumulative frequency distribution always equals the
total number of observations.
◼ The last entry in a cumulative relative frequency distribution always
equals 1.00.
◼ The last entry in a cumulative percent frequency distribution always
equals 100.
Cumulative Distributions
Example: Hudson Auto Repair

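            Cumulative    Cumulative Relative    Cumulative Percent
Cost ($)    Frequency     Frequency              Frequency
<= 60        2             .04                     4
<= 70       15             .30                    30
<= 80       31             .62                    62
<= 90       38             .76                    76
<= 100      45             .90                    90
<= 110      50            1.00                   100

For the <= 70 row: cumulative frequency = 2 + 13 = 15; cumulative relative frequency = 15/50 = .30; cumulative percent frequency = .30(100) = 30.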
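
The three cumulative distributions can be computed from the class frequencies with a running sum. The sketch below is illustrative; the list names are assumptions and the frequencies are those of the six-class Hudson Auto Repair distribution.

```python
from itertools import accumulate

# Upper class limits and class frequencies for the Hudson Auto Repair data.
upper_limits = [60, 70, 80, 90, 100, 110]
class_counts = [2, 13, 16, 7, 7, 5]

# Cumulative frequency: number of items at or below each upper class limit.
cumulative_frequency = list(accumulate(class_counts))

# The last cumulative frequency equals the total number of observations.
total = cumulative_frequency[-1]

cumulative_relative = [value / total for value in cumulative_frequency]
cumulative_percent = [100 * value / total for value in cumulative_frequency]

for limit, freq, rel, pct in zip(
    upper_limits, cumulative_frequency, cumulative_relative, cumulative_percent
):
    print(f"<= {limit:<5}{freq:>4}{rel:>7.2f}{pct:>6.0f}")
```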
Descriptive Statistics -
Tabular and Graphical Displays
◼ Summarizing Data for Two Variables using Graphical Displays
◼ Summarizing Data for Two Variables using Tables
◼ Data Visualization: Best Practices in Creating effective Graphical Displays
Summarizing Data for Two Variables
Using Graphical Displays
◼ In most cases, a graphical display is more useful than a table for
recognizing patterns and trends.
◼ Displaying data in creative ways can lead to powerful insights.
◼ Scatter diagrams and trendlines are useful in exploring the relationship
between two variables.
Scatter Diagram and Trendline

◼ A scatter diagram is a graphical presentation of the relationship between
two quantitative variables.
◼ One variable is shown on the horizontal axis and the other variable is
shown on the vertical axis.
◼ The general pattern of the plotted points suggests the overall relationship
between the variables.
◼ A trendline provides an approximation of the relationship.
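Scatter Diagram
[Figure: scatter diagram of y against x showing a positive relationship]
Scatter Diagram
[Figure: scatter diagram of y against x showing a negative relationship]
Scatter Diagram
[Figure: scatter diagram of y against x showing no apparent relationship]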
Scatter Diagram & Trend Line
Example: Panthers Football Team
The Panthers football team is interested in investigating the relationship, if
any, between interceptions made and points scored.

x = Number of Interceptions    y = Number of Points Scored
1                              14
3                              24
2                              18
1                              17
3                              30
In ball-playing competitive team sports, an interception or pick is a move by a player involving a pass of
the ball—whether by foot or hand, depending on the rules of the sport—in which the ball is intended
for a player of the same team but caught by a player of the opposing team, who thereby usually gains
possession of the ball for their team. Example: Rugby
Scatter Diagram & Trend Line
The Panthers football team is interested in investigating the relationship, if
any, between interceptions made and points scored.

[Figure: scatter diagram with trendline. Horizontal axis: x, Number of Interceptions (0 to 4). Vertical axis: y, Number of Points Scored (0 to 35).]
Scatter Diagram and Trendline

Insights Gained from the Preceding Scatter Diagram

◼ The scatter diagram indicates a positive relationship between the number
of interceptions and the number of points scored.
◼ Higher points scored are associated with a higher number of interceptions.
◼ The relationship is not perfect; not all plotted points in the scatter
diagram lie on a straight line.
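
The slides do not say how the trendline is fitted. A common choice is a least-squares line, and the sketch below assumes that method for the five Panthers observations. The variable names are illustrative.

```python
# Panthers data: interceptions made (x) and points scored (y).
interceptions = [1, 3, 2, 1, 3]
points_scored = [14, 24, 18, 17, 30]

n = len(interceptions)
mean_x = sum(interceptions) / n
mean_y = sum(points_scored) / n

# Least-squares slope: sum of cross-deviations over sum of squared x-deviations.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(interceptions, points_scored))
s_xx = sum((x - mean_x) ** 2 for x in interceptions)

slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

print(f"Trendline (least-squares assumption): y = {intercept:.2f} + {slope:.2f}x")
```

Under this assumption the slope is positive (about 5.75 points per interception), which agrees with the positive relationship seen in the scatter diagram.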
Summarizing Data for Two Variables
Using Tables
◼ Thus far we have focused on methods that are used to summarize the data
for one variable at a time.
◼ Often a manager is interested in tabular and graphical methods that will
help understand the relationship between two variables.
◼ Crosstabulation is a method for summarizing the data for two variables.
Crosstabulation

◼ A crosstabulation is a tabular summary of data for two variables.
◼ Crosstabulation can be used when:
   - one variable is qualitative and the other is quantitative,
   - both variables are qualitative, or
   - both variables are quantitative
◼ The left and top margin labels define the classes for the two variables.
Cross-Tabulation
Example: Lake Homes
◼ The number of Lakes homes sold for each style and price for the past two
years is shown below.

Price Range is the quantitative variable; Home Style is the categorical variable.

               Home Style
Price Range    Colonial    Log    Split    A-Frame    Total
< $200,000     18           6     19       12          55
≥ $200,000     12          14     16        3          45
Total          30          20     35       15         100
Cross-Tabulation
Example: Lake Homes
Insights Gained from Preceding Crosstabulation
◼ The greatest number of homes (19) in the sample are a split-level style and
priced at less than $200,000.
◼ Only three homes in the sample are an A-Frame style and priced at
$200,000 or more.
Cross-Tabulation
Example: Lake Homes
               Home Style
Price Range    Colonial    Log    Split    A-Frame    Total
< $200,000     18           6     19       12          55
≥ $200,000     12          14     16        3          45
Total          30          20     35       15         100

The Total column is the frequency distribution for the price range variable.
The Total row is the frequency distribution for the home style variable.
Cross-Tabulation – Row Percentage
Example: Lake Homes
Converting the entries in the table into row percentages or column percentages can
provide additional insight about the relationship between the two variables.

               Home Style
Price Range    Colonial    Log      Split    A-Frame    Total
< $200,000     32.73       10.91    34.55    21.82      100
≥ $200,000     26.67       31.11    35.56     6.67      100

Note: row totals are actually 100.01 due to rounding.

(Colonial and ≥ $200K)/(All ≥ $200K) x 100 = (12/45) x 100 = 26.67

Cross-Tabulation – Column Percentage
Example: Lake Homes
Converting the entries in the table into row percentages or column percentages can
provide additional insight about the relationship between the two variables.

               Home Style
Price Range    Colonial    Log      Split    A-Frame
< $200,000     60.00       30.00    54.29    80.00
≥ $200,000     40.00       70.00    45.71    20.00
Total          100         100      100      100

(Colonial and ≥ $200K)/(All Colonial) x 100 = (12/30) x 100 = 40.00
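
Both percentage tables can be derived from the Lake Homes counts with a few lines of Python. This is a sketch with illustrative names; the second price class is labelled "$200,000 or more", following the wording of the insights slide.

```python
styles = ["Colonial", "Log", "Split", "A-Frame"]

# Crosstabulation counts: one row per price range, one column per home style.
counts = {
    "< $200,000": [18, 6, 19, 12],
    ">= $200,000": [12, 14, 16, 3],
}

# Row percentages: each cell divided by its row total.
row_percent = {
    price: [round(100 * cell / sum(row), 2) for cell in row]
    for price, row in counts.items()
}

# Column percentages: each cell divided by its column total.
column_totals = [sum(column) for column in zip(*counts.values())]
column_percent = {
    price: [round(100 * cell / col_total, 2) for cell, col_total in zip(row, column_totals)]
    for price, row in counts.items()
}

for price in counts:
    print(price, dict(zip(styles, row_percent[price])), dict(zip(styles, column_percent[price])))
```

Row percentages answer "within a price range, how are the styles distributed?", while column percentages answer "within a style, how are the prices distributed?".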


Side-by-Side Bar Chart
◼ A side-by-side bar chart is a graphical display for depicting multiple bar
charts on the same display.
◼ Each cluster of bars represents one value of the first variable.
◼ Each bar within a cluster represents one value of the second variable.

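[Figure: side-by-side bar chart titled "Lake Homes". Horizontal axis: home style (Colonial, Log, Split, A-Frame). Vertical axis: Frequency, scaled from 0 to 20. Legend: <$200,000 and ≥$200,000.]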
Stacked Bar Chart
◼ A stacked bar chart is another way to display and compare two variables
on the same display.
◼ It is a bar chart in which each bar is broken into rectangular segments of a
different color.
◼ If percentage frequencies are displayed, all bars will be of the same height
(or length), extending to the 100% mark.

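[Figure: stacked bar chart titled "Lake Homes". Horizontal axis: home style (Colonial, Log, Split, A-Frame). Vertical axis: Frequency, scaled from 0 to 40. Legend: <$200,000 and ≥$200,000.]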
Thank You !!!
