Chapter 2
Organizing Data
At the end of this chapter, the students should be able to:
Organize qualitative and quantitative data by using tables and graphs.
Example 2.1:
The following is an example of qualitative data: the model of national car used by each of
25 UTHM students.
Example 2.2:
The following is an example of quantitative data: the monthly petrol expenses
(in Ringgit Malaysia) of 25 UTHM students.
Qualitative data can be organized by using frequency distribution tables and displayed in
graphs such as bar charts or pie charts.
A frequency distribution for qualitative data lists all categories and the number of
elements that belong to each of the categories.
Example 2.3:
Using the data in Example 2.1, a frequency distribution table can be constructed.
Solution:
The model of national car is the variable in this example. This (qualitative) variable is
classified into five categories: Wira, Gen-2, Perdana, Iswara, and Kembara.
Step 1: record these categories in the first column of Table 2.1.
Step 2: read each student’s response from the raw data and mark a tally, denoted by the
symbol “│”, in the second column beside the category it falls in.
Step 3: record the total number of tallies (the frequency) for each category in the third
column. The sum of this column gives the total frequency, which is the sample size.
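The three steps above amount to counting how often each category occurs. The following is a minimal Python sketch of that tally (Python is used only for illustration). The list of responses is a hypothetical reconstruction built to match the frequencies reported in Table 2.2; it is not the original survey order.

```python
from collections import Counter

# Hypothetical raw responses: only the category counts reported in
# Table 2.2 (8, 6, 3, 5, 3) are known, not the original order.
responses = (["Wira"] * 8 + ["Gen-2"] * 6 + ["Perdana"] * 3
             + ["Iswara"] * 5 + ["Kembara"] * 3)

# Steps 1 to 3: list the categories, tally each response, total the tallies.
frequency = Counter(responses)
sample_size = sum(frequency.values())

# Print one row per category: name, tally marks, frequency.
for model, f in frequency.items():
    print(f"{model:<8} {'|' * f:<9} {f}")
print(f"Total    {sample_size}")
```

The last line printed is the total frequency, which should equal the sample size of 25.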
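Relative frequency of a category (Rf) = frequency of that category (f) / sum of all frequencies (∑f)
Percentage = (Rf) × 100.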
Example 2.4:
Construct the relative frequency and percentage distribution for the data in Table 2.1
(in Example 2.3).
Solution:
The relative frequency and percentage are calculated using the formulae above.
Table 2.2: Relative frequency and percentage distribution of the model of national cars.
Model      Frequency (f)   Relative frequency (Rf)   Percentage
Wira       8               8/25 = 0.32               0.32 (100) = 32%
Gen-2      6               6/25 = 0.24               0.24 (100) = 24%
Perdana    3               3/25 = 0.12               0.12 (100) = 12%
Iswara     5               5/25 = 0.20               0.20 (100) = 20%
Kembara    3               3/25 = 0.12               0.12 (100) = 12%
Total      ∑f = 25         ∑Rf = 1.00                ∑% = 100%
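The arithmetic in Table 2.2 can be checked with a short script. This is a sketch only: the frequencies are copied from the table, and the function name `relative_frequency` is chosen here for illustration.

```python
# Frequencies copied from Table 2.2.
frequency = {"Wira": 8, "Gen-2": 6, "Perdana": 3, "Iswara": 5, "Kembara": 3}
total = sum(frequency.values())


def relative_frequency(f, total_frequency):
    """Rf = f / (sum of all frequencies)."""
    return f / total_frequency


# Each row holds (f, Rf, percentage) for one model.
table = {}
for model, f in frequency.items():
    rf = relative_frequency(f, total)
    table[model] = (f, rf, rf * 100)

for model, (f, rf, pct) in table.items():
    print(f"{model:<8} {f:>2}  {rf:.2f}  {pct:.0f}%")
```

The relative frequencies should add up to 1 and the percentages to 100, which is a quick check against copying errors.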
Qualitative data are commonly displayed with two types of graphs: the bar graph and the pie
chart. A bar graph is composed of bars whose heights are the frequencies of the different
categories. It displays graphically the same information about qualitative data that a
frequency distribution shows in tabular form.
Example 2.5:
Draw a bar chart to represent the data in Table 2.1.
Solution:
Figure 2.1 shows the bar chart (2D and 3D) for the data in Table 2.1.
[Figure 2.1: Bar chart (2D and 3D) of frequency against model of national car: Wira, Gen-2, Perdana, Iswara, Kembara.]
Example 2.6:
Construct a pie chart for the data in Table 2.1.
Solution:
First, the angle of the sector for each category should be calculated.
Figure 2.2 shows the pie chart (2D and 3D) for the data in Table 2.2.
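The text does not show the angle calculation itself. A minimal sketch, assuming the usual convention that each sector's angle is its relative frequency multiplied by 360 degrees, and using the frequencies from Table 2.2:

```python
# Frequencies copied from Table 2.2.
frequency = {"Wira": 8, "Gen-2": 6, "Perdana": 3, "Iswara": 5, "Kembara": 3}
total = sum(frequency.values())

# Assumed convention: sector angle = relative frequency x 360 degrees.
angles = {model: f / total * 360 for model, f in frequency.items()}

for model, angle in angles.items():
    print(f"{model:<8} {angle:6.1f} degrees")
```

The angles should add up to 360 degrees, in the same way that the relative frequencies add up to 1.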
[Figure 2.2: Pie chart (2D and 3D) for the frequency distribution of the model of national cars used by 25 UTHM students.]
Exercise 2.1:
1. The following list gives the academic ranks for the 25 male faculty members at a
mechanical faculty:
2. The pie chart below shows the percentage of foreign professional workers in
Malaysia from January to June 2013. Given that the number of foreign professional
workers in Malaysia during that period is 12 705, construct a frequency
distribution and draw a bar chart for the frequency distribution.
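[Pie chart: percentage of foreign professional workers in Malaysia, January to June 2013. Labels shown: India, U.K., China, Japan. Percentages shown: 20%, 7%, 8%, 12%.]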
Section 2.2 introduced how to organize qualitative data. In this section, we will learn how
to organize and display quantitative data.
A frequency distribution for a quantitative variable lists all the intervals (classes) and
the number of observations that belong to each interval. It shows how the frequencies are
distributed over these intervals.
Table 2.3 gives the frequency distribution for the cholesterol values of 45 patients in a
cardiac rehabilitation study. Give the lower and upper class limits and boundaries as well as
the class marks for each class.
Observe the table above. There are 5 classes (intervals), each with a class width of 20.
Classes written with class limits, such as 170 to 189 and 190 to 209, leave a gap between
two consecutive classes: the ending point (upper limit) of a class is not the same as the
starting point (lower limit) of the next class. Classes written with class boundaries, such
as 169.5 to 189.5 and 189.5 to 209.5, close that gap: the ending point (upper boundary) of a
class is the same as the starting point (lower boundary) of the next class. In other words,
there is continuity between classes.
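Step 1: determine the number of classes (k).
Sturges’s rule:
k = 1 + 3.3 log (n), where log is the base-10 logarithm and n is the number of observations.
Power of k rule:
2^k < n
Step 2: determine the class width.
Class width (C) = Range / Number of classes = (Data max – Data min) / k.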
Step 3: determine the starting point, that is, the lower limit of the first class.
- Value of the starting point must be ≤ the smallest value (data minimum) in the data set.
- Normally the smallest value in the data set will be chosen as the starting point.
Note:
a. We can check for errors by finding the class width using both formulas below. Of
course, the answers must be the same if there is no error.
Class width = upper boundary – lower boundary
= (upper class limit – lower class limit) + 1 (for data recorded as whole numbers)
b. We may include the class midpoint (class mark) in the table.
class midpoint = (lower limit + upper limit) / 2
class midpoint = (lower boundary + upper boundary) / 2
Step 5: determine the number of observations (frequency) that fall in each class.
By using these steps, Table 2.3 can be presented completely.
Class        Lower limit   Upper limit   Lower boundary   Upper boundary   Class mark / class midpoint
170 to 189   170           189           169.5            189.5            179.5
190 to 209   190           209           189.5            209.5            199.5
210 to 229   210           229           209.5            229.5            219.5
230 to 249   230           249           229.5            249.5            239.5
250 to 269   250           269           249.5            269.5            259.5
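The boundaries and midpoints in this table follow mechanically from the class limits. A minimal Python sketch, assuming whole-number class limits so that each boundary lies halfway across the gap between consecutive classes:

```python
# Class limits copied from the cholesterol table.
class_limits = [(170, 189), (190, 209), (210, 229), (230, 249), (250, 269)]

# Gap between the upper limit of one class and the lower limit of the next
# (1 for whole-number data).
gap = class_limits[1][0] - class_limits[0][1]

rows = []
for lower, upper in class_limits:
    lower_boundary = lower - gap / 2
    upper_boundary = upper + gap / 2
    midpoint = (lower + upper) / 2
    width = upper_boundary - lower_boundary
    rows.append((lower, upper, lower_boundary, upper_boundary, midpoint, width))

for row in rows:
    print(row)
```

Each row ends with the class width, which should be 20 for every class and should agree with (upper class limit – lower class limit) + 1.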
Example 2.7
Suppose you are considering investing in a mutual fund. You collected the data in Table 2.4,
which presents the three-year rate of return (in percent) for a simple random sample of 40
small-capitalization growth mutual funds. Construct a frequency distribution table for these
data.
Solution:
Step 1: determine the number of classes.
Sturges’s rule:
k = 1 + 3.3 log (n)
= 1 + 3.3 log (40)
≈ 6.29
We may choose either 6 or 7 classes. Let us choose 7 classes for the data.
Power of k rule:
2^k < n
2^k < 40
2^5 = 32 < 40
By using this rule, we may choose 5 intervals or classes.
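Both rules for choosing the number of classes are easy to evaluate in code. This is a sketch only; the function names are chosen here for illustration, and the power rule is implemented in the form this chapter uses (the largest k with 2^k < n).

```python
import math


def sturges_classes(n):
    """Sturges's rule: k = 1 + 3.3 log10(n); round up or down to choose k."""
    return 1 + 3.3 * math.log10(n)


def power_rule_classes(n):
    """Largest whole number k such that 2**k < n."""
    k = 0
    while 2 ** (k + 1) < n:
        k += 1
    return k


def class_width(data_min, data_max, k):
    """Class width C = (data max - data min) / k."""
    return (data_max - data_min) / k


n = 40
print(round(sturges_classes(n), 2))  # suggests 6 or 7 classes
print(power_rule_classes(n))         # suggests 5 classes
```

The two rules need not agree; they only give a reasonable starting point for the number of classes.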
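Relative frequency of a class (Rf) = frequency of that class (f) / sum of all frequencies (∑f)
Percentage = (Rf) × 100.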
Example 2.8:
Construct a relative frequency and percentage distribution of the data in Table 2.3.
Table 2.4: Relative frequency and percentage distribution for cholesterol value.
Cholesterol value   Frequency   Relative frequency   Percentage
170 to 189          3           3/45 = 0.0667        (3/45)(100) = 6.67%
190 to 209          10          10/45 = 0.2222       (10/45)(100) = 22.22%
210 to 229          17          17/45 = 0.3778       (17/45)(100) = 37.78%
230 to 249          13          13/45 = 0.2889       (13/45)(100) = 28.89%
250 to 269          2           2/45 = 0.0444        (2/45)(100) = 4.44%
Total               45          1.00                 100%
Histograms
A histogram is similar to a bar graph and can be drawn for a frequency distribution, a
relative frequency distribution or a percentage distribution. To draw a histogram, the
horizontal axis (x-axis) is marked with the classes (or class boundaries) and the vertical
axis (y-axis) is marked with the frequencies (or relative frequencies or percentages). The
height of each bar represents the frequency (or relative frequency or percentage) of its class.
Example 2.9:
Construct a histogram for data in Table 2.3.
Solution:
[Figure: Histogram of cholesterol value. x-axis: intervals; y-axis: frequency.]
Polygon
A polygon is a line graph that is formed by joining the points plotted by the midpoint and
frequency (or relative frequency or percentage) of each class with straight lines.
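The points that a polygon joins can be listed before drawing anything. A minimal Python sketch, using the cholesterol classes and the frequencies given in Table 2.4; plotting is left out so that the sketch needs no graphing library.

```python
# Class limits and frequencies for the cholesterol data (Table 2.4).
class_limits = [(170, 189), (190, 209), (210, 229), (230, 249), (250, 269)]
frequencies = [3, 10, 17, 13, 2]

# Each point is (class midpoint, frequency); the polygon joins
# consecutive points with straight lines.
polygon_points = [((lower + upper) / 2, f)
                  for (lower, upper), f in zip(class_limits, frequencies)]

for x, y in polygon_points:
    print(f"midpoint {x:.1f}  frequency {y}")
```

Replacing the frequencies with relative frequencies or percentages gives the points for the corresponding relative frequency or percentage polygon.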
Example 2.10:
Construct a polygon for data in Table 2.3.
Solution:
Figure 2.4: Polygon of cholesterol value.
Ogive
The curve that is constructed from the cumulative frequency distribution is called an ogive.
The class boundaries are marked on the horizontal axis and the cumulative frequencies are
marked on the vertical axis. Each of the dots in the graph is plotted by the upper class
boundary as the x-coordinate and the cumulative frequency as the y-coordinate for each of
the classes. Next, a curve or straight line is drawn through each of these points.
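The coordinates of the ogive follow from a running total of the class frequencies. A minimal Python sketch, using the cholesterol frequencies from Table 2.4 and the upper class boundaries of the same classes:

```python
from itertools import accumulate

# Upper class boundaries and frequencies for the cholesterol data.
upper_boundaries = [189.5, 209.5, 229.5, 249.5, 269.5]
frequencies = [3, 10, 17, 13, 2]

# Cumulative frequency: running total of the class frequencies.
cumulative = list(accumulate(frequencies))

# Each ogive point is (upper class boundary, cumulative frequency).
ogive_points = list(zip(upper_boundaries, cumulative))

for x, y in ogive_points:
    print(f"up to {x}: {y} observations")
```

The last cumulative frequency should equal the total number of observations, here 45.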
Exercise 2.2:
1. Refer to Table in Example 2.7.
a. Construct the relative frequency and cumulative frequency distribution.
b. Draw a relative frequency histogram and a relative frequency polygon for the data
on the same graph.
c. Illustrate the cumulative distribution with a suitable graph.
2. The following data give the electricity bills (rounded to the nearest RM) for the past
month for 30 families.
75 34 47 26 56 29 48 42 33 67
38 41 63 55 61 73 61 76 46 51
55 42 35 39 45 71 24 47 67 52
a. Construct a frequency distribution table by taking 21 as the lower limit of the first
class and 10 as the width of each class.
b. Calculate the relative frequency, percentage and class boundaries for each class.
c. Construct a cumulative frequency distribution table and represent it graphically.
d. What percentage of the families have a monthly electricity bill of RM 61 or more, and
what percentage have a bill of RM 40 or less?
class frequency
a to b f1
c to d f2
e to f f3
g to i f4
j to k f5
Example 2.11:
Given a set of values for some variable, we want to organize and describe these values in a
more meaningful way than just listing the raw data. For example, Mary converted a Java
library for matrix manipulation into JavaScript, and she was interested in the time behavior
of some of the functions. In one test, she generated 100 random matrices of size 70 x 60 and
used the JavaScript Date object to measure the run time of the pseudo-inverse of each matrix
in milliseconds.
Then, how should Mary describe the following data set that she has
recorded?
318, 314, 315, 315, 313, 314, 315, 314, 314, 315, 313, 313, 315, 313, 314, 315, 314, 314,
315, 316, 315, 315, 314, 314, 314, 314, 314, 315, 314, 314, 316, 315, 314, 314, 315, 315,
316, 315, 313, 314, 313, 314, 314, 313, 313, 313, 315, 313, 312, 312, 313, 316, 313, 315,
315, 315, 313, 313, 312, 314, 314, 313, 313, 315, 314, 314, 315, 314, 314, 315, 313, 313,
314, 312, 312, 316, 314, 315, 315, 315, 315, 315, 314, 314, 313, 314, 314, 315, 313, 315,
316, 314, 315, 314, 323, 314, 314, 315, 314, 310
A more useful approach for Mary is to organize the raw data and look at its distribution.
The distribution shows the number of occurrences (the frequency) of each value in her data
set, from which she can observe the time behavior of the pseudo-inverse function. Therefore,
she can construct a frequency distribution table and generate a graph or chart based on the
table.
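Mary's frequency distribution can be built directly from the raw run times. The following is a minimal Python sketch (Python is used only for illustration; the tools discussed in this chapter are Excel, SPSS and Matlab). The list `times_ms` is the data set listed above, and the text bars are only a rough stand-in for a real chart.

```python
from collections import Counter

# Run times in milliseconds, copied from the data set listed above.
times_ms = [
    318, 314, 315, 315, 313, 314, 315, 314, 314, 315, 313, 313, 315, 313, 314, 315, 314, 314,
    315, 316, 315, 315, 314, 314, 314, 314, 314, 315, 314, 314, 316, 315, 314, 314, 315, 315,
    316, 315, 313, 314, 313, 314, 314, 313, 313, 313, 315, 313, 312, 312, 313, 316, 313, 315,
    315, 315, 313, 313, 312, 314, 314, 313, 313, 315, 314, 314, 315, 314, 314, 315, 313, 313,
    314, 312, 312, 316, 314, 315, 315, 315, 315, 315, 314, 314, 313, 314, 314, 315, 313, 315,
    316, 314, 315, 314, 323, 314, 314, 315, 314, 310,
]

# Frequency of each distinct run time.
frequency = Counter(times_ms)

# Print the table in order of run time, with a simple text bar per row.
for time in sorted(frequency):
    print(f"{time} ms  {frequency[time]:>3}  {'#' * frequency[time]}")
print(f"Total   {sum(frequency.values())}")
```

Sorting by run time rather than by frequency keeps the table in the same order as the horizontal axis of the chart.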
[Figure: Bar chart of frequency against run time (ms), with the time axis running from 310 to 322 ms.]
Based on the frequency distribution table, Mary can see how often each run time of the
pseudo-inverse function occurred for random matrices of size 70 x 60, as measured with the
JavaScript Date object. The times are clustered in the 310-318 ms range, with one outlier at
323 ms. The most frequent run time is 314 ms, and the least frequent are 310, 318 and
323 ms. Moreover, the bar chart shows that the run time most often falls between 313 and
315 ms.
1. Using Excel
Consider the previous set of data as in Table _, and enter the data into an Excel worksheet.
The data consist of the time range (ms) and the number of occurrences (frequency). Then select
cells A2 to A15 and B2 to B15 together, click on ‘charts’ and choose a chart type. To edit
the data on the x-axis and y-axis, right-click on the graph and choose ‘select data’.
Answer:
2. Using SPSS
IBM SPSS Statistics can also be used for statistical analysis. One difference between Excel
and SPSS is that SPSS cannot interpret Excel formulas, so any cell in Excel that is derived
from a formula will not be read in. SPSS data files contain only data and the metadata that
describe the data (format, labels, value labels, missing values, etc.).
To enter the data set, we click on the ‘file’ tab, then ‘new’, and select ‘data’ to open the
data editor. Alternatively, we can import the data from Excel, csv, text and other files by
clicking on the ‘file’ tab, then ‘import data’, and selecting the data file. To edit the data
description, we click on the ‘variable view’ tab. Then we click on the ‘analyze’ tab to
analyse the data set, or on the ‘graphs’ tab to generate a graph as in the following figure.
After generating the graph, double-click on it to open the chart editor, where we can edit
the data and labels on the x-axis and y-axis, the title of the graph, and so on.
3. Using Matlab
Unlike in Excel, in Matlab we need to write our own code to run statistical functions and
analyses and to call the graphing tools. First, we open a new script in which to write the
code. Then we create our own ‘filename.mat’ file to hold the data (time range (time) and
number of occurrences (occ)). Based on the following figure, we can either 1) enter the data
directly into an empty 4x8 data matrix (after running line 4, click on ‘data’ in the
workspace) and then save the data matrix (insert line 21 after line 4), or 2) write the data
as in lines 6-9. After entering the data, we calculate the relative frequency and percentage
distribution as in lines 13-18. To plot the bar chart, we call the graphing tool as in
line 20, and we save the 4x8 data matrix in the ‘filename.mat’ (freq_dist.mat) file as in
line 21. After we finish writing the code, we click ‘run’ in the editor tab to run the code.
To look at the 4x8 data matrix, click on ‘data’ in the workspace section.