Math 1100 Module 4
Chapter 4
Data Management
Overview
The above story clearly illustrates the importance of being able to efficiently
collect, organize and manage data. In this chapter, we briefly discuss data
management, which is mainly a topic under the field of Statistics.
Objectives
On successful completion of the module, students will be able to:
1. Advocate the use of statistical data in making important decisions.
2. Discuss and interpret data.
3. Understand and interpret the different measures of central tendency,
measures of dispersion, and measures of relative position.
4. Use a variety of statistical tools to process and manage numerical data.
Statistics
Statistics is the science of collecting, organizing and summarizing recorded
information or data (descriptive statistics) in such a way that valid conclusions and
meaningful predictions can be drawn from them (inferential statistics).
Mathematics in the Modern World | 4. Data Management
Types of Statistics
1. Descriptive statistics consists of methods concerned with the collection,
description and analysis of data without drawing conclusions or inferences about a
larger set. Its main concern is simply to describe the set of data such that
otherwise obscure information is brought out clearly.
2. Inferential statistics utilizes sample data to make estimates, decisions, predictions,
or other generalizations about a larger set of data.
Variables
In statistics, a variable refers to a specific characteristic (or attribute) of a
subject. Such an attribute may assume two or more different values. For example, the
“sex” of a person is a variable; its value is either 'male' or 'female'. Other examples of
variables are your course, citizenship, age, height and weight.
Types of Variables
1. Qualitative variables are those whose values are measured not in terms of
numbers, but categorically by means of description. Examples are “course”,
“citizenship”, “favorite color” and “place of birth”.
2. Quantitative variables are those that are always associated with numbers or a
scale measure. Examples are “age”, “height”, “weight” and “population”.
Illustration 1:
Consider an upcoming election for Provincial Governor. A candidate spends time,
money and effort to conduct a survey on who is likely to be the next governor.
Statistically, the whole list of voters in the province is what is referred to as the
population for the survey. But inasmuch as it would be very costly and virtually
impossible to interview every voter in the province, only a few will be actually
interviewed. Such a few voters are what are referred to as the sample. Results from the
sample will then be used to project the trend of the whole population.
That is, data are collected from a sample and then summarized in order to
draw a conclusion that is taken to be true for the whole population. Thus, a good
sample is one that truly represents the population, so that conclusions drawn from
the sample are valid for the entire population. If a sample is bad, then conclusions
from it may not be valid for the population. In fact, information could change from
one sample to another sample of the same population.
[Figure: diagram labeled POPULATION and Sample]
Illustration 2:
A student researcher wants to do a survey among CLSU students. Instead of
doing a survey of all the students in CLSU, he just chose and surveyed a group of 45
students (five students per college). In this scenario, the population is all the students
of CLSU, while the sample is the group of 45 students.
Organizing Data
Considered as Phase I of organizing data is data collection, where each element
of the data is called a data point. Generally in this phase, the raw data may not show
any apparent pattern or trend.
Illustration 3:
Phase I. The following data are the respective number of kids of 50 families.
0 2 1 0 3 2 0 1 1 0
0 1 1 2 4 1 0 1 1 0
2 1 0 0 3 0 0 1 2 1
0 0 2 4 1 1 0 1 2 0
1 1 0 3 5 1 2 1 3 2
The above raw data as it is presented, suggests nothing but just numbers. But if
we organize the data (Phase II), they become more meaningful.
Frequency Distribution Table
The most common way of organizing data is using a frequency distribution table
or FDT. It is a table that lists all data points, along with how many times each data
point occurs (frequency, f) and its percentage of the total number of data points
(relative frequency).
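As a minimal sketch of how such a table can be built, the following Python snippet tallies the raw data of Illustration 3 (number of kids in 50 families). The function name `frequency_table` is only illustrative and is not part of the module; each distinct value is treated as its own group, which is an assumption suited to this small data set.

```python
from collections import Counter

# Raw data from Illustration 3: number of kids in each of 50 families.
kids = [
    0, 2, 1, 0, 3, 2, 0, 1, 1, 0,
    0, 1, 1, 2, 4, 1, 0, 1, 1, 0,
    2, 1, 0, 0, 3, 0, 0, 1, 2, 1,
    0, 0, 2, 4, 1, 1, 0, 1, 2, 0,
    1, 1, 0, 3, 5, 1, 2, 1, 3, 2,
]

def frequency_table(data):
    """Return rows of (value, frequency, relative frequency in percent)."""
    counts = Counter(data)          # how many times each data point occurs
    n = len(data)                   # total number of data points
    return [(value, counts[value], 100 * counts[value] / n)
            for value in sorted(counts)]

table = frequency_table(kids)
print("Kids  Frequency  Relative Frequency")
for value, f, rf in table:
    print(f"{value:>4}  {f:>9}  {rf:>17.1f}%")
```

Once tallied, the pattern that the raw list hides becomes visible: most of the 50 families have at most one child.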
Phase II. We organize the raw data into a frequency distribution. First, we must decide
on how many groups to use. Customarily, the number of groups is any number from
4 to 8. Say, we use 6 groups here. Second, we determine the interval for each
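group. This is done by dividing the difference between the highest and the lowest
data points by the number of groups:
\[ \frac{62 - 16}{6} = \frac{46}{6} = 7.6\overline{6} \]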
In order to be consistent with the data which are integers, we round it off to 8.
Hence, the frequency distribution is
Score (x)       Tally                  Frequency    Relative Frequency
16 ≤ x < 24     IIII – IIII – I           11              26%
24 ≤ x < 32     IIII – IIII – III         13              31%
32 ≤ x < 40     IIII – II                  7              17%
40 ≤ x < 48     III                        3               7%
48 ≤ x < 56     II                         2               5%
56 ≤ x < 64     IIII – I                   6              14%
Total                                  n = 42             100%
(Each group written as IIII before a dash stands for a bundle of five tallies.)
Histogram
Data that are grouped in intervals can be depicted by a histogram, which is
actually a bar graph that shows how the data are distributed. The histogram for the
data in Illustration 5 is:
[Figure: histogram of the score data; horizontal axis: Scores, marked at 16, 24, 32, 40, 48, 56, 64; vertical axis: Frequency (f), marked from 3 to 15]
Note that a histogram should show an accurate comparison of the data. That is,
the heights of the rectangles must correspond to the frequencies of the intervals, and
the widths of the rectangles must all be the same, since every interval has the same
class width.
[Figure: histograms of scores; horizontal axis: Scores, marked from 15 to 65 in steps of 5]
2. Suppose that in the intramural games, the competing colleges earned the
tabulated overall points.
[Figure: bar graph of overall points by college (CEd, CAg, CEn, CF, CoS, CASS, CVSM, CBAA, CHSI); vertical axis marked from 30 to 150; horizontal axis: College]
What's wrong with the above histogram?
Pie Charts
The data used in the preceding examples were all quantitative (numerical). For
qualitative (categorical) data especially, an easy way to summarize data is through the
use of a pie chart. Pie charts are used to clearly show what part of the whole is
accounted for by a specific characteristic.
Example.
In a certain small community, the marital status of its adult population is
tabulated below:
Marital Status Frequency Relative Frequency
Single (Si) 50 25%
Married (M) 113 56.5%
Widowed (W) 28 14%
Separated (S) 9 4.5%
Total 200 100%
[Figure: pie chart with slices Si 25%, M 56.5%, W 14%, and S 4.5%]
The whole reason for constructing a pie chart is to convey information visually; it
should enable the reader to compare easily the relative proportions of the categorical
data. Thus, every slice of the pie should correspond to the relative frequency, which is
also written in the label. Using different colors for every slice in the pie may also help.
And, if the names of the categories are too long, a legend may be used.
What's wrong with the pie chart at the right?
[Figure: pie chart labeled with 25%, 60%, 30%, and 75%]
Exercises
2. A survey among CLSU students was conducted. It is about their starting position
when they sleep (face up, face down, left side, or right side). Five hundred students
were randomly interviewed. Do the collected data represent a sample or population?
Explain your answer.
4. The following data are the scores of forty students in a 100-item math quiz.
59 65 57 83 64 60 57 89
62 75 90 90 87 66 69 81
69 83 79 59 70 83 82 57
89 92 84 58 73 67 89 66
93 68 59 77 62 90 80 79
a. Organize the data into a 6-group frequency distribution.
b. Create a histogram to summarize the data.
c. Construct a pie chart to represent the data.
Mean
The mean is the most commonly used measure of central tendency. The mean of
a data set is the sum of the data points divided by the number of data points, or simply
the average of the data points. Thus, it is strongly influenced by outliers (data points
that are extremely low or extremely high compared to other data points). The
population mean, denoted by $\mu$, is estimated by the sample mean, denoted by $\bar{x}$:
\[ \bar{x} = \frac{\sum x}{n} \]
where $x$ are the data points and $n$ is the number of data points.
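Example 1: The data below are the current diesel prices (in pesos/liter) in nearby gas
stations. Find the mean price.
43.80 44.10 42.95 43.80 44.30 39.00 44.30 43.80
Solution:
\[ \bar{x} = \frac{43.80 + 44.10 + 42.95 + 43.80 + 44.30 + 39.00 + 44.30 + 43.80}{8} = \frac{346.05}{8} \]
\[ \bar{x} \approx 43.26 \text{ pesos/liter} \]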
Example 2: Gabriel has a total of 4 quizzes. One quiz is missing while the scores of his
remaining quizzes are 43, 35 and 39. Calculate the score of the missing quiz if his
mean score is 41.
Solution:
Let $x$ denote Gabriel's score in his missing quiz.
\[ \bar{x} = \frac{43 + 35 + 39 + x}{4} = 41 \]
\[ 117 + x = 4(41) = 164 \]
\[ x = 47 \]
Example 3: In a class of 18 men and 22 women, the mean score of men in a quiz is 38
while the mean score of women is 35. Find the mean score of the whole class.
Solution:
\[ \bar{x} = \frac{18(38) + 22(35)}{18 + 22} = \frac{684 + 770}{40} = \frac{1454}{40} \]
\[ \bar{x} = 36.35 \]
In grouped data, we do not know the individual data points. In such situations,
we use the midpoints of the intervals to represent individual scores. Consequently, the
mean of the grouped data is only an approximation.
\[ \bar{x} = \frac{\sum f x_m}{\sum f} \]
where $x_m$ is the midpoint of each interval and $f$ is the frequency of each interval.
Example 4: Find the mean score of 42 students from the following frequency
distribution:
Score Frequency
16 ≤x< 24 11
24 ≤x< 32 13
32 ≤x< 40 7
40 ≤x< 48 3
48 ≤x< 56 2
56 ≤x< 64 6
Solution:
Step 1: Add two columns for Midpoint ($x_m$) and $f x_m$, and compute their values. The
midpoint is half of the sum of the lower limit and the upper limit less one unit of
measure in each interval (see the table below), while $f x_m$ is the product of the
frequency and the midpoint in each interval.
Step 2: Compute $\sum f$ and $\sum f x_m$.
Step 3: Use the formula $\bar{x} = \frac{\sum f x_m}{\sum f}$ to get the mean of the grouped frequency distribution.
Score          Midpoint ($x_m$)       Frequency ($f$)    $f x_m$
16 ≤ x < 24    (16 + 23)/2 = 19.5     11                 11(19.5) = 214.5
24 ≤ x < 32    (24 + 31)/2 = 27.5     13                 13(27.5) = 357.5
32 ≤ x < 40    35.5                    7                 248.5
40 ≤ x < 48    43.5                    3                 130.5
48 ≤ x < 56    51.5                    2                 103.0
56 ≤ x < 64    59.5                    6                 357.0
Total                                 $\sum f$ = 42      $\sum f x_m$ = 1411
Finally, $\bar{x} = \frac{1411}{42} \approx 33.60$
Note: The data in this example are those used in Illustration 5 of this chapter.
The reader is urged to compute the actual mean, which is 33.64. This shows that the
mean of grouped data is just an approximation of the actual mean.
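The grouped-mean computation can be sketched in a few lines of Python. This is only an illustration under the example's own convention that the midpoint is half the sum of the lower limit and the upper limit less one unit; the function name `grouped_mean` and the `unit` parameter are assumptions made for this sketch.

```python
# Intervals and frequencies from Example 4 (scores are whole numbers).
intervals = [(16, 24), (24, 32), (32, 40), (40, 48), (48, 56), (56, 64)]
frequencies = [11, 13, 7, 3, 2, 6]

def grouped_mean(intervals, frequencies, unit=1):
    """Approximate the mean of grouped data using interval midpoints."""
    total_f = sum(frequencies)                      # sum of frequencies
    total_fx = 0.0                                  # sum of frequency x midpoint
    for (lower, upper), f in zip(intervals, frequencies):
        midpoint = (lower + (upper - unit)) / 2     # e.g. (16 + 23) / 2 = 19.5
        total_fx += f * midpoint
    return total_fx / total_f

print(round(grouped_mean(intervals, frequencies), 2))  # 33.6
```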
Median
The median is a value that separates an array of data points into two equal parts.
To find it, the data need first to be arranged in numerical order. If there is an odd
number of data points, then the median is the middle value. If there is an even number
of values in the data set, then the median is the average of the two middle values. The
median can be denoted by $\tilde{x}$.
Unlike the mean, the median is not affected by extreme values among the data points because
it only considers the middle values in the data set.
Example 6: The current crude oil prices (in pesos/liter) in nearby gas stations are listed
below. Find the median price.
43.80 44.10 42.95 43.80 44.30 39.00 44.30 43.90
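The procedure described above (sort, then take the middle value or the average of the two middle values) can be sketched as follows. The function name `median` is illustrative; Python's standard library offers an equivalent `statistics.median`.

```python
def median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)            # arrange in numerical order first
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                      # odd count: a single middle value
        return ordered[middle]
    # even count: average the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2

# Prices from Example 6 (pesos/liter).
prices = [43.80, 44.10, 42.95, 43.80, 44.30, 39.00, 44.30, 43.90]
print(round(median(prices), 2))
```

With eight prices, the two middle values of the sorted list are 43.80 and 43.90, so the median is their average, 43.85 pesos/liter. The unusually low price of 39.00 has no effect on it.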
Mode
The mode of a data set is the data point that occurs most often. If no data point is
repeated or every data point is repeated the same number of times, there is no mode.
If the mode of a data set exists, it may not be unique. A unimodal data set has one
mode, bimodal has two modes, trimodal has three modes and multimodal has many
modes. The mode can be used for qualitative as well as quantitative data.
The mode is not affected by extreme values in the data set, since it only considers
the most frequent data point. The mode can be denoted by $\hat{x}$.
Solution:
a. There is no mode because no data point is repeated.
b. There is no mode because all data points are repeated twice.
c. The modes are 5 and 8, since 5 and 8 are each repeated twice.
Example 8: Thirty students are asked about their favorite color. The data is summarized
by the frequency distribution table below. Find the mode.
Color Frequency
Yellow 2
Blue 5
Red 5
White 8
Black 10
The mode is black, since it has the highest frequency.
Exercises
3. The mean monthly salary of 32 men is P28,500 while that of 38 women is P24,400.
Find the mean salary of all the men and women.
4. If your 1st Term Score in this class is 38.42, and your 2nd Term Score is 43.83, what
score do you need in the 3rd Term so that your mean score is 60.25?
5. During quarantine, many people tend to binge-watch series on television. The frequency
distribution table below shows the number of hours per day spent by 30
teenagers. Find the mean time spent by the teenagers watching series per day.
Suppose that we are choosing between Jerico and Jerwin on who should represent
CLSU to an upcoming Inter-University Math Quiz Bee. To choose, their coach conducted
6 sessions of quiz-alikes between them, and came up with the following scores:
So, after the 6 quizzes, Jerico and Jerwin were tied at 3 wins and 3 losses. Who
should be chosen? Looking at their averages (verify):
Surprisingly, they are again tied in these measures. The mean, the median, and the
mode cannot help in deciding who should be sent to the Quiz Bee!
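The "verify" step can be sketched with Python's standard `statistics` module. The scores used here are the ones listed for Jerico and Jerwin in the variance example later in this chapter; the helper name `averages` is only illustrative.

```python
import statistics

# Quiz scores as listed in the variance example of this chapter.
jerico = [83, 65, 100, 92, 85, 85]
jerwin = [81, 85, 74, 85, 90, 95]

def averages(scores):
    """Return the (mean, median, mode) of a list of scores."""
    return (statistics.mean(scores),
            statistics.median(scores),
            statistics.mode(scores))

print("Jerico:", averages(jerico))
print("Jerwin:", averages(jerwin))
```

Both contestants come out with a mean, a median, and a mode of 85, which is why these three measures cannot break the tie.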
Another measure that could help is their consistency. This calls for a
measure of variability, which looks at how spread apart or dispersed their scores are.
Measures of Variability
Range
The range is the difference between the lowest and the highest
values in a data set. A weakness of the range is that an extreme value (outlier) can
greatly alter its value.
Range = Highest Value – Lowest Value
Generally, in any set of data, it can be shown algebraically that the sum of the
deviations is always 0. The negatives always cancel out the positives. So, in order to
use deviations effectively to study how the data are dispersed, the remedy is to square
each deviation. This leads to what is called the variance.
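The claim that the deviations always sum to zero can be checked with a short derivation. The notation is an assumption chosen to match the rest of the chapter: $x_1, \dots, x_n$ are the data points and $\bar{x} = \frac{1}{n}\sum x_i$ is their mean, so that $\sum x_i = n\bar{x}$.

\[ \sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - n\bar{x} = n\bar{x} - n\bar{x} = 0 \]

For instance, the deviations –2, –20, 15, 7, 0, 0 of the scores 83, 65, 100, 92, 85, 85 from their mean 85 add up to 0.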
Variance is the mean of the squared deviations of the data points. The sample
variance (denoted by $s^2$) is an estimator of the population variance (denoted by $\sigma^2$). In
symbols, the sample variance of data points $x$, where $n$ is the number of data
points, is defined as
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]
Note: 1. If the data points represent the entire population, the divisor used is $n$.
But for sample data points, the divisor is $n - 1$. Statisticians generally
agree that using $n - 1$ rather than $n$
produces a better estimate of the true population variance.
Example: Compute the respective (a) variance and (b) standard deviation of the scores
of Jerico and Jerwin.
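Solution:
a. Using the formula $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$,
Jerico ($\bar{x} = 85$)                              Jerwin ($\bar{x} = 85$)
Score    Deviation        Squared deviation      Score    Deviation        Squared deviation
$x$      $x - \bar{x}$    $(x - \bar{x})^2$      $x$      $x - \bar{x}$    $(x - \bar{x})^2$
83       –2               4                      81       –4               16
65       –20              400                    85       0                0
100      15               225                    74       –11              121
92       7                49                     85       0                0
85       0                0                      90       5                25
85       0                0                      95       10               100
$\sum (x - \bar{x})^2 = 678$                     $\sum (x - \bar{x})^2 = 262$
$s^2 = \frac{678}{5} = 135.6$                    $s^2 = \frac{262}{5} = 52.4$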
Take note that the value in the Deviation column is computed by subtracting the
given mean from each data point, for example 83 – 85 = –2, 65 – 85 = –20, 100 – 85 = 15, and so on;
while the value in the $(x - \bar{x})^2$ column is computed by squaring each value in the $x - \bar{x}$
column, for example $(-2)^2 = 4$, $(-20)^2 = 400$, $(15)^2 = 225$, and so on.
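The tabular computation above can be sketched in Python. This is a minimal illustration, assuming the sample divisor $n - 1$ and taking the standard deviation to be the square root of the variance, which agrees with the values 11.64 and 7.24 quoted in this chapter. The function names are illustrative only.

```python
import math

def sample_variance(scores):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(scores)
    mean = sum(scores) / n
    squared_deviations = [(x - mean) ** 2 for x in scores]
    return sum(squared_deviations) / (n - 1)

def sample_std(scores):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(scores))

jerico = [83, 65, 100, 92, 85, 85]
jerwin = [81, 85, 74, 85, 90, 95]

for name, scores in (("Jerico", jerico), ("Jerwin", jerwin)):
    print(name, round(sample_variance(scores), 2), round(sample_std(scores), 2))
```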
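Other solution: Using the alternative variance formula, $s^2 = \frac{1}{n - 1}\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]$. We
need to find the sum of the data and the sum of the squares of each data point. We
don't need the mean of the data.
Jerico                            Jerwin
Score ($x$)    $x^2$              Score ($x$)    $x^2$
83             6 889              81             6 561
65             4 225              85             7 225
100            10 000             74             5 476
92             8 464              85             7 225
85             7 225              90             8 100
85             7 225              95             9 025
$\sum x = 510$, $\sum x^2 = 44\,028$                  $\sum x = 510$, $\sum x^2 = 43\,612$
$s^2 = \frac{1}{5}\left[44\,028 - \frac{(510)^2}{6}\right]$     $s^2 = \frac{1}{5}\left[43\,612 - \frac{(510)^2}{6}\right]$
$s^2 = \frac{1}{5}[44\,028 - 43\,350]$                $s^2 = \frac{1}{5}[43\,612 - 43\,350]$
$s^2 = 135.6$                                         $s^2 = 52.4$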
Note that the two formulas for variance yield the same result. This is always the
case. In fact, it may be proven algebraically that the formulas are equivalent.
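A short sketch of that algebra, assuming the usual notation ($n$ data points $x$ with mean $\bar{x} = \frac{\sum x}{n}$): expand the square and substitute the mean.

\[ \sum (x - \bar{x})^2 = \sum x^2 - 2\bar{x}\sum x + n\bar{x}^2 = \sum x^2 - \frac{2(\sum x)^2}{n} + \frac{(\sum x)^2}{n} = \sum x^2 - \frac{(\sum x)^2}{n} \]

Dividing both sides by $n - 1$ turns the left side into the definition of the sample variance and the right side into the alternative formula.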
So, between Jerico and Jerwin in the example, Jerwin wins as far as
consistency is concerned because he has a lower standard deviation, 7.24, as compared
to Jerico's 11.64.
Exercises
4. Nik and Nok played 5 games in a bowling center. Their scores are:
As earlier discussed, the measures of central tendency, especially the mean and
the median, describe the 'center' of a distribution. Indeed, such a center is what is
usually used and needed to summarize a distribution. Occasionally, however, a different
part of the distribution is of more interest. The percentile, decile, and quartile are
used in such occasions, as they indicate the location of a data point relative to the other
data points.
Percentiles
Percentiles split the whole distribution into 100 subgroups. It is similar to cutting
a long pipe into 100 short pipes of equal lengths. In order to do this, it is necessary to
make 99 cuts. The points where the cuts are done correspond to percentile ranks or
scores. Thus, percentile ranks are from 1 to 99, which we hereby denote by P1, P2, P3,
…, P99. It makes no sense to have a P0 or a P100.
A percentile is a value that describes the percentage of data that falls below it.
For example, suppose you got a 99th percentile score in an exam. It means that 99% of
the examinees scored lower than you; it doesn't mean that you had a score of 99%. In
fact, your actual score is not at all indicated.
Illustration:
Suppose that Sonny is among the 15,000 high school graduates who took
the CLSU Admission Test, and he got a 48th percentile score.
His 48th percentile score means that 48% of the 15,000 examinees (7,200)
scored lower than Sonny. It doesn't mean that his actual score in the exam is 48.
On the other hand, his actual score is lower than those of 52% of the 15,000 examinees
(7,800).
Suppose another student Nick got a percentile score of 68. This means
that 68% of the 15,000 examinees (10,200) scored lower than Nick while 32%
or 4,800 examinees scored higher than him.
The actual scores of Sonny and Nick both remain unknown, until we do
some calculations that also involve the whole distribution of data points, their
percentile scores, and the number of data points.
Calculating Percentiles
To find a data point that corresponds to a percentile score $P_k$, the following steps
are suggested.
1. Arrange the data points numerically from lowest to highest.
Solution: Note that P25 and P80 respectively refer to the 25th and 80th percentiles.
Step 1. Arrange the data in ascending order:
Location     1  2  3  4  5  6  7  8  9  10  11  12  13  14  15
Data Point   2  2  3  3  3  4  4  4  4   5   5   5   6   6   6
Step 2. $L_{25} = \frac{25}{100}(15 + 1) = 4$ and $L_{80} = \frac{80}{100}(15 + 1) = 12.8$
Step 3. Since $L_{25} = 4$ (an integer), then
P25 = 4th data point
= 3
Since $L_{80} = 12.8$ (with a decimal part), then
P80 = 12th data point + 0.8(13th – 12th)
= 5 + 0.8(6 – 5)
= 5.8
The data points that correspond to the 25th and 80th percentiles are respectively 3 and 5.8.
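The worked steps above suggest a general rule: compute the location $L_k = \frac{k}{100}(n + 1)$, then take the data point at that location, interpolating between neighbors when the location has a decimal part. The following Python sketch implements that reading. The function name `percentile` and the clamping of locations that fall outside the data are assumptions of this sketch, not part of the module.

```python
def percentile(data, k):
    """Value at the k-th percentile using the location rule L = (k/100)(n + 1)."""
    ordered = sorted(data)              # Step 1: arrange from lowest to highest
    n = len(ordered)
    location = k * (n + 1) / 100        # Step 2: location of the percentile
    whole = int(location)               # integer part of the location
    fraction = location - whole         # decimal part of the location
    # Assumption: locations outside 1..n are clamped to the nearest end.
    if whole < 1:
        return ordered[0]
    if whole >= n:
        return ordered[-1]
    lower = ordered[whole - 1]          # data point at position `whole` (1-based)
    upper = ordered[whole]              # the next data point
    return lower + fraction * (upper - lower)   # Step 3: interpolate if needed

data = [2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6, 6]
print(round(percentile(data, 25), 2), round(percentile(data, 80), 2))
```

For this data set the sketch reproduces the values found by hand, 3 for P25 and 5.8 for P80.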
Solution:
Step 1. Arrange the data according to height (shortest to tallest).
 1. Ingrid 58      7. Jorem 63     13. JR    64     19. Jhun 67
 2. Delia  58      8. Dinah 63     14. Rose  64     20. Rene 67
 3. Sonny  59      9. Edu   63     15. Chito 66     21. Rain 67
 4. Jade   59     10. Ammi  64     16. Ronel 66     22. Chad 70
 5. Nick   61     11. Angie 64     17. Ped   66
 6. Melch  62     12. Edwin 64     18. Al    67
Step 2. $L_{30} = \frac{30}{100}(22 + 1) = 6.9$ and $L_{60} = \frac{60}{100}(22 + 1) = 13.8$
Step 3.
P30 = 6th data point + 0.9(7th – 6th)
P30 = 62 + 0.9(63 – 62)
P30 = 62.9
P60 = 13th data point + 0.8(14th – 13th)
P60 = 64 + 0.8(64 – 64)
P60 = 64
a. P30 = 62.9 and P60 = 64.
Deciles
Example 3: In the preceding example (Example 2) of student heights, the 3rd decile D3
could be computed by considering P30, which was computed to be 62.9.
Furthermore, D6 = P60 = 64. Similarly, to find the 9th decile, D9 = P90.
Computing for P90,
Step 1. (see the arranged data in Example 2)
Step 2. $L_{90} = \frac{90}{100}(22 + 1) = 20.7$
Quartiles
Example 3. The following are heights (in inches) of some students. Find Q1, Q2, and Q3.
Sonny 59 Melch 62 Ronel 66 Jhun 67
Nick 61 Jade 59 JR 64 Edu 63
Ingrid 58 Ammi 64 Dinah 63 Rene 67
Rose 64 Delia 58 Ped 66 Rain 67
Chad 70 Angie 64 Edwin 64
Chito 66 Jorem 63 Al 67
Solution: Since Q1 = P25, Q2 = P50 and Q3 = P75, we compute for the corresponding
percentiles.
Step 3.
a. P25 = 5th + 0.75(6th – 5th)
= 61 + 0.75(62 – 61)
= 61.75
b. P50 = 11th + 0.5(12th – 11th)
= 64 + 0.5(64 – 64)
= 64
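As a cross-check, Python's standard library can compute the three quartiles directly. The `statistics.quantiles` function (Python 3.8 or later) with `method="exclusive"` places the cut points at $\frac{i}{4}(n + 1)$ and interpolates linearly, which matches the location rule used in this example. The list below is the heights table from this example, in the order given.

```python
import statistics

# Heights (in inches) of the 22 students in the quartile example.
heights = [59, 62, 66, 67, 61, 59, 64, 63, 58, 64, 63, 67,
           64, 58, 66, 67, 70, 64, 64, 66, 63, 67]

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(heights, n=4, method="exclusive")
print(q1, q2, q3)
```

The first two values agree with the hand computation (61.75 and 64). The third quartile follows from the same rule: its location is $\frac{75}{100}(22 + 1) = 17.25$, between the 17th data point (66) and the 18th (67), giving 66.25.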
Exercises
The following are weights (in pounds) of newborn babies in a certain hospital in a
period of 1 week.
1. P5, P20, P30, P48, P60, P80, P88, P90, P95, P98
2. D1, D2, D3, D4, D5, D6, D7, D8, D9
3. Q1, Q2, Q3
4. If Baby A weighs 7.98 pounds and Baby B weighs 8.65 pounds, to what
percentiles do Baby A and Baby B belong?
Normal Distribution
Many sets of data exhibit a pattern such as what is exhibited in the following
histogram of some discrete data. Most of the data are concentrated towards the center
and taper off at either end; the data is almost symmetrical with respect to the “center”.
[Figure: histogram of some discrete data; vertical axis: Frequency]
This type of data distribution occurs very frequently in many situations. The
normal distribution or the Gaussian distribution (in honor of Gauss, 1777-1835) is the
most important distribution in statistics. Statisticians created an ideal bell-shaped curve
(also called the normal curve) to describe such normally distributed data. The normal
curve is symmetric about a vertical axis through the mean, with a total area under the
curve equal to 1, and the curve is asymptotic to the x-axis.
All data points are contained and spread under the bell shape, which is asymptotic
to the horizontal line. Characteristically,
1. Data points are clustered toward the center; only a few are found toward the
two ends or tails.
2. The number of data points at both sides is the same. Consequently, the three
measures of central tendency (mean, median and mode) all coincide at the
center.
A wide variety of data have been observed to manifest the normal distribution,
and statisticians have established the occurrence and location of data points under the
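normal curve. With the population mean $\mu$ and population standard deviation $\sigma$, the
occurrence of data under the normal curve has been established as follows:
[Figure: normal curve with the horizontal axis marked at µ–3σ, µ–2σ, µ–σ, µ, µ+σ, µ+2σ, µ+3σ]
68.26% of the data lie between µ–σ and µ+σ
95.44% of the data lie between µ–2σ and µ+2σ
99.74% of the data lie between µ–3σ and µ+3σ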
Illustration. Assume that the scores of all 32,000 civil service examinees this year are
normally distributed. Their mean score is 66.5 points and the standard deviation is
2.4 points.
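These percentages can be checked numerically. The sketch below uses `statistics.NormalDist` from Python's standard library (Python 3.8 or later) with the mean and standard deviation of this illustration; the helper name `share_within` is an assumption made for this sketch.

```python
from statistics import NormalDist

# Mean and standard deviation from the civil service illustration.
mu, sigma = 66.5, 2.4
scores = NormalDist(mu=mu, sigma=sigma)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return scores.cdf(mu + k * sigma) - scores.cdf(mu - k * sigma)

for k in (1, 2, 3):
    low, high = mu - k * sigma, mu + k * sigma
    print(f"{low:.1f} to {high:.1f}: {100 * share_within(k):.2f}%")
```

The computed shares can differ from 68.26%, 95.44% and 99.74% in the last digit, because those figures come from z-table entries rounded to four decimal places.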
Example 1: In a recently concluded IQ Test among all 9,800 currently enrolled CLSU
students, results showed that the mean IQ is 100, with a standard deviation of
15. Assume that the scores are normally distributed. How many of the students
have an IQ
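Solution: With the given µ = 100 and σ = 15, the distribution of the scores is
[Figure: normal curve centered at µ = 100; 68.26% of the scores lie between 85 and 115, 95.44% between 70 and 130, and 99.74% between 55 and 145]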
a. Above 100.
Note that 100 is the mean, and in a normal distribution the mean is at the center.
Since a normal curve is symmetric about the center (µ = 100), half
or 50% of the scores must be above it. So, half of the 9800 students, that is
4900 students, have an IQ above 100.
c. Above 145.
Those whose scores fall from 55 (or µ–3σ) to 145 (or µ+3σ) account for
99.74% of the data. Hence, the rest, that is, those who scored above 145
(right tail) and below 55 (left tail), account for only 100%–99.74% = 0.26%.
Since the normal curve is symmetric, only 0.13% are at each of the
two tails. Thus, 0.13% of 9800, which is approximately 12 students, have an
IQ above 145.
Observe in the preceding example that the numbers involved in the questions
(100, 145, 85, and 115) are precisely where µ, µ+3σ, µ–σ, and µ+σ are respectively
situated in the normal curve. Now, suppose there is a question such as “How many
students had an IQ above 120?”.
We see that 120 lies somewhere in the interval (µ+σ, µ+2σ), that is (115, 130).
In cases such as this, the z-distribution comes in.
This results in a normal distribution whose mean is 0 and whose standard deviation is 1, as
illustrated in the following z-curve.
[Figure: standard normal curve; horizontal axis: z-score, marked from –3 to 3]
Illustration 1: In the preceding example about the IQ Test of 9800 students with µ = 100
and σ = 15, a score of 120 corresponds to a z-score of
\[ z = \frac{x - \mu}{\sigma} = \frac{120 - 100}{15} \approx 1.33 \]
For various z-scores, the following z-tables summarize the areas under the curve
relative to the entire area, which is taken to be 1. A z-table, also called the
standard normal table, is a statistical table that allows us to know the percentage or
proportion of values below (or to the left of) a z-score in a standard normal
distribution. There are two z-tables: a negative z-table for negative z-scores and a
positive z-table for positive z-scores.
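A z-table lookup can be imitated in Python as a rough cross-check. The sketch below assumes the usual convention that z is rounded to two decimal places and the tabulated area to four. The function names `z_score` and `area_below` are illustrative, and `statistics.NormalDist` requires Python 3.8 or later.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

def area_below(z):
    """Area under the standard normal curve to the left of z, as a z-table lists it."""
    # Assumption: mimic a printed table (z to 2 decimals, area to 4 decimals).
    return round(NormalDist().cdf(round(z, 2)), 4)

z = z_score(120, mu=100, sigma=15)
print(round(z, 2), area_below(z))
```

For an IQ of 120 this gives z of about 1.33 and an area of 0.9082 to its left, so the area to the right is 1 – 0.9082 = 0.0918.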
Example 2: In the recently concluded IQ Test among all 9,800 currently enrolled CLSU
students, results showed that the mean IQ is 100, with a standard deviation of
15. Assume that the scores are normally distributed. How many of the students
have an IQ
A: a) above 100 b) above 145 c) between 85 and 115
B: a) above 120 b) less than 90 c) between 80 and 130
Solution:
The solutions for the A problems have been earlier found in Example 1 where it
wasn't necessary to use z-scores. We do them here again using z-scores.
a) $z = \frac{100 - 100}{15} = 0.00$
b) $z = \frac{145 - 100}{15} = 3.00$
c) $z = \frac{85 - 100}{15} = -1.00$ and $z = \frac{115 - 100}{15} = 1.00$
Using now the z-table, noting that the values therein are areas under the curve from the
left up to z, we read off the following values: 0.5000 for z = 0.00, 0.9987 for z = 3.00,
0.1587 for z = –1.00, and 0.8413 for z = 1.00.
a) Below z = 0 is 0.5000, which means that above z = 0 is also 0.5000, since
1 – 0.5000 = 0.5000. So, there are (0.5000)(9800) or 4,900 students.
b) Below z = 3 is 0.9987, which implies that above z = 3 must be 1 – 0.9987 or 0.0013.
So, there are (0.0013)(9800) or 12 students.
c) Below z = –1 is 0.1587 and below z = 1 is 0.8413. To get the area or percentage
between –1 < z < 1, we need to get the difference, 0.8413 – 0.1587 = 0.6826.
So, there are (0.6826)(9800) or 6,689 students.
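For the B problems:
a) $z = \frac{120 - 100}{15} \approx 1.33$
b) $z = \frac{90 - 100}{15} \approx -0.67$
c) $z = \frac{80 - 100}{15} \approx -1.33$ and $z = \frac{130 - 100}{15} = 2.00$
Using now the z-table, noting that the values therein are areas under the curve from the
left up to z, we read off the following values: 0.9082 for z = 1.33, 0.2514 for z = –0.67,
0.0918 for z = –1.33, and 0.9772 for z = 2.00.
a) Above z = 1.33 is 1 – 0.9082 = 0.0918.
So, there are (0.0918)(9800) or 899 students.
b) Below z = –0.67 is 0.2514.
So, there are (0.2514)(9800) or 2,463 students.
c) –1.33 < z < 2 has the area 0.9772 – 0.0918 or 0.8854.
So, there are (0.8854)(9800) or 8,676 students.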
Example 3: The times taken to answer a mathematics exam have a normal distribution
with a mean of 65 minutes and standard deviation of 5 minutes. There are 200
students who took the exam.
a. How many examinees finished their exam in less than 1 hour?
b. How many examinees finished their exam in 63 to 72 minutes?
c. If the exam is good only for 75 minutes, how many examinees failed to finish the
exam on the given time limit?
Solution: Given: µ = 65 and σ = 5. Let x be the time taken to answer the exam.
a. Consider the area below x = 60; we convert 1 hour to minutes because µ and σ are in
terms of minutes.
\[ z = \frac{60 - 65}{5} = -1.00 \]
Using the z-table, below z = –1.00 is 0.1587.
Hence, (0.1587)(200) or 32 examinees finished the exam in less than an hour.
c. Examinees who failed to finish the exam are those whose time is above x = 75.
\[ z = \frac{75 - 65}{5} = 2.00 \]
Using the z-table, below z = 2.00 is 0.9772.
It implies that above z = 2.00 is 1 – 0.9772 = 0.0228.
Hence, (0.0228)(200) = 4.56, or about 5 examinees failed to finish the exam within the time limit.
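The three parts of Example 3 can be cross-checked with a short Python sketch using `statistics.NormalDist` (Python 3.8 or later). It works directly with the distribution of exam times rather than with a printed z-table, so the proportions may differ from table values in the fourth decimal place. The variable names are illustrative.

```python
from statistics import NormalDist

exam_time = NormalDist(mu=65, sigma=5)    # minutes, as given in Example 3
examinees = 200

below_60 = exam_time.cdf(60)                              # a. less than 1 hour
between_63_72 = exam_time.cdf(72) - exam_time.cdf(63)     # b. 63 to 72 minutes
above_75 = 1 - exam_time.cdf(75)                          # c. over the 75-minute limit

for label, share in (("less than 60 min", below_60),
                     ("63 to 72 min", between_63_72),
                     ("more than 75 min", above_75)):
    print(f"{label}: proportion {share:.4f}, about {share * examinees:.1f} examinees")
```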
Exercises
I. In a math class of 60 students, the mean general average of the students at the
end of the semester is 63.60 with a standard deviation of 3.40. Assuming that the
scores are normally distributed,
1. Sketch the normal curve that indicates the intervals representing 1, 2 and 3
standard deviations from the mean.
II. An important quality characteristic for softdrink bottlers is the amount of softdrink
injected into each bottle. In a particular filling process, the number of ounces
injected into an 8-ounce bottle is approximately normally distributed with a mean of
8.00 ounces and a standard deviation of 0.05 ounce. Bottles that contain less than
7.90 ounces do not meet the bottler's quality standard. If 20,000 bottles are filled,
approximately how many will meet the quality standard?
Chapter Assessment
1. The weights (in kilos) of some live chickens at the University Poultry Project are as
follows:
1.1 2.4 1.3 0.9 2.0 1.7 3.0 1.4 2.8
1.7 2.3 1.8 2.7 2.1 3.2 1.8 3.3 0.9
1.9 2.9 2.8 1.8 2.6 3.1 2.4 3.0 2.6
3.2 2.4 3.1 2.5 1.9 1.2 1.5 2.3
a. Organize the data in a frequency distribution.
b. Represent the data in a histogram.
3. PAGASA reported the following data about the yearly number of typhoons (NT) that
the country experienced in the past.
Year NT Year NT Year NT
2005 8 2010 13 2015 17
2006 15 2011 14 2016 23
2007 22 2012 18 2017 15
2008 9 2013 21 2018 21
2009 17 2014 14 2019 16
Find the mean, median, and the mode of the yearly number of typhoons.
4. Suppose that the mean score of 26 students (in a class of 30 students) in the MMW
Final Exam is 57.98. What should be the total score of the remaining 4 students in the
class in order that the class mean is 60.00?
6. Find the mean and the standard deviation of each data set:
a. 2 4 6 8 10
b. 52 54 56 58 60
c. 60 60 60 60 60
d. 21 60 60 60 99
Make some observations about the data sets and their respective means and standard
deviations.
7. Tabulated below are diesel prices (pesos/liter) of some gas stations in the cities of
San Jose and Cabanatuan.
San Jose 39.95 40.15 41.05 39.80 40.65 41.25
Cabanatuan 41.20 38.15 42.25 40.35 39.80 41.70
a. Find the mean price of diesel in each city.
b. Find the standard deviation in each city.
c. Which city has a more consistently priced diesel? Why?
8. Consider again the following weights (in kilos) of some live chickens at the University
Poultry Project (Exercise #1).
1.1 2.4 1.3 0.9 2.0 1.7 3.0 1.4 2.8 1.2
1.7 2.3 1.8 2.7 2.1 3.2 1.8 3.3 0.9 1.5
1.9 2.9 2.8 1.8 2.6 3.1 2.4 3.0 2.6 2.3
3.2 2.4 3.1 2.5 1.9
9. For all brands of pain reliever pills in the market, the amount of time between taking
a pill and getting relief is normally distributed with a mean of 18 minutes and a
standard deviation of 3 minutes. Find the probability that after taking a pill, one will
feel relief within
a. at least 24 minutes
b. at most 15 minutes.
10. The results of a citywide exam for assessing the efficiency and capability of tricycle
drivers in their service to the public were normally distributed with a mean score of
72 and a standard deviation of 12. The drivers who scored in the top 10% are to
receive a special certificate, while those in the bottom 20% will be required to
undergo a remedial training/workshop.
11. In a mango farm of 100 trees, the mean harvest is 96.5 kilos of fruit per tree with a
standard deviation of 8.4 kilos. Assuming that the harvests per tree are normally
distributed,
a. Sketch the normal curve that indicates the intervals representing 1, 2 and 3
standard deviations from the mean.
c. If a tree that yielded below µ – 3σ is to be cut off, how many trees would be cut
off?
d. If a tree that only yielded 50 kilos or less is to be cut off, how many trees would
remain?