
Math 1100 Module 4

Math Modern World College First year First Sem

Uploaded by

jjksgst

Central Luzon State University

Science City of Muñoz 3120


Nueva Ecija, Philippines

Instructional Module for


Mathematics in the Modern World

Chapter 4
Data Management
Overview

During the Crimean War, in the Victorian era, Florence Nightingale (1820–1910)
took on a mission to improve the squalid field hospital conditions of the British
army. She compiled massive amounts of data from the army files, which she used to
convince members of the British Parliament of the need to supply nursing and
medical care for soldiers in the field. Through a remarkable series of graphs, she
used statistics to demonstrate that most of the deaths in the war were due to illness
contracted off the battlefield or to wounds that went untreated. Her compassion and
self-sacrificing nature, coupled with her ability to collect, arrange, and present large
amounts of data, led to her being regarded as the Passionate Statistician.
(https://www.coursehero.com/file/p6unj1f/Descriptive-statistics-utilizes-numerical-and-graphical-methods-to-look-for/)

The above story clearly illustrates the importance of being able to efficiently
collect, organize and manage data. In this chapter, we briefly discuss data
management, which is mainly a topic under the field of Statistics.

Objectives
On successful completion of the module, students will be able to:
1. Advocate the use of statistical data in making important decisions.
2. Discuss and interpret data.
3. Understand and interpret the different measures of central tendency,
measures of dispersion, and measures of relative position.
4. Use a variety of statistical tools to process and manage numerical data.

Statistics
Statistics is the science of collecting, organizing and summarizing recorded
information or data (descriptive statistics) in such a way that a valid conclusion and
meaningful predictions can be drawn from them (inferential statistics).
Mathematics in the Modern World | 4. Data Management

Types of Statistics
1. Descriptive statistics consists of methods concerned with the collection,
description and analysis of data without drawing conclusions or inferences about a
larger set. Its main concern is simply to describe the set of data such that
otherwise obscure information is brought out clearly.
2. Inferential statistics utilizes sample data to make estimates, decisions, predictions,
or other generalizations about a larger set of data.

Variables
In statistics, a variable refers to a specific characteristic (or attribute) of a
subject. Such an attribute may assume two or more different values. For example, the
"sex" of a person is a variable; its value is either 'male' or 'female'. Other examples of
variables are your course, citizenship, age, height and weight.

Types of Variables
1. Qualitative variables are those whose values are measured not in terms of
numbers, but categorically by means of description. Examples are "course",
"citizenship", "favorite color" and "place of birth".
2. Quantitative variables are those that are always associated with numbers or a
scale measure. Examples are “age”, “height”, “weight” and “population”.

The measurement of a variable may be either discrete (integer) or continuous,
and is classified into one of the following scales of measurement:
1. Nominal – characterized by data that consists of names, labels, codes or
categories only. These data cannot be arranged in an ordering scheme and cannot
be used for calculations. Examples are measurements of gender, race, religion,
and sports.
2. Ordinal – it involves data that may be arranged in some order. Examples are sizes
(small, medium, large), socio-economic class (working, middle, upper), and the
Likert scale (strongly disagree, disagree, neutral, agree, strongly agree).
3. Interval – measurements where the difference between values is meaningful, but
where zero does not mean the absence of the attribute. An example is temperature
in Celsius or Fahrenheit.
4. Ratio – measurements are ordered according to the amount of the attribute they
possess. Equal differences in the attribute are represented by equal differences in
the numbers assigned, and zero means the absence of the attribute. Examples are
height and weight. Temperature in Celsius or Fahrenheit is not a ratio scale
because 0 °C or 0 °F does not mean the absence of temperature, while temperature
in Kelvin is an example of a ratio scale since 0 K means an absence of heat.

Population versus Sample


In statistics, a population refers to the entire set of all objects under study, while
a sample refers to any subset of the population.
[Figure: a small region labeled Sample inside a larger region labeled Population]


Illustration 1:
Consider an upcoming election for Provincial Governor. A candidate spends time,
money and effort to conduct a survey on who is likely to be the next governor.
Statistically, the whole list of voters in the province is what is referred to as the
population for the survey. But inasmuch as it would be very costly and virtually
impossible to interview every voter in the province, only a few will be actually
interviewed. Such a few voters are what are referred to as the sample. Results from the
sample will then be used to project the trend of the whole population.
That is, data are collected from a sample and then summarized in order to draw a
conclusion that is taken to be true for the whole population. Thus, a good sample is
one that truly represents the population, so that conclusions made from the sample
are valid for the entire population. If a sample is bad, then conclusions from it may
not be valid for the population. The fact is, information could change from one
sample to another sample of the same population.
[Figure: a Sample drawn from inside the POPULATION]

Illustration 2:
A student researcher wants to do a survey among CLSU students. Instead of
doing a survey of all the students in CLSU, he just chose and surveyed a group of 45
students (five students per college). In this scenario, the population is all the students
of CLSU, while the sample is the group of 45 students.

Organizing Data
Considered as Phase I of organizing data is data collection, where each element
of the data is called a data point. Generally in this phase, the raw data may not show
any apparent pattern or trend.

Illustration 3:
Phase I. The following data are the respective number of kids of 50 families.
0 2 1 0 3 2 0 1 1 0
0 1 1 2 4 1 0 1 1 0
2 1 0 0 3 0 0 1 2 1
0 0 2 4 1 1 0 1 2 0
1 1 0 3 5 1 2 1 3 2

As presented, the raw data above look like nothing more than a list of numbers. But
if we organize the data (Phase II), they become more meaningful.
Frequency Distribution Table
The most common way of organizing data is using a frequency distribution table
or FDT. It utilizes a table that lists all data points, along with how many times each
data point occurs (frequency, $f$) and its percentage of the total number of data
points (relative frequency).

Illustration 4: (Ungrouped Data)


Phase II. Frequency distribution of the data in Illustration 3
# of Kids    Tally                      Frequency    Relative Frequency
0            IIII – IIII – IIII – I     16           32%
1            IIII – IIII – IIII – III   18           36%
2            IIII – IIII                9            18%
3            IIII                       4            8%
4            II                         2            4%
5            I                          1            2%
Total                                   n = 50       100%
Observe that the data have become more meaningful; for example, we can now
see that the majority (a total of 86%) of the families are small, with only 2 or fewer
kids.
Note that in Illustration 3, there are only a few distinct data points (0, 1, 2, 3, 4,
or 5). If there are many distinct data points, it is better to group together the data that
belong to the same interval, as illustrated below.
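The tallying in Illustration 4 can also be done by a short program. The following is a minimal Python sketch, not part of the original module; the function name `frequency_table` is an illustrative choice, and the data are the 50 values from Illustration 3.

```python
from collections import Counter

# Number of kids in each of the 50 families (raw data from Illustration 3).
kids = [
    0, 2, 1, 0, 3, 2, 0, 1, 1, 0,
    0, 1, 1, 2, 4, 1, 0, 1, 1, 0,
    2, 1, 0, 0, 3, 0, 0, 1, 2, 1,
    0, 0, 2, 4, 1, 1, 0, 1, 2, 0,
    1, 1, 0, 3, 5, 1, 2, 1, 3, 2,
]

def frequency_table(data):
    """Return rows (value, frequency, relative frequency in percent), sorted by value."""
    counts = Counter(data)          # how many times each data point occurs
    n = len(data)
    return [(value, f, 100 * f / n) for value, f in sorted(counts.items())]

table = frequency_table(kids)
for value, f, rf in table:
    print(f"{value:>9} {f:>10} {rf:>9.0f}%")
```

Each printed row corresponds to one row of the frequency distribution table: the data point, its frequency, and its relative frequency.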

Illustration 5. (Grouped Data)


Phase I. The following are examination scores of 42 mathematics students.
26 16 21 34 45 18 41
48 27 22 30 39 62 25
29 31 28 20 56 60 24
32 33 18 23 27 46 30
49 59 19 20 23 24 38
25 61 34 22 38 28 62

Phase II. We organize the raw data into a frequency distribution. First, we must decide
on how many groups to use. Customarily, the number of groups is any number from
4 to 8. Say, we use 6 groups here. Second, we determine the interval for each
group. This is done by dividing the difference between the highest and the lowest
scores by the number of groups:
$$\frac{62 - 16}{6} = \frac{46}{6} \approx 7.67$$

In order to be consistent with the data, which are integers, we round it up to 8.
Hence, the frequency distribution is
Score (x)      Tally                Frequency    Relative Frequency
16 ≤ x < 24    IIII – IIII – I      11           26%
24 ≤ x < 32    IIII – IIII – III    13           31%
32 ≤ x < 40    IIII – II            7            17%
40 ≤ x < 48    III                  3            7%
48 ≤ x < 56    II                   2            5%
56 ≤ x < 64    IIII – I             6            14%
Total                               n = 42       100%
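The grouping in Illustration 5 can be reproduced with a few lines of code. This is a minimal Python sketch under the same choices made above (6 groups, starting at the lowest score, class width rounded up to a whole number); the function name `grouped_frequencies` is hypothetical.

```python
import math

# Examination scores of the 42 students (raw data from Illustration 5).
scores = [
    26, 16, 21, 34, 45, 18, 41,
    48, 27, 22, 30, 39, 62, 25,
    29, 31, 28, 20, 56, 60, 24,
    32, 33, 18, 23, 27, 46, 30,
    49, 59, 19, 20, 23, 24, 38,
    25, 61, 34, 22, 38, 28, 62,
]

def grouped_frequencies(data, start, width, groups):
    """Count the data points in each class start + k*width <= x < start + (k+1)*width."""
    counts = [0] * groups
    for x in data:
        k = (x - start) // width    # index of the class that contains x
        counts[k] += 1
    return counts

groups = 6
width = math.ceil((max(scores) - min(scores)) / groups)   # 46 / 6 rounded up
counts = grouped_frequencies(scores, min(scores), width, groups)

for k, f in enumerate(counts):
    lower = min(scores) + k * width
    print(f"{lower} <= x < {lower + width}: f = {f}, rf = {100 * f / len(scores):.0f}%")
```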


Histogram
Data that are grouped in intervals can be depicted by a histogram, which is
actually a bar graph that shows how the data are distributed. The histogram for the
data in Illustration 5 is:

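[Figure: histogram of the scores in Illustration 5. Vertical axis: Frequency (f),
marked 3, 6, 9, 12, 15. Horizontal axis: Scores, marked 16, 24, 32, 40, 48, 56, 64.]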

Note that a histogram should show an accurate comparison of the data. That is,
the heights of the rectangles must correspond to the frequencies of the intervals, and
the rectangles must all have the same width, since every interval has the same
class width.

The following histograms are erroneous and/or misleading. What's wrong?

1. If the data in Illustration 5 is grouped carelessly as in the following frequency
distribution:
Score (x)      Frequency    Relative Frequency    Interval Width
15 ≤ x < 20    4            10%                   5
20 ≤ x < 25    9            21%                   5
25 ≤ x < 30    8            19%                   5
30 ≤ x < 45    11           26%                   15
45 ≤ x < 65    10           24%                   20
Total          n = 42       100%
The corresponding histogram would be:

[Figure: histogram of the regrouped scores, drawn with bars of unequal width.
Horizontal axis: Scores, marked 15, 20, 25, ..., 65.]

Observe that the above histogram is misleading (Why?)


2. Suppose that in the intramural games, the competing colleges earned the
tabulated overall points.

College    CEd    CAg    CEn    CVSM    CoS    CASS    CBAA    CF    CHSI
Points     155    120    95     40      60     58      52      75    60

The corresponding histogram is presented as:

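[Figure: bar graph of the points per college, with bars labeled CEd, CAg, CEn, CF,
CoS, CASS, CVSM, CBAA and CHSI. Vertical axis marked 30, 60, 90, 120, 150.
Horizontal axis: College.]
What's wrong with the above histogram?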

Pie Charts
The data used in the preceding examples were all quantitative (numerical). For
qualitative (categorical) data especially, an easy way to summarize data is through the
use of a pie chart. Pie charts are used to clearly show what part of the whole is
accounted for by a specific characteristic.

Example.
In a certain small community, the marital status of its adult population is
tabulated below:
Marital Status Frequency Relative Frequency
Single (Si) 50 25%
Married (M) 113 56.5%
Widowed (W) 28 14%
Separated (S) 9 4.5%
Total 200 100%

A pie chart to summarize the tabulated data is

[Figure: pie chart with slices Si 25%, M 56.5%, W 14% and S 4.5%]


The whole reason for constructing a pie chart is to convey information visually; it
should enable the reader to compare easily the relative proportions of the categorical
data. Thus, every slice of the pie should correspond to the relative frequency, which is
also written in the label. Using different colors for every slice in the pie may also help.
And, if the names of the categories are too long, a legend may be used.
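To make each slice correspond to its relative frequency, one common approach (an addition here, not stated in the module) is to give a category with relative frequency r a central angle of r × 360°. The Python sketch below applies this to the marital status example; the function name `pie_slices` is illustrative.

```python
# Marital status frequencies from the example above.
frequencies = {"Single": 50, "Married": 113, "Widowed": 28, "Separated": 9}

def pie_slices(freqs):
    """Map each category to (relative frequency in percent, central angle in degrees)."""
    total = sum(freqs.values())
    return {
        category: (100 * f / total, 360 * f / total)
        for category, f in freqs.items()
    }

slices = pie_slices(frequencies)
for category, (percent, angle) in slices.items():
    print(f"{category:<10} {percent:5.1f}%  {angle:6.1f} degrees")
```

The angles add up to 360°, just as the relative frequencies add up to 100%.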

What's wrong with the pie chart at the right?
[Figure: pie chart with slices labeled 25%, 60%, 30% and 75%]

Exercises

1. Identify the type of variable and the scale of measurement.


a. Student IQ k. Hair color
b. Distance travelled to school l. Socio-economic status
c. Score in Math quiz m. Level of satisfaction in a cellphone
d. Classification by place of birth n. Number of siblings
e. Student's year level o. Birth order
f. Final grade in MMW p. Date of birth
g. Size of shoes q. Favorite subject
h. Family income r. Educational attainment
i. Number of hours in school s. Length of stay in CLSU
j. Place of birth t. Nationality

2. A survey among CLSU students was conducted. It is about their starting position
when they sleep (face up, face down, left side, or right side). Five hundred students
were randomly interviewed. Do the collected data represent a sample or population?
Explain your answer.

3. The following data on the number of boyfriends/girlfriends that 40 students have
had were collected.
2 1 3 0 1 0 2 5 1 2
2 1 4 3 1 1 3 4 1 1
0 0 2 2 1 0 3 1 1 3
1 3 0 1 0 0 4 2 2 2
a. Organize the data into an ungrouped frequency distribution.
b. Create a histogram to summarize the data.
c. Construct a pie chart to represent the data.


4. The following data are the scores of forty students in a 100-item math quiz.
59 65 57 83 64 60 57 89
62 75 90 90 87 66 69 81
69 83 79 59 70 83 82 57
89 92 84 58 73 67 89 66
93 68 59 77 62 90 80 79
a. Organize the data into a 6-group frequency distribution.
b. Create a histogram to summarize the data.
c. Construct a pie chart to represent the data.

Measures of Central Tendency

A measure of central tendency is a value that indicates where the center of a
distribution tends to be located, or simply the average of the data. It is said to form the
basis of statistics. The most common measures of central tendency are the mean,
median, and mode. In a perfect normal distribution, all three measures of central
tendency are located at the same score, which is at the center of the normal
distribution.

Mean

The mean is the most commonly used measure of central tendency. The mean of
a data set is the sum of the data points divided by the number of data points, or simply
the average of the data points. Thus, it is strongly influenced by outliers (data points
that are extremely low or extremely high compared to other data points). The
population mean, denoted by $\mu$, is estimated by the sample mean, denoted by $\bar{x}$:

$$\bar{x} = \frac{\sum x}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

where $x_1, x_2, \ldots, x_n$ are the data points and $n$ is the number of data points.

Some characteristics of the mean are the following:

1. The sum of the deviations of the data points from the mean is zero. (A deviation is
the difference between a data point and a reference value, here the mean.)
2. The sum of the squared deviations of the data points is minimum when the
deviations are taken from the mean.
3. If a constant $c$ is added to (or subtracted from) every data point, the new mean is
the original mean increased (or decreased) by $c$.
4. If every data point is multiplied (or divided) by a constant $c$, the new mean is the
original mean multiplied (or divided) by $c$.
5. Since the mean is a calculated number, it may not be an actual value in the data
points.
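The first four characteristics can be checked numerically. The Python sketch below uses a small made-up data set and a made-up constant, both chosen only for illustration; it is a check on one example, not a proof.

```python
def mean(data):
    """Sum of the data points divided by the number of data points."""
    return sum(data) / len(data)

def sum_squared_deviations(data, center):
    """Sum of squared deviations of the data points from a chosen center."""
    return sum((x - center) ** 2 for x in data)

data = [2, 4, 9]    # hypothetical data points, chosen only for illustration
c = 10              # hypothetical constant
m = mean(data)      # the mean of the illustrative data

deviation_sum = sum(x - m for x in data)        # characteristic 1
at_mean = sum_squared_deviations(data, m)       # characteristic 2
off_mean = sum_squared_deviations(data, m + 0.5)
shifted_mean = mean([x + c for x in data])      # characteristic 3
scaled_mean = mean([x * c for x in data])       # characteristic 4

print(deviation_sum, at_mean, off_mean, shifted_mean, scaled_mean)
```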


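Example 1: The data below are the current diesel prices (in pesos/liter) in nearby gas
stations. Find the mean price.
43.80 44.10 42.95 43.80 44.30 39.00 44.30 43.80

Solution:
$$\bar{x} = \frac{43.80 + 44.10 + 42.95 + 43.80 + 44.30 + 39.00 + 44.30 + 43.80}{8} = \frac{346.05}{8}$$
$$\bar{x} \approx 43.26 \text{ pesos/liter}$$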

Example 2: Gabriel has a total of 4 quizzes. One quiz is missing while the scores of his
remaining quizzes are 43, 35 and 39. Calculate the score of the missing quiz if his
mean score is 41.

Solution:
Let $x$ denote Gabriel's score in his missing quiz.
$$\bar{x} = \frac{43 + 35 + 39 + x}{4} = 41$$
$$117 + x = 4(41) = 164$$
$$x = 47$$

Example 3: In a class of 18 men and 22 women, the mean score of men in a quiz is 38
while the mean score of women is 35. Find the mean score of the whole class.

Solution:
$$\bar{x} = \frac{18(38) + 22(35)}{18 + 22} = \frac{684 + 770}{40}$$
$$\bar{x} = 36.35$$
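The three examples above can be verified with a few lines of Python. This is only a sketch for checking the arithmetic; the variable names are illustrative.

```python
# Example 1: mean diesel price (pesos/liter).
prices = [43.80, 44.10, 42.95, 43.80, 44.30, 39.00, 44.30, 43.80]
mean_price = sum(prices) / len(prices)

# Example 2: the four quiz scores must total 4 * 41, so the missing score
# is that total minus the three known scores.
known_scores = [43, 35, 39]
missing_score = 4 * 41 - sum(known_scores)

# Example 3: combined mean of two groups, weighting each group mean by its size.
men, men_mean = 18, 38
women, women_mean = 22, 35
class_mean = (men * men_mean + women * women_mean) / (men + women)

print(round(mean_price, 2), missing_score, class_mean)
```

Note that in Example 3 the class mean is not simply the average of 38 and 35, because the two groups have different sizes.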

Mean of Grouped Data

In grouped data, we do not know the individual data points. In such situations,
we use the midpoints of the intervals to represent individual scores. Consequently, the
mean of the grouped data is only an approximation.
$$\bar{x} = \frac{\sum f x_m}{\sum f}$$
where $x_m$ is the midpoint of each interval and $f$ is the frequency of each interval.


Example 4: Find the mean score of 42 students from the following frequency
distribution:
Score Frequency
16 ≤x< 24 11
24 ≤x< 32 13
32 ≤x< 40 7
40 ≤x< 48 3
48 ≤x< 56 2
56 ≤x< 64 6

Solution:
Step 1: Add two columns for the midpoint ($x_m$) and the product $f x_m$, and compute
their values. The midpoint is half of the sum of the lower limit and the upper limit
reduced by one unit of measure (here 1, since the scores are integers); see the first
two rows of the table below. The product $f x_m$ is the frequency times the midpoint
of each interval.
Step 2: Compute $\sum f$ and $\sum f x_m$.
Step 3: Use the formula $\bar{x} = \frac{\sum f x_m}{\sum f}$ to get the mean of the grouped frequency distribution.

Score          Midpoint ($x_m$)        Frequency ($f$)    $f x_m$
16 ≤ x < 24    (16 + 23)/2 = 19.5      11                 11(19.5) = 214.5
24 ≤ x < 32    (24 + 31)/2 = 27.5      13                 13(27.5) = 357.5
32 ≤ x < 40    35.5                    7                  248.5
40 ≤ x < 48    43.5                    3                  130.5
48 ≤ x < 56    51.5                    2                  103.0
56 ≤ x < 64    59.5                    6                  357.0
Total                                  $\sum f$ = 42      $\sum f x_m$ = 1411
Finally, $\bar{x} = \frac{1411}{42} \approx 33.60$
Note: Actually, the data in this example are those used in Illustration 5 of this chapter.
The reader is urged to compute the actual mean which is 33.64. It only shows that the
mean of a grouped data is just an approximation of the actual mean.
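The steps of Example 4 can be sketched in Python as follows. The function name `grouped_mean` and its `unit` parameter are illustrative; `unit` is the unit of measure subtracted from the upper limit, assumed to be 1 because the scores are integers.

```python
# (lower limit, upper limit, frequency) for each class lower <= x < upper.
classes = [
    (16, 24, 11), (24, 32, 13), (32, 40, 7),
    (40, 48, 3), (48, 56, 2), (56, 64, 6),
]

def grouped_mean(classes, unit=1):
    """Approximate the mean of grouped data using class midpoints."""
    total_f = 0
    total_fx = 0.0
    for lower, upper, f in classes:
        midpoint = (lower + (upper - unit)) / 2   # e.g. (16 + 23) / 2 = 19.5
        total_f += f
        total_fx += f * midpoint
    return total_fx / total_f

approx_mean = grouped_mean(classes)
print(round(approx_mean, 2))
```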

Median

The median is a value that separates an array of data points into two equal parts.
To find it, the data first need to be arranged in numerical order. If there is an odd
number of data points, then the median is the middle value. If there is an even number
of data points, then the median is the average of the two middle values. The
median can be denoted by $\tilde{x}$.
Unlike the mean, the median is not affected by extreme values in the data points
because it only considers the middle values in the data set.


Example 5: Calculate the median age of the seven employees.


25 31 25 62 49 50 38

Solution: First, we need to arrange the data from lowest to highest.


25 25 31 38 49 50 62
Since there are 7 (odd) data points, the median is the middle value which is 38.

Example 6: The current crude oil prices (in pesos/liter) in nearby gas stations are listed
below. Find the median price.
43.80 44.10 42.95 43.80 44.30 39.00 44.30 43.90

Solution: 39.00 42.95 43.80 [43.80 43.90] 44.10 44.30 44.30
(the bracketed values are the two middle values)
There are 8 (even) data points, so the median price is the average of the two middle
values, 43.80 and 43.90, which is 43.85 pesos/liter.
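Examples 5 and 6 can be checked with the `median` function of Python's standard `statistics` module, which sorts the data and averages the two middle values when the count is even. This is a minimal sketch for verification only.

```python
from statistics import median

ages = [25, 31, 25, 62, 49, 50, 38]                                # Example 5
prices = [43.80, 44.10, 42.95, 43.80, 44.30, 39.00, 44.30, 43.90]  # Example 6

median_age = median(ages)        # odd count: the middle value
median_price = median(prices)    # even count: average of the two middle values

print(median_age, round(median_price, 2))
```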

Mode

The mode of a data set is the data point that occurs most often. If no data point is
repeated or every data point is repeated the same number of times, there is no mode.
If the mode of a data set exists, it may not be unique. A unimodal data set has one
mode, bimodal has two modes, trimodal has three modes and multimodal has many
modes. The mode can be used for qualitative as well as quantitative data.
The mode is not affected by extreme values in the data set, since it only considers
the most frequent data points. The mode can be denoted by $\hat{x}$.

Example 7: Find the mode of each of the following data sets:
a. 1, 2, 3, 4, 5, 6, 7, 8 b. 1, 2, 3, 4, 1, 2, 3, 4 c. 5, 8, 4, 8, 6, 7, 5, 3

Solution:
a. There is no mode because no data point is repeated.
b. There is no mode because every data point occurs twice.
c. The modes are 5 and 8, since each of them occurs twice, more often than any
other data point.

Example 8: Thirty students are asked about their favorite color. The data is summarized
by the frequency distribution table below. Find the mode.
Color Frequency
Yellow 2
Blue 5
Red 5
White 8
Black 10
The mode is black, since it has the highest frequency.
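The rules for the mode can be sketched in Python as below. The function name `modes` is illustrative. The sketch follows the convention of this module that there is no mode when every data point occurs equally often; treating a data set with a single distinct value as having that value as its mode is an assumption made here.

```python
from collections import Counter

def modes(data):
    """Return the list of modes, or an empty list when there is no mode."""
    counts = Counter(data)
    highest = max(counts.values())
    # No mode when several distinct values all occur equally often.
    if len(counts) > 1 and all(f == highest for f in counts.values()):
        return []
    return sorted(value for value, f in counts.items() if f == highest)

# Example 7
print(modes([1, 2, 3, 4, 5, 6, 7, 8]))   # a. no mode
print(modes([1, 2, 3, 4, 1, 2, 3, 4]))   # b. no mode
print(modes([5, 8, 4, 8, 6, 7, 5, 3]))   # c. bimodal

# Example 8: rebuild the raw answers from the frequency table.
colors = ["Yellow"] * 2 + ["Blue"] * 5 + ["Red"] * 5 + ["White"] * 8 + ["Black"] * 10
print(modes(colors))
```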


Exercises

1. What measure of central tendency is most appropriate?


a. Brand of shampoo students prefer.
b. Month which is most frequently visited by typhoons.
c. Average height of freshmen students.
d. Passing mark such that half of the examinees will be accepted.
e. Average number of text messages sent in a day.

2. Find the mean, median, and mode:


1.25 1.00 1.50 2.00 2.25 1.75 3.00 2.75
2.50 1.50 3.00 2.50 2.75 1.25 1.00 1.50
2.25 1.50 2.75 1.50 2.00 1.75 3.00 2.00

3. Find the mean, median, and mode of each set of data:
a. 45 48 56 62 75 75 78 84
b. 3 48 56 62 75 75 78 84
c. 52 55 63 79 82 82 85 91
d. 90 96 112 124 150 150 156 168
i. Compare the data and the results for (a) and (b).
ii. Compare the data and the results for (a) and (c).
iii. Compare the data and the results for (a) and (d).

4. The mean monthly salary of 32 men is P28,500 while that of 38 women is P24,400.
Find the mean salary of all the men and women.

5. If your 1st Term Score in this class is 38.42, and your 2nd Term Score is 43.83, what
score do you need in the 3rd Term so that your mean score is 60.25?

6. During quarantine, many people tend to binge-watch series on television. The
frequency distribution table below shows the number of hours per day spent by 30
teenagers. Find the mean time spent by teenagers watching series per day.

Time (in hours) Frequency


0≤x<5 10
5 ≤ x < 10 12
10 ≤ x < 15 6
15 ≤ x < 20 2


In some situations, the measures of central tendency cannot provide enough
information that would lead to a valid conclusion, especially when two or more sets of
data need to be compared. In the following example, a weakness of the mean, median
and mode is illustrated.

Suppose that we are choosing between Jerico and Jerwin on who should represent
CLSU to an upcoming Inter-University Math Quiz Bee. To choose, their coach conducted
6 sessions of quiz-alikes between them, and came up with the following scores:

          Quiz 1    Quiz 2    Quiz 3    Quiz 4    Quiz 5    Quiz 6
Jerico    83        65        100       92        85        85
Jerwin    81        85        74        85        90        95

So, after the 6 quizzes, Jerico and Jerwin were tied at 3 wins and 3 losses. Who
should be chosen? Looking at their averages (verify);

          Mean    Median    Mode
Jerico    85      85        85
Jerwin    85      85        85

Surprisingly, they are again tied in these measures. The mean, median, and the
mode cannot help in deciding on who should be sent to the Quiz Bee!
Something else that could help is to look at their consistency. This calls for a
measure of variability, that is, a measure of how spread apart or dispersed their
scores are.

Measures of Variability

A measure of variability (or dispersion) is a quantity that measures the spread of
scores in a given population. It indicates the extent to which observations in a data set
are scattered about the mean. Scores that are relatively close together have a lower
variation as compared to scores that are spread farther apart. To measure the spread
or dispersion of data, we use statistical values known as the range, variance and
standard deviation. These three are the most common measures of variability.

Range

The range, denoted by $R$, is the difference between the highest and the lowest
values in a data set. A weakness of the range is that an extreme value (outlier) can
greatly alter its value.
$R$ = Highest Value – Lowest Value

For example, Jerico's range is 100 – 65 or 35; Jerwin's range is 95 – 74 or 21.
This indicates that the scores of Jerico are more spread apart.


Variance and Standard Deviation

First, we define the deviation to be $x - \bar{x}$, where $x$ is a data point and $\bar{x}$ is the
mean. It is the difference of a data point from the mean.

Now, in order to test their consistency, it may be tempting to average their
deviations. But, as we can see in the following table, the sum of the deviations is
always 0. As a result, Jerico's and Jerwin's average deviations are both zero.

Jerico ($\bar{x}$ = 85)                 Jerwin ($\bar{x}$ = 85)
Score    Deviation ($x - \bar{x}$)      Score    Deviation ($x - \bar{x}$)
83       –2                             81       –4
65       –20                            85       0
100      15                             74       –11
92       7                              85       0
85       0                              90       5
85       0                              95       10
Total    0                              Total    0

Generally, in any set of data, it can be shown algebraically that the sum of the
deviations is always 0. The negatives always cancel out the positives. So, in order to
use deviations effectively to study how the data are dispersed, the remedy is to square
each deviation. This leads to what is called the variance.

Variance is the mean of the squared deviations of the data points. The sample
variance (denoted by $s^2$) is an estimator of the population variance (denoted by $\sigma^2$). In
symbols, the sample variance of data points $x_1, x_2, \ldots, x_n$, where $n$ is the number of
data points, is defined as

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

Note: 1. If the data points represent the entire population, the divisor used is $n$.
But for sample data points, the divisor is $n - 1$. Statisticians generally agree
that using $n - 1$ rather than $n$ produces a better estimate of the true
population variance.

2. Remember that the variance of a sample is an estimate of the variance of
the population. Since there are far more data points in a population, the
population tends to vary more as compared to a sample. Thus, using $n$ as
divisor in a sample tends to underestimate the true variance of the population.
Statisticians determined that using $n - 1$ would compensate for such an
underestimation.

3. Alternatively, the variance may be computed more quickly and easily by
the equivalent formula below. We don't need the mean in using this formula.
$$s^2 = \frac{1}{n - 1}\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]$$

Variance is a tool that enables us to measure the typical deviation found in a set of
data, by using the individual deviations of the data points. Recall that the deviations
were squared in order to keep the negative deviations from cancelling out the positives.
Now finally, we undo the squaring process by taking the square root. The result is
what is called the standard deviation.

Standard deviation is defined as the square root of the variance and is
denoted by $s$ (for sample) or $\sigma$ (for population). Thus,
$$s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

Example: Compute the respective (a) variance and (b) standard deviation of the scores
of Jerico and Jerwin.

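Solution:

a. Using the formula $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$,

Jerico ($\bar{x}$ = 85)                                  Jerwin ($\bar{x}$ = 85)
Score    $x - \bar{x}$    $(x - \bar{x})^2$              Score    $x - \bar{x}$    $(x - \bar{x})^2$
83       –2               4                              81       –4               16
65       –20              400                            85       0                0
100      15               225                            74       –11              121
92       7                49                             85       0                0
85       0                0                              90       5                25
85       0                0                              95       10               100
$\sum (x - \bar{x})^2$ = 678                             $\sum (x - \bar{x})^2$ = 262

Jerico: $s^2 = \frac{678}{6 - 1} = 135.6$          Jerwin: $s^2 = \frac{262}{6 - 1} = 52.4$

And so, Jerico's variance is 135.6 and Jerwin's variance is 52.4.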

Take note that each value in the deviation column is computed by subtracting the
given mean from each data point, for example 83 – 85 = –2, 65 – 85 = –20, 100 – 85 = 15,
and so on; while each value in the $(x - \bar{x})^2$ column is computed by squaring the
corresponding value in the $x - \bar{x}$ column, for example $(-2)^2 = 4$, $(-20)^2 = 400$,
$15^2 = 225$, and so on.


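Other solution: Using the alternative variance formula,
$s^2 = \frac{1}{n - 1}\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]$, we need to find the sum of the data points
and the sum of the squares of the data points. We don't need the mean of the data.

Jerico                          Jerwin
Score ($x$)    $x^2$            Score ($x$)    $x^2$
83             6 889            81             6 561
65             4 225            85             7 225
100            10 000           74             5 476
92             8 464            85             7 225
85             7 225            90             8 100
85             7 225            95             9 025
$\sum x$ = 510   $\sum x^2$ = 44 028      $\sum x$ = 510   $\sum x^2$ = 43 612

Jerico: $s^2 = \frac{1}{5}\left[44\,028 - \frac{510^2}{6}\right] = \frac{1}{5}\left[44\,028 - 43\,350\right] = 135.6$
Jerwin: $s^2 = \frac{1}{5}\left[43\,612 - \frac{510^2}{6}\right] = \frac{1}{5}\left[43\,612 - 43\,350\right] = 52.4$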

Note that the two formulas for variance yield the same result. This is always the
case. In fact, it may be proven algebraically that the formulas are equivalent.

b. Finally, their respective standard deviations are

Jerico: $s = \sqrt{135.6} \approx 11.64$          Jerwin: $s = \sqrt{52.4} \approx 7.24$
Standard deviation (and variance) is a relative measure of the dispersion of a set
of data; the larger the standard deviation, the more spread out the data are. For a
single set of data, it may not be very informative. It is most useful in comparing the
(in)consistencies of two sets of data of the same type. The set with a lower standard
deviation contains data that are more consistent; the set with a higher standard
deviation contains data that are more spread out or dispersed (less consistent).

So, between Jerico and Jerwin in the example, Jerwin wins as far as
consistency is concerned because he has a lower standard deviation, 7.24, as compared
to Jerico's 11.64.
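The range, variance and standard deviation of the two sets of scores can be checked with Python's standard `statistics` module, whose `variance` and `stdev` functions use the sample divisor n – 1. The helper names `value_range` and `variance_shortcut` are illustrative; the second one is a sketch of the alternative formula that does not need the mean.

```python
from statistics import stdev, variance

jerico = [83, 65, 100, 92, 85, 85]
jerwin = [81, 85, 74, 85, 90, 95]

def value_range(scores):
    """Highest value minus lowest value."""
    return max(scores) - min(scores)

def variance_shortcut(scores):
    """Sample variance from the sum of the data and the sum of their squares."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

for name, scores in (("Jerico", jerico), ("Jerwin", jerwin)):
    print(
        name,
        value_range(scores),
        round(variance(scores), 2),
        round(variance_shortcut(scores), 2),
        round(stdev(scores), 2),
    )
```

Both variance computations agree for each student, and Jerwin's smaller standard deviation reflects his more consistent scores.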


Exercises

1. Given the sample data 85 85 85 85 85 85 85.


Find the variance and the standard deviation of the data.

2. Given the sample data 6 10 12 12 11 17 9.


a. Use the definition of sample variance to find the variance.
b. Use the alternate formula to find the variance.
c. Find the standard deviation of the data.

3. Given the 3 sets of sample data:


A 3 6 9 12 15 18
B 53 56 59 62 65 68
C 60 120 180 240 300 360
a. How are the data in Set A related to those in Set B ?
b. How are the data in Set A related to those in Set C ?
c. Find the mean and standard deviation of each set.
d. How do the results for Set A and Set B compare?
e. How do the results for Set A and Set C compare?

4. Nik and Nok played 5 games in a bowling center. Their scores are:

Nik: 144 171 220 158 147


Nok: 182 165 187 142 159

a. Who is the better player based on mean score?


b. Who is the more consistent player? Why?


Measures of Relative Position

As earlier discussed, the measures of central tendency, especially the mean and
the median, describe the 'center' of a distribution. Indeed, such a center is what is
usually used and needed to summarize a distribution. Occasionally however, a different
part of the distribution is of more interest. The percentile, decile, and quartile are
used on such occasions, as they indicate the location of a data point relative to the other
data points.

Percentiles
Percentiles split the whole distribution into 100 subgroups. It is similar to cutting
a long pipe into 100 short pipes of equal lengths. In order to do this, it is necessary to
make 99 cuts. The points where the cuts are done correspond to percentile ranks or
scores. Thus, percentile ranks are from 1 to 99, which we hereby denote by P1, P2, P3,
…, P99. It makes no sense to have a P0 or a P100.

A percentile is a value that describes the percentage of data that falls below it.
For example, suppose you got a 99th percentile score in an exam. It means that 99% of
the examinees scored lower than you; it doesn't mean that you had a score of 99%. In
fact, your actual score is not at all indicated.

Illustration:
Suppose that Sonny is among the 15,000 high school graduates who took
the CLSU Admission Test, and he got a 48th percentile score.

His 48th percentile score means that 48% of the 15,000 examinees (7,200)
scored lower than Sonny. It doesn't mean that his actual score in the exam is 48.
On the other hand, his actual score is lower than that of 52% of the 15,000
examinees (7,800).

Suppose another student Nick got a percentile score of 68. This means
that 68% of the 15,000 examinees (10,200) scored lower than Nick while 32%
or 4,800 examinees scored higher than him.

The actual scores of Sonny and Nick both remain unknown, until we do
some calculations that also involve the whole distribution of data points, their
percentile scores, and the number of data points.

Calculating Percentiles
To find a data point that corresponds to a percentile score $p$, the following steps
are suggested.
1. Arrange the data points numerically from lowest to highest.
2. Find the location $L_p$ of the data point by the formula
$$L_p = \frac{p}{100}(n + 1)$$
where $n$ = number of data points.
3. Use $L_p$ to find the data point.
a. If the computed $L_p$ is an integer k, then the data point is in the kth
position of the arranged data.
b. If the value of $L_p$ includes a decimal such as k.d, then the data point Pp is
(kth data) + 0.d[(k+1)th data – kth data]

Example 1: Find P25 and P80 from the following data:


2 6 4 5 3 6 5 4 3 3 2 4 5 4 6

Solution: Note that P25 and P80 respectively refer to the 25th and 80th percentiles.
Step 1. Arrange the data in ascending order:
Location      1  2  3  4  5  6  7  8  9  10  11  12  13  14  15
Data Point    2  2  3  3  3  4  4  4  4  5   5   5   6   6   6

Step 2. $L_{25} = \frac{25}{100}(15 + 1) = 4$          $L_{80} = \frac{80}{100}(15 + 1) = 12.8$

Step 3. Since L25 = 4 (integer), then
P25 = 4th data
    = 3
Since L80 = 12.8 (with decimal), then
P80 = 12th data + 0.8(13th data – 12th data)
    = 5 + 0.8(6 – 5)
    = 5.8
The data points that correspond to the 25th and 80th percentiles are respectively 3 and 5.8.
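The three steps can be sketched as a small Python function. The name `percentile` is illustrative, and the sketch assumes that p is a whole number and that the location L_p falls between 1 and n, as in the examples of this module. Note that statistical software may use slightly different percentile formulas, so results can differ from this one.

```python
def percentile(data, p):
    """Data point at percentile p, using the location L_p = p (n + 1) / 100."""
    ordered = sorted(data)                   # Step 1: arrange from lowest to highest
    n = len(ordered)
    k, remainder = divmod(p * (n + 1), 100)  # Step 2: L_p = k + remainder / 100
    if k < 1 or k > n or (k == n and remainder):
        raise ValueError("the location L_p falls outside the data")
    if remainder == 0:                       # Step 3a: L_p is an integer
        return ordered[k - 1]
    d = remainder / 100                      # Step 3b: decimal part of L_p
    return ordered[k - 1] + d * (ordered[k] - ordered[k - 1])

data = [2, 6, 4, 5, 3, 6, 5, 4, 3, 3, 2, 4, 5, 4, 6]   # data of Example 1
p25 = percentile(data, 25)
p80 = percentile(data, 80)
print(p25, round(p80, 1))
```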

Example 2: The following are heights (in inches) of some students:


Sonny 59 Melch 62 Ronel 66 Jhun 67
Nick 61 Jade 59 JR 64 Edu 63
Ingrid 58 Ammi 64 Dinah 63 Rene 67
Rose 64 Delia 58 Ped 66 Rain 67
Chad 70 Angie 64 Edwin 64
Chito 66 Jorem 63 Al 67
a. Find P30 and P60.
b. JR's height corresponds to what percentile?

Solution:
Step 1. Arrange the data according to height (shortest to tallest).
 1. Ingrid 58      7. Jorem 63     13. JR    64     19. Jhun 67
 2. Delia  58      8. Dinah 63     14. Rose  64     20. Rene 67
 3. Sonny  59      9. Edu   63     15. Chito 66     21. Rain 67
 4. Jade   59     10. Ammi  64     16. Ronel 66     22. Chad 70
 5. Nick   61     11. Angie 64     17. Ped   66
 6. Melch  62     12. Edwin 64     18. Al    67


Step 2. L30 = (30/100)(22 + 1) = 6.9          L60 = (60/100)(22 + 1) = 13.8

Step 3. P30 = 6th data + 0.9(7th – 6th)
            = 62 + 0.9(63 – 62)
            = 62.9
        P60 = 13th data + 0.8(14th – 13th)
            = 64 + 0.8(64 – 64)
            = 64
a. P30 = 62.9 and P60 = 64.

b. JR's height is in the 13th location, meaning Lp = 13. So,

      Lp = (p/100)(n + 1)
      13 = (p/100)(22 + 1)
       p = 13(100)/23
         = 56.52

Due to the definition of percentiles, it is safest to always round down any
decimals. Thus, we can say that 56% of the students are shorter than JR.
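
The reverse computation in part (b) can be sketched as below. The helper name `percentile_rank` is hypothetical; it only solves Lp = (p/100)(n + 1) for p and rounds down, as suggested above.

```python
import math


def percentile_rank(location, n):
    """Solve Lp = (p/100)(n + 1) for p, given a location in the arranged data."""
    p = location * 100 / (n + 1)
    return p, math.floor(p)          # exact value and the rounded-down percentile


p_exact, p_floor = percentile_rank(13, 22)   # JR is 13th among 22 students
print(round(p_exact, 2), p_floor)            # 56.52 56
```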

Deciles

Deciles split the whole distribution into 10 subgroups. It is similar to cutting a
long pipe into 10 shorter pipes of equal lengths. In order to do this, it is necessary to
make 9 cuts. The points where the cuts are done correspond to decile ranks or scores.
Thus, decile ranks are from 1 to 9, denoted by D1, D2, D3, up to D9. It makes no sense
to talk about D0 or D10.

Correspondingly, the kth decile is the (10k)th percentile. That is,

D1 = P10 D4 = P40 D7 = P70
D2 = P20 D5 = P50 D8 = P80
D3 = P30 D6 = P60 D9 = P90

Consequently, computations for deciles may be done by using the corresponding
percentiles.

Example 3: In the preceding example (Example 2) of student heights, the 3rd decile D3
could be computed by considering P30, which was computed to be 62.9.
Furthermore, D6 = P60 = 64. Similarly, to find the 9th decile, note that D9 = P90.
Computing for P90,
Step 1. (see the arranged data in Example 2)
Step 2. L90 = (90/100)(22 + 1) = 20.7
Step 3. P90 = 20th data + 0.7(21st – 20th)
            = 67 + 0.7(67 – 67)
            = 67
So, D9 = P90 = 67.

Quartiles

Quartiles split the whole distribution into 4 subgroups. It is similar to cutting a
long pipe into 4 shorter pipes of equal lengths. In order to do this, it is necessary to
make 3 cuts. The points where the cuts are done correspond to quartile ranks or
scores. Thus, quartile ranks are from 1 to 3, denoted by Q1, Q2, and Q3. It makes no
sense to talk about Q0 or Q4.

Correspondingly, the kth quartile is the (25k)th percentile. That is,

Q1 = P25 Q2 = P50 = median Q3 = P75

Consequently, computations for quartiles may be done by using the
corresponding percentiles. In Example 1 on percentiles, the 1st quartile is
Q1 = P25 = 3.

Example 4: The following are heights (in inches) of some students. Find Q1, Q2, and Q3.
Sonny 59 Melch 62 Ronel 66 Jhun 67
Nick 61 Jade 59 JR 64 Edu 63
Ingrid 58 Ammi 64 Dinah 63 Rene 67
Rose 64 Delia 58 Ped 66 Rain 67
Chad 70 Angie 64 Edwin 64
Chito 66 Jorem 63 Al 67

Solution: Since Q1 = P25, Q2 = P50 and Q3 = P75, we compute for the corresponding
percentiles.

Step 1. Arrange the data according to height (shortest to tallest).

 1. Ingrid 58      7. Jorem 63     13. JR    64     19. Jhun 67
 2. Delia  58      8. Dinah 63     14. Rose  64     20. Rene 67
 3. Sonny  59      9. Edu   63     15. Chito 66     21. Rain 67
 4. Jade   59     10. Ammi  64     16. Ronel 66     22. Chad 70
 5. Nick   61     11. Angie 64     17. Ped   66
 6. Melch  62     12. Edwin 64     18. Al    67

Step 2. a. For Q1: L25 = (25/100)(22 + 1) = 5.75
        b. For Q2: L50 = (50/100)(22 + 1) = 11.5
        c. For Q3: L75 = (75/100)(22 + 1) = 17.25

Step 3. a. P25 = 5th + 0.75(6th – 5th)
               = 61 + 0.75(62 – 61)
               = 61.75
        b. P50 = 11th + 0.5(12th – 11th)
               = 64 + 0.5(64 – 64)
               = 64
        c. P75 = 17th + 0.25(18th – 17th)
               = 66 + 0.25(67 – 66)
               = 66.25

Thus, Q1 = P25 = 61.75 Q2 = P50 = 64 Q3 = P75 = 66.25
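
As a cross-check, the quartiles and deciles of the student heights can also be obtained with Python's standard `statistics.quantiles` function. This sketch is not part of the module's method; it assumes that the function's `method="exclusive"` option, which places cut points using n + 1, agrees with the location formula Lp = (p/100)(n + 1) for this data set.

```python
from statistics import quantiles

# Heights (in inches) of the 22 students, in the order given in the example
heights = [59, 62, 66, 67, 61, 59, 64, 63, 58, 64, 63, 67,
           64, 58, 66, 67, 70, 64, 64, 66, 63, 67]

# n=4 returns the three cut points Q1, Q2, Q3
q1, q2, q3 = quantiles(heights, n=4, method="exclusive")
print(q1, q2, q3)                              # 61.75 64.0 66.25

# n=10 returns the nine cut points D1 through D9
deciles = quantiles(heights, n=10, method="exclusive")
print(deciles[2], deciles[5], deciles[8])      # D3, D6, D9
```

The quartiles match the hand computation, and the three deciles printed correspond to D3 = 62.9, D6 = 64, and D9 = 67 found earlier.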

Exercises

The following are weights (in pounds) of newborn babies in a certain hospital in a
period of 1 week.

7.12 8.25 6.35 7.95 9.15 6.50 7.45 10.10
7.58 7.75 9.05 6.25 8.30 7.45 9.12 8.15
6.95 9.24 7.45 8.25 8.65 7.98 8.30 8.10

Find each of the following:

1. P5, P20, P30, P48, P60, P80, P88, P90, P95, P98
2. D1, D2, D3, D4, D5, D6, D7, D8, D9
3. Q1, Q2, Q3
4. If Baby A weighs 7.98 pounds and Baby B weighs 8.65 pounds, to what
percentiles do Baby A and Baby B belong?


Normal Distribution

Many sets of data exhibit a pattern such as the one shown in the following
histogram of some discrete data. Most of the data are concentrated toward the center
and taper off at either end; the data are almost symmetrical with respect to the “center”.

[Figure: histogram of the data; vertical axis: Frequency]

This type of data distribution occurs very frequently in many situations. The
normal distribution or the Gaussian distribution (in honor of Gauss, 1777–1855) is the
most important distribution in statistics. Statisticians created an ideal bell-shaped curve
(also called the normal curve) to describe such normally distributed data. The normal
curve is symmetric about a vertical axis through the mean, the total area under the
curve is equal to 1, and the curve is asymptotic to the x-axis.

The Normal Curve

All data points are contained and spread under the bell shape, which is asymptotic
to the horizontal axis. Characteristically,

1. Data points are clustered toward the center; only a few are found toward the
two ends or tails.
2. The number of data points on both sides of the center is the same. Consequently,
the three measures of central tendency (mean, median and mode) all coincide at the
center.


A wide variety of data have been observed to manifest the normal distribution,
and statisticians have established the occurrence and location of data points under the
normal curve. With the population mean µ and population standard deviation σ, the
occurrence of data under the normal curve has been established as illustrated below:

[Figure: normal curve with the horizontal axis marked at µ–3σ, µ–2σ, µ–σ, µ, µ+σ, µ+2σ, and µ+3σ, showing the central areas 68.26%, 95.44%, and 99.74%]

Note: 1. 68.26% of the data are located from µ–σ to µ+σ.
      2. 95.44% of the data are located from µ–2σ to µ+2σ.
      3. 99.74% of the data are located from µ–3σ to µ+3σ.

Illustration. Assume that the scores of all 32,000 civil service examinees this year are
normally distributed. Their mean score is 66.5 points and the standard deviation is
2.4 points.

Solution: Based on the given values, µ = 66.5 and σ = 2.4.

a. µ – σ = 66.5 – 2.4 = 64.1 and µ + σ = 66.5 + 2.4 = 68.9
This means that 68.26% of the 32,000 examinees (21,843 examinees)
scored between 64.1 and 68.9 points.

b. µ – 2σ = 66.5 – 2(2.4) = 61.7 and µ + 2σ = 66.5 + 2(2.4) = 71.3
This means that 95.44% of the 32,000 examinees (30,540 examinees)
scored between 61.7 and 71.3 points.

c. µ – 3σ = 66.5 – 3(2.4) = 59.3 and µ + 3σ = 66.5 + 3(2.4) = 73.7
This means that 99.74% of the 32,000 examinees (31,916 examinees)
scored between 59.3 and 73.7 points.
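
The arithmetic in this illustration can be reproduced with a short Python sketch. The function name `empirical_counts` is our own; the three percentages are the ones quoted in this module, and counts are rounded down to whole examinees.

```python
import math


def empirical_counts(mu, sigma, total):
    """List (low, high, count) for 1, 2 and 3 standard deviations from the mean."""
    shares = {1: 0.6826, 2: 0.9544, 3: 0.9974}   # central areas quoted in the module
    rows = []
    for k, share in shares.items():
        low, high = mu - k * sigma, mu + k * sigma
        # round the endpoints for display; round the count down to whole examinees
        rows.append((round(low, 1), round(high, 1), math.floor(share * total)))
    return rows


for low, high, count in empirical_counts(66.5, 2.4, 32000):
    print(f"{low} to {high}: {count:,} examinees")
```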

Example 1: In a recently concluded IQ Test among all 9,800 currently enrolled CLSU
students, results showed that the mean IQ is 100, with a standard deviation of
15. Assume that the scores are normally distributed. How many of the students
have an IQ

a) above 100 b) between 85 and 115 c) above 145?


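Solution: With the given µ = 100 and σ = 15, the distribution of the scores is as follows:

[Figure: normal curve with the horizontal axis marked at 55, 70, 85, 100, 115, 130, and 145, showing the central areas 68.26%, 95.44%, and 99.74%]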

a. Above 100.
Note that 100 is the mean, and in a normal distribution the mean is at the center.
Since a normal curve is symmetric about the center (µ = 100), half or 50% of the
scores must lie above it. So, half of the 9,800 students, that is 4,900 students,
have an IQ above 100.

b. Between 85 and 115.
The interval is exactly from µ–σ to µ+σ, which always accounts for 68.26% of
the population. So, 68.26% of 9,800 or 6,689 of the 9,800 students have IQs
between 85 and 115.

c. Above 145.
Scores that fall from 55 (or µ–3σ) to 145 (or µ+3σ) account for
99.74% of the data. Hence, the remaining scores, those above 145
(right tail) and below 55 (left tail), account for only 100% – 99.74% = 0.26%.
Since the normal curve is symmetric, only 0.13% are in each of the
two tails. Thus, 0.13% of 9,800, which is approximately 12 students, have an
IQ above 145.

Notice that we round down the answer (0.13% of 9,800 is 12.74).

The Standard Normal Distribution

Observe in the preceding example that the numbers involved in the questions
(100, 145, 85, and 115) are precisely where µ, µ+3σ, µ–σ, and µ+σ are respectively
situated in the normal curve. Now, suppose there is a question such as “How many
students had an IQ above 120?”.


We see that 120 lies somewhere in the interval (µ+σ, µ+2σ), that is (115, 130).
In cases such as this, the z-distribution comes in.

The z-distribution is basically a standardized version of the normal distribution,
hence called the Standard Normal Distribution. With the aid of Calculus and
Probability, mathematicians and statisticians determined the percentages of the areas
of various intervals under the normal curve with respect to the area of the entire bell
figure. To achieve this, it was necessary to convert every data point x to its equivalent
z-score by the formula

      z = (x – µ)/σ

This results in a normal distribution whose mean is 0 and standard deviation is 1, as
illustrated in the following z-curve.

[Figure: standard normal curve (z-curve) with the horizontal z-score axis marked at –3, –2, –1, 0, 1, 2, and 3]

Illustration 1: In the preceding example about the IQ Test of 9,800 students with µ = 100
and σ = 15, a score of 120 corresponds to a z-score of

      z = (120 – 100)/15 = 1.33

For various z-scores, the following z-tables summarize the areas under the curve
relative to the entire area, which is taken to be 1. A z-table, also called the
standard normal table, is a statistical table that gives the percentage or
proportion of values below (or to the left of) a z-score in a standard normal
distribution. There are two z-tables: a negative z-table for negative z-scores and a
positive z-table for positive z-scores.


Table 1. Negative z-table. STANDARD NORMAL DISTRIBUTION (Source: Consumer Dummies)


Table 2. Positive z-table. STANDARD NORMAL DISTRIBUTION (Source: Consumer Dummies)


How to use the z-table

i. Compute the z-score and round it off to two decimal places.
ii. Based on the computed z-score, use the corresponding z-table: the negative z-table
for a negative z-score, and the positive z-table for a positive z-score. The z-table is
composed of rows and columns: the rows represent the whole number and the
first decimal of the z-score, and the columns represent the second decimal of
the z-score.
iii. Look for the intersection of the row and column that corresponds to the
computed z-score. The value at the intersection represents the proportion or
percentage that falls below (or to the left of) the given z-score.

Illustration: In Illustration 1, we calculated that a score of 120 corresponds to a
z-score of 1.33. The z-table gives 0.9082, which implies that 0.9082 or 90.82%
of the students have a score below 120.
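
When a printed z-table is not at hand, the same lookup can be approximated in Python. The sketch below is an assumption-laden illustration, not the module's procedure: it uses the standard library's `statistics.NormalDist` for the area to the left of z, takes the usual standardization z = (x – µ)/σ, and mimics the table by rounding z to two decimals and the area to four. The names `z_score` and `area_below` are our own.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Convert a data point x to its z-score, rounded to two decimals as for a z-table."""
    return round((x - mu) / sigma, 2)


def area_below(z):
    """Area under the standard normal curve to the left of z, to four decimals."""
    return round(NormalDist().cdf(z), 4)    # NormalDist() has mean 0 and sd 1


z = z_score(120, 100, 15)
print(z, area_below(z))    # 1.33 0.9082
```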

Example 2: In the recently concluded IQ Test among all 9,800 currently enrolled CLSU
students, results showed that the mean IQ is 100, with a standard deviation of
15. Assume that the scores are normally distributed. How many of the students
have an IQ
A: a) above 100 b) above 145 c) between 85 and 115
B: a) above 120 b) less than 90 c) between 80 and 130

Solution:
The solutions for the A problems were found earlier in Example 1, where it
wasn't necessary to use z-scores. We do them here again using z-scores.
Using the z-table, and noting that the values therein are areas under the curve from the
left up to z, we obtain the following:

a) For x = 100, z = (100 – 100)/15 = 0.00, and the z-table gives 0.5000.
   Below z = 0 is 0.5000, which means that above z = 0 is also 0.5000, since
   1 – 0.5000 = 0.5000.
   So, there are (0.5000)(9800) or 4,900 students.

b) For x = 145, z = (145 – 100)/15 = 3.00, and the z-table gives 0.9987.
   Below z = 3 is 0.9987, which implies that above z = 3 must be
   1 – 0.9987 or 0.0013.
   So, there are (0.0013)(9800) or 12 students.

c) For x = 85, z = (85 – 100)/15 = –1.00, and for x = 115, z = (115 – 100)/15 = 1.00.
   The z-table gives 0.1587 and 0.8413, respectively. To get the area or percentage
   between –1 < z < 1, we need the difference, 0.8413 – 0.1587 = 0.6826.
   So, there are (0.6826)(9800) or 6,689 students.

Compare these results with the earlier solution.


Similarly now for the B problems, using the z-table and noting that the values therein
are areas under the curve from the left up to z:

a) For x = 120, z = (120 – 100)/15 = 1.33, and the z-table gives 0.9082.
   Above z = 1.33 is 1 – 0.9082 = 0.0918.
   So, there are (0.0918)(9800) or 899 students.

b) For x = 90, z = (90 – 100)/15 = –0.67, and the z-table gives 0.2514.
   Below z = –0.67 is 0.2514.
   So, there are (0.2514)(9800) or 2,463 students.

c) For x = 80, z = (80 – 100)/15 = –1.33, and for x = 130, z = (130 – 100)/15 = 2.00.
   The z-table gives 0.0918 and 0.9772, respectively.
   –1.33 < z < 2 has the area 0.9772 – 0.0918 or 0.8854.
   So, there are (0.8854)(9800) or 8,676 students.
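
The B problems can be checked with the following Python sketch. It is only an approximation of the table-based procedure: `statistics.NormalDist` stands in for the printed z-table, z is rounded to two decimals and areas to four, and counts are rounded down as in Example 1. The helper name `table_area` is hypothetical.

```python
import math
from statistics import NormalDist

STANDARD = NormalDist()    # standard normal: mean 0, standard deviation 1


def table_area(x, mu, sigma):
    """z-table style lookup: z rounded to 2 decimals, area below z to 4 decimals."""
    z = round((x - mu) / sigma, 2)
    return round(STANDARD.cdf(z), 4)


mu, sigma, total = 100, 15, 9800

# a) above 120: complement of the area below
above_120 = math.floor(round(1 - table_area(120, mu, sigma), 4) * total)
# b) less than 90: area below, read directly
below_90 = math.floor(table_area(90, mu, sigma) * total)
# c) between 80 and 130: difference of the two areas below
share = round(table_area(130, mu, sigma) - table_area(80, mu, sigma), 4)
between_80_130 = math.floor(share * total)

print(above_120, below_90, between_80_130)    # 899 2463 8676
```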

Example 3: The times taken to answer a mathematics exam have a normal distribution
with a mean of 65 minutes and standard deviation of 5 minutes. There are 200
students who took the exam.
a. How many examinees finished their exam in less than 1 hour?
b. How many examinees finished their exam in 63 to 72 minutes?
c. If the exam is good only for 75 minutes, how many examinees failed to finish the
exam on the given time limit?

Solution: Given: µ = 65 and σ = 5. Let x be the time taken to answer the exam.
a. Consider the area below x = 60. We convert 1 hour to 60 minutes because µ and σ are
in terms of minutes.
• z = (60 – 65)/5 = –1.00
• Using the z-table, below z = –1.00 is 0.1587.
• Hence, (0.1587)(200) = 31.74, or 31 examinees (rounding down), finished the exam in
less than an hour.

b. Consider the area between x = 63 and x = 72.
• z = (63 – 65)/5 = –0.40 and z = (72 – 65)/5 = 1.40
• Using the z-table, below z = –0.40 is 0.3446 and below z = 1.40 is 0.9192.
• It implies that the portion between z = –0.40 and z = 1.40 is
0.9192 – 0.3446 = 0.5746.
• So, (0.5746)(200) or 114 examinees finished the exam in 63 to 72 minutes.

c. Examinees who failed to finish the exam are those whose time is above x = 75.
• z = (75 – 65)/5 = 2.00
• Using the z-table, below z = 2.00 is 0.9772.
• It implies that above z = 2.00 is 1 – 0.9772 = 0.0228.
• (0.0228)(200) = 4.56, or 4 examinees (rounding down), failed to finish the exam within
the time limit.


Exercises

I. In a math class of 60 students, the mean general average of the students at the
end of the semester is 63.60 with a standard deviation of 3.40. Assuming that the
scores are normally distributed,

1. Sketch the normal curve that indicates the intervals representing 1, 2 and 3
standard deviations from the mean.

2. What percent of the data lies in each of the intervals?

3. How many students had a general average


a. between 60.2 and 70.4?
b. above 73.80?
c. below 56.40?

4. How many students had a general average between 55 and 70?

5. If the passing score is µ – 3σ, how many would fail?

6. If the passing score is 50, how many would pass?

II. An important quality characteristic for soft drink bottlers is the amount of soft drink
injected into each bottle. In a particular filling process, the number of ounces
injected into an 8-ounce bottle is approximately normally distributed with a mean of
8.00 ounces and a standard deviation of 0.05 ounce. Bottles that contain less than
7.90 ounces do not meet the bottler's quality standard. If 20,000 bottles are filled,
approximately how many will meet the quality standard?


Chapter Assessment

1. The weights (in kilos) of some live chickens at the University Poultry Project are as
follows:
1.1 2.4 1.3 0.9 2.0 1.7 3.0 1.4 2.8
1.7 2.3 1.8 2.7 2.1 3.2 1.8 3.3 0.9
1.9 2.9 2.8 1.8 2.6 3.1 2.4 3.0 2.6
3.2 2.4 3.1 2.5 1.9 1.2 1.5 2.3
a. Organize the data in a frequency distribution.
b. Represent the data in a histogram.

2. The College of Science Registrar submitted the following tabulated student
population according to course:

          BSBio   BSChem   BSMath   BSES   BSStat
Male        86      110       64     38      74
Female     124      168       38     18      51

a. Use a pie chart to describe the male population.
b. Use a pie chart to describe the female population.
c. Use a pie chart to describe the total population.

3. PAGASA reported the following data about the yearly number of typhoons (NT) that
the country experienced in the past.
Year NT Year NT Year NT
2005 8 2010 13 2015 17
2006 15 2011 14 2016 23
2007 22 2012 18 2017 15
2008 9 2013 21 2018 21
2009 17 2014 14 2019 16

Find the mean, median, and the mode of the yearly number of typhoons.

4. Suppose that the mean score of 26 students (in a class of 30 students) in the MMW
Final Exam is 57.98. What should be the total score of the remaining 4 students in the
class in order that the class mean is 60.00?

5. Given the following sample data:

12 11 6 17 10 9 12

a. Find the variance in two ways.
b. Find the standard deviation.

6. Find the mean and the standard deviation of each data set:
a. 2 4 6 8 10
b. 52 54 56 58 60


c. 60 60 60 60 60
d. 21 60 60 60 99
Make some observations about the data sets and their respective means and standard
deviations.

7. Tabulated below are diesel prices (pesos/liter) of some gas stations in the cities of
San Jose and Cabanatuan.
San Jose 39.95 40.15 41.05 39.80 40.65 41.25
Cabanatuan 41.20 38.15 42.25 40.35 39.80 41.70
a. Find the mean price of diesel in each city.
b. Find the standard deviation in each city.
c. Which city has a more consistently priced diesel? Why?

8. Consider again the following weights (in kilos) of some live chickens at the University
Poultry Project (Exercise #1).
1.1 2.4 1.3 0.9 2.0 1.7 3.0 1.4 2.8 1.2
1.7 2.3 1.8 2.7 2.1 3.2 1.8 3.3 0.9 1.5
1.9 2.9 2.8 1.8 2.6 3.1 2.4 3.0 2.6 2.3
3.2 2.4 3.1 2.5 1.9

a. Find the percentiles P33, P66, P50, P90, and P95.
b. Find the deciles D2, D4, D5, D8, and D9.
c. Find the quartiles Q1, Q2, and Q3.

9. For all brands of pain reliever pills in the market, the amount of time between taking
a pill and getting relief is normally distributed with a mean of 18 minutes and a
standard deviation of 3 minutes. Find the probability that, after taking a pill, one will
feel relief in

a. at least 24 minutes.
b. at most 15 minutes.

10. The results of a citywide exam for assessing the efficiency and capability of tricycle
drivers in their service to the public were normally distributed with a mean score of
72 and a standard deviation of 12. The drivers who scored in the top 10% are to
receive a special certificate, while those in the bottom 20% will be required to
undergo a remedial training/workshop.

a. What score does a driver need in order to receive a special certificate?
b. What score will dictate that the driver should undergo the workshop?


11. In a mango farm of 100 trees, the mean harvest is 96.5 kilos of fruit per tree with a
standard deviation of 8.4 kilos. Assuming that the harvests per tree are normally
distributed,

a. Sketch the normal curve that indicates the intervals representing 1, 2 and 3
standard deviations from the mean.

b. How many trees yielded a harvest

i. between 79.7 and 113.3 kilos of fruit?
ii. above 100 kilos?
iii. below 80 kilos?
iv. between 90 and 110 kilos of fruit?

c. If a tree that yielded below µ – 3σ is to be cut down, how many trees would be cut
down?

d. If a tree that only yielded 50 kilos or less is to be cut down, how many trees would
remain?
