Chapter 4.data Management Lesson 1 2
Mathematics as a Tool (Part 1)
Mathematics as a tool refers to the use of mathematical concepts, techniques,
and principles as instruments for solving practical problems, making decisions,
and gaining a deeper understanding of the world.
Chapter 4
Data Management
Data management refers
to the practice of collecting,
storing, organizing, and
maintaining data in a structured
and systematic way.
d. Stub column or stub is usually found in the leftmost column of the table. It lists the
major independent or predictor variables.
e. Column heading is the heading that identifies the entries in just one column of the
table body.
True Lower Class Boundaries (TLCB) = Lower Limit – 0.5 unit of a measure
True Upper Class Boundaries (TUCB) = Upper Limit + 0.5 unit of a measure
a. Less than CF (<CF). It is the accumulated frequency from the lowest class
interval.
Mean
The mean represents the center of the data. It is the most
important measure if the distribution is symmetric and the most stable
measure of location. It is used when the data is at least interval. When n
is small, the mean is very sensitive to extreme values.
It is computed by summing all the observations in the sample and
dividing the sum by the number of observations.
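A minimal Python sketch of this computation, using the ICU admissions data quoted in Example 8. The function name `mean` and the extra value 60 are illustrative assumptions, not part of the lesson.

```python
def mean(values):
    """Arithmetic mean: sum of the observations divided by their count."""
    if not values:
        raise ValueError("mean requires at least one observation")
    return sum(values) / len(values)


# ICU admissions for the eight hospitals (data quoted in Example 8).
admissions = [8, 11, 5, 14, 8, 11, 16, 11]
print(mean(admissions))  # 10.5

# With a small n, one extreme value shifts the mean noticeably.
# The value 60 is hypothetical, added only to show this sensitivity.
print(mean(admissions + [60]))  # 16.0
```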
Properties of Mean
a) A set of data has only one mean.
b) Mean can be applied for interval and ratio data.
c) All values in the data set are included in computing the mean.
d) The mean is very useful in comparing two or more data sets.
e) Mean is affected by extremely small or large values in a data set.
f) Mean is most appropriate for symmetrical data.
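For ungrouped data, the formulas for the mean are as follows.
Population mean: $\mu = \frac{\sum X_i}{N}$, where $N$ is the population size.
Sample mean: $\bar{x} = \frac{\sum x_i}{n}$, where $n$ is the sample size.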
admissions
Solution: years
Weighted mean ($\mu_w$ or $\bar{x}_w$) is the sum of the mean of each
group multiplied by its respective weight, divided by the
sum of the weights. (For the ordinary mean, the weights
in each distribution are equal.) An example of a weighted
mean is the weighted average of a student's grades in a
semester, used to determine whether he or she belongs to the
dean's list. Each grade has a corresponding
number of units (for example, GECMAT is 3 units, a major
subject is 4 or 5 units, and so on).
The formula of the weighted mean is
$$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i}$$
where $x_i$ is the value (or mean) of the $i$th group and $w_i$ is its weight.
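As a sketch of the weighted mean described above, the following Python snippet computes a semester average weighted by units. The grades and unit counts are hypothetical values chosen for illustration, and `weighted_mean` is an assumed helper name.

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the sum of the weights must not be zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical grades and units (not taken from the lesson).
grades = [1.5, 2.0, 1.25]
units = [3, 4, 5]
print(weighted_mean(grades, units))  # 1.5625
```

With equal weights the function reduces to the ordinary mean, which matches the remark that the weights are equal for the mean alone.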
Example 5. Francis answered 20 calculus problems. He spent
1 hours for the first 6 problems; 45 minutes for the next 3;
and 3 hours for the last 11 problems. What was the average
time (in minutes) he spent for the 20 problems?
Solution
This problem requires the weighted average time because
each set of problems has a weight (which is time).
Median (Population median: $\tilde{\mu}$, Sample median: $\tilde{x}$)
Median is the positional middle of the data array. In the data array,
one-half of the values precede the median and one-half follow it.
When the data set is ordered, whether ascending or descending, it is
called a data array. Median is an appropriate measure of central
tendency for data that are ordinal or above, but is more valuable in an
ordinal type of data.
Properties of Median
• The median is unique; there is only one median for a set of data.
• The median is found by arranging the set of data from lowest to
highest (or highest to lowest) and getting the value of the middle
observation.
• Median is not affected by extremely small or large values.
• Median can be applied for ordinal, interval, and ratio data.
• Median is most appropriate for skewed data.
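For ungrouped data, the first step in calculating
the median, denoted by ($\tilde{x}$), is to arrange the data in
an array. Let $x_{(k)}$ be the $k$th observation in the array, $k = 1, 2, \dots, n$.
If $n$ is odd, the median position equals $\frac{n+1}{2}$, and the
value of the $\frac{n+1}{2}$th observation in the array is taken as
the median, i.e. $\tilde{x} = x_{\left(\frac{n+1}{2}\right)}$.
If $n$ is even, the mean of the two middle values in
the array is the median, i.e. $\tilde{x} = \frac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2}$.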
Example 6. Find the median of the given data set: 75, 67, 71, 75, and 72
Solution
First, arrange the data set in ascending order: 67, 71, 72, 75, 75
Since $n = 5$ is odd, the median is the 3rd observation in the array. Therefore, $\tilde{x} = 72$.
Solution
Array: 2.3, 2.5, 2.6, 2.9, 3.1, 3.4, 3.6, 4.1, 4.3
Since $n = 9$ is odd, the median is the 5th observation in the array: $\tilde{x} = 3.1$ seconds
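The odd and even cases of the median can be sketched in Python as follows. The function name `median` is an assumption. The first two data sets come from the examples above, and the third is the eight-value set used later in this lesson.

```python
def median(values):
    """Positional middle of the data array (the sorted data)."""
    array = sorted(values)
    n = len(array)
    if n == 0:
        raise ValueError("median requires at least one observation")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th observation, counting from 1.
        return array[mid]
    # Even n: the mean of the two middle observations.
    return (array[mid - 1] + array[mid]) / 2


print(median([75, 67, 71, 75, 72]))  # 72
print(median([2.5, 3.6, 3.1, 4.3, 2.9, 2.3, 2.6, 4.1, 3.4]))  # 3.1
print(median([7, 8, 9, 9, 10, 10, 11, 12]))  # 9.5
```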
Mode (Population mode: $\hat{\mu}$, Sample mode: $\hat{x}$)
Mode is the observed value that occurs most
frequently. It locates the point where the observation
values occur with the greatest density. It does not always
exist, and if it does, it may not be unique. A data set is
said to be unimodal if there is only one mode, bimodal if
there are two modes, and multimodal if there are three or more.
In some cases, all values in a data set have the
same frequency. When this occurs, the data set is
said to have no mode.
Properties of Mode
• The mode is found by locating the most
frequently occurring value.
• The mode is the easiest average to compute.
• There can be more than one mode or even no
mode in any given data set.
• Mode is not affected by extremely small or
large values.
• Mode can be applied for nominal, ordinal,
interval, and ratio data.
Example 8. The eight hospitals described in Example 1 had
the following number of ICU admissions: 8, 11, 5, 14, 8, 11,
16, and 11. Find the mode.
Solution
$\hat{x} = 11$ admissions, since 11 occurs three times, more often than any other value.
Example 9. The reaction times for a random sample of 9
objects described in Example 6 were recorded as 2.5, 3.6, 3.1,
4.3, 2.9, 2.3, 2.6, 4.1, and 3.4 seconds. Calculate the mode.
Solution
The mode does not exist since all values have the same frequency.
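A short Python sketch of the mode, covering the unimodal, multimodal, and no-mode cases described above. The function name `modes` and the convention of returning an empty list for "no mode" are assumptions made for illustration.

```python
from collections import Counter


def modes(values):
    """Return the mode(s) in ascending order; an empty list means no mode."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    # When several distinct values all share the same frequency,
    # the lesson treats the data set as having no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([8, 11, 5, 14, 8, 11, 16, 11]))  # [11]
print(modes([2.5, 3.6, 3.1, 4.3, 2.9, 2.3, 2.6, 4.1, 3.4]))  # []
print(modes([7, 8, 9, 9, 10, 10, 11, 12]))  # [9, 10]
```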
For grouped data, the mean is $\mu = \frac{\sum f_i X_i}{N}$ or $\bar{x} = \frac{\sum f_i x_i}{n}$, where $f_i$ is the frequency of
the class interval and $x_i$ is the midpoint of the class interval.
Example 10. Calculate the mean grade of 50 students in statistics
below and give its description or interpretation.
Solution.
First, determine the midpoint ($x_i$) of each interval and the total
frequency ($n$) or $\sum f_i$.
Second, add a column for $f_i x_i$, which is the product of a frequency ($f_i$)
and the midpoint ($x_i$) of the class interval, and find the sum of
column $f_i x_i$ or $\sum f_i x_i$.
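The two steps above can be sketched in Python. The frequency table below is hypothetical, since the lesson's table is not reproduced here; only the total of 50 students is taken from the example. `grouped_mean` is an assumed name.

```python
def grouped_mean(classes):
    """Mean of grouped data: sum of f * midpoint divided by total frequency.

    classes is a list of (lower_limit, upper_limit, frequency) tuples.
    """
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("total frequency must be positive")
    # Midpoint of each class is (lower limit + upper limit) / 2.
    total = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total / n


# Hypothetical frequency distribution of 50 grades (illustration only).
table = [(70, 74, 5), (75, 79, 10), (80, 84, 18), (85, 89, 12), (90, 94, 5)]
print(grouped_mean(table))  # 82.2
```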
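For grouped data, the median is approximated by
$$\tilde{x} = L + \left(\frac{\frac{n}{2} - F}{f}\right) c$$
where
$L$ = lower boundary of class containing the median
$n$ = sample size
$F$ = cumulative frequency of classes preceding class containing
the median
$f$ = number of observations in class containing the median
$c$ = width of the interval containing the median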
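A Python sketch of the grouped median, assuming the usual interpolation formula: lower boundary plus (n/2 minus the preceding cumulative frequency) over the class frequency, times the class width. The frequency table is hypothetical, the unit of measure is assumed to be 1 (so boundaries sit 0.5 beyond the limits), and `grouped_median` is an assumed name.

```python
def grouped_median(classes):
    """Approximate median from (lower_limit, upper_limit, frequency) tuples
    listed in ascending order."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("total frequency must be positive")
    half = n / 2
    cumulative = 0  # less-than cumulative frequency of the preceding classes
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= half:
            lower_boundary = lower - 0.5  # true lower class boundary
            width = upper - lower + 1     # class width for a unit of 1
            return lower_boundary + (half - cumulative) / freq * width
        cumulative += freq
    raise ValueError("median class not found")


# Hypothetical frequency distribution of 50 grades (illustration only).
table = [(70, 74, 5), (75, 79, 10), (80, 84, 18), (85, 89, 12), (90, 94, 5)]
print(round(grouped_median(table), 2))  # 82.28
```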
Example 11. Calculate the median grade of 50 students in statistics
below and give its description or interpretation.
Solution
First, we add two columns for class boundaries and less
than cumulative frequency (<CF).
By substitution,
For grouped data, the mode is approximated by
$$\hat{x} = L + \left(\frac{d_1}{d_1 + d_2}\right) c$$
where
$L$ = lower class boundary of the modal class
$d_1$ = difference between the frequency of the modal class and
that of the immediately preceding lower class
$d_2$ = difference between the frequency of the modal class and
that of the immediately following higher class
$c$ = class width or size
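A Python sketch of the grouped mode, assuming the usual formula: lower boundary of the modal class plus d1 / (d1 + d2) times the class width, where d1 and d2 are the differences defined above. The table is hypothetical (only the modal class 80-84 echoes Example 12), and `grouped_mode` is an assumed name.

```python
def grouped_mode(classes):
    """Approximate mode from (lower_limit, upper_limit, frequency) tuples
    listed in ascending order."""
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))  # first class with the highest frequency
    lower, upper, f_modal = classes[k]
    f_before = freqs[k - 1] if k > 0 else 0
    f_after = freqs[k + 1] if k < len(freqs) - 1 else 0
    d1 = f_modal - f_before  # modal class vs. preceding lower class
    d2 = f_modal - f_after   # modal class vs. following higher class
    if d1 + d2 == 0:
        raise ValueError("modal class is not distinct from its neighbours")
    lower_boundary = lower - 0.5  # assumes a unit of measure of 1
    width = upper - lower + 1
    return lower_boundary + d1 / (d1 + d2) * width


# Hypothetical frequency distribution of 50 grades (illustration only).
table = [(70, 74, 5), (75, 79, 10), (80, 84, 18), (85, 89, 12), (90, 94, 5)]
print(round(grouped_mode(table), 2))  # 82.36
```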
Example 12. Calculate the modal grade of 50 students in statistics
below and give its description or interpretation.
Solution
First, determine the modal class of the distribution. The modal class of
the distribution has the highest frequency. Hence, 80-84 is the modal
class.
By substitution,
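a) 3, 4, 5, 5, 6, 7, 9, 10, 14
b) 7, 8, 9, 9, 10, 10, 11, 12
Solution
a) 3, 4, 5, 5, 6, 7, 9, 10, 14
Mean: $\bar{x} = \frac{63}{9} = 7$ years
Median: Since $n$ is 9 (which is odd), use the formula $\tilde{x} = x_{\left(\frac{n+1}{2}\right)}$, the value of the $\frac{n+1}{2}$th observation in the array.
$\tilde{x} = x_{(5)}$. Hence, $\tilde{x} = 6$ years.
Mode: The mode is 5 years since it has the highest frequency (it
appears twice in the distribution).
b) 7, 8, 9, 9, 10, 10, 11, 12
Mean: $\bar{x} = \frac{76}{8} = 9.5$ years
Median: Since $n$ is 8 (which is even), use the formula $\tilde{x} = \frac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2}$, the mean of the two middle values in the array.
$\tilde{x} = \frac{x_{(4)} + x_{(5)}}{2} = \frac{9 + 10}{2} = 9.5$ years.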
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \qquad s = \sqrt{s^2}$$
where $s^2$ is the sample variance, $s$ is the sample standard deviation, $x_i$ is the value
of any particular observation or measurement, $\bar{x}$ is the sample mean, and $n$ is the
sample size.
Example 15: A sample of 5 households showed the following
number of household members: 3, 8, 5, 4, and 4. Find the variance
and standard deviation.
Solution
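First, solve for the sample mean ($\bar{x}$) and add the columns for ($x_i - \bar{x}$)
and $(x_i - \bar{x})^2$. Here $\bar{x} = \frac{24}{5} = 4.8$ and $\sum (x_i - \bar{x})^2 = 14.8$.
Second, solve for the sample variance and sample
standard deviation by substitution: $s^2 = \frac{14.8}{5 - 1} = 3.7$ and $s = \sqrt{3.7} \approx 1.92$.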
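The computation in Example 15 can be checked with a short Python sketch. It assumes the sample variance divides the sum of squared deviations by n - 1, and the function name `sample_variance` is illustrative.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


# Household sizes from Example 15.
members = [3, 8, 5, 4, 4]
variance = sample_variance(members)
std_dev = math.sqrt(variance)
print(round(variance, 2), round(std_dev, 2))  # 3.7 1.92
```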
Measures of Relative Position
When presenting or analyzing a data set, it is sometimes
helpful to group subjects into several equal groups. For
example, to create four equal groups we need the values that
split the data such that 25% of the observations are in each
group. The cut off points are called quartiles, and there are
three (3) of them (the middle one also being called the
median). The general term for such cut off points is quantiles;
other values likely to be encountered are deciles, which split
data into 10 parts, and percentiles, which split the data into
100 parts (also called centiles). Values such as quartiles can
also be expressed as percentiles; for example, the lowest
quartile is also the 25th percentile and the median is the 50th
percentile or the 5th decile.
1. Percentiles
Percentiles are values that divide a set of observations in an
array into 100 equal parts. Thus, $P_1$, read as first percentile, is the
value below which 1% of the values fall; $P_2$, read as second percentile,
is the value below which 2% of the values fall; ...; $P_{99}$, read as
ninety-ninth percentile, is the value below which 99% of the values fall.
Example. The 80th percentile of a distribution is a value such
that at least 80 percent of the ordered observations are less than its
value and at least 20 percent of the ordered observations are larger
than its value. If $P_{80} = 75$: at least 80% of the ordered observations are less
than 75 and at least 20% of the ordered observations are larger than 75.
So any observation that is smaller than the $P_{80}$ value belongs in the lower
80% of the distribution, while any observation greater than the $P_{80}$ value
belongs in the upper 20% of the distribution.
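To compute the percentile, we have
$P_i$ = the value of the $\frac{i(n+1)}{100}$th observation in the array
Note:
If $\frac{i(n+1)}{100}$ is a whole number, the percentile is the
$\frac{i(n+1)}{100}$th observation.
If $\frac{i(n+1)}{100}$ has a fractional value (decimal value), the
percentile is the observation at the next higher integer value of $\frac{i(n+1)}{100}$.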
Example 16. The following were the scores of 10 students in a
short quiz. Find the 64th percentile.
2 8 6 9 7 5 8 10 10 1
Solution: First arrange the data from lowest to highest.
1 2 5 6 7 8 8 9 10 10
Then, using $P_i$ = the value of the $\frac{i(n+1)}{100}$th
observation, we have
$P_{64}$ = the value of the $\frac{64(10+1)}{100}$th = 7.04th
or 8th observation (always round up to the nearest whole number)
Since the 8th observation in the ordered array is 9, the
64th percentile of the distribution is 9, which is interpreted as at
least 64% of the scores are below 9.
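A Python sketch of this percentile rule. It assumes the position is i(n + 1)/100 rounded up to the next whole number, a rule that reproduces the 8th-observation answer in Example 16. Clamping the position to the range 1..n is an added assumption for very small or very large i, and `percentile` is an illustrative name.

```python
import math


def percentile(values, i):
    """P_i taken as the observation at position ceil(i * (n + 1) / 100)."""
    array = sorted(values)
    n = len(array)
    if n == 0:
        raise ValueError("percentile requires at least one observation")
    position = math.ceil(i * (n + 1) / 100)
    # Assumption: keep the position inside the array for extreme i.
    position = min(max(position, 1), n)
    return array[position - 1]  # positions count from 1, list indices from 0


# Quiz scores from Example 16.
scores = [2, 8, 6, 9, 7, 5, 8, 10, 10, 1]
print(percentile(scores, 64))  # 9
```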
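Approximating the Percentile from a Frequency Distribution
To solve for the percentile in grouped data, we have
$$P_i = L + \left(\frac{\frac{in}{100} - F}{f}\right) c$$
where
The $P_i$th class is the class where the $\frac{in}{100}$th observation falls.
$L$ = lower class boundary of the $P_i$th class
$F$ = less than cumulative frequency of the class preceding the $P_i$th class
$f$ = frequency of the $P_i$th class
$c$ = class width or size
First, add one column in the FDT for <CF and determine the $P_{35}$th class using $\frac{in}{100}$.
Using $\frac{in}{100}$ with $i = 35$, we have 38.5. Since 38.5 falls in the class interval 70-74, the $P_{35}$th class is
70-74. Therefore, by substitution, we have
$P_{35} = 70.82$. Hence, at least thirty-five percent of the scores in the achievement test are below 70.82.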
2. Deciles
Deciles are values that divide the array into
10 equal parts. Thus, $D_1$, read as first decile, is the
value below which 10% of the values fall; $D_2$,
read as second decile, is the value below which
20% of the values fall; ...; $D_9$, read as ninth decile,
is the value below which 90% of the values fall.
To compute the decile, we have
$D_i$ = the value of the $\frac{i(n+1)}{10}$th observation in the array
Example 18. From the given set of scores in a quiz, find the 4th decile or $D_4$.
3 8 9 11 12 18 19
Solution
Since the data is already arranged from lowest to highest, we
may proceed in finding the 4th decile.
3 8 9 11 12 18 19
Using $D_i$ = the value of the $\frac{i(n+1)}{10}$th observation, we have
$D_4$ = the value of the $\frac{4(7+1)}{10}$th = 3.2th
or 4th observation (always round up to the nearest whole number)
Since the 4th observation in the ordered array of the given
distribution is 11, the 4th decile of the distribution is 11, which
is interpreted as at least 40% of the scores are below 11.
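Approximating the Decile from a Frequency Distribution
To solve for the decile in grouped data, we have
$$D_i = L + \left(\frac{\frac{in}{10} - F}{f}\right) c$$
where
The $D_i$th class is the class where the $\frac{in}{10}$th observation falls.
$L$ = lower class boundary of the $D_i$th class
$F$ = less than cumulative frequency of the class preceding the $D_i$th class
$f$ = frequency of the $D_i$th class
$c$ = class width or size
By substitution, we have $D_6 = 78.45$.
Hence, at least sixty percent of the scores in the achievement test are below 78.45.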
3. Quartiles
Quartiles are values that divide the array into 4 equal parts. Thus, $Q_1$, read as
first quartile, is the value below which 25% of the values fall; $Q_2$, read as second
quartile, is the value below which 50% of the values fall; and $Q_3$, read as third quartile, is the
value below which 75% of the values fall.
Example 20. From the given set of scores in a quiz, find the 3rd quartile or $Q_3$.
3 8 9 11 12 18 19
Solution
Since the data is already arranged from lowest to highest, we may proceed in finding
the 3rd quartile.
Using $Q_i$ = the value of the $\frac{i(n+1)}{4}$th observation, we have
$Q_3$ = the value of the $\frac{3(7+1)}{4}$th = 6th
observation.
Since the 6th observation in the ordered array of the given distribution is 18, the
3rd quartile of the distribution is 18, which is interpreted as at least 75% of the scores are
below 18.
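Deciles and quartiles in ungrouped data follow the same positional idea as percentiles, so one helper can serve both. The sketch below assumes the position i(n + 1)/parts rounded up, with parts = 10 for deciles and parts = 4 for quartiles; this reproduces the 4th and 6th observations found in Examples 18 and 20. The name `quantile` and the clamping step are assumptions.

```python
import math


def quantile(values, i, parts):
    """Observation at position ceil(i * (n + 1) / parts) in the sorted array."""
    array = sorted(values)
    n = len(array)
    if n == 0:
        raise ValueError("quantile requires at least one observation")
    position = math.ceil(i * (n + 1) / parts)
    position = min(max(position, 1), n)  # assumption: stay inside the array
    return array[position - 1]


# Quiz scores used in Examples 18 and 20.
scores = [3, 8, 9, 11, 12, 18, 19]
print(quantile(scores, 4, 10))  # D4 -> 11
print(quantile(scores, 3, 4))   # Q3 -> 18
```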
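Approximating the Quartile from a Frequency Distribution
To solve for the quartile in grouped data, we have
$$Q_i = L + \left(\frac{\frac{in}{4} - F}{f}\right) c$$
where
The $Q_i$th class is the class where the $\frac{in}{4}$th observation falls.
$L$ = lower class boundary of the $Q_i$th class
$F$ = less than cumulative frequency of the class preceding the $Q_i$th class
$f$ = frequency of the $Q_i$th class
$c$ = class width or size
By substitution, we have $Q_1 = 67$.
Hence, at least 25% of the scores in the achievement test are below 67.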
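The grouped percentile, decile, and quartile approximations share one interpolation pattern, sketched below in Python. It assumes the target position is i * n / parts and a unit of measure of 1 for the class boundaries. The frequency table is hypothetical (the achievement-test table is not reproduced here), so the outputs do not match the 70.82, 78.45, or 67 quoted above. `grouped_quantile` is an assumed name.

```python
def grouped_quantile(classes, i, parts):
    """Interpolated quantile from (lower_limit, upper_limit, frequency)
    tuples in ascending order; parts is 100, 10, or 4."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("total frequency must be positive")
    target = i * n / parts
    cumulative = 0  # less-than cumulative frequency of the preceding classes
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= target:
            lower_boundary = lower - 0.5
            width = upper - lower + 1
            return lower_boundary + (target - cumulative) / freq * width
        cumulative += freq
    raise ValueError("i must be between 0 and parts")


# Hypothetical frequency distribution of 50 scores (illustration only).
table = [(70, 74, 5), (75, 79, 10), (80, 84, 18), (85, 89, 12), (90, 94, 5)]
print(round(grouped_quantile(table, 35, 100), 2))  # P35 -> 80.19
print(round(grouped_quantile(table, 6, 10), 2))    # D6  -> 83.67
print(grouped_quantile(table, 1, 4))               # Q1  -> 78.25
```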
4. $z$-Score
A $z$-score is used to know the position of one
observation relative to others in a set of data. Let us say we
want to know how a student's score of 42 compares with the
scores of the other students in the class on a quiz
with a total of 50 points. The mean and the standard
deviation of the scores can be used to compute a $z$-score,
which measures the relative standing of a
measurement in a data set.
A $z$-score measures the distance between an
observation and the mean, measured in units of standard
deviation. The following formulas show how to compute
the $z$-score for a data value in a population and in a
sample.
For population: $z = \frac{X - \mu}{\sigma}$    For sample: $z = \frac{x - \bar{x}}{s}$
Example 22: The monthly expenditures of a large group of
households have a mean of ₱48,700 and a standard deviation of
₱10,400. What are the $z$-values of monthly expenditures of ₱59,100
and ₱38,300?
Solution
Let $\mu$ = ₱48,700 and $\sigma$ = ₱10,400.
Using the formula for $z$, the $z$-values for the two values
(₱59,100 and ₱38,300) are computed as follows:
For ₱59,100: $z = \frac{59{,}100 - 48{,}700}{10{,}400} = 1.00$
For ₱38,300: $z = \frac{38{,}300 - 48{,}700}{10{,}400} = -1.00$
The $z$ of 1.00 indicates that a monthly expenditure of ₱59,100 for
households is one standard deviation above the mean, and a $z$ of
$-1.00$ shows that a ₱38,300 monthly expenditure is one standard
deviation below the mean. Note that both household monthly
expenditures (₱59,100 and ₱38,300) are the same distance
(₱10,400) from the mean.
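The two values in Example 22 can be standardized with a short Python sketch, taking the standard score as (value minus mean) divided by the standard deviation. The function name `z_score` is an assumption.

```python
def z_score(value, mean, std_dev):
    """Distance of value from the mean, in units of standard deviation."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / std_dev


# Mean and standard deviation from Example 22 (in pesos).
print(z_score(59_100, 48_700, 10_400))  # 1.0
print(z_score(38_300, 48_700, 10_400))  # -1.0
```

Both expenditures differ from the mean by 10,400 pesos, which is why the two results have the same magnitude and opposite signs.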
Example 23: Raul has taken two tests in his mathematics
class. He scored 72 on the first test, for which the mean
of all scores was 65 and the standard deviation was 8. He
received a 60 on a second test, for which the mean of all
scores was 45 and the standard deviation was 12. In
comparison to the other students, did Raul do better on
the first test or the second test?
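One way to approach Example 23 is to convert each test score to a standard score, (score minus mean) divided by the standard deviation, and compare the results. The sketch below does this; the helper name `z_score` and the comparison rule (the larger standard score means the better relative standing) are assumptions made for illustration.

```python
def z_score(value, mean, std_dev):
    """Distance of value from the mean, in units of standard deviation."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / std_dev


first_test = z_score(72, 65, 8)    # score 72, mean 65, standard deviation 8
second_test = z_score(60, 45, 12)  # score 60, mean 45, standard deviation 12
better = "second" if second_test > first_test else "first"
print(first_test, second_test, better)  # 0.875 1.25 second
```

Under these inputs the second test has the larger standard score, which suggests Raul did relatively better on the second test even though his raw score was lower.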
or hours
The light bulb had a life span of 950 hours.
5. Box-and-Whisker Plot
A box-and-whisker plot (sometimes called a
boxplot) is often used to provide a visual
summary of a set of data. It is a graph of a data set
obtained by drawing a horizontal line from the
minimum data value to the first quartile ($Q_1$), drawing a
horizontal line from the third quartile ($Q_3$) to the maximum
data value, and drawing a box whose vertical sides
pass through $Q_1$ and $Q_3$, with a vertical line inside the
box passing through the median or second quartile
($Q_2$).
The boxplot will give the following information:
a) If the median is near the center of the box, the distribution is
approximately symmetric.
b) If the median falls to the right of the center of the box, the distribution is
negatively skewed.
c) If the median falls to the left of the center of the box, the distribution is
positively skewed.
d) If the lines are about the same length, the distribution is approximately
symmetric.
e) If the left line is larger than the right line, the distribution is negatively
skewed.
f) If the right line is larger than the left line, the distribution is positively
skewed.
Example 25: Construct a boxplot for the data set of the ages of 11 middle-management employees of
a certain company. The ages are 45, 48, 49, 49, 51, 51, 53, 55, 55, 58, and 59. What can you say
about the distribution of the data set?
Step 1: Determine the $Q_1$, Median, and $Q_3$ of the given data set. Recall that $Q_1 = 49$, Median = 51, and $Q_3 = 55$.
Step 2: Locate the lowest value, $Q_1$, the median, $Q_3$, and the highest value on the scale.
Step 3: Draw a box around $Q_1$ and $Q_3$, draw a vertical line through the median, and connect the upper and
lower values, as shown in the figure below.
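The values needed for the boxplot in Example 25 can be computed with a Python sketch. It assumes the quartile position rule i(n + 1)/4 rounded up and the usual odd/even rule for the median. The function names and the dictionary layout are illustrative, and no plotting library is used.

```python
import math


def quartile(array, i):
    """Q_i at position ceil(i * (n + 1) / 4) of an already sorted array."""
    n = len(array)
    position = min(max(math.ceil(i * (n + 1) / 4), 1), n)
    return array[position - 1]


def five_number_summary(values):
    """Lowest value, Q1, median, Q3, and highest value of a data set."""
    array = sorted(values)
    n = len(array)
    mid = n // 2
    median = array[mid] if n % 2 == 1 else (array[mid - 1] + array[mid]) / 2
    return {
        "min": array[0],
        "q1": quartile(array, 1),
        "median": median,
        "q3": quartile(array, 3),
        "max": array[-1],
    }


ages = [45, 48, 49, 49, 51, 51, 53, 55, 55, 58, 59]
summary = five_number_summary(ages)
print(summary)  # {'min': 45, 'q1': 49, 'median': 51, 'q3': 55, 'max': 59}

# Reading the plot: compare the median with the centre of the box,
# and compare the lengths of the two whiskers.
box_center = (summary["q1"] + summary["q3"]) / 2
left_whisker = summary["q1"] - summary["min"]
right_whisker = summary["max"] - summary["q3"]
print(box_center, left_whisker, right_whisker)  # 52.0 4 4
```

Here the median (51) lies slightly to the left of the box centre (52) while the two whiskers have equal length, so the listed rules point to a distribution that is close to symmetric with a slight positive skew inside the box.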
[Figure: boxplot of the ages, marking LS, $Q_1$, the median ($\tilde{x}$), $Q_3$, and HS]