MMW PPT Weeks 9 12
Introduction:
Central tendency is the point about which the scores tend to cluster. It is the
center of concentration of scores in any set of data: a single number that represents
the general level of performance of a group.
Learning Objectives:
After successful completion of this lesson, you should be able to:
1. calculate the mean, mode, median and range for a set of discrete data
2. determine the appropriate measure of central tendency for a given set of
data.
3. discuss the characteristics and uses of mean, median and mode.
Course Material:
Arithmetic Mean (x̄) is the average of the set of data. It is the center of gravity of a
distribution. (Ungrouped data)
$$\bar{x} = \frac{\sum x}{n}$$
Example: 65 55 89 56 35 14 56 55 87 45 92
$$\bar{x} = \frac{649}{11} = 59$$
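As a quick check of the arithmetic, here is a minimal Python sketch of the ungrouped mean formula. The function name `arithmetic_mean` is our own, used only for illustration.

```python
from statistics import mean

# The 11 scores from the ungrouped example
scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]

def arithmetic_mean(values):
    """x̄ = Σx / n for ungrouped data."""
    return sum(values) / len(values)

x_bar = arithmetic_mean(scores)
print(x_bar)  # 59.0
# The standard library's statistics.mean gives the same answer
assert x_bar == mean(scores)
```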
Grouped Data
$$\bar{x} = \frac{\sum fx}{n}$$
Where:
f = frequency
x = class mark (midpoint of the class interval)
n = number of samples (total frequency)
Example:
Class Interval    f    x (class mark)    fx
80-84             1    82                82
75-79             1    77                77
70-74             1    72                72
65-69             4    67                268
60-64             4    62                248
55-59             7    57                399
50-54             6    52                312
45-49             6    47                282
40-44             6    42                252
35-39             3    37                111
30-34             0    32                0
25-29             1    27                27
                  n = 40                 Σfx = 2130
$$\bar{x} = \frac{\sum fx}{n} = \frac{2130}{40}$$
$$\bar{x} = 53.25$$
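The grouped-data formula can be sketched the same way. This minimal example assumes each class is stored as a (lower limit, upper limit, frequency) tuple, a layout chosen here for illustration, and takes the class mark as the midpoint of the limits.

```python
# Grouped data from the example: (lower limit, upper limit, frequency)
classes = [
    (80, 84, 1), (75, 79, 1), (70, 74, 1), (65, 69, 4),
    (60, 64, 4), (55, 59, 7), (50, 54, 6), (45, 49, 6),
    (40, 44, 6), (35, 39, 3), (30, 34, 0), (25, 29, 1),
]

def grouped_mean(classes):
    """x̄ = Σfx / n, where x is the class mark (midpoint) of each interval."""
    n = sum(f for _, _, f in classes)
    sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    return sum_fx / n

print(grouped_mean(classes))  # 53.25
```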
Median (x̃) is a positional value. It is the midpoint of the distribution when the data are
ranked according to size. (Ungrouped data)
To calculate the median, suppose we have the data below:
65 55 89 56 35 14 56 55 87 45 92
Arranged in order of size:
14 35 45 55 55 56 56 65 87 89 92
Our median mark is the middle mark, in this case 56.
It is the middle mark because there are 5 scores before it and 5 scores after it.
This works fine when you have an odd number of scores, but what happens
when you have an even number of scores? What if you had only 10 scores?
You simply take the two middle scores and average them. For example:
65 55 89 56 35 14 56 55 87 45
14 35 45 55 55 56 56 65 87 89
Now we take the 5th and 6th scores in the ranked data (55 and 56) and
average them to get a median of (55 + 56)/2 = 55.5.
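The odd/even rule above translates directly into code. A minimal sketch; `ungrouped_median` is a name chosen for illustration, and Python's `statistics.median` follows the same rule.

```python
from statistics import median

def ungrouped_median(values):
    """Middle value of the ranked data; average of the two middle values when n is even."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2

odd = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
even = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45]
print(ungrouped_median(odd), ungrouped_median(even))  # 56 55.5
```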
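Grouped Data
$$\tilde{x} = L_R + \left[\frac{\frac{n}{2} - F_{\le}}{f}\right] \times i$$
Where:
L_R = lower real limit (lower class boundary) of the median class
F≤ = cumulative less-than frequency of the class just below the median class
f = frequency of the median class
n = sample size
i = class size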
Example
Class Interval    Frequency (f)    Lower Class Boundary    ≤CF
80-84             1                79.5                    40
75-79             1                74.5                    39
70-74             1                69.5                    38
65-69             4                64.5                    37
60-64             4                59.5                    33
55-59             7                54.5                    29
50-54             6                49.5                    22
45-49             6                44.5                    16
40-44             6                39.5                    10
35-39             3                34.5                    4
30-34             0                29.5                    1
25-29             1                24.5                    1
                  n = 40
Since n/2 = 40/2 = 20, find where 20 falls in the cumulative frequency column.
The first cumulative frequency that reaches 20 is 22, so the median class is the interval 50-54.
Using the interval (50-54):
Given:
L_R = 49.5    f = 6
F≤ = 16       n = 40
i = 5
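$$\tilde{x} = L_R + \left[\frac{\frac{n}{2} - F_{\le}}{f}\right] \times i$$
$$\tilde{x} = 49.5 + \left[\frac{\frac{40}{2} - 16}{6}\right] \times 5 = 49.5 + 3.33$$
$$\tilde{x} = 52.83$$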
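The same steps (halve n, scan the cumulative frequencies, interpolate inside the median class) can be sketched in Python. This assumes integer class limits, so the lower real limit is the lower limit minus 0.5 and the class size is upper − lower + 1; the tuple layout is our own choice.

```python
# Classes as (lower limit, upper limit, frequency); integer class limits assumed
classes = [
    (25, 29, 1), (30, 34, 0), (35, 39, 3), (40, 44, 6), (45, 49, 6), (50, 54, 6),
    (55, 59, 7), (60, 64, 4), (65, 69, 4), (70, 74, 1), (75, 79, 1), (80, 84, 1),
]

def grouped_median(classes):
    """x̃ = L_R + ((n/2 - F≤) / f) * i, scanning the classes in ascending order."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("no observations")
    half = n / 2
    cum_below = 0  # F≤: cumulative frequency of all classes below the current one
    for lo, hi, f in sorted(classes):
        if cum_below + f >= half:          # first class whose ≤CF reaches n/2
            lower_real_limit = lo - 0.5    # assumes integer class limits
            width = hi - lo + 1            # class size i
            return lower_real_limit + (half - cum_below) / f * width
        cum_below += f

print(round(grouped_median(classes), 2))  # 52.83
```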
Mode (x̂) is a frequency value. It is the value that occurs most frequently. (Ungrouped
data) Suppose we have the data below. What is the mode?
65 55 89 56 35 14 55 55 87 45 92
We first need to rearrange the data in order of magnitude (smallest first):
14 35 45 55 55 55 56 65 87 89 92
The mode is 55, since it occurs three times, more often than any other score.
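Counting occurrences is all the ungrouped mode needs. The sketch below returns a list so that data with two or more equally frequent values (bimodal data) are handled; `modes` is an illustrative name, and `statistics.multimode` behaves the same way.

```python
from collections import Counter
from statistics import multimode

def modes(values):
    """Return every value that occurs with the highest frequency (ties are kept)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

data = [65, 55, 89, 56, 35, 14, 55, 55, 87, 45, 92]
print(modes(data))  # [55]
```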
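Grouped Data
$$mo = L_R + \left[\frac{d_u}{d_u + d_l}\right] \times i$$
Where:
L_R = lower real limit (lower class boundary) of the modal class
d_u = frequency of the modal class minus the frequency of the class just below it (the next lower interval)
d_l = frequency of the modal class minus the frequency of the class just above it (the next higher interval)
i = class size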
Example
Class Interval    Frequency (f)
80-84             1
75-79             1
70-74             1
65-69             4
60-64             4
55-59             7
50-54             6
45-49             6
40-44             6
35-39             3
30-34             0
25-29             1
                  n = 40
Since the highest frequency in the given grouped data is 7, the modal class is the interval 55-59.
Solution
$$mo = L_R + \left[\frac{d_u}{d_u + d_l}\right] \times i$$
Given:
L_R = 54.5           d_l = (7 - 4) = 3
d_u = (7 - 6) = 1    i = 5
$$mo = 54.5 + \left[\frac{1}{1 + 3}\right] \times 5$$
$$mo = 55.75$$
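A minimal sketch of the grouped mode formula, assuming integer class limits and treating a missing neighbouring class (at either end of the table) as having frequency 0; both are assumptions of this sketch, not rules stated in the text.

```python
# Classes as (lower limit, upper limit, frequency); integer class limits assumed
classes = [
    (25, 29, 1), (30, 34, 0), (35, 39, 3), (40, 44, 6), (45, 49, 6), (50, 54, 6),
    (55, 59, 7), (60, 64, 4), (65, 69, 4), (70, 74, 1), (75, 79, 1), (80, 84, 1),
]

def grouped_mode(classes):
    """mo = L_R + (d_u / (d_u + d_l)) * i, using the class with the highest frequency."""
    ranked = sorted(classes)  # ascending by lower limit
    k = max(range(len(ranked)), key=lambda j: ranked[j][2])  # first modal class if tied
    lo, hi, f = ranked[k]
    f_below = ranked[k - 1][2] if k > 0 else 0                # next lower interval
    f_above = ranked[k + 1][2] if k + 1 < len(ranked) else 0  # next higher interval
    d_u, d_l = f - f_below, f - f_above
    if d_u + d_l == 0:
        raise ValueError("formula undefined: neighbouring classes tie with the modal class")
    return (lo - 0.5) + d_u / (d_u + d_l) * (hi - lo + 1)

print(grouped_mode(classes))  # 55.75
```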
Watch:
Mode, Median, Mean, Range, and Standard Deviation
https://www.youtube.com/watch?v=mk8tOD0t8M0
40 47 50 61 62 65 30 25 24 61
29 32 35 46 54 60 60 60 43 50
59 34 36 48 31 30 57 48 44 53
LESSON 4: STATISTICS/DATA MANAGEMENT
UNIT 2– MEASURES OF DISPERSION
Introduction:
A measure of dispersion is a single number that describes how scattered the
data are or how closely they are bunched together. It is also called a measure of
variability or a measure of spread.
Learning Objectives:
After successful completion of this lesson, you should be able to:
1. describe the range, variance and standard deviation
2. calculate the range, variance and standard deviation
Course Material:
MEASURES OF DISPERSION
a. Range is the simplest measure of dispersion. It is equal to the
difference between the highest score and the lowest score in a set of scores. It involves
only the two extremes of a distribution.
b. Variance is a measure of variability that considers the position of each
observation relative to the mean of the set of scores.
Ungrouped data
SAMPLE VARIANCE
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
Where:
x = scores    x̄ = sample mean    n = sample size
POPULATION VARIANCE
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
Where:
x = scores    μ = population mean    N = population size
x      x - x̄          (x - x̄)²
2      2 - 6 = -4      16
3      3 - 6 = -3      9
4      4 - 6 = -2      4
5      5 - 6 = -1      1
6      6 - 6 = 0       0
8      8 - 6 = 2       4
10     10 - 6 = 4      16
10     10 - 6 = 4      16
Σx = 48                Σ(x - x̄)² = 66
$$\bar{x} = \frac{\sum x}{n} = \frac{48}{8} = 6$$
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{66}{7} = 9.43$$
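The two variance formulas differ only in the divisor (n - 1 for a sample, N for a population). A minimal sketch on the same 8 scores, which also prints the range; the function names are our own, and the last two lines cross-check against Python's `statistics.variance` and `statistics.pvariance`.

```python
from statistics import pvariance, variance

scores = [2, 3, 4, 5, 6, 8, 10, 10]

def sample_variance(values):
    """s² = Σ(x - x̄)² / (n - 1)"""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def population_variance(values):
    """σ² = Σ(x - μ)² / N, treating the values as the whole population"""
    N = len(values)
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N

print(max(scores) - min(scores))              # range: 8
print(round(sample_variance(scores), 2))      # 9.43
print(round(population_variance(scores), 2))  # 8.25
# The standard library agrees with both formulas
assert abs(sample_variance(scores) - variance(scores)) < 1e-12
assert abs(population_variance(scores) - pvariance(scores)) < 1e-12
```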
Grouped data
SAMPLE VARIANCE
$$s^2 = \frac{\sum f(m - \bar{x})^2}{n - 1}$$
Where:
f = frequency     x̄ = sample mean
m = class mark    n = sample size
POPULATION VARIANCE
$$\sigma^2 = \frac{\sum f(m - \mu)^2}{N}$$
Where:
f = frequency     μ = population mean
m = class mark    N = population size
c. Standard Deviation is the positive square root of the variance. It is called
"standard" because it provides a standard unit for measuring the distance of the
scores from the mean.
Ungrouped data
SAMPLE STANDARD DEVIATION
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
Where:
x = scores    x̄ = sample mean    n = sample size
POPULATION STANDARD DEVIATION
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
Where:
x = scores    μ = population mean    N = population size
Example: A student was investigating the effect of synthetic fertilizer on the growth of
peanut seedlings. A random sample of those seedlings yielded the heights below in
inches. Find the standard deviation.
x      x - x̄          (x - x̄)²
2      2 - 6 = -4      16
3      3 - 6 = -3      9
4      4 - 6 = -2      4
5      5 - 6 = -1      1
6      6 - 6 = 0       0
8      8 - 6 = 2       4
10     10 - 6 = 4      16
10     10 - 6 = 4      16
Σx = 48                Σ(x - x̄)² = 66
$$\bar{x} = \frac{\sum x}{n} = \frac{48}{8} = 6$$
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{66}{7}} = 3.07$$
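Python's standard library computes both standard deviations directly, which gives a quick check of the seedling example: `statistics.stdev` divides by n - 1 and `statistics.pstdev` divides by N. The population figure is shown only for comparison, as if the 8 heights were the whole population.

```python
import math
from statistics import pstdev, stdev

heights = [2, 3, 4, 5, 6, 8, 10, 10]  # peanut seedling heights, inches

s = stdev(heights)       # sample standard deviation (divisor n - 1)
sigma = pstdev(heights)  # population standard deviation (divisor N)

print(round(s, 2))      # 3.07
print(round(sigma, 2))  # 2.87
# Same result as the formula in the text: sqrt(66 / 7)
assert math.isclose(s, math.sqrt(66 / 7))
```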
Grouped data
SAMPLE STANDARD DEVIATION
$$s = \sqrt{\frac{\sum f(m - \bar{x})^2}{n - 1}}$$
Where:
f = frequency     x̄ = sample mean
m = class mark    n = sample size
POPULATION STANDARD DEVIATION
$$\sigma = \sqrt{\frac{\sum f(m - \mu)^2}{N}}$$
Where:
f = frequency     μ = population mean
m = class mark    N = population size
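The notes give no worked grouped example for variance or standard deviation. As an illustrative sketch only, the code below applies the grouped formulas to the class marks and frequencies from the grouped mean example; the resulting values are our own computation, not figures from the text.

```python
import math

# (class mark m, frequency f) from the grouped mean example
grouped = [(82, 1), (77, 1), (72, 1), (67, 4), (62, 4), (57, 7),
           (52, 6), (47, 6), (42, 6), (37, 3), (32, 0), (27, 1)]

def grouped_variance(pairs, sample=True):
    """Σf(m - x̄)² divided by n - 1 (sample) or by N (population)."""
    n = sum(f for _, f in pairs)
    mean = sum(f * m for m, f in pairs) / n
    ss = sum(f * (m - mean) ** 2 for m, f in pairs)
    return ss / (n - 1 if sample else n)

s2 = grouped_variance(grouped)  # sample variance
s = math.sqrt(s2)               # sample standard deviation
print(round(s2, 2), round(s, 2))  # 138.14 11.75
```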
Watch:
Measures of Variability
https://www.youtube.com/watch?v=Cx2tGUze60s
Activity / Assessment no. 11:
Answer the following.
1. In an advertising company, a random sample of 10 employees gave the
following information on the number of hours they spend each day on consultations with
prospective clients: 3, 6, 4, 5, 3, 2, 2, 4, 3, and 2.
a. Find the range.
LESSON 4: STATISTICS/DATA MANAGEMENT
UNIT 3– MEASURES OF RELATIVE POSITION
Introduction:
Measures of position, or location, include standard
scores, percentiles, and quartiles. They are used to locate the relative position of a data
value in the data set. For example, if a value is located at the 80th percentile, it means
that 80% of the values fall below it in the distribution and 20% of the values fall above it.
The median is the value that corresponds to the 50th percentile, since one-half of the
values fall below it and one-half of the values fall above it.
Learning Objectives:
After successful completion of this lesson, you should be able to:
1. use a variety of statistical tools to process and manage numerical data
2. advocate the use of statistical data in making important decisions
Course Material:
a. Z-SCORES. The areas under the normal curve are given in terms of z-values,
or z-scores. A z-score locates a value X within a sample or a population by giving the
number of standard deviations X lies above or below the mean.
FORMULA
Z-SCORE FOR POPULATION DATA
$$Z = \frac{X - \mu}{\sigma}$$
Z-SCORE FOR SAMPLE DATA
$$Z = \frac{X - \bar{X}}{s}$$
Where:
X = given measurement
μ = population mean
σ = population standard deviation
X̄ = sample mean
s = sample standard deviation
Example: Raul has taken two tests in his chemistry class. He scored 72 on the first test,
for which the mean of all scores was 65 and the standard deviation was 8. He scored
60 on the second test, for which the mean of all scores was 45 and the standard deviation
was 12. In comparison with the other students, did Raul do better on the first test or the
second test?
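Solution:
First test:
$$Z = \frac{72 - 65}{8} = 0.875$$
Second test:
$$Z = \frac{60 - 45}{12} = 1.25$$
The z-scores indicate that Raul did better on the second test than on the first
test, since his second score lies more standard deviations above the mean.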
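The comparison reduces to computing two z-scores and seeing which is larger. A minimal sketch; `z_score` is an illustrative helper name.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z_first = z_score(72, 65, 8)    # first test
z_second = z_score(60, 45, 12)  # second test
better = "second" if z_second > z_first else "first"
print(z_first, z_second, better)  # 0.875 1.25 second
```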
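b. PERCENTILES. The percentile of a data value gives the percentage of values in the
data set that fall below it.
Example: On a reading examination given to 900 students, Elaine’s score of 602 was
higher than the scores of 576 of the students who took the examination. What is the
percentile of Elaine’s score?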
$$\text{Percentile of } x = \frac{\text{number of data values less than } x}{\text{total number of data values}} \times 100$$
$$\text{Percentile} = \frac{576}{900} \times 100 = 64$$
Elaine’s score of 602 places her at the 64th percentile.
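The percentile formula can be sketched two ways: from a count that is already known (as in Elaine's case) or by counting directly in a list of scores. Both helper names are illustrative.

```python
def percentile_rank(num_below, total):
    """Percentile = (number of data values less than x / total number of values) * 100."""
    return num_below / total * 100

def percentile_of(value, data):
    """Same formula, counting the values below `value` directly."""
    below = sum(1 for d in data if d < value)
    return percentile_rank(below, len(data))

print(round(percentile_rank(576, 900)))  # 64
```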
c. QUARTILES. The quartiles of a set of data are the three numbers Q1, Q2 and
Q3 that partition the ranked data into four equal groups. Q2 is the median of the data, Q1
is the median of the data values less than Q2, and Q3 is the median of the data values greater
than Q2. The position of each quartile in the ranked data is
$$Q_i = \frac{i(n + 1)}{4}$$
Where:
Q_i = position of the ith quartile in the ranked data    n = number of scores    i = quartile number (1, 2 or 3)
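The position formula i(n + 1)/4 gives a place in the ranked list, not a value. The sketch below looks up that place; when the position is fractional, it interpolates linearly between the two neighbouring scores. That interpolation rule, and clamping positions to the first and last score, are assumptions of this sketch, not rules stated in the text.

```python
def quartile(values, i):
    """Value of Q_i located at position i(n + 1)/4 in the ranked data (1-based)."""
    ranked = sorted(values)
    n = len(ranked)
    pos = i * (n + 1) / 4          # 1-based position in the ranked list
    pos = min(max(pos, 1), n)      # assumption: clamp to the first/last score
    k = int(pos)                   # whole part of the position
    frac = pos - k                 # fractional part, used for interpolation
    if k == n:
        return ranked[-1]
    return ranked[k - 1] + frac * (ranked[k] - ranked[k - 1])

scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
print([quartile(scores, i) for i in (1, 2, 3)])  # [45.0, 56.0, 87.0]
```

With n = 11 the positions are 3, 6 and 9, so no interpolation is needed. Python's `statistics.quantiles(data, n=4)` uses the same (n + 1)-based positioning in its default "exclusive" method.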
Watch:
Percentiles, Quantiles and Quartiles in Statistics
https://www.youtube.com/watch?v=Ky7QeVgv-BA
1. A data set has a mean of 75 and a standard deviation of 11.5. Find the z-score for
each of the following:
a. x = 85
b. x = 60
c. x = 90
2. On a placement examination, August scored lower than 1210 of 12,860 students
who took the exam. Find the percentile, rounded to the nearest percent, for
August’s score.
LESSON 4: STATISTICS/DATA MANAGEMENT
UNIT 4– NORMAL DISTRIBUTION
Introduction:
Normal distribution is also known as a bell curve or a Gaussian distribution curve,
named for the German mathematician Carl Friedrich Gauss (1777–1855), who derived
its equation. No variable fits a normal distribution perfectly, since a normal distribution is
a theoretical distribution. However, a normal distribution can be used to describe many
variables, because for those variables the deviations from a normal distribution are very small.
Learning Objectives:
After successful completion of this lesson, you should be able to:
1. use a variety of statistical tools to process and manage numerical data
2. advocate the use of statistical data in making important decisions
Course Material:
Properties of the normal curve:
• It is bell-shaped.
• The tails are asymptotic to the baseline.
• The mean, median and mode coincide (they have the same value).
• It is symmetrical about the mean.
• The total area under the curve is 1.00 (100%).
Empirical Rule
In a normal distribution, about 68% of the data lie within one standard deviation
of the mean, about 95% lie within two standard deviations of the mean, and about
99.7% lie within three standard deviations of the mean.
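The Empirical Rule percentages are rounded values of an exact quantity: the share of a normal distribution within k standard deviations of the mean equals erf(k/√2). A minimal sketch using only `math.erf`:

```python
import math

def within_k_sd(k):
    """Exact share of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(within_k_sd(k) * 100, 1))
# 1 68.3
# 2 95.4
# 3 99.7
```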
Example. A survey of 1000 gas stations found that the price charged for a gallon of
regular gas could be closely approximated by a normal distribution with a mean of
31.00 PHP and a standard deviation of 1.80 PHP. How many of the stations charge
between 27.40 PHP and 34.60 PHP for a gallon of regular gas?
Solution. The 27.40 PHP price is 2 standard deviations below the mean (31.00 - 2(1.80) = 27.40).
The 34.60 PHP price is 2 standard deviations above the mean (31.00 + 2(1.80) = 34.60). In a
normal distribution, about 95% of all data lie within 2 standard deviations of the mean. Therefore,
approximately (0.95)(1000) = 950 of the stations charge between 27.40 PHP and
34.60 PHP for a gallon of regular gas.
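Python's `statistics.NormalDist` can compute the exact normal-curve share and compare it with the rounded 95% figure. The sketch assumes a mean of 31.00 PHP, the mean at which 27.40 PHP and 34.60 PHP lie exactly two standard deviations (2 × 1.80 PHP) from it.

```python
from statistics import NormalDist

# Assumed mean 31.00 PHP, standard deviation 1.80 PHP
prices = NormalDist(mu=31.00, sigma=1.80)
stations = 1000

share = prices.cdf(34.60) - prices.cdf(27.40)  # area between the two prices
print(round(share, 4), round(share * stations))  # 0.9545 954
print(round(0.95 * stations))                    # 950, the Empirical Rule estimate
```

The small gap (954 versus 950) comes from the Empirical Rule rounding 95.45% down to 95%.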
Example: Given the mean (μ) = 50 and the standard deviation (σ) = 4 of a population
of reading scores, find the z-value that corresponds to a score of 58.
Solution.
1. Get the z-value and express it to 2 decimal places.
$$Z = \frac{58 - 50}{4} = \frac{8}{4} = 2.00$$
2. Find the area using the z-table. The area between the mean (z = 0) and z = 2.00 is 0.4772.
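A z-table of this kind lists the area between the mean and z, which equals the cumulative area up to z minus 0.5. A minimal sketch with the standard normal `NormalDist()`:

```python
from statistics import NormalDist

z = (58 - 50) / 4                   # z-value for a score of 58
area = NormalDist().cdf(z) - 0.5    # area between the mean (z = 0) and z
print(round(z, 2), round(area, 4))  # 2.0 0.4772
```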
Determining Probabilities
Probabilities associated with the standard normal random variable can be
shown as areas under the standard normal curve.
Probability Notations
• P(a < z < b) denotes the probability that the z-score is between a and b.
• P(z > a) denotes the probability that the z-score is greater than a.
• P(z < a) denotes the probability that the z-score is less than a.
where a and b are z-score values.
Example: The entrance exam scores of incoming freshmen in a state college are
normally distributed with a mean of 78 and a standard deviation of 10. What is the
probability that a randomly selected student has a score
a. below 77?
1. Express the z-value to 2 decimal places.
$$Z = \frac{X - \bar{X}}{s} = \frac{77 - 78}{10} = -0.10$$
2. Consult the z-table and find the area that corresponds to z. The area between z = 0
and z = 0.10 is 0.0398, so the area to the left of z = -0.10 is
Area = 0.5 - 0.0398 = 0.4602
3. Examine the graph and use probability notation to form an equation
showing the appropriate operation to get the required area.
P(z < -0.10) = 0.4602, or 46.02%
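`NormalDist` can also answer the question directly from the raw score, without a table lookup. This sketch computes z straight from the stated values (X = 77, mean 78, standard deviation 10).

```python
from statistics import NormalDist

exam = NormalDist(mu=78, sigma=10)
z = (77 - 78) / 10               # z-value of a score of 77
p = exam.cdf(77)                 # P(X < 77), the same as P(z < z)
print(round(z, 2), round(p, 4))  # -0.1 0.4602
```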
Watch:
Standard Normal Distribution Tables, Z Scores, Probability & Empirical Rule
https://www.youtube.com/watch?v=CjF_yQ2N638
Activity/ Assessment no. 13:
Solve the following.
1. Determine each of the following areas and show these graphically. Use probability
notation in your final answer.
a. above z = 1.46
b. below z = - 0.50
c. between z = 2.20 and z = 1.00
2. The IQ scores of children in a special education class are normally distributed with
μ = 95 and σ = 10.
a. What is the probability that one of the children has an IQ score below 100?
b. What is the probability that a child has an IQ score above 120?
c. What is the chance that a child has an IQ score of 90?
LESSON 4: STATISTICS/DATA MANAGEMENT
UNIT 5– LINEAR REGRESSION AND CORRELATION
Introduction:
Correlation is a statistical method used to determine whether a linear relationship
between variables exists. Regression is a statistical method used to describe the nature
of the relationship between variables, that is, positive or negative, linear or nonlinear.
Learning Objectives:
After successful completion of this lesson, you should be able to:
1. use the methods of linear regression and correlation to predict the value of a
variable given certain conditions
2. advocate the use of statistical data in making important decisions
Course Material:
Pearson’s Correlation Coefficient
It measures the degree of association, or closeness of relationship, between
two variables. It is a measure of linear association.
The correlation coefficient is always between -1 and +1. If you ever get an answer
outside this range, you have made an error in your calculations.
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}}$$
Here x̄ and ȳ represent the respective means of the x and y values in the sample
data.
Simplified Formula
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$$
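The two formulas for r are algebraically equivalent. The sketch below implements both and checks that they agree. Because the example's raw data are not reproduced in these notes, the x and y lists are hypothetical values used only for illustration. The last line reuses the summary values from the worked example.

```python
import math

def pearson_r(x, y):
    """Definitional formula: Σ(x - x̄)(y - ȳ) / (√Σ(x - x̄)² · √Σ(y - ȳ)²)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / (math.sqrt(sxx) * math.sqrt(syy))

def pearson_r_simplified(x, y):
    """Computational formula using n, Σx, Σy, Σxy, Σx² and Σy²."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2 = sum(a * a for a in x)
    sy2 = sum(b * b for b in y)
    num = n * sxy - sx * sy
    den = math.sqrt(n * sx2 - sx ** 2) * math.sqrt(n * sy2 - sy ** 2)
    return num / den

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 3), round(pearson_r_simplified(x, y), 3))  # 0.775 0.775

# Summary values from the crop-grading example
print(round(-248.05 / (math.sqrt(26.02) * math.sqrt(4721.74)), 3))  # -0.708
```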
Example: A group of graduating Agricultural Engineering students designed,
constructed and tested a manually operated crop-grading machine as their thesis.
Using potato as a sample crop in the testing, the following data on capacity (kg/min)
and efficiency (%) of the machine were gathered for ten trials and are presented below.
Determine the correlation coefficient for these data.
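Solution:
Table columns: Capacity (x), Efficiency (y), xy, x², y²
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$$
$$r = \frac{-248.05}{\sqrt{26.02}\,\sqrt{4721.74}}$$
$$r = \frac{-248.05}{350.51}$$
$$r = -0.708$$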
LINEAR REGRESSION
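The method of least squares finds the line for which the sum of the squares of the vertical
distances from the observed points to the line is as small as possible. The slope is
$$b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$$
Then
$$a = \bar{y} - b\bar{x}$$
Where:
ȳ = mean of the values of y
x̄ = mean of the values of x
a = y-intercept
b = slope
The regression line is y = a + bx.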
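The least-squares formulas for b and a can be sketched in a few lines. The test points are hypothetical and lie exactly on y = 1 + 2x, so the fit should recover that line.

```python
def least_squares(x, y):
    """Return (a, b) for y = a + bx, with b = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²) and a = ȳ - b·x̄."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(p * q for p, q in zip(x, y))
    sx2 = sum(p * p for p in x)
    b = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    a = sy / n - b * sx / n
    return a, b

# Hypothetical points lying exactly on y = 1 + 2x
a, b = least_squares([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 1.0 2.0
```

Python 3.10 and later also provide `statistics.linear_regression`, which returns the slope and intercept of the same least-squares line.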
Prediction
Having determined the equation of the line, it is now possible to predict the value
of y given x.
Example: Examine the sample data pairs on speed and capacity shown below and determine
the best linear relationship between speed and capacity.
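Solution/Computation:
Table columns: x (speed, rpm), y (capacity, kg/min), xy, x²
$$b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$$
$$b = \frac{10(884.642) - (225.54)(37.68)}{10(5678.257) - (225.54)^2}$$
$$b = \frac{348.07}{5914.28} = 0.059$$
With a mean capacity of 3.77 kg/min and a mean speed of 22.55 rpm:
$$a = 3.77 - 0.059(22.55)$$
$$a = 2.44$$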
Therefore, the equation of the line is:
y = 2.44 + 0.059 x
Prediction
Determine the capacity when speed is 25 rpm. Using the equation,
y = 2.44 + 0.059 x
y = 2.44 + 0.059 (25)
y= 3.92 kg/min.
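The whole computation can be reproduced from the summary sums given in the solution (n = 10, Σx = 225.54, Σy = 37.68, Σxy = 884.642, Σx² = 5678.257). The `predict` helper is an illustrative name.

```python
# Summary values from the speed (x, rpm) / capacity (y, kg/min) example
n = 10
sum_x, sum_y = 225.54, 37.68
sum_xy, sum_x2 = 884.642, 5678.257

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
a = sum_y / n - b * sum_x / n                                  # intercept: ȳ - b·x̄

def predict(speed_rpm):
    """Estimated capacity (kg/min) from the fitted line y = a + bx."""
    return a + b * speed_rpm

print(round(b, 3), round(a, 2), round(predict(25), 2))  # 0.059 2.44 3.91
```

Carrying full precision gives 3.91 kg/min. The 3.92 in the text comes from plugging in the rounded coefficients (2.44 + 0.059 × 25 = 3.915).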
Watch:
The (Pearson) Correlation Coefficient Explained in One Minute: From
Definition to Formula
https://www.youtube.com/watch?v=WpZi02ulCvQ
How to Perform Simple Linear Regression by Hand
https://www.youtube.com/watch?v=GhrxgbQnEEU
Calculate the regression equation. Estimate the Math grade when the HS
average grade is 90.