Statistics Slide Notes - Lecture 3-8
IN EDUCATIONAL RESEARCH
1. It permits the most exact kind of description, i.e. it is our
descriptive language, e.g. 10% of the students passed the exam.
2. Statistics forces us to be definite and exact in our thinking, and
it allows us to summarize our results in a short, meaningful and
convenient form. Hence, it saves us from giving long narrative
descriptions of phenomena.
3. It enables us to analyse relationships between variables, e.g. the
average pass rate for the A class is 60% while that of the B class is
40%.
4. It enables us to predict how much of a variable will occur
under certain conditions.
MEASURES OF CENTRAL TENDENCY
They indicate how far the terms in a distribution cluster towards
the middle, i.e. the centre.
A distribution is a set of data (numbers).
Frequency is the number of times a value occurs.
There are three measures of central tendency, namely the
mode, median and mean.
1. MODE
- It is the piece of data with the highest frequency, i.e. it
appears more often than any other value.
A distribution can have no mode, one mode or more than one mode.
a) If a distribution has one mode, we say it is unimodal.
b) If it has two modes, we say it is bimodal.
c) If it has three modes, it is called a trimodal distribution.
d) A distribution which has many modes is called multimodal.
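The classification above can be sketched in Python. This is only an illustration: the function name describe_modes and the three data sets are made up for the example, and treating "every value occurs equally often" as "no mode" is an assumption about the convention used in these notes.

```python
from statistics import multimode


def describe_modes(data):
    """Return the modes of `data` and the name for that number of modes."""
    modes = sorted(multimode(data))  # every value that shares the highest frequency
    distinct = set(data)
    # Assumed convention: if all values occur equally often, there is no mode.
    if len(distinct) > 1 and len(modes) == len(distinct):
        return [], "no mode"
    names = {1: "unimodal", 2: "bimodal", 3: "trimodal"}
    return modes, names.get(len(modes), "multimodal")


# Illustrative data sets (not taken from the slides).
print(describe_modes([2, 5, 5, 7, 9]))
print(describe_modes([80, 81, 81, 82, 82, 90]))
print(describe_modes([1, 2, 3, 4]))
```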
Advantages of Using the Mode in Educational Research
There is no need to arrange the numbers in order of size. It is not affected by extreme values
(outliers), unlike the range.
DISADVANTAGES
It does not use all the terms in the distribution; it focuses only on those which
appear most frequently. Hence, it is not normally used for further statistical
calculations.
SOLUTIONS
d) The modes are 81 and 82, hence the distribution is bimodal.
e) There is no mode.
f) The mode is 5, hence the distribution is unimodal.
2. MEDIAN
SOLUTION
(a) Rank the numbers, i.e. arrange them in ascending order:
1 3 4 5 6 7 7 8 9    n = 9
Median = ½(9 + 1)th term
= 5th term
Therefore Median = 6
(b) Ranking the numbers:
19 24 29 36 50 60 77 82 100 105    n = 10
Median = ½(10 + 1)th term = 5,5th term, i.e. the mean of the 5th and 6th terms
$\text{Median} = \frac{50 + 60}{2} = \frac{110}{2} = 55$
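The two median calculations can be checked with a short Python sketch. The helper name median_by_position is made up for this illustration; it ranks the numbers and then picks the middle term, or the mean of the two middle terms when n is even.

```python
def median_by_position(values):
    """Rank the values, then take the middle term (or the mean of the two middle terms)."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the (n + 1)/2-th term is the median.
        return ranked[mid]
    # Even n: average the two terms on either side of the middle.
    return (ranked[mid - 1] + ranked[mid]) / 2


median_a = median_by_position([1, 3, 4, 5, 6, 7, 7, 8, 9])
median_b = median_by_position([19, 24, 29, 36, 50, 60, 77, 82, 100, 105])
print(median_a, median_b)
```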
DISADVANTAGES
-Does not use all the terms in the distribution
-It is not always used for further statistical calculations
-It can be different from the terms in the distribution
MEAN
It is also called the arithmetic mean or average.
- It is obtained by adding all the terms in the
distribution and dividing the sum by the number of
terms in the distribution.
$\text{Mean} = \frac{\text{sum of terms}}{n}$
It is represented by x̅ (x bar).
∑ is the summation sign. It means that the terms are being added:
$\sum x = x_1 + x_2 + x_3 + \dots + x_n$
n = number of terms in the distribution
$\bar{x} = \frac{\sum x}{n}$
ADVANTAGES OF MEAN
- It uses all the terms in the distribution.
- It is used for further statistical calculations, e.g. in finding the variance
and standard deviation.
- There is no need to rank the numbers.
DISADVANTAGES
- It is affected by outliers.
- It can be different from all the terms in the distribution.
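Find the mean of each of the following:
a) 60, 74, 88, 36, 54, 81, 93, 96, 50, 68
b) 1000, 4000, 3600, 7200, 1112
SOLUTIONS
a) $\bar{x} = \frac{\sum x}{n} = \frac{60+74+88+36+54+81+93+96+50+68}{10} = \frac{700}{10} = 70$
b) $\bar{x} = \frac{\sum x}{n} = \frac{1000+4000+3600+7200+1112}{5} = \frac{16\,912}{5} = 3382{,}4$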
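The same two means can be reproduced in Python as a quick check. The function name mean is chosen for the illustration; it simply divides the sum of the terms by the number of terms.

```python
def mean(values):
    """Arithmetic mean: the sum of the terms divided by the number of terms."""
    return sum(values) / len(values)


a = [60, 74, 88, 36, 54, 81, 93, 96, 50, 68]
b = [1000, 4000, 3600, 7200, 1112]

mean_a = mean(a)  # sum is 700, n is 10
mean_b = mean(b)  # sum is 16912, n is 5
print(mean_a, mean_b)
```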
MEASURES OF DISPERSION / VARIABILITY / SCATTER / SPREAD
Disadvantages (of the range)
- It focuses only on the two extreme values and ignores all the other terms.
- It is very misleading if the distribution has outliers.
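The Interquartile Range (IQR)
Example 1
Find the IQR of the set 1, 2, 3, 4, 5, 6, 7.
n = 7
Position of Q1 = (n + 1)/4 = (7 + 1)/4 = 2, so Q1 is the 2nd term = 2
Position of Q3 = ¾(n + 1) = ¾(7 + 1) = 6, so Q3 is the 6th term = 6
IQR = Q3 - Q1
= 6 - 2 = 4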
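As a hedged check of the example above, Python's statistics.quantiles with method="exclusive" places the quartiles at positions based on (n + 1), which agrees with the rule used here for this data set. When a position is not a whole number it interpolates between neighbouring terms, which may or may not match the convention intended in the lecture.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7]

# n=4 splits the data into quarters; the three cut points are Q1, Q2 (median), Q3.
q1, q2, q3 = quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

print(q1, q3, iqr)
```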
IQR
Advantages
1. It can be used as a measure of variability if
the extreme values are not recorded correctly.
2. It is not affected by extreme values.
Disadvantages
1. It is not easy to calculate.
2. It is a positional measure, based only on the
twenty-fifth and seventy-fifth percentiles.
b. VARIANCE
2. Subtract the mean from each term in the distribution to get the
deviations, i.e.
$(x_1 - \bar{x}),\ (x_2 - \bar{x}),\ \dots,\ (x_n - \bar{x})$
Therefore the variance is
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
X       X̅      X - X̅        (X - X̅)²
x₁      x̅      x₁ - x̅       (x₁ - x̅)²
...     ...    ...           ...
xₙ      x̅      xₙ - x̅       (xₙ - x̅)²
∑X             0             ∑(X - X̅)²  (SUM OF SQUARED DEVIATIONS)
CALCULATION OF VARIANCE
X     X̅     X - X̅    (X - X̅)²
60    71    -11      121
83    71     12      144
71    71      0        0
63    71     -8       64
89    71     18      324
90    71     19      361
40    71    -31      961
72    71      1        1
                  ∑  1976
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{1976}{8 - 1} = \frac{1976}{7} = 282{,}2857143$
Standard deviation (SD)
$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{1976}{7}} = \sqrt{282{,}2857143} = 16{,}80136049$
= 16,80 (2 d.p.)
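The variance and standard deviation working can be retraced step by step in Python. This is a sketch of the table method used in these notes (deviations, squared deviations, then division by n - 1); the variable names are chosen for the illustration. The library functions statistics.variance and statistics.stdev use the same n - 1 divisor and are printed for comparison.

```python
from statistics import stdev, variance

scores = [60, 83, 71, 63, 89, 90, 40, 72]

n = len(scores)
x_bar = sum(scores) / n                          # mean of the distribution
deviations = [x - x_bar for x in scores]         # x - x̄ for each term
sum_sq_dev = sum(d ** 2 for d in deviations)     # sum of squared deviations
s_squared = sum_sq_dev / (n - 1)                 # sample variance
sd = s_squared ** 0.5                            # standard deviation

print(x_bar, sum(deviations), sum_sq_dev)
print(round(s_squared, 4), round(sd, 2))
print(round(variance(scores), 4), round(stdev(scores), 2))
```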
INTERPRETATION OF VARIANCE AND STANDARD DEVIATION
$Z = \frac{x - \bar{x}}{s}$
where x is the score, x̅ is the mean and s is the
standard deviation.
Subject    Pupil A    Pupil B    Mean (x̅)    SD (s)
Maths      77         74         76           2,15
English    79         76         78           3,24
Z-SCORE
a) Considering Pupil A
Maths Z-score:
$Z = \frac{x - \bar{x}}{s} = \frac{77 - 76}{2{,}15} = \frac{1}{2{,}15} = 0{,}465116279 \approx 0{,}465$
English Z-score:
$Z = \frac{x - \bar{x}}{s} = \frac{79 - 78}{3{,}24} = \frac{1}{3{,}24} = 0{,}308641975 \approx 0{,}309$
Pupil A performed better in Maths, since the Z-score in Maths
is higher than in English.
b) Considering Pupil B
Maths Z-score:
$Z = \frac{x - \bar{x}}{s} = \frac{74 - 76}{2{,}15} = \frac{-2}{2{,}15} = -0{,}930232558$
Therefore, to two decimal places, Z = -0,93
English Z-score:
$Z = \frac{x - \bar{x}}{s} = \frac{76 - 78}{3{,}24} = \frac{-2}{3{,}24} = -0{,}61728$
Therefore, to two decimal places, Z = -0,62
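The four Z-scores can be recomputed with a few lines of Python. The function name z_score and the dictionary layout are made up for this sketch; the marks, means and standard deviations are the ones used in the working above.

```python
def z_score(x, mean, sd):
    """Number of standard deviations a score x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# (mean, standard deviation) for each subject.
subjects = {"Maths": (76, 2.15), "English": (78, 3.24)}
# Raw marks for each pupil.
marks = {
    "A": {"Maths": 77, "English": 79},
    "B": {"Maths": 74, "English": 76},
}

z = {
    pupil: {subj: z_score(mark, *subjects[subj]) for subj, mark in results.items()}
    for pupil, results in marks.items()
}

for pupil, results in z.items():
    for subj, value in results.items():
        print(pupil, subj, round(value, 3))
```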
MEASURES OF ASSOCIATION
[Figure: scatter graph showing no correlation (y against x)]
[Figure: scatter graph showing negative correlation (y against x)]
[Figure: scatter graph showing perfect positive correlation (y against x)]
[Figure: scatter graph showing perfect negative correlation (y against x)]
CORRELATION CO-EFFICIENT
It is the number which shows the size and direction of the association
between variables.
r normally represents a correlation co-efficient.
The maximum value of r is +1.
The minimum value of r is -1.
This means r lies between -1 and +1.
When r is +1 there is perfect positive correlation.
When r is between 0,8 and 0,99 there is very strong positive correlation.
When r is between 0,6 and 0,79 there is strong positive correlation.
When r is between 0,4 and 0,59 there is moderate positive correlation.
When r is between 0,2 and 0,39 there is weak positive correlation.
When r is between 0,1 and 0,19 there is very weak positive correlation.
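The bands above can be turned into a small lookup, sketched here in Python. Several things are assumptions rather than part of the notes: the function name describe_r, applying the same bands to negative values of r, treating each band as starting at its lower limit (so 0,795 counts as strong), and the wording used for values below 0,1.

```python
def describe_r(r):
    """Describe the size and direction of a correlation co-efficient r."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size < 0.1:
        # Assumption: the notes give no name for values below 0,1.
        return "little or no correlation"
    direction = "positive" if r > 0 else "negative"
    if size == 1:
        strength = "perfect"
    elif size >= 0.8:
        strength = "very strong"
    elif size >= 0.6:
        strength = "strong"
    elif size >= 0.4:
        strength = "moderate"
    elif size >= 0.2:
        strength = "weak"
    else:
        strength = "very weak"
    return f"{strength} {direction} correlation"


print(describe_r(0.85))
print(describe_r(0.45))
print(describe_r(-1))
```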
PEARSON’S PRODUCT MOMENT CORRELATION
$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2]\,[n\sum y^2 - (\sum y)^2]}}$
The following table can be used to obtain the values
which are to be substituted in the formula:
x      y      x²      y²      xy
x₁     y₁     x₁²     y₁²     x₁y₁
x₂     y₂     x₂²     y₂²     x₂y₂
x₃     y₃     x₃²     y₃²     x₃y₃
x₄     y₄     x₄²     y₄²     x₄y₄
∑x     ∑y     ∑x²     ∑y²     ∑xy
WORKED EXAMPLE
Ten Form 4 pupils at a certain school wrote two tests, one in
History and the other in Mathematics. The results are as
follows:
Pupil      A    B    C    D    E    F    G    H    I    J
History    80   74   56   52   78   90   73   65   40   75
Maths      40   52   75   74   50   54   59   60   71   48
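Let x be the History mark and y the Maths mark.
x     y     x²      y²      xy
80    40    6400    1600    3200
74    52    5476    2704    3848
56    75    3136    5625    4200
52    74    2704    5476    3848
78    50    6084    2500    3900
90    54    8100    2916    4860
73    59    5329    3481    4307
65    60    4225    3600    3900
40    71    1600    5041    2840
75    48    5625    2304    3600
∑x = 683    ∑y = 583    ∑x² = 48 679    ∑y² = 35 247    ∑xy = 38 503
n = 10
$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2]\,[n\sum y^2 - (\sum y)^2]}}$
$= \frac{10(38\,503) - (683)(583)}{\sqrt{[10(48\,679) - 683^2]\,[10(35\,247) - 583^2]}}$
$= \frac{-13\,159}{\sqrt{20\,301 \times 12\,581}}$
$= \frac{-13\,159}{\sqrt{255\,406\,881}}$
$= \frac{-13\,159}{15\,981{,}45}$
Therefore r = -0,823 to 3 decimal places.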
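The worked example can be verified with a direct Python translation of the formula. The function name pearson_r is made up for this sketch; it builds the same five sums as the table and substitutes them into the formula.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r from the sums of x, y, x², y² and xy."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


history = [80, 74, 56, 52, 78, 90, 73, 65, 40, 75]
maths = [40, 52, 75, 74, 50, 54, 59, 60, 71, 48]

r = pearson_r(history, maths)
print(round(r, 3))
```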
SPEARMAN’S RANK ORDER
$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$
x     y     rx    ry    d = rx - ry    d²
50    52    2     2      0              0
60    60    3     4     -1              1
75    58    5     3      2              4
42    80    1     5     -4             16
92    47    6     1      5             25
61    95    4     6     -2              4
∑x = 380    ∑y = 392    ∑d² = 50
n = 6
$\rho = 1 - \frac{6 \times 50}{6(6^2 - 1)} = 1 - \frac{300}{6 \times 35} = 1 - \frac{300}{210} = 1 - 1{,}428571 = -0{,}428571 \approx -0{,}43$
AGE (X)     61   71   72   74   83   54   74   67   57   61
MASS (Y)    63   61   51   58   48   75   57   60   75   61
x     y     rx     ry     d = rx - ry    d²
61    63    3,5    8      -4,5           20,25
71    61    6      6,5    -0,5            0,25
72    51    7      2       5             25
74    58    8,5    4       4,5           20,25
83    48    10     1       9             81
54    75    1      9,5    -8,5           72,25
74    57    8,5    3       5,5           30,25
67    60    5      5       0              0
57    75    2      9,5    -7,5           56,25
61    61    3,5    6,5    -3              9
                                   ∑d² = 314,5
n = 10
When ranking, tied values share the average of the positions they
occupy: add the positions and divide by the number of tied values. For
example, 75 in the above table under (y) falls in positions
9 and 10, so its rank becomes (9 + 10) / 2 = 19 / 2 = 9,5.
$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 314{,}5}{10(10^2 - 1)} = 1 - \frac{1887}{10(99)} = 1 - \frac{1887}{990}$
$= 1 - 1{,}906060 = -0{,}906060 \approx -0{,}91$
There is a very strong negative correlation between age
and mass, that is, as age increases the mass decreases.
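The ranking rule for tied values and the rho calculation can be sketched in Python as follows. The function names average_ranks and spearman_rho are made up for this illustration; the code follows the formula used in these notes rather than a statistics library.

```python
def average_ranks(values):
    """Rank values in ascending order; tied values share the mean of their positions."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1   # first position the value occupies
        count = ordered.count(v)       # how many times the value occurs
        rank_of[v] = first + (count - 1) / 2
    return [rank_of[v] for v in values]


def spearman_rho(x, y):
    """rho = 1 - 6 * sum(d²) / (n(n² - 1)), with d the difference in ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


age = [61, 71, 72, 74, 83, 54, 74, 67, 57, 61]
mass = [63, 61, 51, 58, 48, 75, 57, 60, 75, 61]

rho = spearman_rho(age, mass)
print(average_ranks(mass))
print(round(rho, 2))
```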
SCALES OF MEASUREMENT
• Nominal Scale
• These are the most primitive scales, used primarily
for labelling or naming. Each group is
assigned a number or a name as a
distinguishing label or classification, for example
the numbers given to soccer players: it does
not mean that number 7 plays qualitatively
better than number 3 or vice versa. Another example is gender /
sex, where the labels male and female are used.
ORDINAL SCALES
According to Latif and Maunganidze (2004), an ordinal
scale is the simplest true scale, which orders or ranks
people, objects or events along a particular continuum.
There is qualitative classification, and one can actually
say that one class is better than another relative to a
particular variable.
The scale determines the relative position of an object or
an individual with respect to others, for example in
terms of academic performance, religiosity, maturity,
etc.
It satisfies the transitivity rule, which states that if A is
greater than B (A > B) and B is greater than C (B > C), then A
is greater than C (A > C).
Interval Scale
There is equality of units, i.e. the same numerical
distance is associated with the same empirical
distance on the same real continuum, e.g. the
difference between 35 °C and 25 °C is the same
as that between 80 °C and 70 °C. The intervals emanate
from an arbitrary origin, i.e. there is no true zero
point. Zero does not mean the total absence of what
is being measured, e.g. an IQ of 0 does not imply zero
intelligence.
Ratio scales
T-TEST
Pupil      A    B    C    D    E    F    G    H    I    J
Maths      60   58   75   36   50   61   85   70   77   63
English    72   65   70   40   50   64   90   72   70   60
Carry out a t-test at the 10% significance level to determine if
there is a difference between the academic performance of
4A pupils in Maths and English.
SOLUTION
H₀: There is no difference between the academic performance
of 4A pupils in Maths and English.
H₁: There is a difference between the academic performance of 4A pupils in Maths
and English.
A two-tailed t-test at the 10% significance level
n = 10
df = n - 1 = 10 - 1 = 9
Reject H₀ if |t calc| > 1,833
$t_{calc} = \frac{\sqrt{n - 1}\,\sum d}{\sqrt{n\sum d^2 - (\sum d)^2}}$
The values to be substituted in the formula can be
obtained using the following table.
x      y      d = x - y         d²
x₁     y₁     d₁ = x₁ - y₁     d₁²
x₂     y₂     d₂ = x₂ - y₂     d₂²
x₃     y₃     d₃ = x₃ - y₃     d₃²
xₙ     yₙ     dₙ = xₙ - yₙ     dₙ²
∑x     ∑y     ∑d                ∑d²
x     y     d = x - y    d²
60    72    -12          144
58    65     -7           49
75    70      5           25
36    40     -4           16
50    50      0            0
61    64     -3            9
85    90     -5           25
70    72     -2            4
77    70      7           49
63    60      3            9
            ∑d = -18     ∑d² = 330
$t_{calc} = \frac{\sqrt{n - 1}\,\sum d}{\sqrt{n\sum d^2 - (\sum d)^2}} = \frac{\sqrt{10 - 1} \times (-18)}{\sqrt{10 \times 330 - (-18)^2}} = \frac{\sqrt{9} \times (-18)}{\sqrt{3300 - 324}}$
$= \frac{-54}{\sqrt{2976}} = \frac{-54}{54{,}5527679} = -0{,}989868026 \approx -0{,}99$
Since |t calc| = 0,99 < 1,833, we accept H₀ and conclude that
there is no significant difference between the academic
performance of 4A pupils in Maths and English.
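The t-test working can be retraced in Python using the formula from these notes. The variable names are chosen for the illustration, and the critical value 1,833 is simply copied from the decision rule above rather than looked up by the code.

```python
from math import sqrt

maths = [60, 58, 75, 36, 50, 61, 85, 70, 77, 63]
english = [72, 65, 70, 40, 50, 64, 90, 72, 70, 60]

d = [x - y for x, y in zip(maths, english)]   # difference for each pupil
n = len(d)
sum_d = sum(d)
sum_d2 = sum(v * v for v in d)

t_calc = sqrt(n - 1) * sum_d / sqrt(n * sum_d2 - sum_d ** 2)

critical_value = 1.833   # two-tailed, 10% level, df = 9 (taken from the notes)
reject_h0 = abs(t_calc) > critical_value

print(sum_d, sum_d2, round(t_calc, 2), reject_h0)
```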
CHI-SQUARE
ASSUMPTIONS
1) Observations must be independent.
2) The categories must be mutually exclusive, i.e.
each observation must appear in one and only
one of the categories in the table.
3) The observations must be measured as
frequencies.
Chi-square (χ²) test
This is the version of hypothesis testing which focuses
on Observed (O) and Expected (E) frequencies.
The observed frequency (O) is also written f₀ and the expected frequency (E) is also written fₑ.
The critical value is obtained using the degrees of freedom
and the significance level.
The chi-square curve is not symmetrical. It starts
from 0.
CHI-SQUARE (test for independence)
This is a version of the chi-square (χ²) test in which we
explore the association between variables which are
represented on a contingency table. A contingency table
is a table which shows the association between 2
variables. It has rows and columns: one attribute is
expressed in rows and the other in columns, e.g. gender versus academic
performance, socio-economic status versus academic
performance, or highest professional qualifications versus
attitudes to learning.
Chi-square: a worked example
GENDER    POOR    GOOD    EXCELLENT    TOTAL
F         15      20      25           60
M         10      35      40           85
TOTAL     25      55      65           145 (grand total)
O      E      O - E        (O - E)²/E
O₁     E₁     O₁ - E₁     (O₁ - E₁)²/E₁
O₂     E₂     O₂ - E₂     (O₂ - E₂)²/E₂
O₃     E₃     O₃ - E₃     (O₃ - E₃)²/E₃
Oₙ     Eₙ     Oₙ - Eₙ     (Oₙ - Eₙ)²/Eₙ
∑O     ∑E                  ∑(O - E)²/E
Chi-square worked example
STEP 1
H₀: There is no association between the teachers’ highest
qualifications and their attitudes towards teaching in rural areas.
STEP 2
STEP 3
Look up the critical value for df = 4 at the 5% significance level in the table, which is
9,49, and draw a curve. Reject H₀ if χ² calc > 9,49.
STEP 4
Calculating the expected frequencies
$E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$
E₃₀ = (64 × 66) / 200 = 21,12
E₂₄ = (64 × 66) / 200 = 21,12
E₁₀ = (64 × 68) / 200 = 21,76
E₂₁ = (55 × 66) / 200 = 18,15
E₂₂ = (55 × 66) / 200 = 18,15
E₁₂ = (55 × 68) / 200 = 18,7
E₁₅ = (81 × 66) / 200 = 26,73
E₂₀ = (81 × 66) / 200 = 26,73
E₄₆ = (81 × 68) / 200 = 27,54
STEP 5
O     E        O - E      (O - E)²/E
30    21,12      8,88      3,734
24    21,12      2,88      0,393
10    21,76    -11,76      6,356
21    18,15      2,85      0,448
22    18,15      3,85      0,817
12    18,7      -6,7       2,401
15    26,73    -11,73      5,148
20    26,73     -6,73      1,694
46    27,54     18,46     12,374
              ∑(O - E)²/E = 33,365
∴ χ² calc = 33,365
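Steps 4 and 5 can be reproduced with a short Python sketch. The 3 × 3 arrangement of the observed frequencies below is an assumption: it is inferred from the order of the O column and from the row totals (64, 55, 81) and column totals (66, 66, 68) used in Step 4. Because the code does not round each term to 3 decimal places, its total comes out at about 33,36 rather than exactly 33,365.

```python
# Observed frequencies, laid out as rows x columns (arrangement inferred, see above).
observed = [
    [30, 24, 10],
    [21, 22, 12],
    [15, 20, 46],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        # Expected frequency = row total x column total / grand total.
        e = row_totals[i] * col_totals[j] / grand_total
        chi_square += (o - e) ** 2 / e

# Degrees of freedom = (rows - 1)(columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 9.49   # df = 4 at the 5% level (taken from Step 3)

print(row_totals, col_totals, grand_total)
print(df, round(chi_square, 2), chi_square > critical_value)
```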
STEP 6