Statistics Slide Notes - Lecture 3-8


IMPORTANCE OF STATISTICS IN EDUCATIONAL RESEARCH
1. It permits the most exact kind of description, i.e. it is our
descriptive language, e.g. 10% of the students passed the exam.
2. Statistics forces us to be definite and exact in our thinking and
allows us to summarize our results in a short, meaningful and
convenient form. Hence, it saves us from giving long narrative
descriptions of phenomena.
3. It enables us to analyse relationships between variables, e.g. the
average pass rate for the A class is 60% while that of the B class is
40%.
4. It enables us to predict how much of a variable will occur
under certain conditions.
MEASURES OF CENTRAL TENDENCY
They indicate where the terms in the distribution
cluster around the middle, i.e. the centre.
A distribution is a set of data / numbers.
Frequency is the number of times a value occurs in the distribution.
There are three measures of central tendency, namely
mode, median and mean.
1. MODE
- It is the piece of data with the highest frequency, i.e. it
appears more often than the other terms.
MODE continues
A distribution can have no mode, one mode
or more than one mode.
a) If a distribution has one mode, we say it
is unimodal.
b) If it has two modes, we say it is bimodal.
c) If it has three modes, it is called a
trimodal distribution.
d) There is also a multimodal distribution,
which has many modes.
Advantages of Using the Mode in
Educational Research
There is no need to arrange the numbers in order of size. Unlike the range, it is not
affected by extreme values (outliers).

DISADVANTAGES

It does not use all the terms in the distribution. It only focuses on those which
appear more frequently than others. Hence, it is not normally used in further statistical
calculations.

E.g. find the mode of each of the following distributions

a) 82, 69, 81, 82, 74, 81
b) 1000, 101, 500, 60
c) 0, 5, 0, 5, 0, 5, 4, 5

SOLUTIONS
a) The modes are 81 and 82, hence the distribution is bimodal
b) No mode
c) The mode is 5, hence the distribution is unimodal
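The mode examples can be checked with a short Python sketch. The function name `modes` is made up for this illustration, and it assumes the convention used in the solutions: when every value occurs equally often, the distribution is reported as having no mode.

```python
from collections import Counter


def modes(values):
    """Return the modes in ascending order (empty list when there is no mode)."""
    counts = Counter(values)          # frequency of each value
    highest = max(counts.values())    # the highest frequency
    # Assumed convention: if all values share one frequency, there is no mode.
    if len(set(counts.values())) == 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([82, 69, 81, 82, 74, 81]))    # two modes, so bimodal
print(modes([1000, 101, 500, 60]))        # every value occurs once, so no mode
print(modes([0, 5, 0, 5, 0, 5, 4, 5]))    # one mode, so unimodal
```

The length of the returned list tells you whether the distribution is unimodal, bimodal, trimodal or multimodal.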
2. MEDIAN

It is the term which occupies the middle position when
the terms / numbers are arranged in order of size.
The terms can be arranged in ascending or descending
order.
When the number of terms in the distribution is odd,
the median is found using the formula:
Median = ½ (n + 1)th term
where n = number of terms in the distribution, e.g. if n = 15
Median = ½ (15 + 1)th term = 8th term
MEDIAN continues
If the number of terms in the distribution is even, the median is half (½) of
the sum of the two middle terms, e.g. if n = 20
Median = (10th term + 11th term) / 2
E.g. find the median of each of the following distributions
a) 7, 1, 4, 9, 7, 8, 6, 5, 6, 3
b) 77, 29, 36, 24, 82, 100, 105, 19, 60, 50
c) 99, 81, 74, 65, 50, 28, 3

SOLUTION
(a) Rank the numbers, i.e. arrange them in ascending order
1 3 4 5 6 6 7 7 8 9    n = 10
Median = ½ (5th + 6th) term
= (6 + 6) / 2
Therefore Median = 6
MEDIAN continues
(b) Ranking the numbers
19 24 29 36 50 60 77 82 100 105    n = 10

Median = ½ (5th + 6th) term

= (50 + 60) / 2

= 110 / 2

Therefore the median = 55


MEDIAN continues
(c) Ranking or arranging in ascending order
3 28 50 65 74 81 99    n = 7
Median = ½ (7 + 1)th term = 8 / 2 = 4th term
Median = the fourth number, which is 65
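The two median rules (odd and even number of terms) can be sketched in Python as follows. The function name `median` is chosen for this illustration only.

```python
def median(values):
    """Middle term of the ranked data; mean of the two middle terms when n is even."""
    ordered = sorted(values)      # rank the numbers in ascending order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the ½(n + 1)th term, which is index n // 2 when counting from 0.
        return ordered[middle]
    # Even n: half the sum of the two middle terms.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([7, 1, 4, 9, 7, 8, 6, 5, 6, 3]))
print(median([77, 29, 36, 24, 82, 100, 105, 19, 60, 50]))
print(median([99, 81, 74, 65, 50, 28, 3]))
```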

DISADVANTAGES
- It does not use all the terms in the distribution
- It is not always used for further statistical calculations
- When the number of terms is even, it can be different from all the terms in the distribution
MEAN
It is also called the arithmetic mean or average.
- It is obtained by adding all the terms in the
distribution and dividing the sum by the number of
terms in the distribution.
Mean = sum of terms / n
It is represented by x̅ (x bar)
∑ is the summation sign. It means something is being
added:
x₁ + x₂ + ... + xₙ = ∑x
MEAN continues
x̅ = (x₁ + x₂ + x₃ + ... + xₙ) / n
n = number of terms in the distribution

x̅ = ∑x / n
MEAN continues

ADVANTAGES OF MEAN
- It uses all the terms in the distribution
- It is used for further statistical calculations eg in finding the variance
and standard deviation.
- There is no need to rank the numbers.
DISADVANTAGES
-It is affected by the outliers
-It can be different from all the terms in the distribution
Find the mean of each of the following
a) 60, 74, 88, 36, 54, 81, 93, 96, 50, 68
b) 1000, 4000, 3600, 7200, 1112
MEAN continues

a) x̅ = ∑x / n
= (60 + 74 + 88 + 36 + 54 + 81 + 93 + 96 + 50 + 68) / 10
= 700 / 10
= 70

b) x̅ = (1000 + 4000 + 3600 + 7200 + 1112) / 5
= 16912 / 5

x̅ = 3382,4
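As a quick check of x̅ = ∑x / n, here is a minimal Python sketch. The function name `mean` is illustrative, and Python writes decimals with a point, so 3382,4 appears as 3382.4.

```python
def mean(values):
    """Arithmetic mean: the sum of the terms divided by the number of terms."""
    return sum(values) / len(values)


marks_a = [60, 74, 88, 36, 54, 81, 93, 96, 50, 68]
marks_b = [1000, 4000, 3600, 7200, 1112]

print(sum(marks_a), mean(marks_a))   # sum of terms, then the mean
print(sum(marks_b), mean(marks_b))
```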
MEASURES OF DISPERSION / VARIABILITY / SCATTER / SPREAD

They indicate how much the terms in the distribution
are spread or scattered around the mean / average.
Dispersion can be measured relative to the mean
(starting from the mean).
Measures of dispersion include the range, variance and standard
deviation.
1. RANGE
It focuses on the difference between the greatest
and the smallest values in the distribution. There
are two types of range, namely the ordinary range
and the inclusive range.
a) Ordinary range
It is the difference between the greatest and the
smallest value in the distribution.
Ordinary range = x max - x min
= Greatest value - Smallest value
RANGE continues
b) Inclusive range
= (Greatest value - Smallest value) + 1
= (x max - x min) + 1
Find the range (ordinary and inclusive) of the
following distribution
a) 85, 36, 14, 91, 99, 64, 50, 28, 12, 9, 3
Ordinary range = x max - x min
= 99 - 3
= 96
RANGE continues
Inclusive range = (x max - x min) + 1
= (99 - 3) + 1
= 96 + 1
= 97
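Both kinds of range follow directly from the greatest and smallest values. A small Python sketch (function names are illustrative):

```python
def ordinary_range(values):
    """Greatest value minus smallest value."""
    return max(values) - min(values)


def inclusive_range(values):
    """Ordinary range plus 1, so that both end values are counted."""
    return ordinary_range(values) + 1


data = [85, 36, 14, 91, 99, 64, 50, 28, 12, 9, 3]
print(ordinary_range(data), inclusive_range(data))
```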
Advantage of range
Easy to calculate

Disadvantages
It only uses the two extreme values and ignores all the other terms
It is very misleading if the distribution has outliers
The Interquartile range

It covers the central portion of the distribution, away from the extremes.
It is the difference between the third quartile (75%) and the first quartile
(25%) of the observations.
The highest and lowest quarters (50% of the data) are discarded and the focus
is only on the middle 50%.
Interquartile range
The interquartile range is another range used as a measure of
spread.
The difference between the upper and lower quartiles (Q3 - Q1), which is
called the interquartile range, also indicates the dispersion of a data
set.
The interquartile range spans 50% of a data set and reduces the
influence of outliers because, in effect, the highest and lowest
quarters are removed.

Interquartile range = difference between upper quartile (Q3) and
lower quartile (Q1)
Interquartile range cont’d

Steps to calculate the IQR

- Arrange the data in ascending order
- Find the position of Q1 using the formula Q1 = (n + 1)/4 th term
- Find the position of Q3 using the formula Q3 = ¾(n + 1) th term
- Take the average of the two neighbouring terms when the position of
Q1 or Q3 falls between two numbers
- Then calculate the IQR using the formula IQR = Q3 - Q1

Example 1
Find the IQR of the set 1, 2, 3, 4, 5, 6, 7
IQR cont’d

Q1 = (n + 1)/4 = (7 + 1)/4 = 2nd term, so Q1 = 2

Q3 = ¾(n + 1) = ¾(7 + 1) = 6th term, so Q3 = 6

IQR = Q3 - Q1
= 6 - 2 = 4
IQR cont’d

Example 2: find the IQR for these marks

30, 25, 80, 41, 40, 56, 65, 77
Arrange: 25, 30, 40, 41, 56, 65, 77, 80
Q1 = (8 + 1)/4
= 2,25th term, therefore take the average of the 2nd and 3rd terms,
30 and 40, and get Q1 = 35
Q3 = ¾(8 + 1) = 6,75th term, therefore take the average
of the 6th and 7th terms, 65 and 77, and get Q3 = 71
So IQR = 71 - 35 = 36
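The quartile steps used in these notes can be sketched in Python as below. The helper names `term_at` and `iqr` are made up for this illustration, and the sketch assumes at least three data values. It follows the positional rule of the notes (average the two neighbouring terms when the position is fractional); statistical packages often use other quartile conventions and may give slightly different answers.

```python
def term_at(ordered, position):
    """Value at a 1-based position; a fractional position averages its two neighbours."""
    lower = int(position)                 # whole part of the position
    if position == lower:
        return ordered[lower - 1]         # the position falls exactly on a term
    return (ordered[lower - 1] + ordered[lower]) / 2


def iqr(values):
    """Interquartile range Q3 - Q1 using the (n + 1)/4 and 3(n + 1)/4 positions."""
    ordered = sorted(values)              # arrange the data in ascending order
    n = len(ordered)
    q1 = term_at(ordered, (n + 1) / 4)
    q3 = term_at(ordered, 3 * (n + 1) / 4)
    return q3 - q1


print(iqr([1, 2, 3, 4, 5, 6, 7]))
print(iqr([30, 25, 80, 41, 40, 56, 65, 77]))
```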
IQR cont’d

Advantages
1. It can be used as a measure of variability even if
the extreme values are not recorded correctly
2. It is not affected by extreme values
IQR cont’d

Disadvantages
1. Not easy to calculate
2. It is a positional measure, based only on the
twenty-fifth and seventy-fifth percentiles
b. VARIANCE

Population variance is denoted by σ²
Sample variance is denoted by s². Variance is the mean of
the squared deviations from the mean.

How to find the variance

1. Find the mean of the distribution

2. Subtract the mean from each term in the distribution to get the
deviations, i.e.
(x₁ - x̅), (x₂ - x̅), ..., (xₙ - x̅)

3. Square each deviation to get (x₁ - x̅)², (x₂ - x̅)², ..., (xₙ - x̅)²

VARIANCE continues

4. Add the squared deviations

(x₁ - x̅)² + (x₂ - x̅)² + ... + (xₙ - x̅)² = ∑(x - x̅)²

which means the sum of the squared deviations

5. Divide the sum by n - 1

Therefore variance = s² = ∑(x - x̅)² / (n - 1)

c. STANDARD DEVIATION
The square root of the variance
Sample SD is represented by s
Population SD is represented by σ

Standard Deviation (SD) = √[∑(x - x̅)² / (n - 1)]

The variance and standard deviation can be
calculated using the following table
STANDARD DEVIATION continues
Score (x)    Mean (x̅)    Deviation (x - x̅)    Squared deviation (x - x̅)²
x₁           x̅           x₁ - x̅               (x₁ - x̅)²
...          ...          ...                   ...
xₙ           x̅           xₙ - x̅               (xₙ - x̅)²
∑x                        0                     ∑(x - x̅)² (sum of squared deviations)
CALCULATION OF VARIANCE

x     x̅     x - x̅    (x - x̅)²
60    71    -11      121
83    71    12       144
71    71    0        0
63    71    -8       64
89    71    18       324
90    71    19       361
40    71    -31      961
72    71    1        1

∑(x - x̅)² = 1976
VARIANCE continues
Variance (s²) = ∑(x - x̅)² / (n - 1)

= 1976 / (8 - 1)

= 1976 / 7

= 282,2857143

= 282,29 (2 decimal places)

Calculation of standard deviation

SD = √[∑(x - x̅)² / (n - 1)]

= √(1976 / 7)

= √282,2857143

= 16,80136049

= 16,80 (2 d.p.)
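The five variance steps and the square root for the standard deviation can be sketched in Python as follows. The function name `sample_variance` is illustrative, and decimals use a point instead of a comma.

```python
import math


def sample_variance(values):
    """Sample variance s²: sum of squared deviations divided by n - 1."""
    n = len(values)
    mean = sum(values) / n                                   # step 1: the mean
    squared_deviations = [(x - mean) ** 2 for x in values]   # steps 2 and 3
    return sum(squared_deviations) / (n - 1)                 # steps 4 and 5


scores = [60, 83, 71, 63, 89, 90, 40, 72]
variance = sample_variance(scores)
sd = math.sqrt(variance)          # standard deviation = square root of the variance
print(round(variance, 2), round(sd, 2))
```

Python's standard library also offers `statistics.variance` and `statistics.stdev`, which use the same n - 1 divisor.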
INTERPRETATION OF VARIANCE AND STANDARD DEVIATION

A large value of the standard deviation / variance shows
that the values are widely scattered relative to the
mean, i.e. the greater the variance / standard
deviation, the more widely spaced the terms are above
and below the mean. The smaller the variance, the
more closely packed the values are around the mean.
d. MEASURES OF RELATIVE STANDING
They are used to indicate how an individual
compares with other individuals and to determine his or
her relative position. They are concerned with how
a particular score stands in relation to other scores.

A z-score is the number of standard deviations a
score is away from the mean. It helps us to
compare meaningfully scores obtained in tests
using different scales.
Z-SCORE continues

z = (x - x̅) / s
where x is the score, x̅ is the mean and s is the
standard deviation

The greater the z-score, the better the
performance relative to the group
Z-SCORE continues
Subject    Pupil A    Pupil B    x̅     s
Maths      77         74         76    2,15
English    79         76         78    3,24
Z-SCORE continues
a) Pupil A Maths z-score
z = (x - x̅) / s = (77 - 76) / 2,15

= 1 / 2,15

= 0,465116279

= 0,465
Z-SCORE continues

Pupil A English z-score

z = (79 - 78) / 3,24

= 1 / 3,24

= 0,309
Pupil A performed better in Maths since the z-score in Maths
is higher than in English.
Z-SCORE continues

b) Pupil B Maths z-score
z = (x - x̅) / s

= (74 - 76) / 2,15

= -2 / 2,15

= -0,930232558
Therefore to two decimal places = -0,93
Z-SCORE continues
Pupil B English z-score
z = (x - x̅) / s

= (76 - 78) / 3,24

= -2 / 3,24

= -0,61728

Therefore to two decimal places = -0,62

Pupil B performed better in English than in Maths since -0,62 is higher than -0,93
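The four z-scores can be reproduced with a small Python sketch. The function name `z_score` and the dictionary layout are illustrative; the means and standard deviations are the ones given in the table (2,15 and 3,24 are written with a decimal point in Python).

```python
def z_score(score, mean, sd):
    """Number of standard deviations a score lies above (+) or below (-) the mean."""
    return (score - mean) / sd


# (mean, standard deviation) for each subject, taken from the table in the notes
subjects = {"Maths": (76, 2.15), "English": (78, 3.24)}
pupil_a = {"Maths": 77, "English": 79}
pupil_b = {"Maths": 74, "English": 76}

for name, marks in (("A", pupil_a), ("B", pupil_b)):
    for subject, mark in marks.items():
        mean, sd = subjects[subject]
        print(name, subject, round(z_score(mark, mean, sd), 3))
```

For each pupil, the subject with the higher z-score is the one in which the pupil did better relative to the class.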
e. MEASURES OF ASSOCIATION

They focus on the relationship between variables.
Correlation is the degree of association between 2 or
more variables or factors.
There are two types of variables: the independent
variable (x) and the dependent variable (y).
The independent variable is the variable which is
manipulated by the researcher during an experiment.
The dependent variable is the factor which is
influenced by the manipulation of the independent
variable.
MEASURES OF ASSOCIATION continues

INDEPENDENT VARIABLE                   DEPENDENT VARIABLE
1. Amount of fertilizer applied        Yields obtained
2. Number of hours spent studying      Exam mark
3. Distance from the CBD               Rentals

Correlation can be positive or negative.

POSITIVE CORRELATION
The two variables increase or decrease together. An increase in one
variable is matched by a corresponding increase in the other.
NEGATIVE CORRELATION
An increase in one variable is matched by a corresponding decrease in
the other.
MEASURES OF ASSOCIATION continues

Scatter Diagram / Scatter Plot / Scattergram / Scatter Graph

It is a diagram showing the corresponding values
of the independent and dependent variables as
co-ordinates. It is the simplest way of determining
correlation between two variables on a graph. The
independent variable is on the horizontal axis and
the dependent variable on the vertical axis. An
accurate scale should be used on both axes.
The axes should be labelled as shown below
MEASURES OF ASSOCIATION continues
[Scatter graph showing no correlation: y on the vertical axis, x on the horizontal axis]
[Scatter graph showing negative correlation]
[Scatter graph showing perfect positive correlation]
[Scatter graph showing perfect negative correlation]
MEASURES OF ASSOCIATION continues
CORRELATION CO-EFFICIENT
It is the number which shows the size and direction of association
between variables.
r normally represents a correlation co-efficient
The maximum value of r is +1
The minimum value of r is -1
This means r lies between -1 and +1
+1               perfect positive correlation
0,8 to 0,99      very strong positive correlation
0,6 to 0,79      strong positive correlation
0,4 to 0,59      moderate positive correlation
0,2 to 0,39      weak positive correlation
0,01 to 0,19     very weak positive correlation
CORRELATION CO-EFFICIENT continues

-1               perfect negative correlation
-0,8 to -0,99    very strong negative correlation
-0,6 to -0,79    strong negative correlation
-0,4 to -0,59    moderate negative correlation
-0,2 to -0,39    weak negative correlation
-0,01 to -0,19   very weak negative correlation
The 2 popular correlation co-efficients are
Pearson’s product-moment correlation co-efficient (r) and
Spearman’s rank order correlation co-efficient (rho)
PEARSON’S PRODUCT-MOMENT
CORRELATION CO-EFFICIENT (r)
Its main strength is that it uses the actual values of the
variables.

It is calculated using the following formula

r = [n∑xy - ∑x∑y] / √{[n∑x² - (∑x)²] [n∑y² - (∑y)²]}
PEARSON’S PRODUCT-MOMENT
CORRELATION continues
The following table can be used to obtain the values
which are to be substituted in the formula

x     y     x²     y²     xy
x₁    y₁    x₁²    y₁²    x₁y₁
x₂    y₂    x₂²    y₂²    x₂y₂
x₃    y₃    x₃²    y₃²    x₃y₃
x₄    y₄    x₄²    y₄²    x₄y₄
∑x    ∑y    ∑x²    ∑y²    ∑xy
PEARSON’S PRODUCT-MOMENT
WORKED EXAMPLE
Ten Form 4 pupils at a certain school wrote two tests, one in
History and the other in Mathematics, and the results are as
follows

Pupil      A   B   C   D   E   F   G   H   I   J
History    80  74  56  52  78  90  73  65  40  75
Maths      40  52  75  74  50  54  59  60  71  48
PEARSON WORKED
EXAMPLE continues
x (History)   y (Maths)   x²     y²     xy
80            40          6400   1600   3200
74            52          5476   2704   3848
56            75          3136   5625   4200
52            74          2704   5476   3848
78            50          6084   2500   3900
90            54          8100   2916   4860
73            59          5329   3481   4307
65            60          4225   3600   3900
40            71          1600   5041   2840
75            48          5625   2304   3600

∑x = 683   ∑y = 583   ∑x² = 48 679   ∑y² = 35 247   ∑xy = 38 503


Pearson worked example continues

r = [n∑xy - ∑x∑y] / √{[n∑x² - (∑x)²] [n∑y² - (∑y)²]}

n = 10

= [10 x 38 503 - 683 x 583] / √{[10 x 48 679 - (683)²] [10 x 35 247 - (583)²]}
PEARSON WORKED
EXAMPLE continues
= (385 030 - 398 189) / √{[486 790 - 466 489] [352 470 - 339 889]}

= -13 159 / √(20 301 x 12 581)

= -13 159 / √255 406 881

= -13 159 / 15 981,45
PEARSON continues

Therefore r = -0,823 (to 3 decimal places)

There is a very strong negative correlation
between History marks and Mathematics marks
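The Pearson computation can be checked with a short Python sketch of the same raw-score formula. The function name `pearson_r` is illustrative; the data are the History and Maths marks from the worked example.

```python
import math


def pearson_r(xs, ys):
    """r = [n∑xy - ∑x∑y] / √{[n∑x² - (∑x)²][n∑y² - (∑y)²]}"""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


history = [80, 74, 56, 52, 78, 90, 73, 65, 40, 75]
maths = [40, 52, 75, 74, 50, 54, 59, 60, 71, 48]
r = pearson_r(history, maths)
print(round(r, 3))
```

The sign of r gives the direction of the association and its size gives the strength, read off the table of ranges above.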
SPEARMAN’S RANK ORDER CORRELATION CO-EFFICIENT (rho)

rho = 1 - 6∑d² / [n(n² - 1)]

This correlation co-efficient does not use the actual
scores of the variables. It uses the rank order of the
scores (variables). The values of x and y are ranked
separately, both in ascending or both in descending order. The
corresponding ranks are subtracted to give d, and the differences
are squared and finally added, leading to ∑d².
SPEARMAN’S RANK ORDER
CORRELATION CO-EFFICIENT continues

The following table can be used

x    y    Rank x (rx)    Rank y (ry)    d = rx - ry    d²

EXAMPLE 1: Maths mark (x) and Physics mark (y)

SPEARMAN’S RANK continues
Maths (x)      50  60  75  42  92  61
Physics (y)    52  58  80  47  95  60
SPEARMAN’S RANK ORDER continues

x     y     rx    ry    d = rx - ry    d²
50    52    2     2     0              0
60    58    3     3     0              0
75    80    5     5     0              0
42    47    1     1     0              0
92    95    6     6     0              0
61    60    4     4     0              0

∑x = 380   ∑y = 392   ∑d² = 0
SPEARMAN’S RANK ORDER continues

rho = 1 - (6 x 0) / [6 (6² - 1)]

= 1 - 0 / (6 x 35)

= 1 - 0 / 210

= 1 - 0

rho = 1. There is a perfect positive correlation between Maths
and Physics marks.
SPEARMAN’S RANK ORDER
EXAMPLE 2
AGE (x)     61  71  72  74  83  54  74  67  57  61
MASS (y)    63  61  51  58  48  75  57  60  75  61
SPEARMAN RANK continues
x     y     rx     ry     d = rx - ry    d²
61    63    3,5    8      -4,5           20,25
71    61    6      6,5    -0,5           0,25
72    51    7      2      5              25
74    58    8,5    4      4,5            20,25
83    48    10     1      9              81
54    75    1      9,5    -8,5           72,25
74    57    8,5    3      5,5            30,25
67    60    5      5      0              0
57    75    2      9,5    -7,5           56,25
61    61    3,5    6,5    -3             9

∑d² = 314,5
SPEARMAN continues
n = 10
When ranking, if a value occurs more than once, add the positions
the tied values occupy and divide by the number of tied values. For example,
75 in the above table under (y) occupies positions
9 and 10, so its rank becomes (9 + 10) / 2 = 19 / 2 = 9,5

rho = 1 - 6∑d² / [n(n² - 1)]

= 1 - (6 x 314,5) / [10 (10² - 1)]
SPEARMAN continues

= 1 - 1887 / [10 (99)]

= 1 - 1887 / 990
= 1 - 1,906060
= -0,906060
= -0,91
There is a very strong negative correlation between age
and mass, that is, as age increases the mass tends to decrease
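Both Spearman examples can be reproduced with the Python sketch below. The helper names `average_ranks` and `spearman_rho` are made up for this illustration. Tied values share the mean of the positions they occupy, as in the notes. Note that rho = 1 - 6∑d² / [n(n² - 1)] is exact only when there are no ties; with ties, as in Example 2, it is the approximation the notes use.

```python
def average_ranks(values):
    """Rank in ascending order; tied values share the mean of their positions."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1              # first position occupied by v
        ties = ordered.count(v)                   # how many times v occurs
        ranks.append(first + (ties - 1) / 2)      # mean of positions first..first+ties-1
    return ranks


def spearman_rho(xs, ys):
    """rho = 1 - 6∑d² / [n(n² - 1)], where d is the difference between ranks."""
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


maths = [50, 60, 75, 42, 92, 61]
physics = [52, 58, 80, 47, 95, 60]
age = [61, 71, 72, 74, 83, 54, 74, 67, 57, 61]
mass = [63, 61, 51, 58, 48, 75, 57, 60, 75, 61]

print(spearman_rho(maths, physics))
print(round(spearman_rho(age, mass), 2))
```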
PEARSON’S AND SPEARMAN’S
CORRELATION COMPARED

Pearson’s co-efficient is considered better than Spearman’s rank order
co-efficient because it uses the actual values of the variables,
whereas Spearman’s uses only their rank orders. However, the correlations
calculated for the same data are usually not very different and in
most cases they lead to the same conclusion.
SCALES OF MEASUREMENT
Measurement refers to the assignment of a numerical value
to an entity.

The four scales of measurement are nominal, ordinal, interval and ratio.

SCALES OF MEASUREMENT continues

Nominal Scale
• These are the most primitive scales, primarily
used for labelling or naming. Each group is
assigned a number or a name as a
distinguishing label or classification, for example
the numbers given to soccer players. It does
not mean that number 7 plays qualitatively
better than number 3 or vice versa. Another example is gender /
sex, where the labels male and female are used.
ORDINAL SCALES
According to Latif and Maunganidze (2004) an ordinal
scale is the simplest true scale, which orders or ranks
people, objects or events along a particular continuum.
There is qualitative classification and one can actually
say that one class is better than another relative to a
particular variable.
The scale determines the relative position of an object or
an individual with respect to others, for example in
terms of academic performance, religiosity, maturity
etc.
Ordinal scale continued
It satisfies the transitivity rule, which states that if A is
greater than (>) B and B is greater than (>) C, then A
is greater than (>) C
Interval Scale
There is equality of units, i.e. the same numerical
distance is associated with the same empirical
distance on the same real continuum, e.g. the
difference between 35 °C and 25 °C has the same
value as that between 80 °C and 70 °C. The intervals emanate
from an arbitrary origin, i.e. there is no true zero
point. Zero does not mean the total absence of what
is being measured, e.g. an IQ of 0 does not imply zero
intelligence
SCALES OF MEASUREMENT
continues

Ratio scales

It is the best scale of measurement because it has a true
zero point. According to Latif and Maunganidze (2004)
a true zero point is a point corresponding to the
absence of the thing being measured, for example time,
mass and volume, where zero seconds means no time
and zero kg means no mass. Apart from possessing
all the properties of the 3 preceding scales, ratio scales
have true ratios, for example we can safely say 50 kg is
¼ of 200 kg.
NORMAL DISTRIBUTION CURVE

It is one of the most important distributions in statistics
because it mirrors / reflects the distribution of many real-life
measurements such as mass, height, weight etc.
Characteristics of a normal distribution curve
1. It resembles a cross-section of a bell
2. It is symmetrical
3. It is asymptotic to the horizontal axis, i.e. it does not come
into contact with the horizontal axis
4. The total area under the curve is one (1)
5. It begins with a low frequency which rises towards the middle
and evenly subsides towards the end.
6. The mean, mode and median are equal and they coincide on
the line of symmetry.
HYPOTHESIS TESTING
The null hypothesis H₀ states that there is no difference
between the characteristics of 2 samples.
The alternative hypothesis H₁ claims that there is a
difference between the characteristics of 2 samples.
The alternative hypothesis H₁ can be stated in 3 ways,
which are
(a) definite increase (one-tailed test)
(b) definite decrease (one-tailed test)
(c) any change (two-tailed test)
The critical value is the value taken from the statistical table.
STATEMENT OF HYPOTHESIS
continues

Examples of Null Hypothesis (H₀)
1) H₀: There is no difference between the academic
performance of Grade 7A pupils in Maths and English
2) H₀: There is no difference between the academic performance
of boys and girls in Maths
Examples of Alternative Hypothesis (H₁)
1) H₁: There is a difference between the academic
performance of Grade 7A pupils in Maths and English
2) H₁: There is a difference between the academic
performance of boys and girls in Maths
HYPOTHESIS TESTING PROCEDURE

1. State the null and alternative hypotheses, i.e. H₀ and H₁,
for example that there is no difference, association or
relationship. A hypothesis is a prediction about a population.
2. Decide on the test statistic to be used.
3. State the rejection criterion (decision rule). It is a
statement which specifies when the null hypothesis
should be rejected.
4. Calculate the test statistic
5. Make a statistical decision
6. Make a conclusion
ASSUMPTIONS OF THE T-TEST

1) The sample must be taken from a
population which follows the normal
distribution
2) The variance of the population (σ²) must be
unknown
3) The sample size must be small, i.e. n < 30
T-TEST
We need the degrees of freedom and the level of
significance.
Degrees of freedom (df) = n - 1, where n is the
sample size, e.g. if n = 15, df = 15 - 1 = 14
e.g. a 2-tailed t-test for paired samples at the 5%
significance level when n = 20
df = 20 - 1 = 19
critical value = 2,093, i.e. the value from the
table
Reject H₀ if t calc < -2,093 or t calc > 2,093
(rejection criterion)
T-test continues
A 2-tailed t-test for paired samples at the 1%
significance level when n = 25: df = 25 - 1 = 24
T-test worked example
A Form 4 teacher wanted to find out if there is a difference
between the academic performance of pupils in Maths and
English. The 4A pupils were given Maths and English tests
and their scores were as follows

Pupil      A   B   C   D   E   F   G   H   I   J
Maths      60  58  75  36  50  61  85  70  77  63
English    72  65  70  40  50  64  90  72  70  60
T-test continues
Carry out a t-test at the 10% significance level to determine if
there is a difference between the academic performance of
4A pupils in Maths and English.

SOLUTION
H₀: there is no difference between the academic performance
of 4A pupils in Maths and English
H₁: there is a difference between the academic performance of
4A pupils in Maths and English
T-test solution continues
A 2-tailed t-test at the 10% significance level
n = 10
df = 10 - 1 = 9
Reject H₀ if │t calc│ > 1,833
T-test working continues

t calc = [√(n - 1) x ∑d] / √[n∑d² - (∑d)²]
The values to be substituted in the formula can be
obtained using the following table.
T-test continues
x     y     d = x - y       d²
x₁    y₁    d₁ = x₁ - y₁    d₁²
x₂    y₂    d₂ = x₂ - y₂    d₂²
x₃    y₃    d₃ = x₃ - y₃    d₃²
xₙ    yₙ    dₙ = xₙ - yₙ    dₙ²
∑x    ∑y    ∑d              ∑d²
T-test continues
x     y     d = x - y    d²
60    72    -12          144
58    65    -7           49
75    70    5            25
36    40    -4           16
50    50    0            0
61    64    -3           9
85    90    -5           25
70    72    -2           4
77    70    7            49
63    60    3            9

∑d² = 330
To find ∑d, add the positive differences first and then
subtract the negatives:
∑d = 15 - 33 = -18
T-test worked example continues

t calc = [√(n - 1) x ∑d] / √[n∑d² - (∑d)²]

= [√(10 - 1) x (-18)] / √[10 x 330 - (-18)²]

= [√9 x (-18)] / √(3300 - 324)
T-test worked example continues
= -54 / √2976

= -54 / 54,5527679

= -0,989868026

= -0,99
Since │t calc│ = 0,99 < 1,833, do not reject (accept) H₀ and conclude that
there is no significant difference between the academic
performance in Maths and English
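The paired t-test working can be sketched in Python with the same formula, t calc = √(n - 1) x ∑d / √[n∑d² - (∑d)²]. The function name `paired_t` is illustrative, and the critical value 1.833 (1,833 in the notes) is simply copied from the t-table rather than computed.

```python
import math


def paired_t(xs, ys):
    """Paired-samples t statistic computed from the differences d = x - y."""
    n = len(xs)
    d = [x - y for x, y in zip(xs, ys)]
    sum_d = sum(d)
    sum_d2 = sum(v * v for v in d)
    return math.sqrt(n - 1) * sum_d / math.sqrt(n * sum_d2 - sum_d ** 2)


maths = [60, 58, 75, 36, 50, 61, 85, 70, 77, 63]
english = [72, 65, 70, 40, 50, 64, 90, 72, 70, 60]

t_calc = paired_t(maths, english)
critical_value = 1.833                      # t-table: df = 9, two-tailed, 10% level
reject_h0 = abs(t_calc) > critical_value    # rejection criterion
print(round(t_calc, 2), reject_h0)
```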
CHI-SQUARE
ASSUMPTIONS
1) Observations must be independent
2) The categories must be mutually exclusive, i.e.
each observation must appear in one and only
one of the categories in the table
3) The observations must be measured as
frequencies.
Chi-square test continues
Chi-square (χ²) test
This is the version of hypothesis testing which focuses
on Observed (O) and Expected (E) frequencies.
The observed frequency (O) is also written fₒ and the expected frequency (E) is also written fₑ.
The critical value is obtained using the degrees of freedom
and the significance level.
The chi-square curve is not symmetrical. It starts
from 0
CHI-SQUARE continues (test for
independence)
This is a version of the chi-square (χ²) test in which we
explore the association between variables which are
represented in a contingency table. A contingency table
is a table which shows the association between 2
variables. It has rows and columns. One attribute is
expressed in rows and the other in columns, e.g. gender versus academic
performance, or socio-economic status vs academic
performance, or highest professional qualifications vs
attitudes to learning
Chi-square: a worked example
GENDER    POOR    GOOD    EXCELLENT    TOTAL
F         15      20      25           60
M         10      35      40           85
TOTAL     25      55      65           145 (grand total)
CHI-SQUARE WORKED EXAMPLE
CONTINUES

Rows x columns = 2 x 3 table

The total row and the total column are not counted when finding
the size of the table used to calculate the degrees of freedom.
60 and 85 (row totals)
25, 55 and 65 (column totals)
The numbers appearing in the original contingency table
are the observed frequencies.
Degrees of freedom are obtained using the number of
rows and columns, i.e. df = (r - 1)(c - 1), where r = number of rows
and c = number of columns
Chi-square continues

e.g. for a 4 x 5 table df = (4 - 1)(5 - 1)
= (3)(4)
= 12

In the above table it is (2 - 1)(3 - 1)
= 1 x 2
= 2
Expected frequencies are obtained using the following formula:
E = (row total x column total) / grand total
e.g. for the cell with observed frequency 15: (60 x 25) / 145 = 10,34
CHI-SQUARE continues
E40 = (85 x 65) / 145 = 38,10
The test statistic χ² = ∑ (O - E)² / E is obtained using the following table
O     E     O - E      (O - E)² / E
O₁    E₁    O₁ - E₁    (O₁ - E₁)² / E₁
O₂    E₂    O₂ - E₂    (O₂ - E₂)² / E₂
O₃    E₃    O₃ - E₃    (O₃ - E₃)² / E₃
Oₙ    Eₙ    Oₙ - Eₙ    (Oₙ - Eₙ)² / Eₙ
∑O    ∑E               ∑ (O - E)² / E
Chi-square worked example continues

An education researcher wanted to explore if there is
an association between teachers’ highest
qualifications and their attitudes towards teaching in
rural areas. After conducting a survey, the data
shown in the table below was obtained

Chi-square worked example continues
Highest professional qualifications

ATTITUDES       Diploma in education    Undergraduate    Postgraduate    Total
Favourable      30                      24               10              64
Neutral         21                      22               12              55
Unfavourable    15                      20               46              81
Total           66                      66               68              200
Worked example continues
Carry out a chi-square (χ²) test at the 5% significance level to
determine if there is an association between teachers’ highest
professional qualifications and their attitudes towards teaching
in rural areas.

STEP 1
H₀: There is no association between teachers’ highest
qualifications and their attitudes towards teaching in rural areas.

H₁: There is an association between teachers’ highest
qualifications and their attitudes towards teaching in rural areas.
Worked example continues

STEP 2

A chi-square (χ²) test at the 5% significance level
3 x 3 table
df = (3 - 1)(3 - 1)
= 2 x 2
df = 4
WORKED EXAMPLE
CONTINUES

STEP 3
Look up the critical value for df = 4 at 5% in the table, which is
9,49, and draw a curve. Reject H₀ if χ² calc > 9,49
Worked example continues
STEP 4
Calculating the expected frequencies
Expected frequency = (row total x column total) / grand total
E30 = (64 x 66) / 200 = 21,12
E24 = (64 x 66) / 200 = 21,12
E10 = (64 x 68) / 200 = 21,76
E21 = (55 x 66) / 200 = 18,15
E22 = (55 x 66) / 200 = 18,15
WORKED EXAMPLE
CONTINUES
STEP 4 CONTINUATION
E12 = (55 x 68) / 200 = 18,7

E15 = (81 x 66) / 200 = 26,73

E20 = (81 x 66) / 200 = 26,73

E46 = (81 x 68) / 200 = 27,54
WORKED EXAMPLE CONTINUES
STEP 5
O     E        O - E     (O - E)² / E
30    21,12    8,88      3,734
24    21,12    2,88      0,393
10    21,76    -11,76    6,356
21    18,15    2,85      0,448
22    18,15    3,85      0,817
12    18,7     -6,7      2,401
15    26,73    -11,73    5,148
20    26,73    -6,73     1,694
46    27,54    18,46     12,374

∑ (O - E)² / E = 33,365
Worked example continued

∴ χ² calc = 33,365

STEP 6

Since χ² calc > 9,49, reject H₀ and conclude that
there is an association between teachers’ highest
qualifications and their attitudes towards teaching in
rural areas.
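The chi-square test for independence can be sketched in Python as below. The function name `chi_square` is made up for this illustration, and the critical value 9.49 (9,49 in the notes) is copied from the chi-square table rather than computed. Because the code does not round the expected frequencies or the individual terms, the statistic comes out at about 33,36, which agrees with the hand-worked 33,365 up to rounding.

```python
def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a contingency table.

    `observed` is a list of rows of observed frequencies, without the totals.
    """
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected frequency = row total x column total / grand total
            e = row_totals[i] * column_totals[j] / grand_total
            statistic += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


# Rows: favourable, neutral, unfavourable; columns: diploma, undergraduate, postgraduate
observed = [
    [30, 24, 10],
    [21, 22, 12],
    [15, 20, 46],
]
chi2, df = chi_square(observed)
critical_value = 9.49                 # chi-square table: df = 4, 5% level
reject_h0 = chi2 > critical_value
print(round(chi2, 2), df, reject_h0)
```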
