Sampling and Estimation
LESSON SIX
Sampling and Estimation
- Sampling techniques
- Central limit theorem
- Sampling distribution of statistical parameters
- Test of hypothesis
Stratified sampling
In this case the population is divided into groups in such a way that units within each group are
as similar as possible in a process called stratification. The groups are called strata. Simple
random samples from each of the strata are collected and combined into a single sample. This
technique of collecting a sample from a population is called stratified sampling. Stratification
may be by age, occupation, income group, etc.
Systematic Sampling
Systematic sampling is a variant of simple random sampling in which the units are first arranged
in ascending or descending order. The sample is then drawn according to some predetermined rule.
Suppose a population consists of 1000 units; then every 10th, 20th or 50th item is selected. This
method is easy and economical, and it also saves a lot of time.
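The selection rule above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `systematic_sample` and the random starting point within the first interval are assumptions, not something the text prescribes.

```python
import random

def systematic_sample(units, k, seed=None):
    """Take every k-th unit after a random start within the first k units.

    The random start is an assumption of this sketch; the text only says
    that every 10th, 20th or 50th item is selected.
    """
    rng = random.Random(seed)
    start = rng.randrange(k)      # index 0 .. k-1
    return units[start::k]

population = list(range(1, 1001))            # 1000 units in ascending order
sample = systematic_sample(population, k=10, seed=1)
print(len(sample))                           # 100 units selected
```

With k = 10 the sample always contains 100 of the 1000 units, spaced exactly ten apart.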
Multistage sampling
This is similar to stratified sampling except that the division is done on a geographical (location)
basis, e.g. a country can be divided into provinces and a survey is then done in 4 towns in each
province. This helps to cut travelling costs for a surveyor.
Cluster Sampling
This is where a few geographical regions, e.g. a location, town or village, are selected at random
and, say, every single household or shop in that area is interviewed. This again cuts costs.
Judgment Sampling
Here the interviewer selects whom to interview, believing that their views are more relevant
since they are directly affected, e.g. to find out the effects of public transport one may choose to
interview only people who don’t own cars and travel frequently to work.
Types of distribution
Population distribution
It refers to the distribution of the individual values of population. Its mean is denoted by ‘µ’
Sample distribution
It is the distribution of the individual values of a single sample. Its mean is generally written as
$\bar{x}$. It is not usually the same as µ.
Standard error of the mean: $S_{\bar{x}} = \frac{s}{\sqrt{n}}$
Note: this formula is satisfactory for large samples drawn from a large population, i.e. n > 30 and
n < 5% of N.
- The word ‘error’ is used in place of ‘deviation’ to emphasize that variation among sample means
is due to sampling errors.
- The smaller the standard error, the greater the precision of the sample value.
Statistical estimation
It is the procedure of using a sample statistic to estimate a population parameter.
It is divided into point estimation (where an estimate of a population parameter is given by a
single number) and interval estimation (where an estimate of a population parameter is given by a
range in which the parameter may be considered to lie).
For example, a bus meant to take a class of 100 students (population N = 100) on a trip has a
maximum weight limit of 600kg. The teacher needs to know the weight of the class but does not have
enough time to weigh everyone, so he picks 25 students at random (sample n = 25). These students
are weighed and their average weight is recorded as 64kg ($\bar{x}$, the mean of the sample) with a
standard deviation (s). Using the sample statistics $\bar{x}$ and s, the teacher then estimates the
average weight of the whole class (µ, the population mean).
Confidence Interval
The interval estimate or ‘confidence interval’ consists of a range (an upper confidence limit and a
lower confidence limit) within which we are confident that a population parameter lies, and we
assign a probability that this interval contains the true population value.
The confidence limits are the outer limits of a confidence interval. The confidence interval is the
interval between the confidence limits. The higher the confidence level, the wider the
confidence interval. For example,
a normal distribution has the following characteristic:
i. Mean ± 1.960 σ includes 95% of the population
1. LARGE SAMPLES
These are samples that contain a sample size greater than 30(i.e. n>30)
Example
The quality department of a wire manufacturing company periodically selects a sample of wire
specimens in order to test for breaking strength. Past experience has shown that the breaking
strengths of a certain type of wire are normally distributed with standard deviation of 200 kg. A
random sample of 64 specimens gave a mean of 6200 kg. Estimate the population mean at the 95%
level of confidence.
Solution
Population mean: $\mu = \bar{x} \pm 1.96\,S_{\bar{x}}$
Note that the sample size is already n > 30, whereas s and $\bar{x}$ are given; thus steps i), ii) and iv) are
provided.
Here: $\bar{x} = 6200$ kg
$S_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{200}{\sqrt{64}} = 25$
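The interval for the wire example can be completed numerically. The sketch below is illustrative only; the function name `mean_confidence_interval` is an assumption, and the z value 1.96 is the 95% value quoted in the text.

```python
import math

def mean_confidence_interval(xbar, s, n, z):
    """Large-sample interval: xbar +/- z * s / sqrt(n)."""
    standard_error = s / math.sqrt(n)
    margin = z * standard_error
    return xbar - margin, xbar + margin

# Wire example: mean 6200 kg, standard deviation 200 kg, n = 64, 95% level
low, high = mean_confidence_interval(6200, 200, 64, 1.96)
print(round(low, 1), round(high, 1))
```

The standard error is 25, so the margin is 1.96 × 25 = 49 and the population mean is estimated to lie between about 6151 kg and 6249 kg.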
n 64
The finite population correction factor (FPCF) is given by $\sqrt{\frac{N - n}{N - 1}}$
where N = population size
n = sample size
Example
A manager wants an estimate of the sales of salesmen in his company. A random sample of 100 out of
500 salesmen is selected and average sales are found to be Shs. 75,000. If the sample standard
deviation is Shs. 15,000, estimate the population mean at the 99% level of confidence.
Solution
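Here N = 500, n = 100, $\bar{x}$ = 75000 and s = 15000
Now
Standard error of mean
$S_{\bar{x}} = \frac{s}{\sqrt{n}} \times \sqrt{\frac{N - n}{N - 1}}$
$= \frac{15000}{\sqrt{100}} \times \sqrt{\frac{500 - 100}{500 - 1}}$
$= \frac{15000}{10} \times \sqrt{\frac{400}{499}}$
$= \frac{15000}{10}(0.895)$
$= 1342.5$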
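The remaining step of this example, the 99% interval itself, can be sketched as follows. The helper name `standard_error_fpc` is hypothetical; the z value 2.58 for the 99% level is the one used elsewhere in this lesson.

```python
import math

def standard_error_fpc(s, n, N):
    """Standard error of the mean with the finite population correction factor."""
    return (s / math.sqrt(n)) * math.sqrt((N - n) / (N - 1))

# Salesmen example: N = 500, n = 100, mean Shs. 75,000, s = Shs. 15,000
se = standard_error_fpc(15000, 100, 500)
margin = 2.58 * se                      # 99% confidence level
low, high = 75000 - margin, 75000 + margin
print(round(se, 1), round(low), round(high))
```

The standard error is roughly Shs. 1,343, which gives an interval of roughly Shs. 71,500 to Shs. 78,500 for the population mean.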
Example
Given two samples A and B of 100 and 400 items respectively, with means $\bar{x}_1 = 7$ and
$\bar{x}_2 = 10$ and standard deviations of 2 and 3 respectively, construct a confidence interval
for the difference between the means at the 70% confidence level.
Solution
Sample               A              B
Mean                 x̄1 = 7        x̄2 = 10
Sample size          n1 = 100       n2 = 400
Standard deviation   S1 = 2         S2 = 3
The standard error of the difference between the means of samples A and B is given by
$S_{\bar{x}_A - \bar{x}_B} = \sqrt{\frac{4}{100} + \frac{9}{400}} = \sqrt{\frac{25}{400}} = \frac{5}{20} = \frac{1}{4} = 0.25$
At the 70% confidence level the appropriate z value is 1.04 (as read from the normal tables;
z values between –1.04 and +1.04 enclose about 0.70 of the area under the normal curve).
$|\bar{x}_1 - \bar{x}_2| = |7 - 10| = |-3| = 3$
We take the absolute value of the difference between the means, i.e. its positive value.
The confidence interval is therefore given by
= 3 ± 1.04 (0.25)
= 3 ± 0.26
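The same calculation can be checked with a short Python sketch. The function name `diff_means_interval` is an assumption made for illustration; the z value 1.04 is the 70% value quoted in the text.

```python
import math

def diff_means_interval(x1, s1, n1, x2, s2, n2, z):
    """Interval for |x1 - x2| using sqrt(s1^2/n1 + s2^2/n2) as standard error."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    diff = abs(x1 - x2)
    return diff - z * se, diff + z * se, se

# Samples A and B from the example, 70% confidence level (z = 1.04)
low, high, se = diff_means_interval(7, 2, 100, 10, 3, 400, 1.04)
print(round(se, 2), round(low, 2), round(high, 2))
```

The interval 3 ± 0.26 corresponds to limits of 2.74 and 3.26 for the difference between the population means.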
Example 2
A comparison of the wearing-out quality of two types of tyres was obtained by road testing.
Samples of 100 tyres of each type were collected. The miles travelled until wear-out were recorded
and the results were as follows
Tyres       T1                       T2
Mean        x̄1 = 26400 miles        x̄2 = 25000 miles
Variance    S1² = 1440000 miles²     S2² = 1960000 miles²
Find a confidence interval for the difference between the means at the confidence level of 70%.
Solution
$\bar{x}_1 = 26400$
$\bar{x}_2 = 25000$
Difference between the two means
$\bar{x}_1 - \bar{x}_2 = 26400 - 25000 = 1400$
Again we take the absolute value of the difference between the two means.
We calculate the standard error as follows
$S_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} = \sqrt{\frac{1440000}{100} + \frac{1960000}{100}} = 184.4$
The z value for the 70% confidence level is read from the normal tables as 1.04 (Z = 1.04).
Thus the confidence interval is calculated as follows
= 1400 ± (1.04)(184.4)
= 1400 ± 191.77
$1208.23 \le \mu_1 - \mu_2 \le 1591.77$
Example 1
In a sample of 800 candidates, 560 were male. Estimate the population proportion at 95%
confidence level.
Solution
Here
Sample proportion $p = \frac{560}{800} = 0.70$
q = 1 – p = 1 – 0.70 = 0.30
n = 800
$S_p = \sqrt{\frac{pq}{n}} = \sqrt{\frac{0.70 \times 0.30}{800}} = 0.016$
Population proportion
= p ± 1.96 ($S_p$)
= 0.70 ± 1.96 (0.016)
= 0.70 ± 0.03
= 0.67 to 0.73
Example 2
A sample of 600 accounts was taken to test the accuracy of posting and balancing of accounts,
and 45 mistakes were found. Estimate the population proportion. Use the 99% level of
confidence.
Solution
Here
n = 600; $p = \frac{45}{600} = 0.075$
q = 1 – 0.075 = 0.925
$S_p = \sqrt{\frac{pq}{n}} = \sqrt{\frac{0.075 \times 0.925}{600}} = 0.011$
Population proportion
= p ± 2.58 ($S_p$)
= 0.075 ± 2.58 (0.011)
= 0.075 ± 0.028
= 0.047 to 0.103
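Both proportion examples follow the same recipe, which can be sketched as below. The function name `proportion_interval` is hypothetical, and the z values (1.96 and 2.58) are the ones used in the two solutions.

```python
import math

def proportion_interval(successes, n, z):
    """Interval p +/- z * sqrt(p * q / n) for a sample proportion."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

# Example 1: 560 males out of 800 candidates, 95% level
low_1, high_1 = proportion_interval(560, 800, 1.96)
# Example 2: 45 mistakes in 600 accounts, 99% level
low_2, high_2 = proportion_interval(45, 600, 2.58)
print(round(low_1, 2), round(high_1, 2))
print(round(low_2, 3), round(high_2, 3))
```

Because the code does not round the standard error before multiplying, its limits can differ from the hand calculation in the last decimal place.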
Difference between two proportions
$= (P_1 - P_2) \pm Z\sqrt{\frac{pq}{n_1} + \frac{pq}{n_2}}$
where $p = \frac{p_1 n_1 + p_2 n_2}{n_1 + n_2}$ and q = 1 – p. Always remember to convert P1 and P2 to p.
2. SMALL SAMPLES
(a) Estimation of population mean
If the sample size is small (n < 30), the arithmetic means of small samples are not normally
distributed. In such circumstances, Student's t distribution must be used to estimate the
population mean.
In this case
Population mean $\mu = \bar{x} \pm t\,S_{\bar{x}}$
$\bar{x}$ = sample mean
$S_{\bar{x}} = \frac{s}{\sqrt{n}}$
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Example
A random sample of 12 items is taken and is found to have a mean weight of 50 grams and a
standard deviation of 9 grams.
What is the mean weight of the population
a) with 95% confidence
b) with 99% confidence
Solution
$\bar{x} = 50$; s = 9; v = n – 1 = 12 – 1 = 11; $S_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{9}{\sqrt{12}}$
$\mu = \bar{x} \pm t\,S_{\bar{x}}$
At the 95% confidence level
$\mu = 50 \pm 2.201 \times \frac{9}{\sqrt{12}}$
= 50 ± 5.72 grams
Therefore we can state with 95% confidence that the population mean is between 44.28 and
55.72 grams.
At the 99% confidence level
$\mu = 50 \pm 3.106 \times \frac{9}{\sqrt{12}}$
= 50 ± 8.07 grams
Therefore we can state with 99% confidence that the population mean is between 41.93 and
58.07 grams.
Note: To use the t distribution tables it is important to find the degrees of freedom (v = n – 1).
In the example above v = 12 – 1 = 11.
From the tables we find that at the 95% confidence level, against 11 degrees of freedom and under
0.05, the value of t = 2.201. At the 99% confidence level (under 0.01) the value of t = 3.106.
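A small sketch of the same calculation is given below. The t values are hard-coded as read from standard t tables for 11 degrees of freedom (2.201 at 95% and 3.106 at 99%); the dictionary and function names are assumptions of this illustration.

```python
import math

# Two-sided t table values for 11 degrees of freedom (from standard tables)
T_TABLE_11_DF = {0.95: 2.201, 0.99: 3.106}

def t_interval(xbar, s, n, t_value):
    """Small-sample interval: xbar +/- t * s / sqrt(n)."""
    margin = t_value * s / math.sqrt(n)
    return xbar - margin, xbar + margin

low_95, high_95 = t_interval(50, 9, 12, T_TABLE_11_DF[0.95])
low_99, high_99 = t_interval(50, 9, 12, T_TABLE_11_DF[0.99])
print(round(low_95, 2), round(high_95, 2))
print(round(low_99, 2), round(high_99, 2))
```

The two intervals reproduce the limits 44.28 to 55.72 grams and 41.93 to 58.07 grams quoted in the solution.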
Definition
- A hypothesis is a claim or an opinion about an item or issue. It therefore has to be tested
statistically in order to establish whether or not it is correct.
- Whenever testing a hypothesis, one must fully understand the two basic hypotheses to be tested,
namely
i. The null hypothesis (H0)
ii. The alternative hypothesis (H1)
Levels of significance
A level of significance is a probability value which is used when conducting tests of hypothesis.
It is the probability of making an incorrect decision after the statistical testing has been done,
that is, of rejecting a null hypothesis that is in fact true. The probabilities used are usually
very small, e.g. 1% or 5%.
[Figure: normal curve for a one-tailed test on the left tail at the 5% level of significance. The critical region (area 0.05) lies to the left of the critical value –1.65; the area between the critical value and the mean is 0.45.]
NB: If the standardized value of the mean is less than –1.65 we reject the null hypothesis (H0)
and accept the alternative hypothesis (H1), but if the standardized value of the mean is more
than –1.65 we accept the null hypothesis and reject the alternative hypothesis.
The above sketch and level of significance are applicable when the sample mean is less than the
population mean.
[Figure: normal curve for a one-tailed test on the right tail at the 5% level of significance, showing the acceptance region and a critical region of area 0.05.]
NB: If the standardized value of the sample mean is less than 1.65, we accept the null hypothesis
and reject the alternative. If it is greater than 1.65, we reject the null hypothesis and accept the
alternative hypothesis.
The above sketch is normally used when the sample mean given is greater than the population
mean.
[Figure: normal curve for a two-tailed test with tolerance limits of 15cm and 17 ½ cm. The null hypothesis is rejected (and the alternative hypothesis accepted) in both tails.]
NB: The null hypothesis is usually rejected if the standardized value of the sample mean lies
beyond the tolerance limits (15cm and 17 ½ cm).
On the other hand the test may concentrate on the right-hand tail of the normal distribution.
When this happens the major complaint is likely to do with oversize items bought. The
test is then known as one-tailed, as the focus is on one end of the normal distribution.
1. Normal test
Tests a sample mean ($\bar{x}$) against a population mean (µ) (where the sample size n > 30 and the
population variance σ² is known), and a sample proportion P (where np > 5 and nq
> 5, since in this case the normal distribution can be used to approximate the binomial
distribution).
2. t test
Tests a sample mean ($\bar{x}$) against a population mean, especially where the population
variance is unknown and n < 30.
Example 1
A certain NGO carried out a survey in a certain community in order to establish the average age at
which girls are married. The results of the survey indicated that the mean marriage age for girls
is 19 years.
In order to establish the validity of this mean marital age, a sample of 50 women was interviewed,
and their average age at marriage was found to be 16 years, with a standard deviation of
2.1 years.
The sample data indicates that the marital age is less than 19 years. Is this conclusion true or not?
Required
Conduct a statistical test to either support or reject the above conclusion drawn from the sample
statistics, i.e. that the marriage age is less than 19 years. Use a level of significance of 5%.
Solution
1. Null hypothesis
H0: μ (mean marital age) = 19 years
Alternative hypothesis H1: μ (mean marital age) < 19 years
2. The level of significance is 5%
3. The test statistic is the sample mean age, $\bar{x}$ = 16 years
4. The critical value of the one-tailed test (one-tailed because the alternative hypothesis is
an inequality in one direction) at the 5% level of significance is –1.65
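[Figure: normal curve with the rejection region to the left of the critical value –1.65 and the acceptance region to its right.]
5. The standardized value of the sample mean is
$Z = \frac{\bar{x} - \mu}{S_{\bar{x}}} = \frac{16 - 19}{2.1 / \sqrt{50}} = -10.1$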
6. Since –10.1 < -1.65, we reject the null hypothesis but accept the alternative hypothesis
at 5% level of significance i.e. the marriage age in this community is significantly lower
than 19 years
Example 2
A foreign company which manufactures electric bulbs has assured its customers that the lifespan
of the bulbs is 28 months with a standard deviation of 4 months.
Recently the company embarked on a quality improvement research for their product. After the
research using new technology, a sample of 70 bulbs was tested and they gave a mean lifespan of
30.2 months
Does this justify the research undertaken? Use 1% level of significance to conduct a statistical
test in order to establish the truth about the above question.
Testing procedure
1. Null hypothesis H0: µ = 28
Alternative hypothesis H1: µ > 28
2. The level of significance is 1% (one tailed test)
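3. The test statistic is the sample mean, $\bar{x}$ = 30.2 months
4. The critical value of the one-tailed test at the 1% level of significance is 2.33
[Figure: normal curve for a one-tailed test at the 1% level of significance. The critical region (area 0.01) lies to the right of the critical value 2.33; the area between the mean and the critical value is 0.4900.]
5. The standardized value of the sample mean is
$Z = \frac{\bar{x} - \mu}{S_{\bar{x}}} = \frac{30.2 - 28}{4 / \sqrt{70}} = 4.6$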
6. Since 4.6 > 2.33, we reject the null hypothesis and accept the alternative hypothesis at the
1% level of significance, i.e. the new sample mean lifespan is significantly
higher than the population mean.
Therefore the research undertaken was worthwhile, or justified.
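The testing procedure for the bulb example can be sketched in Python as follows. The function name `z_statistic` is hypothetical, and the critical value 2.33 is the one-tailed 1% value taken from the example.

```python
import math

def z_statistic(sample_mean, population_mean, sigma, n):
    """Standardized value of a sample mean: (xbar - mu) / (sigma / sqrt(n))."""
    return (sample_mean - population_mean) / (sigma / math.sqrt(n))

# Bulb example: mu = 28 months, sigma = 4 months, n = 70, sample mean 30.2
z = z_statistic(30.2, 28, 4, 70)
critical_value = 2.33                  # one-tailed test, 1% level
reject_h0 = z > critical_value
print(round(z, 1), reject_h0)
```

The same function gives about –10.1 for the marriage-age example (16, 19, 2.1, 50), where the decision rule is instead `z < -1.65` because the alternative hypothesis points to the left tail.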
Example 3
A construction firm has placed an order that they require a consignment of wires which have a
mean length of 10.5 meters with a standard deviation of 1.7 m
The company which produces the wires delivered 90 wires, which had a mean length of 9.2 m.,
The construction company rejected the consignment on the grounds that they were different
from the order placed.
Required
Conduct a statistical test to indicate whether you support or not support the action taken by the
construction company at 5% level of significance.
Solution
Null hypothesis µ = 10.5 m
Alternative hypothesis µ ≠ 10.5 m
Level of significance be 5%
The test statistics is the sample mean X = 9.2m
The critical value of the two tailed test at 5% level of significance is ± 1.96 (two tailed test).
[Figure: normal curve for a two-tailed test with critical values –1.96 and +1.96.]
The standardized value of the test statistic is
$Z = \frac{\bar{x} - \mu}{S_{\bar{x}}} = \frac{9.2 - 10.5}{1.7 / \sqrt{90}} = -7.25$
Since –7.25 < –1.96, we reject the null hypothesis and accept the alternative hypothesis at the 5%
level of significance, i.e. the sample mean is statistically different from the mean length ordered
by the construction company. We therefore support the action taken by the construction company.
deviation of 1.5 bags. The crops grew under natural circumstances and conditions without the
soil being treated with any fertilizer. The same agronomist carried out an alternative experiment
where he picked 60 plots in the same area and planted the same plant of maize but a fertilizer
was applied on these plots. After the harvest it was established that the mean harvest was 63
bags per plot with a standard deviation of 1.3 bags
Required
Conduct a statistical test in order to establish whether there was a significant difference between
the mean harvest under the two types of field conditions. Use 5% level of significance.
Solution
H0 : µ1 = µ2
H1 : µ1 ≠ µ2
Critical values of the two-tailed test at the 5% level of significance are ±1.96
The standardized value of the difference between sample means is given by Z where
$Z = \frac{\bar{x}_1 - \bar{x}_2}{S_{\bar{x}_1 - \bar{x}_2}}$ where $S_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{1.5^2}{50} + \frac{1.3^2}{60}}$
$Z = \frac{60 - 63}{\sqrt{0.045 + 0.028}} = -11.11$
[Figure: normal curve for a two-tailed test with critical values –1.96 and +1.96.]
Since –11.11 < –1.96, we reject the null hypothesis and accept the alternative hypothesis at the 5%
level of significance, i.e. the difference between the sample mean harvests is statistically significant.
This implies that the fertilizer had a positive effect on the harvest of maize.
Note: You don’t have to illustrate your solution with a diagram.
Example 2
An observation was made about the reading abilities of males and females. The observation led to a
conclusion that females are faster readers than males. The observation was based on the times
taken by both females and males when reading out a list of names during graduation ceremonies.
In order to investigate into the observation and the consequent conclusion a sample of 200 men
were given lists to read. On average each man took 63 seconds with a standard deviation of 4
seconds
A sample of 250 women were also taken and asked to read the same list of names. It was found
that they on average took 62 seconds with a standard deviation of 1 second.
Required
By conducting a statistical hypothesis testing at 1% level of significance establish whether the
sample data obtained does support earlier observation or not
Solution
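H0: µ1 = µ2
H1: µ1 ≠ µ2
The critical values of the two-tailed test at the 1% level of significance are ±2.58.
$Z = \frac{\bar{x}_1 - \bar{x}_2}{S_{\bar{x}_1 - \bar{x}_2}}$
$Z = \frac{63 - 62}{\sqrt{\frac{4^2}{200} + \frac{1^2}{250}}} = 3.45$
[Figure: normal curve showing the acceptance region between the critical values and the rejection regions in the tails.]
Since 3.45 > 2.58, we reject the null hypothesis at the 1% level of significance, i.e. the
difference between the mean reading times of men and women is statistically significant.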
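The test for the difference between two means can be sketched as below. The function name `two_sample_z` is an assumption; the critical value 2.58 is the two-tailed 1% value given in the solution.

```python
import math

def two_sample_z(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Z for the difference between two large-sample means."""
    standard_error = math.sqrt(sd_1**2 / n_1 + sd_2**2 / n_2)
    return (mean_1 - mean_2) / standard_error

# Reading-time example: 200 men (63 s, sd 4 s) and 250 women (62 s, sd 1 s)
z = two_sample_z(63, 4, 200, 62, 1, 250)
reject_h0 = abs(z) > 2.58              # two-tailed test, 1% level
print(round(z, 2), reject_h0)
```

Here Z is about 3.45, which lies beyond 2.58, so the difference between the two mean reading times is significant at the 1% level.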
$S_p = \sqrt{\frac{pq}{n}}$
The Z score is calculated as $Z = \frac{P - \pi}{S_p}$
where P = the proportion found in the sample and
π = the hypothetical proportion.
Example
A member of parliament (MP) claims that in his constituency only 50% of the total youth
population lacks university education. A local media company wanted to ascertain that claim, so
they conducted a survey taking a sample of 400 youths; of these, 54% lacked university education.
Required:
At the 5% level of significance, test whether the MP’s claim is wrong.
Solution.
Note: This is a two-tailed test since we wish to test the hypothesis that the proportion is
different (≠) from 50%, and not against a specific alternative hypothesis, e.g. < (less than) or >
(more than).
At the 5% level of significance for a two-tailed test the critical value is 1.96. Since the calculated
Z value is less than the tabulated value (1.96),
i.e. 1.6 < 1.96, we accept the null hypothesis.
Thus the sample gives no evidence that the MP’s claim is wrong.
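The Z value of 1.6 quoted in this example can be reproduced with the sketch below. It assumes the standard error is computed from the hypothetical proportion π = 0.50 (a common convention that the text does not spell out); the function name `proportion_z` is hypothetical.

```python
import math

def proportion_z(sample_proportion, hypothetical_proportion, n):
    """Z = (P - pi) / sqrt(pi * (1 - pi) / n).

    Assumption: the standard error uses the hypothetical proportion pi.
    """
    pi = hypothetical_proportion
    standard_error = math.sqrt(pi * (1 - pi) / n)
    return (sample_proportion - pi) / standard_error

# MP example: claim pi = 0.50, sample of 400 youths with P = 0.54
z = proportion_z(0.54, 0.50, 400)
reject_h0 = abs(z) > 1.96              # two-tailed test, 5% level
print(round(z, 2), reject_h0)
```

The standard error is 0.025, so Z = 0.04 / 0.025 = 1.6, which is inside the acceptance region.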
Example
Ken industrial manufacturers have produced a perfume known as “fianchetto.” In order to test
its popularity in the market, the manufacturer carried out a random survey in Back rank city where
10,000 consumers were interviewed, of whom 7,200 showed preference. The manufacturer
also moved to Rook town where he interviewed 12,000 consumers, out of whom 10,000
showed preference for the product.
Required
Design a statistical test and hence use it to advise the manufacturer regarding the differences in
the proportion, at 5% level of significance.
Solution
H0 : π1 = π2
H1 : π1 ≠ π2
The critical value for this two tailed test at 5% level of significance = 1.96.
Now $Z = \frac{(P_1 - P_2) - (\pi_1 - \pi_2)}{S_{P_1 - P_2}}$
But since the null hypothesis is π1 = π2, the second part of the numerator disappears, i.e.
π1 – π2 = 0, which will always be the case under this null hypothesis.
Then $Z = \frac{P_1 - P_2}{S_{P_1 - P_2}}$
Where:
                                    Sample 1        Sample 2
Sample size                         n1 = 10,000     n2 = 12,000
Sample proportion of success        P1 = 0.72       P2 = 0.83
Population proportion of success    π1              π2
Now $S_{P_1 - P_2} = \sqrt{\frac{pq}{n_1} + \frac{pq}{n_2}}$
where $p = \frac{p_1 n_1 + p_2 n_2}{n_1 + n_2}$
and q = 1 – p
In our case
$p = \frac{10000(0.72) + 12000(0.83)}{10000 + 12000} = \frac{17160}{22000} = 0.78$
q = 0.22
$S_{P_1 - P_2} = \sqrt{\frac{0.78 \times 0.22}{10000} + \frac{0.78 \times 0.22}{12000}} = 0.00561$
$Z = \frac{0.72 - 0.83}{0.00561} = -19.6$
Since |–19.6| > 1.96, we reject the null hypothesis and accept the alternative. The difference
between the proportions is statistically significant. This implies that the perfume is much
more popular in Rook town than in Back rank city.
The null hypothesis is that there is no difference between the population proportions. It means
two samples are from the same population.
Hence
H0 : π1 = π2
The best estimate of the standard error of the difference between P1 and P2 is given by pooling the
samples and finding the pooled sample proportion (p), thus
$p = \frac{p_1 n_1 + p_2 n_2}{n_1 + n_2}$
and $Z = \frac{P_1 - P_2}{S_{P_1 - P_2}}$
Example
In a random sample of 100 persons taken from village A, 60 are found to be consuming tea. In
another sample of 200 persons taken from a village B, 100 persons are found to be consuming
tea. Do the data reveal significant difference between the two villages so far as the habit of
taking tea is concerned?
Solution
Let us take the hypothesis that there is no significant difference between the two villages as far
as the habit of taking tea is concerned i.e. π1 = π2
We are given
P1 = 0.6; n1 = 100
P2 = 0.5; n2 = 200
$p = \frac{(0.6 \times 100) + (0.5 \times 200)}{100 + 200} = \frac{60 + 100}{300} = 0.53$
q = 1 – 0.53
= 0.47
$S_{P_1 - P_2} = \sqrt{\frac{pq}{n_1} + \frac{pq}{n_2}} = \sqrt{\frac{0.53 \times 0.47}{100} + \frac{0.53 \times 0.47}{200}} = 0.0611$
$Z = \frac{0.6 - 0.5}{0.0611} = 1.64$
Since the computed value of Z is less than the critical value of Z = 1.96 at the 5% level of
significance, we accept the hypothesis and conclude that there is no significant
difference in the habit of taking tea in the two villages A and B.
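The pooled-proportion test used in the tea example can be sketched as follows. The function name `two_proportion_z` is an assumption made for illustration; the critical value 1.96 is the two-tailed 5% value from the text.

```python
import math

def two_proportion_z(successes_1, n_1, successes_2, n_2):
    """Z for the difference between two proportions using the pooled p."""
    p_1 = successes_1 / n_1
    p_2 = successes_2 / n_2
    pooled = (successes_1 + successes_2) / (n_1 + n_2)
    q = 1 - pooled
    standard_error = math.sqrt(pooled * q / n_1 + pooled * q / n_2)
    return (p_1 - p_2) / standard_error

# Tea example: 60 of 100 in village A, 100 of 200 in village B
z = two_proportion_z(60, 100, 100, 200)
reject_h0 = abs(z) > 1.96              # two-tailed test, 5% level
print(round(z, 2), reject_h0)
```

Z comes out at about 1.64, below 1.96, so the null hypothesis of equal proportions is accepted.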
t distribution (student’s t distribution) tests of hypothesis (test for small samples n < 30)
For small samples (n < 30), the method used in hypothesis testing is the same as the one for
large samples except that t values from the t distribution at a given number of degrees of freedom
v are used instead of the z score. The standard error statistic used is also different.
Note that v = n – 1 for a single sample and v = n1 + n2 – 2 where two samples are involved.
$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$ for n < 30
If the absolute calculated value of t exceeds the table value of t at a specified level of
significance, the null hypothesis is rejected.
Example
Ten oil tins are taken at random from an automatic filling machine. The mean weight of the tins
is 15.8 kg and the standard deviation is 0.5kg. Does the sample mean differ significantly from the
intended weight of 16kgs. Use 5% level of significance.
Solution
Given that n = 10; $\bar{x}$ = 15.8; S = 0.50; μ = 16; v = 9
H0 : μ = 16
H1 : μ ≠ 16
$S_{\bar{x}} = \frac{0.5}{\sqrt{10}} = 0.16$
$t = \frac{15.8 - 16}{0.5 / \sqrt{10}} = \frac{-0.2}{0.16} = -1.25$
The table value of t for 9 d.f. at the 5% level of significance is 2.26. The absolute computed value
of t is smaller than the table value of t. Therefore the difference is insignificant and the null
hypothesis is accepted.
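The standard deviation is obtained by pooling the two sample standard deviations as shown
below.
$S_p = \sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}}$
where S1 and S2 are the standard deviations for samples 1 and 2 respectively.
Now $S_{\bar{x}_1} = \frac{S_p}{\sqrt{n_1}}$ and $S_{\bar{x}_2} = \frac{S_p}{\sqrt{n_2}}$
$S_{\bar{x}_1 - \bar{x}_2} = \sqrt{S_{\bar{x}_1}^2 + S_{\bar{x}_2}^2}$
Alternatively $S_{\bar{x}_1 - \bar{x}_2} = S_p \sqrt{\frac{n_1 + n_2}{n_1 n_2}}$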
Example
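Two different types of drugs, A and B, were tried on certain patients for increasing weight. 5
persons were given drug A and 7 persons were given drug B. The increase in weight (in pounds)
is given below
Drug A   8    12   13   9    3
Drug B   10   8    12   15   6   8   11
Do the two drugs differ significantly with regard to their effect in increasing weight? (Given that
v = 10; t0.05 = 2.23)
Solution
H0 : μ1 = μ2
H1 : μ1 ≠ μ2
$t = \frac{\bar{x}_1 - \bar{x}_2}{S_{\bar{x}_1 - \bar{x}_2}}$
$\bar{x}_1 = \frac{\sum x_1}{n_1} = \frac{45}{5} = 9$; $\bar{x}_2 = \frac{\sum x_2}{n_2} = \frac{70}{7} = 10$
$S_1 = \sqrt{\frac{62}{4}} = 3.94$; $S_2 = \sqrt{\frac{54}{6}} = 3$
$S_p = \sqrt{\frac{4(15.5) + 6(9)}{10}} = 3.406$
$S_{\bar{x}_1 - \bar{x}_2} = S_p \sqrt{\frac{n_1 + n_2}{n_1 n_2}} = 3.406 \sqrt{\frac{12}{35}} = 1.99$
$t = \frac{\bar{x}_1 - \bar{x}_2}{S_{\bar{x}_1 - \bar{x}_2}} = \frac{9 - 10}{1.99} = -0.50$
Since the absolute value of t (0.50) is less than the table value (2.23), the null hypothesis is accepted.
Hence there is no significant difference in the efficacy of the two drugs in the matter of
increasing weight.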
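The pooled t test for this example can be sketched with the standard library. The Drug A values used here are 8, 12, 13, 9, 3, which is an assumption chosen because they sum to the 45 used in the solution and give the sum of squared deviations 62; the function name `pooled_t` is hypothetical.

```python
import math
import statistics

def pooled_t(sample_1, sample_2):
    """t statistic for two small samples using the pooled standard deviation."""
    n_1, n_2 = len(sample_1), len(sample_2)
    var_1 = statistics.variance(sample_1)    # divides by n - 1
    var_2 = statistics.variance(sample_2)
    pooled_sd = math.sqrt(((n_1 - 1) * var_1 + (n_2 - 1) * var_2) / (n_1 + n_2 - 2))
    standard_error = pooled_sd * math.sqrt((n_1 + n_2) / (n_1 * n_2))
    t = (statistics.mean(sample_1) - statistics.mean(sample_2)) / standard_error
    return t, n_1 + n_2 - 2

drug_a = [8, 12, 13, 9, 3]        # assumed values, sum = 45
drug_b = [10, 8, 12, 15, 6, 8, 11]
t, degrees_of_freedom = pooled_t(drug_a, drug_b)
reject_h0 = abs(t) > 2.23         # table value given in the example for v = 10
print(round(t, 2), degrees_of_freedom, reject_h0)
```

The computed t is about –0.50 with 10 degrees of freedom, well inside the acceptance region.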
Example
Two salesmen A and B are working in a certain district. From a survey conducted by the head
office, the following results were obtained. State whether there is any significant difference in the
average sales between the two salesmen at 5% level of significance.
                             A      B
No. of sales                 20     18
Average sales in shs         170    205
Standard deviation in shs    20     25
Solution
H0 : μ1 = μ2
H1 : μ1 ≠ μ2
Where
$S_p = \sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}} = 22.5$
$S_{\bar{x}_1 - \bar{x}_2} = S_p \sqrt{\frac{n_1 + n_2}{n_1 n_2}} = 22.5 \sqrt{\frac{38}{360}} = 7.31$
$t = \frac{170 - 205}{7.31} = -4.79$
t0.05(36) = 1.96 (since d.f. > 30 we use the normal tables)
The table value of t at the 5% level of significance for 36 d.f. is 1.96, because for d.f. > 30 the
t distribution is approximately the same as the normal distribution. Since the absolute computed
value of t is more than the table value, we reject the null hypothesis. Thus, we conclude that there
is a significant difference in the average sales between the two salesmen.
$F = \frac{S_1^2}{S_2^2}$, which is the test statistic.
It follows the F distribution with V1 and V2 degrees of freedom. The larger sample variance is
placed in the numerator and the smaller one in the denominator.
If the computed value of F exceeds the table value of F, we reject the null hypothesis, i.e. the
alternative hypothesis is accepted.
Example
In one sample of 10 observations the sum of the squares of the deviations of the sample values
from the sample mean was 120, and in another sample of 12 observations it was 314. Test whether
the difference is significant at the 5% level of significance.
Solution
Given that n1 = 10, n2 = 12, $\sum (x_1 - \bar{x}_1)^2 = 120$,
$\sum (x_2 - \bar{x}_2)^2 = 314$
Let us take the null hypothesis that the two samples are drawn from the same normal population
of equal variance
$H_0: \sigma_1^2 = \sigma_2^2$
$H_1: \sigma_1^2 \ne \sigma_2^2$
$S_1^2 = \frac{\sum (x_1 - \bar{x}_1)^2}{n_1 - 1} = \frac{120}{9} = 13.33$
$S_2^2 = \frac{\sum (x_2 - \bar{x}_2)^2}{n_2 - 1} = \frac{314}{11} = 28.55$
Since the numerator should be greater than the denominator,
$F = \frac{28.55}{13.33} = 2.1$
The calculated value is compared with the table value of F at the 5% level of significance for
V1 = 9 and V2 = 11. Since the calculated value of F is less than the table value, we accept the
hypothesis. The samples may have been drawn from two populations having the same variance.
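The variance ratio for this example can be sketched as follows. The function name `f_ratio` is hypothetical; the critical value is deliberately left out because the text does not state it, so it must still be read from F tables.

```python
def f_ratio(sum_sq_1, n_1, sum_sq_2, n_2):
    """Return F with the larger sample variance in the numerator,
    together with the numerator and denominator degrees of freedom."""
    var_1 = sum_sq_1 / (n_1 - 1)
    var_2 = sum_sq_2 / (n_2 - 1)
    if var_1 >= var_2:
        return var_1 / var_2, n_1 - 1, n_2 - 1
    return var_2 / var_1, n_2 - 1, n_1 - 1

# Sums of squared deviations 120 (n = 10) and 314 (n = 12)
f_value, df_numerator, df_denominator = f_ratio(120, 10, 314, 12)
print(round(f_value, 2), df_numerator, df_denominator)
```

Because the second sample has the larger variance, it supplies the numerator, and F is about 2.14 (2.1 to one decimal place).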
The Chi square test (χ2) is used when comparing an actual (observed) distribution with a
hypothesized, or expected, distribution.
It is given by $\chi^2 = \sum \frac{(O - E)^2}{E}$
Where O = Observed frequency
E = Expected frequency
The computed value of χ2 is compared with the tabulated value of χ2 for a given significance level
and degrees of freedom.
Example
Mr. Nguku carried out a survey of 320 families in Ateka district. Each family had 5 children and
the survey revealed the following distribution
No. of boys        5     4     3     2     1     0
No. of girls       0     1     2     3     4     5
No. of families    14    56    110   88    40    12
Is the result consistent with the hypothesis that male and female births are equally probable, at
the 5% level of significance?
Solution
If male and female births are equally probable, then the number of boys in a family conforms to a
binomial distribution with probability p = ½.
Therefore
H0: the observed number of boys conforms to a binomial distribution with p = ½
H1: the observations do not conform to a binomial distribution with p = ½
On the assumption that male and female births are equally probable, the probability of a male
birth is p = ½. The expected number of families can be calculated by the use of the binomial
distribution. The probability of x male births in a family of 5 is given by
$P(x) = {}^5C_x\, p^x q^{5-x}$ (for x = 0, 1, 2, 3, 4, 5)
$= {}^5C_x \left(\tfrac{1}{2}\right)^5$ (since p = q = ½)
To get the expected frequencies, multiply P(x) by the total number N = 320. The calculations are
shown in the table below
x    P(x)                   Expected frequency
5    5C5 (½)^5 = 1/32       320 × 1/32 = 10
4    5C4 (½)^5 = 5/32       320 × 5/32 = 50
3    5C3 (½)^5 = 10/32      320 × 10/32 = 100
2    5C2 (½)^5 = 10/32      320 × 10/32 = 100
1    5C1 (½)^5 = 5/32       320 × 5/32 = 50
0    5C0 (½)^5 = 1/32       320 × 1/32 = 10
Arranging the observed and expected frequencies in the following table and calculating χ2
O      E      (O – E)²    (O – E)²/E
14     10     16          1.60
56     50     36          0.72
110    100    100         1.00
88     100    144         1.44
40     50     100         2.00
12     10     4           0.40
                          Σ(O – E)²/E = 7.16
$\chi^2 = \sum \frac{(O - E)^2}{E} = 7.16$
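The expected frequencies and the χ2 value can be reproduced with the sketch below. The critical value 11.07 (5% level, 6 – 1 = 5 degrees of freedom) is taken from standard chi-squared tables and is an assumption of this illustration, since the text does not state the conclusion of the example.

```python
from math import comb

boys = [5, 4, 3, 2, 1, 0]
observed = [14, 56, 110, 88, 40, 12]       # number of families
total_families = sum(observed)             # 320

# Expected frequencies under a binomial distribution with p = 1/2
expected = [total_families * comb(5, x) * 0.5**5 for x in boys]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
critical_value = 11.07                     # assumed table value, 5% level, 5 d.f.
reject_h0 = chi_square > critical_value
print(expected, round(chi_square, 2), reject_h0)
```

Under that assumed table value, 7.16 is below 11.07, so the data would be consistent with equally probable male and female births.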
4. The characteristics of this distribution are defined by the number of degrees of
freedom (d.f.), which is given by
d.f. = (r – 1)(c – 1),
where r is the number of rows and c is the number of columns. Corresponding to a
chosen level of significance, the critical value is found from the chi-squared
table.
5. The calculated value of χ2 is compared with the tabulated value of χ2 for (r – 1)(c – 1)
degrees of freedom at a certain level of significance. If the computed value of χ2
is greater than the tabulated value, the null hypothesis of independence is
rejected. Otherwise we accept it.
Example
In a sample of 200 people with a particular disease, 100 were given a drug and the
others were not given any drug. The results are as follows
             Drug    No drug    Total
Cured        65      55         120
Not cured    35      45         80
Total        100     100        200
Test whether the drug is effective or not, at the 5% level of significance.
Solution
Let us take the null hypothesis that the drug is not effective in curing the disease.
Applying the χ2 test
The expected cell frequencies are computed as follows
$E_{11} = \frac{R_1 C_1}{n} = \frac{120 \times 100}{200} = 60$
$E_{12} = \frac{R_1 C_2}{n} = \frac{120 \times 100}{200} = 60$
$E_{21} = \frac{R_2 C_1}{n} = \frac{80 \times 100}{200} = 40$
$E_{22} = \frac{R_2 C_2}{n} = \frac{80 \times 100}{200} = 40$
Arranging the observed frequencies with their corresponding expected frequencies in the following
table we get
O     E     (O – E)²    (O – E)²/E
65    60    25          0.417
55    60    25          0.417
35    40    25          0.625
45    40    25          0.625
                        Σ(O – E)²/E = 2.084
$\chi^2 = \sum \frac{(O - E)^2}{E} = 2.084$
v = (r – 1)(c – 1) = (2 – 1)(2 – 1) = 1; tabulated $\chi^2_{0.05}$ = 3.841
The calculated value of χ2 is less than the table value, so the null hypothesis is accepted. Hence
the data do not show that the drug is effective in curing the disease.
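The test of independence for a contingency table can be sketched as below. The function names are hypothetical; the critical value 3.841 is the tabulated value quoted in the example for 1 degree of freedom.

```python
def expected_frequencies(table):
    """Expected cell frequency = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in column_totals] for r in row_totals]

def chi_square_statistic(table):
    """Sum of (O - E)^2 / E over every cell of the table."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for observed_row, expected_row in zip(table, expected)
        for o, e in zip(observed_row, expected_row)
    )

# Rows: cured, not cured; columns: drug, no drug
observed = [[65, 55], [35, 45]]
chi_square = chi_square_statistic(observed)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
reject_h0 = chi_square > 3.841          # 5% level, 1 d.f.
print(round(chi_square, 3), degrees_of_freedom, reject_h0)
```

The code gives about 2.083; the hand calculation shows 2.084 only because each term was rounded before adding. The same two functions apply unchanged to larger tables, such as a 3 × 3 table in a test of homogeneity.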
Test of homogeneity
It is concerned with the proposition that several populations are homogenous with respect to
some characteristic of interest e.g. one may be interested in knowing if raw material available
from several retailers are homogenous. A random sample is drawn from each of the population
and the number in each of sample falling into each category is determined. The sample data is
displayed in a contingency table
The analytical procedure is the same as that discussed for the test of independence
Example
A random sample of 400 persons was selected from three age groups and each person
was asked to specify which type of TV program he or she preferred. The results are shown in the
following table
                 Type of program
Age group        A      B      C      Total
Under 30         120    30     50     200
30 – 44          10     75     15     100
45 and above     10     30     60     100
Total            140    135    125    400
Test the hypothesis that the populations are homogenous with respect to the types of television
program they prefer, at the 5% level of significance.
Solution
Let us take the hypothesis that the populations are homogenous with respect to the different types
of television programs they prefer.
Applying the χ2 test
O      E        (O – E)²    (O – E)²/E
120    70.00    2500.00     35.7143
10     35.00    625.00      17.8571
10     35.00    625.00      17.8571
30     67.50    1406.25     20.8333
75     33.75    1701.56     50.4166
30     33.75    14.06       0.4166
50     62.50    156.25      2.5000
15     31.25    264.06      8.4499
60     31.25    826.56      26.4499
                            Σ(O – E)²/E = 180.4948
$\chi^2 = \sum \frac{(O - E)^2}{E} = 180.4948$
With (3 – 1)(3 – 1) = 4 degrees of freedom, the tabulated value of χ2 at the 5% level of
significance is 9.488. Since the calculated value is far greater than the table value, the
hypothesis of homogeneity is rejected.
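Testing
(a) Hypothesis testing of mean
For n > 30
$Z = \frac{\bar{X} - \mu}{S_{\bar{X}}}$ where $S_{\bar{X}} = \frac{S}{\sqrt{n}}$, at the α level of significance.
For n < 30
$t = \frac{\bar{X} - \mu}{S_{\bar{X}}}$ where $S_{\bar{X}} = \frac{S}{\sqrt{n}}$,
at n – 1 d.f. and the α level of significance.
(b) Hypothesis testing of the difference between means
For n > 30
$Z = \frac{\bar{X}_1 - \bar{X}_2}{S_{\bar{X}_1 - \bar{X}_2}}$
where $S_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$,
at the α level of significance.
For n < 30
$t = \frac{\bar{X}_1 - \bar{X}_2}{S_{\bar{X}_1 - \bar{X}_2}}$ at n1 + n2 – 2 d.f.
where $S_{\bar{X}_1 - \bar{X}_2} = S_p \sqrt{\frac{n_1 + n_2}{n_1 n_2}}$
and $S_p = \sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}}$
(d) Hypothesis testing of the difference between proportions
$Z = \frac{P_1 - P_2}{S_{P_1 - P_2}}$
Where:
$S_{P_1 - P_2} = \sqrt{\frac{pq}{n_1} + \frac{pq}{n_2}}$
$p = \frac{p_1 n_1 + p_2 n_2}{n_1 + n_2}$
q = 1 – p
(e) Chi-square test
$\chi^2 = \sum \frac{(O - E)^2}{E}$
Where O = observed frequency
$E = \frac{\text{Column total} \times \text{Row total}}{\text{Sample size}}$ = expected frequency
(f) F – test (variance test)
$F = \frac{S_1^2}{S_2^2}$
Here the bigger of the two sample variances makes the numerator.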