Chapter 4 Inferential
Teresa K.
Basic terms
The Central Limit Theorem
Sampling Distribution of the proportion
Standard deviation and Standard error
Parameter Estimations
Point Estimation
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Interval Estimation
Meaning of confidence Interval
Interval Estimation cont’d …
The term $z_{\alpha/2}\,(\sigma/\sqrt{n})$ is called the maximum error of
the estimate.
Elements Of confidence Interval Estimation
Level of confidence
Precision (range)
Confidence interval cont’d…
On the other hand, a 99% CI will be wider than a 95% CI; the
extra width means that we can be more certain that
the interval will contain the population parameter. But
to obtain a higher confidence from the same sample,
we must be willing to accept a larger margin of error (a
wider interval).
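The trade-off between confidence and width can be checked numerically. The sketch below is only an illustration: the sample summary (mean 50, standard deviation 10, n = 100) is hypothetical and the helper name `z_interval` is not from the slides.

```python
import math
from statistics import NormalDist

def z_interval(mean, sd, n, confidence):
    """Return (lower, upper) for a z-based confidence interval of a mean."""
    # Two-sided critical value z_{alpha/2} for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin

# Hypothetical sample summary, chosen only for illustration.
ci95 = z_interval(mean=50.0, sd=10.0, n=100, confidence=0.95)
ci99 = z_interval(mean=50.0, sd=10.0, n=100, confidence=0.99)
width95 = ci95[1] - ci95[0]
width99 = ci99[1] - ci99[0]
print(width95, width99)
```

With the same sample, the 99% interval comes out wider than the 95% interval because only the critical value changes.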
The t-distribution
The t-distribution cont’d…
Degrees of Freedom
As explained earlier, the t-distribution involves the
degrees of freedom (df).
It is defined as the number of values which are free
to vary after imposing a certain restriction on your
data.
Example: If 3 scores have a mean of 10, how many of
the scores can be freely chosen?
Solution
The first and the second scores could be chosen freely
(e.g., 8 and 12, 9 and 5, 7 and 15, etc.). But the third score
is then fixed (10, 16 and 8, respectively), because the three
scores must sum to 30.
Hence, there are two degrees of freedom.
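The restriction can be made concrete with a few lines of code. This is a small sketch using the pairs from the example; the function name `last_score` is made up for illustration.

```python
def last_score(free_scores, mean, n):
    """Value the final score must take so that n scores average to `mean`."""
    return mean * n - sum(free_scores)

# Each freely chosen pair forces exactly one value for the third score.
for pair in [(8, 12), (9, 5), (7, 15)]:
    print(pair, "->", last_score(pair, mean=10, n=3))
```

Once two scores are chosen, the third is determined, which is why only n - 1 = 2 values are free.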
The t-distribution cont’d…
Table of t-distributions
The table of t-distribution shows values of t for
selected areas under the t curve.
Different values of df appear in the first column. The
table is adapted for efficient use for either one or
two-tailed tests.
Example 1. If df = 8, 5% of t scores are above what
value?
Example 2. Find t0 if n = 13 and 95% of t scores are
between –t0 and +t0.
Example 3. If df = 5, what is the probability that a t
score is above 2.02 or below -2.02?
The t-distribution cont’d…
Solutions
1. Look at the t-distribution table. Move along the row
labeled “one tail” to the value 0.05; the intersection
of the 0.05 column and the row with 8 in the df
column gives the value t = 1.86.
2. df = 13 - 1 = 12. If 95% of t scores are between -t0 and
+t0, then 5% are in the two tails. Move along the row
labeled “two tail” to the value 0.05;
the intersection of this 0.05 column and the row
with 12 in the df column gives t0 = 2.179.
3. Two tails are implied. Look along the “df = 5” row to
find the entry 2.02. The two-tail probability is 0.10.
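The table look-ups can be cross-checked without a table. The following is a rough sketch, not the method used in the slides: it integrates the Student's t density numerically with Simpson's rule (the function names and the step count are arbitrary choices) to recover the tail areas for the three examples.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half of the area lies above 0; subtract the area between 0 and t.
    return 0.5 - total * h / 3

p1 = upper_tail(1.86, df=8)        # Example 1: one tail, should be close to 0.05
p2 = 2 * upper_tail(2.179, df=12)  # Example 2: two tails, should be close to 0.05
p3 = 2 * upper_tail(2.02, df=5)    # Example 3: two tails, should be close to 0.10
print(p1, p2, p3)
```

The small differences from the tabulated areas come from the table values being rounded.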
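Characteristics of the t distribution
Confidence interval for a single mean
$CI = \left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$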
Confidence interval for a single mean cont’d…
$CI = \left(\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\right)$
Where n-1 = degrees of freedom for Student’s t-distribution
and s = sample standard deviation.
Confidence interval for a single mean cont’d…
If σ is unknown, use tα/2 values and s in the formula.
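Interval estimation for difference of means
$CI = \left((\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}},\ (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right)$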
Interval estimation for difference of mean cont’d…
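• Where $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$ is the standard error of the difference between the two sample means.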
Confidence interval for a single proportion
$CI = \left(p - z_{\alpha/2}\sqrt{p(1-p)/n},\ p + z_{\alpha/2}\sqrt{p(1-p)/n}\right)$
when np and nq are both greater than or equal to 5, where q = 1 - p.
Confidence interval for the difference of two
proportions
The point estimate for the difference of two
population proportions, π1 - π2, is given by p1 - p2.
A (1-α)100% confidence interval estimate for the
difference of population proportions, π1 - π2, is given
by:
$(p_1 - p_2) \pm z_{\alpha/2}\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$, where $q_i = 1 - p_i$.
In general the width of confidence interval depends
on:
Sample size,
Level of confidence and
The standard error.
Examples
Example 1
An SRS (simple random sample) of 36 apparently healthy subjects yielded the
following values of urine excreted (milligrams per day):
0.007, 0.03, 0.025, 0.008, 0.03, 0.038, 0.007, 0.005,
0.032, 0.04, 0.009, 0.014, 0.011, 0.022, 0.009, 0.008,
0.012, 0.03, 0.05, 0.009, 0.008, 0.007, 0.006, 0.02,
0.034, 0.007, 0.008, 0.036, 0.007, 0.023, 0.011, 0.012,
0.022, 0.03, 0.04, 0.04
Compute the point estimate of the population mean.
Example 1 cont’d…
If $x_1, x_2, \ldots, x_n$ are n observed values, then
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{0.707}{36} \approx 0.0196$
Construct 90% and 95% confidence intervals for the
mean (with s = 0.0123 and √36 = 6):
90% CI = (0.0196 - 1.65 × 0.0123/6, 0.0196 + 1.65 × 0.0123/6)
= (0.0162, 0.0230)
95% CI = (0.0196 - 1.96 × 0.0123/6, 0.0196 + 1.96 × 0.0123/6)
= (0.0156, 0.0236)
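The interval arithmetic for Example 1 can be reproduced with a short script. This is a sketch that takes the summary values quoted in the example (mean 0.0196, standard deviation 0.0123, n = 36) and the table values z = 1.65 and z = 1.96 as given; the function name is an illustrative choice.

```python
import math

def z_confidence_interval(mean, sd, n, z):
    """Interval mean ± z * sd / sqrt(n)."""
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin

# Summary values quoted in Example 1; z = 1.65 and 1.96 are the table values used there.
ci90 = z_confidence_interval(0.0196, 0.0123, 36, z=1.65)
ci95 = z_confidence_interval(0.0196, 0.0123, 36, z=1.96)
print([round(v, 4) for v in ci90], [round(v, 4) for v in ci95])
```

Both intervals are symmetric about the sample mean, and the 95% interval is the wider of the two.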
Example 2
The mean diastolic blood pressure for 225 randomly
selected individuals is 75 mmHg with a standard
deviation of 12.0 mmHg. Construct a 95% confidence
interval for the mean
Solution
n = 225
mean = 75 mmHg
standard deviation = 12 mmHg
confidence level = 95%
The 95% confidence interval for the unknown population mean is
given by
95% CI = 75 ± 1.96 × 12/√225 = 75 ± 1.96 × 12/15 = (73.432, 76.568)
Example 3
In a survey of 300 automobile drivers in one city, 123
reported that they wear seat belts regularly. Estimate
the seat belt rate of the city and 95% confidence
interval for true population proportion.
Solution
The point estimate of p = 123/300
=0.41 (41%)
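The 95% confidence interval asked for in Example 3 follows from the single-proportion formula p ± z·√(p(1-p)/n). A minimal sketch, assuming z = 1.96 and using a hypothetical helper name:

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Point estimate and z-based confidence interval for a single proportion."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)  # standard error of the sample proportion
    return p, p - z * se, p + z * se

# 123 of the 300 surveyed drivers reported wearing seat belts regularly.
p_hat, lower, upper = proportion_ci(123, 300)
print(round(p_hat, 2), round(lower, 3), round(upper, 3))
```

Under these assumptions the interval runs from roughly 35% to 47%.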
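Example 4
Given
Population 1 (non-smokers): n1 = 50, $\bar{x}_1$ = 76, S1 = 8
Population 2 (smokers): n2 = 65, $\bar{x}_2$ = 68, S2 = 9
S1²/n1 = 64/50 = 1.28
S2²/n2 = 81/65 = 1.246
Solution
A. The point estimate of the difference of means is:
$\bar{x}_1 - \bar{x}_2$ = 76 – 68
= 8
B. 95% CI
CI = $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$
= 8 ± 1.96 × √(1.28 + 1.246)
= (8 - 1.96 × 1.5894, 8 + 1.96 × 1.5894)
= (4.885, 11.115)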
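The interval for the difference of two means uses the standard error √(S1²/n1 + S2²/n2). The sketch below applies it to the non-smoker values above; the smoker values (n2 = 65, mean 68, SD 9) are taken from the smoker and non-smoker example in the hypothesis-testing part of this chapter, and the function name is illustrative.

```python
import math

def diff_means_ci(mean1, sd1, n1, mean2, sd2, n2, z=1.96):
    """Difference of means, its standard error, and a z-based confidence interval."""
    diff = mean1 - mean2
    # The variances of the two sample means add; the SE is the square root of the sum.
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return diff, se, (diff - z * se, diff + z * se)

diff, se, ci = diff_means_ci(76, 8, 50, 68, 9, 65)
print(diff, round(se, 4), [round(v, 3) for v in ci])
```

Note that 1.28 and 1.246 are variances of the sample means, so they are summed under the square root rather than used directly as standard errors.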
Example 5
Each of two groups consists of 100 patients who
have leukemia. A new drug is given to the first group
but not to the second (the control group). It is found
that in the first group 75 people have remission for 2
years; but only 60 in the second group.
Example 5 cont’d ….
Solution
Note that
n1p1 = 100*0.75 = 75 > 5
n1q1 = 100*0.25 = 25 > 5
n2p2 = 100*0.60 = 60 > 5
n2q2 = 100*0.40 = 40 > 5
p1 = 0.75, q1 = 0.25, n1 = 100
p2 = 0.60, q2 = 0.40, n2 = 100
σ²1 = p1q1/n1 = 0.75*0.25/100 = 0.001875
σ²2 = p2q2/n2 = 0.60*0.40/100 = 0.0024
Hence, σ²(1-2) = 0.001875 + 0.0024 = 0.004275
σ(1-2) = √0.004275 = 0.0654
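From the standard error, an interval for the difference in remission proportions follows directly. The example does not state a confidence level in the surviving text, so the sketch below assumes 95% (z = 1.96); the function name is illustrative.

```python
import math

def diff_proportions_ci(x1, n1, x2, n2, z=1.96):
    """Difference of two sample proportions, its SE, and a z-based interval."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, se, (diff - z * se, diff + z * se)

# 75 of 100 treated patients and 60 of 100 controls had remission for 2 years.
diff, se, ci = diff_proportions_ci(75, 100, 60, 100)
print(round(diff, 2), round(se, 4), [round(v, 4) for v in ci])
```

Under the assumed 95% level the interval lies entirely above zero.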
More exercises
In a hospital, the mean noise level in the 170 ward areas
was 58.0 decibels and the standard deviation was 4.8 decibels.
Find the 95% confidence interval for the true mean.
In Addis Ababa, a survey of 350 students showed that
28% carried their lunch to school. Find the 95% CI for
the true population proportion of students who carried
their lunch to school.
A recent study of 100 people in Gondar found that 22
were obese. Find the 95% confidence interval for the
true population proportion.
Exercises
3. The standard hemoglobin reading for normal males of
adult age is 15 g/100 ml. The standard deviation is
about 2.5 g/100 ml. For a group of 36 male
construction workers, the sample mean was 16 g/100
ml.
– Construct a 95% confidence interval for the male
construction workers. What is your interpretation of
this interval relative to the normal adult male
population?
– What would the confidence interval have been if the
above results were obtained based on 49
construction workers?
HYPOTHESIS TESTING
Example of Hypothesis?
A hypothesis is an assumption about the population parameter.
– A parameter is a characteristic of the population, like its mean or
variance.
– The parameter must be identified before analysis.
Example of hypothesis: I assume the mean GPA of this class is 3.5!
Types of hypothesis:
1. The null hypothesis:
The null hypothesis (represented by H0, pronounced “H nought”) is the
statement about the value of the population
parameter. That is, the null hypothesis postulates
that ‘there is no difference between factor and
outcome’ or ‘there is no intervention effect’.
It is the main hypothesis which we wish to test.
Hypothesis Testing cont’d…
Possible choices of HA :
If H0 is                                   Then HA is
µ = A (single mean)                        µ ≠ A or µ < A or µ > A
p = B (single proportion)                  p ≠ B or p < B or p > B
µ1 - µ2 = C (difference of means)          µ1 - µ2 ≠ C or µ1 - µ2 < C or µ1 - µ2 > C
p1 - p2 = D (difference of proportions)    p1 - p2 ≠ D or p1 - p2 < D or p1 - p2 > D
Hypothesis Testing cont’d…
Exercises
State HA and H0 for each of the following:
1. Is the average height of the GCMS students 1.63
m or is it more?
2. Is the average height of the GCMS students 1.63
m or is it less?
3. Is the average height of the GCMS students 1.63
m or is it something different?
4. There is a belief that 10% of the smokers develop
lung cancer in country x.
5. Are men and women infected by malaria in equal
proportions, or does a higher proportion of men get
malaria in Ethiopia?
Hypothesis Testing cont’d…
Level of significance
A method for making a decision must be agreed
upon.
If H0 is rejected, then HA is accepted.
How is a “significant” difference defined?
A null hypothesis is either true or false, and it is
either rejected or not rejected.
No error is made if it is true and we fail to reject it, or
if it is false and rejected.
An error is made, however, if it is true but rejected,
or if it is false and we fail to reject it.
Level of Significance, and the Rejection Region
[Figure: rejection regions under the normal curve for a one-tailed test (area α in one tail) and for a two-tailed test (area α/2 in each tail)]
What Do We Test
Hypothesis testing for population mean
For the given null hypothesis, H0: µ = µ0,
we could have three different alternative hypotheses.
These are:
HA: µ ≠ µ0 (two-tailed), HA: µ < µ0 or HA: µ > µ0 (one-tailed)
Steps for two tailed Test
Test procedure for two tailed test
1. State the null hypothesis: H0: µ = µ0
2. State the alternative hypothesis: H1: µ ≠ µ0
3. Fix the level of significance (α) and construct the
test statistic under the null (assuming the null
hypothesis is true) as:
$z = \frac{\bar{x} - \mu_0}{SE}$
Note that this is not the only test statistic.
Depending on the type of data and the sample size, we
may need to compute a z-score, t-score or χ²-score.
For large samples (n ≥ 30), the test statistic has a
standard normal distribution:
$z \sim N(0, 1)$
4. Find the critical value corresponding to the given alpha
(α) from the distribution table.
5. Decision rule: for a two-tailed hypothesis,
the decision is defined by:
Reject the null hypothesis if:
$|z_{cal}| = \left|\frac{\bar{x} - \mu_0}{SE}\right| > z_{tab} = z_{\alpha/2}$
Do not reject the null hypothesis if:
$|z_{cal}| = \left|\frac{\bar{x} - \mu_0}{SE}\right| < z_{tab} = z_{\alpha/2}$
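The two-tailed procedure can be sketched in a few lines. The inputs below (sample mean 27, µ0 = 30, SE = 1.5, α = 0.05) are hypothetical values picked for illustration, and the function name is not from the slides.

```python
from statistics import NormalDist

def two_tailed_z_test(sample_mean, mu0, se, alpha=0.05):
    """Return (z_cal, z_tab, reject) for H0: mu = mu0 against H1: mu != mu0."""
    z_cal = (sample_mean - mu0) / se
    # Critical value z_{alpha/2}: alpha is split equally between the two tails.
    z_tab = NormalDist().inv_cdf(1 - alpha / 2)
    return z_cal, z_tab, abs(z_cal) > z_tab

# Hypothetical inputs, for illustration only.
z_cal, z_tab, reject = two_tailed_z_test(sample_mean=27, mu0=30, se=1.5)
print(z_cal, round(z_tab, 2), reject)
```

The comparison uses the absolute value of the calculated z, so a large deviation in either direction leads to rejection.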
Alpha (α) vs. critical value
The α-level is represented by the shaded areas.
Sample results in these areas lead to rejection of H0.
[Figure: normal curve with a rejection region (region of doubt) in each tail, the acceptance region in the middle, and the critical values at the boundaries]
Example:
3. Test statistic:
$Z = \frac{\bar{x} - \mu_0}{SE} = \frac{27 - 30}{\sqrt{20/10}} = -2.12$
4. Critical value: the Z value at level of significance α/2 =
0.025, since the test is two-tailed. Thus Zα/2 = Z0.025
= 1.96, so the critical values are ±Zα/2 = ±1.96.
Therefore we reject the null hypothesis if the
calculated value of z is less than -1.96 or greater than
+1.96.
[Figure: normal curve with rejection regions below -1.96 and above +1.96]
Test procedure for one tailed test
$Z = \frac{\bar{x} - \mu_0}{SE}$
Reject the null hypothesis (left-tailed test) if:
$Z_{cal} = \frac{\bar{x} - \mu_0}{SE} < -z_{\alpha}$
Example: cont’d …
3. Test statistic:
$Z = \frac{\bar{x} - \mu_0}{SE} = \frac{27 - 30}{\sqrt{20/10}} = -2.12$
Comparison of two Means
The purpose of this section is to extend the arguments
of the single mean to the comparison of two sample
means.
In the comparison of two means, there are two
samples of observations from two underlying
populations (often treatment and control groups)
whose means are denoted by µt and µc and whose
standard deviations are denoted by σt and σc.
The relevant null hypothesis is that the means are
identical, i.e.,
H0: µt = µc or H0: µt - µc = 0
The rationale for the test of significance is as before.
Assuming the null hypothesis is true (i.e., that there is
no difference in the population means), one determines
the chance of obtaining differences in sample means as
discrepant as or more discrepant than that observed.
If this chance is sufficiently small, there is reasonable
evidence to doubt the validity of the null hypothesis;
hence, one concludes there is a statistically significant
difference between the means of the two populations
(i.e., one rejects the null hypothesis).
Example
If a random sample of 50 non-smokers has a mean lifetime
of 76 years with a standard deviation of 8 years, and a
random sample of 65 smokers has a mean lifetime of 68 years with a
standard deviation of 9 years,
test the hypothesis that there is no difference
between the mean lifetimes of non-smokers and
smokers at the 0.01 level of significance.
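Given
Population 1 (non-smokers): n1 = 50, $\bar{x}_1$ = 76, S1 = 8
Population 2 (smokers): n2 = 65, $\bar{x}_2$ = 68, S2 = 9
σ²1 = S1²/n1 = 64/50 = 1.28
σ²2 = S2²/n2 = 81/65 = 1.246
σ(n-s) = √(1.28 + 1.246) = 1.589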
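The test itself can be completed from the values given in the example. This is a sketch under the assumption of a two-tailed z test at α = 0.01 (the example asks about "no difference"); the function name is illustrative.

```python
import math
from statistics import NormalDist

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """z statistic for H0: mu1 - mu2 = 0 with large independent samples."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se

# Non-smokers: n = 50, mean 76, SD 8. Smokers: n = 65, mean 68, SD 9.
z_cal = two_sample_z(76, 8, 50, 68, 9, 65)
z_tab = NormalDist().inv_cdf(1 - 0.01 / 2)  # two-tailed critical value at alpha = 0.01
reject = abs(z_cal) > z_tab
print(round(z_cal, 2), round(z_tab, 2), reject)
```

The calculated z of about 5 is well beyond the critical value of about 2.58, so under these assumptions the null hypothesis of equal mean lifetimes would be rejected.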
Example:
• The National Institute of Mental Health published an
article stating that in any one-year period,
approximately 9.5 percent of American adults suffer
from depression or a depressive illness. Suppose that
in a survey of 100 people in a certain town, seven of
them suffered from depression or a depressive
illness. Conduct a hypothesis test to determine if the
true proportion of the people in that town suffering
from depression or a depressive illness is different
from the percentage in the general adult American
population.
Example: cont’d…
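• To compute the calculated value of the test statistic,
first we need to calculate the sample proportion from
the data. That is, p = 7/100 = 0.07. Under the null hypothesis
(π0 = 0.095), the standard error of the proportion is:
$\sigma_p = \sqrt{\frac{\pi_0 (1 - \pi_0)}{n}} = \sqrt{\frac{0.095 \times 0.905}{100}} = 0.0293$
• Then the calculated value of z from the data is given
by:
$z = \frac{p - \pi_0}{SE} = \frac{0.07 - 0.095}{0.0293} = -0.85$
• Decision: since |z| = 0.85 is smaller than the two-tailed critical value at any
conventional level of significance (for example, 1.96 at α = 0.05), do not reject the null hypothesis.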
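The one-sample proportion test can be checked with a short script. It is a sketch that divides the observed difference by the standard error computed under the null value π0 = 0.095; the two-sided p-value and the function name are additions for illustration.

```python
import math
from statistics import NormalDist

def one_proportion_z(successes, n, pi0):
    """Sample proportion, SE under H0, and z statistic for H0: pi = pi0."""
    p = successes / n
    se = math.sqrt(pi0 * (1 - pi0) / n)
    return p, se, (p - pi0) / se

p, se, z = one_proportion_z(7, 100, 0.095)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
print(round(se, 4), round(z, 2), round(p_value, 2))
```

The p-value is far above 0.05, so the survey gives no evidence that the town's proportion differs from 9.5%.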
Procedure for one tailed test of proportion
The procedure for the one-sided test of a proportion is
quite similar to the one-tailed test of the population mean. The
only difference is the standard error of the proportion.
Hypothesis testing for two proportions
A similar approach is adopted when performing a
hypothesis test to compare two proportions. The
standard error of the difference in proportions is again
calculated, but because we are evaluating the
probability of the data on the assumption that the null
hypothesis is true we calculate a slightly different
standard error.
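$Se(p_1 - p_2) = \sqrt{\bar{p}(1-\bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$, where $\bar{p} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$ is the pooled proportion
$z_{cal} = \frac{p_1 - p_2}{Se(p_1 - p_2)}$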
Hypothesis testing for two proportions cont’d…
$z_{cal} = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}}$
One tailed test
Help the health officer in testing the hypothesis that the
malaria prevalence of 1978 was greater than that of
1979 (take the level of significance, α = 0.01).
Solution
H0: π1978 = π1979 or π1978 - π1979 = 0
HA: π1978 > π1979 or π1978 - π1979 > 0
p1978 = 0.038, n1978 = 15,000
p1979 = 0.02, n1979 = 10,000
Ztab (α = 0.01, one tailed) = 2.33
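Solution
Test statistic:
$Z_{cal} = \frac{p_{1978} - p_{1979}}{\sqrt{\frac{p_{1978}(1 - p_{1978})}{n_{1978}} + \frac{p_{1979}(1 - p_{1979})}{n_{1979}}}} = \frac{0.038 - 0.02}{\sqrt{\frac{0.038 \times 0.962}{15000} + \frac{0.02 \times 0.98}{10000}}}$
= 0.018/0.0021 = 8.571
Decision: reject H0,
because Zcal > Ztab; in other words, the p-value is less
than the level of significance (i.e., α = 0.01).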
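The calculation can be reproduced as follows. This sketch uses the unpooled standard error √(p1q1/n1 + p2q2/n2), which is the form that reproduces the 0.0021 quoted in the solution; the function name is illustrative.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """Unpooled standard error and z statistic for the difference p1 - p2."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return se, (p1 - p2) / se

# Malaria prevalence: 0.038 of 15,000 in 1978 and 0.02 of 10,000 in 1979.
se, z_cal = two_proportion_z(0.038, 15000, 0.02, 10000)
reject = z_cal > 2.33  # one-tailed critical value at alpha = 0.01
print(round(se, 4), round(z_cal, 2), reject)
```

Without intermediate rounding of the standard error the statistic is about 8.58 rather than 8.571; the decision is unchanged.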
Hypothesis testing for means and proportion of small
samples
We have seen in the preceding sections how the
standard normal distribution can be used to carry out
tests of significance for the means and proportions of
large samples.
In this section we shall see how similar methods may be
used when we have small samples, using the t-distribution.
Tests of Hypothesis using the t - distribution
If past experience indicates that the mean pulse rate of
first year male medical students is 72 beats per minute,
test the hypothesis that the above sample estimate is
consistent with the population mean at the 5% level of
significance.
Solution:
Hypotheses:
H0: µ = 72
HA: µ ≠ 72
$t_{cal} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ = -3.3/2.89 = -1.14
So, with a large p-value, we cannot ignore the effect
of chance.
If the p-value < α (such as 0.05), then we say the
difference is significant and hence reject the null
hypothesis of no difference.
If the p-value > α (such as 0.05), then the difference is
not significant and hence we do not reject the null
hypothesis.
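The p-value for the test is:
$p\text{-value} = 1 - \Phi\left(\frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\right)$
[Figure: sampling distributions of the mean, $N(\mu_0, \sigma^2/n)$ under H0 and $N(\mu_1, \sigma^2/n)$ under H1, with the rejection region marked]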
Example
A certain brand of cigarettes is advertised by the manufacturer
as having a mean nicotine content of 15 mg/cigarette. A
sample of 200 cigarettes is tested by a lab and found to
have an average of 15.76 mg of nicotine, with known SD = 3.6.
Using a 0.01 level of significance, can we conclude that the
actual mean nicotine content of this brand is greater than
15 mg? (Use the p-value to test.)
Solution:
Following the steps of hypothesis testing, the first step is
stating the null and alternative hypotheses, but before
that let us see the observed difference using the normal
curve:
[Figure: normal curve centered at µ = 15]
Is 15.76 far enough to the right
of µ = 15 to be in the critical
area (rejection region)?
H0: µ = 15;
HA: µ > 15
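The p-value for this one-tailed test can be computed from the figures in the example. A minimal sketch using the standard normal CDF from the Python standard library; the function name is an illustrative choice.

```python
import math
from statistics import NormalDist

def upper_tail_p_value(sample_mean, mu0, sigma, n):
    """z statistic and upper-tail p-value for H0: mu = mu0 against HA: mu > mu0."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    return z, 1 - NormalDist().cdf(z)

# Sample of 200 cigarettes: mean 15.76 mg, known SD 3.6, advertised mean 15 mg.
z, p_value = upper_tail_p_value(15.76, 15, 3.6, 200)
reject = p_value < 0.01
print(round(z, 2), round(p_value, 4), reject)
```

Since the p-value is below 0.01, the data support the conclusion that the mean nicotine content exceeds 15 mg at the stated level of significance.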
Confidence Interval
1. Provides the information that a p-value gives.
– If the null value is included in a 95% confidence interval,
by definition the corresponding p-value is > 0.05.