Statistical Method
CHAPTER ONE
1. REVIEW OF SOME BASIC CONCEPTS
1.1 Basic concepts: population, sample, parameter, statistic
Statistics is a set of scientific principles and techniques that are useful in reaching conclusions about
populations and processes when the available information is both limited and variable; that is, statistics is
the science of learning from data. Almost everyone, including corporate presidents, marketing
representatives, social scientists, engineers, medical researchers, and consumers, deals with data. Although
the objective of statistical methods is to make the process of scientific research as efficient and productive
as possible, many scientists and engineers have inadequate training in experimental design and in the
proper selection of statistical analyses for experimentally acquired data.
A population is the set of all items or individuals of interest whose properties are to be
analyzed. It consists of all subjects (human or otherwise) that are being studied. Most of the time, due to
the expense, time, size of population, medical concerns, etc., it is not possible to use the entire population
for a statistical study; therefore, researchers use samples.
A sample is a group of subjects selected from a population or it is a subset of the population. If the
subjects of a sample are properly selected, most of the time they should possess the same or similar
characteristics as the subjects in the population.
Why sample?
Less time consuming than a census
Less costly to administer than a census
When it’s impossible to study the whole population
Parameter: is a descriptive measure of a population, or summary value calculated from a population.
Examples: Population Range, Population Average, Population proportion, Population variance, etc.
Statistic: is a descriptive measure of a sample, or summary value calculated from a sample.
Example: Sample Proportion, Sample Average, Sample Range, Sample variance.
Target Population: The population to be studied/ to which the investigator wants to generalize his
results.
Sampling Unit: the smallest unit from which a sample can be selected
Statistical Methods (Stat 1013) Lecture Notes, UoG
Sampling frame: the list of all the sampling units from which the sample is drawn
Example: a college dean is interested in learning about the average age of faculty members. Identify the
basic terms in this situation:
The population is the age of all faculty members at the college.
A sample is any subset of the population. For example, we might select 10 faculty members and
determine their age.
The variable is the “age” of each faculty member
The experiment would be the method used to select the ages forming the sample and determining the
actual age of each faculty member in the sample.
The parameter of interest is the “average” age of all faculty members at the college.
The statistic is the “average” age of the faculty members in the sample.
1.2 Review of descriptive statistics
Depending on how data are used, statistics is divided into two main areas:
descriptive and inferential statistics.
Descriptive statistics
Is concerned with summary calculations, graphs, charts and tables.
Collecting, presenting, and describing data
Measures of averages and dispersions
Summarizing and Graphing Data
Frequency distribution (frequency table): shows how a data set is partitioned among all of
several categories (classes) by listing all of the categories along with the number of data values
in each of the categories.
Statistical Graphics
Objective is to identify a suitable graph for representing the data set. The graph should be effective
in revealing the important characteristics of the data.
A histogram is a visual tool used to analyze the shape of the distribution of the data.
Histogram, Frequency Polygon, Relative Frequency Polygon,
Dot Plot, Ogive (cumulative frequency curve or line graph), Bar Graph,
Pie Chart, Scatter Plot (or Scatter Diagram),
Time Series Plot, Box Plot
Statistical graphics are used to:
Describing data: consider distribution, center, variation, and outliers
Exploring data: features that reveal some useful and/or interesting characteristic of the data set.
Examples of discrete data include the number of subjects sensitive to the effect of a drug (the number of
“successes” and the number of “failures”).
Continuous data can assume any value within a specific range, such as the air pressure in a tire.
Continuous variables are variables which can assume an infinite number of possible values between any
two specific values. They are usually obtained by measurement.
Examples of continuous data are weight, height, pressure, and survival time.
There are many numerical indicators for summarizing and describing data. The most common ones
indicate central tendency, variability, and proportional representation (the sample mean, variance, and
percentiles, respectively).
Inferential statistics
Making statements about a population by examining sample results
Drawing conclusions and/or making decisions concerning a population based on sample results.
The two major activities of inferential statistics are:
To use sample data to estimate values of population parameters, and
To test hypotheses or claims made about population parameters.
Inference can be either estimation or hypothesis testing.
1.3 Review of probability concepts and distributions
Probability is a measure of the chance that something will occur; it quantifies uncertainty.
Random experiment: involves obtaining observations of some kind. It is an experiment that can be
repeated any number of times under similar conditions and it is possible to enumerate the total number of
outcomes without predicting an individual outcome.
Examples: tossing a coin, throwing a die, counting arrivals at an emergency room, etc.
Population: Set of all possible observations. Conceptually, a population could be generated by repeating
an experiment indefinitely.
Outcome: The result of a single trial of a random experiment.
Elementary event (simple event): one possible outcome of an experiment
Event (Compound event): One or more possible outcomes of a random experiment.
Sample space: the set of all sample points (simple events) for an experiment is called a sample space; or
set of all possible outcomes for an experiment.
Notation:
Sample space: S
Sample point: E1, E2 . . . etc.
Limitations (of the classical approach):
If it is not possible to enumerate all the possible outcomes for an experiment.
If the sample points (outcomes) are not mutually independent.
If the total number of outcomes is infinite.
If the outcomes are not equally likely.
Example 1. A fair die is tossed once. What is the probability of getting:
a) the number 4?
b) an odd number?
c) a number greater than 4?
d) either 1 or 2 or … or 6?
2. A box of 80 candles consists of 30 defective and 50 non-defective candles. If 10 of these candles are
selected at random, what is the probability that:
a) all will be defective?
b) 6 will be non-defective?
c) all will be non-defective?
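The die probabilities follow the classical rule (favorable outcomes over total outcomes), and the candle exercise is the standard hypergeometric count C(30, d)·C(50, 10 − d) / C(80, 10). A minimal sketch, using only the standard library:

```python
from fractions import Fraction
from math import comb

# Classical probability: P(A) = favorable outcomes / total outcomes.
die = range(1, 7)
p_four = Fraction(sum(1 for x in die if x == 4), 6)      # a) 1/6
p_odd = Fraction(sum(1 for x in die if x % 2 == 1), 6)   # b) 3/6 = 1/2
p_gt4 = Fraction(sum(1 for x in die if x > 4), 6)        # c) 2/6 = 1/3
p_any = Fraction(6, 6)                                   # d) certain event = 1

# Candles: 30 defective, 50 non-defective, 10 selected at random.
# Counts of each outcome follow C(30, d) * C(50, 10 - d) out of C(80, 10).
total = comb(80, 10)
p_all_defective = Fraction(comb(30, 10), total)            # a)
p_six_good = Fraction(comb(50, 6) * comb(30, 4), total)    # b)
p_all_good = Fraction(comb(50, 10), total)                 # c)
```

Using `Fraction` keeps the answers exact instead of rounding them to floats.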
The Frequentist Approach
This is based on the relative frequencies of outcomes belonging to an event.
Definition: The probability of an event A is the long-run proportion of outcomes favorable to A when
the experiment is repeated under the same conditions:
P(A) = lim (N→∞) N_A / N,
where N_A is the number of times A occurs in N repetitions.
Example: If records show that 60 out of 100,000 bulbs produced are defective, what is the probability that a
newly produced bulb is defective? P(defective) = 60/100,000 = 0.0006.
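The long-run relative frequency idea can be illustrated by simulation. A hypothetical sketch, treating the recorded rate 60/100,000 = 0.0006 as the true underlying probability (the seed is arbitrary, chosen only for reproducibility):

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

# Estimate P(defective) by relative frequency over many simulated bulbs.
p_true = 60 / 100_000          # 0.0006, from the records above
n_trials = 1_000_000
defects = sum(1 for _ in range(n_trials) if random.random() < p_true)
p_estimate = defects / n_trials  # approaches 0.0006 as n_trials grows
```
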
Axiomatic Approach:
Let E be a random experiment and S be a sample space associated with E. With each event A we
associate a real number P(A), called the probability of A, satisfying the following properties, called the
axioms (or postulates) of probability:
1. 0 ≤ P(A) ≤ 1
2. P(S) =1
3. If A and B are mutually exclusive events, the probability that one or the other occurs equals the sum of
the two probabilities, i.e., P(A ∪ B) = P(A) + P(B)
Subjective Approach
It is always based on some prior body of knowledge. Hence subjective measures of uncertainty are always
conditional on this prior knowledge. The subjective approach accepts unreservedly that different people
(even experts) may have vastly different beliefs about the uncertainty of the same event.
Example: Abebe’s belief about the chances of Ethiopia Buna club winning the FA Cup this year may be
very different from Daniel's. Abebe, using only his knowledge of the current team and past achievements
may rate the chances at 30%.
Daniel, on the other hand, may rate the chances as 10% based on some inside knowledge he has about key
players having to be sold in the next two months.
Some basic properties of probability
1) For any event A, 0 ≤ P(A) ≤ 1
2) P(∅) = 0
3) For any events A and B, P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
4) P(A′) = 1 − P(A)
5) P(S) = 1 (since S contains all the outcomes, S always occurs).
Generalized basic principle of counting rules:
Addition Rule
Suppose that the 1st procedure can be performed in n1 ways and the 2nd procedure in n2 ways. Suppose
furthermore that it is not possible to perform procedures 1 and 2 together. Then the number of ways in
which we can perform procedure 1 or procedure 2 is n1 + n2. If there are k such procedures, the kth of
which can be performed in nk ways, there are n1 + n2 + … + nk possible ways in total.
Example: Suppose there are 2 ways to go from place A to place B if one takes a bus, and 3 ways if one
takes a train. In how many ways can somebody go from place A to B? (By the addition rule, 2 + 3 = 5 ways.)
Example
1. How many possible answer sequences are there for 3 questions, each with 4 possible answers?
The number of possible answer sequences = 4 × 4 × 4 = 64 (by the multiplication rule).
Permutation Rules:
1. The number of permutations of n distinct objects taken all together is n!,
where n! = n × (n − 1) × (n − 2) × … × 2 × 1.
2. The arrangement of n objects in a specified order using r objects at a time is called the permutation of n
objects taken r objects at a time. It is written as nPr, and the formula is
nPr = n! / (n − r)!
3. The number of distinct permutations of n objects in which k1 are alike, k2 are alike, etc., is
n! / (k1! · k2! · … · kn!)
Example 1. Suppose we have the letters A, B, C, D.
a) How many permutations are there taking all four?
b) How many permutations are there taking two letters at a time?
2. How many different permutations can be made from the letters in the word
“CORRECTION”?
Combination
A selection of objects without regard to order is called a combination.
Example: Given the letters A, B, C, and D list the permutation and combination for selecting two letters.
Solutions:
Permutation: AB, AC, AD, BA, BC, BD, CA, CB, CD, DA, DB, DC
Combination: AB, AC, AD, BC, BD, CD
Note that in permutation AB is different from BA, but in combination AB is the same as BA.
Combination Rule
The number of combinations of r objects selected from n objects is denoted by nCr and is given
by the formula:
nCr = n! / ((n − r)! · r!)
Exercise:
1. In how many ways can a committee of 5 people be chosen out of 9 people?
2. A committee of 5 people must be selected out of 5 men and 8 women. In how many ways
can the selection be made if there are exactly three women on the committee?
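Python's `math` module (3.8+) provides `perm`, `comb`, and `factorial`, so the examples and exercises above can be checked directly. A sketch (exercise 2 is read as "exactly three women", matching the counting used):

```python
from math import comb, factorial, perm

# Letters A, B, C, D
all_four = perm(4, 4)        # taking all four: 4! = 24
two_at_a_time = perm(4, 2)   # 4!/(4 - 2)! = 12
pairs = comb(4, 2)           # combinations of two letters: 6

# Permutations of the letters of "CORRECTION":
# 10 letters in which C, O and R each appear twice.
correction = factorial(10) // (factorial(2) * factorial(2) * factorial(2))

# Exercise 1: a committee of 5 chosen out of 9 people.
committee = comb(9, 5)

# Exercise 2: exactly 3 of the 8 women and 2 of the 5 men.
with_three_women = comb(8, 3) * comb(5, 2)
```
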
Probability Distribution:
Definition of random variables and probability Distribution:
Random variable: a numerical-valued function defined on the sample space. It assigns a real number
to each element of the sample space. Generally, random variables are denoted by capital letters and the
values of the random variables by small letters.
Example: Consider an experiment of tossing a fair coin three times. Let the random variable X be the
number of heads in the three tosses; then find the possible values of X.
Random variables are of two types:
1. Discrete random variable: are variables which can assume only a specific number of values. They
have values that can be counted
Examples:
• Toss a coin n time and count the number of heads.
• Number of children in a family.
• Number of car accidents per week.
• Number of defective items in a given company.
• Number of bacteria per two cubic centimeter of water.
2. Continuous random variable: a variable that can assume all values between any two given values.
Examples:
• Height of students at certain college.
• Mark of a student.
• Life time of light bulbs.
• Length of time required to complete a given training.
Probability distribution: consists of the values a random variable can assume and the corresponding
probabilities of those values; it is a function that assigns a probability to each value of the random variable.
2) P(X = xi) ≥ 0, i.e., 0 ≤ P(X = xi) ≤ 1
3) If X is a discrete random variable, then
P(a < X < b) = Σ P(x), summed over x = a + 1, …, b − 1
P(a ≤ X < b) = Σ P(x), summed over x = a, …, b − 1
P(a < X ≤ b) = Σ P(x), summed over x = a + 1, …, b
P(a ≤ X ≤ b) = Σ P(x), summed over x = a, …, b
If X is a continuous random variable, then
b) P(X = x) = 0 for any single value x, and
c) P(a < X < b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a ≤ X ≤ b)
Introduction to expectation
Definition:
1. Let a discrete random variable X assume the values X1, X2, …, Xn with the probabilities P(X1), P(X2),
…, P(Xn) respectively. Then the expected value of X, denoted E(X), is defined as:
E(X) = X1·P(X1) + X2·P(X2) + … + Xn·P(Xn) = Σ Xi·P(Xi), summed over i = 1, …, n
2. Let X be a continuous random variable assuming values in the interval (a, b) with density f(x) such that
∫ f(x) dx = 1 over (a, b). Then
E(X) = ∫ x f(x) dx, and E(X²) = ∫ x² f(x) dx, over (a, b), if X is continuous.
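The discrete definition can be sketched with the earlier coin-toss example (X = number of heads in three tosses of a fair coin):

```python
from fractions import Fraction
from itertools import product

# X = number of heads in three tosses of a fair coin.
outcomes = list(product("HT", repeat=3))   # 8 equally likely outcomes
pmf = {}
for o in outcomes:
    x = o.count("H")
    pmf[x] = pmf.get(x, Fraction(0)) + Fraction(1, 8)

# E(X) = sum of x * P(X = x)
expected = sum(x * p for x, p in pmf.items())   # 3/2 for a fair coin
```
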
Rule of Expectation
1) Let X be a random variable and k be a real number; then
Example: Let X be a continuous random variable with density
f(x) = x/2, for 0 < x < 2
f(x) = 0, otherwise
Then find a) P(1 < X < 1.5)
b) E(X)
c) Var(X)
d) E(3X² + 2X)
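Assuming the density above reads f(x) = x/2 on 0 < x < 2 (as reconstructed from the notes), the requested quantities can be checked by simple numerical integration; a midpoint-rule sketch:

```python
def f(x):
    # density as reconstructed from the notes: f(x) = x/2 on (0, 2)
    return x / 2 if 0 < x < 2 else 0.0

def integrate(g, a, b, n=100_000):
    # simple midpoint-rule numerical integration of g over [a, b]
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

total = integrate(f, 0, 2)                        # should be 1
p_1_to_15 = integrate(f, 1, 1.5)                  # (1.5^2 - 1^2)/4 = 0.3125
e_x = integrate(lambda x: x * f(x), 0, 2)         # 4/3
e_x2 = integrate(lambda x: x * x * f(x), 0, 2)    # 2
var_x = e_x2 - e_x ** 2                           # 2 - 16/9 = 2/9
e_poly = 3 * e_x2 + 2 * e_x                       # E(3X^2 + 2X) by linearity
```

The last line uses the linearity of expectation, E(3X² + 2X) = 3E(X²) + 2E(X).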
Common Discrete Probability Distributions
1. Binomial Distribution
A binomial experiment is a probability experiment that satisfies the following four requirements called
assumptions of a binomial distribution.
1. The experiment consists of n identical trials.
2. Each trial has only one of the two possible mutually exclusive outcomes, success or a failure.
3. The probability of each outcome does not change from trial to trial, and
4. The trials are independent, thus we must sample with replacement.
Examples of binomial experiments
• Tossing a coin 20 times to see how many tails occur.
• Asking 200 people if they watch BBC news.
• Registering a newly produced product as defective or non-defective.
• Asking 100 people if they favor the ruling party.
• Rolling a die to see if a 5 appears.
Definition: The outcomes of the binomial experiment and the corresponding probabilities of these
outcomes are called Binomial Distribution.
Let p = the probability of success and q = 1 − p = the probability of failure on any given trial.
Then the probability of getting x successes in n trials becomes
P(X = x) = C(n, x) · pˣ · qⁿ⁻ˣ, for x = 0, 1, 2, …, n
P(X = x) = 0, otherwise
This is sometimes written as
X ~ Bin(n, p)
When using the binomial formula to solve problems, we have to identify three things:
• The number of trials (n)
• The probability of a success on any one trial (P) and
• The number of successes desired (X).
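The binomial formula translates directly into code with `math.comb`; a sketch, applied to the coin example above (exactly 10 tails in 20 tosses of a fair coin):

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p): C(n, x) * p^x * (1 - p)^(n - x)."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Tossing a fair coin 20 times: probability of exactly 10 tails.
p10 = binom_pmf(10, 20, 0.5)

# Sanity check: the pmf sums to 1 over x = 0, 1, ..., n.
total = sum(binom_pmf(x, 20, 0.5) for x in range(21))
```
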
Note: The Poisson probability distribution provides a close approximation to the binomial probability
distribution when n is large and p is quite small or quite large, with λ = np:
P(X = x) = ((np)ˣ · e⁻ⁿᵖ) / x!, for x = 0, 1, 2, …
where λ = np is the average number of successes.
Usually we use this approximation if np ≤ 5; in other words, if n > 20 and np ≤ 5 (or n(1 − p) ≤ 5 when p
is close to 1), we may use the Poisson distribution as an approximation to the binomial distribution.
Example:
1. If 1.6 accidents can be expected at an intersection on any given day, what is the probability that there
will be 3 accidents on any given day?
2. If there are 200 typographical errors randomly distributed in a 500-page manuscript, find the
probability that a given page contains exactly 3 errors.
3. A sales firm receives, on average, 3 calls per hour on its toll-free number. For any given hour, find
the probability that it will receive the following.
a) At most 3 calls
b) At least 3 calls
c) Five or more calls
4. If approximately 2% of people are left-handed, find the probability that in a room of 200 people there
are exactly 5 people who are left-handed.
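Example 4 is exactly the approximation setting above (n = 200, p = 0.02, so λ = np = 4). A sketch comparing the Poisson approximation with the exact binomial answer:

```python
from math import comb, exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson(lam) random variable."""
    return lam**x * exp(-lam) / factorial(x)

# Example 4: n = 200, p = 0.02, so lambda = np = 4.
approx = poisson_pmf(5, 4.0)

# Exact binomial answer for comparison:
exact = comb(200, 5) * 0.02**5 * 0.98**195
```

The two values agree to about two decimal places, illustrating how good the approximation is here.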
Common Continuous Probability Distributions
Normal Distribution
A random variable X is said to have a normal distribution if its probability density function is given by
f(x) = (1/(σ√(2π))) · e^(−½((x−μ)/σ)²), for −∞ < x < ∞, −∞ < μ < ∞, σ > 0
where μ = E(X) and σ² = Var(X) are the parameters of the normal distribution.
Properties of Normal Distribution:
1. It is bell shaped and symmetrical about its mean, and it is mesokurtic. The maximum ordinate is at
x = μ and is given by:
f(μ) = 1/(σ√(2π))
2. It is asymptotic to the axis, i.e., it extends indefinitely in either direction from the mean.
3. It is a continuous distribution, i.e., there are no gaps or holes.
4. It is a family of curves, i.e., every unique pair of mean and standard deviation defines a different
normal distribution. Thus, the normal distribution is completely described by two parameters: mean
and standard deviation.
5. The total area under the curve is 1, and the area of the distribution on each side of the mean is 0.5:
∫ f(x) dx = 1, over (−∞, ∞)
Student’s t Distribution
In statistics, as long as the sample size is large enough, the sample mean can be modeled with the standard
normal distribution. But when the sample size is small, statisticians rely on the distribution of the t statistic
(also known as the t score), whose value is given by:
t = (x̄ − μ) / (s/√n)
where x̄ is the sample mean, μ is the population mean, s is the standard deviation of the sample, and n is
the sample size.
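The t score is easy to compute with the standard library's `statistics` module (note that `stdev` uses the n − 1 divisor, as the formula requires). A sketch with a hypothetical small sample and a claimed population mean of 50:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample (n = 9) and a claimed population mean of 50.
sample = [48.2, 51.1, 49.5, 50.3, 47.8, 52.0, 49.0, 50.6, 48.9]
mu = 50.0

n = len(sample)
x_bar = mean(sample)
s = stdev(sample)                  # sample standard deviation (n - 1 divisor)
t = (x_bar - mu) / (s / sqrt(n))   # t score with df = n - 1 = 8
```
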
The distribution of the t statistic is called the t distribution or the Student t distribution. The particular
form of the t distribution is determined by its Degrees of Freedom (df). The degree of freedom refers to
the number of independent observations in a set of data. When estimating a mean score or a proportion
from a single sample, the number of independent observations is equal to the sample size minus one. The t
distribution can be used with any statistic having a bell-shaped distribution (i.e., approximately normal).
The t distribution has the following properties:
The mean of the distribution is equal to 0.
The variance is equal to v / (v - 2), where v is the degrees of freedom.
With infinite degrees of freedom, the t distribution is the same as the standard normal distribution.
The t distribution is similar to standard normal distribution in the following ways
It is bell-shaped.
2). To find the chi-square critical value for a specific α when the hypothesis test is one-tailed left, the α
value must be subtracted from 1 and the left side of the table used, because the χ² table gives the area to
the right of the critical value, and the χ² statistic cannot be negative.
Example: The critical χ² value for 10 df when α = 0.05 and the test is one-tailed left is 3.940.
3). To find the chi-square critical values for a specific α when the hypothesis test is two-tailed, the area
must be split. For example, to find the critical chi-square values for 22 degrees of freedom when α = 0.05,
we use the area to the right of the larger value, 0.025 (0.05/2), and the area to the right of the smaller
value, 0.975 (1 − 0.05/2). Hence, one must use the table columns for 0.025 and 0.975; with 22 degrees of
freedom the critical values are 36.781 and 10.982 respectively.
Note that after the degrees of freedom reach 30, the chi-square table only gives values for multiples of 10
(40, 50, 60, etc.). When the exact degrees of freedom one is seeking are not given in the table, the closest
smaller value should be used.
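The tabulated values can be sanity-checked without a χ² table by simulation, since a chi-square variate with df degrees of freedom is a sum of df squared standard normals. A Monte Carlo sketch (the seed and sample count are arbitrary choices) that recovers the left-tail critical value 3.940 for 10 df at α = 0.05:

```python
import random

random.seed(0)

def chi2_sample(df):
    # a chi-square variate is a sum of df squared standard normals
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(df))

draws = sorted(chi2_sample(10) for _ in range(100_000))

# Empirical 5th percentile: area 0.05 to the LEFT (0.95 to the right),
# matching the tabulated value 3.940 for 10 degrees of freedom.
critical_left = draws[int(0.05 * len(draws))]

# The simulated mean should also be close to df = 10 (see the property below).
sim_mean = sum(draws) / len(draws)
```
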
The chi-square distribution has the following properties:
The mean of the distribution is equal to the number of degrees of freedom (v), i.e., μ = v.
F-Distribution
The F distribution is an asymmetric distribution that has a minimum value of 0, but no maximum value.
The curve reaches a peak not far to the right of 0, and then gradually approaches the horizontal axis the
larger the F value is. The F distribution approaches, but never quite touches the horizontal axis.
The F distribution has two degrees of freedom, n for the numerator, m for the denominator. For each
combination of these degrees of freedom there is a different F distribution. The F distribution is most
spread out when the degrees of freedom are small. As the degrees of freedom increase, the F distribution
is less dispersed.
Let U ~ χ²(n) and V ~ χ²(m); that is, if U and V are independent chi-squared random variables, then the statistic
(U/n) / (V/m) ~ F(n, m).
Let X1, X2, …, X_{n1} be i.i.d. random variables from N(μ1, σ1²) and Y1, Y2, …, Y_{n2} be i.i.d. random variables from
N(μ2, σ2²); then:
(S1²/σ1²) / (S2²/σ2²) ~ F(n1 − 1, n2 − 1), where
S1² = (1/(n1 − 1)) Σ (xᵢ − x̄)², with x̄ = (1/n1) Σ xᵢ
S2² = (1/(n2 − 1)) Σ (yᵢ − ȳ)², with ȳ = (1/n2) Σ yᵢ
CHAPTER TWO
2. Inference about a Population Mean and Proportion
2.1 Introduction
Recall the five stages in statistical investigations:
Collection of data
Organization of data
Presentation of data
Analysis of data and
Interpretation of data
As we have discussed so far, one of the primary objectives of a statistical analysis is to use data from a
sample to make inferences about the population from which the sample was drawn. In this chapter we will
discuss the basic procedures for making such inferences.
Definition: Inference is the process of making interpretations or conclusions from sample data for the
totality of the population. Statistical inference may be divided into two major areas: statistical estimation
and statistical hypothesis testing. For instance, suppose that a structural engineer is analyzing the tensile
strength of a component used in an automobile chassis. Since variability in tensile strength is naturally
present between the individual components because of differences in raw material batches, manufacturing
processes, and measurement procedures, the engineer is interested in estimating the mean tensile strength
of the components. In practice, the engineer will use sample data to compute a number that is in some
sense a reasonable value (or guess) of the true mean. This number is called a point estimate. Now
consider a situation in which two different reaction temperatures can be used in a chemical process, say t1
and t2. The engineer conjectures that t1 results in higher yields than does t2. Statistical hypothesis testing
is a framework for solving problems of this type. In this case, the hypothesis would be that the mean yield
using temperature t1 is greater than the mean yield using temperature t2. Notice that there is no emphasis
on estimating yields; instead, the focus is on drawing conclusions about a stated hypothesis.
2.2 Estimation
Statistical Estimation: This is one way of making inference about the population parameter where the
investigator does not have any prior notion about values or characteristics of the population parameter.
It is the process by which sample data are used to indicate the value of an unknown quantity in a
population.
There are two ways of estimation.
1) Point Estimation
2) Interval estimation
It is the procedure that results in an interval of values as an estimate for a parameter; the interval contains
the likely values of the parameter. It deals with identifying the upper and lower limits of a parameter. The
limits themselves are random variables.
Estimator and Estimate
Estimator: is the rule or random variable that helps us to approximate a population parameter.
It is any quantity calculated from the data which is used to give information about an unknown quantity in
a population.
Estimate: is a particular value which an estimator can assume. It is an indication of the value of
an unknown quantity based on observed data; that is, it is the value of an estimator obtained
from a particular sample of data and used to indicate the value of a parameter.
Example: The sample mean X̄ = Σxᵢ/n is an estimator for the population mean, and x̄ = 10 is an estimate.
3. Relatively Efficient Estimator: The estimator for a parameter with the smallest variance. This
actually compares two or more estimators for one parameter.
Definition: A point estimate: is a specific numerical value estimate of a parameter. It is a single value
computed from the observations in a sample that is used to estimate the value of the target parameter.
Definition: An interval estimate for a parameter is an interval of numbers within which we expect the
true value of the population parameter to be contained. An “interval estimator” draws inferences about a
population by estimating the value of an unknown parameter using an interval. The endpoints of the
interval are computed based on sample information.
Example: the mean weekly income of a sample of 25 households is $380-$420.
2.2.2 Sampling Distribution of the Sample Mean
Sampling Distribution
Suppose we have a finite population and we draw all possible simple random samples of size n, either
without replacement or with replacement. For each sample we calculate some statistic (the sample mean X̄,
proportion P̂, etc.). All possible values of the statistic make a probability distribution, which is called the
sampling distribution.
The sampling distribution of a statistic is the distribution of all possible values of the statistic, computed
from samples of the same size randomly drawn from the same population. When sampling a discrete,
finite population, a sampling distribution can be constructed. The number of all possible samples is
usually very large, and obviously the number of statistics (any function of the sample) will be equal to the
number of samples if one and only one statistic is calculated from each sample. In fact, in practical
situations, the sampling distribution has a very large number of values. The shape of the sampling
distribution depends upon the size of the sample, the nature of the population, and the statistic which is
calculated from all possible simple random samples.
Sampling Distribution of the Sample Mean (X̄)
Given a finite population with mean μ and variance σ², when sampling from a normally distributed
population, it can be shown that the distribution of the sample mean has the following properties.
The probability distribution of all possible values of X̄ calculated from all possible simple random samples
is called the sampling distribution of X̄. In brief, we shall call it the distribution of X̄. The mean of this
distribution is called the expected value of X̄ and is written E(X̄) or μ_X̄.
The standard deviation (standard error) of this distribution is denoted by S.E.(X̄) or σ_X̄, and the variance of
X̄ is denoted by Var(X̄) or σ²_X̄. The distribution of X̄ has some important properties:
An important property of the distribution of X̄ is that it is approximately a normal distribution when the size
of the sample is large. When the sample size n is more than 30, we call it a large sample. The shape of the
population distribution does not matter: the population may be normal or non-normal, and the distribution of
X̄ is approximately normal for n > 30. As the distribution of the random variable X̄ is (approximately)
normal, X̄ can be transformed into the standard normal variable Z, where
Z = (X̄ − μ) / (σ/√n)
The distribution of X̄ follows the t-distribution when the population is normal and n ≤ 30. Diagram (a) shows
the normal distribution and diagram (b) shows the t-distribution.
The square root of the variance of the sampling distribution is called the standard error of the mean, or
simply the standard error. The standard error (standard deviation) of X̄ is related to the standard deviation σ
of the population through the relations:
S.E.(X̄) = σ_X̄ = σ/√n. This is true when the population is infinite (N is very large) or when the sampling
is with replacement.
S.E.(X̄) = σ_X̄ = (σ/√n) · √((N − n)/(N − 1)). This is true when sampling is without replacement from a
finite population.
The above two equations between σ_X̄ and σ are true for small as well as large sample sizes.
The Central Limit Theorem: Even if the population is not normal, sample means from the population
will be approximately normal as long as the sample size is large enough.
Example 1:
Assume there is a population of size N = 4 and that the random variable X is the age of individuals, with
values of X: 18, 20, 22, 24 (years). Consider all possible samples of size n = 2 drawn with replacement,
and construct the sampling distribution of the sample mean.
μ = ΣXᵢ/N = 84/4 = 21,  σ = √(Σ(Xᵢ − μ)²/N) = √5 ≈ 2.236
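The 4² = 16 equally likely samples can be enumerated directly, confirming that the mean of the sample means equals μ and that the standard error equals σ/√n. A sketch:

```python
from itertools import product
from math import sqrt
from statistics import pstdev

ages = [18, 20, 22, 24]

# Population mean and standard deviation
mu = sum(ages) / len(ages)   # 21
sigma = pstdev(ages)         # sqrt(5) ~ 2.236

# All 16 samples of size n = 2 drawn with replacement
sample_means = [sum(s) / 2 for s in product(ages, repeat=2)]
mean_of_means = sum(sample_means) / len(sample_means)   # equals mu
se = pstdev(sample_means)                               # equals sigma / sqrt(2)
```
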
Example 2:
Draw all possible samples of size 2 without replacement from a population consisting of 3, 6, 9, 12, 15.
Form the sample distribution of sample means and verify the results:
The sampling distribution of the sample mean (X̄) and its mean and variance are:
E(X̄) = ΣX̄·P(X̄) = 90/10 = 9
Var(X̄) = ΣX̄²·P(X̄) − [ΣX̄·P(X̄)]² = 87.75 − 81 = 6.75
For the population: μ = ΣX/N = 45/5 = 9 and σ² = ΣX²/N − μ² = 495/5 − 9² = 99 − 81 = 18
Verification:
E(X̄) = μ = 9, and Var(X̄) = (σ²/n)·(N − n)/(N − 1) = (18/2)·(3/4) = 6.75
number of independent random effects and hence are approximately normally distributed by the central
limit theorem.
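Example 2 above can be checked the same way by enumerating the C(5, 2) = 10 equally likely samples drawn without replacement, which also verifies the finite population correction factor. A sketch:

```python
from itertools import combinations

pop = [3, 6, 9, 12, 15]
N, n = len(pop), 2

mu = sum(pop) / N                             # 9
sigma2 = sum((x - mu) ** 2 for x in pop) / N  # 18

# All C(5, 2) = 10 equally likely samples without replacement
means = [sum(s) / n for s in combinations(pop, n)]
e_xbar = sum(means) / len(means)                               # 9
var_xbar = sum((m - e_xbar) ** 2 for m in means) / len(means)  # 6.75

# Finite population correction: Var(Xbar) = (sigma2 / n) * (N - n) / (N - 1)
fpc_check = (sigma2 / n) * (N - n) / (N - 1)   # also 6.75
```
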
2.2.3 Point and Interval Estimation of Population Mean
Point Estimation
Another term for a statistic is point estimate, since we are estimating the parameter value. A point
estimate is a single value (or point) used to approximate a population parameter. For instance, the sum of
the observations divided by n is the point estimator used to compute the estimate of the population mean
μ; that is, X̄ = Σxᵢ/n is a point estimator of the population mean. A point estimate of the population mean μ
is the sample mean x̄, which varies from sample to sample. A drawback of the point estimate is that it fails
to make a probability statement about how close the estimate X̄ is to the population parameter μ.
Confidence Interval Estimation of the Population Mean
Although X̄ possesses nearly all the qualities of a good estimator, because of sampling error, we know
that it's not likely that our sample statistic will be equal to the population parameter, but instead will fall
into an interval of values. We will have to be satisfied knowing that the statistic is "close to" the
parameter. That leads to the obvious question, what is "close"?
We can phrase the latter question differently: How confident can we be that the value of the statistic falls
within a certain "distance" of the parameter? Or, what is the probability that the parameter's value is
within a certain range of the statistic's value? This range is the confidence interval.
A confidence interval allows us to estimate the unknown parameter, and provides a margin for error
indicating how good our estimate is. The confidence interval method gives an idea of the actual numerical
value of a parameter by specifying a range of possible values for the parameter and the degree of our
confidence that the unknown population parameter lies within that range.
The confidence level is the probability that the value of the parameter falls within the range specified by
the confidence interval surrounding the statistic. There are different cases to be considered to construct
confidence intervals.
Case 1: If the population is normal with known variance
Recall the Central Limit Theorem, which applies to the sampling distribution of the mean of a sample.
Consider samples of size n drawn from a population whose mean is μ and standard deviation is σ, with
replacement and order important. The population can have any frequency distribution.
The sampling distribution of X̄ will have a mean μ_X̄ = μ and a standard deviation σ_X̄ = σ/√n, and
approaches a normal distribution as n gets large. This allows us to use the normal distribution curve for
computing confidence intervals.
If X ~ ( , ), ℎ ~ , = ~ (0, 1)
√
=> = ±Z = ± , ℎ .
√
=> =Z
√
For the interval estimator to be good the error E should be small.
How can E be made small (how do we minimize E)?
· By making n large
· By having small variability (small σ)
· By taking Z small
- The smaller the interval, the more precise the estimate.
- Let α be the probability that the parameter lies outside the interval.
- To obtain the value of Z, we attach this to a probability statement. That is, there is an area of
size 1 − α such that
X̄ − Z(α/2)·σ/√n < μ < X̄ + Z(α/2)·σ/√n
The (1 − α)·100% confidence interval estimator for μ is the random interval
(X̄ − Z(α/2)·σ/√n, X̄ + Z(α/2)·σ/√n)
The probability is (1 − α) that the above random interval includes the true value of μ; a proportion (1 − α)
of all random samples will produce an interval that contains μ (the (1 − α) confidence interval estimate for μ).
Note that a computed confidence interval estimate contains no random quantities at all. The statement is either
absolutely certain to be true or absolutely certain to be false (depending on the values of μ, σ, x̄, n and α).
Case 2: If the sample size is large and the variance is unknown
Usually σ is not known; in that case we estimate σ by its point estimator S.
=> (X̄ − Z(α/2)·S/√n, X̄ + Z(α/2)·S/√n) is a (1 − α)100% confidence interval for μ.
Here are the z values corresponding to the most commonly used confidence levels.
100(1−α)%   α      α/2     Z(α/2)
90          0.10   0.05    1.645
95          0.05   0.025   1.96
99          0.01   0.005   2.58
Case 3: If the sample size is small and the population variance σ² is not known.
Sampling where σ is unknown: in this case we cannot assume that our sampling distribution follows a
normal distribution. Now we must use the t-distribution (sometimes called Student's t-distribution) to
make probability statements.
t-distribution – a family of distributions indexed by degrees of freedom. Each distribution with its
degrees of freedom has its own features. As the degrees of freedom (d.f.) go up, the distributions get closer
and closer to the standard normal, because the variability is reduced. We read the table the same way as
the standard normal, with the only difference being that we now have df, which is (n − 1).
t = (X̄ − μ)/(S/√n) has a t distribution with n − 1 degrees of freedom; df = n − 1 is the number of observations that are free to vary.
The only difference from before is that we don’t use our population standard deviation σ because it is not
known. If one thinks about it, if we don’t know the mean, it would be impossible to know the standard
deviation since the mean is required to get this value.
Two-tailed confidence intervals for μ are given by:
=> (X̄ − t(α/2)·S/√n, X̄ + t(α/2)·S/√n) is a 100(1−α)% confidence interval for μ.
The unit of measurement of the confidence interval is the standard error. This is just the standard deviation
of the sampling distribution of the statistic.
Definition: The confidence coefficient is the proportion of times that a confidence interval encloses the
true value of the population parameter if the confidence interval procedure is used repeatedly a very large
number of times.
Example 1:
From a normal sample of size 25 a mean of 32 was found. Given that the population standard deviation is
4.2, find
a) A 95% confidence interval for the population mean.
b) A 99% confidence interval for the population mean.
Solution: x̄ = 32, σ = 4.2, 1 − α = 0.95 => α = 0.05, α/2 = 0.025 => Z(0.025) = 1.96
=> The confidence interval is x̄ ± Z(α/2)·σ/√n
a) = 32 ± 1.96 × 4.2/√25
= 32 ± 1.65
= (30.35, 33.65)
b) x̄ = 32, σ = 4.2, 1 − α = 0.99 => α = 0.01, α/2 = 0.005 => Z(0.005) = 2.58
=> The confidence interval is x̄ ± Z(α/2)·σ/√n
= 32 ± 2.58 × 4.2/√25
= 32 ± 2.17
= (29.83, 34.17)
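The arithmetic in Example 1 can be checked with a short script. This is a minimal sketch: the helper name `z_interval` is ours, and the z values are the table values used above.

```python
import math

def z_interval(xbar, sigma, n, z):
    """Two-sided confidence interval for the mean when the population sigma is known."""
    e = z * sigma / math.sqrt(n)   # margin of error E = z * sigma / sqrt(n)
    return (xbar - e, xbar + e)

# Example 1 from the notes: n = 25, x-bar = 32, sigma = 4.2
lo95, hi95 = z_interval(32, 4.2, 25, 1.96)   # 95% CI
lo99, hi99 = z_interval(32, 4.2, 25, 2.58)   # 99% CI
print(round(lo95, 2), round(hi95, 2))  # 30.35 33.65
print(round(lo99, 2), round(hi99, 2))  # 29.83 34.17
```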
Example 2:
A drug company is testing a new drug which is supposed to reduce blood pressure. From the six people
who are used as subjects, it is found that the average drop in blood pressure is 2.28 points, with a standard
deviation of .95 points. What is the 95% confidence interval for the mean change in pressure?
Solution: Since n = 6 is small and σ is unknown, use the t distribution with df = n − 1 = 5: t(0.025, 5) = 2.571.
x̄ ± t(α/2)·S/√n = 2.28 ± 2.571 × 0.95/√6 = 2.28 ± 1.00 = (1.28, 3.28).
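Example 2 calls for a t interval (n = 6 is small and σ is unknown). A minimal numeric sketch follows; the helper name `t_interval` is ours, and the critical value t(0.025, df = 5) = 2.571 is supplied as a constant taken from a t table.

```python
import math

def t_interval(xbar, s, n, t_crit):
    """CI for the mean with s estimated from a small sample; t_crit comes from a t table."""
    e = t_crit * s / math.sqrt(n)  # margin of error E = t * s / sqrt(n)
    return (xbar - e, xbar + e)

# n = 6, x-bar = 2.28, s = 0.95; t(0.025, df = 5) = 2.571 (table value)
lo, hi = t_interval(2.28, 0.95, 6, 2.571)
print(round(lo, 2), round(hi, 2))  # 1.28 3.28
```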
Confidence interval for a single population proportion
The sample proportion satisfies 0 ≤ p̂ ≤ 1.
The number of successes X = np̂ has a binomial distribution, but p̂ can be approximated by a normal distribution when np(1 − p) > 5.
Properties of the sample proportion
Construction of the sampling distribution of the sample proportion is done in a manner similar to that of
the mean and the difference between two means. When the sample size is large, the distribution of the
sample proportion is approximately normally distributed because of the central limit theorem.
Mean and variance
The mean of the distribution of p̂ will be equal to the true population proportion, E(p̂) = P, and the
variance of the distribution will be equal to Var(p̂) = P(1 − P)/n (where P = population proportion).
V(p̂) = V(X/n) = (1/n²)·V(X) = P(1 − P)/n
Therefore, for sufficiently large n, np and nq (namely, np > 10 and nq > 10), p̂ ~ N(P, P(1 − P)/n)
Compare this with X̄ ~ N(μ, σ²/n), for which the corresponding confidence interval has endpoints
x̄ ± Z(α/2)·(σ/√n); the endpoints for p are p̂ ± Z(α/2)·√(p̂(1 − p̂)/n).
Example 1: From a random sample of one thousand silicon wafers, 750 pass a quality control test. Find a
99% confidence interval estimate for p (the true proportion of wafers in the population that are good).
n = 1000 and x = 750
=> p̂ = x/n = 750/1000 = 0.75
=> q̂ = 1 − p̂ = 0.25
α/2 = 0.005
Z(0.005) ≈ 2.576
Endpoints of the CI: p̂ ± Z(α/2)·√(p̂q̂/n) = 0.75 ± 2.576·√(0.75 × 0.25/1000) = 0.75 ± 0.03527.
Therefore the 99% confidence interval estimate for p is 71.5% ≤ p ≤ 78.5% correct to three significant
figures.
A more precise version of the confidence interval yields a very similar result here; the computation is omitted.
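The silicon-wafer interval can be reproduced with a short script. A minimal sketch; the helper name `prop_interval` is ours, and z = 2.576 is the table value used above.

```python
import math

def prop_interval(x, n, z):
    """Large-sample CI for a proportion: p-hat +/- z * sqrt(p-hat(1 - p-hat)/n)."""
    p_hat = x / n
    e = z * math.sqrt(p_hat * (1 - p_hat) / n)  # margin of error
    return (p_hat - e, p_hat + e)

# 750 passes out of 1000 wafers, 99% confidence (z = 2.576)
lo, hi = prop_interval(750, 1000, 2.576)
print(round(lo, 3), round(hi, 3))  # 0.715 0.785
```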
Now we want to extend our statements about a population parameter: we use our knowledge of sampling
and sampling distributions to decide whether to reject, or fail to reject, a claim that a population has
certain characteristics.
Hypothesis Testing: This is another way of making inference about a population parameter, where the
investigator has a prior notion about the value of the parameter. For instance, we may think that the average
income level in a certain community is around $45,000; if we take an appropriate random sample and
find that the income is much lower, we now have a basis to declare that notion false (with a certain level of
confidence → think of and recall confidence intervals).
Null hypothesis: It is the hypothesis of no difference, stated about the population parameter and assumed true until the evidence indicates otherwise. Usually denoted by H0.
Alternative hypothesis: It is the hypothesis that is adopted when the null hypothesis has to be rejected.
It is the hypothesis of difference.
Usually denoted by H1 or Ha.
Types and size of errors
No matter which hypothesis represents the claim, always begin the hypothesis test assuming that the
equality condition in the null hypothesis is true. (Note: This may or may not be the claim)
At the end of the test, one of two decisions will be made:
reject the null hypothesis
fail to reject the null hypothesis
Because the decision is based on a sample, there is the possibility of making the wrong decision.
Testing hypotheses is based on sample data, which may involve sampling and non-sampling errors.
The following table gives a summary of the possible results of any hypothesis test:

Decision             H0 is true          H0 is false
Reject H0            Type I error (α)    Correct decision
Fail to reject H0    Correct decision    Type II error (β)
In practice we set α at some value and design a test that minimizes β. This is because a type I error is
often considered to be more serious, and therefore more important to avoid, than a type II error.
Since rejecting a null hypothesis carries a chance of committing a type I error, we make α small by
selecting an appropriately small significance level.
Generally, we do not control β, even though it is generally greater than α. However, when failing to
reject a null hypothesis, the risk of error is unknown.
Example: The USDA limit for salmonella contamination for chicken is 20%. A meat inspector reports
that the chicken produced by a company exceeds the USDA limit. You perform a hypothesis test to
determine whether the meat inspector’s claim is true. When will a type I or type II error occur? Which is
more serious?
Solution:
Let p represent the proportion of chicken that is contaminated.
Hypotheses: H0: p ≤ 0.2 versus Ha: p > 0.2
Chicken meets USDA limits: H0: p ≤ 0.20; chicken exceeds USDA limits: Ha: p > 0.20.
A type I error is rejecting H0 when it is true.
The actual proportion of contaminated chicken is less than or equal to 0.2, but you decide to reject H0.
A type II error is failing to reject H0 when it is false.
The actual proportion of contaminated chicken is greater than 0.2, but you do not reject H0.
General steps in hypothesis testing:
1. The first step in hypothesis testing is to specify the null hypothesis (H0) and the alternative hypothesis
(H1).
2. The next step is to select a significance level, α,
Level of significance is your maximum allowable probability of making a type I error. By setting the
level of significance at a small value, you are saying that you want the probability of rejecting a true
null hypothesis to be small.
3. Identify the sampling distribution of the estimator.
4. The fourth step is to calculate a statistic analogous to the parameter specified by the null hypothesis.
5. Identify the critical region (rejection region). It will tell us to reject the null hypothesis if the test
statistic falls in the rejection area, and to accept it if it falls in the acceptance region.
To use a rejection region to conduct a hypothesis test, calculate the standardized test statistic, z. If the
standardized test statistic
Is in the rejection region, then reject H0.
Is not in the rejection region, then fail to reject H0.
6. Make the decision.
Decision             Claim is H0                               Claim is H1
Reject H0            There is enough evidence to reject        There is enough evidence to support
                     the claim                                 the claim
Fail to reject H0    There is not enough evidence to reject    There is not enough evidence to support
                     the claim                                 the claim
Write a statement to interpret the decision in the context of the original claim. If we fail to reject H0 we
will say that the data do not provide sufficient evidence to cause rejection.
If H0 is rejected we say that the data are not compatible with H0 and support the alternative hypothesis (Ha).
7. Conclusion or Summarization of the result.
Where Z_cal = (X̄ − μ0)/(σ/√n)
Case 2: When sampling is from a normal distribution with unknown σ² and small sample size
The relevant test statistic is t_cal = (X̄ − μ0)/(S/√n) ~ t with n − 1 degrees of freedom.
After specifying α we have the following regions on the Student t-distribution corresponding to the
above three hypotheses.
Summary Table for Decision Rule
Where t_cal = (X̄ − μ0)/(S/√n)
Case 3: When sampling is from a non-normally distributed population or a population whose
functional form is unknown.
If the sample size is large one can perform a test of hypothesis about the mean by using:
Z_cal = (X̄ − μ0)/(σ/√n), if σ² is known
Z_cal = (X̄ − μ0)/(S/√n), if σ² is unknown
Step 4: make the decision. Since the test value, +1.59, is less than the critical value, +1.65, and is not in
the critical region, the decision is to not reject the null hypothesis. This test is summarized in the following
figure:
Step 5: summarize the result. There is not enough evidence to support the claim that the mean time is
greater than 29 days.
Example 2: An industrial company claims that the mean pH level of the water in a nearby river is 6.8. You
randomly select 19 water samples and measure the pH of each. The sample mean and standard deviation
are 6.7 and 0.24, respectively. Is there enough evidence to reject the company’s claim at α = 0.05?
Assume the population is normally distributed.
Solution:
H0: μ = 6.8 (claim) versus Ha: μ ≠ 6.8
α = 0.05, df = 19 − 1 = 18, t(α/2)(18) = t(0.025)(18) = ±2.101
Test statistic: t = (x̄ − μ0)/(S/√n) = (6.7 − 6.8)/(0.24/√19) = −1.816
Rejection region: |t| > 2.101. Since |−1.816| < 2.101, the test statistic does not fall in the rejection
region: fail to reject H0. At the 5% level of significance there is not enough evidence to reject the
company's claim.
Example 4: The mean life time of a sample of 16 fluorescent light bulbs produced by a company is
computed to be 1570 hours. The population standard deviation is 120 hours. Suppose the hypothesized
value for the population mean is 1600 hours. Can we conclude that the life time of light bulbs is
decreasing? (Use α = 0.05 and assume the normality of the population)
Solution: Let μ be the population mean; μ0 = 1600
Step 1: Identify the appropriate hypothesis
H0: μ = 1600 versus Ha: μ < 1600
Step 2: Select the level of significance, α = 0.05 (given), −Z(0.05) = −1.645
Step 3: Select an appropriate test statistic
The Z-statistic is appropriate because the population variance is known.
Step 4: Identify the critical region.
The critical region is Z_cal < −Z(0.05) = −1.645 => (−1.645, ∞) is the acceptance region.
Step 5: Computation: Z_cal = (x̄ − μ0)/(σ/√n) = (1570 − 1600)/(120/√16) = −30/30 = −1
Step 6: Decision
Accept H0, since Z_cal = −1 is in the acceptance region.
Step 7: Conclusion
At 5% level of significance, we have no evidence to say that the life time of light bulbs is
decreasing, based on the given sample data.
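The z statistic in Example 4 can be computed with a few lines. A minimal sketch; the helper name `z_test` is ours, and −1.645 is the table critical value used above.

```python
import math

def z_test(xbar, mu0, sigma, n):
    """One-sample z statistic: (x-bar - mu0)/(sigma/sqrt(n))."""
    return (xbar - mu0) / (sigma / math.sqrt(n))

# Example 4: x-bar = 1570, mu0 = 1600, sigma = 120, n = 16; left-tailed test
z = z_test(1570, 1600, 120, 16)
print(round(z, 2), z < -1.645)  # -1.0 False -> fail to reject H0
```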
Example 5: Suppose it is known that the average cholesterol level in children is 175 mg/dl. A group of
men who have died from heart disease within the past year are identified, and the cholesterol levels of
their offspring are measured. Suppose the mean cholesterol level of 10 children whose fathers died from
heart disease is 200mg/dl, with standard deviation 50mg/dl. Test the hypothesis that the mean cholesterol
level is higher in this group than the general population. Use the 0.05 level of significance.
Solution: Let μ be the population mean; μ0 = 175
Step 1: Identify the appropriate hypothesis: H0: μ = 175 versus Ha: μ > 175 (claim)
Step 2: Select the level of significance, α = 0.05 (given)
Step 3: Select an appropriate test statistic
The t-statistic is appropriate because the population variance is not known and the sample size is small.
Step 4: Identify the critical region.
Here we have one critical region since we have a one-tailed hypothesis.
The critical region is t_cal > t(0.05)(9) = 1.833
=> (−∞, 1.833) is the acceptance region.
Step 5: Computation: t_cal = (x̄ − μ0)/(S/√n) = (200 − 175)/(50/√10) = 1.58
Step 6: Decision
Accept H0, since t_cal = 1.58 is in the acceptance region.
Step 7: Conclusion
At 5% level of significance, we have no evidence to say that the mean cholesterol level is higher in
children whose fathers have died from heart disease within the past year than the general population.
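The t statistic in Example 5 can be verified the same way. A minimal sketch; the helper name `t_test` is ours, and 1.833 is the table value t(0.05, 9) used above.

```python
import math

def t_test(xbar, mu0, s, n):
    """One-sample t statistic: (x-bar - mu0)/(s/sqrt(n)), with df = n - 1."""
    return (xbar - mu0) / (s / math.sqrt(n))

# Example 5: x-bar = 200, mu0 = 175, s = 50, n = 10; right-tailed test
t = t_test(200, 175, 50, 10)
print(round(t, 2), t > 1.833)  # 1.58 False -> fail to reject H0
```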
Example 6: It is hoped that a newly developed pain reliever will more quickly produce perceptible
reduction in pain to patients after minor surgeries than a standard pain reliever. The standard pain reliever
is known to bring relief in an average of 3.5 minutes. To test whether the new pain reliever works more
quickly than the standard one, 50 patients with minor surgeries were given the new pain reliever and their
times to relief were recorded. The experiment yielded a sample mean of 3.1 minutes and a sample
standard deviation of 1.5 minutes. Is there sufficient evidence in the sample to indicate, at the 5% level of
significance, that the newly developed pain reliever does deliver perceptible relief more quickly? (Use α =
0.05 and assume the normality of the population)
Solution: Let μ be the population mean; μ0 = 3.5
Step 1: Identify the appropriate hypothesis: H0: μ = 3.5 versus Ha: μ < 3.5 (claim)
Step 2: Select the level of significance, α = 0.05 (given)
Step 3: Select an appropriate test statistic
The Z-statistic is appropriate because the sample size is large (n = 50), so the sample standard deviation
may be used in place of σ.
Step 4: Identify the critical region.
The critical region is Z_cal < −Z(0.05) = −1.645
=> (−1.645, ∞) is the acceptance region.
Step 5: Computation: x̄ = 3.1, S = 1.5
=> Z_cal = (x̄ − μ0)/(S/√n) = (3.1 − 3.5)/(1.5/√50) = −1.886
Step 6: Decision, Reject H0, since the test statistic falls in the rejection region, the decision is to
reject H0.
Step 7: Conclusion
The data provide sufficient evidence, at the 5% level of significance, to conclude that the average time
until patients experience perceptible relief from pain using the new pain reliever is smaller than the
average time for the standard pain reliever.
Exercise: It is known from a pharmacological experiment that rats fed a particular diet over a certain
period gain an average of 40 gms in weight. A new diet was tried on a sample of 20 rats, yielding a weight
gain of 43 gms with variance 7 gms². Test the hypothesis that the new diet is an improvement, assuming
normality.
a) State the appropriate hypothesis
b) What is the appropriate test statistic? Why?
c) Identify the critical region(s)
d) On the basis of the given information test the hypothesis and make conclusion.
Testing Hypothesis Using the p-Value Method: in this case we use the p-value instead of the test-statistic
method. It is almost identical to the above method, but instead of comparing a test statistic to some critical
value we simply compare the p-value of the test statistic to α.
P-value – the probability, assuming the null hypothesis is true, that the test statistic would take on a value
as extreme as, or more extreme than, that actually observed. For a one-sided test this is the one-tail area
beyond the test statistic; for a two-sided test it is twice that area. It is the smallest value of α for which H0
can be rejected, so it gives a more precise statement about the probability of rejecting H0 when it is true
than the alpha level does; instead of merely saying the test statistic is significant or not, we report the
exact probability of rejecting H0 when it is true.
The p-value depends on the nature of the test.
The p-value method and the test-statistic method should give the same results; if they do not, a mistake
has been made. If we get a p-value that is smaller than α, we say that the data are significant at level α.
Decision rule when using a p-value
If p-value ≤ α, reject the null hypothesis.
If p-value > α, do not reject the null hypothesis.
Example: a researcher claims that the average wind speed in a certain city is 8 miles per hour. A sample of
32 days has an average wind speed of 8.2 miles per hour. The standard deviation of the population is 0.6
mile per hour. At α =0.05, is there enough evidence to reject the claim? Use the p-value method.
Solution:
Step 1: state the hypothesis and identify the claim
H0: μ = 8 (claim) versus Ha: μ ≠ 8
Step 2: Compute the test value:
Z = (x̄ − μ0)/(σ/√n) = (8.2 − 8)/(0.6/√32) = 1.89
Step 3: Find the p-value. Using the standard normal table, find the area corresponding to z = 1.89; it is
0.9706. Subtract this value from 1.0000:
1.0000 − 0.9706 = 0.0294. Since this is a two-tailed test, the area of 0.0294 must be doubled to get the
p-value, i.e. 2(0.0294) = 0.0588.
Step 4: make the decision. The decision is to not reject the null hypothesis, since the p-value is greater
than 0.05. See the following figure:
Step 5: summarize the result, there is not enough evidence to reject the claim that the average wind speed
is 8 miles per hour.
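The table lookup in Step 3 can be replaced by the standard normal CDF, which is expressible through the error function. A minimal sketch of the wind-speed computation; the helper names are ours.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def two_sided_p(z):
    """Two-tailed p-value for an observed z statistic."""
    return 2 * (1 - normal_cdf(abs(z)))

# Wind-speed example: x-bar = 8.2, mu0 = 8, sigma = 0.6, n = 32
z = (8.2 - 8) / (0.6 / math.sqrt(32))
p = two_sided_p(z)
print(round(z, 2))  # z is about 1.89; p is about 0.059 > 0.05 -> fail to reject H0
```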
Nature of the test
Three types of hypothesis tests
left-tailed test
right-tailed test
two-tailed test
The type of test depends on the region of the sampling distribution that favors a rejection of . This
region is indicated by the alternative hypothesis. In other words, we are testing the alternative hypothesis.
Left-tailed test: H0: μ = μ0 versus Ha: μ < μ0
Right-tailed test: H0: μ = μ0 versus Ha: μ > μ0
Two-tailed test: H0: μ = μ0 versus Ha: μ ≠ μ0
Hypothesis Testing About a Population Proportion
A hypothesis test involving a population proportion can be considered a binomial experiment when
there are only two outcomes and the probability of a success does not change from trial to trial.
Since a normal distribution can be used to approximate the binomial distribution when np ≥ 5 and nq ≥ 5,
the standard normal distribution can be used to test hypotheses for proportions. The estimate of p from a
sample of size n is the sample proportion, ̂ = x/n, where x is the number of successes in the sample.
Using the normal approximation, the appropriate statistic to perform inferences on p is
Z = (p̂ − p0) / √(p0(1 − p0)/n)
Under the conditions for binomial distributions, this statistic has the standard normal distribution,
assuming sufficient sample size for the approximation to be valid.
When you use proportions, you state your null and alternative hypotheses in terms of the population
parameter (p), using your sample statistic (p̂) as an estimator.
The null hypothesis will be a statement about the parameter that includes the equality.
Alpha (α) determines the size of the rejection region.
Test can be one or two-tailed, depending on how the alternative hypothesis is formulated. The most
important requirement to perform a hypothesis test for proportion is to assume that the distribution is
normal, and in this case, n has to be sufficiently large such that np > 5 and n(1-p) > 5 .
These are the steps for a hypothesis test:
1. Formulate the hypotheses (one-tailed or two-tailed):
H0: p = p0 versus Ha: p ≠ p0 (or a one-sided alternative), where p0 is the hypothesized value.
2. Compute the test statistic Z, using the α provided by the problem; Z is compared to the appropriate
critical value(s) from the normal distribution, or a p-value is calculated from the normal distribution.
3. State the decision rule:
If |Z| > Z(α/2), reject H0 (use the decision rule that best fits the alternative hypothesis).
Note that we do not use the t distribution here because the variance is not estimated as a sum of squares
divided by degrees of freedom. Of course, the use of the normal distribution is an approximation, and it is
generally recommended to be used only if np ≥ 5 and n(1 − p) ≥ 5.
Example 1: A research center claims that 25% of college graduates think a college degree is not worth the
cost. You decide to test this claim and ask a random sample of 200 college graduates whether they think a
college degree is not worth the cost. Of those surveyed, 21% reply yes. At α = 0.10 is there enough
evidence to reject the claim?
Solution: Verify that np ≥ 5 and nq ≥ 5: np = 200(0.25) = 50 and nq = 200(0.75) = 150
H0: p = 0.25 (claim) versus Ha: p ≠ 0.25, α = 0.10
Rejection region: |Z| > Z(0.05) = 1.645
Test statistic:
Z = (p̂ − p) / √(p(1 − p)/n) = (0.21 − 0.25) / √(0.25(1 − 0.25)/200) = −1.31
Decision: Fail to reject H0, since |−1.31| < 1.645.
At the 10% level of significance, there is not enough evidence to reject the claim that 25% of college
graduates think a college degree is not worth the cost.
Example 2: Suppose that the national smoking rate among men is 25% and we want to study the smoking
rate among men in Addis Ababa. Let p be the proportion of Addis Ababa men who smoke. Test the null
hypothesis that the smoking prevalence in Addis Ababa is the same as the national rate. Suppose that we
plan to take a sample of size n = 100 and in the sample, 15 are smokers.
Does this evidence indicate that the true proportion of smokers is significantly different from the national
rate? Test at significance level α = 0.05.
Solution: H0: p = 0.25 versus Ha: p ≠ 0.25
Compute Z = (p̂ − p0)/√(p0(1 − p0)/n) = (0.15 − 0.25)/√(0.25(1 − 0.25)/100) = −0.10/0.0433 = −2.3094
Z(α/2) = Z(0.025) = 1.96
|Z| = 2.3094 > Z(α/2) = 1.96, so reject H0.
Conclusion: At 5% level of significance this indicates that the proportion of smokers in Addis Ababa is
significantly different from the national rate.
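The smoking-rate test reduces to a few lines of arithmetic. A minimal sketch; the helper name `prop_z_test` is ours, and 1.96 is the two-tailed critical value used above.

```python
import math

def prop_z_test(p_hat, p0, n):
    """z statistic for a proportion: (p-hat - p0) / sqrt(p0(1 - p0)/n)."""
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Smoking example: 15 smokers out of n = 100, p0 = 0.25, two-tailed at alpha = 0.05
z = prop_z_test(0.15, 0.25, 100)
print(round(z, 4), abs(z) > 1.96)  # -2.3094 True -> reject H0
```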
Exercise 1: A telephone company representative estimates that 40% of its customers have call-waiting
service. To test this hypothesis, she selected a sample of 100 customers and found that 37% had call
waiting. At α = 0.01, is there enough evidence to reject the claim?
Exercise 2: Suppose it is claimed that in a very large batch of components, about 10% of items contain
some form of defect. It is proposed to check whether this proportion has increased, and this will be done
by drawing randomly a sample of 150 components. In the sample, 20 are defective. Does this evidence
indicate that the true proportion of defective components is significantly larger than 10%? Test at
significance level α = 0.05.
Sample Size Determination
The error between the estimator and the quantity it is supposed to estimate: (X̄ − μ)/(σ/√n) is a random
variable having approximately the standard normal distribution, so we can assert with probability 1 − α
that the inequality
−Z(α/2) ≤ (X̄ − μ)/(σ/√n) ≤ Z(α/2)
holds. The error |X̄ − μ| will be less than E = Z(α/2)·σ/√n with probability 1 − α.
Suppose that we want to use the mean of a large random sample to estimate the mean of a population, and
want to be able to assert with probability 1 − α that the error will be at most some prescribed quantity E.
As before, we get:
n = (Z(α/2)·σ/E)²
The formula for determining the sample size in the case of a proportion follows from the confidence interval
C.I. = p̂ ± Z(α/2)·√(p̂(1 − p̂)/n)
To construct a confidence interval about a proportion, you must use the maximum error of the estimate,
which is E = Z(α/2)·√(p̂(1 − p̂)/n)
The minimum sample size needed for an interval estimate of a population proportion is:
n = p̂(1 − p̂)·(Z(α/2)/E)²
Example 1:
The Dairy Farm Company wants to estimate the proportion of customers that will purchase its new
broccoli-flavored ice cream. The company wants to be 90% confident that they have estimated the
population proportion (p) to within .03. How many customers should they sample?
Solution:
The desired margin of error is E = 0.03. The company wants to be 90% confident, so Z(α/2) = 1.645; the
required sample size is:
n = p̂(1 − p̂)·(Z(α/2)/E)² = p̂(1 − p̂)·(1.645/0.03)²
Since the sample has not yet been taken, the sample proportion p̂ is still unknown.
We proceed using either one of the following two methods:
Method 1:
There is no knowledge about the value of p
Let p = 0.5. This results in the largest possible n needed for a 90% confidence interval of the form
p̂ ± 0.03.
If the true proportion does not equal 0.5, the actual margin of error will be narrower than 0.03 with the
n obtained by the formula:
n = (1.645/0.03)²(0.5)(0.5) = 751.67, round up to 752
Method 2:
There is some idea about the value of p (say p ~ 0.2)
Use the value of p to calculate the sample size
n = (1.645/0.03)²(0.2)(0.8) = 481.07, round up to 482
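Both methods can be sketched in a few lines. The helper name `n_for_proportion` is ours; rounding up is done with `math.ceil`, matching the "round up" rule used above.

```python
import math

def n_for_proportion(z, e, p_hat=0.5):
    """Minimum n for estimating p to within margin e: n = p-hat(1 - p-hat)(z/e)^2, rounded up.
    Defaulting p_hat to 0.5 gives the most conservative (largest) n."""
    return math.ceil(p_hat * (1 - p_hat) * (z / e) ** 2)

# Ice-cream example: 90% confidence (z = 1.645), margin E = 0.03
print(n_for_proportion(1.645, 0.03))       # 752  (no prior guess, p = 0.5)
print(n_for_proportion(1.645, 0.03, 0.2))  # 482  (prior guess p = 0.2)
```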
CHAPTER THREE
3. Inference about the Difference between Two Populations
Learning Objectives
To understand the logical framework for estimating the difference between the means of two distinct
populations and performing tests of hypotheses concerning those means.
To learn how to construct a confidence interval for the difference in the means of two distinct
populations using independent and paired samples.
To learn how to perform a test of hypotheses concerning the difference between the means of two
distinct populations using independent and paired samples.
To learn how to perform a test of hypotheses concerning the difference between the proportions of two
distinct populations using independent and paired samples.
Make decisions about sample size in comparing means and proportions
3. 1 Inference About the Difference Between Two Population Means
3.1.1 Introduction
Inferences concerning differences between two population parameters are considered comparative
studies. Comparative studies are designed to discover and evaluate differences between groups or between
treatments. In a comparative study, experiments are conducted to collect informative data, and conclusions
are drawn based on the experimental evidence.
In studies involving comparison of two groups there are two ways of taking the samples and conducting
the experiment:
i. Paired samples and
ii. Independent samples
Definition:
a. Two samples are said to be paired if each data point in the first sample is matched and related to a
unique data point in the second sample. Pairs of similar individuals (observations) are selected. In an
experiment, one treatment is applied to one member of each pair and another treatment is applied to the
other member. A common application occurs in self-pairing, where a single individual is measured on
two occasions. The aim of pairing is to make the comparison more accurate by having the members of
each pair as alike as possible except for the differences in treatment that the investigator deliberately
introduces.
b. Two samples are independent if the data points in one sample are unrelated to the data points in the
second sample. This case arises when we wish to compare two populations and have drawn a sample from
each quite independently. Independent samples are widely used when there is no suitable basis for
pairing.
Motivating example:
Consider two populations; the two population descriptions are as follows:
Figure 3.1 Comparing Two Population Means
The hypotheses are about μ1 − μ2, such as
H0: μ1 − μ2 = D0
H1: μ1 − μ2 ≠ D0
for a two-tailed test, where D0 is the hypothesized difference in the means. Often, D0 is zero.
For a one-tailed test with rejection on the left, the hypotheses will be
H0: μ1 − μ2 = D0
H1: μ1 − μ2 < D0
and for rejection on the right tail,
H0: μ1 − μ2 = D0
H1: μ1 − μ2 > D0
3.1.2 Sampling Distribution of the Difference Between Two Means
It often becomes important to compare two population means. Knowledge of the sampling distribution of
the difference between two means is useful in studies of this type. It is generally assumed that the two
populations are normally distributed.
Sampling distribution of X̄1 − X̄2
Plotting mean sample differences against frequency gives a normal distribution with mean equal to
μ1 − μ2, which is the difference between the two population means.
Variance
The variance of the distribution of the sample differences is equal to σ1²/n1 + σ2²/n2. Therefore, the
standard error of the difference between two means is √(σ1²/n1 + σ2²/n2).
For instance, we may be testing the difference between the average strengths of pins produced from two
different raw materials but using the same machinery and process. It is reasonable in this case to assume
equal variances.
Assumptions:
1. The samples from the two populations were drawn independently.
2. The population variances/standard deviations are equal.
3. The populations are both normally distributed.
From sampling theory, we note that
E[X̄1 − X̄2] = μ1 − μ2
By the Central Limit Theorem, as n1 and n2 both increase, X̄1 − X̄2 will approach the normal distribution.
If the common variance is unknown, it is estimated by the pooled estimator of the population variance
given by:
S_p² = ((n1 − 1)S1² + (n2 − 1)S2²) / (n1 + n2 − 2)
Decision rule: we shall reject H0 at the α level of significance if:
|Z| > Z(α/2) for H0: μ1 = μ2 versus H1: μ1 ≠ μ2
Confidence interval: A 100(1 − α)% confidence interval for the mean difference μ1 − μ2 is given by:
(X̄1 − X̄2) ± Z(α/2)·σ·√(1/n1 + 1/n2)
If σ is unknown, T = ((X̄1 − X̄2) − (μ1 − μ2)) / (S_p·√(1/n1 + 1/n2)) has a t-distribution with
n1 + n2 − 2 degrees of freedom, and the interval is
(X̄1 − X̄2) ± t(α/2, n1 + n2 − 2)·S_p·√(1/n1 + 1/n2)
( ) ( ) ( )( . ) ( )( . ) . .
Where:S = = = = 21.2165
( ) ( ) ( . . ) .
t= = = .
= 2.6563
. ∗
Critical region:
With α = 0.05 and df = 23, the critical value of t is 1.7139. We reject H0 if t > 1.7139.
Reject H0 because 2.6563 > 1.7139 (p-value = 0.014).
Conclusion: At a significance level of 5% there is sufficient evidence that smokers, in general, have
greater lung damage than do non-smokers; i.e., the mean lung damage of smokers is significantly higher
than the mean damage of non-smokers.
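The pooled two-sample t statistic can be sketched as below. The summary numbers here are hypothetical (the raw data of the smokers example are not reproduced in these notes), and the helper name `pooled_t` is ours.

```python
import math

def pooled_t(x1, x2, s1, s2, n1, n2):
    """Pooled two-sample t statistic (equal-variance assumption); returns (t, df)."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)  # pooled variance S_p^2
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))                      # standard error of X1bar - X2bar
    return (x1 - x2) / se, n1 + n2 - 2

# Hypothetical summary statistics for two independent groups
t, df = pooled_t(x1=85, x2=81, s1=5, s2=4, n1=10, n2=12)
print(round(t, 2), df)  # 2.09 20
```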
Case 2: Unequal variances but σ1² and σ2² are known
Assumptions:
1. The samples from the two populations were drawn independently.
2. The population variances/standard deviations are NOT equal.
3. The populations are both normally distributed.
When σ1² and σ2² are known, we can conduct a z-test if both n1 and n2 are more than 30 and the
two populations are normal. The z statistic for testing H0: μ1 = μ2 is given by the formula
z = ((X̄1 − X̄2) − (μ1 − μ2)) / √(σ1²/n1 + σ2²/n2)
Decision rule: we shall reject H0 at the α level of significance if:
|Z| > Z(α/2) for H0: μ1 = μ2 versus H1: μ1 ≠ μ2
Confidence interval: (X̄1 − X̄2) ± Z(α/2)·√(σ1²/n1 + σ2²/n2)
Solution:
Assumptions
o Two independent random samples
Z = ((X̄1 − X̄2) − (μ1 − μ2)) / σ(X̄1 − X̄2), where σ(X̄1 − X̄2) = √(S1²/n1 + S2²/n2)
S1² = (1/(n1 − 1))·Σ(X1i − X̄1)² and S2² = (1/(n2 − 1))·Σ(X2i − X̄2)²
Confidence interval: (X̄1 − X̄2) ± Z(α/2)·√(S1²/n1 + S2²/n2)
Example 1: These data were obtained in a study comparing persons with disabilities with persons without
disabilities. A scale known as the Barriers to Health Promotion Activities for Disabled Persons (BHADP)
Scale gave the data. We wish to know if we may conclude, at the 99% confidence level, that persons with
disabilities score higher than persons without disabilities.
Given
Disabled: x̄1 = 31.83, n1 = 132, s1 = 7.93
Non-disabled: x̄2 = 25.07, n2 = 137, s2 = 4.80, α = 0.01
Solution:
Assumptions
Independent random samples
large samples
unknown variance
Hypothesis: H₀: μ₁ = μ₂ versus H₁: μ₁ > μ₂
Test statistic: Because of the large samples, the central limit theorem permits calculation of the z score as
opposed to using t. The z score is calculated using the given sample standard deviations. If the
assumptions are correct and H is true, the test statistic is approximately normally distributed
z = [(X̄₁ − X̄₂) − (μ₁ − μ₂)] / √(s₁²/n₁ + s₂²/n₂) = [(31.83 − 25.07) − 0] / √((7.93)²/132 + (4.80)²/137) = 6.76 / 0.8029 = 8.42
Critical region:
With α = 0.01 and a one-tailed test, the critical value of z is 2.33. We reject H₀ if z > 2.33.
Decision: Reject H₀ because 8.42 > 2.33.
Conclusion: At 99% level of confidence we conclude that the data support the claim that persons with
disabilities score higher than persons without disabilities.
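The arithmetic in this example is easy to verify with a few lines of Python, using the summary statistics given above:

```python
from math import sqrt

# BHADP example: large samples, so z uses the sample variances.
x1, n1, s1 = 31.83, 132, 7.93   # persons with disabilities
x2, n2, s2 = 25.07, 137, 4.80   # persons without disabilities

z = (x1 - x2) / sqrt(s1**2 / n1 + s2**2 / n2)
print(round(z, 2))  # 8.42, far beyond the critical value 2.33
```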
Case 4: Unequal variances with no information about σ₁² and σ₂², and small sample sizes
There are many situations in which the comparison of means has to be made based on small samples from populations with different variances. Among the common situations in which we cannot assume σ₁² = σ₂² are:
i. When the samples come from different types of population as in comparisons made from survey data.
ii. When computing confidence limits in cases in which the population means differ widely, the common result that σ changes (though slowly) as μ changes will make us hesitant to assume σ₁² = σ₂².
iii. When one treatment is erratic in its performance sometimes giving high sometimes low responses. In
populations that are markedly skew, the relationship between μ and σ is relatively strong.
The test statistic is
t = (X̄₁ − X̄₂) / √(S₁²/n₁ + S₂²/n₂)
with approximate (Satterthwaite) degrees of freedom
v = (S₁²/n₁ + S₂²/n₂)² / [ (S₁²/n₁)²/(n₁ − 1) + (S₂²/n₂)²/(n₂ − 1) ]
Decision rule: we shall reject H₀ at the α level of significance if:
|t| > t(α/2, v) for H₀: μ₁ = μ₂ versus H₁: μ₁ ≠ μ₂
The (1 − α)100% confidence interval for μ₁ − μ₂ is
(X̄₁ − X̄₂) ± t(α/2, v) ∗ √(S₁²/n₁ + S₂²/n₂)
Example 1: We wish to compare the mean gestational age (in weeks) of babies born to women with preeclampsia during pregnancy vs. those who had normal pregnancies. Is the mean gestational age for babies born to preeclamptic mothers less than the mean gestational age for babies born to mothers with normal pregnancies at the 95% confidence level?
Data:
Preeclampsia: 38, 32, 42, 30, 38, 35, 32, 38, 39, 29, 29, 32
Normal: 40, 41, 38, 40, 40, 39, 39, 41, 41, 40, 40, 40
Solution:
Preeclampsia: x̄₁ = 34.5, n₁ = 12, s₁² = 19.36
Normal: x̄₂ = 39.92, n₂ = 12, s₂² = 0.81, α = 0.05
Hypothesis: H₀: μ₁ = μ₂ versus H₁: μ₁ < μ₂
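The remaining steps follow the Case 4 (Welch/Satterthwaite) procedure; a Python sketch on the given data:

```python
from statistics import mean, variance
from math import sqrt

pre  = [38, 32, 42, 30, 38, 35, 32, 38, 39, 29, 29, 32]  # preeclampsia
norm = [40, 41, 38, 40, 40, 39, 39, 41, 41, 40, 40, 40]  # normal pregnancies

n1, n2 = len(pre), len(norm)
v1, v2 = variance(pre), variance(norm)   # s1^2 = 19.36, s2^2 = 0.81
se_sq = v1 / n1 + v2 / n2

t = (mean(pre) - mean(norm)) / sqrt(se_sq)
# Satterthwaite approximate degrees of freedom
df = se_sq**2 / ((v1 / n1)**2 / (n1 - 1) + (v2 / n2)**2 / (n2 - 1))
print(round(t, 2), round(df))  # t is about -4.18 on about 12 df
```

Since t falls far below the lower-tail critical value of t for about 12 df at α = 0.05, the data support the claim that gestational age is lower for the preeclamptic group.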
4. Make the decision. Do not reject the null hypothesis, since -0.57 > -2.365.
5. Summarize the results. There is no significant difference between the average sizes of the two farms.
3.1.4 Comparison of Means in Paired Samples
At times it might be possible to pair the observations in the two samples and take the difference in each
pair of observations. Usually this happens when subjects are exposed to a treatment, measurements are
taken before and after the treatment, and these measurements are compared to test the effectiveness of the
treatment. In effect, this amounts to a single population test where the population is the set of all possible
differences in the measurements.
A paired difference test is more efficient than the other tests because, for the same sampling effort, it has a smaller chance of a Type II error (the Type I error rate stays fixed at α). The efficiency obtains because when measurements are made on the same subject before and after a treatment, effects of extraneous variables such as age, race and gender on the before/after difference are avoided. Thus the measured difference due to the treatment is more accurate and reliable. Hence, whenever a paired difference test is possible, one should settle for that rather than for other types of tests.
When using dependent samples each observation from population 1 has a one-to-one correspondence with
an observation from population 2. One of the most common cases where this arises is when we measure
the response on the same subjects before and after treatment.
This is commonly called a “pre-test/post-test” situation. However, sometimes we have pairs of subjects in
the two populations meaningfully matched on some prespecified criteria. For example, we might match
individuals who are the same race, gender, socio-economic status, height, weight, etc... to control for the
influence these characteristics might have on the response of interest. When this is done we say that we
are “controlling for the effects of race, gender, etc...”. By using matched-pairs of subjects we are in effect
removing the effect of potential confounding factors, thus giving us a clearer picture of the difference
between the two populations being studied.
When two samples are not independent and observations are taken in pairs the paired t-test is applicable.
In this case for each data point in one sample there is corresponding data point in the second sample.
Consider n paired sample points (x₁₁, x₂₁), (x₁₂, x₂₂), …, (x₁ₙ, x₂ₙ). Let the mean difference among pairs in the population be denoted by μ_d.
Example: Tumor size was originally measured using the linear distance across the tumor, but this was found to be very variable because of the irregular
shape of some tumors. A new method called the RECIST criteria traces the outside of the tumor. The
RECIST method was believed to give more consistent measures of the volume of the tumor. For a portion
of the study, a pair of doctors were shown the same set of tumor pictures. The volume of the tumor was
measured by two separate physicians under similar conditions.
Question of interest: Did the measurements from the two physicians significantly differ? If not, then there
would be no evidence that the volume measurements change based on physician.
Tumor 1 2 3 4 5 6 7 8 9 10
Dr.1 15.8 22.3 14.5 15.7 26.8 24.0 21.8 23.0 29.3 20.5
Dr.2 17.2 20.3 14.2 18.5 28.0 24.8 20.3 25.4 27.5 19.7
Solution:
We can measure the disagreement for each tumor by taking the difference dᵢ = x₁ᵢ − x₂ᵢ (Dr. 1 minus Dr. 2). Instead of having two samples, we can consider our dataset to be one sample of differences.
Tumor 1 2 3 4 5 6 7 8 9 10
Dr.1 15.8 22.3 14.5 15.7 26.8 24.0 21.8 23.0 29.3 20.5
Dr.2 17.2 20.3 14.2 18.5 28.0 24.8 20.3 25.4 27.5 19.7
Difference -1.4 2.0 0.3 -2.8 -1.2 -0.8 1.5 -2.4 1.8 0.8
3) Test statistic
t = d̄ / (s_d/√n) = −0.22 / (1.7441/√10) ≈ −0.399
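The statistic can be recomputed directly from the table of differences (a Python sketch):

```python
from statistics import mean, stdev
from math import sqrt

dr1 = [15.8, 22.3, 14.5, 15.7, 26.8, 24.0, 21.8, 23.0, 29.3, 20.5]
dr2 = [17.2, 20.3, 14.2, 18.5, 28.0, 24.8, 20.3, 25.4, 27.5, 19.7]

d = [a - b for a, b in zip(dr1, dr2)]   # per-tumor differences
n = len(d)
t = mean(d) / (stdev(d) / sqrt(n))      # paired t on n - 1 = 9 df
print(round(mean(d), 2), round(t, 3))   # d-bar = -0.22, t is about -0.399
```

Such a small |t| gives no evidence that the two physicians' volume measurements differ.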
t = d̄ / (s_d/√n)
Decision Rule: Since |t| > the critical t value, H₀ is rejected at the α = 5% level of significance.
Conclusion: There is strong evidence to conclude that vapor pressure has an effect on the nectar
secretion.
b) The 99% confidence interval for μ₁ − μ₂ (i.e., for μ_d) is given by: d̄ ± t(α/2, n − 1) ∗ s_d/√n
Solution:
For vitamin E to be effective, the “before weights” must be less than the “after weights”; i.e., the difference in the population means must be negative in order for the vitamin to be effective.
Thus the null and alternative hypotheses to be tested are:
Hypothesis: H₀: μ_B − μ_A = 0 (μ_d = 0) (Vitamin E does not increase strength, or vitamin E is not effective)
H₁: μ_B − μ_A < 0 (μ_d < 0) (Vitamin E increases strength, i.e., athletes gain more weight)
Athletes 1 2 3 4 5 6 7 8 Total
Before 210 230 182 205 262 253 219 216
After 219 236 179 204 270 250 222 216
d = B − A -9 -6 3 1 -8 3 -3 0 -19
d² 81 36 9 1 64 9 9 0 209
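From the column totals Σd = −19 and Σd² = 209, the paired t statistic can be computed as follows (a sketch; the critical value t(0.05, 7) = 1.895 is from the t table):

```python
from math import sqrt

# Totals from the table: sum of d and sum of d^2 for the 8 athletes
n, sum_d, sum_d2 = 8, -19, 209

d_bar = sum_d / n                               # -2.375
s_d = sqrt((sum_d2 - sum_d**2 / n) / (n - 1))   # sample SD of the differences
t = d_bar / (s_d / sqrt(n))
print(round(t, 2))  # about -1.39; compare with -t(0.05, 7) = -1.895
```

Since t does not fall below −1.895, H₀ is not rejected: the data do not show that vitamin E increases strength.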
After 111 110 107 101 121 115 122 118 123 105
a) Find the mean change in the measure after the medication.
b) Find the standard deviation of the change and the standard error of the mean.
c) Construct a 99% confidence interval for the true mean change.
d) Test for a significant difference between the measures before and after the medication at the 5%
level of significance.
3.2 Inference About the Difference Between Two Population Proportions
3.2.1 Introduction
We now consider the case where there are two binomial parameters of interest, say, p₁ and p₂, and we wish to draw inferences about these proportions. Often you want to analyze differences between two
groups in the proportion of items that are in a particular category. The sample statistics needed to analyze
these differences are the proportion of occurrences in group 1 and the proportion of occurrences in group
2. With a sufficient sample size in each group, the sampling distribution of the difference between the two
proportions approximately follows a normal distribution. Suppose we wish to compare the proportions of
two populations that have a specific characteristic, such as the proportion of men who are left-handed
compared to the proportion of women who are left-handed. Each population is divided into two groups,
the group of elements that have the characteristic of interest (for example, being left handed) and the
group of elements that do not. We arbitrarily label one population as Population 1 and the other as
Population 2, and we draw a random sample from Population 1 and, without reference to the first sample
we draw a sample from Population 2.
Our goal is to use the information in the samples to estimate the difference P₁ − P₂ in the two population
proportions and to make statistically valid inferences about it.
3.2.2 Sampling Distribution of the Difference Between Two Proportions
We assess the probability associated with a difference in proportions computed from samples drawn from
each of these populations.
Sampling distribution of P̂₁ − P̂₂.
The sampling distribution of the difference between two sample proportions is constructed in a manner
similar to the difference between two means. Independent random samples of size n₁ and n₂ are drawn from two populations of dichotomous variables, where the proportions of observations with the characteristic of interest in the two populations are P₁ and P₂, respectively.
The distribution of the difference between two sample proportions, P̂₁ − P̂₂, is approximately normal when both n₁ and n₂ are large.
The mean of the difference P̂₁ − P̂₂ is μ = P₁ − P₂
The variance of P̂₁ − P̂₂ is σ² = P₁(1 − P₁)/n₁ + P₂(1 − P₂)/n₂
The z score for the difference between two proportions is given by:
Z = [(P̂₁ − P̂₂) − (P₁ − P₂)] / √(P₁(1 − P₁)/n₁ + P₂(1 − P₂)/n₂), which has a standard normal distribution.
Therefore (P̂₁ − P̂₂) ~ N[μ = P₁ − P₂, σ² = P₁(1 − P₁)/n₁ + P₂(1 − P₂)/n₂] for large sample sizes.
Note that the expression for the variance of the difference contains the unknown parameters P₁ and P₂. In the single-population case, the null hypothesis value for the population parameter p was used in calculating the variance. If the two population proportions are hypothesized to be different, then we substitute the sample proportions p̂₁ and p̂₂ in their places.
If the two population proportions are hypothesized to be equal, then we substitute a pooled proportion p̂ in both places. This is analogous to the use of S₁² and S₂² separately or combining them into S_p² in the case of the t-test for comparing population means. Letting p̂₁ and p̂₂ be the sample proportions for samples 1 and
2, respectively, the estimate of the common proportion p is a weighted mean of the two sample proportions.
The pooled proportion is given by the formula
p̂ = (x₁ + x₂)/(n₁ + n₂) = (n₁p̂₁ + n₂p̂₂)/(n₁ + n₂)
In construction of a confidence interval for the difference in proportions, we cannot assume a common proportion; hence we use the individual estimates p̂₁ and p̂₂ in the variance estimate. The (1 − α)100% confidence interval on the difference P₁ − P₂ is:
(p̂₁ − p̂₂) ± Z(α/2) ∗ √(p̂₁(1 − p̂₁)/n₁ + p̂₂(1 − p̂₂)/n₂)
As in the one-population case the use of the t distribution is not appropriate since the variance is not
calculated as a sum of squares divided by degrees of freedom. However, samples must be reasonably large
in order to use the normal approximation.
Comparing two population proportions can be done using a z-test if the two samples are sufficiently large. Here a large sample means that n₁p̂₁, n₁(1 − p̂₁), n₂p̂₂ and n₂(1 − p̂₂) are all at least 5.
Procedures
Hypothesis
The null hypothesis is H₀: P₁ = P₂ and the alternative hypothesis is one of the following:
H₁: P₁ ≠ P₂ Two-tailed
H₁: P₁ < P₂ Left-tailed
H₁: P₁ > P₂ Right-tailed
Decide on the significance level, α.
The critical value(s) are
Use Z-table to find the critical value(s)
±Z(α/2) for two-tailed
−Z(α) for left-tailed
Z(α) for right-tailed
If the value of the test statistic falls in the rejection region, reject H0 otherwise, do not reject H0.
State the conclusion in words.
The p-value approach compares the p-value for the test statistic with the α level.
Example 1: 200 patients suffering from a certain disease were randomly divided into two groups. Of the
first group consisting of 120 patients, who received treatment A, 99 recovered within three days. Out of
the other 80, who were treated by treatment B, 62 recovered within 3 days. Can we conclude that
treatment A is more effective?
Solution: Let P₁ = the population proportion of treatment A and P₂ = that of treatment B.
Given that n₁ = 120, n₂ = 80, x₁ = 99, x₂ = 62, and n = n₁ + n₂ = 200. Thus,
p̂₁ = x₁/n₁ = 99/120 = 0.83
p̂₂ = x₂/n₂ = 62/80 = 0.78
p̂ = (x₁ + x₂)/(n₁ + n₂) = 161/200 = 0.81
q̂ = 1 − p̂ = 1 − 0.81 = 0.19
The test statistic is
Z = (p̂₁ − p̂₂) / √(p̂q̂(1/n₁ + 1/n₂)) = 0.88
Z(α) = Z(0.05) = 1.645; that is, R: z > 1.645, but the calculated value of Z does not lie in the rejection region.
Hence, we fail to reject H₀ and conclude the two treatments are equally effective.
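The same test computed from the raw counts (a Python sketch; using the unrounded proportions gives z of about 0.87 rather than the 0.88 obtained above from the rounded values):

```python
from math import sqrt

x1, n1 = 99, 120   # treatment A recoveries
x2, n2 = 62, 80    # treatment B recoveries

p1, p2 = x1 / n1, x2 / n2
p = (x1 + x2) / (n1 + n2)              # pooled proportion under H0
z = (p1 - p2) / sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
print(round(z, 2))  # about 0.87, well below the critical value 1.645
```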
Example 2: A sample of 100 students at the university showed that 43 had taken one or more remedial
courses. A sample of 200 students at a junior college showed that 90 had taken one or more remedial
college courses. At α = 0.05, test the claim that there is no difference in the proportion of students who complete remedial courses at a university or a junior college.
Solution: Let p₁ = the population proportion of university students who complete remedial courses and p₂ = the population proportion of junior college students who complete remedial courses. Thus,
The hypothesis to be tested is:
H₀: p₁ = p₂
H₁: p₁ ≠ p₂
Level of significance, α = 0.05
Test statistic:
Since n₁ = 100 and n₂ = 200 are large, an appropriate test statistic is:
Z = (p̂₁ − p̂₂) / √(p̂q̂(1/n₁ + 1/n₂))
Since the two population probabilities are dependent, we cannot use the same approach for estimating the
standard error of the difference that we used in the previous section. Instead of showing the steps in the
derivation of the formula, we simply present the formula for the estimated standard error.
Under the null hypothesis H₀, the standard error of p̂₁ − p̂₂ is given by
SE₀(p̂₁ − p̂₂) = √(b + c) / n
where b and c are the numbers of discordant pairs. The sampling distribution of p̂₁ − p̂₂ is approximately normal with mean zero and this variance under the null hypothesis.
Confidence Intervals:
The confidence interval for the difference of two dependent proportions, P₁ − P₂, is then given by
(p̂₁ − p̂₂) ± Z(α/2) ∗ SE(p̂₁ − p̂₂), where SE(p̂₁ − p̂₂) = (1/n) √(b + c − (b − c)²/n)
Example 1: Suppose that 100 students took both calculus and computer tests, and 18 failed in calculus (p̂₁ = 0.18) and 10 failed in computer (p̂₂ = 0.10). There is an 8 percentage point difference (p̂₁ − p̂₂ = 0.08).
The confidence interval for the difference of these two failure rates cannot be constructed using the
method in the previous subsection because the two rates are dependent.
We need additional information to assess the dependency. Nine students failed both tests (p̂₁₂ = 0.09), and this reflects the dependency. The dependency between p̂₁ and p̂₂ can be seen more clearly when the data
are presented in a 2 by 2 table.
Calculus \ Computer | Failed | Passed | Total
Failed | 9 (a) | 9 (b) | 18
Passed | 1 (c) | 81 (d) | 82
Total | 10 | 90 | 100 (n)
Solution:
Hypothesis:
H : p − p = 0 versus H : p − p ≠ 0
The marginal totals reflect the two failure rates. The numbers in the diagonal cells (a, d) are concordant
pairs of test scores (those who passed or failed both tests), and those in the off-diagonal cells (b, c) are
discordant pairs (those who passed one test but failed the other). Important information for comparing the
two dependent failure rates is contained in discordant pairs, as the estimated difference of the two
proportions and its estimated standard error are dependent on b and c.
Z = (b − c − 0.5) / √(b + c) = (9 − 1 − 0.5) / √(9 + 1) = 2.372
Critical value
±Z(α/2) = ±Z(0.025) = ±1.96 => −1.96 ≤ Z ≤ 1.96
Since 2.372 > 1.96 reject the null hypothesis.
Using the standard error equation, we have
Estimated SE(p̂₁ − p̂₂) = (1/100) √(9 + 1 − (9 − 1)²/100) = 0.0306
Then the 95 percent confidence interval for the difference of these two dependent proportions is 0.08 − 1.96(0.0306) < p₁ − p₂ < 0.08 + 1.96(0.0306), or (0.0200, 0.1400).
This interval does not include 0, suggesting that the failure rates of these two tests are significantly
different.
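The whole procedure for dependent proportions, test and interval, can be sketched from the discordant counts b and c:

```python
from math import sqrt

# Discordant pairs from the 2x2 table:
# b = failed calculus only, c = failed computer only
n, b, c = 100, 9, 1

z = (b - c - 0.5) / sqrt(b + c)            # continuity-corrected z statistic
diff = (b - c) / n                         # p1-hat - p2-hat = 0.08
se = sqrt(b + c - (b - c)**2 / n) / n      # SE for the confidence interval
lo, hi = diff - 1.96 * se, diff + 1.96 * se
print(round(z, 3), round(lo, 3), round(hi, 3))  # 2.372, about 0.020 to 0.140
```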
Exercise: Suppose we have two methods of teaching, A and B. Select n pairs of students (pairing students within each section) and randomly assign one member of each pair to each method.
Data layout
Method A Method B Number of students
Success Success 52
Success Failed 21
Failed Success 9
Failed Failed 18
Is there a difference between the proportions of successful individuals taught by methods A and B?
Use α = 0.05. Construct 95% confidence limits.
3.3 Sample Size Determination in Comparative Experiments
3.3.1. Comparing two means from independent samples
An important issue in planning a new study is the determination of an appropriate sample size required to
meet certain conditions. For example, for a study dealing with blood cholesterol levels, these conditions
are typically expressed in terms such as “How large a sample do I need to be able to reject the null
hypothesis that two population means are equal if the difference between them is μ₁ − μ₂ = 10 mg/dl?”
We focus on the sample size required to test a specific hypothesis. In general, there exists a formula for
calculating a sample size for the specific test statistic appropriate to test a specified hypothesis. Typically,
these formulae require that the user specify the α-level and Power = (1 – β) desired, as well as the
difference to be detected and the variability of the measure in question. Importantly, it is usually wise not
to calculate a single number for the sample size. Rather, calculate a range of values by varying the
assumptions so that you can get a sense of their impact on the resulting projected sample size. Then you
can pick a more suitable sample size from this range.
The hypotheses to be tested are:
H₀: μ₁ = μ₂ versus H₁: μ₁ ≠ μ₂,
H₁: μ₁ > μ₂, or
H₁: μ₁ < μ₂
Consider the test based on t = (X̄₁ − X̄₂) / (S_p √(1/n₁ + 1/n₂)), with the equal variance assumption.
For the two-sided alternative hypothesis with significance level α, the sample size n₁ = n₂ = n required to detect a true difference in means of d = μ₁ − μ₂ with power at least 1 − β is approximately:
n = 2(Z(α/2) + Z(β))² σ² / d²
Example 1: We are interested in the size for a sample from a population of blood cholesterol levels. We know that typically σ is about 30 mg/dl for these populations. How large a sample would be needed for comparing two approaches to cholesterol lowering using α = 0.05, to detect a difference of d = 20 mg/dl or more with power 1 − β = 0.90?
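The solution follows directly from the sample-size formula; a sketch (the normal quantiles 1.96 and 1.2816 are standard table values):

```python
from math import ceil

# Known normal quantiles: Z(0.025) = 1.96, Z(0.10) = 1.2816
z_a2, z_b = 1.96, 1.2816
sigma, d = 30.0, 20.0   # mg/dl

n = 2 * (z_a2 + z_b)**2 * sigma**2 / d**2
print(ceil(n))  # 48 subjects per group
```

Varying σ and d over a plausible range, as the text advises, gives a band of sample sizes rather than a single number.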
The required sample size depends on:
The α level
The power
The variance in the sample difference
Whether the test is one-sided or two-sided
The hypotheses to be tested are:
H₀: p₁ − p₂ = 0 versus H₁: p₁ − p₂ > 0, H₁: p₁ − p₂ < 0, or H₁: p₁ − p₂ ≠ 0
Setting the rejection boundary under H₀ equal to the detection boundary under H₁ and solving for n gives the required sample size per group:
n = (Z(α) + Z(β))² [p₁(1 − p₁) + p₂(1 − p₂)] / (p₁ − p₂)²
(for a two-sided test, replace Z(α) by Z(α/2)).
For a specified pair of values p₁ and p₂, we can find the sample sizes n₁ = n₂ = n required to give the test of size α that has specified type II error β.
Example: d = p₁ − p₂ = 0.7 − 0.5 = 0.2
with β = 0.10 and α = 0.05 (two-sided), so Z(α/2) = 1.96 and Z(β) = 1.2816.
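Plugging into the formula for proportions (a sketch of the calculation, which these notes do not show in full):

```python
from math import ceil

z_a2, z_b = 1.96, 1.2816      # alpha = 0.05 two-sided, power = 0.90
p1, p2 = 0.7, 0.5

n = (z_a2 + z_b)**2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2)**2
print(ceil(n))  # 121 per group
```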
Chapter 4
4. Analysis of Variance (ANOVA)
4.1. Introduction
Analysis of variance is for two different purposes: (1) to estimate and test hypotheses about population
variances and (2) to estimate and test hypotheses about population means.
This chapter presents statistical methods for comparing means among any number of populations based on
samples from these populations. The t test for comparing two means cannot be generalized to the comparison of more than two means. Instead, the analysis most frequently used for this purpose is based on a comparison of variances (F test), and is therefore called the analysis of variance, often referred to by the acronym ANOVA. When ANOVA (the F test) is applied to only two populations, the results are equivalent
to those of the t test.
Analysis of variance is a widely used statistical technique that partitions the total variability in our data
into components of variability that are used to test hypotheses.
Model:
X_ij = μ_i + ε_ij,  i = 1, 2, …, k;  j = 1, 2, …, n_i
Now let μ_i = μ + τ_i, i = 1, 2, …, k; the above formula becomes:
X_ij = μ + τ_i + ε_ij,  i = 1, 2, …, k;  j = 1, 2, …, n_i
Where X_ij denotes the jth observed sample value from population i,
μ is a parameter common to all treatments under the null hypothesis, called the overall mean (grand mean),
τ_i is a parameter associated with the ith population (treatment), called the ith treatment effect, and
ε_ij is a random error component.
Assumptions:
The X_ij are assumed normally distributed with mean μ_i and variance σ².
We assume that the errors are normally distributed with constant variance, i.e., ε_ij ~ NID(0, σ²): independently and identically normally distributed with mean 0 and constant variance σ².
This implies that the populations being sampled are also normally distributed with equal variances.
In the fixed-effects model, the treatment effects τ_i are usually defined as deviations from the overall mean μ, so that Σ τ_i = 0.
Groups are independent
Equal variances between the groups
4.2. Test of Hypothesis About the Equality of More than Two Population Means
The comparison of two or more means is based on partitioning the variation in the dependent variable into its components; hence the method is called the analysis of variance (ANOVA).
In general, suppose there are k normal populations with possibly different means, μ₁, μ₂, …, μ_k, but all with the same variance σ². The study question is whether all the k population means are the same. We formulate this question as the test of hypotheses
H₀: μ₁ = μ₂ = … = μ_k versus H₁: not all k population means are equal.
To perform the test k independent random samples are taken from the k normal populations. The k sample
means, the k sample variances, and the k sample sizes are summarized in the table:
Population | Population size | Sample size | Population mean | Sample mean | Population variance | Sample variance
1 | N₁ | n₁ | μ₁ | x̄₁ | σ² | s₁²
2 | N₂ | n₂ | μ₂ | x̄₂ | σ² | s₂²
3 | N₃ | n₃ | μ₃ | x̄₃ | σ² | s₃²
⋮
k | N_k | n_k | μ_k | x̄_k | σ² | s_k²
The ANOVA identity (assume equal sample sizes, n₁ = n₂ = ⋯ = n_k = n). Let X_ij represent the jth of the observations under the ith sample (treatment) and x̄_i. represent the average of the observations under the ith sample. Similarly, let x.. = Σᵢ Σⱼ x_ij, i = 1, 2, …, k and j = 1, 2, …, n, represent the grand total of all observations and x̄.. represent the grand mean of all observations:
x̄_i. = (Σⱼ x_ij)/n_i, i = 1, 2, …, k;  x̄.. = x../N, N = n₁ + n₂ + ⋯ + n_k
If we consider all of the data together, regardless of which sample the observation belongs to, we can measure the overall total variability in the data by: TSS = Σᵢ Σⱼ (x_ij − x̄..)²
But x_ij − x̄.. = (x_ij − x̄_i.) + (x̄_i. − x̄..); squaring both sides and summing over i and j gives
Σᵢ Σⱼ (x_ij − x̄..)² = Σᵢ Σⱼ (x_ij − x̄_i.)² + Σᵢ n_i (x̄_i. − x̄..)², i.e., TSS = WSS + BSS.
The result of the forgoing calculations may then be summarized as in the following table known as the
One-way ANOVA table.
Source of variation | Degrees of freedom | Sum of squares | Mean squares | F-test
Between treatments | k − 1 | BSS | BMS = BSS/(k − 1) | F = BMS/WMS
Within treatments (error) | N − k | WSS | WMS = WSS/(N − k) |
Total | N − 1 | TSS | |
Definition: Analysis of variance provides a subdivision of the total variation in the responses of experimental units into separate components, each representing a different source of variation, so that the relative importance of the different sources can be assessed. Secondly, and more importantly, it gives an estimate of the underlying variation between units, which provides a basis for inferences about the effects of the applied treatments or the population means.
In ANOVA, we compare the between-group variation with the within-group variation to assess
whether there is a difference in the population means. Thus by comparing these two measures of
variance (spread) with one another, we are able to detect if there are true differences among the
underlying group population means.
If the variation between the sample means is large, relative to the variation within the samples, then we
would be likely to detect significant differences among the sample means.
If the variation between the sample means is small, relative to the variation within the samples, then
there would be considerable overlap of observations in the different samples, and we would be
unlikely to detect any differences among the population means.
Another estimate of the common variance σ² can be found using the pooled within-sample sums of squares.
Thus if the null hypothesis is true we have two estimates of the common variance (σ²), namely the Mean Square for Treatments (BMS) and the Mean Square Error (WMS). If BMS is large relative to WMS, i.e. the between group variation is large relative to the within group variation, we reject H₀.
If BMS ~ WMS we fail to reject Ho, i.e. the between group variation is not large relative to the within
group variation.
Our test statistic is the F-ratio (F-statistic), which compares these two mean squares: F = BMS/WMS.
A large F-statistic provides evidence against Ho while a small F-statistic indicates that the data and Ho
are compatible.
A point estimate of the ith population (treatment) mean may be easily determined as μ̂_i = x̄_i.. If we assume that the sample means x̄_i. are normally distributed, x̄_i. ~ N(μ_i, σ²/n_i). Thus if σ² were known we could use the normal distribution to define the confidence interval. Using the mean square error (WMS) as an estimator of σ², we would base the confidence interval on the t-distribution:
x̄_i. ± t(α/2, N − k) √(WMS/n_i)
Example 1: Assume ”treatment results” from 13 patients visiting one of three doctors are given:
Doctor A: 24, 26, 31, 27
Doctor B: 29, 31, 30, 36, 33
Doctor C: 29, 27, 34, 26
At the 5% level, do the treatment results come from the same population of results?
Solution:
Ho: The treatment results are from the same population of results
Ha: They are from different populations
Group | Sample size | Sample mean | Sample variance
Dr. A | n₁ = 4 | x̄₁ = 27 | s₁² = 8.67
Dr. B | n₂ = 5 | x̄₂ = 31.8 | s₂² = 7.7
Dr. C | n₃ = 4 | x̄₃ = 29 | s₃² = 12.67
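The remaining computations (the sums of squares and the F ratio) can be sketched directly from the raw data:

```python
# One-way ANOVA for the "treatment results" data (a computational sketch).
groups = {
    "A": [24, 26, 31, 27],
    "B": [29, 31, 30, 36, 33],
    "C": [29, 27, 34, 26],
}

all_obs = [x for g in groups.values() for x in g]
N, k = len(all_obs), len(groups)
grand = sum(all_obs) / N                         # grand mean

bss = sum(len(g) * (sum(g) / len(g) - grand)**2 for g in groups.values())
wss = sum(sum((x - sum(g) / len(g))**2 for x in g) for g in groups.values())

F = (bss / (k - 1)) / (wss / (N - k))
print(round(bss, 1), round(wss, 1), round(F, 2))  # 52.4, 94.8, F about 2.77
```

Since F is below the tabulated F₀.₀₅(2, 10) = 4.10, H₀ is not rejected: the treatment results are consistent with a single population.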
Unbalanced data
In some single-factor experiments, the number of observations taken under each treatment may be
different. We then say that the design is unbalanced. In this situation, slight modifications must be made
in the sums of squares formulas. Let n_i observations be taken under treatment i (i = 1, 2, …, k), and let the total number of observations be N = Σ n_i.
The sums of squares computing formulas for the ANOVA with unequal sample sizes n in each treatment
are:
WSS = Σᵢ Σⱼ x_ij² − Σᵢ (x_i.²/n_i),  BSS = Σᵢ (x_i.²/n_i) − x..²/N,  and  TSS = Σᵢ Σⱼ x_ij² − x..²/N
ANOVA table:
Source of variation | Degrees of freedom | Sum of squares | Mean squares | F-test
Between treatments | k − 1 | BSS | BMS | F = BMS/WMS
Within treatments (error) | N − k | WSS | WMS |
Total | N − 1 | TSS | |
Summary:
When sample sizes are equal, we say design is balanced.
When sample sizes are unequal we say design is unbalanced.
Example 4:
A research laboratory developed two treatments which are believed to have the potential of prolonging the
survival times of patients with an acute form of thymic leukemia. To evaluate the potential treatment
effects 33 laboratory mice with thymic leukemia were randomly divided into three groups. One group
received Treatment 1, one received Treatment 2, and the third was observed as a control group. The
survival times of these mice are given in table below "Mice Survival Times in Days". Test, at the 1% level
of significance, whether these data provide sufficient evidence to confirm the belief that at least one of the
two treatments affects the average survival time of mice with thymic leukemia.
Treatment 1: 71, 75, 72, 73, 75, 72, 80, 65, 60, 63, 65, 69, 63, 64, 78, 71
Treatment 2: 77, 67, 79, 78, 81, 72, 71, 84, 91
Control: 81, 79, 73, 71, 75, 84, 77, 67
Solution:
Step1. The test of hypotheses is
H : μ = μ = μ versus H : not all three population means are equal.
α = 0.01
Summary statistics
If we index the population of mice receiving Treatment 1by 1, Treatment 2 by 2, and no treatment by 3,
then the sample sizes, sample means, and sample variances of the three samples in the following table
mice survival times in days are summarized by:
Group | Sample size | Sample mean | Sample variance
Treatment 1 | n₁ = 16 | x̄₁ = 69.75 | s₁² = 34.47
Treatment 2 | n₂ = 9 | x̄₂ = 77.78 | s₂² = 52.69
Control | n₃ = 8 | x̄₃ = 75.88 | s₃² = 30.69
The average of all 33 observations is x̄.. = 2423/33 = 73.42.
ANOVA table:
Source of variation | Degrees of freedom | Sum of squares | Mean squares | F-test
Between treatments | 3 − 1 = 2 | 435 | 217.50 | F = 5.65
Within treatments (error) | 33 − 3 = 30 | 1153.5 | 38.45 |
Total | 33 − 1 = 32 | 1588.5 | |
The critical value is F(k − 1, N − k) = F₀.₀₁(2, 30) = 5.39; thus the rejection region is [5.39, ∞). As the F test value 5.65 is in the rejection region, the null hypothesis of equal means is rejected, and we conclude that there is a difference in mean survival time among the two treatments and the control group at the 1% level of significance.
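The table entries can be reproduced with the unbalanced computing formulas (a sketch; the group lists match the summary statistics above):

```python
# Unbalanced one-way ANOVA for the mice survival data, using the
# computing formulas for the sums of squares.
groups = [
    [71, 75, 72, 73, 75, 72, 80, 65, 60, 63, 65, 69, 63, 64, 78, 71],  # Treatment 1
    [77, 67, 79, 78, 81, 72, 71, 84, 91],                              # Treatment 2
    [81, 79, 73, 71, 75, 84, 77, 67],                                  # Control
]

N = sum(len(g) for g in groups)
k = len(groups)
grand_total = sum(sum(g) for g in groups)

sum_sq = sum(x * x for g in groups for x in g)         # sum of all x_ij^2
treat_term = sum(sum(g)**2 / len(g) for g in groups)   # sum of x_i.^2 / n_i

bss = treat_term - grand_total**2 / N
wss = sum_sq - treat_term
F = (bss / (k - 1)) / (wss / (N - k))
print(round(bss), round(wss), round(F, 2))  # about 435, 1153, F about 5.65
```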
Assumption for the F test for comparing three or more means
1. The populations from which the samples were obtained must be normally or approximately normally
distributed.
2. The samples must be independent of one another.
3. The variances of the populations must be equal.
4. The sample are drawn randomly from population
LSD_ij = t(α/2, N − k) √(WMS(1/n_i + 1/n_j)), where n_i and n_j are the respective sample sizes from populations i and j, and t is the critical value for α/2 with df = N − k, the degrees of freedom for WMS.
Note that for n₁ = n₂ = ⋯ = n_k = n
LSD = t(α/2, N − k) √(2WMS/n)
5. Then compare all pairs of sample means. If |x̄_i. − x̄_j.| ≥ LSD, declare the corresponding population means μ_i and μ_j different.
6. For each pair wise comparison of population means, the probability of a type I error is fixed at a
specified value of α.
Determining significance
The difference between two mean values is compared to the LSD value. If the difference is greater than
the LSD value, then the means are significantly different.
Example: We want to compare the mean leaf width of a new variety (A) with two comparators (B & C).
The data is presented as follows:
Variety A | 17 15 10 14
Variety B | 34 26 23 22
Variety C | 23 21 8 16
Solution: We can solve this problem by following the five steps listed for the LSD procedure.
H₀: μ₁ = μ₂ = μ₃ versus H₁: at least one of the means differs from the rest.
First, calculate the variety totals and means. Then, calculate the replicate totals.
Variety | Sample size | Sample mean | Sample variance
A | n₁ = 4 | 14 | s₁² = 8.6667
B | n₂ = 4 | 26.25 | s₂² = 29.5833
C | n₃ = 4 | 17 | s₃² = 44.6667
The grand mean is:
X̄.. = ΣΣ X_ij / N = Σ n_i x̄_i / N = [4(14) + 4(26.25) + 4(17)] / 12 = 19.08
BSS = Σᵢ n_i (X̄_i. − X̄..)² = 4(14 − 19.08)² + 4(26.25 − 19.08)² + 4(17 − 19.08)² = 326.1668
WSS = Σᵢ (n_i − 1)s_i² = 3(8.6667) + 3(29.5833) + 3(44.6667) = 248.7501
TSS = BSS + WSS = 326.1668 + 248.7501 = 574.9169
ANOVA table:
Source of variation | Df | SS | MSS | F-test
Between treatments | 2 | 326.1668 | 163.0834 | 5.9005
Within treatments (error) | 9 | 248.7501 | 27.6389 |
Total | 11 | 574.9169 | |
Decision: reject the null hypothesis, since the calculated value is greater than the tabulated value.
Conclusion: The calculated F value 5.9005 is more than the table F value 4.26. This reveals that there are
significant differences among the three varieties, i.e., at least one variety differs from the others.
The next step is to calculate LSD.
The least significant difference for comparing two means based on samples of size 4 is then
LSD = t(0.025, 9) √(2WMS/n) = 2.262 × √(2(27.6389)/4) = 8.4089
Note that the appropriate t value (2.262) was obtained from the t table with α/2 = 0.025 and df = 9.
Step 5. When we have equal sample sizes, it is convenient to use this single LSD for all pairwise comparisons among the sample means, because the same LSD applies to every pair.
Determining if the two varietal means are significantly different:
The difference between two mean values is compared to the LSD value. If the difference is greater than
the LSD value, then the means are significantly different. The variety means for the above example are A
(candidate) = 14, B (comparator 1) = 26.25, C (comparator 2) = 17.
The absolute difference between A and B is 12.25, which is greater than 8.4089; therefore A and B are significantly different at P ≤ 0.05. This confirms the result of the F test.
The difference between A and C is 3, which is less than 8.4089; therefore A and C are not significantly different at P ≤ 0.05.
The difference between B and C is 9.25, which is greater than 8.4089; therefore B and C are significantly different at P ≤ 0.05.
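The pairwise LSD comparisons can be checked with a short script; the WMS, t value, and group means below are taken from the example:

```python
import math

# Pairwise LSD comparisons for the three variety means (values from the example).
wms, n, t_crit = 27.6389, 4, 2.262   # within mean square, per-group size, t(0.025, df=9)
lsd = t_crit * math.sqrt(2 * wms / n)

means = {"A": 14.0, "B": 26.25, "C": 17.0}
for g1, g2 in [("A", "B"), ("A", "C"), ("B", "C")]:
    diff = abs(means[g1] - means[g2])
    verdict = "significant" if diff >= lsd else "not significant"
    print(f"{g1} vs {g2}: |diff| = {diff:.2f}, LSD = {lsd:.4f} -> {verdict}")
```

This reproduces LSD ≈ 8.4089 and flags A vs B and B vs C as significant, with A vs C not significant.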
4.3.2 Studentized Range Test (Tukey's W Method)
The Tukey-Kramer method is the recommended procedure when one wishes to estimate simultaneously all pairwise differences among the means in a one-way ANOVA, assuming that the variances are equal. It is the best known of the proposed alternatives; it replaces the LSD with a criterion based on its own table of critical values. The table provides q-values at α = 0.01 and α = 0.05. The tabulated values depend on two parameters: the number of treatments (a) and the degrees of freedom associated with WMS (f).
H0: μ1 = μ2 = ⋯ = μa versus H1: μi ≠ μj for at least one pair i ≠ j
The overall level of significance is exactly α when the sample sizes are equal and at most α when the
sample sizes are unequal.
Steps:
Calculate |x̄i. − x̄j.| for all i ≠ j.
Read from the studentized range table the value of q_α(a, f) for the required a and f.
Compute T_α = q_α(a, f) √(WMS/n) for equal sample sizes, or T_α = (q_α(a, f)/√2) √(WMS(1/ni + 1/nj)) for unequal sample sizes (the Tukey-Kramer form).
Equivalently, the test statistic for a pair of means is
q = (x̄i − x̄j) / √(WMS/n)
where x̄i and x̄j are the means of the samples being compared, n is the size of the samples, and WMS is the within-group variance.
When the absolute value of q is greater than the critical value for the Tukey test, there is a significant
difference between the two means being compared.
Example: Consider Example 2 and identify which means differ significantly.
Solution: Since the sample sizes are equal, we can calculate as follows:
T_α = q_α(a, f) √(WMS/n). Since k = 3, d.f. = 12, and α = 0.05, the critical value is q_{0.05}(3, 12) = 3.77,
where x̄i and x̄j are the means of the samples being compared, ni and nj are the respective sample sizes, and WMS is the within-group variance.
To find the critical value F′ for the Scheffé test, multiply the critical value for the F test by k − 1:
F′ = (k − 1)(C.V.)
There is a significant difference between the two means being compared when the test value F_s is greater than F′.
Using the Scheffé test, test each pair of means in Example 2 to see whether a specific difference exists, at α = 0.05.
Solution:
i) For x̄1 versus x̄2:
F_s = (x̄1 − x̄2)² / [s²_W(1/n1 + 1/n2)] = (11.8 − 3.8)² / [8.73(1/5 + 1/5)] = 18.33
ii) For x̄2 versus x̄3:
F_s = (x̄2 − x̄3)² / [s²_W(1/n2 + 1/n3)] = (3.8 − 7.6)² / [8.73(1/5 + 1/5)] = 4.14
iii) For x̄1 versus x̄3:
F_s = (x̄1 − x̄3)² / [s²_W(1/n1 + 1/n3)] = (11.8 − 7.6)² / [8.73(1/5 + 1/5)] = 5.05
The critical value for the analysis of variance, from the F table with α = 0.05, d.f.N. = k − 1 = 2, and d.f.D. = N − k = 12, is 3.89. The critical value for F′ at α = 0.05, with d.f.N. = 2 and d.f.D. = 12, is:
F′ = (k − 1)(C.V.) = (3 − 1)(3.89) = 7.78
Since only the F_s test value for part i (x̄1 versus x̄2) is greater than the critical value 7.78, the only significant difference is between x̄1 and x̄2, that is, between medication and exercise. These results agree with the Tukey analysis.
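A minimal sketch of the Scheffé comparisons, using the means, sample sizes, and within-group variance quoted in the example:

```python
# Scheffé pairwise comparisons for Example 2 (means and s_W^2 taken from the text).
means = [11.8, 3.8, 7.6]     # x1, x2, x3
n = [5, 5, 5]
s_w2 = 8.73                  # within-group variance
k = len(means)
cv = 3.89                    # F critical value, df = (2, 12), alpha = 0.05
f_prime = (k - 1) * cv       # Scheffé critical value = 7.78

for i in range(k):
    for j in range(i + 1, k):
        f_s = (means[i] - means[j]) ** 2 / (s_w2 * (1 / n[i] + 1 / n[j]))
        verdict = "significant" if f_s > f_prime else "not significant"
        print(f"x{i+1} vs x{j+1}: F_s = {f_s:.2f} -> {verdict}")
```

Only the first pair (F_s ≈ 18.33 > 7.78) comes out significant, matching the conclusion above.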
Chapter 5
5. Inference About Population Variance
5.1. Introduction
Comparing variances: Suppose a factory manager is considering whether to buy packaging Machine A or Machine B. During test runs, Machine A produced sample variance s²_A while Machine B produced sample variance s²_B. Question: Are these variances significantly different?
Suppose the population variances for weights of mealie-meal bags packaged from machines A and B are, respectively, σ²_A and σ²_B. We can answer the question concerning whether the variances are different by testing the null hypothesis H0: σ²_A = σ²_B against the alternative H1: σ²_A ≠ σ²_B.
Other applications where testing for variance may be important include the following:
Foreign exchange stability is important in any economy. Too much variation of a currency is not
good.
Price stability of other commodities is also important.
5.2. Sampling Distribution of the Sample Variance
Let x1, x2, …, xn be a random sample from a population. The sample variance is: s² = [1/(n − 1)] Σ (xi − x̄)²
The square root of the sample variance is called the sample standard deviation
The sample variance is different for different random samples from the same population
The sampling distribution of s² has mean σ²: E(s²) = σ².
If the population distribution is normal, then Var(s²) = 2σ⁴/(n − 1).
If the population distribution is normal, then (n − 1)s²/σ² has a χ² distribution with n − 1 degrees of freedom.
Thus the 99% confidence interval for σ becomes (√2.99, √23.05) = (1.73, 4.8) mm.
From the 95% as well as the 99% confidence interval for σ, we see that the true standard deviation for the particular shift is greater than 1.5 mm (that is, neither interval contains 1.5 mm).
Hence the fluctuation of the thickness is not in the tolerable range. This is thus a sign of an assignable cause affecting the quality of the plastic sheets.
By using the lower- or upper-tail χ² value alone, we can get a one-sided confidence interval for σ² as well as σ.
If the underlying data are normally distributed, the appropriate test statistic uses the ratio
χ² = Σ(xi − x̄)²/σ0² = (n − 1)s²/σ0²
which has the chi-squared distribution with n − 1 degrees of freedom under H0 (often written χ²_{n−1}).
Decision rule: Compare this with chi-square tables, or use the p-value.
For a two-tailed test H0: σ² = σ0² versus H1: σ² ≠ σ0², we pick two values of chi-squared, χ²_{1−α/2, n−1} and χ²_{α/2, n−1}, and accept the null hypothesis if χ² lies between them; that is, we reject H0 if
χ² > χ²_{α/2, n−1} or χ² < χ²_{1−α/2, n−1}
Example 1: Assume that we believe that the distribution of the ages of a group of workers is normal, and
we wish to test our belief that the variance is 64. Our data is a sample of 17 workers, and our computations
give us a sample variance of 100. Let us set our significance level at 2% and state our problem as follows:
Hypothesis: H : σ = 64 versus H : σ ≠ 64
n = 17, df = n-1 = 16, s = 100, σ = 64, α = 0.02
We can compute χ² = (n − 1)s²/σ0² = (16)(100)/64 = 25. Since α/2 = 0.01 and df = 16, we go to the chi-squared table to find χ²_{0.99}(16) = 5.812 and χ²_{0.01}(16) = 32.0.
The acceptance region is between these two values, so we cannot reject the null hypothesis, since 5.812 < 25 < 32.0.
Conclusion: At the 2% level of significance there is not enough evidence to reject the statement that the variance of the ages of this group of workers is 64.
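The test statistic and decision for Example 1 can be verified with a few lines (the critical values are hard-coded from a chi-square table):

```python
# Chi-square test for a single variance (Example 1: n = 17, s^2 = 100, sigma0^2 = 64).
n, s2, sigma0_sq = 17, 100, 64
chi_sq = (n - 1) * s2 / sigma0_sq          # (16)(100)/64 = 25.0

# Two-tailed critical values from a chi-square table, df = 16, alpha = 0.02.
lower, upper = 5.812, 32.0
decision = "reject H0" if (chi_sq < lower or chi_sq > upper) else "fail to reject H0"
print(chi_sq, decision)
```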
Example 3: A company claims that the standard deviation of the lengths of time it takes an incoming telephone call to be transferred to the correct office is less than 1.4 minutes. A random sample of 25 incoming telephone calls has a standard deviation of 1.1 minutes. At α = 10%, is there enough evidence to support the company's claim? Assume the population is normally distributed.
Solution:
Hypothesis: H0: σ = 1.4 min. versus H1: σ < 1.4 min. (claim)
α = 0.10, df = 25 − 1 = 24
Rejection region: χ² < χ²_{0.90}(24) = 15.659
Test statistic:
χ² = (n − 1)s²/σ0² = (25 − 1)(1.1)²/(1.4)² = 14.816
Decision: Reject H .
At the 10% level of significance, there is enough evidence to support the claim that the standard deviation
of the lengths of time it takes an incoming telephone call to be transferred to the correct office is less than
1.4 minutes.
5.5. Estimation and Hypothesis Testing for Comparing Two Population Variance
Suppose that two independent normal populations are of interest, where the population means and variances, say, μ1, σ1², μ2, and σ2², are unknown. We wish to test hypotheses about the equality of the two variances, say, H0: σ1² = σ2². If we have two separate samples and want to test whether their variances are equal, we test a ratio of chi-squares instead of a difference, as we would with means or proportions. The ratio of two χ²'s from the same population has the F distribution, a distribution named after R. A. Fisher, one of the founders of modern statistics.
Definition: Let W and Y be independent chi-square random variables with u and v degrees of freedom, respectively. Then the ratio F = (W/u)/(Y/v) is said to follow the F distribution with u numerator and v denominator degrees of freedom.
Assume that two random samples of size n1 from population 1 and of size n2 from population 2 are available, and let s1² and s2² be the sample variances. We wish to test the hypotheses
H0: σ1²/σ2² = 1 versus H1: σ1²/σ2² ≠ 1
In Sections 5.3 and 5.4 we found that χ² = (n − 1)s²/σ² has the chi-squared distribution with n − 1 degrees of freedom. Thus the first of the two sample variances, s1², when multiplied by n1 − 1 and divided by σ1², the variance of its parent distribution, will have the chi-squared distribution with n1 − 1 degrees of freedom, and we can write
χ1² = (n1 − 1)s1²/σ1², and similarly, χ2² = (n2 − 1)s2²/σ2². If this is true and σ1² = σ2², the ratio F = s1²/s2² follows the F distribution with n1 − 1 and n2 − 1 degrees of freedom.
Let x11, x12, …, x1n1 be a random sample from a normal population with mean μ1 and variance σ1², and let x21, x22, …, x2n2 be a random sample from a second normal population with mean μ2 and variance σ2². Assume that the two normal populations are independent.
H0: σ1²/σ2² = 1 versus H1: σ1²/σ2² > 1
From these data we compute s1² = 1.797 with n1 = 5, and s2² = 0.056 with n2 = 6. Thus the F-ratio is:
F = s1²/s2² = 1.797/0.056 = 32.09
Since df1 = n1 − 1 = 4 and df2 = n2 − 1 = 5, we look up F_{0.01}(4, 5) = 11.39 in our F table. Since the calculated value F is much larger than the critical value, we reject the null hypothesis and conclude that the variance for government lawyers, which serves as a measure of risk, is smaller.
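The F-ratio computation can be sketched as follows, with the critical value hard-coded from an F table:

```python
# F-ratio test for equality of two variances (values from the example).
s1_sq, n1 = 1.797, 5
s2_sq, n2 = 0.056, 6

f_ratio = s1_sq / s2_sq        # larger variance in the numerator
df1, df2 = n1 - 1, n2 - 1      # (4, 5)
f_crit = 11.39                 # F_{0.01}(4, 5) from an F table
print(round(f_ratio, 2), f_ratio > f_crit)
```

This reproduces F ≈ 32.09, well beyond the 11.39 critical value, so H0 is rejected.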
Example 2: A study was performed on patients with pituitary adenomas. The standard deviation of the
weights of 12 patients with pituitary adenomas was 21.4 kg. A control group of 5 patients without
pituitary adenomas had a standard deviation of the weights of 12.4 kg. We wish to know if the weights of
the patients with pituitary adenomas are more variable than the weights of the control group.
Solution:
Pituitary adenomas: n1 = 12, s1 = 21.4 kg
Control: n2 = 5, s2 = 12.4 kg
α = 0.05
Assumptions
• Each sample is a simple random sample
• Samples are independent
• Both populations are normally distributed
Discussion: We cannot reject H0 because 2.98 < 5.91. The calculated value of F falls in the nonrejection region (p-value = 0.1517).
Conclusions: The weights of the population of patients may not be any more variable than the weights of
the control subjects.
Chapter 6
6. Chi-Square Tests
6.1. Introduction
Consider we have two categorical variables, and we want to know if the differences in sample proportions
are likely to have occurred just by chance due to random sampling.
Chi-Square Distribution
The chi-squared distribution is concentrated over nonnegative values. It has mean equal to its degrees of
freedom (df), and its standard deviation equals √(2df). The distribution is skewed to the right, but it
becomes more bell-shaped (normal) as df increases. The df value equals the difference between the
number of parameters in the alternative hypothesis and in the null hypothesis, as explained later in this
section. Figure 6.1 displays chi-squared densities having df = 1, 5, 10, and 20.
If the attributes are independent, then the probability of possessing both A and B is P_A × P_B, where P_A is the probability that a member has attribute A and P_B is the probability that a member has attribute B. Suppose A has r mutually exclusive and exhaustive classes, and B has c mutually exclusive and exhaustive classes.
Objective: To study the association/relationship between two categorical variables. The two
characteristics can be cross classified as follows:
where n_ij is the observed count for the ith level of A and the jth level of B. Such a table of frequency counts is called a contingency table; since there are r rows and c columns, we call it an r×c contingency table.
n_i. = Σj n_ij, n._j = Σi n_ij, and n = Σi Σj n_ij
The chi-square test is used to test the hypothesis of independence of two attributes. For instance, we may be interested in:
o Whether the presence or absence of hypertension is independent of smoking habit or not.
o Whether the size of the family is independent of the level of education attained by the mothers.
o Whether there is association between father and son regarding baldness.
o Whether there is association between stability of marriage and period of acquaintanceship prior to marriage.
The χ² statistic is given by:
χ² = Σi Σj (O_ij − e_ij)²/e_ij ~ χ²((r − 1)(c − 1))
where the expected frequencies are
e_ij = (n_i. × n._j)/n
Remark:
n = Σi Σj O_ij = Σi Σj e_ij
The null and alternative hypothesis may be stated as:
H : There is no association between A and B
H : Not H (there is association between A and B)
Decision Rule:
Reject H for independency at α level of significance if the calculated value of χ exceeds the tabulated
value with degree of freedom equal to (r-1)(c-1).
Example 1: A random sample of 200 retired men was classified according to education and number of children, as shown below.
Education level        Number of children
                       0-1    2-3    Over 3
Elementary              14     37     32
Secondary and above     31     59     27
Test the hypothesis that the size of the family is independent of the level of education attained by fathers.
(Use 5% level of significance)
Solution:
H : There is no association between the size of the family and the level of education attained by fathers.
H : Not H
First calculate the row and column totals:
n1. = 83, n2. = 117, n.1 = 45, n.2 = 96, n.3 = 59
Then calculate the expected frequencies:
e_ij = (n_i. × n._j)/n, e.g., e11 = (83 × 45)/200 = 18.675
Summing (O_ij − e_ij)²/e_ij over all six cells gives χ²_cal ≈ 6.29.
χ²_{0.05}(2) = 5.99 from the table.
The decision is to reject H0 since χ²_cal > χ²_{0.05}(2).
Conclusion: At 5% level of significance we have evidence to say there is association between number
of children and education attained by fathers, based on this sample data.
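The expected counts and chi-square statistic for this 2×3 table can be recomputed programmatically; the script below is a sketch of the general r×c procedure:

```python
# Chi-square test of independence for the 2x3 education-by-children table.
observed = [
    [14, 37, 32],   # elementary
    [31, 59, 27],   # secondary and above
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n    # expected count for cell (i, j)
        chi_sq += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi_sq, 2), df)   # compare with the table value 5.99 at alpha = 0.05, df = 2
```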
2x2 Contingency table
Consider two categorical variables with only two levels each, say A (A1, A2) and B (B1, B2).
Objective: Testing for independence or no association
                     Characteristic B
Characteristic A     B1           B2           Total
A1                   n11 (a)      n12 (b)      n1. (a+b)
A2                   n21 (c)      n22 (d)      n2. (c+d)
Total                n.1 (a+c)    n.2 (b+d)    n (a+b+c+d)
where a, b, c, and d are observed frequencies and n = total sample size
Hypothesis to be tested:
The chi-square test value can be computed as:
χ² = n(ad − bc)² / [(a + b)(c + d)(a + c)(b + d)] ~ χ²(1)
Examples: 1. A geneticist took a random sample of 300 men to study whether there is association between father and son regarding baldness. He obtained the following results.
                 Son
Father           Bald    Not bald
Bald              85       59
Not bald          65       91
Using α = 5%, test whether there is association between father and son regarding baldness.
Solution:
H0: There is no association between father and son regarding baldness.
H : Not H
The appropriate test statistic is:
χ² = n(ad − bc)² / [(a + b)(c + d)(a + c)(b + d)] = 300(85×91 − 59×65)² / [(144)(156)(150)(150)] = 4,563,000,000/505,440,000 = 9.03
χ²_{0.05}(1) = 3.841 from the table.
The decision is to reject H0 since χ²_cal > χ²_{0.05}(1).
Conclusion: At the 5% level of significance we have evidence to say there is association between father and son regarding baldness, based on this sample data.
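The shortcut 2×2 formula is easy to verify in code, using the observed cell counts a, b, c, d from the table:

```python
# Shortcut chi-square statistic for a 2x2 contingency table (father/son data).
a, b, c, d = 85, 59, 65, 91
n = a + b + c + d

chi_sq = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
print(round(chi_sq, 2))   # compare with the table value 3.841 (df = 1, alpha = 0.05)
```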
6.3. Chi-square Test of Homogeneity
Test the null hypothesis that the samples are drawn from populations that are homogeneous with respect to
some factor.
In a chi-square test for homogeneity of proportions, we test whether different populations have the same proportion of individuals with some characteristic, i.e., we determine whether k populations are homogeneous with respect to a certain characteristic.
The procedures for performing a test of homogeneity are identical to those for a test of independence.
Procedures
Select individuals from each population:
n1 from population 1
n2 from population 2
.
.
.
nk from population k
Let the k populations be A1, A2, …, Ak. Classify each sample into c categories of characteristic B1, B2, …, Bc. The resulting classification is a k×c contingency table.
Hypothesis to be tested
H0: The k populations are homogeneous with respect to characteristic B
Ha: H0 is not true
Equivalently
H0: The proportions for each population are equal, i.e., H0: p1j = p2j = ⋯ = pkj for all j = 1, 2, …, c
H1: At least one of the proportions is different from the others.
where pij is the probability or proportion of the jth level of characteristic B within the ith population Ai.
The expected frequencies are found by multiplying the appropriate row and column totals and then
dividing by the total sample size. Usually they are given in parentheses in the contingency table, along
with the observed frequencies. The expected frequency in the (i, j)th cell is given as:
e_ij = (ith row total × jth column total) / grand total
While the degrees of freedom are obtained similarly to that of independence test, i.e, df = (k-1)x(c-1).
Example: In a study of demand for new product a random sample of 270, 250, and 300 customers are
interviewed from three cities respectively. The data are as follows:
Do the data indicate that demand for the new products differ in the three cities?
Solution:
Hypothesis:
H : The three cities are homogeneous with respect to demand for the new product.
H : H is not true
Calculate the expected frequencies
Note: Estimated expected frequencies are presented along with the observed frequencies in parentheses.
Calculate the chi square statistic:
χ² = Σi Σj (O_ij − e_ij)²/e_ij = 52.46
This statistic takes its minimum value of zero when all O_ij = e_ij. For a fixed sample size, greater differences {O_ij − e_ij} produce larger χ² values and stronger evidence against H0. Since larger χ² values are more contradictory to H0, the P-value is the null probability that χ² is at least as large as the observed value. The χ² statistic has approximately a chi-squared distribution for large n. The P-value is the chi-squared right-tail probability above the observed χ² value.
Read the critical value from chi-square distribution table or find p-value
Compare the calculated value and tabulated values of the chi-square statistic and make decision. Reject
the null hypothesis for large values of χ statistic.
Example 1: Suppose, as a market analyst, you wished to see whether consumers have any preference among five flavors of a new fruit soda. A sample of 100 people provided these data:
Cherry Strawberry Orange Lime Grape
32 28 16 14 10
If there were no preference, you would expect each flavor to be selected with equal frequency. In this
case, the equal frequency is 100/5 =20. That is, approximately 20 people would select each flavor.
Frequency Cherry Strawberry Orange Lime Grape
Observed 32 28 16 14 10
Expected 20 20 20 20 20
Is there enough evidence to reject the claim that there is no preference in the selection of fruit soda
flavors, using the data shown previously? Let α = 0.05.
Solution:
H : Consumers show no preference for flavors (claim).
H : Consumers show a preference.
Find the critical value. The degrees of freedom are 5 -1 =4, and α = 0.05.
Hence, χ²_{0.05}(4) = 9.488
χ² = Σ (O − E)²/E = (32 − 20)²/20 + (28 − 20)²/20 + (16 − 20)²/20 + (14 − 20)²/20 + (10 − 20)²/20 = 18.0
Make the decision. The decision is to reject the null hypothesis, since 18.0 > 9.488
Conclusion: There is enough evidence to reject the claim that consumers show no preference for the
flavors.
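The goodness-of-fit computation can be sketched as:

```python
# Goodness-of-fit statistic for the fruit-soda preference data.
observed = [32, 28, 16, 14, 10]
n = sum(observed)
expected = [n / len(observed)] * len(observed)   # equal preference: 20 each

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_sq, 1))   # 18.0, versus the critical value 9.488 (df = 4, alpha = 0.05)
```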
Example 2: Mendelian Law of Genetics
The shape and color of a certain pea can be classified into four groups, "round and yellow," "round and green," "angular and yellow," and "angular and green," expected in the ratio 9:3:3:1. For an experiment with n = 556 peas, the table below was observed. We are interested in whether there is good agreement between the observed counts and the expected ratio 9:3:3:1. Assume that these measurements came from an underlying known discrete probability distribution 9/16, 3/16, 3/16, 1/16. How can the validity of this assumption be tested?
Chapter 7
7. Non parametric Methods
7.1. Introduction:
All statistical inference procedures discussed so far are based on specific assumptions regarding the nature
of the underlying population distribution. The most commonly used underlying population distribution is
called a normal distribution. The normal distribution plays a very important role in statistical inference. Although the underlying population distribution might not be "exactly" normally distributed, in many cases it can be very well approximated by a normal distribution. In practice, however, the data
might come from a given population that cannot be well approximated by a normal distribution. For
example, the distribution might be very flat, peaked, or strongly skewed to the right or left.
In a nonparametric statistical inference, the methods do not depend on the specific distribution of the
population from which the sample was drawn. Therefore, assumptions regarding the underlying
population are not necessary. In order to understand what non parametric statistics are, it is first necessary
to know what parametric statistics are.
When we do significance tests, we rely on the assumption that the sampling distribution of samples taken
follows the t-distribution or the normal-distribution, depending on the situation. When this assumption is
not true, none of our tests, which are called “parametric statistical inference tests,” are reliable.
The basic assumption for nonparametric statistics is that the sample or samples are randomly obtained.
When two or more samples are used, they must be independent of each other unless otherwise stated.
(M_L, M_U) = (x_(L_{α/2}), x_(U_{α/2}))
where L_{α/2} = C_{α(2),n} + 1 and U_{α/2} = n − C_{α(2),n}, with C_{α(2),n} read from a binomial table.
Or, for large samples, the cutoff may be approximated by
C_{α(2),n} ≈ (n/2) − z_{α/2}(√n/2)
The weekly weight of recyclable material (in pounds/week) for each household is given here.
14.2, 5.3, 2.9, 4.2, 1.2, 4.3, 1.1, 2.6, 6.7, 7.8, 25.9, 43.8, 2.7, 5.6, 7.8, 3.9, 4.7, 6.5, 29.5, 2.1, 34.8, 3.6, 5.8,
4.5, 6.7
The sample median and a confidence interval on the population are given by the following computations.
Solution:
First, we order the data from smallest value to largest value:
1.1, 1.2, 2.1, 2.6, 2.7, 2.9, 3.6, 3.9, 4.2, 4.3, 4.5, 4.7, 5.3, 5.6, 5.8, 6.5, 6.7, 6.7, 7.8, 7.8, 14.2, 25.9, 29.5,
34.8, 43.8
The number of values in the data set is an odd number (n = 25), so the sample median is the 13th value in the ordered list: M = 5.3 pounds/week.
Next we will construct a 95% confidence interval for the population median. From Table value, we find
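Independently of the table lookup, the ordering and the sample median can be verified with a short script:

```python
# Sample median of the weekly recyclable-material data (pounds/week).
weights = [14.2, 5.3, 2.9, 4.2, 1.2, 4.3, 1.1, 2.6, 6.7, 7.8, 25.9, 43.8, 2.7,
           5.6, 7.8, 3.9, 4.7, 6.5, 29.5, 2.1, 34.8, 3.6, 5.8, 4.5, 6.7]

ordered = sorted(weights)
n = len(ordered)
median = ordered[n // 2]        # middle value for odd n (the 13th of 25)
print(n, median)
```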
The main objective of this section is to understand the sign test procedure, which is among the easiest for
non-parametric tests.
The sign test is a non-parametric (distribution free) test that uses plus and minus signs to test different
claims, including:
18 39 43 34 40 39 16 45 22 28
30 36 29 40 32 34 37 39 36 52
At α = 0.05, test the owner’s hypothesis.
Solution:
Step 1: State the hypothesis and identify the claim.
H0: Median = 40 (claim) and H1: Median ≠ 40
Step 2: find the critical value. Compare each value of the data with the median. If the value is greater than
the median, replace the value with a plus sign. If it is less than the median, replace it with a minus sign.
And if it is equal to the median, replace it with a 0. The completed sign table follows.
- - + - 0 - - + - -
- - - 0 - - - - - +
Using n = 18 (the total number of plus and minus signs; omit the zeros) and α = 0.05 for a two-tailed test, the critical value is 4.
Step 3 Compute the test value. Count the number of plus and minus signs obtained in step 2, and use the
smaller value as the test value. Since there are 3 plus signs and 15 minus signs, 3 is the test value.
Step 4 Make the decision. Compare the test value 3 with the critical value 4. If the test value is less than
or equal to the critical value, the null hypothesis is rejected. In this case, the null hypothesis is rejected
since 3 < 4.
Step 5 Summarize the results. There is enough evidence to reject the claim that the median number of
snow cones sold per day is 40.
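The sign tallies in Steps 2 and 3 can be reproduced programmatically from the raw sales data:

```python
# Sign test tallies for the snow-cone example (hypothesized median = 40).
sales = [18, 39, 43, 34, 40, 39, 16, 45, 22, 28,
         30, 36, 29, 40, 32, 34, 37, 39, 36, 52]
median0 = 40

plus = sum(1 for x in sales if x > median0)
minus = sum(1 for x in sales if x < median0)
ties = sum(1 for x in sales if x == median0)

test_value = min(plus, minus)   # smaller of the two sign counts
n = plus + minus                # zeros are omitted
print(plus, minus, ties, test_value, n)
```

This yields 3 plus signs, 15 minus signs, 2 zeros, test value 3, and n = 18, matching the solution.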
Example 2:-Based on information from the U.S. Census Bureau, the median age of foreign-born U.S.
residents is 36.4 years. A researcher selects a sample of 50 foreign-born U.S. residents in his area and
finds that 21 are older than 36.4 years. At α = 0.05, test the claim that the median age of the residents is at
least 36.4 years.
Solution:
Step 1 State the hypotheses and identify the claim.
H0: MD = 36.4 and H1: MD > 36.4
Step 2 Find the critical value. Since α = 0.05 and n = 50, and since it is a right-tailed test, the critical value
is 1.65, obtained from Z-table.
Step 3 Compute the test value:
z = [(X + 0.5) − (n/2)] / (√n/2) = [(21 + 0.5) − (50/2)] / (√50/2) = −3.5/3.5355 = −0.99
Step 4 Make the decision. Since the test value of - 0.99 is less than 1.65, the decision is to not reject the
null hypothesis.
Step 5 Summarize the results. There is not enough evidence to reject the claim that the median age of the
residents is at least 36.4.
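The large-sample z computation in Step 3 can be sketched as follows (X = 21 is the smaller sign count):

```python
import math

# Large-sample sign test z statistic, with continuity correction.
n = 50    # sample size
x = 21    # smaller number of signs (residents older than 36.4)

z = ((x + 0.5) - n / 2) / (math.sqrt(n) / 2)
print(round(z, 2))   # -0.99
```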
7.3.2 Paired-Sample Sign Test
The sign test can also be used to test sample means in a comparison of two dependent samples, such as a
before-and-after test. Recall that when dependent samples are taken from normally distributed
populations, the t test is used. When the condition of normality cannot be met, the nonparametric sign test
can be used. When using the sign test with data that are matched by pairs, we convert the raw data to plus
and minus signs as follows:
Example: - A medical researcher believed the number of ear infections in swimmers can be reduced if the
swimmers use earplugs. A sample of 10 people was selected, and the number of infections for a four-
month period was recorded. During the first two months, the swimmers did not use the earplugs; during
the second two months, they did. At the beginning of the second two-month period, each swimmer was
examined to make sure that no infections were present.
The data are shown here. At α = 0.05, can the researcher conclude that using earplugs reduced the number of ear infections?
Number of ear infections
Swimmer     Before     After
A 3 2
B 0 1
C 5 4
D 4 0
E 2 1
F 4 3
G 3 1
H 5 3
I 2 2
J 1 3
Solution:
Step 1 State the hypotheses and identify the claim.
H0: The number of ear infections will not be reduced.
H1: The number of ear infections will be reduced (claim).
Step 2 Find the critical value. Subtract the after values from the before values and indicate the difference by a positive or negative sign or 0, according to the value, as shown in the table.
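The sign assignment for the paired data can be sketched as follows, using the before/after counts from the table:

```python
# Sign assignment for the paired-sample sign test (earplug example).
before = [3, 0, 5, 4, 2, 4, 3, 5, 2, 1]
after  = [2, 1, 4, 0, 1, 3, 1, 3, 2, 3]

signs = []
for b, a in zip(before, after):
    d = b - a
    signs.append("+" if d > 0 else "-" if d < 0 else "0")

plus = signs.count("+")
minus = signs.count("-")
test_value = min(plus, minus)   # smaller sign count, zeros omitted
print(signs, plus, minus, test_value)
```

This gives 7 plus signs, 2 minus signs, one zero, and a test value of 2.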
Denoting hired men by + and hired women by -, we have 30 positive signs and 70 negative signs. We note
that the value of n =100 is above 25, so the test statistic x is converted (using a correction for continuity)
to the test statistic z as follows:
z = [(x + 0.5) − (n/2)] / (√n/2) = [(30 + 0.5) − (100/2)] / (√100/2) = −19.5/5 = −3.90
With α = 0.05 in a two-tailed test, the critical values are z = ±1.96.
The test statistic z = −3.90 is less than the critical value −1.96, so we reject the null hypothesis that the proportion of hired men is equal to 0.5. There is sufficient sample evidence to warrant rejection of the claim that the hiring practices are fair, with the proportions of hired men and women both equal to 0.5.
7.4 Wilcoxon Signed-Rank Test
The reader should note that the sign test utilizes only the plus and minus signs of the differences between
the observations and μ0 in the one-sample case, or the plus and minus signs of the differences between the
pairs of observations in the paired-sample case; it does not take into consideration the magnitudes of these
differences. A test utilizing both direction and magnitude, proposed in 1945 by Frank Wilcoxon, is now
commonly referred to as the Wilcoxon signed-rank test.
The analyst can extract more information from the data in a nonparametric fashion if it is reasonable to
invoke an additional restriction on the distribution from which the data were taken. The Wilcoxon signed-
rank test applies in the case of a symmetric continuous distribution. Under this condition, we can test
the null hypothesis μ = μ0. We first subtract μ0 from each sample value, discarding all differences equal to
zero. The remaining differences are then ranked without regard to sign. A rank of 1 is assigned to the
smallest absolute difference (i.e., without sign), a rank of 2 to the next smallest, and so on. When the
absolute value of two or more differences is the same, assign to each the average of the ranks that would
have been assigned if the differences were distinguishable.
For example, if the fifth and sixth smallest differences are equal in absolute value, each is assigned a rank of 5.5. If the hypothesis μ = μ0 is true, the total of the ranks corresponding to the positive differences should nearly equal the total of the ranks corresponding to the negative differences. Let us represent these totals by w+ and w−, respectively. We designate the smaller of w+ and w− by w.
In selecting repeated samples, we would expect w+ and w−, and therefore w, to vary. Thus, we may think
of w+, w−, and w as values of the corresponding random variables W+, W−, and W. The null hypothesis μ = μ0 can be rejected in favor of the alternative μ < μ0 only if w+ is small and w− is large. Likewise, the alternative μ > μ0 can be accepted only if w+ is large and w− is small. For a two-sided alternative, we may
reject H0 in favor of H1 if either w+ or w−, and hence w, is sufficiently small. Therefore, no matter what
the alternative hypothesis may be, we reject the null hypothesis when the value of the appropriate statistic
W+, W−, or W is sufficiently small.
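The ranking-with-ties bookkeeping and the w+, w−, w totals described above can be sketched on hypothetical data (the sample values and μ0 below are illustrative, not from the text):

```python
# Wilcoxon signed-rank bookkeeping on illustrative data (mu0 and the sample
# are hypothetical, chosen only to demonstrate the ranking rules).
sample = [10.3, 9.1, 8.6, 10.8, 11.4, 9.8, 10.1, 8.9]
mu0 = 10.0

# Subtract mu0, discard zero differences, and round to avoid float fuzz in ties.
diffs = [round(x - mu0, 6) for x in sample if round(x - mu0, 6) != 0]
ordered = sorted(diffs, key=abs)

# Assign ranks by absolute size, giving tied values the average of their positions.
rank_of = {}
i = 0
while i < len(ordered):
    j = i
    while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
        j += 1
    avg_rank = ((i + 1) + j) / 2        # average of positions i+1 .. j
    for d in ordered[i:j]:
        rank_of[abs(d)] = avg_rank
    i = j

w_plus = sum(rank_of[abs(d)] for d in diffs if d > 0)
w_minus = sum(rank_of[abs(d)] for d in diffs if d < 0)
w = min(w_plus, w_minus)
print(w_plus, w_minus, w)
```

Here the two tied differences of 1.4 share the average rank 7.5, and w is the smaller of the two signed-rank totals.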
Two Samples with Paired Observations
To test the null hypothesis that we are sampling two continuous symmetric populations with μ1 = μ2 for the paired-sample case, we rank the differences of the paired observations without regard to sign and proceed as in the single-sample case. The various test procedures for both the single- and paired-sample cases are summarized in the following table.
Or
H0: The two samples come from populations with the same distribution
H1: The two samples come from populations with different distribution
The Wilcoxon signed rank test can also be used to test the claim that a sample comes from a population
with a specified median.
It is not difficult to show that whenever n < 5 and the level of significance does not exceed 0.05 for a one-
tailed test or 0.10 for a two-tailed test, all possible values of w+, w−, or w will lead to the acceptance of
the null hypothesis. However, when 5 ≤ n ≤ 30, the table shows approximate critical values of W+ and
W− for levels of significance equal to 0.01, 0.025, and 0.05 for a one-tailed test and critical values of W
for levels of significance equal to 0.02, 0.05, and 0.10 for a two-tailed test. The null hypothesis is rejected
if the computed value w+, w−, or w is less than or equal to the appropriate tabled value. For example,
when n = 12, the table shows that a value of w+ ≤ 17 is required for the one-sided alternative μ̃ < μ̃0 to be
significant at the 0.05 level.
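As a sketch of how w+, w−, and w are obtained, the ranking-and-summing step can be written in a few lines of Python (this code is not part of the original notes; the function name and example data are illustrative):

```python
def signed_rank_sums(data, hypothesized_median):
    """Return (w_plus, w_minus, w) for a one-sample Wilcoxon signed-rank test."""
    # Differences from the hypothesized median; zero differences are dropped.
    diffs = [x - hypothesized_median for x in data if x != hypothesized_median]
    # Rank the absolute differences, giving tied values their average rank.
    abs_sorted = sorted(abs(d) for d in diffs)
    def avg_rank(value):
        positions = [i + 1 for i, v in enumerate(abs_sorted) if v == value]
        return sum(positions) / len(positions)
    # Sum the ranks of the positive and of the negative differences.
    w_plus = sum(avg_rank(abs(d)) for d in diffs if d > 0)
    w_minus = sum(avg_rank(abs(d)) for d in diffs if d < 0)
    return w_plus, w_minus, min(w_plus, w_minus)
```

For a one-sided alternative, the appropriate one of w+ or w− is compared with the tabled critical value; for a two-sided alternative, w = min(w+, w−) is used.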
Example 1: According to the director of a county tourist bureau, there is a median of 10 hours of sunshine
per day during the summer months. For a random sample of 20 days during the past three summers, the
number of hours of sunshine has been recorded below.
Use the 0.05 level in evaluating the director’s claim.
Step 4: Find the sum of the absolute values of the negative ranks. Also find the sum of the positive ranks.
Step 5: Let T be the smaller of the two sums found in Step 4. Either sum could be used, but for a
simplified procedure we arbitrarily select the smaller of the two sums.
Step 6: Let n be the number of pairs of data for which the difference d is not 0.
Step 7: Determine the test statistic and critical values based on the sample size, as shown above.
Step 8: When forming the conclusion, reject the null hypothesis if the sample data lead to a test statistic
that is in the critical region; that is, if the test statistic is less than or equal to the critical value(s).
Otherwise, fail to reject the null hypothesis.
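Steps 4 through 8 above can be sketched for the paired-sample case as follows (a hypothetical Python helper, not from the notes; the critical value must still be read from the Wilcoxon table):

```python
def paired_signed_rank(before, after):
    """Steps 4-6: return (T, n), where T is the smaller of the two rank sums
    and n is the number of pairs whose difference d is not 0."""
    diffs = [b - a for b, a in zip(before, after) if b != a]  # Step 6: drop d = 0
    n = len(diffs)
    abs_sorted = sorted(abs(d) for d in diffs)
    def avg_rank(value):  # average rank for tied |d|
        positions = [i + 1 for i, v in enumerate(abs_sorted) if v == value]
        return sum(positions) / len(positions)
    pos_sum = sum(avg_rank(abs(d)) for d in diffs if d > 0)   # Step 4
    neg_sum = sum(avg_rank(abs(d)) for d in diffs if d < 0)
    return min(pos_sum, neg_sum), n                           # Step 5

def decision(T, critical_value):
    """Step 8: reject H0 when the test statistic falls in the critical region."""
    return "reject H0" if T <= critical_value else "fail to reject H0"
```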
Example 2: In a large department store, the owner wishes to see whether the number of shoplifting
incidents per day will change if the number of uniformed security officers is doubled. A sample of 7 days
before security is increased and 7 days after the increase shows the number of shoplifting incidents.
Step 2: Find the critical value. Since n = 7 and α = 0.05 for this two-tailed test, the critical value is 2.
Step 3: Find the test value.
When samples are selected, you assume that they are selected at random. How do you know if the data
obtained from a sample are truly random?
Example: Consider a researcher interviewing 20 students for a pilot survey. Let gender be represented by
M for male and F for female. Suppose the participants were chosen as follows:
Situation 1: MMMMMMMMMMFFFFFFFFFF (It does not look random: the 10 males were selected first
and the 10 females next.)
Situation 2: FMFMFMFMFMFMFMFMFMFM (It seems as if the researcher selected M and F alternately.)
Situation 3: FFFMMFMFMMFFMMFFMMMF (It looks random; there is no apparent pattern to the selection.)
Rather than try to guess whether the data of a sample have been selected at random, statisticians have
devised a nonparametric test to determine randomness. That test is called the runs test.
A run is a succession of identical letters preceded or followed by a different letter, or by no letter at all,
such as at the beginning or the end of the sequence.
Example: Situations 1 and 3 have 2 and 11 runs, respectively.
Situation 1: Run 1: MMMMMMMMMM & Run 2: FFFFFFFFFF
Situation 3: Run 1: FFF Run 2: MM Run 3: F Run 4: M Run 5: F Run 6: MM Run 7: FF Run 8:
MM Run 9: FF Run 10: MMM Run 11: F
The main objective of this section is to introduce the runs test for randomness, which can be used to
determine whether the sample data in a sequence are in a random order.
The test for randomness considers the number of runs rather than the frequency of the letters. For data to
have been selected at random, there should be neither too few nor too many runs.
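Counting runs is mechanical: a new run starts whenever the current letter differs from the one before it. A minimal Python sketch (illustrative, not part of the notes):

```python
def count_runs(sequence):
    """Count the maximal blocks of identical symbols (runs) in a sequence."""
    runs = 0
    previous = None
    for symbol in sequence:
        if symbol != previous:  # the symbol changed, so a new run begins
            runs += 1
            previous = symbol
    return runs
```

Applied to Situations 1 and 3 above, this gives 2 and 11 runs, respectively.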
Example 1:
Listed below are the most recent (as of this writing) winners of the NBA basketball championship game.
Let W denote a winner from the Western Conference and E a winner from the Eastern Conference.
Use a 0.05 significance level to test for randomness in the sequence:
EEWWWWWEWEWEWWW
Solution:
Let n1 = number of Eastern Conference Winners = 5
n2 = number of Western Conference Winners = 10
R = number of runs = 8
Because n1 ≤ 20, n2 ≤ 20, and α = 0.05, the test statistic is R = 8. From the table, the critical values are 3
and 12.
Because R = 8 is neither less than or equal to 3 nor greater than or equal to 12, we do not reject
randomness. There is not sufficient evidence to reject randomness in the sequence of winners.
Example 2:
Consider the sequence of products from a certain machine listed below. Test the claim that the sequence
is random, using a 0.05 significance level.
NNN D NNNNNNN D NN DD NNNNNN D NNNN D NNNNN
DDDD NNNNNNNNNNNN
Where N = normal product
D = defective product
The basic idea underlying the Wilcoxon rank-sum test is this: If two samples are drawn from identical
populations and the individual values are all ranked as one combined collection of values, then the high
and low ranks should fall evenly between the two samples. If the low ranks are found predominantly in
one sample and the high ranks are found predominantly in the other sample, we suspect that the two
populations have different medians.
o For small samples (n1 +n2 ≤ 30), compare the smaller of T1 and T2 with the rejection region
consisting of values less than the critical values. If either T1 or T2 falls in the rejection region, we
reject the null hypothesis. Note that even though this is a two-tailed test, we only use the lower
quantiles of the tabled distribution.
o The test statistic is T = smaller of T1 or T2.
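The combined ranking and the two rank sums T1 and T2 can be sketched as follows (an illustrative Python helper, not from the notes; ties receive their average rank):

```python
def rank_sum_statistics(sample1, sample2):
    """Rank the combined samples, then return (T1, T2, T),
    where T is the smaller of the two rank sums."""
    combined = sorted(sample1 + sample2)
    def avg_rank(value):
        positions = [i + 1 for i, v in enumerate(combined) if v == value]
        return sum(positions) / len(positions)
    T1 = sum(avg_rank(x) for x in sample1)
    T2 = sum(avg_rank(x) for x in sample2)
    return T1, T2, min(T1, T2)
```

If the samples come from identical populations, T1 and T2 should each be close to its share of the total rank sum (n1 + n2)(n1 + n2 + 1)/2; a very small T signals that one sample holds mostly low ranks.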