
STA-PROB-WORK BOOK

FOR
PHYSICAL SCIENCE
AND
ENGINEERING
STA 213/MTH 214 STATISTICS FOR PHYSICAL SCIENCE AND ENGINEERING: 3 Units
Course Outline
Scope of statistical methods in physical sciences and engineering. Measures of location, partition and
dispersion. Elements of probability. Probability distributions: binomial, Poisson, geometric,
hypergeometric, negative binomial, normal, Weibull, Gompertz, etc. Estimation (point and interval) and
tests of hypotheses concerning population means, proportions and variances. Regression and correlation.
Non-parametric tests. Contingency table analysis. Introduction to design of experiments. Analysis of
variance.

STA 211: Probability II (3 Units C: LH 45)


Further permutation and combination. Probability laws. Conditional probability, independence. Bayes'
theorem. Probability distributions of discrete and continuous random variables: binomial, Poisson,
geometric, hypergeometric, rectangular (uniform), negative exponential, binomial. Expectations and
moments of random variables. Chebyshev's inequality. Joint, marginal and conditional
distributions and moments. Limiting distributions. Discrete and continuous random variables, standard
distributions, moments and moment-generating functions. Laws of large numbers and the central limit
theorem.

NAME OF STUDENT: ……………………………………………………………………………………………………..…………………

MATRIC NO.: …………………………………………………………………………………………………………………………………

DEPARTMENT: ………………………………………………………………………………………………………………………………

COLLEGE: ……………………………………………………………………………………………………………………………………

SESSION: ………………………………………………………………………………………………………………………………………

TABLE OF CONTENTS

Chapter One
1.1 Introduction: Origin and Development of Statistics
1.2 Definition of Statistics
1.3 Importance and Scope of Statistical Methods in Physical Sciences and Engineering
1.4 Limitations of Statistics
1.5 Definition of Basic Terms in Statistics

Chapter Two
2.1 Average or Measures of Location or Central Tendency
2.2 Arithmetic Mean or Mean
2.3 Median
2.4 Mode
2.5 Empirical Relationship between Mean, Median and Mode
2.6 Geometric Mean
2.7 Harmonic Mean
2.8 Relation between Arithmetic Mean, Geometric Mean and Harmonic Mean

Chapter Three
3.1 Measure of Dispersion
3.2 Objectives and Significance of the Measures of Dispersion
3.3 Range
3.4 Mean Deviation or Average Deviation
3.5 Variance
3.6 Standard Deviation
3.7 Interpreting Standard Deviation
3.8 Empirical Rule
3.9 Chebyshev's Theorem
3.10 Z-scores or Standard Scores
3.11 Variance and Standard Deviation using Assumed Mean and Scaling Factor
3.12 Coefficient of Variation
3.13 Measure of Partition: Deciles, Percentiles, and Quartiles
3.14 Skewness and Kurtosis

Chapter Four
4.1 Introduction to Probability
4.2 Definitions
4.3 Properties of Probability
4.4 Some Probability Laws
4.5 Conditional Probability
4.6 Bayes' Theorem
4.7 Permutations and Combinations

Chapter Five
5.1 Discrete Probability Distributions
5.2 Mean or Expectation, and Variance of Discrete Probability Distributions
5.3 Bernoulli Distribution
5.4 Binomial Distribution
5.5 Poisson Distribution
5.6 Poisson Approximation to Binomial
5.7 Geometric Distribution
5.8 Hypergeometric Distribution
5.9 Negative-Binomial Distribution
5.10 Moment Generating Functions
5.11 Probability Generating Functions
5.12 Sampling Distribution
5.13 The Central Limit Theorem
5.14 The Law of Large Numbers

Chapter Six
6.1 Introduction to Continuous Probability Distributions
6.2 Mean or Expectation, and Variance of Continuous Probability Distributions
6.3 Normal Distribution
6.4 Uniform Distribution
6.5 Exponential Distribution
6.6 Gamma Distribution
6.7 Weibull Distribution
6.8 Gompertz Distribution
6.9 Cauchy Distribution
6.10 Chi-Square Distribution
6.11 Beta Distribution
6.12 Quantile Functions, Moments and Order Statistics of Continuous Probability Distributions
6.13 Maximum Likelihood Estimation of Parameters of Continuous Probability Distributions

Chapter Seven
7.1 Introduction to Statistical Estimation
7.2 Statistical Point Estimation
7.3 Sampling Distribution of a Statistic
7.4 Statistical Interval Estimation

Chapter Eight
8.1 Introduction to Statistical Tests of Hypothesis: Simple and Alternative
8.2 Small Sample Tests for Population Mean and Difference of Two Population Means
8.3 Large Sample Tests for Population Mean and Difference of Two Population Means
8.4 Tests of Hypothesis for Proportion (One Sample Case)
8.5 Test of Equality of Two Proportions (Two Sample Case)
8.6 Nonparametric Tests
8.7 Contingency Table for Independence of Factors

Chapter Nine
9.1 Introduction
9.2 Correlation
9.3 Types of Correlation
9.4 Methods of Computing Correlation Coefficient
9.5 Meaning of Regression Analysis
9.6 Types of Regression Analysis
9.7 Least Squares Method for Estimating Parameters of Regression Models
9.8 Causes of Deviation of the Fitted Value from the Observed Value
9.9 Standard Errors of the Estimated Parameters for Simple Linear Regression
9.10 Hypothesis Testing of the Parameters for Simple Linear Regression
9.11 Curvilinear Regression

Chapter Ten
10.1 Introduction to Design of Experiments
10.2 Validity of Experimentation
10.3 Statistical Designs
10.4 Requirements for a Good Experimental Design
10.5 Analysis of Variance (ANOVA)
10.6 Types of Analysis of Variance (ANOVA)
10.7 ANOVA Table

Chapter One
1.1 Introduction
The subject of Statistics is an old discipline, as old as human society itself; it has been in use since the
existence of man on earth, although the sphere of its utility was at first very much restricted. Statistics
came to be regarded as a universal language of the sciences, and the word derives from the Latin 'status',
the Italian 'statista', the German 'statistik', or the French 'statistique', each meaning a political state. In
ancient times, the scope of the subject was limited to the collection of data by governments for framing
military and fiscal policies, such as:
(i) the age- and sex-wise population of the country;
(ii) the property and wealth of the country.
However, Statistics has now been adopted in many fields, such as academia, economics, medical records,
marketing, hydrology, philosophy, health and life sciences, psychology, sociology, education, medicine,
business, and nursing. In some cases, it has been adopted as a universal language of the sciences. Thus,
the understanding and careful use of statistical methods enables us to accurately describe the outcomes or
findings of scientific research, make qualitative and quantitative decisions, and make useful estimations.

1.2 Definition of Statistics.

Statistics is the science that deals with gathering (collecting), classifying, analysing, presenting, and
interpreting data. Statistics helps us turn data into information to see the relationship between variables,
or the "big picture." This course is an introduction to statistics with an emphasis on modern engineering
applications. Students explore concepts of probability theory, discrete and continuous random variables,
bivariate probability distributions, categorical data analysis, and model building. Statistics can also be
defined as the subject or science that classifies facts representing the conditions of the people in a state,
especially those facts which can be stated in numbers or in tables of numbers or in any tabular or
classified arrangement. In some contexts, it is the science of the measurement of the social organism,
regarded as a whole in all its manifestations. It can also be defined as the science of estimates and
probabilities.

1.3 Importance and Scope of Statistical Methods in Physical Sciences and Engineering.
In ancient times, Statistics was regarded only as the science of statecraft and was used to collect
information relating to crime, military strength, population, wealth, etc., for devising military and fiscal
policies. Some areas of its importance include:
(i) Statistics in Planning
(ii) Statistics in state
(iii) Statistics in Mathematics
(iv) Statistics in Economics
(v) Statistics in Business and Management
(vi) Statistics in Accountancy and Auditing
(vii) Statistics in Industry
(viii) Statistics in Insurance
(ix) Statistics in Astronomy
(x) Statistics in Physical Sciences
(xi) Statistics in Social Sciences
(xii) Statistics in Biology and Medical Sciences

(xiii) Statistics in Psychology and Education
(xiv) Statistics in War

1.4 Limitations of Statistics.


Though Statistics is indispensable to almost all sciences, social, physical and natural, it has several
limitations that restrict its scope. These include the following:
(i) Statistics does not study qualitative phenomena.
(ii) Statistics does not study individuals.
(iii) Statistical laws are not exact.
(iv) Statistics is liable to be misused.
Thus, because correct and valid inferences can be drawn only by experts who are experienced and skilled
in the analysis and interpretation of statistical data, the chances of mass popularity of this important
science are very much reduced.

1.5 Definition of Basic terms in Statistics


a. Descriptive statistics: This refers to the methods of organizing, summarizing and presenting data
in an informative way or manner.
b. Inferential statistics or statistical inference or inductive statistics: This involves finding
information about a population based on the knowledge gathered from the study of a representative
sample of the population.
c. Variable: A variable is a symbol or anything that can assume any of a prescribed set of values. A
variable that can theoretically assume any value between two given values is called a continuous
variable; otherwise it is called a discrete variable. A variable is a characteristic under study that
assumes different values for different elements. In contrast to a variable, the value of a constant is
fixed.
d. Population or Universe: It is the entire collection, or set of individuals or objects whose properties
are to be analysed.
e. Sample: A sample is a subset of a population. It is a subset containing the characteristics of a
larger population. It is a representative part of a population.
f. Data: Data are raw facts or unprocessed information.
g. Statistic: A statistic is a quantity whose numerical value can be obtained from data; it is a
characteristic of a sample. The word is also used for a fact or piece of data obtained from a study of
a large quantity of numerical data, as in "the statistics show that the crime rate has increased."
h. Experiment: An ordered procedure which is performed with the objective of verifying or
determining the validity of a hypothesis.
i. Parameter: This is a characteristic of a population. Examples include population mean, and
population standard deviation.
j. Sampling frame: It is a list of the items or people forming a population from which a sample is
taken.
k. Sampling Units: Sampling units are the members of the population from which measurements are
taken during sampling. Sampling units are distinct and non-overlapping entities, such as quadrats or
transects, individual plants, branches within a plant, etc.
l. Census: the complete enumeration of a population or group at a point in time with respect to well-
defined characteristics.

m. Element or Member: An element or member of a sample or population is a specific subject or
object (for example, a person, firm, item, state, or country) about which the information is
collected.

Assignment 1.1:
1. Define Statistics
2. Define and explain the following terms in your understanding:
a. Population
b. Sample
c. Statistics
d. Data
e. Variable
f. Experiment
SOLUTION
Chapter Two
2.1 Average or Measures of Location or Central Tendency.
In our daily endeavours, we summarize data to make decisions in one way or the other. For example, in a
market survey, we report an average (mean) price of a particular commodity as representative of the price
of the commodity in the market. Sometimes, we also use the most common price (mode) or the price that
tends to fall in the middle of all other prices (median) of a particular commodity. When we do all these,
we are making use of the knowledge of measures of location or central tendency.
2.2 Arithmetic Mean or Mean.
The arithmetic mean of a given set of observations is their sum divided by the number of observations.
Thus, it is given by

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$

The above formula is used when the data are given as a raw array.
Example 2.1: Find the mean of 5, 8, 10, 15, 24 and 28.
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{5 + 8 + 10 + 15 + 24 + 28}{6} = \frac{90}{6} = 15$$
When the sampled data are from an ungrouped frequency distribution, we adopt

$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$$

Example 2.2: Find the mean or expectation of the following frequency distribution.

x    1    6    11   16   21
F    13   10   20   5    30

Solution:

x    1    6    11   16   21
F    13   10   20   5    30
fx   13   60   220  80   630

$$\sum_{i=1}^{n} f_i = 13 + 10 + 20 + 5 + 30 = 78$$

$$\sum_{i=1}^{n} f_i x_i = 13 + 60 + 220 + 80 + 630 = 1003$$

$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{1003}{78} \approx 12.86$$

The expectation can also be obtained using probabilities as

$$\bar{x} = E(X) = \sum_{i=1}^{n} x_i p_i$$

where $p_i$ is the probability of occurrence of $x_i$.

Example 2.3: Obtain the expectation of the data in Example 2.2.

S/N   x    f    p(x) = f/78     x p(x)
1     1    13   13/78 = 0.17    1 x 0.17 = 0.17
2     6    10   10/78 = 0.13    6 x 0.13 = 0.78
3     11   20   20/78 = 0.26    11 x 0.26 = 2.86
4     16   5    5/78 = 0.06     16 x 0.06 = 0.96
5     21   30   30/78 = 0.38    21 x 0.38 = 7.98
Total           sum p(x) = 1    0.17 + 0.78 + 2.86 + 0.96 + 7.98 = 12.75

$$\bar{x} = \sum_{i=1}^{n} x_i p_i = 12.75$$

(The small difference from the exact mean $1003/78 \approx 12.86$ arises because the probabilities were rounded to two decimal places.)

2.2.1 Basic terms associated with Grouped data.


Let the following be given
Class interval 1-5 6 - 10 11 - 15 16 - 20 21 - 25

F 13 10 20 5 30

(i) Class interval and class limits: A symbol defining a class such as 1 – 5 in the Table above
is called a class interval. The end numbers, 1 and 5, are called class limits; the smaller
number (1) is the lower class limit (LCL), and the larger number (5) is the upper class
limit (UCL).
(ii) Class boundaries: The size of the gap between classes is the difference between the upper
class limit of one class and the lower class limit of the next class. Thus,

Lower Class Boundary (LCB) = LCL - D/2

Upper Class Boundary (UCB) = UCL + D/2

where D is the difference between the lower class limit of a class interval and the upper class
limit of the class just before it. In the example above, using the class intervals 1 - 5 and
6 - 10, D = 6 - 5 = 1. Thus, the class boundaries for the class interval 1 - 5 are given as

LCB = 1 - ½ = 0.5 and UCB = 5 + ½ = 5.5

Class interval 1-5 6 - 10 11 - 15 16 - 20 21 - 25

Class Boundary 0.5 – 5.5 5.5 – 10.5 10.5 – 15.5 15.5 – 20.5 20.5 – 25.5

(iii) Width or Size of Class: This is the difference between the upper class boundary and the
lower class boundary of any class. Class Size (C) = UCB - LCB. Example: 10.5 - 5.5 = 5.
(iv) Class Mark (Class Mid-Point): This is given by ½(LCL + UCL). Example: ½(1 + 5) =
6/2 = 3.
Class interval 1-5 6 - 10 11 - 15 16 - 20 21 - 25

Class Mid-Point 3 8 13 18 23

Thus, we can obtain the mean of grouped data as

$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$$

where $x_i$ is the class mid-point.

Example 2.4: Compute the mean given the following data set
Class interval 1-5 6 - 10 11 - 15 16 - 20 21 - 25
F 13 10 20 5 30

Solution

Class interval       1-5   6-10   11-15   16-20   21-25   Total
F                    13    10     20      5       30      78
Class Mid-Point (x)  3     8      13      18      23
fx                   39    80     260     90      690     1159

Class size C = 25.5 - 20.5 = 5.

$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{1159}{78} \approx 14.86$$

2.2.2 The method of assumed Mean


The mean of statistical data can be calculated by using the method of assumed mean. This method is
applicable if the individual number or class mid-point represented by X is very large. It alleviates the
stress of working with large numbers by reducing the individual number by a constant number (A) called
assumed mean. The mean using assumed mean is given by the following formula
$$\bar{x} = A + \frac{\sum_{i=1}^{N} f_i d_i}{N}$$

where $d_i = X_i - A$ and $N = \sum f_i$ is the total frequency.

Example 2.5: Calculate the mean of the distribution table.

Class Interval   13-17  18-22  23-27  28-32  33-37  38-42  43-47  48-52  53-57
Frequency        2      22     10     14     3      4      6      1      1

Determine the class mid-point of each class, calculate the deviation of each class mid-point from the
assumed mean A = 35, and then calculate the products as shown in the table below.

Class Interval   13-17  18-22  23-27  28-32  33-37  38-42  43-47  48-52  53-57  Total
Frequency        2      22     10     14     3      4      6      1      1      63
Mid-point X      15     20     25     30     35     40     45     50     55
d = X - A        -20    -15    -10    -5     0      5      10     15     20
fd               -40    -330   -100   -70    0      20     60     15     20     -425

$$\bar{x} = A + \frac{\sum_{i=1}^{N} f_i d_i}{N} = 35 + \frac{-425}{63} = 35 - 6.75 = 28.25$$
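A short Python sketch (standard library only; the helper names are mine) reproduces Example 2.5 and confirms that the assumed-mean shortcut agrees with the direct computation $\sum f x / \sum f$:

```python
# Assumed-mean method for grouped data (Example 2.5).
mid  = [15, 20, 25, 30, 35, 40, 45, 50, 55]        # class mid-points
freq = [2, 22, 10, 14, 3, 4, 6, 1, 1]
A    = 35                                          # assumed mean

N  = sum(freq)                                     # 63
fd = sum(f * (x - A) for f, x in zip(freq, mid))   # sum of f*d = -425
mean_assumed = A + fd / N                          # 35 - 425/63

# Direct check: sum(f*x) / sum(f)
mean_direct = sum(f * x for f, x in zip(freq, mid)) / N

print(round(mean_assumed, 2), round(mean_direct, 2))   # 28.25 28.25
```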
2.3 Median.
The median is the value that divides a data set that has been ranked in increasing order in two equal
halves. If the data set has an odd number of values, the median is given by the value of the middle term in
the ranked data set. If the data set has an even number of values, the median is given by the average of the
two middle values in the ranked data set.

As is obvious from the definition of the median, it divides a ranked data set into two equal parts. The
calculation of the median consists of the following two steps:
a. Rank the given data set in increasing order.
b. Find the value that divides the ranked data set in two equal parts. This value gives the
Median.

Note that if the number of observations in a data set is odd, then the median is given by the value of the
middle term in the ranked data. However, if the number of observations is even, then the median is given
by the average of the values of the two middle terms.

The depth (number of positions from either end), or position, of the median is determined by the formula

$$\text{Depth of median} = \frac{n + 1}{2}$$

For example, find the median of the numbers 3, 4, 5, 7, 9. In this example, n = 5; thus, the depth of the
median is given as

$$\text{Depth of median} = \frac{5 + 1}{2} = 3$$

This implies that the median is the third number from either end in the ranked data; that is, the median is 5.
For frequency distribution data,

x            1    6    11   16   21
F            13   10   21   5    30

we can obtain the median of the distribution as

x            1    6    11   16   21
F            13   10   21   5    30
Cumulative   13   23   44   49   79

Thus, the depth of the median = (79 + 1)/2 = 40, so the median is 11.

Now, for an even data set, we have

x            1    6    11   16   21
F            13   10   20   5    30
Cumulative   13   23   43   48   78

Thus, the depth of the median = (78 + 1)/2 = 39.5, i.e. the average of the 39th and 40th values, both of
which are 11. Thus, the median is 11.
2.3.1 The median of a grouped data set.

The median of grouped data is given by the following formula:

$$\text{Median} = L_1 + \left(\frac{\frac{N}{2} - Cf_b}{f_m}\right) C$$

where
$L_1$ is the lower class boundary of the median class,
$N = \sum_{i=1}^{n} f_i$ is the total frequency of the distribution,
$Cf_b$ is the cumulative frequency of the class just before (preceding) the median class,
$f_m$ is the frequency of the median class,
$C$ is the class size.
Example 2.6: Compute the median of the following data set.

Class interval   10-19  20-29  30-39  40-49  50-59  60-69  70-79  80-89  90-99
Frequency        5      12     2      15     13     20     7      5      1

N/2 = 80/2 = 40, and the cumulative frequency first reaches 40 in the class 50 - 59, so the median class is 50 - 59.

Class interval   10-19  20-29  30-39  40-49  50-59  60-69  70-79  80-89  90-99
Frequency        5      12     2      15     13     20     7      5      1
Cumulative       5      17     19     34     47     67     74     79     80

$L_1 = 49.5$, $N = \sum f_i = 80$, $Cf_b = 34$, $f_m = 13$, $C = 10$.

$$\text{Median} = L_1 + \left(\frac{\frac{N}{2} - Cf_b}{f_m}\right) C = 49.5 + \left(\frac{40 - 34}{13}\right) \times 10 \approx 54.1$$

Thus, the median is given as 54.1.
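The grouped-median formula is easy to mechanize. Below is a minimal Python sketch (the function name and argument layout are my own) that finds the median class from the cumulative frequencies and applies the formula; run on Example 2.6 it returns approximately 54.1.

```python
# Median of grouped data: L1 + ((N/2 - Cf_b) / f_m) * C
def grouped_median(lower_boundaries, freqs, class_size):
    N = sum(freqs)
    cum = 0
    for L1, f in zip(lower_boundaries, freqs):
        if cum + f >= N / 2:          # first class whose cumulative reaches N/2
            return L1 + ((N / 2 - cum) / f) * class_size
        cum += f

boundaries = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
freqs      = [5, 12, 2, 15, 13, 20, 7, 5, 1]
print(grouped_median(boundaries, freqs, 10))   # 54.115... ~ 54.1
```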

2.3.2 The graphical method of obtaining the Median


a. Present the data of the given series on a graph in the form of a 'less than' ogive or a 'more than' ogive.
b. Present the data of the given series on a graph simultaneously in the form of both the 'less than' ogive
and the 'more than' ogive. The point where these two ogives meet gives the median value of the series.

2.4 Mode.
The mode is the value that occurs most frequently in a set of observations and around which the other
items of the set cluster densely. In other words, the mode of a distribution is the value at the point around
which the items tend to be most heavily concentrated.
Example 2.7: Find the mode of the following data set. 2, 2, 3, 5, 6, 6, 9, 10, 3, 4, 3, 9, 3.
The mode in this set of data is 3 because it is the most frequent number.
A distribution is said to be bimodal if it has two modes.
Example 2.8: Compute the mode of the following data: 2, 2, 3, 5, 6, 4, 4, 6, 9, 10, 3, 4, 3, 9, 4, 3.
The modes in this case are 3 and 4 (each occurs four times). Thus, the distribution is bimodal.
Also, in some cases the data might have no mode. For example, the data set 6, 7, 13, 5, 4, 11, 2, 18 and 9
has no mode, since no value occurs more than once.
However, for grouped data, the mode is obtained mathematically as

$$\text{Mode} = L_1 + \left(\frac{D_1}{D_1 + D_2}\right) C$$

where
$L_1$ is the lower class boundary of the modal class,
$f_0$ is the frequency of the class before (preceding) the modal class,
$f_1$ is the frequency of the modal class,
$f_2$ is the frequency of the class after (succeeding) the modal class,
$C$ is the class size,
$D_1 = f_1 - f_0$,
$D_2 = f_1 - f_2$.

Example 2.9: Obtain the mode of the following data


x 1 6 11 16 21

F 13 10 20 5 30

In the above case, the data are not grouped; the mode is 21, since 21 occurs 30 times, more often than any
other value.
The mode formula above is applied when the data are grouped.
Example 2.10: Calculate the mode of the distribution table
Class 13 - 17 18 - 22 23 - 27 28 - 32 33 - 37 38 - 42 43 - 47 48 - 52 53 - 57
Interval
Frequency 2 22 10 14 3 4 6 1 1

In the above example, the modal class is 18 - 22. Hence,

$L_1 = 17.5$, $f_0 = 2$, $f_1 = 22$, $f_2 = 10$, $C = 5$,
$D_1 = f_1 - f_0 = 22 - 2 = 20$,
$D_2 = f_1 - f_2 = 22 - 10 = 12$.

$$\text{Mode} = L_1 + \left(\frac{D_1}{D_1 + D_2}\right) C = 17.5 + \left(\frac{20}{20 + 12}\right) \times 5 = 17.5 + 3.125 = 20.625$$

2.4.1 Graphical method of obtaining the mode with a given set of data
Calculation of Mode by Graphical Method:
1. Draw a histogram of the given series.
2. The rectangle of the histogram with the greatest height gives the modal class of the series, within
which the mode lies (see Figure 2.1).
Figure 2.1: Calculation of Mode by Graphical Method

2.5 Empirical Relationship between Mean, Median and Mode.


This section describes the relationships among the mean, median, and mode for three such histograms and
frequency distribution curves. Knowing the values of the mean, median, and mode can give us some idea
about the shape of a frequency distribution curve.
(i) For a symmetric histogram and frequency distribution curve with one peak (see Figure 2.2),
the values of the mean, median, and mode are identical, and they lie at the center of the
distribution.

Figure 2.2: Mean, median, and mode for a symmetric histogram and frequency distribution curve.

(ii) For a histogram and a frequency distribution curve skewed to the right (see Figure 2.3), the
value of the mean is the largest, that of the mode is the smallest, and the value of the median
lies between these two. (Notice that the mode always occurs at the peak point.) The value of
the mean is the largest in this case because it is sensitive to outliers that occur in the right tail.
These outliers pull the mean to the right.

Figure 2.3: Mean, median, and mode for a histogram and frequency distribution curve skewed to the right.

(iii) If a histogram and a frequency distribution curve are skewed to the left (see Figure 2.4), the
value of the mean is the smallest and that of the mode is the largest, with the value of the
median lying between these two. In this case, the outliers in the left tail pull the mean to the
left.

Figure 2.4: Mean, median, and mode for a histogram and frequency distribution curve skewed to the left.

Cases (ii) and (iii) can be captured by the empirical (approximate) relationship
Mode = Mean - 3(Mean - Median), i.e. Mode = 3 Median - 2 Mean.
2.6 Geometric Mean.

The geometric mean (G) of a set of N positive numbers $x_1, x_2, x_3, \ldots, x_N$ is the Nth root of the
product of the numbers:

$$G = \sqrt[N]{x_1 x_2 x_3 \cdots x_N}$$

For a frequency distribution with frequencies $f_1, f_2, \ldots, f_k$ (where $N = \sum f_i$), we have

$$G = \sqrt[N]{x_1^{f_1} x_2^{f_2} x_3^{f_3} \cdots x_k^{f_k}}$$

Example 2.11: Find the geometric mean of the numbers 2, 3, 5.
The geometric mean is obtained as $G = \sqrt[3]{2 \times 3 \times 5} = \sqrt[3]{30} \approx 3.11$.

2.7 Harmonic Mean.

The harmonic mean H of a set of N numbers $x_1, x_2, x_3, \ldots, x_N$ is the reciprocal of the arithmetic
mean of the reciprocals of the numbers. It is given as

$$H = \left[\frac{1}{N}\left(\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_N}\right)\right]^{-1} = \frac{N}{\sum_{i=1}^{N} \frac{1}{x_i}}$$

2.8 Relation between Arithmetic Mean, Geometric Mean and Harmonic Mean.

The geometric mean of a set of positive numbers $x_1, x_2, \ldots, x_N$ is less than or equal to their
arithmetic mean but greater than or equal to their harmonic mean:

$$H \le G \le \bar{X}$$

Assignment 2.1: Compute the median
x 1 6 11 16 21

F 13 10 21 5 30

SOLUTION

Assignment 2.2: Calculate the mode, mean, median of the distribution table
Class 13 - 17 18 - 22 23 - 27 28 - 32 33 - 37 38 - 42 43 - 47 48 - 52 53 - 57
Interval
Frequency 20 12 11 15 13 14 6 9 8

SOLUTION

Chapter Three

3.1 Measure of Dispersion.


The average, or measure of central tendency, gives us an idea of the concentration of the observations
about the central part of the distribution. Thus, when decisions are made, it is not enough to report a
number that describes the centre of the sample. For instance, if we send two marketers on a market survey,
why do we prefer the price reported by one of them? This may be due to the consistency and the closeness
of the reported price to the average price in the market. A measure of such closeness is what statisticians
refer to as a measure of dispersion or variation. Thus, an estimate with lower dispersion is better than
another with higher dispersion. There are several methods for finding the dispersion in a set of data:
1. Range
2. Mean deviation
3. Variance
4. Standard Deviation
5. Coefficient of variation

3.2 Objectives and Significance of the Measures of Dispersion.


The main objectives of studying measure of dispersion include the following:
a. To find out the reliability of an average.
b. To control the variation of the data from the central value.
c. To compare two or more sets of data regarding their variability.
d. To obtain other statistical measures for further analysis of data.

3.2.1 Characteristics of an ideal measure of dispersion


a. It should be rigidly defined.
b. It should be easily understood and calculated.
c. It should be based on all observations.
d. It should be flexible for further mathematical treatment.
e. It should be affected as little as possible by fluctuations of sampling.
f. It should not be affected much by extreme observations

3.3 Range.
The range is the simplest of all the measures of dispersion. It is defined as the difference between the
extreme observations of the distribution. Thus, the range is the difference between the greatest
(maximum) and the smallest (minimum) observation of the distribution.
Thus,

$$\text{Range} = X_{\max} - X_{\min} = \text{Largest} - \text{Smallest}$$


Examples 3.1: Find the range of the following prices of goods purchased by a vendor 6, 4, 1, 6, 10, 3
The highest observation is 10 and the smallest is 1.
Thus, the Range is
Range = 10 – 1 = 9.

In the above case, the data are given simply as a raw array of values.

Example 3.2: Compute the range of the following frequency distribution.

Now suppose the data were given in (ungrouped) frequency distribution form:

X           10   20   30   40   50   60   70   80   90   100
Frequency   5    12   2    15   13   20   7    5    3    8

The largest value is 100 and the smallest value is 10. Thus the range is given as
Range = Largest - Smallest = 100 - 10 = 90.
Now consider grouped data.

Example 3.3: Obtain the range of the following data.

X           10-19  20-29  30-39  40-49  50-59  60-69  70-79  80-89  90-99
Frequency   5      12     2      15     13     20     7      5      1

The range is obtained from the class mid-points:

X           10-19  20-29  30-39  40-49  50-59  60-69  70-79  80-89  90-99
Frequency   5      12     2      15     13     20     7      5      1
Mid-Point   14.5   24.5   34.5   44.5   54.5   64.5   74.5   84.5   94.5

The largest mid-point is 94.5 and the smallest is 14.5.

Thus, the Range = 94.5 - 14.5 = 80.0.

Note: In the case of a frequency distribution, the frequencies of the various values or classes are
IMMATERIAL, since the range depends only on the two extreme observations.
Another important measure is the coefficient of range. The coefficient of range is used to compare the
variability of distributions with different units of measurement.
Thus,

$$\text{Coefficient of Range} = \frac{X_{\max} - X_{\min}}{X_{\max} + X_{\min}}$$

Example 3.4: Calculate the coefficient of range of the monthly earnings for a year (in naira).

S/N                1    2    3    4    5    6    7    8    9    10   11   12
Monthly Earnings   139  150  151  151  157  158  160  161  162  162  173  175

$$\text{Coefficient of Range} = \frac{X_{\max} - X_{\min}}{X_{\max} + X_{\min}} = \frac{175 - 139}{175 + 139} = \frac{36}{314} \approx 0.115$$
Demerits of Range
1. Range is not based on the entire set of data.
2. Range is very much affected by fluctuations of sampling
3. Range cannot be used if we are dealing with open end classes
4. Range is not suitable for mathematical treatment
5. Range is very sensitive to the size of the sample
6. Range is too indefinite to be used as a practical measure of dispersion.

Uses of Range
1. Range is used in the industry for statistical quality control of the manufactured product by the
construction of R-chart. i.e. The control chart for range
2. Range is by far the most widely used measure of variability in our day-to-day life.
3. Range is used as a very convenient measure by meteorological departments for weather forecasts,
since the public is primarily interested in the limits within which the temperature is likely
to vary on a particular day.
4. Range is used in the stock market fluctuations, variations in money rates and rate of exchange.

3.4 Mean Deviation or Average Deviation.


The mean deviation is the mean of the absolute deviations from the mean. Thus, it is given as

$$MD = \frac{\sum_{i=1}^{N} |X_i - \bar{X}|}{N}$$

Note that $|\cdot|$ denotes the absolute value, so each deviation enters as a positive quantity.

Example 3.5: Compute the mean deviation


S/N 1 2 3 4 5 6 7 8 9 10
X 2 12 15 17 20 24 27 30 35 40
The solution is given as

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{222}{10} = 22.2$$

S/N            1     2     3    4    5    6    7    8    9     10    Total
X              2     12    15   17   20   24   27   30   35    40    222
|X - 22.2|     20.2  10.2  7.2  5.2  2.2  1.8  4.8  7.8  12.8  17.8  90

$$MD = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n} = \frac{90}{10} = 9$$
For data in a frequency distribution, the mean deviation is given as

$$MD = \frac{\sum_{i=1}^{N} f_i |X_i - \bar{X}|}{\sum_{i=1}^{N} f_i}$$

Example 3.6: Suppose we have the following to obtain the mean deviation

x 12 20 24 32 50 62 70 75 86 90

frequency 4 6 8 2 3 7 10 11 9 10

Solution
The first thing to do is to calculate the mean:

$$\bar{X} = \frac{\sum_{i=1}^{n} f_i X_i}{\sum_{i=1}^{n} f_i} = \frac{4207}{70} = 60.1$$

S/N    X    F    FX            |X - 60.1|            F|X - 60.1|
1      12   4    4x12 = 48     |12 - 60.1| = 48.1    192.4
2      20   6    6x20 = 120    |20 - 60.1| = 40.1    240.6
3      24   8    8x24 = 192    |24 - 60.1| = 36.1    288.8
4      32   2    2x32 = 64     |32 - 60.1| = 28.1    56.2
5      50   3    3x50 = 150    |50 - 60.1| = 10.1    30.3
6      62   7    7x62 = 434    |62 - 60.1| = 1.9     13.3
7      70   10   10x70 = 700   |70 - 60.1| = 9.9     99
8      75   11   11x75 = 825   |75 - 60.1| = 14.9    163.9
9      86   9    9x86 = 774    |86 - 60.1| = 25.9    233.1
10     90   10   10x90 = 900   |90 - 60.1| = 29.9    299
Total       70   4207                                1616.6

$$MD = \frac{\sum f_i |X_i - \bar{X}|}{\sum f_i} = \frac{1616.6}{70} \approx 23.09$$

Example 3.7: Obtain the mean deviation for a group data set

Class 13-17 18-22 23-27 28-32 33-37 38-42 43-47 48-52 53-57
Interval

Frequency 2 22 10 14 3 4 6 1 1

Solution
The first thing to do here is to calculate the mean:

$$\bar{X} = \frac{\sum_{i=1}^{n} f_i X_i}{\sum_{i=1}^{n} f_i} = \frac{1780}{63} = 28.25$$

S/N    Class Interval   Frequency F   Mid-point X   FX            |X - 28.25|            F|X - 28.25|
1      13-17            2             15            15x2 = 30     |15 - 28.25| = 13.25   26.5
2      18-22            22            20            22x20 = 440   |20 - 28.25| = 8.25    181.5
3      23-27            10            25            10x25 = 250   |25 - 28.25| = 3.25    32.5
4      28-32            14            30            14x30 = 420   |30 - 28.25| = 1.75    24.5
5      33-37            3             35            3x35 = 105    |35 - 28.25| = 6.75    20.25
6      38-42            4             40            4x40 = 160    |40 - 28.25| = 11.75   47
7      43-47            6             45            6x45 = 270    |45 - 28.25| = 16.75   100.5
8      48-52            1             50            1x50 = 50     |50 - 28.25| = 21.75   21.75
9      53-57            1             55            1x55 = 55     |55 - 28.25| = 26.75   26.75
Total                   63                          1780                                 481.25

$$MD = \frac{\sum f_i |X_i - \bar{X}|}{\sum f_i} = \frac{481.25}{63} \approx 7.64$$

Example 3.8: Obtain the mean deviation


Class interval 1-5 6 - 10 11 - 15 16 - 20 21 - 25

F 5 4 2 7 2

Solution

S/N    X    F    FX    |X - 12.25|   F|X - 12.25|
1      3    5    15    9.25          46.25
2      8    4    32    4.25          17
3      13   2    26    0.75          1.5
4      18   7    126   5.75          40.25
5      23   2    46    10.75         21.5
Total       20   245                 126.5

$$\bar{X} = \frac{\sum f_i X_i}{\sum f_i} = \frac{245}{20} = 12.25$$

$$MD = \frac{\sum f_i |X_i - \bar{X}|}{\sum f_i} = \frac{126.5}{20} = 6.325$$

Merit of Mean Deviation


1. It is rigidly defined and easily understood and calculated.
2. It is based on all the observations. Thus, it is a better measure of dispersion than the range and
quartile deviation.

3. The averaging of the absolute deviations from an average irons out the irregularities in the
distribution and thus mean deviation provides an accurate and true measure of dispersion.
4. It is less affected by extreme observation.
5. It provides a better measure for comparison about the formation of different distributions.

Demerit of mean Deviation


1. The computation ignores the signs of the deviation.
2. It is not a satisfactory measure of dispersion when taken about the mode or when dealing with a
fairly skewed distribution.
3. The steps involved are mathematically unsound and illogical.
4. It is rarely used in sociological studies.
5. It cannot be computed for distributions with open end classes.
6. Mean deviation tends to increase with the increase in size of the sample. Though, not
proportionately and not so rapidly as range.

Uses
Despite its mathematical drawbacks, the mean deviation has found favour with economists and business
statisticians because of its simplicity and accuracy. It is mainly used in computing the distribution of
personal wealth in a community or nation and in studies of the business cycle, for example by the
National Bureau of Economic Research.
Assignment 3.1: Obtain the mean deviation for a group data set

Class 13-17 18-22 23-27 28-32 33-37 38-42 43-47 48-52 53-57
Interval

Frequency 12 20 10 14 13 14 16 12 15

SOLUTION

3.5 Variance.
Variance is a very important and useful measure of the spread of the original values about the mean. For a
population, the variance is denoted by the Greek letter $\sigma^2$; for a sample, the variance is denoted by
$s^2$. The variance is the mean of the squared deviations from the mean; equivalently, it is the square of
the standard deviation.
Thus, for a population, the variance is given as

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N}$$

Now, when the data are given in a frequency distribution, the variance is given as

$$\sigma^2 = \frac{\sum_{i=1}^{n} f_i (X_i - \bar{X})^2}{\sum_{i=1}^{n} f_i} = \frac{\sum_{i=1}^{n} f_i X_i^2}{\sum_{i=1}^{n} f_i} - \left(\frac{\sum_{i=1}^{n} f_i X_i}{\sum_{i=1}^{n} f_i}\right)^2 = \frac{\sum f_i X_i^2}{\sum f_i} - \bar{X}^2$$

The variance for a sample of n numbers $X_1, X_2, \ldots, X_n$ is given as

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
Example 3.9: Obtain the variance for a group data set

Class 13-17 18-22 23-27 28-32 33-37 38-42 43-47 48-52 53-57
Interval

Frequency 2 22 10 14 3 4 6 1 1

Solution
The first thing to do here is to calculate the mean:

$$\bar{X} = \frac{\sum_{i=1}^{n} f_i X_i}{\sum_{i=1}^{n} f_i} = \frac{1780}{63} = 28.25$$

S/N    Class Interval   Frequency   X    FX            X²     FX²
1      13-17            2           15   15x2 = 30     225    450
2      18-22            22          20   22x20 = 440   400    8800
3      23-27            10          25   10x25 = 250   625    6250
4      28-32            14          30   14x30 = 420   900    12600
5      33-37            3           35   3x35 = 105    1225   3675
6      38-42            4           40   4x40 = 160    1600   6400
7      43-47            6           45   6x45 = 270    2025   12150
8      48-52            1           50   1x50 = 50     2500   2500
9      53-57            1           55   1x55 = 55     3025   3025
Total                   63               1780                 55850

$$\sigma^2 = \frac{\sum f_i X_i^2}{\sum f_i} - \bar{X}^2 = \frac{55850}{63} - \left(\frac{1780}{63}\right)^2 \approx 886.51 - 798.29 = 88.22$$

(Using the rounded mean 28.25 instead of the exact 1780/63 gives 88.45; carrying the exact mean through
the calculation is preferred.)

3.6 Standard Deviation.


This is the square root of the mean of the squared deviations from the mean. The standard deviation is a
far more useful measure of spread or variability in a set of data, and can be defined as the square root of
the variance. The standard deviation for a population of N numbers $X_1, X_2, \ldots, X_N$ is given as

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N}} = \sqrt{\frac{\sum_{i=1}^{N} X_i^2}{N} - \left(\frac{\sum_{i=1}^{N} X_i}{N}\right)^2}$$

Now, when the data are given in a frequency distribution, the standard deviation is given as

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} f_i (X_i - \bar{X})^2}{\sum_{i=1}^{n} f_i}} = \sqrt{\frac{\sum_{i=1}^{n} f_i X_i^2}{\sum_{i=1}^{n} f_i} - \left(\frac{\sum_{i=1}^{n} f_i X_i}{\sum_{i=1}^{n} f_i}\right)^2}$$

The standard deviation for a sample of n numbers $X_1, X_2, \ldots, X_n$ is given as

$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

The standard deviation has the special feature that it is measured in the same units as the original data.
That is, if the original data were measured in units of weight, then the mean and standard deviation are
also in units of weight. Also, note that the larger the population standard deviation, the greater the spread
or variability among the numbers; the smaller the value of the population standard deviation, the smaller
the amount of variability in the data set.

Example 3.10: The arithmetic mean and standard deviation of series of 20 items were calculated by a
student as 20cm and 5cm respectively. But while calculating them an item 13 was misread as 30. Find the
correct arithmetic mean and standard deviation.
Solution
n = 20, mean = 20cm, standard deviation = 5cm; wrong value used = 30; correct value = 13
$$\sum_{i=1}^{n} X_i = n\bar{X} = 20 \times 20 = 400$$

$$\sum_{i=1}^{n} X_i^2 = n(\sigma^2 + \bar{X}^2) = 20(25 + 400) = 8500$$

If the wrong observation 30 is replaced by the correct value 13, the number of observations remains the
same, viz. 20, and

$$\text{Corrected } \sum X_i = 400 - 30 + 13 = 383, \qquad \text{Corrected } \sum X_i^2 = 8500 - 30^2 + 13^2 = 7769$$

$$\text{Corrected mean} = \frac{\text{Corrected } \sum X_i}{n} = \frac{383}{20} = 19.15$$

$$\text{Corrected } \sigma^2 = \frac{\text{Corrected } \sum X_i^2}{n} - (\text{Corrected mean})^2 = \frac{7769}{20} - 19.15^2 = 388.45 - 366.72 = 21.73$$

$$\text{Corrected standard deviation} = \sqrt{21.73} \approx 4.66$$
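The correction in Example 3.10 only touches the two running sums, which makes it easy to script. A Python sketch under the same numbers:

```python
import math

# Correcting a mean and SD after a misread observation (Example 3.10).
n, mean, sd = 20, 20.0, 5.0
wrong, correct = 30, 13

sum_x  = n * mean                       # 400
sum_x2 = n * (sd**2 + mean**2)          # 8500

sum_x  += correct - wrong               # 383
sum_x2 += correct**2 - wrong**2         # 7769

new_mean = sum_x / n                    # 19.15
new_sd   = math.sqrt(sum_x2 / n - new_mean**2)

print(round(new_mean, 2), round(new_sd, 4))   # 19.15 4.6613
```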

Assignment 3.2: Obtain the Standard deviation for a group data set

Class 13-17 18-22 23-27 28-32 33-37 38-42 43-47 48-52 53-57
Interval

Frequency 12 20 10 14 13 14 16 12 15

SOLUTION

3.7 Interpreting Standard Deviation.
The standard deviation is one of the most useful measures of the dispersion or variability of data. For a
homogeneous set of data, that is, observations which are close to each other, $\sigma$ will be small; for a
heterogeneous set of data, that is, observations which are widely spread, $\sigma$ will be large. Hence, we
can use the values of the standard deviation for the relative or comparative study of two or more groups.

3.8 Empirical Rule.


The empirical rule is an important rule of thumb used for interpreting the value of the standard deviation.
The empirical rule has three parts; the results are shown in Figure 3.1.

Figure 3.1: Empirical Rule Graph

a. Approximately 68% of the data values lie within 1 standard deviation of the mean, i.e. in the
interval $(\bar{X} - \sigma,\ \bar{X} + \sigma)$.
b. Approximately 95% of the data values lie within 2 standard deviations of the mean, i.e. in the
interval $(\bar{X} - 2\sigma,\ \bar{X} + 2\sigma)$.
c. Approximately 99.7% of the data values lie within 3 standard deviations of the mean, i.e. in the
interval $(\bar{X} - 3\sigma,\ \bar{X} + 3\sigma)$.
Remarks
1. If the variable under study has normal distribution, then the results (a) to (c) are exactly true.
However, these results apply very well to data with the frequency distributions which are
approximately symmetric and bell-shaped.

2. Range rule of thumb: The empirical rule can be used to estimate the value of the standard deviation
from the range of the given data.

a. For a normal distribution, 95.4% of the observations lie within the limits $(\bar{X} - 2\sigma,\ \bar{X} + 2\sigma)$.
Hence, Range $= (\bar{X} + 2\sigma) - (\bar{X} - 2\sigma) = 4\sigma$ covers 95.4% of the data observations.
The estimate of the standard deviation can be obtained as

$$\sigma \approx \frac{\text{Range}}{4}$$

which will be true in 95.4% of the cases.

b. For a normal distribution, 99.73% of the observations lie within the limits $(\bar{X} - 3\sigma,\ \bar{X} + 3\sigma)$.
Hence, Range $= (\bar{X} + 3\sigma) - (\bar{X} - 3\sigma) = 6\sigma$ covers 99.73% of the data observations.
The estimate of the standard deviation can be obtained as

$$\sigma \approx \frac{\text{Range}}{6}$$

which will be true in 99.73% of the cases.


Example 3.11: The test scores of a sample of 100 students have a symmetric mounded distribution with a
mean score of 570 and standard deviation of 70. Approximately what percent of the scores are between
(a) 430 and 710, (b) 360 and 780?
The mean of X is 570 and its standard deviation is 70.

a. We have $\bar{X} + 2\sigma = 570 + 140 = 710$ and $\bar{X} - 2\sigma = 570 - 140 = 430$. Hence, 430 and 710
are exactly 2 standard deviations away from the mean. The percentage of scores between 430
and 710 equals the percentage of scores lying between $\bar{X} - 2\sigma$ and $\bar{X} + 2\sigma$, i.e. within 2
standard deviations of the mean, which is approximately 95%. Hence, approximately 95% of the
scores are between 430 and 710.
b. We have $\bar{X} + 3\sigma = 570 + 210 = 780$ and $\bar{X} - 3\sigma = 570 - 210 = 360$. Hence, 360 and
780 are exactly 3 standard deviations away from the mean. The percentage of scores between
360 and 780 equals the percentage of scores lying between $\bar{X} - 3\sigma$ and $\bar{X} + 3\sigma$, i.e. within 3
standard deviations of the mean, which is approximately 99.7%. Hence, approximately 99.7% of the
scores are between 360 and 780.

Assignment 3.3: The test scores of a sample of 1000 students have a symmetric mounded distribution
with a mean score of 870 and standard deviation of 10. Approximately, what percent of the scores are
between (a) 6430 and 910, (b) 460 and 980.
SOLUTION

Example 3.12: The following are the cholesterol levels of a group of 20 middle aged men.
190 230 295 310 260 245 270 220 240 240
250 275 235 180 250 250 202 215 210 320
Using the range rule of the thumb, obtain a rough estimate of the data standard deviation.
In the above example, we observed the largest observation (L) = 320 and the smallest observation (S) =
180.
Thus, Range = L - S = 320 - 180 = 140. Hence, by the range rule of thumb, an estimate of the standard
deviation of the data is given as

$$\sigma \approx \frac{\text{Range}}{4} = \frac{140}{4} = 35$$

The actual value of the standard deviation for the above data is 37.88, so the estimate obtained above is
reasonably close to the exact value.

3.9 Chebyshev‘s Theorem.


Chebyshev's theorem provides us with a rule for interpreting the value of the standard deviation. The rule
was proposed by a Russian mathematician called Chebyshev in 1853. According to Chebyshev's theorem,
the proportion of a statistical variable within K standard deviations of the mean is at least one minus one
divided by K squared, where K is any number greater than one. For K equal to two, at least 75 percent of
the statistical variable lies within two standard deviations of the mean. Mathematically:

For any $k > 1$, at least $100\left(1 - \frac{1}{k^2}\right)\%$ of the data values lie within k standard deviations of the
mean of the data values, i.e. within the limits $\bar{X} - k\sigma$ and $\bar{X} + k\sigma$.

Taking k = 2, we have $1 - \frac{1}{k^2} = 1 - \frac{1}{4} = \frac{3}{4} = 0.75 = 75\%$. Hence, by using Chebyshev's rule we
conclude that at least 75% of the data values will fall within the limits $\bar{X} - 2\sigma$ and $\bar{X} + 2\sigma$.

Taking k = 3, we have $1 - \frac{1}{k^2} = 1 - \frac{1}{9} = \frac{8}{9} \approx 0.89 = 89\%$. Hence, using Chebyshev's rule, we conclude
that at least 89% of the data values will fall within the limits $\bar{X} - 3\sigma$ and $\bar{X} + 3\sigma$.
The empirical rule holds reasonably well in the case of data with approximately symmetrical mounded
frequency distribution and exactly for the normal distribution. However, the beauty of Chebyshev‘s rule is
that it holds for all sorts of data irrespective of the nature of the distribution represented by it. See Figure
3.2a. to Figure 3.2c.

Figure 3.2a: Chebyshev percent values graph

Figure 3.2b: Chebyshev percent values graph

Figure 3.2c: Chebyshev percent values graph

Example 3.13: According to Chebyshev's theorem, at least what percentage of the data values lie between
(a) $\bar{X} - 4\sigma$ and $\bar{X} + 4\sigma$, (b) $\bar{X} - 2.3\sigma$ and $\bar{X} + 2.3\sigma$?

(a) Taking k = 4, $1 - \frac{1}{k^2} = 1 - \frac{1}{4^2} = 1 - \frac{1}{16} = \frac{15}{16} = 0.9375 = 93.75\%$. Thus, at least 93.75% of the data
values lie between $\bar{X} - 4\sigma$ and $\bar{X} + 4\sigma$.
(b) Taking k = 2.3, $1 - \frac{1}{k^2} = 1 - \frac{1}{2.3^2} = 1 - 0.189 = 0.811 = 81.1\%$. Therefore, at least 81.1% of the
data values fall within $\bar{X} - 2.3\sigma$ and $\bar{X} + 2.3\sigma$.

Example 3.14: The average and standard deviation of a sample of size 150 are 15 and 2 respectively.
(a) At least what percentage of sample values lies between 9 and 21?
(b) How many sample values lie between 10 and 20?

(a) Given that $n = 150$, $\bar{X} = 15$, $\sigma = 2$.
Let $\bar{X} + k\sigma = 21$, which implies that $k = \frac{21 - \bar{X}}{\sigma} = \frac{21 - 15}{2} = 3$; and $\bar{X} - k\sigma = 9$, which implies that
$k = \frac{\bar{X} - 9}{\sigma} = \frac{15 - 9}{2} = 3$.
Thus, taking k = 3 in Chebyshev's formula, we have $1 - \frac{1}{k^2} = 1 - \frac{1}{3^2} = 1 - \frac{1}{9} = \frac{8}{9} \approx 0.89 = 89\%$.
Thus, at least 89% of the sample values will fall within 9 and 21. That is, at least
$\left(\frac{89}{100}\right) \times 150 = 133.5 \approx 134$ of the 150 values fall within 9 and 21.

(b) In this case, we want to find k so that $\bar{X} + k\sigma = 20$, which implies that $k = \frac{20 - \bar{X}}{\sigma} = \frac{20 - 15}{2} = 2.5$, and
$\bar{X} - k\sigma = 10$, so $k = \frac{\bar{X} - 10}{\sigma} = \frac{15 - 10}{2} = 2.5$, and $1 - \frac{1}{k^2} = 1 - \frac{1}{2.5^2} = 1 - \frac{4}{25} = \frac{21}{25} = 0.84 = 84\%$.
Hence, at least 84% of the 150 values, i.e. $\frac{84}{100} \times 150 = 126$ of the 150 values, fall within 10 and 20.

Assignment 3.4: According to Chebyshev's theorem, at least what percentage of the data values lie
between (a) $\bar{X} - 8\sigma$ and $\bar{X} + 8\sigma$, (b) $\bar{X} - 6.3\sigma$ and $\bar{X} + 6.3\sigma$?

SOLUTION

3.10 Z-scores or Standard Scores.
The Z-score gives the number of standard deviations the original value is from the mean. A z-score or
standard score is obtained by shifting the origin of the original measurement (X) to its mean and scaling
by its standard deviation. The standard score is usually denoted by Z and given as

$$Z = \frac{X - \text{Mean}}{SD}$$

where SD means standard deviation.
The Z-values are independent of the units of measurement; in other words, they are in standard units.
Accordingly, the transformation from X-scores to Z-scores enables us to compare two or more
distributions with different means and standard deviations. Also, the Z-scores play a very important role
in the normal distribution for computing areas under normal probability curves.
Example 3.15: Joseph and Stella are taking Statistics course from different colleges. Joseph‘s score in the
first year is 76, whereas the class average is 68 with a standard deviation of 5. Stella scored 82 in the first
year while class average score was 71 and standard deviation 8. Compare the performances of the two
students relative to their classes.

XX
Score (X) Mean   X  SD   Z-score=

76  68
Joseph 76 68 5 Z 1  1.60
5
82  71
Stella 82 71 8 Z 2  1.38
8
76  68
Joseph‘s Z-score is Z 1  1.60 standard deviations above (because it is positive) his class
5
82  71
average and Stella‘s Z-scores is Z 2   1.38 standard deviations above his class average. Since
8
Z 1 Z 2 , Joseph performance (relative to their classes) is better than that of Stella in the first year
Statistics course.
Remark: on the face of it, Stella‘s score of 82 is higher than Joseph‘s score of 76, by 82 – 76 = 6 marks.
However, relative to their classes, Joseph‘s performance is better.
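The comparison in Example 3.15 takes only a few lines of Python (the names are mine):

```python
# Compare scores from different classes via z-scores.
def z_score(x, mean, sd):
    return (x - mean) / sd

joseph = z_score(76, 68, 5)   # 1.60
stella = z_score(82, 71, 8)   # 1.375
print(joseph, stella, "Joseph" if joseph > stella else "Stella", "did relatively better")
```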

3.11 Variance and Standard Deviation using Assumed Mean and Scaling Factor.
The standard deviation of grouped data can also be calculated by the "step deviation method". In this
method, some arbitrary data value is chosen as the assumed mean, A. Then we calculate the deviations of
all data values using $d_i = x_i - A$, so that

$$\sigma^2 = \frac{\sum_{i=1}^{n} d_i^2}{n} - \left(\frac{\sum_{i=1}^{n} d_i}{n}\right)^2, \qquad \sigma = \sqrt{\frac{\sum_{i=1}^{n} d_i^2}{n} - \left(\frac{\sum_{i=1}^{n} d_i}{n}\right)^2}$$

Or, better still, for grouped data we can scale d down to $h_i = \frac{d_i}{c}$, where c is the regular increment in the x
values (the class size). The formulas then reduce to

$$\bar{x} = A + \frac{\sum_{i=1}^{n} f_i h_i}{\sum_{i=1}^{n} f_i} \times c$$

$$\sigma^2 = \left[\frac{\sum_{i=1}^{n} f_i h_i^2}{\sum_{i=1}^{n} f_i} - \left(\frac{\sum_{i=1}^{n} f_i h_i}{\sum_{i=1}^{n} f_i}\right)^2\right] c^2 \text{ for a population, and}$$

$$s^2 = \left[\frac{\sum_{i=1}^{n} f_i h_i^2}{\sum_{i=1}^{n} f_i} - \left(\frac{\sum_{i=1}^{n} f_i h_i}{\sum_{i=1}^{n} f_i}\right)^2\right] c^2 \text{ for a sample.}$$

The population standard deviation and the sample standard deviation can be obtained as the square roots
of the above formulas.
Example 3.16: Find the mean and standard deviation of the grouped data.

Class Interval   2-5   6-9   10-13   14-17   18-21
Frequency        7     15    22      14      2

The solution to Example 3.16, with class size c = 9.5 - 5.5 = 4, can be obtained as follows.

Class Interval   2-5    6-9    10-13   14-17   18-21   Total
Frequency        7      15     22      14      2       60
Mid-point        3.5    7.5    11.5    15.5    19.5
d = x - A        -8     -4     0       4       8
h = d/c          -2     -1     0       1       2
fh               -14    -15    0       14      4       -11
fh²              28     15     0       14      8       65

Let A = 11.5. Then

$$\bar{x} = A + \frac{\sum f_i h_i}{\sum f_i} \times c = 11.5 + \frac{-11}{60} \times 4 = 11.5 - \frac{44}{60} \approx 10.77$$

Thus, the variance is obtained as

$$s^2 = \left[\frac{\sum f_i h_i^2}{\sum f_i} - \left(\frac{\sum f_i h_i}{\sum f_i}\right)^2\right] c^2 = \left[\frac{65}{60} - \left(\frac{-11}{60}\right)^2\right] \times 4^2 = (1.0833 - 0.0336) \times 16 \approx 16.8$$

$$s = \sqrt{16.8} \approx 4.099$$

3.12 Coefficient of Variation.


Statistically, to compare the relative amounts of variation in populations with different means, the
coefficient of variation was developed. The coefficient of variation (CV) is defined as

$$CV = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100\% = \frac{S}{\bar{X}} \times 100\%$$

The variable with the smallest CV is preferred (as the most consistent) to the one with the highest CV.

3.13 Measure of partition: Quartiles, Deciles, and Percentiles


The Partition Values are the measures used to divide the total number of observations from a distribution
into a certain number of equal parts. Quartiles, Deciles, and Percentiles are some of the most often used
partition values. It is important here to note that the data should be sorted in either ascending or
descending order before calculating the partition values. Quartiles divide the data into four equal parts;
decile divide the data into ten equal parts and percentiles divide the data into hundred equal parts.

3.13.1 Quartiles and Interquartile Range


Quartiles are the summary measures that divide a ranked data set into four equal parts. Three measures
will divide any data set into four equal parts. These three measures are the first quartile (denoted by Q1),
the second quartile (denoted by Q2), and the third quartile (denoted by Q3). The data should be ranked in
increasing order before the quartiles are determined. The quartiles are defined as follows. Note that Q1
and Q3 are also called the lower and the upper quartiles, respectively.

Quartiles are three values that divide a ranked data set into four equal parts. The second quartile is the
same as the median of a data set. The first quartile is the median of the observations that are less than the
median, and the third quartile is the median of the observations that are greater than the median.
Figure 3.3 shows the diagrammatic partition of the quartiles.

Figure 3.3: The quartiles

The difference between the third quartile and the first quartile for a data set is called the interquartile
range (IQR), which is a measure of dispersion. That is,

$$IQR = \text{Interquartile range} = Q_3 - Q_1$$

$$\text{Semi-IQR} = \frac{Q_3 - Q_1}{2}$$

Example 3.17: A sample of 12 commuter students was selected from a college. The following data give
the typical one-way commuting times (in minutes) from home to college for these 12 students.
29 14 39 17 7 47 63 37 42 18 24 55
a. Find the values of the three quartiles.
b. Where does the commuting time of 47 fall in relation to the three quartiles?
c. Find the interquartile range.

Solution
a. We perform the following steps to find the three quartiles.

Step 1. First we rank the given data in increasing order as follows:


7 14 17 18 24 29 37 39 42 47 55 63

Step 2. We find the second quartile, which is also the median. In a total of 12 data values, the
median lies between the sixth and seventh terms. Thus, the median, and hence the second quartile, is
given by the average of the sixth and seventh values in the ranked data set, that is, the average of
29 and 37.

Thus, the second quartile is:

$$Q_2 = \frac{29 + 37}{2} = 33$$

Note that $Q_2 = 33$ is also the value of the median.

Step 3. We find the median of the data values that are smaller than $Q_2$; this gives the value of
the first quartile. The values smaller than $Q_2$ are:

7 14 17 18 24 29

The value that divides these six data values into two equal parts is given by the average of the two
middle values, 17 and 18. Thus, the first quartile is:

$$Q_1 = \frac{17 + 18}{2} = 17.5$$

Step 4. We find the median of the data values that are larger than $Q_2$; this gives the value of
the third quartile. The values larger than $Q_2$ are:

37 39 42 47 55 63

The value that divides these six data values into two equal parts is given by the average of the two
middle values, 42 and 47. Thus, the third quartile is:

$$Q_3 = \frac{42 + 47}{2} = 44.5$$

The value of Q1 = 17.5 minutes indicates that 25% of these 12 students in this sample commute for less
than 17.5 minutes and 75% of them commute for more than 17.5 minutes.
Similarly, Q2 = 33 indicates that half of these 12 students commute for less than 33 minutes and the other
half of them commute for more than 33 minutes.
The value of Q3 = 44.5 minutes indicates that 75% of these 12 students in this sample commute for less
than 44.5 minutes and 25% of them commute for more than 44.5 minutes.

b. By looking at the position of 47 minutes, we can state that this value lies in the top 25%
of the commuting times.
c. The interquartile range is given by the difference between the values of the third and the
first quartiles. Thus,

IQR = Interquartile range = $Q_3 - Q_1$ = 44.5 - 17.5 = 27 minutes
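The sketch below implements the textbook's median-of-halves rule for quartiles in Python (my own functions; note that library routines such as numpy's percentile use interpolation schemes that can give slightly different quartiles) and reproduces Example 3.17:

```python
# Quartiles by the median-of-halves rule used in Example 3.17.
def median(sorted_vals):
    n = len(sorted_vals)
    mid = n // 2
    return sorted_vals[mid] if n % 2 else (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def quartiles(values):
    v = sorted(values)
    n = len(v)
    q2 = median(v)
    lower, upper = v[:n // 2], v[(n + 1) // 2:]   # halves, excluding the middle term if n is odd
    return median(lower), q2, median(upper)

times = [29, 14, 39, 17, 7, 47, 63, 37, 42, 18, 24, 55]
q1, q2, q3 = quartiles(times)
print(q1, q2, q3, "IQR =", q3 - q1)   # 17.5 33.0 44.5 IQR = 27.0
```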

For grouped data:


The first quartile is given by

$$Q_1 = L_{Q_1} + \left(\frac{\frac{N}{4} - Cf_{Q_1}}{f_{Q_1}}\right) C$$

where
$L_{Q_1}$ is the lower class boundary of the first quartile class,
$N = \sum_{i=1}^{n} f_i$ is the total frequency of the distribution,
$Cf_{Q_1}$ is the cumulative frequency of the class just before (preceding) the first quartile class,
$f_{Q_1}$ is the frequency of the first quartile class,
$C$ is the class size.

The third quartile is given by

$$Q_3 = L_{Q_3} + \left(\frac{\frac{3N}{4} - Cf_{Q_3}}{f_{Q_3}}\right) C$$

where
$L_{Q_3}$ is the lower class boundary of the third quartile class,
$N = \sum_{i=1}^{n} f_i$ is the total frequency of the distribution,
$Cf_{Q_3}$ is the cumulative frequency of the class just before (preceding) the third quartile class,
$f_{Q_3}$ is the frequency of the third quartile class,
$C$ is the class size.

Example 3.18: Compute the quartiles of the following data set


Class interval   10-19  20-29  30-39  40-49  50-59  60-69  70-79  80-89  90-99
Frequency        5      12     2      15     13     20     7      5      1

The position of the first quartile is N/4 = 80/4 = 20 and that of the third quartile is
3N/4 = (3 x 80)/4 = 60.
Thus, the first quartile class is 40 - 49 and the third quartile class is 60 - 69.

Class interval   10-19  20-29  30-39  40-49  50-59  60-69  70-79  80-89  90-99
Frequency        5      12     2      15     13     20     7      5      1
Cumulative       5      17     19     34     47     67     74     79     80

For the first quartile:
$L_{Q_1} = 39.5$, $N = \sum f_i = 80$, $Cf_{Q_1} = 19$, $f_{Q_1} = 15$, $C = 10$.

$$Q_1 = L_{Q_1} + \left(\frac{\frac{N}{4} - Cf_{Q_1}}{f_{Q_1}}\right) C = 39.5 + \left(\frac{20 - 19}{15}\right) \times 10 \approx 40.2$$

For the third quartile:
$L_{Q_3} = 59.5$, $N = \sum f_i = 80$, $Cf_{Q_3} = 47$, $f_{Q_3} = 20$, $C = 10$.

$$Q_3 = L_{Q_3} + \left(\frac{\frac{3N}{4} - Cf_{Q_3}}{f_{Q_3}}\right) C = 59.5 + \left(\frac{60 - 47}{20}\right) \times 10 = 66.0$$

3.13.2 Deciles
A decile is a quantile that is used to divide a data set into 10 equal subsections. The 5th decile is the median of the data set.
The (approximate) value of the kth decile, denoted by $D_k$, is

$$D_k = \text{value of the } \left(\frac{kn}{10}\right)\text{th term in a ranked data set}$$
where k denotes the number of the decile and n represents the sample size.

For grouped data, we can apply the formula

$$D_k = L_{D_k} + \left(\frac{\frac{kN}{10} - Cf_{D_k}}{f_{D_k}}\right)C$$

where

$L_{D_k}$ is the lower class boundary of the kth decile class,
$N = \sum_{i=1}^{n} f_i$ is the total frequency of the distribution,
$Cf_{D_k}$ is the cumulative frequency of the class just before (preceding) the decile class,
$f_{D_k}$ is the frequency of the decile class,
$C$ is the class size.

3.13.3 Percentiles
Percentiles are summary measures that divide a ranked data set into 100 equal parts. Each (ranked) data set has 99 percentiles that divide it into 100 equal parts. The data should be ranked in increasing order to compute percentiles. The kth percentile is denoted by $P_k$, where k is an integer in the range 1 to 99. For instance, the 25th percentile is denoted by $P_{25}$. Figure 3.4 shows the positions of the 99 percentiles.

Figure 3.4: Percentiles

Thus, the kth percentile, Pk , can be defined as a value in a data set such that about k% of the
measurements are smaller than the value of Pk and about (100 − k)% of the measurements are greater
than the value of Pk .
The approximate value of the kth percentile is determined as explained next.

Calculating Percentiles

The (approximate) value of the kth percentile, denoted by Pk , is


kn
Pk  Value of the   term in a ranked data set
 100 
where k denotes the number of the percentile and n represents the sample size.

kn
If the value of   is fractional, always round it up to the next higher whole number.
 100 
For grouped data, we can apply the formula

$$P_k = L_{P_k} + \left(\frac{\frac{kN}{100} - Cf_{P_k}}{f_{P_k}}\right)C$$

where

$L_{P_k}$ is the lower class boundary of the kth percentile class,
$N = \sum_{i=1}^{n} f_i$ is the total frequency of the distribution,
$Cf_{P_k}$ is the cumulative frequency of the class just before (preceding) the percentile class,
$f_{P_k}$ is the frequency of the percentile class,
$C$ is the class size.

Example 3.19: Compute the 70th percentile of the following data set
A sample of 12 commuter students was selected from a college. The following data give the typical one-
way commuting times (in minutes) from home to college for these 12 students.
29 14 39 17 7 47 63 37 42 18 24 55
Arrange the data in increasing order, we have
7 14 17 18 24 29 37 39 42 47 55 63
Now, for k = 70 and n = 12, we have

$$P_{70} = \text{value of the } \left(\frac{70 \times 12}{100}\right)\text{th} = 8.4\text{th} \approx 9\text{th term (rounded up)}$$

$P_{70}$ = value of the 9th term = 42 minutes.

Example 3.20: Compute the 10th percentile of the following data set

Class interval | 10-19 | 20-29 | 30-39 | 40-49 | 50-59 | 60-69 | 70-79 | 80-89 | 90-99
Frequency      |   5   |  12   |   2   |  15   |  13   |  20   |   7   |   5   |   1

For k = 10 and n = 80, the position of the 10th percentile is $\frac{10 \times 80}{100} = 8$. The cumulative frequency first reaches 8 in the class interval 20 – 29, so this is the $P_{10}$ class.

Class interval       | 10-19 | 20-29 | 30-39 | 40-49 | 50-59 | 60-69 | 70-79 | 80-89 | 90-99
Frequency            |   5   |  12   |   2   |  15   |  13   |  20   |   7   |   5   |   1
Cumulative frequency |   5   |  17   |  19   |  34   |  47   |  67   |  74   |  79   |  80

where

$L_{P_{10}} = 19.5$, $N = \sum_{i=1}^{n} f_i = 80$, $Cf_{P_{10}} = 5$, $f_{P_{10}} = 12$, $C = 10$.

$$P_{10} = L_{P_{10}} + \left(\frac{\frac{kN}{100} - Cf_{P_{10}}}{f_{P_{10}}}\right)C = 19.5 + \left(\frac{8 - 5}{12}\right)(10) = 22.0$$
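Continuing the hypothetical `grouped_quantile` sketch from earlier, the 10th percentile corresponds to position kN/100 = 8:

```python
# Position of the 10th percentile: k*N/100 = 10*80/100 = 8
print(grouped_quantile(lower_bounds, freqs, 10, 8))  # 22.0
```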

3.14 Skewness and Kurtosis.


The skewness (SK) of statistical data is the degree of asymmetry, or departure from symmetry, of a distribution. Normally distributed data are symmetric about the mean, median and mode. This implies that the mean, median and mode lie in the middle of the distribution and the value of skewness is zero. If the smoothed frequency polygon of a distribution has a longer tail to the right of the central value than to the left, the distribution is said to be skewed to the right, or to have positive skewness. If the reverse is true, it is said to be skewed to the left or to have negative skewness. Measures of skewness are as follows:

$$SK = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}} = \frac{\bar{X} - \text{Mode}}{SD}$$

$$SK = \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}} = \frac{3(\bar{X} - \text{Median})}{SD}$$

$$SK = \frac{Q_3 - 2Q_2 + Q_1}{Q_3 - Q_1}$$

The kurtosis (K) is a measure of the tailedness of a distribution, that is, how often outliers occur. Excess kurtosis is the tailedness of a distribution relative to a normal distribution. Distributions with medium kurtosis (medium tails) are mesokurtic. Kurtosis can be regarded as a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution: data sets with high kurtosis tend to have heavy tails, or outliers, while data sets with low kurtosis tend to have light tails, or a lack of outliers. A uniform distribution is an extreme light-tailed case.
Mathematically,

$$K = \frac{Q_3 - Q_1}{2(P_{90} - P_{10})}$$
where the Q's and P's are the quartiles and percentiles defined above.

Example 3.21: Compute the Kurtosis of the following data set
Class interval | 10-19 | 20-29 | 30-39 | 40-49 | 50-59 | 60-69 | 70-79 | 80-89 | 90-99
Frequency      |   5   |  12   |   2   |  15   |  13   |  20   |   7   |   5   |   1

$Q_1 = 40.2$, $Q_3 = 66.0$, $P_{90} = 76.64$, $P_{10} = 22$. Thus,

$$K = \frac{Q_3 - Q_1}{2(P_{90} - P_{10})} = \frac{66.0 - 40.2}{2(76.64 - 22)} = 0.236$$
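Continuing the same illustrative sketch (same `freqs` and `lower_bounds` as before), the quartile and percentile coefficients above can be computed directly:

```python
N   = sum(freqs)                                               # 80
Q1  = grouped_quantile(lower_bounds, freqs, 10, N / 4)         # ~40.17
Q2  = grouped_quantile(lower_bounds, freqs, 10, N / 2)         # median
Q3  = grouped_quantile(lower_bounds, freqs, 10, 3 * N / 4)     # 66.0
P10 = grouped_quantile(lower_bounds, freqs, 10, 10 * N / 100)  # 22.0
P90 = grouped_quantile(lower_bounds, freqs, 10, 90 * N / 100)  # ~76.64

SK = (Q3 - 2 * Q2 + Q1) / (Q3 - Q1)   # quartile coefficient of skewness
K  = (Q3 - Q1) / (2 * (P90 - P10))    # percentile coefficient of kurtosis
print(round(SK, 3), round(K, 3))      # K ~ 0.236
```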
Assignment 3.5: Compute the quartiles, deciles and percentiles of the following data set

Class interval | 10-19 | 20-29 | 30-39 | 40-49 | 50-59 | 60-69 | 70-79 | 80-89 | 90-99
Frequency      |  15   |  20   |  12   |  13   |  14   |  25   |  17   |  25   |  11

SOLUTION

SOLUTION
Chapter Four
4.1 Introduction to Probability
The subject of probability theory is an aspect of mathematics that deals with the study of the possible outcomes of a given event, together with their relative likelihoods and distributions.
Probability is a numerical measure of the likelihood that a specific event will occur.
Suppose that an event E can happen in w ways out of a total of n possible equally likely ways. Then, the probability of occurrence of the event (called its success) is

$$p = \Pr\{E\} = \frac{\text{number of ways } E \text{ can occur}}{\text{total number of possible outcomes}} = \frac{w}{n}$$

The probability of nonoccurrence of the event (called its failure) is

$$q = \Pr\{\text{not } E\} = \frac{n - w}{n} = 1 - \frac{w}{n} = 1 - p = 1 - \Pr\{E\}$$

Thus, $\Pr\{E\} + \Pr\{\text{not } E\} = 1$; that is, $p + q = 1$.

4.2 Definitions of terms


In this section, we shall define some terms that will help us to understand the basic subject of probability.
a. Experiment: An experiment is a process that produces a result or an observation, e.g. tossing a coin or throwing a die. When a coin is tossed, it can appear as a head (H) or a tail (T).

Figure 4.1: Tree diagram for one toss of a coin.


The sample space is given as {H, T}, and the number of sample points is n(S) = 2. However, if the coin is tossed twice, the sample space will be {HH, HT, TH, TT}, and the number of sample points is n(S) = 4.

Figure 4.2: Tree diagram for two tosses of a coin.
Suppose we throw a die once; we can obtain 1, 2, 3, 4, 5, or 6. The number of sample points is n(S) = 6. However, if the die is rolled two times, the possible sums are

+ | 1 | 2 | 3 | 4  | 5  | 6
1 | 2 | 3 | 4 | 5  | 6  | 7
2 | 3 | 4 | 5 | 6  | 7  | 8
3 | 4 | 5 | 6 | 7  | 8  | 9
4 | 5 | 6 | 7 | 8  | 9  | 10
5 | 6 | 7 | 8 | 9  | 10 | 11
6 | 7 | 8 | 9 | 10 | 11 | 12

The set of possible sums is S = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, with counts

x    | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | Total
n(x) | 1 | 2 | 3 | 4 | 5 | 6 | 5 | 4 | 3  | 2  | 1  | 36

Thus, n(S) = 36
b. Outcome: The outcome of any experiment is the particular result obtained when an
experiment is performed.
c. Sample space: A sample space is a set of all possible outcomes of an experiment.
d. Sample points: These are the individual outcomes in a sample space.
e. Event: An event is any subset of the sample space. The number of sample points of A is
denoted by n(A).
f. Equally Likely Outcomes: Two or more outcomes that have the same probability of
occurrence are said to be equally likely outcomes.

4.3 Properties of Probability


There are two important properties of probability that we should always remember. These properties are
mentioned below.

1. The probability of an event always lies in the range 0 to 1.
Whether it is a simple or a compound event, the probability of an event is never less than 0 or greater than
1. We can write this property as follows.

$$0 \le P(E_i) \le 1, \qquad 0 \le P(A) \le 1$$
An event that cannot occur has zero probability and is called an impossible (or null) event.
An event that is certain to occur has a probability equal to 1 and is called a sure (or certain) event.
In the following examples, the first event is an impossible event and the second one is a sure event.

P(a tossed coin will stand on its edge) = 0


P(a child born today will eventually die) = 1.0

There are very few events in real life that have probability equal to either zero or 1.0. Most of the events
in real life have probabilities that are between zero and 1.0. In other words, these probabilities are greater
than zero but less than 1.0. A higher probability such as .82 indicates that the event is more likely to
occur. On the other hand, an event with a lower probability such as .12 is less likely to occur. Sometimes, events with very low probabilities (.05 or lower) are also called rare events.

2. The sum of the probabilities of all simple events (or final outcomes) for an experiment, denoted by $\sum_{i=1}^{n} P(E_i)$, is always 1.

Example 4.1: Find the probability of obtaining an even number in one roll of a die.
This experiment of rolling a die once has a total of six outcomes: 1, 2, 3, 4, 5, and 6.
Given that the die is fair, these outcomes are equally likely. Let A be an event that an even number is
observed on the die. Event A includes three outcomes: 2, 4, and 6; that is,
A = an even number is obtained = {2, 4, 6}.
If any one of these three numbers is obtained, event A is said to occur. Since three out of six outcomes are
included in the event that an even number is obtained, its probability is:

P A 
Number of outcomes included in A 3
  0.5
Total number of outcome 6

4.4 Some Probability Laws


There are several useful laws of probability properties. We shall consider some of them
a. Mutually Exclusive Events: In statistics, two or more events are said to be mutually exclusive if the occurrence of any one of them excludes the occurrence of the others. That is, the two events cannot happen at the same time. Hence, $\Pr\{E_1 E_2\} = 0$.
However, if $E_1 \cup E_2$ is the event that either $E_1$ or $E_2$ or both occur, then

$$\Pr\{E_1 \cup E_2\} = \Pr\{E_1\} + \Pr\{E_2\} - \Pr\{E_1 E_2\}$$

This can be extended to 3 events:

$$\Pr\{E_1 \cup E_2 \cup E_3\} = \Pr\{E_1\} + \Pr\{E_2\} + \Pr\{E_3\} - \Pr\{E_1 E_2\} - \Pr\{E_1 E_3\} - \Pr\{E_2 E_3\} + \Pr\{E_1 E_2 E_3\}$$

Nevertheless, for mutually exclusive events,

$$\Pr\{E_1 \cup E_2\} = \Pr\{E_1\} + \Pr\{E_2\}$$


This is often called the addition law of probability.

Example 4.2: If E1 is the event of drawing an ace from a deck of cards and E2 is the event of drawing a
king. Obtain the probability of either drawing an ace or a king in a single draw.
There are 4 aces and 4 kings in a deck of 52 cards. Thus, the probability of drawing an ace is

$$\Pr\{E_1\} = \frac{4}{52}$$

Also, the probability of drawing a king is

$$\Pr\{E_2\} = \frac{4}{52}$$

Since the events are mutually exclusive,

$$\Pr\{E_1 \cup E_2\} = \Pr\{E_1\} + \Pr\{E_2\} = \frac{4}{52} + \frac{4}{52} = \frac{1}{13} + \frac{1}{13} = \frac{2}{13}$$

Example 4.3: If E1 is the event of drawing an ace from a deck of cards and E2 is the event of drawing a
spade. Obtain the probability of either drawing an ace or a spade in a single draw.
The probability of drawing an ace is $\Pr\{E_1\} = \frac{4}{52}$, and the probability of drawing a spade is $\Pr\{E_2\} = \frac{13}{52}$. Since the ace of spades is both an ace and a spade, $\Pr\{E_1 E_2\} = \frac{1}{52}$.

Thus,

$$\Pr\{E_1 \cup E_2\} = \Pr\{E_1\} + \Pr\{E_2\} - \Pr\{E_1 E_2\} = \frac{4}{52} + \frac{13}{52} - \frac{1}{52} = \frac{16}{52} = \frac{4}{13}$$

b. Independent Events: Suppose we have events $E_1$ and $E_2$. The two events are independent if the probability of the second event $E_2$ is not affected by the occurrence or nonoccurrence of the first event $E_1$. Thus, if $E_1$ and $E_2$ are independent events, then

$$\Pr\{E_1 \text{ and } E_2\} = \Pr\{E_1 \cap E_2\} = \Pr\{E_1\}\Pr\{E_2\}$$

This is often called the multiplication law of probability.

Example 4.4: Suppose two events $E_1$ and $E_2$ defined on a sample space are such that $\Pr\{E_2\} = 0.2$ and $\Pr\{E_1 \cup E_2\} = 0.75$. Find $\Pr\{E_1\}$ such that

1. $E_1$ and $E_2$ are independent
2. $E_1$ and $E_2$ are mutually exclusive.

Solution

1. If $E_1$ and $E_2$ are independent, then $\Pr\{E_1 \cap E_2\} = \Pr\{E_1\}\Pr\{E_2\}$. Hence,

$$\Pr\{E_1 \cup E_2\} = \Pr\{E_1\} + \Pr\{E_2\} - \Pr\{E_1\}\Pr\{E_2\}$$
$$0.75 = \Pr\{E_1\} + 0.2 - 0.2\Pr\{E_1\}$$
$$0.8\Pr\{E_1\} = 0.75 - 0.2 = 0.55$$
$$\Pr\{E_1\} = \frac{0.55}{0.8} = 0.6875$$

2. If $E_1$ and $E_2$ are mutually exclusive, then $\Pr\{E_1 \cap E_2\} = 0$. Thus,

$$\Pr\{E_1 \cup E_2\} = \Pr\{E_1\} + \Pr\{E_2\}$$
$$0.75 = \Pr\{E_1\} + 0.2$$
$$\Pr\{E_1\} = 0.75 - 0.2 = 0.55$$
4.5 Conditional Probability

If $E_1$ and $E_2$ are two events, the probability that $E_2$ occurs given that $E_1$ has occurred is denoted by $\Pr\{E_2 \mid E_1\}$ (read "$E_2$ given $E_1$") and is called the conditional probability of $E_2$ given that $E_1$ has occurred. If the occurrence or nonoccurrence of $E_1$ does not affect the probability of occurrence of $E_2$, then $\Pr\{E_2 \mid E_1\} = \Pr\{E_2\}$ and we say that $E_1$ and $E_2$ are independent events; otherwise they are dependent events.

If $E_1 E_2$ is the event that both $E_1$ and $E_2$ occur, then

$$\Pr\{E_1 E_2\} = \Pr\{E_1\}\Pr\{E_2 \mid E_1\}$$

However, $\Pr\{E_1 E_2\} = \Pr\{E_1\}\Pr\{E_2\}$ for independent events.

Example 4.5: Let E1 and E2 be the events heads on the fifth toss and head on the sixth toss of a coin
respectively. Thus, E1 and E2 are independent events, and thus the probability of heads on both the fifth
and sixth tosses is

PrE1E2   PrE1PrE2  
1 1 1
  .
2 2 4
Example 4.6: Suppose that a bag contains 3 white balls and 2 black balls. Let $E_1$ be the event that the first ball drawn is black and $E_2$ the event that the second ball drawn is black, where the balls are not replaced after being drawn from the bag. Here, $E_1$ and $E_2$ are dependent events.

The probability that the first ball drawn is black is $\Pr\{E_1\} = \frac{2}{3+2} = \frac{2}{5}$. The probability that the second ball drawn is black, given that the first ball drawn was black, is $\Pr\{E_2 \mid E_1\} = \frac{1}{3+1} = \frac{1}{4}$. Thus, the probability that both balls drawn are black is

$$\Pr\{E_1 E_2\} = \Pr\{E_1\}\Pr\{E_2 \mid E_1\} = \frac{2}{5} \times \frac{1}{4} = \frac{1}{10}$$
Example 4.7: A die is tossed two times. Compute (1) the probability of obtaining a sum of 9, and (2) the probability of getting a sum of 9 given that the number on the 2nd toss is larger than the number on the first toss.

+ 1 2 3 4 5 6
1 1,1 1,2 1,3 1,4 1,5 1,6
2 2,1 2,2 2,3 2,4 2,5 2,6
3 3,1 3,2 3,3 3,4 3,5 3,6
4 4,1 4,2 4,3 4,4 4,5 4,6
5 5,1 5,2 5,3 5,4 5,5 5,6
6 6,1 6,2 6,3 6,4 6,5 6,6

1. Pr(sum of 9) =4/36 =1/9


2. Let $E_1$ = the sum is 9 and $E_2$ = the number on the 2nd toss is larger than the number on the first toss.

$$\Pr\{E_1 \mid E_2\} = \frac{\Pr\{E_1 \cap E_2\}}{\Pr\{E_2\}}$$

Thus, $\Pr\{E_2\}$ = P(2nd toss number is larger than the first toss) = 15/36, and $\Pr\{E_1 \cap E_2\}$ = P(sum of 9 and 2nd toss number larger than the first toss) = 2/36. Hence,

$$\Pr\{E_1 \mid E_2\} = \frac{\Pr\{E_1 \cap E_2\}}{\Pr\{E_2\}} = \frac{2/36}{15/36} = \frac{2}{15}$$

4.6 Bayes' Theorem

Bayes' theorem is named after a renowned English Presbyterian minister, Reverend Thomas Bayes (1702 – 1761). It expands the conditional probability as

$$\Pr\{E_i \mid E\} = \frac{\Pr\{E_i\}\Pr\{E \mid E_i\}}{\sum_{i=1}^{n} \Pr\{E_i\}\Pr\{E \mid E_i\}}$$

where the $E_i$ form a mutually exclusive, all-inclusive (exhaustive) set of possible outcomes.


Example 4.8: A car repair firm employs three sprayers, Tom, Dick and Harry. Owing to the different speeds of their respective production lines, Tom is responsible for painting 25 percent of all the cars produced, Dick for 35 percent and Harry for the remaining 40 percent. On the basis of frequent quality inspections it is discovered that, on average, 5 percent of the cars sprayed by Tom fall below the minimum acceptable standard (as regards the quality of painting), while the corresponding figures for Dick and Harry are 8 percent and 10 percent respectively. If a car is selected at random from the firm's throughput of cars and its paint finish is judged to be sub-standard, what is the probability that it was sprayed by Harry?

Let
S = event that the car is sub-standard
T = event that a car is sprayed by Tom
D = event that a car is sprayed by Dick
H = event that a car is sprayed by Harry
Thus, the problem is to find p(H|S).

However, P(T) = 0.25 and P(S|T) = 0.05; that is, Tom sprays 25% of the cars and, of these, 5% are sub-standard on average. Similarly, for Dick and Harry:
P(D) = 0.35 and P(S|D) = 0.08,
P(H) = 0.40 and P(S|H) = 0.10.
Thus, applying Bayes' rule, the probability that a randomly selected car, found to be sub-standard, was sprayed by Harry is

$$P(H \mid S) = \frac{P(H)P(S \mid H)}{P(T)P(S \mid T) + P(D)P(S \mid D) + P(H)P(S \mid H)} = \frac{0.40 \times 0.10}{0.25 \times 0.05 + 0.35 \times 0.08 + 0.40 \times 0.10} = \frac{0.04}{0.0805} \approx 0.50$$

Example 4.9: A product is produced by Fupre Enterprises on three machines, namely M1, M2, M3. These machines produce 40%, 35% and 25% of the product respectively. Accordingly, the proportions of defective products produced by these machines are 7%, 10% and 12% respectively. Find
a. The probability that a part selected at random from the finished product is defective.
b. The probability that the defective product was produced by machine M1, M2, M3.
In the above example, let
P(M1 ) = 0.4
P(M2) = 0.35
P(M3) = 0.25
ND = Non defective product
D = Defective product
P(D|M1 ) = 0.07
P(D|M2) = 0.1
P(D|M3) = 0.12
a. P(D) = P(M1)P(D|M1) + P(M2)P(D|M2) + P(M3)P(D|M3)
= 0.4 × 0.07 + 0.35 × 0.1 + 0.25 × 0.12 = 0.028 + 0.035 + 0.03 = 0.093

b. $$P(M_i \mid D) = \frac{P(M_i \cap D)}{\sum_{i=1}^{n} P(M_i \cap D)} = \frac{P(M_i)P(D \mid M_i)}{\sum_{i=1}^{n} P(M_i)P(D \mid M_i)}$$

Thus,
The probability that the defective is from machine 1 is
$$P(M_1 \mid D) = \frac{0.4 \times 0.07}{0.093} = \frac{0.028}{0.093} = 0.3011$$
The probability that the defective is from machine 2 is
$$P(M_2 \mid D) = \frac{0.35 \times 0.1}{0.093} = \frac{0.035}{0.093} = 0.3763$$
The probability that the defective is from machine 3 is
$$P(M_3 \mid D) = \frac{0.25 \times 0.12}{0.093} = \frac{0.030}{0.093} = 0.3226$$
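A small Python sketch of this Bayes computation (the variable names are ours, not the workbook's):

```python
# Prior probabilities of each machine and conditional defect rates (Example 4.9)
priors = {"M1": 0.40, "M2": 0.35, "M3": 0.25}
defect = {"M1": 0.07, "M2": 0.10, "M3": 0.12}

# Total probability of a defective part: P(D) = sum of P(Mi) * P(D|Mi)
p_defective = sum(priors[m] * defect[m] for m in priors)
print(p_defective)  # 0.093

# Bayes' rule: posterior P(Mi|D) = P(Mi) * P(D|Mi) / P(D)
posteriors = {m: priors[m] * defect[m] / p_defective for m in priors}
print(posteriors)  # {'M1': 0.3011..., 'M2': 0.3763..., 'M3': 0.3226...}
```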

4.7 Permutations and Combinations


4.7.1 Permutations

Permutations refer to the number of ways in which a set of objects can be arranged in order (the order being crucial); the key words for permutation are order or arrangement. The number of possible arrangements of n distinct objects is n!, where $n! = n(n-1)(n-2)(n-3)\cdots 2 \times 1$ is called n factorial. $^{n}P_r$ denotes the number of permutations of r objects out of a total of n objects. Mathematically,

$$^{n}P_r = \frac{n!}{(n-r)!}$$

Example 4.10: Compute $^{n}P_r$ if n = 5 and r = 2.

$$^{5}P_2 = \frac{n!}{(n-r)!} = \frac{5!}{(5-2)!} = \frac{5 \times 4 \times 3 \times 2 \times 1}{3 \times 2 \times 1} = 20$$

Example 4.11: In how many ways can 3 persons sit on 6 seats in a row?
In this example, n = 6, r = 3:

$$^{6}P_3 = \frac{n!}{(n-r)!} = \frac{6!}{(6-3)!} = \frac{6 \times 5 \times 4 \times 3 \times 2 \times 1}{3 \times 2 \times 1} = 120$$

4.7.2 Combination
In the case of permutations, the order in which the objects are arranged is important. However, if one is interested only in which particular objects are selected when r objects are chosen from n objects, without regard to their arrangement, then the unordered selection is called a combination. In this case, the number of combinations is given by the formula:

$$^{n}C_r = \frac{^{n}P_r}{r!} = \frac{n!}{(n-r)!\,r!}$$

where $^{n}C_r$ denotes the number of combinations possible in selecting r objects from n different objects.

Example 4.12: Find the number of ways in which 3 persons can be selected from a committee of 5
people.
In this example, 3 persons can be chosen from a committee of 5 in $^{5}C_3$ ways. Thus,

$$^{5}C_3 = \frac{5!}{(5-3)!\,3!} = \frac{5!}{2!\,3!} = \frac{5 \times 4}{2!} = 10 \text{ ways}$$
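These counts can be checked directly in Python; `math.perm` and `math.comb` (available from Python 3.8) implement exactly the two formulas above:

```python
import math

print(math.perm(5, 2))  # 20  -> 5P2 (Example 4.10)
print(math.perm(6, 3))  # 120 -> 6P3 (Example 4.11)
print(math.comb(5, 3))  # 10  -> 5C3 (Example 4.12)
```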

Assignment 4.1: A product is produced by Fupre Enterprises on three machines, namely M1, M2, M3. These machines produce 20%, 65% and 15% of the product respectively. Accordingly, the proportions of defective products produced by these machines are 13%, 15% and 35% respectively. Find
a. The probability that a part selected at a random from the finished product is defective.
b. The probability of the defective product was produced by machine M1, M2, M3.

SOLUTION
Chapter Five
5.1 Discrete Probability Distribution.
In the previous chapter, we dealt with the probability of fairly simple events. However, many other problems are more complex and cannot be readily solved in the same manner; these typically involve repetitive experiments such as the inspection of components coming off an assembly line, tossing a coin, etc. A discrete probability distribution counts occurrences that have countable or finite outcomes. Discrete distributions contrast with continuous distributions, where outcomes can fall anywhere on a continuum. Common examples of discrete distributions include the binomial, Poisson, and Bernoulli distributions. The probability mass function (pmf) of a random variable x is denoted by p(x). Different discrete probability distributions have different pmfs.
5.1.1 Discrete Random Variable
A discrete random variable is a variable that can assume a set of possible values that can be quantified, counted, or listed. An example of a discrete random variable is the outcome when you roll a die.
5.2 Mean or Expectation, and Variance of discrete probability distributions.

Let the probability mass function (pmf) of a discrete random variable x be given by p(x). Then the mean or expectation of x, denoted by E(x), is given by

$$E(x) = \mu = \sum_{i=1}^{n} x_i p(x_i)$$

The variance is given as

$$Var(x) = \sigma^2 = \sum_{i=1}^{n} (x_i - \mu)^2 p(x_i) = \sum_{i=1}^{n} x_i^2 p(x_i) - \mu^2$$

The standard deviation is given as

$$\sigma = \sqrt{\sum_{i=1}^{n} (x_i - \mu)^2 p(x_i)} = \sqrt{\sum_{i=1}^{n} x_i^2 p(x_i) - \mu^2}$$

Example 5.1: Obtain the expectation of the data

s/n   | x  | p(x)                | x·p(x)
1     | 1  | 0.17                | 1 × 0.17 = 0.17
2     | 6  | 0.13                | 6 × 0.13 = 0.78
3     | 11 | 0.26                | 11 × 0.26 = 2.86
4     | 16 | 0.06                | 16 × 0.06 = 0.96
5     | 21 | 0.38                | 21 × 0.38 = 7.98
Total |    | $\sum p(x_i) = 1$   | 0.17 + 0.78 + 2.86 + 0.96 + 7.98 = 12.75

$$\mu = \sum_{i=1}^{n} x_i p(x_i) = 12.75$$

5.3 Bernoulli distribution.


A Bernoulli experiment is a random experiment in which the outcome is classified into one of two mutually exclusive and exhaustive events, say success or failure, false or true, male or female, good or bad. The Bernoulli distribution, named after the Swiss mathematician Jacob Bernoulli, is the probability distribution of a discrete random variable that can take only these two values: success and failure. If the probability of success is p, then the probability of failure is 1 − p.

The probability mass function (pmf) of the Bernoulli distribution is given as

$$f(x) = p^x (1-p)^{1-x}, \quad x = 0, 1$$

where x is the random variable associated with the Bernoulli trial. We say x has a Bernoulli distribution with parameter p if $X \sim \text{Bernoulli}(p)$. Thus, when an experiment whose outcome is a boy or a girl is performed once, it is a Bernoulli experiment.
performed once, it is a Bernoulli experiment.
5.3.1. The Expectation and Variance of the Bernoulli distribution
The mean or expectation of the Bernoulli distribution for a random variable x is obtained as

$$\mu = E(X) = \sum_{x=0}^{1} x\, p^x (1-p)^{1-x} = 0 \times p^0(1-p)^{1-0} + 1 \times p^1(1-p)^{1-1} = p$$

The variance is given as

$$\sigma^2 = \sum_{x=0}^{1} (x-p)^2 p^x (1-p)^{1-x} = (0-p)^2 p^0 (1-p)^{1-0} + (1-p)^2 p^1 (1-p)^{1-1}$$
$$= p^2(1-p) + (1-p)^2 p = p(1-p)(p + 1 - p) = p(1-p) = pq$$

The standard deviation is obtained by taking the square root of the variance:

$$\sigma = \sqrt{pq}$$

Example 5.2: If $X \sim \text{Bernoulli}(0.3)$, find the mean and standard deviation.

In the above, p = 0.3, so q = 1 − 0.3 = 0.7. Thus,

$$\mu = E(X) = p = 0.3 \quad \text{and} \quad \sigma = \sqrt{pq} = \sqrt{0.3 \times 0.7} = \sqrt{0.21} = 0.46$$

5.4 Binomial distribution.


The binomial distribution is a common probability distribution that models the probability of obtaining one of two outcomes in a given number of trials. It summarizes the number of successes when each trial has the same chance of attaining one specific outcome. Suppose the random variable X counts the number of successes in n Bernoulli trials, with possible values $x = 0, 1, 2, 3, \ldots, n$. Then the pmf of x, say f(x), is given as

$$f(x) = {^n}C_x\, p^x (1-p)^{n-x}, \quad x = 0, 1, 2, 3, \ldots, n$$

where $^{n}C_x = \dfrac{n!}{(n-x)!\,x!}$.

In summary, the number of ways of selecting x successful positions in n trials gives the binomial distribution. If X has a binomial distribution, we write $X \sim B(n, p)$ with n trials and parameter p.

Example 5.3: If X has a binomial distribution with the following parameters:
a. n = 3, x = 2, p = 0.3
b. n = 4, x = 0, p = 0.4
calculate the probability of x.

a. Using

$$f(x) = {^n}C_x\, p^x(1-p)^{n-x} = {^3}C_2 (0.3)^2(1-0.3)^{3-2} = \frac{3!}{1!\,2!}(0.3)^2(0.7) = 3 \times 0.09 \times 0.7 = 0.189$$

b. $$f(x) = {^n}C_x\, p^x(1-p)^{n-x} = {^4}C_0 (0.4)^0(1-0.4)^{4-0} = \frac{4!}{0!\,4!}(0.6)^4 = 0.1296$$
Example 5.4: A coin is loaded in such a way that it lands with a head showing 70% of the time. If it is
tossed five times, find the probability of getting.
a. 4 heads
b. No head
c. 3 heads
d. At least 3 heads
e. At most 2 heads
In the above example, p = 0.7, q = 0.3.

a. $$P(X = 4) = {^5}C_4 (0.7)^4(0.3)^{5-4} = 5(0.7)^4(0.3) = 0.36015$$
b. $$P(X = 0) = {^5}C_0 (0.7)^0(0.3)^{5-0} = 0.00243$$
c. $$P(X = 3) = {^5}C_3 (0.7)^3(0.3)^{5-3} = 10(0.7)^3(0.3)^2 = 0.3087$$
d. $P(X \ge 3) = P(X = 3) + P(X = 4) + P(X = 5)$, where $P(X = 5) = {^5}C_5 (0.7)^5(0.3)^{5-5} = 0.16807$. Thus,
$$P(X \ge 3) = 0.3087 + 0.36015 + 0.16807 = 0.83692$$
e. $$P(X \le 2) = 1 - P(X \ge 3) = 1 - 0.83692 = 0.16308$$
Example 5.5: Ten unbiased coins are tossed simultaneously. Find the probability of obtaining i) Exactly
6 heads.
In the above example, let p be the probability of a head; then p = q = 1/2 and n = 10. Thus, for x = 6 we have

$$f(x = 6) = {^{10}}C_6\, p^6 (1-p)^{10-6} = \frac{10!}{4!\,6!}\left(\frac{1}{2}\right)^6\left(\frac{1}{2}\right)^4 = \frac{105}{512}$$
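The binomial probabilities in Examples 5.3 – 5.5 can be reproduced with a few lines of Python using only the standard library (a sketch):

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial pmf: nCx * p^x * (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

print(binom_pmf(4, 5, 0.7))                          # 0.36015 (Example 5.4a)
print(sum(binom_pmf(x, 5, 0.7) for x in (3, 4, 5)))  # 0.83692 -> P(X >= 3)
print(binom_pmf(6, 10, 0.5))                         # 0.205078... = 105/512
```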

5.4.1. Some properties of the Binomial Distribution


a. Any trial has two possible outcomes: success or failure, boy or girl, male or female, good or bad.
b. The experiment is repeated n times; that is, it has n trials.
c. The probability of success is p and that of failure is q, with p + q = 1.

5.4.2. The mean and Variance of the Binomial distribution


The mean or expectation of the binomial distribution for a random variable x is obtained as

$$\mu = E(X) = \sum_{x=0}^{n} x\,{^n}C_x\, p^x q^{n-x} = \sum_{x=0}^{n} \frac{x\,n!}{x!(n-x)!}\, p^x q^{n-x} = \sum_{x=1}^{n} \frac{n!}{(x-1)!(n-x)!}\, p^x q^{n-x}$$

(for x = 0 the term is zero)

$$= np \sum_{x=1}^{n} \frac{(n-1)!}{(x-1)!(n-x)!}\, p^{x-1} q^{n-x} = np$$

The variance of the random variable x is obtained as

$$Var(x) = \sigma^2 = E(X^2) - [E(X)]^2$$

Now, consider $E[X(X-1)]$:

$$E[X(X-1)] = \sum_{x=0}^{n} x(x-1)\frac{n!}{x!(n-x)!}\, p^x q^{n-x}$$

However, since the first two terms (x = 0 and x = 1) are zero, we have

$$E[X(X-1)] = \sum_{x=2}^{n} \frac{n!}{(x-2)!(n-x)!}\, p^x q^{n-x} = n(n-1)p^2 \sum_{x=2}^{n} \frac{(n-2)!}{(x-2)!(n-x)!}\, p^{x-2} q^{n-x}$$

Suppose $k = x - 2$. Then we have

$$E[X(X-1)] = n(n-1)p^2 \sum_{k=0}^{n-2} {^{n-2}}C_k\, p^k q^{n-2-k} = n(n-1)p^2 (p+q)^{n-2} = n(n-1)p^2$$

so that $E(X^2) = E[X(X-1)] + E(X) = n(n-1)p^2 + np$.

Hence,

$$\sigma^2 = E(X^2) - [E(X)]^2 = n(n-1)p^2 + np - (np)^2 = np - np^2 = np(1-p) = npq$$

Remark: since q < 1, the variance of the binomial distribution is less than the mean.

5.5 Poisson distribution.
The Poisson distribution is a discrete probability distribution. It gives the probability of an event happening a certain number of times (k) within a given interval of time or space. The Poisson distribution has only one parameter, λ (lambda), which is the mean number of events. The Poisson distribution was derived in 1837 by a French mathematician, Simeon D. Poisson (1781 – 1840). It is the limiting case of the binomial probability distribution under the following conditions:
a. The number of trials is indefinitely large; that is, $n \to \infty$.
b. The constant probability p of success in each trial is indefinitely small; that is, $p \to 0$.
c. $np = \lambda$, say, is finite.

The probability density function (pdf) of the Poisson distribution for a random variable x is given as

$$f(x) = \Pr\{X = x\} = \frac{e^{-\lambda}\lambda^x}{x!}, \quad x = 0, 1, 2, 3, \ldots$$

Note that $e \approx 2.71828$.
Example 5.6: A building society branch manager notices that over a long period of time the number of
people using an automated cash point on a Saturday morning is on average, 30 people per hour. What is
the probability that in say a 10 minute period:
a. Nobody uses the machine?
b. Three people use the machine?
In this example, we need to obtain the mean of the Poisson distribution before we proceed. Since 30 people use the machine per 60 minutes on average, a 10-minute period corresponds to $\lambda = \frac{30}{6} = 5$. Thus,

a. $$P(X = 0) = \frac{e^{-\lambda}\lambda^x}{x!} = \frac{e^{-5}5^0}{0!} = e^{-5} = 0.0067$$

b. $$P(X = 3) = \frac{e^{-\lambda}\lambda^x}{x!} = \frac{e^{-5}5^3}{3!} = 0.1404$$
Example 5.7: The number of accidents occurring on a bridge in Agbarho in a year has a Poisson
distribution with mean 3. Compute the probability that in a year:
a. Only one accident occurred
b. At most one accident occurred
c. At least one accident occurred.
The solution of the example can be obtained as follows:

a. $$P(X = 1) = \frac{e^{-\lambda}\lambda^x}{x!} = \frac{e^{-3}3^1}{1!} = 0.149$$

b. $$P(X \le 1) = P(X = 0) + P(X = 1) = \frac{e^{-3}3^0}{0!} + \frac{e^{-3}3^1}{1!} = 0.199$$

c. $$P(X \ge 1) = 1 - P(X = 0) = 1 - \frac{e^{-3}3^0}{0!} = 1 - 0.04979 = 0.9502$$
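A short Python check of Example 5.7 (standard library only):

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Poisson pmf: e^(-lam) * lam^x / x!"""
    return exp(-lam) * lam**x / factorial(x)

print(poisson_pmf(1, 3))                      # ~0.149 (a)
print(poisson_pmf(0, 3) + poisson_pmf(1, 3))  # ~0.199 (b)
print(1 - poisson_pmf(0, 3))                  # ~0.950 (c)
```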

5.6 Poisson Approximation to Binomial.


The Poisson distribution may be used to approximate the binomial if the probability of success is "small" (such as 0.01) and the number of trials is "large" (such as 1,000), where n is the number of trials and p is the probability of a "success".

5.7 Geometric distribution.


Geometric distribution can be defined as a discrete probability distribution that represents the probability
of getting the first success after having a consecutive number of failures. A geometric distribution can
have an indefinite number of trials until the first success is obtained.
The geometric distribution is another important distribution, with pmf given as

$$f(x) = pq^{x-1}, \quad x = 1, 2, 3, \ldots$$

The mean and the variance of the geometric distribution are

$$E(X) = \mu = \frac{1}{p} \quad \text{and} \quad Var(x) = \sigma^2 = \frac{q}{p^2}$$
Example 5.8: An applicant for a driver‘s license has a probability of 75% of passing the road test on any
given independent trial. What is the probability that an applicant will finally pass the test on the fourth
try?
Let X denote the number of trials. Then p = 75% = 0.75 and q = 1 − 0.75 = 0.25. Thus,

$$f(x) = pq^{x-1} = 0.75(0.25)^{4-1} = 0.75(0.25)^3 = 0.75(0.015625) = 0.0117$$

5.8 Hypergeometric distribution.


The hypergeometric distribution is used to determine the probability of a certain number of "successes" in
a series of draws made without replacement from a fixed population. The distribution depends on the size
of the population, the number of draws, and the number of "successes" in the population. In probability
theory and statistics, the hypergeometric distribution is a discrete probability distribution that describes
the probability of k successes (random draws for which the object drawn has a specified feature) in n draws, without replacement, from a finite population of size N that contains exactly K objects with that feature, wherein each draw is either a success or a failure. In contrast, the binomial distribution describes the probability of k successes in n draws with replacement.
A random variable Y follows the hypergeometric distribution if its probability mass function (pmf) is given as

$$p(y) = \frac{\binom{r}{y}\binom{N-r}{n-y}}{\binom{N}{n}}$$

where

 N is the population size,
 r is the number of success states in the population,
 n is the number of draws (i.e. the quantity drawn in each trial),
 y is the number of observed successes,
 $\binom{a}{b}$ is a binomial coefficient.

If Y is a random variable with a hypergeometric distribution, the mean and variance of Y are given by

$$\mu = E(Y) = \frac{nr}{N} \quad \text{and} \quad \sigma^2 = Var(Y) = n\left(\frac{r}{N}\right)\left(\frac{N-r}{N}\right)\left(\frac{N-n}{N-1}\right)$$

However, if we write $p = \frac{r}{N}$ and $1 - p = \frac{N-r}{N}$, we have

$$\mu = E(Y) = np \quad \text{and} \quad \sigma^2 = Var(Y) = npq\left(\frac{N-n}{N-1}\right)$$

with $\left(\frac{N-n}{N-1}\right)$ as the finite-population adjustment.
Example 5.9: An important problem encountered by personnel directors and others faced with the
selection of the best in a finite set of elements is exemplified by the following scenario. From a group of
20 Ph.D. engineers, 10 are randomly selected for employment. What is the probability that the 10 selected
include all the 5 best engineers in the group of 20?

For this example N = 20, n = 10, and r = 5. That is, there are only 5 in the set of 5 best engineers, and we seek the probability that Y = 5, where Y denotes the number of best engineers among the ten selected. Then

$$p(5) = \frac{\binom{5}{5}\binom{15}{5}}{\binom{20}{10}} = \frac{15!}{5!\,10!}\cdot\frac{10!\,10!}{20!} = \frac{21}{1292} \approx 0.0163$$
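This value can be verified numerically in Python (sketch):

```python
from math import comb

def hypergeom_pmf(y, N, r, n):
    """P(y successes in n draws without replacement)."""
    return comb(r, y) * comb(N - r, n - y) / comb(N, n)

print(hypergeom_pmf(5, 20, 5, 10))  # ~0.0163 (Example 5.9)
```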

5.9 Negative-binomial distribution.


A negative binomial distribution is the distribution of the sum of independent geometric random variables. The number of failures before the rth success in a sequence of Bernoulli trials, where the success probability is p in each trial, is a negative binomial random variable. Put more simply, the negative binomial distribution is a discrete probability distribution that models the number of trials in a sequence of independent and identically distributed Bernoulli trials needed to reach a specified (non-random) number of successes (denoted r). For example, we can define rolling a 6 on a die as a success, and rolling any other number as a failure, and ask how many rolls will occur before we see the third success (r = 3). In such a case, the probability distribution of the number of rolls will be a negative binomial distribution.
The probability mass function of the negative binomial distribution is given as

$$f(k) = {^{k-1}}C_{r-1}\, p^r q^{k-r}$$

where k is the number of the trial on which the rth success occurs.

The mean number of failures for the negative binomial distribution is $E(X) = \mu = \dfrac{rq}{p}$, and the variance is $Var(X) = \dfrac{rq}{p^2}$.

 Since p < 1, the variance here is always greater than the mean: Variance > Mean.

Example 5.10: Jim is writing an exam with multiple-choice questions, and his probability of answering any given question correctly is 60%. What is the probability that Jim gives the third correct answer on the fifth attempted question?
Probability of success P(s) = 60% = 0.6, Probability of failure P(f) = 40% = 0.4. It is given that Jim gives
the third correct answer for the fifth attempted question. Here we can use the concept of the negative
binomial distribution to find the third correct answer for the fifth attempted question. Here we have k = 5,
r = 3, p = 0.6, q = 0.4
The formula for the negative binomial distribution gives

$$B(x, r, p) = {^{k-1}}C_{r-1}\, p^r q^{k-r} = {^{5-1}}C_{3-1}\,(0.6)^3(0.4)^2 = 6 \times 0.216 \times 0.16 = 0.20736$$

Therefore the probability of Jim giving the third correct answer on his fifth attempted question is approximately 0.21.

Example 5.11: A geological study indicates that an exploratory oil well drilled in a particular region
should strike oil with probability 0.2. Find the probability that the third oil strike comes on the fifth well
drilled.

Solution
Assuming independent drillings and probability 0.2 of striking oil with any one well, Let Y denotes the
number of the trial on which the third oil strike occurs. Then it is reasonable to assume that Y has a
negative binomial distribution with p = 0.2. Because we are interested in r = 3 and y = 5,

4
PY  5  p5   0.2  0.8  60.0080.64  0.0307
3 2

2
If r = 2, 3, 4, . . . and Y has a negative binomial distribution with success probability p, P(Y = y0) = p(y0) can be found by using the R (or S-Plus) command dnbinom(y0-r, r, p). If we wanted to use R to obtain p(5) in Example 5.11, we would use the command dnbinom(2, 3, .2). Alternatively, P(Y ≤ y0) is found by using the R (or S-Plus) command pnbinom(y0-r, r, p). Note that the first argument in these commands is the value y0 − r, not the value y0. This is because some authors prefer to define the negative binomial distribution to be that of the random variable
Y* = the number of failures before the rth success. In our formulation, the negative binomial random variable, Y, is interpreted as the number of the trial on which the rth success occurs. In Exercise 3.100, you will see that Y* = Y − r. Due to this relationship between the two versions of negative binomial random variables,
P(Y = y0) = P(Y − r = y0 − r) = P(Y* = y0 − r). R computes probabilities associated with Y*, explaining why the arguments for dnbinom and pnbinom are y0 − r instead of y0.
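In Python, Example 5.11 can be checked directly from the formula; a scipy equivalent is noted in a comment, since scipy (like R) uses the failures-before-the-rth-success convention Y* = Y − r:

```python
from math import comb

def neg_binom_pmf(k, r, p):
    """P(rth success occurs on trial k): (k-1)C(r-1) * p^r * q^(k-r)."""
    return comb(k - 1, r - 1) * p**r * (1 - p)**(k - r)

print(neg_binom_pmf(5, 3, 0.2))  # ~0.0307 (Example 5.11)

# Equivalent call with scipy's failures convention (scipy assumed installed):
# from scipy.stats import nbinom; nbinom.pmf(5 - 3, 3, 0.2)
```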

5.10 Moment Generating Functions (mgf).


In probability theory and statistics, the moment-generating function of a real-valued random variable is an
alternative specification of its probability distribution. Thus, it provides the basis of an alternative route to
analytical results compared with working directly with probability density functions or cumulative
distribution functions. There are particularly simple results for the moment-generating functions of

distributions defined by the weighted sums of random variables. However, not all random variables have
moment-generating functions.

As its name implies, the moment-generating function can be used to compute a distribution's moments: the nth moment about 0 is the nth derivative of the moment-generating function, evaluated at 0.

In addition to real-valued distributions (univariate distributions), moment-generating functions can be


defined for vector- or matrix-valued random variables, and can even be extended to more general cases.

The moment-generating function of a real-valued distribution does not always exist, unlike the
characteristic function. There are relations between the behaviour of the moment-generating function of a
distribution and properties of the distribution, such as the existence of moments.

Thus, the moment generating function $M_X(t)$ of a random variable is defined by

$$M_X(t) = E(e^{tX}) = \sum_{i} e^{tx_i} f(x_i) \ \text{(discrete case)}, \qquad M_X(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx \ \text{(continuous case)}$$

where f(x) is the pdf.

Expanding the exponential,

$$M_X(t) = E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx = \int_{-\infty}^{\infty} \left(1 + tx + \frac{t^2x^2}{2!} + \frac{t^3x^3}{3!} + \cdots\right) f(x)\,dx$$
$$= \int_{-\infty}^{\infty} f(x)\,dx + t\int_{-\infty}^{\infty} x f(x)\,dx + \frac{t^2}{2!}\int_{-\infty}^{\infty} x^2 f(x)\,dx + \frac{t^3}{3!}\int_{-\infty}^{\infty} x^3 f(x)\,dx + \cdots$$

$$M_X(t) = 1 + t\mu_1 + \frac{t^2}{2!}\mu_2 + \frac{t^3}{3!}\mu_3 + \cdots$$

where $\mu_k$ denotes the kth moment about the origin.
Example 5.12: Find the moment generating function for the continuous random variable with pdf

$$f(x) = \begin{cases} 4e^{-4x} & 0 \le x < \infty \\ 0 & \text{elsewhere} \end{cases}$$

The solution to the question is obtained as

$$M_X(t) = E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx = 4\int_0^{\infty} e^{tx} e^{-4x}\,dx = 4\int_0^{\infty} e^{-(4-t)x}\,dx = 4\left[\frac{e^{-(4-t)x}}{-(4-t)}\right]_0^{\infty}$$
$$= 4\left[0 - \left(\frac{-1}{4-t}\right)\right] = \frac{4}{4-t}, \quad t < 4$$

The mean, second moment, and third moment can now be obtained.

$$M_X(t) = \frac{4}{4-t}$$

The mean is obtained by taking the first derivative of the mgf and evaluating it at t = 0:

$$M'_X(t) = \frac{4}{(4-t)^2}, \qquad M'_X(0) = \mu_1 = \frac{4}{(4-0)^2} = \frac{1}{4}$$

The second moment is obtained by differentiating the first derivative of the mgf:

$$M''_X(t) = \frac{8}{(4-t)^3}, \qquad M''_X(0) = \mu_2 = \frac{8}{(4-0)^3} = \frac{1}{8}$$

The third moment is obtained by differentiating the second derivative of the mgf:

$$M'''_X(t) = \frac{24}{(4-t)^4}, \qquad M'''_X(0) = \mu_3 = \frac{24}{256} = \frac{3}{32}$$
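These derivatives can be checked symbolically. The sketch below (assuming the sympy library is available) rebuilds the mgf of this density and reads off the first three moments:

```python
import sympy as sp

t, x = sp.symbols("t x", real=True)
f = 4 * sp.exp(-4 * x)                                  # pdf on [0, oo)
M = sp.integrate(sp.exp(t * x) * f, (x, 0, sp.oo),
                 conds="none")                          # mgf: 4/(4 - t), t < 4
M = sp.simplify(M)

for k in (1, 2, 3):                                     # kth moment = M^(k)(0)
    print(k, sp.diff(M, t, k).subs(t, 0))               # 1/4, 1/8, 3/32
```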
Assignment 5.1

1. Given the pdf of a continuous random variable X as $f(x) = m e^{-2x}$ for $0 \le x < \infty$ and 0 elsewhere, find (a) the constant value m, (b) the expectation of X, (c) $P(1 < X < 4)$.
2. Let X be a continuous random variable with probability density function $f(x) = 2e^{-4x}$ for $0 \le x < \infty$ and 0 elsewhere. Compute (a) $E(X)$, (b) $3E(X+1)$, (c) $P(1 < X < 3)$.
3. Let X be a random variable with pdf $f(x) = \frac{2x}{25}$, $0 < x < 5$, and zero elsewhere. Obtain (a) $E(X)$, (b) $E(X^2)$, (c) $E(X^3)$, (d) $Var(X)$, (e) $E(2 + X^2)$.
4. Find the expected value of the gamma random variable using the mgf approach.
5. Find the moment generating function of the function $f(x) = \frac{1}{k}\exp\left(-\frac{x}{k}\right)$, $0 \le x < \infty$.
6. Using question 5 above, compute the mean, second moment, third moment, and the variance.
7. The daily expenses of a company have a probability density function given as $f(x) = \frac{3(1 + x^2)}{36}$ for $0 \le x \le 2$ and 0 elsewhere. What is the expected amount to be budgeted for 5 days, in thousands of naira?

8. If the moment generating function of a random variable X is given as $M_X(t) = \frac{e^{3t} - e^{2t}}{t}$, where $t \ne 0$, compute the mean, second moment, third moment, and the variance.
9. Let X denote a continuous random variable with pdf given as $f(x) = \frac{1}{3}\exp\left(-\frac{x}{3}\right)$, $0 \le x < \infty$. Obtain (a) $P(X > 9)$, (b) $P(2 < X < 4)$.
10. Find the mgf of the continuous random variable X with pdf given as $f(x) = 1$ for $0 < x < 1$ and 0 elsewhere. Compute the mean, second moment, third moment, and the variance.
11. Obtain the moment generating function of the Normal distribution
12. Obtain the moment generating function of the Gompertz distribution
13. Obtain the moment generating function of the Alpha Power Gompertz distribution
14. Obtain the moment generating function of the Alpha Power Muth-G distribution
15. Obtain the moment generating function of the exponential distribution
16. Obtain the moment generating function of the Weibull distribution
17. Obtain the moment generating function of the Alpha Power Gompertz distribution
18. Obtain the moment generating function of the Chi-Square distribution
19. Obtain the moment generating function of the Cauchy distribution
20. Obtain the moment generating function of the shifted Gompertz-G distribution

SOLUTION

SOLUTION

SOLUTION

SOLUTION

SOLUTION

SOLUTION
5.11 Probability Generating Functions
In probability theory, the probability generating function of a discrete random variable is a power series
representation (the generating function) of the probability mass function of the random variable.
Probability generating functions are often employed for their succinct description of the sequence of
probabilities Pr(X = i) in the probability mass function for a random variable X, and to make available the
well-developed theory of power series with non-negative coefficients.
The probability generating function gives an alternative method of finding the mathematical expectation
of a discrete random variable. The probability generating function helps in finding the probability
distributions and properties of the discrete random variable.
The probability generating function of a random variable X is given as

$$P(t) = E(t^X) = \sum_{x=0}^{\infty} t^x f(x)$$

Suppose we differentiate $P(t) = E(t^X) = \sum_{x=0}^{\infty} t^x f(x)$ repeatedly; we have

$$P'(t) = E\left(Xt^{X-1}\right) = \sum_{x=0}^{\infty} x t^{x-1} f(x)$$

$$P''(t) = E\left(X(X-1)t^{X-2}\right) = \sum_{x=0}^{\infty} x(x-1) t^{x-2} f(x)$$

$$P'''(t) = E\left(X(X-1)(X-2)t^{X-3}\right) = \sum_{x=0}^{\infty} x(x-1)(x-2) t^{x-3} f(x)$$

$$\vdots$$

$$P^{(k)}(t) = E\left[X(X-1)(X-2)\cdots(X-k+1)t^{X-k}\right] = \sum_{x=0}^{\infty} x(x-1)(x-2)\cdots(x-k+1)\, t^{x-k} f(x)$$
Setting t = 1, we obtain

$$P'(1) = E(X) = \mu_1$$

$$P''(1) = E[X(X-1)] = E(X^2) - E(X)$$

$$P'''(1) = E[X(X-1)(X-2)]$$

$$\vdots$$

$$P^{(k)}(1) = E[X(X-1)(X-2)\cdots(X-k+1)]$$

the kth factorial moment. Thus, we can use the derivatives of P(t) at t = 1 to find the mean and variance.

Hence,

$$P'(1) = E(X) = \mu_1 = \text{Mean}$$

Since $P''(1) = E[X(X-1)] = E(X^2) - E(X)$, we have $E(X^2) = P''(1) + E(X) = P''(1) + P'(1)$, and the variance is given as

$$Var(X) = E(X^2) - [E(X)]^2$$

Hence,

$$Var(X) = E(X^2) - [E(X)]^2 = P''(1) + P'(1) - [P'(1)]^2$$

Example 5.13: Find the pgf and the mean for the geometric random variable.

The pmf of the geometric distribution is given as $f(x) = pq^{x-1}$. Thus, the solution of the example is given as

$$P(t) = E(t^x) = \sum_{x=1}^{\infty} t^x pq^{x-1} = \frac{p}{q}\sum_{x=1}^{\infty} (qt)^x = \frac{p}{q}\left(qt + q^2t^2 + q^3t^3 + \cdots\right)$$

Since the terms in the series form an infinite geometric progression, for $|t| \le 1$ we have $|qt| < 1$ and

$$qt + q^2t^2 + q^3t^3 + \cdots = \frac{qt}{1 - qt}$$

Thus,

$$P(t) = \frac{p}{q}\left(\frac{qt}{1-qt}\right) = \frac{pt}{1-qt}$$

The mean is obtained by taking the first derivative:

$$P'(t) = \frac{d}{dt}\left(\frac{pt}{1-qt}\right) = \frac{(1-qt)p + qpt}{(1-qt)^2}$$

Now, setting t = 1, we have

$$P'(1) = \frac{(1-q)p + qp}{(1-q)^2} = \frac{p}{p^2} = \frac{1}{p}$$
Example 5.14: Find the probability generating function for the binomial random variable with parameter
n, p and obtain the mean and variance.

The pmf of the binomial random variable is given as ${^n}C_x\, p^x q^{n-x}$. Thus,

$$P(t) = E(t^x) = \sum_{x=0}^{n} t^x\,{^n}C_x\, p^x q^{n-x} = \sum_{x=0}^{n} {^n}C_x (tp)^x q^{n-x} = (q + pt)^n, \quad -\infty < t < \infty$$

The mean is obtained as

$$P'(t) = \frac{d}{dt}(q + pt)^n = np(q + pt)^{n-1}$$

Setting t = 1, we have

$$P'(1) = E(X) = np(q + p)^{n-1} = np, \quad \text{since } q + p = 1$$

The variance is obtained as

$$Var(X) = E(X^2) - [E(X)]^2 = P''(1) + P'(1) - [P'(1)]^2$$

But

$$P''(t) = \frac{d}{dt}\left[np(q + pt)^{n-1}\right] = n(n-1)p^2(q + pt)^{n-2}$$

Setting t = 1, we have

$$P''(1) = n(n-1)p^2 = n^2p^2 - np^2$$

Hence,

$$Var(X) = P''(1) + P'(1) - [P'(1)]^2 = n^2p^2 - np^2 + np - n^2p^2 = np - np^2 = np(1-p) = npq$$

Example 5.15: Find the probability generating function when the pmf of X is defined by

$$f(x) = \frac{5-x}{10}, \quad x = 1, 2, 3, 4$$

The solution is obtained as

$$P(t) = E(t^x) = \sum_{x=1}^{4} t^x f(x) = \sum_{x=1}^{4} t^x\left(\frac{5-x}{10}\right)$$

For $x = 1$: $\frac{5-1}{10} = 0.4$; for $x = 2$: $\frac{5-2}{10} = 0.3$; for $x = 3$: $\frac{5-3}{10} = 0.2$; for $x = 4$: $\frac{5-4}{10} = 0.1$. Thus,

$$P(t) = \sum_{x=1}^{4} t^x\left(\frac{5-x}{10}\right) = 0.4t + 0.3t^2 + 0.2t^3 + 0.1t^4$$

Assignment 5.2
1. Find the pgf when the pmf of a random variable X is defined by
a. $f(x) = \frac{x}{6}$, $x = 1, 2, 3$
b. $f(x) = (0.3)^x (0.7)^{1-x}$, $x = 0, 1$

SOLUTION

5.12 Sample Distribution
In this section, we present methods for finding the distributions of functions of random variables. Throughout, we will be working with functions of the variables $Y_1, Y_2, Y_3, \ldots, Y_n$ observed in a random sample selected from a population of interest.
The random variables $Y_1, Y_2, Y_3, \ldots, Y_n$ are independent and have the same distribution. Certain functions of the random variables observed in a sample are used to estimate or make decisions about unknown population parameters.
For example, suppose that we want to estimate a population mean μ. If we obtain a random sample of n observations, $y_1, y_2, y_3, \ldots, y_n$, it seems reasonable to estimate μ with the sample mean

$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

The goodness of this estimate depends on the behaviour of the random variables $Y_1, Y_2, \ldots, Y_n$ and the effect that this behaviour has on $\bar{Y}$. Notice that the random variable $\bar{Y}$ is a function of (only) the random variables $Y_1, Y_2, \ldots, Y_n$ and the (constant) sample size n. The random variable $\bar{Y}$ is therefore an example of a statistic.

A statistic is a function of the observable random variables in a sample and known constants.
Thus,

$$E(\bar{Y}) = E\left(\frac{\sum_{i=1}^{n} Y_i}{n}\right) = \sum_{i=1}^{n} E\left(\frac{Y_i}{n}\right) = \sum_{i=1}^{n} \frac{\mu}{n} = n\cdot\frac{\mu}{n} = \mu$$

$$Var(\bar{Y}) = Var\left(\frac{\sum_{i=1}^{n} Y_i}{n}\right) = \sum_{i=1}^{n} Var\left(\frac{Y_i}{n}\right) = \sum_{i=1}^{n} \frac{Var(Y_i)}{n^2} = \sum_{i=1}^{n} \frac{\sigma^2}{n^2} = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$

(using the independence of the $Y_i$).

5.13 The Central Limit Theorem


The central limit theorem states that if $\bar{X}$ is the mean of a random sample of size n drawn from a population with mean μ and finite standard deviation σ, then the distribution of $\bar{X}$ approaches a normal distribution with mean μ and standard deviation σ/√n as n becomes large. The Z-score that corresponds to $\bar{X}$ is

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

Example 5.16: A single fair die is tossed once. Let Y be the number facing up. Find the expected value
and variance of Y. Also, suppose a balanced die is tossed three times. Let Y1, Y2, and Y3 denote the number of

spots observed on the upper face for tosses 1, 2, and 3, respectively. Suppose we are interested in $\bar{Y} = \frac{Y_1 + Y_2 + Y_3}{3}$, the average number of spots observed in a sample of size 3.

$$E(Y) = \sum_i y_i p(y_i) = 1\left(\tfrac{1}{6}\right) + 2\left(\tfrac{1}{6}\right) + 3\left(\tfrac{1}{6}\right) + 4\left(\tfrac{1}{6}\right) + 5\left(\tfrac{1}{6}\right) + 6\left(\tfrac{1}{6}\right) = \frac{21}{6} = 3.5$$

The variance can be obtained as

$$Var(Y) = \sigma^2 = \sum_i y_i^2 p(y_i) - \mu^2 = \frac{1}{6}\left(1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2\right) - (3.5)^2 = 2.9167$$

For the sample mean of the three tosses,

$$E(\bar{Y}) = E\left(\frac{\sum_{i=1}^{3} Y_i}{3}\right) = 3.5 = \mu$$

$$Var(\bar{Y}) = \frac{\sigma^2}{n} = \frac{2.9167}{3} = 0.9722$$
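A small simulation sketch of this result in Python: averaging three fair-die tosses many times gives a collection of sample means whose average is near 3.5 and whose variance is near 0.9722.

```python
import random

random.seed(1)
means = [sum(random.randint(1, 6) for _ in range(3)) / 3
         for _ in range(100_000)]

m = sum(means) / len(means)
v = sum((x - m)**2 for x in means) / len(means)
print(round(m, 3), round(v, 3))  # close to 3.5 and 0.972
```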
Example 5.17: Let the random sample 2, 7, 8, 10, and 15 be drawn from a bag. Compute $E(\bar{Y})$ and $Var(\bar{Y})$.
The solution is obtained as

$$E(\bar{Y}) = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{2 + 7 + 8 + 10 + 15}{5} = 8.4$$

$$\sigma^2 = \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n} = \frac{(2-8.4)^2 + (7-8.4)^2 + (8-8.4)^2 + (10-8.4)^2 + (15-8.4)^2}{5} = \frac{89.2}{5} = 17.84$$

$$Var(\bar{Y}) = \frac{\sigma^2}{n} = \frac{17.84}{5} = 3.568$$
 

5.14 The Law of Large Numbers


Whatever the shape of the population, the law of large numbers tells us that as the sample size, say n, increases, the sample mean gets closer and closer to the population mean. In other words, the larger the sample, the more likely the sample mean is to be close to the population mean. Thus, if an experiment is repeated again and again, the probability of an event obtained from the relative frequency approaches the actual or theoretical probability.

5.14.1 The Law of Large Numbers for Binomial Trials
If X is the number of successes in n binomial trials with probability of success p on each trial, the mean is np and the variance is npq. We can deduce that the fraction of successes, $\frac{X}{n}$, provides an estimate of p, and we expect that as n becomes large the estimate moves closer to p.

Using Chebyshev's inequality, we note that

$$P\left(\left|\frac{X}{n} - p\right| \ge \epsilon\right) = P\left(|X - np| \ge n\epsilon\right), \quad \text{where } \epsilon > 0$$

More generally, Chebyshev's inequality states that $P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}$. Substituting the sample mean $\bar{X}$ for X, with variance $s^2 = \frac{\sigma^2}{n}$, we have

$$P\left(|\bar{X} - \mu| \ge k\frac{\sigma}{\sqrt{n}}\right) \le \frac{1}{k^2}$$

Let $\epsilon = k\frac{\sigma}{\sqrt{n}}$, so that $k = \frac{\epsilon\sqrt{n}}{\sigma}$. Then

$$P\left(|\bar{X} - \mu| \ge \epsilon\right) \le \frac{\sigma^2}{n\epsilon^2}$$

Next, taking the limit as n tends to infinity, we have

$$\frac{\sigma^2}{n\epsilon^2} \to 0 \quad \text{as } n \to \infty$$

Therefore,

$$\lim_{n\to\infty} P\left(|\bar{X} - \mu| \ge \epsilon\right) = 0$$

Also, since

$$P\left(|\bar{X} - \mu| < \epsilon\right) = 1 - P\left(|\bar{X} - \mu| \ge \epsilon\right) \to 1 - 0 \quad \text{as } n \to \infty$$

we have

$$P\left(|\bar{X} - \mu| < \epsilon\right) \to 1$$

Thus, the proof shows that for large n, the sample mean $\bar{X}$ will be very close to, i.e. converges in probability to, the population mean μ.

Example 5.18: Let $\bar{X}$ be the mean of a random sample of size 20 from a distribution whose pdf is $f(x) = \frac{3x^2}{8}$, $0 \le x \le 2$. Find

1. $P(\bar{X} \le 1.3)$
2. $P(\bar{X} \ge 1.42)$
3. $P(1.2 \le \bar{X} \le 1.45)$

The solution is obtained as follows:

From the pdf $f(x) = \frac{3x^2}{8}$, $0 \le x \le 2$, the mean is obtained as

$$\mu = \int_0^2 x\cdot\frac{3x^2}{8}\,dx = \frac{3}{8}\int_0^2 x^3\,dx = \frac{3}{8}\left[\frac{x^4}{4}\right]_0^2 = \frac{12}{8} = 1.5$$

and also

$$E(X^2) = \int_0^2 x^2\cdot\frac{3x^2}{8}\,dx = \frac{3}{8}\int_0^2 x^4\,dx = \frac{3}{8}\left[\frac{x^5}{5}\right]_0^2 = \frac{12}{5}$$

Thus,

$$Var(X) = \frac{12}{5} - (1.5)^2 = 2.4 - 2.25 = 0.15$$

Therefore $\mu = 1.5$ and $\sigma^2 = 0.15$, so $\sigma_{\bar{X}} = \sqrt{0.15/20} = 0.0866$.

1. $$P(\bar{X} \le 1.3) = P\left(Z \le \frac{1.3 - 1.5}{0.0866}\right) = P(Z \le -2.31) = 0.0104$$

2. $$P(\bar{X} \ge 1.42) = 1 - P\left(Z \le \frac{1.42 - 1.5}{0.0866}\right) = 1 - P(Z \le -0.92) = 1 - 0.1788 = 0.8212$$

3. $$P(1.2 \le \bar{X} \le 1.45) = P\left(\frac{1.2 - 1.5}{0.0866} \le Z \le \frac{1.45 - 1.5}{0.0866}\right) = P(-3.46 \le Z \le -0.58)$$
$$= P(Z \le -0.58) - P(Z \le -3.46) = 0.2810 - 0.0003 = 0.2807$$

Example 5.19: Let $X_1, X_2, \ldots, X_{25}$ denote a random sample of size 25 from a uniform distribution having $E(X_i) = \frac{1}{5}$ and $Var(X_i) = \frac{1}{10}$. If Y is the sum of the random sample, that is, $Y = X_1 + X_2 + \cdots + X_{25}$, compute

a. $P(Y \le 6.35)$
b. $P(4.25 \le Y \le 5.90)$

By the central limit theorem,

$$Z = \frac{\sum_{i=1}^{25} x_i - n\mu}{\sqrt{n\sigma^2}}$$

with $n = 25$, $\mu = \frac{1}{5}$, $\sigma^2 = \frac{1}{10}$, so $n\mu = 5.0$ and $\sqrt{n\sigma^2} = \sqrt{2.5} = 1.581$.

a. The probability that the sum of the random sample will be less than or equal to 6.35 is

$$P(Y \le 6.35) = P\left(Z \le \frac{6.35 - 5.0}{1.581}\right) = P(Z \le 0.85) = 0.8023$$

b. $$P(4.25 \le Y \le 5.90) = P\left(\frac{4.25 - 5.0}{1.581} \le Z \le \frac{5.90 - 5.0}{1.581}\right) = P(-0.47 \le Z \le 0.57)$$
$$= P(Z \le 0.57) - P(Z \le -0.47) = 0.7157 - 0.3192 = 0.3965$$
Assignment 5.3
1. Suppose the heights of workers are normally distributed with $\mu = 1.69$ m and $\sigma = 0.32$. Compute
i. The probability that the sum of the heights of a group of 16 workers will be greater than 1.55 m
ii. The probability that the sum of the heights of 20 workers will be between 29.6 m and 34.5 m
iii. The probability that the sum of the heights of a group of 16 workers will be less than 1.55 m.

2. Find $P(\bar{x} > 48)$ for a sample of size 81 drawn from a population with mean 45 and standard deviation 9.

SOLUTION

SOLUTION
Chapter Six
6.1 Introduction to Continuous Probability Distributions
The previous chapter dealt with discrete random variables. In this chapter, we consider continuous random variables. A random variable X is said to be continuous if it takes on any value in an interval, so that its set of possible values is uncountable; that is, $-\infty < x < \infty$. A good example is the height of statistics students in the Department of Statistics at the Federal University of Petroleum Resources, Effurun, Delta State, which can take any of an uncountable infinity of values in an interval of real numbers.
6.1.1 Probability Distribution
Suppose X is a random variable. Then the (cumulative) distribution function for the continuous random variable X is defined as

$$F(x) = \int_{-\infty}^{x} f(t)\,dt$$

However, the probability that X will fall within the interval (a, b) is given as

$$P(a \le X \le b) = \int_{a}^{b} f(x)\,dx$$

The probability density function is a theoretical model for the frequency distribution (histogram) of a
population of measurements. For example, observations of the lengths of life of washers of a particular
brand will generate measurements that can be characterized by a relative frequency histogram.
Conceptually, the experiment could be repeated ad infinitum, thereby generating a relative frequency
distribution (a smooth curve) that would characterize the population of interest to the manufacturer. This
theoretical relative frequency distribution corresponds to the probability density function for the length of
life of a single machine, Y. This is shown in Figure 6.1.

Figure 6.1: The distribution function

Features of the probability distribution function

a. $0 \le F(x) \le 1$
b. $f(x)$ is a non-negative function
c. $F(-\infty) = 0$ and $F(\infty) = 1$
d. $F(x_b) \ge F(x_a)$ if $x_b \ge x_a$ (F is non-decreasing)
e. $F(x) = \int_{-\infty}^{x} f(t)\,dt$ is the cumulative distribution function
f. $\int_{-\infty}^{\infty} f(x)\,dx = 1$
g. $f(x) = \frac{dF(x)}{dx} = F'(x)$
Example 6.1: A function f(x) is defined as $f(x) = kx^2$, $0 \le x \le 3$. Find the constant k, F(x), and $P(1 \le X \le 3)$.

The constant k is obtained from $\int_0^3 kx^2\,dx = 1$:

$$\int_0^3 kx^2\,dx = k\left[\frac{x^3}{3}\right]_0^3 = k\left(\frac{3^3}{3} - \frac{0^3}{3}\right) = 9k = 1 \quad\Rightarrow\quad k = \frac{1}{9}$$

The cumulative distribution function is obtained as

$$F(x) = \int_{-\infty}^{x} f(t)\,dt = \frac{1}{9}\int_0^x t^2\,dt = \frac{1}{9}\left[\frac{t^3}{3}\right]_0^x = \frac{x^3}{27}, \quad 0 \le x \le 3$$

$$P(1 \le X \le 3) = \int_1^3 f(x)\,dx = \frac{1}{9}\int_1^3 x^2\,dx = \frac{1}{9}\left[\frac{x^3}{3}\right]_1^3 = \frac{1}{9}\left(\frac{27}{3} - \frac{1}{3}\right) = \frac{26}{27}$$
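A sympy sketch (library assumed available) verifying k, F(x), and the interval probability of Example 6.1:

```python
import sympy as sp

x, t, k = sp.symbols("x t k", positive=True)

k_val = sp.solve(sp.integrate(k * t**2, (t, 0, 3)) - 1, k)[0]  # k = 1/9
F = sp.integrate(k_val * t**2, (t, 0, x))                      # F(x) = x^3/27
prob = sp.integrate(k_val * t**2, (t, 1, 3))                   # P(1 <= X <= 3)
print(k_val, F, prob)  # 1/9, x**3/27, 26/27
```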

Example 6.2: Let

$$F(y) = \begin{cases} 0 & \text{for } y < 0, \\ y & \text{for } 0 \le y \le 1, \\ 1 & \text{for } y > 1. \end{cases}$$

Find the probability density function for Y and graph it.

Note that the function f(y) is the derivative of F(y). Thus,

$$f(y) = F'(y) = \frac{dF(y)}{dy} = \begin{cases} \frac{d(0)}{dy} = 0 & \text{for } y < 0, \\[4pt] \frac{d(y)}{dy} = 1 & \text{for } 0 < y < 1, \\[4pt] \frac{d(1)}{dy} = 0 & \text{for } y > 1, \end{cases}$$

and f(y) is undefined at y = 0 and y = 1. A graph of F(y) is shown in Figure 6.3.

Figure 6.3: Distribution function F(y)

Figure 6.4: Density function f(y)

The graph of f(y) for Example 6.2 is shown in Figure 6.4. Notice that the distribution and density functions given in Example 6.2 have all the properties required of distribution and density functions, respectively. Moreover, F(y) is a continuous function of y, but f(y) is discontinuous at the points y = 0, 1. In general, the distribution function for a continuous random variable must be continuous, but the density function need not be everywhere continuous.

Example 6.3: Let Y be a continuous random variable with probability density function given by

$$f(y) = \begin{cases} 3y^2 & 0 \le y \le 1, \\ 0 & \text{elsewhere.} \end{cases}$$

Find F(y). Graph both f(y) and F(y).

The solution is obtained by finding $F(y) = \int_{-\infty}^{y} f(t)\,dt$. Thus,

$$F(y) = \begin{cases} \displaystyle\int_{-\infty}^{y} 0\,dt = 0, & \text{for } y < 0, \\[6pt] \displaystyle\int_{-\infty}^{0} 0\,dt + \int_{0}^{y} 3t^2\,dt = \big[t^3\big]_0^y = y^3, & \text{for } 0 \le y \le 1, \\[6pt] \displaystyle\int_{-\infty}^{0} 0\,dt + \int_{0}^{1} 3t^2\,dt + \int_{1}^{y} 0\,dt = 1, & \text{for } y > 1. \end{cases}$$

Notice that some of the integrals that we evaluated yield a value of 0. These are included for
completeness in this initial example. In future calculations, we will not explicitly display any integral that
has value 0. The graph of F  y  is given in Figure 6.5.

Figure 6.5: Distribution function F(y)

If the random variable Y has density function f(y) and a < b, then the probability that Y falls in the interval [a, b] is

$$P(a \le Y \le b) = F(b) - F(a) = \int_a^b f(y)\,dy$$

This probability is the shaded area in Figure 6.6.

Figure 6.6: The probability $P(a \le Y \le b)$ as an area under the density function

Example 6.4: Let Y be a random variable with p(y) given in the table below.

y    | 1   | 2   | 3   | 4
p(y) | 0.4 | 0.3 | 0.2 | 0.1

a. Give the distribution function, F(y). Be sure to specify the value of F(y) for all y, $-\infty < y < \infty$.
b. Sketch the distribution function given in part (a).

The solution is obtained as

a. F(y) = \begin{cases} 0 & y < 1 \\ 0.4 & 1 <= y < 2 \\ 0.7 & 2 <= y < 3 \\ 0.9 & 3 <= y < 4 \\ 1 & y \ge 4 \end{cases}

b. The graph of the distribution function is a step function (the original figure is not reproduced here); an R sketch follows.
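A minimal R sketch of this step-function graph (using base R's stepfun; the styling choices are illustrative):

# Step-function graph of F(y) for Example 6.4
F <- stepfun(x = c(1, 2, 3, 4), y = c(0, 0.4, 0.7, 0.9, 1.0))
plot(F, verticals = FALSE, pch = 16, main = "", xlab = "y", ylab = "F(y)")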

Assignment 6.1
1. A box contains five keys, only one of which will open a lock. Keys are randomly selected and tried, one at a time, until the lock is opened (keys that do not work are discarded before another is tried). Let Y be the number of the trial on which the lock is opened.
a. Find the probability function for Y.
b. Give the corresponding distribution function.
c. What is P(Y < 3)? P(Y <= 3)? P(Y = 3)?
d. If Y is a continuous random variable, we argued that, for all -\infty < a < \infty, P(Y = a) = 0. Do any of your answers in part (c) contradict this claim? Why?

SOLUTION
6.2 Mean or Expectation, and Variance of continuous probability distributions
The next step in the study of continuous random variables is to find their means, variances, and standard deviations, thereby acquiring numerical descriptive measures associated with their distributions. Many times it is difficult to find the probability distribution for a random variable Y or a function of a random variable, g(Y). Even if the density function for a random variable is known, it can be difficult to evaluate the appropriate integrals (we will see this to be the case when a random variable has a gamma distribution, Section 6.6). When we encounter these situations, the approximate behavior of variables of interest can be established by using their moments and the empirical rule or Tchebysheff's theorem.

The expectation of a random variable X is given as

E(X) = \int_{-\infty}^{\infty} x f(x)\,dx,

provided that the integral exists, and where the function f(x) is the probability density function.

In more general form, let g(X) be a function of X; then the expected value of g(X) is given as

E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx.

The variance of a random variable X is given as

Var(X) = \sigma^2 = E[(X - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx = E(X^2) - [E(X)]^2 = E(X^2) - \mu^2.

Example 6.5: Suppose that the length of the wood is a continuous random variable with pdf given as f(x) = \frac{3}{2}x^2 + x, 0 <= x <= 1, and f(x) = 0 elsewhere. Find the expected value and the variance.

The solution is obtained as

a. The expectation is given as

E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_0^1 x\left(\frac{3}{2}x^2 + x\right) dx = \int_0^1 \left(\frac{3}{2}x^3 + x^2\right) dx = \left[\frac{3}{2}\cdot\frac{x^4}{4} + \frac{x^3}{3}\right]_0^1 = \frac{3}{8} + \frac{1}{3} = \frac{9 + 8}{24} = \frac{17}{24} \approx 0.708.

b. The variance is given as Var(X) = \sigma^2 = E(X^2) - \mu^2. Thus, we obtain

E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx = \int_0^1 x^2\left(\frac{3}{2}x^2 + x\right) dx = \int_0^1 \left(\frac{3}{2}x^4 + x^3\right) dx = \left[\frac{3}{2}\cdot\frac{x^5}{5} + \frac{x^4}{4}\right]_0^1 = \frac{3}{10} + \frac{1}{4} = \frac{11}{20} = 0.55.

Thus,

Var(X) = \sigma^2 = E(X^2) - \mu^2 = 0.55 - (0.708)^2 = 0.55 - 0.5013 = 0.0487.
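These moments can be confirmed numerically in R (a minimal sketch; integrate() is base R):

# Numerical check of Example 6.5
f <- function(x) (3/2) * x^2 + x
integrate(f, 0, 1)$value                              # equals 1: f is a valid pdf
EX  <- integrate(function(x) x   * f(x), 0, 1)$value  # 17/24, about 0.708
EX2 <- integrate(function(x) x^2 * f(x), 0, 1)$value  # 11/20 = 0.55
EX2 - EX^2                                            # variance, about 0.0487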

6.2.1 Some properties of the mean and variance of random variables X and Y, where a and b are constants:
a. E(a) = a
b. E(X + b) = E(X) + b
c. E(aX) = aE(X)
d. E(aX + bY) = aE(X) + bE(Y)
e. Var(a) = 0
f. Var(aX) = a^2 Var(X)
g. Var(X + b) = Var(X)

Assignment 6.2

Let X be a random variable with pdf given as p(x) = 4 - 2x, 1 <= x <= 2, and zero elsewhere. Find E(6X), E(X^2), and Var(X + 10).

SOLUTION

6.3 Normal distribution
The most widely used continuous probability distribution is the normal distribution, a distribution with the familiar bell shape that was discussed in connection with the empirical rule. The examples and exercises in this section illustrate some of the many random variables that have distributions that are closely approximated by a normal probability distribution. In Chapter 7 we will present an argument that at least partially explains the common occurrence of normal distributions of data in nature. The normal density function is as follows:
A random variable Y is said to have a normal probability distribution if and only if, for \sigma > 0 and -\infty < \mu < \infty, the density function of Y is

f(y) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^2\right], \quad -\infty < y < \infty.

The normal density function contains two parameters, \mu and \sigma. Figure 6.6 shows the normal density function.

Figure 6.6: The normal Probability density function


Areas under the normal density function corresponding to P(a <= Y <= b) require evaluation of the integral

\int_a^b \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^2\right] dy.

Unfortunately, a closed-form expression for this integral does not exist; hence, its evaluation requires the use of numerical integration techniques.
The normal density function is symmetric around the value \mu, so areas need be tabulated on only one side of the mean. The tabulated areas are to the right of points z, where z is the distance from the mean, measured in standard deviations. This area is shaded in Figure 6.7.

Figure 6.7: Tabulated area for the normal Probability density function

6.3.1 Properties of the Normal Distribution
1. The normal distribution is symmetric about the mean \mu
2. It is centered at the mean \mu
3. \int_{-\infty}^{\infty} f(x)\,dx = 1; that is, the total area under the normal distribution curve is 1
4. Approximately 68% of a normal population lies within 1 standard deviation of the mean
5. Approximately 95% of a normal population lies within 2 standard deviations of the mean
6. The probability that a randomly selected member X of a normal population lies between two values x_L and x_R, P(x_L < X < x_R), is precisely equal to the area under the normal curve between x_L and x_R.

If X is a normally distributed random variable with parameters \mu and \sigma, then the mean and variance of X are given as

E(X) = \mu and Var(X) = \sigma^2.

The moment generating function of the normal distribution is given as

M_X(t) = \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right).

We can always transform a normal random variable X to a standard normal random variable Z by using the relationship

Z = \frac{X - \mu}{\sigma}.

Thus,

f(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right), \quad -\infty < z < \infty.

This is often called the standard normal distribution, with mean zero and standard deviation 1.

6.3.2 Some important features of the standard normal distribution

a. P(Z > a) = 1 - P(Z <= a).
b. P(Z <= -a) = 1 - P(Z <= a).
c. P(a <= Z <= b) = P(Z <= b) - P(Z <= a).
d. P(Z \ge -a) = 1 - P(Z <= -a) = P(Z <= a).

Example 6.6: Let Z denote a normal random variable with mean 0 and standard deviation 1.

a. P(Z <= 2.6) = 0.9953
b. P(Z > 2.6) = 1 - P(Z <= 2.6) = 1 - 0.9953 = 0.0047
c. P(Z \ge -2.6) = P(Z <= 2.6) = 0.9953
d. P(0.5 <= Z <= 2.5) = P(Z <= 2.5) - P(Z <= 0.5) = 0.9938 - 0.6915 = 0.3023
e. P(-2 <= Z <= 2) = P(Z <= 2) - P(Z <= -2) = 1 - 2[1 - P(Z <= 2)] = 1 - 2(0.0228) = 0.9544. The area is shown in Figure 6.8.

Figure 6.8: Desired area


Example 6.7: The achievement scores for a college entrance examination are normally distributed with mean 75 and standard deviation 10. What fraction of the scores lies between 80 and 90?

Recall that z is the distance from the mean of a normal distribution expressed in units of standard deviation. Thus, we convert the random variable to the desired z score using

Z = \frac{X - \mu}{\sigma}.

Thus, the desired fraction of the population is given by the area between

z_1 = \frac{80 - 75}{10} = 0.5 \quad \text{and} \quad z_2 = \frac{90 - 75}{10} = 1.5.

Figure 6.9 gives the shape.

Figure 6.9: Required area

P(Z <= 1.5) - P(Z <= 0.5) = 0.9332 - 0.6915 = 0.2417


Assignment 6.3
1. Assuming that the height of maize, X, is normally distributed with mean 300 cm and standard deviation 60 cm, what is the probability that a randomly selected maize plant's
a. height is greater than 360 cm?
b. height is between 150 cm and 400 cm?
c. height is between 240 cm and 300 cm?
d. height is less than 400 cm?
2. If Z is a standard normal random variable, find the value z0 such that
a. P(Z > z0) = .5.
b. P(Z < z0) = .8643.
c. P(−z0 < Z < z0) = .90.
d. P(−z0 < Z < z0) = .99.

SOLUTION
6.4 Uniform distribution
In probability theory and statistics, the continuous uniform distributions or rectangular distributions are a family of symmetric probability distributions. Such a distribution describes an experiment where there is an arbitrary outcome that lies between certain bounds. The bounds are defined by the parameters a and b, which are the minimum and maximum values. The interval can either be closed (i.e. [a, b]) or open (i.e. (a, b)). Therefore, the distribution is often abbreviated U(a, b), where U stands for uniform distribution. The difference between the bounds defines the interval length; all intervals of the same length on the distribution's support are equally probable. It is the maximum entropy probability distribution for a random variable X under no constraint other than that it is contained in the distribution's support.
The random variable X is said to have a uniform distribution on the interval [a, b] if its pdf is

f(x) = \frac{1}{b-a}, \quad a <= x <= b.
The figure below shows the uniform distribution with parameters a and b, real numbers such that b > a.

Figure 6.10: Probability density function diagram of the uniform distribution

The mean of the uniform distribution is obtained as

E(X) = \int_a^b \frac{x}{b-a}\,dx = \frac{1}{b-a}\left[\frac{x^2}{2}\right]_a^b = \frac{b^2 - a^2}{2(b-a)} = \frac{(b-a)(b+a)}{2(b-a)} = \frac{a+b}{2}.

Example 6.7: Arrivals of customers at a checkout counter follow a Poisson distribution. It is known that, during a given 30-minute period, one customer arrived at the counter. Find the probability that the customer arrived during the last 5 minutes of the 30-minute period.
As just mentioned, given one arrival, the actual time of arrival follows a uniform distribution over the interval (0, 30). If Y denotes the arrival time, then

P(25 <= Y <= 30) = \int_{25}^{30} \frac{1}{30}\,dy = \frac{30 - 25}{30} = \frac{5}{30} = \frac{1}{6}.
The probability of the arrival occurring in any other 5-minute interval is also 1/6.
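The same probability can be checked with base R's punif() (a minimal sketch):

# P(25 <= Y <= 30) for Y ~ U(0, 30)
punif(30, min = 0, max = 30) - punif(25, min = 0, max = 30)   # 1/6, about 0.1667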
Assignment 6.4

1. Show that the variance of the uniform distribution is given as Var(X) = \frac{(b-a)^2}{12}.
2. Show that the mgf of the uniform distribution is M_X(t) = \frac{\exp(bt) - \exp(at)}{t(b-a)}.
SOLUTION

6.5 Exponential distribution
The exponential distribution for the continuous random variable is closely related to the Poisson distribution of the discrete case.
A random variable X has an exponential distribution if and only if its probability density function is defined as

f(x) = \begin{cases} \lambda \exp(-\lambda x), & x \ge 0 \\ 0, & \text{elsewhere.} \end{cases}

However, in some cases, it can be represented as

f(x) = \begin{cases} \frac{1}{\beta} \exp\left(-\frac{x}{\beta}\right), & x \ge 0 \\ 0, & \text{elsewhere.} \end{cases}

If X is a random variable that has an exponential distribution, then the moment generating function, mean and variance of X are given as

M_X(t) = \frac{1}{1 - \beta t}, \quad \mu = E(X) = \beta \quad \text{and} \quad \sigma^2 = Var(X) = \beta^2.
Example 6.8: Suppose X has an exponential distribution with mean 20. Find

a. P(X <= 18)
b. P(X > 18)

The solution is as follows:

a. P(X <= 18) = \int_0^{18} \frac{1}{20} \exp\left(-\frac{x}{20}\right) dx = \left[-\exp\left(-\frac{x}{20}\right)\right]_0^{18} = \exp\left(-\frac{0}{20}\right) - \exp\left(-\frac{18}{20}\right) = 1 - e^{-18/20} = 1 - 0.4066 = 0.5934

b. P(X > 18) = 1 - P(X <= 18) = 1 - 0.5934 = 0.4066.
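A quick check in base R, noting that a mean of 20 corresponds to rate \lambda = 1/20 (equivalently scale \beta = 20):

# Exponential probabilities for Example 6.8
pexp(18, rate = 1/20)                        # P(X <= 18) = 0.5934
pexp(18, rate = 1/20, lower.tail = FALSE)    # P(X > 18)  = 0.4066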

6.6 Gamma distribution.


Some random variables are always nonnegative and for various reasons yield distributions of data that are
skewed (non-symmetric) to the right. That is, most of the area under the density function is located near
the origin, and the density function drops gradually as y increases. A skewed probability density function
is shown in Figure 6.11.
The lengths of time between malfunctions for aircraft engines possess a skewed frequency distribution, as
do the lengths of time between arrivals at a supermarket checkout queue (that is, the line at the checkout
counter). Similarly, the lengths of time to complete a maintenance check-up for an automobile or aircraft
engine possess a skewed frequency distribution. The populations associated with these random variables
frequently possess density functions that are adequately modelled by a gamma density function.

A random variable Y is said to have a gamma distribution with parameters \alpha > 0 and \beta > 0 if and only if the density function of Y is

f(y) = \begin{cases} \frac{y^{\alpha-1} e^{-y/\beta}}{\beta^\alpha \Gamma(\alpha)}, & 0 <= y < \infty \\ 0, & \text{elsewhere,} \end{cases}

where

\Gamma(\alpha) = \int_0^\infty y^{\alpha-1} e^{-y}\,dy.

The quantity \Gamma(\alpha) is known as the gamma function. A direct integration validates that \Gamma(1) = 1. Also, we note that \Gamma(\alpha) = (\alpha - 1)\Gamma(\alpha - 1) for any \alpha > 1, and \Gamma(n) = (n - 1)!, provided that n is an integer. The graphs of the gamma density function for \alpha = 1, 2, 4 and \beta = 1 are given in Figure 6.11.

For this reason, \alpha is sometimes called the shape parameter associated with a gamma distribution. The parameter \beta is generally called the scale parameter, because multiplying a gamma-distributed random variable by a positive constant (and thereby changing the scale on which the measurement is made) produces a random variable that also has a gamma distribution with the same value of \alpha (shape parameter) but with an altered value of \beta.

Figure 6.11: Gamma probability density functions for \beta = 1

If Y is a gamma random variable with parameters \alpha and \beta, then the mean and variance of the gamma distribution are given as

\mu = E(Y) = \alpha\beta \quad \text{and} \quad \sigma^2 = Var(Y) = \alpha\beta^2.

This can be proven as shown below.
By definition, the gamma density function is such that

\int_0^\infty \frac{y^{\alpha-1} e^{-y/\beta}}{\beta^\alpha \Gamma(\alpha)}\,dy = 1.

Hence,

\int_0^\infty y^{\alpha-1} e^{-y/\beta}\,dy = \beta^\alpha \Gamma(\alpha).

Thus,

E(Y) = \int_0^\infty y\,\frac{y^{\alpha-1} e^{-y/\beta}}{\beta^\alpha \Gamma(\alpha)}\,dy = \frac{1}{\beta^\alpha \Gamma(\alpha)} \int_0^\infty y^{\alpha} e^{-y/\beta}\,dy = \frac{\beta^{\alpha+1} \Gamma(\alpha+1)}{\beta^\alpha \Gamma(\alpha)} = \alpha\beta.

The variance is given as Var(Y) = \sigma^2 = E(Y^2) - \mu^2. Thus,

E(Y^2) = \int_0^\infty y^2\,\frac{y^{\alpha-1} e^{-y/\beta}}{\beta^\alpha \Gamma(\alpha)}\,dy = \frac{1}{\beta^\alpha \Gamma(\alpha)} \int_0^\infty y^{\alpha+1} e^{-y/\beta}\,dy = \frac{\beta^{\alpha+2} \Gamma(\alpha+2)}{\beta^\alpha \Gamma(\alpha)} = (\alpha+1)\alpha\beta^2.

Thus,

Var(Y) = \sigma^2 = E(Y^2) - \mu^2 = (\alpha+1)\alpha\beta^2 - (\alpha\beta)^2 = \alpha^2\beta^2 + \alpha\beta^2 - \alpha^2\beta^2 = \alpha\beta^2.

Example 6.9: Suppose X has a gamma distribution with \beta = 4 and \alpha = 3, where the density is written in the rate form f(x) = \frac{\beta^\alpha x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)} (so \beta here plays the role of 1/\beta in the scale form of the previous pages). Find the (a) mean, (b) variance, (c) standard deviation, (d) P(X > 5), and (e) write out the pdf of X.

a. E(X) = \int_0^\infty x f(x)\,dx = \int_0^\infty \frac{\beta^\alpha x^{\alpha} e^{-\beta x}}{\Gamma(\alpha)}\,dx = \frac{\Gamma(\alpha+1)}{\beta\,\Gamma(\alpha)} = \frac{\alpha}{\beta} = \frac{3}{4} = 0.75.

b. E(X^2) = \int_0^\infty x^2 f(x)\,dx = \frac{\Gamma(\alpha+2)}{\beta^2\,\Gamma(\alpha)} = \frac{(\alpha+1)\alpha}{\beta^2} = \frac{4 \times 3}{4^2} = \frac{12}{16} = 0.75.

Var(X) = \sigma^2 = E(X^2) - \mu^2 = \frac{12}{16} - \left(\frac{3}{4}\right)^2 = \frac{12 - 9}{16} = \frac{3}{16} = 0.1875.

c. Std(X) = \sqrt{Var(X)} = \sqrt{\frac{3}{16}} = \frac{\sqrt{3}}{4} = 0.433.

d. P(X > 5) = \int_5^\infty \frac{4^3 x^2 e^{-4x}}{2!}\,dx = 32 \int_5^\infty x^2 e^{-4x}\,dx.

Integrating by parts twice (first with u = x^2, dv = e^{-4x}dx, then with u = x, dv = e^{-4x}dx),

\int_5^\infty x^2 e^{-4x}\,dx = \frac{25e^{-20}}{4} + \frac{1}{2}\left(\frac{5e^{-20}}{4} + \frac{e^{-20}}{16}\right) = \frac{200 + 20 + 1}{32}\,e^{-20} = \frac{221e^{-20}}{32}.

Hence,

P(X > 5) = 32 \times \frac{221e^{-20}}{32} = 221e^{-20} \approx 4.56 \times 10^{-7}.

e. The pdf is

f(x) = \frac{\beta^\alpha x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)} = \frac{4^3 x^2 e^{-4x}}{2!} = 32 x^2 e^{-4x}, \quad 0 < x < \infty.
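Part (d) can be verified with base R's pgamma(), using shape = \alpha = 3 and rate = \beta = 4 from the rate form above:

# P(X > 5) for the gamma distribution of Example 6.9
pgamma(5, shape = 3, rate = 4, lower.tail = FALSE)   # about 4.56e-07
221 * exp(-20)                                       # closed form, same value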
6.7 Weibull distribution.

The Weibull distribution is a two-parameter continuous distribution of positive random variables that is
commonly used to describe the failure time of physical entities. It models a broad range of random
variables, largely in the nature of a time to failure or time between events. Examples are maximum one-
day rainfalls and the time a user spends on a web page. The distribution is named after Swedish
mathematician Waloddi Weibull, who described it in detail in 1939, although it was first identified by
Maurice René Fréchet and first applied by Rosin and Rammler (1933) to describe a particle size
distribution.

A random variable X has a Weibull distribution if and only if its probability density function is given as

f(x) = \alpha\beta x^{\beta-1} \exp(-\alpha x^\beta), \quad x > 0.
The graph of the pdf is shown in Figure 6.12.

Figure 6.12: The graph of the Weibull pdf

The cumulative distribution function (cdf) is given as

F(x) = 1 - \exp(-\alpha x^\beta),

with \beta > 0 and \alpha > 0 as the shape and scale parameters respectively.
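For reference, base R parameterizes the Weibull cdf as F(x) = 1 - \exp[-(x/\lambda)^k], so the form above corresponds to shape k = \beta and scale \lambda = \alpha^{-1/\beta}. A minimal check with illustrative parameter values:

# Matching the cdf above to R's pweibull() parameterization
alpha <- 2; beta <- 1.5; x <- 1.2                    # illustrative values
1 - exp(-alpha * x^beta)                             # cdf in the form used here
pweibull(x, shape = beta, scale = alpha^(-1/beta))   # identical value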

The cdf of the Weibull generalized family of distributions can be obtained from

F(x) = \int_0^{\frac{G(x)}{1-G(x)}} \alpha\beta t^{\beta-1} \exp(-\alpha t^\beta)\,dt = 1 - \exp\left[-\alpha\left(\frac{G(x)}{1-G(x)}\right)^\beta\right].

This can be obtained by substituting \frac{G(x)}{1-G(x)} for x in the cdf of the Weibull distribution.
The pdf of the Weibull generalized family is obtained by differentiating the cdf with respect to x as

f(x) = \alpha\beta\,g(x)\,\frac{[G(x)]^{\beta-1}}{[1-G(x)]^{\beta+1}} \exp\left[-\alpha\left(\frac{G(x)}{1-G(x)}\right)^\beta\right].

Note that G(x) is the cdf of the parent or baseline distribution, and g(x) is the corresponding pdf of the baseline distribution.
With the cdf and pdf of the generalized family, many new classes of the Weibull family can be developed.

Assignment 6.5
a. Develop the pdf and cdf of Weibull exponential distribution
b. Develop the pdf and cdf of Weibull Frechet distribution
c. Develop the pdf and cdf of Weibull Lomax distribution
d. Develop the pdf and cdf of Weibull Gompertz distribution
e. Develop the pdf and cdf of Weibull Teissier distribution
f. Develop the pdf and cdf of Weibull Kumaraswamy distribution
g. Develop the pdf and cdf of Weibull Inverse exponential distribution
h. Develop the pdf and cdf of Weibull alpha power distribution
i. Develop the pdf and cdf of Weibull beta distribution

SOLUTION
6.8 Gompertz distribution
In probability and statistics, the Gompertz distribution is a continuous probability distribution, named
after Benjamin Gompertz. The Gompertz distribution is often applied by demographers and actuaries to describe the distribution of adult lifespans. Related fields of science, such as biology and gerontology, have also considered the Gompertz distribution for the analysis of survival. More recently, computer scientists have also started to model the failure rates of computer code by the Gompertz distribution. In marketing science, it has been used as an individual-level simulation for customer lifetime value modeling. In network theory, particularly the Erdos–Renyi model, the walk length of a random self-avoiding walk (SAW) is distributed according to the Gompertz distribution.
A random variable X has a Gompertz distribution if and only if its probability density function is given as

f(x) = \theta e^{\gamma x} \exp\left[-\frac{\theta}{\gamma}\left(e^{\gamma x} - 1\right)\right], \quad x \ge 0, \ \theta, \gamma > 0.

The cdf is given as

F(x) = 1 - \exp\left[-\frac{\theta}{\gamma}\left(e^{\gamma x} - 1\right)\right], \quad x \ge 0, \ \theta, \gamma > 0.
The graph of the pdf is shown in Figure 6.13.

Figure 6.13: The graph of the Gompertz pdf

The cdf of the Gompertz-G family of distributions is obtained as

F(x) = \int_0^{W(G(x))} r(t)\,dt,

where r(t) is the pdf of the Gompertz distribution and W(G(x)) = -\log[1 - G(x)]. Thus,

F(x) = \int_0^{-\log[1-G(x)]} \theta e^{\gamma t} \exp\left[-\frac{\theta}{\gamma}\left(e^{\gamma t} - 1\right)\right] dt = 1 - \exp\left\{\frac{\theta}{\gamma}\left[1 - (1 - G(x))^{-\gamma}\right]\right\}.

The corresponding pdf can be derived as follows:

f(x) = \left[\frac{d}{dx} W(G(x))\right] r(W(G(x))).

On simplification, we have

f(x) = \theta\,g(x)\,[1 - G(x)]^{-\gamma-1} \exp\left\{\frac{\theta}{\gamma}\left[1 - (1 - G(x))^{-\gamma}\right]\right\}.
 

Assignment 6.6
a. Obtain the mean and variance of the Gompertz distribution
b. Develop the pdf and cdf of Gompertz exponential distribution
c. Develop the pdf and cdf of Gompertz Frechet distribution
d. Develop the pdf and cdf of Gompertz Lomax distribution
e. Develop the pdf and cdf of Gompertz Gompertz distribution
f. Develop the pdf and cdf of Gompertz Teissier distribution
g. Develop the pdf and cdf of Gompertz Kumaraswamy distribution
h. Develop the pdf and cdf of Gompertz Inverse exponential distribution
i. Develop the pdf and cdf of Gompertz alpha power distribution
j. Develop the pdf and cdf of Gompertz beta distribution

SOLUTION
6.9 Cauchy distribution
The Cauchy distribution, named after Augustin Cauchy, is a continuous probability distribution. It is also known, especially among physicists, as the Lorentz distribution (after Hendrik Lorentz), Cauchy–Lorentz distribution, Lorentz(ian) function, or Breit–Wigner distribution. The Cauchy distribution is the distribution of the x-intercept of a ray issuing from a fixed point with a uniformly distributed angle. It is also the distribution of the ratio of two independent normally distributed random variables with mean zero.
The Cauchy distribution is often used in statistics as the canonical example of a "pathological" distribution, since both its expected value and its variance are undefined. The Cauchy distribution does not have finite moments of order greater than or equal to one; only fractional absolute moments exist. The Cauchy distribution has no moment generating function.
In mathematics, it is closely related to the Poisson kernel, which is the fundamental solution for the Laplace equation in the upper half-plane. It is one of the few stable distributions whose probability density function can be expressed analytically, the others being the normal distribution and the Levy distribution.

A random variable X is said to have a Cauchy distribution if its probability density function is defined as

f(x) = \frac{1}{\pi\left[1 + (x - \theta)^2\right]}, \quad -\infty < x < \infty.

The distribution shown here has one parameter, the location \theta.
Assignment 6.7:

1. Suppose X has a Cauchy distribution with \theta = 4. Compute P(X > 7).

2. Suppose X has a Cauchy distribution with \theta = 9. Compute P(1 < X < 7).

SOLUTION

6.10 Chi-Square distribution

A Chi-square distribution is a special case of the gamma distribution with parameters \alpha = \frac{n}{2} and \beta = 2. The useful part of this distribution is the fact that the sum of squares of n independent standard normal random variables has a Chi-square distribution with n degrees of freedom. The pdf of the Chi-square distribution with n degrees of freedom is given as

f(x) = \frac{1}{2^{n/2}\,\Gamma\left(\frac{n}{2}\right)}\,x^{\frac{n}{2}-1} \exp\left(-\frac{x}{2}\right), \quad x > 0.

It has expectation and variance E(X) = n and Var(X) = 2n.
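This special-case relationship can be checked directly in R (a minimal sketch with an illustrative degrees-of-freedom value):

# Chi-square as a gamma special case: shape = n/2, scale = 2
n <- 7
pchisq(4.5, df = n)                      # chi-square cdf at 4.5
pgamma(4.5, shape = n/2, scale = 2)      # identical value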

6.11 Beta distribution

A random variable X has a Beta distribution if and only if its probability density function is given by

f(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\,x^{\alpha-1} (1 - x)^{\beta-1}, \quad \text{for } 0 <= x <= 1, \ \alpha > 0, \ \beta > 0.

The mean and variance of the Beta distribution are given as

E(X) = \frac{\alpha}{\alpha + \beta} \quad \text{and} \quad Var(X) = \frac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}.

6.12 Quantile Functions and Order Statistics of Continuous Probability Distributions

The quantile function is another way of describing a distribution, other than the pdf or cdf. It is obtained as the inverse cdf and calculated as:

Q(u) = F^{-1}(u).

For example, obtain the quantile function for the exponential distribution.
The cdf of the exponential distribution is given as F(x) = 1 - \exp(-\lambda x). Setting u = F(x) and solving for x:

u = 1 - \exp(-\lambda x)
1 - u = \exp(-\lambda x)
\log(1 - u) = -\lambda x
x = -\frac{1}{\lambda}\log(1 - u).

Hence, the quantile function of the exponential distribution can be written as

Q(u) = -\frac{1}{\lambda}\log(1 - u),

where u \in (0, 1) is an observation from a uniform distribution.

The quantile function is very useful for simulation purposes. This implies that random numbers can be generated for the exponential distribution using the expression Q(u) = -\frac{1}{\lambda}\log(1 - u). The median is obtained by setting u = \frac{1}{2} in Q(u) = -\frac{1}{\lambda}\log(1 - u); see the R sketch below.
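A minimal R sketch of this inverse-cdf (inverse transform) simulation, with an illustrative rate \lambda:

# Simulating exponential variates from the quantile function
set.seed(1)
lambda <- 2
u <- runif(10000)                 # u ~ U(0, 1)
x <- -log(1 - u) / lambda         # x = Q(u)
mean(x)                           # about 1/lambda = 0.5
quantile(x, 0.5)                  # sample median
log(2) / lambda                   # theoretical median, Q(1/2)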
6.12.1 Survival function
It can also be called the reliability function. It is a function that gives the probability that a device or object of interest will survive beyond any specified time. It is calculated as:

S(x) = Pr(X > x) = 1 - Pr(X <= x) = 1 - F(x).

6.12.2 Hazard Function

It can also be called the hazard rate or failure rate. It is a conditional density, given that the event under consideration has not yet occurred prior to time t. It is calculated as the ratio of the pdf to the survival function:

h(x) = \frac{f(x)}{S(x)}.

6.12.3 Cumulative Hazard Function

It is obtained as the integral of the hazard function:

H(t) = \int_0^t h(u)\,du = -\ln S(t).

Also,

S(t) = e^{-H(t)} \quad \text{and} \quad f(t) = h(t)\,e^{-H(t)}.

6.12.4 Reversed Hazard Function

It is calculated as the ratio of the pdf to the cdf as:

r(x) = \frac{f(x)}{F(x)}.

6.12.5 Odds Function

It is calculated as the ratio of the cdf to the survival function as:

O(x) = \frac{F(x)}{S(x)}.
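For a concrete case, the exponential distribution makes these functions easy to verify in R (a minimal sketch; the hazard of the exponential is the constant \lambda):

# Survival, hazard, and cumulative hazard for X ~ Exp(lambda)
lambda <- 2; x <- 1.5
S <- 1 - pexp(x, rate = lambda)        # survival function S(x)
h <- dexp(x, rate = lambda) / S        # hazard f(x)/S(x); equals lambda
H <- -log(S)                           # cumulative hazard; equals lambda * x
c(S, h, H)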

6.12.6 Order Statistics

Suppose that a random sample of size n is drawn from an infinite population with a continuous pdf, and that the values are arranged in order of magnitude so that the smallest of the x's corresponds to the random variable X_(1), the next to X_(2), and the largest to X_(n). The random variables just defined are called order statistics. So, X_(1) is the first order statistic (minimum order statistic), X_(2) is the second order statistic, and X_(n) is the maximum order statistic (last order statistic).
Theorem: For random samples from an infinite population which has the value f(x) at x, the pdf of the kth order statistic is:

g_k(y_k) = \frac{n!}{(k-1)!\,(n-k)!}\,[F(y_k)]^{k-1} f(y_k)\,[1 - F(y_k)]^{n-k},

for -\infty < y_k < \infty.

i. The distribution of the minimum order statistic is obtained when k = 1
ii. The distribution of the maximum order statistic is obtained when k = n
iii. When n is odd, n = 2m + 1, set k = m + 1; then the distribution of the median is

g_{m+1}(y_{m+1}) = \frac{(2m+1)!}{m!\,m!}\,[F(y_{m+1})]^{m} f(y_{m+1})\,[1 - F(y_{m+1})]^{m}.

iv. When n is even, n = 2m.

Exercise 6.7

1. Let X_(1), X_(2), ..., X_(n) be the order statistics of a random sample of size n = 8 from the distribution with pdf f(x) = \frac{3}{2}x^2 + x, 0 <= x <= 1. Compute (i) the pdf of X_(1), (ii) the pdf of X_(n), (iii) Pr(1/2 < X_(5) < 4/5).
2. (i) Find the sampling distribution of Y_1 and Y_n, the minimum and maximum order statistics of random samples of size n drawn from a continuous uniform distribution with parameters a = 0 and b = 1. (ii) If n is odd, find the sampling distribution of the median. (iii) If n = 5, obtain the distribution of the median.
3. Obtain the distribution of the minimum and maximum order statistics for the Weibull exponential distribution.

Assignment 6.8
a. Obtain the quantile function of the Gompertz distribution
b. Develop the quantile function of the exponential distribution
c. Develop the quantile function of the Frechet distribution
d. Develop the quantile function of the Lomax distribution
e. Develop the quantile function of the Weibull distribution
f. Develop the quantile function of the Gompertz Teissier distribution
g. Develop the quantile function of the Kumaraswamy distribution
h. Develop the quantile function of the inverse exponential distribution
i. Develop the quantile function of the alpha power distribution
j. Develop the quantile function of the beta distribution

SOLUTION
6.13 Maximum Likelihood Estimation of Parameters of Continuous Probability Distributions

Let w_1, w_2, ..., w_n be a random sample from a population with the Weibull alpha power inverted exponential distribution, with pdf given as:

f(w) = \frac{a\,\theta \log\alpha}{(\alpha - 1)\,w^2} \exp\left(-\frac{\theta}{w}\right) \alpha^{\exp(-\theta/w)} \left[\frac{\alpha^{\exp(-\theta/w)} - 1}{\alpha - 1}\right]^{a-1}, \quad w > 0, \ \alpha > 0, \ \alpha \ne 1, \ \theta > 0, \ a > 0.

Then the log-likelihood of this distribution for the parameter vector \Theta = (a, \alpha, \theta)^T can be written as

\ell(\Theta) = n \log a + n \log \theta + n \log(\log \alpha) - a n \log(\alpha - 1) - \theta \sum_{i=1}^{n} \frac{1}{w_i} + \log\alpha \sum_{i=1}^{n} \exp\left(-\frac{\theta}{w_i}\right) - \sum_{i=1}^{n} \log w_i^2 + (a - 1) \sum_{i=1}^{n} \log\left[\alpha^{\exp(-\theta/w_i)} - 1\right].

Taking partial derivatives of this equation with respect to the parameters (a, \alpha, \theta) and equating to zero gives a system of nonlinear equations. The solution of the nonlinear equations for the parameters can be obtained numerically using R software, MATLAB, or MAPLE, yielding the maximum likelihood estimate \hat{\Theta} = (\hat{a}, \hat{\alpha}, \hat{\theta}).
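To illustrate the numerical step in R, here is a minimal sketch that maximizes a log-likelihood with optim(), using the one-parameter exponential distribution rather than the three-parameter model above (the data and starting value are illustrative):

# Maximum likelihood estimation by numerical optimization
set.seed(1)
w <- rexp(100, rate = 0.5)                  # simulated sample
negloglik <- function(lam) -sum(dexp(w, rate = lam, log = TRUE))
fit <- optim(par = 1, fn = negloglik, method = "Brent",
             lower = 1e-6, upper = 100)
fit$par                                     # numerical MLE of the rate
1 / mean(w)                                 # analytical MLE, for comparison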

6.13.1 How to Plot Graphs for PDF and CDF Using R Software
Here is the R code for plotting the Gompertz Inverse Exponential distribution. Its pdf and cdf are:

f(x) = \alpha\,\frac{\theta}{x^2}\,e^{-\theta/x} \left(1 - e^{-\theta/x}\right)^{-\beta-1} \exp\left\{\frac{\alpha}{\beta}\left[1 - \left(1 - e^{-\theta/x}\right)^{-\beta}\right]\right\}

and

F(x) = 1 - \exp\left\{\frac{\alpha}{\beta}\left[1 - \left(1 - e^{-\theta/x}\right)^{-\beta}\right]\right\},

respectively.

Follow this R code for the pdf:


rm(list = ls())   # clear the workspace

# pdf of the Gompertz Inverse Exponential (GoIE) distribution
# al = alpha, be = beta, th = theta
f <- function(x, al, be, th)
{
  al*(th/x^2) * exp(-(th/x)) * (1 - exp(-(th/x)))^(-be - 1) *
    exp((al/be) * (1 - (1 - exp(-(th/x)))^-be))
}

# PDF GoIE: density curves for several parameter combinations
curve(f(x, 2.0, 1.7, 2.6), main = "", ylab = "f(x)", xlab = "x",
      ylim = c(0, 0.8), 0, 6, lwd = 2)
curve(f(x, 1.0, 2.4, 2.5), lty = 1, col = 2, add = T, lwd = 2)
curve(f(x, 0.5, 0.3, 0.6), lty = 1, col = 3, add = T, lwd = 2)
curve(f(x, 1, 0.5, 2.5), lty = 1, col = 4, add = T, lwd = 2)
curve(f(x, 0.6, 4.3, 3.4), lty = 1, col = 5, add = T, lwd = 2)

# legend listing the parameter combinations in plotting order
legend("topright", title = expression(" "),
       c(expression(alpha*"=2.0,"*~beta*"=1.7,"*~theta*"=2.6"),
         expression(alpha*"=1.0,"*~beta*"=2.4,"*~theta*"=2.5"),
         expression(alpha*"=0.5,"*~beta*"=0.3,"*~theta*"=0.6"),
         expression(alpha*"=1.0,"*~beta*"=0.5,"*~theta*"=2.5"),
         expression(alpha*"=0.6,"*~beta*"=4.3,"*~theta*"=3.4")),
       cex = 1.1, lty = c(1), lwd = 2, col = c(1, 2, 3, 4, 5))

################ CDF

rm(list = ls())   # clear the workspace

# cdf of the Gompertz Inverse Exponential (GoIE) distribution
f <- function(x, al, be, th)
{
  1 - exp((al/be) * (1 - (1 - exp(-(th/x)))^-be))
}

# CDF GoIE: distribution function for several parameter combinations
curve(f(x, 2.0, 1.7, 2.6), main = "", ylab = "F(x)", xlab = "x",
      ylim = c(0, 1.0), 0, 6, lwd = 2)
curve(f(x, 1.0, 2.4, 2.5), lty = 1, col = 2, add = T, lwd = 2)
curve(f(x, 0.5, 0.3, 0.6), lty = 1, col = 3, add = T, lwd = 2)
curve(f(x, 1, 0.5, 2.5), lty = 1, col = 4, add = T, lwd = 2)
curve(f(x, 0.6, 4.3, 3.4), lty = 1, col = 5, add = T, lwd = 2)
legend("bottomright", title = expression(" "),
       c(expression(alpha*"=2.0,"*~beta*"=1.7,"*~theta*"=2.6"),
         expression(alpha*"=1.0,"*~beta*"=2.4,"*~theta*"=2.5"),
         expression(alpha*"=0.5,"*~beta*"=0.3,"*~theta*"=0.6"),
         expression(alpha*"=1.0,"*~beta*"=0.5,"*~theta*"=2.5"),
         expression(alpha*"=0.6,"*~beta*"=4.3,"*~theta*"=3.4")),
       cex = 1.1, lty = c(1), lwd = 2, col = c(1, 2, 3, 4, 5))

Chapter Seven
7.1 Introduction to Statistical Estimation.
The purpose of statistics is to use the information contained in a sample to make inferences about the
population from which the sample is taken. Because populations are characterized by numerical
descriptive measures called parameters, the objective of many statistical investigations is to estimate the
value of one or more relevant parameters. As you will see, the sampling distributions derived in Chapter 6
play an important role in the development of the estimation procedures that are the focus of this chapter.
Estimation has many practical applications. For example, a manufacturer of washing machines might be interested in estimating the proportion p of washers that can be expected to fail prior to the expiration of a 1-year guarantee time. Other important population parameters are the population mean, variance, and standard deviation. For example, we might wish to estimate the mean waiting time \mu at a supermarket checkout station or the standard deviation of the error of measurement \sigma of an electronic instrument. To simplify our terminology, we will call the parameter of interest in the experiment the target parameter.
Suppose that we wish to estimate the average amount of mercury \mu that a newly developed process can remove from 1 ounce of ore obtained at a geographic location. We could give our estimate in two distinct forms. First, we could use a single number, for instance 0.13 ounce, that we think is close to the unknown population mean \mu. This type of estimate is called a point estimate because a single value, or point, is given as the estimate of \mu. Second, we might say that \mu will fall between two numbers; for example, between 0.07 and 0.19 ounce. In this second type of estimation procedure, the two values that we give may be used to construct an interval (0.07, 0.19) that is intended to enclose the parameter of interest; thus, the estimate is called an interval estimate.
The information in the sample can be used to calculate the value of a point estimate, an interval estimate,
or both. In any case, the actual estimation is accomplished by using an estimator for the target parameter.

Definition: The assignment of value(s) to a population parameter based on a value of the corresponding sample statistic is called estimation.

Definition: Estimate and Estimator: The value(s) assigned to a population parameter based on the value
of a sample statistic is called an estimate. An estimator is a rule, often expressed as a formula that tells
how to calculate the value of an estimate based on the measurements contained in a sample.
For example, the sample mean,

\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n},

is one possible point estimator of the population mean \mu. Clearly, the expression for \bar{X} is both a rule and a formula. It tells us to sum the sample observations and divide by the sample size n.
An experimenter who wants an interval estimate of a parameter must use the sample data to calculate two
values, chosen so that the interval formed by the two values includes the target parameter with a specified
probability. Examples of interval estimators will be given in subsequent sections. Many different
estimators (rules for estimating) may be obtained for the same population parameter. This should not be
surprising. Ten engineers each assigned to estimate the cost of a large construction job could use different
methods of estimation and thereby arrive at different estimates of the total cost. Such engineers, called
estimators in the construction industry, base their estimates on specified fixed guidelines and intuition.
Each estimator represents a unique human subjective rule for obtaining a single estimate. This brings us
to a most important point: Some estimators are considered good, and others, bad. The management of a
construction firm must define good and bad as they relate to the estimation of the cost of a job. How can

we establish criteria of goodness to compare statistical estimators? The following sections contain some
answers to this question.

7.2 Statistical Point Estimation.


An estimate may be a point estimate or an interval estimate. These two types of estimates are described.
If we select a sample and compute the value of a sample statistic for this sample, then this value gives the
point estimate of the corresponding population parameter.

Point Estimate: The value of a sample statistic that is used to estimate a population parameter is called a
point estimate.

Thus, the value computed for the sample mean, \bar{x}, from a sample is a point estimate of the corresponding population mean, \mu. For the example mentioned earlier, suppose the Census Bureau takes a random sample of 10,000 households and determines that the mean housing expenditure per month, \bar{x}, for this sample is #2970. Then, using \bar{x} as a point estimate of \mu, the Bureau can state that the mean housing expenditure per month, \mu, for all households is about #2970. Thus,

Point estimate of a population parameter = Value of the corresponding sample statistic

Each sample selected from a population is expected to yield a different value of the sample statistic. Thus, the value assigned to a population mean, \mu, based on a point estimate depends on which of the samples is drawn. Consequently, the point estimate assigns a value to \mu that almost always differs from the true value of the population mean.

7.3 Sampling distribution of a statistic

Suppose we take a sample of size n from a population of size N; then there are ^N C_n = k, say, possible samples. We can compute the statistic, say t, for each of these samples. Let t_1, t_2, t_3, ..., t_k be the values of the statistic t for these k possible samples. Thus, the mean and variance of the sampling distribution of t can be computed as

E(t) = \frac{1}{k}\sum_{i=1}^{k} t_i

Var(t) = \frac{1}{k}\sum_{i=1}^{k} [t_i - E(t)]^2

with precision of t given as

\frac{1}{S.E(t)},

where S.E is the standard error, given as

S.E(t) = \sqrt{Var(t)}.
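A minimal R sketch of this enumeration for a small illustrative population (combn() lists all ^N C_n samples):

# Sampling distribution of the sample mean for N = 5, n = 2
pop <- c(2, 4, 6, 8, 10)                   # illustrative population
samples <- combn(pop, 2)                   # k = choose(5, 2) = 10 samples
t_vals <- colMeans(samples)                # statistic t for each sample
mean(t_vals)                               # E(t); equals the population mean 6
var_t <- mean((t_vals - mean(t_vals))^2)   # Var(t)
sqrt(var_t)                                # standard error S.E(t)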

7.4 Statistical Interval Estimation.
In the case of interval estimation, instead of assigning a single value to a population parameter, an interval
is constructed around the point estimate, and then a probabilistic statement that this interval contains the
corresponding population parameter is made.
Interval Estimation: In interval estimation, an interval is constructed around the point estimate, and it is
stated that this interval contains the corresponding population parameter with a certain confidence level.

For the example about the mean housing expenditure, instead of saying that the mean housing expenditure
per month for all households is #2970, we may obtain an interval by subtracting a number from #2970
and adding the same number to #2970. Then we state that this interval contains the population mean, μ.
For purposes of illustration, suppose we subtract #340 from #2970 and add #340 to #2970. Consequently,
we obtain the interval (#2970 − #340) to (#2970 + #340), or #2630 to #3310. Then we state that the
interval #2630 to #3310 is likely to contain the population mean, \mu, and that the mean housing
expenditure per month for all households in Nigeria is between #2630 and #3310. This procedure is called
interval estimation. The value #2630 is called the lower limit of the interval, and #3310 is called the upper
limit of the interval. The number we add to and subtract from the point estimate is called the margin of
error or the maximum error of the estimate.

The question arises: What number should we subtract from and add to a point estimate to obtain an
interval estimate? The answer to this question depends on two considerations:
1. The standard deviation \sigma_{\bar{x}} of the sample mean, \bar{x}
2. The level of confidence to be attached to the interval

First, the larger the standard deviation of \bar{x}, the greater is the number subtracted from and added to the point estimate. Thus, it is obvious that if the range over which \bar{x} can assume values is larger, then the interval constructed around \bar{x} must be wider to include \mu.

Second, the quantity subtracted and added must be larger if we want to have a higher confidence in our
interval. We always attach a probabilistic statement to the interval estimation. This probabilistic statement
is given by the confidence level. An interval constructed based on this confidence level is called a
confidence interval.

Confidence Level and Confidence Interval: Each interval is constructed with regard to a given confidence
level and is called a confidence interval.

The confidence interval is given as

Point estimate \pm Margin of error

The confidence level associated with a confidence interval states how much confidence we have that this interval contains the true population parameter. The confidence level is denoted by (1 - \alpha)100%.
As mentioned above, the confidence level is denoted by (1 - \alpha)100%, where \alpha is the Greek letter alpha. When expressed as a probability, it is called the confidence coefficient and is denoted by 1 - \alpha. In passing, note that \alpha is called the significance level; this will be explained later.

Although any value of the confidence level can be chosen to construct a confidence interval, the more common values are 90%, 95%, and 99%. The corresponding confidence coefficients are 0.90, 0.95, and 0.99, respectively. The next section describes how to construct a confidence interval for the population mean when the population standard deviation, \sigma, is known.

7.4.1 Estimation of a Population Mean when \sigma is Known

This subsection explains how to construct a confidence interval for the population mean \mu when the population standard deviation \sigma is known. Here, there are three possible cases, as follows.

Case I. If the following three conditions are fulfilled:

1. The population standard deviation \sigma is known
2. The sample size is small (i.e., n < 30)
3. The population from which the sample is selected is approximately normally distributed, then we use the normal distribution to make the confidence interval for \mu, because the sampling distribution of \bar{x} is normal with its mean equal to \mu and standard deviation equal to \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}, assuming that n/N <= 0.05.

Case II. If the following two conditions are fulfilled:

1. The population standard deviation \sigma is known
2. The sample size is large (i.e., n \ge 30), then, again, we use the normal distribution to make the confidence interval for \mu, because from the central limit theorem the sampling distribution of \bar{x} is (approximately) normal with its mean equal to \mu and standard deviation equal to \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}, assuming that n/N <= 0.05.

Case III. If the following three conditions are fulfilled:

1. The population standard deviation \sigma is known
2. The sample size is small (i.e., n < 30)
3. The population from which the sample is selected is not normally distributed (or its distribution is unknown), then we use a nonparametric method to make the confidence interval for \mu.
This section will cover the first two cases. The procedure for making a confidence interval for \mu is the same in both these cases. Note that in Case I, the population does not have to be exactly normally distributed. As long as it is close to the normal distribution without any outliers, we can use the normal distribution procedure. In Case II, although 30 is considered a large sample, if the population distribution is very different from the normal distribution, then 30 may not be a large enough sample size for the sampling distribution of \bar{x} to be normal and, hence, for the normal distribution to be used.

The following chart summarizes the above three cases.

The Confidence Interval for \mu. The (1 - \alpha)100% confidence interval for \mu under Cases I and II is

\bar{x} \pm z\sigma_{\bar{x}}, \quad \text{where} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}.

The value of z used here is obtained from the standard normal distribution table.

The quantity z\sigma_{\bar{x}} in the confidence interval formula is called the margin of error and is denoted by E.

Margin of Error: The margin of error for the estimate of \mu, denoted by E, is the quantity that is subtracted from and added to the value of \bar{x} to obtain a confidence interval for \mu. Thus,

E = z\sigma_{\bar{x}}.
The value of z in the confidence interval formula is obtained from the standard normal distribution table for the given confidence level. To illustrate, suppose we want to construct a 95% confidence interval for \mu. A 95% confidence level means that the total area under the standard normal curve between two points (at the same distance) on different sides of \mu is 95%, or 0.95, as shown in Figure 7.1. Note that we have denoted these two points by -z and z in Figure 7.1. To find the value of z for a 95% confidence level, we first find the areas to the left of these two points, -z and z. Then we find the z values for these two areas from the normal distribution table. Note that these two values of z will be the same but with opposite signs. To find these values of z, we perform the following two steps:

Figure 7.1: Finding z for a 95% confidence level.

1. The first step is to find the areas to the left of -z and z, respectively. Note that the area between -z and z is denoted by 1 - \alpha. Hence, the total area in the two tails is \alpha, because the total area under the curve is 1.0. Therefore, the area in each tail, as shown in Figure 7.2, is \alpha/2. In our example, 1 - \alpha = 0.95. Hence, the total area in both tails is \alpha = 1 - 0.95 = 0.05. Consequently, the area in each tail is \alpha/2 = 0.05/2 = 0.025. Then, the area to the left of -z is 0.0250, and the area to the left of z is 0.0250 + 0.95 = 0.9750.
2. Now find the z values from the standard normal distribution table such that the areas to the left of -z and z are 0.0250 and 0.9750, respectively. These z values are -1.96 and 1.96, respectively. Thus, for a confidence level of 95%, we will use z = 1.96 in the confidence interval formula.

Figure 7.2: Area in the tails

Table 7.1 lists the z values for some of the most commonly used confidence levels. Note that we always
use the positive value of z in the formula.

Table 7.1: z Values for Commonly Used Confidence Levels

Confidence Level    Areas to Look for in Table    z Value
90%                 0.0500 and 0.9500             1.64 or 1.65
95%                 0.0250 and 0.9750             1.96
96%                 0.0200 and 0.9800             2.05
97%                 0.0150 and 0.9850             2.17
98%                 0.0100 and 0.9900             2.33
99%                 0.0050 and 0.9950             2.57 or 2.58

Example 7.1: A publishing company has just published a new college textbook. Before the company
decides the price at which to sell this textbook, it wants to know the average price of all such textbooks in
the market. The research department at the company took a random sample of 25 comparable textbooks
and collected information on their prices. This information produced a mean price of #145 for this
sample. It is known that the standard deviation of the prices of all such textbooks is #35 and the
population distribution of such prices is approximately normal.
a. What is the point estimate of the mean price of all such college textbooks?
b. Construct a 90% confidence interval for the mean price of all such college textbooks.

Solution: Here, \sigma is known and, although n < 30, the population is approximately normally distributed. Hence, we can use the normal distribution. From the given information,

n = 25, \bar{x} = #145, \sigma = #35.

The standard deviation of \bar{x} is

\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{35}{\sqrt{25}} = #7.00.

a. The point estimate of the mean price of all such college textbooks is #145; that is, the point estimate of \mu is \bar{x} = #145.

b. The confidence level is 90%, or 0.90. First we find the z value for a 90% confidence level. Here, the area in each tail of the normal distribution curve is \alpha/2 = (1 - 0.90)/2 = 0.05.
Now in the standard normal table, look for the areas 0.0500 and 0.9500 and find the corresponding values of z. These values are (approximately) z = -1.65 and z = 1.65.
Next, we substitute all the values in the confidence interval formula for \mu. The 90% confidence interval for \mu is

\bar{x} \pm z\sigma_{\bar{x}} = 145 \pm 1.65(7.00) = 145 \pm 11.55 = (145 - 11.55) \text{ to } (145 + 11.55) = #133.45 \text{ to } #156.55.

Thus, we are 90% confident that the mean price of all such college textbooks is between #133.45 and #156.55. Note that we cannot say for sure whether the interval #133.45 to #156.55 contains the true population mean or not. Since \mu is a constant, we cannot say that the probability is 0.90 that this interval contains \mu, because either it contains \mu or it does not. Consequently, the probability that this interval contains \mu is either 1.0 or 0. All we can say is that we are 90% confident that the mean price of all such college textbooks is between #133.45 and #156.55.
In the above estimate, #11.55 is called the margin of error or give-and-take figure.
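The interval in Example 7.1 can be reproduced in R (a minimal sketch; qnorm(0.95) gives 1.645, which the text rounds to 1.65):

# 90% confidence interval for Example 7.1
n <- 25; xbar <- 145; sigma <- 35
z <- qnorm(0.95)                 # about 1.645
E <- z * sigma / sqrt(n)         # margin of error; 11.55 when z is rounded to 1.65
c(xbar - E, xbar + E)            # approximately (133.45, 156.55)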

Figure 7.3: Confidence intervals.

How do we interpret a 90% confidence level? In terms of Example 7.1, if we take all possible samples of 25 such college textbooks each and construct a 90% confidence interval for \mu around each sample mean, we can expect that 90% of these intervals will include \mu and 10% will not. In Figure 7.3 we show the means \bar{x}_1, \bar{x}_2 and \bar{x}_3 of three different samples of the same size drawn from the same population. Also shown in this figure are the 90% confidence intervals constructed around these three sample means. As we observe, the 90% confidence intervals constructed around \bar{x}_1 and \bar{x}_2 include \mu, but the one constructed around \bar{x}_3 does not. We can state for a 90% confidence level that if we take many samples of the same size from a population and construct 90% confidence intervals around the means of these samples, then we expect 90% of these confidence intervals will be like the ones around \bar{x}_1 and \bar{x}_2 in Figure 7.3, which include \mu, and 10% will be like the one around \bar{x}_3, which does not include \mu.

Assignment 7.1: The standard deviation for a population is \sigma = 14.8. A random sample of 25 observations selected from this population gave a mean equal to 143.72. The population is known to have a normal distribution.
a. Make a 99% confidence interval for \mu.
b. Construct a 95% confidence interval for \mu.
c. Determine a 90% confidence interval for \mu.
d. Does the width of the confidence intervals constructed in parts a through c decrease as the confidence level decreases? Explain.

SOLUTION

7.4.2 Determining the Sample Size for the Estimation of the Mean
Given the confidence level and the standard deviation of the population, the sample size that will produce a predetermined margin of error E for the confidence interval estimate of \mu is

n = \frac{z^2 \sigma^2}{E^2}.
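A minimal R sketch of this sample-size formula, with illustrative values for \sigma, E, and the confidence level (rounding up, since n must be an integer):

# Required sample size for a given margin of error
sigma <- 14; E <- 2               # illustrative values
z <- qnorm(0.975)                 # 95% confidence level
ceiling(z^2 * sigma^2 / E^2)      # required n, rounded up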

Assignment 7.2: An alumni association wants to estimate the mean debt of this year's college graduates. It is known that the population standard deviation of the debts of this year's college graduates is #11,800. How large a sample should be selected so that a 99% confidence interval of the estimate is within #800 of the population mean?

SOLUTION

7.4.3 Confidence Interval for the Mean for Unknown Variance and Small Sample Size
In the previous section, we saw that we can use the normal distribution to construct confidence intervals for a population mean \mu when \sigma is known, provided that the underlying population is normally distributed or the sample size is large (greater than 30).
Case I. If the following three conditions are fulfilled:
1. The population standard deviation \sigma is not known
2. The sample size is small (i.e., n < 30)
3. The population from which the sample is selected is approximately normally distributed, then we use the t-distribution to make the confidence interval for \mu.

Case II. If the following two conditions are fulfilled:

1. The population standard deviation \sigma is not known
2. The sample size is large (i.e., n \ge 30), then again we use the t-distribution to make the confidence interval for \mu.

Case III. If the following three conditions are fulfilled:

1. The population standard deviation \sigma is not known
2. The sample size is small (i.e., n < 30)
3. The population from which the sample is selected is not normally distributed (or its distribution is unknown), then we use a nonparametric method to make the confidence interval for \mu.

In Cases I and II, we can construct a confidence interval for the population mean even though the standard deviation is not known. In place of z, the standard normal statistic, we use the statistic t with n - 1 degrees of freedom. The t-statistic is defined as

t = \frac{\bar{x} - \mu}{s/\sqrt{n}}.

However, if the sample size is n > 30, we can approximate the t-distribution by the normal distribution.
The (1 - \alpha)100% confidence interval for \mu under Cases I and II is

\bar{x} \pm t s_{\bar{x}},

where

s_{\bar{x}} = \frac{s}{\sqrt{n}}.

The value of t is obtained from the t distribution table for n - 1 degrees of freedom and the given confidence level. Here t s_{\bar{x}} is the margin of error, or the maximum error of the estimate; that is,

E = t s_{\bar{x}}.

Example 7.2: A sample of 16 private colleges around the state shows a mean cost for a year's tuition of #25,000 with a standard deviation of #4,500. Find a 90% confidence interval for the average tuition costs at private colleges.
With n = 16, \bar{x} = #25,000, s = #4,500, we calculate

s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{4500}{\sqrt{16}} = 1125.

Since the sample size n = 16 < 30, we use a t-distribution with n - 1 = 15 degrees of freedom. The confidence interval is

\bar{x} \pm t\frac{s}{\sqrt{n}} = 25000 \pm 1.753(1125) = (23027.875, 26972.125).
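The same interval can be computed in R with qt() (a minimal check of Example 7.2):

# 90% t-based confidence interval for Example 7.2
n <- 16; xbar <- 25000; s <- 4500
t <- qt(0.95, df = n - 1)         # 1.753 for 15 degrees of freedom
E <- t * s / sqrt(n)              # margin of error
c(xbar - E, xbar + E)             # approximately (23027.88, 26972.12)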

7.4.4 Confidence Interval for Proportion


Often we want to estimate the population proportion or percentage. (Recall that a percentage is obtained
by multiplying the proportion by 100.) For example, the production manager of a company may want to
estimate the proportion of defective items produced on a machine. A bank manager may want to find the
percentage of customers who are satisfied with the service provided by the bank.
Again, if we can conduct a census each time we want to find the value of a population proportion, there is
no need to learn the procedures discussed in this section. However, we usually derive our results from
sample surveys. Hence, to take into account the variability in the results obtained from different sample
surveys, we need to know the procedures for estimating a population proportion.
Recall from Chapter 7 that the population proportion is denoted by p, and the sample proportion is
denoted by p̂ . This section explains how to estimate the population proportion, p, using the sample
proportion, p̂ . The sample proportion, p̂ , is a sample statistic, and it possesses a sampling distribution.
We know that:
1. The sampling distribution of the sample proportion p̂ is approximately normal for a large sample.
2. The mean of the sampling distribution of \hat{p}, \mu_{\hat{p}}, is equal to the population proportion, p.
3. The standard deviation of the sampling distribution of \hat{p} is \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}, where q = 1 - p, given that \frac{n}{N} <= 0.05.

In the case of a proportion, a sample is considered to be large if np and nq are both greater than 5. If p and q are not known, then n\hat{p} and n\hat{q} should each be greater than 5 for the sample to be large.
When estimating the value of a population proportion, we do not know the values of p and q. Consequently, we cannot compute \sigma_{\hat{p}}. Therefore, in the estimation of a population proportion, we use the value of s_{\hat{p}} as an estimate of \sigma_{\hat{p}}. The value of s_{\hat{p}} is calculated using the following formula.

Estimator of the Standard Deviation of \hat{p}: The value of s_{\hat{p}}, which gives a point estimate of \sigma_{\hat{p}}, is calculated as follows. Here, s_{\hat{p}} is an estimator of \sigma_{\hat{p}}:

s_{\hat{p}} = \sqrt{\frac{\hat{p}\hat{q}}{n}}.

Note that the condition \frac{n}{N} <= 0.05 must hold true to use this formula.
The sample proportion, \hat{p}, is the point estimator of the corresponding population proportion p. Then, to find the confidence interval for p, we add to and subtract from \hat{p} a number that is called the margin of error, E.

Confidence Interval for the Population Proportion, p: For a large sample, the (1 - \alpha)100% confidence interval for the population proportion, p, is

\hat{p} \pm z s_{\hat{p}}.

The value of z is obtained from the standard normal distribution table for the given confidence level, and s_{\hat{p}} = \sqrt{\frac{\hat{p}\hat{q}}{n}}. The term z s_{\hat{p}} is called the margin of error, or the maximum error of the estimate, and is denoted by E.
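A minimal R sketch of this interval, with illustrative values of n and \hat{p}:

# Confidence interval for a population proportion
n <- 500; phat <- 0.62                # illustrative values
s_p <- sqrt(phat * (1 - phat) / n)    # estimate of the standard deviation of p-hat
z <- qnorm(0.975)                     # 95% confidence level
phat + c(-1, 1) * z * s_p             # confidence interval for p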

Assignment 7.3: According to a Gallup-Purdue University study of college graduates conducted during
February 4 to March 7, 2014, 63% of college graduates polled said that they had at least one college
professor who made them feel excited about learning (www.gallup.com). Suppose that this study was
based on a random sample of 2000 college graduates. Construct a 97% confidence interval for the
corresponding population proportion.

SOLUTION

7.4.5 Confidence Interval for Variances and Standard Deviations

When random samples are drawn from a normal population with variance \sigma^2, the quantity \frac{(n-1)s^2}{\sigma^2} possesses a probability distribution that is known as the chi-square distribution:

\chi^2 = \frac{(n-1)s^2}{\sigma^2}.

Making \sigma^2 the subject of the formula, we have

\sigma^2 = \frac{(n-1)s^2}{\chi^2},

where s^2 is the sample variance, n is the sample size, and \chi^2 is the value of the chi-square statistic with (n - 1) degrees of freedom.
The confidence interval is

\left(\frac{(n-1)s^2}{\chi^2_{df,\,\alpha/2}}, \ \frac{(n-1)s^2}{\chi^2_{df,\,1-\alpha/2}}\right).
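The chi-square critical values in this interval come from tables or from R's qchisq(); a minimal sketch with illustrative values:

# Confidence interval for a population variance
n <- 20; s2 <- 9.3; alpha <- 0.05     # illustrative values
lower <- (n - 1) * s2 / qchisq(1 - alpha/2, df = n - 1)
upper <- (n - 1) * s2 / qchisq(alpha/2, df = n - 1)
c(lower, upper)                       # interval for sigma^2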

Assignment 7.4: Find \chi^2_{25,\,0.025}.

SOLUTION

Chapter Eight
Introduction to Statistical Test of Hypotheses
8.0 Introduction
A hypothesis is an assertion or conjecture about the population parameter(s). For example, H: \mu = \mu_0, H: \sigma^2 = \sigma_0^2, H: \mu \ge 10, etc. These are statements about the population parameters. Sometimes, a hypothesis could be a conjecture about parameters of two or more populations. For instance, H: \mu_1 = \mu_2, H: \sigma_1^2 = \sigma_2^2, H: p_1 = p_2, etc.
There are two types of hypothesis, namely, the null hypothesis and the alternative hypothesis. The null hypothesis is the hypothesis which is to be tested for acceptance or rejection. It is usually denoted as H_0. The alternative hypothesis, denoted as H_1 or H_A, is a conjecture about a population parameter which gives an alternative to the null hypothesis. For instance, if H_0: \mu = \mu_0, the alternatives are H_1: \mu \ne \mu_0, H_1: \mu > \mu_0, or H_1: \mu < \mu_0.

8.1 Introduction to Statistical Test of Hypotheses - Simple and Alternative.


Test of hypothesis is the statistical analysis carried out in order to decide whether to accept or reject the
null hypothesis. The method of hypothesis testing uses tests of significance to determine the likelihood
that a statement (often related to the mean or variance of a given distribution) is true, and at what
likelihood we would, as statisticians, accept the statement as true. While understanding the mathematical
concepts that go into the formulation of these tests is important, knowledge of how to appropriately use
each test (and when to use which test) is equally important.
Once the data are collected, tests of hypotheses follow these steps:
1. Using the sampling distribution of an appropriate test statistic, determine a critical region of size \alpha.
2. Determine the value of the test statistic from the sample data.
3. Check whether the value of the test statistic falls within the critical region; if yes, we reject the null in favor of the alternative hypothesis, and if no, we fail to reject the null hypothesis.
These three steps are what we will focus on for every test; namely, what the appropriate sampling distribution for each test is and what test statistic we use (the third step is done by simply comparing values).
Next, we give definitions and explanations of important concepts in statistical test of hypotheses.

8.1.1 Types of Errors in Hypothesis Testing:


There is always some possibility of committing an error in taking a decision about the hypothesis. These errors are grouped into two: the type I error and the type II error.
A type I error is committed when we reject Ho when Ho is actually true and H1 is false. On the other hand, a type II error is committed when we accept an incorrect Ho when H1 is actually correct. A type I error is conventionally regarded as the more serious of the two. Hence, in drawing a conclusion (decision) about Ho, the practice is to fix the risk of a type I error at a tolerable level and then make the chance of a type II error as small as possible.

8.1.2 Levels of Significance: This is defined as the quantity of risk of the type I error which we are willing to tolerate in making a decision about Ho. In other words, it is the probability of committing a type I error at a tolerable level or degree. It is denoted by α and is conventionally chosen as 0.05 (5%) or 0.01 (1%). A significance level of α = 0.01 is used for high precision while that of α = 0.05 is used for moderate precision.
8.1.3 P-value (Probability Value):
This is defined as the smallest level of α at which Ho is significant, that is, the smallest level of α at which Ho is rejected. It is a number that denotes the likelihood of your data having occurred under the null hypothesis of your statistical test. The level of statistical significance is usually represented as a p-value between 0 and 1. The smaller the p-value, the stronger the evidence for rejecting the null hypothesis.
The p-value enables individuals to decide for themselves how significant the data are; it avoids the imposition of a fixed level of significance for the acceptance or rejection of Ho.

8.1.4 Critical Region (C.R)


It is the region within which the test statistic lies or falls for the Ho to be rejected. A statistic that is used to
test the Ho follows some known distribution. In a test, the area under the probability density curve is
divided into two regions, namely, the region of acceptance and the region of rejection also known as the
critical region.
The area of the critical region is equal to the level of significance α. The critical region always lies on the tail(s) of the distribution curve, depending on whether the test is one-tailed or two-tailed.
If the alternative hypothesis H1 is of the type μ > μo, the critical region lies on the right tail of the probability density curve and the test is called a one-tailed test. Similarly, if H1 is of the type μ < μo, the critical region lies on the left tail of the probability density curve and the test is again called a one-tailed test.
If H1 is of the type μ ≠ μo, the critical region lies on both tails of the distribution curve and the test is called a two-tailed test.

In a two-tailed test, the total area of the critical region is still equal to α, but α/2 of it lies on each tail for a test of significance level α.

Fig. 8.1: One-Sided Right-Tailed Critical Region

Fig. 8.2: One-Sided Left-Tailed Critical Region

Fig. 8.3: Two-Tailed Critical Regions
8.1.5 Size of a Test: This is defined as the probability of rejecting the null hypothesis when it is true. It is usually denoted by α. Therefore

P(Reject Ho | Ho is true) = α. (8.1)


8.1.6 Power of a Test: This is defined as the probability of rejecting Ho when Ho is actually false and H1 is true.

Power = P(Reject Ho | H1 is true)
      = 1 − P(Accept Ho | H1 is true)
      = 1 − P(Type II error)
      = 1 − β. (8.2)
8.1.7 Degrees of Freedom: Degree of freedom (d.f) is defined as the number of independent observations
in a set. The table values for the distribution of the test statistics are provided on statistical tables for
various levels of significance and degrees of freedom. These tables of values enable us to decide about the
rejection or otherwise of Ho.
8.1.8 Point Estimate of a Population Parameter
A point estimate is a single value obtained from a sample and used as an estimate of a parameter. For
instance, a sample mean is a point estimate of a population mean.

8.1.9 Confidence Interval (CI) for a Population Mean

The confidence interval (CI) for a population mean is the interval within which the mean of a population is expected to lie with a given level of confidence. Generally, the (1 − α)100% confidence interval is obtained using the formula:

x̄ ± t(α/2, df) · SE(x̄), (8.3)

where x̄ is the mean of a sample drawn from the population, t(α/2, df) is the table value of an appropriate test statistic at a specified level of significance α with df degrees of freedom (where applicable), and SE(x̄) is the standard error of the sample mean.

8.2 Small Sample Tests for Population Mean and Difference of Two Population Means
We consider tests of hypotheses for one sample and two samples when dealing with samples of small size. In both cases, the Student t-test is applied.
Assumptions about the test include the following:
a. The random variable X follows a normal distribution
b. All the observations in the sample are independent
c. The sample size is not large, i.e. n < 30
d. The assumed value μo of the population mean is the correct value
e. The sample values are correctly taken and recorded.
In case the above assumptions do not hold, the reliability of the test decreases.

8.2.1 Test of Hypothesis about Population Mean for One Sample of Small Size
The t-distribution is used in the test of hypothesis about the population mean when the sample size is small. The t statistic is the deviation of the sample mean from the hypothesized population mean expressed in terms of its estimated standard error.
Suppose a small random sample (X1, X2, …, Xn) of size n has been drawn from a normal population having a mean μ and an unknown variance σ². We want to test the hypothesis:
Ho: μ = μo against H1: μ ≠ μo, where μo is some assumed value for μ.
Let the observed values of the random sample (X1, X2, …, Xn) be (x1, x2, …, xn). The statistic for the Student t-test is given by:

t = (x̄ − μo)/(s/√n), (8.4)

where the estimate of the population standard deviation is the sample standard deviation given by:

s = √[Σ(xᵢ − x̄)²/(n − 1)], (8.5)

or

s = √[{Σxᵢ² − (Σxᵢ)²/n}/(n − 1)]. (8.6)

If the calculated value of t from (8.4) is, in absolute value, greater than or equal to the table value t(n−1, α/2) at a given α and (n − 1) d.f, we reject Ho and conclude that the difference between the sample mean and the population mean is significant; otherwise we accept Ho.
Note that for a right-tailed test, where the alternative hypothesis takes the form H1: μ > μo, Ho is rejected if the calculated t is greater than or equal to the table value t(n−1, α) at the specified α and (n − 1) d.f.
Similarly, for a left-tailed test, where the alternative hypothesis takes the form H1: μ < μo, Ho is rejected if the calculated t is less than or equal to −t(n−1, α) at the specified α and (n − 1) d.f.

The rules for rejection of Ho are summarized below:

a. Given H1: μ ≠ μo, reject Ho if |t| ≥ t(n−1, α/2); otherwise do not reject (two-tailed test)
b. Given H1: μ > μo, reject Ho if t ≥ t(n−1, α); otherwise do not reject (right-tailed test)
c. Given H1: μ < μo, reject Ho if t ≤ −t(n−1, α); otherwise do not reject (left-tailed test)

For a t-test, the (1 − α)100% confidence interval (CI) is given as:

x̄ ± t(n−1, α/2) · s/√n, (8.7)

where the lower and upper limits are respectively x̄ − t(n−1, α/2)·s/√n and x̄ + t(n−1, α/2)·s/√n, and t(n−1, α/2) is the table value of t at a given α and a d.f of (n − 1).
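These rules and the interval (8.7) are easy to mechanize. The following minimal Python sketch (assuming scipy is available) applies them to the shoot-length data of Example 8.1 below:

import numpy as np
from scipy import stats

x = np.array([10.1, 21.5, 11.7, 12.9, 14.8, 11.0, 19.2, 11.4, 22.6, 10.8, 10.2])
mu0, alpha = 15.0, 0.05
n, xbar, s = len(x), x.mean(), x.std(ddof=1)
t_cal = (xbar - mu0) / (s / np.sqrt(n))        # equation (8.4)
t_tab = stats.t.ppf(1 - alpha / 2, df=n - 1)   # t(n-1, alpha/2) for a two-tailed test
half = t_tab * s / np.sqrt(n)                  # half-width of the CI in (8.7)
print(round(t_cal, 4), round(t_tab, 3), (round(xbar - half, 4), round(xbar + half, 4)))
# stats.ttest_1samp(x, mu0) reproduces this t value and adds a two-sided p-value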

Example 8.1: In a nodulation study on a certain breed of crop, a sample of eleven plants gave the following shoot lengths (cm): 10.1, 21.5, 11.7, 12.9, 14.8, 11.0, 19.2, 11.4, 22.6, 10.8, 10.2.
a. An earlier study reported that the mean shoot length is 15 cm. Test whether the experimental data confirm the old view at the 5% level of significance.
b. Determine the p-value of the test statistic and take a decision based on it.
c. Calculate the 95% confidence interval for μ.
d. Determine the upper and lower limits of the confidence interval.

Solution
(a) Since the sample size is small, a t-test is performed where we have to test the hypothesis:
Ho: μ = 15 against H1: μ ≠ 15
The test statistic is:

t = (x̄ − μo)/(s/√n),

μo = 15, x̄ = 14.2, n = 11, standard deviation s = 4.6861.
Therefore,

t = (14.2 − 15)/(4.6861/√11) = −0.8/1.4129 = −0.5662,

At the 5% level of significance, α/2 = 0.05/2 = 0.025. The table value of t for 10 d.f is 2.228.
Since |t| = 0.5662 < 2.228, Ho is not rejected at the 5% level of significance. We conclude that the experimental data confirm the old view at the 5% level of significance.
(b) Here we want to show how to compute a p-value in the absence of specialized calculators and computer software.
In the t distribution table, the calculated |t| = 0.5662 for 10 d.f is seen to lie between 0.260 (under an upper-tail probability of 0.40) and 0.700 (under an upper-tail probability of 0.25).
We interpolate between these values to get the upper-tail probability, say α', corresponding to 0.5662 as follows:

α' = 0.40 − [(0.5662 − 0.260)/(0.700 − 0.260)] × (0.40 − 0.25) = 0.40 − 0.1044 = 0.2956,

so that the two-tailed p-value is approximately 2 × 0.2956 = 0.59.
For a two-tailed test, since p ≈ 0.59 > α = 0.05, we do not reject Ho and therefore conclude that the experimental data confirm the old view at the 5% level of significance.
(c) The confidence interval (CI) for the population mean is:

x̄ ± t(n−1, α/2) · s/√n,

CI = 14.2 ± 2.228 × (4.6861/√11)
CI = 14.2 ± 3.148
CI = (11.0520, 17.3480).
(d) Lower limit = 11.0520,
Upper limit = 17.3480.

Example 8.2: A researcher claimed that the life expectancy of people living in country A is expected to be 50 years. A survey of life expectancy was conducted in eleven states of the country and the data obtained are as follows:
54.2 50.4 44.2 49.7 55.4 57.0 58.2 56.6 61.9 57.5 53.4
(a) Do the data support the researcher's claim at the 5% level of significance?
(b) Find the 95% confidence interval for the population mean, μ.

Solution
(a) We have to test
Ho: μ = 50 against H1: μ ≠ 50
The test statistic t is given by:

t = (x̄ − μo)/(s/√n),

Σx = 598.5, x̄ = 598.5/11 = 54.41.
Without computing individual deviations from x̄, we can compute s with equation (8.6):

s = √[{Σx² − (Σx)²/n}/(n − 1)],

Therefore,

s = √[(32799.91 − (598.5)²/11)/10] = √23.61 = 4.86,

t = (54.41 − 50)/(4.86/√11) = 4.41/1.465 = 3.01,

The table value of t at α/2 = 0.025 and a d.f of 10 for a two-tailed test is 2.228.
Since t = 3.01 > 2.228, we reject Ho and conclude that the life expectancy of the people in the said country is not 50 years.

(b) The confidence interval (CI) is given as:

x̄ ± t(n−1, α/2) · s/√n,

Therefore,

CI = 54.41 ± 2.228 × (4.86/√11) = 54.41 ± 3.26,

Upper limit = 54.41 + 3.26 = 57.67,
Lower limit = 54.41 − 3.26 = 51.15

8.2.2 Tests of Hypothesis about Difference in Population Means for Samples of Small Sizes
The t-distribution is also used in the test of hypothesis involving the difference in two population means when the sample sizes are small. The statistic is the difference between the two sample means expressed in terms of its estimated standard error.
Here, we assume that the samples are from two normal populations with means μ1 and μ2, and we desire to test the hypothesis:
Ho: μ1 = μ2 vs H1: μ1 ≠ μ2,
or equivalently,
Ho: μ1 − μ2 = 0 vs H1: μ1 − μ2 ≠ 0.

CASE 1: When σ1² = σ2² = σ². That is, the two populations are distributed with the same variance.
If two independent samples of sizes n1 and n2 are taken, respectively, from the two populations, the test statistic under Ho (that is, μ1 = μ2) is given by:

t = (x̄1 − x̄2)/(sp √(1/n1 + 1/n2)), (8.7)

with a d.f of (n1 + n2 − 2), where x̄1 and x̄2 are the means of sample I and sample II given as (Σx1ᵢ)/n1 and (Σx2ᵢ)/n2, respectively, and the pooled standard deviation sp is given by:

sp = √[{Σ(x1ᵢ − x̄1)² + Σ(x2ᵢ − x̄2)²}/(n1 + n2 − 2)], (8.8)

If the sample variances s1² and s2² are known or have been calculated earlier, then sp is computed using the formula:

sp = √[{(n1 − 1)s1² + (n2 − 1)s2²}/(n1 + n2 − 2)], (8.9)

Decision about Ho at the α level of significance is taken according to the following rules:

(I) Given H1: μ1 ≠ μ2, reject Ho if t ≥ t(n1+n2−2, α/2) or t ≤ −t(n1+n2−2, α/2); otherwise Ho is not rejected;
(II) Given H1: μ1 > μ2, reject Ho if t ≥ t(n1+n2−2, α); otherwise Ho is not rejected;
(III) Given H1: μ1 < μ2, reject Ho if t ≤ −t(n1+n2−2, α); otherwise Ho is not rejected.

From (8.7), the (1 − α)100% confidence interval (CI) for the difference of two means is given as:

(x̄1 − x̄2) ± t(n1+n2−2, α/2) · sp √(1/n1 + 1/n2), (8.10)

where the upper and lower limits are respectively given by:

(x̄1 − x̄2) + t(α/2) sp √(1/n1 + 1/n2), and (x̄1 − x̄2) − t(α/2) sp √(1/n1 + 1/n2).

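A minimal Python sketch of the pooled test (assuming scipy is available; the two samples are hypothetical):

import numpy as np
from scipy import stats

x1 = np.array([12.1, 14.3, 11.8, 13.5, 12.9])        # hypothetical sample I
x2 = np.array([13.4, 15.1, 14.8, 13.9, 15.6, 14.2])  # hypothetical sample II
n1, n2 = len(x1), len(x2)
sp2 = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / (n1 + n2 - 2)  # (8.9)
t_cal = (x1.mean() - x2.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))             # (8.7)
t_tab = stats.t.ppf(1 - 0.05 / 2, df=n1 + n2 - 2)
print(round(t_cal, 3), round(t_tab, 3))
# stats.ttest_ind(x1, x2, equal_var=True) gives the same t together with a p-value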
Example 8.3: The table below gives the average value of two random samples representing the monthly sales of a certain product in two countries X and Y.
Table 8.1: Data for Example 8.3

Month: Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
X:     363  404  518  521  613  587  365  412  469  468  371  330
Y:     536  474  556  549  479  422  315  414  505  552  492  507

(a) Assuming that the variances in sales of the product are the same in X and Y, test whether the average monthly sales in the two countries are equal.
(b) Determine the 95% confidence interval for the difference in the means of X and Y.
Solution:
(a) Let μ1 and μ2 be the means of the populations from which X and Y, respectively, were drawn. The hypothesis to test is given by:
Ho: μ1 = μ2 against H1: μ1 ≠ μ2
Under the assumption of equal variances, the test statistic is given by:

t = (x̄1 − x̄2)/(sp √(1/n1 + 1/n2)),

with n1 = n2 = 12,

Σx1 = 5421, x̄1 = 451.75, Σx1² = 2543583,
Σx2 = 5801, x̄2 = 483.42, Σx2² = 2859497,

sp = √[({Σx1² − (Σx1)²/n1} + {Σx2² − (Σx2)²/n2})/(n1 + n2 − 2)]
   = √[(94646.25 + 55196.92)/22] = √6811.05 = 82.53,

Therefore,

t = (451.75 − 483.42)/(82.53 × √(1/12 + 1/12)) = −31.67/33.69 = −0.94,

The table value of t at α/2 = 0.025 with a d.f of 22 is 2.074, that is, t(22, 0.025) = 2.074.

Since |t| = 0.94 < t(22, 0.025) = 2.074, Ho is not rejected. We conclude that the mean monthly sales of the product are the same in the two countries.

(b) The (1 − α)100% confidence interval for the difference of means under the assumption of equal variances is given by:

(x̄1 − x̄2) ± t(n1+n2−2, α/2) · sp √(1/n1 + 1/n2),

Therefore,

CI = (451.75 − 483.42) ± 2.074 × 33.69 = −31.67 ± 69.87,

with upper limit 38.20 and lower limit −101.54.

CASE 2: When σ1² ≠ σ2².
In this situation, the variances cannot be pooled. We use the Behrens–Fisher test statistic given by:

t = (x̄1 − x̄2)/√(s1²/n1 + s2²/n2), (8.11)

where s1² and s2² are estimates of σ1² and σ2², respectively.

The tabulated value of t, denoted by t', is obtained from Cochran's approximation using the formula:

t' = (w1 t1 + w2 t2)/(w1 + w2), where w1 = s1²/n1 and w2 = s2²/n2, (8.12)

and t1 and t2 are the table values of t at a prefixed level of significance α with (n1 − 1) and (n2 − 1) d.f, respectively.
If n1 = n2 = n, we obtain:

t' = (t1 + t2)/2, (8.13)

with (n − 1) d.f.

For decision about Ho: if |t| ≥ t', reject Ho; otherwise Ho is not rejected.

From (8.11), the (1 − α)100% confidence interval (CI) for the difference of two population means when σ1² ≠ σ2² is given by:

(x̄1 − x̄2) ± t' √(s1²/n1 + s2²/n2), (8.14)

where the upper and lower limits are respectively given by:

(x̄1 − x̄2) + t' √(s1²/n1 + s2²/n2), and (x̄1 − x̄2) − t' √(s1²/n1 + s2²/n2).

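The sketch below (Python, assuming scipy is available) applies (8.11) and Cochran's approximation (8.12) to the breed data of Example 8.4 that follows:

import numpy as np
from scipy import stats

x1 = np.array([57.3, 26.9, 53.2, 16.8, 44.8, 54.2, 71.4])   # Breed 1
x2 = np.array([64.2, 52.2, 48.6, 26.6, 44.5, 71.8])         # Breed 2
n1, n2 = len(x1), len(x2)
w1, w2 = x1.var(ddof=1) / n1, x2.var(ddof=1) / n2
t_cal = (x1.mean() - x2.mean()) / np.sqrt(w1 + w2)          # equation (8.11)
alpha = 0.05
t1 = stats.t.ppf(1 - alpha / 2, df=n1 - 1)                  # table t with (n1 - 1) d.f
t2 = stats.t.ppf(1 - alpha / 2, df=n2 - 1)                  # table t with (n2 - 1) d.f
t_prime = (w1 * t1 + w2 * t2) / (w1 + w2)                   # Cochran approximation (8.12)
print(round(t_cal, 2), round(t_prime, 2))                   # about -0.52 and 2.50
# scipy's stats.ttest_ind(x1, x2, equal_var=False) uses the Welch-Satterthwaite d.f instead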
Example 8.4: The table below gives the gain in body weight (kilograms) per heifer of two different breeds under a grazing treatment.
Table 8.2: Data for Example 8.4

Breed 1: 57.3  26.9  53.2  16.8  44.8  54.2  71.4
Breed 2: 64.2  52.2  48.6  26.6  44.5  71.8


At the 5% level of significance, test the equality of the mean gain in body weight of the two breeds under the assumption that the population variances are different.
Solution
We want to test the hypothesis
Ho: μ1 = μ2 against H1: μ1 ≠ μ2
Under the assumption of unequal population variances, we have:

t = (x̄1 − x̄2)/√(s1²/n1 + s2²/n2),

Σx1 = 324.6, x̄1 = 46.37, Σx1² = 17162.02, n1 = 7.
Σx2 = 307.9, x̄2 = 51.32, Σx2² = 17051.49, n2 = 6.

s1² = {Σx1² − (Σx1)²/n1}/(n1 − 1) = (17162.02 − 15052.17)/6 = 351.64,

s2² = {Σx2² − (Σx2)²/n2}/(n2 − 1) = (17051.49 − 15800.40)/5 = 250.22,

Therefore,

t = (46.37 − 51.32)/√(351.64/7 + 250.22/6) = −4.95/√91.94 = −0.52,

t1 = t(6, 0.025) = 2.447, and t2 = t(5, 0.025) = 2.571,

Therefore,

t' = (w1 t1 + w2 t2)/(w1 + w2) = (50.23 × 2.447 + 41.70 × 2.571)/(50.23 + 41.70) = 2.50,

The calculated value |t| = 0.52 < t' = 2.50; therefore, Ho is not rejected. We conclude that the average gain in body weight of the two breeds is equal.
8.3 Large Sample Tests for Population Mean and Difference of Two Population Means

We apply the Z-test when we are dealing with large samples, that is, sample sizes equal to or greater than 30.

8.3.1 Large Sample Tests for Population Mean


To test Ho: μ = μo against the alternative hypothesis when the sample size is large, we use the Z-test (also referred to as the normal test) given by:

Z = (x̄ − μo)/(σ/√n), (8.15)

The population variance σ² may be estimated by the sample variance s².


Example 8.5: The table below shows the income (in thousand naira) of 36 randomly selected persons
from a particular class of people in Nigeria.
Table 8.3: Data for Example 8.5
6.5 10.5 12.7 13.8 13.2 11.4
5.5 8.0 9.6 9.1 9.0 8.5
4.8 7.3 8.4 8.7 7.3 7.4
5.5 6.8 6.9 6.8 6.1 6.5
4.0 6.4 6.4 8.0 6.6 6.2
4.7 7.4 8.0 8.3 7.6 6.7
On the basis of the above data, can it be said that the mean income of persons in this class is N10,000 per year?
Solution
We have to test the hypothesis
Ho: μ = 10 against H1: μ ≠ 10
Since the sample size is large, we use the Z-test (normal test):

Z = (x̄ − μo)/(s/√n),

x̄ = 280.6/36 = 7.79, s = 2.27, n = 36.
Therefore,

Z = (7.79 − 10)/(2.27/√36) ≈ −5.8.

The table value of Z at α/2 = 0.025 for a two-tailed test is 1.96. Since Zcal ≈ −5.8 is less than −1.96, we reject Ho and conclude that the mean income is not N10,000.
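The same computation in Python (assuming numpy and scipy are available), using the income data of Table 8.3:

import numpy as np
from scipy.stats import norm

x = np.array([6.5, 10.5, 12.7, 13.8, 13.2, 11.4,
              5.5,  8.0,  9.6,  9.1,  9.0,  8.5,
              4.8,  7.3,  8.4,  8.7,  7.3,  7.4,
              5.5,  6.8,  6.9,  6.8,  6.1,  6.5,
              4.0,  6.4,  6.4,  8.0,  6.6,  6.2,
              4.7,  7.4,  8.0,  8.3,  7.6,  6.7])
mu0 = 10.0
z = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(x.size))   # equation (8.15)
print(round(z, 2), round(norm.ppf(1 - 0.05 / 2), 2))       # about -5.8 and 1.96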

8.3.2 Large Sample Test for Difference in Two Population Means

To test the hypothesis Ho: μ1 − μ2 = δo vs H1: μ1 − μ2 ≠ δo, we use the Z-test, where the test statistic is given by:

Z = (x̄1 − x̄2 − δo)/√(σ1²/n1 + σ2²/n2), (8.16)

where δo is the hypothesized difference between the two population means (δo = 0 when testing the equality of the means).

In the event that the population variances are unknown, their estimates are obtained from the sample variances s1² and s2², respectively.
Decision about Ho is taken according to the following rules:
I. Given H1: μ1 − μ2 ≠ δo, reject Ho if Z ≥ Z(α/2) or if Z ≤ −Z(α/2)
II. Given H1: μ1 − μ2 > δo, reject Ho if Z ≥ Z(α)
III. Given H1: μ1 − μ2 < δo, reject Ho if Z ≤ −Z(α).

Example 8.6: Two samples were drawn from two normal populations N(μ1, σ1²) and N(μ2, σ2²). The following information was available on these samples regarding the expenditure in naira per family:

Sample 1: n1 = …, x̄1 = …, s1 = …
Sample 2: n2 = …, x̄2 = …, s2 = …

Is there a significant difference between the mean expenditures of the two groups of families?
Solution
We want to test the hypothesis
Ho: μ1 = μ2 against H1: μ1 ≠ μ2
Since the sample sizes are large, Ho is tested by the statistic

Z = (x̄1 − x̄2)/√(s1²/n1 + s2²/n2),

The table value of Z at α/2 = 0.025 is 1.96. Since the calculated Z is greater than the tabulated Z, we reject Ho and conclude that the difference in mean expenditure is significant.
8.4 Tests of Hypothesis for Proportion (One Sample Case)
If the observations on various items are categorized into two classes, C1 and C2 (a binomial population), we often want to test the hypothesis of whether or not the proportion of items in a particular class, say C1, equals some value Po. Thus the hypothesis

Ho: P = Po against H1: P ≠ Po or H1: P > Po or H1: P < Po

can be tested using the Z-test, where P is the actual proportion of items in the population belonging to class C1.
Let p̂ be the sample estimate of P. Ho can be tested by the statistic

Z = (p̂ − Po)/√(Po Qo/n), (8.17)

where n is the sample size, p̂ is the proportion obtained from the sample, Po is the hypothesized or assumed value of the population proportion, and Qo = 1 − Po.
Decision about Ho can be taken as follows:
I. Given H1: P ≠ Po, reject Ho if Z ≥ Z(α/2) or Z ≤ −Z(α/2)
II. Given H1: P > Po, reject Ho if Z ≥ Z(α)
III. Given H1: P < Po, reject Ho if Z ≤ −Z(α)
At α = 0.05, Z(α/2) = 1.96 and Z(α) = 1.645;
at α = 0.01, Z(α/2) = 2.58 and Z(α) = 2.33.

Example 8.7: To test the claim of the management that 60% of employees support a new bonus scheme, a sample of 150 employees was drawn. It was discovered that only 55 of the employees support the new bonus scheme. Is the management right in their claim at α = 0.01?
Solution:
The hypothesized value of P is Po = 60% = 60/100 = 0.6,
n = 150, p̂ = 55/150 = 0.3667.
We want to test the hypothesis:

Ho: P = 0.6 vs H1: P ≠ 0.6

Z = (p̂ − Po)/√(Po Qo/n),

Z = (0.3667 − 0.6)/√(0.6 × 0.4/150) = −0.2333/0.04,

Z = −5.83.

At α = 0.01, the table value of Z is 2.58. Since the calculated Z is less than −2.58, we reject Ho and conclude that it is not true that 60% of the employees support the new bonus scheme.
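A quick machine check of this example (Python, assuming scipy is available):

import math
from scipy.stats import norm

n, x = 150, 55
p_hat, P0 = x / n, 0.60
Q0 = 1 - P0
z = (p_hat - P0) / math.sqrt(P0 * Q0 / n)            # equation (8.17)
print(round(z, 2), round(norm.ppf(1 - 0.01 / 2), 2)) # about -5.83 and 2.58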
8.5 Test of Equality of Two Proportions (Two Sample Case)
Suppose we have two populations, and each item of a population belongs to either of two classes C1 and C2. A researcher may be interested in knowing whether or not the proportions of items in class C1 in both populations are the same. Here, we wish to test the hypothesis:

Ho: P1 = P2 against H1: P1 ≠ P2 or H1: P1 > P2 or H1: P1 < P2,

where P1 and P2 are the proportions of items in the two populations belonging to class C1.
When it is assumed that the population proportions are equal, that is, P1 = P2 = P, Ho can be tested against H1 using the statistic

Z = |p̂1 − p̂2| / √(p̂ q̂ (1/n1 + 1/n2)), (8.18)

where q̂ = 1 − p̂, and

p̂ = (n1 p̂1 + n2 p̂2)/(n1 + n2), (8.19)

The standard error of (p̂1 − p̂2) is given by √(p̂ q̂ (1/n1 + 1/n2)).

Under the assumption that P1 ≠ P2, the test statistic becomes:

Z = |p̂1 − p̂2| / √(p̂1 q̂1/n1 + p̂2 q̂2/n2), (8.20)

Decisions about Ho in equations (8.18) and (8.20) are the same as in equation (8.17) above for the one-sample case.
Example 8.8: A sample of 400 families in City A is randomly selected, and another sample of 500 families is selected in City B. The numbers of TV owners in City A and City B are 48 and 120, respectively. Test the hypothesis that the proportions of TV owners in both cities are the same at α = 0.05.
Solution:
n1 = 400, x1 = 48, p̂1 = 48/400 = 0.12; n2 = 500, x2 = 120, p̂2 = 120/500 = 0.24.
Under the assumption that P1 = P2,

p̂ = (n1 p̂1 + n2 p̂2)/(n1 + n2) = (48 + 120)/900 = 0.1867,

q̂ = 1 − p̂ = 0.8133,

Z = |p̂1 − p̂2| / √(p̂ q̂ (1/n1 + 1/n2)),

Z = |0.12 − 0.24| / √(0.1867 × 0.8133 × (1/400 + 1/500)) = 0.12/0.0261,

Z = 4.59.

The table value of Z at α/2 = 0.025 for a two-tailed test is 1.96. Since the calculated Z is greater than the tabulated Z, we reject Ho and conclude that the proportions of TV owners in City A and City B are not the same.
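The pooled two-proportion test of (8.18) and (8.19) can be sketched in Python as follows (only the standard math module is assumed; the figures are those of the example above):

import math

n1, x1, n2, x2 = 400, 48, 500, 120
p1, p2 = x1 / n1, x2 / n2
p = (x1 + x2) / (n1 + n2)                                 # pooled proportion, (8.19)
q = 1 - p
z = abs(p1 - p2) / math.sqrt(p * q * (1 / n1 + 1 / n2))   # equation (8.18)
print(round(z, 2))                                        # about 4.59, beyond the 1.96 cut-off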
Assignment 8.1:
The following figures give the forage yield (q/ha) of ten randomly selected plots.

21.8, 24.8, 23.3, 29.3, 30.8, 31.8, 32.4, 32.5, 32.1, 31.3.
(a) On the basis of the data, test the claim that the mean yield of forage is 30 q/ha (i) at α = 0.05 and (ii) at α = 0.01.
(b) Determine the corresponding (1 − α)100% confidence interval and limits for both a(i) and a(ii).

SOLUTION

Assignment 8.2:
The yields from two strains of a crop were found to be as given below:
Strain 1: 15.4, 20.5, 15.9, 38.3, 8.7, 37.0, 39.2, 26.4
Strain 2: 33.6, 10.7, 6.5, 24.0, 42.5, 22.9
Test whether the mean yields of the two strains are equal at α = 0.05.
SOLUTION

Assignment 8.3:
Given that n = …, x̄ = …, and s = ….
Test the hypothesis Ho: μ = μo against (i) H1: μ ≠ μo, (ii) H1: μ > μo, (iii) H1: μ < μo.
SOLUTION

Assignment 8.4: A random sample of 1000 workers from South India shows that their mean wages are $47 per week with a standard deviation of $28. A random sample of 1500 workers from North India shows that their mean wages are $49 per week with a standard deviation of $40. Is there any significant difference between the mean wages of workers in the two regions? (take α = 0.05).

SOLUTION

Assignment 8.5:
A random sample of 100 articles from a selected batch of 2000 articles shows that the average diameter of
the articles is 0.354mm with a standard deviation of 0.048mm. Find the 95% confidence interval for the
average diameter of this batch of 2000 articles.
SOLUTION

Assignment 8.6:
Is it likely that a sample of size 300 whose mean is 12 is a random sample from a large population with
mean 12.5 and a standard deviation 5.2?
SOLUTION

Assignment 8.7:
To test the claim of the management that 75% of employees support a new policy, a sample of 300 employees was drawn. It was discovered that only 90 of the employees support the new policy. Is the management right in their claim at α = 0.01?

SOLUTION

Assignment 8.8:
A sample of 150 families in City A is randomly selected, and another sample of 250 families is selected in City B. The numbers of car owners in City A and City B are 27 and 80, respectively. Test the hypothesis that the proportions of car owners in both cities are the same at α = 0.05.

SOLUTION

8.6 Nonparametric Tests
Nonparametric tests are tests that do not require knowledge about the form of the parent distribution.
Nonparametric statistics is a type of statistical analysis that makes minimal assumptions about the
underlying distribution of the data being studied.

The reasons for the application of nonparametric tests include the following:
1. When the underlying data do not meet the assumptions of the parametric procedure. Generally, the application of parametric tests requires various assumptions to be satisfied, for example, that the data follow a normal distribution and that the population variance is homogeneous.
2. When the population sample size is too small. The sample size is an important assumption in selecting
the appropriate statistical method. If a sample size is reasonably large, the applicable parametric test can
be used. However, if a sample size is too small, it is possible that you may not be able to validate the
distribution of the data. Thus, the application of nonparametric tests is the only suitable option.

3. The analyzed data is ordinal or nominal. Unlike parametric tests that can work only with continuous
data, nonparametric tests can be applied to other data types such as ordinal or nominal data. For such
types of variables, the nonparametric tests are the only appropriate solution.

The chi-square test is taken as parametric as well as nonparametric. Other examples of nonparametric
tests include the sign test, the rank test, etc.

8.6.1 The Chi Square Distribution:


The chi-square distribution was first discovered by Helmert in 1876 and later, independently, by Karl Pearson in the year 1900. In probability theory and statistics, the chi-squared distribution (also chi-square or χ²-distribution) with k degrees of freedom is the distribution of a sum of the squares of k independent standard normal random variables. The chi-squared distribution is a special case of the gamma distribution and is one of the most widely used probability distributions in inferential statistics, notably in hypothesis testing and in the construction of confidence intervals. The chi-squared distribution is used in the chi-squared tests for goodness of fit of an observed distribution to a theoretical one and for the independence of two criteria of classification of qualitative data, as well as in finding the confidence interval for estimating the population standard deviation of a normal distribution from a sample standard deviation, etc.
8.6.2 Goodness-of-Fit Test:
When the data are given in k mutually exclusive categories or classes, the goodness-of-fit test is regarded as a one-way classification test. Here, we test the null hypothesis that the probability of an item falling in the i-th class is pᵢ, or simply that the frequencies in the k classes occur in the ratio p1 : p2 : … : pk.
Suppose a random sample of n units falls in the k classes or categories as follows:

Categories:           C1   C2   …   Ck
Observed frequencies: O1   O2   …   Ok

The hypothesis Ho: P(item falls in class i) = pᵢ for all i, against H1: Ho is not true,
can be tested by chi-square, where the test statistic is given by:

χ²(k−1) = Σᵢ (Oᵢ − Eᵢ)²/Eᵢ, (8.21)

where the suffix (k − 1) is the d.f for χ², and Oᵢ and Eᵢ = n pᵢ are the observed and expected frequencies.
Reject Ho at a given α if the calculated chi-square is greater than or equal to the table value of chi-square at the given α and (k − 1) d.f. Rejection of Ho means that the observed data are not in agreement with the expected ratio.
Example 8.9: Mendel conducted a classic experiment on peas to study the genetic effect of colour and shape in the first generation, after taking four crosses, namely, Round and Yellow (RY), Round and Green (RG), Wrinkled and Yellow (WY), and Wrinkled and Green (WG). According to his theory, the frequencies of these four classes should be in the ratio 9 : 3 : 3 : 1. However, the observed frequencies he found are shown in the table below:

Table 8.4: Observed frequencies for four crosses of peas

Classes:        RY    RG    WY    WG    Total
Observed Freq.: 315   108   101   32    556

Do the observed frequencies support Mendel's theory at α = 0.05?


Solution
We want to test the hypothesis:
Ho: the class frequencies occur in the ratio 9 : 3 : 3 : 1, against H1: Ho is not true.

Sum of ratios = 9 + 3 + 3 + 1 = 16.
The expected frequencies (Eᵢ) based on Mendel's theory are calculated as Eᵢ = 556 × (ratioᵢ)/16.
Therefore,

E1 = 556 × 9/16 ≈ 313, E2 = 556 × 3/16 ≈ 104,
E3 = 556 × 3/16 ≈ 104, E4 = 556 × 1/16 ≈ 35.

The expected frequencies are rounded in such a way that the sum of the observed frequencies is equal to the sum of the expected frequencies.
The test statistic is given by:

χ²(k−1) = Σ (Oᵢ − Eᵢ)²/Eᵢ, with k = 4,

χ² = (315 − 313)²/313 + (108 − 104)²/104 + (101 − 104)²/104 + (32 − 35)²/35 = 0.51.

At α = 0.05, the table value is χ²(3, 0.05) = 7.815, which is greater than the calculated value of chi-square. Hence, Ho is not rejected. We conclude that the data support Mendel's theory.
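A machine check of this goodness-of-fit test follows (Python, assuming scipy is available). The statistic differs slightly from the hand computation because the expected counts are not rounded here:

import numpy as np
from scipy.stats import chisquare, chi2

observed = np.array([315, 108, 101, 32])
expected = observed.sum() * np.array([9, 3, 3, 1]) / 16   # 312.75, 104.25, 104.25, 34.75
stat, p_value = chisquare(observed, f_exp=expected)
print(round(stat, 3), round(p_value, 3))        # about 0.47 with a large p-value
print(round(chi2.ppf(0.95, df=3), 3))           # table value 7.815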

8.7 Contingency Table for Independence of Factors
A contingency table is a rectangular array of order m × p, where m denotes the number of rows, which equal the categories of an attribute or criterion A, and p denotes the number of columns, which equal the categories of an attribute or criterion B. An example of a contingency table is shown below:

Table 8.5: Contingency table

Attribute A \ Attribute B:   B1    B2    …    Bp    Total
A1                           O11   O12   …    O1p   R1
A2                           O21   O22   …    O2p   R2
…                            …     …     …    …     …
Am                           Om1   Om2   …    Omp   Rm
Total                        C1    C2    …    Cp    n

In Table 8.5, the cell frequency Oᵢⱼ in the (i, j)-th cell represents the number of items or individuals possessing the characteristics of Aᵢ and Bⱼ. Any of the row totals Rᵢ and column totals Cⱼ is called a marginal total, and n is the grand total.
A contingency table simplifies the process of computing the expected frequencies. It is very handy when carrying out the test of independence of factors, as the following example will show. The test statistic for independence of factors is the same as in the goodness-of-fit test, and the degrees of freedom equal (m − 1)(p − 1). The test is always right-tailed.
We test the hypothesis:
Ho: the two factors are independent vs H1: the two factors are dependent.
The expected frequency Eᵢⱼ for the (i, j)-th cell is calculated using the formula:

Eᵢⱼ = (Rᵢ × Cⱼ)/n, (8.22)

Example 8.10:
In a volunteer group, adults 18 years and older volunteer to spend one to nine hours per week with children in an orphanage home. The consultant employs community college students, four-year college students, and non-students. The table below shows the types of volunteers and the number of hours they volunteer per week.

Table 8.6: Data for Example 8.10

                               Number of hours worked per week
Type of volunteers             1–3     4–6     7–9     Row Total
Community college students     111      96      48     255
Four-year college students      96     133      61     290
Non-students                    91     150      53     294
Column Total                   298     379     162     839

Is the number of hours volunteered independent of the type of volunteer at α = 0.05?


Solution
The two factors are the number of hours volunteered and the type of volunteer. This test is always right-tailed.
Ho: The number of hours volunteered is independent of the type of volunteer.
H1: The number of hours volunteered is dependent on the type of volunteer.
n = 839.
Using (8.22) to calculate the respective expected frequencies, we have, for instance,

E11 = (255 × 298)/839 = 90.57, since R1 = 255 and C1 = 298.

Similarly,

E21 = (290 × 298)/839 = 103.00, since R2 = 290 and C1 = 298,
E31 = (294 × 298)/839 = 104.42, since R3 = 294 and C1 = 298.

The complete expected frequencies are presented in the table below:

Table 8.7: Expected frequencies for the problem in Example 8.10

                               Number of hours worked per week
Type of volunteers             1–3       4–6       7–9
Community college students     90.57     115.19    49.24
Four-year college students     103.00    131.00    56.00
Non-students                   104.42    132.81    56.77

The test statistic is given by:

χ² = Σᵢ Σⱼ (Oᵢⱼ − Eᵢⱼ)²/Eᵢⱼ,

d.f = (m − 1)(p − 1) = (3 − 1)(3 − 1) = 4,

χ² = (111 − 90.57)²/90.57 + (96 − 115.19)²/115.19 + (48 − 49.24)²/49.24
   + (96 − 103.00)²/103.00 + (133 − 131.00)²/131.00 + (61 − 56.00)²/56.00
   + (91 − 104.42)²/104.42 + (150 − 132.81)²/132.81 + (53 − 56.77)²/56.77
   = 12.99.
At α = 0.05, the table value is χ²(4, 0.05) = 9.488. Since χ²cal = 12.99 > 9.488, we reject Ho and conclude that the two attributes are not independent (that is, the type of volunteer and the number of hours volunteered are dependent on each other).
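The whole computation, including the expected frequencies of Table 8.7, can be reproduced with one scipy call (a Python sketch, assuming scipy is available):

import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[111,  96, 48],
                  [ 96, 133, 61],
                  [ 91, 150, 53]])
stat, p_value, dof, expected = chi2_contingency(table)   # expected follows equation (8.22)
print(round(stat, 2), dof, round(p_value, 4))            # about 12.99 with 4 d.f, p < 0.05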
Assignment 8.9
In a recent diet survey as regards the consumption of tea in two communities in a certain city, the
following results were obtained:
Community A Community B

No. of families that take tea 1236 164

No. of families that do not take tea 564 36

Test at α = 0.05 whether there is a significant difference between the two communities with regard to tea consumption.
SOLUTION

Chapter Nine
9.1 Introduction.
In this chapter, we shall be looking at the dependence of factors or variables which take only numerical values. Quite often, the goal is to establish the actual or quantitative relationship between two or more variables. This aspect is studied under regression, which forms the second part of the chapter. We commence the chapter with correlation analysis, which is used in situations where the researcher is concerned only with the strength of the relationship and not the actual form of the relationship between the variables.

9.2 Correlation.
Correlation is a measure of the degree of relationship existing between two variables; it could be defined as the degree to which the variables are related. Unlike regression, which indicates the quantitative relationship between one or more independent variables and a dependent variable, correlation gives an estimate of the strength of the relationship between these variables.
Similar to what we have in regression analysis, when the relationship is between two variables it is termed simple correlation. On the other hand, if the relationship is between more than two variables, it is called multiple correlation.
There are mathematical equations for determining whether or not correlation exists between variables. However, a graphical method called the scatter diagram can also be used to check for correlation.

9.3 Types of Correlation


Broadly, correlation is divided into two types, namely linear correlation and non-linear correlation.

9.3.1 Linear correlation


Correlation is said to be linear when all points on a scatter diagram appear to cluster around a straight line. There are two types of linear correlation, namely, positive and negative linear correlation. Two variables are said to be linearly and positively correlated if they tend to move in the same direction. In other words, an increase in the value of one variable is associated with an increase in the value of the other variable, and vice versa.

Diagrammatically, positive linear correlation is represented by Fig. 9.1 below.

Fig. 9.1: Positive linear correlation between X and Y

In Fig. 9.1 above, it can be seen that not all the points on the scatter diagram fall on the straight line. This shows that the correlation between X and Y is not perfect. If all the points fell on the straight line, the correlation would be said to be perfect positive.

Two variables are said to be negatively and linearly correlated if they tend to move in opposite directions. An increase in the value of one of the variables is associated with a decrease in the value of the other variable, and vice versa.
The diagram of two variables that are negatively and linearly correlated is shown in Fig. 9.2.

Fig. 9.2: Negative linear correlation between X and Y

In situations where all the points in Fig. 9.2 above fall on the straight line, the correlation is said to be perfect negative.
9.3.2 Non-linear Correlation
Correlation between two variables is said to be non-linear when the points appear to form a curve. As in linear correlation, non-linear correlation could be positive or negative. The correlation between two variables X and Y is said to be positive non-linear if their values change in the same direction as described by a curve.
The diagram of this type of correlation is shown in Fig. 9.3.

Fig. 9.3: Positive non-linear correlation between X and Y

Two variables are negatively non-linearly correlated if their respective values change in opposite directions, forming an inverse relationship along a curve in the XY plane. The diagram is shown in Fig. 9.4.

Fig. 9.4: Negative non-linear correlation between X and Y


9.3.3 Zero Correlation
This type of correlation exists in situations where two variables are uncorrelated with each other. In other words, there is no linear relationship between the two variables. When the values of the two variables are plotted in a scatter diagram, the points are seen to be dispersed all over the XY plane with no suitable line or curve to fit them.

Fig. 9.5: Zero Correlation

9.4 Methods of Computing Correlation Coefficient

The correlation coefficient, denoted as rXY or simply r, between two variables X and Y is defined as the ratio of the covariance of X and Y to the square root of the product of the variance of X and that of Y. Mathematically,

r = Cov(X, Y)/√(Var(X) Var(Y)) (9.1)

If we are given n pairs of sample observations (x1, y1), (x2, y2), …, (xn, yn), then two computational methods are available for determining the strength of the correlation between X and Y, namely the Karl Pearson coefficient of correlation and the Spearman rank coefficient of correlation.

The Karl Pearson coefficient of correlation is given as:

r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √[Σ(xᵢ − x̄)² Σ(yᵢ − ȳ)²], (9.2)

r = [Σxy − (Σx)(Σy)/n] / √[{Σx² − (Σx)²/n}{Σy² − (Σy)²/n}], (9.3)

or, for computation,

r = [nΣxy − (Σx)(Σy)] / √[{nΣx² − (Σx)²}{nΣy² − (Σy)²}]. (9.4)

The Spearman rank coefficient of correlation, denoted by the symbol rs, can also be used to determine the strength of the correlation between X and Y, but by first ranking the n observations on X and then on Y. Once this is done, we can make use of the Spearman rank coefficient of correlation given by the formula:

rs = 1 − 6Σdᵢ² / [n(n² − 1)], (9.5)

where dᵢ is the difference in the ranks of the i-th paired observations and Σdᵢ² is the sum of their squares.
The Spearman rank correlation is specifically used when the observations are qualitative (e.g., rankings).

The correlation coefficient (r or rs) is a pure number; it is independent of the units in which X and Y are measured. Furthermore, −1 ≤ r ≤ 1 and −1 ≤ rs ≤ 1.
When r (or rs) = +1, there exists a perfect positive correlation between X and Y. When r (or rs) = −1, there exists a perfect negative correlation between X and Y. When two variables are independent, the correlation between them is zero. However, the converse is not true, because r = 0 does not necessarily mean that X and Y are independent; there could be other forms of relationship (e.g., quadratic) between X and Y other than a linear relationship. Between −1 and 0, and 0 and 1, we have weak correlation ((−0.39, −0.1) and (0.1, 0.39)), fair correlation ((−0.69, −0.4) and (0.4, 0.69)) and strong correlation ((−0.99, −0.7) and (0.7, 0.99)).
Assumptions about Correlation Coefficient:
1. The random variables X and Y are distributed normally;
2. X and Y are linearly related;
3. There is a cause and effect relationship between the factors affecting the values of X and Y in the
series of data.

9.4.1 Test of Significance of Correlation Coefficient
The random samples (x1, y1), (x2, y2), …, (xn, yn) are drawn from the population under consideration. Whatever conclusions are deduced from the sample r or rs are meant to be used to make inferences about the parent population. For us to be sure of the validity of such inferences about the population, we need some confirmation of the correctness of these sample statistics (r or rs) by way of a test of significance.
Assuming R is the population parameter corresponding to r, the test of significance of the correlation coefficient means testing whether or not the correlation coefficient is zero in the population, that is, we test

Ho: R = 0 vs H1: R ≠ 0.

The test statistic for testing Ho is

t = r/SE(r), (9.6)

where r is the estimated value of R based on the n paired observations and SE(r) is the standard error of r given as:

SE(r) = √[(1 − r²)/(n − 2)]. (9.7)

Therefore,

t = r√(n − 2)/√(1 − r²). (9.8)

If the calculated value of t is greater than the table value of t for a given level of significance α and (n − 2) degrees of freedom, reject Ho; otherwise accept Ho. Rejecting Ho leads to the conclusion that the two variables X and Y are dependent (i.e., not independent). This means that the correlation between them is worth considering. If Ho is accepted, it means that the value of r is due to sampling, whereas in reality the two variables are uncorrelated in the population.

Example 9.1: The table below shows the 9 paired observations for the variables X and Y
Table 9.1: Paired observations for X and Y

X 400 440 480 550 620 650 660 740 760

Y 50 60 70 85 95 100 105 115 120

(a) Calculate the value of r. (b) Test the significance of r at the 5% level of significance.
Solution
We make use of the formula:

r = [nΣxy − (Σx)(Σy)] / √[{nΣx² − (Σx)²}{nΣy² − (Σy)²}]

Table 9.2: Computations for the solution to Example 9.1

S/N     x      y     xy       x²        y²
1       400    50    20000    160000    2500
2       440    60    26400    193600    3600
3       480    70    33600    230400    4900
4       550    85    46750    302500    7225
5       620    95    58900    384400    9025
6       650    100   65000    422500    10000
7       660    105   69300    435600    11025
8       740    115   85100    547600    13225
9       760    120   91200    577600    14400
Total   5300   800   496250   3254200   75900

(a) Using the respective totals from the table above, we have

r = [9(496250) − (5300)(800)] / √[{9(3254200) − (5300)²}{9(75900) − (800)²}]
  = 226250/√[(1197800)(43100)]
  = 0.9958.

We conclude that there is a strong positive linear correlation between the variables X and Y.

(b) To test the significance of the population correlation coefficient, we test

Ho: R = 0 vs H1: R ≠ 0,

where the test statistic is given as:

t = r√(n − 2)/√(1 − r²).

For r = 0.9958 and n = 9, we have

t = 0.9958√7/√(1 − 0.9916) = 28.8.

The table value of t at 5% and 7 d.f is 2.365. Since t > 2.365, we reject Ho, implying that there is significant correlation between X and Y.
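Both parts of this example can be verified in Python (assuming scipy is available):

import numpy as np
from scipy import stats

x = np.array([400, 440, 480, 550, 620, 650, 660, 740, 760])
y = np.array([50, 60, 70, 85, 95, 100, 105, 115, 120])
r, p_value = stats.pearsonr(x, y)               # Karl Pearson r with its p-value
n = len(x)
t_cal = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)  # equation (9.8)
print(round(r, 4), round(t_cal, 1), round(stats.t.ppf(0.975, df=n - 2), 3))
# about 0.9958, 28.8 and the table value 2.365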

Example 9.2
The table below shows the 9 paired observations for the variables X and Y.
Table 9.3: Data for Example 9.2

X: 20  14  18  25  21  35  26  37  29
Y: 24  23  20  15  25  18  17  31  32

Calculate the Spearman rank correlation between X and Y.


Solution
In order to calculate the Spearman rank correlation rs, we rank the values of x and those of y in ascending order to get the respective ranks in the table below:

Table 9.4: Data for estimating rs for Example 9.2

S/N   x    y    rank(x)   rank(y)   d     d²
1     20   24   3         6         -3    9
2     14   23   1         5         -4    16
3     18   20   2         4         -2    4
4     25   15   5         1         4     16
5     21   25   4         7         -3    9
6     35   18   8         3         5     25
7     26   17   6         2         4     16
8     37   31   9         8         1     1
9     29   32   7         9         -2    4

Σd² = 100,

rs = 1 − 6Σd²/[n(n² − 1)] = 1 − 6(100)/[9(81 − 1)] = 1 − 600/720 = 0.1667.

We conclude that there is a weak positive correlation between X and Y.
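The same result can be obtained from scipy's built-in Spearman routine (a Python sketch, assuming scipy is available):

import numpy as np
from scipy.stats import spearmanr

x = np.array([20, 14, 18, 25, 21, 35, 26, 37, 29])
y = np.array([24, 23, 20, 15, 25, 18, 17, 31, 32])
rs, p_value = spearmanr(x, y)   # ranks both series internally; equals (9.5) when there are no ties
print(round(rs, 4))             # about 0.1667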

Assignment 9.1
The table below shows 14 paired observations on X and Y

X 21 25 26 24 22 30 19 24 28 32 31 29 21 18
Y 19 20 24 21 21 24 18 22 19 30 27 26 19 18

(i) Calculate r. (ii) Test the significance of r at the 5% level of significance.


SOLUTION
Assignment 9.2
The table below shows 12 paired observations on X and Y.

X: 11  15  27  20  22  35  19  24  28  32  31  29
Y: 13  10  20  31  21  17  28  22  19  30  27  36

(i) Calculate r.
SOLUTION
9.5 Meaning of Regression Analysis
Regression is an important tool applied by researchers in order to understand the relationship between two
or more variables. It describes in mathematical form the relationship between variables in a given study.
In other words, regression analysis presents an equation for estimating the amount of change in the value
of one variable associated with a unit change in the value of another variable. In expressing any
relationship in mathematical form, two types of variables can be identified, namely, the independent
variable (also called the explanatory variables, the input variable or factor, etc) and the dependent variable
(output variable, the response variable, etc) which depends on the independent or explanatory variables.
For instance, consider the statistical model Y = β0 + β1X + e. Y is referred to as the dependent variable, X is the explanatory or independent variable, β0 and β1 are the model coefficients, and the variable e is called the error term or disturbance in the relationship; it represents factors other than X that have an influence on Y. Specifically, for a linear regression, β0 and β1 represent the intercept and the slope, respectively. It is implicitly assumed that the explanatory variable X has a causal effect on the dependent variable Y, and the coefficient β1 measures this causal effect or influence of X on Y.

9.6 Types of Regression Analysis


Regression analysis could be simple or multiple.
9.6.1 Simple Regression
This is a regression analysis which describes the relationship between two variables. It is also called the
two-variable linear regression model or bivariate linear regression model because it relates to just two
variables say X and Y.

9.6.2 Multiple Regression analysis


This is an extension of simple regression analysis. It is a regression analysis that involves three or more variables. Hence, any model with a minimum of two independent variables requires the multiple regression technique for its analysis. For instance, Y = β0 + β1X1 + β2X2 + e is a statistical model that involves three variables, namely one dependent variable (Y) and two independent variables (X1 and X2).
The focus of this chapter is on simple linear regression, where the model is linear in the coefficients β0 and β1.

9.7 Least Squares Method for Estimating the Parameters of Regression Models
To estimate the magnitude of the parameters of a regression model or equation, there are several
techniques that can be used including the Ordinary Least Squares (OLS) method, the matrix method,
maximum likelihood method, etc.
There are distinctive properties associated with OLS method and these include:
a. the parameter estimates obtained by OLS method have some optimal properties like
unbiasedness, least variance, efficiency, best-linear unbiasedness (BLU), least mean square-
error (MSE) and sufficiency;
b. its computational procedure is fairly simple as compared with other econometrics techniques
and data requirement are not excessive;
c. the mechanics of OLS are simple to understand.
It is important to note that any estimation procedure using the OLS method is based on certain assumptions. It is on these assumptions that the parameter estimates of any regression model could be accepted as having a dependable prediction power. Some of these assumptions include the fact that the error term is normally distributed with a mean of zero and constant variance.
Recall that the statistical model for the linear regression line of Y on X for the population is given as:

Y = β0 + β1X + e, (9.9)

Suppose the regression line given by (9.9) is to be fitted on the basis of n pairs of sample observations (x1, y1), (x2, y2), …, (xn, yn). Each pair (xᵢ, yᵢ) for i = 1, 2, …, n will satisfy the regression line (9.9). Therefore,

yᵢ = β0 + β1xᵢ + eᵢ, (9.10)

eᵢ = yᵢ − (β0 + β1xᵢ), (9.11)

Σeᵢ² = Σ(yᵢ − β0 − β1xᵢ)². (9.12)

To get the least squares estimates of β0 and β1 such that the sum of squared errors is minimized, we differentiate (9.12) partially with respect to β0 and β1, respectively, to get two equations called the normal equations. We also replace β0 and β1 with their estimated values, say a and b, respectively:

∂(Σeᵢ²)/∂a = −2Σ(yᵢ − a − bxᵢ) = 0, (9.13)

∂(Σeᵢ²)/∂b = −2Σxᵢ(yᵢ − a − bxᵢ) = 0. (9.14)

From (9.13), we have:

Σ(yᵢ − a − bxᵢ) = 0, (9.15)

Σyᵢ − Σa − bΣxᵢ = 0. (9.16)

Since the summation is from 1 to n, we get:

Σyᵢ = na + bΣxᵢ, (9.17)

a = (Σyᵢ − bΣxᵢ)/n. (9.18)

Therefore,

a = ȳ − b x̄. (9.20)

From (9.14), we have:

Σxᵢ(yᵢ − a − bxᵢ) = 0, (9.21)

Σxᵢyᵢ = aΣxᵢ + bΣxᵢ². (9.22)

Substituting the value of a from (9.20) into (9.22), we get:

Σxᵢyᵢ = [(Σyᵢ − bΣxᵢ)/n]Σxᵢ + bΣxᵢ², (9.23)

Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)/n = b[Σxᵢ² − (Σxᵢ)²/n]. (9.24)

Therefore,

b = [Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)/n] / [Σxᵢ² − (Σxᵢ)²/n], (9.25)

b = [nΣxᵢyᵢ − (Σxᵢ)(Σyᵢ)] / [nΣxᵢ² − (Σxᵢ)²]. (9.26)

Equation (9.25) can also be written as:

b = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)². (9.27)

Therefore, the estimated or fitted line of regression of Y on X is given as:

ŷ = a + bx, where a = ȳ − b x̄ and b = Σ(xᵢ − x̄)(yᵢ − ȳ)/Σ(xᵢ − x̄)², (9.28)

for i = 1, 2, …, n.
9.8 Causes of Deviation of the Fitted Value from the Observed Value
In reality, the fitted values ŷᵢ from (9.28) will always show deviations from the observed values yᵢ, for reasons that are briefly explained below.
1. Omission of relevant variables from the model
It is always difficult to include all the explanatory variables that affect or influence the response variable
in a single model. This is because of the complexity of the real-life situations as well as the need to keep
the model as simple as possible. Thus, several explanatory variables that affect a given phenomenon in
one way or the other may not be recognized and included in the model.
2. Error of specification
The deviation of a fitted value from the observed value could also occur due to imperfect specification of
a relationship. Quite often, a non-linear relationship is represented in a linear form. Again, some
phenomena need to be studied using several equations solved simultaneously. If these phenomena are
studied with a single model, error of specification is bound to occur.
3. Error of aggregation
In collecting data for statistical analysis, it is often the practice to add data from different groups with
dissimilar characteristics. For instance, in the studies that involves social behavior, since the attitudes of
an individual may differ from those of any group, lumping their data as a unit for analysis could bring
deviations of the fitted value of the response variable from the observed value.
4. Error of measurement
This error arises due to the method of data collection and processing. In data collection, a wrong sampling technique could cause an error in measurement. Equally, the use of an inappropriate statistical process in processing statistical information could cause deviations of observations from the fitted line.
5. Inclusion of irrelevant variables without theoretical underpinning
Differences or discrepancies between the value of the fitted response and that of the raw or observed value could arise in situations where irrelevant variables are included in the model.

Example 9.3 The table below gives the paired values of variables X and Y
Table 9.5: Data for Example 9.3

X 16 22 28 24 29 25 16 23 24

Y 35 42 57 40 54 51 34 47 45

(1). Find the regression line of Y on X. (2) Predict the value of y when x =31
Solution:
In this example, we wish to make use of equation (9.28).
The values of the terms needed to compute the estimates a and b from equation (9.28) are obtained in the table below:

Table 9.6: Computation of data for Example 9.3

S/N     x     y     (x − x̄)   (y − ȳ)   (x − x̄)²   (x − x̄)(y − ȳ)
1       16    35    -7         -10        49          70
2       22    42    -1         -3         1           3
3       28    57    5          12         25          60
4       24    40    1          -5         1           -5
5       29    54    6          9          36          54
6       25    51    2          6          4           12
7       16    34    -7         -11        49          77
8       23    47    0          2          0           0
9       24    45    1          0          1           0
Total   207   405   0          0          166         271

n = 9; Σx = 207, x̄ = 207/9 = 23; Σy = 405, ȳ = 405/9 = 45.

(1) b = Σ(x − x̄)(y − ȳ)/Σ(x − x̄)² = 271/166 = 1.6325,

a = ȳ − b x̄ = 45 − 1.6325 × 23 = 7.45.

Therefore ŷ = 7.45 + 1.6325x.

(2) To predict the value of y when x = 31:

ŷ = 7.45 + 1.6325(31) = 58.06.
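A minimal Python check of the fit (numpy only):

import numpy as np

x = np.array([16, 22, 28, 24, 29, 25, 16, 23, 24], dtype=float)
y = np.array([35, 42, 57, 40, 54, 51, 34, 47, 45], dtype=float)
b = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean())**2).sum()  # equation (9.27)
a = y.mean() - b * x.mean()                                              # equation (9.20)
print(round(a, 2), round(b, 4), round(a + b * 31, 2))   # about 7.45, 1.6325 and 58.06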
9.9 Standard Errors of the Estimated Parameters for Simple Linear Regression
The standard errors of the estimated parameters are given as follows.
For parameter a, we have:

SE(a) = se √(1/n + x̄²/Σ(x − x̄)²), (9.29)

For parameter b, we have:

SE(b) = se/√Σ(x − x̄)², (9.30)

where the standard error of the residuals, se, is given by:

se² = [Σ(y − ȳ)² − b Σ(x − x̄)(y − ȳ)]/(n − 2). (9.31)

Assignment 9.3
(a) Obtain the parameters of the regression of Y on X using the data in the table below:

X: 25  21  19  18  20  29  32  26  42  30  22
Y: 19  17  15  11  15  20  24  19  30  20  15

(b) Predict the value of y when x is 35. (c) Obtain (i) se (ii) SE(a) (iii) SE(b).

SOLUTION
9.10 Hypothesis Testing of the Parameters for Simple Linear Regression
A hypothesis is an assertion or conjecture about any chosen parameter (e.g. mean) of the population.
In case we are considering more than one population, a hypothesis may be about the relationship
between the similar parameters of the distributions.
A hypothesis can take two forms, namely, null and alternative hypotheses.
i. Null hypothesis: A null hypothesis is the hypothesis which is to be actually tested for
acceptance or rejection. It is denoted as .
ii. Alternative hypothesis: An alternative hypothesis is a statement about the population
parameter(s) which gives an alternative to the null hypothesis, within the range of pertinent values of
the parameter. It is denoted by or .

In this Section, the three common tests shall be considered. These are:
i. the standard error test
ii. the t-test and
iii. the F-test.
During hypothesis testing, we are faced with the task of finding out whether the parameter estimates
are statistically significant or not. Hence it entails determining whether the independent variable(s)
significantly affect the dependent variable or not.

9.10.1 Standard Error Test


In this test, a comparison is made between the parameter estimate and its standard error without
reference to degree of freedom. Based on the result of the comparison, a decision is taken on the
significance of the estimate.
Decision Rule

(a) If the standard error SE(θ̂) is greater than half of the estimate, that is, if SE(θ̂) > θ̂/2, where θ̂ is either a or b, we accept the null hypothesis and conclude that θ̂ is not statistically significant.
(b) If SE(θ̂) < θ̂/2, we reject the null hypothesis and conclude that θ̂ is statistically significant.

9.10.2 The t-test


This is a test for testing the hypothesis Ho: β = 0 vs H1: β ≠ 0.

The t-test is usually used when the number of observations n < 30. Here, reference is made to the degrees of freedom at the chosen level of significance α. The calculated t-value is compared with the theoretical or table value t(α/2, n−2) at the particular level of significance with (n − 2) degrees of freedom.
The calculated t values are given by:

t(a) = a/SE(a), (9.32)

and

t(b) = b/SE(b), (9.33)

where SE(a) and SE(b) are obtained from equations (9.29) and (9.30), respectively.
Decision Rule
(1) If t(cal) ≥ t(α/2, n−2), we reject Ho and conclude that the estimate is statistically significant, that is, the parameter is different from zero.
(2) If t(cal) < t(α/2, n−2), we accept the null hypothesis (Ho) and conclude that the estimate is not statistically significant, that is, the parameter is not different from zero.

Accepting Ho for parameter b means that the regression coefficient of Y on X has no practical significance, that is, the change in Y corresponding to a unit change in X is practically meaningless.
For parameter a, accepting Ho means that the regression line passes through the origin, that is, the regression line does not cut the Y axis.

9.10.3 The F-test


The hypothesis Ho: β = 0 vs H1: β ≠ 0
can also be tested by the F-test using the analysis of variance technique. For the F-test, the ANOVA table is as given below:

Table 9.7: ANOVA table for the F-test

Source of Deviation     DF       Sum of Squares (SS)                      Mean Square (MS)     F-value
Due to Regression       1        SSR = b Σ(x − x̄)(y − ȳ)                 MSR = SSR/1          F = MSR/MSE
Dev. from Regression    n − 2    SSE = Σ(y − ȳ)² − b Σ(x − x̄)(y − ȳ)    MSE = SSE/(n − 2)
Total                   n − 1    SST = Σ(y − ȳ)²

If Fcal ≥ F(α; 1, n − 2), reject Ho; otherwise accept Ho. The physical interpretation of the rejection or acceptance of Ho remains the same as given under the t-test for b.

9.10.4 Confidence Limits for Regression Parameters

The (1 − α)100% confidence limits (CL) for a and b are given as follows:

CL = a ± t(α/2, n−2) SE(a), (9.34)

CL = b ± t(α/2, n−2) SE(b), (9.35)

where t(α/2, n−2) is the table value for a two-tailed t-test at the α level of significance with (n − 2) degrees of freedom.

Example 9.4: Given the table below:


Table 9.8: Data for Example 9.4

S/N X Y

1 36.6 54.8

2 39.5 57.6

3 43.4 58.1

4 47.6 63.4

5 53.4 72.5

6 58.5 78.4

7 66.1 82.7

8 74.9 84.4

9 87.1 90.3

10 100.0 100.0

11 115.1 109.2

12 131.7 119.8

13 150.0 129.7

14 162.6 140.8
15 176.3 153.8

16 190.4 152.6

17 209.4 153.2

18 233.6 163.0

19 255.7 175.3

20 271.4 184.3

(a) Determine the regression line of Y on X using equation (9.28).
(b) Calculate (i) se (ii) SE(a) (iii) SE(b).
(c) Determine the significance of a and b using the t-test at the 5% level of significance.
(d) Determine the 95% confidence intervals for a and b.
Solution
Σx = 2503.3, Σy = 2223.5, n = 20,
x̄ = 2503.3/20 = 125.165, ȳ = 2223.5/20 = 111.175.

Table 9.9: Data for computing values for Example 9.4

S/N     X       Y       (x − x̄)   (y − ȳ)   (x − x̄)²   (y − ȳ)²   (x − x̄)(y − ȳ)
1       36.6    54.8    -88.565    -56.375    7844        3178.1      4993
2       39.5    57.6    -85.665    -53.575    7338        2870.3      4590
3       43.4    58.1    -81.765    -53.075    6686        2817.0      4340
4       47.6    63.4    -77.565    -47.775    6016        2282.5      3706
5       53.4    72.5    -71.765    -38.675    5150        1495.8      2776
6       58.5    78.4    -66.665    -32.775    4444        1074.2      2185
7       66.1    82.7    -59.065    -28.475    3489        810.8       1682
8       74.9    84.4    -50.265    -26.775    2527        716.9       1346
9       87.1    90.3    -38.065    -20.875    1449        435.8       795
10      100.0   100.0   -25.165    -11.175    633         124.9       281
11      115.1   109.2   -10.065    -1.975     101         3.9         20
12      131.7   119.8   6.535      8.625      43          74.4        56
13      150.0   129.7   24.835     18.525     617         343.2       460
14      162.6   140.8   37.435     29.625     1401        877.6       1109
15      176.3   153.8   51.135     42.625     2615        1816.9      2180
16      190.4   152.6   65.235     41.425     4256        1716.0      2702
17      209.4   153.2   84.235     42.025     7096        1766.1      3540
18      233.6   163.0   108.435    51.825     11758       2685.8      5620
19      255.7   175.3   130.535    64.125     17039       4112.0      8371
20      271.4   184.3   146.235    73.125     21385       5347.3      10693
Total   2503.3  2223.5  0.000      0.000      111890      34549       61443

(a) b = Σ(x − x̄)(y − ȳ)/Σ(x − x̄)² = 61443/111890 = 0.5491,

a = ȳ − b x̄ = 111.175 − 0.5491 × 125.165 = 42.45.

Therefore ŷ = 42.45 + 0.5491x.

(b) The standard error of the residuals is given by

se² = [Σ(y − ȳ)² − b Σ(x − x̄)(y − ȳ)]/(n − 2),

(i) se² = [34549 − 0.5491(61443)]/18 = 45.03, so se = 6.71.

(ii) SE(a) = se √(1/n + x̄²/Σ(x − x̄)²)
          = 6.71 √(1/20 + (125.165)²/111890) = 6.71 √0.1900 = 2.93.

(iii) SE(b) = se/√Σ(x − x̄)² = 6.71/√111890 = 0.0201.

(c) To determine the significance of a and b using the t-test at the 5% level of significance, we proceed as follows.
For the parameter estimate a, we have:

t(a) = a/SE(a) = 42.45/2.93 = 14.5.

The calculated value of t is greater than the tabulated value of t at the 5% level of significance with 18 degrees of freedom, which is given as t(0.025, 18) = 2.101.
Hence, we reject Ho, meaning that a is significant. The practical implication of this conclusion is that the regression line does not pass through the origin.
For the parameter estimate b, we have:

t(b) = b/SE(b) = 0.5491/0.0201 = 27.3.

The calculated value of t(b) is greater than the tabulated value of t at the 5% level of significance with 18 degrees of freedom, which is given as t(0.025, 18) = 2.101.
Hence, we reject Ho, meaning that b is significant. The practical implication of this conclusion is that X plays a significant role in determining Y.

(d) To determine the 95% confidence intervals for a and b, we proceed as follows.
For a, the 95% confidence limits are given as:

CL = a ± t(0.025, 18) SE(a) = 42.45 ± 2.101 × 2.93 = 42.45 ± 6.16.

This implies that the upper limit is 48.61 and the lower limit is 36.29.
For b, the 95% confidence limits are given as:

CL = b ± t(0.025, 18) SE(b) = 0.5491 ± 2.101 × 0.0201 = 0.5491 ± 0.0422.

This implies that the upper limit is 0.5913 and the lower limit is 0.5069.
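All of the above quantities can be obtained in a few lines of Python (numpy and scipy assumed). Working from the raw data rather than the rounded table entries gives values very close to the hand computation:

import numpy as np
from scipy import stats

x = np.array([36.6, 39.5, 43.4, 47.6, 53.4, 58.5, 66.1, 74.9, 87.1, 100.0,
              115.1, 131.7, 150.0, 162.6, 176.3, 190.4, 209.4, 233.6, 255.7, 271.4])
y = np.array([54.8, 57.6, 58.1, 63.4, 72.5, 78.4, 82.7, 84.4, 90.3, 100.0,
              109.2, 119.8, 129.7, 140.8, 153.8, 152.6, 153.2, 163.0, 175.3, 184.3])
n = x.size
Sxx = ((x - x.mean())**2).sum()
Sxy = ((x - x.mean()) * (y - y.mean())).sum()
Syy = ((y - y.mean())**2).sum()
b = Sxy / Sxx                                   # slope, equation (9.27)
a = y.mean() - b * x.mean()                     # intercept, equation (9.20)
se = np.sqrt((Syy - b * Sxy) / (n - 2))         # residual standard error, (9.31)
se_a = se * np.sqrt(1 / n + x.mean()**2 / Sxx)  # SE(a), (9.29)
se_b = se / np.sqrt(Sxx)                        # SE(b), (9.30)
t_tab = stats.t.ppf(0.975, df=n - 2)            # about 2.101
print(round(a, 2), round(b, 4))
print(round(a / se_a, 1), round(b / se_b, 1))   # t values for (9.32) and (9.33)
print((round(a - t_tab * se_a, 2), round(a + t_tab * se_a, 2)))   # 95% CL for a, (9.34)
print((round(b - t_tab * se_b, 4), round(b + t_tab * se_b, 4)))   # 95% CL for b, (9.35)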

9.11 Curvilinear Regression
When we have nonlinear relations, we often assume an intrinsically linear model and then fit the data to the model using polynomial regression. That is, we employ models that use regression to fit curves instead of straight lines. The technique is known as curvilinear regression analysis. To use curvilinear regression analysis, we test several polynomial regression equations. Polynomial equations are formed by taking our independent variable to successive powers. For example, we could have

Y = β0 + β1X + β2X², (Quadratic Model)

Y = β0 + β1X + β2X² + β3X³, (Cubic Model)

In general, a polynomial equation is referred to by its degree, which is the number of the largest exponent. In the above, the quadratic is of the second degree, the cubic is of the third degree, and so on.
The function of the power terms is to introduce bends into the regression line. With simple linear regression, the regression line is straight. With the addition of the quadratic term, we can introduce or model one bend. With the addition of the cubic term, we can model two bends, and so forth. Fig. 9.6 shows an example of a quadratic regression curve plotted for values of x from 0 to 5.

Fig. 9.6: Graph of a quadratic model

The plot of a cubic model is shown in Fig. 9.7.

Fig. 9.7: Graph of a cubic model

Notice that there is a single bend in the quadratic curve as against two bends for the cubic curve; this is because of the effect of the X² term in the quadratic model and the X² and X³ terms in the cubic model.
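Fitting such polynomials is routine with numpy's polyfit. The following sketch (Python, with hypothetical data generated around a quadratic trend) fits a second-degree model:

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 5, 30)
y = 1.0 + 0.5 * x + 2.0 * x**2 + rng.normal(0, 1, x.size)  # hypothetical noisy quadratic data
coeffs = np.polyfit(x, y, deg=2)    # returns [b2, b1, b0], highest power first
fitted = np.polyval(coeffs, x)      # fitted curve; deg=3 would model a cubic instead
print(np.round(coeffs, 3))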

Assignment 9.4
Given the table below:

X: 19  11  10  14  22  9   13  23  11
Y: 10  12  8   15  20  7   10  17  12

(a) Determine the regression line of Y on X using equation (9.28).
(b) Calculate (i) se (ii) SE(a) (iii) SE(b).
(c) Determine the significance of a and b using the t-test at the 5% level of significance.
(d) Determine the 95% confidence intervals for a and b.

SOLUTION
Chapter Ten
Research is an important part of academic, scientific, social, medical, economic, industrial and management studies. The objectives of research include finding out the cause-and-effect relations among variables and establishing the form of the relationship between them. One of the means or tools for achieving these objectives is the design of experiments.

10.1 Introduction to Design of Experiments

Broadly speaking, a design of experiment is a plan to collect data from individuals or units which are subjected to certain conditions or treatments in such a way that the individuals or units are free from the influences or effects of all extraneous factors (nuisance variables) and have full freedom to show the effect of the conditions or treatments to which they are exposed. Analysis of data and interpretation of results are carried out to conclude a research design. We give an illustration of how these are done using a category of experimental design referred to as a completely randomized design (CRD) in the concluding section of the chapter.
We give brief discussions of some important terms associated with the design of experiments in the subsections that follow.

10.1.1 Data
Data are numerical values of a variable collected on individuals or units in the course of carrying out experiments.

10.1.2 Factors or Independent Variable


This is the variable that is manipulated by a researcher in order to elicit a reaction from the experimental units. An independent variable could be a condition or a substance that is capable of prompting a reaction (an effect) in the experimental units. Independent variable is used interchangeably with the term treatment. For instance, different doses of fertilizer may be applied to different plots, or students may be exposed to different teaching methods. The doses of fertilizer are called treatments, while the different teaching methods are called conditions.

10.1.3 Factor Levels


Factor levels are groups within each factor. They are sources of a particular factor. For example, in
the study of the effect of carbohydrate on the weight gain in birds, the factor levels are the different
sources of the factor (carbohydrate) under study. These sources may include wheat, dried cassava
peels, maize and others. Each of these sources is a factor level.

10.1.4 Treatment
A treatment is equivalent to a factor level in a single-factor analysis. However, in multi-factor
analysis, a treatment is equivalent to a combination of factor levels. For instance, in the study of the
effect of carbohydrate on the weight gain in birds, each of the factor levels (that is wheat, dried
cassava peels and maize) is a treatment under a single-factor analysis.
In considering treatment as a combination of factor levels, suppose we wish to study the effect of different breeds of chicken and different sources of carbohydrate on weight gain in chickens. We may decide to use three different breeds of chicken and three sources of carbohydrate, giving a total of nine (3 × 3 = 9) treatments. This is so because each source of carbohydrate combines with each breed of chicken to form a treatment. This combination of factor levels can be illustrated as follows:

Sources of carbohydrate (SC):
1. Dried Cassava peels (C)
2. Wheat (W)
3. Maize (M)

Breeds of Chicken (BC):


1. Plymouth (P)
2. Sicelica (S)
3. Borea (B)

Table 10.1: Treatment Combinations

BC \ SC    C     W     M
P          PC    PW    PM
S          SC    SW    SM
B          BC    BW    BM

The treatments in this experiment are: PC, PW, PM, SC, SW, SM, BC, BW and BM
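As a quick check of this enumeration, the full set of factor-level combinations can be generated programmatically. A minimal Python sketch, with labels taken from Table 10.1:

```python
from itertools import product

breeds = ["P", "S", "B"]   # breeds of chicken (BC)
carbs = ["C", "W", "M"]    # sources of carbohydrate (SC)

# Every breed pairs with every carbohydrate source: 3 x 3 = 9 treatments
treatments = [b + c for b, c in product(breeds, carbs)]
print(treatments)  # ['PC', 'PW', 'PM', 'SC', 'SW', 'SM', 'BC', 'BW', 'BM']
```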
10.1.5 Dependent Variable
A dependent variable is one which is used to measure the effects of the various treatments/conditions on the individuals or experimental units. Measurements of yields of crops in fertilizer experiments and the volume of milk produced by cows in feeding experiments are examples of dependent variables.

10.1.6 Nuisance Variables


Nuisance variables are extraneous variables which influence the dependent variables and cause undesired variation in the measurements taken on the dependent variables. For example, the inherent knowledge of a teacher in a teaching-method experiment affects the measurements taken on the dependent variable, which is the learning outcome of the students or learners. This is so because a teacher with a higher intelligence quotient will likely get better results (learning outcomes on the part of the students) than a teacher with a lower intelligence quotient. Nuisance variables interact with the dependent variables in ways that weaken and sometimes invalidate the results of the experiment. Hence, they should be controlled as much as possible or, if possible, completely eliminated.

10.1.7 Experimental Unit


This is an individual or group of individuals to which a single condition or treatment is applied independently of other individuals or groups. It is the unit whose response to the treatment is being examined or observed, and this observation is taken on each unit as a separate entity. For instance, a patient is the experimental unit in a drug experiment, while a plot is the experimental unit in a fertilizer experiment.

10.1.8 Experimental Error
These are errors caused by extraneous factors in an experiment which are beyond the control of, or not controlled by, the researcher. For example, if seeds are sown on two or more plots on the same day and are treated similarly in all respects, their yields will still not be the same. This difference in yields forms part of the experimental error for the particular study.

10.1.9 Control Experiments


A control experiment is an experiment in which all variable factors have been kept constant and
which is used as a standard of comparison to the experimental component in a controlled experiment.
It is also defined as an experiment designed to check the results of another experiment by removing
the variable or variables operating in that other experiment. The comparison obtained is an indication
or measurement of the effect of the variables concerned. Control experiments are used to minimize
the influence of the extraneous variables.

10.1.10 Randomization
This is the process by which each and every experimental unit has the same chance of being allocated to the treatments. It is best performed with the help of a random number table. Randomization eliminates the researcher's bias. It also ensures that no treatment is continually favoured or handicapped by extraneous sources of variation which are outside the control of the researcher. With randomization, statistical inferences drawn are based on the assumption that errors are independently and normally distributed.
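A minimal sketch of random allocation in Python is shown below; the unit count, treatment labels and replicate numbers are hypothetical.

```python
import random

# Hypothetical setup: 12 units, 4 treatments, 3 replicates each
units = list(range(1, 13))
treatments = ["T1", "T2", "T3", "T4"] * 3

random.shuffle(treatments)           # randomize the treatment order
allocation = dict(zip(units, treatments))
print(allocation)                    # e.g. {1: 'T2', 2: 'T4', ...}
```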

10.1.11 Replication
This is the repetition of each treatment on several experimental units. Replication provides an estimate of the experimental error variance, which is the basic requirement for testing the significance of treatment differences and for finding the estimate of the standard error of a treatment mean, given by $\sqrt{s^2/r}$, where $s^2$ is the error mean square and $r$ is the number of replications of the treatment.
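As a purely illustrative calculation (the figures are invented): if the error mean square from an analysis is $s^2 = 252$ and a treatment is replicated $r = 6$ times, the estimated standard error of that treatment mean is $\sqrt{252/6} = \sqrt{42} \approx 6.48$.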

10.1.12 Local Control


This is a technique for controlling a known extraneous variable. In field experiments, we know that adjacent plots are more alike in respect of fertility than those which lie far apart. So, plots of land with the same fertility are kept in one block. The purpose of local control is to reduce the error variance and make the test of significance more sensitive and powerful, thereby creating a more efficient design.

10.2. Validity of Experimentation


The aims of an experiment are two-fold: one, to draw valid inferences about the effects of the treatments (independent variables) on the individuals or groups (dependent variable) under study; and two, to generalize the results to a larger population of interest. There are several threats to validity, and some of them are briefly discussed below.
(i) History: In many experimental designs, a measurement (say m1) on a criterion variable is taken on the test units prior to the manipulation (treatment) X. When the manipulation is over, an after-measurement (say m2) is taken on the same criterion variable. The difference (m2 − m1) measures the effect of X on the variable. But there is a possibility that, in the meantime between m1 and m2, many events besides X might have occurred which could have given rise to m2. Such events are called history. They are threats to the validity of the experiment because the inferences drawn about (m2 − m1) regarding the outcome of the study may actually be incorrect and should be taken with a pinch of salt.

(ii) Testing effects: The effects caused by taking measurements on the dependent variable before and after the application of the treatment on the test units are called testing effects, which are further classified into two types, namely, main testing effects (MT) and interactive testing effects (IT). The effect that occurs when a prior observation (m1) affects a later observation (m2) is the main testing effect. An effect in which a prior measurement (m1) has an influence on the independent variable, and hence affects the test-unit response, is called the interactive testing effect.
(iii) Instrumentation (I): Any changes made in the calibration of the measuring instrument during
interviews cause threats to validity of the experiment.
(iv) Selection Bias: The differential selection of subjects (individuals or groups of individuals) into experimental and control groups is a major threat to internal validity. Selection bias is almost eliminated if the subjects are assigned to control and experimental groups randomly, or by matching the members of both groups on key factors.
(v) Experimental mortality: This term refers both to the death of experimental units and to changes in the composition of the study groups during experimentation. Such changes often occur because of deaths of subjects or because some subjects refuse to continue in the experiment. This alters the distribution of the subjects, and one cannot ensure that the units lost would have responded to the treatments in the same manner as the remaining ones.
(vi) Resentful Demoralization: If the subjects come to know that the treatment levels assigned to them are inferior in terms of benefits, goods or services, they may feel demoralized and hence show their resentment. They are then very likely to perform very poorly. This creates a larger difference than the real one between desirable and undesirable treatment levels. Confidentiality of treatment levels should be maintained to reduce this type of threat to the validity of the experiment.
(vii) Reactive Factors: If the subjects know that they are being observed, they will act or respond
differently than those who are not being observed. So, the results from the experiment cannot be
generalized.
(viii) Low statistical power: If the power of a test is low, then a researcher will very likely fail to reject a false null hypothesis. The reason for this could be an inadequate sample size, an inability to control nuisance variables, or the use of an inappropriate test.
(ix) Violation of Assumptions: All tests are based on certain assumptions. If such assumptions are violated, then there is every likelihood that incorrect inferences will be drawn.
(x) Reliability of Measures: If the reliability of measurements on the dependent variable is low, then there is bound to be an increase in error variance, which may lead to the acceptance of a false null hypothesis.

10.3. Statistical Designs


In the category of true designs, there are a number of statistical designs that allow the control of extraneous variables to a large degree or extent. Such designs are randomized and enjoy some advantages, enumerated below:
(i) The effects of more than one treatment or condition can be estimated and compared
from one experiment.
(ii) Interaction effects among the various treatments/conditions on the subjects can be
measured.
(iii) Many extraneous factors can be statistically controlled.
(iv) The designs are generally economical and convenient.

(v) Each design has a fixed procedure of analysis; hence, the chances of researcher's bias are
negligible.
(vi) Threats to the validity of the experiment are far fewer compared to other types of designs.

10.4 Requirements for a good Experimental Design


The following are requirements of a good design:
(i) A good design should be free of researcher's bias.
(ii) It should involve the use of appropriate statistical tests in analyzing the data.
(iii) It must make use of randomization and replication.
(iv) All extraneous factors should be controlled through local control as much as possible.

10.5 Analysis of Variance (ANOVA)


Analysis of variance (ANOVA) is a powerful tool for data analysis when a number of populations are
under study and from each population a random sample or a group of units is selected.
ANOVA is used to test simultaneously the equality of two or more population means when there is
one parametric dependent variable and one or more independent variables. For instance, through
ANOVA, one may compare the average yield of several varieties of a crop.
It is a statistical technique used for decomposing the variation in an observed data into its different
sources. In other words, it is a technique employed to break down the total variation in an experiment
to its additive components. With this technique therefore, we can break down the total variation,
occurring in a dependent variable, into various separate factors causing the variation.
It is important to note that ANOVA does not stipulate any functional relationship between the dependent and the independent variables, but it enables us to break down the total variation into the different sources or causes of the variation.

The assumptions underlying the application of ANOVA are as follows:


(i) The observations follow a normal distribution.
(ii) Experimental units are assigned to treatments at random (and vice versa).
(iii) An appropriate statistical model is adopted for the experimental design.
(iv) All the components involved in the model are additive and independent.
(v) Experimental errors are independently and normally distributed with mean 0 and a
constant variance.

10.6 Types of Analysis of Variance (ANOVA)


Before applying ANOVA, we first consider the number of factors to be studied. Based on this,
ANOVA has two basic classifications or types, which are the one-way or one-factor ANOVA and
two-way or two-factor ANOVA.

10.6.1 One- Factor ANOVA


In one-factor analysis of variance, there are only two variables: one dependent variable and one
independent variable. It is used to find out the effect of the independent variable on the dependent
variable. For instance, consider a study of maize yield using different sources of sulphur, where we are interested in the effect of sulphur on the yield of maize. Sulphur is the independent variable (factor), while the yield of maize depends on it. Based on this, we adopt the one-factor analysis of variance to determine whether the variation in maize yield is due to the application of sulphur (treatment) or due to chance.
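For a quick one-factor analysis in software, scipy provides an F-test for equality of group means. A minimal sketch, with invented maize-yield figures for three sulphur sources:

```python
from scipy.stats import f_oneway

# Invented yields (t/ha) under three sources of sulphur
source_a = [4.1, 4.5, 3.9, 4.3]
source_b = [5.0, 4.8, 5.2, 4.9]
source_c = [4.4, 4.6, 4.2, 4.5]

f_stat, p_value = f_oneway(source_a, source_b, source_c)
print(f_stat, p_value)   # reject H0 of equal means if p_value < 0.05
```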

10.6.2 Two-Factor ANOVA

The two-factor classification involves more than two variables: it is made up of one dependent variable and two independent variables (factors). It is applied to determine the effects of the two independent variables (factors) on the dependent variable. This technique, therefore, enables us to estimate not only the separate effects of the factors (independent variables) but also their joint (interaction) effect on the dependent variable.

Consider the study of the effects of fertilizer and soil type on the yield of rice. There are two
independent variables or factors (fertilizer and soil type) and one dependent variable (rice yield).
Therefore, the effects of these two factors (fertilizer and soil type) can be analyzed using the two
factor analysis of variance technique.
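A minimal sketch of such a two-factor analysis using the statsmodels package is given below; the rice-yield figures and factor levels are invented for illustration. The column is named `yield_` because `yield` is a Python keyword.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Invented data: 2 fertilizers x 2 soil types, 2 replicates per cell
df = pd.DataFrame({
    "fertilizer": ["F1", "F1", "F1", "F1", "F2", "F2", "F2", "F2"],
    "soil":       ["S1", "S1", "S2", "S2", "S1", "S1", "S2", "S2"],
    "yield_":     [5.1, 5.3, 4.2, 4.0, 6.0, 6.2, 4.8, 5.0],
})

# Main effects of fertilizer and soil plus their interaction
model = ols("yield_ ~ C(fertilizer) * C(soil)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```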

10.7 ANOVA Table


An ANOVA table is a table that shows, in summarized form, the computations for the analysis of variance. This table enables us to have a quick and convenient assessment of the sources of variation, their respective degrees of freedom (df), Sums of Squares (SS), Mean Sums of Squares (MS) and the F-value.
The skeleton of the ANOVA table is given below.

Table 10.2: ANOVA Table

Source of variation    Degrees of freedom (df)    Sum of Squares (SS)    Mean Sum of Squares (MS)    F-value
Due to A
Due to B
Error

In Table 10.2, the error degrees of freedom are obtained by subtracting the component df from the total df. Similarly, the error SS is obtained by subtracting the component SS from the total SS, whereas the total SS is calculated by taking the total of the squares of each individual value and subtracting from it a factor known as the correction for mean or correction factor (CF). Hence, the sums of squares for testing the equality of population means are as follows:
Suppose we have observations $x_{ij}$ in $k$ random samples of sizes $n_1, n_2, \ldots, n_k$ from $k$ normal populations $N(\mu_i, \sigma^2)$ for $i = 1, 2, \ldots, k$, and let $T_i$ denote the total of the $i$th sample and $G$ the grand total of all observations. Then

$CF = \dfrac{\left(\sum_i \sum_j x_{ij}\right)^2}{n} = \dfrac{G^2}{n}$,   (10.1)

$n = \sum_{i=1}^{k} n_i$,   (10.2)

Total SS $= \sum_i \sum_j x_{ij}^2 - CF$,   (10.3)

Between samples SS $= \sum_{i=1}^{k} \dfrac{T_i^2}{n_i} - CF$,   (10.4)

Error SS $= \sum_i \sum_j x_{ij}^2 - \sum_{i=1}^{k} \dfrac{T_i^2}{n_i} = $ Total SS $-$ Between samples SS,   (10.5)

$F = \dfrac{\text{Between samples SS}/(k-1)}{\text{Error SS}/(n-k)}$.   (10.6)

ANOVA table with full details is presented in the table below.

Table 10.3: ANOVA Table with full Details


Source of variation       df       SS                                  MS                     F-value
Between samples           k − 1    SS_B = Σ T_i²/n_i − CF              MS_B = SS_B/(k − 1)    F = MS_B/MS_E
Within samples (Error)    n − k    SS_E = Σ Σ x_ij² − Σ T_i²/n_i       MS_E = SS_E/(n − k)
Total                     n − 1    Σ Σ x_ij² − CF

In ANOVA, an experimenter tests the null hypothesis of equality of the $k$ treatment mean effects. That is, $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ vs $H_1$: at least two of them are not equal, i.e., $\mu_i \neq \mu_j$ for some $i \neq j$.

The test employed for $H_0$ is the F-test. If the calculated F is not significant, then it is inferred that all treatments are equally effective and no further test is required. But if the calculated F for treatments is significant, then $H_0$ is rejected, implying that not all the treatment mean effects are equal. Some of them may be effective and some not. So there is a need to perform a test of the significance of all pairs of treatment means. One good example of such a test is Dunn's Multiple Comparison Test.
The above ANOVA table is for a one-way classification. It may be extended to two- or more-way classifications. In that situation, the component factors in the ANOVA table will increase accordingly.
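The sums of squares in Eqs. (10.1) to (10.6) translate directly into code. The following is a minimal Python sketch of the one-way computation; the function name and toy data are our own.

```python
def one_way_anova(samples):
    """Compute one-way ANOVA quantities following Eqs. (10.1)-(10.6).

    `samples` is a list of lists, one inner list per treatment group.
    """
    n = sum(len(s) for s in samples)            # total observations
    grand_total = sum(sum(s) for s in samples)  # G
    cf = grand_total ** 2 / n                   # correction factor (10.1)

    total_ss = sum(x * x for s in samples for x in s) - cf        # (10.3)
    between_ss = sum(sum(s) ** 2 / len(s) for s in samples) - cf  # (10.4)
    error_ss = total_ss - between_ss                              # (10.5)

    k = len(samples)
    f_value = (between_ss / (k - 1)) / (error_ss / (n - k))       # (10.6)
    return total_ss, between_ss, error_ss, f_value

# Toy data: three treatment groups
print(one_way_anova([[5, 6, 7], [8, 9, 10], [4, 5, 6]]))
```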
Example 10.1: The table below shows the gain in body weight (kg) per cow during four grazing treatments. Test the hypothesis, at the 5% level of significance, that the mean gains in weight of cows under the four treatments are equal.

Table 10.4: Data for Example 10.1

                 Gain in body weight (kg)
Cow No.    T1       T2       T3       T4
1          67.3     74.2     63.1     48.7
2          36.9     42.2     32.9     49.0
3          63.2     58.6     59.2     52.0
4          26.8     36.6     42.4     38.8
5          54.8     54.6     34.0     48.2
6          64.2     81.8     65.6
7          81.4
Total      394.6    348.0    297.2    245.7    1286.5

Solution
We can test the hypothesis by the F-test using ANOVA.
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$ vs $H_1$: at least two means are different.
For the given data,
G = 1286.5; $n_1 = 7$, $n_2 = 6$, $n_3 = 6$, $n_4 = 5$, so $n = 24$;

$\sum_i \sum_j x_{ij}^2 = 74357.57$,

$CF = \dfrac{G^2}{n} = \dfrac{(1286.5)^2}{24} = 68961.76$,

Total SS = 74357.57 − 68961.76 = 5395.81.

$\sum_i \dfrac{T_i^2}{n_i} = \dfrac{(394.6)^2}{7} + \dfrac{(348.0)^2}{6} + \dfrac{(297.2)^2}{6} + \dfrac{(245.7)^2}{5}$.

Between treatment SS $= \sum_i \dfrac{T_i^2}{n_i} - CF = 359.89$.

Error SS = Total SS − Between SS = 5395.81 − 359.89 = 5035.92.

Table 10.5: ANOVA table for Example 10.1

Source        df    SS         MS        F-value
Treatments    3     359.89     119.96    119.96/251.80 = 0.476
Error         20    5035.92    251.80
Total         23    5395.81

The table value of $F_{0.05}(3, 20) = 3.10$.

Since the calculated value of F is less than its table value, $H_0$ is not rejected. We therefore conclude that, under the grazing treatments, the mean increase in weight of cows is not significantly different at the 5% level of significance.
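The tabulated critical value can also be obtained in software; a minimal sketch using scipy:

```python
from scipy.stats import f

# Upper 5% critical value of F with 3 and 20 degrees of freedom
f_crit = f.ppf(0.95, dfn=3, dfd=20)
print(round(f_crit, 2))   # approximately 3.10
```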

Assignment 10.1

The table below shows the gain in body weight (kg) per cow during four grazing treatments. Test the
hypothesis at 5% level of significance that the mean gains in weight of cow under four treatments are
equal.

Table 10.6: Data for Assignment 10.1

                 Gain in body weight (kg)
Cow No.    T1      T2      T3      T4
1          25.1    44.8    41.0    38.7
2          22.0    22.2    39.1    39.0
3          23.1    48.7    41.9    42.1
4          16.4    38.3    50.4    28.6
5          24.0    44.6    44.0    38.9
6          31.8    41.8    45.6
7          21.9

SOLUTION

