STA-PROB PDF
FOR
PHYSICAL SCIENCE
AND
ENGINEERING
STA 213/MTH 214 STATISTICS FOR PHYSICAL SCIENCE AND ENGINEERING: 3 Units
Course Outline
Scope of statistical methods in physical sciences and engineering. Measures of location, partition and
dispersion. Elements of probability. Probability distributions: binomial, Poisson, geometric,
hypergeometric, negative-binomial, normal, Weibull, Gompertz, etc. Estimation (point and interval) and
tests of hypotheses concerning population means, proportions and variances. Regression and correlation.
Non-parametric tests. Contingency table analysis. Introduction to design of experiments. Analysis of
variance.
DEPARTMENT: ………………………………………………………………………………………………………………………………
COLLEGE: ……………………………………………………………………………………………………………………………………
SESSION: ………………………………………………………………………………………………………………………………………
TABLE OF CONTENTS
Chapter One
1.1 Introduction: Origin and Development of Statistics
1.2 Definition of Statistics
1.3 Importance and Scope of Statistical Methods in Physical Sciences and Engineering
1.4 Limitations of Statistics
1.5 Definition of Basic Terms in Statistics
Chapter Two
2.1 Average or Measures of Location or Central Tendency
2.2 Arithmetic Mean or Mean
2.3 Median
2.4 Mode
2.5 Empirical Relationship between Mean, Median and Mode
2.6 Geometric Mean
2.7 Harmonic Mean
2.8 Relation between Arithmetic Mean, Geometric Mean and Harmonic Mean
Chapter Three
3.1 Measure of Dispersion
3.2 Objectives and Significance of the Measures of Dispersion
3.3 Range
3.4 Mean Deviation or Average Deviation
3.5 Variance
3.6 Standard Deviation
3.7 Interpreting Standard Deviation
3.8 Empirical Rule
3.9 Chebyshev's Theorem
3.10 Z-scores or Standard Scores
3.11 Variance and Standard Deviation using Assumed Mean and Scaling Factor
3.12 Coefficient of Variation
3.13 Measures of Partition: Deciles, Percentiles, and Quartiles
3.14 Skewness and Kurtosis
Chapter Four
4.1 Introduction to Probability
4.2 Definitions
4.3 Properties of Probability
4.4 Some Probability Laws
4.5 Conditional Probability
4.6 Bayes' Theorem
4.7 Permutations and Combinations
Chapter Five
5.1 Discrete Probability Distributions
5.2 Mean or Expectation, and Variance of Discrete Probability Distributions
5.3 Bernoulli Distribution
5.4 Binomial Distribution
5.5 Poisson Distribution
5.6 Poisson Approximation to the Binomial
5.7 Geometric Distribution
5.8 Hypergeometric Distribution
5.9 Negative-Binomial Distribution
5.10 Moment Generating Functions
5.11 Probability Generating Functions
5.12 Sampling Distributions
5.13 The Central Limit Theorem
5.14 The Law of Large Numbers
Chapter Six
6.1 Introduction to Continuous Probability Distributions
6.2 Mean or Expectation, and Variance of Continuous Probability Distributions
6.3 Normal Distribution
6.4 Uniform Distribution
6.5 Exponential Distribution
6.6 Gamma Distribution
6.7 Weibull Distribution
6.8 Gompertz Distribution
6.9 Cauchy Distribution
6.10 Chi-Square Distribution
6.11 Beta Distribution
6.12 Quantile Functions, Moments and Order Statistics of Continuous Probability Distributions
6.13 Maximum Likelihood Estimation of Parameters of Continuous Probability Distributions
Chapter Seven
7.1 Introduction to Statistical Estimation
7.2 Statistical Point Estimation
7.3 Sampling Distribution of a Statistic
7.4 Statistical Interval Estimation
Chapter Eight
8.1 Introduction to Statistical Tests of Hypotheses: Simple and Alternative
8.2 Small Sample Tests for a Population Mean and the Difference of Two Population Means
8.3 Large Sample Tests for a Population Mean and the Difference of Two Population Means
8.4 Tests of Hypothesis for a Proportion (One Sample Case)
8.5 Test of Equality of Two Proportions (Two Sample Case)
8.6 Nonparametric Tests
8.7 Contingency Table for Independence of Factors
Chapter Nine
9.1 Introduction
9.2 Correlation
9.3 Types of Correlation
9.4 Methods of Computing the Correlation Coefficient
9.5 Meaning of Regression Analysis
9.6 Types of Regression Analysis
9.7 Least Squares Method for Estimating the Parameters of Regression Models
9.8 Causes of Deviation of the Fitted Value from the Observed Value
9.9 Standard Errors of the Estimated Parameters for Simple Linear Regression
9.10 Hypothesis Testing of the Parameters for Simple Linear Regression
9.11 Curvilinear Regression
Chapter Ten
10.1 Introduction to Design of Experiments
10.2 Validity of Experimentation
10.3 Statistical Designs
10.4 Requirements for a Good Experimental Design
10.5 Analysis of Variance (ANOVA)
10.6 Types of Analysis of Variance (ANOVA)
10.7 ANOVA Table
Chapter One
1.1 Introduction
The subject of Statistics is an old discipline, as old as human society itself. It has been used since the
existence of man on earth, though the sphere of its utility was at first very much restricted. The word
"Statistics" is derived from the Latin 'status', the Italian 'statista', the German 'statistik', or the French
'statistique', each meaning a political state. In ancient times, the scope of the subject was limited to the
collection of data by governments for framing military and fiscal policies, for example:
(i) the age- and sex-wise population of the country;
(ii) the property and wealth of the country.
However, Statistics has now been adopted in many fields, including academia, economics,
medical records, marketing, hydrology, philosophy, health and life sciences, psychology, sociology,
education, medicine, business and nursing, and in some cases it is regarded as a universal language of
the sciences. Thus, the understanding and careful use of statistical methods enables us to accurately
describe the outcomes or findings of scientific research, make qualitative and quantitative
decisions, and make useful estimations.
1.3 Importance and Scope of Statistical Methods in Physical Sciences and Engineering.
In ancient times, the word Statistics was regarded only as the science of statecraft, and was used to
collect information relating to crime, military strength, population, wealth, etc., for devising military and
fiscal policies. Today, its areas of importance include:
(i) Statistics in Planning
(ii) Statistics in state
(iii) Statistics in Mathematics
(iv) Statistics in Economics
(v) Statistics in Business and Management
(vi) Statistics in Accountancy and Auditing
(vii) Statistics in Industry
(viii) Statistics in Insurance
(ix) Statistics in Astronomy
(x) Statistics in Physical Sciences
(xi) Statistics in Social Sciences
(xii) Statistics in Biology and Medical Sciences
(xiii) Statistics in Psychology and Education
(xiv) Statistics in War
m. Element or Member: An element or member of a sample or population is a specific subject or
object (for example, a person, firm, item, state, or country) about which the information is
collected.
Assignment 1.1:
1. Define Statistics
2. Define and explain the following terms in your understanding:
a. Population
b. Sample
c. Statistics
d. Data
e. Variable
f. Experiment
SOLUTION
Chapter Two
2.1 Average or Measures of Location or Central Tendency.
In our daily endeavour, we summarize data to make decision in one way or the other. For example, in a
market survey, we report an average (mean) price of a particular commodity as a representative of the
price of the commodity in the market. Sometimes, we also use the most common price (mode) or the price
that tends to fall in the middle of all other prices (median) of a particular commodity. When we do all
these, we are making use of the knowledge of measures of location or central tendency.
2.2 Arithmetic Mean or Mean.
The arithmetic mean of a given set of observations is their sum divided by the number of observations.
Thus, it is given by
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
The above formula is used when the data are given in an array.
Example 2.1: Find the mean of 5, 8, 10, 15, 24 and 28.
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{5 + 8 + 10 + 15 + 24 + 28}{6} = \frac{90}{6} = 15$$
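The arithmetic mean can be computed directly. A minimal sketch using only the standard library (the function name is illustrative, not from the text):

```python
data = [5, 8, 10, 15, 24, 28]

def arithmetic_mean(values):
    """Sum of the observations divided by their count."""
    return sum(values) / len(values)

print(arithmetic_mean(data))  # reproduces Example 2.1: 90 / 6 = 15
```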
When the sampled data are from a frequency distribution in ungrouped form, we adopt
$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$$
For example, consider the frequency distribution
x 1 6 11 16 21
F 13 10 20 5 30

$$\sum_{i=1}^{n} f_i = 13 + 10 + 20 + 5 + 30 = 78$$
$$\sum_{i=1}^{n} f_i x_i = 13 + 60 + 220 + 80 + 630 = 1003$$
$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{1003}{78} \approx 12.86$$
Equivalently, using relative frequencies $p_i = f_i / n$:

S/N x f p = f/n x·p
1 1 13 13/78 = 0.17 1 × 0.17 = 0.17
2 6 10 10/78 = 0.13 6 × 0.13 = 0.78
3 11 20 20/78 = 0.26 11 × 0.26 = 2.86
4 16 5 5/78 = 0.06 16 × 0.06 = 0.96
5 21 30 30/78 = 0.38 21 × 0.38 = 7.98
Total 78 $\sum p_i = 1$ 12.75

$$\bar{x} = \sum_{i=1}^{n} x_i p_i = 0.17 + 0.78 + 2.86 + 0.96 + 7.98 = 12.75$$
(The slight difference from the exact mean 1003/78 ≈ 12.86 is due to rounding each $p_i$ to two decimal places.)
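Both routes to the frequency-distribution mean can be sketched in a few lines; the function names are illustrative. With exact arithmetic the two routes agree (the exact mean here is 1003/78 ≈ 12.86; rounding the $p_i$ values by hand gives 12.75):

```python
x = [1, 6, 11, 16, 21]
f = [13, 10, 20, 5, 30]

def freq_mean(x, f):
    # mean = (sum of f_i * x_i) / (sum of f_i)
    return sum(fi * xi for fi, xi in zip(f, x)) / sum(f)

def prob_mean(x, f):
    # equivalent route: mean = sum of x_i * p_i with p_i = f_i / n
    n = sum(f)
    return sum(xi * (fi / n) for fi, xi in zip(f, x))

m = freq_mean(x, f)  # 1003 / 78, approximately 12.86
```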
Class interval 1 – 5 6 – 10 11 – 15 16 – 20 21 – 25
F 13 10 20 5 30
(i) Class interval and class limits: A symbol defining a class such as 1 – 5 in the Table above
is called a class interval. The end numbers, 1 and 5, are called class limits; the smaller
number (1) is the lower class limit (LCL), and the larger number (5) is the upper class
limit (UCL).
(ii) Class boundaries: The gap between the upper class limit of one class and the lower class
limit of the next class is split equally between the two classes: half the gap (½) is
subtracted from each lower class limit and added to each upper class limit. Thus,
LCB = 1 – ½ = 0.5 and UCB = 5 + ½ = 5.5
Class Boundary 0.5 – 5.5 5.5 – 10.5 10.5 – 15.5 15.5 – 20.5 20.5 – 25.5
(iii) Width or Size of Class: This is difference between the upper class boundary and the
lower class boundary of any class. Class Size (C) = UCB – LCB. Example: 10.5 – 5.5 = 5
(iv) Class Mark (Class Mid-Point): This is given by ½(LCL + UCL). Example: ½(1+5) =
6/2= 3.
Class interval 1-5 6 - 10 11 - 15 16 - 20 21 - 25
Class Mid-Point 3 8 13 18 23
For grouped data, the mean is computed from the class mid-points:
$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$$
where $x_i$ is the mid-point of the $i$th class.
Example 2.4: Compute the mean given the following data set
Class interval 1-5 6 - 10 11 - 15 16 - 20 21 - 25
F 13 10 20 5 30
Solution
Class interval 1 – 5 6 – 10 11 – 15 16 – 20 21 – 25 Total
F 13 10 20 5 30 78
Class Mid-Point (x) 3 8 13 18 23
fx 39 80 260 90 690 1159

$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{1159}{78} \approx 14.86$$
Alternatively, the mean can be computed from an assumed mean A:
$$\bar{x} = A + \frac{\sum_{i=1}^{n} f_i d_i}{N}, \qquad d_i = x_i - A$$
For example, for a grouped data set with N = 63, assumed mean A = 35 and $\sum f_i d_i = -425$:
$$\bar{x} = 35 + \frac{-425}{63} = 35 - 6.75 \approx 28.25$$
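The assumed-mean shortcut can be sketched as follows. The class mid-points and frequencies below are an assumption for illustration (they match the grouped distribution used in later examples, for which N = 63 and the mean is 28.25):

```python
mid = [15, 20, 25, 30, 35, 40, 45, 50, 55]   # class mid-points (assumed data)
f   = [2, 22, 10, 14, 3, 4, 6, 1, 1]         # frequencies, N = 63
A = 35                                        # assumed mean (a convenient mid-point)

d = [x - A for x in mid]                      # deviations d_i = x_i - A
mean = A + sum(fi * di for fi, di in zip(f, d)) / sum(f)
print(round(mean, 2))                         # 28.25, as in the worked example
```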
2.3 Median.
The median is the value that divides a data set that has been ranked in increasing order in two equal
halves. If the data set has an odd number of values, the median is given by the value of the middle term in
the ranked data set. If the data set has an even number of values, the median is given by the average of the
two middle values in the ranked data set.
As is obvious from the definition of the median, it divides a ranked data set into two equal parts. The
calculation of the median consists of the following two steps:
a. Rank the given data set in increasing order.
b. Find the value that divides the ranked data set in two equal parts. This value gives the
Median.
Note that if the number of observations in a data set is odd, then the median is given by the value of the
middle term in the ranked data. However, if the number of observations is even, then the median is given
by the average of the values of the two middle terms.
The depth (number of positions from either end), or position, of the median is determined by the formula
$$\text{Depth of median} = \frac{n+1}{2}$$
For example, find the median of the numbers 3, 4, 5, 7, 9. In this example, n = 5, thus the depth of the
median is given as
$$\text{Depth of median} = \frac{5+1}{2} = 3$$
This implies that the median is the third number from either end in the ranked data; that is, the median is 5.
For frequency distribution data,
x 1 6 11 16 21
F 13 10 21 5 30
Cumulative 13 23 44 49 79

Thus, the depth of the median = $\frac{79+1}{2} = 40$. The 40th ranked value falls where the cumulative frequency first reaches 40, i.e. at x = 11, so the median is 11.

Now, for an even-sized data set, we have
x 1 6 11 16 21
F 13 10 20 5 30
Cumulative 13 23 43 48 78

Thus, the depth of the median = $\frac{78+1}{2} = 39.5$, the average of the 39th and 40th ranked values. Both of these values are 11, so the median is 11.
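The depth rule for raw data can be sketched directly; the function name is illustrative:

```python
def median(values):
    s = sorted(values)           # step 1: rank the data in increasing order
    n = len(s)
    if n % 2 == 1:
        return s[n // 2]         # odd n: the single middle value (depth (n+1)/2)
    # even n: average of the two middle values
    return (s[n // 2 - 1] + s[n // 2]) / 2

print(median([3, 4, 5, 7, 9]))   # 5, as in the worked example
```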
2.3.1 The median of grouped data set.
The median of a grouped data set is given by the following formula:
$$\text{Median} = L_1 + \left(\frac{\frac{N}{2} - Cf_b}{f_m}\right) C$$
where
$L_1$ is the lower class boundary of the median class,
$Cf_b$ is the cumulative frequency of the class just before (preceding) the median class,
$f_m$ is the frequency of the median class,
$C$ is the class size.
Example 2.6: Compute the median of the following data set
Class 10 - 19 20 -29 30 -39 40 - 49 50 - 59 60 - 69 70 - 79 80 - 89 90 - 99
interval
Frequency 5 12 2 15 13 20 7 5 1
Solution
Since N = 80, N/2 = 40; the cumulative frequency first reaches 40 in the class 50 – 59, so the median class is 50 – 59.
Class interval 10 – 19 20 – 29 30 – 39 40 – 49 50 – 59 60 – 69 70 – 79 80 – 89 90 – 99
Frequency 5 12 2 15 13 20 7 5 1
Cumulative 5 17 19 34 47 67 74 79 80

$L_1$ = 49.5, $N = \sum f_i = 80$, $Cf_b$ = 34, $f_m$ = 13, C = 10

$$\text{Median} = L_1 + \left(\frac{\frac{N}{2} - Cf_b}{f_m}\right) C = 49.5 + \left(\frac{40 - 34}{13}\right) 10 \approx 54.1$$

Thus, the median is approximately 54.1.
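The grouped-median formula translates directly into code; the function and parameter names are illustrative:

```python
def grouped_median(L1, N, cf_b, f_m, C):
    # Median = L1 + ((N/2 - cf_b) / f_m) * C
    return L1 + (N / 2 - cf_b) / f_m * C

# Values from Example 2.6: median class 50-59
m = grouped_median(L1=49.5, N=80, cf_b=34, f_m=13, C=10)
print(round(m, 1))  # 54.1
```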
b. Present the data of the given series on a graph, simultaneously in the form of both a less-than ogive
and a more-than ogive. The point where these two ogives intersect is the median value of the given
series.
2.4 Mode.
The mode is the value that occurs most frequently in a set of observations and around which the other
items of the set cluster densely. In other words, the mode of a distribution is the value at the point around
which the items tend to be most heavily concentrated.
Example 2.7: Find the mode of the following data set: 2, 2, 3, 5, 6, 6, 9, 10, 3, 4, 3, 9, 3.
The mode of this data set is 3, because it is the most frequent value.
A data set is said to be bimodal if it has two modes.
Example 2.8: Compute the mode of the following data: 2, 2, 3, 5, 6, 4, 4, 6, 9, 10, 3, 4, 3, 9, 4, 3.
The modes in this case are 3 and 4, each occurring four times; thus, the data set is bimodal.
Also, in some cases a data set may have no mode. For example, the data 6, 7, 13, 5, 4, 11, 2, 18 and
9 have no mode, since no value occurs more than once.
However, for grouped data, the mode is obtained mathematically as
$$\text{Mode} = L_1 + \left(\frac{D_1}{D_1 + D_2}\right) C$$
where
$L_1$ is the lower class boundary of the modal class,
$f_0$ is the frequency of the class before (preceding) the modal class,
$f_1$ is the frequency of the modal class,
$f_2$ is the frequency of the class after (succeeding) the modal class,
$C$ is the class size,
$D_1 = f_1 - f_0$ and $D_2 = f_1 - f_2$.
For the frequency distribution
x 1 6 11 16 21
F 13 10 20 5 30
the data are ungrouped, so the mode is simply the value with the highest frequency: the mode is 21,
which occurs 30 times, more often than any other value.
In other cases, the mode must be computed from grouped data.
Example 2.10: Calculate the mode of the distribution table
Class 13 - 17 18 - 22 23 - 27 28 - 32 33 - 37 38 - 42 43 - 47 48 - 52 53 - 57
Interval
Frequency 2 22 10 14 3 4 6 1 1
The modal class is 18 – 22, so
$L_1$ = 17.5, $f_0$ = 2, $f_1$ = 22, $f_2$ = 10, C = 5
$$D_1 = f_1 - f_0 = 22 - 2 = 20, \qquad D_2 = f_1 - f_2 = 22 - 10 = 12$$
$$\text{Mode} = L_1 + \left(\frac{D_1}{D_1 + D_2}\right) C = 17.5 + \left(\frac{20}{20 + 12}\right) 5 = 17.5 + 3.125 = 20.625$$
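The grouped-mode formula can be sketched in the same style as the grouped median; the function name is illustrative:

```python
def grouped_mode(L1, f0, f1, f2, C):
    # Mode = L1 + D1 / (D1 + D2) * C, with D1 = f1 - f0 and D2 = f1 - f2
    D1, D2 = f1 - f0, f1 - f2
    return L1 + D1 / (D1 + D2) * C

# Values from Example 2.10: modal class 18-22
mode = grouped_mode(L1=17.5, f0=2, f1=22, f2=10, C=5)
print(mode)  # 20.625
```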
2.4.1 Graphical method of obtaining the mode from a given set of data
To calculate the mode by the graphical method:
1. Draw a histogram of the given series.
2. The rectangle of the histogram with the greatest height gives the modal class of the series.
Figure 2.1: Calculation of Mode by Graphical Method
Figure 2.2: Mean, median, and mode for a symmetric histogram and frequency distribution curve.
(ii) For a histogram and a frequency distribution curve skewed to the right (see Figure 2.3), the
value of the mean is the largest, that of the mode is the smallest, and the value of the median
lies between these two. (Notice that the mode always occurs at the peak point.) The value of
the mean is the largest in this case because it is sensitive to outliers that occur in the right tail.
These outliers pull the mean to the right.
Figure 2.3: Mean, median, and mode for a histogram and frequency distribution curve skewed to the right.
(iii) If a histogram and a frequency distribution curve are skewed to the left (see Figure 2.4), the
value of the mean is the smallest and that of the mode is the largest, with the value of the
median lying between these two. In this case, the outliers in the left tail pull the mean to the
left.
Figure 2.4: Mean, median, and mode for a histogram and frequency distribution curve skewed to the left.
2.5 Empirical Relationship between Mean, Median and Mode.
The cases (ii) and (iii) can be empirically and mathematically expressed as
Mode = Mean – 3(Mean – Median), or equivalently, Mode = 3 Median – 2 Mean.
2.6 Geometric Mean.
The geometric mean (G) of a set of N positive numbers $x_1, x_2, \ldots, x_N$ is the Nth root of their
product:
$$G = \sqrt[N]{x_1 x_2 \cdots x_N}$$
For grouped data, with frequencies $f_1, f_2, \ldots, f_k$ and $N = \sum f_i$, we have
$$G = \sqrt[N]{x_1^{f_1} x_2^{f_2} \cdots x_k^{f_k}}$$
Example 2.11: Find the geometric mean of the numbers 2, 3, 5.
$$G = \sqrt[3]{2 \times 3 \times 5} = \sqrt[3]{30} \approx 3.11$$
2.8 Relation between Arithmetic Mean, Geometric Mean and Harmonic Mean.
The geometric mean of a set of positive numbers $x_1, x_2, \ldots, x_N$ is less than or equal to their
arithmetic mean $\bar{X}$ but greater than or equal to their harmonic mean $H = N \big/ \sum_{i=1}^{N} \frac{1}{x_i}$:
$$H \le G \le \bar{X}$$
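The three means and the inequality between them can be checked numerically. A minimal sketch using only the standard library (function names are illustrative):

```python
import math

def amean(xs): return sum(xs) / len(xs)                  # arithmetic mean
def gmean(xs): return math.prod(xs) ** (1 / len(xs))     # Nth root of the product
def hmean(xs): return len(xs) / sum(1 / x for x in xs)   # harmonic mean

xs = [2, 3, 5]
A, G, H = amean(xs), gmean(xs), hmean(xs)
assert H <= G <= A      # holds for any set of positive numbers
print(round(G, 2))      # approximately 3.11, as in Example 2.11
```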
Assignment 2.1: Compute the median
x 1 6 11 16 21
F 13 10 21 5 30
SOLUTION
Assignment 2.2: Calculate the mode, mean, median of the distribution table
Class 13 - 17 18 - 22 23 - 27 28 - 32 33 - 37 38 - 42 43 - 47 48 - 52 53 - 57
Interval
Frequency 20 12 11 15 13 14 6 9 8
SOLUTION
Chapter Three
3.3 Range.
The range is the simplest of all the measures of dispersion. It is defined as the difference between the
extreme observations of the distribution. Thus, the range is the difference between the greatest
(maximum) and the smallest (minimum) observation of the distribution.
Thus,
Range = Largest value – Smallest value
For example, consider a data set given in an array whose largest value is 120 and smallest value is 10.
The range is given as
Range = Largest – Smallest = 120 – 10 = 110.
For comparisons across data sets, the coefficient of range is used:
$$\text{Coefficient of Range} = \frac{X_{\max} - X_{\min}}{X_{\max} + X_{\min}}$$
Example 3.4: Calculate the coefficient of range of monthly earnings for a year in naira
S/N 1 2 3 4 5 6 7 8 9 10 11 12
Monthly 139 150 151 151 157 158 160 161 162 162 173 175
Earnings
$$\text{Coefficient of Range} = \frac{X_{\max} - X_{\min}}{X_{\max} + X_{\min}} = \frac{175 - 139}{175 + 139} = \frac{36}{314} \approx 0.115$$
Demerits of Range
1. Range is not based on the entire set of data.
2. Range is very much affected by fluctuations of sampling
3. Range cannot be used if we are dealing with open end classes
4. Range is not suitable for mathematical treatment
5. Range is very sensitive to the size of the sample
6. Range is too indefinite to be used as a practical measure of dispersion.
Uses of Range
1. Range is used in the industry for statistical quality control of the manufactured product by the
construction of R-chart. i.e. The control chart for range
2. Range is by far the most widely used measure of variability in our day-to-day life.
3. Range is used as a very convenient measure by meteorological departments for weather forecasts,
since they are primarily interested in the limits within which the temperature is likely
to vary on a particular day.
4. Range is used in studying stock market fluctuations, variations in money rates and rates of exchange.
3.4 Mean Deviation or Average Deviation.
The mean deviation of a set of N values $X_1, \ldots, X_N$ about their mean $\bar{X}$ is
$$MD = \frac{\sum_{i=1}^{N} |X_i - \bar{X}|}{N}$$
Note that |…| denotes the absolute value, so every deviation enters as a positive quantity.
For example, find the mean deviation of the data set below.
S/N 1 2 3 4 5 6 7 8 9 10 Total
X 2 12 15 17 20 24 27 30 35 40 222

$$\bar{X} = \frac{\sum X_i}{n} = \frac{222}{10} = 22.2$$

The absolute deviations $|X_i - \bar{X}|$ are 20.2, 10.2, 7.2, 5.2, 2.2, 1.8, 4.8, 7.8, 12.8 and 17.8, with total 90.

$$MD = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n} = \frac{90}{10} = 9$$
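The mean deviation for raw data can be sketched directly; the function name is illustrative:

```python
data = [2, 12, 15, 17, 20, 24, 27, 30, 35, 40]

def mean_deviation(xs):
    m = sum(xs) / len(xs)                         # mean = 222 / 10 = 22.2 here
    return sum(abs(x - m) for x in xs) / len(xs)  # average absolute deviation

print(mean_deviation(data))  # 90 / 10 = 9, as in the worked example
```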
For data in a frequency distribution, the mean deviation is given as
$$MD = \frac{\sum_{i=1}^{n} f_i |X_i - \bar{X}|}{\sum_{i=1}^{n} f_i}$$
Example 3.6: Obtain the mean deviation of the following frequency distribution.
x 12 20 24 32 50 62 70 75 86 90
frequency 4 6 8 2 3 7 10 11 9 10
Solution
The first thing to do is to calculate the mean:
$$\bar{X} = \frac{\sum_{i=1}^{n} f_i X_i}{\sum_{i=1}^{n} f_i} = \frac{4207}{70} = 60.1$$

S/N X F FX |X − X̄| F|X − X̄|
1 12 4 48 48.1 192.4
2 20 6 120 40.1 240.6
3 24 8 192 36.1 288.8
4 32 2 64 28.1 56.2
5 50 3 150 10.1 30.3
6 62 7 434 1.9 13.3
7 70 10 700 9.9 99.0
8 75 11 825 14.9 163.9
9 86 9 774 25.9 233.1
10 90 10 900 29.9 299.0
Total 70 4207 1616.6

$$MD = \frac{\sum f_i |X_i - \bar{X}|}{\sum f_i} = \frac{1616.6}{70} \approx 23.09$$
Example 3.7: Obtain the mean deviation for a grouped data set.
Class 13-17 18-22 23-27 28-32 33-37 38-42 43-47 48-52 53-57
Interval
Frequency 2 22 10 14 3 4 6 1 1
Solution
The first thing to do here is to calculate the mean, using the class mid-points X:
$$\bar{X} = \frac{\sum_{i=1}^{n} f_i X_i}{\sum_{i=1}^{n} f_i} = \frac{1780}{63} \approx 28.25$$

S/N Class Interval F X FX |X − X̄| F|X − X̄|
1 13 – 17 2 15 30 13.25 26.50
2 18 – 22 22 20 440 8.25 181.50
3 23 – 27 10 25 250 3.25 32.50
4 28 – 32 14 30 420 1.75 24.50
5 33 – 37 3 35 105 6.75 20.25
6 38 – 42 4 40 160 11.75 47.00
7 43 – 47 6 45 270 16.75 100.50
8 48 – 52 1 50 50 21.75 21.75
9 53 – 57 1 55 55 26.75 26.75
Total 63 1780 481.25

$$MD = \frac{\sum f_i |X_i - \bar{X}|}{\sum f_i} = \frac{481.25}{63} \approx 7.64$$
A further example: compute the mean deviation of the frequency distribution
x 3 8 13 18 23
F 5 4 2 7 2

Solution
S/N X F FX |X − X̄| F|X − X̄|
1 3 5 15 9.25 46.25
2 8 4 32 4.25 17.00
3 13 2 26 0.75 1.50
4 18 7 126 5.75 40.25
5 23 2 46 10.75 21.50
Total 20 245 126.50

$$\bar{X} = \frac{\sum f_i X_i}{\sum f_i} = \frac{245}{20} = 12.25$$
$$MD = \frac{\sum f_i |X_i - \bar{X}|}{\sum f_i} = \frac{126.5}{20} = 6.325$$
3. The averaging of the absolute deviations from an average irons out the irregularities in the
distribution and thus mean deviation provides an accurate and true measure of dispersion.
4. It is less affected by extreme observation.
5. It provides a better measure for comparison about the formation of different distributions.
Uses
Despite its mathematical drawbacks, the mean deviation has found favour with economists and business
statisticians because of its simplicity and accuracy. It is used, for example, in studies of the distribution of
personal wealth in a community or nation and of business cycles, such as those of the National Bureau of
Economic Research.
Assignment 3.1: Obtain the mean deviation for a grouped data set.
Class 13-17 18-22 23-27 28-32 33-37 38-42 43-47 48-52 53-57
Interval
Frequency 12 20 10 14 13 14 16 12 15
SOLUTION
3.5 Variance.
Variance is a very important measure of the spread of the original values about the mean. For a
population the variance is denoted by the Greek letter $\sigma^2$, while for a sample it is denoted by $s^2$.
The variance is the mean of the squared deviations from the mean; equivalently, it is the square of the
standard deviation.
Thus, for a population, the variance is given as
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N}$$
Now, when data are given in a frequency distribution, the variance is given as
$$\sigma^2 = \frac{\sum_{i=1}^{n} f_i (X_i - \bar{X})^2}{\sum_{i=1}^{n} f_i} = \frac{\sum_{i=1}^{n} f_i X_i^2}{\sum_{i=1}^{n} f_i} - \bar{X}^2$$
The variance for a sample of n values $X_1, X_2, \ldots, X_n$ is given as
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
Example 3.9: Obtain the variance for a grouped data set.
Class 13-17 18-22 23-27 28-32 33-37 38-42 43-47 48-52 53-57
Interval
Frequency 2 22 10 14 3 4 6 1 1
Solution
The first thing to do here is to calculate the mean from the class mid-points X:
$$\bar{X} = \frac{\sum_{i=1}^{n} f_i X_i}{\sum_{i=1}^{n} f_i} = \frac{1780}{63} \approx 28.25$$

S/N Class Interval F X FX X² FX²
1 13 – 17 2 15 30 225 450
2 18 – 22 22 20 440 400 8800
3 23 – 27 10 25 250 625 6250
4 28 – 32 14 30 420 900 12600
5 33 – 37 3 35 105 1225 3675
6 38 – 42 4 40 160 1600 6400
7 43 – 47 6 45 270 2025 12150
8 48 – 52 1 50 50 2500 2500
9 53 – 57 1 55 55 3025 3025
Total 63 1780 55850

$$\sigma^2 = \frac{\sum f_i X_i^2}{\sum f_i} - \bar{X}^2 = \frac{55850}{63} - 28.25^2 \approx 88.45$$
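A minimal sketch of the grouped-variance calculation. Note that carrying the mean at full precision gives a slightly smaller answer (about 88.22) than the 88.45 obtained when the mean is first rounded to 28.25, as is done in the hand calculation above:

```python
mid = [15, 20, 25, 30, 35, 40, 45, 50, 55]   # class mid-points of Example 3.9
f   = [2, 22, 10, 14, 3, 4, 6, 1, 1]

n = sum(f)                                            # 63
mean = sum(fi * x for fi, x in zip(f, mid)) / n       # 1780 / 63
var_exact = sum(fi * x * x for fi, x in zip(f, mid)) / n - mean ** 2
var_text = 55850 / 63 - 28.25 ** 2   # mean pre-rounded to 28.25, as in the text
```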
3.6 Standard Deviation.
The standard deviation is the positive square root of the variance. For a population,
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N}} = \sqrt{\frac{\sum_{i=1}^{N} X_i^2}{N} - \bar{X}^2}$$
Now, when data are given in a frequency distribution, the standard deviation is given as
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} f_i (X_i - \bar{X})^2}{\sum_{i=1}^{n} f_i}} = \sqrt{\frac{\sum_{i=1}^{n} f_i X_i^2}{\sum_{i=1}^{n} f_i} - \bar{X}^2}$$
The standard deviation for a sample of n values $X_1, X_2, \ldots, X_n$ is given as
$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
The standard deviation has the special feature that it is measured in the same units as the original data. That
is, if the original data were measured in units of weight, then the mean and standard deviation are measured
in the same units of weight. Note also that the larger the population standard deviation, the greater the
spread or variability among the values; the smaller the standard deviation, the smaller the amount of
variability in the data set.
Example 3.10: The arithmetic mean and standard deviation of series of 20 items were calculated by a
student as 20cm and 5cm respectively. But while calculating them an item 13 was misread as 30. Find the
correct arithmetic mean and standard deviation.
Solution
n = 20, mean = 20cm, standard deviation = 5cm; wrong value used = 30; correct value = 13
From the (incorrect) summary statistics:
$$\sum_{i=1}^{n} X_i = n\bar{X} = 20 \times 20 = 400$$
$$\sum_{i=1}^{n} X_i^2 = n\sigma^2 + n\bar{X}^2 = 20 \times 25 + 20 \times 400 = 8500$$
If the wrong observation 30 is replaced by the correct value 13, the number of observations remains the
same, viz. 20, and
$$\text{Corrected } \sum X_i = 400 - 30 + 13 = 383, \qquad \text{Corrected mean} = \frac{383}{20} = 19.15$$
$$\text{Corrected } \sum X_i^2 = 8500 - 30^2 + 13^2 = 7769$$
$$\text{Corrected } \sigma^2 = \frac{7769}{20} - 19.15^2 = 388.45 - 366.72 = 21.73$$
$$\text{Corrected } \sigma = \sqrt{21.73} \approx 4.66$$
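The misread-observation correction of Example 3.10 can be sketched step by step:

```python
n, mean, sd = 20, 20.0, 5.0
wrong, correct = 30, 13

sum_x  = n * mean                      # 400
sum_x2 = n * (sd ** 2 + mean ** 2)     # 20*25 + 20*400 = 8500

sum_x  += correct - wrong              # 400 - 30 + 13 = 383
sum_x2 += correct ** 2 - wrong ** 2    # 8500 - 900 + 169 = 7769

new_mean = sum_x / n                   # 19.15
new_var  = sum_x2 / n - new_mean ** 2  # about 21.73
new_sd   = new_var ** 0.5              # about 4.66
```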
Assignment 3.2: Obtain the standard deviation for a grouped data set.
Class 13-17 18-22 23-27 28-32 33-37 38-42 43-47 48-52 53-57
Interval
Frequency 12 20 10 14 13 14 16 12 15
SOLUTION
3.7 Interpreting Standard Deviation.
The standard deviation is one of the most useful measures of dispersion or variability of data. For a
homogeneous data set, that is, one whose observations are close to each other, $\sigma$ will be small;
for a heterogeneous data set, whose observations are widely spread, $\sigma$ will be large. Hence, we
can use the values of the standard deviations for the relative or comparative study of two or more
groups.
2. Range rule of thumb: The empirical rule can be used to estimate the range of the given data, or,
conversely, to estimate the standard deviation from the range.
a. For a normal distribution, 95.4% of the observations lie within the limits $(\bar{X} - 2\sigma,\ \bar{X} + 2\sigma)$.
Hence Range $= (\bar{X} + 2\sigma) - (\bar{X} - 2\sigma) = 4\sigma$ covers 95.4% of the observations, so an
estimate of the standard deviation is
$$\sigma \approx \frac{\text{Range}}{4}$$
b. For a normal distribution, 99.73% of the observations lie within the limits $(\bar{X} - 3\sigma,\ \bar{X} + 3\sigma)$.
Hence Range $= 6\sigma$ covers 99.73% of the observations, and
$$\sigma \approx \frac{\text{Range}}{6}$$
For example, suppose scores have a mound-shaped distribution with mean 570 and standard deviation 70.
a. We have $\bar{X} + 2\sigma = 570 + 140 = 710$ and $\bar{X} - 2\sigma = 570 - 140 = 430$. Hence 430 and 710
are exactly 2 standard deviations away from the mean. The percentage of scores between 430
and 710 equals the percentage of scores lying between $\bar{X} - 2\sigma$ and $\bar{X} + 2\sigma$, i.e. within 2
standard deviations of the mean, which is approximately 95%. Hence, approximately 95% of the
scores are between 430 and 710.
b. We have $\bar{X} + 3\sigma = 570 + 210 = 780$ and $\bar{X} - 3\sigma = 570 - 210 = 360$. Hence 360 and
780 are exactly 3 standard deviations away from the mean. The percentage of scores between
360 and 780 equals the percentage of scores lying within 3 standard deviations of the mean,
which is approximately 99.7%. Hence, approximately 99.7% of the scores are between 360 and 780.
Assignment 3.3: The test scores of a sample of 1000 students have a symmetric mounded distribution
with a mean score of 870 and standard deviation of 10. Approximately, what percent of the scores are
between (a) 6430 and 910, (b) 460 and 980.
SOLUTION
Example 3.12: The following are the cholesterol levels of a group of 20 middle aged men.
190 230 295 310 260 245 270 220 240 240
250 275 235 180 250 250 202 215 210 320
Using the range rule of the thumb, obtain a rough estimate of the data standard deviation.
In this example, the largest observation is L = 320 and the smallest is S = 180.
Thus, Range = L – S = 320 – 180 = 140. Hence, by the range rule of thumb, an estimate of the standard
deviation of the data is given as
$$\sigma \approx \frac{\text{Range}}{4} = \frac{140}{4} = 35$$
The actual sample standard deviation for the above data is about 37.3, so the estimated value obtained
above is reasonably close to the exact value.
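Example 3.12 can be checked numerically; a minimal sketch comparing the range-rule estimate with the sample standard deviation:

```python
chol = [190, 230, 295, 310, 260, 245, 270, 220, 240, 240,
        250, 275, 235, 180, 250, 250, 202, 215, 210, 320]

est_sd = (max(chol) - min(chol)) / 4          # range rule: (320 - 180) / 4 = 35
mean = sum(chol) / len(chol)
# sample standard deviation (divisor n - 1)
actual_sd = (sum((x - mean) ** 2 for x in chol) / (len(chol) - 1)) ** 0.5
```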
3.9 Chebyshev's Theorem.
For any $k > 1$, at least $100\left(1 - \frac{1}{k^2}\right)\%$ of the data values lie within k standard
deviations of the mean, i.e. within the limits $\bar{X} - k\sigma$ and $\bar{X} + k\sigma$.
Taking k = 3, we have $1 - \frac{1}{3^2} = 1 - \frac{1}{9} = \frac{8}{9} \approx 0.89 = 89\%$. Hence, using Chebyshev's rule, we conclude
that at least 89% of the data values will fall within the limits $\bar{X} - 3\sigma$ and $\bar{X} + 3\sigma$.
The empirical rule holds reasonably well for data with an approximately symmetric, mound-shaped
frequency distribution, and exactly for the normal distribution. However, the beauty of Chebyshev's rule is
that it holds for all sorts of data, irrespective of the nature of the distribution represented by it. See Figure
3.2a to Figure 3.2c.
Figure 3.2a: Chebyshev percent values graph
Example 3.13: According to Chebyshev's theorem, at least what percentage of the data values lie between
(a) $(\bar{X} - 4\sigma,\ \bar{X} + 4\sigma)$, (b) $(\bar{X} - 2.3\sigma,\ \bar{X} + 2.3\sigma)$?
(a) Taking k = 4: $1 - \frac{1}{4^2} = 1 - \frac{1}{16} = \frac{15}{16} = 0.9375 = 93.75\%$. Thus, at least 93.75% of the data
values lie between $\bar{X} - 4\sigma$ and $\bar{X} + 4\sigma$.
(b) Taking k = 2.3: $1 - \frac{1}{2.3^2} = 1 - 0.189 = 0.811 = 81.1\%$. Therefore, at least 81.1% of the
data values fall within $\bar{X} - 2.3\sigma$ and $\bar{X} + 2.3\sigma$.
Example 3.14: The average and standard deviation of a sample of size 150 are 15 and 2 respectively.
(a) At least what percentage of sample values lies between 9 and 21?
(b) How many sample values lie between 10 and 20?
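A sketch of how Example 3.14 can be worked with Chebyshev's bound (the function name is illustrative):

```python
def chebyshev_lower_bound(k):
    # At least a fraction 1 - 1/k**2 of any data set lies within
    # k standard deviations of the mean (k > 1).
    return 1 - 1 / k ** 2

mean, sd = 15, 2
# (a) 9 and 21 are mean -/+ 3*sd, so k = 3 and at least 8/9 (about 89%) lie there
k = (21 - mean) / sd                             # 3.0
pct = chebyshev_lower_bound(k)                   # 8/9
# (b) 10 and 20 are mean -/+ 2.5*sd, so at least 84% of the 150 values,
# i.e. at least 126 sample values, lie between 10 and 20
n_at_least = 150 * chebyshev_lower_bound(2.5)
```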
Assignment 3.4: According to Chebyshev's theorem, at least what percentage of the data values lie
between (a) $(\bar{X} - 8\sigma,\ \bar{X} + 8\sigma)$, (b) $(\bar{X} - 6.3\sigma,\ \bar{X} + 6.3\sigma)$?
SOLUTION
3.10 Z-scores or Standard Scores.
The Z-score gives the number of standard deviations the original value lies from the mean. A z-score or
standard score is obtained by shifting the origin of the original measurement (X) to its mean and scaling
by its standard deviation. The standard score is usually denoted by Z, and is given as
$$Z = \frac{X - \text{Mean}}{SD},$$
where SD means standard deviation.
The Z-values are independent of the units of measurements or in other words, they are in standard unit.
Accordingly, the transformation of the Z-score formula from X-scores to Z-scores enables us to compare
two or more distributions with different means and standard deviations. Also, the Z-scores play a very
important role in normal distribution for computing areas under normal probability curves.
Example 3.15: Joseph and Stella are taking a Statistics course at different colleges. Joseph's score in the
first year is 76, whereas the class average is 68 with a standard deviation of 5. Stella scored 82 in the first
year, while her class average score was 71 with a standard deviation of 8. Compare the performances of
the two students relative to their classes.
Score (X) Mean SD Z-score = (X − Mean)/SD
Joseph 76 68 5 $Z_1 = \frac{76 - 68}{5} = 1.60$
Stella 82 71 8 $Z_2 = \frac{82 - 71}{8} = 1.38$

Joseph's Z-score is $Z_1 = 1.60$: his score is 1.60 standard deviations above (because it is positive) his
class average. Stella's Z-score is $Z_2 = 1.38$: her score is 1.38 standard deviations above her class
average. Since $Z_1 > Z_2$, Joseph's performance (relative to their classes) is better than Stella's in the
first year Statistics course.
Remark: On the face of it, Stella's score of 82 is higher than Joseph's score of 76, by 82 – 76 = 6 marks.
However, relative to their classes, Joseph's performance is better.
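The comparison in Example 3.15 can be sketched directly; the function name is illustrative:

```python
def z_score(x, mean, sd):
    # number of standard deviations x lies above (or below) the mean
    return (x - mean) / sd

joseph = z_score(76, 68, 5)    # 1.60
stella = z_score(82, 71, 8)    # 1.375, about 1.38
better = "Joseph" if joseph > stella else "Stella"
print(better)  # Joseph: higher z-score, so better relative to his class
```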
3.11 Variance and Standard Deviation using Assumed Mean and scaling factor.
The standard deviation of grouped data can also be calculated by the "step deviation method". In this
method, some arbitrary data value is chosen as the assumed mean, A, and the deviations of all data
values are calculated using $d_i = x_i - A$. Then
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} d_i^2}{n} - \left(\frac{\sum_{i=1}^{n} d_i}{n}\right)^2}$$
and the variance is given as
$$\sigma^2 = \frac{\sum_{i=1}^{n} d_i^2}{n} - \left(\frac{\sum_{i=1}^{n} d_i}{n}\right)^2$$
Or better still, for grouped data, we can scale the deviations down to $h_i = \frac{d_i}{c}$, where c is the regular
increment in the x values (the class size). The formulas then reduce to
$$\bar{x} = A + \frac{\sum_{i=1}^{n} f_i h_i}{\sum_{i=1}^{n} f_i}\, c$$
$$\sigma^2 = \left[\frac{\sum_{i=1}^{n} f_i h_i^2}{\sum_{i=1}^{n} f_i} - \left(\frac{\sum_{i=1}^{n} f_i h_i}{\sum_{i=1}^{n} f_i}\right)^2\right] c^2 \quad \text{for a population, and}$$
$$s^2 = \frac{\sum_{i=1}^{n} f_i h_i^2 - \dfrac{\left(\sum_{i=1}^{n} f_i h_i\right)^2}{\sum_{i=1}^{n} f_i}}{\left(\sum_{i=1}^{n} f_i\right) - 1}\, c^2 \quad \text{for a sample.}$$
The population standard deviation and the sample standard deviation can be obtained as the square roots
of the above formulas.
Example 3.16: Find the mean and standard deviation of the grouped data.
Class Interval 2-5 6–9 10 - 13 14 - 17 18 - 21
Frequency 7 15 22 14 2
The solution to Example 3.16, with class size c = 9.5 – 5.5 = 4, can be obtained as follows.
Class Interval 2 – 5 6 – 9 10 – 13 14 – 17 18 – 21 Total
Frequency 7 15 22 14 2 60
Mid-point 3.5 7.5 11.5 15.5 19.5
d = x – A –8 –4 0 4 8
h = d/c –2 –1 0 1 2
fh –14 –15 0 14 4 –11
fh² 28 15 0 14 8 65

Let A = 11.5. Then
$$\bar{x} = A + \frac{\sum f_i h_i}{\sum f_i}\, c = 11.5 + \frac{-11}{60} \times 4 = 11.5 - 0.73 \approx 10.77$$
$$\sigma^2 = \left[\frac{\sum f_i h_i^2}{\sum f_i} - \left(\frac{\sum f_i h_i}{\sum f_i}\right)^2\right] c^2 = \left[\frac{65}{60} - \left(\frac{-11}{60}\right)^2\right] \times 4^2 \approx 16.8$$
$$\sigma = \sqrt{16.8} \approx 4.10$$
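The step deviation method of Example 3.16 can be sketched end to end (variable names are illustrative):

```python
mid = [3.5, 7.5, 11.5, 15.5, 19.5]   # class mid-points
f   = [7, 15, 22, 14, 2]             # frequencies, n = 60
A, c = 11.5, 4                       # assumed mean and class size

h = [(x - A) / c for x in mid]       # scaled deviations: -2, -1, 0, 1, 2
n = sum(f)
sfh  = sum(fi * hi for fi, hi in zip(f, h))       # sum f*h   = -11
sfh2 = sum(fi * hi * hi for fi, hi in zip(f, h))  # sum f*h^2 = 65

mean = A + sfh / n * c                            # about 10.77
var  = (sfh2 / n - (sfh / n) ** 2) * c ** 2       # about 16.8
sd   = var ** 0.5                                 # about 4.10
```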
3.13 Measures of Partition: Deciles, Percentiles, and Quartiles.
Quartiles are three values that divide a ranked data set into four equal parts. The second quartile is the
same as the median of a data set. The first quartile is the median of the observations that are less than the
median, and the third quartile is the median of the observations that are greater than the median.
Figure 3.3 shows the diagrammatic partition of the quartiles.
The difference between the third quartile and the first quartile of a data set is called the interquartile
range (IQR), which is a measure of dispersion:
$$IQR = Q_3 - Q_1$$
$$\text{Semi-}IQR = \frac{Q_3 - Q_1}{2}$$
Example 3.17: A sample of 12 commuter students was selected from a college. The following data give
the typical one-way commuting times (in minutes) from home to college for these 12 students.
29 14 39 17 7 47 63 37 42 18 24 55
a. Find the values of the three quartiles.
b. Where does the commuting time of 47 fall in relation to the three quartiles?
c. Find the interquartile range.
Solution
a. We perform the following steps to find the three quartiles.
Step 1. We rank the data in increasing order:
7 14 17 18 24 29 37 39 42 47 55 63
Step 2. We find the second quartile, which is also the median. In a total of 12 data values, the median is between the sixth and seventh terms. Thus, the median and, hence, the second quartile is given by the average of the sixth and seventh values in the ranked data set, that is, the average of 29 and 37:

    Q₂ = (29 + 37)/2 = 33
Step 3. We find the median of the data values that are smaller than Q₂, and this gives the value of the first quartile. The values that are smaller than Q₂ are:
7 14 17 18 24 29
The value that divides these six data values into two equal parts is given by the average of the two middle values, 17 and 18. Thus, the first quartile is:

    Q₁ = (17 + 18)/2 = 17.5
Step 4. We find the median of the data values that are larger than Q₂, and this gives the value of the third quartile. The values that are larger than Q₂ are:
37 39 42 47 55 63
The value that divides these six data values into two equal parts is given by the average of the two middle values, 42 and 47. Thus, the third quartile is:

    Q₃ = (42 + 47)/2 = 44.5
Now we can summarize the calculation of the three quartiles in the following figure.
The value of Q1 = 17.5 minutes indicates that 25% of these 12 students in this sample commute for less
than 17.5 minutes and 75% of them commute for more than 17.5 minutes.
Similarly, Q2 = 33 indicates that half of these 12 students commute for less than 33 minutes and the other
half of them commute for more than 33 minutes.
The value of Q3 = 44.5 minutes indicates that 75% of these 12 students in this sample commute for less
than 44.5 minutes and 25% of them commute for more than 44.5 minutes.
b. By looking at the position of 47 minutes, we can state that this value lies in the top 25%
of the commuting times.
c. The interquartile range is given by the difference between the values of the third and the first quartiles. Thus,

    IQR = Q₃ − Q₁ = 44.5 − 17.5 = 27 minutes.
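The median-of-halves rule used in Example 3.17 is easy to automate. A minimal sketch; the helper names are mine, and the half-split below assumes an even sample size, as in this example.

```python
# Quartiles by the median-of-halves rule of Example 3.17.
def median(vals):
    vals = sorted(vals)
    n = len(vals)
    mid = n // 2
    return vals[mid] if n % 2 else (vals[mid - 1] + vals[mid]) / 2

times = [29, 14, 39, 17, 7, 47, 63, 37, 42, 18, 24, 55]
ranked = sorted(times)
q2 = median(ranked)                      # the median
q1 = median(ranked[:len(ranked) // 2])   # median of the lower half
q3 = median(ranked[len(ranked) // 2:])   # median of the upper half
iqr = q3 - q1
print(q1, q2, q3, iqr)
```

The printed values match the quartiles 17.5, 33 and 44.5 found above.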
For grouped data, the first quartile is obtained as

    Q₁ = L(Q₁) + [ (N/4 − Cf(Q₁)) / f(Q₁) ] C
where,
L Q1 is the lower class boundary of the first quartile class,
N = Σ fᵢ is the total frequency of the distribution,
Cf Q1 is the cumulative frequency of the class just before or preceding the first quartile class
f Q1 is the frequency of the first quartile class,
C is the class size.
Similarly,

    Q₃ = L(Q₃) + [ (3N/4 − Cf(Q₃)) / f(Q₃) ] C
where,
L(Q₃) is the lower class boundary of the third quartile class,
Cf(Q₃) is the cumulative frequency of the class just before or preceding the third quartile class,
f Q3 is the frequency of the third quartile class,
C is the class size.
Example 3.18: Compute the first and third quartiles of the grouped data below.
The position of the first quartile is obtained as N/4 = 80/4 = 20 and that of the third quartile as 3N/4 = 240/4 = 60. Thus, the first quartile class is 40 – 49 and the third quartile class is 60 – 69.
Class 10 - 19 20 -29 30 -39 40 - 49 50 - 59 60 - 69 70 - 79 80 - 89 90 - 99
interval
Frequency 5 12 2 15 13 20 7 5 1
Cumulative 5 17 19 34 47 67 74 79 80
L(Q₁) = 39.5
N = Σ fᵢ = 80
Cf(Q₁) = 19
f(Q₁) = 15
C = 10
    Q₁ = L(Q₁) + [ (N/4 − Cf(Q₁)) / f(Q₁) ] C = 39.5 + [ (20 − 19)/15 ] (10) = 40.2
For the third quartile
L(Q₃) = 59.5
N = Σ fᵢ = 80
Cf(Q₃) = 47
f(Q₃) = 20
C = 10

    Q₃ = L(Q₃) + [ (3N/4 − Cf(Q₃)) / f(Q₃) ] C = 59.5 + [ (60 − 47)/20 ] (10) = 66.0.
3.13.2 Deciles
A decile is a quantile that is used to divide a data set into 10 equal subsections. The 5th decile is the median of the data set.
The (approximate) value of the kth decile, denoted by Dk, is

    Dk = value of the (kn/10)th term in a ranked data set

where k denotes the number of the decile and n represents the sample size.
For grouped data, we can apply the formula

    Dk = L(Dk) + [ (kN/10 − Cf(Dk)) / f(Dk) ] C
where,
N = Σ fᵢ is the total frequency of the distribution,
Cf Dk is the cumulative frequency of the class just before or preceding the decile class.
f Dk is the frequency of the decile class,
C is the class size.
3.13.3 Percentiles
Percentiles are the summary measures that divide a ranked data set into 100 equal parts. Each (ranked)
data set has 99 percentiles that divide it into 100 equal parts. The data should be ranked in increasing
order to compute percentiles. The kth percentile is denoted by, where k is an integer in the range 1 to 99.
For instance, the 25th percentile is denoted by P25. Figure 3.4 shows the positions of the 99 percentiles.
Thus, the kth percentile, Pk , can be defined as a value in a data set such that about k% of the
measurements are smaller than the value of Pk and about (100 − k)% of the measurements are greater
than the value of Pk .
The approximate value of the kth percentile is determined as explained next.
Calculating Percentiles

    Pk = value of the (kn/100)th term in a ranked data set

If the value of kn/100 is fractional, always round it up to the next higher whole number.
For grouped data, we can apply the formula

    Pk = L(Pk) + [ (kN/100 − Cf(Pk)) / f(Pk) ] C
where,
N = Σ fᵢ is the total frequency of the distribution,
Cf Pk is the cumulative frequency of the class just before or preceding the percentile class.
f Pk is the frequency of the percentile class,
C is the class size.
Example 3.19: Compute the 70th percentile of the following data set
A sample of 12 commuter students was selected from a college. The following data give the typical one-
way commuting times (in minutes) from home to college for these 12 students.
29 14 39 17 7 47 63 37 42 18 24 55
Arranging the data in increasing order, we have
7 14 17 18 24 29 37 39 42 47 55 63
Now, for k = 70 and n = 12, we have

    kn/100 = (70 × 12)/100 = 8.4 ≈ 9th term.

    P70 = value of the 9th term = 42 minutes.
Example 3.20: Compute the 10th percentile of the following data set
Class 10 - 19 20 -29 30 -39 40 - 49 50 - 59 60 - 69 70 - 79 80 - 89 90 - 99
interval
Frequency 5 12 2 15 13 20 7 5 1
For k = 10 and n = 80, the position of the 10th percentile is obtained as kn/100 = (10 × 80)/100 = 8. This falls in the class interval 20 – 29; thus P10 is computed with
L(Pk) = 19.5
N = Σ fᵢ = 80
Cf(Pk) = 5
f(Pk) = 12
C = 10

    P10 = L(Pk) + [ (kN/100 − Cf(Pk)) / f(Pk) ] C = 19.5 + [ (8 − 5)/12 ] (10) = 22.0
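The grouped-data quartile, decile and percentile formulas all share one interpolation pattern, so a single helper covers the worked computations above. A sketch assuming contiguous classes; the function and variable names are mine.

```python
# One interpolation helper covers the grouped quantile formulas above.
def grouped_quantile(lower_bounds, freqs, class_size, position):
    """position is N/4, 3N/4, kN/10 or kN/100; returns L + ((pos - Cf)/f) * C."""
    cum = 0
    for lb, f in zip(lower_bounds, freqs):
        if cum + f >= position:
            return lb + (position - cum) / f * class_size
        cum += f

bounds = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
freqs = [5, 12, 2, 15, 13, 20, 7, 5, 1]        # N = 80
q1 = grouped_quantile(bounds, freqs, 10, 80 / 4)
q3 = grouped_quantile(bounds, freqs, 10, 3 * 80 / 4)
p10 = grouped_quantile(bounds, freqs, 10, 10 * 80 / 100)
print(round(q1, 1), q3, p10)
```

The printed values agree with Q₁ ≈ 40.2, Q₃ = 66.0 and P10 = 22.0 computed by hand.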
3.14 Skewness and Kurtosis
The skewness (SK) measures the degree of asymmetry of a distribution. It can be computed as

    SK = (Mean − Mode) / Standard Deviation

    SK = 3(Mean − Median) / SD

    SK = (Q₃ − 2Q₂ + Q₁) / (Q₃ − Q₁)
The kurtosis (K) is a measure of the tailedness of a distribution. Tailedness is how often outliers occur. Excess kurtosis is the tailedness of a distribution relative to a normal distribution. Distributions with medium kurtosis (medium tails) are mesokurtic. Kurtosis can be used as a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution. That is, data sets with high kurtosis tend to have heavy tails, or outliers, while data sets with low kurtosis tend to have light tails, or a lack of outliers. A uniform distribution would be the extreme case.
Mathematically, the percentile coefficient of kurtosis is

    K = (Q₃ − Q₁) / [ 2(P90 − P10) ]

where the Q's and P's are the quartiles and percentiles respectively.
Example 3.21: Compute the Kurtosis of the following data set
Class 10 - 19 20 -29 30 -39 40 - 49 50 - 59 60 - 69 70 - 79 80 - 89 90 - 99
interval
Frequency 5 12 2 15 13 20 7 5 1
    K = (Q₃ − Q₁) / [ 2(P90 − P10) ] = (66.0 − 40.2) / [ 2(76.64 − 22) ] = 25.8/109.28 = 0.236
Assignment 3.5: Compute the quartiles, decile and percentile of the following data set
Class 10 - 19 20 -29 30 -39 40 - 49 50 - 59 60 - 69 70 - 79 80 - 89 90 - 99
interval
Frequency 15 20 12 13 14 25 17 25 11
SOLUTION
Chapter Four
4.1 Introduction to Probability
The subject of probability theory is an aspect of mathematics that deals with the study of the possible outcomes of a given event, together with the outcomes' relative likelihoods and distributions.
Probability is a numerical measure of the likelihood that a specific event will occur.
Suppose that an event E can happen in w ways out of a total of n possible equally likely ways. Then, the probability of occurrence of the event (called its success) is denoted by

    p = Pr(E) = (number of ways E can occur) / (total number of possible outcomes)

Simply put,

    p = Pr(E) = w/n

The probability of nonoccurrence of the event (called its failure) is denoted by

    q = Pr(not E) = (n − w)/n = 1 − w/n = 1 − p = 1 − Pr(E)

Thus, Pr(E) + Pr(not E) = 1. That is, p + q = 1.
Figure 4.2: Tree diagram for one toss of a coin.
Suppose we throw a die once; we can obtain 1, 2, 3, 4, 5, or 6. The number of sample points is n(S) = 6.
However, if the die is rolled two times and the outcomes are summed, we have
+ 1 2 3 4 5 6
1 2 3 4 5 6 7
2 3 4 5 6 7 8
3 4 5 6 7 8 9
4 5 6 7 8 9 10
5 6 7 8 9 10 11
6 7 8 9 10 11 12
x 2 3 4 5 6 7 8 9 10 11 12 Total
n(x) 1 2 3 4 5 6 5 4 3 2 1 36
Thus, n(S) = 36
b. Outcome: The outcome of any experiment is the particular result obtained when an
experiment is performed.
c. Sample space: A sample space is a set of all possible outcomes of an experiment.
d. Sample points: These are the individual outcomes in a sample space.
e. Event: An event is any subset of the sample space. The number of sample points of A is
denoted by n(A).
f. Equally Likely Outcomes: Two or more outcomes that have the same probability of
occurrence are said to be equally likely outcomes.
1. The probability of an event always lies in the range 0 to 1.
Whether it is a simple or a compound event, the probability of an event is never less than 0 or greater than
1. We can write this property as follows.
    0 ≤ P(Eᵢ) ≤ 1

    0 ≤ P(A) ≤ 1
An event that cannot occur has zero probability and is called an impossible (or null) event.
An event that is certain to occur has a probability equal to 1 and is called a sure (or certain) event.
In the following examples, the first event is an impossible event and the second one is a sure event.
There are very few events in real life that have probability equal to either zero or 1.0. Most of the events
in real life have probabilities that are between zero and 1.0. In other words, these probabilities are greater
than zero but less than 1.0. A higher probability such as .82 indicates that the event is more likely to
occur. On the other hand, an event with a lower probability such as .12 is less likely to occur. Sometimes, events with very low (.05 or lower) probabilities are also called rare events.
2. The sum of the probabilities of all simple events (or final outcomes) for an experiment, denoted by Σ P(Eᵢ), is always 1. That is,

    Σ P(Eᵢ) = P(E₁) + P(E₂) + ... + P(Eₙ) = 1
Example 4.1: Find the probability of obtaining an even number in one roll of a die.
This experiment of rolling a die once has a total of six outcomes: 1, 2, 3, 4, 5, and 6.
Given that the die is fair, these outcomes are equally likely. Let A be an event that an even number is
observed on the die. Event A includes three outcomes: 2, 4, and 6; that is,
A = an even number is obtained = {2, 4, 6}.
If any one of these three numbers is obtained, event A is said to occur. Since three out of six outcomes are
included in the event that an even number is obtained, its probability is:
    P(A) = (number of outcomes included in A) / (total number of outcomes) = 3/6 = 0.5
For three events, the addition law of probability gives

    Pr(E₁ ∪ E₂ ∪ E₃) = Pr(E₁) + Pr(E₂) + Pr(E₃) − Pr(E₁E₂) − Pr(E₁E₃) − Pr(E₂E₃) + Pr(E₁E₂E₃)
Example 4.2: If E1 is the event of drawing an ace from a deck of cards and E2 is the event of drawing a
king. Obtain the probability of either drawing an ace or a king in a single draw.
There are 4 aces and 4 kings in a deck with a total of 52 cards. Thus, the probability of drawing an ace is given as

    Pr(E₁) = 4/52.

Also, the probability of drawing a king is

    Pr(E₂) = 4/52.

Since the two events are mutually exclusive,

    Pr(E₁ ∪ E₂) = Pr(E₁) + Pr(E₂) = 4/52 + 4/52 = 8/52 = 2/13.
Example 4.3: If E1 is the event of drawing an ace from a deck of cards and E2 is the event of drawing a
spade. Obtain the probability of either drawing an ace or a spade in a single draw.
The probability of drawing an ace is Pr(E₁) = 4/52, and the probability of drawing a spade is Pr(E₂) = 13/52. Here the events are not mutually exclusive, since the ace of spades is both an ace and a spade, with Pr(E₁E₂) = 1/52. Thus,

    Pr(E₁ ∪ E₂) = Pr(E₁) + Pr(E₂) − Pr(E₁E₂) = 4/52 + 13/52 − 1/52 = 16/52 = 4/13.
b. Independent Events: Suppose we have events E₁ and E₂. The two events E₁ and E₂ are independent if the probability of the second event E₂ is not affected by the occurrence or nonoccurrence of the first event E₁. Thus, if E₁ and E₂ are independent events, then

    Pr(E₁ and E₂) = Pr(E₁ ∩ E₂) = Pr(E₁) Pr(E₂)

This is called the multiplication law of probability.
Example 4.4: Suppose two events E₁ and E₂ defined on a sample space are such that Pr(E₂) = 0.2 and Pr(E₁ ∪ E₂) = 0.75. Find Pr(E₁) when (1) E₁ and E₂ are independent, and (2) E₁ and E₂ are mutually exclusive.
Solution
1. If E₁ and E₂ are independent, then Pr(E₁ ∩ E₂) = Pr(E₁) Pr(E₂). Hence,

    Pr(E₁ ∪ E₂) = Pr(E₁) + Pr(E₂) − Pr(E₁) Pr(E₂)
    0.75 = Pr(E₁) + 0.2 − 0.2 Pr(E₁)
    0.8 Pr(E₁) = 0.75 − 0.2 = 0.55
    Pr(E₁) = 0.55/0.8 = 0.6875

2. If E₁ and E₂ are mutually exclusive, then Pr(E₁ ∩ E₂) = 0. Thus,

    Pr(E₁ ∪ E₂) = Pr(E₁) + Pr(E₂)
    0.75 = Pr(E₁) + 0.2
    Pr(E₁) = 0.75 − 0.2 = 0.55
4.5 Conditional Probability
If E₁ and E₂ are two events, the probability that E₂ occurs given that E₁ has occurred, denoted by Pr(E₂ | E₁) and read "the probability of E₂ given E₁", is called the conditional probability of E₂ given that E₁ has occurred. If the occurrence or nonoccurrence of E₁ does not affect the probability of occurrence of E₂, then Pr(E₂ | E₁) = Pr(E₂) and we say that E₁ and E₂ are independent events; otherwise they are dependent events.
Example 4.5: Let E1 and E2 be the events heads on the fifth toss and head on the sixth toss of a coin
respectively. Thus, E1 and E2 are independent events, and thus the probability of heads on both the fifth
and sixth tosses is
    Pr(E₁E₂) = Pr(E₁) Pr(E₂) = (1/2)(1/2) = 1/4.
Example 4.6: Suppose that a bag contains 3 white balls and 2 black balls. Let E1 be the event, first ball
drawn is black and E₂ the event, second ball drawn is black, where the balls are not replaced after being drawn from the bag. Here, E₁ and E₂ are dependent events.
The probability that the first ball drawn is black is Pr(E₁) = 2/(3 + 2) = 2/5. The probability that the second ball drawn is black, given that the first ball drawn was black, is Pr(E₂ | E₁) = 1/(3 + 1) = 1/4. Thus, the probability that both balls drawn are black is

    Pr(E₁E₂) = Pr(E₁) Pr(E₂ | E₁) = (2/5)(1/4) = 1/10.
+ 1 2 3 4 5 6
1 1,1 1,2 1,3 1,4 1,5 1,6
2 2,1 2,2 2,3 2,4 2,5 2,6
3 3,1 3,2 3,3 3,4 3,5 3,6
4 4,1 4,2 4,3 4,4 4,5 4,6
5 5,1 5,2 5,3 5,4 5,5 5,6
6 6,1 6,2 6,3 6,4 6,5 6,6
By Bayes' rule,

    Pr(Eᵢ | E) = Pr(Eᵢ) Pr(E | Eᵢ) / Σ Pr(Eᵢ) Pr(E | Eᵢ)

where the sum in the denominator runs over all the events Eᵢ.
Let
S = event that the car is sub-standard
T = event that a car is sprayed by Tom
D = event that a car is sprayed by Dick
H = event that a car is sprayed by Harry
Thus, the problem is to find p(H|S).
However, P(T) = 0.25, and P(S|T) = 0.05. That is, Tom sprays 25% of the cars and of these 5% are sub-
standard, on the average. Similarly, for Dick and Harry:
P(D) = 0.35, and P(S|D) = 0.08,
P(H) = 0.40, and P(S|H) = 0.10.
Thus, applying the Bayes‘ rule, the probability that a randomly selected car, found to be sub-standard was
sprayed by Harry is
    P(H|S) = P(H) P(S|H) / [ P(T) P(S|T) + P(D) P(S|D) + P(H) P(S|H) ]
           = (0.40 × 0.10) / (0.25 × 0.05 + 0.35 × 0.08 + 0.40 × 0.10)
           = 0.04 / 0.0805 ≈ 0.5
Example 4.9: A product is being produced by Fupre Enterprises by three machines namely, M1, M2, M3.
These machines produce 40%, 35% and 25% of the product respectively. Accordingly, the proportions of defective products produced by these machines are respectively 7%, 10% and 12%. Find
a. The probability that a part selected at a random from the finished product is defective.
b. The probability of the defective product was produced by machine M1, M2, M3.
In the above example, let
P(M1 ) = 0.4
P(M2) = 0.35
P(M3) = 0.25
ND = Non defective product
D = Defective product
P(D|M1 ) = 0.07
P(D|M2) = 0.1
P(D|M3) = 0.12
a. P(D) = P(M₁) P(D|M₁) + P(M₂) P(D|M₂) + P(M₃) P(D|M₃)
        = 0.4 × 0.07 + 0.35 × 0.1 + 0.25 × 0.12 = 0.028 + 0.035 + 0.03 = 0.093
b. Thus, the probability that the defective is from machine M₁ is given as

    P(M₁ | D) = (0.4 × 0.07)/0.093 = 0.028/0.093 = 0.3011

The probability that the defective is from machine M₂ is given as

    P(M₂ | D) = (0.35 × 0.1)/0.093 = 0.035/0.093 = 0.3763

The probability that the defective is from machine M₃ is given as

    P(M₃ | D) = (0.25 × 0.12)/0.093 = 0.030/0.093 = 0.3226
Permutations refer to the number of ways in which a set of objects can be arranged in order (the order being crucial); the key words for permutation are order or arrangement. The number of possible arrangements of n distinct objects is n!, where n! = n(n − 1)(n − 2)...(2)(1) is called n factorial. nPr denotes the number of permutations of r objects chosen out of a total of n objects. Mathematically,

    nPr = n! / (n − r)!

For example, for n = 5 and r = 2,

    nPr = n!/(n − r)! = 5!/(5 − 2)! = (5 × 4 × 3 × 2 × 1)/(3 × 2 × 1) = 20.
Example 4.11: In how many ways can 3 persons sit on 6 seats in a row?
In this example, n = 6, r = 3:

    nPr = n!/(n − r)! = 6!/(6 − 3)! = (6 × 5 × 4 × 3 × 2 × 1)/(3 × 2 × 1) = 120.
4.7.2 Combination
In the case of permutations, the order in which the objects are arranged is important. However, if one is interested only in which particular objects are selected when r objects are chosen from n objects, without regard to their arrangement, then the unordered selection is called a combination. In this case, the number of combinations is given by the formula:

    nCr = nPr / r! = n! / [ (n − r)! r! ]

where nCr denotes the number of combinations possible in selecting r objects from n different objects.
Example 4.12: Find the number of ways in which 3 persons can be selected from a committee of 5
people.
In this example, 3 persons can be chosen from a committee of 5 in 5C3 ways. Thus,

    nCr = n!/[(n − r)! r!] = 5!/(2! 3!) = (5 × 4)/2! = 10 ways.
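In Python, `math.perm` and `math.comb` implement nPr and nCr directly, which makes the worked values above easy to confirm:

```python
# math.perm and math.comb (Python 3.8+) compute nPr and nCr directly.
import math

print(math.perm(5, 2))   # 5P2: ordered arrangements of 2 out of 5
print(math.perm(6, 3))   # Example 4.11: 6P3
print(math.comb(5, 3))   # Example 4.12: 5C3
```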
Assignment 4.1: A product is being produced by Fupre Enterprises by three machines namely, M 1, M2,
M3. These machines produce 20%, 65% and 15% of the product respectively. Accordingly, the proportions of defective products produced by these machines are respectively 13%, 15% and 35%. Find
a. The probability that a part selected at a random from the finished product is defective.
b. The probability of the defective product was produced by machine M1, M2, M3.
SOLUTION
Chapter Five
5.1 Discrete Probability Distribution.
In the previous chapter, we dealt with the probability of fairly simple events. However, many other problems may be very complex, in that they cannot be readily solved in the same manner as those involving repetitive experiments such as the inspection of components coming off an assembly line, the tossing of a coin, etc. Thus, a discrete probability distribution counts occurrences that have countable or finite outcomes.
Discrete distributions contrast with continuous distributions, where outcomes can fall anywhere on a
continuum. Common examples of discrete distribution include the binomial, Poisson, and Bernoulli
distributions. The probability mass function (pmf) of a random variable x is denoted by px . Various
discrete probability distributions have different pmfs.
5.1.1 Discrete Random variable
A discrete random variable is a variable that can assume any set of possible values that can be quantified, counted or listed. An example of a discrete random variable is the outcome when you roll a die.
5.2 Mean or Expectation, and Variance of discrete probability distributions.
Let the probability mass function (pmf) of a discrete random variable x be given by p(x). Then, the mean or expectation of x, denoted by E(x), is given by

    E(x) = Σ xᵢ p(xᵢ), the sum running over i = 1, ..., n.

Example 5.1: Find the mean of the discrete random variable x whose pmf is given in the table below.

    i          1       2       3       4       5
    xᵢ         1       6       11      16      21
    p(xᵢ)      0.17    0.13    0.26    0.06    0.38
    xᵢ p(xᵢ)   0.17    0.78    2.86    0.96    7.98

Note that Σ p(xᵢ) = 1. Thus,

    μ = E(x) = Σ xᵢ p(xᵢ) = 0.17 + 0.78 + 2.86 + 0.96 + 7.98 = 12.75
The probability mass function (pmf) of the Bernoulli distribution is given as

    f(x) = pˣ (1 − p)¹⁻ˣ   for x = 0, 1

where x is the random variable associated with the Bernoulli trial. Also, we say x has a Bernoulli distribution with parameter p if X ~ Bernoulli(p). Thus, when an experiment of a boy or a girl is performed once, it is a Bernoulli experiment.
5.3.1. The Expectation and Variance of the Bernoulli distribution
The mean or expectation of the Bernoulli distribution for a random variable x is obtained as

    E(X) = Σ x pˣ (1 − p)¹⁻ˣ = 0(1 − p) + 1(p) = p, the sum running over x = 0, 1.

Similarly, E(X²) = 0²(1 − p) + 1²(p) = p, so that the variance is

    σ² = E(X²) − [E(X)]² = p − p² = p(1 − p) = pq.

The standard deviation is obtained by taking the square root of the variance as

    σ = √(pq).
Example 5.2: If X ~ Bernoulli(0.3), find the mean and standard deviation.
The mean is μ = p = 0.3 and the standard deviation is σ = √(pq) = √(0.3 × 0.7) = √0.21 ≈ 0.458.
Let x be the number of successes in n Bernoulli trials, with possible values x = 0, 1, 2, 3, ..., n. Then the pmf of x, say f(x), is given as

    f(x) = nCx pˣ (1 − p)ⁿ⁻ˣ,   x = 0, 1, 2, 3, ..., n,

where

    nCx = n! / [ (n − x)! x! ]

In summary, the number of ways of selecting x successful positions in n trials gives the binomial distribution. Thus, if X follows a binomial distribution we write X ~ B(n, p), with n trials and success probability p.
For example, for n = 10 trials with p = 1/2, the probability of x = 6 successes is

    f(6) = 10C6 p⁶ (1 − p)⁴ = [10!/(4! 6!)] (1/2)⁶ (1/2)⁴ = 210/1024 = 105/512.
The mean of the binomial distribution is

    μ = E(X) = Σ x nCx pˣ qⁿ⁻ˣ = np Σ [ (n − 1)! / ( (x − 1)! (n − x)! ) ] pˣ⁻¹ qⁿ⁻ˣ = np (p + q)ⁿ⁻¹ = np,

the second sum running over x = 1, ..., n. For the variance,

    σ² = Var(X) = E(X²) − [E(X)]² = Σ x² p(xᵢ) − μ².

Now, let E[X(X − 1)] = E(X²) − E(X). Then

    E[X(X − 1)] = Σ x(x − 1) [ n! / ( x! (n − x)! ) ] pˣ qⁿ⁻ˣ.

However, since the first two terms (x = 0 and x = 1) of this sum are zero, we have

    E[X(X − 1)] = Σ [ n! / ( (x − 2)! (n − x)! ) ] pˣ qⁿ⁻ˣ = n(n − 1)p² Σ (n−2)C(x−2) pˣ⁻² qⁿ⁻ˣ,

the sums now running over x = 2, ..., n. Putting k = x − 2, the last sum becomes Σ (n−2)Ck pᵏ qⁿ⁻²⁻ᵏ = (p + q)ⁿ⁻² = 1, so that

    E(X²) = E[X(X − 1)] + E(X) = n(n − 1)p² + np.

Hence,

    σ² = Var(X) = n(n − 1)p² + np − (np)² = np − np² = np(1 − p) = npq.
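The closed forms μ = np and σ² = npq can be confirmed against a direct summation of the pmf. A sketch using `math.comb`; the function name is mine.

```python
# Direct-summation check of the binomial mean np and variance npq.
from math import comb

def binom_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.5
mean = sum(x * binom_pmf(x, n, p) for x in range(n + 1))
var = sum(x * x * binom_pmf(x, n, p) for x in range(n + 1)) - mean ** 2
print(binom_pmf(6, n, p), mean, var)   # f(6) = 105/512, then np and npq
```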
5.5 Poisson distribution.
The Poisson distribution is a discrete probability distribution. It gives the probability of an event
happening a certain number of times (k) within a given interval of time or space. The Poisson distribution
has only one parameter, λ (lambda), which is the mean number of events. The Poisson distribution was
derived in 1837 by a French mathematician called Simeon D. Poisson (1781 – 1840). It is the limiting
case of the Binomial probability distribution with the following conditions
a. The number of trials is indefinitely large; that is, n → ∞.
b. The constant probability p of success for each trial is indefinitely small; that is, p → 0.
c. np = m, say, is finite.
The probability mass function (pmf) of the Poisson distribution for a random variable x is given as

    f(x) = Pr(X = x) = e^(−λ) λˣ / x!,   x = 0, 1, 2, 3, ...

Note that e ≈ 2.71828.
Example 5.6: A building society branch manager notices that over a long period of time the number of
people using an automated cash point on a Saturday morning is on average, 30 people per hour. What is
the probability that in say a 10 minute period:
a. Nobody uses the machine?
b. Three people use the machine?
In this example, we need to obtain the mean of the Poisson distribution before we proceed. Since 30 people arrive per hour, the mean for a 10 minute period is λ = 30/6 = 5. Thus,

a.  P(X = 0) = e^(−λ) λ⁰/0! = e^(−5) (5⁰)/0! = 0.0067

b.  P(X = 3) = e^(−λ) λ³/3! = e^(−5) (5³)/3! = 0.1404
Example 5.7: The number of accidents occurring on a bridge in Agbarho in a year has a Poisson
distribution with mean 3. Compute the probability that in a year:
a. Only one accident occurred
b. At most one accident occurred
c. At least one accident occurred.
The solution of the example can be obtained as follows:
a.  P(X = 1) = e^(−λ) λ¹/1! = e^(−3) (3¹)/1! = 0.149

b.  P(X ≤ 1) = P(X = 0) + P(X = 1) = e^(−3) (3⁰)/0! + e^(−3) (3¹)/1! = 0.0498 + 0.1494 = 0.199

c.  P(X ≥ 1) = 1 − P(X = 0) = 1 − e^(−3) (3⁰)/0! = 1 − 0.04979 = 0.9502
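Examples 5.6 and 5.7 reduce to evaluating the Poisson pmf. A minimal sketch; the function name is mine.

```python
# Poisson pmf, applied to Examples 5.6 and 5.7.
from math import exp, factorial

def pois_pmf(x, lam):
    return exp(-lam) * lam ** x / factorial(x)

p0 = pois_pmf(0, 5)                              # nobody uses the machine
p3 = pois_pmf(3, 5)                              # three people use it
p_at_most_one = pois_pmf(0, 3) + pois_pmf(1, 3)  # Example 5.7(b)
p_at_least_one = 1 - pois_pmf(0, 3)              # Example 5.7(c)
print(round(p0, 4), round(p3, 4), round(p_at_most_one, 3), round(p_at_least_one, 4))
```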
The pmf of the geometric distribution is

    f(x) = p q^(x−1),   x = 1, 2, 3, ...

The mean and the variance of the geometric distribution are obtained as

    μ = 1/p   and   σ² = q/p².

The pmf of the hypergeometric distribution is

    p(y) = [ rCy × (N−r)C(n−y) ] / NCn
where
N is the population size,
r is the number of success states in the population,
n is the number of draws (i.e. the quantity drawn in each trial),
y is the number of observed successes,
aCb is a binomial coefficient.
If Y is a random variable with a hypergeometric distribution, the mean and variance of Y are given by

    E(Y) = nr/N   and   σ² = Var(Y) = n (r/N) ((N − r)/N) ((N − n)/(N − 1)).

However, if we express p = r/N and q = (N − r)/N = 1 − p, we have

    E(Y) = np   and   σ² = Var(Y) = npq (N − n)/(N − 1),

with (N − n)/(N − 1) as the (finite population) adjustment factor.
Example 5.9: An important problem encountered by personnel directors and others faced with the
selection of the best in a finite set of elements is exemplified by the following scenario. From a group of
20 Ph.D. engineers, 10 are randomly selected for employment. What is the probability that the 10 selected
include all the 5 best engineers in the group of 20?
For this example N = 20, n = 10, and r = 5. That is, there are only 5 in the set of 5 best engineers, and we seek the probability that Y = 5, where Y denotes the number of best engineers among the ten selected. Then

    p(5) = [ 5C5 × 15C5 ] / 20C10 = [ 1 × 15!/(5! 10!) ] / [ 20!/(10! 10!) ] = 21/1292 ≈ 0.0162.
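The hypergeometric probability in Example 5.9 is a one-liner with `math.comb`; a sketch (the helper name is mine):

```python
# Hypergeometric probability for Example 5.9.
from math import comb

def hyper_pmf(y, N, r, n):
    return comb(r, y) * comb(N - r, n - y) / comb(N, n)

p5 = hyper_pmf(5, 20, 5, 10)   # all 5 best engineers among the 10 picked
print(p5)                      # equals 21/1292
```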
The variance of the negative binomial distribution is

    Var(X) = rq/p²,

while the mean is E(X) = r/p.
Example 5.10: Jim is writing an exam with multiple-choice questions, and his probability of attempting
the question with the right answer is 60%. What is the probability that Jim gives the third correct answer
for the fifth attempted question?
Solution:
Probability of success P(s) = 60% = 0.6; probability of failure P(f) = 40% = 0.4. It is given that Jim gives the third correct answer on the fifth attempted question, so we can use the negative binomial distribution with k = 5, r = 3, p = 0.6, q = 0.4.
The formula for the negative binomial distribution gives

    B(x; r, p) = (k−1)C(r−1) pʳ q^(k−r) = 4C2 (0.6)³ (0.4)² = 6 × 0.216 × 0.16 = 0.20736.

Therefore the probability of Jim giving the third correct answer on his fifth attempted question is approximately 0.21.
Example 5.11: A geological study indicates that an exploratory oil well drilled in a particular region
should strike oil with probability 0.2. Find the probability that the third oil strike comes on the fifth well
drilled.
Solution
Assuming independent drillings and probability 0.2 of striking oil with any one well, let Y denote the
number of the trial on which the third oil strike occurs. Then it is reasonable to assume that Y has a
negative binomial distribution with p = 0.2. Because we are interested in r = 3 and y = 5,
    P(Y = 5) = p(5) = 4C2 (0.2)³ (0.8)² = 6 × 0.008 × 0.64 = 0.0307.
If r = 2, 3, 4, . . . and Y has a negative binomial distribution with success probability p, P(Y = y0) = p(y0)
can be found by using the R (or S-Plus) command dnbinom(y0-r,r,p). If we wanted to use R to obtain p(5) in Example 5.11, we use the command dnbinom(2,3,.2). Alternatively, P(Y ≤ y0) is found by using the R
(or S-Plus) command pnbinom(y0-r,r,p). Note that the first argument in these commands is the value y0 −
r , not the value y0. This is because some authors prefer to define the negative binomial distribution to be
that of the random variable
Y* = the number of failures before the rth success. In our formulation, the negative binomial random
variable, Y , is interpreted as the number of the trial on which the rth success occurs. In Exercise 3.100,
you will see that Y*= Y − r . Due to this relationship between the two versions of negative binomial
random variables,
P(Y = y0) = P(Y − r = y0 − r ) = P(Y*= y0 − r ). R computes probabilities associated with Y*, explaining
why the arguments for dnbinom and pnbinom are y0 − r instead of y0.
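For readers without R, the same probabilities follow directly from the trial-number pmf. A Python sketch; the helper name is mine. (For reference, `scipy.stats.nbinom` counts failures before the r-th success, so the same y0 − r shift discussed above would apply there too.)

```python
# Negative binomial pmf in the trial-number formulation used in this text.
from math import comb

def nbinom_pmf(y, r, p):
    """P(Y = y): the r-th success occurs on trial y (y >= r)."""
    return comb(y - 1, r - 1) * p ** r * (1 - p) ** (y - r)

print(nbinom_pmf(5, 3, 0.2))   # Example 5.11: 6 * 0.008 * 0.64
print(nbinom_pmf(5, 3, 0.6))   # Example 5.10: 6 * 0.216 * 0.16
```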
distributions defined by the weighted sums of random variables. However, not all random variables have
moment-generating functions.
As its name implies, the moment-generating function can be used to compute a distribution‘s moments:
the nth moment about 0 is the nth derivative of the moment-generating function, evaluated at 0.
The moment-generating function of a real-valued distribution does not always exist, unlike the
characteristic function. There are relations between the behaviour of the moment-generating function of a
distribution and properties of the distribution, such as the existence of moments.
Thus, the moment generating function M_X(t) for a random variable X is defined by

    M_X(t) = E(e^(tX)) = Σ e^(tx) f(x)       (discrete case)

    M_X(t) = E(e^(tX)) = ∫ e^(tx) f(x) dx    (continuous case)

Expanding the exponential inside the integral,

    M_X(t) = ∫ [ 1 + tx + t²x²/2! + t³x³/3! + ... ] f(x) dx
           = ∫ f(x) dx + t ∫ x f(x) dx + (t²/2!) ∫ x² f(x) dx + (t³/3!) ∫ x³ f(x) dx + ...

so that

    M_X(t) = 1 + t μ₁′ + (t²/2!) μ₂′ + (t³/3!) μ₃′ + ...

where μₖ′ denotes the kth moment about the origin.
Example 5.12: Find the moment generating function for the continuous random variable with pdf

    f(x) = 4e^(−4x),  0 < x < ∞;  0 elsewhere.

We have

    M_X(t) = E(e^(tX)) = ∫₀^∞ e^(tx) 4e^(−4x) dx = 4 ∫₀^∞ e^(−(4−t)x) dx = [ −4 e^(−(4−t)x)/(4 − t) ]₀^∞ = 4/(4 − t),  for t < 4.

The mean, second moment, and third moment can then be obtained. With

    M_X(t) = 4/(4 − t),

the mean is obtained by taking the first derivative of the mgf:

    M′_X(t) = 4/(4 − t)²,  so  μ = M′_X(0) = 4/(4 − 0)² = 1/4.

The second moment is obtained by taking the second derivative of the mgf (the derivative of the first derivative):

    M″_X(t) = 8/(4 − t)³,  so  μ₂′ = M″_X(0) = 8/(4 − 0)³ = 1/8.

The third moment is obtained by taking the third derivative of the mgf (the derivative of the second derivative):

    M‴_X(t) = 24/(4 − t)⁴,  so  μ₃′ = M‴_X(0) = 24/256 = 3/32.
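The moments read off the mgf can be cross-checked by integrating xᵏ f(x) numerically. A sketch using a plain midpoint Riemann sum; the step size and cutoff are arbitrary choices, adequate here because the density decays quickly.

```python
# Numerical cross-check of the moments of f(x) = 4*exp(-4x).
from math import exp

def moment(k, step=1e-4, upper=10.0):
    total, x = 0.0, step / 2   # midpoint rule on [0, upper]
    while x < upper:
        total += x ** k * 4 * exp(-4 * x) * step
        x += step
    return total

print(moment(1), moment(2), moment(3))   # compare with 1/4, 1/8, 3/32
```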
Assignment 5.1
1. Given the pdf of a continuous random variable X as f(x) = m e^(−2x), 0 < x < ∞; 0 elsewhere. Find (a) the constant value m, (b) the expectation of X, (c) P(1 < X < 4).
2. Let X be a continuous random variable with the probability density function f(x) = 2e^(−4x), 0 < x < ∞; 0 elsewhere. Compute (a) E(X), (b) 3E(X + 1), (c) P(1 < X < 3).
thousands of naira?
8. If the moment generating function of a random variable X is given as

    M_X(t) = (e^(3t) − e^(2t))/t,  where t ≠ 0,

compute the mean, second moment, third moment, and the variance.
9. Let X denote a continuous random variable with pdf given as f(x) = (1/3) exp(−x/3), 0 < x < ∞. Obtain (a) P(X > 9), (b) P(2 < X < 4).
10. Find the mgf of the continuous random variable X with pdf given as f(x) = 1, 0 < x < 1; 0 elsewhere. Compute the mean, second moment, third moment, and the variance.
11. Obtain the moment generating function of the Normal distribution
12. Obtain the moment generating function of the Gompertz distribution
13. Obtain the moment generating function of the Alpha Power Gompertz distribution
14. Obtain the moment generating function of the Alpha Power Muth-G distribution
15. Obtain the moment generating function of the exponential distribution
16. Obtain the moment generating function of the Weibull distribution
17. Obtain the moment generating function of the Alpha Power Gompertz distribution
18. Obtain the moment generating function of the Chi-Square distribution
19. Obtain the moment generating function of the Cauchy distribution
20. Obtain the moment generating function of the shifted Gompertz-G distribution
SOLUTION
5.11 Probability Generating Functions
In probability theory, the probability generating function of a discrete random variable is a power series
representation (the generating function) of the probability mass function of the random variable.
Probability generating functions are often employed for their succinct description of the sequence of
probabilities Pr(X = i) in the probability mass function for a random variable X, and to make available the
well-developed theory of power series with non-negative coefficients.
The probability generating function gives an alternative method of finding the mathematical expectation
of a discrete random variable. The probability generating function helps in finding the probability
distributions and properties of the discrete random variable.
The probability generating function of a random variable X is given as

    P(t) = E(t^X) = Σ t^x f(x), the sum running over x = 0, 1, 2, ...

Suppose we differentiate P(t) = E(t^X) repeatedly; we have

    P′(t) = E(X t^(X−1)) = Σ x t^(x−1) f(x)
    P″(t) = E(X(X − 1) t^(X−2)) = Σ x(x − 1) t^(x−2) f(x)
    P‴(t) = E(X(X − 1)(X − 2) t^(X−3)) = Σ x(x − 1)(x − 2) t^(x−3) f(x)
    .
    .
    .
    P⁽ᵏ⁾(t) = E(X(X − 1)(X − 2)...(X − k + 1) t^(X−k)) = Σ x(x − 1)(x − 2)...(x − k + 1) t^(x−k) f(x)

Setting t = 1, we obtain

    P′(1) = E(X) = μ₁′
    P″(1) = E(X(X − 1)) = E(X²) − E(X)
    P‴(1) = E(X(X − 1)(X − 2))
    .
    .
    .
    P⁽ᵏ⁾(1) = E(X(X − 1)(X − 2)...(X − k + 1))

Hence, from P″(1) = E(X(X − 1)) = E(X²) − E(X), we get

    E(X²) = P″(1) + E(X) = P″(1) + P′(1),

and the variance is given as

    Var(X) = E(X²) − [E(X)]² = P″(1) + P′(1) − [P′(1)]².
Example 5.13: Find the pgf and the mean for the geometric random variable.
The pmf of the geometric distribution is given as f(x) = p q^(x−1). Thus, the solution of the example is given as
    P(t) = E(t^X) = Σ t^x p q^(x−1) = (p/q) Σ (qt)^x = (p/q)(qt + q²t² + q³t³ + ...)

However, since the terms in the series form an infinite geometric progression, for |t| ≤ 1 we have |qt| < 1 and

    qt + q²t² + q³t³ + ... = qt/(1 − qt).

Thus,

    P(t) = (p/q) × qt/(1 − qt) = pt/(1 − qt).

The mean is obtained by taking the first derivative as

    P′(t) = d/dt [ pt/(1 − qt) ] = [ (1 − qt)p + qpt ] / (1 − qt)² = p/(1 − qt)².

Now, setting t = 1, we have

    P′(1) = p/(1 − q)² = p/p² = 1/p.
Example 5.14: Find the probability generating function for the binomial random variable with parameter
n, p and obtain the mean and variance.
    P(t) = E(t^X) = Σ t^x nCx pˣ qⁿ⁻ˣ = Σ nCx (tp)ˣ qⁿ⁻ˣ = (q + pt)ⁿ, for all t.

    P′(t) = d/dt (q + pt)ⁿ = np(q + pt)ⁿ⁻¹.

Setting t = 1, we have

    E(X) = P′(1) = np(q + p)ⁿ⁻¹ = np.

For the variance, Var(X) = E(X²) − [E(X)]² = P″(1) + P′(1) − [P′(1)]². But

    P″(t) = d/dt [ np(q + pt)ⁿ⁻¹ ] = n(n − 1)p²(q + pt)ⁿ⁻².

Setting t = 1, we have P″(1) = n(n − 1)p², so that

    Var(X) = n(n − 1)p² + np − (np)² = np(1 − p) = npq.
Example 5.15: Find the probability generating function when the pmf of X is defined by

    f(x) = (5 − x)/10,  x = 1, 2, 3, 4.
The solution is obtained as

    P(t) = E(t^X) = Σ t^x f(x) = Σ t^x (5 − x)/10,  x = 1, 2, 3, 4.

For x = 1, f(1) = (5 − 1)/10 = 4/10 = 0.4; for x = 2, f(2) = (5 − 2)/10 = 0.3; for x = 3, f(3) = (5 − 3)/10 = 0.2; and for x = 4, f(4) = (5 − 4)/10 = 0.1. Thus,

    P(t) = E(t^X) = 0.4t + 0.3t² + 0.2t³ + 0.1t⁴.
Assignment 5.2
1. Find the pgf when the pmf of a random variable X is defined by
a. f(x) = x/6, x = 1, 2, 3.
b. f(x) = 0.3ˣ 0.7^(1−x), x = 0, 1.
SOLUTION
5.12 Sampling Distributions
In this section, we present methods for finding the distributions of functions of random variables.
Throughout this chapter, we will be working with functions of the variables Y1 , Y2 , Y3 ,...,Yn observed in a
random sample selected from a population of interest.
The random variables Y1 , Y2 , Y3 ,..., Yn are independent and have the same distribution. Certain functions of
the random variables observed in a sample are used to estimate or make decisions about unknown
population parameters.
For example, suppose that we want to estimate a population mean μ. If we obtain a random sample of n observations, y₁, y₂, y₃, ..., yₙ, it seems reasonable to estimate μ with the sample mean

    ȳ = ( Σ yᵢ ) / n
The goodness of this estimate depends on the behaviour of the random variables Y1 , Y2 , Y3 ,..., Yn and the effect that
n
y i
this behaviour has on y i 1
. Notice that the random variable Y is a function of (only) the random variables
n
Y1 , Y2 , Y3 ,..., Yn and the (constant) sample size n. The random variable Y is therefore an example of a statistic.
A statistic is a function of the observable random variables in a sample and known constants.
Thus,
n
yi n y n E y n n
E Y E i 1 E i i
i
n i 1 n i 1 n i 1 n n
n
yi n y n Var yi n i n i
2 2 2
Var Y Var i 1 Var i
n i 1 n
2 2
n i 1 i 1 n n2 n
Example 5.16: A single fair die is tossed once. Let Y be the number facing up. Find the expected value and variance of Y. Also, suppose a balanced die is tossed three times. Let Y₁, Y₂, and Y₃ denote the number of spots observed on the upper face for tosses 1, 2, and 3, respectively, and suppose we are interested in Ȳ = (Y₁ + Y₂ + Y₃)/3, the average number of spots observed in a sample of size 3.
E(Y) = Σ y p(y) = (1/6)(1 + 2 + 3 + 4 + 5 + 6) = 21/6 = 3.5.
The variance can be obtained as
Var(Y) = σ² = Σ y² p(y) - μ² = (1/6)(1² + 2² + 3² + 4² + 5² + 6²) - 3.5² = 2.9167.
For the average of the three tosses,
E(Ȳ) = E[(Y₁ + Y₂ + Y₃)/3] = μ = 3.5,
Var(Ȳ) = Var[(Y₁ + Y₂ + Y₃)/3] = σ²/n = 2.9167/3 = 0.9722.
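The results E(Ȳ) = μ and Var(Ȳ) = σ²/n can also be checked by simulation; a Python sketch for the three-toss die example (the replication count is arbitrary):

```python
# Simulation sketch of Example 5.16: three fair-die tosses, sample mean Ybar.
# Theory: E(Ybar) = 3.5 and Var(Ybar) = 2.9167/3 = 0.9722.
import random
random.seed(1)

n, reps = 3, 200_000
means = []
for _ in range(reps):
    tosses = [random.randint(1, 6) for _ in range(n)]
    means.append(sum(tosses) / n)

est_mean = sum(means) / reps
est_var = sum((m - est_mean) ** 2 for m in means) / reps

assert abs(est_mean - 3.5) < 0.02      # near mu
assert abs(est_var - 0.9722) < 0.02    # near sigma^2 / n
```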
Example 5.17: Let the random sample 2, 7, 8, 10, and 15 be drawn from a bag. Compute E(Ȳ) and Var(Ȳ).
The solution is obtained as
E(Ȳ) = μ = (2 + 7 + 8 + 10 + 15)/5 = 42/5 = 8.4,
Var(Ȳ) = σ²/n = 17.84/5 = 3.568, since σ² = (1/5) Σ (yᵢ - 8.4)² = 89.2/5 = 17.84.
5.14.1.1 The Law of Large Numbers for Binomial Trials
If X is the number of successes in n binomial trials with probability of success p on each trial, the mean is np and the variance is npq. We can deduce that the fraction of successes X/n provides an estimate of p, and we expect that as n becomes large the estimate moves closer to p.
Note that P(|X/n - p| ≥ ε) = P(|X - np| ≥ nε), where ε > 0.
More generally, Chebyshev's inequality states that P(|X̄ - μ| ≥ kσ_X̄) ≤ 1/k², where for a sample mean σ²_X̄ = σ²/n. Let ε = kσ/√n, so that k = ε√n/σ. Then
P(|X̄ - μ| ≥ ε) ≤ σ²/(nε²),
and
σ²/(nε²) → 0 as n → ∞.
Therefore,
lim_{n→∞} P(|X̄ - μ| ≥ ε) = 0.
Also, since we know that
P(|X̄ - μ| < ε) = 1 - P(|X̄ - μ| ≥ ε) → 1 as n → ∞,
we have
P(|X̄ - μ| < ε) → 1.
Thus, the proof shows that for a large number n of observations, the sample mean X̄ will be very close to, i.e. converges in probability to, the population mean μ.
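A small simulation illustrates the law of large numbers: the probability of a deviation of at least ε from p shrinks as n grows. The Bernoulli parameter, deviation ε, sample sizes, and replication count below are all illustrative:

```python
# Sketch: P(|Xbar - p| >= eps) shrinks as n grows (Bernoulli population).
# p, eps, sample sizes and replication count are illustrative.
import random
random.seed(2)

p, eps, reps = 0.4, 0.05, 5_000

def exceed_fraction(n):
    count = 0
    for _ in range(reps):
        xbar = sum(random.random() < p for _ in range(n)) / n
        count += abs(xbar - p) >= eps
    return count / reps

f_small = exceed_fraction(20)
f_large = exceed_fraction(500)
assert f_large < f_small   # deviation probability decreases with n
assert f_large < 0.1
```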
Example 5.18: Let X̄ be the mean of a random sample of size 20 from a distribution whose pdf is f(x) = 3x²/8, 0 < x < 2. Find
1. P(X̄ ≤ 1.3)
2. P(X̄ ≥ 1.42)
3. P(1.2 ≤ X̄ ≤ 1.45)
First, μ = E(X) = ∫_0^2 x · (3/8)x² dx = (3/8) ∫_0^2 x³ dx = (3/8)[x⁴/4]_0^2 = 12/8 = 1.5, and also
E(X²) = ∫_0^2 x² · (3/8)x² dx = (3/8) ∫_0^2 x⁴ dx = (3/8)[x⁵/5]_0^2 = 12/5 = 2.4.
Thus,
Var(X) = E(X²) - μ² = 2.4 - 1.5² = 0.15.
By the central limit theorem, X̄ is approximately normal with mean 1.5 and standard error √(0.15/20) = 0.0866.
1. P(X̄ ≤ 1.3) = P(Z ≤ (1.3 - 1.5)/0.0866) = P(Z ≤ -0.2/0.0866) = P(Z ≤ -2.31) = 0.0104.
2. P(X̄ ≥ 1.42) = 1 - P(Z ≤ (1.42 - 1.5)/0.0866) = 1 - P(Z ≤ -0.92) = 1 - 0.1788 = 0.8212.
3. P(1.2 ≤ X̄ ≤ 1.45) = P((1.2 - 1.5)/0.0866 ≤ Z ≤ (1.45 - 1.5)/0.0866) = P(-3.46 ≤ Z ≤ -0.58) = P(Z ≤ -0.58) - P(Z ≤ -3.46) = 0.281 - 0.0003 = 0.2807.
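Part 1 of Example 5.18 can be cross-checked without tables by computing the standard normal cdf from the error function, Φ(z) = (1 + erf(z/√2))/2:

```python
# Cross-check of Example 5.18, part 1, using Phi(z) = (1 + erf(z/sqrt(2)))/2.
from math import erf, sqrt

def phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, var, n = 1.5, 0.15, 20
se = sqrt(var / n)                  # ≈ 0.0866
z = (1.3 - mu) / se                 # ≈ -2.31
prob = phi(z)                       # P(Xbar <= 1.3)
assert abs(prob - 0.0104) < 0.001   # agrees with the table value
```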
Example 5.19: Let X₁, X₂, ..., X₂₅ denote a random sample of size 25 from the uniform distribution having E(Xᵢ) = 1/5 and Var(Xᵢ) = 1/10. Let Y be the sum of the random sample; that is, Y = X₁ + X₂ + ... + X₂₅. Compute
a. P(Y ≤ 6.35)
b. P(4.25 ≤ Y ≤ 5.90)
By the central limit theorem, Z = (Y - nμ)/√(nσ²) is approximately standard normal. Here n = 25, μ = 1/5, and σ² = 1/10, so nμ = 5.0 and √(nσ²) = √2.5 = 1.581.
Thus, the probability that the sum of the random sample will be less than or equal to 6.35 is given as
a. P(Y ≤ 6.35) = P(Z ≤ (6.35 - 5.0)/1.581) = P(Z ≤ 0.85) = 0.8023.
b. P(4.25 ≤ Y ≤ 5.90) = P((4.25 - 5.0)/1.581 ≤ Z ≤ (5.90 - 5.0)/1.581) = P(-0.47 ≤ Z ≤ 0.57) = P(Z ≤ 0.57) - P(Z ≤ -0.47) = 0.7157 - 0.3192 = 0.3965.
Assignment 5.3
1. Suppose the heights of workers are normally distributed with μ = 1.69 m and σ = 0.32 m. Compute
i. The probability that the mean height of a group of 16 workers will be greater than 1.55 m
ii. The probability that the sum of the heights of 20 workers will be between 29.6 m and 34.5 m
iii. The probability that the mean height of a group of 16 workers will be less than 1.55 m.
2. Find P(X̄ > 48) for a sample of size 81 drawn from a population with mean 45 and standard deviation 9.
SOLUTION
Chapter Six
6.1 Introduction to Continuous Probability Distributions
The previous chapter dealt with discrete random variables. In this chapter, however, we shall consider continuous random variables. A random variable X is said to be a continuous random variable if it takes on any value in an interval, so that its set of possible values is uncountable; that is, x ∈ ℝ. A good example is the height of statistics students in the Department of Statistics, Federal University of Petroleum Resources, Effurun, Delta State, which can take uncountably many values in an interval of real numbers.
6.1.1 Probability Distribution
Suppose X is a continuous random variable. Then the distribution function of X is defined as
F(x) = ∫_{-∞}^x f(t) dt,
and the probability that X will fall within the interval (a, b) is given as
P(a ≤ X ≤ b) = ∫_a^b f(x) dx.
The probability density function is a theoretical model for the frequency distribution (histogram) of a
population of measurements. For example, observations of the lengths of life of washers of a particular
brand will generate measurements that can be characterized by a relative frequency histogram.
Conceptually, the experiment could be repeated ad infinitum, thereby generating a relative frequency
distribution (a smooth curve) that would characterize the population of interest to the manufacturer. This
theoretical relative frequency distribution corresponds to the probability density function for the length of
life of a single machine, Y. This is shown in Figure 6.1.
The density function f(x) and distribution function F(x) satisfy the following properties:
a. 0 ≤ F(x) ≤ 1
b. f(x) is a non-negative function
c. F(-∞) = 0 and F(∞) = 1
d. F(x_b) ≥ F(x_a) if x_b > x_a
e. F(x) = ∫_{-∞}^x f(t) dt is the cumulative distribution function
f. ∫_{-∞}^∞ f(x) dx = 1
g. f(x) = dF(x)/dx = F'(x).
Example 6.1: A function f(x) is defined as f(x) = kx², 0 ≤ x ≤ 3. Find the constant k, F(x), and P(1 ≤ X ≤ 3).
The constant k is obtained from
∫_0^3 kx² dx = k[x³/3]_0^3 = k(3³/3 - 0³/3) = 9k = 1,
so that
k = 1/9.
The cumulative distribution function is obtained as
F(x) = ∫_0^x f(t) dt = (1/9) ∫_0^x t² dt = (1/9)[t³/3]_0^x = x³/27, 0 ≤ x ≤ 3.
Finally,
P(1 ≤ X ≤ 3) = ∫_1^3 (1/9)x² dx = (1/9)[x³/3]_1^3 = (1/9)(27/3 - 1/3) = 26/27.
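The values k = 1/9 and P(1 ≤ X ≤ 3) = 26/27 can be confirmed by numerical integration; a simple midpoint-rule sketch in Python:

```python
# Midpoint-rule check of Example 6.1: f(x) = x^2/9 on [0, 3].
def integrate(g, a, b, steps=100_000):
    h = (b - a) / steps
    return sum(g(a + (i + 0.5) * h) for i in range(steps)) * h

f = lambda x: x**2 / 9
total = integrate(f, 0, 3)   # density should integrate to 1
prob = integrate(f, 1, 3)    # should equal 26/27
assert abs(total - 1) < 1e-6
assert abs(prob - 26 / 27) < 1e-6
```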
Example 6.2: Suppose that Y has the distribution function
F(y) = 0 for y < 0,
F(y) = y for 0 ≤ y ≤ 1,
F(y) = 1 for y > 1.
Find the probability density function for Y and graph it.
Differentiating F(y) piecewise gives
f(y) = F'(y) = d(0)/dy = 0 for y < 0,
f(y) = d(y)/dy = 1 for 0 < y < 1,
f(y) = d(1)/dy = 0 for y > 1,
and f(y) is undefined at y = 0 and y = 1. A graph of F(y) is shown in Figure 6.2.
Figure 6.3: Distribution function of F(y)
The graph of f(y) for Example 6.2 is shown in Figure 6.4. Notice that the distribution and density functions given in Example 6.2 have all the properties required of distribution and density functions, respectively. Moreover, F(y) is a continuous function of y, but f(y) is discontinuous at the points y = 0 and y = 1. In general, the distribution function of a continuous random variable must be continuous, but the density function need not be everywhere continuous.
Example 6.3: Let Y be a continuous random variable with probability density function given by
f(y) = 3y² for 0 ≤ y ≤ 1, and f(y) = 0 elsewhere.
Find the distribution function F(y). Thus,
F(y) = ∫_{-∞}^y 0 dt = 0, for y < 0,
F(y) = ∫_{-∞}^0 0 dt + ∫_0^y 3t² dt = 0 + [t³]_0^y = y³, for 0 ≤ y ≤ 1,
F(y) = ∫_{-∞}^0 0 dt + ∫_0^1 3t² dt + ∫_1^y 0 dt = 0 + [t³]_0^1 + 0 = 1, for 1 ≤ y.
Notice that some of the integrals that we evaluated yield a value of 0. These are included for completeness in this initial example. In future calculations, we will not explicitly display any integral that has value 0. The graph of F(y) is given in Figure 6.5.
If the random variable Y has density function f(y) and a < b, then the probability that Y falls in the interval (a, b) is
P(a ≤ Y ≤ b) = F(b) - F(a) = ∫_a^b f(y) dy.
Example 6.4: Let Y be a random variable with p(y) given in the table below.
y:     1    2    3    4
p(y):  0.4  0.3  0.2  0.1
a. Give the distribution function, F(y). Be sure to specify the value of F(y) for all y, -∞ < y < ∞.
b. Sketch the distribution function given in part (a).
a. F(y) = 0 for y < 1; 0.4 for 1 ≤ y < 2; 0.7 for 2 ≤ y < 3; 0.9 for 3 ≤ y < 4; 1 for y ≥ 4.
b. The graph of the distribution function is given as
Assignment 6.1
1. A box contains five keys, only one of which will open a lock. Keys are randomly selected and tried, one at a time, until the lock is opened (keys that do not work are discarded before another is tried). Let Y be the number of the trial on which the lock is opened.
a. Find the probability function for Y.
b. Give the corresponding distribution function.
c. What is P(Y < 3)? P(Y ≤ 3)? P(Y = 3)?
d. If Y is a continuous random variable, we argued that, for all -∞ < a < ∞, P(Y = a) = 0. Do any of your answers in part (c) contradict this claim? Why?
SOLUTION
6.2 Mean or Expectation, and Variance of continuous probability distributions
The next step in the study of continuous random variables is to find their means, variances, and standard
deviations, thereby acquiring numerical descriptive measures associated with their distributions. Many
times it is difficult to find the probability distribution for a random variable Y or a function of a random
variable, g(Y). Even if the density function for a random variable is known, it can be difficult to evaluate
appropriate integrals (we will see this to be the case when a random variable has a gamma distribution,
Section 4.6). When we encounter these situations, the approximate behavior of variables of interest can be
established by using their moments and the empirical rule or Tchebysheff‘s theorem.
The expected value of a continuous random variable X is defined as
E(X) = ∫_{-∞}^∞ x f(x) dx,
provided that the integral exists, and where the function f(x) is the probability density function.
In more general form, let g(X) be a function of X; then the expected value of g(X) is given as
E[g(X)] = ∫_{-∞}^∞ g(x) f(x) dx.
The variance of a random variable X is given as
Var(X) = σ² = E[(X - μ)²] = ∫_{-∞}^∞ (x - μ)² f(x) dx = E(X²) - [E(X)]².
Example 6.5: Suppose that the length of a piece of wood is a continuous random variable with pdf given as f(x) = (3/2)x² + x, 0 < x < 1, and f(x) = 0 elsewhere. Find the expected value and the variance.
a. E(X) = ∫_0^1 x f(x) dx = ∫_0^1 [(3/2)x³ + x²] dx = [(3/8)x⁴ + x³/3]_0^1 = 3/8 + 1/3 = 9/24 + 8/24 = 17/24 ≈ 0.708.
b. The variance is given as Var(X) = σ² = E(X²) - μ². Thus, we obtain
E(X²) = ∫_0^1 x² f(x) dx = ∫_0^1 [(3/2)x⁴ + x³] dx = [(3/10)x⁵ + x⁴/4]_0^1 = 3/10 + 1/4 = 11/20 = 0.55.
Thus,
Var(X) = σ² = E(X²) - μ² = 0.55 - 0.708² = 0.55 - 0.5013 = 0.0487.
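The two moments in Example 6.5 are easy to confirm numerically with a midpoint rule:

```python
# Midpoint-rule check of Example 6.5: f(x) = (3/2)x^2 + x on (0, 1).
def integrate(g, a, b, steps=100_000):
    h = (b - a) / steps
    return sum(g(a + (i + 0.5) * h) for i in range(steps)) * h

f = lambda x: 1.5 * x**2 + x
mean = integrate(lambda x: x * f(x), 0, 1)     # 17/24 ≈ 0.708
ex2 = integrate(lambda x: x**2 * f(x), 0, 1)   # 11/20 = 0.55
var = ex2 - mean**2                            # ≈ 0.0487
assert abs(mean - 17 / 24) < 1e-6
assert abs(var - 0.0487) < 1e-3
```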
6.2.1 Some Properties of the Mean and Variance. Let X and Y be random variables, and a and b constants.
a. E(a) = a
b. E(X + b) = E(X) + b
c. E(aX) = aE(X)
d. E(aX + bY) = aE(X) + bE(Y)
e. Var(a) = 0
f. Var(aX) = a²Var(X)
g. Var(X + b) = Var(X)
Assignment 6.2
Let X be a random variable with pdf given as p(x) = 4 - 2x, 1 < x < 2, and zero elsewhere. Find E(6X), E(X²), and Var(X + 10).
SOLUTION
6.3 Normal distribution
The most widely used continuous probability distribution is the normal distribution, a distribution with
the familiar bell shape that was discussed in connection with the empirical rule. The examples and
exercises in this section illustrate some of the many random variables that have distributions that are
closely approximated by a normal probability distribution. In Chapter 7 we will present an argument that
at least partially explains the common occurrence of normal distributions of data in nature. The normal
density function is as follows:
A random variable Y is said to have a normal probability distribution if and only if, for σ > 0 and -∞ < μ < ∞, the density function of Y is
f(y) = (1/(σ√(2π))) exp[-(y - μ)²/(2σ²)], -∞ < y < ∞.
The normal density function contains two parameters, μ and σ. Figure 6.6 shows the normal density function.
Unfortunately, a closed-form expression for this integral does not exist; hence, its evaluation requires the
use of numerical integration techniques.
The normal density function is symmetric around the value μ, so areas need be tabulated on only one side of the mean. The tabulated areas are to the right of points zσ, where z is the distance from the mean, measured in standard deviations. This area is shaded in Figure 6.7.
Figure 6.7: Tabulated area for the normal Probability density function
6.3.1 Properties of the Normal Distribution
1. The normal distribution is symmetric about the mean
2. It is centered at the mean
3. ∫_{-∞}^∞ f(x) dx = 1; that is, the total area under the normal distribution curve is 1
4. Approximately 68% of a normal population lies within 1 standard deviation of the mean
5. Approximately 95% of a normal population lies within 2 standard deviations of the mean
6. The probability that a randomly selected member X of a normal population lies between two values x_L and x_R, P(x_L ≤ X ≤ x_R), is precisely equal to the area under the normal curve between x_L and x_R.
If X is a normally distributed random variable with parameters μ and σ², then the mean and variance of X are given as
E(X) = μ and Var(X) = σ².
The moment generating function of the normal distribution is given as
M_X(t) = exp(μt + σ²t²/2).
We can always transform a normal random variable X to a standard normal random variable Z by using the relationship
Z = (X - μ)/σ.
Thus,
f(z) = (1/√(2π)) exp(-z²/2), -∞ < z < ∞.
This is often called the standard normal distribution, with mean zero and standard deviation 1.
Example 6.6: Let Z denote a normal random variable with mean 0 and standard deviation 1. Find P(-2 ≤ Z ≤ 2).
P(-2 ≤ Z ≤ 2) = 1 - 2P(Z > 2) = 1 - 2(0.0228) = 0.9544. The area is shown in Figure 6.8.
Recall that z is the distance from the mean of a normal distribution expressed in units of standard deviation. Suppose now that X is normally distributed with μ = 75 and σ = 10, and that we want the fraction of the population lying between 80 and 90. We convert each value to the desired z-score using
Z = (X - μ)/σ.
Thus, the desired fraction of the population is given by the area between
Z₁ = (80 - 75)/10 = 0.5 and Z₂ = (90 - 75)/10 = 1.5.
Figure 6.9 gives the shape.
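Both areas above can be reproduced without a normal table using Φ(z) = (1 + erf(z/√2))/2:

```python
# Normal areas from the error function instead of tables.
from math import erf, sqrt

def phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

assert abs((phi(2) - phi(-2)) - 0.9544) < 0.001   # P(-2 <= Z <= 2)
frac = phi(1.5) - phi(0.5)   # area between 80 and 90 when mu=75, sigma=10
assert abs(frac - 0.2417) < 0.001
```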
SOLUTION
6.4 Uniform distribution.
In probability theory and statistics, the continuous uniform distributions or rectangular distributions are a
family of symmetric probability distributions. Such a distribution describes an experiment where there is
an arbitrary outcome that lies between certain bounds. The bounds are defined by the parameters, a and
b which are the minimum and maximum values. The interval can either be closed (i.e. a, b ) or open
(i.e. a, b ). Therefore, the distribution is often abbreviated U a, b where U stands for uniform
distribution. The difference between the bounds defines the interval length; all intervals of the same
length on the distribution's support are equally probable. It is the maximum entropy probability
distribution for a random variable X under no constraint other than that it is contained in the
distribution's support.
The random variable X is said to have a uniform distribution on the interval [a, b] if its pdf is
f(x) = 1/(b - a), a ≤ x ≤ b.
The figure below shows the uniform distribution with parameters a and b real, such that b > a.
The mean is
E(X) = ∫_a^b x/(b - a) dx = (1/(b - a)) ∫_a^b x dx = (1/(b - a))[x²/2]_a^b = (b² - a²)/(2(b - a)) = (a + b)/2.
Example 6.7: Arrivals of customers at a checkout counter follow a Poisson distribution. It is known that,
during a given 30-minute period, one customer arrived at the counter. Find the probability that the
customer arrived during the last 5 minutes of the 30-minute period.
The solution is obtained as follows. As just mentioned, given that one customer arrived, the actual time of arrival follows a uniform distribution over the interval (0, 30). If Y denotes the arrival time, then
P(25 ≤ Y ≤ 30) = ∫_25^30 (1/30) dy = (30 - 25)/30 = 5/30 = 1/6.
The probability of the arrival occurring in any other 5-minute interval is also 1/6.
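A quick Monte Carlo sketch of Example 6.7 (the replication count is arbitrary):

```python
# Monte Carlo sketch of Example 6.7: a uniform(0, 30) arrival time lands in
# the last 5 minutes with probability 1/6.
import random
random.seed(3)

reps = 100_000
hits = sum(1 for _ in range(reps) if random.uniform(0, 30) > 25)
assert abs(hits / reps - 1 / 6) < 0.01
```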
Assignment 6.4
1. Show that the variance of the uniform distribution is given as Var(X) = (b - a)²/12.
2. Show that the mgf of the uniform distribution is M_X(t) = (e^{bt} - e^{at})/(t(b - a)).
SOLUTION
6.5 Exponential distribution.
The exponential distribution for a continuous random variable is closely related to the Poisson distribution of the discrete case.
A random variable X has an exponential distribution if and only if its probability density function is defined as
f(x) = λ exp(-λx) for x ≥ 0, and f(x) = 0 elsewhere.
However, in some cases it is represented in the scale form
f(x) = (1/β) exp(-x/β) for x ≥ 0, and f(x) = 0 elsewhere, where β = 1/λ.
If X is a random variable that has an exponential distribution, then the moment generating function, mean and variance of X are given as
M_X(t) = 1/(1 - βt), μ = E(X) = β, and σ² = Var(X) = β².
Example 6.8: Suppose X has an exponential distribution with mean 20. Find
a. P(X ≤ 18)
b. P(X > 18)
a. P(X ≤ 18) = ∫_0^18 (1/20) exp(-x/20) dx = [-exp(-x/20)]_0^18 = 1 - exp(-18/20) = 1 - 0.4066 = 0.5934.
b. P(X > 18) = 1 - P(X ≤ 18) = 1 - 0.5934 = 0.4066.
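The same probabilities follow directly from the closed-form cdf F(x) = 1 - e^{-x/β}:

```python
# Direct cdf check of Example 6.8: F(x) = 1 - exp(-x/beta) with beta = 20.
from math import exp

beta = 20
p_le = 1 - exp(-18 / beta)   # P(X <= 18)
p_gt = 1 - p_le              # P(X > 18)
assert abs(p_le - 0.5934) < 0.001
assert abs(p_gt - 0.4066) < 0.001
```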
6.6 Gamma distribution.
A random variable Y is said to have a gamma distribution with parameters α > 0 and β > 0 if and only if the density function of Y is
f(y) = y^{α-1} e^{-y/β} / (β^α Γ(α)) for 0 ≤ y < ∞, and f(y) = 0 elsewhere,
where
Γ(α) = ∫_0^∞ y^{α-1} e^{-y} dy.
The quantity Γ(α) is known as the gamma function. A direct integration validates that Γ(1) = 1. Also, we note that Γ(α) = (α - 1)Γ(α - 1) for any α > 1, and Γ(n) = (n - 1)!, provided that n is an integer. The graphs of the gamma density for α = 1, 2, 4 and β = 1 are given in Figure 6.12.
For this reason, α is sometimes called the shape parameter associated with a gamma distribution. The parameter β is generally called the scale parameter because multiplying a gamma-distributed random variable by a positive constant (and thereby changing the scale on which the measurement is made) produces a random variable that also has a gamma distribution with the same value of α (shape parameter) but with an altered value of β.
If Y is a gamma random variable with parameters α and β, then the mean and variance of the gamma distribution are given as
μ = E(Y) = αβ and σ² = Var(Y) = αβ².
This can be proven as shown below.
By definition, the gamma density function is such that
∫_0^∞ y^{α-1} e^{-y/β} / (β^α Γ(α)) dy = 1.
Hence,
∫_0^∞ y^{α-1} e^{-y/β} dy = β^α Γ(α).
Thus,
E(Y) = ∫_0^∞ y · y^{α-1} e^{-y/β} / (β^α Γ(α)) dy = (1/(β^α Γ(α))) ∫_0^∞ y^α e^{-y/β} dy = β^{α+1} Γ(α + 1)/(β^α Γ(α)) = αβ.
The variance is given as Var(Y) = σ² = E(Y²) - μ². Thus,
E(Y²) = ∫_0^∞ y² · y^{α-1} e^{-y/β} / (β^α Γ(α)) dy = (1/(β^α Γ(α))) ∫_0^∞ y^{α+1} e^{-y/β} dy = β^{α+2} Γ(α + 2)/(β^α Γ(α)) = α(α + 1)β².
Thus,
Var(Y) = σ² = E(Y²) - μ² = α(α + 1)β² - (αβ)² = α²β² + αβ² - α²β² = αβ².
Example 6.9: Suppose X has a gamma distribution with shape α = 3 and scale β = 1/4 (that is, rate λ = 1/β = 4). Find the (a) mean, (b) variance, (c) standard deviation, (d) P(X > 5), and (e) the pdf of X.
a. E(X) = αβ = 3(1/4) = 0.75.
b. E(X²) = α(α + 1)β² = (3)(4)(1/16) = 12/16, so
Var(X) = E(X²) - [E(X)]² = 12/16 - 9/16 = 3/16 = 0.1875.
c. Std(X) = √Var(X) = √0.1875 = 0.433.
d. The density is f(x) = λ³x²e^{-λx}/2! = 32x²e^{-4x}, so
P(X > 5) = 32 ∫_5^∞ x²e^{-4x} dx.
Integrating by parts twice (first u = x², dv = e^{-4x}dx, then u = x, dv = e^{-4x}dx) gives the antiderivative
∫ x²e^{-4x} dx = -(x²/4)e^{-4x} - (x/8)e^{-4x} - (1/32)e^{-4x} + C.
Evaluating from 5 to ∞,
∫_5^∞ x²e^{-4x} dx = (25/4 + 5/8 + 1/32)e^{-20} = (221/32)e^{-20},
so that
P(X > 5) = 32(221/32)e^{-20} = 221e^{-20} ≈ 4.56 × 10⁻⁷.
e. The pdf is
f(x) = x^{α-1}e^{-x/β}/(β^αΓ(α)) = 4³x²e^{-4x}/2! = 32x²e^{-4x}, 0 < x < ∞.
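Part (d) can be cross-checked by numerically integrating the density 32x²e^{-4x}:

```python
# Numeric check of Example 6.9(d): P(X > 5) = 221 e^{-20} for f(x) = 32 x^2 e^{-4x}.
from math import exp

def integrate(g, a, b, steps=200_000):
    h = (b - a) / steps
    return sum(g(a + (i + 0.5) * h) for i in range(steps)) * h

f = lambda x: 32 * x**2 * exp(-4 * x)
prob = integrate(f, 5, 30)   # tail beyond x = 30 is negligible
exact = 221 * exp(-20)
assert abs(prob - exact) < 1e-9
```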
6.7 Weibull distribution.
The Weibull distribution is a two-parameter continuous distribution of positive random variables that is
commonly used to describe the failure time of physical entities. It models a broad range of random
variables, largely in the nature of a time to failure or time between events. Examples are maximum one-
day rainfalls and the time a user spends on a web page. The distribution is named after Swedish
mathematician Waloddi Weibull, who described it in detail in 1939, although it was first identified by
Maurice René Fréchet and first applied by Rosin and Rammler (1933) to describe a particle size
distribution.
A random variable X has a Weibull distribution if and only if its probability density function is given as
f(x) = αβ x^{β-1} exp(-αx^β), x > 0.
The graph of the pdf is shown in Figure 6.12.
The cumulative distribution function (cdf) is given as
F(x) = 1 - exp(-αx^β),
with β > 0 and α > 0 as the shape and scale parameters respectively.
The cdf of the Weibull generalized family of distributions can be obtained from
F(x) = ∫_0^{G(x)/(1-G(x))} αβ t^{β-1} exp(-αt^β) dt = 1 - exp{-α [G(x)/(1 - G(x))]^β}.
This is obtained by substituting G(x)/(1 - G(x)) for x in the cdf of the Weibull distribution.
The pdf of the Weibull generalized family is obtained by differentiating the cdf with respect to x as
f(x) = αβ g(x) [G(x)]^{β-1} / [1 - G(x)]^{β+1} · exp{-α [G(x)/(1 - G(x))]^β}.
Assignment 6.5
a. Develop the pdf and cdf of Weibull exponential distribution
b. Develop the pdf and cdf of Weibull Frechet distribution
c. Develop the pdf and cdf of Weibull Lomax distribution
d. Develop the pdf and cdf of Weibull Gompertz distribution
e. Develop the pdf and cdf of Weibull Teissier distribution
f. Develop the pdf and cdf of Weibull Kumaraswamy distribution
g. Develop the pdf and cdf of Weibull Inverse exponential distribution
h. Develop the pdf and cdf of Weibull alpha power distribution
i. Develop the pdf and cdf of Weibull beta distribution
SOLUTION
6.8 Gompertz distribution
In probability and statistics, the Gompertz distribution is a continuous probability distribution, named
after Benjamin Gompertz. The Gompertz distribution is often applied to describe the distribution of adult
lifespans by demographers, and actuaries. Related fields of science such as biology and gerontology also
considered the Gompertz distribution for the analysis of survival. More recently, computer scientists have
also started to model the failure rates of computer code by the Gompertz distribution. In Marketing
Science, it has been used as an individual-level simulation for customer lifetime value modeling. In
network theory, particularly the Erdos–Renyi model, the walk length of a random self-avoiding walk
(SAW) is distributed according to the Gompertz distribution.
A random variable X has a Gompertz distribution if and only if its probability density function is given as
f(x) = αe^{θx} exp{(α/θ)(1 - e^{θx})}, x ≥ 0, α, θ > 0.
The cdf is given as
F(x) = 1 - exp{(α/θ)(1 - e^{θx})}, x ≥ 0, α, θ > 0.
The graph of the pdf is shown in Figure 6.13.
The cdf of the Gompertz generalized family of distributions is obtained from
F(x) = ∫_0^{W(G(x))} r(t) dt = 1 - exp{(α/θ)[1 - (1 - G(x))^{-θ}]},
where r(t) is the pdf of the Gompertz distribution and W(G(x)) = -log[1 - G(x)]. The corresponding pdf can be derived as follows:
f(x) = (d/dx) W(G(x)) · r(W(G(x))).
On simplification, we have
f(x) = α g(x) [1 - G(x)]^{-θ-1} exp{(α/θ)[1 - (1 - G(x))^{-θ}]}.
Assignment 6.6
a. Obtain the mean and variance of the Gompertz distribution
b. Develop the pdf and cdf of Gompertz exponential distribution
c. Develop the pdf and cdf of Gompertz Frechet distribution
d. Develop the pdf and cdf of Gompertz Lomax distribution
e. Develop the pdf and cdf of Gompertz Gompertz distribution
f. Develop the pdf and cdf of Gompertz Teissier distribution
g. Develop the pdf and cdf of Gompertz Kumaraswamy distribution
h. Develop the pdf and cdf of Gompertz Inverse exponential distribution
i. Develop the pdf and cdf of Gompertz alpha power distribution
j. Develop the pdf and cdf of Gompertz beta distribution
SOLUTION
6.9 Cauchy distribution.
The Cauchy distribution, named after Augustin Cauchy, is a continuous probability distribution. It is also
known, especially among physicists, as the Lorentz distribution (after Hendrik Lorentz), Cauchy–Lorentz
distribution, Lorentz(ian) function, or Breit–Wigner distribution. The Cauchy distribution is the
distribution of the x-intercept of a ray issuing from a fixed point with a uniformly distributed angle. It is also the
distribution of the ratio of two independent normally distributed random variables with mean zero.
The Cauchy distribution is often used in statistics as the canonical example of a "pathological"
distribution since both its expected value and its variance are undefined. The Cauchy distribution does not
have finite moments of order greater than or equal to one; only fractional absolute moments exist. The
Cauchy distribution has no moment generating function.
In mathematics, it is closely related to the Poisson kernel, which is the fundamental solution for the Laplace equation in the upper half-plane. It is one of the few stable distributions whose probability density function can be expressed analytically, the others being the normal distribution and the Levy distribution.
A random variable X is said to have a (standard) Cauchy distribution if its probability density function is defined as
f(x) = 1/[π(1 + x²)], -∞ < x < ∞.
This is the standard form of the distribution; a location parameter x₀ and a scale parameter γ can be introduced by replacing x with (x - x₀)/γ.
Assignment 6.7:
SOLUTION
6.10 Chi-Square distribution.
A chi-square distribution is a special case of the gamma distribution with parameters α = n/2 and β = 2 (equivalently, shape n/2 and rate 1/2). The useful part of this distribution is the fact that the sum of squares of n independent standard normal random variables has a chi-square distribution with n degrees of freedom. The pdf of the chi-square distribution with n degrees of freedom is given as
f(x) = x^{n/2 - 1} exp(-x/2) / (2^{n/2} Γ(n/2)), x ≥ 0.
Its expectation and variance are E(X) = n and σ² = Var(X) = 2n.
6.11 Beta distribution.
A random variable X has a beta distribution if its pdf is
f(x) = (1/B(α, β)) x^{α-1} (1 - x)^{β-1}, for 0 ≤ x ≤ 1, α > 0, β > 0,
where B(α, β) = Γ(α)Γ(β)/Γ(α + β).
The mean and variance of the beta distribution are given as
E(X) = α/(α + β) and Var(X) = αβ/[(α + β)²(α + β + 1)].
6.12 Quantile function.
The quantile function of a distribution is the inverse of its cdf,
Q(u) = F⁻¹(u).
For example, obtain the quantile function for the exponential distribution.
The cdf of the exponential distribution is given as
F(x) = 1 - exp(-λx).
Setting u = F(x) and solving for x,
u = 1 - exp(-λx)
1 - u = exp(-λx)
log(1 - u) = -λx
x = -(1/λ) log(1 - u).
Hence, the quantile function of the exponential distribution can be written as
Q(u) = -(1/λ) log(1 - u),
where u ∈ (0, 1) is a draw from the uniform distribution.
The quantile function is very useful for simulation purposes. This implies that random numbers can be generated for the exponential distribution using the expression Q(u) = -(1/λ) log(1 - u). The median is obtained by setting u = 1/2 in Q(u).
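The quantile function gives the inverse-transform sampling method: applying Q(u) to uniform draws produces exponential variates. A Python sketch with an arbitrary illustrative rate λ = 2:

```python
# Inverse-transform sampling via Q(u) = -(1/lam) * log(1 - u); lam = 2 is
# an arbitrary illustrative rate.
import random
from math import log
random.seed(4)

lam = 2.0
draws = [-(1 / lam) * log(1 - random.random()) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)
assert abs(sample_mean - 1 / lam) < 0.01   # exponential mean is 1/lam

# median from u = 1/2: Q(0.5) = log(2)/lam
assert abs(-(1 / lam) * log(0.5) - log(2) / lam) < 1e-12
```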
6.12.1 Survival function
It can also be called the reliability function. It is a function that gives the probability that a device or object of interest will survive beyond any specified time. It is calculated as
S(x) = Pr(X > x) = 1 - Pr(X ≤ x).
Thus,
S(x) = 1 - F(x).
The hazard (failure rate) function is
h(x) = f(x)/S(x),
and the cumulative hazard function is
H(t) = -ln S(t).
Also,
S(t) = e^{-H(t)} and f(t) = h(t) e^{-H(t)}.
The reversed hazard function is
r(x) = f(x)/F(x),
and the odds function is
O(x) = F(x)/S(x).
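For the exponential distribution these identities are easy to verify directly, since S(x) = e^{-λx} gives a constant hazard h(x) = λ and H(x) = λx:

```python
# Identities S = 1 - F, h = f/S, H = -ln S, S = e^{-H} in the exponential
# case; lam = 0.5 is illustrative.
from math import exp, log

lam = 0.5
f = lambda x: lam * exp(-lam * x)
F = lambda x: 1 - exp(-lam * x)
S = lambda x: 1 - F(x)
h = lambda x: f(x) / S(x)
H = lambda x: -log(S(x))

for x in (0.1, 1.0, 5.0):
    assert abs(h(x) - lam) < 1e-12          # constant hazard
    assert abs(S(x) - exp(-H(x))) < 1e-12   # S = e^{-H}
```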
The pdf of the kth order statistic Y_k from a random sample of size n with cdf F and pdf f is
g_k(y_k) = [n!/((k - 1)!(n - k)!)] [F(y_k)]^{k-1} f(y_k) [1 - F(y_k)]^{n-k}.    (23)
When n is odd, n = 2m + 1, the sample median is Y_{m+1}, and its pdf is
g_{m+1}(y_{m+1}) = [(2m + 1)!/(m! m!)] [F(y_{m+1})]^m f(y_{m+1}) [1 - F(y_{m+1})]^m.
iv. When n is even, n = 2m.
Exercise 6.7
1. Let X₍₁₎ < X₍₂₎ < ... < X₍ₙ₎ be the order statistics of a random sample of size n = 8 from the distribution with pdf f(x) = 2x, 0 < x < 1. Compute (i) the pdf of X₍₁₎, (ii) the pdf of X₍ₙ₎, (iii) Pr(1/2 < X₍₅₎ < 4/5).
2. (i) Find the sampling distribution of Y₍₁₎ and Y₍ₙ₎, the minimum and maximum order statistics of random samples of size n drawn from a continuous uniform distribution with parameters a = 0 and b = 1. (ii) If n is odd, find the sampling distribution of the median. (iii) If n = 5, obtain the distribution of the median.
3. Obtain the distribution of minimum and maximum order statistics for the Weibull
Exponential distribution.
Assignment 6.8
a. Obtain the quantile function of the Gompertz distribution
b. Develop the quantile function of the exponential distribution
c. Develop the quantile function of the Frechet distribution
d. Develop the quantile function of the Lomax distribution
e. Develop the quantile function of the Weibull distribution
f. Develop the quantile function of the Teissier distribution
g. Develop the quantile function of the Kumaraswamy distribution
h. Develop the quantile function of the exponential distribution
i. Develop the quantile function of the alpha power distribution
j. Develop the quantile function of the beta distribution
SOLUTION
6.13 Maximum Likelihood Estimation of Parameters of Continuous Probability Distributions.
Let w₁, w₂, ..., wₙ be a random sample from a population following the Weibull alpha power inverted exponential distribution, with pdf f(w; a, α, θ) and parameter vector ϑ = (a, α, θ)ᵀ. The likelihood of the sample is
L(ϑ) = ∏_{i=1}^n f(wᵢ; ϑ),
and the log-likelihood is
ℓ(ϑ) = Σ_{i=1}^n log f(wᵢ; ϑ).
Taking the partial derivatives of ℓ(ϑ) with respect to the parameters a, α, and θ and equating them to zero gives the likelihood equations. These nonlinear equations have no closed-form solution; they can be solved numerically using R, MATLAB, or MAPLE, which yields the maximum likelihood estimate ϑ̂ = (â, α̂, θ̂).
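The same procedure can be sketched in a much simpler case than the distribution above: maximizing the exponential log-likelihood ℓ(λ) = n log λ - λ Σwᵢ numerically and comparing with its closed-form MLE λ̂ = 1/w̄. The data here are simulated purely for illustration:

```python
# MLE sketch in a simple case: exponential log-likelihood maximized by a
# crude grid search, compared with the closed form 1/wbar. Data simulated.
import random
from math import log
random.seed(5)

true_lam = 1.5
w = [random.expovariate(true_lam) for _ in range(5_000)]
sw = sum(w)

def loglik(lam):
    return len(w) * log(lam) - lam * sw

grid = [0.5 + 0.001 * i for i in range(3000)]   # search over (0.5, 3.5)
lam_hat = max(grid, key=loglik)

closed_form = len(w) / sw
assert abs(lam_hat - closed_form) < 0.002   # within grid resolution
assert abs(lam_hat - true_lam) < 0.1        # near the true rate
```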
6.13.1 How to Plot Graphs for PDF and CDF Using R-software
Here is the R code for plotting the Gompertz inverse exponential (GoIE) distribution. Its pdf and cdf are
f(x) = α(θ/x²) e^{-θ/x} (1 - e^{-θ/x})^{-β-1} exp{(α/β)[1 - (1 - e^{-θ/x})^{-β}]}
and
F(x) = 1 - exp{(α/β)[1 - (1 - e^{-θ/x})^{-β}]},
respectively.
f<-function(x,al,be,th)
{
al*(th/x^2)* (exp(-(th/x))) *(1-exp(-(th/x)))^(-be-1) * exp((al/be)* (1-(1-exp(-(th/x)))^-be ))
}
#PDF GoIE
curve(f(x,2.0,1.7,2.6),main="", ylab="f(x)",xlab="x",ylim=c(0,0.8),0,6,lwd=2)
curve(f(x,1.0,2.4,2.5),lty=1,col=2,add=T,lwd=2)
curve(f(x,0.5,0.3,0.6),lty=1,col=3,add=T,lwd=2)
curve(f(x,1,0.5,2.5),lty=1,col=4,add=T,lwd=2)
curve(f(x,0.6,4.3,3.4),lty=1,col=5,add=T,lwd=2)
legend("topright",title=expression(" "),
c(expression(alpha*"=2.0,"*~beta*"=1.7,"*~theta*"=2.6" ),
expression(alpha*"=1.0,"*~beta*"=2.4,"*~theta*"=2.5" ),
expression(alpha*"=0.5,"*~beta*"=0.3,"*~theta*"=0.6" ),
expression(alpha*"=1.0,"*~beta*"=0.5,"*~theta*"=2.5" ),
expression(alpha*"=0.6,"*~beta*"=4.3,"*~theta*"=3.4" )),
cex=1.1,lty=c(1) ,lwd=2,col=c(1,2,3,4,5))
################ CDF
rm(list=ls())
f<-function(x,al,be,th)
{
1- exp((al/be)* (1-(1-exp(-(th/x)))^-be ))
}
#CDF GoIE
curve(f(x,2.0,1.7,2.6),main="", ylab="F(x)",xlab="x",ylim=c(0,1.0),0,6,lwd=2)
curve(f(x,1.0,2.4,2.5),lty=1,col=2,add=T,lwd=2)
curve(f(x,0.5,0.3,0.6),lty=1,col=3,add=T,lwd=2)
curve(f(x,1,0.5,2.5),lty=1,col=4,add=T,lwd=2)
curve(f(x,0.6,4.3,3.4),lty=1,col=5,add=T,lwd=2)
legend("bottomright",title=expression(" "),
c(expression(alpha*"=2.0,"*~beta*"=1.7,"*~theta*"=2.6" ),
expression(alpha*"=1.0,"*~beta*"=2.4,"*~theta*"=2.5" ),
expression(alpha*"=0.5,"*~beta*"=0.3,"*~theta*"=0.6" ),
expression(alpha*"=1.0,"*~beta*"=0.5,"*~theta*"=2.5" ),
expression(alpha*"=0.6,"*~beta*"=4.3,"*~theta*"=3.4" )),
cex=1.1,lty=c(1) ,lwd=2,col=c(1,2,3,4,5))
Chapter Seven
7.1 Introduction to Statistical Estimation.
The purpose of statistics is to use the information contained in a sample to make inferences about the
population from which the sample is taken. Because populations are characterized by numerical
descriptive measures called parameters, the objective of many statistical investigations is to estimate the
value of one or more relevant parameters. As you will see, the sampling distributions derived in Chapter 6
play an important role in the development of the estimation procedures that are the focus of this chapter.
Estimation has many practical applications. For example, a manufacturer of washing machines might be
interested in estimating the proportion p of washers that can be expected to fail prior to the expiration of a
1-year guarantee time. Other important population parameters are the population mean, variance, and
standard deviation. For example, we might wish to estimate the mean waiting time at a supermarket
checkout station or the standard deviation of the error of measurement σ of an electronic instrument. To
simplify our terminology, we will call the parameter of interest in the experiment the target parameter.
Suppose that we wish to estimate the average amount of mercury that a newly developed process can
remove from 1 ounce of ore obtained at a geographic location. We could give our estimate in two distinct
forms. First, we could use a single number, for instance 0.13 ounce, that we think is close to the unknown population mean μ. This type of estimate is called a point estimate because a single value, or point, is given as the estimate of μ. Second, we might say that μ will fall between two numbers; for example, between 0.07 and 0.19 ounce. In this second type of estimation procedure, the two values that we give may be used to construct an interval (0.07, 0.19) that is intended to enclose the parameter of interest; thus, the estimate is called an interval estimate.
The information in the sample can be used to calculate the value of a point estimate, an interval estimate,
or both. In any case, the actual estimation is accomplished by using an estimator for the target parameter.
Definition: Estimation is the assignment of value(s) to a population parameter based on a value of the corresponding sample statistic.
Definition: Estimate and Estimator: The value(s) assigned to a population parameter based on the value
of a sample statistic is called an estimate. An estimator is a rule, often expressed as a formula that tells
how to calculate the value of an estimate based on the measurements contained in a sample.
For example, the sample mean
X̄ = (Σ_{i=1}^n Xᵢ)/n
is one possible point estimator of the population mean μ. Clearly, the expression for X̄ is both a rule and a formula: it tells us to sum the sample observations and divide by the sample size n.
An experimenter who wants an interval estimate of a parameter must use the sample data to calculate two
values, chosen so that the interval formed by the two values includes the target parameter with a specified
probability. Examples of interval estimators will be given in subsequent sections. Many different
estimators (rules for estimating) may be obtained for the same population parameter. This should not be
surprising. Ten engineers each assigned to estimate the cost of a large construction job could use different
methods of estimation and thereby arrive at different estimates of the total cost. Such engineers, called
estimators in the construction industry, base their estimates on specified fixed guidelines and intuition.
Each estimator represents a unique human subjective rule for obtaining a single estimate. This brings us
to a most important point: Some estimators are considered good, and others, bad. The management of a
construction firm must define good and bad as they relate to the estimation of the cost of a job. How can
we establish criteria of goodness to compare statistical estimators? The following sections contain some
answers to this question.
Point Estimate: The value of a sample statistic that is used to estimate a population parameter is called a
point estimate.
Thus, the value computed for the sample mean, x̄, from a sample is a point estimate of the corresponding
population mean, μ. For the example mentioned earlier, suppose the Census Bureau takes a random
sample of 10,000 households and determines that the mean housing expenditure per month, x̄, for this
sample is #2970. Then, using x̄ as a point estimate of μ, the Bureau can state that the mean housing
expenditure per month, μ, for all households is about #2970.
Each sample selected from a population is expected to yield a different value of the sample statistic. Thus,
the value assigned to a population mean, μ, based on a point estimate depends on which of the samples is
drawn. Consequently, the point estimate assigns a value to μ that almost always differs from the true
value of the population mean.
If an estimator t takes the values t₁, t₂, …, t_k over k repeated samples, its expected value and variance are

E(t) = (1/k) Σ tᵢ,  i = 1, …, k,

Var(t) = (1/k) Σ (tᵢ − E(t))²,  i = 1, …, k,

with precision of

1/S.E.(t)

with S.E. as the standard error, given as

S.E.(t) = √Var(t)
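A small sketch of these formulas, using hypothetical estimator values t₁, …, t_k (for instance, sample means from k repeated samples):

```python
import math

# E(t), Var(t), standard error and precision of an estimator t,
# computed over k hypothetical repeated-sample values.
t_values = [2.9, 3.1, 3.0, 3.2, 2.8]
k = len(t_values)

expected_t = sum(t_values) / k                             # E(t)
var_t = sum((t - expected_t) ** 2 for t in t_values) / k   # Var(t)
se_t = math.sqrt(var_t)                                    # S.E.(t) = sqrt(Var(t))
precision = 1 / se_t                                       # precision = 1/S.E.(t)
print(expected_t, round(se_t, 4))
```

Here E(t) = 3.0 and S.E.(t) ≈ 0.1414 for the made-up values above.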
7.4 Statistical Interval Estimation.
In the case of interval estimation, instead of assigning a single value to a population parameter, an interval
is constructed around the point estimate, and then a probabilistic statement that this interval contains the
corresponding population parameter is made.
Interval Estimation: In interval estimation, an interval is constructed around the point estimate, and it is
stated that this interval contains the corresponding population parameter with a certain confidence level.
For the example about the mean housing expenditure, instead of saying that the mean housing expenditure
per month for all households is #2970, we may obtain an interval by subtracting a number from #2970
and adding the same number to #2970. Then we state that this interval contains the population mean, μ.
For purposes of illustration, suppose we subtract #340 from #2970 and add #340 to #2970. Consequently,
we obtain the interval (#2970 − #340) to (#2970 + #340), or #2630 to #3310. Then we state that the
interval #2630 to #3310 is likely to contain the population mean, , and that the mean housing
expenditure per month for all households in Nigeria is between #2630 and #3310. This procedure is called
interval estimation. The value #2630 is called the lower limit of the interval, and #3310 is called the upper
limit of the interval. The number we add to and subtract from the point estimate is called the margin of
error or the maximum error of the estimate.
The question arises: What number should we subtract from and add to a point estimate to obtain an
interval estimate? The answer to this question depends on two considerations:
1. The standard deviation σ_x̄ of the sample mean, x̄
2. The level of confidence to be attached to the interval
First, the larger the standard deviation of x̄, the greater is the number subtracted from and added to the
point estimate. Thus, it is obvious that if the range over which x̄ can assume values is larger, then the
interval constructed around x̄ must be wider to include μ.
Second, the quantity subtracted and added must be larger if we want to have a higher confidence in our
interval. We always attach a probabilistic statement to the interval estimation. This probabilistic statement
is given by the confidence level. An interval constructed based on this confidence level is called a
confidence interval.
Confidence Level and Confidence Interval: Each interval is constructed with regard to a given confidence
level and is called a confidence interval.
The confidence level associated with a confidence interval states how much confidence we have that this
interval contains the true population parameter. The confidence level is denoted by (1 − α)100%,
where α is the Greek letter alpha. When expressed as a probability, it is called the confidence
coefficient and is denoted by 1 − α. In passing, note that α is called the significance level; this
will be explained later.
Although any value of the confidence level can be chosen to construct a confidence interval, the more
common values are 90%, 95%, and 99%. The corresponding confidence coefficients are 0.90, 0.95, and
0.99, respectively. The next section describes how to construct a confidence interval for the population
mean when the population standard deviation, , is known.
The Confidence Interval for μ. The (1 − α)100% confidence interval for μ under Cases I and II is

x̄ ± z σ_x̄

where

σ_x̄ = σ/√n

The value of z used here is obtained from the standard normal distribution table.
The quantity z σ_x̄ in the confidence interval formula is called the margin of error and is denoted by E.
Margin of Error: The margin of error for the estimate of μ, denoted by E, is the quantity that is
subtracted from and added to the value of x̄ to obtain a confidence interval for μ. Thus,

E = z σ_x̄
The value of z in the confidence interval formula is obtained from the standard normal distribution table
for the given confidence level. To illustrate, suppose we want to construct a 95% confidence interval for μ.
A 95% confidence level means that the total area under the standard normal curve between two points (at
the same distance) on different sides of the mean is 95%, or 0.95, as shown in Figure 7.2. Note that we have
denoted these two points by −z and z in Figure 7.2. To find the value of z for a 95% confidence level, we
first find the areas to the left of these two points, −z and z. Then we find the z values for these two areas
from the normal distribution table. Note that these two values of z will be the same but with opposite
signs. To find these values of z, we perform the following two steps:
1. The first step is to find the areas to the left of −z and z, respectively. Note that the area between −z
and z is denoted by 1 − α. Hence, the total area in the two tails is α because the total area under
the curve is 1.0. Therefore, the area in each tail, as shown in Figure 7.2, is α/2. In our example,
1 − α = 0.95. Hence, the total area in both tails is α = 1 − 0.95 = 0.05. Consequently, the area in
each tail is α/2 = 0.05/2 = 0.025. Then, the area to the left of −z is 0.0250, and the area to the left
of z is 0.0250 + 0.95 = 0.9750.
2. Now find the z values from the standard normal distribution table such that the areas to the left of
−z and z are 0.0250 and 0.9750, respectively. These z values are −1.96 and 1.96, respectively.
Thus, for a confidence level of 95%, we will use z = 1.96 in the confidence interval formula.
Table 7.1 lists the z values for some of the most commonly used confidence levels. Note that we always
use the positive value of z in the formula.
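The two-step lookup above can be sketched in Python, using the standard library's inverse normal CDF (statistics.NormalDist, available from Python 3.8) in place of a printed table:

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Return z such that the area between -z and z under the standard normal curve is `level`."""
    alpha = 1 - level                           # total area in the two tails
    return NormalDist().inv_cdf(1 - alpha / 2)  # area to the left of z is 1 - alpha/2

print(round(z_for_confidence(0.95), 2))  # 1.96
print(round(z_for_confidence(0.99), 2))  # 2.58
```

These reproduce the commonly tabulated z values for 95% and 99% confidence.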
Example 7.1: A publishing company has just published a new college textbook. Before the company
decides the price at which to sell this textbook, it wants to know the average price of all such textbooks in
the market. The research department at the company took a random sample of 25 comparable textbooks
and collected information on their prices. This information produced a mean price of #145 for this
sample. It is known that the standard deviation of the prices of all such textbooks is #35 and the
population distribution of such prices is approximately normal.
a. What is the point estimate of the mean price of all such college textbooks?
b. Construct a 90% confidence interval for the mean price of all such college textbooks.
Solution: Here, σ is known and, although n < 30, the population is approximately normally distributed.
Hence, we can use the normal distribution. From the given information,

n = 25, x̄ = #145, σ = #35

The standard deviation of x̄ is

σ_x̄ = σ/√n = 35/√25 = #7.00
a. The point estimate of the mean price of all such college textbooks is #145; that is, point estimate
of μ = x̄ = #145.
b. The confidence level is 90%, or 0.90. First we find the z value for a 90% confidence level.
Here, the area in each tail of the normal distribution curve is α/2 = (1 − 0.90)/2 = 0.05.
Now in the standard normal table, look for the areas 0.0500 and 0.9500 and find the corresponding
values of z. These values are (approximately) z = −1.65 and z = 1.65.
Next, we substitute all the values in the confidence interval formula for μ. The 90% confidence
interval for μ is

x̄ ± z σ_x̄ = 145 ± 1.65(7.00) = 145 ± 11.55 = (145 − 11.55) to (145 + 11.55) = #133.45 to #156.55
Thus, we are 90% confident that the mean price of all such college textbooks is between #133.45 and
#156.55. Note that we cannot say for sure whether the interval #133.45 to #156.55 contains the true
population mean or not. Since μ is a constant, we cannot say that the probability is 0.90 that this interval
contains μ, because either it contains μ or it does not. Consequently, the probability that this interval
contains μ is either 1.0 or 0. All we can say is that we are 90% confident that the mean price of all such
college textbooks is between #133.45 and #156.55.
In the above estimate, #11.55 is called the margin of error or give-and-take figure.
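The whole computation of Example 7.1 fits in a few lines; z = 1.65 is the table value the text uses for 90% confidence:

```python
import math

# 90% CI for the mean with known sigma: x_bar +/- z * sigma / sqrt(n)
n, x_bar, sigma, z = 25, 145, 35, 1.65   # figures from Example 7.1
sigma_xbar = sigma / math.sqrt(n)        # 7.00
margin = z * sigma_xbar                  # margin of error, 11.55
lower, upper = x_bar - margin, x_bar + margin
print(round(lower, 2), round(upper, 2))  # 133.45 156.55
```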
How do we interpret a 90% confidence level? In terms of Example 7.1, if we take all possible samples of
25 such college textbooks each and construct a 90% confidence interval for μ around each sample mean,
we can expect that 90% of these intervals will include μ and 10% will not. In Figure 7.3 we show means
x̄₁, x̄₂ and x̄₃ of three different samples of the same size drawn from the same population. Also shown in
this figure are the 90% confidence intervals constructed around these three sample means. As we observe,
the 90% confidence intervals constructed around x̄₁ and x̄₂ include μ, but the one constructed around x̄₃
does not. We can state for a 90% confidence level that if we take many samples of the same size from a
population and construct 90% confidence intervals around the means of these samples, then we expect
90% of these confidence intervals will be like the ones around x̄₁ and x̄₂ in Figure 7.3, which include μ, and
10% will be like the one around x̄₃, which does not include μ.
Assignment 7.1: The standard deviation for a population is σ = 14.8. A random sample of 25
observations selected from this population gave a mean equal to 143.72. The population is known to have
a normal distribution.
a. Make a 99% confidence interval for μ.
b. Construct a 95% confidence interval for μ.
c. Determine a 90% confidence interval for μ.
d. Does the width of the confidence intervals constructed in parts a through c decrease as the
confidence level decreases?
SOLUTION
7.4.2 Determining the Sample Size for the Estimation of the Mean
Given the confidence level and the standard deviation of the population, the sample size that will produce
a predetermined margin of error E of the confidence interval estimate of μ is

n = z²σ²/E²
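A sketch of the formula, rounding up so the realized margin of error does not exceed E; the σ and E values below are hypothetical:

```python
import math

def sample_size(z, sigma, E):
    """Smallest n giving margin of error at most E: n = ceil(z^2 * sigma^2 / E^2)."""
    return math.ceil((z * sigma / E) ** 2)

# Hypothetical case: sigma = 15, desired margin E = 2, 95% confidence (z = 1.96).
print(sample_size(1.96, 15, 2))  # 217
```

Rounding up rather than to the nearest integer is the usual convention, since a smaller n would give a margin of error larger than E.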
Assignment 7.2: An alumni association wants to estimate the mean debt of this year's college graduates.
It is known that the population standard deviation of the debts of this year's college graduates is #11,800.
How large a sample should be selected so that a 99% confidence interval of the estimate is within #800 of
the population mean?
SOLUTION
7.4.3 Confidence Interval for the Mean for Unknown Variance and Small Sample Size
In the previous section, we saw that we can use the normal distribution to construct confidence
intervals for a population mean when σ is known, provided that the underlying population is normally
distributed or the sample size is large (larger than 30).
Case I. If the following three conditions are fulfilled:
1. The population standard deviation σ is not known
2. The sample size is small (i.e., n < 30)
3. The population from which the sample is selected is approximately normally distributed,
then we use the t-distribution to make the confidence interval for μ.
Thus, we can still construct a confidence interval for the population mean when the standard deviation is
not known: in place of z, the standard normal, we use the statistic t with n − 1 degrees of
freedom. The t-statistic is defined as

t = (x̄ − μ)/(s/√n)

However, if the sample size n > 30, we can approximate the t-distribution by the normal distribution.
The (1 − α)100% confidence interval for μ under Cases I and II is

x̄ ± t s_x̄
where

s_x̄ = s/√n

The value of t is obtained from the t distribution table for n − 1 degrees of freedom and the given
confidence level. Here t s_x̄ is the margin of error or the maximum error of the estimate; that is,

E = t s_x̄
Example 7.2: A sample of 16 private colleges around the state shows a mean cost for a year's tuition of
#25000 with a standard deviation of #4500. Find a 90% confidence interval for the average tuition costs at
private colleges.
Solution: With n = 16, x̄ = #25000 and s = #4500, we calculate

s_x̄ = s/√n = 4500/√16 = 1125

Since the sample size n = 16 < 30, we use a t-distribution with n − 1 = 15 degrees of freedom. From the
t table, t = 1.753 for 15 d.f. and a 90% confidence level, so the interval is

x̄ ± t s_x̄ = 25000 ± 1.753(1125) = 25000 ± 1972.13 = #23027.87 to #26972.13
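The t-based interval for the Example 7.2 figures can be checked numerically; t = 1.753 is the table value for 15 d.f. at a 90% confidence level:

```python
import math

# t-based CI for the mean when sigma is unknown: x_bar +/- t * s / sqrt(n)
n, x_bar, s = 16, 25000, 4500   # figures from Example 7.2
t = 1.753                       # t table value, 15 d.f., area 0.05 in each tail
s_xbar = s / math.sqrt(n)       # 1125.0
margin = t * s_xbar             # ~1972.13
print(round(x_bar - margin, 2), round(x_bar + margin, 2))
```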
7.4.4 Confidence Interval for the Population Proportion
In the case of a proportion, a sample is considered to be large if np and nq are both greater than 5.
If p and q are not known, then np̂ and nq̂ should each be greater than 5 for the sample to be large.
When estimating the value of a population proportion, we do not know the values of p and q.
Consequently, we cannot compute σ_p̂. Therefore, in the estimation of a population proportion,
we use the value of s_p̂ as an estimate of σ_p̂. The value of s_p̂ is calculated using the following formula.
Estimator of the Standard Deviation of p̂: The value of s_p̂, which gives a point estimate of σ_p̂, is
calculated as

s_p̂ = √(p̂q̂/n)

Note that the condition n/N ≤ 0.05 must hold true to use this formula.
The sample proportion, p̂, is the point estimator of the corresponding population proportion p.
Then, to find the confidence interval for p, we add to and subtract from p̂ a number that is called the
margin of error, E.
Confidence Interval for the Population Proportion, p: For a large sample, the (1 − α)100% confidence
interval for the population proportion, p, is

p̂ ± z s_p̂

The value of z is obtained from the standard normal distribution table for the given confidence level, and
s_p̂ = √(p̂q̂/n). The term z s_p̂ is called the margin of error, or the maximum error of the estimate, and is
denoted by E.
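A sketch of the large-sample proportion interval with hypothetical survey figures (p̂ = 0.55 from n = 500 respondents, so np̂ and nq̂ are both well above 5):

```python
import math

# Large-sample CI for a proportion: p_hat +/- z * sqrt(p_hat * q_hat / n)
n, p_hat, z = 500, 0.55, 1.96           # hypothetical survey, 95% confidence
q_hat = 1 - p_hat
s_phat = math.sqrt(p_hat * q_hat / n)   # estimated standard deviation of p_hat
margin = z * s_phat                     # margin of error E
print(round(p_hat - margin, 4), round(p_hat + margin, 4))  # 0.5064 0.5936
```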
Assignment 7.3: According to a Gallup-Purdue University study of college graduates conducted during
February 4 to March 7, 2014, 63% of college graduates polled said that they had at least one college
professor who made them feel excited about learning (www.gallup.com). Suppose that this study was
based on a random sample of 2000 college graduates. Construct a 97% confidence interval for the
corresponding population proportion.
SOLUTION
7.4.5 Confidence Interval for Variances and Standard Deviations
When random samples are drawn from a normal population of variance σ², the quantity
(n − 1)s²/σ² possesses a probability distribution that is known as the chi-square distribution
with (n − 1) degrees of freedom. The calculation follows as:

χ² = (n − 1)s²/σ²

Making σ² the subject of the formula, we have

σ² = (n − 1)s²/χ²

where s² is the sample variance, n is the sample size, and σ² is the value of the population variance, with
(n − 1) degrees of freedom.
The (1 − α)100% confidence interval for σ² is

( (n − 1)s²/χ²(α/2, df) , (n − 1)s²/χ²(1 − α/2, df) )

where df = n − 1.
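A sketch with hypothetical numbers: n = 10 observations, sample variance s² = 4.0, 95% confidence. The chi-square table values for 9 d.f. are χ²(0.025) = 19.023 and χ²(0.975) = 2.700:

```python
# CI for sigma^2: ((n-1)s^2 / chi2_{alpha/2}, (n-1)s^2 / chi2_{1-alpha/2})
n, s2 = 10, 4.0                         # hypothetical sample size and variance
chi2_upper, chi2_lower = 19.023, 2.700  # chi-square table values, 9 d.f., alpha = 0.05
lower = (n - 1) * s2 / chi2_upper
upper = (n - 1) * s2 / chi2_lower
print(round(lower, 3), round(upper, 3))  # 1.892 13.333
```

Note that the interval is not symmetric about s², because the chi-square distribution is skewed.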
Chapter Eight
Introduction to Statistical Test of Hypotheses
8.0 Introduction
A hypothesis is an assertion or conjecture about the population parameter(s). For example, H: μ = μ₀,
H: σ² = σ₀², H: μ ≥ 10, etc. These are statements about the population parameters. Sometimes, a hypothesis
could be a conjecture about parameters of two or more populations. For instance, H: μ₁ = μ₂, H: σ₁² = σ₂²,
H: P₁ = P₂, etc.
There are two types of hypothesis, namely, the null hypothesis and the alternative hypothesis. The null
hypothesis is the hypothesis which is to be tested for acceptance or rejection. It is usually denoted as Ho.
The alternative hypothesis, denoted as H1 or HA, is a conjecture about a population parameter which
gives an alternative to the null hypothesis. For instance, if Ho: μ = μ₀, the alternatives are H1: μ ≠ μ₀,
H1: μ > μ₀, H1: μ < μ₀.
8.1.2 Levels of Significance: This is defined as the quantity of risk of the type I error which we are
willing to tolerate in making a decision about Ho. In other words, it is the probability of committing a type
I error at a tolerable level or degree. It is denoted by α and is conventionally chosen as 0.05 (5%) or 0.01
(1%). A significance level of α = 0.01 is used for high precision while that of α = 0.05 is used for
moderate precision.
8.1.3 P-value (Probability Value):
This is defined as the smallest level of α at which Ho is significant, that is, the smallest level of α at which
Ho is rejected. It is a number that denotes the likelihood of your data having occurred under the null
hypothesis of your statistical test. The level of statistical significance is usually represented as a P-value,
or probability value, between 0 and 1. The smaller the p-value, the more likely it is that you would reject
the null hypothesis.
The P-value enables an individual to decide for himself how significant the data are. It avoids the imposition
of a fixed level of significance for the acceptance or rejection of Ho.
In a two-tailed test, the critical region is still equal to α, but α/2 of it lies on each tail for a test of
significance level α.
Fig.: Two-Tailed Critical Regions.
8.1.5 Size of a Test: This is defined as the probability of rejecting the null hypothesis when it is true. It is
usually denoted by α.
8.1.6 Power of a Test: The power of a test is the probability of rejecting Ho when it is false. Therefore,

Power = P(reject Ho | Ho is false) = 1 − P(type II error) = 1 − β,  (8.2)

where β denotes the probability of a type II error.
8.1.7 Degrees of Freedom: Degree of freedom (d.f) is defined as the number of independent observations
in a set. The table values for the distribution of the test statistics are provided on statistical tables for
various levels of significance and degrees of freedom. These tables of values enable us to decide about the
rejection or otherwise of Ho.
8.1.8 Point Estimate of a Population Parameter
A point estimate is a single value obtained from a sample and used as an estimate of a parameter. For
instance, a sample mean is a point estimate of a population mean.
8.2 Small Sample Tests for Population Mean and Difference of Two Population Means
We consider tests of hypotheses for one sample and for two samples when dealing with samples of small
size. In both cases, the Student t-test is applied.
Assumptions about the test include the following:
a. The random variable X follows a normal distribution
b. All the observations in the sample are independent
c. The sample size is not large, i.e. n ≤ 30
d. The assumed value μ₀ of the population mean is the correct value
e. The sample values are correctly taken and recorded.
In case the above assumptions do not hold, the reliability of the test decreases.
8.2.1 Test of Hypothesis about Population Mean for One Sample of Small Size
The t-distribution is used in the test of hypothesis about the population mean when the sample size is
small. The t statistic is defined as the deviation of the estimated sample mean from its population mean
expressed in terms of the standard deviation.
Suppose a small random sample (X1, X2, …, Xn) of size n has been drawn from a normal population
having a mean μ and an unknown variance σ². We want to test the hypothesis:

Ho: μ = μ₀ against H1: μ ≠ μ₀, where μ₀ is some assumed value for μ.

Let the observed values of the random sample (X1, X2, …, Xn) be (x1, x2, …, xn). The statistic for the
Student t test is given by:

t = (x̄ − μ₀)/(s/√n),  (8.4)

where the estimate of the population standard deviation is obtained using the sample standard deviation
given by:

s = √[ (1/(n − 1)) Σ(xᵢ − x̄)² ],  (8.5)

or

s = √[ (1/(n − 1)) { Σxᵢ² − (Σxᵢ)²/n } ],  (8.6)

If the calculated value of |t| from (8.4) is greater than or equal to the table value t(α/2, n − 1) at a given α
and for (n − 1) d.f., we reject Ho, and therefore conclude that the difference between the sample mean
and the population mean is significant; otherwise we accept Ho.
Note that for a right-tailed test, where the alternative hypothesis takes the form H1: μ > μ₀, Ho is rejected
if the calculated t is greater than or equal to the table value t(α, n − 1) at the specified α and (n − 1) d.f.
Similarly, for a left-tailed test, where the alternative hypothesis takes the form H1: μ < μ₀, Ho is rejected
if the calculated t is less than or equal to the negative table value −t(α, n − 1) at the specified α and (n − 1) d.f.
Example 8.1: In a nodulation study on a certain breed of crop, a sample of eleven plants gave the
following shoot lengths (cm): 10.1, 21.5, 11.7, 12.9, 14.8, 11.0, 19.2, 11.4, 22.6, 10.8, 10.2.
a. An earlier study reported that the mean shoot length is 15 cm. Test whether the experimental data
confirm the old view at the 5% level of significance.
b. Determine the p-value of the test statistic and take a decision based on it.
c. Calculate the 95% confidence interval for μ.
d. Determine the upper and lower limits of μ.
Solution
(a) Since the sample size is small, a t-test is performed where we have to test the hypothesis:

Ho: μ = 15 against H1: μ ≠ 15

The test statistic is:

t = (x̄ − μ₀)/(s/√n),

with μ₀ = 15, x̄ = 14.2, n = 11, and standard deviation s = 4.6861. Therefore,

t = √11 (14.2 − 15)/4.6861 = −0.5662,

so |t| = 0.5662. At the 5% level of significance, α/2 = 0.05/2 = 0.025. The table value of t for 10 d.f. is
2.228. Since |t| = 0.5662 < 2.228, Ho is not rejected at the 5% level of significance. We conclude that the
experimental data confirm the old view at the 5% level of significance.
(b) Here we want to show how to compute a p-value in the absence of specialized calculators and
computer software.
In the t distribution table, the calculated |t| = 0.5662 for 10 d.f. is seen to lie between 0.260 (under
α₁ = 0.40) and 0.700 (under α₁ = 0.25).
We interpolate between these values to get the value of α₁ corresponding to 0.5662 as follows:

α₁ = 0.40 − (0.40 − 0.25)(0.5662 − 0.260)/(0.700 − 0.260) ≈ 0.2956,

so the two-tailed p-value is p ≈ 2(0.2956) = 0.5912.
For a two-tailed test, since p = 0.5912 > 0.05, we do not reject Ho and therefore conclude
that the experimental data confirm the old view at the 5% level of significance.
(c) The confidence interval (CL) for the population mean is:

CL = x̄ ± t(α/2, n − 1)(s/√n),
CL = 14.2 ± 2.228(4.6861/√11),
CL = 14.2 ± 3.148,
CL = (11.0520, 17.3480).

(d) Lower limit = 11.0520,
Upper limit = 17.3480.
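The summary figures used in Example 8.1 can be reproduced from the raw shoot lengths:

```python
import math

# One-sample t statistic for the Example 8.1 data, computed from scratch.
x = [10.1, 21.5, 11.7, 12.9, 14.8, 11.0, 19.2, 11.4, 22.6, 10.8, 10.2]
n = len(x)
x_bar = sum(x) / n                                          # 14.2
s = math.sqrt(sum((v - x_bar) ** 2 for v in x) / (n - 1))   # ~4.6861
t = (x_bar - 15) / (s / math.sqrt(n))                       # ~ -0.5662
print(round(x_bar, 2), round(s, 4), round(t, 4))
```

The computed t agrees with the hand calculation, and |t| < 2.228 confirms the decision not to reject Ho.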
Example 8.2: A researcher claimed that the life expectancy of people living in country A is expected to
be 50 years. A survey of life expectancy was conducted in eleven states of the country and the data
obtained is as follows:
54.2 50.4 44.2 49.7 55.4 57.0 58.2 56.6 61.9 57.5 53.4
(a) Do the data support the researcher's claim at the 5% level of significance?
(b) Find the 95% confidence interval for the population mean, μ.
Solution
(a) We have to test

Ho: μ = 50 against H1: μ ≠ 50

The test statistic t is given by:

t = √n (x̄ − μ₀)/s,

with n = 11 and Σx = 598.5, so x̄ = 598.5/11 = 54.41.
Without using the value of x̄, we can compute s² with equation (8.6), given as:

s² = (1/(n − 1)){Σx² − (Σx)²/n} = (1/10){32799.91 − (598.5)²/11} = 23.61,

so s = 4.859. Therefore,

t = √11 (54.41 − 50)/4.859 = 3.01.

The table value of t at α = 0.05 and a d.f. of 10 for a two-tailed test is 2.228.
Since 3.01 > 2.228, we reject Ho and conclude that the life expectancy of the people in the said country is
not 50 years.
(b) The 95% confidence interval for μ is

x̄ ± t(α/2, 10)(s/√n) = 54.41 ± 2.228(4.859/√11) = 54.41 ± 3.26 = (51.15, 57.67).
8.2.2 Tests of Hypothesis about Difference in Population Means for Samples of Small Sizes
The t-distribution is also used in the test of hypothesis involving the difference in two population means
when the sample sizes are small. Here the t statistic is the deviation between the estimated sample means
expressed in terms of its standard error.
We assume that the samples are from two normal populations with means μ₁ and μ₂, and we desire to
test the hypothesis:

Ho: μ₁ = μ₂ vs H1: μ₁ ≠ μ₂,

which is, equivalently,

Ho: μ₁ − μ₂ = 0 vs H1: μ₁ − μ₂ ≠ 0.

CASE 1: When σ₁² = σ₂² = σ². That is, the two populations are distributed with the same variance.
If two independent samples of sizes n₁ and n₂ are taken, respectively, from the two populations, the test
statistic under Ho (that is, μ₁ = μ₂) is given by:

t = (x̄₁ − x̄₂)/(s_p √(1/n₁ + 1/n₂)),  (8.7)

with a d.f. of (n₁ + n₂ − 2), where x̄₁ and x̄₂ are the means of sample I and sample II, given as
x̄₁ = (1/n₁)Σx₁ᵢ and x̄₂ = (1/n₂)Σx₂ᵢ, respectively, and the pooled standard deviation or standard
error s_p is given by:

s_p = √[ (Σ(x₁ᵢ − x̄₁)² + Σ(x₂ᵢ − x̄₂)²)/(n₁ + n₂ − 2) ],  (8.8)

If the sample variances s₁² and s₂² are known or have been calculated earlier, then s_p is computed using the
formula:

s_p = √[ ((n₁ − 1)s₁² + (n₂ − 1)s₂²)/(n₁ + n₂ − 2) ],  (8.9)

From (8.7), for the difference of two means, the (1 − α)100% confidence interval (CI) is given as:

(x̄₁ − x̄₂) ± t(α/2, n₁ + n₂ − 2)(s_p √(1/n₁ + 1/n₂)),  (8.10)

where the upper and lower limits are respectively given by:

(x̄₁ − x̄₂) + t(s_p √(1/n₁ + 1/n₂)) and (x̄₁ − x̄₂) − t(s_p √(1/n₁ + 1/n₂)).
Example 8.3: The table below gives the average value of two random samples representing the monthly
sales of a certain product in two countries X and Y.
Table 8.1. Data for Example 8.3
Month X Y
Dec 330 507
(a) Assuming that the variances in sales of the product are the same in X and Y, test whether the average
monthly sales in the two countries are equal.
(b) Determine the 95% confidence interval for the difference in the means of X and Y.
Solution:
(a) Let μ₁ and μ₂ be the means of the populations from which X and Y, respectively, were drawn. The
hypothesis to test is given by:

Ho: μ₁ = μ₂ against H1: μ₁ ≠ μ₂

Under the assumption of equal variances, the test statistic is given by:

t = (x̄₁ − x̄₂)/(s_p √(1/n₁ + 1/n₂)),

with the sample means x̄₁ = 451.75 and x̄₂ = 483.42, and s_p computed from the two samples with
equation (8.8).
(b) The (1 − α)100% confidence interval for the difference of means under the assumption of equal variances
is given by:

(x̄₁ − x̄₂) ± t(α/2)(s_p √(1/n₁ + 1/n₂))

Therefore,

CI = (451.75 − 483.42) ± t(s_p √(1/n₁ + 1/n₂)),

with upper and lower limits of 38.20 and −104.54.
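The pooled two-sample t computation of equations (8.7)-(8.8) can be sketched with small hypothetical samples:

```python
import math

# Pooled two-sample t statistic; the data below are hypothetical.
x1 = [12.0, 14.5, 13.2, 15.1, 13.8]
x2 = [10.9, 12.1, 11.5, 13.0]
n1, n2 = len(x1), len(x2)
m1, m2 = sum(x1) / n1, sum(x2) / n2
ss1 = sum((v - m1) ** 2 for v in x1)               # squared deviations, sample 1
ss2 = sum((v - m2) ** 2 for v in x2)               # squared deviations, sample 2
sp = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))        # pooled s.d., eq. (8.8)
t = (m1 - m2) / (sp * math.sqrt(1 / n1 + 1 / n2))  # eq. (8.7)
print(round(t, 3))  # compare with the t table at n1 + n2 - 2 = 7 d.f.
```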
CASE 2: When σ₁² ≠ σ₂².
In this situation, the variances cannot be pooled. We use the Behrens-Fisher test statistic given by:

t = (x̄₁ − x̄₂)/√(s₁²/n₁ + s₂²/n₂),  (8.11)

whose critical value is approximated by

t* = (t₁ s₁²/n₁ + t₂ s₂²/n₂)/(s₁²/n₁ + s₂²/n₂),  (8.12)

where t₁ and t₂ are the table values of t at a prefixed level of significance with (n₁ − 1) and (n₂ − 1)
d.f., respectively.
If n₁ = n₂ = n, then t₁ = t₂ and we obtain:

t* = t₁ = t₂,  (8.13)

with (n − 1) d.f.
From (8.11), the (1 − α)100% confidence interval (CI) for the difference of two population means when
σ₁² ≠ σ₂² is given by:

(x̄₁ − x̄₂) ± t* √(s₁²/n₁ + s₂²/n₂),  (8.14)

where the upper and lower limits are respectively given by:

(x̄₁ − x̄₂) + t* √(s₁²/n₁ + s₂²/n₂) and (x̄₁ − x̄₂) − t* √(s₁²/n₁ + s₂²/n₂).
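A sketch of the statistic (8.11) with hypothetical sample summaries (the sizes, means and variances below are made up):

```python
import math

# Behrens-Fisher type statistic when the variances cannot be pooled.
n1, m1, s1_sq = 7, 46.0, 340.2   # hypothetical: size, mean, variance of sample 1
n2, m2, s2_sq = 6, 28.3, 112.5   # hypothetical: size, mean, variance of sample 2
t = (m1 - m2) / math.sqrt(s1_sq / n1 + s2_sq / n2)   # eq. (8.11)
print(round(t, 3))
```

The computed t would then be compared against the weighted critical value t* of equation (8.12).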
Example 8.4: The table below gives the gain in body weight (kilograms) per heifer of two different
breeds under a grazing treatment.
Table 8.2: Data for Example 8.4
Breed 1 57.3 26.9 53.2 16.8 44.8 54.2 71.4
Solution: For each breed we compute Σx, x̄ and s² = (1/(n − 1)){Σx² − (Σx)²/n}, and then

t = (x̄₁ − x̄₂)/√(s₁²/n₁ + s₂²/n₂),

which is compared with the critical value t* of equation (8.12) computed from t₁ and t₂.
8.3 Large Sample Tests: The Z-test
We apply the Z-test when we are dealing with large samples, that is, sample sizes equal to or greater
than 30. For a single sample, the test statistic is

Z = (x̄ − μ₀)/(σ/√n),  (8.15)

Substituting the sample values x̄, n and σ of the worked income example gives

Z = −5.81.

The table value of Z at α = 0.05 for a two-tailed test is 1.96. Since Zcal is less than −1.96, we reject Ho
and conclude that the mean income is not N10,000.
To test the hypothesis Ho: μ₁ = μ₂ vs H1: μ₁ ≠ μ₂, we use the Z-test, where the test statistic is
given by:

Z = (x̄₁ − x̄₂)/√(σ₁²/n₁ + σ₂²/n₂),  (8.16)

In the event that the population variances are unknown, their estimates are obtained from the
sample variances s₁² and s₂², respectively.
Decision about Ho is taken according to the following rule:
I. Given α, reject Ho if Z ≥ Z(α/2) or if Z ≤ −Z(α/2).
Example 8.6: Two samples were drawn from two normal populations N(μ₁, σ₁²) and N(μ₂, σ₂²). The
following information was available on these samples regarding the expenditure in naira per family:

Sample 1: n₁ = …, x̄₁ = …, s₁ = …
Sample 2: n₂ = …, x̄₂ = …, s₂ = …

Is there a significant difference between the mean expenditures of the two families?
Solution
We want to test the hypothesis

Ho: μ₁ = μ₂ against H1: μ₁ ≠ μ₂

Since the sample sizes are large, Ho is tested by the statistic

Z = (x̄₁ − x̄₂)/√(s₁²/n₁ + s₂²/n₂),

The table value of Z at α = 0.05 is 1.96. Since the calculated Z is greater than the tabulated Z, we reject Ho
and conclude that the difference in mean expenditure is significant.
conclude that the difference in mean expenditure is significant.
8.4 Tests of Hypothesis for Proportion (One Sample Case)
If the observations on various items are categorized into two classes, A and B (binomial population), we
often want to test the hypothesis whether the proportion of items in a particular class, say A, is P₀ or not.
Thus the hypothesis:

Ho: P = P₀ against H1: P ≠ P₀ or H1: P > P₀ or H1: P < P₀

can be tested using the Z-test, where P is the actual proportion of items in the population belonging to
class A.
Let p̂ be the sample estimate of P. Ho can be tested by the statistic

Z = (p̂ − P₀)/√(P₀Q₀/n),  (8.17)

where n is the sample size, p̂ is the proportion obtained from the sample, P₀ is the hypothesized or
assumed value of the population proportion, and Q₀ = 1 − P₀.
Decision about Ho can be taken as follows:
I. Given α, reject Ho if Z ≥ Z(α/2) or Z ≤ −Z(α/2).
At α = 0.05, Z(α/2) = 1.96, and at α = 0.01, Z(α/2) = 2.58.
Example 8.7: To test the claim of the management that 60% of employees support a new bonus scheme, a
sample of 150 employees was drawn. It was discovered that only 55 of the employees support the new
bonus scheme. Is the management right in their claim at α = 0.01?
Solution:
The hypothesized value of P is P₀ = 60% = 60/100 = 0.6, with n = 150 and p̂ = 55/150 = 0.3667. Then

Z = (0.3667 − 0.6)/√(0.6 × 0.4/150) = −0.2333/0.04,
Z = −5.83.

At α = 0.01, the table value of Z is 2.58. Since the calculated Z is less than −2.58, we reject Ho and
conclude that it is not true that 60% of the employees support the new bonus scheme.
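The computation in Example 8.7 can be verified directly:

```python
import math

# One-sample proportion Z test, figures from Example 8.7.
n, successes, P0 = 150, 55, 0.6
p_hat = successes / n                              # 0.3667
z = (p_hat - P0) / math.sqrt(P0 * (1 - P0) / n)    # eq. (8.17)
print(round(z, 2))  # -5.83
```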
8.5 Test of Equality of Two Proportions (Two Sample Case)
Suppose we have two populations, and each item of a population belongs to either of two classes A and B.
A researcher may be interested to know whether the proportions of items in class A in both populations
are the same or not. Here, we wish to test the hypothesis:

Ho: P₁ = P₂ against H1: P₁ ≠ P₂ or H1: P₁ > P₂ or H1: P₁ < P₂,

where P₁ and P₂ are the proportions of items in the two populations belonging to class A.
When it is assumed that the population proportions are equal, that is P₁ = P₂ = P, Ho can be tested
against H1 using the statistic

Z = |p̂₁ − p̂₂|/√(p̂q̂(1/n₁ + 1/n₂)),  (8.18)

where q̂ = 1 − p̂, and

p̂ = (n₁p̂₁ + n₂p̂₂)/(n₁ + n₂),  (8.19)

Decisions about Ho based on equation (8.18) are the same as in equation (8.17) above for the one sample
case.
Example 8.7: A sample of 400 families in City A is randomly selected, and another sample of 500
families is selected in City B. The numbers of TV owners in City A and City B are 48 and 120,
respectively. Test the hypothesis that the proportions of TV owners in both cities are the same at α = 0.05.
Solution:
n₁ = 400, p̂₁ = 48/400 = 0.12; n₂ = 500, p̂₂ = 120/500 = 0.24.
Under the assumption that P₁ = P₂,

p̂ = (n₁p̂₁ + n₂p̂₂)/(n₁ + n₂) = (48 + 120)/(400 + 500) = 168/900 = 0.1867,
q̂ = 1 − p̂ = 0.8133, so p̂q̂ = 0.1518.

Z = |p̂₁ − p̂₂|/√(p̂q̂(1/n₁ + 1/n₂)),
Z = |0.12 − 0.24|/√(0.1518(1/400 + 1/500)),
Z = 0.12/0.02614,
Z = 4.59.
The table value of Z at α = 0.05 for a two-tailed test is 1.96. Since the calculated Z is greater than the
tabulated Z, we reject Ho and conclude that the proportions of TV owners in City A and City B are not
the same.
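The pooled two-proportion computation above can be verified in a few lines:

```python
import math

# Two-proportion Z test with the TV-owner figures: 48/400 vs 120/500.
n1, x1 = 400, 48
n2, x2 = 500, 120
p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)                             # pooled p_hat, eq. (8.19)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))  # denominator of eq. (8.18)
z = abs(p1 - p2) / se                                      # eq. (8.18)
print(round(z, 2))  # 4.59
```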
Assignment 8.1:
The following figures give the forage yield (q/ha) of ten randomly selected plots:
21.8, 24.8, 23.3, 29.3, 30.8, 31.8, 32.4, 32.5, 32.1, 31.3.
(a) On the basis of the data, test the claim that the mean yield of forage is 30 q/ha (i) at α = 0.05 and (ii) at α = 0.01.
(b) Determine the (1 − α) confidence interval and limits for both a(i) and a(ii).
SOLUTION
Assignment 8.2:
The yields from two strains of a crop were found to be as given below:
Strain 1: 15.4, 20.5, 15.9, 38.3, 8.7, 37.0, 39.2, 26.4
Strain 2: 33.6, 10.7, 6.5, 24.0, 42.5, 22.9
Test whether the mean yields of the two strains are equal at α = 0.05.
SOLUTION
Assignment 8.3:
Given that n = …, x̄ = …, and σ = ….
Test the hypothesis Ho: μ = μ₀ against (i) H1: μ ≠ μ₀, (ii) H1: μ > μ₀, (iii) H1: μ < μ₀.
SOLUTION
Assignment 8.4: A random sample of 1000 workers from South India shows that their mean wage is
$47 per week with a standard deviation of $28. A random sample of 1500 workers from North India
shows that their mean wage is $49 per week with a standard deviation of $40. Is there any significant
difference between the mean wages in the two regions? (Take α = 0.05.)
SOLUTION
Assignment 8.5:
A random sample of 100 articles from a selected batch of 2000 articles shows that the average diameter of
the articles is 0.354mm with a standard deviation of 0.048mm. Find the 95% confidence interval for the
average diameter of this batch of 2000 articles.
SOLUTION
Assignment 8.6:
Is it likely that a sample of size 300 whose mean is 12 is a random sample from a large population with
mean 12.5 and a standard deviation 5.2?
SOLUTION
Assignment 8.7:
To test the claim of the management that 75% employees support a new policy, a sample of 300
employees was drawn. It was discovered that only 90 of the employees support the new policy. Is the
management right in their claim at
SOLUTION
Assignment 8.8:
A sample of 150 families in City A is randomly selected, and another sample of 250 families is selected in
City B. The numbers of car owners in City A and City B are 27 and 80, respectively. Test the hypothesis
that the proportion of car owners is the same in both cities at .
SOLUTION
8.6 Nonparametric Tests
Nonparametric tests are tests that do not require knowledge about the form of the parent distribution.
Nonparametric statistics is a type of statistical analysis that makes minimal assumptions about the
underlying distribution of the data being studied.
The reasons for the application of nonparametric tests include the following:
1. When the underlying data do not meet the assumptions about the population sample. Generally, the
application of parametric tests requires various assumptions to be satisfied. For example, the data follows
a normal distribution and the population variance is homogeneous.
2. When the population sample size is too small. The sample size is an important assumption in selecting
the appropriate statistical method. If a sample size is reasonably large, the applicable parametric test can
be used. However, if a sample size is too small, it is possible that you may not be able to validate the
distribution of the data. Thus, the application of nonparametric tests is the only suitable option.
3. The analyzed data is ordinal or nominal. Unlike parametric tests that can work only with continuous
data, nonparametric tests can be applied to other data types such as ordinal or nominal data. For such
types of variables, the nonparametric tests are the only appropriate solution.
The chi-square test can be regarded as both parametric and nonparametric. Other examples of
nonparametric tests include the sign test, the rank test, etc.
For a goodness-of-fit test with k classes, the test statistic is

χ²(k − 1) = Σi (Oi − Ei)² / Ei,

where the suffix (k − 1) is the d.f. for χ², and Oi and Ei are the observed and expected
frequencies.
Reject Ho at a given α if the calculated chi-square is greater than or equal to the table value of chi-square
at α and (k − 1) d.f. Rejection of Ho means that the observed data are not in agreement with the expected
ratio.
Example 8.8: Mendel conducted a classic experiment on peas to know the genetic effect of colour and
shape in the first generation, after taking four crosses, namely, Round and Yellow (RY), Round and
Green (RG), Wrinkled and Yellow (WY), and Wrinkled and Green (WG). According to his theory, the
frequencies of these four classes should be in the ratio 9:3:3:1. However, the observed frequencies he
found are shown in the table below:
Table 8.4: Observed Frequencies for four Crosses of Peas
Classes RY RG WY WG Total
Sum of ratios = 9 + 3 + 3 + 1 = 16.
The expected frequencies (Ei) based on Mendel's theory are calculated as Ei = (ratioi/16) × n, where n is
the total observed frequency. The expected frequencies are rounded in such a way that the sum of the
observed frequencies is equal to the sum of the expected frequencies.
The test statistic is given by:

χ² = Σi (Oi − Ei)² / Ei,  with k = 4 and d.f. = k − 1 = 3.
At the chosen α, the table value of χ² with 3 d.f. is greater than the calculated value of chi-square.
Hence, Ho is not rejected. We conclude that the data support Mendel's theory.
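The goodness-of-fit calculation above can be sketched in a few lines of Python. The observed counts used below (315, 108, 101, 32 for RY, RG, WY, WG) are the classic figures usually quoted for Mendel's experiment; they are purely illustrative here, since the table values are not reproduced in this text.

```python
# Chi-square goodness-of-fit test for a 9:3:3:1 ratio, computed from first
# principles. The observed counts are the classic Mendel figures, used here
# only as illustrative data.

observed = [315, 108, 101, 32]          # RY, RG, WY, WG
ratios = [9, 3, 3, 1]

n = sum(observed)                        # total number of peas
ratio_sum = sum(ratios)                  # 9 + 3 + 3 + 1 = 16
expected = [r / ratio_sum * n for r in ratios]

# Test statistic: chi^2 = sum over classes of (O - E)^2 / E
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

df = len(observed) - 1                   # k - 1 = 3 degrees of freedom
critical = 7.815                         # chi-square table value at alpha = 0.05, 3 d.f.

print(f"chi-square = {chi_sq:.3f}, critical value = {critical}")
print("reject Ho" if chi_sq >= critical else "do not reject Ho")
```

Any set of class counts can be substituted for `observed`, with the corresponding hypothesized ratio in `ratios`; the critical value 7.815 is the chi-square table value at the 5% level with 3 d.f.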
8.7 Contingency Table for Independence of Factors
A contingency table is a rectangular array of order m × p, where m denotes the number of rows, which
equals the number of categories of an attribute or criterion A, and p denotes the number of columns,
which equals the number of categories of an attribute or criterion B. An example of a contingency table is shown below:
Table 8.5: General m × p Contingency Table

        B1    B2    ...   Bp    Total
A1      n11   n12   ...   n1p   R1
A2      n21   n22   ...   n2p   R2
...     ...   ...   ...   ...   ...
Am      nm1   nm2   ...   nmp   Rm
Total   C1    C2    ...   Cp    N
In Table 8.5, the cell frequency nij in the (i, j)th cell represents the number of items or individuals
possessing the characteristics of Ai and Bj. Any of the row totals Ri and column totals Cj is called a
marginal total, and N is the grand total.
A contingency table simplifies the computation of the expected frequencies. It is very handy
when carrying out the test of independence of factors, as the following example will show. The test
statistic for independence of factors is the same as in the goodness-of-fit test, but the degrees of freedom
equal (m − 1)(p − 1). The test is always right-tailed.
We test the hypothesis that:
Ho: the two factors are independent vs H1: the two factors are dependent.
The expected frequency, Eij, for the (i, j)th cell is calculated using the formula:

Eij = (Ri × Cj) / N,  (8.22)
Example 8.9:
In a volunteer group, adults 18 years and older volunteer to spend one to nine hours with children in an
orphanage home. The consultant employs persons among NYSC members, University students, and
nonstudents. The table below shows the types of volunteers and the number of hours they volunteer per
week.
1-3 4-6 7–9
since and
The complete expected frequencies is presented in the table below:
( )( ) ,
( ) ( ) ( ) ( ) ( )
( ) ( ) ( ) ( )
At the chosen α, since the calculated χ² exceeds the table value, we reject Ho and conclude that the
two attributes are not independent (that is, the type of volunteer and the number of hours volunteered are
dependent on each other).
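The independence test can be sketched the same way. The 3 × 3 counts below are made-up illustrative data (the volunteer figures of Example 8.9 are not reproduced in the text); the expected frequencies come from formula (8.22).

```python
# Chi-square test of independence for an m x p contingency table, using
# formula (8.22): E_ij = (row total x column total) / grand total.
# The 3 x 3 counts below are invented for illustration only.

table = [
    [12, 18, 10],   # e.g. first category volunteering 1-3, 4-6, 7-9 hours
    [25, 15, 10],   # second category
    [10, 12, 18],   # third category
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(table):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / grand   # E_ij = R_i * C_j / N
        chi_sq += (obs - exp) ** 2 / exp

m, p = len(table), len(table[0])
df = (m - 1) * (p - 1)                                # (m - 1)(p - 1) degrees of freedom

print(f"chi-square = {chi_sq:.3f} on {df} d.f.")
# Compare with the chi-square table value at the chosen alpha
# (9.488 at the 5% level with 4 d.f.)
```

The same loop works for any m × p table; only the nested list of observed counts changes.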
Assignment 8.9
In a recent diet survey as regards the consumption of tea in two communities in a certain city, the
following results were obtained:
Community A Community B
Test whether there is a significant difference between the two communities with regard to
tea consumption.
SOLUTION
Chapter Nine
9.1 Introduction.
In this chapter, we shall be looking at the dependence of factors or variables which take only numerical
values. Quite often, the goal is to establish the actual or quantitative relationship between two or more
variables. This aspect is studied under regression, which is the second part of the chapter. We commence
the chapter with correlation analysis, which is used in situations where the researcher is only concerned
with the strength of the relationship and not its actual form.
9.2 Correlation.
Correlation is a measure of the degree of relationship existing between two variables. It can be defined
as the degree to which variables are related. Unlike regression, which indicates the quantitative
relationship between a dependent variable and one or more independent variables, correlation gives an
estimate of the strength of the relationship between these variables.
Similar to what we have in regression analysis, when the relationship is between two variables it is termed
a simple correlation. On the other hand, if the relationship is between more than two variables, it is called
a multiple correlation.
There are mathematical equations for determining whether or not correlation exists between variables.
However, graphical method called the scatter diagram can be used to check for correlation.
Diagrammatically, the positive linear correlation is represented by the Fig. 9.1 below.
Fig. 9.1: Positive Linear Correlation between X and Y
In Fig. 9.1 above, it can be seen that not all the points on the scatter diagram fall on the straight line.
This shows that the correlation between X and Y is not perfect. If all the points fell on the straight line,
the correlation would be said to be perfect positive.
Two variables are said to be negatively and linearly correlated if they tend to move in opposite
directions. An increase in the value of one of the variables is associated with a decrease in the
value of the other variable, and vice versa.
The diagram of two variables that are negatively and linearly correlated is shown in Fig. 9.2.
Fig. 9.2: Negative Linear Correlation between X and Y
In situations where all the points in Fig 9.2 above fall on the straight line, the correlation is said to be
perfect negative.
9.3.2 Non-linear Correlation
Correlation between two variables is said to be non-linear when the points appear to form a curve. As in
linear correlation, non-linear correlation can be positive or negative. The correlation between two
variables X and Y is said to be positive non-linear if their values change in the same direction as described
by a curve.
The diagram of this type of correlation is shown in the Fig. 9.3.
Fig. 9.3: Positive non-linear Correlation between X and Y
Two variables are negatively non-linearly correlated if their respective values change in opposite
directions, forming an inverse relationship along a curve in the XY plane. The diagram is shown in Fig.
9.4.
Fig. 9.4: Negative non-linear Correlation between X and Y
The correlation coefficient, denoted r, between two variables X and Y is defined as the ratio
of the covariance of X and Y to the square root of the product of the variance of X and that of Y.
Mathematically,

r = Cov(X, Y) / √[Var(X) · Var(Y)],  (9.1)

In terms of sums of the raw observations, this becomes

r = [ΣXY − (ΣX)(ΣY)/n] / √{[ΣX² − (ΣX)²/n] · [ΣY² − (ΣY)²/n]},  (9.3)

r = [nΣXiYi − (ΣXi)(ΣYi)] / √{[nΣXi² − (ΣXi)²] · [nΣYi² − (ΣYi)²]},  for i = 1, 2, …, n.  (9.4)
The Spearman rank coefficient of correlation, denoted by the symbol ρ (rho), can also be used to determine
the strength of the correlation between X and Y, but by first ranking the n observations for X and then for Y.
Once this is done, we can make use of the Spearman rank coefficient of correlation given by the formula:

ρ = 1 − 6Σdi² / [n(n² − 1)],  (9.5)

where di² is the square of the difference in the ranks of the i-th pair of corresponding observations.
The Spearman rank correlation is especially useful when the observations are qualitative and can only be ranked.
The correlation coefficient (r or ρ) is a pure number; it is independent of the units in which X and Y are
measured. Furthermore, −1 ≤ r ≤ 1 and −1 ≤ ρ ≤ 1.
When r = 1 (or ρ = 1), there exists a perfect positive correlation between X and Y. When
r = −1 (or ρ = −1), there exists a perfect negative correlation between X and Y. When two
variables are independent, the correlation between them is zero. However, the converse is not true,
because r = 0 does not necessarily mean that X and Y are independent. It could be that there are
other forms of relationship (e.g., quadratic) between X and Y other than a linear relationship. Between −1
and 0, and 0 and 1, we have weak correlation ((−0.1, −0.39) and (0.1, 0.39)), fair correlation ((−0.4, −0.69)
and (0.4, 0.69)) and strong correlation ((−0.7, −0.99) and (0.7, 0.99)).
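A minimal sketch of the raw-data formula (9.4) in Python, using made-up paired observations:

```python
# Pearson correlation coefficient from the raw-data formula (9.4):
# r = [n*Sxy - Sx*Sy] / sqrt{[n*Sxx - Sx^2] * [n*Syy - Sy^2]}.
# The paired data below are invented for illustration.
import math

x = [2, 4, 5, 7, 8, 10, 11, 13]
y = [3, 5, 7, 8, 10, 11, 14, 15]

n = len(x)
sx, sy = sum(x), sum(y)
sxy = sum(a * b for a, b in zip(x, y))
sxx = sum(a * a for a in x)
syy = sum(b * b for b in y)

r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
print(f"r = {r:.4f}")
```

With these nearly linear data the value of r lands in the strong positive band of the scale above.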
Assumptions about Correlation Coefficient:
1. The random variables X and Y are distributed normally;
2. X and Y are linearly related;
3. There is a cause and effect relationship between the factors affecting the values of X and Y in the
series of data.
9.4.1 Test of Significance of Correlation Coefficient
The random samples (x1, y1), (x2, y2), …, (xn, yn) are drawn from the population under consideration.
Whatever conclusions are deduced from the sample r or ρ are meant to be used to make inferences about
the parent population. For us to be sure of the validity of such inferences on the population, we need some
confirmation of the correctness of these sample statistics (r or ρ) by way of a test of significance.
Assuming R is the population parameter for r, then the test of significance of the correlation coefficient
means to test the hypothesis of whether or not the correlation coefficient is zero in the population, that is, we
test

Ho: R = 0 against H1: R ≠ 0, using t = r / SE(r),  (9.6)

where r is the estimated value of R based on the n paired observations, and SE(r) is the standard error of r given
as:

SE(r) = √[(1 − r²)/(n − 2)],  (9.7)

Therefore,

t = r√(n − 2) / √(1 − r²),  (9.8)
If the calculated value of t is greater than the table value of t for a given
level of significance and (n − 2) degrees of freedom, reject Ho; otherwise accept Ho. Rejecting Ho leads
to the conclusion that the two variables X and Y are dependent (i.e., not independent). This means that the
correlation between them is worth considering. If Ho is accepted, it means that the value of r is due to
sampling fluctuation, whereas in reality the two variables are uncorrelated in the population.
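The test in (9.6) to (9.8) can be sketched as follows; r = 0.75 and n = 9 are illustrative values, not taken from the text's example:

```python
# Test of significance of a correlation coefficient using (9.8):
# t = r*sqrt(n-2)/sqrt(1-r^2), compared with t(alpha/2, n-2).
# r and n are illustrative values only.
import math

r, n = 0.75, 9
t_calc = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)   # works out to exactly 3.0 here

t_table = 2.365          # two-tailed t at 5% with n - 2 = 7 d.f.
print(f"t = {t_calc:.3f}")
print("reject Ho: correlation is significant" if abs(t_calc) > t_table
      else "accept Ho: correlation is not significant")
```

Since 3.0 > 2.365, this illustrative r would be declared significant at the 5% level.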
Example 9.1: The table below shows the 9 paired observations for the variables X and Y
Table 9.1: Paired observations for X and Y
(a) Calculate the value of r. (b) Test the significance of r at the 5% level of significance.
Solution
We make use of the formula:

r = [nΣXiYi − (ΣXi)(ΣYi)] / √{[nΣXi² − (ΣXi)²] · [nΣYi² − (ΣYi)²]}
(a). Using the respective values from the table above, we have
( ) ( )
,
√(( ) ( ) )(( ) ( ) )
,
√
We conclude that there is a strong positive linear correlation between the variables X and Y.
where the test statistic is given as:

t = r√(n − 2) / √(1 − r²)
For we have
√
√ ( )
.
The table value of t at 5% and 7 d.f. is 2.365. Since the calculated t exceeds this value, we reject Ho,
implying that there is a significant correlation between X and Y.
Example 9.2
The table below shows the 9 paired observations for the variables X and Y
Table 9.3: Data for Example 9.2
X 20 14 18 25 21 35 26 37 29
Y 24 23 20 15 25 18 17 31 32
S/N X Y Rank(X) Rank(Y) d d²
1 20 24 3 6 -3 9
2 14 23 1 5 -4 16
3 18 20 2 4 -2 4
4 25 15 5 1 4 16
5 21 25 4 7 -3 9
6 35 18 8 3 5 25
7 26 17 6 2 4 16
8 37 31 9 8 1 1
9 29 32 7 9 -2 4
Σd² = 100, n = 9.

ρ = 1 − 6Σd² / [n(n² − 1)] = 1 − 600/720 = 0.1667.
We conclude that there is a weak positive linear correlation between X and Y.
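The Spearman computation of Example 9.2 can be verified directly from formula (9.5). The first Y value is taken as 24, which is consistent with the ranks used in the worked table:

```python
# Spearman rank correlation for the data of Example 9.2, using (9.5):
# rho = 1 - 6*sum(d^2) / (n*(n^2 - 1)). No ties occur in these data,
# so plain ranks (1 = smallest) suffice.

x = [20, 14, 18, 25, 21, 35, 26, 37, 29]
y = [24, 23, 20, 15, 25, 18, 17, 31, 32]

def ranks(values):
    # rank 1 for the smallest value (valid here because there are no ties)
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

rx, ry = ranks(x), ranks(y)
d_sq = sum((a - b) ** 2 for a, b in zip(rx, ry))   # sum of d_i^2 = 100
n = len(x)
rho = 1 - 6 * d_sq / (n * (n ** 2 - 1))

print(f"sum d^2 = {d_sq}, rho = {rho:.4f}")        # rho = 0.1667
```

The result reproduces the worked value ρ = 0.1667, a weak positive correlation.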
Assignment 9.1
The table below shows 14 paired observations on X and Y
X 21 25 26 24 22 30 19 24 28 32 31 29 21 18
Y 19 20 24 21 21 24 18 22 19 30 27 26 19 18
SOLUTION
Assignment 9.2
The table below shows 12 paired observations on X and Y
X 11 15 27 20 22 35 19 24 28 32 31 29
Y 13 10 20 31 21 17 28 22 19 30 27 36
(i). Calculate
SOLUTION
9.5 Meaning of Regression Analysis
Regression is an important tool applied by researchers in order to understand the relationship between two
or more variables. It describes in mathematical form the relationship between variables in a given study.
In other words, regression analysis presents an equation for estimating the amount of change in the value
of one variable associated with a unit change in the value of another variable. In expressing any
relationship in mathematical form, two types of variables can be identified, namely, the independent
variable (also called the explanatory variables, the input variable or factor, etc) and the dependent variable
(output variable, the response variable, etc) which depends on the independent or explanatory variables.
For instance, consider the statistical model Y = β0 + β1X + e. Y is referred to as the dependent variable, X is
the explanatory or independent variable, β0 and β1 are the model coefficients, and the variable e is called the
error term or disturbance in the relationship; it represents factors other than X that have an influence on Y.
Specifically, for a linear regression, β0 and β1 represent the intercept and the slope, respectively. It is
implicitly assumed that the explanatory variable X has a causal effect on the dependent variable Y, and
the coefficient β1 measures this causal effect or influence of X on Y.
9.7 Least Squares Method for Estimating the Parameters of Regression Models
To estimate the magnitude of the parameters of a regression model or equation, there are several
techniques that can be used including the Ordinary Least Squares (OLS) method, the matrix method,
maximum likelihood method, etc.
There are distinctive properties associated with OLS method and these include:
a. the parameter estimates obtained by OLS method have some optimal properties like
unbiasedness, least variance, efficiency, best-linear unbiasedness (BLU), least mean square-
error (MSE) and sufficiency;
b. its computational procedure is fairly simple as compared with other econometrics techniques
and data requirement are not excessive;
c. the mechanics of OLS are simple to understand.
It is important to note that any estimation procedure using the OLS method is based on certain assumptions.
It is on these assumptions that the parameter estimates of any regression model can be accepted as
having dependable predictive power. These assumptions include the requirement that the error term is
normally distributed with a mean of zero and constant variance.
Recall that the statistical model for the linear regression line of Y on X for the population is given as:

Y = β0 + β1X + e,  (9.9)

Suppose the regression line given by (9.9) is to be fitted on the basis of n pairs of sample observations
(x1, y1), (x2, y2), …, (xn, yn). Each pair (xi, yi) for i = 1, 2, …, n will satisfy the regression line (9.9).
Therefore,

yi = β0 + β1xi + ei,  (9.10)

ei = yi − (β0 + β1xi),  (9.11)

Σei² = Σ(yi − β0 − β1xi)²,  (9.12)

To get the least squares estimates of β0 and β1 such that the sum of squared errors is minimized, we differentiate
(9.12) partially with respect to β0 and β1 respectively to get two equations called the normal equations. We
also replace β0 and β1 with their estimated values, say a and b, respectively:

∂(Σei²)/∂a = −2Σ(yi − a − bxi) = 0,  (9.13)

∂(Σei²)/∂b = −2Σxi(yi − a − bxi) = 0,  (9.14)

From (9.13), therefore,

a = ȳ − bx̄,  (9.20)

From (9.14), we have:

Σxi(yi − a − bxi) = 0,  (9.21)

Σxiyi = aΣxi + bΣxi²,  (9.22)

Substituting the value of a from (9.20) in (9.22), we get:

Σxiyi = (ȳ − bx̄)Σxi + bΣxi²,  (9.23)

b[Σxi² − (Σxi)²/n] = Σxiyi − (Σxi)(Σyi)/n,  (9.24)

Therefore,

b = [Σxiyi − (Σxi)(Σyi)/n] / [Σxi² − (Σxi)²/n],  (9.25)

or equivalently,

b = [nΣxiyi − (Σxi)(Σyi)] / [nΣxi² − (Σxi)²],  (9.26)

for i = 1, 2, …, n.
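A minimal sketch of the estimators in raw-sum form, on made-up data:

```python
# Least-squares estimates for simple linear regression:
# slope b = [n*Sxy - Sx*Sy] / [n*Sxx - Sx^2], intercept a = ybar - b*xbar.
# The data are invented for illustration (roughly y = 2x).

x = [1, 2, 3, 4, 5]
y = [2.1, 4.0, 6.2, 7.9, 10.1]

n = len(x)
sx, sy = sum(x), sum(y)
sxy = sum(xi * yi for xi, yi in zip(x, y))
sxx = sum(xi * xi for xi in x)

b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # slope estimate
a = sy / n - b * sx / n                          # intercept estimate

print(f"fitted line: y-hat = {a:.3f} + {b:.3f} x")
```

With these data the fitted slope is close to 2 and the intercept close to 0, as expected from how the data were built.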
9.8 Causes of Deviation of the Fitted Value from the Observed Value
In reality, the fitted value ŷi given by the regression equation (9.28) will always show deviations from the
observed value yi, for reasons that are briefly explained below.
1. Omission of relevant variables from the model
It is always difficult to include all the explanatory variables that affect or influence the response variable
in a single model. This is because of the complexity of the real-life situations as well as the need to keep
the model as simple as possible. Thus, several explanatory variables that affect a given phenomenon in
one way or the other may not be recognized and included in the model.
2. Error of specification
The deviation of a fitted value from the observed value could also occur due to imperfect specification of
a relationship. Quite often, a non-linear relationship is represented in a linear form. Again, some
phenomena need to be studied using several equations solved simultaneously. If these phenomena are
studied with a single model, error of specification is bound to occur.
3. Error of aggregation
In collecting data for statistical analysis, it is often the practice to add data from different groups with
dissimilar characteristics. For instance, in studies that involve social behaviour, since the attitudes of
an individual may differ from those of any group, lumping their data together as a unit for analysis could bring
deviations of the fitted value of the response variable from the observed value.
4. Error of measurement
This error arises from the method of data collection and processing. In data collection, a wrong sampling
technique could cause an error in measurement. Equally, the use of an inappropriate statistical process
in processing statistical information could cause deviations of observations from the fitted line.
5. Inclusion of irrelevant variables without theoretical underpinning
Differences or discrepancies between the value of the fitted response and the raw or observed value could
arise in situations where irrelevant variables are included in the model.
Example 9.3 The table below gives the paired values of variables X and Y
Table 9.5: Data for Example 9.3
X 16 22 28 24 29 25 16 23 24
Y 35 42 57 40 54 51 34 47 45
(1). Find the regression line of Y on X. (2) Predict the value of y when x =31
Solution:
In this example, we wish to make use of equation (9.28).
The values of the terms needed to compute the estimates â and b̂ are obtained in the
table below:
Table 9.6: Computation of data for Example 9.3
S/N xi yi (xi − x̄) (yi − ȳ) (xi − x̄)² (xi − x̄)(yi − ȳ)
1 16 35 -7 -10 49 70
2 22 42 -1 -3 1 3
3 28 57 5 12 25 60
4 24 40 1 -5 1 -5
5 29 54 6 9 36 54
6 25 51 2 6 4 12
7 16 34 -7 -11 49 77
8 23 47 0 2 0 0
9 24 45 1 0 1 0
Total 207 405 0 0 166 271
n = 9; Σxi = 207, x̄ = 207/9 = 23; Σyi = 405, ȳ = 405/9 = 45.

(1) b̂ = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² = 271/166 = 1.6325

â = ȳ − b̂x̄ = 45 − (1.6325)(23) ≈ 7.452

Therefore ŷ = 7.452 + 1.6325x.

(2) Predict the value of y when x = 31:

ŷ = 7.452 + 1.6325(31) ≈ 58.06
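The arithmetic of Example 9.3 can be verified with a short script using the deviation-form estimator:

```python
# Verification of Example 9.3: fit the regression of Y on X using
# b = S(x - xbar)(y - ybar) / S(x - xbar)^2 and a = ybar - b*xbar,
# then predict y at x = 31.

x = [16, 22, 28, 24, 29, 25, 16, 23, 24]
y = [35, 42, 57, 40, 54, 51, 34, 47, 45]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n            # 23 and 45

sxx = sum((xi - xbar) ** 2 for xi in x)        # 166
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))   # 271

b = sxy / sxx                                   # 271/166
a = ybar - b * xbar

print(f"y-hat = {a:.4f} + {b:.4f} x")
print(f"prediction at x = 31: {a + b * 31:.2f}")   # about 58.06
```

The script reproduces the table totals (166 and 271) and the predicted value 58.06 obtained above.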
9.9 Standard Errors of the Estimated Parameters for Simple Linear Regression
The standard errors of the estimated parameters are given as follows:
For parameter a, we have:

SE(â) = s · √[1/n + x̄² / Σ(xi − x̄)²],  (9.29)

and for parameter b:

SE(b̂) = s / √Σ(xi − x̄)²,  (9.30)

where s = √[Σ(yi − ŷi)² / (n − 2)] is the residual standard deviation.
Assignment 9.3
(a) Obtain the parameters of the regression of Y on X using the data in the table below:
X 25 21 19 18 20 29 32 26 42 30 22
Y 19 17 15 11 15 20 24 19 30 20 15
(b) Predict the value of y when x is 35 (c) obtain (i) (ii) (iii)
SOLUTION
9.10 Hypothesis Testing of the Parameters for Simple Linear Regression
A hypothesis is an assertion or conjecture about any chosen parameter (e.g. mean) of the population.
In case we are considering more than one population, a hypothesis may be about the relationship
between the similar parameters of the distributions.
A hypothesis can take two forms, namely, null and alternative hypotheses.
i. Null hypothesis: A null hypothesis is the hypothesis which is actually to be tested for
acceptance or rejection. It is denoted by Ho.
ii. Alternative hypothesis: An alternative hypothesis is a statement about the population
parameter(s) which gives an alternative to the null hypothesis, within the range of pertinent values of
the parameter. It is denoted by H1 or Ha.
In this section, three common tests shall be considered. These are:
i. the standard error test
ii. the t-test and
iii. the F-test.
During hypothesis testing, we are faced with the task of finding out whether the parameter estimates
are statistically significant or not. Hence it entails determining whether the independent variable(s)
significantly affect the dependent variable or not.
(a) If the standard error SE(â) or SE(b̂) is greater than half of the corresponding estimate, that is, if
SE(θ̂) > θ̂/2, where θ̂ is either â or b̂, we accept the null hypothesis and conclude that θ̂ is not statistically significant.
(b) If SE(θ̂) < θ̂/2, we reject the null hypothesis and conclude that θ̂ is statistically significant.
The t-test is usually used when the number of observations n < 30. Here, reference is made to the
degrees of freedom at the chosen level of significance α. The calculated t-value is compared with the
theoretical or table value of t, t(α/2, n − 2), at that level of significance with (n − 2) degrees of freedom.
The calculated t-values are given by:

t_a = â / SE(â),  (9.32)

and

t_b = b̂ / SE(b̂),  (9.33)

where SE(â) and SE(b̂) are obtained from equations (9.29) and (9.30), respectively.
Decision Rule
(1) If |t_calc| > t(α/2, n − 2), we reject Ho and conclude that the estimate is statistically significant,
that is, the parameter is different from zero.
(2) If |t_calc| < t(α/2, n − 2), we accept the null hypothesis (Ho) and conclude that the estimate is not
statistically significant, that is, the parameter is not different from zero.
Accepting Ho for parameter b means that the regression coefficient of Y on X has no
practical significance, that is, the change in Y corresponding to a unit change in X is practically
meaningless.
For parameter a, accepting Ho means that the regression line passes through the origin, that is, the
regression line does not cut the Y axis.
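The standard errors (9.29) and (9.30) and the t-ratios (9.32) and (9.33) can be sketched in Python. The data of Example 9.3 are reused here purely for illustration; the significance verdicts printed below follow from those data and are not results stated in the text:

```python
# Standard errors and t-tests for the parameters of a simple linear
# regression: SE(a) = s*sqrt(1/n + xbar^2/Sxx), SE(b) = s/sqrt(Sxx),
# t = estimate / SE. Data reused from Example 9.3 for illustration.
import math

x = [16, 22, 28, 24, 29, 25, 16, 23, 24]
y = [35, 42, 57, 40, 54, 51, 34, 47, 45]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b = sxy / sxx
a = ybar - b * xbar

# Residual standard deviation: s^2 = sum of squared residuals / (n - 2)
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

se_a = s * math.sqrt(1 / n + xbar ** 2 / sxx)
se_b = s / math.sqrt(sxx)

t_a, t_b = a / se_a, b / se_b
t_table = 2.365          # two-tailed 5% point of t with n - 2 = 7 d.f.

print(f"t_a = {t_a:.3f}, t_b = {t_b:.3f}")
for name, t in (("a", t_a), ("b", t_b)):
    verdict = "significant" if abs(t) > t_table else "not significant"
    print(f"parameter {name} is {verdict} at the 5% level")
```

With these data the slope turns out significant while the intercept does not, illustrating that the two parameters are tested separately.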
Source of Deviation | DF | Sum of Squares | Mean Square | F-value
Due to Regression | 1 | SSR = b̂ · Σ(xi − x̄)(yi − ȳ) | MSR = SSR/1 | F = MSR/MSE
Dev. from Regression | n − 2 | SSE = Σ(yi − ȳ)² − b̂ · Σ(xi − x̄)(yi − ȳ) | MSE = SSE/(n − 2) |
Total | n − 1 | SST = Σ(yi − ȳ)² | |
The confidence limits for the parameters are given by:

CL = â ± t(α/2, n − 2) · SE(â),  (9.34)

CL = b̂ ± t(α/2, n − 2) · SE(b̂),  (9.35)
where t(α/2, n − 2) is the table value for a two-tailed t-test at the α level of significance with (n − 2) degrees of
freedom.
Example 9.4: The table below gives 20 paired observations on the variables X and Y.
S/N X Y
1 36.6 54.8
2 39.5 57.6
3 43.4 58.1
4 47.6 63.4
5 53.4 72.5
6 58.5 78.4
7 66.1 82.7
8 74.9 84.4
9 87.1 90.3
10 100.0 100.0
11 115.1 109.2
12 131.7 119.8
13 150.0 129.7
14 162.6 140.8
15 176.3 153.8
16 190.4 152.6
17 209.4 153.2
18 233.6 163.0
19 255.7 175.3
20 271.4 184.3
The deviations are computed in a table with columns:

S/N X Y (X − X̄) (Y − Ȳ) (X − X̄)² (Y − Ȳ)² (X − X̄)(Y − Ȳ)

(a) b̂ = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² ≈ 0.55, with n = 20,

â = Ȳ − b̂X̄ ≈ 41.96

Therefore ŷ = 41.96 + 0.55X.
(b) The standard errors of the estimates are obtained from (9.29) and (9.30):

(i) s² = Σ(Yi − Ŷi)² / (n − 2),

(ii) SE(â) = s · √[1/n + X̄² / Σ(Xi − X̄)²],

(iii) SE(b̂) = s / √Σ(Xi − X̄)²,

with the numerical values computed from the column totals of the table above.
To determine the significance of â and b̂ using the t-test at the 5% level of significance, we proceed
as follows:
For the parameter estimate â, we have t_a = â/SE(â).
The calculated value of t is greater than the tabulated value of t at the 5% level of significance with 18 degrees
of freedom, which is given as 2.101.
Hence, we reject Ho, meaning that â is significant. The practical implication of this conclusion is that
the regression line does not pass through the origin.
For the parameter estimate b̂, we have t_b = b̂/SE(b̂).
The calculated value of t is greater than the tabulated value of t at the 5% level of significance
with 18 degrees of freedom, which is given as 2.101.
Hence, we reject Ho, meaning that b̂ is significant. The practical implication of the conclusion is that
X plays a significant role in determining Y.
For â = 41.96, the 95% confidence limits for a are given as:

CL = â ± t(0.025, 18) · SE(â)
CL = 41.96 ± 6.01

This implies that the upper limit is 47.97 and the lower limit is 35.96.
For b̂ = 0.55, the 95% confidence limits for b are given as:

CL = b̂ ± t(0.025, 18) · SE(b̂)
CL = 0.55 ± 0.0408

This implies that the upper limit is 0.5908 and the lower limit is 0.5092.
9.11 Curvilinear Regression
When we have nonlinear relations, we often assume an intrinsically linear model and then we fit data
to the model using polynomial regression. That is, we employ some models that use regression to fit
curves instead of straight lines. The technique is known as curvilinear regression analysis. To use
curvilinear regression analysis, we test several polynomial regression equations. Polynomial equations
are formed by taking our independent variable to successive powers. For example, we could have
Y = a + b1X + b2X²,  Quadratic Model
Y = a + b1X + b2X² + b3X³,  Cubic Model
In general, the polynomial equation is referred to by its degree, which is the number of the largest
exponent. In the above, the quadratic is of the second degree, the cubic is of the third degree, and so
on.
The function of the power terms is to introduce bends into the regression line. With simple linear
regression, the regression line is straight. With the addition of the quadratic term, we can introduce or
model one bend. With the addition of the cubic term, we can model two bends, and so forth. The
Figure below shows an example of a quadratic regression which is plotted for the model
for the values of x from 0 to 5:
The plot of the cubic model given by is shown in the Figure below:
Fig. 9.7: Graph of a cubic model
Notice that there is a single bend in the quadratic curve as against two for the cubic curve, because of
the effect of the X² term in the quadratic model and of the X² and X³ terms in the cubic
model.
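A quadratic fit can be sketched without any statistics library by solving the 3 × 3 normal equations directly. The data below are generated (noise-free) from y = 2 + 3x + 0.5x², so the fit should recover those coefficients; this is purely an illustration of the mechanics:

```python
# Fitting a quadratic model y = a + b1*x + b2*x^2 by least squares.
# The normal equations form a 3x3 linear system, solved here by
# Gaussian elimination with partial pivoting.

def solve3(m, v):
    """Solve a 3x3 system m @ c = v by Gaussian elimination with pivoting."""
    a = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    c = [0.0] * 3
    for i in (2, 1, 0):
        c[i] = (a[i][3] - sum(a[i][j] * c[j] for j in range(i + 1, 3))) / a[i][i]
    return c

xs = [0, 1, 2, 3, 4, 5]
ys = [2 + 3 * x + 0.5 * x ** 2 for x in xs]    # noise-free quadratic data

s = [sum(x ** k for x in xs) for k in range(5)]              # s[k] = sum of x^k
t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]

# Normal equations for the coefficient vector [a, b1, b2]
m = [[s[0], s[1], s[2]],
     [s[1], s[2], s[3]],
     [s[2], s[3], s[4]]]
a0, b1, b2 = solve3(m, t)
print(f"y-hat = {a0:.3f} + {b1:.3f} x + {b2:.3f} x^2")
```

The cubic case works the same way with a 4 × 4 system; in practice a library routine such as `numpy.polyfit` does this in one call.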
Assignment 9.4
Given the table below:
X 19 11 10 14 22 9 13 23 11
Y 10 12 8 15 20 7 10 17 12
SOLUTION
Chapter Ten
Research is an important part of academic, scientific, social, medical, economic, industrial and
management studies. The objectives of research include finding out the cause and effect among
variables and establishing the relationships between them. One of the tools for achieving
these objectives is the design of experiments.
10.1.1 Data
Data are numerical values of a variable collected on individuals or units in the course of carrying out
experiments.
10.1.4 Treatment
A treatment is equivalent to a factor level in a single-factor analysis. However, in multi-factor
analysis, a treatment is equivalent to a combination of factor levels. For instance, in the study of the
effect of carbohydrate on the weight gain in birds, each of the factor levels (that is wheat, dried
cassava peels and maize) is a treatment under a single-factor analysis.
In considering a treatment as a combination of factor levels, suppose we wish to study the effect of
different breeds of chicken and different sources of carbohydrates on the weight gain in chicken. We
may decide to use three different breeds of chicken and three sources of carbohydrates, and this will
give a total of nine (3 x 3 = 9) treatments. This is so because each of the sources of carbohydrates will
combine with each breed of chicken to form a treatment. This combination of factor levels can be
illustrated as follows:
Sources of carbohydrate (SC):
1. Dried Cassava peels (C)
2. Wheat (W)
3. Maize (M)
Breeds of chicken (BC): P, S and B.

BC \ SC   C    W    M
P         PC   PW   PM
S         SC   SW   SM
B         BC   BW   BM
The treatments in this experiment are: PC, PW, PM, SC, SW, SM, BC, BW and BM
10.1.5 Dependent Variable
A dependent variable is one which is used to measure the effects of the various treatments/conditions
on the individuals or experimental units. Measurements of yields of crops in fertilizer experiments and
the volume of milk produced by cows in feeding experiments are examples of dependent variables.
10.1.8 Experimental Error
These are errors caused by extraneous factors in an experiment which are beyond the control of, or not
controlled by, the researcher. For example, if on two or more plots, seeds are sown on the same day
and are treated similarly in all respects, their yields will still not be the same. This difference in yields
forms part of the experimental error for the particular study.
10.1.10 Randomization
This is the process in which each and every experimental unit has the same chance of being allocated
to treatments. This is best performed with the help of a random number table. Randomization
eliminates researcher's bias. It also ensures that no treatment will continually be favoured or
handicapped by extraneous sources of variation which are out of the control of the researcher. With
randomization, statistical inferences drawn are based on the assumption that errors are
independently and normally distributed.
10.1.11 Replication
This is the repetition of each treatment on several experimental units. Replication provides an estimate
of the experimental error variance, which is the basic requirement for testing the significance of
treatment differences and for finding the estimate of the standard error of a treatment mean, given by
√(s²/r), where s² is the error mean square and r is the number of replications of the treatment.
(ii) Testing effects: The effects which are caused by taking measurements on the dependent
variable before and after the application of the treatment on the test units are called testing effects.
They are further classified into two types, namely, main testing effects (MT) and
interactive testing effects (IT). The effect that occurs when a prior measurement (m1) directly affects a
later measurement (m2) is the main testing effect. An effect in which a prior measurement (m1) has an
influence on the independent variable, and hence affects the test unit response, is called an interactive
testing effect.
(iii) Instrumentation (I): Any changes made in the calibration of the measuring instrument during
interviews cause threats to validity of the experiment.
(iv) Selection Bias: The differential selection of subjects (individuals or group of individuals) to
experimental and control groups is a major threat to internal validity. Selection bias is almost
eliminated if the subjects are assigned to control and experimental groups randomly or matching the
members of both groups on key factors.
(v) Experimental mortality: This term refers to both the death of the experimental units and changes in
the composition of the study groups during experimentation. Such changes often occur due to deaths
of subjects or some subjects refuse to continue in the experiment. This alters the distribution of the
subjects and one cannot ensure that the units lost would have responded to the treatments in the same
manner as the remaining ones.
(vi) Resentful Demoralization: If the subjects come to know that the treatment levels assigned to
them are inferior in terms of benefits, goods or services, they may feel demoralized and hence show
their resentment. They are very likely to perform poorly. This will create a larger difference than the
real one between desirable and undesirable treatment levels. Confidentiality of treatment levels should be
maintained to reduce this type of threat to the validity of the experiment.
(vii) Reactive Factors: If the subjects know that they are being observed, they will act or respond
differently than those who are not being observed. So, the results from the experiment cannot be
generalized.
(viii) Low Statistical Power: If the power of a test is low, then a researcher will very likely fail to reject
a false null hypothesis. The reason for this could be an inadequate sample size, inability to control
nuisance variables, or the use of an inappropriate test.
(ix) Violation of Assumptions: All tests are based on certain assumptions. If such assumptions are
violated, then there is every likelihood that incorrect inferences would be drawn.
(x) Reliability of Measures: If the reliability of measurements on the dependent variables is low, then
there is bound to be increase in error variance which may lead to the acceptance of a false null
hypothesis.
(v) Each design has a fixed procedure of analysis; hence, the chances of researcher's bias are
negligible.
(vi) Threats to the validity of the experiments are fewer compared to other types of designs.
The two-factor classification involves three variables: one dependent variable and two independent variables (factors). It is applied to determine the effects of the two factors on the dependent variable. This technique therefore enables us to estimate not only the separate effects of the factors (independent variables) but also their joint (interaction) effect on the dependent variable.
Consider the study of the effects of fertilizer and soil type on the yield of rice. There are two
independent variables or factors (fertilizer and soil type) and one dependent variable (rice yield).
Therefore, the effects of these two factors (fertilizer and soil type) can be analyzed using the two
factor analysis of variance technique.
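The fertilizer-and-soil decomposition can be sketched numerically. The yields below are hypothetical values invented for illustration (not data from the text), laid out with one plot per fertilizer-soil combination; with a single observation per cell, the residual plays the role of the error term:

```python
# Hypothetical rice yields (one plot per cell): rows = fertilizer levels,
# columns = soil types. The numbers are illustrative only.
yields = [
    [4.2, 5.1, 3.8],   # fertilizer F1 on soils S1, S2, S3
    [5.0, 6.2, 4.4],   # fertilizer F2 on soils S1, S2, S3
]

a = len(yields)          # number of fertilizer levels
b = len(yields[0])       # number of soil types
N = a * b
grand_total = sum(sum(row) for row in yields)
CF = grand_total ** 2 / N                     # correction factor G^2 / N

row_totals = [sum(row) for row in yields]
col_totals = [sum(yields[i][j] for i in range(a)) for j in range(b)]

ss_total = sum(x * x for row in yields for x in row) - CF
ss_fertilizer = sum(t * t for t in row_totals) / b - CF    # "due to A"
ss_soil = sum(t * t for t in col_totals) / a - CF          # "due to B"
ss_error = ss_total - ss_fertilizer - ss_soil              # residual

print(round(ss_fertilizer, 4), round(ss_soil, 4), round(ss_error, 4))
```

The three component sums of squares add back up to the total SS, which is exactly the additive decomposition the two-factor technique relies on.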
[Table 10.2: two-factor ANOVA table. Sources of variation: Due to A, Due to B, Error, and Total, each with its degrees of freedom (df) and sum of squares (SS).]
In Table 10.2, the error degrees of freedom are obtained by subtracting the component df from the total df. Similarly, the error SS is obtained by subtracting the component SS from the total SS, whereas the total SS is calculated by summing the square of each individual value and subtracting from it a quantity known as the correction for mean or correction factor (CF). Hence, the sums of squares for testing the equality of population means are as follows.
Suppose we have observations in $k$ random samples of sizes $n_1, n_2, \ldots, n_k$ from $k$ normal populations $N(\mu_i, \sigma^2)$ for $i = 1, 2, \ldots, k$. Let $x_{ij}$ denote the $j$-th observation in the $i$-th sample, $T_i = \sum_j x_{ij}$ the $i$-th sample total, and $N = \sum_i n_i$. Then

$G = \sum_i \sum_j x_{ij}$ (grand total), (10.1)
$CF = G^2/N$, (10.2)
Total SS $= \sum_i \sum_j x_{ij}^2 - CF$, (10.3)
Between samples SS $= \sum_i T_i^2/n_i - CF$, (10.4)
Error SS $=$ Total SS $-$ Between samples SS $= \sum_i \sum_j x_{ij}^2 - \sum_i T_i^2/n_i$, (10.5)
$F = \dfrac{\text{Between samples SS}/(k-1)}{\text{Error SS}/(N-k)}$. (10.6)

The corresponding degrees of freedom are $k-1$ between samples, $N-k$ for error, and $N-1$ in total.
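A minimal numerical sketch of these sums of squares follows; the three samples below are hypothetical values chosen only to exercise the formulas, not data from the text:

```python
# Hypothetical one-way layout: k = 3 samples of unequal sizes.
samples = [
    [12.0, 14.5, 11.8, 13.2],
    [15.1, 16.0, 14.7],
    [10.2, 11.1, 10.8, 9.9, 10.5],
]

k = len(samples)
N = sum(len(s) for s in samples)
G = sum(sum(s) for s in samples)            # grand total
CF = G ** 2 / N                             # correction factor G^2 / N

total_ss = sum(x * x for s in samples for x in s) - CF
between_ss = sum(sum(s) ** 2 / len(s) for s in samples) - CF
error_ss = total_ss - between_ss            # Total SS - Between SS

# F ratio on (k - 1, N - k) degrees of freedom
F = (between_ss / (k - 1)) / (error_ss / (N - k))
print(round(between_ss, 3), round(error_ss, 3), round(F, 3))
```

The calculated F would then be compared with the tabulated F value on $(k-1, N-k)$ degrees of freedom at the chosen significance level.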
In ANOVA, an experimenter tests the null hypothesis of equality of the $k$ treatment mean effects. That is, $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ vs $H_1$: at least two of them are not equal, i.e., $\mu_i \neq \mu_j$ for some $i \neq j$.
The test employed for $H_0$ is the F-test. If the calculated F is not significant, it is inferred that all treatments are equally effective and no further test is required. But if the calculated F for treatments is significant, then $H_0$ is rejected, implying that the treatment mean effects are not all equal: some may be effective and others not. There is then a need to test the significance of all pairs of treatment means. One good example of such a test is Dunn's multiple comparison test.
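Dunn's procedure is essentially a Bonferroni-type correction spread across all treatment pairs. The sketch below shows only that bookkeeping (the choice of k = 4 treatments and alpha = 0.05 is an example, not from the text):

```python
# Dunn (Bonferroni) adjustment: with k treatments there are k*(k-1)/2
# pairwise comparisons; testing each at alpha / m keeps the overall
# (family-wise) error rate at about alpha.
def dunn_per_comparison_alpha(k, alpha=0.05):
    m = k * (k - 1) // 2          # number of treatment pairs
    return m, alpha / m

m, a_pc = dunn_per_comparison_alpha(4, 0.05)
print(m, a_pc)   # 4 treatments -> 6 pairs, each tested at 0.05/6
```

Each of the m pairwise mean differences is then tested at the reduced level, so that a significant overall F is followed up without inflating the error rate.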
The above ANOVA table is for a one-way classification. It may be extended to two-or-more-way classifications; in that situation, the component rows of the ANOVA table increase accordingly.
Example 10.1: The table below shows the gain in body weight (kg) per cow during four grazing treatments. Test the hypothesis at the 5% level of significance that the mean gains in weight of cows under the four treatments are equal.
[Data table: gain in body weight (kg) per cow under treatments T1, T2, T3 and T4, with treatment totals. The table is not fully recoverable from the source; surviving values include the row 63.2, 58.6, 59.2, 52.0 and a total of 81.4.]
Solution
We can test the hypothesis by the F-test using ANOVA.
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$ vs $H_1$: at least two means are different.
For the given data, the grand total is $G = 1286.5$, so $CF = G^2/N$, Total SS $= \sum_i \sum_j x_{ij}^2 - CF$, Between treatment SS $= \sum_i T_i^2/n_i - CF$, and Error SS $=$ Total SS $-$ Between treatment SS. [The numerical working is not recoverable from the source.]
Since the calculated value of F is less than its table value, $H_0$ is not rejected. We therefore conclude that the mean gains in weight of cows under the four grazing treatments are not significantly different at the 5% level of significance.
Assignment 10.1
The table below shows the gain in body weight (kg) per cow during four grazing treatments. Test the hypothesis at the 5% level of significance that the mean gains in weight of cows under the four treatments are equal.
[Data table: gain in body weight (kg) per cow under treatments T1, T2, T3 and T4, with treatment totals. The table is not fully recoverable from the source; surviving fragment: 7, 21.9.]
SOLUTION