SMA 140 Lecture Notes 2024 Sep
STATISTICS
November 7, 2024
Lecture One: Introduction
1. Introduction to statistics
The word statistics is derived from the Latin word “status” or the Italian word “statista”, both of which mean “political state” or government. Early applications of statistical thinking revolved around the needs of states to base policy on demographic and economic data.
1.1 Definitions
Data/Data set – Set of values collected or obtained when gathering information on some issue of interest.
Examples
4) The yields of a certain crop obtained after applying different types of fertilizer.
Statistics: a branch of science that deals with the collection, presentation, analysis and interpretation of data.
The definition points out four key aspects of statistics, namely collection, presentation, analysis and interpretation.
Statistics in the above sense refers to the methodology used to draw meaningful information from a data set. This use of the term should not be confused with statistics (referring to a set of numerical values) or a statistic (referring to a measure of description obtained from a data set).
Descriptive Statistics – Collection, organization, summarization and presentation of data.
Population – All subjects possessing a common characteristic that is being studied.
Examples
2) The collection of all cars of a certain type manufactured during a particular month.
1) Study of the entire population carried out by the government every 10 years.
A census is usually very costly and time consuming. It is therefore not carried out very often. A study of a
population is usually confined to a subgroup of the population.
Sample – A subgroup or subset of the population.
The number of values in the sample (sample size) is denoted by n. The number of values in the popula-
tion (population size) is denoted by N.
Statistical Inference – Generalizing from samples to populations and expressing the conclusions in the
language of probability (chance).
Discrete variables – Variables that can assume a finite or countable number of possible values. Such
variables are usually obtained by counting.
Examples
3) A person’s response (agree, not agree) to a statement. A one (1) is recorded when the person agrees
with the statement, a zero (0) is recorded when a person does not agree.
Continuous variables – Variables that can assume an infinite number of possible values. Such variables
are usually obtained by measurement.
Examples
Ordinal scale – Level of measurement which classifies data into categories that can be ordered or ranked. The differences between the ranks are not meaningful, since they cannot be measured.
A variable can be treated as ordinal when its values represent categories with some intrinsic order or ranking.
Examples
1) Levels of service satisfaction from very dissatisfied to very satisfied.
2) Attitude scores representing degree of satisfaction or confidence and preference rating scores (low,
medium or high).
3) Likert scale responses to statements (strongly agree, agree, neutral, disagree, strongly disagree).
Quantitative variables – Variables which assume numerical values.
Interval scale – Level of measurement which classifies data that can be ordered and ranked and where
differences are meaningful. However, there is no meaningful zero and ratios are meaningless.
Examples
1) The difference between a temperature of 100 degrees and 90 degrees is the same difference as that
between 90 degrees and 80 degrees. Taking ratios in such a case does not make sense.
2) When referring to dates (years) or temperatures measured (degrees Fahrenheit or Celsius) there is no
natural zero point.
Ratio scale – Level of measurement where differences and ratios are meaningful and there is a natural zero.
This is the “highest” level of measurement in terms of possible operations that can be performed on the data.
Examples
Variables like height, weight, mark (in test) and speed are ratio variables. These variables have a natural
zero and ratios make sense when doing calculations e.g. a weight of 80 kilograms is twice as heavy as one of
40 kilograms.
(i) Add up the given quantities and let S be the sum of the values.
(ii) For each quantity x, calculate the representative angle and percentage as $\frac{x}{S} \times 360^\circ$ and $\frac{x}{S} \times 100\%$ respectively.
(iii) Draw a circle and divide it into sectors using the angles calculated in step ii above
(iv) Label the sector by the group represented and indicate the corresponding percentage.
Example
This frequency distribution shows the number of pounds of each snack food eaten during the Super Bowl.
Construct a pie graph for the data
Solution
Snack Potato chips Tortilla chips Pretzels Popcorn Snack nuts Total
Pounds (in millions) 11.2 8.2 4.3 3.8 2.5 30.0
Representative Angle 134 98 52 46 30 360
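Representative % 37.3 27.3 14.3 12.7 8.3 99.9
(The percentages add up to 99.9 rather than 100 because each one has been rounded to one decimal place.)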
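The angle and percentage rows can be checked with a short Python sketch. It simply applies x/S × 360 and x/S × 100 to the snack-food data above; the function name `pie_chart_table` is an illustrative choice, not part of any library.

```python
# Snack-food data from the example (pounds, in millions)
snacks = {
    "Potato chips": 11.2,
    "Tortilla chips": 8.2,
    "Pretzels": 4.3,
    "Popcorn": 3.8,
    "Snack nuts": 2.5,
}


def pie_chart_table(data):
    """Return {label: (angle in degrees, percentage)} using x/S*360 and x/S*100."""
    total = sum(data.values())  # S, the sum of all the values
    return {label: (x / total * 360, x / total * 100) for label, x in data.items()}


table = pie_chart_table(snacks)
for label, (angle, pct) in table.items():
    print(f"{label:15s} {angle:6.1f} deg {pct:5.1f}%")
```

Rounding the angles to whole degrees reproduces the 134, 98, 52, 46 and 30 used to draw the sectors.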
Figure 1:
Figure 2:
iii) Make the units that are used for the frequency equal in size.
When you analyze a Pareto chart, make comparisons by looking at the heights of the bars.
Example
The table shown here is the average cost per mile for passenger vehicles on state turnpikes. Construct a Pareto chart for the data.
Solution
Arrange the data from the largest to smallest according to frequency.
Figure 3:
2 Decide on the number of classes. Use Sturges’ rule, which states that
$$k = \text{Round-up}\left(1 + 1.44 \ln n\right)$$
For n = 50, $1 + 1.44 \ln 50 = 6.63$, which rounds up to k = 7 classes.
3 Calculate the class width such that (no. of classes) × (class width) > range.
Class width = 6
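$$\text{Upper class boundary} = \text{upper class limit} + \frac{d}{2}$$
$$\text{Lower class boundary} = \text{lower class limit} - \frac{d}{2}$$
where d is the gap between the upper class limit of one class and the lower class limit of the next class.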
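As a rough check of the number-of-classes step and the class-boundary rule, here is a minimal Python sketch. It assumes the 1 + 1.44 ln n form of Sturges’ rule used above and a gap d between consecutive class limits; the class limits in the usage lines are hypothetical values chosen only to illustrate the shift by d/2.

```python
import math


def sturges_classes(n):
    """Number of classes k = Round-up(1 + 1.44 ln n)."""
    return math.ceil(1 + 1.44 * math.log(n))


def class_boundaries(limits, d):
    """Turn (lower limit, upper limit) pairs into class boundaries.

    d is the gap between the upper limit of one class and the lower
    limit of the next, so each limit is moved outwards by d/2.
    """
    return [(lo - d / 2, hi + d / 2) for lo, hi in limits]


print(sturges_classes(50))  # 1 + 1.44 ln 50 = 6.63, rounded up to 7

# Hypothetical whole-number class limits (gap d = 1), for illustration only
limits = [(38, 41), (42, 45), (46, 49)]
print(class_boundaries(limits, d=1))
```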
Example 2 continued
The frequency distribution below includes the class boundaries.
Example 3
The monthly expenditures (thousands of rands) of 60 households are shown below. The values of this data
Class midpoints
The midpoint of a class ($x_{mid}$) can be calculated from
$$x_{mid} = \frac{\text{lower class boundary} + \text{upper class boundary}}{2}$$
1) For the frequency distribution in example 2 (temperature data), the class midpoints are given below.
2) For the frequency distribution in example 3, the class midpoints are given below.
classes midpoints
4.5 – 5.5 5
5.5 – 6.5 6
6.5 – 7.5 7
7.5 – 8.5 8
8.5 – 9.5 9
9.5 – 10.5 10
10.5 – 11.5 11
For this distribution lower (upper) class limit = lower (upper) class boundary for each of the classes.
A value that falls on the boundary of 2 classes is allocated to the higher of the two classes e.g. 5.50000 is
allocated to the class 5.5 – 6.5 (not 4.5 to 5.5).
Cumulative frequencies
The “less than” cumulative frequency of a class is the number of values in the sample that are less than or
equal to the upper class boundary of the class.
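A minimal Python sketch of “less than” cumulative frequencies, using the class boundaries and frequencies of the temperature data (example 2) as tabulated in the percentile section of these notes. The running total is produced with `itertools.accumulate`.

```python
from itertools import accumulate

# Class boundaries and frequencies of the temperature data (example 2)
boundaries = ["37.5-41.5", "41.5-45.5", "45.5-49.5", "49.5-53.5",
              "53.5-57.5", "57.5-61.5", "61.5-65.5"]
freqs = [4, 10, 8, 15, 9, 3, 1]

# "Less than" cumulative frequency: running total of the frequencies,
# i.e. the number of values below each upper class boundary
cum_freqs = list(accumulate(freqs))

for b, f, cf in zip(boundaries, freqs, cum_freqs):
    print(f"{b:10s} {f:3d} {cf:3d}")
```

The last cumulative frequency always equals the sample size, here 50.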
Examples
2) For the frequency distribution in example 3 (expenditure data) the cumulative frequencies are calcu-
lated as shown below.
Histogram
A histogram is the graphical representation of a frequency distribution. The frequency for each class is
represented by a rectangular bar with the class boundaries as base and the frequency as height.
Example
A histogram of the frequency distribution in example 2 (temperature data) is shown below.
Frequency polygon
This is also a graphical representation of a frequency distribution. For each class the class midpoint is plotted
against the frequency and the plotted points joined by means of straight lines.
Figure 4:
Example
For the temperature data the following values are plotted.
midpoint 35.5 39.5 43.5 47.5 51.5 55.5 59.5 63.5 67.5
f 0 4 10 8 15 9 3 1 0
Solution
Figure 5:
Figure 6:
Solution
Arrange the data in order and separate the data according to the first digit, as shown.
02
13, 14
20, 23, 25
31, 32, 32, 32, 32, 33, 36
43, 44, 44, 45
51, 52, 57
The plot above shows that the distribution peaks in the center and that there are no gaps in the data. For 7 of the 20 days, the number of patients receiving cardiograms was between 31 and 36. The plot also shows that the testing center treated from a minimum of 2 patients to a maximum of 57 patients in any one day.
Figure 7:
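The grouping by first digit can be reproduced with a small Python sketch that splits each value into a stem (tens digit) and a leaf (units digit). The 20 values are read off the ordered list in the solution; the helper name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

# Daily numbers of patients (20 days), taken from the ordered data above
values = [2, 13, 14, 20, 23, 25, 31, 32, 32, 32, 32, 33, 36,
          43, 44, 44, 45, 51, 52, 57]


def stem_and_leaf(data):
    """Map each stem (tens digit) to its sorted leaves (units digit)."""
    plot = defaultdict(list)
    for v in sorted(data):
        plot[v // 10].append(v % 10)
    return dict(plot)


plot = stem_and_leaf(values)
for stem, leaves in sorted(plot.items()):
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Note that a stem with no leaves would simply not appear in this sketch; a printed plot would normally still show such a stem so that gaps are visible.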
Figure 8:
Example
The number of stories in two selected samples of tall buildings in Atlanta and Philadelphia is shown. Con-
struct a back-to-back stem and leaf plot, and compare the distributions.
Solution
The final back-to-back stem and leaf plot looks like the one below
Figure 9:
Figure 10:
The buildings in Atlanta have a large variation in the number of stories per building. Although both
distributions are peaked in the 30- to 39-story class, Philadelphia has more buildings in this class. Atlanta
has more buildings that have 40 or more stories than Philadelphia does.
Lecture 2: Measures of location and central tendency
Objectives
By the end of this unit, you will be able to:
calculate the
Measures of location can further be classified into measures of central tendency and measures of relative
positioning (quartiles).
Notation
Let the symbol $x_i$ denote any of the n values $x_1, x_2, \ldots, x_n$ assumed by a variable X. The letter i in $x_i$ ($i = 1, 2, \ldots, n$) is called an index or subscript. The letters j, k, p, q or s can also be used.
Summation notation
$$x_1 + x_2 + \cdots + x_n = \sum_{i=1}^{n} x_i$$
Example
$$\sum_{i=1}^{n} X_i Y_i = X_1 Y_1 + X_2 Y_2 + \cdots + X_n Y_n$$
$$\sum_{i=1}^{n} aX_i = aX_1 + aX_2 + \cdots + aX_n = a(X_1 + X_2 + \cdots + X_n) = a\sum_{i=1}^{n} X_i$$
Arithmetic Mean
The arithmetic mean of a set of values $x_1, x_2, \ldots, x_n$, denoted $\bar{x}$ if the data set is a sample, is found by dividing the sum of the values by the number of values, i.e.
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Example 1
Find the mean of 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10.
Solution
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10}{10} = \frac{55}{10} = 5.5$$
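The calculation can be mirrored in a few lines of Python. `arithmetic_mean` is a hypothetical helper that applies (1/n)Σxᵢ directly; the standard library’s `statistics.mean` gives the same result.

```python
def arithmetic_mean(xs):
    """x-bar = (1/n) * (sum of the x_i)."""
    return sum(xs) / len(xs)


print(arithmetic_mean(range(1, 11)))  # 55 / 10 = 5.5
```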
Some important characteristics of the arithmetic mean are:
ii) The location of the mean is dependent upon the shape of the distribution. It may not always be
representative of the centre.
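iii) The arithmetic mean has mathematical properties that the other two averages (the median and the mode) do not have, such as the balance point property: the deviations from the mean sum to zero, $\sum (x_i - \bar{x}) = 0$.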
iv) Because of its mathematical properties, the mean lends itself to further mathematical analysis of data
sets unlike the other two averages.
v) The mean is used as a measure of the centre in statistical inference because these mathematical prop-
erties are important in that instance.
vi) The mean requires a quantitative variable measured at least at the interval level of measurement.
Exercise
Median
The median is the value in the data set which is such that half of the values in the data set are less than or equal to it and half are greater than or equal to it.
It is the value below which and above which half of the observations fall when they are ranked in order of size.
The position of the median is the $\left(\frac{n+1}{2}\right)^{\text{th}}$ value of the ordered data, where n is the number of data values in the sample.
If the number of values in the data set is even, then the median is the average of the two middle values.
Some important characteristics of the median are:
i) The median is representative of the data array because it is in the geometrical centre of the distribution.
It is the exact halfway point. Half the observations are less and half are greater than the median value.
ii) In a positively skewed data set, the median will be to the right of the mode and on negatively skewed
data sets to the left. Draw sketches to illustrate this relationship.
iii) The median is always unique for a data set.
iv) The median is useful for descriptive purposes when the data set is skewed because of the constancy
of its location. It is always exactly the middle observation in the data array when the array is rank
ordered and it is insensitive to outliers.
v) The median can be found for data that has an ordinal level of measurement or higher.
Examples
1) The marks of students in a geography test that has a maximum possible mark of 50 are given below
47, 35, 37, 32, 38, 39, 36, 34, 35
Find the median of this set of data values.
Solution
Arrange the data values in order from the lowest to the highest value:
32, 34, 35, 35, 36, 37, 38, 39, 47
The number of values n, in the data set = 9
$$\text{Median} = \left(\frac{9 + 1}{2}\right)^{\text{th}} \text{ value} = 5^{\text{th}} \text{ value} = 36$$
2) Consider the above data set with the first value (47) omitted
Arrange the data values in order from the lowest to the highest value:
32, 34, 35, 35, 36, 37, 38, 39
The number of values n in the data set is 8, which is an even number. The two middle values in the data set are in positions $\frac{n}{2} = \frac{8}{2} = 4$ and $\frac{n}{2} + 1 = 5$, i.e. the values 35 and 36.
$$\text{Median} = \frac{35 + 36}{2} = 35.5$$
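A minimal Python sketch of the median rule for raw data, covering both the odd case (the (n + 1)/2-th value) and the even case (the average of the two middle values). It reuses the geography marks from the two examples; `median` here is an illustrative helper.

```python
def median(xs):
    """Median of raw data: the ((n+1)/2)-th ordered value when n is odd,
    otherwise the average of the two middle values."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                 # (n+1)/2-th value (0-based index n//2)
    return (s[mid - 1] + s[mid]) / 2  # values in positions n/2 and n/2 + 1


marks = [47, 35, 37, 32, 38, 39, 36, 34, 35]
print(median(marks))      # n = 9 (odd)
print(median(marks[1:]))  # first value (47) omitted, n = 8 (even)
```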
$i = 1, 2, \ldots, 100$
$L_i$ = lower class boundary of the median class
$f_i$ = frequency of the median class
n = sample size
$F_{less}$ = sum of frequencies of the classes below the median class
c = class width
Mode
It is the value occurring most frequently in a data set. If each observation occurs the same number of times, then there is no mode. When two or more values occur most frequently, the data set is said to be multimodal.
Example
Find the mode of the following data set:
It can be determined for data at all levels of measurement (nominal, ordinal, interval and ratio).
It is not affected by extremely high or low values in the data set.
It can be used as a measure of central tendency in distributions with open-ended classes.
Some of the important characteristics of the mode are:
i) The mode may not be unique since a distribution may have more than one mode.
ii) There is no calculation required to find the mode since it is obtained by inspection of the data.
iii) If the mode is used as a representative of individual values in the array, it will be in error less often
than any other average used. If the size of this error is important such as in sizes in the manufacturing
business, the mode is a good representative of the data array.
iv) In a negatively skewed distribution, the mode is to the right of the midrange. In a positively skewed
distribution the mode is to the left of the midrange. Draw sketches to illustrate this relationship.
v) If what is to be portrayed is a typical value in an array, the mode is the most typical value because no other value occurs more often.
vi) For data measured at the nominal level, it is the only average that can be found.
Example 2
If a final examination is weighted 4 times as much as a quiz, a midterm examination 3 times as much as a
quiz, and a student has a final examination grade of 80, a midterm examination grade of 95 and quiz grades
of 90, 65 and 70, the weighted mean grade is
$$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum w_i x_i}{\sum w_i}$$
$$= \frac{1(90) + 1(65) + 1(70) + 3(95) + 4(80)}{1 + 1 + 1 + 3 + 4} = \frac{830}{10} = 83$$
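The weighted mean above can be checked with a short Python sketch. The weights 1, 1, 1, 3, 4 encode the quiz, midterm and final weighting from the example, and `weighted_mean` is an illustrative helper.

```python
def weighted_mean(values, weights):
    """x-bar_w = sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


grades = [90, 65, 70, 95, 80]  # three quizzes, midterm, final
weights = [1, 1, 1, 3, 4]
print(weighted_mean(grades, weights))  # 830 / 10
```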
Geometric mean
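For raw data, let $x_1, x_2, \ldots, x_n$ be the sample values; the geometric mean is given by the formula:
$$\bar{X}_{GM} = \sqrt[n]{x_1 \cdot x_2 \cdots x_n} = \sqrt[n]{\prod_{i=1}^{n} x_i}$$
For data in which each value $x_i$ occurs with frequency $f_i$, and $n = \sum f_i$,
$$\bar{X}_{GM} = \sqrt[n]{\prod_{i} x_i^{f_i}} \quad \Rightarrow \quad \log(\bar{X}_{GM}) = \frac{1}{n}\sum_{i} f_i \log x_i$$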
2) Find the harmonic and geometric mean of the frequency table below
x 13 14 15 16 17
f 2 5 13 7 3
$$\bar{X}_{GM} = \sqrt[n]{\prod_{i} x_i^{f_i}} = \sqrt[30]{13^2 \times 14^5 \times 15^{13} \times 16^7 \times 17^3} = 15.09837$$
where $n = 2 + 5 + 13 + 7 + 3 = 30$.
Harmonic mean
Let $x_1, x_2, \ldots, x_n$ be the sample values; the harmonic mean is given by the formula:
$$\bar{X}_{HM} = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{n}{\sum \frac{1}{x_i}}$$
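x 13 14 15 16 17
f 2 5 13 7 3
Solution
Each reciprocal $\frac{1}{x_i}$ is counted $f_i$ times, so
$$\bar{X}_{HM} = \frac{\sum f_i}{\sum \frac{f_i}{x_i}} = \frac{30}{\frac{2}{13} + \frac{5}{14} + \frac{13}{15} + \frac{7}{16} + \frac{3}{17}} = \frac{30}{1.99163} \approx 15.0631$$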
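A Python sketch for the frequency table above, assuming each value x occurs f times. The geometric mean is computed through logarithms, log(GM) = (1/n)Σ f log x, which avoids forming the very large product directly; the harmonic mean uses n / Σ(f/x).

```python
import math

x = [13, 14, 15, 16, 17]
f = [2, 5, 13, 7, 3]
n = sum(f)  # 30

# Geometric mean via logs: log(GM) = (1/n) * sum of f_i * log(x_i)
gm = math.exp(sum(fi * math.log(xi) for xi, fi in zip(x, f)) / n)

# Harmonic mean for frequency data: n / sum of (f_i / x_i)
hm = n / sum(fi / xi for xi, fi in zip(x, f))

print(round(gm, 5), round(hm, 4))
```

For positive data the harmonic mean never exceeds the geometric mean, which gives a quick sanity check on hand calculations.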
Measures of Variability
Dispersion
Averages, or measures of central tendency, give us an idea of the concentration of the observations about the central part of the distribution.
Measures of Dispersion
The following are the measures of dispersion:
(i) Range,
(ii) Quartile deviation or Semi-interquartile range,
(iii) Mean deviation, and
(iv) Standard deviation.
Range
The range is the difference between the two extreme observations of the distribution. If A and B are the greatest and smallest observations respectively in a distribution, then its range is A − B.
The range is the simplest but a crude measure of dispersion. Since it is based on the two extreme observations, which are themselves subject to chance fluctuations, it is not a reliable measure of dispersion.
Quartile deviation
Quartile deviation or semi-interquartile range Q is given by
$$Q = \frac{1}{2}(Q_3 - Q_1)$$
where $Q_1$ and $Q_3$ are the first and third quartiles of the distribution respectively.
Quartile deviation is definitely a better measure than the range as it makes use of 50% of the data. But
since it ignores the other 50% of the data, it cannot be regarded as a reliable measure.
Since mean deviation is based on all the observations, it is a better measure of dispersion than the range or the quartile deviation.
It may be pointed out here that mean deviation is least when taken from median.
Example 1
Find the quartile deviation and the mean absolute deviation for the following data.
Solution
Sorted data: 3, 5, 6, 6, 7, 9, 10, 12, 13, 13, 15
Recall $Q_1 = 6$ and $Q_3 = 13$.
$$\text{SIQR} = \frac{1}{2}(Q_3 - Q_1) = \frac{1}{2}(13 - 6) = 3.5$$
$$\bar{x} = \frac{3 + 5 + 6 + 6 + 7 + 9 + 10 + 12 + 13 + 13 + 15}{11} = \frac{99}{11} = 9$$
$$\text{MAD} = \frac{\sum |x - \bar{x}|}{n} = \frac{|3 - 9| + |5 - 9| + |6 - 9| + |6 - 9| + \cdots + |13 - 9| + |15 - 9|}{11} = \frac{36}{11} = 3.2727$$
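A minimal Python sketch of both measures for the sorted data above. It assumes the quartile convention used in these notes (Q1 and Q3 are the medians of the lower and upper halves, with the median left out when n is odd); other conventions, such as the interpolation used by many software packages, can give slightly different quartiles.

```python
def median(xs):
    s = sorted(xs)
    n, mid = len(s), len(s) // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2


def quartiles(xs):
    """Q1, Q3 as medians of the lower and upper halves of the ordered data
    (the middle value is excluded from both halves when n is odd)."""
    s = sorted(xs)
    half = len(s) // 2
    return median(s[:half]), median(s[-half:])


def mean_absolute_deviation(xs):
    """MAD = sum of |x - x-bar| divided by n."""
    xbar = sum(xs) / len(xs)
    return sum(abs(x - xbar) for x in xs) / len(xs)


data = [3, 5, 6, 6, 7, 9, 10, 12, 13, 13, 15]
q1, q3 = quartiles(data)
mad = mean_absolute_deviation(data)
print("SIQR =", (q3 - q1) / 2)
print("MAD  =", round(mad, 4))
```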
Example 2
Find the standard deviation of the data: 2, 4, 8, 7, 9, 4, 6, 10, 8, and 5.
Solution
$$\bar{X} = \frac{2 + 4 + 8 + 7 + 9 + 4 + 6 + 10 + 8 + 5}{10} = \frac{63}{10} = 6.3$$
$$\sum x^2 = 2^2 + 4^2 + 8^2 + 7^2 + 9^2 + 4^2 + 6^2 + 10^2 + 8^2 + 5^2 = 455$$
$$S^2 = \frac{1}{n}\sum x^2 - \bar{x}^2 = 45.5 - 6.3^2 = 5.81$$
$$\text{Standard deviation } S = \sqrt{S^2} = \sqrt{5.81} = 2.4104$$
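The same computation in Python, as a sketch that uses the shortcut S² = (1/n)Σx² − x̄² with divisor n, as in the example.

```python
import math


def std_dev(xs):
    """S = sqrt((1/n) * sum of x^2 - x-bar^2), i.e. divisor n."""
    n = len(xs)
    xbar = sum(xs) / n
    return math.sqrt(sum(x * x for x in xs) / n - xbar ** 2)


data = [2, 4, 8, 7, 9, 4, 6, 10, 8, 5]
print(round(std_dev(data), 4))
```

With divisor n this matches the standard library’s `statistics.pstdev`; `statistics.stdev` divides by n − 1 instead and gives a slightly larger value.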
Example 3
Estimate the mean, and standard deviation for the frequency table below:
Solution
A 204 68 150 30 70 95 60 76 24 19
B 99 190 130 94 80 89 69 85 65 40
Solution
The 9 deciles $D_1, D_2, \ldots, D_9$ are the values that have 10%, 20%, ..., 90% respectively of the values in the data set less than or equal to them.
$$D_1 = P_{10},\; D_2 = P_{20},\; \ldots,\; D_5 = P_{50} = \text{median},\; \ldots,\; D_9 = P_{90}$$
3) Divide the data set into 2 portions with equal numbers of values: set 1 consists of those values less than or equal to the median and set 2 consists of those values greater than or equal to the median. When the data set has an odd number of values, the median is excluded from the division of the data set into 2 portions.
4) The first quartile (Q1 ) is the median of set 1 and the third quartile (Q3 ) is the median of set 2.
Example
The distances from home to work (kilometers) of 11 employees at a certain company are shown below.
Calculate Q1 and Q3 .
6, 47, 49, 15, 42, 41, 7, 39, 43, 40, 36
1) Ordered data set: 6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49
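2) Median = 40. After this step the median is deleted from the data set.
3) Set 1 – 5 values less than the median, i.e. 6, 7, 15, 36, 39.
Set 2 – 5 values greater than the median, i.e. 41, 42, 43, 47, 49.
4) $Q_1$ = median of set 1 = 15, $Q_3$ = median of set 2 = 43.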
Example
Suppose the data set consists of the above values and 56 (12 values).
6, 47, 49, 15, 42, 41, 7, 39, 43, 40, 36, 56
1) Ordered Data Set: 6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49, 56
2) Median = $\frac{40 + 41}{2} = 40.5$. Unlike what was done in example 1, no values are deleted from the data set.
3) Set 1 – 6 values less than or equal to the median, i.e. 6, 7, 15, 36, 39, 40.
Set 2 – 6 values greater than or equal to the median, i.e. 41, 42, 43, 47, 49, 56.
4) $Q_1$ = median of set 1 = $\frac{15 + 36}{2} = 25.5$, $Q_3$ = median of set 2 = $\frac{43 + 47}{2} = 45$.
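Both worked examples can be reproduced with a short Python sketch of the halves method described above. The helpers `median` and `quartiles` are illustrative names; statistical software often uses interpolation rules and may report different quartiles for the same data.

```python
def median(xs):
    s = sorted(xs)
    n, mid = len(s), len(s) // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2


def quartiles(xs):
    """Split the ordered data into two halves (dropping the median when n is
    odd); Q1 and Q3 are the medians of the lower and upper halves."""
    s = sorted(xs)
    half = len(s) // 2
    return median(s[:half]), median(s[-half:])


distances = [6, 47, 49, 15, 42, 41, 7, 39, 43, 40, 36]
print(quartiles(distances))         # 11 values
print(quartiles(distances + [56]))  # 12 values
```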
The quartile deviation $Q = \frac{Q_3 - Q_1}{2}$ can also be used as a measure of variability.
For the data set in example 1, quartile deviation $Q = \frac{43 - 15}{2} = 14$.
The quartile deviation value shows the extent to which the values in the data set deviate from the
median.
For a skewed data set the quartile deviation is a more appropriate measure of variability than the standard deviation.
A formula for calculating the ith percentile Pi for grouped data is shown below.
$$P_i = L_i + \frac{c\left(\frac{n \times i}{100} - F_{less}\right)}{f_i}, \quad i = 1, 2, \ldots, 100$$
where
$L_i$ = lower class boundary of the percentile class
$f_i$ = frequency of the percentile class
n = sample size
$F_{less}$ = sum of frequencies of the classes below the percentile class
c = class width.
Example
For the frequency distribution of temperatures (example 2 of the frequency distributions – table given below),
the calculations of the median, first quartile, third quartile, 4th decile and 65th percentile are shown below.
class boundaries f cumulative frequency
37.5 – 41.5 4 4
41.5 – 45.5 10 14
45.5 – 49.5 8 22
49.5 – 53.5 15 37
53.5 – 57.5 9 46
57.5 – 61.5 3 49
61.5 – 65.5 1 50
Total 50
Median
The above formula with i = 50, n = 50 applies.
Step 1: Calculate the position of the median $= \frac{i \times n}{100} = \frac{50 \times 50}{100} = 25$.
Step 2: Median class (class that contains the 25th observation) is the class 49.5 – 53.5.
Step 3: $L_{50} = 49.5$, $f_{50} = 15$, $F_{less} = 22$, c = 4.
Step 4: Substitute into the above formula.
$$\text{Median} = 49.5 + \frac{(25 - 22) \times 4}{15} = 50.3$$
First quartile
The above formula with i = 25, n = 50 applies.
Step 1: Calculate the position of the first quartile $= \frac{i \times n}{100} = \frac{25 \times 50}{100} = 12.5$.
Step 2: First quartile class (class that contains the 12.5th observation) is the class 41.5 – 45.5.
Step 3: $L_{25} = 41.5$, $f_{25} = 10$, $F_{less} = 4$, c = 4.
Step 4: Substitute into the above formula.
$$Q_1 = 41.5 + \frac{(12.5 - 4) \times 4}{10} = 44.9$$
Third quartile
The above formula with i = 75, n = 50 applies.
Step 1: Calculate the position of the third quartile $= \frac{i \times n}{100} = \frac{75 \times 50}{100} = 37.5$.
Step 2: Third quartile class (class that contains the 37.5th observation) is the class 53.5 – 57.5.
Step 3: $L_{75} = 53.5$, $f_{75} = 9$, $F_{less} = 37$, c = 4.
Step 4: Substitute into the above formula.
$$Q_3 = 53.5 + \frac{(37.5 - 37) \times 4}{9} = 53.72$$
Fourth decile
The above formula with i = 40, n = 50 applies.
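Step 1: Calculate the position of the 4th decile $= \frac{i \times n}{100} = \frac{40 \times 50}{100} = 20$.
Step 2: 4th decile class (class that contains the 20th observation) is the class 45.5 – 49.5.
Step 3: $L_{40} = 45.5$, $f_{40} = 8$, $F_{less} = 14$, c = 4.
Step 4: Substitute into the above formula.
$$D_4 = 45.5 + \frac{(20 - 14) \times 4}{8} = 48.5$$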
65th Percentile
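The above formula with i = 65, n = 50 applies.
Step 1: Calculate the position of the 65th percentile $= \frac{i \times n}{100} = \frac{65 \times 50}{100} = 32.5$.
Step 2: 65th percentile class (class that contains the 32.5th observation) is the class 49.5 – 53.5.
Step 3: $L_{65} = 49.5$, $f_{65} = 15$, $F_{less} = 22$, c = 4.
Step 4: Substitute into the above formula.
$$P_{65} = 49.5 + \frac{(32.5 - 22) \times 4}{15} = 52.3$$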
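A Python sketch of the grouped-data percentile formula, applied to the temperature table above. It assumes classes of equal width listed in increasing order; `grouped_percentile` is a hypothetical helper that locates the class whose cumulative frequency first reaches n × i / 100 and then interpolates within it.

```python
def grouped_percentile(i, boundaries, freqs):
    """P_i = L + c * (n*i/100 - F_less) / f for grouped data.

    boundaries: list of (lower, upper) class boundaries, in increasing order
    freqs: the corresponding class frequencies
    """
    n = sum(freqs)
    position = n * i / 100
    f_less = 0  # cumulative frequency of the classes below the current one
    for (lower, upper), f in zip(boundaries, freqs):
        if f_less + f >= position:  # this class contains the position
            c = upper - lower
            return lower + c * (position - f_less) / f
        f_less += f
    raise ValueError("i must be between 0 and 100")


bounds = [(37.5, 41.5), (41.5, 45.5), (45.5, 49.5), (49.5, 53.5),
          (53.5, 57.5), (57.5, 61.5), (61.5, 65.5)]
freqs = [4, 10, 8, 15, 9, 3, 1]

for label, i in [("Median", 50), ("Q1", 25), ("Q3", 75), ("D4", 40), ("P65", 65)]:
    print(label, round(grouped_percentile(i, bounds, freqs), 2))
```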
Percentiles can also be read off from a “less than” ogive.
Example
The following cumulative frequency graph shows the distribution of marks scored by a class of 40 students
in a test.
Figure 11:
Objective: The objective of the present lesson is to impart the knowledge of measures of dispersion and
skewness and to enable the students to distinguish between average, dispersion, skewness, moments and
kurtosis.
4.1 Dispersion
Characteristics for an Ideal measure of Dispersion:
The characteristics of an ideal measure of dispersion are the same as those of an ideal measure of central tendency, viz.,
Spread is the degree of scatter or variation of the variable about the central value. e.g:
– the range,
– Inter-Quartile range,
– Quartile Deviation also called semi Inter-Quartile range,
– Mean Absolute Deviation,
– Variance and
– standard deviation.
Figure 12: Sketches showing general position of the Mean, Median and Mode of the population
is given by
$$\alpha_3 = \frac{\sum f (x - \bar{x})^3}{n S^3}$$
where $\bar{x}$ and S are the arithmetic mean and standard deviation of X respectively.
Generally, for any set of values $x_1, x_2, \ldots, x_n$, the moment coefficient of skewness is $\alpha_3 = \frac{\sum f (x - \bar{x})^3}{n S^3}$, where S is the standard deviation of X. It is worth noting that if $\alpha_3 < 0$ the distribution is negatively skewed, if $\alpha_3 > 0$ the distribution is positively skewed, and if $\alpha_3 = 0$ the distribution is symmetrical (as the normal distribution is).
Example
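Calculate the coefficient of skewness $\alpha_3$ for the data: 5, 6, 7, 6, 9, 4, 5
Solution
$$\bar{x} = \frac{1}{n}\sum x = \frac{42}{7} = 6$$
Standard deviation
$$S = \sqrt{\frac{1}{n}\sum (x - \bar{x})^2} = \sqrt{\frac{16}{7}} = \frac{4}{\sqrt{7}}$$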
x 5 6 7 6 9 4 5 Total
x − x̄ -1 0 1 0 3 -2 -1 0
(x − x̄)^2 1 0 1 0 9 4 1 16
(x − x̄)^3 -1 0 1 0 27 -8 -1 18
(x − x̄)^4 1 0 1 0 81 16 1 100
$$\alpha_3 = \frac{\sum f (x - \bar{x})^3}{n S^3} = \frac{18}{7} \times \left(\frac{\sqrt{7}}{4}\right)^3 = 0.744118$$
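A minimal Python sketch of the moment coefficient of skewness for raw (ungrouped) data, i.e. with every f equal to 1, and with S computed using divisor n as in the example.

```python
import math


def moment_skewness(xs):
    """alpha_3 = sum of (x - x-bar)^3 / (n * S^3), S with divisor n."""
    n = len(xs)
    xbar = sum(xs) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / n)
    return sum((x - xbar) ** 3 for x in xs) / (n * s ** 3)


data = [5, 6, 7, 6, 9, 4, 5]
print(round(moment_skewness(data), 6))
```

The positive value agrees with the long right tail created by the single large value 9.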
Karl Pearson’s coefficient of skewness is based upon the divergence of the mean from the mode in a skewed distribution. Recall the empirical relation between mean, median and mode which states that, for a moderately symmetrical distribution, we have
$$\text{Mean} - \text{Mode} = 3(\text{Mean} - \text{Median})$$
Measure of Kurtosis
It measures the peakedness of a distribution. If the values of x are very close to the mean, the peak is very high and the distribution is said to be leptokurtic. On the other hand, if the values of x are very far away from the mean, the peak is very low and the distribution is said to be platykurtic. Finally, if the x values are at a moderate distance from the mean, then the peak is moderate and the distribution is said to be mesokurtic.
Figure 13:
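$$\alpha_4 = \frac{\sum f (x - \bar{x})^4}{n S^4}$$
where $\bar{x}$ and S are the arithmetic mean and standard deviation of X respectively. For a mesokurtic distribution such as the normal distribution $\alpha_4 = 3$; $\alpha_4 > 3$ indicates a leptokurtic and $\alpha_4 < 3$ a platykurtic distribution.
Example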
Calculate the coefficient of Kurtosis α4 for the data: 5, 6, 7, 6, 9, 4, 5
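Solution
$$\bar{x} = \frac{1}{n}\sum x = \frac{42}{7} = 6$$
Standard deviation
$$S = \sqrt{\frac{1}{n}\sum (x - \bar{x})^2} = \frac{4}{\sqrt{7}}$$
x 5 6 7 6 9 4 5 Total
(x − x̄)^2 1 0 1 0 9 4 1 16
(x − x̄)^3 -1 0 1 0 27 -8 -1 18
(x − x̄)^4 1 0 1 0 81 16 1 100
$$\alpha_4 = \frac{\sum f (x - \bar{x})^4}{n S^4} = \frac{100}{7} \times \left(\frac{\sqrt{7}}{4}\right)^4 = 2.73438$$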
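The kurtosis coefficient can be sketched in Python the same way, again for ungrouped data (f = 1) and with S² computed using divisor n.

```python
def moment_kurtosis(xs):
    """alpha_4 = sum of (x - x-bar)^4 / (n * S^4), where S^2 = sum of (x - x-bar)^2 / n."""
    n = len(xs)
    xbar = sum(xs) / n
    var = sum((x - xbar) ** 2 for x in xs) / n  # S^2
    return sum((x - xbar) ** 4 for x in xs) / (n * var ** 2)


data = [5, 6, 7, 6, 9, 4, 5]
print(round(moment_kurtosis(data), 5))
```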
Lecture 4: Bivariate data
4.1 Introduction
So far we have confined our discussion to distributions involving only one variable. Sometimes, in practical applications, we come across a set of data where each item consists of the values of two or more variables. Bivariate data is a set of paired measurements of the form
$$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$$
Examples
ii) The series of sales revenue and advertising expenditure of the various branches of a company in a
particular year.
iii) The series of ages of husbands and wives in a sample of selected married couples.
In a bivariate data, each pair represents the values of the two variables. Our interest is to find a relationship
(if it exists) between the two variables under study.
Figure 14:
Example:
A study was conducted to find whether there is any relationship between the weight and blood pressure of
an individual. The following set of data was obtained from a clinical study. Let us determine the coefficient of correlation for this set of data. The rows below give the weight and the blood pressure of each of the 10 patients.
Weight 78 86 72 82 80 86 84 89 68 71
Blood Pressure 140 160 134 144 180 176 174 178 128 132
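Here n = 10, $\sum x = 796$, $\sum y = 1546$, $\sum x^2 = 63826$, $\sum y^2 = 243036$ and $\sum xy = 124206$. Thus
$$r = \frac{10(124206) - (796)(1546)}{\sqrt{10(63826) - (796)^2}\,\sqrt{10(243036) - (1546)^2}} = \frac{11444}{\sqrt{4644}\,\sqrt{40244}} \approx 0.8371$$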
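A Python sketch of the computational formula for r. It recomputes every sum directly from the weight and blood pressure data rather than copying the totals, which is a useful cross-check on hand calculations; `pearson_r` is an illustrative helper.

```python
import math


def pearson_r(xs, ys):
    """r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2))."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    num = n * sxy - sx * sy
    den = math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    return num / den


weight = [78, 86, 72, 82, 80, 86, 84, 89, 68, 71]
bp = [140, 160, 134, 144, 180, 176, 174, 178, 128, 132]
print(round(pearson_r(weight, bp), 4))
```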
Example
Calculate the correlation coefficient for the following heights (in inches) of fathers (X) and their sons (Y) :
X 65 66 67 67 68 69 70 72
Y 67 68 65 68 72 72 69 71
Solution
Figure 15:
X Y X2 Y2 XY
65 67 4225 4489 4355
66 68 4356 4624 4488
67 65 4489 4225 4355
67 68 4489 4624 4556
68 72 4624 5184 4896
69 72 4761 5184 4968
70 69 4900 4761 4830
72 71 5184 5041 5112
Total 544 552 37028 38132 37560
Subject 1 2 3 4 5 6 7 8 9 10
X 8.3 8.6 9.2 9.8 8.0 7.8 9.4 9.0 7.2 8.6
y 2300 2250 2380 2400 2000 2100 2360 2350 2000 2260
Solution
Note that in the x row two students have a grade point average of 8.6, and in the y row there is a tie at 2000. We arrange the data in descending order and rank them 1, 2, 3, ..., 10 accordingly. In case of a tie, the rank of each tied value is the mean of all the positions they occupy. In x, for instance, 8.6 occupies ranks 5 and 6, so each has rank $\frac{5 + 6}{2} = 5.5$.
Similarly, in y, 2000 occupies ranks 9 and 10, so each has rank $\frac{9 + 10}{2} = 9.5$. Now we come back to our formula
$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
X 8.3 8.6 9.2 9.8 8.0 7.8 9.4 9.0 7.2 8.6
y 2300 2250 2380 2400 2000 2100 2360 2350 2000 2260
Rank(X) 7 5.5 3 1 8 9 2 4 10 5.5
Rank(Y) 5 7 2 1 9.5 8 3 4 9.5 6
d 2 -1.5 1 0 -1.5 1 -1 0 0.5 -0.5
d2 4 2.25 1 0 2.25 1 1 0 0.25 0.25
$\sum d^2 = 12$. So here, n = 10 and
$$R = 1 - \frac{6(12)}{10(100 - 1)} = 1 - 0.0727 = 0.9273$$
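The ranking with ties and the value of R can be sketched in Python as follows. The helper `average_ranks` ranks in descending order and gives tied values the mean of the positions they occupy, as described above; no tie correction is applied to R, matching this example.

```python
def average_ranks(values):
    """Rank in descending order (largest = 1); tied values receive the mean
    of the positions they occupy."""
    order = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = order.index(v) + 1             # first position occupied by v
        count = order.count(v)                 # number of tied copies of v
        ranks.append(first + (count - 1) / 2)  # mean of the tied positions
    return ranks


def spearman(xs, ys):
    """R = 1 - 6 * sum(d^2) / (n (n^2 - 1)), without tie correction."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


x = [8.3, 8.6, 9.2, 9.8, 8.0, 7.8, 9.4, 9.0, 7.2, 8.6]
y = [2300, 2250, 2380, 2400, 2000, 2100, 2360, 2350, 2000, 2260]
print(average_ranks(x))
print(round(spearman(x, y), 4))
```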
Example
For a certain joint stock company, the prices of preference shares (X) and debentures (Y) are given below:
Use the method of rank correlation to determine the relationship between preference prices and debentures
prices.
Solution
Calculations for Coefficient of Rank Correlation
X Y RX RY d = RX − RY d^2
73.2 97.8 7 5 2 4
85.8 99.2 1 1 0 0
78.9 98.8 4 2 2 4
75.8 98.3 6 3.5 2.5 6.25
77.2 98.3 5 3.5 1.5 2.25
81.2 96.7 3 7 -4 16
83.8 97.1 2 6 -4 16
Total Σd = 0 Σd^2 = 48.5
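Since the value 98.3 occurs m = 2 times in Y, the correction $\frac{m(m^2 - 1)}{12}$ is added to $\sum d^2$:
$$\rho = 1 - \frac{6\left[\sum d^2 + \frac{m(m^2 - 1)}{12}\right]}{N(N^2 - 1)} = 1 - \frac{6\left[48.5 + \frac{2(4 - 1)}{12}\right]}{7(7^2 - 1)} = 1 - \frac{6 \times 49}{7 \times 48} = 0.125$$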
Hence, there is a very low degree of positive correlation, probably no correlation, between preference share
prices and debenture prices.
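A Python sketch of the tie-corrected formula used in this example. It assumes the correction m(m² − 1)/12 is added once for every group of m tied values in either series (here only 98.3 in Y is tied, with m = 2); `average_ranks` and `spearman_tie_corrected` are illustrative helpers.

```python
from collections import Counter


def average_ranks(values):
    """Descending ranks; tied values get the mean of their positions."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 + (order.count(v) - 1) / 2 for v in values]


def spearman_tie_corrected(xs, ys):
    """rho = 1 - 6 [sum d^2 + sum m(m^2 - 1)/12] / (N (N^2 - 1))."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    # One correction term per group of m tied values, in either series
    correction = sum(m * (m ** 2 - 1) / 12
                     for series in (xs, ys)
                     for m in Counter(series).values() if m > 1)
    return 1 - 6 * (d2 + correction) / (n * (n ** 2 - 1))


pref = [73.2, 85.8, 78.9, 75.8, 77.2, 81.2, 83.8]
deb = [97.8, 99.2, 98.8, 98.3, 98.3, 96.7, 97.1]
print(round(spearman_tie_corrected(pref, deb), 4))
```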
Example:
Scores made by students in a statistics class in the mid-term and final examination are given here. Develop
a regression equation which may be used to predict final examination scores from the mid-term score.
Student 1 2 3 4 5 6 7 8 9 10
Mid term 98 66 100 96 88 45 76 60 74 82
Final 90 74 98 88 80 62 78 74 86 80
Solution
We want to predict the final exam scores from the mid term scores. So let us designate ’y’ for the final exam
scores and ’x’ for the mid term exam scores. We open the following table for the calculations.
Student X Y X2 XY
1 98 90 9604 8820
2 66 74 4356 4884
3 100 98 10000 9800
4 96 88 9216 8448
5 88 80 7744 7040
6 45 62 2025 2790
7 76 78 5776 5928
8 60 74 3600 4440
9 74 86 5476 6364
10 82 80 6724 6560
Totals 785 810 64521 65074
$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{10(65074) - (785)(810)}{10(64521) - (785)^2} = \frac{14890}{28985} = 0.5137$$
$$a = \frac{\sum y}{n} - b\frac{\sum x}{n} = \frac{810 - 785(0.5137)}{10} \approx 40.67$$
Thus, the regression equation is given by $\hat{y} = 40.67 + 0.5137x$.
We can use this to find the projected or estimated final scores of the students. For example, for a mid-term score of 50 the projected final score is
$$\hat{y} = 40.67 + 0.5137(50) \approx 66.36$$
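A minimal Python sketch of the least-squares formulas for b and a, applied to the mid-term and final scores. `least_squares` is an illustrative helper, not a library function.

```python
def least_squares(xs, ys):
    """Fit y-hat = a + b x with
    b = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2) and a = Sy/n - b*Sx/n."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = sy / n - b * sx / n
    return a, b


midterm = [98, 66, 100, 96, 88, 45, 76, 60, 74, 82]
final = [90, 74, 98, 88, 80, 62, 78, 74, 86, 80]
a, b = least_squares(midterm, final)
print(f"y-hat = {a:.4f} + {b:.4f} x")
print(f"prediction at x = 50: {a + b * 50:.2f}")
```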
Exercise
1) The following are marks obtained by a student in Kenyatta University for 11 subjects within two
academic years:
Subject 1 2 3 4 5 6 7 8 9 10 11
Year1 25 10 15 35 45 15 20 30 40 45 55
Year2 12 8 10 11 13 8 10 11 13 14 10
Required
a) Draw a scatter diagram of year 1 against year 2 marks and comment on the pattern [3 marks]
b) Calculate the product moment correlation of year 1 and year 2 marks of the students. [4 marks]
c) Calculate the Spearman’s rank correlation of the students’ marks. [5 marks]
2) Two series X and Y are presented below:
Series X 62 72 78 58 65 70 66 63 60 72
Series Y 50 65 63 50 54 60 61 55 54 65
a) Draw a scatter diagram of Series X against Series Y data and comment on the pattern [3 marks]
b) Calculate the product moment correlation of Series X against Series Y data. [4 marks]
c) Calculate the Spearman’s Rank correlation of series data. [5 marks]
3) An agriculturalist assumes that there is a linear relationship between the amount of fertilizer supplied to potato plants and the subsequent yield of potatoes obtained. Eight potato plants of the same variety were selected at random and treated weekly with a solution in which x grams of fertilizer were dissolved in a fixed quantity of water. The yield y (kg) of potatoes was recorded as:
CROP 1 2 3 4 5 6 7 8
X 1 1.5 2 2.5 3 3.5 4 4.5
y 3.9 4.4 5.8 6.6 7 7.1 7.3 7.7
Required
Examples
Random experiment
- A random experiment is a statistical experiment in which:
2. Any performance of the experiment results in an outcome that is not known in advance.
Examples
4) Drawing a card from a deck of cards (possible outcomes: 13 hearts, 13 clubs, 13 spades, 13 diamonds).
Example
A fair die, with faces numbered 1 to 6, is rolled once. Write down the sample space S and hence find the probability that the score showing up is: a) a multiple of 3, b) a prime number.
Solution
S = {1, 2, 3, 4, 5, 6}. Multiples of 3 are 3 and 6, while prime numbers are 2, 3 and 5.
$$P(\text{Multiple of } 3) = \frac{2}{6} = \frac{1}{3}$$
$$P(\text{Prime number}) = \frac{3}{6} = \frac{1}{2}$$
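The classical probabilities can be checked in Python by listing the sample space and counting outcomes; `fractions.Fraction` keeps the answers as exact fractions. The helper `prob` assumes all outcomes are equally likely, as they are for a fair die.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}                        # sample space for one die
multiple_of_3 = {s for s in S if s % 3 == 0}  # {3, 6}
prime = {2, 3, 5}                             # prime numbers up to 6


def prob(event, space):
    """Classical probability n(E) / n(S) for equally likely outcomes."""
    return Fraction(len(event), len(space))


print(prob(multiple_of_3, S), prob(prime, S))
```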
Example 2
Two coins are tossed. Find the probability of getting
Solutions
Here S = {hh, ht, th, tt}.
$$P(A) = \frac{3}{4}$$
In a nutshell, empirical (or frequentist or statistical) probability is based on observed data. The empirical probability of an event A is the relative frequency of event A.
Example 1
The following are the counts of fish of each type that you have caught before.
Estimate the probability that the next fish you catch will be a Blue gill.
$$P(\text{Blue gill}) = \frac{13}{40} = 0.325$$
Example
A summary of the final marks in a certain statistics course is shown below.
Mark f
less than 30 6
30 - 39 26
40 - 49 45
50 - 59 64
60 - 69 82
70 - 79 37
80 - 89 22
90 - 99 8
Total 290
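Find the following empirical probabilities from the frequency distribution table:
(i) P(mark less than 40) (ii) P(pass) (iii) P(mark of 80 or above)
Solution
$$\text{i) } P(\text{mark less than } 40) = \frac{6 + 26}{290} = \frac{32}{290} = 0.11$$
$$\text{ii) } P(\text{pass}) = \frac{64 + 82 + 37 + 22 + 8}{290} = \frac{213}{290} = 0.73$$
$$\text{iii) } P(\text{mark of 80 or above}) = \frac{22 + 8}{290} = \frac{30}{290} = 0.103$$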
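The empirical probabilities above are relative frequencies read off the table. A small Python sketch makes the counting explicit; it assumes, as the calculation for P(pass) implies, that the pass mark is 50, and it files the “less than 30” class under a lower limit of 0 purely for bookkeeping.

```python
from fractions import Fraction

# Lower limit of each mark class -> frequency (table above).
# "Less than 30" is stored under 0 (an assumption that only matters for bookkeeping).
frequencies = {0: 6, 30: 26, 40: 45, 50: 64, 60: 82, 70: 37, 80: 22, 90: 8}
total = sum(frequencies.values())  # 290


def empirical_probability(in_event):
    """Relative frequency of the classes whose lower limit satisfies in_event."""
    count = sum(f for lower, f in frequencies.items() if in_event(lower))
    return Fraction(count, total)


p_below_40 = empirical_probability(lambda lower: lower < 40)   # 32/290
p_pass = empirical_probability(lambda lower: lower >= 50)       # 213/290, pass mark 50 assumed
p_80_up = empirical_probability(lambda lower: lower >= 80)      # 30/290
print(round(float(p_below_40), 2), round(float(p_pass), 2), round(float(p_80_up), 3))
```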
Figure 16:
1) Let E denote the event “an odd number is obtained when tossing a single die”. Then E = {1, 3, 5}.
2) Let H denote the event “at least one head appears when tossing two coins”. Then H = {hh, ht, th}.
3) Let B denote the event “obtaining a club and a heart in a single draw from a deck of cards”. The event
B is impossible. The set of outcomes of B is the empty set, denoted by
B = { } = ∅
4) Let A denote the event “obtaining a 1, 2, 3, 4, 5 or 6 when tossing a single die”. The event A is a
certain event, i.e. one of the outcomes belonging to the set describing the event must happen. This is
denoted by A = S, where S is the sample space.
A Venn diagram is a drawing, in which circular areas represent groups of items usually sharing common
properties.
The drawing consists of two or more circles, each representing a specific group or set, contained within
a square that represents the sample space. Venn diagrams are often used as a visual display when
referring to sample spaces, events and operations involving events.
These are events that involve more than one event. Such events can be obtained by performing various
operations involving two or more events.
Some of the operations that can be performed are described in the sections that follow.
Complementary events
The complementary event Ā (sometimes written A′) of an event A is the set of all outcomes in S that are
not in A.
Examples
1) Consider the experiment of tossing a single die, with S = {1, 2, 3, 4, 5, 6}. The complement of the event
A = “obtaining a 3 or less” = {1, 2, 3} is Ā = “obtaining a 4 or more” = {4, 5, 6}.
2) Consider the experiment of tossing two coins. S = {hh, ht, th, tt}. The complement of the event H =
“at least one head” = {hh, ht, th} is “no heads” = {tt}.
Figure 17:
The union of two events A and B, denoted by A ∪ B , is the set of outcomes that are in A or in B or
in both A and B i.e. the event that “either A or B or both A and B occur” or “at least one of A or B
occurs”.
The intersection of two events A and B, denoted by A ∩ B , is the set of outcomes that are in both A
and B i.e. the event that “both A and B occur”.
– Ā ∩ B is the event “a sample point is in B but not in A”.
– A ∩ B̄ is the event “a sample point is in A but not in B”.
The Venn diagrams below show the sets A ∪ B and A ∩ B.
Figure 18:
These definitions involving two events can be extended to ones involving 3 or more events, e.g. for the
3 events A1, A2 and A3 the event A1 ∪ A2 ∪ A3 is the event “at least one of A1, A2 or A3 occurs” and
A1 ∩ A2 ∩ A3 is the event “A1 and A2 and A3 occur”.
Lecture 6: Counting formulas and conditional probabilities
Number of ways = 2 × 1 = 2.
Let n = 3: 1st object – 3 choices, 2nd object – 2 choices, 3rd object – 1 choice.
Number of ways = 3 × 2 × 1 = 6.
In general, the number of ways is n × (n − 1) × (n − 2) × . . . × 2 × 1 = n! (n factorial).
Using this notation
2 × 1 = 2! = 2
3 × 2 × 1 = 3! = 6
4 × 3 × 2 × 1 = 4! = 24, etc.
Note: 1! = 1 and, by convention, 0! = 1.
The factorial notation is used in counting formulae.
Examples
$$^{n}P_{r} = P(n, r) = \frac{n!}{(n - r)!}$$
Combination
A combination is the number of different selections of a group of items where order does not matter.
Examples:
1) Four people (A, B, C, D) serve on a board of directors. A chairman and vice-chairman are to be chosen
from these 4 people. In how many ways can this be done?
Chairman Vice-chairman
A B
B A
A C
C A
A D
D A
B C
C B
B D
D B
C D
D C
2) Four people (A, B, C, D) serve on a board of directors. Two people are to be chosen from them as
members of a committee that will investigate fraud allegations. In how many ways can this be done?
People chosen: {A, B}, {A, C}, {A, D}, {B, C}, {B, D}, {C, D}
Number of ways = 6.
In both these examples a choice of 2 people from 4 people is made. However, in example 1 the
order of choice of the 2 people matters (since the one person chosen is chairman and the other one
vice-chairman). In example 2 the order does not matter. The only interest is in who serves on the
committee.
In question 1 the permutations formula applies with n = 4, r = 2:
$$P(4, 2) = \frac{4!}{(4 - 2)!} = 12.$$
In question 2 the combinations formula applies with n = 4, r = 2:
$$C(4, 2) = \frac{4!}{2!\,(4 - 2)!} = 6.$$
3) Find the number of ways to take 4 people and place them in groups of 3 at a time where order does
not matter.
Solution:
$$C(4, 3) = \frac{4!}{3!\,(4 - 3)!} = \frac{24}{6} = 4.$$
Since order does not matter, use the combination formula.
4) Find the number of ways to arrange 6 items in groups of 4 at a time where order matters.
Solution:
$$P(6, 4) = \frac{6!}{(6 - 4)!} = \frac{720}{2!} = 360$$
There are 360 ways to arrange 6 items taken 4 at a time when order matters.
5) Find the number of ways to take 20 objects and arrange them in groups of 5 at a time where order
does not matter.
$$C(20, 5) = \frac{20!}{5!\,(20 - 5)!} = \frac{20 \cdot 19 \cdot 18 \cdot 17 \cdot 16}{1 \cdot 2 \cdot 3 \cdot 4 \cdot 5} = 15504$$
6) Determine the total number of five-card hands that can be drawn from a deck of 52 cards.
Solution:
When a hand of cards is dealt, the order of the cards does not matter. Thus the combinations formula
is used.
There are 52 cards in a deck and we want to know in how many different ways we can draw them in
groups of five at a time when order does not matter. Using the combination formula gives
C(52, 5) = 2598960
7) There are five women and six men in a group. From this group a committee of 4 is to be chosen. In
how many ways can the committee be formed if the committee is to have at least 3 women in it?
Solution
Situation 1: 3 women and 1 man. Number of ways = C(5, 3) × C(6, 1) = 10 × 6 = 60
Situation 2: 4 women and no men. Number of ways = C(5, 4) × C(6, 0) = 5 × 1 = 5
Total number of ways = 60 + 5 = 65.
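The counting examples above can be checked in Python. The sketch first lists the board selections explicitly with `itertools.permutations` (order matters) and `itertools.combinations` (order does not), then writes the P(n, r) and C(n, r) formulas from factorials and cross-checks them against `math.perm` and `math.comb` (available from Python 3.8). The helper names `n_p_r` and `n_c_r` are illustrative.

```python
from itertools import combinations, permutations
from math import comb, factorial, perm

board = ["A", "B", "C", "D"]
ordered = list(permutations(board, 2))    # (chairman, vice-chairman) pairs
unordered = list(combinations(board, 2))  # two-person committees
print(len(ordered), len(unordered))       # 12 6


def n_p_r(n, r):
    """P(n, r) = n! / (n - r)!"""
    return factorial(n) // factorial(n - r)


def n_c_r(n, r):
    """C(n, r) = n! / (r! (n - r)!)"""
    return factorial(n) // (factorial(r) * factorial(n - r))


# Cross-check the hand-written formulas against the standard library.
assert n_p_r(6, 4) == perm(6, 4) == 360
assert n_c_r(52, 5) == comb(52, 5) == 2598960

# Example 7: at least 3 women on a committee of 4 chosen from 5 women and 6 men.
at_least_3_women = n_c_r(5, 3) * n_c_r(6, 1) + n_c_r(5, 4) * n_c_r(6, 0)
print(at_least_3_women)  # 65
```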
Figure 19:
Figure 20:
P (A) = P (A ∩ B) + P (A ∩ B̄)
P (B) = P (A ∩ B) + P (Ā ∩ B)
These formulae can be verified from the Venn diagram shown on the following page. The formulae can be
Figure 21:
Figure 22:
1) There are two telephone lines – A and B. Line A is engaged 50% of the time and line B is engaged
60% of the time. Both lines are engaged 30% of the time. Calculate the probability that
(a) at least one of the lines is engaged.
(b) neither of the lines is engaged.
(c) line B is not engaged.
(d) line A is engaged, but line B is not engaged.
(e) only one line is engaged.
Solution
Let E1 denote the event “line A is engaged” and E2 the event “line B is engaged”.
P (E1 ) = 0.5, P (E2 ) = 0.6, P (E1 ∩ E2 ) = 0.3
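(a)
P(at least one of the lines is engaged) = P(E1 ∪ E2) = P(E1) + P(E2) − P(E1 ∩ E2)
= 0.5 + 0.6 − 0.3
= 0.8
(b)
P(neither of the lines is engaged) = 1 − P(at least one of the lines is engaged)
= 1 − 0.8
= 0.2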
(e)
P(only one line is engaged) = P(line A is engaged, but line B is not engaged)
+ P(line B is engaged, but line A is not engaged)
= P(E1 ∩ Ē2) + P(Ē1 ∩ E2)
Using the total probability formula,
P(E1 ∩ Ē2) = P(E1) − P(E1 ∩ E2) = 0.5 − 0.3 = 0.2
P(Ē1 ∩ E2) = P(E2) − P(E1 ∩ E2) = 0.6 − 0.3 = 0.3
so P(only one line is engaged) = 0.2 + 0.3 = 0.5.
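All five parts of the telephone-line example follow from the addition rule, the complement rule and P(E1 ∩ Ē2) = P(E1) − P(E1 ∩ E2). A minimal Python sketch with exact fractions (variable names are illustrative) computes each one:

```python
from fractions import Fraction

p_a = Fraction("0.5")      # P(E1): line A engaged
p_b = Fraction("0.6")      # P(E2): line B engaged
p_both = Fraction("0.3")   # P(E1 ∩ E2)

p_at_least_one = p_a + p_b - p_both   # (a) addition rule
p_none = 1 - p_at_least_one           # (b) complement of (a)
p_b_not = 1 - p_b                     # (c) complement of E2
p_a_only = p_a - p_both               # (d) P(E1 ∩ Ē2)
p_b_only = p_b - p_both               #     P(Ē1 ∩ E2)
p_exactly_one = p_a_only + p_b_only   # (e)
print([float(p) for p in (p_at_least_one, p_none, p_b_not, p_a_only, p_exactly_one)])
# [0.8, 0.2, 0.4, 0.2, 0.5]
```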
A batch of 20 computers contains 3 that are faulty. Four (4) computers are selected at random without
replacement from this batch. Calculate the probability that (a) none of the 4 computers selected is faulty,
(b) at least 2 of the computers selected are faulty.
Solution:
There are C(20, 4) = 4845 ways of selecting the 4 computers from the batch of 20. Since random
selection is used, all 4845 selections are equally likely. Let A denote the event “none of the 4 computers
selected is faulty” and B the event “at least 2 of the computers selected are faulty”.
Using the classical probability result,
P(A) = C(17, 4)/C(20, 4) = 2380/4845 ≈ 0.491
P(B) = [C(3, 2) × C(17, 2) + C(3, 3) × C(17, 1)]/C(20, 4) = (3 × 136 + 1 × 17)/4845 = 425/4845 ≈ 0.088
6.4 Independence
Two events A and B are independent if
P (A ∩ B) = P (A)P (B).
Otherwise, they are said to be dependent.
Intuitively, two events are independent if knowing whether one of them has occurred does not change the
probability of the other. For example, if you roll two dice separately, the outcomes of the two dice are independent.
Proposition
If A and B are independent, then A and B̄ are independent.
Proof
P (A ∩ B̄) = P (A) − P (A ∩ B)
= P (A) − P (A)P (B)
= P (A)(1 − P (B))
= P (A)P (B̄)
This definition applies to two events. What does it mean to say that three or more events are independent?
Example
Roll two fair dice. Let A1 and A2 be the events that the first and the second die, respectively, shows an odd
number. Let A3 = [sum is odd]. The event probabilities are as follows:
Event Probability
A1 1/2
A2 1/2
A3 1/2
A1 ∩ A2 1/4
A1 ∩ A3 1/4
A2 ∩ A3 1/4
A1 ∩ A2 ∩ A3 0
We see that A1 and A2 are independent, A1 and A3 are independent, and A2 and A3 are independent. However,
the three events taken together are not independent, since if A1 and A2 both occur, then A3 cannot possibly
occur.
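The pairwise-but-not-mutual independence in this dice example can be confirmed by enumerating all 36 equally likely outcomes and testing the product rule for every pair and for the full triple. A minimal Python sketch (helper names are illustrative):

```python
from fractions import Fraction
from itertools import combinations, product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely (first, second) rolls


def prob(event):
    """Classical probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


events = {
    "A1": lambda o: o[0] % 2 == 1,           # first die odd
    "A2": lambda o: o[1] % 2 == 1,           # second die odd
    "A3": lambda o: (o[0] + o[1]) % 2 == 1,  # sum odd
}


def all_occur(names):
    return lambda o: all(events[n](o) for n in names)


def product_rule_holds(names):
    expected = Fraction(1)
    for n in names:
        expected *= prob(events[n])
    return prob(all_occur(names)) == expected


pairwise = all(product_rule_holds(pair) for pair in combinations(events, 2))
mutual = pairwise and product_rule_holds(tuple(events))
print(pairwise, mutual)  # True False
```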
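From the example above, we see that just because a set of events is pairwise independent does not mean
they are independent all together. We define:
Events A1, A2, . . . , An are (mutually) independent if, for every subcollection Ai1, Ai2, . . . , Aik with 2 ≤ k ≤ n,
P(Ai1 ∩ Ai2 ∩ . . . ∩ Aik) = P(Ai1)P(Ai2) . . . P(Aik).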
We can also apply this concept to experiments. Suppose we model two experiments with
Ω1 = {α1, α2, · · · } and Ω2 = {β1, β2, · · · } with probabilities P(αi) = pi and P(βj) = qj. Further suppose
that these two experiments are independent, i.e.
P((αi, βj)) = pi qj
for all i, j. Then we can have a new sample space Ω = Ω1 × Ω2.
Now suppose A ⊆ Ω1 and B ⊆ Ω2 are results (i.e. events) of the two experiments. We can view them as
subsets of Ω by rewriting them as A × Ω2 and Ω1 × B. Then the probability
$$P(A \cap B) = \sum_{\alpha_i \in A,\ \beta_j \in B} p_i q_j = \Big(\sum_{\alpha_i \in A} p_i\Big)\Big(\sum_{\beta_j \in B} q_j\Big) = P(A)P(B).$$
So we say the two experiments are “independent”, even though the term usually refers to different events in
the same experiment. We can generalize this to n independent experiments, or even to countably infinitely many
experiments.
The law of total probability. For disjoint events A1 and A2 with S = A1 ∪ A2,
P (B) = P (B|A1 )P (A1 ) + P (B|A2 )P (A2 ).
Bayes’ theorem.
$$P(A_1 \mid B) = \frac{P(B \mid A_1)P(A_1)}{P(B \mid A_1)P(A_1) + P(B \mid A_2)P(A_2)}.$$
Example
Suppose that 1/10 of men and 1/7 of women are colour-blind. A person is chosen at random and that person
is colour-blind. What is the probability that the person is male? Assume males and females to be in equal
numbers.
Solution
Let M = male, F = female, C = colour-blind. Then
$$P(M \mid C) = \frac{P(M \cap C)}{P(C)} = \frac{P(C \mid M)P(M)}{P(C \mid M)P(M) + P(C \mid F)P(F)} = \frac{\frac{1}{10} \cdot \frac{1}{2}}{\frac{1}{10} \cdot \frac{1}{2} + \frac{1}{7} \cdot \frac{1}{2}} = \frac{7}{17} \approx 0.412.$$
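The two-event form of Bayes’ theorem can be written as a small Python function and checked on the colour-blindness example. The function name `bayes_two_events` is illustrative; it assumes that A1 and its complement A2 partition the sample space.

```python
from fractions import Fraction


def bayes_two_events(prior_1, likelihood_1, likelihood_2):
    """P(A1 | B) when A1 and A2 = complement of A1 partition S.

    prior_1 = P(A1), likelihood_1 = P(B | A1), likelihood_2 = P(B | A2).
    """
    prior_2 = 1 - prior_1
    numerator = likelihood_1 * prior_1
    return numerator / (numerator + likelihood_2 * prior_2)


p_male_given_cb = bayes_two_events(Fraction(1, 2), Fraction(1, 10), Fraction(1, 7))
print(p_male_given_cb, round(float(p_male_given_cb), 3))  # 7/17 0.412
```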
Theorem
1. P (A ∩ B) = P (A | B)P (B).
Proof
Proofs of (i), (ii) and (iii) are trivial. So we only prove (iv). To prove this, we have to check the axioms.
1. For any event A, A ∩ B ⊆ B, so P(A | B) = P(A ∩ B)/P(B) ≤ P(B)/P(B) = 1.
2. P(B | B) = P(B ∩ B)/P(B) = P(B)/P(B) = 1.
For example, “odd” and “even” partition the sample space into two events.
Example
A fair coin is tossed repeatedly. The gambler gets +1 for head, and −1 for tail. Continue until he is broke
or achieves $a. Let
p_x = P(goes broke | starts with $x),
and B1 be the event that he gets head on the first toss. Then
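$$P(\text{satisfied} \mid \text{male}) = \frac{180}{300} = 0.6$$
$$P(\text{satisfied} \mid \text{female}) = \frac{90}{200} = 0.45$$
$$P(\text{not satisfied} \mid \text{male}) = \frac{120}{300} = 1 - \frac{180}{300} = 0.4$$
$$P(\text{not satisfied} \mid \text{female}) = \frac{110}{200} = 1 - \frac{90}{200} = 0.55$$
$$P(\text{satisfied}) = \frac{270}{500} = 0.54 \quad\text{and}\quad P(\text{not satisfied}) = 1 - 0.54 = 0.46$$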
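These conditional probabilities come from a two-way table of counts (180 and 120 males, 90 and 110 females, as in the fractions above). A short Python sketch computes them from the counts and also recovers P(satisfied) through the law of total probability; the dictionary layout and helper names are illustrative.

```python
from fractions import Fraction

# Counts implied by the fractions above: 300 males, 200 females.
counts = {
    ("male", "satisfied"): 180, ("male", "not satisfied"): 120,
    ("female", "satisfied"): 90, ("female", "not satisfied"): 110,
}
grand_total = sum(counts.values())  # 500


def group_total(group):
    return sum(v for (g, _), v in counts.items() if g == group)


def conditional(response, group):
    """P(response | group) = count in cell / count in group."""
    return Fraction(counts[(group, response)], group_total(group))


def marginal_group(group):
    return Fraction(group_total(group), grand_total)


# Law of total probability: P(satisfied) = sum over groups of P(satisfied | g) P(g).
p_satisfied = sum(conditional("satisfied", g) * marginal_group(g) for g in ("male", "female"))
print(conditional("satisfied", "male"), conditional("satisfied", "female"), p_satisfied)
# 3/5 9/20 27/50
```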
$$P(B_i \mid A) = \frac{P(A \mid B_i)P(B_i)}{\sum_j P(A \mid B_j)P(B_j)}.$$
$$P(+ \mid D) = 0.98$$
$$P(+ \mid D^C) = 0.01$$
$$P(D) = 0.001.$$
So what is the probability that a person has the disease given that he received a positive result?
$$P(D \mid +) = \frac{P(+ \mid D)P(D)}{P(+ \mid D)P(D) + P(+ \mid D^C)P(D^C)} = \frac{0.98 \cdot 0.001}{0.98 \cdot 0.001 + 0.01 \cdot 0.999} \approx 0.09$$
So this test, used on its own, is of little use. Even if you get a positive result, since the disease is so rare,
it is more likely that you don’t have the disease and got a false positive.
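Another way to see why P(D | +) is so small is to count “natural frequencies” in a large hypothetical group. The sketch below assumes a group of 1,000,000 people (an illustrative number; any size gives the same ratio) and uses exact fractions.

```python
from fractions import Fraction

population = 1_000_000                   # hypothetical group size for illustration
prevalence = Fraction(1, 1000)           # P(D)
sensitivity = Fraction(98, 100)          # P(+ | D)
false_positive_rate = Fraction(1, 100)   # P(+ | D^C)

sick = population * prevalence                    # 1000 people with the disease
healthy = population - sick                       # 999000 without it
true_positives = sick * sensitivity               # 980
false_positives = healthy * false_positive_rate   # 9990

p_disease_given_positive = true_positives / (true_positives + false_positives)
print(p_disease_given_positive, round(float(p_disease_given_positive), 3))  # 98/1097 0.089
```

Roughly ten false positives arise for every true positive, which is why a positive result alone says little here.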
Example
Consider the two following cases:
1. I have 2 children, one of whom is a boy.
2. I have two children, one of whom is a son born on a Tuesday.
What is the probability that both of them are boys?
Solution
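$$\text{1. } P(BB \mid BB \cup BG) = \frac{1/4}{1/4 + 2/4} = \frac{1}{3}.$$
2. Let B∗ denote a boy born on a Tuesday, and B a boy not born on a Tuesday. Then
$$P(B^*B^* \cup B^*B \mid B^*B^* \cup B^*B \cup B^*G) = \frac{\frac{1}{14}\cdot\frac{1}{14} + 2\cdot\frac{1}{14}\cdot\frac{6}{14}}{\frac{1}{14}\cdot\frac{1}{14} + 2\cdot\frac{1}{14}\cdot\frac{6}{14} + 2\cdot\frac{1}{14}\cdot\frac{1}{2}} = \frac{13}{27}.$$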
How can we understand this? It is much easier to have a boy born on a Tuesday if you have two boys than
one boy. So if we have the information that a boy is born on a Tuesday, it is now less likely that there is
just one boy. In other words, it is more likely that there are two boys.
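Both answers can be reproduced by brute-force counting. The sketch assumes each child is independently a boy or girl with probability 1/2 and born on each weekday with probability 1/7, and reads “one of whom is a boy” as “at least one is a boy”; under these assumptions all 196 ordered (sex, weekday) families are equally likely. Labelling Tuesday as day 1 is an arbitrary choice.

```python
from fractions import Fraction
from itertools import product

TUESDAY = 1  # arbitrary label for Tuesday among days 0..6
child_types = list(product(["B", "G"], range(7)))  # 14 equally likely (sex, weekday) pairs
families = list(product(child_types, repeat=2))    # 196 equally likely ordered families


def conditional(event, given):
    """P(event | given) by counting equally likely families."""
    kept = [f for f in families if given(f)]
    return Fraction(sum(1 for f in kept if event(f)), len(kept))


def both_boys(f):
    return all(sex == "B" for sex, _ in f)


def at_least_one_boy(f):
    return any(sex == "B" for sex, _ in f)


def tuesday_boy(f):
    return any(child == ("B", TUESDAY) for child in f)


print(conditional(both_boys, at_least_one_boy))  # 1/3
print(conditional(both_boys, tuesday_boy))       # 13/27
```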
Example
A lab test is 95 percent effective at detecting a certain disease when it is present (sensitivity). When the
disease is not present, the test is 99 percent effective at declaring the subject negative (specificity). If 8
percent of the population has the disease (prevalence), what is the probability that a subject has the disease
given that
(a) his test is positive?
(b) his test is negative?
Solution
Let D = disease is present and Z = test is positive. We are given that P(D) = 0.08 (prevalence), P(Z | D) = 0.95
(sensitivity), and P(Z̄ | D̄) = 0.99 (specificity), so that P(Z | D̄) = 1 − 0.99 = 0.01. In part (a), we want to
compute P(D | Z). By Bayes’ rule,
$$P(D \mid Z) = \frac{P(Z \mid D)P(D)}{P(Z \mid D)P(D) + P(Z \mid \bar{D})P(\bar{D})} = \frac{(0.95)(0.08)}{(0.95)(0.08) + (0.01)(0.92)} \approx 0.892$$
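In part (b), we want P(D | Z̄). Using P(Z̄ | D) = 1 − 0.95 = 0.05 and Bayes’ rule again,
$$P(D \mid \bar{Z}) = \frac{P(\bar{Z} \mid D)P(D)}{P(\bar{Z} \mid D)P(D) + P(\bar{Z} \mid \bar{D})P(\bar{D})} = \frac{(0.05)(0.08)}{(0.05)(0.08) + (0.99)(0.92)} \approx 0.004$$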
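Both parts apply Bayes’ rule to the same four joint probabilities (true and false positives and negatives). A compact Python sketch makes that structure explicit; the function name `predictive_values` is illustrative.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (P(D | positive), P(D | negative)) via Bayes' rule."""
    true_pos = sensitivity * prevalence               # P(Z ∩ D)
    false_neg = (1 - sensitivity) * prevalence        # P(Z̄ ∩ D)
    false_pos = (1 - specificity) * (1 - prevalence)  # P(Z ∩ D̄)
    true_neg = specificity * (1 - prevalence)         # P(Z̄ ∩ D̄)
    return true_pos / (true_pos + false_pos), false_neg / (false_neg + true_neg)


p_pos, p_neg = predictive_values(0.95, 0.99, 0.08)
print(round(p_pos, 3), round(p_neg, 3))  # 0.892 0.004
```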
Example
An economist believes that during periods of high economic growth, the Indian Rupee appreciates with
probability 0.70; in periods of moderate economic growth, it appreciates with probability 0.40; and during
periods of low economic growth, the Rupee appreciates with probability 0.20. During any period of time
the probability of high economic growth is 0.30, the probability of moderate economic growth is 0.50 and
the probability of low economic growth is 0.20. Suppose the Rupee value has been appreciating during the
present period. What is the probability that we are experiencing a period of (a) high, (b) moderate, (c) low
economic growth?
Solution
Our partition consists of three events: high economic growth (event H), moderate economic growth (event
M) and low economic growth (event L). The prior probabilities of these events are:
P(H) = 0.30, P(M) = 0.50, P(L) = 0.20.
Let A be the event that the rupee appreciates. We have the conditional probabilities
P(A | H) = 0.70, P(A | M) = 0.40, P(A | L) = 0.20.
By using Bayes’ theorem we can find the required probabilities
P(H | A), P(M | A) and P(L | A).
$$P(H \mid A) = \frac{P(A \mid H)P(H)}{P(A \mid H)P(H) + P(A \mid M)P(M) + P(A \mid L)P(L)} = \frac{(0.70)(0.30)}{(0.70)(0.30) + (0.40)(0.50) + (0.20)(0.20)} = 0.467$$
$$P(M \mid A) = \frac{P(A \mid M)P(M)}{P(A \mid H)P(H) + P(A \mid M)P(M) + P(A \mid L)P(L)} = \frac{(0.40)(0.50)}{(0.70)(0.30) + (0.40)(0.50) + (0.20)(0.20)} = 0.444$$
$$P(L \mid A) = \frac{P(A \mid L)P(L)}{P(A \mid H)P(H) + P(A \mid M)P(M) + P(A \mid L)P(L)} = \frac{(0.20)(0.20)}{(0.70)(0.30) + (0.40)(0.50) + (0.20)(0.20)} = 0.089$$
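The general form of Bayes’ theorem over a partition (priors times likelihoods, divided by their sum) can be sketched in a few lines of Python and applied to the three growth regimes. The function name `posterior` is illustrative.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Bayes' theorem over a partition: P(B_i | A) for every event B_i."""
    joint = {k: priors[k] * likelihoods[k] for k in priors}  # P(A | B_i) P(B_i)
    evidence = sum(joint.values())                            # P(A), by total probability
    return {k: v / evidence for k, v in joint.items()}


priors = {"H": Fraction("0.30"), "M": Fraction("0.50"), "L": Fraction("0.20")}
likelihoods = {"H": Fraction("0.70"), "M": Fraction("0.40"), "L": Fraction("0.20")}  # P(A | .)
post = posterior(priors, likelihoods)
print({k: round(float(v), 3) for k, v in post.items()})  # {'H': 0.467, 'M': 0.444, 'L': 0.089}
```

The posteriors sum to 1, as they must for a partition.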
Lecture 7: Random Variables and probability distributions
2) X = the sum of the values (x) showing when two dice are rolled.
Variables that can take on any value in some interval i.e. they can take an infinite number of possible
values.
Examples:
1) The variables T and X from the above examples are discrete random variables.
2) The variables H and V from the above examples are continuous random variables.
1) As above, let T be the random variable that represents the number of tails obtained when a coin is
flipped three times. Then T has 4 possible values 0, 1, 2, and 3. The outcomes of the experiment and
the values of T are summarized in the next table.
Outcomes T
hhh 0
hht, hth, thh 1
tth, tht, htt 2
ttt 3
Assuming that the outcomes are all equally likely, the probability distribution for T is given in the
following table.
t 0 1 2 3 Total
P(t) 1/8 3/8 3/8 1/8 1
2) Let Y denote the number of tosses of a coin until heads appear first. Then
S = {h, th, tth, ttth, . . .} and Y takes the values 1, 2, 3, 4, . . .
y       1      2          3          ...   Total
P(y)    1/2    (1/2)^2    (1/2)^3    ...   1
3) A pair of dice is tossed. Let X denote the sum of the numbers showing. The probability distribution of X
can be found from the following table. The entry in any particular cell is the sum of the row and column values.
x          2     3     4     5     6     7     8     9     10    11    12
P(X = x)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
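The distributions in examples 1 and 3 can both be built by enumerating equally likely outcomes and tallying the value of the random variable. A minimal Python sketch with a shared helper (the name `distribution` is illustrative):

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def distribution(outcomes, variable):
    """Probability distribution of variable over equally likely outcomes."""
    counts = Counter(variable(o) for o in outcomes)
    return {value: Fraction(c, len(outcomes)) for value, c in sorted(counts.items())}


coin_outcomes = list(product("ht", repeat=3))            # 8 sequences of 3 flips
T = distribution(coin_outcomes, lambda o: o.count("t"))  # number of tails

dice_outcomes = list(product(range(1, 7), repeat=2))     # 36 ordered pairs
X = distribution(dice_outcomes, lambda o: o[0] + o[1])   # sum of the two dice
print(T)
print(X[7], sum(X.values()))  # 1/6 1
```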
Note:
For any discrete random variable X, the probabilities of the values it can assume are such that
$$0 \le P(X = x) \le 1 \quad\text{and}\quad \sum_x P(X = x) = 1$$
1) Find the probability of getting 2 or more tails when a coin is flipped 3 times.
$$P(T \ge 2) = \frac{3}{8} + \frac{1}{8} = \frac{1}{2}.$$
2) Find the probability of getting at least one tail when a coin is flipped 3 times.
$$P(\text{at least one tail}) = 1 - P(T = 0) = 1 - \frac{1}{8} = \frac{7}{8}.$$
3) Find the probability of needing at most 3 tosses of a coin to get the first heads.
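Question 3 only needs the first three terms of the distribution of Y tabulated above, P(Y = y) = (1/2)^(y−1) · (1/2). A minimal Python sketch, with an illustrative helper name, sums them exactly:

```python
from fractions import Fraction


def geometric_pmf(y, p=Fraction(1, 2)):
    """P(Y = y): first head on toss y, i.e. y - 1 tails then a head."""
    return (1 - p) ** (y - 1) * p


p_at_most_3 = sum(geometric_pmf(y) for y in range(1, 4))
print(p_at_most_3)  # 7/8
```

The partial sums 1/2, 3/4, 7/8, . . . approach 1, matching the “Total 1” column of the table.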
$$\text{(i) } F(x) = \int_{-\infty}^{x} f(t)\,dt = P(X \le x), \quad -\infty < x < \infty$$
$$\text{(ii) } F(b) - F(a) = \int_{a}^{b} f(x)\,dx = P(a \le X \le b)$$
Example
Examine whether f(x) = 5x⁴, 0 < x < 1, can be a p.d.f. of a continuous random variable X.
Solution
For a probability density function we need f(x) ≥ 0, which clearly holds here, and $\int_{-\infty}^{\infty} f(x)\,dx = 1$. Now
$$\int_0^1 5x^4\,dx = 5\left[\frac{x^5}{5}\right]_0^1 = \left[x^5\right]_0^1 = 1^5 - 0 = 1.$$
Hence f(x) is a p.d.f.
Example
A continuous random variable X has p.d.f. f(x) = Ax², 0 < x < 1. Determine A.
Solution
Since f(x) is a p.d.f., $\int_{-\infty}^{\infty} f(x)\,dx = 1$. Therefore
$$\int_0^1 Ax^2\,dx = A\left[\frac{x^3}{3}\right]_0^1 = \frac{A}{3} = 1 \implies A = 3.$$
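Both examples reduce to checking that a density integrates to 1. As a numerical cross-check, the sketch below uses composite Simpson’s rule (a standard quadrature method chosen here for illustration; the notes themselves integrate exactly) to confirm the total area under 5x⁴ and to recover the normalising constant A.

```python
def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


area = simpson(lambda x: 5 * x ** 4, 0.0, 1.0)  # should be close to 1
A = 1 / simpson(lambda x: x ** 2, 0.0, 1.0)     # normalising constant, close to 3
print(round(area, 6), round(A, 6))  # 1.0 3.0
```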
Mathematical Expectation
A very important concept in probability and statistics is that of mathematical expectation, expected value,
or briefly the expectation, of a random variable. For a discrete random variable X having the possible values
x1, x2, . . . , xn the expectation of X is defined as
$$E(X) = x_1 P(X = x_1) + \dots + x_n P(X = x_n) = \sum_{i=1}^{n} x_i P(X = x_i)$$
or
$$E(X) = x_1 f(x_1) + \dots + x_n f(x_n) = \sum_{i=1}^{n} x_i f(x_i)$$
Example.
Suppose that a game is to be played with a single die assumed fair. In this game a player wins $20 if a 2
turns up; $40 if a 4 turns up; loses $30 if a 6 turns up; while the player neither wins nor loses if any other
face turns up. Find the expected sum of money to be won.
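Solution
Let X denote the amount of money won (a loss counts as negative). Faces 1, 3 and 5 give X = 0, so
$$E(X) = 20 \cdot \frac{1}{6} + 40 \cdot \frac{1}{6} + (-30) \cdot \frac{1}{6} + 0 \cdot \frac{3}{6} = \frac{30}{6} = 5,$$
i.e. the expected sum of money won is $5.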
Examples
As above, let T be the random variable that represents the number of tails obtained when a coin is flipped
three times. Then T has 4 possible values 0, 1, 2, and 3. The outcomes of the experiment and the values of
T are summarized in the next table.
Outcomes T
hhh 0
hht, hth, thh 1
tth, tht, htt 2
ttt 3
Find the expected value of the random variable T.
Solution
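$$E(T) = \sum_t t\,p(t) = \left(0 \times \frac{1}{8}\right) + \left(1 \times \frac{3}{8}\right) + \left(2 \times \frac{3}{8}\right) + \left(3 \times \frac{1}{8}\right) = \frac{12}{8} = 1.5$$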
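The definition E(X) = Σ x P(X = x) translates directly into code. The sketch below applies one illustrative helper, `expectation`, to the distribution of T and to the die game from the previous example (faces 1, 3 and 5 pay nothing, so the value 0 has probability 3/6).

```python
from fractions import Fraction


def expectation(dist):
    """E(X) = sum of x * P(X = x) over a finite distribution {x: P(X = x)}."""
    return sum(x * p for x, p in dist.items())


T = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
# Die game: +20 on a 2, +40 on a 4, -30 on a 6, 0 on faces 1, 3, 5.
winnings = {20: Fraction(1, 6), 40: Fraction(1, 6), -30: Fraction(1, 6), 0: Fraction(3, 6)}
print(expectation(T), expectation(winnings))  # 3/2 5
```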
Theorem
Let X be a discrete r.v. with probability function p(x). Then
(i) E(c) = c where c is any real constant.
(ii) E[aX + b] = aµ + b where a and b are constants.
(iii) E[kg(X)] = kE[g(X)] where g(X) is a real-valued function of X.
(iv) E[ag1(X) ± bg2(X)] = aE[g1(X)] ± bE[g2(X)] where g1(X) and g2(X) are real-valued functions of X.
The variance σ² of X is
$$\sigma^2 = E[(X - \mu)^2] = \sum_{i=1}^{n} (x_i - \mu)^2 p(x_i) = \sum_x x^2 p(x) - \mu^2$$
The standard deviation σ of X is the square root of the variance of X:
$$\sigma = \sqrt{Var(X)} = \sqrt{E[(X - \mu)^2]}$$
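Theorem
$$Var(X) = E[(X - \mu)^2] = \sum_x x^2 p(x) - \mu^2$$
Proof
Since E(X) = µ,
$$E[(X - \mu)^2] = E[X^2 - 2\mu X + \mu^2] = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - \mu^2 = \sum_x x^2 p(x) - \mu^2.$$
Theorem
$$Var(aX + b) = a^2 Var(X)$$
Proof
Recall that E(aX + b) = aµ + b, thus
$$Var(aX + b) = E[((aX + b) - (a\mu + b))^2] = E[(a(X - \mu))^2] = E[a^2(X - \mu)^2] = a^2 E[(X - \mu)^2] = a^2 Var(X)$$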
Example
Given the probability distribution of X below, find the mean and standard deviation of X.
x          0     1     2     3
P(X = x)   1/8   1/4   3/8   1/4
Solution
x            0     1     2     3     Total
P(X = x)     1/8   1/4   3/8   1/4   1
xP(X = x)    0     1/4   3/4   3/4   7/4
x²P(X = x)   0     1/4   3/2   9/4   4
$$E(X) = \mu = \sum_{x=0}^{3} xP(X = x) = \frac{7}{4} = 1.75$$
Standard deviation
$$\sigma = \sqrt{E(X^2) - \mu^2} = \sqrt{4 - 1.75^2} = \sqrt{0.9375} = 0.968246$$
Example 2
The probability distribution of a r.v. X is as shown below. Find the mean and standard deviation of:
x          0     1     2
P(X = x)   1/6   1/2   1/3
Required
Solution
x            0     1     2     Total
P(X = x)     1/6   1/2   1/3   1
xP(X = x)    0     1/2   2/3   7/6
x²P(X = x)   0     1/2   4/3   11/6
$$E(X) = \mu = \sum_{x=0}^{2} xP(X = x) = \frac{7}{6}$$
$$E(X^2) = \sum_{x=0}^{2} x^2 P(X = x) = \frac{11}{6}$$
Standard deviation
$$\sigma = \sqrt{E(X^2) - \mu^2} = \sqrt{\frac{11}{6} - \left(\frac{7}{6}\right)^2} = \sqrt{\frac{17}{36}} = \frac{\sqrt{17}}{6} \approx 0.6872$$
$$Var(Y) = Var(12X + 6) = 12^2\,Var(X) = 144 \times \frac{17}{36} = 68$$
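The table method (mean from Σ x p, variance from Σ x² p − µ²) can be checked with exact fractions. The sketch also builds the distribution of Y = 12X + 6 directly from that of X, to confirm Var(aX + b) = a² Var(X) numerically; the helper name `mean_and_variance` is illustrative.

```python
from fractions import Fraction
from math import sqrt


def mean_and_variance(dist):
    """Return (mu, Var(X)) using Var(X) = E(X^2) - mu^2."""
    mu = sum(x * p for x, p in dist.items())
    second_moment = sum(x * x * p for x, p in dist.items())
    return mu, second_moment - mu ** 2


dist_1 = {0: Fraction(1, 8), 1: Fraction(1, 4), 2: Fraction(3, 8), 3: Fraction(1, 4)}
dist_2 = {0: Fraction(1, 6), 1: Fraction(1, 2), 2: Fraction(1, 3)}

mu_1, var_1 = mean_and_variance(dist_1)   # 7/4, 15/16
mu_2, var_2 = mean_and_variance(dist_2)   # 7/6, 17/36

# Y = 12X + 6 built directly from dist_2, to check Var(aX + b) = a^2 Var(X).
dist_y = {12 * x + 6: p for x, p in dist_2.items()}
_, var_y = mean_and_variance(dist_y)
print(round(sqrt(var_1), 6), round(sqrt(var_2), 4), var_y)  # 0.968246 0.6872 68
```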
Solution
We know that
$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_0^1 x(4x^3)\,dx = 4\int_0^1 x^4\,dx = 4\left[\frac{x^5}{5}\right]_0^1 = \frac{4}{5}\left(1^5 - 0^5\right) = \frac{4}{5}$$
Example:
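Let X be a continuous random variable with p.d.f. given by
$$f(x) = \begin{cases} 3x^2 & 0 < x < 1 \\ 0 & \text{elsewhere} \end{cases}$$
$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_0^1 x(3x^2)\,dx = 3\int_0^1 x^3\,dx = 3\left[\frac{x^4}{4}\right]_0^1 = \frac{3}{4}\left(1^4 - 0^4\right) = \frac{3}{4}$$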
$$E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx = \int_0^1 x^2(3x^2)\,dx = 3\int_0^1 x^4\,dx = 3\left[\frac{x^5}{5}\right]_0^1 = \frac{3}{5}\left(1^5 - 0^5\right) = \frac{3}{5}$$
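All the densities in these examples have the form f(x) = c·x^m on 0 < x < 1, for which the integral in each step collapses to E(X^k) = c/(m + k + 1). The sketch below uses that closed form (a hypothetical helper, valid only for this family of densities) to reproduce the results exactly and, as an extra illustration, the variance E(X²) − [E(X)]² for f(x) = 3x².

```python
from fractions import Fraction


def power_density_moment(c, m, k):
    """E(X^k) for the density f(x) = c * x**m on 0 < x < 1 (zero elsewhere).

    Integral of x**k * c * x**m from 0 to 1 = c / (m + k + 1).
    """
    return Fraction(c, m + k + 1)


# f(x) = 4x^3: total probability and mean
assert power_density_moment(4, 3, 0) == 1
print(power_density_moment(4, 3, 1))   # 4/5
# f(x) = 3x^2: mean, second moment and variance
ex = power_density_moment(3, 2, 1)
ex2 = power_density_moment(3, 2, 2)
print(ex, ex2, ex2 - ex ** 2)          # 3/4 3/5 3/80
```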
Assumptions
A discrete random variable X is said to have a binomial distribution if a random experiment satisfies the
following conditions:
(1) The experiment is repeated a fixed number of times. Each repetition is called a trial. The number of
trials is denoted by n.
(2) All trials are independent of each other.
(3) The outcome of each trial can be one of two complementary outcomes, one labeled success (s) and the
other labeled failure (f). A single trial is called a Bernoulli trial.
(4) The probability of success has the same value p for each trial.
(5) The random variable X counts the number of successes that occur in the n trials.
Binomial distribution
The random variable that counts the number of successes in many independent, identical Bernoulli trials is
called a Binomial Random Variable.
In probability theory and statistics, the binomial distribution is the discrete probability distribution of the
number of successes in a sequence of n independent yes/no experiments, each of which yields success with
probability p. Such a success/failure experiment is also called a Bernoulli experiment or Bernoulli trial. In
fact, when n = 1, the binomial distribution is a Bernoulli distribution. The binomial distribution is the basis
for the popular binomial test of statistical significance.
The probability that the event will occur exactly x times in n trials (i.e., x successes and n − x failures
will occur) is given by the probability function
$$f(x) = P(X = x) = \binom{n}{x} p^x (1 - p)^{n-x} = \frac{n!}{x!\,(n - x)!} p^x (1 - p)^{n-x}, \quad x = 0, 1, 2, \dots, n, \quad 0 \le p \le 1$$
Example 5.1.
The probability of getting exactly 2 heads in 6 tosses of a fair coin.
Solution
$$P(X = 2) = \binom{6}{2}\left(\frac{1}{2}\right)^2\left(\frac{1}{2}\right)^4 = \frac{15}{64}$$
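The binomial probability function can be written directly with `math.comb`; using exact fractions also shows that the probabilities over x = 0, 1, . . . , n sum to (q + p)^n = 1. A minimal sketch:

```python
from fractions import Fraction
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


half = Fraction(1, 2)
print(binomial_pmf(2, 6, half))                         # 15/64
print(sum(binomial_pmf(x, 6, half) for x in range(7)))  # 1, i.e. (q + p)^6 with q + p = 1
```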
The discrete probability function f(x) is often called the binomial distribution since, for x = 0, 1, 2, . . . , n,
it corresponds to successive terms in the binomial expansion
$$(q + p)^n = q^n + \binom{n}{1} q^{n-1} p + \binom{n}{2} q^{n-2} p^2 + \dots + p^n = \sum_{x=0}^{n} \binom{n}{x} p^x (1 - p)^{n-x},$$
where q = 1 − p.
The special case of a binomial distribution with n = 1 is also called the Bernoulli distribution.
5. Skewness
To bring out the skewness of a binomial distribution we can calculate the moment coefficient of skewness, γ1:
$$\gamma_1 = \sqrt{\beta_1} = \sqrt{\frac{\mu_3^2}{\mu_2^3}} = \frac{\mu_3}{\mu_2^{3/2}} = \frac{npq(q - p)}{\left(\sqrt{npq}\right)^3} = \frac{q - p}{\sqrt{npq}}$$
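The closed forms µ2 = npq and µ3 = npq(q − p) used above can be checked numerically by computing the central moments straight from the probability function. The sketch uses illustrative parameters n = 10, p = 0.3 (any values would do) and floating-point arithmetic, so comparisons use `math.isclose`.

```python
from math import comb, isclose, sqrt


def central_moment(n, p, r):
    """r-th central moment of B(n, p), computed directly from the pmf."""
    mean = n * p
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) * (x - mean) ** r
               for x in range(n + 1))


n, p = 10, 0.3   # illustrative parameters
q = 1 - p
mu2 = central_moment(n, p, 2)
mu3 = central_moment(n, p, 3)
gamma1 = mu3 / mu2 ** 1.5
print(round(gamma1, 6), round((q - p) / sqrt(n * p * q), 6))  # the two values agree
```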
5. Kurtosis
A measure of kurtosis of the binomial distribution is given by the moment coefficient of kurtosis γ2:
$$\gamma_2 = \beta_2 - 3 = \frac{\mu_4}{\mu_2^2} - 3$$
$$Z = \frac{X - np}{\sqrt{npq}}$$
(b) Out of 960 families with 5 children each find the expected number of families with (i) and (ii) above
Solution
Let the random variable X measure the number of boys out of 5 births. Clearly X is a binomial random
variable, so we apply the binomial probability function to calculate the required probabilities.
$$X \sim B\left(5, \tfrac{1}{2}\right)$$
$$P(X = x) = \binom{5}{x} p^x q^{5-x}, \quad x = 0, 1, 2, 3, 4, 5$$
The probability distribution of X is given below
X=x 0 1 2 3 4 5
P (X = x) 1/32 5/32 10/32 10/32 5/32 1/32
P (X ≥ 1) = 1 − P (X = 0)
= 1 − 1/32
= 31/32
P (X ≤ 3) = P (X = 0) + P (X = 1) + P (X = 2) + P (X = 3)
= 1/32 + 5/32 + 10/32 + 10/32
= 26/32
(b) Out of 960 families with 5 children, the expected number of families with (i) at least one boy is
960 × 31/32 = 930, and with (ii) at most 3 boys is 960 × 26/32 = 780.
Poisson distribution
The Poisson distribution was developed by the French mathematician Siméon D. Poisson (1781–1840). A random
variable X is said to follow a Poisson distribution if its probability distribution is given by
$$P(X = x) = \frac{e^{-\mu}\mu^x}{x!}, \quad x = 0, 1, 2, \dots$$
where µ > 0 is the mean number of occurrences.
The random variable X counts the number of successes in a Poisson process. A Poisson process corresponds
to a Bernoulli process under the following conditions:
- the number of trials n is indefinitely large, i.e. n → ∞;
- the constant probability of success p for each trial is infinitely small, i.e. p → 0;
- np = µ is finite.
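Let us consider a Bernoulli process with n trials and probability of success in any trial p = µ/n, where µ ≥ 0.
Then, we know that the probability of x successes in n trials is given by
$$
\begin{aligned}
P(X = x) &= \binom{n}{x}\left(\frac{\mu}{n}\right)^x\left(1 - \frac{\mu}{n}\right)^{n-x} \\
&= \frac{n!}{x!\,(n - x)!}\left(\frac{\mu}{n}\right)^x\left(1 - \frac{\mu}{n}\right)^{n-x} \\
&= \frac{n(n-1)\cdots(n-(x-1))}{x!}\left(\frac{\mu}{n}\right)^x\left(1 - \frac{\mu}{n}\right)^{n-x} \\
&= \frac{\mu^x}{x!}\left[\frac{n}{n}\cdot\frac{n-1}{n}\cdot\frac{n-2}{n}\cdots\frac{n-(x-1)}{n}\right]\left(1 - \frac{\mu}{n}\right)^{n}\left(1 - \frac{\mu}{n}\right)^{-x} \\
&= \frac{\mu^x}{x!}\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right)\cdots\left(1 - \frac{x-1}{n}\right)\left(1 - \frac{\mu}{n}\right)^{n}\left(1 - \frac{\mu}{n}\right)^{-x}
\end{aligned}
$$
Now as n → ∞, the bracketed product [n/n · (n − 1)/n · · · (n − (x − 1))/n] → 1, (1 − µ/n)^(−x) → 1 and
(1 − µ/n)^n → e^(−µ). Thus, we have
$$P(X = x) = \frac{e^{-\mu}\mu^x}{x!}, \quad x = 0, 1, 2, \dots$$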
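The limit just derived can be watched numerically: with p = µ/n, the binomial probability of x successes gets closer to the Poisson probability as n grows. The values µ = 2 and x = 3 below are illustrative choices only.

```python
from math import comb, exp, factorial


def binomial_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def poisson_pmf(x, mu):
    return exp(-mu) * mu ** x / factorial(x)


mu, x = 2.0, 3   # illustrative values
errors = []
for n in (10, 100, 1000, 10000):
    gap = abs(binomial_pmf(x, n, mu / n) - poisson_pmf(x, mu))
    errors.append(gap)
    print(n, round(gap, 6))   # the gap shrinks roughly in proportion to 1/n
```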
2. Variance
The variance, denoted by σ², of a Poisson distribution is computed as
$$Var(X) = \sigma^2 = E[(X - \mu)^2] = \sum_{\text{all } x} (x - \mu)^2 P(x)$$
Example
At a parking place the average number of car-arrivals during a specified period of 15 minutes is 2. If the
arrival process is well described by a Poisson process, find the probability that during a given period of 15
minutes
Solution:
Let X denote the number of car arrivals during the specified period of 15 minutes. So X ∼ Pois(λ) with λ = 2, and
$$P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}, \quad x = 0, 1, \dots$$
(a)
$$P(\text{no car will arrive}) = P(X = 0) = \frac{e^{-2}\,2^0}{0!} = 0.1353$$
(b)
(c)
(d)
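The Poisson probability function used in part (a) can be evaluated in a few lines of Python; only part (a) is stated in the notes, so the sketch computes P(X = 0) for λ = 2 and checks that the probabilities sum to 1 over a long range of x.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) lam^x / x!"""
    return exp(-lam) * lam ** x / factorial(x)


lam = 2  # mean number of arrivals per 15-minute period
p_none = poisson_pmf(0, lam)
print(round(p_none, 4))  # 0.1353
```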