SMA 140 Lectures Notes 2024 Sep

SMA 140 INTRODUCTION TO PROBABILITY AND
STATISTICS
Dr Robert Mathenge Mutwiri
November 7, 2024
Lecture One: Introduction
1. Introduction to statistics
The word "statistics" is derived from the Latin word "status" or the Italian word "statista", both of which mean "political state" or government. Early applications of statistical thinking revolved around the needs of states to base policy on demographic and economic data.
1.1 Definitions
Data/Data set – Set of values collected or obtained when gathering information on some issue of interest.
Examples
1) The monthly sales of a certain vehicle collected over a period.
2) The number of passengers using a certain airline on various routes.
3) Rating (on a scale from 1 to 5) of a new product by customers.
4) The yields of a certain crop obtained after applying different types of fertilizer.
Statistics: a branch of science that deals with the collection, presentation, analysis and interpretation of data.
The definition points out 4 key aspects of statistics, namely:
(i) Data collection
(ii) Data presentation,
(iii) Data analysis,
(iv) Data interpretation
Statistics in the above sense refers to the methodology used in drawing meaningful information from a data set. This use of the term should not be confused with statistics (referring to a set of numerical values) or a statistic (referring to a measure of description obtained from a data set).
Descriptive Statistics – Collection, organization, summarization and presentation of data.
Population – All subjects possessing a common characteristic that is being studied.
Examples
1) The population of people inhabiting a certain country.
2) The collection of all cars of a certain type manufactured during a particular month.
3) All patients in a certain area suffering from AIDS.
4) Exam marks obtained by all students studying a certain statistics course.
Census– A study where every member or element of the population is included.

Examples
1) Study of the entire population carried out by the government every 10 years.
2) Special investigations e.g. tax study commissioned by a government.
3) Any study of all the individuals/elements in a population.
A census is usually very costly and time consuming. It is therefore not carried out very often. A study of a
population is usually confined to a subgroup of the population.
Sample – A subgroup or subset of the population.
The number of values in the sample (sample size) is denoted by n. The number of values in the population (population size) is denoted by N.
Statistical Inference – Generalizing from samples to populations and expressing the conclusions in the
language of probability (chance).
Variable – Characteristic or attribute that can assume different values.
Discrete variables – Variables that can assume a finite or countable number of possible values. Such
variables are usually obtained by counting.
Examples
1) The number of cars parked in a parking lot.
2) The number of students attending a statistics lecture.
3) A person’s response (agree, not agree) to a statement. A one (1) is recorded when the person agrees
with the statement, a zero (0) is recorded when a person does not agree.
Continuous variables – Variables that can assume any value within an interval, and therefore an uncountably infinite number of possible values. Such variables are usually obtained by measurement.
Examples
1) The body temperature of a person.
2) The weight of a person.
3) The height of a tree.
4) The contents of a bottle of cool drink.

1.3 Measurement scales

Qualitative variables – Variables that assume non-numerical values.
Examples
1) The course of study at university (B.Com, B.Eng , BA etc.)

2) The grade (A, B, C, D or E) obtained in an examination.
Nominal scale – Level of measurement which classifies data into categories in which no order or ranking
can be imposed on the data.
A variable can be treated as nominal when its values represent categories with no intrinsic ranking. For
example, the department of the company in which an employee works. Examples of nominal variables include
region, postal code, or religious affiliation.
Ordinal scale – Level of measurement which classifies data into categories that can be ordered or ranked.
Differences between the ranks are not meaningful.
A variable can be treated as ordinal when its values represent categories with some intrinsic order or ranking.
Examples
1) Levels of service satisfaction from very dissatisfied to very satisfied.
2) Attitude scores representing degree of satisfaction or confidence and preference rating scores (low,
medium or high).
3) Likert scale responses to statements (strongly agree, agree, neutral, disagree, strongly disagree).
Quantitative variables – Variables which assume numerical values.
Examples Discrete and continuous variables examples given above.
Interval scale – Level of measurement which classifies data that can be ordered and ranked and where
differences are meaningful. However, there is no meaningful zero and ratios are meaningless.
Examples
1) The difference between a temperature of 100 degrees and 90 degrees is the same difference as that
between 90 degrees and 80 degrees. Taking ratios in such a case does not make sense.
2) When referring to dates (years) or temperatures measured (degrees Fahrenheit or Celsius) there is no
natural zero point.
Ratio scale – Level of measurement where differences and ratios are meaningful and there is a natural zero.
This is the “highest” level of measurement in terms of possible operations that can be performed on the data.
Examples
Variables like height, weight, mark (in test) and speed are ratio variables. These variables have a natural
zero and ratios make sense when doing calculations e.g. a weight of 80 kilograms is twice as heavy as one of
40 kilograms.
Measurement scale | Examples                   | Meaningful calculations
Nominal           | Types of music             | Put into categories
                  | University faculties       |
                  | Vehicle makes              |
Ordinal           | Motion picture ratings:    | Put into categories
                  | G – General audiences      | Put into order
                  | PG – Parental guidance     |
                  | PG-13 – Parents cautioned  |
                  | R – Restricted             |
                  | NC-17 – No under 17        |
Interval          | Years: 2009, 2010, 2011    | Put into categories
                  | Months: 1, 2, . . . , 12   | Put into order
                  |                            | Differences between values are meaningful
Ratio             | Rainfall                   | Put into categories
                  | Humidity                   | Put into order
                  | Income                     | Differences between values are meaningful
                  |                            | Ratios are meaningful
1.4 Graphical Displays

After you have organized the data into a frequency distribution, you can present them in graphical form.
The purpose of graphs in statistics is to convey the data to the viewers in pictorial form. It is easier for most
people to comprehend the meaning of data presented graphically than data presented numerically in tables
or frequency distributions. This is especially true if the users have little or no statistical knowledge.
Statistical graphs can be used to describe the data set or to analyze it. Graphs are also useful in getting
the audience’s attention in a publication or a speaking presentation. They can be used to discuss an issue,
reinforce a critical point, or summarize a data set. They can also be used to discover a trend or pattern
in a situation over a period of time. The commonly used graphs in research are; the pie chart, bar chart,
histogram, frequency polygon and the cumulative frequency curve (Ogive).
2.2.1 Pie Chart

It’s a circular graph whose radii divide a circle into sectors, with each angle proportional to the relative size of the quantity in the category being represented.
How to Draw
(i) Add up the given quantities and let s be the sum of the values.
(ii) For each quantity x, calculate the representative angle and percentage as $\frac{x}{s} \times 360^\circ$ and $\frac{x}{s} \times 100\%$ respectively.
(iii) Draw a circle and divide it into sectors using the angles calculated in step ii above
(iv) Label the sector by the group represented and indicate the corresponding percentage.
Example
This frequency distribution shows the number of pounds of each snack food eaten during the Super Bowl.
Construct a pie graph for the data
Snack                | Potato chips | Tortilla chips | Pretzels | Popcorn | Snack nuts
Pounds (in millions) | 11.2         | 8.2            | 4.3      | 3.8     | 2.5
Solution
Snack                          | Potato chips | Tortilla chips | Pretzels | Popcorn | Snack nuts | Total
Pounds (in millions)           | 11.2         | 8.2            | 4.3      | 3.8     | 2.5        | 30.0
Representative angle (degrees) | 134          | 98             | 52       | 46      | 30         | 360
Representative %age            | 37.3         | 27.3           | 14.3     | 12.7    | 8.3        | 99.9
Figure 1:
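The angle and percentage calculation in steps (i) and (ii) can be sketched in Python as follows. This is a minimal illustration using the snack data from the example; the function name pie_sectors is an assumption made for this sketch and does not come from the notes.

```python
# Snack data from the example (pounds in millions).
snacks = {
    "Potato chips": 11.2,
    "Tortilla chips": 8.2,
    "Pretzels": 4.3,
    "Popcorn": 3.8,
    "Snack nuts": 2.5,
}


def pie_sectors(quantities):
    """Return {label: (angle in degrees, percentage)} for a pie chart.

    pie_sectors is a hypothetical helper name used only in this sketch.
    """
    s = sum(quantities.values())  # step (i): sum of all quantities
    return {
        label: (x / s * 360, x / s * 100)  # step (ii): angle and percentage
        for label, x in quantities.items()
    }


sectors = pie_sectors(snacks)
for label, (angle, pct) in sectors.items():
    print(f"{label:15s} {angle:6.1f} degrees {pct:5.1f} %")
```

Rounding the angles to whole degrees gives 134, 98, 52, 46 and 30. The rounded percentages add up to 99.9 rather than 100 purely because of rounding.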
2.2.2 Bar chart

A bar chart consists of a set of equal spaced rectangles whose heights are proportional to the frequency of
the category /item being considered. The X axis in a bar chart can represent the number of categories.
Note: Bars are of uniform width and there is equal spacing between the bars.
Example
A sample of 250 students was asked to indicate their favourite TV channels and their responses were as
follows
Figure 2:
2.2.3 Pareto Charts

A Pareto chart is used to represent a frequency distribution for a categorical variable. It consists of a set of adjacent rectangles where the variable displayed on the horizontal axis is qualitative or categorical and the frequencies are displayed by the heights of vertical bars, which are arranged in order from highest to lowest.
Points to note when drawing a Pareto Chart
i) Make the bars the same width.
ii) Arrange the data from largest to smallest according to frequency.
iii) Make the units that are used for the frequency equal in size.
When you analyze a Pareto chart, make comparisons by looking at the heights of the bars.
Example
The table shown here is the average cost per mile for passenger vehicles on state turnpikes. Construct a Pareto chart for the data.
State Indiana Oklahoma Florida Maine Pennsylvania

Number 2.9 4.3 6.0 3.8 5.8
Solution
Arrange the data from the largest to smallest according to frequency.
State Florida Pennsylvania Oklahoma Maine Indiana

Number 6.0 5.8 4.3 3.8 2.9
Figure 3:
2.1 Frequency Distribution Tables

Definitions
Raw Data: unprocessed data, i.e., data in its original form.
Frequency Distribution:
The organization of raw data in table form with classes and frequencies. Put simply, it is a list of values and the number of times they appear in the data set. Grouped frequency distribution tables are used for large data sets and ungrouped ones for small data sets.
A frequency distribution is a table in which data are grouped into classes and the number of values (frequencies) which fall in each class is recorded.
The main purpose of constructing a frequency distribution is to get insight into the distribution pattern
of the frequencies over the classes. Hence, the name frequency distribution is used to refer to this pattern.
2.1.1 Construction of Ungrouped Frequency Distributions

Note the largest and smallest observations in the data
Starting with the smallest value, tally the observations of each quantity.
Count the number of tallies for each quantity and record it as frequency.
Example
In a survey of 40 families in a village, the number of children per family was recorded and the following data
was obtained
1 0 3 2 1 5 6 2
2 1 0 3 4 2 1 6
3 2 1 5 3 3 2 4
2 2 3 0 2 1 4 5
3 3 4 4 1 2 4 5
Number of children Tally Frequency
0 3
1 7
2 10
3 8
4 6
5 4
6 2
Total 40
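The tallying procedure above can be reproduced with a short Python sketch. It uses collections.Counter from the standard library on the 40 family sizes in the example; the variable names are illustrative only.

```python
from collections import Counter

# Number of children per family for the 40 families in the example.
children = [
    1, 0, 3, 2, 1, 5, 6, 2,
    2, 1, 0, 3, 4, 2, 1, 6,
    3, 2, 1, 5, 3, 3, 2, 4,
    2, 2, 3, 0, 2, 1, 4, 5,
    3, 3, 4, 4, 1, 2, 4, 5,
]

# Counter tallies how many times each value occurs.
freq = Counter(children)

# List every value from the smallest to the largest observation.
table = [(value, freq[value]) for value in range(min(children), max(children) + 1)]

print("Number of children  Frequency")
for value, count in table:
    print(f"{value:>18d}  {count:>9d}")
print(f"{'Total':>18s}  {sum(freq.values()):>9d}")
```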
2.1.2 Construction of Grouped Frequency Distributions

When the number of observations is too large and/or when the variable of interest is continuous, it’s cumbersome to consider the repetition of each observation. A quick and more convenient way is to group the range of values into a number of exclusive groups or classes and count the class frequency. The resulting table is called a grouped frequency distribution table. A grouped frequency distribution consists of classes and their corresponding frequencies. Each raw data value is placed into a quantitative or qualitative category called a class.
Example 2
These data represent the record high temperatures in degrees Fahrenheit (F) for each of the 50 states. Construct a grouped frequency distribution for the data.
112 100 127 120 134 118 105 110 109 112
110 118 117 116 118 122 114 114 105 109
107 112 114 115 118 117 118 122 106 110
116 108 110 121 113 120 119 111 104 111
120 113 120 117 105 110 118 112 114 114
Solution
1 Find the minimum (100) and maximum (134) values in the data set and calculate the range.
Range = Maximum − Minimum = 134 − 100 = 34
2 Decide on the number of classes. Use Sturges’ rule, which states that
$$k = \text{Round-up}(1 + 1.44 \ln n) = \text{Round-up}(1 + 1.44 \times \ln 50) = \text{Round-up}(6.63) = 7$$
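3 Calculate the class width such that no. of classes × class width > Range
7 × class width > 34
Any width of 5 or more satisfies this condition. Here a class width of 6 is chosen, so that 6 classes starting at 100 already cover the range (6 × 6 = 36 > 34).
Class width = 6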
Class     | Boundaries    | Tally | Frequency
100 – 105 | 99.5 – 105.5  |       | 5
106 – 111 | 105.5 – 111.5 |       | 12
112 – 117 | 111.5 – 117.5 |       | 17
118 – 123 | 117.5 – 123.5 |       | 14
124 – 129 | 123.5 – 129.5 |       | 1
130 – 135 | 129.5 – 135.5 |       | 1
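The construction steps can be checked with a short Python sketch. It counts the 50 temperatures into classes of width 6 starting at 100, and evaluates Sturges’ rule as stated in step 2. The helper names sturges and grouped_frequencies are assumptions made for this illustration.

```python
import math

# Record high temperatures (degrees Fahrenheit) for the 50 states.
temps = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
    110, 118, 117, 116, 118, 122, 114, 114, 105, 109,
    107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
    116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
    120, 113, 120, 117, 105, 110, 118, 112, 114, 114,
]


def sturges(n):
    """Number of classes suggested by Sturges' rule, k = Round-up(1 + 1.44 ln n)."""
    return math.ceil(1 + 1.44 * math.log(n))


def grouped_frequencies(data, start, width, k):
    """Count integer data into k classes: start..start+width-1, and so on."""
    counts = [0] * k
    for x in data:
        counts[(x - start) // width] += 1  # index of the class containing x
    return [
        (start + i * width, start + (i + 1) * width - 1, counts[i])
        for i in range(k)
    ]


k = sturges(len(temps))  # 7 classes suggested for n = 50
classes = grouped_frequencies(temps, start=100, width=6, k=6)

for lower, upper, f in classes:
    # Class boundaries lie half a unit beyond the class limits.
    print(f"{lower} - {upper}  ({lower - 0.5} - {upper + 0.5})  {f}")
```

The six frequencies add up to 50, which is a quick check that every observation has been placed in exactly one class.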
Continuous Frequency Distribution

If we deal with a continuous variable, it is not possible to arrange the data in the class intervals of above
type. Let us consider the distribution of age in years. If class intervals are 15 – 19, 20 – 24 then persons with
ages between 19 and 20 years are not taken into consideration. In such a case we form the class intervals
as 0–5, 5–10, 10–15, 15–20, . . . Here all the persons with any fraction of age are included in one group or the
other. In the above classes, the upper limits of each class are excluded from the respective classes and are
included in the immediate next class and are known as ‘exclusive classes’. The upper and lower class limits
of the new exclusive type classes are known as class boundaries.
If d is the gap between the upper limit of any class and the lower limit of the succeeding class, the class boundaries for any class are then given by:
$$\text{Upper class boundary} = \text{Upper class limit} + \frac{d}{2}$$
$$\text{Lower class boundary} = \text{Lower class limit} - \frac{d}{2}$$
Example 2 continued
The frequency distribution below includes the class boundaries.
class limits class boundaries f relative frequency cumulative frequency

38 – 41 37.5 – 41.5 4 0.08 4
42 – 45 41.5 – 45.5 10 0.2 14
46 – 49 45.5 – 49.5 8 0.16 22
50 – 53 49.5 – 53.5 15 0.3 37
54 – 57 53.5 – 57.5 9 0.18 46
58 – 61 57.5 – 61.5 3 0.06 49
62 – 65 61.5 – 65.5 1 0.02 50
Total 50
Example 3
The monthly expenditures (thousands of rands) of 60 households are shown below. The values of this data
set were accurately recorded (not rounded).
7.21741 7.8989 6.85461 10.31167 8.48253 5.17069

5.09063 8.16412 5.67094 7.7394 7.87423 5.41634
9.37265 10.14436 7.15675 10.31107 8.86571 10.1734
5.99276 6.5738 7.06965 8.82439 7.47467 9.50018
4.90014 5.50273 8.12516 5.51933 7.43641 10.95599
5.87188 9.36936 9.83773 10.18893 5.12028 9.60018
8.56534 9.27719 8.37107 7.03318 10.78344 9.08941
6.85749 7.7887 9.68159 6.75009 8.0521 8.19638
10.17312 7.51527 11.31383 8.5765 7.48021 8.39881
7.37565 7.28159 8.81773 5.53182 5.98515 7.71778
Class midpoints
The midpoint of a class ($x_{mid}$) can be calculated from
$$x_{mid} = \frac{\text{Lower class limit (boundary)} + \text{Upper class limit (boundary)}}{2}$$
Examples
1) For the frequency distribution in example 2 (temperature data), the class midpoints are given below.
class limits class boundaries midpoints

38 – 41 37.5 – 41.5 39.5
42 – 45 41.5 – 45.5 43.5
46 – 49 45.5 – 49.5 47.5
50 – 53 49.5 – 53.5 51.5
54 – 57 53.5 – 57.5 55.5
58 – 61 57.5 – 61.5 59.5
62 – 65 61.5 – 65.5 63.5
2) For the frequency distribution in example 3, the class midpoints are given below.
classes midpoints
4.5 – 5.5 5
5.5 – 6.5 6
6.5 – 7.5 7
7.5 – 8.5 8
8.5 – 9.5 9
9.5 – 10.5 10
10.5 – 11.5 11
For this distribution lower (upper) class limit = lower (upper) class boundary for each of the classes.
A value that falls on the boundary of 2 classes is allocated to the higher of the two classes e.g. 5.50000 is
allocated to the class 5.5 – 6.5 (not 4.5 to 5.5).
Cumulative frequencies
The “less than” cumulative frequency of a class is the number of values in the sample that are less than or
equal to the upper class boundary of the class.
Examples
1) See frequency distribution in example 2 (temperature data).
2) For the frequency distribution in example 3 (expenditure data) the cumulative frequencies are calculated as shown below.
classes upper class boundary f cumulative frequencies calculations

4.5 – 5.5 5.5 5 5 5
5.5 – 6.5 6.5 7 12 5+7
6.5 – 7.5 7.5 13 25 5+7+13
7.5 – 8.5 8.5 13 38 5+7+13+13
8.5 – 9.5 9.5 9 47 5+7+13+13+9
9.5 – 10.5 10.5 10 57 5+7+13+13+9+10
10.5 – 11.5 11.5 3 60 5+7+13+13+9+10+3
Total 60
Relative and percentage frequencies

Relative frequency = frequency / sample size, i.e. $R_f = \frac{f}{n}$.
The percentage frequency of a class is calculated from relative frequency × 100.
classes f relative frequency percentage frequency

4.5 – 5.5 5 0.083 8.3
5.5 – 6.5 7 0.117 11.7
6.5 – 7.5 13 0.217 21.7
7.5 – 8.5 13 0.217 21.7
8.5 – 9.5 9 0.15 15
9.5 – 10.5 10 0.167 16.7
10.5– 11.5 3 0.05 5
Total 60 1 100
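The cumulative, relative and percentage frequencies for the expenditure data can be computed from the class frequencies alone. The sketch below is a minimal illustration using itertools.accumulate from the Python standard library; the variable names are illustrative.

```python
from itertools import accumulate

# Class frequencies for the expenditure data (classes 4.5-5.5 up to 10.5-11.5).
f = [5, 7, 13, 13, 9, 10, 3]
n = sum(f)  # sample size

# "Less than" cumulative frequencies: running totals of f.
cumulative = list(accumulate(f))

# Relative frequency f/n and percentage frequency 100 * f/n.
relative = [round(fi / n, 3) for fi in f]
percentage = [round(100 * fi / n, 1) for fi in f]

for fi, cf, rf, pf in zip(f, cumulative, relative, percentage):
    print(f"{fi:3d} {cf:3d} {rf:6.3f} {pf:5.1f}")
```

The last cumulative frequency always equals the sample size, here 60.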
Histogram
A histogram is the graphical representation of a frequency distribution. The frequency for each class is
represented by a rectangular bar with the class boundaries as base and the frequency as height.
Example
A histogram of the frequency distribution in example 2 (temperature data) is shown below.
Frequency polygon
This is also a graphical representation of a frequency distribution. For each class the class midpoint is plotted
against the frequency and the plotted points joined by means of straight lines.
Figure 4:
Example
For the temperature data the following values are plotted.
midpoint 35.5 39.5 43.5 47.5 51.5 55.5 59.5 63.5 67.5
f 0 4 10 8 15 9 3 1 0
The plot is shown on the following page.
Example:

Consider the following frequency distribution
Class 5-9 10-14 15-19 20-24 25-29 30-34 35-39

Frequency 5 12 32 40 16 9 6
Solution
Boundaries 4.5-9.5 9.5-14.5 14.5-19.5 19.5-24.5 24.5-29.5 29.5-34.5 34.5-39.5

Heights (frequency ÷ class width of 5) 1 2.4 6.4 8 3.2 1.8 1.2
The corresponding histogram is as shown below.
2.3 Stem and Leaf Plots

The stem and leaf plot is a method of organizing data and is a combination of sorting and graphing. It
has the advantage over a grouped frequency distribution of retaining the actual data while showing them in
graphical form. A stem and leaf plot is a data plot that uses part of the data value as the stem and part of
the data value as the leaf to form groups or classes. For this plot, it’s easy to identify the mode, the smallest value and the largest value.
Figure 5:
Figure 6:
Note:
(i) In a stem and leaf plot, the class width/interval must be uniform.

(ii) The leaves in the final stem and leaf plot should be arranged in order.
Example 1
At an outpatient testing center, the number of cardiograms performed each day for 20 days is shown.
Construct a stem and leaf plot for the data.
25 31 20 32 13 14 43 02 57 23 36 32 33 32 44 32 52 44 51 45
Solution
Arrange the data in order and separate the data according to the first digit, as shown.
02
13, 14
20, 23, 25
31, 32, 32, 32, 32, 33, 36
43, 44, 44, 45
51, 52, 57
Figure 7:
The plot above shows that the distribution peaks in the center and that there are no gaps in the data. For 7 of the 20 days, the number of patients receiving cardiograms was between 31 and 36. The plot also shows that the testing center treated from a minimum of 2 patients to a maximum of 57 patients in any one day.
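A stem and leaf plot for the cardiogram data can be built with a few lines of Python. The sketch below takes the tens digit as the stem and the units digit as the leaf; the function name stem_and_leaf is an assumption made for this illustration.

```python
from collections import defaultdict

# Number of cardiograms performed each day for 20 days.
cardiograms = [25, 31, 20, 32, 13, 14, 43, 2, 57, 23,
               36, 32, 33, 32, 44, 32, 52, 44, 51, 45]


def stem_and_leaf(data, leaf_unit=10):
    """Group sorted values by stem (value // leaf_unit); leaves are the remainders."""
    groups = defaultdict(list)
    for x in sorted(data):  # sorting first keeps the leaves in order
        stem, leaf = divmod(x, leaf_unit)
        groups[stem].append(leaf)
    # Include every stem between the smallest and largest so gaps stay visible.
    return {s: groups.get(s, []) for s in range(min(groups), max(groups) + 1)}


plot = stem_and_leaf(cardiograms)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

The longest row is the stem 3, which corresponds to the 7 days with between 31 and 36 cardiograms.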
2.3.1 Back-to-Back Stem and Leaf Plot

Related distributions can be compared by using a back-to-back stem and leaf plot. The back-to-back stem
and leaf plot uses the same digits for the stems of both distributions, but the digits that are used for the
leaves are arranged in order out from the stems on both sides.
Example 1
The growth (in centimetres) of two varieties of plant after 20 days is shown in this table. Construct a
back-to-back stem and leaf plot for the data, and compare the distributions.
Figure 8:
Example
The number of stories in two selected samples of tall buildings in Atlanta and Philadelphia is shown. Construct a back-to-back stem and leaf plot, and compare the distributions.
Solution
The final back-to-back stem and leaf plot looks like the one below
Figure 9:
Figure 10:
The buildings in Atlanta have a large variation in the number of stories per building. Although both
distributions are peaked in the 30- to 39-story class, Philadelphia has more buildings in this class. Atlanta
has more buildings that have 40 or more stories than Philadelphia does.
Lecture Two: Measures of Location and Central Tendency
Objectives
By the end of this unit, you will be able to:
calculate the Arithmetic Mean
obtain the median of grouped and ungrouped distributions
calculate the
2.0 Numerical summaries

A numerical summary for a set of data is referred to as a statistic if the data set is a sample and a
parameter if the data set is the entire population.
Numerical summaries are categorized as measures of location and measures of spread.
Measures of location can further be classified into measures of central tendency and measures of relative
positioning (quartiles).
Notation
Let the symbol $x_i$ denote any of the n values $x_1, x_2, \ldots, x_n$ assumed by a variable X. The letter i in $x_i$ (i = 1, 2, . . . , n) is called an index or subscript. The letters j, k, p, q or s can also be used.
Summation notation
$$x_1 + x_2 + \ldots + x_n = \sum_{i=1}^{n} x_i$$
Example
$$\sum_{i=1}^{n} X_i Y_i = X_1 Y_1 + X_2 Y_2 + \ldots + X_n Y_n$$
$$\sum_{i=1}^{n} aX_i = aX_1 + aX_2 + \ldots + aX_n = a(X_1 + X_2 + \ldots + X_n) = a\sum_{i=1}^{n} X_i$$
2.1 Measures of Central Tendency

A measure of central tendency of a set of numbers is a value which best represents it. There are three main measures of central tendency, namely the mean, median and mode. Each has advantages and disadvantages depending on the data and intended purpose.
Arithmetic Mean
The arithmetic mean of a set of values $x_1, x_2, \ldots, x_n$, denoted $\bar{x}$ if the data set is a sample, is found by dividing the sum of the values by the number of values, i.e.
$$\bar{x} = \frac{x_1 + x_2 + \ldots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Example 1
Find the mean of 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10.
Solution
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10}{10} = \frac{55}{10} = 5.5$$
Some important characteristics of the arithmetic mean are:
i) The arithmetic mean is the average which is found by the procedure:
$$\text{average} = \frac{\text{sum of all observations}}{\text{number of observations}}$$
ii) The location of the mean is dependent upon the shape of the distribution. It may not always be
representative of the centre.
iii) The arithmetic mean has mathematical properties that the other two averages do not have. (Balance
point property).
iv) Because of its mathematical properties, the mean lends itself to further mathematical analysis of data
sets unlike the other two averages.
v) The mean is used as a measure of the centre in statistical inference because these mathematical properties are important in that instance.
vi) The mean requires a quantitative variable, measured at the interval or ratio level.
Exercise
1) Find the mean of 9, 3, 4, 2, 1, 5, 8, 4, 7, 3

2) A sample of 5 executives received the following amounts of bonus last year: sh 14,000, sh 15,000, sh 17,000, sh 16,000 and sh y. Find the value of y if the average bonus for the 5 executives is sh 15,400.
Median
The median is the value in the data set such that half of the values in the data set are less than or equal to it and half are greater than or equal to it. It’s the value below which and above which half of the observations fall when ranked in order of size.
The position of the median is given by the $\left(\frac{n+1}{2}\right)^{th}$ value, where n is the number of data values in the sample.
If the number of values in the data set is even, then the median is the average of the two middle values.
Some important characteristics of the median are:
i) The median is representative of the data array because it is in the geometrical centre of the distribution.
It is the exact halfway point. Half the observations are less and half are greater than the median value.
ii) In a positively skewed data set, the median will be to the right of the mode and on negatively skewed
data sets to the left. Draw sketches to illustrate this relationship.
iii) The median is always unique for a data set.
iv) The median is useful for descriptive purposes when the data set is skewed because of the constancy
of its location. It is always exactly the middle observation in the data array when the array is rank
ordered and it is insensitive to outliers.
v) The median can be found for data that has an ordinal level of measurement or higher.
Examples
1) The marks of students in a geography test that has a maximum possible mark of 50 are given below
47, 35, 37, 32, 38, 39, 36, 34, 35
Find the median of this set of data values.
Solution
Arrange the data values in order from the lowest to the highest value:
32, 34, 35, 35, 36, 37, 38, 39, 47
The number of values n in the data set = 9
$$\text{Median} = \tfrac{1}{2}(9 + 1)^{th}\ \text{value} = 5^{th}\ \text{value} = 36$$
2) Consider the above data set with the first value (47) omitted
Arrange the data values in order from the lowest to the highest value:
32, 34, 35, 35, 36, 37, 38, 39
The number of values n in the data set = 8, which is an even number. The two middle values in the data set are in positions $\frac{n}{2} = \frac{8}{2} = 4$ and $\frac{n}{2} + 1 = \frac{8}{2} + 1 = 5$, i.e. the values 35 and 36.
$$\text{Median} = \frac{35 + 36}{2} = 35.5$$
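The two cases (odd and even n) can be handled by one small Python function. This is a minimal sketch of the rule described above; the function name median is illustrative, and the standard library function statistics.median returns the same results.

```python
def median(values):
    """Median of raw data: middle value if n is odd, mean of the two middle values if n is even."""
    ordered = sorted(values)  # rank the observations in order of size
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # the ((n + 1) / 2)-th value
    return (ordered[mid - 1] + ordered[mid]) / 2  # average of the (n/2)-th and (n/2 + 1)-th values


marks = [47, 35, 37, 32, 38, 39, 36, 34, 35]
print(median(marks))      # 36
print(median(marks[1:]))  # 35.5 (the value 47 omitted)
```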
Median for Grouped data

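For grouped data, the median is estimated using the percentile formula with i = 50:
$$\text{Median} = L_i + c\left(\frac{\frac{n \times i}{100} - F_{Less}}{f_i}\right)$$
where
i = 1, 2, . . . , 100 is the percentile required (i = 50 for the median)
$L_i$ = Lower class boundary of the median class
$f_i$ = Frequency of the median class
n = Sample size
$F_{Less}$ = Sum of the frequencies of the classes below the median (percentile) class
c = Class width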
Mode
It’s the value occurring most frequently in a data set. If each observation occurs the same number of times, then there is no mode. When 2 or more observations occur most frequently in a data set, the data is said to be multimodal.
Example
Find the mode of the following data set:
48, 44, 48, 45, 42, 49, 48

The mode is 48 since it occurs most often.
Advantages of Mode
Can be determined for all levels of measurement (nominal, ordinal, interval and ratio).
It is not affected by extremely high or low values in the data set.
It can be used as a measure of central tendency in distributions with open ended classes.
Some of the important characteristics of the mode are:
i) The mode may not be unique since a distribution may have more than one mode.
ii) There is no calculation required to find the mode since it is obtained by inspection of the data.
iii) If the mode is used as a representative of individual values in the array, it will be in error less often
than any other average used. If the size of this error is important such as in sizes in the manufacturing
business, the mode is a good representative of the data array.
iv) In a negatively skewed distribution, the mode is to the right of the midrange. In a positively skewed
distribution the mode is to the left of the midrange. Draw sketches to illustrate this relationship.
v) If what is to be portrayed is a typical value in an array, it is most typical because no value occurs more
often.
vi) For data measured at the nominal level, it is the only average that can be found.
Mode for grouped data

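For grouped data, the mode is estimated using the formula:
$$\text{Mode} = L_i + c \times \frac{f - f_a}{2f - f_a - f_b}$$
where $L_i$ is the lower class boundary of the modal class, c is the class width, f is the frequency of the modal class, and $f_a$ and $f_b$ are the frequencies of the classes immediately before and after the modal class.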
Example
Estimate the median and mode for the following frequency distribution:
Class 5-9 10-14 15-19 20-24 25-29 30-34 35-39
Freq 5 12 32 40 16 9 6
Solution
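Class 4.5-9.5 9.5-14.5 14.5-19.5 19.5-24.5 24.5-29.5 29.5-34.5 34.5-39.5
Freq 5 12 32 40 16 9 6
xf 35 144 544 880 432 288 222
CF 5 17 49 89 105 114 120
Here n = 120, so the median is the value at position 120 × 50/100 = 60, which falls in the class 19.5-24.5.
$$\text{Median} = L_i + c\left(\frac{\frac{n \times i}{100} - F_{Less}}{f_i}\right) = 19.5 + 5 \times \frac{60 - 49}{40} = 20.875$$
The modal class is also 19.5-24.5, the class with the highest frequency (40).
$$\text{Mode} = L_i + c \times \frac{f - f_a}{2f - f_a - f_b} = 19.5 + 5 \times \frac{40 - 32}{80 - 32 - 16} = 20.75$$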
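The two grouped-data formulas can be sketched in Python as follows. The median is computed from the percentile formula exactly as stated, with the target position n × i/100 and i = 50. The function names grouped_percentile and grouped_mode are assumptions made for this illustration.

```python
def grouped_percentile(lower_bounds, freqs, width, i):
    """i-th percentile of grouped data: L + c * (n*i/100 - F_less) / f."""
    n = sum(freqs)
    target = n * i / 100  # position of the required percentile
    f_less = 0            # cumulative frequency of the classes below the current one
    for lower, f in zip(lower_bounds, freqs):
        if f_less + f >= target:
            return lower + width * (target - f_less) / f
        f_less += f
    raise ValueError("i must lie between 0 and 100")


def grouped_mode(lower_bounds, freqs, width):
    """Mode of grouped data: L + c * (f - fa) / (2f - fa - fb)."""
    m = max(range(len(freqs)), key=lambda j: freqs[j])  # index of the modal class
    f = freqs[m]
    fa = freqs[m - 1] if m > 0 else 0                   # frequency of the class before
    fb = freqs[m + 1] if m < len(freqs) - 1 else 0      # frequency of the class after
    return lower_bounds[m] + width * (f - fa) / (2 * f - fa - fb)


lower_bounds = [4.5, 9.5, 14.5, 19.5, 24.5, 29.5, 34.5]
freqs = [5, 12, 32, 40, 16, 9, 6]

print(grouped_percentile(lower_bounds, freqs, 5, 50))  # median, 20.875
print(grouped_mode(lower_bounds, freqs, 5))            # mode, 20.75
```

Other percentiles, such as the quartiles, follow from the same function by passing i = 25 or i = 75.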
Comparison of mean, Median and Mode

(i) The mean is used as a measure of central tendency for symmetrical, bell-shaped data that do not have
extreme values (extreme values are called outliers).
(ii) The median may be more useful than the mean when there are extreme values in the data set as it is
not affected by extreme values.
(iii) The mode is useful when the most common item characteristic or values of a data set is required.
Weighted Arithmetic Mean

The weighted arithmetic mean of a set of n numbers $x_1, x_2, \ldots, x_n$ having corresponding weights $w_1, w_2, \ldots, w_n$ is defined as
$$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \ldots + w_n x_n}{w_1 + w_2 + \ldots + w_n} = \frac{\sum w_i x_i}{\sum w_i}$$
Example 1
Consider the following table with marks obtained by two students James (mark x) and John (mark y). The
weights are to be used in determining who joins the engineering course whose requirement is a weighted
mean of 58% on the four subjects below;
Subject Maths English History Physics Total

Mark x 25 87 83 30 225
Mark y 70 45 35 75 225
Weight 3.6 2.3 1.5 2.6 10
Example 2
If a final examination is weighted 4 times as much as a quiz, a midterm examination 3 times as much as a quiz, and a student has a final examination grade of 80, a midterm examination grade of 95 and quiz grades of 90, 65 and 70, the mean grade is
$$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{1(90) + 1(65) + 1(70) + 3(95) + 4(80)}{1 + 1 + 1 + 3 + 4} = \frac{830}{10} = 83$$
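The weighted mean formula translates directly into Python. The sketch below applies it to Example 2 and to the marks of James and John in Example 1; the function name weighted_mean is an assumption made for this illustration.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Example 2: three quizzes (weight 1 each), midterm (weight 3), final (weight 4).
grades = [90, 65, 70, 95, 80]
grade_weights = [1, 1, 1, 3, 4]
print(weighted_mean(grades, grade_weights))  # 83.0

# Example 1: Maths, English, History, Physics with the weights from the table.
subject_weights = [3.6, 2.3, 1.5, 2.6]
james = [25, 87, 83, 30]
john = [70, 45, 35, 75]
print(round(weighted_mean(james, subject_weights), 2))
print(round(weighted_mean(john, subject_weights), 2))
```

For Example 1 this gives about 49.26 for James and 60.3 for John. Both students have the same unweighted total of 225, yet only John reaches the required weighted mean of 58%, because his higher marks are in the heavily weighted subjects.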
Geometric mean
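For raw data, let $x_1, x_2, \ldots, x_n$ be the sample values. The geometric mean is given by the formula:
$$\bar{X}_{GM} = \sqrt[n]{x_1 \cdot x_2 \cdots x_n} = \sqrt[n]{\prod_{i=1}^{n} x_i}$$
For grouped data the formula for the geometric mean is given by
$$\bar{X}_{GM} = \sqrt[n]{x_1^{f_1} \cdot x_2^{f_2} \cdots x_k^{f_k}} = \sqrt[n]{\prod_{i=1}^{k} x_i^{f_i}}$$
$$\Rightarrow \log(\bar{X}_{GM}) = \frac{1}{n}\sum_{i=1}^{k} f_i \log x_i$$
where $f_i$ is the frequency of the value $x_i$, k is the number of distinct values (or classes) and n is the sum of the frequencies.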
1) Find the geometric mean of the numbers 2, 4 and 8

Solution
The geometric mean is
$$\bar{X}_{GM} = \sqrt[3]{2 \times 4 \times 8} = \sqrt[3]{64} = 4$$
2) Find the geometric mean of the frequency table below
x 13 14 15 16 17
f 2 5 13 7 3
$$\bar{X}_{GM} = \sqrt[n]{x_1^{f_1} \cdot x_2^{f_2} \cdots x_k^{f_k}} = \sqrt[30]{13^2 \times 14^5 \times 15^{13} \times 16^7 \times 17^3} = 15.09837$$
Harmonic mean
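Let $x_1, x_2, \ldots, x_n$ be the sample values. The harmonic mean is given by the formula:
$$\bar{X}_{HM} = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \ldots + \frac{1}{x_n}} = \frac{n}{\sum \frac{1}{x_i}}$$
For grouped data, the harmonic mean is given by the formula:
$$\bar{X}_{HM} = \frac{n}{\frac{f_1}{x_1} + \frac{f_2}{x_2} + \ldots + \frac{f_k}{x_k}} = \frac{n}{\sum \frac{f_i}{x_i}}$$
where $f_i$ is the frequency, $x_i$ is the class midpoint and n is the sum of the frequencies.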
1) Find the harmonic mean of the numbers 2, 4 and 8

Solution
The harmonic mean is
$$\bar{X}_{HM} = \frac{3}{\frac{1}{2} + \frac{1}{4} + \frac{1}{8}} = \frac{3}{\frac{7}{8}} = \frac{24}{7} = 3.43$$
2) Find the harmonic mean of the frequency table below
x 13 14 15 16 17
f 2 5 13 7 3
Solution
$$\bar{X}_{HM} = \frac{30}{\frac{2}{13} + \frac{5}{14} + \frac{13}{15} + \frac{7}{16} + \frac{3}{17}} = \frac{30}{1.9916} \approx 15.063$$
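Both means can be checked numerically for the frequency table x = 13, . . . , 17. The sketch below computes the geometric mean through logarithms (the log form of the formula) and the harmonic mean as n divided by the sum of f/x. The function names are assumptions made for this illustration.

```python
import math


def geometric_mean(xs, fs):
    """Geometric mean of a frequency table via log(GM) = (1/n) * sum(f * log x)."""
    n = sum(fs)
    return math.exp(sum(f * math.log(x) for x, f in zip(xs, fs)) / n)


def harmonic_mean(xs, fs):
    """Harmonic mean of a frequency table: n / sum(f / x)."""
    n = sum(fs)
    return n / sum(f / x for x, f in zip(xs, fs))


xs = [13, 14, 15, 16, 17]
fs = [2, 5, 13, 7, 3]

print(round(geometric_mean(xs, fs), 3))  # about 15.098
print(round(harmonic_mean(xs, fs), 3))   # about 15.063

# Raw data are the special case where every frequency equals 1.
print(round(geometric_mean([2, 4, 8], [1, 1, 1]), 2))  # 4.0
print(round(harmonic_mean([2, 4, 8], [1, 1, 1]), 2))   # 3.43
```

For positive data that are not all equal, the harmonic mean is smaller than the geometric mean, as it is here.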
Measures of Variability
Dispersion
Averages or the measures of central tendency give us an idea of the concentration of the observations about the central part of the distribution.
Characteristics for an Ideal measure of Dispersion.

The characteristics of an ideal measure of dispersion are the same as those of an ideal measure of central tendency, viz.,
(i) It should be rigidly defined.
(ii) It should be easy to calculate and easy to understand.
(iii) It should be based on all the observations.
(iv) It should be amenable to further mathematical treatment.
(v) It should be affected as little as possible by fluctuations of sampling.
Measures of Dispersion
The following are the measures of dispersion:
(i) Range,
(ii) Quartile deviation or Semi-interquartile range,
(iii) Mean deviation, and
(iv) Standard deviation.
Range
The range is the difference between the two extreme observations of the distribution. If A and B are the greatest and smallest observations respectively in a distribution, then its range is A − B.
The range is the simplest but a crude measure of dispersion. Since it is based on two extreme observations which themselves are subject to chance fluctuations, it is not at all a reliable measure of dispersion.
Quartile deviation
Quartile deviation or semi-interquartile range Q is given by
$$Q = \frac{1}{2}(Q_3 - Q_1)$$
where Q1 and Q3 are the first and third quartiles of the distribution respectively.
Quartile deviation is definitely a better measure than the range as it makes use of 50% of the data. But
since it ignores the other 50% of the data, it cannot be regarded as a reliable measure.
Mean Absolute Deviation (MAD)

The mean absolute deviation is the average of the absolute deviations from the mean. For raw (ungrouped) data it is given by
$$MAD = \frac{\sum |x - \bar{x}|}{n}$$
The mean absolute deviation for grouped data is given by the formula:
$$MAD = \frac{\sum f|x - \bar{x}|}{n}$$
Since the mean deviation is based on all the observations, it is a better measure of dispersion than the range or quartile deviation.
It may be pointed out here that the mean deviation is least when taken from the median.
Example 1
Find the quartile deviation and the mean absolute deviation for the following data.
3, 6, 9, 10, 7, 12, 13, 15, 6, 5, 13
Solution
Sorted data: 3, 5, 6, 6, 7, 9, 10, 12, 13, 13, 15
Recall Q1 = 6 and Q3 = 13
\[ SIQR = \tfrac{1}{2}(Q_3 - Q_1) = \tfrac{1}{2}(13 - 6) = 3.5 \]
\[ \bar{x} = \frac{3 + 5 + 6 + 6 + 7 + 9 + 10 + 12 + 13 + 13 + 15}{11} = \frac{99}{11} = 9 \]
\[ MAD = \frac{\sum |x - \bar{x}|}{n} = \frac{|3 - 9| + |5 - 9| + |6 - 9| + |6 - 9| + \dots + |13 - 9| + |15 - 9|}{11} = \frac{36}{11} = 3.2727 \]
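The mean absolute deviation in this example can be checked with a few lines of Python. This is only a sketch of the raw-data formula; the function name `mean_absolute_deviation` is chosen here for illustration and does not come from any library.

```python
def mean_absolute_deviation(values):
    """Average absolute deviation of the values from their arithmetic mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


data = [3, 6, 9, 10, 7, 12, 13, 15, 6, 5, 13]
mad = mean_absolute_deviation(data)
print(round(mad, 4))  # 3.2727
```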
Variance and Standard Deviation

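The average of the squared deviations from the mean is called the variance, denoted by S², and is given by
\[ S^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2 \]
For grouped data, the variance is given by
\[ S^2 = \frac{1}{n}\sum_{i=1}^{k} f_i (x_i - \bar{x})^2 = \frac{1}{n}\sum_{i=1}^{k} f_i x_i^2 - \bar{x}^2 \]
where n is the sum of the frequencies (the sample size) and k is the number of classes. The standard deviation S is the positive square root of the variance.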

Example
Find the variance and standard deviation for the data.
3, 6, 9, 10, 7, 12, 13, 15, 6, 5, 13
Solution
\[ \bar{x} = \frac{3 + 5 + 6 + 6 + 7 + 9 + 10 + 12 + 13 + 13 + 15}{11} = 9 \]
\[ S^2 = \frac{(3 - 9)^2 + (5 - 9)^2 + \dots + (13 - 9)^2 + (15 - 9)^2}{11} = \frac{36 + 16 + 9 + 9 + 4 + 0 + 1 + 9 + 16 + 16 + 36}{11} = \frac{152}{11} = 13.8182 \]
Standard deviation
\[ S = \sqrt{S^2} = \sqrt{13.8182} = 3.7173 \]
Example 2
Find the standard deviation of the data: 2, 4, 8, 7, 9, 4, 6, 10, 8, and 5.
Solution
\[ \bar{X} = \frac{2 + 4 + 8 + 7 + 9 + 4 + 6 + 10 + 8 + 5}{10} = \frac{63}{10} = 6.3 \]
\[ \sum x^2 = 2^2 + 4^2 + 8^2 + 7^2 + 9^2 + 4^2 + 6^2 + 10^2 + 8^2 + 5^2 = 455 \]
\[ S^2 = \frac{1}{n}\sum x^2 - \bar{x}^2 = 45.5 - 6.3^2 = 5.81 \]
Standard deviation \( S = \sqrt{S^2} = \sqrt{5.81} = 2.4104 \).
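The computational formula used in Example 2 can be sketched in Python as follows. The helper `population_variance` is a name chosen here for illustration; it divides by n, as in these notes, and its result is compared with `statistics.pvariance` from the standard library, which uses the same divisor.

```python
import math
import statistics


def population_variance(values):
    """Variance with divisor n: mean of the squares minus the square of the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(x * x for x in values) / n - mean ** 2


data = [2, 4, 8, 7, 9, 4, 6, 10, 8, 5]
variance = population_variance(data)
std_dev = math.sqrt(variance)
print(round(variance, 2), round(std_dev, 4))  # 5.81 2.4104

# Cross-check against the standard library (also divides by n)
library_variance = statistics.pvariance(data)
```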
Example 3
Estimate the mean, and standard deviation for the frequency table below:
Class   5-9   10-14  15-19  20-24  25-29  30-34  35-39
Freq    5     12     32     40     16     9      6
Solution
Class           5-9   10-14  15-19  20-24  25-29  30-34  35-39  Total
Mid-point (x)   7     12     17     22     27     32     37
Freq (f)        5     12     32     40     16     9      6      120
fx              35    144    544    880    432    288    222    2545
fx²             245   1728   9248   19360  11664  9216   8214   59675
\[ \bar{X} = \frac{\sum fx}{n} = \frac{2545}{120} = 21.2083 \]
\[ S^2 = \frac{1}{n}\sum f x^2 - \bar{x}^2 = \frac{59675}{120} - 21.2083^2 = 47.4983 \]
\[ S = \sqrt{S^2} = \sqrt{47.4983} = 6.8919 \]
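The grouped-data calculation of Example 3 can be reproduced with the short sketch below. It assumes, as the solution does, that every observation in a class sits at the class mid-point; the variable names are illustrative only.

```python
import math

# Class mid-points and frequencies from Example 3
midpoints = [7, 12, 17, 22, 27, 32, 37]
freqs = [5, 12, 32, 40, 16, 9, 6]

n = sum(freqs)                                               # total frequency
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))        # sum of f*x
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))   # sum of f*x^2

mean = sum_fx / n
variance = sum_fx2 / n - mean ** 2
std_dev = math.sqrt(variance)

print(n, sum_fx, sum_fx2)                 # 120 2545 59675
print(round(mean, 4), round(std_dev, 4))  # 21.2083 6.8919
```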
3.4 Measures of Relative Dispersion

These measures are used in comparing spreads of two or more sets of observations. These measures are
independent of the units of measurement. These are a sort of ratio and are called coefficients.
Suppose that the two distributions to be compared are expressed in the same units and their means are
equal or nearly equal. Then their variability can be compared directly by using their standard deviations.
However, if their means are widely different or if they are expressed in different units of measurement,
we can not use the standard deviations as such for comparing their variability. We have to use the relative
measures of dispersion in such situations.
Measures of relative dispersion include:
Coefficient of quartile deviation,
Coefficient of mean deviation
Coefficient of variation.
Coefficient of Quartile Deviation

The Coefficient of Quartile Deviation of x, CQD(x), is given by
\[ CQD(x) = \frac{Q_3 - Q_1}{Q_3 + Q_1} \times 100\% \]
Coefficient of Mean Deviation

The Coefficient of Mean Deviation, CMD(x), is given by
\[ CMD(x) = \frac{MAD}{\text{Mean}} \times 100\% \]
3.4.2 Coefficient of Variation:

The coefficient of variation is the ratio of the standard deviation to the arithmetic mean, usually
expressed as a percentage. The coefficient of variation of x, denoted CV(x), is given by the formula
\[ CV(x) = \frac{S}{\bar{X}} \times 100\% \]
where S is the standard deviation and X̄ is the sample mean of variable X.
The coefficient has no units, i.e. it is independent of the units of measurement. It is useful in comparing
spreads of two or more populations. The smaller the coefficient of variation, the higher the peak and the
lower the spread and vice versa.
Note:
Standard deviation is an absolute measure of dispersion, while the coefficient of variation is a relative measure of
dispersion.
Example 1
Consider the distribution of the yields (per plot) of two ground nut varieties. For the first variety, the mean
and standard deviation are 82 kg and 16 kg respectively. For the second variety, the mean and standard
deviation are 55 kg and 8 kg respectively. Then we have, for the first variety
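\[ CV(x) = \frac{16}{82} \times 100 = 19.5\% \]
For the second variety
\[ CV(x) = \frac{8}{55} \times 100 = 14.5\% \]
It is apparent that the variability in the second variety is less than that in the first variety. Here the standard
deviations (8 kg against 16 kg) point the same way, but when the means differ widely a comparison based on
standard deviations alone can give the reverse ranking.
Example 2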
Below are the scores of two cricketers in 10 innings. Find who is more ”consistent scorer” by Indirect method.
A 204 68 150 30 70 95 60 76 24 19
B 99 190 130 94 80 89 69 85 65 40
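Solution
\[ \bar{X}_A = 79.6, \quad S_A = 58.23, \quad \bar{X}_B = 94.1, \quad S_B = 41.12 \]
(standard deviations computed with divisor n − 1)
\[ CV(X_A) = \frac{58.23}{79.6} \times 100 = 73.15\%, \qquad CV(X_B) = \frac{41.12}{94.1} \times 100 = 43.70\% \]
The coefficient of variation of A is greater than the coefficient of variation of B, and hence we conclude that player
B is more consistent.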
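The comparison of the two cricketers can be sketched in Python as below. One assumption is made: the standard deviations quoted in the solution correspond to the divisor n − 1, so the sketch uses `statistics.stdev` (sample standard deviation) rather than the divisor-n formula given earlier. The function name `coefficient_of_variation` is illustrative.

```python
import statistics


def coefficient_of_variation(values):
    """CV in percent, using the sample standard deviation (divisor n - 1)."""
    return statistics.stdev(values) / statistics.mean(values) * 100


scores_a = [204, 68, 150, 30, 70, 95, 60, 76, 24, 19]
scores_b = [99, 190, 130, 94, 80, 89, 69, 85, 65, 40]

cv_a = coefficient_of_variation(scores_a)
cv_b = coefficient_of_variation(scores_b)
print(round(cv_a, 2), round(cv_b, 2))  # 73.15 43.7

# The player with the smaller CV is the more consistent scorer
more_consistent = "A" if cv_a < cv_b else "B"
print(more_consistent)  # B
```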
Lecture 3: Measures of Position and Variability
Measures of position – percentiles

Definitions
The ith percentile, Pi , is the value that has i% of the values in a data set less or equal to it (0 < i ≤ 100).
Examples
Median = me = 50th percentile = P50 .
First quartile = Q1 = 25th percentile = P25 .
Third quartile = Q3 = 75th percentile = P75 .
The 9 deciles D1 , D2 , . . . , D9 are the values that have 10%, 20%, . . . , 90% respectively of the values in
the data set less or equal to them.
D1 = P10 , D2 = P20 , . . . , D5 = P50 = me, . . . , D9 = P90 .
Calculation of quartiles and quartile deviation for raw data

For raw data the calculations of the first and third quartiles are based on the same principles as that of the
median.
Steps to be followed in calculating the first and third quartiles for raw data
1) Organize the values in the data set in ascending order in magnitude.
2) Find the median.
3) Divide the data set into 2 portions of equal numbers of values – set 1 consists of those values less
or equal to the median and set 2 consists of those values greater or equal to the median. When the
data set has an odd number of values, the median is excluded from the division of the data set into 2
portions.
4) The first quartile (Q1 ) is the median of set 1 and the third quartile (Q3 ) is the median of set 2.
Example
The distance from home to work (kilometers) of 11 employees at a certain company are shown below.
Calculate Q1 and Q3 .
6, 47, 49, 15, 42, 41, 7, 39, 43, 40, 36
1) Ordered data set: 6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49
2) Median = 40. After this step the median is deleted from the data set.
3) Set 1: the 5 values less than the median, i.e. 6, 7, 15, 36, 39.
4) Set 2: the 5 values greater than the median, i.e. 41, 42, 43, 47, 49.
5) Q1 = median of set 1 = 15, Q3 = median of set 2 = 43.
Example
Suppose the data set consists of the above values and 56 (12 values).
6, 47, 49, 15, 42, 41, 7, 39, 43, 40, 36, 56
1) Ordered Data Set: 6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49, 56
2) Median = \( \frac{40 + 41}{2} = 40.5 \). Unlike what was done in example 1, no values are deleted from the data set.
3) Set 1: the 6 values less than or equal to the median, i.e. 6, 7, 15, 36, 39, 40.
Set 2: the 6 values greater than or equal to the median, i.e. 41, 42, 43, 47, 49, 56.
4) Q1 = median of set 1 = \( \frac{15 + 36}{2} = 25.5 \), Q3 = median of set 2 = \( \frac{43 + 47}{2} = 45 \).
The quartile deviation \( Q = \frac{Q_3 - Q_1}{2} \) can also be used as a measure of variability.
For the data set in example 1, quartile deviation = \( Q = \frac{43 - 15}{2} = 14 \).
The quartile deviation value shows the extent to which the values in the data set deviate from the
median.
For a skew data set the quartile deviation is a more appropriate measure of variability than the standard
deviation.
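The steps for finding the quartiles of raw data can be sketched in Python as follows. The functions `median` and `quartiles` are written here for illustration and follow the rule stated above (the median is left out of both halves when the number of values is odd); statistical packages often use other quartile conventions and may give slightly different answers.

```python
def median(sorted_values):
    """Median of a list that is already sorted in ascending order."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(values):
    """Q1 and Q3 as the medians of the lower and upper halves of the data."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # set 1
    upper = ordered[half + n % 2:]    # set 2 (skips the median when n is odd)
    return median(lower), median(upper)


distances = [6, 47, 49, 15, 42, 41, 7, 39, 43, 40, 36]
q1, q3 = quartiles(distances)
print(q1, q3, (q3 - q1) / 2)      # 15 43 14.0

print(quartiles(distances + [56]))  # (25.5, 45.0)
```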
Calculation of median, quartiles and percentiles for grouped data

Percentile class – class that contains the percentile that is calculated.
A formula for calculating the ith percentile Pi for grouped data is shown below.

\[ P_i = L_i + \frac{c\left(\frac{n \times i}{100} - F_{less}\right)}{f_i}, \qquad i = 1, 2, \dots, 100. \]
where
Li = lower class boundary of percentile class.
fi = frequency of percentile class
n = sample size
Fless = Sum of frequencies of classes less than percentile class.
c = class width.
Example
For the frequency distribution of temperatures (example 2 of the frequency distributions – table given below),
the calculations of the median, first quartile, third quartile, 4th decile and 65th percentile are shown below.
class boundaries f cumulative frequency
37.5 – 41.5 4 4
41.5 – 45.5 10 14
45.5 – 49.5 8 22
49.5 – 53.5 15 37
53.5 – 57.5 9 46
57.5 – 61.5 3 49
61.5 – 65.5 1 50
Total 50
Median
The above formula with i = 50, n = 50 applies.
Step 1: Calculate position of median = \( \frac{i \times n}{100} = \frac{50 \times 50}{100} = 25 \).
Step 2: Median class (class that contains the 25th observation) is the class 49.5 – 53.5.
Step 3: L50 = 49.5, f50 = 15, Fless = 22, c = 4.
Step 4: Substitute into the above formula.
\[ \text{Median} = 49.5 + \frac{(25 - 22) \times 4}{15} = 50.3 \]
First quartile
Step 1: Calculate position of the first quartile = \( \frac{i \times n}{100} = \frac{25 \times 50}{100} = 12.5 \).
Step 2: First quartile class (class that contains the 12.5th observation) is the class 41.5 – 45.5.
Step 3: L25 = 41.5, f25 = 10, Fless = 4, c = 4.
\[ Q_1 = 41.5 + \frac{(12.5 - 4) \times 4}{10} = 44.9 \]
Third quartile
Step 1: Calculate position of the third quartile = \( \frac{i \times n}{100} = \frac{75 \times 50}{100} = 37.5 \).
Step 2: Third quartile class (class that contains the 37.5th observation) is the class 53.5 – 57.5.
Step 3: L75 = 53.5, f75 = 9, Fless = 37, c = 4.
\[ Q_3 = 53.5 + \frac{(37.5 - 37) \times 4}{9} = 53.72 \]
Fourth decile
Step 1: Calculate position of the 4th decile = \( \frac{i \times n}{100} = \frac{40 \times 50}{100} = 20 \).
Step 2: 4th decile class (class that contains the 20th observation) is the class 45.5 – 49.5.
Step 3: L40 = 45.5, f40 = 8, Fless = 14, c = 4.
\[ D_4 = 45.5 + \frac{(20 - 14) \times 4}{8} = 48.5 \]
65th Percentile
Step 1: Calculate position of the 65th percentile = \( \frac{i \times n}{100} = \frac{65 \times 50}{100} = 32.5 \).
Step 2: 65th percentile class (class that contains the 32.5th observation) is the class 49.5 – 53.5.
Step 3: L65 = 49.5, f65 = 15, Fless = 22, c = 4.
\[ P_{65} = 49.5 + \frac{(32.5 - 22) \times 4}{15} = 52.3 \]
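The four steps used for each percentile above follow one pattern, which can be sketched as a single Python function. The name `grouped_percentile` and the way the table is passed in (a list of class boundaries plus a list of frequencies) are choices made for this illustration, and i is assumed to lie between 0 (exclusive) and 100.

```python
def grouped_percentile(i, boundaries, freqs):
    """i-th percentile of grouped data.

    boundaries: list of (lower, upper) class boundaries in ascending order
    freqs:      list of class frequencies in the same order
    """
    n = sum(freqs)
    position = n * i / 100          # Step 1: position of the percentile
    cumulative = 0                  # F_less: frequencies of the earlier classes
    for (lower, upper), f in zip(boundaries, freqs):
        if cumulative + f >= position:              # Step 2: percentile class found
            width = upper - lower                   # c, the class width
            return lower + width * (position - cumulative) / f   # Step 4
        cumulative += f
    raise ValueError("i must lie between 0 and 100")


boundaries = [(37.5, 41.5), (41.5, 45.5), (45.5, 49.5), (49.5, 53.5),
              (53.5, 57.5), (57.5, 61.5), (61.5, 65.5)]
freqs = [4, 10, 8, 15, 9, 3, 1]

for label, i in [("Median", 50), ("Q1", 25), ("Q3", 75), ("D4", 40), ("P65", 65)]:
    print(label, round(grouped_percentile(i, boundaries, freqs), 2))
# Median 50.3, Q1 44.9, Q3 53.72, D4 48.5, P65 52.3
```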
Percentiles can also be read off from a “less than” ogive.
Example
The following cumulative frequency graph shows the distribution of marks scored by a class of 40 students
in a test.
Figure 11:
From the graph Q1 = 36, M e = 44, Q3 = 52.

Lecture 4: Measures of Dispersion, Skewness and Kurtosis
Objective: The objective of the present lesson is to impart the knowledge of measures of dispersion and
skewness and to enable the students to distinguish between average, dispersion, skewness, moments and
kurtosis.
4.1 Dispersion
Characteristics for an Ideal measure of Dispersion:
The characteristics of an ideal measure of dispersion are the same as those of an ideal measure of central
tendency, viz.,
(i) It should be rigidly defined.
(ii) It should be easy to calculate and easy to understand.
(iii) It should be based on all the observations.
(iv) It should be amenable to further mathematical treatment.
(v) It should be affected as little as possible by fluctuations of sampling.
Spread is the degree of scatter or variation of the variable about the central value. Measures of spread include:
– the range,
– Inter-Quartile range,
– Quartile Deviation also called semi Inter-Quartile range,
– Mean Absolute Deviation,
– Variance and
– standard deviation.
Inter-Quartile range and Semi Inter-Quartile Range

The inter-quartile range (IQR) is the difference between the upper and lower quartiles. Half of this difference is
called the quartile deviation or the semi inter-quartile range (SIQR), i.e.
\[ IQR = Q_3 - Q_1, \qquad SIQR = \tfrac{1}{2}(Q_3 - Q_1) \]
4.2 Measures of Skewness and Kurtosis

Measure of Skewness
Before discussing the concept of skewness, an understanding of the concept of symmetry is essential. A plot
of frequency against class mark joined with a smooth curve can help us to visually assess the symmetry of a
distribution. Usually symmetry is about the central value. Symmetry is said to exist in a distribution if the
smoothed frequency polygon of the distribution can be divided into two identical halves wherein each half
is a mirror image of the other. Skewness, on the other hand, means lack of symmetry, and it can be positive or
negative.
Basically, if the distribution has a tail on the right (see figure below), then the distribution is positively
skewed, e.g. most students having very low marks in an examination. However, if the distribution has a tail
on the left (see figure below), then the distribution is negatively skewed, e.g. most students having very
high marks in an examination.
Figure 12: Sketches showing general position of the Mean, Median and Mode of the population
Generally, for a set of values x1, x2, . . . , xn the moment coefficient of skewness α3 is given by
\[ \alpha_3 = \frac{\sum f (x - \bar{x})^3}{n S^3} \]
where x̄ and S are the arithmetic mean and standard deviation of X respectively. It is worth noting that if α3 < 0 the
distribution is negatively skewed, if α3 > 0 the distribution is positively skewed and if α3 = 0 the distribution is
symmetric (as is the case, for example, for the normal distribution).
Example
Calculate the moment coefficient of skewness α3 for the data: 5, 6, 7, 6, 9, 4, 5
Solution
\[ \bar{x} = \frac{1}{n}\sum x = \frac{42}{7} = 6 \]
Standard deviation
\[ s = \sqrt{\frac{1}{n}\sum (x - \bar{x})^2} = \sqrt{\frac{16}{7}} = \frac{4}{\sqrt{7}} \]
x            5    6    7    6    9    4    5    Total
(x − x̄)²    1    0    1    0    9    4    1    16
(x − x̄)³    -1   0    1    0    27   -8   -1   18
(x − x̄)⁴    1    0    1    0    81   16   1    100
\[ \alpha_3 = \frac{\sum f (x - \bar{x})^3}{n S^3} = \frac{18}{7} \times \left(\frac{\sqrt{7}}{4}\right)^3 = 0.744118 \]
Karl Pearson’s coefficient of skewness is based upon the divergence of the mean from the mode in a skewed
distribution. Recall the empirical relation between mean, median and mode which states that, for a moderately
asymmetrical distribution, we have
Mean − Mode = 3(Mean − Median)
Hence Karl Pearson’s coefficient of skewness is defined by
\[ SK_p = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}} = \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}} \]
Bowley’s coefficient of skewness is based on quartiles. For a symmetrical distribution, Q1 and Q3 are
equidistant from the median.
\[ SK_B = \frac{Q_3 - 2Q_2 + Q_1}{Q_3 - Q_1} \quad \text{where } Q_k \text{ is the } k\text{th quartile.} \]
Kelly’s coefficient of skewness is based on P90 and P10, so that only 10% of the observations on each
extreme are ignored. This is an improvement over Bowley’s coefficient, which ignores 25% of the observations
on each extreme of the distribution.
\[ SK_k = \frac{P_{90} - 2P_{50} + P_{10}}{P_{90} - P_{10}} \quad \text{where } P_k \text{ is the } k\text{th percentile.} \]
Example
Consider the following frequency distribution.
Capital            1-5   6-10  11-15  16-20  21-25  26-30  31-35
No. of companies   20    27    29     38     48     53     70
Compute Bowley’s coefficient of skewness and interpret the result.
Solution
Capital            0.5-5.5  5.5-10.5  10.5-15.5  15.5-20.5  20.5-25.5  25.5-30.5  30.5-35.5
No. of companies   20       27        29         38         48         53         70
CF                 20       47        76         114        162        215        285
\[ Q_1 = \tfrac{1}{4}(286)\text{th value} = 71.5\text{th value} = 10.5 + \frac{71.5 - 47}{29} \times 5 = 14.7241 \]
\[ Q_2 = \tfrac{1}{2}(286)\text{th value} = 143\text{rd value} = 20.5 + \frac{143 - 114}{48} \times 5 = 23.5208 \]
\[ Q_3 = \tfrac{3}{4}(286)\text{th value} = 214.5\text{th value} = 25.5 + \frac{214.5 - 162}{53} \times 5 = 30.4528 \]
\[ SK_B = \frac{Q_3 - 2Q_2 + Q_1}{Q_3 - Q_1} = \frac{30.4528 - 2 \times 23.5208 + 14.7241}{30.4528 - 14.7241} = -0.11855. \]
Since the value of skewness lies between −0.5 and 0.5, the distribution is approximately symmetric, with a slight negative skew.
Measure of Kurtosis
It measures the peakedness of a distribution. If the values of x are very close to the mean, the peak is very
high and the distribution is said to be leptokurtic. On the other hand, if the values of x are very far away
from the mean, the peak is very low and the distribution is said to be platykurtic. Finally, if the x values are at
a moderate distance from the mean, then the peak is moderate and the distribution is said to be mesokurtic.
Figure 13:
Generally, for a set of values x1, x2, . . . , xn the moment coefficient of kurtosis α4 is given by
\[ \alpha_4 = \frac{\sum f (x - \bar{x})^4}{n S^4} \]
where x̄ and S are the arithmetic mean and standard deviation of X respectively.
Example
Calculate the coefficient of kurtosis α4 for the data: 5, 6, 7, 6, 9, 4, 5
Solution
\[ \bar{x} = \frac{1}{n}\sum x = \frac{42}{7} = 6 \]
Standard deviation
\[ s = \sqrt{\frac{1}{n}\sum (x - \bar{x})^2} = \sqrt{\frac{16}{7}} = \frac{4}{\sqrt{7}} \]
x            5    6    7    6    9    4    5    Total
(x − x̄)²    1    0    1    0    9    4    1    16
(x − x̄)³    -1   0    1    0    27   -8   -1   18
(x − x̄)⁴    1    0    1    0    81   16   1    100
\[ \alpha_4 = \frac{\sum f (x - \bar{x})^4}{n S^4} = \frac{100}{7} \times \left(\frac{\sqrt{7}}{4}\right)^4 = 2.73438 \]
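Both moment coefficients for the data 5, 6, 7, 6, 9, 4, 5 can be checked with the sketch below. It uses the divisor-n standard deviation, as in these notes; the function name `moment_coefficients` is illustrative.

```python
def moment_coefficients(values):
    """Return (alpha3, alpha4): moment coefficients of skewness and kurtosis."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # variance (divisor n)
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n   # fourth central moment
    s = m2 ** 0.5                                   # standard deviation
    return m3 / s ** 3, m4 / s ** 4


data = [5, 6, 7, 6, 9, 4, 5]
alpha3, alpha4 = moment_coefficients(data)
print(round(alpha3, 6), round(alpha4, 6))  # 0.744118 2.734375
```

A positive α3 indicates a tail on the right, in line with the single large value 9 in the data.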
Lecture 4: Bivariate data
4.1 Introduction
So far we have confined our discussion to the distributions involving only one variable. Sometimes, in
practical applications, we might come across certain set of data, where each item of the set may comprise
of the values of two or more variables. Bivariate data is a set of paired measurements which are of the
form
(x1 , y1 ), (x2 , y2 ), . . . , (xn , yn )
Examples
i) Marks obtained in two subjects by 60 students in a class.
ii) The series of sales revenue and advertising expenditure of the various branches of a company in a
particular year.
iii) The series of ages of husbands and wives in a sample of selected married couples.
In a bivariate data, each pair represents the values of the two variables. Our interest is to find a relationship
(if it exists) between the two variables under study.
4.2 Scatter Diagrams and Correlation

A scatter diagram is a tool for analyzing relationships between two variables. One variable is plotted on
the horizontal axis and the other is plotted on the vertical axis. The pattern of the plotted points
can graphically show relationship patterns. Most often a scatter diagram is used to explore possible
cause-and-effect relationships. While the diagram shows relationships, it does not by itself prove that one variable
causes the other. In brief, the easiest way to visualize bivariate data is through a scatter plot. “Two
variables are said to be correlated if the change in one of the variables results in a change in the other
variable”.
4.2.1: Positive and Negative Correlation

If the values of the two variables deviate in the same direction i.e. if an increase (or decrease) in the values
of one variable results, on an average, in a corresponding increase (or decrease) in the values of the other
variable the correlation is said to be positive.
Some examples of series of positive correlation are:
i) Heights and weights;

ii) Household income and expenditure;
iii) Price and supply of commodities;
iv) Amount of rainfall and yield of crops.
Correlation between two variables is said to be negative or inverse if the variables deviate in opposite
directions, that is, if an increase (or decrease) in the values of one variable results, on average, in a corresponding
decrease (or increase) in the values of the other variable, e.g. price and demand of goods.
Figure 14:
4.2.2 Interpreting a Scatter Plot

Scatter diagrams will generally show one of six possible correlations between the variables:
i) Strong Positive Correlation The value of Y clearly increases as the value of X increases.
ii) Strong Negative Correlation The value of Y clearly decreases as the value of X increases.
iii) Weak Positive Correlation The value of Y increases slightly as the value of X increases.
iv) Weak Negative Correlation The value of Y decreases slightly as the value of X increases.
v) Complex Correlation The value of Y seems to be related to the value of X, but the relationship is not
easily determined.
vi) No Correlation There is no demonstrated connection between the two variables
4.3 Correlation Coefficient

The correlation coefficient measures the degree of linear association between two paired variables. It takes values
from −1 to +1.
i) If r = +1, we have a perfect positive linear relationship.
ii) If r = −1, we have a perfect negative linear relationship.
iii) If r = 0, there is no linear relationship, i.e. the variables are uncorrelated.
4.3.1 Pearson’s Product Moment Correlation Coefficient

Pearson’s product moment correlation coefficient, usually denoted by r, is one example of a correlation
coefficient. It is a measure of the linear association between two variables that have been measured on
interval or ratio scales, such as the relationship between height in inches and weight in pounds. However, it
can be misleadingly small when there is a relationship between the variables but it is a non-linear one.
The correlation coefficient r is given by
\[ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} \]
Example:
A study was conducted to find whether there is any relationship between the weight and blood pressure of
an individual. The following set of data was arrived at from a clinical study. Let us determine the coefficient
of correlation for this set of data. The first row gives the weight and the second row the blood pressure
of each of the 10 patients.
Weight          78   86   72   82   80   86   84   89   68   71
Blood Pressure  140  160  134  144  180  176  174  178  128  132
Here n = 10, Σx = 796, Σy = 1546, Σxy = 124206, Σx² = 63826 and Σy² = 243036. Thus
\[ r = \frac{10(124206) - (796)(1546)}{\sqrt{(10)(63826) - (796)^2}\,\sqrt{(10)(243036) - (1546)^2}} = \frac{11444}{\sqrt{(4644)(40244)}} = 0.8371 \]
Example
Calculate the correlation coefficient for the following heights (in inches) of fathers (X) and their sons (Y):
X 65 66 67 67 68 69 70 72
Y 67 68 65 68 72 72 69 71
Solution
Figure 15:
X      Y      X²      Y²      XY
65     67     4225    4489    4355
66     68     4356    4624    4488
67     65     4489    4225    4355
67     68     4489    4624    4556
68     72     4624    5184    4896
69     72     4761    5184    4968
70     69     4900    4761    4830
72     71     5184    5041    5112
Total  544    552     37028   38132   37560
\[ r = \frac{8(37560) - (544)(552)}{\sqrt{\left[8(37028) - 544^2\right]\left[8(38132) - 552^2\right]}} = \frac{192}{\sqrt{(288)(352)}} = 0.603 \]
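The totals row of the table is all that the product moment formula needs. The sketch below plugs those totals into the formula; the function name `pearson_r_from_sums` is illustrative and not part of any library.

```python
import math


def pearson_r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson's r computed from the column totals of a correlation table."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Totals for the fathers-and-sons table (n = 8 pairs)
r = pearson_r_from_sums(8, 544, 552, 37028, 38132, 37560)
print(round(r, 4))  # 0.603
```

The value suggests a moderate positive linear association between the heights of fathers and sons.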
4.3.2 Spearman rank correlation coefficient

Data which have been arranged in order of magnitude and replaced by their positions are said to be ranked data. The coefficient of
correlation for such data is given by the Spearman rank difference correlation coefficient and is denoted
by R.
\[ R = 1 - \frac{6\sum d^2}{n(n^2 - 1)} \]
where d is the difference between the two ranks of each pair and n is the number of pairs.
Example
The data given below are obtained from student records.( Grade Point Average (x) and Graduate Record
exam score (y)) Calculate the rank correlation coefficient ‘R’ for the data
Subject 1 2 3 4 5 6 7 8 9 10
X 8.3 8.6 9.2 9.8 8.0 7.8 9.4 9.0 7.2 8.6
y 2300 2250 2380 2400 2000 2100 2360 2350 2000 2260
Solution
Note that in the x row we have two students with a grade point average of 8.6, and in the y row there is
a tie at 2000. We arrange the data in descending order and then rank them 1, 2, 3, . . . , 10 accordingly. In case
of a tie, the rank of each tied value is the mean of all positions they occupy. In x, for instance, the two values 8.6 occupy
ranks 5 and 6, so each has rank \( \frac{5 + 6}{2} = 5.5 \).
Similarly in y, 2000 occupies ranks 9 and 10, so each has rank \( \frac{9 + 10}{2} = 9.5 \). Now we come back to our
formula
\[ R = 1 - \frac{6\sum d^2}{n(n^2 - 1)} \]
We compute d, square it and substitute its value in the formula.
X        8.3   8.6   9.2   9.8   8.0   7.8   9.4   9.0   7.2   8.6
y        2300  2250  2380  2400  2000  2100  2360  2350  2000  2260
Rank(X)  7     5.5   3     1     8     9     2     4     10    5.5
Rank(Y)  5     7     2     1     9.5   8     3     4     9.5   6
d        2     -1.5  1     0     -1.5  1     -1    0     0.5   -0.5
d²       4     2.25  1     0     2.25  1     1     0     0.25  0.25
Here n = 10 and Σd² = 12, so
\[ R = 1 - \frac{6(12)}{10(100 - 1)} = 1 - 0.0727 = 0.9273 \]
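The ranking rule for ties and the rank correlation formula can be sketched in Python as below. The helpers `average_ranks` and `spearman_r` are written for this illustration; they rank in descending order and give tied values the mean of the positions they occupy, as described above, and they apply the basic formula without any tie correction.

```python
def average_ranks(values):
    """Rank in descending order; tied values share the mean of their positions."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1     # first position occupied by v
        count = ordered.count(v)         # number of values tied with v
        ranks.append(first + (count - 1) / 2)
    return ranks


def spearman_r(x, y):
    """Spearman's rank correlation R = 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


gpa = [8.3, 8.6, 9.2, 9.8, 8.0, 7.8, 9.4, 9.0, 7.2, 8.6]
gre = [2300, 2250, 2380, 2400, 2000, 2100, 2360, 2350, 2000, 2260]

print(average_ranks(gpa))  # [7.0, 5.5, 3.0, 1.0, 8.0, 9.0, 2.0, 4.0, 10.0, 5.5]
print(round(spearman_r(gpa, gre), 4))  # 0.9273
```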
Example
For a certain joint stock company, the prices of preference shares (X) and debentures (Y) are given below:
X 73.2 85.8 78.9 75.8 77.2 81.2 83.8
Y 97.8 99.2 98.8 98.3 98.3 96.7 97.1
Use the method of rank correlation to determine the relationship between preference share prices and debenture
prices.
Solution
Calculations for Coefficient of Rank Correlation
X      Y      RX   RY    d = RX − RY   d²
73.2   97.8   7    5     2             4
85.8   99.2   1    1     0             0
78.9   98.8   4    2     2             4
75.8   98.3   6    3.5   2.5           6.25
77.2   98.3   5    3.5   1.5           2.25
81.2   96.7   3    7     -4            16
83.8   97.1   2    6     -4            16
                         Σd = 0        Σd² = 48.5
Since the value 98.3 occurs m = 2 times in Y, the correction term m(m² − 1)/12 is added to Σd²:
\[ \rho = 1 - \frac{6\left[\sum d^2 + \frac{m(m^2 - 1)}{12}\right]}{N(N^2 - 1)} = 1 - \frac{6\left[48.5 + \frac{2(4 - 1)}{12}\right]}{7(7^2 - 1)} = 1 - \frac{6 \times 49}{7 \times 48} = 0.125 \]
Hence, there is a very low degree of positive correlation, probably no correlation, between preference share
prices and debenture prices.
4.4 Regression analysis

Regression analysis is a general approach for obtaining a prediction function from sample data. We work
with a dependent variable (Y, the response or endogenous variable), independent variables (X’s, the predictor or
exogenous variables) and the predicted value for a given level of X, Ŷ. The method of least squares finds
the particular line for which the sum of the squared vertical deviations of the data points from the line is minimized.
If two variables are significantly correlated, and if there is some theoretical basis for doing so, it is possible
to predict values of one variable from the other. Regression analysis, in general sense, means the estimation
or prediction of the unknown value of one variable from the known value of the other variable. It is one of
the most important statistical tools which is extensively used in almost all sciences – Natural, Social and
Physical.
Regression analysis is a mathematical measure of the average relationship between two or more variables
in terms of the original units of the data.
Regression analysis can be thought of as the flip side of correlation. It has to do with
finding the equation of the straight line that best fits the points of a scatter diagram. Suppose we have a sample of size
n with two sets of measures, denoted by x and y. We can predict the values of y given the values of x
by using the equation
y = a + bx
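where the coefficients a and b are real numbers obtained from the normal equations
\[ \sum y = na + b\sum x \]
\[ \sum xy = a\sum x + b\sum x^2 \]
From the first equation,
\[ a = \frac{\sum y}{n} - b\,\frac{\sum x}{n} \]
Substituting a into the second equation we have
\[ \sum xy = \left(\frac{\sum y}{n} - b\,\frac{\sum x}{n}\right)\sum x + b\sum x^2 \]
Multiplying through by n,
\[ n\sum xy = \sum x \sum y - b\left(\sum x\right)^2 + nb\sum x^2 \]
\[ n\sum xy - \sum x \sum y = b\left[n\sum x^2 - \left(\sum x\right)^2\right] \]
\[ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} \]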
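The formulas for a and b translate directly into code. The sketch below is a minimal illustration; the function name `least_squares_line` is chosen here, and the five (x, y) pairs are made up purely for illustration and are not data from these notes.

```python
def least_squares_line(x, y):
    """Return (a, b) of the least squares line y = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares_line(x, y)
print(round(a, 4), round(b, 4))  # 2.2 0.6

# Predicted value at x = 6 from the fitted line
y_hat = a + b * 6
print(round(y_hat, 4))  # 5.8
```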
Example:
Scores made by students in a statistics class in the mid-term and final examination are given here. Develop
a regression equation which may be used to predict final examination scores from the mid-term score.
Student 1 2 3 4 5 6 7 8 9 10
Mid term 98 66 100 96 88 45 76 60 74 82
Final 90 74 98 88 80 62 78 74 86 80
Solution
We want to predict the final exam scores from the mid term scores. So let us designate ’y’ for the final exam
scores and ’x’ for the mid term exam scores. We open the following table for the calculations.
Student X Y X2 XY
1 98 90 9604 8820
2 66 74 4356 4884
3 100 98 10000 9800
4 96 88 9216 8448
5 88 80 7744 7040
6 45 62 2025 2790
7 76 78 5776 5928
8 60 74 3600 4440
9 74 86 5476 6364
10 82 80 6724 6560
Totals 785 810 64521 65074
\[ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{10(65074) - (785)(810)}{10(64521) - (785)^2} = \frac{14890}{28985} = 0.5137 \]
\[ a = \frac{\sum y}{n} - b\,\frac{\sum x}{n} = \frac{810 - 785(0.5137)}{10} = 40.6746 \]
Thus, the regression equation is given by ŷ = 40.6746 + 0.5137x.
We can use this to find the projected or estimated final scores of the students, e.g. for a mid-term score
of 50 the projected final score is
ŷ = 40.6746 + 0.5137(50) = 66.3596
which is in line with the observed data (the student who scored 45 in the mid-term scored 62 in the final).
Exercise
1) The following are marks obtained by a student in Kenyatta University for 11 subjects within two
academic years:
Subject 1 2 3 4 5 6 7 8 9 10 11
Year1 25 10 15 35 45 15 20 30 40 45 55
Year2 12 8 10 11 13 8 10 11 13 14 10
Required
a) Draw a scatter diagram of year 1 against year 2 marks and comment on the pattern [3 marks]
b) Calculate the product moment correlation of year 1 and year 2 marks of the students. [4 marks]
c) Calculate the Spearman’s Rank correlation of students marks. [5 marks]
2) Two series X and Y are presented below:
Series X 62 72 78 58 65 70 66 63 60 72
Series Y 50 65 63 50 54 60 61 55 54 65
a) Draw a scatter diagram of Series X against Series Y data and comment on the pattern [3 marks]
b) Calculate the product moment correlation of Series X against Series Y data. [4 marks]
c) Calculate the Spearman’s Rank correlation of series data. [5 marks]
3) An agriculturalist assumes that there is a linear relationship between the amount of fertilizer supplied
to potato plants and the subsequent yield of potatoes obtained. Eight potato plants, of the same
variety, were selected at random and treated weekly with a solution in which x grams of fertilizer was
dissolved in a fixed quantity of water. The yield y kgs of potatoes was recorded as:
CROP 1 2 3 4 5 6 7 8
X 1 1.5 2 2.5 3 3.5 4 4.5
y 3.9 4.4 5.8 6.6 7 7.1 7.3 7.7
Required
(i) Plot a scatter diagram of yield y against amount of fertilizer x [3 mks]

(ii) Calculate the equation of the least squares regression line of y on x. [6 mks]
(iii) Estimate the yield of a plant treated weekly with 3.2grams of fertilizer. [3 mks]
Lecture 5: Introduction to Probability theory
5.1 Probability (Chance)

A probability is the chance that something of interest will happen.
It is a measure of how likely a phenomenon is to occur.
A probability is expressed as a proportion i.e. it ranges from 0 to 1.
Chance can be expressed as a percentage i.e. it ranges from 0 to 100.
Examples
1) The probability of rain tomorrow is 0.40

There is a 40% chance of rain tomorrow.
2) The probability of winning the Lotto is \( \frac{1}{1398316} \).
3) The probability of a certain new product being successful is 0.75.
Random experiment
- A random experiment is a statistical experiment in which:
1. All possible outcomes of the experiment are known in advance.
2. Any performance of the experiment results in an outcome that is not known in advance.
3. The experiment can be repeated under identical or similar conditions.
Examples
1) Tossing a coin (possible outcomes: heads, tails).
2) Rolling a die (possible outcomes: 1, 2, 3, 4, 5, 6).
3) Asking a person to assign a rating to a product (possible outcomes: A, B, C, D, E).
4) Drawing a card from a deck of cards (possible outcomes: 13 hearts, 13 clubs, 13 spades, 13 diamonds).
5.2 Approaches to Probability

There are three ways to define probability, namely classical, empirical and subjective probability.
5.2.1 Classical probability

Classical or theoretical probability is used when each outcome in a sample space is equally likely to occur.
The underlying idea behind this view of probability is symmetry, i.e. if the sample space contains n outcomes
that are equally likely then P(one outcome) = \( \frac{1}{n} \). The classical probability of an event E is given by
\[ P(E) = \frac{\text{number of favourable outcomes}}{\text{total number of possible outcomes}} = \frac{n(E)}{n(S)} \]
Example
A fair die, with faces numbered 1 to 6, is rolled once. Write down the sample space S and hence find the probability
that the score showing up is: a) a multiple of 3 b) a prime number.
Solution
S = {1, 2, 3, 4, 5, 6}. Multiples of 3 are 3 and 6, while prime numbers are 2, 3 and 5.
\[ P(\text{Multiple of 3}) = \frac{2}{6} = \frac{1}{3} \]
\[ P(\text{Prime number}) = \frac{3}{6} = \frac{1}{2}. \]
Example 2
Two coins are tossed. Find the probability of getting
(i) exactly two heads.
(ii) at least one head
Solutions
Here S = {hh, ht, th, tt}.
(i) Let A = getting exactly two heads = {hh}
\[ P(A) = \frac{1}{4} \]
(ii) Let B = getting at least one head = {hh, ht, th}
\[ P(B) = \frac{3}{4} \]
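The classical definition amounts to counting outcomes, which the sketch below does for the die and the two-coin examples. The function `classical_probability` is written for this illustration and assumes that all outcomes in the sample space are equally likely.

```python
from fractions import Fraction
from itertools import product


def classical_probability(event, sample_space):
    """n(E) / n(S) for a sample space of equally likely outcomes."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))


die = [1, 2, 3, 4, 5, 6]
p_multiple_of_3 = classical_probability(lambda s: s % 3 == 0, die)
p_prime = classical_probability(lambda s: s in (2, 3, 5), die)
print(p_multiple_of_3, p_prime)  # 1/3 1/2

coins = ["".join(t) for t in product("ht", repeat=2)]   # hh, ht, th, tt
p_two_heads = classical_probability(lambda o: o == "hh", coins)
p_at_least_one_head = classical_probability(lambda o: "h" in o, coins)
print(p_two_heads, p_at_least_one_head)  # 1/4 3/4
```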
5.2.2 Frequentist or Empirical probability

When the outcomes of an experiment are not equally likely, we can conduct experiments to give us some idea
of how likely the different outcomes are. For example, suppose we are interested in measuring the probability
of producing a defective item in a manufacturing process. The probability could be measured by monitoring
the process over a reasonably long period of time and calculating the proportion of defective items.
In a nutshell, empirical (or frequentist or statistical) probability is based on observed data. The empirical
probability of an event A is the relative frequency of event A.
Example 1
The following are the counts of fish of each type that you have caught before.
Fish type            Blue gill  Red gill  Crappie  Total
No. of times caught  13         17        10       40
Estimate the probability that the next fish you catch will be a Blue gill.
\[ P(\text{Blue gill}) = \frac{13}{40} = 0.325 \]
Example
A summary of the final marks in a certain statistics course is shown below.
Mark f
less than 30 6
30 - 39 26
40 - 49 45
50 - 59 64
60 - 69 82
70 - 79 37
80 - 89 22
90 - 99 8
Total 290
Find the following empirical probabilities from the frequency distribution table
(i) P (mark less than 40).
(ii) P (pass)
(iii) P (Above 80)
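Solution
i) P(mark less than 40) = \( \frac{6 + 26}{290} = \frac{32}{290} = 0.11 \)
ii) P(pass) = \( \frac{64 + 82 + 37 + 22 + 8}{290} = \frac{213}{290} = 0.73 \), taking a pass to be a mark of 50 or more
iii) P(80 or above) = \( \frac{22 + 8}{290} = \frac{30}{290} = 0.103 \).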
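The three empirical probabilities are relative frequencies read off the table, as the sketch below shows. The dictionary keys and the helper `relative_frequency` are illustrative names, and a pass is assumed to mean a mark of 50 or more, which is what the classes summed in the solution imply.

```python
# Frequency distribution of the final marks
marks = {
    "less than 30": 6, "30-39": 26, "40-49": 45, "50-59": 64,
    "60-69": 82, "70-79": 37, "80-89": 22, "90-99": 8,
}
total = sum(marks.values())  # 290


def relative_frequency(classes):
    """Empirical probability of the event made up of the given classes."""
    return sum(marks[c] for c in classes) / total


p_below_40 = relative_frequency(["less than 30", "30-39"])
p_pass = relative_frequency(["50-59", "60-69", "70-79", "80-89", "90-99"])
p_80_or_above = relative_frequency(["80-89", "90-99"])
print(round(p_below_40, 3), round(p_pass, 3), round(p_80_or_above, 3))
# 0.11 0.734 0.103
```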
5.2.3 Subjective Probability

Subjective probabilities result from intuition, educated guesses, and estimates. For example, given a patient’s
health and extent of injuries, a doctor may feel that the patient has a 90% chance of a full recovery.
Subjectivity means two people can assign different probabilities to the same event.
Regardless of the way probabilities are defined, they always follow the same laws, which we will explore
in the following Section.
5.3 Set Theory

Set
A set is a collection of outcomes.
Sample space
The sample space is the set of all possible outcomes of a random experiment. A sample space is usually
denoted by the symbol S and the collection of elements contained in S enclosed in curly brackets {}.
Sample point
A sample point is an individual outcome (element) in a sample space.
Examples
1) Tossing a single coin. S = {h, t}.

2) Tossing a die. S = {1, 2, 3, 4, 5, 6}.
3) Tossing a pair of dice. Each outcome is a pair (first die, second die).
S = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
     (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
     (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6),
     (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6),
     (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6),
     (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)}
4) Tossing two coins. S = {hh, ht, th, tt}.

5) Drawing a card from a deck of cards. The elements in the sample space are listed below.
Figure 16:
Each outcome listed in the above examples is a sample point.

Event
An event is a subset of a sample space i.e. a collection of sample points taken from a sample space.
Impossible event
An impossible event is an event that cannot happen (has probability zero).
Certain event
A certain event is an event that is sure to happen (has probability 1).

A simple event is a single event considered on its own, i.e. one that is not formed by combining other events.
Examples
1) Let E denote the event “an odd number is obtained when tossing a single die”. Then E = {1, 3, 5}.
2) Let H denote the event “at least one head appears when tossing two coins”.
H = {hh, ht, th}.
3) Let B denote the event “obtaining a club and a heart in a single draw from a deck of cards”. The event
B is impossible. The set of outcomes of B is the empty set, denoted by
B = { } = ∅
4) Let A denote the event “obtaining a 1, 2, 3, 4, 5 or 6 when tossing a single die”. The event A is a
certain event, i.e. one of the outcomes belonging to the set describing the event must happen. This is
denoted by A = S, where S is the sample space.
5) The events E, H, B and A above are all examples of simple events.
5.3.1 Venn diagrams
A Venn diagram is a drawing, in which circular areas represent groups of items usually sharing common
properties.
The drawing consists of two or more circles, each representing a specific group or set, contained within
a square that represents the sample space. Venn diagrams are often used as a visual display when
referring to sample spaces, events and operations involving events.
5.3.2 Complements, Unions and Intersections of events

Compound events
These are events that involve more than one event. Such events can be obtained by performing various
operations involving two or more events.
Some of the operations that can be performed are described in the sections that follow.
Complementary events
The complementary event Ā (sometimes written A′ ) of an event A is all the outcomes in S that are
not in A.
Examples
1) Consider the experiment of tossing a single die. S = {1, 2, 3, 4, 5, 6}. The complement of the event A =
“obtaining a 3 or less” = {1, 2, 3} is Ā = “obtaining a 4 or more” = {4, 5, 6}.
2) Consider the experiment of tossing two coins. S = {hh, ht, th, tt}. The complement of the event H =
“at least one head” = {hh, ht, th} is “no heads” = {tt}.
Union and intersection of events

Figure 17:
The union of two events A and B, denoted by A ∪ B , is the set of outcomes that are in A or in B or
in both A and B i.e. the event that “either A or B or both A and B occur” or “at least one of A or B
occurs”.
The intersection of two events A and B, denoted by A ∩ B , is the set of outcomes that are in both A
and B i.e. the event that “both A and B occur”.
– Ā ∩ B is the event “a sample point is in B but not in A”.
– A ∩ B̄ is the event “a sample point is in A but not in B”.
The Venn diagrams below show the sets A ∪ B and A ∩ B.
Figure 18:
These definitions involving two events can be extended to ones involving 3 or more events, e.g. for the 3 events A1, A2 and A3 the event A1 ∪ A2 ∪ A3 is
the event “at least one of A1, A2 or A3 occurs” and A1 ∩ A2 ∩ A3 is the event “A1 and A2 and A3 occur”.
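The operations above map directly onto Python's built-in `set` type. The sketch below reuses the single-die sample space and the events A = “3 or less” and E = “odd number” from the earlier examples; the variable names are chosen only for this illustration.

```python
# Sample space for tossing a single die, and two events defined on it.
S = {1, 2, 3, 4, 5, 6}
A = {1, 2, 3}          # "obtaining a 3 or less"
E = {1, 3, 5}          # "an odd number is obtained"

complement_A = S - A           # outcomes in S that are not in A
union = A | E                  # outcomes in A or in E or in both
intersection = A & E           # outcomes in both A and E
only_E = (S - A) & E           # in E but not in A
only_A = A & (S - E)           # in A but not in E

print(complement_A, union, intersection, only_E, only_A)
```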
Lecture 6: Counting formulas and conditional probabilities
6.0 Learning objectives

At the end of the unit, you should be able to:
(i) evaluate combinatorial expressions.
(ii) recognize a problem of selection or combination.
6.1 Counting formulas

6.1.1 Factorial notation
In how many ways can n objects (n a positive integer) be arranged in a row?
Let n = 2: 1st object – 2 choices.
2nd object – 1 choice.
Number of ways = 2 × 1 = 2.
Let n = 3: 1st object – 3 choices.
2nd object – 2 choices.
3rd object – 1 choice.
Number of ways = 3 × 2 × 1 = 6.
In general: the number of ways is n × (n − 1) × (n − 2) × . . . × 2 × 1 = n! (n factorial).
Using this notation
2 × 1 = 2! = 2
3 × 2 × 1 = 3! = 6
4 × 3 × 2 × 1 = 4! = 24, etc.
Note: 1! = 1 and 0! = 1.
The factorial notation is used in counting formulae.
Examples
1) In how many ways can 7 people be placed in a queue at a bus stop?

The 7 people have to be placed in the 7 positions from 1st to 7th.
No. of ways = 7 × 6 × 5 × . . . × 2 × 1 = 7! = 5040.
2) In how many ways can 5 books be arranged in a row?

No. of ways = 5 × 4 × 3 × 2 × 1 = 5! = 120.
6.1.2 Permutations and combinations

Permutation
A permutation is an arrangement of a group of items where order matters.
The number of permutations of n objects taken r at a time is calculated from
nPr = P(n, r) = n!/(n − r)!
Combination
A combination is a selection of a group of items where order does not matter.
The number of combinations of a group of n objects taken r at a time is calculated from
nCr = C(n, r) = n!/((n − r)! r!)
Examples:
1) Four people (A, B, C, D) serve on a board of directors. A chairman and vice-chairman are to be chosen
from these 4 people. In how many ways can this be done?
Chairman Vice-chairman
A B
B A
A C
C A
A D
D A
B C
C B
B D
D B
C D
D C
Number of ways = 12.

2) Four people (A, B, C, D) serve on a board of directors. Two people are to be chosen from them as
members of a committee that will investigate fraud allegations. In how many ways can this be done?
People chosen
A and B
A and C
A and D
B and C
B and D
C and D
Number of ways = 6.
In both these examples a choice of 2 people from 4 people is made. However, in example 1 the
order of choice of the 2 people matters (since the one person chosen is chairman and the other one
vice-chairman). In example 2 the order does not matter. The only interest is in who serves on the
committee.
6.1.3 Application of formulae

In question 1 the permutations formula applies with n = 4, r = 2.
P(4, 2) = 4!/(4 − 2)! = 24/2 = 12.
In question 2 the combinations formula applies with n = 4, r = 2.
C(4, 2) = 4!/(2!(4 − 2)!) = 24/4 = 6.
3) Find the number of ways to take 4 people and place them in groups of 3 at a time where order does
not matter.
Solution:
Since order does not matter, use the combination formula.
C(4, 3) = 4!/(3!(4 − 3)!) = 24/6 = 4.
4) Find the number of ways to arrange 6 items in groups of 4 at a time where order matters.
Solution:
P(6, 4) = 6!/(6 − 4)! = 720/2! = 360
There are 360 ways to arrange 6 items taken 4 at a time when order matters.
5) Find the number of ways to take 20 objects and arrange them in groups of 5 at a time where order
does not matter.
Solution:
C(20, 5) = 20!/(5!(20 − 5)!) = (20 × 19 × 18 × 17 × 16)/(1 × 2 × 3 × 4 × 5) = 15504
6) Determine the total number of five-card hands that can be drawn from a deck of 52 cards.
Solution:
When a hand of cards is dealt, the order of the cards does not matter. Thus the combinations formula
is used.
There are 52 cards in a deck and we want to know in how many different ways we can draw them in
groups of five at a time when order does not matter. Using the combination formula gives
C(52, 5) = 2598960
7) There are five women and six men in a group. From this group a committee of 4 is to be chosen. In
how many ways can the committee be formed if the committee is to have at least 3 women in it?
Solution
Situation 1: 3 women and 1 man. Number of ways = C(5, 3) × C(6, 1) = 10 × 6 = 60
Situation 2: 4 women and no men. Number of ways = C(5, 4) × C(6, 0) = 5 × 1 = 5
Total number of ways = 60 + 5 = 65.
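The counts in these examples can be reproduced with Python's standard `math` module, which provides `factorial`, `perm` and `comb` (Python 3.8 or later). The variable names below are chosen only for this sketch.

```python
import math

# Factorials: arrangements of n objects in a row.
queue_of_7 = math.factorial(7)        # 7 people in a queue
books_in_row = math.factorial(5)      # 5 books in a row

# Permutations P(n, r): order matters.
chair_and_vice = math.perm(4, 2)      # chairman and vice-chairman from 4 people
arrange_6_by_4 = math.perm(6, 4)

# Combinations C(n, r): order does not matter.
committee_of_2 = math.comb(4, 2)
five_card_hands = math.comb(52, 5)

# Committee of 4 from 5 women and 6 men with at least 3 women:
# add the counts for exactly 3 women and exactly 4 women.
at_least_3_women = sum(math.comb(5, w) * math.comb(6, 4 - w) for w in (3, 4))

print(queue_of_7, chair_and_vice, committee_of_2, five_card_hands, at_least_3_women)
```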
6.2 Basic probability formulae

6.2.1 Complementary events
For any event A defined on some sample space,
P (Ā) = 1 − P (A)
6.2.2 Union of two or more events

For any two events A and B defined on some sample space,
P (A ∪ B) = P (A) + P (B) for mutually exclusive events.
P (A ∪ B) = P (A) + P (B) − P (A ∩ B) for events that are not mutually exclusive.
These formulae can be extended to probabilities involving more than two events e. g. for 3 events A, B and
C defined on some sample space
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) for mutually exclusive events.
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(B ∩ C) − P(A ∩ C) + P(A ∩ B ∩ C) for events that are not mutually exclusive.
This formula can easily be verified with the aid of the Venn diagram shown below. From the diagram
the following sets can be written down.
A = {1, 2, 4, 5}; B = {2, 3, 5, 6}; C = {4, 5, 6, 7}

A ∩ B = {2, 5}; A ∩ C = {4, 5}; B ∩ C = {5, 6}
A ∩ B ∩ C = {5}; A ∪ B ∪ C = {1, 2, 3, 4, 5, 6, 7}.
Exercise: Complete the verification of the result for P (A ∪ B ∪ C).
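One way to check the exercise numerically is sketched below. It assumes, purely for illustration, that the seven numbered regions are equally likely, so that the probability of a set is its size divided by 7; the helper `P` is not part of the notes.

```python
from fractions import Fraction

A = {1, 2, 4, 5}
B = {2, 3, 5, 6}
C = {4, 5, 6, 7}
S = A | B | C  # the seven numbered regions

def P(event):
    """Probability of an event, assuming all regions are equally likely."""
    return Fraction(len(event), len(S))

lhs = P(A | B | C)
rhs = (P(A) + P(B) + P(C)
       - P(A & B) - P(B & C) - P(A & C)
       + P(A & B & C))

print(lhs, rhs)
```

The same check works for any choice of region probabilities, because each region is counted exactly once on both sides.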
6.2.3 De Morgan’s Laws

(1) P(Ā ∩ B̄) = P(\overline{A ∪ B})
(2) P(Ā ∪ B̄) = P(\overline{A ∩ B})
The two above results can also be written in a different notation as
(1) P(A′ ∩ B′) = P((A ∪ B)′)
(2) P(A′ ∪ B′) = P((A ∩ B)′)
Venn diagram verification of second result
Figure 19:
Figure 20:
6.2.4 Total probability formulae
P (A) = P (A ∩ B) + P (A ∩ B̄)
P (B) = P (A ∩ B) + P (Ā ∩ B)
These formulae can be verified from the Venn diagram shown on the following page. The formulae can be
extended to probabilities involving more than two events.
Figure 21:
Figure 22:

Examples
1) There are two telephone lines – A and B. Line A is engaged 50% of the time and line B is engaged
60% of the time. Both lines are engaged 30% of the time. Calculate the probability that
(a) at least one of the lines is engaged.
(b) none of the lines is engaged.
(c) line B is not engaged.
(d) line A is engaged, but line B is not engaged.
(e) only one line is engaged.
Solution
Let E1 denote the event “line A is engaged” and E2 the event “line B is engaged”.
P (E1 ) = 0.5, P (E2 ) = 0.6, P (E1 ∩ E2 ) = 0.3
(a) P(at least one of the lines is engaged) = P(E1 ∪ E2)
= P(E1) + P(E2) − P(E1 ∩ E2)
= 0.5 + 0.6 − 0.3
= 0.8
(b) P(none of the lines is engaged) = 1 − P(at least one of the lines is engaged)
= 1 − 0.8
= 0.2
(c) P(B not engaged) = 1 − P(B engaged) = 1 − P(E2) = 1 − 0.6 = 0.4.
(d) The event “line A is engaged, but line B is not engaged” can be written in symbols as E1 ∩ Ē2. Using the total probability formula,
P(E1 ∩ Ē2) = P(E1) − P(E1 ∩ E2)
= 0.5 − 0.3
= 0.2
(e) P(only one line is engaged) = P(line A is engaged, but line B is not engaged)
+ P(line B is engaged, but line A is not engaged)
= P(E1 ∩ Ē2) + P(Ē1 ∩ E2)
P(Ē1 ∩ E2) = P(E2) − P(E1 ∩ E2)
= 0.6 − 0.3 = 0.3 (using the total probability formula)
P(only one line is engaged) = 0.2 + 0.3 = 0.5
2) A batch of 20 computers contains 3 that are faulty. Four (4) computers are selected at random without
replacement from this batch. Calculate the probability that
(a) all 4 of the computers selected are not faulty.
(b) at least 2 of the computers selected are faulty.
Solution:
There are C(20, 4) = 4845 ways of selecting the 4 computers from the batch of 20. Since random
selection is used, all 4845 selections are equally likely. Let A denote the event “all 4 of the computers
selected are not faulty” and B the event “at least 2 of the computers selected are faulty”.
Using the classical probability result,
P(A) = N(A)/N(S) = C(17, 4)/C(20, 4) = 2380/4845 = 0.4912
P(B) = N(B)/N(S) = [N(2 faulty) + N(3 faulty)]/4845
= [C(17, 2) × C(3, 2) + C(17, 1) × C(3, 3)]/4845
= (136 × 3 + 17 × 1)/4845 = 425/4845
= 0.0877.
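The computer-batch calculation can be checked with `math.comb`. The helper `p_faulty` is a name introduced only for this sketch; it counts the selections that contain exactly k faulty computers.

```python
import math
from fractions import Fraction

total_selections = math.comb(20, 4)  # 4845 equally likely selections

def p_faulty(k):
    """P(exactly k of the 4 selected computers are faulty)."""
    ways = math.comb(3, k) * math.comb(17, 4 - k)
    return Fraction(ways, total_selections)

p_none_faulty = p_faulty(0)                 # event A
p_at_least_2 = p_faulty(2) + p_faulty(3)    # event B

print(round(float(p_none_faulty), 4), round(float(p_at_least_2), 4))
```

As a sanity check, the probabilities for k = 0, 1, 2, 3 add up to 1.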
6.3 Marginal and joint probabilities

Probabilities involving the occurrence of single events are called marginal probabilities.
Probabilities involving the occurrence of two or more events are called joint probabilities.
Example
The preference probabilities according to gender for 2 different brands of a certain product are summarized in
the table below. The gender marginal probabilities are obtained by summing the joint probabilities over the
brands. The brand marginal probabilities are obtained by summing the joint probabilities over the genders.
                  Brand
Gender     1      2      Marginal
Male       0.20   0.32   0.52
Female     0.40   0.08   0.48
Marginal   0.60   0.40   1
Joint probabilities: P(male ∩ brand 1) = 0.20, P(male ∩ brand 2) = 0.32, P(female ∩ brand 1) = 0.40,
P(female ∩ brand 2) = 0.08
Marginal probabilities: P(male) = 0.52, P(female) = 0.48, P(brand 1) = 0.60, P(brand 2) = 0.40.
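Summing the joint probabilities over one variable to obtain the marginals of the other can be sketched as follows. The dictionary layout is an assumption made for this illustration.

```python
# Joint probabilities from the table: (gender, brand) -> probability.
joint = {
    ("male", 1): 0.20, ("male", 2): 0.32,
    ("female", 1): 0.40, ("female", 2): 0.08,
}

gender_marginal = {}
brand_marginal = {}
for (gender, brand), p in joint.items():
    # Sum over brands to get the gender marginal, and over genders for the brand marginal.
    gender_marginal[gender] = gender_marginal.get(gender, 0.0) + p
    brand_marginal[brand] = brand_marginal.get(brand, 0.0) + p

print({k: round(v, 2) for k, v in gender_marginal.items()})
print({k: round(v, 2) for k, v in brand_marginal.items()})
```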
6.4 Independence
Two events A and B are independent if
P (A ∩ B) = P (A)P (B).
Otherwise, they are said to be dependent.
Two events are independent if the occurrence of one does not affect the probability of the other. For example, if you roll two dice
separately, the outcomes will be independent.
Proposition
If A and B are independent, then A and B̄ are independent.
Proof
P (A ∩ B̄) = P (A) − P (A ∩ B)
= P (A) − P (A)P (B)
= P (A)(1 − P (B))
= P (A)P (B̄)
This definition applies to two events. What does it mean to say that three or more events are independent?
Example
Roll two fair dice. Let A1 and A2 be the events that the first and second die, respectively, show an odd number. Let A3 = [sum
is odd]. The event probabilities are as follows:
Event Probability
A1 1/2
A2 1/2
A3 1/2
A1 ∩ A2 1/4
A1 ∩ A3 1/4
A2 ∩ A3 1/4
A1 ∩ A2 ∩ A3 0
We see that A1 and A2 are independent, A1 and A3 are independent, and A2 and A3 are independent. However,
the collection of all three is not independent, since if A1 and A2 both occur, then A3 cannot possibly
occur.
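The table above can be checked by listing all 36 equally likely outcomes of the two dice. The sketch below is one way to do this; the helper names `P` and `both` are introduced only for the illustration.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely pairs

def P(event):
    """Classical probability of an event given as a true/false test on an outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

A1 = lambda o: o[0] % 2 == 1            # first die is odd
A2 = lambda o: o[1] % 2 == 1            # second die is odd
A3 = lambda o: (o[0] + o[1]) % 2 == 1   # sum is odd

def both(e, f):
    return lambda o: e(o) and f(o)

# Every pair satisfies the product rule...
pairwise = all(P(both(e, f)) == P(e) * P(f)
               for e, f in [(A1, A2), (A1, A3), (A2, A3)])
# ...but the three events together do not.
p_all_three = P(lambda o: A1(o) and A2(o) and A3(o))
product_rule_for_three = p_all_three == P(A1) * P(A2) * P(A3)

print(pairwise, p_all_three, product_rule_for_three)
```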
From the example above, we see that just because a set of events is pairwise independent does not mean
they are independent all together. We define:
6.5 Independence of multiple events

Events A1 , A2 , · · · are said to be mutually independent if
P (Ai1 ∩ Ai2 ∩ · · · ∩ Air ) = P (Ai1 )P (Ai2 ) · · · P (Air )
for any i1 , i2 , · · · ir and r ≥ 2.
Example
We roll 4 dice. Let Aij be the event that dice i and j show the same number. Then
P(A12 ∩ A13) = 1/6 · 1/6 = 1/36 = P(A12)P(A13).
But
P(A12 ∩ A13 ∩ A23) = 1/36 ≠ P(A12)P(A13)P(A23) = 1/216.
So they are not mutually independent.
We can also apply this concept to experiments. Suppose we model two independent experiments with
Ω1 = {α1 , α2 , · · · } and Ω2 = {β1 , β2 , · · · } with probabilities P (αi ) = pi and P (βi ) = qi . Further suppose
that these two experiments are independent, i.e.
P ((αi , βj )) = pi qj
for all i, j. Then we can have a new sample space Ω = Ω1 × Ω2 .
Now suppose A ⊆ Ω1 and B ⊆ Ω2 are results (i.e. events) of the two experiments. We can view them as
subsets of Ω by rewriting them as A × Ω2 and Ω1 × B. Then the probability
P(A ∩ B) = Σ_{αi ∈ A, βj ∈ B} pi qj = (Σ_{αi ∈ A} pi)(Σ_{βj ∈ B} qj) = P(A)P(B).
So we say the two experiments are “independent” even though the term usually refers to different events in
the same experiment. We can generalize this to n independent experiments, or even countably infinitely many
experiments.
The law of total probability. For disjoint events A1 and A2 with S = A1 ∪ A2,
P(B) = P(B|A1)P(A1) + P(B|A2)P(A2).
Bayes’ theorem.
P(A1|B) = P(B|A1)P(A1) / [P(B|A1)P(A1) + P(B|A2)P(A2)].
Example
Suppose 1/10 of men and 1/7 of women are colour-blind. A person is chosen at random and that person
is colour-blind. What is the probability that the person is male? Assume males and females to be in equal
numbers.
Solution
Let M = male, F = female, C = colour-blind. Then
P(M|C) = P(M ∩ C)/P(C) (1)
= P(C|M)P(M) / [P(C|M)P(M) + P(C|F)P(F)] (2)
= (1/10 · 1/2) / (1/10 · 1/2 + 1/7 · 1/2) = 7/17 ≈ 0.41. (3)
6.6 Conditional probability

Suppose B is an event with P (B) > 0. For any event A ⊆ Ω, the conditional probability of A given B is
P(A | B) = P(A ∩ B)/P(B).
We interpret this as the probability of A happening given that B has happened.
Note that if A and B are independent, then
P(A | B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A).
Example
In a game of poker, let Ai = [player i gets royal flush]. Then
P(A1) = 1.539 × 10^−6
and
P(A2 | A1) = 1.969 × 10^−6.
It is significantly bigger, albeit still incredibly tiny. So we say “good hands attract”.
If P(A | B) > P(A), then we say that B attracts A. Since
P(A ∩ B)/P(B) > P(A) ⇔ P(A ∩ B)/P(A) > P(B),
A attracts B if and only if B attracts A. We can also say A repels B if A attracts B^C.
Theorem
1. P (A ∩ B) = P (A | B)P (B).
2. P (A ∩ B ∩ C) = P (A | B ∩ C)P (B | C)P (C).

3. P(A | B ∩ C) = P(A ∩ B | C)/P(B | C).
4. The function P( · | B) restricted to subsets of B is a probability function (or measure).
Proof
Proofs of (1), (2) and (3) are trivial, so we only prove (4). To prove this, we have to check the axioms.
1. Let A ⊆ B. Then P(A | B) = P(A ∩ B)/P(B) ≤ 1.
2. P(B | B) = P(B)/P(B) = 1.
3. Let Ai be disjoint events that are subsets of B. Then
P(∪i Ai | B) = P((∪i Ai) ∩ B)/P(B)
= P(∪i Ai)/P(B)
= Σi P(Ai)/P(B)
= Σi P(Ai ∩ B)/P(B)
= Σi P(Ai | B).
A partition of the sample space is a collection of disjoint events {Bi}, i = 1, 2, . . ., such that ∪i Bi = Ω.
For example, “odd” and “even” partition the sample space into two events.
The following result should be clear:

Proposition
If Bi is a partition of the sample space, and A is any event, then
P(A) = Σ_{i=1}^{∞} P(A ∩ Bi) = Σ_{i=1}^{∞} P(A | Bi)P(Bi).
Example
A fair coin is tossed repeatedly. The gambler gets +$1 for a head and −$1 for a tail. He continues until he is broke
or has reached $a. Let
p_x = P(goes broke | starts with $x),
and B1 be the event that he gets a head on the first toss. Then
p_x = P(B1)p_{x+1} + P(B1^C)p_{x−1}
p_x = (1/2)p_{x+1} + (1/2)p_{x−1}
We have two boundary conditions p_0 = 1, p_a = 0. Solving the recurrence relation, we have
p_x = 1 − x/a.
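The stated solution can be checked against the recurrence and the boundary conditions with exact fractions. The target a = 10 below is an arbitrary value chosen for illustration, and `ruin_probability` is a name introduced for this sketch.

```python
from fractions import Fraction

def ruin_probability(x, a):
    """Claimed solution p_x = 1 - x/a for a fair coin."""
    return 1 - Fraction(x, a)

a = 10  # illustrative target; any positive integer works
p = [ruin_probability(x, a) for x in range(a + 1)]

# Boundary conditions: broke at 0, finished at a.
boundaries_ok = (p[0] == 1 and p[a] == 0)

# Recurrence p_x = (1/2) p_{x+1} + (1/2) p_{x-1} for every interior x.
half = Fraction(1, 2)
recurrence_ok = all(p[x] == half * p[x + 1] + half * p[x - 1]
                    for x in range(1, a))

print(boundaries_ok, recurrence_ok)
```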
Example 1
Five hundred (500) TV viewers consisting of 300 males and 200 females were asked whether they were
satisfied with the news coverage on a certain TV channel. Their replies are summarized in the table below.
          Satisfied   Not satisfied   Total
Male      180         120             300
Female    90          110             200
Total     270         230             500
P(satisfied|male) = 180/300 = 0.6
P(satisfied|female) = 90/200 = 0.45
P(not satisfied|male) = 120/300 = 1 − 180/300 = 0.4
P(not satisfied|female) = 110/200 = 1 − 90/200 = 0.55
P(satisfied) = 270/500 = 0.54 and P(not satisfied) = 1 − 0.54 = 0.46
6.7 Bayes’ Theorem

Suppose Bi is a partition of the sample space, and A and Bi all have non-zero probability. Then for any Bi ,
P(Bi | A) = P(A | Bi)P(Bi) / Σj P(A | Bj)P(Bj).
Note that the denominator is simply P(A), written out using the law of total probability.

Example
Suppose we have a screening test that tests whether a patient has a particular disease. We denote positive
and negative results as + and − respectively, and D denotes the person having disease. Suppose that the
test is not absolutely accurate, and
P (+ | D) = 0.98
P(+ | D^C) = 0.01
P(D) = 0.001.
So what is the probability that a person has the disease given that he received a positive result?
P(D | +) = P(+ | D)P(D) / [P(+ | D)P(D) + P(+ | D^C)P(D^C)]
= (0.98 · 0.001)/(0.98 · 0.001 + 0.01 · 0.999)
= 0.09
So this test is pretty useless. Even if you get a positive result, since the disease is so rare, it is more likely
that you don’t have the disease and get a false positive.
Example
Consider the two following cases:
1. I have 2 children, one of whom is a boy.
2. I have two children, one of whom is a son born on a Tuesday.
What is the probability that both of them are boys?
Solution
1. P(BB | BB ∪ BG) = (1/4)/(1/4 + 2/4) = 1/3.
2. Let B∗ denote a boy born on a Tuesday, and B a boy not born on a Tuesday. Then
P(B∗B∗ ∪ B∗B | BB∗ ∪ B∗B∗ ∪ B∗G) = (1/14 · 1/14 + 2 · 1/14 · 6/14) / (1/14 · 1/14 + 2 · 1/14 · 6/14 + 2 · 1/14 · 1/2)
= 13/27.
How can we understand this? It is much easier to have a boy born on a Tuesday if you have two boys than
one boy. So if we have the information that a boy is born on a Tuesday, it is now less likely that there is
just one boy. In other words, it is more likely that there are two boys.
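Both answers can be confirmed by brute-force enumeration. The sketch assumes each child is equally likely to be a boy or a girl and equally likely to be born on any of the 7 weekdays, independently of the other child; which index stands for Tuesday is an arbitrary labelling.

```python
from fractions import Fraction
from itertools import product

TUESDAY = 1  # arbitrary label for one of the 7 weekdays
children = list(product("BG", range(7)))        # (sex, weekday): 14 equally likely types
families = list(product(children, repeat=2))    # 196 equally likely ordered pairs

def conditional(event, given):
    """P(event | given) by counting equally likely families."""
    restricted = [f for f in families if given(f)]
    return Fraction(sum(1 for f in restricted if event(f)), len(restricted))

two_boys = lambda f: all(sex == "B" for sex, _ in f)
at_least_one_boy = lambda f: any(sex == "B" for sex, _ in f)
at_least_one_tuesday_boy = lambda f: any(sex == "B" and day == TUESDAY for sex, day in f)

case_1 = conditional(two_boys, at_least_one_boy)
case_2 = conditional(two_boys, at_least_one_tuesday_boy)

print(case_1, case_2)
```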
Example
A lab test is 95 percent effective at detecting a certain disease when it is present (sensitivity). When the
disease is not present, the test is 99 percent effective at declaring the subject negative (specificity). If 8
percent of the population has the disease (prevalence), what is the probability that a subject has the disease
given that
(a) his test is positive?
(b) his test is negative?
Solution
Let D = disease is present and Z = test is positive. We are given that P(D) = 0.08 (prevalence), P(Z|D) = 0.95
(sensitivity), and P(Z̄|D̄) = 0.99 (specificity), so P(Z|D̄) = 1 − 0.99 = 0.01 and P(Z̄|D) = 1 − 0.95 = 0.05. In part (a), we want to compute P(D|Z). By Bayes’ rule,
P(D|Z) = P(Z|D)P(D) / [P(Z|D)P(D) + P(Z|D̄)P(D̄)]
= (0.95)(0.08) / [(0.95)(0.08) + (0.01)(0.92)] ≈ 0.892
In part (b), we want P(D|Z̄). By Bayes’ rule,
P(D|Z̄) = P(Z̄|D)P(D) / [P(Z̄|D)P(D) + P(Z̄|D̄)P(D̄)]
= (0.05)(0.08) / [(0.05)(0.08) + (0.99)(0.92)] ≈ 0.004
Example
An economist believes that during periods of high economic growth, the Indian Rupee appreciates with
probability 0.70; in periods of moderate economic growth, it appreciates with probability 0.40; and during
periods of low economic growth, the Rupee appreciates with probability 0.20. During any period of time
the probability of high economic growth is 0.30, the probability of moderate economic growth is 0.50 and
the probability of low economic growth is 0.20. Suppose the Rupee value has been appreciating during the
present period. What is the probability that we are experiencing a period of
(a) high economic growth?
(b) moderate economic growth?
(c) low economic growth?
Solution
Our partition consists of three events: high economic growth (event H), moderate economic growth (event
M) and low economic growth (event L). The prior probabilities of these events are:
P (H) = 0.30 P (M ) = 0.50 P (L) = 0.20
Let A be the event that the Rupee appreciates. We have the conditional probabilities
P(A|H) = 0.70 P(A|M) = 0.40 P(A|L) = 0.20
By using Bayes’ theorem we can find the required probabilities P(H|A), P(M|A) and P(L|A).
P(H|A) = P(A|H)P(H) / [P(A|H)P(H) + P(A|M)P(M) + P(A|L)P(L)]
= (0.70)(0.30) / [(0.70)(0.30) + (0.40)(0.50) + (0.20)(0.20)]
= 0.467
P(M|A) = P(A|M)P(M) / [P(A|H)P(H) + P(A|M)P(M) + P(A|L)P(L)]
= (0.40)(0.50) / [(0.70)(0.30) + (0.40)(0.50) + (0.20)(0.20)]
= 0.444
P(L|A) = P(A|L)P(L) / [P(A|H)P(H) + P(A|M)P(M) + P(A|L)P(L)]
= (0.20)(0.20) / [(0.70)(0.30) + (0.40)(0.50) + (0.20)(0.20)]
= 0.089
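Bayes' theorem over a partition is the same computation for every event in the partition: multiply each prior by its likelihood and divide by the total. A minimal sketch, using the Rupee example and a helper name (`posterior`) chosen only for this illustration:

```python
def posterior(priors, likelihoods):
    """Posterior P(B_i | A) for each event B_i of a partition.

    priors[k] is P(B_k) and likelihoods[k] is P(A | B_k).
    """
    joint = {k: priors[k] * likelihoods[k] for k in priors}
    p_a = sum(joint.values())  # law of total probability: P(A)
    return {k: v / p_a for k, v in joint.items()}

priors = {"H": 0.30, "M": 0.50, "L": 0.20}        # growth is high, moderate, low
likelihoods = {"H": 0.70, "M": 0.40, "L": 0.20}   # P(Rupee appreciates | growth level)

post = posterior(priors, likelihoods)
print({k: round(v, 3) for k, v in post.items()})
```

The posterior probabilities always add up to 1, which is a quick check on the arithmetic.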
Lecture 7: Random Variables and probability distributions
Discrete random variables

A random variable is a variable whose value depends on the outcome of a random experiment.
A random variable is denoted by a capital letter and a particular value of a random variable by a lower case
(small) letter.
Examples:
1) T = the number of tails (t) when a coin is flipped 3 times.
2) X = the sum of the values (x) showing when two dice are rolled.
3) H = the height (h) of a woman chosen at random from a group.
4) V = the liquid volume (v) of soda in a can marked 12 oz.
There are two types of random variables:

Discrete Random Variables
Variables that have a finite or countable number of possible values.
These variables usually occur in counting experiments.
Continuous Random Variables
Variables that can take on any value in some interval i.e. they can take an infinite number of possible
values.
These variables usually occur in experiments where measurements are taken.
Examples:
1) The variables T and X from the above examples are discrete random variables.
2) The variables H and V from the above examples are continuous random variables.
Discrete probability distributions

A discrete probability distribution is a list of the possible distinct values of the random variable together
with their corresponding probabilities.
The probability of the random variable X assuming a particular value x is denoted by P (X = x) = P (x).
This probability, which is a function of x, is referred to as the probability mass function.
Examples:
1) As above, let T be the random variable that represents the number of tails obtained when a coin is
flipped three times. Then T has 4 possible values 0, 1, 2, and 3. The outcomes of the experiment and
the values of T are summarized in the next table.
Outcomes T
hhh 0
hht, hth, thh 1
tth, tht, htt 2
ttt 3
Assuming that the outcomes are all equally likely, the probability distribution for T is given in the
following table.
t 0 1 2 3 Total
P(t) 1/8 3/8 3/8 1/8 1
2) Let Y denote the number of tosses of a coin until a head first appears. Then
S = {h, th, tth, ttth, . . .} and Y = 1, 2, 3, 4, . . .
y      1     2         3         . . .   Total
P(y)   1/2   (1/2)^2   (1/2)^3   . . .   1
3) A pair of dice is tossed. Let X denote the sum of the digits. The probability distribution of X can be
found from the following table. The entry in any particular cell is the sum of the row and column values.
x          2      3      4      5      6      7      8      9      10     11     12
P(X = x)   1/36   2/36   3/36   4/36   5/36   6/36   5/36   4/36   3/36   2/36   1/36
Note:
For any discrete random variable X, the probabilities of the values that it can assume are such that
0 ≤ P(X = x) ≤ 1 and Σx P(x) = 1
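The distribution of the sum of two dice can be built by counting outcomes, which also confirms that the probabilities add up to 1. This is a small sketch using only the standard library; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely outcomes give each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {x: Fraction(c, 36) for x, c in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)
print("total:", sum(pmf.values()))
```

Note that `Fraction` reduces to lowest terms, so 2/36 is printed as 1/18.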
The cumulative distribution function

The cumulative distribution function is defined as F(x) = P(X ≤ x), i.e. the sum of P(t) over all values t ≤ x. The graphs on the previous page are plots of the probability
mass function (graph on the right) and cumulative distribution function (graph on the left).
A random variable can only take on one value at a time i.e. the events X = x1 and X = x2 for x1 ̸= x2 are
mutually exclusive. The probability of the variable taking on any number of different values can be found
by simply adding the appropriate probabilities.
Examples
1) Find the probability of getting 2 or more tails when a coin is flipped 3 times.
P(T ≥ 2) = 3/8 + 1/8 = 1/2.
2) Find the probability of getting at least one tail when a coin is flipped 3 times.
P(at least 1) = 1 − P(0) = 1 − 1/8 = 7/8.
3) Find the probability of needing at most 3 tosses of a coin to get the first heads.
P(Y ≤ 3) = 1/2 + (1/2)^2 + (1/2)^3 = 7/8.
Probability density function (pdf )

A function f is said to be the probability density function of a continuous random variable X if it satisfies
the following properties.
(i) f(x) ≥ 0 for −∞ < x < ∞
(ii) ∫_{−∞}^{∞} f(x) dx = 1
For a continuous random variable, the probability of any single value a is zero:
P(X = a) = ∫_a^a f(x) dx = 0
The probability that X lies in the interval (a, b) is given by
P(a < X < b) = ∫_a^b f(x) dx
Distribution function for continuous random variable.

If X is a continuous random variable with p.d.f. f(x), then the distribution function is given by
(i) F(x) = ∫_{−∞}^{x} f(t) dt = P(X ≤ x); −∞ < x < ∞
(ii) F(b) − F(a) = ∫_a^b f(x) dx = P(a ≤ X ≤ b)
Example
Examine whether f(x) = 5x^4, 0 < x < 1, can be a p.d.f. of a continuous random variable X.
Solution
Clearly f(x) ≥ 0 on 0 < x < 1. For a probability density function, we must also show that ∫_{−∞}^{∞} f(x) dx = 1.
∫_0^1 5x^4 dx = 5 [x^5/5]_0^1
= [x^5]_0^1
= 1^5 − 0
= 1
Hence f(x) is a p.d.f.
Example
A continuous random variable X follows the rule f(x) = Ax^2, 0 < x < 1. Determine A.
Solution
Since f(x) is a p.d.f., ∫_{−∞}^{∞} f(x) dx = 1.
Therefore
∫_0^1 Ax^2 dx = 1
A [x^3/3]_0^1 = 1
A/3 = 1, so A = 3.
Mathematical Expectation
A very important concept in probability and statistics is that of mathematical expectation, expected value,
or briefly the expectation, of a random variable. For a discrete random variable X having the possible values
x1 , x2 , . . . , xn the expectation of X is defined as
E(X) = x1 P(X = x1) + . . . + xn P(X = xn) = Σ_{i=1}^{n} xi P(X = xi)
or
E(X) = x1 f(x1) + . . . + xn f(xn) = Σ_{i=1}^{n} xi f(xi)
Example.
Suppose that a game is to be played with a single die assumed fair. In this game a player wins $20 if a 2
turns up; $40 if a 4 turns up; loses $30 if a 6 turns up; while the player neither wins nor loses if any other
face turns up. Find the expected sum of money to be won.
Solution
Face   1     2     3     4     5     6
x      0     +20   0     +40   0     −30
f(x)   1/6   1/6   1/6   1/6   1/6   1/6
Therefore, the expected value, or expectation, is
E(X) = 0(1/6) + 20(1/6) + 0(1/6) + 40(1/6) + 0(1/6) + (−30)(1/6) = 5
Examples
As above, let T be the random variable that represents the number of tails obtained when a coin is flipped
three times. Then T has 4 possible values 0, 1, 2, and 3. The outcomes of the experiment and the values of
T are summarized in the next table.
Outcomes T
hhh 0
hht, hth, thh 1
tth, tht, htt 2
ttt 3
Find the expected value of the random variable T.
Solution
E(T) = Σt t p(t) = (0 × 1/8) + (1 × 3/8) + (2 × 3/8) + (3 × 1/8) = 12/8 = 1.5
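Both expectations worked out above are the same weighted sum, which can be sketched as a short function. The function name `expectation` and the list-of-pairs layout are choices made for this illustration.

```python
from fractions import Fraction

def expectation(distribution):
    """E(X) for a list of (value, probability) pairs."""
    return sum(x * p for x, p in distribution)

sixth = Fraction(1, 6)
# Winnings for die faces 1 to 6 in the game example.
die_game = [(0, sixth), (20, sixth), (0, sixth), (40, sixth), (0, sixth), (-30, sixth)]
# Number of tails in three flips of a coin.
tails = [(0, Fraction(1, 8)), (1, Fraction(3, 8)), (2, Fraction(3, 8)), (3, Fraction(1, 8))]

e_game = expectation(die_game)
e_tails = expectation(tails)
print(e_game, e_tails)
```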
Theorem
Let X be a discrete r.v. with probability function p(x). Then
(i) E(c) = c where c is any real constant.
(ii) E[aX + b] = aµ + b where a and b are constants and µ = E(X).
(iii) E[kg(X)] = kE[g(X)] where g(X) is a real-valued function of X and k is a constant.
(iv) E[ag1(X) ± bg2(X)] = aE[g1(X)] ± bE[g2(X)] where g1(X) and g2(X) are real-valued functions of X.
Variance and Standard Deviation

Let X be a discrete r.v. with probability function p(x) and mean E(X) = µ. Then the variance of X is denoted by σ^2 or Var(X) and is given by
σ^2 = E[(X − µ)^2]
= Σ_{i=1}^{n} (xi − µ)^2 p(xi)
= Σx x^2 p(x) − µ^2
The standard deviation σ is the square root of the variance of X, given by
σ = √Var(X) = √E[(X − µ)^2]
Theorem
Var(X) = E[(X − µ)^2] = Σx x^2 p(x) − µ^2
Proof
Var(X) = E[(X − µ)^2]
= E(X^2 − 2Xµ + µ^2)
= E(X^2) − 2µE(X) + µ^2
= E(X^2) − 2µ^2 + µ^2
= Σx x^2 p(x) − µ^2
Theorem
Var(aX + b) = a^2 Var(X)
Proof
Recall that E(aX + b) = aµ + b, thus
Var(aX + b) = E[((aX + b) − (aµ + b))^2]
= E[(a(X − µ))^2]
= E[a^2 (X − µ)^2]
= a^2 E[(X − µ)^2]
= a^2 Var(X)
Example
Given the probability distribution of X below, find the mean and standard deviation of X.
x          0     1     2     3
P(X = x)   1/8   1/4   3/8   1/4
Solution
x             0     1     2     3     Total
P(X = x)      1/8   1/4   3/8   1/4   1
xP(X = x)     0     1/4   3/4   3/4   7/4
x^2 P(X = x)  0     1/4   3/2   9/4   4
E(X) = µ = Σ_{x=0}^{3} xP(X = x) = 7/4 = 1.75
Standard deviation
σ = √(E(X^2) − µ^2) = √(4 − 1.75^2) = 0.968246
Example 2
The probability distribution of a r.v. X is as shown below.
x          0     1     2
P(X = x)   1/6   1/2   1/3
Required
(a) Calculate the mean, variance and standard deviation of X.
(b) Calculate the mean, variance and standard deviation of Y = 12X + 6.
Solution
x             0     1     2     Total
P(X = x)      1/6   1/2   1/3   1
xP(X = x)     0     1/2   2/3   7/6
x^2 P(X = x)  0     1/2   4/3   11/6
E(X) = µ = Σ_{x=0}^{2} xP(X = x) = 7/6
E(X^2) = Σ_{x=0}^{2} x^2 P(X = x) = 11/6
Variance
Var(X) = E(X^2) − µ^2 = 11/6 − (7/6)^2 = 66/36 − 49/36 = 17/36
Standard deviation
σ = √(17/36) = √17/6 = 0.6872
E(Y) = 12E(X) + 6 = 12(7/6) + 6 = 20
Var(Y) = Var(12X + 6) = 12^2 Var(X) = 144 × 17/36 = 68
Standard deviation of Y = √68 = 8.2462
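The mean, variance and standard deviation in this example, and the effect of the transformation Y = 12X + 6, can be checked with exact fractions. The variable names below are chosen for this sketch only.

```python
import math
from fractions import Fraction

pmf = {0: Fraction(1, 6), 1: Fraction(1, 2), 2: Fraction(1, 3)}

mean = sum(x * p for x, p in pmf.items())               # E(X)
second_moment = sum(x**2 * p for x, p in pmf.items())   # E(X^2)
variance = second_moment - mean**2                      # Var(X) = E(X^2) - mu^2
sd = math.sqrt(variance)

# Linear transformation Y = aX + b with a = 12, b = 6.
a, b = 12, 6
mean_y = a * mean + b          # E(aX + b) = a mu + b
variance_y = a**2 * variance   # Var(aX + b) = a^2 Var(X)
sd_y = math.sqrt(variance_y)

print(mean, variance, round(sd, 4), mean_y, variance_y, round(sd_y, 4))
```

Note that the added constant b shifts the mean but leaves the variance unchanged.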
Expectation of a continuous random variable

Let X be a continuous random variable with probability density function f(x). Then the mathematical expectation
of X is defined as
E(X) = ∫_{−∞}^{∞} x f(x) dx
provided the integral exists.

Example
Let X be a continuous random variable with p.d.f. given by
f(x) = 4x^3 for 0 < x < 1, and f(x) = 0 elsewhere.
Find E(X).
Solution
We know that
E(X) = ∫_{−∞}^{∞} x f(x) dx
= ∫_0^1 x(4x^3) dx
= 4 ∫_0^1 x^4 dx
= 4 [x^5/5]_0^1
= (4/5)(1^5 − 0^5)
= 4/5
Example:
Let X be a continuous random variable with p.d.f. given by
f(x) = 3x^2 for 0 < x < 1, and f(x) = 0 elsewhere.
Find the mean and variance.

Solution:
E(X) = ∫_{−∞}^{∞} x f(x) dx
= ∫_0^1 x(3x^2) dx
= 3 ∫_0^1 x^3 dx
= 3 [x^4/4]_0^1
= (3/4)(1^4 − 0^4)
= 3/4
E(X^2) = ∫_{−∞}^{∞} x^2 f(x) dx
= ∫_0^1 x^2 (3x^2) dx
= 3 ∫_0^1 x^4 dx
= 3 [x^5/5]_0^1
= (3/5)(1^5 − 0^5)
= 3/5
Var(X) = E(X^2) − [E(X)]^2
= 3/5 − (3/4)^2
= 3/5 − 9/16
= (48 − 45)/80 = 3/80
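These integrals can be checked numerically without any external library. The sketch below uses a simple midpoint rule; `integrate` is a helper written for this illustration, not a library function, and the results are approximations rather than exact values.

```python
def integrate(g, lo, hi, n=100_000):
    """Approximate the integral of g over [lo, hi] with the midpoint rule."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

def f(x):
    """p.d.f. from the example: 3x^2 on (0, 1)."""
    return 3 * x**2

total = integrate(f, 0, 1)                             # should be close to 1
mean = integrate(lambda x: x * f(x), 0, 1)             # close to 3/4
second_moment = integrate(lambda x: x**2 * f(x), 0, 1) # close to 3/5
variance = second_moment - mean**2                     # close to 3/80

print(round(total, 6), round(mean, 6), round(variance, 6))
```

The same helper can be reused to check that 5x^4 integrates to 1 on (0, 1), or that A = 3 normalises Ax^2.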
Lecture 8: Discrete probability distributions
Bernoulli random variable

A random variable X which takes the two values 0 and 1 with probabilities q and p respectively, i.e.
P(X = 1) = p, P(X = 0) = q, q = 1 − p, is called a Bernoulli variate and is said to have a Bernoulli distribution.
Assumptions
A discrete random variable X is said to have a binomial distribution if a random experiment satisfies the
following conditions:
(1) The experiment is repeated a fixed number of times. Each repetition is called a trial. The number of
trials is denoted by n.
(2) All trials are independent of each other.
(3) The outcome of each trial can be one of two complementary outcomes, one labeled success (s) and the other labeled failure (f). A single trial is called a Bernoulli trial.
(4) The probability of success has a constant value p for each trial.
(5) The random variable X counts the number of successes that occur in the n trials.
Binomial distribution
The random variable that counts the number of successes in many independent, identical Bernoulli trials is
called a Binomial Random Variable.
In probability theory and statistics, the binomial distribution is the discrete probability distribution of the
number of successes in a sequence of n independent yes/no experiments, each of which yields success with
probability p. Such a success/failure experiment is also called a Bernoulli experiment or Bernoulli trial. In
fact, when n = 1, the binomial distribution is a Bernoulli distribution. The binomial distribution is the basis
for the popular binomial test of statistical significance.
The probability that the event will occur exactly x times in n trials (i.e., x successes and n − x failures will occur) is given by the probability function
$$f(x) = P(X = x) = \binom{n}{x} p^x (1 - p)^{n-x} = \frac{n!}{x!\,(n - x)!}\, p^x (1 - p)^{n-x}, \qquad x = 0, 1, 2, \ldots, n, \quad 0 \le p \le 1$$
Example 5.1.
Find the probability of getting exactly 2 heads in 6 tosses of a fair coin.
Solution
$$P(X = 2) = \binom{6}{2}\left(\frac{1}{2}\right)^{2}\left(\frac{1}{2}\right)^{4} = \frac{15}{64}$$
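The binomial probability function is easy to evaluate directly. The following is a minimal sketch, not part of the original notes; the function name `binomial_pmf` is illustrative, and exact fractions are used so the result can be compared with the hand calculation.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p): C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Exactly 2 heads in 6 tosses of a fair coin
p_two_heads = binomial_pmf(2, 6, Fraction(1, 2))
print(p_two_heads)

# The probabilities over x = 0, ..., n should add up to 1
total = sum(binomial_pmf(x, 6, Fraction(1, 2)) for x in range(7))
print(total)
```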
The discrete probability function f(x) is often called the binomial distribution since, for x = 0, 1, 2, . . . , n, it corresponds to successive terms in the binomial expansion
$$(q + p)^n = q^n + \binom{n}{1} q^{n-1} p + \binom{n}{2} q^{n-2} p^2 + \cdots + p^n = \sum_{x=0}^{n} \binom{n}{x} p^x (1 - p)^{n-x}.$$
The special case of a binomial distribution with n = 1 is also called the Bernoulli distribution.
Properties of Binomial Distributions

1. Expected Value or Mean
The expected value or the mean, denoted by µ, of a Binomial distribution is computed as
$$E(X) = \mu = \sum_{x=0}^{n} x\,P(x)$$
An evaluation of µ will show that µ = np.

2. Variance
The variance, denoted by σ², of a binomial distribution is given by
$$\mathrm{Var}(X) = \sigma^2 = E[(X - \mu)^2] = \sum_{x=0}^{n} (x - \mu)^2 P(x)$$
An evaluation of σ² shows that σ² = npq.
3. Moments about the Origin
The rth moment about the origin, denoted by m'_r, of a Binomial distribution is computed as
$$m'_r = \sum_{x=0}^{n} x^r\,p(x)$$
First moment about the origin will be
$$m'_1 = \sum_{x=0}^{n} x\,p(x) = np = \mu$$
Second moment about the origin will be
$$m'_2 = \sum_{x=0}^{n} x^2\,p(x) = n(n - 1)p^2 + np$$
4. Moments about the Mean

The rth moment about the mean, denoted by m^µ_r, of a binomial distribution is computed as:
$$m^{\mu}_r = \sum_{x=0}^{n} (x - \mu)^r\,p(x)$$
First moment about the mean will be
$$m^{\mu}_1 = \sum_{x=0}^{n} (x - \mu)\,p(x) = 0$$
Second moment about the mean will be
$$m^{\mu}_2 = \sum_{x=0}^{n} (x - \mu)^2\,p(x) = npq = \sigma^2$$
Third moment about the mean will be
$$m^{\mu}_3 = \sum_{x=0}^{n} (x - \mu)^3\,p(x) = npq(q - p)$$
Fourth moment about the mean will be
$$m^{\mu}_4 = \sum_{x=0}^{n} (x - \mu)^4\,p(x) = 3(npq)^2 + npq(1 - 6pq)$$
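The closed forms for the central moments can be checked against the defining sums. The sketch below is not from the original notes; the parameter values n = 10 and p = 0.3 are arbitrary choices for illustration and the function names are hypothetical.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def central_moment(r, n, p):
    """r-th moment about the mean, computed directly from the definition."""
    mu = n * p
    return sum((x - mu) ** r * binomial_pmf(x, n, p) for x in range(n + 1))


n, p = 10, 0.3          # illustrative values only
q = 1 - p

# Pairs of (sum from the definition, closed form)
checks = {
    1: (central_moment(1, n, p), 0.0),
    2: (central_moment(2, n, p), n * p * q),
    3: (central_moment(3, n, p), n * p * q * (q - p)),
    4: (central_moment(4, n, p), 3 * (n * p * q) ** 2 + n * p * q * (1 - 6 * p * q)),
}
for r, (from_sum, closed_form) in checks.items():
    print(r, round(from_sum, 6), round(closed_form, 6))
```

Each row prints the moment obtained from the sum next to the value of the closed form, so any disagreement is visible at a glance.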
5. Skewness
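To bring out the skewness of a Binomial distribution we can calculate the moment coefficient of skewness, γ1
$$\begin{aligned}
\gamma_1 &= \sqrt{\beta_1} \\
&= \sqrt{\frac{(m^{\mu}_3)^2}{(m^{\mu}_2)^3}} \\
&= \frac{m^{\mu}_3}{\left(\sqrt{m^{\mu}_2}\right)^3} \\
&= \frac{npq(q - p)}{\left(\sqrt{npq}\right)^3} \\
&= \frac{q - p}{\sqrt{npq}}
\end{aligned}$$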
6. Kurtosis
A measure of kurtosis of the Binomial distribution is given by the moment coefficient of kurtosis γ2
$$\begin{aligned}
\gamma_2 &= \beta_2 - 3 \\
&= \frac{m^{\mu}_4}{(m^{\mu}_2)^2} - 3 \\
&= \frac{3n^2p^2q^2 + npq(1 - 6pq)}{n^2p^2q^2} - 3 \\
&= \frac{1 - 6pq}{npq}
\end{aligned}$$
7. Normal approximation of the Binomial distribution

If n is large and neither p nor q is too close to zero, the Binomial distribution can be closely approximated by a Normal distribution with standardized variable
$$Z = \frac{X - np}{\sqrt{npq}}$$
8. Poisson approximation of the Binomial distribution

The Binomial distribution can reasonably be approximated by the Poisson distribution when n is very large and p is very small while np remains finite, i.e. when
n → ∞ and p → 0.
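The quality of the normal approximation described above can be illustrated numerically. The following is a rough sketch, not part of the original notes. The values n = 100, p = 0.5 and k = 55 are arbitrary, and the continuity correction (adding 0.5 to k) is an assumption introduced here; the notes only give the standardized variable.

```python
from math import comb, erf, sqrt


def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))


def normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


n, p, k = 100, 0.5, 55                  # illustrative values only
q = 1 - p

exact = binomial_cdf(k, n, p)
# Standardize with Z = (X - np) / sqrt(npq); the +0.5 is a continuity
# correction, an assumption added here and not stated in the notes.
z = (k + 0.5 - n * p) / sqrt(n * p * q)
approx = normal_cdf(z)

print(round(exact, 4), round(approx, 4))
```

With p near 0.5 and n this large the two values agree closely; the agreement would be expected to worsen as p moves towards 0 or 1.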
Example
Assuming the probability of a male birth is 1/2, find the probability distribution of the number of boys out of 5 births.
(a) Find the probability that a family of 5 children has
(i) at least one boy
(ii) at most 3 boys
(b) Out of 960 families with 5 children each, find the expected number of families with (i) and (ii) above.
Solution
Let the random variable X measure the number of boys out of 5 births. Clearly X is a binomial random variable, so we apply the Binomial probability function to calculate the required probabilities.
$$X \sim B\left(5, \tfrac{1}{2}\right)$$
$$P(X = x) = \binom{n}{x} p^x q^{n-x} \qquad \text{for } x = 0, 1, 2, 3, 4, 5$$
The probability distribution of X is given below
X = x        0      1      2       3       4      5
P(X = x)     1/32   5/32   10/32   10/32   5/32   1/32
(a) The required probabilities are
$$\begin{aligned}
P(X \ge 1) &= 1 - P(X = 0) \\
&= 1 - 1/32 \\
&= 31/32
\end{aligned}$$
$$\begin{aligned}
P(X \le 3) &= P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) \\
&= 1/32 + 5/32 + 10/32 + 10/32 \\
&= 26/32
\end{aligned}$$
(b) Out of 960 families with 5 children, the expected number of families with
At least one boy = 960 × P(X ≥ 1)
= 960 × 31/32
= 930
At most 3 boys = 960 × P(X ≤ 3)
= 960 × 26/32
= 780
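The whole example can be reproduced in a few lines. This is a minimal sketch, not part of the original notes; the variable names are illustrative, and exact fractions are used so that the probabilities match the table entry by entry.

```python
from fractions import Fraction
from math import comb

n, p = 5, Fraction(1, 2)
q = 1 - p

# Probability distribution of X = number of boys out of 5 births
pmf = [comb(n, x) * p**x * q ** (n - x) for x in range(n + 1)]

p_at_least_one = 1 - pmf[0]          # P(X >= 1) = 1 - P(X = 0)
p_at_most_three = sum(pmf[:4])       # P(X <= 3) = P(0) + P(1) + P(2) + P(3)

families = 960
expected_at_least_one = families * p_at_least_one
expected_at_most_three = families * p_at_most_three

print(pmf)
print(p_at_least_one, expected_at_least_one)
print(p_at_most_three, expected_at_most_three)
```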
Poisson distribution
The Poisson distribution was developed by the French mathematician Siméon Denis Poisson (1781-1840). A random variable X is said to follow a Poisson distribution if its probability distribution is given by
$$P(X = x) = \frac{e^{-\mu}\mu^{x}}{x!}, \qquad x = 0, 1, 2, \ldots$$
where
x is the number of successes
µ is the mean of the Poisson distribution
e = 2.71828 (the base of natural logarithms)
The random variable X counts the number of successes in a Poisson process. A Poisson process corresponds to a Bernoulli process under the following conditions:
the number of trials n is infinitely large, i.e. n → ∞
the constant probability of success p for each trial is infinitely small, i.e. p → 0
np = µ is finite
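Let us consider a Bernoulli process with n trials and probability of success in any trial p = µ/n, where µ > 0. Then the probability of x successes in n trials is given by
$$\begin{aligned}
P(X = x) &= \binom{n}{x}\left(\frac{\mu}{n}\right)^{x}\left(1 - \frac{\mu}{n}\right)^{n-x} \\
&= \frac{n!}{x!\,(n - x)!}\left(\frac{\mu}{n}\right)^{x}\left(1 - \frac{\mu}{n}\right)^{n-x} \\
&= \frac{n(n - 1)\cdots(n - (x - 1))}{x!}\left(\frac{\mu}{n}\right)^{x}\left(1 - \frac{\mu}{n}\right)^{n-x} \\
&= \frac{\mu^{x}}{x!}\left[\frac{n}{n}\cdot\frac{n - 1}{n}\cdot\frac{n - 2}{n}\cdots\frac{n - (x - 1)}{n}\right]\left(1 - \frac{\mu}{n}\right)^{n-x} \\
&= \frac{\mu^{x}}{x!}\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right)\cdots\left(1 - \frac{x - 1}{n}\right)\left(1 - \frac{\mu}{n}\right)^{n}\left(1 - \frac{\mu}{n}\right)^{-x}
\end{aligned}$$
Now as n → ∞, the product
$$\left[\frac{n}{n}\cdot\frac{n - 1}{n}\cdot\frac{n - 2}{n}\cdots\frac{n - (x - 1)}{n}\right] \to 1, \qquad \left(1 - \frac{\mu}{n}\right)^{-x} \to 1 \qquad \text{and} \qquad \left(1 - \frac{\mu}{n}\right)^{n} \to e^{-\mu}.$$
Thus, we have
$$P(X = x) = \frac{e^{-\mu}\mu^{x}}{x!}, \qquad x = 0, 1, 2, \ldots$$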
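The limit in this derivation can be observed numerically by holding µ = np fixed and letting n grow. The sketch below is not from the original notes; µ = 2 and x = 3 are arbitrary illustrative values and the function names are hypothetical.

```python
from math import comb, exp, factorial


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x, mu):
    """P(X = x) for a Poisson distribution with mean mu."""
    return exp(-mu) * mu**x / factorial(x)


mu, x = 2.0, 3                          # illustrative values only
target = poisson_pmf(x, mu)

# Keep np = mu fixed, so p = mu / n shrinks as n grows
errors = {}
for n in (10, 100, 1000, 10000):
    errors[n] = abs(binomial_pmf(x, n, mu / n) - target)
    print(n, round(binomial_pmf(x, n, mu / n), 6), round(target, 6))
```

The gap between the binomial and Poisson probabilities shrinks as n increases, which is the behaviour the derivation predicts.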
Characteristics of a Poisson Distribution

1. Expected Value or Mean
The expected value or the mean, denoted by µ, of a Poisson distribution is computed as
$$E(X) = \mu = \sum_{\text{all } x} x\,P(x)$$
2. Variance
The variance, denoted by σ², of a Poisson distribution is computed as
$$\mathrm{Var}(X) = \sigma^2 = E[(X - \mu)^2] = \sum_{\text{all } x} (x - \mu)^2 P(x)$$
3. Moments about the Origin

The rth moment about the origin, denoted by m'_r, of a Poisson distribution is computed as
$$m'_r = \sum_{\text{all } x} x^r\,p(x)$$
First moment about the origin will be
$$m'_1 = \sum_{\text{all } x} x\,p(x) = \mu$$
Second moment about the origin will be
$$m'_2 = \sum_{\text{all } x} x^2\,p(x) = \mu + \mu^2$$
4. Moments about the Mean

The rth moment about the mean, denoted by m^µ_r, of a Poisson distribution is computed as
$$m^{\mu}_r = \sum_{\text{all } x} (x - \mu)^r\,p(x)$$
First moment about the mean will be
$$m^{\mu}_1 = \sum_{\text{all } x} (x - \mu)\,p(x) = 0$$
Second moment about the mean will be
$$m^{\mu}_2 = \sum_{\text{all } x} (x - \mu)^2\,p(x) = \sigma^2 = \mu$$
Third moment about the mean will be
$$m^{\mu}_3 = \sum_{\text{all } x} (x - \mu)^3\,p(x) = \mu$$
Fourth moment about the mean will be
$$m^{\mu}_4 = \sum_{\text{all } x} (x - \mu)^4\,p(x) = 3\mu^2 + \mu$$
5. Skewness
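To bring out the skewness we can calculate the moment coefficient of skewness, γ1
$$\begin{aligned}
\gamma_1 &= \sqrt{\beta_1} \\
&= \sqrt{\frac{(m^{\mu}_3)^2}{(m^{\mu}_2)^3}} \\
&= \frac{m^{\mu}_3}{\left(\sqrt{m^{\mu}_2}\right)^3} \\
&= \frac{1}{\sqrt{\mu}}
\end{aligned}$$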
6. Kurtosis
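A measure of kurtosis of the Poisson distribution is given by the moment coefficient of kurtosis
$$\begin{aligned}
\gamma_2 &= \beta_2 - 3 \\
&= \frac{m^{\mu}_4}{(m^{\mu}_2)^2} - 3 \\
&= \frac{3\mu^2 + \mu}{\mu^2} - 3 \\
&= \frac{1}{\mu}
\end{aligned}$$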
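The moment coefficients of the Poisson distribution can be checked by evaluating the defining sums directly. The sketch below is not from the original notes: it truncates the infinite sums at 100 terms, which is an assumption that is adequate only for moderate µ, and µ = 2.5 is an arbitrary illustrative value.

```python
from math import exp, factorial, sqrt


def poisson_pmf(x, mu):
    """P(X = x) for a Poisson distribution with mean mu."""
    return exp(-mu) * mu**x / factorial(x)


def central_moment(r, mu, terms=100):
    """r-th moment about the mean, with the infinite sum truncated at `terms`."""
    return sum((x - mu) ** r * poisson_pmf(x, mu) for x in range(terms))


mu = 2.5                                 # illustrative value only
m2 = central_moment(2, mu)               # should be close to mu
m3 = central_moment(3, mu)               # should be close to mu
m4 = central_moment(4, mu)               # should be close to 3*mu^2 + mu

skewness = m3 / m2**1.5                  # gamma_1 = m3 / (sqrt(m2))^3
kurtosis = m4 / m2**2 - 3                # gamma_2 = m4 / m2^2 - 3

print(round(skewness, 6), round(1 / sqrt(mu), 6))
print(round(kurtosis, 6), round((3 * mu**2 + mu) / mu**2 - 3, 6))
```

Each line prints the value obtained from the sums next to the value obtained by substituting the closed-form moments into the same coefficient.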
7. Poisson approximation of the Binomial distribution

The Poisson distribution can reasonably approximate the Binomial distribution when n is very large and p is very small while np remains finite, i.e. when
n → ∞ and p → 0
Example
At a parking place the average number of car arrivals during a specified period of 15 minutes is 2. If the arrival process is well described by a Poisson process, find the probability that during a given period of 15 minutes
(a) no car will arrive
(b) at least two cars will arrive
(c) at most three cars will arrive
(d) between 1 and 3 cars (inclusive) will arrive
Solution:
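Let X denote the number of car arrivals during the specified period of 15 minutes. So X ∼ Pois(λ) with λ = 2, and
$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \qquad x = 0, 1, \ldots$$
(a)
$$P(\text{no car will arrive}) = P(X = 0) = \frac{e^{-2}2^{0}}{0!} = 0.1353$$
(b)
$$\begin{aligned}
P(\text{at least two cars will arrive}) &= P(X \ge 2) \\
&= 1 - [P(X = 0) + P(X = 1)] \\
&= 1 - \left[\frac{e^{-2}2^{0}}{0!} + \frac{e^{-2}2^{1}}{1!}\right] \\
&= 1 - [0.1353 + 0.2707] \\
&= 1 - 0.4060 \\
&= 0.5940
\end{aligned}$$
(c)
$$\begin{aligned}
P(\text{at most three cars will arrive}) &= P(X \le 3) \\
&= \sum_{x=0}^{3} \frac{e^{-2}2^{x}}{x!} \\
&= P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) \\
&= 0.8571
\end{aligned}$$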
(d)
$$\begin{aligned}
P(\text{between 1 and 3 cars will arrive}) &= P(1 \le X \le 3) \\
&= P(X \le 3) - P(X = 0) \\
&= \sum_{x=0}^{3} \frac{e^{-2}2^{x}}{x!} - \frac{e^{-2}2^{0}}{0!} \\
&= 0.8571 - 0.1353 \\
&= 0.7218
\end{aligned}$$
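The four probabilities in the parking example can be reproduced with a short script. This is a minimal sketch, not part of the original notes; the function and variable names are illustrative.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with mean lam."""
    return exp(-lam) * lam**x / factorial(x)


lam = 2  # average number of arrivals per 15-minute period

p_none = poisson_pmf(0, lam)                                      # (a) P(X = 0)
p_at_least_two = 1 - (poisson_pmf(0, lam) + poisson_pmf(1, lam))  # (b) P(X >= 2)
p_at_most_three = sum(poisson_pmf(x, lam) for x in range(4))      # (c) P(X <= 3)
p_one_to_three = p_at_most_three - p_none                         # (d) P(1 <= X <= 3)

for label, value in [
    ("(a)", p_none),
    ("(b)", p_at_least_two),
    ("(c)", p_at_most_three),
    ("(d)", p_one_to_three),
]:
    print(label, round(value, 4))
```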
