Computational Mathematics Open University

Computational Mathematics
OUbs033112
Table of Contents
UNIT 1 – Matrices
UNIT 2 – Solving Systems of Linear Equations
UNIT 3 – Permutations and Combinations
UNIT 4 – Probability Distributions
UNIT 5 – Correlation and Regression
UNIT 6 – Sets, Relations and Functions
UNIT 7 – Graph Theory
UNIT 8 – Randomness
Open University of Mauritius – Computational Mathematics

UNIT 1 – Matrices
UNIT STRUCTURE
1.1 Introduction
1.2 Learning Outcomes
1.3 Definition and Symbols
1.4 Types of Matrices
1.5 Matrix Operations
1.6 Minors, Cofactors, Adjoints, and Cofactor Expansion
1.7 Activity
1.1 Introduction
In this Unit, you will be introduced to matrices. The aim of Unit 1 is to explain how to manipulate matrices. Matrices are useful in computer graphics. For instance, to rotate an object, you store the coordinates of the vertices of the object and multiply each coordinate vector by a matrix that performs the rotation.
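To make the computer-graphics remark concrete, here is a minimal Python sketch. It is an illustration only: it assumes 2-D vertices, rotation anticlockwise about the origin by an angle given in radians, and the function names are hypothetical. Each vertex (x, y) is treated as a column vector and multiplied by the standard rotation matrix.

```python
import math

def rotation_matrix(theta):
    """2 x 2 matrix that rotates a point anticlockwise by theta radians about the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s, c]]

def rotate_vertices(vertices, theta):
    """Multiply each vertex (x, y), viewed as a column vector, by the rotation matrix."""
    R = rotation_matrix(theta)
    rotated = []
    for x, y in vertices:
        rotated.append((R[0][0] * x + R[0][1] * y,
                        R[1][0] * x + R[1][1] * y))
    return rotated

# Hypothetical unit square rotated by 90 degrees (pi / 2 radians).
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
for x, y in rotate_vertices(square, math.pi / 2):
    print(round(x, 6), round(y, 6))
```

Plain Python lists are used so that the example needs no extra libraries; in practice a numerical library would normally carry out the multiplication.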

1.2 Learning Outcomes
By the end of this Unit, you should be able to do the following:
1. Identify the different types of matrices.
2. Add, subtract, multiply and transpose matrices.
3. Define the inverse of a matrix.
1.3 Definition and Symbols

A matrix is defined as a rectangular array of numbers, symbols or expressions. Most of the time we will have numbers rather than expressions or symbols. These numbers are referred to as the entries or elements of the matrix. A matrix consists of rows and columns, and an element can only be placed where a row meets a column.
We shall denote the total number of rows of a matrix by m. Clearly m is a positive integer. (Does it make sense to say that there are 4.16 rows?)
We shall denote the total number of columns of a matrix by n. Clearly n is also a positive integer. (Does it make sense to say that there are 3.7 columns?)
The order (also called the size or dimension) of the matrix will be denoted by (m x n), which we read as "m by n".
The elements of the matrix will be denoted by a. Note that a is not necessarily an integer: it can be an integer, a real number or even a complex number.
We shall label the rows of the matrix as Row 1 or R1, Row 2 or R2, …, Row i or Ri, …, Row m or Rm. Note that the first row is at the top and the last row is at the bottom.
We shall label the columns of the matrix as Column 1 or C1, Column 2 or C2, …, Column j or Cj, …, Column n or Cn. Note that the first column is at the extreme left and the last column is at the extreme right.

Recall that the elements a of a matrix can only be placed at the intersection of a row and a column. We shall use the symbol $a_{ij}$ to represent the element placed where the ith row meets the jth column. For example, $a_{11}$ is the element placed at the intersection of the first row and the first column. $a_{11}$ is also called the first element of the matrix, as it is placed in the top-left corner.
$a_{23}$ is the element placed at the intersection of the second row (R2) and the third column (C3).
Hence the (m x n) matrix, which we will name A, looks like
$$ A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} $$
We can also represent this matrix A by $A = (a_{ij})$, i = 1, 2, 3, …, m and j = 1, 2, 3, …, n.
An example of a (2 x 3) matrix B is
$$ B = \begin{pmatrix} 2 & 3.5 & -45 \\ \frac{1}{2} & -6.7 & 2 \end{pmatrix} $$
In matrix B:
There are 2 rows and 3 columns. Here m = 2 and n = 3.
The order (size, dimension) of B, also written as Dim(B), is (2 x 3).
The first row is R1 = (2 3.5 −45).
The second row is R2 = (1/2 −6.7 2).
The first column is $C_1 = \begin{pmatrix} 2 \\ \frac{1}{2} \end{pmatrix}$.
The second column is $C_2 = \begin{pmatrix} 3.5 \\ -6.7 \end{pmatrix}$.
The third column is $C_3 = \begin{pmatrix} -45 \\ 2 \end{pmatrix}$.
$a_{11} = 2$, $a_{12} = 3.5$, $a_{13} = -45$
$a_{21} = 1/2$, $a_{22} = -6.7$, $a_{23} = 2$
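Python numbers list positions from 0, whereas the text numbers rows and columns from 1. The small sketch below stores B as a list of rows and converts the text's $a_{ij}$ notation into Python indexing; the helper names `order`, `element`, `row` and `column` are hypothetical, and 1/2 is stored as 0.5.

```python
# Matrix B from the example, stored as a list of rows.
B = [[2, 3.5, -45],
     [0.5, -6.7, 2]]

def order(M):
    """Return the order (m, n) of a matrix stored as a list of equal-length rows."""
    return len(M), len(M[0])

def element(M, i, j):
    """Return a_ij using the 1-based row and column numbering of the text."""
    return M[i - 1][j - 1]

def row(M, i):
    """Return row Ri (1-based)."""
    return M[i - 1]

def column(M, j):
    """Return column Cj (1-based) as a list."""
    return [r[j - 1] for r in M]

print(order(B))          # (2, 3)
print(element(B, 1, 3))  # -45
print(column(B, 2))      # [3.5, -6.7]
```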

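Activity 1
$$ A = \begin{pmatrix} 2 & -3.1 \\ 5 & 6 \\ -7 & 8 \\ 2 & 1 \end{pmatrix} $$
(a) What is the order of matrix A?
(b) Write R3.
(c) Write C2.
(d) What is the value of the element $a_{22}$?
(e) What is the value of the element $a_{31}$?
Answer:
(a) (4 x 2)
(b) R3 = (−7 8)
(c) $C_2 = \begin{pmatrix} -3.1 \\ 6 \\ 8 \\ 1 \end{pmatrix}$
(d) 6
(e) −7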
1.4 Types of Matrices
A Row Matrix is a matrix with one row. Example: (2 5 4) has one row and three columns.
A Column Matrix is a matrix with one column. Example: $\begin{pmatrix} -7 \\ 10 \end{pmatrix}$ has two rows and one column. A column matrix is also called a (column) vector.
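A Square Matrix is one that has an equal number of rows and columns (m = n). For example,
$$ \begin{pmatrix} 1 & 4 & 2 \\ -12 & 5 & 5 \\ 5 & 3 & 8 \end{pmatrix} $$
has three rows and three columns.
The main diagonal of an (n x n) square matrix consists of the elements $a_{11}, a_{22}, a_{33}, \ldots, a_{nn}$ (shown in bold):
$$ A = \begin{pmatrix} \mathbf{a_{11}} & a_{12} & \cdots & a_{1n} \\ a_{21} & \mathbf{a_{22}} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & \mathbf{a_{nn}} \end{pmatrix} $$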
In the matrix $A = \begin{pmatrix} 2 & 3 \\ -1 & 5 \end{pmatrix}$ the elements 2 and 5 lie on the main diagonal.
In the matrix $B = \begin{pmatrix} 1 & 5 & 7 \\ 10 & 2 & -15 \\ 8 & 9 & 3 \end{pmatrix}$ the elements 1, 2 and 3 lie on the main diagonal.
An identity matrix is a square matrix in which all the elements on the main diagonal have a value of one and all the remaining elements have a value of zero:
$$ I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \qquad I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} $$
$I_2$ is a (2 x 2) identity matrix.
$I_3$ is a (3 x 3) identity matrix.
Note that IA = A and AI = A, provided the identity matrix has the right order for the product to exist.
A diagonal matrix is a square matrix in which the elements that are not on the main diagonal are zero while at least one element on the main diagonal is non-zero.
$$ D = \begin{pmatrix} 2 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 3 \end{pmatrix} $$
A zero matrix is one in which all the elements have a value of zero.
A lower triangular matrix is a square matrix in which all the elements above the main diagonal are zero.
$$ L = \begin{pmatrix} 2 & 0 & 0 \\ 5 & -1 & 0 \\ 6 & 1 & 3 \end{pmatrix} $$
An upper triangular matrix is a square matrix in which all the elements below the main diagonal are zero.
$$ U = \begin{pmatrix} 2 & 60 & 8 \\ 0 & -1 & 9 \\ 0 & 0 & 3 \end{pmatrix} $$
A square matrix A is said to be invertible if there exists a matrix $A^{-1}$ such that $AA^{-1} = A^{-1}A = I$.
A singular matrix is a square matrix that is not invertible.
Two matrices are equal if the elements in the corresponding positions are equal. Clearly, equal matrices must have the same order.
If $\begin{pmatrix} x & -2 \\ 4 & y \end{pmatrix} = \begin{pmatrix} 3 & -2 \\ z & 1 \end{pmatrix}$, then x = 3, y = 1 and z = 4.

The transpose of matrix A is written as $A^T$.
If the order of matrix A is (m x n), then the order of matrix $A^T$ is (n x m).
The elements of the first row of A are the elements of the first column of $A^T$, the elements of the second row of A are the elements of the second column of $A^T$, and so on; in general, the (i, j) element of $A^T$ is $a_{ji}$. Note that $(A^T)^T = A$.
$$ \text{If } A = \begin{pmatrix} 2 & 3 & -5 \\ 0.4 & 7.1 & 1 \end{pmatrix} \text{ then } A^T = \begin{pmatrix} 2 & 0.4 \\ 3 & 7.1 \\ -5 & 1 \end{pmatrix} $$
A square matrix A is said to be symmetric if $A^T = A$.
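A short Python sketch of the transpose and of the symmetry test, assuming matrices are stored as lists of rows (the function names are illustrative, not from the text):

```python
def transpose(M):
    """Row i of M becomes column i of the result, so element (i, j) moves to (j, i)."""
    return [list(col) for col in zip(*M)]

def is_symmetric(M):
    """A matrix is symmetric when it equals its own transpose (so it must be square)."""
    return M == transpose(M)

A = [[2, 3, -5],
     [0.4, 7.1, 1]]
print(transpose(A))                    # [[2, 0.4], [3, 7.1], [-5, 1]]
print(transpose(transpose(A)) == A)    # True
print(is_symmetric([[1, 7], [7, 2]]))  # True
```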
1.5 Matrix Operations
Addition and Subtraction

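Condition: only matrices having exactly the same order can be added or subtracted. Elements in the same position are added or subtracted.
$$ A = \begin{pmatrix} 2 & -3 & 4 \\ 5 & 1 & 6 \end{pmatrix} \quad B = \begin{pmatrix} 10 & 20 & 0 \\ 7 & 8 & 9 \end{pmatrix} \quad C = \begin{pmatrix} 3 \\ 6 \end{pmatrix} $$
Note that matrices A and B have the same order (2 x 3), so they can be added or subtracted. Since the order of matrix C is (2 x 1), we cannot add or subtract A and C. Similarly, we cannot add or subtract B and C.
$$ A + B = B + A = \begin{pmatrix} 2+10 & -3+20 & 4+0 \\ 5+7 & 1+8 & 6+9 \end{pmatrix} = \begin{pmatrix} 12 & 17 & 4 \\ 12 & 9 & 15 \end{pmatrix} $$
Note that although A + B = B + A, in general A − B ≠ B − A; in fact B − A = −(A − B).
$$ A - B = \begin{pmatrix} 2-10 & -3-20 & 4-0 \\ 5-7 & 1-8 & 6-9 \end{pmatrix} = \begin{pmatrix} -8 & -23 & 4 \\ -2 & -7 & -3 \end{pmatrix} $$
$$ B - A = \begin{pmatrix} 10-2 & 20-(-3) & 0-4 \\ 7-5 & 8-1 & 9-6 \end{pmatrix} = \begin{pmatrix} 8 & 23 & -4 \\ 2 & 7 & 3 \end{pmatrix} $$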
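The same-order condition and the element-by-element rule can be sketched in Python as follows (a minimal illustration assuming list-of-rows storage; `same_order`, `add` and `subtract` are hypothetical helper names):

```python
def same_order(A, B):
    """True when A and B have the same number of rows and each pair of rows has the same length."""
    return len(A) == len(B) and all(len(ra) == len(rb) for ra, rb in zip(A, B))

def add(A, B):
    """Element-wise sum; only defined when A and B have the same order."""
    if not same_order(A, B):
        raise ValueError("matrices must have the same order")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def subtract(A, B):
    """Element-wise difference A - B; only defined when A and B have the same order."""
    if not same_order(A, B):
        raise ValueError("matrices must have the same order")
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[2, -3, 4], [5, 1, 6]]
B = [[10, 20, 0], [7, 8, 9]]
C = [[3], [6]]
print(add(A, B))       # [[12, 17, 4], [12, 9, 15]]
print(subtract(A, B))  # [[-8, -23, 4], [-2, -7, -3]]
print(subtract(B, A))  # [[8, 23, -4], [2, 7, 3]]
```

Calling `add(A, C)` raises a `ValueError`, mirroring the statement that A and C cannot be added.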
Scalar Multiplication
Suppose that k is a number and A is a matrix. Then kA is a matrix of exactly the same order as A, obtained by multiplying every element of A by k. For example, 2A means that we multiply all the elements of A by 2.
$$ \text{If } A = \begin{pmatrix} 5 & -3 & 4 \\ 7 & 1 & 6 \end{pmatrix} \text{ then } 2A = \begin{pmatrix} 2\times5 & 2\times(-3) & 2\times4 \\ 2\times7 & 2\times1 & 2\times6 \end{pmatrix} = \begin{pmatrix} 10 & -6 & 8 \\ 14 & 2 & 12 \end{pmatrix} $$
Activity 2
$$ A = \begin{pmatrix} 9 & 5 \\ -1 & 4 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 6 & 10 \\ 0 & 7 \end{pmatrix} $$
Write the matrix (1) 2A
(2) 3B
(3) 2A + 3B
(4) 2B - 3A
Answer
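$$ 2A = \begin{pmatrix} 18 & 10 \\ -2 & 8 \end{pmatrix} \qquad 3B = \begin{pmatrix} 18 & 30 \\ 0 & 21 \end{pmatrix} $$
$$ 2A + 3B = \begin{pmatrix} 36 & 40 \\ -2 & 29 \end{pmatrix} $$
$$ 2B - 3A = \begin{pmatrix} -15 & 5 \\ 3 & 2 \end{pmatrix} $$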
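As a quick check of the Activity 2 answers, here is a minimal sketch of scalar multiplication and of combinations such as 2A + 3B. The helper names are hypothetical, and both matrices are assumed to have the same order (`linear_combination` does not check this).

```python
def scalar_multiply(k, A):
    """Multiply every element of A by the number k; the order is unchanged."""
    return [[k * a for a in row] for row in A]

def linear_combination(p, A, q, B):
    """Return pA + qB for two matrices assumed to have the same order."""
    return [[p * a + q * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[9, 5], [-1, 4]]
B = [[6, 10], [0, 7]]
print(scalar_multiply(2, A))            # [[18, 10], [-2, 8]]
print(linear_combination(2, A, 3, B))   # [[36, 40], [-2, 29]]
print(linear_combination(-3, A, 2, B))  # [[-15, 5], [3, 2]]
```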
Matrix Multiplication
When we multiply two matrices, the product is a matrix. However, it is not always possible to multiply two matrices. If A and B are two matrices, then for the product AB to exist, the number of columns of matrix A must be equal to the number of rows of matrix B.
Note that in general AB ≠ BA. The existence of AB does not guarantee that BA also exists.
Suppose that matrix A has order (m x n) and matrix B has order (p x q). The product AB exists only if n = p.
If AB exists, then its order is (m x q).
Rows of matrix A are multiplied by columns of matrix B.
For example:
Matrix A has order (2 x 3) and matrix B has order (5 x 6). The product AB does not exist because 3 ≠ 5.
Matrix A has order (1 x 4) and matrix B has order (4 x 2). The product AB exists because 4 = 4. The order of the matrix AB is (1 x 2).
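As we said, to obtain the product AB, rows of matrix A are multiplied by columns of matrix B. When we multiply a row by a column with the same number of elements, the answer is a single number. Therefore, we repeat the following operations:
(1) multiply the row (a b) by a column with two elements:
$$ \begin{pmatrix} a & b \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = (ax + by) $$
(2) multiply the row (a b c) by a column with three elements:
$$ \begin{pmatrix} a & b & c \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = (ax + by + cz) $$
For example
$$ \begin{pmatrix} 1 & 2 \end{pmatrix}\begin{pmatrix} 5 \\ 6 \end{pmatrix} = (1\times5 + 2\times6) = (17) $$
$$ \begin{pmatrix} 1 & 2 & 3 \end{pmatrix}\begin{pmatrix} 5 \\ 6 \\ 7 \end{pmatrix} = (1\times5 + 2\times6 + 3\times7) = (38) $$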
$$ A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \qquad B = \begin{pmatrix} -2 \\ 0 \\ 10 \end{pmatrix} $$
The matrix A has order (2 x 3) and the matrix B has order (3 x 1). The product AB exists because the number of columns of A (3) is equal to the number of rows of B (3). The order of AB is (2 x 1).
$$ AB = \begin{pmatrix} 1\times(-2) + 2\times0 + 3\times10 \\ 4\times(-2) + 5\times0 + 6\times10 \end{pmatrix} = \begin{pmatrix} 28 \\ 52 \end{pmatrix} $$
Note that B has three rows and one column while A has two rows and three columns. The product BA does not exist because the number of columns of B (1) is not equal to the number of rows of A (2).
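The row-by-column rule, together with the existence condition, can be sketched as a small Python function (an illustration assuming list-of-rows storage; `matmul` is a hypothetical name):

```python
def matmul(A, B):
    """Product AB: element (i, j) is row i of A multiplied by column j of B."""
    m, n = len(A), len(A[0])
    p, q = len(B), len(B[0])
    if n != p:
        raise ValueError(f"AB does not exist: A is ({m} x {n}) and B is ({p} x {q})")
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(q)]
            for i in range(m)]

A = [[1, 2, 3], [4, 5, 6]]
B = [[-2], [0], [10]]
print(matmul(A, B))  # [[28], [52]]
# matmul(B, A) raises ValueError: B is (3 x 1) and A is (2 x 3), and 1 != 2.
```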

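The matrix $A^2$ is obtained by multiplying matrix A by matrix A, so A must be a square matrix. It is a common mistake to square each element of A to obtain $A^2$; the rule for matrix multiplication must be applied.
For example, if
$$ A = \begin{pmatrix} 1 & 2 \\ -4 & 5 \end{pmatrix} $$
then
$$ A^2 = \begin{pmatrix} 1 & 2 \\ -4 & 5 \end{pmatrix}\begin{pmatrix} 1 & 2 \\ -4 & 5 \end{pmatrix} = \begin{pmatrix} 1\times1 + 2\times(-4) & 1\times2 + 2\times5 \\ -4\times1 + 5\times(-4) & -4\times2 + 5\times5 \end{pmatrix} = \begin{pmatrix} -7 & 12 \\ -24 & 17 \end{pmatrix} $$
Similarly, $A^3 = A^2A$, $A^4 = A^2A^2$, …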
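To see the difference between $A^2$ and squaring each element, the sketch below computes both for the example matrix (assumptions: list-of-rows storage and integer powers k ≥ 1; the function names are hypothetical):

```python
def matmul(A, B):
    """Row-by-column product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matrix_power(A, k):
    """A^k for a square matrix A and integer k >= 1, by repeated multiplication."""
    result = A
    for _ in range(k - 1):
        result = matmul(result, A)
    return result

A = [[1, 2], [-4, 5]]
print(matrix_power(A, 2))                    # [[-7, 12], [-24, 17]]
print([[a ** 2 for a in row] for row in A])  # [[1, 4], [16, 25]]  (NOT A^2)
```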
Determinant of a 2 x 2 Matrix
For
$$ A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} $$
the determinant of A, written |A| or det A, is obtained as follows:
$$ |A| = a_{11}a_{22} - a_{12}a_{21} $$
For example, if
$$ A = \begin{pmatrix} 10 & 1 \\ -2 & 20 \end{pmatrix} $$
then |A| = (10 × 20) − (1 × (−2)) = 200 + 2 = 202.
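Determinant of a 3 x 3 Matrix
For
$$ A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} $$
the determinant is
$$ |A| = a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix} $$
$$ |A| = a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31}) $$
Example
$$ A = \begin{pmatrix} 1 & 2 & 3 \\ 7 & 8 & 9 \\ -4 & 6 & 5 \end{pmatrix} $$
|A| = 1(8×5 − 9×6) − 2(7×5 − 9×(−4)) + 3(7×6 − 8×(−4)) = 1(−14) − 2(71) + 3(74) = −14 − 142 + 222 = 66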
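The two determinant formulas translate directly into code. A minimal sketch (the function names `det2` and `det3` are hypothetical) that reproduces the two worked values of this section:

```python
def det2(A):
    """|A| = a11*a22 - a12*a21 for a 2 x 2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def det3(A):
    """Expansion along the first row of a 3 x 3 matrix."""
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = A
    return (a11 * (a22 * a33 - a23 * a32)
            - a12 * (a21 * a33 - a23 * a31)
            + a13 * (a21 * a32 - a22 * a31))

print(det2([[10, 1], [-2, 20]]))                 # 202
print(det3([[1, 2, 3], [7, 8, 9], [-4, 6, 5]]))  # 66
```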

1.6 Minors, Cofactors, Adjoints and Cofactor Expansion
Cofactor expansion is a common technique used to find the determinant of a square matrix.
Minors
A minor is a determinant. A square matrix of order (m x m) has m × m = m² minors, one for each element. To find the minor $M_{ij}$ of the element $a_{ij}$, proceed as follows:
Step 1: Delete row i and column j.
Step 2: The minor $M_{ij}$ is the determinant of the matrix that remains after deleting the ith row and the jth column.
Cofactors
The cofactor $C_{ij}$ of the element $a_{ij}$ is defined as $C_{ij} = (-1)^{i+j}M_{ij}$.
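Matrix of Minors
The (3 x 3) matrix of minors is defined as
$$ M = \begin{pmatrix} M_{11} & M_{12} & M_{13} \\ M_{21} & M_{22} & M_{23} \\ M_{31} & M_{32} & M_{33} \end{pmatrix} $$
Matrix of Cofactors
The (3 x 3) matrix of cofactors is defined as
$$ C = \begin{pmatrix} C_{11} & C_{12} & C_{13} \\ C_{21} & C_{22} & C_{23} \\ C_{31} & C_{32} & C_{33} \end{pmatrix} $$
Adjoint
The adjoint is the transpose of the matrix of cofactors:
$$ \operatorname{adj}(A) = C^T = \begin{pmatrix} C_{11} & C_{12} & C_{13} \\ C_{21} & C_{22} & C_{23} \\ C_{31} & C_{32} & C_{33} \end{pmatrix}^T = \begin{pmatrix} C_{11} & C_{21} & C_{31} \\ C_{12} & C_{22} & C_{32} \\ C_{13} & C_{23} & C_{33} \end{pmatrix} $$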
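Example
Find the minors and cofactors of the matrix A.
$$ A = \begin{pmatrix} 9 & 8 & 7 \\ 6 & 5 & 4 \\ 1 & 2 & 3 \end{pmatrix} $$
Write the matrix of minors.
Write the matrix of cofactors.
Write the adjoint of matrix A.
Solution
To find the minor $M_{11}$, we delete the first row and first column of matrix A and then calculate the determinant of the remaining matrix:
$$ M_{11} = \begin{vmatrix} 5 & 4 \\ 2 & 3 \end{vmatrix} = 5\times3 - 4\times2 = 7, \qquad C_{11} = (-1)^{1+1}M_{11} = 1\times7 = 7 $$
To find the minor $M_{12}$, we delete the first row and second column of matrix A and then calculate the determinant of the remaining matrix:
$$ M_{12} = \begin{vmatrix} 6 & 4 \\ 1 & 3 \end{vmatrix} = 6\times3 - 4\times1 = 14, \qquad C_{12} = (-1)^{1+2}M_{12} = -1\times14 = -14 $$
To find the minor $M_{13}$, we delete the first row and third column of matrix A and then calculate the determinant of the remaining matrix:
$$ M_{13} = \begin{vmatrix} 6 & 5 \\ 1 & 2 \end{vmatrix} = 6\times2 - 5\times1 = 7, \qquad C_{13} = (-1)^{1+3}M_{13} = 1\times7 = 7 $$
To find the minor $M_{21}$, we delete the second row and first column of matrix A and then calculate the determinant of the remaining matrix:
$$ M_{21} = \begin{vmatrix} 8 & 7 \\ 2 & 3 \end{vmatrix} = 8\times3 - 7\times2 = 10, \qquad C_{21} = (-1)^{2+1}M_{21} = -1\times10 = -10 $$
To find the minor $M_{22}$, we delete the second row and second column of matrix A and then calculate the determinant of the remaining matrix:
$$ M_{22} = \begin{vmatrix} 9 & 7 \\ 1 & 3 \end{vmatrix} = 9\times3 - 7\times1 = 20, \qquad C_{22} = (-1)^{2+2}M_{22} = 1\times20 = 20 $$
To find the minor $M_{23}$, we delete the second row and third column of matrix A and then calculate the determinant of the remaining matrix:
$$ M_{23} = \begin{vmatrix} 9 & 8 \\ 1 & 2 \end{vmatrix} = 9\times2 - 8\times1 = 10, \qquad C_{23} = (-1)^{2+3}M_{23} = -1\times10 = -10 $$
To find the minor $M_{31}$, we delete the third row and first column of matrix A and then calculate the determinant of the remaining matrix:
$$ M_{31} = \begin{vmatrix} 8 & 7 \\ 5 & 4 \end{vmatrix} = 8\times4 - 7\times5 = -3, \qquad C_{31} = (-1)^{3+1}M_{31} = 1\times(-3) = -3 $$
To find the minor $M_{32}$, we delete the third row and second column of matrix A and then calculate the determinant of the remaining matrix:
$$ M_{32} = \begin{vmatrix} 9 & 7 \\ 6 & 4 \end{vmatrix} = 9\times4 - 7\times6 = -6, \qquad C_{32} = (-1)^{3+2}M_{32} = -1\times(-6) = 6 $$
To find the minor $M_{33}$, we delete the third row and third column of matrix A and then calculate the determinant of the remaining matrix:
$$ M_{33} = \begin{vmatrix} 9 & 8 \\ 6 & 5 \end{vmatrix} = 9\times5 - 8\times6 = -3, \qquad C_{33} = (-1)^{3+3}M_{33} = 1\times(-3) = -3 $$
Therefore, the matrix of minors is
$$ M = \begin{pmatrix} 7 & 14 & 7 \\ 10 & 20 & 10 \\ -3 & -6 & -3 \end{pmatrix} $$
Therefore, the matrix of cofactors is
$$ C = \begin{pmatrix} 7 & -14 & 7 \\ -10 & 20 & -10 \\ -3 & 6 & -3 \end{pmatrix} $$
The adjoint of matrix A is
$$ \operatorname{adj}(A) = C^T = \begin{pmatrix} 7 & -14 & 7 \\ -10 & 20 & -10 \\ -3 & 6 & -3 \end{pmatrix}^T = \begin{pmatrix} 7 & -10 & -3 \\ -14 & 20 & 6 \\ 7 & -10 & -3 \end{pmatrix} $$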
Determinant of Matrix
The determinant of matrix A can be obtained by cofactor expansion along any row or any column of A. For instance:
if you use the first row, then $|A| = a_{11}C_{11} + a_{12}C_{12} + a_{13}C_{13}$;
if you use the second row, then $|A| = a_{21}C_{21} + a_{22}C_{22} + a_{23}C_{23}$;
if you use the third row, then $|A| = a_{31}C_{31} + a_{32}C_{32} + a_{33}C_{33}$;
if you use the first column, then $|A| = a_{11}C_{11} + a_{21}C_{21} + a_{31}C_{31}$;
if you use the second column, then $|A| = a_{12}C_{12} + a_{22}C_{22} + a_{32}C_{32}$;
if you use the third column, then $|A| = a_{13}C_{13} + a_{23}C_{23} + a_{33}C_{33}$.
Example
Find the determinant of the matrix A.
$$ A = \begin{pmatrix} 9 & 8 & 7 \\ 6 & 5 & 4 \\ 1 & 2 & 3 \end{pmatrix} $$
Solution
We found that the matrix of cofactors is
$$ C = \begin{pmatrix} 7 & -14 & 7 \\ -10 & 20 & -10 \\ -3 & 6 & -3 \end{pmatrix} $$
Using the first row of matrix A, |A| = (9 × 7) + (8 × (−14)) + (7 × 7) = 63 − 112 + 49 = 0.
Using the second row of matrix A, |A| = (6 × (−10)) + (5 × 20) + (4 × (−10)) = −60 + 100 − 40 = 0.
You can try any other row or column of matrix A; you will always get the same value for the determinant of A.
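The whole procedure (delete a row and a column, take the determinant, attach the sign $(-1)^{i+j}$, transpose) can be sketched recursively in Python. This is an illustration assuming list-of-rows storage and n ≥ 2; the function names are hypothetical, and recursive cofactor expansion is only practical for small matrices because its cost grows roughly like n!.

```python
def minor_matrix(A, i, j):
    """Matrix left after deleting row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by cofactor expansion along the first row, for any n x n matrix."""
    if len(A) == 1:
        return A[0][0]
    # With 0-based column j, the sign (-1)**j equals (-1)**(1 + (j + 1)) in the text's numbering.
    return sum((-1) ** j * A[0][j] * det(minor_matrix(A, 0, j)) for j in range(len(A)))

def minors(A):
    """Matrix of minors (n >= 2): entry (i, j) is M_ij."""
    n = len(A)
    return [[det(minor_matrix(A, i, j)) for j in range(n)] for i in range(n)]

def cofactors(A):
    """Matrix of cofactors: C_ij = (-1)**(i + j) * M_ij (0-based indexing leaves the sign unchanged)."""
    M = minors(A)
    n = len(A)
    return [[(-1) ** (i + j) * M[i][j] for j in range(n)] for i in range(n)]

def adjoint(A):
    """adj(A) is the transpose of the matrix of cofactors."""
    return [list(col) for col in zip(*cofactors(A))]

A = [[9, 8, 7], [6, 5, 4], [1, 2, 3]]
print(minors(A))     # [[7, 14, 7], [10, 20, 10], [-3, -6, -3]]
print(cofactors(A))  # [[7, -14, 7], [-10, 20, -10], [-3, 6, -3]]
print(adjoint(A))    # [[7, -10, -3], [-14, 20, 6], [7, -10, -3]]
print(det(A))        # 0
```

The same functions can be used to check the answers to the Activity questions in Section 1.7.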
Inverse of Matrix
If a square matrix A is NOT singular, that is, if the determinant of A is NOT zero, then we can find the inverse of A, which we write as $A^{-1}$, using the formula
$$ A^{-1} = \frac{1}{|A|}\operatorname{adj}(A) $$
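Example
Find the minors and cofactors of the matrix A.
$$ A = \begin{pmatrix} -1 & 1 & 2 \\ 2 & -2 & 1 \\ 1 & 4 & -1 \end{pmatrix} $$
Write the matrix of minors.
Write the matrix of cofactors.
Write the adjoint of matrix A.
Find the determinant of matrix A.
Find the inverse of matrix A.
Solution
$$ M_{11} = \begin{vmatrix} -2 & 1 \\ 4 & -1 \end{vmatrix} = (-2)(-1) - (1)(4) = -2, \qquad C_{11} = (-1)^{1+1}M_{11} = 1\times(-2) = -2 $$
$$ M_{12} = \begin{vmatrix} 2 & 1 \\ 1 & -1 \end{vmatrix} = (2)(-1) - (1)(1) = -3, \qquad C_{12} = (-1)^{1+2}M_{12} = -1\times(-3) = 3 $$
$$ M_{13} = \begin{vmatrix} 2 & -2 \\ 1 & 4 \end{vmatrix} = (2)(4) - (-2)(1) = 10, \qquad C_{13} = (-1)^{1+3}M_{13} = 1\times10 = 10 $$
$$ M_{21} = \begin{vmatrix} 1 & 2 \\ 4 & -1 \end{vmatrix} = (1)(-1) - (2)(4) = -9, \qquad C_{21} = (-1)^{2+1}M_{21} = -1\times(-9) = 9 $$
$$ M_{22} = \begin{vmatrix} -1 & 2 \\ 1 & -1 \end{vmatrix} = (-1)(-1) - (2)(1) = -1, \qquad C_{22} = (-1)^{2+2}M_{22} = 1\times(-1) = -1 $$
$$ M_{23} = \begin{vmatrix} -1 & 1 \\ 1 & 4 \end{vmatrix} = (-1)(4) - (1)(1) = -5, \qquad C_{23} = (-1)^{2+3}M_{23} = -1\times(-5) = 5 $$
$$ M_{31} = \begin{vmatrix} 1 & 2 \\ -2 & 1 \end{vmatrix} = (1)(1) - (2)(-2) = 5, \qquad C_{31} = (-1)^{3+1}M_{31} = 1\times5 = 5 $$
$$ M_{32} = \begin{vmatrix} -1 & 2 \\ 2 & 1 \end{vmatrix} = (-1)(1) - (2)(2) = -5, \qquad C_{32} = (-1)^{3+2}M_{32} = -1\times(-5) = 5 $$
$$ M_{33} = \begin{vmatrix} -1 & 1 \\ 2 & -2 \end{vmatrix} = (-1)(-2) - (1)(2) = 0, \qquad C_{33} = (-1)^{3+3}M_{33} = 1\times0 = 0 $$
Therefore, the matrix of minors is
$$ M = \begin{pmatrix} -2 & -3 & 10 \\ -9 & -1 & -5 \\ 5 & -5 & 0 \end{pmatrix} $$
Therefore, the matrix of cofactors is
$$ C = \begin{pmatrix} -2 & 3 & 10 \\ 9 & -1 & 5 \\ 5 & 5 & 0 \end{pmatrix} $$
The adjoint of matrix A is
$$ \operatorname{adj}(A) = C^T = \begin{pmatrix} -2 & 9 & 5 \\ 3 & -1 & 5 \\ 10 & 5 & 0 \end{pmatrix} $$
Determinant of Matrix
Using the first row,
$$ |A| = a_{11}C_{11} + a_{12}C_{12} + a_{13}C_{13} = (-1)(-2) + (1)(3) + (2)(10) = 25 $$
Inverse of Matrix
The inverse of matrix A is
$$ A^{-1} = \frac{1}{|A|}\operatorname{adj}(A) = \frac{1}{25}\begin{pmatrix} -2 & 9 & 5 \\ 3 & -1 & 5 \\ 10 & 5 & 0 \end{pmatrix} = \begin{pmatrix} -0.08 & 0.36 & 0.2 \\ 0.12 & -0.04 & 0.2 \\ 0.4 & 0.2 & 0 \end{pmatrix} $$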
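The formula $A^{-1} = \frac{1}{|A|}\operatorname{adj}(A)$ can be checked with exact fractions. A minimal sketch (helper names are hypothetical; Python's standard `fractions.Fraction` keeps entries such as 9/25 exact, and the function refuses singular matrices):

```python
from fractions import Fraction

def minor_matrix(A, i, j):
    """Delete row i and column j (0-based) from A."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Cofactor expansion along the first row; the empty matrix has determinant 1."""
    if not A:
        return 1
    return sum((-1) ** j * A[0][j] * det(minor_matrix(A, 0, j)) for j in range(len(A)))

def inverse(A):
    """A^{-1} = adj(A) / |A| with exact fractions; raises ValueError if |A| = 0."""
    n = len(A)
    d = det(A)
    if d == 0:
        raise ValueError("matrix is singular (|A| = 0), so it has no inverse")
    # Entry (i, j) of adj(A) is the cofactor C_ji, i.e. the cofactor matrix transposed.
    return [[Fraction((-1) ** (i + j) * det(minor_matrix(A, j, i)), d) for j in range(n)]
            for i in range(n)]

A = [[-1, 1, 2], [2, -2, 1], [1, 4, -1]]
A_inv = inverse(A)
print([[float(x) for x in row] for row in A_inv])
# [[-0.08, 0.36, 0.2], [0.12, -0.04, 0.2], [0.4, 0.2, 0.0]]
```

Multiplying `A` by `A_inv` with the row-by-column rule gives the identity matrix, which is a useful self-check.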
1.7 Activity
Question 1.
(a) Find the minors and cofactors of the matrix A.
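$$ A = \begin{pmatrix} 1 & 2 & 5 \\ 2 & 1 & 1 \\ 1 & 4 & 3 \end{pmatrix} $$
(b) Write the matrix of minors.
(c) Write the matrix of cofactors.
(d) Write the adjoint of matrix A.
(e) Find the determinant of matrix A.
(f) Find the inverse of matrix A.
Answer
(a) Follow the steps in the examples.
(b) $\begin{pmatrix} -1 & 5 & 7 \\ -14 & -2 & 2 \\ -3 & -9 & -3 \end{pmatrix}$
(c) $\begin{pmatrix} -1 & -5 & 7 \\ 14 & -2 & -2 \\ -3 & 9 & -3 \end{pmatrix}$
(d) $\begin{pmatrix} -1 & 14 & -3 \\ -5 & -2 & 9 \\ 7 & -2 & -3 \end{pmatrix}$
(e) 24
(f) $A^{-1} = \begin{pmatrix} -\frac{1}{24} & \frac{7}{12} & -\frac{1}{8} \\ -\frac{5}{24} & -\frac{1}{12} & \frac{3}{8} \\ \frac{7}{24} & -\frac{1}{12} & -\frac{1}{8} \end{pmatrix}$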
Question 2.
(a) Find the minors and cofactors of the matrix A.
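$$ A = \begin{pmatrix} 1 & 3 & -1 & 1 \\ 1 & 3 & 1 & 2 \\ 1 & 3 & 3 & -1 \\ 2 & 2 & 0 & 2 \end{pmatrix} $$
(b) Write the matrix of minors.
(c) Write the matrix of cofactors.
(d) Write the adjoint of matrix A.
(e) Find the determinant of matrix A.
(f) Find the inverse of matrix A.
Answer
(a) Follow the steps in the examples.
(b) $\begin{pmatrix} -2 & -10 & -12 & 8 \\ 20 & 4 & -8 & 16 \\ 6 & -2 & 4 & 8 \\ -24 & -8 & 0 & 0 \end{pmatrix}$
(c) $\begin{pmatrix} -2 & 10 & -12 & -8 \\ -20 & 4 & 8 & 16 \\ 6 & 2 & 4 & -8 \\ 24 & -8 & 0 & 0 \end{pmatrix}$
(d) $\begin{pmatrix} -2 & -20 & 6 & 24 \\ 10 & 4 & 2 & -8 \\ -12 & 8 & 4 & 0 \\ -8 & 16 & -8 & 0 \end{pmatrix}$
(e) 32
(f) $A^{-1} = \begin{pmatrix} -\frac{1}{16} & -\frac{5}{8} & \frac{3}{16} & \frac{3}{4} \\ \frac{5}{16} & \frac{1}{8} & \frac{1}{16} & -\frac{1}{4} \\ -\frac{3}{8} & \frac{1}{4} & \frac{1}{8} & 0 \\ -\frac{1}{4} & \frac{1}{2} & -\frac{1}{4} & 0 \end{pmatrix}$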

UNIT 2 – Solving Systems of Linear Equations
UNIT STRUCTURE
2.1 Introduction
2.2 Learning Outcomes
2.3 System of Simultaneous Equations
2.4 Method of Determinants
2.5 Elimination Method
2.6 Activities
2.1 Introduction
In this Unit, you will learn how to solve equations simultaneously. The aim of Unit 2 is to explain how to use matrix operations to solve simultaneous equations. Systems of equations arise in many different applications.
2.2 Learning Outcomes
By the end of this Unit, you should be able to do the following:
1. Understand when simultaneous equations have a unique solution.
2. Use the inverse of a matrix to find solutions.
3. Use elementary row operations to solve simultaneous equations.
2.3 System of Simultaneous Equations

Consider the following example: the cost of 2 apples and 3 oranges is 34 rupees while the cost of 3
apples and 1 orange is 23 rupees. What is the cost of one apple and of one orange?
Let x be the cost of an apple and y the cost of an orange.
The cost of 2 apples and 3 oranges = 2x + 3y = 34
The cost of 3 apples and 1 orange = 3x + y = 23
We have to solve the two equations 2x + 3y = 34 and 3x + y = 23 simultaneously. You will have studied this at secondary level. We multiply 3x + y = 23 by 3 to obtain 9x + 3y = 69.
Subtracting the equation 2x + 3y = 34 from 9x + 3y = 69 gives (9x − 2x) + (3y − 3y) = 69 − 34, that is 7x = 35, and hence x = 5. Replacing x by 5 in 3x + y = 23 gives y = 23 − 3(5) = 8. So the cost of an apple, x, is 5 rupees and the cost of an orange, y, is 8 rupees.
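We will now use the matrix method to solve the equations 2x + 3y = 34 and 3x + y = 23 simultaneously.
First we write these two equations
2x + 3y = 34
3x + y = 23
in matrix form as
$$ \begin{pmatrix} 2 & 3 \\ 3 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 34 \\ 23 \end{pmatrix} $$
The determinant of the matrix of coefficients is (2 × 1) − (3 × 3) = −7, and its inverse is
$$ \begin{pmatrix} 2 & 3 \\ 3 & 1 \end{pmatrix}^{-1} = \frac{1}{-7}\begin{pmatrix} 1 & -3 \\ -3 & 2 \end{pmatrix} $$
Therefore, multiplying both sides of the matrix equation by this inverse,
$$ \begin{pmatrix} x \\ y \end{pmatrix} = \frac{1}{-7}\begin{pmatrix} 1 & -3 \\ -3 & 2 \end{pmatrix}\begin{pmatrix} 34 \\ 23 \end{pmatrix} = \frac{1}{-7}\begin{pmatrix} (1\times34) + (-3\times23) \\ (-3\times34) + (2\times23) \end{pmatrix} = \frac{1}{-7}\begin{pmatrix} -35 \\ -56 \end{pmatrix} = \begin{pmatrix} 5 \\ 8 \end{pmatrix} $$
Hence x = 5 and y = 8.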
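A minimal Python sketch of the same inverse-matrix method for two equations in two unknowns. It assumes the usual 2 x 2 shortcut for the inverse, $\frac{1}{|A|}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$ for $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, which is the adjoint divided by the determinant; the function name is hypothetical.

```python
from fractions import Fraction

def solve_2x2_by_inverse(a, b, c, d, p, q):
    """Solve ax + by = p, cx + dy = q via X = A^{-1}B with A = [[a, b], [c, d]]."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("|A| = 0: the coefficient matrix has no inverse")
    # A^{-1} = (1/|A|) [[d, -b], [-c, a]] for a 2 x 2 matrix
    inv = [[Fraction(d, det), Fraction(-b, det)],
           [Fraction(-c, det), Fraction(a, det)]]
    x = inv[0][0] * p + inv[0][1] * q
    y = inv[1][0] * p + inv[1][1] * q
    return x, y

print(solve_2x2_by_inverse(2, 3, 3, 1, 34, 23))  # (Fraction(5, 1), Fraction(8, 1))
```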
Unique Solutions, No Solutions and Infinite Solutions

When we solve simultaneous equations, we can have
(a) a unique solution;
(b) an infinite number of solutions; or
(c) no solution.
The simultaneous equations y = x + 3 and 3y = 5x − 1 have a unique solution (x = 5 and y = 8). If you plot the graphs, you will see the two lines intersecting at one point only.
The simultaneous equations y = x + 5 and 2y − 2x = 10 have an infinite number of solutions. If you plot the graphs, you will see a single line, because the second equation is just the first one multiplied by 2 and rearranged.
The simultaneous equations y = 2x + 5 and y = 2x − 1 have no solution. If you plot the graphs, you will see a pair of parallel lines.
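Plotting is one way to tell the three cases apart. An algebraic alternative, not covered in this Unit, is the Rouché–Capelli test: the system has no solution if the augmented matrix has a larger rank than the matrix of coefficients, a unique solution if both ranks equal the number of unknowns, and infinitely many solutions otherwise. A hedged sketch (rank computed by row reduction with exact fractions; the function names are hypothetical):

```python
from fractions import Fraction

def rank(rows):
    """Number of non-zero rows left after reducing the matrix with row operations."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0  # index of the next pivot row
    for col in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(len(M)):
            if i != r and M[i][col] != 0:
                factor = M[i][col] / M[r][col]
                M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r

def classify(A, b):
    """Rouché–Capelli test: compare rank(A) with the rank of the augmented matrix [A | b]."""
    rank_a = rank(A)
    rank_aug = rank([row + [bi] for row, bi in zip(A, b)])
    if rank_a < rank_aug:
        return "no solution"
    if rank_a == len(A[0]):  # as many independent equations as unknowns
        return "unique solution"
    return "infinite number of solutions"

# Each pair of equations rewritten as (coefficient of x) x + (coefficient of y) y = constant.
print(classify([[-1, 1], [-5, 3]], [3, -1]))  # y = x + 3 and 3y = 5x - 1
print(classify([[-1, 1], [-2, 2]], [5, 10]))  # y = x + 5 and 2y - 2x = 10
print(classify([[-2, 1], [-2, 1]], [5, -1]))  # y = 2x + 5 and y = 2x - 1
```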

n equations in n unknowns
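The variables x and y were used in the equations 2x + 3y = 34 and 3x + y = 23. However, there are often many unknown variables, so instead of x and y we will use $x_1, x_2, x_3, x_4, \ldots$
Therefore a system of n equations in n unknowns is written
$$ \begin{aligned} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + a_{14}x_4 + \cdots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + a_{24}x_4 + \cdots + a_{2n}x_n &= b_2 \\ a_{31}x_1 + a_{32}x_2 + a_{33}x_3 + a_{34}x_4 + \cdots + a_{3n}x_n &= b_3 \\ &\ \ \vdots \\ a_{n1}x_1 + a_{n2}x_2 + a_{n3}x_3 + a_{n4}x_4 + \cdots + a_{nn}x_n &= b_n \end{aligned} $$
This can be written in matrix form as
$$ \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & a_{24} & \cdots & a_{2n} \\ a_{31} & a_{32} & a_{33} & a_{34} & \cdots & a_{3n} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & a_{n4} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_n \end{pmatrix} $$
We can also write it as AX = B, where A is the (n x n) matrix on the left, $X = \begin{pmatrix} x_1 & x_2 & \cdots & x_n \end{pmatrix}^T$ and $B = \begin{pmatrix} b_1 & b_2 & \cdots & b_n \end{pmatrix}^T$.
Matrix A is called the matrix of coefficients.
Matrix X is called the matrix of unknowns.
Matrix B is called the matrix of constants.
Combining the matrix of coefficients and the matrix of constants gives the augmented matrix:
$$ \text{Augmented matrix} = \left(\begin{array}{cccc|c} a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\ a_{31} & a_{32} & \cdots & a_{3n} & b_3 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} & b_n \end{array}\right) $$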

2.4 Method of Determinants
If $A^{-1}$ exists, then we can find the unknowns by using
AX = B implies $X = A^{-1}B$.
Cramer's Rule
Cramer's rule uses determinants to solve simultaneous equations.
Consider the system of equations with two unknowns x and y:
$a_1x + b_1y = d_1$
$a_2x + b_2y = d_2$
Step 1. Find the determinant of the coefficient matrix. Of course, this must NOT be zero.
$$ A = \begin{pmatrix} a_1 & b_1 \\ a_2 & b_2 \end{pmatrix}, \qquad |A| = a_1b_2 - a_2b_1 $$
Step 2. Find the determinant of the coefficient matrix with the coefficients of x replaced by the constants.
$$ A_x = \begin{pmatrix} d_1 & b_1 \\ d_2 & b_2 \end{pmatrix}, \qquad |A_x| = d_1b_2 - d_2b_1 $$
The value of the unknown x is obtained by
$$ x = \frac{|A_x|}{|A|} = \frac{d_1b_2 - d_2b_1}{a_1b_2 - a_2b_1} $$
Step 3. Find the determinant of the coefficient matrix with the coefficients of y replaced by the constants.
$$ A_y = \begin{pmatrix} a_1 & d_1 \\ a_2 & d_2 \end{pmatrix}, \qquad |A_y| = a_1d_2 - a_2d_1 $$
The value of the unknown y is obtained by
$$ y = \frac{|A_y|}{|A|} = \frac{a_1d_2 - a_2d_1}{a_1b_2 - a_2b_1} $$
Example
Solve the equations
2x + y = 5
5x – y = 2
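$$ A = \begin{pmatrix} 2 & 1 \\ 5 & -1 \end{pmatrix}, \qquad |A| = (2\times(-1)) - (5\times1) = -2 - 5 = -7 $$
$$ A_x = \begin{pmatrix} 5 & 1 \\ 2 & -1 \end{pmatrix}, \qquad |A_x| = (5\times(-1)) - (2\times1) = -5 - 2 = -7, \qquad x = \frac{|A_x|}{|A|} = \frac{-7}{-7} = 1 $$
$$ A_y = \begin{pmatrix} 2 & 5 \\ 5 & 2 \end{pmatrix}, \qquad |A_y| = (2\times2) - (5\times5) = 4 - 25 = -21, \qquad y = \frac{|A_y|}{|A|} = \frac{-21}{-7} = 3 $$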
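Cramer's rule can be applied to any system of n equations in n unknowns whose coefficient matrix has a non-zero determinant. For example:
−x + y + 2z = 3
2x − 2y + z = 9
x + 4y − z = −5
$$ A = \begin{pmatrix} -1 & 1 & 2 \\ 2 & -2 & 1 \\ 1 & 4 & -1 \end{pmatrix} \qquad A_x = \begin{pmatrix} 3 & 1 & 2 \\ 9 & -2 & 1 \\ -5 & 4 & -1 \end{pmatrix} $$
$$ A_y = \begin{pmatrix} -1 & 3 & 2 \\ 2 & 9 & 1 \\ 1 & -5 & -1 \end{pmatrix} \qquad A_z = \begin{pmatrix} -1 & 1 & 3 \\ 2 & -2 & 9 \\ 1 & 4 & -5 \end{pmatrix} $$
We calculate the determinant of each matrix using the formula in Unit 1:
|A| = 25, |Ax| = 50, |Ay| = −25, |Az| = 75
We calculate x, y and z:
$$ x = \frac{|A_x|}{|A|} = \frac{50}{25} = 2, \qquad y = \frac{|A_y|}{|A|} = \frac{-25}{25} = -1, \qquad z = \frac{|A_z|}{|A|} = \frac{75}{25} = 3 $$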
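Cramer's rule for n equations in n unknowns can be sketched as follows. This is an illustration assuming integer or fractional coefficients; the determinant uses cofactor expansion along the first row, the function raises an error when |A| = 0, and the names are hypothetical.

```python
from fractions import Fraction

def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def cramer(A, b):
    """Solve AX = b: unknown k is |A_k| / |A|, where A_k is A with column k replaced by b."""
    d = det(A)
    if d == 0:
        raise ValueError("|A| = 0, so Cramer's rule cannot be used")
    solution = []
    for k in range(len(A)):
        A_k = [row[:k] + [b_i] + row[k + 1:] for row, b_i in zip(A, b)]
        solution.append(Fraction(det(A_k), d))
    return solution

A = [[-1, 1, 2], [2, -2, 1], [1, 4, -1]]
b = [3, 9, -5]
print(cramer(A, b))  # [Fraction(2, 1), Fraction(-1, 1), Fraction(3, 1)]
```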
2.5 Elimination Method

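We will now consider the elimination method for solving a system of equations. As the name suggests, we eliminate variables from the equations step by step until we are left with an equation in a single unknown.
To solve the system of equations
$$ \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & a_{24} & \cdots & a_{2n} \\ a_{31} & a_{32} & a_{33} & a_{34} & \cdots & a_{3n} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & a_{n4} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_n \end{pmatrix} $$
Step 1.
We start with the augmented matrix:
$$ \left(\begin{array}{cccc|c} a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\ a_{31} & a_{32} & \cdots & a_{3n} & b_3 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} & b_n \end{array}\right) $$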
Step 2.
We use the Elementary Row Operations (EROs).
EROi: Interchanging Rows
In the simultaneous equations y = x + 3 and 3y = 5x − 1, it does not matter which equation is written first: we can write y = x + 3 first or second, and the solution does not change.
Since interchanging two equations does not change the solution, the first elementary row operation is interchanging two rows of the augmented matrix.
EROm: Multiplying a row by a non-zero constant
The simultaneous equations y = x + 3 and 3y = 5x − 1 have the solution x = 5 and y = 8. We can multiply the equation y = x + 3 by 5 to obtain 5(y) = 5(x) + 5(3), or 5y = 5x + 15.
The equations 5y = 5x + 15 and 3y = 5x − 1 have the same solution x = 5 and y = 8; the solution has not changed.
Since multiplying an equation by a non-zero constant does not change the solution, the second elementary row operation is multiplying a row of the augmented matrix by a non-zero constant.
EROr: Replacing a row by the sum of that row and a multiple of another row
The simultaneous equations y = x + 3 and 3y = 5x − 1 have the solution x = 5 and y = 8. We can multiply the equation y = x + 3 by 2 to obtain 2(y) = 2(x) + 2(3), or 2y = 2x + 6. We then replace the second equation 3y = 5x − 1 by (3y = 5x − 1) minus (2y = 2x + 6), which yields y = 3x − 7. The equations y = x + 3 and y = 3x − 7 have the same solution x = 5 and y = 8; the solution has not changed.
Hence the third elementary row operation is replacing a row of the augmented matrix by the sum of that row and a multiple of another row.
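The three elementary row operations can be written as small Python functions that return a new matrix and leave the original unchanged (a design choice for this sketch; the names are hypothetical and rows are numbered from 0). The usage line repeats the EROr step on the equations y = x + 3 and 3y = 5x − 1, written as −x + y = 3 and −5x + 3y = −1.

```python
def interchange(M, i, j):
    """ERO_i: swap rows i and j."""
    M = [row[:] for row in M]
    M[i], M[j] = M[j], M[i]
    return M

def multiply_row(M, i, k):
    """ERO_m: multiply row i by a non-zero constant k."""
    if k == 0:
        raise ValueError("the constant must be non-zero")
    M = [row[:] for row in M]
    M[i] = [k * a for a in M[i]]
    return M

def add_multiple(M, i, j, k):
    """ERO_r: replace row i by (row i) + k * (row j), where j is a different row."""
    M = [row[:] for row in M]
    M[i] = [a + k * b for a, b in zip(M[i], M[j])]
    return M

M = [[-1, 1, 3],   # -x + y = 3    (y = x + 3)
     [-5, 3, -1]]  # -5x + 3y = -1 (3y = 5x - 1)
print(add_multiple(M, 1, 0, -2))  # [[-1, 1, 3], [-3, 1, -7]], i.e. y = 3x - 7
```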
Row-echelon matrix
A row-echelon matrix is one in which
• a row of all zeroes, if it exists, must be at the bottom of the augmented matrix;
• the first non-zero element in any non-zero row of the augmented matrix has a value of one. This is called a "leading one";
• the leading one of any row lies to the right of the leading one of the previous row; and
• all the elements below each leading one are zeroes.
The row-echelon form of a matrix is not necessarily unique.
Reduced Row-echelon matrix
A reduced row-echelon matrix is a row-echelon matrix with zeroes both below and above each leading one.
Step 3.
Proceed with the process called Gaussian elimination: perform elementary row operations until the augmented matrix is in row-echelon form. Then use back substitution to find the value of each unknown.
OR
Proceed with the process called Gauss-Jordan elimination: perform elementary row operations until the augmented matrix is in reduced row-echelon form. Then read off the value of each unknown.
Note that you can use the elementary row operations in any order and as many times as needed.
Example.
Solve these equations simultaneously:
−x + y + 2z = 3
2x − 2y + z = 9
x + 4y − z = −5
The augmented matrix is
$$ \left(\begin{array}{ccc|c} -1 & 1 & 2 & 3 \\ 2 & -2 & 1 & 9 \\ 1 & 4 & -1 & -5 \end{array}\right) $$
We denote row 1 by R1, so R1 = (−1 1 2 | 3). Similarly, row 2 is R2 = (2 −2 1 | 9) and row 3 is R3 = (1 4 −1 | −5).
We multiply R1 by −1 to obtain a new first row, R1 = (1 −1 −2 | −3). The augmented matrix becomes:
R1 → −1 × R1:
$$ \left(\begin{array}{ccc|c} 1 & -1 & -2 & -3 \\ 2 & -2 & 1 & 9 \\ 1 & 4 & -1 & -5 \end{array}\right) $$
In R2 = (2 −2 1 | 9) we use a row operation to replace the first element 2 by zero:
R2 − 2 × R1 = (2 −2 1 | 9) − 2(1 −1 −2 | −3) = (2 − 2×1, −2 − 2×(−1), 1 − 2×(−2) | 9 − 2×(−3)) = (0 0 5 | 15).
R2 → R2 − 2 × R1:
$$ \left(\begin{array}{ccc|c} 1 & -1 & -2 & -3 \\ 0 & 0 & 5 & 15 \\ 1 & 4 & -1 & -5 \end{array}\right) $$
Since we can interchange rows, we make (0 0 5 | 15) row 3.
R2 ↔ R3:
$$ \left(\begin{array}{ccc|c} 1 & -1 & -2 & -3 \\ 1 & 4 & -1 & -5 \\ 0 & 0 & 5 & 15 \end{array}\right) $$
We multiply R3 by 1/5 in order to create a 1 where there is a 5: (1/5)(0 0 5 | 15) = (0 0 1 | 3).
R3 → (1/5)R3:
$$ \left(\begin{array}{ccc|c} 1 & -1 & -2 & -3 \\ 1 & 4 & -1 & -5 \\ 0 & 0 & 1 & 3 \end{array}\right) $$
In the second row, we create a zero in the first position by subtracting row 1 from row 2:
(1 4 −1 | −5) − (1 −1 −2 | −3) = (1 − 1, 4 − (−1), −1 − (−2) | −5 − (−3)) = (0 5 1 | −2).
R2 → R2 − R1:
$$ \left(\begin{array}{ccc|c} 1 & -1 & -2 & -3 \\ 0 & 5 & 1 & -2 \\ 0 & 0 & 1 & 3 \end{array}\right) $$
We create a leading one in the second row by replacing R2 by (1/5)R2.
R2 → (1/5)R2:
$$ \left(\begin{array}{ccc|c} 1 & -1 & -2 & -3 \\ 0 & 1 & \frac{1}{5} & -\frac{2}{5} \\ 0 & 0 & 1 & 3 \end{array}\right) $$
This matrix is in row-echelon form.
Using back substitution, we find the values of the unknowns, starting with the last row of the augmented matrix:
(0 0 1 | 3) means 1z = 3, so z = 3.
Using the second row, (0 1 1/5 | −2/5) means 1y + (1/5)z = −2/5. Replacing z by 3 gives 1y + (1/5)(3) = −2/5, hence y = (−2/5) − (3/5) = −1.
Using the first row, (1 −1 −2 | −3) means x − y − 2z = −3. Replacing z by 3 and y by −1 gives x − (−1) − 2(3) = −3, so x = −3 − 1 + 6 = 2.
Hence x = 2, y = −1 and z = 3.
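A compact sketch of Gaussian elimination followed by back substitution, using exact fractions. It assumes a square system with a unique solution, and it always takes the first available non-zero entry in each column as the pivot, so its sequence of row operations differs from the hand calculation even though, for this system, it ends at the same row-echelon matrix. The function name is hypothetical.

```python
from fractions import Fraction

def gaussian_elimination(aug):
    """Reduce an n x (n+1) augmented matrix to row-echelon form with leading ones,
    then solve by back substitution. Assumes the system has a unique solution."""
    M = [[Fraction(x) for x in row] for row in aug]
    n = len(M)
    for col in range(n):
        # ERO_i: bring a row with a non-zero entry in this column into the pivot position
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("no unique solution")
        M[col], M[pivot] = M[pivot], M[col]
        # ERO_m: scale the pivot row so that it starts with a leading one
        lead = M[col][col]
        M[col] = [a / lead for a in M[col]]
        # ERO_r: create zeroes below the leading one
        for r in range(col + 1, n):
            factor = M[r][col]
            M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    # Back substitution, starting from the last row
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        x[i] = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
    return M, x

echelon, solution = gaussian_elimination([[-1, 1, 2, 3],
                                          [2, -2, 1, 9],
                                          [1, 4, -1, -5]])
print(solution)  # [Fraction(2, 1), Fraction(-1, 1), Fraction(3, 1)]
```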
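Note that we could have proceeded further and transformed the row-echelon matrix
$$ \left(\begin{array}{ccc|c} 1 & -1 & -2 & -3 \\ 0 & 1 & \frac{1}{5} & -\frac{2}{5} \\ 0 & 0 & 1 & 3 \end{array}\right) $$
into reduced row-echelon form.
We replace R1 by R1 + R2, which we write as R1 → R1 + R2:
$$ \left(\begin{array}{ccc|c} 1 & 0 & -\frac{9}{5} & -\frac{17}{5} \\ 0 & 1 & \frac{1}{5} & -\frac{2}{5} \\ 0 & 0 & 1 & 3 \end{array}\right) $$
We replace R1 by R1 + (9/5)R3, which we write as R1 → R1 + (9/5)R3:
$$ \left(\begin{array}{ccc|c} 1 & 0 & 0 & 2 \\ 0 & 1 & \frac{1}{5} & -\frac{2}{5} \\ 0 & 0 & 1 & 3 \end{array}\right) $$
We replace R2 by R2 − (1/5)R3, which we write as R2 → R2 − (1/5)R3:
$$ \left(\begin{array}{ccc|c} 1 & 0 & 0 & 2 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & 3 \end{array}\right) $$
This matrix is in reduced row-echelon form.
Hence, we can read off the values of the unknowns:
x = 2, y = −1 and z = 3.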
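Gauss-Jordan elimination differs only in clearing the entries above each leading one as well as those below it. A minimal sketch that returns the reduced row-echelon form, using exact fractions (the name `rref` is a hypothetical helper):

```python
from fractions import Fraction

def rref(aug):
    """Gauss-Jordan elimination: return the reduced row-echelon form of a matrix."""
    M = [[Fraction(x) for x in row] for row in aug]
    rows, cols = len(M), len(M[0])
    r = 0  # index of the row that will receive the next leading one
    for col in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][col] != 0), None)
        if pivot is None:
            continue  # no leading one in this column
        M[r], M[pivot] = M[pivot], M[r]        # interchange rows
        lead = M[r][col]
        M[r] = [a / lead for a in M[r]]        # create the leading one
        for i in range(rows):                  # zeroes above AND below it
            if i != r and M[i][col] != 0:
                factor = M[i][col]
                M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return M

R = rref([[-1, 1, 2, 3], [2, -2, 1, 9], [1, 4, -1, -5]])
print([[int(a) for a in row] for row in R])  # [[1, 0, 0, 2], [0, 1, 0, -1], [0, 0, 1, 3]]
```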
2.6 Activities
Question 1.
Use Cramer’s rule to solve
x + 4y = 9
3x - y = 1
Answer x = 1 and y = 2.
Question 2.
-x + 2y + 3z = 8
2x - y + 2z = 1
x + 4y – z = 12
Answer x = 1; y = 3 and z = 1.
Question 3.
x + 2y + 3z = -7
2x - 3y - 5z = 9
-6x -8y + z = -22
Answer x = -1; y = 3 and z = -4.
Question 4.
Use Gaussian elimination to solve
-x + 2y + 3z = 8
2x - y + 2z = 1
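x + 4y − z = 12
Note that applying elementary row operations (taking the second equation as the first row) gives
$$ \left(\begin{array}{ccc|c} 2 & -1 & 2 & 1 \\ 0 & \frac{9}{2} & -2 & \frac{23}{2} \\ 0 & 0 & \frac{14}{3} & \frac{14}{3} \end{array}\right) $$
The row-echelon matrix is
$$ \left(\begin{array}{ccc|c} 1 & -\frac{1}{2} & 1 & \frac{1}{2} \\ 0 & 1 & -\frac{4}{9} & \frac{23}{9} \\ 0 & 0 & 1 & 1 \end{array}\right) $$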
Question 5.
Use Gauss-Jordan elimination to solve
-x + 2y + 3z = 8
2x - y + 2z = 1
x + 4y – z = 12
Note that the reduced row-echelon matrix is

$$ \left(\begin{array}{ccc|c} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 3 \\ 0 & 0 & 1 & 1 \end{array}\right) $$
Question 6.
Use Gaussian elimination to solve
x + 2y + 3z = -7
2x - 3y - 5z = 9
-6x -8y + z = -22
Answer x = -1; y = 3 and z = -4.

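Note that applying elementary row operations (taking the third equation as the first row) gives
$$ \left(\begin{array}{ccc|c} -6 & -8 & 1 & -22 \\ 0 & -\frac{17}{3} & -\frac{14}{3} & \frac{5}{3} \\ 0 & 0 & \frac{89}{34} & -\frac{178}{17} \end{array}\right) $$
The row-echelon matrix is
$$ \left(\begin{array}{ccc|c} 1 & \frac{4}{3} & -\frac{1}{6} & \frac{11}{3} \\ 0 & 1 & \frac{14}{17} & -\frac{5}{17} \\ 0 & 0 & 1 & -4 \end{array}\right) $$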
Question 7.
Use Gauss-Jordan elimination to solve
x + 2y + 3z = -7
2x - 3y - 5z = 9
-6x -8y + z = -22
Answer x = -1; y = 3 and z = -4.
Note that the reduced row-echelon matrix is

$$ \left(\begin{array}{ccc|c} 1 & 0 & 0 & -1 \\ 0 & 1 & 0 & 3 \\ 0 & 0 & 1 & -4 \end{array}\right) $$
UNIT 3 – Permutation and Combination
UNIT STRUCTURE
3.1 Introduction
3.2 Learning Outcomes
3.3 Principles of Counting
3.4 Permutation
3.5 Combination
3.6 Activities
3.1 Introduction
In this Unit, you will learn how to find the total number of arrangements of given objects as well as the total number of combinations. The aim of Unit 3 is to explain how to apply the appropriate method or formula to a given problem.
3.2 Learning Outcomes
By the end of this Unit, you should be able to do the following:
1. Understand the meaning of permutation and combination.
2. Understand that different problems require different formulas.
3. Understand how to use different formulas.
3.3 Principles of Counting

Multiplicative Rule
Suppose that a task requires k compulsory steps to be completed. Suppose that there are $n_1$ different ways of completing the first step, $n_2$ different ways of completing the second step, …, and $n_k$ different ways of completing the last (kth) step. Then the total number of ways of completing the task is
$$ n_1 \times n_2 \times n_3 \times \cdots \times n_k $$
Example
In a club there are 5 girls and 6 boys. You have been asked to select a team of one boy and one
girl to represent the club. How many different teams can you have?
Solution
This task (forming a team) is completed in 2 steps. The first step consists of choosing a girl among the 5 girls. This can be done in 5 different ways, as you can choose any one of the five girls.
The second step consists of selecting one boy out of the 6 boys. As you have 6 choices here, you can complete this step in 6 different ways.
Hence, the total number of different teams that can be formed
= (number of ways of choosing the girl in step 1) × (number of ways of choosing the boy in step 2)
= 5 × 6
= 30 ←
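A brute-force check of the multiplicative rule, listing every (girl, boy) pair with Python's standard `itertools.product` (the labels G1 to G5 and B1 to B6 are hypothetical):

```python
from itertools import product

girls = ["G1", "G2", "G3", "G4", "G5"]       # hypothetical labels for the 5 girls
boys = ["B1", "B2", "B3", "B4", "B5", "B6"]  # hypothetical labels for the 6 boys

# Each team is one (girl, boy) pair: every choice in step 1 combines with every choice in step 2.
teams = list(product(girls, boys))
print(len(teams))              # 30
print(len(girls) * len(boys))  # 30, the multiplicative rule n1 x n2
```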
Additive Rule
Suppose that there are m alternative (mutually exclusive) methods of completing a task. Then the total number of ways of completing the task is obtained by adding the numbers of ways of completing it by each alternative method.
Example
Suppose that there are 2 girls and 5 boys in a club, and that you have to form a team of 2 persons with at least one girl. How many different teams can you form?
Solution
"At least one girl" means that the team can have one girl or more.
So one alternative is to have 1 girl and 1 boy. This alternative can be completed by choosing 1 girl from 2 (2 choices) and then 1 boy from 5 (5 choices), that is, in
= 2 × 5 = 10 ways.
The second alternative is to have 2 girls and no boys. This alternative can be completed in one way only, as you have to take both girls.
Hence, the total number of different teams that can be formed
= 10 + 1 = 11
It is important that you find all the alternatives.
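The two alternatives can be counted with the rules and then checked by listing every possible 2-person team with Python's standard `itertools.combinations` (the labels are hypothetical):

```python
from itertools import combinations

girls = ["G1", "G2"]                   # hypothetical labels
boys = ["B1", "B2", "B3", "B4", "B5"]

# Alternative 1: one girl and one boy (multiplicative rule); alternative 2: both girls.
one_girl = len(girls) * len(boys)
two_girls = len(list(combinations(girls, 2)))
print(one_girl + two_girls)  # 11

# Brute-force check: list every 2-person team and keep those with at least one girl.
teams = [t for t in combinations(girls + boys, 2) if any(p in girls for p in t)]
print(len(teams))  # 11
```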

3.4 Permutation
Permutation means arranging objects in order; we will consider arrangements in a line only.
Permutation I
Suppose that it’s your birthday and we have to arrange an apple (A), a bottle of cola (B) and the birthday cake (C) in a line. How many ways can we arrange the 3 objects A, B and C in a line?
We can place A first, B second and C last: ABC
Another way would be to place A first, C second and B last: ACB
If we continue in this way, we shall have the following six arrangements in all:
ABC
ACB
BAC
BCA
CAB
CBA
There are six different ways of arranging or permuting the 3 different objects.
Factorial is represented by the exclamation mark !
n! = 1 x 2 x 3 x . . . x n
The ! key is available on your calculator.
3 factorial = 3! = 1 x 2 x 3 = 6
If we have n different objects and we would like to arrange all of them in a straight line, the total number of arrangements (permutations) is n!.
Applying this rule to the three objects we had to arrange:
3! = 1 x 2 x 3 = 6 ←
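The n! rule can be compared with an explicit listing. In this Python sketch, `itertools.permutations` generates every ordering of A, B and C. `math.factorial` gives the count directly.

```python
from itertools import permutations
from math import factorial

objects = ["A", "B", "C"]  # apple, bottle of cola, cake

# List every arrangement explicitly
arrangements = ["".join(p) for p in permutations(objects)]
print(arrangements)  # ['ABC', 'ACB', 'BAC', 'BCA', 'CAB', 'CBA']

# The rule n! gives the same count without listing
print(factorial(len(objects)))  # 6
```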
Example
Suppose that you have four cards numbered 1, 2, 3 and 4. How many 4-digit numbers (e.g. 1234, 2134, ...) can you form?
Solution
Forming 4-digit numbers requires arranging the cards. For instance if you want
the number 3241 then you must place card 3 first, then card 2, then card 4
and finally card 1.
Since we have 4 different cards and we would like to arrange all of them, then
this can be done in 4! = 24 ways. Thus we shall have 24 different 4-digit
numbers in all.
Example
Suppose that you had to arrange all the 6 letters in the word FRANCE. How
many different arrangements would you have?
Solution
Since we have 6 different objects that we would like to arrange, then we can
do so in 6! = 720 ways.
Permutation II
Suppose that we have n objects such that ‘a’ of these objects are identical, ‘b’ of these objects are identical, ‘c’ of these objects are identical, and so on. Then the total number of arrangements of all these objects is
$\dfrac{n!}{a!\,b!\,c!\cdots}$
Example
In how many ways can you arrange all the letters in the word MISSISSIPPI?
Solution
There are 11 letters in the word MISSISSIPPI, with 4 identical ‘S’, 4 identical ‘I’ and 2 identical ‘P’. Therefore, the total number of arrangements is
$\dfrac{11!}{4! \times 4! \times 2!} = 34\,650$
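The Permutation II formula can be coded in a few lines. The helper name `arrangements_with_repeats` in this Python sketch is hypothetical. The function starts from n! and divides by the factorial of each group of identical items, which it counts with `collections.Counter`.

```python
from collections import Counter
from math import factorial

def arrangements_with_repeats(items):
    """n! divided by a! b! c! ... for each group of identical items."""
    count = factorial(len(items))
    for repeats in Counter(items).values():
        count //= factorial(repeats)  # each partial division is exact
    return count

print(arrangements_with_repeats("MISSISSIPPI"))  # 34650
print(arrangements_with_repeats("112444"))       # 60, the six-card example
```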
Example
Suppose that you have 6 cards numbered 1, 1, 2, 4, 4, 4. How many different 6-digit numbers can you form?
Solution
$\dfrac{6!}{2! \times 3!} = \dfrac{720}{2 \times 6} = 60$
We divide by 2! because we have two ‘1’ cards and by 3! because we have three ‘4’ cards.
Permutation III
Suppose that we have n different objects and we would like to form arrangements containing r of them (of course, r cannot be greater than n). How many arrangements can we have?
Total number of arrangements = $^{n}P_{r} = \dfrac{n!}{(n-r)!}$
Example
How many arrangements containing 3 letters can be made if the letters are
taken from the word FRANCE?
Solution
FRANCE has 6 different letters. We would like to have arrangements containing 3 letters, such as FRA, ACE and EFC. Then we shall have $^{6}P_{3} = 120$ arrangements.
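The nPr formula can be checked three ways in Python. The sketch uses the formula itself, the built-in `math.perm` (available from Python 3.8), and a direct listing with `itertools.permutations`. The helper name `n_p_r` is hypothetical.

```python
from itertools import permutations
from math import factorial, perm

def n_p_r(n, r):
    """Arrangements of r objects chosen from n different objects: n!/(n-r)!."""
    return factorial(n) // factorial(n - r)

print(n_p_r(6, 3))                           # 120
print(perm(6, 3))                            # 120, built in since Python 3.8
print(len(list(permutations("FRANCE", 3))))  # 120, by listing them all
```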
Example
Suppose that your dad gave you the four cards numbered 1, 2, 3 and 4. He asked
you to teach numbers to your younger sister. How many different numbers can
you teach her?
Solution
You can use 1 card at a time and teach her number ‘1’, then number ‘2’, then number ‘3’ and finally ‘4’. So you can teach her four 1-digit numbers only. This can be viewed as arranging one card out of the four, which can be done in $^{4}P_{1} = 4$ ways.
Alternatively, you can decide to use 2 cards at a time. Thus, you can teach her numbers like 21, 12, 13, 14, ... This can be viewed as arranging 2 cards out of 4, which can be done in $^{4}P_{2} = 12$ ways.
So you will have 12 2-digit numbers. (As we only have 12, we can write them all: 12, 13, 14, 21, 23, 24, 31, 32, 34, 41, 42, 43.)
Alternatively, you could use 3 cards at a time to teach her 3-digit numbers (e.g. 123, 321, ...). This means we are arranging 3 cards out of four every time, which can be done in $^{4}P_{3} = 24$ ways. So we have 24 3-digit numbers.
Finally, you could use all four cards to form 4-digit numbers like 1234, 4312, ... This means we are arranging four cards out of four, which can be done in $^{4}P_{4} = 24$ ways. So we have 24 4-digit numbers.
Hence, the total number of 1-digit, 2-digit, 3-digit and 4-digit numbers that can be formed is
$^{4}P_{1} + {}^{4}P_{2} + {}^{4}P_{3} + {}^{4}P_{4} = 4 + 12 + 24 + 24 = 64$
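The four cases can be counted by listing the numbers of each length. The Python sketch below does this and compares the result with the sum of the nPr values.

```python
from itertools import permutations
from math import perm

cards = "1234"

# Count numbers of each length r = 1..4 by listing them
by_listing = {r: len(list(permutations(cards, r))) for r in range(1, 5)}
print(by_listing)  # {1: 4, 2: 12, 3: 24, 4: 24}

# Same total from the formula 4P1 + 4P2 + 4P3 + 4P4
print(sum(perm(4, r) for r in range(1, 5)))  # 64
```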
Permutation IV
Suppose that we have to arrange objects such that some of them must be together, or “next to each other”. We shall use the grouping method, which consists of putting all the objects that must be together into one group and calling that group ‘X’, where X is treated as one object only.
Example
We saw that the total number of arrangements of all the letters in the word FRANCE was 6! = 720. In how many of these 720 arrangements are the letters F, R and C together?
Group F, R and C into one object X = (F R C); the letters outside the group are A, N and E.
Step 1 We have 4 different objects (namely X, A, N and E) that can be arranged in 4! = 24 ways.
Step 2 Within the group we have 3 different objects that we can arrange in 3! = 6 ways.
Step 3 Total = 6 x 24 = 144
Hence out of the 720 arrangements, there are 144 arrangements in which F, R and
C are together. This also means that there are 720 – 144 = 576 arrangements in
which F, R and C are not together.
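The grouping result can be confirmed by brute force. This Python sketch lists all 720 arrangements of FRANCE and counts those in which F, R and C occupy one unbroken block of positions. The helper `all_together` is a hypothetical name. It treats every occurrence of the given letters as part of the group.

```python
from itertools import permutations

def all_together(arrangement, letters):
    """True if the positions of the given letters form one unbroken block."""
    positions = [i for i, ch in enumerate(arrangement) if ch in letters]
    return max(positions) - min(positions) == len(positions) - 1

words = ["".join(p) for p in permutations("FRANCE")]
together = sum(all_together(w, "FRC") for w in words)
print(len(words), together)  # 720 144
```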
Note that when F, R and C are not all together, either they are completely separated from each other (we shall use the Arrow method to find in how many arrangements they are completely separated) or two of them are together and one is separate.
Summary of the 720 arrangements in all:
(i) F, R, C together (F R C): grouping method, 144 arrangements;
(ii) F, R, C completely separated from each other (F _ R _ C): use the Arrow method;
(iii) two of them together and one separate (FR _ C).
Example
In how many arrangements of the letters in the word MAURITIUS are the letters M, U, U and R together?
Solution
Group M, U, U and R into one object X; the letters outside the group are A, I, T, I and S.
The 6 objects X, A, I, T, I, S (with 2 identical ‘I’) can be arranged in $\dfrac{6!}{2!} = 360$ ways.
The 4 letters M, U, U, R inside the group (with 2 identical ‘U’) can be arranged in $\dfrac{4!}{2!} = 12$ ways.
Total = 12 x 360 = 4320
Hence, there are 4320 arrangements in which M, U, U and R are together.
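When the word contains identical letters, the grouping method combines with the Permutation II formula. In this Python sketch, the helper names `arrangements_with_repeats` and `grouped_arrangements` are hypothetical. The placeholder "X" for the group is assumed not to be a letter of the word.

```python
from collections import Counter
from math import factorial

def arrangements_with_repeats(items):
    """n! / (a! b! c! ...) for groups of identical items."""
    count = factorial(len(items))
    for repeats in Counter(items).values():
        count //= factorial(repeats)
    return count

def grouped_arrangements(word, group):
    """Grouping method: treat the letters in `group` as one object X."""
    rest = list(word)
    for ch in group:
        rest.remove(ch)  # take the grouped letters out of the word
    outside = arrangements_with_repeats(rest + ["X"])  # X plus the other letters
    inside = arrangements_with_repeats(group)          # order of letters within X
    return outside * inside

print(grouped_arrangements("MAURITIUS", "MUUR"))  # 4320
print(grouped_arrangements("FRANCE", "FRC"))      # 144
```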
Permutation V: Arrow Method
The arrow method is used when we would like the objects to be completely
separated from each other.
Example
We saw that there are 6! = 720 arrangements of all the letters in the word FRANCE. In how many of these arrangements are the letters F, R and C completely separated from each other?
Solution
Since the constraint is on the letters F, R and C, let us arrange the remaining letters (A, N, E) with a slot (shown by a dash) between each pair of letters, one at the beginning and one at the end:
_ A _ N _ E _
Step 1 The letters A, N and E can be arranged among themselves in 3! = 6 ways.
Step 2 There are 4 slots where we can fit the first of the 3 letters F, R, C.
Step 3 Consequently, 3 slots remain for the second letter and only 2 slots for the third letter.
Hence, the total number of arrangements in which F, R and C are completely separated = 3! x 4 x 3 x 2 = 144
Example
You have to arrange 8 reports A, B, C, D, E, X, Y and Z on a shelf. In how many arrangements will X, Y and Z be completely separated from each other?
Solution
Step 1 _ A _ B _ C _ D _ E _
A, B, C, D and E can be arranged in 5! = 120 ways.
Step 2 The first of the 3 reports (X, Y, Z) can be inserted in any one of the six slots (shown by the dashes).
Step 3 The second can be inserted in any one of the remaining 5 slots.
Step 4 The third can be inserted in any one of the remaining 4 slots.
Hence, there are 5! x 6 x 5 x 4 = 14 400 arrangements of the 8 reports in which X, Y and Z are completely separated.
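The arrow method has a short general form. First arrange the free objects, then drop the separated objects one at a time into the free + 1 gaps. This Python sketch uses the hypothetical helper names `arrow_method` and `completely_separated`, and it checks the FRANCE case by brute force.

```python
from itertools import permutations
from math import factorial, perm

def arrow_method(free, separated):
    """Arrange the free objects, then drop the separated ones into the gaps."""
    slots = free + 1  # one gap before, between and after the free objects
    return factorial(free) * perm(slots, separated)

def completely_separated(arrangement, letters):
    """True if no two of the given letters sit next to each other."""
    positions = [i for i, ch in enumerate(arrangement) if ch in letters]
    return all(b - a > 1 for a, b in zip(positions, positions[1:]))

print(arrow_method(3, 3))  # FRANCE, F/R/C apart: 144
print(arrow_method(5, 3))  # 8 reports, X/Y/Z apart: 14400

# Brute-force check for FRANCE
print(sum(completely_separated("".join(p), "FRC") for p in permutations("FRANCE")))  # 144
```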
Permutation VI: Box Method
Whenever we have constraints on position, we use the box method.
Example
We saw that there are 720 arrangements of all the letters in the word
FRANCE. (a) How many of these start with letter R?
(b) How many of these start with letter R and end in letter A?
Solution
(a) As there are 6 letters to arrange, we draw 6 boxes, one for each position.
There is only one letter that we can place in the first position, namely R. So there is only 1 way to fill this position.
There is no constraint on the remaining positions. As we have used the letter ‘R’, the letters F, A, N, C, E remain and we can use any one of these to fill the second position. So we have 5 choices for the second position:
[1] [5] [ ] [ ] [ ] [ ]
Suppose we choose ‘F’ to fill the second position; we are then left with A, N, C, E for the third position (so 4 choices for the third position):
[1] [5] [4] [ ] [ ] [ ]
If we continue in this way, there shall be 3 choices for the fourth position (N, C, E), 2 choices for the fifth position (C, E) and 1 letter, E, to fill the last position:
[1] [5] [4] [3] [2] [1]
We carry out the multiplication of the numbers in the boxes:
1 x 5 x 4 x 3 x 2 x 1 = 120
So there are 120 arrangements starting with ‘R’.
(b) Since we want arrangements starting with R and ending in A, we again draw 6 boxes. As we can fill the first position with ‘R’ only, there is only 1 way to fill the first position. As we can fill the last position with ‘A’ only, there is only 1 way to fill the last position:
[1] [ ] [ ] [ ] [ ] [1]
We are left with F, N, C, E for the second position. If, say, F fills it, then N, C, E remain for the 3rd position, C, E for the 4th position and E for the 5th position:
[1] [4] [3] [2] [1] [1]
1 x 4 x 3 x 2 x 1 x 1 = 24
So there are 24 arrangements starting with R and ending in ‘A’.
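Both box-method answers can be confirmed by filtering a full listing of the arrangements of FRANCE, as in this Python sketch.

```python
from itertools import permutations

words = ["".join(p) for p in permutations("FRANCE")]

start_r = [w for w in words if w[0] == "R"]                # box 1 fixed to R
start_r_end_a = [w for w in start_r if w[-1] == "A"]       # box 6 fixed to A
print(len(start_r), len(start_r_end_a))  # 120 24
```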
Example
Suppose that you have six cards numbered 1, 2, 3, 4, 5 and 6. How many 4-digit
even numbers greater than 3000 can you form?
Solution
Since we want 4-digit numbers, we draw 4 boxes. Since the 4-digit numbers have to be greater than 3000, they cannot start with ‘1’ or ‘2’; they can only start with 3, 4, 5 or 6. So we have 4 choices when filling the first position. Since the numbers have to be even, they must end in 2, 4 or 6.
First position: 3, 4, 5 or 6        Last position: 2, 4 or 6
Because 4 and 6 appear in both lists, the digit placed last changes how many choices remain for the first position, so we cannot simply multiply. It is preferable to work with the shorter list (the last position), so we will do it in 3 steps.
Numbers ending in ‘2’: the first position can be 3, 4, 5 or 6 (4 choices); 4 cards remain for the second position and 3 for the third.
[4] [4] [3] [1] = 4 x 4 x 3 x 1 = 48
So there are 48 4-digit even numbers ending in ‘2’.
Numbers ending in ‘4’: the first position can be 3, 5 or 6 (3 choices); then 4 choices for the second position and 3 for the third.
[3] [4] [3] [1] = 3 x 4 x 3 x 1 = 36
So there are 36 4-digit even numbers ending in ‘4’.
Numbers ending in ‘6’: the first position can be 3, 4 or 5 (3 choices); then 4 choices for the second position and 3 for the third.
[3] [4] [3] [1] = 3 x 4 x 3 x 1 = 36
So there are 36 4-digit even numbers ending in ‘6’.
Therefore, the total number of even 4-digit numbers that are greater than 3000
= 48 (ending in ‘2’) + 36 (ending in ‘4’) + 36 (ending in ‘6’)
= 120.
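A brute-force listing confirms the case split. In this Python sketch, each number uses 4 different cards from 1 to 6. The sketch keeps the even numbers above 3000 and then counts them by last digit.

```python
from itertools import permutations

# Each 4-digit number uses 4 different cards out of 1..6
numbers = [int("".join(p)) for p in permutations("123456", 4)]
wanted = [n for n in numbers if n > 3000 and n % 2 == 0]
print(len(numbers), len(wanted))  # 360 120

# Split by last digit, as in the case analysis
for last in (2, 4, 6):
    print(last, sum(1 for n in wanted if n % 10 == last))  # 2 48, then 4 36, then 6 36
```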
3.5 Combination
Suppose that you have 3 fruits: an apple (A), a banana (B) and a citrus fruit (C). If you have to use all 3 fruits, how many different juices can you make? Only one! When you drink the juice, can you tell in which order the fruits were put into the juicer? No.
Thus, if we have n objects and we would like to combine all of them, there is only one combination that we can have. In a combination the order does not matter. This is the major difference between permutation and combination.
Suppose that we decide to use only two fruits out of the three (A, B, C) to prepare a juice. How many different juices can we make? Three: we can use A and B, or A and C, or B and C.
When we have n different objects and we want combinations containing r objects, then we will have
$^{n}C_{r} = \dfrac{n!}{r!\,(n-r)!}$
such combinations (of course, r cannot be greater than n).
Example
How many different juices can be made using two of the three fruits A, B and C?
$^{3}C_{2} = 3$ (3 different fruits, of which 2 are used)

Example
How many different lotto combinations can you have if you have to choose 6
numbers out of 40?
Answer
$^{40}C_{6} = 3\,838\,380$
We will now illustrate the difference between permutation and combination through the following example.
Example
Suppose that there are ten members in a club. A team of 3 persons must be
selected to form the management team of the club.
(a) How many different management teams can be formed?
Solution
$^{10}C_{3} = 120$
(b) How many different ways are there to elect a President, a Secretary and a
Treasurer to form a management team?
Solution
$^{10}C_{3} \times 3! = 120 \times 6 = 720 = {}^{10}P_{3}$
Here $^{10}C_{3}$ selects the 3 persons out of 10, and 3! arranges the 3 persons as President, Secretary and Treasurer.
In general,
$^{n}C_{r} \times r! = {}^{n}P_{r}$
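Python's `math.comb` and `math.perm` (both available from Python 3.8) reproduce the combination results above. They also confirm the identity nCr x r! = nPr for the club example.

```python
from math import comb, factorial, perm

print(comb(3, 2))    # 3 juices from two of the three fruits
print(comb(40, 6))   # 3838380 lotto combinations
print(comb(10, 3))   # 120 management teams

# Choosing 3 people and then assigning the 3 posts = arranging 3 out of 10
print(comb(10, 3) * factorial(3), perm(10, 3))  # 720 720
```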

Combination under constraint
Example
Suppose that a club has ten members: 6 males and 4 females. A team of 4 persons must be selected to represent the club. How many different teams can be formed if
(a) There is no restriction?
Solution
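$^{10}C_{4} = 210$
(b) There must be 2 males and 2 females?
Solution
Choose 2 of the 6 males and 2 of the 4 females:
$^{6}C_{2} \times {}^{4}C_{2} = 15 \times 6 = 90$
(c) There must be at least one female.
Solution
This means we have one female or more:
1 female and 3 males: $^{4}C_{1} \times {}^{6}C_{3} = 4 \times 20 = 80$
or 2 females and 2 males: $^{4}C_{2} \times {}^{6}C_{2} = 6 \times 15 = 90$
or 3 females and 1 male: $^{4}C_{3} \times {}^{6}C_{1} = 4 \times 6 = 24$
or 4 females and 0 males: $^{4}C_{4} \times {}^{6}C_{0} = 1 \times 1 = 1$
Total = 80 + 90 + 24 + 1 = 195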
(d) One of the males is John and one of the females is Mary. John and Mary don’t get along. A team of 2 males and 2 females must be selected. How many different teams can be formed with either John or Mary but not both?
Solution
We will solve this in two steps.
Step 1 Let’s find those teams in which John is included and Mary is excluded. We need to select one more male among the five remaining males (6 males minus John), and this can be done in $^{5}C_{1} = 5$ ways.
We also need to select 2 females among 3 females (4 females minus Mary), and this can be done in $^{3}C_{2} = 3$ ways.
Hence, the number of teams in which John is included and Mary is excluded
= $^{5}C_{1} \times {}^{3}C_{2} = 5 \times 3 = 15$
Step 2 Let’s find the number of teams in which Mary is included but John is excluded. We need to select one more female from the three remaining (4 females minus Mary), and this can be done in $^{3}C_{1} = 3$ ways.
We need to select two males from five (6 males minus John), and this can be done in $^{5}C_{2} = 10$ ways.
Hence, the number of teams in which Mary is included but John is excluded
= $^{3}C_{1} \times {}^{5}C_{2} = 3 \times 10 = 30$
Hence, the total number of teams in which either John or Mary is included, but not both, is 15 + 30 = 45.
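All four parts of the club example can be checked by listing every 4-person team. This Python sketch uses hypothetical member names: John is one of the 6 males and Mary one of the 4 females. The helper `count_of` is also a hypothetical name.

```python
from itertools import combinations

# Hypothetical names: John is one of the 6 males, Mary one of the 4 females
males = ["John", "M2", "M3", "M4", "M5", "M6"]
females = ["Mary", "F2", "F3", "F4"]
club = males + females

teams = list(combinations(club, 4))

def count_of(team, group):
    """How many members of `team` belong to `group`."""
    return sum(person in group for person in team)

print(len(teams))                                       # (a) 210
print(sum(count_of(t, males) == 2 for t in teams))      # (b) 90
print(sum(count_of(t, females) >= 1 for t in teams))    # (c) 195

two_and_two = [t for t in teams if count_of(t, males) == 2]
# (d) exactly one of John and Mary: "in" tests differ
print(sum(("John" in t) != ("Mary" in t) for t in two_and_two))  # 45
```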

Permutation VII
Example
How many permutations containing 3 letters can you have if the letters are taken from
the word PRETTY?
Solution
PRETTY has 6 letters in total. As we would like to have arrangements of 3 letters, there is a tendency to use $^{6}P_{3} = 120$. This is wrong because the 6 letters are not all different: ‘T’ appears twice. For this problem, we shall proceed in the following steps.
Note that when we remove the two ‘T’s from the word PRETTY, we are left with PREY.
Step 1 Let’s find those arrangements that don’t contain any ‘T’. This means we need to choose 3 letters from PREY and then arrange the 3. This can be done in
$^{4}C_{3} \times 3! = 4 \times 6 = 24$ ways.
Step 2 Let’s find those arrangements that contain one letter ‘T’. This means we need to choose 2 more letters from PREY before arranging all 3. This can be done in
$^{4}C_{2} \times 3! = 6 \times 6 = 36$ ways.
Step 3 Let’s find those arrangements with two ‘T’s. This means that we need to select only one letter from PREY and arrange it with the two ‘T’s. We can do this in
$^{4}C_{1} \times \dfrac{3!}{2!} = 4 \times 3 = 12$ ways.
Note that we divide by 2! as we will have two identical ‘T’s.
Hence, the total number of permutations containing 3 letters
= 24 + 36 + 12 = 72 ←
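A set can remove the duplicates that the two identical T's create. `itertools.permutations` treats the two T's as different cards, so storing the joined words in a Python set keeps each distinct word once. The split by number of T's then matches Steps 1 to 3.

```python
from itertools import permutations

# permutations() treats the two T's as different cards, so a set removes
# the duplicate words that differ only by swapping the T's
words = {"".join(p) for p in permutations("PRETTY", 3)}
print(len(words))  # 72

# Split by how many T's each word contains, as in Steps 1 to 3
for t_count in (0, 1, 2):
    print(t_count, sum(w.count("T") == t_count for w in words))  # 0 24, then 1 36, then 2 12
```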
3.6 Activities
1. Issam has 11 different CDs, of which 6 are pop music, 3 are jazz and 2 are
classical.
(i) How many different arrangements of all 11 CDs on a shelf are there if the
jazz CDs are all next to each other?
(ii) Issam makes a selection of 2 pop music CDs, 2 jazz CDs and 1 classical
CD. How many different possible selections can be made?
2. A choir consists of 13 sopranos, 12 altos, 6 tenors and 7 basses. A group consisting of 10 sopranos, 9 altos, 4 tenors and 4 basses is to be chosen from the choir.
(i) In how many different ways can the group be chosen?
(ii) In how many ways can the 10 chosen sopranos be arranged in a line if the
6 tallest stand next to each other?
(iii) The 4 tenors and 4 basses in the group stand in a single line with all the
tenors next to each other and all the basses next to each other. How many
possible arrangements are there if three of the tenors refuse to stand next to
any of the basses?
3. A builder is planning to build 12 houses along one side of a road. He will build 2
houses in style A, 2 houses in style B, 3 houses in style C, 4 houses in style D and
1 house in style E.
(i) Find the number of possible arrangements of these 12 houses.
(ii) [Diagram: the 12 houses along the road, split into a first group and a second group.]
The 12 houses will be in two groups of 6 (see diagram). Find the number of possible arrangements if all the houses in styles A and D are in the first group and all the houses in styles B, C and E are in the second group.
(iii) Four of the 12 houses will be selected for a survey. Exactly one house must
be in style B and exactly one house in style C. Find the number of ways in
which these four houses can be selected.
4. Six men and three women are standing in a supermarket queue.
(i) How many possible arrangements are there if there are no restrictions on order?
(ii) How many possible arrangements are there if no two of the women are standing next to each other?
(iii) Three of the people in the queue are chosen to take part in a customer survey. How many different choices are possible if at least one woman must be included?
5. 4 letters are to be selected from the 7 letters of the word ASSUMED.
Find the number of different 4-letter selections that can be made containing
(i) no S, (ii) one S, (iii) both S’s.
Find the corresponding number of arrangements in each case.
6. A committee of 6 people, which must contain at least 4 men and at least 1 woman,
is to be chosen from 10 men and 9 women.
(i) Find the number of possible committees that can be chosen.
(ii) Find the probability that one particular man, Albert, and one particular
woman, Tracey, are both on the committee.
(iii) Find the number of possible committees that include either Albert or Tracey
but not both.
(iv) The committee that is chosen consists of 4 men and 2 women. They queue
up randomly in a line for refreshments. Find the probability that the women
are not next to each other in the queue.
7. Eight men travel in two four-seater cars. In how many ways can the men be
allocated to the cars
(a) if all eight can drive;
(b) if only two can drive? (The same four men in a car count as one way no
matter in what positions the men are sitting).
8. In how many different ways can the letters of the word SALOON be arranged
(a) if the two O’s must not come together,
(b) if the consonants and vowels must occupy alternate places?
9. Calculate the number of 5-figure even numbers containing each of the digits 1, 3,
4, 7 and 8.
10. In how many ways can the letters of the word HORROR be arranged? In how
many of these arrangements are the vowels separated?
11. Seven cards, each bearing a letter, can be arranged to spell the word DOUBLES.
How many three-letter code-words can be formed from these cards? How many
of these words:
(a) contain the letter S,
(b) do not contain the letter O,
(c) consist of a vowel between two consonants?
12. A committee consisting of 2 men and 2 women is to be chosen from 5 men and 6
women. In how many ways can this be done?
One of the women is the wife of one of the men. In how many ways can the
committee be chosen if it must contain either the man and his wife, or neither?
13. The digits 1, 2, 3, 4, 5 are written down at random to form a five digit number.
Find:
(a) how many such numbers are possible,
(b) the chance that the last digit is odd,
(c) how many of the numbers are divisible by 4.
14. How many different 6-digit numbers can be made using the digits 1, 2 and 3 once each and the digit 4 three times?
How many of these numbers are odd? How many of these odd numbers have the
three 4’s together?
15. A committee of 6 is to be selected from 5 men and 4 women. Calculate the number
of different possible committees if:
(i) there is no restriction,
(ii) the number of men must be greater than the number of women.
16. A group of 12 cricketers is to be selected for a tour. The final selection is made by
choosing 6 batsmen out of 9, 4 bowlers out of 6 and 2 wicket-keepers out of 3.
Calculate the total number of different possible touring parties.
17. From a class of 6 boys and 4 girls, a group of 3 children is to be selected. Calculate:
(i) the total number of possible groups,
(ii) the number of possible groups if at least one member of each group is to be
a boy.
18. Find the number of ways in which 4 questions can be chosen from 7 questions in
Section II of an examination paper, assuming that the order in which the questions
are chosen is not relevant.
19. A committee of 5 people is to be chosen from 6 men and 4 women. In how many
ways can this be done
(i) if there must be 3 men and 2 women on the committee,
(ii) if there must be more men than women on the committee,
(iii) if there must be 3 men and 2 women, and one particular woman refuses to
be on the committee with one particular man?
20. A collection of 18 books contains one Harry Potter book. Linda is going to choose
6 of these books to take on holiday.
(i) In how many ways can she choose the 6 books?
(ii) How many of these choices will include the Harry Potter book?
21. A school is asked to send a delegation of six pupils selected from six badminton
players, six tennis players and five squash players. No pupil plays more than one
game. The delegation is to consist of at least one, and not more than three, players
drawn from each sport. Giving full details of your working, find the number of
ways in which the delegation can be selected.
22. Geoff wishes to plant 25 flowers in a flower-bed. He can choose from 15 different
geraniums, 10 different roses and 8 different lilies. He wants to have at least 11
geraniums and also to have the same number of roses and lilies. Find the number
of different selections of flowers he can make.
23. In how many ways can a group of 14 people eating at the restaurant be divided
between three tables seating 5, 5 and 4?
24. A football team consists of 3 players who play in a defence position, 3 players
who play in a midfield position and 5 players who play in a forward position.
Three players are chosen to collect a gold medal for the team. Find in how many
ways this can be done
(i) if the captain, who is a midfield player, must be included, together with one
defence and one forward player.
(ii) if exactly one forward player must be included, together with any two others.
25. (a) Calculate the number of different 6-digit numbers which can be formed
using the digits 0, 1, 2, 3, 4, 5 without repetition and assuming that a number
cannot begin with 0.
(b) A committee of 4 people is to be chosen from 4 women and 5 men. The
committee must contain at least 1 woman. Calculate the number of different
committees that can be formed.
26. (a) The producer of a play requires a total cast of 5, of which 3 are actors and
2 are actresses. He auditions 5 actors and 4 actresses for the cast. Find the
total number of ways in which the cast can be obtained.
(b) Find how many different 4-digit odd numbers less than 4000 can be made
from the digits 1, 2, 3, 4, 5, 6, 7 if no digit may be repeated.
27. (i) Find the number of different arrangements of the letters of the word
MEXICO.
Find the number of these arrangements
(ii) which begin with M,
(iii) which have the letter X at one end and the letter C at the other end.
Four of the letters of the word MEXICO are selected at random. Find the number
of different combinations if
(iv) there is no restriction on the letters selected,
(v) the letter M must be selected.
28. A garden centre sells 10 different varieties of rose bush. A gardener wishes to buy
6 rose bushes, all of different varieties.
(i) Calculate the number of ways she can make her selection.
Of the 10 varieties, 3 are pink, 5 are red and 2 are yellow. Calculate the number
of ways in which her selection of 6 rose bushes could contain
(ii) no pink rose bush,
(iii) at least one rose bush of each colour.
29. (a) An examination paper contains 12 different questions of which 3 are on trigonometry, 4 are on algebra and 5 are on calculus. Candidates are asked to answer 8 questions.
Calculate:
(i) the number of different ways in which a candidate can select 8
questions if there is no restriction.
(ii) the number of these selections which contain questions on only 2 of
the 3 topics, trigonometry, algebra and calculus.
(b) A fashion magazine runs a competition, in which 8 photographs of dresses
are shown, lettered A, B, C, D, E, F, G and H. Competitors are asked to
submit an arrangement of 5 letters showing their choice of dresses in
descending order of merit. The winner is picked at random from those
competitors whose arrangement of letters agrees with that chosen by a panel
of experts.
(i) Calculate the number of possible arrangements of 5 letters chosen
from the 8.
Calculate the number of these arrangements
(ii) in which A is placed first,
(iii) which contain A.
30. (a) Find the number of different arrangements of the 9 letters of the word
SINGAPORE in which S does not occur as the first letter.
(b) 3 students are selected to form a chess team from a group of 5 girls and 3
boys. Find the number of possible teams that can be selected in which there
are more girls than boys.
Answers
1. (i) 2177280 (ii) 90
2. (i) 33033000 (ii) 86400 (iii) 288
3. (i) 831600 (ii) 900 (iii) 126
4. (i) 362880 (ii) 151200 (iii) 64
5. (i) 5 ways; 120 ways (ii) 10 ways; 240 ways (iii) 10 ways; 120 ways
6. (i) 9828 (ii) 0.0812 (iii) 4494 (iv) 2/3
7. 70, 40
8. 240, 36
9. 48
10. 60, 40
11. 210, 90, 120, 36
12. 150, 80
13. 120, 3/5, 24
14. 120, 40, 12
15. (i) 84 (ii) 34
16. 3780
17. (i) 120 (ii) 116
18. 35
19. (i) 120 (ii) 186 (iii) 90
20. (i) 18564 (ii) 6188
21. 9450
22. 1941912
23. 252252
24. (i) 15 (ii) 75
25. (a) 600 (b) 121
26. (a) 60 (b) 200
27. (i) 720 (ii) 120 (iii) 48 (iv) 15 (v) 10
28. (i) 210 (ii) 7 (iii) 175
29. (a) (i) 495 (ii) 10 (b) (i) 6720 (ii) 840 (iii) 4200
30. (a) 322560 (b) 40
4
UNIT
Probability Distributions
3
UNIT STRUCTURE
4.1 Introduction
4.2 Learning Outcomes
4.3 Discrete v/s Continuous
4.4 Probability Distribution for Discrete Random Variables
4.5 Activities
4.6 Normal Probability Distribution
4.1 Introduction
In this Unit, you will learn about probability distributions. The aim of Unit 4 is to explain the conditions under which each probability distribution can be used. You will also learn how to apply the distributions to solve real-life problems.
4.2 Learning Outcomes
By the end of this Unit, you should be able to do the following:
1. Understand the difference between discrete and continuous random variables.
2. Understand that each probability distribution has certain conditions.
3. Apply the appropriate formula to calculate probabilities.
4. Calculate expectation and variance.
4.3 Discrete v/s Continuous

A discrete random variable is one that can take only certain separate values in a given range, not all the values in that range. For example, if X is the number of faulty printers among five printers, X can only take one of these six values: 0, 1, 2, 3, 4 or 5. X cannot take the value 1.83, as we cannot have 1.83 faulty printers.
A continuous random variable is one that can take any value in a given range. For example, if X is the time taken to repair a fault, X can take any value from 5 minutes onwards (such as 5.37 minutes), not just whole numbers of minutes.
4.4 Probability Distribution for Discrete Random Variables
As in all subjects, some probability distributions have been studied in depth, mainly because they occur often in practice. For each one you must know how to calculate the probabilities, the expectation and the variance.
The Discrete Uniform Distribution
Suppose that a discrete random variable (d.r.v.) X can take the n values x1, x2, x3, ..., xn, and that each value has exactly the same chance to occur. Then X is said to have a discrete uniform distribution.
The probability distribution of X is given by
$f(x) = \dfrac{1}{n} \quad \text{for } x = x_1, x_2, \ldots, x_n$
$E(X) = \dfrac{1}{n}\sum_{i=1}^{n} x_i = \dfrac{x_1}{n} + \dfrac{x_2}{n} + \cdots + \dfrac{x_n}{n}$
$E(X^2) = \dfrac{x_1^2}{n} + \dfrac{x_2^2}{n} + \cdots + \dfrac{x_n^2}{n}$
$\mathrm{Var}(X) = E(X^2) - [E(X)]^2$
Example
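Let’s throw a fair die. Let X = score on the fair die.
X can take any one of the six values: 1, 2, 3, 4, 5, 6.
Since the die is fair, all six scores are equally likely.
Hence, $f(x) = \dfrac{1}{6}$.
Outcome x     1    2    3    4    5    6
P(X = x)     1/6  1/6  1/6  1/6  1/6  1/6
$E(X) = 1 \times \tfrac{1}{6} + 2 \times \tfrac{1}{6} + 3 \times \tfrac{1}{6} + 4 \times \tfrac{1}{6} + 5 \times \tfrac{1}{6} + 6 \times \tfrac{1}{6} = \dfrac{21}{6} = \dfrac{7}{2} = 3.5$
$E(X^2) = 1^2 \times \tfrac{1}{6} + 2^2 \times \tfrac{1}{6} + 3^2 \times \tfrac{1}{6} + 4^2 \times \tfrac{1}{6} + 5^2 \times \tfrac{1}{6} + 6^2 \times \tfrac{1}{6} = \dfrac{91}{6}$
$\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = \dfrac{91}{6} - \left(\dfrac{7}{2}\right)^2 = \dfrac{35}{12} = 2\tfrac{11}{12}$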
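The die example can be reproduced with exact fractions. This Python sketch uses the standard `fractions.Fraction` type so that 7/2, 91/6 and 35/12 come out exactly. The helper name `uniform_summary` is hypothetical.

```python
from fractions import Fraction

def uniform_summary(values):
    """Mean, E(X^2) and variance of a discrete uniform distribution on `values`."""
    n = len(values)
    p = Fraction(1, n)                        # f(x) = 1/n for every value
    mean = sum(x * p for x in values)         # E(X)
    mean_sq = sum(x * x * p for x in values)  # E(X^2)
    return mean, mean_sq, mean_sq - mean ** 2

mean, mean_sq, var = uniform_summary([1, 2, 3, 4, 5, 6])
print(mean, mean_sq, var)  # 7/2 91/6 35/12
```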
The Binomial Probability Distribution
Very often a trial has only two possible outcomes. For example:
(i) tossing a coin results in a Head or a Tail;
(ii) checking an electronic chip shows that the chip is either good or faulty;
(iii) at the beginning of a Ludo game, the outcomes of a throw of the die can be split into two: a six, which you desperately need to start the game, and the rest {1, 2, 3, 4, 5}.
Such trials that have two possible outcomes are called Bernoulli trials. The two outcomes are also referred to as ‘Success’ and ‘Failure’.
A Binomial experiment is obtained when we repeat a Bernoulli trial several times, for example when we throw a fair die five times or test 12 electronic chips. We are usually interested in the number of successes obtained.
Consider throwing a fair die five times. Let us define success as obtaining a six and failure as obtaining one of {1, 2, 3, 4, 5}.
Let us define X as the number of successes (i.e. the number of sixes) in five trials. How many sixes can you obtain when you throw a die five times? 0, 1, 2, 3, 4 or 5.
Then X = 0, 1, 2, 3, 4, 5.
Note X = 0 means we haven’t obtained any six in the 5 trials.
X = 1 means we have obtained only one six in the 5 trials.
X = 2 means we have obtained two sixes in the 5 trials.
X = 5 means we have obtained five sixes in the 5 trials (i.e. every time you obtained a six).
Since success = obtaining a six, the probability of success is
p = P(six) = 1/6
and the probability of failure is q = 1 – p = 1 – (1/6) = 5/6.
p = probability of obtaining a success in one trial.
q = probability of obtaining a failure in one trial.
p + q = 1
Suppose that we have n trials and
(i) the trials are independent;
(ii) each trial has only two possible outcomes: success or failure;
(iii) the probability of obtaining a success in one trial, p, is the same for all trials.
Let X = number of successes in n trials. X is said to follow a Binomial distribution with parameters n and p: X ~ B(n, p)
$P(X = r) = {}^{n}C_{r}\, p^{r} q^{n-r}, \quad r = 0, 1, 2, \ldots, n$
E(X) = np
Var(X) = npq
Note: (i) X ~ B(n, p) is read as “X follows a Binomial distribution with parameters n and p”.
(ii) E(X) means the expected number of successes in n trials.
(iii) P(X = 0) + P(X = 1) + P(X = 2) + ... + P(X = n) = 1
Example
It is well known that 75% of electronic chips manufactured are good.

(a) If 12 chips are manufactured find the probability that
(i) no chip is good;
(ii) only one chip is good;
(iii) at least one chip is good;
(iv) more than eight chips are good;
(b) Find the expected number of good chips; and
(c) Find the variance and standard deviation of the number of good chips.
Solution
Here
success = a good chip
failure = a faulty chip
Probability of success, p = probability of a good chip
∴ p = 75% = 0.75
Probability of failure, q = 1 – p = 1 – 0.75 = 0.25
X = number of good chips out of the 12 chips, so
X = 0, 1, 2, 3, 4, ..., 12
n = 12
(a) (i) P(no good chip)
= P(X = 0) = $^{12}C_{0} \times 0.75^{0} \times 0.25^{12-0}$
= $1 \times 1 \times 0.25^{12}$
= $5.96 \times 10^{-8}$
= 0.000 000 0596
(ii) P(only one good chip)
= $^{12}C_{1} \times 0.75^{1} \times 0.25^{11}$
= $2.146 \times 10^{-6}$
= 0.000 002 146
(iii) P(at least one good chip)
= P(X = 1, 2, 3, ..., 12)
= P(X = 1) + P(X = 2) + ... + P(X = 12)
Note that instead of calculating these 12 probabilities, we can use the fact that
P(X = 0) + P(X = 1) + P(X = 2) + ... + P(X = 12) = 1
∴ P(X = 1) + P(X = 2) + ... + P(X = 12) = 1 – P(X = 0)
P(at least one good chip)
= 1 – P(X = 0)
= 1 – 0.000 000 0596
= 0.999 999 94
(iv) P(more than eight good chips)
= P(X > 8) = P(X = 9, 10, 11, 12)
= P(X = 9) + P(X = 10) + P(X = 11) + P(X = 12)
= $^{12}C_{9}\, 0.75^{9}\, 0.25^{3} + {}^{12}C_{10}\, 0.75^{10}\, 0.25^{2} + {}^{12}C_{11}\, 0.75^{11}\, 0.25^{1} + {}^{12}C_{12}\, 0.75^{12}\, 0.25^{0}$
= 0.2581 + 0.2323 + 0.1267 + 0.0317
= 0.6488
= 64.88%
(b) Expected number of good chips = np
= 12 x 0.75 = 9
(c) Variance of the number of good chips = npq
= 12 x 0.75 x 0.25
= 2.25
Standard deviation = $\sqrt{\text{variance}}$
= $\sqrt{2.25}$
= 1.5
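The chip example can be recomputed from the formula P(X = r) = nCr p^r q^(n-r). This Python sketch uses `math.comb`. The helper name `binomial_pmf` is hypothetical. The printed values should match the hand calculation up to rounding.

```python
from math import comb, sqrt

def binomial_pmf(r, n, p):
    """P(X = r) = nCr p^r q^(n-r) for X ~ B(n, p)."""
    q = 1 - p
    return comb(n, r) * p**r * q**(n - r)

n, p = 12, 0.75
print(binomial_pmf(0, n, p))                              # about 5.96e-08
print(binomial_pmf(1, n, p))                              # about 2.146e-06
print(1 - binomial_pmf(0, n, p))                          # at least one good chip
print(sum(binomial_pmf(r, n, p) for r in range(9, 13)))   # about 0.6488
print(n * p, n * p * (1 - p), sqrt(n * p * (1 - p)))      # 9.0 2.25 1.5
```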
The Poisson Probability Distribution
Named after the French mathematician Siméon Poisson, the Poisson distribution remains
one of the most useful distributions for modelling the number of occurrences of an event
in a given interval of time or space.
For instance, the number of faults reported in 5 hours, the number of defects in 1 m² of cloth,
or the number of accidents in 3 weeks can be modelled by a Poisson distribution if events occur:
(i) randomly and independently;
(ii) “singly” in continuous space or time;
(iii) at a constant rate, in the sense that the mean number of events occurring in an interval
is proportional to the length of the interval. (For example, if the mean number of
calls received in one hour is 5, then the mean number of calls received in two hours is
10.)
Let X = number of events occurring in a given interval, and λ = average or mean number
of events occurring in the interval. Then X can take the values 0, 1, 2, 3, ... with no
upper limit.
X is said to follow a Poisson distribution with parameter λ: X ~ Po(λ)
The probability of X is calculated using
$P(X = r) = e^{-\lambda}\dfrac{\lambda^{r}}{r!}, \quad r = 0, 1, 2, 3, 4, \ldots$
N.B. P(X = 0) + P(X = 1) + P(X = 2) + ... = 1
Expected or mean number of events, E(X) = λ
Variance of the number of events, Var(X) = λ
Example
The number of faults reported at a computer workshop during a working day

follows a Poisson distribution with mean of 6 faults per day.
(a) On a randomly chosen day, find the probability that
(i) no faults are reported
(ii) one fault is reported
(iii) at least one fault is reported
(iv) at most one fault is reported
(b) Find the probability that ten faults are reported in a two-day period.
(c) Find the expected number of faults reported in a five-day week.
Solution
(a) Let X = number of faults reported in one day.

So X can take values 0, 1, 2, 3, . . . . .
X = 0 means no fault has been reported
X = 1 means one fault has been reported
X = 2 means two faults have been reported
Since X is defined for one day, we need the mean, λ, for one day:
λ = mean number of faults reported in one day = 6
Therefore, X follows a Poisson distribution with λ = 6: X ~ Po(6)
(i) P(no faults are reported)
$P(X = 0) = e^{-\lambda}\dfrac{\lambda^{r}}{r!} = e^{-6}\dfrac{6^{0}}{0!} = 0.00248$
(ii) P(one fault is reported)
$P(X = 1) = e^{-\lambda}\dfrac{\lambda^{r}}{r!} = e^{-6}\dfrac{6^{1}}{1!} = 0.0149$
(iii) P(at least one fault is reported)
= P(X ≥ 1)
= P(X = 1) + P(X = 2) + P(X = 3) + ...
Since P(X = 0) + P(X = 1) + P(X = 2) + ... = 1,
P(at least one fault is reported) = P(X ≥ 1) = 1 – P(X = 0) = 1 – 0.002479 = 0.9975
(iv) P(at most one fault is reported)
= P(X ≤ 1) = P(X = 0) + P(X = 1)
$= e^{-\lambda}\dfrac{\lambda^{0}}{0!} + e^{-\lambda}\dfrac{\lambda^{1}}{1!}$
$= e^{-6}\dfrac{6^{0}}{0!} + e^{-6}\dfrac{6^{1}}{1!} = 0.002479 + 0.0149$
= 0.0174
(b) For one day, mean, λ = 6
∴ For two days, mean, λ = 6 × 2 = 12
Let Y = number of faults reported in a two-day period, so Y ~ Po(12).
$P(Y = 10) = e^{-\lambda}\dfrac{\lambda^{r}}{r!} = e^{-12}\dfrac{12^{10}}{10!} = 0.1048$
(c) For one day, mean, λ = 6
∴ For a five-day week, mean, λ = 6 × 5 = 30
Expected number of faults = 30
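The following minimal Python sketch repeats the fault-report calculations with the formula $P(X = r) = e^{-\lambda}\lambda^{r}/r!$. The function name `poisson_pmf` is a hypothetical helper written for this illustration. The sketch shows how parts (b) and (c) simply rescale λ to match the length of the interval.

```python
from math import exp, factorial

def poisson_pmf(r, lam):
    """P(X = r) = e^(-lam) * lam^r / r! for X ~ Po(lam)."""
    return exp(-lam) * lam**r / factorial(r)

lam_day = 6                              # mean number of faults per day

p0 = poisson_pmf(0, lam_day)             # (a)(i)   no faults
p1 = poisson_pmf(1, lam_day)             # (a)(ii)  exactly one fault
p_at_least_one = 1 - p0                  # (a)(iii) complement of "none"
p_at_most_one = p0 + p1                  # (a)(iv)

# (b) The mean is proportional to the length of the interval: 2 days -> 12
p_ten_in_two_days = poisson_pmf(10, 2 * lam_day)

# (c) Expected number of faults in a five-day week
expected_week = 5 * lam_day

print(round(p0, 5), round(p1, 4), round(p_at_least_one, 4),
      round(p_at_most_one, 4), round(p_ten_in_two_days, 4), expected_week)
```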
The Poisson Probability Distribution Approximation to the
Binomial Probability Distribution
X ~ B(n, p) can be approximated by a Poisson distribution if n > 50 and np < 5:
X ~ Po(λ) approximately, where λ = np.
Example
The probability that an electronic chip is faulty is 0.01. Using a suitable approximation,
find the probability that in a randomly chosen set of 60 chips,
(a) none are faulty
(b) at least two are faulty
Solution
A chip is either faulty or not. Among 60 chips, the number of faulty chips can be 0, 1,
2, 3, ..., 60.
X = number of faulty chips among 60
= 0, 1, 2, 3, ..., 60
X ~ B(60, 0.01)
n = 60, p = 0.01
np = 60 × 0.01 = 0.6 < 5 and n = 60 > 50
∴ X ~ Po(0.6) approximately
0
–
(a) P (none are faulty) = P(X = 0) = e 0 !  
(b) P (at least two are faulty) = (X  2)
= 1 – P(X = 0) – P(X = 1)
0 1
= 1 – e– 0 ! – e– 1 !
0.60 0.61
= 1 – e– 0 ! – e– 1 !
= 1 – 0.5488 – 0.3293 = 0.1219
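To see how close the approximation is, the sketch below computes the exact B(60, 0.01) probabilities next to the Po(0.6) values used above. It is a minimal Python illustration using only the standard library. The two helper functions are our own and were written for this comparison.

```python
from math import comb, exp, factorial

def binom_pmf(r, n, p):
    """Exact binomial probability P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def poisson_pmf(r, lam):
    """Poisson probability P(X = r) for X ~ Po(lam)."""
    return exp(-lam) * lam**r / factorial(r)

n, p = 60, 0.01
lam = n * p                                  # lambda = np = 0.6

exact_none = binom_pmf(0, n, p)
approx_none = poisson_pmf(0, lam)

exact_two_plus = 1 - binom_pmf(0, n, p) - binom_pmf(1, n, p)
approx_two_plus = 1 - poisson_pmf(0, lam) - poisson_pmf(1, lam)

print(f"P(X = 0):  exact {exact_none:.4f}, Poisson {approx_none:.4f}")
print(f"P(X >= 2): exact {exact_two_plus:.4f}, Poisson {approx_two_plus:.4f}")
```

In this case the two methods agree to within about 0.002. This is consistent with the rule of thumb n > 50 and np < 5.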
The Geometric Distribution

Let X = number of trials up to and including the trial that results in the first success.
E.g. X = 5 means the first four trials were failures and the fifth trial resulted in a success.
If the trials are random and independent, and the probability of success is the same for all
trials, X is said to follow a Geometric distribution:
X ~ g(p) and $P(X = r) = p\,q^{r-1}, \quad r = 1, 2, 3, \ldots$
$E(X) = \dfrac{1}{p}$ and $\mathrm{Var}(X) = \dfrac{q}{p^{2}}$
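The mean and variance formulas can be checked numerically by summing the series $\sum r\,p\,q^{r-1}$ directly. This Python sketch assumes an illustrative value p = 0.25, which is not taken from the text. It also cuts the infinite sums off at a large r, which is an assumption that holds because the remaining terms are vanishingly small.

```python
def geometric_pmf(r, p):
    """P(X = r) = p * q^(r - 1): the first success occurs on trial r (r = 1, 2, ...)."""
    return p * (1 - p)**(r - 1)

p = 0.25                       # illustrative value, chosen for this sketch only
q = 1 - p
rs = range(1, 2000)            # truncation of the infinite series

total = sum(geometric_pmf(r, p) for r in rs)
mean = sum(r * geometric_pmf(r, p) for r in rs)
var = sum(r * r * geometric_pmf(r, p) for r in rs) - mean**2   # E(X^2) - [E(X)]^2

print(total, mean, 1 / p, var, q / p**2)
```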

Example
State conditions under which a Poisson distribution can be used to give a good
approximation to a binomial distribution.
Packets of fruit gums contain 12 fruit gums altogether, and two flavours, orange and
lime are manufactured. The filling machine mixes the two flavours randomly in the
overall proportion of 2 oranges to 1 lime. Packets are regarded as acceptable provided
they contain at least 1 lime and at least 3 orange flavours. Calculate the probability that
a randomly chosen packet will not be acceptable.
The packets are delivered to shops in boxes of 200 packets. Use a Poisson distribution
to calculate the probability that a box will contain at least 2 unacceptable packets.
Solution
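If X ~ B(n, p) with n very large (n > 50) and p very small (close to 0) such that np < 5, the
Poisson distribution with λ = np gives a good approximation to the Binomial distribution.
Let X be the number of lime flavour fruit gums in a packet of 12, and
let Y be the number of orange flavour fruit gums in a packet of 12, so that X ~ B(12, 1/3) and Y = 12 – X.
A packet is unacceptable if it contains no lime (X = 0) or fewer than 3 oranges (Y ≤ 2). These
two events cannot happen together, so
Required probability
= P(X = 0) + P(Y ≤ 2)
$= \dfrac{1}{3^{12}}\left(2^{12} + 1 + 24 + 264\right)$
$= \dfrac{4385}{3^{12}}$
= 0.008251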

Let W be the number of unacceptable packets in a box of 200.
Then W ~ B(200, 0.008251).
Using the Poisson distribution to approximate,
λ = np
= 200(0.008251)
= 1.6502
P(W ≥ 2) = 1 – P(W < 2)
= 1 – P(W = 0) – P(W = 1)
$= 1 - e^{-1.6502} - 1.6502\,e^{-1.6502}$
= 0.4911
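The value 4385/3¹² can be confirmed exactly with Python's `fractions.Fraction`, which avoids any rounding. The Poisson step for the box of 200 then follows. This is a sketch under the assumptions stated in the solution: each gum is lime with probability 1/3, independently of the others. The helper name `binom_pmf_exact` is our own.

```python
from fractions import Fraction
from math import comb, exp

def binom_pmf_exact(r, n, p):
    """Binomial probability computed with exact rational arithmetic."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

n = 12
p_lime = Fraction(1, 3)            # flavours mixed 2 orange : 1 lime

# Unacceptable: no lime (X = 0) or fewer than 3 oranges (Y <= 2, i.e. X >= 10).
# The two events are disjoint, so their probabilities add.
p_no_lime = binom_pmf_exact(0, n, p_lime)
p_few_orange = sum(binom_pmf_exact(x, n, p_lime) for x in range(10, n + 1))
p_bad = p_no_lime + p_few_orange
print(p_bad, float(p_bad))

# Box of 200 packets: W ~ B(200, p_bad), approximated by Po(200 * p_bad)
lam = 200 * float(p_bad)
p_w_at_least_2 = 1 - exp(-lam) * (1 + lam)
print(round(lam, 4), round(p_w_at_least_2, 4))
```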
Example
At the ‘hot drinks’ counter in a cafeteria both tea and coffee are sold. The number of
cups of coffee sold per minute may be assumed to be a Poisson variable with mean 1.5,
and the number of cups of tea sold per minute may be assumed to be an independent
Poisson variable with mean 0.5.
(i) Calculate the probability that in a given one-minute period exactly one cup of tea
and one cup of coffee are sold.
(ii) Calculate the probability that in a given three-minute period fewer than 5 drinks
altogether are sold.
(iii) In a given one-minute period exactly three drinks are sold. Calculate the
probability that these are all cups of coffee.
Solution
Let X and Y be the numbers of cups of coffee and tea sold per minute respectively.
Then X ~ Po(1.5) and Y ~ Po(0.5).
(i) Since X and Y are independent, the required probability
= P(X = 1) P(Y = 1)
$= (1.5e^{-1.5})(0.5e^{-0.5})$
= 0.102
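(ii) Let W be the number of drinks sold in a period of 3 minutes. The sum of independent
Poisson variables is again Poisson, with mean equal to the sum of the means.
Then W ~ Po[3(1.5 + 0.5)]
W ~ Po(6)
Required probability
= P(W < 5) = P(W ≤ 4)
$= e^{-6} + 6e^{-6} + \dfrac{6^{2}e^{-6}}{2} + \dfrac{6^{3}e^{-6}}{6} + \dfrac{6^{4}e^{-6}}{24}$
$= e^{-6}\left(1 + 6 + 18 + 36 + 54\right)$
= 0.285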
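(iii) P(3 drinks are sold in one minute)
= P(X + Y = 3), where X + Y ~ Po(2)
$= \dfrac{2^{3}e^{-2}}{6} = \dfrac{4}{3}e^{-2}$
All three drinks are coffee when X = 3 and Y = 0, so
Required probability
$= \dfrac{P(X = 3)\,P(Y = 0)}{P(X + Y = 3)} = \dfrac{\frac{1.5^{3}}{6}e^{-1.5}\,e^{-0.5}}{\frac{4}{3}e^{-2}} = \dfrac{1.5^{3}}{6} \times \dfrac{3}{4}$
= 0.422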
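The three parts can be reproduced in a few lines of Python, as a minimal sketch using only `math`. The helper `poisson_pmf` is our own. The last line highlights a useful fact. Given that X + Y = 3, the number of coffees follows B(3, 1.5/2), so the answer to (iii) is simply (0.75)³.

```python
from math import exp, factorial

def poisson_pmf(r, lam):
    return exp(-lam) * lam**r / factorial(r)

coffee, tea = 1.5, 0.5                       # mean cups sold per minute

# (i) independence: multiply the two probabilities
p_i = poisson_pmf(1, coffee) * poisson_pmf(1, tea)

# (ii) three minutes of combined sales: W ~ Po(3 * (1.5 + 0.5)) = Po(6)
lam_w = 3 * (coffee + tea)
p_ii = sum(poisson_pmf(r, lam_w) for r in range(5))          # P(W <= 4)

# (iii) conditional probability P(X = 3, Y = 0) / P(X + Y = 3)
p_iii = poisson_pmf(3, coffee) * poisson_pmf(0, tea) / poisson_pmf(3, coffee + tea)

print(round(p_i, 3), round(p_ii, 3), round(p_iii, 3), (coffee / (coffee + tea))**3)
```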

Example
As part of its sales campaign for a new beauty preparation, a cosmetics manufacturer has
a counter in a large store at which each prospective customer is given a free 10-minute
individual session with a personal assistant to try the preparation. The store opens at
09 00. The first 10-minute session commences at 09 10 and, from then on, there are 50
sessions which run continuously throughout the day. It is estimated that the number of
customers arriving at the counter per hour has a Poisson distribution with a mean of 15.
Evaluate, to the nearest integer in each case, the expected number of 10-minute sessions
in the day during which 0, 1, 2, 3 or more than 3 customers arrive at the counter.
Any assistant who has no customer waiting at the beginning of a 10-minute session is
allowed to have the whole of that session as a rest period. It is found that any customer
who would have to wait longer than the start of the next 10-minute session goes away,
that no customers buy the preparation without a trial, and that 50% of the customers who
do have a trial buy the preparation. The manufacturer makes a profit of £1.60 for each
sale and a loss on materials of £0.10 for every trial where no sale results. The daily wage
of an assistant is £20. Prove that, if the counter has two assistants, then the expected
daily profit is £21.50.
Find the expected daily profit if the counter has three assistants.
Solution
Let X be the number of customers arriving at the counter during a 10-minute session.
The mean is 15 per hour, so for 10 minutes λ = 15/6 = 2.5 and X ~ Po(2.5).
P(X > 3) = 1 – P(X ≤ 3)
Expected no. of 10-minute sessions with 0 customers = 50P(X = 0) = 4
Expected no. of 10-minute sessions with 1 customer = 50P(X = 1) = 10
Expected no. of 10-minute sessions with 2 customers = 50P(X = 2) = 13
Expected no. of 10-minute sessions with 3 customers = 50P(X = 3) = 11
Expected no. of 10-minute sessions with > 3 customers = 50P(X > 3) = 12
(to the nearest integer in each case)
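Half of all trials result in a sale, so the expected profit per trial is
0.5 × £1.60 – 0.5 × £0.10 = £0.75.
With 2 assistants, a session with 1 customer gives 1 trial and a session with 2 or more
customers gives 2 trials, so the expected daily profit
= 0.75 × [1 × 10 + 2 × (13 + 11 + 12)] – 20 × 2 = 0.75 × 82 – 40 = 61.5 – 40
= £21.50
With 3 assistants, a session with 3 or more customers gives 3 trials, so the expected daily profit
= 0.75 × [1 × 10 + 2 × 13 + 3 × (11 + 12)] – 20 × 3 = 0.75 × 105 – 60
= 78.75 – 60
= £18.75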
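The profit calculation can be expressed as a small Python function, as a sketch under the assumptions of the solution. Each session serves min(customers, assistants) trials. The "more than 3" group is counted as 4 customers, which is only valid for up to 4 assistants. The session counts are rounded to integers exactly as in the text. The function name `expected_daily_profit` is hypothetical.

```python
from math import exp, factorial

def poisson_pmf(r, lam):
    return exp(-lam) * lam**r / factorial(r)

lam = 15 / 6                      # 15 customers per hour -> 2.5 per 10-minute session
sessions = 50

probs = [poisson_pmf(r, lam) for r in range(4)]
probs.append(1 - sum(probs))      # last entry is P(X > 3)

# Expected numbers of sessions with 0, 1, 2, 3 and more than 3 customers,
# rounded to the nearest integer as in the worked solution.
counts = [round(sessions * pr) for pr in probs]

profit_per_trial = 0.5 * 1.60 - 0.5 * 0.10   # half of all trials end in a sale

def expected_daily_profit(assistants, wage=20):
    """Daily profit when each session serves min(customers, assistants) trials."""
    assert 1 <= assistants <= 4   # "more than 3" is treated as 4 customers
    customers = [0, 1, 2, 3, 4]
    trials = sum(c * min(k, assistants) for c, k in zip(counts, customers))
    return profit_per_trial * trials - wage * assistants

print(counts)
for k in (2, 3):
    print(k, round(expected_daily_profit(k), 2))
```

The £21.50 figure relies on the rounded session counts. Using the unrounded expectations 50 × P(X = r) instead gives a slightly smaller value, roughly £21.15.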
Example
In each batch of manufactured articles the proportion of defective articles is p. From

each batch a random sample of nine is taken and each of the nine articles is examined. If
two or more of the nine articles are found to be defective the batch is rejected; otherwise
it is accepted. Prove that the probability that a batch is accepted is
$(1 - p)^{8}(1 + 8p)$
It is decided to modify the sampling scheme so that, when one defective is found in the
sample, a second sample of nine is taken and the batch rejected if this contains any
defectives. With this exception the original scheme is continued. Find an expression in
terms of p for the probability that a batch is accepted.
For this modified scheme evaluate the average number sampled per manufactured batch
over a large number of batches when p has the value 0.1.
Solution
Let X be the number of defective articles in a sample of 9.
Then X ~ B(9, p).
P(a batch is accepted)
= P(X < 2)
= P(X = 0) + P(X = 1)
$= (1 - p)^{9} + 9p(1 - p)^{8}$
$= (1 - p)^{8}(1 - p + 9p)$
$= (1 - p)^{8}(1 + 8p)$ (Proved)
In the modified sampling scheme, the second sample is independent of the first, so
P(a batch is accepted)
= P(X = 0) + P(X = 1) P(X = 0)
$= (1 - p)^{9} + 9p(1 - p)^{8}(1 - p)^{9}$
$= (1 - p)^{9}\left[1 + 9p(1 - p)^{8}\right]$
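A second sample of nine is taken only when the first sample contains exactly one
defective, so the expected number sampled per batch
= 9 P(X ≠ 1) + 18 P(X = 1)
$= 9\left[1 - 9p(1 - p)^{8}\right] + 18\left[9p(1 - p)^{8}\right]$
$= 9 + 81p(1 - p)^{8}$
When p = 0.1:
$= 9 + 81(0.1)(0.9)^{8}$
$= 9\left[1 + (0.9)^{9}\right]$
= 12.5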
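The three expressions derived above can be written as small Python functions and checked against the direct binomial sums. This is a minimal sketch, and the function names are our own. The first function uses the factorised form that was proved, so comparing it with $(1-p)^9 + 9p(1-p)^8$ is a quick numerical check of the algebra.

```python
def p_accept_original(p):
    """Accept when fewer than 2 of the 9 are defective: (1 - p)^8 (1 + 8p)."""
    return (1 - p)**8 * (1 + 8 * p)

def p_accept_modified(p):
    """Accept on 0 defectives, or on exactly 1 followed by a clean second sample."""
    p0 = (1 - p)**9
    p1 = 9 * p * (1 - p)**8
    return p0 + p1 * p0

def expected_sampled(p):
    """A second sample of 9 is drawn only when the first shows exactly 1 defective."""
    p1 = 9 * p * (1 - p)**8
    return 9 * (1 - p1) + 18 * p1            # simplifies to 9 + 81 p (1 - p)^8

print(round(p_accept_original(0.1), 4), round(p_accept_modified(0.1), 4),
      round(expected_sampled(0.1), 2))
```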
Example
In a data transmission link between two computers, each character (ie letter, digit or
other symbol) is transmitted separately and there is a constant small probability that a
transmission error will occur; the mean failure rate is 1 error per ten million characters
transmitted.
(i) If 2 500 000 characters are transmitted, calculate the probability of no errors
occurring, and the probability of more than 2 errors occurring.
(ii) N characters of data are to be transmitted. Find an approximate value for the
maximum value of N such that the probability of no errors occurring is at least
0.999.
(iii) If an error occurs in transmitting a character, there is a probability p that it will be
discovered and corrected automatically. Prove that if N characters are transmitted
the probability of no errors finally remaining is $e^{-\lambda(1 - p)}$, where $\lambda = N \times 10^{-7}$.
Solution
(i) Let X be the number of errors occurring in the 2.5 million characters transmitted.
Then $X \sim B(2\,500\,000,\ 10^{-7})$.
Since n is very large and p is very small, with
$np = 2\,500\,000 \times 10^{-7} = 0.25 < 5,$
we use the Poisson distribution as an approximation with λ = np = 0.25:
$P(X = 0) = e^{-0.25} = 0.7788$
P(X > 2) = 1 – P(X = 0) – P(X = 1) – P(X = 2)
$= 1 - e^{-0.25}\left[1 + 0.25 + \dfrac{(0.25)^{2}}{2}\right]$
= 0.00216
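(ii) Let Y be the number of errors occurring in N characters, so $Y \sim \mathrm{Po}(N \times 10^{-7})$ approximately.
We want P(Y = 0) ≥ 0.999:
$e^{-N/10^{7}} \ge 0.999$
$-\dfrac{N}{10\,000\,000} \ge \ln 0.999$
$N \le 10\,000\,000 \ln\left(\dfrac{1}{0.999}\right)$
N ≤ 10005.003
∴ the maximum value of N is 10005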
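(iii) $\lambda = N \times 10^{-7}$
If k errors occur, no errors remain only if all k are corrected, which has probability $p^{k}$. Hence
P(no errors finally remaining)
$= P(Y = 0) + p\,P(Y = 1) + p^{2}P(Y = 2) + \cdots + p^{N}P(Y = N)$
$= e^{-\lambda} + p\lambda e^{-\lambda} + \dfrac{(p\lambda)^{2}}{2!}e^{-\lambda} + \cdots + \dfrac{(p\lambda)^{N}}{N!}e^{-\lambda}$
$= e^{-\lambda}\left[1 + p\lambda + \dfrac{(p\lambda)^{2}}{2!} + \cdots + \dfrac{(p\lambda)^{N}}{N!}\right]$
$\approx e^{-\lambda}e^{p\lambda}$ (if N is large)
$= e^{-\lambda(1 - p)}$ (Proved)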
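The following Python sketch checks all three parts numerically under the same Poisson approximation. It uses a rate of 10⁻⁷ errors per character. For (iii), the sum $\sum_k p^k P(Y = k)$ is built term by term and compared with $e^{-\lambda(1-p)}$. The helper name and the test values λ = 0.25, p = 0.3 are assumptions chosen for illustration only.

```python
from math import exp, floor, log

rate = 1e-7                                    # mean errors per character

# (i) 2 500 000 characters: X ~ Po(0.25) approximately
lam = 2_500_000 * rate
p_none = exp(-lam)
p_more_than_2 = 1 - exp(-lam) * (1 + lam + lam**2 / 2)

# (ii) largest N with exp(-N * rate) >= 0.999, i.e. N <= -ln(0.999) / rate
n_max = floor(-log(0.999) / rate)

# (iii) sum over k of p^k * P(Y = k); each term of P(Y = k) comes from the previous one
def p_no_errors_remaining(lam, p, terms=60):
    total, term = 0.0, exp(-lam)               # term starts as P(Y = 0)
    for k in range(terms):
        total += p**k * term
        term *= lam / (k + 1)                  # P(Y = k + 1) = P(Y = k) * lam / (k + 1)
    return total

print(round(p_none, 4), round(p_more_than_2, 5), n_max)
print(p_no_errors_remaining(0.25, 0.3), exp(-0.25 * (1 - 0.3)))
```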

4.5 Activity
1. Groups of six people are chosen at random and the number, x, of people in each
group who normally wear glasses is recorded. The results obtained from 200
groups of six are shown in the table.
No. in group wearing glasses (x) 0 1 2 3 4 5 6
No. of occurrences 17 53 65 45 18 2 0
Calculate, from the above data, the mean value of x.
Assuming that the situation can be modelled by a binomial distribution having the
same mean as that calculated above, state the appropriate values for the binomial
parameters n and p. Calculate the theoretical frequencies corresponding to those
in the table.
2. The integer-valued random variable X is such that
$P(X = r) = \dfrac{e^{-m}m^{r}}{r!}, \quad r = 0, 1, 2, \ldots,$
where m is a positive constant. Prove that E(X) = m.
The following table gives a record of the number of goals scored by a certain
football team in each of 450 matches:
Goals per match 0 1 2 3 4 5 6 or more
Frequency 201 163 65 18 2 1 0
Calculate the expected frequencies for a Poisson distribution having the same
mean number of goals per match.
3. Each week a security firm transports a large sum of money between two places.
The day on which the journey is made is varied at random, and in any week each
of the five days from Monday to Friday is equally likely to be chosen.
(i) Calculate the probability that, in a period of 10 weeks, Friday will be chosen
at least 3 times.
(ii) The event that, in a 4-week period, the same day is chosen on all four
occasions is denoted by S. Show that the probability of S occurring is 0.008.
Use an appropriate approximation to estimate the probability that, in one
hundred 4-week periods, the event S will occur at least 3 times.
4. At a certain depot a constant stock level of 10 000 bottles is maintained. The
probability that at this depot on any given day a particular bottle will be broken is
0.00012. Find, giving three significant figures in your answers,
(i) the probability that on any given day at least 2 bottles will be broken,
(ii) the probability that no bottles will be broken on 2 consecutive days,
(iii) the probability that on a day on which at least one bottle is broken there will
be exactly 3 bottles broken,
(iv) the average number of bottles broken on those days on which at least one
bottle is broken.

5. An urn contains a very large number of balls, of which a proportion p are red and
a proportion (1 – p) are white. The balls are indistinguishable apart from colour.
An experiment consists of drawing n balls at random from the urn and counting
the number of red balls drawn.
(i) For each of the following cases, state with a reason whether you would
expect a Poisson distribution to give a good estimate of the probability:
(a) P (1 red is drawn) when n = 5 and p = 0.2,
(b) P (1 red is drawn) when n = 100 and p = 0.01
(c) P (50 reds are drawn) when n = 100 and p = 0.5
(ii) Given that n = 20 and p = 0.2, use a binomial distribution to find the
probability that fewer than 3 reds are drawn, and calculate the percentage
error in using a Poisson distribution as an approximation in this case.
(iii) Given that n = 100 and p = 0.01, find the probability that in three successive
experiments a total of 5 red balls will be drawn.
6. An administrative centre has 3 independent telephone lines, A, B and C. During

the period 10 00 to 10 15 hours on a working day, the numbers of telephone calls
coming in on lines A, B and C are X1, X2, and X3 respectively, and each of these
independent random variables has a Poisson distribution. Also, it is known that
E(X1) = 1.2, E(X2) = 1.5 and that on 1 in 200 working days there are no telephone
calls made to the centre between 10 00 and 10 15 hours. Find, correct to three
significant figures,
(i) the value of E(X3),
(ii) the probability that, on any given working day, there will be exactly one
incoming call to the centre between 10 00 and 10 15 hours,
(iii) the probability that, on not more than 2 out of 100 working days, there will
be exactly one incoming call to the centre between 10 00 and 10 15 hours.
7. A coin is biased in such a way that, on any throw, P(head) = p and P(tail) = q,
where p + q = 1. The random variable X denotes the number of heads resulting
from three throws of this coin. Tabulate the probability distribution of X, and show
from first principles (i.e. without quoting any results relating to the binomial
distribution) that E(X) = 3p and that Var(X) = 3pq.
In an experiment, three throws of the coin were repeated 1000 times. The numbers
of times each value of X occurred are shown in the table.
No. of heads 0 1 2 3
Frequency 90 329 412 169
Calculate the mean number of heads from these figures, and use it to estimate the
value of p.
Hence, estimate the probability that five throws of this coin will result in at least
one head.
8. An advertising display contains a large number of light bulbs which are continually
being switched on and off. Individual lights fail at random times, and each day the
display is inspected and any failed lights are replaced. The number of lights that
fail in any one-day period has a Poisson distribution with mean
2.2. Calculate
(i) the probability that no light will need to be replaced on a particular day,
(ii) the probability that at least four lights will need to be replaced on a
particular day,

(iii) the least number of consecutive days after which the probability of at least
one light having to be replaced exceeds 0.9999.
Calculate also the probability that, in a period of seven days, at least four lights
will need to be replaced on at least two days.
9. A randomly chosen doctor in general practice sees, on average, one case of a

broken nose per year and each case is independent of other similar cases.
(i) Regarding a month as a twelfth part of a year,
(a) show that the probability that, between them, three such doctors see
no cases of a broken nose in a period of one month is 0.779, correct
to three significant figures,
(b) find the variance of the number of cases seen by three such doctors
in a period of six months.
(ii) Find the probability that, between them, three such doctors see at least three
cases in one year.
(iii) Find the probability that, of three such doctors, one sees three cases and the
other two see no cases in one year.
10. In a particular city, a person selected at random among those who have heard a
rumour has a probability of 0.75 of believing the rumour. Find the probability that
(a) the eighth person to hear the rumour will be the fifth to believe it;
(b) the fifteenth person to hear the rumour will be the tenth person to believe it.
11. Stephen shoots at a target. The probability that he misses the target is 5%. Find
the probability that he will miss the target for the second time on the fifteenth shot.
12. At a factory, electronic toys are packed in boxes containing 18 toys each. An
inspector will accept the box if two randomly chosen toys from the box are not
defective; otherwise the entire box is inspected. What is the probability that a
randomly chosen box will be accepted without further inspection if the box
contains
(i) Four defective toys;
(ii) Eight defective toys;
(iii) Twelve defective toys.
13. In a construction firm, there are 240 male and 60 female employees. Six employees
are chosen at random to represent the firm. Find the probability that four of the six
will be males.
Answers
1. 17.56, 52.67, 65.84, 43.90, 16.46, 3.29, 0.27
2. 202.2, 161.76, 64.70, 17.25, 3.45, 0.55, 0.08
3. (i) 0.322 (ii) 0.0474
4. (i) 0.337 (ii) 0.0907 (iii) 0.124 (iv) 1.72
5. (ii) 15.5% (iii) 0.101
6. (i) 2.60 (ii) 0.0265 (iii) 0.506

7. Mean = 1.66; p = 0.553; 0.982
8. (i) 0.111 (ii) 0.181 (iii) 5; 0.369
9. (a) 0.779 (b) 1.5 (ii) 0.577 (iii) 0.025
10. (a) 0.1298 (b) 0.1101
11. 0.0180
12. (i) 0.5948 (ii) 0.2941 (iii) 0.0980
13. 0.2478
4.6 Normal Probability Distribution

A continuous random variable X has a normal distribution if it has the probability density function
$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^{2}/(2\sigma^{2})}, \quad -\infty < x < \infty,$
where μ is the mean of X and σ is the standard deviation of X.
[Figure: graph of y = f(x), a symmetrical bell-shaped curve centred at x = μ]
It is important that you note that
(i) the distribution is symmetrical about the mean μ;
(ii) mode = mean = median = μ;
(iii) the bell shape shows that there is a concentration of values around the mean μ,
and there are fewer and fewer observations as the distribution moves away from
the mean;
(iv) the total area under the curve is one;
(v) the probability that X lies between a and b is the area under the curve between x = a and x = b.
[Figure: normal curve with the area between x = a and x = b shaded]
Shaded area = P(a < X < b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a ≤ X ≤ b)
(for a continuous random variable, including or excluding an end point does not change the probability).
Unfortunately, this area cannot be evaluated using standard techniques of integration.
Thus, statistical tables are used.
The statistical normal table gives the area under the standard normal curve, which is
associated with the standard normal variable Z, having mean μ = 0 and standard
deviation σ = 1. Thus we say that Z follows a Normal distribution with parameters
μ = 0 and σ² = 1:
Z ~ N(0, 1²)
The good news is that any normal distribution N(μ, σ²) can be mapped onto the standard
normal variable by the process of standardisation.
If X ~ N(μ, σ²) and Z ~ N(0, 1²), then
$Z = \dfrac{X - \mu}{\sigma}$
[Figure: standard normal curve with the area to the left of Z = a shaded]
Shaded area = P(Z < a) = P(Z ≤ a) = Φ(a)
Unshaded area = P(Z > a) = P(Z ≥ a) = 1 – Φ(a)
Properties: (1) Φ(–a) = 1 – Φ(a)
(2) 0 ≤ Φ(a) ≤ 1
P(Z < a) = Φ(a)
P(Z > a) = 1 – Φ(a)
P(a < Z < b) = P(a ≤ Z < b) = P(a < Z ≤ b) = P(a ≤ Z ≤ b)
= Φ(b) – Φ(a)
The statistical table of the standard normal distribution lists values of z together with
the corresponding values of Φ(z).
Example
(i) P(Z > 1) = 1 – Φ(1) = 1 – 0.8413 = 0.1587
(ii) P(Z < –2.0) = Φ(–2.0) = 1 – Φ(2.0)
= 1 – 0.9772 = 0.0228
(iii) P(–1.5 < Z < 0.5) = Φ(0.5) – Φ(–1.5)
= Φ(0.5) – [1 – Φ(1.5)]
= 0.6915 – [1 – 0.9332]
= 0.6247
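When no printed table is at hand, Φ(z) can be computed from the error function using the standard identity $\Phi(z) = \tfrac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]$. `math.erf` is part of Python's standard library. The function name `phi` is our own. The sketch reproduces the three table look-ups above and checks property (1).

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF: Phi(z) = P(Z <= z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1 + erf(z / sqrt(2)))

p_i = 1 - phi(1.0)                 # P(Z > 1)
p_ii = phi(-2.0)                   # P(Z < -2), equal to 1 - phi(2.0)
p_iii = phi(0.5) - phi(-1.5)       # P(-1.5 < Z < 0.5)

print(round(p_i, 4), round(p_ii, 4), round(p_iii, 4))
print(phi(-1.3), 1 - phi(1.3))     # property (1): Phi(-a) = 1 - Phi(a)
```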
Example
The time taken to complete the production of a toy at a certain factory follows a normal
distribution with mean 56 minutes and standard deviation 10 minutes. Find the
probability that the time taken to complete a randomly chosen toy is
(i) more than 68 minutes
(ii) between 56 and 65 minutes
Solution
Step 1 Identify the variable that follows a normal distribution and denote it by X.
So here, X = time taken to complete the production of a toy
Step 2 Find μ and σ.
The mean value of X, μ = 56
The standard deviation of X, σ = 10
Therefore X follows a normal distribution with parameters
μ = 56 and σ² = 100: X ~ N(56, 10²)
or X ~ N(56, 100)
Step 3 Use the standardisation process.
(i) P(time taken is more than 68 minutes)
= P(X > 68)
We use the standardisation process: we simply replace X by Z, subtract the mean (μ)
from 68 and divide by the standard deviation.
$= P\left(Z > \dfrac{68 - \text{mean}}{\text{standard deviation}}\right) = P\left(Z > \dfrac{68 - 56}{10}\right) = P(Z > 1.2)$
Step 4 Use the facts that P(Z > a) = 1 – Φ(a) and P(Z < a) = Φ(a) to write the
probability in terms of Φ:
P(Z > 1.2) = 1 – Φ(1.2)
Step 5 Use the statistical tables.
z Φ(z)
1.20 0.8849
Hence P(Z > 1.2) = 1 – Φ(1.2)
= 1 – 0.8849 = 0.1151
We can also write it as a percentage:
P(X > 68) = P(Z > 1.2) = 0.1151
= 11.51%
So there is an 11.51% chance that it takes more than 68 minutes to produce a
toy.
(ii) P(56 < time taken to complete a toy < 65)
= P(56 < X < 65)
$= P\left(\dfrac{56 - \text{mean}}{\text{std. deviation}} < Z < \dfrac{65 - \text{mean}}{\text{std. deviation}}\right)$
$= P\left(\dfrac{56 - 56}{10} < Z < \dfrac{65 - 56}{10}\right)$
= P(0 < Z < 0.9)
= Φ(0.9) – Φ(0)
= 0.8159 – 0.5000 = 0.3159
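Python's standard library provides `statistics.NormalDist` (Python 3.8+), whose `cdf` method gives P(X ≤ x) directly for any mean and standard deviation. The following sketch reproduces both answers. It also repeats part (i) through explicit standardisation, mirroring Steps 3 to 5.

```python
from statistics import NormalDist

X = NormalDist(mu=56, sigma=10)       # production time of a toy, in minutes

p_more_than_68 = 1 - X.cdf(68)        # P(X > 68) = 1 - Phi(1.2)
p_56_to_65 = X.cdf(65) - X.cdf(56)    # P(56 < X < 65) = Phi(0.9) - Phi(0)

# The same result via Z = (X - mu) / sigma and the standard normal N(0, 1)
Z = NormalDist()
z = (68 - X.mean) / X.stdev

print(round(p_more_than_68, 4), round(p_56_to_65, 4), z, round(1 - Z.cdf(z), 4))
```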
Example
The random variable X is normally distributed with mean μ and variance σ². Given that
P(X > 58.39) = 0.0217 and P(X < 41.82) = 0.0287, find μ and σ.
Solution
P(X > 58.39) = 0.0217, so P(X < 58.39) = 1 – 0.0217 = 0.9783
$\dfrac{58.39 - \mu}{\sigma} = \Phi^{-1}(0.9783) = 2.02$ (using the table)
58.39 – μ = 2.02σ (1)
$P(X < 41.82) = P\left(Z < \dfrac{41.82 - \mu}{\sigma}\right) = 0.0287$
$\dfrac{41.82 - \mu}{\sigma} = \Phi^{-1}(0.0287)$
Since $\Phi^{-1}(a) = -\Phi^{-1}(1 - a)$, which is useful when a is less than 0.5,
$\Phi^{-1}(0.0287) = -\Phi^{-1}(1 - 0.0287) = -\Phi^{-1}(0.9713) = -1.90$
41.82 – μ = –1.90σ (2)
Solve (1) and (2) simultaneously. Subtracting (2) from (1):
16.57 = 3.92σ
σ = 16.57 / 3.92 = 4.227
From (1): 58.39 – μ = 2.02(4.227) = 8.539
∴ μ = 58.39 – 8.539 = 49.851
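The same two equations can be solved with `statistics.NormalDist().inv_cdf`, which plays the role of Φ⁻¹. This is a minimal sketch. Because `inv_cdf` is not rounded to two decimals (it gives about 2.0199 and −1.9003 instead of 2.02 and −1.90), the results agree with the hand calculation to about three decimal places rather than exactly.

```python
from statistics import NormalDist

Z = NormalDist()                          # standard normal N(0, 1)

# P(X > 58.39) = 0.0217  ->  (58.39 - mu) / sigma = Phi^{-1}(0.9783)
z_upper = Z.inv_cdf(1 - 0.0217)
# P(X < 41.82) = 0.0287  ->  (41.82 - mu) / sigma = Phi^{-1}(0.0287)
z_lower = Z.inv_cdf(0.0287)

# Subtract equation (2) from (1) to eliminate mu, then back-substitute
sigma = (58.39 - 41.82) / (z_upper - z_lower)
mu = 58.39 - z_upper * sigma

print(round(z_upper, 4), round(z_lower, 4), round(sigma, 3), round(mu, 3))
```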
Approximating the Binomial Probability Distribution by the
Normal Probability Distribution
Suppose that 100 people are selected at random in a village where 20% of the people are
known to be left-handed. Find the probability that there will be at least 30 left-handed
people in the random sample of 100 villagers.
If we define X as the number of left-handed people among the 100, then the
probability of obtaining at least 30 left-handed people is written as P(X ≥ 30),
where X ~ B(100, 0.2).
Now, P(X ≥ 30) = P(X = 30) + P(X = 31) + ... + P(X = 100)
= 1 – {P(X = 0) + P(X = 1) + ... + P(X = 29)}
This is very tedious to calculate. In this situation, it is preferable to use the normal
distribution to find approximate values for the Binomial probabilities.
However, the following conditions must be satisfied:
If X ~ B(n, p) with np > 5 and nq > 5, then X ~ N(np, npq) approximately.
That is, B(n, p) must have np > 5 and nq > 5 in order to approximate B(n, p) by
N(np, npq). As we are approximating the discrete B(n, p) by a continuous Normal
distribution, we must also carry out a continuity correction, which means adding or
subtracting ½ as follows:
P(X ≤ a) becomes P(X < a + ½) after continuity correction
P(X < a) becomes P(X < a – ½)
P(X ≥ a) becomes P(X > a – ½)
P(X > a) becomes P(X > a + ½)
P(X = a) becomes P(a – ½ < X < a + ½)
Example
In a village, 20% of people are left-handed. Find the probability that there will be at least
30 left-handed people in a random sample of 100 villagers.
Solution
Let X = number of left-handed people in a random sample of 100.
X = 0, 1, 2, 3, ..., 100
X ~ B(100, 20%) = B(100, 0.2)
n = 100, p = 0.2
q = 1 – p = 1 – 0.2 = 0.8
np = 100 × 0.2 = 20 > 5
nq = 100 × 0.8 = 80 > 5
∴ X ~ N(μ = np = 20, σ² = npq = 16) approximately
P(there will be at least 30 left-handed people)
= P(X ≥ 30) ≈ P(X > 29.5) (continuity correction)
We shall use standardisation to calculate the probability.
$P(X > 29.5) = P\left(Z > \dfrac{29.5 - \text{mean}}{\text{std. deviation}}\right) = P\left(Z > \dfrac{29.5 - 20}{4}\right)$
= P(Z > 2.375)
= 1 – Φ(2.375)
Φ(2.375) lies between two table values, so we use linear interpolation:
z: 2.360 2.375 2.380
Φ: 0.9909 Φ(2.375) 0.9913
$\dfrac{\Phi(2.375) - 0.9909}{2.375 - 2.360} = \dfrac{0.9913 - 0.9909}{2.380 - 2.360}$
∴ Φ(2.375) = 0.9912
Hence P(X ≥ 30) ≈ 1 – 0.9912 = 0.0088
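To judge the quality of the approximation, the following Python sketch compares the normal value with continuity correction against the exact binomial tail. The exact tail is summed term by term with `math.comb`. It is an illustrative check only. Because the table value is interpolated, the result agrees with the hand calculation to about four decimal places.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.2
mu, sigma = n * p, sqrt(n * p * (1 - p))         # 20 and 4

# Normal approximation with continuity correction: P(X >= 30) ~ P(X > 29.5)
approx = 1 - NormalDist(mu, sigma).cdf(29.5)

# Exact binomial tail P(X >= 30) for comparison
exact = sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(30, n + 1))

print(round(approx, 4), round(exact, 4))
```

Both values are around 1%. The exact binomial tail is somewhat larger because B(100, 0.2) is slightly skewed to the right.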
Example
If 30% of the biscuits produced by a factory are known to contain peanuts, find the probability
that in a random sample of 100 biscuits, there will be 33 or more but fewer than 37 biscuits
containing peanuts.
Solution
Let X = number of biscuits containing peanuts among the 100 biscuits
= 0, 1, 2, 3, ..., 100
X ~ B(100, 30%) = B(100, 0.3)
n = 100, p = 0.3
q = 1 – p = 1 – 0.3 = 0.7
np = 100 × 0.3 = 30 > 5
nq = 100 × 0.7 = 70 > 5
∴ X ~ N(μ = np = 30, σ² = npq = 21) approximately
P(33 ≤ X < 37)
= P(33 – 0.5 < X < 37 – 0.5) (continuity correction)
= P(32.5 < X < 36.5)
$= P\left(\dfrac{32.5 - 30}{\sqrt{21}} < Z < \dfrac{36.5 - 30}{\sqrt{21}}\right)$ (standardise)
= P(0.55 < Z < 1.42)
= Φ(1.42) – Φ(0.55)
= 0.9222 – 0.7088
= 0.2134
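The same interval calculation in Python, as a sketch, shows how the continuity correction maps the integer range {33, 34, 35, 36} onto the interval (32.5, 36.5). Because the z-values are not rounded to 0.55 and 1.42 here, the normal result comes out near 0.2146 rather than 0.2134. The exact binomial sum is included for comparison.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.3
approx_dist = NormalDist(n * p, sqrt(n * p * (1 - p)))      # N(30, 21)

# P(33 <= X < 37) covers X = 33, 34, 35, 36, so it corresponds to P(32.5 < X < 36.5)
approx = approx_dist.cdf(36.5) - approx_dist.cdf(32.5)

exact = sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(33, 37))

print(round(approx, 4), round(exact, 4))
```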

Approximating the Poisson Probability Distribution
by the Normal Probability Distribution
Just as we approximate B(n, p) by a Normal distribution, we can use the Normal
distribution to approximate the Poisson distribution provided λ > 15:
If X ~ Po(λ) with λ > 15, then X ~ N(λ, λ) approximately.
We must also carry out the continuity correction, as we are approximating the discrete
Poisson distribution by the continuous Normal distribution.
Example
At a certain factory, machine breakdowns occur at an average of 3 per day. Assuming that
breakdowns occur at a constant rate, randomly in time and independently of one another,
calculate the probability that in a seven-day week, more than 27 breakdowns occur.
Solution
Let X = number of breakdowns in a 7-day week.
If the average number of breakdowns in one day is three, then in a seven-day week the
average number of breakdowns will be 7 × 3 = 21.
Therefore, X ~ Po(21)
λ = 21 > 15
∴ X ~ N(μ = 21, σ² = 21) approximately
Probability that there are more than 27 breakdowns
= P(X > 27)
≈ P(X > 27.5) (continuity correction)
$= P\left(Z > \dfrac{27.5 - 21}{\sqrt{21}}\right) = P(Z > 1.42)$
= 1 – Φ(1.42)
= 1 – 0.9222
= 0.0778
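The following Python sketch compares the normal approximation with the exact Poisson tail $1 - \sum_{r=0}^{27} e^{-21}21^{r}/r!$. It uses only the standard library and is intended as an illustrative check under the Po(21) model stated above.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

lam = 7 * 3                                   # mean breakdowns in a seven-day week

# Normal approximation N(21, 21) with continuity correction: P(X > 27) ~ P(X > 27.5)
approx = 1 - NormalDist(lam, sqrt(lam)).cdf(27.5)

# Exact Poisson tail: 1 - P(X <= 27)
exact = 1 - sum(exp(-lam) * lam**r / factorial(r) for r in range(28))

print(round(approx, 4), round(exact, 4))
```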

Example
A company hires out cars on a daily basis. The mean number of cars hired per day is 20.
Using the normal approximation, find
(a) the probability that
(i) less than 5 cars will be hired
(ii) exactly 20 cars will be hired
(b) on how many days the company will have to turn customers away?
Solution
(a) Let X = number of cars hired on a day.
X ~ Po(20)
λ = 20 > 15
∴ X ~ N(μ = 20, σ² = 20) approximately
(i) P(X < 5) ≈ P(X < 4.5) (continuity correction)
$= P\left(Z < \dfrac{4.5 - 20}{\sqrt{20}}\right) = P(Z < -3.466)$
= Φ(–3.466)
= 1 – Φ(3.466)
Use interpolation:
z: 3.40 3.466 3.50
Φ: 0.9997 Φ(3.466) 0.9998
$\dfrac{\Phi(3.466) - 0.9997}{3.466 - 3.40} = \dfrac{0.9998 - 0.9997}{3.50 - 3.40}$
∴ Φ(3.466) = 0.999766
∴ P(Z < –3.466) = 1 – 0.999766
= 0.000234
(ii) P(exactly 20) = P(X = 20)
≈ P(19.5 < X < 20.5) (continuity correction)
$= P\left(\dfrac{19.5 - \mu}{\sigma} < Z < \dfrac{20.5 - \mu}{\sigma}\right)$
= P(–0.1118 < Z < 0.1118)
= Φ(0.1118) – Φ(–0.1118)
= Φ(0.1118) – [1 – Φ(0.1118)]
= 2Φ(0.1118) – 1
Use interpolation:
z: 0.11 0.1118 0.12
Φ: 0.5438 Φ(0.1118) 0.5478
$\dfrac{\Phi(0.1118) - 0.5438}{0.1118 - 0.11} = \dfrac{0.5478 - 0.5438}{0.12 - 0.11}$
∴ Φ(0.1118) = 0.5445
Hence 2Φ(0.1118) – 1 = 2(0.5445) – 1 = 0.089
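The following Python sketch checks part (a). It uses `statistics.NormalDist` for the approximations and the exact Poisson formula for P(X = 20). It is an illustrative check, not part of the method. For part (i), the unrounded normal value differs slightly from 0.000234 because the printed table only has four decimal places so far out in the tail.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

lam = 20
approx_dist = NormalDist(lam, sqrt(lam))          # N(20, 20)

# (a)(i) P(X < 5) = P(X <= 4), approximated by P(X < 4.5)
p_fewer_than_5 = approx_dist.cdf(4.5)

# (a)(ii) P(X = 20), approximated by P(19.5 < X < 20.5)
p_exactly_20 = approx_dist.cdf(20.5) - approx_dist.cdf(19.5)

# Exact Poisson value of P(X = 20) for comparison
exact_20 = exp(-lam) * lam**20 / factorial(20)

print(p_fewer_than_5, round(p_exactly_20, 4), round(exact_20, 4))
```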

Standardized Normal Distribution
Φ(z) is the integral of the standardized normal density from −∞ to z (in other
words, the area under the curve to the left of z). It gives the probability that a normal
random variable is not more than z standard deviations above its mean.
Values of z of particular importance:
z Φ (z)
1.645 0.9500 Lower limit of right 5% tail
1.960 0.9750 Lower limit of right 2.5% tail
2.326 0.9900 Lower limit of right 1% tail
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
3.1 0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
3.2 0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995
3.3 0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997
3.4 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998
3.5 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998
3.6 0.9998 0.9998 0.9999

UNIT 5
Correlation and Regression
UNIT STRUCTURE
5.1 Introduction
5.2 Learning Outcomes
5.3 Correlation
5.4 Activity 1
5.5 Regression
5.6 Activity 2
5.1 Introduction
In this Unit, you will learn how to establish relationships between variables. Moreover,
you will learn how to develop regression models.
By the end of this Unit, you should be able to do the following:
1. Understand correlation and calculate correlation coefficients.
2. Understand regression and write the equation of a regression line.
5.3 Correlation
Very often we hear people talking about correlation between two variables. For example, there is a
positive correlation between the IQ of a person and the marks that the person obtains in an exam
(people with higher IQ tend to score higher marks and people with lower IQ tend to score lower
marks); there is a negative correlation between savings and expenditure (the more you spend, the
less you are able to save). In statistics we use correlation to denote the association between two
quantitative variables that have a linear relationship, so that one variable tends to increase or
decrease by a roughly fixed amount for each unit increase in the other. When one variable increases
as the other increases, the correlation is positive; when one decreases as the other increases, it is
negative. Complete absence of correlation is represented by zero. It is important to note that we
cannot deduce a cause-and-effect relationship from correlation alone.
If we use the letters X and Y to represent the two variables (e.g. IQ = X and Exam marks = Y),
then for every person we shall have a pair of values (x = IQ of person, y = exam marks of the
person). When we plot the coordinates (x, y) for all the persons, the diagram obtained is called a
scatter diagram. We plot the X-scores on the x-axis and Y-scores on the y-axis. A single dot
represents the data of one person.
Scatter diagrams give an idea of the type of correlation, as shown in the diagrams below.
[Figure: scatter diagrams of y against x showing (a) positive correlation, (b) strong positive correlation, (c) perfect positive correlation, (d) negative correlation, (e) strong negative correlation and (f) perfect negative correlation between x and y]
The terms association and correlation will be used interchangeably. However, it
is good to note the difference: there is association between two variables if knowing
the value of one provides information about the likely value of the other, and there
is correlation between the variables if the association is linear, that is, if the
association can be represented by a straight line on a scatter diagram.

[Figure: scatter diagrams of y against x showing (g) no correlation and (h) a non-linear relationship between x and y]
Correlation Coefficient
In addition to drawing a scatter diagram, we can compute the degree of linear
association between two variables; this measure is called the correlation
coefficient. It indicates both the direction of the correlation and whether the
correlation is strong or weak. The value of a correlation coefficient ranges from
-1.00 to +1.00. A positive coefficient indicates that two variables systematically
vary in the same direction: as one variable increases, the other variable tends to
increase. The closer the coefficient is to +1.00, the stronger the positive
association. A negative coefficient indicates that two variables systematically vary
in opposite directions: as one variable increases, the other variable tends to
decrease. The closer the coefficient is to -1.00, the stronger the negative
association. A coefficient close to zero indicates that no systematic linear
co-variation exists between the two variables. Several different correlation
coefficients have been developed; they apply to different data types and conditions.
One of the most common correlation coefficients is Pearson’s Product Moment
Correlation Coefficient. This coefficient is subject to the following four
assumptions. (Note: when analysing real-world data, one or more of these
assumptions may be violated. There is no need to worry if this happens: even when
your data fail certain assumptions, there is often a way to overcome this.)
First assumption:
The two variables should be measured at the interval or ratio level (i.e., they are
continuous). Examples of such variables include time taken to complete a task
(measured in minutes), intelligence (measured using IQ score), exam
performance (measured from 0 to 100), and weight (measured in kg). Variables
like gender (male or female) and type of bank are neither interval nor ratio
variables.
Second assumption:
There must be a linear relationship between the two variables. Scatter diagrams
can help to visually check for linearity. If the relationship displayed on the scatter
diagram is not linear, then it is recommended to run a non-parametric equivalent
to Pearson’s correlation or to transform the data (one example of a transformation
is the use of logarithms; for instance, one may decide to plot the logarithm of one
or both variables, e.g. ln x against ln y, instead of x against y in order to obtain
a linear relationship).
Third assumption:
There should be no significant outliers. Outliers are simply single data points within
your data that do not follow the usual pattern (they are often extremely large values or
extremely small values). The following scatter diagrams highlight the potential impact
of outliers:
[Figure: two scatter diagrams, one including an outlier (r = 0.4) and one with the outlier removed (r = 0.7)]
The presence of the outlier produces a lower coefficient, r = 0.4, because the
outlier pulls the line of best fit away from the pattern followed by the other
points. Removing the outlier shows that there is quite a strong correlation
(r = 0.7) between the variables. This shows that Pearson’s correlation coefficient
is sensitive to outliers, which can have a very large effect on the line of best fit
and on the value of the coefficient, leading to very different conclusions about
your data. Therefore, it is best if there are no outliers or they are kept to a
minimum. Box-and-whisker plots can be good indicators of the presence or absence
of outliers.
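The effect of a single outlier can be reproduced numerically. The Python sketch below (Python is used here only for illustration; the eight data points and the outlier (9, 1) are hypothetical values, not data from this manual) computes Pearson’s r from deviations about the means, first for points that follow a clear upward trend and then with one point added far below that trend.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data with a clear upward trend.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 4, 5, 4, 6, 7, 8, 9]
r_clean = pearson_r(xs, ys)

# Add one hypothetical outlier far below the trend: the point (9, 1).
r_outlier = pearson_r(xs + [9], ys + [1])

print(f"without outlier: r = {r_clean:.3f}")   # 0.965
print(f"with outlier:    r = {r_outlier:.3f}")  # 0.342
```

One unusual point out of nine is enough to move r from a strong to a weak correlation, which is why the scatter diagram should be inspected before r is interpreted.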
Fourth assumption:
The variables should be approximately normally distributed. To assess the
statistical significance of the Pearson correlation, you need bivariate normality,
but this assumption is difficult to assess, so in practice each variable is usually
checked for normality separately, commonly with the Shapiro-Wilk test of normality.
This assumption matters most for small samples; the test is considered robust to
departures from normality for large samples (that is, 30 or more pairs of data). If
the variables are not normally distributed but take values that can be ranked
(ordinal data), then we should use the non-parametric Spearman’s rho correlation
coefficient. Kendall’s Tau is another non-parametric correlation, and it should be
used rather than Spearman’s rho when you have a small set of data with a large
number of tied ranks.
The Contingency Coefficient and Cramer’s V measure the strength of the
relationship between nominal variables (e.g. X = gender: male and female, and
Y = brand of milk purchased: Brand A, Brand B and Brand C). The Phi Coefficient
measures the correlation between two nominal variables that each have only two
categories (e.g. X = gender: male and female, and Y = Yes or No, the answer to a
dichotomous question). When we have to calculate the correlation between mixed
data types, we can use a biserial correlation coefficient. Point biserial is used
when one variable is interval/ratio and the other is dichotomous. Rank biserial is
used when one variable is ordinal and the other is dichotomous (dichotomous means
there are only two possible outcomes, such as Yes and No). We can summarize these
various coefficients in the following table:
Variable 1 \ Variable 2 | Interval/Ratio | Ordinal | Nominal | Dichotomous
Interval/Ratio | Pearson | Spearman* | - | Point Biserial
Ordinal | Spearman* | Spearman | - | Rank Biserial
Nominal | - | - | Cramer’s V and Contingency Coefficient** | -
Dichotomous | Point Biserial | Rank Biserial | - | Phi Coefficient
*requires the interval/ratio data to be ranked   **requires the chi-square value
Kendall’s Coefficient of Concordance measures the agreement among three or more
rankings of the same items.
Interpretation of the magnitude of Correlation Coefficient

The sign of the correlation coefficient determines whether the correlation is
positive or negative. The magnitude of the correlation coefficient determines the
strength of the correlation. Although there are no hard and fast rules for
describing correlational strength, we can use these guidelines:
0 < |r| < 0.3: weak correlation
0.3 ≤ |r| < 0.7: moderate correlation
|r| ≥ 0.7: strong correlation
For example, r = -0.849 suggests a strong negative correlation.
Population and Sample Correlation Coefficient

When all the data in the population are used to calculate the correlation
coefficient, we obtain the population correlation coefficient, which is represented
by ρ (rho). The sample correlation coefficient r is an estimate of ρ.
Coefficient of Determination
The coefficient of determination is the square of the correlation coefficient (r²).
r² quantifies the proportion of the variance of one variable that is “explained” (in
a statistical sense, not a causal sense) by the other.
For example, if r = 0.9, then r² = 0.81 = 81%, which indicates that 81% of the
variability in one variable is explained by the other.

Calculation of Pearson’s Correlation Coefficient
n xy – xy
r=
[n(x2) – (x)2][n(y2) – (y)2]
where
n = sample size
Σxy = multiply each x by its y and then sum all the products
Σx = sum of all the x scores
Σy = sum of all the y scores
Σx² = square each x and then sum
Σy² = square each y and then sum
(Σx)² = square the sum of all x
(Σy)² = square the sum of all y
Example
Suppose that we would like to calculate the correlation between the marks
obtained by 10 students in an English test and a Mathematics test. Calculate the
Pearson correlation coefficient and interpret it (assuming that all the assumptions
hold). The table below gives the marks obtained by the students.
Marks in English:     40 42 45 55 68 70 80 92 95 98
Marks in Mathematics: 60 64 71 76 83 85 89 90 91 93
Solution
Step 1 Let X = Marks obtained in English test
Let Y = Marks obtained in Mathematics test
Let n = number of pairs of data in the sample = 10 (since we have 10 students)
Step 2 Calculate the values of X², Y² and XY.
For example, the first student (see the first row in the table below) obtained 40
marks in English (so X = 40) and 60 marks in Mathematics (so Y = 60).
X² = 40 × 40 = 1600; Y² = 60 × 60 = 3600; and XY = 40 × 60 = 2400. We must
also calculate the sum for each column. When we sum the marks obtained by all
the students in English, we obtain 685; therefore Σx = 685. The table below
shows all the calculations.

X Y X² Y² XY
40 60 1600 3600 2400
42 64 1764 4096 2688
45 71 2025 5041 3195
55 76 3025 5776 4180
68 83 4624 6889 5644
70 85 4900 7225 5950
80 89 6400 7921 7120
92 90 8464 8100 8280
95 91 9025 8281 8645
98 93 9604 8649 9114
Σx = 685, Σy = 802, Σx² = 51431, Σy² = 65578, Σxy = 57216
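Step 3 We use the formula to calculate r.
$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} $$
$$ r = \frac{10(57216) - (685)(802)}{\sqrt{\left[10(51431) - (685)^2\right]\left[10(65578) - (802)^2\right]}} = \frac{22790}{\sqrt{45085 \times 12576}} \approx 0.957 $$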
r = 0.957 indicates that there is a strong positive correlation between marks
obtained in Mathematics and English. This means that those who perform
well in English tend to perform well in Mathematics as well.
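The computational formula can be checked mechanically. The Python sketch below (Python is chosen only for illustration; the function name pearson_from_sums is hypothetical) builds n, Σx, Σy, Σx², Σy² and Σxy exactly as in Steps 1 to 3 and applies the formula to the English and Mathematics marks from the example.

```python
from math import sqrt


def pearson_from_sums(xs, ys):
    """Pearson's r via the computational formula using n, Σx, Σy, Σx², Σy², Σxy."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


english = [40, 42, 45, 55, 68, 70, 80, 92, 95, 98]
maths = [60, 64, 71, 76, 83, 85, 89, 90, 91, 93]
r = pearson_from_sums(english, maths)
print(round(r, 3))  # 0.957
```

Working with the raw sums mirrors the hand calculation. For very large numbers, the deviation form (subtracting the means first) is numerically safer, but both give the same r for data of this size.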

Rank Correlation Coefficient (rho)
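When we cannot assume that both variables are normally distributed, we calculate
a distribution-free correlation coefficient such as Spearman’s coefficient. The
formula for Spearman’s coefficient is
$$ r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} $$
where n is the number of pairs of ranks and d is the difference between the two
ranks in each pair. (This formula is exact when there are no tied ranks; when there
are only a few ties, as in the example below, it gives a close approximation.)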
Calculation of Spearman’s Rank Correlation Coefficient
Step 1 Rank the n sample values of X.
Step 2 Rank the n sample values of Y.
Step 3 Calculate the difference in the ranks, e.g. d1 = rank of the first value of
X – rank of the first value of Y.
Step 4 Square each difference.
Step 5 Sum the squares of the differences to obtain Σd².
Step 6 Substitute the values into the formula and interpret the result.
Example
Suppose that we would like to calculate the correlation between the marks
obtained by 7 students (represented by A, B, C, D, E, F and G) in an English test
and a Mathematics test. Calculate the Spearman correlation coefficient and
interpret it. The table below gives the marks obtained by the students.
Student Marks in English (X) Marks in Mathematics (Y)
A 60 55
B 90 67
C 80 82
D 75 73
E 48 55
F 54 48
G 45 46
Solution
Step 1 Let X = Marks obtained in English test
Let Y = Marks obtained in Mathematics test
Let n = number of pairs of data in the sample = 7 (since we have 7 students)
Step 2 Calculate the ranks of X and of Y (e.g. Student B, who obtained 90 marks,
the highest mark in English, is given a rank of 1 for English). Calculate
d and d².
Student X Y Rank of X Rank of Y d = Rank of X minus Rank of Y d²
A 60 55 4 4.5 -0.5 0.25
B 90 67 1 3 -2 4
C 80 82 2 1 1 1
D 75 73 3 2 1 1
E 48 55 6 4.5 1.5 2.25
F 54 48 5 6 -1 1
G 45 46 7 7 0 0
Σd = 0, Σd² = 9.5
Note.
(1) Students A and E both obtained 55 marks in the Mathematics test, so they share
ranks 4 and 5 and each receives the average rank for Y: (4 + 5)/2 = 4.5.
(2) Σd must always be zero, which is a useful check on the ranking.

$$ r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 9.5}{7(7^2 - 1)} = 1 - \frac{57}{336} \approx 0.830 $$
The value 0.830 indicates that there is a strong positive correlation between the
ranks of the Mathematics and English scores. This means that those who perform
well in English tend to perform well in Mathematics as well.
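The six steps can be automated, including the rule for tied values. The Python sketch below is illustrative only (the helper names average_ranks and spearman_rs are hypothetical): it gives rank 1 to the largest value, lets tied values share the mean of the positions they occupy (as in Note (1)), and then applies the Σd² formula to the seven students.

```python
def average_ranks(values):
    """Rank values with 1 = largest; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend 'end' over the run of values tied with the one at 'pos'.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + 1 + end + 1) / 2  # positions are counted from 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman_rs(xs, ys):
    """Return (r_s, Σd²) using r_s = 1 - 6Σd² / (n(n² - 1))."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    n = len(xs)
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1)), sum_d2


english = [60, 90, 80, 75, 48, 54, 45]  # students A to G
maths = [55, 67, 82, 73, 55, 48, 46]
rs, sum_d2 = spearman_rs(english, maths)
print(sum_d2, round(rs, 3))  # 9.5 0.83
```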

Kendall’s tau
Kendall’s tau is a non-parametric correlation (so we do not need normality assumptions).
It is used rather than Spearman’s rho when you have a small set of data with a large
number of tied ranks.
Suppose that we compare a pair of observations (xi, yi) and (xj, yj). We call the pair
“concordant” if the two variables order the observations in the same way, that is, if
xi < xj exactly when yi < yj. Otherwise, the pair is discordant.
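Let C = total number of concordant pairs and let D = total number of discordant pairs.
When there are no tied ranks, C + D = ½n(n – 1), the total number of pairs, so
$$ \tau = \frac{C - D}{\tfrac{1}{2}n(n - 1)} = \frac{4C}{n(n - 1)} - 1, \qquad -1 \le \tau \le 1 $$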
Example
Two judges rank 8 pieces of artwork, labelled A, B, ..., H. Their ranks are given below:
Artwork A B C D E F G H
Rank of First Judge 1 2 3 4 5 6 7 8
Rank of Second Judge 3 4 1 2 5 7 8 6
Calculate Kendall’s tau.
Solution
Note that we have arranged the ranks given by Judge 1 in ascending order.
For artwork A, the first judge gave rank 1 and the second judge gave rank 3. We count
the number of ranks of the second judge to the right that are greater than 3: there are 5
such ranks.
For artwork B, the first judge gave rank 2 and the second judge gave rank 4. We count
the number of ranks of the second judge to the right that are greater than 4: there are 4
such ranks.
For artwork C, the first judge gave rank 3 and the second judge gave rank 1. We count
the number of ranks of the second judge to the right that are greater than 1: there are 5
such ranks.
For artwork D, the first judge gave rank 4 and the second judge gave rank 2. We count
the number of ranks of the second judge to the right that are greater than 2: there are 4
such ranks.
We do the same for the remaining artworks.
Total number of concordant pairs, C = 5 + 4 + 5 + 4 + 3 + 1 + 0 + 0 = 22

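$$ \tau = \frac{4C}{n(n - 1)} - 1 = \frac{4(22)}{8(8 - 1)} - 1 = \frac{88}{56} - 1 \approx 0.571 $$
Equivalently, there are ½(8)(7) = 28 pairs in total, so D = 28 – 22 = 6 and τ = (22 – 6)/28 ≈ 0.571.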
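Counting pairs by hand is error-prone, so here is a small Python sketch that checks the result (the function name kendall_tau is hypothetical and the code is illustrative, not a library routine). It visits every pair of artworks once; a pair is concordant when both judges order it the same way (positive product of rank differences) and discordant when they disagree. Tied pairs would give a zero product and be counted as neither, so the formula used here assumes no ties, as in this example.

```python
from itertools import combinations


def kendall_tau(rank1, rank2):
    """Return (C, D, tau) with tau = (C - D) / (n(n - 1)/2); assumes no tied ranks."""
    concordant = discordant = 0
    for i, j in combinations(range(len(rank1)), 2):
        product = (rank1[i] - rank1[j]) * (rank2[i] - rank2[j])
        if product > 0:
            concordant += 1      # both judges order this pair the same way
        elif product < 0:
            discordant += 1      # the judges disagree on this pair
    n = len(rank1)
    tau = (concordant - discordant) / (n * (n - 1) / 2)
    return concordant, discordant, tau


judge1 = [1, 2, 3, 4, 5, 6, 7, 8]  # artworks A to H
judge2 = [3, 4, 1, 2, 5, 7, 8, 6]
c, d, tau = kendall_tau(judge1, judge2)
print(c, d, round(tau, 3))  # 22 6 0.571
```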

5.4 Activities
1. A random sample of 10 children was timed in the task of writing the letters of the
alphabet first with their right hand and then with their left hand. Is there a
significant correlation between the two sets of times below? Use a 5% level of
significance and assume that the time to complete the task for ‘right hands’ and
for ‘left hands’ are both normally distributed.
Child 1 2 3 4 5 6 7 8 9 10
Time (secs), right hand 12 9 17 15 13 18 12 16 18 16
Time (secs), left hand 25 22 30 29 33 29 28 31 32 34
2. Ten pairs of identical twins have the following birth weights (kg):
Pair 1 2 3 4 5 6 7 8 9 10
Weight of 1st born (kg) 3.95 3.41 3.73 4.13 3.48 4.28 3.98 4.18 4.04 3.73
Weight of 2nd born (kg) 3.93 3.35 3.72 4.18 3.44 4.15 3.89 4.20 4.00 3.72
(a) Plot a scatter diagram.

(b) The value of Pearson’s r for these data is one of the following:
–0.105 0.505 –0.985 0.985 0.264
Choose the correct one, giving a reason for your choice.
3. The height (cm) and weight (kg) of 12 babies were recorded at the time of the
birth.
Height (x) 44 41 43 40 41 37 38 36 44 43 35 40
Weight (y) 3.5 2.8 3.2 2.7 2.9 2.5 2.8 2.6 3.6 2.6 2.4 2.9
(Note : x = 482, x2 = 19466, y = 34.5, y2 = 100.77, xy = 1396.0)
(i) Represent the data graphically.

(ii) Calculate the product-moment correlation coefficient, and test whether the
population correlation coefficient between the weight and height is zero.
(iii) One of the children in the sample was born prematurely; the others were
born after full-term pregnancies, but the records are not available. Which
of the observations would you suspect to be the premature birth?

4. The marks awarded by two judges to 10 contestants at a music festival were as
follows:
Contestant A B C D E F G H I J
Judge 1 10 13 8 7 3 4 15 6 17 18
Judge 2 5 4 6 5 7 8 3 7 2 2
Calculate Spearman’s rs and test the hypothesis that the marks of the judges are
uncorrelated against a suitable one-sided alternative.
5. From a large number of families living in a rural area each having a youngest child
aged between 5 and 7 years, a researcher selected a random sample of six families
and measured the IQ of the youngest child. The researcher also recorded the total
number of children in each family and drew up the following table:
Number of children in family 2 3 3 3 4 5

IQ of youngest child 110 100 100 80 80 70
The researcher calculated the value of Pearson’s r for these data to be – 0.875, and
concluded that the cause of lower IQ in larger families is a direct result of the
parents having to divide their attention between a larger number of children.
Answers
1. r = 0.694, t = 2.73, t0.05, 8 = 1.86, there is a significant positive correlation.
2. Choose 0.985 since scatter diagram indicates very high positive correlation.
3. (ii) r = 0.7927, t = 4.11, t0.025, 10 = 2.23. Correlation is significantly different from
zero.
(iii) x = 43 and y = 2.6 as it is lower on the scatter diagram.
4. rs ≈ –0.936; the critical value of rs is 0.564; since |rs| > 0.564, reject H0 and
accept H1 that high ranks from Judge 1 are associated with low ranks from
Judge 2.
5. rs = –0.7429
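Several activities ask for a significance test of r, and the answers quote a t value. The usual statistic for testing H0: ρ = 0 (assuming bivariate normality) is t = r√(n – 2)/√(1 – r²), compared with the t distribution on n – 2 degrees of freedom. The Python sketch below applies it to the data of Activity 1; the critical value 1.86 is taken from the answer above rather than computed, and the helper name pearson_r is hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


right = [12, 9, 17, 15, 13, 18, 12, 16, 18, 16]
left = [25, 22, 30, 29, 33, 29, 28, 31, 32, 34]
n = len(right)

r = pearson_r(right, left)
t = r * sqrt(n - 2) / sqrt(1 - r ** 2)  # t statistic with n - 2 = 8 degrees of freedom
critical = 1.86  # one-tailed 5% point of t with 8 d.f., as quoted in the answers

print(round(r, 3), round(t, 2), t > critical)  # 0.694 2.73 True
```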

5.5 Regression
Suppose that we have found that there is at least a moderate correlation between two
variables, say IQ (let X = IQ) and total marks obtained in a Mathematics test
(let Y = marks). This suggests that there is a linear relationship between X and Y. Such a
linear relationship can be written as an equation that connects X and Y, and this
equation can be used to predict or forecast the value of Y when the value of X is given.
The process of describing how the variable Y (often referred to as the outcome, response
or dependent variable) is numerically related to other variables such as X (often
referred to as the predictor, independent or explanatory variable) is called regression.
Y is called the response variable or the dependent variable because its value depends to
some extent on the value of X. X is called the predictor because it is used to predict Y,
or the explanatory variable because it explains the variation or changes in Y. It is also
called the independent variable because its value does not depend on Y.
When we have several independent variables (i.e. the dependent variable is affected by
several independent variables), we obtain a multiple linear regression equation that we
can write as
$$ Y = b_0 + b_1X_1 + b_2X_2 + \dots + b_kX_k $$
In this manual, we will deal with a simple regression equation (i.e. we will have only
one independent variable): Y = a + bX.
a is called the intercept of the regression line.
b is called the slope of the regression line.
a and b are calculated using data from a random sample drawn from the population.
Very often we refer to a simple regression model, which we write with a random error
term: Y = α + βX + ε, where α and β are population parameters and ε is the random error.
Except when the correlation between X and Y is 1 or -1, not all the points plotted
on the scatter diagram will lie on the line. Thus, for any point that does not lie on the
line, we have two values of y for the same x: one that lies on the line (called the
estimated or predicted value of y, which we denote by ŷ), and one from the point plotted
on the scatter diagram (called the observed value of y).
The difference between the observed and predicted values is referred to as the residual:
e = y – ŷ.
[Figure: a data point showing the observed value of Y (y) and the estimated value of Y (ŷ) obtained from the equation of the regression line]

Least Squares Method
The least squares method is used to calculate the values of a and b that give the
equation of the line of best fit. Suppose that we have n pairs of data: (x1, y1), (x2, y2),
..., (xn, yn). For each pair of data we calculate the residual: y1 – ŷ1 = e1; y2 – ŷ2 = e2;
y3 – ŷ3 = e3; ...; yn – ŷn = en.
The least squares method chooses a and b to minimise the sum of the squared residuals,
Σe² = e1² + e2² + ... + en². (Squaring prevents positive and negative residuals from
cancelling each other out.)
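Where the formula for b in Step 2 comes from can be seen with a short calculus argument. Writing the sum of squared residuals as a function S(a, b) of the intercept and slope (S is notation introduced here for the derivation), we set both partial derivatives to zero:

$$ S(a, b) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2 $$
$$ \frac{\partial S}{\partial a} = -2\sum (y_i - a - b x_i) = 0 \quad\Rightarrow\quad \sum y = na + b\sum x $$
$$ \frac{\partial S}{\partial b} = -2\sum x_i (y_i - a - b x_i) = 0 \quad\Rightarrow\quad \sum xy = a\sum x + b\sum x^2 $$

Dividing the first equation by n gives a = ȳ – b x̄. Substituting a = (Σy – bΣx)/n into the second equation and multiplying through by n:

$$ n\sum xy = \sum x \sum y - b\left(\sum x\right)^2 + nb\sum x^2 \quad\Rightarrow\quad b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} $$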
Assumptions
There is a linear relation between X and Y that can be formulated as Y = α + βX + ε.
It is assumed that the errors e1, e2, e3, ..., en are independent, and that each is
normally distributed with mean zero and variance σ².
It is assumed that there is a random sample of n pairs of observations, (X1, Y1),
(X2, Y2), ..., (Xn, Yn).
For each value of X, the population of Ys has a normal distribution with mean
µY = α + βX.
The population of Ys has a constant standard deviation (which is the same for every
value of X).
For each value of X, the Y values are independent.
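How to calculate a and b?
Step 1 Calculate Σx, Σx², Σy, Σy² and Σxy.
Step 2 Calculate the slope b:
$$ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} $$
Step 3 Calculate the means:
$$ \bar{x} = \frac{\sum x}{n}, \qquad \bar{y} = \frac{\sum y}{n} $$
Step 4 Calculate the intercept a:
$$ a = \bar{y} - b\bar{x} $$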
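The four steps translate directly into code. The Python sketch below is illustrative (the function name least_squares_line is hypothetical); it follows Steps 1 to 4 and is applied to the English (x) and Mathematics (y) marks of the worked example that follows.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y = a + bx, following Steps 1 to 4."""
    n = len(xs)
    # Step 1: the sums
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Step 2: the slope
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Step 3: the means
    x_bar, y_bar = sum_x / n, sum_y / n
    # Step 4: the intercept
    a = y_bar - b * x_bar
    return a, b


xs = [1, 5, 15, 25, 35, 45, 55, 65, 75, 85]
ys = [44, 56, 62, 68, 71, 74, 82, 87, 90, 95]
a, b = least_squares_line(xs, ys)
print(f"y = {a:.3f} + {b:.3f}x")
print(f"prediction at x = 30: {a + b * 30:.2f}")
```

Because the code does not round b before computing a, it gives a ≈ 51.077 rather than the 51.057 obtained by hand with b rounded to 0.538; the prediction at x = 30 is about 67.20 either way.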

Example
For a random sample of 10 students, the marks obtained in English (x) and marks
obtained in Mathematics (y) were recorded. Write the regression equation that can be
used to predict y. Predict the value of y for x = 30.
x y
1 44
5 56
15 62
25 68
35 71
45 74
55 82
65 87
75 90
85 95
Solution
Since we have 10 students, n =10
Let’s calculate Σx, Σx², Σy, Σy² and Σxy.
x y x² y² xy
1 44 1 1936 44
5 56 25 3136 280
15 62 225 3844 930
25 68 625 4624 1700
35 71 1225 5041 2485
45 74 2025 5476 3330
55 82 3025 6724 4510
65 87 4225 7569 5655
75 90 5625 8100 6750
85 95 7225 9025 8075
Σx = 406, Σy = 729, Σx² = 24226, Σy² = 55475, Σxy = 33759
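$$ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{10(33759) - (406)(729)}{10(24226) - (406)^2} = \frac{41616}{77424} \approx 0.538 $$
$$ \bar{x} = \frac{\sum x}{n} = \frac{406}{10} = 40.6, \qquad \bar{y} = \frac{\sum y}{n} = \frac{729}{10} = 72.9 $$
$$ a = \bar{y} - b\bar{x} = 72.9 - (0.538)(40.6) = 51.057 $$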
Hence the regression equation is
y = a + bx = 51.057 + 0.538x
We can interpret the coefficient of x as the rate of increase of y with respect to x:
for every unit increase in x, y increases on average by 0.538 units.
Note that we should use the regression equation to forecast y only for values of x in the
range 1 to 85, because the regression equation may not be valid outside this range.
x = 30 is within this range.
When x = 30, ŷ = 51.057 + 0.538(30) = 67.20
Correlation and Coefficient of Determination

If we calculate Pearson’s correlation coefficient between X and Y
(Σx = 406, Σy = 729, Σx² = 24226, Σy² = 55475 and Σxy = 33759), we obtain
$$ r = \frac{10(33759) - (406)(729)}{\sqrt{\left[10(24226) - (406)^2\right]\left[10(55475) - (729)^2\right]}} = \frac{41616}{\sqrt{77424 \times 23309}} \approx 0.98 $$
r² = 0.98² ≈ 0.96 = 96%. This means that this regression model accounts for 96% of the
variance in Y (i.e. 96% of the variability in Y is explained by the linear regression on X).
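The phrase “explained by the linear regression” can be made concrete. For a least squares line with an intercept, r² equals 1 – SSE/SST, where SST = Σ(y – ȳ)² is the total variation in y and SSE = Σ(y – ŷ)² is the variation left in the residuals (SSE and SST are standard abbreviations, not notation used elsewhere in this manual). The Python sketch below checks this identity on the example data.

```python
from math import sqrt

xs = [1, 5, 15, 25, 35, 45, 55, 65, 75, 85]
ys = [44, 56, 62, 68, 71, 74, 82, 87, 90, 95]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

sxx = sum((x - x_bar) ** 2 for x in xs)
sst = sum((y - y_bar) ** 2 for y in ys)  # total variation in y
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Least squares line (deviation form, equivalent to Steps 1 to 4)
b = sxy / sxx
a = y_bar - b * x_bar
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation

r = sxy / sqrt(sxx * sst)
print(round(r, 2), round(r ** 2, 3), round(1 - sse / sst, 3))
```

Both routes give r² ≈ 0.96: about 4% of the variation in y remains in the residuals.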

Activities 2
You should attempt the parts on hypothesis testing.
1. The following table shows the masses of a certain chemical substance which
dissolved in a given mass of water at various temperatures.
Temperature (˚C) 10 20 30 40 50 60 70 80 90
Mass (g) 4.5 4.6 5.0 5.6 5.9 6.3 6.4 6.7 7.4
Plot a scatter diagram and calculate, to 3 significant figures, the coefficients of the
equation of the regression line. Plot the regression line on the scatter diagram and
verify that it passes through the point given by the mean values of temperature and
mass. Estimate mass for a temperature of 56˚C.
2. In an experiment to find the force required to pull a plough, the plough was towed by a
tractor at a sequence of constant speeds (X kilometres per hour) and the force required (Y
newtons) was measured. Originally the tractor was run at ten different speeds, but
the data on the force required at the highest speed of 11.2 km h–1 were mislaid. The
results for the other nine speeds are shown in the table (and summarized below the
table).
Speed (X km h–1) 1.5 2.0 3.2 4.3 5.5 6.6 8.4 8.8 9.7
Force (Y newtons) 96 95 109 112 122 133 138 156 154
Σx = 50, Σy = 1115, Σx² = 350.88
Σy² = 142 335, Σxy = 6739
Show these results on a scatter diagram.

The original reading corresponding to the speed of 11.2 km h–1 was subsequently
found, and was 162 newtons. Find, by calculation, the equation of the least squares
regression line for the ten readings, showing all your working, and draw this line
on your graph.
Estimate the force required with a tractor speed of 10 km h–1.
Find, from your regression line, the value of y when x = 0 and interpret your result.
3. A study was made of the amount of converted sugar in a certain process at various
temperatures. The coded data are as follows:
Temperature, x 1.0 1.1 1.2 1.3 1.4

Converted sugar, y 8.1 7.8 8.5 9.8 9.5
Find the least squares linear regression equation which could be used to predict
the amount of converted sugar given the temperature.

4. Rates of oxygen consumption were obtained for mammals running on a treadmill.
Five mammals were randomly selected and assigned to one of five running speeds.
Their rates of oxygen consumption were as follows:
Running speed, x (m/min) 2 4 6 8 10

Oxygen consumption, y (ml/g/h) 3.5 6.1 6.7 8.9 13.0
Assuming a linear regression model yi = α + βxi + ei, estimate α and β.
5. In an experiment to find the Young modulus for a brass wire the following 11 pairs
of values of x (suspended mass in kg) and y (length of wire in mm – 700 mm) are
obtained. The equation connecting x and y is assumed to take the form
y = α + βx. Obtain the least squares estimates for α and β, showing your working
clearly.
x 1 1.5 2 2.5 3 3.5 3 2.5 2 1.5 1

y –1.1 –0.6 0 0.4 0.9 1.5 1.0 0.6 0.1 –0.5 –0.9
(Σx² = 57.25, Σy² = 7.22, Σxy = 10.00)
6. The mass y grams of a certain chemical substance which is dissolved in a certain
mass of water at a temperature of x˚C is given by the relation
y = α + βx
where α and β are unknown constants. In an experiment to estimate α and β, the
temperature was carefully controlled at 10 different values between 0˚C and
100˚C, and y was measured at each temperature value. The following calculations
were made from the 10 observed pairs of values (x, y):
(Σx = 400, Σy = 688, Σx² = 19,200, Σxy = 38,720)
(i) Determine the least squares estimate of the equation connecting x and y.
(ii) Estimate the mass of the chemical dissolved when the temperature is 60˚C.
7. An experiment was conducted to determine the mass y grams of a given amount
of chemical that dissolved in glycerine at x˚C. The results of the experiment are
given in the following table.
Temperature (x˚C) 0 10 20 30 40 50
Mass (y grams) 51.3 51.4 51.9 52.0 52.6 52.8
Assuming that the true value of y is linearly related to the value of x, obtain the
least squares estimate of this relationship.

Assuming further that the temperatures used in the experiment were controlled
accurately but that the measured values of y were subject to independent errors
which are normally distributed with mean zero and standard deviation 0.2 g,
calculate 95% confidence intervals for:
(i) The mass of chemical that will dissolve in glycerine at 0˚C;
(ii) The additional mass of chemical that will dissolve in glycerine when the
temperature is raised from 10˚C to 20˚C.
8. An aptitude test, designed to predict the potential productivity of new employees,
was given to a random sample of nine employees. The table shows the test results
together with a measure of the actual productivity of each employee:
Employee 1 2 3 4 5 6 7 8 9
Aptitude score, x 9 17 20 19 20 23 16 24 22
Productivity, y 2 25 29 33 43 32 24 33 14
Calculate the coefficients of a regression equation which could be used to predict
productivity from the aptitude score. Estimate productivity for aptitude scores of
10, 20 and 30. Which estimate is (a) the most reliable, (b) the least reliable? Give
reasons.
9. A random sample of five families had the following annual incomes and annual
savings in thousands of pounds.
Income 16 22 18 12 12
Savings 1.2 2.4 2.0 1.4 0.6
Draw a scatter diagram. Assuming both income and savings are measured with
negligible measuring error, calculate the coefficients of the equations of the
regression lines of (a) savings on income, (b) income on savings. Draw both lines
on the scatter diagram.
Also calculate Pearson’s r for these data and verify that the value of r² is equal to
bb′ (the product of the regression coefficients of the two regression lines).
10. The ages, x years (given to one place of decimals), and heights, y cm (to the nearest
cm), of 10 boys were as follows:
x 6.6 6.8 6.9 7.5 7.8 8.2 10.1 11.4 12.8 13.5
y 119 112 116 123 122 123 135 151 141 141
Given that x2 = 899.80, y2 = 166,091, xy = 12,023.3, calculate the linear
correlation coefficient between x and y and comment upon the result.
Calculate the equation of the regression line of y on x, and use it to estimate the
height of a boy 9.0 years old.
State the value of y given by the regression line when x = 30 and comment upon
your answer.

Answers
1. y = 4.04 + 0.0357x; when x = 56, y = 6.04 g.
2. ŷ = 83.3 + 7.25x; when x = 10, ŷ = 155.8; when x = 0, ŷ = 83.3, but this is
extrapolation and so is an unreliable prediction.
3. y = 2.98 + 4.8x, z = 1.039
4. y = 1.1 + 1.09x.
5. y = –1.998 + 0.9948x.
6. (i) y = –71.2 + 3.5x; (ii) when x = 60, ŷ = 138.8
7. y = 51.2 + 0.032x; (i) 50.92 to 51.48; (ii) 0.23 to 0.41.
8. y = 7.059 + 1.309x; 20.1, 33.2, 46.3; (a) estimate for x = 20 most reliable,
being nearest centre of data; (b) estimate at x = 30 is least reliable, being
outside the range of data.
9. If X denotes income and Y denotes savings: y = –0.791 + 0.144x, x = 7.967 + 5.285y;
r² = 0.8737² = 0.7633; bb′ = 0.1444 × 5.285 = 0.7632.
10. r = 0.9033; y = 87.4 + 4.46x; when x = 9, ŷ = 127.5 cm; when x = 30, ŷ =
221.2 cm, but this is extrapolating far beyond the values of x in the data.
UNIT 6
Sets, Relations and Functions
UNIT STRUCTURE
6.1 Introduction
6.3 Sets
6.4 Set Rules
6.5 Activity 1
6.6 Relations
6.7 Activity 2
6.8 Functions
6.9 Activity 3
6.1 Introduction
In this Unit, you will be introduced to Sets. I am sure that you have been
dealing with Sets since primary school. Thank God, sets have not changed.
Sets are vital in computing.

By the end of this Unit, you should be able to do the
following:
1. Write sets in different ways.
2. Use set notation.
3. Prove identities involving sets.
6.3 Sets
A Set is a collection of objects that share at least one common characteristic. The objects in the
set are called elements of the set. Sets are usually given names such as Set A, Set B,…
Suppose that there is a Set A that contains even numbers between 1 and 7. It is common to use
curly-braces to write sets. We can write Set A in different ways including:
A = {even numbers between 1 and 7}
A = {2, 4, 6}
We can use Venn diagrams to represent sets:
[Figure: Venn diagram of Set A containing the elements 2, 4 and 6]
Since there are three elements in Set A, we say that the cardinal number of Set A is 3,
or n(A) = 3.
It is common to use the letter x to represent an element, and “:” is used to mean
“such that”. Hence we can also write Set A as
We can also give a formula to calculate the value of each element:
A = {x: x = 2m where m is an integer between 1 and 3 inclusive}
Here, m = 1, 2 or 3. Thus, when m = 1, x = 2m = 2(1) = 2; when m = 2,
x = 2m = 2(2) = 4; and when m = 3, x = 2m = 2(3) = 6.
Z is used to represent the set of all integers. Z+ is the set of all positive integers
and Z- is the set of all negative integers. Therefore the statement that m is an
integer can be written as m ∈ Z, where ∈ means “is an element of”.
A = {x: x = 2m, m ∈ Z, 1 ≤ m ≤ 3}
If a set contains coordinates, then instead of using x we use (x, y) to represent an
element. For example, set B contains the coordinates (1, 1), (1, 2), (1, 3), (2, 1), (2, 2)
and (2, 3).
B = {(1, 1), (1, 2), (1,3), (2,1), (2,2), (2,3)}
We can also write set B as:

B = {(x, y): x = 1, 2 and y = 1, 2 and 3}
or
B = {(x, y): x ∈ Z, y ∈ Z, 1 ≤ x ≤ 2, 1 ≤ y ≤ 3}
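Set-builder notation has a close counterpart in Python set comprehensions. The sketch below (Python is our own choice of language; the unit itself does not use one) rebuilds Sets A and B from their rules. Note that Python's `range(1, 4)` stops before 4, so it produces m = 1, 2, 3.

```python
# Set-builder notation written as Python set comprehensions (illustrative sketch).

# A = {x : x = 2m, m ∈ Z, 1 ≤ m ≤ 3}
A = {2 * m for m in range(1, 4)}          # m = 1, 2, 3

# A = {x : x is an even number between 1 and 7}
A_by_rule = {x for x in range(1, 8) if x % 2 == 0}

# B = {(x, y) : x ∈ Z, y ∈ Z, 1 ≤ x ≤ 2, 1 ≤ y ≤ 3}
B = {(x, y) for x in range(1, 3) for y in range(1, 4)}

print(sorted(A))          # [2, 4, 6]
print(A == A_by_rule)     # True: both rules describe the same set
print(len(A))             # 3, the cardinal number n(A)
print(sorted(B))          # [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)]
```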
Popular Sets
{ } or 𝜙 is called an empty set. This set does not have any element.
N is the set of non-negative integers or natural numbers {0, 1, 2, 3,…}
ℤ or Z is the set of integers {…-4, -3, -2, -1, 0, 1, 2, 3, 4,…}
ℤ+ or Z+ is the set of positive integers {1, 2, 3, 4,…}
ℤ− or Z- is the set of negative integers {…-4, -3, -2, -1}
ℝ or R is the set of real numbers, which includes numbers such as -2, -4/7, 1, √2, 3 and π
ℂ or C is the set of complex numbers, which includes numbers such as -3 + 4i, 4 - 3i, -2, 0 and 5
Set Notations
∈ means “is an element of”. For example, 3 ∈ {1, 2, 3, 4, 5} but 6 ∉ {1, 2, 3, 4, 5}.
⊂ means “is a proper subset of”. If A = {1, 4}, B = {1, 7} and C = {1, 2, 3, 4, 5}, then
A ⊂ C but B ⊄ C
⊆ means “is a subset of or is equal to”. If A = {1, 2, 3, 4}, B = {1, 2, 3, 4} and C = {1, 4, 5, 7, 8},
then A ⊆ B but C ⊈ B
∪ means “union”. The union of sets A and B, written A ∪ B, is the set that contains the
elements found in set A, in set B, or in both.
If A = {1, 2, 3, 4}, B = {1, 2, 5}, then A ∪ B = {1, 2, 3, 4, 5}
∩ means “intersection”. The intersection of sets A and B, written A ∩ B, is the set that
contains the elements found in both set A and set B.
If A = {1, 2, 3, 4}, B = {1, 2, 5}, then A ∩ B = {1, 2}
𝜉 is used to represent the universal set. This is the set that contains all the elements under consideration.
Complement of set A, that we will write as 𝐴̅ or 𝐴′ , is the set that contains elements from the
universal set that are not found in set A.
The set difference of A and B is written as A – B and contains the elements of set A that are not in set B.
If A = {1, 2, 3, 4}, B = {1, 2, 5}, then A - B = {3, 4} and B – A = {5}
The power set of A, which we write as pow(A), is the set of all subsets of set A. For example, if
A = {1, 2, 3}, pow(A) = {𝜙, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}. There are 8 sets in
pow(A).
If set A has n elements, then pow(A) has 2ⁿ sets.
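Python's built-in `set` type supports the operations above directly: `|` for union, `&` for intersection, `-` for set difference, and `<` and `<=` for ⊂ and ⊆. In this sketch the universal set `universe` is a hypothetical choice made only so that a complement can be computed, and `power_set` is our own helper built on `itertools.combinations`.

```python
from itertools import combinations

A = {1, 2, 3, 4}
B = {1, 2, 5}
universe = {1, 2, 3, 4, 5, 6}       # hypothetical universal set ξ, for illustration only

print(A | B)                        # union A ∪ B, the set {1, 2, 3, 4, 5}
print(A & B)                        # intersection A ∩ B, the set {1, 2}
print(A - B, B - A)                 # differences A – B = {3, 4} and B – A = {5}
print(universe - A)                 # complement A′ relative to ξ, the set {5, 6}
print({1, 4} < {1, 2, 3, 4, 5})     # proper subset ⊂: True
print({1, 7} <= {1, 2, 3, 4, 5})    # subset or equal ⊆: False

def power_set(s):
    """Return pow(s): every subset of s, as a list of frozensets."""
    items = list(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

P = power_set({1, 2, 3})
print(len(P))                       # 8 = 2**3
```

Frozensets are used for the members of the power set because ordinary Python sets cannot be elements of other sets.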
6.4 Set Rules

For three sets A, B and C
(i) Commutative Laws
A ∩ B = B ∩ A
A ∪ B = B ∪ A
(ii) Associative Laws
(A ∩ B) ∩ C = A ∩ (B ∩ C)
(A ∪ B) ∪ C = A ∪ (B ∪ C)
(iii) Distributive Laws
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
(iv) Idempotent Laws
A ∩ A = A
A ∪ A = A
(v) Identity Laws
A ∪ Φ = A
A ∩ 𝜉 = A
(vi) De Morgan’s Laws
(a) (A ∩ B)′ = A′ ∪ B′
(b) (A ∪ B)′ = A′ ∩ B′
(c) A – (B ∩ C) = (A – B) ∪ (A – C)
(d) A – (B ∪ C) = (A – B) ∩ (A – C)
(vii) (a) A – B = A ∩ B′
(b) B – A = B ∩ A′
(c) A – B = A ⇔ A ∩ B = Φ
(d) (A – B) ∪ B = A ∪ B
(e) (A – B) ∩ B = Φ
(f) A ∩ B ⊆ A and A ∩ B ⊆ B
(g) A ∪ (A ∩ B) = A
(h) A ∩ (A ∪ B) = A
(viii) (a) (A – B) ∪ (B – A) = (A ∪ B) – (A ∩ B)
(b) A ∩ (B – C) = (A ∩ B) – (A ∩ C)
(c) (A ∩ B) ∪ (A – B) = A
(d) A ∪ (B – A) = A ∪ B
(ix) (a) 𝜉′ = Φ
(b) Φ′ = 𝜉
(c) (A′)′ = A
(d) A ∩ A′ = Φ
(e) A ∪ A′ = 𝜉
(f) A ⊆ B ⇔ B′ ⊆ A′
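Checking an identity for every combination of subsets of a small universal set is not a proof, because a proof must hold for all sets, but it is a quick way to catch a mis-stated rule. In this sketch the universal set {1, 2, 3} and all helper names are our own choices. It confirms, for example, that A – (B ∩ C) = (A – B) ∪ (A – C) holds in every case, whereas A – (B ∩ C) = (A – B) ∩ (A – C) fails (for instance with A = {1, 2}, B = {1}, C = {2}).

```python
from itertools import combinations, product

XI = frozenset({1, 2, 3})        # a small hypothetical universal set ξ

def subsets(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def comp(X):
    """Complement X′ relative to ξ."""
    return XI - X

def holds_for_all(rule):
    """True if rule(A, B, C) holds for every choice of A, B, C ⊆ ξ (8**3 = 512 cases)."""
    S = subsets(XI)
    return all(rule(A, B, C) for A, B, C in product(S, repeat=3))

IDENTITIES = {
    # A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
    "distributive": lambda A, B, C: A & (B | C) == (A & B) | (A & C),
    # (A ∩ B)′ = A′ ∪ B′
    "de_morgan": lambda A, B, C: comp(A & B) == comp(A) | comp(B),
    # A – (B ∩ C) = (A – B) ∪ (A – C)
    "difference_of_intersection": lambda A, B, C: A - (B & C) == (A - B) | (A - C),
    # A – (B ∪ C) = (A – B) ∩ (A – C)
    "difference_of_union": lambda A, B, C: A - (B | C) == (A - B) & (A - C),
    # A – (B ∩ C) = (A – B) ∩ (A – C), which is NOT a valid identity
    "false_variant": lambda A, B, C: A - (B & C) == (A - B) & (A - C),
}

for name, rule in IDENTITIES.items():
    print(name, holds_for_all(rule))
```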
Example
Prove A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
Solution
The strategy is to start with an element x of the set on the LHS and prove
that it also belongs to the set on the RHS, and then to do the same in the
other direction.
Let x ∈ A ∩ (B ∪ C). This means x ∈ A and x ∈ B ∪ C, i.e. x must be in set
A and x must also be in the set B ∪ C. If x is in B ∪ C, then x is in B, or in
C, or in both. If x is in B, then x must be in A ∩ B; and if x is in C, then x
must be in A ∩ C. Therefore, x is in A ∩ B or in A ∩ C, so x is in the union
of these two sets: x ∈ (A ∩ B) ∪ (A ∩ C).
Conversely, let x ∈ (A ∩ B) ∪ (A ∩ C). Then x ∈ A ∩ B or x ∈ A ∩ C. In
either case x ∈ A, and x is in B or in C, so x ∈ B ∪ C. Hence x ∈ A ∩ (B ∪ C).
Since each set is contained in the other, A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).
Hence proved.
Example
Prove (A ∩ B) ′ = A′ ∪ B′
Solution
Let x ∈ (A ∩ B)′. This means x is not in the set A ∩ B, so x is not in both
A and B: either x ∉ A or x ∉ B (or both). If x ∉ A, then x ∈ A′. If x ∉ B,
then x ∈ B′. Therefore, x is in A′ or in B′, so x ∈ A′ ∪ B′.
Conversely, let x ∈ A′ ∪ B′. Then x ∉ A or x ∉ B, so x cannot be in A ∩ B,
i.e. x ∈ (A ∩ B)′. Hence, (A ∩ B)′ = A′ ∪ B′.
6.5 Activity 1
Prove
(1) (A ∪ B)′ = A′ ∩ B′
(2) A – (B ∩ C) = (A – B) ∪ (A – C)
(3) A – (B ∪ C) = (A – B) ∩ (A – C)
6.6 Relations
Cartesian Product of Two Sets
Let A and B be two sets. The Cartesian product of A and B, written A × B, is the set of all
ordered pairs (x, y) with x ∈ A and y ∈ B.
For example, if A = {1, 2, 3} and B = {5, 6}, then A × B = {(1,5), (2,5), (3,5), (1,6), (2,6), (3,6)}
and B × A = {(5,1), (5,2), (5,3), (6,1), (6,2), (6,3)}. Since the pairs are ordered, A × B ≠ B × A.
If there are p elements in set A and q elements in set B, then there are p × q elements in
A × B.
Relation
A relation, R, from a non-empty set A to a non-empty set B is defined as a subset of A × B.
The set of all the first elements of the ordered pairs in R is called the domain of the relation R.
The set of all the second elements of the ordered pairs in R is called the range of the relation R.
For example, if R = {(2, 5), (3, 10), (4, 17)}, then the domain of R = {2, 3, 4} and the range of
R = {5, 10, 17}.
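The Cartesian product and the domain and range of a finite relation are one-liners in Python. This sketch uses `itertools.product` for A × B; the variable names are our own (`range_` has a trailing underscore so that it does not hide Python's built-in `range`).

```python
from itertools import product

A = {1, 2, 3}
B = {5, 6}

AxB = set(product(A, B))     # all ordered pairs (x, y) with x ∈ A, y ∈ B
BxA = set(product(B, A))
print(len(AxB))              # 6 = 3 × 2
print(AxB == BxA)            # False: the pairs are ordered

R = {(2, 5), (3, 10), (4, 17)}
domain = {x for x, _ in R}   # all first elements
range_ = {y for _, y in R}   # all second elements
print(sorted(domain), sorted(range_))   # [2, 3, 4] [5, 10, 17]
```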
Properties of Relations
A relation, R, on non-empty set A is defined as a subset of A × A: R ⊆ A × A
For example, let A = {1, 2, 3} then Cartesian product A × A = {(1,1), (1,2), (1,3), (2,1), (2,2),
(2,3), (3,1), (3,2), (3,3)}.
A binary relation R ⊆ A × A is called:
Reflexive if and only if for all x ∈ A, (x, x) ∈ R
Symmetric if and only if for all x, y ∈ A, (x, y) ∈ R ⇒ (y, x) ∈ R. This means both (x, y)
and (y, x) must be present in R if R is symmetric.
Transitive if and only if for all x, y, z ∈ A, (x, y) ∈ R and (y, z) ∈ R ⇒ (x, z) ∈ R. This
means when (x, y) and (y, z) are present in R, then (x, z) must also be present in R if R is
transitive.
If a relation is reflexive, symmetric and transitive, then it is called an equivalence relation.
Example
R is a relation on Z+ × Z+ (the set of ordered pairs of positive integers) such that
((a,b),(c,d)) ∈ R if and only if ad = bc. For example, ((1,2),(2,4)) is an element of R as
1 × 4 = 2 × 2.
Prove that R is an equivalence relation.
Solution
Let’s prove that R is reflexive.
Let x = (a, b), where a and b are positive integers.
We have to prove that (x, x) is in R.
(x, x) = ((a, b), (a, b)) is in R because ab = ba. Hence R is reflexive.
Let’s prove that R is symmetric.
Let x = (a, b) and y = (c, d).
We have to prove that if (x, y) is in R, then (y, x) is also in R.
Let (x, y) = ((a, b), (c, d)) be in R. This means ad = bc.
Let’s check whether (y, x) is in R.
(y, x) = ((c, d), (a, b)) lies in R if cb = da. Since cb = bc = ad = da, this holds.
Hence R is symmetric.
Let’s prove that R is transitive.

Let x = (a, b), y = (c, d) and z = (e, f).
We have to prove that if (x, y) and (y, z) are in R, then (x, z) is also in R.
Since (x, y) = ((a, b), (c, d)) is in R, ad = bc.
Since (y, z) = ((c, d), (e, f)) is in R, cf = de.
We have to prove that (x, z) = ((a, b), (e, f)) is in R, that is, we have to
prove that af = be.
Since c and d are positive, we can divide by them:
ad = bc means a = bc/d
cf = de means f = de/c
Therefore, af = (bc/d) × (de/c) = be. Hence R is transitive.
Since R is reflexive, symmetric and transitive, R is an equivalence relation.
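For a relation on a small finite set, the three properties can be checked mechanically. The helper names below are our own, and `R1` and `R2` are hypothetical relations. The last part applies the checks to the worked example's relation ad = bc, restricted to pairs (a, b) with 1 ≤ a, b ≤ 5; because it only inspects finitely many pairs, it is a sanity check and not a substitute for the proof above.

```python
def is_reflexive(R, A):
    """(x, x) ∈ R for every x ∈ A."""
    return all((x, x) in R for x in A)

def is_symmetric(R):
    """(x, y) ∈ R implies (y, x) ∈ R."""
    return all((y, x) in R for (x, y) in R)

def is_transitive(R):
    """(x, y) ∈ R and (y, z) ∈ R imply (x, z) ∈ R."""
    return all((x, z) in R for (x, y) in R for (y2, z) in R if y == y2)

def is_equivalence(R, A):
    return is_reflexive(R, A) and is_symmetric(R) and is_transitive(R)

A = {1, 2, 3}
R1 = {(1, 1), (2, 2), (3, 3), (1, 2), (2, 1)}   # hypothetical relation on A
R2 = {(1, 2), (2, 3)}                            # (1, 3) is missing
print(is_equivalence(R1, A))    # True
print(is_transitive(R2))        # False

# Worked example: ((a, b), (c, d)) ∈ R iff ad = bc, restricted to 1 <= a, b <= 5
grid = {(a, b) for a in range(1, 6) for b in range(1, 6)}
R = {(x, y) for x in grid for y in grid if x[0] * y[1] == x[1] * y[0]}
print(((1, 2), (2, 4)) in R)    # True, as 1 × 4 = 2 × 2
print(is_equivalence(R, grid))  # True on this finite sample
```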
6.7 Activity 2
Question 1
R is a relation on the set A = {1, 2, 3, 4, 5} defined as
R = {(a, b): |a - b| is even}. Prove that R is an equivalence relation.
Question 2
R is a relation on Z+ × Z+ defined as R = {((a, b), (c, d)): a + d = b + c}.
Prove that R is an equivalence relation.
6.8 Functions
A function, f, maps every element of a set A, called the domain, to an element
of a set B, called the codomain. This is written as f: A → B. If a ∈ A, b ∈ B
and f(a) = b, then we say that the function f assigns the element b of set B
to the element a of set A. b is referred to as the image and a is referred to
as the object.
It is important to note that
(1) the function f assigns an element of set B to every element of set A;
that is, every object of the domain must have an image in the codomain;
(2) the function f assigns a unique element of set B to every element of
set A. This means an object in the domain can’t have two images in the
codomain;
(3) two different objects of the domain can have the same image;
(4) the set of images is called the range of the function; and
(5) the range of the function is a subset of the codomain of the function.
Example
Consider a function f that “adds 5”. Its domain is the set of
integers A = {1, 2, 3, 4}. Its codomain is the set of integers from 0 to 10
inclusive, B = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}.
(a) Find the range of the function f.
(b) Write function f in different forms.
Solution
(a) Since set A is the domain, 1, 2, 3 and 4 are the objects of function f.
f(1) = 1 + 5 = 6. This means the image of object “1” is 6. Similarly,
f(2) = 7, f(3) = 8 and f(4) = 9.
Hence, range of function f = {6, 7, 8, 9}
(b) One way to write the function is to start by taking a general element of the domain,
which we can represent by x. Since f adds 5 to x, f maps x onto x + 5:
f(x) = x + 5, x ∈ {1, 2, 3, 4}
We can also write the function as
f: x ↦ x + 5, x ∈ {1, 2, 3, 4}
Injective Functions or one-to-one function

A function f: A → B is injective if f(x) = f(y) implies that x = y. This means that two
different objects in the domain are not allowed to have the same image in the codomain.
To prove that a function f(x) is injective, start with f(x) = f(y) and show that this leads to
x = y.
We can also use the horizontal line test. We draw the graph of y = f(x). Each horizontal line
should either not cut the graph or cut it at one point only; horizontal lines should not cut the
graph of an injective function more than once. For example, f(x) = x + 2 is an injective
function, but f(x) = x² is not an injective function if the domain is the set of real numbers
(for instance, f(-1) = f(1) = 1).
Surjective Functions or onto function

A function f: A → B is surjective if every element of the codomain is the image of at least
one element of the domain. This means that the range equals the codomain.
To prove that a function f(x) is surjective, start with f(x) = b, where b is any element of the
codomain B. Solve f(x) = b to find x and show that x is an element of the domain A.
Bijective Functions
A function is bijective if it is both injective and surjective. Bijective functions have an
inverse. The inverse of the function f(x) is written as f⁻¹(x).
To find f⁻¹(x), proceed as follows:
Step 1. Let y = f(x).
Step 2. Make x the subject of the formula.
Step 3. Replace x on the LHS by f⁻¹(x) and every y on the RHS by x.
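When the domain is finite, injectivity and surjectivity can be checked simply by listing the images. The sketch below uses hypothetical helper names of our own and applies them to the earlier “adds 5” example, where A = {1, 2, 3, 4} and B = {0, 1, ..., 10}. For infinite domains such as the real numbers you still need the algebraic arguments described above.

```python
def image_set(f, domain):
    """The range of f: the set of images of all the objects in the domain."""
    return {f(a) for a in domain}

def is_injective(f, domain):
    # No two objects share an image, so there are as many images as objects.
    return len(image_set(f, domain)) == len(set(domain))

def is_surjective(f, domain, codomain):
    # The range is the whole codomain.
    return image_set(f, domain) == set(codomain)

def is_bijective(f, domain, codomain):
    return is_injective(f, domain) and is_surjective(f, domain, codomain)

def inverse(f, domain):
    """For an injective f on a finite domain, map each image back to its object."""
    return {f(a): a for a in domain}

def add5(x):
    return x + 5

A = {1, 2, 3, 4}
B = set(range(0, 11))                 # codomain {0, 1, ..., 10}
print(sorted(image_set(add5, A)))     # [6, 7, 8, 9]
print(is_injective(add5, A))          # True
print(is_surjective(add5, A, B))      # False: 0, for example, is not an image
print(inverse(add5, A)[8])            # 3, because f(3) = 8
```

Shrinking the codomain to {6, 7, 8, 9} makes the same function bijective, which is why the inverse exists on that smaller codomain.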
Example
f is a function from the set of real numbers to the set of real numbers, f(x) = x + 2.
(a) Prove that f(x) is an injective function.
(b) Prove that f(x) is a surjective function.
(c) Find f⁻¹(x).
Solution
(a) f(x) = f(y) means x + 2 = y + 2. Solving it gives x = y. Hence f(x) is injective,
or one-to-one.
(b) Let f(x) = b, where b is any real number. This means x + 2 = b. Solving this leads to
x = b – 2, which is a real number, and f(b – 2) = b. Hence every element of the codomain
is an image, so f(x) is surjective.
(c) Let y = f(x) = x + 2. This means x = y – 2. Hence, f⁻¹(x) = x – 2.
Composite Function
Suppose that f and g are two functions such that the range of g is a subset
of the domain of f. Then the composite function f(g(x)), also written fg(x) or f ∘ g(x),
exists.
Similarly, g(f(x)) exists if the range of f is a subset of the domain of g.
Example
Suppose that f and g are functions from the set of real numbers to the set of
real numbers, i.e. f: R → R and g: R → R, with
f(x) = x + 5 and g(x) = x²
We can have the composite functions:
fg(x) = x² + 5
gf(x) = (x + 5)²
ff(x) = f²(x) = x + 10
ffg(x) = f²g(x) = x² + 10
fff(x) = f³(x) = x + 15
f⁴(x) = x + 20
fⁿ(x) = x + 5n
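Composite functions can be modelled with higher-order functions. In this sketch (hypothetical helper names), `compose(f, g)` returns x ↦ f(g(x)), matching the convention that fg means “apply g first, then f”, and `power(f, n)` applies f n times, so it should reproduce fⁿ(x) = x + 5n for f(x) = x + 5.

```python
from typing import Callable

Func = Callable[[float], float]

def compose(f: Func, g: Func) -> Func:
    """Return the composite fg, i.e. x ↦ f(g(x)): apply g first, then f."""
    return lambda x: f(g(x))

def power(f: Func, n: int) -> Func:
    """Return fⁿ, i.e. f applied n times (n = 0 gives the identity)."""
    def repeated(x):
        for _ in range(n):
            x = f(x)
        return x
    return repeated

def f(x):
    return x + 5

def g(x):
    return x ** 2

fg = compose(f, g)              # fg(x) = x² + 5
gf = compose(g, f)              # gf(x) = (x + 5)²
ffg = compose(power(f, 2), g)   # ffg(x) = x² + 10
print(fg(3), gf(3), ffg(3))     # 14 64 19
print(power(f, 4)(1))           # 21, since f⁴(x) = x + 20
```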
6.9 Activity 3
In the following questions, f and g are functions from set of real numbers to
the set of real numbers.
Question 1
f(x) = 2x – 1 and g(x) = x² + 1
(a) Prove that f(x) is bijective and g(x) is not bijective.
(b) Find the composite functions (i) fg (ii) gf (iii) ff (iv) gg (v) ffg
Question 2
f(x) = x + 1 and g(x) = 3x – 1
(a) Prove that f(x) and g(x) are bijective.
(b) Find the composite functions (i) fg (ii) gf (iii) ff (iv) gg (v) ffg
7
UNIT
Graph Theory
UNIT STRUCTURE
7.1 Introduction
7.2 Learning Outcomes
7.3 Graph
7.4 Eulerian Multigraphs
7.5 Hamiltonian Graphs
7.6 Distance in Graphs
7.7 Network Analysis
7.8 Activities
7.1 Introduction
In this Unit, you will be introduced to Graph Theory. This is important when
building networks.
By the end of this Unit, you should be able to do the following:
1. Understand Graph Theory.
2. Solve practical problems using graph theory.
7.3 Graph
Definition 1.
A graph is a finite, nonempty set V, called the vertex set, together with set E, called the edge set,
whose elements e∈E are pairs e = (a, b) with a, b ∈V. We also write G (V, E) to represent a
graph G with the set of vertices V and set of edges E.
Straight lines, arrows, arcs or loops can connect vertices.
A multigraph G(V,E) allows two vertices to be joined by more than one edge. Such edges are
called parallel edges.
Definition 2.
An undirected graph is one in which the edge set consists of unordered pairs. This means (a,
b) and (b, a) are the same. Arrows are not used to connect vertices.
Definition 3.
A directed graph or digraph is one in which the edge set consists of ordered pairs. This means
(a, b) and (b, a) are different. We use arrows to join the vertices. A directed edge e = (c, d)
appears as an arrow that starts at c and points towards d.
Loops or self-loops connect a vertex to itself: e = (c, c).

Definition 4.
Suppose that a and b are two vertices in an undirected graph G(V, E) where V
is the set of vertices and E is the set of edges. The vertices a and b are said to
be adjacent or to be neighbours if (a, b) ∈ E. We can also say that the edge e
= (a, b) is incident on the vertices a and b.
Definition 5.
If the directed edge e = (u, v) is present in a directed graph G(V′,E′) then u is
referred to as a predecessor of v. v is referred to as a successor of u. u is also
called the tail or tail vertex of the edge (u, v), and v is called the tip or tip
vertex.
Complete graphs, Kn
The complete graph Kn is the undirected graph with n vertices whose edge set
includes every possible edge. If we number the vertices consecutively, the
vertex and edge sets are
V = {v1, v2, ..., vn}
E = {(vj , vk ) | 1 ≤ j ≤ (n − 1), (j + 1) ≤ k ≤ n} .
Number of edges = n(n-1)/2
K4 is the undirected graph with 4 vertices and (4)(4 - 1)/2 = 6 edges, as shown
below.
K5 is the undirected graph with 5 vertices and (5)(5 - 1)/2 = 10 edges, as shown
below.
Path Graphs Pn
Pn has n vertices and (n-1) edges:
V = {v1, v2, ..., vn}
E = {(vj , vj+1) | 1 ≤ j ≤ n − 1} .
P4 has 4 vertices and (4-1) = 3 edges as shown below:
P5 has 5 vertices and (5-1) = 4 edges as shown below:
Cycle or Circuit Graphs Cn

The cycle or circuit graph Cn has n ≥ 3 vertices arranged in a
ring.
The sets of vertices and edges are
V = {v1, v2, ..., vn}
E = {(v1,v2), (v2,v3), ..., (vj,vj+1), ..., (vn−1,vn), (vn,v1)}.
Cn has n edges. Each edge is written as (vj,vj+1) with vn+1 ≡ v1.
Example
C3 has 3 edges and 3 vertices:
Definition 6
A bipartite graph G(V,E) is a graph with a non-empty set of edges E and two
non-empty disjoint sets of vertices V1 and V2 (so that V = V1 ∪ V2) such that
every edge e = (u, v) joins a vertex in V1 to a vertex in V2.
Complete Bipartite Graphs Km,n

Suppose that V1 is a set of m distinct vertices and V2 is a set of n distinct
vertices. The complete bipartite graph Km,n joins every vertex of V1 to every
vertex of V2. Hence the total number of edges is m × n.
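The edge sets defined above for Kn, Pn and Cn, together with Km,n, can be generated directly, and counting them confirms the formulas n(n − 1)/2, n − 1, n and m × n. In this sketch vertices are numbered 1 to n to stand for v1, ..., vn; the function names and the labels ("u", i) and ("w", j) for the two vertex sets of Km,n are our own.

```python
from itertools import combinations

def complete_graph(n):
    """Edges of Kn: every pair (vj, vk) with 1 <= j < k <= n."""
    return set(combinations(range(1, n + 1), 2))

def path_graph(n):
    """Edges of Pn: (vj, vj+1) for 1 <= j <= n - 1."""
    return {(j, j + 1) for j in range(1, n)}

def cycle_graph(n):
    """Edges of Cn (n >= 3): the edges of Pn plus (vn, v1)."""
    if n < 3:
        raise ValueError("a cycle graph needs at least three vertices")
    return path_graph(n) | {(n, 1)}

def complete_bipartite(m, n):
    """Edges of Km,n: every vertex ("u", i) of V1 joined to every vertex ("w", j) of V2."""
    return {(("u", i), ("w", j)) for i in range(1, m + 1) for j in range(1, n + 1)}

print(len(complete_graph(4)), len(complete_graph(5)))                # 6 10
print(len(path_graph(4)), len(path_graph(5)))                        # 3 4
print(len(cycle_graph(3)))                                           # 3
print(len(complete_bipartite(1, 3)), len(complete_bipartite(2, 2)))  # 3 4
```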
Example
Suppose that V1 has one vertex (unshaded circles in diagram below) and V2
has three vertices (shaded circles in diagram below), then we obtain the
complete bipartite graph K1,3:
Suppose that V1 has two vertices (unshaded circles in diagram below) and V2 has two vertices
(shaded circles in diagram below), then we obtain the complete bipartite graph K2,2:
d-Dimensional Cube Graphs, Id
Id has the set of vertices V defined as:

V = {v | v ∈ {0,1}ᵈ}
and the set of edges E defined as:
E = {(v, v′) | the string of 0’s and 1’s defining v and v′ differ by one position only}
Hence Id has 2ᵈ vertices and d × 2ᵈ⁻¹ edges.
Example
For I2
V = {00, 01, 11, 10}
E = {(00,01), (00, 10), (10, 11), (01, 11)}
For I3
V = {000, 001, 011, 111, 100, 101, 110, 010}
E = {(000, 100), (000, 001), (000, 010), (010, 110), (010, 011), (110, 111), (110, 100),
(100, 101), (111, 101), (111, 011), (001, 101), (001, 011)}
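The definition of Id can be turned into a generator: list every d-bit string, then join each string to every string obtained by flipping exactly one of its bits. This sketch (our own function name) stores each edge as a `frozenset`, so that (v, v′) and (v′, v) count as the same undirected edge, and it can be used to confirm the counts 2ᵈ and d × 2ᵈ⁻¹ for small d.

```python
from itertools import product

def cube_graph(d):
    """Vertices and edges of the d-dimensional cube graph Id.

    Vertices are all strings of d zeros and ones; two vertices are joined
    when their strings differ in exactly one position.
    """
    vertices = ["".join(bits) for bits in product("01", repeat=d)]
    edges = set()
    for v in vertices:
        for i in range(d):
            flipped = "1" if v[i] == "0" else "0"
            w = v[:i] + flipped + v[i + 1:]     # differs from v in position i only
            edges.add(frozenset((v, w)))        # unordered edge
    return vertices, edges

V, E = cube_graph(3)
print(len(V), len(E))   # 8 12
```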
Definition 7
The degree of a vertex, written as deg(v), in an undirected graph is the number
of edges that include the vertex v.
Theorem
In an undirected graph G(V, E), the sum of the degree for all vertices is equal
to twice the number of edges.
deg (v1) + deg (v2) + deg (v3) + deg (v4) + … = 2 x number of edges
Moreover, in an undirected graph, the number of vertices that have an odd
degree is always even, because the sum of all the degrees is even.
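The theorem is easy to check numerically: each edge contributes 1 to the degree of each of its two end vertices. The edge list in this sketch is a hypothetical example, and `degrees` is our own helper; a loop (v, v) adds 2 to deg(v), as the theorem requires.

```python
from collections import Counter

def degrees(edges):
    """deg(v) for every vertex of an undirected (multi)graph given as an edge list.

    Each edge adds 1 to the degree of both of its end vertices, so a loop
    (v, v) adds 2 to deg(v).
    """
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

# Hypothetical graph, for illustration only
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
deg = degrees(edges)
print(sum(deg.values()), 2 * len(edges))   # 8 8
odd = sorted(v for v, k in deg.items() if k % 2 == 1)
print(odd)   # ['c', 'd']: an even number of odd vertices
```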
Definition 8
The in-degree of a vertex v in a digraph, written as degin(v), is equal to the number
of edges having vertex v at their tip, and the out-degree of v, written as degout(v),
is equal to the number of edges having vertex v at their tail.
Definition 9
Walk
A sequence of edges (e1, e2, e3,…,eL) is a walk in a graph G(V,E) if there is a
sequence of vertices (v0, v1, v2,…,vL) such that ej = (vj-1,vj )∈E.
Note that vertices need not be distinct. If v0 = vL then the walk is a closed
walk.
Length of a Walk
The length of a walk is equal to the number of edges in the sequence.
Trail
A walk in which all the edges ej are distinct is called a trail. In a trail, an edge
should not appear twice. A vertex can appear more than once.
Path
A trail in which the vertices are distinct in the sequence of vertices (v0, v1, v2,…,vL) is called a
path. That is in a path, a vertex should not appear twice.
Cycle
A cycle is a closed trail in which all the vertices are distinct, except for the first and last, which
are identical.
Note that all paths are trails. All trails are walks.
Certain trails are not paths. For instance, the walk through the vertices (a, b, c, d, e, b, f) is a
trail, as all the edges are distinct. However, it is not a path because the vertex b is visited twice.
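The definitions of walk, trail and path translate into three nested checks on a vertex sequence. This sketch assumes a simple undirected graph stored as a set of two-element `frozenset`s (our choice, so that (a, b) and (b, a) are the same edge). The graph is hypothetical and chosen so that the sequence (a, b, c, d, e, b, f) from the text is a trail but not a path.

```python
def edges_of(seq):
    """The edges (v0, v1), (v1, v2), ... used by a vertex sequence, as unordered pairs."""
    return [frozenset((a, b)) for a, b in zip(seq, seq[1:])]

def is_walk(seq, E):
    # every consecutive pair of vertices must be joined by an edge of the graph
    return all(e in E for e in edges_of(seq))

def is_trail(seq, E):
    # a walk in which no edge is repeated
    es = edges_of(seq)
    return is_walk(seq, E) and len(es) == len(set(es))

def is_path(seq, E):
    # a trail in which no vertex is repeated
    return is_trail(seq, E) and len(seq) == len(set(seq))

# Hypothetical simple graph containing the walk from the text
E = {frozenset(p) for p in ["ab", "bc", "cd", "de", "eb", "bf"]}
seq = list("abcdebf")            # the vertex sequence (a, b, c, d, e, b, f)
print(len(edges_of(seq)))        # 6, the length of the walk
print(is_trail(seq, E))          # True: all six edges are distinct
print(is_path(seq, E))           # False: vertex b appears twice
```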
Definition 10
Connected Vertices
Suppose that a and b are two vertices in a graph G(V, E). The two vertices a and b will be said
to be connected if there is a walk described by (v0,...,vL) where v0 = a and vL = b. We also say
that a vertex is connected to itself.
When every pair of vertices is connected, then we will refer to the graph as a connected graph.
Suppose that a and b are two vertices in a directed graph G(V,E). If there is a walk from a to b
then vertex b is said to be accessible or reachable from vertex a. We also say that the vertices
are accessible or reachable from themselves.
Two vertices a and b in a directed graph are said to be strongly connected when the vertex b is
accessible from the vertex a and vertex a is accessible from vertex b. Moreover, we say that a
vertex is strongly connected to itself.
When every pair of vertices of a directed graph is strongly connected, we say that the graph is
strongly connected.
Suppose that when we convert the edges of a directed graph G(V,E) into
undirected ones, the graph becomes a connected undirected graph, then the
directed graph is said to be weakly connected.
7.4 Eulerian Multigraphs
Definition 11
Suppose that G(V,E) is a multigraph. A trail in G(V,E) is said to be an
Eulerian trail if it includes every edge of the graph exactly once.
An Eulerian trail that starts and ends at the same vertex is said to be an
Eulerian tour.
An Eulerian multigraph is one that contains an Eulerian tour.
Jungnickel’s Theorem
For a connected multigraph G, the following statements are equivalent:
(1) G is Eulerian.
(2) Every vertex of G has even degree.
(3) The set of edges of G can be partitioned into cycles.
Leonhard Euler is regarded as the founder of graph theory, which he began in 1736
when he solved the Königsberg Bridge Problem. The town of Königsberg (now called
Kaliningrad) used to be famous for its seven bridges (shown by the seven arrows).
Euler wanted to know whether it is possible to start at any point, cross each of
the bridges exactly once, and come back to the same point. The answer is no:
every land area is reached by an odd number of bridges, so the vertices of the
corresponding multigraph do not all have even degree.
We will show the bridges as:
The diagram below shows that if we start at East Island and we wish to come back to the East
Island, it will be impossible to do it without using an edge more than once.
Adding two edges (that is, building two more bridges) solves the problem:
Now we add the arrows to show how it becomes an Eulerian multigraph. The diagram below
shows that you can start at East Island and return to East Island by crossing every bridge
exactly once.
Note that every vertex now has an even degree. The cycles are clear on the diagram.
Hence Eulerian graphs are graphs that have a closed trail that includes every
edge exactly once.
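The theorem gives a quick test for a connected multigraph: check that every vertex has even degree. The sketch below (our own helper names) also checks connectivity with a depth-first search and applies the test to Königsberg. The labels N, S, I and E (north bank, south bank, the central island and East Island) are our own, and parallel bridges are entered as repeated edges.

```python
from collections import Counter, defaultdict

def is_connected(edges):
    """True if every vertex that lies on some edge can be reached from every other."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    if not adj:
        return True
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:                      # depth-first search from start
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(adj)

def is_eulerian(edges):
    """Degree test: a connected multigraph is Eulerian iff every degree is even."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return is_connected(edges) and all(k % 2 == 0 for k in deg.values())

# Königsberg: N = north bank, S = south bank, I = central island, E = East Island.
# Parallel bridges appear as repeated edges.
konigsberg = [("N", "I"), ("N", "I"), ("S", "I"), ("S", "I"),
              ("N", "E"), ("S", "E"), ("I", "E")]
print(is_eulerian(konigsberg))    # False: the degrees are I 5, N 3, S 3, E 3

# Two extra (hypothetical) bridges, N to S and I to E, make every degree even
print(is_eulerian(konigsberg + [("N", "S"), ("I", "E")]))   # True
```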
7.5 Hamiltonian Graphs
Whereas an Eulerian tour includes every edge exactly once, a Hamiltonian cycle includes every
vertex exactly once.
Definition
A path that includes all of the graph’s vertices is called a Hamiltonian path in the graph.
A cycle that includes all the vertices of a graph is called a Hamiltonian tour or Hamiltonian
cycle.
A Hamiltonian graph is a graph that contains a Hamiltonian tour.
Note that a Hamiltonian graph must have at least three vertices; otherwise it cannot contain
a cycle.
Theorem
Suppose that G is a graph with n vertices, where n ≥ 3. If every vertex has degree at least
n/2, then G is a Hamiltonian graph.
Theorem
Suppose that G is a graph with n vertices, where n ≥ 3. If deg(u) + deg(v) ≥ n for every pair
of non-adjacent vertices u and v in G, then G is a Hamiltonian graph.
Suppose that G is a graph with n vertices. The closure of G, which we will write as [G], is the
graph obtained by repeatedly joining pairs of non-adjacent vertices u and v with
deg(u) + deg(v) ≥ n until no such pair remains. G is Hamiltonian if and only if [G] is
Hamiltonian; in particular, if n ≥ 3 and [G] is a complete graph, then G is Hamiltonian.
Example
By adding edges, we can make the graph Hamiltonian.
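The closure can be computed by repeatedly scanning for non-adjacent pairs whose degree sum is at least n. The sketch below (our own helper names, vertices numbered 0 to n − 1, simple graphs only) does this and also checks the degree condition of the first theorem. The example graph is hypothetical: K4 on vertices 0 to 3 plus a vertex 4 joined to 0 and 1. It fails the degree condition (vertex 4 has degree 2 < 5/2), yet its closure is K5, so by the Bondy–Chvátal theorem (G is Hamiltonian exactly when its closure is) it is Hamiltonian. The cycle C5 shows the limits of the test: it is Hamiltonian, but its closure is not complete, so a non-complete closure is inconclusive rather than a proof that G is not Hamiltonian.

```python
from itertools import combinations

def closure(n, edges):
    """Closure [G] of a simple graph G on vertices 0, 1, ..., n - 1.

    Repeatedly joins non-adjacent u, v with deg(u) + deg(v) >= n,
    until no such pair remains. Returns an adjacency dict.
    """
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    changed = True
    while changed:
        changed = False
        for u, v in combinations(range(n), 2):
            if v not in adj[u] and len(adj[u]) + len(adj[v]) >= n:
                adj[u].add(v)
                adj[v].add(u)
                changed = True
    return adj

def closure_is_complete(n, edges):
    """True means G is Hamiltonian (n >= 3); False is inconclusive."""
    return n >= 3 and all(len(nbrs) == n - 1 for nbrs in closure(n, edges).values())

def dirac_condition(n, edges):
    """Degree condition of the first theorem: n >= 3 and every degree >= n/2."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return n >= 3 and min(deg) >= n / 2

# Hypothetical example: K4 on vertices 0-3, plus vertex 4 joined to 0 and 1
G_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (4, 0), (4, 1)]
C5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]

print(dirac_condition(5, G_edges))      # False: deg(4) = 2 < 5/2
print(closure_is_complete(5, G_edges))  # True: [G] is K5, so G is Hamiltonian
print(closure_is_complete(5, C5))       # False, although C5 is Hamiltonian
```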
7.6 Distance in Graphs
Weighted Graphs
We shall associate a weight, w, to an edge. Weights are real numbers (so w can be negative, zero
or positive). For example, if vertices represent places, the weights can be the distance between
the two vertices. Weights can also be the time to travel between two vertices. Weights can be
profit when using the route from a to b.
Edges can represent roads, pipes, sea-route, cables,…
The weight of a walk is defined as the sum of the weights of the edges in the walk.
Definition
In a weighted graph G(V, E, w) that does not contain any cycle of negative weight,
we define distance as
d(a,a) = 0 for all vertices
d(a,b) = ∞ if there is no walk from a to b
d(a,b) is the weight of a minimal-weight walk from a to b when such
walk exists.
Network
A network is a graph in which each edge is assigned a distance.
Tree
A tree is a connected graph that does not have a cycle.
A planar graph is one that can be drawn so that its edges meet only at the vertices.
This drawing of the complete graph K4 has crossing edges, so it is not a planar drawing:
However, K4 is a planar graph, because it can be redrawn without crossing edges:
Isomorphic Graphs
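Two graphs are isomorphic if one can be deformed (vertices moved and edges straightened
or bent) to make the other; that is, there is a one-to-one correspondence between their
vertices that preserves adjacency.
Adjacency Matrix
In the adjacency matrix of a graph, the element in a given row and column is the number
of edges connecting the vertex represented by the row to the vertex represented by the
column. (The incidence matrix, by contrast, has one row for each vertex and one column
for each edge.)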
7.7 Network Analysis
Minimum Connector – Minimum Spanning Tree

Suppose that the villages in a given district are to be connected by a cable system such as
fibre-optic cable. The important question is: what is the minimum length of cable needed to
connect the villages? We refer to such problems as the minimum connector problem, which
consists of selecting edges that allow us to reach any vertex from any other vertex such that
the sum of the weights on all the selected edges is minimum. The selected edges form a
minimum spanning tree.
There are two methods to find the minimum connector:
(a) Kruskal’s algorithm; and
(b) Prim’s algorithm.
Kruskal’s algorithm:
Step 1. Choose the shortest edge. If there are several edges having the minimum weight,
choose any one of them.
Step 2. Choose the next shortest edge (this edge does not have to be joined to the edge
already chosen).
Step 3. Choose the next shortest edge that does not create a cycle, and add it.
Step 4. Repeat Step 3 until all the vertices are connected.
Example
Find the minimum spanning tree for the network:
Step 1. We choose the shortest edge. Here we have BC and CE. We can choose either of them.
Let’s choose CE.
Step 2. Let’s choose the next shortest edge. Here it will be BC.
Step 3. Choose the next shortest edge (that should not form a cycle; it may or may not be
connected to the sub-graph obtained in step 2).
Repeat Step 3.
Repeat Step 3 as there is still the vertex A to connect.
The minimum spanning tree has length 2+2+3+3+4 = 14.
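A compact sketch of Kruskal's algorithm follows. Because the example's figure is not reproduced, we assume the network is the one whose weights are listed in the table in Prim's table-form example later in this unit (AB = 4, AC = 4, BC = 2, CD = 3, CE = 2, CF = 6, DF = 3, EF = 3). To implement Step 3's “does not create a cycle” check, the sketch uses a union-find (disjoint-set) structure, a standard technique that the text itself does not mention: an edge creates a cycle exactly when both of its ends are already in the same component.

```python
def kruskal(vertices, weighted_edges):
    """Minimum spanning tree by Kruskal's algorithm.

    weighted_edges: list of (weight, u, v). Returns (total_weight, tree_edges).
    """
    parent = {v: v for v in vertices}      # union-find forest

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps the trees shallow
            v = parent[v]
        return v

    total, tree = 0, []
    for w, u, v in sorted(weighted_edges):      # shortest edges first
        ru, rv = find(u), find(v)
        if ru != rv:                              # different components: no cycle
            parent[ru] = rv
            tree.append((u, v, w))
            total += w
            if len(tree) == len(vertices) - 1:    # all vertices connected
                break
    return total, tree

# Our transcription of the weight table (assumed to be the example network)
NETWORK = [(4, "A", "B"), (4, "A", "C"), (2, "B", "C"), (3, "C", "D"),
           (2, "C", "E"), (6, "C", "F"), (3, "D", "F"), (3, "E", "F")]
total, tree = kruskal("ABCDEF", NETWORK)
print(total)   # 14
print(tree)    # the five chosen edges, in the order they were added
```

Ties between equal weights are broken alphabetically here, so the sketch picks BC before CE; the text picks CE first, but the total is the same.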
Prim’s Algorithm
Step 1. Choose a vertex.
Step 2. Choose the shortest edge from this vertex to any vertex that is
connected directly to it.
Step 3. Choose the shortest edge that joins a vertex already in the tree to a
vertex not yet in the tree (such an edge cannot form a cycle), and add it.
Step 4. Repeat Step 3 until all the vertices are connected.
Let’s apply Prim’s algorithm to the previous example.

Step 1. Let’s start with vertex A.
Step 2. A is connected to B and C. Since the weight is the same for both, we can
choose either of them. Let’s choose AB.
Step 3. From B, we can only go to C.
Repeat Step 3. From C, we connect to E.
Then we connect the remaining vertices, D and F.
Hence we obtain the minimum spanning tree of length 14:
Prim’s Algorithm – Table Form

Step 1. We draw a table showing the weights for the edges
For the network
the table is
A B C D E F
A - 4 4 - - -
B 4 - 2 - - -
C 4 2 - 3 2 6
D - - 3 - - 3
E - - 2 - - 3
F - - 6 3 3 -
Step 2. Choose a column and delete its row. Let’s say we choose column D; we delete row
D. We shall number the columns we are working with, in the order in which they are chosen.
Step 3. Choose the smallest number in column D and circle it. If there is more than one,
choose any one of them. Here we circle the “3” in row C.
Step 4. Delete the row containing the number you circled in Step 3, so we delete row
C. We have now deleted rows D and C. We choose the smallest value remaining in columns D
and C: here it is the “2” in row B. We circle it and delete row B.
We have now deleted rows D, C and B. We choose the smallest value remaining in
columns D, C and B: here it is the “2” in row E. We circle it.
We delete row E. Since rows D, C, B and E have been deleted, we choose the
smallest remaining number in these columns. It is the “3” in row F. We circle
it.
We delete row F. Since rows D, C, B, E and F have been deleted, we choose
the smallest remaining number in these columns. It is the “4” in row A. We circle
it.
We will delete row A.

We will continue the process until all the vertices have been included.
The length of the minimum spanning tree is 14.
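Both forms of Prim's algorithm above can be sketched with a priority queue. This version (our own function name) uses Python's `heapq` on our transcription of the weight table: at each stage it takes the shortest edge leaving the tree built so far, and it skips any edge whose far end is already in the tree, since such an edge would form a cycle.

```python
import heapq
from collections import defaultdict

def prim(weighted_edges, start):
    """Minimum spanning tree by Prim's algorithm, grown from the vertex start.

    weighted_edges: list of (weight, u, v). Returns (total_weight, tree_edges).
    """
    adj = defaultdict(list)
    for w, u, v in weighted_edges:
        adj[u].append((w, v))
        adj[v].append((w, u))
    in_tree = {start}
    frontier = [(w, start, v) for w, v in adj[start]]   # edges leaving the tree
    heapq.heapify(frontier)
    total, tree = 0, []
    while frontier and len(in_tree) < len(adj):
        w, u, v = heapq.heappop(frontier)               # shortest edge leaving the tree
        if v in in_tree:
            continue                                    # would form a cycle: skip
        in_tree.add(v)
        tree.append((u, v, w))
        total += w
        for w2, x in adj[v]:
            if x not in in_tree:
                heapq.heappush(frontier, (w2, v, x))
    return total, tree

NETWORK = [(4, "A", "B"), (4, "A", "C"), (2, "B", "C"), (3, "C", "D"),
           (2, "C", "E"), (6, "C", "F"), (3, "D", "F"), (3, "E", "F")]
print(prim(NETWORK, "A"))   # total 14, starting with the edge AB
print(prim(NETWORK, "D"))   # total 14 again, as in the table form starting at D
```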
Dijkstra’s Algorithm: Shortest Path Between Two Points or Vertices

Dijkstra’s algorithm allows us to find the shortest path between two vertices.
Dijkstra’s algorithm is useful in finding directions between places (e.g. Google Maps).
It can also be used to solve networking or telecommunication problems such as data network
routing, which aims at finding the path with minimal delay for data packets that must go through
a switching network. It can also be used to solve shortest path problems arising in robotics
and transportation. It is not guaranteed to give correct results when some edges have negative weights.
This algorithm assigns values, called labels, to each node. There are three types of labels:
Permanent Label, Ordering Label, and Working Label. We present the labels in the form of a
box:
[Label box: the top row shows the vertex (e.g. B), its ordering label (e.g. 4) and its permanent
label (e.g. 6); the bottom row shows its working (temporary) labels (e.g. 10, 7, 6)]
To implement the algorithm, we proceed as follows:
Step 1. Give the starting vertex a permanent label of zero and an ordering label of one.
[Label box: A, ordering label 1, permanent label 0]
Step 2. Consider the vertices directly connected to the vertex that was last given a permanent
label. Give each of them a working label equal to that permanent label plus the weight of the
connecting edge. If a vertex already has a working label, replace it only if the new value is
smaller.
Step 3. Among the vertices that do not yet have a permanent label, select the one with the
smallest working label. Make its permanent label equal to that working label, and give it the
next ordering label (2 the first time, then 3, and so on).
Step 4. Repeat Step 2 with the vertex that was last given a permanent label (identified in
Step 3), followed by Step 3, until the destination vertex has a permanent label. Each time a
permanent label is assigned, the ordering label is increased by one.
Step 5. Connect the destination to the start by working backwards. Always choose an edge for
which the difference in the permanent labels, found at the two end points of the edge, is equal to
weight of the edge.
Example
In the road network, the weight represents the length, in km, of the respective road. Find the
shortest route from A to J as well as its length.
Step 1. We start at point A:
Step 2. Since B and C can be reached directly from A, we will label them with
a working label of 5 and 12 respectively:
[Label boxes: A, ordering label 1, permanent label 0; C, working label 12]
Step 3. Since vertex B has the smallest working label, we update its ordering label to 2 and
its permanent label to 5. C is unchanged at this stage.
Step 4. Since B has just received its permanent label, we continue from B. We can reach
C and F directly from B.
[Label box: B, ordering label 2, permanent label 5]
The working label of C is equal to the permanent label of B plus the weight of the edge
connecting B to C: 5 + 6 = 11. Since vertex C already has the working label 12, we add the
working label 11, as it is smaller than 12. We do not add a working label if it is larger than
the existing one.
The working label of F is equal to the permanent label of B plus the weight of the edge
connecting B to F: 5 + 2 = 7.
[Label boxes: C, working labels 12 and 11; F, working label 7]
Since F now has the smallest working label (7) among the vertices without a permanent label,
we give F the ordering label 3 and the permanent label 7.
Step 5. We now continue from vertex F. This is repeated until vertex J gets a permanent label:
[Label box: F, ordering label 3, permanent label 7]
Always choose an edge for which the difference in the permanent labels, found at the two end
points of the edge, is equal to the weight of the edge. For example, from J we have to choose H
(because 22 – 15 = 7 = weight of the edge connecting H to J). We can’t choose G (because
22 – 14 = 8 ≠ 9, the weight of the edge connecting G to J).
Tracing back from J to A, the shortest route is A B F D G H J of length 22 km.
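The road network in this example is only available as a figure, so the sketch below runs instead on the six-vertex network from Prim's table-form example (our choice, for illustration). It uses a priority queue (`heapq`) to pick the vertex with the smallest working label; `dist` holds the working labels, and a vertex's label becomes permanent the first time it is popped from the queue. Instead of tracing back by comparing label differences (Step 5), it records each vertex's predecessor, which yields a shortest route directly. Like the algorithm itself, it assumes all weights are non-negative.

```python
import heapq
from collections import defaultdict

def dijkstra(weighted_edges, source, target):
    """Shortest route in an undirected network with non-negative weights.

    weighted_edges: list of (weight, u, v). Returns (distance, route).
    """
    adj = defaultdict(list)
    for w, u, v in weighted_edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    dist = {source: 0}          # best working label found so far
    prev = {}                   # predecessor on the best route found so far
    permanent = set()           # vertices whose label is final
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in permanent:
            continue            # stale entry: u was already made permanent
        permanent.add(u)        # smallest working label becomes permanent
        if u == target:
            break
        for v, w in adj[u]:
            if v not in permanent and d + w < dist.get(v, float("inf")):
                dist[v] = d + w          # keep only the smaller working label
                prev[v] = u
                heapq.heappush(heap, (d + w, v))

    if target not in permanent:
        return float("inf"), []          # d(a, b) = ∞: no walk exists
    route = [target]
    while route[-1] != source:
        route.append(prev[route[-1]])
    return dist[target], route[::-1]

NETWORK = [(4, "A", "B"), (4, "A", "C"), (2, "B", "C"), (3, "C", "D"),
           (2, "C", "E"), (6, "C", "F"), (3, "D", "F"), (3, "E", "F")]
print(dijkstra(NETWORK, "A", "F"))   # (9, ['A', 'C', 'E', 'F'])
```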
The Route Inspection Problem: Chinese Postman Problem

The Chinese Postman Problem is the problem of finding the shortest closed route that travels
along every edge of a network at least once. This is called the Postman Problem because the
postman needs to walk along every street to deliver the mail in the most efficient way. Eulerian
networks (all vertices have even degree) and semi-Eulerian networks (exactly two vertices have
odd degree) are traversable. An Eulerian network can have more than one optimum solution.
Step 1. We start by identifying the odd vertices in the network.

Step 2. List every way of pairing up the odd vertices. For each pairing, add up the shortest
distances between the paired vertices, and choose the pairing with the smallest sum. The
shortest routes between these pairs are the edges that will be added to (repeated in) the
original network.
Step 3. Calculate the sum of the weights on all the edges in the original network.
Step 4. The length of the shortest route is the sum of the weights in the original network plus
the extra distance found in Step 2.
Step 5. Find a tour that travels along every edge, including the repeated edges found in Step 2.
Note that when there are
(1) 2 odd nodes, only one pairing is possible;
(2) 4 odd nodes, 3 pairings are possible;
(3) 6 odd nodes, 15 pairings are possible; and
(4) n odd nodes, (n − 1) × (n − 3) × (n − 5) × … × 3 × 1 pairings are possible.
Example
What is the length of the shortest tour that covers all the edges in the given network? State a suitable route.
Step 1. Identify the vertices having an odd degree. They are A, C, D, and
E.
Step 2. Since there are 4 odd vertices, there are 3 possible pairings. Starting with A, we can pair A with C, D or E. If we pair A with C, we must pair D with E; if we pair A with D, we must pair C with E; if we pair A with E, we must pair C with D.
AC and DE: 6 + 14 = 20
AD and CE: 11 + 6 = 17
AE and CD: 12 + 8 = 20
The pairing having the smallest sum of weights is AD and CE.
The sum of weights in the original network is 124.

We will have to repeat AD and CE. This gives a total weight of 124 + 17 =
141.
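Step 2 for this example can be checked by brute force. The sketch below hard-codes the six shortest distances listed above and the original total of 124; it tries every pairing and keeps the cheapest, which is only practical for a small number of odd vertices. The names are illustrative.

```python
# Shortest distances between pairs of odd vertices, taken from the example above
DIST = {frozenset("AC"): 6, frozenset("DE"): 14,
        frozenset("AD"): 11, frozenset("CE"): 6,
        frozenset("AE"): 12, frozenset("CD"): 8}

def pairings(vertices):
    """Yield every way of splitting a string of odd vertices into two-letter pairs."""
    if not vertices:
        yield []
        return
    first, rest = vertices[0], vertices[1:]
    for i, partner in enumerate(rest):
        for tail in pairings(rest[:i] + rest[i + 1:]):
            yield [first + partner] + tail

def cheapest_pairing(odd_vertices, dist):
    """Return (extra distance, pairs) for the pairing with the smallest total weight."""
    return min((sum(dist[frozenset(p)] for p in pairing), pairing)
               for pairing in pairings(odd_vertices))

extra, pairs = cheapest_pairing("ACDE", DIST)
print(pairs, extra, 124 + extra)   # ['AD', 'CE'] 17 141
```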
A suitable route is A B E F D A C B F C E C D A.
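Step 5 (finding the tour) can be automated with Hierholzer's algorithm, a standard method for finding Eulerian circuits that this unit does not itself name. The sketch assumes the edge list read off the route above (the network diagram is not reproduced), with AD and CE listed twice. The circuit it returns may differ from the route printed above; any closed route that uses every edge exactly once is equally valid.

```python
from collections import Counter, defaultdict

# Edges of the network plus the repeated edges AD and CE (read off the route above)
EDGES = ["AB", "BE", "EF", "FD", "DA", "AC", "CB", "BF", "FC", "CE", "CD", "AD", "CE"]

def euler_circuit(edges, start):
    """Hierholzer's algorithm: a closed walk using every edge exactly once."""
    adj = defaultdict(list)                  # vertex -> [(neighbour, edge index)]
    for i, (u, v) in enumerate(edges):
        adj[u].append((v, i))
        adj[v].append((u, i))
    used = [False] * len(edges)
    stack, circuit = [start], []
    while stack:
        v = stack[-1]
        while adj[v] and used[adj[v][-1][1]]:
            adj[v].pop()                     # drop edges already travelled
        if adj[v]:
            w, i = adj[v].pop()
            used[i] = True
            stack.append(w)                  # extend the current trail
        else:
            circuit.append(stack.pop())      # dead end: vertex joins the circuit
    return "".join(reversed(circuit))

def uses_every_edge_once(route, edges):
    """True if the consecutive vertices of route match the edge list exactly."""
    walked = Counter(frozenset(pair) for pair in zip(route, route[1:]))
    return walked == Counter(frozenset(e) for e in edges)

route = euler_circuit(EDGES, "A")
print(route, uses_every_edge_once(route, EDGES))
print(uses_every_edge_once("ABEFDACBFCECDA", EDGES))   # True (the route in the text)
```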
7.8 Activities
Question 1
The following is a road network with the weight being the length in km.
(a) Using Dijkstra’s algorithm, calculate the shortest distance from S to G as well as state the shortest route.
(b) State the shortest route from S to H as well as the shortest distance.
Question 2
Graph G has vertices P, Q, R, S, T, U, V, and W.
(a) Write down an example of a cycle on G.

(b) Is P Q R T Q S a path?
(c) The weights show the distance in km:
(i) Starting at P, use Prim’s algorithm to find the minimum spanning tree for G.
(ii) Use Kruskal’s algorithm to build the minimum spanning tree.
Draw the minimum spanning tree using vertices of G.
Question 3
The diagram shows a network of canals. After an oil spill, an expert wishes to inspect all the
canals. The expert will travel along each canal at least once.
(a) Using the Route Inspection algorithm, find the length of the shortest route that the expert
can take. State the canals that must be traversed twice.
(b) How many times would the vertex F appear in the expert’s route?
(c) The expert decides to start at the vertex H but may finish at any vertex. Find the finishing vertex that makes the length of the route as short as possible.
Question 4
The graph is a network of roads with the weights equal to the length of the
road.
(a) Using Dijkstra’s algorithm, calculate the shortest distance from
A to I as well as state the shortest route.
(b) John decides to travel along each road at least once before ending his journey at A. Find the shortest route John can take.
Question 5
(a) What is the difference between Kruskal’s and Prim’s algorithms?

(b) Find the minimum spanning tree of the network using
(i) Prim’s algorithm.
(ii) Kruskal’s algorithm.
Question 6
(a) An archaeologist has found the old train track shown above. She wants to travel
from A to F. Use Dijkstra’s algorithm to find the shortest path she can
take.
(b) Using Prim’s algorithm and starting at G, find the minimum spanning
tree. State the length of the minimum spanning tree.
Solution
Question 1
(a)
(b) Shortest distance from S to H is 20 (km)

Shortest route from S to H is S – A – C – F – E – H
Question 2
(a) e.g. P Q S P
(b) No, because vertex Q appears more than once.
(c) (i) PS, ST, SV; QS, QR; RU, TW
(ii) ST SV PS QS (not QT) QR (not PQ) (not TV) RU TW
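Question 3
(a) The odd vertices are A, E, G and H. The three possible pairings are:
A(BC)E + H(F)G = 15 + 13 = 28*
A(BDF)H + E(F)G = 30 + 7 = 37
A(BDF)G + E(F)H = 21 + 16 = 37
Canals traversed twice: AB, BC, CE, HF, FG
Length: 214 + 28 = 242 (km)
(b) 4
(c) Starting at H, the expert must finish at one of the other odd vertices, A, E or G, and the path joining the remaining two odd vertices is repeated. Finishing at A means repeating EG (7), finishing at E means repeating AG (21) and finishing at G means repeating AE (15). EG (7) is the shortest, so we finish at A.
Length of route = 214 + 7 = 221 (km)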
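The comparisons in (a) and (c) can be checked with a short sketch. It hard-codes the shortest distances between the odd vertices A, E, G and H and the original total of 214 km, all taken from the working above, and it only handles the case of four odd vertices. The names are illustrative.

```python
from itertools import combinations

TOTAL = 214                                   # sum of all canal lengths (km)
DIST = {frozenset("AE"): 15, frozenset("GH"): 13, frozenset("AH"): 30,
        frozenset("EG"): 7, frozenset("AG"): 21, frozenset("EH"): 16}
ODD = "AEGH"

def closed_route_length():
    """(a) Return to the start: repeat the cheapest pairing of the four odd vertices."""
    best = min(DIST[frozenset(p)] + DIST[frozenset(set(ODD) - set(p))]
               for p in combinations(ODD, 2))
    return TOTAL + best

def open_route(start):
    """(c) Start fixed, finish free: repeat the path joining the other two odd vertices."""
    options = {}
    for end in ODD:
        if end != start:
            others = frozenset(set(ODD) - {start, end})
            options[end] = TOTAL + DIST[others]
    end = min(options, key=options.get)
    return end, options[end]

print(closed_route_length())    # 242
print(open_route("H"))          # ('A', 221)
```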
Question 4
(a) Shortest route ADGHI: 48 km
(b) Using the Route Inspection Algorithm:
Odd vertices are A and H. Shortest route from A to H = ADGH
Length = 197 + 36 = 233 (km)
A suitable route: ADGHGDACEDHIFHEFBA
Question 5
(a) Kruskal’s algorithm starts with the shortest edge, while Prim’s algorithm starts from any vertex. With Prim’s algorithm it is not necessary to check for cycles. Prim’s algorithm adds vertices to the tree, while Kruskal’s algorithm adds edges. Prim’s algorithm can be used with data given in matrix form.
(b) (i) e.g. AC, CF, FD, DE, DG, AB.
(ii) CF, DE, DF, not CD, not EF, DG, not FG, not EG, AC, not AD, AB.
Question 6
(a)
(b) Prim starting at G: GF, GH, FJ, DG, JK, EK, BE, AB, CD (or GF, GH, FJ, DG, CD, JK, EK, BE, AB)
Length of the minimum spanning tree: 80 km.
UNIT
Randomness
8
UNIT STRUCTURE
8.1 Introduction
8.2 Learning Outcomes
8.3 Randomness
8.4 Generators of Random Numbers
8.5 Test of Randomness
8.6 Activities
8.1 Introduction
In this Unit, you will be introduced to random numbers. Random numbers are important in cryptography, as well as in sampling, simulation and property testing.
8.2 Learning Outcomes
By the end of this Unit, you should be able to do the following:
1. Use random number generators.
2. Test randomness.
8.3 Randomness
I am sure you have all generated random numbers, as you must have tossed coins or thrown dice. For several thousand years, people have been throwing dice and tossing coins, and it has always been accepted that fair dice and fair coins generate random outcomes.
Randomness refers to the absence of patterns and predictability.
8.4 Random Number Generation
The sequence of random numbers generated must be statistically independent and identically
distributed (IID).
Microsoft Excel
I am sure that you know people who play Lotto and use the Excel function RANDBETWEEN(1,40) to generate a random number between 1 and 40. Using it six times generates six random numbers.
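In Python, `random.randint(1, 40)` plays the same role as RANDBETWEEN(1,40): both return a whole number from 1 to 40 inclusive. Six independent calls can return the same number more than once, which a Lotto draw does not allow; `random.sample` draws six different numbers instead. The seed value below is arbitrary and only makes the run repeatable.

```python
import random

rng = random.Random(2024)                 # arbitrary seed, for repeatability only

# Like calling RANDBETWEEN(1,40) six times: repeats are possible
six_calls = [rng.randint(1, 40) for _ in range(6)]

# Six distinct numbers from 1 to 40, as in a Lotto draw
lotto_draw = sorted(rng.sample(range(1, 41), 6))

print(six_calls)
print(lotto_draw)
```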
Random Number Table
You can also use random number tables. In most random number tables, the digits are presented in groups of 5 to make them easier to read. The first row of the following table is made up of the digits:
1 3 9 6 2 7 0 9 9 2 6 5 1 7 2…
Thus if I want four 1-digit random numbers, I may take 1, 3, 9, and 6.
If I want five 2-digit random numbers, I may take 13, 96, 27, 09, and 92.
If I want two 3-digit random numbers, I may take 139 and 627.
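Reading numbers off a row of the table amounts to cutting the digit string into consecutive blocks of the required width. A minimal sketch (the function name is illustrative) keeps the numbers as strings so that a leading zero, as in 09, is not lost:

```python
def take_numbers(row, width, count):
    """Read `count` numbers of `width` digits from a row of a random number table."""
    digits = "".join(row.split())         # ignore the spacing between digits/groups
    if width * count > len(digits):
        raise ValueError("not enough digits in this row")
    return [digits[i * width:(i + 1) * width] for i in range(count)]

row = "1 3 9 6 2 7 0 9 9 2 6 5 1 7 2"
print(take_numbers(row, 1, 4))   # ['1', '3', '9', '6']
print(take_numbers(row, 2, 5))   # ['13', '96', '27', '09', '92']
print(take_numbers(row, 3, 2))   # ['139', '627']
```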
Online Random Number Generators

There are many websites providing random numbers. For example, you can use
the website https://stattrek.com/statistics/random-number-generator.aspx#error
The website displays the results in a table; in the run shown, they were 2, 15, 7, 9, 3, and 22.
Linear Congruential Generators (LCGs)

An LCG requires us to define four integers:
m, referred to as the modulus, with m > 0
a, referred to as the multiplier, 0 ≤ a < m
c, referred to as the increment, 0 ≤ c < m
x0, referred to as the seed, 0 ≤ x0 < m
The sequence of random numbers is generated using $x_{i+1} = (a x_i + c) \bmod m$,
where “mod m” means “the remainder after dividing by m”.
Example
If we take m = 8, a = c = 3 and $x_0 = 2$, the sequence of random numbers generated is
$x_0 = 2$
$x_1 = (a x_0 + c) \bmod m = (3 \times 2 + 3) \bmod 8 = 1$
$x_2 = (a x_1 + c) \bmod m = (3 \times 1 + 3) \bmod 8 = 6$
$x_3 = (a x_2 + c) \bmod m = (3 \times 6 + 3) \bmod 8 = 5$
$x_4 = (a x_3 + c) \bmod m = (3 \times 5 + 3) \bmod 8 = 2$
The sequence of random numbers is 2, 1, 6, 5, 2, 1, 6, 5, 2, 1, 6, 5, …
Since $x_4 = x_0$, we say that the above sequence has a period of 4.
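The recurrence can be sketched in a few lines of Python. The function names are illustrative; `period` records where each value first appeared and stops at the first repeat, so it also works when the sequence only becomes periodic after a few steps.

```python
def lcg(seed, a, c, m, count):
    """Return [x0, x1, ...] (count values) using x_{i+1} = (a*x_i + c) mod m."""
    xs = [seed]
    while len(xs) < count:
        xs.append((a * xs[-1] + c) % m)
    return xs

def period(seed, a, c, m):
    """Length of the cycle the sequence eventually falls into."""
    first_seen = {}
    x, i = seed, 0
    while x not in first_seen:
        first_seen[x] = i
        x = (a * x + c) % m
        i += 1
    return i - first_seen[x]

print(lcg(2, 3, 3, 8, 12))   # [2, 1, 6, 5, 2, 1, 6, 5, 2, 1, 6, 5]
print(period(2, 3, 3, 8))    # 4
```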
Blum-Blum-Shub Generator
The Blum-Blum-Shub Generator (BBS) is commonly used in cryptography.
A Blum integer is the product of two distinct prime numbers, each of which can be written as 4t + 3, where t is an integer. 77 is a Blum integer, as its factors are 11 and 7: 11 = 4(2) + 3 and 7 = 4(1) + 3.
Examples of Blum integers are:
21, 33, 57, 69, 77, 93, 129, 133, 141, 161, 177, 201, 209, 213, 217, 237, 249,
253, 301, 309, 321, 329, 341, 381, 393, 413, 417, 437, 453, 469, 473, 489, 497,
501, 517, 537, 553, 573, 581, 589, 597, 633, 649, 669, 681, 713, 717, 721, 737,
749, 753, 781, 789,…
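Whether a number is a Blum integer can be checked by factorising it. The sketch below uses plain trial division, which is fine for numbers of this size but far too slow for the very large integers used in cryptography; the function names are illustrative.

```python
def prime_factors(n):
    """Prime factorisation of n by trial division, e.g. 77 -> [7, 11]."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def is_blum(n):
    """True if n = p*q with p, q distinct primes, both of the form 4t + 3."""
    f = prime_factors(n)
    return len(f) == 2 and f[0] != f[1] and all(p % 4 == 3 for p in f)

print(prime_factors(77), is_blum(77))               # [7, 11] True
print([n for n in range(1, 100) if is_blum(n)])     # [21, 33, 57, 69, 77, 93]
```

Listing `n` for which `is_blum(n)` holds, up to 789, reproduces the list of examples above.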
To generate random bits, the generator starts by selecting two prime numbers p and q such that their product m = pq is a Blum integer.
Choose a seed s between 1 and m − 1 (s should have no factor in common with m).
Compute $x_0 = s^2 \bmod m$, and then compute the sequence using the formula
$x_{n+1} = x_n^2 \bmod m$
Finally, the sequence of bits is created using $b_n = x_n \bmod 2$.
Example
p = 7, q = 19, m = 7 × 19 = 133 (m is a Blum integer)
Choose s = 100 (between 1 and 133 − 1 = 132), so s² = 10000 and $x_0 = 10000 \bmod 133 = 25$.
Each following value is $x_{n+1} = x_n^2 \bmod 133$; for example, $x_1 = 625 \bmod 133 = 93$.
n    x_n    x_n^2    b_n = x_n mod 2
0    25     625      1
1    93     8649     1
2    4      16       0
3    16     256      0
4    123    15129    1
The sequence generated is 11001.
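A minimal sketch of the generator, following the steps above ($x_0 = s^2 \bmod m$, then repeated squaring, keeping the last bit). It checks the 4t + 3 condition and that s shares no factor with m, but it assumes p and q are already prime. It is for learning only: cryptographic use requires very large primes.

```python
from math import gcd

def bbs_bits(p, q, s, count):
    """Blum-Blum-Shub: x0 = s^2 mod m, x_{n+1} = x_n^2 mod m, b_n = x_n mod 2.
    Primality of p and q is assumed, not checked."""
    if p % 4 != 3 or q % 4 != 3 or p == q:
        raise ValueError("p and q must be distinct primes of the form 4t + 3")
    m = p * q
    if not 1 <= s < m or gcd(s, m) != 1:
        raise ValueError("s must lie between 1 and m - 1 and share no factor with m")
    x = s * s % m                 # x0
    bits = []
    for _ in range(count):
        bits.append(str(x % 2))   # b_n = x_n mod 2
        x = x * x % m             # x_{n+1} = x_n^2 mod m
    return "".join(bits)

print(bbs_bits(7, 19, 100, 5))   # 11001
```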
8.5 Test of Randomness: Runs Test
Since in computing you will mostly be dealing with binary data, we will use the Wald-Wolfowitz runs test to test whether a sequence of binary digits is random.
A run is defined as a sequence of the same digit. Each time that the value of the
digit changes, a new run begins.
For example, we have generated the following sequence of binary digits:

111 00 1 0000 1 0 1 00000 has 8 runs.
Note that
The first run is 111
The second run is 00
The third run is 1
The fourth run is 0000
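Counting runs by hand is error-prone for long sequences. A minimal sketch using `itertools.groupby`, which groups consecutive equal digits, gives n1, n2 and r directly; spaces in the input are ignored. The function name is illustrative.

```python
from itertools import groupby

def run_summary(bits):
    """Return (n1 zeroes, n2 ones, r runs) for a string of binary digits."""
    bits = "".join(bits.split())            # allow spaces between runs
    n1 = bits.count("0")
    n2 = bits.count("1")
    r = sum(1 for _ in groupby(bits))       # each group of equal digits is one run
    return n1, n2, r

print(run_summary("111 00 1 0000 1 0 1 00000"))   # (12, 6, 8)
```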
I would advise you to carry out the runs test using Excel.
H0: The runs of binary digits are random.

H1: The runs of binary digits are not random.
Step 1. Find the number of zeroes generated and call it n1.
Step 2. Find the number of ones generated and call it n2.
Step 3. Find the number of runs and call it r.
In practice n1 and n2 are usually large (greater than 20); r then has an approximately normal distribution with mean μ and standard deviation σ, so we use a z-test.
Step 4. Calculate the mean using the formula
$$\mu = \frac{2 n_1 n_2}{n_1 + n_2} + 1$$
Step 5. Calculate the variance using the formula
$$\sigma^2 = \frac{2 n_1 n_2 \, (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 \, (n_1 + n_2 - 1)}$$
Step 6. Calculate the value of z using the formula
$$z = \frac{r - \mu}{\sigma}$$
Step 7. Find the critical values of z for a two-tailed test at the given significance level.
Step 8. State whether you accept H0 or not.
Note that if n1 and n2 are not large, we use a runs test table to obtain the critical values of r.
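Steps 4 to 8 can be sketched in Python as follows. The function names are illustrative; `normal_cdf` uses `math.erf` in place of Excel's NORMSDIST, and H0 is accepted when the two-tailed p-value is at least the significance level, matching the IF formula in the Excel example that follows.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative probability (the role of Excel's NORMSDIST)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def runs_test(n1, n2, r, alpha=0.05):
    """Large-sample Wald-Wolfowitz runs test, following Steps 4 to 8."""
    n = n1 + n2
    mu = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / (n ** 2 * (n - 1))
    z = (r - mu) / sqrt(var)
    p_value = 2 * (1 - normal_cdf(abs(z)))     # two-tailed
    accept_h0 = p_value >= alpha
    return mu, var, z, p_value, accept_h0

mu, var, z, p, accept = runs_test(18, 22, 20)
print(round(mu, 4), round(var, 4), round(z, 4), round(p, 4), accept)
```

With n1 = 18, n2 = 22 and r = 20 this should agree with the Excel values up to rounding; z comes out negative here because the sketch does not take the absolute value of r − μ.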
Example.
Suppose that we have 20 runs with 18 zeroes and 22 ones. Are the runs random?
Runs test in Excel
Cell  Quantity                      Value    Excel formula
B3    n1 = number of zeroes         18
B4    n2 = number of ones           22
B5    n = n1 + n2                   40
B6    r = number of runs            20
B7    mean, μ                       20.8     ((2*B3*B4)/B5) + 1
B8    variance, σ²                  9.5446   ((2*B3*B4)*(2*B3*B4-B3-B4))/(((B3+B4)^2)*(B3+B4-1))
B9    standard deviation, σ         3.0894   SQRT(B8)
B10   Z                             0.2589   ABS(B6-B7)/B9
B11   p-value                       0.7956   2*(1-NORMSDIST(B10))
B12   significance level of test    0.05     (5%)
B13   critical value of Z           1.9599   NORMSINV(1-B12/2)
B14   accept H0?                    YES      IF(B11<B12,"NO","YES")
We have taken a screenshot from the Excel workbook for you:
Example
Suppose that we have 10 runs with n1 = 17 zeroes and n2 = 9 ones. Are the runs
random?
Here n1 and n2 are small, so we use a table of critical values for the runs test, rather than the normal approximation, to find the numbers of runs that would allow us to accept H0.
From the table below:
We choose the row with n1 = 17, n2 = 9. This shows that the number of runs must lie between 8 and 19 for us to accept H0. Since we got 10 runs, we accept H0 at the 5% level of significance (two-tailed).
For other values of n1 and n2, the critical values of r (5% level of significance, two-tailed) are:
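Tables of critical values such as these come from the exact distribution of the number of runs when every arrangement of the n1 zeroes and n2 ones is equally likely. The sketch below computes that distribution with exact fractions. The counting formula for P(R = r) is the standard combinatorial result, which this unit does not state, and the two-tailed p-value convention used (twice the smaller tail) is one common choice, so treat the output as a cross-check rather than a replacement for the table. The function names are illustrative.

```python
from fractions import Fraction
from math import comb

def runs_distribution(n1, n2):
    """Exact P(R = r) under H0 for n1 >= 1 zeroes and n2 >= 1 ones."""
    total = comb(n1 + n2, n1)
    dist = {}
    for r in range(2, n1 + n2 + 1):
        k = r // 2
        if r % 2 == 0:
            # k runs of each digit, starting with either digit
            ways = 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
        else:
            # k + 1 runs of one digit and k runs of the other
            ways = (comb(n1 - 1, k) * comb(n2 - 1, k - 1)
                    + comb(n1 - 1, k - 1) * comb(n2 - 1, k))
        if ways:
            dist[r] = Fraction(ways, total)
    return dist

def two_tailed_p(n1, n2, r):
    """Exact two-tailed p-value: twice the smaller tail probability, capped at 1."""
    dist = runs_distribution(n1, n2)
    lower = sum(p for x, p in dist.items() if x <= r)
    upper = sum(p for x, p in dist.items() if x >= r)
    return min(1, 2 * min(lower, upper))

print(float(two_tailed_p(17, 9, 10)))
```

As a sanity check, the mean and variance of this exact distribution equal the formulas for μ and σ² given in Steps 4 and 5.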
8.6 Activities
Question 1.
Generate five random numbers using an LCG with seed x0 = 1073 and parameters a = 35, c = 528 and m = 2547.
Answer: 2425, 1352, 2002, 1829, and 868.
Question 2.
(a) Show that 781 is a Blum integer.
(b) Using the Blum-Blum-Shub Generator (BBS) with p = 11, q = 71, and seed s = 15, generate 5 binary digits.
Answer: 11101
p = 11, q = 71, m = 11 × 71 = 781
s = 15, s² = 225
n    x_n    x_n^2     b_n = x_n mod 2
0    225    50625     1
1    641    410881    1
2    75     5625      1
3    158    24964     0
4    753    567009    1
Question 3.
(a) Show that 5293 is a Blum integer.
(b) Using the Blum-Blum-Shub Generator (BBS) with p = 67, q = 79, and seed s = 127, generate 5 binary digits.
Answer: 01111
p = 67, q = 79, m = 67 × 79 = 5293
s = 127, s² = 16129
n    x_n     x_n^2       b_n = x_n mod 2
0    250     62500       0
1    4277    18292729    1
2    121     14641       1
3    4055    16443025    1
4    2967    8803089     1
Question 4
Suppose that we have 40 runs with 35 zeroes and 25 ones. Are the runs random
at 5% level of significance?
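Answer: No. Here μ ≈ 30.17, σ ≈ 3.73 and z ≈ 2.64, which is greater than the critical value 1.96, so H0 is rejected at the 5% level of significance.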