Introduction to Probability and Statistics (Course ID: MA2203). Course Teacher: Dr. Manas Ranjan Tripathy


Introduction to Probability and Statistics
Course ID: MA2203

Lecture-1
Course Teacher: Dr. Manas Ranjan Tripathy

Department of Mathematics
National Institute of Technology, Rourkela
Syllabus
Mid-Semester: Axioms of Probability (motivation, various types of definitions), some basic results on probability (Boole's and Bonferroni's inequalities), conditional probability, Bayes' theorem, probability for independent events, random variables, types of random variables (discrete and continuous), cumulative distribution function, probability mass function, probability density function, mean, variance, standard deviation, moments (central and about origin), moment generating function, special types of random variables. Discrete: uniform, binomial, geometric, Poisson, hypergeometric. Continuous: uniform, normal, gamma, exponential (one parameter).
After Mid-Semester: Two-dimensional random variables, joint CDF, joint PDF/PMF, marginal distribution, conditional distribution, calculating probabilities using a two-dimensional random variable, sampling (with and without replacement), distribution of the sample mean and variance in the case of the normal distribution, estimation: point estimation, method of moments, method of maximum likelihood, confidence intervals for mean and variance in the case of the normal, testing of hypotheses (parameters of the normal distribution), goodness-of-fit chi-square test, regression and correlation analysis, rank correlation coefficient.
Motivation
If an experiment is repeated under essentially homogeneous and similar conditions, we generally come across two types of situations. Note that an experiment is a process of measurement or observation, in a laboratory, in a factory, in nature or wherever; so 'experiment' is used here in a more general sense.
1. The result (or outcome) is unique or certain: deterministic or predictable. (a) For a perfect gas, PV = constant. (b) The velocity v of a particle after time t is given by v = u + at, etc.
2. The result is not unique, but may be one of several possible outcomes: unpredictable or probabilistic. This is also known as a random or statistical experiment.
Random Experiment
An experiment is called a random or statistical experiment if it satisfies the following three conditions: (1) all outcomes of the experiment are known in advance; (2) any performance of the experiment results in an outcome that is not known in advance; (3) the experiment can be repeated under identical conditions.
In probability theory, we study this uncertainty of a random experiment.
Examples of random experiments: (1) tossing a coin once; (2) rolling a six-faced die; (3) inspecting a light bulb; (4) asking for opinions about a new electronic product; (5) counting daily traffic accidents; (6) measuring the copper content of brass; (7) picking a card from a well-shuffled pack of cards; (8) measuring the tensile strength of wire.
Basic Terminologies
Sample space: The set of all possible outcomes of a random experiment. The elements of a sample space are known as sample points. We will denote the sample space by S. The result of an experiment is called an outcome.
Trial and event: Any particular performance of a random experiment is known as a trial. Outcomes or combinations of outcomes are called events. More formally, any subset of a sample space is called an event. Events will be denoted by capital letters, such as A, B, C, D, etc.
Example: In tossing a coin once, the sample space is S = {Head = H, Tail = T}; events are A = {Head}, B = {Tail}, C = {Head, Tail}, E = ∅.
σ-algebra or σ-field on S: A σ-field is a non-empty class of subsets of S that is closed under countable unions and complements and contains the null set ∅.
Examples of σ-fields: Let A = {a, b, c}, and define two classes of subsets of A: A₁ = {{a}, {b}, {c}} and A₂ = the power set of A. Then A₂ is a σ-field on A, whereas A₁ is not.
Exhaustive events: The total number of possible outcomes of a random experiment.
Favourable events: The number of cases favourable to an event.
Mutually exclusive events: Events are said to be mutually exclusive or incompatible if the happening of any one of them precludes the happening of all the others; that is, no two or more of them can happen simultaneously.
Equally likely events: Outcomes are called equally likely if, taking into consideration all the relevant evidence, there is no reason to expect one in preference to the others.
Independent events: Several events are said to be independent if the happening or non-happening of any one of them is not affected by supplementary knowledge concerning the occurrence of any number of the remaining events.
Probability (mathematical or classical definition): If a random experiment results in n exhaustive, mutually exclusive and equally likely outcomes, out of which m are favourable to the occurrence of an event A, then the probability of occurrence (or happening) of A, usually denoted by P(A), is given by

\[ P(A) = \frac{m}{n} = \frac{\text{No. of favourable cases}}{\text{Total no. of exhaustive cases}}. \]
Observations: We can see that P(A) ≥ 0; indeed 0 ≤ P(A) ≤ 1, and P(A) + P(Aᶜ) = 1.
Here we can compute the probability by logical reasoning, without conducting any experiment. Since the probability can be computed prior to obtaining any experimental data, it is also known as 'a priori' or mathematical probability.
In rolling a fair die, what is the probability of obtaining at least a 5, and what is the probability of getting an odd number? We can use the above definition to compute these probabilities, since here we can assume that the sample points are equally likely, exhaustive and finite. So P(at least 5) = P({5, 6}) = 2/6 = 1/3, and the probability of getting an odd number is P({1, 3, 5}) = 3/6 = 1/2.
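To make the classical definition concrete, here is a minimal Python sketch (not from the lecture; the event names are illustrative) that enumerates the equally likely outcomes of the die example and counts favourable cases:

```python
# Classical probability by enumeration: P(A) = |favourable| / |exhaustive|.
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}                    # sample space of one die roll
A = {s for s in S if s >= 5}              # event: at least a 5
B = {s for s in S if s % 2 == 1}          # event: an odd number

print(Fraction(len(A), len(S)))           # 1/3
print(Fraction(len(B), len(S)))           # 1/2
```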
Limitations: (a) If the various outcomes of the random experiment are not equally likely, i.e. do not have an equal chance of occurrence. (b) If the exhaustive number of outcomes of the random experiment is infinite or unknown.
Lecture-2
Statistical or Empirical Probability (von Mises): If an experiment is performed repeatedly under essentially homogeneous and identical conditions, then the limiting value of the ratio of the number of times the event occurs to the number of trials, as the number of trials becomes indefinitely large, is called the probability of happening of the event, it being assumed that the limit is finite and unique. Mathematically, if in n trials an event A happens m times, then

\[ P(A) = \lim_{n \to \infty} \frac{m}{n}. \]

J. E. Kerrich conducted a coin-tossing experiment with 10 sets of 1000 tosses each during his confinement in World War II. The numbers of heads found were: 502, 511, 529, 504, 476, 507, 520, 504, 529. The probability of getting a head in tossing a coin once is computed using the above definition as 5,079/10,000 = 0.5079 ≈ 0.5.
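As an illustration (a simulation sketch, not part of the lecture; random.random() < 0.5 stands in for a fair coin), the relative frequency m/n settles near 1/2 as n grows:

```python
# Empirical probability: the ratio m/n approaches P(head) = 0.5 as the
# number of trials n becomes large.
import random

random.seed(1)                            # fixed seed for reproducibility
for n in (100, 10_000, 1_000_000):
    m = sum(random.random() < 0.5 for _ in range(n))   # number of heads
    print(n, m / n)
```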
We may also regard probability as the counterpart of relative frequency. If we denote by f(A) the absolute frequency of an event A in n trials, then the relative frequency of A is

\[ f_{rel}(A) = \frac{f(A)}{n}. \]

We can observe that 0 ≤ f_rel(A) ≤ 1, and f_rel(S) = 1. For two mutually exclusive events A and B we have f_rel(A ∪ B) = f_rel(A) + f_rel(B). Having all this prior information regarding probability, we will try to generalize the definition in such a way that it includes all the previous definitions and is as practical as possible. This we call the axiomatic definition of probability, which we discuss next.
Axiomatic Definition

Let S be a given sample space, and let 𝒮 be a σ-field on it. Then a probability P is defined as a set function P : 𝒮 → [0, 1] which satisfies the following axioms.
1. For every event A ∈ 𝒮, 0 ≤ P(A) ≤ 1.
2. The entire sample space has probability P(S) = 1.
3. For mutually exclusive events A and B, P(A ∪ B) = P(A) + P(B).
Axiom (3) can be extended to a countable number of mutually exclusive events A₁, A₂, …; that is,

\[ P(A_1 \cup A_2 \cup \cdots) = P(A_1) + P(A_2) + \cdots. \]
Basic Results on Probability

1. The probability of the impossible event is zero, that is, P(∅) = 0.
Observe that ∅ does not contain any elements, hence S ∪ ∅ = S, and these two sets are disjoint. Using the third axiom, P(S ∪ ∅) = P(S) = P(S) + P(∅), which implies P(∅) = 0.
Note: P(A) = 0 does not imply that A is necessarily the empty set. In practice, probability 0 is assigned to events which are so rare that they happen only once in a lifetime. For example, in a random tossing of a coin, the event that the coin will stand erect on its edge is assigned probability 0.
2. The probability of the complementary event Aᶜ of A is obtained as P(Aᶜ) = 1 − P(A). To prove this, observe that A and Aᶜ are disjoint events and A ∪ Aᶜ = S. Hence, using axiom (3), P(A ∪ Aᶜ) = P(S) = 1, which implies P(A) + P(Aᶜ) = 1, and the result follows.
Some More Results

3. For any two events A and B, we have (i) P(Aᶜ ∩ B) = P(B) − P(A ∩ B). (ii) If B ⊂ A, then (a) P(A ∩ Bᶜ) = P(A) − P(B), (b) P(B) ≤ P(A). To prove (ii): the events B and A ∩ Bᶜ are mutually exclusive and their union is A, hence P(A) = P(B ∪ (A ∩ Bᶜ)) = P(B) + P(A ∩ Bᶜ), which implies P(A ∩ Bᶜ) = P(A) − P(B). Also P(A ∩ Bᶜ) ≥ 0, which implies P(A) ≥ P(B).
4. Addition theorem of probability: For events A and B in a sample space S, P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
Proof: The events A − (A ∩ B), A ∩ B, and B − (A ∩ B) are mutually exclusive and their union is A ∪ B. Hence, applying axiom (3), P(A ∪ B) = P(A − (A ∩ B)) + P(A ∩ B) + P(B − (A ∩ B)) = P(A) + P(B) − P(A ∩ B). This proves the theorem.
Ex. Prove this theorem without using the axioms of probability. Hint: use a set-theoretic approach.
Boole's Inequality

For n events A₁, A₂, …, Aₙ, we have

\[ (i)\ P\Big(\bigcap_{i=1}^{n} A_i\Big) \ge \sum_{i=1}^{n} P(A_i) - (n-1), \qquad (ii)\ P\Big(\bigcup_{i=1}^{n} A_i\Big) \le \sum_{i=1}^{n} P(A_i). \]

Proof (i): This can be proved by the method of mathematical induction. Verify the result for n = 2: P(A₁ ∪ A₂) = P(A₁) + P(A₂) − P(A₁ ∩ A₂) ≤ 1 implies P(A₁ ∩ A₂) ≥ P(A₁) + P(A₂) − 1, so the result is true for n = 2. Assume that the result holds for n = k, that is,

\[ P\Big(\bigcap_{i=1}^{k} A_i\Big) \ge \sum_{i=1}^{k} P(A_i) - (k-1). \]

We now prove the result for n = k + 1.
\[
\begin{aligned}
P\Big(\bigcap_{i=1}^{k+1} A_i\Big) &= P\Big(\Big(\bigcap_{i=1}^{k} A_i\Big) \cap A_{k+1}\Big) \\
&\ge P\Big(\bigcap_{i=1}^{k} A_i\Big) + P(A_{k+1}) - 1 \\
&\ge \sum_{i=1}^{k} P(A_i) - (k-1) + P(A_{k+1}) - 1 \\
&= \sum_{i=1}^{k+1} P(A_i) - k.
\end{aligned}
\]
Proof (ii): Applying the above inequality to the events A₁ᶜ, A₂ᶜ, …, Aₙᶜ, we have

\[ P\Big(\bigcap_{i=1}^{n} A_i^c\Big) \ge \sum_{i=1}^{n} P(A_i^c) - (n-1) = \big(1 - P(A_1)\big) + \cdots + \big(1 - P(A_n)\big) - (n-1) = 1 - \sum_{i=1}^{n} P(A_i). \]

Hence

\[ \sum_{i=1}^{n} P(A_i) \ge 1 - P\Big(\bigcap_{i=1}^{n} A_i^c\Big) = 1 - P\Big(\Big[\bigcup_{i=1}^{n} A_i\Big]^c\Big) = P\Big(\bigcup_{i=1}^{n} A_i\Big). \]
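Both inequalities can be sanity-checked numerically. A sketch on a small finite sample space with the uniform counting measure; the events and their number are arbitrary choices, not from the lecture:

```python
# Check Boole's inequalities for randomly generated events A1, ..., A4
# on S = {0, ..., 19} with P(E) = |E| / |S|.
import random
from fractions import Fraction

random.seed(0)
S = set(range(20))
P = lambda E: Fraction(len(E), len(S))

events = [{s for s in S if random.random() < 0.7} for _ in range(4)]
union = set().union(*events)
inter = S.intersection(*events)
n = len(events)

assert P(union) <= sum(P(A) for A in events)             # inequality (ii)
assert P(inter) >= sum(P(A) for A in events) - (n - 1)   # inequality (i)
print("both inequalities hold")
```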
Bonferroni's Inequality

Given n events A₁, A₂, …, Aₙ, we have

\[ \sum_{i=1}^{n} P(A_i) \;\ge\; P\Big(\bigcup_{i=1}^{n} A_i\Big) \;\ge\; \sum_{i=1}^{n} P(A_i) - \sum_{1 \le i < j \le n} P(A_i \cap A_j). \]

Proof: This can be proved by the method of mathematical induction. Check that the result is true for n = 3:

\[ P\Big(\bigcup_{i=1}^{3} A_i\Big) = \sum_{i=1}^{3} P(A_i) - \sum_{1 \le i < j \le 3} P(A_i \cap A_j) + P\Big(\bigcap_{i=1}^{3} A_i\Big) \ge \sum_{i=1}^{3} P(A_i) - \sum_{1 \le i < j \le 3} P(A_i \cap A_j). \]

Let the result be true for n = k. We now prove the result for n = k + 1.
\[
\begin{aligned}
P\Big(\bigcup_{i=1}^{k+1} A_i\Big) &= P\Big(\bigcup_{i=1}^{k} A_i \,\cup\, A_{k+1}\Big) \\
&= P\Big(\bigcup_{i=1}^{k} A_i\Big) + P(A_{k+1}) - P\Big[\Big(\bigcup_{i=1}^{k} A_i\Big) \cap A_{k+1}\Big] \\
&= P\Big(\bigcup_{i=1}^{k} A_i\Big) + P(A_{k+1}) - P\Big[\bigcup_{i=1}^{k} (A_i \cap A_{k+1})\Big] \\
&\ge \Big\{\sum_{i=1}^{k} P(A_i) - \sum_{1 \le i < j \le k} P(A_i \cap A_j)\Big\} + P(A_{k+1}) - P\Big\{\bigcup_{i=1}^{k} (A_i \cap A_{k+1})\Big\}.
\end{aligned}
\]
From Boole's inequality we have

\[ P\Big(\bigcup_{i=1}^{k} (A_i \cap A_{k+1})\Big) \le \sum_{i=1}^{k} P(A_i \cap A_{k+1}). \]

Using this, we get

\[ P\Big(\bigcup_{i=1}^{k+1} A_i\Big) \ge \sum_{i=1}^{k+1} P(A_i) - \sum_{1 \le i < j \le k} P(A_i \cap A_j) - \sum_{i=1}^{k} P(A_i \cap A_{k+1}) = \sum_{i=1}^{k+1} P(A_i) - \sum_{1 \le i < j \le k+1} P(A_i \cap A_j). \]
Lecture-3
Some Examples

1. An integer is chosen at random from the two hundred integers {1, 2, …, 200}. What is the probability that the integer is divisible by 6 or 8?
Ans: The sample space is S = {1, 2, …, 200}. The event that the integer chosen is divisible by 6 is A = {6, 12, 18, …, 198}, which gives |A| = 198/6 = 33; hence P(A) = 33/200. Similarly, for the event that the integer is divisible by 8, B = {8, 16, 24, …, 200}, |B| = 200/8 = 25, which implies P(B) = 25/200. The LCM of 6 and 8 is 24, and a number is divisible by both 6 and 8 exactly when it is divisible by 24. Hence A ∩ B = {24, 48, …, 192} and |A ∩ B| = 8, which gives P(A ∩ B) = 8/200. Hence we have P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 33/200 + 25/200 − 8/200 = 1/4.
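A brute-force check of this answer (an illustrative sketch, not part of the lecture):

```python
# Count the integers in {1, ..., 200} divisible by 6 or by 8 and compare
# with the inclusion-exclusion answer 1/4.
from fractions import Fraction

S = range(1, 201)
favourable = [k for k in S if k % 6 == 0 or k % 8 == 0]
print(Fraction(len(favourable), len(S)))   # 1/4
```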
Ex. A and B alternately cut a pack of cards and the pack is shuffled after each cut. If A starts and the game is continued until one cuts a diamond, what are the respective chances of A and B first cutting a diamond?
Ex. The sum of two non-negative quantities is equal to 2n. Find the chance that their product is not less than 3/4 times their greatest product.
2. The probability that a student passes a Physics test is 2/3, and the probability that he passes both a Physics test and an English test is 14/45. The probability that he passes at least one test is 4/5. What is the probability that he passes the English test?
Sol'n: Let A = the student passes the Physics test, B = the student passes the English test. Given P(A) = 2/3, P(B) = ?, P(A ∩ B) = 14/45, P(A ∪ B) = 4/5. We know P(A ∪ B) = P(A) + P(B) − P(A ∩ B). This gives P(B) = 4/9.
3. A card is drawn from a pack of 52 cards. Find the probability of getting a king or a heart or a red card.
Ans: Let A = the card is a king, B = the card is a heart, C = the card is red. The events A, B and C are not mutually exclusive. We need P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(B ∩ C) − P(A ∩ C) + P(A ∩ B ∩ C). Given P(A) = 4/52, P(B) = 13/52, P(C) = 26/52, P(A ∩ B) = 1/52, P(B ∩ C) = 13/52, P(A ∩ C) = 2/52, P(A ∩ B ∩ C) = 1/52. Hence the required probability is 7/13.
Conditional Probability: Sometimes it is essential to use prior information regarding the happening of an event. Suppose A and B are two events in a given sample space. The happening of A may be affected by the happening or non-happening of B. The probability of A, under the condition that another event, say B, has happened, is called the conditional probability of A given B; we denote it by P(A|B) and compute it as

\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) \ne 0. \]

Moreover, we get P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A), which is known as the multiplication rule.
Independent Events: Two events A and B are said to be independent if P(A ∩ B) = P(A)P(B). Hence, if the events A and B are independent, the conditional probabilities satisfy P(A|B) = P(A) and P(B|A) = P(B), provided P(A) ≠ 0 and P(B) ≠ 0.
The events A₁, A₂, …, Aₙ are said to be independent if

\[ P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)P(A_2)\cdots P(A_n) \]

and, for any k distinct events A_{j₁}, A_{j₂}, …, A_{jₖ},

\[ P(A_{j_1} \cap A_{j_2} \cap \cdots \cap A_{j_k}) = P(A_{j_1})P(A_{j_2})\cdots P(A_{j_k}), \]

where k = 2, 3, …, n − 1.
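The definition is easy to verify by enumeration. A sketch with an assumed example on two fair dice, A = "the sum is 8" and B = "the first die is even" (not from the lecture):

```python
# Conditional probability from counting: P(A|B) = P(A ∩ B) / P(B).
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))   # 36 equally likely outcomes
A = {w for w in S if w[0] + w[1] == 8}     # sum is 8
B = {w for w in S if w[0] % 2 == 0}        # first die even

P = lambda E: Fraction(len(E), len(S))
print(P(A & B) / P(B))                     # P(A|B) = 1/6
print(P(A))                                # unconditional P(A) = 5/36
```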
Pairwise Independent: The events A₁, A₂, …, Aₙ are said to be pairwise independent if P(Aᵢ ∩ Aⱼ) = P(Aᵢ)P(Aⱼ) for all i, j = 1, 2, …, n with i ≠ j.
Note that independence implies pairwise independence, but the converse may not be true.
Problem 1: Prove that if A and B are independent, then (i) A and Bᶜ, (ii) Aᶜ and B, and (iii) Aᶜ and Bᶜ are independent.
Proof (i): P(A ∩ Bᶜ) = P(A) − P(A ∩ B) = P(A)(1 − P(B)) = P(A)P(Bᶜ). (iii) P(Aᶜ ∩ Bᶜ) = P((A ∪ B)ᶜ) = 1 − P(A ∪ B) = 1 − P(A) − P(B) + P(A)P(B) = (1 − P(A))(1 − P(B)) = P(Aᶜ)P(Bᶜ).
Problem 2: If the events A, B and C are independent, then A ∪ B and C are also independent. (Homework)
Problem 3: Let A and B be two events such that P(A) = 3/4 and P(B) = 5/8; show that (i) P(A ∪ B) ≥ 3/4, and (ii) 3/8 ≤ P(A ∩ B) ≤ 5/8.
Proof (i): Since A ⊂ (A ∪ B), P(A) ≤ P(A ∪ B), which implies 3/4 ≤ P(A ∪ B). (ii) Also (A ∩ B) ⊂ B, so P(A ∩ B) ≤ P(B) = 5/8. Further, P(A ∪ B) = P(A) + P(B) − P(A ∩ B) ≤ 1. This implies 3/4 + 5/8 − 1 ≤ P(A ∩ B), which gives P(A ∩ B) ≥ 3/8. Combining, we have 3/8 ≤ P(A ∩ B) ≤ 5/8.
Bayes' Theorem: If A₁, A₂, …, Aₙ are mutually disjoint events with P(Aᵢ) ≠ 0, i = 1, 2, …, n, then for any arbitrary event B which is a subset of ⋃ᵢ Aᵢ such that P(B) > 0, we have

\[ P(A_i \mid B) = \frac{P(A_i)P(B \mid A_i)}{\sum_{i=1}^{n} P(A_i)P(B \mid A_i)} = \frac{P(A_i)P(B \mid A_i)}{P(B)}. \]

Here we can see that P(B) = ∑ᵢ₌₁ⁿ P(Aᵢ)P(B|Aᵢ). This is called the total probability.
Problem (1): There are two bags A and B. Bag A contains n white and 2 black balls, and bag B contains 2 white and n black balls. One of the two bags is selected at random and two balls are drawn from it without replacement. If both the balls drawn are white and the probability that bag A was used to draw the balls is 6/7, find the value of n.
Ans: Let E₁ be the event that bag A was selected and E₂ the event that bag B was selected. Let E be the event that the two balls drawn are white. Hence we have P(E₁) = P(E₂) = 1/2, P(E|E₁) = C(n, 2)/C(n + 2, 2), and P(E|E₂) = C(2, 2)/C(n + 2, 2). From Bayes' theorem,

\[ P(E_1 \mid E) = \frac{P(E_1)P(E \mid E_1)}{P(E_1)P(E \mid E_1) + P(E_2)P(E \mid E_2)} = \frac{6}{7}. \]

Substituting all the values on the left-hand side, we have after simplification n(n − 1)/(n(n − 1) + 2) = 6/7, which gives n² − n − 12 = 0 and consequently n = 4 or −3. Since n cannot be negative, we have n = 4.
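A numeric check of this answer with n = 4, using math.comb for the binomial coefficients (an illustrative sketch):

```python
# Posterior probability that bag A was chosen, given two white balls.
from fractions import Fraction
from math import comb

n = 4
P_E1 = P_E2 = Fraction(1, 2)                           # bag chosen at random
P_E_given_E1 = Fraction(comb(n, 2), comb(n + 2, 2))    # two white from bag A
P_E_given_E2 = Fraction(comb(2, 2), comb(n + 2, 2))    # two white from bag B

posterior = (P_E1 * P_E_given_E1) / (P_E1 * P_E_given_E1 + P_E2 * P_E_given_E2)
print(posterior)                                       # 6/7
```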
Lecture-4
Random Variable: A random variable X is a finite, single-valued function from the sample space S to ℝ such that the inverse images under X of all Borel sets in ℝ are events; that is, X⁻¹(B) = {w : X(w) ∈ B} is an event for all B ∈ ℬ. The class of Borel sets is the collection generated by the open or closed intervals in ℝ; it is closed under countable union, countable intersection and complementation. In order to verify that a real-valued function on S is a random variable, it is not necessary to check all Borel sets; it is sufficient to verify the condition for a suitable generating class of subsets of ℝ. Here we take the class of semi-closed intervals (−∞, x], x ∈ ℝ. Note that for any real a the probability P(X = a) with which X assumes the value a is defined, and for any interval I the probability P(X ∈ I) is defined.
We can see that the semi-closed interval

\[ (-\infty, x] = \bigcap_{n=1}^{\infty} \Big(-\infty, x + \frac{1}{n}\Big). \]
Examples (i): Suppose we toss a coin once. Here S = {H, T}. Let us define a function X : S → ℝ where X is the number of heads that turn up. To verify that it is a random variable, observe that X(T) = 0 and X(H) = 1, and take a subset of ℝ of the form (−∞, x], x ∈ ℝ:

X⁻¹(−∞, x] = ∅ if x < 0; = {T} if 0 ≤ x < 1; = {T, H} if x ≥ 1.

In all cases X⁻¹(−∞, x] is an event. Hence X is a random variable.
(ii): Suppose we throw a die once; S = {1, 2, 3, 4, 5, 6}. Define X as the number that shows up, that is, X(1) = 1, X(2) = 2, X(3) = 3, X(4) = 4, X(5) = 5, X(6) = 6. To check whether X is a random variable, observe that

X⁻¹(−∞, x] = ∅ if x < 1; = {1} if 1 ≤ x < 2; = {1, 2} if 2 ≤ x < 3; = {1, 2, 3} if 3 ≤ x < 4; = {1, 2, 3, 4} if 4 ≤ x < 5; = {1, 2, 3, 4, 5} if 5 ≤ x < 6; = {1, 2, 3, 4, 5, 6} if x ≥ 6.

In all cases X⁻¹(−∞, x] is an event. Hence X is a random variable.
In general, we can say that a random variable is the quantity we observe in a random experiment: the number of heads, the number that shows up in throwing a die, the number of deaths by cancer, the number of accidents in a city, the amount of rainfall, the hardness of steel, etc.
Distribution Function or Cumulative Distribution Function (CDF): A function F(x), defined on (−∞, ∞), that is monotonically non-decreasing and right continuous, with F(−∞) = 0 and F(∞) = 1. The CDF of a random variable X is defined as F(x) = P(X ≤ x); we read it as the probability that the random variable X will not exceed x. Here x ∈ ℝ.
(i) The probability that the random variable X lies in the interval a < X ≤ b is computed as P(a < X ≤ b) = F(b) − F(a): the interval (−∞, b] is the disjoint union of (−∞, a] and (a, b], hence F(b) = P(X ≤ a) + P(a < X ≤ b).
Types of random variables: (i) discrete type, (ii) continuous type.
Discrete Type RV: A random variable X is said to be discrete if X assumes only finitely or countably many values, say x₁, x₂, …, called the possible values of X, with probabilities p₁ = P(X = x₁), p₂ = P(X = x₂), …, whereas P(X ∈ I) = 0 for any interval I that does not contain any xᵢ. Here pᵢ > 0 and ∑ᵢ pᵢ = ∑ᵢ P(X = xᵢ) = 1, and these pᵢ are known as the probability mass function (pmf) of X. The CDF of a discrete type random variable X is obtained as

\[ F(x) = \sum_{x_i \le x} P(X = x_i). \]

Moreover, P(a < X ≤ b) = ∑_{a<x≤b} P(X = x) and P(a < X < b) = ∑_{a<x<b} P(X = x).
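The construction F(x) = ∑_{xᵢ ≤ x} P(X = xᵢ) is direct to mirror in code. A minimal sketch, assuming the fair-coin pmf of the example that follows:

```python
# CDF of a discrete random variable, accumulated from its pmf.
pmf = {0: 0.5, 1: 0.5}                     # assumed: X = number of tails

def F(x):
    return sum(p for xi, p in pmf.items() if xi <= x)

for x in (-1, 0, 0.5, 1, 2):
    print(x, F(x))                         # 0, 0.5, 0.5, 1.0, 1.0
```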
Examples of Discrete Type RV: (i) If we toss a coin once, then S = {H, T}. Let X be the number of tails; then X(H) = 0 = x₁ and X(T) = 1 = x₂. Further, p₁ = P(X = 0) and p₂ = P(X = 1) give the probability mass function of X; here we have two points, and p₁ + p₂ = 1. If the coin is fair we can take p₁ = p₂ = 1/2. (ii) Tossing of a die: X is the number that shows up. (iii) Suppose we toss two coins simultaneously: X is the total number of heads. (iv) Suppose we throw two fair dice simultaneously: X is the sum of the two numbers that show up.
Continuous Type RV: A random variable X and its distribution are called continuous if its distribution function F(x) can be obtained as an integral,

\[ F(x) = \int_{-\infty}^{x} f(v)\,dv, \]

where f(x) ≥ 0 is known as the probability density function (pdf) of X, and

\[ \int_{-\infty}^{\infty} f(x)\,dx = 1. \]

Differentiating F(x) at a point of continuity of f, we have F′(x) = f(x). Moreover, P(a < X < b) = P(a ≤ X ≤ b) = P(a ≤ X < b) = P(a < X ≤ b) = ∫ₐᵇ f(x) dx.
Examples of Continuous Type RV: (i) Let X have the density function f(x) = 0.75(1 − x²) if −1 ≤ x ≤ 1, and zero otherwise. Find the distribution function. Find the probabilities P(−1/2 ≤ X ≤ 1/2) and P(1/4 ≤ X ≤ 2).
Ans: For the CDF F(x) we have

\[ F(x) = \begin{cases} 0, & x \le -1, \\ \displaystyle\int_{-1}^{x} 0.75(1 - v^2)\,dv = 0.5 + 0.75x - 0.25x^3, & -1 < x \le 1, \\ 1, & x > 1. \end{cases} \]

Now P(−1/2 ≤ X ≤ 1/2) = F(1/2) − F(−1/2) = 0.6875, and P(1/4 ≤ X ≤ 2) = F(2) − F(1/4) = 1 − F(1/4) = 0.3164.
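These two probabilities can be confirmed by a simple numerical integration (an illustrative sketch; the midpoint rule and step count are arbitrary choices):

```python
# Midpoint-rule check of the density f(x) = 0.75(1 - x^2) on [-1, 1].
def integrate(f, a, b, n=100_000):
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

f = lambda x: 0.75 * (1 - x * x) if -1 <= x <= 1 else 0.0
print(integrate(f, -1, 1))                 # ~1.0  (total probability)
print(integrate(f, -0.5, 0.5))             # ~0.6875
print(integrate(f, 0.25, 2))               # ~0.3164
```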
Some More Examples of Continuous Type RV: (i) The probability density function of a random variable X is

\[ f(x) = \begin{cases} \frac{\sin x}{2}, & 0 \le x \le \pi, \\ 0, & \text{elsewhere.} \end{cases} \]

Check that it is a probability density function and find its cumulative distribution function. Further, find P(1/2 < X < π) and P(X > π/2).
(ii) Let X be a random variable having probability density function

\[ f(x) = \begin{cases} \frac{x}{6} + k, & 0 \le x \le 3, \\ 0, & \text{elsewhere.} \end{cases} \]

Find the value of k and obtain the cumulative distribution function F(x). Further, find (a) P(1 < X < 2), (b) P(X > 1.8), (c) P(3/2 < X < 3).
(iii) A continuous random variable has the probability density function

\[ f(x) = \begin{cases} k e^{-kx}, & x > 0, \\ 0, & \text{elsewhere.} \end{cases} \]

Find the value of k and the cumulative distribution function. Further, obtain the probabilities (a) P(X > 1/2), (b) P(1 < X < 2) and (c) P(X < 10).
Some More Examples of Discrete Type RV: (i) Let X be a discrete type random variable having probability mass function given by

x:        0   1    2    3    4    5     6      7
P(X = x): 0   k   2k   2k   3k   k²   2k²   7k² + k

Find the value of k and the cumulative distribution function of X. Further, find (a) P(0 < X < 1.5), (b) P(X ≥ 5), (c) P(1.9 ≤ X < 6), (d) P(X < 8).
(ii) Suppose we toss a pair of dice simultaneously. Let X denote the minimum of the two numbers that appear. Show that X is a random variable and find its cumulative distribution function F(x). Do the same problem if X denotes the maximum of the two numbers.
(iii) Let X be the sum of the two numbers that appear when two dice are thrown simultaneously. Show that X is a random variable and also obtain its cumulative distribution function.
(iv) Suppose we toss 3 coins simultaneously. Let X denote the total number of heads. Find the cumulative distribution function if X is a random variable.
Lecture-6
Moment Generating Function (MGF): Let X be a random variable defined on a sample space S. The MGF of X is denoted by M_X(t) and is defined as

\[ M_X(t) = E(e^{tX}) = \sum_{j} e^{t x_j}\,P(X = x_j) \ \text{ if } X \text{ is discrete}, \qquad M_X(t) = E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx \ \text{ if } X \text{ is continuous}. \]

Expanding the exponential,

\[ M_X(t) = E(e^{tX}) = E\Big(1 + tX + \frac{t^2X^2}{2!} + \cdots + \frac{t^rX^r}{r!} + \cdots\Big) = 1 + t\,E(X) + \frac{t^2}{2!}E(X^2) + \cdots + \frac{t^r}{r!}E(X^r) + \cdots = \sum_{r=0}^{\infty} \frac{t^r}{r!}\,\mu'_r, \]

where μ′ᵣ = E(Xʳ) is the r-th moment of X about the origin. That is, the coefficient of tʳ/r! in M_X(t) gives the r-th moment about the origin, μ′ᵣ.
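The moment-extraction property can be checked numerically. A sketch, assuming a fair die as the distribution and a central finite difference for the second derivative of M_X at t = 0:

```python
# d^2/dt^2 M_X(t) at t = 0 should equal E(X^2); for a fair die E(X^2) = 91/6.
from math import exp

pmf = {x: 1 / 6 for x in range(1, 7)}      # fair die
M = lambda t: sum(exp(t * x) * p for x, p in pmf.items())

h = 1e-4                                   # finite-difference step
second_derivative = (M(h) - 2 * M(0) + M(-h)) / h**2
print(second_derivative)                   # ~15.1667
print(sum(x * x * p for x, p in pmf.items()))   # exact E(X^2)
```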
Further, differentiating M_X(t) r times with respect to t and putting t = 0, we get

\[ \frac{d^r}{dt^r}\big[M_X(t)\big]_{t=0} = \Big[\mu'_r\,\frac{r!}{r!} + \mu'_{r+1}\,t + \mu'_{r+2}\,\frac{t^2}{2!} + \cdots\Big]_{t=0} = \mu'_r. \]

The moment generating function of X about any point a is given by

\[ M_{(X-a)}(t) = E\big(e^{t(X-a)}\big) = E\Big[1 + t(X-a) + \frac{t^2}{2!}(X-a)^2 + \cdots + \frac{t^r}{r!}(X-a)^r + \cdots\Big]. \]

Since it generates the moments, it is known as the moment generating function.
The moment generating function, if it exists, is unique: for a given random variable X we can find its moment generating function uniquely, and vice versa. In other words, the moment generating function characterizes the distribution function.
Relation between moments about the origin (μ′ᵣ) and central moments (μᵣ):

\[ \mu_k = E(X-\mu)^k = C(k,0)\mu'_k - C(k,1)\mu\,\mu'_{k-1} + C(k,2)\mu^2\,\mu'_{k-2} - \cdots + (-1)^k \mu^k, \]
\[ \mu'_k = E(X^k) = E(X-\mu+\mu)^k = C(k,0)\mu_k + C(k,1)\mu\,\mu_{k-1} + C(k,2)\mu^2\,\mu_{k-2} + \cdots + \mu^k. \]

Exercises: (i) Let the random variable X have the pmf P(X = r) = q^{r−1}p, r = 1, 2, 3, …, with 0 < p, q < 1 and p + q = 1. Find the MGF of X and hence its mean and variance.
Ans: For qe^t < 1,

\[ M_X(t) = E(e^{tX}) = \sum_{r=1}^{\infty} e^{tr} q^{r-1} p = \frac{p}{q}\sum_{r=1}^{\infty} (qe^t)^r = \frac{p}{q}\,qe^t \sum_{r=1}^{\infty} (qe^t)^{r-1} = pe^t\big[1 + qe^t + (qe^t)^2 + \cdots\big] = \frac{pe^t}{1 - qe^t}. \]
(ii) The probability density function of the random variable X is given by

\[ f(x) = \frac{1}{2\theta} \exp\Big(-\frac{|x - \theta|}{\theta}\Big), \quad -\infty < x < \infty. \]

Find the moment generating function of X, and hence the mean and variance.
Ans:

\[
\begin{aligned}
M_X(t) = E(e^{tX}) &= \int_{-\infty}^{\infty} \frac{1}{2\theta} \exp\Big(-\frac{|x-\theta|}{\theta}\Big) e^{tx}\,dx \\
&= \int_{-\infty}^{\theta} \frac{1}{2\theta} \exp\Big(-\frac{\theta - x}{\theta}\Big) e^{tx}\,dx + \int_{\theta}^{\infty} \frac{1}{2\theta} \exp\Big(-\frac{x-\theta}{\theta}\Big) e^{tx}\,dx \\
&= \frac{1}{2\theta e} \int_{-\infty}^{\theta} \exp\Big\{x\Big(t + \frac{1}{\theta}\Big)\Big\}\,dx + \frac{e}{2\theta} \int_{\theta}^{\infty} \exp\Big\{-x\Big(\frac{1}{\theta} - t\Big)\Big\}\,dx \\
&= \frac{e^{\theta t}}{2(\theta t + 1)} + \frac{e^{\theta t}}{2(1 - \theta t)} = \frac{e^{\theta t}}{1 - \theta^2 t^2} = e^{\theta t}(1 - \theta^2 t^2)^{-1}.
\end{aligned}
\]
Hence

\[ M_X(t) = e^{\theta t}(1 - \theta^2 t^2)^{-1} = \Big(1 + \theta t + \frac{\theta^2 t^2}{2!} + \cdots\Big)\big(1 + \theta^2 t^2 + \theta^4 t^4 + \cdots\big) = 1 + \theta t + \frac{3\theta^2 t^2}{2!} + \cdots \]

The mean of X is the coefficient of t in M_X(t), which is θ; that is, μ′₁ = θ. Also μ′₂ = the coefficient of t²/2! in M_X(t), which is 3θ². Hence the variance of X is V(X) = μ′₂ − (μ′₁)² = 3θ² − θ² = 2θ².
(iii) If the moments of a random variable X are given by E(Xʳ) = 0.6, r = 1, 2, 3, …, show that P(X = 0) = 0.4, P(X = 1) = 0.6 and P(X ≥ 2) = 0.
Ans: The moment generating function is given by

\[ M_X(t) = \sum_{r=0}^{\infty} \frac{t^r}{r!}\,\mu'_r = 1 + \sum_{r=1}^{\infty} (0.6)\frac{t^r}{r!} = 0.4 + 0.6 + \sum_{r=1}^{\infty} (0.6)\frac{t^r}{r!} = 0.4 + 0.6\Big[\sum_{r=0}^{\infty} \frac{t^r}{r!}\Big] = 0.4 + 0.6\,e^t. \]
But we also have

\[ M_X(t) = E(e^{tX}) = \sum_{x=0}^{\infty} e^{tx} P(X = x) = P(X = 0) + e^{t} P(X = 1) + \sum_{x=2}^{\infty} e^{tx} P(X = x). \]

Comparing these two expressions, we get the required result: P(X = 0) = 0.4, P(X = 1) = 0.6 and P(X ≥ 2) = 0.
Lecture-7
Special Types of Distributions (Discrete): (A) Degenerate Distribution. A random variable X is said to be degenerate at a point k if P(X = k) = 1 and P(X ≠ k) = 0. Its CDF is F(x) = 0 for x < k and F(x) = 1 for x ≥ k.
Its MGF is M_X(t) = e^{tk}, hence E(Xʳ) = kʳ. Mean E(X) = k, V(X) = 0.
(B) Two Point Distribution: A random variable X has a two point distribution if it takes two values, say x₁ and x₂, with probabilities (pmf) P(X = x₁) = p and P(X = x₂) = 1 − p, 0 < p < 1. We can check that M_X(t) = pe^{tx₁} + (1 − p)e^{tx₂}, E(X) = px₁ + (1 − p)x₂, and V(X) = p(1 − p)(x₁ − x₂)².
If we choose x₁ = 0 and x₂ = 1, we get the Bernoulli random variable, with P(X = 0) = p and P(X = 1) = 1 − p.
The Bernoulli distribution arises in coin-tossing problems.
(C) Binomial Distribution: This distribution occurs in games of chance, quality inspection, opinion polls, etc. Consider the number of times an event, say A, occurs in n trials, where A occurs in each trial with probability p; in a trial, A fails to occur with probability 1 − p = q. Our interest here is the random variable X = number of times A occurs in n trials. In this case X takes the values 0, 1, 2, …, n.
In general, suppose a statistical experiment has two outcomes for a given performance or trial, say success or failure, with success probability p and failure probability q = 1 − p; X is the number of successes in n trials.
For example, let us toss a coin n times. Let A = head occurs and Aᶜ = tail occurs, with P(H) = p and P(T) = 1 − p. We perform n trials, that is, toss the coin n times, and let X be the number of heads (successes). Here we are not concerned with which trial gives a success; rather we want the total number of successes in n trials. X = x means that in n trials we have x successes (heads) and n − x failures (tails). We can picture this as

(A, A, A, …, A) x times, (Aᶜ, Aᶜ, Aᶜ, …, Aᶜ) n − x times.
If we want the probability of getting x successes in n trials, it is

\[ P(X = x) = C(n, x)\,p^x q^{n-x} = \frac{n!}{(n-x)!\,x!}\,p^x q^{n-x}, \quad x = 0, 1, 2, \dots, n. \]

This is called the binomial random variable, with probability mass function as given above.
The moment generating function is

\[ M_X(t) = E(e^{tX}) = \sum_{x=0}^{n} e^{tx}\,C(n,x)\,p^x (1-p)^{n-x} = \sum_{x=0}^{n} C(n,x)(pe^t)^x q^{n-x} = (q + pe^t)^n. \]
Mean of Binomial RV:

\[
\begin{aligned}
\mu = E(X) &= \sum_{x=0}^{n} x\,P(X = x) = \sum_{x=0}^{n} x\,\frac{n!}{(n-x)!\,x!}\,p^x q^{n-x} \\
&= \sum_{x=1}^{n} \frac{n!}{(n-x)!\,(x-1)!}\,p^x q^{n-x} \\
&= np \sum_{x=1}^{n} \frac{(n-1)!}{((n-1)-(x-1))!\,(x-1)!}\,p^{x-1} q^{(n-1)-(x-1)} \\
&= np \sum_{x^*=0}^{n^*} \frac{n^*!}{(n^*-x^*)!\,x^*!}\,p^{x^*} q^{n^*-x^*} = np,
\end{aligned}
\]

where n* = n − 1 and x* = x − 1, so the last sum is a full binomial pmf and equals 1.
Variance of Binomial RV: The variance is V(X) = E(X²) − (E X)².

\[
\begin{aligned}
E(X^2) &= E\big(X(X-1) + X\big) = E\big(X(X-1)\big) + E(X) \\
&= \sum_{x} x(x-1)\,\frac{n!}{(n-x)!\,x!}\,p^x q^{n-x} + E(X) \\
&= \sum_{x} \frac{n!}{(n-x)!\,(x-2)!}\,p^x q^{n-x} + E(X) \\
&= n(n-1)p^2 \sum_{x} \frac{(n-2)!}{((n-2)-(x-2))!\,(x-2)!}\,p^{x-2} q^{(n-2)-(x-2)} + E(X) \\
&= n(n-1)p^2 \sum_{x^*=0}^{n^*} \frac{n^*!}{(n^*-x^*)!\,x^*!}\,p^{x^*} q^{n^*-x^*} + E(X) = n(n-1)p^2 + E(X),
\end{aligned}
\]

where now n* = n − 2 and x* = x − 2. Hence V(X) = E(X²) − (E X)² = n(n − 1)p² + np − n²p² = np(1 − p) = npq.
Application of Binomial Distribution: (i) Suppose we roll a fair die 4 times. Find the probability of obtaining at least two sixes.
Ans: Let A = the event that a six is obtained; p = P(A) = 1/6, q = 1 − p = 5/6, and n = 4 trials. Here X is the number of sixes in 4 trials, so the probability mass function of X is

\[ P(X = k) = C(4, k)\Big(\frac{1}{6}\Big)^{k}\Big(\frac{5}{6}\Big)^{4-k}, \quad k = 0, 1, 2, 3, 4. \]

We get

\[ P(X \ge 2) = P(X=2) + P(X=3) + P(X=4) = C(4,2)\Big(\frac{1}{6}\Big)^2\Big(\frac{5}{6}\Big)^2 + C(4,3)\Big(\frac{1}{6}\Big)^3\Big(\frac{5}{6}\Big) + C(4,4)\Big(\frac{1}{6}\Big)^4 = \frac{6 \cdot 25 + 4 \cdot 5 + 1}{6^4} = \frac{171}{1296} \approx 13.2\%. \]
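A check of this computation in exact rational arithmetic (an illustrative sketch):

```python
# P(X >= 2) for X ~ Binomial(n = 4, p = 1/6).
from fractions import Fraction
from math import comb

n, p = 4, Fraction(1, 6)
pmf = lambda k: comb(n, k) * p**k * (1 - p)**(n - k)
print(sum(pmf(k) for k in range(2, n + 1)))   # 19/144 = 171/1296
```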
(ii) If X is a binomial random variable with E(X) = 2 and V(X) = 4/3, find the distribution of X.
Ans: The mean is np = E(X) = 2 and V(X) = npq = 4/3. Solving these two equations for n and p, we get p = 1/3, n = 6 and q = 2/3. The probability mass function of X is

\[ P(X = k) = C(6, k)\,p^k q^{6-k}, \quad k = 0, 1, 2, 3, 4, 5, 6. \]

(iii) The probability that the life of a bulb is 100 days is 0.05. Find the probability that out of 6 bulbs (a) at least one, (b) none, (c) more than 4, (d) between 1 and 3 will have a life of 100 days.
Ans: Given that the probability that the life of a bulb is 100 days is 0.05, we have p = 0.05 = 1/20, q = 19/20 = 0.95 and n = 6. Using the binomial distribution, P(X = k) = C(n, k)pᵏq^{n−k}, k = 0, 1, 2, …, n.
(a) The probability that out of 6 bulbs at least one has a life of 100 days is

\[ P(X \ge 1) = 1 - P(X < 1) = 1 - P(X = 0) = 1 - C(6,0)\Big(\frac{1}{20}\Big)^0\Big(\frac{19}{20}\Big)^6 = 0.265. \]
(b) P(X = 0) = C(6,0)(1/20)⁰(19/20)⁶ = (19/20)⁶ = 0.735.
(c) P(X > 4) = P(X = 5) + P(X = 6) = C(6,5)(1/20)⁵(19/20) + C(6,6)(1/20)⁶ = 115/20⁶.
(d)

\[ P(1 \le X \le 3) = P(X=1) + P(X=2) + P(X=3) = C(6,1)(0.05)(0.95)^5 + C(6,2)(0.05)^2(0.95)^4 + C(6,3)(0.05)^3(0.95)^3 = \frac{2471 \cdot 19^3}{20^6}. \]

Find the maximum n such that the probability of getting no head in tossing a coin n times is greater than 0.1.
Ans: The probability of getting a head in tossing a coin is p = 1/2, so q = 1/2. From the binomial distribution, P(X = k) = C(n, k)pᵏq^{n−k}, k = 0, 1, 2, …, n. It is given that P(X = 0) > 0.1, that is, C(n, 0)(0.5)⁰(0.5)ⁿ > 0.1, which gives 2ⁿ < 10. The maximum value of n satisfying this inequality is n = 3.
Lecture-8
(D) Poisson Distribution: A random variable assuming non-negative integer values with probability mass function

\[ P(X = k) = \frac{e^{-\lambda}\lambda^k}{k!}, \quad k = 0, 1, 2, \dots \]

is known as a Poisson random variable. Here λ > 0 is a constant and is the rate at which the event happens. The distribution is named after S. D. Poisson.
This random variable arises in the following practical situations and can be well understood through these examples: (i) counting the number of vehicles passing a traffic post within some time interval; (ii) the number of phone calls recorded within a time period; (iii) in general, the number of times a particular event happens, which we also call a success. In these cases the random variable X is the number of vehicles, the number of phone calls, or in general the number of successes; note that the number of successes may be 0, 1, 2, …. We also call this type of process a Poisson process.
Mean of Poisson RV: The mean is given by

\[ E(X) = \sum_{k=0}^{\infty} k\,\frac{e^{-\lambda}\lambda^k}{k!} = \sum_{k=1}^{\infty} \frac{e^{-\lambda}\lambda^k}{(k-1)!} = \lambda e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!} = \lambda e^{-\lambda} e^{\lambda} = \lambda. \]

Variance: V(X) = E(X²) − (E X)². Hence we need to compute

\[ E(X^2) = E\big(X(X-1)\big) + E(X) = \sum_{k=0}^{\infty} k(k-1)\,\frac{e^{-\lambda}\lambda^k}{k!} + \lambda = \sum_{k=2}^{\infty} \frac{e^{-\lambda}\lambda^k}{(k-2)!} + \lambda = \lambda^2 e^{-\lambda} \sum_{k=2}^{\infty} \frac{\lambda^{k-2}}{(k-2)!} + \lambda = \lambda^2 e^{-\lambda} e^{\lambda} + \lambda = \lambda^2 + \lambda. \]

Hence V(X) = λ² + λ − λ² = λ.
Moment Generating Function: The moment generating function is given by

\[ M_X(t) = E(e^{tX}) = \sum_{k=0}^{\infty} e^{tk}\,\frac{e^{-\lambda}\lambda^k}{k!} = e^{-\lambda} \sum_{k=0}^{\infty} \frac{(\lambda e^t)^k}{k!} = e^{-\lambda} e^{\lambda e^t}. \]

Ex: In a book of 520 pages, 390 typographical errors occur. Assuming a Poisson distribution for the number of errors per page, find the probability that a random sample of 5 pages will contain no error.
Ans: The average number of typographical errors per page in the book is λ = 390/520 = 0.75. Hence, using the Poisson distribution, the probability of k errors on a page is

\[ P(X = k) = \frac{e^{-\lambda}\lambda^k}{k!} = \frac{e^{-0.75}(0.75)^k}{k!}. \]

The required probability that a random sample of 5 pages contains no error is [P(X = 0)]⁵ = (e^{−0.75})⁵ = e^{−3.75}.
Poisson Distribution as an Approximation or Limit of the Binomial: The Poisson distribution can be obtained from the binomial distribution under the following conditions:
(i) the number of trials is indefinitely large, that is, n → ∞;
(ii) the success probability p for each trial is indefinitely small, that is, p → 0;
(iii) np → λ (finite), so under the above two conditions the mean of the binomial tends to the mean of the Poisson.
Ex.1: If the probability of producing a defective screw is p = 0.01, what is the probability that a lot of 100 screws will contain more than 2 defectives?
Ans: (a) Using the binomial distribution: Let A = the event that the lot contains more than 2 defectives, and Aᶜ = the event that it does not. Here p = 0.01, n = 100, q = 0.99. Now

\[ P(A^c) = C(100,0)(0.99)^{100} + C(100,1)(0.01)(0.99)^{99} + C(100,2)(0.01)^2(0.99)^{98} = 0.9206. \]

Hence P(A) = 1 − P(Aᶜ) = 0.0794.
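The two answers can be placed side by side (a sketch that reproduces part (a) above and anticipates the Poisson answer in part (b) below):

```python
# Exact Binomial(100, 0.01) versus the Poisson limit with lambda = np = 1
# for P(more than 2 defectives).
from math import comb, exp, factorial

n, p, lam = 100, 0.01, 1.0
binom = lambda k: comb(n, k) * p**k * (1 - p)**(n - k)
poisson = lambda k: exp(-lam) * lam**k / factorial(k)

print(1 - sum(binom(k) for k in range(3)))    # ~0.0794
print(1 - sum(poisson(k) for k in range(3)))  # ~0.0803
```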
(b) Using the Poisson distribution: Here n = 100 (large) and p = 0.01 (small, tending to 0); further, np → 1 = λ, the mean of the Poisson. Applying the Poisson distribution,

\[ P(A^c) = P(X=0) + P(X=1) + P(X=2) = \frac{e^{-\lambda}\lambda^0}{0!} + \frac{e^{-\lambda}\lambda^1}{1!} + \frac{e^{-\lambda}\lambda^2}{2!} = e^{-1}\Big(1 + 1 + \frac{1}{2}\Big) = 0.9197. \]

Hence P(A) = 1 − P(Aᶜ) = 0.0803.
Ex.2: If the variance of a Poisson rv is 3, find the probability that (i) X = 0, (ii) 0 < X < 3, (iii) 1 ≤ X < 4.
Ans: (i) V(X) = λ = 3, hence P(X = 0) = e⁻³3⁰/0! = e⁻³. (ii) P(0 < X < 3) = P(X = 1) + P(X = 2) = e⁻³3¹/1! + e⁻³3²/2! ≈ 0.373. (iii) P(1 ≤ X < 4) = e⁻³3¹/1! + e⁻³3²/2! + e⁻³3³/3! ≈ 0.597.
Ex.3: The average number of phone calls per minute coming into a switchboard between 2 pm and 4 pm is 2.5. Determine the probability that during one particular minute there will be (i) 4 or fewer calls, (ii) more than 6 calls.
Ans: The probability mass function of the Poisson random variable is

\[ P(X = k) = \frac{e^{-\lambda}\lambda^k}{k!}, \quad k = 0, 1, 2, \dots \]

(i) The mean is λ = 2.5. The probability of 4 or fewer calls is

\[ P(X \le 4) = \frac{e^{-2.5}(2.5)^0}{0!} + \frac{e^{-2.5}(2.5)^1}{1!} + \frac{e^{-2.5}(2.5)^2}{2!} + \frac{e^{-2.5}(2.5)^3}{3!} + \frac{e^{-2.5}(2.5)^4}{4!} \approx 0.8912. \]

(ii) The probability of more than 6 calls is P(X > 6) = 1 − P(X ≤ 6) ≈ 0.014.
Lecture-9
Sampling: Randomly drawing objects one at a time from a given set of objects is known as sampling.
(i) Sampling with replacement: objects drawn are put back into the set before drawing the next item.
(ii) Sampling without replacement: objects drawn are kept aside while we draw the next one.
Example: A box contains 10 screws, three of which are defective. Two screws are drawn at random. Find the probability that none of the two screws is defective.
Ans: Let A = the first drawn screw is non-defective, B = the second drawn screw is non-defective.
With replacement: P(A) = 7/10, P(B) = 7/10. The events A and B are independent, so P(A ∩ B) = P(A)P(B) = 49/100 = 0.49.
Without replacement: P(A) = 7/10, P(B|A) = 6/9 = 2/3, so P(A ∩ B) = P(A)·P(B|A) = (7/10)(2/3) = 7/15.
(E) Hypergeometric Distribution: Suppose there are N fish in a lake or pond. We catch M of them, colour them, and then put them all back into the pond. Now from the pond we catch n fish at random. We want to determine the probability that exactly x of them are coloured.
(i) With replacement (independent trials): In one trial the probability of a coloured fish is M/N = p; the probability of a non-coloured one is 1 − M/N = 1 − p = q. If we draw n times, this leads to the binomial distribution with probability mass function

\[ P(X = x) = C(n, x)\Big(\frac{M}{N}\Big)^{x}\Big(1 - \frac{M}{N}\Big)^{n-x}, \quad x = 0, 1, 2, \dots, n. \]

(ii) Without replacement: Here the trials are not independent. Since we need x coloured fish out of M, which can be chosen in C(M, x) ways, and from the remaining N − M fish we can take n − x in C(N − M, n − x) ways, while the total number of ways is C(N, n), the probability of getting exactly x coloured fish is

\[ P(X = x) = \frac{C(M, x)\,C(N - M, n - x)}{C(N, n)}, \quad x = 0, 1, 2, \dots, n. \]

This is called the hypergeometric distribution.
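The two sampling schemes are easy to compare numerically. A sketch with assumed values N = 50, M = 10, n = 5 (not from the lecture):

```python
# Hypergeometric pmf (without replacement) next to the binomial pmf
# (with replacement) for the fish example.
from math import comb

N, M, n = 50, 10, 5
hyper = lambda x: comb(M, x) * comb(N - M, n - x) / comb(N, n)
binom = lambda x: comb(n, x) * (M / N) ** x * (1 - M / N) ** (n - x)

for x in range(n + 1):
    print(x, round(hyper(x), 4), round(binom(x), 4))
```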
Mean of the hypergeometric distribution:

\[ \mu = E(X) = \sum_{x} x\,\frac{C(M,x)\,C(N-M,n-x)}{C(N,n)} = \frac{nM}{N}\sum_{x} \frac{C(M-1,x-1)\,C(N-M,n-x)}{C(N-1,n-1)} = \frac{nM}{N}. \]

Variance: V(X) = E(X²) − (E X)².

\[
\begin{aligned}
E(X^2) &= E\big(X(X-1)\big) + E(X) = \sum_{x} x(x-1)\,\frac{C(M,x)\,C(N-M,n-x)}{C(N,n)} + E(X) \\
&= \sum_{x} \frac{M(M-1)n(n-1)\,C(M-2,x-2)\,C(N-M,n-x)}{N(N-1)\,C(N-2,n-2)} + E(X) \\
&= \frac{M(M-1)n(n-1)}{N(N-1)} + \frac{nM}{N}.
\end{aligned}
\]
Hence

\[ V(X) = \frac{M(M-1)n(n-1)}{N(N-1)} + \frac{nM}{N} - \frac{n^2M^2}{N^2} = \frac{nM}{N}\Big\{\frac{(M-1)(n-1)}{N-1} - \frac{nM}{N} + 1\Big\} = \frac{nM}{N^2(N-1)}\,(N-n)(N-M). \]
Relation between the hypergeometric and the binomial: If N, M and N − M are large compared to n, then it does not matter much whether we sample with replacement or without replacement; hence the hypergeometric distribution can be approximated by the binomial.
(F) Geometric Distribution: A random variable X is said to have a geometric distribution if it assumes only non-negative values and its probability mass function is given by

\[ P(X = x) = q^x p, \quad x = 0, 1, 2, \dots, \ 0 < p \le 1, \ q = 1 - p, \]

and zero otherwise. The moment generating function is M_X(t) = p/(1 − qe^t), with E(X) = q/p and V(X) = q/p².
Lecture-10
Continuous Type Distributions: (A) Uniform Distribution. A random variable X is said to have a uniform distribution on [a, b] if its probability density function is given by

\[ f(x) = \frac{1}{b-a} \ \text{ if } x \in [a, b], \qquad f(x) = 0 \ \text{ otherwise}, \]

where a and b are constants. The CDF of X is obtained as

\[ F(x) = \int_{-\infty}^{x} f(t)\,dt = \begin{cases} 0, & x < a, \\ \dfrac{x-a}{b-a}, & a \le x < b, \\ 1, & x \ge b. \end{cases} \]

The mean of this random variable is E(X) = ∫ₐᵇ x f(x) dx = (1/(b−a)) ∫ₐᵇ x dx = (a+b)/2. The variance is given by V(X) = (b−a)²/12. In fact we can compute the MGF of X as

\[ M_X(t) = \int_{a}^{b} e^{tx} f(x)\,dx = \frac{e^{tb} - e^{ta}}{t(b-a)}, \quad t \ne 0. \]
We can also compute E(Xᵏ) = μ′ₖ = (b^{k+1} − a^{k+1})/((k+1)(b−a)).
(B) Normal Distribution: A random variable X is said to be a normal (or Gaussian) random variable if X has the probability density function

\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\Big\{-\frac{1}{2}\Big(\frac{x-\mu}{\sigma}\Big)^2\Big\}, \quad \sigma > 0, \ -\infty < \mu < \infty, \ -\infty < x < \infty. \]

Properties: (i) μ is the mean and σ² is the variance (check!).
(ii) The constant 1/(σ√2π) makes the area under the curve f(x) equal to unity.
(iii) The density f(x) is symmetric about the point x = μ and is bell-shaped. If we take μ = 0, then it is symmetric about the y-axis.
(iv) As σ → 0, the function f(x) → 0 very fast for every x ≠ μ.
The moment generating function is obtained as

\[
\begin{aligned}
M_X(t) &= \int_{-\infty}^{\infty} e^{tx} f(x)\,dx = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{tx} \exp\{-(x-\mu)^2/2\sigma^2\}\,dx \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\{t(\mu + z\sigma)\}\exp(-z^2/2)\,dz, \qquad z = \frac{x-\mu}{\sigma} \\
&= e^{\mu t}\,\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\Big\{-\frac{1}{2}(z^2 - 2z\sigma t)\Big\}\,dz \\
&= e^{\mu t}\,\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\Big\{-\frac{1}{2}\big[(z - \sigma t)^2 - \sigma^2 t^2\big]\Big\}\,dz \\
&= e^{\mu t + t^2\sigma^2/2}\,\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\Big\{-\frac{1}{2}(z - \sigma t)^2\Big\}\,dz \\
&= e^{\mu t + t^2\sigma^2/2}\,\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\Big\{-\frac{1}{2}u^2\Big\}\,du = e^{\mu t + t^2\sigma^2/2}.
\end{aligned}
\]
The odd order central moments µ_{2k+1} = 0, and the even ones are µ_{2k} = 1·3·5···(2k − 1) σ^{2k}, k = 0, 1, 2, . . . .
Ans: The odd moments are given by

µ_{2k+1} = ∫_{−∞}^{∞} (x − µ)^{2k+1} f(x) dx
        = (1/(σ√(2π))) ∫_{−∞}^{∞} (x − µ)^{2k+1} exp{−(x − µ)²/(2σ²)} dx
        = (1/√(2π)) ∫_{−∞}^{∞} (σz)^{2k+1} exp{−z²/2} dz,  z = (x − µ)/σ,
        = (σ^{2k+1}/√(2π)) ∫_{−∞}^{∞} z^{2k+1} exp{−z²/2} dz = 0,

since the integrand z^{2k+1} e^{−z²/2} is an odd function of z.
The even order moments µ_{2k} are computed as

µ_{2k} = ∫_{−∞}^{∞} (x − µ)^{2k} f(x) dx
      = (1/√(2π)) ∫_{−∞}^{∞} (σz)^{2k} exp(−z²/2) dz
      = (σ^{2k}/√(2π)) · 2 ∫_{0}^{∞} z^{2k} exp(−z²/2) dz
      = (2σ^{2k}/√(2π)) ∫_{0}^{∞} (2t)^{k} e^{−t} (dt/√(2t)),  t = z²/2.
Continuing,

µ_{2k} = (2^k σ^{2k}/√π) ∫_{0}^{∞} e^{−t} t^{(k + 1/2) − 1} dt
      = (2^k σ^{2k}/√π) Γ(k + 1/2),

where Γ(k) = ∫_{0}^{∞} e^{−x} x^{k−1} dx.
Ex. Prove that µ_{2k} = σ²(2k − 1) µ_{2k−2}.
From this recurrence relation, we have

µ_{2k} = [(2k − 1)σ²][(2k − 3)σ²] µ_{2k−4}
      = [(2k − 1)σ²][(2k − 3)σ²][(2k − 5)σ²] µ_{2k−6}
      = [(2k − 1)σ²][(2k − 3)σ²][(2k − 5)σ²] · · · 3σ² · 1σ² · µ0
      = 1·3·5···(2k − 1) σ^{2k}.

Ex. Derive the above results using the moment generating function.
Ex. Find the skewness and kurtosis of the normal distribution with mean µ and variance σ².

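A short quadrature check of these moment formulas (a sketch, assuming SciPy; for k = 2 the formula gives µ₄ = 1·3·σ⁴ = 3σ⁴, and µ₃ = 0):

import numpy as np
from scipy import integrate

mu, sigma = 1.0, 2.0
f = lambda x: np.exp(-((x - mu) / sigma) ** 2 / 2) / (sigma * np.sqrt(2 * np.pi))

# Fourth central moment by quadrature against 3*sigma^4
m4, _ = integrate.quad(lambda x: (x - mu) ** 4 * f(x), -np.inf, np.inf)
print(m4, 3 * sigma**4)   # both 48.0

# Odd central moments vanish, e.g. mu_3
m3, _ = integrate.quad(lambda x: (x - mu) ** 3 * f(x), -np.inf, np.inf)
print(m3)                 # ~0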
Lecture-11
The CDF of a normal random variable with mean µ and variance σ² is given by

F(x) = P(X ≤ x) = (1/(σ√(2π))) ∫_{−∞}^{x} exp{−(1/2)((t − µ)/σ)²} dt.

The standardized normal random variable Z = (X − µ)/σ has the CDF, say Φ(·), given by

Φ(z) = (1/√(2π)) ∫_{−∞}^{z} exp(−t²/2) dt = ∫_{−∞}^{z} ϕ(t) dt,

where we denote ϕ(z) = (1/√(2π)) exp{−z²/2}.
The values of Φ(z) = P(Z ≤ z) are given in the table (Table A7, Appendix 5, 8th Edition, E. Kreyszig).
We can see that Φ(−z) + Φ(z) = 1, as

Φ(−z) + Φ(z) = P(Z ≤ −z) + P(Z ≤ z)
            = P(Z > z) + P(Z ≤ z), because of symmetry,
            = 1 − P(Z ≤ z) + P(Z ≤ z) = 1.

Observe that Φ(0) = 1/2.
Theorem (Relation between F(x) and Φ(z)): The distribution function F(x) of the normal distribution with any µ and σ is related to the standardized distribution function Φ(z) by the formula

F(x) = Φ((x − µ)/σ).

Proof:

F(x) = (1/(σ√(2π))) ∫_{−∞}^{x} exp{−(1/2)((t − µ)/σ)²} dt
    = (1/√(2π)) ∫_{−∞}^{(x−µ)/σ} exp{−u²/2} du,  put u = (t − µ)/σ,
    = ∫_{−∞}^{(x−µ)/σ} ϕ(u) du
    = Φ((x − µ)/σ).

Observe that Φ(∞) = 1 and Φ(−∞) = 0.
Finding Probabilities in Some Intervals: The probability that a normal random variable X with mean µ and standard deviation σ assumes any value in an interval a < X ≤ b is

P(a < X ≤ b) = F(b) − F(a) = Φ((b − µ)/σ) − Φ((a − µ)/σ).

Since X is continuous, P(a < X ≤ b) = P(a < X < b) = P(a ≤ X ≤ b) = P(a ≤ X < b).
Let X be a normal random variable with mean µ and variance σ². Then the following results are to be noted:

P(µ − σ < X ≤ µ + σ) ≈ 68%
P(µ − 2σ < X ≤ µ + 2σ) ≈ 95.5%
P(µ − 3σ < X ≤ µ + 3σ) ≈ 99.7%

Ex.1: Let X be a normal random variable with mean 0.8 and variance 4. Find (i) P(X ≤ 2.44), (ii) P(X ≥ 1), (iii) P(1 ≤ X ≤ 1.8).
Ans: (i) P(X ≤ 2.44) = F(2.44) = Φ((2.44 − 0.80)/2) = Φ(0.82) = 0.7939.
(ii) P(X ≥ 1) = 1 − P(X ≤ 1) = 1 − Φ((1 − 0.8)/2) = 1 − 0.5398 = 0.4602.
(iii) P(1 ≤ X ≤ 1.8) = Φ((1.8 − 0.8)/2) − Φ((1 − 0.8)/2) = Φ(0.5) − Φ(0.1) = 0.6915 − 0.5398 = 0.1517.
Ex.2: In a population of iron rods let the diameter X be normally distributed with mean 2 inch and standard deviation 0.008 inch. (a) What percentage of defectives can we expect if we set the tolerance limits at 2 ± 0.02 inch? (b) How should we set the tolerance limits to allow for 4% defectives?
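These Φ look-ups can be reproduced with SciPy's standard normal CDF (a sketch; norm.cdf plays the role of Φ, and the numbers are those of Ex.1):

from scipy.stats import norm

mu, sigma = 0.8, 2.0          # variance 4 => sigma = 2

print(norm.cdf((2.44 - mu) / sigma))                    # (i)  ~0.7939
print(1 - norm.cdf((1.0 - mu) / sigma))                 # (ii) ~0.4602
print(norm.cdf(0.5) - norm.cdf(0.1))                    # (iii) ~0.1517
# Equivalently, let SciPy standardize internally:
print(norm.cdf(1.8, mu, sigma) - norm.cdf(1.0, mu, sigma))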
Ex.: Let X be a normal random variable with mean 30 and standard deviation 5. Find the probabilities that (i) 26 ≤ X ≤ 40, (ii) X ≥ 45, (iii) |X − 30| > 5.
Ans: (i) Here µ = 30 and σ = 5. Now P(26 ≤ X ≤ 40) = P((26 − 30)/5 ≤ (X − 30)/5 ≤ (40 − 30)/5) = P(−0.8 ≤ Z ≤ 2) = Φ(2) − Φ(−0.8) = Φ(2) − (1 − Φ(0.8)) = 0.7653.
(ii) P(X ≥ 45) = 1 − P(X < 45) = 1 − F(45) = 1 − Φ((45 − 30)/5) = 1 − Φ(3) = 0.00135.
(iii) P(|X − 30| > 5) = 1 − P(|X − 30| < 5) = 1 − P(−5 < X − 30 < 5) = 1 − P(25 < X < 35) = 1 − F(35) + F(25) = 1 − Φ((35 − 30)/5) + Φ((25 − 30)/5) = 1 − Φ(1) + Φ(−1) = 1 − 0.6826 = 0.3174.
Example: The masses of 300 students are normally distributed with mean 68 kg and standard deviation 3 kg. How many students have mass (i) greater than 72 kg, (ii) less than or equal to 64 kg, (iii) between 65 and 71 kg inclusive?
Ans: From the given data we have N = 300, µ = 68, and σ = 3. Let the random variable X represent the mass of a student. The standard normal variable is Z = (X − µ)/σ = (x − 68)/3.
(i) P(X > 72) = P((X − 68)/3 > (72 − 68)/3) = 1 − P(Z ≤ 1.33) = 1 − Φ(1.33) = 1 − 0.9082 = 0.0918. Hence the number of students with mass greater than 72 kg is N × P(X > 72) = 300 × 0.0918 = 27.54 ≈ 28.
(ii) P(X ≤ 64) = F(64) = Φ((64 − 68)/3) = Φ(−1.33) = 1 − Φ(1.33) = 1 − 0.9082 = 0.0918. Hence the number of students with mass less than or equal to 64 kg is N × P(X ≤ 64) = 300 × 0.0918 = 27.54 ≈ 28.
(iii) P(65 ≤ X ≤ 71) = F(71) − F(65) = Φ((71 − 68)/3) − Φ((65 − 68)/3) = Φ(1) − Φ(−1) = 0.8413 − (1 − Φ(1)) = 0.8413 − 1 + 0.8413 = 0.6826. Now the number of students with mass between 65 kg and 71 kg is N × P(65 ≤ X ≤ 71) = 300 × 0.6826 = 204.78 ≈ 205.
Example: In a distribution known to be exactly normal, 7% of the items are under 35 and 89% are under 63. What are the mean and standard deviation of the distribution?
Ans: From the given data P(X ≤ 35) = 7/100 = 0.07 and P(X ≤ 63) = 89/100 = 0.89. From the first, we get

P(X ≤ 35) = 0.07
⟹ P((X − µ)/σ ≤ (35 − µ)/σ) = 0.07
⟹ P(Z ≤ (35 − µ)/σ) = 0.07
⟹ Φ((35 − µ)/σ) = 0.07.   (1)

Again from the second,

P(X ≤ 63) = 0.89
⟹ P((X − µ)/σ ≤ (63 − µ)/σ) = 0.89
⟹ P(Z ≤ (63 − µ)/σ) = 0.89
⟹ Φ((63 − µ)/σ) = 0.89.   (2)
From equation (1), we get (35 − µ)/σ = −1.48, which implies µ − 1.48σ = 35. (3)
Again from equation (2), we have (63 − µ)/σ = 1.23, which implies µ + 1.23σ = 63. (4)
Solving these two equations for µ and σ we get µ = 50.3 and σ = 10.3.
Ex.1: The marks obtained in statistics in a certain examination are found to be normally distributed. If 15% of the students secured marks greater than 60, and 40% secured less than 30, find the mean and standard deviation.
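The z-values −1.48 and 1.23 come from inverting Φ; a sketch that automates the whole computation with norm.ppf and a 2×2 linear solve:

import numpy as np
from scipy.stats import norm

z1 = norm.ppf(0.07)   # ~ -1.476
z2 = norm.ppf(0.89)   # ~  1.227

# mu + z1*sigma = 35 and mu + z2*sigma = 63, solved as a linear system
A = np.array([[1.0, z1], [1.0, z2]])
b = np.array([35.0, 63.0])
mu, sigma = np.linalg.solve(A, b)
print(mu, sigma)      # ~50.3 and ~10.4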
Binomial Distribution Approximated by the Normal Distribution:

P(X = k) = C(n, k) p^k q^{n−k},  k = 0, 1, 2, . . . , n.

If n becomes large, the binomial coefficients and powers become very inconvenient. In this case the normal distribution provides a good approximation to the binomial distribution, given by the following result.
Theorem (Limit Theorem of De Moivre and Laplace): For large n,

f(x) = P(X = x) ∼ f*(x),

where f(x) is the probability mass function of the binomial distribution and

f*(x) = (1/(√(2π) √(npq))) exp{−z²/2},  z = (x − np)/√(npq),

is the density of the normal distribution with mean µ = np and variance σ² = npq. The symbol ∼ is read as "asymptotically equal"; that is, the ratio of the two sides approaches 1 as n approaches ∞.
Moreover, for any non-negative integers a and b with a < b we can compute the probability using the normal approximation as

P(a ≤ X ≤ b) = Σ_{x=a}^{b} C(n, x) p^x q^{n−x} ∼ Φ(β) − Φ(α),

where

α = (a − np − 0.5)/√(npq),  β = (b − np + 0.5)/√(npq).

The term 0.5 in α and β is a correction (the continuity correction) caused by the change from a discrete to a continuous distribution.
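A quick comparison of the exact binomial probability with the corrected normal approximation (a sketch; the values n, p, a, b are arbitrary choices):

import numpy as np
from scipy.stats import binom, norm

n, p = 100, 0.4
q = 1 - p
a, b = 35, 45

exact = binom.cdf(b, n, p) - binom.cdf(a - 1, n, p)   # sum of pmf over a..b

alpha = (a - n * p - 0.5) / np.sqrt(n * p * q)
beta = (b - n * p + 0.5) / np.sqrt(n * p * q)
approx = norm.cdf(beta) - norm.cdf(alpha)

print(exact, approx)   # close for large n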
Lecture-12
(C) Gamma Distribution: The integral Γ(α) = ∫_{0}^{∞} e^{−x} x^{α−1} dx converges or diverges according as α > 0 or α ≤ 0. For α > 0 the integral above is called the gamma function. In particular, if α = 1, Γ(1) = 1. If α > 1, we have Γ(α) = (α − 1)Γ(α − 1). If α = n is a positive integer then Γ(n) = (n − 1)!.
We can show that Γ(1/2) = √π.
The random variable X is said to have a gamma distribution if its probability density function is given by

f(x) = (1/(Γ(α)β^α)) x^{α−1} e^{−x/β},  0 < x < ∞,
    = 0, otherwise.

Here α > 0, β > 0 are constants. We denote X ∼ G(α, β).
The moment generating function of X is computed as

MX(t) = E e^{tX}
     = (1/(Γ(α)β^α)) ∫_{0}^{∞} e^{x(t − 1/β)} x^{α−1} dx
     = (1/(1 − βt))^{α} ∫_{0}^{∞} (y^{α−1} e^{−y}/Γ(α)) dy,  y = x(1/β − t),
     = (1 − βt)^{−α},  t < 1/β.

EX = M′(t)|_{t=0} = αβ, and EX² = M″(t)|_{t=0} = α(α + 1)β². Hence the variance is V(X) = EX² − (EX)² = αβ².
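SciPy's gamma distribution uses the same shape-scale convention (a = α, scale = β), so these moment and MGF formulas can be checked directly (a sketch; α = 3, β = 2 are arbitrary):

import numpy as np
from scipy.stats import gamma

alpha, beta = 3.0, 2.0
X = gamma(a=alpha, scale=beta)

print(X.mean(), alpha * beta)        # both 6.0
print(X.var(), alpha * beta**2)      # both 12.0

# MGF at t = 0.1 (note t < 1/beta): E[e^{tX}] vs (1 - beta*t)^{-alpha}
t = 0.1
print(X.expect(lambda x: np.exp(t * x)), (1 - beta * t) ** -alpha)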
(D) Exponential Distribution: In the special case of the gamma distribution when α = 1, we get the exponential distribution. The probability density function is given by

f(x) = (1/β) e^{−x/β},  x > 0,
    = 0, otherwise.

EX = β, V(X) = β², and in fact EX^n = n! β^n.
Ex.1: Find the skewness and kurtosis of the exponential random variable X with parameter β.
Lecture-13
Functions of Several (Two) Random Variables: The distributions of two or more random variables are of interest because:
(i) There are several practical situations in which we observe more than one random variable. For example, (a) carbon content X and hardness Y of steel, (b) blood pressure X, blood sugar Y and age of a person, (c) height X and weight Y of a person, (d) amount of fertilizer X and the yield Y of a certain crop (say corn or paddy), and so on.
(ii) They are needed for the mathematical justification of statistical methods. We will mainly focus on two dimensional random variables, though the results discussed here can be extended to more than two random variables easily.
The two dimensional random variable will be denoted as (X, Y), and its realization or observed value, say (x, y), can be thought of as a point in the xy-plane. The cumulative distribution function of a two dimensional random variable is

F(x, y) = P(X ≤ x, Y ≤ y),  (x, y) ∈ R²,

and is read as the probability that the random variable X assumes values less than or equal to x and the random variable Y assumes values less than or equal to y in the same trial. In analogy to the formula P(a < X ≤ b) = F(b) − F(a) for a one dimensional random variable, in the two dimensional case we have the formula

P(a1 < X ≤ b1, a2 < Y ≤ b2) = F(b1, b2) − F(a1, b2) − F(b1, a2) + F(a1, a2).

Next, we will discuss the discrete and continuous cases, as we did earlier for one dimensional random variables.
Discrete type two-dimensional RV: The two dimensional random variable (X, Y) and its distribution are said to be discrete if (X, Y) assumes only finitely many or at most countably infinitely many pairs of values (x1, y1), (x2, y2), . . . with positive probabilities, whereas the probability for any domain containing none of those values of (X, Y) is zero. More systematically, let (xi, yj) be any of those pairs and let P(X = xi, Y = yj) = pij, where pij may be zero for some pairs of subscripts i, j. These pij's constitute the probability mass function of the random variable (X, Y). Also we have

Σ_i Σ_j pij = Σ_i Σ_j P(X = xi, Y = yj) = 1.

The CDF of the discrete type random variable (X, Y) is obtained as

F(x, y) = P(X ≤ x, Y ≤ y) = Σ_{xi ≤ x} Σ_{yj ≤ y} P(X = xi, Y = yj).

Further, for any rectangular region in the xy-plane,

P(a < X < b, c < Y < d) = Σ_{a < xi < b} Σ_{c < yj < d} P(X = xi, Y = yj).
Ex.1: Suppose we simultaneously toss a dime and a nickel. Let X be the number of heads the dime turns up and Y the number of heads the nickel turns up. Here X and Y can each take the values 0 or 1, so the possible outcomes are (X = 0, Y = 0), (X = 0, Y = 1), (X = 1, Y = 0) and (X = 1, Y = 1). The probability of each case is 1/4. That is, the probability mass function is P(X = 0, Y = 0) = p11 = 1/4, P(X = 0, Y = 1) = p12 = 1/4, P(X = 1, Y = 0) = p21 = 1/4, P(X = 1, Y = 1) = p22 = 1/4, and pij = 0 otherwise.
Continuous type two dimensional random variable: The two dimensional random variable (X, Y) and its distribution are said to be continuous if the corresponding cumulative distribution function F(x, y) can be obtained by the double integral

F(x, y) = P(X ≤ x, Y ≤ y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f(u, v) du dv,

whose integrand f(·, ·) ≥ 0 everywhere is called the joint probability density function (or simply the density) of the random variable (X, Y); it is continuous except possibly on finitely many curves. Here we also have

∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1.

We can obtain the probability that the random variable assumes values in any rectangular region as

P(a1 < X ≤ b1, a2 < Y ≤ b2) = ∫_{a1}^{b1} ∫_{a2}^{b2} f(x, y) dx dy.
Ex.1: Let R be the rectangular region a1 < x ≤ b1, a2 < y ≤ b2. The density (joint density) of the random variable (X, Y) which assumes values in this region is

f(x, y) = 1/k, if (x, y) ∈ R,

and f(x, y) = 0 otherwise. Find the value of k.
Ans: The value of k can be obtained from

∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1,

which implies

(1/k) ∫_{a1}^{b1} ∫_{a2}^{b2} dx dy = 1.

After integrating we get k = (b1 − a1)(b2 − a2).
Lecture-14
Marginal Distributions: Let (X, Y) be a two dimensional random variable having cumulative distribution function F(x, y). It is sometimes essential to find the distribution of one of the random variables alone, say only X or only Y. The distribution of X or of Y is known as a marginal distribution.
Marginal distribution for a discrete type random variable: Let (X, Y) be a discrete type random variable having CDF F(x, y) and joint probability mass function f(x, y) = P(X = xi, Y = yj) = pij. The marginal probability mass function of X is obtained as

f1(x) = pi· = P(X = xi, Y arbitrary) = Σ_j P(X = xi, Y = yj).

The marginal distribution or CDF of X is obtained as

F1(x) = P(X ≤ x, Y arbitrary) = Σ_{xi ≤ x} pi· = Σ_{x* ≤ x} f1(x*).

Similarly we can obtain the marginal probability mass function of Y as

f2(y) = p·j = P(X arbitrary, Y = yj) = Σ_i P(X = xi, Y = yj).

The marginal distribution or CDF of Y is obtained as

F2(y) = P(X arbitrary, Y ≤ y) = Σ_{yj ≤ y} p·j = Σ_{y* ≤ y} f2(y*).
Marginal distribution of a continuous type random variable: Let (X, Y) be a continuous type two dimensional random variable having joint probability density function f(x, y) and joint distribution function F(x, y). The marginal probability density function of X is obtained as

f1(x) = ∫_{−∞}^{∞} f(x, y) dy,

and the marginal cumulative distribution function of X as

F1(x) = ∫_{−∞}^{x} f1(u) du.

In a similar manner we can define the marginal probability density function of Y as

f2(y) = ∫_{−∞}^{∞} f(x, y) dx,

and the marginal cumulative distribution function of Y as

F2(y) = ∫_{−∞}^{y} f2(v) dv.
Important: The two dimensional discrete random variable (X, Y) with joint probability mass function pij = P(X = xi, Y = yj) can be represented in a matrix (table) form as follows.

x/y        y1   y2   y3   y4   P(X = xi)
x1         p11  p12  p13  p14  p1·
x2         p21  p22  p23  p24  p2·
x3         p31  p32  p33  p34  p3·
x4         p41  p42  p43  p44  p4·
P(Y = yj)  p·1  p·2  p·3  p·4  ΣΣ pij = 1
Example 1: In drawing 3 cards with replacement from a bridge deck, consider (X, Y), where X is the number of queens and Y the number of kings or aces drawn. The deck has 52 cards, including 4 queens, 4 kings, and 4 aces. Hence in a single trial a queen has probability 4/52 = 1/13 and a king or ace 8/52 = 2/13. Thus the joint pmf of (X, Y) is

P(X = x, Y = y) = (3!/(x! y! (3 − x − y)!)) (1/13)^x (2/13)^y (10/13)^{3−x−y},  x + y ≤ 3.
For Example 1 the joint pmf and the marginals can be tabulated as follows (all entries have denominator 2197 = 13³).

x/y       0          1         2         3        P(X = x)
0         1000/2197  600/2197  120/2197  8/2197   1728/2197
1         300/2197   120/2197  12/2197   0        432/2197
2         30/2197    6/2197    0         0        36/2197
3         1/2197     0         0         0        1/2197
P(Y = y)  1331/2197  726/2197  132/2197  8/2197   1
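The table can be generated from the trinomial formula; a short sketch using Python's fractions module for exact arithmetic:

from fractions import Fraction
from math import factorial

pq, pk, pr = Fraction(1, 13), Fraction(2, 13), Fraction(10, 13)

def pmf(x: int, y: int) -> Fraction:
    """Joint pmf of (X, Y) for 3 draws with replacement."""
    if x < 0 or y < 0 or x + y > 3:
        return Fraction(0)
    coef = factorial(3) // (factorial(x) * factorial(y) * factorial(3 - x - y))
    return coef * pq**x * pk**y * pr**(3 - x - y)

for x in range(4):
    print([pmf(x, y) for y in range(4)], sum(pmf(x, y) for y in range(4)))
# First row prints 1000/2197, 600/2197, 120/2197, 8/2197; row sum 1728/2197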
Independent Random Variables: Let (X, Y) be a two dimensional random variable having joint CDF F(x, y). The two random variables are said to be independent if

F(x, y) = F1(x) F2(y)

for all (x, y) ∈ R²; otherwise they are dependent. Using the corresponding joint probability density or joint probability mass function, we can equivalently say that the random variables X and Y are independent if and only if

f(x, y) = f1(x) f2(y)

for all (x, y) ∈ R².
Functions of Random Variables: Let (X, Y) be a two dimensional random variable having joint CDF F(x, y), and let g(x, y) be a continuous function defined for all (x, y). Then W = g(X, Y) is a random variable.
If (X, Y) is a discrete random variable, we get the probability mass function of W as

P(W = w) = Σ Σ_{g(x,y) = w} P(X = x, Y = y),

and the CDF as

F(w) = P(W ≤ w) = Σ Σ_{g(x,y) ≤ w} P(X = x, Y = y).

Similarly, if (X, Y) is continuous,

F(w) = P(W ≤ w) = ∫∫_{g(x,y) ≤ w} f(x, y) dx dy,

where for each w we integrate over the region g ≤ w.
Expectation or Mean of Random Variables: Let (X, Y) be a two dimensional random variable having joint CDF F(x, y). We define the mean of a function g(X, Y) of the random variable as

E(g(X, Y)) = Σ_{x,y} g(x, y) P(X = x, Y = y), if (X, Y) is discrete,
          = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f(x, y) dx dy, if (X, Y) is continuous.

Here we assume that the double series converges and that the double integral of |g(x, y)| f(x, y) is finite.
Further, for constants a, b and continuous functions g, h, we have

E(a g(X, Y) + b h(X, Y)) = a E(g(X, Y)) + b E(h(X, Y)),

and thus E(X + Y) = EX + EY.
Addition of Means: The mean of a sum of random variables equals the sum of the means, that is,

E(X1 + X2 + · · · + Xn) = EX1 + EX2 + · · · + EXn.
Multiplication of Means: The mean of the product of independent random variables equals the product of the means, that is,

E(X1 X2 . . . Xn) = E(X1) E(X2) . . . E(Xn).

In particular, for independent X and Y, E(XY) = E(X)E(Y).
Addition of Variances: Let W = X + Y, with V(X) = σ1², V(Y) = σ2², E(X) = µ1, E(Y) = µ2. Then

E(W²) = E[(X + Y)²] = E(X² + Y² + 2XY) = EX² + EY² + 2E(XY),
[E(W)]² = [EX + EY]² = (EX)² + (EY)² + 2E(X)E(Y).
Hence the variance of W is

V(W) = EW² − (EW)²
    = EX² − (EX)² + EY² − (EY)² + 2[E(XY) − E(X)E(Y)]
    = σ1² + σ2² + 2σ12.

The term σ12 = E(XY) − E(X)E(Y) is known as the covariance of X and Y.
If X and Y are independent then E(XY) = E(X)E(Y). Hence σ12 = 0 and V(W) = σ1² + σ2². We record this result below.
(Addition of Variances:) The variance of a sum of independent random variables equals the sum of the variances of these variables.
Lecture-15
Conditional Distributions: Let (X, Y) be a two dimensional random variable having cumulative distribution function F(x, y). First let (X, Y) be of discrete type. If P(Y = y) > 0, we define the conditional probability mass function of X given Y as

P(X = x | Y = y) = P(X = x, Y = y)/P(Y = y)

for fixed y. Similarly, we define the conditional probability mass function of Y given X as

P(Y = y | X = x) = P(X = x, Y = y)/P(X = x),  P(X = x) > 0,

for fixed x. Now let (X, Y) be of continuous type. We define the conditional probability density function of X given Y as

f_{X|Y}(x|y) = f(x, y)/f2(y),

and similarly the conditional probability density of Y given X as

f_{Y|X}(y|x) = f(x, y)/f1(x).
Ex.1: A fair coin is tossed three times. Let X = number of heads in the three tosses and Y = difference, in absolute value, between the number of heads and the number of tails. The joint PMF of (X, Y) is given in the following table.

y/x       0    1    2    3    P(Y = y)
1         0    3/8  3/8  0    6/8
3         1/8  0    0    1/8  2/8
P(X = x)  1/8  3/8  3/8  1/8  1

The conditional PMF of X given Y is

P(X = x | Y = 1) = 0, x = 0, 3,
                = 1/2, x = 1, 2.

P(X = x | Y = 3) = 1/2, x = 0, 3,
                = 0, x = 1, 2.
Similarly, the conditional PMF of Y given X is

P(Y = y | X = 0) = 0, y = 1; = 1, y = 3.
P(Y = y | X = 1) = 1, y = 1; = 0, y = 3.
P(Y = y | X = 2) = 1, y = 1; = 0, y = 3.
P(Y = y | X = 3) = 0, y = 1; = 1, y = 3.
Ex.2: Let (X, Y) be jointly distributed with probability density function

f(x, y) = 2, 0 < x < y < 1,
       = 0, otherwise.

The marginal densities of X and Y are respectively given by

f1(x) = ∫_{x}^{1} 2 dy = 2 − 2x, 0 < x < 1,
     = 0, otherwise,

and

f2(y) = ∫_{0}^{y} 2 dx = 2y, 0 < y < 1,
     = 0, otherwise.
Calculating Conditional Probability Densities: The conditional density of Y given X is

f_{Y|X}(y|x) = f(x, y)/f1(x) = 1/(1 − x),  x < y < 1.

Similarly, the conditional PDF of X given Y is given by

f_{X|Y}(x|y) = 1/y,  0 < x < y.

For instance,

P(Y ≥ 1/2 | X = 1/2) = ∫_{1/2}^{1} (1/(1 − 1/2)) dy = 1,

and

P(X ≥ 1/3 | Y = 2/3) = ∫_{1/3}^{2/3} (1/(2/3)) dx = 1/2.
Lecture-16
Distribution of the sample mean in the case of the normal distribution: Let X1, X2, . . . , Xn be an independent and identically distributed random sample of size n from a normal distribution having mean µ and variance σ². The sample mean is X̄ = (1/n) Σ_{i=1}^{n} Xi; we want to derive its distribution. It is easy to check that E(X̄) = µ and V(X̄) = σ²/n, so the standardized random variable Z = (X̄ − µ)/(σ/√n) is a standard normal random variable with mean 0 and variance 1.
To check that X̄ is a normal random variable with mean µ and variance σ²/n, we need the following simple result.
A linear combination of independent normal variables is also a normal variable: Let Xi, i = 1, 2, . . . , n, be n independent normal random variables with means µi and variances σi². Then the MGF of Xi is

M_{Xi}(t) = exp{µi t + t²σi²/2}.

The MGF of the linear combination Σ_{i=1}^{n} ai Xi is

M_{Σ ai Xi}(t) = E[e^{t Σ_{i=1}^{n} ai Xi}]
             = E[Π_{i=1}^{n} e^{t ai Xi}]
             = Π_{i=1}^{n} E e^{t ai Xi}, by independence.
Continuing,

M_{Σ ai Xi}(t) = Π_{i=1}^{n} M_{Xi}(ai t)
             = exp[(Σ_{i=1}^{n} ai µi) t + t² (Σ_{i=1}^{n} ai² σi²)/2].

This is the MGF of a normal random variable with mean Σ_{i=1}^{n} ai µi and variance Σ_{i=1}^{n} ai² σi². Hence, by the uniqueness of the MGF, the random variable Σ_{i=1}^{n} ai Xi ∼ N(Σ_{i=1}^{n} ai µi, Σ_{i=1}^{n} ai² σi²).
Take ai = 1/n, µi = µ, σi² = σ², i = 1, 2, . . . , n. We get X̄ ∼ N(µ, σ²/n).
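A small simulation illustrating X̄ ∼ N(µ, σ²/n) (a sketch; the parameter values are arbitrary):

import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 10.0, 3.0, 25, 100_000

# Draw `reps` samples of size n and compute each sample mean
xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

print(xbar.mean(), mu)                 # ~10.0
print(xbar.var(), sigma**2 / n)        # ~0.36 = 9/25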
Distribution of the sample variance of a normal random variable (µ known): This result can also be proved by the MGF. The random variable

χ² = Σ_{i=1}^{n} ((Xi − µi)/σi)² = Σ_{i=1}^{n} Ui²,  where Ui = (Xi − µi)/σi ∼ N(0, 1),

has the MGF

M_{χ²}(t) = 1/(1 − 2t)^{n/2}.

This is the MGF of a gamma random variable with shape α = n/2 and scale β = 2, which is the chi-square distribution with n degrees of freedom.
Consider S² = (1/(n − 1)) Σ_{i=1}^{n} (Xi − X̄)². The random variable Σ_{i=1}^{n} ((Xi − X̄)/σ)² = (n − 1)S²/σ² is a χ² random variable with n − 1 degrees of freedom.
Lecture-17
Estimation of Parameters Associated with a Distribution: A distribution has some parameters associated with it, which are unknown. Using sample values, it is necessary to get an approximate or estimated value of these unknown parameters. The process of estimating the parameters is known as estimation. When the approximation is a single real number, it is known as point estimation. Later we will study interval estimation, where we obtain an interval that contains the parameter with some prescribed probability. First we discuss point estimation. In this course we will learn two important methods of point estimation: (A) the method of moments and (B) the method of maximum likelihood.
Method of Moments: Suppose x1, x2, . . . , xn are n observations from a distribution having probability mass function or probability density function f(x|θ), where θ = (θ1, θ2, . . . , θk). We compute the population moments µ′_r = E(X^r) for r = 1, 2, . . . , k, and also the sample moments m′_r = (1/n) Σ_{i=1}^{n} xi^r.
Equating the sample moments to the population moments we get k equations. Solving these k simultaneous equations, we get the solutions for (θ1, θ2, . . . , θk) as (θ̂1, θ̂2, . . . , θ̂k). Here each θ̂i, i = 1, 2, . . . , k, is a function of the sample (x1, x2, . . . , xn).
Ex.1: Find the method of moments estimators for n and p in the binomial distribution B(n, p).
Sol: Let x1, x2, . . . , xN be N observations from the binomial distribution. There are two unknown parameters, θ1 = n and θ2 = p, so we use two moments (or convenient variants of them). We know EX = np and V(X) = np(1 − p). Further, the first two sample moments are

(1/N) Σ_{i=1}^{N} xi = x̄  and  (1/N) Σ_{i=1}^{N} xi².
To get the method of moments estimators, we consider the following two equations simultaneously:

np = (1/N) Σ_{i=1}^{N} xi,  np(1 − p) + n²p² = (1/N) Σ_{i=1}^{N} xi².

Solving these two equations for p and n simultaneously gives

p = x̄/n  and  n = (x̄)²/(x̄ + (x̄)² − (1/N) Σ_{i=1}^{N} xi²).

These are the method of moments estimators of n and p respectively.
Ex.2: Find the method of moments estimators of µ and σ² in N(µ, σ²).
Sol: We know the population mean and variance of the normal distribution are EX = µ and V(X) = σ². Equating them with the sample mean and variance we have µ = x̄ and σ² = s². Hence the method of moments estimator of µ is X̄ and that of σ² is S² = (1/(n − 1)) Σ_{i=1}^{n} (Xi − X̄)².
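A sketch of these two estimators in code, applied to hypothetical count data (the formulas are exactly the ones derived in Ex.1):

import numpy as np

x = np.array([7, 5, 6, 9, 4, 6, 8, 5])   # hypothetical binomial counts

m1 = x.mean()                # first sample moment
m2 = (x**2).mean()           # second sample moment

n_hat = m1**2 / (m1 + m1**2 - m2)   # note: need not come out an integer
p_hat = m1 / n_hat
print(n_hat, p_hat)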
Maximum Likelihood Estimation (MLE): Let X be a random variable (continuous or discrete) having CDF F(x|θ), where θ = (θ1, θ2, . . . , θk), and pdf/pmf f(x|θ). Below we describe the algorithm to get the MLEs of θ1, θ2, . . . , θk.
Step-1 (Construction of the likelihood function): Let x1, x2, . . . , xn be n observations taken from X ∼ f(x|θ), so that xi ∼ f(xi|θ) for each i = 1, 2, . . . , n. Since all the xi's are independent, the joint density function is given by

l(θ) = Π_{i=1}^{n} f(xi|θ).

This l(θ) is known as the likelihood function for θ.
Step-2 (Construction of the log-likelihood function): If needed, calculate the log-likelihood function as

L = log l = Σ_{i=1}^{n} log f(xi|θ).
Step-3 (Normal equations): In order to maximize the likelihood/log-likelihood function, differentiate it with respect to θ1, θ2, . . . , θk and equate to zero. That is,

∂L/∂θ1 = 0,  ∂L/∂θ2 = 0,  . . . ,  ∂L/∂θk = 0.

Then solve the above system of k equations in k unknowns to get the solutions for θ1, θ2, . . . , θk as θ̂1, θ̂2, . . . , θ̂k respectively.
Note: If θ = θ (a single parameter), then instead of partial derivatives we consider the ordinary derivative and solve for the value of θ.
Ex.1: Find the maximum likelihood estimators of µ and σ² in the normal distribution N(µ, σ²).
Sol: Step-1 (likelihood function): Let x1, x2, . . . , xn be n observations taken from X ∼ N(µ, σ²), so that each has density

f(xi) = (1/(σ√(2π))) e^{−(1/2)((xi − µ)/σ)²}.

The joint density of x1, x2, . . . , xn is given by

l(µ, σ²) = Π_{i=1}^{n} f(xi|µ, σ²) = (1/(σ^n (2π)^{n/2})) exp{−(1/2) Σ_{i=1}^{n} ((xi − µ)/σ)²}.
Step-2 (log-likelihood function): The log-likelihood function is given by

L(µ, σ²) = log l(µ, σ²) = −(n/2) log σ² − (n/2) log 2π − (1/(2σ²)) Σ_{i=1}^{n} (xi − µ)².

Step-3 (Normal equations): Differentiating with respect to µ and σ² and equating to zero we get the two equations

∂L/∂µ = (1/σ²) Σ_{i=1}^{n} (xi − µ) = 0,
∂L/∂σ² = −n/(2σ²) + (1/(2σ⁴)) Σ_{i=1}^{n} (xi − µ)² = 0.

Now solving for µ and σ² we get

µ̂ = x̄,  σ̂² = (1/n) Σ_{i=1}^{n} (xi − x̄)² = ((n − 1)/n) s².

These µ̂ and σ̂² are the MLEs of µ and σ² respectively.
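The closed forms can be checked against a numerical maximizer of the log-likelihood (a sketch using scipy.optimize.minimize on simulated data):

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
x = rng.normal(5.0, 2.0, size=200)

def neg_log_lik(params):
    mu, log_sigma2 = params        # optimize log(sigma^2) to keep it positive
    s2 = np.exp(log_sigma2)
    n = x.size
    return 0.5 * n * np.log(s2) + ((x - mu) ** 2).sum() / (2 * s2)

res = minimize(neg_log_lik, x0=[0.0, 0.0])
print(res.x[0], x.mean())                             # mu-hat ~ sample mean
print(np.exp(res.x[1]), ((x - x.mean()) ** 2).mean()) # sigma^2-hat ~ (1/n) sum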
Ex.3: Find the maximum likelihood estimator of λ in the Poisson distribution.
Sol: The probability mass function of a Poisson random variable is given by

P(X = x) = e^{−λ} λ^x / x!,  x = 0, 1, 2, . . . .

Step-1: Let x1, x2, . . . , xn be n observations from the Poisson distribution with the above probability mass function. The likelihood function is given by

l(λ, x) = Π_{i=1}^{n} f(xi) = e^{−nλ} λ^{Σ_{i=1}^{n} xi} / Π_{i=1}^{n} xi!.

Step-2: The log-likelihood function is given by

L(λ, x) = −nλ + (Σ_{i=1}^{n} xi) log λ − Σ_{i=1}^{n} log xi!.
Step-3: The normal equation is obtained as

dL/dλ = −n + (1/λ) Σ_{i=1}^{n} xi = 0.

Solving for λ we get

λ̂ = (1/n) Σ_{i=1}^{n} xi = x̄.

Hence the maximum likelihood estimator of λ is given by X̄.
Ex.4: Let X be a binomial random variable with parameters p and n. Find the maximum likelihood estimator of p when n is known.
Ex.5: Find the maximum likelihood estimate of θ in f(x, θ) = (1 + θ)x^θ, 0 < x < 1, based on an independent sample of size n.
Lecture-18
Confidence Intervals for Parameters of the Normal Distribution: Confidence intervals for an unknown parameter θ of a distribution are intervals of the form θ1 ≤ θ ≤ θ2 that contain θ with some prescribed probability ν, which can be chosen as 90%, 95%, 99%, etc.
It is natural to want an interval whose length is as small as possible while the value of ν is as high as possible. However, it is not possible to achieve both at the same time. So we first fix the value of ν and then compute an interval whose length depends on ν.
The values θ1 and θ2 are called the lower and upper limits of the interval; they depend on the sample values x1, x2, . . . , xn. That is, we are interested in an interval of the form [θ1, θ2] such that

P(θ ∈ [θ1(x1, x2, . . . , xn), θ2(x1, x2, . . . , xn)]) = ν

for a fixed ν. We write

CONF_ν {θ1 ≤ θ ≤ θ2}.
Confidence interval for µ in N(µ, σ²) when σ² is known: The following steps can be used.
Step-1: Choose a confidence level ν, which can be 90%, 95%, 99%, etc.
Step-2: Determine the corresponding value of c. The following table may be useful.

ν   0.90   0.95   0.99   0.999
c   1.645  1.960  2.576  3.291

Step-3: Compute the sample mean x̄ using the sample values x1, x2, . . . , xn.
Step-4: Compute the value k = cσ/√n. The confidence interval for µ is

CONF_ν {x̄ − k ≤ µ ≤ x̄ + k}.
Understanding Step-2 and Step-4: We need the following results.
Theorem: Let X1, X2, . . . , Xn be independent normal random variables, each with mean µ and variance σ². Then:
(i) The sum X1 + X2 + · · · + Xn is normal with mean nµ and variance nσ².
(ii) The random variable X̄ = (1/n) Σ_{i=1}^{n} Xi is a normal random variable with mean µ and variance σ²/n.
(iii) The random variable Z = (X̄ − µ)/(σ/√n) ∼ N(0, 1).
To derive Step-2, first choose the value of ν and then determine the corresponding value of c. Since the distribution of Z is symmetric and contains the information regarding µ, consider

P(−c ≤ Z ≤ c) = P(−c ≤ (X̄ − µ)/(σ/√n) ≤ c) = Φ(c) − Φ(−c) = 2Φ(c) − 1 = ν (given).

The value of c can be obtained from Table A8, Appendix 5, using the normal table.
To derive Step-4, consider

P(−c ≤ Z ≤ c) = P(c ≥ −Z ≥ −c)
            = P(c ≥ (µ − X̄)/(σ/√n) ≥ −c)
            = P(cσ/√n ≥ µ − X̄ ≥ −cσ/√n)
            = P(X̄ + k ≥ µ ≥ X̄ − k)
            = P(X̄ − k ≤ µ ≤ X̄ + k) = ν.

Hence we can write

CONF_ν {X̄ − k ≤ µ ≤ X̄ + k}.
Ex.1: Determine a 95% confidence interval for the mean of a normal distribution with variance σ² = 9, using a sample of size n = 100 with sample mean x̄ = 5.
Sol:
Step-1: ν = 0.95 is given.
Step-2: From Table A8, for ν = 0.95 we get c = 1.960.
Step-3: It is given that x̄ = 5, σ² = 9, and n = 100.
Step-4: Compute the value k = 1.960 × 3/√100 = 0.588. Hence x̄ − k = 4.412 and x̄ + k = 5.588.
Step-5: The confidence interval for µ is given by

CONF_0.95 {4.412 ≤ µ ≤ 5.588}.
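The same computation packaged as a short function (a sketch; norm.ppf supplies c for any ν, so the Table A8 look-up is not needed):

from math import sqrt
from scipy.stats import norm

def z_confidence_interval(xbar, sigma, n, nu=0.95):
    """CI for the mean of a normal population with known sigma."""
    c = norm.ppf((1 + nu) / 2)       # e.g. 1.960 for nu = 0.95
    k = c * sigma / sqrt(n)
    return xbar - k, xbar + k

print(z_confidence_interval(5.0, 3.0, 100))   # ~(4.412, 5.588)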
Ex.2: Determine a 99% confidence interval for the mean of a normal population with standard deviation 2.5, using the sample 30.8, 30.0, 29.9, 30.1, 31.7, 34.0.
Ex.3: Determine a 95% confidence interval for the mean of a normal population with standard deviation 1.2, using the sample 10, 10, 8, 12, 10, 11, 10, 11.
Ex.4: Determine a 95% confidence interval for the mean of a normal population with variance 16, using a sample of size 200 with mean x̄ = 74.81.
What sample size would be needed to produce a 95% confidence interval in Ex.2 of length (a) 2σ, (b) σ?
Lecture-19
Confidence interval for µ in N(µ, σ²) when σ² is unknown: The following steps can be used.
Step-1: Choose a confidence level ν, such as ν = 0.90, 0.95, 0.99, etc.
Step-2: Determine the solution c of the equation

F(c) = (1 + ν)/2

from the table of the t-distribution with n − 1 degrees of freedom, where n is the sample size (see Table A9, Appendix 5).
Step-3: Compute the sample mean x̄ and variance s² of the sample x1, x2, . . . , xn; compute s = √(s²).
Step-4: Compute k = cs/√n. The confidence interval is

CONF_ν {x̄ − k ≤ µ ≤ x̄ + k}.
Understanding Step-2 and Step-4: The following result will be useful.
Theorem: Let X1, X2, . . . , Xn be independent normal random variables with the same mean µ and the same variance σ². Then the random variable

T = (X̄ − µ)/(S/√n)

has a t-distribution with n − 1 degrees of freedom, where

S² = (1/(n − 1)) Σ_{i=1}^{n} (Xi − X̄)².

The distribution of T is symmetric and contains the information about the mean µ. Accordingly, we want P(−c ≤ T ≤ c) = ν for a given ν. That is,

P(−c ≤ T ≤ c) = F(c) − F(−c) = 2F(c) − 1 = ν,

which implies

F(c) = (1 + ν)/2.
Consider

P(−c ≤ T ≤ c) = P(−c ≤ (X̄ − µ)/(S/√n) ≤ c)
            = P(−cS/√n ≤ X̄ − µ ≤ cS/√n)
            = P(X̄ − K ≤ µ ≤ X̄ + K),

where K = cS/√n. We substitute the observed values x̄ and s² of X̄ and S² to get the interval

P(x̄ − K ≤ µ ≤ x̄ + K) = ν, or CONF_ν {x̄ − K ≤ µ ≤ x̄ + K}.
Ex.1: Five independent measurements of the flash point (°F) of diesel oil gave the values 144, 147, 146, 142, 144. Assuming normality, determine a 99% confidence interval for the mean.
Step-1: ν = 0.99 is given.
Step-2: F(c) = (1 + ν)/2 = 0.995, n − 1 = 4. From Table A9 we get c = 4.60.
Step-3: Calculate x̄ = 144.6 and s² = 3.8.
Step-4: Calculate K = 4.60 × √3.8/√5 = 4.01.
Step-5: The confidence interval for µ is given by

CONF_ν {x̄ − K ≤ µ ≤ x̄ + K}, i.e., CONF_0.99 {144.6 − 4.01 ≤ µ ≤ 144.6 + 4.01}, i.e., CONF_0.99 {140.5 ≤ µ ≤ 148.7}.
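The same steps with SciPy's t-distribution (a sketch; t.ppf replaces the Table A9 look-up, and ddof=1 gives the (n − 1)-denominator s):

import numpy as np
from scipy.stats import t

x = np.array([144, 147, 146, 142, 144.0])
nu = 0.99

n = x.size
c = t.ppf((1 + nu) / 2, df=n - 1)     # ~4.604
k = c * x.std(ddof=1) / np.sqrt(n)
print(x.mean() - k, x.mean() + k)     # ~(140.6, 148.6)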
Ex.2: Determine a 99% confidence interval for the mean of a normal population using the sample 30.8, 30.0, 29.9, 30.1, 31.7, 34.0.
Ex.3: Determine a 95% confidence interval for the mean of a normal population using the sample 10, 10, 8, 12, 10, 11, 10, 11.
Ex.4: Determine a 95% confidence interval for the mean of a normal population with sample variance s² = 16, using a sample of size 200 with mean x̄ = 74.81.
Lecture-20
Confidence interval for σ² in N(µ, σ²): The following steps can be used to get a confidence interval for σ² when µ is unknown.
Step-1: Choose a confidence level ν, such as ν = 0.90, 0.95, 0.99, etc.
Step-2: Determine the solutions c1 and c2 of the equations

F(c1) = (1 − ν)/2,  F(c2) = (1 + ν)/2

from the table of the chi-square distribution with n − 1 degrees of freedom, where n is the sample size (see Table A10, Appendix 5).
Step-3: Compute s² from the sample x1, x2, . . . , xn.
Step-4: Compute k1 = (n − 1)s²/c1 and k2 = (n − 1)s²/c2. The confidence interval for σ² is then

CONF_ν {k2 ≤ σ² ≤ k1}.
Understanding Step-2 and Step-4: The following result will be useful.
Theorem: Let X1, X2, . . . , Xn be independent normal random variables with the same mean µ and the same variance σ². Then the random variable

Y = (n − 1)S²/σ²

has a chi-square distribution with n − 1 degrees of freedom, where

S² = (1/(n − 1)) Σ_{i=1}^{n} (Xi − X̄)².

(Proof omitted.)
The random variable Y contains the information about σ². Now for a given ν, we want

P(c1 ≤ Y ≤ c2) = F(c2) − F(c1) = ν.

The above equation will be satisfied if we choose F(c1) = (1 − ν)/2 and F(c2) = (1 + ν)/2.
To derive Step-4, consider

P(c1 ≤ Y ≤ c2) = P(c1 ≤ (n − 1)S²/σ² ≤ c2)
            = P((n − 1)S²/c2 ≤ σ² ≤ (n − 1)S²/c1)
            = P(k2 ≤ σ² ≤ k1),

where k1 = (n − 1)S²/c1 and k2 = (n − 1)S²/c2. Hence the confidence interval for σ² is given by

CONF_ν {k2 ≤ σ² ≤ k1}.
Ex.1: Determine a 95% confidence interval for the variance (assuming normality), using the sample values 89, 84, 87, 81, 89, 86, 91, 90, 78, 89, 87, 99, 83, 89.
Step-1: ν = 0.95 is given.
Step-2: Here n − 1 = 13, and from Table A10, c1 = 5.01 and c2 = 24.74.
Step-3: Calculate s², giving 13s² = 326.9.
Step-4: Calculate 13s²/c1 = 65.25 and 13s²/c2 = 13.21.
Step-5: The confidence interval for σ² is given by

CONF_0.95 {13.21 ≤ σ² ≤ 65.25}.
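As code (a sketch; chi2.ppf replaces the Table A10 look-up):

import numpy as np
from scipy.stats import chi2

x = np.array([89, 84, 87, 81, 89, 86, 91, 90, 78, 89, 87, 99, 83, 89.0])
nu = 0.95

n = x.size
s2 = x.var(ddof=1)                       # sample variance, n-1 denominator
c1 = chi2.ppf((1 - nu) / 2, df=n - 1)    # ~5.01
c2 = chi2.ppf((1 + nu) / 2, df=n - 1)    # ~24.74
print((n - 1) * s2 / c2, (n - 1) * s2 / c1)   # ~(13.2, 65.2)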
Ex.2: Determine a 99% confidence interval for the mean and variance of a normal population using the sample 30.8, 30.0, 29.9, 30.1, 31.7, 34.0.
Ex.3: Determine a 95% confidence interval for the mean and variance of a normal population using the sample 10, 10, 8, 12, 10, 11, 10, 11.
If X1 and X2 are independent normal random variables with means 16 and 12 and variances 8 and 2, respectively, what distribution does 4X1 − X2 have?
Assuming that the population from which the samples are collected is normal, determine a 95% confidence interval for the variance σ². Use the sample values 251, 255, 258, 253, 253, 252, 250, 252, 255, 256.
Lecture-21
Goodness of Fit: Chi-Square Test: In this method, we intend to test the hypothesis that a certain function F(x) is the distribution function of the distribution from which the samples x1, x2, . . . , xn are collected; this is called testing goodness of fit. Denote by F̃(x) the sample distribution function, defined as the sum of the relative frequencies of all sample values xi not greater than x, while F(x) is the hypothesized cumulative distribution function. We will accept the hypothesis if F̃(x) fits F(x) sufficiently well, and otherwise reject it. We may tolerate a certain amount of deviation between F̃(x) and F(x): we determine a number c such that, if the hypothesis is true, a deviation greater than c has a small preassigned probability. If a deviation greater than c occurs, we have reason to doubt the hypothesis and we reject it. On the other hand, if the deviation does not exceed c, so that F̃(x) approximates F(x) sufficiently well, we accept the hypothesis. Accepting the hypothesis only means that we have insufficient evidence to reject it using this method. Next we give a step-by-step procedure for the goodness of fit chi-square test.
Chi-square test for the hypothesis that F(x) is the distribution function of a population from which a sample x1, x2, . . . , xn is taken:
Step-1: Subdivide the x-axis into K intervals I1, I2, . . . , IK such that each interval contains at least 5 values of the given sample x1, x2, . . . , xn. Determine the number bj of sample values in the interval Ij, j = 1, 2, . . . , K. If a sample value lies at a common boundary point of two intervals, add 0.5 to each of the two corresponding bj.
Step-2: Using F(x) (assuming the hypothesis is true), compute the probability pj = P(X ∈ Ij), j = 1, 2, . . . , K. Then compute ej = n pj, the number of sample values theoretically expected in Ij if the hypothesis is true.
Step-3: Compute the observed χ² value (the deviation)

χ²0 = Σ_{j=1}^{K} (bj − ej)²/ej.

Step-4: Choose the significance level α, e.g. α = 5%, 1%, etc.
Step-5: Determine the solution c of the equation P(χ² ≤ c) = 1 − α from the table of the χ² distribution with K − 1 degrees of freedom (Table A10 in Appendix 5). If r parameters of F(x) are unknown and their MLEs are used, then take K − r − 1 degrees of freedom instead of K − 1.
Step-6: If χ²0 ≤ c, accept the hypothesis; if χ²0 > c, reject the hypothesis.
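A sketch of Steps 1-6 in code, applied to 100 coin flips showing 40 heads and 60 tails (the data of Ex.1 below) under the fair-coin hypothesis p = 1/2:

from scipy.stats import chi2

b = [40, 60]               # observed frequencies (heads, tails)
e = [50, 50]               # expected under the fair-coin hypothesis
alpha = 0.05

chi2_0 = sum((bj - ej) ** 2 / ej for bj, ej in zip(b, e))   # Step-3: 4.0
c = chi2.ppf(1 - alpha, df=len(b) - 1)                      # Step-5: ~3.841

print(chi2_0, c, "reject" if chi2_0 > c else "accept")      # 4.0 > 3.841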
Ex.1: If 100 flips of a coin result in 40 heads and 60 tails, can we assert at the 5% level that the coin is fair?
Ex.2: Can you claim at a 5% level that a die is fair if 60 trials give 1, 2, . . . , 6 with absolute frequencies 10, 13, 9, 11, 9, 8?
Ex.3: Test for normality at the 1% level using a sample of n = 79 values x (tensile strength of steel sheets of 0.3 mm thickness), with a = a(x) = absolute frequency. Take the first two values together, and also the last three, to get K = 5.
Ex.4: In a sample of 100 patients having a certain disease, 45 are men and 55 are women. Does this support the claim that the disease is equally common among men and women? Choose α = 5%.
Lecture-22
Regression Analysis: In regression analysis we study the dependence of a random variable Y on a variable x, treating the latter as an ordinary (control, or independent) variable, since we can control it; we then study the regression of Y on x. Some examples are the dependence of blood pressure Y on the age x of a person, and the regression of the heat conductivity Y of cork on the specific weight x of the cork.
In an experiment we first choose x1, x2, . . . , xn and then observe the corresponding values y1, y2, . . . , yn of Y, so that we get a sample (x1, y1), (x2, y2), . . . , (xn, yn). In regression analysis the dependence of Y on x is a dependence of the mean µ of Y on x, so that µ = µ(x). The curve of µ(x) is called the regression curve of Y on x. Mathematically,

E(Y | X = x) = µ(x).

We consider the special case when µ(x) = k0 + k1x, that is, a straight line (linear regression). To determine this curve we need to find k0 and k1. For given sample values (x1, y1), (x2, y2), . . . , (xn, yn) in the xy-plane we will fit a straight line through them and use it for estimating µ(x).
Least Squares Principle: The straight line should be fitted through the given points so that the sum of the squares of the distances of those points from the straight line is minimum, where distance is measured in the vertical direction (the y-direction).
Condition for Uniqueness: The x-values x1, x2, . . . , xn in the sample (x1, y1), (x2, y2), . . . , (xn, yn) are not all equal.
From a given sample (x1, y1), (x2, y2), . . . , (xn, yn) we will determine a straight line yj = k0 + k1xj; this is called the sample regression line. The population regression line, which is its counterpart, is

y = k0 + k1x. (1)

A sample point (xj, yj) has vertical distance |yj − (k0 + k1xj)| from the sample regression line. Hence the sum of the squares of these distances is

E(k0, k1) = Σ_{j=1}^{n} (yj − (k0 + k1xj))². (2)
In the method of least squares we have to determine k0 and k1 such that E is minimum. Differentiating E partially with respect to k0 and k1 we get the normal equations

∂E/∂k0 = 0,  ∂E/∂k1 = 0.

From ∂E/∂k0 = 0 we get Σ_{j=1}^{n} (yj − k0 − k1xj) = 0, and consequently k0 = (1/n) Σ_{j=1}^{n} yj − k1 (1/n) Σ_{j=1}^{n} xj, that is, k0 = ȳ − k1x̄. Substituting into the population regression line we get

y − ȳ = k1(x − x̄). (3)

The coefficient k1, called the regression coefficient, is given by

k1 = sxy/sx²,

where

sxy = (1/(n − 1)) Σ_{j=1}^{n} (xj − x̄)(yj − ȳ) = (1/(n − 1)) [Σ_{j=1}^{n} xj yj − (1/n)(Σ_{j=1}^{n} xj)(Σ_{j=1}^{n} yj)],
sx² = (1/(n − 1)) Σ_{j=1}^{n} (xj − x̄)² = (1/(n − 1)) [Σ_{j=1}^{n} xj² − (1/n)(Σ_{j=1}^{n} xj)²].
To obtain the value of k1, we proceed as follows. From the two normal equations we have

−2 Σ_{j=1}^{n} (yj − k0 − k1xj) = 0,  −2 Σ_{j=1}^{n} xj(yj − k0 − k1xj) = 0.

Simplifying, we get a system of two linear equations in k0 and k1:

k0 n + k1 Σ_{j=1}^{n} xj = Σ_{j=1}^{n} yj,
k0 Σ_{j=1}^{n} xj + k1 Σ_{j=1}^{n} xj² = Σ_{j=1}^{n} xj yj.
The determinant of the coefficient matrix is

n Σ_{j=1}^{n} xj² − (Σ_{j=1}^{n} xj)² = n(n − 1)sx² = n Σ_{j=1}^{n} (xj − x̄)².

This is not zero since the xj's are not all equal. Hence the system has a unique solution.
Dividing the first equation by n we get k0 = ȳ − k1x̄; substituting into y = k0 + k1x gives equation (3). To get the value of k1, we apply Cramer's rule to the system of equations, which gives

k1 = (n Σ_{j=1}^{n} xj yj − Σ_{j=1}^{n} xj Σ_{j=1}^{n} yj)/(n(n − 1)sx²) = sxy/sx².

Note: If we use sx² = (1/n) Σ_{j=1}^{n} (xj − x̄)² and sxy = (1/n) Σ_{j=1}^{n} (xj − x̄)(yj − ȳ), we get the same result.
Ex.1: The decrease in volume y of leather for certain fixed values of high pressure x was measured. The results are (4000, 2.3), (6000, 4.1), (8000, 5.7), (10000, 6.9). Find the regression line of y on x.
Sol: Given n = 4; compute x̄ = 7000 and ȳ = 4.75. Further, we compute sx² = 20000000/3 and sxy = 15400/3. This gives k1 = 15400/20000000 = 0.00077. Hence the regression line is

y − 4.75 = 0.00077(x − 7000), or y = 0.00077x − 0.64.

Ex.2: Find the sample regression line of Y on X for the given points: (a) (2, 12), (5, 24), (9, 33), (14, 50); (b) (−2, 3.5), (0, 1.5), (2, 1), (4, −0.5), (6, −1).
Ex.3: Find the estimate of y when x = 180, using the estimated regression line for the points (30, 160), (40, 240), (50, 330), (60, 435).
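A sketch verifying Ex.1 with NumPy; np.polyfit performs exactly this least squares fit:

import numpy as np

x = np.array([4000, 6000, 8000, 10000.0])
y = np.array([2.3, 4.1, 5.7, 6.9])

# Direct formulas: k1 = s_xy / s_x^2, k0 = ybar - k1*xbar
k1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
k0 = y.mean() - k1 * x.mean()
print(k0, k1)                      # ~-0.64 and 0.00077

# Cross-check with the built-in least squares polynomial fit
print(np.polyfit(x, y, deg=1))     # [0.00077, -0.64]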
Lecture-23
Correlation Analysis: Here we study the dependence between the random variables X and Y, treating both as random. On the basis of n ordered sample values (x1, y1), (x2, y2), . . . , (xn, yn) we measure the relation between X and Y. We define the sample means

x̄ = (1/n) Σ_{i=1}^{n} xi,  ȳ = (1/n) Σ_{i=1}^{n} yi,

the sample variances

sx² = (1/(n − 1)) Σ_{i=1}^{n} (xi − x̄)²,  sy² = (1/(n − 1)) Σ_{i=1}^{n} (yi − ȳ)²,

and the sample covariance

sxy = (1/(n − 1)) Σ_{i=1}^{n} (xi − x̄)(yi − ȳ).

The sample correlation coefficient is

r(x, y) = sxy/(sx sy);

its population analogue is Cov(X, Y)/(√Var(X) √Var(Y)) = σXY/(σX σY).
Result-1: The sample correlation coefficient $r$ satisfies $-1 \le r(x, y) \le 1$, and $r = \pm 1$ if and only if the sample values lie on a straight line.
Proof: We have
$$r(x, y) = \frac{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\big[\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2\, \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2\big]^{1/2}}.$$
Hence
$$r^2(x, y) = \frac{\big(\sum_{i=1}^{n} a_i b_i\big)^2}{\big(\sum_{i=1}^{n} a_i^2\big)\big(\sum_{i=1}^{n} b_i^2\big)},$$
where we denote $a_i = x_i - \bar{x}$ and $b_i = y_i - \bar{y}$.
Next, by the Cauchy-Schwarz inequality, for any real numbers $a_i, b_i$; $i = 1, 2, \ldots, n$,
$$\Big(\sum_{i=1}^{n} a_i b_i\Big)^2 \le \Big(\sum_{i=1}^{n} a_i^2\Big)\Big(\sum_{i=1}^{n} b_i^2\Big),$$
the sign of equality holding if and only if
$$\frac{a_1}{b_1} = \frac{a_2}{b_2} = \cdots = \frac{a_n}{b_n}.$$
Now using this inequality, we get $r^2(x, y) \le 1$, that is, $-1 \le r(x, y) \le 1$.
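The Cauchy-Schwarz step can be checked numerically; a small sketch with illustrative deviation vectors:

```python
# Numerical check of the Cauchy-Schwarz inequality used in the proof:
# (sum a_i b_i)^2 <= (sum a_i^2) * (sum b_i^2), with equality iff b is
# proportional to a. The deviation vectors below are illustrative.
a = [-2.0, -1.0, 0.0, 1.0, 2.0]           # a_i = x_i - xbar
b = [-3.6, -0.6, -1.6, 3.4, 2.4]          # b_i = y_i - ybar

lhs = sum(ai * bi for ai, bi in zip(a, b)) ** 2
rhs = sum(ai * ai for ai in a) * sum(bi * bi for bi in b)
print(lhs <= rhs)                         # True: about 256 <= 332, so r^2 < 1

b2 = [3 * ai for ai in a]                 # proportional case: equality, r^2 = 1
lhs2 = sum(ai * bi for ai, bi in zip(a, b2)) ** 2
rhs2 = sum(ai * ai for ai in a) * sum(bi * bi for bi in b2)
print(lhs2 == rhs2)                       # True: 900 = 900
```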
The theoretical counterpart of $r$ is the population correlation coefficient, which we denote by $\rho$. If there is no confusion between the symbols, one may also use $r$, provided it is identified as the population correlation coefficient, or simply the correlation coefficient. The correlation coefficient $\rho(X, Y)$ is defined as
$$\rho = \frac{\sigma_{XY}}{\sigma_X \sigma_Y},$$
where $\sigma_{XY} = E(X - \mu_X)(Y - \mu_Y)$ is the covariance between $X$ and $Y$, and $\sigma_X^2 = E(X - \mu_X)^2$, $\sigma_Y^2 = E(Y - \mu_Y)^2$. We denote $\mu_X = EX$, $\mu_Y = EY$.
Result-2: The correlation coefficient satisfies $-1 \le \rho(X, Y) \le 1$, and $X$ and $Y$ are linearly related ($Y = aX + b$, $X = cY + d$) if and only if $\rho = \pm 1$.
Proof: The proof is the same as the proof of Result-1.
The random variables $X$ and $Y$ are called uncorrelated if $\rho = 0$.
Result-3: (a) If the random variables $X$ and $Y$ are independent, then they are uncorrelated, that is, $\rho = 0$.
(b) The converse of statement (a) is not true; that is, uncorrelated random variables need not be independent.
(c) The converse is true if $(X, Y)$ is normal: if $(X, Y)$ is bivariate normal and $X$ and $Y$ are uncorrelated, then $X$ and $Y$ are independent.
Proof: (a) Let the random variables $X$ and $Y$ be independent. Then $E(XY) = (EX)(EY)$. Now the correlation coefficient is
$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}.$$
But $\mathrm{Cov}(X, Y) = E(XY) - (EX)(EY) = 0$, which implies $\rho = 0$.
(b) To prove that the converse is not true, consider the following example. Let $X$ be a discrete random variable with probability mass function $P(X = -1) = P(X = 0) = P(X = 1) = \frac{1}{3}$, and let $Y = X^2$. Then $E(X) = 0$ and $E(XY) = E(X^3) = 0$, so $\sigma_{XY} = 0$ and $\rho = 0$. Yet $Y$ is a function of $X$, and, for instance, $P(X = 0, Y = 0) = \frac{1}{3} \ne P(X = 0)P(Y = 0) = \frac{1}{9}$, so $X$ and $Y$ are not independent.
(c) To prove (c), write the density function of the bivariate normal random variable $(X, Y)$ and show that $\rho = 0$ implies that $X$ and $Y$ are independent, as follows.
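The counterexample in (b) can be verified by direct computation; a minimal sketch:

```python
# Sketch of the counterexample in Result-3(b): X uniform on {-1, 0, 1}, Y = X^2.
# Cov(X, Y) = 0, yet X and Y are clearly dependent.
support = [-1, 0, 1]
p = 1 / 3                                  # P(X = x) for each support point

EX = sum(x * p for x in support)           # 0
EY = sum(x**2 * p for x in support)        # 2/3
EXY = sum(x * x**2 * p for x in support)   # E(X^3) = 0
print(EXY - EX * EY)                       # Cov(X, Y) = 0.0, hence rho = 0

# Dependence: P(X = 0, Y = 0) = 1/3, but P(X = 0) * P(Y = 0) = 1/9.
print(p, p * p)
```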
The joint density function of $X$ and $Y$ is given by
$$f(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1 - \rho^2}}\, e^{-h(x, y)/2},$$
where
$$h(x, y) = \frac{1}{1 - \rho^2}\Big[\Big(\frac{x - \mu_X}{\sigma_X}\Big)^2 - 2\rho\Big(\frac{x - \mu_X}{\sigma_X}\Big)\Big(\frac{y - \mu_Y}{\sigma_Y}\Big) + \Big(\frac{y - \mu_Y}{\sigma_Y}\Big)^2\Big].$$
Now check that when $\rho = 0$, the joint density $f(x, y)$ can be written as the product of the marginal densities of $X$ and $Y$. That is,
$$f(x, y) = f_1(x) f_2(y) = \frac{1}{\sigma_X\sqrt{2\pi}}\, e^{-\frac{1}{2}\big(\frac{x - \mu_X}{\sigma_X}\big)^2} \cdot \frac{1}{\sigma_Y\sqrt{2\pi}}\, e^{-\frac{1}{2}\big(\frac{y - \mu_Y}{\sigma_Y}\big)^2}.$$
Since the joint density factorizes into the product of the marginal densities, $X$ and $Y$ are independent.
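A quick numerical sketch of this factorization (the parameter values below are illustrative, not from the text):

```python
# Numerical sketch: with rho = 0, the bivariate normal density equals the
# product of the two normal marginals (up to floating-point rounding).
from math import exp, pi, sqrt

mu_x, mu_y, sig_x, sig_y = 1.0, -2.0, 1.5, 0.5   # illustrative parameters

def marginal(z, mu, sig):
    return exp(-0.5 * ((z - mu) / sig) ** 2) / (sig * sqrt(2 * pi))

def joint_rho0(x, y):
    # With rho = 0, h(x, y) reduces to the sum of the squared z-scores.
    h = ((x - mu_x) / sig_x) ** 2 + ((y - mu_y) / sig_y) ** 2
    return exp(-h / 2) / (2 * pi * sig_x * sig_y)

x, y = 0.3, -1.7
print(joint_rho0(x, y))
print(marginal(x, mu_x, sig_x) * marginal(y, mu_y, sig_y))  # same value
```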
Ex.1: Calculate the correlation coefficient for the following heights (in inches) of fathers ($X$) and their sons ($Y$).
$X$: 65, 66, 67, 67, 68, 69, 70, 72.
$Y$: 67, 68, 65, 68, 72, 72, 69, 71.
Ex.2: If the random variables $X$ and $Y$ have the joint probability density function
$$f(x, y) = x + y, \quad 0 < x < 1,\ 0 < y < 1,$$
$$\qquad = 0, \quad \text{elsewhere},$$
find the correlation coefficient between $X$ and $Y$.
Ex.3: Suppose the two-dimensional random variable $(X, Y)$ has the probability density function given by
$$f(x, y) = k\exp(-y), \quad 0 < x < y < 1,$$
$$\qquad = 0, \quad \text{elsewhere}.$$

Find the value of $k$ and the correlation coefficient between $X$ and $Y$.
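For Ex.1, here is a quick numerical check in the spirit of the earlier sketch; it reports $r \approx 0.603$, which you should confirm by hand:

```python
# A quick check for Ex.1 (father/son heights): r = s_xy / (s_x * s_y).
from math import sqrt

X = [65, 66, 67, 67, 68, 69, 70, 72]
Y = [67, 68, 65, 68, 72, 72, 69, 71]
n = len(X)

xbar, ybar = sum(X) / n, sum(Y) / n        # 68 and 69
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))   # 24
sxx = sum((x - xbar) ** 2 for x in X)      # 36
syy = sum((y - ybar) ** 2 for y in Y)      # 44
# The 1/(n-1) factors cancel in the ratio, so raw sums suffice.
print(round(sxy / sqrt(sxx * syy), 3))     # 0.603
```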
Introduction to Probability and Statistics
Course ID:MA2203

Lecture-24
Course Teacher: Dr. Manas Ranjan Tripathy

Department of Mathematics
National Institute of Technology, Rourkela
Rank Correlation Coefficient: Let us suppose that a group of $n$ individuals is arranged in order of merit or proficiency with respect to two characteristics $A$ and $B$. These ranks in the two characteristics will, in general, be different. For example, if we consider the relation between intelligence and beauty, it is not necessary that a beautiful individual is also intelligent. Let $(x_i, y_i)$; $i = 1, 2, \ldots, n$ be the ranks of the $i$th individual in the two characteristics $A$ and $B$ respectively. The Pearson correlation coefficient between the ranks $x_i$ and $y_i$ is called the rank correlation coefficient between $A$ and $B$ for that group of individuals.
Spearman's Rank Correlation Coefficient: Assume that no two individuals have the same rank, so each of $X$ and $Y$ takes the values $1, 2, \ldots, n$.
Hence we have $\bar{x} = \bar{y} = \frac{1}{n}(1 + 2 + 3 + \cdots + n) = \frac{n+1}{2}$, and the variances are $\sigma_x^2 = \sigma_y^2 = \frac{n^2 - 1}{12}$. In general $x_i \ne y_i$; denote $d_i = x_i - y_i = (x_i - \bar{x}) - (y_i - \bar{y})$.
Squaring and summing over $i$ from 1 to $n$, we get
$$\sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n}\big[(x_i - \bar{x}) - (y_i - \bar{y})\big]^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2 + \sum_{i=1}^{n}(y_i - \bar{y})^2 - 2\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}).$$
Now dividing both sides by $n$, we get
$$\frac{1}{n}\sum_{i=1}^{n} d_i^2 = \sigma_x^2 + \sigma_y^2 - 2\,\mathrm{Cov}(X, Y) = \sigma_x^2 + \sigma_y^2 - 2\rho\,\sigma_x\sigma_y,$$
where $\rho$ is the rank correlation coefficient between $A$ and $B$.
Hence $\frac{1}{n}\sum_{i=1}^{n} d_i^2 = 2\sigma_x^2 - 2\rho\sigma_x^2$, which implies that
$$1 - \rho = \frac{\sum_{i=1}^{n} d_i^2}{2n\sigma_x^2}.$$
From the above we get
$$\rho = 1 - \frac{\sum_{i=1}^{n} d_i^2}{2n\sigma_x^2} = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}.$$
This is Spearman's formula for the rank correlation coefficient when there are no tied ranks.
Note: We should always have $\sum_{i=1}^{n} d_i = \sum_{i=1}^{n}(x_i - y_i) = 0$.
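This formula is straightforward to implement. A minimal sketch in Python (the helper names ranks and spearman_rho are ours; this version assumes all values are distinct, i.e. no ties):

```python
# Spearman's rho without ties: rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)).

def ranks(values):
    # Rank 1 for the smallest value; assumes all values are distinct.
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(xs, ys):
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Illustrative scores given by two judges to the same five items.
print(spearman_rho([8.1, 6.2, 9.4, 5.0, 7.3], [7.0, 6.8, 9.9, 4.1, 6.5]))  # 0.9
```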
Tied Ranks: If some of the individuals receive the same rank in a ranking of merit, they are said to be tied. Let us suppose that $m$ of the individuals, say the $(k+1)$th, $(k+2)$th, $\ldots$, $(k+m)$th, are tied. Then each of these $m$ individuals is assigned a common rank, which is the arithmetic mean of the ranks $k+1, k+2, \ldots, k+m$.
Derivation of $\rho$ in the tied case. We have
$$\rho = \frac{\sum_{i=1}^{n}(X - \bar{X})(Y - \bar{Y})}{\big[\sum_{i=1}^{n}(X - \bar{X})^2 \sum_{i=1}^{n}(Y - \bar{Y})^2\big]^{1/2}} = \frac{\sum_{i=1}^{n} xy}{\sqrt{\sum_{i=1}^{n} x^2 \sum_{i=1}^{n} y^2}},$$
where $x = X - \bar{X}$ and $y = Y - \bar{Y}$. If $X$ and $Y$ each takes the values $1, 2, \ldots, n$, then $\bar{X} = \bar{Y} = \frac{n+1}{2}$ and $n\sigma_x^2 = \sum x^2 = \frac{n(n^2 - 1)}{12}$, $n\sigma_y^2 = \sum y^2 = \frac{n(n^2 - 1)}{12}$.
We also have $\sum d^2 = \sum(X - Y)^2 = \sum\big[(X - \bar{X}) - (Y - \bar{Y})\big]^2 = \sum(x - y)^2$.
This implies $\sum d^2 = \sum x^2 + \sum y^2 - 2\sum xy$. Thus we have
$$\sum xy = \frac{1}{2}\Big(\sum x^2 + \sum y^2 - \sum d^2\Big).$$
We shall now investigate the effect of a common ranking on the sum of squares of the ranks. Let $S^2$ and $S_1^2$ denote the sums of squares of the untied and tied ranks respectively. Then we have
$$S^2 = (k+1)^2 + (k+2)^2 + \cdots + (k+m)^2 = mk^2 + \frac{m(m+1)(2m+1)}{6} + mk(m+1),$$
and
$$S_1^2 = m(\text{average rank})^2 = m\Big[\frac{(k+1) + (k+2) + \cdots + (k+m)}{m}\Big]^2 = m\Big(k + \frac{m+1}{2}\Big)^2 = mk^2 + \frac{m(m+1)^2}{4} + mk(m+1).$$
The difference is
$$S^2 - S_1^2 = \frac{m(m+1)}{12}\big[2(2m+1) - 3(m+1)\big] = \frac{m(m^2 - 1)}{12}.$$
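A quick numerical check of this reduction, with illustrative values $k = 3$ and $m = 4$:

```python
# Replacing ranks k+1, ..., k+m by their average reduces the sum of squares
# by m * (m^2 - 1) / 12; here k = 3, m = 4 (the tied ranks are 4, 5, 6, 7).
k, m = 3, 4
tied = list(range(k + 1, k + m + 1))      # [4, 5, 6, 7]
avg = sum(tied) / m                       # common rank 5.5

S2 = sum(r * r for r in tied)             # 126
S12 = m * avg ** 2                        # 121.0
print(S2 - S12, m * (m**2 - 1) / 12)      # 5.0 and 5.0
```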
Thus the effect of tying $m$ individuals (ranks) is to reduce the sum of squares by $\frac{m(m^2 - 1)}{12}$, though the mean value of the ranks remains the same, namely $\frac{n+1}{2}$.
Suppose that there are $s$ such sets of ranks to be tied in the $X$-values, so that the total reduction in the sum of squares due to them is
$$\frac{1}{12}\sum_{i=1}^{s} m_i(m_i^2 - 1) = \frac{1}{12}\sum_{i=1}^{s}(m_i^3 - m_i) = T_x \ \text{(say)}.$$
Similarly, suppose that there are $t$ such sets of ranks to be tied in the $Y$-values, so that the total reduction in the sum of squares due to them is
$$\frac{1}{12}\sum_{i=1}^{t} m_i'(m_i'^2 - 1) = \frac{1}{12}\sum_{i=1}^{t}(m_i'^3 - m_i') = T_y \ \text{(say)}.$$
Thus in the case of tied ranks, the new sums of squares are given by
$$n\,\mathrm{Var}'(X) = \sum x^2 - T_x = \frac{n(n^2 - 1)}{12} - T_x,$$
and
$$n\,\mathrm{Var}'(Y) = \sum y^2 - T_y = \frac{n(n^2 - 1)}{12} - T_y.$$
Further, we have the new covariance:
$$n\,\mathrm{Cov}'(X, Y) = \frac{1}{2}\Big[\sum x^2 - T_x + \sum y^2 - T_y - \sum d^2\Big] = \frac{1}{2}\Big[\frac{n(n^2 - 1)}{12} - T_x + \frac{n(n^2 - 1)}{12} - T_y - \sum d^2\Big] = \frac{n(n^2 - 1)}{12} - \frac{1}{2}\Big[(T_x + T_y) + \sum d^2\Big].$$
Finally, the rank correlation coefficient $\rho$ is given by
$$\rho(X, Y) = \frac{\frac{n(n^2 - 1)}{12} - \frac{1}{2}\big[(T_x + T_y) + \sum d^2\big]}{\big[\frac{n(n^2 - 1)}{12} - T_x\big]^{1/2}\big[\frac{n(n^2 - 1)}{12} - T_y\big]^{1/2}} = \frac{\frac{n(n^2 - 1)}{6} - \big(\sum d^2 + T_x + T_y\big)}{\big[\frac{n(n^2 - 1)}{6} - 2T_x\big]^{1/2}\big[\frac{n(n^2 - 1)}{6} - 2T_y\big]^{1/2}}.$$
Note: If we adjust only the covariance term, then for ties the above formula reduces to
$$\rho(X, Y) = \frac{\frac{n(n^2 - 1)}{6} - \big(\sum d^2 + T_x + T_y\big)}{\frac{n(n^2 - 1)}{6}} = 1 - \frac{6\big(\sum d^2 + T_x + T_y\big)}{n(n^2 - 1)}.$$
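A sketch of the tie-adjusted computation in Python, implementing the simpler covariance-adjusted version from the Note (tied values share the average of the ranks they occupy; the helper names are ours):

```python
# Tie-corrected Spearman's rho (covariance-adjusted version):
# rho = 1 - 6 * (sum(d^2) + Tx + Ty) / (n * (n^2 - 1)),
# where T = (1/12) * sum over tie groups of (m^3 - m).
from collections import Counter

def average_ranks(values):
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    j = 0
    while j < len(order):
        k = j
        while k + 1 < len(order) and values[order[k + 1]] == values[order[j]]:
            k += 1
        avg = (j + 1 + k + 1) / 2          # mean of positions j+1, ..., k+1
        for idx in order[j:k + 1]:
            r[idx] = avg
        j = k + 1
    return r

def tie_correction(values):
    return sum(m**3 - m for m in Counter(values).values()) / 12

def spearman_tied(xs, ys):
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * (d2 + tie_correction(xs) + tie_correction(ys)) / (n * (n**2 - 1))

# One tie in each variable: 2 appears twice in xs, 3 appears twice in ys.
print(spearman_tied([1, 2, 2, 4, 5], [2, 1, 3, 3, 5]))   # 0.725
```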
