Week 10 Notes

10. Week 10

Proposition 10.1 (Cauchy-Schwarz Inequality). Let X and Y be RVs defined on the same probability space. Then,

\[ (E(XY))^2 \le E(X^2)\, E(Y^2), \]

provided the expectations exist. The equality occurs if and only if P(Y = cX) = 1 or P(X = cY) = 1 for some c ∈ R.

Proof. First we consider the case when EX^2 = 0. Then P(X = 0) = 1 and consequently P(XY = 0) = 1 and E(XY) = 0, so the equality holds.
Now assume that EX^2 > 0. For all c ∈ R, we have E(Y − cX)^2 = c^2 EX^2 − 2c E(XY) + EY^2 ≥ 0. Since this quadratic in c is non-negative with positive leading coefficient, its discriminant (2E(XY))^2 − 4 EX^2 EY^2 must be non-positive, which proves the inequality.
If the equality holds, then the discriminant is zero, so the quadratic has a real root c, i.e. E(Y − cX)^2 = 0 for some c, and hence P(Y = cX) = 1. Conversely, if P(Y = cX) = 1 for some c, then E(Y − cX)^2 = 0 and the equality holds. Interchanging the roles of X and Y, we can discuss the case involving E(X − cY)^2 and P(X = cY). □

Corollary 10.2. Let X and Y be RVs defined on the same probability space. Then,

\[ (Cov(X, Y))^2 \le Var(X)\, Var(Y), \]

provided the covariance and the variances exist.

Proof. Take U = X − EX and V = Y − EY. Applying the Cauchy-Schwarz inequality to U and V, the result follows. □

Remark 10.3. (a) Recall from the discussion in Definition 6.33 that

\[ Var(X) = E(X - EX)^2 = \inf_{c \in \mathbb{R}} E(X - c)^2, \]

for any RV X with finite second moment. This result can be interpreted in the following way: EX is the best constant to approximate an RV X, with an ‘error’ of √(Var(X)).

(b) Now, given two non-degenerate RVs X and Y, we can ask the following question: what are the best constants a, b when we try to approximate Y by aX + b? To be precise, we are looking at the problem of minimizing E(Y − aX − b)^2 over all a, b ∈ R.
(c) Continue with the problem mentioned above. Fix a ∈ R. Then, the best b which minimizes E(Y − aX − b)^2 is b = E(Y − aX), with an ‘error’ of √(Var(Y − aX)). We may therefore consider

\[ \inf_{a, b \in \mathbb{R}} E(Y - aX - b)^2 = \inf_{a \in \mathbb{R}} E\left[Y - aX - E(Y - aX)\right]^2 = \inf_{a \in \mathbb{R}} \left[ Var(Y) - 2a\, Cov(X, Y) + a^2\, Var(X) \right]. \]

By a simple computation, we conclude that a = Cov(X, Y)/Var(X), b = E(Y − aX) attains the infimum, with

\[ \inf_{a, b \in \mathbb{R}} E(Y - aX - b)^2 = Var(Y) \left[ 1 - \left( \frac{Cov(X, Y)}{\sqrt{Var(X)\, Var(Y)}} \right)^2 \right] \le Var(Y) = \inf_{a \in \mathbb{R}} E(Y - a)^2. \]

The inequality above follows from the Cauchy-Schwarz inequality (Proposition 10.1). A small numerical illustration follows.
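The next sketch is a numerical illustration of Remark 10.3(c) and is not part of the original notes: it simulates a pair (X, Y), computes a = Cov(X, Y)/Var(X) and b = E(Y − aX), and checks that the resulting mean squared error matches the right-hand side of the display above. The simulated model and all variable names are arbitrary choices for illustration.

```python
# Numerical check of the best linear predictor from Remark 10.3(c).
import random
import statistics

random.seed(0)
n = 100_000
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
ys = [2.0 * x + 1.0 + random.gauss(0.0, 0.5) for x in xs]   # an arbitrary linear-plus-noise model

mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
var_x = statistics.fmean((x - mean_x) ** 2 for x in xs)
var_y = statistics.fmean((y - mean_y) ** 2 for y in ys)
cov_xy = statistics.fmean((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

a = cov_xy / var_x              # optimal slope Cov(X, Y)/Var(X)
b = mean_y - a * mean_x         # optimal intercept E(Y - aX)
mse = statistics.fmean((y - a * x - b) ** 2 for x, y in zip(xs, ys))
rho_sq = cov_xy ** 2 / (var_x * var_y)

print("a, b:", a, b)                              # close to 2 and 1 for this model
print("MSE:", mse, "vs", var_y * (1 - rho_sq))    # the two values agree up to sampling error
```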

Definition 10.4 (Correlation between RVs). Let X and Y be RVs defined on the same probability space. If 0 < Var(X) < ∞ and 0 < Var(Y) < ∞, then we call

\[ \rho(X, Y) := \frac{Cov(X, Y)}{\sqrt{Var(X)\, Var(Y)}} \]

the correlation between X and Y. We say X and Y are uncorrelated if ρ(X, Y) = 0 or, equivalently, Cov(X, Y) = 0.

Note 10.5. By Corollary 10.2, |ρ(X, Y)| ≤ 1 for any two RVs X and Y defined on the same probability space (whenever the correlation is defined).

Remark 10.6 (Correlation and Independence). If X and Y are independent RVs defined on the same probability space, then by Remark 9.32(i), Cov(X, Y) = 0 and hence X and Y are uncorrelated. However, the converse is not true. We illustrate this point with the following examples; a short numerical check appears after them.

(a) Let X = (X1, X2) be a bivariate discrete random vector, i.e. a 2-dimensional discrete random vector, with joint p.m.f. given by

\[ f_X(x_1, x_2) = \begin{cases} \frac{1}{2}, & \text{if } (x_1, x_2) = (0, 0), \\ \frac{1}{4}, & \text{if } (x_1, x_2) = (1, 1) \text{ or } (1, -1), \\ 0, & \text{otherwise.} \end{cases} \]

The marginal p.m.f.s are

\[ f_{X_1}(x_1) = \begin{cases} \frac{1}{2}, & \text{if } x_1 \in \{0, 1\}, \\ 0, & \text{otherwise,} \end{cases} \qquad f_{X_2}(x_2) = \begin{cases} \frac{1}{2}, & \text{if } x_2 = 0, \\ \frac{1}{4}, & \text{if } x_2 \in \{1, -1\}, \\ 0, & \text{otherwise.} \end{cases} \]

We have fX1,X2(0, 0) = 1/2 ≠ 1/4 = fX1(0) fX2(0) and hence X1 and X2 are not independent. But EX1 = 1/2, EX2 = 0, E(X1 X2) = 0, Var(X1) > 0 and Var(X2) > 0. Therefore Cov(X1, X2) = 0 and hence X1 and X2 are uncorrelated.
(b) Let X = (X1, X2) be a bivariate continuous random vector, i.e. a 2-dimensional continuous random vector, with joint p.d.f. given by

\[ f_X(x_1, x_2) = \begin{cases} 1, & \text{if } 0 < |x_2| \le x_1 < 1, \\ 0, & \text{otherwise.} \end{cases} \]

Then,

\[ E(X_1 X_2) = \int_0^1 \int_{-x_1}^{x_1} x_1 x_2 \, dx_2 \, dx_1 = 0, \]

and

\[ E(X_1) = \int_0^1 \int_{-x_1}^{x_1} x_1 \, dx_2 \, dx_1 = \frac{2}{3}, \qquad E(X_2) = \int_0^1 \int_{-x_1}^{x_1} x_2 \, dx_2 \, dx_1 = 0. \]

Hence, E(X1 X2) = (EX1)(EX2), which implies Cov(X1, X2) = 0. A similar computation shows that Var(X1) and Var(X2) exist and are non-zero. Hence, X1 and X2 are uncorrelated. Now, by computing the marginal p.d.f.s fX1 and fX2, it is immediate that the equality

\[ f_{X_1, X_2}(x_1, x_2) = f_{X_1}(x_1)\, f_{X_2}(x_2) \]

does not hold for all x = (x1, x2) ∈ R^2. Here, X1 and X2 are not independent. The verification with the marginal p.d.f.s is left as an exercise in practice problem set 9.
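The following sketch is not part of the notes; it checks example (a) by direct enumeration of the joint p.m.f. (the dictionary encoding is an arbitrary choice): the covariance is zero, yet the joint p.m.f. does not factor into the product of its marginals.

```python
# Example (a): uncorrelated but not independent, checked by enumeration.
from itertools import product

joint = {(0, 0): 0.5, (1, 1): 0.25, (1, -1): 0.25}   # joint p.m.f. of (X1, X2)

def marginal(idx):
    m = {}
    for xy, p in joint.items():
        m[xy[idx]] = m.get(xy[idx], 0.0) + p
    return m

m1, m2 = marginal(0), marginal(1)
e1 = sum(x * p for x, p in m1.items())
e2 = sum(x * p for x, p in m2.items())
e12 = sum(x1 * x2 * p for (x1, x2), p in joint.items())
print("Cov(X1, X2) =", e12 - e1 * e2)                 # 0.0, hence uncorrelated

# Independence would require f(x1, x2) = f_X1(x1) * f_X2(x2) for every pair (x1, x2).
factorises = all(
    abs(joint.get((x1, x2), 0.0) - m1[x1] * m2[x2]) < 1e-12
    for x1, x2 in product(m1, m2)
)
print("Independent:", factorises)                      # False
```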

We now discuss the concept of equality of distribution for random vectors. As we shall see, the
ideas remain the same as in the case of RVs.

Definition 10.7 (Identically distributed random vectors). Let X and Y be two p-dimensional random vectors, possibly defined on different probability spaces. We say that they have the same law/distribution, or equivalently, X and Y are identically distributed, or equivalently, X and Y are equal in law/distribution, denoted by X \overset{d}{=} Y, if FX(x) = FY(x), ∀x ∈ R^p.

Remark 10.8. As discussed in the case of RVs in Remark 7.1, we can check whether two random
vectors are identically distributed or not via other quantities that describe their law/distribution.

(a) Let X and Y be p-dimensional discrete random vectors with joint p.m.f.s fX and fY ,
respectively. Then X and Y are identically distributed if and only if fX (x) = fY (x), ∀x ∈
Rp .
(b) Let X and Y be p-dimensional continuous random vectors with joint p.d.f.s fX and fY ,
respectively. Then X and Y are identically distributed if and only if fX (x) = fY (x), ∀x ∈
Rp .
(c) Let X and Y be p-dimensional random vectors such that their joint MGFs MX and MY exist and agree on (−a1, a1) × (−a2, a2) × · · · × (−ap, ap) for some a1, a2, · · · , ap > 0. Then X and Y are identically distributed.
(d) Let X and Y be identically distributed p-dimensional random vectors. Then for any func-
tion h : Rp → Rq , we have h(X) and h(Y ) are identically distributed q-dimensional random
vectors.

Notation 10.9 (i.i.d. RVs). If RVs X1, · · · , Xp defined on the same probability space are independent and identically distributed, then we usually use the shorthand notation i.i.d. and say that X1, · · · , Xp are i.i.d..

Note 10.10. The concept of independence of random vectors can be discussed in the same way
as done for independence of random variables.

Definition 10.11. (a) A random sample is a collection of i.i.d. RVs.


(b) A random sample of size n is a collection of n i.i.d. RVs X1 , X2 , · · · , Xn .
(c) Let X1, X2, · · · , Xn be a random sample of size n. If the common DF is F or the common p.m.f./p.d.f. is f, then we say that X1, X2, · · · , Xn is a random sample from a distribution having DF F or p.m.f./p.d.f. f.
(d) A function of one or more RVs that does not depend on any unknown parameter is called
a statistic.

Example 10.12. Suppose that X1, · · · , Xn are i.i.d. with the common distribution being Poisson(θ) or Exponential(θ) for some unknown θ ∈ (0, ∞). Here, θ is an unknown parameter.

(a) X̄ := (1/n) ∑_{i=1}^{n} Xi is a statistic and is usually referred to as the sample mean.
(b) X1 − θ is not a statistic.
(c) Sn^2 := (1/n) ∑_{i=1}^{n} (Xi − X̄)^2 is a statistic and is usually referred to as the sample variance. Depending on the situation, we sometimes work with (1/(n−1)) ∑_{i=1}^{n} (Xi − X̄)^2 instead.
(d) The value Sn ≥ 0 such that Sn^2 is the sample variance is referred to as the sample standard deviation.
(e) For r = 1, · · · , n, we denote by X(r:n) the r-th smallest of X1, · · · , Xn. By definition, X(1:n) ≤ · · · ≤ X(n:n), and these are called the order statistics of the random sample. If n is understood, then we simply write X(r) to denote the r-th order statistic.

Note 10.13. Let X1, X2 be a random sample of size 2. Then X(1) = min{X1, X2} = (1/2)(X1 + X2) − (1/2)|X1 − X2| and X(2) = max{X1, X2} = (1/2)(X1 + X2) + (1/2)|X1 − X2| are RVs. Using similar arguments, it follows that the order statistics from any random sample of size n are RVs. The joint distribution of the order statistics is therefore of interest.

Note 10.14. Let X1, · · · , Xn be a random sample of continuous RVs with the common p.d.f. f. Then,

\[ P(X_{(1)} < X_{(2)} < \cdots < X_{(n)}) = 1 \]

and hence X(r), r = 1, · · · , n are defined uniquely with probability one.

Proposition 10.15. Let X1, · · · , Xn be a random sample of continuous RVs with the common DF F and the common p.d.f. f. The joint p.d.f. of (X(1), · · · , X(n)) is given by

\[ g(y_1, \cdots, y_n) = \begin{cases} n! \prod_{i=1}^{n} f(y_i), & \text{if } y_1 < \cdots < y_n, \\ 0, & \text{otherwise.} \end{cases} \]

Further, the marginal p.d.f. of X(r) is given by

\[ g_{X_{(r)}}(y) = \frac{n!}{(r-1)!\,(n-r)!}\, (F(y))^{r-1} (1 - F(y))^{n-r} f(y), \quad \forall y \in \mathbb{R}. \]

Proof. Observe that a sample value (y1 , · · · , yn ) of (X(1) , · · · , X(n) ) is related to a sample (x1 , · · · , xn )
of (X1 , · · · , Xn ) in the following way

(y1 , · · · , yn ) = (x(1) , · · · , x(n) ),

x(r) being the r-th smallest of x1 , · · · , xn . Note that yr = x(r) .


Now, the actual values x1 , · · · , xn may have been arranged in a different order than x(1) , · · · , x(n) .
In fact, the values x(1) , · · · , x(n) arise from one of the n! permutations of the values x1 , · · · , xn .
But, any such transformation/permutation is obtained by the action of a permutation matrix
on the vector (x1 , · · · , xn ). For example, if x1 < x2 < · · · < xn−2 < xn < xn−1 , then x(1) =
x1 , · · · , x(n−2) = xn−2 , x(n−1) = xn , x(n) = xn−1 which interchanges the n − 1 and n-th values, i.e.
xn−1 and xn .
Hence, the Jacobian matrix for this transformation is the same as the corresponding permutation
matrix and the Jacobian determinant is ±1.
Since X1, · · · , Xn are i.i.d., the joint p.d.f. of (X1, · · · , Xn) is given by

\[ f_{X_1, \cdots, X_n}(x_1, \cdots, x_n) = f(x_1) \times \cdots \times f(x_n), \quad \forall (x_1, \cdots, x_n) \in \mathbb{R}^n. \]

Using Theorem 9.24, the joint p.d.f. of (X(1), · · · , X(n)) is given by

\[ g(y_1, \cdots, y_n) = \begin{cases} n! \prod_{i=1}^{n} f(y_i), & \text{if } y_1 < \cdots < y_n, \\ 0, & \text{otherwise.} \end{cases} \]

The marginal p.d.f. of X(r) can now be computed for y ∈ R, with y_r = y in the integrand, as

\[ g_{X_{(r)}}(y) = \int_{y_{r-1}=-\infty}^{y} \int_{y_{r-2}=-\infty}^{y_{r-1}} \cdots \int_{y_1=-\infty}^{y_2} \int_{y_{r+1}=y}^{\infty} \int_{y_{r+2}=y_{r+1}}^{\infty} \cdots \int_{y_n=y_{n-1}}^{\infty} n! \prod_{i=1}^{n} f(y_i) \, dy_n \, dy_{n-1} \cdots dy_{r+1} \, dy_1 \, dy_2 \cdots dy_{r-1}. \]

The above integral simplifies to the result stated above. □

Example 10.16. Let X1, X2, X3 be a random sample from the Uniform(0, 1) distribution. The common p.d.f. here is given by

\[ f(x) = \begin{cases} 1, & \text{if } x \in (0, 1), \\ 0, & \text{otherwise.} \end{cases} \]

By the above result, the joint p.d.f. of (X(1), X(2), X(3)) is given by

\[ g(y_1, y_2, y_3) = \begin{cases} 6, & \text{if } 0 < y_1 < y_2 < y_3 < 1, \\ 0, & \text{otherwise,} \end{cases} \]

and the marginal p.d.f. of X(1) is

\[ g_{X_{(1)}}(y_1) = \begin{cases} 3(1 - y_1)^2, & \text{if } y_1 \in (0, 1), \\ 0, & \text{otherwise.} \end{cases} \]
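A quick simulation sketch (illustrative, not from the notes) compares the empirical DF of X(1) = min{X1, X2, X3} for Uniform(0, 1) samples with the DF implied by the marginal p.d.f. above, namely P(X(1) ≤ t) = 1 − (1 − t)^3.

```python
# Monte Carlo check of Example 10.16: DF of the minimum of three Uniform(0,1) RVs.
import random

random.seed(1)
trials = 200_000
mins = [min(random.random() for _ in range(3)) for _ in range(trials)]

for t in (0.1, 0.25, 0.5, 0.75):
    empirical = sum(m <= t for m in mins) / trials
    exact = 1 - (1 - t) ** 3            # integral of 3(1 - y)^2 over (0, t)
    print(f"t = {t}: empirical = {empirical:.4f}, exact = {exact:.4f}")
```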

Remark 10.17. For random samples from discrete distributions, there is no general formula or result which helps in computing the joint distribution of the order statistics. Usually this is done by a case-by-case analysis. Let X1, X2, X3 be a random sample from the Bernoulli(p) distribution, for some p ∈ (0, 1). The common p.m.f. here is given by

\[ f(x) = \begin{cases} p, & \text{if } x = 1, \\ 1 - p, & \text{if } x = 0, \\ 0, & \text{otherwise.} \end{cases} \]

Note that X(1) is also a {0, 1}-valued RV with X(1) = min{X1, X2, X3} = 1 if and only if X1 = X2 = X3 = 1. Then, using independence,

\[ P(X_{(1)} = 1) = P(X_1 = 1, X_2 = 1, X_3 = 1) = P(X_1 = 1)\, P(X_2 = 1)\, P(X_3 = 1) = p^3 \]

and P(X(1) = 0) = 1 − P(X(1) = 1) = 1 − p^3. Therefore, X(1) ∼ Bernoulli(p^3). Similarly, X(3) ∼ Bernoulli(1 − (1 − p)^3). The distribution of X(2) is left as an exercise in problem set 10.
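A simulation sketch (illustrative only; p = 0.4 is an arbitrary choice) confirming the distributions of X(1) and X(3) stated above. The empirical value for X(2) is printed as well, for comparison with the exercise, without giving its closed form.

```python
# Simulation check of Remark 10.17: order statistics of a Bernoulli(p) sample of size 3.
import random

random.seed(2)
p, trials = 0.4, 200_000
count = [0, 0, 0]                        # successes of X_(1), X_(2), X_(3)
for _ in range(trials):
    xs = sorted(int(random.random() < p) for _ in range(3))
    for i in range(3):
        count[i] += xs[i]

print("P(X_(1) = 1):", count[0] / trials, "vs p^3 =", p ** 3)
print("P(X_(3) = 1):", count[2] / trials, "vs 1 - (1-p)^3 =", 1 - (1 - p) ** 3)
print("P(X_(2) = 1):", count[1] / trials)   # compare with the exercise in problem set 10
```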

Earlier, we have discussed the concept of conditional distributions and the concept of expectation
of a random vector. Combining these two concepts, we are led to the following.

Definition 10.18 (Conditional Expectation, Conditional Variance and Conditional Covariance). Let X = (X1, X2, · · · , Xp+q) be a (p + q)-dimensional random vector with joint p.m.f./p.d.f. fX. Let the joint p.m.f./p.d.f.s of Y = (X1, X2, · · · , Xp) and Z = (Xp+1, Xp+2, · · · , Xp+q) be denoted by fY and fZ, respectively. Let h : R^p → R be a function. Let z ∈ R^q be such that fZ(z) > 0.

(a) The conditional expectation of h(Y ) given Z = z, denoted by E(h(Y ) | Z = z), is the
expectation of h(Y ) under the conditional distribution of Y given Z = z.
(b) The conditional variance of h(Y ) given Z = z, denoted by V ar(h(Y ) | Z = z), is the
variance of h(Y ) under the conditional distribution of Y given Z = z.
(c) Let 1 ≤ i ̸= j ≤ p. The conditional covariance between Xi and Xj given Z = z, denoted
by Cov(Xi , Xj | Z = z), is the covariance between Xi and Xj under the conditional
distribution of (Xi , Xj ) given Z = z.

Notation 10.19. On {z ∈ R^q : fZ(z) > 0}, consider the function g1(z) := E(h(Y) | Z = z). We denote the RV g1(Z) by E(h(Y) | Z). Similarly, we define the RVs Var(h(Y) | Z) and Cov(X1, X2 | Z).

Proposition 10.20. The following are properties of conditional expectation, conditional variance and conditional covariance. Here, we assume that the relevant expectations exist.

(a) Eh(Y) = E(E(h(Y) | Z)).
(b) Var(h(Y)) = Var(E(h(Y) | Z)) + E(Var(h(Y) | Z)).
(c) Cov(X1, X2) = Cov(E(X1 | Z), E(X2 | Z)) + E(Cov(X1, X2 | Z)).

Proof. We only prove the first statement under a simple assumption. The general case and the other statements can be proved using appropriate generalizations.
Take p = q = 1 and let X = (Y, Z) be a 2-dimensional continuous random vector. Then,

\[
E(E(h(Y) \mid Z)) = \int_{-\infty}^{\infty} E(h(Y) \mid Z = z)\, f_Z(z)\, dz
= \int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} h(y)\, f_{Y|Z}(y \mid z)\, dy \right) f_Z(z)\, dz
= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h(y)\, f_{Y,Z}(y, z)\, dy\, dz
= Eh(Y). \qquad \Box
\]
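The next sketch (illustrative; the toy joint p.m.f. is an arbitrary choice) checks property (a) on a discrete example: averaging the conditional expectations E(Y | Z = z) with weights fZ(z) recovers EY.

```python
# Check of Proposition 10.20(a) on a toy joint p.m.f. for (Y, Z).
joint = {  # (y, z): probability
    (0, 0): 0.1, (1, 0): 0.3, (2, 0): 0.1,
    (0, 1): 0.2, (1, 1): 0.1, (2, 1): 0.2,
}

f_z = {}
for (y, z), p in joint.items():
    f_z[z] = f_z.get(z, 0.0) + p

def cond_exp_y(z):
    # E(Y | Z = z) under the conditional p.m.f. f(y, z) / f_Z(z)
    return sum(y * p for (y, zz), p in joint.items() if zz == z) / f_z[z]

ey = sum(y * p for (y, z), p in joint.items())
tower = sum(cond_exp_y(z) * pz for z, pz in f_z.items())
print(ey, tower)   # both equal E(Y)
```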

Example 10.21. We shall see computations for conditional expectations in a later lecture.

We now look at examples of discrete RVs in relation to random experiments.

Remark 10.22 (Binomial RVs via random experiments). Recall that in Remark 7.20, we have seen Bernoulli RVs arising from Bernoulli trials. Consider again a random experiment with two outcomes ‘Success’ and ‘Failure’ with probability of success p ∈ (0, 1), and consider n independent Bernoulli trials of this experiment with the RV Xi being 1 for ‘Success’ and 0 for ‘Failure’ in the i-th trial, for i = 1, 2, · · · , n. Then, X1, X2, · · · , Xn is a random sample of size n from the Bernoulli(p) distribution. Now, the total number X of successes in the n trials is given by X = X1 + X2 + · · · + Xn and hence, by Remark 9.23, X ∼ Binomial(n, p). A Binomial(n, p) RV can therefore be interpreted as the number of successes in n trials of a random experiment with two outcomes ‘Success’ and ‘Failure’ with probability of success p ∈ (0, 1). Here, we have kept p fixed over all the trials.

Example 10.23. Suppose that a standard six-sided fair die is rolled at random 4 times independently. We now consider the probability that all the rolls result in a number at least 5. In each roll, obtaining at least 5 has probability 2/6 = 1/3; we treat this as the probability of success in one trial. Repeating the trial four times independently gives us the number of successes as X ∼ Binomial(4, 1/3). The probability that all the rolls result in successes is given by P(X = 4), which can now be computed from the Binomial distribution. If we now consider the probability that at least two rolls result in a number at least 5, then that probability is given by P(X ≥ 2).
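A short computational sketch (illustrative) evaluating the two probabilities from Example 10.23 directly from the Binomial(4, 1/3) p.m.f.

```python
# Binomial(4, 1/3) probabilities for Example 10.23.
from math import comb

n, p = 4, 1 / 3
pmf = lambda k: comb(n, k) * p**k * (1 - p) ** (n - k)

print("P(X = 4)  =", pmf(4))                              # all four rolls show 5 or 6
print("P(X >= 2) =", sum(pmf(k) for k in range(2, n + 1)))
```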

Example 10.24 (Negative Binomial RV). Consider a random experiment with two outcomes
‘Success’ and ‘Failure’ with probability of success p ∈ (0, 1). We consider repeating the experiment
until we have r successes, with r being a positive integer. Let X denote the number of failures
observed till the r-th success. Then X is a discrete RV with the support of X being SX = {0, 1, · · · }.
Note that for k ∈ SX , using independence of the trials we have

P(X = k)

= P(there are k failures before the r-th success)

= P(first k + r − 1 trials result in r − 1 successes and the (k + r)-th trial results in a success)

= P(first k + r − 1 trials result in r − 1 successes) × P(the (k + r)-th trial results in a success)

\[ = \binom{k + r - 1}{r - 1} p^{r-1} (1 - p)^{k} \times p = \binom{k + r - 1}{k} p^{r} (1 - p)^{k}. \]
Therefore the p.m.f. of X is given by

\[ f_X(x) = \begin{cases} \binom{x + r - 1}{x} p^{r} (1 - p)^{x}, & \text{if } x \in S_X, \\ 0, & \text{otherwise.} \end{cases} \]

In this case, we say X follows the negative Binomial(r, p) distribution or equivalently, X is a negative Binomial(r, p) RV. Here, r denotes the number of successes at which the trials are terminated and p is the probability of success. The MGF can now be computed as follows.

\[
M_X(t) = E e^{tX} = \sum_{k=0}^{\infty} e^{tk} \binom{k + r - 1}{k} p^{r} (1 - p)^{k}
= p^{r} \sum_{k=0}^{\infty} \binom{k + r - 1}{k} \left[ (1 - p) e^{t} \right]^{k}
= p^{r} \left[ 1 - (1 - p) e^{t} \right]^{-r}, \quad \forall t < -\ln(1 - p).
\]

Using the MGF, we can compute the mean and variance of X as EX = rq/p and Var(X) = rq/p^2, with q = 1 − p.
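A small sketch (illustrative only; r = 3, p = 0.4 are arbitrary) that evaluates the negative Binomial(r, p) p.m.f. above, checks numerically that it sums to 1, and compares the resulting mean and variance with rq/p and rq/p^2.

```python
# Numerical check of the negative Binomial(r, p) p.m.f., mean and variance.
from math import comb

r, p = 3, 0.4
q = 1 - p
pmf = lambda k: comb(k + r - 1, k) * p**r * q**k

ks = range(0, 2000)                       # truncation of the infinite support
total = sum(pmf(k) for k in ks)
mean = sum(k * pmf(k) for k in ks)
var = sum(k * k * pmf(k) for k in ks) - mean**2

print("sum of p.m.f. ~", total)           # close to 1
print("mean:", mean, "vs r*q/p =", r * q / p)
print("variance:", var, "vs r*q/p^2 =", r * q / p**2)
```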

Remark 10.25 (Connection between the negative Binomial distribution and the Geometric distribution). A negative Binomial(1, p) RV X has the p.m.f.

\[ f_X(x) = \begin{cases} p (1 - p)^{x}, & \text{if } x \in \{0, 1, \cdots\}, \\ 0, & \text{otherwise,} \end{cases} \]

which is exactly the same as the p.m.f. of the Geometric(p) distribution. Since the p.m.f. of a discrete RV determines the distribution, we conclude that a Geometric(p) RV can be identified as the number of failures observed till the first success in independent trials of a random experiment with two outcomes ‘Success’ and ‘Failure’ with probability of success p ∈ (0, 1).

Note 10.26 (No memory property for the Geometric Distribution). Let X ∼ Geometric(p) for some p ∈ (0, 1). For any non-negative integer n, we have

\[ P(X \ge n) = \sum_{k=n}^{\infty} p (1 - p)^{k} = p (1 - p)^{n} \sum_{k=0}^{\infty} (1 - p)^{k} = (1 - p)^{n}. \]

Then, for any non-negative integers m, n, we have

\[ P(X \ge m + n \mid X \ge m) = \frac{P(X \ge m + n \text{ and } X \ge m)}{P(X \ge m)} = \frac{P(X \ge m + n)}{P(X \ge m)} = (1 - p)^{n} = P(X \ge n). \]

Here, the probability of obtaining at least n additional failures (till the first success) beyond the first m or more failures remains the same as the probability of obtaining at least n failures till the first success. In the situation where we stress test a device under repeated shocks, if we consider the survival or continued operation of the device under shocks as ‘Failures’ in our trials and if the number of shocks till the device breaks down follows the Geometric(p) distribution, then we can interpret that the age of the device (measured in the number of shocks observed) has no effect on its remaining lifetime. This property is usually referred to as the ‘No memory’ property of the Geometric distribution.
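A simulation sketch (illustrative; p = 0.3, m = 3, n = 2 are arbitrary) verifying the no memory property empirically, counting failures before the first success.

```python
# Simulation check of the no memory property of Geometric(p).
import random

random.seed(3)
p, trials = 0.3, 300_000

def geometric_failures(p):
    k = 0
    while random.random() >= p:   # a failure occurs with probability 1 - p
        k += 1
    return k

xs = [geometric_failures(p) for _ in range(trials)]
m, n = 3, 2
given_m = [x for x in xs if x >= m]
lhs = sum(x >= m + n for x in given_m) / len(given_m)   # P(X >= m + n | X >= m)
rhs = sum(x >= n for x in xs) / trials                  # P(X >= n)
print(lhs, rhs, (1 - p) ** n)                           # all approximately equal
```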

Note 10.27. See problem set 9 for a similar property for the Exponential distribution.

Example 10.28. Let us consider the random experiment of rolling a standard six-sided fair die till we observe an outcome of at least 5. As mentioned in Example 10.23, the probability of success is 1/3. Since the last roll results in a success, the number Y of rolls required is exactly one more than the number X of failures observed. Here X ∼ Geometric(1/3). Then, the probability that an outcome of 5 or 6 is observed in the 10-th roll for the first time is given by

\[ P(Y = 10) = P(X = 9) = \frac{1}{3} \left( \frac{2}{3} \right)^{9}. \]

If we want to look at Z, which is the number of failures observed till 5 or 6 is rolled twice, then Z follows the negative Binomial(2, 1/3) distribution. Now, the number of rolls required is Z + 2. The probability that 10 rolls are required is given by

\[ P(Z + 2 = 10) = P(Z = 8) = \binom{9}{8} \left( \frac{1}{3} \right)^{2} \left( \frac{2}{3} \right)^{8}. \]

Note 10.29. Suppose that a box contains N items, out of which M items have been marked/labelled. In our experiment, we consider all labelled items to be identical, and similarly for all the unlabelled items. If we draw items from the box with replacement, then the probability of drawing a marked/labelled item is M/N and does not change between the draws. If we draw n items at random with replacement, then the number X of marked/labelled items follows the Binomial(n, M/N) distribution. The case where the draws are conducted without replacement is of interest.

Example 10.30 (Hypergeometric RV). In the setup of Note 10.29, consider drawing n items at random without replacement. Here, the probability of drawing a marked/labelled item may change between the draws and the number X of marked/labelled items in the n drawn items need not follow the Binomial(n, M/N) distribution. Here, the number of labelled items among the items drawn satisfies the relation

\[ 0 \le X \le \min\{n, M\} \le N \]

and the number of unlabelled items among the items drawn satisfies the relation

\[ 0 \le n - X \le N - M, \]

and hence X is a discrete RV with support SX = {max{0, n − (N − M)}, max{0, n − (N − M)} + 1, · · · , min{n, M}}. The p.m.f. of X is given by

\[ f_X(x) = \begin{cases} \dfrac{\binom{M}{x} \binom{N - M}{n - x}}{\binom{N}{n}}, & \text{if } x \in S_X, \\ 0, & \text{otherwise.} \end{cases} \]

In this case, we say X follows the Hypergeometric distribution or equivalently, X is a Hypergeometric RV. This distribution has the three parameters N, M and n. Using properties of binomial coefficients, we can compute the factorial moments of X (left as an exercise in problem set 10) and using these values we have

\[ EX = \frac{nM}{N}, \qquad Var(X) = \frac{nM}{N^{2}(N - 1)} (N - M)(N - n) = n \frac{M}{N} \left( 1 - \frac{M}{N} \right) \frac{N - n}{N - 1}. \]

Note 10.31. In the setup of a Hypergeometric RV, if we consider p = M/N as the probability of success and n as the number of trials, then EX matches that of a Binomial(n, M/N) RV, and Var(X) is close to that of a Binomial(n, M/N) RV when the sample size n is small relative to N.

Example 10.32. Suppose that there are multiple boxes, each containing 100 electric bulbs, and we draw 5 bulbs from each box for testing. If a box contains 10 defective bulbs, then the number X of defective bulbs among the drawn bulbs follows the Hypergeometric distribution with parameters N = 100, M = 10, n = 5. Here,

\[ P(X = 2) = \frac{\binom{10}{2} \binom{100 - 10}{5 - 2}}{\binom{100}{5}}. \]
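A short sketch (illustrative) computing P(X = 2) from Example 10.32 using the Hypergeometric p.m.f. and, in the spirit of Note 10.31, comparing it with the corresponding Binomial(5, 1/10) value.

```python
# Hypergeometric probability from Example 10.32 and the Binomial comparison of Note 10.31.
from math import comb

N, M, n = 100, 10, 5
hyper = lambda x: comb(M, x) * comb(N - M, n - x) / comb(N, n)
p = M / N
binom = lambda x: comb(n, x) * p**x * (1 - p) ** (n - x)

print("Hypergeometric P(X = 2):", hyper(2))
print("Binomial(5, 0.1) P(X = 2):", binom(2))   # close, since n is small relative to N
```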

Note 10.33. We continue with the setting of Note 10.29, where a box contains N items, out of which M items have been marked/labelled or are defective. In our experiment, we consider all labelled items to be identical, and similarly for all the unlabelled items. If we draw items from the box with replacement until the r-th defective item is drawn, then the number of draws required can be described in terms of the negative Binomial(r, M/N) distribution, where the last draw yields the r-th defective item (see Example 10.28). The case where the draws are conducted without replacement is of interest.

Example 10.34 (Negative Hypergeometric RV). In the setting of Note 10.33, consider drawing the items without replacement till the r-th defective item is obtained. We then have 1 ≤ r ≤ M. Let X be the number of draws required. Then X is a discrete RV taking values in SX = {r, r + 1, · · · , N} (in fact, X ≤ N − M + r, since at most N − M non-defective items can be drawn). For k ∈ SX, conditioning on the first k − 1 draws, we have

P(X = k)

= P(first k − 1 draws result in r − 1 defective items and the k-th draw results in a defective item)

= P(first k − 1 draws result in r − 1 defective items) × P(the k-th draw results in a defective item | the first k − 1 draws result in r − 1 defective items)

\[ = \frac{\binom{M}{r-1} \binom{N-M}{k-r}}{\binom{N}{k-1}} \times \frac{M - (r - 1)}{N - (k - 1)}. \]

Therefore the p.m.f. of X is given by

\[ f_X(x) = \begin{cases} \dfrac{M - (r - 1)}{N - (x - 1)} \cdot \dfrac{\binom{M}{r-1} \binom{N-M}{x-r}}{\binom{N}{x-1}}, & \text{if } x \in \{r, r + 1, \cdots, N\}, \\ 0, & \text{otherwise.} \end{cases} \]

In this case, we say X follows the negative Hypergeometric distribution or equivalently, X is a negative Hypergeometric RV.
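A small sketch (illustrative; N = 20, M = 5, r = 2 are arbitrary) evaluating the negative Hypergeometric p.m.f. above, checking that it sums to 1 over the support, and comparing a few values against a simulation of draws without replacement.

```python
# Check of the negative Hypergeometric p.m.f.: draws without replacement until the r-th defective.
import random
from math import comb

N, M, r = 20, 5, 2        # arbitrary illustrative parameters with 1 <= r <= M

def pmf(x):
    # p.m.f. from Example 10.34; math.comb returns 0 when the lower index exceeds the upper one
    return (M - (r - 1)) / (N - (x - 1)) * comb(M, r - 1) * comb(N - M, x - r) / comb(N, x - 1)

print("sum of p.m.f.:", sum(pmf(x) for x in range(r, N + 1)))   # 1.0

# Simulation: shuffle the box and count draws until the r-th defective item appears.
random.seed(4)
trials = 100_000
box = [1] * M + [0] * (N - M)    # 1 marks a defective item
counts = {}
for _ in range(trials):
    random.shuffle(box)
    seen = draws = 0
    for item in box:
        draws += 1
        seen += item
        if seen == r:
            break
    counts[draws] = counts.get(draws, 0) + 1

for x in range(r, r + 4):
    print(x, counts.get(x, 0) / trials, pmf(x))   # empirical vs exact
```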
