Random Variables PDF
Fall 2015
Instructor: Ajit Rajwade
Topic Overview
Random variable: definition
Discrete and continuous random variables
Probability density function (pdf) and cumulative distribution function (cdf)
Random variable

In many random experiments, we are not always interested in the raw outcome itself but in some numerical quantity determined by it. A random variable X is a function that assigns a real number to each outcome of the experiment.
Example: let X be the sum of two fair dice throws. The pmf of X is:

Value of X (denoted as x)   P(X = x)
 2                          1/36
 3                          2/36
 4                          3/36
 5                          4/36
 6                          5/36
 7                          6/36
 8                          5/36
 9                          4/36
10                          3/36
11                          2/36
12                          1/36
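The table above can be reproduced by brute-force enumeration of the 36 equally likely outcomes; a minimal Python sketch:

```python
from fractions import Fraction
from itertools import product
from collections import Counter

# Enumerate all 36 equally likely outcomes of two fair dice and tally
# the sum; each outcome contributes probability 1/36.
counts = Counter(d1 + d2 for d1, d2 in product(range(1, 7), repeat=2))
pmf = {s: Fraction(c, 36) for s, c in sorted(counts.items())}

for s, p in pmf.items():
    print(s, p)

# The pmf is symmetric about 7 and sums to 1.
assert sum(pmf.values()) == 1
```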
The set of values a random variable can take is called its alphabet. Individual values the random variable can acquire are outcomes of the underlying random experiments.
The probability that a continuous random variable X takes on a value in a set B is obtained by integrating its probability density function (pdf) f_X over B:

P\{X \in B\} = \int_B f_X(x)\,dx

Properties:

\int_{-\infty}^{\infty} f_X(x)\,dx = 1

P(X = a) = \int_a^a f_X(x)\,dx = 0

\int_{a-\epsilon/2}^{a+\epsilon/2} f_X(x)\,dx \approx \epsilon f_X(a) \quad \text{for small } \epsilon

f_X(a) = \lim_{\epsilon \to 0} \frac{P\{a - \epsilon/2 \le X \le a + \epsilon/2\}}{\epsilon}
Examples:

The Gaussian (normal) density with mean \mu and standard deviation \sigma:

f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x-\mu)^2/(2\sigma^2)}

The uniform density on [a, b]:

f_X(x) = \frac{1}{b-a}, \quad a \le x \le b; \qquad 0 \text{ otherwise}
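Both example densities must integrate to 1. A quick numerical check with a crude midpoint rule (the parameter values \mu = 0, \sigma = 1, a = 2, b = 5 are arbitrary choices for illustration):

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    # f_X(x) = 1/(sqrt(2*pi)*sigma) * exp(-(x-mu)^2 / (2*sigma^2))
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def uniform_pdf(x, a=2.0, b=5.0):
    # f_X(x) = 1/(b-a) on [a, b], 0 otherwise
    return 1.0 / (b - a) if a <= x <= b else 0.0

def riemann(f, lo, hi, n=200_000):
    # Crude midpoint rule; good enough to check normalization.
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h

g_total = riemann(gaussian_pdf, -10, 10)   # tail beyond +-10 sigma is negligible
u_total = riemann(uniform_pdf, 0, 10)
print(g_total, u_total)                    # both close to 1.0
```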
The expected value (mean) of a continuous random variable X is

E(X) = \int_{-\infty}^{\infty} x f_X(x)\,dx
A Game of Roulette
E(g(X)) = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx

E(a\,g(X) + b) = \int_{-\infty}^{\infty} (a\,g(x) + b) f_X(x)\,dx
= a \int_{-\infty}^{\infty} g(x) f_X(x)\,dx + b \int_{-\infty}^{\infty} f_X(x)\,dx
= a\,E(g(X)) + b \quad \text{(why?)}
This property is called the linearity of the expected value. In general, a function f(x) is said to be linear in x if f(ax+b) = af(x)+b, where a and b are constants. In this case, the expected value is not a function but an operator (it takes a function as input). An operator E is said to be linear if E(af(x) + b) = a E(f(x)) + b.
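Linearity can be verified exactly on a small discrete distribution (the pmf, g, a, and b below are illustrative choices, not from the slides):

```python
from fractions import Fraction

# A small discrete rv: X takes values 1..4 with these probabilities.
pmf = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 8), 4: Fraction(1, 8)}

def expect(g):
    # E[g(X)] = sum_x g(x) P(X = x)
    return sum(g(x) * p for x, p in pmf.items())

a, b = 3, 7
g = lambda x: x * x
lhs = expect(lambda x: a * g(x) + b)   # E[a g(X) + b]
rhs = a * expect(g) + b                # a E[g(X)] + b
assert lhs == rhs
print(lhs)
```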
Let \mu = E(X) and let c be any constant. Then

E((X-c)^2) = E((X - \mu + \mu - c)^2)
= E((X-\mu)^2 + (\mu-c)^2 + 2(X-\mu)(\mu-c))
= E((X-\mu)^2) + E((\mu-c)^2) + 2 E((X-\mu)(\mu-c))
= E((X-\mu)^2) + (\mu-c)^2 + 0
\ge E((X-\mu)^2)

The expected value is the value c = \mu that yields the least mean squared prediction error!
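This is easy to confirm empirically: over a grid of candidate predictors c, the sample mean achieves the smallest mean squared error (the distribution and grid below are arbitrary choices):

```python
import random, statistics

random.seed(0)
xs = [random.gauss(5.0, 2.0) for _ in range(100_000)]
mu = statistics.fmean(xs)

def mse(c):
    # Empirical E[(X - c)^2]
    return statistics.fmean((x - c) ** 2 for x in xs)

# Scan candidate predictors around the sample mean; mu itself should win,
# since empirically mse(c) = variance + (c - mu)^2.
candidates = [mu + d / 10 for d in range(-20, 21)]
best = min(candidates, key=mse)
print(best, mu)
```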
The median

What minimizes the following quantity?

J(c) = \int_{-\infty}^{\infty} |x - c|\, f_X(x)\,dx
= \int_{-\infty}^{c} (c - x) f_X(x)\,dx + \int_{c}^{\infty} (x - c) f_X(x)\,dx
The median

Expanding, with Q(c) = \int_{-\infty}^{c} x f_X(x)\,dx (so Q(-\infty) = 0) and q(c) = Q'(c) = c f_X(c):

J(c) = 2 c F_X(c) - c - 2 Q(c) + Q(\infty)

Setting J'(c) = 0:

2 c f_X(c) + 2 F_X(c) - 1 - 2 q(c) = 0
2 c f_X(c) + 2 F_X(c) - 1 - 2 c f_X(c) = 0
2 F_X(c) - 1 = 0
F_X(c) = 1/2
This is the median by definition, and it minimizes J(c); we can double-check that it is a minimum since J''(c) = 2 f_X(c) \ge 0. Notice the peculiar definition of the median for the continuous case here! This definition is not conceptually different from the discrete case, though. Also, note that the median will not be unique if F_X(c) = 1/2 holds over an entire interval (i.e., if the density vanishes there).
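The analogous empirical fact: the sample median minimizes the mean absolute error. A sketch on a skewed sample (distribution choice is arbitrary), where the mean and median differ clearly:

```python
import random, statistics

random.seed(1)
# A skewed (exponential) sample, so the mean and median differ.
xs = [random.expovariate(1.0) for _ in range(50_001)]
med = statistics.median(xs)

def mae(c):
    # Empirical E[|X - c|]
    return statistics.fmean(abs(x - c) for x in xs)

# The sample median beats nearby candidates, and beats the sample mean.
assert mae(med) <= mae(med + 0.05)
assert mae(med) <= mae(med - 0.05)
assert mae(med) < mae(statistics.fmean(xs))
```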
Variance

The variance of a random variable X tells you how much the values of X deviate, on average, from the mean. Its (positive) square root is the standard deviation.

For some distributions, the variance (and hence standard deviation) may not be defined, because the integral may not have a finite value.

Low-variance probability mass functions or probability densities tend to be concentrated around one point. High-variance densities are spread out.
Alternative expression:

Var(X) = E[(X-\mu)^2] = E[X^2 + \mu^2 - 2\mu X]
= E[X^2] + \mu^2 - 2\mu E[X]
= E[X^2] + \mu^2 - 2\mu^2 \quad \text{(why?)}
= E[X^2] - \mu^2
= E[X^2] - (E[X])^2
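The identity Var(X) = E[X^2] - (E[X])^2 can be checked exactly on a small pmf (the pmf below is an arbitrary illustration):

```python
from fractions import Fraction

# Exact check of Var(X) = E[X^2] - (E[X])^2 on a small pmf.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 4: Fraction(1, 4)}

EX  = sum(x * p for x, p in pmf.items())                  # E[X]
EX2 = sum(x * x * p for x, p in pmf.items())              # E[X^2]
var_def = sum((x - EX) ** 2 * p for x, p in pmf.items())  # E[(X - mu)^2]

assert var_def == EX2 - EX ** 2
print(var_def)
```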
Variance: properties

Property: for constants a and b, Var(aX + b) = a^2 Var(X).
Probabilistic inequalities

Sometimes we know the mean or variance of a random variable, but not its full distribution. Probabilistic inequalities let us bound the probabilities of events using only these summary quantities.

Example: let's say the average annual salary offered to a graduating student is 100K. What can we say about the probability that a student is offered a salary of at least 110K?
Markov's inequality

Let X be a random variable that takes only non-negative values. Then, for any a > 0,

P\{X \ge a\} \le \frac{E[X]}{a}
Proof:

E[X] = \int_0^{\infty} x f_X(x)\,dx
= \int_0^{a} x f_X(x)\,dx + \int_a^{\infty} x f_X(x)\,dx
\ge \int_a^{\infty} x f_X(x)\,dx
\ge \int_a^{\infty} a f_X(x)\,dx
= a \int_a^{\infty} f_X(x)\,dx
= a P\{X \ge a\}
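A quick simulation showing the Markov bound holding (with plenty of slack) for a non-negative random variable; the exponential distribution is an arbitrary choice:

```python
import random, statistics

random.seed(2)
# Non-negative rv: exponential with mean 1.
xs = [random.expovariate(1.0) for _ in range(200_000)]
mean = statistics.fmean(xs)

for a in (0.5, 1.0, 2.0, 5.0):
    tail = sum(x >= a for x in xs) / len(xs)   # empirical P{X >= a}
    bound = mean / a                           # Markov bound E[X]/a
    print(a, tail, bound)
    assert tail <= bound
```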
Chebyshev's inequality

For a random variable X with mean \mu and variance \sigma^2, and any k > 0,

P\{|X - \mu| \ge k\} \le \frac{\sigma^2}{k^2}

Proof: follows from Markov's inequality, since (X - \mu)^2 is a non-negative random variable:

P\{(X-\mu)^2 \ge k^2\} \le E[(X-\mu)^2]/k^2 = \sigma^2/k^2
\therefore P\{|X - \mu| \ge k\} \le \sigma^2/k^2

If I replace k by k\sigma, I get the following:

P\{|X - \mu| \ge k\sigma\} \le \frac{1}{k^2}
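The bound 1/k^2 is distribution-free, so it should hold for any sample; a sketch using a Gaussian sample (an arbitrary choice, for which the true tails are far smaller than the bound):

```python
import random, statistics

random.seed(3)
xs = [random.gauss(0.0, 1.0) for _ in range(200_000)]
mu, sigma = statistics.fmean(xs), statistics.pstdev(xs)

for k in (1.5, 2.0, 3.0):
    # Empirical P{|X - mu| >= k*sigma} vs. the Chebyshev bound 1/k^2.
    tail = sum(abs(x - mu) >= k * sigma for x in xs) / len(xs)
    print(k, tail, 1.0 / k ** 2)
    assert tail <= 1.0 / k ** 2
```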
Back to the salary example, with mean \mu = 100K. Markov's inequality gives

P\{X \ge 110K\} \le \frac{100K}{110K} = 0.9090 \approx 90\%

If we additionally know the variance, say \sigma^2 = 50K, Chebyshev's inequality gives

P\{|X - 100K| \ge 10K\} \le \frac{50K}{10K \times 10K} = 0.0005 = 0.05\%

P\{|X - 100K| < 10K\} \ge 1 - 0.05\% = 99.95\%
The (weak) law of large numbers: https://en.wikipedia.org/wiki/Law_of_large_numbers

For i.i.d. random variables X_1, X_2, \ldots, X_n with mean \mu, and any \epsilon > 0,

P\left\{\left|\frac{X_1 + X_2 + \ldots + X_n}{n} - \mu\right| > \epsilon\right\} \to 0 \text{ as } n \to \infty
The quantity \frac{X_1 + X_2 + \ldots + X_n}{n} is called the empirical (or sample) mean. The weak law follows from Chebyshev's inequality. Assuming the X_i are i.i.d. with variance \sigma^2:

E\left(\frac{X_1 + X_2 + \ldots + X_n}{n}\right) = \mu, \qquad Var\left(\frac{X_1 + X_2 + \ldots + X_n}{n}\right) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}

P\left\{\left|\frac{X_1 + X_2 + \ldots + X_n}{n} - \mu\right| \ge \epsilon\right\} \le \frac{\sigma^2}{n\epsilon^2}

\lim_{n \to \infty} P\left\{\left|\frac{X_1 + X_2 + \ldots + X_n}{n} - \mu\right| \ge \epsilon\right\} = 0
The strong law of large numbers states:

P\left(\lim_{n \to \infty} \frac{X_1 + X_2 + \ldots + X_n}{n} = \mu\right) = 1

This is stronger than the weak law because it states that the sample mean converges to \mu with probability 1, not merely in probability.
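A simulation of the law of large numbers with fair-die throws (true mean 3.5); the sample sizes are arbitrary:

```python
import random, statistics

random.seed(4)

def sample_mean(n):
    # Empirical mean of n fair-die throws; true mean is 3.5.
    return statistics.fmean(random.randint(1, 6) for _ in range(n))

# Deviation of the sample mean from the true mean at increasing n.
errs = {n: abs(sample_mean(n) - 3.5) for n in (100, 10_000, 1_000_000)}
print(errs)
```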
Joint distributions/pdfs/pmfs
Joint CDFs
Given continuous random variables X and Y, their joint cumulative distribution function (CDF) is defined as

F_{XY}(x, y) = P(X \le x, Y \le y)

The distribution of either random variable alone (called the marginal distribution) is obtained from the joint CDF:

F_X(x) = P(X \le x, Y \le \infty) = F_{XY}(x, \infty)
F_Y(y) = P(X \le \infty, Y \le y) = F_{XY}(\infty, y)
Joint PMFs

Given two discrete random variables X and Y, their joint probability mass function is

p(x_i, y_j) = P\{X = x_i, Y = y_j\}

and the marginal pmf of X is recovered by summing over the values of Y:

P\{X = x_i\} = \sum_j p(x_i, y_j)

Why?
Example: suppose 15% of families have no children, 20% have only one child, 35% have two children and 30% have three children. Let us suppose that male and female children are equally likely and independent. Let B and G denote the number of boys and girls in a randomly chosen family.

What is the probability that a randomly chosen family has no children?
P(B = 0, G = 0) = 0.15 = P(no children)
Has 1 girl child?
P(B=0,G=1)=P(1 child) P(G=1|1 child) = 0.2 x 0.5 = 0.1
Has 3 girls?
P(B = 0, G = 3) = P(3 children) P(G = 3 | 3 children) = 0.3 x (0.5)^3 = 0.0375
Has 2 boys and 1 girl?
P(B = 2, G = 1) = P(3 children) P(B = 2, G = 1 | 3 children) = 0.3 x (1/8) x 3 = 0.1125 (all 8 combinations of 3 children are equally likely; out of these, 3 are of the form 2 boys + 1 girl)
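The whole joint pmf of (B, G) can be built by enumerating sex sequences; a sketch of the example above:

```python
from fractions import Fraction
from itertools import product

# Joint pmf of (B, G) = (#boys, #girls), with 15%/20%/35%/30% of families
# having 0/1/2/3 children and each child a boy or girl with prob. 1/2.
n_children = {0: Fraction(15, 100), 1: Fraction(20, 100),
              2: Fraction(35, 100), 3: Fraction(30, 100)}

pmf = {}
for n, pn in n_children.items():
    # Given n children, each of the 2^n sex sequences is equally likely.
    for seq in product("BG", repeat=n):
        key = (seq.count("B"), seq.count("G"))
        pmf[key] = pmf.get(key, Fraction(0)) + pn / 2 ** n

assert sum(pmf.values()) == 1
assert pmf[(0, 0)] == Fraction(15, 100)                   # no children
assert pmf[(0, 1)] == Fraction(1, 10)                     # one girl
assert pmf[(0, 3)] == Fraction(30, 100) * Fraction(1, 8)  # three girls
assert pmf[(2, 1)] == Fraction(30, 100) * Fraction(3, 8)  # 2 boys, 1 girl
```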
Joint PDFs

For two jointly continuous random variables X and Y with joint pdf f_{XY}, the probability that (X, Y) lies in a region C is

P\{(X, Y) \in C\} = \iint_{(x,y) \in C} f_{XY}(x, y)\,dx\,dy

The joint CDF is obtained as follows:

F_{XY}(a, b) = \int_{-\infty}^{a} \int_{-\infty}^{b} f_{XY}(x, y)\,dy\,dx

and conversely,

f_{XY}(a, b) = \left.\frac{\partial^2 F_{XY}(x, y)}{\partial x\,\partial y}\right|_{x=a,\,y=b}
The joint probability that (X, Y) belongs to any arbitrary-shaped region in the XY-plane is obtained by integrating the joint pdf of (X, Y) over that region (e.g., region C).
The marginal densities are obtained by integrating out the other variable:

f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dy

f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dx

F_X(a) = F_{XY}(a, \infty) = \int_{-\infty}^{a} \int_{-\infty}^{\infty} f_{XY}(x, y)\,dy\,dx
Pairwise independence does not imply mutual independence! Even if, for all pairs (x_i, x_j) with 1 \le i \le n, 1 \le j \le n, i \ne j,

f_{X_i, X_j}(x_i, x_j) = f_{X_i}(x_i) f_{X_j}(x_j),

the variables X_1, \ldots, X_n need not be mutually independent.
Concept of covariance

The covariance of two random variables X and Y is defined as follows:

Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]

Further expansion: Cov(X, Y) = E[XY] - E[X]E[Y].

Properties:

Cov\left(\sum_i X_i, Y\right) = \sum_i Cov(X_i, Y)

Cov\left(\sum_i X_i, \sum_j Y_j\right) = \sum_i \sum_j Cov(X_i, Y_j)

Var\left(\sum_i X_i\right) = Cov\left(\sum_i X_i, \sum_j X_j\right)
= \sum_i Cov(X_i, X_i) + \sum_i \sum_{j \ne i} Cov(X_i, X_j)
= \sum_i Var(X_i) + \sum_i \sum_{j \ne i} Cov(X_i, X_j)
If X and Y are independent, then

E[XY] = \sum_i \sum_j x_i y_j P\{X = x_i\} P\{Y = y_j\}
= \left(\sum_i x_i P\{X = x_i\}\right) \left(\sum_j y_j P\{Y = y_j\}\right)
= E[X] E[Y]

and hence Cov(X, Y) = 0.
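The variance-of-a-sum identity holds exactly for empirical moments too; a sketch on a pair of correlated samples (the construction of Y from X is an arbitrary choice):

```python
import random, statistics

random.seed(5)
n = 100_000
X = [random.gauss(0, 1) for _ in range(n)]
Y = [x + random.gauss(0, 1) for x in X]   # Y is correlated with X

def cov(a, b):
    # Empirical Cov(a, b) = mean of (a_i - mean_a)(b_i - mean_b)
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return statistics.fmean((ai - ma) * (bi - mb) for ai, bi in zip(a, b))

S = [x + y for x, y in zip(X, Y)]
lhs = statistics.pvariance(S)
rhs = statistics.pvariance(X) + statistics.pvariance(Y) + 2 * cov(X, Y)
# Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
assert abs(lhs - rhs) < 1e-8
```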
Conditional pdf/cdf/pmf

Given random variables X and Y with joint pdf f_{XY}(x, y), the conditional pdf of X given Y = y is

f_{X|Y}(x | y) = \frac{f_{XY}(x, y)}{f_Y(y)}

and the conditional CDF is

F_{X|Y}(x | y) = \int_{-\infty}^{x} \frac{f_{X,Y}(z, y)}{f_Y(y)}\,dz = \int_{-\infty}^{x} f_{X|Y}(z | y)\,dz

(see http://math.arizona.edu/~jwatkins/m-conddist.pdf)

The conditional mean and variance of X given Y = y are

E(X | Y = y) = \int_{-\infty}^{\infty} x f_{X|Y}(x | y)\,dx

Var(X | Y = y) = \int_{-\infty}^{\infty} (x - E(X | Y = y))^2 f_{X|Y}(x | y)\,dx
Example

f(x, y) = 2.4\,x\,(2 - x - y), \quad 0 \le x \le 1, \; 0 \le y \le 1; \qquad 0 \text{ otherwise}

Find the conditional density of X given Y = y.
Find the conditional mean of X given Y = y.
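The example can be checked numerically. The closed forms in the comments are my own derivation (obtained by integrating over x), not from the slides:

```python
# Numerical check of the worked example f(x, y) = 2.4 x (2 - x - y) on the
# unit square. Derived closed forms (assumptions, verified below):
#   f_Y(y)       = 1.6 - 1.2 y
#   f_{X|Y}(x|y) = 2.4 x (2 - x - y) / (1.6 - 1.2 y)
#   E(X | Y = y) = (1.0 - 0.8 y) / (1.6 - 1.2 y)

def f(x, y):
    return 2.4 * x * (2 - x - y) if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

def integrate(g, n=100_000):
    # Midpoint rule on [0, 1].
    h = 1.0 / n
    return sum(g((i + 0.5) * h) for i in range(n)) * h

y = 0.3
fy = integrate(lambda x: f(x, y))                  # marginal f_Y(y)
cond_mean = integrate(lambda x: x * f(x, y)) / fy  # E(X | Y = y)

assert abs(fy - (1.6 - 1.2 * y)) < 1e-6
assert abs(cond_mean - (1.0 - 0.8 * y) / (1.6 - 1.2 * y)) < 1e-6
```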