Probability and Random Process
assignment p for the members of $\mathcal{F}$, satisfying the probability axioms. Bayes' theorem and the total probability theorem were discussed. We talked about the σ-field not being the power set of the sample space when S is uncountable. Let us move on to Random Variables. To use the properties of functions of a real variable we need to move from p[.] to functions on the real line $\mathbb{R}$. We look at Borel sets, which are members of the σ-field generated by the unbounded sets $(-\infty, x]$. What are the sets in this σ-field?

$(a, b] = (-\infty, b] - (-\infty, a]$
$\{a\} = \lim_{n \to \infty} (a - \tfrac{1}{n}, a]$
$[a, b] = (a, b] \cup \{a\}$
$[a, b) = [a, b] - \{b\}$
$(a, b) = (a, b] - \{b\}$

Also, $(-\infty, b)$, $[a, \infty)$, $(a, \infty)$ are members of the σ-field, in addition to countable sets and complements of countable sets. This collection of subsets of $\mathbb{R}$ is called the Borel sets $\mathcal{B}(\mathbb{R})$ or just $\mathcal{B}$.
Random Variable
Consider a random experiment with probability space $(S, \mathcal{F}, p)$. Definition: A random variable X is a mapping from S to $\mathbb{R}$ such that sets in $\mathcal{B}$ are pulled back to sets in $\mathcal{F}$, i.e. $X: S \to \mathbb{R}$ such that $X^{-1}(B) \in \mathcal{F}$ for all $B \in \mathcal{B}$. Will every function $X: S \to \mathbb{R}$ be a random variable? NO!
Example: $S = \{1, 2, 3, \dots, n\}$ and $\mathcal{F} = \{\emptyset, S, \{1\}, \{2, 3, \dots, n\}\}$. Then the function $X(i) = 2i$ for all $i = 1, 2, \dots, n$ is not a random variable. If $\mathcal{F} = 2^S$ then every function X will be measurable. We usually take $\mathcal{F} = 2^S$ when S is countable.
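To see the definition in action, here is a minimal sketch (not part of the notes) that brute-force checks measurability on this finite example with n = 4; the helper names are my own.

```python
from itertools import chain, combinations

S = [1, 2, 3, 4]                         # the example with n = 4
F = [set(), set(S), {1}, {2, 3, 4}]      # the given sigma-field

def preimage(X, S, B):
    """Pre-image X^{-1}(B) = {s in S : X(s) in B}."""
    return {s for s in S if X(s) in B}

def is_random_variable(X, S, F):
    # For a finite range it suffices to check the pre-image of every
    # subset of the range {X(s) : s in S}.
    values = sorted({X(s) for s in S})
    subsets = chain.from_iterable(
        combinations(values, r) for r in range(len(values) + 1))
    return all(preimage(X, S, set(B)) in F for B in subsets)

print(is_random_variable(lambda i: 2 * i, S, F))       # False: {2} pulls back to {2}, not in F
print(is_random_variable(lambda i: int(i > 1), S, F))  # True: indicator of {2, 3, 4}
```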
When S is uncountable we must have $\mathcal{F} \subsetneq 2^S$, and we do have functions which are not random variables (non-measurable). The only solace is that measurable functions are closed under sums, products and limits.
The cdf (cumulative distribution function) of a rv X is defined as $F_X(x) = p[X \le x]$.
Properties of the cdf:
1. $F_X(x)$ is non-negative, i.e. $F_X(x) \ge 0$ for all $x \in \mathbb{R}$.
2. $F_X(x) \le 1$ for all $x \in \mathbb{R}$.
3. $F_X(x)$ is a non-decreasing function, i.e. for $x, y \in \mathbb{R}$ we have $F_X(x) \le F_X(y)$ when $x < y$.
4. $F_X(x)$ is continuous from the right, i.e. $\lim_{h \to 0^+} F_X(a + h) = F_X(a)$ for $a \in \mathbb{R}$.
5. $\lim_{x \to -\infty} F_X(x) = 0$ and $\lim_{x \to \infty} F_X(x) = 1$.
Inverse problem: Will every function with the above properties be the cdf of some random experiment? Yes! Of course! One can always be constructed on the sample space [0, 1]; this construction is used in simulation. We have three types of random variables and their cdfs: discrete, continuous and mixed.
How to calculate probabilities using the cdf?
Example 1: Interval (a, b]. Define the set A as $\{s \in S \mid a < X(s) \le b\} = \{a < X \le b\}$. Then $A = \{X \le b\} - \{X \le a\}$, i.e. $\{X \le b\} = A \cup \{X \le a\}$, which is a disjoint union. Hence
$p[a < X \le b] = F_X(b) - F_X(a)$.
Example 2: Interval [a, b]:
$p[a \le X \le b] = F_X(b) - F_X(a) + p[X = a]$.
Using $\{X = a\} = \lim_{n \to \infty} (a - \tfrac{1}{n}, a]$ we get
$p[X = a] = \lim_{n \to \infty} p[a - \tfrac{1}{n} < X \le a] = \lim_{n \to \infty} F_X(a) - \lim_{n \to \infty} F_X(a - \tfrac{1}{n}) = F_X(a) - F_X(a^-)$.
Example 3: Interval [a, b): $p[a \le X < b] = F_X(b) - F_X(a) + p[X = a] - p[X = b]$.
Example 4: Interval (a, b): $p[a < X < b] = F_X(b) - F_X(a) - p[X = b]$.
Example 5: Interval (a, ∞): $p[a < X < \infty] = 1 - F_X(a)$.
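A small numeric illustration (my own, not from the notes), taking X to be a standard normal rv via scipy so the cdf is available directly; for a continuous rv $p[X = a] = F_X(a) - F_X(a^-) = 0$, so all four bounded intervals above get the same probability.

```python
from scipy.stats import norm

a, b = -1.0, 1.0
F = norm.cdf                 # cdf of a standard normal rv

print(F(b) - F(a))           # p[a < X <= b]  ~ 0.6827
print(1.0 - F(a))            # p[a < X < inf] = 1 - F_X(a) ~ 0.8413
```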
Conditional cdf:
Consider a probability model $(S, \mathcal{F}, p)$, an arbitrary event $A \in \mathcal{F}$, and a rv X defined on S. Since $F_X(x)$ is the probability of the event $B = [X \le x]$, conditional probabilities p[B|A] make sense when p[A] > 0.
Definition: The conditional cdf (cumulative distribution function) of a rv X, given an event A such that p[A] > 0, is defined as
$F_X(x|A) = \dfrac{p[\{X \le x\} \cap A]}{p[A]}$.
A may or may not be related to the rv X (a small simulation sketch follows below).
Till now: We have reduced the probability assignment problem from $\mathcal{F}$ to specifying a function $F_X(x)$ of a single argument. We have seen how to calculate probabilities of intervals and their unions, complements, etc. This will be enough to calculate a lot of probabilities when used along with probability axiom 3. But how to find the cdf? Big question! Even if we find one, how do we check axiom 3?
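The sketch referenced above (an assumed example, not from the notes): estimating $F_X(x|A)$ by simulation for a standard normal X with the conditioning event A = {X > 0}.

```python
import numpy as np

rng = np.random.default_rng(0)
x_samples = rng.standard_normal(1_000_000)   # samples of X
A = x_samples > 0                            # the conditioning event
x = 1.0
# F_X(x | A) = p[{X <= x} and A] / p[A]
est = np.mean((x_samples <= x) & A) / np.mean(A)
print(est)                                   # ~ 0.6827 = (F(1) - F(0)) / (1 - F(0))
```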
The pdf (probability density function) of a rv X is defined as $f_X(x) = \dfrac{dF_X(x)}{dx}$. It has two basic properties: $f_X(x) \ge 0$ and
$\int_{-\infty}^{\infty} f_X(x)\,dx = 1$.
If a function f satisfies the above two properties, is there a rv X such that X has f(x) as its pdf? Yes, as we can get a cdf from this function f(x).
Note that $\int_{-\infty}^{y} f_X(x)\,dx = F_X(y)$.
1. $p[a < X \le b] = \int_{a}^{b} f_X(x)\,dx$
2. $p[a \le X \le b] = \int_{a^-}^{b} f_X(x)\,dx$
3. $p[a \le X < b] = \int_{a^-}^{b^-} f_X(x)\,dx$
4. $p[a < X < b] = \int_{a}^{b^-} f_X(x)\,dx$
5. $p[a < X < \infty] = \int_{a}^{\infty} f_X(x)\,dx$
In the computation of these probabilities the relevant delta functions must be used. The additivity axiom follows from the additivity of integrals. (There are subtle mathematical points when the events are not intervals.)
Conditional pdf
Definition: The conditional pdf of a rv X, given an event A with p[A] > 0, is defined as
$f_X(x|A) = \dfrac{dF_X(x|A)}{dx}$.
Pdf in practice: From the definition of the pdf we can give the following interpretation:
$f_X(x) \approx \dfrac{p[X \le x + dx] - p[X \le x]}{dx}$ for very small dx,
i.e. $f_X(x)\,dx \approx p[X \le x + dx] - p[X \le x] = p[x < X \le x + dx]$.
State of affairs till now: We are able to assign probabilities satisfying axiom 3; we need to assign probabilities only to infinitesimal intervals (x, x + dx]; we need not worry about more mathematical rigour than this!
Probabilistic model again! Consider a random experiment E with initial probability model $(S, \mathcal{F}, p)$. Suppose we define a random variable X on S with cdf $F_X(x)$ and pdf $f_X(x)$. The random variable X transforms $(S, \mathcal{F}, p)$ into $[\mathbb{R}, \mathcal{B}(\mathbb{R}), F_X(x)]$, or equivalently into $[\mathbb{R}, \mathcal{B}(\mathbb{R}), f_X(x)]$. It is to be noted that, regardless of the random experiment and the random variable X, the sample space $\mathbb{R}$ and the collection $\mathcal{B}(\mathbb{R})$ are the same for all experiments. Thus the original random experiment can be kept implicitly in our mind, and we take the liberty of naming $f_X(x)$ [or $F_X(x)$] as the probabilistic model of the random experiment! This is our focus from now on.
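A quick numeric check of this interpretation (my own example, using the exponential model defined later with λ = 2):

```python
from scipy.stats import expon

lam = 2.0
X = expon(scale=1/lam)          # exponential rv, f_X(x) = lam * exp(-lam * x)
x, dx = 0.5, 1e-6
approx = (X.cdf(x + dx) - X.cdf(x)) / dx   # (p[X <= x+dx] - p[X <= x]) / dx
print(approx, X.pdf(x))                    # both ~ 0.7358
```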
Standard Random Variable Models
There are infinitely many discrete and continuous random variables. Among these, a few have wide applications.
Discrete Models: If a rv takes a countable number of values $\{a_k\}$ then its pdf is a sum of delta functions:
$f_X(x) = \sum_{k} p[X = a_k]\,\delta(x - a_k)$.
Alternatively we give the probability mass function (pmf) $p_k = p[X = a_k]$, with $\sum_k p_k = 1$.
1. Bernoulli: The Bernoulli rv X takes only two values, 0 and 1. Therefore $S_X = \{0, 1\}$. Its pmf is given in terms of the parameter p: $p[X = 1] = p$, $p[X = 0] = 1 - p$. Examples: a coin flip, rolling a die (success on a chosen face), the indicator function of an event, arrivals in a very small time slot of length $\Delta t$.
2. Binomial: The binomial random variable X with parameters n, p has sample space $S_X = \{0, 1, 2, \dots, n\}$. Here $p \in [0, 1]$ is a probability and n is a positive integer. The pmf is
$p[X = k] = \binom{n}{k} p^k (1 - p)^{n - k}$, $k = 0, 1, \dots, n$.
The binomial rv is a sum of n Bernoulli rvs: $X = \sum_{k=1}^{n} X_k$.
Examples: the number of heads in n flips of a coin, the total number of arrivals in an interval.
3. Geometric: The geometric rv X with parameter p has sample space $S_X = \{0, 1, 2, \dots\}$, i.e. all non-negative integers. The pmf is given by
$p[X = k] = p(1 - p)^k$, $k = 0, 1, 2, \dots$
Example: waiting for the first head in tossing a coin.
4. Negative Binomial: The negative binomial (Pascal) random variable X with parameters r, p has sample space $S_X = \{r, r + 1, r + 2, \dots\}$, where r is a positive integer and p is a probability. The pmf is
$p[X = k] = \binom{k - 1}{r - 1} p^r (1 - p)^{k - r}$, $k = r, r + 1, \dots$
Example: the number of tosses required to get the r-th head.
5. Poisson: The Poisson rv X, with parameter $\lambda > 0$, has sample space $S_X = \{0, 1, 2, \dots\}$. The pmf is
$p[X = k] = \dfrac{\lambda^k}{k!} e^{-\lambda}$, $k = 0, 1, 2, \dots$
Examples: the number of arrivals in a queuing system, the number of errors in a printed page.
6. Discrete Uniform: The discrete uniform rv X with parameters a, b, where a and b are integers, has sample space $S_X = \{a, a+1, \dots, b-1, b\}$. Its pmf is given by
$p[X = k] = \dfrac{1}{b - a + 1}$ for $k = a, a+1, \dots, b-1, b$.
Continuous Models: If a rv takes a continuum of values (uncountable) then the rv X is called a continuous rv. For continuous random variables the cdf is continuous.
1. Uniform: The continuous uniform rv X with parameters a, b, where a and b are real with a < b, has sample space $S_X = [a, b]$. The pdf is
$f_X(x) = \dfrac{1}{b - a}$ for $x \in [a, b]$, and 0 otherwise.
2. Gaussian (Normal): The Gaussian random variable X is defined on the sample space $\mathbb{R}$ with parameters $\mu$, $\sigma^2$, where $\mu$ is a real parameter and $\sigma^2$ is any positive real number. The pdf is given by
$f_X(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - \mu)^2 / 2\sigma^2}$, $x \in (-\infty, \infty)$.
With $\mu = 0$ and $\sigma^2 = 1$ we get the standard normal pdf $\varphi(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$.
Examples: any normal phenomenon like noise, the distribution of heights of people in a certain age group, etc.
3. Exponential: The exponential rv X, with parameter $\lambda > 0$, has sample space $S_X = [0, \infty)$. The pdf is given by
$f_X(x) = \lambda e^{-\lambda x}$, $x \in [0, \infty)$.
Examples: the time between two consecutive arrivals, the life of electronic components.
4. Gamma: The gamma random variable X, with parameters a, $\lambda$, has sample space $S_X = [0, \infty)$. The parameters a, $\lambda$ are real positive numbers. The pdf is given by
$f_X(x) = \dfrac{\lambda(\lambda x)^{a-1} e^{-\lambda x}}{\Gamma(a)}$, $x \in [0, \infty)$, where $\Gamma(a) = \int_0^{\infty} s^{a-1} e^{-s}\,ds$ is the so-called gamma function. A sum of a independent exponential rvs is gamma when a is a positive integer.
5. Cauchy: The Cauchy rv X, with parameters a, b, has sample space $S_X = (-\infty, \infty)$. Here a, b are real numbers with b > 0. The pdf is given by
$f_X(x) = \dfrac{b}{\pi[(x - a)^2 + b^2]}$, $x \in (-\infty, \infty)$.
This is used in communication theory.
6. Erlang: This model is a special case of the gamma random variable when a is a positive integer n. Sample space $S_X = [0, \infty)$. The pdf is
$f_X(x) = \dfrac{\lambda(\lambda x)^{n-1} e^{-\lambda x}}{(n - 1)!}$, $x \in [0, \infty)$.
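For experimenting with these models, here is a sketch (my own, not from the notes) mapping them onto scipy.stats. Note the assumed conventions: scipy parameterizes the exponential, gamma and Erlang families by scale = 1/λ, its geometric starts at 1 and its negative binomial counts failures, so both are shifted to match the pmfs above.

```python
from scipy import stats

lam, a, n = 2.0, 3.5, 4
models = {
    "bernoulli":    stats.bernoulli(p=0.3),
    "binomial":     stats.binom(n=10, p=0.3),
    "geometric":    stats.geom(p=0.3, loc=-1),       # shifted so S_X = {0, 1, 2, ...}
    "neg_binomial": stats.nbinom(n=3, p=0.3, loc=3), # shifted to count total trials
    "poisson":      stats.poisson(mu=lam),
    "disc_uniform": stats.randint(low=0, high=6),    # S_X = {0, ..., 5}
    "uniform":      stats.uniform(loc=0, scale=1),   # S_X = [0, 1]
    "gaussian":     stats.norm(loc=0.0, scale=1.0),
    "exponential":  stats.expon(scale=1/lam),
    "gamma":        stats.gamma(a, scale=1/lam),
    "erlang":       stats.erlang(n, scale=1/lam),
    "cauchy":       stats.cauchy(loc=0.0, scale=1.0),
}
for name, rv in models.items():
    print(name, rv.mean(), rv.var())   # Cauchy prints nan: mean/variance undefined
```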
Definition: The mean (average) of a rv X, denoted E(X), is defined as $E(X) = \int_{-\infty}^{\infty} x f_X(x)\,dx$. Note that if two rvs X1, X2 both have the same pdf then they have the same average; hence the mean is a property of the pdf and not of the rv. Question: Is the mean well defined for any pdf $f_X(x)$? We know that it is well defined only when
$\int_{-\infty}^{\infty} |x f_X(x)|\,dx = \int_{-\infty}^{\infty} |x| f_X(x)\,dx < \infty$,
i.e. when the integral in the definition of E(X) converges absolutely. For example, the Cauchy rv does not have a mean.
Definition: The variance of a random variable X (with a finite mean $\mu$), denoted by var(X), is defined as
$\mathrm{var}(X) = E[(X - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 f_X(x)\,dx$.
The variance is denoted by $\sigma^2$; its positive square root $\sigma$ is the standard deviation.
Question: Is the variance well defined for any pdf $f_X(x)$? It is, as the integrand in the definition of the variance is always positive, even though the integral may equal $+\infty$. When we fit pdfs to experimental data, means and variances will definitely be well defined, since $f_X(x)$ will vanish outside a finite interval.
Properties of mean and variance:
1. Shifting a rv by a constant c: $E(X + c) = E(X) + c$ and $\mathrm{Var}(X + c) = \mathrm{Var}(X)$.
2. Scaling a rv by a constant c: $E(cX) = c\,E(X)$ and $\mathrm{Var}(cX) = c^2\,\mathrm{Var}(X)$.
3. Relationship between variance and mean: $\mathrm{Var}(X) = \int_{-\infty}^{\infty} x^2 f_X(x)\,dx - \mu^2 = E(X^2) - \mu^2$.
Higher Moments:
Definition: The nth raw moment of a rv X, denoted by $E(X^n)$, is defined as
$E(X^n) = \int_{-\infty}^{\infty} x^n f_X(x)\,dx$;
the nth central moment is
$E[(X - \mu)^n] = \int_{-\infty}^{\infty} (x - \mu)^n f_X(x)\,dx$.
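A numeric sanity check of the shifting and scaling properties (my own sketch; the gamma model is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.gamma(shape=3.0, scale=0.5, size=1_000_000)    # samples of X
c = 4.0
print(np.mean(x + c), np.mean(x) + c)                  # E(X + c) = E(X) + c
print(np.var(x + c), np.var(x))                        # Var(X + c) = Var(X)
print(np.var(c * x), c**2 * np.var(x))                 # Var(cX) = c^2 Var(X)
print(np.var(x), np.mean(x**2) - np.mean(x)**2)        # Var(X) = E(X^2) - mu^2
```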
Average and Variance of the models:

Random Variable         Average           Variance
1. Bernoulli            p                 p(1 - p)
2. Binomial             np                np(1 - p)
3. Geometric            (1 - p)/p         (1 - p)/p^2
4. Negative Binomial    r/p               r(1 - p)/p^2
5. Poisson              λ                 λ
6. Discrete Uniform     (a + b)/2         (b - a)(b - a + 2)/12
7. Uniform              (a + b)/2         (b - a)^2/12
8. Gaussian             μ                 σ^2
9. Exponential          1/λ               1/λ^2
10. Gamma               a/λ               a/λ^2
11. Cauchy              Does not exist    Does not exist
12. Erlang              n/λ               n/λ^2
Transformations of a Random Variable: Let Y = g(X) be a function of the rv X.
1. Linear Transformation: Y = aX + b with a ≠ 0. Then
$F_Y(y) = F_X\!\left(\dfrac{y - b}{a}\right)$ if a > 0, and $F_Y(y) = 1 - F_X\!\left(\dfrac{y - b}{a}\right)$ if a < 0,
and the pdf is $f_Y(y) = \dfrac{1}{|a|} f_X\!\left(\dfrac{y - b}{a}\right)$.
2. Cosine Transformation of a Uniform Random Variable: This transformation is often used in communication channels. Let X be a uniform rv with range [0, 2π). The transformation we use is Y = cos(X). Note that for each value y in (−1, 1), the set cos⁻¹(y) has two elements, θ and 2π − θ, where 0 < θ < π and cos(θ) = y. Also $f_X(x) = \frac{1}{2\pi}$, 0 ≤ x < 2π. Hence
$f_Y(y) = \dfrac{1}{2\pi\sqrt{1 - y^2}} + \dfrac{1}{2\pi\sqrt{1 - y^2}} = \dfrac{1}{\pi\sqrt{1 - y^2}}$, −1 < y < 1,
since $\left|\dfrac{dg}{dx}\right| = |\sin(x)| = \sqrt{1 - y^2}$. Note that the pdf approaches ∞ when |y| approaches 1.
3. Clipping Transformation: This arises in signal processing applications. Let X be a rv with range $S_X = (-\infty, \infty)$. In this case the transformation is
g(x) = x if |x| < a, g(x) = a if x ≥ a, and g(x) = −a if x ≤ −a.
4. Quantization (Staircase) Transformation: Here g maps each interval of length d to a single level:
g(x) = kd + a for x ∈ [kd, (k+1)d), k = 0, 1, 2, …,
g(x) = −d + a for x ∈ [−d, 0), …, g(x) = −(k+1)d + a for x ∈ [−(k+1)d, −kd), k = 1, 2, …
Let X be a random variable with $S_X = (-\infty, \infty)$. As g(x) is discrete valued, the pdf $f_Y(y)$ will have only δ(·) functions. The probability mass function of Y is
$p[Y = kd + a] = p[kd \le X < (k+1)d]$, k = 0, 1, 2, …, and similarly
$p[Y = -(k+1)d + a] = p[-(k+1)d \le X < -kd]$, k = 0, 1, 2, …
5. Logarithmic Transformation: This is used when certain random variables have values covering several orders of magnitude. The probabilities also may range widely, say from 0.1 to 0.00001 (10⁻⁵). Such values must be converted to a logarithmic scale. Here g(x) = log(x); hence the range of X must be (0, ∞). Log is a monotone function and we have only singleton sets for $g^{-1}(y) = e^y$. Also $\dfrac{dg}{dx} = \dfrac{1}{x}$, and hence
$f_Y(y) = e^y f_X(e^y)$, $y \in (-\infty, \infty)$.
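A Monte Carlo check of the cosine transformation result above (my own sketch): a histogram of Y = cos(X) for X ~ Uniform[0, 2π) should match $f_Y(y) = 1/(\pi\sqrt{1 - y^2})$.

```python
import numpy as np

rng = np.random.default_rng(2)
y = np.cos(rng.uniform(0.0, 2*np.pi, size=1_000_000))
hist, edges = np.histogram(y, bins=50, range=(-0.99, 0.99), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
derived = 1.0 / (np.pi * np.sqrt(1.0 - centers**2))
print(np.max(np.abs(hist - derived) / derived))   # small relative error everywhere
```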
The characteristic function of the pdf $f_X(x)$ is defined as
$\Phi_X(\omega) = \int_{-\infty}^{\infty} f_X(x) e^{j\omega x}\,dx$.
This is the Fourier transform of $f_X(x)$, except for the sign of the exponent. $\Phi_X(\omega)$ is well defined for all pdfs since $|e^{j\omega x}| = 1$ and $|\Phi_X(\omega)| \le 1$.
Note 1: Characteristic functions are associated with pdfs rather than random variables.
Note 2: There is a one-to-one correspondence between a pdf and its characteristic function. We can recover the pdf from the characteristic function $\Phi_X(\omega)$ by the inverse transform
$f_X(x) = \dfrac{1}{2\pi} \int_{-\infty}^{\infty} \Phi_X(\omega) e^{-j\omega x}\,d\omega$, $x \in \mathbb{R}$.
Properties of characteristic functions:
Theorem: If $\Phi_X(\omega)$ is n times differentiable at $\omega = 0$, then
$E(X^n) = \dfrac{1}{j^n} \dfrac{d^n \Phi_X(\omega)}{d\omega^n}\bigg|_{\omega = 0}$.
Examples:
Exponential pdf: If X is an exponential random variable with parameter $\lambda$, then its characteristic function is
$\Phi_X(\omega) = \dfrac{\lambda}{\lambda - j\omega}$.
Gaussian pdf: If X is a Gaussian random variable with parameters $\mu$, $\sigma^2$, then
$\Phi_X(\omega) = e^{\,j\mu\omega - \sigma^2\omega^2/2}$.
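A symbolic verification of the moment theorem for the exponential characteristic function (my own sketch, using sympy):

```python
import sympy as sp

lam = sp.symbols('lam', positive=True)
w = sp.symbols('omega', real=True)
phi = lam / (lam - sp.I * w)       # characteristic function of the exponential rv

for n in (1, 2):
    # E(X^n) = (1/j^n) d^n Phi / d omega^n at omega = 0
    moment = sp.simplify(sp.diff(phi, w, n).subs(w, 0) / sp.I**n)
    print(n, moment)               # prints 1/lam and 2/lam**2
```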
Laplace transforms:
Definition: The Laplace transform of the pdf $f_X(x)$ of a positive random variable X is defined as
$L_X(s) = E(e^{-sX}) = \int_0^{\infty} f_X(x) e^{-sx}\,dx$.
The integral converges for every s with a positive real part; this is a sufficient condition for $L_X(s)$ to be well defined. An inverse formula also holds for the Laplace transform, and there is a one-to-one correspondence between $f_X(x)$ and $L_X(s)$:
$f_X(x) = \dfrac{1}{2\pi j} \int_{c - j\infty}^{c + j\infty} L_X(s) e^{sx}\,ds$, $x \in \mathbb{R}$.
Theorem: Let X be a random variable with Laplace transform $L_X(s)$ that is n times differentiable at s = 0. Then
$E(X^n) = (-1)^n \dfrac{d^n L_X(s)}{ds^n}\bigg|_{s = 0}$.
Note: Expanding $L_X(s)$ about the origin s = 0, we get
$L_X(s) = \sum_{n \ge 0} \dfrac{E(X^n)}{n!} (-s)^n$.
Examples:
Gamma pdf: If X is a gamma random variable with parameters a, $\lambda$, the Laplace transform is given by
$L_X(s) = \int_0^{\infty} \dfrac{\lambda(\lambda x)^{a-1} e^{-\lambda x}}{\Gamma(a)}\, e^{-sx}\,dx = \left(\dfrac{\lambda}{\lambda + s}\right)^{a} \dfrac{1}{\Gamma(a)} \int_0^{\infty} y^{a-1} e^{-y}\,dy = \left(\dfrac{\lambda}{\lambda + s}\right)^{a}$.
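As a quick cross-check of this example (my own sketch, sympy, with the Erlang case a = 3 and λ = 2 as arbitrary choices):

```python
import sympy as sp

x, s = sp.symbols('x s', positive=True)
lam, n = sp.Integer(2), 3
pdf = lam * (lam * x)**(n - 1) * sp.exp(-lam * x) / sp.factorial(n - 1)
L = sp.integrate(pdf * sp.exp(-s * x), (x, 0, sp.oo))   # L_X(s) for the Erlang pdf
print(sp.simplify(L - (lam / (lam + s))**n))            # prints 0
```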
Bernoulli pmf: For the Bernoulli random variable the Laplace transform is $(1 - p) + p e^{-s}$.
Probability Generating Function: For a discrete random variable X taking non-negative integer values, let $p_X(k)$ represent $p[X = k]$. Definition: The probability generating function of the pmf $p_X(k)$ is defined as
$G_X(z) = E(z^X) = \sum_{k=0}^{\infty} p_X(k) z^k$.
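A small numeric illustration of the pgf (my own sketch): for the binomial pmf the sum $\sum_k p_X(k) z^k$ equals $(1 - p + pz)^n$, a standard identity not derived in the notes above.

```python
import numpy as np
from scipy.stats import binom

n, p, z = 10, 0.3, 0.7
k = np.arange(n + 1)
G = np.sum(binom.pmf(k, n, p) * z**k)     # G_X(z) = sum_k p_X(k) z^k
print(G, (1 - p + p * z)**n)              # both ~ 0.3894
```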