The Ultimate Probability Cheatsheet
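Simpson's Paradox
Law of Total Probability with a partition $B_1, B_2, \dots, B_n$ of the
sample space, and with extra conditioning (just add C!)
$P(A) = P(A \cap B_1) + P(A \cap B_2) + \dots + P(A \cap B_n)$
$P(A|C) = P(A|B_1, C)P(B_1|C) + \dots + P(A|B_n, C)P(B_n|C)$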
Counting
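Number of ways to sample k items from n:
                      Order Matters          Order Does Not Matter
With Replacement      $n^k$                  $\binom{n+k-1}{k}$
Without Replacement   $\frac{n!}{(n-k)!}$    $\binom{n}{k}$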
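The four sampling counts in this section (k draws from n items, with or without replacement, ordered or unordered) can be checked by brute-force enumeration. The sketch below is only an illustration: the sizes n = 5 and k = 3 and the dictionary labels are assumptions chosen for the example, not values from the cheatsheet.

```python
from itertools import (combinations, combinations_with_replacement,
                       permutations, product)
from math import comb, factorial

n, k = 5, 3  # hypothetical sizes chosen for illustration
items = range(n)

# Each entry pairs the closed-form count with a brute-force enumeration.
counts = {
    "with replacement, order matters":
        (n ** k, len(list(product(items, repeat=k)))),
    "without replacement, order matters":
        (factorial(n) // factorial(n - k), len(list(permutations(items, k)))),
    "with replacement, order does not matter":
        (comb(n + k - 1, k), len(list(combinations_with_replacement(items, k)))),
    "without replacement, order does not matter":
        (comb(n, k), len(list(combinations(items, k)))),
}

for label, (formula, enumerated) in counts.items():
    print(f"{label}: formula={formula}, enumerated={enumerated}")
```

Enumeration is only feasible for small n and k; the formulas are what you would use in practice.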
Naive Definition of Probability - If all outcomes are equally
likely, the probability of any event happening is:
$P(\text{Event}) = \frac{\text{number of favorable outcomes}}{\text{number of outcomes}}$
$P(A) = P(A \cap B) + P(A \cap B^c)$
$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B|A)P(A)}{P(B)}$
$P(A|B, C) = \frac{P(A \cap B|C)}{P(B|C)} = \frac{P(B|A, C)P(A|C)}{P(B|C)}$
Odds Form of Bayes' Rule, and with extra conditioning (just add C!)
$\frac{P(A|B)}{P(A^c|B)} = \frac{P(B|A)}{P(B|A^c)} \cdot \frac{P(A)}{P(A^c)}$
$\frac{P(A|B, C)}{P(A^c|B, C)} = \frac{P(B|A, C)}{P(B|A^c, C)} \cdot \frac{P(A|C)}{P(A^c|C)}$
Independence
Independent Events - A and B are independent if knowing one
gives you no information about the other. A and B are
independent if and only if one of the following equivalent
statements holds:
$P(A \cap B) = P(A)P(B)$
$P(A|B) = P(A)$
$P_X(x) = P(X = x)$
$(A \cup B)^c = A^c \cap B^c$
$(A \cap B)^c = A^c \cup B^c$
$F_X(x_0) = P(X \le x_0)$
What is the Cumulative Distribution Function (CDF)? It is the
following function of x.
$F(x) = P(X \le x)$
What is the Probability Density Function (PDF)? The PDF,
f(x), is the derivative of the CDF.
$F'(x) = f(x)$
Or alternatively,
$F(x) = \int_{-\infty}^{x} f(t)\,dt$
Discrete random variables X and Y are independent if, for all x and y,
$P(X = x, Y = y) = P(X = x)P(Y = y)$
Distributions
Probability Mass Function (PMF) (Discrete Only) is a function
that takes in the value x, and gives the probability that a
random variable takes on the value x. The PMF is a
nonnegative function, and $\sum_x P(X = x) = 1$.
$P_X(x) = P(X = x)$
Cumulative Distribution Function (CDF) is a function that
takes in the value x, and gives the probability that a random
variable takes on a value of at most x.
$F(x) = P(X \le x)$
Marginal Distributions
MGF For any random variable X, the moment generating function is this
expected value, viewed as a function of the dummy variable t:
$M_X(t) = E(e^{tX})$
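For discrete X: $E(X) = \sum_x x P(X = x)$
For continuous X: $E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$
LotUS states that you can find the expected value of a function
of a random variable g(X) this way:
For discrete X: $E(g(X)) = \sum_x g(x) P(X = x)$
For continuous X: $E(g(X)) = \int_{-\infty}^{\infty} g(x) f(x)\,dx$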
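As a minimal sketch of LotUS for a discrete random variable, the snippet below computes E(X) and E(X^2) for a fair six-sided die directly from the PMF of X, without ever deriving the PMF of X^2. The die, the `expectation` helper, and the variable names are illustrative assumptions, not part of the cheatsheet.

```python
from fractions import Fraction

def expectation(pmf, g=lambda x: x):
    """E(g(X)) for a discrete X given as {value: probability} (LotUS)."""
    return sum(g(x) * p for x, p in pmf.items())

# Hypothetical example: X is the roll of a fair six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}

mean = expectation(die)                             # E(X)
second_moment = expectation(die, lambda x: x * x)   # E(X^2) via LotUS
variance = second_moment - mean ** 2                # Var(X) = E(X^2) - E(X)^2

print(mean, second_moment, variance)
```

Using `Fraction` keeps the arithmetic exact, so the results can be compared against hand calculations.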
What's the point? You don't need to know the PDF/PMF of g(X)
to find its expected value. All you need is the PDF/PMF of X.
Multivariate LotUS
Review: $E(g(X)) = \sum_x g(x) P(X = x)$, or
$E(g(X)) = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx$
For discrete random variables:
$E(g(X, Y)) = \sum_x \sum_y g(x, y) P(X = x, Y = y)$
Joint Distributions
Review: Joint Probability of events A and B: $P(A \cap B)$
Both the Joint PMF and Joint PDF must be non-negative and
sum/integrate to 1. ($\sum_x \sum_y P(X = x, Y = y) = 1$)
($\int_x \int_y f_{X,Y}(x, y)\,dy\,dx = 1$). Like in the univariate case, you
sum/integrate the PMF/PDF to get the CDF.
Conditional Distributions
Review: By Bayes' Rule, $P(A|B) = \frac{P(B|A)P(A)}{P(B)}$. Similar conditions
apply to conditional distributions of random variables.
For discrete random variables:
$P(Y = y|X = x) = \frac{P(X = x, Y = y)}{P(X = x)} = \frac{P(X = x|Y = y)P(Y = y)}{P(X = x)}$
For continuous random variables:
$f_{Y|X}(y|x) = \frac{f_{X,Y}(x, y)}{f_X(x)} = \frac{f_{X|Y}(x|y) f_Y(y)}{f_X(x)}$
Universality of Uniform
When you plug any continuous random variable into its own CDF, you get a
Uniform[0,1] random variable. When you put a Uniform[0,1] into an
inverse CDF, you get the corresponding random variable. For example,
let's say that a random variable X has a CDF
$F(x) = 1 - e^{-x}$
Then $F(X) = 1 - e^{-X}$ is a Uniform[0,1] random variable.
The kth moment of X is $\mu'_k = E(X^k)$
Mean $\mu'_1 = E(X)$
Variance $\mu'_2 = E(X^2) = \text{Var}(X) + (\mu'_1)^2$
Mean, Variance, and other moments (Skewness) can be
expressed in terms of the moments of a random variable!
Applying the Taylor expansion $e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$ to $e^{tX}$:
$M_X(t) = E(e^{tX}) = \sum_{k=0}^{\infty} \frac{E(X^k) t^k}{k!} = \sum_{k=0}^{\infty} \frac{\mu'_k t^k}{k!}$
Differentiating under the expectation and then plugging in t = 0:
$M_X^{(k)}(t) = \frac{d^k}{dt^k} E(e^{tX}) = E\left(\frac{d^k}{dt^k} e^{tX}\right) = E(X^k e^{tX})$
$M_X^{(k)}(0) = E(X^k e^{0X}) = E(X^k) = \mu'_k$
For Y = aX + c:
$M_Y(t) = E(e^{t(aX + c)}) = e^{ct} E(e^{(at)X}) = e^{ct} M_X(at)$
For independent X and Y:
$M_{X+Y}(t) = E(e^{t(X+Y)}) = E(e^{tX}) E(e^{tY}) = M_X(t) \cdot M_Y(t)$
The kth moment is the kth derivative of the MGF evaluated at 0:
$\mu'_k = E(X^k) = M_X^{(k)}(0)$
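The fact that the kth derivative of the MGF at 0 equals the kth moment can be checked numerically. The sketch below is an illustration under assumptions: it uses the Binomial MGF $(q + pe^t)^n$ with hypothetical parameters n = 10 and p = 0.3, and approximates the derivatives with central finite differences (a numerical shortcut, not the symbolic differentiation the formula describes).

```python
from math import exp

n, p = 10, 0.3  # hypothetical Binomial parameters
q = 1 - p

def mgf(t):
    """MGF of Bin(n, p): M(t) = (q + p e^t)^n."""
    return (q + p * exp(t)) ** n

h = 1e-3  # step size for the central finite differences

# M'(0) approximates E(X); M''(0) approximates E(X^2).
first_moment = (mgf(h) - mgf(-h)) / (2 * h)
second_moment = (mgf(h) - 2 * mgf(0) + mgf(-h)) / h ** 2

# Compare with the known Binomial moments: E(X) = np, E(X^2) = npq + (np)^2.
print(first_moment, n * p)
print(second_moment, n * p * q + (n * p) ** 2)
```

The two printed pairs should agree to several decimal places; the small gap is finite-difference error.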
Note that
$\text{Cov}(X, X) = E(XX) - E(X)E(X) = \text{Var}(X)$
Correlation is a rescaled variant of Covariance that is always
between -1 and 1.
$\text{Corr}(X, Y) = \frac{\text{Cov}(X, Y)}{\sqrt{\text{Var}(X)\text{Var}(Y)}} = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y}$
$\text{Var}(X_1 + X_2 + \dots + X_n) = \sum_{i=1}^{n} \text{Var}(X_i) + 2\sum_{i<j} \text{Cov}(X_i, X_j)$
Continuous Transformations
Why do we need the Jacobian? We need the Jacobian to rescale
our PDF so that it integrates to 1.
One Variable Transformations Let's say that we have a random
variable X with PDF $f_X(x)$, but we are also interested in some
function of X. We call this function Y = g(X). Note that Y is
a random variable as well. If g is differentiable and one-to-one
(every value of X gets mapped to a unique value of Y), then
the following is true:
$f_Y(y) = f_X(x) \left|\frac{dx}{dy}\right|$, or equivalently $f_Y(y) \left|\frac{dy}{dx}\right| = f_X(x)$
To find $f_Y(y)$ as a function of y, plug in $x = g^{-1}(y)$.
$f_Y(y) = f_X(g^{-1}(y)) \left|\frac{d}{dy} g^{-1}(y)\right|$
The derivative of the inverse transformation is referred to as the
Jacobian, denoted as J.
$J = \frac{d}{dy} g^{-1}(y)$
Convolutions
Definition If you want to find the PDF of a sum of two independent
random variables, you take the convolution of their individual
distributions.
$f_{X+Y}(t) = \int_{-\infty}^{\infty} f_X(x) f_Y(t - x)\,dx$
Poisson Process
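PDF of the jth order statistic $U_{(j)}$ of n i.i.d. Unif(0, 1) random variables:
$f_{U_{(j)}}(t) = \frac{n!}{(j-1)!(n-j)!} t^{j-1} (1-t)^{n-j}$
For independent X ∼ Gamma(a, λ) and Y ∼ Gamma(b, λ):
$X + Y \sim \text{Gamma}(a + b, \lambda)$
$\frac{X}{X+Y} \sim \text{Beta}(a, b)$
$X + Y$ is independent of $\frac{X}{X+Y}$
$\text{Beta}(1, 1) \sim \text{Unif}(0, 1)$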
Continuous Y
$E(Y) = \int_{-\infty}^{\infty} y f_Y(y)\,dy$
$E(Y|X = x) = \int_{-\infty}^{\infty} y f_{Y|X}(y|x)\,dy$
$E(Y|A) = \int_{-\infty}^{\infty} y f(y|A)\,dy$
Conditional Variance
Eve's Law (aka Law of Total Variance)
Var(Y ) = E(Var(Y |X)) + Var(E(Y |X))
Chain Properties
We use $\dot{\sim}$ to denote "is approximately distributed". We can use the
central limit theorem when we have a random variable Y that is a
sum of n i.i.d. random variables with n large. Let us say that
$E(Y) = \mu_Y$ and $\text{Var}(Y) = \sigma_Y^2$. We have that:
$Y \dot{\sim} N(\mu_Y, \sigma_Y^2)$
When we use the central limit theorem to estimate Y, we usually have
$Y = X_1 + X_2 + \dots + X_n$ or $Y = \bar{X}_n = \frac{1}{n}(X_1 + X_2 + \dots + X_n)$.
Specifically, if we say that each of the i.i.d. $X_i$ has mean $\mu_X$ and
variance $\sigma_X^2$, then we have the following approximations.
$X_1 + X_2 + \dots + X_n \dot{\sim} N(n\mu_X, n\sigma_X^2)$
$\bar{X}_n = \frac{1}{n}(X_1 + X_2 + \dots + X_n) \dot{\sim} N\left(\mu_X, \frac{\sigma_X^2}{n}\right)$
We use $\xrightarrow{d}$ to denote "converges in distribution to" as $n \to \infty$. These
are the same results as the previous section, only letting $n \to \infty$ and
not letting our normal distribution have any n terms.
$\frac{1}{\sigma_X \sqrt{n}}(X_1 + \dots + X_n - n\mu_X) \xrightarrow{d} N(0, 1)$
$\frac{\bar{X}_n - \mu_X}{\sigma_X / \sqrt{n}} \xrightarrow{d} N(0, 1)$
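The central limit theorem approximation above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the summands are taken to be Unif(0, 1) (mean 1/2, variance 1/12), and the values n = 30, 20,000 trials, the seed, and the cutoff 1.0 are arbitrary choices for the example.

```python
import random
from statistics import NormalDist

random.seed(0)                 # fixed seed so the run is repeatable
n, trials = 30, 20_000         # hypothetical sample size and repetitions
mu_x, var_x = 0.5, 1 / 12      # mean and variance of Unif(0, 1)

def standardized_sum():
    """(X_1 + ... + X_n - n*mu_X) / (sigma_X * sqrt(n)) for Unif(0, 1) draws."""
    total = sum(random.random() for _ in range(n))
    return (total - n * mu_x) / (n * var_x) ** 0.5

samples = [standardized_sum() for _ in range(trials)]

# Empirical P(Z_n <= 1) versus the standard Normal CDF at 1.
empirical = sum(z <= 1.0 for z in samples) / trials
normal_cdf = NormalDist().cdf(1.0)
print(empirical, normal_cdf)
```

The two printed values should be close; the gap shrinks as n and the number of trials grow.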
Definition
A Markov Chain is a walk along a (finite or infinite, but for this class
usually finite) discrete state space {1, 2, . . . , M}. We let $X_t$ denote
which element of the state space the walk is on at time t. The Markov
Chain is the set of random variables denoting where the walk is at all
points in time, $\{X_0, X_1, X_2, \dots\}$, provided that to predict
where the chain will be at a future time, you only need the present
state, and not any past information. In other words, given the
present, the future and past are conditionally independent. Formal
Definition:
$P(X_{n+1} = j | X_0 = i_0, X_1 = i_1, \dots, X_n = i) = P(X_{n+1} = j | X_n = i)$
State Properties
A state is either recurrent or transient.
If you start at a Recurrent State, then you will always return
back to that state at some point in the future. "You can
check out any time you like, but you can never leave."
Otherwise you are at a Transient State. There is some
probability that once you leave you will never return. "You
don't have to go home, but you can't stay here."
A state is either periodic or aperiodic.
If you start at a Periodic State of period k, then the GCD of
all of the possible numbers of steps it would take to return back is
k > 1.
Otherwise you are at an Aperiodic State. The GCD of all of
the possible numbers of steps it would take to return back is 1.
Transition Matrix
Element $q_{ij}$ in square transition matrix Q is the probability that the
chain goes from state i to state j, or more formally:
$q_{ij} = P(X_{n+1} = j | X_n = i)$
To find the probability that the chain goes from state i to state j in m
steps, take the (i, j)th element of $Q^m$.
$q_{ij}^{(m)} = P(X_{n+m} = j | X_n = i)$
Stationary Distribution
If you have a certain number of nodes with edges between them, and a
chain can pick any edge randomly and move to another node, then this
is a random walk on an undirected network. The stationary
distribution of this chain is proportional to the degree sequence. The
degree sequence is the vector of the degrees of each node, defined as
how many edges it has.
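A small sketch can tie together m-step transition probabilities and the degree-proportional stationary distribution of a random walk on an undirected network. The 4-node graph below (a triangle with one extra node attached) is a hypothetical example, and the helper names `mat_mul` and `mat_power` are introduced only for this illustration.

```python
from fractions import Fraction

# Hypothetical undirected graph: triangle 0-1-2 plus node 3 attached to node 2.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4
neighbors = {i: [] for i in range(n)}
for a, b in edges:
    neighbors[a].append(b)
    neighbors[b].append(a)

# Transition matrix: from node i, pick one of its edges uniformly at random.
Q = [[Fraction(int(j in neighbors[i]), len(neighbors[i])) for j in range(n)]
     for i in range(n)]

def mat_mul(A, B):
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def mat_power(A, m):
    size = len(A)
    result = [[Fraction(int(i == j)) for j in range(size)] for i in range(size)]
    for _ in range(m):
        result = mat_mul(result, A)
    return result

# Candidate stationary distribution: degrees divided by their sum.
degrees = [len(neighbors[i]) for i in range(n)]
s = [Fraction(d, sum(degrees)) for d in degrees]
s_next = [sum(s[i] * Q[i][j] for i in range(n)) for j in range(n)]  # s Q

Q2 = mat_power(Q, 2)      # two-step transition probabilities
Q100 = mat_power(Q, 100)  # rows approach s for this (aperiodic) graph
print(s, s_next == s, Q2[0][0], [float(x) for x in Q100[0]])
```

Checking `s_next == s` confirms stationarity exactly; the rows of a high power of Q converging to s is a property of this particular connected, non-bipartite example.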
Continuous Distributions
Uniform
Let us say that U is distributed Unif(a, b). We know the following:
Properties of the Uniform For a uniform distribution, the
probability of a draw from any interval within the support is
proportional to the length of the interval. The PDF of a Uniform
is just a constant, so when you integrate over the PDF, you will
get an area proportional to the length of the interval.
Exponential Distribution
Let us say that X is distributed Expo(λ). We know the following:
Story You're sitting on an open meadow right before the break of
dawn, wishing that airplanes in the night sky were shooting
stars, because you could really use a wish right now. You know
that shooting stars come on average every 15 minutes, but it's
never true that a shooting star is ever "due" to come because
you've waited so long. Your waiting time is memoryless,
which means that the time until the next shooting star comes
does not depend on how long you've waited already.
Example The waiting time until the next shooting star is distributed
Expo(4), with time measured in hours. The 4 here is λ, or the rate
parameter, or how many shooting stars we expect to see in a unit of
time. The expected time until the next shooting star is 1/λ, or 1/4 of
an hour. You can expect to wait 15 minutes until the next shooting star.
Expos are rescaled Expos
$Y \sim \text{Expo}(\lambda) \rightarrow X = \lambda Y \sim \text{Expo}(1)$
Memorylessness The Exponential Distribution is the sole
continuous memoryless distribution. This means that it's
always "as good as new", which means that the probability of it
failing in the next infinitesimal time period is the same as any
infinitesimal time period. This means that for an exponentially
distributed X and any nonnegative real numbers t and s,
$P(X > s + t | X > s) = P(X > t)$
Given that you've waited already at least s minutes, the
probability of having to wait an additional t minutes is the
same as the probability that you have to wait more than t
minutes to begin with. Here's another formulation.
$X - a | X > a \sim \text{Expo}(\lambda)$
Example - If waiting for the bus is distributed exponentially
with λ = 6 per hour, no matter how long you've waited so far, the
expected additional waiting time until the bus arrives is always
1/6 of an hour, or 10 minutes. The distribution of time from now to the
arrival is always the same, no matter how long you've waited.
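Memorylessness can be illustrated by simulation. The sketch below assumes the bus example's rate of 6 per hour; the elapsed time s = 0.25 hours, the extra time t = 0.10 hours, the seed, and the sample size are hypothetical choices for the example.

```python
import random
from math import exp

random.seed(1)            # fixed seed so the run is repeatable
lam = 6.0                 # rate per hour, as in the bus example
s, t = 0.25, 0.10         # hypothetical: 15 minutes already waited, 6 more asked about

waits = [random.expovariate(lam) for _ in range(200_000)]

# Additional waiting time beyond s, restricted to draws with X > s.
survivors = [x - s for x in waits if x > s]

conditional = sum(extra > t for extra in survivors) / len(survivors)  # P(X > s+t | X > s)
unconditional = exp(-lam * t)                                         # P(X > t)
mean_extra_wait = sum(survivors) / len(survivors)                     # compare with 1/lam

print(conditional, unconditional, mean_extra_wait, 1 / lam)
```

The conditional and unconditional probabilities should roughly agree, and the mean extra wait should stay near 1/λ regardless of the chosen s.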
Example William throws darts really badly, so his darts are uniform
over the whole room because they're equally likely to appear
anywhere. William's darts have a uniform distribution on the
surface of the room. The uniform is the only distribution where
the probability of hitting in any specific region is proportional to
the area/length/volume of that region, and where the density of
occurrence in any one specific spot is constant throughout the
whole support.
Normal
Gamma Distribution
Let us say that X is distributed Gamma(a, λ). We know the
following:
Story You sit waiting for shooting stars, and you know that
the waiting time for a star is distributed Expo(λ). You
want to see "a" shooting stars before you go home. X is
the total waiting time for the ath shooting star.
Example You are at a bank, and there are 3 people ahead of
you. The serving time for each person is distributed
Exponentially with mean of 2 time units. The
distribution of your waiting time until you begin service
is Gamma(3, 1/2).
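χ² Distribution
Let us say that X is distributed $\chi^2_n$. We know the following:
Story A Chi-Squared(n) is a sum of n independent squared
standard normals.
Example The sum of squared errors is distributed $\chi^2_n$
Properties and Representations
$\chi^2_n = Z_1^2 + Z_2^2 + \dots + Z_n^2$, where $Z_1, \dots, Z_n \overset{\text{i.i.d.}}{\sim} N(0, 1)$
$\chi^2_n \sim \text{Gamma}\left(\frac{n}{2}, \frac{1}{2}\right)$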
Discrete Distributions
DWR = Draw w/ replacement, DWoR = Draw w/o replacement
                             DWR                          DWoR
Fixed number of trials (n)   Binom/Bern (Bern if n = 1)   HGeom
Draw until k successes       NBin/Geom (Geom if k = 1)    NHGeom (see example probs)
Multivariate Distributions
Multinomial Let us say that the vector
$\vec{X} = (X_1, X_2, X_3, \dots, X_k) \sim \text{Mult}_k(n, \vec{p})$ where
$\vec{p} = (p_1, p_2, \dots, p_k)$.
Story - We have n items, which can fall into any one of the
k buckets independently with the probabilities
$\vec{p} = (p_1, p_2, \dots, p_k)$.
Example - Let us assume that every year, 100 students in the
Harry Potter Universe are randomly and independently
sorted into one of four houses with equal probability. The
number of people in each of the houses is distributed
$\text{Mult}_4(100, \vec{p})$, where $\vec{p} = (.25, .25, .25, .25)$. Note that
$X_1 + X_2 + \dots + X_4 = 100$, and they are dependent.
Multinomial Coefficient The number of permutations of n
objects where you have $n_1, n_2, n_3, \dots, n_k$ of each of the
different variants is the multinomial coefficient.
$\binom{n}{n_1 n_2 \dots n_k} = \frac{n!}{n_1! n_2! \dots n_k!}$
Joint PMF - For $n = n_1 + n_2 + \dots + n_k$
$P(\vec{X} = \vec{n}) = \binom{n}{n_1 n_2 \dots n_k} p_1^{n_1} p_2^{n_2} \dots p_k^{n_k}$
$X_i \sim \text{Bin}(n, p_i)$
$X_i + X_j \sim \text{Bin}(n, p_i + p_j)$
$X_1, X_2, X_3 \sim \text{Mult}_3(n, (p_1, p_2, p_3)) \rightarrow X_1, X_2 + X_3 \sim \text{Mult}_2(n, (p_1, p_2 + p_3))$
$X_1, \dots, X_{k-1} | X_k = n_k \sim \text{Mult}_{k-1}\left(n - n_k, \left(\frac{p_1}{1 - p_k}, \dots, \frac{p_{k-1}}{1 - p_k}\right)\right)$
Multivariate Uniform See the univariate uniform for stories and
examples. For multivariate uniforms, all you need to know is
that probability is proportional to volume. More formally,
probability is the volume of the region of interest divided by
the total volume of the support. Every point in the support has
equal density of value $\frac{1}{\text{Total Area}}$.
Multivariate Normal (MVN) A vector $\vec{X} = (X_1, X_2, X_3, \dots, X_k)$
is declared Multivariate Normal if any linear combination is
normally distributed (e.g. $t_1 X_1 + t_2 X_2 + \dots + t_k X_k$ is Normal
for any constants $t_1, t_2, \dots, t_k$). The parameters of the
Multivariate normal are the mean vector $\vec{\mu} = (\mu_1, \mu_2, \dots, \mu_k)$
and the covariance matrix where the (i, j)th entry is
$\text{Cov}(X_i, X_j)$. For any MVN distribution: 1) Any sub-vector is
also MVN. 2) If any two elements of a multivariate normal
distribution are uncorrelated, then they are independent. Note
that 2) does not apply to most random variables.
Distribution Properties
Important CDFs
Exponential $F(x) = 1 - e^{-\lambda x}$, $x \in (0, \infty)$
Uniform(0, 1) $F(x) = x$, $x \in (0, 1)$
1
1 +2
1. Bin(1, p) ∼ Bern(p)
2. Beta(1, 1) ∼ Unif(0, 1)
3. Gamma(1, λ) ∼ Expo(λ)
4. $\chi^2_n \sim \text{Gamma}\left(\frac{n}{2}, \frac{1}{2}\right)$
5. NBin(1, p) ∼ Geom(p)
Classic Problems
Birthday Matches
Let Y be the number of birthday matches among n people, and let $J_i$ be
the indicator that the ith pair of people have the same birthday. The
probability that any two people share a birthday is 1/365, so
$E(Y) = \binom{n}{2} / 365$.
Reasoning by Representation
1. $X \sim \text{Gamma}(a, \lambda)$, $Y \sim \text{Gamma}(b, \lambda)$, $X \perp Y \rightarrow \frac{X}{X+Y} \sim \text{Beta}(a, b)$
2. Bin(n, p) → Pois(λ) as n → ∞, p → 0, np = λ.
3. $U_{(j)} \sim \text{Beta}(j, n - j + 1)$
4. For any continuous X with CDF F(x), F(X) ∼ U
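The statement that F(X) is Uniform, and its converse that an inverse CDF applied to a Uniform gives X, can be sketched with the Expo(1) CDF $F(x) = 1 - e^{-x}$. The helper names, the seed, and the sample size below are illustrative assumptions.

```python
import random
from math import exp, log

random.seed(2)  # fixed seed so the run is repeatable

def expo1_cdf(x):
    """CDF of Expo(1): F(x) = 1 - e^{-x}."""
    return 1 - exp(-x)

def expo1_inverse_cdf(u):
    """Inverse CDF of Expo(1): F^{-1}(u) = -log(1 - u)."""
    return -log(1 - u)

# Plug Uniform(0, 1) draws into the inverse CDF to get Expo(1) draws.
uniforms = [random.random() for _ in range(100_000)]
samples = [expo1_inverse_cdf(u) for u in uniforms]

sample_mean = sum(samples) / len(samples)  # should be near E(X) = 1

# Plugging the samples back into the CDF recovers the original uniforms.
round_trip_error = max(abs(expo1_cdf(x) - u) for x, u in zip(samples, uniforms))
print(sample_mean, round_trip_error)
```

This is the usual inverse-transform sampling recipe; it needs a CDF whose inverse can be written down or computed numerically.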
Formulas
In general, remember that PDFs integrated (and PMFs summed) over
support equal 1.
Geometric Series
$a + ar + ar^2 + \dots + ar^{n-1} = \sum_{k=0}^{n-1} ar^k = a\frac{1 - r^n}{1 - r}$
Exponential Function ($e^x$)
$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots = \lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n$
$\int_0^{\infty} x^{t-1} e^{-x}\,dx = \Gamma(t)$
$\int_0^1 x^{a-1} (1-x)^{b-1}\,dx = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}$
$\int_0^1 x^k (1-x)^{n-k}\,dx = \frac{1}{(n+1)\binom{n}{k}}$
$1 + \frac{1}{2} + \frac{1}{3} + \dots + \frac{1}{n} \approx \log n + 0.57721\dots$
Stirling's Approximation
$n! \approx \sqrt{2\pi n}\left(\frac{n}{e}\right)^n$
In every time period, Bobo the amoeba can die, live, or split into two
amoebas with probabilities 0.25, 0.25, and 0.5, respectively. All of
Bobo's offspring have the same probabilities. Find P(D), the
probability that Bobo's lineage eventually dies out. Answer - We use
the law of total probability, and define the events $B_0$, $B_1$, and $B_2$,
where $B_i$ means that Bobo has split into i amoebas. We note that
$P(D|B_0) = 1$ since his lineage has died, $P(D|B_1) = P(D)$, and
$P(D|B_2) = P(D)^2$ since both lines of his lineage must die out in order
for Bobo's lineage to die out.
$P(D) = 0.25 P(D|B_0) + 0.25 P(D|B_1) + 0.5 P(D|B_2) = 0.25 + 0.25 P(D) + 0.5 P(D)^2$
The roots of this quadratic are P(D) = 1/2 and P(D) = 1; the extinction
probability is the smaller root, P(D) = 1/2.
What is the expected number of cards that you draw before you pick
your first Ace in a shuffled deck? Answer - Consider a non-Ace.
Denote this to be card j. Let $I_j$ be the indicator that card j will be
drawn before the first Ace. Note that if j is before all 4 of the Aces in
the deck, then $I_j = 1$. The probability that this occurs is 1/5, because
out of 5 cards (the 4 Aces and the non-Ace), the probability that the
non-Ace comes first is 1/5. 1/5 here is the probability that any specific
non-Ace will appear before all of the Aces in the deck (e.g. the
probability that the Jack of Spades appears before all of the Aces).
Thus let X be the number of cards that are drawn before the first Ace.
Then $X = I_1 + I_2 + \dots + I_{48}$, where each indicator corresponds to one
of the 48 non-Aces. Thus,
$E(X) = E(I_1) + E(I_2) + \dots + E(I_{48}) = 48/5 = 9.6$
Coupon Collector
There are n total coupons, and each draw, you get a random coupon.
What is the expected number of coupons needed until you have a
complete set? Answer - Let N be the number of coupons needed; we
want E(N). Let $N = N_1 + \dots + N_n$, where $N_1$ is the number of draws to
draw our first distinct coupon, $N_2$ is the additional draws needed to
draw our second distinct coupon, and so on. By the story of First Success,
$N_2 \sim FS((n-1)/n)$ (after collecting the first coupon type, there's an
(n - 1)/n chance you'll get something new). Similarly,
$N_3 \sim FS((n-2)/n)$, and $N_j \sim FS((n-j+1)/n)$. By linearity,
$E(N) = E(N_1) + \dots + E(N_n) = \frac{n}{n} + \frac{n}{n-1} + \dots + \frac{n}{1} = n\sum_{j=1}^{n} \frac{1}{j}$
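The coupon collector answer, n times the sum of 1/j for j from 1 to n, can be compared against a direct simulation. This is a sketch under assumptions: n = 10 coupon types, 20,000 simulated collections, and the seed are hypothetical choices, and the function names are introduced only for this example.

```python
import random
from fractions import Fraction

def expected_draws(n):
    """Exact E(N) = n * (1 + 1/2 + ... + 1/n)."""
    return n * sum(Fraction(1, j) for j in range(1, n + 1))

def simulate_draws(n, rng):
    """Draw uniformly random coupons until all n types have been seen."""
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        draws += 1
    return draws

rng = random.Random(3)     # fixed seed so the run is repeatable
n, trials = 10, 20_000     # hypothetical number of coupon types and repetitions

exact = expected_draws(n)
simulated = sum(simulate_draws(n, rng) for _ in range(trials)) / trials
print(float(exact), simulated)
```

The simulated average should land near the exact value, with the usual Monte Carlo noise.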
Example Problems
I call 2 UberXs and 3 Lyfts at the same time. If the times it takes for
the rides to reach me are i.i.d., what is the probability that all the Lyfts
will arrive first? Answer - Since the arrival times of the five cars are
i.i.d., all 5! orderings of the arrivals are equally likely. There are 3!2!
orderings that involve the Lyfts arriving first, so the probability that
the Lyfts arrive first is $\frac{3!2!}{5!} = 1/10$. Alternatively, there are $\binom{5}{3}$
ways to choose 3 of the 5 slots for the Lyfts to occupy, where each of
the choices is equally likely. 1 of those choices has all 3 of the Lyfts
arriving first, thus the probability is $1/\binom{5}{3} = 1/10$.
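The UberX and Lyft answer can be confirmed by listing every arrival order. In this sketch the car labels are hypothetical, and treating all 5! orderings as equally likely is the assumption the problem's i.i.d. arrival times justify.

```python
from fractions import Fraction
from itertools import permutations

cars = ["U1", "U2", "L1", "L2", "L3"]  # hypothetical labels: 2 UberXs, 3 Lyfts

# All 5! arrival orders are equally likely when arrival times are i.i.d.
orderings = list(permutations(cars))

# Keep the orders in which the first three arrivals are all Lyfts.
lyfts_first = [order for order in orderings
               if all(car.startswith("L") for car in order[:3])]

probability = Fraction(len(lyfts_first), len(orderings))
print(len(lyfts_first), len(orderings), probability)
```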
Miscellaneous Definitions
Medians A continuous random variable X has median m if
$P(X \le m) = 50\%$
A discrete random variable X has median m if
$P(X \le m) \ge 50\%$ and $P(X \ge m) \ge 50\%$
Log Statisticians generally use log to refer to ln
i.i.d. random variables Independent, identically-distributed random
variables.
= a
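For X ∼ Pois(λ), find $E\left(\frac{1}{X+1}\right)$. Answer - By LOTUS,
$E\left(\frac{1}{X+1}\right) = \sum_{k=0}^{\infty} \frac{1}{k+1} \frac{e^{-\lambda}\lambda^k}{k!} = \frac{e^{-\lambda}}{\lambda} \sum_{k=0}^{\infty} \frac{\lambda^{k+1}}{(k+1)!} = \frac{e^{-\lambda}}{\lambda}(e^{\lambda} - 1)$
Calculating Probability
A textbook has n typos, which are randomly scattered amongst its n
pages. You pick a random page, what is the probability that it has no
typos? Answer - There is a $\left(1 - \frac{1}{n}\right)$ probability that any specific
typo isn't on your page, and thus a $\left(1 - \frac{1}{n}\right)^n$ probability that there
are no typos on your page. For n large, this is approximately
$e^{-1} = 1/e$ by a definition of $e^x$.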
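How quickly the no-typo probability $(1 - 1/n)^n$ approaches 1/e can be seen by evaluating it for a few page counts. The function name and the particular values of n below are illustrative assumptions.

```python
from math import exp

def prob_no_typos(n):
    """P(no typos on a given page) with n typos placed independently on n pages."""
    return (1 - 1 / n) ** n

# Compare the exact probability with the large-n approximation 1/e.
for n in (10, 100, 1000, 10_000):
    print(n, prob_no_typos(n), exp(-1))
```

The exact probability increases toward 1/e as n grows, so the approximation slightly overestimates it for finite n.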
$\frac{p(1-s)}{s}$
$\frac{p(1-p)(1-s)}{s} + \frac{p^2(1-s)}{s^2} = \frac{p(1-s)(p + s(1-p))}{s^2}$
$E(e^{tT}) = E(E(e^{tT}|N)) = E((pe^t + q)^N) = s\sum_{n=0}^{\infty} (pe^t + 1 - p)^n (1-s)^n$
$= \frac{s}{1 - (1-s)(pe^t + 1 - p)} = \frac{s}{s + (1-s)p - (1-s)pe^t}$
The MGF of X ∼ Geom(θ) is
$E(e^{tX}) = \frac{\theta}{1 - (1-\theta)e^t}$
So, we would want to try to get our MGF into this form to identify
what θ is. Taking our original MGF, it would appear that dividing by
s + (1 - s)p would allow us to do this. Therefore, we have that
$E(e^{tT}) = \frac{s}{s + (1-s)p - (1-s)pe^t} = \frac{\frac{s}{s + (1-s)p}}{1 - \frac{(1-s)p}{s + (1-s)p} e^t}$
$\theta = \frac{s}{s + (1-s)p}$
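$E(X^3) = \frac{6}{\lambda^3}$
But a much nicer way to use the MGF here is via pattern recognition:
note that M(t) looks like it came from a geometric series:
$\frac{1}{1 - \frac{t}{\lambda}} = \sum_{n=0}^{\infty} \left(\frac{t}{\lambda}\right)^n = \sum_{n=0}^{\infty} \frac{n!}{\lambda^n} \frac{t^n}{n!}$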
Markov Chains
Suppose Xn is a two-state Markov chain with transition matrix
0
Q=
1
0
1
1
,
+ +
=
= s1 q10
+
a) Is this Markov Chain irreducible? Is it aperiodic? Answer - Yes
to both. The Markov Chain is irreducible because it can get
from anywhere to anywhere else. The Markov Chain is also
aperiodic because the robber can return back to a square in
2, 3, 4, 5, . . . moves. Those numbers have a GCD of 1, so the chain
is aperiodic.
b) What is the stationary distribution of this Markov Chain? Answer
- Since this is a random walk on an undirected graph, the
stationary distribution is proportional to the degree sequence. The
degree for the corner pieces is 3, the degree for the edge pieces is 4,
and the degree for the center pieces is 6. To normalize this degree
sequence, we divide by its sum. The sum of the degrees is
6(3) + 6(4) + 7(6) = 84. Thus the stationary probability of being
on a corner is 3/84 = 1/28, on an edge is 4/84 = 1/21, and in the
center is 6/84 = 1/14.
c) What fraction of the time will the robber be in the desert in this
game? Answer - From above, 1/14.
d) Say the robber starts on the desert. What is the expected number
of moves it will take for the robber to return? Answer - Since
this chain is irreducible and aperiodic, to get the expected time to
return we can just invert the stationary probability. Thus on
average it will take 14 turns for the robber to return to the desert.
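The robber answers follow from normalizing the degree sequence, which a few lines of exact arithmetic can reproduce. The sketch assumes the board described in the answer (6 corner hexes of degree 3, 6 edge hexes of degree 4, 7 center hexes of degree 6, with the desert being a center hex); the dictionary layout and names are illustrative.

```python
from fractions import Fraction

# (degree, number of hexes of that type), taken from the answer in the text.
hex_types = {"corner": (3, 6), "edge": (4, 6), "center": (6, 7)}

# 6(3) + 6(4) + 7(6): the sum of all degrees on the board.
total_degree = sum(degree * count for degree, count in hex_types.values())

# Stationary probability of one specific hex of each type: degree / total degree.
stationary = {name: Fraction(degree, total_degree)
              for name, (degree, _) in hex_types.items()}

# The desert is a center hex; the expected return time is 1 / stationary probability.
expected_return_to_desert = 1 / stationary["center"]
print(total_degree, stationary, expected_return_to_desert)
```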
Biohazards
Section author: Jessy Hwang
1. Don't misuse the naive definition of probability - When
answering "What is the probability that in a group of 3 people,
no two have the same birth month?", it is not correct to treat
the people as indistinguishable balls being placed into 12 boxes,
since that assumes the list of birth months {January, January,
January} is just as likely as the list {January, April, June},
when the latter is six times more likely.
2. Don't confuse unconditional and conditional
probabilities, or go in circles with Bayes' Rule -
$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$. It is not correct to say "P(B) = 1
because we know that B happened"; P(B) is the probability
before we have information about whether B happened. It is
not correct to use P(A|B) in place of P(A) on the right-hand
side.
3. Don't assume independence without justification - In the
matching problem, the probability that card 1 is a match and
card 2 is a match is not $1/n^2$. - The Binomial and
Hypergeometric are often confused; the trials are independent
in the Binomial story and not independent in the
Hypergeometric story due to the lack of replacement.
4. Don't confuse random variables, numbers, and events. -
Let X be a r.v. Then f(X) is a r.v. for any function f. In
particular, $X^2$, |X|, F(X), and $I_{X>3}$ are r.v.s.
$P(X^2 < X | X \ge 0)$, E(X), Var(X), and f(E(X)) are numbers.
X = 2 and F(X) ≥ -1 are events. It does not make sense to
write $\int_{-\infty}^{\infty} F(X)\,dx$ because F(X) is a random variable. It does
not make sense to write P(X) because X is not an event.
5. A random variable is not the same thing as its
distribution - To get the PDF of $X^2$, you can't just square the
PDF of X. The right way is to use one variable transformations.
- To get the PDF of X + Y, you can't just add the PDF of X
and the PDF of Y. The right way is to compute the
convolution.
6. E(g(X)) does not equal g(E(X)) in general. - See the St.
Petersburg paradox for an extreme example. - The right way to
find E(g(X)) is with LotUS.
Recommended Resources
Distributions
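Distribution | PMF/PDF and Support | EV | Variance | MGF
Bernoulli Bern(p) | $P(X = 1) = p$, $P(X = 0) = q$ | $p$ | $pq$ | $q + pe^t$
Binomial Bin(n, p) | $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$, $k \in \{0, 1, 2, \dots, n\}$ | $np$ | $npq$ | $(q + pe^t)^n$
Geometric Geom(p) | $P(X = k) = q^k p$, $k \in \{0, 1, 2, \dots\}$ | $q/p$ | $q/p^2$ | $\frac{p}{1 - qe^t}$, $qe^t < 1$
Negative Binom. NBin(r, p) | $P(X = n) = \binom{r+n-1}{r-1} p^r q^n$, $n \in \{0, 1, 2, \dots\}$ | $rq/p$ | $rq/p^2$ | $\left(\frac{p}{1 - qe^t}\right)^r$, $qe^t < 1$
Hypergeometric HGeom(w, b, n) | $P(X = k) = \binom{w}{k}\binom{b}{n-k} / \binom{w+b}{n}$, $k \in \{0, 1, 2, \dots, n\}$ | $\mu = \frac{nw}{b+w}$ | $\frac{w+b-n}{w+b-1} \, n \frac{\mu}{n}\left(1 - \frac{\mu}{n}\right)$ | -
Poisson Pois(λ) | $P(X = k) = \frac{e^{-\lambda}\lambda^k}{k!}$, $k \in \{0, 1, 2, \dots\}$ | $\lambda$ | $\lambda$ | $e^{\lambda(e^t - 1)}$
Uniform Unif(a, b) | $f(x) = \frac{1}{b-a}$, $x \in (a, b)$ | $\frac{a+b}{2}$ | $\frac{(b-a)^2}{12}$ | $\frac{e^{tb} - e^{ta}}{t(b-a)}$
Normal N(μ, σ²) | $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}$, $x \in (-\infty, \infty)$ | $\mu$ | $\sigma^2$ | $e^{t\mu + \sigma^2 t^2/2}$
Exponential Expo(λ) | $f(x) = \lambda e^{-\lambda x}$, $x \in (0, \infty)$ | $1/\lambda$ | $1/\lambda^2$ | $\frac{\lambda}{\lambda - t}$, $t < \lambda$
Gamma Gamma(a, λ) | $f(x) = \frac{1}{\Gamma(a)} (\lambda x)^a e^{-\lambda x} \frac{1}{x}$, $x \in (0, \infty)$ | $a/\lambda$ | $a/\lambda^2$ | $\left(\frac{\lambda}{\lambda - t}\right)^a$, $t < \lambda$
Beta Beta(a, b) | $f(x) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} x^{a-1} (1-x)^{b-1}$, $x \in (0, 1)$ | $\mu = \frac{a}{a+b}$ | $\frac{\mu(1-\mu)}{a+b+1}$ | -
Chi-Squared $\chi^2_n$ | $f(x) = \frac{1}{2^{n/2}\Gamma(n/2)} x^{n/2-1} e^{-x/2}$, $x \in (0, \infty)$ | $n$ | $2n$ | $(1 - 2t)^{-n/2}$, $t < 1/2$
Multivar Uniform (A is support) | $f(x) = \frac{1}{\lvert A \rvert}$, $x \in A$ | - | - | -
Multinomial $\text{Mult}_k(n, \vec{p})$ | $P(\vec{X} = \vec{n}) = \binom{n}{n_1 \dots n_k} p_1^{n_1} \dots p_k^{n_k}$, $n = n_1 + n_2 + \dots + n_k$ | $n\vec{p}$ | $\text{Var}(X_i) = np_i(1 - p_i)$, $\text{Cov}(X_i, X_j) = -np_i p_j$ | $\left(\sum_{i=1}^{k} p_i e^{t_i}\right)^n$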
Inequalities
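Cauchy-Schwarz
$|E(XY)| \le \sqrt{E(X^2)E(Y^2)}$
Markov
$P(X \ge a) \le \frac{E|X|}{a}$ for a > 0
Chebychev
$P(|X - \mu_X| \ge a) \le \frac{\sigma_X^2}{a^2}$ for a > 0
Jensen
For convex g, $E(g(X)) \ge g(E(X))$; for concave g, $E(g(X)) \le g(E(X))$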