
Stat 504, Lecture 1

Review of Discrete Probability

An event is denoted by a capital letter near the
beginning of the alphabet (A, B, . . .). The probability
that A occurs is denoted by P (A).
Probability satisfies the following elementary
properties, called axioms; all other properties can be
derived from these.
1. 0 ≤ P (A) ≤ 1 for any event A;
2. P (not A) = 1 − P (A);
3. P (A or B) = P (A) + P (B) if A and B are
mutually exclusive events (i.e. A and B cannot
both happen simultaneously).
More generally, if A and B are any events then

P (A or B) = P (A) + P (B) − P (A and B). (1)

If A and B are mutually exclusive, then


P (A and B) = 0 and (1) reduces to axiom 3.

Conditional probability. If B is known to have
occurred, then this knowledge may affect the
probability of another event A. The probability of A
once B is known to have occurred is written P (A | B)
and called “the conditional probability of A given B,”
or, more simply, “the probability of A given B.” It is
defined as

P (A | B) = P (A and B) / P (B)        (2)

provided that P (B) ≠ 0.
Independence. The events A and B are said to be
independent if

P (A and B) = P (A) P (B). (3)

By (2), this implies P (A | B) = P (A) and
P (B | A) = P (B). Intuitively, independence means
that knowing A has occurred provides no information
about whether or not B has occurred and vice-versa.
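A minimal Python sketch of definition (3), using two fair coin tosses as an illustrative sample space (the events A and B below are chosen only for this example):

    from itertools import product

    # Sample space: the four equally likely outcomes of two fair coin tosses.
    outcomes = list(product(["H", "T"], repeat=2))
    prob = {w: 0.25 for w in outcomes}

    A = {w for w in outcomes if w[0] == "H"}   # first toss is heads
    B = {w for w in outcomes if w[1] == "H"}   # second toss is heads

    def P(event):
        return sum(prob[w] for w in event)

    # Independence: P(A and B) equals P(A) P(B); both print 0.25 here.
    print(P(A & B), P(A) * P(B))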

Random variables
A random variable is the outcome of an experiment
(i.e. a random process) expressed as a number. We
use capital letters near the end of the alphabet (X, Y ,
Z, etc.) to denote random variables. Random
variables are of two types: discrete and continuous.
Continuous random variables are described by
probability density functions. For example, a
normally distributed random variable has a
bell-shaped density function like this:
[Figure: bell-shaped normal density curve, f(x) plotted against x from about −3 to 3.]

The probability that X falls between any two
particular numbers, say a and b, is given by the area
under the density curve f (x) between a and b,

P (a ≤ X ≤ b) = ∫_a^b f (x) dx.
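As a sketch of what this area means computationally, the Python fragment below approximates the integral for the standard normal density with a simple trapezoid sum; the step size 1e-4 is an arbitrary illustrative choice.

    import math

    def normal_pdf(x):
        # standard normal (bell-shaped) density f(x)
        return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

    def prob_between(a, b, step=1e-4):
        # trapezoid approximation of the area under f(x) from a to b
        n = int((b - a) / step)
        xs = [a + i * step for i in range(n + 1)]
        ys = [normal_pdf(x) for x in xs]
        return step * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

    print(prob_between(-1, 1))   # about 0.6827
    print(prob_between(-2, 2))   # about 0.9545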
The two continuous random variables that we will use
most are the normal and the χ² (chi-square)
distributions. Areas under the normal and χ² density
functions are tabulated and widely available in
textbooks. They can also be computed with
statistical computer packages (e.g. Minitab).
Discrete random variables are described by
probability mass functions, which we will also call
“distributions.” For a random variable X, we will
write the distribution as f (x) and define it to be

f (x) = P (X = x).

In other words, f (x) is the probability that the


random variable X takes the specific value x. For
example, suppose that X takes the values 1, 2, and 5
with probabilities 1/4, 1/4, and 1/2 respectively.
Then we would say that f (1) = 1/4, f (2) = 1/4,
f (5) = 1/2, and f (x) = 0 for any x other than 1, 2, or
5:

          .25   x = 1, 2
f (x) =   .50   x = 5
          0     otherwise

A graph of f (x) has spikes at the possible values of
X, with the height of a spike indicating the
probability associated with that particular value:
[Figure: spike plot of f(x) over x = 1, . . . , 5, with spikes of height .25 at x = 1 and x = 2 and height .50 at x = 5.]

Note that Σ_x f (x) = 1 if the sum is taken over all
values of x having nonzero probability. In other
words, the sum of the heights of all the spikes must
equal one.
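In code, such a probability mass function is conveniently stored as a dictionary; here is a small sketch encoding the example above and checking that the spike heights sum to one.

    # pmf of the example: P(X = 1) = .25, P(X = 2) = .25, P(X = 5) = .50
    f = {1: 0.25, 2: 0.25, 5: 0.50}

    def pmf(x):
        # f(x) = P(X = x); zero for any value X cannot take
        return f.get(x, 0.0)

    print(pmf(5), pmf(3))    # 0.5 0.0
    print(sum(f.values()))   # 1.0 (the spike heights sum to one)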
Joint distribution. Suppose that X1 , X2 , . . . , Xn are n
random variables, and let X be the entire vector

X = (X1 , X2 , . . . , Xn ).

Let x = (x1 , x2 , . . . , xn ) denote a particular value that


X can take. The joint distribution of X is

f (x) = P (X = x)
= P (X1 = x1 , X2 = x2 , . . . , Xn = xn ).

In particular, suppose that the random variables
X1 , X2 , . . . , Xn are independent and identically
distributed (iid). Then X1 = x1 , X2 = x2 , . . . ,
Xn = xn are independent events, and the joint
distribution is

f (x) = P (X1 = x1 , X2 = x2 , . . . , Xn = xn )
      = P (X1 = x1 ) P (X2 = x2 ) · · · P (Xn = xn )
      = f (x1 ) f (x2 ) · · · f (xn )
      = ∏_{i=1}^n f (xi )

where f (xi ) refers to the distribution of Xi .
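For iid variables the joint probability of an observed vector is therefore just a product of marginal pmf values; a minimal sketch, reusing the illustrative pmf dictionary from above:

    from math import prod

    f = {1: 0.25, 2: 0.25, 5: 0.50}   # marginal pmf of each Xi

    def joint_pmf(xs):
        # f(x) = f(x1) f(x2) ... f(xn) under independence
        return prod(f.get(x, 0.0) for x in xs)

    print(joint_pmf([1, 5, 5]))   # 0.25 * 0.50 * 0.50 = 0.0625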


Moments
The expectation of a discrete random variable X is
defined to be

E(X) = Σ_x x f (x)

where the sum is taken over all possible values of X.


E(X) is also called the mean of X or the average of
X, because it represents the long-run average value if
the experiment were repeated infinitely many times.

In the trivial example where X takes the values 1, 2,
and 5 with probabilities 1/4, 1/4, and 1/2
respectively, the mean of X is

E(X) = 1(.25) + 2(.25) + 5(.5) = 3.25.

In calculating expectations, it helps to visualize a


table with two columns. The first column lists the
possible values x of the random variable X, and the
second column lists the probabilities f (x) associated
with these values:
x f (x)
1 .25
2 .25
5 .50
To calculate E(X) we merely multiply the two
columns together, row by row, and add up the
products: 1(.25) + 2(.25) + 5(.5) = 3.25.
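The same two-column recipe in Python, for the illustrative pmf used throughout:

    f = {1: 0.25, 2: 0.25, 5: 0.50}

    # E(X): multiply the x column by the f(x) column, row by row, and add.
    EX = sum(x * p for x, p in f.items())
    print(EX)   # 3.25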
If g(X) is a function of X (e.g. g(X) = log X,
g(X) = X², etc.) then g(X) is also a random
variable. Its expectation is

E( g(X) ) = Σ_x g(x) f (x).        (4)

Visually, in the table containing x and f (x), we can
simply insert a third column for g(x) and add up the
products g(x)f (x). In our example, if
Y = g(X) = X 3 , the table becomes

x    f (x)   g(x) = x³
1    .25     1³ = 1
2    .25     2³ = 8
5    .50     5³ = 125

and
E(Y ) = E(X³) = 1(.25) + 8(.25) + 125(.5) = 64.75.
If Y = g(X) = a + bX where a and b are constants,
then Y is said to be a linear function of X, and
E(Y ) = a + bE(X). An algebraic proof is

E(Y ) = Σ_y y f (y)
      = Σ_x (a + bx) f (x)
      = Σ_x a f (x) + Σ_x bx f (x)
      = a Σ_x f (x) + b Σ_x x f (x)
      = a · 1 + bE(X).
That is, if g(X) is linear, then E(g(X)) = g(E(X)).
Note, however, that this does not work if the function
g is nonlinear. For example, E(X²) is not equal to
(E(X))², and E(log X) is not equal to log E(X). To
calculate E(X²) or E(log X), we need to use (4).
Variance. The variance of a discrete random variable,
denoted by V (X), is defined to be

V (X) = E( (X − E(X))² )
      = Σ_x (x − E(X))² f (x).

That is, V (X) is the average squared distance
between X and its mean. Variance is a measure of
dispersion, telling us how “spread out” a distribution
is. For our simple random variable, the variance is

V (X) = (1 − 3.25)²(.25) + (2 − 3.25)²(.25) + (5 − 3.25)²(.50)
      = 3.1875.

A slightly easier way to calculate the variance is to
use the well-known identity

V (X) = E(X²) − ( E(X) )².

Visually, this method requires a table with three
columns: x, f (x), and x².

x    f (x)   x²
1    .25     1² = 1
2    .25     2² = 4
5    .50     5² = 25

First we calculate
E(X) = 1(.25) + 2(.25) + 5(.50) = 3.25 and
E(X²) = 1(.25) + 4(.25) + 25(.50) = 13.75. Then
V (X) = 13.75 − (3.25)² = 3.1875.
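Both routes to the variance (and the general formula (4) for E(g(X))) are easy to verify in a short sketch; the numbers match the hand calculations above.

    f = {1: 0.25, 2: 0.25, 5: 0.50}

    def expect(g):
        # E(g(X)) = sum of g(x) f(x) over x, formula (4)
        return sum(g(x) * p for x, p in f.items())

    EX  = expect(lambda x: x)        # 3.25
    EX2 = expect(lambda x: x ** 2)   # 13.75
    EX3 = expect(lambda x: x ** 3)   # 64.75, the E(X^3) example above

    var_def      = expect(lambda x: (x - EX) ** 2)   # from the definition
    var_shortcut = EX2 - EX ** 2                     # E(X^2) - (E(X))^2

    print(EX3, var_def, var_shortcut)   # 64.75 3.1875 3.1875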
It can be shown that if a and b are constants, then

V (a + bX) = b2 V (X).

In other words, adding a constant a to a random


variable does not change its variance, and multiplying
a random variable by a constant b causes the variance
to be multiplied by b2 .
Another common measure of dispersion is the
standard deviation, which is merely the positive
square root of the variance,

SD(X) = √V (X).

Mean and variance of a sum of random variables.
Expectation is always additive; that is, if X and Y
are any random variables, then

E(X + Y ) = E(X) + E(Y ).

If X and Y are independent random variables, then


their variances will also add:

V (X + Y ) = V (X) + V (Y ) if X, Y independent.

More generally, if X and Y are any random variables,


then

V (X + Y ) = V (X) + V (Y ) + 2 Cov(X, Y )

where Cov(X, Y ) is the covariance between X and Y ,

Cov(X, Y ) = E( (X − E(X)) (Y − E(Y )) ).

If X and Y are independent (or merely uncorrelated)


then Cov(X, Y ) = 0. This additive rule for variances
extends to three or more random variables; e.g.,

V (X + Y + Z) = V (X) + V (Y ) + V (Z)
+ 2 Cov(X, Y ) + 2 Cov(X, Z) + 2 Cov(Y, Z),

with all covariances equal to zero if X, Y , and Z are


mutually uncorrelated.
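These additivity rules can be checked by brute-force enumeration; the sketch below builds the exact distribution of X + Y for two independent copies of the illustrative variable and confirms that the variances add.

    f = {1: 0.25, 2: 0.25, 5: 0.50}

    def variance(pmf):
        mean = sum(x * p for x, p in pmf.items())
        return sum((x - mean) ** 2 * p for x, p in pmf.items())

    # Exact distribution of S = X + Y for independent X, Y with pmf f.
    pmf_sum = {}
    for x, px in f.items():
        for y, py in f.items():
            pmf_sum[x + y] = pmf_sum.get(x + y, 0.0) + px * py

    # Cov(X, Y) = 0 under independence, so V(X + Y) = V(X) + V(Y) = 6.375.
    print(variance(pmf_sum), 2 * variance(f))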
Bernoulli distribution
The most basic of all discrete random variables is the
Bernoulli. X is said to have a Bernoulli distribution if
X = 1 occurs with probability p and X = 0 occurs
with probability 1 − p,

          p       x = 1
f (x) =   1 − p   x = 0
          0       otherwise.
Another common way to write it is

f (x) = p^x (1 − p)^{1−x}   for x = 0, 1.

Suppose an experiment has only two possible


outcomes, “success” and “failure,” and let p be the
probability of a success. If we let X denote the
number of successes (either zero or one), then X will
be Bernoulli. The mean of a Bernoulli is

E(X) = 1(p) + 0(1 − p) = p,

and the variance of a Bernoulli is

V (X) = E(X²) − ( E(X) )²
      = 1²p + 0²(1 − p) − p²
      = p(1 − p).

Binomial distribution
Suppose that X1 , X2 , . . . , Xn are independent and
identically distributed (iid) Bernoulli random
variables, each having the distribution

f (xi ) = p^{xi} (1 − p)^{1−xi}   for xi = 0, 1.

Let X = Σ_{i=1}^n Xi . Then X is said to have a binomial
distribution with parameters n and p,

X ∼ Bin(n, p).

Suppose that an experiment consists of n repeated


Bernoulli-type trials, each trial resulting in a
“success” with probability p and a “failure” with
probability 1 − p. If all the trials are
independent—that is, if the probability of success on
any trial is unaffected by the outcome of any other
trial—then the total number of successes in the
experiment will have a binomial distribution. The
binomial distribution can be written as
f (x) = [n! / (x! (n − x)!)] p^x (1 − p)^{n−x}   for x = 0, 1, 2, . . . , n.

Note that X will not have a binomial distribution if
the probability of success p is not constant from trial
to trial, or if the trials are not entirely independent
(i.e. a success or failure on one trial alters the
probability of success on another trial).
The Bernoulli distribution is a special case of the
binomial with n = 1. That is, X ∼ Bin(1, p) means
that X has a Bernoulli distribution with success
probability p.
One can show algebraically that if X ∼ Bin(n, p) then
E(X) = np and V (X) = np(1 − p). An easier way to
arrive at these results is to note that
X = X1 + X2 + . . . + Xn where X1 , X2 , . . . , Xn are iid
Bernoulli random variables. Then, by the additive
properties of mean and variance,

E(X) = E(X1 ) + E(X2 ) + · · · + E(Xn )
     = np

and

V (X) = V (X1 ) + V (X2 ) + · · · + V (Xn )
      = np(1 − p).
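A short sketch with Python's math.comb confirms these formulas for one illustrative choice of parameters (n = 5, p = 0.3):

    from math import comb

    n, p = 5, 0.3   # illustrative values

    def binom_pmf(x):
        # f(x) = [n! / (x!(n - x)!)] p^x (1 - p)^(n - x)
        return comb(n, x) * p ** x * (1 - p) ** (n - x)

    EX = sum(x * binom_pmf(x) for x in range(n + 1))
    VX = sum((x - EX) ** 2 * binom_pmf(x) for x in range(n + 1))

    print(sum(binom_pmf(x) for x in range(n + 1)))   # 1.0 (up to rounding)
    print(EX, n * p)                                 # both about 1.5
    print(VX, n * p * (1 - p))                       # both about 1.05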

Poisson distribution
The Poisson is a limiting case of the binomial.
Suppose that X ∼ Bin(n, p) and let n → ∞ and p → 0
in such a way that np → λ where λ is a constant.
Then, in the limit, X will have a Poisson distribution
with parameter λ. The notation X ∼ P (λ) will mean
“X has a Poisson distribution with parameter λ.” The
Poisson probability distribution is
f (x) = λ^x e^{−λ} / x!   for x = 0, 1, 2, . . .
The mean and the variance of the Poisson are both λ;
that is, E(X) = V (X) = λ. Note that the parameter
λ must always be positive; negative values are not
allowed.
Because the Poisson is the limit of the Bin(n, p), it is
useful as an approximation to the binomial when n is
large and p is small. That is, if n is large and p is
small, then

[n! / (x! (n − x)!)] p^x (1 − p)^{n−x} ≈ λ^x e^{−λ} / x!        (5)
where λ = np. The right-hand side of (5) is typically
less tedious and easier to calculate than the left-hand
side.
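A quick numerical look at the two sides of (5), with the illustrative values n = 1000 and p = 0.003, so that λ = np = 3:

    from math import comb, exp, factorial

    n, p = 1000, 0.003   # large n, small p (illustrative)
    lam = n * p          # lambda = 3

    for x in range(6):
        binom = comb(n, x) * p ** x * (1 - p) ** (n - x)
        poisson = lam ** x * exp(-lam) / factorial(x)
        print(x, round(binom, 4), round(poisson, 4))
    # The two columns agree to roughly three decimal places.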

Aside from its use as an approximation to the


binomial, the Poisson distribution is also an
important probability model in its own right. It is
often used to model discrete events occurring in time
or in space. For example, suppose that X is the
number of telephone calls arriving at a switchboard in
one hour. Suppose that in the long run, the average
number of telephone calls per hour is λ. Then it may
be reasonable to assume X ∼ P (λ). For the Poisson
model to hold, however, the average arrival rate λ
must be fairly constant over time; that is, there should
be no systematic or predictable changes in the arrival
rate. Moreover, the arrivals should be independent of
one another; that is, the arrival of one call should not
make the arrival of another call more or less likely.

Likelihood function
One of the most fundamental concepts of modern
statistics is that of likelihood. In each of the discrete
random variables we have considered thus far, the
distribution depends on one or more parameters that
are, in most statistical applications, unknown. In the
Poisson distribution, the parameter is λ. In the
binomial, the parameter of interest is p (since n is
typically fixed and known).
Likelihood is a tool for summarizing the data’s
evidence about parameters. Let us denote the
unknown parameter(s) of a distribution generically by
θ. Since the probability distribution depends on θ, we
can make this dependence explicit by writing f (x) as
f (x ; θ). For example, in the Bernoulli distribution the
parameter is θ = p, and the distribution is

f (x ; p) = p^x (1 − p)^{1−x}   x = 0, 1.        (6)

Once a value of X has been observed, we can plug


this observed value x into f (x ; p) and obtain a
function of p only. For example, if we observe X = 1,
then plugging x = 1 into (6) gives the function p. If
we observe X = 0, the function becomes 1 − p.
Whatever function of the parameter results when we
plug the observed data x into f (x ; θ) is called the
likelihood function.
We will write the likelihood function as L(θ ; x) or
sometimes just L(θ). Algebraically, the likelihood
L(θ ; x) is just the same as the distribution f (x ; θ),
but its meaning is quite different because it is
regarded as a function of θ rather than a function of
x. Consequently, a graph of the likelihood usually
looks very different from a graph of the probability
distribution.
For example, suppose that X has a Bernoulli
distribution with unknown parameter p. We can
graph the probability distribution for any fixed value
of p. For example, if p = .5 we get this:
[Figure: Bernoulli(.5) distribution f(x), spikes of height .50 at x = 0 and x = 1.]
Now suppose that we observe a value of X, say
X = 1. Plugging x = 1 into the distribution
p^x (1 − p)^{1−x} gives the likelihood function L(p ; x) = p,
which looks like this:

[Figure: the likelihood L(p ; x) = p, a straight line rising from 0 at p = 0 to 1 at p = 1.]

For discrete random variables, a graph of the


probability distribution f (x ; θ) has spikes at specific
values of x, whereas a graph of the likelihood L(θ ; x)
is a continuous curve (e.g. a line) over the parameter
space, the domain of possible values for θ.
L(θ ; x) summarizes the evidence about θ contained in
the event X = x. L(θ ; x) is high for values of θ that
make X = x likely, and small for values of θ that
make X = x unlikely. In the Bernoulli example,
observing X = 1 gives some (albeit weak) evidence
that p is nearer to 1 than to 0, so the likelihood for
x = 1 rises as p moves from 0 to 1.
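The sketch below makes the distinction concrete by evaluating the same expression p^x (1 − p)^{1−x} both ways: as a distribution in x for fixed p, and as a likelihood in p for the fixed observation x = 1.

    def bernoulli(x, p):
        # the expression p^x (1 - p)^(1 - x)
        return p ** x * (1 - p) ** (1 - x)

    # As a distribution: fix p = 0.5, vary x over {0, 1}.
    print([bernoulli(x, 0.5) for x in (0, 1)])          # [0.5, 0.5]

    # As a likelihood: fix the observation x = 1, vary p.
    print([bernoulli(1, p) for p in (0.1, 0.5, 0.9)])   # [0.1, 0.5, 0.9], i.e. L(p ; 1) = p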
Maximum-likelihood (ML) estimation
Suppose that an experiment consists of n = 5
independent Bernoulli trials, each having probability
of success p. Let X be the total number of successes
in the trials, so that X ∼ Bin(5, p). If the outcome is
X = 3, the likelihood is
L(p ; x) = [n! / (x! (n − x)!)] p^x (1 − p)^{n−x}
         = [5! / (3! (5 − 3)!)] p³ (1 − p)^{5−3}
         ∝ p³ (1 − p)²

where the constant at the beginning is ignored. A
graph of L(p ; x) = p³(1 − p)² over the unit interval
p ∈ (0, 1) looks like this:
[Figure: L(p ; x) = p³(1 − p)² plotted for p from 0 to 1; the curve rises from 0, peaks near p = .6, and falls back to 0 at p = 1.]

It’s interesting that this function reaches its
maximum value at p = .6. An intelligent person
would have said that if we observe 3 successes in 5
trials, a reasonable estimate of the long-run
proportion of successes p would be 3/5 = .6.
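A crude grid search over p confirms where p³(1 − p)² peaks (the grid spacing of 0.001 is an arbitrary choice):

    def likelihood(p, x=3, n=5):
        # binomial likelihood with the constant n!/(x!(n - x)!) dropped
        return p ** x * (1 - p) ** (n - x)

    grid = [i / 1000 for i in range(1001)]
    p_hat = max(grid, key=likelihood)
    print(p_hat)   # 0.6, matching x/n = 3/5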
This example suggests that it may be reasonable to
estimate an unknown parameter θ by the value for
which the likelihood function L(θ ; x) is largest. This
approach is called maximum-likelihood (ML)
estimation. We will denote the value of θ that
maximizes the likelihood function by θ̂, read “theta
hat.” θ̂ is called the maximum-likelihood estimate
(MLE) of θ.
Finding MLE’s usually involves techniques of
differential calculus. To maximize L(θ ; x) with
respect to θ, we first calculate the derivative of
L(θ ; x) with respect to θ, set the derivative equal to
zero, and solve the resulting equation for θ. These
computations can often be simplified by maximizing
the loglikelihood function,

l(θ ; x) = log L(θ ; x),

where “log” means natural log (logarithm to the base


e). Because the natural log is an increasing function,
maximizing the loglikelihood is the same as
maximizing the likelihood. The loglikelihood often
has a much simpler form than the likelihood and is
usually easier to differentiate.
In Stat 504 you will not be asked to derive MLE’s by
yourself. In most of the probability models that we
will use later in the course (logistic regression,
loglinear models, etc.) no explicit formulas for MLE’s
are available, and we will have to rely on computer
packages to calculate the MLE’s for us. For the
simple probability models we have seen thus far,
however, explicit formulas for MLE’s are available and
are given below.
ML for Bernoulli trials. If our experiment is a single
Bernoulli trial and we observe X = 1 (success) then
the likelihood function is L(p ; x) = p. This function
reaches its maximum at p̂ = 1. If we observe X = 0
(failure) then the likelihood is L(p ; x) = 1 − p, which
reaches its maximum at p̂ = 0. Of course, it is
somewhat silly for us to try to make formal inferences
about θ on the basis of a single Bernoulli trial; usually
multiple trials are available.
Suppose that X = (X1 , X2 , . . . , Xn ) represents the
outcomes of n independent Bernoulli trials, each with
success probability p. The likelihood for p based on X
is defined as the joint probability distribution of
X1 , X2 , . . . , Xn . Since X1 , X2 , . . . , Xn are iid random
variables, the joint distribution is

L(p ; x) = f (x ; p)
         = ∏_{i=1}^n f (xi ; p)
         = ∏_{i=1}^n p^{xi} (1 − p)^{1−xi}
         = p^{Σ_{i=1}^n xi} (1 − p)^{n − Σ_{i=1}^n xi}.

Differentiating the log of L(p ; x) with respect to p and
setting the derivative to zero shows that this function
achieves a maximum at p̂ = Σ_{i=1}^n xi / n. Since Σ_{i=1}^n xi
is the total number of successes observed in the n
trials, p̂ is the observed proportion of successes in the
n trials. We often call p̂ the sample proportion to
distinguish it from p, the “true” or “population”
proportion. For repeated Bernoulli trials, the MLE p̂
is the sample proportion of successes.
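A simulation sketch (the values p = 0.3 and n = 1000 are arbitrary): generate iid Bernoulli trials, take the sample proportion, and check that it also maximizes the loglikelihood on a grid.

    import math
    import random

    random.seed(0)
    p_true, n = 0.3, 1000
    x = [1 if random.random() < p_true else 0 for _ in range(n)]

    p_hat = sum(x) / n   # MLE: the sample proportion of successes

    def loglik(p):
        s = sum(x)       # total number of successes
        return s * math.log(p) + (n - s) * math.log(1 - p)

    grid = [i / 1000 for i in range(1, 1000)]   # avoid p = 0 and p = 1
    print(p_hat, max(grid, key=loglik))         # the grid maximizer matches p_hat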
ML for Binomial. Suppose that X is an observation
from a binomial distribution, X ∼ Bin(n, p), where n
is known and p is to be estimated. The likelihood
function is
L(p ; x) = [n! / (x! (n − x)!)] p^x (1 − p)^{n−x},
which, except for the factor n!/(x! (n − x)!), is
identical to the likelihood from n independent

Bernoulli trials with x = Σ_{i=1}^n xi . But since the
likelihood function is regarded as a function only of
the parameter p, the factor n!/(x! (n − x)!) is a fixed
constant and does not affect the MLE. Thus the MLE
is again p̂ = x/n, the sample proportion of successes.
The fact that the MLE based on n independent
Bernoulli random variables and the MLE based on a
single binomial random variable are the same is not
surprising, since the binomial is the result of n
independent Bernoulli trials anyway. In general,
whenever we have repeated, independent Bernoulli
trials with the same probability of success p for each
trial, the MLE will always be the sample proportion
of successes. This is true regardless of whether we
know the outcomes of the individual trials
X1 , X2 , . . . , Xn , or just the total number of successes
for all trials X = Σ_{i=1}^n Xi .

Suppose now that we have a sample of iid binomial


random variables. For example, suppose that
X1 , X2 , . . . , X10 are an iid sample from a binomial
distribution with n = 5 and p unknown. Since each
Xi is actually the total number of successes in 5
independent Bernoulli trials, and since the Xi ’s are

independent of one another, their sum X = Σ_{i=1}^{10} Xi
is actually the total number of successes in 50
independent Bernoulli trials. Thus X ∼ Bin(50, p)
and the MLE is p̂ = x/n, the observed proportion of
successes across all 50 trials. Whenever we have
independent binomial random variables with a
common p, we can always add them together to get a
single binomial random variable.
Adding the binomial random variables together
produces no loss of information about p if the model
is true. But collapsing the data in this way may limit
our ability to diagnose model failure, i.e. to check
whether the binomial model is really appropriate.


ML for Poisson. Suppose that X = (X1 , X2 , . . . , Xn )
are iid observations from a Poisson distribution with
unknown parameter λ. The likelihood function is

L(λ ; x) = ∏_{i=1}^n f (xi ; λ)
         = ∏_{i=1}^n λ^{xi} e^{−λ} / xi !
         = λ^{Σ_{i=1}^n xi} e^{−nλ} / (x1 ! x2 ! · · · xn !).
By differentiating the log of this function with respect
to λ, one can show that the maximum is achieved at
λ̂ = Σ_{i=1}^n xi / n. Thus, for a Poisson sample, the MLE
for λ is just the sample mean.
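The same conclusion in a short sketch, on a small hypothetical sample of counts: the sample mean maximizes the Poisson loglikelihood.

    import math

    x = [2, 0, 3, 1, 4, 2, 2, 1]   # hypothetical Poisson counts
    n = len(x)

    lam_hat = sum(x) / n           # MLE: the sample mean, 1.875 here

    def loglik(lam):
        # log Poisson likelihood; the xi! terms do not involve lambda and are dropped
        return sum(xi * math.log(lam) - lam for xi in x)

    grid = [i / 1000 for i in range(1, 5001)]   # lambda from 0.001 to 5
    print(lam_hat, max(grid, key=loglik))       # both 1.875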
Next time: What happens to the loglikelihood as n
gets large.

