Week 1

Attendance

Probability Distributions
N.D. Amara

Random Variable

• A random variable x takes on a defined set of values with different probabilities.
  • For example, if you roll a die, the outcome is random (not fixed) and there are 6 possible outcomes, each of which occurs with probability one-sixth.
  • For example, if you poll people about their voting preferences, the percentage of the sample that responds “Yes on Proposition 100” is also a random variable (the percentage will be slightly different every time you poll).
• Roughly, probability is how frequently we expect different outcomes to occur if we repeat the experiment over and over (the “frequentist” view).
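
The frequentist reading of probability can be illustrated with a small simulation. This is only a sketch: the function name `roll_frequencies`, the seed, and the number of rolls are illustrative choices, not part of the lecture.

```python
import random
from collections import Counter

def roll_frequencies(n_rolls, seed=0):
    """Roll a fair six-sided die n_rolls times; return each face's relative frequency."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}

freqs = roll_frequencies(60_000)
for face, freq in freqs.items():
    # Each relative frequency should land close to 1/6 (about 0.167).
    print(face, round(freq, 3))
```

With more rolls, the relative frequencies settle closer to one-sixth, which is what "repeating the experiment over and over" means in practice.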

Random variables can be discrete or continuous

1. Discrete random variables have a countable number of outcomes.
   a. Examples: dead/alive, treatment/placebo, dice, counts, etc.
2. Continuous random variables have an infinite continuum of possible values.
   a. Examples: blood pressure, weight, the speed of a car, the real numbers from 1 to 6.

Probability functions

● A probability function maps the possible values of x against their respective probabilities of occurrence, p(x).
● p(x) is a number from 0 to 1.0.
● The area under a probability function is always 1.

Discrete example: roll of a die

[Figure: bar chart of p(x) against x = 1, 2, 3, 4, 5, 6; each bar has height 1/6.]

$\sum_{\text{all } x} P(x) = 1$

Probability mass function (pmf)

x      p(x)
1      p(x=1) = 1/6
2      p(x=2) = 1/6
3      p(x=3) = 1/6
4      p(x=4) = 1/6
5      p(x=5) = 1/6
6      p(x=6) = 1/6
Total  1.0

Cumulative distribution function (CDF)

[Figure: step plot of P(x) against x = 1, 2, 3, 4, 5, 6, with the vertical axis marked at 1/6, 1/3, 1/2, 2/3, 5/6 and 1.0.]

Cumulative distribution function

x   P(x≤A)
1   P(x≤1) = 1/6
2   P(x≤2) = 2/6
3   P(x≤3) = 3/6
4   P(x≤4) = 4/6
5   P(x≤5) = 5/6
6   P(x≤6) = 6/6

Examples

1. What’s the probability that you roll a 3 or less?
   P(x≤3) = 1/2

2. What’s the probability that you roll a 5 or higher?
   P(x≥5) = 1 - P(x≤4) = 1 - 2/3 = 1/3
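
The two die examples can be reproduced by building the CDF as a running total of the pmf. The sketch below uses exact fractions; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import accumulate

faces = [1, 2, 3, 4, 5, 6]
pmf = {x: Fraction(1, 6) for x in faces}  # fair die: every face has probability 1/6

# CDF: running total of the pmf, so cdf[A] = P(x <= A)
cdf = dict(zip(faces, accumulate(pmf[x] for x in faces)))

p_three_or_less = cdf[3]      # P(x <= 3)
p_five_or_more = 1 - cdf[4]   # P(x >= 5) = 1 - P(x <= 4)
print(p_three_or_less, p_five_or_more)  # prints: 1/2 1/3
```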

Practice Problem

Which of the following are probability functions?

a. f(x) = .25 for x = 9, 10, 11, 12
b. f(x) = (3-x)/2 for x = 1, 2, 3, 4
c. f(x) = (x^2 + x + 1)/25 for x = 0, 1, 2, 3

Answer (a)

a. f(x) = .25 for x = 9, 10, 11, 12

x      f(x)
9      .25
10     .25
11     .25
12     .25
Total  1.0

Yes, probability function!

Answer (b)

b. f(x) = (3-x)/2 for x = 1, 2, 3, 4

x   f(x)
1   (3-1)/2 = 1.0
2   (3-2)/2 = .5
3   (3-3)/2 = 0
4   (3-4)/2 = -.5

Though this sums to 1, you can’t have a negative probability; therefore, it’s not a probability function.

Answer (c)

c. f(x) = (x^2 + x + 1)/25 for x = 0, 1, 2, 3

x      f(x)
0      1/25
1      3/25
2      7/25
3      13/25
Total  24/25

Doesn’t sum to 1. Thus, it’s not a probability function.
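
The two conditions used in these answers (every value between 0 and 1, and a total of exactly 1) can be checked mechanically. The helper name `is_probability_function` is hypothetical; this is a minimal sketch using exact fractions to avoid rounding issues.

```python
from fractions import Fraction

def is_probability_function(f, support):
    """True if every f(x) lies in [0, 1] and the values sum to exactly 1."""
    values = [Fraction(f(x)) for x in support]
    return all(0 <= v <= 1 for v in values) and sum(values) == 1

a = is_probability_function(lambda x: Fraction(1, 4), [9, 10, 11, 12])
b = is_probability_function(lambda x: Fraction(3 - x, 2), [1, 2, 3, 4])
c = is_probability_function(lambda x: Fraction(x**2 + x + 1, 25), [0, 1, 2, 3])
print(a, b, c)  # prints: True False False
```

Candidate (b) fails on the negative value at x = 4 even though its total is 1; candidate (c) fails because its total is 24/25.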

Practice Problem:

● The number of ships to arrive at a harbor on any given day is a random variable represented by x. The probability distribution for x is:

x     10   11   12   13   14
P(x)  .4   .2   .2   .1   .1

Find the probability that on a given day:

a. Exactly 14 ships arrive: p(x=14) = .1
b. At least 12 ships arrive: p(x≥12) = (.2 + .1 + .1) = .4
c. At most 11 ships arrive: p(x≤11) = (.4 + .2) = .6
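
Each part of the ship problem is a sum of pmf entries over the values that satisfy a condition. A small sketch, with the hypothetical helper `prob` doing the summing:

```python
ships_pmf = {10: 0.4, 11: 0.2, 12: 0.2, 13: 0.1, 14: 0.1}

def prob(pmf, condition):
    """Sum the probabilities of all outcomes x for which condition(x) is true."""
    return sum(p for x, p in pmf.items() if condition(x))

p_exactly_14 = prob(ships_pmf, lambda x: x == 14)
p_at_least_12 = prob(ships_pmf, lambda x: x >= 12)
p_at_most_11 = prob(ships_pmf, lambda x: x <= 11)

# Rounded only to hide floating-point noise in the sums.
print(round(p_exactly_14, 2), round(p_at_least_12, 2), round(p_at_most_11, 2))
```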

Practice Problem:

You are lecturing to a group of 1000 students. You ask them to each randomly pick an integer between 1 and 10. Assuming their picks are truly random:

• What’s your best guess for how many students picked the number 9?
  Since p(x=9) = 1/10, we’d expect about 1/10th of the 1000 students to pick 9: 100 students.

• What percentage of the students would you expect picked a number less than or equal to 6?
  Since p(x≤6) = 1/10 + 1/10 + 1/10 + 1/10 + 1/10 + 1/10 = .6, we’d expect 60%.

Important discrete distributions in epidemiology…

● Binomial
  ○ Yes/no outcomes (dead/alive, treated/untreated, smoker/non-smoker, sick/well, etc.)
● Poisson
  ○ Counts (e.g., how many cases of disease in a given area)

Continuous case

The probability function that accompanies a continuous random variable is a continuous mathematical function that integrates to 1.

The probabilities associated with continuous functions are just areas under the curve (integrals!).

Probabilities are given for a range of values, rather than a particular value (e.g., the probability of getting a math SAT score between 700 and 800 is 2%).

Continuous case

For example, recall the negative exponential function (in probability, this is called an “exponential distribution”):

$f(x) = e^{-x}$

This function integrates to 1:

$\int_0^{\infty} e^{-x}\,dx = \left. -e^{-x} \right|_0^{\infty} = 0 + 1 = 1$

Continuous case: “probability density function” (pdf)

[Figure: the curve p(x) = e^{-x} plotted against x, with height 1 marked on the vertical axis.]

The probability that x is any exact particular value (such as 1.9976) is 0; we can only assign probabilities to possible ranges of x.

For example, the probability of x falling within 1 to 2:

[Figure: the curve p(x) = e^{-x} plotted against x, with x = 1 and x = 2 marked on the horizontal axis.]

$P(1 \le x \le 2) = \int_1^2 e^{-x}\,dx = \left. -e^{-x} \right|_1^2 = -e^{-2} + e^{-1} = -.135 + .368 = .23$
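
The "area under the curve" can also be approximated numerically and compared with the closed form. The sketch below uses a plain midpoint rule; the function `integrate` and the step count are illustrative choices, not a method from the lecture.

```python
import math

def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

def pdf(x):
    """Exponential density p(x) = e^(-x), for x >= 0."""
    return math.exp(-x)

numeric = integrate(pdf, 1, 2)
closed_form = math.exp(-1) - math.exp(-2)  # -e^(-2) + e^(-1)
print(round(numeric, 4), round(closed_form, 4))
```

Both values round to about .23, matching the hand calculation.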

Cumulative distribution function

As in the discrete case, we can specify the “cumulative distribution function” (CDF):

The CDF here = P(x≤A) =

$\int_0^A e^{-x}\,dx = \left. -e^{-x} \right|_0^A = -e^{-A} - (-e^{0}) = -e^{-A} + 1 = 1 - e^{-A}$

Example

[Figure: p(x) plotted against x, with height 1 marked on the vertical axis and x = 2 marked on the horizontal axis.]

$P(x \le 2) = 1 - e^{-2} = 1 - .135 = .865$

Example 2: Uniform distribution

The uniform distribution: all values are equally likely.

f(x) = 1, for 0 ≤ x ≤ 1

[Figure: p(x) plotted against x; a flat line at height 1 from x = 0 to x = 1.]

We can see it’s a probability distribution because it integrates to 1 (the area under the curve is 1):

$\int_0^1 1\,dx = \left. x \right|_0^1 = 1 - 0 = 1$

Example: Uniform distribution

What’s the probability that x is between ¼ and ½?

[Figure: p(x) plotted against x; a flat line at height 1 from x = 0 to x = 1, with ¼ and ½ marked on the horizontal axis.]

P(¼ ≤ x ≤ ½) = ¼

Practice Problem

4. Suppose that survival drops off rapidly in the year following diagnosis of a certain type of advanced cancer. Suppose that the length of survival (or time-to-death) is a random variable that approximately follows an exponential distribution with parameter 2 (makes it a steeper drop off):

probability function: $p(x = T) = 2e^{-2T}$

[note: $\int_0^{\infty} 2e^{-2x}\,dx = \left. -e^{-2x} \right|_0^{\infty} = 0 + 1 = 1$]

What’s the probability that a person who is diagnosed with this illness survives a year?

Answer

The probability of dying within 1 year can be calculated using the cumulative distribution function:

Cumulative distribution function is:

$P(x \le T) = \left. -e^{-2x} \right|_0^T = 1 - e^{-2T}$

The chance of surviving past 1 year is: P(x≥1) = 1 - P(x≤1)

$1 - (1 - e^{-2(1)}) = .135$
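
The same CDF can be written once with the rate as a parameter, which covers both the rate-1 example and this rate-2 survival problem. The function name `exponential_cdf` is an illustrative choice.

```python
import math

def exponential_cdf(t, rate):
    """P(x <= t) for an exponential distribution with the given rate: 1 - e^(-rate * t)."""
    return 1 - math.exp(-rate * t)

p_die_within_year = exponential_cdf(1, rate=2)
p_survive_year = 1 - p_die_within_year          # equals e^(-2)
p_at_most_two_rate_one = exponential_cdf(2, rate=1)

print(round(p_survive_year, 3))          # about .135
print(round(p_at_most_two_rate_one, 3))  # about .865
```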

Expected Value and Variance

● All probability distributions are characterized by an expected value and a variance (standard deviation squared).

For example, bell-curve (normal) distribution:

[Figure: bell curve labeled with the mean (µ) and one standard deviation from the mean (σ).]

Expected value, or mean

● If we understand the underlying probability function of a certain phenomenon, then we can make informed decisions based on how we expect x to behave on average over the long run (the so-called “frequentist” theory of probability).

● Expected value is just the weighted average or mean (µ) of random variable x. Imagine placing the masses p(x) at the points x on a beam; the balance point of the beam is the expected value of x.

Example: expected value

● Recall the following probability distribution of ship arrivals:

x     10   11   12   13   14
P(x)  .4   .2   .2   .1   .1
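
Treating the expected value as a weighted average, each number of ships is multiplied by its probability and the products are summed. A minimal sketch for the ship-arrival table:

```python
ships_pmf = {10: 0.4, 11: 0.2, 12: 0.2, 13: 0.1, 14: 0.1}

# Weighted average: 10(.4) + 11(.2) + 12(.2) + 13(.1) + 14(.1)
expected_ships = sum(x * p for x, p in ships_pmf.items())
print(round(expected_ships, 2))  # 11.3
```

So about 11.3 ships are expected per day on average, even though 11.3 is not itself a possible outcome.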

Expected value, formally

Discrete case:

$E(X) = \sum_{\text{all } x} x_i \, p(x_i)$

Continuous case:

$E(X) = \int_{\text{all } x} x \, p(x)\,dx$

Empirical Mean is a special case of Expected Value…

Sample mean, for a sample of n subjects:

$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \sum_{i=1}^{n} x_i \left(\frac{1}{n}\right)$

The probability (frequency) of each person in the sample is 1/n.
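
To see that the sample mean is the expected value with every observation weighted 1/n, the two can be computed side by side. The data values below are made up purely for illustration.

```python
from statistics import mean

sample = [2, 4, 4, 5, 10]  # hypothetical measurements, not from the lecture
n = len(sample)

# Expected value with each observation given probability 1/n
weighted = sum(x * (1 / n) for x in sample)

print(mean(sample), round(weighted, 10))
```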

Extension to continuous case: uniform distribution

[Figure: p(x) plotted against x; a flat line at height 1 from x = 0 to x = 1.]

$E(X) = \int_0^1 x(1)\,dx = \left. \frac{x^2}{2} \right|_0^1 = \frac{1}{2} - 0 = \frac{1}{2}$
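
The continuous expected value is the integral of x times the density, which can be checked numerically. This sketch uses a simple midpoint rule; `integrate` and `uniform_pdf` are illustrative names.

```python
def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

def uniform_pdf(x):
    """Density of the uniform distribution on [0, 1]."""
    return 1.0

total_area = integrate(uniform_pdf, 0, 1)                        # should be 1
expected_value = integrate(lambda x: x * uniform_pdf(x), 0, 1)   # should be 1/2
print(round(total_area, 6), round(expected_value, 6))
```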

Symbol Interlude

● E(X) = µ
  ○ these symbols are used interchangeably

Expected Value

● Expected value is an extremely useful concept for good decision-making!

Example: the lottery

● The Lottery (also known as a tax on people who are bad at math…)

● A certain lottery works by picking 6 numbers from 1 to 49. It costs $1.00 to play the lottery, and if you win, you win $2 million after taxes.

● If you play the lottery once, what are your expected winnings or losses?

Lottery

Calculate the probability of winning in 1 try:

$\frac{1}{\binom{49}{6}} = \frac{1}{\frac{49!}{43!\,6!}} = \frac{1}{13{,}983{,}816} = 7.2 \times 10^{-8}$

“49 choose 6”: out of 49 numbers, this is the number of distinct combinations of 6.

The probability function (note, sums to 1.0):

x ($)         p(x)
-1            .999999928
+ 2 million   7.2 x 10^-8

Expected Value

The probability function

x ($)         p(x)
-1            .999999928
+ 2 million   7.2 x 10^-8

Expected Value

E(X) = P(win)*$2,000,000 + P(lose)*(-$1.00)
     = 2.0 x 10^6 * 7.2 x 10^-8 + .999999928 (-1) = .144 - .999999928 = -$.86

Negative expected value is never good!
You shouldn’t play if you expect to lose money!
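
The lottery arithmetic can be reproduced directly. This sketch relies on `math.comb` from the standard library (Python 3.8 or later) and uses the prize and ticket price stated in the example.

```python
import math

combinations = math.comb(49, 6)  # "49 choose 6" distinct tickets
p_win = 1 / combinations
p_lose = 1 - p_win

# E(X) = P(win) * $2,000,000 + P(lose) * (-$1)
expected_value = p_win * 2_000_000 + p_lose * (-1)

print(combinations)              # 13983816
print(round(expected_value, 2))  # -0.86
```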

Expected Value

If you play the lottery every week for 10 years, what are your expected winnings or losses?

520 x (-.86) = -$447.20

Gambling (or how casinos can afford to give so many free drinks…)

A roulette wheel has the numbers 1 through 36, as well as 0 and 00. If you bet $1 that an odd number comes up, you win or lose $1 according to whether or not that event occurs. If random variable X denotes your net gain, X = 1 with probability 18/38 and X = -1 with probability 20/38.

E(X) = 1(18/38) - 1(20/38) = -$.053

On average, the casino wins (and the player loses) 5 cents per game.

The casino rakes in even more if the stakes are higher:

E(X) = 10(18/38) - 10(20/38) = -$.53

If the cost is $10 per game, the casino wins an average of 53 cents per game. If 10,000 games are played in a night, that’s a cool $5300.
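
The roulette expectation can be kept exact with fractions, which also shows how the stake scales the result. The function name `expected_net_gain` is an illustrative choice.

```python
from fractions import Fraction

def expected_net_gain(stake):
    """Expected net gain of an odd-number bet on a wheel with 38 slots (18 of them odd)."""
    p_win = Fraction(18, 38)
    p_lose = Fraction(20, 38)
    return stake * p_win - stake * p_lose

ev_1 = expected_net_gain(1)    # -1/19 of a dollar
ev_10 = expected_net_gain(10)  # -10/19 of a dollar

print(ev_1, round(float(ev_1), 3))    # -1/19 -0.053
print(ev_10, round(float(ev_10), 2))  # -10/19 -0.53
```

With the unrounded figure of 10/19 dollars per game, 10,000 games at $10 come to about $5,263 for the casino, close to the rounded $5300 quoted above.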

A few notes about Expected Value as a mathematical operator:

If c = a constant number (i.e., not a variable) and X and Y are any random variables…
● E(c) = c
● E(cX) = cE(X)
● E(c + X) = c + E(X)
● E(X+Y) = E(X) + E(Y)

E(c) = c

Example: If you cash in soda cans in CA, you always get 5 cents per can.

Therefore, there’s no randomness. You always expect to (and do) get 5 cents.

E(cX) = cE(X)

Example: If the casino charges $10 per game instead of $1, then the casino expects to make 10 times as much on average from the game (see roulette example above!).

E(c + X) = c + E(X)

Example: If the casino throws in a free drink worth exactly $5.00 every time you play a game, you always expect to (and do) gain an extra $5.00 regardless of the outcome of the game.

E(X+Y) = E(X) + E(Y)

Example: If you play the lottery twice, your expected winnings are -$.86 + -$.86 = -$1.72.

NOTE: This works even if X and Y are dependent!! Does not require independence!!
Proof left for later…
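
The claim that additivity holds without independence can be checked on a deliberately dependent pair. In this sketch (an illustration, not an example from the lecture), X is a fair die roll and Y = 7 - X, so Y is completely determined by X.

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)  # probability of each face of a fair die

# X is the roll; Y = 7 - X depends entirely on X.
e_x = sum(x * p for x in faces)
e_y = sum((7 - x) * p for x in faces)
e_sum = sum((x + (7 - x)) * p for x in faces)  # E(X + Y) computed directly

print(e_x, e_y, e_sum)  # prints: 7/2 7/2 7
```

Here X + Y is always 7, and E(X) + E(Y) = 7/2 + 7/2 gives the same value.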

Practice Problem

If a disease is fairly rare and the antibody test is fairly expensive, in a resource-poor region, one strategy is to take half of the serum from each of n samples, pool the halves, and test the pooled lot. If the pooled lot is negative, this saves n-1 tests. If it’s positive, then you go back and test each sample individually, requiring n+1 tests total.

a. Suppose a particular disease has a prevalence of 10% in a third-world population and you have 500 blood samples to screen. If you pool 20 samples at a time (25 lots), how many tests do you expect to have to run (assuming the test is perfect!)?
b. What if you pool only 10 samples at a time?
c. 5 samples at a time?

Answer (a)

a. Suppose a particular disease has a prevalence of 10% in a third-world population and you have 500 blood samples to screen. If you pool 20 samples at a time (25 lots), how many tests do you expect to have to run (assuming the test is perfect!)?

Let X = a random variable that is the number of tests you have to run per lot:

E(X) = P(pooled lot is negative)(1) + P(pooled lot is positive)(21)

E(X) = (.90)^20 (1) + [1 - .90^20] (21) = 12.2% (1) + 87.8% (21) = 18.56

E(total number of tests) = 25 * 18.56 = 464

Answer (b)

b. What if you pool only 10 samples at a time?

E(X) = (.90)^10 (1) + [1 - .90^10] (11) = 35% (1) + 65% (11) = 7.5 average per lot

50 lots * 7.5 = 375

Answer (c)

c. 5 samples at a time?

E(X) = (.90)^5 (1) + [1 - .90^5] (6) = 59% (1) + 41% (6) = 3.05 average per lot

100 lots * 3.05 = 305
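
The three pooling answers follow one formula, so they can be computed with a single function. This is a sketch under the same assumptions as the answers (a perfect test, and samples that are positive independently of one another at the stated prevalence); the function name `expected_total_tests` is hypothetical.

```python
def expected_total_tests(n_samples, pool_size, prevalence):
    """Expected number of tests when samples are screened in pools of pool_size."""
    lots = n_samples // pool_size
    # A lot is negative only if every sample in it is negative (independence assumed).
    p_negative = (1 - prevalence) ** pool_size
    # 1 test if the lot is negative, pool_size + 1 tests if it is positive.
    per_lot = p_negative * 1 + (1 - p_negative) * (pool_size + 1)
    return lots * per_lot

totals = {size: expected_total_tests(500, size, 0.10) for size in (20, 10, 5)}
for size, total in totals.items():
    print(size, round(total, 1))
```

The results are close to the figures of 464, 375 and 305 above; the small differences come from rounding the per-lot averages before multiplying. Of the three pool sizes, 5 gives the fewest expected tests at this prevalence.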

Practice Problem

If X is a random integer between 1 and 10, what’s the expected value of X?

Answer

If X is a random integer between 1 and 10, what’s the expected value of X?

$E(X) = \sum_{i=1}^{10} i \left(\frac{1}{10}\right) = \frac{1}{10} \sum_{i=1}^{10} i = (.1)\,\frac{10(10+1)}{2} = 55(.1) = 5.5$