Biostatistics - Probability - 02 October 2024
In a card game, suppose a player needs to draw two cards of the same suit in order to
win. Of the 52 cards, there are 13 cards in each suit. Suppose first the player draws a
heart. Now the player wishes to draw a second heart. Since one heart has already been
chosen, there are now 12 hearts remaining in a deck of 51 cards. So the conditional
probability is
P(Draw second heart | First card a heart) = 12/51 ≈ 0.235.
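As a check on the 12/51 figure, the outcomes can be counted directly. The Python sketch below is illustrative only: it assumes a standard 52-card deck encoded as (rank, suit) pairs with ranks numbered 0 to 12 (an encoding chosen here, not taken from the original), lists every ordered pair of two different cards, and divides the number of heart-heart pairs by the number of pairs whose first card is a heart.

```python
from fractions import Fraction
from itertools import permutations

# Illustrative encoding: 13 ranks (0-12) in each of 4 suits -> 52 cards
SUITS = ["hearts", "diamonds", "clubs", "spades"]
deck = [(rank, suit) for suit in SUITS for rank in range(13)]

# Every ordered pair of two different cards (drawing without replacement)
pairs = list(permutations(deck, 2))

# Condition on the first card being a heart, then count second hearts
first_heart = [pair for pair in pairs if pair[0][1] == "hearts"]
both_hearts = [pair for pair in first_heart if pair[1][1] == "hearts"]

p = Fraction(len(both_hearts), len(first_heart))
print(p)  # 4/17
```

12/51 and 4/17 are the same number; `Fraction` simply reduces it to lowest terms.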
Conditional Probability
Problem: A math teacher gave her class two tests. 25% of the class passed
both tests and 42% of the class passed the first test.
What percent of those who passed the first test also passed the second test?
P(Second | First) = P(First and Second) / P(First)
= 0.25 / 0.42
≈ 0.60
= 60%
https://youtu.be/hR0fQ31-3lg
Conditional Probability
Example 1: A jar contains black and white marbles. Two marbles are chosen without
replacement.
The probability of selecting a black marble and then a white marble is 0.34, and the
probability of selecting a black marble on the first draw is 0.47.
What is the probability of selecting a white marble on the second draw, given that the first
marble drawn was black?
P(White | Black) = P(Black and White) / P(Black)
= 0.34 / 0.47
≈ 0.72
= 72%
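Both worked examples above apply the same formula, P(B | A) = P(A and B) / P(A). A minimal Python sketch follows; the helper name `conditional_probability` and its input checks are choices made here for illustration.

```python
def conditional_probability(p_a_and_b: float, p_a: float) -> float:
    """Return P(B | A) = P(A and B) / P(A)."""
    if not 0 < p_a <= 1:
        raise ValueError("P(A) must be in (0, 1]")
    if not 0 <= p_a_and_b <= p_a:
        raise ValueError("P(A and B) must lie between 0 and P(A)")
    return p_a_and_b / p_a


# Teacher example: P(first and second) = 0.25, P(first) = 0.42
print(round(conditional_probability(0.25, 0.42), 2))  # 0.6
# Marble example: P(black and white) = 0.34, P(black) = 0.47
print(round(conditional_probability(0.34, 0.47), 2))  # 0.72
```

The second check rejects impossible inputs, since an intersection can never be more likely than the event it is contained in.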
Bayes Theorem
In probability theory and statistics, Bayes’ theorem (alternatively Bayes’ law or Bayes’ rule)
describes the probability of an event, based on prior knowledge of conditions that might
be related to the event.
For example, if cancer is related to age, then, using Bayes’ theorem, a person's age can be
used to more accurately assess the probability that they have cancer than can be done
without knowledge of the person’s age.
https://youtu.be/cqTwHnNbc8g
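In symbols, Bayes' theorem for two events A and B (letters chosen here for illustration) is the first formula below. When A_1, ..., A_n split the population into non-overlapping groups, the denominator P(B) expands by the law of total probability, giving the second form, which is the one used in the voter example.

```latex
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
\qquad\Longrightarrow\qquad
P(A_j \mid B) = \frac{P(B \mid A_j)\,P(A_j)}{\sum_{i=1}^{n} P(B \mid A_i)\,P(A_i)}
```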
Bayes Theorem
In state A, 50% of voters support the liberal candidate; in state B, 60% of the voters support the liberal
candidate; and in state C, 35% of the voters support the liberal candidate.
Of the total population of the three states, 40% live in state A, 25% live in state B, and 35% live in
state C.
Given that a voter supports the liberal candidate, what is the probability that she lives in state B?
P(Voter lives in state B | Voter supports liberal candidate) =
P(Voter supports lib. cand. | Voter lives in state B) P(Voter lives in state B) / [
P(Voter supports lib. cand. | Voter lives in state A) P(Voter lives in state A) +
P(Voter supports lib. cand. | Voter lives in state B) P(Voter lives in state B) +
P(Voter supports lib. cand. | Voter lives in state C) P(Voter lives in state C)]
= (0.60)*(0.25) / ((0.50)*(0.40) + (0.60)*(0.25) + (0.35)*(0.35))
= 0.15 / (0.20 + 0.15 + 0.1225)
= 0.15 / 0.4725
≈ 0.3175
Given that a voter supports the liberal candidate, the probability that she lives in state B is approximately 0.32.
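The same calculation can be written as a small Python function that multiplies each prior by its likelihood and divides by the total. The function name `posterior` and the dictionary layout are illustrative choices, not part of the original example; the numbers are the ones given above.

```python
def posterior(priors: dict, likelihoods: dict) -> dict:
    """Bayes' theorem over a set of mutually exclusive groups.

    priors[k]      = P(group k)
    likelihoods[k] = P(evidence | group k)
    Returns P(group k | evidence) for every group k.
    """
    joint = {k: priors[k] * likelihoods[k] for k in priors}
    evidence = sum(joint.values())  # law of total probability
    return {k: v / evidence for k, v in joint.items()}


priors = {"A": 0.40, "B": 0.25, "C": 0.35}   # share of population per state
support = {"A": 0.50, "B": 0.60, "C": 0.35}  # P(supports liberal | state)

post = posterior(priors, support)
print(round(post["B"], 4))  # 0.3175
```

Because every posterior is divided by the same total, the three results always sum to 1.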
Probabilities and Odds
The odds are defined as the probability that the event will occur divided by the probability
that the event will not occur.
If the probability of an event occurring is Y, then the probability of the event not occurring is
1 - Y. (Example: if the probability of an event is 0.80 (80%), then the probability that the
event will not occur is 1 - 0.80 = 0.20, or 20%.)
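Using the RED marble example [P(RED) = 1/4 and Odds_Favor(RED) = 1/3] we can
demonstrate how these are equivalent:
Odds_Favor(RED) = P(RED) / (1 - P(RED)) = (1/4) / (3/4) = 1/3, and conversely
P(RED) = Odds_Favor(RED) / (1 + Odds_Favor(RED)) = (1/3) / (4/3) = 1/4.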
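The two conversions can be sketched in Python with exact fractions. The helper names `prob_to_odds` and `odds_to_prob` are hypothetical; the inputs are P(RED) = 1/4 from the marble example and the 0.80 probability from the example above.

```python
from fractions import Fraction


def prob_to_odds(p):
    """Odds in favor = P(event) / P(not event)."""
    return p / (1 - p)


def odds_to_prob(odds):
    """Inverse conversion: P(event) = odds / (1 + odds)."""
    return odds / (1 + odds)


p_red = Fraction(1, 4)
print(prob_to_odds(p_red))           # 1/3
print(odds_to_prob(Fraction(1, 3)))  # 1/4
print(prob_to_odds(Fraction(4, 5)))  # 4, i.e. 0.80 vs 0.20 is 4 to 1
```

Using `Fraction` keeps the results exact, so 1/3 and 1/4 come back without floating-point rounding.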
Random variable
• A random variable is a variable whose value is a numerical outcome of a random
phenomenon.
• The probability distribution of a random variable X tells what the possible values of X
are and how probabilities are assigned to those values.
For a continuous random variable X, the probability that X falls within an interval of numbers is the area under the density
curve between the interval endpoints.
The probability that a continuous random variable X is exactly equal to any single number is zero.
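The area interpretation can be illustrated with Python's standard `statistics.NormalDist`. Choosing a standard normal density for X is an assumption made only for this sketch; any continuous distribution with a cumulative distribution function (CDF) works the same way, since the area between a and b equals CDF(b) - CDF(a).

```python
from statistics import NormalDist

# Assumption for illustration only: X follows a standard normal distribution
X = NormalDist(mu=0.0, sigma=1.0)


def prob_between(dist, a, b):
    """P(a <= X <= b) = area under the density curve between a and b."""
    return dist.cdf(b) - dist.cdf(a)


print(round(prob_between(X, -1.0, 1.0), 4))  # 0.6827
# An "interval" of width zero has zero area, so P(X = 1.5) is 0
print(prob_between(X, 1.5, 1.5))             # 0.0
```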
Means and Variances of Random Variables
• The mean of a discrete random variable, X, is its weighted average. Each value of X is
weighted by its probability.
• To find the mean of X, multiply each value of X by its probability, then add all the
products.
• The mean of a random variable X is called the expected value of X.
μ_X = x_1·p_1 + x_2·p_2 + ... + x_k·p_k = Σ x_i·p_i
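The "multiply each value by its probability, then add" recipe translates directly into code. This Python sketch uses a fair six-sided die as a hypothetical example (it is not one of the document's examples); the helper name `expected_value` is also illustrative.

```python
from fractions import Fraction


def expected_value(values, probs):
    """Weighted average: multiply each value by its probability, then add."""
    if abs(sum(probs) - 1) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(values, probs))


# Hypothetical example: a fair six-sided die
faces = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6
print(expected_value(faces, probs))  # 7/2
```

The expected value 7/2 = 3.5 is not itself a possible roll, which is typical: the mean is a long-run average, not a likely single outcome.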
Mean of Random Variables
Law of Large Numbers
As the number of observations increases, the mean of the observed values, x̄, approaches the mean of the population, μ.
The more variation in the outcomes, the more trials are needed to ensure that x̄ is close to μ.
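A short simulation makes the law of large numbers visible. The sketch below assumes fair die rolls (population mean μ = 3.5) purely for illustration and uses a fixed seed so the run is repeatable; the exact sample means printed depend on the random generator, but they drift toward 3.5 as n grows.

```python
import random


def sample_mean_of_die_rolls(n, seed=0):
    """Mean of n simulated fair die rolls (hypothetical example, mu = 3.5)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n):
        total += rng.randint(1, 6)  # each face 1..6 equally likely
    return total / n


for n in (10, 1_000, 100_000):
    print(n, sample_mean_of_die_rolls(n))
```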
Rules of means
If X is a random variable and a and b are fixed numbers, then
μ_(a+bX) = a + b·μ_X
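The rule can be checked exactly on a small discrete distribution. This Python sketch reuses a fair die as a hypothetical X and picks a = 2, b = 3 arbitrarily; it computes the mean of a + bX directly from the distribution and compares it with a + b·μ_X.

```python
from fractions import Fraction

faces = range(1, 7)   # hypothetical X: a fair die
p = Fraction(1, 6)    # each face equally likely
mu_x = sum(x * p for x in faces)  # 7/2

a, b = 2, 3  # arbitrary fixed numbers for illustration
mu_transformed = sum((a + b * x) * p for x in faces)

print(mu_transformed, a + b * mu_x)  # 25/2 25/2
assert mu_transformed == a + b * mu_x
```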
The normal distribution has the shape of a bell (hence its nickname, the bell curve). The
bell curve is symmetric: half of the data fall to the left of the mean and half
fall to the right.
Properties of Normal Distribution
• The mean, mode and median are all equal.
• The curve is symmetric about the center (i.e. around the mean, μ).
• Exactly half of the values are to the left of center and exactly half the
values are to the right.
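These properties can be confirmed numerically with `statistics.NormalDist`. The parameters μ = 100 and σ = 15 are hypothetical, chosen only for the sketch: the median (the 50th percentile) coincides with the mean, the CDF at the mean is exactly one half, and the density takes equal values at equal distances on either side of the mean.

```python
from statistics import NormalDist

# Hypothetical parameters for illustration
dist = NormalDist(mu=100, sigma=15)

median = dist.inv_cdf(0.5)          # value with half the area on each side
print(median)                       # 100.0, same as the mean
print(dist.cdf(dist.mean))          # 0.5, half the values lie left of center
# Symmetry: equal density 15 units below and 15 units above the mean
print(dist.pdf(85) == dist.pdf(115))  # True
```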
For a 99.9% confidence level, the critical z value is 3.291.
Confidence Interval
• The resulting interval is 175 cm ± 6.20 cm.
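A confidence interval for a mean with known standard deviation has the form mean ± z·σ/√n. The sample standard deviation and sample size behind the 175 cm ± 6.20 cm result are not shown here, so the sketch below uses hypothetical values (σ = 10 cm, n = 25). It also recomputes the critical z values from the standard normal distribution, reproducing 3.291 for 99.9% confidence.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical z value for a given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def confidence_interval(mean, sd, n, confidence=0.95):
    """mean ± z * sd / sqrt(n), assuming the population sd is known."""
    margin = z_critical(confidence) * sd / sqrt(n)
    return mean - margin, mean + margin


print(round(z_critical(0.999), 3))  # 3.291
print(round(z_critical(0.95), 2))   # 1.96

# Hypothetical inputs: sample mean 175 cm, sd 10 cm, n = 25
low, high = confidence_interval(175, 10, 25)
print(round(low, 2), round(high, 2))  # 171.08 178.92
```

A higher confidence level uses a larger z value and therefore gives a wider interval for the same data.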
Hypothesis Testing
• Commonly, two statistical data sets are compared, or a data set obtained by sampling is
compared against a synthetic data set from an idealized model.
• A hypothesis is proposed for the statistical relationship between the two data sets, and
this is compared as an alternative to an idealized null hypothesis that proposes no
relationship between the two data sets.
The process of distinguishing between the null hypothesis and the alternative
hypothesis is aided by considering two conceptual types of errors.
The first type of error (a Type I error, or false positive) occurs when the null hypothesis is wrongly rejected. The
second type of error (a Type II error, or false negative) occurs when the null hypothesis is wrongly not rejected.
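A simulation shows what the Type I error rate means in practice: if the null hypothesis is true and we reject whenever |z| ≥ 1.96, we should wrongly reject about 5% of the time. The sketch below assumes samples of size 30 drawn from N(0, 1) with σ = 1 known (all hypothetical choices) and a fixed seed; the exact rate printed depends on the random generator, but it should land near 0.05.

```python
import random
from math import sqrt


def type_one_error_rate(trials=20_000, n=30, z_crit=1.96, seed=1):
    """Fraction of z-tests that wrongly reject a true null hypothesis."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # H0 is true: the data really come from N(0, 1), and sigma = 1 is known
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) / (1.0 / sqrt(n))
        if abs(z) >= z_crit:
            rejections += 1
    return rejections / trials


print(type_one_error_rate())  # should be near alpha = 0.05
```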
Critical Values and Critical Region
A critical value is a point on the scale of the test statistic, drawn as a line on the graph, that splits the graph into sections. One or two of the
sections form the “rejection region”; if your test value falls into that region, then you reject the
null hypothesis.
Figure: a one-tailed test, with the rejection region in one tail. The critical value is the red line to the left of that
region.
Critical Values
The left graph shows that the results of a one-tailed Z-test are significant if the test statistic is equal to or
greater than 1.64 (more precisely 1.645), the critical value in this case. The shaded area is 5% (α) of the area
under the curve.
The right graph shows that the results of a two-tailed Z-test are significant if the absolute value of the test
statistic is equal to or greater than 1.96, the critical value in this case.
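The critical values quoted above come from the standard normal distribution, and the decision rule is a simple comparison. In this Python sketch the helper names `critical_z` and `is_significant` are hypothetical, and the one-tailed case is assumed to put the rejection region in the upper tail, as in the figure described above.

```python
from statistics import NormalDist


def critical_z(alpha, tails):
    """Critical value of the standard normal for a one- or two-tailed test."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    return NormalDist().inv_cdf(1 - alpha / tails)


def is_significant(z, alpha=0.05, tails=2):
    """Upper-tail rule for one-tailed tests, |z| rule for two-tailed tests."""
    crit = critical_z(alpha, tails)
    return z >= crit if tails == 1 else abs(z) >= crit


print(round(critical_z(0.05, 1), 3))  # 1.645
print(round(critical_z(0.05, 2), 3))  # 1.96
print(is_significant(1.80, tails=1), is_significant(1.80, tails=2))  # True False
```

The same z = 1.80 is significant in a one-tailed test but not in a two-tailed one, because splitting α across two tails pushes the critical value out to 1.96.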
When are critical values of z used?
• Z-scores are used when the population standard deviation is known or when
you have larger sample sizes.
• While the z-score can also be used to calculate probability for unknown
standard deviations and small samples, many statisticians prefer to use the t
distribution to calculate these probabilities.
Examples of critical regions
Hypothesis
Here are 7 steps to take to formulate a strong A/B testing Hypothesis
• Defining your problem is the first thing that needs to be done. What is it that
you want to test or solve? Is it to double your sales or to increase the number
of opt-ins?
• Whatever your goals are, they need to be clearly defined, quantifiable, and
measurable. This should give you a clear idea of what your new design
should solve, including the process that will be followed to achieve the
results.
• Now that you have defined your problem and you have a clear picture of what it is you want to
achieve, the next thing that follows is an in-depth analysis of the current problem. Basically, you want
to take as much time as possible to learn the reasons behind your numbers.
• You won’t be able to form an accurate hypothesis without studying what is happening on the website
where you want to run your A/B test. Since you are already looking for better variables to improve
your conversion rates, it is only logical to find the reasons that brought you to the current
situation. Why are you experiencing a high bounce rate? Why aren’t you seeing more conversions?
Why are most of your customers failing to complete the payment process? These are some
of the reasons that may push you to improve your website.
• It is important to get real feedback from your visitors. One way is to use surveys: entry
surveys discover your visitors’ objectives, and exit surveys determine whether those goals have
been met. Both are aimed at understanding what visitors want.
• Knowing the reasons behind their decisions and actions is the most important part of
the survey. Therefore, do not hesitate to ask them to give reasons for their actions in the
survey. For instance, you can place an exit survey at the end of a buying process to ask
them why they bought your product. You could also place an exit survey immediately after
visitors abandon a buying process to understand why they did so.
• 4. Use segmentation to get actionable data
• An experiment may show that a certain product is not performing well, but upon further
analysis, it may be discovered that the majority of people who buy the product are women
aged between 18 and 29 years.
• Upon further investigation, it may turn out that ads for the product were being targeted to
the general population. So when you do segmentation, it may eventually occur to you that
you should concentrate your marketing efforts on the women who fall in the 18-29 age
bracket.
• Now that you have gathered enough evidence to show what or where the problem is, it
is time to state why you think the problem occurs.
• Your hypothesis should have the following characteristics:
• It is goal-oriented: it clearly states what needs to be accomplished.
• It is testable: it can easily be implemented.
• It is insightful: looking at the hypothesis, one should learn something about the problem.
• An example of a hypothesis:
• Problem: fewer than 5% of visitors buy the mobile app.
• Hypothesis: the text in the CTA button does not provide a clear message to the customer.
• We can call this the brainstorming stage. After determining the problem and articulating a
hypothesis, the next step is to come up with substantial variations based on
your hypothesis.
• Taking the above example, the hypothesis states that “The text in the CTA button does
not provide a clear message to the customer.” The substantial variations could include
things like changing the colour of the button, changing the position of the CTA on the landing
page, changing the wording, creating a different icon, etc.
• The substantial variations of your hypothesis are meant to bring you closer to the
solution as quickly as possible and provide you with insights.
• Once you’ve managed to articulate your hypotheses and test substantial variations,
it’s time to analyse results to validate your hypothesis. You need to have sufficient
test results in order to analyse and compare.
• When you are analysing your tests with the aim of implementing solutions, you
should bear in mind that revenue is the ultimate measurement of improvement.
• Customer feedback and analytics are tools you can use. You should look at the data
your customers have left to help you choose the elements that need to be analysed.
The various elements you could test include: