MC4301 - ML Unit 3 (Bayesian Learning)
Uncertainty:
In knowledge representation we might write A→B, which means if A is true then B is true. But consider a situation where we are not sure whether A is true or not; then we cannot express this statement. This situation is called uncertainty.
So to represent uncertain knowledge, where we are not sure about the predicates, we need uncertain reasoning or probabilistic reasoning.
Causes of uncertainty:
The following are some leading causes of uncertainty in the real world:
- Information obtained from unreliable sources
- Experimental errors
- Equipment faults
- Temperature variations
- Climate change
Probabilistic reasoning:
In the real world there are many scenarios where the certainty of something is not confirmed, such as "It will rain today", "the behaviour of someone in some situation", or "a match between two teams or two players". These are probable sentences for which we can assume that something will happen but cannot be sure about it, so here we use probabilistic reasoning.
In probabilistic reasoning, there are two ways to solve problems with uncertain knowledge:
Bayes' rule
Bayesian Statistics
Probability: Probability can be defined as a chance that an uncertain event will occur. It is the numerical measure of the likelihood that an event will occur. The value of probability always lies between 0 and 1: P(A) = 0 indicates that event A is impossible, and P(A) = 1 indicates that it is certain.
We can find the probability of an uncertain event by using the formula below:
P(A) = Number of desired outcomes / Total number of possible outcomes
Sample space: The collection of all possible events is called sample space.
Random variables: Random variables are used to represent the events and objects in
the real world.
Conditional probability:
Let's suppose we want to calculate the probability of event A when event B has already occurred, "the probability of A under the condition B". It can be written as:
P(A|B) = P(A⋀B) / P(B)
where P(A⋀B) is the joint probability of both A and B, and P(B) is the marginal probability of B.
If the probability of A is given and we need to find the probability of B given A, then it will be given as:
P(B|A) = P(A⋀B) / P(A)
It can be explained using a Venn diagram: once event B has occurred, the sample space is reduced to the set B, so we can calculate the probability of event A given that event B has already occurred by dividing P(A⋀B) by P(B).
Example:
In a class, 70% of the students like English and 40% of the students like both English and Mathematics. What percentage of the students who like English also like Mathematics?
Solution:
Let E be the event that a student likes English and M the event that a student likes Mathematics. We are given P(E) = 0.7 and P(E⋀M) = 0.4. Then:
P(M|E) = P(E⋀M) / P(E) = 0.4 / 0.7 = 0.57 (approximately)
Hence, about 57% of the students who like English also like Mathematics.
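The same calculation as a minimal Python sketch, assuming (hypothetically) a class of 100 students for concreteness:

# A minimal check of the worked example, assuming a class of 100 students.
total = 100
likes_english = 70          # 70% of the class
likes_both = 40             # 40% like English and Mathematics

p_math_given_english = likes_both / likes_english   # P(M|E) = P(E^M) / P(E)
print(round(p_math_given_english * 100))            # ~57 (percent)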
Inference
The inference process in an agent takes place according to certain rules, which are known as inference rules or rules of inference. Following are the major types of inference rules that are used:
1) Addition: If P is true, then P v Q will also be true:
P
----------
∴P v Q
2) Simplification: If P^Q is true, then P and Q will each be true individually:
P^Q                P^Q
----------  OR  ----------
∴P                 ∴Q
3) Modus Ponens: This is the most widely used inference rule. It states:
P->Q
P
-----------
∴Q
4) Modus Tollens: If P->Q is true and ~Q is true, then ~P will also be true:
P->Q
~Q
-----------
∴~P
5) Forward Chaining: This is a type of deductive inference rule that starts from known facts and applies implications to derive new conclusions:
P
P->Q
-----------
∴Q
6) Backward Chaining: This is also a type of deductive inference rule. It starts from the goal and works backwards to the fact that supports it:
Q
P->Q
-----------
∴P
7) Resolution: In the reasoning by resolution, we are given the goal condition and
available facts and statements. Using these facts and statements, we have to decide
whether the goal condition is true or not, i.e. is it possible for the agent to reach the
goal state or not. We prove this by the method of contradiction. This rule states that:
PvQ
~PvR
-----------
∴Q v R
8) Hypothetical Syllogism: This rule states the transitive relation between the
statements:
P->Q
Q->R
-----------
∴P->R
9) Disjunctive Syllogism: If P v Q is true and ~P is true, then Q will be true:
PvQ
~P
-----------
∴Q
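The validity of these rules can be checked mechanically with a truth table: a rule is valid if its conclusion is true in every case where all of its premises are true. Below is a minimal Python sketch (not from the notes) that verifies Modus Ponens, Modus Tollens, and Resolution this way.

# Verify inference rules by enumerating every truth assignment.
from itertools import product

def implies(p, q):
    return (not p) or q

def valid(premises, conclusion, num_vars):
    """Premises entail the conclusion iff no assignment makes all
    premises true while the conclusion is false."""
    for values in product([False, True], repeat=num_vars):
        if all(prem(*values) for prem in premises) and not conclusion(*values):
            return False
    return True

# Modus Ponens: P->Q, P |= Q
print(valid([lambda p, q: implies(p, q), lambda p, q: p],
            lambda p, q: q, 2))                      # True

# Modus Tollens: P->Q, ~Q |= ~P
print(valid([lambda p, q: implies(p, q), lambda p, q: not q],
            lambda p, q: not p, 2))                  # True

# Resolution: P v Q, ~P v R |= Q v R
print(valid([lambda p, q, r: p or q, lambda p, q, r: (not p) or r],
            lambda p, q, r: q or r, 3))              # True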
Machine learning (ML) inference is the process of running live data points into a
machine learning algorithm (or “ML model”) to calculate an output such as a single
numerical score. This process is also referred to as “operationalizing an ML model” or
“putting an ML model into production.” When an ML model is running in production,
it is often then described as artificial intelligence (AI) since it is performing functions
similar to human thinking and analysis. Machine learning inference basically entails
deploying a software application into a production environment, as the ML model is
typically just software code that implements a mathematical algorithm. That
algorithm makes calculations based on the characteristics of the data, known as
“features” in the ML vernacular.
An ML lifecycle can be broken up into two main, distinct parts. The first is the
training phase, in which an ML model is created or “trained” by running a specified
subset of data into the model. ML inference is the second phase, in which the model is
put into action on live data to produce actionable output. The data processing by the
ML model is often referred to as “scoring,” so one can say that the ML model scores
the data, and the output is a score.
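As a concrete illustration of the two phases, here is a minimal sketch using scikit-learn; the synthetic dataset and the choice of logistic regression are illustrative assumptions, not part of the notes.

# Phase 1 (training) fits a model on a prepared subset of data;
# phase 2 (inference) scores a live data point.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# --- Phase 1: training ---
X_train, y_train = make_classification(n_samples=1000, n_features=4,
                                       random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# --- Phase 2: inference (scoring) ---
live_point = [[0.2, -1.3, 0.7, 0.1]]            # one live data point
score = model.predict_proba(live_point)[0, 1]   # single numerical score
print(score)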
In machine learning inference, the data sources are typically a system that captures the
live data from the mechanism that generates the data. The host system for the machine
learning model accepts data from the data sources and inputs the data into the
machine learning model. The data destinations are where the host system should
deliver the output score from the machine learning model.
The data sources are typically a system that captures the live data from the mechanism
that generates the data. For example, a data source might be an Apache Kafka cluster
that stores data created by an Internet of Things (IoT) device, a web application log
file, or a point-of-sale (POS) machine. Or a data source might simply be a web
application that collects user clicks and sends data to the system that hosts the ML
model.
The host system for the ML model accepts data from the data sources and inputs the
data into the ML model. It is the host system that provides the infrastructure to turn
the code in the ML model into a fully operational application. After an output is
generated from the ML model, the host system then sends that output to the data
destinations. The host system can be, for example, a web application that accepts data
input via a REST interface, or a stream processing application that takes an incoming
feed of data from Apache Kafka to process many data points per second.
The data destinations are where the host system should deliver the output score from
the ML model. A destination can be any type of data repository like Apache Kafka or
a database, and from there, downstream applications take further action on the scores.
For example, if the ML model calculates a fraud score on purchase data, then the
applications associated with the data destinations might send an “approve” or “decline”
message back to the purchase site.
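To make the pipeline concrete, here is a minimal sketch of a host system exposing a REST scoring endpoint. Flask is an assumed choice of framework, and fraud_score is a hypothetical stand-in for a real deployed model.

# A host system as a web application: accepts input via REST, scores it,
# and returns the output for the data destination.
from flask import Flask, request, jsonify

app = Flask(__name__)

def fraud_score(features):
    # Hypothetical scoring logic standing in for a deployed ML model.
    return min(1.0, 0.01 * features.get("amount", 0) / 100)

@app.route("/score", methods=["POST"])
def score():
    features = request.get_json()        # input from the data source
    result = {"score": fraud_score(features)}
    return jsonify(result)               # output sent on to the data destination

if __name__ == "__main__":
    app.run(port=8080)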
Additionally, DevOps and data engineers are sometimes not able to help with
deployment, often due to conflicting priorities or a lack of understanding of what’s
required for ML inference. In many cases, the ML model is written in a language like
Python, which is popular among data scientists, but the IT team is more well-versed in
a language like Java. This means that engineers must take the Python code and
translate it to Java to run it within their infrastructure. In addition, the deployment of
ML models requires some extra coding to map the input data into a format that the
ML model can accept, and this extra work adds to the engineers’ burden when
deploying the ML model.
Also, the ML lifecycle typically requires experimentation and periodic updates to the
ML models. If deploying the ML model is difficult in the first place, then updating
models will be almost as difficult. The whole maintenance effort can be difficult, as
there are business continuity and security issues to address.
Independence
Let's say A is the height of a child and B is the number of words that the child knows. It seems that when A is high, B is high too: taller children tend to be older, and older children know more words.
The height and the number of words known by the kid are NOT independent, but they are conditionally independent given the kid's age.
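This intuition can be checked with a quick simulation. The sketch below (an assumed setup, not from the notes) generates height and vocabulary as functions of age plus noise: the two are strongly correlated overall, but nearly uncorrelated within a narrow age band.

import numpy as np

rng = np.random.default_rng(0)
n = 100_000

age = rng.uniform(2, 10, n)                      # C: age in years
height = 80 + 6 * age + rng.normal(0, 5, n)      # A: driven by age + noise
words = 200 * age + rng.normal(0, 300, n)        # B: driven by age + noise

# Unconditional correlation is strong...
print(np.corrcoef(height, words)[0, 1])          # ~0.8

# ...but within a narrow age band it almost vanishes.
band = (age > 5.9) & (age < 6.1)
print(np.corrcoef(height[band], words[band])[0, 1])  # ~0.0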
2. Mathematical Form
Events A and B are conditionally independent given an event C if and only if
P(A ⋀ B | C) = P(A | C) × P(B | C)
or, equivalently, P(A | B, C) = P(A | C).
3. Applications
Conditional independence matters because it is a foundation for many statistical models that we use (e.g., latent class models, factor analysis, graphical models, etc.).
Using this property, we can simplify the whole joint distribution of a Bayesian network into the formula below:
P(X1, X2, ..., Xn) = P(X1 | Parents(X1)) × P(X2 | Parents(X2)) × ... × P(Xn | Parents(Xn))
This reduces much of the computation, since each variable now depends only on its parents in the network and everything else can be disregarded.
For a Bayesian network of n binary variables with a maximum of k parents for any node, we need only O(n * 2^k) probabilities, instead of the O(2^n) entries needed to store the full joint distribution.
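The sketch below illustrates this factorization on the classic cloudy/sprinkler/rain/wet-grass network (an assumed example; the CPT values are commonly used textbook numbers, not from these notes). Four small conditional probability tables replace a 16-entry joint table.

# Factored joint: P(C,S,R,W) = P(C) * P(S|C) * P(R|C) * P(W|S,R)
p_cloudy = {True: 0.5, False: 0.5}
p_sprinkler = {True: {True: 0.1, False: 0.9},     # P(S | C)
               False: {True: 0.5, False: 0.5}}
p_rain = {True: {True: 0.8, False: 0.2},          # P(R | C)
          False: {True: 0.2, False: 0.8}}
p_wet = {(True, True): 0.99, (True, False): 0.9,  # P(W=True | S, R)
         (False, True): 0.9, (False, False): 0.0}

def joint(c, s, r, w):
    """Probability of one full assignment, using only parent CPTs."""
    pw = p_wet[(s, r)]
    return (p_cloudy[c] * p_sprinkler[c][s] * p_rain[c][r]
            * (pw if w else 1 - pw))

print(joint(True, False, True, True))   # 0.5 * 0.9 * 0.8 * 0.9 = 0.324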
Let’s say I’d like to estimate the engagement (clap) rate of my blog. Let p be the
proportion of readers who will clap for my articles. We’ll choose n readers randomly
from the population. For i = 1, …, n, let Xi = 1 if the reader claps or Xi = 0 if s/he
doesn’t.
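One natural Bayesian treatment of this setup (a sketch under assumed numbers, not taken from the notes): the Xi are i.i.d. Bernoulli(p), and with a uniform Beta(1, 1) prior on p, the posterior after observing k claps among n readers is Beta(1 + k, 1 + n - k).

# Posterior over the clap rate p after hypothetical observations.
from scipy.stats import beta

n, k = 100, 30                      # hypothetical: 30 of 100 sampled readers clapped
a, b = 1 + k, 1 + (n - k)           # Beta posterior parameters

print(beta.mean(a, b))              # posterior mean ~ 0.304
print(beta.interval(0.95, a, b))    # 95% credible interval for p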
C. Correlation ≠ Causation
“Correlation is not causation” means that just because two things correlate does not
necessarily mean that one causes the other.
A study showed a positive and significant correlation between the number of accidents and taxi drivers wearing coats, and concluded that coats might hinder the drivers' movements and cause accidents. A new law was ready to ban taxi drivers from wearing coats while driving.
Until another study pointed out that people wear coats when it rains...
A correlation between two things can be caused by a third factor that affects both of them. This third factor is called a confounder. The confounder here, rain, was responsible for the correlation between accidents and wearing coats.
Note that this does NOT mean accidents are independent of wearing coats; unconditionally, the two are correlated. What it means is: given rain, knowing whether drivers wear coats doesn't give any more information about accidents.
5. More examples!
Many similar examples exist, and they all teach the same lesson: we can never be sure about the relationship between A & B until we test every possible C (confounding variable)!
Bayes’ Rule
Bayes' Rule is the most important rule in data science. It is the mathematical rule that describes how to update a belief, given some evidence. In other words, it describes the act of learning.
Bayes' Rule can answer a variety of probability questions, which help us (and
machines) understand the complex world we live in.
Today, Bayes' Rule has numerous applications, from statistical analysis to machine
learning.
Conditional probability
Conditional probability is the bridge that lets you talk about how multiple uncertain
events are related. It lets you talk about how the probability of an event can vary
under different conditions.
For example, consider the probability of winning a race, given the condition you
didn't sleep the night before. You might expect this probability to be lower than the
probability you'd win if you'd had a full night's sleep.
Or, consider the probability that a suspect committed a crime, given that their fingerprints were found at the scene. You'd expect this probability to be greater than it would be had their fingerprints not been found.
The conditional probability of event A given event B is written as P(A|B).
An important thing to remember is that conditional probabilities are not the same as their inverses. That is, the "probability of event A given event B" is not the same as the "probability of event B given event A".
Bayes' Rule tells you how to calculate a conditional probability with information you
already have.
It is helpful to think in terms of two events – a hypothesis (which can be true or false)
and evidence (which can be present or absent).
However, it can be applied to any type of event, with any number of discrete or continuous outcomes.
Bayes' Rule lets you calculate the posterior (or "updated") probability, which is a conditional probability: the probability of the hypothesis being true, given that the evidence is present.
P(Hypothesis|Evidence) = P(Evidence|Hypothesis) × P(Hypothesis) / P(Evidence)
Think of the prior (or "previous") probability as your belief in the hypothesis
before seeing the new evidence. If you had a strong belief in the hypothesis already,
the prior probability will be large.
The prior is multiplied by the fraction P(Evidence|Hypothesis) / P(Evidence). Think of this as the "strength" of the evidence. The posterior probability is greater when the top part (numerator) is big and the bottom part (denominator) is small.
Remember, the "probability of the evidence being present given the hypothesis is
true" is not the same as the "probability of the hypothesis being true given the
evidence is present".
Now look at the denominator. This is the marginal probability of the evidence. That
is, it is the probability of the evidence being present, whether the hypothesis is true or
false. The smaller the denominator, the more "convincing" the evidence.
Your neighbour is watching their favourite football (or soccer) team. You hear them cheering and want to estimate the probability that their team has scored. Here, the hypothesis is "the team has just scored" and the evidence is the cheering you hear.
Estimate the likelihood, the probability of cheering given that there's a goal, as 90% (perhaps your neighbour won't celebrate if their team is losing badly).
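Putting the numbers together: the 90% likelihood is from the example above, while the prior probability of a goal and the probability of cheering without a goal are hypothetical values chosen for illustration.

p_goal = 0.02                 # prior: chance of a goal in any given minute (assumed)
p_cheer_given_goal = 0.90     # likelihood, from the example
p_cheer_given_no_goal = 0.01  # assumed: cheering for other reasons

# Marginal probability of the evidence (cheering):
p_cheer = (p_cheer_given_goal * p_goal
           + p_cheer_given_no_goal * (1 - p_goal))

# Posterior via Bayes' Rule:
p_goal_given_cheer = p_cheer_given_goal * p_goal / p_cheer
print(p_goal_given_cheer)     # ~0.65: the cheer makes a goal much more likely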