Unit 6: Bayesian Concept Learning
Importance of Bayesian methods
Probabilistic models define relationships between variables and can be used to calculate probabilities.
Conditional probabilities are therefore essential for making accurate predictions in machine learning.
In practice, however, fully conditional models may require an enormous amount of data to cover all possible cases, and the probabilities may be intractable to calculate.
Simplifying assumptions, such as the conditional independence of all random variables, can be effective, as in the case of Naive Bayes, although conditional independence is a drastic simplification.
An alternative is to develop a model that preserves known conditional dependence between random variables and assumes conditional independence in all other cases.
Bayesian networks are a probabilistic graphical model that explicitly captures the known conditional dependencies with directed edges in a graph.
All missing connections define the conditional independencies in the model.
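To make the independence assumption concrete, here is a minimal sketch in Python; all probabilities below are hypothetical. Under conditional independence, the class-conditional joint P(x1, x2 | y) factorizes into a product of per-feature terms:

# Hypothetical conditional probabilities for two binary features given class y.
p_x1_given_y = {0: 0.3, 1: 0.8}   # P(x1=1 | y)
p_x2_given_y = {0: 0.6, 1: 0.2}   # P(x2=1 | y)

def naive_bayes_joint(x1, x2, y):
    # P(x1, x2 | y) under conditional independence: the product of the marginals.
    p1 = p_x1_given_y[y] if x1 else 1 - p_x1_given_y[y]
    p2 = p_x2_given_y[y] if x2 else 1 - p_x2_given_y[y]
    return p1 * p2

# With n binary features a full conditional table needs 2**n entries per class;
# the factorized model needs only n, which is why the assumption is so effective.
print(naive_bayes_joint(1, 0, y=1))  # 0.8 * 0.8 = 0.64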
Key words in Bayes' theorem
Evidence – P(X) is known as the evidence. It is simply the probability of the observed data; in this case, that the customer is 26 years old and earns $2000.
Prior Probability – P(H), known as the prior probability, is the probability of our hypothesis before seeing any data – namely, that the customer will buy a book. This probability is not conditioned on any extra input such as age and income; since it is calculated with less information, the result is less accurate.
Posterior Probability – P(H | X) is known as the posterior probability. Here, P(H | X) is the probability of the customer buying a book (H) given X (that he is 26 years old and earns $2000).
Likelihood – P(X | H) is the likelihood. In this case, given that we know the customer will buy the book, it is the probability that the customer is 26 years old and has an income of $2000.
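A small worked example of these four quantities for the book scenario; the prior, likelihood, and evidence values below are made up for illustration:

# Hypothetical values for the book-buying example.
p_h = 0.4          # prior P(H): customer buys a book
p_x_given_h = 0.1  # likelihood P(X | H): a buyer is 26 and earns $2000
p_x = 0.08         # evidence P(X): any customer is 26 and earns $2000

# Bayes' theorem: posterior = likelihood * prior / evidence
p_h_given_x = p_x_given_h * p_h / p_x
print(p_h_given_x)  # 0.5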
Bayes’ Theorem
Bayes' Theorem is also widely used in machine learning, where it is a simple, effective way to predict classes with precision and accuracy. The Bayesian method of calculating conditional probabilities is used in machine learning applications that involve classification tasks.
If there's one aspect of machine learning you'll hear talked about, it's the application of Bayes' Theorem. You might also hear the terms Bayes' Law or Bayes' Rule, but all three are essentially the same thing.
The important part of Bayes' Theorem is the observation of previous events, or your degree of belief that something will occur.
If you have a degree of belief in an event happening, you can apply it to new data and make an informed calculation of its probability.
There are plenty of examples of applying Bayes' Theorem to predict the result of coin flips, the spread of disease, or the gender of offspring.
A simplified version of Bayes' Theorem, known as the Naive Bayes classifier, is used to reduce computation time and cost. This unit takes you through these concepts and discusses the applications of Bayes' Theorem in machine learning.
Bayes’ Theorem
Example: if cancer is related to a person's age, then by using Bayes' theorem we can determine the probability of cancer more accurately with the help of age.
Bayes' theorem can be derived from the product rule. The conditional probability of event A given known event B gives: P(A ∧ B) = P(A|B) P(B)
Likewise, the probability of event B given known event A gives: P(A ∧ B) = P(B|A) P(A)
Equating the two expressions and dividing by P(B) yields Bayes' rule (Bayes' theorem):
P(A|B) = P(B|A) P(A) / P(B)
This equation is the basis of most modern AI systems for probabilistic inference. It shows the simple relationship between joint and conditional probabilities.
P(A|B) is the posterior, which we need to calculate; it is read as the probability of hypothesis A given that we have observed evidence B.
P(B|A) is the likelihood: assuming the hypothesis is true, the probability of the evidence.
P(A) is the prior probability: the probability of the hypothesis before considering the evidence.
P(B) is the marginal probability: the probability of the evidence alone.
In general, for mutually exclusive hypotheses A1, ..., An we can write P(B) = Σi P(Ai) P(B|Ai), hence Bayes' rule can be written as:
P(Ai|B) = P(B|Ai) P(Ai) / Σj P(Aj) P(B|Aj)
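A quick numerical check of this general form, using hypothetical priors and likelihoods for two mutually exclusive hypotheses:

# Hypothetical values for two mutually exclusive hypotheses A1, A2.
priors = [0.7, 0.3]        # P(A1), P(A2)
likelihoods = [0.2, 0.9]   # P(B | A1), P(B | A2)

# Marginal probability of the evidence: P(B) = sum_i P(Ai) * P(B | Ai)
p_b = sum(p * l for p, l in zip(priors, likelihoods))

# Posterior for each hypothesis via Bayes' rule.
posteriors = [p * l / p_b for p, l in zip(priors, likelihoods)]
print(p_b, posteriors)  # 0.41 and posteriors of roughly [0.34, 0.66]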
How Bayesian Networks Work
Now that you have a basic grasp of graphs, probability, and Bayes' Theorem, let's look at how a network is put together.
There are two events that could cause the yard to be wet: either the owner hosed it or it has been raining. In any normal circumstance, you wouldn't hose the yard while it was raining.
For each node you can assign true/false values:
Y = Yard wet (True or False)
R = Raining (True or False)
H = Someone using the hose (True or False)
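A minimal sketch of inference in this network by brute-force enumeration. All conditional probability table (CPT) numbers below are hypothetical, since the slide's actual tables are not reproduced here:

from itertools import product

# Hypothetical CPTs for the yard network (edges R -> H, R -> Y, H -> Y).
def p_r(r):        # P(R): prior probability of rain
    return 0.2 if r else 0.8

def p_h(h, r):     # P(H | R): hosing is unlikely while it rains
    p = 0.01 if r else 0.4
    return p if h else 1 - p

def p_y(y, r, h):  # P(Y | R, H): the yard is wet if it rained or was hosed
    p = {(0, 0): 0.0, (0, 1): 0.9, (1, 0): 0.8, (1, 1): 0.99}[(r, h)]
    return p if y else 1 - p

def joint(y, r, h):
    # Chain-rule factorization implied by the network's directed edges.
    return p_r(r) * p_h(h, r) * p_y(y, r, h)

# P(R=true | Y=true): sum out H in the numerator, everything in the denominator.
num = sum(joint(1, 1, h) for h in (0, 1))
den = sum(joint(1, r, h) for r, h in product((0, 1), repeat=2))
print(num / den)  # roughly 0.36 with these hypothetical numbers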
How Bayesian Networks Work
Assigning Probabilities
(The original slide showed the probability tables assigned to each node as a figure.)
How the Naïve Bayes Classifier Works
Suppose we have a dataset of weather conditions and a corresponding target variable "Play".
Using this dataset, we need to decide whether we should play on a particular day according to the weather conditions.
To solve this problem, we follow these steps:
1. Convert the given dataset into frequency tables.
2. Generate a likelihood table by finding the probabilities of the given features.
3. Use Bayes' theorem to calculate the posterior probability.
Problem: if the weather is sunny, should the player play or not?
How the Naïve Bayes Classifier Works
Solution:
Frequency table for the weather conditions:
Weather     Yes   No
Overcast     5     0
Rainy        2     2
Sunny        3     2
Total       10     4
Likelihood table for the weather conditions:
Weather     No            Yes           P(Weather)
Overcast    0             5             5/14 = 0.36
Rainy       2             2             4/14 = 0.29
Sunny       2             3             5/14 = 0.36
All         4/14 = 0.29   10/14 = 0.71
Applying Bayes' theorem:
P(Yes|Sunny) = P(Sunny|Yes) P(Yes) / P(Sunny) = (3/10)(10/14) / (5/14) = 0.60
P(No|Sunny) = P(Sunny|No) P(No) / P(Sunny) = (2/4)(4/14) / (5/14) = 0.40
Since P(Yes|Sunny) > P(No|Sunny), the player should play on a sunny day.
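The same calculation in a few lines of Python, using the counts from the frequency table above:

# Counts from the frequency table: {weather: (yes, no)}.
counts = {"Overcast": (5, 0), "Rainy": (2, 2), "Sunny": (3, 2)}
n_yes = sum(y for y, _ in counts.values())  # 10
n_no = sum(n for _, n in counts.values())   # 4
n = n_yes + n_no                            # 14

def posterior_play(weather):
    # P(Play=Yes | weather) via Bayes' theorem.
    p_w_yes = counts[weather][0] / n_yes    # likelihood P(weather | Yes)
    p_yes = n_yes / n                       # prior P(Yes)
    p_w = sum(counts[weather]) / n          # evidence P(weather)
    return p_w_yes * p_yes / p_w

print(posterior_play("Sunny"))  # approximately 0.6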
Advantages and Disadvantages
Advantages:
Naïve Bayes is one of the fastest and easiest ML algorithms for predicting the class of a dataset.
It can be used for binary as well as multi-class classification.
It performs well in multi-class prediction compared to other algorithms.
It is the most popular choice for text classification problems.
Disadvantages:
Naive Bayes assumes that all features are independent or unrelated, so it cannot learn relationships between features.
Applications of Naïve Bayes Classifier:
It is used for credit scoring.
It is used in medical data classification.
It can be used for real-time predictions because the Naïve Bayes classifier is an eager learner.
It is used in text classification, such as spam filtering and sentiment analysis.
Types of Naïve Bayes Model:
1. Gaussian Naive Bayes:
The Gaussian model assumes that features follow a normal distribution. This means that if predictors take continuous values instead of discrete ones, the model assumes these values are sampled from a Gaussian distribution.
Example: fitting GaussianNB on the Iris dataset.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Load the Iris dataset and hold out half of it for testing.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

# Fit a Gaussian Naive Bayes model and evaluate on the held-out half.
gnb = GaussianNB()
y_pred = gnb.fit(X_train, y_train).predict(X_test)
print("Number of mislabeled points out of a total %d points : %d"
      % (X_test.shape[0], (y_test != y_pred).sum()))
Types of Naïve Bayes Model:
Multinomial Naive Bayes
The Multinomial Naïve Bayes classifier is used when the data is multinomially distributed. It is primarily used for document classification problems, i.e., deciding which category a particular document belongs to, such as sports, politics, or education. The classifier uses the frequencies of words as the predictors.
Example:

import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Six samples with 100 integer count features (e.g., word counts), one class each.
rng = np.random.RandomState(1)
X = rng.randint(5, size=(6, 100))
y = np.array([1, 2, 3, 4, 5, 6])

clf = MultinomialNB()
clf.fit(X, y)
print(clf.predict(X[2:3]))  # predict the class of the third sample
Types of Naïve Bayes Model:
Bernoulli Naive Bayes
BernoulliNB implements the naive Bayes training and classification algorithms for data that is distributed according to multivariate Bernoulli distributions; therefore, this class requires samples to be represented as binary-valued feature vectors. If handed any other kind of data, a BernoulliNB instance may binarize its input.
The Bernoulli classifier works similarly to the Multinomial classifier, but the predictor variables are independent Boolean variables, such as whether a particular word is present in a document or not. This model is also popular for document classification tasks.
Example:

import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Six samples with 100 integer features; BernoulliNB binarizes them at 0 by default.
rng = np.random.RandomState(1)
X = rng.randint(5, size=(6, 100))
Y = np.array([1, 2, 3, 4, 4, 5])

clf = BernoulliNB()
clf.fit(X, Y)
print(clf.predict(X[2:3]))  # predict the class of the third sample
Bayesian Belief Network
A Bayesian belief network is a key technology for dealing with probabilistic events and for solving problems that involve uncertainty. We can define a Bayesian network as:
"A Bayesian network is a probabilistic graphical model which represents a set of variables and their conditional dependencies using a directed acyclic graph."
It is also called a Bayes network, belief network, decision network, or Bayesian model.
Bayesian networks are probabilistic because they are built from a probability distribution, and they also use probability theory for prediction and anomaly detection.
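A minimal sketch of the factorization that a directed acyclic graph encodes, with hypothetical numbers: the joint distribution is the product of each node's probability given its parents, P(X1, ..., Xn) = Π P(Xi | parents(Xi)).

from itertools import product

# Hypothetical CPTs for a three-node chain A -> B -> C.
p_a = {1: 0.3, 0: 0.7}            # P(A)
p_b = {(1, 1): 0.9, (1, 0): 0.2}  # P(B=1 | A)
p_c = {(1, 1): 0.8, (1, 0): 0.1}  # P(C=1 | B)

def joint(a, b, c):
    # Factorization over the DAG: P(A) * P(B | A) * P(C | B).
    pb = p_b[(1, a)] if b else 1 - p_b[(1, a)]
    pc = p_c[(1, b)] if c else 1 - p_c[(1, b)]
    return p_a[a] * pb * pc

# Sanity check: a valid factorization must sum to 1 over all assignments.
print(sum(joint(a, b, c) for a, b, c in product((0, 1), repeat=3)))  # ~1.0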