AI - Module 4


Probabilistic reasoning in Artificial intelligence
Module - 4
Uncertainty:

• So far, we have learned knowledge representation using first-order logic and propositional logic with certainty, which means we were sure about the predicates. With this knowledge representation we might write A→B, which means if A is true then B is true. But consider a situation where we are not sure whether A is true or not; then we cannot express this statement. This situation is called uncertainty.
• So to represent uncertain knowledge, where we are not sure about the predicates, we need uncertain reasoning or probabilistic reasoning.
Causes of uncertainty:

Following are some leading causes of uncertainty in the real world:

• Information from unreliable sources
• Experimental errors
• Equipment faults
• Temperature variation
• Climate change
Probabilistic reasoning:

• Probabilistic reasoning is a way of knowledge representation in which we apply the concept of probability to indicate the uncertainty in knowledge. In probabilistic reasoning, we combine probability theory with logic to handle uncertainty.
• We use probability in probabilistic reasoning because it provides a way to handle the uncertainty that results from someone's laziness or ignorance.
• In the real world, there are many scenarios where the certainty of something is not confirmed, such as "It will rain today," "the behavior of someone in some situation," or "a match between two teams or two players." These are probable sentences for which we can assume an outcome but cannot be sure about it, so here we use probabilistic reasoning.
Need of probabilistic reasoning in AI:
• When there are unpredictable outcomes.
• When the specifications or possibilities of predicates become too large to handle.
• When an unknown error occurs during an experiment.

In probabilistic reasoning, there are two ways to solve problems with uncertain knowledge:
• Bayes' rule
• Bayesian statistics
Need of probabilistic reasoning in AI
(Cont..)
As probabilistic reasoning uses probability and related terms, let's first understand some common terms before studying probabilistic reasoning:
• Probability: Probability can be defined as the chance that an uncertain event will occur. It is the numerical measure of the likelihood that an event will occur. The value of a probability always lies between 0 and 1, the two extremes representing total uncertainty and total certainty.

0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.
P(A) = 0 indicates total uncertainty in an event A.
P(A) = 1 indicates total certainty in an event A.
Need of probabilistic reasoning in AI
(Cont..)
We can find the probability of an uncertain event not happening by using the formula below:
P(¬A) = probability of event A not happening
P(¬A) + P(A) = 1
Event: Each possible outcome of a variable is called an event.
Sample space: The collection of all possible events is called the sample space.
Random variables: Random variables are used to represent the events and objects in the real world.
Prior probability: The prior probability of an event is the probability computed before observing new information.
Posterior probability: The probability that is calculated after all evidence or information has been taken into account. It is a combination of prior probability and new information.
Conditional probability:

Conditional probability is the probability of an event occurring given that another event has already happened.
Suppose we want to calculate the probability of event A when event B has already occurred, "the probability of A under the condition of B". It can be written as:

P(A|B) = P(A⋀B) / P(B)

where P(A⋀B) = the joint probability of A and B, and
P(B) = the marginal probability of B.
If the probability of A is given and we need to find the probability of B, then it will be given as:

P(B|A) = P(A⋀B) / P(A)
Example of Conditional probability:

Example:
In a class, 70% of the students like English and 40% of the students like both English and mathematics. What percentage of the students who like English also like mathematics?
Solution:
Let A be the event that a student likes mathematics and B be the event that a student likes English.

P(A|B) = P(A⋀B) / P(B) = 0.4 / 0.7 = 0.57

Hence, 57% of the students who like English also like mathematics.
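
As a quick check, here is a minimal Python sketch of this calculation; the helper name cond_prob is our own, purely for illustration:

    def cond_prob(p_a_and_b: float, p_b: float) -> float:
        """Conditional probability P(A|B) = P(A and B) / P(B)."""
        if p_b == 0:
            raise ValueError("P(B) must be non-zero")
        return p_a_and_b / p_b

    # Class example: P(likes Mathematics | likes English) = 0.4 / 0.7
    print(round(cond_prob(0.4, 0.7), 2))  # 0.57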
Bayes' theorem in Artificial intelligence

Bayes' theorem:
Bayes' theorem is also known as Bayes' rule, Bayes' law, or Bayesian reasoning. It determines the probability of an event given uncertain knowledge.
In probability theory, it relates the conditional and marginal probabilities of two random events.
Bayes' theorem is named after the British mathematician Thomas Bayes. Bayesian inference is an application of Bayes' theorem, and it is fundamental to Bayesian statistics.
Bayes' theorem in Artificial intelligence
It is a way to calculate the value of P(B|A) with the knowledge of P(A|B). Bayes' theorem allows us to update the probability prediction of an event by observing new information from the real world.

Example: If cancer is related to one's age, then by using Bayes' theorem we can determine the probability of cancer more accurately with the help of age.

Bayes' theorem can be derived using the product rule and the conditional probability of event A with known event B.
From the product rule we can write:
P(A ⋀ B) = P(A|B) P(B)
Similarly, for the probability of event B with known event A:
P(A ⋀ B) = P(B|A) P(A)
Bayes' theorem in Artificial intelligence (Cont..)
Equating the right-hand sides of both equations, we get:

P(A|B) = P(B|A) P(A) / P(B)    ...(a)

The above equation (a) is called Bayes' rule or Bayes' theorem. This equation is the basis of most modern AI systems for probabilistic inference.
It shows the simple relationship between joint and conditional probabilities. Here,
P(A|B) is known as the posterior, which we need to calculate; it is read as the probability of hypothesis A when we have observed evidence B.
P(B|A) is called the likelihood, in which we assume that the hypothesis is true and then calculate the probability of the evidence.
P(A) is called the prior probability: the probability of the hypothesis before considering the evidence.
P(B) is called the marginal probability: the pure probability of the evidence.
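
Bayes' rule translates directly into a one-line function. A minimal Python sketch (the name bayes_rule is our own, for illustration):

    def bayes_rule(likelihood: float, prior: float, evidence: float) -> float:
        """Posterior P(A|B) = P(B|A) * P(A) / P(B)."""
        return likelihood * prior / evidence

    # e.g. the meningitis example below: bayes_rule(0.8, 1/30000, 0.02)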
Bayes' theorem in Artificial intelligence
(Cont..)
In equation (a), in general, we can write P(B) = Σi P(Ai) P(B|Ai); hence Bayes' rule can be written as:

P(Ai|B) = P(Ai) P(B|Ai) / Σk P(Ak) P(B|Ak)

where A1, A2, A3, ..., An is a set of mutually exclusive and exhaustive events.
Example of Bayes Theorem
Example-1:
Question: What is the probability that a patient has the disease meningitis given a stiff neck?
Given data:
A doctor is aware that the disease meningitis causes a patient to have a stiff neck 80% of the time. He is also aware of some more facts, which are given as follows:
• The known probability that a patient has meningitis is 1/30,000.
• The known probability that a patient has a stiff neck is 2%.
Let a be the proposition that the patient has a stiff neck and b be the proposition that the patient has meningitis. We can then write:
P(a|b) = 0.8
P(b) = 1/30000
P(a) = 0.02
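
Applying Bayes' rule to these values:

P(b|a) = P(a|b) P(b) / P(a) = (0.8 × 1/30000) / 0.02 ≈ 0.0013

Hence, roughly 1 in 750 patients with a stiff neck can be expected to actually have meningitis.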
Example of Bayes Theorem (Cont..)
Example-2:
Question: From a standard deck of playing cards, a single card is drawn. The probability that the card is a king is 4/52. Calculate the posterior probability P(King|Face), i.e. the probability that the drawn face card is a king.
Solution:

P(King|Face) = P(Face|King) P(King) / P(Face)    ...(i)

P(King): probability that the card is a king = 4/52 = 1/13
P(Face): probability that the card is a face card = 12/52 = 3/13
P(Face|King): probability that a card is a face card given that it is a king = 1
Putting all the values into equation (i), we get:
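
P(King|Face) = (1 × 1/13) / (3/13) = 1/3 ≈ 0.33

So one third of the face cards are kings.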
Example of Bayes Theorem (Cont..)
• Problem 3:
Let's solve one more example from a popular topic: Covid-19. As you know, Covid-19 tests are common nowadays, but some test results are not true. Let's assume a diagnostic test has 99% accuracy and 60% of all people have Covid-19. If a patient tests positive, what is the probability that they actually have the disease?
Example of Bayes Theorem (Cont..)
Out of 100 people, 60 have Covid-19, and 59.4 of them test positive (true positives); of the 40 without Covid-19, 0.4 test positive (false positives).
The total units with positive results = 59.4 + 0.4 = 59.8
59.4 true-positive units out of 59.8 positive units means 59.4/59.8 ≈ 99.3% = 0.993 probability.

With Bayes' theorem:

P(positive|covid19) = 0.99
P(covid19) = 0.6
P(positive) = 0.6 × 0.99 + 0.4 × 0.01 = 0.598
P(covid19|positive) = (0.99 × 0.6) / 0.598 ≈ 0.993
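
A quick Python check of these numbers (the variable names are our own):

    accuracy, prevalence = 0.99, 0.60
    p_positive = prevalence * accuracy + (1 - prevalence) * (1 - accuracy)  # 0.598
    posterior = accuracy * prevalence / p_positive
    print(round(posterior, 3))  # 0.993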
Application of Bayes' theorem in Artificial intelligence:

Following are some applications of Bayes' theorem:


• It is used to calculate the next step of the robot when the
already executed step is given.
• Bayes' theorem is helpful in weather forecasting.
• It can solve the Monty Hall problem.
Naïve Bayes Classifier Algorithm

• The Naïve Bayes algorithm is a supervised learning algorithm which is based on Bayes' theorem and used for solving classification problems.
• It is mainly used in text classification with high-dimensional training datasets.
• The Naïve Bayes classifier is one of the simplest and most effective classification algorithms, and it helps in building fast machine learning models that can make quick predictions.
• It is a probabilistic classifier, which means it predicts on the basis of the probability of an object.
Why Naive Bayes?
• The Naïve Bayes algorithm is made up of two words, Naïve and Bayes, which can be described as:
• Naïve: It is called naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of other features.
• Bayes: It is called Bayes because it depends on the principle of Bayes' theorem.
• Some popular examples of the Naïve Bayes algorithm are spam filtering, sentiment analysis, and classifying articles.
Bayes Theorem
Bayes' theorem is also known as Bayes' rule or Bayes' law, and is used to determine the probability of a hypothesis with prior knowledge. It depends on conditional probability.

The formula for Bayes' theorem is given as:

P(A|B) = P(B|A) P(A) / P(B)

Where:
P(A|B) is the posterior probability: the probability of hypothesis A given the observed event B.
P(B|A) is the likelihood probability: the probability of the evidence given that the hypothesis is true.
P(A) is the prior probability: the probability of the hypothesis before observing the evidence.
P(B) is the marginal probability: the probability of the evidence.
Steps of Naïve Bayes' Classifier:

So to solve this problem, we need to follow the


below steps:

1.Convert the given dataset into frequency tables.

2. Generate Likelihood table by finding the


probabilities of given features.

3. Now, use Bayes theorem to calculate the


posterior probability.
Working of Naïve Bayes' Classifier:
• Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset, we need to decide whether we should play or not on a particular day according to the weather conditions.

Problem: If the weather is sunny, should the player play or not?

Solution: To solve this, first consider the dataset below:
Data Set
Outlook Play

0 Rainy Yes
1 Sunny Yes
2 Overcast Yes
3 Overcast Yes
4 Sunny No
5 Rainy Yes
6 Sunny Yes
7 Overcast Yes
8 Rainy No
9 Sunny No
10 Sunny Yes
11 Rainy No
12 Overcast Yes
13 Overcast Yes
Frequency Distribution
Weather Yes No

Overcast 5 0

Rainy 2 2

Sunny 3 2

Total 10 4
Likelihood Table of Weather Condition
Weather     No            Yes           P(Weather)
Overcast    0             5             5/14 = 0.35
Rainy       2             2             4/14 = 0.29
Sunny       2             3             5/14 = 0.35
All         4/14 = 0.29   10/14 = 0.71

Applying Bayes' theorem:

P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
P(Sunny|Yes) = 3/10 = 0.3
P(Sunny) = 0.35
P(Yes) = 0.71
So P(Yes|Sunny) = 0.3 * 0.71 / 0.35 = 0.60

P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
P(Sunny|No) = 2/4 = 0.5
P(No) = 0.29
P(Sunny) = 0.35
So P(No|Sunny) = 0.5 * 0.29 / 0.35 = 0.41

As we can see from the above calculation, P(Yes|Sunny) > P(No|Sunny).
Hence, on a sunny day, the player can play the game.
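
The same frequency-table procedure is easy to script. Below is a minimal Python sketch for this single-feature dataset, an illustration of the steps above rather than a production classifier:

    # The weather dataset: (Outlook, Play)
    data = [("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"), ("Overcast", "Yes"),
            ("Sunny", "No"), ("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
            ("Rainy", "No"), ("Sunny", "No"), ("Sunny", "Yes"), ("Rainy", "No"),
            ("Overcast", "Yes"), ("Overcast", "Yes")]

    def posterior(outlook: str, label: str) -> float:
        """P(label | outlook) via Bayes' rule on the frequency tables."""
        n = len(data)
        n_label = sum(1 for _, y in data if y == label)            # frequency of the label
        n_joint = sum(1 for x, y in data if x == outlook and y == label)
        p_outlook = sum(1 for x, _ in data if x == outlook) / n    # P(outlook)
        return (n_joint / n_label) * (n_label / n) / p_outlook     # likelihood * prior / evidence

    print(round(posterior("Sunny", "Yes"), 2))  # 0.6
    print(round(posterior("Sunny", "No"), 2))   # 0.4 (the 0.41 above comes from rounded inputs)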
Advantages of Naïve Bayes Classifier:

• Naïve Bayes is one of the fast and easy ML


algorithms to predict a class of datasets.

• It can be used for Binary as well as Multi-class


Classifications.

• It performs well in Multi-class predictions as


compared to the other Algorithms.

• It is the most popular choice for text classification


problems.
Disadvantages of Naïve Bayes Classifier:
• Naive Bayes assumes that all features are
independent or unrelated, so it cannot learn the
relationship between features.
Bayesian Belief Network in artificial intelligence

• A Bayesian belief network is a key computer technology for dealing with probabilistic events and for solving problems that involve uncertainty. We can define a Bayesian network as:
• "A Bayesian network is a probabilistic graphical model which represents a set of variables and their conditional dependencies using a directed acyclic graph."
• It is also called a Bayes network, belief network, decision network, or Bayesian model.
Bayesian Belief Network in artificial intelligence
Bayesian networks are probabilistic because they are built from a probability distribution and also use probability theory for prediction and anomaly detection.

Real-world applications are probabilistic in nature, and to represent the relationships between multiple events we need a Bayesian network. It can be used in various tasks including prediction, anomaly detection, diagnostics, automated insight, reasoning, time-series prediction, and decision making under uncertainty.
A Bayesian network can be used for building models from data and expert opinions, and it consists of two parts:
• A directed acyclic graph
• A table of conditional probabilities
The generalized form of a Bayesian network that represents and solves decision problems under uncertain knowledge is known as an influence diagram.
Influence Diagram:
A Bayesian network graph is made up of nodes and arcs (directed links), where:

• Each node corresponds to a random variable, and a variable can be continuous or discrete.
Influence Diagram: (Cont..)
• Arcs or directed arrows represent the causal relationships or conditional probabilities between random variables. These directed links or arrows connect pairs of nodes in the graph.
• These links represent that one node directly influences the other node; if there is no directed link, the nodes are independent of each other. In the example graph, A, B, C, and D are random variables represented by the nodes of the network.
• If we consider node B, which is connected to node A by a directed arrow, then node A is called the parent of node B.
• Node C is independent of node A.
Components of Bayesian network:
Note: The Bayesian network graph does not contain any cycles. Hence, it is known as a directed acyclic graph, or DAG.
The Bayesian network has mainly two components:
• Causal component
• Actual numbers
Each node in the Bayesian network has a conditional probability distribution P(Xi | Parent(Xi)), which determines the effect of the parent on that node.
A Bayesian network is based on the joint probability distribution and conditional probability. So let's first understand the joint probability distribution:
Joint probability distribution:

If we have variables x1, x2, x3, ..., xn, then the probabilities of the different combinations of x1, x2, x3, ..., xn are known as the joint probability distribution.

By the chain rule, P[x1, x2, x3, ..., xn] can be written as follows in terms of conditional probabilities:
= P[x1 | x2, x3, ..., xn] P[x2, x3, ..., xn]
= P[x1 | x2, x3, ..., xn] P[x2 | x3, ..., xn] .... P[xn-1 | xn] P[xn]

In general, for each variable Xi in a Bayesian network we can write:
P(Xi | Xi-1, ..., X1) = P(Xi | Parents(Xi))
Explanation of Bayesian network:

• Let's understand the Bayesian network through an example by creating a directed acyclic graph:
• Example: Harry installed a new burglar alarm at his home to detect burglary. The alarm reliably responds to a burglary but also responds to minor earthquakes. Harry has two neighbors, David and Sophia, who have taken responsibility for informing Harry at work when they hear the alarm. David always calls Harry when he hears the alarm, but sometimes he gets confused with the phone ringing and calls then too. On the other hand, Sophia likes to listen to loud music, so sometimes she misses the alarm. Here we would like to compute the probability of the burglar-alarm scenario.
• Problem:
• Calculate the probability that the alarm has sounded, but neither a burglary nor an earthquake has occurred, and both David and Sophia have called Harry.
Solution:

• The Bayesian network for the above problem is given below. The network structure shows that burglary and earthquake are the parent nodes of the alarm and directly affect the probability of the alarm going off, while David's and Sophia's calls depend on the alarm probability.
• The network represents our assumptions: the neighbors do not directly perceive the burglary, do not notice the minor earthquake, and do not confer before calling.
• The conditional distribution for each node is given as a conditional probabilities table, or CPT.
• Each row in a CPT must sum to 1 because all the entries in the table represent an exhaustive set of cases for the variable.
• In a CPT, a boolean variable with k boolean parents contains 2^k probabilities. Hence, if there are two parents, then the CPT will contain 4 probability values.
Solution (Cont..)

List of all events occurring in this network:


• Burglary (B)
• Earthquake(E)
• Alarm(A)
• David Calls(D)
• Sophia calls(S)

We can write the events of problem statement in the form of


probability: P[D, S, A, B, E], can rewrite the above probability
statement using joint probability distribution:
Solution (Cont..)

P[D, S, A, B, E]= P[D | S, A, B, E]. P[S, A, B, E]

=P[D | S, A, B, E]. P[S | A, B, E]. P[A, B, E]

= P [D| A]. P [ S| A, B, E]. P[ A, B, E]

= P[D | A]. P[ S | A]. P[A| B, E]. P[B, E]

= P[D | A ]. P[S | A]. P[A| B, E]. P[B |E]. P[E]


Bayesian network of the Problem
Solution (Cont..)

Let's take the observed probabilities for the burglary and earthquake components:
P(B=True) = 0.002, which is the probability of a burglary.
P(B=False) = 0.998, which is the probability of no burglary.
P(E=True) = 0.001, which is the probability of a minor earthquake.
P(E=False) = 0.999, which is the probability that an earthquake has not occurred.

We can provide the conditional probabilities as per the tables below.

Conditional probability table for Alarm A:
The conditional probability of alarm A depends on burglary and earthquake:
Solution (Cont..)

B        E        P(A=True)   P(A=False)
True     True     0.94        0.06
True     False    0.95        0.05
False    True     0.31        0.69
False    False    0.001       0.999

Conditional probability table for David Calls:
The conditional probability that David will call depends on the probability of the alarm.

A        P(D=True)   P(D=False)
True     0.91        0.09
False    0.05        0.95

For Sophia Calls, the entry used in the calculation below is P(S=True | A=True) = 0.75.
Solution (Cont..)
From the formula of the joint distribution, we can write the problem statement in the form of a probability distribution:
P(S, D, A, ¬B, ¬E) = P(S|A) * P(D|A) * P(A|¬B ⋀ ¬E) * P(¬B) * P(¬E)
= 0.75 * 0.91 * 0.001 * 0.998 * 0.999
= 0.00068045
Hence, a Bayesian network can answer any query about the domain by using the joint distribution.
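
This factorization translates directly into code. Below is a minimal Python sketch using the CPT values from the tables above (a hand-rolled calculation for this one network, not a Bayesian-network library):

    # CPTs from the tables above
    P_B = {True: 0.002, False: 0.998}
    P_E = {True: 0.001, False: 0.999}
    P_A = {(True, True): 0.94, (True, False): 0.95,
           (False, True): 0.31, (False, False): 0.001}  # P(A=True | B, E)
    P_D = {True: 0.91, False: 0.05}  # P(D=True | A)
    P_S = {True: 0.75}               # P(S=True | A=True); the slides give only this entry

    def joint(d, s, a, b, e):
        """P(D, S, A, B, E) = P(D|A) * P(S|A) * P(A|B,E) * P(B) * P(E)."""
        p_a = P_A[(b, e)] if a else 1 - P_A[(b, e)]
        p_d = P_D[a] if d else 1 - P_D[a]
        p_s = P_S[a] if s else 1 - P_S[a]
        return p_d * p_s * p_a * P_B[b] * P_E[e]

    # P(S=True, D=True, A=True, B=False, E=False)
    print(joint(d=True, s=True, a=True, b=False, e=False))  # ≈ 0.00068045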
Semantics of the Bayesian network
There are two ways to understand the semantics of a Bayesian network, given below:

1. To understand the network as a representation of the joint probability distribution.
This is helpful for understanding how to construct the network.

2. To understand the network as an encoding of a collection of conditional independence statements.
This is helpful for designing inference procedures.
Example 2: Bayesian Network
• Let us imagine that we are given the task of modeling a student's marks (m) for an exam he has just taken. From the given Bayesian network graph below, we see that the marks depend upon two other variables:
• Exam Level (e) – This discrete variable denotes the difficulty of the exam and has two values (0 for easy and 1 for difficult).
• IQ Level (i) – This represents the Intelligence Quotient level of the student and is also discrete in nature, having two values (0 for low and 1 for high).
• Additionally, the IQ level of the student also leads us to another variable, the Aptitude Score of the student (s). With the marks the student has scored, he can secure admission to a particular university. The probability distribution for getting admitted (a) to a university is also given below.
Example 2 (Cont..)
• In the above graph, we see several tables representing the probability distribution values of the given 5 variables. These tables are called Conditional Probability Tables, or CPTs. There are a few properties of a CPT, given below:
• The sum of the CPT values in each row must be equal to 1 because all the possible cases for a particular variable are exhaustive (representing all possibilities).
• If a variable that is Boolean in nature has k Boolean parents, then its CPT has 2^k probability values.
• Coming back to our problem, let us first list all the possible events occurring in the above-given tables.
Example 2 (Cont..)
• Exam Level (e)
• IQ Level (i)
• Aptitude Score (s)
• Marks (m)
• Admission (a)
These five variables are represented in the form of a directed acyclic graph (DAG) in a Bayesian network format with their conditional probability tables. Now, to calculate the joint probability distribution of the 5 variables, the formula is given by:
P[a, m, i, e, s] = P(a | m) . P(m | i, e) . P(i) . P(e) . P(s | i)
Example 2 (Cont..)
• From the above formula,
• P(a | m) denotes the conditional probability of the student
getting admission based on the marks he has scored in the
examination.
• P(m | i, e) represents the marks that the student will score
given his IQ level and difficulty of the Exam Level.
• P(i) and P(e) represent the probability of the IQ Level and the
Exam Level.
• P(s | i) is the conditional probability of the student’s Aptitude
Score, given his IQ Level.
• With these probabilities calculated, we can find the joint probability distribution of the entire Bayesian network.
Example 2 (Case 1)
Case 1: Calculate the probability that, in spite of the exam level being difficult, a student with a low IQ level and a low aptitude score manages to pass the exam and secure admission to the university.
Solution : Case 1
From the above word-problem statement, the joint probability distribution can be written as below:
P[a=1, m=1, i=0, e=1, s=0]
From the above conditional probability tables, the values for the given conditions are fed into the formula and calculated as below:
P[a=1, m=1, i=0, e=1, s=0] = P(a=1 | m=1) . P(m=1 | i=0, e=1) . P(i=0) . P(e=1) . P(s=0 | i=0)
= 0.1 * 0.1 * 0.8 * 0.3 * 0.75
= 0.0018
Example 2 (Case 2)
• Case 2: In another case, calculate the probability that a student with a high IQ level and a high aptitude score, with the exam being easy, nevertheless fails to pass and does not secure admission to the university.
Solution : Case 2
The formula for the JPD is given by:
P[a=0, m=0, i=1, e=0, s=1]
Thus,
P[a=0, m=0, i=1, e=0, s=1] = P(a=0 | m=0) . P(m=0 | i=1, e=0) . P(i=1) . P(e=0) . P(s=1 | i=1)
= 0.6 * 0.5 * 0.2 * 0.7 * 0.6
= 0.0252
Hence, in this way, we can make use of Bayesian networks and probability tables to calculate the probability of various possible events.
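
Both cases use the same factorization, so they are easy to verify in code. A minimal Python sketch; the CPT entries are taken from the two worked cases above (the full tables appeared in a figure not reproduced here):

    from math import prod

    # P(a|m), P(m|i,e), P(i), P(e), P(s|i) for each case
    case1 = [0.1, 0.1, 0.8, 0.3, 0.75]  # a=1, m=1, i=0, e=1, s=0
    case2 = [0.6, 0.5, 0.2, 0.7, 0.6]   # a=0, m=0, i=1, e=0, s=1

    print(round(prod(case1), 4))  # 0.0018
    print(round(prod(case2), 4))  # 0.0252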
