FAI Module 3


Module - 3

Probabilistic reasoning in Artificial intelligence


Uncertainty:
Until now, we have represented knowledge using first-order logic and propositional logic
with certainty, which means we were sure about the predicates. With this kind of
knowledge representation, we might write A→B, which means that if A is true then B is
true. But consider a situation where we are not sure whether A is true or
not; then we cannot express this statement. This situation is called uncertainty.
So, to represent uncertain knowledge, where we are not sure about the
predicates, we need uncertain reasoning or probabilistic reasoning.
Causes of uncertainty:
Following are some leading causes of uncertainty in the real world:
1. Information obtained from unreliable sources
2. Experimental errors
3. Equipment faults
4. Temperature variation
5. Climate change
Probabilistic reasoning:
Probabilistic reasoning is a way of knowledge representation where we
apply the concept of probability to indicate the uncertainty in knowledge. In
probabilistic reasoning, we combine probability theory with logic to handle the
uncertainty.
We use probability in probabilistic reasoning because it provides a way to
handle the uncertainty that results from laziness and ignorance.
In the real world, there are many scenarios where the certainty of
something is not confirmed, such as "It will rain today," "the behaviour of someone in
a given situation," or "the result of a match between two teams or two players." These are
probable sentences for which we can assume that something will happen but cannot be
sure about it, so here we use probabilistic reasoning.
Need of probabilistic reasoning in AI:
o When there are unpredictable outcomes.
o When the specifications or possibilities of predicates become too large to
handle.
o When an unknown error occurs during an experiment.
In probabilistic reasoning, there are two ways to solve problems with uncertain
knowledge:
o Bayes' rule
o Bayesian Statistics
As probabilistic reasoning uses probability and related terms, let's first
understand some common terms:
Probability: Probability can be defined as the chance that an uncertain event will occur. It is
a numerical measure of the likelihood that an event will occur. The value of a probability
always lies between 0 and 1:
0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.
P(A) = 0 indicates total uncertainty in event A (the event will not occur).
P(A) = 1 indicates total certainty in event A (the event will surely occur).
We can find the probability of an uncertain event by using the below formula:
P(A) = Number of desired outcomes / Total number of possible outcomes
o P(¬A) = probability of event A not happening.
o P(¬A) + P(A) = 1.
Event: Each possible outcome of a variable is called an event.
Sample space: The collection of all possible outcomes is called the sample space.
Random variables: Random variables are used to represent the events and objects in the
real world.
Prior probability: The prior probability of an event is the probability computed before
observing new information.
Posterior probability: The probability that is calculated after all evidence or information
has been taken into account. It is a combination of the prior probability and new information.
Conditional probability:
Conditional probability is the probability of an event occurring given that another
event has already happened.
Suppose we want to calculate the probability of event A when event B has already
occurred, "the probability of A under the condition B". It can be written as:
P(A|B) = P(A⋀B) / P(B)
Where P(A⋀B) = joint probability of A and B
P(B) = marginal probability of B.
If the probability of A is given and we need to find the probability of B, then it will
be given as:
P(B|A) = P(A⋀B) / P(A)
This can be explained using a Venn diagram: once event B has occurred, the
sample space is reduced to the set B, and we can calculate event A given that
event B has occurred by dividing the probability P(A⋀B) by P(B).

Example:
In a class, 70% of the students like English and 40% of the students like both
English and mathematics. What percentage of the students who like English
also like mathematics?
Solution:
Let A be the event that a student likes mathematics
and B be the event that a student likes English.
P(A|B) = P(A⋀B) / P(B) = 0.4 / 0.7 ≈ 0.57
Hence, 57% of the students who like English also like mathematics.
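As a quick check, the same arithmetic can be sketched in Python (the variable names below are illustrative, not from the text):

```python
# Conditional probability: P(A|B) = P(A and B) / P(B)
p_english = 0.70            # P(B): student likes English
p_english_and_math = 0.40   # P(A and B): student likes both subjects

p_math_given_english = p_english_and_math / p_english
print(round(p_math_given_english, 2))  # 0.57, i.e. about 57%
```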
Bayes' theorem in Artificial intelligence:
Bayes' theorem:
Bayes' theorem is also known as Bayes' rule, Bayes' law, or Bayesian reasoning,
which determines the probability of an event with uncertain knowledge.
In probability theory, it relates the conditional probability and marginal
probabilities of two random events.
Bayes' theorem was named after the British mathematician Thomas Bayes.
The Bayesian inference is an application of Bayes' theorem, which is fundamental
to Bayesian statistics.
It is a way to calculate the value of P(B|A) with the knowledge of P(A|B).
Bayes' theorem allows updating the probability prediction of an event by
observing new information from the real world.
Example: If cancer corresponds to one's age then by using Bayes' theorem, we
can determine the probability of cancer more accurately with the help of age.
Bayes' theorem can be derived using the product rule and the conditional probability of
event A with known event B:
As from the product rule we can write:
1. P(A ⋀ B) = P(A|B) P(B)
Similarly, the probability of event B with known event A:
2. P(A ⋀ B) = P(B|A) P(A)
Equating the right-hand sides of both equations, we get:
P(A|B) = P(B|A) P(A) / P(B)    .......... (a)
The above equation (a) is called Bayes' rule or Bayes' theorem. This equation is
the basis of most modern AI systems for probabilistic inference.
It shows the simple relationship between joint and conditional probabilities. Here,
P(A|B) is known as the posterior, which we need to calculate. It is read as the
probability of hypothesis A given that evidence B has occurred.
P(B|A) is called the likelihood, in which we consider that the hypothesis is true and
then calculate the probability of the evidence.
P(A) is called the prior probability, the probability of the hypothesis before considering
the evidence.
P(B) is called the marginal probability, the pure probability of the evidence.
In equation (a), in general, we can write P(B) = Σi P(Ai) P(B|Ai), hence Bayes'
rule can be written as:
P(Ai|B) = P(Ai) P(B|Ai) / Σk P(Ak) P(B|Ak)
Where A1, A2, A3,........, An is a set of mutually exclusive and exhaustive events.
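This general form can be sketched in Python as follows; the function name and the two-hypothesis numbers in the example call are illustrative assumptions, not values from the text:

```python
# General Bayes' rule over mutually exclusive, exhaustive hypotheses A1..An:
# P(Ai | B) = P(Ai) * P(B | Ai) / sum_k P(Ak) * P(B | Ak)

def posterior(priors, likelihoods):
    """priors[i] = P(Ai), likelihoods[i] = P(B | Ai); returns P(Ai | B) for each i."""
    evidence = sum(p * l for p, l in zip(priors, likelihoods))  # P(B)
    return [p * l / evidence for p, l in zip(priors, likelihoods)]

# Illustrative call with two hypotheses; the posteriors sum to 1.
print(posterior([0.3, 0.7], [0.9, 0.2]))
```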
Applying Bayes' rule:
Bayes' rule allows us to compute the single term P(B|A) in terms of P(A|B), P(B),
and P(A). This is very useful in cases where we have good estimates of these
three terms and want to determine the fourth one. Suppose we want to perceive
the effect of some unknown cause and want to compute that cause; then
Bayes' rule becomes:
P(cause|effect) = P(effect|cause) P(cause) / P(effect)
Example-1:
Question: What is the probability that a patient has the disease meningitis, given that the
patient has a stiff neck?
Given data:
A doctor is aware that the disease meningitis causes a patient to have a stiff neck
80% of the time. He is also aware of some more facts, which are
given as follows:
o The known probability that a patient has meningitis is 1/30,000.
o The known probability that a patient has a stiff neck is 2%.
Let a be the proposition that the patient has a stiff neck and b be the proposition that
the patient has meningitis. Then we can calculate the following:
P(a|b) = 0.8
P(b) = 1/30000
P(a) = 0.02
Applying Bayes' rule:
P(b|a) = P(a|b) P(b) / P(a) = (0.8 × 1/30000) / 0.02 ≈ 0.00133
Hence, we can assume that about 1 in 750 patients with a stiff neck has meningitis.
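The same calculation as a minimal Python sketch:

```python
# Bayes' rule: P(meningitis | stiff neck) = P(a|b) * P(b) / P(a)
p_a_given_b = 0.8       # P(stiff neck | meningitis)
p_b = 1 / 30000         # prior P(meningitis)
p_a = 0.02              # P(stiff neck)

p_b_given_a = p_a_given_b * p_b / p_a
print(p_b_given_a)      # ~0.00133, i.e. roughly 1 in 750
```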
Application of Bayes' theorem in Artificial intelligence:
Following are some applications of Bayes' theorem:
o It is used to calculate the next step of the robot when the already executed
step is given.
o Bayes' theorem is helpful in weather forecasting.
o It can solve the Monty Hall problem.
Bayesian Belief Network in artificial intelligence
A Bayesian belief network is a key computer technology for dealing with probabilistic
events and for solving problems that involve uncertainty. We can define a Bayesian
network as:
"A Bayesian network is a probabilistic graphical model which represents a set of
variables and their conditional dependencies using a directed acyclic graph."
It is also called a Bayes network, belief network, decision network, or Bayesian
model.
Bayesian networks are probabilistic, because these networks are built from
a probability distribution, and also use probability theory for prediction and
anomaly detection.
Real world applications are probabilistic in nature, and to represent the
relationship between multiple events, we need a Bayesian network. It can also be
used in various tasks including prediction, anomaly detection, diagnostics,
automated insight, reasoning, time series prediction, and decision making under
uncertainty.
A Bayesian network can be used for building models from data and expert
opinions, and it consists of two parts:
o Directed acyclic graph
o Table of conditional probabilities.
The generalized form of a Bayesian network that represents and solves decision
problems under uncertain knowledge is known as an influence diagram.
A Bayesian network graph is made up of nodes and arcs (directed links), where:
o Each node corresponds to a random variable, and a variable can
be continuous or discrete.
o Arcs or directed arrows represent the causal relationships or conditional
probabilities between random variables. These directed links or arrows
connect pairs of nodes in the graph.
These links represent that one node directly influences the other node; if
there is no directed link between two nodes, they are independent of each
other.
o In the above diagram, A, B, C, and D are random variables
represented by the nodes of the network graph.
o If we are considering node B, which is connected with node A by a
directed arrow, then node A is called the parent of Node B.
o Node C is independent of node A.
Note: The Bayesian network graph does not contain any cycles. Hence, it
is known as a directed acyclic graph, or DAG.
The Bayesian network has mainly two components:
o Causal Component
o Actual numbers
Each node in the Bayesian network has a conditional probability
distribution P(Xi | Parents(Xi)), which determines the effect of the parents on that
node.
A Bayesian network is based on the joint probability distribution and conditional
probability, so let's first understand the joint probability distribution:
Joint probability distribution:
If we have variables x1, x2, x3, ..., xn, then the probabilities of the different
combinations of x1, x2, x3, ..., xn are known as the joint probability distribution.
The joint probability P[x1, x2, x3, ..., xn] can be expanded by the chain rule as
follows:
= P[x1 | x2, x3, ..., xn] P[x2, x3, ..., xn]
= P[x1 | x2, x3, ..., xn] P[x2 | x3, ..., xn] .... P[xn-1 | xn] P[xn]
In a Bayesian network, for each variable Xi we can simplify each factor as:
P(Xi | Xi-1, ..., X1) = P(Xi | Parents(Xi))
Explanation of Bayesian network:
Let's understand the Bayesian network through an example by creating a
directed acyclic graph:
Example: Harry installed a new burglar alarm at his home to detect burglary. The
alarm responds reliably to a burglary but also responds to minor
earthquakes. Harry has two neighbors, David and Sophia, who have taken
responsibility for informing Harry at work when they hear the alarm. David always
calls Harry when he hears the alarm, but sometimes he confuses the phone
ringing with the alarm and calls then too. On the other hand, Sophia likes to listen
to loud music, so sometimes she misses the alarm. Here we would like to
compute probabilities in this burglar alarm network.

Problem:
Calculate the probability that the alarm has sounded, but neither a burglary
nor an earthquake has occurred, and both David and Sophia called Harry.
Solution:
o The Bayesian network for the above problem is given below. The network
structure shows that Burglary and Earthquake are the parent nodes of
Alarm and directly affect the probability of the alarm going off, while
David's and Sophia's calls depend only on the Alarm.
o The network represents the assumptions that David and Sophia do not
directly perceive the burglary, do not notice the minor earthquake, and do
not confer before calling.
o The conditional distribution for each node is given as a conditional
probabilities table, or CPT.
o Each row in a CPT must sum to 1 because the entries in a row
represent an exhaustive set of cases for the variable.
o In a CPT, a Boolean variable with k Boolean parents has 2^k rows of probabilities.
Hence, if there are two parents, the CPT will contain 4 rows of probability values.
List of all events occurring in this network:
o Burglary (B)
o Earthquake(E)
o Alarm(A)
o David Calls(D)
o Sophia calls(S)
We can write the events of the problem statement in the form of probabilities as P[D, S, A,
B, E], and rewrite this using the joint probability
distribution:
P[D, S, A, B, E]= P[D | S, A, B, E]. P[S, A, B, E]
=P[D | S, A, B, E]. P[S | A, B, E]. P[A, B, E]
= P [D| A]. P [ S| A, B, E]. P[ A, B, E]
= P[D | A]. P[ S | A]. P[A| B, E]. P[B, E]
= P[D | A ]. P[S | A]. P[A| B, E]. P[B |E]. P[E]

Let's take the observed probability for the Burglary and earthquake component:
P(B= True) = 0.002, which is the probability of burglary.
P(B= False)= 0.998, which is the probability of no burglary.
P(E= True) = 0.001, which is the probability of a minor earthquake.
P(E= False) = 0.999, which is the probability that an earthquake has not occurred.
We can provide the conditional probabilities as per the below tables:
Conditional probability table for Alarm A:
The Conditional probability of Alarm A depends on Burglar and earthquake:
B       E       P(A= True)   P(A= False)
True    True    0.94         0.06
True    False   0.95         0.05
False   True    0.31         0.69
False   False   0.001        0.999
Conditional probability table for David Calls:
The conditional probability that David calls depends on the state of the Alarm.
A P(D= True) P(D= False)
True 0.91 0.09
False 0.05 0.95
Conditional probability table for Sophia Calls:
The conditional probability that Sophia calls depends on its parent node, Alarm.
A P(S= True) P(S= False)
True 0.75 0.25
False 0.02 0.98
From the formula of joint distribution, we can write the problem statement in the
form of probability distribution:
P(S, D, A, ¬B, ¬E) = P (S|A) *P (D|A)*P (A|¬B ^ ¬E) *P (¬B) *P (¬E).
= 0.75* 0.91* 0.001* 0.998*0.999
= 0.00068045.
Hence, a Bayesian network can answer any query about the domain by using
Joint distribution.
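As an illustration, the query above can be sketched in Python using the CPT values from the tables; the dictionary layout is an assumption made for readability:

```python
# Joint probability P(S, D, A, ~B, ~E) for the burglary network,
# factored as P(S|A) * P(D|A) * P(A|~B,~E) * P(~B) * P(~E).
P_B = {True: 0.002, False: 0.998}                  # P(Burglary)
P_E = {True: 0.001, False: 0.999}                  # P(Earthquake)
P_A = {(True, True): 0.94, (True, False): 0.95,    # P(Alarm=True | B, E)
       (False, True): 0.31, (False, False): 0.001}
P_D = {True: 0.91, False: 0.05}                    # P(David calls=True | Alarm)
P_S = {True: 0.75, False: 0.02}                    # P(Sophia calls=True | Alarm)

p = P_S[True] * P_D[True] * P_A[(False, False)] * P_B[False] * P_E[False]
print(p)  # ~0.00068
```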
Inferences in Bayesian Network – Purpose
 The purpose of a probabilistic inference system is to compute the posterior probability
distribution for a set of query variables, given some observed event.
 That is, given some assignment of values to a set of evidence variables.

Inference in Bayesian Networks – Notations


 X – denotes the query variable
 E – set of evidence variables {E1, ..., Em}
 e – particular observed event
 Y – non-evidence, non-query variables Y1, ..., Yn (called the hidden
variables)
 The complete set of variables is X = {X} ∪ E ∪ Y
 A typical query asks for the posterior probability distribution P(X|e)
 In the burglary network, we might observe the event in which
JohnCalls = true and MaryCalls = true.

 We could then ask for, say, the probability that a burglary has occurred:

 P(Burglary | JohnCalls = true, MaryCalls = true) = (0.284, 0.716)


Types of Inferences:
 Inference by Enumeration.

(Inference by listing or recording all variables)


 Inference by Variable Elimination.

(Inference by variable removal)


Inference by Enumeration:
 Any Conditional Probability can be Computed by summing terms from the
full joint distribution.
 More specifically, a query P(X|e) can be answered using the equation:

P(X|e) = α P(X, e) = α Σy P(X, e, y)

Where:
α – normalization constant
X – query variable
e – observed event (assignment of values to the evidence variables E)
y – ranges over all possible values of the hidden variables Y
Inference by Enumeration…
 Consider P(Burglary | JohnCalls = true, MaryCalls = true)
 Burglary – Query Variable(X)
 JohnCalls – Evidence Variable1(E1)
 MaryCalls – Evidence Variable2(E2)
 The hidden variables of this query are Earthquake and Alarm
Inference by Enumeration:
 Using initial letters for the variables to shorten the expressions, we have:

P(B | j, m) = α P(B, j, m) = α Σe Σa P(B, j, m, e, a)

 The semantics of the Bayesian network give us an expression in terms of CPT
entries; for simplicity we write it just for Burglary = true:

P(b | j, m) = α Σe Σa P(b) P(e) P(a | b, e) P(j | a) P(m | a)
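A rough Python sketch of inference by enumeration is given below. It answers the analogous query P(Burglary | DavidCalls = true, SophiaCalls = true) using the David/Sophia CPTs from earlier in this module (so its output will differ from the (0.284, 0.716) quoted for the textbook John/Mary network); the function name is illustrative:

```python
# Inference by enumeration:
# P(B | d, s) = alpha * sum_e sum_a P(B) P(e) P(a|B,e) P(d|a) P(s|a)
P_B = {True: 0.002, False: 0.998}
P_E = {True: 0.001, False: 0.999}
P_A = {(True, True): 0.94, (True, False): 0.95,
       (False, True): 0.31, (False, False): 0.001}   # P(A=True | B, E)
P_D = {True: 0.91, False: 0.05}                      # P(D=True | A)
P_S = {True: 0.75, False: 0.02}                      # P(S=True | A)

def enumerate_query(d=True, s=True):
    """Return P(Burglary | D=d, S=s) by summing out the hidden variables E and A."""
    unnormalized = {}
    for b in (True, False):
        total = 0.0
        for e in (True, False):
            for a in (True, False):
                p_a = P_A[(b, e)] if a else 1 - P_A[(b, e)]
                p_d = P_D[a] if d else 1 - P_D[a]
                p_s = P_S[a] if s else 1 - P_S[a]
                total += P_B[b] * P_E[e] * p_a * p_d * p_s
        unnormalized[b] = total
    alpha = 1 / sum(unnormalized.values())
    return {b: alpha * v for b, v in unnormalized.items()}

print(enumerate_query())  # posterior distribution over Burglary
```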
Inference by Variable Elimination:
 The enumeration algorithm can be improved substantially by eliminating
repeated calculations.
 The idea is simple: do each calculation once and save the result for later use.
This is a form of dynamic programming.
Inference by Variable Elimination:
 Variable elimination works by evaluating expressions such as the previous
equation (derived in inference by enumeration).

 From this expression, the repeated calculations are separated out and reused.

Inference by Variable Elimination:


 Intermediate results are stored, and the summations over each variable are done
only for those portions of the expression that depend on the variable.
 Let us illustrate this process for the burglary network.
 We evaluate the expression:

P(B | j, m) = α f1(B) × Σe f2(E) × Σa f3(A, B, E) × f4(A) × f5(A)

 We have annotated each part of the expression with the name of the
associated variable; these parts are called factors.
Inference by Variable Elimination:
 For example, the factors f4(A) and f5(A), corresponding to P(j|a) and P(m|a),
depend just on A because J and M are fixed by the query.
 They are therefore two-element vectors.

Inference by Variable Elimination – Example


 Given two factors f1(A,B) and f2(B,C) with the probability distributions shown
below, the pointwise product f1 × f2 = f3(A,B,C) has 2^(1+1+1) = 8 entries:
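A minimal Python sketch of the pointwise product; the probability values in f1 and f2 are illustrative assumptions, since the original table is not reproduced here:

```python
from itertools import product

# Factors map an assignment of their variables (a tuple of booleans) to a number.
f1 = {(True, True): 0.3, (True, False): 0.7,
      (False, True): 0.9, (False, False): 0.1}   # f1(A, B), illustrative values
f2 = {(True, True): 0.2, (True, False): 0.8,
      (False, True): 0.6, (False, False): 0.4}   # f2(B, C), illustrative values

# Pointwise product: f3(a, b, c) = f1(a, b) * f2(b, c); 2^3 = 8 entries in total.
f3 = {(a, b, c): f1[(a, b)] * f2[(b, c)]
      for a, b, c in product((True, False), repeat=3)}

for assignment, value in f3.items():
    print(assignment, value)
```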

Temporal Models:
 Agents in uncertain environments must be able to keep track of the current
state of the environment, just as logical agents must.
 This is made difficult by partial and noisy data and by the fact that the
environment is uncertain and changes over time.
 At best, the agent will be able to obtain only a probabilistic assessment of
the current situation.
Temporal Models:
 Two sections in Temporal Model,
 Time and Uncertainty.
 States and observations
 Stationary processes and the Markov assumption
 Inference in Temporal Model

Temporal Models – Time and Uncertainty:


 A changing world is modelled using a random variable for each aspect of
the environment state, at each point in time.
 The relations among these variables describe how the state evolves.
Example – Treating a Diabetic Patient:
 We have evidence, such as recent insulin doses, food intake, blood sugar
measurements, and other physical signs.
 The task is to assess the current state of the patient, including the actual
blood sugar level and insulin level.
 Given this information, the doctor (or patient) makes a decision about the
patient’s food intake and insulin dose.
Example – Treating a Diabetic Patient:
 The dynamic aspects of the problem are essential.
 Blood sugar levels and measurements thereof can change rapidly over
time, depending on one’s recent food intake and insulin doses, one’s
metabolic activity, the time of day, and so on.
 To assess the current state from the history of evidence and to predict the
outcomes of treatment actions, we must model these changes.
States and Observations:
 The process of change can be viewed as a series of snapshots, each of which
describes the state of the world at a particular time.
 Each snapshot, or time slice, contains a set of random variables, some of
which are observable and some of which are not.
State and Observation:
 We will assume that the same subset of variables is observable in each
slice.
 Xt – the set of unobservable state variables at time t
 Et – the set of observable evidence variables at time t
 The observation at time t is Et = et for some set of values et
State and Observation:
 Example : Umbrella and Rain
 Suppose you are a security guard at some secret underground installation.
 You want to know whether it is raining today.
 But your only access to the outside world occurs each morning, when you
see the director coming in with, or without, an umbrella.

Example: Umbrella and Rain:


 For each day t, the set Et (observable evidence variables) thus contains a
single evidence variable Ut (whether the umbrella appears), and the set
Xt (unobservable state variables) contains a single state variable Rt (whether
or not it is raining).
 Hence we can write Et = {Ut} and Xt = {Rt}.
 Observing Ut provides evidence about Rt but does not determine it: the
director may carry an umbrella on a dry day or forget it on a rainy one.
State and Observation:
 The interval between time slices also depends on the problem.
 For diabetes monitoring, a suitable interval might be an hour rather than a
day.
 We generally assume a fixed, finite interval; this means that times can be
labelled by integers.
 We will assume that evidence starts arriving at t=1 rather than t=0.
 Hence, our umbrella world is represented by the state variables
R0, R1, R2, ... and the evidence variables U1, U2, ...
 We will use the notation a:b to denote the sequence of integers from a to b,
and the notation Xa:b to denote the corresponding set of variables from Xa
to Xb.
 For example, U1:3 corresponds to the variables U1, U2, U3.
Stationary Processes and the Markov Assumption:
 With the set of state and evidence variables for a given problem, we need
to specify the dependencies among the variables.
 Order the variables in their natural temporal order.
 Since cause usually precedes effect so we need to add the variables in
causal order.
Stationary Processes and the Markov Assumption:
 The set of variables is unbounded, because it includes the state and
evidence variables for every time slice.
 This actually creates two problems:
 First, we might have to specify an unbounded number of conditional
probability tables (CPT), one for each variable in each slice.
 Second, each one might involve an unbounded number of parents.
Stationary Processes and the Markov Assumption:
 Solution for the problems
 The first problem is solved by assuming that changes in the world state are
caused by a stationary process – that is, a process of change that is
governed by laws that do not themselves change over time.
 In the umbrella world, the conditional probability that the umbrella appears,
P(Ut| Parents(Ut )), is the same for all t.
Stationary processes and the Markov Assumption:
 The second problem, handling the infinite number of parents, is solved by
making a Markov assumption – that is, assuming that the current state depends on
only a finite history of previous states.
 The simplest is the first-order Markov process,
in which the current state depends only on the previous state and not on
any earlier states.
 Using our notation, the corresponding conditional independence assertion
states that, for all t,
P(Xt| X0:t-1) = P(Xt| Xt-1)

Stationary Processes and the Markov Assumption:


 The transition model for a second-order Markov process is the conditional
distribution P(Xt | Xt-1, Xt-2).
 The current state depends on only the two previous states.
Inference in Temporal Model:
Basic Inference Tasks:
 Filtering
 Prediction
 Smoothing
 Most likely explanation
1. Filtering:
 Maintain a current state estimate and update it.
 There is no need to go back over the entire history of percepts for each update.
 In the umbrella example, this would mean computing the probability of rain
today, given all the observations of the umbrella carrier made so far:
P(Xt+1 | e1:t+1) = f(et+1, P(Xt | e1:t))
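A minimal Python sketch of this filtering update for the umbrella world. The transition and sensor probabilities (0.7, 0.3, 0.9, 0.2) and the 0.5 prior are commonly used illustrative values for this example, not values given in the text:

```python
# Forward (filtering) update:
# P(X_{t+1} | e_{1:t+1}) = alpha * P(e_{t+1} | X_{t+1}) * sum_x P(X_{t+1} | x) * P(x | e_{1:t})
P_RAIN_GIVEN_RAIN = 0.7       # transition model P(R_t = true | R_{t-1} = true)
P_RAIN_GIVEN_NO_RAIN = 0.3    # transition model P(R_t = true | R_{t-1} = false)
P_UMBRELLA_GIVEN_RAIN = 0.9   # sensor model P(U_t = true | R_t = true)
P_UMBRELLA_GIVEN_NO_RAIN = 0.2

def filter_step(belief_rain, umbrella_seen):
    """belief_rain = P(R_t = true | u_{1:t}); returns P(R_{t+1} = true | u_{1:t+1})."""
    # Predict: push the current belief through the transition model.
    predicted_rain = (P_RAIN_GIVEN_RAIN * belief_rain
                      + P_RAIN_GIVEN_NO_RAIN * (1 - belief_rain))
    # Update: weight by the evidence likelihood, then normalize.
    if umbrella_seen:
        rain = P_UMBRELLA_GIVEN_RAIN * predicted_rain
        no_rain = P_UMBRELLA_GIVEN_NO_RAIN * (1 - predicted_rain)
    else:
        rain = (1 - P_UMBRELLA_GIVEN_RAIN) * predicted_rain
        no_rain = (1 - P_UMBRELLA_GIVEN_NO_RAIN) * (1 - predicted_rain)
    return rain / (rain + no_rain)

belief = 0.5                    # prior P(R_0 = true)
for u in [True, True]:          # umbrella observed on days 1 and 2
    belief = filter_step(belief, u)
    print(belief)               # rises to ~0.82, then ~0.88
```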

2. Prediction:
 To compute a future belief state, given current evidence (it’s like filtering
without all evidence).
 In the umbrella example, this might mean computing the probability of rain
three days from now, given all the observations of the umbrella-carrier
made so far. Prediction is useful for evaluating possible courses of action.
P(Xt+k| e1:t) for k>0
3.Smoothing:
 Smoothing is the process of computing the distribution over past states
given evidence up to the present that is,
 In the umbrella example, it might mean computing the probability that it
rained last Wednesday, given all the observations of the umbrella carrier
made up to today.
P(Xk| e1:t) for 0 ≤ k < t.

4. Most Likely Explanation:


 To compute the state sequence that is most likely, given the evidence.
 Suppose that [true, true, false, true, true] is the umbrella sequence for the
security guard’s first five days on the job.
 What is the weather sequence most likely to explain this ?
 Does the absence of the umbrella on day 3 mean that it wasn’t raining, or did
the director forget to bring it?

Hidden Markov Models:


 Markov Model
 Hidden Markov Model (HMM)
 Three central issues of HMM
 Model Evaluation
 Most Probable path decoding
 Model training
 Applications Areas of HMM
 Hidden Markov Model (HMM):
A Hidden Markov Model (HMM) is a statistical model in which the system
being modelled is assumed to be a Markov process with unobserved (hidden) states.
