Artificial Intelligence M2
Module - II
Probabilistic Reasoning
Probability theory is used to discuss events, categories, and hypotheses about which
there is not 100% certainty.
We might write A→B, which means that if A is true, then B is true. If we are unsure
whether A is true, then we cannot make use of this expression.
In many real-world situations, it is very useful to be able to talk about things that lack
certainty. For example, what will the weather be like tomorrow? We might formulate
a very simple hypothesis based on general observation, such as “it is sunny only 10%
of the time, and rainy 70% of the time”. We can use a notation similar to that used for
predicate calculus to express such statements:
P(S) = 0.1
P(R) = 0.7
The first of these statements says that the probability of S (“it is sunny”) is 0.1. The
second says that the probability of R is 0.7. Probabilities are always expressed as real
numbers between 0 and 1. A probability of 0 means “definitely not” and a probability
of 1 means “definitely so.” Hence, P(S) = 1 means that it is always sunny.
Many of the operators and notations that are used in propositional logic can also be
used in probabilistic notation. For example, P(¬S) means “the probability that it is
not sunny”; P(S ∧ R) means “the probability that it is both sunny and rainy.” P(A ∨
B), which means “the probability that either A is true or B is true,” is defined by the
following rule: P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
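As a quick check of the negation and disjunction rules, the short sketch below applies them to the weather example. The function names are illustrative only, and the joint probability P(S ∧ R) = 0.01 is the value the text assumes a little further on.

```python
def p_not(p_a):
    """P(¬A) = 1 - P(A)."""
    return 1.0 - p_a


def p_or(p_a, p_b, p_a_and_b):
    """P(A ∨ B) = P(A) + P(B) - P(A ∧ B)."""
    return p_a + p_b - p_a_and_b


p_sunny = 0.1             # P(S)
p_rainy = 0.7             # P(R)
p_sunny_and_rainy = 0.01  # P(S ∧ R), assumed value

print(p_not(p_sunny))                             # P(¬S)
print(p_or(p_sunny, p_rainy, p_sunny_and_rainy))  # P(S ∨ R)
```

Subtracting P(A ∧ B) avoids counting twice the cases in which both A and B are true; with these numbers P(S ∨ R) comes to approximately 0.79.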
The notation P(B|A) can be read as “the probability of B, given A.” This is known as
conditional probability: it is conditional on A. In other words, it states the probability
that B is true, given that we already know that A is true. P(B|A) is defined by the
following rule:
P(B|A) = P(B ∧ A) / P(A)
Of course, this rule cannot be used in cases where P(A) = 0.
For example, let us suppose that the likelihood that it is both sunny and rainy at the
same time is 0.01. Then we can calculate the probability that it is rainy, given that it is
sunny, as follows:
P(R|S) = P(R ∧ S) / P(S) = 0.01 / 0.1 = 0.1
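The same calculation can be sketched in a few lines of Python. The function name `conditional` is an illustrative choice, not part of any library; it divides the joint probability by the probability of the condition and refuses to do so when that probability is zero.

```python
def conditional(p_a_and_b, p_a):
    """Return P(B|A) = P(A ∧ B) / P(A).

    The rule is undefined when P(A) = 0, so that case raises an error.
    """
    if p_a == 0:
        raise ValueError("P(B|A) is undefined when P(A) = 0")
    return p_a_and_b / p_a


p_sunny = 0.1             # P(S)
p_sunny_and_rainy = 0.01  # P(S ∧ R)

# P(R|S): probability that it is rainy, given that it is sunny.
print(conditional(p_sunny_and_rainy, p_sunny))
```

Up to floating-point rounding, the printed value is 0.1: on one sunny day in ten it also rains.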
The basic approach statistical methods adopt to deal with uncertainty is via the
axioms of probability:
Probabilities are (real) numbers in the range 0 to 1.
A probability of P(A) = 0 indicates that A is certainly false, P(A) = 1 that A is
certainly true, and values in between indicate some degree of (un)certainty.
Probabilities can be calculated in a number of ways.
Probability = (number of desired outcomes) / (total number of outcomes)
So given a pack of playing cards the probability of being dealt an ace from a full
normal deck is 4 (the number of aces) / 52 (number of cards in deck) which is 1/13.
Similarly the probability of being dealt a spade suit is 13 / 52 = 1/4.
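The counting definition can be checked by enumerating a deck. This is a minimal sketch: the `probability` helper and the deck representation are illustrative choices, and exact fractions are used so the results match the hand calculation.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
DECK = list(product(RANKS, SUITS))  # 52 (rank, suit) pairs


def probability(event, outcomes=DECK):
    """(number of desired outcomes) / (total number of outcomes)."""
    desired = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(desired, len(outcomes))


p_ace = probability(lambda card: card[0] == "A")
p_spade = probability(lambda card: card[1] == "spades")
p_ace_and_spade = probability(lambda card: card == ("A", "spades"))
p_ace_or_spade = probability(lambda card: card[0] == "A" or card[1] == "spades")

print(p_ace, p_spade, p_ace_or_spade)  # 1/13 1/4 4/13

# The disjunction rule agrees with direct counting.
assert p_ace_or_spade == p_ace + p_spade - p_ace_and_spade
```

Counting only works this directly when every outcome is equally likely, as it is for a well-shuffled deck.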
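If you choose k items from a set of n items, then the number of possible choices is
n! / (k! (n - k)!), where ! denotes factorial, so the odds against picking one
particular choice at random are roughly that number to 1.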
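The number of ways of choosing k items from n can be computed with the standard library. The sketch below is illustrative: the five-card-hand example is added here and does not come from the text, and the helper names are hypothetical.

```python
import math
from fractions import Fraction


def number_of_choices(n, k):
    """Number of ways to choose k items from n: n! / (k! (n - k)!)."""
    return math.comb(n, k)


def chance_of_one_choice(n, k):
    """Probability of one particular choice, all choices being equally likely."""
    return Fraction(1, number_of_choices(n, k))


# Illustrative example: distinct 5-card hands from a 52-card deck.
hands = number_of_choices(52, 5)
print(hands)  # 2598960

# math.comb agrees with the factorial formula.
assert hands == math.factorial(52) // (math.factorial(5) * math.factorial(47))
print(chance_of_one_choice(52, 5))  # 1/2598960
```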
Conditional probability, P(A|B), indicates the probability of event A given that we
know event B has occurred.
A Bayesian Network is a directed acyclic graph:
A graph where the directed links indicate dependencies that exist
between nodes.
Nodes represent propositions about events or events themselves.
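A very small network of this kind can be sketched as plain Python dictionaries. Everything specific here is an assumption made for illustration: the three nodes (Measles, Spots, Fever), the link structure, and all the probabilities are invented, and inference is done by brute-force enumeration rather than by any particular library's algorithm.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from itertools import product

# node -> tuple of parent nodes (hypothetical structure)
PARENTS = {"Measles": (), "Spots": ("Measles",), "Fever": ("Measles",)}

# node -> {parent values: P(node is True | parent values)} (made-up numbers)
CPT = {
    "Measles": {(): 0.01},
    "Spots": {(True,): 0.9, (False,): 0.05},
    "Fever": {(True,): 0.8, (False,): 0.1},
}


def topological_order(parents):
    """Order nodes parents-first; raises graphlib.CycleError if not acyclic."""
    return list(TopologicalSorter(parents).static_order())


def joint(assignment):
    """P(assignment) as the product of P(node | its parents) over all nodes."""
    p = 1.0
    for node, parents in PARENTS.items():
        p_true = CPT[node][tuple(assignment[q] for q in parents)]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p


def posterior(query, evidence):
    """P(query is True | evidence), summing the joint over all assignments."""
    nodes = topological_order(PARENTS)
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(nodes)):
        assignment = dict(zip(nodes, values))
        if any(assignment[k] != v for k, v in evidence.items()):
            continue  # inconsistent with the observed evidence
        p = joint(assignment)
        denominator += p
        if assignment[query]:
            numerator += p
    return numerator / denominator


print(posterior("Measles", {"Spots": True, "Fever": True}))
```

Enumeration visits every combination of values, so it is only practical for toy networks; it is used here because it follows the definition directly.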
Bayes’ theorem can be used to calculate the probability that a certain event will occur
or that a certain proposition is true.
The theorem is stated as follows:
P(B|A) = P(A|B) P(B) / P(A)
P(B) is called the prior probability of B. P(B|A), as well as being called the
conditional probability, is also known as the posterior probability of B.
P(A ∧ B) = P(A|B)P(B)
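Bayes’ theorem, P(B|A) = P(A|B) P(B) / P(A), can be tried out on the weather numbers used earlier. This is a sketch with an illustrative function name; it takes A to be “it is rainy” and B to be “it is sunny”, and checks that the result agrees with the definition of conditional probability.

```python
def bayes(p_a_given_b, p_b, p_a):
    """Return the posterior P(B|A) = P(A|B) P(B) / P(A)."""
    if p_a == 0:
        raise ValueError("P(B|A) is undefined when P(A) = 0")
    return p_a_given_b * p_b / p_a


p_sunny = 0.1             # P(S), the prior for "sunny"
p_rainy = 0.7             # P(R)
p_sunny_and_rainy = 0.01  # P(S ∧ R)

p_rainy_given_sunny = p_sunny_and_rainy / p_sunny  # P(R|S)

# Posterior probability that it is sunny, given that it is rainy.
p_sunny_given_rainy = bayes(p_rainy_given_sunny, p_sunny, p_rainy)
print(p_sunny_given_rainy)

# Same answer as the direct definition P(S|R) = P(S ∧ R) / P(R).
assert abs(p_sunny_given_rainy - p_sunny_and_rainy / p_rainy) < 1e-12
```

The agreement is no accident: substituting P(A ∧ B) = P(A|B)P(B) into the definition of P(B|A) gives the theorem.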
This reads that, given some evidence E, the probability that hypothesis H is
true is equal to the ratio of the probability that E will be true given H times the
prior probability of H, to the probability of E. To diagnose an illness in this way, we
must know all the prior probabilities of finding each symptom and also the probability of
having an illness based on certain symptoms being observed.
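A diagnosis of this kind can be sketched as follows. All illness names and numbers are invented for illustration, and the sketch assumes the hypotheses are mutually exclusive and exhaustive, so that the probability of the evidence can be obtained by summing P(E|H) P(H) over every hypothesis.

```python
def posteriors(priors, likelihoods):
    """Return {h: P(h|E)} from priors {h: P(h)} and likelihoods {h: P(E|h)}.

    Assumes the hypotheses are mutually exclusive and exhaustive,
    so the priors must sum to 1.
    """
    if abs(sum(priors.values()) - 1.0) > 1e-9:
        raise ValueError("priors must sum to 1")
    # P(E) = sum over all hypotheses of P(E|h) P(h)
    p_evidence = sum(likelihoods[h] * priors[h] for h in priors)
    if p_evidence == 0:
        raise ValueError("the evidence has probability 0 under every hypothesis")
    return {h: likelihoods[h] * priors[h] / p_evidence for h in priors}


# Hypothetical priors and hypothetical values of P(fever | illness).
priors = {"measles": 0.01, "flu": 0.10, "healthy": 0.89}
p_fever_given = {"measles": 0.8, "flu": 0.6, "healthy": 0.02}

for illness, p in posteriors(priors, p_fever_given).items():
    print(illness, round(p, 3))
```

With these made-up numbers a fever points most strongly to flu, even though fever is more likely under measles, because measles has a much smaller prior.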
Bayesian statistics lie at the heart of most statistical reasoning systems. How is Bayes
theorem exploited?
The key is to formulate the problem correctly:
P(A|B) states the probability of A given only B's evidence. If there is other
relevant evidence then it must also be considered.
All events must be mutually exclusive. However, in real-world problems events are
generally not unrelated. For example, in diagnosing measles, the symptoms of spots and a
fever are related. This means that computing the conditional probabilities gets
more complicated. In general, given some prior evidence p and a new observation N,
computing the probability of a hypothesis means conditioning on p and N jointly.
All events must be exhaustive. This means that in order to compute all probabilities
the set of possible events must be closed. Thus if new information arises the set must
be created afresh and all probabilities recalculated.
Thus simple Bayes rule-based systems are not suitable for uncertain reasoning:
Knowledge acquisition is very hard.
Too many probabilities needed -- too large a storage space.
Computation time is too large.
Updating new information is difficult and time consuming.
Exceptions like “none of the above” cannot be represented.
Humans are not very good probability estimators.
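To give a feel for the storage objection in the list above, the sketch below counts how many independent numbers are needed to write down a full joint distribution. It assumes every variable is binary and that no independence between variables is exploited; the function name is illustrative.

```python
def full_joint_entries(n_variables):
    """Independent probabilities needed for a joint over n binary variables.

    There are 2**n combinations of values; the last probability is fixed
    because all of them must sum to 1.
    """
    return 2 ** n_variables - 1


for n in (5, 10, 20, 30):
    print(n, full_joint_entries(n))
```

The count doubles with every extra variable, which is why the enhancements listed next look for ways to avoid storing the full table.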
However, Bayesian statistics still provide the core to reasoning in many uncertain
reasoning systems with suitable enhancement to overcome the above problems. We
will look at three broad categories:
Certainty factors
Dempster-Shafer models
Bayesian networks.
Bayesian networks are also called Belief Networks or Probabilistic Inference Networks.
Clinical Example:
Knowledge can be defined as the body of facts and principles accumulated by
humankind, or the act, fact, or state of knowing.
Knowledge is having familiarity with language, concepts, procedures, rules, ideas,
abstractions, places, customs, facts, and associations, coupled with an ability to use
these notions effectively in modeling different aspects of the world
The meaning of knowledge is closely related to the meaning of intelligence
Intelligence requires the possession of and access to knowledge
A common way to represent knowledge external to a computer or a human is in the
form of written language
Ramu is tall – This expresses a simple fact, an attribute possessed by a person
Ramu loves his mother – This expresses a complex binary relation between
two persons
Knowledge may be declarative or procedural
Procedural knowledge is compiled knowledge related to the performance of
some task. For example, the steps used to solve an algebraic equation
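The distinction can be sketched in code: declarative knowledge is stored as facts that can be looked up, while procedural knowledge is stored as the steps for doing something. The tuple encoding of the two Ramu facts and the choice of a linear equation a*x + b = c are illustrative assumptions, not a standard representation.

```python
# Declarative knowledge: facts held as data (hypothetical tuple encoding).
FACTS = {
    ("tall", "Ramu"),                      # Ramu is tall
    ("loves", "Ramu", "mother_of(Ramu)"),  # Ramu loves his mother
}


def holds(*fact):
    """Return True if the given fact is recorded in FACTS."""
    return tuple(fact) in FACTS


# Procedural knowledge: the steps used to solve a*x + b = c for x.
def solve_linear(a, b, c):
    if a == 0:
        raise ValueError("not a linear equation in x when a = 0")
    right_hand_side = c - b     # step 1: subtract b from both sides
    return right_hand_side / a  # step 2: divide both sides by a


print(holds("tall", "Ramu"))   # True
print(holds("tall", "Sita"))   # False
print(solve_linear(2, 3, 11))  # 4.0
```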
Representation of knowledge