Unit 8: AI
Bayes' theorem:
Bayes' theorem is also known as Bayes' rule, Bayes' law, or Bayesian reasoning. It determines the probability of an event given uncertain knowledge.
Bayes' theorem was named after the British mathematician Thomas Bayes.
The Bayesian inference is an application of Bayes' theorem, which is fundamental to
Bayesian statistics.
Example: if the probability of cancer depends on a person's age, then by using Bayes' theorem we can determine the probability of cancer more accurately given the person's age.
Bayes' theorem can be derived using the product rule and the conditional probability of event A with known event B. The product rule gives two expressions for the joint probability:
P(A ⋀ B) = P(A|B) * P(B)
P(A ⋀ B) = P(B|A) * P(A)
Equating the right-hand sides and dividing by P(B) yields:
P(A|B) = P(B|A) * P(A) / P(B)    ...(a)
The above equation (a) is called Bayes' rule or Bayes' theorem. This equation is the basis of most modern AI systems for probabilistic inference.
It shows the simple relationship between joint and conditional probabilities. Here,
P(A|B) is known as the posterior, which we need to calculate. It is read as the probability of hypothesis A given that evidence B has occurred.
P(B|A) is called the likelihood: assuming the hypothesis is true, it is the probability of the evidence.
P(A) is called the prior probability, the probability of the hypothesis before considering the evidence.
P(B) is called the marginal probability, the probability of the evidence.
In equation (a), we can in general expand the denominator using the law of total probability, P(B) = Σi P(Ai) * P(B|Ai); hence Bayes' rule can be written as:
P(Ai|B) = P(Ai) * P(B|Ai) / Σk P(Ak) * P(B|Ak)
where A1, A2, A3, ..., An is a set of mutually exclusive and exhaustive events.
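Since this is the workhorse formula of probabilistic inference, here is a minimal Python sketch of Bayes' rule over a set of mutually exclusive, exhaustive hypotheses (the function name and the example numbers are our own, for illustration only):

def posterior(priors, likelihoods):
    # priors:      P(A1), ..., P(An), mutually exclusive and exhaustive
    # likelihoods: P(B|A1), ..., P(B|An), in the same order
    # Law of total probability gives the denominator P(B):
    p_b = sum(p * l for p, l in zip(priors, likelihoods))
    # Bayes' rule applied to each hypothesis:
    return [p * l / p_b for p, l in zip(priors, likelihoods)]

# Hypothetical two-hypothesis example: disease vs. no disease
print(posterior(priors=[0.01, 0.99], likelihoods=[0.9, 0.1]))
# -> [0.0833..., 0.9166...]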
Example-1:
Question: What is the probability that a patient has the disease meningitis given a stiff neck?
Given Data:
A doctor is aware that the disease meningitis causes a patient to have a stiff neck 80% of the time. He is also aware of some more facts, which are given as follows:
o The prior probability that any patient has meningitis is 1/30000.
o The prior probability that any patient has a stiff neck is 2%.
Let a be the proposition that the patient has a stiff neck and b be the proposition that the patient has meningitis, so we can calculate the following:
P(a|b) = 0.8
P(b) = 1/30000
P(a) = 0.02
Applying Bayes' theorem:
P(b|a) = P(a|b) * P(b) / P(a) = (0.8 * 1/30000) / 0.02 = 1/750 ≈ 0.0013
Hence, we can assume that 1 patient out of 750 patients with a stiff neck has the meningitis disease.
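As a quick check, the same meningitis computation in Python (a trivial sketch; the variable names are our own):

p_stiff_given_men = 0.8        # P(a|b), from the given data
p_men = 1 / 30000              # P(b), prior probability of meningitis
p_stiff = 0.02                 # P(a), prior probability of a stiff neck

# Bayes' rule: P(b|a) = P(a|b) * P(b) / P(a)
p_men_given_stiff = p_stiff_given_men * p_men / p_stiff
print(p_men_given_stiff)       # 0.00133... i.e. 1/750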
Example-2:
Question: From a standard deck of playing cards, a single card is drawn. The probability that the card is a king is 4/52. Calculate the posterior probability P(King|Face), i.e. the probability that a drawn face card is a king.
Solution:
Every king is a face card, so P(Face|King) = 1. There are 12 face cards in a deck, so P(Face) = 12/52, and P(King) = 4/52. Applying Bayes' theorem:
P(King|Face) = P(Face|King) * P(King) / P(Face) = 1 * (4/52) / (12/52) = 1/3
Application of Bayes' theorem in Artificial Intelligence:
o It is used to calculate the next step of the robot when the already executed
step is given.
o Bayes' theorem is helpful in weather forecasting.
o It can solve the Monty Hall problem.
So to represent uncertain knowledge, where we are not sure about the predicates,
we need uncertain reasoning or probabilistic reasoning.
Causes of uncertainty:
The following are some leading causes of uncertainty in the real world: information from unreliable sources, experimental errors, equipment faults, and environmental variations such as temperature or climate change.
Probabilistic reasoning:
Probabilistic reasoning is a way of knowledge representation where we apply the
concept of probability to indicate the uncertainty in knowledge. In probabilistic
reasoning, we combine probability theory with logic to handle the uncertainty.
In probabilistic reasoning, there are two ways to solve problems with uncertain
knowledge:
o Bayes' rule
o Bayesian Statistics
As probabilistic reasoning uses probability and related terms, let's first understand some common terms:
o Probability: Probability can be defined as the chance that an uncertain event will occur. It is a numerical measure of the likelihood that an event will occur. The value of a probability always lies between 0 and 1, the two endpoints representing ideal certainty (0 for an impossible event and 1 for a certain event).
We can find the probability of an uncertain event by using the below formula:
P(A) = (Number of outcomes favourable to A) / (Total number of possible outcomes)
Sample space: The collection of all possible events is called the sample space.
Random variables: Random variables are used to represent events and objects in the real world.
Prior probability: The prior probability of an event is the probability computed before observing new information.
Posterior probability: The probability that is calculated after all evidence or information has been taken into account. It is a combination of the prior probability and the new information.
Conditional probability:
Conditional probability is the probability of an event occurring given that another event has already happened.
Let's suppose we want to calculate the probability of event A when event B has already occurred, "the probability of A under the conditions of B". It can be written as:
P(A|B) = P(A ⋀ B) / P(B)
where P(A ⋀ B) is the joint probability of A and B, and P(B) is the marginal probability of B.
If the probability of A is given and we need to find the probability of B, then it will be given as:
P(B|A) = P(A ⋀ B) / P(A)
This can be explained using a Venn diagram: once B has occurred, the sample space is reduced to the set B, so we can calculate the probability of event A given B by dividing the probability P(A ⋀ B) by P(B).
Example:
In a class, 70% of the students like English and 40% of the students like both English and mathematics. What is the percentage of students who like English that also like mathematics?
Solution:
P(Mathematics | English) = P(English ⋀ Mathematics) / P(English) = 0.40 / 0.70 ≈ 0.57
Hence, 57% of the students who like English also like mathematics.
Certainty factors:
The basic idea is to add certainty factors to rules, and use these to calculate the measure of belief in some hypothesis. So, we might have a rule such as:
IF has-spots(X)
AND has-fever(X)
THEN has-measles(X) CF 0.5
Certainty factors are related to conditional probabilities, but are not the same. For
one thing, we allow certainty factors of less than zero to represent cases where
some evidence tends to deny some hypothesis. Rich and Knight discuss how
certainty factors consist of two components: a measure of belief and a measure of
disbelief. However, here we'll assume that we only have positive evidence and
equate certainty factors with measures of belief.
Suppose we have already concluded has-spots(fred) with certainty 0.3, and has-fever(fred) with certainty 0.8. To work out the certainty of has-measles(fred) we need to take into account both the certainties of the evidence and the certainty factor attached to the rule. The certainty associated with the conjoined premise (has-spots(fred) AND has-fever(fred)) is taken to be the minimum of the certainties attached to each (i.e., min(0.3, 0.8) = 0.3). The certainty of the conclusion is the total certainty of the premises multiplied by the certainty factor of the rule (i.e., 0.3 x 0.5 = 0.15).
If we have another rule drawing the same conclusion (e.g., has-measles(X)) then we
will need to update our certainties to reflect this additional evidence. To do this we
calculate the certainties using the individual rules (say CF1 and CF2), then
combine them to get a total certainty of (CF1 + CF2 - CF1*CF2). The result will
be a certainty greater than each individual certainty, but still less than 1.
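The arithmetic above is easy to express in code. Below is a minimal Python sketch of this certainty-factor calculus, assuming only positive certainty factors as the text does; the function names and the second rule's CF of 0.4 are our own illustrations:

def rule_cf(premise_cfs, rule_cf_value):
    # Conjoined premises take the minimum of their certainties,
    # then the rule's certainty factor scales the result.
    return min(premise_cfs) * rule_cf_value

def combine_cfs(cf1, cf2):
    # Combining evidence from two rules for the same conclusion:
    # the result exceeds each individual CF but stays below 1.
    return cf1 + cf2 - cf1 * cf2

# The measles example from the text:
cf_measles = rule_cf([0.3, 0.8], 0.5)   # min(0.3, 0.8) * 0.5 = 0.15
print(cf_measles)

# If a second (hypothetical) rule gave has-measles(fred) a CF of 0.4:
print(combine_cfs(cf_measles, 0.4))     # 0.15 + 0.4 - 0.15*0.4 = 0.49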
There are many other ways of dealing with uncertainty, such as Dempster-Shafer theory and fuzzy logic. It's a big topic, and we have only touched the surface.
The certainty-factor model was one of the most popular models for the representation and manipulation of uncertain knowledge in the early (1980s) rule-based expert systems.
The model was criticized by researchers in artificial intelligence and statistics as being ad hoc in nature, and researchers and developers have largely stopped using it.
Its place has been taken by the more expressive formalism of Bayesian belief networks for the representation and manipulation of uncertain knowledge.
Bayesian networks are probabilistic, because these networks are built from
a probability distribution, and also use probability theory for prediction and
anomaly detection.
Real world applications are probabilistic in nature, and to represent the relationship
between multiple events, we need a Bayesian network. It can also be used in various
tasks including prediction, anomaly detection, diagnostics, automated insight,
reasoning, time series prediction, and decision making under uncertainty.
A Bayesian network can be used for building models from data and expert opinions, and it consists of two parts:
o A directed acyclic graph of nodes and arcs
o A table of conditional probabilities for each node
The generalized form of a Bayesian network that represents and solves decision problems under uncertain knowledge is known as an influence diagram.
A Bayesian network graph is made up of nodes and arcs (directed links), where:
o Each node corresponds to the random variables, and a variable can
be continuous or discrete.
o Arc or directed arrows represent the causal relationship or conditional probabilities
between random variables. These directed links or arrows connect the pair of nodes
in the graph.
These links represent that one node directly influences the other node, and if there is no directed link between two nodes, they are independent of each other.
o In the above diagram, A, B, C, and D are random variables represented
by the nodes of the network graph.
o If we are considering node B, which is connected with node A by a
directed arrow, then node A is called the parent of Node B.
o Node C is independent of node A.
Note: The Bayesian network graph does not contain any cycles. Hence, it is known as a directed acyclic graph or DAG.
A Bayesian network has mainly two components:
o Causal component
o Actual numbers
Each node in the Bayesian network has a conditional probability distribution P(Xi | Parents(Xi)), which quantifies the effect of the parents on that node.
The joint probability distribution P[x1, x2, x3, ..., xn] can be written as the product of these conditional distributions:
P(x1, x2, ..., xn) = P(x1 | parents(X1)) * P(x2 | parents(X2)) * ... * P(xn | parents(Xn))
In general, for each variable Xi we can write:
P(Xi | Xi-1, ..., X1) = P(Xi | Parents(Xi))
Example: Harry installed a new burglar alarm at his home to detect burglary. The alarm reliably responds to a burglary but also responds to minor earthquakes. Harry has two neighbors, David and Sophia, who have taken responsibility for informing Harry at work when they hear the alarm. David always calls Harry when he hears the alarm, but sometimes he gets confused with the phone ringing and calls then as well. On the other hand, Sophia likes to listen to loud music, so sometimes she fails to hear the alarm. Here we would like to compute probabilities in this burglary-alarm domain.
Problem:
Calculate the probability that the alarm has sounded, but neither a burglary nor an earthquake has occurred, and both David and Sophia have called Harry.
Solution:
o The Bayesian network for the above problem is given below. The network structure shows that Burglary and Earthquake are the parent nodes of Alarm and directly affect the probability of the alarm going off, whereas David's and Sophia's calls depend only on Alarm.
o The network thus represents our assumptions that David and Sophia do not directly perceive the burglary, do not notice the minor earthquake, and do not confer before calling.
o The conditional distributions for each node are given as a conditional probabilities table, or CPT.
o Each row in a CPT must sum to 1 because the entries in a row represent an exhaustive set of cases for the variable.
o In a CPT, a boolean variable with k boolean parents has 2^k probabilities. Hence, if there are two parents, then the CPT will contain 4 probability values.
The list of events occurring in this network is:
o Burglary (B)
o Earthquake(E)
o Alarm(A)
o David Calls(D)
o Sophia calls(S)
We can write the events of the problem statement in the form of probability as P[D, S, A, ¬B, ¬E]. We can rewrite this probability statement using the joint probability distribution:
P[D, S, A, ¬B, ¬E] = P[D|A] * P[S|A] * P[A|¬B, ¬E] * P[¬B] * P[¬E]
P(B = False) = 0.998, which is the probability that no burglary occurred, and P(E = False) = 0.999, which is the probability that no earthquake occurred.
The conditional probability that David will call depends on the probability of the alarm: P(D = True | A = True) = 0.91.
The conditional probability that Sophia will call depends on her parent node, Alarm: P(S = True | A = True) = 0.75.
From the formula of the joint distribution, we can write the problem statement in the form of a probability distribution (using P(A = True | ¬B, ¬E) = 0.001 from the alarm's CPT):
P(S, D, A, ¬B, ¬E) = P(S|A) * P(D|A) * P(A|¬B ⋀ ¬E) * P(¬B) * P(¬E)
= 0.75 * 0.91 * 0.001 * 0.998 * 0.999
= 0.00068045
Hence, a Bayesian network can answer any query about the domain by using the joint distribution.
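A compact way to see this factorization at work is to code the network directly. The following Python sketch hard-codes the alarm network's CPT entries; the 0.91, 0.75, 0.001, 0.998, and 0.999 values match the calculation above, while the remaining entries are illustrative assumptions, as are the structure and function names:

# Joint distribution of the alarm network:
# P(B, E, A, D, S) = P(B) * P(E) * P(A|B,E) * P(D|A) * P(S|A)

P_B = 0.002                      # P(Burglary = True), so P(¬B) = 0.998
P_E = 0.001                      # P(Earthquake = True), so P(¬E) = 0.999
P_A = {                          # P(Alarm = True | B, E); only the
    (True, True): 0.94,          # (False, False) entry is used in the
    (True, False): 0.95,         # worked example above, the other three
    (False, True): 0.31,         # rows are illustrative assumptions
    (False, False): 0.001,
}
P_D = {True: 0.91, False: 0.05}  # P(D = True | A); False row assumed
P_S = {True: 0.75, False: 0.02}  # P(S = True | A); False row assumed

def prob(p_true, value):
    # Probability that a boolean variable takes the given value
    return p_true if value else 1.0 - p_true

def joint(b, e, a, d, s):
    # Chain-rule factorization over the network's DAG
    return (prob(P_B, b) * prob(P_E, e) * prob(P_A[(b, e)], a) *
            prob(P_D[a], d) * prob(P_S[a], s))

# Query: alarm sounded, no burglary, no earthquake, both neighbors called
print(joint(b=False, e=False, a=True, d=True, s=True))  # ~0.00068045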
There are two ways to understand the semantics of a Bayesian network, which are given below:
1. To understand the network as a representation of the joint probability distribution (helpful for understanding how to construct the network).
2. To understand the network as an encoding of a collection of conditional independence statements (helpful for designing inference procedures).
Dempster-Shafer Theory:
Introduction
In recent times, the scientific and engineering community has come to realize the significance
of incorporating multiple forms of uncertainty. This expanded perspective on uncertainty has
been made feasible by notable advancements in computational power within the field of
artificial intelligence. As computational systems become more adept at handling intricate
analyses, the limitations of relying solely on traditional probability theory to encompass the
entirety of uncertainty have become apparent.
Traditional probability theory falls short in its ability to effectively address consonant,
consistent, or arbitrary evidence without the need for additional assumptions about
probability distributions within a given set. Moreover, it fails to express the extent of conflict
that may arise between different sets of evidence. To overcome these limitations, Dempster-
Shafer theory has emerged as a viable framework, blending the concept of probability with
the conventional understanding of sets. Dempster-Shafer theory provides the means to handle
diverse types of evidence, and it incorporates various methods to account for conflicts when
combining multiple sources of information in the context of artificial intelligence.
Where sufficient evidence is present to assign probabilities to single events, the Dempster-
Shafer model can collapse to the traditional probabilistic formulation. Additionally, one of
the most significant features of DST is its ability to handle different levels of precision
regarding information without requiring further assumptions. This characteristic enables the
direct representation of uncertainty in system responses, where an imprecise input can be
characterized by a set or interval, and the resulting output is also a set or interval.
The incorporation of Dempster-Shafer theory in artificial intelligence allows for a more comprehensive treatment of uncertainty. By leveraging the unique features of this theory, AI systems can better navigate uncertain scenarios, leveraging the potential of multiple evidentiary types and effectively managing conflicts. The utilization of Dempster-Shafer theory in artificial intelligence empowers decision-making processes in the face of uncertainty and enhances the robustness of AI systems. Therefore, Dempster-Shafer theory is
a powerful tool for building AI systems that can handle complex uncertain scenarios.
Example
Consider a scenario in artificial intelligence (AI) where an AI system is tasked with solving a
murder mystery using Dempster–Shafer Theory. The setting is a room with four individuals:
A, B, C, and D. Suddenly, the lights go out, and upon their return, B is discovered dead,
having been stabbed in the back with a knife. No one entered or exited the room, and it is
known that B did not commit suicide. The objective is to identify the murderer.
By constructing the power set, which contains all possible subsets, we can analyze the evidence. For instance, if P = {a, b, c}, the power set would be {∅, {a}, {b}, {c}, {a, b}, {b, c}, {a, c}, {a, b, c}}, comprising 2^3 = 8 elements.
Belief in K
The belief in K, denoted Bel(K), is calculated by summing the masses of the subsets that belong to K. For example, if K = {a, d, c}, Bel(K) would be calculated as:
Bel(K) = m(a) + m(d) + m(c) + m(a, d) + m(a, c) + m(d, c) + m(a, d, c)
Plausibility in K
Plausibility in K, denoted Pl(K), is determined by summing the masses of all sets that intersect with K. It represents the cumulative evidence supporting the possibility of K being true:
Pl(K) = m(a) + m(d) + m(c) + m(a, d) + m(d, c) + m(a, c) + m(a, d, c)
By leveraging Dempster–Shafer Theory in AI, we can analyze the evidence, assign masses to
subsets of possible conclusions, and calculate beliefs and plausibilities to infer the most likely
murderer in this murder mystery scenario.
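Belief and plausibility are straightforward to compute once a mass function over subsets of suspects is given. Here is a minimal Python sketch; the mass values are invented purely for illustration:

# Dempster-Shafer belief and plausibility over the frame {a, c, d}
# (the suspects left after ruling out the victim b). The masses are
# hypothetical and sum to 1; mass on the whole frame models ignorance.
masses = {
    frozenset({'a'}): 0.3,
    frozenset({'d'}): 0.2,
    frozenset({'c'}): 0.1,
    frozenset({'a', 'd'}): 0.1,
    frozenset({'a', 'c', 'd'}): 0.3,
}

def belief(k, masses):
    # Bel(K): total mass of subsets fully contained in K
    return sum(m for s, m in masses.items() if s <= k)

def plausibility(k, masses):
    # Pl(K): total mass of subsets that intersect K
    return sum(m for s, m in masses.items() if s & k)

k = frozenset({'a', 'd'})
print(belief(k, masses))        # 0.3 + 0.2 + 0.1 = 0.6
print(plausibility(k, masses))  # everything intersecting {a, d} = 0.9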
Conclusion
This article taught us:
The distinctive characteristics of Dempster-Shafer Theory, such as the aggregation of its basic probability masses to 1, its gradual reduction of ignorance through accumulating evidence, and its combination rule for merging evidence from multiple sources, contribute to its effectiveness in addressing uncertainty in AI.
Dempster-Shafer Theory stands as a valuable tool in the field of artificial intelligence, contributing to the advancement of intelligent systems capable of handling complex and uncertain environments.
Artificial Intelligence - Fuzzy Logic Systems
Fuzzy Logic Systems (FLS) produce acceptable but definite output in
response to incomplete, ambiguous, distorted, or inaccurate (fuzzy) input.
The conventional logic block that a computer can understand takes precise input and produces a definite output of TRUE or FALSE, which is equivalent to a human's YES or NO.
The inventor of fuzzy logic, Lotfi Zadeh, observed that unlike computers, the
human decision making includes a range of possibilities between YES and
NO, such as −
CERTAINLY YES
POSSIBLY YES
CANNOT SAY
POSSIBLY NO
CERTAINLY NO
Fuzzy logic works on the levels of possibilities of input to achieve a definite output.
Implementation
It can be implemented in systems with various sizes and capabilities
ranging from small micro-controllers to large, networked, workstation-
based control systems.
It can be implemented in hardware, software, or a combination of
both.
Why Fuzzy Logic?
Fuzzy logic is useful for commercial and practical purposes.
LP x is Large Positive
MP x is Medium Positive
S x is Small
MN x is Medium Negative
LN x is Large Negative
All membership functions for LP, MP, S, MN, and LN are shown below −
The triangular membership function shapes are most common among various
other membership function shapes such as trapezoidal, singleton, and
Gaussian.
Here, the input to the 5-level fuzzifier varies from -10 volts to +10 volts. Hence the corresponding output also changes.
Algorithm
Define linguistic Variables and terms (start)
Construct membership functions for them. (start)
Construct knowledge base of rules (start)
Convert crisp data into fuzzy data sets using membership functions.
(fuzzification)
Evaluate rules in the rule base. (Inference Engine)
Combine results from each rule. (Inference Engine)
Convert output data into non-fuzzy values. (defuzzification)
Development
Step 1 − Define linguistic variables and terms
Linguistic variables are input and output variables in the form of simple
words or sentences. For room temperature, cold, warm, hot, etc., are
linguistic terms.
Every member of this set is a linguistic term and it can cover some portion of
overall temperature values.
For example, the linguistic variables RoomTemp and Target can each take the terms Very_Cold, Cold, Warm, Hot, and Very_Hot.
Build a set of rules into the knowledge base in the form of IF-THEN-ELSE structures.
Fuzzy set operations perform the evaluation of the rules. The operations used for OR and AND are Max and Min respectively. All results of evaluation are combined to form a final result. This result is a fuzzy value, which is finally converted into a crisp output by defuzzification; a sketch of the whole pipeline is given below.
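To make the algorithm concrete, here is a minimal Python sketch of one fuzzy inference cycle: triangular membership functions for fuzzification, a two-rule rule base, and a simple weighted-average defuzzification (Min and Max would enter when rules combine multiple antecedents). All variable names, set boundaries, and rules are invented for illustration:

def tri(x, a, b, c):
    # Triangular membership function: rises from a to the peak b, falls to c
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Linguistic terms for the input "temp" (degrees) -- invented boundaries
def temp_cold(t):
    return tri(t, 0.0, 10.0, 20.0)

def temp_warm(t):
    return tri(t, 15.0, 25.0, 35.0)

# Representative (singleton) output values for the fan speed terms
FAN_LOW, FAN_HIGH = 20.0, 80.0

def infer(t):
    # Fuzzification: degrees of membership of the crisp input
    w_low = temp_cold(t)    # rule: IF temp is cold THEN fan is low
    w_high = temp_warm(t)   # rule: IF temp is warm THEN fan is high
    # Defuzzification: weighted average of the singleton outputs
    total = w_low + w_high
    return (w_low * FAN_LOW + w_high * FAN_HIGH) / total if total else None

print(infer(16.0))  # mostly "cold", a little "warm" -> 32.0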
Application Areas of Fuzzy Logic
Automotive Systems
Automatic Gearboxes
Four-Wheel Steering
Vehicle environment control
Consumer Electronic Goods
Hi-Fi Systems
Photocopiers
Still and Video Cameras
Television
Domestic Goods
Microwave Ovens
Refrigerators
Toasters
Vacuum Cleaners
Washing Machines
Environment Control
Air Conditioners/Dryers/Heaters
Humidifiers
Advantages of FLSs
Mathematical concepts within fuzzy reasoning are very simple.
You can modify an FLS by just adding or deleting rules, due to the flexibility of fuzzy logic.
Fuzzy logic Systems can take imprecise, distorted, noisy input
information.
FLSs are easy to construct and understand.
Fuzzy logic is a solution to complex problems in all fields of life,
including medicine, as it resembles human reasoning and decision
making.
Disadvantages of FLSs
There is no systematic approach to designing a fuzzy system.
They are understandable only when simple.
They are suitable for problems that do not demand high accuracy.