Unit 8: AI
Bayes' theorem:
Bayes' theorem is also known as Bayes' rule, Bayes' law, or Bayesian reasoning. It determines the probability of an event given uncertain knowledge.
Bayes' theorem was named after the British mathematician Thomas Bayes.
The Bayesian inference is an application of Bayes' theorem, which is fundamental to
Bayesian statistics.
Example: if the probability of cancer depends on a person's age, then by using Bayes' theorem we can determine the probability of cancer more accurately given the person's age.
Bayes' theorem can be derived using the product rule and the conditional probability of event A with known event B. The product rule gives two expressions for the joint probability:
P(A ⋀ B) = P(A|B) * P(B)
P(A ⋀ B) = P(B|A) * P(A)
Equating the right-hand sides and dividing by P(B) yields:
P(A|B) = P(B|A) * P(A) / P(B)    ...(a)
The above equation (a) is called Bayes' rule or Bayes' theorem. This equation is the basis of most modern AI systems for probabilistic inference.
It shows the simple relationship between joint and conditional probabilities. Here,
P(A|B) is known as the posterior, which we need to calculate. It is read as the probability of hypothesis A given that evidence B has occurred.
P(B|A) is called the likelihood: assuming the hypothesis is true, it is the probability of the evidence.
P(A) is called the prior probability, the probability of the hypothesis before considering the evidence.
P(B) is called the marginal probability, the probability of the evidence.
In equation (a), we can in general expand the denominator using the law of total probability, P(B) = Σi P(Ai) * P(B|Ai); hence Bayes' rule can be written as:
P(Ai|B) = P(Ai) * P(B|Ai) / Σk P(Ak) * P(B|Ak)
where A1, A2, A3, ..., An is a set of mutually exclusive and exhaustive events.
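Since this is the workhorse formula of probabilistic inference, here is a minimal Python sketch of Bayes' rule over a set of mutually exclusive, exhaustive hypotheses (the function name and the example numbers are our own, for illustration only):

def posterior(priors, likelihoods):
    # priors:      P(A1), ..., P(An), mutually exclusive and exhaustive
    # likelihoods: P(B|A1), ..., P(B|An), in the same order
    # Law of total probability gives the denominator P(B):
    p_b = sum(p * l for p, l in zip(priors, likelihoods))
    # Bayes' rule applied to each hypothesis:
    return [p * l / p_b for p, l in zip(priors, likelihoods)]

# Hypothetical two-hypothesis example: disease vs. no disease
print(posterior(priors=[0.01, 0.99], likelihoods=[0.9, 0.1]))
# -> [0.0833..., 0.9166...]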
Example-1:
Question: What is the probability that a patient has the disease meningitis given a stiff neck?
Given Data:
A doctor is aware that the disease meningitis causes a patient to have a stiff neck 80% of the time. He is also aware of some more facts, which are given as follows:
o The prior probability that any patient has meningitis is 1/30000.
o The prior probability that any patient has a stiff neck is 2%.
Let a be the proposition that the patient has a stiff neck and b be the proposition that the patient has meningitis, so we can calculate the following:
P(a|b) = 0.8
P(b) = 1/30000
P(a) = 0.02
Applying Bayes' theorem:
P(b|a) = P(a|b) * P(b) / P(a) = (0.8 * 1/30000) / 0.02 = 1/750 ≈ 0.0013
Hence, we can assume that 1 patient out of 750 patients with a stiff neck has the meningitis disease.
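As a quick check, the same meningitis computation in Python (a trivial sketch; the variable names are our own):

p_stiff_given_men = 0.8        # P(a|b), from the given data
p_men = 1 / 30000              # P(b), prior probability of meningitis
p_stiff = 0.02                 # P(a), prior probability of a stiff neck

# Bayes' rule: P(b|a) = P(a|b) * P(b) / P(a)
p_men_given_stiff = p_stiff_given_men * p_men / p_stiff
print(p_men_given_stiff)       # 0.00133... i.e. 1/750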
Example-2:
Question: From a standard deck of playing cards, a single card is drawn. The probability that the card is a king is 4/52. Calculate the posterior probability P(King|Face), i.e. the probability that a drawn face card is a king.
Solution:
Every king is a face card, so P(Face|King) = 1. There are 12 face cards in a deck, so P(Face) = 12/52, and P(King) = 4/52. Applying Bayes' theorem:
P(King|Face) = P(Face|King) * P(King) / P(Face) = 1 * (4/52) / (12/52) = 1/3
Application of Bayes' theorem in Artificial Intelligence:
o It is used to calculate the next step of the robot when the already executed
step is given.
o Bayes' theorem is helpful in weather forecasting.
o It can solve the Monty Hall problem.
So to represent uncertain knowledge, where we are not sure about the predicates,
we need uncertain reasoning or probabilistic reasoning.
Causes of uncertainty:
The following are some leading causes of uncertainty in the real world: information from unreliable sources, experimental errors, equipment faults, and environmental variations such as temperature or climate change.
Probabilistic reasoning:
Probabilistic reasoning is a way of knowledge representation where we apply the
concept of probability to indicate the uncertainty in knowledge. In probabilistic
reasoning, we combine probability theory with logic to handle the uncertainty.
In probabilistic reasoning, there are two ways to solve problems with uncertain
knowledge:
o Bayes' rule
o Bayesian Statistics
As probabilistic reasoning uses probability and related terms, let's first understand some common terms:
o Probability: Probability can be defined as the chance that an uncertain event will occur. It is a numerical measure of the likelihood that an event will occur. The value of a probability always lies between 0 and 1, the two endpoints representing ideal certainty (0 for an impossible event and 1 for a certain event).
We can find the probability of an uncertain event by using the below formula:
P(A) = (Number of outcomes favourable to A) / (Total number of possible outcomes)
Sample space: The collection of all possible events is called the sample space.
Random variables: Random variables are used to represent events and objects in the real world.
Prior probability: The prior probability of an event is the probability computed before observing new information.
Posterior probability: The probability that is calculated after all evidence or information has been taken into account. It is a combination of the prior probability and the new information.
Conditional probability:
Conditional probability is the probability of an event occurring given that another event has already happened.
Let's suppose we want to calculate the probability of event A when event B has already occurred, "the probability of A under the conditions of B". It can be written as:
P(A|B) = P(A ⋀ B) / P(B)
where P(A ⋀ B) is the joint probability of A and B, and P(B) is the marginal probability of B.
If the probability of A is given and we need to find the probability of B, then it will be given as:
P(B|A) = P(A ⋀ B) / P(A)
This can be explained using a Venn diagram: once B has occurred, the sample space is reduced to the set B, so we can calculate the probability of event A given B by dividing the probability P(A ⋀ B) by P(B).
Example:
In a class, 70% of the students like English and 40% of the students like both English and mathematics. What is the percentage of students who like English that also like mathematics?
Solution:
P(Mathematics | English) = P(English ⋀ Mathematics) / P(English) = 0.40 / 0.70 ≈ 0.57
Hence, 57% of the students who like English also like mathematics.
Certainty factors:
The basic idea is to add certainty factors to rules, and use these to calculate the measure of belief in some hypothesis. So, we might have a rule such as:
IF has-spots(X)
AND has-fever(X)
THEN has-measles(X) CF 0.5
Certainty factors are related to conditional probabilities, but are not the same. For
one thing, we allow certainty factors of less than zero to represent cases where
some evidence tends to deny some hypothesis. Rich and Knight discuss how
certainty factors consist of two components: a measure of belief and a measure of
disbelief. However, here we'll assume that we only have positive evidence and
equate certainty factors with measures of belief.
Suppose we have already concluded has-spots(fred) with certainty 0.3, and has-fever(fred) with certainty 0.8. To work out the certainty of has-measles(fred) we need to take into account both the certainties of the evidence and the certainty factor attached to the rule. The certainty associated with the conjoined premise (has-spots(fred) AND has-fever(fred)) is taken to be the minimum of the certainties attached to each (i.e., min(0.3, 0.8) = 0.3). The certainty of the conclusion is the total certainty of the premises multiplied by the certainty factor of the rule (i.e., 0.3 x 0.5 = 0.15).
If we have another rule drawing the same conclusion (e.g., has-measles(X)) then we
will need to update our certainties to reflect this additional evidence. To do this we
calculate the certainties using the individual rules (say CF1 and CF2), then
combine them to get a total certainty of (CF1 + CF2 - CF1*CF2). The result will
be a certainty greater than each individual certainty, but still less than 1.
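The arithmetic above is easy to express in code. Below is a minimal Python sketch of this certainty-factor calculus, assuming only positive certainty factors as the text does; the function names and the second rule's CF of 0.4 are our own illustrations:

def rule_cf(premise_cfs, rule_cf_value):
    # Conjoined premises take the minimum of their certainties,
    # then the rule's certainty factor scales the result.
    return min(premise_cfs) * rule_cf_value

def combine_cfs(cf1, cf2):
    # Combining evidence from two rules for the same conclusion:
    # the result exceeds each individual CF but stays below 1.
    return cf1 + cf2 - cf1 * cf2

# The measles example from the text:
cf_measles = rule_cf([0.3, 0.8], 0.5)   # min(0.3, 0.8) * 0.5 = 0.15
print(cf_measles)

# If a second (hypothetical) rule gave has-measles(fred) a CF of 0.4:
print(combine_cfs(cf_measles, 0.4))     # 0.15 + 0.4 - 0.15*0.4 = 0.49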
There are many other ways of dealing with uncertainty, such as Dempster-Shafer theory and fuzzy logic. It's a big topic, and we have only touched the surface.
The certainty-factor model was one of the most popular models for the representation and manipulation of uncertain knowledge in the early (1980s) rule-based expert systems.
The model was criticized by researchers in artificial intelligence and statistics as being ad hoc in nature, and researchers and developers have largely stopped using it.
Its place has been taken by the more expressive formalism of Bayesian belief networks for the representation and manipulation of uncertain knowledge.
Bayesian networks are probabilistic, because these networks are built from
a probability distribution, and also use probability theory for prediction and
anomaly detection.
Real world applications are probabilistic in nature, and to represent the relationship
between multiple events, we need a Bayesian network. It can also be used in various
tasks including prediction, anomaly detection, diagnostics, automated insight,
reasoning, time series prediction, and decision making under uncertainty.
A Bayesian network can be used for building models from data and expert opinions, and it consists of two parts:
o A directed acyclic graph of nodes and arcs
o A table of conditional probabilities for each node
The generalized form of a Bayesian network that represents and solves decision problems under uncertain knowledge is known as an influence diagram.
A Bayesian network graph is made up of nodes and arcs (directed links), where:
o Each node corresponds to the random variables, and a variable can
be continuous or discrete.
o Arc or directed arrows represent the causal relationship or conditional probabilities
between random variables. These directed links or arrows connect the pair of nodes
in the graph.
These links represent that one node directly influences the other node, and if there is no directed link between two nodes, they are independent of each other.
o In the above diagram, A, B, C, and D are random variables represented
by the nodes of the network graph.
o If we are considering node B, which is connected with node A by a
directed arrow, then node A is called the parent of Node B.
o Node C is independent of node A.
Note: The Bayesian network graph does not contain any cycles. Hence, it is known as a directed acyclic graph or DAG.
A Bayesian network has mainly two components:
o Causal component
o Actual numbers
Each node in the Bayesian network has a conditional probability distribution P(Xi | Parents(Xi)), which quantifies the effect of the parents on that node.
The joint probability distribution P[x1, x2, x3, ..., xn] can be written as the product of these conditional distributions:
P(x1, x2, ..., xn) = P(x1 | parents(X1)) * P(x2 | parents(X2)) * ... * P(xn | parents(Xn))
In general, for each variable Xi we can write:
P(Xi | Xi-1, ..., X1) = P(Xi | Parents(Xi))
Example: Harry installed a new burglar alarm at his home to detect burglary. The alarm reliably responds to a burglary but also responds to minor earthquakes. Harry has two neighbors, David and Sophia, who have taken responsibility for informing Harry at work when they hear the alarm. David always calls Harry when he hears the alarm, but sometimes he gets confused with the phone ringing and calls then as well. On the other hand, Sophia likes to listen to loud music, so sometimes she fails to hear the alarm. Here we would like to compute probabilities in this burglary-alarm domain.
Problem:
Calculate the probability that the alarm has sounded, but neither a burglary nor an earthquake has occurred, and both David and Sophia have called Harry.
Solution:
o The Bayesian network for the above problem is given below. The network structure shows that Burglary and Earthquake are the parent nodes of Alarm and directly affect the probability of the alarm going off, whereas David's and Sophia's calls depend only on Alarm.
o The network thus represents our assumptions that David and Sophia do not directly perceive the burglary, do not notice the minor earthquake, and do not confer before calling.
o The conditional distributions for each node are given as a conditional probabilities table, or CPT.
o Each row in a CPT must sum to 1 because the entries in a row represent an exhaustive set of cases for the variable.
o In a CPT, a boolean variable with k boolean parents has 2^k probabilities. Hence, if there are two parents, then the CPT will contain 4 probability values.
The list of events occurring in this network is:
o Burglary (B)
o Earthquake(E)
o Alarm(A)
o David Calls(D)
o Sophia calls(S)
We can write the events of the problem statement in the form of probability as P[D, S, A, ¬B, ¬E]. We can rewrite this probability statement using the joint probability distribution:
P[D, S, A, ¬B, ¬E] = P[D|A] * P[S|A] * P[A|¬B, ¬E] * P[¬B] * P[¬E]
P(B = False) = 0.998, which is the probability that no burglary occurred, and P(E = False) = 0.999, which is the probability that no earthquake occurred.
The conditional probability that David will call depends on the probability of the alarm: P(D = True | A = True) = 0.91.
The conditional probability that Sophia will call depends on her parent node, Alarm: P(S = True | A = True) = 0.75.
From the formula of the joint distribution, we can write the problem statement in the form of a probability distribution (using P(A = True | ¬B, ¬E) = 0.001 from the alarm's CPT):
P(S, D, A, ¬B, ¬E) = P(S|A) * P(D|A) * P(A|¬B ⋀ ¬E) * P(¬B) * P(¬E)
= 0.75 * 0.91 * 0.001 * 0.998 * 0.999
= 0.00068045
Hence, a Bayesian network can answer any query about the domain by using the joint distribution.
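A compact way to see this factorization at work is to code the network directly. The following Python sketch hard-codes the alarm network's CPT entries; the 0.91, 0.75, 0.001, 0.998, and 0.999 values match the calculation above, while the remaining entries are illustrative assumptions, as are the structure and function names:

# Joint distribution of the alarm network:
# P(B, E, A, D, S) = P(B) * P(E) * P(A|B,E) * P(D|A) * P(S|A)

P_B = 0.002                      # P(Burglary = True), so P(¬B) = 0.998
P_E = 0.001                      # P(Earthquake = True), so P(¬E) = 0.999
P_A = {                          # P(Alarm = True | B, E); only the
    (True, True): 0.94,          # (False, False) entry is used in the
    (True, False): 0.95,         # worked example above, the other three
    (False, True): 0.31,         # rows are illustrative assumptions
    (False, False): 0.001,
}
P_D = {True: 0.91, False: 0.05}  # P(D = True | A); False row assumed
P_S = {True: 0.75, False: 0.02}  # P(S = True | A); False row assumed

def prob(p_true, value):
    # Probability that a boolean variable takes the given value
    return p_true if value else 1.0 - p_true

def joint(b, e, a, d, s):
    # Chain-rule factorization over the network's DAG
    return (prob(P_B, b) * prob(P_E, e) * prob(P_A[(b, e)], a) *
            prob(P_D[a], d) * prob(P_S[a], s))

# Query: alarm sounded, no burglary, no earthquake, both neighbors called
print(joint(b=False, e=False, a=True, d=True, s=True))  # ~0.00068045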
There are two ways to understand the semantics of a Bayesian network, which are given below:
1. To understand the network as a representation of the joint probability distribution (helpful for understanding how to construct the network).
2. To understand the network as an encoding of a collection of conditional independence statements (helpful for designing inference procedures).
Dempster-Shafer Theory:
Introduction
In recent times, the scientific and engineering community has come to realize the significance
of incorporating multiple forms of uncertainty. This expanded perspective on uncertainty has
been made feasible by notable advancements in computational power within the field of
artificial intelligence. As computational systems become more adept at handling intricate
analyses, the limitations of relying solely on traditional probability theory to encompass the
entirety of uncertainty have become apparent.
Traditional probability theory falls short in its ability to effectively address consonant,
consistent, or arbitrary evidence without the need for additional assumptions about
probability distributions within a given set. Moreover, it fails to express the extent of conflict
that may arise between different sets of evidence. To overcome these limitations, Dempster-
Shafer theory has emerged as a viable framework, blending the concept of probability with
the conventional understanding of sets. Dempster-Shafer theory provides the means to handle
diverse types of evidence, and it incorporates various methods to account for conflicts when
combining multiple sources of information in the context of artificial intelligence.
Where sufficient evidence is present to assign probabilities to single events, the Dempster-
Shafer model can collapse to the traditional probabilistic formulation. Additionally, one of
the most significant features of DST is its ability to handle different levels of precision
regarding information without requiring further assumptions. This characteristic enables the
direct representation of uncertainty in system responses, where an imprecise input can be
characterized by a set or interval, and the resulting output is also a set or interval.
The incorporation of Dempster-Shafer theory in artificial intelligence allows for a more comprehensive treatment of uncertainty. By leveraging the unique features of this theory, AI systems can better navigate uncertain scenarios, leveraging the potential of multiple evidentiary types and effectively managing conflicts. The utilization of Dempster-Shafer theory in artificial intelligence empowers decision-making processes in the face of uncertainty and enhances the robustness of AI systems. Therefore, Dempster-Shafer theory is
a powerful tool for building AI systems that can handle complex uncertain scenarios.
Example
Consider a scenario in artificial intelligence (AI) where an AI system is tasked with solving a
murder mystery using Dempster–Shafer Theory. The setting is a room with four individuals:
A, B, C, and D. Suddenly, the lights go out, and upon their return, B is discovered dead,
having been stabbed in the back with a knife. No one entered or exited the room, and it is
known that B did not commit suicide. The objective is to identify the murderer.
By constructing the power set, which contains all possible subsets, we can analyze the evidence. For instance, if P = {a, b, c}, the power set would be {∅, {a}, {b}, {c}, {a, b}, {b, c}, {a, c}, {a, b, c}}, comprising 2^3 = 8 elements.
Belief in K
The belief in K, denoted Bel(K), is calculated by summing the masses of the subsets that belong to K. For example, if K = {a, d, c}, Bel(K) would be calculated as:
Bel(K) = m(a) + m(d) + m(c) + m(a, d) + m(a, c) + m(d, c) + m(a, d, c)
Plausibility in K
Plausibility in K, denoted Pl(K), is determined by summing the masses of all sets that intersect with K. It represents the cumulative evidence supporting the possibility of K being true:
Pl(K) = m(a) + m(d) + m(c) + m(a, d) + m(d, c) + m(a, c) + m(a, d, c)
By leveraging Dempster–Shafer Theory in AI, we can analyze the evidence, assign masses to
subsets of possible conclusions, and calculate beliefs and plausibilities to infer the most likely
murderer in this murder mystery scenario.
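Belief and plausibility are straightforward to compute once a mass function over subsets of suspects is given. Here is a minimal Python sketch; the mass values are invented purely for illustration:

# Dempster-Shafer belief and plausibility over the frame {a, c, d}
# (the suspects left after ruling out the victim b). The masses are
# hypothetical and sum to 1; mass on the whole frame models ignorance.
masses = {
    frozenset({'a'}): 0.3,
    frozenset({'d'}): 0.2,
    frozenset({'c'}): 0.1,
    frozenset({'a', 'd'}): 0.1,
    frozenset({'a', 'c', 'd'}): 0.3,
}

def belief(k, masses):
    # Bel(K): total mass of subsets fully contained in K
    return sum(m for s, m in masses.items() if s <= k)

def plausibility(k, masses):
    # Pl(K): total mass of subsets that intersect K
    return sum(m for s, m in masses.items() if s & k)

k = frozenset({'a', 'd'})
print(belief(k, masses))        # 0.3 + 0.2 + 0.1 = 0.6
print(plausibility(k, masses))  # everything intersecting {a, d} = 0.9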
Conclusion
This article taught us:
The distinctive characteristics of Dempster-Shafer Theory, such as the aggregation of its basic probability masses to 1, its gradual reduction of ignorance through accumulating evidence, and its combination rule for merging evidence from multiple sources, contribute to its effectiveness in addressing uncertainty in AI.
Dempster-Shafer Theory stands as a valuable tool in the field of artificial intelligence, contributing to the advancement of intelligent systems capable of handling complex and uncertain environments.
Artificial Intelligence - Fuzzy Logic Systems
Fuzzy Logic Systems (FLS) produce acceptable but definite output in
response to incomplete, ambiguous, distorted, or inaccurate (fuzzy) input.
The conventional logic block that a computer can understand takes precise input and produces a definite output of TRUE or FALSE, which is equivalent to a human's YES or NO.
The inventor of fuzzy logic, Lotfi Zadeh, observed that unlike computers, the
human decision making includes a range of possibilities between YES and
NO, such as −
CERTAINLY YES
POSSIBLY YES
CANNOT SAY
POSSIBLY NO
CERTAINLY NO
Fuzzy logic works on the levels of possibilities of input to achieve a definite output.
Implementation
It can be implemented in systems with various sizes and capabilities
ranging from small micro-controllers to large, networked, workstation-
based control systems.
It can be implemented in hardware, software, or a combination of
both.
Why Fuzzy Logic?
Fuzzy logic is useful for commercial and practical purposes.
LP x is Large Positive
MP x is Medium Positive
S x is Small
MN x is Medium Negative
LN x is Large Negative
All membership functions for LP, MP, S, MN, and LN are shown below −
The triangular membership function shapes are most common among various
other membership function shapes such as trapezoidal, singleton, and
Gaussian.
Here, the input to the 5-level fuzzifier varies from -10 volts to +10 volts. Hence the corresponding output also changes.
Algorithm
Define linguistic Variables and terms (start)
Construct membership functions for them. (start)
Construct knowledge base of rules (start)
Convert crisp data into fuzzy data sets using membership functions.
(fuzzification)
Evaluate rules in the rule base. (Inference Engine)
Combine results from each rule. (Inference Engine)
Convert output data into non-fuzzy values. (defuzzification)
Development
Step 1 − Define linguistic variables and terms
Linguistic variables are input and output variables in the form of simple
words or sentences. For room temperature, cold, warm, hot, etc., are
linguistic terms.
Every member of this set is a linguistic term and it can cover some portion of
overall temperature values.
For example, the linguistic variables RoomTemp and Target can each take the terms Very_Cold, Cold, Warm, Hot, and Very_Hot.
Build a set of rules into the knowledge base in the form of IF-THEN-ELSE structures.
Fuzzy set operations perform the evaluation of the rules. The operations used for OR and AND are Max and Min respectively. All results of evaluation are combined to form a final result. This result is a fuzzy value, which is finally converted into a crisp output by defuzzification; a sketch of the whole pipeline is given below.
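To make the algorithm concrete, here is a minimal Python sketch of one fuzzy inference cycle: triangular membership functions for fuzzification, a two-rule rule base, and a simple weighted-average defuzzification (Min and Max would enter when rules combine multiple antecedents). All variable names, set boundaries, and rules are invented for illustration:

def tri(x, a, b, c):
    # Triangular membership function: rises from a to the peak b, falls to c
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Linguistic terms for the input "temp" (degrees) -- invented boundaries
def temp_cold(t):
    return tri(t, 0.0, 10.0, 20.0)

def temp_warm(t):
    return tri(t, 15.0, 25.0, 35.0)

# Representative (singleton) output values for the fan speed terms
FAN_LOW, FAN_HIGH = 20.0, 80.0

def infer(t):
    # Fuzzification: degrees of membership of the crisp input
    w_low = temp_cold(t)    # rule: IF temp is cold THEN fan is low
    w_high = temp_warm(t)   # rule: IF temp is warm THEN fan is high
    # Defuzzification: weighted average of the singleton outputs
    total = w_low + w_high
    return (w_low * FAN_LOW + w_high * FAN_HIGH) / total if total else None

print(infer(16.0))  # mostly "cold", a little "warm" -> 32.0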
Application Areas of Fuzzy Logic
Automotive Systems
Automatic Gearboxes
Four-Wheel Steering
Vehicle environment control
Consumer Electronic Goods
Hi-Fi Systems
Photocopiers
Still and Video Cameras
Television
Domestic Goods
Microwave Ovens
Refrigerators
Toasters
Vacuum Cleaners
Washing Machines
Environment Control
Air Conditioners/Dryers/Heaters
Humidifiers
Advantages of FLSs
Mathematical concepts within fuzzy reasoning are very simple.
You can modify an FLS by just adding or deleting rules, due to the flexibility of fuzzy logic.
Fuzzy logic Systems can take imprecise, distorted, noisy input
information.
FLSs are easy to construct and understand.
Fuzzy logic is a solution to complex problems in all fields of life,
including medicine, as it resembles human reasoning and decision
making.
Disadvantages of FLSs
There is no systematic approach to designing a fuzzy system.
They are understandable only when simple.
They are suitable for problems that do not demand high accuracy.