Unit-5 Bayes' Rule and Bayesian Network

Bayes' Rule is a fundamental concept in probability theory and statistics that
describes how to update the probability of a hypothesis based on new
evidence. It's commonly used in:
1. Machine Learning: for classification, regression, and clustering tasks
2. Data Analysis: for inference and prediction
3. Artificial Intelligence: for decision-making and uncertainty reasoning
4. Statistics: for hypothesis testing and confidence intervals
Bayes' Rule Formula
P(H|E) = P(E|H) × P(H) / P(E)
Where:
- P(H|E) is the posterior probability of the hypothesis (H) given the evidence (E)
- P(E|H) is the likelihood of the evidence (E) given the hypothesis (H)
- P(H) is the prior probability of the hypothesis (H)
- P(E) is the probability of the evidence (E)
Example
Suppose we want to diagnose a disease (H) based on a positive test result (E).
We know:
- P(E|H) = 0.9 (likelihood of a positive test result given the disease)
- P(H) = 0.01 (prior probability of the disease)
- P(E) = 0.1 (probability of a positive test result)
Using Bayes' Rule, we can calculate the posterior probability of the disease
given the positive test result:
P(H|E) = 0.9 × 0.01 / 0.1 = 0.09
So, the probability of the disease given the positive test result is 9%.
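The arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the function name `posterior` and its parameter names are illustrative choices, not part of the original notes.

```python
def posterior(likelihood: float, prior: float, evidence: float) -> float:
    """Return P(H|E) = P(E|H) * P(H) / P(E)."""
    if not 0.0 < evidence <= 1.0:
        raise ValueError("P(E) must lie in (0, 1]")
    return likelihood * prior / evidence


# Values from the diagnosis example: P(E|H) = 0.9, P(H) = 0.01, P(E) = 0.1
p_h_given_e = posterior(likelihood=0.9, prior=0.01, evidence=0.1)
print(round(p_h_given_e, 4))  # 0.09
```

Even with a fairly accurate test, the posterior stays low because the prior P(H) is small.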
Bayes' Rule Use Cases
1. Medical Diagnosis: updating disease probabilities based on test results
2. Spam Filtering: classifying emails as spam or not spam based on features
3. Image Recognition: identifying objects in images based on features
4. Risk Assessment: updating risk probabilities based on new data
Representing knowledge in an uncertain domain involves capturing and
reasoning about uncertain information. Here are some key aspects:
1. Uncertainty Representation: Use formalisms like probability theory, fuzzy
logic, or possibility theory to represent uncertainty.
2. Knowledge Graphs: Construct graphs to represent entities, relationships,
and uncertainty.
3. Bayesian Networks: Model probabilistic relationships between variables.
4. Fuzzy Ontologies: Represent imprecise concepts and relationships using
fuzzy logic.
5. Probabilistic Logic Programming: Combine logic programming with
probability theory.
6. Uncertainty Quantification: Assign numerical values to uncertainty, like
probability intervals.
7. Reasoning under Uncertainty: Use algorithms like Bayesian inference, fuzzy
reasoning, or probabilistic logic programming to draw conclusions.
8. Knowledge Update: Update knowledge representations as new information
becomes available.
9. Uncertainty Propagation: Propagate uncertainty through reasoning
processes.
10. Decision Making under Uncertainty: Make decisions based on uncertain
knowledge, using approaches like decision theory or probabilistic planning.
Some popular frameworks for representing knowledge in uncertain domains
include:
1. Probabilistic Graphical Models (PGMs)
2. Fuzzy Description Logics (FDLs)
3. Probabilistic Logic Programming (PLP)
4. Bayesian Networks (BNs)
5. Markov Logic Networks (MLNs)
These frameworks enable reasoning about uncertain knowledge, facilitating
informed decision-making in various applications, such as:
1. Artificial Intelligence (AI)
2. Machine Learning (ML)
3. Natural Language Processing (NLP)
4. Expert Systems
5. Decision Support Systems
Bayesian Belief Network in artificial intelligence

A Bayesian belief network is a key technique for dealing with probabilistic
events and for solving problems that involve uncertainty. We can define a Bayesian
network as:
"A Bayesian network is a probabilistic graphical model which represents a set of
variables and their conditional dependencies using a directed acyclic graph."
It is also called a Bayes network, belief network, or Bayesian model. The term
decision network is sometimes used as well, although it more precisely names the
influence diagram, a generalization that adds decision and utility nodes.
Bayesian networks are probabilistic, because these networks are built from
a probability distribution, and also use probability theory for prediction and anomaly
detection.
Real world applications are probabilistic in nature, and to represent the relationship
between multiple events, we need a Bayesian network. It can also be used in various
tasks including prediction, anomaly detection, diagnostics, automated insight,
reasoning, time series prediction, and decision making under uncertainty.
A Bayesian network can be used for building models from data and expert opinions,
and it consists of two parts:
o Directed Acyclic Graph
o Table of conditional probabilities.
The generalized form of a Bayesian network that represents and solves decision problems
under uncertain knowledge is known as an influence diagram.
A Bayesian network graph is made up of nodes and arcs (directed links), where:
o Each node corresponds to a random variable, and a variable can
be continuous or discrete.
o Arcs or directed arrows represent the causal relationships or conditional
dependencies between random variables. These directed links or arrows connect
pairs of nodes in the graph.
These links indicate that one node directly influences the other. If there is no
directed link between two nodes, neither directly influences the other, although
they may still be dependent through other nodes.
o In the above diagram, A, B, C, and D are random variables
represented by the nodes of the network graph.
o If we are considering node B, which is connected with node A by a
directed arrow, then node A is called the parent of node B.
o Node C is independent of node A.
Note: The Bayesian network graph does not contain any directed cycle. Hence, it is known
as a directed acyclic graph or DAG.
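The acyclicity requirement can be tested mechanically. The sketch below uses Kahn's algorithm (repeatedly removing nodes with no incoming arcs); the function name `is_dag` and the edge list are assumptions made for illustration, since the notes only state that A is the parent of B.

```python
from collections import deque


def is_dag(children: dict[str, list[str]]) -> bool:
    """Return True if the directed graph has no directed cycle (Kahn's algorithm)."""
    nodes = set(children) | {c for cs in children.values() for c in cs}
    indegree = {n: 0 for n in nodes}
    for cs in children.values():
        for c in cs:
            indegree[c] += 1

    # Start from nodes with no parents and peel the graph layer by layer.
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for c in children.get(node, []):
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)

    # Every node is removed exactly when there is no directed cycle.
    return visited == len(nodes)


# Hypothetical structure: A -> B, B -> D, C -> D (edges assumed for illustration)
print(is_dag({"A": ["B"], "B": ["D"], "C": ["D"]}))  # True
print(is_dag({"A": ["B"], "B": ["A"]}))              # False
```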
The Bayesian network has mainly two components:
o Causal Component
o Actual numbers
Each node in the Bayesian network has a conditional probability
distribution P(Xi | Parents(Xi)), which determines the effect of the parents on that node.
A Bayesian network is based on the joint probability distribution and conditional probability.
So let's first understand the joint probability distribution:
Joint probability distribution:

If we have variables x1, x2, x3, ..., xn, then the probabilities of the different combinations
of values of x1, x2, x3, ..., xn are known as the joint probability distribution.
P[x1, x2, x3, ..., xn] can be expanded with the chain rule as follows:
= P[x1 | x2, x3, ..., xn] P[x2, x3, ..., xn]
= P[x1 | x2, x3, ..., xn] P[x2 | x3, ..., xn] ... P[xn-1 | xn] P[xn].
In general, for each variable Xi (with the variables ordered so that parents come before
their children), the network lets us write:
P(Xi | Xi-1, ..., X1) = P(Xi | Parents(Xi))
Explanation of Bayesian network:

Let's understand the Bayesian network through an example by creating a directed
acyclic graph:
Example: Harry installed a new burglar alarm at his home to detect burglary. The alarm
reliably detects a burglary but also responds to minor earthquakes.
Harry has two neighbours, David and Sophia, who have taken responsibility for informing
Harry at work when they hear the alarm. David always calls Harry when he hears the
alarm, but sometimes he confuses the phone ringing with the alarm and calls then
too. On the other hand, Sophia likes to listen to loud music, so sometimes she fails
to hear the alarm. Here we would like to compute probabilities involving the burglar alarm.
Problem:
Calculate the probability that the alarm has sounded, but neither a burglary
nor an earthquake has occurred, and David and Sophia both called Harry.
Solution:
o The Bayesian network for the above problem is given below. The network
structure shows that Burglary and Earthquake are the parent nodes of
Alarm and directly affect the probability of the alarm going off, while David's and
Sophia's calls depend only on the alarm.
o The network represents our assumptions that the neighbours do not directly perceive the
burglary, do not notice the minor earthquake, and do not confer
before calling.
o The conditional distribution for each node is given as a conditional
probability table, or CPT.
o Each row in the CPT must sum to 1 because the entries in a row
represent an exhaustive set of cases for the variable.
o In a CPT, a Boolean variable with k Boolean parents contains 2^k independently
specifiable probabilities. Hence, if there are two parents, the CPT will contain 4 probability values.
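The two CPT properties above (2^k rows for k Boolean parents, and each row summing to 1) can be sketched as follows. The helper name `cpt_rows` and the probability values are illustrative assumptions; each row stores P(child = True) and derives P(child = False) as its complement.

```python
from itertools import product


def cpt_rows(parents: list[str]) -> list[tuple[bool, ...]]:
    """Enumerate every True/False assignment of the parents: 2**k rows."""
    return list(product([True, False], repeat=len(parents)))


rows = cpt_rows(["B", "E"])
print(len(rows))  # 4

# Illustrative P(child=True | parents) values, one per row
p_true = dict(zip(rows, [0.94, 0.95, 0.31, 0.001]))

# Build full rows (P(True), P(False)); the complement guarantees each row sums to 1
cpt = {row: (p, 1.0 - p) for row, p in p_true.items()}
for row, probs in cpt.items():
    assert abs(sum(probs) - 1.0) < 1e-12, f"row {row} does not sum to 1"
```

Storing only P(child = True) per row is a common way to avoid rows that fail the sum-to-1 check through a typing error.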
List of all events occurring in this network:
o Burglary (B)
o Earthquake (E)
o Alarm (A)
o David calls (D)
o Sophia calls (S)
We can write the events of the problem statement as the joint probability P[D, S, A, B, E]
and expand it using the chain rule and the conditional independences encoded by the network:
P[D, S, A, B, E] = P[D | S, A, B, E]. P[S, A, B, E]
= P[D | S, A, B, E]. P[S | A, B, E]. P[A, B, E]
= P[D | A]. P[S | A, B, E]. P[A, B, E]
= P[D | A]. P[S | A]. P[A | B, E]. P[B, E]
= P[D | A]. P[S | A]. P[A | B, E]. P[B | E]. P[E]
= P[D | A]. P[S | A]. P[A | B, E]. P[B]. P[E], since Burglary and Earthquake have no
parents and are independent of each other.
Let's take the observed probabilities for the Burglary and Earthquake components:
P(B= True) = 0.002, which is the probability of a burglary.
P(B= False) = 0.998, which is the probability of no burglary.
P(E= True) = 0.001, which is the probability of a minor earthquake.
P(E= False) = 0.999, which is the probability that no earthquake occurred.
We can provide the conditional probabilities as per the tables below:
Conditional probability table for Alarm A:
The conditional probability of Alarm A depends on Burglary and Earthquake:
B       E       P(A= True)   P(A= False)
True    True    0.94         0.06
True    False   0.95         0.05
False   True    0.31         0.69
False   False   0.001        0.999
Conditional probability table for David Calls:
The conditional probability that David calls depends on its parent node, Alarm.
A       P(D= True)   P(D= False)
True    0.91         0.09
False   0.05         0.95
Conditional probability table for Sophia Calls:
The conditional probability that Sophia calls depends on its parent node, Alarm.
A       P(S= True)   P(S= False)
True    0.75         0.25
False   0.02         0.98
From the formula of the joint distribution, we can write the problem statement in the form
of a probability distribution:
P(S, D, A, ¬B, ¬E) = P(S|A) * P(D|A) * P(A|¬B ^ ¬E) * P(¬B) * P(¬E)
= 0.75 * 0.91 * 0.001 * 0.998 * 0.999
= 0.00068045.
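The calculation above can be reproduced in code. This is a minimal sketch in which each table stores only P(X = True | parents); the names `P_A`, `P_D`, `P_S`, `bern`, and `joint` are illustrative choices rather than anything defined in the notes.

```python
# P(X=True | parents), taken from the priors and CPTs of the burglary example
P_B = 0.002
P_E = 0.001
P_A = {(True, True): 0.94, (True, False): 0.95,
       (False, True): 0.31, (False, False): 0.001}  # keyed by (B, E)
P_D = {True: 0.91, False: 0.05}  # keyed by A
P_S = {True: 0.75, False: 0.02}  # keyed by A


def bern(p_true: float, value: bool) -> float:
    """Probability that a Boolean variable takes `value`, given P(True)."""
    return p_true if value else 1.0 - p_true


def joint(d: bool, s: bool, a: bool, b: bool, e: bool) -> float:
    """P(D, S, A, B, E) = P(D|A) P(S|A) P(A|B,E) P(B) P(E)."""
    return (bern(P_D[a], d) * bern(P_S[a], s) * bern(P_A[(b, e)], a)
            * bern(P_B, b) * bern(P_E, e))


# Alarm sounded, no burglary, no earthquake, both neighbours called
print(f"{joint(True, True, True, False, False):.8f}")  # 0.00068045
```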
Hence, a Bayesian network can answer any query about the domain by using
Joint distribution.
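As one way to see how the joint distribution answers other queries, the sketch below sums the joint over all unobserved variables (inference by enumeration) to estimate the probability of a burglary given that both neighbours called. The function names and the choice of query are assumptions for illustration; the numbers are the P(X = True) entries from the tables above. Enumeration is exponential in the number of variables, so it only suits small networks like this one.

```python
from itertools import product

# P(X=True | parents) from the burglary example
P_B, P_E = 0.002, 0.001
P_A = {(True, True): 0.94, (True, False): 0.95,
       (False, True): 0.31, (False, False): 0.001}  # keyed by (B, E)
P_D = {True: 0.91, False: 0.05}  # keyed by A
P_S = {True: 0.75, False: 0.02}  # keyed by A
VARS = ("B", "E", "A", "D", "S")


def bern(p_true, value):
    """Probability that a Boolean variable takes `value`, given P(True)."""
    return p_true if value else 1.0 - p_true


def joint(w):
    """Joint probability of one complete assignment w (dict of variable -> bool)."""
    return (bern(P_B, w["B"]) * bern(P_E, w["E"])
            * bern(P_A[(w["B"], w["E"])], w["A"])
            * bern(P_D[w["A"]], w["D"]) * bern(P_S[w["A"]], w["S"]))


def probability(event):
    """Sum the joint over every complete assignment consistent with `event`."""
    total = 0.0
    for values in product([True, False], repeat=len(VARS)):
        world = dict(zip(VARS, values))
        if all(world[k] == v for k, v in event.items()):
            total += joint(world)
    return total


def query(target, evidence):
    """P(target | evidence) = P(target, evidence) / P(evidence)."""
    return probability({**target, **evidence}) / probability(evidence)


print(f"{query({'B': True}, {'D': True, 'S': True}):.3f}")  # 0.407
```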
The semantics of Bayesian Network:
There are two ways to understand the semantics of a Bayesian network, which are
given below:
1. To understand the network as a representation of the joint probability
distribution.
This view is helpful for understanding how to construct the network.
2. To understand the network as an encoding of a collection of conditional
independence statements.
This view is helpful in designing inference procedures.
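The second view can be illustrated numerically on the burglary example. The sketch below checks one independence statement the network encodes: David's and Sophia's calls are independent given Alarm, although they are not independent when Alarm is unobserved. Function names are illustrative assumptions; the numbers are the P(X = True) entries from the example's tables.

```python
from itertools import product

# P(X=True | parents) from the burglary example
P_B, P_E = 0.002, 0.001
P_A = {(True, True): 0.94, (True, False): 0.95,
       (False, True): 0.31, (False, False): 0.001}  # keyed by (B, E)
P_D = {True: 0.91, False: 0.05}  # keyed by A
P_S = {True: 0.75, False: 0.02}  # keyed by A
VARS = ("B", "E", "A", "D", "S")


def bern(p_true, value):
    return p_true if value else 1.0 - p_true


def joint(w):
    return (bern(P_B, w["B"]) * bern(P_E, w["E"])
            * bern(P_A[(w["B"], w["E"])], w["A"])
            * bern(P_D[w["A"]], w["D"]) * bern(P_S[w["A"]], w["S"]))


def probability(event):
    """Sum the joint over every complete assignment consistent with `event`."""
    total = 0.0
    for values in product([True, False], repeat=len(VARS)):
        world = dict(zip(VARS, values))
        if all(world[k] == v for k, v in event.items()):
            total += joint(world)
    return total


def cond(target, evidence):
    return probability({**target, **evidence}) / probability(evidence)


# Given Alarm, the calls factorize: P(D, S | A) = P(D | A) * P(S | A)
lhs = cond({"D": True, "S": True}, {"A": True})
rhs = cond({"D": True}, {"A": True}) * cond({"S": True}, {"A": True})
print(abs(lhs - rhs) < 1e-12)  # True

# Without observing Alarm, the calls are correlated: P(D, S) != P(D) * P(S)
both = probability({"D": True, "S": True})
separate = probability({"D": True}) * probability({"S": True})
print(abs(both - separate) < 1e-12)  # False
```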
