AIML Unit 2


1. Bayesian Belief Network in Artificial Intelligence:

A Bayesian belief network is a key technology for dealing with probabilistic events and for solving problems that involve uncertainty. We can define a Bayesian network as:

"A Bayesian network is a probabilistic graphical model which represents a set of variables and their conditional dependencies using a directed acyclic graph."

It is also called a Bayes network, belief network, decision network, or Bayesian model.

Bayesian networks are probabilistic because they are built from a probability distribution and use probability theory for prediction and anomaly detection.

Real-world applications are probabilistic in nature, and to represent the relationships between multiple events we need a Bayesian network. It can be used in various tasks including prediction, anomaly detection, diagnostics, automated insight, reasoning, time-series prediction, and decision making under uncertainty.

A Bayesian network can be used for building models from data and expert opinions, and it consists of two parts:

o Directed Acyclic Graph


o Table of conditional probabilities.

The generalized form of a Bayesian network that represents and solves decision problems under uncertain knowledge is known as an influence diagram.

A Bayesian network graph is made up of nodes and arcs (directed links), where:
o Each node corresponds to a random variable, which can be continuous or discrete.
o Arcs (directed arrows) represent the causal relationships or conditional probabilities between random variables. Each directed link connects a pair of nodes in the graph.
These links indicate that one node directly influences the other; if there is no directed link, the corresponding nodes are independent of each other.
o In the example diagram (a figure in the original), A, B, C, and D are random variables represented by the nodes of the network graph.
o If we consider node B, which is connected to node A by a directed arrow, then node A is called the parent of node B.
o Node C is independent of node A.

The Bayesian network has mainly two components:

o Causal Component
o Actual numbers

Each node in the Bayesian network has a conditional probability distribution P(Xi | Parents(Xi)), which quantifies the effect of the parents on that node.

A Bayesian network is based on the joint probability distribution and conditional probability, so let's first understand the joint probability distribution:

Joint probability distribution:

If we have variables x1, x2, x3, ..., xn, then the probability of a particular combination of x1, x2, x3, ..., xn is given by the joint probability distribution.

By the chain rule, P[x1, x2, x3, ..., xn] can be written as:

= P[x1 | x2, x3, ..., xn] P[x2, x3, ..., xn]

= P[x1 | x2, x3, ..., xn] P[x2 | x3, ..., xn] ... P[xn-1 | xn] P[xn]

In general, for each variable Xi we can write:

P(Xi | Xi-1, ..., X1) = P(Xi | Parents(Xi))
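The chain-rule factorisation above can be checked on a tiny two-variable network. The structure (X1 → X2) and all probabilities below are invented purely for illustration:

```python
# Tiny illustration of the chain-rule factorisation of a joint
# distribution. The network X1 -> X2 and all probabilities here
# are made-up values for demonstration only.

p_x1 = {True: 0.3, False: 0.7}                     # P(X1)
p_x2_given_x1 = {True: {True: 0.9, False: 0.1},    # P(X2 | X1 = True)
                 False: {True: 0.2, False: 0.8}}   # P(X2 | X1 = False)

def joint(x1, x2):
    """P(X1 = x1, X2 = x2) = P(X2 | X1) * P(X1) -- the chain rule."""
    return p_x2_given_x1[x1][x2] * p_x1[x1]

# Summing the joint over every combination of values must give 1.
total = sum(joint(a, b) for a in (True, False) for b in (True, False))
print(joint(True, True))   # 0.9 * 0.3 ≈ 0.27
print(total)               # ≈ 1.0
```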

Explanation of a Bayesian network:

Let's understand the Bayesian network through an example by creating a directed acyclic graph:

Example: Harry installed a new burglar alarm at his home to detect burglary. The alarm reliably responds to a burglary but also responds to minor earthquakes. Harry has two neighbors, David and Sophia, who have taken responsibility for informing Harry at work when they hear the alarm. David always calls Harry when he hears the alarm, but sometimes he confuses the phone ringing with the alarm and calls then too. On the other hand, Sophia likes to listen to loud music, so sometimes she misses the alarm. Here we would like to compute the probability of the burglar alarm.

Problem:

Calculate the probability that the alarm has sounded, but neither a burglary nor an earthquake has occurred, and both David and Sophia called Harry.

Solution:

o The Bayesian network for the above problem is given below (as a figure in the original). The network structure shows that Burglary and Earthquake are the parent nodes of Alarm and directly affect the probability of the alarm going off, while David's and Sophia's calls depend only on the alarm.
o The network encodes the assumptions that David and Sophia do not directly perceive the burglary, do not notice the minor earthquake, and do not confer before calling.
o The conditional distribution for each node is given as a conditional probability table, or CPT.
o Each row in a CPT must sum to 1, because the entries in a row represent an exhaustive set of cases for the variable.
o In a CPT, a Boolean variable with k Boolean parents requires 2^k rows of probabilities. Hence, if there are two parents, the CPT will contain 4 probability values.

List of all events occurring in this network:

o Burglary (B)
o Earthquake (E)
o Alarm (A)
o David calls (D)
o Sophia calls (S)

We can write the event of the problem statement as the probability P[D, S, A, B, E], and rewrite it using the joint probability distribution:

P[D, S, A, B, E] = P[D | S, A, B, E] P[S, A, B, E]
= P[D | S, A, B, E] P[S | A, B, E] P[A, B, E]
= P[D | A] P[S | A, B, E] P[A, B, E]     (D depends only on A)
= P[D | A] P[S | A] P[A | B, E] P[B, E]     (S depends only on A)
= P[D | A] P[S | A] P[A | B, E] P[B | E] P[E]
= P[D | A] P[S | A] P[A | B, E] P[B] P[E]     (B and E are independent)
Let's take the observed probabilities for the Burglary and Earthquake components:

P(B = True) = 0.002, which is the probability of a burglary.

P(B = False) = 0.998, which is the probability of no burglary.

P(E = True) = 0.001, which is the probability of a minor earthquake.

P(E = False) = 0.999, which is the probability that no earthquake occurred.

The conditional probabilities for the remaining nodes were given in the original as tables (figures), including the conditional probability table for Alarm A. The values used in the calculation below are P(A | ¬B, ¬E) = 0.001, P(D | A) = 0.91, and P(S | A) = 0.75.
From the formula for the joint distribution, we can write the problem statement in the form of a probability distribution:

P(S, D, A, ¬B, ¬E) = P(S|A) * P(D|A) * P(A|¬B ∧ ¬E) * P(¬B) * P(¬E)

= 0.75 * 0.91 * 0.001 * 0.998 * 0.999

= 0.00068045
Hence, a Bayesian network can answer any query about the domain by using the joint distribution.
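As a sketch, the burglary-alarm calculation above can be reproduced in a few lines of Python, using only the probabilities stated in the text:

```python
# Reproducing the burglary-alarm query from the text:
# P(S, D, A, ¬B, ¬E) = P(S|A) * P(D|A) * P(A|¬B,¬E) * P(¬B) * P(¬E)

p_s_given_a = 0.75       # P(Sophia calls | Alarm)
p_d_given_a = 0.91       # P(David calls | Alarm)
p_a_given_nb_ne = 0.001  # P(Alarm | no burglary, no earthquake)
p_not_b = 0.998          # P(no burglary)
p_not_e = 0.999          # P(no earthquake)

answer = p_s_given_a * p_d_given_a * p_a_given_nb_ne * p_not_b * p_not_e
print(round(answer, 8))  # 0.00068045
```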

2. Naïve Bayes Classifier Algorithm:

o The Naïve Bayes algorithm is a supervised learning algorithm based on Bayes' theorem and used for solving classification problems.
o It is mainly used in text classification, which involves high-dimensional training datasets.
o The Naïve Bayes classifier is one of the simplest and most effective classification algorithms, and it helps build fast machine learning models that can make quick predictions.
o It is a probabilistic classifier, which means it predicts on the basis of the probability of an object.
o Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment analysis, and classifying articles.

Why is it called Naïve Bayes?

The name combines the words Naïve and Bayes, which can be described as follows:

o Naïve: It is called naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of other features. For example, if a fruit is identified on the basis of color, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple; each feature individually contributes to identifying it as an apple, without depending on the others.
o Bayes: It is called Bayes because it depends on the principle of Bayes' theorem.

Bayes' Theorem:

o Bayes' theorem, also known as Bayes' rule or Bayes' law, is used to determine the probability of a hypothesis given prior knowledge. It depends on conditional probability.
o The formula for Bayes' theorem (shown in the original as a figure) is:

P(A|B) = P(B|A) * P(A) / P(B)

Where,

P(A|B) is Posterior probability: Probability of hypothesis A on the observed event B.

P(B|A) is Likelihood probability: Probability of the evidence given that hypothesis A is true.

P(A) is Prior Probability: Probability of hypothesis before observing the evidence.

P(B) is Marginal Probability: Probability of Evidence.

Working of Naïve Bayes' Classifier:

The working of the Naïve Bayes classifier can be understood with the help of the following example:

Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset, we need to decide whether we should play on a particular day according to the weather conditions. To solve this problem, we follow these steps:

1. Convert the given dataset into frequency tables.
2. Generate a likelihood table by finding the probabilities of the given features.
3. Use Bayes' theorem to calculate the posterior probability.

Problem: If the weather is sunny, should the player play or not?

Solution: The dataset and its frequency and likelihood tables were given in the original as figures. From the probabilities used below, the dataset evidently contains 14 days, with Play = Yes on 10 days and Play = No on 4, and Sunny occurring on 5 days (3 Yes, 2 No).

Applying Bayes' theorem:

P(Yes|Sunny)= P(Sunny|Yes)*P(Yes)/P(Sunny)

P(Sunny|Yes)= 3/10= 0.3

P(Sunny)= 0.35

P(Yes)=0.71

So P(Yes|Sunny) = 0.3*0.71/0.35 ≈ 0.61

P(No|Sunny)= P(Sunny|No)*P(No)/P(Sunny)

P(Sunny|No) = 2/4 = 0.5

P(No)= 0.29
P(Sunny)= 0.35

So P(No|Sunny)= 0.5*0.29/0.35 = 0.41

As we can see from the above calculation, P(Yes|Sunny) > P(No|Sunny).

Hence, on a sunny day, the player can play the game.
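The same arithmetic can be sketched in Python, using the probabilities read off the frequency and likelihood tables above:

```python
# Naïve Bayes posterior for the weather example, using the
# probabilities given in the text.
p_sunny_given_yes = 3 / 10   # P(Sunny | Yes)
p_sunny_given_no = 2 / 4     # P(Sunny | No)
p_yes, p_no = 0.71, 0.29     # priors P(Yes), P(No)
p_sunny = 0.35               # evidence P(Sunny)

p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
p_no_given_sunny = p_sunny_given_no * p_no / p_sunny

print(round(p_yes_given_sunny, 2))  # 0.61
print(round(p_no_given_sunny, 2))   # 0.41

# P(Yes|Sunny) > P(No|Sunny), so the classifier predicts "play".
print(p_yes_given_sunny > p_no_given_sunny)  # True
```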

3. Inference in Bayesian Networks:

Bayesian networks (also known as Bayes nets or belief networks) are graphical models
that represent a set of variables and their conditional dependencies via a directed
acyclic graph (DAG). Inference in Bayesian networks refers to the process of computing
the probability distribution of one or more variables given some observed evidence.

Key Concepts in Bayesian Network Inference

1. Nodes and Edges:


o Nodes represent random variables.
o Edges represent conditional dependencies between variables. If there is a
directed edge from node A to node B, A is a parent of B, and B is
conditionally dependent on A.
2. Conditional Probability Table (CPT):
o Each node has an associated CPT that quantifies the effects of the
parents on the node. For a node B with parents A1,A2,…,An the CPT
specifies P(B∣A1,A2,…,An).

Types of Inference in Bayesian Networks

1. Exact Inference:
o Variable Elimination: This algorithm systematically sums out the
variables to compute the marginal distribution of the query variables given
the evidence.
o Belief Propagation (Message Passing): This algorithm works by passing
messages between nodes in the network to update beliefs about the
states of variables. It works well on tree-structured networks.
o Junction Tree Algorithm: Converts the original network into a tree
structure (a junction tree) to facilitate efficient belief propagation.
2. Approximate Inference:
o Monte Carlo Methods: These methods use random sampling to
approximate the posterior distributions. Examples include Gibbs sampling
and Metropolis-Hastings algorithm.
o Loopy Belief Propagation: An extension of belief propagation for
networks with loops, which iterates the message passing process to
approximate the posterior distributions.
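As a minimal sketch of one of these Monte Carlo methods, the rejection sampler below approximates a posterior on a two-node network A → B; the structure and all probabilities are assumptions invented for this example:

```python
import random

# Approximate inference by rejection sampling on a two-node network
# A -> B. All probabilities are invented for illustration.
P_A = 0.3                              # P(A = True)
P_B_GIVEN_A = {True: 0.8, False: 0.1}  # P(B = True | A)

def sample():
    """Draw one (a, b) sample from the joint distribution."""
    a = random.random() < P_A
    b = random.random() < P_B_GIVEN_A[a]
    return a, b

def estimate_p_a_given_b(n=100_000):
    """Approximate P(A=True | B=True): discard samples where B is False."""
    kept = hits = 0
    for _ in range(n):
        a, b = sample()
        if b:            # keep only samples consistent with the evidence
            kept += 1
            hits += a
    return hits / kept

random.seed(0)
# Exact answer: P(A|B) = 0.3*0.8 / (0.3*0.8 + 0.7*0.1) = 0.24/0.31 ≈ 0.774
print(estimate_p_a_given_b())
```

The estimate converges to the exact posterior as the sample count grows, which is the essential idea behind more sophisticated samplers such as Gibbs sampling.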

Steps for Inference in Bayesian Networks

1. Set Up the Network: Define the structure of the network (nodes and edges) and
specify the CPTs for each node.
2. Observe Evidence: Set the values for the observed variables (evidence).
3. Compute Query Distributions: Use one of the inference algorithms to compute
the posterior distributions of the query variables given the evidence.

Example

Consider a simple Bayesian network with three nodes: A, B, and C, where A influences
B and B influences C.

1. Nodes:
o A: Cloudy
o B: Rain
o C: Sprinkler
2. Edges:
o A→B
o B→C
3. CPTs:
o P(A)
o P(B∣A)
o P(C∣B)

Given evidence, such as observing that the sprinkler is on (C = True), we want to compute the probability that it is raining (B = True). Using inference algorithms, we can compute P(B = True | C = True).
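A minimal sketch of exact inference by enumeration for this chain; the CPT values are assumptions chosen for illustration, since the text does not give numbers:

```python
from itertools import product

# Exact inference by enumeration on the chain A -> B -> C
# (Cloudy -> Rain -> Sprinkler). CPT numbers are invented.
P_A = {True: 0.5, False: 0.5}                      # P(A)
P_B_GIVEN_A = {True: {True: 0.8, False: 0.2},      # P(B | A = True)
               False: {True: 0.1, False: 0.9}}     # P(B | A = False)
P_C_GIVEN_B = {True: {True: 0.9, False: 0.1},      # P(C | B = True)
               False: {True: 0.3, False: 0.7}}     # P(C | B = False)

def joint(a, b, c):
    """P(A, B, C) = P(A) * P(B|A) * P(C|B)."""
    return P_A[a] * P_B_GIVEN_A[a][b] * P_C_GIVEN_B[b][c]

# P(B=True | C=True) = P(B=True, C=True) / P(C=True)
numerator = sum(joint(a, True, True) for a in (True, False))
denominator = sum(joint(a, b, True)
                  for a, b in product((True, False), repeat=2))
print(numerator / denominator)  # 0.405 / 0.57 ≈ 0.711
```

Enumeration sums the full joint, so it is exponential in the number of variables; variable elimination achieves the same result more efficiently by factoring the sums.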

4. Causal Networks:

Causal networks, also known as causal Bayesian networks or causal models, are a
specific type of Bayesian network that explicitly represent causal relationships between
variables. They are used to model and reason about cause-and-effect relationships,
providing a framework to predict the effects of interventions and understand the
underlying mechanisms of observed data.

Key Concepts in Causal Networks

1. Causal Graph:
o Nodes: Represent random variables.
o Directed Edges: Represent causal influences. If there is a directed edge
from node A to node B (denoted A→B), it means that A is a direct cause
of B.
2. Causal Markov Condition:
o Each variable is independent of its non-effects given its direct causes
(parents). This means that the probability of each variable can be
described by its conditional probability given its parents.
3. Interventions:
o Interventions are actions that forcibly set the value of one or more
variables in the network, breaking the natural causal links. The do-
operator (do(X=x)) is used to denote an intervention where variable X is
set to x.

Differences Between Causal and Bayesian Networks

• Causal Networks: Focus on cause-and-effect relationships and allow for reasoning about interventions.
• Bayesian Networks: Focus on probabilistic dependencies and correlations without necessarily implying causation.

Causal Inference

Causal inference is the process of drawing conclusions about causal relationships from
data. There are several key tasks in causal inference:

1. Causal Discovery:
o Identifying the causal structure from data. This involves determining the
direction of the edges in the network, which can be challenging because
correlation does not imply causation.
2. Estimating Causal Effects:
o Quantifying the effect of an intervention. For example, estimating the
effect of a new drug on patient recovery rates.
3. Counterfactual Reasoning:
o Considering hypothetical scenarios to determine what would have
happened if a different action had been taken.

Methods for Causal Inference

1. Do-Calculus:
o A set of rules developed by Judea Pearl for reasoning about interventions
in causal models. It allows for the derivation of causal effects from
observational data.
2. Structural Equation Modeling (SEM):
o A framework that represents causal relationships using equations. Each
variable is expressed as a function of its parents and an error term.
3. Randomized Controlled Trials (RCTs):
o Experimental studies where subjects are randomly assigned to treatment
and control groups. This randomization helps to establish causal
relationships by controlling for confounding variables.

Example

Consider a simple causal network with three variables: A, B, and C, where A causes B
and B causes C:

• A→B→C

If we observe that C has a high value, we might be interested in determining whether A is the cause. By intervening on A (e.g., setting A to a specific value) and observing the changes in B and C, we can infer the causal effect of A on C.
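The intervention described above can be sketched numerically. The chain structure matches the example; the CPT values are assumptions invented for illustration:

```python
# Simulating do(A = a) on the chain A -> B -> C. Forcing A and
# propagating forward shows how C responds to the intervention.
# All probabilities are invented for illustration.
P_B_GIVEN_A = {True: 0.9, False: 0.2}  # P(B = True | A)
P_C_GIVEN_B = {True: 0.8, False: 0.1}  # P(C = True | B)

def p_c_given_do_a(a):
    """P(C = True | do(A = a)): set A by fiat, then sum over B."""
    p_b = P_B_GIVEN_A[a]
    return p_b * P_C_GIVEN_B[True] + (1 - p_b) * P_C_GIVEN_B[False]

# C responds to the forced value of A, so A is a cause of C here.
print(p_c_given_do_a(True))   # 0.9*0.8 + 0.1*0.1 ≈ 0.73
print(p_c_given_do_a(False))  # 0.2*0.8 + 0.8*0.1 ≈ 0.24
```

Because the two interventions yield different distributions over C, the model says A causally influences C; if the printed values were equal, intervening on A would have no effect downstream.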
