
ARTIFICIAL INTELLIGENCE
By BITS Team AI
BITS Pilani | Pilani | Dubai | Goa | Hyderabad

Contact Session-6
Probability, Bayesian Networks and Hidden Markov Models
Facts
 Delhi is the capital of India.
 You can reach Bangalore Airport from MG Road within 90 mins if you go by route A.
 Would you call that a fact?
 Are you sure it would always be true?
Uncertainty

 You can reach Bangalore Airport from MG Road within 90 mins if you go by route A.
 There is uncertainty in this information due to partial observability and non-determinism.
 Agents should be able to handle such uncertainty.
 Earlier approaches such as logic represent all possible world states.
 Such approaches become impractical here, because every possible state would have to be enumerated to capture the uncertainty in our information.
Belief

 You can reach Bangalore Airport from MG Road within 90 mins if you go by route A.
 Such information can only provide a degree of belief, e.g., we are 80% confident that it would be true on any given day.
 To deal with such degrees of belief, we need Probability Theory.
Probability Theory

 Probability provides a way to summarize uncertainty.

 Sample Space: the set of all possible outcomes.
 Ex: After tossing 2 coins, the sample space is {HH, HT, TH, TT}.
 Event: a subset of the sample space.
 An event of interest might be {HH}.
Probability Basics – Probability Model

 A fully specified probability model associates a numerical probability P(ω) with each possible world.
 The basic axioms:
 Every possible world has a probability between 0 and 1.
 The probabilities of all possible worlds sum to 1.
 E.g., P(HH) = 0.25; P(HT) = 0.25; P(TH) = 0.25; P(TT) = 0.25
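As a quick illustration (a sketch added here, not from the slides), a few lines of Python can enumerate the two-coin sample space and check both axioms:

```python
from itertools import product

# Sample space for tossing two fair coins: {HH, HT, TH, TT}
sample_space = ["".join(toss) for toss in product("HT", repeat=2)]
P = {world: 1 / len(sample_space) for world in sample_space}  # each world gets 0.25

# Axioms: every probability lies in [0, 1] and they sum to 1
assert all(0.0 <= p <= 1.0 for p in P.values())
assert abs(sum(P.values()) - 1.0) < 1e-9

# Probability of the event {HH}: sum over the worlds in the event
event = {"HH"}
print(sum(P[w] for w in event))  # 0.25
```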
Probability Basics – Unconditional / Prior

 Unconditional / Prior probabilities: propositions such as P(sum of two dice = 11) or P(two dice roll doubles) are called unconditional or prior probabilities.
 They refer to degrees of belief in the absence of any other information.
Probability Basics - Conditional
 Most of the time, however, we have some information; we call it evidence.
 E.g., we may be interested in two dice rolling a double (i.e., 1-1, 2-2, etc.)
 when one die has already rolled a 5 and the other is still spinning.
 Here we are not interested in the unconditional probability of rolling a double.
 Instead, we want the conditional or posterior probability of rolling a double given that the first die has rolled a 5, written P(double | Die1 = 5),
 where | is pronounced "given".
 E.g., if you are going to a dentist for a routine checkup, P(cavity) = 0.2.
 If you have a toothache, then P(cavity | toothache) = 0.6.
Probability Basics - Conditional
 Conditional probabilities can be defined in terms of unconditional probabilities.
 E.g., for propositions a and b:
 P(a | b) = P(a ∧ b) / P(b)
 This holds whenever P(b) > 0.

 Product rule: P(a ∧ b) = P(a | b) P(b)
Example
In a factory there are 100 units of a certain product, 5 of which are defective. We pick three units from the 100 at random. What is the probability that none of them is defective?
Let Ai be the event that the i-th chosen unit is not defective, i = 1, 2, 3, and apply the product (chain) rule.
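The calculation can be checked with a couple of lines of Python; the chain-rule factorization follows the slide, while the numeric result is computed here:

```python
# P(A1 and A2 and A3) = P(A1) * P(A2 | A1) * P(A3 | A1, A2)
p_none_defective = (95 / 100) * (94 / 99) * (93 / 98)
print(round(p_none_defective, 4))  # ~0.856
```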
Probability Distribution

 If there is a random variable Weather with domain {sunny, rain, cloudy, snow}, we could write
 P(Weather = sunny) = 0.6
 P(Weather = rain) = 0.1
 P(Weather = cloudy) = 0.2
 P(Weather = snow) = 0.1
 or, as a single statement, P(Weather) = <0.6, 0.1, 0.2, 0.1>.
 P(Weather) defines a probability distribution for the random variable Weather.
 The same notation is used for conditional distributions: P(X | Y) gives the values of P(X = i | Y = j) for all combinations of i and j.
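As an illustrative sketch (not part of the slides), such a distribution can be held as a simple table in code:

```python
# P(Weather) as a table over the variable's domain
weather_dist = {"sunny": 0.6, "rain": 0.1, "cloudy": 0.2, "snow": 0.1}

# A distribution must sum to 1 over the domain
assert abs(sum(weather_dist.values()) - 1.0) < 1e-9

print(weather_dist["cloudy"])  # P(Weather = cloudy) = 0.2
```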
Probability Density Function

 For continuous variables (e.g., weight) it is infeasible to write the distribution as a vector.

 Instead, we define the probability density as a function of the value the variable can take:
 P(Weight = x) = Normal(x | mean = 60, std = 10)

 This says that the weight of an individual is Normally (Gaussian) distributed, with an average weight of 60 kg and a standard deviation of 10 kg.

 Such a function is called a Probability Density Function.
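A small sketch (added for illustration, using the slide's mean of 60 kg and standard deviation of 10 kg) showing the density evaluated from the Gaussian formula:

```python
import math

def normal_pdf(x, mean=60.0, std=10.0):
    """Gaussian probability density N(x | mean, std^2)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

print(normal_pdf(60))  # density at the mean, ~0.0399
print(normal_pdf(80))  # density two standard deviations away, ~0.0054
```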


Joint Probability Distributions
 Instead of a distribution over a single variable, we can model a distribution over multiple variables, written with the variables separated by commas.
 E.g., P(A, B) = P(A | B) · P(B)
 P(A, B) is the probability distribution over all combinations of values of A and B,
 e.g., A = Weather and B = Cavity gives a 4 × 2 table of probabilities.
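A minimal sketch of such a joint table; the entries below are invented for illustration (chosen to be consistent with the earlier P(Weather) values and P(cavity) = 0.2), and marginalizing out Cavity recovers P(Weather):

```python
weather_vals = ["sunny", "rain", "cloudy", "snow"]
cavity_vals = [True, False]

# Hypothetical joint distribution P(Weather, Cavity); entries must sum to 1
joint = {
    ("sunny", True): 0.12, ("sunny", False): 0.48,
    ("rain", True): 0.02, ("rain", False): 0.08,
    ("cloudy", True): 0.04, ("cloudy", False): 0.16,
    ("snow", True): 0.02, ("snow", False): 0.08,
}
assert abs(sum(joint.values()) - 1.0) < 1e-9

# Marginalizing out Cavity recovers P(Weather)
p_weather = {w: sum(joint[(w, c)] for c in cavity_vals) for w in weather_vals}
print(p_weather)  # {'sunny': 0.6, 'rain': 0.1, 'cloudy': 0.2, 'snow': 0.1}
```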
Independence
 Suppose we have two random variables, TimeToBnlrAirport and HyderabadWeather.
 By the product rule, their joint distribution is
 P(TimeToBnlrAirport, HyderabadWeather) = P(TimeToBnlrAirport | HyderabadWeather) · P(HyderabadWeather)
 However, we would argue that HyderabadWeather has no influence on TimeToBnlrAirport, and hence
 P(TimeToBnlrAirport | HyderabadWeather) = P(TimeToBnlrAirport)
 This is called Independence or Marginal Independence.
 Independence between propositions a and b can be written as
 P(a | b) = P(a), or P(b | a) = P(b), or P(a ∧ b) = P(a) P(b)
Bayes Rule
 Using the product rule for propositions a and b, the joint can be written in two ways:
 P(a ∧ b) = P(a | b) P(b) and P(a ∧ b) = P(b | a) P(a)
 Equating the right-hand sides and dividing by P(a):
 P(b | a) = P(a | b) P(b) / P(a)
 This is called the Bayes Rule.
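A small worked sketch of the rule; the two likelihoods below are assumed purely for illustration (only P(cavity) = 0.2 and the resulting posterior of 0.6 appear on the earlier slide):

```python
# Hypothetical numbers chosen for illustration
p_cavity = 0.2
p_toothache_given_cavity = 0.6       # assumed likelihood
p_toothache_given_no_cavity = 0.1    # assumed likelihood

# Total probability of the evidence: P(toothache)
p_toothache = (p_toothache_given_cavity * p_cavity
               + p_toothache_given_no_cavity * (1 - p_cavity))

# Bayes rule: P(cavity | toothache) = P(toothache | cavity) P(cavity) / P(toothache)
p_cavity_given_toothache = p_toothache_given_cavity * p_cavity / p_toothache
print(round(p_cavity_given_toothache, 2))  # 0.6
```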


Conditional Independence

 Two variables X and Y are conditionally independent given Z when P(X, Y | Z) = P(X | Z) P(Y | Z).
 This generalizes to more than 2 random variables.
 E.g., with K different symptom variables X1, X2, ..., XK and C = disease:
 P(X1, X2, ..., XK | C) = Π_i P(Xi | C)
 Also known as the naïve Bayes assumption.
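A minimal naïve Bayes sketch under this assumption; every probability below is invented for illustration:

```python
# Hypothetical disease/symptom model illustrating P(X1..XK | C) = prod_i P(Xi | C)
p_disease = {"flu": 0.1, "cold": 0.3, "none": 0.6}          # prior P(C)
p_symptom_given_disease = {                                  # P(Xi = true | C)
    "fever": {"flu": 0.9, "cold": 0.3, "none": 0.05},
    "cough": {"flu": 0.8, "cold": 0.7, "none": 0.1},
}

def posterior(symptoms):
    """P(C | symptoms) via Bayes rule with the naive Bayes factorization."""
    scores = {}
    for c, prior in p_disease.items():
        likelihood = 1.0
        for s, present in symptoms.items():
            p = p_symptom_given_disease[s][c]
            likelihood *= p if present else (1 - p)
        scores[c] = prior * likelihood
    total = sum(scores.values())                             # normalize over C
    return {c: v / total for c, v in scores.items()}

print(posterior({"fever": True, "cough": True}))
```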
Bayesian Networks

 Full joint probability distributions can be used for inference; however, they become intractable as the number of variables grows.

 Bayesian Networks represent the dependencies among variables compactly,
 yet can still represent the full joint probability distribution.
What is a Bayesian Network

 A Bayesian network is a directed graph in which each node is annotated with quantitative probability information.
 Each node corresponds to a random variable, which may be discrete or continuous.
 A set of directed links or arrows connects pairs of nodes. If there is an arrow from node X to node Y, X is said to be a parent of Y. The graph has no directed cycles (and hence is a directed acyclic graph, or DAG).
 Each node Xi has a conditional probability distribution P(Xi | Parents(Xi)) that quantifies the effect of the parents on the node.
Bayesian Networks

 Once the topology of the Bayesian Network is laid out,
 we specify the conditional probability distribution for each variable given its parents,
 so that the topology and the conditional probabilities together specify the full joint probability distribution over all variables:
 P(X1, ..., Xn) = Π_i P(Xi | Parents(Xi))
Building a Bayesian Network


Example Bayesian Net #2


A Burglary Alarm System
– Fairly reliable at detecting a burglary
– Also responds to earthquakes
– Two neighbors, John and Mary, are asked to call you at work when they hear the alarm
– John nearly always calls when he hears the alarm, but sometimes confuses the telephone ring with the alarm and calls then too
– Mary likes loud music and often misses the alarm altogether
– Problem: Given the information about who has or has not called, we need to estimate the probability of a burglary.
– For example, calculate the probability that the alarm has sounded, neither a burglary nor an earthquake has happened, and both John and Mary called.
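A sketch of that last query using the network's chain-rule factorization; the CPT values below are the ones commonly used with this textbook example (Russell & Norvig) and are assumptions here, since the slide's tables are not reproduced in this extract:

```python
# Assumed CPTs (standard textbook values, not shown in this extract)
P_B = 0.001                      # P(Burglary)
P_E = 0.002                      # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,   # P(Alarm | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J_given_A = {True: 0.90, False: 0.05}           # P(JohnCalls | Alarm)
P_M_given_A = {True: 0.70, False: 0.01}           # P(MaryCalls | Alarm)

# P(j, m, a, ~b, ~e) = P(j|a) P(m|a) P(a|~b,~e) P(~b) P(~e)
p = (P_J_given_A[True] * P_M_given_A[True]
     * P_A[(False, False)] * (1 - P_B) * (1 - P_E))
print(p)  # ~0.000628
```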
Example Bayesian Net #3: Traffic Prediction - Travel Estimation

– An AI system reminds the traveler about the
• day and time
– The travel plan is to reach Delhi, and the weather in Delhi may influence the accommodation plans
– The traveler always takes a car to reach the airport
– The car may be rerouted due either to a road block or to weekday traffic during working hours, which delays the arrival at the airport
– Bars are always observed to be full on
• weekends
– The authorities block roads to safeguard processions
– Processions are observed during the festive season or due to a political rally
– Problem: Given the information that a political rally is expected, estimate the probability of late arrival

[Figure: building the Bayesian network for Example #3, step by step: identify the random variables (Festival, Political Rally, Procession, Weather @ Delhi, Weekend, Road block, Cars rerouted, All bars are full, Late for BGLR airport); work out the dependencies among the RVs; find the conditional independences; use ML to get the best linearization (ordering) among the RVs; construct the Bayes net; and encode the local dependencies with a CPT at each node.]
Example Bayesian Net #3

[Figure: the same network annotated with a conditional probability table at each node, e.g. P(Procession | Festival, Political Rally), P(Road block | Procession), P(Cars rerouted | Road block, Weekend), P(All bars are full | Weekend) and P(Late for BGLR airport | Cars rerouted); the individual table entries are not legible in this extract.]


Probabilistic Reasoning over Time
Hidden Markov Model

Partial Observability

 Agents in a partially observable environment should keep track of the current state to the extent allowed by their sensors.
 E.g., a robot moving in a new maze.

 The agent maintains a belief state representing the currently possible world states.
 Transition Model: Using the belief state and the transition model, the agent can predict how the world might evolve in the next time step.
 Sensor Model: With the observed percepts and the sensor model, the agent can update the belief state.
Degree of belief
 Earlier, states were captured as
 facts in logic, which can only be True or False.

 To capture degrees of belief, we will use Probability Theory.
 We model the changing world using a variable for each aspect of the state at each point in time.

 Transition models describe the probability distribution of the variables at time t, given the state of the world at past times.
 Sensor models describe the probability of each percept at time t, given the current state of the world.
Time and Uncertainty
 Static World: each random variable has a single fixed value.
 E.g., diagnosing a broken car.

 Dynamic World: the state information keeps changing with time.
 E.g., treating a diabetic patient, tracking the location of a robot, tracking the economic activity of a nation.
Hidden Markov model
Markov model / Markov chain
A Markov process is a process that generates a sequence of outcomes in such a way that the probability of the next outcome depends only on the current outcome and not on what happened earlier.
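A small code sketch of a Markov chain; the states match the weather example that follows, but the transition numbers are invented for illustration:

```python
import random

states = ["Sunny", "Rainy", "Cloudy"]

# Hypothetical transition matrix A[i][j] = P(next = j | current = i); rows sum to 1
A = {
    "Sunny":  {"Sunny": 0.7, "Rainy": 0.1, "Cloudy": 0.2},
    "Rainy":  {"Sunny": 0.3, "Rainy": 0.4, "Cloudy": 0.3},
    "Cloudy": {"Sunny": 0.4, "Rainy": 0.3, "Cloudy": 0.3},
}

def sample_sequence(start, length):
    """Generate a state sequence where each step depends only on the current state."""
    seq = [start]
    for _ in range(length - 1):
        current = seq[-1]
        nxt = random.choices(states, weights=[A[current][s] for s in states])[0]
        seq.append(nxt)
    return seq

print(sample_sequence("Sunny", 5))
```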
MARKOV CHAIN: WEATHER EXAMPLE

Design a Markov Chain to predict tomorrow's weather using information from the past days.

 Our model has only 3 states: S = {S1, S2, S3}, with
 S1 = Sunny, S2 = Rainy, S3 = Cloudy.
Contd..
State sequence notation: q1, q2, q3, q4, q5, ..., where qi ∈ {Sunny, Rainy, Cloudy}.

 Markov Property: P(qt+1 | qt, qt-1, ..., q1) = P(qt+1 | qt)
Example
Given that today is Sunny, what is the probability that tomorrow is Sunny and the next day Rainy?
Example 2
Assuming that yesterday's weather was Rainy and today is Cloudy, what is the probability that tomorrow will be Sunny?
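The slide's transition diagram is not reproduced in this extract, so the sketch below uses an assumed transition matrix purely to show how the two questions are answered: Example 1 multiplies two transition probabilities, and Example 2 reads a single entry, since by the Markov property yesterday's weather is irrelevant once today's is known:

```python
# Assumed transition probabilities A[today][tomorrow]; not the slide's actual numbers
A = {
    "Sunny":  {"Sunny": 0.8, "Rainy": 0.05, "Cloudy": 0.15},
    "Rainy":  {"Sunny": 0.2, "Rainy": 0.6,  "Cloudy": 0.2},
    "Cloudy": {"Sunny": 0.2, "Rainy": 0.3,  "Cloudy": 0.5},
}

# Example 1: P(tomorrow = Sunny, day after = Rainy | today = Sunny)
#           = P(Sunny -> Sunny) * P(Sunny -> Rainy)
print(A["Sunny"]["Sunny"] * A["Sunny"]["Rainy"])   # 0.04 with these numbers

# Example 2: P(tomorrow = Sunny | yesterday = Rainy, today = Cloudy)
#           = P(tomorrow = Sunny | today = Cloudy)   (Markov property)
print(A["Cloudy"]["Sunny"])                          # 0.2 with these numbers
```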
WHAT IS A HIDDEN MARKOV MODEL (HMM)?

A Hidden Markov Model is a stochastic model in which the states of the model are hidden. Each state can emit an output, which is observed.
Imagine: you are locked in a room for several days and are asked about the weather outside. The only piece of evidence you have is whether the person who brings your daily meal is carrying an umbrella or not.
 What is hidden? Sunny, Rainy, Cloudy
 What can you observe? Umbrella or Not
Markov chain vs HMM

[Comparison figure not reproduced in this extract: in a Markov chain the states themselves are observed, while in an HMM the states are hidden and only their emissions are observed.]
Hidden Markov Models (Formal)

• States Q = q1, q2, ..., qN
• Observations O = o1, o2, ..., oT
• Transition probabilities
• Transition probability matrix A = {aij}, where aij = P(qt+1 = j | qt = i)
• Emission probability / output probability
• Output probability matrix B = {bi(k)}, where bi(k) = P(ot = k | qt = i)
• Special initial probability vector π, where πi = P(q1 = i)
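As a sketch, the three sets of parameters can be held in plain dictionaries; the numbers below are the Hot/Cold ice-cream values used later in this session's Viterbi example:

```python
# States, transition matrix A, emission matrix B and initial vector pi
# (values from the ice-cream example used later in these slides)
states = ["Hot", "Cold"]
pi = {"Hot": 0.8, "Cold": 0.2}                      # P(q1 = i)
A = {"Hot":  {"Hot": 0.6, "Cold": 0.4},             # P(q_{t+1} = j | q_t = i)
     "Cold": {"Hot": 0.5, "Cold": 0.5}}
B = {"Hot":  {1: 0.2, 2: 0.4, 3: 0.4},              # P(o_t = k | q_t = i)
     "Cold": {1: 0.5, 2: 0.4, 3: 0.1}}
```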


First-Order HMM Assumptions
• Markov assumption: the probability of a state depends only on the state that precedes it:

  P(qi | q1, ..., qi-1) = P(qi | qi-1)


How to build a second-order HMM?
• Second-order HMM
• Current state only depends on previous 2 states
• Example
• Trigram model over POS tags
Markov Chain for Weather
What is the probability of 4 consecutive warm days?

The observation sequence is warm-warm-warm-warm, and the corresponding state sequence is 3-3-3-3.
P(3, 3, 3, 3) = π3 · a33 · a33 · a33 = 0.2 × (0.6)^3 = 0.0432
Hidden Markov Models
 An HMM is a sequence model:
 it assigns a label or class to each unit in a sequence, thus mapping a sequence of observations to a sequence of labels.
 Probabilistic sequence model: given a sequence of units (e.g. words, letters, morphemes, sentences), compute a probability distribution over possible sequences of labels and choose the best label sequence.
 It is a kind of generative model.



Hidden Markov Model (HMM)
Often we want to know what produced the sequence: the hidden sequence behind the observed sequence. For example,
– inferring the words (hidden) from the acoustic signal (observed) in speech recognition,
– assigning part-of-speech tags (hidden) to a sentence (sequence of words): POS tagging,
– assigning named entity categories (hidden) to a sentence (sequence of words): Named Entity Recognition.



Problem 1:
Observation Likelihood
• The probability of an observation sequence given the model (summing over all possible state sequences)
• Evaluation problem



Problem 2:

• The most probable state sequence, given a model and an observation sequence
• Decoding problem



Problem 3:

• Infer the best model parameters, given a partial model and an observation sequence;
– that is, fill in the A and B tables with the right numbers:
• the numbers that make the observation sequence most likely.
• This is how we learn the probabilities!



Solutions
Problem 1: Forward algorithm (computes the observation likelihood)
Problem 2: Viterbi algorithm (finds the best state sequence)
Problem 3: Forward-Backward algorithm (learns the probabilities)
– An instance of EM (Expectation Maximization)
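A compact sketch of the forward algorithm for Problem 1 (a minimal illustration, not the slides' own code), reusing the Hot/Cold ice-cream parameters from the example that follows:

```python
def forward_likelihood(obs, states, pi, A, B):
    """P(observation sequence | model), summing over all hidden state paths."""
    # Initialization: alpha_1(s) = pi(s) * B[s][o1]
    alpha = {s: pi[s] * B[s][obs[0]] for s in states}
    # Recursion: alpha_t(s) = sum_s' alpha_{t-1}(s') * A[s'][s] * B[s][o_t]
    for o in obs[1:]:
        alpha = {s: sum(alpha[sp] * A[sp][s] for sp in states) * B[s][o]
                 for s in states}
    # Termination: sum over the final states
    return sum(alpha.values())

states = ["Hot", "Cold"]
pi = {"Hot": 0.8, "Cold": 0.2}
A = {"Hot": {"Hot": 0.6, "Cold": 0.4}, "Cold": {"Hot": 0.5, "Cold": 0.5}}
B = {"Hot": {1: 0.2, 2: 0.4, 3: 0.4}, "Cold": {1: 0.5, 2: 0.4, 3: 0.1}}

print(forward_likelihood([3, 1, 3], states, pi, A, B))  # P(3 1 3 | model)
```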



Example: HMMs for Ice Cream

You are a climatologist in the year 2799 studying global warming.
You can't find any records of the weather in Baltimore for the summer of 2007,
but you find Jason Eisner's diary, which lists how many ice creams Jason ate every day that summer.
Your job: figure out how hot it was each day.



Hidden Markov Model
Example 1 – Viterbi Algorithm – Initialization

Model parameters (observation sequence: 3 1 3):
  Initial probabilities:  <S> → Hot 0.8, <S> → Cold 0.2
  Transition matrix A:    Hot → Hot 0.6, Hot → Cold 0.4; Cold → Hot 0.5, Cold → Cold 0.5
  Emission matrix B:      Hot: P(1) = .2, P(2) = .4, P(3) = .4; Cold: P(1) = .5, P(2) = .4, P(3) = .1

Initialization (first observation = 3):
  v1(C) = P(C) * P(3|C) = 0.2 * 0.1 = 0.02
  v1(H) = P(H) * P(3|H) = 0.8 * 0.4 = 0.32


Hidden Markov Model
Example 1 – Viterbi Algorithm – Recursion (second observation = 1)

Paths into Cold (take the maximum):
  from C: v1(C) * P(C|C) * P(1|C) = 0.02 * 0.5 * 0.5 = 0.005
  from H: v1(H) * P(C|H) * P(1|C) = 0.32 * 0.4 * 0.5 = 0.064   → v2(C) = 0.064

Paths into Hot (take the maximum):
  from C: v1(C) * P(H|C) * P(1|H) = 0.02 * 0.5 * 0.2 = 0.002
  from H: v1(H) * P(H|H) * P(1|H) = 0.32 * 0.6 * 0.2 = 0.0384  → v2(H) = 0.0384


Hidden Markov Model
Example 1 – Viterbi Algorithm – Termination and Back Trace (third observation = 3)

Paths into Cold (take the maximum):
  from C: v2(C) * P(C|C) * P(3|C) = 0.064 * 0.5 * 0.1  = 0.0032   → v3(C) = 0.0032
  from H: v2(H) * P(C|H) * P(3|C) = 0.0384 * 0.4 * 0.1 = 0.0015

Paths into Hot (take the maximum):
  from C: v2(C) * P(H|C) * P(3|H) = 0.064 * 0.5 * 0.4  = 0.0128   → v3(H) = 0.0128
  from H: v2(H) * P(H|H) * P(3|H) = 0.0384 * 0.6 * 0.4 = 0.0092

The largest final value is v3(H) = 0.0128; backtracing through the winning predecessors (Hot at t = 1, Cold at t = 2, Hot at t = 3) gives the best sequence Hot → Cold → Hot.
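For completeness, a runnable sketch of the Viterbi procedure traced above (a minimal illustration, not the slides' own code); it reproduces the v values step by step and recovers the best path by backtracing:

```python
def viterbi(obs, states, pi, A, B):
    """Most probable state sequence for an observation sequence (max-product)."""
    # Initialization: v1(s) = pi(s) * B[s][o1]
    v = [{s: pi[s] * B[s][obs[0]] for s in states}]
    back = [{}]
    # Recursion: v_t(s) = max_s' v_{t-1}(s') * A[s'][s] * B[s][o_t]
    for o in obs[1:]:
        scores, ptr = {}, {}
        for s in states:
            best_prev = max(states, key=lambda sp: v[-1][sp] * A[sp][s])
            ptr[s] = best_prev
            scores[s] = v[-1][best_prev] * A[best_prev][s] * B[s][o]
        v.append(scores)
        back.append(ptr)
    # Termination: pick the best final state and backtrace
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path)), v[-1][last]

states = ["Hot", "Cold"]
pi = {"Hot": 0.8, "Cold": 0.2}
A = {"Hot": {"Hot": 0.6, "Cold": 0.4}, "Cold": {"Hot": 0.5, "Cold": 0.5}}
B = {"Hot": {1: 0.2, 2: 0.4, 3: 0.4}, "Cold": {1: 0.5, 2: 0.4, 3: 0.1}}

print(viterbi([3, 1, 3], states, pi, A, B))  # (['Hot', 'Cold', 'Hot'], ~0.0128)
```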


Hidden Markov Model
Example 1 – Viterbi Algorithm

[Figures not reproduced in this extract. Source credit: Speech and Language Processing, Jurafsky and Martin.]


Hidden Markov Model
Example 4 – Naïve Search

Model parameters (observation sequence: 1 3 1):
  Initial probabilities:  <S> → Hot 0.8, <S> → Cold 0.2
  Transition matrix A:    Hot → Hot 0.7, Hot → Cold 0.3; Cold → Hot 0.4, Cold → Cold 0.6
  Emission matrix B:      Hot: P(1) = .2, P(2) = .4, P(3) = .4; Cold: P(1) = .5, P(2) = .4, P(3) = .1

Naïve search enumerates every possible state sequence (HHH, HHC, HCC, CCC, CHC, CCH, CHH, HCH) and scores each one, e.g.
  HHH: P(H) * P(1|H) * P(H|H) * P(3|H) * P(H|H) * P(1|H)
  CHC: P(C) * P(1|C) * P(H|C) * P(3|H) * P(C|H) * P(1|C) = 0.2 * 0.5 * 0.4 * 0.4 * 0.3 * 0.5 = 0.0024
