
UNIT - 4

HIDDEN MARKOV MODELS

Sequential data - Markov models, HMM, maximum likelihood for the HMM, the forward and backward algorithm, the sum-product algorithm, scaling factors, the Viterbi algorithm, linear dynamical systems
Introduction to Sequential Data
• Sequential data are data points that occur in a specific order, often
time-dependent.
• Examples: Daily rainfall measurements, Speech recognition (acoustic features
in time frames), DNA nucleotide sequences
Important distinction:
o Stationary distributions: the generative process does not change over time.
o Non-stationary distributions: the process evolves over time.
• Previously, data points were treated as independent and identically
distributed (i.i.d.).
• For sequential data, this assumption does not hold, as nearby data points are
often correlated.
• Example: Predicting whether it will rain tomorrow based on today’s weather (an i.i.d. model ignores this dependence).
Markov Models
• A Markov process is a stochastic process in which the distribution over future states depends only on the present state and not on how the process arrived at that state.
• A random sequence has the Markov property if its distribution is determined solely by
its current state.
• Any random process having this property is called a Markov random process. For
observable state sequences (state is known from data), this leads to a Markov chain
model. For non-observable states, this leads to a Hidden Markov Model (HMM).
First Order Markov Chain

• In a first-order Markov chain, the current state depends only on the previous state.
• Equivalently, each observation is conditionally independent of all earlier observations given the most recent one.
• Example: Predicting weather based on whether it rained the day before.
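As a worked form of this statement (using the standard x_1, …, x_N notation for the sequence, which is an assumption of this sketch rather than something defined on the slides):

```latex
% First-order Markov chain: the joint distribution factorizes as
p(x_1, \dots, x_N) = p(x_1) \prod_{n=2}^{N} p(x_n \mid x_{n-1})
% equivalently, each observation depends only on its immediate predecessor:
p(x_n \mid x_1, \dots, x_{n-1}) = p(x_n \mid x_{n-1})
```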
Higher Order Markov Chain

• Second-order Markov chain: each observation depends on the two most recent observations.
• More flexibility, but with increased model complexity (for discrete states, the number of parameters grows exponentially with the order).
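A corresponding sketch of the second-order factorization, under the same assumed notation:

```latex
% Second-order Markov chain: each observation is conditioned on the two previous ones
p(x_1, \dots, x_N) = p(x_1)\, p(x_2 \mid x_1) \prod_{n=3}^{N} p(x_n \mid x_{n-1}, x_{n-2})
```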
Markov Model
• A discrete (finite) system can be described as follows:
– It has N distinct states.
– It begins (at time t=1) in some initial state, drawn from an initial-state distribution.
– At each time step (t=1,2,…) the system moves from the current state to the next state (possibly the same as the current state) according to transition probabilities associated with the current state.
• This kind of system is called a finite, or discrete, Markov model, named after Andrei Andreyevich Markov (1856-1922).
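A minimal simulation sketch of such a system in Python; the two-state weather labels and all probability values are illustrative assumptions, not taken from the slides.

```python
import numpy as np

# Illustrative two-state weather chain; the labels and probabilities are assumed values.
states = ["rain", "dry"]
A = np.array([[0.7, 0.3],   # row = current state, column = next state
              [0.4, 0.6]])
pi = np.array([0.5, 0.5])   # initial-state distribution

def simulate_markov_chain(pi, A, T, seed=0):
    """Draw a length-T state sequence from a discrete Markov model."""
    rng = np.random.default_rng(seed)
    z = [rng.choice(len(pi), p=pi)]
    for _ in range(T - 1):
        # the next state depends only on the current state (first-order Markov property)
        z.append(rng.choice(A.shape[0], p=A[z[-1]]))
    return z

print([states[i] for i in simulate_markov_chain(pi, A, 10)])
```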
Hidden Markov Models
• A Hidden Markov Model (HMM) is a statistical model used to describe systems whose unobservable (hidden) states change over time.
• It is based on the idea that there is an underlying process with hidden states, each of which generates an observable output.
• The model is defined by the probabilities of transitioning between hidden states and of emitting observable symbols from each state.
• HMMs are used in a wide range of fields, including finance, bioinformatics, and speech recognition.
• Because of their flexibility, HMMs are useful for modelling dynamic systems and for predicting future states from observed sequences.
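A small sketch of an HMM's parameters and of sampling from it, under assumed state names, observation symbols, and probability values (none of which come from the slides):

```python
import numpy as np

# Illustrative HMM: hidden weather states emit discrete activity symbols.
# All names and probability values below are assumptions for illustration.
hidden_states = ["rainy", "sunny"]
symbols       = ["walk", "shop", "clean"]

pi = np.array([0.6, 0.4])          # initial hidden-state distribution
A  = np.array([[0.7, 0.3],         # transition probabilities p(z_n | z_{n-1})
               [0.4, 0.6]])
B  = np.array([[0.1, 0.4, 0.5],    # emission probabilities p(x_n | z_n)
               [0.6, 0.3, 0.1]])

def sample_hmm(pi, A, B, T, seed=0):
    """Sample a hidden-state sequence and the observations it emits."""
    rng = np.random.default_rng(seed)
    z = rng.choice(len(pi), p=pi)
    zs, xs = [], []
    for _ in range(T):
        zs.append(int(z))
        xs.append(int(rng.choice(B.shape[1], p=B[z])))   # emit a symbol from the current state
        z = rng.choice(A.shape[0], p=A[z])                # move to the next hidden state
    return zs, xs

zs, xs = sample_hmm(pi, A, B, 5)
print([hidden_states[z] for z in zs], [symbols[x] for x in xs])
```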
HMM applications
Maximum likelihood for the HMM
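As a brief sketch of the quantity being maximized, using the conventional notation θ = {π, A, φ} for the initial-state, transition, and emission parameters (this notation is assumed here):

```latex
% Likelihood obtained by marginalizing the complete-data distribution over the latent sequence Z
p(\mathbf{X} \mid \boldsymbol\theta) = \sum_{\mathbf{Z}} p(\mathbf{X}, \mathbf{Z} \mid \boldsymbol\theta),
\qquad \boldsymbol\theta = \{ \boldsymbol\pi, \mathbf{A}, \boldsymbol\phi \}
```

Because this sum couples all of the latent variables, the likelihood cannot be maximized in closed form; in practice the parameters are fitted with the EM algorithm, whose E step uses the forward-backward recursions described next.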
The forward-backward algorithm (Evaluation)
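A sketch of the recursions this algorithm evaluates, written in the conventional α-β notation (assumed here):

```latex
% Forward (alpha) recursion, alpha(z_n) = p(x_1, ..., x_n, z_n):
\alpha(z_n) = p(x_n \mid z_n) \sum_{z_{n-1}} \alpha(z_{n-1})\, p(z_n \mid z_{n-1}),
\qquad \alpha(z_1) = p(z_1)\, p(x_1 \mid z_1)
% Backward (beta) recursion, beta(z_n) = p(x_{n+1}, ..., x_N | z_n):
\beta(z_n) = \sum_{z_{n+1}} \beta(z_{n+1})\, p(x_{n+1} \mid z_{n+1})\, p(z_{n+1} \mid z_n),
\qquad \beta(z_N) = 1
% Posterior marginal and likelihood:
\gamma(z_n) = \frac{\alpha(z_n)\, \beta(z_n)}{p(\mathbf{X})},
\qquad p(\mathbf{X}) = \sum_{z_N} \alpha(z_N)
```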
The sum-product algorithm for the HMM
• Applied to Hidden Markov Models (HMMs) for computing marginal distributions of
hidden states.
• This algorithm allows for efficient computation by breaking down the problem into
manageable pieces.
• The sum-product algorithm uses factor graphs to represent the dependencies between
variables, allowing for efficient message passing.
• Start by converting the directed graph into a factor graph. The factor graph shows all
variables explicitly, both latent and observed.
• For solving the inference problem, we condition on the observed variables x_1, x_2, …, x_N.
• Absorb the emission probabilities into the transition probability factors.
• The transition probability factors, now simplified, include both the original transition
and emission probabilities.
• Alpha-Beta Algorithm Derivation: Start by denoting the final hidden variable zN as the
root node.
• Message Passing: First, pass messages from the leaf node h to the root node.
• Message Propagation: Use the general results for message propagation.

• Form of Messages: In the Hidden Markov Model, the propagated messages follow the structure derived from these general results.
• Recursion for the messages: the forward messages obey a recursion of the form shown below and correspond to the α variables.
• Messages propagated from the root node back to the leaf node play the role of the backward (β) messages.
• The sum-product algorithm also specifies how to evaluate the marginals once all the messages have been evaluated.
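A sketch of these messages in standard factor-graph notation; the factor names f_n belong to that convention rather than to the slides:

```latex
% Transition factors with the emission probabilities absorbed:
f_n(z_{n-1}, z_n) = p(z_n \mid z_{n-1})\, p(x_n \mid z_n)
% Forward messages (these are exactly the alpha variables):
\mu_{f_n \to z_n}(z_n) = \sum_{z_{n-1}} f_n(z_{n-1}, z_n)\, \mu_{f_{n-1} \to z_{n-1}}(z_{n-1}) \equiv \alpha(z_n)
% Backward messages (these are exactly the beta variables):
\mu_{f_{n+1} \to z_n}(z_n) = \sum_{z_{n+1}} f_{n+1}(z_n, z_{n+1})\, \mu_{f_{n+2} \to z_{n+1}}(z_{n+1}) \equiv \beta(z_n)
% Marginals from the product of the two incoming messages at each node:
p(z_n \mid \mathbf{X}) = \frac{\alpha(z_n)\, \beta(z_n)}{p(\mathbf{X})}
```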
Scaling Factors

• In Hidden Markov Models (HMMs), the forward-backward algorithm suffers from numerical
underflow because the forward variable involves multiplying small probabilities, causing values to
shrink exponentially.

• In (13.34) we defined $\alpha(z_n) = p(x_1, \dots, x_n, z_n)$, representing the joint distribution of all the observations up to time $n$ and the latent variable $z_n$. We now define a normalized version of $\alpha$ given by

\hat\alpha(z_n) = p(z_n \mid x_1, \dots, x_n) = \frac{\alpha(z_n)}{p(x_1, \dots, x_n)}

• In order to relate the scaled and original alpha variables, we introduce scaling factors defined by conditional distributions over the observed variables:

c_n = p(x_n \mid x_1, \dots, x_{n-1})

From the product rule, we then have

p(x_1, \dots, x_n) = \prod_{m=1}^{n} c_m

and so

\alpha(z_n) = \Bigl( \prod_{m=1}^{n} c_m \Bigr) \hat\alpha(z_n)

• We can then turn the recursion equation for $\alpha$ into one for $\hat\alpha$, given by

c_n\, \hat\alpha(z_n) = p(x_n \mid z_n) \sum_{z_{n-1}} \hat\alpha(z_{n-1})\, p(z_n \mid z_{n-1})

• We can similarly define re-scaled beta variables using

\beta(z_n) = \Bigl( \prod_{m=n+1}^{N} c_m \Bigr) \hat\beta(z_n)

which will again remain within machine precision because the quantities $\hat\beta(z_n)$ are simply the ratio of two conditional probabilities.

• The recursion result for $\beta$ then gives the following recursion for the re-scaled variables:

c_{n+1}\, \hat\beta(z_n) = \sum_{z_{n+1}} \hat\beta(z_{n+1})\, p(x_{n+1} \mid z_{n+1})\, p(z_{n+1} \mid z_n)

• In applying this recursion relation, we make use of the scaling factors $c_n$ that were previously computed in the $\alpha$ phase.

• Similarly, using these equations together, we see that the required marginals are given by

\gamma(z_n) = \hat\alpha(z_n)\, \hat\beta(z_n), \qquad
\xi(z_{n-1}, z_n) = c_n^{-1}\, \hat\alpha(z_{n-1})\, p(x_n \mid z_n)\, p(z_n \mid z_{n-1})\, \hat\beta(z_n)
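A compact sketch of these scaled recursions for a discrete-emission HMM; the parameter names pi, A, B follow the earlier illustrative examples and are assumptions, not code from the slides.

```python
import numpy as np

def forward_backward_scaled(pi, A, B, obs):
    """Scaled alpha-beta recursions for a discrete-emission HMM.

    pi: (K,) initial distribution, A: (K,K) transitions, B: (K,M) emissions,
    obs: sequence of observation indices. Returns alpha_hat, beta_hat, c.
    """
    N, K = len(obs), len(pi)
    alpha_hat = np.zeros((N, K))
    beta_hat = np.ones((N, K))
    c = np.zeros(N)                      # scaling factors c_n = p(x_n | x_1..x_{n-1})

    a = pi * B[:, obs[0]]                # unnormalized alpha(z_1)
    c[0] = a.sum()
    alpha_hat[0] = a / c[0]
    for n in range(1, N):                # forward pass
        a = B[:, obs[n]] * (alpha_hat[n - 1] @ A)
        c[n] = a.sum()
        alpha_hat[n] = a / c[n]

    for n in range(N - 2, -1, -1):       # backward pass, reusing the stored c_n
        beta_hat[n] = (A @ (B[:, obs[n + 1]] * beta_hat[n + 1])) / c[n + 1]

    # log-likelihood follows from the scaling factors: log p(X) = sum_n log c_n
    return alpha_hat, beta_hat, c

# gamma(z_n) = alpha_hat * beta_hat then gives the posterior marginals p(z_n | X)
```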
The Viterbi algorithm

The Viterbi algorithm is used to find the most probable sequence of hidden
states in Hidden Markov Models (HMMs) for a given observation sequence.
It efficiently identifies the best path through the state space, crucial in
applications like speech recognition.

• Finding the most probable sequence is different from maximizing the individual state probabilities: the latter can lead to a sequence with zero probability, because two individually most-probable states may be joined by a zero transition probability.

• The Viterbi algorithm employs the max-sum algorithm, working with log probabilities, which eliminates the need for re-scaling.

• The HMM can be represented as a factor graph, and messages are passed from the leaf node to the root.
• A fragment of the HMM lattice shows two possible paths; the Viterbi algorithm efficiently determines the most probable path from amongst the exponentially many possibilities.
• For any given path, the corresponding probability is the product of the transition-matrix elements associated with each segment of the path, together with the emission densities associated with each node on the path.
Max-Sum Message Passing

Recursion for Messages

Initialization of Messages

Maximization of Joint Distribution


Backtracking to Find Path
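A sketch of the quantities these steps refer to, in the usual ω-notation of the max-sum algorithm (the notation is assumed here):

```latex
% Recursion for the messages (omega is the max-sum message at node z_n):
\omega(z_n) = \ln p(x_n \mid z_n) + \max_{z_{n-1}} \bigl\{ \ln p(z_n \mid z_{n-1}) + \omega(z_{n-1}) \bigr\}
% Initialization of the messages:
\omega(z_1) = \ln p(z_1) + \ln p(x_1 \mid z_1)
% Maximization of the joint distribution over paths:
\max_{z_1, \dots, z_N} \ln p(\mathbf{X}, \mathbf{Z}) = \max_{z_N} \omega(z_N)
% Backtracking: store the maximizing predecessor at each step,
% \psi(z_n) = \arg\max_{z_{n-1}} \{ \ln p(z_n \mid z_{n-1}) + \omega(z_{n-1}) \},
% then trace back from the best final state to recover the most probable path.
```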

The Viterbi algorithm efficiently computes the most probable sequence of hidden
states by maintaining only the best paths, reducing the computational cost from
exponential to linear with respect to the length of the sequence.
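A minimal runnable sketch of the procedure for a discrete-emission HMM; the parameter names pi, A, B follow the earlier illustrative examples and are not from the slides.

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most probable hidden-state path for a discrete-emission HMM (log domain).

    pi: (K,) initial distribution, A: (K,K) transitions, B: (K,M) emissions,
    obs: sequence of observation indices. Returns the best path and its log probability.
    """
    N, K = len(obs), len(pi)
    log_A, log_B = np.log(A), np.log(B)
    omega = np.log(pi) + log_B[:, obs[0]]          # omega(z_1)
    psi = np.zeros((N, K), dtype=int)              # back-pointers to the best predecessor

    for n in range(1, N):                          # forward recursion, keeping only the best paths
        scores = omega[:, None] + log_A            # scores[j, k] = omega(j) + ln A[j, k]
        psi[n] = scores.argmax(axis=0)
        omega = scores.max(axis=0) + log_B[:, obs[n]]

    path = [int(omega.argmax())]                   # backtracking from the best final state
    for n in range(N - 1, 0, -1):
        path.append(int(psi[n][path[-1]]))
    return path[::-1], float(omega.max())
```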
Linear Dynamical Systems
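As a brief sketch, the linear dynamical system is the continuous-latent-variable counterpart of the HMM and is usually written in linear-Gaussian state-space form (the notation here is assumed):

```latex
% Linear-Gaussian state-space model with hidden state z_n and observation x_n:
\mathbf{z}_n = \mathbf{A}\, \mathbf{z}_{n-1} + \mathbf{w}_n, \qquad \mathbf{w}_n \sim \mathcal{N}(\mathbf{0}, \boldsymbol\Gamma)
\mathbf{x}_n = \mathbf{C}\, \mathbf{z}_n + \mathbf{v}_n, \qquad \mathbf{v}_n \sim \mathcal{N}(\mathbf{0}, \boldsymbol\Sigma)
```

Inference in this model (the analogue of the forward-backward algorithm) is carried out by the Kalman filter and smoother, and the parameters are again learned with EM.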
Inference in LDS
Learning in LDS
Extensions of LDS
Particle filters
