Probabilistic Graphical Models
Lecture 8: State-Space Models
Based on slides by Richard Zemel
Sequential data
Turn attention to sequential data
– Time-series: stock market, speech, video analysis
– Ordered: text, genes
• Need to take care: many counts may be zero in the training dataset
Character recognition example: transition probabilities between successive letters (figure)
Hidden Markov model (HMM)
• Return to the casino example -- now imagine that we do not observe
the dealer sequence ABBAA, but instead just the sequence of die rolls (1-6)
• Generative process:
Loop until tired:
1. Flip coin C (Z = A or B)
2. Chosen dealer rolls die, record result X
Z is now a hidden state variable – a 1st order Markov chain generates the
state sequence (path), governed by transition matrix A
Observations are governed by emission probabilities, which convert the state path
into a sequence of observable symbols or vectors.
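As a concrete illustration, here is a minimal NumPy sketch of this generative process; the two-dealer transition matrix, loaded-die emission probabilities, and sequence length are made-up values for illustration, not parameters from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-dealer casino HMM (illustrative parameters)
states = ["A", "B"]                       # hidden dealers
pi = np.array([0.5, 0.5])                 # initial state distribution P(Z_1)
A = np.array([[0.9, 0.1],                 # transition matrix P(Z_t | Z_{t-1})
              [0.2, 0.8]])
B = np.array([[1/6] * 6,                  # dealer A: fair die
              [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]])  # dealer B: loaded die

def sample_hmm(T=20):
    """Generative process: flip to pick a dealer, roll their die, repeat."""
    z = rng.choice(2, p=pi)
    path, rolls = [], []
    for _ in range(T):
        path.append(states[z])
        rolls.append(int(rng.choice(6, p=B[z])) + 1)  # record die face 1-6
        z = rng.choice(2, p=A[z])                     # next hidden state
    return path, rolls

path, rolls = sample_hmm()
print("hidden states:", "".join(path))
print("observed rolls:", rolls)
```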
Relationship to other models
• Can think of HMM as:
– Markov chain with stochastic measurements
Need 3 distributions:
1. Initial state: P(Z1)
2. Transition model: P(Zt|Zt-1)
3. Observation model (emission probabilities): P(Xt|Zt)
HMM: Main tasks
P(x, z) = P(z_1) P(x_1 | z_1) ∏_{t=2}^{T} P(z_t | z_{t-1}) P(x_t | z_t)
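This factorization maps directly onto code; a small sketch, reusing the pi/A/B layout from the sampling example above and assuming 0-indexed states and observations:

```python
import numpy as np

def joint_log_prob(z, x, pi, A, B):
    """log P(x, z) = log P(z1) + log P(x1|z1) + sum_t [log P(zt|zt-1) + log P(xt|zt)]."""
    lp = np.log(pi[z[0]]) + np.log(B[z[0], x[0]])
    for t in range(1, len(z)):
        lp += np.log(A[z[t - 1], z[t]]) + np.log(B[z[t], x[t]])
    return lp
```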
• Three problems
1. Computing probability of observed sequence: forward-backward algorithm [good for recognition]
2. Infer most likely hidden state sequence: Viterbi algorithm [useful for interpretation] (sketched below)
3. Learning parameters: Baum-Welch algorithm (version of EM)
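For problem 2, a compact log-space Viterbi sketch (same pi/A/B convention as above; this is the standard dynamic-programming recursion, not code from the slides):

```python
import numpy as np

def viterbi(x, pi, A, B):
    """Most likely hidden state path argmax_z P(z | x), in log space."""
    T, K = len(x), len(pi)
    log_delta = np.log(pi) + np.log(B[:, x[0]])     # best log-prob ending in each state
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = log_delta[:, None] + np.log(A)     # rows: previous state, cols: current
        backptr[t] = scores.argmax(axis=0)
        log_delta = scores.max(axis=0) + np.log(B[:, x[t]])
    path = [int(log_delta.argmax())]
    for t in range(T - 1, 0, -1):                   # follow back-pointers
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]
```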
Fully observed HMM
Learning a fully observed HMM (observing both X and Z) is easy:
1. Initial state: P(Z1) – proportion of words that start with each letter
2. Transition model: P(Zt|Zt-1) – proportion of times a given letter follows another (bigram statistics)
3. Observation model (emission probabilities): P(Xt|Zt) – how often a particular image represents a specific character, relative to all images
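A sketch of these counting estimates, assuming training data given as (state sequence, observation sequence) pairs of integer indices; the smoothing remark mirrors the earlier warning about zero counts:

```python
import numpy as np

def fit_fully_observed_hmm(sequences, K, V):
    """ML estimation by counting; sequences is a list of (z, x) integer-index pairs."""
    pi = np.zeros(K)          # initial-state counts
    A = np.zeros((K, K))      # transition counts
    B = np.zeros((K, V))      # emission counts
    for z, x in sequences:
        pi[z[0]] += 1
        for t in range(1, len(z)):
            A[z[t - 1], z[t]] += 1
        for zt, xt in zip(z, x):
            B[zt, xt] += 1
    # Normalize counts into probabilities; add smoothing if some counts may be zero.
    return (pi / pi.sum(),
            A / A.sum(axis=1, keepdims=True),
            B / B.sum(axis=1, keepdims=True))
```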
But still have to do inference at test time: work out states given
observations
HMMs are often used where the hidden states have a clear interpretation: words in
speech recognition; activity recognition; spatial position of a rat; genes; POS tagging
HMM: Inference tasks
Important to infer distributions over hidden states:
§ If states are interpretable, infer interpretations
§ Also essential for learning
P(Z_t | X_{1:t}) ∝ P(X_t | Z_t) ∑_{z_{t-1}} P(Z_t | z_{t-1}) P(z_{t-1} | X_{1:t-1})
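In code, this recursion is the (normalized) forward filtering pass; a sketch with the same pi/A/B layout as before:

```python
import numpy as np

def filter_hmm(x, pi, A, B):
    """Return alpha[t, k] = P(Z_t = k | x_{1:t}) for every time step."""
    T, K = len(x), len(pi)
    alpha = np.zeros((T, K))
    alpha[0] = pi * B[:, x[0]]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        predict = alpha[t - 1] @ A        # sum_{z_{t-1}} P(Z_t | z_{t-1}) P(z_{t-1} | x_{1:t-1})
        alpha[t] = B[:, x[t]] * predict   # weight by the likelihood P(x_t | Z_t)
        alpha[t] /= alpha[t].sum()        # normalize to a proper posterior
    return alpha
```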
• Semi-Markov HMM
– Improve model of state duration
• Autoregressive HMM
– Allow observations to depend on some previous
observations directly
• Factorial HMM
– Expand dim. of latent state
State Space Models
Instead of the discrete latent state of the HMM, model Z as a
continuous latent variable
Standard formulation: linear-Gaussian (LDS), with (hidden
state Z, observation Y, other variables U)
– Transition model is linear
z_t = A_t z_{t-1} + B_t u_t + ε_t
– with Gaussian noise
ε_t ~ N(0, Q_t)
– Observation model is linear
y_t = C_t z_t + D_t u_t + δ_t
– with Gaussian noise
δ_t ~ N(0, R_t)
Model parameters typically independent of time: stationary
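A minimal sketch that rolls such a stationary linear-Gaussian model forward; the function signature and any specific A, B, C, D, Q, R values you pass in are illustrative assumptions, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_lds(A, B, C, D, Q, R, u, z0):
    """Sample z_t = A z_{t-1} + B u_t + eps_t and y_t = C z_t + D u_t + delta_t."""
    z = z0
    zs, ys = [], []
    for t in range(len(u)):
        z = A @ z + B @ u[t] + rng.multivariate_normal(np.zeros(Q.shape[0]), Q)
        y = C @ z + D @ u[t] + rng.multivariate_normal(np.zeros(R.shape[0]), R)
        zs.append(z)
        ys.append(y)
    return np.array(zs), np.array(ys)
```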
Kalman Filter
Algorithm for filtering in linear-Gaussian state space model
Everything is Gaussian, so can compute updates exactly
Dynamics update: predict next belief state
p(z_t | y_{1:t-1}, u_{1:t}) = ∫ N(z_t | A_t z_{t-1} + B_t u_t, Q_t) N(z_{t-1} | μ_{t-1}, Σ_{t-1}) dz_{t-1}
μ_{t|t-1} = A_t μ_{t-1} + B_t u_t
Σ_{t|t-1} = A_t Σ_{t-1} A_t^T + Q_t
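The dynamics (predict) update as code; a sketch that propagates the previous Gaussian belief (mu, Sigma) through the linear dynamics:

```python
import numpy as np

def kalman_predict(mu, Sigma, A, B, u, Q):
    """Dynamics update: push the Gaussian belief through the linear dynamics."""
    mu_pred = A @ mu + B @ u              # mu_{t|t-1}
    Sigma_pred = A @ Sigma @ A.T + Q      # Sigma_{t|t-1}
    return mu_pred, Sigma_pred
```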
Kalman Filter: Measurement Update
Key step: update hidden state given new measurement:
p(z_t | y_{1:t}, u_{1:t}) ∝ p(y_t | z_t, u_t) p(z_t | y_{1:t-1}, u_{1:t})
The first term is a bit complicated, but by applying standard identities
(such as the matrix inversion lemma and Bayes' rule) we obtain:
p(z_t | y_{1:t}, u_{1:t}) = N(z_t | μ_t, Σ_t)
The mean update depends on the Kalman gain matrix K_t and the
residual or innovation r_t = y_t − E[y_t | y_{1:t-1}]
μ_t = μ_{t|t-1} + K_t r_t
K_t = Σ_{t|t-1} C_t^T S_t^{-1}
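And the measurement update in the same style; a sketch using the gain/residual form above, where S_t = C_t Σ_{t|t-1} C_t^T + R_t is the innovation covariance (the covariance update line is the standard form, not written out on the slide):

```python
import numpy as np

def kalman_update(mu_pred, Sigma_pred, y, C, D, u, R):
    """Measurement update: condition the predicted belief on observation y_t."""
    r = y - (C @ mu_pred + D @ u)              # residual / innovation r_t
    S = C @ Sigma_pred @ C.T + R               # innovation covariance S_t
    K = Sigma_pred @ C.T @ np.linalg.inv(S)    # Kalman gain K_t
    mu = mu_pred + K @ r                       # posterior mean mu_t
    Sigma = Sigma_pred - K @ C @ Sigma_pred    # posterior covariance Sigma_t
    return mu, Sigma
```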