EM Algorithm
The density of the N(µ, σ²) distribution is

f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}

Taking σ = 1, the likelihood of a sample x1, x2, …, xn is

L(\mu) = \frac{1}{(2\pi)^{n/2}}\, e^{-\frac{1}{2}\sum_{i=1}^{n}(x_i-\mu)^2}

so that

\log L(\mu) = -\frac{n}{2}\log 2\pi - \frac{1}{2}\sum_{i=1}^{n}(x_i-\mu)^2
Suppose the last observation xn is missing. In the E-step the missing value is replaced by its conditional expectation under the current guess µ0, which gives the expected complete-data log-likelihood Q(µ, µ0). The M-step maximises Q:

\frac{dQ(\mu,\mu_0)}{d\mu} = 0 \;\Rightarrow\; (x_1 + x_2 + \dots + x_{n-1} + \mu_0) - n\mu = 0

\mu_1 = \frac{x_1 + x_2 + \dots + x_{n-1} + \mu_0}{n}

Using the initial guess µ0, the estimate of µ is obtained as µ1 = (x1 + x2 + … + xn-1 + µ0)/n, and the process is repeated (at the k-th stage, µk+1 = (x1 + x2 + … + xn-1 + µk)/n) until it converges to the estimated value of µ.
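As a check, the E-step can be written out explicitly; a brief sketch, using the fact that under the guess µ0 the missing xn has conditional mean µ0 and variance 1:

Q(\mu,\mu_0) = -\frac{n}{2}\log 2\pi - \frac{1}{2}\left[\sum_{i=1}^{n-1}(x_i-\mu)^2 + E\{(x_n-\mu)^2 \mid \mu_0\}\right], \qquad E\{(x_n-\mu)^2 \mid \mu_0\} = 1 + (\mu_0-\mu)^2

Differentiating this Q with respect to µ and setting the derivative to zero reproduces the update above.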
Assume that we observe a sample x = (9, 11, ?) from the univariate normal distribution N(µ, 1); estimate the missing observation.

Here n = 3, so

\mu_{k+1} = \frac{x_1 + x_2 + \dots + x_{n-1} + \mu_k}{n} = \frac{x_1 + x_2 + \mu_k}{3}

Let the initial guess be µ0 = 0. Then the first updated estimate of µ is

\mu_1 = \frac{9 + 11 + \mu_0}{3} = \frac{9 + 11 + 0}{3} = 6.67

Using this formula again gives

\mu_2 = \frac{9 + 11 + \mu_1}{3} = \frac{9 + 11 + 6.67}{3} = 8.89

and so on.
µn µn+1
0 6.67
6.67 8.89
8.89 9.63
9.63 9.88
9.88 9.96
9.96 9.99
9.99 10.00
10.00 10.00
It can be seen that µn → 10 as n →∞ .
For this simple model (a univariate normal distribution with unit variance), the limit can be verified directly: the fixed point of µ = (9 + 11 + µ)/3 is µ = 10, so substituting the average of the known values, (9 + 11)/2 = 10, is the best answer to replace the missing value.
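A minimal numerical sketch of this iteration (plain Python; the sample values 9 and 11 and the starting guess µ0 = 0 are those of the example above):

```python
# EM iteration for one missing value from N(mu, 1):
# mu_{k+1} = (x1 + ... + x_{n-1} + mu_k) / n
observed = [9.0, 11.0]   # known observations
n = len(observed) + 1    # total sample size, including the missing value
mu = 0.0                 # initial guess mu_0

for k in range(20):
    mu_next = (sum(observed) + mu) / n   # M-step with the missing value replaced by mu_k
    if abs(mu_next - mu) < 1e-6:         # stop when the update no longer changes
        break
    mu = mu_next

print(round(mu, 4))      # converges to 10.0, the mean of the observed values
```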
2. Let X1, X2, …, Xn be a random sample from N(µ, σ²), where m out of the n observations are missing. Estimate the missing observations using the EM algorithm.

X ~ N(µ, σ²)
The density is

f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}

Writing the observed values as x1, …, xn−m and the missing ones as z1, …, zm, the complete-data likelihood is

L = \frac{1}{(\sqrt{2\pi})^{n}\,\sigma^{n}}\, e^{-\frac{1}{2\sigma^2}\left[\sum_{i=1}^{n-m}(x_i-\mu)^2 + \sum_{i=1}^{m}(z_i-\mu)^2\right]}

L \;\propto\; \frac{1}{\sigma^{n}}\, e^{-\frac{1}{2\sigma^2}\left[\sum_{i=1}^{n-m}(x_i-\mu)^2 + \sum_{i=1}^{m}(z_i-\mu)^2\right]}

\log L = -n\log\sigma - \frac{\sum x_i^2}{2\sigma^2} - \frac{\sum z_i^2}{2\sigma^2} + \frac{\mu}{\sigma^2}\left[\sum_{i=1}^{n-m}x_i + \sum_{i=1}^{m}z_i\right] - \frac{n\mu^2}{2\sigma^2}

E-step: given the current estimates µ0 and σ0², E(z_i) = µ0 and E(z_i²) = V(z_i) + (E(z_i))² = σ0² + µ0², so

Q(\mu,\sigma^2;\mu_0,\sigma_0^2) = E(\log L \mid x) = -n\log\sigma - \frac{\sum_{i=1}^{n-m} x_i^2}{2\sigma^2} - \frac{m(\sigma_0^2+\mu_0^2)}{2\sigma^2} + \frac{\mu}{\sigma^2}\left[\sum_{i=1}^{n-m}x_i + m\mu_0\right] - \frac{n\mu^2}{2\sigma^2}

M-step:

\frac{dQ(\mu,\sigma^2;\mu_0,\sigma_0^2)}{d\mu} = 0 \;\Rightarrow\; \frac{1}{\sigma^2}\left[\sum_{i=1}^{n-m}x_i + m\mu_0\right] - \frac{n\mu}{\sigma^2} = 0

\therefore\; \mu_1 = \frac{\sum_{i=1}^{n-m}x_i + m\mu_0}{n}

\frac{dQ(\mu,\sigma^2;\mu_0,\sigma_0^2)}{d\sigma} = -\frac{n}{\sigma} + \frac{\sum_{i=1}^{n-m}x_i^2}{\sigma^3} + \frac{m(\sigma_0^2+\mu_0^2)}{\sigma^3} - \frac{2\mu\left[\sum_{i=1}^{n-m}x_i + m\mu_0\right]}{\sigma^3} + \frac{n\mu^2}{\sigma^3} = 0

\therefore\; \sigma_1^2 = \frac{\sum_{i=1}^{n-m}x_i^2 + m(\sigma_0^2+\mu_0^2)}{n} - \mu_1^2

At the k-th stage the updates are therefore

\mu_{k+1} = \frac{\sum_{i=1}^{n-m}x_i + m\mu_k}{n}, \qquad \sigma_{k+1}^2 = \frac{\sum_{i=1}^{n-m}x_i^2 + m(\sigma_k^2+\mu_k^2)}{n} - \mu_{k+1}^2
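A short numerical sketch of these updates in Python (the observed values, the number of missing observations and the starting guesses below are made-up illustrative numbers, not from the text):

```python
# EM for N(mu, sigma^2) with m missing observations out of n.
# Updates: mu_{k+1}     = (sum(x) + m*mu_k) / n
#          sigma2_{k+1} = (sum(x^2) + m*(sigma2_k + mu_k^2)) / n - mu_{k+1}^2
observed = [4.2, 5.1, 6.3, 5.8, 4.9]   # hypothetical observed values (n - m of them)
m = 2                                   # hypothetical number of missing observations
n = len(observed) + m

mu, sigma2 = 0.0, 1.0                   # initial guesses mu_0, sigma_0^2
sx, sx2 = sum(observed), sum(x * x for x in observed)

for k in range(100):
    mu_new = (sx + m * mu) / n
    sigma2_new = (sx2 + m * (sigma2 + mu * mu)) / n - mu_new ** 2
    if abs(mu_new - mu) < 1e-8 and abs(sigma2_new - sigma2) < 1e-8:
        break
    mu, sigma2 = mu_new, sigma2_new

print(round(mu, 4), round(sigma2, 4))   # converges to the MLE based on the observed data
```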
\left(\frac{125\,\theta_0}{\theta_0+2} + x_5\right)(1-\theta) - (x_3+x_4)\,\theta = 0

\left(125\,\theta_0 + x_5(\theta_0+2)\right)(1-\theta) - (x_3+x_4)\,\theta\,(\theta_0+2) = 0

\theta\left(125\,\theta_0 + x_5(\theta_0+2) + (x_3+x_4)(\theta_0+2)\right) = 125\,\theta_0 + x_5(\theta_0+2)

\theta_1 = \frac{(125+x_5)\,\theta_0 + 2x_5}{(125+x_5+x_3+x_4)\,\theta_0 + 2(x_5+x_3+x_4)}

\theta_1 = \frac{159\,\theta_0 + 68}{197\,\theta_0 + 144}
Starting with θ0 = 0.5, the above formula gives θ1 = 0.6082. Applying the formula repeatedly, the successive values of θ converge to 0.6268:
n θn
0 0.5
1 0.6082
2 0.6243
3 0.6265
4 0.6268
5 0.6268
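A quick sketch of this fixed-point iteration in Python (using the starting value θ0 = 0.5 from the text):

```python
# EM update for the multinomial example:
# theta_{k+1} = (159*theta_k + 68) / (197*theta_k + 144)
theta = 0.5                                 # initial guess theta_0
for k in range(10):
    theta = (159 * theta + 68) / (197 * theta + 144)
    print(k + 1, round(theta, 4))           # 0.6082, 0.6243, 0.6265, 0.6268, ...
```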
MLE method
L = \frac{197!}{y_1!\,y_2!\,y_3!\,y_4!}\left(\frac{2+\theta}{4}\right)^{y_1}\left(\frac{1-\theta}{4}\right)^{y_2}\left(\frac{1-\theta}{4}\right)^{y_3}\left(\frac{\theta}{4}\right)^{y_4}

L(\theta \mid x) = c\,(2+\theta)^{y_1}(1-\theta)^{y_2+y_3}\,\theta^{y_4}

\log L(\theta \mid x) = y_1\log(2+\theta) + (y_2+y_3)\log(1-\theta) + y_4\log\theta

\frac{d\log L(\theta \mid x)}{d\theta} = 0 \;\text{gives}\; \frac{y_1}{2+\theta} - \frac{y_2+y_3}{1-\theta} + \frac{y_4}{\theta} = 0

(1-\theta)\,\theta\, y_1 - (2+\theta)\,\theta\,(y_2+y_3) + (2+\theta)(1-\theta)\,y_4 = 0

\theta^2(y_1+y_2+y_3+y_4) + \theta(-y_1+2y_2+2y_3+y_4) - 2y_4 = 0

197\theta^2 - 15\theta - 68 = 0

\theta = 0.6268
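The direct MLE can be checked by solving this quadratic numerically; a small sketch in Python:

```python
import math

# Solve the likelihood equation 197*theta^2 - 15*theta - 68 = 0 from the MLE method above
a, b, c = 197.0, -15.0, -68.0
theta = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)   # take the root in (0, 1)
print(round(theta, 4))   # 0.6268, matching the limit of the EM iterations
```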
Assume that the identities of the coins used for each set of tosses are not known; the coin identity is a hidden variable. The EM algorithm is used to obtain estimates of θ1 and θ2 in this situation.
It turns out that we can make progress by starting with a guess for the coin biases, which allows us to estimate which coin was chosen in each trial and to compute the expected number of heads and tails for each coin across the trials (E-step). We then use these expected counts to recompute a better guess for each coin bias (M-step). By repeating these two steps, we continue to get better estimates of the two coin biases and converge to a solution that turns out to be a local maximum of the likelihood.
The E-step: estimating the probability that each coin was chosen. Let the series of flips be the event E, and let the numbers of heads and tails in E be h and t. Let ZA and ZB be the events that coin A and coin B, respectively, are chosen.
First, let's assume that the coin used is coin A; then the probability of seeing these flips would be

P(E \mid Z_A) = \theta_1^h (1-\theta_1)^t

and similarly, for coin B, P(E \mid Z_B) = \theta_2^h (1-\theta_2)^t.

Using Bayes' theorem,

P(Z_A \mid E) = \frac{P(E \mid Z_A)P(Z_A)}{P(E \mid Z_A)P(Z_A) + P(E \mid Z_B)P(Z_B)} \;\left(= \frac{P(E \cap Z_A)}{P(E)} = \frac{P(E \cap Z_A)}{P(E \cap Z_A) + P(E \cap Z_B)}\right)

Since each coin is assumed equally likely to be chosen, P(Z_A) = P(Z_B) = 1/2 and

P(Z_A \mid E) = \frac{P(E \mid Z_A)}{P(E \mid Z_A) + P(E \mid Z_B)} = \frac{\theta_1^h (1-\theta_1)^t}{\theta_1^h (1-\theta_1)^t + \theta_2^h (1-\theta_2)^t}

and for coin B,

P(Z_B \mid E) = \frac{\theta_2^h (1-\theta_2)^t}{\theta_1^h (1-\theta_1)^t + \theta_2^h (1-\theta_2)^t}
The E-step: assuming θ1 = 0.6 and θ2 = 0.5, the probabilities for the first series of flips (5 heads, 5 tails) are

P(Z_A \mid E) = \frac{0.6^5 \times 0.4^5}{0.6^5 \times 0.4^5 + 0.5^5 \times 0.5^5} = 0.45, \qquad P(Z_B \mid E) = \frac{0.5^5 \times 0.5^5}{0.6^5 \times 0.4^5 + 0.5^5 \times 0.5^5} = 0.55
Flips          P(ZA|E)   P(ZB|E)   Heads attributed to A   Heads attributed to B
HTTTHHTHTH     0.45      0.55      2.2                     2.8
HHHHTHHHHH     0.80      0.20      7.2                     1.8
HTHHHHHTHH     0.73      0.27      5.9                     2.1
HTHTTTHHTT     0.35      0.65      1.4                     2.6
THHHTHHHTH     0.65      0.35      4.5                     2.5
Total          2.98      2.02      21.2                    11.8

Here P(ZA|E) is the probability that the series came from coin A and P(ZB|E) the probability that it came from coin B.
The M-step: revised estimates of θ1 and θ2 are obtained by dividing the expected number of heads by the expected total number of flips attributed to each coin:

\theta_1 = \frac{21.2}{10 \times 2.98} = 0.71, \qquad \theta_2 = \frac{11.8}{10 \times 2.02} = 0.58
With the updated estimates for θ1 and θ2, we can repeat the E-step and then the M-step.
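A compact sketch of one full E-step and M-step for this example in Python (the five flip sequences and the starting values θ1 = 0.6, θ2 = 0.5 are those used in the table above):

```python
# One EM iteration for the two-coin problem.
flips = ["HTTTHHTHTH", "HHHHTHHHHH", "HTHHHHHTHH", "HTHTTTHHTT", "THHHTHHHTH"]
theta1, theta2 = 0.6, 0.5          # current guesses for the biases of coins A and B

heads_A = heads_B = flips_A = flips_B = 0.0
for seq in flips:
    h = seq.count("H")
    t = len(seq) - h
    # E-step: posterior probability that this sequence came from coin A (equal priors)
    pA = theta1**h * (1 - theta1)**t
    pB = theta2**h * (1 - theta2)**t
    wA = pA / (pA + pB)
    wB = 1 - wA
    heads_A += wA * h
    heads_B += wB * h
    flips_A += wA * len(seq)
    flips_B += wB * len(seq)

# M-step: new biases = expected heads / expected flips for each coin
theta1, theta2 = heads_A / flips_A, heads_B / flips_B
print(round(theta1, 2), round(theta2, 2))   # approximately 0.71 and 0.58
```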
n = 3, k = 2 (three observations, two mixture components). The complete-data likelihood is

L = \prod_{i=1}^{n}\prod_{j=1}^{k}\left[\pi_j\, f(x_i, \mu_j, \sigma_j^2)\right]^{z_{ij}}

which for n = 3, k = 2 is

L = [\pi_1 f(x_1,\mu_1,\sigma_1^2)]^{z_{11}} [\pi_2 f(x_1,\mu_2,\sigma_2^2)]^{z_{12}} [\pi_1 f(x_2,\mu_1,\sigma_1^2)]^{z_{21}} [\pi_2 f(x_2,\mu_2,\sigma_2^2)]^{z_{22}} [\pi_1 f(x_3,\mu_1,\sigma_1^2)]^{z_{31}} [\pi_2 f(x_3,\mu_2,\sigma_2^2)]^{z_{32}}

Maximise Q = E(log L | x and the current estimates of µj, σj²) subject to the condition \sum_{j=1}^{k}\pi_j = 1. Writing γij = E(zij | xi), the posterior probability that xi belongs to component j, the Lagrangian is

\Phi = \sum_{i=1}^{n}\sum_{j=1}^{k}\gamma_{ij}\left[\log\pi_j + \log f(x_i, \mu_j, \sigma_j^2)\right] + \lambda\left(1 - \sum_{j=1}^{k}\pi_j\right)
\frac{d\Phi}{d\pi_j} = 0 \;\therefore\; \sum_{i=1}^{n}\frac{\gamma_{ij}}{\pi_j} - \lambda = 0 \;\therefore\; \pi_j = \frac{\sum_{i=1}^{n}\gamma_{ij}}{\lambda}

\therefore\; \sum_{j=1}^{k}\pi_j = \frac{\sum_{i=1}^{n}\sum_{j=1}^{k}\gamma_{ij}}{\lambda} = 1

\therefore\; \lambda = \frac{\sum_{i=1}^{n}\sum_{j=1}^{k}\gamma_{ij}}{\sum_{j=1}^{k}\pi_j} = \sum_{i=1}^{n}\sum_{j=1}^{k}\gamma_{ij} = n

\therefore\; \pi_j = \frac{\sum_{i=1}^{n}\gamma_{ij}}{\sum_{i=1}^{n}\sum_{j=1}^{k}\gamma_{ij}} = \frac{\sum_{i=1}^{n}\gamma_{ij}}{n} \qquad (2)
f(x_i, \mu_j, \sigma_j^2) = \frac{1}{\sqrt{2\pi}\,\sigma_j}\, e^{-\frac{(x_i-\mu_j)^2}{2\sigma_j^2}}

Differentiating Q with respect to µj and setting the derivative to zero,

\frac{dQ}{d\mu_j} = 0 \;\therefore\; \sum_{i=1}^{n}\gamma_{ij}\,\frac{(x_i-\mu_j)}{\sigma_j^2} = 0 \;\therefore\; \sum_{i=1}^{n}\gamma_{ij}\,x_i = \mu_j \sum_{i=1}^{n}\gamma_{ij}

\mu_j = \frac{\sum_{i=1}^{n}\gamma_{ij}\,x_i}{\sum_{i=1}^{n}\gamma_{ij}} \qquad (3)
\frac{dQ}{d\sigma_j} = 0 \;\therefore\; \sum_{i=1}^{n}\gamma_{ij}\left(-\frac{1}{\sigma_j} + \frac{(x_i-\mu_j)^2}{\sigma_j^3}\right) = 0

\sigma_j^2 \sum_{i=1}^{n}\gamma_{ij} = \sum_{i=1}^{n}\gamma_{ij}(x_i-\mu_j)^2

\sigma_j^2 = \frac{\sum_{i=1}^{n}\gamma_{ij}(x_i-\mu_j)^2}{\sum_{i=1}^{n}\gamma_{ij}} \qquad (4)
The EM algorithm is sensitive to the initial values of the parameters, so care must be taken in
the first step. However, assuming the initial values are “valid,” one property of the EM
algorithm is that the log-likelihood increases at every step.
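A minimal numerical sketch of the full cycle, the E-step for γij followed by updates (2), (3) and (4), on made-up one-dimensional data (the data values and starting parameters below are illustrative assumptions, not from the text); the printed log-likelihood can be seen to increase at every iteration:

```python
import math

# E-step / M-step for a univariate Gaussian mixture, following updates (2)-(4).
def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

x = [1.0, 1.2, 0.8, 5.1, 4.9, 5.3]           # hypothetical data with two visible clusters
pi = [0.5, 0.5]                               # initial mixing proportions
mu = [0.0, 6.0]                               # initial component means
var = [1.0, 1.0]                              # initial component variances
k = 2

for step in range(50):
    # E-step: responsibilities gamma[i][j] = P(component j | x_i)
    gamma = []
    for xi in x:
        w = [pi[j] * normal_pdf(xi, mu[j], var[j]) for j in range(k)]
        s = sum(w)
        gamma.append([wj / s for wj in w])

    # M-step: updates (2), (3), (4)
    for j in range(k):
        nj = sum(g[j] for g in gamma)
        pi[j] = nj / len(x)                                                     # (2)
        mu[j] = sum(g[j] * xi for g, xi in zip(gamma, x)) / nj                  # (3)
        var[j] = sum(g[j] * (xi - mu[j]) ** 2 for g, xi in zip(gamma, x)) / nj  # (4)

    # Observed-data log-likelihood; it increases (or stays the same) at every EM step
    ll = sum(math.log(sum(pi[j] * normal_pdf(xi, mu[j], var[j]) for j in range(k))) for xi in x)
    print(step, round(ll, 4))
```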