Lacey Thacker Tutorial
1996-002
Internal.
Tutorial: The Kalman Filter.
N.A.Thacker, A.J.Lacey.
Last updated
1 / 12 / 1998
See Also Tina Memo: 1995-1998
Imaging Science and Biomedical Engineering Division,
Medical School, University of Manchester,
Stopford Building, Oxford Road,
Manchester, M13 9PT.
Tony Lacey and Neil Thacker.
1 Introduction
The Kalman filter [1] has long been regarded as the optimal solution to many tracking and data prediction tasks [2]. Its use in the analysis of visual motion has been documented frequently. The standard Kalman filter derivation is given here as a tutorial exercise in the practical use of some of the statistical techniques outlined in previous sections. The filter is constructed as a mean squared error minimiser, but an alternative derivation of the filter is also provided showing how the filter relates to maximum likelihood statistics. Documenting this derivation furnishes the reader with further insight into the statistical constructs within the filter.
The purpose of filtering is to extract the required information from a signal, ignoring everything else. How well a filter performs this task can be measured using a cost or loss function. Indeed, we may define the goal of the filter to be the minimisation of this loss function.
2 Mean squared error
Many signals can be described in the following way;
$$ y_k = a_k x_k + n_k \tag{1} $$
where $y_k$ is the time-dependent observed signal, $a_k$ is a gain term, $x_k$ is the information-bearing signal and $n_k$ is the additive noise.
The overall objective is to estimate $x_k$. The difference between the estimate $\hat{x}_k$ and $x_k$ itself is termed the error;
$$ f(e_k) = f(x_k - \hat{x}_k) \tag{2} $$
The particular shape of $f(e_k)$ is dependent upon the application, however it is clear that the function should be both positive and increase monotonically [3]. An error function which exhibits these characteristics is the squared error function;
$$ f(e_k) = (x_k - \hat{x}_k)^2 \tag{3} $$
Since it is necessary to consider the ability of the filter to predict many data over a period of time, a more meaningful metric is the expected value of the error function;
$$ \text{loss function} = E\left( f(e_k) \right) \tag{4} $$
This results in the mean squared error (MSE) function;
$$ \epsilon(t) = E\left( e_k^2 \right) \tag{5} $$
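As a small illustration of equation 5, the expectation can be approximated by a sample average over a finite record. The sketch below assumes the true signal values are available for comparison, which is only the case in simulation; the function name and the sample numbers are hypothetical.

```python
def mean_squared_error(true_values, estimates):
    """Sample approximation to E[e_k^2], with e_k = x_k - x_hat_k."""
    if len(true_values) != len(estimates):
        raise ValueError("sequences must have the same length")
    errors = [x - x_hat for x, x_hat in zip(true_values, estimates)]
    return sum(e * e for e in errors) / len(errors)


x = [1.0, 2.0, 3.0, 4.0]        # hypothetical signal values x_k
x_hat = [1.5, 2.0, 2.5, 5.0]    # hypothetical estimates of x_k
mse = mean_squared_error(x, x_hat)
```

Because the error is squared, positive and negative errors contribute equally and large errors dominate the average.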
3 Maximum likelihood
The above derivation of mean squared error, although intuitive, is somewhat heuristic. A more rigorous derivation can be developed using maximum likelihood statistics. This is achieved by redefining the goal of the filter to finding the $\hat{x}$ which maximises the probability or likelihood of $y$. That is;
$$ \max\left[ P(y \mid \hat{x}) \right] \tag{6} $$
Assuming that the additive random noise is Gaussian distributed with a standard deviation of $\sigma_k$ gives;
$$ P(y_k \mid \hat{x}_k) = K_k \exp\left( -\frac{(y_k - a_k \hat{x}_k)^2}{2\sigma_k^2} \right) \tag{7} $$
where $K_k$ is a normalisation constant. The maximum likelihood function of this is;
$$ P(y \mid \hat{x}) = \prod_k K_k \exp\left( -\frac{(y_k - a_k \hat{x}_k)^2}{2\sigma_k^2} \right) \tag{8} $$
Which leads to;
$$ \log P(y \mid \hat{x}) = -\frac{1}{2} \sum_k \left( \frac{(y_k - a_k \hat{x}_k)^2}{\sigma_k^2} \right) + \text{constant} \tag{9} $$
The driving function of equation 9 is the MSE, which enters with a negative sign, so the likelihood is maximised by the choice of $\hat{x}_k$ that minimises it. Therefore the mean squared error function is applicable when the expected variation of $y_k$ is best modelled as a Gaussian distribution. In such a case the MSE serves to provide the value of $\hat{x}_k$ which maximises the likelihood of the signal $y_k$.
In the following derivation the optimal filter is defined as being that filter, from the set of all possible filters, which minimises the mean squared error.
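The link between equation 9 and least squares can be checked numerically. The sketch below makes a simplifying assumption that is not in the text: the signal is a single constant $x$ observed through known gains $a_k$, so the maximiser of the log-likelihood has a closed form. All names and numbers are illustrative.

```python
def log_likelihood(x_hat, y, a, sigma):
    """Equation 9 without the additive constant, for a constant signal."""
    return -0.5 * sum(((yk - ak * x_hat) / sk) ** 2
                      for yk, ak, sk in zip(y, a, sigma))


def weighted_least_squares(y, a, sigma):
    """Closed-form minimiser of the variance-weighted squared error."""
    num = sum(ak * yk / sk ** 2 for yk, ak, sk in zip(y, a, sigma))
    den = sum(ak ** 2 / sk ** 2 for ak, sk in zip(a, sigma))
    return num / den


y = [2.1, 3.9, 6.2]        # hypothetical observations y_k
a = [1.0, 2.0, 3.0]        # hypothetical gains a_k
sigma = [0.5, 0.5, 1.0]    # hypothetical noise standard deviations

x_ls = weighted_least_squares(y, a, sigma)
# Compare the least squares solution against nearby candidate values.
candidates = [x_ls + d for d in (-0.1, -0.01, 0.0, 0.01, 0.1)]
best = max(candidates, key=lambda c: log_likelihood(c, y, a, sigma))
```

Among the candidates tried, the least squares solution is the one with the highest log-likelihood, as the Gaussian noise assumption predicts.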
4 Kalman Filter Derivation
Before going on to discuss the Kalman filter, the work of Norbert Wiener [4] should first be acknowledged. Wiener described an optimal finite impulse response (FIR) filter in the mean squared error sense. His solution will not be discussed here even though it has much in common with the Kalman filter. Suffice to say that his solution uses both the autocorrelation and the cross-correlation of the received signal with the original data in order to derive an impulse response for the filter.
Kalman also presented a prescription of the optimal MSE filter. However, Kalman's prescription has some advantages over Wiener's; it sidesteps the need to determine the impulse response of the filter, something which is poorly suited to numerical computation. Kalman described his filter using state space techniques which, unlike Wiener's prescription, enable the filter to be used as either a smoother, a filter or a predictor. The last of these three, the ability of the Kalman filter to be used to predict data, has proven to be a very useful function. It has led to the Kalman filter being applied to a wide range of tracking and navigation problems. Defining the filter in terms of state space methods also simplifies the implementation of the filter in the discrete domain, another reason for its widespread appeal.
5 State space derivation
Assume that we want to know the value of a variable within a process of the form;
$$ x_{k+1} = \Phi x_k + w_k \tag{10} $$
where $x_k$ is the state vector of the process at time $k$, $(n \times 1)$; $\Phi$ is the state transition matrix of the process from the state at $k$ to the state at $k+1$, and is assumed stationary over time, $(n \times n)$; $w_k$ is the associated white noise process with known covariance, $(n \times 1)$.
Observations on this variable can be modelled in the form;
$$ z_k = H x_k + v_k \tag{11} $$
where $z_k$ is the actual measurement of $x$ at time $k$, $(m \times 1)$; $H$ is the noiseless connection between the state vector and the measurement vector, and is assumed stationary over time, $(m \times n)$; $v_k$ is the associated measurement error, $(m \times 1)$. This is again assumed to be a white noise process with known covariance and has zero cross-correlation with the process noise.
As was shown in section 3, for the minimisation of the MSE to yield the optimal filter it must be possible to correctly model the system errors using Gaussian distributions. The covariances of the two noise models are assumed stationary over time and are given by;
$$ Q = E\left[ w_k w_k^T \right] \tag{12} $$
$$ R = E\left[ v_k v_k^T \right] \tag{13} $$
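To make the process and measurement models of equations 10 and 11 concrete, the following sketch generates data from them in the scalar case ($n = m = 1$), where $\Phi$, $H$, $Q$ and $R$ reduce to single numbers. The function name, the seeded generator and the parameter values are assumptions made for illustration only.

```python
import random


def simulate(phi, h, q, r, x0, steps, seed=0):
    """Generate scalar states x_k and measurements z_k.

    q and r are the variances of the process and measurement noise.
    """
    rng = random.Random(seed)
    x = x0
    states, measurements = [], []
    for _ in range(steps):
        states.append(x)
        # z_k = H x_k + v_k
        measurements.append(h * x + rng.gauss(0.0, r ** 0.5))
        # x_{k+1} = Phi x_k + w_k
        x = phi * x + rng.gauss(0.0, q ** 0.5)
    return states, measurements


# With both noise variances set to zero the model is deterministic.
states, measurements = simulate(phi=0.5, h=2.0, q=0.0, r=0.0, x0=8.0, steps=4)
```

Setting `q` and `r` to non-zero values gives noisy records of the kind the filter is intended to process.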
The mean squared error is given by equation 5. This is equivalent to;
$$ E\left[ e_k e_k^T \right] = P_k \tag{14} $$
where $P_k$ is the error covariance matrix at time $k$, $(n \times n)$.
Equation 14 may be expanded to give;
$$ P_k = E\left[ e_k e_k^T \right] = E\left[ (x_k - \hat{x}_k)(x_k - \hat{x}_k)^T \right] \tag{15} $$
Assuming the prior estimate of $\hat{x}_k$ is called $\hat{x}'_k$, and was gained by knowledge of the system, it is possible to write an update equation for the new estimate, combining the old estimate with measurement data thus;
$$ \hat{x}_k = \hat{x}'_k + K_k (z_k - H \hat{x}'_k) \tag{16} $$
where $K_k$ is the Kalman gain, which will be derived shortly. The term $z_k - H \hat{x}'_k$ in equation 16 is known as the innovation or measurement residual;
$$ i_k = z_k - H \hat{x}'_k \tag{17} $$
Substitution of equation 11 into equation 16 gives;
$$ \hat{x}_k = \hat{x}'_k + K_k (H x_k + v_k - H \hat{x}'_k) \tag{18} $$
Substituting equation 18 into equation 15 gives;
$$ P_k = E\left[ \left[ (I - K_k H)(x_k - \hat{x}'_k) - K_k v_k \right] \left[ (I - K_k H)(x_k - \hat{x}'_k) - K_k v_k \right]^T \right] \tag{19} $$
At this point it is noted that $x_k - \hat{x}'_k$ is the error of the prior estimate. It is clear that this is uncorrelated with the measurement noise and therefore the expectation may be re-written as;
$$ P_k = (I - K_k H) \, E\left[ (x_k - \hat{x}'_k)(x_k - \hat{x}'_k)^T \right] (I - K_k H)^T + K_k \, E\left[ v_k v_k^T \right] K_k^T \tag{20} $$
Substituting equations 13 and 15 into equation 20 gives;
$$ P_k = (I - K_k H) P'_k (I - K_k H)^T + K_k R K_k^T \tag{21} $$
where $P'_k$ is the prior estimate of $P_k$.
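Equation 21 is the error covariance update equation. The diagonal of the covariance matrix contains the mean squared errors as shown;
$$ P_{kk} = \begin{bmatrix} E\left[ e_{k-1} e_{k-1}^T \right] & E\left[ e_k e_{k-1}^T \right] & E\left[ e_{k+1} e_{k-1}^T \right] \\ E\left[ e_{k-1} e_k^T \right] & E\left[ e_k e_k^T \right] & E\left[ e_{k+1} e_k^T \right] \\ E\left[ e_{k-1} e_{k+1}^T \right] & E\left[ e_k e_{k+1}^T \right] & E\left[ e_{k+1} e_{k+1}^T \right] \end{bmatrix} \tag{22} $$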
The sum of the diagonal elements of a matrix is the trace of a matrix. In the case of the error covariance matrix the trace is the sum of the mean squared errors. Therefore the mean squared error may be minimised by minimising the trace of $P_k$ which in turn will minimise the trace of $P_{kk}$.
The trace of $P_k$ is first differentiated with respect to $K_k$ and the result set to zero in order to find the conditions of this minimum.
Expansion of equation 21 gives;
$$ P_k = P'_k - K_k H P'_k - P'_k H^T K_k^T + K_k \left( H P'_k H^T + R \right) K_k^T \tag{23} $$
Note that the trace of a matrix is equal to the trace of its transpose, therefore it may be written as;
$$ T[P_k] = T[P'_k] - 2T[K_k H P'_k] + T\left[ K_k \left( H P'_k H^T + R \right) K_k^T \right] \tag{24} $$
where $T[P_k]$ is the trace of the matrix $P_k$.
Differentiating with respect to $K_k$ gives;
$$ \frac{dT[P_k]}{dK_k} = -2 (H P'_k)^T + 2 K_k \left( H P'_k H^T + R \right) \tag{25} $$
Setting to zero and re-arranging gives;
$$ (H P'_k)^T = K_k \left( H P'_k H^T + R \right) \tag{26} $$
Now solving for $K_k$ gives;
$$ K_k = P'_k H^T \left( H P'_k H^T + R \right)^{-1} \tag{27} $$
Equation 27 is the Kalman gain equation. The innovation, $i_k$, defined in equation 17 has an associated measurement prediction covariance. This is defined as;
$$ S_k = H P'_k H^T + R \tag{28} $$
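The claim that the gain of equation 27 minimises the error covariance of equation 21 can be checked numerically. The sketch below uses the scalar case, where the trace is simply the variance; the helper names and the values of $P'_k$, $H$ and $R$ are hypothetical.

```python
def posterior_variance(k_gain, p_prior, h, r):
    """Scalar form of equation 21, valid for any gain."""
    return (1.0 - k_gain * h) ** 2 * p_prior + k_gain ** 2 * r


def kalman_gain(p_prior, h, r):
    """Scalar form of equation 27."""
    return p_prior * h / (h * p_prior * h + r)


p_prior, h, r = 4.0, 1.0, 1.0          # hypothetical values
k_opt = kalman_gain(p_prior, h, r)
p_opt = posterior_variance(k_opt, p_prior, h, r)
# Perturbing the gain in either direction should increase the variance.
p_low = posterior_variance(k_opt - 0.1, p_prior, h, r)
p_high = posterior_variance(k_opt + 0.1, p_prior, h, r)
```

For these values the gain is 0.8, and both perturbed gains give a larger posterior variance than the optimal one.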
Finally, substitution of equation 27 into equation 23 gives;
$$ P_k = P'_k - P'_k H^T \left( H P'_k H^T + R \right)^{-1} H P'_k = P'_k - K_k H P'_k = (I - K_k H) P'_k \tag{29} $$
Equation 29 is the update equation for the error covariance matrix with optimal gain. The three equations 16, 27, and 29 develop an estimate of the variable $x_k$. State projection is achieved using;
$$ \hat{x}'_{k+1} = \Phi \hat{x}_k \tag{30} $$
To complete the recursion it is necessary to find an equation which projects the error covariance matrix into the next time interval, $k+1$. This is achieved by first forming an expression for the prior error;
$$ e'_{k+1} = x_{k+1} - \hat{x}'_{k+1} = (\Phi x_k + w_k) - \Phi \hat{x}_k = \Phi e_k + w_k \tag{31} $$
Extending equation 15 to time $k+1$;
$$ P'_{k+1} = E\left[ e'_{k+1} {e'_{k+1}}^T \right] = E\left[ (\Phi e_k + w_k)(\Phi e_k + w_k)^T \right] \tag{32} $$
Note that $e_k$ and $w_k$ have zero cross-correlation because the noise $w_k$ actually accumulates between $k$ and $k+1$ whereas the error $e_k$ is the error up until time $k$. Therefore;
$$ P'_{k+1} = E\left[ e'_{k+1} {e'_{k+1}}^T \right] = E\left[ \Phi e_k (\Phi e_k)^T \right] + E\left[ w_k w_k^T \right] = \Phi P_k \Phi^T + Q \tag{33} $$
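The projection step of equations 30 and 33 is sketched below for a two-state example. The constant-velocity transition matrix, the noise covariance and the small list-based matrix helpers are all assumptions chosen to keep the example self-contained; they are not part of the derivation.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def project(x_post, p_post, phi, q):
    """Equation 30 for the state and equation 33 for the covariance."""
    x_prior = matmul(phi, x_post)
    p_prior = add(matmul(matmul(phi, p_post), transpose(phi)), q)
    return x_prior, p_prior


phi = [[1.0, 1.0], [0.0, 1.0]]      # position/velocity model, unit time step
q = [[0.0, 0.0], [0.0, 0.1]]        # process noise on the velocity only
x_post = [[2.0], [0.5]]             # updated state estimate at time k
p_post = [[1.0, 0.0], [0.0, 1.0]]   # updated error covariance at time k
x_prior, p_prior = project(x_post, p_post, phi, q)
```

The projected covariance acquires off-diagonal terms because the position at $k+1$ depends on the velocity at $k$, so the two errors become correlated.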
This completes the recursive filter. The algorithmic loop is summarised in the diagram of figure 1.

[Diagram: recursive loop through the blocks Kalman Gain, Update Estimate, Update Covariance and Project into k+1, with the labels Initial Estimates, Measurements, Updated State Estimates and Projected Estimates.]

Description          Equation
Kalman Gain          $K_k = P'_k H^T \left( H P'_k H^T + R \right)^{-1}$
Update Estimate      $\hat{x}_k = \hat{x}'_k + K_k (z_k - H \hat{x}'_k)$
Update Covariance    $P_k = (I - K_k H) P'_k$
Project into k + 1   $\hat{x}'_{k+1} = \Phi \hat{x}_k$
                     $P'_{k+1} = \Phi P_k \Phi^T + Q$

Figure 1: Kalman Filter Recursive Algorithm
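The loop of figure 1 can be sketched in a few lines for the scalar case, where every matrix reduces to a number and the inverse becomes a division. This is a minimal sketch rather than a general implementation; the function names and the test values are hypothetical.

```python
def kalman_step(x_prior, p_prior, z, phi, h, q, r):
    """One pass round the loop for scalar state and measurement."""
    k_gain = p_prior * h / (h * p_prior * h + r)    # Kalman gain
    x_post = x_prior + k_gain * (z - h * x_prior)   # update estimate
    p_post = (1.0 - k_gain * h) * p_prior           # update covariance
    x_next = phi * x_post                           # project state into k+1
    p_next = phi * p_post * phi + q                 # project covariance into k+1
    return x_post, p_post, x_next, p_next


def kalman_filter(measurements, x0, p0, phi, h, q, r):
    """Run the loop over a record, starting from the initial estimates."""
    x_prior, p_prior = x0, p0
    estimates = []
    for z in measurements:
        x_post, p_post, x_prior, p_prior = kalman_step(
            x_prior, p_prior, z, phi, h, q, r)
        estimates.append((x_post, p_post))
    return estimates


# A static state (phi = 1, q = 0) observed three times with unit noise variance.
estimates = kalman_filter([1.0, 1.0, 1.0], x0=0.0, p0=1.0,
                          phi=1.0, h=1.0, q=0.0, r=1.0)
```

With a static state and no process noise the filter reduces to a recursive average: the estimate moves from the initial value towards the measurements while the error variance shrinks at each step.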
6 The Kalman filter as a chi-square merit function
The objective of the Kalman filter is to minimise the mean squared error between the actual and estimated data. Thus it provides the best estimate of the data in the mean squared error sense. This being the case it should be possible to show that the Kalman filter has much in common with the chi-square. The chi-square merit function is a maximum likelihood function, and was derived earlier, equation 9. It is typically used as a criterion to fit a set of model parameters to a model, a process known as least squares fitting. The Kalman filter is commonly known as a recursive least squares (RLS) fitter. Drawing similarities to the chi-square merit function will give a different perspective on what the Kalman filter is doing.
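The chi-square merit function is;
$$ \chi^2 = \sum_{i=1}^{k} \left( \frac{z_i - h(a_i, x)}{\sigma_i} \right)^2 \tag{34} $$
where $z_i$ is the measured value; $h_i$ is the data model with parameters $x$, assumed linear in $a$; $\sigma_i$ is the standard deviation associated with the measured value, so that $\sigma_i^2$ is its variance.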
The optimal set of parameters can then be defined as that which minimises the above function. Expanding out the variance gives;
$$ \chi^2 = \sum_{i=1}^{k} \frac{1}{\sigma_i \sigma_i} \left[ z_i - h(a_i, x) \right]^2 \tag{35} $$
Representing the chi-square in vector form and using notation from the earlier Kalman derivation;
$$ \chi_k^2 = \left[ z_k - h(a, x_k) \right] R^{-1} \left[ z_k - h(a, x_k) \right]^T \tag{36} $$
where $R^{-1}$ is the matrix of inverse variances, i.e. $1 / \sigma_i \sigma_i$.
The above merit function is the merit function associated with the latest, $k$th, measurement and provides a measure of how accurately the model predicted this measurement. Given that the inverse model covariance matrix is known up to time $k$, the merit function up to time $k$ may be re-written as;
$$ \chi_{k-1}^2 = (x_{k-1} - \hat{x}_{k-1}) \, {P'_{k-1}}^{-1} (x_{k-1} - \hat{x}_{k-1})^T \tag{37} $$
To combine the new data with the previous, fitting the model parameters so as to minimise the overall chi-square function, the merit function becomes the summation of the two;
$$ \chi^2 = (x_{k-1} - \hat{x}_{k-1}) \, {P'_{k-1}}^{-1} (x_{k-1} - \hat{x}_{k-1})^T + \left[ z_k - h(a, x_k) \right] R^{-1} \left[ z_k - h(a, x_k) \right]^T \tag{38} $$
Where the first derivative of this is given by;
$$ \frac{d\chi^2}{dx} = 2 {P'_{k-1}}^{-1} (x_{k-1} - \hat{x}_{k-1}) - 2 \nabla_x h(a, x_k)^T R^{-1} \left[ z_k - h(a, x_k) \right] \tag{39} $$
The model function $h(a, x_k)$, with parameters fitted from information to date, may be considered as;
$$ h(a, x_k) = h(a, (\hat{x}_k + \Delta x_k)) \tag{40} $$
where $\Delta x_k = x_k - \hat{x}_k$. The Taylor series expansion of the model function to first order is;
$$ h(\hat{x}_k + \Delta x) = h(\hat{x}_k) + \Delta x \nabla_x h(\hat{x}_k) \tag{41} $$
Substituting this result into the derivative equation 39 gives;
$$ \frac{d\chi^2}{dx} = 2 {P'_k}^{-1} (x_k - \hat{x}_k) - 2 \nabla_x h(a, \hat{x}_k)^T R^{-1} \left[ z_k - h(a, \hat{x}_k) - (x_k - \hat{x}_k) \nabla_x h(a, \hat{x}_k) \right] \tag{42} $$
It is assumed that the estimated model parameters are a close approximation to the actual model parameters. Therefore it may be assumed that the derivatives of the actual model and the estimated model are the same. Further, for a system which is linear in $a$ the model derivative is constant and may be written as;
$$ \nabla_x h(a, x_k) = \nabla_x h(a, \hat{x}_k) = H \tag{43} $$
Substituting this into equation 42 gives;
$$ \frac{d\chi^2}{dx} = 2 {P'_k}^{-1} \Delta x_k + 2 H^T R^{-1} H \Delta x_k - 2 H^T R^{-1} \left[ z_k - h(a, \hat{x}_k) \right] \tag{44} $$
Re-arranging gives;
$$ \frac{d\chi^2}{dx} = 2 \left( {P'_k}^{-1} + H^T R^{-1} H \right) \Delta x_k - 2 H^T R^{-1} \left[ z_k - h(a, \hat{x}_k) \right] \tag{45} $$
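For a minimum the derivative is zero; rearranging in terms of $\Delta x_k$ gives;
$$ \Delta x_k = \left( {P'_k}^{-1} + H^T R^{-1} H \right)^{-1} H^T R^{-1} \left[ z_k - h(a, \hat{x}_k) \right] \tag{46} $$
$$ x = \hat{x}_k + \left( {P'_k}^{-1} + H^T R^{-1} H \right)^{-1} H^T R^{-1} \left[ z_k - h(a, \hat{x}_k) \right] \tag{47} $$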
Comparison of equation 47 to equation 16 allows the gain, $K_k$, to be identified as;
$$ K_k = \left( {P'_k}^{-1} + H^T R^{-1} H \right)^{-1} H^T R^{-1} \tag{48} $$
Giving a parameter update equation of the form;
$$ \hat{x}_k = \hat{x}'_k + K_k \left[ z_k - h(a, \hat{x}'_k) \right] \tag{49} $$
Equation 49 is identical to equation 16 and describes the improvement of the parameter estimate using the error between measured and model projected values.
7 Model covariance update
The model parameter covariance has been considered in its inverted form where it is known as the information matrix¹. It is possible to formulate an alternative update equation for the covariance matrix using standard error propagation;
$$ P_k^{-1} = {P'_k}^{-1} + H^T R^{-1} H \tag{50} $$
It is possible to show that the covariance updates of equation 50 and equation 29 are equivalent. This may be achieved using the identity $P_k P_k^{-1} = I$. The original (equation 29) and alternative (equation 50) forms of the covariance update equations are;
$$ P_k = (I - K_k H) P'_k \quad \text{and} \quad P_k^{-1} = {P'_k}^{-1} + H^T R^{-1} H $$
Therefore;
$$ (I - K_k H) P'_k \left[ {P'_k}^{-1} + H^T R^{-1} H \right] = I \tag{51} $$
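Substituting for $K_k$ gives;
$$ \begin{aligned} & \left[ P'_k - P'_k H^T \left( H P'_k H^T + R \right)^{-1} H P'_k \right] \left[ {P'_k}^{-1} + H^T R^{-1} H \right] \\ &= I - P'_k H^T \left[ \left( H P'_k H^T + R \right)^{-1} - R^{-1} + \left( H P'_k H^T + R \right)^{-1} H P'_k H^T R^{-1} \right] H \\ &= I - P'_k H^T \left[ \left( H P'_k H^T + R \right)^{-1} \left( I + H P'_k H^T R^{-1} \right) - R^{-1} \right] H \\ &= I - P'_k H^T \left[ R^{-1} - R^{-1} \right] H \\ &= I \end{aligned} \tag{52} $$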
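The equivalence established in equation 52 can also be checked numerically. The sketch below compares the two covariance updates in the scalar case for a few arbitrary, hypothetical values of $P'_k$, $H$ and $R$; the function names are illustrative.

```python
def covariance_update_standard(p_prior, h, r):
    """Equation 29 with the gain of equation 27 (scalar case)."""
    k_gain = p_prior * h / (h * p_prior * h + r)
    return (1.0 - k_gain * h) * p_prior


def covariance_update_information(p_prior, h, r):
    """Equation 50 in the scalar case: add the information, then invert."""
    return 1.0 / (1.0 / p_prior + h * h / r)


# (p_prior, h, r) triples chosen arbitrarily for the comparison.
cases = [(4.0, 1.0, 1.0), (0.5, 2.0, 3.0), (10.0, 0.1, 0.01)]
differences = [abs(covariance_update_standard(*c)
                   - covariance_update_information(*c)) for c in cases]
```

The two forms agree to within floating-point rounding for each case.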
8 Alternative Kalman equations
Having shown that the covariance matrix can be updated via the previous equation it is possible to formulate an
alternative Kalman gain, as follows;
$$ K_k = P'_k H^T \left( H P'_k H^T + R \right)^{-1} \tag{53} $$
Footnote 1: When the Kalman filter is built around the information matrix it is known as the information filter.
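Inserting $P_k P_k^{-1}$ and $R^{-1} R$;
$$ \begin{aligned} K_k &= P_k P_k^{-1} P'_k H^T R^{-1} R \left( H P'_k H^T + R \right)^{-1} \\ &= P_k P_k^{-1} P'_k H^T R^{-1} \left( H P'_k H^T R^{-1} + I \right)^{-1} \\ &= P_k \left( I + H^T R^{-1} H P'_k \right) H^T R^{-1} \left( H P'_k H^T R^{-1} + I \right)^{-1} \\ &= P_k H^T R^{-1} \left( I + H P'_k H^T R^{-1} \right) \left( I + H P'_k H^T R^{-1} \right)^{-1} \\ &= P_k H^T R^{-1} \end{aligned} \tag{54} $$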
Replacing $P_k$ with the inverse of equation 50;
$$ K_k = \left( H^T R^{-1} H + {P'_k}^{-1} \right)^{-1} H^T R^{-1} \tag{55} $$
Which is the same as the gain calculated from the chi-square equations, confirming that the gains are indeed equivalent.
Although an alternative recursive algorithm has been developed, the objective was to demonstrate the relationship between the Kalman filter and the chi-square statistic, showing how the Kalman filter embodies this statistic. The diagram of figure 2 shows how the alternative set of filter equations may be used to implement a Kalman filter. This form of the filter may be attractive due to the simplified gain calculation and some authors have been able to use this form of the filter in a distributed implementation [?]. However, in this form the filter requires two matrix inversions which can be a computational burden, particularly when large matrices are involved. Thus the preferred implementation here is that given in figure 1.

[Diagram: recursive loop through the blocks Compute Inverse Covariance & Invert, Kalman Gain, Update Estimate and Project into k+1, with the labels Initial Estimates, Measurements, Updated State Estimates and Projected Estimates.]

Description                  Equation
Compute Inverse Covariance   $P_k^{-1} = H^T R^{-1} H + {P'_k}^{-1}$
Kalman Gain                  $K_k = P_k H^T R^{-1}$
Update Estimate              $\hat{x}_k = \hat{x}'_k + K_k (z_k - H \hat{x}'_k)$
Project into k + 1           $\hat{x}'_{k+1} = \Phi \hat{x}_k$
                             $P'_{k+1} = \Phi P_k \Phi^T + Q$

Figure 2: Alternative Kalman filter recursive algorithm
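The two measurement updates can be compared directly in the scalar case. The sketch below is illustrative only: the function names and input values are hypothetical, and the scalar reciprocals stand in for the two matrix inversions mentioned above.

```python
def standard_update(x_prior, p_prior, z, h, r):
    """Gain, estimate and covariance update of figure 1 (scalar case)."""
    k_gain = p_prior * h / (h * p_prior * h + r)
    x_post = x_prior + k_gain * (z - h * x_prior)
    p_post = (1.0 - k_gain * h) * p_prior
    return x_post, p_post


def alternative_update(x_prior, p_prior, z, h, r):
    """Inverse covariance, gain and estimate update of figure 2 (scalar case)."""
    # Two inversions: the prior covariance, then the summed information.
    p_post = 1.0 / (h * h / r + 1.0 / p_prior)
    k_gain = p_post * h / r          # gain computed from the updated covariance
    x_post = x_prior + k_gain * (z - h * x_prior)
    return x_post, p_post


x_a, p_a = standard_update(1.0, 4.0, 3.0, 1.0, 1.0)
x_b, p_b = alternative_update(1.0, 4.0, 3.0, 1.0, 1.0)
```

Both routes give the same updated estimate and variance, which is the numerical counterpart of the equivalence of equations 27 and 55.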
9 Conclusions
This tutorial has shown how the Kalman filter may be derived from the desire to minimise the mean squared error of a signal prediction. Several points in the derivation have been emphasised;
• The minimisation of the mean squared error is shown to be applicable when the expected errors on the signal are distributed as a Gaussian. Under such conditions the minimisation of the mean squared error between the data and the data prediction leads to the development of a maximum likelihood statistic.
• It has been shown how the Kalman filter can be thought of in terms of a chi-squared minimiser by deriving an alternative form of the Kalman filter which highlights its statistical constructs including the processes of error propagation and data combination. This derivation leads to a common, alternative set of filter equations.
In summary, although the Kalman filter is optimal in the mean-squared error sense, it is limited in practice by the quality and accuracy of the model which is embedded within it. However, without an appropriate model the filter is unable to perform the task for which it is designed. The following sections describe a new statistical approach to the solution of this problem.
References
[1] Kalman, R. E., A New Approach to Linear Filtering and Prediction Problems, Trans. ASME, Journal of Basic Engineering, pp. 35-45, March 1960.
[2] Brown, R. G. and Hwang, P. Y. C., Introduction to Random Signals and Applied Kalman Filtering, Second Ed., Wiley, New York, 1992.
[3] Grant, P. M., Cowan, C. F. N., Mulgrew, B. and Dripps, J. H., Analogue and Digital Signal Processing and Coding, Chartwell-Bratt Ltd, 1989.
[4] Wiener, N., Extrapolation, Interpolation and Smoothing of Stationary Time Series, Wiley, New York, 1949.
[5] Hugh Durrant White ???