2.1. The General Least Squares Adjustment Model
A common treatment of the least squares technique of estimation starts with simple linear
mathematical models having observations (or measurements) as explicit functions of parameters
with non-linear models developed as extensions. This adjustment technique is generally described
as adjustment of indirect observations (also called parametric least squares). Cases where the
mathematical models contain only measurements are usually treated separately and this technique is
often described as adjustment of observations only. Both techniques are of course particular cases
of a general adjustment model, the solution of which is set out below. The general adjustment
technique also assumes that the parameters, if any, can be treated as "observables", i.e., they have an
a priori covariance matrix. This concept allows the general technique to be adapted to sequential
processing of data where parameters are updated by the addition of new observations.
In general, least squares solutions require iteration, since a non-linear model is assumed. The
iterative process is explained below. In addition, a proper treatment of covariance propagation is
presented and cofactor matrices given for all the computed and derived quantities in the adjustment
process. Finally, the particular cases of the general least squares technique are described.
Consider the following set of non-linear equations representing the mathematical model in an adjustment

$F(\hat{l}, \hat{x}) = 0$    (2.1)

$\hat{l} = l + v$    (2.2)

$\hat{x} = x + \delta x$    (2.3)
As is usual, the independent observations l have an a priori diagonal cofactor matrix $Q_{ll}$ containing estimates of the variances of the observations, and in this general adjustment the parameters x are treated as "observables" with a full a priori cofactor matrix $Q_{xx}$. The diagonal elements of $Q_{xx}$ contain estimates of the variances of the parameters and the off-diagonal elements contain estimates of the covariances between parameters. The cofactor matrices $Q_{ll}$ and $Q_{xx}$ are related to the covariance matrices $\Sigma_{ll}$ and $\Sigma_{xx}$ by the variance factor $\sigma_0^2$

$\Sigma_{ll} = \sigma_0^2 Q_{ll}$    (2.4)

$\Sigma_{xx} = \sigma_0^2 Q_{xx}$    (2.5)

Weight matrices W are also useful and are defined, in general, as the inverse of the cofactor matrices

$W = Q^{-1}$    (2.6)

Covariance, cofactor and weight matrices are all symmetric, hence $Q^T = Q$ and $W^T = W$, where the superscript T denotes the transpose of the matrix. Note also that in this development, where Q and W are written without subscripts they refer to the observations, i.e., $Q_{ll} = Q$ and $W_{ll} = W$.
Linearizing (2.1) using Taylor's theorem and ignoring 2nd and higher order terms gives

$F(\hat{l}, \hat{x}) = F(l, x) + \frac{\partial F}{\partial l}\Big|_{l,x} \left( \hat{l} - l \right) + \frac{\partial F}{\partial x}\Big|_{l,x} \left( \hat{x} - x \right) = 0$    (2.7)

and with $v = \hat{l} - l$ and $\delta x = \hat{x} - x$ from (2.2) and (2.3), we may write the linearized model in symbolic form as

$A v + B \delta x = f$    (2.8)
Equation (2.8) represents a system of m equations that will be used to estimate the u parameters from n observations. It is assumed that this is a redundant system where

$n \ge m \ge u$    (2.9)

$r = m - u$    (2.10)

In equation (2.8) the coefficient matrices A and B are design matrices containing partial derivatives of the function evaluated using the observations l and the "observed" parameters x

$A_{m,n} = \frac{\partial F}{\partial l}\Big|_{l,x}$    (2.11)

$B_{m,u} = \frac{\partial F}{\partial x}\Big|_{l,x}$    (2.12)

The vector f contains m numeric terms calculated from the functional model using l and x

$f_{m,1} = -\left\{ F(l, x) \right\}$    (2.13)
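As an illustration of how A, B and f might be formed, the following sketch sets up a hypothetical combined model in which each condition equation is $F_i = y_i - a x_i - b = 0$, with both coordinates of each point treated as observations and the line parameters (a, b) as the unknowns. The function and variable names are illustrative only and are not part of the text.

```python
import numpy as np

def design_matrices(points, params):
    """Form A = dF/dl, B = dF/dx and f = -F(x, l) for the hypothetical
    combined model F_i = y_i - a*x_i - b = 0, where the observation
    vector is l = (x_1, y_1, ..., x_m, y_m) and the parameters are (a, b)."""
    a, b = params
    x, y = points[:, 0], points[:, 1]
    m = len(x)                                   # number of equations
    A = np.zeros((m, 2 * m))                     # (2.11), shape (m, n)
    A[np.arange(m), 2 * np.arange(m)] = -a       # dF_i/dx_i
    A[np.arange(m), 2 * np.arange(m) + 1] = 1.0  # dF_i/dy_i
    B = np.column_stack([-x, -np.ones(m)])       # (2.12), dF_i/da and dF_i/db
    f = -(y - a * x - b)                         # (2.13), misclosure vector
    return A, B, f

# four points lying roughly on y = 2x + 1, evaluated at trial values a = 2, b = 1
pts = np.array([[0.0, 1.1], [1.0, 2.9], [2.0, 5.2], [3.0, 6.8]])
A, B, f = design_matrices(pts, (2.0, 1.0))
print(A.shape, B.shape, f)                       # (4, 8) (4, 2) and the misclosures
```

Here m = 4, n = 8 and u = 2, consistent with the redundancy requirement (2.9).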
The least squares solution of (2.8), i.e., the solution which makes the sum of the squares of the weighted residuals a minimum, is obtained by minimising the scalar function ϕ

$\varphi = v^T W v + \delta x^T W_{xx} \delta x - 2 k^T \left( A v + B \delta x - f \right)$    (2.14)

where k is a vector of m Lagrange multipliers. ϕ is a minimum when its partial derivatives with respect to v and δx are equated to zero, i.e.

$\frac{\partial \varphi}{\partial v} = 2 v^T W - 2 k^T A = 0^T$

$\frac{\partial \varphi}{\partial \delta x} = 2 \delta x^T W_{xx} - 2 k^T B = 0^T$
These equations can be simplified by dividing both sides by two, transposing and changing signs to give

$-W v + A^T k = 0$    (2.15)

$-W_{xx} \delta x + B^T k = 0$    (2.16)

Equations (2.15) and (2.16) can be combined with (2.8) and arranged in matrix form as

$\begin{bmatrix} -W & A^T & 0 \\ A & 0 & B \\ 0 & B^T & -W_{xx} \end{bmatrix} \begin{bmatrix} v \\ k \\ \delta x \end{bmatrix} = \begin{bmatrix} 0 \\ f \\ 0 \end{bmatrix}$    (2.17)
Equation (2.17) can be solved by the following reduction process given by Cross (1992, pp. 22-23). Consider the partitioned matrix equation P y = u given as

$\begin{bmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}$    (2.18)

The first row gives $P_{11} y_1 + P_{12} y_2 = u_1$, or

$y_1 = P_{11}^{-1} \left( u_1 - P_{12} y_2 \right)$    (2.19)

Substituting (2.19) back into (2.18)

$\begin{bmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{bmatrix} \begin{bmatrix} P_{11}^{-1} \left( u_1 - P_{12} y_2 \right) \\ y_2 \end{bmatrix} = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}$

and expanding the second row gives

$P_{21} P_{11}^{-1} \left( u_1 - P_{12} y_2 \right) + P_{22} y_2 = u_2$

$P_{21} P_{11}^{-1} u_1 - P_{21} P_{11}^{-1} P_{12} y_2 + P_{22} y_2 = u_2$

$\left( P_{22} - P_{21} P_{11}^{-1} P_{12} \right) y_2 = u_2 - P_{21} P_{11}^{-1} u_1$    (2.20)
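The reduction (2.18)-(2.20) is elimination by the Schur complement. The following sketch (illustrative only, using random test matrices) confirms numerically that the reduced system reproduces the solution of the full system.

```python
import numpy as np

# Numeric confirmation of the reduction (2.18)-(2.20): the Schur-complement
# solution for y2, back-substituted via (2.19), matches the full solution.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
P = M @ M.T + 5.0 * np.eye(5)                 # a well-conditioned test matrix
u = rng.standard_normal(5)

P11, P12 = P[:3, :3], P[:3, 3:]
P21, P22 = P[3:, :3], P[3:, 3:]
u1, u2 = u[:3], u[3:]

y_full = np.linalg.solve(P, u)
y2 = np.linalg.solve(P22 - P21 @ np.linalg.solve(P11, P12),
                     u2 - P21 @ np.linalg.solve(P11, u1))    # (2.20)
y1 = np.linalg.solve(P11, u1 - P12 @ y2)                     # (2.19)
print(np.allclose(np.concatenate([y1, y2]), y_full))         # True
```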
Applying this reduction to (2.17), written again as

$\begin{bmatrix} -W & A^T & 0 \\ A & 0 & B \\ 0 & B^T & -W_{xx} \end{bmatrix} \begin{bmatrix} v \\ k \\ \delta x \end{bmatrix} = \begin{bmatrix} 0 \\ f \\ 0 \end{bmatrix}$    (2.21)

v can be eliminated by applying (2.20)

$\left( \begin{bmatrix} 0 & B \\ B^T & -W_{xx} \end{bmatrix} - \begin{bmatrix} A \\ 0 \end{bmatrix} \left( -W \right)^{-1} \begin{bmatrix} A^T & 0 \end{bmatrix} \right) \begin{bmatrix} k \\ \delta x \end{bmatrix} = \begin{bmatrix} f \\ 0 \end{bmatrix} - \begin{bmatrix} A \\ 0 \end{bmatrix} \left( -W \right)^{-1} 0$

which, with $Q = W^{-1}$, gives

$\begin{bmatrix} A Q A^T & B \\ B^T & -W_{xx} \end{bmatrix} \begin{bmatrix} k \\ \delta x \end{bmatrix} = \begin{bmatrix} f \\ 0 \end{bmatrix}$    (2.22)

Applying (2.20) again to eliminate k

$\left( -W_{xx} - B^T \left( A Q A^T \right)^{-1} B \right) \delta x = 0 - B^T \left( A Q A^T \right)^{-1} f$

and changing signs

$\left( B^T \left( A Q A^T \right)^{-1} B + W_{xx} \right) \delta x = B^T \left( A Q A^T \right)^{-1} f$    (2.23)
The product $A Q A^T$ in (2.23) may be interpreted by considering a vector of equivalent observations

$l_e = A\, l$    (2.24)

Applying the matrix rule for cofactor propagation (Mikhail 1976, pp. 76-79) gives the cofactor matrix of the equivalent observations as

$Q_e = A Q A^T$    (2.25)

With the usual relationship between weight matrices and cofactor matrices, see (2.6), we may write

$W_e = Q_e^{-1} = \left( A Q A^T \right)^{-1}$    (2.26)

so that (2.23) becomes

$\left( B^T W_e B + W_{xx} \right) \delta x = B^T W_e f$    (2.27)

With the auxiliaries

$N = B^T W_e B$    (2.28)

$t = B^T W_e f$    (2.29)

the vector of corrections to the parameters is

$\delta x = \left( N + W_{xx} \right)^{-1} t$    (2.30)
The vector of Lagrange multipliers k is obtained from (2.22) by applying (2.19) to give

$k = \left( A Q A^T \right)^{-1} \left( f - B \delta x \right) = W_e \left( f - B \delta x \right)$    (2.31)

and substituting k into (2.15), $-W v + A^T k = 0$, gives the residuals

$v = W^{-1} A^T k = Q A^T k$    (2.32)
Recall from (2.3) that $\hat{x} = x + \delta x$, where x is the vector of a priori estimates of the parameters, δx is the vector of corrections and $\hat{x}$ is the least squares estimate of the parameters. At the beginning of the iterative solution the a priori estimates $x_1$ are used to evaluate the model and a set of corrections $\delta x_1$ is computed. These are added to $x_1$ to give an updated set $x_2$; A and B are recalculated and a new weight matrix $W_{xx}$ is computed by cofactor propagation. The corrections are computed again, and the whole process cycles until the corrections become smaller than some predetermined value, which terminates the iteration

$x_{n+1} = x_n + \delta x_n$    (2.33)
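A minimal sketch of this iterative scheme is given below. It assumes a user-supplied function (here called eval_model, a hypothetical name) that returns A, B and f at the current estimates, for example the design_matrices sketch above; for simplicity Wxx is held fixed rather than re-propagated at each iteration as described in the text.

```python
import numpy as np

def combined_adjustment(eval_model, l, Q, x0, Qxx, tol=1e-10, max_iter=20):
    """Sketch of the iterative solution (2.24)-(2.33).  eval_model(l, x) is
    assumed to return A, B and f = -F(x, l) at the current estimates.
    Wxx is held fixed here instead of being re-propagated each iteration."""
    Wxx = np.linalg.inv(Qxx)                 # weights of the "observed" parameters
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        A, B, f = eval_model(l, x)
        We = np.linalg.inv(A @ Q @ A.T)      # (2.26)
        N = B.T @ We @ B                     # (2.28)
        t = B.T @ We @ f                     # (2.29)
        dx = np.linalg.solve(N + Wxx, t)     # (2.30)
        x = x + dx                           # (2.33)
        if np.max(np.abs(dx)) < tol:
            break
    k = We @ (f - B @ dx)                    # (2.31)
    v = Q @ A.T @ k                          # (2.32)
    return x, v, dx
```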
In this section, the cofactor matrices of the vectors $\hat{x}$, δx, v and $\hat{l}$ will be derived. The law of propagation of cofactors will be used and is defined as follows (Mikhail 1976, pp. 76-89). For a non-linear function

$z = F(x)$    (2.34)

relating two random vectors z and x, where x has variance-covariance matrix $\Sigma_{xx}$, the variance-covariance matrix of z is given by

$\Sigma_{zz} = J_{zx} \Sigma_{xx} J_{zx}^T$    (2.35)

where $J_{zx}$ is the Jacobian matrix of partial derivatives

$J_{zx} = \frac{\partial F}{\partial x} = \begin{bmatrix} \frac{\partial z_1}{\partial x_1} & \frac{\partial z_1}{\partial x_2} & \cdots & \frac{\partial z_1}{\partial x_n} \\ \frac{\partial z_2}{\partial x_1} & \frac{\partial z_2}{\partial x_2} & \cdots & \frac{\partial z_2}{\partial x_n} \\ \vdots & \vdots & & \vdots \\ \frac{\partial z_m}{\partial x_1} & \frac{\partial z_m}{\partial x_2} & \cdots & \frac{\partial z_m}{\partial x_n} \end{bmatrix}$

Using the relationship between variance-covariance matrices and cofactor matrices, see (2.5), the law of cofactor propagation may be obtained from (2.35) as

$Q_{zz} = J_{zx} Q_{xx} J_{zx}^T$    (2.36)

For a function z of two independent random variables x and y with cofactor matrices $Q_{xx}$ and $Q_{yy}$

$z = F(x, y)$    (2.37)

the law of cofactor propagation gives

$Q_{zz} = J_{zx} Q_{xx} J_{zx}^T + J_{zy} Q_{yy} J_{zy}^T$    (2.38)
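A small numeric illustration of (2.38) follows, using a made-up two-component function z = F(x, y); the function and values are purely illustrative.

```python
import numpy as np

# z = F(x, y) = (x1 + y1, x1 * y2): an invented function used only to
# illustrate the propagation law for two independent random vectors (2.38).
def jacobians(x, y):
    Jzx = np.array([[1.0, 0.0],
                    [y[1], 0.0]])            # dz/dx
    Jzy = np.array([[1.0, 0.0],
                    [0.0, x[0]]])            # dz/dy
    return Jzx, Jzy

x = np.array([2.0, 1.0])
y = np.array([3.0, 4.0])
Qxx = np.diag([0.01, 0.04])                  # cofactors of x
Qyy = np.diag([0.09, 0.01])                  # cofactors of y
Jzx, Jzy = jacobians(x, y)
Qzz = Jzx @ Qxx @ Jzx.T + Jzy @ Qyy @ Jzy.T  # (2.38)
print(Qzz)
```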
2.4.1. Cofactor Matrix for x̂
According to equations (2.33) and (2.30) with (2.29), the least squares estimate $\hat{x}$ is

$\hat{x} = x + \left( N + W_{xx} \right)^{-1} B^T W_e f$    (2.39)

and $\hat{x}$ is a function of the a priori parameters x (the "observables") and the observations l, since the vector of numeric terms f contains functions of both. Applying the law of propagation of cofactors gives

$Q_{\hat{x}\hat{x}} = \left( \frac{\partial \hat{x}}{\partial x} \right) Q_{xx} \left( \frac{\partial \hat{x}}{\partial x} \right)^T + \left( \frac{\partial \hat{x}}{\partial l} \right) Q \left( \frac{\partial \hat{x}}{\partial l} \right)^T$    (2.40)

with

$\frac{\partial \hat{x}}{\partial x} = I + \left( N + W_{xx} \right)^{-1} B^T W_e \frac{\partial f}{\partial x}$    (2.41)

$\frac{\partial \hat{x}}{\partial l} = \left( N + W_{xx} \right)^{-1} B^T W_e \frac{\partial f}{\partial l}$    (2.42)

From (2.13), $f = -F(x, l)$, so the partial derivatives $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial l}$ are the negatives of the design matrices B and A given by (2.12) and (2.11)

$\frac{\partial f}{\partial x} = -B$    (2.43)

$\frac{\partial f}{\partial l} = -A$    (2.44)
Substituting (2.43) and (2.44) into (2.41) and (2.42), with the auxiliary $N = B^T W_e B$, gives

$\frac{\partial \hat{x}}{\partial x} = I - \left( N + W_{xx} \right)^{-1} B^T W_e B = I - \left( N + W_{xx} \right)^{-1} N$    (2.45)

$\frac{\partial \hat{x}}{\partial l} = -\left( N + W_{xx} \right)^{-1} B^T W_e A$    (2.46)

and substituting these into (2.40) gives

$Q_{\hat{x}\hat{x}} = \left\{ I - \left( N + W_{xx} \right)^{-1} N \right\} Q_{xx} \left\{ I - \left( N + W_{xx} \right)^{-1} N \right\}^T + \left\{ -\left( N + W_{xx} \right)^{-1} B^T W_e A \right\} Q \left\{ -\left( N + W_{xx} \right)^{-1} B^T W_e A \right\}^T$    (2.47)

Introducing the auxiliary

$\dot{N} = N + W_{xx}$    (2.48)

this becomes

$Q_{\hat{x}\hat{x}} = \left( I - \dot{N}^{-1} N \right) Q_{xx} \left( I - N \dot{N}^{-1} \right) + \dot{N}^{-1} B^T W_e A Q A^T W_e B \dot{N}^{-1}$

Remembering that $Q_e = A Q A^T$ and $W_e = Q_e^{-1}$, the last term reduces to $\dot{N}^{-1} B^T W_e B \dot{N}^{-1} = \dot{N}^{-1} N \dot{N}^{-1}$, and expanding gives

$Q_{\hat{x}\hat{x}} = Q_{xx} - Q_{xx} N \dot{N}^{-1} - \dot{N}^{-1} N Q_{xx} + \dot{N}^{-1} N Q_{xx} N \dot{N}^{-1} + \dot{N}^{-1} N \dot{N}^{-1}$    (2.49)

The last two terms of (2.49) simplify, using $Q_{xx} W_{xx} = I$, as

$\dot{N}^{-1} N Q_{xx} N \dot{N}^{-1} + \dot{N}^{-1} N \dot{N}^{-1} = \dot{N}^{-1} N Q_{xx} \left( N + W_{xx} \right) \dot{N}^{-1} = \dot{N}^{-1} N Q_{xx} \dot{N} \dot{N}^{-1} = \dot{N}^{-1} N Q_{xx}$

so that

$Q_{\hat{x}\hat{x}} = Q_{xx} - Q_{xx} N \dot{N}^{-1} - \dot{N}^{-1} N Q_{xx} + \dot{N}^{-1} N Q_{xx} = Q_{xx} - Q_{xx} N \dot{N}^{-1}$    (2.50)

Factoring the right-hand side gives

$Q_{\hat{x}\hat{x}} = Q_{xx} \left( I - N \dot{N}^{-1} \right) = Q_{xx} \left( \dot{N} - N \right) \dot{N}^{-1} = Q_{xx} \left( N + W_{xx} - N \right) \dot{N}^{-1} = Q_{xx} W_{xx} \dot{N}^{-1}$    (2.51)

and since $Q_{xx} W_{xx} = I$, the cofactor matrix of the least squares estimates $\hat{x}$ is

$Q_{\hat{x}\hat{x}} = \dot{N}^{-1} = \left( N + W_{xx} \right)^{-1}$    (2.52)
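The result (2.52) can be spot-checked numerically by propagating (2.47) directly and comparing it with (N + Wxx)⁻¹; the sketch below does this with random matrices standing in for a real adjustment problem (illustrative only).

```python
import numpy as np

# Spot-check of (2.52): direct propagation via (2.45)-(2.47) should equal
# (N + Wxx)^-1.  Random matrices stand in for a real adjustment problem.
rng = np.random.default_rng(1)
m, n, u = 4, 7, 3                              # equations, observations, parameters
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, u))
Q = np.diag(rng.uniform(0.5, 2.0, n))          # cofactors of the observations
Qxx = np.diag(rng.uniform(0.5, 2.0, u))        # cofactors of the parameters
Wxx = np.linalg.inv(Qxx)

We = np.linalg.inv(A @ Q @ A.T)                # (2.26)
N = B.T @ We @ B                               # (2.28)
Ndot_inv = np.linalg.inv(N + Wxx)              # inverse of the auxiliary (2.48)

dxhat_dx = np.eye(u) - Ndot_inv @ N            # (2.45)
dxhat_dl = -Ndot_inv @ B.T @ We @ A            # (2.46)
Qxhat = dxhat_dx @ Qxx @ dxhat_dx.T + dxhat_dl @ Q @ dxhat_dl.T   # (2.47)
print(np.allclose(Qxhat, Ndot_inv))            # True, in agreement with (2.52)
```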
2.4.2. Cofactor Matrix for l̂

The least squares estimate of the observations is

$\hat{l} = l + v$    (2.53)

and substituting (2.32) and (2.31) for v and k gives

$\hat{l} = l + Q A^T k = l + Q A^T W_e \left( f - B \delta x \right) = l + Q A^T W_e f - Q A^T W_e B \delta x$

Substituting the expression for δx given by (2.30), with the auxiliaries t and $\dot{N}$ given by (2.29) and (2.48) respectively, gives

$\hat{l} = l + Q A^T W_e f - Q A^T W_e B \left( N + W_{xx} \right)^{-1} t = l + Q A^T W_e f - Q A^T W_e B \left( N + W_{xx} \right)^{-1} B^T W_e f = l + Q A^T W_e f - Q A^T W_e B \dot{N}^{-1} B^T W_e f$    (2.54)

and $\hat{l}$ is a function of the observables x and the observations l since $f = -F(x, l)$. Applying the law of propagation of cofactors to (2.54) gives
$Q_{\hat{l}\hat{l}} = \left( \frac{\partial \hat{l}}{\partial x} \right) Q_{xx} \left( \frac{\partial \hat{l}}{\partial x} \right)^T + \left( \frac{\partial \hat{l}}{\partial l} \right) Q \left( \frac{\partial \hat{l}}{\partial l} \right)^T$    (2.55)

with

$\frac{\partial \hat{l}}{\partial x} = Q A^T W_e \frac{\partial f}{\partial x} - Q A^T W_e B \dot{N}^{-1} B^T W_e \frac{\partial f}{\partial x}$

$\frac{\partial \hat{l}}{\partial l} = I + Q A^T W_e \frac{\partial f}{\partial l} - Q A^T W_e B \dot{N}^{-1} B^T W_e \frac{\partial f}{\partial l}$

With $\frac{\partial f}{\partial x} = -B$ and $\frac{\partial f}{\partial l} = -A$, and with the auxiliary $N = B^T W_e B$, the partial derivatives become

$\frac{\partial \hat{l}}{\partial x} = Q A^T W_e B \dot{N}^{-1} N - Q A^T W_e B$    (2.56)

$\frac{\partial \hat{l}}{\partial l} = I + Q A^T W_e B \dot{N}^{-1} B^T W_e A - Q A^T W_e A$    (2.57)
Substituting (2.56) and (2.57) into (2.55) gives

$Q_{\hat{l}\hat{l}} = \left\{ \text{1st term} \right\} + \left\{ \text{2nd term} \right\}$    (2.58)

where

$\left\{ \text{1st term} \right\} = Q A^T W_e B \dot{N}^{-1} N Q_{xx} N \dot{N}^{-1} B^T W_e A Q - Q A^T W_e B \dot{N}^{-1} N Q_{xx} B^T W_e A Q$
$\qquad - \, Q A^T W_e B Q_{xx} N \dot{N}^{-1} B^T W_e A Q + Q A^T W_e B Q_{xx} B^T W_e A Q$

$\left\{ \text{2nd term} \right\} = Q + Q A^T W_e B \dot{N}^{-1} B^T W_e A Q - Q A^T W_e A Q + Q A^T W_e B \dot{N}^{-1} B^T W_e A Q$
$\qquad + \, Q A^T W_e B \dot{N}^{-1} B^T W_e A Q A^T W_e B \dot{N}^{-1} B^T W_e A Q - Q A^T W_e B \dot{N}^{-1} B^T W_e A Q A^T W_e A Q$
$\qquad - \, Q A^T W_e A Q - Q A^T W_e A Q A^T W_e B \dot{N}^{-1} B^T W_e A Q + Q A^T W_e A Q A^T W_e A Q$

The 1st term can be factored as

$\left\{ \text{1st term} \right\} = Q A^T W_e B \left( \dot{N}^{-1} N Q_{xx} N \dot{N}^{-1} - \dot{N}^{-1} N Q_{xx} - Q_{xx} N \dot{N}^{-1} + Q_{xx} \right) B^T W_e A Q$
$\qquad = Q A^T W_e B \left\{ \dot{N}^{-1} N \left( Q_{xx} N \dot{N}^{-1} - Q_{xx} \right) - \left( Q_{xx} N \dot{N}^{-1} - Q_{xx} \right) \right\} B^T W_e A Q$

but we know from (2.50) that $Q_{\hat{x}\hat{x}} = Q_{xx} - Q_{xx} N \dot{N}^{-1}$, and from (2.52) that $Q_{\hat{x}\hat{x}} = \dot{N}^{-1}$, so

$\left\{ \text{1st term} \right\} = Q A^T W_e B \left( Q_{\hat{x}\hat{x}} - \dot{N}^{-1} N Q_{\hat{x}\hat{x}} \right) B^T W_e A Q = Q A^T W_e B \left( \dot{N}^{-1} - \dot{N}^{-1} N \dot{N}^{-1} \right) B^T W_e A Q = Q A^T W_e B \dot{N}^{-1} \left( I - N \dot{N}^{-1} \right) B^T W_e A Q$

The term in brackets has been simplified in (2.51) as $W_{xx} \dot{N}^{-1}$, which gives the 1st term as

$\left\{ \text{1st term} \right\} = Q A^T W_e B \dot{N}^{-1} W_{xx} \dot{N}^{-1} B^T W_e A Q$    (2.59)

The 2nd term of (2.58) can be simplified by remembering that $A Q A^T = Q_e = W_e^{-1}$, so that after some cancellation of terms we have

$\left\{ \text{2nd term} \right\} = Q + Q A^T W_e B \dot{N}^{-1} N \dot{N}^{-1} B^T W_e A Q - Q A^T W_e A Q$    (2.60)

Substituting (2.59) and (2.60) into (2.58), and noting that $\dot{N}^{-1} \left( W_{xx} + N \right) \dot{N}^{-1} = \dot{N}^{-1}$, gives the cofactor matrix of the adjusted observations as

$Q_{\hat{l}\hat{l}} = Q + Q A^T W_e B \left( N + W_{xx} \right)^{-1} B^T W_e A Q - Q A^T W_e A Q$    (2.61)
2.4.3. Cofactor Matrix for δx

From (2.30), with the auxiliary $\dot{N} = N + W_{xx}$,

$\delta x = \left( N + W_{xx} \right)^{-1} B^T W_e f = \dot{N}^{-1} B^T W_e f$    (2.62)

Applying the law of propagation of cofactors gives

$Q_{\delta x \delta x} = \left( \dot{N}^{-1} B^T W_e \right) Q_{ff} \left( \dot{N}^{-1} B^T W_e \right)^T$    (2.63)

where $Q_{ff}$ is the cofactor matrix of the numeric terms f

$Q_{ff} = \left( \frac{\partial f}{\partial x} \right) Q_{xx} \left( \frac{\partial f}{\partial x} \right)^T + \left( \frac{\partial f}{\partial l} \right) Q \left( \frac{\partial f}{\partial l} \right)^T = \left( -B \right) Q_{xx} \left( -B \right)^T + \left( -A \right) Q \left( -A \right)^T = B Q_{xx} B^T + A Q A^T = B Q_{xx} B^T + Q_e$    (2.64)

Substituting (2.64) into (2.63), and using $W_e Q_e W_e = W_e$, gives

$Q_{\delta x \delta x} = \left( N + W_{xx} \right)^{-1} N Q_{xx} N \left( N + W_{xx} \right)^{-1} + \left( N + W_{xx} \right)^{-1} N \left( N + W_{xx} \right)^{-1}$    (2.65)

which, using $Q_{xx} W_{xx} = I$ as before, simplifies as

$Q_{\delta x \delta x} = \dot{N}^{-1} N Q_{xx} N \dot{N}^{-1} + \dot{N}^{-1} N \dot{N}^{-1} = \dot{N}^{-1} N Q_{xx} \left( N + W_{xx} \right) \dot{N}^{-1} = \dot{N}^{-1} N Q_{xx} \dot{N} \dot{N}^{-1} = \dot{N}^{-1} N Q_{xx}$

or

$Q_{\delta x \delta x} = \dot{N}^{-1} N Q_{xx} = \left( N + W_{xx} \right)^{-1} N Q_{xx}$    (2.66)
2.4.4. Cofactor Matrix for v

From (2.32) and (2.31) the residuals are

$v = Q A^T k = Q A^T W_e \left( f - B \delta x \right) = Q A^T W_e f - Q A^T W_e B \delta x = Q A^T W_e f - Q A^T W_e B \left( N + W_{xx} \right)^{-1} t$

and with (2.29) and the auxiliary $\dot{N}^{-1} = \left( N + W_{xx} \right)^{-1}$

$v = Q A^T W_e f - Q A^T W_e B \dot{N}^{-1} B^T W_e f$    (2.67)

v is a function of the observables x and the observations l since $f = -F(x, l)$, and applying the law of propagation of cofactors gives

$Q_{vv} = \left( \frac{\partial v}{\partial x} \right) Q_{xx} \left( \frac{\partial v}{\partial x} \right)^T + \left( \frac{\partial v}{\partial l} \right) Q \left( \frac{\partial v}{\partial l} \right)^T$    (2.68)

with

$\frac{\partial v}{\partial x} = Q A^T W_e \frac{\partial f}{\partial x} - Q A^T W_e B \dot{N}^{-1} B^T W_e \frac{\partial f}{\partial x}$

$\frac{\partial v}{\partial l} = Q A^T W_e \frac{\partial f}{\partial l} - Q A^T W_e B \dot{N}^{-1} B^T W_e \frac{\partial f}{\partial l}$

With $\frac{\partial f}{\partial x} = -B$ and $\frac{\partial f}{\partial l} = -A$, and with the auxiliary $N = B^T W_e B$, the partial derivatives become

$\frac{\partial v}{\partial x} = Q A^T W_e B \dot{N}^{-1} N - Q A^T W_e B$    (2.69)

$\frac{\partial v}{\partial l} = Q A^T W_e B \dot{N}^{-1} B^T W_e A - Q A^T W_e A$    (2.70)
Substituting (2.69) and (2.70) into (2.68) gives

$Q_{vv} = \left\{ \text{1st term} \right\} + \left\{ \text{2nd term} \right\}$    (2.71)

where

$\left\{ \text{1st term} \right\} = Q A^T W_e B \dot{N}^{-1} N Q_{xx} N \dot{N}^{-1} B^T W_e A Q - Q A^T W_e B \dot{N}^{-1} N Q_{xx} B^T W_e A Q$
$\qquad - \, Q A^T W_e B Q_{xx} N \dot{N}^{-1} B^T W_e A Q + Q A^T W_e B Q_{xx} B^T W_e A Q$

$\left\{ \text{2nd term} \right\} = Q A^T W_e B \dot{N}^{-1} B^T W_e A Q A^T W_e B \dot{N}^{-1} B^T W_e A Q - Q A^T W_e B \dot{N}^{-1} B^T W_e A Q A^T W_e A Q$
$\qquad - \, Q A^T W_e A Q A^T W_e B \dot{N}^{-1} B^T W_e A Q + Q A^T W_e A Q A^T W_e A Q$

The 1st term above is identical to the 1st term of (2.58), which simplifies to (2.59) as

$\left\{ \text{1st term} \right\} = Q A^T W_e B \dot{N}^{-1} W_{xx} \dot{N}^{-1} B^T W_e A Q$    (2.72)

The 2nd term above can be simplified by remembering that $A Q A^T = Q_e = W_e^{-1}$, so that after some manipulation we have

$\left\{ \text{2nd term} \right\} = Q A^T W_e B \left( \dot{N}^{-1} N \dot{N}^{-1} - \dot{N}^{-1} \right) B^T W_e A Q - Q A^T W_e B \dot{N}^{-1} B^T W_e A Q + Q A^T W_e A Q$

The term in brackets simplifies as

$\dot{N}^{-1} N \dot{N}^{-1} - \dot{N}^{-1} = \dot{N}^{-1} \left( N - \dot{N} \right) \dot{N}^{-1} = \dot{N}^{-1} \left( N - \left( N + W_{xx} \right) \right) \dot{N}^{-1} = -\dot{N}^{-1} W_{xx} \dot{N}^{-1}$

giving

$\left\{ \text{2nd term} \right\} = -Q A^T W_e B \dot{N}^{-1} W_{xx} \dot{N}^{-1} B^T W_e A Q - Q A^T W_e B \dot{N}^{-1} B^T W_e A Q + Q A^T W_e A Q$    (2.73)

Substituting (2.72) and (2.73) into (2.71) gives the cofactor matrix of the residuals v as

$Q_{vv} = -Q A^T W_e B \left( N + W_{xx} \right)^{-1} B^T W_e A Q + Q A^T W_e A Q$    (2.74)

Comparing (2.74) with (2.61) shows that

$Q_{vv} = Q - Q_{\hat{l}\hat{l}}$    (2.75)
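The identities (2.66), (2.74), (2.61) and (2.75) can likewise be checked numerically; the sketch below uses random matrices in place of a real problem (illustrative only).

```python
import numpy as np

# Numeric spot-check of (2.66) and (2.75), with random matrices standing
# in for a real adjustment problem.
rng = np.random.default_rng(2)
m, n, u = 5, 9, 3
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, u))
Q = np.diag(rng.uniform(0.5, 2.0, n))
Qxx = np.diag(rng.uniform(0.5, 2.0, u))
Wxx = np.linalg.inv(Qxx)

We = np.linalg.inv(A @ Q @ A.T)
N = B.T @ We @ B
Ndot_inv = np.linalg.inv(N + Wxx)

# (2.65) versus the simplified form (2.66)
Qdxdx = Ndot_inv @ N @ Qxx @ N @ Ndot_inv + Ndot_inv @ N @ Ndot_inv
print(np.allclose(Qdxdx, Ndot_inv @ N @ Qxx))

# (2.74) and (2.61) versus the relation Qvv = Q - Qll (2.75)
Qvv = Q @ A.T @ We @ A @ Q - Q @ A.T @ We @ B @ Ndot_inv @ B.T @ We @ A @ Q
Qll = Q + Q @ A.T @ We @ B @ Ndot_inv @ B.T @ We @ A @ Q - Q @ A.T @ We @ A @ Q
print(np.allclose(Qvv, Q - Qll))
```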
The a posteriori (post-adjustment) estimate of the variance factor is

$\sigma_0^2 = \frac{v^T W v + \delta x^T W_{xx} \delta x}{r}$    (2.77)

with the degrees of freedom

$r = m - u + u_x$    (2.78)

where m is the number of equations used to estimate the u parameters from n observations, and $u_x$ is the number of weighted parameters. [Equation (2.78) is given by Krakiwsky (1975, p. 17, eqn 2-62), who notes that it is an approximation only and directs the reader to Bossler (1972) for a complete and rigorous treatment.]
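A small helper for (2.77) and (2.78) might look as follows; the function name, argument order and the numbers in the usage line are illustrative only.

```python
import numpy as np

def variance_factor(v, W, dx, Wxx, m, u, ux):
    """A posteriori variance factor (2.77) with redundancy r = m - u + ux (2.78).
    Arguments follow the notation of the text; the name is illustrative."""
    r = m - u + ux
    return float(v @ W @ v + dx @ Wxx @ dx) / r

# e.g. 6 equations, 2 parameters, both weighted (invented values):
v = np.array([0.01, -0.02, 0.00, 0.01, -0.01, 0.02])
dx = np.array([0.003, -0.001])
print(variance_factor(v, np.eye(6), dx, np.eye(2), m=6, u=2, ux=2))
```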
Depending on the form of the design matrices A and B, and also on whether the parameters are treated as observables, i.e., whether $W_{xx} = 0$ or not, there are several different possibilities for the formulation and solution of least squares problems. The standard cases are set out below.

2.5.1. Combined Case with Weighted Parameters (A, B, W, Wxx)   Av + Bδx = f

The general case of a non-linear implicit model with weighted parameters treated as observables is known as the Combined Case with Weighted Parameters. Its solution is given by equations (2.30), (2.28), (2.29), (2.26), (2.3), (2.31), (2.32), (2.2), (2.65), (2.52), (2.74), (2.61), (2.64), (2.77) and (2.78), which are restated below.
$\delta x = \left( N + W_{xx} \right)^{-1} t$    (2.79)

with

$N = B^T W_e B$    (2.80)

$t = B^T W_e f$    (2.81)

$W_e = Q_e^{-1} = \left( A Q A^T \right)^{-1}$    (2.82)

$\hat{x} = x + \delta x$    (2.83)

$k = W_e \left( f - B \delta x \right)$    (2.84)

$v = W^{-1} A^T k = Q A^T k$    (2.85)

$\hat{l} = l + v$    (2.86)

$Q_{\delta x \delta x} = \left( N + W_{xx} \right)^{-1} N Q_{xx} N \left( N + W_{xx} \right)^{-1} + \left( N + W_{xx} \right)^{-1} N \left( N + W_{xx} \right)^{-1} = \left( N + W_{xx} \right)^{-1} N Q_{xx}$    (2.87)

$Q_{\hat{x}\hat{x}} = \left( N + W_{xx} \right)^{-1}$    (2.88)

$Q_{vv} = Q A^T W_e A Q - Q A^T W_e B \left( N + W_{xx} \right)^{-1} B^T W_e A Q$    (2.89)

$Q_{\hat{l}\hat{l}} = Q + Q A^T W_e B \left( N + W_{xx} \right)^{-1} B^T W_e A Q - Q A^T W_e A Q$    (2.90)

$Q_{ff} = B Q_{xx} B^T + Q_e$    (2.91)

$\sigma_0^2 = \frac{v^T W v + \delta x^T W_{xx} \delta x}{r}$    (2.92)

$r = m - u + u_x$    (2.93)

$\Sigma_{\delta x \delta x} = \sigma_0^2 Q_{\delta x \delta x}$    (2.94)

$\Sigma_{vv} = \sigma_0^2 Q_{vv}$    (2.96)

$\Sigma_{ff} = \sigma_0^2 Q_{ff}$    (2.98)
2.5.2. Combined Case (A, B, W, Wxx = 0)   Av + Bδx = f₀

The Combined Case is a non-linear implicit mathematical model with no weights on the parameters. The set of equations for the solution is deduced from the Combined Case with Weighted Parameters by considering that if there are no weights then $W_{xx} = 0$ and $Q_{xx} = 0$. This implies that x is a constant vector (denoted by $x_0$) of approximate values of the parameters, and partial derivatives with respect to $x_0$ are undefined. Substituting these two null matrices and the constant vector $x = x_0$ into equations (2.1) to (2.78) gives the following results.

$\delta x = N^{-1} t$    (2.99)

with

$N = B^T W_e B$    (2.100)

$t = B^T W_e f_0$    (2.101)

$f_0 = -F(x_0, l)$    (2.102)

$W_e = Q_e^{-1} = \left( A Q A^T \right)^{-1}$    (2.103)

$\hat{x} = x_0 + \delta x$    (2.104)

$k = W_e \left( f_0 - B \delta x \right)$    (2.105)

$v = W^{-1} A^T k = Q A^T k$    (2.106)

$\hat{l} = l + v$    (2.107)

$Q_{\delta x \delta x} = Q_{\hat{x}\hat{x}} = N^{-1}$    (2.108)

$Q_{vv} = Q A^T W_e A Q - Q A^T W_e B N^{-1} B^T W_e A Q$    (2.109)

$Q_{f_0 f_0} = Q_e$    (2.111)

$\sigma_0^2 = \frac{v^T W v}{r}$    (2.112)

$r = m - u$    (2.113)

$\Sigma_{vv} = \sigma_0^2 Q_{vv}$    (2.115)

$\Sigma_{f_0 f_0} = \sigma_0^2 Q_{f_0 f_0}$    (2.117)
2.5.3. Parametric Case (A = I, B, W, Wxx = 0)   v + Bδx = f₀

The Parametric Case is a mathematical model with the observations l explicitly expressed by some non-linear function of the parameters x only. This implies that the design matrix A is equal to the identity matrix I. Setting A = I in the Combined Case (with no weights) leads to the following equations.

$\delta x = N^{-1} t$    (2.118)

with

$N = B^T W B$    (2.119)

$t = B^T W f_0$    (2.120)

$f_0 = -F(x_0, l)$    (2.121)

$\hat{x} = x_0 + \delta x$    (2.122)

$v = f_0 - B \delta x$    (2.123)

$\hat{l} = l + v$    (2.124)

$Q_{\delta x \delta x} = Q_{\hat{x}\hat{x}} = N^{-1}$    (2.125)

$Q_{vv} = Q - B N^{-1} B^T$    (2.126)

$Q_{f_0 f_0} = Q$    (2.128)

$\sigma_0^2 = \frac{v^T W v}{r}$    (2.129)

$r = n - u$    (2.130)

$\Sigma_{vv} = \sigma_0^2 Q_{vv}$    (2.132)

$\Sigma_{f_0 f_0} = \sigma_0^2 Q_{f_0 f_0}$    (2.134)
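For the Parametric Case, the familiar straight-line fit with only the y-values observed gives a compact worked example; the data below are invented for illustration, and since the model is linear a single pass of (2.118)-(2.124) converges.

```python
import numpy as np

# Parametric Case sketch: straight line l_i = a*x_i + b with only the
# y-values observed, so F = l - (a*x + b) and A = I.  Data are invented.
xc = np.array([0.0, 1.0, 2.0, 3.0, 4.0])        # error-free abscissae
l = np.array([1.2, 2.9, 5.1, 7.2, 8.8])         # observed ordinates
Q = np.eye(len(l))                              # equal-weight observations
W = np.linalg.inv(Q)

x0 = np.array([1.0, 1.0])                       # approximate (a, b)
B = np.column_stack([-xc, -np.ones(len(xc))])   # dF/d(a, b)
f0 = -(l - (x0[0] * xc + x0[1]))                # (2.121)

N = B.T @ W @ B                                 # (2.119)
t = B.T @ W @ f0                                # (2.120)
dx = np.linalg.solve(N, t)                      # (2.118)
xhat = x0 + dx                                  # (2.122)
v = f0 - B @ dx                                 # (2.123)
lhat = l + v                                    # (2.124)
sigma0_sq = float(v @ W @ v) / (len(l) - 2)     # (2.129) with r = n - u (2.130)
print(xhat, sigma0_sq)
```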
2.5.4. Condition Case (A, B = 0, W, Wxx = 0)   Av = f

The Condition Case is characterised by a non-linear model consisting of observations only. Setting B = 0 in the Combined Case (with no weights) leads to the following equations.

$k = W_e f$    (2.135)

$f = -F(l)$    (2.137)

$v = W^{-1} A^T k = Q A^T k$    (2.138)

$\hat{l} = l + v$    (2.139)

$Q_{vv} = Q A^T W_e A Q$    (2.140)

$Q_{\hat{l}\hat{l}} = Q - Q A^T W_e A Q$    (2.141)

$\sigma_0^2 = \frac{v^T W v}{r}$    (2.142)

$r = m$    (2.143)

$\Sigma_{vv} = \sigma_0^2 Q_{vv}$    (2.144)
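The Condition Case can be illustrated with the classical adjustment of the three measured angles of a plane triangle, which must sum to 180°; the numbers below are invented for illustration.

```python
import numpy as np

# Condition Case sketch: the three measured angles of a plane triangle
# must satisfy F(l) = l1 + l2 + l3 - 180 = 0.  Values are invented.
l = np.array([59.998, 60.006, 60.005])      # observed angles (degrees)
Q = np.diag([1.0, 1.0, 4.0])                # third angle measured less precisely
A = np.array([[1.0, 1.0, 1.0]])             # dF/dl, a single condition equation
f = np.array([-(l.sum() - 180.0)])          # (2.137), the misclosure

We = np.linalg.inv(A @ Q @ A.T)             # equivalent weight, as in (2.26)
k = We @ f                                  # (2.135)
v = Q @ A.T @ k                             # (2.138)
lhat = l + v                                # (2.139)
sigma0_sq = float(v @ np.linalg.inv(Q) @ v) / 1.0   # (2.142) with r = m = 1 (2.143)
print(lhat, lhat.sum())                     # adjusted angles now sum to 180
```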