Convergence of a Regularized Gauss-Newton Method*

H. SCHWETLICK

Dresden
1. Introduction
Let F: R^n → R^m with F(x) = (f_1(x), …, f_m(x))^T, and consider the problem of the mean-square solution of the, in general incompatible, non-linear system of equations

$$F(x) = 0. \tag{1.1}$$

With m = n we obtain a particular case of this class of problems, namely the solution or mean-square solution of a non-linear system of n equations in n unknowns.
Several algorithms have been proposed for the solution of problem (1.1) (see, e.g., Powell [1], Kowalik and Osborne [2], or Bard [3]); these frequently amount to variants of the so-called damped Gauss-Newton method

$$x^{k+1} = x^k - \gamma_k\,[F'(x^k)^T F'(x^k)]^{-1} F'(x^k)^T F(x^k), \qquad \gamma_k \in (0, 1]. \tag{1.2}$$
Difficulties arise in the numerical realization of (1.2) when rank(F'(x^k)) < n, in which case F'(x^k)^T F'(x^k) is not invertible (or the corresponding linear system is not uniquely solvable). Moreover, for propositions concerning global convergence to be provable, [F'(x^k)^T F'(x^k)]^{-1} must exist and be uniformly bounded with respect to k.
These difficulties were avoided by Levenberg [4], and later by Marquardt [5], by going over to the iteration

$$x^{k+1} = x^k - [p_k I + F'(x^k)^T F'(x^k)]^{-1} F'(x^k)^T F(x^k), \tag{1.3}$$

which may be interpreted as a regularized version of (1.2); here p_k > 0 is a real parameter. Obviously, with p_k > 0 the matrix p_k I + F'(x^k)^T F'(x^k) is invertible, and by a suitable choice of p_k it becomes possible to prove convergence theorems for the iteration (1.3) (see Shanno [6, 7] and Shamanskii [8]).
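The effect of the regularization in (1.3) can be seen in a small numerical sketch (the function name and the test data below are illustrative, not from the paper): for any p > 0 the matrix p I + J^T J is invertible even when the Jacobian J is rank-deficient.

```python
import numpy as np

def lm_step(J, r, p):
    """One regularized Gauss-Newton (Levenberg-Marquardt) direction.

    Solves (p*I + J^T J) d = J^T r, which has a unique solution for
    every p > 0 even when J is rank-deficient.
    """
    n = J.shape[1]
    return np.linalg.solve(p * np.eye(n) + J.T @ J, J.T @ r)

# Rank-deficient Jacobian: J^T J is singular, yet the regularized
# linear system is still uniquely solvable.
J = np.array([[1.0, 1.0],
              [2.0, 2.0],
              [3.0, 3.0]])
r = np.array([1.0, 0.0, 1.0])
d = lm_step(J, r, p=1e-2)
```

The unregularized normal matrix J^T J here has rank 1, so the undamped step (1.2) would be undefined; the regularized step is always computable.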
The papers just quoted offered several special methods for choosing p_k, all essentially based on the principle of majorization by linear functions which Goldstein [9, 10] devised in connection with the minimization of an arbitrary functional. For the relevant iteration (1.3), global convergence theorems can be proved, similar to those for the gradient method; see Ortega and Rheinboldt [11], Section 12. If the sequence {x^k} converges to a "good" zero x* of F, the method transforms automatically, given a suitable choice of a certain parameter, into the undamped Gauss-Newton method, i.e., into the method (1.2) with γ_k = 1, and thus acquires the favourable local convergence properties of the latter, notably its over-linear convergence.
Further, let x^0 ∈ R^n be a fixed initial vector for the iteration (2.1), and let the corresponding level set be defined by W(x^0) := {x ∈ R^n : g(x) ≤ g(x^0)}. It will be said for brevity that a function F satisfies the condition (V) if F: R^n → R^m and F possesses on W(x^0) a uniformly continuous Fréchet derivative F' bounded by M > 0 (by differentiability on a set Q we understand here differentiability on an open set R ⊇ Q).
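The quantities behind the iteration (2.1) can be sketched numerically. The forms g(x) = ½||F(x)||², g'(x) = F'(x)^T F(x), A(λ, x) = (1 − λ)I + λ F'(x)^T F'(x) and h(λ, x) = A(λ, x)^{-1} g'(x) used below are reconstructions consistent with the formulas of this paper, and the map F itself is purely illustrative.

```python
import numpy as np

# Illustrative residual map F: R^2 -> R^3 and its Jacobian (not from the paper).
def F(x):
    return np.array([x[0] - 1.0, x[1] + 2.0, x[0] * x[1]])

def Fprime(x):
    return np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [x[1], x[0]]])

def g(x):
    # Least-squares functional g(x) = 0.5 * ||F(x)||^2.
    return 0.5 * F(x) @ F(x)

def gprime(x):
    # Gradient g'(x) = F'(x)^T F(x).
    return Fprime(x).T @ F(x)

def h(lam, x):
    # Regularized direction h(lam, x) = A(lam, x)^{-1} g'(x),
    # with A(lam, x) = (1 - lam) I + lam * F'(x)^T F'(x).
    J = Fprime(x)
    A = (1.0 - lam) * np.eye(len(x)) + lam * (J.T @ J)
    return np.linalg.solve(A, gprime(x))

x = np.array([0.5, -1.0])
# lam = 0 gives the gradient direction; lam = 1 gives the Gauss-Newton direction.
d0 = h(0.0, x)
```

Since A(λ, x) is symmetric positive definite for λ ∈ [0, 1) (and for λ = 1 when F'(x) has full rank), g'(x)^T h(λ, x) > 0 whenever g'(x) ≠ 0, so h is always a descent direction.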
Lemma 1
$$\frac{d}{d\lambda}\, g(x - \lambda h(\lambda, x)) = -\, g'(x - \lambda h(\lambda, x))^T A(\lambda, x)^{-2}\, g'(x).$$
Lemma 2
Let F satisfy condition (V), let x ∈ W(x^0) and λ̄ ∈ [0, 1/2], and let {x − t h(t, x) : t ∈ [0, λ̄]} ⊂ W(x^0). Then, with λ ∈ [0, λ̄], we have, by Lemma 1,

$$|g(x) - g(x - \lambda h(\lambda, x))| = \left| \int_0^{\lambda} g'(x - t\, h(t, x))^T A(t, x)^{-2}\, g'(x)\, dt \right| \le \int_0^{\lambda} 4\, \|g'(x - t\, h(t, x))\|\, \|g'(x)\|\, dt \le 4 M^2 b^2 \lambda,$$

since ||A(t, x)^{-1}|| ≤ (1 − t)^{-1} ≤ 2 for t ∈ [0, 1/2] and ||g'|| ≤ Mb on W(x^0), b being a bound for ||F|| on W(x^0). Q.E.D.
Lemma 3
Let F satisfy (V), let ~1, ~2 and q2 be given numbers such that 0<q2<~1<pz<1
and let
Then for every z~W(z“) with llg’(z) ll>q>O there exists the number h,(z) =
maxCO~a~‘lz:cp(h)~Xz(h)and ii(A)<&(A) for all h=[O, CL]), and we have
h,(z) ahi> with some AI= (0, ‘/,) independent of z.
Proof. For all λ ∈ [0, 1/2] such that x − λh(λ, x) ∈ W(x^0) all the derivatives in question exist. Since φ'(0) − χ_2'(0) < 0 and ψ_1'(0) − ψ_2'(0) < 0, we can show, by arguments similar to those used by Altman [13, 14] in proving the convergence of a special gradient method, that there exists at least one α ∈ (0, 1) such that φ(λ) ≤ χ_2(λ) and ψ_1(λ) ≤ ψ_2(λ) for λ ∈ [0, α], and that λ_1(x) is well defined, while in particular at λ_1(x) at least one of these inequalities holds with equality.
Now let ε := (η²/4)(1 − q_2)/(4Mb²), and let δ_ε > 0 be a number such that ||F'(x) − F'(u)|| ≤ ε whenever u, x ∈ W(x^0) and ||u − x|| ≤ δ_ε; further, let δ ∈ (0, δ_ε] be fixed. We now assume that no positive lower bound exists for λ_1(x). There will then be at least one x ∈ W(x^0) for which ||g'(x)|| ≥ η and 0 < λ_1 := λ_1(x) ≤ δ. By the definition of λ_1, and noting that λ_1 < 1/2, at least one of the equalities mentioned above will hold for this x. In view of Lemma 2 and the fact that ||λ_1 h(λ_1, x)|| ≤ 2Mbλ_1 ≤ δ, we have

$$\ldots \le [16 M^4 b^2 + 6 M^2 (1 + M^2) b^2]\,\delta + (1 - q_2),$$

whence
$$p_1 \|g'(x)\|^2 = -\chi_1'(\lambda_1) \le -\psi_2'(\lambda_1) = q_2\, g'(x)^T A(\lambda_1, x)^{-1} g'(x) + \ldots \le q_2 \|g'(x)\|^2 + \tfrac{1}{2}(p_1 - q_2)\|g'(x)\|^2,$$

whence it follows that p_1 ≤ q_2 + 1/2 (p_1 − q_2), which contradicts the fact that p_1 > q_2. Q.E.D.
Lemma 4

Let F(x*) = 0, rank F'(x*) = n, and let F be twice continuously differentiable on a sphere S(x*, ρ), ρ > 0. Then: 1) for every ε ∈ (0, 1] there exists a δ > 0 such that ||x − h(1, x) − x*|| ≤ ε||x − x*|| for x ∈ S(x*, δ); 2) for every ε ∈ (0, 1) there exists a δ > 0 such that g(x) − g(x − h(1, x)) ≥ ½(1 − ε) g'(x)^T h(1, x) for x ∈ S(x*, δ).
Proof. 1. Since rank(F'(x*)^T F'(x*)) = rank(F'(x*)) = n, the matrix B(x*) = F'(x*)^T F'(x*) is non-singular. There then exist numbers δ_1 ∈ (0, ρ] and L_1 > 0 such that, with x ∈ S(x*, δ_1), B(x) is non-singular and ||B(x)^{-1}|| ≤ L_1. It follows from this that

$$\|x - h(1, x) - x^*\| = \|B(x)^{-1} F'(x)^T [F(x) - F(x^*) - F'(x)(x - x^*)]\| \le \tfrac{1}{2} L_1 N_1 N_2 \|x - x^*\|^2 =: L_2 \|x - x^*\|^2,$$
where N_1 and N_2 denote bounds for F'(x) and F''(x) on S(x*, ρ). Our assertion 1) will obviously hold provided that, for a given ε ∈ (0, 1], we choose δ so as to satisfy
2. Let x ∈ S(x*, δ_2), δ_2 := 1/2 δ_1. The straight line joining x and x − h(1, x) then lies entirely in S(x*, δ_1), and in particular g will thus be twice continuously differentiable on this line. We thus have

$$g(x) - g(x - h(1, x)) = g'(x)^T h(1, x) - \tfrac{1}{2} h(1, x)^T g''(x)\, h(1, x) \tag{2.2}$$
$$\qquad -\ \tfrac{1}{2} h(1, x)^T [g''(x - \theta h(1, x)) - g''(x)]\, h(1, x) = \tfrac{1}{2} g'(x)^T h(1, x) - r(x), \qquad \theta = \theta(x) \in (0, 1),$$
$$\Delta(x) = \mathrm{diag}\,(w_i(x)), \qquad w(x) = (w_1(x), \ldots, w_m(x))^T, \qquad D(x) = (d_{ik}(x)), \quad i, k = 1, \ldots, n,$$

$$0 = r'(x) + \Delta(x) V(x)\,(x - x^*) + w'(x), \qquad 0 = r(x) + \ldots + w'(x).$$
Now let δ_3 := min{δ_2, m_1/N_2}. Then, with x ∈ S(x*, δ_3), we have

$$g(x) - g(x - h(1, x)) \ge \tfrac{1}{2}\bigl[1 - 2 N_2 L_1^2 \{N_2 \|F(x)\| + \varepsilon(\delta_2)\}\bigr]\, g'(x)^T h(1, x).$$
To solve problem (1.1), we shall consider the iteration (1.3) in the form (2.1). The step parameter λ_k ∈ [0, 1] will therefore be determined from the following algorithm, which we refer to for brevity as (G1):

Given numbers q_1, q_2, λ̲ such that 0 < q_1 ≤ q_2 < 1, 0 < λ̲ ≤ 1, and a sequence {λ̄_k} such that λ̲ ≤ λ̄_k ≤ 1.

Case 1. g'(x^k) = 0. We then set λ_k = 0.

Case 2a. g'(x^k) ≠ 0 and

$$g(x^k - \bar\lambda_k h(\bar\lambda_k, x^k)) \le g(x^k) - q_1 \bar\lambda_k\, g'(x^k)^T h(\bar\lambda_k, x^k). \tag{3.1}$$

We then set λ_k = λ̄_k.

Case 2b. If Case 2a does not hold, we choose λ_k ∈ (0, λ̄_k) in such a way that

$$g(x^k) - q_2 \lambda_k\, g'(x^k)^T h(\lambda_k, x^k) \le g(x^k - \lambda_k h(\lambda_k, x^k)) \le g(x^k) - q_1 \lambda_k\, g'(x^k)^T h(\lambda_k, x^k). \tag{3.2}$$
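The case distinction of (G1) can be sketched in a few lines. Here phi and slope stand for the abstract quantities g(x^k − λh(λ, x^k)) and g'(x^k)^T h(λ, x^k); the descending grid scan in Case 2b is only one illustrative way of locating a step satisfying the two-sided test (3.2), since the algorithm permits any admissible λ.

```python
import numpy as np

def g1_step(phi, slope, lam_bar, q1, q2):
    """Choose the step parameter lam per the two cases of (G1).

    phi(lam)   : g(x^k - lam*h(lam, x^k))
    slope(lam) : g'(x^k)^T h(lam, x^k), positive when g'(x^k) != 0
    with 0 < q1 <= q2 < 1.
    """
    # Case 2a: accept the full step lam_bar if (3.1) holds.
    if phi(lam_bar) <= phi(0.0) - q1 * lam_bar * slope(lam_bar):
        return lam_bar
    # Case 2b: scan downward for a lam meeting the two-sided test (3.2).
    for lam in lam_bar * np.linspace(0.99, 0.01, 99):
        if (phi(0.0) - q2 * lam * slope(lam)
                <= phi(lam)
                <= phi(0.0) - q1 * lam * slope(lam)):
            return float(lam)
    raise RuntimeError("no admissible step found on the grid")

# Illustrative 1-D models (not from the paper); slope is constant 1.
phi_easy = lambda lam: 0.5 * (1.0 - lam) ** 2      # full step acceptable
phi_hard = lambda lam: 0.5 - lam + 2.0 * lam ** 2  # full step overshoots
one = lambda lam: 1.0
lam_a = g1_step(phi_easy, one, 1.0, q1=0.25, q2=0.75)
lam_b = g1_step(phi_hard, one, 1.0, q1=0.25, q2=0.75)
```

For phi_easy the full step passes (3.1) and Case 2a fires; for phi_hard the overshoot forces Case 2b, and the admissible window of (3.2) works out to λ ∈ [0.125, 0.375].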
Theorem 1

Let F satisfy condition (V), and let the sequence {x^k}, starting with x^0 ∈ R^n, be calculated from (2.1), the step λ_k being chosen in accordance with (G1). Then: 1) x^k ∈ W(x^0) and g(x^{k+1}) ≤ g(x^k) for all k, and lim g(x^k) = L ≥ 0 exists; 2) lim g'(x^k) = 0.
Proof. We shall prove 1) by induction. Assume that 1) holds for l ≤ k. Then x^k ∈ W(x^0). If g'(x^k) = 0 we have nothing to prove; we therefore assume that g'(x^k) ≠ 0. If Case 2a holds, we then have g'(x^k)^T h(λ̄_k, x^k) = g'(x^k)^T A(λ̄_k, x^k)^{-1} g'(x^k) > 0, i.e.,

$$g(x^{k+1}) \le g(x^k) - q_1 \bar\lambda_k\, g'(x^k)^T h(\bar\lambda_k, x^k) < g(x^k).$$
If Case 2b holds, then

$$\varphi_k(\bar\lambda_k) := g(x^k - \bar\lambda_k h(\bar\lambda_k, x^k)) > g(x^k) - q_1 \bar\lambda_k\, g'(x^k)^T h(\bar\lambda_k, x^k) =: \psi_{1,k}(\bar\lambda_k).$$

On the other hand, since φ_k(0) = ψ_{1,k}(0) = g(x^k) and φ_k'(0) < ψ_{1,k}'(0), an α > 0 exists such that φ_k(λ) < ψ_{1,k}(λ), λ ∈ (0, α). The function φ_k is thus continuous and even differentiable in [0, λ̄_k], and since

$$\psi_{2,k}(\lambda) := g(x^k) - \lambda q_2\, g'(x^k)^T h(\lambda, x^k) \le \psi_{1,k}(\lambda),$$

there always exists a λ_k ∈ (0, α) ⊂ (0, λ̄_k) such that (3.2) holds, and again we get g(x^{k+1}) < g(x^k). This proves the realizability of (G1) and the inequalities for g(x^k). The existence of lim g(x^k) = L ≥ 0 follows at once from the inequality g(x) ≥ 0, which holds for all x ∈ R^n.
2) Our proof will be indirect. We assume that there exists a subsequence {x^{kj}} of {x^k} such that ||g'(x^{kj})|| ≥ η > 0, j = 1, 2, … . By Lemma 3 we then have

$$\varphi_{kj}(\lambda) \le \chi_{2,kj}(\lambda) \quad \text{and} \quad \psi_{1,kj}(\lambda) \le \psi_{2,kj}(\lambda) \quad \text{for all } \lambda \in [0, \lambda_1], \tag{3.3}$$

where φ, χ_2 and ψ_2 are the functions defined in Lemma 3, with x = x^{kj}. From (3.3), noting that

$$\varphi_{kj}(0) = \chi_{2,kj}(0) = \chi_{1,kj}(0) = \psi_{i,kj}(0) = g(x^{kj}), \qquad \chi_{1,kj}'(0) = -p_1 \|g'(x^{kj})\|^2, \quad \chi_{2,kj}'(0) = -p_2 \|g'(x^{kj})\|^2,$$

we obtain (3.4). It is clear from (3.4) that λ_{kj} cannot belong to the interval (0, λ_1), since the left-hand side of (3.2) does not hold in this interval. We thus always have

$$\lambda_{kj} \ge \min\{\underline\lambda, \lambda_1\} =: \lambda_{\min}, \qquad \lambda_{\min} > 0,$$

and hence (3.5). It follows from (3.5) that {g(x^{kj})}, and hence {g(x^k)}, is not bounded from below, which contradicts the fact that g(x) ≥ 0. Q.E.D.
It may be mentioned here that, in the method for choosing the step given by Goldstein [9, 10] for the iteration

$$x^{k+1} = x^k - \gamma_k\, p(x^k),$$

the step parameter γ_k is required to satisfy

$$g(x^k) - q_2 \gamma_k\, g'(x^k)^T p(x^k) \le g(x^k - \gamma_k p(x^k)) \le g(x^k) - q_1 \gamma_k\, g'(x^k)^T p(x^k), \tag{3.6}$$

where p(x^k) is a direction of descent of g at the point x^k, i.e., a direction for which g'(x^k)^T p(x^k) > 0.
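The two-sided Goldstein test (3.6) is easy to state in code. The quadratic model problem and the numbers below are illustrative, not from the paper; the sign convention matches (3.6), i.e., the step is x − γp with g'(x)^T p > 0.

```python
import numpy as np

def goldstein_ok(g, x, p, gamma, q1, q2, gdotp):
    """Check the two-sided Goldstein test (3.6) for the step x - gamma*p.

    gdotp = g'(x)^T p must be positive (p is a descent direction),
    and 0 < q1 < q2 < 1.
    """
    lo = g(x) - q2 * gamma * gdotp   # lower bound rules out tiny steps
    hi = g(x) - q1 * gamma * gdotp   # upper bound enforces enough decrease
    val = g(x - gamma * p)
    return lo <= val <= hi

# Illustrative quadratic: g(x) = 0.5 * ||x||^2, g'(x) = x.
g = lambda x: 0.5 * float(x @ x)
x = np.array([2.0, 0.0])
p = x.copy()                          # steepest-descent direction, g'(x)^T p = 4
ok = goldstein_ok(g, x, p, gamma=0.5, q1=0.25, q2=0.75, gdotp=4.0)
```

With γ = 0.5 both inequalities hold; a very small step such as γ = 0.01 fails the lower inequality, which is exactly the mechanism that excludes vanishing step lengths.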
As distinct from the case of (3.1) and (3.2), the step parameter γ_k appears linearly in (3.6), both in the argument of g and in the upper and lower bounds. The statements of Theorem 1 can of course still be proved (in fact, more simply) if, instead of the non-linear bounding functions ψ_{i,k}, linear bounds of this kind are employed in (3.1) and (3.2). But in general it can then no longer be shown that λ_k = 1 for k ≥ k_0, or that, given a suitable choice of λ̄_k and q_i, we can choose λ_k such that lim λ_k = 1. The corresponding method (2.1) is then no longer over-linearly convergent in the neighbourhood of a zero with certain regularity properties, as it is when (G1) is used for choosing the step (see Theorem 2).
With additional assumptions we can prove, in the usual way, that {x^k}, or at least some subsequence of it, is convergent to a stationary point of g, i.e., to some x* ∈ R^n such that g'(x*) = 0. This does not necessarily follow from our assumptions (V), as is evident, e.g., from the counter-example f(x) = e^x, m = n = 1. These assertions are, however, independent of the special algorithm, and are derived solely from the assertions that {x^k} ⊂ W(x^0) and lim g'(x^k) = 0.
An example is provided by the following.

Note. Let F: R^n → R^m be continuously differentiable on W(x^0), and let W(x^0) itself be bounded. The assertions of Theorem 1 then hold, together with:

1) The set Ω := {x ∈ W(x^0) : g'(x) = 0} of stationary points of g on W(x^0) is non-empty, and

$$\lim_{k \to \infty}\ \inf_{x \in \Omega} \|x^k - x\| = 0.$$

2) If Ω consists solely of the single point x*, the sequence {x^k} converges to this point.
Proof. Assertion 1) was quoted by Ortega and Rheinboldt [11, Section 12], while 2) follows from the well-known theorems on the minima of a convex functional; it has to be noted here that conv W(x^0) is bounded along with W(x^0).
Theorem 2
Let F satisfy condition (V), and for the sequence {x^k} derived from Theorem 1 let lim x^k = x*, where x* ∈ R^n has the following properties: F(x*) = 0, rank F'(x*) = n, and F is twice continuously differentiable on a sphere S(x*, ρ), ρ > 0.

If we then choose q_1 < 1/2 and λ̄_k = 1 for k ≥ k_0 in (G1), there will exist a subscript k_1 ≥ k_0 such that λ_k = 1 for k ≥ k_1. The iteration (2.1) thus transforms for these k into the pure Gauss-Newton method, i.e., into Eq. (1.2) with γ_k = 1, and is quadratically convergent in the following sense: with k ≥ k_1 we have

$$\|x^{k+1} - x^*\| \le c\, \|x^k - x^*\|^2, \qquad c > 0.$$
Proof. By Lemma 4, if k ≥ k_1 with k_1 ≥ k_0 sufficiently large, and ε := 1 − 2q_1 ∈ (0, 1), we have

$$g(x^k) - g(x^k - h(1, x^k)) \ge \tfrac{1}{2}(1 - \varepsilon)\, g'(x^k)^T h(1, x^k) = q_1\, g'(x^k)^T h(1, x^k).$$

Since λ̄_k = 1, (3.1) now follows, i.e., we can set λ_k = 1. The quadratic convergence also follows from Lemma 4. Q.E.D.
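The behaviour described here can be observed in an end-to-end sketch of a damped iteration of the type (2.1) on a small zero-residual problem: near the zero the full step λ = 1 is always accepted and the iteration becomes the pure Gauss-Newton method. The model problem, the simple backtracking rule (used in place of (G1)), and the forms g(x) = ½||F(x)||², h(λ, x) = [(1−λ)I + λF'(x)^T F'(x)]^{-1} F'(x)^T F(x) are illustrative reconstructions.

```python
import numpy as np

def F(x):
    # Illustrative residual with the unique zero x* = (1, 2).
    return np.array([x[0] - 1.0, x[1] - 2.0, (x[0] - 1.0) * (x[1] - 2.0)])

def J(x):
    return np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [x[1] - 2.0, x[0] - 1.0]])

def g(x):
    return 0.5 * F(x) @ F(x)

def h(lam, x):
    # Regularized direction: [(1-lam) I + lam J^T J]^{-1} J^T F.
    Jx = J(x)
    A = (1.0 - lam) * np.eye(2) + lam * (Jx.T @ Jx)
    return np.linalg.solve(A, Jx.T @ F(x))

x = np.array([3.0, -1.0])
for k in range(50):
    gp = J(x).T @ F(x)
    if np.linalg.norm(gp) < 1e-16:
        break
    lam = 1.0
    # Halve lam until a sufficient-decrease test holds (q = 0.25).
    while g(x - lam * h(lam, x)) > g(x) - 0.25 * lam * gp @ h(lam, x):
        lam *= 0.5
    x = x - lam * h(lam, x)
```

On this problem the full step is accepted from the start, and a handful of iterations drive the iterate to x* with the quadratic error reduction of the pure Gauss-Newton method.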
Our step-selection algorithm (G1) does not fix λ_k uniquely; the latter depends not only on the constants q_1 and q_2 but also on the choice of λ̄_k and on the method used for finding a λ_k ∈ (0, λ̄_k) such that (3.2) holds.
While retaining the basic principle of (G1), modifications of this algorithm can easily be devised, e.g., the following (G2):

Let α ∈ (0, 1), q ∈ (0, 1/2), and let {λ̄_k} be a given sequence such that 0 < λ̲ ≤ λ̄_k ≤ 1.

Case 1. g'(x^k) = 0. We then set λ_k = 0.

Case 2. g'(x^k) ≠ 0. We then find the smallest index i = 0, 1, …, call it j_k, for which the descent test (3.7) is satisfied, and we set λ_k = α^{j_k} λ̄_k; cf. Goldstein [10] and Armijo [12], and also Ortega and Rheinboldt ([11], pp. 491 and 493).
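The index search of (G2) can be sketched as follows. The exact form of the test (3.7) is not reproduced here; the sufficient-decrease inequality used below is an Armijo-type stand-in, and the 1-D model is illustrative.

```python
def armijo_index(phi, slope, alpha, q, lam_bar, max_iter=50):
    """Find the smallest i (the index j_k of (G2)) with

        phi(lam_bar * alpha**i) <= phi(0) - q * lam_bar * alpha**i * slope,

    where alpha in (0, 1), q in (0, 1/2), and slope = g'(x^k)^T h > 0.
    Returns (i, lam_bar * alpha**i).
    """
    for i in range(max_iter):
        lam = lam_bar * alpha ** i
        if phi(lam) <= phi(0.0) - q * lam * slope:
            return i, lam
    raise RuntimeError("no Armijo step found")

# Illustrative 1-D model: g along the direction h, phi(lam) = 0.5*(1 - lam)^2.
phi = lambda lam: 0.5 * (1.0 - lam) ** 2
i, lam = armijo_index(phi, slope=1.0, alpha=0.5, q=0.25, lam_bar=1.0)
```

Here the very first trial λ̄_k = 1 already passes the test, so j_k = 0; starting from a too-large λ̄ (say λ̄ = 4) forces two halvings before acceptance.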
We still have a free choice of λ̄_k; this can be fixed by means of the following algorithm, call it (G3):

λ̄_k = 1 for k = 0; λ̄_k = min{1, α^{-1} λ_{k-1}} for k = 1, 2, … .

With this modified algorithm, λ_k is uniquely determined once α, λ̲ and q are fixed. In essence, (G3) is the algorithm employed by Marquardt for the iteration (1.3); the main difference was that, instead of (3.7), the weaker condition of simple decrease, g(x^{k+1}) < g(x^k), was used.
Theorem 3
Let F satisfy condition (V), and starting with x^0 let {x^k} be obtained from (2.1) with the step-selection algorithm (G2) or (G3). The assertions of Theorem 1 then hold. If {x^k} is convergent to an x* ∈ R^n with the properties F(x*) = 0, rank F'(x*) = n, and F twice continuously differentiable on S(x*, ρ), ρ > 0, then an index k_0 can be found such that λ_k = 1 for k ≥ k_0 when the step is chosen in accordance with (G3), whence {x^k} is in fact quadratically convergent to x*. The same situation obtains if λ_k is chosen in accordance with (G2) and {λ̄_k} has the property λ̄_k = 1 for k ≥ k_0.
Proof. The convergence statements may be proved in the same way as in Theorem 1, since it may be seen from (3.4) that, in the case of both algorithms, if ||g'(x^{kj})|| ≥ η > 0 we always have

$$\lambda_{kj} \ge \min\{\alpha \lambda_1, \underline\lambda\}.$$

The remaining assertions are proved by using Lemma 4 and the special features of (G2) and (G3). When investigating (G3), it has to be noted that assertion 2) of Lemma 4 also holds when h(1, x) is replaced by λh(λ, x), λ ∈ [0, 1]. Q.E.D.
The advantages of our step-selection algorithms (G1), (G2), and (G3) over those of [5-8] lie in the fact that they reduce the determination of λ_k to the checking of a finite number of inequalities involving known quantities. In particular, they do not require that a function of one variable be minimized at every step. Furthermore, the sequences {x^k} determined by means of them have the favourable properties proved in Theorems 1, 2, and 3.
Translated by D. E. Brown
REFERENCES
1. POWELL, M., A method for minimizing a sum of squares of non-linear functions without
calculating derivatives, Comput. J., 7, 303-307, 1965.
2. KOWALIK, J., and OSBORNE, M., Methods for unconstrained optimization problems, Elsevier,
New York, 1968.
3. BARD, J., Comparison of gradient methods for the solution of non-linear parameter estimation
problems, SIAM J. Numer. Anal., 7, 157-186, 1970.
4. LEVENBERG, K., A method for the solution of certain non-linear problems in least squares,
Quart. Appl. Math., 2, 164-168, 1944.
6. SHANNO, D. F., An accelerated gradient projection method for linearly constrained non-linear
estimation, SIAM J. Appl. Math., 18, 322-334, 1970.
7. SHANNO, D. F., Parameter selection for modified Newton methods for function minimization,
SIAM J. Numer. Anal., 7, 366-372, 1970.
8. SHAMANSKII, V. E., A regularized Newton’s method for solving non-linear boundary value
problems, Ukr. Matem. Zh., 22, 514-526, 1970; correction, loc. cit., 23, 138, 1971.
10. GOLDSTEIN, A., Constructive real analysis, Harper and Row, New York, 1967.
11. ORTEGA, J., and RHEINBOLDT, W. C., Iterative solution of non-linear equations in several
variables, Acad. Press, New York, 1970.
12. ARMIJO, L., Minimization of functions having Lipschitz continuous first partial derivatives, Pacific J. Math., 16, 1-3, 1966.
13. ALTMAN, M., A feasible direction method for solving the non-linear programming problem,
Bull. Acad. Polon. Sci., Sér. sci. math., astron. et phys., 12, 43-50, 1964.
14. ALTMAN, M., Generalized gradient methods of minimizing a functional, Bull. Acad. Polon. Sci., Sér. sci. math., astron. et phys., 14, 313-318, 1966.