Convergence of a Regularized Gauss-Newton Method*

H. SCHWETLICK

Dresden
1. Introduction
Let F: R^n → R^m with F(x) = (f_1(x), …, f_m(x))^T, and consider the problem of the mean-square solution of the, in general incompatible, non-linear system of equations

$$F(x) = 0. \tag{1.1}$$

With m = n we obtain a particular case of this class of problems, namely the solution or mean-square solution of a non-linear system of n equations in n unknowns.
Several algorithms have been proposed for the solution of problem (1.1) (see, e.g., Powell [1], Kowalik and Osborne [2], or Bard [3]); these frequently amount to variants of the so-called damped Gauss-Newton method

$$x^{k+1} = x^k - \gamma_k\,[F'(x^k)^T F'(x^k)]^{-1} F'(x^k)^T F(x^k), \qquad \gamma_k \in (0, 1]. \tag{1.2}$$
Difficulties arise in the numerical realization of (1.2) when rank(F'(x^k)) < n, in which case F'(x^k)^T F'(x^k) is not invertible (or the corresponding linear system is not uniquely solvable). Moreover, for propositions concerning global convergence to be provable, [F'(x^k)^T F'(x^k)]^{-1} must exist and be uniformly bounded with respect to k.
These difficulties were avoided by Levenberg [4], and later by Marquardt [5], by going over to the iteration

$$x^{k+1} = x^k - [p_k I + F'(x^k)^T F'(x^k)]^{-1} F'(x^k)^T F(x^k), \tag{1.3}$$

which may be interpreted as a regularized version of (1.2); here p_k > 0 is a real parameter. Obviously, with p_k > 0 the matrix p_k I + F'(x^k)^T F'(x^k) is invertible, and by a suitable choice of p_k it becomes possible to prove convergence theorems for the iteration (1.3) (see Shanno [6, 7] and Shamanskii [8]).
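The effect of the regularization in (1.3) can be seen in a small numerical sketch (the function name and the test data below are illustrative, not from the paper): for any p > 0 the matrix p I + J^T J is invertible even when the Jacobian J is rank-deficient.

```python
import numpy as np

def lm_step(J, r, p):
    """One regularized Gauss-Newton (Levenberg-Marquardt) direction.

    Solves (p*I + J^T J) d = J^T r, which has a unique solution for
    every p > 0 even when J is rank-deficient.
    """
    n = J.shape[1]
    return np.linalg.solve(p * np.eye(n) + J.T @ J, J.T @ r)

# Rank-deficient Jacobian: J^T J is singular, yet the regularized
# linear system is still uniquely solvable.
J = np.array([[1.0, 1.0],
              [2.0, 2.0],
              [3.0, 3.0]])
r = np.array([1.0, 0.0, 1.0])
d = lm_step(J, r, p=1e-2)
```

The unregularized normal matrix J^T J here has rank 1, so the undamped step (1.2) would be undefined; the regularized step is always computable.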
The papers just quoted offered several special methods for choosing p_k, all essentially based on the principle of majorization by linear functions which Goldstein [9, 10] devised in connection with the minimization of an arbitrary functional. For the relevant iteration (1.3), global convergence theorems can be proved, similar to those for the gradient method; see Ortega and Rheinboldt [11], Section 12. If the sequence {x^k} converges to a "good" zero x* of F, the method transforms automatically, given a suitable choice of a certain parameter, into the undamped Gauss-Newton method, i.e., into the method (1.2) with γ_k = 1, and thus acquires the favourable local convergence properties of the latter, notably its over-linear convergence.
Further, let x^0 ∈ R^n be a fixed initial vector for the iteration (2.1), and let the corresponding level set be defined by W(x^0) := {x ∈ R^n : g(x) ≤ g(x^0)}. It will be said for brevity that a function F satisfies the condition (V) if F: R^n → R^m and F possesses on W(x^0) a uniformly continuous Fréchet derivative F' bounded by M > 0 (by differentiability on a set Q we understand here differentiability on an open set R ⊇ Q).
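The quantities behind the iteration (2.1) can be sketched numerically. The forms g(x) = ½||F(x)||², g'(x) = F'(x)^T F(x), A(λ, x) = (1 − λ)I + λ F'(x)^T F'(x) and h(λ, x) = A(λ, x)^{-1} g'(x) used below are reconstructions consistent with the formulas of this paper, and the map F itself is purely illustrative.

```python
import numpy as np

# Illustrative residual map F: R^2 -> R^3 and its Jacobian (not from the paper).
def F(x):
    return np.array([x[0] - 1.0, x[1] + 2.0, x[0] * x[1]])

def Fprime(x):
    return np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [x[1], x[0]]])

def g(x):
    # Least-squares functional g(x) = 0.5 * ||F(x)||^2.
    return 0.5 * F(x) @ F(x)

def gprime(x):
    # Gradient g'(x) = F'(x)^T F(x).
    return Fprime(x).T @ F(x)

def h(lam, x):
    # Regularized direction h(lam, x) = A(lam, x)^{-1} g'(x),
    # with A(lam, x) = (1 - lam) I + lam * F'(x)^T F'(x).
    J = Fprime(x)
    A = (1.0 - lam) * np.eye(len(x)) + lam * (J.T @ J)
    return np.linalg.solve(A, gprime(x))

x = np.array([0.5, -1.0])
# lam = 0 gives the gradient direction; lam = 1 gives the Gauss-Newton direction.
d0 = h(0.0, x)
```

Since A(λ, x) is symmetric positive definite for λ ∈ [0, 1) (and for λ = 1 when F'(x) has full rank), g'(x)^T h(λ, x) > 0 whenever g'(x) ≠ 0, so h is always a descent direction.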
Lemma 1
$$\frac{d}{d\lambda}\, g(x - \lambda h(\lambda, x)) = -\, g'(x - \lambda h(\lambda, x))^T A(\lambda, x)^{-2}\, g'(x).$$
Lemma 2
Let F satisfy condition (V), let x ∈ W(x^0) and λ̄ ∈ [0, 1/2], and let {x − t h(t, x) : t ∈ [0, λ̄]} ⊂ W(x^0). Then, with λ ∈ [0, λ̄], we have, by Lemma 1,

$$|g(x) - g(x - \lambda h(\lambda, x))| = \left| \int_0^{\lambda} g'(x - t\, h(t, x))^T A(t, x)^{-2}\, g'(x)\, dt \right| \le \int_0^{\lambda} 4\, \|g'(x - t\, h(t, x))\|\, \|g'(x)\|\, dt \le 4 M^2 b^2 \lambda,$$

since ||A(t, x)^{-1}|| ≤ (1 − t)^{-1} ≤ 2 for t ∈ [0, 1/2] and ||g'|| ≤ Mb on W(x^0), b being a bound for ||F|| on W(x^0). Q.E.D.
Lemma 3
Let F satisfy (V), let ~1, ~2 and q2 be given numbers such that 0<q2<~1<pz<1
and let
Then for every z~W(z“) with llg’(z) ll>q>O there exists the number h,(z) =
maxCO~a~‘lz:cp(h)~Xz(h)and ii(A)<&(A) for all h=[O, CL]), and we have
h,(z) ahi> with some AI= (0, ‘/,) independent of z.
Proof. For all λ ∈ [0, 1/2] such that x − λh(λ, x) ∈ W(x^0) all the derivatives in question exist. Since φ'(0) − χ_2'(0) < 0 and ψ_1'(0) − ψ_2'(0) < 0, we can show, by arguments similar to those used by Altman [13, 14] in proving the convergence of a special gradient method, that there exists at least one α ∈ (0, 1) such that φ(λ) ≤ χ_2(λ) and ψ_1(λ) ≤ ψ_2(λ) for λ ∈ [0, α], and that λ_1(x) is well defined, while in particular at λ_1(x) at least one of these inequalities holds with equality.
Now let ε := (η²/4)(1 − q_2)/(4Mb²), and let δ_ε > 0 be a number such that ||F'(x) − F'(u)|| ≤ ε whenever u, x ∈ W(x^0) and ||u − x|| ≤ δ_ε; further, let δ ∈ (0, δ_ε] be fixed. We now assume that no positive lower bound exists for λ_1(x). There will then be at least one x ∈ W(x^0) for which ||g'(x)|| ≥ η and 0 < λ_1 := λ_1(x) ≤ δ. By the definition of λ_1, and noting that λ_1 < 1/2, at least one of the equalities mentioned above will hold for this x. In view of Lemma 2 and the fact that ||λ_1 h(λ_1, x)|| ≤ 2Mbλ_1 ≤ δ, we have

$$\ldots \le [16 M^4 b^2 + 6 M^2 (1 + M^2) b^2]\,\delta + (1 - q_2),$$

whence
$$p_1 \|g'(x)\|^2 = -\chi_1'(\lambda_1) \le -\psi_2'(\lambda_1) = q_2\, g'(x)^T A(\lambda_1, x)^{-1} g'(x) + \ldots \le q_2 \|g'(x)\|^2 + \tfrac{1}{2}(p_1 - q_2)\|g'(x)\|^2,$$

whence it follows that p_1 ≤ q_2 + 1/2 (p_1 − q_2), which contradicts the fact that p_1 > q_2. Q.E.D.
Lemma 4

Let F(x*) = 0, rank F'(x*) = n, and let F be twice continuously differentiable on a sphere S(x*, ρ), ρ > 0. Then: 1) for every ε ∈ (0, 1] there exists a δ > 0 such that ||x − h(1, x) − x*|| ≤ ε||x − x*|| for x ∈ S(x*, δ); 2) for every ε ∈ (0, 1) there exists a δ > 0 such that g(x) − g(x − h(1, x)) ≥ ½(1 − ε) g'(x)^T h(1, x) for x ∈ S(x*, δ).
Proof. 1. Since rank(F'(x*)^T F'(x*)) = rank(F'(x*)) = n, the matrix B(x*) = F'(x*)^T F'(x*) is non-singular. There then exist numbers δ_1 ∈ (0, ρ] and L_1 > 0 such that, with x ∈ S(x*, δ_1), B(x) is non-singular and ||B(x)^{-1}|| ≤ L_1. It follows from this that

$$\|x - h(1, x) - x^*\| = \|B(x)^{-1} F'(x)^T [F(x) - F(x^*) - F'(x)(x - x^*)]\| \le \tfrac{1}{2} L_1 N_1 N_2 \|x - x^*\|^2 =: L_2 \|x - x^*\|^2,$$
where N_1 and N_2 denote bounds for F'(x) and F''(x) on S(x*, ρ). Our assertion 1) will obviously hold provided that, for a given ε ∈ (0, 1], we choose δ so as to satisfy
2. Let x ∈ S(x*, δ_2), δ_2 := 1/2 δ_1. The straight line joining x and x − h(1, x) then lies entirely in S(x*, δ_1), and in particular g will thus be twice continuously differentiable on this line. We thus have

$$g(x) - g(x - h(1, x)) = g'(x)^T h(1, x) - \tfrac{1}{2} h(1, x)^T g''(x)\, h(1, x) \tag{2.2}$$
$$\qquad -\ \tfrac{1}{2} h(1, x)^T [g''(x - \theta h(1, x)) - g''(x)]\, h(1, x) = \tfrac{1}{2} g'(x)^T h(1, x) - r(x), \qquad \theta = \theta(x) \in (0, 1),$$
$$\Delta(x) = \mathrm{diag}\,(w_i(x)), \qquad w(x) = (w_1(x), \ldots, w_m(x))^T, \qquad D(x) = (d_{ik}(x)), \quad i, k = 1, \ldots, n,$$

$$0 = r'(x) + \Delta(x) V(x)\,(x - x^*) + w'(x), \qquad 0 = r(x) + \ldots + w'(x).$$
Now let δ_3 := min{δ_2, m_1/N_2}. Then, with x ∈ S(x*, δ_3), we have

$$g(x) - g(x - h(1, x)) \ge \tfrac{1}{2}\bigl[1 - 2 N_2 L_1^2 \{N_2 \|F(x)\| + \varepsilon(\delta_2)\}\bigr]\, g'(x)^T h(1, x).$$
To solve problem (1.1), we shall consider the iteration (1.3) in the form (2.1). The step parameter λ_k ∈ [0, 1] will therefore be determined from the following algorithm, which we refer to for brevity as (G1):

Given numbers q_1, q_2, λ̲ such that 0 < q_1 ≤ q_2 < 1, 0 < λ̲ ≤ 1, and a sequence {λ̄_k} such that λ̲ ≤ λ̄_k ≤ 1.

Case 1. g'(x^k) = 0. We then set λ_k = 0.

Case 2a. g'(x^k) ≠ 0 and

$$g(x^k - \bar\lambda_k h(\bar\lambda_k, x^k)) \le g(x^k) - q_1 \bar\lambda_k\, g'(x^k)^T h(\bar\lambda_k, x^k). \tag{3.1}$$

We then set λ_k = λ̄_k.

Case 2b. If Case 2a does not hold, we choose λ_k ∈ (0, λ̄_k) in such a way that

$$g(x^k) - q_2 \lambda_k\, g'(x^k)^T h(\lambda_k, x^k) \le g(x^k - \lambda_k h(\lambda_k, x^k)) \le g(x^k) - q_1 \lambda_k\, g'(x^k)^T h(\lambda_k, x^k). \tag{3.2}$$
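The case distinction of (G1) can be sketched in a few lines. Here phi and slope stand for the abstract quantities g(x^k − λh(λ, x^k)) and g'(x^k)^T h(λ, x^k); the descending grid scan in Case 2b is only one illustrative way of locating a step satisfying the two-sided test (3.2), since the algorithm permits any admissible λ.

```python
import numpy as np

def g1_step(phi, slope, lam_bar, q1, q2):
    """Choose the step parameter lam per the two cases of (G1).

    phi(lam)   : g(x^k - lam*h(lam, x^k))
    slope(lam) : g'(x^k)^T h(lam, x^k), positive when g'(x^k) != 0
    with 0 < q1 <= q2 < 1.
    """
    # Case 2a: accept the full step lam_bar if (3.1) holds.
    if phi(lam_bar) <= phi(0.0) - q1 * lam_bar * slope(lam_bar):
        return lam_bar
    # Case 2b: scan downward for a lam meeting the two-sided test (3.2).
    for lam in lam_bar * np.linspace(0.99, 0.01, 99):
        if (phi(0.0) - q2 * lam * slope(lam)
                <= phi(lam)
                <= phi(0.0) - q1 * lam * slope(lam)):
            return float(lam)
    raise RuntimeError("no admissible step found on the grid")

# Illustrative 1-D models (not from the paper); slope is constant 1.
phi_easy = lambda lam: 0.5 * (1.0 - lam) ** 2      # full step acceptable
phi_hard = lambda lam: 0.5 - lam + 2.0 * lam ** 2  # full step overshoots
one = lambda lam: 1.0
lam_a = g1_step(phi_easy, one, 1.0, q1=0.25, q2=0.75)
lam_b = g1_step(phi_hard, one, 1.0, q1=0.25, q2=0.75)
```

For phi_easy the full step passes (3.1) and Case 2a fires; for phi_hard the overshoot forces Case 2b, and the admissible window of (3.2) works out to λ ∈ [0.125, 0.375].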
Theorem 1

Let F satisfy condition (V), and let the sequence {x^k}, starting with x^0 ∈ R^n, be calculated from (2.1), the step λ_k being chosen in accordance with (G1). Then: 1) x^k ∈ W(x^0) and g(x^{k+1}) ≤ g(x^k) for all k, and lim g(x^k) = L ≥ 0 exists; 2) lim g'(x^k) = 0.
Proof. We shall prove 1) by induction. Assume that 1) holds for l ≤ k. Then x^k ∈ W(x^0). If g'(x^k) = 0 we have nothing to prove; we therefore assume that g'(x^k) ≠ 0. If Case 2a holds, we then have g'(x^k)^T h(λ̄_k, x^k) = g'(x^k)^T A(λ̄_k, x^k)^{-1} g'(x^k) > 0, i.e.,

$$g(x^{k+1}) \le g(x^k) - q_1 \bar\lambda_k\, g'(x^k)^T h(\bar\lambda_k, x^k) < g(x^k).$$
If Case 2b holds, then

$$\varphi_k(\bar\lambda_k) := g(x^k - \bar\lambda_k h(\bar\lambda_k, x^k)) > g(x^k) - q_1 \bar\lambda_k\, g'(x^k)^T h(\bar\lambda_k, x^k) =: \psi_{1,k}(\bar\lambda_k).$$

On the other hand, since φ_k(0) = ψ_{1,k}(0) = g(x^k) and φ_k'(0) < ψ_{1,k}'(0), an α > 0 exists such that φ_k(λ) < ψ_{1,k}(λ), λ ∈ (0, α). The function φ_k is thus continuous and even differentiable in [0, λ̄_k], and since

$$\psi_{2,k}(\lambda) := g(x^k) - \lambda q_2\, g'(x^k)^T h(\lambda, x^k) \le \psi_{1,k}(\lambda),$$

there always exists a λ_k ∈ (0, α) ⊂ (0, λ̄_k) such that (3.2) holds, and again we get g(x^{k+1}) < g(x^k). This proves the realizability of (G1) and the inequalities for g(x^k). The existence of lim g(x^k) = L ≥ 0 follows at once from the inequality g(x) ≥ 0, which holds for all x ∈ R^n.
2) Our proof will be indirect. We assume that there exists a subsequence {x^{kj}} of {x^k} such that ||g'(x^{kj})|| ≥ η > 0, j = 1, 2, … . By Lemma 3 we then have

$$\varphi_{kj}(\lambda) \le \chi_{2,kj}(\lambda) \quad \text{and} \quad \psi_{1,kj}(\lambda) \le \psi_{2,kj}(\lambda) \quad \text{for all } \lambda \in [0, \lambda_1], \tag{3.3}$$

where φ, χ_2 and ψ_2 are the functions defined in Lemma 3, with x = x^{kj}. From (3.3), noting that

$$\varphi_{kj}(0) = \chi_{2,kj}(0) = \chi_{1,kj}(0) = \psi_{i,kj}(0) = g(x^{kj}), \qquad \chi_{1,kj}'(0) = -p_1 \|g'(x^{kj})\|^2, \quad \chi_{2,kj}'(0) = -p_2 \|g'(x^{kj})\|^2,$$

we obtain (3.4). It is clear from (3.4) that λ_{kj} cannot belong to the interval (0, λ_1), since the left-hand side of (3.2) does not hold in this interval. We thus always have

$$\lambda_{kj} \ge \min\{\underline\lambda, \lambda_1\} =: \lambda_{\min}, \qquad \lambda_{\min} > 0,$$

and hence (3.5). It follows from (3.5) that {g(x^{kj})}, and hence {g(x^k)}, is not bounded from below, which contradicts the fact that g(x) ≥ 0. Q.E.D.
It may be mentioned here that, in the method for choosing the step given by Goldstein [9, 10] for the iteration

$$x^{k+1} = x^k - \gamma_k\, p(x^k),$$

the step parameter γ_k is required to satisfy

$$g(x^k) - q_2 \gamma_k\, g'(x^k)^T p(x^k) \le g(x^k - \gamma_k p(x^k)) \le g(x^k) - q_1 \gamma_k\, g'(x^k)^T p(x^k), \tag{3.6}$$

where p(x^k) is a direction of descent of g at the point x^k, i.e., a direction for which g'(x^k)^T p(x^k) > 0.
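The two-sided Goldstein test (3.6) is easy to state in code. The quadratic model problem and the numbers below are illustrative, not from the paper; the sign convention matches (3.6), i.e., the step is x − γp with g'(x)^T p > 0.

```python
import numpy as np

def goldstein_ok(g, x, p, gamma, q1, q2, gdotp):
    """Check the two-sided Goldstein test (3.6) for the step x - gamma*p.

    gdotp = g'(x)^T p must be positive (p is a descent direction),
    and 0 < q1 < q2 < 1.
    """
    lo = g(x) - q2 * gamma * gdotp   # lower bound rules out tiny steps
    hi = g(x) - q1 * gamma * gdotp   # upper bound enforces enough decrease
    val = g(x - gamma * p)
    return lo <= val <= hi

# Illustrative quadratic: g(x) = 0.5 * ||x||^2, g'(x) = x.
g = lambda x: 0.5 * float(x @ x)
x = np.array([2.0, 0.0])
p = x.copy()                          # steepest-descent direction, g'(x)^T p = 4
ok = goldstein_ok(g, x, p, gamma=0.5, q1=0.25, q2=0.75, gdotp=4.0)
```

With γ = 0.5 both inequalities hold; a very small step such as γ = 0.01 fails the lower inequality, which is exactly the mechanism that excludes vanishing step lengths.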
As distinct from the case of (3.1) and (3.2), the step parameter γ_k appears linearly in (3.6), both in the argument of g and in the upper and lower bounds. The statements of Theorem 1 can of course still be proved (in fact, more simply) if, instead of the non-linear bounding functions ψ_{i,k}, linear bounds of this kind are employed in (3.1) and (3.2). But in general it can then no longer be shown that λ_k = 1 for k ≥ k_0, or that, given a suitable choice of λ̄_k and q_i, we can choose λ_k such that lim λ_k = 1. The corresponding method (2.1) is then no longer over-linearly convergent in the neighbourhood of a zero with certain regularity properties, as it is when (G1) is used for choosing the step (see Theorem 2).
With additional assumptions we can prove, in the usual way, that {x^k}, or at least some subsequence of it, is convergent to a stationary point of g, i.e., to some x* ∈ R^n such that g'(x*) = 0. This does not necessarily follow from our assumptions (V), as is evident, e.g., from the counter-example f(x) = e^x, m = n = 1. These assertions are, however, independent of the special algorithm, and are derived solely from the assertions that {x^k} ⊂ W(x^0) and lim g'(x^k) = 0.
An example is provided by the following.

Note. Let F: R^n → R^m be continuously differentiable on W(x^0), and let W(x^0) itself be bounded. The assertions of Theorem 1 then hold, together with:

1) The set Ω := {x ∈ W(x^0) : g'(x) = 0} of stationary points of g on W(x^0) is non-empty, and

$$\lim_{k \to \infty}\ \inf_{x \in \Omega} \|x^k - x\| = 0.$$

2) If Ω consists solely of the single point x*, the sequence {x^k} converges to this point.
Proof. Assertion 1) was quoted by Ortega and Rheinboldt [11, Section 12], while 2) follows from the well-known theorems on the minima of a convex functional; it has to be noted here that conv W(x^0) is bounded along with W(x^0).
Theorem 2
Let F satisfy condition (V), and for the sequence {x^k} derived from Theorem 1 let lim x^k = x*, where x* ∈ R^n has the following properties: F(x*) = 0, rank F'(x*) = n, and F is twice continuously differentiable on a sphere S(x*, ρ), ρ > 0.

If we then choose q_1 < 1/2 and λ̄_k = 1 for k ≥ k_0 in (G1), there will exist a subscript k_1 ≥ k_0 such that λ_k = 1 for k ≥ k_1. The iteration (2.1) thus transforms for these k into the pure Gauss-Newton method, i.e., into Eq. (1.2) with γ_k = 1, and is quadratically convergent in the following sense: with k ≥ k_1 we have

$$\|x^{k+1} - x^*\| \le c\, \|x^k - x^*\|^2, \qquad c > 0.$$
Proof. By Lemma 4, if k ≥ k_1 with k_1 ≥ k_0 sufficiently large, and ε := 1 − 2q_1 ∈ (0, 1), we have

$$g(x^k) - g(x^k - h(1, x^k)) \ge \tfrac{1}{2}(1 - \varepsilon)\, g'(x^k)^T h(1, x^k) = q_1\, g'(x^k)^T h(1, x^k).$$

Since λ̄_k = 1, (3.1) now follows, i.e., we can set λ_k = 1. The quadratic convergence also follows from Lemma 4. Q.E.D.
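The behaviour described here can be observed in an end-to-end sketch of a damped iteration of the type (2.1) on a small zero-residual problem: near the zero the full step λ = 1 is always accepted and the iteration becomes the pure Gauss-Newton method. The model problem, the simple backtracking rule (used in place of (G1)), and the forms g(x) = ½||F(x)||², h(λ, x) = [(1−λ)I + λF'(x)^T F'(x)]^{-1} F'(x)^T F(x) are illustrative reconstructions.

```python
import numpy as np

def F(x):
    # Illustrative residual with the unique zero x* = (1, 2).
    return np.array([x[0] - 1.0, x[1] - 2.0, (x[0] - 1.0) * (x[1] - 2.0)])

def J(x):
    return np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [x[1] - 2.0, x[0] - 1.0]])

def g(x):
    return 0.5 * F(x) @ F(x)

def h(lam, x):
    # Regularized direction: [(1-lam) I + lam J^T J]^{-1} J^T F.
    Jx = J(x)
    A = (1.0 - lam) * np.eye(2) + lam * (Jx.T @ Jx)
    return np.linalg.solve(A, Jx.T @ F(x))

x = np.array([3.0, -1.0])
for k in range(50):
    gp = J(x).T @ F(x)
    if np.linalg.norm(gp) < 1e-16:
        break
    lam = 1.0
    # Halve lam until a sufficient-decrease test holds (q = 0.25).
    while g(x - lam * h(lam, x)) > g(x) - 0.25 * lam * gp @ h(lam, x):
        lam *= 0.5
    x = x - lam * h(lam, x)
```

On this problem the full step is accepted from the start, and a handful of iterations drive the iterate to x* with the quadratic error reduction of the pure Gauss-Newton method.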
Our step-selection algorithm (G1) does not fix λ_k uniquely; the latter depends not only on the constants q_1 and q_2 but also on the choice of λ̄_k and on the method used for finding a λ_k ∈ (0, λ̄_k) such that (3.2) holds.
While retaining the basic principle of (G1), modifications of this algorithm can easily be devised, e.g., the following (G2):

Let α ∈ (0, 1), q ∈ (0, 1/2), and let {λ̄_k} be a given sequence such that 0 < λ̲ ≤ λ̄_k ≤ 1.

Case 1. g'(x^k) = 0. We then set λ_k = 0.

Case 2. g'(x^k) ≠ 0. We then find the smallest index i = 0, 1, …, call it j_k, for which the descent test (3.7) is satisfied, and we set λ_k = α^{j_k} λ̄_k; cf. Goldstein [10] and Armijo [12], and also Ortega and Rheinboldt ([11], pp. 491 and 493).
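The index search of (G2) can be sketched as follows. The exact form of the test (3.7) is not reproduced here; the sufficient-decrease inequality used below is an Armijo-type stand-in, and the 1-D model is illustrative.

```python
def armijo_index(phi, slope, alpha, q, lam_bar, max_iter=50):
    """Find the smallest i (the index j_k of (G2)) with

        phi(lam_bar * alpha**i) <= phi(0) - q * lam_bar * alpha**i * slope,

    where alpha in (0, 1), q in (0, 1/2), and slope = g'(x^k)^T h > 0.
    Returns (i, lam_bar * alpha**i).
    """
    for i in range(max_iter):
        lam = lam_bar * alpha ** i
        if phi(lam) <= phi(0.0) - q * lam * slope:
            return i, lam
    raise RuntimeError("no Armijo step found")

# Illustrative 1-D model: g along the direction h, phi(lam) = 0.5*(1 - lam)^2.
phi = lambda lam: 0.5 * (1.0 - lam) ** 2
i, lam = armijo_index(phi, slope=1.0, alpha=0.5, q=0.25, lam_bar=1.0)
```

Here the very first trial λ̄_k = 1 already passes the test, so j_k = 0; starting from a too-large λ̄ (say λ̄ = 4) forces two halvings before acceptance.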
We still have a free choice of λ̄_k; this can be fixed by means of the following algorithm, call it (G3):

λ̄_k = 1 for k = 0; λ̄_k = min{1, α^{-1} λ_{k-1}} for k = 1, 2, … .

With this modified algorithm, λ_k is uniquely determined once α, λ̲ and q are fixed. In essence, (G3) is the algorithm employed by Marquardt for the iteration (1.3); the main difference was that, instead of (3.7), the weaker condition of simple decrease, g(x^{k+1}) < g(x^k), was used.
Theorem 3
Let F satisfy condition (V), and starting with x^0 let {x^k} be obtained from (2.1) with the step-selection algorithm (G2) or (G3). The assertions of Theorem 1 then hold. If {x^k} is convergent to an x* ∈ R^n with the properties F(x*) = 0, rank F'(x*) = n, and F twice continuously differentiable on S(x*, ρ), ρ > 0, then an index k_0 can be found such that λ_k = 1 for k ≥ k_0 when the step is chosen in accordance with (G3), whence {x^k} is in fact quadratically convergent to x*. The same situation obtains if λ_k is chosen in accordance with (G2) and {λ̄_k} has the property λ̄_k = 1 for k ≥ k_0.
Proof. The convergence statements may be proved in the same way as in Theorem 1, since it may be seen from (3.4) that, in the case of both algorithms, if ||g'(x^{kj})|| ≥ η > 0 we always have

$$\lambda_{kj} \ge \min\{\alpha \lambda_1, \underline\lambda\}.$$

The remaining assertions are proved by using Lemma 4 and the special features of (G2) and (G3). When investigating (G3), it has to be noted that assertion 2) of Lemma 4 also holds when h(1, x) is replaced by λh(λ, x), λ ∈ [0, 1]. Q.E.D.
The advantages of our step-selection algorithms (G1), (G2), and (G3) over those of [5-8] lie in the fact that they reduce the determination of λ_k to the checking of a finite number of inequalities involving known quantities. In particular, they do not require that a function of one variable be minimized at every step. Furthermore, the sequences {x^k} determined by means of them have the favourable properties proved in Theorems 1, 2, and 3.
Translated by D. E. Brown
REFERENCES
1. POWELL, M., A method for minimizing a sum of squares of non-linear functions without
calculating derivatives, Comput. J., 7, 303-307, 1965.
2. KOWALIK, J., and OSBORNE, M., Methods for unconstrained optimization problems, Elsevier,
New York, 1968.
3. BARD, J., Comparison of gradient methods for the solution of non-linear parameter estimation
problems, SIAM J. Numer. Anal., 7, 157-186, 1970.
4. LEVENBERG, K., A method for the solution of certain non-linear problems in least squares,
Quart. Appl. Math., 2, 164-168, 1944.
6. SHANNO, D. F., An accelerated gradient projection method for linearly constrained non-linear
estimation, SIAM J. Appl. Math., 18, 322-334, 1970.
7. SHANNO, D. F., Parameter selection for modified Newton methods for function minimization,
SIAM J. Numer. Anal., 7, 366-372, 1970.
8. SHAMANSKII, V. E., A regularized Newton’s method for solving non-linear boundary value
problems, Ukr. Matem. Zh., 22, 514-526, 1970; correction, loc. cit., 23, 138, 1971.
10. GOLDSTEIN, A., Constructive real analysis, Harper and Row, New York, 1967.
11. ORTEGA, J., and RHEINBOLDT, W. C., Iterative solution of non-linear equations in several
variables, Acad. Press, New York, 1970.
12. ARMIJO, L., Minimization of functions having Lipschitz continuous first partial derivatives, Pacific J. Math., 16, 1-3, 1966.
13. ALTMAN, M., A feasible direction method for solving the non-linear programming problem,
Bull. Acad. Polon. Sci., Sér. sci. math., astron. et phys., 12, 43-50, 1964.
14. ALTMAN, M., Generalized gradient methods of minimizing a functional, Bull. Acad. Polon. Sci., Sér. sci. math., astron. et phys., 14, 313-318, 1966.