Global Convergence of A Trust-Region Sqp-Filter Algorithm For General Nonlinear Programming
Global Convergence of A Trust-Region Sqp-Filter Algorithm For General Nonlinear Programming
Global Convergence of A Trust-Region Sqp-Filter Algorithm For General Nonlinear Programming
minimize f (x)
subject to cE (x) = 0,
cI (x) 0,
635
FLETCHER, GOULD, LEYFFER, TOINT, AND WACHTER
636
This algorithm therefore mixes the use of the lter with sequential linear programming (SLP). This approach was generalized by the same authors in [20], where the
objective function is approximated by a quadratic model, which results in a sequential
quadratic programming (SQP) technique in which each quadratic program must be
solved globally. In this paper, we again consider approximating the objective function
by a quadratic model, but, at variance with the latter reference, the method discussed
here does not require the global solution of the associated nonconvex quadratic programming (QP) subproblem, which is known to be a theoretically dicult processit
is known to be NP hard (see Murty and Kabadi [26]). The algorithm analyzed here
also has a dierent mechanism for deciding on the compatibility of this subproblem
and allows for an approximate subproblem solution.
2. A class of trust-region SQP-lter algorithms.
2.1. An approximate SQP framework. SQP methods are iterative. At a
given iterate xk , they implicitly apply Newtons method to solve (a local version of)
the rst-order necessary optimality conditions by solving the QP subproblem QP(xk )
given by
(2.1)
where fk = f (xk ), gk = g(xk ) = x f (xk ), where AE (xk ) and AI (xk ) are the Jacobians of the constraint functions cE and cI at xk , and where Hk is a symmetric
matrix. We will not immediately be concerned about how Hk is obtained, but we will
return to this point in section 3. Assuming that a suitable matrix Hk can be found,
the solution of QP(xk ) then yields a step sk . If sk = 0, then xk is rst-order critical
for problem (1.1).
Unfortunately, due to the locally convergent nature of Newtons iteration, the
step sk may not always be very good. One possible way to cope with this diculty
is to dene an appropriate merit function whose value decreases with the goodness
of sk , which is where penalty functions typically play a role. A trust-region or a
linesearch method is then applied to minimize this merit function, ensuring global
convergence under reasonable assumptions. However, as one of our objectives is to
avoid penalty functions (and the need to update the associated penalty parameter),
we instead consider a trust-region approach that will not use any penalty function.1
In such an approach, the objective function of QP(xk ) is intended to be only of local
interest; that is, we restrict the step sk in the norm to ensure that xk + sk remains in
a trust-region centered at xk . In other words, we replace QP(xk ) by the subproblem
TRQP(xk , k ) given by
(2.2)
minimize mk (xk + s)
subject to cE (xk ) + AE (xk )s = 0,
cI (xk ) + AI (xk )s 0,
and
s k
for some (positive) value of the trust-region radius k , where we have dened
(2.3)
1
mk (xk + s) = fk + gk , s + s, Hk s
2
1 Recently, W
achter and Biegler [31] have proposed a linesearch variant of the ideas described in
this paper.
637
and where denotes the Euclidean norm. This latter choice is purely for ease
of exposition. We could equally use a family of iteration-dependent norms k , so
long as we require that all members of the family are uniformly equivalent to the
Euclidean norm. The interested reader may verify that all subsequent developments
can be adapted to this more general case by introducing the constants implied by this
uniform equivalence wherever needed.
Remarkably, most early SQP algorithms assume that an exact local solution of
QP(xk ) or TRQP(xk , k ) is found, although attempts have been made by Dembo
and Tulowitzki [8] and Murray and Prieto [25] to design conditions under which
an approximate solution of the subproblem is acceptable. We revisit this issue in
what follows, and start by noting that the step sk may be viewed as the sum of two
distinct components, a normal step nk , such that xk + nk satises the constraints
of TRQP(xk , k ), and a tangential step tk , whose purpose is to obtain reduction of
the objective functions model while continuing to satisfy those constraints. This
framework is therefore similar in spirit to the composite-step SQP methods pioneered
by Vardi [30], Byrd, Schnabel, and Shultz [5], and Omojokun [27], and later developed
by several authors, including Biegler, Nocedal, and Schmid [1], El-Alem [12, 13], Byrd,
Gilbert, and Nocedal [3], Byrd, Hribar, and Nocedal [4], Bielschowsky and Gomes [2],
Liu and Yuan [23], and Lalee, Nocedal, and Plantenga [22]. More formally, we write
(2.4)
sk = nk + tk
sk k ,
and
(2.7)
Of course, this is a strong assumption, since in particular (2.5) or (2.6)/(2.7) may not
have a solution. We shall return to this possibility shortly. Given our assumption,
there are many ways to compute nk and tk . For instance, we could compute nk from
(2.8)
nk = Pk [xk ] xk ,
where Pk is the orthogonal projector onto the feasible set of QP(xk ). In what follows,
we do not make any specic choice for nk , but we shall make the assumptions that
nk exists when the maximum violation of the nonlinear constraints at the kth iterate
def
k = (xk ), where
(2.9)
(x) = max 0, max |ci (x)|, max ci (x)
iE
iI
is suciently small, and that nk is then reasonably scaled with respect to the values
of the constraints. In other words, we assume that
(2.10)
for some constants usc > 0 and n > 0. This assumption is also used by Dennis,
El-Alem, and Maciel [9] and Dennis and Vicente [11] in the context of problems only
FLETCHER, GOULD, LEYFFER, TOINT, AND WACHTER
638
xNk = xk + nk
and observe that nk satises the constraints of TRQP(xk , k ) and thus also of QP(xk ).
It is crucial to note, at this stage, that such an nk may fail to exist because the
constraints of QP(xk ) may be incompatible, in which case Pk is undened, or because
all feasible points for QP (xk ) may lie outside the trust-region.
Let us continue to consider the case where this problem does not arise, and a
normal step nk has been found with nk k . We then have to nd a tangential
step tk , starting from xNk and satisfying (2.6) and (2.7), with the aim of decreasing the
value of the objective function. As always in trust-region methods, this is achieved by
computing a step that produces a sucient decrease in mk , which is to say that we
wish mk (xNk ) mk (xk + sk ) to be suciently large. Of course, this is only possible
if the maximum size of tk is not too small, which is to say that xNk is not too close to
the trust-region boundary. We formalize this supposition by replacing our condition
that nk k with the stronger requirement that
(2.12)
nk k min[1, k ]
for some (0, 1], some > 0, and some (0, 1). If condition (2.12) does
not hold, we assume that the computation of tk is unlikely to produce a satisfactory
decrease in mk , and proceed just as if the feasible set of TRQP(xk , k ) were empty. If
nk can be computed and (2.12) holds, we shall say that TRQP(xk , k ) is compatible.
In this case at least a sucient model decrease seems possible, which we state in the
form of a familiar Cauchy-point condition. In order to formalize what we mean, we
639
recall that the feasible set of QP(xk ) is convex, and we can therefore introduce the
measure
k =
(2.13)
min
gk + Hk nk , t
AE (xk )t=0
cI (xk )+AI (xk )(nk +t)0
t 1
(see Conn et al. [6]), which we will use to deduce rst-order criticality for our problem
(see Lemma 3.2). Note that this function is zero if the origin is a rst-order critical
point of the tangential problem
(2.14)
FLETCHER, GOULD, LEYFFER, TOINT, AND WACHTER
640
min (x),
xRn
perhaps starting from xk , the current iterate. This is a nonsmooth problem, but we
know that there exist methods, possibly of trust-region type (such as that suggested
by Yuan [32]), which can be successfully applied to solve it. Thus we will not describe
the restoration procedure in detail. Note that we have chosen here to reduce the
innity norm of the constraint violation, but we could equally well consider other
norms, such as 1 or 2 , in which case the methods of Fletcher and Leyer [17] or
of El-Hallabi and Tapia [14] and Dennis, El-Alem, and Williamson [10], respectively,
can be considered. Of course, this technique only guarantees convergence to a rstorder critical point of the chosen measure of constraint violation, which means that,
in fact, the restoration procedure may fail as this critical point may not be feasible
for the constraints of (1.1). However, even in this case, the result of the procedure
is of interest because it typically produces a local minimizer of (x), or of whatever
other measure of constraint violation we choose for the restoration, yielding a point
of locally least infeasibility.
There is no easy way to circumvent this drawback, as it is known that nding a
feasible point or proving that no such point exists is a global optimization problem
and can be as dicult as the optimization problem (1.1) itself. We therefore accept
two possible outcomes of the restoration procedure: either the procedure fails in that
it does not produce a sequence of iterates converging to feasibility, or a point xk + rk
is produced such that (xk + rk ) is as small as we wish. We will shortly see that this
is all we need.
2.2. The notion of a lter. Having computed a step sk = nk + tk (or rk ), we
still need to decide whether the trial point xk + sk (or xk + rk ) is any better than
xk as an approximate solution to our original problem (1.1). We shall use a concept
borrowed from multicriteria optimization. We say that a point x1 dominates a point
x2 whenever
(x1 ) (x2 ) and f (x1 ) f (x2 ).
Thus, if iterate xk dominates iterate xj , the latter is of no real interest to us since xk
is at least as good as xj on account of both feasibility and optimality. All we need to
do now is to remember iterates that are not dominated by any other iterates using a
structure called a lter. A lter is a list F of pairs of the form (i , fi ) such that either
i < j or fi < fj
for i = j. We thus aim to accept a new iterate xi only if it is not dominated by
any other iterate in the lter. In the vocabulary of multicriteria optimization, this
amounts to building elements of the ecient frontier associated with the bicriteria
problem of reducing infeasibility and the objective function value.
Figure 2.1 illustrates the concept of a lter by showing the pairs (k , fk ) as black
dots in the (, f )-space. Each such pair is called the (, f )-pair associated with xk .
The lines radiating from each (, f )-pair indicate that any iterate whose associated
(, f )-pair occurs above and to the right of that of a given lter point is dominated
by this (, f )-pair.
While the idea of not accepting dominated trial points is simple and elegant,
it needs to be rened a little in order to provide an ecient algorithmic tool. In
641
f (x)
(x)
r
r
for some (0, 1). In Figure 2.1, the set of acceptable points corresponds to the set
of (, f )-pairs below the thin line. We also say that x is acceptable for the lter and
xk if (2.17) holds with F replaced by F (k , fk ). We thus consider moving from
xk to xk + sk only if xk + sk is acceptable for the lter and xk .
As the algorithm progresses, we may want to add a (, f )-pair to the lter. If an
iterate xk is acceptable for F, we do this by adding the pair (k , fk ) to the lter and
by removing from it every other pair (j , fj ) such that both
(2.18)
j k and fj j fk k .
Only entries whose envelope is dominated by a new entry are thus removed from
the lter. As a consequence, the margin of the lter never decreases, and it can be
shown that, for all innite subsequences of points added to the lter, lim ki = 0 (see
Lemma 3.3). We also refer to this operation as adding xk to the lter, although,
strictly speaking, it is the (, f )-pair which is added.
We conclude this section by noting that, if a point xk is in the lter or is acceptable
for the lter, then any other point x such that
(x) (1 )k and f (x) fk k
is also acceptable for the lter and xk .
642
mk (xk ) mk (xk + sk ) k
and
(2.20)
def
k =
f (xk ) f (xk + sk )
< 1 ,
mk (xk ) mk (xk + sk )
but it is too early to know if these are even close to the best possible choices.
643
Observe rst that, by construction, every iterate xk must be acceptable for the
lter at the beginning of iteration k, irrespective of the possibility that it might be
added to the lter later. Also note that the restoration step rk cannot be zero, that
is, restoration cannot simply entail enlarging the trust-region radius to ensure (2.12),
even if nk exists. This is because xk is added to the lter before rk is computed,
and xk + rk must be acceptable for the lter which now contains xk . Also note
that the restoration procedure cannot be applied on two successive iterations, since
the iterate xk + rk produced by the rst of these iterations leads to a compatible
TRQP(xk+1 , k+1 ) and is acceptable for the lter.
For the restoration procedure in Step 2 to succeed, we have to evaluate whether
TRQP(xk + rk , k+1 ) is compatible for a suitable value of k+1 . This requires that a
suitable normal step be computed which successfully passes the test (2.12). Of course,
once this is achieved, this normal step may be reused at iteration k + 1. Thus we shall
require that the normal step calculated to verify compatibility of TRQP(xk +rk , k+1 )
should actually be used as nk+1 .
As it stands, the algorithm is not specic about how to choose k+1 during
a restoration iteration. On one hand, there is an advantage to choosing a large
k+1 , since this allows a large step and, one hopes, good progress. On the other
hand, it may be unwise to choose it to be too large, as this may possibly result in
a large number of unsuccessful iterations, during which the radius is reduced, before
the algorithm can make any progress. A possible choice might be to restart from
the radius obtained during the restoration iteration itself, if it uses a trust-region
method. Reasonable alternatives would be to use the average radius observed during
past successful iterations, or to apply the internal doubling strategy of Byrd, Schnabel,
and Shultz [5] to increase the new radius, or even to consider the technique described
by Sartenaer [28]. However, we recognize that numerical experience with the algorithm
is too limited at this stage to make denite recommendations.
The role of condition (2.19) may be interpreted as follows. If this condition fails,
then one may think that the constraint violation is signicant and that one should
aim to improve on this situation in the future by inserting the current point into the
lter. Fletcher, Leyer, and Toint [19] use the term of -step in such circumstances
to indicate that the main preoccupation is to improve feasibility. On the other hand,
if condition (2.19) holds, then the reduction in the objective function predicted by the
model is more signicant than the current constraint violation, and it is thus appealing
to let the algorithm behave as if it were unconstrained. Fletcher and Leyer [18] use
the term f -step to denote the step generated, in order to reect the dominant role
of the objective function f . In this case, it is important that the predicted decrease
in the model be realized by the actual decrease in the function, which is why we also
require that (2.20) not hold. In particular, if the iterate xk is feasible, then (2.10)
implies that xk = xNk , and we obtain that
(2.21)
As a consequence, the lter mechanism is irrelevant if all iterates are feasible, and
the algorithm reduces to a classical unconstrained trust-region method. Another
consequence of (2.21) is that no feasible iterate is ever included in the lter, which
is crucial in allowing nite termination of the restoration procedure. Indeed, if the
restoration procedure is required at iteration k of the lter algorithm and produces a
sequence of points {xk,j } converging to feasibility, there must be an iterate xk,j for
644
which
def
k,j = (xk,j ) min (1 )kmin ,
k+1 min[1, k+1 ]
usc
min i > 0
iZ, ik
and
Z = {k | xk is added to the lter}.
Moreover, k,j must eventually be small enough to ensure, using our assumption on
the normal step, the existence of a normal step nk,j from xk,j . In other words, the
restoration iteration must eventually nd an iterate xk,j which is acceptable for the
lter and for which the normal step exists and satises (2.12), i.e., an iterate xj which
is both acceptable and compatible. As a consequence, the restoration procedure will
terminate in a nite number of steps, and the lter algorithm may then proceed. Note
that the restoration step may not terminate in a nite number of iterations if we do
not assume the existence of the normal step when the constraint violation is small
enough, even if this violation converges to zero (see Fletcher, Leyer, and Toint [19],
for an example).
Notice also that (2.19) ensures that the denominator of k in (2.20) will be strictly
positive whenever k is. If k = 0, then xk = xNk , and the denominator of (2.20) will
be strictly positive unless xk is a rst-order critical point because of (2.15).
The reader may have observed that Step 6 allows a relatively wide choice of the
new trust-region radius k+1 . While the stated conditions appear to be sucient
for the theory developed below, one must obviously be more specic in practice. For
instance, one may wish to distinguish, at this point in the algorithm, the cases where
(2.19) fails or holds. If (2.19) fails, the main eect of the current iteration is not
to reduce the model (which makes the value of k essentially irrelevant), but rather
to reduce the constraint violation (which is taken care of by inserting the current
iterate into the lter at Step 5). In this case, Step 6 imposes no further restriction on
k+1 . In practice, it may be reasonable not to reduce the trust-region radius, because
this might cause too small steps towards feasibility or an unnecessary restoration
phase. However, there is no compelling reason to increase the radius either, given
the compatibility of TRQP(xk ,k ). A reasonable strategy might then be to choose
k+1 = k . If, on the other hand, (2.19) holds, the emphasis of the iteration is then
on reducing the objective function, a case akin to unconstrained minimization. Thus
a more detailed rule of the type
[1 k , 2 k ]
if k [1 , 2 ),
k+1
if k 2
[k , 2 k ]
seems reasonable in these circumstances.
Finally, we recognize that (2.15) may be dicult to verify in practice, since it
may be expensive to compute xNk and Pk when the dimension of the problem is large.
3. Convergence to rst-order critical points. We now prove that our algorithm generates a globally convergent sequence of iterates. In the following analysis,
we concentrate on the case in which the restoration iteration always succeeds. If this
645
is not the case, then it usually follows that the restoration phase has converged to an
approximate solution of the feasibility problem (2.16) and we can conclude that (1.1)
is locally inconsistent. For the purpose of our analysis, we shall consider
S = {k | xk+1 = xk + sk },
the set of (indices of) successful iterations, and
R = {k | nk does not satisfy (2.10)
the set of restoration iterations. In order to obtain our global convergence result, we
will use the following assumptions.
AS1. f and the constraint functions cE and cI are twice continuously dierentiable.
AS2. There exists umh > 1 such that
Hk umh 1 for all k.
AS3. The iterates {xk } remain in a closed, bounded domain X Rn .
If, for example, Hk is chosen as the Hessian of the Lagrangian function
(x, y) = f (x) + yE , cE (x) + yI , cI (x)
at xk , in that
(3.1)
Hk = xx f (xk ) +
[yk ]i xx ci (xk ),
iEI
where [yk ]i denotes the ith component of the vector of Lagrange multipliers ykT =
T
T
(yE,k
yI,k
), then we see from AS1 and AS3 that AS2 is satised when these multipliers
remain bounded. The same is true if the Hessian matrices in (3.1) are replaced by
bounded approximations.
A rst immediate consequence of AS1AS3 is that there exists a constant ubh > 1
such that, for all k,
(3.2)
A proof of this property, based on Taylor expansion, may be found, for instance, in
Toint [29]. A second important consequence of our assumptions is that AS1 and AS3
together directly ensure that, for all k,
(3.3)
for some constants f min and max > 0. Thus the part of the (, f )-space in which the
(, f )-pairs associated with the lter iterates lie is restricted to the rectangle
A0 = [0, max ] [f min , ].
We also note the following simple consequence of (2.10) and AS3.
Lemma 3.1. Suppose that Algorithm 2.1 is applied to problem (1.1). Suppose also
that (2.10) and AS3 hold and that
k n .
FLETCHER, GOULD, LEYFFER, TOINT, AND WACHTER
646
(3.4)
Proof. First dene
def
{j I | k = cj (xk )},
for all j Vk . But AS3 ensures that there exists a constant lsc > 0 such that
def
jEI xX
1
.
lsc
We then obtain the desired conclusion by substituting this bound into (3.5).
Our assumptions and the denition of k in (2.13) ensure that k and k can be
used (together) to measure criticality for problem (1.1).
Lemma 3.2. Suppose that Algorithm 2.1 is applied to problem (1.1) and that nite
termination does not occur. Suppose also that AS1 and AS3 hold, and that there exists
a subsequence {ki } such that, for any i, ki R with
(3.6)
Then every limit point of the subsequence {xki } is a rst-order critical point for problem (1.1).
Proof. Consider x , a limit point of the sequence {xki }, whose existence is ensured
by AS3, and assume that {k } {ki } is the index set of a subsequence such that {xk }
converges to x . The fact that k R implies that nk satises (2.10) for suciently
large and converges to zero, because {k } converges to zero and the second part of
this condition. As a consequence, we deduce from (2.11) that {xNk } also converges to
x . Since the minimization problem occurring in the denition of k (in (2.13)) is
convex, we then obtain from classical perturbation theory (see, for instance, Fiacco
[15, pp. 1417], AS1, and the rst part of (3.6) that
= 0.
min
g
,
t
A (x )t=0
E
cI (x )+AI (x )t0
t 1
This in turn guarantees that x is rst-order critical for problem (1.1).
We start our analysis by examining what happens when an innite number of
iterates (that is, their (, f )-pairs) are added to the lter.
Lemma 3.3. Suppose that Algorithm 2.1 is applied to problem (1.1) and that nite
termination does not occur. Suppose that AS1 and AS3 hold and that |Z| = . Then
lim k = 0.
k
kZ
647
Proof. Suppose, for the purpose of obtaining a contradiction, that there exists an
innite subsequence {ki } Z such that
ki %
(3.7)
for all i and for some % > 0. At each iteration ki , the (, f )-pair associated with xki ,
that is (ki , fki ), is added to the lter. This means that no other (, f )-pair can be
added to the lter at a later stage within the square
[ki %, ki ] [fki %, fki ]
or with the intersection of this square with A0 . Note that this holds, even if (ki , fki )
is later removed from the lter, since the rule for removing entries, (2.18), ensures
that the envelope never shrinks. Now observe that the area of each of these squares
is 2 %2 . As a consequence, the set A0 {(, f )|f f } is completely covered by at
most a nite number of such squares, for any choice of f f min . Since the pairs
(ki , fki ) keep on being added to the lter, this implies that fki tends to innity when
i tends to innity. Let us assume, without loss of generality, that fki+1 fki for all i
suciently large. But (2.17) and (3.7) then imply that
ki+1 (1 )ki ki %,
and therefore that ki converges to zero, which contradicts (3.7). Hence this latter
assumption is imposssible and the conclusion follows.
We next examine the size of the constraint violation before and after an iteration
where restoration did not occur.
Lemma 3.4. Suppose that Algorithm 2.1 is applied to problem (1.1) and that nite
termination does not occur. Suppose also that AS1 and AS3 hold, that k R, and
that nk satises (3.4). Then
k ubt 1+
k
(3.8)
and
(3.9)
(xk + sk ) ubt 2k
lsc k nk 1+
,
k
which gives (3.8). Now, the ith constraint function at xk + sk can be expressed as
1
ci (xk + sk ) = ci (xk ) + ei , Ak sk + sk , xx ci (k )sk
2
for i E I, where we have used AS1 and the mean-value theorem and where k
belongs to the segment [xk , xk + sk ]. Using AS3, we may bound the Hessian of the
constraint functions, and we obtain from (2.7), the CauchySchwarz inequality, and
(2.6) that
|ci (xk + sk )|
1
max xx ci (x) sk 2 1 2k
2 xX
FLETCHER, GOULD, LEYFFER, TOINT, AND WACHTER
648
if i E, or
1
max xx ci (x) sk 2 1 2k
2 xX
ci (xk + sk )
if i I, where we have dened
def
1 =
1
max max xx ci (x).
2 iEI xX
(3.11)
for some % > 0, and that
(3.12)
ubg
k min
, 2
umh
umh
1
1+
tmd %
4ubg
def
= m ,
def
1
tmd %k .
2
We thus conclude from this last inequality and (3.13) that the desired conclusion
holds.
We continue our analysis by showing, as the reader has grown to expect, that
iterations have to be very successful when the trust-region radius is suciently small.
649
Lemma 3.6. Suppose that Algorithm 2.1 is applied to problem (1.1) and that nite
termination does not occur. Suppose also that AS1AS3, (2.15), and (3.11) hold, that
k R, and that
(1 2 )tmd % def
(3.14)
= .
k min m ,
2ubh
Then
k 2 .
Proof. Using the denition of k in (2.20), (3.2), Lemma 3.5, and (3.14), we nd
that
|k 1| =
ubh 2k
|f (xk + sk ) mk (xk + sk )|
1
1 2 ,
|mk (xk ) mk (xk + sk )|
2 tmd %k
def
k =
f (xk ) f (xk + sk ) + k
,
mk (xk ) mk (xk + sk )
k
ubt k
1
tmd %k mk (xk ) mk (xk + sk ),
2
where we have successively used Lemma 3.4, (3.16), and Lemma 3.5.
We may also guarantee a decrease in the objective function, large enough to ensure
that the trial point is acceptable with respect to the (, f )-pair associated with xk , so
long as the constraint violation is itself suciently small.
Lemma 3.8. Suppose that Algorithm 2.1 is applied to problem (1.1) and that nite
termination does not occur. Suppose also that AS1AS3, (2.15), (3.11), and (3.14)
hold, that k R, that nk satises (3.4), and that
(3.17)
k ubt
2 tmd %
2
1+
def
= .
650
Then
f (xk + sk ) f (xk ) k .
Proof. Applying Lemmas 3.43.6which is possible because of (3.11), (3.14),
k R, and the fact that nk satises (3.4)and (3.17), we obtain that
f (xk ) f (xk + sk ) 2 [mk (xk ) mk (xk + sk )]
12 2 tmd %k
1
1+
k
12 2 tmd % ubt
k ,
and the desired inequality follows.
We now establish that if the trust-region radius and the constraint violation are
both small at a noncritical iterate xk , TRQP(xk , k ) must be compatible.
Lemma 3.9. Suppose that Algorithm 2.1 is applied to problem (1.1) and that nite
termination does not occur. Suppose also that AS1AS3, (2.10), and (3.11) hold, that
(2.15) holds for k
/ R, and that
1
1
2
1
0 (1 ) 1 def
k min 0 ,
(3.18)
,
= R .
usc ubt
Suppose furthermore that k > 0 and that
(3.19)
k min[ , n ].
Then k R.
Proof. Because k n , we know from (2.10) and Lemma 3.1 that nk satises
(2.10) and (3.4). Moreover, since k , we have that (3.17) also holds. Assume,
for the purpose of deriving a contradiction, that k R, that is,
(3.20)
nk > 1+
,
k
where we have used (2.12) and the fact that k 1 because of (3.18). In this
case, the mechanism of the algorithm then ensures that k 1 R. Now assume that
iteration k 1 is unsuccessful. Because of Lemmas 3.6 and 3.8, which hold at iteration
k 1 R because of (3.18), the fact that k = k1 , (2.10), and (3.17), we obtain
that
(3.21)
Hence, given that xk1 is acceptable for the lter at the beginning of iteration k 1,
if this iteration is unsuccessful, it must be because xk1 + sk1 is not acceptable for
the lter and xk1 , which in turn can happen only if
(xk1 + sk1 ) > (1 )k1 = (1 )k
because of (3.21) (see the last paragraph of section 2.2). But Lemma 3.4 and the
mechanism of the algorithm then imply that
(1 )k ubt 2k1
ubt 2
.
02 k
651
Combining this last bound with (3.20) and (2.10), we deduce that
< nk usc k
1+
k
usc ubt
2
) k
02 (1
1
>
k
Since this last inequality contradicts (3.18), our assumption that iteration k 1 is
unsuccessful must be false. Thus iteration k 1 is successful and k = (xk1 + sk1 ).
We then obtain from (3.20), (2.10), and (3.9) that
< nk usc k usc ubt 2k1
1+
k
usc ubt 2
k ,
02
which is again impossible because of (3.18) and because (1 ) < 1. Hence our
initial assumption (3.20) must be false, which yields the desired conclusion.
We now distinguish two mutually exclusive cases. For the rst, we consider what
happens if there is an innite subsequence of iterates belonging to the lter.
Lemma 3.10. Suppose that Algorithm 2.1 is applied to problem (1.1) and that
nite termination does not occur. Suppose also that AS1AS3, (2.10) hold, and (2.15)
holds for k
/ R. Suppose furthermore that |Z| = . Then there exists a subsequence
{kj } Z such that
(3.22)
lim kj = 0
and
lim kj = 0.
(3.23)
Proof. Let {ki } be any innite subsequence of Z. We observe that (3.22) follows
from Lemma 3.3. Suppose now that
ki %2 > 0
(3.24)
for all i and some %2 > 0. Suppose furthermore that there exists %3 > 0 such that, for
all i i0 ,
ki %3 .
(3.25)
(3.26)
Thus (3.25) ensures that (2.12) holds for suciently large i and thus ki R for such
i. Now, as we noted in the proof of Lemma 3.5,
1
|mki (xki ) mki (xNki )| ubg nki + umh nki 2 ,
2
which in turn, with (3.26), yields that
(3.27)
652
(3.28)
%2
+ ski ) tmd %2 min
, %3
umh
def
= > 0.
We now decompose the model decrease in its normal and tangential components, that
is,
mki (xki ) mki (xki + ski ) = mki (xki ) mki (xNki ) + mki (xNki ) mki (xki + ski ).
Substituting (3.27) and (3.28) into this decomposition, we nd that
(3.29)
We now observe that, because xki is added to the lter at iteration ki , we know from
the mechanism of the algorithm that either iteration ki R or (2.19) must fail. Since
we have already shown that ki R, (2.19) must fail for i suciently large, that is,
(3.30)
Combining this bound with (3.29), we nd that ki is bounded away from zero for
i suciently large, which is impossible in view of (3.22). We therefore deduce that
(3.25) cannot hold and obtain that there is a subsequence {k } {ki } for which
lim k = 0.
We now restrict our attention to the tail of this subsequence, that is, to the set of
indices k > 0 that are large enough to ensure that (3.16), (3.17), and (3.18) hold,
which is possible by denition of the subsequence and because of (3.22). For these
indices, we may therefore apply Lemma 3.9 and deduce that iteration k R for
suciently large. Hence, as above, (3.30) must hold for suciently large. However,
we may also apply Lemma 3.7, which contradicts (3.30), and therefore (3.24) cannot
hold, yielding the desired result.
Thus, if an innite subsequence of iterates is added to the lter, Lemma 3.2
ensures that there exists a limit point which is a rst-order critical point. Our remaining analysis then naturally concentrates on the possibility that there may be no
such innite subsequence. In this case, no further iterates are added to the lter for k
suciently large. In particular, this means that the number of restoration iterations,
|R|, must be nite. In what follows, we assume that k0 0 is the last iteration for
which xk0 1 is added to the lter.
Lemma 3.11. Suppose that Algorithm 2.1 is applied to problem (1.1), that nite
termination does not occur, and that |Z| < . Suppose also that AS1AS3 and (2.10)
hold and that (2.15) holds for k
/ R. Then we have that
(3.31)
lim k = 0.
653
Thus the objective function does not increase for all successful iterations with k k0 .
But AS1 and AS3 imply (3.3), and therefore we must have, from the rst part of this
statement, that
(3.33)
kS
The limit (3.31) then immediately follows from (3.32) and the fact that j = k for
all unsuccessful iterations j that immediately follow the successful iteration k, if any.
The last conclusion then results from (2.10) and Lemma 3.1.
We now show that the trust-region radius cannot become arbitrarily small if the
(asymptotically feasible) iterates stay away from rst-order critical points.
Lemma 3.12. Suppose that Algorithm 2.1 is applied to problem (1.1), that nite
termination does not occur, and that |Z| < . Suppose also that AS1AS3 hold and
(2.15) holds for k
/ R. Suppose furthermore that (3.11) hold for all k k0 . Then
there exists a min > 0 such that
k min
for all k.
Proof. Suppose that k1 k0 is chosen suciently large to ensure that (3.19) holds
and that nk satises (2.10) for all k k1 , which is possible because of Lemma 3.11.
Suppose also, for the purpose of obtaining a contradiction, that iteration j is the rst
iteration following iteration k1 for which
F
(1
)
def
j 0 min ,
(3.34)
, k1 = 0 s ,
ubt
where
def
F = min i
iZ
is the smallest constraint violation appearing in the lter. Note also that the inequality
j 0 k1 , which is implied by (3.34), ensures that j k1 + 1 and hence that
j 1 k1 and thus that j 1 R. Then the mechanism of the algorithm and (3.34)
imply that
(3.35)
j1
1
j s ,
0
and Lemma 3.6, which is applicable because (3.34) and (3.35) together imply (3.14)
with k replaced by j 1, then ensures that
(3.36)
j1 2 .
Furthermore, since nj1 satises (2.10), Lemma 3.1 implies that we can apply Lemma 3.4.
This, together with (3.34) and (3.35), gives that
(3.37)
We may also apply Lemma 3.8 because (3.34) and (3.35) ensure that (3.14) holds and
because (3.17) also holds for j 1 k1 . Hence we deduce that
f (xj1 + sj1 ) f (xj1 ) j1 .
654
This last relation and (3.37) ensure that xj1 + sj1 is acceptable for the lter and
xj1 . Combining this conclusion with (3.36) and the mechanism of the algorithm, we
obtain that j j1 . As a consequence, and since (2.19) also holds at iteration
j 1, iteration j cannot be the rst iteration following k1 for which (3.34) holds. This
contradiction shows that k 0 s for all k > k1 , and the desired result follows if
we dene
min = min[0 , . . . , k1 , 0 s ].
We may now analyze the convergence of k itself.
Lemma 3.13. Suppose that Algorithm 2.1 is applied to problem (1.1), that nite
termination does not occur, and that |Z| < . Suppose also that AS1AS3, (2.10)
hold, and (2.15) holds for k
/ R. Then
lim inf k = 0.
(3.38)
Proof. We start by observing that Lemma 3.11 implies that the second conclusion
of (2.10) holds for k suciently large. Moreover, as in Lemma 3.11, we obtain (3.32)
and therefore (3.33) for each k S, k k0 . Suppose now, for the purpose of obtaining
a contradiction, that (3.11) holds, and notice that
(3.39) mk (xk ) mk (xk + sk ) = mk (xk ) mk (xNk ) + mk (xNk ) mk (xk + sk ).
Moreover, note, as in Lemma 3.5, that
|mk (xk ) mk (xNk )| ubg nk + umh nk 2 ,
which in turn yields that
lim [mk (xk ) mk (xNk )] = 0
because of Lemma 3.11 and the second conclusion of (2.10). This limit, together with
(3.32), (3.33), and (3.39), then gives that
(3.40)
k
kS
But (2.15), (3.11), AS2, and Lemma 3.12 together imply that for all k k0
655
lim kj = lim kj = 0.
for some value of f min and simply monitor that the values f (xk ) are
reasonablein view of the problem being solvedas the algorithm proceeds. To
summarize, we may replace AS1 and AS3 by the following assumption.
AS4. The functions f and c are twice continuously dierentiable on an open
set containing C(max ), their rst and second derivatives are uniformly bounded on
C(max ), and (3.43) holds.
The reader should note that AS4 no longer ensures the existence of a limit point,
but only that (3.42) holds for some subsequence {kj }. Furthermore, the comments
following the statement of (2.10) no longer apply if limit points at innity are allowed.
4. Conclusion and perspectives. We have introduced a trust-region SQPlter algorithm for general nonlinear programming and have shown this algorithm to
be globally convergent to rst-order critical points. The proposed algorithm diers
from that discussed by Fletcher and Leyer [18], notably because it uses a decomposition of the step in its normal and tangential components and imposes some restrictions
on the length of the former. However, preliminary numerical experiments indicate that
its practical performance is similar to that reported in [18]. Since the performance
of the latter is excellent, the theory developed in this paper provides the reassurance
that lter algorithms also have reasonable convergence properties, which then makes
these methods very attractive.
We are aware, however, that the convergence study is not complete, as we have
not discussed local convergence properties. As it is possible to exhibit examples where
the SQP step increases both the objective function and the constraint violation,2 it is
2 Such an example is provided by Figure 9.3.1 in Fletcher [16], taking the case =
feasible point close to the origin illustrates the eect.
1
4
. Any
656
very likely that such a study will have to introduce second-order corrections (see [16,
section 14.4]) to ensure that the Maratos eect does not take place and that a fast
(quadratic) rate of convergence can be achieved. Moreover, convergence to secondorder critical points also remains, for now, an open question. In this context, the
alternative denition of k presented in (3.15) is also likely to play a role if we choose
Hk according to (3.1). In this case, we might choose
k =
[yk ]i sk , xx ci (xk )sk
iEI
in order to ensure that the denominator of the fraction dening k is a correct model of
its numerator not only up to rst-order, but also up to second-order. These questions
are the subject of ongoing work.
657
Appendix.
initialization (k = 0)
attempt to compute nk
TRQP(xk , k ) compatible?
n
compute tk
xk + sk acceptable?
y
n
y
k < 1 and
mk (xk ) mk (xk + sk )
mk (xk ) mk (xk + sk ) k
?
xk+1 = xk + rk
xk+1 = xk + sk
increase k k+1
xk+1 = xk
reduce k
k+1
658
[1] L. T. Biegler, J. Nocedal, and C. Schmid, A reduced Hessian method for large-scale constrained optimization, SIAM J. Optim., 5 (1995), pp. 314347.
[2] R. H. Bielschowsky and F. A. M. Gomes, Dynamical Control of Infeasibility in Nonlinearly
Constrained Optimization, presentation at the Optimization 98 Conference, University of
Coimbra, Portugal, 1998.
[3] R. H. Byrd, J. C. Gilbert, and J. Nocedal, A trust region method based on interior point
techniques for nonlinear programming, Math. Program., 89 (2000), pp. 149186.
[4] R. H. Byrd, M. E. Hribar, and J. Nocedal, An interior point algorithm for large-scale
nonlinear programming, SIAM J. Optim., 9 (2000), pp. 877900.
[5] R. H. Byrd, R. B. Schnabel, and G. A. Shultz, A trust region algorithm for nonlinearly
constrained optimization, SIAM J. Numer. Anal., 24 (1987), pp. 11521170.
[6] A. R. Conn, N. Gould, A. Sartenaer, and Ph. L. Toint, Global convergence of a class of
trust region algorithms for optimization using inexact projections on convex constraints,
SIAM J. Optim., 3 (1993), pp. 164221.
[7] A. R. Conn, N. I. M. Gould, and P. L. Toint, Trust-Region Methods, MPS-SIAM Ser.
Optim. 1, SIAM, Philadelphia, 2000.
[8] R. S. Dembo and U. Tulowitzki, On the Minimization of Quadratic Functions Subject to
Box Constraints, Working paper, Series B, 71, School of Organization and Management,
Yale University, New Haven, CT, 1983.
[9] J. E. Dennis, Jr., M. El-Alem, and M. C. Maciel, A global convergence theory for general
trust-region based algorithms for equality constrained optimization, SIAM J. Optim., 7
(1997), pp. 177207.
[10] J. E. Dennis, Jr., M. El-Alem, and K. Williamson, A trust-region approach to nonlinear
systems of equalities and inequalities, SIAM J. Optim., 9 (1999), pp. 291315.
[11] J. E. Dennis and L. N. Vicente, On the convergence theory of trust-region-based algorithms
for equality-constrained optimization, SIAM J. Optim., 7 (1997), pp. 927950.
[12] M. El-Alem, Global convergence without the assumption of linear independence for a trustregion algorithm for constrained optimization, J. Optim. Theory Appl., 87 (1995), pp. 563
577.
[13] M. El-Alem, A global convergence theory for Dennis, El-Alem, and Maciels class of trustregion algorithms for constrained optimization without assuming regularity, SIAM J. Optim., 9 (1999), pp. 965990.
[14] M. El-Hallabi and R. A. Tapia, An Inexact Trust-Region Feasible-Point Algorithm for Nonlinear Systems of Equalities and Inequalities, Technical Report TR95-09, Department of
Computational and Applied Mathematics, Rice University, Houston, TX, 1995.
[15] A. V. Fiacco, Introduction to Sensitivity and Stability Analysis in Nonlinear Programming,
Academic Press, London, 1983.
[16] R. Fletcher, Practical Methods of Optimization, 2nd ed., J. Wiley and Sons, Chichester,
England, 1987.
[17] R. Fletcher and S. Leyffer, User Manual for FilterSQP, Numerical Analysis Report
NA/181, Department of Mathematics, University of Dundee, Dundee, Scotland, 1998.
[18] R. Fletcher and S. Leyffer, Nonlinear programming without a penalty function, Math.
Program., 91 (2002), pp. 239269.
[19] R. Fletcher, S. Leyffer, and P. L. Toint, On the Global Convergence of an SLP-Filter
Algorithm, Technical Report 98/13, Department of Mathematics, University of Namur,
Namur, Belgium, 1998.
[20] R. Fletcher, S. Leyffer, and P. L. Toint, On the Global Convergence of a Filter-SQP
Algorithm, Technical Report 00/15, Department of Mathematics, University of Namur,
Namur, Belgium, 2000.
[21] N. I. M. Gould, M. E. Hribar, and J. Nocedal, On the solution of equality constrained
quadratic programming problems arising in optimization, SIAM J. Sci. Comput., 23 (2001),
pp. 13761395.
[22] M. Lalee, J. Nocedal, and T. D. Plantenga, On the implementation of an algorithm for
large-scale equality constrained optimization, SIAM J. Optim., 8 (1998), pp. 682706.
[23] X. Liu and Y. Yuan, A Robust Trust-Region Algorithm for Solving General Nonlinear Programming Problems, presentation at the International Conference on Nonlinear Programming and Variational Inequalities, Hong Kong, 1998.
and G. Toraldo, On the solution of large quadratic programming problems with
[24] J. J. More
bound constraints, SIAM J. Optim., 1 (1991), pp. 93113.
[25] W. Murray and F. J. Prieto, A sequential quadratic programming algorithm using an in-
659