Mathematical Programming 27 (1983) 1-33
North-Holland

A NUMERICALLY STABLE DUAL METHOD FOR SOLVING STRICTLY CONVEX QUADRATIC PROGRAMS*

D. GOLDFARB
Columbia University, Department of Industrial Engineering and Operations Research, New York, NY 10027, U.S.A.

A. IDNANI
Bell Laboratories, Murray Hill, NJ 07974, U.S.A.
An efficient and numerically stable dual algorithm for positive definite quadratic programming is described which takes advantage of the fact that the unconstrained minimum of the objective function can be used as a starting point. Its implementation utilizes the Cholesky and QR factorizations and procedures for updating them. The performance of the dual algorithm is compared against that of primal algorithms when used to solve randomly generated test problems and quadratic programs generated in the course of solving nonlinear programming problems by a successive quadratic programming code (the principal motivation for the development of the algorithm). These computational results indicate that the dual algorithm is superior to primal algorithms when a primal feasible point is not readily available. The algorithm is also compared theoretically to the modified-simplex type dual methods of Lemke and Van de Panne and Whinston, and it is illustrated by a numerical example.

Key words: Positive Definite Quadratic Programming, Matrix Factorizations, Dual Algorithms, Successive Quadratic Programming Methods.
1. Introduction

In this paper we describe a dual algorithm for solving the strictly convex (positive definite) quadratic programming problem (QPP):

minimize f(x) = aᵀx + ½xᵀGx,   (1.1a)
subject to s(x) ≡ Cᵀx − b ≥ 0,   (1.1b)

where x and a are n-vectors, G is an n × n symmetric positive definite matrix, C is an n × m matrix, b is an m-vector, and superscript T denotes the transpose. Although the vector of variables x may also be subject to equality constraints

C̄ᵀx − b̄ = 0,   (1.2)
*This research was supported in part by the Army Research Office under Grant No. DAAG
29-77-G-0114 and in part by the National Science Foundation under Grant No. MCS-6006065.
we shall ignore such constraints for the moment in order to simplify our
presentation.
Attention has recently been drawn to the need for efficient and robust
algorithms to solve (1.1) by the development of successive (recursive) quadratic
programming methods for solving general nonlinear programming problems [4,
21, 30, 40]. These methods, which have been highly successful (e.g. see [35]),
require the solution of a strictly convex quadratic program to determine the
direction of search at each iteration.
Several approaches and numerous algorithms have been proposed for solving
quadratic programming problems. These include the primal methods of Beale [2, 3], Dantzig [10], Fletcher [13], Goldfarb [16], Bunch and Kaufman [5], Gill and Murray [15], Murray [28] and Wolfe [41], the dual methods of Lemke [26] and Van de Panne and Whinston [39], the principal pivoting methods of Cottle and Dantzig [7], the parametric methods of Grigoriadis and Ritter [20] and Ritter [32], the primal-dual method of Goncalves [19], the methods for constrained least squares of Stoer [37], Schittkowski and Stoer [36], Lawson and Hanson [25], and Mifflin [27], the exact penalty function methods of Conn and Sinclair [6] and Han [22], and the subproblem optimization method of Theil and Van de Panne [38].
The above methods can for the most part be categorized as either modified simplex type methods or projection methods. The former perform simplex type pivots on basis matrices or tableaux of row size (m + n) that are derived from the Kuhn-Tucker optimality conditions for the QPP (1.1). The latter are based upon projections onto active sets of constraints and employ operators of size no larger than (n × n). Consequently projection methods are usually more efficient and require less storage than methods of the modified simplex type. In this paper we present a projection type dual algorithm for solving the QPP (1.1), which was first given in Idnani [24], and outline an implementation that is both efficient and numerically stable.
Most algorithms for quadratic programming take a two-phase approach: in phase 1 a feasible point is obtained and then in phase 2 optimality is achieved while maintaining feasibility. Our computational experience indicates that, unless a feasible point is available, on the average between one-third and one-half of the total effort required to solve a QPP using typical primal algorithms is expended in phase 1. (See Tables 1, 2 and 3, for example.) One way to reduce the total work is to use a phase 1 approach that is likely to result in a near-optimal feasible point. This was suggested by Idnani [23], who used the unconstrained minimum of (1.1a), x = −G⁻¹a, as the starting point for the variant of Rosen's feasible point routine [33] proposed by Goldfarb [16]. The same starting point was suggested by Dax in [9]. Computational testing indicated that this approach usually found a feasible point that was also optimal. When it did not, only very few additional iterations in phase 2 were required to obtain optimality.
Although these results were encouraging, they suggested that a dual approach
would be even better. In a dual method for the strictly convex QPP one must
first provide a dual feasible point, that is, a primal optimal point for some
subproblem of the original problem. By relaxing all of the constraints (1.1b), the
unconstrained minimum of (1.1a) is such a point. A dual algorithm then iterates
until primal feasibility (i.e. dual optimality) is achieved, all the while maintaining
the primal optimality of intermediate subproblems (i.e. dual feasibility). This is
equivalent to solving the dual by a primal method (which can handle the positive
semi-definite case). The important observation to make is that the origin in the
space of dual variables is always dual feasible, so that no phase 1 is needed.
In the next section we present our basic approach for solving problem (1.1).
Our discussion is in terms of this problem rather than in terms of the problem
dual to it (cf. [26, 39]) as we believe this to be more instructive. We also
introduce some notation in this section along with some other preliminaries. The
dual algorithm is given in Section 3 where its validity and finite termination are
proved. A particular numerically stable and efficient implementation of the
algorithm is described in Section 4. In Section 5 we give the results of
computational tests performed on randomly generated quadratic programs while
in Section 6 the performance of our algorithm when used in a successive
quadratic programming code is described. In both of these sections the computational performance of the dual algorithm is compared against that of primal algorithms. Comparisons between our dual algorithm and other modified simplex type dual algorithms for quadratic programming are given in Section 7. In Section 8 we make some remarks on alternative primal approaches and implementations of our dual algorithm and comment upon its superior behavior in
degenerate and 'near' degenerate situations. We also provide an appendix in
which we work through an example to illustrate the various parts of the dual
algorithm.
The dual algorithm that is described in the next section is of the active set
type. By an active set we mean a subset of the m constraints in (1.1b) that are satisfied as equalities by the current estimate x of the solution to the QPP (1.1). We shall use K to denote the set {1, 2, …, m} of indices of the constraints (1.1b) and A ⊆ K to denote the indices of the active set.
We define a subproblem P(J) to be the QPP with the objective function (1.1a) subject only to the subset of the constraints (1.1b) indexed by J ⊆ K. For example P(∅), where ∅ denotes the empty set, is the problem of finding the unconstrained minimum of (1.1a).

If the solution x of a subproblem P(J) lies on some linearly independent active set of constraints indexed by A ⊆ J we call (x, A) a solution (S-) pair. Clearly, if (x, A) is an S-pair for subproblem P(J) it is also an S-pair for the subproblem P(A).
2. Basic approach

The basic strategy that we employ for solving the QPP (1.1) can be stated as follows.

Basic approach:
Step 0: Find the unconstrained minimum of f(x); i.e., set x ← −G⁻¹a and A ← ∅.
Step 1: (a) If all constraints (1.1b) are satisfied, stop: x is optimal for (1.1). (b) Otherwise, choose a violated constraint p ∈ K\A. (c) If the subproblem P(A ∪ {p}) is infeasible, stop: the QPP (1.1) is infeasible. Otherwise, determine a new S-pair (x̄, Ā ∪ {p}) with Ā ⊆ A and f(x̄) > f(x), set (x, A) ← (x̄, Ā ∪ {p}) and go to Step 1(a).

Since the unconstrained minimum of (1.1a),

x⁰ = −G⁻¹a,

is readily available, one can always start the above procedure with the S-pair (x⁰, ∅). The most important and interesting part of the above approach is Step 1(c). This step is realizable since it is always possible to implement it as
Step 1(c′): Determine an S-pair (x̄, Ā) by solving P(A ∪ {p}) and set (x, A) ← (x̄, Ā). (It is easy to show that f(x̄) > f(x) and that Ā must contain p.)

This implementation is more restrictive than Step 1(c) since Step 1(c′) requires that x̄ satisfy all constraints indexed by A while Step 1(c) does not. In the next section we give a dual algorithm to implement the above basic approach which effects Step 1(c) so that dual feasibility is maintained at every point along the solution path. A primal-dual algorithm which incorporates Step 1(c′) is described and compared to our dual algorithm in [18, 24].
In order to describe our algorithm, it is necessary to introduce some notation. The matrix of normal vectors of the constraints in the active set indexed by A will be denoted by N and the cardinality of A will be denoted by q. A⁺ will denote the set A ∪ {p}, where p is in K\A, and A⁻ will denote a proper subset of A containing one fewer element than A. N⁺ and N⁻ will represent the matrices of normals corresponding to A⁺ and A⁻, respectively. Also, we shall use I (or I_k) to denote the k × k identity matrix and e_j to denote the jth column of I. n⁺ will indicate the normal vector n_p added to N to give N⁺ and n⁻ will indicate the column deleted from N to give N⁻.
When the columns of N are linearly independent one can define the operators

N* = (NᵀG⁻¹N)⁻¹NᵀG⁻¹   (2.1)

and

H = G⁻¹(I − NN*).   (2.2)

If x is the solution of the subproblem P(A) and lies on the active set of constraints indexed by A, then the gradient g(x) ≡ a + Gx must satisfy

g(x) = Nu(x),

where the vector of Lagrange multipliers u(x) ≥ 0. It follows from the definitions of N* and H that at such a point

u(x) = N*g(x) ≥ 0   (2.3)

and

Hg(x) = 0.   (2.4)

It is well known that these conditions are sufficient as well as necessary for x to be the optimal solution to P(A).
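As a concrete illustration (ours, not part of the original paper), the operators (2.1) and (2.2) and the optimality conditions (2.3)-(2.4) can be checked numerically for random data:

```python
import numpy as np

rng = np.random.default_rng(1)
n, q = 5, 2
G = rng.standard_normal((n, n)); G = G @ G.T + n * np.eye(n)  # s.p.d. Hessian
N = rng.standard_normal((n, q))                               # active-constraint normals

Ginv = np.linalg.inv(G)
Nstar = np.linalg.solve(N.T @ Ginv @ N, N.T @ Ginv)  # N* = (N^T G^-1 N)^-1 N^T G^-1, (2.1)
H = Ginv @ (np.eye(n) - N @ Nstar)                   # H = G^-1 (I - N N*), (2.2)

u = rng.random(q)
g = N @ u                          # a gradient of the form g(x) = N u(x), u >= 0
assert np.allclose(Nstar @ g, u)   # (2.3): u(x) = N* g(x)
assert np.allclose(H @ g, 0.0)     # (2.4): H g(x) = 0
assert np.allclose(Nstar @ N, np.eye(q)) and np.allclose(H @ N, 0.0)  # N*N = I, HN = 0
```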
We note that the Lagrange multipliers defined by (2.3) are independent of x on the manifold M = {x : Nᵀx = b_A}. This is not the case for the first-order multipliers (NᵀN)⁻¹Nᵀg(x), which equal N*g(x) only at the optimal point in M. In the next section we shall make use of another set of multipliers,

r = N*n⁺.   (2.5)
Properties. From the definitions (2.1) and (2.2) one easily verifies that N*N = I, HN = 0 and HGH = H; these properties will be used repeatedly in what follows.

3. Dual algorithm
The algorithm given below follows the dual approach described in the previous section and makes use of the operators H and N* defined there. In our implementation we do not explicitly compute or store these operators; rather we store and update the matrices J = L⁻ᵀQ and R obtained from the Cholesky and QR factorizations G = LLᵀ and L⁻¹N = Q(Rᵀ, 0)ᵀ. This is described in Section 4.
Dual algorithm:

Step 0: Set x ← −G⁻¹a, the unconstrained minimum of (1.1a), and A ← ∅, q ← 0.

Step 1: If all constraints are satisfied, stop: x is optimal. Otherwise choose a violated constraint p ∈ K\A, set n⁺ ← n_p and u⁺ ← (uᵀ, 0)ᵀ.

Step 2: (a) Compute the step directions z = Hn⁺ in the primal space and, if q > 0, r = N*n⁺ in the dual space. (b) Compute the step length t = min(t₁, t₂), where t₂ = −s_p(x)/zᵀn⁺ is the step needed to satisfy constraint p (t₂ = ∞ if z = 0), and

t₁ = min {u_j⁺(x)/r_j : r_j > 0, j = 1, …, q} ≡ u_k⁺(x)/r_k

is the largest step that keeps u⁺ nonnegative (t₁ = ∞ if r ≤ 0). If t = ∞, stop: the QPP (1.1) is infeasible. (c) Set x ← x + tz and u⁺ ← u⁺ + t(−rᵀ, 1)ᵀ.
If t = t₂ (full step), set u ← u⁺ and add constraint p; i.e., set A ← A ∪ {p}, q ← q + 1, update H and N*, and go to Step 1.
If t = t₁ (partial step), drop constraint k; i.e., set A ← A\{k}, q ← q − 1, update H and N*, and go to Step 2(a).
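Because the printed statement of the algorithm is compressed, the following minimal dense sketch (our own illustration in Python/numpy, not the implementation of Section 4; names such as solve_qp and tol are ours, and the operators are recomputed from scratch at every step rather than updated via factorizations) may help fix the logic of Steps 0-2:

```python
import numpy as np

def solve_qp(G, a, C, b, tol=1e-9):
    """Sketch of the dual algorithm for
    minimize a^T x + 0.5 x^T G x  subject to  C^T x >= b,
    with G symmetric positive definite."""
    n, m = C.shape
    x = np.linalg.solve(G, -a)          # Step 0: unconstrained minimum x0 = -G^{-1} a
    A, u = [], np.zeros(0)              # active set and its multipliers
    while True:
        s = C.T @ x - b                 # slacks s(x)
        p = int(np.argmin(s))
        if s[p] >= -tol:                # Step 1: primal feasible, hence optimal
            return x, A, u
        n_plus = C[:, p]                # choose the most violated constraint
        u_plus = np.append(u, 0.0)
        while True:                     # Step 2
            N = C[:, A]
            Ginv_nplus = np.linalg.solve(G, n_plus)
            if A:
                Ginv_N = np.linalg.solve(G, N)
                r = np.linalg.solve(N.T @ Ginv_N, N.T @ Ginv_nplus)  # r = N* n+
                z = Ginv_nplus - Ginv_N @ r                          # z = H n+
            else:
                r, z = np.zeros(0), Ginv_nplus
            # Step 2(b): partial step length t1 (dual) and full step length t2 (primal)
            t1, k = np.inf, -1
            for j in range(len(A)):
                if r[j] > tol and u_plus[j] / r[j] < t1:
                    t1, k = u_plus[j] / r[j], j
            ztn = z @ n_plus
            t2 = -(C[:, p] @ x - b[p]) / ztn if ztn > tol else np.inf
            t = min(t1, t2)
            if not np.isfinite(t):
                raise ValueError("constraints (1.1b) are infeasible")
            x = x + t * z                           # Step 2(c): primal move
            u_plus = u_plus + t * np.append(-r, 1.0)  # and dual move
            if t == t2:                 # full step: add constraint p
                A.append(p)
                u = u_plus
                break
            A.pop(k)                    # partial step: drop constraint k
            u_plus = np.delete(u_plus, k)
```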
The heart of the above dual algorithm is the method used to determine a new S-pair (x̄, Ā ∪ {p}) in Step 2, given an S-pair (x, A), i.e., one for which the columns of N are linearly independent and (2.3) and (2.4) hold, and a violated constraint p. We call such a triple (x, A, p) a violated (V-) triple.

Let us first consider the case where the columns of N⁺ (i.e., those of N and n⁺ = n_p) are linearly independent. It follows from (2.3), (2.4) and (2.6) that

H⁺g(x) = 0   (3.3)

and

u⁺(x) ≡ (N⁺)*g(x) = (u(x)ᵀ, 0)ᵀ ≥ 0.   (3.4)
Lemma 1. Let (x, A, p) be a V-triple and consider points of the form

x̄ = x + tz,   (3.5)

where

z = Hn⁺.   (3.6)

Then

H⁺g(x̄) = 0,   (3.7)

s_i(x̄) = 0 for all i ∈ A,   (3.8)

u⁺(x̄) ≡ (N⁺)*g(x̄) = u⁺(x) + t(−rᵀ, 1)ᵀ,   (3.9)

where

r = N*n⁺,   (3.10)

and

s_p(x̄) = s_p(x) + tzᵀn⁺.   (3.11)

Proof. From (3.5), g(x̄) = g(x) + tGz, where

Gz = GHn⁺ = (I − NN*)n⁺ = n⁺ − Nr = N⁺(−rᵀ, 1)ᵀ.

Since

H⁺N⁺ = 0 and (N⁺)*N⁺ = I,

relations (3.7)-(3.11) follow directly from (3.3) and (3.4).

In the algorithm the step length is taken to be

t = min(t₁, t₂),   (3.12)

where

t₁ = min {u_j⁺(x)/r_j : r_j > 0, j = 1, …, q} ≡ u_k⁺(x)/r_k   (3.13)

is the largest step for which u⁺(x̄) given by (3.9) remains nonnegative (t₁ = ∞ if r ≤ 0), and

t₂ = −s_p(x)/zᵀn⁺   (3.14)

is the step for which s_p(x̄) given by (3.11) becomes zero. Combining (3.5), (3.9) and (3.11) also yields
f(x̄) − f(x) = tzᵀn⁺(½t + u_{q+1}⁺(x)) ≥ 0.   (3.16)

Moreover, as long as t > 0, f(x̄) > f(x).
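The intermediate steps of the original proof of (3.16) are not legible above, so the following short derivation is our own reconstruction; it uses only g(x) = N⁺u⁺(x), Nᵀz = 0 and zᵀGz = zᵀn⁺, which follow from (3.3), (3.4) and the proof of Lemma 1:

```latex
\begin{align*}
f(\bar{x}) - f(x) &= t\, g(x)^{T} z + \tfrac{1}{2} t^{2} z^{T} G z
   && \text{by (3.5), with } g(x) = a + Gx,\\
&= t\, u^{+}(x)^{T} (N^{+})^{T} z + \tfrac{1}{2} t^{2} z^{T} n^{+}
   && \text{since } g(x) = N^{+} u^{+}(x) \text{ and } z^{T} G z = z^{T}(n^{+} - N r) = z^{T} n^{+},\\
&= t\, u^{+}_{q+1}(x)\, z^{T} n^{+} + \tfrac{1}{2} t^{2} z^{T} n^{+}
   && \text{since } N^{T} z = 0,\\
&= t\, z^{T} n^{+}\bigl(\tfrac{1}{2} t + u^{+}_{q+1}(x)\bigr) \ge 0,
   && \text{since } z^{T} n^{+} = (n^{+})^{T} H n^{+} \ge 0 \text{ and } t,\, u^{+}_{q+1}(x) \ge 0.
\end{align*}
```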
From the definition of t ((3.12)-(3.14)) and Lemma 1 it is evident that H⁺g(x̄) = 0, s_i(x̄) = 0, i ∈ A, and u⁺(x̄) ≥ 0. If t = t₂, then s_p(x̄) = 0 and (x̄, A ∪ {p}) is an S-pair. If t = t₁ < t₂, then u_k⁺(x̄) = 0 and s_p(x̄) < 0, where k is the index at which the minimum

u_k⁺(x)/r_k = min {u_j⁺(x)/r_j : r_j > 0, j = 1, …, q}   (3.22)

is attained. Since H⁺g(x̄) = 0 and u_k⁺(x̄) = 0 we can write

g(x̄) = N⁺u⁺(x̄) = Σ_{i ∈ (A∪{p})\{k}} u_i⁺(x̄)n_i,

so that constraint k can be dropped from the active set,
and the algorithm continues from Step 2(a).

Let us now consider the case in which n⁺ is linearly dependent on the columns of N; i.e.,

n⁺ = Nr, where r = N*n⁺,

so that z = Hn⁺ = 0 and no move can be made in the primal space. If r ≤ 0, then every z satisfying

Nᵀz ≥ 0

also satisfies n⁺ᵀz = rᵀNᵀz ≤ 0; hence no point satisfying the constraints indexed by A can satisfy constraint p, and the QPP (1.1) is infeasible. If, on the other hand, r_j > 0 for some j, then constraint k chosen according to (3.22) can be dropped from the active set, since

n_k = (1/r_k)(n⁺ − Σ_{i∈A⁻} r_i n_i), with r_k > 0.
Theorem 3. The dual algorithm will solve the QPP (1.1) or indicate that it has no feasible solution in a finite number of steps.
4. Implementation

In our implementation of the dual algorithm we employ the Cholesky factorization

G = LLᵀ   (4.1)

of the Hessian and the QR factorization

B = Q [ R ] = [Q₁ | Q₂] [ R ]   (4.2)
      [ 0 ]             [ 0 ]

of the n × q matrix

B = L⁻¹N,   (4.3)

where R is a q × q upper triangular matrix and the orthogonal matrix Q is partitioned so that Q₁ has q columns. In terms of J ≡ L⁻ᵀQ = [J₁ | J₂], partitioned in the same way, and the vector d ≡ Jᵀn⁺ = (d₁ᵀ, d₂ᵀ)ᵀ, the step directions of the algorithm can be expressed as

z = Hn⁺ = J₂d₂   (4.7)

and

r = N*n⁺ = R⁻¹d₁.   (4.8)
Adding a constraint

When constraint p with normal n⁺ is added to the active set, B⁺ = [B | L⁻¹n⁺], and the factorization (4.2) is updated to

Q⁺ = Q [ I_q   0  ]   and   R⁺ = [ R   d₁ ],
       [ 0    Q̄ᵀ ]              [ 0   δ  ]

where δ = ‖d₂‖ and Q̄ = Q₁,₂Q₂,₃ ⋯ Q_{n−q−1,n−q} is the product of Givens matrices chosen so that

Q̄d₂ = δe₁.   (4.12)

Moreover,

J⁺ = L⁻ᵀQ⁺ = [J₁ | J₂Q̄ᵀ] = [J₁⁺ | J₂⁺],   (4.13)

where J₁ and J₂ have q and n − q columns, respectively, while J₁⁺ and J₂⁺ have q + 1 and n − q − 1 columns.
Dropping a constraint

If the constraint corresponding to the lth column of N is dropped from the active set, then deleting the lth column of R gives

QᵀL⁻¹N⁻ = [ R̃ ],
          [ 0 ]

where R̃ equals R with its lth column deleted. If l ≠ q, the submatrix T of R̃ consisting of its last q − l + 1 rows is a (q − l + 1) × (q − l) upper-Hessenberg matrix. Again a sequence of q − l Givens matrices can be chosen so that their product Q̄ = Q_{q−l,q−l+1} ⋯ Q₂,₃Q₁,₂ reduces T to an upper triangular matrix R̄; i.e.,

Q̄T = R̄.   (4.14)
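Correspondingly, a sketch (again our own, under the same assumptions as above; indices are 0-based whereas the text's l is 1-based) of the dropping update, applying the rotations of (4.14) directly to J and to the Hessenberg matrix:

```python
import numpy as np

def drop_constraint(J, R, l):
    """Update J and R when the l-th (0-based) active constraint is dropped.
    Deleting column l of R leaves a matrix that is upper-Hessenberg below
    row l; Givens rotations restore triangularity and are applied to J."""
    q = R.shape[0]
    T = np.delete(R, l, axis=1)                    # q x (q-1)
    for i in range(l, q - 1):
        rho = np.hypot(T[i, i], T[i + 1, i])
        if rho == 0.0:
            continue
        c, s = T[i, i] / rho, T[i + 1, i] / rho
        Grot = np.array([[c, s], [-s, c]])
        T[i:i + 2, i:] = Grot @ T[i:i + 2, i:]     # zero the subdiagonal entry
        J[:, i:i + 2] = J[:, i:i + 2] @ Grot.T     # same rotation applied to J
    return J, T[:q - 1, :]                         # last row of T is now zero
```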
The matrices denoted by Q̄ in (4.12) and (4.14) are not computed in our implementation; rather, the Givens matrices from which they are formed, which successively introduce zeros into d₂ and T, are applied directly to the rows of Jᵀ using the computational scheme (4.9). In [14], Gill et al. show that because of their special structure the matrices Q̄ in (4.12) and (4.14) can be generated from two vectors which can be obtained and applied to Jᵀ using suitable recurrences. This method, however, is less efficient than the one described above.
The vector d = Jᵀn⁺ is required for the step directions z for the primal variables and −r for the dual variables, as indicated in (4.7) and (4.8). It is also needed, as shown above, when the pth constraint, with normal n⁺, is added to the active set A. When a constraint is dropped from A, the same orthogonal transformations that are applied to update Jᵀ can be applied to d. If this is done, then the updated d can be used to compute the new z and r after the basis change and to determine the appropriate orthogonal transformation Q̄ in (4.12) needed when the pth constraint is finally added to the active set. This saves the n² operations required to compute d directly.
5. Computational results: randomly generated problems

In this section we compare the performance of our dual algorithm against that
of two different primal algorithms on 168 randomly generated strictly convex
quadratic programming problems. Twenty-four different types of problems were
generated with known optimal solutions using the technique of Rosen and Suzuki [34]. Each problem type was determined by specifying the number of variables n (9, 27, or 81), the number of constraints m (n or 3n), the number of constraints q* in the active set A at the solution (one of two fixed fractions of m), and the conditioning (well or ill) of the Hessian matrix G. For each problem the off-diagonal elements of G were set to r(−1, 1) and G₁₁ = S₁ + r(0, 1) + 1 was computed, where r(a, b) denotes a freshly computed (pseudo-) random number uniformly distributed between a and b and S_i denotes the sum of the absolute values of the off-diagonal elements in the ith row of G. In the well-conditioned case we set G_ii = S_i + r(0, 1) + 1, for i = 2, …, n, while in the ill-conditioned case we set G_ii = G_{i−1,i−1} + S_i + S_{i−1} + r(0, 1), for i = 2, …, n. In the latter case the condition numbers generated were on the order of 50 for n = 9, 500 for n = 27, and 5000 for n = 81. Further, our experiments were subdivided into three runs. In runs 1 and 2 five replications of each of the 16 problem types with n equal to 9 and 27 were generated, and the optimal dual variables u_j, j ∈ A, were set to r(0, 30) and r(0, 30m), respectively. In run 3, one replicate of each of the eight problem types with n = 81 was generated, and all u_j, j ∈ A, were set to r(0, 81m). To complete the generation of each problem we set the components of the optimal solution x* to r(−5, 5) and the elements of C to r(−1, 1). The columns of C were then normalized to unit length. For j ∈ A we set s_j = 0 and for j ∉ A we set s_j = r(0, 1) and u_j = 0. Then we set b = Cᵀx* − s and a = Cu − Gx*.
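The generation procedure can be restated compactly as follows (our own sketch; the seed, the choice of the active subset and the multiplier-range parameter umax are assumptions — the runs described above used the ranges r(0, 30), r(0, 30m) and r(0, 81m)):

```python
import numpy as np

rng = np.random.default_rng(0)

def r(a, b, *shape):
    return rng.uniform(a, b, shape or None)

def generate_qp(n, m, q_star, umax=30.0, ill=False):
    """Rosen-Suzuki style random QP with known optimal solution x_star."""
    U = r(-1.0, 1.0, n, n)
    G = np.triu(U, 1) + np.triu(U, 1).T          # symmetric off-diagonals in (-1, 1)
    S = np.abs(G).sum(axis=1)                    # row sums of off-diagonal magnitudes
    G[0, 0] = S[0] + r(0.0, 1.0) + 1.0
    for i in range(1, n):                        # diagonally dominant diagonal
        G[i, i] = (G[i-1, i-1] + S[i] + S[i-1] + r(0.0, 1.0)) if ill \
                  else (S[i] + r(0.0, 1.0) + 1.0)
    C = r(-1.0, 1.0, n, m)
    C /= np.linalg.norm(C, axis=0)               # unit-length constraint normals
    x_star = r(-5.0, 5.0, n)
    active = rng.choice(m, q_star, replace=False)
    u = np.zeros(m); u[active] = r(0.0, umax, q_star)
    s = r(0.0, 1.0, m); s[active] = 0.0          # zero slacks on the active set
    b = C.T @ x_star - s                         # so that s(x*) = C^T x* - b = s
    a = C @ u - G @ x_star                       # stationarity: a + G x* = C u
    return G, a, C, b, x_star
```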
The results of our computational tests are reported below in Tables 1, 2 and 3. These results differ somewhat from the preliminary results reported in [18] and [24] because of improvements made to the dual and primal codes to make them more efficient. The random number seed used to generate the run 3 problems was also changed from the one that was used in [18, 24]. These results are therefore not directly comparable.
The dual algorithm that we used always chose the most violated constraint to add to the active set. The primal algorithms that we used were algorithms 1 and 2 described in Goldfarb [16], the former being essentially identical to the algorithm given by Fletcher [13] in the strictly convex case. For finding a feasible point for the primal algorithms we used the variant of Rosen's [33] procedure suggested in [16], where the operators N⁺ = (NᵀN)⁻¹Nᵀ and P = I − NN⁺ used by Rosen are replaced by N* and H, defined by (2.1) and (2.2).

As already mentioned in the introduction, when this routine is started from the unconstrained minimum the feasible point found by it is often optimal as well; in a certain sense it behaves like a poor version of our dual method which sometimes needs some phase-two primal iterations to 'restore optimality'. For computational results, see Idnani [23] and Dax [9]. Although this choice of starting point is clearly preferred, it would not have enabled us to effectively compare our dual algorithm against the primal algorithms (in phase two). Therefore, we started the feasible point routine from a randomly generated point (with components set to r(−5, 5)) so as to obtain a random feasible point for the start of phase two.
16 D, Goldfarb and A. Idnani/ A numerically stable dual method
~ ~ . . . . ~ - , ~
o~
r-.i
~.~
z2d ra
r
",-
.,--_
".-
D. Goldftlrb a n d A. ldnani/ A numerically stable dual method 17
++
.o_
cJ r l ~ ; ; ; ~ ; r-i . . . . . ~ ; --
,o
t"-i
+.+._,
E . ~- ..u,
z.~ r--.
rl
E
"E
0
.,,.
E~ E
$.
[-,
18 D. (i~ldf~rb c~tl~t A. ldn~mi/A nmnericc, ll?," st~ble ~hn~! meth~d
.~_ -~
~1 -- -- rl
r"
=2
Z u
"r"
9
b~
E~
E
o.
0
D. G,~htlurb and .4. hhlani/ A nmneri('(llly stable dual melhm t, 19
* These codes and the random problem generator are available from the authors.
6. Computational results: successive quadratic programming

We also tested our dual algorithm within Powell's successive quadratic programming code [30], comparing it against Fletcher's primal quadratic programming code. At each iteration of Powell's code one must solve a quadratic subproblem of the form

minimize over (d, ξ): ∇f(x)ᵀd + ½dᵀBd + h(ξ),   (6.1a)
subject to ξc_i(x) + ∇c_i(x)ᵀd = 0, i = 1, …, k′,
           ξc_i(x) + ∇c_i(x)ᵀd ≥ 0, i = k′ + 1, …, k,
           0 ≤ ξ ≤ 1.   (6.1b)

Here ∇f and ∇c_i, i = 1, …, k, are the gradients of the nonlinear programming problem's objective function f(x) and of the k′ equality and k − k′ inequality constraints c_i(x) = 0 and c_i(x) ≥ 0, evaluated at x, the current estimate of the solution. B is a positive definite symmetric quasi-Newton approximation to the Hessian of the Lagrangian of the nonlinear programming problem with respect to the x variables, and h(ξ) is a scalar function of ξ chosen so as to make ξ as large as
possible subject to the constraints (6.1b). Since the dual method requires a strictly convex objective function we used h(ξ) = 10⁶(ξ² − 2ξ) rather than h(ξ) = −10⁶ξ as in Powell's original code. Also it was not necessary to add artificial lower and upper bounds to each quadratic program as required by Fletcher's feasible point code. In spite of these differences in implementation both quadratic programming codes resulted in essentially identical nonlinear programming iterations.
The results of our computational tests are summarized in Table 4. The numbers of variables and constraints for the generated quadratic program (given in parentheses in Table 4) are respectively one and two more than those for the original nonlinear problem because of the variable ξ and the bounds on it. To obtain the results in this table the numbers of basis changes and operations were summed over all quadratic programming problems generated during the solution of a given nonlinear problem. Observe that although, on the average, the feasible point routine requires fewer basis changes than the dual algorithm, the total for it and Fletcher's primal algorithm is more than 30% greater than that required by the dual algorithm. This is in spite of the fact that the optimal solutions of all of the quadratic programs solved were on manifolds of dimension less than or equal to two. In Colville 3 all optimal solutions occurred at vertices (see the columns with headings 'Variables' and 'Average q*').
The set of constraints that are active at the optimal solution of the quadratic
program (6.1) tends not to change very much from one iteration of Powell's
algorithm to the next. Note that even if the active set remains the same the
constraints in that set may change because the linearization is carried out at a
new point. In four of the six problems run the optimal basis stayed the same for
all iterations while in the other two problems the overlap was, on the average,
greater than 90%. To take advantage of this we modified the dual algorithm so
that our algorithm was executed in two passes on all iterations of Powell's
algorithm other than the first. In the first pass all constraints not in the optimal
active set A* on the previous iteration of Powell's algorithm were ignored. In
addition in Step 1 p was chosen as the first violated constraint (i.e., the one with
the lowest index) rather than the most violated constraint. In the second pass the
full set of constraints was considered and the dual algorithm was started from
the optimal solution of P(A*) obtained by the first pass. This resulted in fewer drops and an overall computational savings of approximately 20%. The results presented in Table 4 are for this variant of the dual algorithm. A consequence of this strategy was that the dual algorithm rarely had to drop constraints; in the 49 QPP's solved only a total of 23 drops were required. The figures in parentheses next to the number of basis changes required by the dual (Fletcher's primal) algorithm give the number of times a constraint had to be dropped from (added to) the basis.
(Table 4: numbers of basis changes and relative numbers of operations for the dual algorithm and for Fletcher's primal code with the feasible point routine, summed over the quadratic programs generated in solving each nonlinear programming test problem.)

The results reported for Fletcher's code are for a variant that included in the initial infeasible basis at the start of the feasible point routine those constraints
that were in the previous optimal basis as long as they were still sufficiently
linearly independent.
The numbers of operations relative to the dual algorithm reported in Table 4 were calculated as described in the last section. However, we did not include the work required to factor B in counting the number of operations for the dual algorithm because nonlinear programming codes which are based upon variable metric formulas should update the Cholesky factors of B rather than B itself. The extra work for these factorizations was computed and is given in the table relative to the work required for the dual algorithm. The number of operations required by Fletcher's primal code relative to the dual was much greater than one would expect from the difference in the numbers of basis changes required by the two algorithms. This shows that Fletcher's implementation of algorithm 1 is rather inefficient.
Our computational results also show that using our dual algorithm to solve the QPP's that were generated by Powell's successive quadratic programming code took between four and fourteen times as much work as factoring the Hessian, and on the average only about seven and one-half times as much work.
The results presented in Table 4 differ from preliminary results presented in
[18] and [24] as a result of some changes to the dual code and because the earlier
results did not account for the situations in which Fletcher's code exchanged one
constraint in the basis for another not in the basis.
It should also be mentioned that Fletcher's code experienced some numerical
problems in the course of solving the Colville 2 problem which caused the
system to print several underflow error messages (on an IBM 3033). The dual
code on the other hand experienced no numerical difficulties.
7. Comparison with other dual algorithms

The dual (cf. [26, 39]) of the strictly convex QPP (1.1) is

maximize aᵀx + ½xᵀGx − uᵀ(Cᵀx − b)   (7.1a)
subject to Gx + a − Cu = 0, u ≥ 0.   (7.1b)

One can analyze the modified simplex type dual method of Van de Panne and Whinston [39] in terms of the primal problem (1.1) and show that it follows the same solution path as our dual algorithm. Consequently, our algorithm is mathematically equivalent to applying either Dantzig's, Fletcher's, or Goldfarb's primal algorithm to the dual of the given problem.
As in Dantzig's primal method, the computations performed in the Van de Panne-Whinston dual method are based upon using simplex pivots applied to tableaux which can be obtained from the tableau

        x     s ≥ 0    u ≥ 0
      [ G       0      −C   |  −a ]
      [ Cᵀ     −I       0   |   b ]
A major iteration begins with what is called a 'standard' tableau; i.e., one whose corresponding solution is dual feasible and which has no basic (or nonbasic) pairs. A basic (nonbasic) pair means that both u_i and s_i are basic (nonbasic). Such a solution x is also referred to in the literature as 'complementary'. In our terminology x is the solution of the subproblem P(A), where A = {i ∈ K : s_i is nonbasic}; i.e., (x, A) is an S-pair. The method chooses the dual variable u_p corresponding to a negative slack s_p to increase and add to the basis. This corresponds to Step 1 of our algorithm. If s_p is reduced to zero before any basic dual variable, then it is removed from the basis, a simplex pivot is performed, and once again the solution is 'complementary'. This corresponds to a full step (t = t₂) in our method. Otherwise one of the dual variables, say u_k, goes to zero and is dropped from the basis yielding a 'noncomplementary' solution (the tableau is called 'nonstandard'), with a basic pair (u_p, s_p) and a nonbasic pair (u_k, s_k). In our method this corresponds to a partial step (t = t₁). Next the slack s_k is increased and added to the basis. This corresponds to dropping constraint k from the active set A in our method and can be shown to have the effect of also increasing s_p. Again either s_p will be reduced to zero before any basic dual variable and it will be dropped from the basis to yield a complementary solution, or one of the dual variables (other than u_p, as it will increase) will be dropped from the basis, again leaving a noncomplementary solution with one basic and one nonbasic pair. In the latter case, the above procedure for a noncomplementary solution is repeated. Thus, starting from a standard tableau (S-pair) one proceeds through a sequence of nonstandard tableaus (partial steps) until one again obtains a standard tableau (full step).
A comparison between the actual computations performed by our dual algorithm and the Van de Panne-Whinston algorithm is given in [24]. It is shown there that our algorithm is far more efficient. It is also more numerically stable.
The only noncomputational difference between the two algorithms is that the
latter does not allow for the possibility of primal infeasibility. This shortcoming
is easily rectified by terminating with an indication that (1.1b) is infeasible when
Rule 2 in [39] cannot be applied.
Another dual method that is related to our method is the one proposed by Lemke [26]. Lemke first eliminates the primal variables from the dual QPP (7.1) using (7.1b), i.e.,

x = G⁻¹(Cu − a),

to obtain a quadratic program

minimize w(u) subject to u ≥ 0   (7.2)

in the dual variables alone, whose data are Q = CᵀG⁻¹C and s₀ = Cᵀx₀ − b, where x₀ = −G⁻¹a. Then, starting from the dual feasible point u = 0 (x = x₀, the unconstrained minimum of f(x)), Lemke essentially applies Beale's conjugate direction approach [2, 3] for solving a QPP to (7.2).
Specifically, Lemke's method employs the 'explicit inverse' of a basis matrix B which has columns of the identity matrix corresponding to the normals of all active dual constraints and columns of the form Qd_i, where d_i is the direction of a dual step previously taken. Starting from u = 0 and B = I, if some component, say the pth, of the gradient of w(u), y = −(s₀ + Qu) (corresponding to a zero dual variable), is negative (s_p(x) < 0 in the primal QPP), Lemke's algorithm drops u_p from the 'dual' active set and moves in the direction −d_p, where d_p is the pth row of B⁻¹. If the minimum of w(u) along the semi-infinite ray {ū = u − τd_p : τ ≥ 0} satisfies the dual constraints u ≥ 0, then a step to this point is taken, the above procedure is repeated, and the solution path, in both x and u variables, is the same as the one followed by our dual algorithm. The sequence of points u so
obtained are the minima of w(u) on faces of the nonnegative orthant of increasing dimension. The corresponding points x are the solutions of subproblems P(A), where A is of increasing cardinality. However, if a dual constraint is encountered before a minimum is reached along −d_p, the two algorithms in general follow different solution paths from that point on. On the next iteration our algorithm will proceed in the direction of the minimum of w(u) on the new constraint manifold in the dual space, whereas Lemke's algorithm will take as many as k Q-conjugate steps, where k is the dimension of the manifold, to reach this minimum. If this minimum is reached by both algorithms, they will again begin to follow the same solution path. On the other hand, if some dual constraint is encountered by either algorithm before this minimum is reached, then the two algorithms can subsequently proceed through totally different dual and primal active sets. (See the discussion of Beale's algorithm in [16] for a more complete analysis.)
Different solution paths will result if problem (1.1) with
is solved choosing successively the first, second and third constraints to add to
the active set in our algorithm while making corresponding choices of the dual
variable to drop from the dual active set in L e m k e ' s algorithm. On the other
hand, both algorithms follow the same solution path on the example given in the
appendix with the first set of active set changes specified there.
Another method which invites comparison with ours is one proposed by Bartels et al. in Section 12 of [1]. That method also starts from the unconstrained minimum of (1.1a) and uses the factorizations (4.1) and (4.2), updating Q and R rather than J and R. However, instead of following a purely dual approach it uses the principal pivoting method [7].
8. Observations
In our computational tests the dual method was found to require fewer basis changes than the primal methods with which it was compared. Without any doubt the principal reason for this was that no phase 1 was needed to find a dual feasible point. For this reason, one would expect primal methods which use an l₁ penalty term (e.g., see [6]) to penalize constraint violations, rather than a separate phase 1 procedure, to also require fewer basis changes than the primal methods that we tested.
There are modifications that can be made to the dual algorithm to make it more efficient even though the number of basis changes may increase. At any iteration, arbitrary subsets of constraints can be temporarily ignored when computing slacks and seeking a violated constraint in Step 1. One such strategy was described in Section 6, and many others are possible. When the number of constraints is large, the potential for reducing the cost per iteration is sufficient to make such strategies worthwhile. Clearly this is analogous to 'partial pricing' in the simplex method for linear programming.
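As a small illustration of such a strategy (ours and hypothetical; it refers to the solve_qp sketch in Section 3), only Step 1 changes — nonempty constraint subsets are scanned in turn, falling back to the full index set:

```python
import numpy as np

def choose_violated(x, C, b, subsets, tol=1e-9):
    """'Partial pricing' version of Step 1: scan the given nonempty index
    subsets in turn and return the most violated constraint found in the
    first subset containing one; return -1 if x satisfies them all."""
    for W in subsets:
        W = np.asarray(W)
        s = C[:, W].T @ x - b[W]       # slacks of the constraints in W only
        j = int(np.argmin(s))
        if s[j] < -tol:
            return int(W[j])
    return -1

# e.g. subsets = [previous_active_set, range(m)] gives the two-pass
# strategy of Section 6.
```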
It is important to point out that there are many numerically stable ways to implement the dual algorithm. Several are described in [17]. Although the discussion there is in terms of primal algorithms the factorizations described are also applicable to our dual algorithm. The implementation that we chose is not designed to take advantage of simple upper and lower bounds on the variables or general sparsity. If one wishes to do this or to extend the algorithm so that it can solve problems with positive semi-definite and indefinite Hessians, then an implementation based upon the Cholesky factorization of ZᵀGZ, where the columns of Z form a basis for the null space of Nᵀ, would appear to be more appropriate. (It should be noted that Van de Panne and Whinston's dual method can solve semidefinite quadratic programs.) If in addition the constraints are given in the standard linear programming form (i.e., Ax = b, l ≤ x ≤ u), then an implementation which also uses an LU factorization of the basis matrix as in [29] might be best.
Our final remarks concern the problems of degeneracy and 'near' degeneracy.
For reasons of numerical stability it is important in both primal and dual
algorithms to avoid intermediate active sets which are nearly linearly dependent.
In dual algorithms one usually has a choice of which constraint to add to the
active set, while in primal algorithms that selection is determined by feasibility
conditions except in exactly degenerate cases. (The situation is reversed with
regard to dropping constraints from the active set.) Consequently, one has
greater control over the linear independence of the active set in dual methods.
As far as degeneracy is concerned no special procedure, such as small
perturbations of the constraints, is required to prevent 'cycling' in our method as
is the case for primal methods. This advantage is also claimed by Lemke [26] for
his method.
Because of computer rounding errors a degenerate situation may in practice appear to be nearly degenerate. Consider the behavior of a reasonable implementation of our algorithm in such a situation. If the current iterate x is 'nearly' feasible, then the algorithm will stop since all violations of the 'nearly dependent' inactive constraints should fall within a prescribed tolerance for termination. If x is so infeasible that the algorithm does not terminate, then we expect that there is a constraint violated by x which, when added to the active set, will take us away from the current near degenerate situation and will avoid an ill-conditioned intermediate active set constraint matrix. Choosing the most violated constraint is, in fact, a good strategy for doing this. In contrast, primal methods cannot choose which constraint to add to the active set (unless a small amount of infeasibility is tolerated). Therefore ill-conditioned intermediate states are more likely to occur, as are instances in which many small steps are taken in essentially the same manifold as a result of interchanging linearly dependent or nearly linearly dependent active constraints.
On the other hand an ill-conditioned Hessian matrix G might be expected to cause more numerical problems for a dual method than for a primal method since the former starts from the unconstrained minimum x⁰ = −G⁻¹a.
Acknowledgement
Thanks are due to the associate editor and referee for their useful suggestions.
Appendix: An example
Problem:

minimize f(x) = 2(x₁² − x₁x₂ + x₂²) + 6x₁,
subject to x₁ ≥ 0, x₂ ≥ 0 and x₁ + x₂ ≤ 2.

This problem is depicted in Fig. 1 along with all possible solution paths. More than one path is possible because one has the freedom of choosing any violated constraint in Step 1 of the algorithm.
(Fig. 1 shows the triangular feasible region, the constraint normals n₁, n₂ and n₃, and the possible solution paths:
1) x⁰ → x¹ → x² → x³,
2) x⁰ → x¹ → x⁴ → x⁵ → x³,
3) x⁰ → x³,
4) x⁰ → x⁴ → x⁵ → x³.)

Fig. 1. An example showing all possible solution paths for the dual algorithm where any violated constraint can be chosen for p in Step 1.
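As a numerical cross-check (ours, not the paper's), the data of this example can be passed to the solve_qp sketch given after the algorithm statement in Section 3, writing the third constraint in the form (1.1b) as −x₁ − x₂ ≥ −2:

```python
import numpy as np

G = np.array([[4.0, -2.0],
              [-2.0, 4.0]])        # f(x) = 2(x1^2 - x1*x2 + x2^2) + 6*x1
a = np.array([6.0, 0.0])
C = np.array([[1.0, 0.0, -1.0],
              [0.0, 1.0, -1.0]])   # normals n1, n2, n3
b = np.array([0.0, 0.0, -2.0])

x, A, u = solve_qp(G, a, C, b)
# the unconstrained minimum is x0 = -G^{-1} a = (-2, -1);
# with this data the algorithm terminates at the optimal solution x = (0, 0)
```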
References
[1] R.H. Bartels, G.H. Golub and M.A. Saunders, "Numerical techniques in mathematical programming", in: J.B. Rosen, O.L. Mangasarian and K. Ritter, eds., Nonlinear programming (Academic Press, New York, 1970) pp. 123-176.
[2] E.M.L. Beale, "On minimizing a convex function subject to linear inequalities", Journal of the Royal Statistical Society Series B 17 (1955) 173-184.
[3] E.M.L. Beale, "On quadratic programming", Naval Research Logistics Quarterly 6 (1959) 227-243.
[4] M.C. Biggs, "Constrained minimization using recursive quadratic programming: some alternative subproblem formulations", in: L.C.W. Dixon and G.P. Szegő, eds., Towards global optimization (North-Holland, Amsterdam, 1975) pp. 341-349.
[5] J.W. Bunch and L. Kaufman, "Indefinite quadratic programming", Computing Science Technical Report 61, Bell Labs., Murray Hill, NJ (1977).
[6] A.R. Conn and J.W. Sinclair, "Quadratic programming via a nondifferentiable penalty function", Department of Combinatorics and Optimization Research Report CORR 75-15, University of Waterloo (Waterloo, Ont., 1975).
[7] R.W. Cottle and G.B. Dantzig, "Complementary pivot theory of mathematical programming", in: G.B. Dantzig and A.F. Veinott, eds., Lectures in applied mathematics 11, Mathematics of the decision sciences, Part 1 (American Mathematical Society, Providence, RI, 1968) pp. 115-136.
[8] J.W. Daniel, W.B. Gragg, L. Kaufman and G.W. Stewart, "Reorthogonalization and stable algorithms for updating the Gram-Schmidt QR factorization", Mathematics of Computation 30 (1976) 772-795.
[9] A. Dax, "The gradient projection method for quadratic programming", Institute of Mathematics Report, The Hebrew University of Jerusalem (Jerusalem, 1978).
[10] G.B. Dantzig, Linear programming and extensions (Princeton University Press, Princeton, NJ, 1963) Chapter 24, Section 4.
[11] R. Fletcher, "The calculation of feasible points for linearly constrained optimization problems", UKAEA Research Group Report, AERE R6354 (1970).
[12] R. Fletcher, "A FORTRAN subroutine for quadratic programming", UKAEA Research Group Report, AERE R6370 (1970).
[13] R. Fletcher, "A general quadratic programming algorithm", Journal of the Institute of Mathematics and Its Applications 7 (1971) 76-91.
[14] P.E. Gill, G.H. Golub, W. Murray and M.A. Saunders, "Methods for modifying matrix factorizations", Mathematics of Computation 28 (1974) 505-535.
[15] P.E. Gill and W. Murray, "Numerically stable methods for quadratic programming", Mathematical Programming 14 (1978) 349-372.
[16] D. Goldfarb, "Extension of Newton's method and simplex methods for solving quadratic programs", in: F.A. Lootsma, ed., Numerical methods for nonlinear optimization (Academic Press, London, 1972) pp. 239-254.
[17] D. Goldfarb, "Matrix factorizations in optimization of nonlinear functions subject to linear constraints", Mathematical Programming 10 (1976) 1-31.
[18] D. Goldfarb and A. Idnani, "Dual and primal-dual methods for solving strictly convex quadratic programs", in: J.P. Hennart, ed., Numerical Analysis, Proceedings Cocoyoc, Mexico 1981, Lecture Notes in Mathematics 909 (Springer-Verlag, Berlin, 1982) pp. 226-239.
[19] A.S. Goncalves, "A primal-dual method for quadratic programming with bounded variables", in: F.A. Lootsma, ed., Numerical methods for nonlinear optimization (Academic Press, London, 1972) pp. 255-263.
[20] M.D. Grigoriadis and K. Ritter, "A parametric method for semidefinite quadratic programs", SIAM Journal on Control 7 (1969) 559-577.
[21] S.-P. Han, "Superlinearly convergent variable metric algorithms for general nonlinear programming problems", Mathematical Programming 11 (1976) 263-282.
[22] S.-P. Han, "Solving quadratic programs by an exact penalty function", MRC Technical Summary Report No. 2180, University of Wisconsin (Madison, WI, 1981).
[23] A.U. Idnani, "Extension of Newton's method for solving positive definite quadratic programs—A computational experience", Master's Thesis, City College of New York, Department of Computer Science (New York, 1973).
[24] A.U. Idnani, "Numerically stable dual projection methods for solving positive definite quadratic programs", Ph.D. Thesis, City College of New York, Department of Computer Science (New York, 1980).
[25] C.L. Lawson and R.J. Hanson, Solving least squares problems (Prentice-Hall, Englewood Cliffs, NJ, 1974).
[26] C.E. Lemke, "A method of solution for quadratic programs", Management Science 8 (1962) 442-453.
[27] R. Mifflin, "A stable method for solving certain constrained least squares problems", Mathematical Programming 16 (1979) 141-158.
[28] W. Murray, "An algorithm for finding a local minimum of an indefinite quadratic program", NPL NAC Report No. 1 (1971).
[29] B.A. Murtagh and M.A. Saunders, "Large-scale linearly constrained optimization", Mathematical Programming 14 (1978) 41-72.
[30] M.J.D. Powell, "A fast algorithm for nonlinearly constrained optimization calculations", in: Numerical analysis, Dundee 1977, Lecture Notes in Mathematics 630 (Springer-Verlag, Berlin, 1978) pp. 144-157.
[31] M.J.D. Powell, "An example of cycling in a feasible point algorithm", Mathematical Programming 20 (1981) 353-357.
[32] K. Ritter, "Ein Verfahren zur Lösung parameter-abhängiger, nichtlinearer Maximum-Probleme", Unternehmensforschung 6 (1962) 149-166; English transl., Naval Research Logistics Quarterly 14 (1967) 147-162.
[33] J.B. Rosen, "The gradient projection method for nonlinear programming, Part I. Linear constraints", SIAM Journal on Applied Mathematics 8 (1960) 181-217.
[34] J.B. Rosen and S. Suzuki, "Construction of nonlinear programming test problems", Communications of the ACM 8 (1965) 113.
[35] K. Schittkowski, Nonlinear programming codes—Information, tests, performance, Lecture Notes in Economics and Mathematical Systems No. 183 (Springer-Verlag, Berlin, 1980).
[36] K. Schittkowski and J. Stoer, "A factorization method for the solution of constrained linear least squares problems allowing subsequent data changes", Numerische Mathematik 31 (1979) 431-463.
[37] J. Stoer, "On the numerical solution of constrained least squares problems", SIAM Journal on Numerical Analysis 8 (1971) 382-411.
[38] H. Theil and C. van de Panne, "Quadratic programming as an extension of classical quadratic maximization", Management Science 7 (1960) 1-20.
[39] C. van de Panne and A. Whinston, "The simplex and the dual method for quadratic programming", Operational Research Quarterly 15 (1964) 355-388.
[40] R.B. Wilson, "A simplicial algorithm for concave programming", Ph.D. Dissertation, Harvard University (Cambridge, MA, 1963).
[41] P. Wolfe, "The simplex method for quadratic programming", Econometrica 27 (1959) 382-398.