The Penalty Function Method
Lecture 13, Continuous Optimisation
Oxford University Computing Laboratory, HT 2006
Notes by Dr Raphael Hauser (hauser@comlab.ox.ac.uk)
I. Basic Concepts in Constrained Optimisation

In the remaining four lectures we will study algorithms for solving constrained nonlinear optimisation problems of the standard form

    (NLP)    min_{x ∈ R^n} f(x)
             s.t. g_i(x) = 0    (i ∈ E),
                  g_i(x) ≥ 0    (i ∈ I).
The use of merit functions allows one to combine the often conflicting goals of improving the objective function and achieving feasibility.

The use of a homotopy parameter allows one to reduce a constrained optimisation problem to a sequence of unconstrained optimisation problems.
Merit Functions: Starting from a current iterate x, we aim at finding a new update x^+ that brings us closer towards the achievement of two conflicting goals: reducing the objective function as much as possible, and satisfying the constraints. The two goals can be combined by minimising a merit function which depends both on the objective function and on the residuals measuring the constraint violation,

    r_E(x) := g_E(x),    r_I(x) := (g_I(x))^-,

where

    (g_j(x))^- := g_j(x)    if g_j(x) < 0,
                  0         otherwise.
Example 1: The penalty function method that will be further analysed below is based on the merit function

    Q(x, μ) := f(x) + (1/(2μ)) Σ_{i∈E∪I} (g_i^-(x))²,                (1)

where μ > 0 is a parameter and

    g_i^- := g_i            (i ∈ E),
    g_i^- := min(g_i, 0)    (i ∈ I).
Note that Q(x, μ) has continuous first but not second derivatives at points where one or several of the inequality constraints are active.
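As a concrete sketch of (1), consider a toy problem of my own (not from the notes): minimise f(x) = x1 + x2 subject to g1(x) = x1² + x2² − 2 = 0 and g2(x) = x2 ≥ 0. The penalty term vanishes at feasible points and, at infeasible points, grows as μ shrinks:

```python
# Minimal sketch (my own toy example, not from the notes) of the merit
# function (1) for: minimise f(x) = x1 + x2 subject to
# g1(x) = x1^2 + x2^2 - 2 = 0 (i in E) and g2(x) = x2 >= 0 (i in I).

def Q(x, mu):
    f = x[0] + x[1]
    g1 = x[0]**2 + x[1]**2 - 2.0   # equality constraint: counted in full
    g2m = min(x[1], 0.0)           # inequality constraint: minus-part min(g2, 0)
    return f + (g1**2 + g2m**2) / (2.0 * mu)

x_feas = (-2.0**0.5, 0.0)          # feasible point: no penalty, Q = f
x_infeas = (0.0, -1.0)             # infeasible point: both constraints violated
print(Q(x_feas, 0.1))              # equals f(x_feas) = -sqrt(2)
print(Q(x_infeas, 0.1), Q(x_infeas, 0.01))
```

At the feasible point the merit function coincides with f; at the infeasible point the penalty term dominates more strongly for the smaller value of μ.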
The Homotopy Idea: The second term of the merit function forces the constraint violation to be small when Q(x, μ) is minimised over x. We are not guaranteed that the constraints are exactly satisfied when μ is held fixed, but we can penalise constraint violation more strongly by choosing a smaller μ. This leads to the idea of a homotopy or continuation method, which is based on reducing μ dynamically and using the following idea for the outermost iterative loop:
Given a current iterate x and a value μ of the homotopy parameter such that x is an approximate minimiser of the unconstrained problem

    min_{y ∈ R^n} Q(y, μ),                (2)

reduce μ to a value μ^+ < μ and, starting from x, apply one or several steps of an iterative algorithm for the minimisation of

    min_{y ∈ R^n} Q(y, μ^+)

until an approximate minimiser x^+ of this problem is reached. Thus, the continuation approach replaces the constrained problem (NLP) by a sequence of unconstrained problems (2) for which we have already studied solution methods.
Algorithm QPen (quadratic penalty method):

S0  Initialisation:
        choose x_0 ∈ R^n              % (not necessarily feasible)
        choose (μ_k)_{k∈N_0} ↓ 0      % (homotopy parameters)
        choose (τ_k)_{k∈N_0} ↓ 0      % (tolerance parameters)

S1  For k = 0, 1, 2, ... repeat
        y^[0] := x_k, l := 0
        until ‖∇_x Q(y^[l], μ_k)‖ ≤ τ_k repeat
            find y^[l+1] such that Q(y^[l+1], μ_k) < Q(y^[l], μ_k)
                                      % (using an unconstrained minimisation method)
            l ← l + 1
        end
        x_{k+1} := y^[l]
    end.
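The loop structure of Algorithm QPen can be sketched in a few lines of code. This is a minimal illustration under my own choices, not the notes' implementation: the toy problem below (minimise f(x) = x1 + x2 subject to x1² + x2² − 2 = 0 and x2 ≥ 0, whose solution is x* = (−√2, 0) with the inequality active) and the Armijo backtracking gradient step used as the inner unconstrained minimisation method are both assumptions made for the sake of the example.

```python
import numpy as np

# Sketch of Algorithm QPen on a toy problem of my own (not from the notes):
#   minimise f(x) = x1 + x2
#   s.t.     g1(x) = x1^2 + x2^2 - 2  = 0   (i in E)
#            g2(x) = x2              >= 0   (i in I)
# The solution is x* = (-sqrt(2), 0), with the inequality constraint active.

def Q(x, mu):
    g1 = x[0]**2 + x[1]**2 - 2.0          # equality residual
    g2m = min(x[1], 0.0)                  # minus-part of the inequality
    return x[0] + x[1] + (g1**2 + g2m**2) / (2.0 * mu)

def grad_Q(x, mu):
    # Identity (4): grad f + (1/mu) * sum_i g_i^-(x) * grad g_i(x)
    g1 = x[0]**2 + x[1]**2 - 2.0
    g = np.array([1.0, 1.0]) + (g1 / mu) * np.array([2.0 * x[0], 2.0 * x[1]])
    if x[1] < 0.0:                        # minus-part vanishes where g2 >= 0
        g += (x[1] / mu) * np.array([0.0, 1.0])
    return g

def qpen(x0, mus, taus, max_inner=50000):
    x = np.asarray(x0, dtype=float)
    for mu, tau in zip(mus, taus):        # outer homotopy loop (step S1)
        y = x.copy()                      # warm start: y^[0] := x_k
        for _ in range(max_inner):
            g = grad_Q(y, mu)
            if np.linalg.norm(g) <= tau:  # inner termination criterion
                break
            t = 1.0                       # Armijo backtracking line search,
            while Q(y - t * g, mu) > Q(y, mu) - 1e-4 * t * (g @ g):
                t *= 0.5                  # guarantees Q(y^[l+1]) < Q(y^[l])
            y = y - t * g
        x = y                             # x_{k+1} := y^[l]
    return x

x = qpen([0.0, 0.0], mus=[0.1, 0.01, 0.001], taus=[1e-2, 1e-3, 1e-4])
print(x)   # close to (-sqrt(2), 0); x[1] is slightly negative, of order mu_k
```

In line with Theorem 1, the returned iterate satisfies the inequality only up to O(μ_k) (x2 is slightly negative), and the multiplier estimate (3), λ2^[k] = −min(x2, 0)/μ_k, approaches the exact multiplier λ2* = 1 of this example.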
Theorem 1 (Convergence of Algorithm QPen): Let f and g_i be C¹ functions for all i ∈ E ∪ I, let x* be an accumulation point of the sequence of iterates (x_k)_{k∈N_0} generated by Algorithm QPen, and let (k_l)_{l∈N_0} ⊆ (k)_{k∈N_0} be such that lim_{l→∞} x_{k_l} = x*. Let us furthermore assume that the set of gradients {∇g_i(x*) : i ∈ V(x*)} is linearly independent, where

    V(x*) := E ∪ {j ∈ I : g_j(x*) ≤ 0}

is the index set of active, violated and equality constraints. For i ∈ E ∪ I let

    λ_i^[k] := − g_i^-(x_{k+1}) / μ_k.                (3)

Then

    i)   x* is feasible,
    ii)  the LICQ holds at x*,
    iii) the limit λ* := lim_{l→∞} λ^[k_l] exists,
    iv)  (x*, λ*) is a KKT point.
The proof we are about to give only depends on the termination criterion in step S1 and not on the starting point y^[0] in each iteration. We may therefore assume without loss of generality that k_l = l for all l ∈ N_0.

Proof: Using ‖∇_x Q(x_{k+1}, μ_k)‖ ≤ τ_k and the identity

    ∇_x Q(x, μ) = ∇f(x) + (1/μ) Σ_{i∈E∪I} g_i^-(x) ∇g_i(x)                (4)

in conjunction with the triangle inequality, we find

    ‖ Σ_{i∈E∪I} g_i^-(x_{k+1}) ∇g_i(x_{k+1}) ‖
        = μ_k ‖ ∇_x Q(x_{k+1}, μ_k) − ∇f(x_{k+1}) ‖
        ≤ μ_k ( τ_k + ‖∇f(x_{k+1})‖ ).                (5)

Since x_{k+1} → x*, μ_k → 0 and τ_k → 0, the right-hand side converges to 0 · (0 + ‖∇f(x*)‖) = 0. Therefore, the left-hand side of (5) converges to zero, and

    Σ_{i∈E∪I} g_i^-(x*) ∇g_i(x*) = 0.

Since g_i^-(x*) = 0 for i ∉ V(x*) and {∇g_i(x*) : i ∈ V(x*)} is linearly independent, it must be true that

    g_i^-(x*) = 0    (i ∈ V(x*)),

which shows that x* is feasible. This settles i). Since x* is feasible, we have V(x*) = E ∪ A(x*). The linear independence of {∇g_i(x*) : i ∈ V(x*)} therefore implies that the LICQ holds at x*, settling ii).
Next, since τ_k → 0, the termination criterion of step S1 yields

    lim_{k→∞} ∇_x Q(x_{k+1}, μ_k) = 0.

Moreover, ∇f is continuous, so that lim_{k→∞} ∇f(x_{k+1}) = ∇f(x*). Therefore, it follows from (4) that

    lim_{k→∞} Σ_{i∈E∪I} λ_i^[k] ∇g_i(x_{k+1}) = ∇f(x*).                (6)
Note that if j ∈ I and g_j(x*) > 0, then g_j(x_{k+1}) > 0, and hence g_j^-(x_{k+1}) = 0, for all k sufficiently large. In this case

    λ_j* := lim_{k→∞} λ_j^[k] = lim_{k→∞} ( − g_j^-(x_{k+1}) / μ_k ) = lim_{k→∞} 0 = 0.                (7)
On the other hand, since the LICQ holds at x*, we have lim_{k→∞} ∇g_i(x_{k+1}) = ∇g_i(x*) ≠ 0 for all i ∈ E ∪ A(x*), and hence

    φ_i^k → φ_i    (i ∈ E ∪ A(x*)),

where φ_i^k, φ_i : R^n → R are the unique linear functionals such that

    φ_i^k(∇g_j(x_{k+1})) = φ_i(∇g_j(x*)) = δ_ij := 1 if i = j,
                                                   0 if i ≠ j.

This implies (see lecture notes for details) that

    λ_i* := lim_{k→∞} λ_i^[k] = lim_{k→∞} φ_i^k( Σ_{j∈E∪I} λ_j^[k] ∇g_j(x_{k+1}) ) = φ_i(∇f(x*))

exists for all i ∈ E ∪ A(x*). Together with (7), this shows iii), and passing to the limit in (6) yields

    ∇f(x*) − Σ_{i∈E∪I} λ_i* ∇g_i(x*) = 0.                (8)

The first of the KKT equations was thus established in (8). Moreover, we have already established that x* is feasible and that λ_j* = 0 for j ∈ I \ A(x*), showing complementarity.
It remains to show that λ_j* ≥ 0 for all j ∈ A(x*) ∩ I. If g_j(x_{k+1}) ≤ 0 occurs infinitely often, then λ_j^[k] ≥ 0 infinitely often, and clearly λ_j* ≥ 0. On the other hand, if j ∈ A(x*) and g_j(x_{k+1}) > 0 for all k sufficiently large, then g_j^-(x_{k+1}) = 0 and λ_j^[k] = 0 for all k sufficiently large, and this implies that λ_j* = 0. This establishes iv) and completes the proof.
A Few Computational Issues: It follows from the fact that the approximate Lagrange multipliers λ_i^[k] converge that g_i^-(x_{k+1}) = −μ_k λ_i^[k] = O(μ_k) for all i ∈ V(x*). This shows that μ_k has to be reduced to the order of precision by which we want the final result to satisfy the constraints. The Augmented Lagrangian Method, which we will discuss in Lecture 14, performs much better in this respect.
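The order-of-magnitude claim g_i^-(x_{k+1}) = O(μ_k) can be seen exactly in a one-dimensional example of my own (not from the notes): minimise f(x) = x subject to x ≥ 0, with solution x* = 0 and multiplier λ* = 1. For x < 0 we have dQ/dx = 1 + x/μ, so the minimiser of Q(·, μ) is x(μ) = −μ, and the constraint violation shrinks only at the rate at which μ_k is reduced:

```python
# 1-D illustration (my own example) of why mu_k must be driven down to the
# target constraint accuracy: minimise f(x) = x subject to x >= 0.
# For x < 0, dQ/dx = 1 + x/mu, so the minimiser of Q(., mu) is x(mu) = -mu.
for mu in (1e-1, 1e-2, 1e-3):
    x_mu = -mu                       # unconstrained minimiser of Q(., mu)
    violation = min(x_mu, 0.0)       # g^-(x(mu)) = -mu, i.e. exactly O(mu)
    lam = -violation / mu            # multiplier estimate (3): already exact
    print(f"mu={mu:g}  violation={violation:g}  lambda={lam:g}")
```

Here the multiplier estimate (3) equals the exact multiplier λ* = 1 for every μ, while the feasibility error is exactly μ.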
At points where it exists, the Hessian of the merit function takes the form

    D²_xx Q(x, μ) = C(x) + (1/μ) Aᵀ(x) A(x),

where Aᵀ(x) is the matrix with columns {∇g_i(x) : i ∈ V(x)} and C(x) := ∇²f(x) + (1/μ) Σ_{i∈E∪I} g_i^-(x) ∇²g_i(x) remains bounded as μ_k → 0, since the quotients g_i^-(x_{k+1})/μ_k converge. Although D²_xx Q(x, μ) is discontinuous on the boundary of the feasible domain, it can be argued that this is usually inconsequential in algorithms.
When D²_xx Q(x, μ_k) is used for the minimisation of Q(y, μ_k) in the innermost loop of Algorithm QPen, the computations can become very ill-conditioned. For example, solving the Newton equations

    D²_xx Q(y^[l], μ_k) d_l = −∇_x Q(y^[l], μ_k)                (9)

directly can lead to large errors, as the condition number of the matrix

    C(y^[l]) + (1/μ_k) Aᵀ(y^[l]) A(y^[l])

is of order O(1/μ_k).
In this particular example, it is better to introduce a new dummy variable ζ_l and to reformulate (9) as follows:

    [ C(y^[l])    Aᵀ(y^[l]) ] [ d_l ]     [ −∇_x Q(y^[l], μ_k) ]
    [ A(y^[l])    −μ_k I    ] [ ζ_l ]  =  [ 0                  ].                (10)

Indeed, if (d_l, ζ_l)ᵀ satisfies (10), then d_l solves (9): the second block row gives ζ_l = (1/μ_k) A d_l, and substituting into the first block row yields

    −∇_x Q = C d_l + Aᵀ ζ_l = C d_l + (1/μ_k) Aᵀ A d_l.
The advantage of this approach is that the system (10) is usually well-conditioned, so that the numerical results are of high precision. Similar tricks can be applied when a quasi-Newton method is used instead of the Newton–Raphson method.
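The conditioning argument can be checked numerically with a small sketch (my own numbers, not from the notes): take C = I and a single active constraint gradient a = (1, 0). The condition number of the matrix in (9) grows like 1/μ_k, while that of the bordered matrix in (10) stays bounded, and both systems produce the same Newton step d_l:

```python
import numpy as np

# Numerical sketch (my own numbers) comparing the direct system (9) with the
# bordered system (10). Here C = I and there is one active constraint, so the
# matrix A (whose rows are the active constraint gradients) is 1 x 2.
C = np.eye(2)
A = np.array([[1.0, 0.0]])

rhs = np.array([1.0, 1.0])           # stands in for -grad_x Q(y^[l], mu_k)
for mu in (1e-2, 1e-4, 1e-6):
    H = C + A.T @ A / mu             # Hessian in (9): condition number O(1/mu)
    K = np.block([[C, A.T],
                  [A, -mu * np.eye(1)]])   # bordered matrix from (10)
    d_direct = np.linalg.solve(H, rhs)     # Newton step from (9)
    d_zeta = np.linalg.solve(K, np.concatenate([rhs, [0.0]]))  # (d_l, zeta_l)
    print(f"mu={mu:g}  cond(H)={np.linalg.cond(H):.2e}  "
          f"cond(K)={np.linalg.cond(K):.2f}")
```

For μ = 10⁻⁶ the direct matrix has condition number of order 10⁶, while the condition number of the bordered matrix stays below 3 in this example, and the first block of the solution of (10) agrees with the solution of (9).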