The Computer Journal 1965 Nelder 308 13
A method is described for the minimization of a function of n variables, which depends on the comparison of function values at the (n + 1) vertices of a general simplex, followed by the replacement of the vertex with the highest value by another point. The simplex adapts itself to the local landscape, and contracts on to the final minimum. The method is shown to be effective and computationally compact. A procedure is given for the estimation of the Hessian matrix in the neighbourhood of the minimum, needed in statistical estimation problems.
Spendley et al. (1962) introduced an ingenious idea for tracking optimum operating conditions by evaluating the output from a system at a set of points forming a simplex in the factor-space, and continually forming new simplices by reflecting one point in the hyperplane of the remaining points.

[…] of P̄ from Ph, with [P*P̄] = α[PhP̄]. If y* lies between yh and yl, then Ph is replaced by P* and we start again with the new simplex. If y* < yl, i.e. if reflection has produced a new minimum, […]
[Fig. 1. Flow diagram. The diagram tests y* and y** and forms the expansion P** = (1 + γ)P* − γP̄ or the contraction P** = βPh + (1 − β)P̄, replacing Ph by P* or P**, or replacing all Pi, as appropriate.]
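The moves summarized in Fig. 1 can be sketched in a modern language. The following Python fragment is an illustrative reconstruction, not the authors' Extended Mercury Autocode routine; the names and the exact tie-breaking details are our own, with the conventional coefficients α (reflection), β (contraction), and γ (expansion) as parameters:

```python
import numpy as np

def simplex_step(simplex, f, alpha=1.0, beta=0.5, gamma=2.0):
    """One cycle of the simplex method: reflect the highest vertex, then
    attempt expansion or contraction as in Fig. 1.  simplex is an
    (n+1, n) array of vertices; f maps an n-vector to a function value."""
    y = np.array([f(p) for p in simplex])
    h, l = np.argmax(y), np.argmin(y)                # highest and lowest vertices
    n = simplex.shape[1]
    pbar = (simplex.sum(axis=0) - simplex[h]) / n    # centroid excluding Ph
    pstar = (1 + alpha) * pbar - alpha * simplex[h]  # reflection P*
    ystar = f(pstar)
    if ystar < y[l]:                                 # new minimum: try expansion
        pss = (1 + gamma) * pstar - gamma * pbar     # P** = (1+g)P* - g*Pbar
        simplex[h] = pss if f(pss) < ystar else pstar
    elif ystar < np.delete(y, h).max():              # better than some other vertex
        simplex[h] = pstar
    else:                                            # reflection failed: contract
        if ystar < y[h]:
            simplex[h], y[h] = pstar, ystar          # keep the better of Ph, P*
        pss = beta * simplex[h] + (1 - beta) * pbar  # P** = b*Ph + (1-b)*Pbar
        if f(pss) < y[h]:
            simplex[h] = pss
        else:                                        # contraction failed: shrink
            simplex[:] = (simplex + simplex[l]) / 2  # all Pi -> (Pi + Pl)/2
    return simplex
```

Iterating this step drives the simplex downhill; a convergence test on the spread of the y's (described later in the text) decides when to stop.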
simplex rather than with changes in the x's. The form chosen is to compare the "standard error" of the y's, in the form √{Σ(yi − ȳ)²/n}, with a pre-set value, and to stop when it falls below this value. The success of the criterion depends on the simplex not becoming too small in relation to the curvature of the surface until the final minimum is reached. The reasoning behind the criterion is that in statistical problems where one is concerned with finding the minimum of a negative likelihood surface (or of a sum-of-squares surface) the curvature near the minimum gives the information available on the unknown parameters. If the curvature is slight the sampling variance of the estimates will be large, so there is no sense in finding the co-ordinates of the minimum very accurately, while if the curvature is marked there is justification for pinning down the minimum more exactly.

Constraints on the volume to be searched

If, for example, one of the xi must be non-negative in a minimization problem, then our method may be adapted in one of two ways. The scale of the x concerned can be transformed, e.g., by using the logarithm, so that negative values are excluded, or the function can be modified to take a large positive value for all negative x. In the latter case any trespassing by the simplex over the border will be followed automatically by contraction moves which will eventually keep it inside. In either case an actual minimum with x = 0 would be inaccessible in general, though arbitrarily close approaches could be made to it. Clearly either technique can deal with individual limitations on the range of any number of x's. Constraints involving more than one x can be included using the second technique provided that an initial simplex can be found inside the permitted region, from which to start the process. Linear constraints that reduce the dimensionality of the field of search can be included by choosing the initial simplex to satisfy the constraints and reducing the dimensions accordingly. Thus to minimize y = f(x1, x2, x3) subject to x1 + x2 + x3 = X, we could choose an initial simplex with vertices (X, 0, 0), (0, X, 0), and (0, 0, X), treating the search as being in two dimensions. In particular, any xi may be held constant by setting its value to that constant for all vertices of the initial simplex.

Results

Three functions, all of which have been used before for testing minimization procedures, were used to test the method. The functions, all of which have a minimum of zero, were:

(1) Rosenbrock's parabolic valley (Rosenbrock (1960))
y = 100(x2 − x1²)² + (1 − x1)², starting point (−1.2, 1).

(2) Powell's quartic function (Powell (1962))
y = (x1 + 10x2)² + 5(x3 − x4)² + (x2 − 2x3)⁴ + 10(x1 − x4)⁴, starting point (3, −1, 0, 1).
Function minimization
(3) Fletcher and Powell's helical valley (Fletcher and Powell (1963))
y = 100[x3 − 10θ(x1, x2)]² + [√(x1² + x2²) − 1]² + x3²,
where 2πθ(x1, x2) = arctan (x2/x1), x1 > 0
= π + arctan (x2/x1), x1 < 0,
starting point (−1, 0, 0).

The stopping criterion used was √{Σ(yi − ȳ)²/n} < 10⁻⁸. The function value at the centroid of the final simplex usually deviated from the true minimum by less than 10⁻⁸; a sample of runs gave 2.5 × 10⁻⁹ as the geometric mean of this deviation. A difficulty encountered in testing the procedure was that the size and orientation of the initial simplex had an effect on the speed of convergence, and consequently several initial step-lengths and several arrangements of the initial simplex were tried.

Table 1
Mean and minimum numbers of evaluations for function 2

                           STRATEGY (α, β, γ) — MEAN NUMBER
STEP-LENGTH   (1, ½, 2)   (1, ½, 3)   (1, ½, 4)   (1, ¾, 2)   (1, ¼, 2)
0.25             225         234         282         255         344
0.5              210         234         254         300         335
1                216         229         260         283         343
2                216         239         250         264         253
4                226         241         251         249         347
Mean             219         235         259         270         322
[minimum numbers not recovered]

[Table 2: function values reached after given numbers of function evaluations, for functions (1), (2), and (3). The values for Powell's method were obtained by logarithmic interpolation of the function values at the end of each iteration. Data for functions (1) and (2) from Powell (1964); data for function (3) from our EMA program of his method.]
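The stopping rule quoted above compares the "standard error" of the function values at the n + 1 vertices with a preset tolerance. A minimal sketch (the function name is ours):

```python
import math

def stop_criterion(y_values, tol=1e-8):
    # y_values: function values at the n+1 simplex vertices.
    # Stop when sqrt( sum_i (y_i - ybar)^2 / n ) falls below tol.
    n = len(y_values) - 1
    ybar = sum(y_values) / len(y_values)
    return math.sqrt(sum((y - ybar) ** 2 for y in y_values) / n) < tol
```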
Powell (1964) suggested a more complex convergence criterion for this general problem, based on perturbing the first minimum found and repeating the method to find a second minimum, followed by exploration along the line joining the two. An alternative technique, more suited to our convergence criterion in terms of function variation, is to continue after the first convergence for a prescribed number of evaluations, to test for convergence again and, if the second test proves successful, to compare the two "converged" function values. Only if these values are sufficiently close is convergence allowed.

The simplex method is computationally compact; on the Orion computer the basic routine (without final printing) contains less than 350 instructions, and the great majority of orders are additions and subtractions or simple logical orders. There are few multiplications, and no divisions at all except on entering and leaving the routine. Copies of the routine, written in Extended Mercury Autocode, are available from the authors.
Appendix
The Hessian matrix at the minimum
The minimization method proposed, being independent of the properties of quadratic forms, does not yield any estimate of the Hessian matrix of second derivatives at the minimum. This matrix is, of course, the information matrix in statistical problems when the function being minimized is minus the log likelihood, and its inverse is the sample variance-covariance matrix of the estimates. A convenient way of utilizing a quadratic surface to estimate the minimum when the simplex is close to that minimum was given by Spendley et al. (1962), and their method can be readily extended to give the required variance-covariance matrix of the estimates.

If the (n + 1) points of the simplex in n dimensions are given by P0, P1, . . . Pn, then Spendley et al. form the "half-way points" Pij = (Pi + Pj)/2, i ≠ j, and fit a quadratic surface to the combined set of (n + 1)(n + 2)/2 points. If the original points of the simplex are used to define a set of oblique axes with co-ordinates xi, then the points may be taken as

(0, 0, 0, . . . 0)
(1, 0, 0, . . . 0)
(0, 1, 0, . . . 0)
. . .
and (0, 0, 0, . . . 1).

If the quadratic approximation to the function in the neighbourhood of the minimum is written as

y = a0 + 2Σ ai xi + Σ Σ bij xi xj,

or in vector form as

y = a0 + 2a′x + x′Bx,
then the coefficients are estimated as

ai = 2y0i − (yi + 3y0)/2, i = 1, . . ., n
bii = 2(yi + y0 − 2y0i), i = 1, . . ., n
bij = 2(yij + y0 − y0i − y0j), i ≠ j,

where yi is the function value at Pi and yij that at Pij. The estimated minimum is then given by

xmin = −B⁻¹a,

and the information matrix is just B.

If pi denotes the co-ordinates of Pi in the original system, and if Q is the n × n matrix whose ith column is pi − p0, then the minimum is estimated to be at

Pmin = P0 − QB⁻¹a,

so that the variance-covariance matrix is given by

QB⁻¹Q′.

If normal equal-variance independent errors are involved and the sum of squares of residuals is minimized, then this matrix must be multiplied by 2σ², where as usual σ² would be estimated by ymin/(N − n), N being the total number of observations, and n the number of parameters fitted.

In estimating B numerically it is necessary to steer a course between two hazards. In one the simplex is so small that (yij + y0 − y0i − y0j) consists largely of rounding-off errors incurred in calculating the y's. In the other the simplex is so large that the quadratic approximation is poor, and the b's are correspondingly biased. If the method given in this paper is used, the former hazard will usually be the important one, and it may be necessary to enlarge the final simplex before adding the extra points. A possible way of doing this
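The appendix's recipe can be sketched in code. The following Python fragment is our own illustration (names hypothetical): it evaluates the function at the vertices and half-way points, forms a and B by the formulae above, and returns the estimated minimum Pmin = P0 − QB⁻¹a together with the unscaled variance-covariance matrix QB⁻¹Q′:

```python
import numpy as np

def quadratic_surface_fit(f, points):
    """Fit y = a0 + 2a'x + x'Bx over the simplex and its half-way points,
    using the oblique axes defined by the simplex (appendix formulae).
    points: n+1 vertices in the original co-ordinate system."""
    P = [np.asarray(p, dtype=float) for p in points]
    n = len(P) - 1
    y0 = f(P[0])
    yi = [f(P[i]) for i in range(1, n + 1)]                # y_i at P_i
    y0i = [f((P[0] + P[i]) / 2) for i in range(1, n + 1)]  # half-way points P_0i
    a = np.array([2 * y0i[i] - (yi[i] + 3 * y0) / 2 for i in range(n)])
    B = np.empty((n, n))
    for i in range(n):
        B[i, i] = 2 * (yi[i] + y0 - 2 * y0i[i])
        for j in range(i + 1, n):
            yij = f((P[i + 1] + P[j + 1]) / 2)             # half-way point P_ij
            B[i, j] = B[j, i] = 2 * (yij + y0 - y0i[i] - y0i[j])
    Q = np.column_stack([P[i] - P[0] for i in range(1, n + 1)])
    xmin = P[0] - Q @ np.linalg.solve(B, a)                # Pmin = P0 - Q B^{-1} a
    cov = Q @ np.linalg.inv(B) @ Q.T                       # unscaled var-cov matrix
    return xmin, B, cov
```

For an exactly quadratic f the fit is exact, so the returned minimum is the true one up to rounding; for a general f it is subject to the two hazards discussed above.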