Lecture10 PDF
2 in the text.
and therefore $\varphi'(0) = \nabla f(x_0) \cdot u = \|\nabla f(x_0)\| \cos\theta$, where $\theta$ is the angle between $\nabla f(x_0)$ and $u$. It follows that $\varphi'(0)$ is minimized when $\theta = \pi$, which yields

\[ u = -\frac{\nabla f(x_0)}{\|\nabla f(x_0)\|}, \qquad \varphi'(0) = -\|\nabla f(x_0)\|. \]

We can therefore reduce the problem of minimizing a function of several variables to a single-variable minimization problem, by finding the minimum of $\varphi(t)$ for this choice of $u$. That is, we find the value of $t$, for $t > 0$, that minimizes

\[ \varphi_0(t) = f(x_0 - t\nabla f(x_0)). \]

After finding the minimizer $t_0$, we can set

\[ x_1 = x_0 - t_0 \nabla f(x_0) \]
and continue the process, by searching from $x_1$ in the direction of $-\nabla f(x_1)$ to obtain $x_2$ by minimizing $\varphi_1(t) = f(x_1 - t\nabla f(x_1))$, and so on.

This is the Method of Steepest Descent: given an initial guess $x_0$, the method computes a sequence of iterates $\{x_k\}$, where

\[ x_{k+1} = x_k - t_k \nabla f(x_k), \qquad k = 0, 1, 2, \ldots, \]

where $t_k > 0$ minimizes the function

\[ \varphi_k(t) = f(x_k - t\nabla f(x_k)). \]

Example We apply the Method of Steepest Descent to the function $f(x, y) = 4x^2 - 4xy + 2y^2$ with initial point $x_0 = (2, 3)$. We first compute the steepest descent direction from $\nabla f(x, y) = (8x - 4y, 4y - 4x)$ to obtain $-\nabla f(x_0) = -\nabla f(2, 3) = -(4, 4)$. We then minimize the function $\varphi(t) = f((2, 3) - t(4, 4)) = f(2 - 4t, 3 - 4t)$ by computing

\[
\begin{aligned}
\varphi'(t) &= -\nabla f(2 - 4t, 3 - 4t) \cdot (4, 4) \\
&= -(8(2 - 4t) - 4(3 - 4t),\; 4(3 - 4t) - 4(2 - 4t)) \cdot (4, 4) \\
&= -(16 - 32t - 12 + 16t,\; 12 - 16t - 8 + 16t) \cdot (4, 4) \\
&= -(-16t + 4,\; 4) \cdot (4, 4) \\
&= 64t - 32.
\end{aligned}
\]

This strictly convex function has a strict global minimum when $\varphi'(t) = 64t - 32 = 0$, or $t = 1/2$, as can be seen by noting that $\varphi''(t) = 64 > 0$. We therefore set

\[ x_1 = x_0 - \frac{1}{2}\nabla f(x_0) = (2, 3) - \frac{1}{2}(4, 4) = (0, 1). \]

Continuing the process, we have $-\nabla f(x_1) = -\nabla f(0, 1) = -(-4, 4)$,
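and by defining $\varphi(t) = f((0, 1) - t(-4, 4)) = f(4t, 1 - 4t)$ we obtain

\[
\begin{aligned}
\varphi'(t) &= -(8(4t) - 4(1 - 4t),\; 4(1 - 4t) - 4(4t)) \cdot (-4, 4) \\
&= -(48t - 4,\; -32t + 4) \cdot (-4, 4) \\
&= 320t - 32.
\end{aligned}
\]

We have $\varphi'(t) = 0$ when $t = 1/10$, and because $\varphi''(t) = 320 > 0$, this critical point is a strict global minimizer. We therefore set

\[ x_2 = x_1 - \frac{1}{10}\nabla f(x_1) = (0, 1) - \frac{1}{10}(-4, 4) = \left( \frac{2}{5}, \frac{3}{5} \right). \]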
Repeating this process yields $x_3 = (0, \tfrac{2}{10})$. We can see that the Method of Steepest Descent produces a sequence of iterates $x_k$ that is converging to the strict global minimizer of $f(x, y)$ at $x^* = (0, 0)$.
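The iterates in this example can be checked with a short script. The following is a minimal sketch, not part of the lecture: it uses exact rational arithmetic, and the helper names `grad`, `exact_step` and `steepest_descent` are illustrative. The closed-form step in `exact_step` relies on the assumption that $f$ is quadratic, so that its gradient is linear; it is not a general line search.

```python
from fractions import Fraction


def grad(p):
    """Gradient of f(x, y) = 4x^2 - 4xy + 2y^2 at the point p = (x, y)."""
    x, y = p
    return (8 * x - 4 * y, 4 * y - 4 * x)


def exact_step(p):
    """Return the t that minimizes phi(t) = f(p - t * grad f(p)).

    Assumption: f is this particular quadratic, so grad is a linear map and
    phi'(t) = -(g . g) + t * (grad(g) . g), where g = grad f(p).
    Setting phi'(t) = 0 gives t = (g . g) / (grad(g) . g).
    """
    g = grad(p)
    hg = grad(g)  # linear gradient: grad(g) is the Hessian applied to g
    return (g[0] * g[0] + g[1] * g[1]) / (hg[0] * g[0] + hg[1] * g[1])


def steepest_descent(p0, n_steps):
    """Return the list of iterates [x0, x1, ..., xn]."""
    iterates = [p0]
    p = p0
    for _ in range(n_steps):
        g = grad(p)
        if g == (0, 0):  # already at the critical point, nothing to do
            break
        t = exact_step(p)
        p = (p[0] - t * g[0], p[1] - t * g[1])
        iterates.append(p)
    return iterates


# Start from x0 = (2, 3), as in the example, and take three steps.
iterates = steepest_descent((Fraction(2), Fraction(3)), 3)
for k, (x, y) in enumerate(iterates):
    print(f"x{k} = ({x}, {y})")
```

Run from $x_0 = (2, 3)$, the script uses the step lengths $1/2$ and $1/10$ found by hand above, and its third iterate is $(0, 1/5)$, the same point as $(0, 2/10)$.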
The following theorems describe some important properties of the Method of Steepest Descent.

Theorem Let $f : \mathbb{R}^n \to \mathbb{R}$ be continuously differentiable on $\mathbb{R}^n$, and let $x_0 \in D$. Let $t^* > 0$ be the minimizer of the function

\[ \varphi(t) = f(x_0 - t\nabla f(x_0)), \qquad t \ge 0, \]

and let $x_1 = x_0 - t^* \nabla f(x_0)$. Then $f(x_1) < f(x_0)$.

That is, the Method of Steepest Descent is guaranteed to make at least some progress toward a minimizer $x^*$ during each iteration. This theorem can be proven by showing that $\varphi'(0) = -\|\nabla f(x_0)\|^2 < 0$ whenever $\nabla f(x_0) \neq 0$, which guarantees the existence of $\bar{t} > 0$ such that $\varphi(\bar{t}) < \varphi(0)$.

Theorem Let $f : \mathbb{R}^n \to \mathbb{R}$ be continuously differentiable on $\mathbb{R}^n$, and let $x_k$ and $x_{k+1}$, for $k \ge 0$, be two consecutive iterates produced by the Method of Steepest Descent. Then the steepest descent directions from $x_k$ and $x_{k+1}$ are orthogonal; that is,

\[ \nabla f(x_k) \cdot \nabla f(x_{k+1}) = 0. \]

This theorem can be proven by noting that $x_{k+1}$ is obtained by finding a critical point $t^*$ of $\varphi(t) = f(x_k - t\nabla f(x_k))$, and therefore

\[ \varphi'(t^*) = -\nabla f(x_{k+1}) \cdot \nabla f(x_k) = 0. \]

That is, the Method of Steepest Descent pursues mutually orthogonal search directions from one iteration to the next. However, in some cases this causes the method to "zig-zag" from the initial iterate $x_0$ to the minimizer $x^*$.
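Both properties can be observed numerically on the quadratic $f(x, y) = 4x^2 - 4xy + 2y^2$ from the example. The sketch below is illustrative only: the helper names are hypothetical, exact rational arithmetic is used so that orthogonality shows up as an exact zero, and the closed-form step length assumes that $f$ is quadratic.

```python
from fractions import Fraction


def f(p):
    """f(x, y) = 4x^2 - 4xy + 2y^2."""
    x, y = p
    return 4 * x * x - 4 * x * y + 2 * y * y


def grad(p):
    """Gradient of f at p = (x, y)."""
    x, y = p
    return (8 * x - 4 * y, 4 * y - 4 * x)


def dot(a, b):
    """Dot product of two vectors in R^2."""
    return a[0] * b[0] + a[1] * b[1]


def step(p):
    """One steepest descent step with exact line search (quadratic f only)."""
    g = grad(p)
    # Root of phi'(t) = -(g . g) + t * (grad(g) . g); valid because grad is linear.
    t = dot(g, g) / dot(grad(g), g)
    return (p[0] - t * g[0], p[1] - t * g[1])


def check_properties(p0, n_steps):
    """Return two lists: f(x_k) - f(x_{k+1}) and grad f(x_k) . grad f(x_{k+1})."""
    decreases, dots = [], []
    p = p0
    for _ in range(n_steps):
        q = step(p)
        decreases.append(f(p) - f(q))
        dots.append(dot(grad(p), grad(q)))
        p = q
    return decreases, dots


# Five steps from x0 = (2, 3); the gradient is nonzero at each of these iterates.
decreases, dots = check_properties((Fraction(2), Fraction(3)), 5)
print("decrease in f per step:", [str(d) for d in decreases])
print("consecutive gradient dot products:", [str(d) for d in dots])
```

Every entry of `decreases` is positive, as the first theorem states, and every entry of `dots` is zero, as the second theorem states. The alternation between two fixed orthogonal directions is the zig-zag behavior described above.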
We have seen that Newton's Method can fail to converge to a solution if the initial iterate is not chosen wisely. For certain functions, however, the Method of Steepest Descent can be shown to be much more reliable.

Theorem Let $f : \mathbb{R}^n \to \mathbb{R}$ be a coercive function with continuous first partial derivatives on $\mathbb{R}^n$. Then, for any initial guess $x_0$, the sequence of iterates produced by the Method of Steepest Descent from $x_0$ contains a subsequence that converges to a critical point of $f$.

This result can be proved by applying the Bolzano-Weierstrass Theorem, which states that any bounded sequence contains a convergent subsequence. The sequence $\{f(x_k)\}_{k=0}^{\infty}$ is a decreasing sequence, as indicated by a previous theorem, and it is bounded below, because $f(x)$ is continuous and coercive and therefore has a global minimum $f(x^*)$. It follows that the sequence $\{x_k\}$ is also bounded, for a coercive function cannot be bounded on an unbounded set. By the Bolzano-Weierstrass Theorem, $\{x_k\}$ has a convergent subsequence $\{x_{k_p}\}$, which can be shown to converge to a critical point of $f(x)$. Intuitively, as $x_{k+1} = x_k - t^* \nabla f(x_k)$ for some $t^* > 0$, convergence of $\{x_{k_p}\}$ implies that
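
\[ 0 = \lim_{p \to \infty} \left( x_{k_{p+1}} - x_{k_p} \right) = -\lim_{p \to \infty} \sum_{i=k_p}^{k_{p+1}-1} t_i^* \nabla f(x_i), \qquad t_i^* > 0, \]
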
which suggests the convergence of $\nabla f(x_{k_p})$ to zero.

If $f(x)$ is also strictly convex, we obtain the following stronger result about the reliability of the Method of Steepest Descent.

Theorem Let $f : \mathbb{R}^n \to \mathbb{R}$ be a coercive, strictly convex function with continuous first partial derivatives on $\mathbb{R}^n$. Then, for any initial guess $x_0$, the sequence of iterates produced by the Method of Steepest Descent from $x_0$ converges to the unique global minimizer $x^*$ of $f(x)$ on $\mathbb{R}^n$.

This theorem can be proved by contradiction. If the sequence $\{x_k\}$ of steepest descent iterates does not converge to $x^*$, then it has a subsequence that does not converge to $x^*$. By the previous theorem, any such subsequence must contain a subsequence that converges to a critical point. But $f(x)$ has only one critical point, which is $x^*$, and this yields a contradiction.
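As an informal illustration of this last theorem, the sketch below runs the method from several arbitrary starting points on $f(x, y) = 4x^2 - 4xy + 2y^2$, which is coercive and strictly convex because its Hessian is positive definite. The starting points, the tolerance `tol`, the cap `max_iter` and the function names are assumptions made for the illustration, and the closed-form step length is again valid only for a quadratic. Floating-point arithmetic is used, so convergence is observed only up to the tolerance.

```python
import math


def grad(p):
    """Gradient of f(x, y) = 4x^2 - 4xy + 2y^2 at p = (x, y)."""
    x, y = p
    return (8 * x - 4 * y, 4 * y - 4 * x)


def steepest_descent(p0, tol=1e-10, max_iter=1000):
    """Iterate until ||grad f|| < tol; return (final point, iterations used)."""
    p = p0
    for k in range(max_iter):
        g = grad(p)
        if math.hypot(g[0], g[1]) < tol:
            return p, k
        hg = grad(g)  # Hessian applied to g, since grad is linear for this f
        # Exact line search for a quadratic: t = (g . g) / (g . Hg).
        t = (g[0] * g[0] + g[1] * g[1]) / (hg[0] * g[0] + hg[1] * g[1])
        p = (p[0] - t * g[0], p[1] - t * g[1])
    return p, max_iter


# Hypothetical starting points, chosen only to show independence from x0.
starts = [(2.0, 3.0), (-5.0, 1.0), (10.0, -7.0), (0.001, 250.0)]
results = [steepest_descent(p0) for p0 in starts]
for p0, (p, k) in zip(starts, results):
    print(f"start {p0}: reached ({p[0]:.2e}, {p[1]:.2e}) after {k} iterations")
```

Each run stops near the unique global minimizer $(0, 0)$. The iteration counts differ from one starting point to another, which reflects how strongly the iterates zig-zag along the way.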
Exercises
1. Chapter 3, Exercise 8
2. Chapter 3, Exercise 11
3. Chapter 3, Exercise 12