4 Iterative methods

4.1 What a two year old child can do
Suppose we want to find a number x such that cos x = x (in radians). This is a nonlinear equation; there is no explicit solution. Nevertheless, a two-year-old child can solve it with a calculator: just push the COS button many times... Suppose that x_0 = 0 is the initial number on the display, and x_1 = cos x_0, x_2 = cos x_1, etc. are the successive numbers. This is what you see:

x_0 = 0
x_1 = 1
x_2 = 0.5403
x_3 = 0.857
x_4 = 0.654
x_5 = 0.793
x_6 = 0.701
...
x_20 = 0.738
x_21 = 0.739
x_22 = 0.739

and you see that at least the first three digits have stabilized. This is the approximate solution to cos x = x. We also say that x ≈ 0.739 is a fixed point of the function f(x) = cos x. In general, a fixed point of a function f is a value x such that f(x) = x.

Of course the method does not always work. Try to find a nonzero root of x^2 = x with this method. Unless you start exactly from x = 1, the iteration will either blow up or converge to zero. This means that x = 0 is a stable solution, while x = 1 is unstable. But in this case you at least got a root... Try now 1/x^2 = x. Here the situation is worse: the iteration blows up however close you start to the true root x = 1, unless you start exactly at the root. This is because x = 1 is an unstable root of this equation.
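The whole experiment fits in a few lines of code. Here is a minimal sketch (in Python, which the notes do not otherwise use; the function name is chosen here) of the stable cos x = x iteration and of the unstable 1/x^2 = x iteration:

    # Minimal sketch of the "press COS repeatedly" experiment (plain Python).
    import math

    def iterate(f, x0, steps):
        """Return the result of applying f to x0 repeatedly: x0, f(x0), f(f(x0)), ..."""
        x = x0
        for _ in range(steps):
            x = f(x)
        return x

    # Stable fixed point: cos x = x.  Starting from 0 the iterates settle near 0.739.
    print(iterate(math.cos, 0.0, 30))            # ~0.7391

    # Unstable fixed point: 1/x^2 = x has the root x = 1, but starting even
    # slightly off the root the iterates drift away from it, so take only a few steps.
    print(iterate(lambda x: 1.0 / x**2, 1.01, 5))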
Finally, try 1/x = x. Here nothing bad happens, but the procedure never converges: the iteration alternates between two numbers, x_0 and x_1 = 1/x_0. In this case the algorithm ends up in an infinite cycle.

The conclusion is that whenever we try to solve an equation of the form f(x) = x (a fixed-point equation), one can try the two-year-old-child strategy: start from some x_0 and iterate, i.e. generate the sequence x_1 = f(x_0), x_2 = f(x_1), x_3 = f(x_2), etc. If the sequence stabilizes (converges), then you have found a solution. This very simple method is extremely useful, if it works... But we should emphasize that if the sequence does not converge, then no conclusion can be made! You cannot say that the equation has no solution! Moreover, even if you found a solution, there could be other solutions. Some of them could be accessible by the same method starting from a different initial value x_0; some of them may not be accessible at all. It can also happen that for some initial values the iteration converges, while for others it does not.

Whether the iteration converges or not is basically determined by the derivative of f at the fixed point. If |f'(x_fix)| < 1, then the iteration converges if you start it from a point close enough to the solution. The reason is roughly the following. From the iteration we have x^(n) = f(x^(n-1)), and from the fixed-point equation x_fix = f(x_fix). Subtract these equations from each other:

x^(n) - x_fix = f(x^(n-1)) - f(x_fix)
Now use the Taylor expansion around x_fix:

f(x^(n-1)) ≈ f(x_fix) + f'(x_fix) (x^(n-1) - x_fix)

if x^(n-1) is close to x_fix. Hence, from these two last equations,

|x^(n) - x_fix| ≈ |f'(x_fix)| |x^(n-1) - x_fix|

Hence, if |f'(x_fix)| < 1, then we see that x^(n) is closer to the fixed point than x^(n-1). After n iterations

|x^(n) - x_fix| ≈ |f'(x_fix)|^n |x^(0) - x_fix|

i.e. we see that the procedure converges exponentially fast if the number |f'(x_fix)| is strictly smaller than 1. Moreover, we can give an estimate on the speed of convergence. We see that after n steps the deviation of the approximate solution from the true one is reduced by a factor of |f'(x_fix)|^n. Hence if we want a certain precision ε, then we need at least n steps, where

|f'(x_fix)|^n |x^(0) - x_fix| ≤ ε,   i.e.   n ≥ log ε / log |f'(x_fix)|

if the error of the initial guess is at most of order one, i.e. one can assume that |x^(0) - x_fix| ≤ 1. In this formula we can use a logarithm of any base. For example, if |f'(x_fix)| = 0.8, the error of the initial guess was smaller than 1, and we want 10^{-8} precision, then we need at least

n ≥ lg 10^{-8} / lg 0.8 = (-8) / (-0.0969) ≈ 82.55
To be on the safe side, one usually overestimates this number by one or two, so we need around 83-84 iterations.

The argument above is not quite correct, since we neglected the higher order terms in the Taylor expansion. If you start the iteration from a point close enough to the fixed point, then these higher order terms are really negligible; otherwise they could ruin the convergence. Recall that a similar phenomenon was found for Newton's iteration. For a more rigorous discussion and nice pictures generated by a Java applet, see http://www.math.gatech.edu/ bourbaki/1501/html/pdf/fixedpoints.pdf

Finally, notice that any equation of the form F(x) = 0 can be brought into a fixed-point equation just by writing it as x - F(x) = x. Then solving F(x) = 0 is equivalent to finding the fixed point of f(x) := x - F(x).
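As an illustration of this rewriting, here is a small sketch (again Python; the function and variable names are invented here, not part of the notes): a generic fixed-point loop applied to f(x) := x - F(x) for the example F(x) = x - cos x. Whether the loop converges still depends on |f'| near the root, i.e. on |1 - F'(x_fix)|.

    # Sketch of reducing F(x) = 0 to the fixed-point form x - F(x) = x.
    import math

    def fixed_point(f, x0, tol=1e-10, max_steps=1000):
        """Iterate x -> f(x) until successive iterates stabilize."""
        x = x0
        for _ in range(max_steps):
            x_new = f(x)
            if abs(x_new - x) < tol:       # stop when the iterates agree to tolerance
                return x_new
            x = x_new
        raise RuntimeError("iteration did not converge")

    F = lambda x: x - math.cos(x)          # we want F(x) = 0
    f = lambda x: x - F(x)                 # = cos(x); its fixed point is the root of F
    print(fixed_point(f, 0.0))             # ~0.7390851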
4.2 Iterative solution of Ax = b
Now we apply the same idea to solve Ax = b. Again, for simplicity, we assume that A is a regular (invertible) square matrix, hence there is a unique solution. We have to bring the equation into fixed-point form, and we do it by splitting the matrix as A = B + (A - B) with some cleverly chosen matrix B. The way to think about it is that B is the main part, which will be chosen to be a nice matrix, and we view A as a small perturbation (by A - B) of this nice matrix B. Clearly Ax = b is equivalent to

Bx = (B - A)x + b

or

x = B^{-1}[(B - A)x + b] = (I - B^{-1}A)x + B^{-1}b
at least if B is invertible (hence we will have to choose B such that B^{-1} is simple). The algorithm is very simple: start with an initial vector x^(0), and iteratively generate

x^(n) := (I - B^{-1}A)x^(n-1) + B^{-1}b

and hope for the best...
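As a rough, non-authoritative sketch of this recipe (NumPy assumed): form the iteration matrix I - B^{-1}A and the vector B^{-1}b once, then iterate. The small test system and the choice B = D (the diagonal of A) below are just illustrations chosen here.

    # Generic splitting iteration x_new = (I - B^{-1}A) x + B^{-1} b  (NumPy sketch).
    import numpy as np

    def splitting_iteration(A, B, b, x0=None, steps=50):
        n = A.shape[0]
        x = np.zeros(n) if x0 is None else x0.astype(float)
        M = np.eye(n) - np.linalg.solve(B, A)    # B^{-1}A via a linear solve
        c = np.linalg.solve(B, b)                # B^{-1} b
        for _ in range(steps):
            x = M @ x + c
        return x

    A = np.array([[4.0, 1.0], [2.0, 3.0]])       # small test system (arbitrary choice)
    b = np.array([6.0, 8.0])
    B = np.diag(np.diag(A))                      # "nice" main part: B = D
    print(splitting_iteration(A, B, b))          # should approach [1, 2]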
How can we predict the convergence? At least, is there some sufficient condition for convergence? Let Δx^(n) := x - x^(n) be the error after the n-th iteration. From Ax = b and Bx^(n) = (B - A)x^(n-1) + b we easily see that

B(Δx^(n)) = (B - A)(Δx^(n-1))

i.e.

Δx^(n) = (I - B^{-1}A)(Δx^(n-1)) = (I - B^{-1}A)^n (Δx^(0))

where Δx^(0) is the difference between the initial value and the solution. Hence

||Δx^(n)|| ≤ ||(I - B^{-1}A)^n|| ||Δx^(0)||    (4.1)
It is clear that if the norm of the matrix (I - B^{-1}A)^n converges to zero, then the iteration converges. Moreover, this norm gives an estimate on the speed of convergence. We need the following theorem:

Theorem 4.1 Let H be a square matrix. Then H^n → 0 if and only if all eigenvalues of H are less than one in absolute value.

Proof: The theorem is intuitively clear; the eigenvalues are responsible for the amplification of vectors. However, the rigorous proof is not trivial for general matrices, so we restrict ourselves to the symmetric case.
If H is symmetric, then by the spectral theorem it can be written as H = QDQ^t with some orthogonal matrix Q and diagonal matrix D. Clearly H^n = QD^nQ^t, and ||H^n|| = ||QD^nQ^t|| = ||D^n|| (PROVE IT(*)). Since D contains the eigenvalues of H on the diagonal, the eigenvalues of D^n are just the n-th powers of the eigenvalues of H. The biggest of them goes to zero if and only if the biggest eigenvalue of H is smaller than one in absolute value.

As a conclusion, we have

Theorem 4.2 If A and B are both regular square matrices, then the iteration Bx^(n) = (B - A)x^(n-1) + b converges to the solution if all eigenvalues of I - B^{-1}A are less than one in absolute value. The speed of convergence is exponential, with a rate given by the largest (in modulus) eigenvalue of I - B^{-1}A.

REMARK 1: The condition is not just sufficient, but also essentially necessary. But the important part is the sufficiency; it gives a criterion to decide in advance whether it is worth running the iteration or not.

REMARK 2: The good news is that if the iteration converges, then it usually converges very fast. It means that there is a constant c < 1 such that

||Δx^(n)|| ≤ c^n ||Δx^(0)||

If I - B^{-1}A is symmetric, then c can clearly be chosen as its biggest eigenvalue in modulus (WHY(*)?). The nonsymmetric case is slightly harder. It is well known that such an exponential estimate yields fast convergence even if c is very close to 1. For example, if c = 0.99, then c^1000 ≈ e^{-10} ≈ 4·10^{-5}. Also remember that it is usually very easy to implement an iterative algorithm. So this is a good algorithm whenever it works.
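In practice one can test the condition of Theorem 4.2 numerically before running the iteration. A possible sketch (NumPy assumed; the helper name predicted_rate is made up here) on the same small test system as above:

    # Check Theorem 4.2: all eigenvalues of I - B^{-1}A must be < 1 in modulus.
    import numpy as np

    def predicted_rate(A, B):
        M = np.eye(A.shape[0]) - np.linalg.solve(B, A)   # iteration matrix I - B^{-1}A
        return np.max(np.abs(np.linalg.eigvals(M)))      # largest |eigenvalue| = contraction rate

    A = np.array([[4.0, 1.0], [2.0, 3.0]])
    B = np.diag(np.diag(A))                              # Jacobi-type splitting B = D
    rho = predicted_rate(A, B)
    print(rho, "converges" if rho < 1 else "may diverge")  # ~0.408, i.e. 1/sqrt(6)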
4.2.1 Jacobi iteration
The art of iterative methods for Ax = b lies in the good decomposition. The simplest method uses the Jacobi decomposition of a square matrix A = (a_ij) as

A = L + D + U

where L is a lower triangular matrix containing all elements of A strictly below the diagonal, U is its upper triangular counterpart, and D is a diagonal matrix containing the diagonal elements of A. Let B = D and A - B = L + U be the decomposition explained in Section 4.2. Since a diagonal matrix is easy to invert, we can easily write up the formulas:

x^(n) = -D^{-1}(L + U)x^(n-1) + D^{-1}b

or in coordinates

x_i^(n) = (1/a_ii) ( b_i - sum_{j≠i} a_ij x_j^(n-1) )    (4.2)
The summation is taken over all j except j = i. The initial vector x^(0) can be chosen to be 0 unless we have a better a priori guess for the true solution.

Does this work? One could use Theorem 4.2 to check the eigenvalues of -D^{-1}(L + U), but this may not be so easy. The following theorem is not optimal, but it gives a very easily verifiable condition for the convergence of the Jacobi iteration:

Theorem 4.3 Suppose that A is diagonally dominant, that is,

|a_ii| > sum_{j: j≠i} |a_ij|    (4.3)

for every i. Then the Jacobi iteration converges from any initial vector x^(0).
We do not give the rigorous proof of this theorem. We just remark that condition (4.3) says that in every row the diagonal entry is bigger (in absolute value) than the sum of all the other entries (in absolute value) in the row. In other words, the matrix is close to a diagonal matrix in the sense that the matrix L + U containing the off-diagonal terms is smaller than the diagonal matrix D.

The speed of convergence of the Jacobi iteration is given by the largest (in modulus) eigenvalue of -D^{-1}(L + U). This follows directly from Theorem 4.2.

EXERCISE: Consider the matrix

A = ( 1  2 )
    ( 0  1 )

Show that it does not satisfy the condition of Theorem 4.3, but the Jacobi iteration still converges. (Hint: Use Theorem 4.2.)

The condition (4.3) is quite restrictive, especially for big matrices. However, it can be useful for sparse matrices, especially for matrices with special structure. When partial differential equations are solved by discretization, the resulting big matrix is usually almost diagonal in the sense that the only nonzero entries are one or two steps away from the diagonal; e.g., in the one-step case a_ij = 0 if |i - j| > 1 (such a matrix is also called tridiagonal). In this case the condition (4.3) is not so restrictive, since it compares the diagonal entry a_ii with only two other nonzero entries, a_{i,i-1} and a_{i,i+1}.

Problem 4.4 Find the solution to

( 4  1 ) x = ( 6 )
( 2  3 )     ( 8 )
first with Gaussian elimination, then with Jacobi iteration up to two digits (10^{-2} precision) starting from the zero vector. Give an estimate on the number of steps needed to reach 8-digit precision.

SOLUTION: The true solution is x = (1, 2)^t, either from Gaussian elimination or just by looking at it.
The matrix is diagonally dominant (4 > 1 and 3 > 2), hence by Theorem 4.3 the algorithm will converge from any initial vector. Let x^(0) = (0, 0)^t and compute according to the Jacobi formula:

x_1^(n) = (1/4)(6 - 1·x_2^(n-1))
x_2^(n) = (1/3)(8 - 2·x_1^(n-1))

(BE CAREFUL WITH THE INDICES)
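The hand computation below can be cross-checked with a few lines of code (a Python sketch, not part of the original notes); small deviations from the numbers below come from rounding the intermediate values to three digits.

    # Cross-check of the Jacobi iterates for the 2x2 system above (plain Python).
    # The tuple assignment makes BOTH updates use the old values, exactly as the
    # Jacobi formula requires ("be careful with the indices").
    x1, x2 = 0.0, 0.0
    for n in range(1, 8):
        x1, x2 = (6 - 1 * x2) / 4, (8 - 2 * x1) / 3
        print(n, round(x1, 4), round(x2, 4))
    # step 1: 1.5, 2.6667   step 2: 0.8333, 1.6667   ...   step 7: ~1.0023, ~2.0031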
We compute each number up to three digits to detect the stabilization of the second digit:

x_1^(1) = (1/4)(6 - 1·0) = 3/2 = 1.5
x_2^(1) = (1/3)(8 - 2·0) = 8/3 ≈ 2.67

The next iteration gives

x_1^(2) = (1/4)(6 - 1·(8/3)) = 5/6 ≈ 0.833
x_2^(2) = (1/3)(8 - 2·(3/2)) = 5/3 ≈ 1.66

Next:

x_1^(3) = (1/4)(6 - 1·(5/3)) = 13/12 ≈ 1.08
x_2^(3) = (1/3)(8 - 2·(5/6)) = 19/9 ≈ 2.11

Next:

x_1^(4) = (1/4)(6 - 1·(19/9)) = 35/36 ≈ 0.97
x_2^(4) = (1/3)(8 - 2·(13/12)) = 35/18 ≈ 1.94

Next:

x_1^(5) = (1/4)(6 - 1·1.94) ≈ 1.01
x_2^(5) = (1/3)(8 - 2·0.97) ≈ 2.02

Next:

x_1^(6) = (1/4)(6 - 1·2.02) ≈ 0.995
x_2^(6) = (1/3)(8 - 2·1.01) ≈ 1.99

and

x_1^(7) = (1/4)(6 - 1·1.99) ≈ 1.0025
x_2^(7) = (1/3)(8 - 2·0.995) ≈ 2.003
Since we see that the error in the successive steps, ||x^(6) - x^(7)||, is less than 1%, we can stop. There are many different rules for stopping the algorithm; they usually differ from each other by a few steps. The important thing is that the true solution is unknown, so the true error, ||x^(n) - x_true||, must be estimated by the error in the successive steps, i.e. ||x^(n) - x^(n+1)||; usually this is reliable.

To give an estimate on the number of steps needed to get 10^{-8} precision, we have to compute the largest eigenvalue of

M := -D^{-1}(L + U) = - ( 4  0 )^{-1} ( 0  1 ) = (  0    -1/4 )
                        ( 0  3 )      ( 2  0 )   ( -2/3   0   )

The eigenvalues are λ = ±1/√6, hence the speed of convergence is determined by 1/√6 (you have to choose the one bigger in absolute value, but both eigenvalues have the same absolute value here). This means that the absolute error gets diminished roughly (not exactly, since this is not the norm) by a factor of 1/√6 after every step:

||x^(n) - x_true|| ≈ (1/√6)^n ||x^(0) - x_true||
Here ||x^(0) - x_true|| is of order one (this is the typical case); you can even forget about its exact value and replace it by 1. So to get 10^{-8} precision you have to solve

(1/√6)^n ≤ 10^{-8}

i.e. n ≥ 8 ln 10 / ln √6 ≈ 20.56. As a rule of thumb you add 2-3 more steps to fix the order-one type argument (one can make it more precise), so to get 10^{-8} precision one needs approximately 22-23 steps.

4.2.2 Gauss-Seidel iteration
There are several other (a bit more refined) iterative methods, relying on slightly different decompositions of A. The Gauss-Seidel method uses B = L + D, and the n-th step of the iteration is

x^(n) = (L + D)^{-1}(b - Ux^(n-1))

or (CHECK that this is equivalent)

x^(n) = D^{-1}(b - Lx^(n) - Ux^(n-1))

or, in coordinates,

x_i^(n) = (1/a_ii) ( b_i - sum_{j<i} a_ij x_j^(n) - sum_{j>i} a_ij x_j^(n-1) )

(CHECK(*) that this is really the correct formula). This is very similar to the Jacobi iteration (see (4.2)), but in many cases it works better. Roughly speaking, the Jacobi method considers A = L + D + U as a perturbation of its diagonal D, while Gauss-Seidel considers A = L + D + U as a perturbation of L + D. It turns out that Gauss-Seidel is also easier to implement on a computer (WHY?). From a theoretical point of view we have the analogue of Theorem 4.3:
Theorem 4.5 The Gauss-Seidel iteration converges for diagonally dominant matrices, i.e. for matrices satisfying (4.3).
The speed of convergence of the Gauss-Seidel iteration is given by the largest (in modulus) eigenvalue of -(L + D)^{-1}U. This follows directly from Theorem 4.2.
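A possible coordinate-wise implementation of the Gauss-Seidel sweep, assuming NumPy (the function name and the fixed number of sweeps are choices made here, not part of the notes):

    # Gauss-Seidel sweeps in coordinates (NumPy sketch).  The only change from
    # Jacobi is that entries updated earlier in the SAME sweep are reused
    # immediately (the x_j^(n) terms with j < i).
    import numpy as np

    def gauss_seidel(A, b, x0=None, sweeps=25):
        n = len(b)
        x = np.zeros(n) if x0 is None else x0.astype(float)
        for _ in range(sweeps):
            for i in range(n):
                s = A[i, :i] @ x[:i] + A[i, i+1:] @ x[i+1:]   # new x[:i], old x[i+1:]
                x[i] = (b[i] - s) / A[i, i]
        return x

    A = np.array([[4.0, 1.0], [2.0, 3.0]])
    b = np.array([6.0, 8.0])
    print(gauss_seidel(A, b))   # should be close to [1, 2]

Note that the sweep overwrites x in place, so only one vector needs to be stored, whereas Jacobi needs to keep the old vector as well; this is presumably the point of the "(WHY?)" remark above.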
Problem 4.6 Consider the same problem as in the previous section, i.e. find the solution to

( 4  1 ) x = ( 6 )
( 2  3 )     ( 8 )
but now with Gauss-Seidel iteration up to two digits (10^{-2} precision) starting from the zero vector. Give an estimate on the number of steps needed to reach 8-digit precision.

SOLUTION: We just follow the formula

x_1^(n) = (1/4)(6 - 1·x_2^(n-1))
x_2^(n) = (1/3)(8 - 2·x_1^(n))

(Notice the deviation from Jacobi: in the second line you use x_1^(n) instead of x_1^(n-1)!)

We generate the iteration:

x_1^(1) = (1/4)(6 - 1·0) = 3/2 = 1.5
x_2^(1) = (1/3)(8 - 2·(3/2)) = 5/3 ≈ 1.66

Next:

x_1^(2) = (1/4)(6 - 1·(5/3)) = 13/12 ≈ 1.08
x_2^(2) = (1/3)(8 - 2·(13/12)) = 35/18 ≈ 1.94
Next:

x_1^(3) = (1/4)(6 - 1·1.94) ≈ 1.01
x_2^(3) = (1/3)(8 - 2·1.01) ≈ 1.99

Next:

x_1^(4) = (1/4)(6 - 1·1.99) ≈ 1.0025
x_2^(4) = (1/3)(8 - 2·1.0025) ≈ 1.998

and we can stop. Notice that the iteration was faster (around twice as fast) than Jacobi.

To estimate the number of steps needed for higher precision, we need to find the eigenvalues of

M := -(L + D)^{-1}U = - ( 4  0 )^{-1} ( 0  1 ) = ( 0  -1/4 )
                        ( 2  3 )      ( 0  0 )   ( 0   1/6 )

which are 0 and 1/6, hence the biggest absolute value is 1/6. To compute the number of steps we solve

(1/6)^n ≤ 10^{-8}

i.e. n ≥ 8 ln 10 / ln 6 ≈ 10.28, hence after 12-13 steps we reach 10^{-8} precision. Notice that the relevant eigenvalue of the Gauss-Seidel algorithm is the square of the corresponding eigenvalue of the Jacobi algorithm, i.e., the iteration is twice as fast. This is not true for general matrices, but it is true for tridiagonal ones.

There are several improvements of the Gauss-Seidel method; the most useful one is called Gauss-Seidel with relaxation. In general, relaxation is a method to avoid overshooting. Very intuitively: suppose that you run an iterative method in one dimension, and the sequence of iteratively computed numbers x^(n) is used to approximate the true solution x. The nicest situation is when you approach x monotonically, from one direction: x^(1) ≤ x^(2) ≤ x^(3) ≤ ... ≤ x.
Numerically this situation is the most stable. But it can happen that you approach the
solution in an alternating way; e.g., x^(1) < x, then x^(2) > x, then again x^(3) < x, etc., and hopefully |x - x^(n)| still decreases to zero. Such a scheme carries the potential danger of overshooting: x^(2) is on the other side of x and it could be farther away from x than x^(1). In this case it looks wise to replace x^(2), for example, by the average (x^(1) + x^(2))/2 (or some weighted average), since this is clearly closer to x. Such a procedure is called relaxation, and this idea has been carefully implemented in particular for the Gauss-Seidel iteration.

It is worth comparing the Jacobi and Gauss-Seidel iterations; this is the goal of one of the computer projects. Based upon a test on random matrices, it turns out that

(i) Diagonal dominance is a restrictive sufficient condition: these iterative methods converge for many more matrices.

(ii) In general one cannot directly compare the two methods. However, for tridiagonal matrices Jacobi and Gauss-Seidel converge or diverge simultaneously, and when they converge, Gauss-Seidel is around twice as fast. This can also be proven rigorously.

Here are some results of a program by Nolan Leaky.

Solving 100 random 3 by 3 general systems up to precision 10^{-3}:
Number of diagonally dominant matrices: 1
Number of matrices for which Jacobi converged: 12 (average number of steps: 62.57)
Number of matrices for which Gauss-Seidel converged: 22 (average number of steps: 30.66)

Solving 100 random 4 by 4 general systems up to precision 10^{-3}:
Number of diagonally dominant matrices: 0
Number of matrices for which Jacobi converged: 2 (average number of steps: 124)
Number of matrices for which Gauss-Seidel converged: 6 (average number of steps: 45.66)

As you can expect, the situation for tridiagonal matrices is much better:

Solving 100 random 3 by 3 tridiagonal systems up to precision 10^{-3}:
Number of diagonally dominant matrices: 5
Number of matrices for which Jacobi converged: 30 (average number of steps: 49.4)
Number of matrices for which Gauss-Seidel converged: 30 (average number of steps: 24.4)

Solving 100 random 4 by 4 tridiagonal systems up to precision 10^{-3}:
Number of diagonally dominant matrices: 1
Number of matrices for which Jacobi converged: 14 (average number of steps: 63.13)
Number of matrices for which Gauss-Seidel converged: 14 (average number of steps: 33.9)
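An experiment of this kind is easy to reproduce. Below is a rough sketch of such a test (Python/NumPy assumed; this is not Nolan Leaky's actual program, and the tolerance, iteration cap and random-matrix model are guesses made here), counting for how many random systems each method stabilizes and in how many sweeps.

    # Rough sketch of a Jacobi vs. Gauss-Seidel comparison on random systems.
    import numpy as np
    np.seterr(all="ignore")                         # divergent runs overflow; let them fail the test

    def run(A, b, method, tol=1e-3, max_steps=500):
        n = len(b)
        x = np.zeros(n)
        for k in range(1, max_steps + 1):
            x_old = x.copy()
            for i in range(n):
                if method == "jacobi":
                    s = A[i] @ x_old - A[i, i] * x_old[i]   # uses only the old sweep
                else:
                    s = A[i] @ x - A[i, i] * x[i]           # gauss-seidel: reuse fresh entries
                x[i] = (b[i] - s) / A[i, i]
            if np.max(np.abs(x - x_old)) < tol:
                return k                                     # number of sweeps until stabilization
        return None                                          # did not converge within the cap

    rng = np.random.default_rng(0)
    counts = {"jacobi": [], "gauss-seidel": []}
    for _ in range(100):
        A = rng.standard_normal((3, 3))
        b = rng.standard_normal(3)
        if np.any(np.abs(np.diag(A)) < 1e-8):
            continue                                         # skip (near-)zero diagonal entries
        for m in counts:
            k = run(A, b, m)
            if k is not None:
                counts[m].append(k)

    for m, ks in counts.items():
        print(m, "converged for", len(ks), "matrices; average sweeps:",
              round(sum(ks) / len(ks), 1) if ks else "-")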