1. Numerical analysis
Numerical analysis is the branch of mathematics that studies and develops algorithms using numerical approximation for the problems of mathematical analysis (continuous mathematics). Numerical techniques are widely used by scientists and engineers to solve their problems. A major advantage of a numerical technique is that a numerical answer can be obtained even when a problem has no analytical solution. The result from numerical analysis is, in general, an approximation, which can however be made as accurate as desired; for example, we can find the approximate value of √2 to any required accuracy.
In this chapter, we introduce and discuss some basic concepts of scientific computing. We begin with a discussion of floating-point representation, and then we discuss the most fundamental source of imperfection in numerical computing, namely roundoff errors. We also discuss sources of error and the stability of numerical algorithms.
Definition 3.1 (Normal form). A non-zero floating-point number is in normal form if the value of the mantissa lies in [1/β, 1), where β is the base.
Therefore, we normalize the representation by requiring a1 ≠ 0. Not only is the precision limited to a finite number of digits, but the range of the exponent is also restricted: there are integers m and M such that m ≤ e ≤ M.
3.1. Rounding and chopping. Let x be any real number and fl(x) be its machine approximation. There are two ways to do the cutting to store a real number
x = (0.a1 a2 . . . an a_{n+1} . . .) × β^e, a1 ≠ 0.
(1) Chopping: we ignore the digits after an and write the number as
fl(x) = (0.a1 a2 . . . an) × β^e.
(2) Rounding: rounding is defined as
fl(x) = (0.a1 a2 . . . an) × β^e, if 0 ≤ a_{n+1} < β/2 (rounding down),
fl(x) = [(0.a1 a2 . . . an) + (0.00 . . . 01)] × β^e, if β/2 ≤ a_{n+1} < β (rounding up).
Example 1.
fl(6/7) = 0.86 × 10^0 (rounding), 0.85 × 10^0 (chopping).
Rules for rounding off numbers:
(1) If the digit to be dropped is greater than 5, the last retained digit is increased by one. For example,
12.6 is rounded to 13.
(2) If the digit to be dropped is less than 5, the last remaining digit is left as it is. For example,
12.4 is rounded to 12.
(3) If the digit to be dropped is 5, and if any digit following it is not zero, the last remaining digit is
increased by one. For example,
12.51 is rounded to 13.
(4) If the digit to be dropped is 5 and is followed only by zeros, the last remaining digit is increased
by one if it is odd, but left as it is if even. For example,
11.5 is rounded to 12, and 12.5 is rounded to 12.
Definition 3.2 (Absolute and relative error). If fl(x) is the approximation to the exact value x, then the absolute error is |x − fl(x)|, and the relative error is |x − fl(x)|/|x|.
Remark: As a measure of accuracy, the absolute error may be misleading and the relative error is more
meaningful.
Definition 3.3 (Overflow and underflow). An overflow occurs when a number is too large to fit into the floating-point system in use, i.e., e > M. An underflow occurs when a number is too small, i.e., e < m. When overflow occurs in the course of a calculation, this is generally fatal. But underflow is non-fatal: the system usually sets the number to 0 and continues. (Matlab does this, quietly.)
In the rounding-down case,
|x − fl(x)| = β^e Σ_{i=n+1}^∞ a_i β^{−i}
= β^e ( a_{n+1} β^{−(n+1)} + Σ_{i=n+2}^∞ a_i β^{−i} )
≤ β^e (a_{n+1} + 1) β^{−(n+1)}.
Since a_{n+1} < β/2, therefore
|x − fl(x)| ≤ β^e (β/2) β^{−(n+1)} = (1/2) β^{e−n}.
Therefore, for both cases,
|x − fl(x)| ≤ (1/2) β^{e−n}.
Now, since |x| ≥ (0.1)β × β^e = β^{e−1},
|x − fl(x)|/|x| ≤ (1/2) β^{e−n} · β^{1−e} = (1/2) β^{1−n}.
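This bound can be checked numerically. A minimal Python sketch (IEEE double precision corresponds to β = 2 and n = 53, so the bound is the unit roundoff u = 2^{−53}; the variable names are illustrative):

import sys
from fractions import Fraction

u = 2.0**-53                            # (1/2)*beta^(1-n) with beta = 2, n = 53
print(u == sys.float_info.epsilon / 2)  # True: Python reports eps = 2**-52

x = 0.1                                 # 0.1 is not exactly representable in base 2
# Fraction(x) recovers the exact binary value fl(0.1) that is stored,
# so the relative error |x - fl(x)|/|x| can be computed exactly.
rel_err = abs(Fraction(1, 10) - Fraction(x)) / Fraction(1, 10)
print(float(rel_err) <= u)              # True: the error respects the bound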
5. Significant Figures
All measurements are approximations. No measuring device can give perfect measurements without
experimental uncertainty. By convention, a mass measured to 13.2 g is said to have an absolute
uncertainty of plus or minus 0.1 g and is said to have been measured to the nearest 0.1 g. In other words, we are somewhat uncertain about that last digit; it could be a 2, but it could also be a 1 or a 3. A mass of 13.20 g indicates an absolute uncertainty of plus or minus 0.01 g.
The number of significant figures in a result is simply the number of figures that are known with some
degree of reliability.
The number 25.4 is said to have 3 significant figures. The number 25.40 is said to have 4 significant figures.
Rules for deciding the number of significant figures in a measured quantity:
(1) All nonzero digits are significant:
1.234 has 4 significant figures, 1.2 has 2 significant figures.
(2) Zeros between nonzero digits are significant: 1002 has 4 significant figures.
(3) Leading zeros to the left of the first nonzero digits are not significant; such zeros merely indicate
the position of the decimal point: 0.001 has only 1 significant figure.
(4) Trailing zeros that are also to the right of a decimal point in a number are significant: 0.0230 has
3 significant figures.
(5) When a number ends in zeros that are not to the right of a decimal point, the zeros are not neces-
sarily significant: 190 may be 2 or 3 significant figures, 50600 may be 3, 4, or 5 significant figures.
The potential ambiguity in the last rule can be avoided by the use of standard exponential, or scientific, notation. For example, depending on whether the number of significant figures is 3, 4, or 5, we would write 50600 calories as:
0.506 × 10^5 calories (3 significant figures),
0.5060 × 10^5 calories (4 significant figures), or
0.50600 × 10^5 calories (5 significant figures).
What is an exact number? Some numbers are exact because they are known with complete certainty.
Most exact numbers are integers: exactly 12 inches are in a foot, there might be exactly 23 students in
a class. Exact numbers are often found as conversion factors or as counts of objects. Exact numbers
can be considered to have an infinite number of significant figures. Thus, the number of apparent
significant figures in any exact number can be ignored as a limiting factor in determining the number
of significant figures in the result of a calculation.
which is the maximum relative error. Therefore it shows that when the given numbers are added, the magnitude of the absolute error in the result is the sum of the magnitudes of the absolute errors in those numbers.
Error in subtraction of numbers. As in the case of addition, we obtain for the maximum absolute error in subtraction of numbers
|ΔX| ≤ |Δx1| + |Δx2|.
Also
|ΔX/X| ≤ |Δx1/X| + |Δx2/X|,
which is the maximum relative error.
For example, if the area A = πr² of a circle is measured with a relative error of 0.5%, then
ΔA = (0.5/100) A = (1/200) πr²,
and the percentage error in r is
(Δr/r) × 100 = (100/2)(ΔA/A) = 100/(2 · 200) · (A/A) × 100/100 = 0.25.
Example 7. Find the relative error in the calculation of 7.342/0.241, where the numbers 7.342 and 0.241 are correct to three decimal places. Determine the smallest interval in which the true result lies.
Sol. Let x1/x2 = 7.342/0.241 = 30.4647.
Here the errors are Δx1 = Δx2 = (1/2) × 10^{−3} = 0.0005.
Therefore the relative error
Er ≤ 0.0005/7.342 + 0.0005/0.241 = 0.0021.
Absolute error
Ea ≤ 0.0021 × (7.342/0.241) = 0.0639.
Hence the true value of 7.342/0.241 lies between 30.4647 − 0.0639 = 30.4008 and 30.4647 + 0.0639 = 30.5286.
7. Loss of significance, stability and conditioning
Roundoff errors are inevitable and difficult to control. Other types of errors which occur in com-
putation may be under our control. The subject of numerical analysis is largely preoccupied with
understanding and controlling errors of various kinds. Here we examine some of them.
7.1. Loss of significance. One of the most common error-producing calculations involves the cancellation of significant digits due to the subtraction of nearly equal numbers (or the addition of one very large number and one very small number). The phenomenon can be illustrated with the following example.
Example 8. If x = 0.3721478693 and y = 0.3720230572, what is the relative error in the computation of x − y using five decimal digits of accuracy?
Sol. We can compute x − y with ten decimal digits of accuracy and take it as exact:
x − y = 0.0001248121.
Both x and y will be rounded to five digits before subtraction. Thus
fl(x) = 0.37215, fl(y) = 0.37202,
fl(x) − fl(y) = 0.13000 × 10^{−3}.
The relative error, therefore, is
Er = |(x − y) − (fl(x) − fl(y))| / |x − y| ≈ 0.04 = 4%.
Example 9. Consider the stability of √(x+1) − 1 when x is near 0. Rewrite the expression to rid it of subtractive cancellation.
Sol. Suppose that x = 1.2345678 × 10^{−5}. Then √(x+1) ≈ 1.000006173. If our computer (or calculator) can only keep 8 significant digits, this will be rounded to 1.0000062. When 1 is subtracted, the result is 6.2 × 10^{−6}.
Thus 6 significant digits have been lost from the original. To fix this, we rationalize the expression:
√(x+1) − 1 = (√(x+1) − 1) · (√(x+1) + 1)/(√(x+1) + 1) = x/(√(x+1) + 1).
This expression has no subtractions, and so is not subject to subtractive cancellation. When x = 1.2345678 × 10^{−5}, it evaluates approximately as
1.2345678 × 10^{−5} / 2.0000062 = 6.17281995 × 10^{−6}.
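The loss of digits can be simulated in Python by rounding every intermediate result to 8 significant digits, as in the discussion above (round_sig is an illustrative helper, not a library function):

import math

def round_sig(v, digits=8):
    # crude simulation of a calculator keeping 8 significant digits
    return float(f"{v:.{digits - 1}e}")

x = 1.2345678e-5
s = round_sig(math.sqrt(x + 1.0))           # 1.0000062 after rounding
naive = s - 1.0                             # 6.2e-06: only 2 significant digits survive
stable = round_sig(x / round_sig(s + 1.0))  # about 6.17282e-06, as in the text
print(naive, stable)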
A related example is the quadratic formula when b² ≫ 4ac: one of the two roots then requires the subtraction of nearly equal numbers, while the other involves only the addition of nearly equal numbers, which does not cause serious loss of significant figures. To obtain a more accurate 4-digit rounding approximation for x1, we change the formulation by rationalizing the numerator, that is,
x1 = −2c / (b + √(b² − 4ac)).
Then
fl(x1) = −2.000/(62.10 + 62.06) = −2.000/124.2 = −0.01610.
The relative error in computing x1 is now reduced to 0.62 × 10^{−3}. However, if we rationalize the numerator in x2 to get
x2 = −2c / (b − √(b² − 4ac)),
the use of this formula not only involves the subtraction of two nearly equal numbers but also division by a small number. This degrades the accuracy:
fl(x2) = −2.000/(62.10 − 62.06) = −2.000/0.04000 = −50.00.
The relative error in x2 becomes 0.19.
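A sketch of a cancellation-free quadratic solver along these lines (the numbers above are consistent with a quadratic such as x² + 62.10x + 1 = 0, i.e., a = c = 1 and b = 62.10; that identification is an assumption of this sketch):

import math

def quadratic_roots(a, b, c):
    # Use the quadratic formula only for the root in which b and the
    # square root add; recover the other root from x1*x2 = c/a, so no
    # subtraction of nearly equal numbers ever occurs.
    d = math.sqrt(b * b - 4.0 * a * c)
    x1 = (-b - d) / (2.0 * a) if b >= 0 else (-b + d) / (2.0 * a)
    x2 = c / (a * x1)
    return x1, x2

print(quadratic_roots(1.0, 62.10, 1.0))  # approx (-62.0839, -0.0161072)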
Example 12. How to evaluate y = x − sin x, when x is small.
Sol. Since sin x ≈ x when x is small, direct evaluation will cause loss of significant figures. Alternatively, if we use the Taylor series for sin x, we obtain
y = x − (x − x³/3! + x⁵/5! − x⁷/7! + . . .)
= x³/6 − x⁵/(6 · 20) + x⁷/(6 · 20 · 42) − . . .
= (x³/6) (1 − (x²/20)(1 − (x²/42)(1 − (x²/72)(. . .)))),
which involves no cancellation.
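A short Python sketch comparing direct evaluation with the truncated Taylor series (the number of terms kept is an illustrative choice):

import math

def x_minus_sin_naive(x):
    return x - math.sin(x)                   # cancellation for small x

def x_minus_sin_series(x, terms=5):
    # y = x^3/3! - x^5/5! + x^7/7! - ..., each term obtained from the
    # previous one by multiplying with -x^2/((k+1)(k+2))
    total, term, k = 0.0, x**3 / 6.0, 3
    for _ in range(terms):
        total += term
        term *= -x * x / ((k + 1) * (k + 2))
        k += 2
    return total

x = 1e-4
print(x_minus_sin_naive(x))   # about half of its digits are unreliable
print(x_minus_sin_series(x))  # 1.6666666...e-13, essentially x**3/6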
7.2. Conditioning. The words condition and conditioning are used to indicate how sensitive the
solution of a problem may be to small changes in the input data. A problem is ill-conditioned if
small changes in the data can produce large changes in the results. For certain types of problems, a
condition number can be defined. If that number is large, it indicates an ill-conditioned problem. In
contrast, if the number is modest, the problem is recognized as a well-conditioned problem.
For evaluating a function f at a point x, the condition number can be calculated as K(x) = |x f′(x)/f(x)|, the factor by which a relative perturbation of x is magnified in f(x).
7.3. Stability of an algorithm. Another theme that occurs repeatedly in numerical analysis is the distinction between numerical algorithms that are stable and those that are not. Informally speaking, a numerical process is unstable if small errors made at one stage of the process are magnified and propagated in subsequent stages and seriously degrade the accuracy of the overall calculation.
An algorithm can be thought of as a sequence of problems, i.e., a sequence of function evaluations. In this case we consider the algorithm for evaluating f(x) to consist of the evaluation of the sequence x1, x2, . . . , xn. We are concerned with the condition of each of the functions f1(x1), f2(x2), . . . , f_{n−1}(x_{n−1}), where f(x) = f_i(x_i) for all i. An algorithm is unstable if any f_i is ill-conditioned, i.e., if any f_i(x_i) has condition much worse than f(x). Consider the example
f(x) = √(x+1) − √x,
so that there is potential loss of significance when x is large. Taking x = 12345 as an example, one possible algorithm is
x0 := x = 12345
x1 := x0 + 1
x2 := √x1
x3 := √x0
f(x) := x4 := x2 − x3.
The loss of significance occurs with the final subtraction. We can rewrite the last step in the form f3(x3) = x2 − x3 to show how the final answer depends on x3. As f3′(x3) = −1, we have the condition
K(x3) = |x3 f3′(x3)/f3(x3)| = |x3/(x2 − x3)|,
from which we find K(x3) ≈ 2.2 × 10^4 when x = 12345. Note that this is the condition of a subproblem arrived at during the algorithm. To find an alternative algorithm we write
f(x) = (√(x+1) − √x) · (√(x+1) + √x)/(√(x+1) + √x) = 1/(√(x+1) + √x).
This suggests the algorithm
x0 := x = 12345
x1 := x0 + 1
x2 := √x1
x3 := √x0
x4 := x2 + x3
f(x) := x5 := 1/x4.
In this case f3(x3) = 1/(x2 + x3), giving a condition for the subproblem of
K(x3) = |x3 f3′(x3)/f3(x3)| = x3/(x2 + x3),
which is approximately 0.5 when x = 12345, and indeed in any case where x is much larger than 1. Thus the first algorithm is unstable and the second is stable for large values of x. In general such analyses are not usually so straightforward but, in principle, stability can be analysed by examining the condition of a sequence of subproblems.
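The two algorithms can be compared directly; a minimal sketch in Python:

import math

def f_unstable(x):
    # algorithm 1: ends with the subtraction of nearly equal square roots
    return math.sqrt(x + 1.0) - math.sqrt(x)

def f_stable(x):
    # algorithm 2: rationalized form, no subtraction at all
    return 1.0 / (math.sqrt(x + 1.0) + math.sqrt(x))

print(f_unstable(12345.0), f_stable(12345.0))  # agree to many digits here
print(f_unstable(1e15), f_stable(1e15))        # the unstable version has
                                               # lost most of its digits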
Exercises
(1) Determine the number of significant digits in the following numbers: 123, 0.124, 0.0045,
0.004300, 20.0045, 17001, 170.00, and 1800.
(2) Find the absolute, percentage, and relative errors if x = 0.005998 is rounded-off to three decimal
digits.
(3) Round-off the following numbers correct to four significant figures: 58.3643, 979.267, 7.7265,
56.395, 0.065738 and 7326853000.
(4) The following numbers are given in a decimal computer with a four-digit normalized mantissa:
A = 0.4523e−4, B = 0.2115e−3, and C = 0.2583e1.
Perform the following operations, and indicate the error in the result, assuming symmetric rounding:
(i) A + B + C (ii) A − B (iii) A/C (iv) AB/C.
(5) Assume a 3-digit mantissa with rounding:
(i) Evaluate y = x³ − 3x² + 4x + 0.21 for x = 2.73.
(ii) Evaluate y = [(x − 3)x + 4]x + 0.21 for x = 2.73.
Compare and discuss the errors obtained in parts (i) and (ii).
(6) Associativity does not necessarily hold for floating-point addition (or multiplication).
Let a = 0.8567 × 10^0, b = 0.1325 × 10^4, c = −0.1325 × 10^4; then a + (b + c) = 0.8567 × 10^0, and (a + b) + c = 0.1000 × 10^1.
The two answers are NOT the same! Show the calculations.
(7) Calculate the sum of √3, √5, and √7 to four significant digits and find its absolute and relative errors.
(8) Rewrite e^x − cos x to be stable when x is near 0.
(9) Find the smaller root of the equation
x² − 400x + 1 = 0
using four-digit rounding arithmetic.
(10) Discuss the condition number of the polynomial function f(x) = 2x² + x − 1.
(11) Suppose that a function ln is available to compute the natural logarithm of its argument. Consider the calculation of ln(1 + x), for small x, by the following algorithm:
x0 := x
x1 := 1 + x0
f(x) := x2 := ln(x1)
By considering the condition K(x1) of the subproblem of evaluating ln(x1), show that such a function ln is inadequate for calculating ln(1 + x) accurately.
Bibliography
[Atkinson] K. Atkinson and W. Han, Elementary Numerical Analysis, Third edition, John Wiley & Sons, 2004.
[Conte] Samuel D. Conte and Carl de Boor, Elementary Numerical Analysis: An Algorithmic Approach, Third edition, McGraw-Hill, New York, 1980.
CHAPTER 2 (6 LECTURES)
ROOTS OF NON-LINEAR EQUATIONS
1. Introduction
Finding one or more roots of the equation
f (x) = 0
is one of the more commonly occurring problems of applied mathematics. In most cases explicit
solutions are not available and we must be satisfied with being able to find a root to any specified
degree of accuracy. The numerical procedures for finding the roots are called iterative methods.
Definition 1.1 (Simple and multiple root). A root having multiplicity one is called a simple root. For example, f(x) = (x − 1)(x − 2) has a simple root at x = 1 and at x = 2, but g(x) = (x − 1)² has a root of multiplicity 2 at x = 1, which is therefore not a simple root.
A root with multiplicity m ≥ 2 is called a multiple (or repeated) root. For example, in the equation (x − 1)² = 0, x = 1 is a multiple (double) root.
If a polynomial has a multiple root, its derivative also shares that root.
Let α be a root of the equation f(x) = 0, and imagine writing it in the factored form
f(x) = (x − α)^m φ(x)
with some integer m ≥ 1 and some continuous function φ(x) for which φ(α) ≠ 0. Then we say that α is a root of f(x) of multiplicity m.
Definition 1.2 (Convergence). A sequence {xn} is said to converge to a point α with order p if there exists a constant c such that
lim_{n→∞} |x_{n+1} − α| / |x_n − α|^p = c.
End if.
Until |a − b| ≤ ε (tolerance value).
Print root as c.
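A runnable Python sketch of the bisection algorithm whose closing steps appear above (the function name and default tolerance are illustrative):

def bisect(f, a, b, tol=1e-6, max_iter=100):
    # f must change sign on [a, b]
    if f(a) * f(b) > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(max_iter):
        c = a + (b - a) / 2.0        # midpoint, written to avoid overflow
        if f(a) * f(c) <= 0:
            b = c                    # root lies in [a, c]
        else:
            a = c                    # root lies in [c, b]
        if b - a < tol:
            break
    return a + (b - a) / 2.0

# smallest positive root of x^3 - 5x + 1 = 0 (Example 1 below)
print(bisect(lambda x: x**3 - 5*x + 1, 0.0, 1.0))  # approx 0.201640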
Example 1. Perform five iterations of the bisection method to obtain the smallest positive root of the equation x³ − 5x + 1 = 0.
Sol. We write f(x) = x³ − 5x + 1.
Since f(0) > 0 and f(1) < 0, the smallest positive root lies in the interval (0, 1).¹ Taking a0 = 0 and b0 = 1, we obtain c1 = (a0 + b0)/2 = 0.5.
Now f(c1) = −1.375, so f(a0) · f(c1) < 0.
This implies the root lies in the interval [0, 0.5].
Now we take a1 = 0 and b1 = 0.5; then c2 = (a1 + b1)/2 = 0.25,
f(c2) = −0.2344, and f(a1) · f(c2) < 0,
which implies the root lies in the interval [0, 0.25].
Applying the same procedure, we can obtain the other iterations as given in the following table. After five iterations the root lies in (0.1875, 0.21875), and we take the midpoint (0.1875 + 0.21875)/2 = 0.203125 as the root α.
2.2. Convergence analysis. Now we analyze the convergence of the iterations.
Theorem 2.1. Suppose that f ∈ C[a, b] and f(a)f(b) < 0. The bisection method generates a sequence {ck} approximating a zero α of f with linear convergence.
Proof. Let [a0, b0], [a1, b1], . . . denote the successive intervals produced by the bisection algorithm. Then
a = a0 ≤ a1 ≤ a2 ≤ · · · ≤ b0 = b,
b = b0 ≥ b1 ≥ b2 ≥ · · · ≥ a0 = a.
This implies {an} and {bn} are monotonic and bounded, and hence convergent. Since
b1 − a1 = (1/2)(b0 − a0),
b2 − a2 = (1/2)(b1 − a1) = (1/2²)(b0 − a0),
. . .
bn − an = (1/2^n)(b0 − a0),
we have
lim_{n→∞} (bn − an) = 0.
Taking the limit,
lim_{n→∞} an = lim_{n→∞} bn = α (say).
¹Choice of initial approximations: Initial approximations to the root are often known from the physical significance of the problem. Graphical methods can be used to locate a zero of f(x), and any value in the neighborhood of the root can be taken as an initial approximation.
If the given equation f(x) = 0 can be written as f1(x) = f2(x), then the point of intersection of the graphs y = f1(x) and y = f2(x) gives the root of the equation. Any value in the neighborhood of this point can be taken as an initial approximation.
Since f(an)f(bn) < 0 for every n and f is continuous, letting n → ∞ gives f(α)² ≤ 0, hence f(α) = 0; i.e., the common limit of {an} and {bn} is a zero of f in [a, b].
Let c_{n+1} = (an + bn)/2. Then
|α − c_{n+1}| = |lim_{m→∞} a_m − (an + bn)/2| ≤ |bn − (an + bn)/2| (since an ≤ α ≤ bn for all n)
= (1/2)|bn − an| = (1/2^{n+1})|b0 − a0|.
By the definition of convergence, we can say that the bisection method converges linearly with rate 1/2.
Note: 1. From the statement of the bisection algorithm, it is clear that the algorithm always converges; however, it can be very slow.
2. Computing ck: It might happen that at a certain iteration k, computation of ck = (ak + bk)/2 gives overflow. It is better to compute ck as
ck = ak + (bk − ak)/2.
Stopping criteria: Since this is an iterative method, we must determine some stopping criterion that will allow the iteration to stop. The criterion "|f(ck)| very small" can be misleading, since it is possible to have |f(ck)| very small even if ck is not close to the root.
Let us now find the minimum number of iterations N needed with the bisection method to achieve a certain desired accuracy ε. The interval length after N iterations is (b0 − a0)/2^N. So, to obtain an accuracy of ε, we must have (b0 − a0)/2^N ≤ ε. That is,
2^{−N}(b0 − a0) ≤ ε,
or
N ≥ [log(b0 − a0) − log ε] / log 2.
Note that the number N depends only on the initial interval [a0, b0] bracketing the root.
Example 2. Find the minimum number of iterations needed by the bisection algorithm to approximate the root in the interval [2.5, 4] of x³ − 6x² + 11x − 6 = 0 with error tolerance 10^{−3}.
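The worked solution is not reproduced here, but the bound above gives N ≥ [log(1.5) − log(10^{−3})]/log 2 ≈ 10.55, i.e., N = 11. A one-line check in Python:

import math

a0, b0, eps = 2.5, 4.0, 1e-3
N = math.ceil(math.log((b0 - a0) / eps) / math.log(2.0))
print(N)   # 11, since log2(1500) is about 10.55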
Algorithm (secant method):
1. Give inputs and take two initial guesses x0 and x1.
2. Start the iterations:
x2 = x1 − f1 (x1 − x0)/(f1 − f0), where fi = f(xi).
3. If |f(x2)| < ε (error tolerance), then stop and print the root.
4. Set x0 = x1 and x1 = x2, and repeat the iterations (step 2). Also check whether the number of iterations has exceeded the maximum number of iterations.
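A Python sketch of this secant algorithm (the stopping rule and iteration cap are illustrative), applied to Example 3 below with the illustrative starting guesses 0 and 1:

def secant(f, x0, x1, tol=1e-6, max_iter=50):
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)   # secant update
        if abs(f(x2)) < tol:
            return x2
        x0, f0 = x1, f1                        # shift the two points
        x1, f1 = x2, f(x2)
    raise RuntimeError("maximum number of iterations exceeded")

import math
print(secant(lambda x: math.cos(x) - x * math.exp(x), 0.0, 1.0))  # approx 0.517757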
Example 3. Apply the secant method to find the root of the equation
cos x − x e^x = 0.
2. for k = 0, 1, 2, . . . do
   if |f(xk)| is sufficiently small
   then x* = xk
   return x*
   end
3. x_{k+1} = xk − f(xk)/f′(xk)
   If |x_{k+1} − xk| is sufficiently small
   then x* = x_{k+1}
   return x*
   end
4. end (for main loop)
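A Python sketch mirroring the loop above (tolerance and iteration cap are illustrative):

def newton(f, df, x0, tol=1e-8, max_iter=50):
    xk = x0
    for _ in range(max_iter):
        if abs(f(xk)) < tol:            # |f(xk)| sufficiently small
            return xk
        x_next = xk - f(xk) / df(xk)    # Newton update
        if abs(x_next - xk) < tol:      # increment sufficiently small
            return x_next
        xk = x_next
    raise RuntimeError("maximum number of iterations exceeded")

# Example 4 below: sqrt(2) as the root of x^2 - 2 = 0
print(newton(lambda x: x*x - 2.0, lambda x: 2.0*x, 1.0))  # 1.41421356...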
Example 4. Use Newton's method to compute √2.
Sol. This number satisfies the equation f(x) = 0 where f(x) = x² − 2.
Since f′(x) = 2x, it follows that in Newton's method we obtain the next iterate from the previous iterate xk by
x_{k+1} = xk − (xk² − 2)/(2xk) = xk/2 + 1/xk.
Starting with x0 = 1, we obtain
x1 = 1/2 + 1/1 = 1.5,
x2 = 1.5/2 + 1/1.5 = 1.41666667,
x3 = 1.41421569,
x4 = 1.41421356,
x5 = 1.41421356.
Since the fourth and fifth iterates agree to eight decimal places, we assume that 1.41421356 is a correct solution to f(x) = 0, to at least eight decimal places.
Example 5. Perform four iterations of Newton's method to obtain the approximate value of 17^{1/3}, starting with x0 = 2.0.
Sol. Let x = 17^{1/3}, which implies x³ = 17.
Let f(x) = x³ − 17 = 0.
Newton approximations are given by
x_{k+1} = xk − (xk³ − 17)/(3xk²) = (2xk³ + 17)/(3xk²), k = 0, 1, 2, . . . .
Starting with x0 = 2.0, we obtain
x1 = 2.75, x2 = 2.582645, x3 = 2.571332, x4 = 2.571282, etc.
Example 7. Find, correct to 5 decimal places, the x-coordinate of the point on the curve y = ln x
which is closest to the origin. Use the Newton Method.
Sol. Let (x, ln x) be a general point on the curve, and let S(x) be the square of the distance from
(x, ln x) to the origin. Then
S(x) = x² + (ln x)².
We want to minimize the distance. This is equivalent to minimizing the square of the distance. Now
the minimization process takes the usual route. Note that S(x) is only defined when x > 0. We have
S′(x) = 2x + 2 ln x · (1/x) = (2/x)(x² + ln x).
Our problem thus comes down to solving the equation S′(x) = 0. We can use the Newton method directly on S′(x), but the calculations are more pleasant if we observe that S′(x) = 0 is equivalent to
x² + ln x = 0.
Let f(x) = x² + ln x. Then f′(x) = 2x + 1/x, and we get the recurrence relation
x_{k+1} = xk − (xk² + ln xk)/(2xk + 1/xk).
We need to find a suitable starting point x0 . Experimentation with a calculator suggests that we take
x0 = 0.65.
Then x1 = 0.6529181, and x2 = 0.65291864.
Since x1 agrees with x2 to 5 decimal places, we can perhaps decide that, to 5 places, the minimum
distance occurs at x = 0.65292.
Theorem 3.2. Let f ∈ C²[a, b]. If α is a simple root of f(x) = 0 and f′(α) ≠ 0, then Newton's method generates a sequence {xn} converging to α for any initial approximation x0 near α.
Proof (error analysis). Writing xk = α + εk, the Newton iterate is
x_{k+1} = xk − f(α + εk)/f′(α + εk), k = 1, 2, . . . .
Expanding the numerator and denominator in Taylor series about α and using f(α) = 0, the error ε_{k+1} = x_{k+1} − α satisfies
ε_{k+1} = εk − [εk f′(α)(1 + (εk/2) f″(α)/f′(α) + · · ·)] / [f′(α)(1 + εk f″(α)/f′(α) + · · ·)]
= εk − εk [1 + (εk/2) f″(α)/f′(α) + · · ·] [1 + εk f″(α)/f′(α) + · · ·]^{−1}
= εk − εk [1 + (εk/2) f″(α)/f′(α) + · · ·] [1 − εk f″(α)/f′(α) + · · ·]
= (1/2) (f″(α)/f′(α)) εk² + O(εk³)
= C εk²,
where C = f″(α)/(2 f′(α)).
This error analysis shows that the Newton method has second-order convergence.
Theorem 3.3. Let f(x) be twice continuously differentiable on the closed finite interval [a, b] and let the following conditions be satisfied:
(i) f(a) · f(b) < 0;
(ii) f′(x) ≠ 0 for all x ∈ [a, b];
(iii) either f″(x) ≥ 0 or f″(x) ≤ 0 for all x ∈ [a, b];
(iv) at the end points a, b,
|f(a)|/|f′(a)| < b − a, |f(b)|/|f′(b)| < b − a.
Then Newton's method converges to the unique solution α of f(x) = 0 in [a, b] for any choice of x0 ∈ [a, b].
Some comments about these conditions: Conditions (i) and (ii) guarantee that there is one and only one solution in [a, b]. Condition (iii) states that the graph of f(x) is either concave from above or concave from below, and furthermore, together with condition (ii), implies that f′(x) is monotone on [a, b]. Added to these, condition (iv) states that the tangent to the curve at either endpoint intersects the x-axis within the interval [a, b]. The proof of the theorem is left as an exercise for interested readers.
Example 8. Find an interval containing the smallest positive zero of f(x) = e^{−x} − sin x which satisfies the conditions of the previous theorem for convergence of Newton's method.
Sol. For f(x) = e^{−x} − sin x, we have f′(x) = −e^{−x} − cos x and f″(x) = e^{−x} + sin x.
We choose [a, b] = [0, 1]. Then since f(0) = 1 and f(1) = −0.47, we have f(a)f(b) < 0, so that condition (i) is satisfied.
Since f′(x) < 0 for x ∈ [0, 1], condition (ii) is satisfied, and since f″(x) > 0 for x ∈ [0, 1], condition (iii) is satisfied.
Finally, since f(0) = 1 and f′(0) = −2, |f(0)|/|f′(0)| = 1/2 < b − a = 1, and since f(1) = −0.47 and f′(1) = −0.90, |f(1)|/|f′(1)| = 0.52 < 1. This verifies condition (iv).
Newton's iteration will therefore converge for any choice of x0 in [0, 1].
Example 9. Find all the roots of cos x − x² − x = 0 to five decimal places.
Sol. f(x) = cos x − x² − x = 0 has two roots, one in the interval (−2, −1) and one in (0, 1). Applying the Newton method,
x_{n+1} = xn − (cos xn − xn² − xn)/(−sin xn − 2xn − 1).
Taking x0 = −1.5 for the root in the interval (−2, −1), we obtain
x1 = −1.27338985, x2 = −1.25137907, x3 = −1.25115186, x4 = −1.25114184.
Starting with x0 = 0.5, we can obtain the root in (0, 1), and the iterations are given by
x1 = 0.55145650, x2 = 0.55001049, x3 = 0.55000935.
Hence the roots correct to five decimals are −1.25115 and 0.55001.
3.5. Newton method for multiple roots. Let α be a root of f(x) = 0 with multiplicity m. In this case we can write
f(x) = (x − α)^m φ(x),
so that
f(α) = f′(α) = · · · = f^{(m−1)}(α) = 0, f^{(m)}(α) ≠ 0.
Now, with xk = α + εk,
f(xk) = f(α + εk) = (εk^m/m!) f^{(m)}(α) + (εk^{m+1}/(m+1)!) f^{(m+1)}(α) + · · ·
f′(xk) = f′(α + εk) = (εk^{m−1}/(m−1)!) f^{(m)}(α) + (εk^m/m!) f^{(m+1)}(α) + · · ·
Therefore
ε_{k+1} = εk − f(α + εk)/f′(α + εk)
= εk − [(εk^m/m!) f^{(m)}(α) (1 + (εk/(m+1)) f^{(m+1)}(α)/f^{(m)}(α) + · · ·)] / [(εk^{m−1}/(m−1)!) f^{(m)}(α) (1 + (εk/m) f^{(m+1)}(α)/f^{(m)}(α) + · · ·)]
= εk − (εk/m) [1 + (εk/(m+1)) f^{(m+1)}(α)/f^{(m)}(α) + · · ·] [1 + (εk/m) f^{(m+1)}(α)/f^{(m)}(α) + · · ·]^{−1}
= εk − (εk/m)(1 + O(εk))
= εk (1 − 1/m) + O(εk²).
This implies the method has a linear rate of convergence for multiple roots. However, when the multiplicity of the root is known in advance, we can modify the method to increase the order of convergence.
We consider
x_{k+1} = xk − e · f(xk)/f′(xk),
where e is an arbitrary constant to be determined.
If α is a multiple root with multiplicity m, then the error equation is
ε_{k+1} = εk (1 − e/m) + O(εk²).
If the method is to have a quadratic rate of convergence, then 1 − e/m = 0.
This implies e = m. Therefore, for multiple roots, the Newton method with quadratic convergence is
x_{k+1} = xk − m · f(xk)/f′(xk).
Example 10. Let f(x) = e^x − x − 1. Show that f has a zero of multiplicity 2 at x = 0. Show that Newton's method with x0 = 1 converges to this zero but not quadratically.
Sol. We have f(x) = e^x − x − 1, f′(x) = e^x − 1 and f″(x) = e^x.
Now f(0) = 1 − 0 − 1 = 0, f′(0) = 1 − 1 = 0 and f″(0) = 1 ≠ 0. Therefore f has a zero of multiplicity 2 at x = 0.
Starting with x0 = 1, the iterations x_{k+1} = xk − f(xk)/f′(xk) give x1 = 0.58198, x2 = 0.31906, x3 = 0.16800, x4 = 0.08635, x5 = 0.04380, x6 = 0.02206.
The error is roughly halved at each step, which is linear, not quadratic, convergence.
Example 11. The equation f(x) = x³ − 7x² + 16x − 12 = 0 has a double root at x = 2.0. Starting with x0 = 1, find the root correct to three decimals.
Sol. First we apply the plain Newton method; successive iterations are given by
x_{k+1} = xk − (xk³ − 7xk² + 16xk − 12)/(3xk² − 14xk + 16), k = 0, 1, 2, . . .
Starting with x0 = 1.0, we obtain
x1 = 1.4, x2 = 1.652632, x3 = 1.806484, x4 = 1.89586,
x5 = 1.945653, x6 = 1.972144, x7 = 1.985886, x8 = 1.992894,
x9 = 1.996435, x10 = 1.998214, x11 = 1.999106, x12 = 1.999553.
The root correct to 3 decimal places is x12 ≈ 2.000.
If we apply the modified Newton method with m = 2, then
x_{k+1} = xk − 2(xk³ − 7xk² + 16xk − 12)/(3xk² − 14xk + 16), k = 0, 1, 2, . . .
Starting with x0 = 1.0, we obtain
x1 = 1.8, x2 = 1.984615, x3 = 1.999884.
The root correct to 3 decimal places is 2.000, and in this case we need fewer iterations to achieve the desired accuracy.
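The comparison in Example 11 is easy to reproduce; a minimal Python sketch:

def newton_step(x):
    f = x**3 - 7*x**2 + 16*x - 12        # double root at x = 2
    df = 3*x**2 - 14*x + 16
    return f / df

x_plain = x_mod = 1.0
for k in range(3):
    x_plain -= newton_step(x_plain)      # plain Newton: linear here
    x_mod -= 2 * newton_step(x_mod)      # modified, m = 2: quadratic
    print(k + 1, x_plain, x_mod)
# prints 1.4/1.8, 1.652632/1.984615, 1.806484/1.999884 as in the example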
Example 12. Apply the Newton method with x0 = 0.8 to the equation f(x) = x³ − x² − x + 1 = 0, and verify the first order of convergence. Then apply the modified Newton method with m = 2 and verify the order of convergence.
Sol. Successive iterations in the Newton method are given by
x_{k+1} = xk − (xk³ − xk² − xk + 1)/(3xk² − 2xk − 1).
Starting with x0 = 0.8, we obtain
x1 = 0.905882, x2 = 0.954132, x3 = 0.977338, x4 = 0.988734.
Since the exact root is α = 1, the errors in the approximations are
ε0 = |α − x0| = 0.2 = 0.2 × 10^0,
ε1 = |α − x1| = 0.094118 = 0.94 × 10^{−1},
ε2 = |α − x2| = 0.045868 = 0.46 × 10^{−1},
ε3 = |α − x3| = 0.022662 = 0.22 × 10^{−1},
ε4 = |α − x4| = 0.011266 = 0.11 × 10^{−1},
which shows linear convergence (the error is almost halved in consecutive steps).
Iterations in the modified Newton method are given by
x_{k+1} = xk − 2(xk³ − xk² − xk + 1)/(3xk² − 2xk − 1).
4. Fixed-point iteration
A root of f(x) = 0 can often be found by rewriting the equation in the form x = g(x) and iterating x_{n+1} = g(xn). For example, to compute α = √3 (a root of x² − 3 = 0), consider the three fixed-point forms
(1) x_{n+1} = xn² + xn − 3, (2) x_{n+1} = 3/xn, (3) x_{n+1} = (xn + 3/xn)/2,
each starting with x0 = 2.0:
n    (1)    (2)    (3)
0    2.0    2.0    2.0
1    3.0    1.5    1.75
2    9.0    2.0    1.732147
3    87.0   1.5    1.73205
Now √3 = 1.73205, and it is clear that the third choice works, but why do the other two not work? Which approximation converges is answered by the convergence result below (which requires |g′(α)| < 1 in a neighborhood of α).
Lemma 4.1. Let g(x) be a continuous function on [a, b] and assume that a ≤ g(x) ≤ b for all x ∈ [a, b], i.e., g([a, b]) ⊆ [a, b]. Then x = g(x) has at least one solution in [a, b].
Proof. Let g be a continuous function on [a, b] and assume that a ≤ g(x) ≤ b for all x ∈ [a, b].
Now consider φ(x) = g(x) − x.
If g(a) = a or g(b) = b, then the proof is trivial. Hence we assume that a ≠ g(a) and b ≠ g(b).
Now since a ≤ g(x) ≤ b, it follows that g(a) > a and g(b) < b. Then
φ(a) = g(a) − a > 0
and
φ(b) = g(b) − b < 0.
Now φ is continuous and φ(a)φ(b) < 0; therefore by the Intermediate Value Theorem, φ has at least one zero in [a, b], i.e., there exists some α ∈ [a, b] such that g(α) = α.
Graphically, the roots are the intersection points of y = x & y = g(x) as shown in the Figure.
Theorem 4.2 (Contraction Mapping Theorem). Let g and g′ be continuous functions on [a, b], and assume that g satisfies a ≤ g(x) ≤ b for all x ∈ [a, b]. Furthermore, assume that there is a constant 0 < λ < 1 such that
λ = max_{a≤x≤b} |g′(x)|.
Then
1. x = g(x) has a unique solution α in the interval [a, b].
2. The iterates x_{n+1} = g(xn), n ≥ 0, converge to α for any choice of x0 ∈ [a, b].
3. |α − xn| ≤ (λ^n/(1 − λ)) |x1 − x0|, n ≥ 0.
4. lim_{n→∞} |α − x_{n+1}| / |α − xn| = |g′(α)|.
Thus for xn close to α, α − x_{n+1} ≈ g′(α)(α − xn).
Proof. Let g and g′ be continuous on [a, b], and assume that a ≤ g(x) ≤ b for all x ∈ [a, b]. By the previous lemma, there exists at least one solution to x = g(x). By the Mean Value Theorem, for any x, y ∈ [a, b] there exists c between them such that
g(x) − g(y) = g′(c)(x − y), hence
|g(x) − g(y)| ≤ λ|x − y|, 0 < λ < 1, for all x, y ∈ [a, b].
1. Suppose x = g(x) has two solutions, say α and β, in [a, b]; then α = g(α) and β = g(β).
Now |α − β| = |g(α) − g(β)| ≤ λ|α − β|
⟹ (1 − λ)|α − β| ≤ 0.
Since 0 < λ < 1, this forces α = β,
so x = g(x) has a unique solution in [a, b], which we call α.
2. To check the convergence of the iterates {xn}, we observe that they all remain in [a, b]: if xn ∈ [a, b], then x_{n+1} = g(xn) ∈ [a, b].
Now
|α − x_{n+1}| = |g(α) − g(xn)| = |g′(cn)| |α − xn|
for some cn between α and xn. Hence
|α − x_{n+1}| ≤ λ|α − xn| ≤ λ²|α − x_{n−1}| ≤ · · · ≤ λ^{n+1}|α − x0|.
As n → ∞, λ^n → 0, which implies xn → α.
This shows that the iterates are linearly convergent. If in addition g′(α) ≠ 0, then the formula proves that the convergence is exactly linear, with no higher order of convergence possible. In this case, the value of |g′(α)| is the linear rate of convergence.
In practice, we don't use the above result of the theorem directly. The main reason is that it is difficult to find an interval [a, b] for which the condition a ≤ g(x) ≤ b is satisfied. Therefore, we use the theorem in the following practical way.
Corollary 4.3. Let g and g′ be continuous on some interval c < x < d, with the fixed point α contained in this interval. Moreover, assume that
|g′(α)| < 1.
Then there is an interval [a, b] around α for which the hypotheses, and hence the conclusions, of the theorem are true.
On the contrary, if |g′(α)| > 1, then the iteration method x_{n+1} = g(xn) will not converge to α. When |g′(α)| = 1, no conclusion can be drawn, and even if convergence occurs, the method would be far too slow for the iteration method to be practical.
Remark 4.2. The possible behavior of the fixed-point iterates {xn} is shown in the figure for various values of g′(α). To see the convergence, consider the case of x1 = g(x0), the height of y = g(x) at x0. We bring the number x1 back to the x-axis by using the line y = x and the height y = x1. We continue this with each iterate, obtaining a stair-step behavior when g′(α) > 0. When g′(α) < 0, the iterates oscillate around the fixed point α, as can be seen in the figure. In the first figure (on top) the iterations converge monotonically, in the second they are oscillatory convergent, in the third figure the iterations diverge, and in the last figure they are oscillatory divergent.
Theorem 4.4. Let α be a root of x = g(x), and let g(x) be p times continuously differentiable for all x near α, for some p ≥ 2. Furthermore, assume
g′(α) = · · · = g^{(p−1)}(α) = 0. (4.1)
Then if the initial guess x0 is sufficiently close to α, the iteration
x_{n+1} = g(xn), n ≥ 0,
will have order of convergence p, and
lim_{n→∞} (α − x_{n+1}) / (α − xn)^p = (−1)^{p−1} g^{(p)}(α)/p!.
Proof. Let g(x) be p times continuously differentiable for all x near α and satisfy the conditions in equation (4.1) stated above.
Now expand g(xn) about α:
x_{n+1} = g(xn) = g(α + (xn − α))
= g(α) + (xn − α)g′(α) + · · · + ((xn − α)^{p−1}/(p−1)!) g^{(p−1)}(α) + ((xn − α)^p/p!) g^{(p)}(ξn)
for some ξn between xn and α. Using equation (4.1) and g(α) = α, we obtain
x_{n+1} − α = ((xn − α)^p/p!) g^{(p)}(ξn)
⟹ (x_{n+1} − α)/(xn − α)^p = g^{(p)}(ξn)/p!
⟹ (α − x_{n+1})/(α − xn)^p = (−1)^{p−1} g^{(p)}(ξn)/p!,
which proves the result, since ξn → α.
Remark: The Newton method can be analyzed by this result. With
g(x) = x − f(x)/f′(x), g′(x) = f(x)f″(x)/[f′(x)]²,
we have
g′(α) = 0, g″(α) = f″(α)/f′(α) ≠ 0,
as f′(α) ≠ 0 and f″(α) is either positive or negative.
Example 13. Use a fixed-point method to determine a solution to within 10^{−4} of x = tan x, for x in [4, 5].
Sol. Using g(x) = tan x and x0 = 4 gives x1 = g(x0) = tan 4 = 1.158, which is not in the interval [4, 5]. So we need a different fixed-point function.
If we note that x = tan x implies
1/x = 1/tan x,
then we can write
x = x + 1/tan x − 1/x.
Starting with x0 = 4 and taking g(x) = x + 1/tan x − 1/x,
we obtain x1 = 4.61369, x2 = 4.49596, x3 = 4.49341, x4 = 4.49341.
As x3 and x4 agree to five decimals, it is reasonable to assume that these values are sufficiently accurate.
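A small Python sketch of the fixed-point iteration used in Example 13 (tolerance and iteration cap are illustrative):

import math

def fixed_point(g, x0, tol=1e-6, max_iter=100):
    x = x0
    for _ in range(max_iter):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("iteration did not converge")

# g(x) = x + 1/tan(x) - 1/x for x = tan(x) on [4, 5]
print(fixed_point(lambda x: x + 1.0 / math.tan(x) - 1.0 / x, 4.0))  # approx 4.49341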
Example 14. Consider the equation x³ − 7x + 2 = 0 in [0, 1]. Write a fixed-point iteration which will converge to the solution.
Sol. We rewrite the equation in the form x = (x³ + 2)/7 and define the fixed-point iteration
x_{n+1} = (xn³ + 2)/7.
Now g(x) = (x³ + 2)/7 satisfies g : [0, 1] → [0, 1] and |g′(x)| ≤ 3/7 < 1 for all x ∈ [0, 1].
Hence by the Contraction Mapping Theorem the sequence {xn} defined above converges to the unique solution of the given equation. Starting with x0 = 0.5, we can compute the solution as follows:
x1 = 0.303571429,
x2 = 0.28971083,
x3 = 0.289188016.
Therefore the root correct to three decimals is 0.289.
Example 15. The iterates x_{n+1} = 2 − (1 + c)xn + c xn³ will converge to α = 1 for some values of the constant c (provided that x0 is sufficiently close to α). Find the values of c for which convergence occurs. For what values of c, if any, is the convergence quadratic?
Sol. This is a fixed-point iteration
x_{n+1} = g(xn)
with
g(x) = 2 − (1 + c)x + cx³.
Note that α = 1 is a fixed point, since g(1) = 2 − (1 + c) + c = 1. For convergence we need |g′(α)| < 1:
|−(1 + c) + 3cα²| = |2c − 1| < 1 ⟹ 0 < c < 1.
For quadratic convergence we need
g′(α) = 0 and g″(α) ≠ 0,
i.e., 2c − 1 = 0, giving c = 1/2. For this value of c, g″(α) = 6cα = 3 ≠ 0.
Example 16. Consider the iteration x_{n+1} = (a xn + xn^{−2} + 1)/(a + 1). Find the equation satisfied by its fixed point and the value of a giving the fastest convergence.
Sol. The fixed point α satisfies
α = (aα + α^{−2} + 1)/(a + 1)
⟹ α³ − α² − 1 = 0.
Therefore, the above formula can be used to find the root of the equation f(x) = x³ − x² − 1 = 0.
Now substitute xn = α + εn and x_{n+1} = α + ε_{n+1}; we get
(a + 1)(α + ε_{n+1}) = a(α + εn) + α^{−2}(1 + εn/α)^{−2} + 1,
which implies
(a + 1)ε_{n+1} = (a − 2/α³)εn + O(εn²).
Therefore, for fastest convergence, we have a = 2/α³. Here α is the root of the equation x³ − x² − 1 = 0 and can be computed by the Newton method.
Example 17. To compute the root of the equation
e^{−x} = 3 ln x,
using the formula
x_{n+1} = xn − (3 ln xn − e^{−xn})/p,
show that p = 3 gives rapid convergence.
Sol. Substitute xn = α + εn and x_{n+1} = α + ε_{n+1} in the given formula:
ε_{n+1} = εn − (1/p)[3 ln(α + εn) − e^{−(α+εn)}]
= εn − (1/p)[3 ln α + 3 ln(1 + εn/α) − e^{−α} e^{−εn}]
= εn − (1/p)[3 ln α + 3(εn/α − εn²/(2α²) + O(εn³)) − e^{−α}(1 − εn + εn²/2 − · · ·)].
Since α is the exact root, e^{−α} − 3 ln α = 0, and therefore the error equation is
ε_{n+1} = [1 − (3/α + e^{−α})/p] εn + O(εn²).
The method has rapid convergence if
p = 3/α + e^{−α},
where α is the root of e^{−x} − 3 ln x = 0. The root lies in (1, 2), and applying Newton's method with x0 = 1.5, we get
x1 = 1.053213, x2 = 1.113665, x3 = 1.115447, x4 = 1.115448.
Taking α = 1.115448, we obtain p ≈ 3. Hence p = 3 gives rapid convergence.
Exercises
(1) Given the following equations: (i) x⁴ − x − 10 = 0, (ii) x − e^{−x} = 0.
Find initial approximations for the smallest positive root. Use these to find the root correct to three decimals with the secant and Newton methods.
(2) Find all solutions of e^{2x} = x + 6, correct to 4 decimal places, using the Newton method.
(3) Use the bisection method to find the indicated roots of the following equations. Use an error tolerance of ε = 0.0001.
(a) The real root of x³ − x² − x − 1 = 0.
(b) The smallest positive root of cos x = 1/2 + sin x.
(c) The real roots of x³ − 2x − 1 = 0.
(4) Suppose that
f(x) = e^{−1/x²} for x ≠ 0, and f(0) = 0.
The function f is continuous everywhere, in fact differentiable arbitrarily often everywhere, and 0 is the only solution of f(x) = 0. Show that if x0 = 0.0001, it takes more than one hundred million iterations of the Newton method to get below 0.00005.
(5) A calculator is defective: it can only add, subtract, and multiply. Use the equation 1/x = 1.37,
the Newton Method, and the defective calculator to find 1/1.37 correct to 8 decimal places.
(6) Use the Newton Method to find the smallest and the second smallest positive roots of the
equation tan x = 4x, correct to 4 decimal places.
(7) What is the order of convergence of the iteration
x_{n+1} = xn(xn² + 3a)/(3xn² + a)
as it converges to the fixed point α = √a?
(8) What are the solutions α, if any, of the equation x = √(1 + x)? Does the iteration x_{n+1} = √(1 + xn) converge to any of these solutions (assuming x0 is chosen sufficiently close to α)?
(9) (a) Apply Newton's method to the function
f(x) = √x for x ≥ 0, and f(x) = −√(−x) for x < 0,
with the root α = 0. What is the behavior of the iterates? Do they converge, and if so, at what rate?
(b) Do the same as in (a), but with
f(x) = ∛(x²) for x ≥ 0, and f(x) = −∛(x²) for x < 0.
(10) Find all positive roots of the equation
10 ∫₀ˣ e^{−t²} dt = 1.
(12) Show that the following two sequences have convergence of the second order with the same limit √a:
(i) x_{n+1} = (xn/2)(1 + a/xn²), (ii) x_{n+1} = (xn/2)(3 − xn²/a).
If xn is a suitably close approximation to √a, show that the error in the first formula for x_{n+1} is about one-third of that in the second formula, and deduce that the formula
x_{n+1} = (xn/8)(6 + 3a/xn² − xn²/a)
gives a sequence with third-order convergence.
(13) Suppose α is a zero of multiplicity m of f, where f^{(m)} is continuous on an open interval containing α. Show that the fixed-point method x = g(x) with the following g has second-order convergence:
g(x) = x − m f(x)/f′(x).
Bibliography
[Gerald] Curtis F. Gerald and Patrick O. Wheatley, Applied Numerical Analysis, 7th edition, Pearson, 2003.
[Atkinson] K. Atkinson and W. Han, Elementary Numerical Analysis, 3rd edition, John Wiley & Sons, 2004.
[Jain] M. K. Jain, S. R. K. Iyengar, and R. K. Jain, Numerical Methods for Scientific and Engineering Computation, 6th edition, New Age International Publishers, New Delhi, 2012.
CHAPTER 3 (4 LECTURES)
NUMERICAL SOLUTION OF SYSTEM OF LINEAR EQUATIONS
1. Introduction
Systems of simultaneous linear equations are associated with many problems in engineering and science, as well as with applications to the social sciences and the quantitative study of business and economic problems. These problems occur in a wide variety of disciplines, directly in real-world problems as well as in the solution process for other problems.
The principal objective of this chapter is to discuss the numerical aspects of solving linear systems of equations of the form
a11 x1 + a12 x2 + · · · + a1n xn = b1
a21 x1 + a22 x2 + · · · + a2n xn = b2   (1.1)
. . .
an1 x1 + an2 x2 + · · · + ann xn = bn.
This is a linear system of n equations in n unknowns x1, x2, . . . , xn. This system can simply be written in the matrix equation form
Ax = b,
i.e.
[a11 a12 · · · a1n] [x1]   [b1]
[a21 a22 · · · a2n] [x2] = [b2]   (1.2)
[ .    .  · · ·  . ] [ . ]   [ . ]
[an1 an2 · · · ann] [xn]   [bn]
This equation has a unique solution x = A^{−1}b when the coefficient matrix A is non-singular. Unless otherwise stated, we shall assume that this is the case under discussion. If A^{−1} is already available, then x = A^{−1}b provides a good method of computing the solution x.
If A^{−1} is not available, then in general A^{−1} should not be computed solely for the purpose of obtaining x. More efficient numerical procedures will be developed in this chapter. We study broadly two categories, direct and iterative methods. We start with direct methods to solve the linear system.
2. Gaussian Elimination
Direct methods, which are techniques that give a solution in a fixed number of steps, subject only to round-off errors, are considered in this chapter. Gaussian elimination is the principal tool in the direct solution of system (1.2). The method is named after Carl Friedrich Gauss (1777–1855). To solve larger systems of linear equations we use a method of introductory algebra.
For example, consider
x1 + 2x2 + x3 = 0
2x1 + 2x2 + 3x3 = 3
−x1 − 3x2 = 2.
Eliminating x1 from the second and third equations, we obtain
x1 + 2x2 + x3 = 0
−2x2 + x3 = 3
−x2 + x3 = 2.
Now eliminate x2 from the last equation with the help of the second equation:
x1 + 2x2 + x3 = 0
−2x2 + x3 = 3
(1/2) x3 = 1/2.
The last equation gives x3 = 1.
Therefore −2x2 + 1 = 3 ⟹ x2 = −1,
and x1 + 2(−1) + 1 = 0 ⟹ x1 = 1.
(3) Input the coefficients of the linear equations with the right side:
Do for i = 1 to n
  Do for j = 1 to n + 1
    Read a[i][j]
  End for j
End for i
(4) Do for k = 1 to n − 1
  Do for i = k + 1 to n
    Do for j = k + 1 to n + 1
      a[i][j] = a[i][j] − (a[i][k]/a[k][k]) · a[k][j]
    End for j
  End for i
End for k
(5) Compute x[n] = a[n][n + 1]/a[n][n]
(6) Do for i = n − 1 down to 1
  sum = 0
  Do for j = i + 1 to n
    sum = sum + a[i][j] · x[j]
  End for j
  x[i] = (1/a[i][i]) · (a[i][n + 1] − sum)
End for i
(7) Display the result x[i]
(8) Stop
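A runnable Python sketch of this elimination algorithm with back substitution (no pivoting; plain lists are used instead of arrays, purely as an illustration):

def gauss_solve(A, b):
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    # forward elimination, as in step (4)
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]                  # multiplier
            for j in range(k, n + 1):
                M[i][j] -= m * M[k][j]
    # back substitution, as in steps (5)-(6)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

# the 3x3 example solved at the start of this section
A = [[1.0, 2.0, 1.0], [2.0, 2.0, 3.0], [-1.0, -3.0, 0.0]]
b = [0.0, 3.0, 2.0]
print(gauss_solve(A, b))   # [1.0, -1.0, 1.0]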
Partial pivoting: In the elimination process it is assumed that the pivot elements aii ≠ 0, i = 1, 2, . . . , n. If at any stage of elimination one of the pivots becomes small (or zero), then we bring another element into the pivot position by interchanging rows.
Here are some tips that allow us to determine what type of solution we have, based on the reduced echelon form:
1. If we have a leading one in every column, then we have a unique solution.
2. If we have a row of zeros equal to a non-zero number on the right side, then the system has no solution.
3. If we do not have a leading one in every column in a homogeneous system (i.e., a system where all the equations equal zero), or we have a row of zeros, then the system has infinitely many solutions.
Example 1. Solve the following system of equations, whose exact solution is x1 = 2.6, x2 = −3.8, x3 = −5.0.
Sol. Let us use a floating-point representation with 4 digits; all operations will be rounded. The augmented matrix is
[6.000  2.000   2.000  | −2.000]
[2.000  0.6667  0.3333 |  1.000]
[1.000  2.000  −1.000  |  0.0]
Multipliers are m21 = 2/6 = 0.3333 and m31 = 1/6 = 0.1667, and
a21^(2) = a21 − m21 a11, a22^(2) = a22 − m21 a12, etc., giving
[6.000  2.000       2.000  | −2.000]
[0.0    0.0001000  −0.3333 |  1.667]
[0.0    1.667      −1.333  |  0.3334]
The multiplier is m32 = 1.667/0.0001 = 16670, giving
[6.000  2.000       2.000  | −2.000]
[0.0    0.0001000  −0.3333 |  1.667]
[0.0    0.0         5555   | −27790]
Using back substitution, we obtain
x3 = −5.003, x2 = 0.0, x1 = 1.335.
We observe that the computed solution is not compatible with the exact solution.
The difficulty is in the second pivot, a22^(2). This coefficient is very small (almost zero), which means that it carries essentially infinite relative error, and this error was carried through into every computation involving it. To avoid this, we interchange the second and third rows and then continue the elimination.
In this case (after interchanging) the multiplier is m32 = 0.0001/1.667 = 0.00005999, giving
[6.000  2.000   2.000  | −2.000]
[0.0    1.667  −1.333  |  0.3334]
[0.0    0.0    −0.3332 |  1.667]
Using back substitution, we obtain
x3 = −5.003, x2 = −3.801, x1 = 2.602.
We see that after partial pivoting, we get the desired solution.
Complete pivoting: In the first stage of elimination, we search for the largest element in magnitude in the entire matrix and bring it to the position of the first pivot. We repeat the same process at every step of the elimination. This process requires interchanges of both rows and columns.
Scaled partial pivoting: In this approach, the algorithm selects as the pivot the entry that is largest relative to the other entries in its row.
At the beginning, a scale factor must be computed for each equation in the system. We define
si = max_{1≤j≤n} |aij|, 1 ≤ i ≤ n.
These numbers are recorded in the scale vector s = [s1, s2, . . . , sn]. Note that the scale vector does not change throughout the procedure. In starting the forward elimination process, we do not arbitrarily use the first equation as the pivot equation. Instead, we use the equation for which the ratio |ai1|/si is greatest. We repeat the process at each stage, keeping the same scale factors.
Example 2. Solve the system
3x1 − 13x2 + 9x3 + 3x4 = −19
−6x1 + 4x2 + x3 − 18x4 = −34
6x1 − 2x2 + 2x3 + 4x4 = 16
12x1 − 8x2 + 6x3 + 10x4 = 26
by hand using scaled partial pivoting. Justify all row interchanges and write out the transformed matrix after you finish working on each column.
Sol. The augmented matrix is
[ 3  −13  9    3  | −19]
[−6    4  1  −18  | −34]
[ 6   −2  2    4  |  16]
[12   −8  6   10  |  26]
and the scale factors are s1 = 13, s2 = 18, s3 = 6, and s4 = 12. We need to pick the largest of (3/13, 6/18, 6/6, 12/12), which is the ratio for row 3, so we interchange row 1 and row 3 (and interchange s1 and s3) to get
[ 6   −2  2    4  |  16]
[−6    4  1  −18  | −34]
[ 3  −13  9    3  | −19]
[12   −8  6   10  |  26]
with s1 = 6, s2 = 18, s3 = 13, s4 = 12. Performing R2 − (−6/6)R1 → R2, R3 − (3/6)R1 → R3, and R4 − (12/6)R1 → R4, we obtain
[6   −2   2    4  |  16]
[0    2   3  −14  | −18]
[0  −12   8    1  | −27]
[0   −4   2    2  |  −6]
Comparing (|a22|/s2 = 2/18, |a32|/s3 = 12/13, |a42|/s4 = 4/12), the largest is the ratio for row 3, so we interchange row 2 and row 3 (and s2 and s3) to get
[6   −2   2    4  |  16]
[0  −12   8    1  | −27]
[0    2   3  −14  | −18]
[0   −4   2    2  |  −6]
with s1 = 6, s2 = 13, s3 = 18, s4 = 12. Performing R3 − (2/−12)R2 → R3 and R4 − (−4/−12)R2 → R4, we get
[6   −2   2      4     |  16]
[0  −12   8      1     | −27]
[0    0  13/3  −83/6   | −45/2]
[0    0  −2/3    5/3   |   3]
Comparing (|a33|/s3 = (13/3)/18, |a43|/s4 = (2/3)/12), the largest is the first entry, so we do not interchange rows. Performing R4 + (2/13)R3 → R4, we get the final reduced matrix
[6   −2   2      4     |  16]
[0  −12   8      1     | −27]
[0    0  13/3  −83/6   | −45/2]
[0    0   0    −6/13   | −6/13]
Backward substitution gives x4 = 1, x3 = −2, x2 = 1, x1 = 3.
Example 3. Solve this system of linear equations:
0.0001x + y = 1
x + y = 2
using no pivoting, partial pivoting, and scaled partial pivoting. Carry at most five significant digits
of precision (rounding) to see how finite precision computations and roundoff errors can affect the
calculations.
Sol. By direct substitution, it is easy to verify that the true solution is x = 1.0001 and y = 0.99990 to
five significant digits.
For no pivoting, the first equation in the original system is the pivot equation, and the multiplier is 1/0.0001 = 10000. The new system of equations is
0.0001x + y = 1
−9999y = −9998.
We obtain y = 9998/9999 ≈ 0.99990 and then x = 1.0000. Notice that we have lost the last significant digit in the correct value of x.
We repeat the solution process using partial pivoting for the original system. We see that the second
entry is larger, so the second equation is used as the pivot equation. We can interchange the two
equations, obtaining
x+y =2
0.0001x + y = 1
which gives y = 0.99980/0.99990 ≈ 0.99990 and x = 2 − y = 2 − 0.99990 = 1.0001.
Both computed values of x and y are correct to five significant digits.
We repeat the solution process using scaled partial pivoting for the original system. Since the scaling
constants are s = (1, 1) and the ratios for determining the pivot equation are (0.0001/1, 1/1), the
second equation is now the pivot equation. We do not actually interchange the equations and use
the second equation as the first pivot equation. The rest of the calculations are as above for partial
pivoting. The computed values of x and y are correct to five significant digits.
Operations count for Gauss elimination We consider the number of floating point operations
(flops) required to solve the system Ax = b. Gaussian Elimination first uses row operations to
transform the problem into an equivalent problem of the form Ux = g, where U is upper triangular. Then back substitution is used to solve for x. First we look at how many floating point operations are required to reduce A to upper triangular form.
First a multiplier is computed for each row. Then in each row the algorithm performs n multiplies and n adds. This gives a total of (n − 1) + (n − 1)n multiplies (counting the computation of the multiplier in each of the (n − 1) rows) and (n − 1)n adds.
In total this is 2n² − n − 1 floating point operations to do a single pivot step on the n by n system.
Then this has to be done recursively on the lower right subsystem, which is an (n − 1) by (n − 1) system. This requires 2(n − 1)² − (n − 1) − 1 operations. Then this has to be done on the next subsystem, requiring 2(n − 2)² − (n − 2) − 1 operations, and so on.
In total, then, we use In floating point operations, with
In = Σ_{k=1}^{n} (2k² − k − 1) = n(n+1)(4n−1)/6 − n ≈ (2/3) n³.
Counts for back substitution: To find xn we just require one division. Then to solve for x_{n−1} we require 3 flops. Similarly, solving for x_{n−2} requires 5 flops. Thus in total back substitution requires Bn floating point operations, with
Bn = Σ_{k=1}^{n} (2k − 1) = n(n + 1) − n = n².
The LU factorization: When we use matrix multiplication, another meaning can be given to Gauss elimination: the matrix A can be factored into the product of two triangular matrices.
Let AX = b be the system to be solved, where A is the n × n coefficient matrix. The linear system can be reduced to the upper triangular system UX = g with
U = [u11  u12  · · ·  u1n]
    [0    u22  · · ·  u2n]
    [ .         .       . ]
    [0    0    · · ·  unn]
Here uij = aij^(i), the entries produced at the end of the elimination. Introduce an auxiliary lower triangular matrix L based on the multipliers mij as follows:
L = [1     0     · · ·          0]
    [m21   1     0   · · ·      0]
    [m31   m32   1   · · ·      0]
    [ .     .          .         ]
    [mn1   mn2   · · ·  mn,n−1  1]
Theorem 2.1. Let A be a non-singular matrix and let L and U be defined as above. If U is produced
without pivoting then
LU = A.
For instance, consider the system with augmented matrix
[4  3  2  1 |  1]
[3  4  3  2 |  1]
[2  3  4  3 | −1]
[1  2  3  4 | −1]
Multipliers are m21 = 3/4, m31 = 1/2, and m41 = 1/4.
Replace R2 with R2 − m21 R1, R3 with R3 − m31 R1 and R4 with R4 − m41 R1:
[4  3    2    1    |  1]
[0  7/4  3/2  5/4  |  1/4]
[0  3/2  3    5/2  | −3/2]
[0  5/4  5/2  15/4 | −5/4]
Multipliers are m32 = 6/7 and m42 = 5/7.
Replace R3 with R3 − m32 R2 and R4 with R4 − m42 R2; we obtain
[4  3    2     1     |  1]
[0  7/4  3/2   5/4   |  1/4]
[0  0    12/7  10/7  | −12/7]
[0  0    10/7  20/7  | −10/7]
The multiplier is m43 = 5/6, and we replace R4 with R4 − m43 R3:
[4  3    2     1     |  1]
[0  7/4  3/2   5/4   |  1/4]
[0  0    12/7  10/7  | −12/7]
[0  0    0     5/3   |  0]
Therefore
L = [1    0    0    0]
    [3/4  1    0    0]
    [1/2  6/7  1    0]
    [1/4  5/7  5/6  1]
It can be verified that LU = A.
3. Iterative Method
The linear system Ax = b may have a large order. For such systems, Gauss elimination is often too expensive in either computation time or computer memory requirements, or both.
In an iterative method, a sequence of progressively better iterates is produced that approximates the solution.
Jacobi and Gauss-Seidel method: We start with an example. Consider the system of equations
9x1 + x2 + x3 = 10
2x1 + 10x2 + 3x3 = 19
3x1 + 4x2 + 11x3 = 0.
One class of iterative methods for solving this system is as follows. We write
x1 = (10 − x2 − x3)/9
x2 = (19 − 2x1 − 3x3)/10
x3 = (0 − 3x1 − 4x2)/11.
Let x^(0) = [x1^(0), x2^(0), x3^(0)] be an initial approximation of the solution x. Then define the iteration
x1^(k+1) = (10 − x2^(k) − x3^(k))/9
x2^(k+1) = (19 − 2x1^(k) − 3x3^(k))/10
x3^(k+1) = (0 − 3x1^(k) − 4x2^(k))/11, k = 0, 1, 2, . . . .
This is called the Jacobi method, or method of simultaneous replacements. The method is named after the German mathematician Carl Gustav Jacob Jacobi.
We start with [0, 0, 0] and obtain
x1^(1) = 1.1111, x2^(1) = 1.900, x3^(1) = 0.0,
x1^(2) = 0.9000, x2^(2) = 1.6778, x3^(2) = −0.9939,
etc.
Another approach to solve the same system is the following:
x1^(k+1) = (10 − x2^(k) − x3^(k))/9
x2^(k+1) = (19 − 2x1^(k+1) − 3x3^(k))/10
x3^(k+1) = (0 − 3x1^(k+1) − 4x2^(k+1))/11, k = 0, 1, 2, . . . .
This method is called the Gauss-Seidel method, or method of successive replacements. It is named after the German mathematicians Carl Friedrich Gauss and Philipp Ludwig von Seidel. Starting with [0, 0, 0], we obtain
x1^(1) = 1.1111, x2^(1) = 1.6778, x3^(1) = −0.9131,
x1^(2) = 1.0262, x2^(2) = 1.9687, x3^(2) = −0.9958.
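Both iterations are easy to program; a minimal Python sketch applied to the system above (a fixed number of sweeps stands in for a proper stopping test):

def jacobi(A, b, x, sweeps):
    n = len(A)
    for _ in range(sweeps):   # every component uses only old values
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x

def gauss_seidel(A, b, x, sweeps):
    n = len(A)
    x = x[:]
    for _ in range(sweeps):   # overwrite in place: newest values are used
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x

A = [[9.0, 1.0, 1.0], [2.0, 10.0, 3.0], [3.0, 4.0, 11.0]]
b = [10.0, 19.0, 0.0]
print(jacobi(A, b, [0.0, 0.0, 0.0], 20))        # tends to [1, 2, -1]
print(gauss_seidel(A, b, [0.0, 0.0, 0.0], 20))  # same limit, fewer sweeps needed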
General approach: We rewrite the system Ax = b as
a11 x1^(k+1) = −a12 x2^(k) − · · · − a1n xn^(k) + b1
a21 x1^(k+1) + a22 x2^(k+1) = −a23 x3^(k) − · · · + b2
. . .
an1 x1^(k+1) + an2 x2^(k+1) + · · · + ann xn^(k+1) = bn,
or (D + L)x^(k+1) = −U x^(k) + b,
where D, L and U are the diagonal, strictly lower triangular and strictly upper triangular parts of A, respectively. Hence
x^(k+1) = −(D + L)^{−1} U x^(k) + (D + L)^{−1} b,
x^(k+1) = T x^(k) + B, k = 0, 1, 2, . . .
Here T = −(D + L)^{−1} U, and this matrix is called the iteration matrix.
Algorithm [Gauss-Seidel]
(1) Input matrix A = [aij], b, XO = x^(0), tolerance TOL, maximum number of iterations N
(2) Set k = 1
(3) While (k ≤ N) do steps 4-7
(4) For i = 1, 2, . . . , n
xi = (1/aii) [ −Σ_{j=1}^{i−1} aij xj − Σ_{j=i+1}^{n} aij XOj + bi ]
Convergence: Since x^(k) = T x^(k−1) + B and the true solution satisfies x = Tx + B, subtracting gives
x − x^(k) = T(x − x^(k−1)) = T²(x − x^(k−2)) = · · · = T^k (x − x^(0)).
Let z = x − x^(0); then the method converges for every initial guess iff
lim_{k→∞} T^k z = lim_{k→∞} (x − x^(k)) = x − lim_{k→∞} x^(k) = 0,
i.e., iff ρ(T) < 1, where ρ denotes the spectral radius and T = −(D + L)^{−1} U.
Now let A be strictly diagonally dominant; we show the Gauss-Seidel method converges. Let λ be an eigenvalue of the iteration matrix T and x a corresponding eigenvector, normalized so that ‖x‖∞ = 1; choose i with |xi| = 1. Then
Tx = λx
−(D + L)^{−1} U x = λx
−U x = λ(D + L)x,
i.e., componentwise,
−Σ_{j=i+1}^{n} aij xj = λ Σ_{j=1}^{i} aij xj = λ aii xi + λ Σ_{j=1}^{i−1} aij xj.
Hence
λ aii xi = −Σ_{j=i+1}^{n} aij xj − λ Σ_{j=1}^{i−1} aij xj,
and taking absolute values,
|λ| |aii| ≤ Σ_{j=i+1}^{n} |aij| + |λ| Σ_{j=1}^{i−1} |aij|.
Therefore
|λ| ≤ Σ_{j=i+1}^{n} |aij| / ( |aii| − Σ_{j=1}^{i−1} |aij| ) < 1,
by strict diagonal dominance. Hence ρ(T) < 1 and the method converges.
Theorem 4.2. If A is an n × n diagonalizable matrix with a dominant eigenvalue, then there exists a nonzero vector x0 such that the sequence of vectors
A x0, A² x0, A³ x0, . . . , A^k x0, . . .
approaches a multiple of the dominant eigenvector of A.
Proof. A is diagonalizable, which implies that it has n linearly independent eigenvectors x1, x2, . . . , xn with corresponding eigenvalues λ1, λ2, . . . , λn.
We assume that these eigenvalues are ordered so that λ1 is the dominant eigenvalue (with a corresponding eigenvector x1).
Because the n eigenvectors x1, x2, . . . , xn are linearly independent, they must form a basis for Rⁿ.
For the initial approximation x0, we choose a nonzero vector such that the linear combination
x0 = c1 x1 + c2 x2 + · · · + cn xn
has a nonzero leading coefficient c1. (If c1 = 0, the power method may not converge, and a different x0 must be used as the initial approximation.)
Now, operating on both sides of this equation by A produces
A x0 = A c1 x1 + A c2 x2 + · · · + A cn xn
= c1 (A x1) + c2 (A x2) + · · · + cn (A xn)
= c1 (λ1 x1) + c2 (λ2 x2) + · · · + cn (λn xn),
since A xi = λi xi. Repeated multiplication of both sides by A produces
A^k x0 = c1 (λ1^k x1) + c2 (λ2^k x2) + · · · + cn (λn^k xn),
which implies that
A^k x0 = λ1^k [ c1 x1 + c2 (λ2/λ1)^k x2 + · · · + cn (λn/λ1)^k xn ].
Now, from our original assumption that λ1 is larger in absolute value than the other eigenvalues, it follows that each of the fractions
|λ2/λ1|, |λ3/λ1|, . . . , |λn/λ1| < 1.
Therefore each of the factors
(λ2/λ1)^k, (λ3/λ1)^k, . . . , (λn/λ1)^k
must approach 0 as k approaches infinity. This implies that the approximation
A^k x0 ≈ λ1^k c1 x1, c1 ≠ 0,
improves as k increases. Since x1 is a dominant eigenvector, it follows that any scalar multiple of x1 is
also a dominant eigenvector. Thus we have shown that Ak x0 approaches a multiple of the dominant
eigenvector of A.
Algorithm
(1) Start
(2) Define matrix A and initial guess x
(3) Calculate y = Ax
(4) Find the largest element in magnitude of vector y and assign it to K
(5) Calculate the fresh value x = (1/K) y
(6) If |K^(n) − K^(n−1)| > error tolerance, go to step 3
(7) Stop
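A Python sketch of this algorithm (a fixed iteration count replaces the error test of step (6); the matrix is that of Example 7 below, with the signs as reconstructed there):

def power_method(A, x, n_iter=20):
    K = 1.0
    for _ in range(n_iter):
        y = [sum(aij * xj for aij, xj in zip(row, x)) for row in A]  # y = A x
        K = max(y, key=abs)            # scaling factor -> dominant eigenvalue
        x = [yi / K for yi in y]
    return K, x

A = [[1.0, 2.0, 0.0], [-2.0, 1.0, 2.0], [1.0, 3.0, 1.0]]
K, x = power_method(A, [1.0, 1.0, 1.0])
print(K, x)   # approx 3.0 and [0.5, 0.5, 1.0]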
Example 7. Calculate seven iterations of the power method with scaling to approximate a dominant eigenvector of the matrix
A = [ 1  2  0]
    [−2  1  2]
    [ 1  3  1]
Sol. Using x0 = [1, 1, 1]^T as the initial approximation, we obtain
y1 = A x0 = [3, 1, 5]^T,
and by scaling we obtain the approximation
x1 = (1/5)[3, 1, 5]^T = [0.60, 0.20, 1.00]^T.
Similarly we get
y2 = A x1 = [1.00, 1.00, 2.20]^T = 2.20 [0.45, 0.45, 1.00]^T = 2.20 x2,
y3 = A x2 = [1.35, 1.55, 2.80]^T = 2.80 [0.48, 0.55, 1.00]^T = 2.80 x3,
y4 = A x3 = 3.1 [0.51, 0.51, 1.00]^T,
etc.
After several iterations, we observe that the dominant eigenvector is
x = [0.50, 0.50, 1.00]^T,
and the scaling factors approach the dominant eigenvalue λ = 3.
Remark 4.1. The power method is useful for computing an eigenvalue, but it gives only the dominant one. To find the other eigenvalues we use properties of the matrix, such as: the sum of all eigenvalues equals the trace of the matrix; and if λ is an eigenvalue of A, then 1/λ is an eigenvalue of A^(−1). Hence the reciprocal of the smallest (in magnitude) eigenvalue of A is the dominant eigenvalue of A^(−1).
Remark 4.2. Consider A − qI for a shift q; its eigenvalues are (λ1 − q, λ2 − q, . . .). The eigenvalues of (A − qI)^(−1) are (1/(λ1 − q), 1/(λ2 − q), . . .).
The eigenvalue of the original matrix A that is closest to q corresponds to the eigenvalue of largest magnitude of the shifted and inverted matrix (A − qI)^(−1).
To find the eigenvalue closest to q, we apply the power method to (A − qI)^(−1) and obtain its dominant eigenvalue μ.
Then we recover the eigenvalue of the original problem by λ = 1/μ + q. This method is called the shifted inverse power method. In practice we solve (A − qI)y = x, which is equivalent to y = (A − qI)^(−1) x, so we need not compute the inverse of the matrix.
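A minimal Python sketch of the shifted inverse power method (not part of the original notes; each step solves a linear system rather than forming the inverse, as just described; the test matrix is the one of Example 8 below):

import numpy as np

def shifted_inverse_power(A, q, x, tol=1e-8, max_iter=100):
    """Eigenvalue of A closest to the shift q via the power method
    applied to (A - qI)^(-1)."""
    M = A - q * np.eye(A.shape[0])
    mu = 0.0
    for _ in range(max_iter):
        y = np.linalg.solve(M, x)        # y = (A - qI)^(-1) x
        mu_new = y[np.argmax(np.abs(y))]
        x = y / mu_new
        if abs(mu_new - mu) <= tol:
            break
        mu = mu_new
    return q + 1.0 / mu, x               # recover lambda = q + 1/mu

A = np.array([[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]])
lam, v = shifted_inverse_power(A, 3.0, np.array([1.0, -1.0, 1.0]))
print(lam)   # ≈ 3.4142, i.e. 2 + sqrt(2)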
Example 8. Find the eigenvalue of the matrix
        [2  1  0]
    A = [1  2  1]
        [0  1  2]
nearest to 3, using the power method.
Sol. The eigenvalue of the matrix A nearest to 3 corresponds to the smallest eigenvalue in magnitude of A − 3I. Hence it corresponds to the largest eigenvalue of (A − 3I)^(−1) in magnitude. Now
             [−1   1   0]                        [0  1  1]
    A − 3I = [ 1  −1   1],   B = (A − 3I)^(−1) = [1  1  1].
             [ 0   1  −1]                        [1  1  0]
Starting with
    x0 = [1, −1, 1]^T
we obtain
    y1 = Bx0 = [0, 1, 0]^T = 1 · x1
    y2 = Bx1 = [1, 1, 1]^T = 1 · x2
    y3 = Bx2 = [2, 3, 2]^T = 3 [0.6667, 1, 0.6667]^T = 3 x3
    y4 = Bx3 = [1.6667, 2.3334, 1.6667]^T = 2.3334 [0.7143, 1, 0.7143]^T = 2.3334 x4.
After six iterations, we obtain the dominant eigenvalue of the matrix B, which is μ ≈ 2.4, with dominant eigenvector
    [0.7143, 1, 0.7143]^T.
Now the eigenvalue of the matrix A is λ = 3 ± 1/2.4 = 3 ± 0.42, i.e. 3.42 or 2.58. Since 2.58 does not satisfy det(A − 2.58I) = 0, the correct eigenvalue of matrix A nearest to 3 is 3.42.
Although the power method worked well in these examples, we must say something about the cases in which it may fail. There are basically three such cases:
1. Using the power method when A is not diagonalizable. Recall that A has n linearly independent eigenvectors if and only if A is diagonalizable. Of course, it is not easy to tell by just looking at A whether it is diagonalizable.
2. Using the power method when A does not have a dominant eigenvalue, or when the dominant eigenvalue is such that |λ1| = |λ2|.
3. When the entries of A contain significant error: the powers A^k will then have significant roundoff errors in their entries.
Exercises
(1) Using four-decimal-place floating-point arithmetic, solve the following system of equations without and with pivoting:
    0.729x1 + 0.81x2 + 0.9x3 = 0.6867
    x1 + x2 + x3 = 0.8338
    1.331x1 + 1.21x2 + 1.1x3 = 1.000
This system has the exact solution, rounded to four places, x1 = 0.2245, x2 = 0.2814, x3 = 0.3279.
(2) Solve the following system of equations by Gaussian elimination with partial and scaled partial
pivoting
x1 + 2x2 + x3 = 3
3x1 + 4x2 = 3
2x1 + 10x2 + 4x3 = 10.
(3) Consider the linear system
    x1 + 4x2 = 1
    4x1 + x2 = 0.
The true solution is x1 = −1/15 and x2 = 4/15. Apply the Jacobi and Gauss-Seidel methods with x^(0) = [0, 0]^T to the system and find out which method diverges more rapidly. Next, interchange the two equations to write the system as
    4x1 + x2 = 0
    x1 + 4x2 = 1
and apply both methods with x^(0) = [0, 0]^T.
Iterate until ||x − x^(k)||∞ ≤ 10^(−5). Which method converges faster?
(4) Solve the system of equations by the Jacobi and Gauss-Seidel methods:
    8x1 + x2 + 2x3 = 1
    x1 − 5x2 + x3 = 16
    x1 + x2 − 4x3 = 7.
(5) Solve this system of equations by Gauss-Seidel, starting with the initial vector [0, 0, 0]:
    4.63x1 − 1.21x2 + 3.22x3 = 2.22
    3.07x1 + 5.48x2 + 2.11x3 = 3.17
    1.26x1 + 3.11x2 + 4.57x3 = 5.11.
(6) Show that Gauss-Seidel method does not converge for the following system of equations
2x1 + 3x2 + x3 = 1
3x1 + 2x2 + 2x3 = 1
x1 + 2x2 + 2x3 = 1.
(7) Consider the iteration
    x^(k+1) = b + α [ 2  1 ; 1  2 ] x^(k),  k ≥ 0,
where α is a real constant. For some values of α, the iteration method converges for any choice of initial guess x^(0), and for some other values of α, the method diverges. Find the values of α for which the method converges.
(8) Determine the largest eigenvalue and the corresponding eigenvector of the matrix
        [4   1   0]
    A = [1  20   1]
        [0   1   4]
correct to three decimals using the power method.
Bibliography
[Gerald] Curtis F. Gerald and Patrick O. Wheatley, Applied Numerical Analysis, 7th edition, Pearson, 2003.
[Atkinson] K. Atkinson and W. Han, Elementary Numerical Analysis, 3rd edition, John Wiley and Sons, 2004.
CHAPTER 4 (6 LECTURES)
POLYNOMIAL INTERPOLATION AND APPROXIMATIONS
1. Introduction
Polynomials are used as the basic means of approximation in nearly all areas of numerical analysis.
They are used in the solution of equations and in the approximation of functions, of integrals and
derivatives, of solutions of integral and differential equations, etc. Polynomials have simple structure,
which makes it easy to construct effective approximations and then make use of them. For this reason,
the representation and evaluation of polynomials is a basic topic in numerical analysis. We discuss this
topic in the present chapter in the context of polynomial interpolation, the simplest and certainly the
most widely used technique for obtaining polynomial approximations.
Definition 1.1 (Polynomial). A polynomial Pn(x) of degree ≤ n is, by definition, a function of the form
    Pn(x) = a0 + a1 x + a2 x^2 + · · · + an x^n   (1.1)
with certain coefficients a0, a1, . . . , an. This polynomial has (exact) degree n in case its leading coefficient an is nonzero.
The power form (1.1) is the standard way to specify a polynomial in mathematical discussions. It is
a very convenient form for differentiating or integrating a polynomial. But, in various specific contexts,
other forms are more convenient. For example, the following shifted power form may be helpful.
    P(x) = a0 + a1(x − c) + a2(x − c)^2 + · · · + an(x − c)^n.   (1.2)
It is good practice to employ the shifted power form with the center c chosen somewhere in the interval
[a, b] when interested in a polynomial on that interval.
Remark 1.1. The coefficients in the shifted power form provide derivative values, i.e.,
    ai = P^(i)(c) / i!,  i = 0, 1, 2, . . . , n.
In effect, the shifted power form provides the Taylor expansion for P (x) around the center c.
Definition 1.2 (Newton form). A further generalization of the shifted power form is the following
Newton form
    P(x) = a0 + a1(x − c1) + a2(x − c1)(x − c2) + · · · + an(x − c1)(x − c2) · · · (x − cn).
This form plays a major role in the construction of an interpolating polynomial. It reduces to the shifted power form if the centers c1, . . . , cn all equal c, and to the power form if the centers c1, . . . , cn all equal zero. The following discussion on the evaluation of the Newton form therefore applies directly to these simpler forms as well.
It is inefficient to evaluate each of the n + 1 terms in the Newton form separately and then sum: this would take n + n(n + 1)/2 additions and n(n + 1)/2 multiplications. Instead, we notice that the factor (x − c1) occurs in all terms but the first, the factor (x − c2) occurs in all the remaining terms, then (x − c3), and so on. Finally we get
    P(x) = a0 + (x − c1){a1 + (x − c2)[a2 + (x − c3)[a3 + · · · + (x − c_{n−1})[a_{n−1} + (x − cn) an] · · · ]]}.
Evaluating P(x) at any particular value of x now takes 2n additions and n multiplications.
Theorem 1.3 (Algorithm: Nested Multiplication). Let P(x) be the polynomial in Newton form having coefficients a0, a1, . . . , an and centers c1, c2, . . . , cn. The following algorithm computes y = P(x) for a given real number x (a Python sketch follows):
    y = an
    for i = n − 1, n − 2, . . . , 0 do
        y = ai + (x − c_{i+1}) y
    end.
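A direct Python transcription of this loop (not part of the original notes; the test polynomial is the one of Example 1 below):

def newton_eval(a, c, x):
    """Evaluate P(x) = a0 + a1(x-c1) + ... + an(x-c1)...(x-cn)
    by nested multiplication.  a = [a0, ..., an], c = [c1, ..., cn]."""
    y = a[-1]
    for i in range(len(a) - 2, -1, -1):   # i = n-1, n-2, ..., 0
        y = a[i] + (x - c[i]) * y         # c[i] holds the center c_{i+1}
    return y

# P3(x) = 3 - 7(x+1) + 8(x+1)x - 6(x+1)x(x-1) from Example 1:
print(newton_eval([3, -7, 8, -6], [-1, 0, 1], 0.0))   # P3(0) = -4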
Example 1. Consider the interpolating polynomial
    P3(x) = 3 − 7(x + 1) + 8(x + 1)x − 6(x + 1)x(x − 1).
We will use nested multiplication to write this polynomial in the power form
    P3(x) = b0 + b1 x + b2 x^2 + b3 x^3.
This requires repeatedly applying nested multiplication to a polynomial of the form
    P(x) = a0 + a1(x − c1) + a2(x − c1)(x − c2) + a3(x − c1)(x − c2)(x − c3),
and each application performs the following steps:
    b3 = a3
    b2 = a2 + (z − c3) b3
    b1 = a1 + (z − c2) b2
    b0 = a0 + (z − c1) b1,
where, in this example, we will set z = 0 each time.
The numbers b0 , b1 , b2 and b3 computed by the algorithm are the coefficients of P (x) in the Newton
form, with the centers c1 , c2 and c3 changed to z, c1 and c2 ; that is,
    P(x) = b0 + b1(x − z) + b2(x − z)(x − c1) + b3(x − z)(x − c1)(x − c2).
It follows that b0 = P (z), which is why this algorithm is the preferred method for evaluating a
polynomial in Newton form at a given point z.
It should be noted that the algorithm can be derived by writing P (x) in the nested form
    P(x) = a0 + (x − c1)[a1 + (x − c2)[a2 + (x − c3) a3]]
and computing P(z) as follows:
    P(z) = a0 + (z − c1)[a1 + (z − c2)[a2 + (z − c3) a3]]
         = a0 + (z − c1)[a1 + (z − c2)[a2 + (z − c3) b3]]
         = a0 + (z − c1)[a1 + (z − c2) b2]
         = a0 + (z − c1) b1
         = b0.
Initially, we have
    P(x) = 3 − 7(x + 1) + 8(x + 1)x − 6(x + 1)x(x − 1),
so the coefficients of P(x) in this Newton form are
    a0 = 3, a1 = −7, a2 = 8, a3 = −6
with the centers
    c1 = −1, c2 = 0, c3 = 1.
Applying nested multiplication to these coefficients and centers, with z = 0, yields
    b3 = −6
    b2 = 8 + (0 − 1)(−6) = 14
    b1 = −7 + (0 − 0)(14) = −7
    b0 = 3 + (0 − (−1))(−7) = −4.
It follows that
    P(x) = −4 + (−7)(x − 0) + 14(x − 0)(x − (−1)) + (−6)(x − 0)(x − (−1))(x − 0)
         = −4 − 7x + 14x(x + 1) − 6x^2(x + 1).
For the second application of nested multiplication, we have the centers
    c1 = 0, c2 = −1, c3 = 0
with coefficients
    a0 = −4, a1 = −7, a2 = 14, a3 = −6.
Applying nested multiplication to these coefficients and centers, with z = 0, yields
    b3 = −6
    b2 = 14 + (0 − 0)(−6) = 14
    b1 = −7 + (0 − (−1))(14) = 7
    b0 = −4 + (0 − 0)(7) = −4.
It follows that
    P(x) = −4 + 7(x − 0) + 14(x − 0)(x − 0) + (−6)(x − 0)(x − 0)(x − (−1))
         = −4 + 7x + 14x^2 − 6x^2(x + 1).
For the third and final application of nested multiplication, we have the centers
    c1 = 0, c2 = 0, c3 = −1
with coefficients
    a0 = −4, a1 = 7, a2 = 14, a3 = −6.
Applying nested multiplication to these coefficients and centers, with z = 0, yields
    b3 = −6
    b2 = 14 + (0 − (−1))(−6) = 8
    b1 = 7 + (0 − 0)(8) = 7
    b0 = −4 + (0 − 0)(7) = −4.
It follows that
    P(x) = −4 + 7(x − 0) + 8(x − 0)(x − 0) + (−6)(x − 0)(x − 0)(x − 0)
         = −4 + 7x + 8x^2 − 6x^3,
and the centers are now 0, 0 and 0. Since all of the centers are equal to zero, the polynomial is now in the power form.
2. Interpolation
In this chapter, we consider the interpolation problem. Suppose we do not know the function f, but we have some information (data) about it. We then try to compute a function g that approximates f.
2.1. Polynomial Interpolation. The polynomial interpolation problem, also called Lagrange interpolation, can be described as follows: Given n + 1 data points (xi, yi), i = 0, 1, . . . , n, find a polynomial P of lowest possible degree such that
    P(xi) = yi, i = 0, 1, . . . , n.
Such a polynomial is said to interpolate the data. Here yi may be the value of some unknown function f at xi, i.e. yi = f(xi).
One reason for considering the class of polynomials in the approximation of functions is that they uniformly approximate continuous functions.
Theorem 2.1 (Weierstrass Approximation Theorem). Suppose that f is defined and continuous on [a, b]. For any ε > 0, there exists a polynomial P(x) defined on [a, b] with the property that
    |f(x) − P(x)| < ε, ∀ x ∈ [a, b].
Another reason for considering the class of polynomials in approximation of functions is that the
derivatives and indefinite integrals of a polynomial are easy to compute.
Theorem 2.2 (Existence and Uniqueness). Given a real-valued function f(x) and n + 1 distinct points x0, x1, . . . , xn, there exists a unique polynomial Pn(x) of degree ≤ n which interpolates the unknown f(x) at the points x0, x1, . . . , xn.
Proof. Existence: Let the data be (xi, f(xi)), i = 0, 1, . . . , n. We prove the result by mathematical induction.
The Theorem clearly holds for n = 0: only one data point is given and we can take the constant polynomial P0(x) = f(x0), ∀ x.
Assume that the Theorem holds for n ≤ k, i.e. there is a polynomial Pk of degree ≤ k such that
    Pk(xi) = f(xi), for 0 ≤ i ≤ k.
Now we try to construct a polynomial of degree at most k + 1 to interpolate (xi, f(xi)), 0 ≤ i ≤ k + 1. Let
    P_{k+1}(x) = Pk(x) + c(x − x0)(x − x1) · · · (x − xk).
For x = x_{k+1},
    P_{k+1}(x_{k+1}) = f(x_{k+1}) = Pk(x_{k+1}) + c(x_{k+1} − x0)(x_{k+1} − x1) · · · (x_{k+1} − xk)
⟹ c = [f(x_{k+1}) − Pk(x_{k+1})] / [(x_{k+1} − x0)(x_{k+1} − x1) · · · (x_{k+1} − xk)].
Since the xi are distinct, the polynomial P_{k+1}(x) is well-defined and deg P_{k+1} ≤ k + 1. Now
    P_{k+1}(xi) = Pk(xi) + 0 = Pk(xi) = f(xi), 0 ≤ i ≤ k
and
    P_{k+1}(x_{k+1}) = f(x_{k+1}).
The above two equations imply
    P_{k+1}(xi) = f(xi), 0 ≤ i ≤ k + 1.
Therefore P_{k+1}(x) interpolates f(x) at all k + 2 nodal points, and by mathematical induction the result holds for every n.
Uniqueness: Suppose there are two such polynomials Pn and Qn with
    Pn(xi) = f(xi)
    Qn(xi) = f(xi), 0 ≤ i ≤ n.
Define
    Sn(x) = Pn(x) − Qn(x).
Since both Pn and Qn have degree ≤ n, the degree of Sn is also ≤ n. Also
    Sn(xi) = Pn(xi) − Qn(xi) = f(xi) − f(xi) = 0, 0 ≤ i ≤ n.
This implies Sn has at least n + 1 zeros, which is impossible for a nonzero polynomial of degree at most n. This implies
    Sn ≡ 0 ⟹ Pn = Qn, ∀ x.
Therefore the interpolating polynomial is unique.
2.2. Linear Interpolation. We determine a polynomial
    P(x) = ax + b   (2.1)
where a and b are constants satisfying the interpolating conditions f(x0) = P(x0) and f(x1) = P(x1). We have
    f(x0) = P(x0) = a x0 + b
    f(x1) = P(x1) = a x1 + b.
Lagrange interpolation: Solving for a and b, we obtain
    a = [f(x0) − f(x1)] / (x0 − x1)
    b = [f(x0) x1 − f(x1) x0] / (x1 − x0).
Substituting these values in equation (2.1), we obtain
    P(x) = {[f(x0) − f(x1)] / (x0 − x1)} x + [f(x0) x1 − f(x1) x0] / (x1 − x0)
⟹ P(x) = [(x − x1)/(x0 − x1)] f(x0) + [(x − x0)/(x1 − x0)] f(x1)
⟹ P(x) = l0(x) f(x0) + l1(x) f(x1)
where l0(x) = (x − x1)/(x0 − x1) and l1(x) = (x − x0)/(x1 − x0).
These functions l0(x) and l1(x) are called the Lagrange fundamental polynomials and they satisfy the following conditions:
    l0(x) + l1(x) = 1,
    l0(x0) = 1, l0(x1) = 0,
    l1(x0) = 0, l1(x1) = 1
⟹ li(xj) = δij = { 1 if i = j; 0 if i ≠ j }.
Newton's divided difference interpolation: Again write P(x) in a different way, as follows:
    P(x) = [(x − x1)/(x0 − x1)] f(x0) + [(x − x0)/(x1 − x0)] f(x1)
         = [f(x0)(x − x1) − f(x1)(x − x0)] / (x0 − x1)
         = f(x0) + (x − x0) [f(x1) − f(x0)] / (x1 − x0)
         = f(x0) + (x − x0) f[x0, x1].
The ratio f[x0, x1] = [f(x1) − f(x0)] / (x1 − x0) is called the first divided difference of f(x).
Higher-order interpolation: In this section we take a different approach and assume that the interpolating polynomial is given as a linear combination of n + 1 polynomials of degree ≤ n. This time, we set the coefficients to be the interpolated values {f(xi)}_{i=0}^{n}, while the unknowns are the polynomials. We thus let
    Pn(x) = Σ_{i=0}^{n} f(xi) li(x),
where the li(x) are n + 1 polynomials of degree ≤ n. Note that in this particular case the polynomials li(x) are precisely of degree n (not merely ≤ n). However, Pn(x), given by the above equation, may have a lower degree. In either case, the degree of Pn(x) is n at the most. We now require that Pn(x) satisfies the interpolation conditions
    Pn(xj) = f(xj), 0 ≤ j ≤ n.
By substituting xj for x we have
    Pn(xj) = Σ_{i=0}^{n} f(xi) li(xj), 0 ≤ j ≤ n.
Therefore we may conclude that li(x) must satisfy
    li(xj) = δij, i, j = 0, 1, . . . , n
where δij is the Kronecker delta, defined as
    δij = 1 if i = j, and δij = 0 if i ≠ j.
Each polynomial li(x) has n + 1 unknown coefficients. The conditions given above through the Kronecker delta provide exactly n + 1 equations that each polynomial li(x) must satisfy, and these equations can be solved to determine all the li(x). Fortunately there is a shortcut. An obvious way of constructing polynomials li(x) of degree ≤ n that satisfy the condition is the following:
    li(x) = [(x − x0)(x − x1) · · · (x − x_{i−1})(x − x_{i+1}) · · · (x − xn)] / [(xi − x0)(xi − x1) · · · (xi − x_{i−1})(xi − x_{i+1}) · · · (xi − xn)].
The uniqueness of the interpolating polynomial of degree ≤ n given n + 1 distinct interpolation points implies that the polynomials li(x) given by the above relation are the only polynomials of degree ≤ n with this property.
Note that the denominator does not vanish since we assume that all interpolation points are distinct.
We can write the formula for li (x) in a compact form using the product notation.
    li(x) = [(x − x0)(x − x1) · · · (x − x_{i−1})(x − x_{i+1}) · · · (x − xn)] / [(xi − x0)(xi − x1) · · · (xi − x_{i−1})(xi − x_{i+1}) · · · (xi − xn)]
          = W(x) / [(x − xi) W′(xi)],  i = 0, 1, . . . , n
where
    W(x) = (x − x0) · · · (x − x_{i−1})(x − xi)(x − x_{i+1}) · · · (x − xn)
    W′(xi) = (xi − x0) · · · (xi − x_{i−1})(xi − x_{i+1}) · · · (xi − xn).
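As an illustration (not in the original notes), here is a direct Python sketch that builds the li(x) defined above and sums the Lagrange form; the data come from Example 2 below:

def lagrange_interp(xs, ys, x):
    """Evaluate Pn(x) = sum_i y_i * l_i(x) from the definition of the
    Lagrange fundamental polynomials."""
    total = 0.0
    n = len(xs)
    for i in range(n):
        li = 1.0
        for j in range(n):
            if j != i:
                li *= (x - xs[j]) / (xs[i] - xs[j])   # build l_i(x)
        total += ys[i] * li
    return total

# Interpolates (0,1), (1,2), (3,6), (5,7); at a node it reproduces the data.
print(lagrange_interp([0, 1, 3, 5], [1, 2, 6, 7], 1.0))   # = 2.0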
We can write the Newton divided difference formula in the following fashion (we will prove it in the next Theorem):
    Pn(x) = f(x0) + (x − x0) f[x0, x1] + (x − x0)(x − x1) f[x0, x1, x2] + · · ·
            + (x − x0)(x − x1) · · · (x − x_{n−1}) f[x0, x1, . . . , xn]
          = f(x0) + Σ_{i=1}^{n} f[x0, x1, . . . , xi] Π_{j=0}^{i−1} (x − xj).
Divided differences are calculated as follows:
    f[x0, x1, x2] = (f[x1, x2] − f[x0, x1]) / (x2 − x0)
                  = [ (f(x2) − f(x1))/(x2 − x1) − (f(x1) − f(x0))/(x1 − x0) ] / (x2 − x0)
                  = f(x0)/[(x0 − x1)(x0 − x2)] + f(x1)/[(x1 − x0)(x1 − x2)] + f(x2)/[(x2 − x0)(x2 − x1)].
In general,
    f[x0, x1, . . . , xn] = (f[x1, x2, . . . , xn] − f[x0, x1, . . . , x_{n−1}]) / (xn − x0)
                          = Σ_{i=0}^{n} f(xi) / Π_{j=0, j≠i}^{n} (xi − xj).
Example 2. Given the following four data points, find a polynomial in Lagrange and Newton form.
    xi | 0  1  3  5
    yi | 1  2  6  7
    P3(x) = f(x0) + (x − 0) f[0, 1] + (x − 0)(x − 1) f[0, 1, 3] + (x − 0)(x − 1)(x − 3) f[0, 1, 3, 5]
          = 1 + x + (1/3) x(x − 1) − (17/120) x(x − 1)(x − 3).
Note that the xi can be re-ordered but must be distinct. When the order of some xi is changed, one obtains the same polynomial, but written in a different form.
Remark 2.1. If more data points are added to the interpolation problem, all the Lagrange fundamental polynomials have to be recalculated, whereas in Newton form we need only append new terms; this is the great advantage of the Newton form.
Example 3. Let f(x) = √(x − x^2) and P2(x) be the interpolation polynomial on x0 = 0, x1 and x2 = 1. Find the largest value of x1 in (0, 1) for which f(0.5) − P2(0.5) = −0.25.
Sol. If f(x) = √(x − x^2) then our nodes are [x0, x1, x2] = [0, x1, 1] and f(x0) = 0, f(x1) = √(x1 − x1^2), f(x2) = 0. Therefore
    l0(x) = (x − x1)(x − x2) / [(x0 − x1)(x0 − x2)] = (x − x1)(x − 1) / x1,
    l1(x) = (x − x0)(x − x2) / [(x1 − x0)(x1 − x2)] = x(x − 1) / [x1(x1 − 1)],
    l2(x) = (x − x0)(x − x1) / [(x2 − x0)(x2 − x1)] = x(x − x1) / (1 − x1).
    P2(x) = l0(x) f(x0) + l1(x) f(x1) + l2(x) f(x2)
          = l0(x) · 0 + {x(x − 1) / [x1(x1 − 1)]} √(x1 − x1^2) + l2(x) · 0
          = −x(x − 1) / √(x1(1 − x1)).
If we now consider f(x) − P2(x), then
    f(x) − P2(x) = √(x − x^2) + x(x − 1)/√(x1(1 − x1)).
Hence f(0.5) − P2(0.5) = −0.25 implies
    √(0.5 − 0.5^2) + 0.5(0.5 − 1)/√(x1(1 − x1)) = −0.25.
Solving for x1 gives
    x1^2 − x1 = −1/9
or
    (x1 − 1/2)^2 = 5/36,
which gives x1 = 1/2 − √(5/36) or x1 = 1/2 + √(5/36).
The largest of these is therefore
    x1 = 1/2 + √(5/36) ≈ 0.8727.
Theorem 2.3. The unique polynomial of degree ≤ n that passes through (x0, y0), (x1, y1), . . . , (xn, yn) is given by
    Pn(x) = f[x0] + f[x0, x1](x − x0) + f[x0, x1, x2](x − x0)(x − x1) + · · ·
            + f[x0, . . . , xn](x − x0)(x − x1) · · · (x − x_{n−1}).
Proof. We prove it by induction. The unique polynomial of degree 0 that passes through (x0 , y0 )
is obviously
P0 (x) = y0 = f [x0 ].
Suppose that the polynomial Pk(x) of degree ≤ k that passes through (x0, y0), (x1, y1), . . . , (xk, yk) is
    Pk(x) = f[x0] + f[x0, x1](x − x0) + · · · + f[x0, . . . , xk](x − x0)(x − x1) · · · (x − x_{k−1}).
Write P_{k+1}(x), the unique polynomial of degree ≤ k + 1 that passes through (x0, y0), (x1, y1), . . . , (xk, yk), (x_{k+1}, y_{k+1}), as
    P_{k+1}(x) = f[x0] + f[x0, x1](x − x0) + · · · + f[x0, . . . , xk](x − x0) · · · (x − x_{k−1}) + C(x − x0)(x − x1) · · · (x − x_{k−1})(x − xk).
We only need to show that
C = f [x0 , x1 , , xk , xk+1 ].
For this, let Qk (x) be the unique polynomial of degree k that passes through (x1 , y1 ), , (xk , yk )(xk+1 , yk+1 ).
Define
x x0
R(x) = Pk (x) + [Qk (x) Pk (x)]
xk+1 x0
Then R(x) is a polynomial of degree ≤ k + 1, and
    R(x0) = Pk(x0) = y0,
    R(xi) = Pk(xi) + [(xi − x0)/(x_{k+1} − x0)] (Qk(xi) − Pk(xi)) = Pk(xi) = yi,  i = 1, . . . , k,
    R(x_{k+1}) = Qk(x_{k+1}) = y_{k+1}.
By the uniqueness, R(x) = Pk+1 (x).
The leading coefficient of P_{k+1}(x) is C. The leading coefficient of R(x) is the leading coefficient of [(x − x0)/(x_{k+1} − x0)][Qk(x) − Pk(x)], which is
    [leading coefficient of Qk(x) − leading coefficient of Pk(x)] / (x_{k+1} − x0).
On the other hand, the leading coefficient of Qk(x) is f[x1, . . . , x_{k+1}], and the leading coefficient of Pk(x) is f[x0, . . . , xk]. Therefore
    C = (f[x1, . . . , x_{k+1}] − f[x0, . . . , xk]) / (x_{k+1} − x0) = f[x0, x1, . . . , x_{k+1}].
Since Pn(xi) = f(xi) for i = 0, 1, . . . , n, the function g(x) = f(x) − Pn(x) has n + 1 distinct zeros in [a, b]. By the generalized Rolle's Theorem there exists ξ ∈ (a, b) such that
    g^(n)(ξ) = f^(n)(ξ) − Pn^(n)(ξ) = 0.
Here
    Pn^(n)(x) = n! f[x0, x1, . . . , xn].
Therefore
    f[x0, x1, . . . , xn] = f^(n)(ξ) / n!.
Truncation error: The polynomial P(x) coincides with f(x) at all nodal points and may deviate at other points in the interval. This deviation is called the truncation error and we write
    En(f; x) = f(x) − P(x).
Theorem 3.2. Suppose that x0, x1, . . . , xn are distinct numbers in [a, b] and f ∈ C^{n+1}[a, b]. Let Pn(x) be the unique polynomial of degree ≤ n that passes through the n + 1 nodal points. Then
    ∀ x ∈ [a, b], ∃ ξ ∈ (a, b)
such that
    En(f; x) = f(x) − Pn(x) = [(x − x0) · · · (x − xn) / (n + 1)!] f^(n+1)(ξ).
Proof. Let x0, x1, . . . , xn be distinct numbers in [a, b] and f ∈ C^{n+1}[a, b], and let Pn(x) be the unique polynomial of degree ≤ n that passes through the n + 1 nodal points. The truncation error in interpolation is given by
    En(f; x) = f(x) − Pn(x),
    En(f; xi) = 0, i = 0, 1, . . . , n.
Now for any t in the domain, define
    g(t) = f(t) − P(t) − {(t − x0) · · · (t − xn) / [(x − x0) · · · (x − xn)]} [f(x) − P(x)].   (3.1)
Now g(t) = 0 at t = x, x0, x1, . . . , xn. Therefore g(t) satisfies the conditions of the generalized Rolle's Theorem, which states that between n + 2 zeros of a function there is at least one zero of its (n + 1)th derivative. Hence there exists a point ξ such that
    g^(n+1)(ξ) = 0,
where ξ is some point with
    min(x0, x1, . . . , xn, x) < ξ < max(x0, x1, . . . , xn, x).
Now differentiating (3.1) (n + 1) times with respect to t, we get
    g^(n+1)(t) = f^(n+1)(t) − P^(n+1)(t) − {(n + 1)! / [(x − x0) · · · (x − xn)]} [f(x) − P(x)]
               = f^(n+1)(t) − {(n + 1)! / [(x − x0) · · · (x − xn)]} [f(x) − P(x)],
since P^(n+1)(t) = 0, P being a polynomial of degree ≤ n.
Setting g^(n+1)(ξ) = 0 and solving for f(x) − P(x), we obtain
    f(x) − P(x) = [(x − x0) · · · (x − xn) / (n + 1)!] f^(n+1)(ξ).
The truncation error is given by
    En(f; x) = f(x) − P(x) = [(x − x0) · · · (x − xn) / (n + 1)!] f^(n+1)(ξ).
For linear interpolation of f(x) = e^x on [0, 1] with equally spaced nodes xi = ih, the error on [xi, x_{i+1}] is bounded by
    |f(x) − p(x)| ≤ (e/2) max_{xi ≤ x ≤ x_{i+1}} |(x − ih)(x − (i + 1)h)|.
Consider the function g(x) = (x − ih)(x − (i + 1)h), for ih ≤ x ≤ (i + 1)h. Because
    g′(x) = (x − (i + 1)h) + (x − ih) = 2(x − ih − h/2),
the only critical point of g is at x = ih + h/2, with g(ih + h/2) = −(h/2)^2 = −h^2/4. Since g(ih) = 0 and g((i + 1)h) = 0, the maximum value of |g(x)| in [ih, (i + 1)h] must occur at the critical point, which implies that
    |f(x) − p(x)| ≤ (e/2) max_{xi ≤ x ≤ x_{i+1}} |g(x)| = (e/2)(h^2/4) = e h^2 / 8.
Consequently, to ensure that the error in linear interpolation is bounded by 10^(−6), it is sufficient for h to be chosen so that
    e h^2 / 8 ≤ 10^(−6).
This implies that h < 1.72 × 10^(−3).
Because n = (1 − 0)/h must be an integer, a reasonable choice for the step size is h = 0.001.
Example 7. Determine the step size h that can be used in the tabulation of a function f(x), a ≤ x ≤ b, at equally spaced nodal points so that the truncation error of quadratic interpolation is less than ε.
Sol. Let x0, x1, x2 be three equispaced points with spacing h. The truncation error of quadratic interpolation is given by
    |E2(f; x)| ≤ (M/3!) max_{a≤x≤b} |(x − x0)(x − x1)(x − x2)|
where M = max_{a≤x≤b} |f^(3)(x)|.
Let x = x0 + th, x1 = x0 + h, x2 = x0 + 2h; then
    |(x − x0)(x − x1)(x − x2)| = h^3 |t(t − 1)(t − 2)| = g(t) (say).
Now g(t) attains its extreme values where
    dg/dt = 0,
which gives t = 1 ± 1/√3. For both values of t we obtain max |g(t)| = 2h^3/(3√3). The truncation error then satisfies
    |E2(f; x)| < ε
⟹ M h^3 / (9√3) < ε
⟹ h < [9√3 ε / M]^(1/3).
Algorithm (Divided-Difference Algorithm; a Python sketch follows):
for i = 0, 1, . . . , n do
    c(i) = y(i) = f(x(i))
end for
for k = 1, . . . , n do
    for i = n, n − 1, . . . , k do
        c(i) = (c(i) − c(i − 1)) / (x(i) − x(i − k))
    end for
end for
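A direct Python transcription of this algorithm (not part of the original notes; the data are those of Example 2, whose coefficients are 1, 1, 1/3, −17/120):

def divided_differences(x, y):
    """Return c with c[k] = f[x0, ..., xk], computed in place exactly
    as in the algorithm above."""
    c = list(y)
    n = len(x) - 1
    for k in range(1, n + 1):
        for i in range(n, k - 1, -1):        # i = n, n-1, ..., k
            c[i] = (c[i] - c[i - 1]) / (x[i] - x[i - k])
    return c

print(divided_differences([0, 1, 3, 5], [1, 2, 6, 7]))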
Let x = xn + sh. Therefore
    Pn(x) = f(xn) + Σ_{k=1}^{n} f[xn, x_{n−1}, . . . , x_{n−k}] (x − xn) · · · (x − x_{n−k+1})
          = f(xn) + Σ_{k=1}^{n} f[xn, x_{n−1}, . . . , x_{n−k}] (sh)(s + 1)h · · · (s + k − 1)h
          = f(xn) + Σ_{k=1}^{n} f[xn, x_{n−1}, . . . , x_{n−k}] (−1)^k C(−s, k) h^k k!,
where the binomial coefficient is extended to include all real values of s:
    C(−s, k) = (−s)(−s − 1) · · · (−s − k + 1)/k! = (−1)^k s(s + 1) · · · (s + k − 1)/k!.
This formula is called the Newton backward divided-difference formula. As with the forward-difference operator, we introduce the backward-difference operator ∇:
    ∇f(xi) = f(xi) − f(x_{i−1}),
    ∇^k f(xi) = ∇^{k−1} ∇f(xi) = ∇^{k−1} [f(xi) − f(x_{i−1})].
Then
    f[xn, x_{n−1}] = (1/h) ∇f(xn)
    f[xn, x_{n−1}, x_{n−2}] = (1/(2! h^2)) ∇^2 f(xn).
In general,
    f[xn, x_{n−1}, x_{n−2}, . . . , x_{n−k}] = (1/(k! h^k)) ∇^k f(xn).
Therefore, by using the backward-difference operator, the Newton backward divided-difference formula can be written as
    Pn(x) = f(xn) + Σ_{k=1}^{n} (−1)^k C(−s, k) ∇^k f(xn).
This is the Newton backward difference interpolation formula.
Example 8. For the following data, calculate the differences and obtain the forward and backward difference polynomials. Interpolate at x = 0.25.
Interpolating at x = 0.25 gives
    f(0.25) ≈ P(0.25) = 1.655.
Differences and Derivatives:
Since
    Δf(x) = f(x + h) − f(x)
          = f(x) + h f′(x) + (h^2/2) f″(x) + · · · − f(x)
          = h f′(x) + O(h^2)
          ≈ h f′(x).
Similarly
    Δ^2 f(x) = f(x + 2h) − 2f(x + h) + f(x)
             = f(x) + 2h f′(x) + ((2h)^2/2) f″(x) + · · · − 2[f(x) + h f′(x) + (h^2/2) f″(x) + · · ·] + f(x)
             = h^2 f″(x) + h^3 f‴(x) + · · ·
⟹ f″(x) ≈ Δ^2 f(x) / h^2.
Similarly we can obtain higher-order derivatives.
    ∂E/∂a = −2 Σ_{i=1}^{n} [yi − (a + b xi)] = 0
⟹ Σ_{i=1}^{n} yi = n a + b Σ_{i=1}^{n} xi   (5.1)
    ∂E/∂b = −2 Σ_{i=1}^{n} [yi − (a + b xi)] xi = 0
⟹ Σ_{i=1}^{n} xi yi = a Σ_{i=1}^{n} xi + b Σ_{i=1}^{n} xi^2.   (5.2)
Equations (5.1)-(5.2) are called the normal equations; they are solved to obtain the desired values of a and b (a Python sketch follows).
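A minimal Python sketch that solves the normal equations (5.1)-(5.2) in closed form (not part of the original notes; the test data are those of Example 14 below, whose fitted line is y ≈ 11.2857 + 0.6971x):

def lsq_line(x, y):
    """Fit y = a + b x by least squares via the normal equations."""
    n = len(x)
    Sx, Sy = sum(x), sum(y)
    Sxx = sum(t * t for t in x)
    Sxy = sum(s * t for s, t in zip(x, y))
    # Solve: Sy = n*a + b*Sx  and  Sxy = a*Sx + b*Sxx.
    b = (n * Sxy - Sx * Sy) / (n * Sxx - Sx * Sx)
    a = (Sy - b * Sx) / n
    return a, b

a, b = lsq_line([0, 5, 10, 15, 20, 25], [12, 15, 17, 22, 24, 30])
print(a, b)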
Example 9. Obtain the least square straight line fit to the following data
Example 10. Find the least squares approximation of second degree for the discrete data
    x    | −2  −1  0  1   2
    f(x) | 15   1  1  3  19
Sol. We fit y = a + bx + cx^2. The normal equations are
    Σ f(xi) = 5a + b Σ xi + c Σ xi^2
    Σ xi f(xi) = a Σ xi + b Σ xi^2 + c Σ xi^3
    Σ xi^2 f(xi) = a Σ xi^2 + b Σ xi^3 + c Σ xi^4,
all sums running over i = 1 to 5. We have Σ xi = 0, Σ f(xi) = 39, Σ xi f(xi) = 10, Σ xi^2 = 10, Σ xi^3 = 0, Σ xi^4 = 34, Σ xi^2 f(xi) = 140.
From the given data
    5a + 10c = 39
    10b = 10
    10a + 34c = 140.
The solution of this system is a = −37/35, b = 1, and c = 31/7.
The required approximation is y = (1/35)(−37 + 35x + 155x^2).
Example 11. Use the method of least squares to fit the curve f(x) = c0 x + c1/√x. Also find the least squares error.
Sol. By the principle of least squares, we minimize the error
    E(c0, c1) = Σ_{i=1}^{5} [f(xi) − c0 xi − c1/√xi]^2.
We obtain the normal equations
    c0 Σ_{i=1}^{5} xi^2 + c1 Σ_{i=1}^{5} √xi = Σ_{i=1}^{5} xi f(xi)
    c0 Σ_{i=1}^{5} √xi + c1 Σ_{i=1}^{5} (1/xi) = Σ_{i=1}^{5} f(xi)/√xi.
We have
    Σ √xi = 4.1163, Σ 1/xi = 11.8333, Σ xi^2 = 5.38,
    Σ xi f(xi) = 24.9, Σ f(xi)/√xi = 85.0151.
The normal equations are therefore
    5.38 c0 + 4.1163 c1 = 24.9
    4.1163 c0 + 11.8333 c1 = 85.0151,
whose solution is c0 = −1.1836, c1 = 7.5961.
Therefore, the least squares fit is given by
    f(x) = −1.1836 x + 7.5961/√x.
The least squares error is given by
    E = Σ_{i=1}^{5} [f(xi) + 1.1836 xi − 7.5961/√xi]^2 = 1.6887.
Example 12. Obtain the least squares fit of the form y = a b^x to the following data
x 1 2 3 4 5 6 7 8
f (x) 1.0 1.2 1.8 2.5 3.6 4.7 6.6 9.1
Sol. The curve y = a b^x takes the form Y = A + Bx after taking logarithms, where Y = log y, A = log a and B = log b.
Hence the normal equations are given by
    Σ_{i=1}^{8} Yi = 8A + B Σ_{i=1}^{8} xi
    Σ_{i=1}^{8} xi Yi = A Σ_{i=1}^{8} xi + B Σ_{i=1}^{8} xi^2.
From the data, we form the following table.
    x    y     Y = log y   xY        x^2
    1    1.0   0.0         0.0        1
    2    1.2   0.0792      0.1584     4
    3    1.8   0.2553      0.7659     9
    4    2.5   0.3979      1.5916    16
    5    3.6   0.5563      2.7815    25
    6    4.7   0.6721      4.0326    36
    7    6.6   0.8195      5.7365    49
    8    9.1   0.9590      7.6720    64
    Σ:  36    30.5   3.7393   22.7385   204
Example 13. We are given the values of a function of the variable t. Obtain a least squares fit of the form f(t) = a e^(−3t) + b e^(−2t).
Sol. The normal equations are
    a Σ_{i=1}^{4} e^(−6ti) + b Σ_{i=1}^{4} e^(−5ti) − Σ_{i=1}^{4} fi e^(−3ti) = 0
    a Σ_{i=1}^{4} e^(−5ti) + b Σ_{i=1}^{4} e^(−4ti) − Σ_{i=1}^{4} fi e^(−2ti) = 0.
Using the table values, we obtain the system of equations
    1.106023a + 1.332876b − 1.165420 = 0
    1.332876a + 1.762274b − 1.409764 = 0,
which has the solution a = 0.6853, b = 0.3058.
Therefore the least squares fit is given by
    f(t) = 0.6853 e^(−3t) + 0.3058 e^(−2t).
Remark 5.1. If the data values are quite large, we can reduce them by shifting the origin and scaling appropriately.
Example 14. Show that the line of fit to the following data is given by y = 0.7x + 11.28.
x 0 5 10 15 20 25
y 12 15 17 22 24 30
Sol. Here n = 6. We fit a line of the form y = A + Bx.
Let u = (x − 15)/5, v = y − 20, and fit a line of the form v = a + bu.
    x    y    u    v    uv   u^2
    0    12  −3   −8    24    9
    5    15  −2   −5    10    4
    10   17  −1   −3     3    1
    15   22   0    2     0    0
    20   24   1    4     4    1
    25   30   2   10    20    4
    Σ:        −3    0    61   19
The normal equations are
    0 = 6a − 3b
    61 = −3a + 19b.
Solving, a = 1.7428 and b = 3.4857.
Therefore the equation of the line is v = 1.7428 + 3.4857u.
Changing back to the original variables, we obtain
    y − 20 = 1.7428 + 3.4857 (x − 15)/5
⟹ y = 11.2857 + 0.6971x.
Exercises
(1) Find the unique polynomial P (x) of degree 2 or less such that
P (1) = 1, P (3) = 27, P (4) = 64
using Lagrange and Newton interpolation. Evaluate P (1.05).
(2) Let P3(x) be the Lagrange interpolating polynomial for the data (0, 0), (0.5, y), (1, 3) and (2, 2). Find y if the coefficient of x^3 in P3(x) is 6.
(3) Calculate a quadratic interpolant in Newton form to e^0.826 from the function values
    e^0.82 = 2.270500, e^0.83 = 2.293319, e^0.84 = 2.316367.
(4) Let f (x) = ln(1 + x), x0 = 1, x1 = 1.1. Use Lagrange linear interpolation to find the
approximate value of f (1.04) and obtain a bound on the truncation error.
(5) Use the following values and four-digit rounding arithmetic to construct a third degree Lagrange
polynomial approximation to f (1.09). The function being approximated is f (x) = log10 (tan x).
Use this knowledge to find a bound for the error in the approximation.
f (1.00) = 0.1924, f (1.05) = 0.2414, f (1.10) = 0.2933, f (1.15) = 0.3492.
(6) Determine the step size h that can be used in the tabulation of a function f(x), a ≤ x ≤ b, at equally spaced nodal points so that the truncation error of cubic interpolation is less than ε.
(7) If linear interpolation is used to interpolate the error function
    f(x) = (2/√π) ∫_0^x e^(−t^2) dt,
show that the error of linear interpolation using data (x0, f0) and (x1, f1) cannot exceed (x1 − x0)^2 / (2√(2πe)).
(8) Suppose that f(x) = e^x cos x is to be approximated on [0, 1] by an interpolating polynomial on n + 1 equally spaced points. Determine n so that the truncation error will be less than 0.0001 in this interval.
(9) The following data represent the function f(x) = e^x.
    x    | 1       1.5     2.0     2.5
    f(x) | 2.7183  4.4817  7.3891  12.1825
Estimate the value of f(2.25) using Newton's forward and backward difference interpolation. Compare with the exact value. Also obtain a bound on the truncation error.
(10) Construct the interpolating polynomial that fits the following data using Newton forward and
backward difference interpolation.
x 0 0.1 0.2 0.3 0.4 0.5
f (x) 1.5 1.27 0.98 0.63 0.22 0.25
Hence find the values of f (x) at x = 0.15 and 0.45.
(11) The error function erf(x) is defined by the integral
    erf(x) = (2/√π) ∫_0^x e^(−t^2) dt.
(A) Approximate erf(0.08) by linear interpolation in the given table of correctly rounded values. Estimate the total error.
x 0.05 0.10 0.15 0.20
erf (x) 0.05637 0.11246 0.16800 0.22270
(B) Suppose that the table were given with 7 correct decimals and with step size 0.001. Find the maximum total error for linear interpolation in the interval 0 ≤ x ≤ 0.10 in this table.
(12) Determine the spacing h in a table of equally spaced values of the function f(x) = √x between 1 and 2, so that interpolation with a quadratic polynomial will yield an accuracy of 5 × 10^(−8).
(13) The following data are part of a table for the function g(x) = sin x / x^2.
x 0.1 0.2 0.3 0.4 0.5
f (x) 9.9833 4.9667 3.2836 2.4339 1.9177
Calculate g(0.25) as accurately as possible
(a) by interpolating directly in this table, (b) by first calculating xg(x) and then interpolating
directly in that table, (c) explain the difference between the results obtained in (a) and (b),
respectively.
(14) By the method of least squares, fit a curve of the form y = a x^b to the following data
x 2 3 4 5
y 27.8 62.1 110 161
(15) Determine the least squares approximation of the type ax^2 + bx + c to the function 2^x at the points xi = 0, 1, 2, 3, 4.
(16) Experiments with a periodic process gave the following data :
t 0 50 100 150 200
y 0.754 1.762 2.041 1.412 0.303
Estimate the parameters a and b in the model y = a + b sin t, using the least squares approximation.
Bibliography
[Gerald] Curtis F. Gerald and Patrick O. Wheatley, Applied Numerical Analysis, 7th edition, Pearson, 2003.
[Atkinson] K. Atkinson and W. Han, Elementary Numerical Analysis, 3rd edition, John Wiley and Sons, 2004.
CHAPTER 5 (4 LECTURES)
NUMERICAL INTEGRATION
1. Introduction
The general problem is to find the approximate value of the integral of a given function f (x) over
an interval [a, b]. Thus
    I = ∫_a^b f(x) dx.   (1.1)
The problem can be solved using the Fundamental Theorem of Calculus by finding an anti-derivative F of f, that is, F′(x) = f(x), and then
    ∫_a^b f(x) dx = F(b) − F(a).
But finding an anti-derivative is not an easy task in general. Hence, it is certainly not a good approach for numerical computations.
In this chapter we'll study methods for constructing integration rules. We'll also consider composite versions of these rules and the errors associated with them.
Therefore
    ∫_a^b f(x) dx = ∫_a^b Pn(x) dx + ∫_a^b en(x) dx
                  = Σ_{i=0}^{n} f(xi) ∫_a^b li(x) dx + (1/(n + 1)!) ∫_a^b f^(n+1)(ξ) Π_{i=0}^{n} (x − xi) dx
                  = Σ_{i=0}^{n} λi f(xi) + En
where
    λi = ∫_a^b li(x) dx.
We can also use Newton divided difference interpolation to approximate the function f(x).
Before we proceed, we describe an alternative method to analyze the error, based on the method of undetermined coefficients. A quadrature formula is said to be of order p if it provides exact results for all polynomials of degree less than or equal to p. Now, if the above method gives exact results for polynomials of degree less than or equal to n, then the error term is zero for all polynomials of degree ≤ n.
If |f^(n+1)(ξ)| ≤ M, then the error term can be written as
    |En| ≤ (M/(n + 1)!) | ∫_a^b Π_{i=0}^{n} (x − xi) dx | = C M / (n + 1)!.
The number C is called the error constant. Using this notation, we can write the error term as
    En = (C/(n + 1)!) f^(n+1)(ξ).
3. Newton-Cotes Formula
Let all nodes be equally spaced with spacing h = (b − a)/n. The number h is also called the step length. Let x0 = a and xn = b; then xi = a + ih, i = 0, 1, . . . , n.
The general quadrature formula is given by
    ∫_a^b f(x) dx = Σ_{i=0}^{n} λi f(xi) + En.
Therefore
    λi = ∫_a^b li(x) dx = h ∫_0^n Π_{j=0, j≠i}^{n} (t − j)/(i − j) dt   (x = a + th, dx = h dt).
For n = 1: x0 = a, x1 = b, h = b − a, and we use linear interpolation. The values of the multipliers are given by
    λ0 = h ∫_0^1 (t − 1)/(0 − 1) dt = h/2,
    λ1 = h ∫_0^1 (t − 0)/(1 − 0) dt = h/2.
Hence
    ∫_a^b f(x) dx ≈ λ0 f(x0) + λ1 f(x1) = (h/2)[f(a) + f(b)].
This is called the Trapezoidal rule. Now the error is given by
    E1 = (1/2) ∫_a^b f″(ξ(x)) (x − a)(x − b) dx.
Since (x − a)(x − b) does not change its sign in [a, b], by the Weighted Mean-Value Theorem there exists ξ ∈ (a, b) such that
    E1 = (1/2) f″(ξ) ∫_a^b (x − a)(x − b) dx
       = −(1/2) f″(ξ) (b − a)^3 / 6
       = −(h^3/12) f″(ξ).
The Trapezoidal rule (with error) is given by
    ∫_a^b f(x) dx = (h/2)[f(a) + f(b)] − (h^3/12) f″(ξ).
Geometrically, the approximation is the area of the trapezium (trapezoid) with width h and parallel sides f(a) and f(b).
For n = 2: We take x0 = a, x1 = (a + b)/2, x2 = b. The values of the multipliers are given by
    λ0 = h ∫_0^2 (t − 1)(t − 2)/[(0 − 1)(0 − 2)] dt = h/3,
    λ1 = h ∫_0^2 (t − 0)(t − 2)/[(1 − 0)(1 − 2)] dt = 4h/3,
    λ2 = h ∫_0^2 (t − 0)(t − 1)/[(2 − 0)(2 − 1)] dt = h/3.
Hence
    ∫_a^b f(x) dx ≈ λ0 f(x0) + λ1 f(x1) + λ2 f(x2) = (h/3)[f(a) + 4f((a + b)/2) + f(b)].
This is Simpson's 1/3 rule.
To calculate the error in Simpson's rule, we use the alternative method, making the rule exact for all polynomials of degree up to 2, so that
    E2 = (C/3!) f^(3)(ξ).
Since Simpson's rule is exact for polynomials of degree up to 2,
    C = ∫_a^b x^3 dx − (h/3)[a^3 + 4((a + b)/2)^3 + b^3] = 0.
This implies that Simpson's rule is exact for polynomials up to degree 3 as well. Therefore the error is given by
    E3 = (C/4!) f^(4)(ξ),
    C = ∫_a^b x^4 dx − (h/3)[a^4 + 4((a + b)/2)^4 + b^4] = −(b − a)^5 / 120.
Hence the error in Simpson's rule is given by
    E3 = −[(b − a)^5 / (120 · 4!)] f^(4)(ξ)
       = −(h^5/90) f^(4)(ξ).
For n = 3: Four nodal points a = x0, x1, x2, x3 = b with h = (b − a)/3. We get Simpson's 3/8 rule:
    ∫_a^b f(x) dx ≈ (3h/8)[f(x0) + 3f(x1) + 3f(x2) + f(x3)].
The error in Simpson's three-eighths rule is given by
    E4 = −(3/80) h^5 f^(4)(ξ).
Example 1. Find the value of the integral
    I = ∫_0^1 dx/(1 + x)
using the trapezoidal and Simpson's rules. Also obtain a bound on the errors. Compare with the exact value.
Sol. Here
    f(x) = 1/(1 + x).
By the trapezoidal rule, with a = 0, b = 1, h = b − a = 1,
    IT = (h/2)[f(a) + f(b)] = (1/2)[1 + 1/2] = 0.75.
Exact value:
    Iexact = ln 2 = 0.693147.
Error = |0.75 − 0.693147| = 0.056853.
The error bound for the trapezoidal rule is given by
    |E1| ≤ (h^3/12) max_{0≤x≤1} |f″(x)| = (1/12) max_{0≤x≤1} 2/(1 + x)^3 = 1/6.
Similarly, by using Simpson's rule with h = (b − a)/2 = 1/2, we obtain
    IS = (h/3)[f(0) + 4f(1/2) + f(1)] = (1/6)(1 + 8/3 + 1/2) = 0.69444.
Error = |0.69444 − 0.693147| = 0.001297.
The error bound for Simpson's rule is given by
    |E3| ≤ (h^5/90) max_{0≤x≤1} |f^(4)(x)| = (1/2880) max_{0≤x≤1} 24/(1 + x)^5 = 0.008333.
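The three basic Newton-Cotes rules above are easy to check numerically; here is a minimal Python sketch (not part of the original notes) applied to the integrand of Example 1:

import math

def trapezoid(f, a, b):
    """Trapezoidal rule on a single interval."""
    h = b - a
    return h / 2 * (f(a) + f(b))

def simpson13(f, a, b):
    """Simpson's 1/3 rule: h = (b-a)/2, nodes a, (a+b)/2, b."""
    h = (b - a) / 2
    return h / 3 * (f(a) + 4 * f((a + b) / 2) + f(b))

def simpson38(f, a, b):
    """Simpson's 3/8 rule: h = (b-a)/3, four equally spaced nodes."""
    h = (b - a) / 3
    return 3 * h / 8 * (f(a) + 3 * f(a + h) + 3 * f(a + 2 * h) + f(b))

f = lambda x: 1 / (1 + x)
print(trapezoid(f, 0, 1), simpson13(f, 0, 1), math.log(2))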
Example 2. Find the quadrature formula
    ∫_0^1 f(x)/√(x(1 − x)) dx = λ1 f(0) + λ2 f(1/2) + λ3 f(1)
by the method of undetermined coefficients, exact for polynomials of highest possible degree. Then use the formula to evaluate
    ∫_0^1 dx/√(x − x^3)
and compare with the exact value.
Sol. We make the method exact for polynomials up to degree 2:
    f(x) = 1 :  I1 = ∫_0^1 dx/√(x(1 − x)) = λ1 + λ2 + λ3
    f(x) = x :  I2 = ∫_0^1 x dx/√(x(1 − x)) = (1/2)λ2 + λ3
    f(x) = x^2 :  I3 = ∫_0^1 x^2 dx/√(x(1 − x)) = (1/4)λ2 + λ3.
Now
    I1 = ∫_0^1 dx/√(x(1 − x)) = ∫_0^1 2 dx/√(1 − (2x − 1)^2) = ∫_{−1}^{1} dt/√(1 − t^2) = [sin^(−1) t]_{−1}^{1} = π   (t = 2x − 1).
Similarly
    I2 = π/2, I3 = 3π/8.
Therefore
    λ1 + λ2 + λ3 = π
    (1/2)λ2 + λ3 = π/2
    (1/4)λ2 + λ3 = 3π/8.
By solving these equations, we obtain λ1 = π/4, λ2 = π/2, λ3 = π/4.
Hence
    ∫_0^1 f(x)/√(x(1 − x)) dx = (π/4)[f(0) + 2f(1/2) + f(1)].
Now
    I = ∫_0^1 dx/√(x − x^3) = ∫_0^1 [1/√(1 + x)] dx/√(x(1 − x)) = ∫_0^1 f(x) dx/√(x(1 − x)),
so here f(x) = 1/√(1 + x). By using the above formula, we obtain
    I = (π/4)[1 + 2√(2/3) + 1/√2] = 2.62331.
The exact value of the integral is
    I = 2.6220575.
4. Gauss Quadrature
In the numerical integration method, if both the nodes xi and the multipliers λi are unknown, the method is called Gaussian quadrature. We can obtain the unknowns by making the method exact for polynomials of degree as high as required. The formulas are derived for the interval [−1, 1], and any interval [a, b] can be transformed to [−1, 1] by the transformation x = At + B, which gives a = −A + B and b = A + B; solving, we get
    x = [(b − a)/2] t + (b + a)/2.
We consider
    ∫_{−1}^{1} w(x) f(x) dx = Σ_{i=0}^{n} λi f(xi)
where w(x) is an appropriate weight function.
Gauss-Legendre Integration Methods: In this integration procedure we take w(x) = 1. The integration formula is given by
    ∫_{−1}^{1} f(x) dx = Σ_{i=0}^{n} λi f(xi).
One-point formula: The formula is given by
    ∫_{−1}^{1} f(x) dx = λ0 f(x0).
The method has two unknowns, λ0 and x0. Making the method exact for f(x) = 1, x, we obtain
    f(x) = 1 :  ∫_{−1}^{1} dx = 2 = λ0
    f(x) = x :  ∫_{−1}^{1} x dx = 0 = λ0 x0 ⟹ x0 = 0.
Therefore the one-point formula is given by
    ∫_{−1}^{1} f(x) dx ≈ 2f(0).
The error in the approximation is given by
    E1 = (C/2!) f″(ξ),
where the error constant C is given by
    C = ∫_{−1}^{1} x^2 dx − 2 · 0^2 = 2/3.
Hence
    E1 = (1/3) f″(ξ), −1 < ξ < 1.
Two-point formula:
    ∫_{−1}^{1} f(x) dx = λ0 f(x0) + λ1 f(x1).
The method has four unknowns. Making the method exact for f(x) = 1, x, x^2, x^3, we obtain
    f(x) = 1 :  ∫_{−1}^{1} dx = 2 = λ0 + λ1   (4.1)
    f(x) = x :  ∫_{−1}^{1} x dx = 0 = λ0 x0 + λ1 x1   (4.2)
    f(x) = x^2 :  ∫_{−1}^{1} x^2 dx = 2/3 = λ0 x0^2 + λ1 x1^2   (4.3)
    f(x) = x^3 :  ∫_{−1}^{1} x^3 dx = 0 = λ0 x0^3 + λ1 x1^3.   (4.4)
Solving these gives λ0 = λ1 = 1 and x0 = −1/√3, x1 = 1/√3, so the two-point formula is
    ∫_{−1}^{1} f(x) dx ≈ f(−1/√3) + f(1/√3).
Three-point formula:
    ∫_{−1}^{1} f(x) dx = λ0 f(x0) + λ1 f(x1) + λ2 f(x2).
The method has six unknowns. Making the method exact for f(x) = 1, x, x^2, x^3, x^4, x^5, we obtain
    f(x) = 1 :  2 = λ0 + λ1 + λ2
    f(x) = x :  0 = λ0 x0 + λ1 x1 + λ2 x2
    f(x) = x^2 :  2/3 = λ0 x0^2 + λ1 x1^2 + λ2 x2^2
    f(x) = x^3 :  0 = λ0 x0^3 + λ1 x1^3 + λ2 x2^3
    f(x) = x^4 :  2/5 = λ0 x0^4 + λ1 x1^4 + λ2 x2^4
    f(x) = x^5 :  0 = λ0 x0^5 + λ1 x1^5 + λ2 x2^5.
By solving these equations, we obtain λ0 = λ2 = 5/9, λ1 = 8/9, x0 = −√(3/5), x1 = 0 and x2 = √(3/5).
Therefore the formula is given by
    ∫_{−1}^{1} f(x) dx ≈ (1/9)[5f(−√(3/5)) + 8f(0) + 5f(√(3/5))].
The error in the three-point formula is
    E5 = (1/15750) f^(6)(ξ), −1 < ξ < 1.
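A minimal Python sketch of the three-point Gauss-Legendre rule on a general interval (not part of the original notes; it uses the nodes and weights derived above together with the change of variable x = ((b − a)/2)t + (b + a)/2, and is checked on the integral of Example 3 below):

import math

def gauss_legendre_3(f, a, b):
    """Three-point Gauss-Legendre rule on [a, b]."""
    half, mid = (b - a) / 2, (b + a) / 2
    t = math.sqrt(3 / 5)
    s = 5 * f(mid - half * t) + 8 * f(mid) + 5 * f(mid + half * t)
    return half * s / 9

f = lambda x: 2 * x / (1 + x ** 4)
print(gauss_legendre_3(f, 1, 2))        # ≈ 0.5406
print(math.atan(4) - math.pi / 4)       # exact value ≈ 0.5404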
Example 3. Evaluate
    I = ∫_1^2 2x/(1 + x^4) dx
using the Gauss-Legendre 1- and 2-point formulas. Also compare with the exact value.
Sol. First we change the interval [1, 2] to [−1, 1] by taking x = (t + 3)/2, dx = dt/2:
    I = ∫_1^2 2x/(1 + x^4) dx = ∫_{−1}^{1} 8(t + 3)/(16 + (t + 3)^4) dt.
Let
    f(t) = 8(t + 3)/(16 + (t + 3)^4).
By the 1-point formula:
    I ≈ 2f(0) = 0.4948.
By the 2-point formula:
    I ≈ f(−1/√3) + f(1/√3) = 0.5434.
The exact value of the integral is
    I = ∫_1^2 2x/(1 + x^4) dx = tan^(−1) 4 − π/4 = 0.5408.
Example 4. Evaluate
    I = ∫_{−1}^{1} (1 − x^2)^(3/2) cos x dx
using the Gauss-Legendre 3-point formula.
Sol. Using the Gauss-Legendre 3-point formula with f(x) = (1 − x^2)^(3/2) cos x, we obtain
    I ≈ (1/9)[5f(−√(3/5)) + 8f(0) + 5f(√(3/5))]
      = (1/9)[5 (2/5)^(3/2) cos√(3/5) + 8 + 5 (2/5)^(3/2) cos√(3/5)]
      = 1.08979.
5. Composite Integration
As the order of the integration method is increased, the order of the derivative involved in the error term also increases. Therefore, we can use a higher-order method only if the integrand is differentiable up to the required degree. Alternatively, we can apply lower-order methods by dividing the whole interval into subintervals and then using any Newton-Cotes or Gauss quadrature method on each subinterval separately.
Composite Trapezoidal Method: We divide the interval [a, b] into N subintervals with step size
h = (b − a)/N, taking nodal points a = x0 < x1 < · · · < xN = b where xi = x0 + ih, i = 1, 2, . . . , N − 1.
Now
    I = ∫_a^b f(x) dx = ∫_{x0}^{x1} f(x) dx + ∫_{x1}^{x2} f(x) dx + · · · + ∫_{x_{N−1}}^{xN} f(x) dx.
Now using the trapezoidal rule for each of the integrals on the right side, we obtain
    I ≈ (h/2)[(f0 + f1) + (f1 + f2) + · · · + (f_{N−1} + fN)]
      = (h/2)[f0 + 2(f1 + f2 + · · · + f_{N−1}) + fN]
where fi = f(xi), i = 0, 1, . . . , N. This formula is the composite trapezoidal rule. The error in the composite integration is given by
    E = −(h^3/12)[f″(ξ1) + f″(ξ2) + · · · + f″(ξN)]
where x_{i−1} ≤ ξi ≤ xi, i = 1, 2, . . . , N. The error in the numerical approximation decreases as N increases, since h = (b − a)/N.
Composite Simpson's Method: Simpson's rule requires three abscissas. We divide the interval [a, b] into 2N subintervals (to get an odd number of abscissas) with step size h = (b − a)/(2N), taking nodal points a = x0 < x1 < · · · < x_{2N} = b where xi = x0 + ih, i = 1, 2, . . . , 2N − 1. We write
    I = ∫_a^b f(x) dx = ∫_{x0}^{x2} f(x) dx + ∫_{x2}^{x4} f(x) dx + · · · + ∫_{x_{2N−2}}^{x_{2N}} f(x) dx.
Now using Simpson's rule for each of the integrals on the right side, we obtain
    I ≈ (h/3)[(f0 + 4f1 + f2) + (f2 + 4f3 + f4) + · · · + (f_{2N−2} + 4f_{2N−1} + f_{2N})]
      = (h/3)[f0 + 4(f1 + f3 + · · · + f_{2N−1}) + 2(f2 + f4 + · · · + f_{2N−2}) + f_{2N}].
This formula is called the composite Simpson's rule. The error in the integration rule is given by
    E = −(h^5/90)[f^(4)(ξ1) + f^(4)(ξ2) + · · · + f^(4)(ξN)]
where x_{2i−2} ≤ ξi ≤ x_{2i}, i = 1, 2, . . . , N.
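Both composite rules translate directly into short loops; here is a minimal Python sketch (not part of the original notes), checked against Example 5 below:

def composite_trapezoid(f, a, b, N):
    """Composite trapezoidal rule with N subintervals."""
    h = (b - a) / N
    s = f(a) + f(b) + 2 * sum(f(a + i * h) for i in range(1, N))
    return h * s / 2

def composite_simpson(f, a, b, N):
    """Composite Simpson's rule with 2N subintervals (2N + 1 nodes)."""
    h = (b - a) / (2 * N)
    odd = sum(f(a + i * h) for i in range(1, 2 * N, 2))
    even = sum(f(a + i * h) for i in range(2, 2 * N, 2))
    return h * (f(a) + 4 * odd + 2 * even + f(b)) / 3

f = lambda x: 1 / (1 + x)
print(composite_trapezoid(f, 0, 1, 4))   # 0.69702
print(composite_simpson(f, 0, 1, 2))     # 0.69325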
Example 5. Evaluate the integral
    I = ∫_0^1 dx/(1 + x)
by using the composite trapezoidal and Simpson's rules with 2 and 4 subintervals.
by using the composite trapezoidal and Simpsons rule with 2 and 4 subintervals.
Sol. Let IT and IS represent the values of the integral by composite trapezoidal and composite
Simpsons rule, respectively.
Case I: Number of subintervals N = 2, then h = (b − a)/N = 1/2. Therefore we have two subintervals for the trapezoidal rule and one application of Simpson's rule.
We have
    IT = (1/4)[f(0) + 2f(1/2) + f(1)] = 0.70833,
    IS = (1/6)[f(0) + 4f(1/2) + f(1)] = 0.69444.
Case II: Number of subintervals N = 4, then h = 1/4. We have four subintervals for the trapezoidal rule and two applications of Simpson's rule.
    IT = (1/8)[f(0) + 2(f(1/4) + f(1/2) + f(3/4)) + f(1)] = 0.69702,
    IS = (1/12)[f(0) + 4f(1/4) + 2f(1/2) + 4f(3/4) + f(1)] = 0.69325.
Example 6. Evaluate
    I = ∫_0^1 dx/(1 + x)
by subdividing the interval [0, 1] into two equal parts and then using the Gauss-Legendre three-point formula.
Sol. Recall
    ∫_{−1}^{1} f(x) dx ≈ (1/9)[5f(−√(3/5)) + 8f(0) + 5f(√(3/5))].
Let
    I = ∫_0^1 dx/(1 + x) = ∫_0^{1/2} dx/(1 + x) + ∫_{1/2}^1 dx/(1 + x) = I1 + I2.
Now substitute x = (t + 1)/4 and x = (z + 3)/4 in I1 and I2, respectively, to change the limits to [−1, 1]; we have dx = dt/4 and dx = dz/4 for I1 and I2, respectively. Therefore
    I1 = ∫_{−1}^{1} dt/(t + 5) = (1/9)[5/(5 − √(3/5)) + 8/5 + 5/(5 + √(3/5))] = 0.405464
    I2 = ∫_{−1}^{1} dz/(z + 7) = (1/9)[5/(7 − √(3/5)) + 8/7 + 5/(7 + √(3/5))] = 0.287682.
Hence
    I = I1 + I2 = 0.405464 + 0.287682 = 0.693146.
Example 7. The area A inside the closed curve y^2 + x^2 = cos x is given by
    A = 4 ∫_0^α (cos x − x^2)^(1/2) dx
where α is the positive root of the equation cos x = x^2.
(a) Compute α with three correct decimals.
(b) Use the trapezoidal rule to compute the area A with an absolute error less than 0.05.
Sol. (a) Using Newton's method to find the root of the equation
    f(x) = cos x − x^2 = 0,
we obtain the iteration scheme
    x_{k+1} = xk + (cos xk − xk^2)/(sin xk + 2xk), k = 0, 1, 2, . . .
Starting with x0 = 0.5, we obtain
    x1 = 0.5 + 0.62758/1.47942 = 0.92420
    x2 = 0.92420 − 0.25169/2.64655 = 0.82911
    x3 = 0.82911 − 0.011882/2.39554 = 0.82414
    x4 = 0.82414 − 0.000033/2.38226 = 0.82413.
Exercises
(1) Given
    I = ∫_0^1 x e^x dx.
Approximate the value of I using the trapezoidal and Simpson's one-third methods. Also obtain the error bounds and compare with the exact value of the integral.
(2) Evaluate
    I = ∫_0^1 dx/(1 + x^2)
using the trapezoidal and Simpson's rules with 4 and 6 subintervals. Compare with the exact value of the integral.
(3) Compute
    Ip = ∫_0^1 x^p/(x^3 + 10) dx
for p = 0, 1 using the trapezoidal and Simpson's rules with 3, 5 and 9 nodes.
(4) The length of the curve represented by a function y = f(x) on an interval [a, b] is given by the integral
    I = ∫_a^b √(1 + [f′(x)]^2) dx.
Use the trapezoidal rule and Simpson's rule with 4 and 8 subintervals to compute the length of the curve y = tan^(−1)(1 + x^2), 0 ≤ x ≤ 2.
(5) Evaluate the integral
    ∫_{−1}^{1} e^(−x^2) cos x dx
by using the one- and two-point Gauss-Legendre formulas. Also obtain the bound on the error for the one-point formula.
(6) Evaluate
    ∫_2^3 cos 2x/(1 + sin x) dx
by using the two- and three-point Gauss-Legendre integration formulas.
(7) Determine the values of a, b, and c such that the formula
    ∫_0^h f(x) dx = h [a f(0) + b f(h/3) + c f(h)]
is exact for polynomials of degree as high as possible. Also obtain the order of the truncation error.
(8) Determine constants a, b, c, and d that will produce a quadrature formula
    ∫_{−1}^{1} f(x) dx = a f(−1) + b f(1) + c f′(−1) + d f′(1).
Bibliography
[Atkinson] K. Atkinson and W. Han, Elementary Numerical Analysis, 3rd edition, John Wiley and Sons, 2004.
[Jain] M. K. Jain, S. R. K. Iyengar, and R. K. Jain, Numerical Methods for Scientific and Engineering Computation, 6th edition, New Age International Publishers, New Delhi, 2012.
CHAPTER 6 (4 LECTURES)
NUMERICAL SOLUTIONS OF ORDINARY DIFFERENTIAL EQUATIONS
1. Introduction
In this chapter, we discuss the numerical methods for solving the ordinary differential equations of
initial-value problems (IVP) of the form
    dy/dx = f(x, y), x ∈ R, y(x0) = y0   (1.1)
where y is a function of x, f is function of x and y, and x0 is called the initial value. The numerical
values of y(x) on an interval containing x0 are to be determined.
We divide the domain [a, b] into subintervals
    a = x0 < x1 < · · · < xN = b.
These points are called mesh points or grid points. Let the spacing h be equal; the uniform mesh points are then given by xi = x0 + ih, i = 0, 1, 2, . . .. The set of values y0, y1, . . . , yN is the numerical solution of the initial-value problem (IVP).
Theorem 2.2. If f(x, y) and ∂f/∂y are continuous on a region R = {(x, y) : |x − x0| ≤ a, |y − y0| ≤ b}, then the IVP (1.1) has a unique solution y(x) in the interval |x − x0| ≤ min{a, b/M}, where M = max_{(x,y)∈R} |f(x, y)|.
which gives
    y(x_{i+1}) ≈ y(xi) + h f(xi, yi).
We write
    x_{i+1} = xi + h
    y_{i+1} = yi + h f(xi, yi)
where yi ≈ y(xi). This is called Euler's method.
3.2. The Improved or Modified Euler's method. A better approximation of the slope is the average of the two slopes at the points (xi, yi) and (x_{i+1}, y_{i+1}), and we can write Euler's method as
    y_{i+1} = yi + (h/2)[f(xi, yi) + f(x_{i+1}, y_{i+1})].
This method falls in the category of predictor-corrector methods:
    Predictor:  y1^(0) = y0 + h f(x0, y0)
    Corrector:  y1^(1) = y0 + (h/2)[f(x0, y0) + f(x1, y1^(0))].
We repeat the corrector until two successive iterates y1^(i) agree (a Python sketch of both methods follows).
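A minimal Python sketch of both methods (not part of the original notes; the tolerance and iteration cap in the corrector loop are illustrative choices, and the test problem is Example 1 below):

import math

def euler(f, x0, y0, h, n):
    """Explicit Euler: n steps of y_{i+1} = y_i + h f(x_i, y_i)."""
    x, y = x0, y0
    for _ in range(n):
        y = y + h * f(x, y)
        x = x + h
    return y

def modified_euler(f, x0, y0, h, n, tol=1e-4, max_corr=10):
    """Modified Euler (predictor-corrector): the corrector is repeated
    until two successive iterates agree to within tol."""
    x, y = x0, y0
    for _ in range(n):
        y_pred = y + h * f(x, y)                          # predictor
        for _ in range(max_corr):
            y_corr = y + h / 2 * (f(x, y) + f(x + h, y_pred))
            if abs(y_corr - y_pred) < tol:
                break
            y_pred = y_corr
        x, y = x + h, y_corr
    return y

f = lambda t, y: -2 * y + 2 - math.exp(-4 * t)
print(euler(f, 0.0, 1.0, 0.1, 2))   # ≈ 0.852967 at t = 0.2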
Geometrically, the tangent line to the graph y(x) at xi has slope
y 0 (xi ) = f (xi , yi ).
If we use this tangent line to approximate the curve near the point (xi , y(xi )), the value of the tangent
line at x = xi+1 is given by the right side of the method.
Local truncation error and order of the method: The truncation error of the difference approximation is the difference between the exact solution y(x_{i+1}) and the numerical solution y_{i+1}, and is given by
    T_{i+1} = y(x_{i+1}) − y_{i+1}
            = y(xi + h) − [y(xi) + h f(xi, yi)]
            = y(xi) + h y′(xi) + (h^2/2) y″(ξ) − [y(xi) + h f(xi, yi)]
            = (h^2/2) y″(ξ),
    |T| ≤ (h^2/2) M, xi < ξ < x_{i+1}, M = max |y″(x)|.
If the truncation error is of size h^(p+1), then the order of the numerical method is p. For example, Euler's method is a first-order method.
Example 1. Consider
    y′ + 2y = 2 − e^(−4t), y(0) = 1.
Taking step size 0.1, find y at t = 0.1 and 0.2 by the Euler method.
Sol.
    y′ = −2y + 2 − e^(−4t) = f(t, y), y(0) = 1
    f(0, 1) = −2(1) + 2 − 1 = −1.
By the Euler method with step size h = 0.1,
    t1 = t0 + h = 0 + 0.1 = 0.1
    y1 = y0 + h f(0, 1) = 1 + 0.1(−1) = 0.9.
Therefore
    y(0.1) = 0.9.
    t2 = t0 + 2h = 0 + 2 × 0.1 = 0.2
    y2 = y1 + h f(0.1, 0.9) = 0.9 + 0.1(−2 × 0.9 + 2 − e^(−4(0.1)))
       = 0.9 + 0.1(−0.47032) = 0.852967.
Therefore
    y(0.2) = 0.852967.
Example 2. For the IVP y′ = x + √y, y(0) = 1, calculate y in the interval [0, 0.6] with h = 0.2 by using the modified Euler method.
Sol.
    y′ = x + √y = f(x, y), x0 = 0, y0 = 1, h = 0.2, x1 = 0.2.
Predictor:
    y1^(0) = y0 + h f(x0, y0) = 1 + 0.2(1) = 1.2.
Corrector:
    y1^(1) = y0 + (h/2)[f(x0, y0) + f(x1, y1^(0))] = 1 + 0.1[1 + 1.2954] = 1.2295
    y1^(2) = y0 + (h/2)[f(x0, y0) + f(x1, y1^(1))] = 1 + 0.1[1 + 1.3088] = 1.2309
    y1^(3) = y0 + (h/2)[f(x0, y0) + f(x1, y1^(2))] = 1 + 0.1[1 + 1.3094] = 1.2309.
Since y1^(2) = y1^(3), we take y1 = y(0.2) = 1.2309.
Now
    y1 = 1.2309, h = 0.2, x1 = 0.2, x2 = 0.4
    y2^(0) = y1 + h f(x1, y1) = 1.2309 + 0.2(1.3094) = 1.4927
    y2^(1) = y1 + (h/2)[f(x1, y1) + f(x2, y2^(0))] = 1.2309 + 0.1[1.3094 + 1.6218] = 1.5240
    y2^(2) = y1 + (h/2)[f(x1, y1) + f(x2, y2^(1))] = 1.2309 + 0.1[1.3094 + 1.6345] = 1.5253
    y2^(3) = y1 + (h/2)[f(x1, y1) + f(x2, y2^(2))] = 1.2309 + 0.1[1.3094 + 1.6350] = 1.5253.
Therefore
    y(0.4) = 1.5253.
Now
    y2 = 1.5253, h = 0.2, x2 = 0.4, x3 = 0.6
    y3^(0) = y2 + h f(x2, y2) = 1.5253 + 0.2(1.6350) = 1.8523
    y3^(1) = y2 + (h/2)[f(x2, y2) + f(x3, y3^(0))] = 1.5253 + 0.1[1.6350 + 1.9610] = 1.8849
    y3^(2) = y2 + (h/2)[f(x2, y2) + f(x3, y3^(1))] = 1.5253 + 0.1[1.6350 + 1.9729] = 1.8861
    y3^(3) = y2 + (h/2)[f(x2, y2) + f(x3, y3^(2))] = 1.5253 + 0.1[1.6350 + 1.9734] = 1.8861.
Hence
    y(0.6) = 1.8861.
Second-order Runge-Kutta method:
for i = 0, 1, 2, . . . do
    K1 = h f(xi, yi)
    K2 = h f(xi + h, yi + K1)
    y_{i+1} = yi + (1/2)(K1 + K2)
end for
Third-order Runge-Kutta method:
    y_{i+1} = yi + (1/6)(K1 + 4K2 + K3)
where
    K1 = h f(xi, yi)
    K2 = h f(xi + h/2, yi + K1/2)
    K3 = h f(xi + h, yi − K1 + 2K2)
and xi = x0 + ih.
Fourth-order Runge-Kutta method:
    y_{i+1} = yi + (1/6)(K1 + 2K2 + 2K3 + K4) + O(h^5)
where
    K1 = h f(xi, yi)
    K2 = h f(xi + h/2, yi + K1/2)
    K3 = h f(xi + h/2, yi + K2/2)
    K4 = h f(xi + h, yi + K3).
Algorithm for the fourth-order Runge-Kutta method:
for i = 0, 1, 2, . . . do
    x_{i+1/2} = xi + h/2
    x_{i+1} = xi + h = x0 + (i + 1)h
    K1 = h f(xi, yi)
    K2 = h f(x_{i+1/2}, yi + K1/2)
    K3 = h f(x_{i+1/2}, yi + K2/2)
    K4 = h f(x_{i+1}, yi + K3)
    y_{i+1} = yi + (1/6)(K1 + 2K2 + 2K3 + K4)
end for
Local truncation error in the Runge-Kutta method is the error that arises in each step because of
the truncated Taylor series. This error is inevitable. The fourth-order Runge-Kutta involves a local
truncation error of O(h5 ).
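The algorithm translates line for line into Python; here is a minimal sketch (not part of the original notes), checked against Example 4 below:

def rk4(f, x0, y0, h, n):
    """Classical fourth-order Runge-Kutta method."""
    x, y = x0, y0
    for _ in range(n):
        K1 = h * f(x, y)
        K2 = h * f(x + h / 2, y + K1 / 2)
        K3 = h * f(x + h / 2, y + K2 / 2)
        K4 = h * f(x + h, y + K3)
        y = y + (K1 + 2 * K2 + 2 * K3 + K4) / 6
        x = x + h
    return y

f = lambda x, y: (y**2 - x**2) / (y**2 + x**2)
print(rk4(f, 0.0, 1.0, 0.2, 2))   # ≈ 1.3752 at x = 0.4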
Example 4. Using the Runge-Kutta fourth-order method, solve
    dy/dx = (y^2 − x^2)/(y^2 + x^2), y(0) = 1
at x = 0.2 and 0.4.
Sol.
    f(x, y) = (y^2 − x^2)/(y^2 + x^2), x0 = 0, y0 = 1, h = 0.2
    K1 = h f(x0, y0) = 0.2 f(0, 1) = 0.2000
    K2 = h f(x0 + h/2, y0 + K1/2) = 0.2 f(0.1, 1.1) = 0.19672
    K3 = h f(x0 + h/2, y0 + K2/2) = 0.2 f(0.1, 1.09836) = 0.19670
    K4 = h f(x0 + h, y0 + K3) = 0.2 f(0.2, 1.1967) = 0.18910
    y1 = y0 + (1/6)(K1 + 2K2 + 2K3 + K4) = 1 + 0.19599 = 1.196.
Therefore y(0.2) = 1.196.
Now x1 = x0 + h = 0.2:
    K1 = h f(x1, y1) = 0.1891
    K2 = h f(x1 + h/2, y1 + K1/2) = 0.2 f(0.3, 1.2906) = 0.1795
    K3 = h f(x1 + h/2, y1 + K2/2) = 0.2 f(0.3, 1.2858) = 0.1793
    K4 = h f(x1 + h, y1 + K3) = 0.2 f(0.4, 1.3753) = 0.1688
    y2 = y(0.4) = y1 + (1/6)(K1 + 2K2 + 2K3 + K4) = 1.196 + 0.1792 = 1.3752.
5. Numerical solution of system and second-order equations
We can apply the Euler and Runge-Kutta methods to find numerical solutions of systems of differential equations. Second-order equations can be changed into systems of first-order differential equations. The application of the numerical methods is explained in the following examples, and a short sketch in code is given below.
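A minimal Python sketch of RK4 for systems (not part of the original notes; the state Y is a vector, which also covers second-order equations via Y = [y, y′], and the test system is Example 6 below):

import numpy as np

def rk4_system(F, x0, Y0, h, n):
    """RK4 for a first-order system Y' = F(x, Y), Y a vector.
    A second-order equation y'' = g(x, y, y') is handled by taking
    Y = [y, y'] and F(x, Y) = [Y[1], g(x, Y[0], Y[1])]."""
    x, Y = x0, np.array(Y0, dtype=float)
    for _ in range(n):
        K1 = h * F(x, Y)
        K2 = h * F(x + h / 2, Y + K1 / 2)
        K3 = h * F(x + h / 2, Y + K2 / 2)
        K4 = h * F(x + h, Y + K3)
        Y = Y + (K1 + 2 * K2 + 2 * K3 + K4) / 6
        x = x + h
    return Y

F = lambda x, Y: np.array([1 + x * Y[1], -x * Y[0]])
print(rk4_system(F, 0.0, [0.0, 1.0], 0.3, 1))   # ≈ [0.3448, 0.9900]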
Example 5. Solve the system
    dx/dt = 3x − 2y
    dy/dt = 5x − 4y
    x(0) = 3, y(0) = 6.
Find the solution by the Euler method at t = 0.1 and 0.2, taking time increment 0.1.
Sol. Given t0 = 0, x0 = 3, y0 = 6, h = 0.1.
Write f(x, y) = 3x − 2y, g(x, y) = 5x − 4y.
By the Euler method
    x1 = x(0.1) = x0 + h f(x0, y0) = 3 + 0.1(3 × 3 − 2 × 6) = 2.7
    y1 = y(0.1) = y0 + h g(x0, y0) = 6 + 0.1(5 × 3 − 4 × 6) = 5.1.
Similarly
    x2 = x(0.2) = x1 + h f(x1, y1) = 2.7 + 0.1(3 × 2.7 − 2 × 5.1) = 2.49
    y2 = y(0.2) = y1 + h g(x1, y1) = 5.1 + 0.1(5 × 2.7 − 4 × 5.1) = 4.41.
Example 6. Solve the system
    dy/dx = 1 + xz, dz/dx = −xy
for x = 0.3 by using the fourth-order Runge-Kutta method, given y(0) = 0, z(0) = 1.
Sol. Given
    dy/dx = 1 + xz = f(x, y, z), dz/dx = −xy = g(x, y, z)
    x0 = 0, y0 = 0, z0 = 1, h = 0.3
    K1 = h f(x0, y0, z0) = 0.3 f(0, 0, 1) = 0.3
    L1 = h g(x0, y0, z0) = 0.3 g(0, 0, 1) = 0
    K2 = h f(x0 + h/2, y0 + K1/2, z0 + L1/2) = 0.3 f(0.15, 0.15, 1) = 0.346
    L2 = h g(x0 + h/2, y0 + K1/2, z0 + L1/2) = −0.00675
    K3 = h f(x0 + h/2, y0 + K2/2, z0 + L2/2) = 0.34385
    L3 = h g(x0 + h/2, y0 + K2/2, z0 + L2/2) = −0.007762
    K4 = h f(x0 + h, y0 + K3, z0 + L3) = 0.3893
    L4 = h g(x0 + h, y0 + K3, z0 + L3) = −0.03104.
Hence
    y1 = y(0.3) = y0 + (1/6)(K1 + 2K2 + 2K3 + K4) = 0.34483
    z1 = z(0.3) = z0 + (1/6)(L1 + 2L2 + 2L3 + L4) = 0.9899.
Example 7. Solve by the fourth-order Runge-Kutta method for x = 0.2:
    d^2y/dx^2 = x (dy/dx)^2 − y^2, y(0) = 1, y′(0) = 0.
Sol. Let
    dy/dx = z = f(x, y, z).
Therefore
    dz/dx = x z^2 − y^2 = g(x, y, z).
Now x0 = 0, y0 = 1, z0 = 0, h = 0.2:
    K1 = h f(x0, y0, z0) = 0.0
    L1 = h g(x0, y0, z0) = −0.2
    K2 = h f(x0 + h/2, y0 + K1/2, z0 + L1/2) = −0.02
    L2 = h g(x0 + h/2, y0 + K1/2, z0 + L1/2) = −0.1998
    K3 = h f(x0 + h/2, y0 + K2/2, z0 + L2/2) = −0.02
    L3 = h g(x0 + h/2, y0 + K2/2, z0 + L2/2) = −0.1958
    K4 = h f(x0 + h, y0 + K3, z0 + L3) = −0.0392
    L4 = h g(x0 + h, y0 + K3, z0 + L3) = −0.1905.
Hence
    y1 = y(0.2) = y0 + (1/6)(K1 + 2K2 + 2K3 + K4) = 0.9801
    z1 = y′(0.2) = z0 + (1/6)(L1 + 2L2 + 2L3 + L4) = −0.1970.
Exercises
(1) Consider the IVP
    y′ = x(y + x) − 2, y(0) = 2.
Use the Euler method with stepsize h = 0.2 to compute y(0.6) to four decimals.
(2) Use the modified Euler method to find y(0.2) and y(0.4) with h = 0.2 for the IVP
    y′ = y + e^x, y(0) = 0.
(3) Solve the initial value problem
    dy/dx = (y + x^2 − 2)/(x + 1), y(0) = 2
by the explicit Euler method with step size h = 0.2 on the interval [0, 1].
(4) Solve the following initial value problem with step sizes h = 0.1 and 0.2 by the explicit Euler method on the interval [0, 1]:
    y′ = t e^t − y, y(0) = 1.
(5) Solve the following differential equation by the second-order Runge-Kutta method:
    y′ = y + 2 cos t, y(0) = 1.
Compute y(0.2), y(0.4), and y(0.6).
(6) Compute solutions to the following problems with a second-order Taylor method. Use step size h = 0.2.
    (A) y′ = (cos y)^2, 0 ≤ x ≤ 1, y(0) = 0.
    (B) y′ = 20/(1 + 19 e^(−x/4)), 0 ≤ x ≤ 1, y(0) = 1.
(7) Use the Runge-Kutta fourth-order method to solve the IVP at x = 0.8 for
    dy/dx = x + y, y(0.4) = 0.41
with step length h = 0.2.
(8) Use the Runge-Kutta fourth-order method to solve the following IVP with h = 0.1 for 0 ≤ x ≤ 0.2:
    y′ = xz + 1, y(0) = 0,
    z′ = −xy, z(0) = 1.
(9) Apply Taylor's method of order three to obtain an approximate value of y at x = 0.2 for the differential equation
    y′ = 2y + 3e^x, y(0) = 0.
Compare the numerical solution with the exact solution.
(10) Use the Runge-Kutta method of order four to solve
    y″ = x(y′)^2 − y^2, y(0) = 1, y′(0) = 0
for x = 0.2 with stepsize 0.2.
(11) Consider the Lotka-Volterra system
    du/dt = 2u − uv, u(0) = 1.5
    dv/dt = −9v + 3uv, v(0) = 1.5.
Use Euler's method with step size 0.5 to approximate the solution at t = 2.
(12) The following system represents a much simplified model of nerve cells:
    dx/dt = x + y − x^3, x(0) = 0.5
    dy/dt = −x/2, y(0) = 0.1
where x(t) represents the voltage across the boundary of the nerve cell and y(t) is the permeability of the cell wall at time t. Solve this system using the Runge-Kutta fourth-order method to generate the profile up to t = 0.2 with step size 0.1.
Bibliography
[Atkinson] K. Atkinson and W. Han, Elementary Numerical Analysis, 3rd edition, John Wiley and Sons, 2004.
[Jain] M. K. Jain, S. R. K. Iyengar, and R. K. Jain, Numerical Methods for Scientific and Engineering Computation, 6th edition, New Age International Publishers, New Delhi, 2012.