
Numerical Methods - MA 204

Syllabus:

Interpolation by polynomials, divided differences, error of the interpolating polynomial,


piecewise linear and cubic spline interpolation,
Numerical Integration, Composite rule, error formulae,
Solution of a system of linear equations, implementation of Gaussian elimination and Gauss-Seidel
methods, partial pivoting, row echelon form, LU factorization, Cholesky's method, ill-conditioning,
norms.
Solution of Non-linear equations, bisection and secant methods.
Newton's method, rate of convergence, solution of a system of nonlinear equations, numerical
solution of ordinary differential equations, Euler and Runge-Kutta methods, multi-step methods,
predictor-corrector methods, order of convergence, finite difference methods, numerical solutions
of elliptic, parabolic and hyperbolic partial differential equations.
Eigen-value problem, power method, QR method, Gershgorin’s theorem.
Exposure to software packages like IMSL subroutines, MATLAB

Suggested Readings:

1. S.D. Conte and Carl de Boor, Elementary Numerical Analysis: An Algorithmic Approach
(3rd Edition), McGraw-Hill, 1980.

2. Carl E. Froberg, Introduction to Numerical Analysis (2nd Edition), Addison-Wesley, 1981.

3. E. Kreyszig, Advanced Engineering Mathematics (8th Edition), John Wiley and Sons, 1999.

4. D. Watkins, Fundamentals of Matrix Computations (2nd Edition), Wiley-Interscience, 2002.

5. M.K. Jain, S.R.K. Iyengar, R.K. Jain, Numerical Methods (6th Edition), New Age
International (P) Ltd, 2012.

These are some partial, rough notes prepared for MA-204 students at IIT Indore. Mistakes
and typos are bound to be there. Students are advised to read them carefully and are
encouraged to send their comments and suggestions.

Ashisha Kumar
Discipline of Mathematics
I.I.T. Indore.


Contents

1 Introduction
2 Interpolation by polynomials
3 Numerical integration
4 Solution of a system of linear equations
5 The Eigen-Value Problem
6 Nonlinear Equation

1 Introduction
There are many mathematical problems which are formulated while solving problems from other
sciences. Some of these problems can be solved by methods we have learned in courses on Calculus,
Differential Equations, Linear Algebra, etc., and we are happy when we get exact solutions, or at
least some information describing the analytic behavior of the solution. But for most practical
purposes we cannot get exact solutions by direct computation (perhaps because of round-off errors),
or a solution in known compact form may not exist, or even when a compact form exists it may
not be sufficient and we want a numeric value as the solution. In this course our aim is to solve
such mathematical problems using numerical methods.

2 Interpolation by polynomials
Suppose we want to know the value of a function (or of some of its derivatives) at a particular
point, but the only information we have about the function is its value (or the value of its
derivatives) at certain other points. It might be some arbitrary continuous function, and we want
to approximate it by a polynomial which agrees with the known data about the function. The
reason for approximating a given function by a polynomial is that we can compute the value of a
polynomial at a point using only the basic operations (addition, subtraction and multiplication)
of a computer.
In this chapter our aim is to find an approximating polynomial to a function whose value, or
the value of certain of its derivatives, is known at some points. These points are called nodes. To
ease the computations we always try to find the minimal degree polynomial which satisfies the
given data.
Problem 2.1. Find a polynomial which coincides with |x| at the points −1, 0 and 1.

It means we need to find a polynomial P which coincides with the function |x| at the points
−1, 0 and 1. So we must have

P(−1) = |−1| = 1,  P(0) = |0| = 0,  P(1) = |1| = 1.   (2.1)

Now since there are three distinct points, we will try to find a degree two polynomial, because
the degree of freedom of a degree two polynomial is three. (The vector space of polynomials of
degree at most two is three dimensional and {1, x, x²} is a basis for this vector space.)
Let P(x) = a + bx + cx² be a polynomial which passes through the points (−1, 1), (0, 0), (1, 1).
Then we must have

a + b(0) + c(0)² = 0
a + b(−1) + c(−1)² = 1
a + b(1) + c(1)² = 1   (2.2)
Remark 2.1. Here we need to solve a system of three linear equations in three unknowns. Had we
approximated by a polynomial of degree less than two, the resulting system of three linear equations
in two unknowns might not have any solution. Similarly, had we approximated by a polynomial of
degree four, the system of linear equations might have infinitely many solutions, and because of
the increased degree of the polynomial the computations would be more difficult.
From (2.2), we must have

[ 1   0   0 ] [ a ]   [ 0 ]
[ 1  −1   1 ] [ b ] = [ 1 ]   (2.3)
[ 1   1   1 ] [ c ]   [ 1 ]

The determinant of the coefficient matrix is −2, and its adjugate is

[ −2   0   0 ]
[  0   1  −1 ]
[  2  −1  −1 ]

so the inverse of the matrix is given by

[ 1   0   0 ]⁻¹          [ −2   0   0 ]
[ 1  −1   1 ]   = −(1/2) [  0   1  −1 ]   (2.4)
[ 1   1   1 ]            [  2  −1  −1 ]

And hence

[ a ]          [ −2   0   0 ] [ 0 ]   [ 0 ]
[ b ] = −(1/2) [  0   1  −1 ] [ 1 ] = [ 0 ]   (2.5)
[ c ]          [  2  −1  −1 ] [ 1 ]   [ 1 ]

Thus the degree two polynomial which coincides with |x| at (−1, 0, 1) is x².
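This small linear system is easy to hand over to a computer. A minimal sketch in Python (NumPy assumed available; the notes themselves suggest packages such as MATLAB or IMSL for such experiments):

```python
import numpy as np

# Coefficient matrix of (2.3): rows correspond to the nodes 0, -1, 1
A = np.array([[1.0,  0.0, 0.0],
              [1.0, -1.0, 1.0],
              [1.0,  1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0])    # values |0|, |-1|, |1|

a, b, c = np.linalg.solve(A, y)  # solves without forming the inverse
print(a, b, c)                   # expected: 0.0 0.0 1.0, i.e. P(x) = x^2
```

Note that np.linalg.solve uses Gaussian elimination rather than computing the inverse explicitly, which is also the preferred approach in practice (see the later chapter on linear systems).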

Remark 2.2. In general there is a large number of data points with numerically rich data.
• We need to compute the inverse of a large matrix.
• The solution of the system of linear equations may be far from the exact solution due to
round-off errors.

Suppose we know the value of a continuous function at two points x0, x1, that is, we know
f(x0), f(x1).

Question 2.1. Can we find a polynomial which satisfies P(x0) = f(x0), P(x1) = f(x1)?

The answer is easy and we learned it in our 10th standard: we find the equation of the line
passing through (x0, f(x0)) and (x1, f(x1)), because the equation of a line is a polynomial of
degree one. You surely know many versions of the formula for the equation of a line through two
points, but here we derive it in a somewhat different way.
P(x) = ax + b;  f(x0) = ax0 + b;  f(x1) = ax1 + b;

or

P(x)(−1) + ax + (1)b = 0;  f(x0)(−1) + ax0 + (1)b = 0;  f(x1)(−1) + ax1 + (1)b = 0.

Now, recalling the matrix representation of a system of linear equations from the Linear Algebra
course, we can write the above system of equations as

[ P(x)    x   1 ] [ −1 ]   [ 0 ]
[ f(x0)   x0  1 ] [  a ] = [ 0 ]   (2.6)
[ f(x1)   x1  1 ] [  b ]   [ 0 ]

Now if for some particular value of x this 3 × 3 matrix were invertible, then by multiplying both
sides of equation (2.6) by the inverse of this matrix we would get the contradiction that the vector
(−1, a, b)ᵀ is a null vector. This implies that the matrix

[ P(x)    x   1 ]
[ f(x0)   x0  1 ]
[ f(x1)   x1  1 ]

is singular for every value of x, and hence its determinant is zero for all values of x. Expanding
the determinant along the first column, we get

P(x)(x0 − x1) + f(x0)(x1 − x) + f(x1)(x − x0) = 0.

Rewriting it, we get

P(x) = f(x0) (x − x1)/(x0 − x1) + f(x1) (x − x0)/(x1 − x0).   (2.7)
Observation 2.1. Now suppose we want to find a polynomial P(x) which agrees with a function
f(x) at three distinct points x0, x1, x2, that is, P(x) satisfies P(x0) = f(x0), P(x1) = f(x1) and
P(x2) = f(x2). If we consider the degree two polynomial P(x) = a + bx + cx² and proceed as in
the previous case, we come to the conclusion that the determinant of the matrix

[ P(x)    x²   x   1 ]
[ f(x0)   x0²  x0  1 ]
[ f(x1)   x1²  x1  1 ]
[ f(x2)   x2²  x2  1 ]

is equal to zero for all values of x in the interval. Expanding along the first column, we get

P(x) = f(x0) (x − x1)(x − x2)/[(x0 − x1)(x0 − x2)] + f(x1) (x − x0)(x − x2)/[(x1 − x0)(x1 − x2)]
       + f(x2) (x − x0)(x − x1)/[(x2 − x0)(x2 − x1)].   (2.8)

Now if we write

L0(x) = (x − x1)(x − x2)/[(x0 − x1)(x0 − x2)],
L1(x) = (x − x0)(x − x2)/[(x1 − x0)(x1 − x2)],
L2(x) = (x − x0)(x − x1)/[(x2 − x0)(x2 − x1)],

then we observe the following properties of these functions.

• L0 + L1 + L2 = 1,
• Lj(xi) = δij = 1 if i = j, and 0 if i ≠ j,
• the degree of each Lj is two, and
• P(x) = L0(x)f(x0) + L1(x)f(x1) + L2(x)f(x2).

The polynomials Lj are called Lagrange's fundamental polynomials, and the corresponding poly-
nomial P(x) is called Lagrange's polynomial.
Problem 2.2. Let [a, b] be a given interval and f some unknown continuous function defined on
[a, b]. If we know the value of the function at n + 1 distinct points a = x0 < x1 < . . . < xn = b,
that is, we know f(xi) for i = 0, . . . , n, can we find a polynomial

P(x) = a0 + a1 x + a2 x² + . . . + an xⁿ,   (2.9)

such that P(xi) = f(xi) for i = 0, . . . , n?


2.1. Lagrange Polynomial
Using a procedure similar to the above two examples, J.L. Lagrange found P(x); we usually
denote it by L(x) in his honor:

L(x) = Σ_{j=0}^{n} [ ∏_{i=0, i≠j}^{n} (x − xi)/(xj − xi) ] f(xj) = Σ_{j=0}^{n} Lj(x) f(xj),   (2.10)

where Lj(x) is the jth fundamental polynomial of degree n and is given by

Lj(x) = ∏_{i=0, i≠j}^{n} (x − xi)/(xj − xi).   (2.11)

These Lj's satisfy the following properties.

• If we write ω(x) = ∏_{j=0}^{n} (x − xj), then Lagrange's fundamental polynomials can also be
expressed as

  Lj(x) = ω(x)/[(x − xj) ω'(xj)].   (2.12)

• Σ_{j=0}^{n} Lj(x) = 1,
• Lj(xk) = δjk = 1 if j = k, and 0 if j ≠ k,
• the degree of each Lj is n,
• L(x) = Σ_{j=0}^{n} Lj(x) f(xj), and
• the degree of L(x) is at most n.
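As a direct illustration of (2.10) and (2.11), here is a minimal Python sketch; the function name lagrange_eval is ours, chosen for illustration only:

```python
def lagrange_eval(x, nodes, values):
    """Evaluate the Lagrange polynomial (2.10) at x for the given
    nodes x_j and values f(x_j)."""
    total = 0.0
    n = len(nodes)
    for j in range(n):
        # L_j(x) = prod_{i != j} (x - x_i)/(x_j - x_i), as in (2.11)
        Lj = 1.0
        for i in range(n):
            if i != j:
                Lj *= (x - nodes[i]) / (nodes[j] - nodes[i])
        total += Lj * values[j]
    return total

# Problem 2.1 revisited: interpolate |x| at -1, 0, 1
print(lagrange_eval(0.5, [-1.0, 0.0, 1.0], [1.0, 0.0, 1.0]))  # 0.25 = 0.5**2
```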
Remark 2.3. There are examples of data for which L(x) is of degree strictly less than n.

Remark 2.4. We can use the following steps to find the Lagrange polynomial for the given data.
a) First compute Lagrange's fundamental polynomial Lj for each j.
b) Multiply each Lj(x) by the corresponding functional value f(xj).
c) Then sum the products obtained in step b) over all j.

Remark 2.5. If we draw the graphs of both f (x) and P (x), then it is clear from the con-
ditions that these two graphs intersect each other at least at n + 1 distinct points namely
(x0 , f (x0 )), (x1 , f (x1 )), . . . , (xn , f (xn )). In other words P (x) interpolates f (x) at n + 1 points.
We can also think of finding a polynomial P such that certain derivatives of P coincide with
the same order derivatives of the given function f at some points in the interval.

2.2. Interpolating Polynomial


Let f be a continuous function defined on some interval. We say that a polynomial P is an
interpolating polynomial for f if the mth derivative of P coincides with the mth derivative of f at
certain points of the interval, for some nonnegative integer m. The case m = 0 means that P
coincides with f at some points of the interval.
Example 2.1. The polynomial P(x) = x − x0 + f(x0) is an interpolating polynomial for any
function f(x), because P(x) agrees with f(x) at x = x0.

Example 2.2. Lagrange's polynomial is also an interpolating polynomial, because it matches the
function at n + 1 distinct points in the interval, and it can give an approximate value for the
function at any other point of the interval, with some error.
Example 2.3. If f(x) = cos x, then the polynomial P(x) = 1 not only agrees with f(x) at x = 0,
but the first derivative of P(x) also agrees with the first derivative of f(x) at x = 0. Thus P(x) = 1
is an interpolating polynomial for cos x.

But the second derivative of f(x) does not agree with the second derivative of P(x).
Example 2.4. If f(x) = cos x, then the polynomial P(x) = 1 − x²/2 not only agrees with f(x) at
x = 0, but the first and second derivatives of P(x) also agree with the first and second derivatives
of f(x) respectively at x = 0. Thus 1 − x²/2 is also an interpolating polynomial for cos x.
Example 2.5. If f(x) = sin x, then the polynomial P(x) = x not only agrees with f(x) at x = 0,
but the first and second derivatives of P(x) also agree with the respective derivatives of f(x) at
x = 0. Thus P(x) = x is an interpolating polynomial for sin x.
But the third order derivative of P(x) = x does not coincide with the third order derivative of
the function sin x! If we consider P(x) = x − x³/3!, then P(x) agrees with sin x at x = 0 along with
its first three derivatives.

Does this remind you of something? (Think about it.) Let us see one more example, and then
your guess will be confirmed!
Example 2.6. If f(x) = eˣ, then the polynomial

• P0(x) = 1 is an interpolating polynomial for eˣ, because P0(x) = 1 agrees with the function
at x = 0.
• P1(x) = 1 + x is an interpolating polynomial for eˣ, because P1(x) and its first derivative
agree with the function eˣ at x = 0.
• P2(x) = 1 + x + x²/2 is an interpolating polynomial for eˣ, because P2(x) and its first and
second order derivatives agree with the function eˣ at x = 0.
• P3(x) = 1 + x + x²/2 + x³/3! is an interpolating polynomial for eˣ, because P3(x) and its first
three derivatives agree with the function eˣ and its derivatives respectively at x = 0.
Example 2.7. The Taylor polynomial of degree n of the function f(x) around the point x0 is
given by

Pn(x) = f(x0) + (x − x0) f'(x0) + (x − x0)² f''(x0)/2! + . . . + (x − x0)ⁿ f^(n)(x0)/n!.   (2.13)
Then Pn agrees with f (x) at x = x0 along with its first n derivatives. Therefore, Pn is also an
interpolating polynomial for the function f (x).
Remark 2.6. In all the above examples, to find the interpolating polynomial P(x) we do not need
full information about the function; we only need the specific data about it. In Example 2.5 we
can also pose the problem in a different way, without saying anything about the sine function, as
follows.
(i) Find a polynomial P(x) which satisfies P(0) = 0, P'(0) = 1, P''(0) = 0, and P'''(0) = −1, or
(ii) Find a polynomial P(x) which agrees with a function f(x) satisfying the data f(0) = 0,
f'(0) = 1, f''(0) = 0, and f'''(0) = −1.

Note that this f(x) need not be sin x; for example it might be x − x³/3! + x¹⁰.
Example 2.8. The polynomial

P(x) = f(x0) (x − x1)/(x0 − x1) + f(x1) (x − x0)/(x1 − x0)

is an interpolating polynomial for f(x), because P(x) coincides with f(x) at the points x0, x1.
On the other hand, if we want to find an interpolating polynomial for the data f(x0) = f(x1) = c,
then the above formula yields the constant polynomial P(x) = c. But the polynomial P(x) =
(x − x0)(x − x1) + c also satisfies the data. It means that the interpolating polynomial for given
data is not necessarily unique.
Question 2.2. Since the interpolating polynomial is not unique, the question arises: by which
interpolating polynomial should we approximate the function?
In the above example there are two interpolating polynomials for the given data, one a constant
polynomial and the other of degree two. Note that calculations with the constant polynomial
will be easier than with the degree two polynomial.
Remark 2.7. In general we need to deal with large and numerically rich data, so to ease the
computations it is better to approximate by an interpolating polynomial of minimal degree.

Remark 2.8. In Problem 2.1 we interpolated data given at three distinct points by a degree two
polynomial, and by construction it is clear that this is the only polynomial of degree at most two
which can interpolate the given data. If we change the functional value to the constant 1 at each
of the three points (−1, 0, 1) and follow the same procedure as above, we get the constant
polynomial 1 as the interpolating polynomial, which is again unique by construction.

Question 2.3. If we are given the functional values at n + 1 distinct points, then by the answer
to Problem 2.2 there exists a Lagrange polynomial of degree at most n which interpolates the
given data. Now the question is whether there is any other interpolating polynomial (for the
same data) of degree at most n. In other words, is the Lagrange polynomial the unique
interpolating polynomial of degree at most n for given data at n + 1 distinct points?
The answer to this is yes.

Theorem 2.1. (Uniqueness of the Interpolating Polynomial) If we know the functional values
of a real valued function f at n + 1 distinct points x0, x1, . . . , xn, then there exists exactly one
polynomial of degree at most n which interpolates f at x0, x1, . . . , xn.

Clearly Lagrange's polynomial (2.10) is of degree at most n and interpolates the given data.
The only thing we need to show is that it is the unique interpolating polynomial of degree at most
n. We will prove this in two ways.
Proof. First, without loss of generality we assume that x0 < x1 < . . . < xn−1 < xn. Let
P(x) = a0 + a1 x + a2 x² + . . . + an xⁿ be a polynomial which satisfies the following data:

P(x0) = f(x0), P(x1) = f(x1), . . . , P(xn) = f(xn).   (2.14)

Then we must have

a0 + a1 x0 + a2 x0² + . . . + an x0ⁿ = f(x0)
a0 + a1 x1 + a2 x1² + . . . + an x1ⁿ = f(x1)
  ⋮                                              (2.15)
a0 + a1 xn + a2 xn² + . . . + an xnⁿ = f(xn)

Or in matrix form,

[ 1  x0  x0²  . . .  x0ⁿ ] [ a0 ]   [ f(x0) ]
[ 1  x1  x1²  . . .  x1ⁿ ] [ a1 ]   [ f(x1) ]
[ .   .   .          .   ] [  ⋮ ] = [   ⋮   ]   (2.16)
[ 1  xn  xn²  . . .  xnⁿ ] [ an ]   [ f(xn) ]

This is a system of (n + 1) linear equations in the (n + 1) unknowns a0, a1, . . . , an. The system
has a unique solution if the determinant of the coefficient matrix (known as the Vandermonde
determinant) is nonzero:

      | 1  x0  x0²  . . .  x0ⁿ |
Δ =   | 1  x1  x1²  . . .  x1ⁿ |  ≠ 0.   (2.17)
      | .   .   .          .   |
      | 1  xn  xn²  . . .  xnⁿ |

But we know that Δ = ∏_{0≤i<j≤n} (xj − xi) ≠ 0, since the points are distinct. So all the
coefficients of P(x) are uniquely determined, and hence P(x) itself.

For the second proof we need the following lemma.


Lemma 2.2. Let f be a real valued function on the real line. If f is n times differentiable and
has n + 1 zeros, then its nth derivative f^(n) has at least one real zero.

The above lemma can be proved by repeated application of Rolle's Theorem. We need the
following corollary of this lemma for our second proof.

Corollary 2.3. A nonzero polynomial of degree n cannot have n + 1 distinct zeros.

Indeed, if a polynomial P(x) ≠ 0 of degree n had n + 1 zeros, then by the above lemma its nth
derivative would have at least one zero. But the nth derivative of a polynomial of degree n is the
constant function n! an, where an is the coefficient of xⁿ in P(x), and this constant function cannot
have any zero unless an itself is zero. This contradicts the degree of the polynomial P(x).
Proof. This proof is by contradiction. Suppose there is another polynomial Q(x) ≠ P(x) of
degree at most n such that Q(x) also interpolates the same data, that is,

Q(x0) = f(x0), Q(x1) = f(x1), . . . , Q(xn) = f(xn).   (2.18)

From (2.14) and (2.18) it is clear that the polynomial R(x) = P(x) − Q(x) ≠ 0 must have n + 1
zeros at x0, x1, . . . , xn. This gives a contradiction, because R(x) is of degree at most n, and hence
it cannot have n + 1 distinct zeros unless R(x) itself is the zero polynomial.
Remark 2.9. The main difficulty in dealing with Lagrange's polynomial is that if we want to
enlarge the data set, even by a single point, we need to recompute it from the very beginning.

2.3. Newton's Divided Difference Interpolating Polynomial

Newton suggested writing the interpolating polynomial in the following form for data given at
n + 1 distinct points x0, x1, . . . , xn:

P (x) = a0 + a1 (x − x0 ) + a2 (x − x0 )(x − x1 ) + . . . + an (x − x0 )(x − x1 ) . . . (x − xn−1 ). (2.19)

Since P(x) coincides with f(x) at the points x0, x1, . . . , xn, we must have

f(x0) = a0;
f(x1) = a0 + (x1 − x0) a1;
f(x2) = a0 + (x2 − x0) a1 + (x2 − x0)(x2 − x1) a2;
  ⋮
f(xn) = a0 + (xn − x0) a1 + (xn − x0)(xn − x1) a2 + . . . + (xn − x0) . . . (xn − xn−1) an.   (2.20)

It is clear from the above system of equations that a0 = f(x0); to compute a1 we plug the
value of a0 into the second equation, and similarly, to compute ak we substitute the values
a0, a1, . . . , ak−1, obtained by solving the first k equations, into the (k + 1)th equation. It is
important to note that we only need the first k + 1 equations to compute ak. Thus the computation
of ak depends only on the points x0, . . . , xk, so we can regard ak as a new function of the points
{x0, x1, x2, . . . , xk}. To show the dependency of ak on the function f as well, we denote ak by
f[x0, x1, x2, . . . , xk]. We call f[x0, x1, x2, . . . , xk] the kth divided difference of the function f relative
to the points x0, x1, . . . , xk; we will justify the name divided difference in the further discussion.
Now we can use the values of a0, . . . , ak in the (k + 2)th equation to compute ak+1 and so on. Thus
we can compute all the values a0, a1, . . . , an and hence P(x). Since this P(x) is of degree
at most n and also interpolates the data at the n + 1 distinct points x0, x1, x2, . . . , xn, then
by Theorem 2.1 (uniqueness of the interpolating polynomial) we infer that P(x) is nothing but
Lagrange's polynomial written in a different form. But then an = f[x0, x1, x2, . . . , xn], which is the
coefficient of xⁿ in Newton's polynomial, must also be the coefficient of xⁿ in Lagrange's polynomial.
Hence, equating the coefficients of xⁿ in (2.19) and (2.10), we get

an = f[x0, x1, x2, . . . , xn] = Σ_{j=0}^{n} f(xj) / ∏_{i=0, i≠j}^{n} (xj − xi).   (2.21)

Now it is worth noting that even if we rearrange the order of the data points, the interpolating
polynomial does not change, and hence an, the coefficient of xⁿ in the interpolating polynomial,
remains the same. This fact is also justified by the expression on the R.H.S. of (2.21). Thus we can
say that the nth divided difference f[x0, x1, x2, . . . , xn] depends on the set of node points
{x0, x1, x2, . . . , xn} but is independent of the order of the node points.
In the following we justify the name divided difference. It is easy to see from (2.20) that

a1 = (f(x1) − f(x0)) / (x1 − x0).

Here a1 is obtained by dividing the difference of functional values by the difference of points, and
hence it is known as Newton's first divided difference of f(x) relative to x0, x1, denoted f[x0, x1].
In general, Newton's first divided difference of f(x) relative to xj, xk (j ≠ k) is defined as

f[xj, xk] = (f(xj) − f(xk)) / (xj − xk).   (2.22)
It is clear from the definition that f[xj, xk] is independent of the order of j and k and depends
only on the set {j, k}. Using these expressions for a0 and a1, we can compute a2 from the first
three equations of (2.20):

a2 = f[x0, x1, x2] = f(x0)/[(x0 − x1)(x0 − x2)] + f(x1)/[(x1 − x0)(x1 − x2)] + f(x2)/[(x2 − x0)(x2 − x1)].   (2.23)
Clearly, f[x0, x1, x2] is independent of the order of x0, x1, x2. In fact, one can write f[x0, x1, x2] as
the quotient of a difference of certain first divided differences by a difference of certain points, as
follows.

Exercise 2.1. Show that

f[x0, x1, x2] = (f[x1, x2] − f[x0, x1])/(x2 − x0) = (f[x1, x2] − f[x2, x0])/(x1 − x0) = (f[x0, x2] − f[x0, x1])/(x2 − x1).   (2.24)
Hence f[x0, x1, x2] is known as Newton's second divided difference of f(x) relative to
{x0, x1, x2}. In general we define Newton's second divided difference of f(x) relative to {xi, xj, xk}
as

f[xi, xj, xk] = (f[xj, xk] − f[xi, xj]) / (xk − xi).   (2.25)

Moreover, we can proceed in a similar way and see that f[x0, x1, x2, . . . , xk−1, xk], Newton's kth
divided difference of f(x) relative to {x0, x1, . . . , xk−1, xk}, can be written as the quotient of a
difference of certain (k − 1)th divided differences by a difference of certain point values, as follows:

f[x0, x1, x2, . . . , xk−1, xk] = (f[x1, x2, . . . , xk−1, xk] − f[x0, x1, x2, . . . , xk−1]) / (xk − x0).   (2.26)
It can be checked that

f[x0, x1, x2, . . . , xk−1, xk] = Σ_{i=0}^{k} f(xi) / ∏_{j=0, j≠i}^{k} (xi − xj),  k = 3, . . . , n.   (2.27)

In general we define Newton's kth divided difference of f(x) relative to {xi0, xi1, . . . , xik−1, xik}
recursively as follows:

f[xi0, xi1, xi2, . . . , xik−1, xik] = (f[xi1, xi2, . . . , xik−1, xik] − f[xi0, xi1, xi2, . . . , xik−1]) / (xik − xi0).   (2.28)
Now we can rewrite Newton's interpolating polynomial by plugging the expressions for the ak's
as Newton's divided differences into (2.19):

P(x) = f(x0) + f[x0, x1](x − x0) + f[x0, x1, x2](x − x0)(x − x1) + . . .
       + f[x0, x1, . . . , xn](x − x0)(x − x1) . . . (x − xn−1)
     = f(x0) + Σ_{i=1}^{n} (x − x0)(x − x1) . . . (x − xi−1) f[x0, x1, . . . , xi].   (2.29)

Thus to find the polynomial P(x) completely, we need to find f[x0, x1], f[x0, x1, x2], . . . ,
f[x0, x1, . . . , xn], that is, the divided differences of f of all orders from 1 to n. One can directly
calculate f[x0, x1], but to find f[x0, x1, x2] we also need the first divided difference of f relative to
the points x1, x2, that is, f[x1, x2]. And further, to find f[x0, x1, x2, x3] we need f[x1, x2, x3],
which requires the extra computation of f[x2, x3]. Proceeding in this way, we need to compute
the following table.

x      f        1st                                 2nd                                         . . .  nth
x0     f(x0)    f[x0,x1] = (f(x1)−f(x0))/(x1−x0)    f[x0,x1,x2] = (f[x1,x2]−f[x0,x1])/(x2−x0)   . . .  f[x0,...,xn]
x1     f(x1)    f[x1,x2] = (f(x2)−f(x1))/(x2−x1)    f[x1,x2,x3] = (f[x2,x3]−f[x1,x2])/(x3−x1)   . . .
x2     f(x2)    f[x2,x3] = (f(x3)−f(x2))/(x3−x2)    f[x2,x3,x4] = (f[x3,x4]−f[x2,x3])/(x4−x2)   . . .
. . .
xn−1   f(xn−1)  f[xn−1,xn]
xn     f(xn)

This table is of upper triangular form, and the coefficients a0 = f(x0), a1 = f[x0, x1],
a2 = f[x0, x1, x2], . . . , an = f[x0, x1, . . . , xn] are the entries of the first row. So by directly
substituting the values of the coefficients in (2.19), we obtain Newton's divided difference
interpolating polynomial.
Problem 2.3. Find Newton's interpolating polynomial for the following data.

f(0) = 3, f(1) = 3, f(3) = 27, f(4) = 63, f(5) = 123.

First we form the divided difference table for the above data.

x   f     1st                   2nd                   3rd                4th
0   3     (3−3)/(1−0) = 0       (12−0)/(3−0) = 4      (8−4)/(4−0) = 1    (1−1)/(5−0) = 0
1   3     (27−3)/(3−1) = 12     (36−12)/(4−1) = 8     (12−8)/(5−1) = 1
3   27    (63−27)/(4−3) = 36    (60−36)/(5−3) = 12
4   63    (123−63)/(5−4) = 60
5   123                                                                      (2.30)
Now we substitute the corresponding values of the divided differences into the formula for Newton's
interpolating polynomial of degree four:

P(x) = f(0) + f[0, 1](x − 0) + f[0, 1, 3](x − 0)(x − 1)
       + f[0, 1, 3, 4](x − 0)(x − 1)(x − 3) + f[0, 1, 3, 4, 5](x − 0)(x − 1)(x − 3)(x − 4)
     = 3 + 0(x − 0) + 4(x − 0)(x − 1) + 1(x − 0)(x − 1)(x − 3) + 0(x − 0)(x − 1)(x − 3)(x − 4)
     = 3 + 4x(x − 1) + x(x − 1)(x − 3).
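The table above is exactly what a small program would compute. A minimal Python sketch (the function names are ours) that builds the top row of the divided-difference table and evaluates (2.29) by nested multiplication:

```python
def newton_coefficients(xs, fs):
    """Top row of the divided-difference table: a_k = f[x_0, ..., x_k],
    computed column by column via the recursion (2.26)."""
    a = list(fs)
    n = len(xs)
    for k in range(1, n):
        # update from the bottom up so lower-order entries are still available
        for j in range(n - 1, k - 1, -1):
            a[j] = (a[j] - a[j - 1]) / (xs[j] - xs[j - k])
    return a

def newton_eval(x, xs, a):
    """Evaluate (2.29) by Horner-like nested multiplication."""
    p = a[-1]
    for k in range(len(a) - 2, -1, -1):
        p = p * (x - xs[k]) + a[k]
    return p

xs = [0, 1, 3, 4, 5]
fs = [3, 3, 27, 63, 123]
a = newton_coefficients(xs, fs)
print(a)                      # [3, 0.0, 4.0, 1.0, 0.0], as in table (2.30)
print(newton_eval(2, xs, a))  # P(2) = 3 + 4*2*1 + 2*1*(2-3) = 9
```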

2.4. Divided difference at a variable point (functional value in terms of divided differences)
In the following, we will try to find the divided difference of the function at some arbitrary point
of the interval. And we will see that the functional value at some given point can be expressed in
terms of divided difference at that point.
f[x, x0] = (f(x0) − f(x)) / (x0 − x)
⟹ f(x) = f(x0) + (x − x0) f[x, x0].   (2.31)

Further,

f[x, x0, x1] = (f[x0, x1] − f[x, x0]) / (x1 − x)
⟹ f[x, x0] = f[x0, x1] + (x − x1) f[x, x0, x1].   (2.32)

Substituting (2.32) in (2.31), we get

f(x) = f(x0) + (x − x0) f[x0, x1] + (x − x0)(x − x1) f[x, x0, x1].   (2.33)

Continuing this procedure up to the nth divided difference, we get inductively that

f(x) = f(x0) + (x − x0) f[x0, x1] + (x − x0)(x − x1) f[x0, x1, x2] + . . .
       + (x − x0)(x − x1) . . . (x − xn−1) f[x, x0, x1, . . . , xn−1].   (2.34)

But

f[x, x0, x1, . . . , xn] = (f[x0, x1, x2, . . . , xn] − f[x, x0, x1, . . . , xn−1]) / (xn − x)
⟹ f[x, x0, x1, . . . , xn−1] = f[x0, x1, x2, . . . , xn] + (x − xn) f[x, x0, x1, . . . , xn].

Thus from (2.34), we get

f(x) = f(x0) + (x − x0) f[x0, x1] + (x − x0)(x − x1) f[x0, x1, x2]
       + . . . + (x − x0)(x − x1) . . . (x − xn−1) f[x0, x1, x2, . . . , xn]
       + (x − x0)(x − x1) . . . (x − xn) f[x, x0, x1, . . . , xn].   (2.35)

Using the expression (2.29) for Newton's polynomial, we get from (2.35)

f(x) = P(x) + (x − x0)(x − x1) . . . (x − xn) f[x, x0, x1, . . . , xn].   (2.36)
2.5. Properties of divided differences

As we saw, the first divided difference relative to the points x0, x1 is defined as f[x0, x1] =
(f(x1) − f(x0))/(x1 − x0). This expression reminds us of the mean value theorem: if we also
assume differentiability of the function, the first divided difference f[x0, x1] is nothing but the
derivative of the function f at some point of the smallest open interval containing both x0 and x1.

Question 2.4. Can we say that the second divided difference is related to the second derivative of
the function? If yes, then how?

If we assume that the function is twice differentiable and x0 < x1 < x2, then we can write

f[x0, x1, x2] = (f[x1, x2] − f[x0, x1])/(x2 − x0) = (f'(ξ1) − f'(ξ0))/(x2 − x0)
             = [(ξ1 − ξ0)/(x2 − x0)] · (f'(ξ1) − f'(ξ0))/(ξ1 − ξ0) = [(ξ1 − ξ0)/(x2 − x0)] f''(ζ),   (2.37)

where ξ0 ∈ (x0, x1), ξ1 ∈ (x1, x2) and ζ ∈ (ξ0, ξ1).

Observation 2.2. Here we note that if ξ0, ξ1 are the middle points of the intervals (x0, x1) and
(x1, x2) respectively, then (ξ1 − ξ0)/(x2 − x0) = 1/2, so f[x0, x1, x2] = (1/2) f''(ζ).
We will return to this and prove a relation between divided differences and derivatives of the
function in Remark 2.12 of Theorem 2.4. Now we look at some other properties of divided
differences.
Using Exercise 2.1, one can see that Newton's second divided difference is independent of the
order of the node points and depends only on the set of all node points. If we consider the function
ω(x) = ∏_{j=0}^{n} (x − xj), then we can see that

f[x0, x1, x2, . . . , xn] = Σ_{i=0}^{n} f(xi)/ω'(xi).   (2.38)

We know that f[x, x0] = (f(x0) − f(x))/(x0 − x). This shows that f[x, x0] is a continuous function
of x for all x ∈ [a, b], x ≠ x0, but is possibly undefined at x0. If we take f to be differentiable at
x0, then we can define

f[x0, x0] = lim_{x→x0} f[x, x0] = lim_{x→x0} (f(x0) − f(x))/(x0 − x) = f'(x0).   (2.39)

Further, if we assume that f is once differentiable on the whole interval (a, b), then it is clear by
the mean value theorem that f[x0, x] = (f(x) − f(x0))/(x − x0) = f'(x̃) for some
x̃ ∈ (min{x0, x}, max{x0, x}). Moreover,

d/dx f[x0, x1, . . . , xn, x] = lim_{h→0} (f[x0, x1, . . . , xn, x + h] − f[x0, x1, . . . , xn, x])/h
                            = lim_{h→0} (f[x0, x1, . . . , xn, x + h] − f[x, x0, x1, . . . , xn])/((x + h) − x)
                            = lim_{h→0} f[x, x0, x1, . . . , xn, x + h]
                            = f[x, x0, x1, . . . , xn, x]
                            = f[x0, x1, . . . , xn, x, x].   (2.40)

Problem 2.4. Use induction to prove that

dᵏ/dxᵏ f[x0, x1, . . . , xn, x] = k! f[x0, x1, . . . , xn, x, x, . . . , x],   (2.41)

where x is repeated (k + 1) times on the right-hand side.
Now we further assume f to be twice differentiable and define

f[x0, x0, x0] = lim_{x1→x0, x2→x0} f[x0, x1, x2]
             = lim_{h→0, k→0} f[x0, x0 + h, x0 + k]
             = lim_{h→0, k→0} (f[x0 + h, x0 + k] − f[x0, x0 + h]) / ((x0 + k) − x0)
             = lim_{k→0} (f[x0, x0 + k] − f[x0, x0]) / k
             = lim_{k→0} ((f(x0 + k) − f(x0))/k − f'(x0)) / k
             = lim_{k→0} (f(x0 + k) − f(x0) − k f'(x0)) / k².

Using L'Hospital's Rule, we now have

f[x0, x0, x0] = lim_{k→0} (f(x0 + k) − f(x0) − k f'(x0)) / k²
             = lim_{k→0} (f'(x0 + k) − f'(x0)) / (2k)
             = (1/2) f''(x0).   (2.42)
Remark 2.10. Let f be n times differentiable. In a similar manner one can define
f[x0, x0, . . . , x0] = lim_{hi→0} f[x0, x0 + h1, . . . , x0 + hn], and using induction one can obtain that

f[x0, x0, . . . , x0] = (1/n!) f^(n)(x0),   (2.43)

where x0 is repeated (n + 1) times on the left-hand side.
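A quick numerical illustration of (2.43): over nearly coincident (but distinct) nodes, the divided difference approaches f^(n)(x0)/n!. A small Python sketch, taking f = exp as an example (the helper dd is ours):

```python
import math

def dd(xs, f):
    """Recursive divided difference over distinct points, per (2.28)."""
    if len(xs) == 1:
        return f(xs[0])
    return (dd(xs[1:], f) - dd(xs[:-1], f)) / (xs[-1] - xs[0])

# f[0, h, 2h, 3h] for f = exp should approach f'''(0)/3! = 1/6 ~ 0.1667
for h in (1e-1, 1e-2, 1e-3):
    print(h, dd([i * h for i in range(4)], math.exp))
```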
2.6. Error in Interpolation
Till now we have approximated a continuous real valued function f defined on the interval [a, b]
by interpolating polynomials Pn at node points x0 = a, x1, . . . , xn = b. But the interpolating
polynomial Pn does not necessarily match f at an arbitrary point of the interval other than the
node points, and we expect some difference between Pn(x̃) and f(x̃) at a point x̃ ∈ [a, b], x̃ ≠ xi.
This difference is known as the error in interpolation. Since by Theorem 2.1 there is a unique
interpolating polynomial Pn of degree at most n, we can define the error function as

En f (x) = f (x) − Pn (x). (2.44)

Remark 2.11. We note that Newton's divided difference interpolating polynomial Pn(x) given
by (2.29) interpolates the function f(x) at n + 1 distinct points and the degree of Pn(x) is at most
n, but Lagrange's polynomial L(x) given by (2.10) also interpolates the same data, so by Theorem
2.1 both L(x) and Pn(x) are the same polynomial written in two different ways. Therefore, for a
given continuous function f the error function defined by (2.44) depends only on the set of node
points. From expression (2.36) it is clear that on interpolating the function f(x) by Newton's
divided difference interpolating polynomial Pn(x), the error En f(x) can be expressed as

En f(x) = f(x) − Pn(x) = (x − x0)(x − x1) . . . (x − xn) f[x, x0, x1, . . . , xn].   (2.45)

Question 2.5. Can we estimate this error?


Theorem 2.4. Let f be an (n + 1)-times differentiable function defined on the interval [a, b]. Let
Pn be the unique interpolating polynomial of degree at most n, interpolating f at the n + 1 distinct
points x0, x1, x2, . . . , xn. Then the error in interpolation at a point x ∈ [a, b], x ≠ xi, is given by

En f(x) = (x − x0)(x − x1) . . . (x − xn) f^(n+1)(ξ)/(n + 1)!,   (2.46)

where ξ is some point in the interval (min{x, x0, x1, . . . , xn}, max{x, x0, x1, . . . , xn}) and depends
upon x.
Proof. Let x be some point in the interval other than the node points. Let φ(t) be a function
defined on the interval [a, b] by

φ(t) = f(t) − Pn(t) − (t − x0)(t − x1) . . . (t − xn) K,   (2.47)

where K is a constant determined by the equation φ(x) = 0, that is,

f(x) − Pn(x) − (x − x0)(x − x1) . . . (x − xn) K = 0.   (2.48)

It is clear that φ(xi) = 0 for all i = 0, 1, 2, . . . , n. Thus the function φ has n + 2 zeros, namely
{x, x0, x1, . . . , xn}. Hence its (n + 1)th derivative has at least one zero at some point ξ in the
interval (min{x, x0, x1, . . . , xn}, max{x, x0, x1, . . . , xn}), that is, φ^(n+1)(ξ) = 0. Therefore,

0 = φ^(n+1)(ξ) = f^(n+1)(ξ) − 0 − (n + 1)! K.   (2.49)

This implies that K = f^(n+1)(ξ)/(n + 1)!. And hence from (2.47) and (2.48),

En f(x) = f(x) − Pn(x) = (x − x0)(x − x1) . . . (x − xn) f^(n+1)(ξ)/(n + 1)!.   (2.50)

Remark 2.12. Equating (2.45) and (2.46), we get

f[x, x0, x1, . . . , xn] = f^(n+1)(ξ)/(n + 1)!,   (2.51)

where ξ ∈ (min{x, x0, x1, . . . , xn}, max{x, x0, x1, . . . , xn}).
Corollary 2.5. If Pm(x) = a0 + a1 x + a2 x² + . . . + am xᵐ is of degree m, then the mth divided
difference of Pm is am and the (m + 1)th divided difference is zero.

Proof. For f(x) = Pm(x), we have from (2.51) that Pm[x, x0, x1, . . . , xn] = Pm^(n+1)(ξ)/(n + 1)!.
Thus Pm[x, x0, x1, . . . , xm] = Pm^(m+1)(ξ)/(m + 1)! = 0, or Pm[x0, x1, . . . , xm, xm+1] = 0. Similarly
Pm[x, x0, x1, . . . , xm−1] = Pm^(m)(ξ)/m! = am, or Pm[x0, x1, . . . , xm−1, xm] = am.
2.7. Bound on the Error in Interpolation
We note from (2.46) that the error in interpolation by a degree n polynomial is given by

En f(x) = (x − x0)(x − x1) . . . (x − xn) f^(n+1)(ξ)/(n + 1)! = ω(x) f^(n+1)(ξ)/(n + 1)!.   (2.52)

If we denote

Mn+1 = max_{x∈[a,b]} |f^(n+1)(x)|,   (2.53)

then the error is bounded by

|En f(x)| ≤ |(x − x0)(x − x1) . . . (x − xn)| Mn+1/(n + 1)!.   (2.54)

If we somehow know a bound for |(x − x0)(x − x1) . . . (x − xn)| on [a, b], or the value of
max_{x∈[a,b]} |(x − x0)(x − x1) . . . (x − xn)|, we can find a uniform bound for the error function.
Certainly the maximum and minimum of the continuous function ω(x) = (x − x0)(x − x1) . . . (x − xn)
are attained at some points of the closed interval [a, b], and we can bound En f(x) by
(Mn+1/(n + 1)!) max{|minimum|, |maximum|}. Since points of global maximum or minimum are
also points of local maximum or minimum, if we know all the points of local minimum and
maximum, that is, the critical points, we only need to compute the moduli of the functional values
at these critical points and take the largest of those values. But we know that the critical points of
a differentiable function are given by the roots of its derivative. In our case the critical points are
given by the roots of the derivative of the polynomial ω(x), that is, by the solutions of the equation
ω'(x) = 0. But finding these roots might not be an easy task if there is a large number of node
points.
Problem 2.5. Find a bound on the error in interpolating the degree five polynomial x⁵ − x³ + 1
by Lagrange's interpolating polynomial of degree two at the points 0, 2, 3.

Solution. We proceed in the following steps.

• Since the number of node points is three, we interpolate by a polynomial of degree two.
• Since we are interpolating the function at the given node points, the interval of concern is
the smallest interval containing all the nodes; in our case this is [0, 3].
• Since the error function in our case is E(x) = (x − 0)(x − 2)(x − 3) f'''(ξ)/3!, to bound it, it is
enough to bound (x − 0)(x − 2)(x − 3) and f'''(x) on the interval [0, 3].
• We have f'''(x) = 60x² − 6. Since the derivative of 60x² − 6 is non-negative on the interval
[0, 3], f''' is a non-decreasing function. Thus the maximum of |f'''(x)| on the interval is
attained at one of the end points, and clearly this maximum is |f'''(3)| = 534.
• Further, we need the maximum of |x³ − 5x² + 6x| on [0, 3]. This can be attained either at an
end point of the interval or at a local maximum or minimum. But at the end points
ω(x) = x³ − 5x² + 6x is zero, so we only need to consider the critical points of ω(x) in [0, 3].
These are obtained by solving 3x² − 10x + 6 = 0, so the points of extremum are
xe = (5 ± √7)/3 ≈ 2.55, 0.78, and the corresponding local extremum values are
ω(xe) ≈ −0.63, 2.11. Thus the maximum value of |x³ − 5x² + 6x| on [0, 3] is approximately 2.11.
• Thus,

|E(x)| = |(x − 0)(x − 2)(x − 3)| |f'''(ξ)|/3! ≤ 2.11 × (1/6) × 534 ≈ 188.
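These figures are easy to double check numerically. A short Python sketch (NumPy assumed) samples the actual error and compares it with the bound:

```python
import numpy as np

f = lambda x: x**5 - x**3 + 1

def P2(x):
    # Lagrange polynomial (2.8) at the nodes 0, 2, 3
    return (f(0) * (x - 2) * (x - 3) / ((0 - 2) * (0 - 3))
            + f(2) * (x - 0) * (x - 3) / ((2 - 0) * (2 - 3))
            + f(3) * (x - 0) * (x - 2) / ((3 - 0) * (3 - 2)))

xs = np.linspace(0.0, 3.0, 100001)
print(np.max(np.abs(f(xs) - P2(xs))))  # observed max error, roughly 48,
                                       # safely below the bound of about 188
```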

Remark 2.13. Thus we see in the above problem that the error in interpolation might be
significantly large. But suppose we have the freedom to choose the positions of the node points
(not their number); can we then control the error?

Question 2.6. More precisely, suppose we are given a smooth function f, and we aim to interpolate
f at n + 1 node points. Further suppose that the choice of node points is in our hands. Can we
choose these n + 1 nodes in such a way that the maximum error on the smallest interval containing
all the nodes is controlled?
Suppose we assume that the node points are equally spaced, that is, a = x0 < x1 = x0 + h <
x2 = x0 + 2h < . . . < xn = x0 + nh = b, and that we somehow know that |f^(n+1)(x)| ≤ Mn+1.
Then it is clear from (2.54) that the error function is bounded as follows:

|En f(x)| ≤ |(x − x0)(x − x0 − h) . . . (x − x0 − nh)| Mn+1/(n + 1)!.   (2.55)

Clearly ω(x) = (x − x0)(x − x0 − h) . . . (x − x0 − nh) is independent of the function. Here we
will try to find a bound for ω(x) on the smallest interval containing all the nodes, that is,
[x0, x0 + nh]. We will find the bound for ω(x) in the three cases of linear (n = 1), quadratic (n = 2),
and cubic (n = 3) interpolation.
In the case of linear interpolation it is easy to see that the maximum value of |(x − x0)(x − x1)|
is attained at x = (x0 + x1)/2 and equals (x1 − x0)²/4. Thus the bound for the error in
interpolation by a linear polynomial (a line) is M2(x1 − x0)²/8, that is,

|E1 f(x)| ≤ M2 (x1 − x0)²/8.   (2.56)

In the case of quadratic interpolation we need the maximum value of |ω(x)| = |(x − x0)(x −
x1)(x − x2)|. This maximum is attained at one of the roots of the quadratic polynomial ω'(x),
which certainly has two real roots. In the case of equidistant points we need the maximum value
of |ω(x)| = |(x − x0)(x − x0 − h)(x − x0 − 2h)| on the interval [x0, x0 + 2h]. This extremum is
attained at some critical point given by ω'(x) = 0, i.e. (x − x0 − h)(x − x0 − 2h) +
(x − x0)(x − x0 − 2h) + (x − x0)(x − x0 − h) = 0. For simplification we reparameterize the curve
ω(x) with the origin at the middle point of the interval [x0, x0 + 2h] by setting x − x0 = (t + 1)h;
then x ∈ [x0, x0 + 2h] ⇔ t ∈ [−1, 1] and we need to solve t(t − 1) + (t + 1)(t − 1) + (t + 1)t = 0,
that is, 3t² − 1 = 0. Thus t = ±1/√3, or x = x0 + h ± h/√3. At both of these values of x,
|ω(x)| = 2h³/√27. Thus from (2.55) it is clear that

|E2 f(x)| ≤ (M3/3!) · 2h³/√27.   (2.57)

Now we can bound E2 f by any given small positive number by choosing the step size h accordingly
small.
Similarly, in the cubic case, if we know M4, then to bound E3 f we need to control |ω(x)| =
|(x − x0)(x − x0 − h)(x − x0 − 2h)(x − x0 − 3h)|, so we need to solve ω'(x) = 0. For simplification
we reparameterize the curve ω(x) with the origin at the center of the interval [x0, x0 + 3h]. Let
x − x0 = (t + 3/2)h; then x ∈ [x0, x0 + 3h] ⇔ t ∈ [−3/2, 3/2] and we need to solve
d/dt[(t² − 9/4)(t² − 1/4)] = 0, which gives t = 0, ±√5/2. The maximum of |(t² − 9/4)(t² − 1/4)|
on [−3/2, 3/2] is 1, attained at t = ±√5/2, so the maximum value of |ω(x)| is h⁴. And hence

|E3 f(x)| ≤ M4 h⁴/4!.   (2.58)
Problem 2.6. Determine the maximum step size that can be used in the tabulation of f(x) = eˣ
on [0, 1] so that the error in cubic interpolation is less than 5 × 10⁻⁴.

Solution. Since we want to approximate f by a degree three polynomial, we need to bound the
error function E3 f using (2.58). For this we need M4, the maximum of the fourth derivative of
f(x) = eˣ on the interval [0, 1]. But f''''(x) = eˣ, so M4 = e. Since we want the error to be
bounded by 5 × 10⁻⁴, we need a step size h such that

|E3 f(x)| ≤ M4 h⁴/4! = (e/24) h⁴ ≤ 5 × 10⁻⁴.

So we need

h⁴ ≤ (5 × 24 × 10⁻⁴)/e ≈ 0.004414553,

or

h ≤ (0.004414553)^{0.25} ≈ 0.25776.
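In Python the same computation takes two lines (only the standard math module is needed):

```python
import math

M4 = math.e                      # max of |f''''| = e^x on [0, 1]
h = (5e-4 * 24 / M4) ** 0.25     # solve (M4/4!) h^4 <= 5e-4 for h
print(h)                         # ~0.25776, as above
```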
Observation 2.3. Suppose we approximate the function f(x) = x³ − x² + 2 at the nodes
−1, 0, 1. The degree two polynomial which satisfies f(−1) = 0, f(0) = 2, f(1) = 2 is given by
P(x) = −x² + x + 2. If we draw the graphs of these functions, we observe that the graphs intersect
each other at x = 1, but move apart rapidly as soon as they leave x = 1. This is because they
intersect perpendicularly at x = 1 (P'(1) = −1, f'(1) = 1). So it is desirable to approximate the
function with a polynomial which not only agrees with the function at the nodes but whose
derivative also agrees with the derivative of the function at the node points.

2.8. Hermite Interpolation


Problem 2.7. Find a polynomial P(x) of minimal degree which agrees with the function f along
the data

P(x0) = f(x0), P'(x0) = f'(x0), P(x1) = f(x1), P'(x1) = f'(x1).   (2.59)

Solution. Since there are four conditions, it is reasonable to consider the degree three polynomial
P(x) = a0 + a1 x + a2 x² + a3 x³, with four unknown coefficients (a0, a1, a2, a3) to be determined
by the given data (2.59). Since P(x) satisfies (2.59), we have

a0 + a1 x0 + a2 x0² + a3 x0³ = f(x0)
     a1 + 2a2 x0 + 3a3 x0²   = f'(x0)
a0 + a1 x1 + a2 x1² + a3 x1³ = f(x1)
     a1 + 2a2 x1 + 3a3 x1²   = f'(x1)

Or in matrix form,

[ 1  x0  x0²  x0³  ] [ a0 ]   [ f(x0)  ]
[ 0  1   2x0  3x0² ] [ a1 ] = [ f'(x0) ]   (2.60)
[ 1  x1  x1²  x1³  ] [ a2 ]   [ f(x1)  ]
[ 0  1   2x1  3x1² ] [ a3 ]   [ f'(x1) ]

One can check that the determinant of the coefficient matrix is (x1 − x0)⁴ ≠ 0, so the above system
of four equations has a unique solution. But finding the solution this way is a tedious task. (So we
leave it to the very enthusiastic reader.)
Question 2.7. Can we find the above polynomial in terms of Lagrange's fundamental polynomials?
That is, can we find values ai, bi, ci, di, i = 0, 1, 2, . . . , n, so that the polynomial

P(x) = Σ_{i=0}^{n} [ (ai x + bi) Li²(x) f(xi) + (ci x + di) Li²(x) f'(xi) ]   (2.61)

satisfies P(xi) = f(xi) and P'(xi) = f'(xi) for all i = 0, 1, 2, . . . , n?


Problem 2.8. Suppose we denote

Hi(x) = (ai x + bi) Li²(x),   (2.62)

and

Ki(x) = (ci x + di) Li²(x),   (2.63)

then find ai, bi, ci, di such that

Hi(xj) = δij,  Ki(xj) = 0,  Hi'(xj) = 0,  Ki'(xj) = δij,   (2.64)

so that the polynomial

P(x) = Σ_{i=0}^{n} [ Hi(x) f(xi) + Ki(x) f'(xi) ]   (2.65)

automatically satisfies the conditions P(xi) = f(xi) and P'(xi) = f'(xi) for all i = 0, 1, 2, . . . , n.

Solution. We use the definitions of Hi, Ki in (2.64) to obtain (for j ≠ i)

(ai xi + bi) Li²(xi) = 1   (2.66)
(ai xj + bi) Li²(xj) = 0   (2.67)
ai Li²(xi) + (ai xi + bi) 2 Li(xi) Li'(xi) = 0   (2.68)
ai Li²(xj) + (ai xj + bi) 2 Li(xj) Li'(xj) = 0   (2.69)
(ci xi + di) Li²(xi) = 0   (2.70)
(ci xj + di) Li²(xj) = 0   (2.71)
ci Li²(xi) + (ci xi + di) 2 Li(xi) Li'(xi) = 1   (2.72)
ci Li²(xj) + (ci xj + di) 2 Li(xj) Li'(xj) = 0   (2.73)

Clearly (2.67), (2.69), (2.71) and (2.73) are automatically satisfied because Li(xj) = 0 for j ≠ i.
And to satisfy the other conditions (recall Li(xi) = 1), we need

ai xi + bi = 1   (2.74)
ai + (ai xi + bi) 2 Li'(xi) = 0   (2.75)
ci xi + di = 0   (2.76)
ci + (ci xi + di) 2 Li'(xi) = 1   (2.77)

Solving these, we get

ai = −2 Li'(xi)   (2.78)
bi = 1 + 2 xi Li'(xi)   (2.79)
ci = 1   (2.80)
di = −xi   (2.81)

Thus P(x) is given by

P(x) = Σ_{i=0}^{n} [1 − 2(x − xi) Li'(xi)] Li²(x) f(xi) + Σ_{i=0}^{n} (x − xi) Li²(x) f'(xi).   (2.82)

This polynomial is called the Hermite polynomial. Here we can write Li'(xi) in terms of ω(x) as
follows:

Li'(xi) = ω''(xi) / (2 ω'(xi)).   (2.83)
Remark 2.14. In Hermite interpolation there are n + 1 nodes, and the interpolating polynomial
should satisfy 2(n + 1) equations. This suggests that our P(x) should be of degree at most 2n + 1,
and clearly from (2.82), P(x) satisfies this minimal degree criterion.
Theorem 2.6. (Uniqueness of the Hermite Interpolating Polynomial) If we know the value
of a real valued function f and its derivative f' at n + 1 distinct points x0 < x1 < . . . < xn, then
there exists exactly one polynomial of degree at most 2n + 1 which satisfies the data P(xi) = f(xi)
and P'(xi) = f'(xi) for all i = 0, 1, 2, . . . , n.

Proof. Clearly, the existence of such a polynomial is given by (2.82). We only need to prove the
uniqueness. Suppose there is some polynomial Q(x) ≠ P(x) of degree at most 2n + 1 such that

P(xi) = f(xi) = Q(xi),  P'(xi) = f'(xi) = Q'(xi).

Then φ(x) = P(x) − Q(x) ≠ 0 is a nonzero polynomial of degree at most 2n + 1 such that
φ(xi) = 0 and φ'(xi) = 0. But φ(xi) = 0 for all i = 0, 1, 2, . . . , n implies that φ'(ξi) = 0 for
some ξi ∈ (xi−1, xi), for all i = 1, 2, . . . , n. Thus we have found n distinct zeros of φ' other than
the xi's. This shows that φ' has 2n + 1 distinct zeros, which contradicts the fact that φ'(x) is a
nonzero polynomial of degree at most 2n (see Corollary 2.3). We note that φ'(x) cannot be a zero
polynomial, because φ(x) cannot be a nonzero constant polynomial as φ(xi) = 0.
We can also compute the error in Hermite interpolation.
Theorem 2.7. Let f be a (2n + 2)-times differentiable real valued function defined on the interval
[a, b]. Let P2n+1 be the unique Hermite interpolating polynomial of degree at most 2n + 1,
satisfying

P2n+1(xi) = f(xi),  P2n+1'(xi) = f'(xi),  i = 0, 1, 2, . . . , n.   (2.84)

Then the error in interpolation at a point x ∈ [a, b], x ≠ xi, is given by

E2n+1 f(x) = (x − x0)²(x − x1)² . . . (x − xn)² f^(2n+2)(ξ)/(2n + 2)! = ω²(x) f^(2n+2)(ξ)/(2n + 2)!,   (2.85)

where ξ is some point in the smallest interval I containing all the node points and depends upon
x. And the error bound on I is given by

|E2n+1 f(x)| ≤ max_{x∈I} |ω²(x)| · M2n+2/(2n + 2)!.   (2.86)
Proof. Let x be some point in the interval other than the node points. Let φ(t) be a function
defined on the interval [a, b] by

φ(t) = f(t) − P2n+1(t) − (t − x0)²(t − x1)² . . . (t − xn)² K,   (2.87)

where K is a constant determined by the equation φ(x) = 0, that is,

f(x) − P2n+1(x) − (x − x0)²(x − x1)² . . . (x − xn)² K = 0.   (2.88)

It is clear that φ(xi) = 0 for all i = 0, 1, 2, . . . , n. Thus the function φ has n + 2 zeros, namely
{x, x0, x1, . . . , xn}. Hence φ' must have at least n + 1 zeros at points other than
{x, x0, x1, . . . , xn}. Moreover φ'(xi) = 0 for all i = 0, 1, 2, . . . , n. Thus φ' has at least 2n + 2
distinct zeros. Hence the (2n + 1)th derivative of φ', that is, φ^(2n+2), must have at least one zero
ξ in the interval I. Therefore,

0 = φ^(2n+2)(ξ) = f^(2n+2)(ξ) − 0 − (2n + 2)! K.   (2.89)

This implies that K = f^(2n+2)(ξ)/(2n + 2)!. And hence from (2.87) and (2.88),

E2n+1 f(x) = f(x) − P2n+1(x) = (x − x0)²(x − x1)² . . . (x − xn)² f^(2n+2)(ξ)/(2n + 2)!.   (2.90)

Clearly (2.86) follows if we take M2n+2 = max_{x∈I} |f^(2n+2)(x)|.


Problem 2.9. Use Lagrange's fundamental polynomials to find the Hermite interpolating
polynomial for the following data.

f(0) = 2, f'(0) = 0, f(1) = 2, f'(1) = 1.


Solution. Let x0 = 0, x1 = 1. Then L0(x) = (x − x1)/(x0 − x1) = (x − 1)/(0 − 1) = 1 − x, and
similarly L1(x) = (x − x0)/(x1 − x0) = (x − 0)/(1 − 0) = x.
Thus L0'(x0) = L0'(0) = −1 and L1'(x1) = L1'(1) = 1. Using

P(x) = Σ_{i=0}^{n} [1 − 2(x − xi) Li'(xi)] Li²(x) f(xi) + Σ_{i=0}^{n} (x − xi) Li²(x) f'(xi),

we get

P3(x) = Σ_{i=0}^{1} [1 − 2(x − xi) Li'(xi)] Li²(x) f(xi) + Σ_{i=0}^{1} (x − xi) Li²(x) f'(xi)
      = [1 − 2(x − 0)L0'(0)] L0²(x) f(0) + [1 − 2(x − 1)L1'(1)] L1²(x) f(1)
        + (x − 0) L0²(x) f'(0) + (x − 1) L1²(x) f'(1)
      = [1 − 2x(−1)](1 − x)² · 2 + [1 − 2(x − 1) · 1] x² · 2 + x(1 − x)² · 0 + (x − 1) x² · 1
      = 2(1 + 2x)(x − 1)² + 2(3 − 2x)x² + (x − 1)x²
      = x³ − x² + 2.
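Formula (2.82) translates directly into code. A minimal Python sketch (names ours), using the fact that Li'(xi) = Σ_{j≠i} 1/(xi − xj):

```python
def hermite_eval(x, xs, fs, dfs):
    """Evaluate the Hermite polynomial (2.82) at x, given nodes xs,
    values fs = f(x_i) and slopes dfs = f'(x_i)."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        Li = 1.0    # L_i(x)
        dLi = 0.0   # L_i'(x_i) = sum over j != i of 1/(x_i - x_j)
        for j in range(n):
            if j != i:
                Li *= (x - xs[j]) / (xs[i] - xs[j])
                dLi += 1.0 / (xs[i] - xs[j])
        total += (1 - 2 * (x - xs[i]) * dLi) * Li**2 * fs[i]
        total += (x - xs[i]) * Li**2 * dfs[i]
    return total

# Problem 2.9: f(0)=2, f'(0)=0, f(1)=2, f'(1)=1; expect x^3 - x^2 + 2
for x in (0.0, 0.5, 1.0):
    print(x, hermite_eval(x, [0.0, 1.0], [2.0, 2.0], [0.0, 1.0]))
# at x = 0.5 this prints 1.875 = 0.125 - 0.25 + 2
```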

2.9. Newton’s method to find Hermite Interpolating Polynomial

We consider Problem 2.7 with the data (2.59) again, and assume that the polynomial of degree
three is of the following form:

P(x) = a0 + a1(x − x0) + a2(x − x0)² + a3(x − x0)²(x − x1).   (2.91)

If this polynomial satisfies the data P(x0) = f(x0), P'(x0) = f'(x0), P(x1) = f(x1), P'(x1) =
f'(x1), then we must have

a0 = f(x0)
a1 = f'(x0)
a0 + a1(x1 − x0) + a2(x1 − x0)² = f(x1)
a1 + 2a2(x1 − x0) + a3(x1 − x0)² = f'(x1)

If we assume that the function is smooth enough (as many times differentiable as we want), then
f'(x0) can be written as f[x0, x0], and f'(x1) = f[x1, x1]. Hence we have f(x0) + f[x0, x0](x1 −
x0) + a2(x1 − x0)² = f(x1), or f[x0, x0] + a2(x1 − x0) = f[x0, x1], or a2 = f[x0, x0, x1]. Further,
from the last equation we have a1 + a2(x1 − x0) + a2(x1 − x0) + a3(x1 − x0)² = f[x1, x1], or
f[x0, x1] + a2(x1 − x0) + a3(x1 − x0)² = f[x1, x1], or f[x0, x0, x1](x1 − x0) + a3(x1 − x0)² =
f[x1, x1] − f[x0, x1], or f[x0, x0, x1] + a3(x1 − x0) = f[x0, x1, x1], or a3 = f[x0, x0, x1, x1]. Thus

P(x) = f(x0) + f[x0, x0](x − x0) + f[x0, x0, x1](x − x0)² + f[x0, x0, x1, x1](x − x0)²(x − x1).   (2.92)

Remark 2.15. In general, if there are n + 1 nodes one can compute Newton's divided difference
table with each node considered twice, that is, with 2n + 2 nodes in total. To compute the first
divided difference relative to the repeated points xi, xi we directly use the given data, because
f[xi, xi] = f'(xi).
Example 2.9. We will use Newton's extended divided difference table to find a polynomial which
satisfies the following data, in which at each node we need to match derivatives up to a certain
order:

P(0) = 1, P'(0) = 2, P''(0) = −2, P(1) = 3, P'(1) = 5, P''(1) = 18, P'''(1) = 60.
z   f(z)  1st                        2nd                       3rd
0   1     f[0,0] = f'(0) = 2         f[0,0,0] = f''(0)/2 = −1  f[0,0,0,1] = (0−(−1))/1 = 1
0   1     f[0,0] = f'(0) = 2         f[0,0,1] = (2−2)/1 = 0    f[0,0,1,1] = (3−0)/1 = 3
0   1     f[0,1] = (3−1)/(1−0) = 2   f[0,1,1] = (5−2)/1 = 3    f[0,1,1,1] = (9−3)/1 = 6
1   3     f[1,1] = f'(1) = 5         f[1,1,1] = f''(1)/2 = 9   f[1,1,1,1] = f'''(1)/6 = 10
1   3     f[1,1] = f'(1) = 5         f[1,1,1] = f''(1)/2 = 9
1   3     f[1,1] = f'(1) = 5
1   3

4th: f[0,0,0,1,1] = (3−1)/1 = 2,  f[0,0,1,1,1] = (6−3)/1 = 3,  f[0,1,1,1,1] = (10−6)/1 = 4;
5th: f[0,0,0,1,1,1] = (3−2)/1 = 1,  f[0,0,1,1,1,1] = (4−3)/1 = 1;
6th: f[0,0,0,1,1,1,1] = (1−1)/1 = 0.
P (x) = f (0) + f [0, 0]x + f [0, 0, 0]x2 + f [0, 0, 0, 1]x3 + f [0, 0, 0, 1, 1]x3 (x − 1) + f [0, 0, 0, 1, 1, 1]x3 (x −
1)2 + f [0, 0, 0, 1, 1, 1, 1]x3 (x − 1)3 = x5 − x2 + 2x + 1.
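The extended table can be automated as well. A Python sketch under the convention of Remark 2.15 (each node repeated as many times as there are known derivatives; names ours):

```python
from math import factorial

def extended_newton_coeffs(zs, dervals):
    """Extended divided-difference table with repeated nodes.
    zs lists every node with multiplicity, e.g. [0,0,0,1,1,1,1];
    dervals[i] = [f(z), f'(z), ...] for the point zs[i] (the same list
    repeated within a block). Returns the top row [f[z0], f[z0,z1], ...]."""
    n = len(zs)
    col = [dervals[i][0] for i in range(n)]
    coeffs = [col[0]]
    for k in range(1, n):
        new = []
        for i in range(n - k):
            if zs[i] == zs[i + k]:
                # all k+1 nodes coincide: f[x,...,x] = f^(k)(x)/k! by (2.43)
                new.append(dervals[i][k] / factorial(k))
            else:
                new.append((col[i + 1] - col[i]) / (zs[i + k] - zs[i]))
        col = new
        coeffs.append(col[0])
    return coeffs

zs = [0, 0, 0, 1, 1, 1, 1]
dv = [[1, 2, -2]] * 3 + [[3, 5, 18, 60]] * 4
print(extended_newton_coeffs(zs, dv))  # [1, 2.0, -1.0, 1.0, 2.0, 1.0, 0.0]
```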

Remark 2.16. One cannot use Newton's extended divided difference table if the value of some
higher order derivative is known at a node while the value of a lower order derivative is not. For
instance, there is no polynomial of degree three such that P(0) = 1, P''(0) = 2, P''(1) = 0,
P'''(1) = 1.
2.10. Piecewise Linear Interpolation
In practice it has been observed that approximating a given function by linear pieces is often
better than using a single polynomial of higher degree. Let f be a nice function with
f(x0), f(x1), . . . , f(xn) known at n + 1 distinct points x0 < x1 < . . . < xn. We aim to find n lines
si, i = 1, 2, . . . , n, such that each si interpolates the function at the end points of the interval
[xi−1, xi], that is,

si(xi−1) = f(xi−1), si(xi) = f(xi), i = 1, 2, . . . , n.   (2.93)

Using Lagrange's linear interpolating polynomial we get

si(x) = f(xi−1) (x − xi)/(xi−1 − xi) + f(xi) (x − xi−1)/(xi − xi−1)  for x ∈ [xi−1, xi],   (2.94)
si(x) = 0  for x ∉ [xi−1, xi].

Thus P1(x) = Σ_{i=1}^{n} si(x) is the desired piecewise linear interpolant satisfying the given data.
If we define the shape functions

Ni(x) = 0                            for x ≤ xi−1,
Ni(x) = (x − xi−1)/(xi − xi−1)       for xi−1 ≤ x ≤ xi,
Ni(x) = (x − xi+1)/(xi − xi+1)       for xi ≤ x ≤ xi+1,
Ni(x) = 0                            for x ≥ xi+1,   (2.95)

then P1(x) = Σ_{i=0}^{n} Ni(x) f(xi) (with N0 and Nn truncated at the ends of the interval in the
obvious way). The error in piecewise linear interpolation on [xi−1, xi] is given by

E1 f(x) = (1/2!)(x − xi−1)(x − xi) f''(ξ),  ξ ∈ [xi−1, xi].
Remark 2.17. Here it is important to note that on each subinterval the expression for the error
function is different. So to find a uniform error bound we need to take the maximum of the error
bounds on the different subintervals. Thus

|Ef(x)| ≤ (M2/2) max_i { |xi − xi−1|²/4 } = (M2/8) max_i |xi − xi−1|².   (2.96)
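A sketch of the evaluation in Python (bisect from the standard library locates the subinterval; names ours):

```python
import bisect

def piecewise_linear(x, xs, fs):
    """Evaluate the piecewise linear interpolant (2.94) at x,
    where xs is sorted and fs[i] = f(xs[i])."""
    i = max(1, min(len(xs) - 1, bisect.bisect_right(xs, x)))
    x0, x1 = xs[i - 1], xs[i]
    return fs[i - 1] * (x - x1) / (x0 - x1) + fs[i] * (x - x0) / (x1 - x0)

xs = [0.0, 0.5, 1.0, 1.5]
fs = [x * x for x in xs]               # samples of f(x) = x^2
print(piecewise_linear(0.75, xs, fs))  # 0.625; the true value is 0.5625
# The error 0.0625 equals M2 h^2 / 8 = 2 * 0.25 / 8, attained at a midpoint.
```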

2.11. Cubic Spline Interpolation

Here we aim to find n cubic polynomials si, i = 1, 2, . . . , n, such that each si interpolates the
function f at the node points xi−1, xi of the interval [xi−1, xi], and such that the spline s, obtained
by joining the cubic pieces at the node points, is not only continuous on the interval (x0, xn) but
has s' and s'' continuous there as well. These conditions can be stated mathematically as follows.

• si(xi−1) = f(xi−1) for all i = 1, 2, . . . , n,
• si(xi) = f(xi) for all i = 1, 2, . . . , n,
• si'(xi) = si+1'(xi) for all i = 1, 2, . . . , n − 1,
• si''(xi) = si+1''(xi) for all i = 1, 2, . . . , n − 1.

These are in total $4n - 2$ equations, but to obtain the polynomials uniquely we need two more conditions. These two conditions are known as boundary conditions. In practice there are two common types of boundary conditions. First, the free (natural) boundary conditions are given by
\[
s_1''(x_0) = s_n''(x_n) = 0. \tag{2.97}
\]
And the clamped boundary conditions are given by
\[
s_1'(x_0) = f'(x_0) \quad\text{and}\quad s_n'(x_n) = f'(x_n). \tag{2.98}
\]
At the node points we tie together two distinct polynomials, so these nodes are sometimes also called knots. Since $f(x_i)$ is given, in the first two sets of conditions both sides are fixed. But in the last two sets of conditions the common value of the two sides is yet to be determined, so that the $s_i$ satisfy all the conditions simultaneously. For notational simplicity we introduce the new variables
\[
m_0 = s_1'(x_0), \qquad M_0 = s_1''(x_0), \tag{2.99}
\]
\[
m_i = s_i'(x_i) = s_{i+1}'(x_i), \qquad i = 1, 2, \dots, n-1, \tag{2.100}
\]
\[
M_i = s_i''(x_i) = s_{i+1}''(x_i), \qquad i = 1, 2, \dots, n-1, \tag{2.101}
\]
\[
m_n = s_n'(x_n), \qquad M_n = s_n''(x_n). \tag{2.102}
\]
Thus all four sets of conditions will be satisfied automatically if we demand that each cubic polynomial $s_i$ satisfy the following conditions:
\[
s_i(x_{i-1}) = f(x_{i-1}), \quad s_i(x_i) = f(x_i), \quad s_i'(x_{i-1}) = m_{i-1}, \quad s_i'(x_i) = m_i, \quad s_i''(x_{i-1}) = M_{i-1}, \quad s_i''(x_i) = M_i.
\]
Since on the interval $[x_{i-1}, x_i]$ we want $s_i$ to be a cubic polynomial, $s_i''$ must be a linear polynomial, and to satisfy the last two conditions Lagrange's formula gives
\[
s_i''(x) = \frac{x - x_i}{x_{i-1} - x_i}\,M_{i-1} + \frac{x - x_{i-1}}{x_i - x_{i-1}}\,M_i.
\]
Now if we write $h_i = x_i - x_{i-1}$, then integrating twice,
\[
s_i(x) = \frac{(x - x_i)^3}{-6h_i}M_{i-1} + \frac{(x - x_{i-1})^3}{6h_i}M_i + a_ix + b_i.
\]
And since $s_i(x_{i-1}) = f(x_{i-1})$ and $s_i(x_i) = f(x_i)$, we must have
\[
f(x_{i-1}) = \frac{h_i^2M_{i-1}}{6} + a_ix_{i-1} + b_i \qquad\text{and}\qquad f(x_i) = \frac{h_i^2M_i}{6} + a_ix_i + b_i.
\]
Solving these equations we get
\[
a_i = \frac{f(x_i) - f(x_{i-1})}{h_i} - \frac{M_i - M_{i-1}}{6}h_i, \qquad
b_i = \frac{x_if(x_{i-1}) - x_{i-1}f(x_i)}{h_i} - \frac{x_iM_{i-1} - x_{i-1}M_i}{6}h_i.
\]
Substituting these values of $a_i$, $b_i$ in the expression for $s_i$, we get
\[
s_i(x) = \frac{(x - x_i)^3}{-6h_i}M_{i-1} + \frac{(x - x_{i-1})^3}{6h_i}M_i + \left(\frac{f(x_i) - f(x_{i-1})}{h_i} - \frac{M_i - M_{i-1}}{6}h_i\right)x + \frac{x_if(x_{i-1}) - x_{i-1}f(x_i)}{h_i} - \frac{x_iM_{i-1} - x_{i-1}M_i}{6}h_i. \tag{2.103}
\]

This gives
\[
s_i'(x) = \frac{(x - x_i)^2}{-2h_i}M_{i-1} + \frac{(x - x_{i-1})^2}{2h_i}M_i + \frac{f(x_i) - f(x_{i-1})}{h_i} - \frac{M_i - M_{i-1}}{6}h_i. \tag{2.104}
\]
Further we use the condition $s_i'(x_i) = s_{i+1}'(x_i)$ to get
\[
h_iM_{i-1} + 2(h_i + h_{i+1})M_i + h_{i+1}M_{i+1} = 6f[x_i, x_{i+1}] - 6f[x_{i-1}, x_i]. \tag{2.105}
\]
From the clamped boundary conditions $f'(x_0) = s_1'(x_0)$ and $f'(x_n) = s_n'(x_n)$, we get
\[
2h_1M_0 + h_1M_1 = 6f[x_0, x_1] - 6f[x_0, x_0],
\]
\[
h_nM_{n-1} + 2h_nM_n = 6f[x_n, x_n] - 6f[x_{n-1}, x_n],
\]
where $f[x_0, x_0] = f'(x_0)$ and $f[x_n, x_n] = f'(x_n)$. Now using (2.105) and the above two equations, we can write the system of linear equations in matrix form:
\[
\begin{pmatrix}
2h_1 & h_1 & & & \\
h_1 & 2(h_1+h_2) & h_2 & & \\
 & \ddots & \ddots & \ddots & \\
 & & h_{n-1} & 2(h_{n-1}+h_n) & h_n \\
 & & & h_n & 2h_n
\end{pmatrix}
\begin{pmatrix} M_0 \\ M_1 \\ \vdots \\ M_{n-1} \\ M_n \end{pmatrix}
= 6\begin{pmatrix}
f[x_0,x_1] - f[x_0,x_0] \\
f[x_1,x_2] - f[x_0,x_1] \\
\vdots \\
f[x_{n-1},x_n] - f[x_{n-2},x_{n-1}] \\
f[x_n,x_n] - f[x_{n-1},x_n]
\end{pmatrix}. \tag{2.106}
\]

Remark 2.18. This matrix equation (corresponding to the clamped boundary conditions) takes a simpler form when the step size is fixed, that is, $h_i = h$; dividing each equation by $h$,
\[
\begin{pmatrix}
2 & 1 & & & \\
1 & 4 & 1 & & \\
 & \ddots & \ddots & \ddots & \\
 & & 1 & 4 & 1 \\
 & & & 1 & 2
\end{pmatrix}
\begin{pmatrix} M_0 \\ M_1 \\ \vdots \\ M_{n-1} \\ M_n \end{pmatrix}
= \frac{6}{h}\begin{pmatrix}
f[x_0,x_1] - f[x_0,x_0] \\
f[x_1,x_2] - f[x_0,x_1] \\
\vdots \\
f[x_{n-1},x_n] - f[x_{n-2},x_{n-1}] \\
f[x_n,x_n] - f[x_{n-1},x_n]
\end{pmatrix}. \tag{2.107}
\]

Remark 2.19. Further, for the free boundary conditions $M_0 = 0$, $M_n = 0$ we replace the first and last rows of the matrix with these two conditions and get
\[
\begin{pmatrix}
h_1 & 0 & & & \\
h_1 & 2(h_1+h_2) & h_2 & & \\
 & \ddots & \ddots & \ddots & \\
 & & h_{n-1} & 2(h_{n-1}+h_n) & h_n \\
 & & & 0 & h_n
\end{pmatrix}
\begin{pmatrix} M_0 \\ M_1 \\ \vdots \\ M_{n-1} \\ M_n \end{pmatrix}
= 6\begin{pmatrix}
0 \\
f[x_1,x_2] - f[x_0,x_1] \\
\vdots \\
f[x_{n-1},x_n] - f[x_{n-2},x_{n-1}] \\
0
\end{pmatrix}. \tag{2.108}
\]
This equation also takes a simple form when the step size is fixed:
\[
\begin{pmatrix}
1 & 0 & & & \\
1 & 4 & 1 & & \\
 & \ddots & \ddots & \ddots & \\
 & & 1 & 4 & 1 \\
 & & & 0 & 1
\end{pmatrix}
\begin{pmatrix} M_0 \\ M_1 \\ \vdots \\ M_{n-1} \\ M_n \end{pmatrix}
= \frac{6}{h}\begin{pmatrix}
0 \\
f[x_1,x_2] - f[x_0,x_1] \\
\vdots \\
f[x_{n-1},x_n] - f[x_{n-2},x_{n-1}] \\
0
\end{pmatrix}. \tag{2.109}
\]
Remark 2.20. In all the above matrix equations the first and last rows of the matrix appear because of the extra boundary conditions at $x_0$ and $x_n$ respectively, each of which might be either a clamped or a free boundary condition. But one can also have mixed boundary conditions, for example a free boundary at $x_0$, that is, $f''(x_0) = M_0 = 0$, and a clamped boundary at $x_n$, that is, $s'(x_n) = f'(x_n)$, or equivalently $h_nM_{n-1} + 2h_nM_n = 6f[x_n, x_n] - 6f[x_{n-1}, x_n]$, or vice versa. In this case


 
\[
\begin{pmatrix}
h_1 & 0 & & & \\
h_1 & 2(h_1+h_2) & h_2 & & \\
 & \ddots & \ddots & \ddots & \\
 & & h_{n-1} & 2(h_{n-1}+h_n) & h_n \\
 & & & h_n & 2h_n
\end{pmatrix}
\begin{pmatrix} M_0 \\ M_1 \\ \vdots \\ M_{n-1} \\ M_n \end{pmatrix}
= 6\begin{pmatrix}
0 \\
f[x_1,x_2] - f[x_0,x_1] \\
\vdots \\
f[x_{n-1},x_n] - f[x_{n-2},x_{n-1}] \\
f[x_n,x_n] - f[x_{n-1},x_n]
\end{pmatrix}. \tag{2.110}
\]
Further, if the step size is fixed, we get
\[
\begin{pmatrix}
1 & 0 & & & \\
1 & 4 & 1 & & \\
 & \ddots & \ddots & \ddots & \\
 & & 1 & 4 & 1 \\
 & & & 1 & 2
\end{pmatrix}
\begin{pmatrix} M_0 \\ M_1 \\ \vdots \\ M_{n-1} \\ M_n \end{pmatrix}
= \frac{6}{h}\begin{pmatrix}
0 \\
f[x_1,x_2] - f[x_0,x_1] \\
\vdots \\
f[x_{n-1},x_n] - f[x_{n-2},x_{n-1}] \\
f[x_n,x_n] - f[x_{n-1},x_n]
\end{pmatrix}. \tag{2.111}
\]
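The free-boundary system (2.108) is tridiagonal, so it can be solved in O(n) operations. A minimal sketch (ours, not from the notes) using the Thomas algorithm is given below; with the moments $M_i$ in hand, each piece $s_i$ on $[x_{i-1}, x_i]$ is given explicitly by (2.103).

```python
def natural_spline_moments(xs, ys):
    """Solve the free-boundary (natural) spline system (2.108):
    interior equations (2.105) with M_0 = M_n = 0 (Thomas algorithm)."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]            # h[i-1] holds h_i
    a = [h[i - 1] for i in range(1, n)]                  # coeff of M_{i-1}
    b = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]   # coeff of M_i
    c = [h[i] for i in range(1, n)]                      # coeff of M_{i+1}
    d = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
         for i in range(1, n)]
    for j in range(1, n - 1):                            # forward elimination
        w = a[j] / b[j - 1]
        b[j] -= w * c[j - 1]
        d[j] -= w * d[j - 1]
    for j in range(n - 2, -1, -1):                       # back substitution
        d[j] = (d[j] - (c[j] * d[j + 1] if j < n - 2 else 0.0)) / b[j]
    return [0.0] + d + [0.0]                             # [M_0, ..., M_n]

M = natural_spline_moments([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 8.0, 27.0])
# M == [0.0, 4.8, 16.8, 0.0] for these data
```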

2.12. Error in cubic spline interpolation

It can be shown that
\[
|E(x)| = |f(x) - s(x)| \le \frac{5M_4}{384}\left(\max_i |h_i|\right)^4, \tag{2.112}
\]
where $M_4 = \max_{x \in [x_0, x_n]} |f^{(4)}(x)|$. Further, it can also be shown that
\[
|f'(x) - s'(x)| \le \frac{M_4}{24}\left(\max_i |h_i|\right)^3. \tag{2.113}
\]
This shows that we can also approximate $f'$ using the cubic spline, just by knowing some functional values at the nodes and $M_4$.

3 Numerical integration
Our aim in this chapter is to find the approximate value of a definite integral, especially in the cases when the value of the integrand is known only at certain points, or when the antiderivative of the integrand is not known in terms of standard functions.

Question 3.1. Can we use interpolating polynomials of some function $f$ to find the approximate integral of $f$ on $[a, b]$, that is, $\int_a^b f(x)\,dx$? Is this approximation a good one?

If $P(x)$ is some interpolating polynomial of $f(x)$ such that the error is as small as some given positive number $\varepsilon$, that is, $|E(x)| = |f(x) - P(x)| \le \varepsilon$, then
\[
\left|\int_a^b f(x)\,dx - \int_a^b P(x)\,dx\right| = \left|\int_a^b (f(x) - P(x))\,dx\right| \le \int_a^b |f(x) - P(x)|\,dx \le \int_a^b \varepsilon\,dx = \varepsilon(b - a). \tag{3.1}
\]
Thus we can approximate the integral of $f$ by the integral of an interpolating polynomial with the desired accuracy.
3.1. Newton-Cotes Methods

In these methods we use an interpolating polynomial $P(x)$ to approximate $\int_a^b f(x)\,dx$ by $\int_a^b P(x)\,dx$, with the error in integration being $\int_a^b [f(x) - P(x)]\,dx = \int_a^b E(x)\,dx$.

3.2. Trapezoidal Rule

Suppose $f$ is known at the two end points $x_0 = a$, $x_1 = b$ of the interval, and $P_1(x)$ is the linear approximation to the function $f$ at the nodes $x_0, x_1$. By Lagrange's method we write
\[
P_1(x) = \frac{x - x_1}{x_0 - x_1}f(x_0) + \frac{x - x_0}{x_1 - x_0}f(x_1). \tag{3.2}
\]
Now
\[
\int_{x_0}^{x_1} P_1(x)\,dx = \int_{x_0}^{x_1}\frac{x - x_1}{x_0 - x_1}f(x_0)\,dx + \int_{x_0}^{x_1}\frac{x - x_0}{x_1 - x_0}f(x_1)\,dx = \frac{x_1 - x_0}{2}f(x_0) + \frac{x_1 - x_0}{2}f(x_1).
\]
And if $h_1 = x_1 - x_0$, then
\[
\int_{x_0}^{x_1} P_1(x)\,dx = \frac{h_1}{2}[f(x_0) + f(x_1)]. \tag{3.3}
\]
Thus by the trapezoidal rule the approximate value of $\int_a^b f(x)\,dx$ is given by (3.3). Further, from (2.46),
\[
f(x) - P_1(x) = (x - x_0)(x - x_1)f''(\xi_x)/2. \tag{3.4}
\]
And hence,
\[
\left|\int_{x_0}^{x_1}[f(x) - P_1(x)]\,dx\right| \le \int_{x_0}^{x_1}\frac{1}{2}|x - x_0|\,|x - x_1|\,|f''(\xi_x)|\,dx
\le \frac{1}{2}\max_{x\in[x_0,x_1]}|f''(x)|\int_{x_0}^{x_1}(x - x_0)(x_1 - x)\,dx
= \frac{1}{2}\max_{x\in[x_0,x_1]}|f''(x)|\,\frac{(x_1 - x_0)^3}{6}. \tag{3.5}
\]

3.3. Composite Trapezoidal Rule

Now we partition $[a, b]$ into $n$ subintervals $[x_{i-1}, x_i]$, $i = 1, 2, \dots, n$, and apply the trapezoidal rule on each subinterval, approximating $\int_a^b f(x)\,dx = \sum_{i=1}^n \int_{x_{i-1}}^{x_i} f(x)\,dx$ by $\sum_{i=1}^n \int_{x_{i-1}}^{x_i} P_{1,i}(x)\,dx$. Thus
\[
\int_a^b f(x)\,dx \approx \sum_{i=1}^n \frac{h_i}{2}[f(x_{i-1}) + f(x_i)]
= \frac{h_1}{2}f(x_0) + \sum_{i=1}^{n-1}\frac{h_{i+1} + h_i}{2}f(x_i) + \frac{h_n}{2}f(x_n). \tag{3.6}
\]
Similarly to (3.5), the bound for the error of integration by the trapezoidal rule on the interval $[x_{i-1}, x_i]$ is given by
\[
\left|\int_{x_{i-1}}^{x_i}[f(x) - P_{1,i}(x)]\,dx\right| \le \frac{h_i^3}{12}\max_{x\in[x_{i-1},x_i]}|f''(x)|. \tag{3.7}
\]
And the total error in the composite trapezoidal rule is bounded by
\[
\sum_{i=1}^n \frac{h_i^3}{12}\max_{x\in[x_{i-1},x_i]}|f''(x)| \le \sum_{i=1}^n \frac{h_i^3}{12}\max_{x\in[a,b]}|f''(x)| = \frac{M_2}{12}\sum_{i=1}^n h_i^3. \tag{3.8}
\]
Further, if the step size is fixed, that is, $x_i - x_{i-1} = h_i = h$, then
\[
\int_a^b f(x)\,dx \approx \frac{h}{2}f(x_0) + \sum_{i=1}^{n-1}hf(x_i) + \frac{h}{2}f(x_n)
= \frac{h}{2}[f(x_0) + 2f(x_1) + \dots + 2f(x_{n-1}) + f(x_n)]. \tag{3.9}
\]
And by (3.8) the total error in this case is bounded by $\frac{n}{12}M_2h^3 = \frac{1}{12n^2}M_2(nh)^3 = \frac{M_2(b-a)^3}{12n^2}$.
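A direct transcription of (3.9) (our own sketch; the notes themselves contain no code):

```python
def composite_trapezoid(f, a, b, n):
    """Composite trapezoidal rule (3.9) with n equal subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)        # interior nodes carry weight 2*(h/2)
    return h * total

# Integral of x^2 on [0,1] is 1/3; the error is at most M2*(b-a)^3/(12 n^2).
print(composite_trapezoid(lambda x: x * x, 0.0, 1.0, 100))
```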
3.4. Simpson's one-third rule

Suppose now we have three equidistant nodes $a = x_0$, $x_1 = x_0 + h$, $x_2 = x_0 + 2h = b$ and want to approximate $\int_{x_0}^{x_2} f(x)\,dx$ by $\int_{x_0}^{x_2} P_2(x)\,dx$, with the error in integration $\int_{x_0}^{x_2}[f(x) - P_2(x)]\,dx = \int_{x_0}^{x_2} E_2(x)\,dx$. Since there are three nodes, we approximate by a polynomial of degree at most two. By Newton's method the quadratic polynomial interpolating at the given three nodes is
\[
P_2(x) = f(x_0) + f[x_0, x_0+h](x - x_0) + f[x_0, x_0+h, x_0+2h](x - x_0)(x - x_0 - h). \tag{3.10}
\]
Now we compute the integral:
\[
\int_{x_0}^{x_2} P_2(x)\,dx = 2hf(x_0) + 2h^2f[x_0, x_0+h] + \frac{2h^3}{3}f[x_0, x_0+h, x_0+2h]
\]
\[
= 2hf(x_0) + 2h^2\,\frac{f(x_0+h) - f(x_0)}{h} + \frac{2h^3}{3}\,\frac{f(x_0+2h) - 2f(x_0+h) + f(x_0)}{2h^2}
= \frac{h}{3}\,[f(x_0) + 4f(x_1) + f(x_2)]. \tag{3.11}
\]
Thus, if the step size is fixed, then
\[
\int_{x_0}^{x_2} f(x)\,dx \approx \frac{h}{3}\,[f(x_0) + 4f(x_1) + f(x_2)]. \tag{3.12}
\]
Further, since $E_2(x) = f(x) - P_2(x) = (x - x_0)(x - x_0 - h)(x - x_0 - 2h)\,\frac{f'''(\xi)}{3!}$, the error in integration is
\[
\int_{x_0}^{x_0+2h} E_2(x)\,dx = \int_{x_0}^{x_0+2h}(x - x_0)(x - x_0 - h)(x - x_0 - 2h)\,\frac{f'''(\xi)}{6}\,dx. \tag{3.13}
\]
Thus, if $M_3 = \max_{x\in[x_0,x_2]}|f'''(x)|$, then
\[
\left|\int_{x_0}^{x_0+2h} E_2(x)\,dx\right| \le \frac{M_3}{6}\int_{x_0}^{x_0+2h}|(x - x_0)(x - x_0 - h)(x - x_0 - 2h)|\,dx,
\]
and by the substitution $x - x_0 = hu$,
\[
\left|\int_{x_0}^{x_0+2h} E_2(x)\,dx\right| \le \frac{M_3h^4}{6}\left[\int_0^1 u(1-u)(2-u)\,du + \int_1^2 u(u-1)(2-u)\,du\right] = \frac{M_3h^4}{6}\left[\frac{1}{4} + \frac{1}{4}\right] = \frac{M_3h^4}{12}. \tag{3.14}
\]
This shows that we can approximate $\int_{x_0}^{x_0+2h} f(x)\,dx$ by $\frac{h}{3}[f(x_0) + 4f(x_0+h) + f(x_0+2h)]$ with any desired accuracy if $h$ is small enough.

Observation 3.1. Clearly, because of the three nodes, the approximation is exact for polynomials of degree two. It can also be seen easily from (3.13) that if $f$ is a degree-three polynomial, $f(x) = a_0 + a_1x + a_2x^2 + a_3x^3$, then $f'''$ is the constant $6a_3$ and
\[
\int_{x_0}^{x_0+2h} E_2(x)\,dx = a_3\int_{x_0}^{x_0+2h}(x - x_0)(x - x_0 - h)(x - x_0 - 2h)\,dx = 0, \tag{3.15}
\]
since the integrand is odd about the midpoint $x_0 + h$. Thus, because of the odd number of equidistant nodes, the approximation to the integral of a degree-three polynomial is also exact.

3.5. Another form of the truncation error of integration in Simpson's one-third rule

Since we are interpolating the function at the three node points $x_0$, $x_0+h$, $x_0+2h$, we have
\[
f(x) = f(x_0) + f[x_0, x_0+h](x - x_0) + f[x_0, x_0+h, x_0+2h](x - x_0)(x - x_0 - h) + f[x_0, x_0+h, x_0+2h, x](x - x_0)(x - x_0 - h)(x - x_0 - 2h) = P_2(x) + \frac{f'''(\xi)}{6}(x - x_0)(x - x_0 - h)(x - x_0 - 2h),
\]
and the approximation to the integral $\int_{x_0}^{x_0+2h} f(x)\,dx$ is $\int_{x_0}^{x_0+2h} P_2(x)\,dx = \frac{h}{3}[f(x_0) + 4f(x_0+h) + f(x_0+2h)]$, with the truncation error of integration given by (3.13). But using (3.15), the crux of Observation 3.1, one can obtain the same approximation (3.12) to the integral $\int_{x_0}^{x_0+2h} f(x)\,dx$ by approximating $f(x)$ with a degree-three interpolating polynomial $P_3(x)$ satisfying the conditions $P_3(x_0) = f(x_0)$, $P_3(x_0+h) = f(x_0+h)$, $P_3(x_0+2h) = f(x_0+2h)$ and the extra condition $P_3'(x_0+h) = f'(x_0+h)$. This is because $P_3(x) = P_2(x) + f[x_0, x_0+h, x_0+2h, x_0+h](x - x_0)(x - x_0 - h)(x - x_0 - 2h)$ and
\[
\int_{x_0}^{x_0+2h} P_3(x)\,dx = \int_{x_0}^{x_0+2h} P_2(x)\,dx + f[x_0, x_0+h, x_0+2h, x_0+h]\int_{x_0}^{x_0+2h}(x - x_0)(x - x_0 - h)(x - x_0 - 2h)\,dx = \int_{x_0}^{x_0+2h} P_2(x)\,dx. \tag{3.16}
\]
Thus the error of interpolation in this case is given by
\[
f(x) - P_3(x) = f[x_0, x_0+h, x_0+2h, x_0+h, x](x - x_0)(x - x_0 - h)^2(x - x_0 - 2h) = \frac{f^{iv}(\xi)}{4!}(x - x_0)(x - x_0 - h)^2(x - x_0 - 2h),
\]
and the truncation error of integration will be
\[
\int_{x_0}^{x_0+2h}[f(x) - P_3(x)]\,dx = \int_{x_0}^{x_0+2h}(x - x_0)(x - x_0 - h)^2(x - x_0 - 2h)\frac{f^{iv}(\xi)}{4!}\,dx. \tag{3.17}
\]
And a bound for the error in integration can be obtained as follows:
\[
\left|\int_{x_0}^{x_0+2h}[f(x) - P_3(x)]\,dx\right| \le \frac{M_4}{4!}\int_{x_0}^{x_0+2h}\left|(x - x_0)(x - x_0 - h)^2(x - x_0 - 2h)\right|dx
= \frac{M_4h^5}{4!}\int_{-1}^{1}\left|(u+1)u^2(u-1)\right|du \qquad (x - x_0 - h = uh)
\]
\[
= \frac{M_4h^5}{4!}\int_{-1}^{1}(u^2 - u^4)\,du = \frac{2M_4h^5}{4!}\int_0^1(u^2 - u^4)\,du = \frac{2M_4h^5}{4!}\left(\frac{1}{3} - \frac{1}{5}\right) = \frac{M_4h^5}{90}.
\]
Here, since $\int_{x_0}^{x_0+2h}[f(x) - P_3(x)]\,dx = \int_{x_0}^{x_0+2h}[f(x) - P_2(x)]\,dx$, the bound for the error of integration in Simpson's one-third rule can also be obtained as
\[
\left|\int_{x_0}^{x_0+2h}[f(x) - P_2(x)]\,dx\right| \le \frac{M_4h^5}{90}. \tag{3.18}
\]

Remark 3.1. Thus we see from (3.14) and (3.18) that there are two bounds, of order $h^4$ and $h^5$ respectively, for the error of integration in Simpson's one-third rule. For small $h$ the latter bound (3.18) gives the better accuracy, so we usually use $\frac{M_4h^5}{90}$ as the error bound of integration in Simpson's one-third rule.


3.6. Composite Simpson's one-third rule

Now if we want to apply Simpson's one-third rule on a given interval $[a, b]$, we partition $[a, b]$ into $2n$ subintervals of equal length with nodes $a = x_0, x_1, x_2, \dots, x_{2n-1}, x_{2n} = b$, where $x_i = x_0 + ih$. On each pair of adjacent subintervals of the form $[x_{2i-2}, x_{2i-1}]$, $[x_{2i-1}, x_{2i}]$ we approximate the integral $\int_{x_{2i-2}}^{x_{2i}} f(x)\,dx$ using Simpson's one-third rule, so that
\[
\int_a^b f(x)\,dx = \sum_{i=1}^n \int_{x_{2i-2}}^{x_{2i}} f(x)\,dx \approx \sum_{i=1}^n \int_{x_{2i-2}}^{x_{2i}} P_{2,i}(x)\,dx.
\]
Now using (3.12), we get
\[
\int_a^b f(x)\,dx \approx \sum_{i=1}^n \frac{h}{3}[f(x_{2i-2}) + 4f(x_{2i-1}) + f(x_{2i})]
= \frac{h}{3}[f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + \dots + 2f(x_{2n-2}) + 4f(x_{2n-1}) + f(x_{2n})]. \tag{3.19}
\]
Further, using (3.14), the error is bounded as follows:
\[
\left|\int_a^b E(x)\,dx\right| = \left|\sum_{i=1}^n \int_{x_{2i-2}}^{x_{2i}}[f(x) - P_{2,i}(x)]\,dx\right|
\le \sum_{i=1}^n \left|\int_{x_{2i-2}}^{x_{2i}}[f(x) - P_{2,i}(x)]\,dx\right|
\le \sum_{i=1}^n \frac{M_3h^4}{12} = \frac{nM_3h^4}{12} = \frac{M_3(b-a)^4}{192n^3}, \tag{3.20}
\]
using $2nh = b - a$. And using (3.18) one can also obtain the error bound
\[
\left|\int_a^b E(x)\,dx\right| = \left|\sum_{i=1}^n \int_{x_{2i-2}}^{x_{2i}}[f(x) - P_{3,i}(x)]\,dx\right|
\le \sum_{i=1}^n \frac{M_4h^5}{90} = \frac{nM_4h^5}{90} = \frac{M_4(b-a)^5}{2880n^4}. \tag{3.21}
\]

Remark 3.2. As discussed in Remark 3.1, in the composite Simpson's one-third rule we also get the two error bounds given above. In practice we try to subdivide the interval $[a, b]$ into the least number of subintervals that achieves a given accuracy. Thus, to bring the error bound below $\varepsilon$, we need either $\frac{M_3(b-a)^4}{192n^3} < \varepsilon$ or $\frac{M_4(b-a)^5}{2880n^4} < \varepsilon$. Hence the minimum $n$, required for obtaining the desired error bound of integration, should satisfy
\[
\min\left\{\left(\frac{M_3(b-a)^4}{192\,\varepsilon}\right)^{1/3}, \left(\frac{M_4(b-a)^5}{2880\,\varepsilon}\right)^{1/4}\right\} \le n. \tag{3.22}
\]
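A short sketch of (3.19) (ours), with the alternating interior weights 4, 2:

```python
def composite_simpson(f, a, b, n):
    """Composite Simpson's one-third rule (3.19), 2n equal subintervals."""
    h = (b - a) / (2 * n)
    total = f(a) + f(b)
    for i in range(1, 2 * n):
        total += (4 if i % 2 == 1 else 2) * f(a + i * h)
    return h * total / 3

# Exact for cubics (Observation 3.1): integral of x^3 on [0,1] is 1/4.
print(composite_simpson(lambda x: x ** 3, 0.0, 1.0, 1))    # 0.25
```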

3.7. Simpson's 3/8 rule

Now our aim is to approximate $\int_a^b f(x)\,dx$ by approximating $f(x)$ with an interpolating polynomial $P_3(x)$ of degree at most three at the four equidistant nodes $a = x_0$, $x_0+h$, $x_0+2h$, $x_0+3h = b$. Newton's form of the polynomial $P_3(x)$ is
\[
P_3(x) = f(x_0) + f[x_0, x_0+h](x - x_0) + f[x_0, x_0+h, x_0+2h](x - x_0)(x - x_0 - h) + f[x_0, x_0+h, x_0+2h, x_0+3h](x - x_0)(x - x_0 - h)(x - x_0 - 2h). \tag{3.23}
\]
So that
\[
\int_{x_0}^{x_0+3h} P_3(x)\,dx = 3hf(x_0) + \frac{9h^2}{2}f[x_0, x_0+h] + \frac{9h^3}{2}f[x_0, x_0+h, x_0+2h] + \frac{9h^4}{4}f[x_0, x_0+h, x_0+2h, x_0+3h]
\]
\[
= 3hf(x_0) + \frac{9h^2}{2}\cdot\frac{f(x_0+h) - f(x_0)}{h} + \frac{9h^3}{2}\cdot\frac{f(x_0+2h) - 2f(x_0+h) + f(x_0)}{2h^2} + \frac{9h^4}{4}\cdot\frac{f(x_0+3h) - 3f(x_0+2h) + 3f(x_0+h) - f(x_0)}{6h^3}
\]
\[
= \frac{3h}{8}[f(x_0) + 3f(x_0+h) + 3f(x_0+2h) + f(x_0+3h)]. \tag{3.24}
\]
The above expression is the approximation to $\int_{x_0}^{x_0+3h} f(x)\,dx$ in Simpson's three-eighths rule:
\[
\int_{x_0}^{x_0+3h} f(x)\,dx \approx \frac{3h}{8}[f(x_0) + 3f(x_0+h) + 3f(x_0+2h) + f(x_0+3h)]. \tag{3.25}
\]
Using (2.46), the error in interpolation is given by
\[
E_3f(x) = (x - x_0)(x - x_0 - h)(x - x_0 - 2h)(x - x_0 - 3h)\frac{f^{iv}(\xi)}{4!}. \tag{3.26}
\]
And the error in integration is bounded as follows:
\[
\left|\int_{x_0}^{x_0+3h} E_3(x)\,dx\right| \le \frac{M_4}{24}\int_{x_0}^{x_0+3h}|(x - x_0)(x - x_0 - h)(x - x_0 - 2h)(x - x_0 - 3h)|\,dx,
\]
where $M_4$ is the maximum of $|f^{iv}|$ on the interval $[a, b]$. Now, with the substitution $x - x_0 - \frac{3h}{2} = uh$,
\[
\int_{x_0}^{x_0+3h}|(x - x_0)(x - x_0 - h)(x - x_0 - 2h)(x - x_0 - 3h)|\,dx
= h^5\int_{-3/2}^{3/2}\left|\left(u + \tfrac{3}{2}\right)\left(u + \tfrac{1}{2}\right)\left(u - \tfrac{1}{2}\right)\left(u - \tfrac{3}{2}\right)\right|du
\]
\[
= h^5\int_{-3/2}^{3/2}\left|\left(u^2 - \tfrac{9}{4}\right)\left(u^2 - \tfrac{1}{4}\right)\right|du
= 2h^5\left[\int_0^{1/2}\left(\tfrac{9}{4} - u^2\right)\left(\tfrac{1}{4} - u^2\right)du + \int_{1/2}^{3/2}\left(\tfrac{9}{4} - u^2\right)\left(u^2 - \tfrac{1}{4}\right)du\right]
= 2h^5\left[\frac{11}{60} + \frac{19}{30}\right] = \frac{49h^5}{30}.
\]
Thus the error bound for Simpson's 3/8 rule can be given as
\[
\left|\int_{x_0}^{x_0+3h}[f(x) - P_3(x)]\,dx\right| \le \frac{M_4}{24}\cdot\frac{49h^5}{30} = \frac{49M_4h^5}{720}. \tag{3.27}
\]

3.8. Composite Simpson's 3/8 rule

Here we first subdivide the interval $[a, b]$ into $n$ subintervals of equal length and aim to apply Simpson's 3/8 rule on each of these $n$ subintervals. Thus we further subdivide each of these intervals into three subintervals of equal width $h = (b-a)/(3n)$. Now we have in total $3n$ subintervals, with the $3n + 1$ nodes
\[
x_0, x_1, x_2, x_3, \dots, x_{3i}, x_{3i+1}, x_{3i+2}, x_{3(i+1)}, \dots, x_{3(n-1)}, x_{3(n-1)+1}, x_{3(n-1)+2}, x_{3n},
\]
and apply Simpson's 3/8 rule on each interval of the form $[x_{3i}, x_{3(i+1)}]$ to obtain
\[
\int_{x_{3i}}^{x_{3(i+1)}} f(x)\,dx \approx \frac{3h}{8}\left[f(x_{3i}) + 3f(x_{3i+1}) + 3f(x_{3i+2}) + f(x_{3(i+1)})\right]. \tag{3.28}
\]
And the approximation for the integral on the whole interval $[a, b]$ can be obtained as follows:
\[
\int_a^b f(x)\,dx = \sum_{i=0}^{n-1}\int_{x_{3i}}^{x_{3(i+1)}} f(x)\,dx \approx \frac{3h}{8}\left[f(x_0) + 3\sum_{i=0}^{n-1}f(x_{3i+1}) + 3\sum_{i=0}^{n-1}f(x_{3i+2}) + 2\sum_{i=1}^{n-1}f(x_{3i}) + f(x_{3n})\right]. \tag{3.29}
\]
The total error in the composite Simpson's 3/8 rule is bounded as follows:
\[
\left|\int_a^b[f(x) - P(x)]\,dx\right| \le \sum_{i=0}^{n-1}\left|\int_{x_{3i}}^{x_{3(i+1)}}[f(x) - P(x)]\,dx\right| \le \sum_{i=0}^{n-1}\frac{49M_4h^5}{720} = \frac{49nM_4h^5}{720} = \frac{49M_4(b-a)^5}{720\cdot3^5\,n^4}. \tag{3.30}
\]
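And a sketch of (3.29) in the same style (ours):

```python
def composite_simpson_38(f, a, b, n):
    """Composite Simpson's 3/8 rule (3.29); 3n subintervals of width h."""
    h = (b - a) / (3 * n)
    total = f(a) + f(b)
    for i in range(1, 3 * n):
        total += (2 if i % 3 == 0 else 3) * f(a + i * h)   # weights 3,3,2,3,3,2,...
    return 3 * h * total / 8

print(composite_simpson_38(lambda x: x ** 3, 0.0, 1.0, 2))  # 0.25, exact for cubics
```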

4 Solution of a system of linear equations


Here we aim to solve the following system of linear equations:
\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n &= b_1,\\
a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n &= b_2,\\
&\;\,\vdots\\
a_{n1}x_1 + a_{n2}x_2 + \dots + a_{nn}x_n &= b_n.
\end{aligned} \tag{4.1}
\]
In this system the $a_{ij}$, $i, j = 1, 2, \dots, n$, are known coefficients, the $b_i$, $i = 1, 2, \dots, n$, are known values, and the $x_i$, $i = 1, 2, \dots, n$, are unknowns to be determined. This system can also be written in matrix form as
\[
Ax = b, \tag{4.2}
\]
where $A$ is an $n \times n$ matrix and $x$ and $b$ are column vectors of order $n \times 1$ of unknowns and known values respectively. If $b = 0$, then (4.2) is called a homogeneous system of equations. For the solution of the system of linear equations (4.2) we recall an important theorem from linear algebra.
Theorem 4.1. If $A$ is a real matrix of order $n \times n$, then the following statements are equivalent.

• $Ax = 0$ has only the trivial solution.
• For each $b$, $Ax = b$ has a solution.
• $A$ is invertible.
• $\det(A) \ne 0$.

Remark 4.1. If the four equivalent conditions of the above theorem are satisfied, then one can find the solution of the system of linear equations by multiplying both sides of the equation $Ax = b$ by the inverse of the matrix $A$ on the left to get
\[
x = A^{-1}b. \tag{4.3}
\]
To find $A^{-1}$ we know the standard method (learned in the 10+2 standard) of computing the adjoint of the matrix $A$, with $A^{-1} = \mathrm{Adj}(A)/\det(A)$. One can also recall Cramer's rule for finding the solution.

Observation 4.1. Both of the above methods of finding the solution involve the intermediate step of computing the determinant of the matrix $A$. But finding the determinant of a large matrix is not an easy task.

Question 4.1. Can we find the solution of the system of linear equations without finding the inverse of the coefficient matrix?

4.1. Direct method for some special forms of the coefficient matrix A

Here we answer Question 4.1 positively when the coefficient matrix is of some simple form for which the solution can be obtained by direct computations.

1. Diagonal case: $A = D$

Since the matrix $A$ is assumed to be nonsingular, $\det(A) = \prod_{i=1}^n a_{ii} \ne 0$. And we have
\[
\begin{pmatrix} a_{11} & & & \\ & a_{22} & & \\ & & \ddots & \\ & & & a_{nn} \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} =
\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}.
\]
The solution is obvious in this case and can be written as
\[
x_i = \frac{b_i}{a_{ii}}, \qquad i = 1, 2, \dots, n. \tag{4.4}
\]
Note that only $n$ divisions are required as computer operations.

2. Lower triangular case: $A = L$

We have the following matrix equation:
\[
\begin{pmatrix}
a_{11} & & & \\
a_{21} & a_{22} & & \\
\vdots & & \ddots & \\
a_{n1} & a_{n2} & \dots & a_{nn}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} =
\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}.
\]
In this case also we assume the solution to exist, and hence $\det(A) = \prod_{i=1}^n a_{ii} \ne 0$. Here it is easy to compute $x_1 = b_1/a_{11}$, but to compute $x_2$ we need to substitute the value of $x_1$ in the second equation. Similarly, to compute $x_k$ we need to substitute the values of $x_1, x_2, \dots, x_{k-1}$ (which are already obtained) in the $k$th equation:
\[
x_k = \frac{b_k - \sum_{j=1}^{k-1} a_{kj}x_j}{a_{kk}}, \qquad k = 2, \dots, n. \tag{4.5}
\]
Since we substitute the already computed entries forward, this method is called forward substitution. The computation of $x_k$ by the above equation requires $(k-1)$ multiplications, $(k-1)$ additions and one division. Thus the total number of computer operations is $\sum_{k=1}^n [(k-1) + (k-1) + 1] = n^2$.

3. Upper triangular case: $A = U$

In this case we also have $a_{ii} \ne 0$, and
\[
\begin{pmatrix}
a_{11} & a_{12} & \dots & a_{1n} \\
 & a_{22} & \dots & a_{2n} \\
 & & \ddots & \vdots \\
 & & & a_{nn}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} =
\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}.
\]
But here the computation of $x_n$ is easy compared to the other unknowns. So we first compute $x_n = b_n/a_{nn}$ from the last equation and substitute its value in the second-to-last equation to compute $x_{n-1}$. To compute $x_k$ we substitute the values of the unknowns $x_n, \dots, x_{k+1}$, obtained from the last $n-k$ equations, in the $k$th equation:
\[
x_k = \frac{b_k - \sum_{j=k+1}^{n} a_{kj}x_j}{a_{kk}}, \qquad k = n-1, \dots, 1. \tag{4.6}
\]
Because of the substitution back into the earlier equations, this method is known as back substitution. As with forward substitution, we need $n^2$ computer operations in total for the complete solution.
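Both substitutions in code (a sketch of ours; plain lists of lists stand in for matrices):

```python
def forward_substitution(L, b):
    """Solve Lx = b for lower triangular L, following eq. (4.5)."""
    n = len(b)
    x = [0.0] * n
    for k in range(n):
        x[k] = (b[k] - sum(L[k][j] * x[j] for j in range(k))) / L[k][k]
    return x

def back_substitution(U, b):
    """Solve Ux = b for upper triangular U, following eq. (4.6)."""
    n = len(b)
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (b[k] - sum(U[k][j] * x[j] for j in range(k + 1, n))) / U[k][k]
    return x
```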
Remark 4.2. The above methods are applicable only to diagonal or triangular matrices and do not give a general answer to Question 4.1.

4.2. Certain Decomposition Methods for solving the system of linear equations

Here our aim is to write the coefficient matrix $A$ as the product of two matrices $B$ and $C$ of some simpler form, and then solve $Ax = b$, that is, $BCx = b$, by solving two different systems: first $Bz = b$ for $z$ and then $Cx = z$ for $x$.
4.3. Doolittle’s Method
In this method A is decomposed as A = LU , where
   
1 0 0 u11 u12 u13
L = l21 1 0 , U = 0 u22 u23  .
l31 l32 1 0 0 u33

So that
   
a11 a12 a13 u11 u12 u13
A = a21 a22 a23  = LU = l21 u11 l21 u12 + u22 l21 u13 + u23  (4.7)
a31 a32 a33 l31 u11 l31 u12 + l32 u22 l31 u13 + l32 u23 + u33

Now one has to solve nine equations to find the all the total nine unknown coefficients of lower
triangular matrix L and upper triangular matrix U . But these are easy to solve more or less only
substitutions are needed. The solution is obtained by first solving Lz = b for z by direct methods
and then solving U x = z for x again by direct methods.
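A sketch of Doolittle's decomposition for a general $n \times n$ matrix (ours; no pivoting is performed, so all the pivots $u_{ii}$ are assumed to be nonzero):

```python
def doolittle_lu(A):
    """Doolittle decomposition A = LU, unit diagonal in L (no pivoting)."""
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):          # row i of U
            U[i][j] = A[i][j] - sum(L[i][k] * U[k][j] for k in range(i))
        for j in range(i + 1, n):      # column i of L
            L[j][i] = (A[j][i] - sum(L[j][k] * U[k][i] for k in range(i))) / U[i][i]
    return L, U

# Ax = b is then solved via z = forward_substitution(L, b)
# followed by x = back_substitution(U, z), as above.
```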
4.4. Crout's Method

Here one decomposes $A = LU$, where
\[
L = \begin{pmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{pmatrix}, \qquad
U = \begin{pmatrix} 1 & u_{12} & u_{13} \\ 0 & 1 & u_{23} \\ 0 & 0 & 1 \end{pmatrix}.
\]
Thus one has
\[
A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = LU =
\begin{pmatrix} l_{11} & l_{11}u_{12} & l_{11}u_{13} \\ l_{21} & l_{21}u_{12} + l_{22} & l_{21}u_{13} + l_{22}u_{23} \\ l_{31} & l_{31}u_{12} + l_{32} & l_{31}u_{13} + l_{32}u_{23} + l_{33} \end{pmatrix}. \tag{4.8}
\]
Here we again need to solve nine equations to determine the nine unknown coefficients. And as in the previous method, we first solve $Lz = b$ for $z$ and then $Ux = z$ for $x$.

4.5. Positive definite matrix

A real square matrix $A$ is said to be positive definite if all its leading principal minors (including $\det A$ itself) are positive.

4.6. A matrix
\[
A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
\]
is positive definite if

• $a_{11} > 0$,
• $\det\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} > 0$,
• $\det A > 0$.

Example 4.1. The matrix $\begin{pmatrix} 1 & 1/2 & 1/3 \\ 1/2 & 1/3 & 1/4 \\ 1/3 & 1/4 & 1/5 \end{pmatrix}$ is positive definite, while the matrix $\begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 2 \\ 3 & 2 & 5 \end{pmatrix}$ is not positive definite because its second leading principal minor is not positive.

Cholesky's Method

Cholesky's method is applicable for a symmetric and positive definite matrix $A$. In this case the decomposition of $A$ is $A = LL^T$, where
\[
L = \begin{pmatrix} d_1 & 0 & 0 \\ l_{21} & d_2 & 0 \\ l_{31} & l_{32} & d_3 \end{pmatrix},
\]
so that
\[
A = \begin{pmatrix} a_{11} & a_{21} & a_{31} \\ a_{21} & a_{22} & a_{32} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = LL^T =
\begin{pmatrix} d_1^2 & d_1l_{21} & d_1l_{31} \\ d_1l_{21} & l_{21}^2 + d_2^2 & l_{21}l_{31} + d_2l_{32} \\ d_1l_{31} & l_{31}l_{21} + l_{32}d_2 & l_{31}^2 + l_{32}^2 + d_3^2 \end{pmatrix}. \tag{4.9}
\]
Here we only need to solve six equations in six unknowns. To solve $Ax = b$ we first solve $Lz = b$ for $z$ and then $L^Tx = z$ for $x$.
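A sketch of the decomposition (4.9) for a general symmetric positive definite matrix (ours):

```python
import math

def cholesky(A):
    """Cholesky decomposition A = L L^T; A symmetric positive definite."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if A[i][i] - s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L
```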
Question 4.2. Can we think of some generalization of the elimination method, which we learned
in 10th standard?

4.7. Gauss elimination method

Here our aim is to convert the given system of linear equations $Ax = b$ into another equivalent system (having the same solution) in which the coefficient matrix is of triangular or diagonal form, so that we can use the direct methods to solve it. The Gauss elimination method involves three types of elementary row transformations, which do not change the solution of the system. These transformations can be described as follows.

1. Multiplication of a row by a non-zero constant.

Suppose the $i$th row is multiplied by a non-zero constant $c$. We denote this transformation by $R_{i(c)}$. If we multiply the $i$th row of the identity matrix by the same constant $c$ and call this new matrix $E_{i(c)}$, then it is easy to observe that the inverse of this matrix is $E_{i(c^{-1})}$. It is important to observe that this row transformation can also be applied to the system of linear equations by multiplying the system $Ax = b$ by the elementary matrix $E_{i(c)}$ on the left, obtaining $E_{i(c)}Ax = E_{i(c)}b$. Clearly this new system also has the solution $x = A^{-1}(E_{i(c)})^{-1}E_{i(c)}b = A^{-1}b$.

2. Interchanging two rows.

The row transformation of interchanging the $i$th row with the $j$th row is denoted by $R_i \leftrightarrow R_j$. Interchanging the $i$th and $j$th rows of the identity matrix gives a new matrix, denoted $E_{i\leftrightarrow j}$, which is its own inverse. The same row transformation can also be obtained by left multiplication by the elementary matrix $E_{i\leftrightarrow j}$ on both sides of the system $Ax = b$.

3. Addition of a non-zero multiple of one row to another.

Suppose we multiply the $i$th row by a constant $c$ and add it to the $j$th row. This row transformation is denoted by $R_{ji(c)}$. The corresponding elementary matrix, obtained from the identity by the same row transformation, is denoted by $E_{ji(c)}$; its inverse is $E_{ji(-c)}$.

Thus at each elementary row transformation applied to a system of linear equations we get an equivalent system of linear equations having the same solution. The steps of the Gauss elimination method for converting the coefficient matrix into an upper triangular (or identity) matrix can be described as follows.

• If $a_{11} = 0$, interchange the first row with a $j$th row for which $a_{j1} \ne 0$.
• Use the transformations $R_{i1(-a_{i1}/a_{11})}$ to eliminate all $a_{i1}$, $i > 1$, with the pivot element $a_{11}$.
• If $a_{22} = 0$, interchange the second row with a $j$th row, $j > 2$, for which $a_{j2} \ne 0$.
• Use the transformations $R_{i2(-a_{i2}/a_{22})}$ to eliminate all $a_{i2}$, $i > 2$, with the pivot element $a_{22}$.
• Use similar transformations to convert the coefficient matrix into an upper triangular matrix.

If we already know that the matrix $A$ is invertible, we can further use elementary row transformations to convert this upper triangular matrix to the identity matrix in the following steps. Note that in this upper triangular matrix each $a_{ii} \ne 0$.

• First, using $R_{n(1/a_{nn})}$, we get $a_{nn} = 1$.
• Use the transformations $R_{in(-a_{in})}$, $i = n-1, \dots, 1$, to eliminate $a_{in}$, $i = n-1, \dots, 1$, with pivot $a_{nn} = 1$.
• We then perform similar transformations in the $(n-1)$th column, then the $(n-2)$th column, and so on down to the second and first columns. In general, in the $k$th column we first apply $R_{k(1/a_{kk})}$ to get $a_{kk} = 1$ and then $R_{ik(-a_{ik})}$, $i = k-1, \dots, 1$, to eliminate $a_{ik}$, $i = k-1, \dots, 1$, with pivot $a_{kk} = 1$.

To implement the Gauss elimination method in general, we apply to the system the sequence of elementary row transformations required to convert $A$ into triangular (or identity) form, or equivalently the same elementary row transformations to the augmented matrix $[A|b]$ to convert it into $[T|b_T]$ (or $[I|b_I]$). Theoretically this can be understood as successive left multiplication by the corresponding elementary matrices on both sides of the matrix form of the system $Ax = b$, that is, $E_lE_{l-1}\cdots E_2E_1Ax = E_lE_{l-1}\cdots E_2E_1b$, giving the new equivalent system $Tx = b_T$ (or $Ix = b_I$). Since a product of invertible matrices is an invertible matrix, the matrix $E = E_lE_{l-1}\cdots E_2E_1$ is invertible, and hence the solution $x$ of $EAx = Tx = b_T = Eb$ (or $EAx = Ix = b_I = Eb$) is the same as the solution of $E^{-1}EAx = Ax = b = E^{-1}Eb$.

Observation 4.2. It might happen in general that the coefficient matrix is not invertible, or that it is not square, that is, the number of equations differs from the number of unknowns. In these situations we cannot use Theorem 4.1, and the steps of the Gauss elimination method might not work.

Question 4.3. Can we convert $Ax = b$ into some form that can be solved directly when $A$ is a rectangular or singular matrix?

4.8. Row Echelon Form

Yes! One can still use elementary row transformations to convert $A$ into Echelon form, which can be described as follows. A matrix $A$ is said to be in Echelon form if it satisfies the following properties.

• Each zero row (a row in which every entry is zero), if it occurs in the matrix $A$, must occur below every non-zero row (a row in which at least one entry is non-zero).
• If the leading non-zero entry (the first non-zero entry of the row) in the $i$th row occurs in the $k_i$th column, that is, $a_{ij} = 0$ for $j = 1, 2, \dots, k_i - 1$ and $a_{ik_i} \ne 0$, and there are only $r$ non-zero rows, then one must have $k_1 < k_2 < \dots < k_r$.

To obtain the row Echelon form of a matrix $A$ of order $m \times n$, one can use the following steps.

• If the first non-zero column is the $l_1$th, bring a non-zero entry of the $l_1$th column into the first row by a row transformation (row interchange), and use this entry to eliminate all other entries of the $l_1$th column by the row transformations $R_{il_1(-a_{il_1}/a_{1l_1})}$. This $l_1$ is the $k_1$ of the definition. In further row transformations we will not use the first row at all.
• Now consider the submatrix of order $(m-1) \times n$ obtained by ignoring the first row. Search for the first non-zero column, say the $l_2$th, of this submatrix. If the first entry of this column of order $(m-1) \times 1$ is zero, bring a non-zero entry of the column into the first row, and use this entry to eliminate all other non-zero entries of the column. This $l_2$ is the $k_2$ of the definition. In further row transformations we will not use the first two rows of the parent matrix.
• Next consider the submatrix of order $(m-2) \times n$ of the parent matrix, search for its first non-zero column $l_3$, and use similar row transformations to eliminate all the entries of the $l_3$th column below the first row, using the non-zero entry of the first row as pivot.
• Continue with similar transformations until a column $l_r$ of the parent matrix is found satisfying: (i) the $l_r$th column is the first non-zero column of the submatrix (obtained by ignoring the first $r-1$ rows) of order $(m-r+1) \times n$; (ii) the only non-zero entry of this column of the submatrix is in its first row.
• This process stops when (i) either $r = m$, in which case $m \le n$ (and $m = n$ only when the Echelon form is upper triangular and invertible), (ii) or the last $m - r$ rows of the parent matrix are zero rows.

4.9. Solution of the system through the Echelon form

For solving the system $Ax = b$, where $A$ is the coefficient matrix of order $m \times n$, $x$ is the column of unknowns of order $n \times 1$ and $b$ is the column of order $m \times 1$, we apply the same sequence of elementary row transformations to the augmented matrix $[A|b]$ to convert it into $[A_E|b_E]$, where $A_E$ is the Echelon form of the matrix $A$. Because only elementary row transformations are used, the solutions $x$ of the two systems $Ax = b$ and $A_Ex = b_E$ are the same. We can also conclude the nature of the solution from some observations on the augmented matrix $[A_E|b_E]$, as follows.

• If $r = m$ and $n = m$, then there is a unique solution, obtained by the direct method.
• If $r = m$ and $n > m$, then there are infinitely many solutions with $n - m$ degrees of freedom; that is, out of the $n$ unknowns the system can be solved for any chosen $m$ unknowns in terms of the remaining $n - m$ unknowns, which can be given arbitrary values.
• If $r < m$ and there exists some non-zero row among the last $m - r$ rows of the augmented matrix $[A_E|b_E]$, this happens because the only non-zero entry of that row is the entry in the $b_E$ column. If this entry is $b_{E,r+p} \ne 0$, we end up with an equation of the form $0 \cdot x_{r+p} = b_{E,r+p}$, which shows the inconsistency of the system and leads to NO solution.
• Further, if $r < m$ and the last $m - r$ rows of the augmented matrix $[A_E|b_E]$ are zero rows, then the system has a unique solution if $n = r$, and infinitely many solutions with $n - r$ degrees of freedom if $n > r$. Note that the case $n < r$ does not occur.

4.10. Partial Pivoting

Look at the class example for the need of partial pivoting. One can ensure partial pivoting in Gauss elimination or in the reduction to Echelon form simply by making sure that the pivot entry is the largest in magnitude among the entries of its column that are to be eliminated.
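A compact sketch (ours) combining Gauss elimination, partial pivoting and back substitution:

```python
def gauss_solve(A, b):
    """Solve Ax = b by Gauss elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]          # augmented matrix [A|b]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # largest pivot in column
        if M[p][k] == 0.0:
            raise ValueError("matrix is singular")
        M[k], M[p] = M[p], M[k]                           # row interchange
        for i in range(k + 1, n):                         # eliminate below pivot
            w = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= w * M[k][j]
    x = [0.0] * n                                         # back substitution
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][j] * x[j] for j in range(k + 1, n))) / M[k][k]
    return x
```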
4.11. Norm
The norm is a generalization of the notion of modulus. The modulus of a number measures the distance of that number from the origin. To measure distances in the plane or in $\mathbb{R}^n$, one considers a function
\[
\|\cdot\| : \mathbb{R}^n \to \mathbb{R}^+ \cup \{0\}, \tag{4.10}
\]
satisfying:

• $\|x\| \ge 0$, and $\|x\| = 0$ if and only if $x = 0$, for all $x \in \mathbb{R}^n$.
• $\|\alpha x\| = |\alpha|\,\|x\|$ for any scalar $\alpha \in \mathbb{R}$ and $x \in \mathbb{R}^n$. This also implies $\|x - y\| = \|y - x\|$.
• Triangle inequality: for any $u, v \in \mathbb{R}^n$,
\[
\|u + v\| \le \|u\| + \|v\|. \tag{4.11}
\]
This is also equivalent to
\[
\|x - z\| \le \|x - y\| + \|y - z\| \tag{4.12}
\]
for any $x, y, z \in \mathbb{R}^n$. Equation (4.12) shows that the distance between any two points is less than or equal to the sum of the distances of these two points from any third point.

Example 4.2. Consider the following function on $\mathbb{R}^n$:
\[
\|x\|_p = \left(\sum_{i=1}^n |x_i|^p\right)^{1/p}. \tag{4.13}
\]
For $p \ge 1$ this defines a norm on $\mathbb{R}^n$, known as the $p$-norm of the vector $x \in \mathbb{R}^n$. Note that in daily life we use the 2-norm for measuring distances in the plane and in space.

Example 4.3. Consider the function on $\mathbb{R}^n$
\[
\|x\|_\infty = \max_{1 \le i \le n} |x_i|. \tag{4.14}
\]
This also defines a norm on $\mathbb{R}^n$.

Exercise 4.1. Draw the locus of the following in the plane:
• $\{x \in \mathbb{R}^2 : \|x\|_1 = 1\}$,
• $\{x \in \mathbb{R}^2 : \|x\|_2 = 1\}$,
• $\{x \in \mathbb{R}^2 : \|x\|_\infty = 1\}$.

Exercise 4.2. Show that a norm is a continuous function from $\mathbb{R}^n$ to $[0, \infty)$.

4.12. Matrix Norm

We know that the set of all real matrices of order $m \times n$, denoted by $M_{mn}$, forms a vector space over the real numbers. One can define a norm on $M_{mn}$, which is also compatible with matrix multiplication, as
\[
\|\cdot\| : M_{mn} \to \mathbb{R}^+ \cup \{0\}, \tag{4.15}
\]
satisfying:

• $\|A\| \ge 0$, and $\|A\| = 0$ if and only if $A$ is the null matrix, for all $A \in M_{mn}$.
• $\|\alpha A\| = |\alpha|\,\|A\|$ for any scalar $\alpha \in \mathbb{R}$ and $A \in M_{mn}$.
• Triangle inequality: for any $A, B \in M_{mn}$,
\[
\|A + B\| \le \|A\| + \|B\|. \tag{4.16}
\]
• The norm is also compatible with multiplication by a column vector of order $n \times 1$:
\[
\|Ax\| \le \|A\|\,\|x\|. \tag{4.17}
\]

Example 4.4. Consider the following function on $M_{mn}$:
\[
\|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^{m} |a_{ij}| \qquad \text{(maximum absolute column sum)}. \tag{4.18}
\]
This defines a norm on $M_{mn}$, which is compatible with multiplication by vectors $x \in \mathbb{R}^n$:
\[
\|Ax\|_1 \le \|A\|_1\,\|x\|_1, \tag{4.19}
\]
where $\|Ax\|_1$ and $\|x\|_1$ are as defined in (4.13) for $p = 1$.

Example 4.5. Consider the following function on $M_{mn}$:
\[
\|A\|_2 = \left(\sum_{i,j} |a_{ij}|^2\right)^{1/2}. \tag{4.20}
\]
This also defines a norm on $M_{mn}$, which is compatible with multiplication by vectors $x \in \mathbb{R}^n$:
\[
\|Ax\|_2 \le \|A\|_2\,\|x\|_2, \tag{4.21}
\]
where $\|Ax\|_2$ and $\|x\|_2$ are as defined in (4.13) for $p = 2$.

Example 4.6. Consider the following function on $M_{mn}$:
\[
\|A\|_\infty = \max_{1 \le i \le m} \sum_{j=1}^{n} |a_{ij}| \qquad \text{(maximum absolute row sum)}. \tag{4.22}
\]
This defines a norm on $M_{mn}$, which is compatible with multiplication by vectors $x \in \mathbb{R}^n$:
\[
\|Ax\|_\infty \le \|A\|_\infty\,\|x\|_\infty, \tag{4.23}
\]
where $\|Ax\|_\infty$ and $\|x\|_\infty$ are defined in (4.14).
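All three matrix norms are easy to compute directly (a sketch of ours):

```python
def norm_1(A):      # maximum absolute column sum, eq. (4.18)
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))

def norm_2(A):      # entrywise 2-norm, eq. (4.20)
    return sum(x * x for row in A for x in row) ** 0.5

def norm_inf(A):    # maximum absolute row sum, eq. (4.22)
    return max(sum(abs(x) for x in row) for row in A)

A = [[1.0, -2.0], [3.0, 4.0]]
print(norm_1(A), norm_2(A), norm_inf(A))    # 6.0  5.477...  7.0
```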

4.13. Convergence of vectors

Similarly to the convergence of a sequence of numbers to a point, one can define the convergence of a sequence of vectors to another vector. A sequence of vectors (or matrices) converges to some vector (or matrix) if and only if, for each component, the sequence of components converges to the corresponding component of the limit vector (or matrix). In $\mathbb{R}^m$ a sequence of vectors $y^{(n)}$ converges to a vector $y$ if and only if $y_i^{(n)} \to y_i$ for all $i = 1, \dots, m$.

Example 4.7. The vector $[1/n,\; 2n/(n-1),\; (n+1)/(n-1)]^t \to [0, 2, 1]^t$, while $[1/n,\; 2n/(n-1),\; n]^t$ does not converge because the sequence corresponding to the third component is not convergent.
4.14. Gauss-Jacobi iterative method

Now our aim is to solve (4.1) by some iterative method. For this we first assume $a_{ii} \ne 0$ and rewrite the $i$th equation of the system, keeping only the diagonal term on the left:
\[
a_{ii}x_i = -\sum_{j=1,\, j\ne i}^{n} a_{ij}x_j + b_i, \qquad i = 1, \dots, n.
\]
Note that if we decompose $A = L + D + U$, where $L$, $D$, $U$ are strictly lower triangular, diagonal and strictly upper triangular matrices respectively, then the above system of equations in matrix form can be written as $Dx = -(L+U)x + b$, where $x$ is the column vector of unknowns. Now since $D$ is invertible by our assumption, we have
\[
x = -D^{-1}(L+U)x + D^{-1}b. \tag{4.24}
\]
This can also be written component-wise as
\[
x_i = -\sum_{j=1,\, j\ne i}^{n} \frac{a_{ij}}{a_{ii}}x_j + \frac{b_i}{a_{ii}}, \qquad i = 1, \dots, n. \tag{4.25}
\]
Denoting $-D^{-1}(L+U) = B$ and $D^{-1}b = C$, we have $x = Bx + C$. Now if we take the initial approximation to the solution as
\[
x^{(0)} = \begin{bmatrix} x_1^{(0)} & x_2^{(0)} & \dots & x_n^{(0)} \end{bmatrix}^t, \tag{4.26}
\]
then the first approximation is obtained by the equation $x^{(1)} = Bx^{(0)} + C$, and in general the $(k+1)$th approximation is obtained by
\[
x^{(k+1)} = Bx^{(k)} + C, \tag{4.27}
\]
where the matrices $B$ and $C$ are as follows:
\[
B = \begin{pmatrix}
0 & -a_{12}/a_{11} & \dots & -a_{1n}/a_{11} \\
-a_{21}/a_{22} & 0 & \dots & -a_{2n}/a_{22} \\
\vdots & & \ddots & \vdots \\
-a_{n1}/a_{nn} & -a_{n2}/a_{nn} & \dots & 0
\end{pmatrix}, \qquad
C = \begin{pmatrix} b_1/a_{11} \\ b_2/a_{22} \\ \vdots \\ b_n/a_{nn} \end{pmatrix}. \tag{4.28}
\]

Using (4.27) we can write the $i$th component of the $(k+1)$th approximation of the solution vector as
\[
x_i^{(k+1)} = \frac{b_i}{a_{ii}} + \sum_{j=1,\, j\ne i}^{n}\frac{-a_{ij}}{a_{ii}}x_j^{(k)}. \tag{4.29}
\]
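Formula (4.29) in code (our sketch; the iteration stops when successive iterates agree to a tolerance in the infinity norm):

```python
def jacobi(A, b, x0, tol=1e-10, max_iter=500):
    """Gauss-Jacobi iteration (4.29).  Convergence is guaranteed when A
    is strictly row diagonally dominant (see Remark 4.3 below)."""
    n = len(b)
    x = x0[:]
    for _ in range(max_iter):
        x_new = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
                 for i in range(n)]
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new
        x = x_new
    return x

# Strictly row diagonally dominant example:
print(jacobi([[4.0, 1.0], [2.0, 5.0]], [9.0, 13.0], [0.0, 0.0]))  # ~[1.778, 1.889]
```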

4.15. Error analysis of Gauss-Jacobi Iteration Method


Suppose $x$ is the exact solution of the system (4.27). Then the $k$th error vector is defined as
\[
e^{(k)} = x - x^{(k)} = \begin{bmatrix} x_1 - x_1^{(k)} & x_2 - x_2^{(k)} & \dots & x_n - x_n^{(k)} \end{bmatrix}^t, \tag{4.30}
\]
where $[\,\cdot\,]^t$ denotes the transpose. Using (4.30) and (4.29), one has
\[
e^{(k+1)} = x - x^{(k+1)} = (Bx + C) - (Bx^{(k)} + C) = B(x - x^{(k)}) = Be^{(k)}, \tag{4.31}
\]
and component-wise
\[
e_i^{(k+1)} = \sum_{j=1,\, j\ne i}^{n}\frac{-a_{ij}}{a_{ii}}e_j^{(k)},
\]
so
\[
|e_i^{(k+1)}| \le \sum_{j=1,\, j\ne i}^{n}\frac{|a_{ij}|}{|a_{ii}|}|e_j^{(k)}|
\le \sum_{j=1,\, j\ne i}^{n}\frac{|a_{ij}|}{|a_{ii}|}\max_{1\le j\le n}|e_j^{(k)}|
= \sum_{j=1,\, j\ne i}^{n}\frac{|a_{ij}|}{|a_{ii}|}\,\|e^{(k)}\|_\infty. \tag{4.32}
\]
If we define
\[
\alpha_i = \sum_{j=1}^{i-1}\frac{|a_{ij}|}{|a_{ii}|}, \qquad \beta_i = \sum_{j=i+1}^{n}\frac{|a_{ij}|}{|a_{ii}|}, \tag{4.33}
\]
and
\[
\mu = \max_{1\le i\le n}\{\alpha_i + \beta_i\}, \tag{4.34}
\]

then using (4.32) and (4.34),
\[
|e_i^{(k+1)}| \le \|e^{(k)}\|_\infty(\alpha_i + \beta_i) \le \|e^{(k)}\|_\infty\,\mu.
\]
The above inequality holds for all $i = 1, \dots, n$; hence
\[
\|e^{(k+1)}\|_\infty = \max_{1\le i\le n}|e_i^{(k+1)}| \le \mu\,\|e^{(k)}\|_\infty. \tag{4.35}
\]
Inequality (4.35) is true for all $k \in \mathbb{N}$, and hence by its repeated application we have
\[
\|e^{(k+1)}\|_\infty \le \mu\|e^{(k)}\|_\infty \le \mu^2\|e^{(k-1)}\|_\infty \le \dots \le \mu^{k+1}\|e^{(0)}\|_\infty, \tag{4.36}
\]
where $e^{(0)} = x - x^{(0)}$ is the initial error vector. If we assume $\mu < 1$, then $\|e^{(k+1)}\|_\infty \to 0$, that is, $|x_i - x_i^{(k+1)}| = |e_i^{(k+1)}| \to 0$ for all $i = 1, \dots, n$, which implies that the sequence of approximating vectors $x^{(k)}$ converges to the exact solution $x$. This $\mu$ defined by (4.34) is known as the convergence factor of the Gauss-Jacobi iterative method.
Remark 4.3. We now collect all the assumptions made for convergence. These assumptions are

• $a_{ii} \ne 0$ for all $i = 1, \dots, n$,
• and
\[
\mu = \max_{1\le i\le n}\{\alpha_i + \beta_i\} = \max_{1\le i\le n}\left\{\sum_{j=1}^{i-1}\frac{|a_{ij}|}{|a_{ii}|} + \sum_{j=i+1}^{n}\frac{|a_{ij}|}{|a_{ii}|}\right\} < 1. \tag{4.37}
\]
The second condition (4.37) is valid if and only if
\[
\alpha_i + \beta_i < 1 \text{ for all } i = 1, \dots, n
\quad\Longleftrightarrow\quad \sum_{j=1,\, j\ne i}^{n}|a_{ij}| < |a_{ii}| \text{ for all } i = 1, \dots, n. \tag{4.38}
\]
The condition (4.38) is known as strict row diagonal dominance, and it also implies the first condition. Thus for the convergence of the Gauss-Jacobi iteration method we only need the coefficient matrix to be strictly row diagonally dominant.

Remark 4.4. Note that $\|B\|_\infty = \mu$. Then from (4.31),
\[
\|e^{(k+1)}\| = \|Be^{(k)}\| \le \|B\|\,\|e^{(k)}\| \le \dots \le \|B\|^{k+1}\|e^{(0)}\| = \mu^{k+1}\|e^{(0)}\|,
\]
and the convergence follows if $\mu < 1$. Moreover, $x^{(k+1)} - x^{(k)} = Bx^{(k)} + C - Bx^{(k-1)} - C$, and hence
\[
\|x^{(k+1)} - x^{(k)}\| = \|B(x^{(k)} - x^{(k-1)})\| \le \mu\,\|x^{(k)} - x^{(k-1)}\| \quad\text{for all } k \in \mathbb{N}.
\]
Thus $\|x^{(k+1)} - x^{(k)}\| \le \mu\|x^{(k)} - x^{(k-1)}\| \le \dots \le \mu^k\|x^{(1)} - x^{(0)}\|$. Now for any $m > k > 1$ we have
\[
\|x^{(m)} - x^{(k)}\| \le \|x^{(m)} - x^{(m-1)}\| + \dots + \|x^{(k+1)} - x^{(k)}\|
\le (\mu^{m-1} + \dots + \mu^k)\,\|x^{(1)} - x^{(0)}\|
< \mu^k\sum_{i=0}^{\infty}\mu^i\,\|x^{(1)} - x^{(0)}\| = \frac{\mu^k}{1-\mu}\,\|x^{(1)} - x^{(0)}\|,
\]
since $\mu < 1$. Further, since $\|x^{(m)} - x\| \to 0$, for any given $\varepsilon > 0$ there exists an integer $m_0 > k$ such that $\|x^{(m_0)} - x\| < \varepsilon$, and hence
\[
\|x - x^{(k)}\| \le \|x - x^{(m_0)}\| + \|x^{(m_0)} - x^{(k)}\| < \varepsilon + \frac{\mu^k}{1-\mu}\,\|x^{(1)} - x^{(0)}\|.
\]
Since the above inequality is true for arbitrary $\varepsilon > 0$, letting $\varepsilon \to 0$ gives
\[
\|e^{(k)}\| = \|x - x^{(k)}\| \le \frac{\mu^k}{1-\mu}\,\|x^{(1)} - x^{(0)}\|. \tag{4.39}
\]
4.16. Gauss-Seidel iterative method

We rewrite the system of equations, keeping in the $i$th equation all the terms with indices $j \le i$ on the left-hand side:
\[
\sum_{j=1}^{i} a_{ij}x_j = -\sum_{j=i+1}^{n} a_{ij}x_j + b_i, \qquad i = 1, \dots, n.
\]
As in the case of the Gauss-Jacobi method we assume $a_{ii} \ne 0$ and decompose $A = L + D + U$; then the above system of equations in matrix form can be written as $(L+D)x = -Ux + b$, and the matrix $L + D$ turns out to be invertible. Further, if $x^{(0)}$ is the initial approximation, we define the sequence of iterates by
\[
(L+D)x^{(k+1)} = -Ux^{(k)} + b, \quad\text{or}\quad x^{(k+1)} = -(L+D)^{-1}Ux^{(k)} + (L+D)^{-1}b. \tag{4.40}
\]
Component-wise it can be written as follows:
\[
\sum_{j=1}^{i-1} a_{ij}x_j^{(k+1)} + a_{ii}x_i^{(k+1)} = -\sum_{j=i+1}^{n} a_{ij}x_j^{(k)} + b_i,
\]
or equivalently
\[
x_i^{(k+1)} = \frac{b_i}{a_{ii}} + \sum_{j=1}^{i-1}\frac{-a_{ij}}{a_{ii}}x_j^{(k+1)} + \sum_{j=i+1}^{n}\frac{-a_{ij}}{a_{ii}}x_j^{(k)}. \tag{4.41}
\]
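Formula (4.41) in code (ours); the only change from the Jacobi sketch is that updated components are used immediately, so the iteration is done in place:

```python
def gauss_seidel(A, b, x0, tol=1e-10, max_iter=500):
    """Gauss-Seidel iteration (4.41); components updated in place."""
    n = len(b)
    x = x0[:]
    for _ in range(max_iter):
        delta = 0.0
        for i in range(n):
            new = (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            delta = max(delta, abs(new - x[i]))
            x[i] = new          # x[0..i] already hold the (k+1)th values
        if delta < tol:
            break
    return x

print(gauss_seidel([[4.0, 1.0], [2.0, 5.0]], [9.0, 13.0], [0.0, 0.0]))
```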

4.17. Error analysis of Gauss-Seidel Iteration Method

Suppose $x$ is the exact solution of the system (4.25). Then the $k$th error vector is defined, similarly to (4.30), as
\[
e^{(k)} = x - x^{(k)} = \begin{bmatrix} x_1 - x_1^{(k)} & x_2 - x_2^{(k)} & \dots & x_n - x_n^{(k)} \end{bmatrix}^t,
\]
and using (4.41) component-wise we can write
\[
e_i^{(k+1)} = x_i - x_i^{(k+1)} = \left(\frac{b_i}{a_{ii}} + \sum_{j=1}^{i-1}\frac{-a_{ij}}{a_{ii}}x_j + \sum_{j=i+1}^{n}\frac{-a_{ij}}{a_{ii}}x_j\right) - \left(\frac{b_i}{a_{ii}} + \sum_{j=1}^{i-1}\frac{-a_{ij}}{a_{ii}}x_j^{(k+1)} + \sum_{j=i+1}^{n}\frac{-a_{ij}}{a_{ii}}x_j^{(k)}\right)
= \sum_{j=1}^{i-1}\frac{-a_{ij}}{a_{ii}}e_j^{(k+1)} + \sum_{j=i+1}^{n}\frac{-a_{ij}}{a_{ii}}e_j^{(k)}. \tag{4.42}
\]
The above equation implies
\[
|e_i^{(k+1)}| \le \sum_{j=1}^{i-1}\frac{|a_{ij}|}{|a_{ii}|}|e_j^{(k+1)}| + \sum_{j=i+1}^{n}\frac{|a_{ij}|}{|a_{ii}|}|e_j^{(k)}|
\le \alpha_i\|e^{(k+1)}\|_\infty + \beta_i\|e^{(k)}\|_\infty.
\]
Or,
\[
|e_i^{(k+1)}| - \alpha_i\|e^{(k+1)}\|_\infty \le \beta_i\|e^{(k)}\|_\infty,
\]
\[
|e_i^{(k+1)}| - \|e^{(k+1)}\|_\infty + (1-\alpha_i)\|e^{(k+1)}\|_\infty \le \beta_i\|e^{(k)}\|_\infty. \tag{4.43}
\]
If we assume $(1-\alpha_i) > 0$ for all $i = 1, \dots, n$ and define
\[
\eta = \max_{1\le i\le n}\left\{\frac{\beta_i}{1-\alpha_i}\right\}, \tag{4.44}
\]
then from (4.43),
\[
\frac{|e_i^{(k+1)}| - \|e^{(k+1)}\|_\infty}{1-\alpha_i} + \|e^{(k+1)}\|_\infty \le \frac{\beta_i}{1-\alpha_i}\|e^{(k)}\|_\infty \le \eta\|e^{(k)}\|_\infty. \tag{4.45}
\]
Let $\|e^{(k+1)}\|_\infty = |e_{i_0}^{(k+1)}|$. Since $(1-\alpha_i) > 0$, the expression $\frac{|e_i^{(k+1)}| - \|e^{(k+1)}\|_\infty}{1-\alpha_i}$ is less than or equal to zero for all $i = 1, \dots, n$ and equal to zero for at least one $i = i_0$. Since inequality (4.45) is true for all $i = 1, \dots, n$, it is in particular true for $i = i_0$; thus from (4.45) with $i = i_0$,
\[
\|e^{(k+1)}\|_\infty \le \eta\,\|e^{(k)}\|_\infty. \tag{4.46}
\]
The above inequality holds for all $k \in \mathbb{N}$, and hence we have from (4.46)
\[
\|e^{(k+1)}\|_\infty \le \eta\|e^{(k)}\|_\infty \le \eta^2\|e^{(k-1)}\|_\infty \le \dots \le \eta^{k+1}\|e^{(0)}\|_\infty. \tag{4.47}
\]
Further, if we assume $\eta < 1$, we conclude from (4.47) that $\|e^{(k)}\|_\infty \to 0$, or equivalently that $x^{(k)}$ converges to the exact solution $x$. This $\eta$ defined by (4.44) is known as the convergence factor of the Gauss-Seidel iterative method.

Remark 4.5. We now collect all the assumptions made for the convergence of the Gauss-Seidel method. These assumptions are

• $a_{ii} \ne 0$ for all $i = 1, \dots, n$,
• $(1-\alpha_i) > 0$ for all $i = 1, \dots, n$,
• and
\[
\eta = \max_{1\le i\le n}\left\{\frac{\beta_i}{1-\alpha_i}\right\} < 1. \tag{4.48}
\]
The third condition (4.48) is valid if and only if
\[
\frac{\beta_i}{1-\alpha_i} < 1 \text{ for all } i
\;\Longleftrightarrow\; \beta_i < 1-\alpha_i \text{ for all } i
\;\Longleftrightarrow\; \sum_{j=1}^{i-1}\frac{|a_{ij}|}{|a_{ii}|} + \sum_{j=i+1}^{n}\frac{|a_{ij}|}{|a_{ii}|} < 1 \text{ for all } i
\;\Longleftrightarrow\; \sum_{j=1,\, j\ne i}^{n}|a_{ij}| < |a_{ii}| \text{ for all } i = 1, \dots, n. \tag{4.49}
\]
Thus if we assume the coefficient matrix to be strictly row diagonally dominant, the third assumption is satisfied. Further, in this case $1-\alpha_i > \beta_i \ge 0$ implies the second condition, and the first is obviously true.

Exercise 4.3. If $\mu < 1$, prove that $\eta \le \mu$.

Remark 4.6. The above exercise shows that if the coefficient matrix is strictly row diagonally dominant, then the convergence factor of the Gauss-Seidel method is less than or equal to the convergence factor of the Gauss-Jacobi method. And in case $\eta < \mu$, the Gauss-Seidel method should converge faster than the Gauss-Jacobi method.

Remark 4.7. To apply either of the two methods, we need the coefficient matrix to be strictly row diagonally dominant. Then for any initial approximation the sequence of iterates converges to some exact solution. The question now arises whether the sequences corresponding to two different initial approximations can converge to two different exact solutions. Note that this could happen only if the coefficient matrix, which is strictly row diagonally dominant, were singular. In the following discussion we will see that any strictly row diagonally dominant matrix is non-singular, and hence we conclude that every sequence of iterates, with any given initial approximation, converges to the (unique) exact solution.

Theorem 4.2. Suppose $A$ and $B$ are two square matrices of order $n \times n$. If $A$ is invertible and
\[
\|A - B\| < \frac{1}{\|A^{-1}\|},
\]
then $B$ is also invertible.

Proof. Suppose $B$ is not invertible. Then the rank of $B$ is less than $n$, and hence by the rank-nullity theorem the null space of $B$ is not equal to $\{0\}$. Thus if $0 \ne x$ is in the null space of $B$, then $Bx = 0$ and
\[
\frac{\|x\|}{\|A^{-1}\|} = \frac{\|A^{-1}Ax\|}{\|A^{-1}\|} \le \|Ax\| = \|Ax - Bx\| \le \|A - B\|\,\|x\|.
\]
Since $\|x\| \ne 0$, we obtain $\frac{1}{\|A^{-1}\|} \le \|A - B\|$, contradicting the assumption.

Problem 4.1. Show that a strictly row diagonally dominant matrix is invertible.

Solution. Let $A$ be a strictly row diagonally dominant matrix. If $A = L + D + U$ is the decomposition of $A$, then the diagonal matrix $D$ has non-zero diagonal entries and is hence invertible. Since $A = D(D^{-1}A)$, it is sufficient to prove that $D^{-1}A$ is invertible. Note that $I$ is an invertible matrix with $\|I^{-1}\|_\infty = 1$. Thus if we can show that $\|I - D^{-1}A\|_\infty < 1$, then by the previous theorem (applied with $A = I$ and $B = D^{-1}A$) it follows that $D^{-1}A$ is invertible. But it is easy to see that $\|I - D^{-1}A\|_\infty = \mu < 1$.
4.18. Ill-conditioned matrix
We first solve the following system of two linear equations in two unknowns:
x_1 + (22/7) x_2 = 19,
2.5 x_1 + 7.857 x_2 = 47.499,
where 22/7 = 3.142857... . The exact solution to this system is x_1 = −3 and x_2 = 7. But if we round off 47.499 to 47.500, the solution changes drastically to x_1 = 19 and x_2 = 0. Or if we round off 7.857 to 7.86, the solution changes to x_1 = 20.1 and x_2 = −0.35.
We observe that a small change in the coefficient matrix A or in the constant vector b leads to a large change in the solution vector. Such a system is called ill-conditioned; otherwise the system is called well-conditioned.
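A quick numerical experiment (a sketch using numpy; any comparable environment such as MATLAB would serve equally well) makes this sensitivity visible:

```python
import numpy as np

A = np.array([[1.0, 22.0 / 7.0],
              [2.5, 7.857]])
b = np.array([19.0, 47.499])

x = np.linalg.solve(A, b)                                # close to [-3, 7]
x_rounded = np.linalg.solve(A, np.array([19.0, 47.5]))   # b2 rounded off

print(x)          # solution with the original data
print(x_rounded)  # drastically different solution
```

The two printed vectors differ wildly even though the right-hand sides differ only in the fifth significant digit.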
4.19. Small change in b vector
We want to solve
Ax = b (4.50)
Suppose the change δb in b leads to the change δx in the solution vector, so that A(x + δx) = Ax + Aδx = b + δb. This implies Aδx = δb, or δx = A^{-1}δb. Thus
||δx|| = ||A^{-1}δb|| ≤ ||A^{-1}|| ||δb||. (4.51)

The above inequality shows that δx is controlled if ||A^{-1}|| is controlled. Further, since
||b|| = ||Ax|| ≤ ||A|| ||x||, (4.52)
we have 1/||x|| ≤ ||A||/||b||, and hence, using (4.51), the relative error in the solution vector satisfies
||δx||/||x|| ≤ ||A^{-1}|| ||δb||/||x|| ≤ (||A^{-1}|| ||A||) ||δb||/||b||. (4.53)

Thus we see that the relative error in x is controlled by the relative error in b if one has control over the quantity ||A^{-1}|| ||A||, which is known as the condition number of the matrix A.
Exercise 4.4. Show that for any invertible matrix A the condition number is always greater than or equal to 1, when the considered norm is the infinity norm.
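In numpy the condition number can be computed directly (a sketch; np.linalg.cond(A, np.inf) returns exactly the quantity ||A||_∞ ||A^{-1}||_∞ discussed above):

```python
import numpy as np

A = np.array([[1.0, 22.0 / 7.0],
              [2.5, 7.857]])

kappa = np.linalg.cond(A, np.inf)                 # built-in
kappa_manual = (np.linalg.norm(A, np.inf) *
                np.linalg.norm(np.linalg.inv(A), np.inf))
print(kappa, kappa_manual)   # both very large: the system is ill-conditioned
```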

5 The Eigen-Value Problem


Let B be a square matrix. A number λ (real or complex) is said to be an eigenvalue of the matrix
B if there exists a nonzero vector y such that

By = λy.

This is equivalent to saying that (B − λI)y = 0 for some non-zero vector y,


⇔ the null space of (B − λI) is not equal to {0},
⇔ the dimension of the null space of (B − λI) is greater than or equal to 1,
⇔ (B − λI) is singular,
⇔ determinant of (B − λI) is zero.
Clearly if y is an eigenvector of the matrix B corresponding to the eigenvalue λ, then so is αy for any nonzero α.
Note that any matrix B of order m × n represents a linear transformation from an n-dimensional vector space to an m-dimensional vector space with respect to some fixed ordered bases of these vector spaces. It can be proved that the linear map defined by the matrix B is a continuous map from the n-dimensional vector space to the m-dimensional vector space.
Moreover, any square matrix of order n × n can be viewed as a linear transformation from R^n to R^n with respect to the standard basis.
The importance of an eigenvector is that the image of an eigenvector of a square matrix is again a vector in the same direction, scaled by a factor of its eigenvalue.
Moreover, if there is a basis of eigenvectors, say {v1, v2, . . . , vn}, with corresponding eigenvalues λ1, λ2, . . . , λn, then the image of any vector is completely known. If z ∈ R^n, then there are scalars α1, α2, . . . , αn such that

z = α1 v1 + α2 v2 + . . . + αn vn , (5.1)

so that
Bz = α1 Bv1 + α2 Bv2 + . . . + αn Bvn = α1 λ1 v1 + α2 λ2 v2 + . . . + αn λn vn .
In this case the matrix of linear transformation with respect to the basis {v1 , v2 , . . . , vn } turns out to
be diagonal with diagonal entries as λ1 , λ2 , . . . , λn and we say that the matrix B is diagonalizable.
Exercise 5.1. Show that if a matrix B is diagonalizable, with eigenvalues |λ1 | ≥ |λ2 | ≥ . . . ≥ |λn |,
then the image of the unit ball {x : kxk ≤ 1} is contained in a ball of radius |λ1 | with center at
origin.
This exercise shows the importance of the eigenvalue of largest magnitude.
5.1. The Power Method
This method is useful for finding the dominant eigenvalue of a matrix, together with an eigenvector corresponding to it. Let λ1, λ2, . . . , λm (m ≤ n) be eigenvalues of a square matrix B of order n × n, with corresponding eigenvectors v1, v2, . . . , vm, and let z be an initial vector such that z = α1v1 + α2v2 + . . . + αmvm, with α1 ≠ 0 and |λ1| > |λ2| ≥ |λ3| ≥ . . . ≥ |λm|. Thus,
B^k z = α1 λ1^k v1 + α2 λ2^k v2 + . . . + αm λm^k vm. (5.2)
Note that if u is a vector such that ⟨v1, u⟩ ≠ 0, then ⟨z, u⟩ ≠ 0 and
⟨B^{k+1}z, u⟩/⟨B^k z, u⟩ = λ1 [α1⟨v1, u⟩ + α2(λ2/λ1)^{k+1}⟨v2, u⟩ + . . . + αm(λm/λ1)^{k+1}⟨vm, u⟩] / [α1⟨v1, u⟩ + α2(λ2/λ1)^k⟨v2, u⟩ + . . . + αm(λm/λ1)^k⟨vm, u⟩]. (5.3)
So that in the limiting case
lim_{k→∞} ⟨B^{k+1}z, u⟩/⟨B^k z, u⟩ = λ1. (5.4)

Moreover, from (5.2), λ1^{−k} B^k z = α1 v1 + α2(λ2/λ1)^k v2 + . . . + αm(λm/λ1)^k vm. Thus
lim_{k→∞} λ1^{−k} B^k z = α1 v1. (5.5)

Thus by (5.4) we can first find the largest eigenvalue, and then by (5.5) the eigenvector α1v1, corresponding to the largest eigenvalue, involved in the representation of z. Note that the eigenvector α1v1 is not necessarily of unit length.
Problem 5.1. Use the power method to find the largest eigenvalue and the corresponding eigenvector of the matrix
[3 −1]
[2  0]
with the initial vector z = [3, 1]^t.
   
Solution. Clearly z = [3, 1]^t is not an eigenvector of the matrix B = [3 −1; 2 0], so we can infer that if there is a basis of eigenvectors, z is a linear combination of both of them. Further, since [1, 0]^t is not an eigenvector, its inner product with both the eigenvectors has to be non-zero, and hence we can use this vector as the vector u. Now
    
Bz = [8, 6]^t ⇒ ⟨Bz, u⟩ = 8,
B^2 z = [18, 16]^t ⇒ ⟨B^2 z, u⟩ = 18,
B^3 z = [38, 36]^t ⇒ ⟨B^3 z, u⟩ = 38,
B^4 z = [78, 76]^t ⇒ ⟨B^4 z, u⟩ = 78,
B^5 z = [158, 156]^t ⇒ ⟨B^5 z, u⟩ = 158,
B^6 z = [318, 316]^t ⇒ ⟨B^6 z, u⟩ = 318,
B^7 z = [638, 636]^t ⇒ ⟨B^7 z, u⟩ = 638,
B^8 z = [1278, 1276]^t ⇒ ⟨B^8 z, u⟩ = 1278.
Thus the first terms of the sequence {⟨B^{k+1}z, u⟩/⟨B^k z, u⟩} are 18/8 = 2.25, 38/18 = 2.1111, 78/38 = 2.0526, 158/78 = 2.0256, 318/158 = 2.0127, 638/318 = 2.0063, 1278/638 = 2.0031. Thus, correct up to two decimal places, the largest eigenvalue is 2. Moreover, the first few terms of the sequence {λ1^{−k} B^k z} are [4, 3]^t, [4.5, 4]^t, [4.75, 4.5]^t, [4.875, 4.75]^t, [4.9375, 4.875]^t, [4.96875, 4.9375]^t, [4.984375, 4.96875]^t, [4.9921875, 4.984375]^t. Thus the eigenvector, correct up to one decimal place, is [5, 5]^t. Note that [3, 1]^t = [5, 5]^t + [−2, −4]^t, where [−2, −4]^t should be an eigenvector corresponding to the other eigenvalue, and hence the other eigenvalue is 1.
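The computation above is easily reproduced in code (a small Python sketch of the ratio sequence; the choice u = [1, 0]^t simply picks out the first component of each iterate):

```python
import numpy as np

B = np.array([[3.0, -1.0], [2.0, 0.0]])
z = np.array([3.0, 1.0])
u = np.array([1.0, 0.0])

w = B @ z                           # B z
for k in range(1, 8):
    w_next = B @ w                  # B^{k+1} z
    print((w_next @ u) / (w @ u))   # 2.25, 2.1111, ..., tending to 2
    w = w_next
```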
Remark 5.1. Note that since ||·|| is a continuous function, from (5.5) we have lim_{k→∞} ||λ1^{−k} B^k z|| = ||α1 v1||. And hence
lim_{k→∞} λ1^{−k} B^k z / ||λ1^{−k} B^k z|| = α1 v1/||α1 v1||, or lim_{k→∞} B^k z/||B^k z|| = v1/||v1||.
Further, since B represents a continuous linear map from R^n to R^n, we have lim_{k→∞} B(B^k z/||B^k z||) = B(v1/||v1||), or equivalently,
lim_{k→∞} B^{k+1} z/||B^k z|| = λ1 v1/||v1||. (5.6)

Writing the first few terms of the sequence {B^{k+1} z/||B^k z||} for Problem 5.1 w.r.t. the infinity norm, we get
[18, 16]^t/8 = 2.25 [1, 0.8889]^t,
[38, 36]^t/18 = 2.1111 [1, 0.9474]^t,
[78, 76]^t/38 = 2.0526 [1, 0.9744]^t,
[158, 156]^t/78 = 2.0256 [1, 0.9873]^t,
[318, 316]^t/158 = 2.0127 [1, 0.9937]^t,
[638, 636]^t/318 = 2.0063 [1, 0.9969]^t,
[1278, 1276]^t/638 = 2.0031 [1, 0.9984]^t.
This shows that the eigenvalue is 2 correct up to two decimal places, and the corresponding unit (infinity-norm) eigenvector is [1, 1]^t correct up to one decimal place.
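In practice one normalizes at every step, exactly in the spirit of (5.6); this keeps the iterates bounded instead of letting B^k z grow like λ1^k. A minimal sketch (assuming, as in the analysis above, that a single dominant eigenvalue exists and that z has a nonzero component along v1):

```python
import numpy as np

def power_method(B, z, iters=50):
    """Normalized power iteration: returns an estimate of |lambda_1|
    and a unit (infinity-norm) eigenvector; the sign of lambda_1 can
    be recovered from a component ratio of B v and v if needed."""
    v = z / np.linalg.norm(z, np.inf)
    lam = 0.0
    for _ in range(iters):
        w = B @ v
        lam = np.linalg.norm(w, np.inf)   # |lambda_1| estimate
        v = w / lam
    return lam, v

B = np.array([[3.0, -1.0], [2.0, 0.0]])
print(power_method(B, np.array([3.0, 1.0])))   # approximately (2, [1, 1])
```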
5.2. QR Decomposition
Let A = [a1, a2, . . . , an] be a non-singular square matrix of order n × n such that a1, a2, . . . , an are the column vectors of A. We apply the Gram-Schmidt process to find an orthonormal basis from the basis {a1, a2, . . . , an}. For this we first consider the unit vector in the direction of a1, e1 = a1/||a1||. Next we search for a vector u2, perpendicular to a1, such that span{a1, a2} = span{e1, u2}; this can be obtained by subtracting from a2 the projection of a2 in the direction of a1, that is, (⟨a2, a1⟩/||a1||^2) a1. Thus u2 = a2 − ⟨a2, e1⟩e1. Consider the unit vector in the direction of u2, e2 = u2/||u2||. Using ⟨a2, e2⟩ = ⟨u2, e2⟩, we have ⟨a2, e2⟩e2 = ⟨u2, e2⟩e2 = ||u2|| e2 = u2. Similarly we define e3, e4, . . . , en and get
a1 = ⟨a1, e1⟩ e1,
a2 = ⟨a2, e1⟩ e1 + ⟨a2, e2⟩ e2,
and in general
a_k = Σ_{j=1}^{k} ⟨a_k, e_j⟩ e_j.

Thus if we consider
Q = [e1, e2, . . . , en] (the matrix whose columns are e1, . . . , en), and
R =
[⟨a1, e1⟩  ⟨a2, e1⟩  . . .  ⟨an, e1⟩]
[   0      ⟨a2, e2⟩  . . .  ⟨an, e2⟩]
[  . . .     . . .   . . .    . . . ]
[   0         0      . . .  ⟨an, en⟩],
then A = QR.
Note that the norm used here is ||·||_2, which is compatible with the inner product. Moreover, it can also be shown that the QR decomposition of a non-singular square matrix is unique (once the diagonal entries of R are taken to be positive).
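The construction translates line by line into code (a sketch of classical Gram-Schmidt QR; note that library routines such as numpy's np.linalg.qr may return Q and R with different sign conventions):

```python
import numpy as np

def qr_gram_schmidt(A):
    """QR factorization of a nonsingular square matrix by Gram-Schmidt.
    Column j of Q is e_j, and R[i, j] = <a_j, e_i>."""
    A = np.asarray(A, float)
    n = A.shape[1]
    Q, R = np.zeros_like(A), np.zeros((n, n))
    for j in range(n):
        u = A[:, j].copy()
        for i in range(j):
            R[i, j] = A[:, j] @ Q[:, i]   # <a_j, e_i>
            u -= R[i, j] * Q[:, i]        # remove the projection on e_i
        R[j, j] = np.linalg.norm(u)       # ||u_j|| = <a_j, e_j>
        Q[:, j] = u / R[j, j]
    return Q, R

Q, R = qr_gram_schmidt([[-3, -5, -8], [6, 4, 1], [-6, 2, 5]])
print(R)   # for the matrix of Problem 5.2 below: [[9,3,0],[0,6,9],[0,0,3]]
```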
 
Problem 5.2. Find the QR decomposition of the matrix
[−3 −5 −8]
[ 6  4  1]
[−6  2  5].
Solution. Note that a1 = [−3, 6, −6]^t, a2 = [−5, 4, 2]^t and a3 = [−8, 1, 5]^t, so that ||a1|| = √(9 + 36 + 36) = 9 and e1 = [−1/3, 2/3, −2/3]^t. Now ⟨a1, e1⟩ = 9, ⟨a2, e1⟩ = 3, ⟨a3, e1⟩ = 0, so that u2 = [−5, 4, 2]^t − 3[−1/3, 2/3, −2/3]^t = [−4, 2, 4]^t. Thus e2 = [−2/3, 1/3, 2/3]^t, and ⟨a2, e2⟩ = 6, ⟨a3, e2⟩ = 9. Now u3 = [−8, 1, 5]^t − 0·e1 − 9[−2/3, 1/3, 2/3]^t = [−2, −2, −1]^t, so that e3 = [−2/3, −2/3, −1/3]^t and ⟨a3, e3⟩ = 3. Thus
Q = (1/3) [−1 −2 −2; 2 1 −2; −2 2 −1] and R = [9 3 0; 0 6 9; 0 0 3].
5.3. QR Algorithm
Let A1 be a square matrix of order n with n distinct eigenvalues λ1, λ2, . . . , λn such that |λ1| > |λ2| > . . . > |λn|. Decompose A1 as A1 = Q1R1, where R1 is an upper triangular matrix and Q1 is an orthogonal matrix, so that Q1^t = Q1^{-1}. Consider A2 = R1Q1 = Q1^{-1}Q1R1Q1 = Q1^{-1}A1Q1. Thus A2 is similar to A1 and hence the set of eigenvalues of A2 is the same as the set of eigenvalues of A1. Now if A2 has the QR decomposition A2 = Q2R2, we define A3 = R2Q2, which is again similar to A2 and hence similar to A1. In this way we obtain a sequence {An} of mutually similar matrices.
For the convergence, the following theorem is stated without proof.
Theorem 5.1. If a matrix A of order n × n has n distinct eigenvalues λ1 , λ2 , . . . , λn such that
|λ1 | > |λ2 | > . . . > |λn | and all the principal minors of the matrix of eigenvectors of At are non
zero, then sequence {An } converges to a diagonal matrix with diagonal entries as eigenvalues.
 
Problem 5.3. Use the QR algorithm to find the eigenvalues of the matrix
[3 1]
[2 2].
   
Solution. Let A1 = [3 1; 2 2]. The QR factorization of A1 is A1 = Q1R1, where
Q1 = (1/√13) [3 −2; 2 3] and R1 = (1/√13) [13 7; 0 4].
Thus A2 = R1Q1 = (1/13) [53 −5; 8 12]. This shows that the first approximations to the eigenvalues are 53/13 and 12/13. To decompose A2 as Q2R2 we follow the same procedure and get
Q2 = (1/(13√17)) [53 −8; 8 53] and R2 = (1/√17) [17 −1; 0 4].
Now A3 = R2Q2 = (1/221) [893 −189; 32 212]. Thus the second approximations to the eigenvalues turn out to be 893/221 ≈ 4.04 and 212/221 ≈ 0.96. We can proceed further to find the next approximation.
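The whole iteration can be sketched in a few lines (we use numpy's np.linalg.qr for each factorization; since A = QR holds in either sign convention, RQ = Q^{-1}AQ remains similar to A):

```python
import numpy as np

def qr_algorithm(A, iters=30):
    """Repeatedly factor A_k = Q_k R_k and form A_{k+1} = R_k Q_k.
    All iterates are similar to A; under the hypotheses of Theorem 5.1
    the diagonal approaches the eigenvalues."""
    Ak = np.asarray(A, float)
    for _ in range(iters):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q
    return Ak

print(np.round(qr_algorithm([[3.0, 1.0], [2.0, 2.0]]), 6))
# the diagonal entries approach the eigenvalues 4 and 1
```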
5.4. Location of Eigenvalues
If λ is an eigenvalue of a square matrix B with eigenvector v, then Bv = λv, so ||Bv|| = |λ| ||v|| and hence
|λ| = ||Bv||/||v|| ≤ ||B|| ||v||/||v|| = ||B||.
Note that this is true for all the matrix norms considered here, that is, ||·||_1, ||·||_2 and ||·||_∞. Thus
|λ| ≤ min{||B||_1, ||B||_2, ||B||_∞}.
Theorem 5.2. (Gershgorin's Theorem) Let A = (a_ij) be a square matrix of order n × n. Each eigenvalue λ of A satisfies
|a_ii − λ| ≤ Σ_{j=1, j≠i}^{n} |a_ij|, (5.7)
for at least one i, 1 ≤ i ≤ n.


Proof. Let λ be an eigenvalue of the matrix A and v an eigenvector corresponding to this eigenvalue, so that Av = λv. If ||v||_∞ = |v_k|, then |v_k| ≥ |v_i| for all 1 ≤ i ≤ n. Now since λv_k − a_kk v_k = a_k1 v_1 + a_k2 v_2 + . . . + a_{k,k−1} v_{k−1} + a_{k,k+1} v_{k+1} + . . . + a_kn v_n, we have
|λ − a_kk| = |a_k1 (v_1/v_k) + a_k2 (v_2/v_k) + . . . + a_{k,k−1} (v_{k−1}/v_k) + a_{k,k+1} (v_{k+1}/v_k) + . . . + a_kn (v_n/v_k)|
≤ |a_k1| |v_1|/|v_k| + |a_k2| |v_2|/|v_k| + . . . + |a_{k,k−1}| |v_{k−1}|/|v_k| + |a_{k,k+1}| |v_{k+1}|/|v_k| + . . . + |a_kn| |v_n|/|v_k|
≤ |a_k1| + |a_k2| + . . . + |a_{k,k−1}| + |a_{k,k+1}| + . . . + |a_kn|.
Thus for i = k the inequality (5.7) holds.
Remark 5.2. Since the set of eigenvalues of a square matrix A is the same as that of its transpose A^t, one can apply Gershgorin's theorem to A^t to conclude that each eigenvalue λ also satisfies
|a_ii − λ| ≤ Σ_{j=1, j≠i}^{n} |a_ji|,
for at least one i.

Problem 5.4. Use Gershgorin's theorem to find the location of the eigenvalues of the matrix
[1  0 −1]
[1 −2  1]
[2 −1 −1].
Solution. Let λ be an eigenvalue of the given matrix. According to Gershgorin's theorem, λ has to satisfy at least one of the following conditions: |λ − 1| ≤ 1, |λ + 2| ≤ 1 + 1 and |λ + 1| ≤ 2 + 1. Thus all the eigenvalues of the matrix lie within the union of these three disks. Further, if we apply Gershgorin's theorem to the transpose of the given matrix, then λ should lie within the union of the disks |λ − 1| ≤ 1 + 2, |λ + 2| ≤ 1 and |λ + 1| ≤ 1 + 1. Thus finally we conclude that all the eigenvalues lie within the intersection of these two unions.
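The disks are easy to tabulate programmatically (a sketch; each pair is (center, radius), and applying the same helper to A^t gives the column disks):

```python
import numpy as np

def gershgorin_disks(A):
    """Return the (center, radius) pairs given by Gershgorin's theorem
    applied to the rows of A."""
    A = np.asarray(A, float)
    radii = np.sum(np.abs(A), axis=1) - np.abs(np.diag(A))
    return list(zip(np.diag(A), radii))

A = np.array([[1.0, 0.0, -1.0], [1.0, -2.0, 1.0], [2.0, -1.0, -1.0]])
print(gershgorin_disks(A))    # row disks: (1, 1), (-2, 2), (-1, 3)
print(gershgorin_disks(A.T))  # column disks: (1, 3), (-2, 1), (-1, 2)
```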

6 Nonlinear Equation
In the last section we learned how to find the solution of a system of linear equations. But in practical problems it is also very important to find the value (or values) of an unknown satisfying a certain nonlinear equation.
Question 6.1. Let f be a nonlinear continuous function defined on the real line. Can we determine the values of x satisfying
f (x) = 0, x ∈ R? (6.1)

Remark 6.1. It might be easier to find a solution of f(x) = 0 if we somehow know that this solution lies in some particular interval. Using the Intermediate Value Theorem, one can find such an interval [a, b] by determining two points a, b ∈ R such that f(a) and f(b) have different signs. Now we will discuss some iterative methods to find such a solution.
6.1. Bisection Method

This method is based on the fact that if a continuous function f changes sign at two points, then f must have at least one root in the smallest interval containing these two points. We then subdivide this interval into two subintervals of equal length and search for the subinterval in which the root lies. The method can be described in the following steps.

• First determine the interval [a, b] in which the root lies; this can be done by finding two points a, b such that f(a)f(b) < 0. We consider the middle point x1 of [a, b] as the first approximation to the root. If x1 is the exact root, we are done; otherwise we proceed to the next step.
• Next choose the interval [a2, b2] with one endpoint x1 and the other endpoint one of a or b, according as f(x1)f(a) < 0 or f(x1)f(b) < 0 respectively. Now find the second approximation x2 as the middle point of [a2, b2], so that |x2 − r| ≤ |b2 − a2|/2 = |b − a|/4. If x2 is the exact root, we are done; otherwise we proceed to the next step.
then done or otherwise proceed to next step.
• In general, find the kth approximation x_k to the root as the middle point of [a_k, b_k]. If x_k is the root, stop the process. If not, find the next interval [a_{k+1}, b_{k+1}] such that one endpoint of this interval is x_k and the other is one of a_k or b_k, according as f(x_k)f(a_k) < 0 or f(x_k)f(b_k) < 0 respectively. Note that
|x_k − r| ≤ |b_k − a_k|/2 = |b − a|/2^k. (6.2)

6.2. Error analysis in bisection method

In the bisection method we search for the location of a root r of (6.1) lying in the interval [a, b]. The inequality (6.2) shows that the error e_k = r − x_k satisfies
|e_k| ≤ (b − a)/2^k.
Using this inequality we can get as close to the root as we want by increasing the number of iterates.
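The method is only a few lines of code. Below is a minimal Python sketch (the tolerance-based stopping rule is our own choice and follows directly from the error bound above):

```python
def bisection(f, a, b, tol=1e-10):
    """Find a root of f in [a, b], assuming f(a) * f(b) < 0.
    By (6.2), after k steps the error is at most (b - a) / 2**k."""
    if f(a) * f(b) >= 0:
        raise ValueError("f must change sign on [a, b]")
    while (b - a) / 2 > tol:
        m = (a + b) / 2
        if f(m) == 0:
            return m                # landed exactly on the root
        if f(a) * f(m) < 0:
            b = m                   # root lies in [a, m]
        else:
            a = m                   # root lies in [m, b]
    return (a + b) / 2

# Example: sqrt(2) as the root of x^2 - 2 in [1, 2]
print(bisection(lambda x: x * x - 2, 1.0, 2.0))
```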
6.3. The secant method
This method is based upon a linear approximation of the function. First we find two points near the root of the function, say x0 and x1, such that |f(x1)| < |f(x0)|. These points x0, x1 are the first two approximations to the root. The line passing through the two points (x0, f(x0)), (x1, f(x1)) on the graph of the function is called a secant. We use this secant as the approximating function and find the next iterate x2, an approximation to the root, as the intersection point of this secant with the real line. One can draw a figure and check that the slope of this secant can be written in two different ways using similar triangles (assuming, without loss of generality, that x1 < x0 and 0 < f(x1) < f(x0)):
f(x0)/(x0 − x2) = (f(x0) − f(x1))/(x0 − x1). (6.3)
Solving this for x2 we get
x2 = x1 − f(x1) (x0 − x1)/(f(x0) − f(x1)). (6.4)
Now we use x1, x2 to determine the next secant, the line passing through (x1, f(x1)), (x2, f(x2)) on the graph of the original function, and take x3 as the intersection of this secant with the real line. In general, we consider the secant passing through the points (x_{k−1}, f(x_{k−1})), (x_k, f(x_k)) of the graph, and the next approximation x_{k+1} as the intersection point of this secant with the real line. We have
x_{k+1} = x_k − f(x_k) (x_{k−1} − x_k)/(f(x_{k−1}) − f(x_k)). (6.5)
Remark 6.2. The sequence of iterates need not converge to a root of the function. In fact, it might even diverge to infinity.
Question 6.2. Why, then, does one use the secant method instead of the bisection method, which gives the guarantee of convergence?
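Before answering, here is a minimal sketch of the iteration (6.5) (the iteration cap and the zero-denominator guard reflect Remark 6.2: convergence is not guaranteed):

```python
def secant(f, x0, x1, tol=1e-12, max_iter=50):
    """Iterate (6.5) starting from the two approximations x0, x1."""
    for _ in range(max_iter):
        f0, f1 = f(x0), f(x1)
        if f0 == f1:
            break                   # secant is horizontal; give up
        x2 = x1 - f1 * (x0 - x1) / (f0 - f1)
        x0, x1 = x1, x2
        if abs(x1 - x0) < tol:
            break
    return x1

print(secant(lambda x: x * x - 2, 1.0, 2.0))   # ~ 1.414213562
```

The answer to the question lies in the speed of convergence, which we analyze next.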
6.4. Order of convergence and asymptotic error
Suppose the sequence of iterates {x0, x1, . . .} converges to the root r, and for the sequence of errors {e_k = r − x_k} there exist a number p and a constant C ≠ 0 such that
lim_{k→∞} |e_{k+1}|/|e_k|^p = C; (6.6)
then p is called the order of convergence and C is called the asymptotic error.
6.5. Error analysis of the secant method
Since we do not know the exact position of the root, our treatment of the error will be different from that of the bisection method, where we knew for sure an interval in which the root lies. Let us go back to the theme of the secant method: a linear approximation to the function at each new iterate. For the linear approximation with nodes x_{k−1}, x_k, we can use (2.35) to write the function as
f(x) = f(x_k) + f[x_k, x_{k−1}](x − x_k) + f[x_k, x_{k−1}, x](x − x_k)(x − x_{k−1}). (6.7)

If r is the root of the function, that is, f(r) = 0, we have from (6.7)
0 = f(x_k) + f[x_k, x_{k−1}](r − x_k) + f[x_k, x_{k−1}, r](r − x_k)(r − x_{k−1}).
This implies (assuming f[x_k, x_{k−1}] ≠ 0)
r = x_k − f(x_k)/f[x_k, x_{k−1}] − (f[x_k, x_{k−1}, r]/f[x_k, x_{k−1}]) (r − x_k)(r − x_{k−1}).

Now using (6.5) we get
r = x_{k+1} − (f[x_k, x_{k−1}, r]/f[x_k, x_{k−1}]) (r − x_k)(r − x_{k−1}). (6.8)

Further, if e_k = r − x_k denotes the error at the kth iterate, then we get from (6.8) that
e_{k+1} = −(f[x_k, x_{k−1}, r]/f[x_k, x_{k−1}]) e_k e_{k−1}. (6.9)
To determine the order of convergence of the secant method, we note from (6.9) that
|e_{k+1}| = c_k |e_k e_{k−1}|, (6.10)
where
c_k = |f[x_k, x_{k−1}, r]/f[x_k, x_{k−1}]|. (6.11)
Now if we assume that the sequence of iterates converges to the root r and the function is twice continuously differentiable, we can write
lim_{k→∞} c_k = lim_{k→∞} |f''(η_k)/(2f'(ξ_k))|, (6.12)
where η_k belongs to the smallest interval containing x_k, x_{k−1}, r, and ξ_k belongs to the smallest interval containing x_k, x_{k−1}. But because of the convergence of the iterates to the root r, these intervals eventually shrink to the single point r as k tends to infinity, and hence we have
lim_{k→∞} c_k = |f''(r)/(2f'(r))| =: c. (6.13)

Now using (6.10), we can write
|e_{k+1}|/|e_k|^p = c_k |e_k|^{1−p} |e_{k−1}| = c_k (|e_k|/|e_{k−1}|^p)^α, (6.14)

provided α = 1 − p and αp = −1, that is, p^2 − p − 1 = 0, so that p = (1 + √5)/2. (The negative root is not considered, because it leads to lim_{k→∞} |e_{k+1}|/|e_k|^p = 0, which does not determine the order of convergence.) If we set y_k = |e_k|/|e_{k−1}|^p, we have from (6.14)
y_{k+1} = c_k y_k^{−1/p}. (6.15)
Now if y_k → y, then y = c y^{−1/p}, or y = c^{1/p}. Thus the asymptotic error is |f''(r)/(2f'(r))|^{1/p} and the order of convergence is p = (1 + √5)/2 ≈ 1.618.
6.6. Newton’s Method
Newton's method for finding a root of the equation f(x) = 0 is based upon approximating the function by its tangent line near the root; thus the function is assumed to be differentiable. Here we only need one point x0 near the root as the initial approximation to begin with. To find the next approximation we draw the tangent line at (x0, f(x0)) and take the intersection point of this tangent line with the real line; call it x1. Thus
f'(x0) = f(x0)/(x0 − x1).
This gives
x1 = x0 − f(x0)/f'(x0). (6.16)
In general, to find the (k + 1)th iterate we use the tangent line at (x_k, f(x_k)) as the approximating function and take its root, that is, its intersection point with the real line. We have
x_{k+1} = x_k − f(x_k)/f'(x_k). (6.17)

Remark 6.3. Newton's method also need not converge.
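A sketch of the iteration (6.17), again with an iteration cap because of the possible non-convergence:

```python
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Iterate (6.17): x_{k+1} = x_k - f(x_k) / f'(x_k)."""
    x = x0
    for _ in range(max_iter):
        dfx = fprime(x)
        if dfx == 0:
            break                   # horizontal tangent: no next iterate
        x_new = x - f(x) / dfx
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

print(newton(lambda x: x * x - 2, lambda x: 2 * x, 1.5))   # ~ 1.414213562
```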


6.7. Error analysis of Newton’s method
One can write the function in the Newton form as
f(x) = f(x_k) + f[x_k, x_k](x − x_k) + f[x_k, x_k, x](x − x_k)^2, (6.18)
where f[x_k, x_k] = f'(x_k). If r is the root of the function, we have 0 = f(x_k) + f[x_k, x_k](r − x_k) + f[x_k, x_k, r](r − x_k)^2. Or,
r = x_k − f(x_k)/f[x_k, x_k] − (f[x_k, x_k, r]/f[x_k, x_k])(r − x_k)^2 = x_{k+1} − (f[x_k, x_k, r]/f[x_k, x_k])(r − x_k)^2. (6.19)
Further, if e_k = r − x_k is the error at the kth stage,
e_{k+1} = −(f[x_k, x_k, r]/f[x_k, x_k]) e_k^2 = −(f''(η_k)/(2f'(x_k))) e_k^2, (6.20)
where η_k belongs to the smallest interval containing x_k and r. And if the sequence of iterates converges to r, then this interval eventually shrinks to the point r itself, and hence
lim_{k→∞} |e_{k+1}|/|e_k|^2 = lim_{k→∞} |f''(η_k)/(2f'(x_k))| = |f''(r)/(2f'(r))|. (6.21)
The above equation shows that if Newton's method converges (to a simple root, so that f'(r) ≠ 0), then the order of convergence is 2 with asymptotic error |f''(r)/(2f'(r))|.
6.8. Fixed point iteration method
If f(x) = 0 ⇔ x = g(x), then instead of finding the roots of f(x) we may search for the fixed points of the function g(x) (the two problems are equivalent). Starting from some initial approximation x0 to the fixed point, we determine the first approximation by the equation x1 = g(x0), and in general
x_{n+1} = g(x_n) for all n ∈ N. (6.22)
Theorem 6.1. If g : [a, b] → [a, b] is a continuous function, then there is at least one fixed point of g in [a, b]. If moreover g is differentiable in (a, b) and
|g'(x)| ≤ α < 1 for all x ∈ (a, b),
then there is exactly one fixed point of g in (a, b). Further, if x1 ∈ (a, b), then the sequence defined by
x_n = g(x_{n−1})
converges to the fixed point.
Proof. Since the function g(x) − x changes sign on [a, b] (because g(a) − a ≥ 0 and g(b) − b ≤ 0), by the Intermediate Value Property of continuous functions g(x) − x has a root in [a, b]. Now suppose, if possible, that ξ1, ξ2 ∈ (a, b) are distinct points such that g(ξ1) = ξ1 and g(ξ2) = ξ2. Then by the Mean Value Theorem there exists a point c between ξ1 and ξ2 such that
g'(c) = (g(ξ2) − g(ξ1))/(ξ2 − ξ1) = (ξ2 − ξ1)/(ξ2 − ξ1) = 1,
a contradiction to the assumption |g'(x)| ≤ α < 1. Hence there is only one fixed point, say ξ, in (a, b).
We use mathematical induction to prove that the set {x_n : n ∈ N} is a subset of (a, b): clearly x2 = g(x1) ∈ (a, b) since x1 ∈ (a, b), and if x_k ∈ (a, b), then x_{k+1} = g(x_k) ∈ (a, b). Note that
|x_{k+1} − x_k| = |g(x_k) − g(x_{k−1})| = |g'(λ_k)| |x_k − x_{k−1}| ≤ α |x_k − x_{k−1}|.
Since α < 1, it can be proved that {x_k} is a Cauchy sequence and hence convergent. If x_k → x, then by the continuity of g, g(x_k) → g(x) (note that x ∈ [a, b]). Hence, taking the limit in (6.22), we get x = g(x). Thus x is the fixed point of g in (a, b), that is, x = ξ.
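A sketch of the iteration (6.22); the example g(x) = cos x on [0, 1] satisfies the hypotheses of Theorem 6.1, since cos maps [0, 1] into itself and |g'(x)| = |sin x| ≤ sin 1 < 1 there:

```python
import math

def fixed_point(g, x1, tol=1e-12, max_iter=200):
    """Iterate x_n = g(x_{n-1}) until successive iterates are close."""
    x = x1
    for _ in range(max_iter):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

print(fixed_point(math.cos, 0.5))   # ~ 0.7390851, the fixed point of cos
```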

Theorem 6.2. Let l0 be a fixed point of g(x). Suppose ε > 0 is such that g is differentiable on [l0 − ε, l0 + ε] and |g'(x)| ≤ α < 1 for all x ∈ [l0 − ε, l0 + ε]. Then the sequence defined by x_n = g(x_{n−1}) with x1 ∈ [l0 − ε, l0 + ε] converges to l0.
Proof. To prove this theorem we first prove that g([l0 − ε, l0 + ε]) ⊂ [l0 − ε, l0 + ε]. If x ∈ [l0 − ε, l0 + ε], then |l0 − g(x)| = |g(l0) − g(x)| = |g'(c)| |l0 − x| ≤ α |l0 − x| ≤ αε < ε, hence g(x) ∈ [l0 − ε, l0 + ε]. Now g fulfils the conditions of Theorem 6.1 with the interval [a, b] taken as [l0 − ε, l0 + ε], so we conclude the rest.
Remark 6.4. Suppose g is twice differentiable and g'(r) ≠ 0, where r is the fixed point of g. Then by Taylor's formula
x_{k+1} = g(x_k) = g(r + (x_k − r)) = g(r) + g'(r)(x_k − r) + (1/2) g''(c)(x_k − r)^2,
so that, with e_k = r − x_k,
e_{k+1} = g'(r) e_k − (1/2) g''(c) e_k^2, and |e_{k+1}|/|e_k| = |g'(r) − (1/2) g''(c) e_k|.
Now if {x_k} converges to r, or equivalently e_k → 0, we have
lim_{k→∞} |e_{k+1}|/|e_k| = |g'(r)|.
This shows that the order of convergence of the fixed point iteration method is 1 and the asymptotic error is |g'(r)|.
Remark 6.5. Note that the iterations of Newton's method are given by
x_{k+1} = x_k − f(x_k)/f'(x_k).
If g(x) = x − f(x)/f'(x), then Newton's method is also a fixed point iteration method. By the last remark the order of convergence of the fixed point iteration method is 1, but we know that the order of convergence of Newton's method is 2. This apparent discrepancy arises because in Newton's method g'(r) = 0, which is not the case in general for the fixed point iteration method. Now let us examine the convergence behavior of the fixed point iteration method when g'(r) = 0 and g''(r) ≠ 0. Suppose g is thrice differentiable; then by Taylor's formula
x_{k+1} = g(x_k) = g(r + (x_k − r)) = g(r) + g'(r)(x_k − r) + (1/2) g''(r)(x_k − r)^2 + (1/6) g'''(c)(x_k − r)^3 = r + (1/2) g''(r) e_k^2 − (1/6) g'''(c) e_k^3.
Thus
|e_{k+1}|/|e_k|^2 = |(1/2) g''(r) − (1/6) g'''(c) e_k|, or lim_{k→∞} |e_{k+1}|/|e_k|^2 = (1/2) |g''(r)|.
Hence the convergence is of order 2 with asymptotic error (1/2)|g''(r)|. Since for Newton's method g(x) = x − f(x)/f'(x), this asymptotic error is (1/2)|g''(r)| = |f''(r)|/(2|f'(r)|).

6.9. Multiple Roots


If f(x) = 0 has a root at x = r with multiplicity m greater than 1, then f can be written as f(x) = (x − r)^m h(x), where h is a function such that h(r) is a nonzero finite quantity. Now, for g(x) = x − f(x)/f'(x),
g(x) = x − (x − r)^m h(x) / [m(x − r)^{m−1} h(x) + (x − r)^m h'(x)] = x − (x − r)/m + (x − r)^2 h'(x) / [m^2 h(x) + m h'(x)(x − r)].

And we have g'(r) = 1 − 1/m. This shows that Newton's method converges only linearly when m > 1. We aim to modify Newton's method so as to ensure quadratic convergence. For this we modify g in such a way that g'(r) = 0 and g(r) = r. Note that if we take the modified g as g(x) = x − α f(x)/f'(x), then g'(r) = 1 − α/m. Thus we take g(x) = x − m f(x)/f'(x), so that the modified sequence of iterates is
x_{k+1} = x_k − m f(x_k)/f'(x_k). (6.23)
Note that this sequence of iterates can be applied only if we know a priori the multiplicity of the root. To get quadratic convergence without knowing the multiplicity, we take g(x) = x − u(x)/u'(x), where u(x) = f(x)/f'(x); the function u has a simple root at r, so this is Newton's method applied to u. Then g(r) = r and g'(r) = 0, and hence the iterates defined by x_{k+1} = g(x_k) converge quadratically irrespective of the multiplicity of the root of f(x) = 0. Written out, this gives
x_{k+1} = x_k − f(x_k) f'(x_k) / [(f'(x_k))^2 − f(x_k) f''(x_k)]. (6.24)
Note that in this case we need an extra evaluation, of f''(x), in each iteration.
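A sketch of the multiplicity-independent iteration (6.24); on f(x) = (x − 1)^2, ordinary Newton would converge only linearly, while this variant reaches the double root immediately:

```python
def newton_multiple(f, fp, fpp, x0, tol=1e-12, max_iter=50):
    """Iterate (6.24): Newton's method applied to u = f / f'."""
    x = x0
    for _ in range(max_iter):
        fx, fpx, fppx = f(x), fp(x), fpp(x)
        denom = fpx * fpx - fx * fppx
        if denom == 0:
            break                    # iteration has stalled (or converged)
        x_new = x - fx * fpx / denom
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# f(x) = (x - 1)^2 has a root of multiplicity 2 at x = 1
print(newton_multiple(lambda x: (x - 1) ** 2,
                      lambda x: 2 * (x - 1),
                      lambda x: 2.0, 3.0))     # returns 1.0
```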
6.10. Solution of System of Nonlinear Equations
Consider the system of two nonlinear equations in two unknowns:
f(x, y) = 0,
g(x, y) = 0.
Let F : R^2 → R^2 be defined by F(x, y) = [f(x, y), g(x, y)]^t. Using Taylor's series expansion for functions of two variables, we have

0 = f(x, y) = f(x_k + (x − x_k), y_k + (y − y_k)) = f(x_k + ∆x_k, y_k + ∆y_k)
= f(x_k, y_k) + [∆x_k (∂/∂x) + ∆y_k (∂/∂y)] f|_{(x_k, y_k)} + (1/2) [∆x_k (∂/∂x) + ∆y_k (∂/∂y)]^2 f|_{(x_k, y_k)} + . . . ,
where ∆x_k = x − x_k and ∆y_k = y − y_k. Neglecting the higher order terms, [∆x_k (∂/∂x) + ∆y_k (∂/∂y)] f|_{(x_k, y_k)} ≈ −f(x_k, y_k). Thus we have
[∆x_k f_x + ∆y_k f_y]|_{(x_k, y_k)} ≈ −f(x_k, y_k),
[∆x_k g_x + ∆y_k g_y]|_{(x_k, y_k)} ≈ −g(x_k, y_k).


Or,
[f_x f_y; g_x g_y]|_{(x_k, y_k)} [∆x_k; ∆y_k] ≈ −[f; g]|_{(x_k, y_k)}. (6.25)
 
Let J denote the Jacobian [f_x f_y; g_x g_y], and let J_k denote J evaluated at X_k = [x_k, y_k]^t. Thus we can write J_k ∆X_k ≈ −F(X_k), or ∆X_k ≈ −J_k^{-1} F(X_k), or X ≈ X_k − J_k^{-1} F(X_k), that is,
[x; y] ≈ [x_k; y_k] − J_k^{-1} [f(x_k, y_k); g(x_k, y_k)]. (6.26)

Thus we define the sequence of iterates as
[x_{k+1}; y_{k+1}] = [x_k; y_k] − J_k^{-1} [f(x_k, y_k); g(x_k, y_k)]. (6.27)

This method is known as Newton's iteration method for solving a system of nonlinear equations. In general, for a system of n nonlinear equations in n unknowns we define the iterates as
X_{k+1} = X_k − J_k^{-1} F(X_k), (6.28)
where X_k = [x_1^{(k)}, x_2^{(k)}, . . . , x_n^{(k)}]^t, F(X) = [f_1(X), f_2(X), . . . , f_n(X)]^t and J_k is the Jacobian of f_1, f_2, . . . , f_n with respect to x_1, x_2, . . . , x_n evaluated at X_k.
Remark 6.6. Suppose we have a system of two linear equations, say
ax + by = s, (6.29)
cx + dy = t. (6.30)
We can view them as f_1(x, y) = ax + by − s = 0 and f_2(x, y) = cx + dy − t = 0, so that the Jacobian J of f_1, f_2 with respect to x, y is [a b; c d], which is constant. Now we assume that [x_0, y_0]^t is an initial approximation to the solution and that J is invertible, so that we can apply Newton's method to find the first approximation [x_1, y_1]^t:
[x_1; y_1] = [x_0; y_0] − [a b; c d]^{-1} [f_1(x_0, y_0); f_2(x_0, y_0)]. (6.31)
Or,
[a b; c d] [x_1; y_1] = [a b; c d] [x_0; y_0] − [f_1(x_0, y_0); f_2(x_0, y_0)] = [ax_0 + by_0; cx_0 + dy_0] − [ax_0 + by_0 − s; cx_0 + dy_0 − t] = [s; t]. (6.32)
This shows that when we apply Newton's method to linear equations, the very first iterate is already the exact solution.
Remark 6.7. We can find the complex roots of an equation f(z) = 0 by writing the real and imaginary parts of f(z) = f(x + iy) as f(x + iy) = u(x, y) + iv(x, y), and then solving the system of equations
u(x, y) = 0, v(x, y) = 0.
Problem 6.1. Find the solution of the following system of equations:
x^2 + 10x + 2y − 13 = 0, x^2 + 6y^2 − 7 = 0,
using the initial approximation x_0 = 0.5, y_0 = 0.5.
Solution. Let f(x, y) = x^2 + 10x + 2y − 13 and g(x, y) = x^2 + 6y^2 − 7. Thus the Jacobian is
J = [2x + 10  2; 2x  12y], with J^{-1} = (1/(24xy + 120y − 4x)) [12y  −2; −2x  2x + 10].
At (x_0, y_0) = (0.5, 0.5) we have f(x_0, y_0) = −6.75, g(x_0, y_0) = −5.25 and J^{-1}(x_0, y_0) = (1/64) [6 −2; −1 11], so
[x_1; y_1] = [0.5; 0.5] − (1/64) [6 −2; −1 11] [−6.75; −5.25] = [0.96875; 1.296875].
Repeating the step,
[x_2; y_2] = [x_1; y_1] − J^{-1}(x_1, y_1) [f(x_1, y_1); g(x_1, y_1)] ≈ [0.994259; 1.034757],
and
[x_3; y_3] = [x_2; y_2] − J^{-1}(x_2, y_2) [f(x_2, y_2); g(x_2, y_2)] ≈ [0.999902; 1.000602].
Thus the solution correct up to two decimal places is x = 1.00, y = 1.00.
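The same computation takes a few lines of numpy (a sketch; np.linalg.solve(J, F) replaces the explicit inverse J^{-1}F used in the hand computation):

```python
import numpy as np

def F(v):
    x, y = v
    return np.array([x**2 + 10*x + 2*y - 13.0, x**2 + 6*y**2 - 7.0])

def J(v):
    x, y = v
    return np.array([[2*x + 10.0, 2.0], [2*x, 12*y]])

X = np.array([0.5, 0.5])
for k in range(3):
    X = X - np.linalg.solve(J(X), F(X))   # X_{k+1} = X_k - J_k^{-1} F(X_k)
    print(k + 1, X)                        # approaches the solution (1, 1)
```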
