MA 204
Syllabus:
Suggested Readings:
1. S.D. Conte and Carl de Boor, Elementary Numerical Analysis: An Algorithmic Approach (3rd Edition), McGraw-Hill, 1980.
2. E. Kreyszig, Advanced Engineering Mathematics (8th Edition), John Wiley and Sons, 1999.
3. M.K. Jain, S.R.K. Iyengar, R.K. Jain, Numerical Methods (6th Edition), New Age International (P) Ltd, 2012.
These are some partial, rough notes prepared for MA-204 students at IIT Indore. Mistakes and typos are bound to be there. Students are advised to read them carefully and are encouraged to send their comments and suggestions.
Ashisha Kumar
Discipline of Mathematics
I.I.T. Indore.
Contents
1 Introduction
2 Interpolation by polynomials
3 Numerical integration
6 Nonlinear Equations
1 Introduction
Many mathematical problems are formulated while solving problems from other sciences. Some of these can be solved by methods we have learned in courses on Calculus, Differential Equations and Linear Algebra, and we are happy to get exact solutions, or at least some information about the analytic behavior of the solution. But for most practical purposes we cannot obtain exact solutions by direct computation (perhaps because of round-off errors), or a solution in a known compact form may not exist, or even when it does, the compact form may not be enough and we want a numeric value as the solution. In this course our aim is to solve such mathematical problems using numerical methods.
2 Interpolation by polynomials
Suppose we want to know the value of a function (or of some of its derivatives) at a particular point, but the only information we have about the function is its value (or the value of its derivatives) at certain other points. The function itself might be an arbitrary continuous function. We want to approximate the function by a polynomial which agrees with the known data about it. The reason for approximating a given function by a polynomial is that the value of a polynomial at a point can be computed using only the basic operations (addition, subtraction and multiplication) of a computer.
In this chapter our aim is to find an approximating polynomial for a function whose value, or the value of certain of its derivatives, is known at some points. These points are called nodes. To ease the computations we always try to find a polynomial of minimal degree which satisfies the given data.
Problem 2.1. Find a polynomial which coincides with |x| at the points −1, 0 and 1.
This means we need to find a polynomial P which coincides with the function |x| at the points −1, 0 and 1. So we must have
P (−1) = | − 1| = 1, P (0) = |0| = 0, P (1) = |1| = 1 (2.1)
Now since there are three distinct points, we try to find a degree-two polynomial, because a degree-two polynomial has three degrees of freedom. (The vector space of polynomials of degree at most two is three dimensional, and {1, x, x^2} is a basis for this vector space.)
Let P (x) = a + bx + cx2 be a polynomial, which passes through the points (−1, 1), (0, 0), (1, 1).
Then we must have
    a + b(0) + c(0)^2 = 0
    a + b(−1) + c(−1)^2 = 1
    a + b(1) + c(1)^2 = 1   (2.2)
Remark 2.1. Here we need to solve a system of three linear equations in three unknowns. Had we approximated by a polynomial of degree less than two, the resulting system of three linear equations in two unknowns might not have any solution. Similarly, had we approximated by a degree-four polynomial, the system of linear equations might have infinitely many solutions, and because of the increased degree of the polynomial the computations would be more difficult.
From (2.2), we must have
    [ 1   0   0 ] [ a ]   [ 0 ]
    [ 1  −1   1 ] [ b ] = [ 1 ]   (2.3)
    [ 1   1   1 ] [ c ]   [ 1 ]
Since the determinant of the coefficient matrix is −2 and its adjugate is
    [ −2   0   0 ]
    [  0   1  −1 ] ,
    [  2  −1  −1 ]
the inverse of the matrix is given by
    [ 1   0   0 ]^(−1)          [ −2   0   0 ]
    [ 1  −1   1 ]       = −(1/2)[  0   1  −1 ] .   (2.4)
    [ 1   1   1 ]               [  2  −1  −1 ]
And hence
    [ a ]         [ −2   0   0 ] [ 0 ]   [ 0 ]
    [ b ] = −(1/2)[  0   1  −1 ] [ 1 ] = [ 0 ] .   (2.5)
    [ c ]         [  2  −1  −1 ] [ 1 ]   [ 1 ]
Thus the degree-two polynomial which coincides with |x| at the points −1, 0, 1 is given by P(x) = x^2.
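We can check this computation numerically. The following is a minimal Python sketch (using NumPy; the setup is ours) that solves the system (2.3) directly:

    import numpy as np

    # Coefficient matrix and right-hand side of (2.3): each row is the
    # interpolation condition at one of the nodes 0, -1, 1.
    A = np.array([[1.0,  0.0, 0.0],
                  [1.0, -1.0, 1.0],
                  [1.0,  1.0, 1.0]])
    rhs = np.array([0.0, 1.0, 1.0])

    # Coefficients (a, b, c) of P(x) = a + b x + c x^2.
    a, b, c = np.linalg.solve(A, rhs)
    print(a, b, c)  # 0.0 0.0 1.0, i.e. P(x) = x^2

Note that np.linalg.solve avoids forming the inverse explicitly, which is also numerically preferable to the adjugate computation above.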
Remark 2.2. In general there are a large number of data points, with numerically complicated values. Then:
• We need to compute the inverse of a large matrix.
• The solution of the system of linear equations may be far from the exact solution due to round-off errors.
Suppose we know the value of a continuous function at two points x0 , x1 , that is, we know
f (x0 ), f (x1 ).
Question 2.1. Can we find a polynomial, which satisfies P (x0 ) = f (x0 ), P (x1 ) = f (x1 )?
The answer is easy, and we learned it in our 10th standard: we find the equation of the line passing through (x0, f(x0)) and (x1, f(x1)), because the equation of a line is a polynomial of degree one. You surely know many versions of the formula for the equation of a line through two points, but here we find it in a somewhat different way.
P (x) = ax + b; f (x0 ) = ax0 + b; f (x1 ) = ax1 + b;
or
P (x)(−1) + ax + (1)b = 0; f (x0 )(−1) + ax0 + (1)b = 0; f (x1 )(−1) + ax1 + (1)b = 0;
Now if you recall the matrix representation of a system of linear equations from the Linear Algebra course, we can write the above system of equations as
    [ P(x)    x   1 ] [ −1 ]   [ 0 ]
    [ f(x0)  x0   1 ] [  a ] = [ 0 ]   (2.6)
    [ f(x1)  x1   1 ] [  b ]   [ 0 ]
Now if for any particular value of x, say x = c, the 3 × 3 matrix
    [ P(c)   c   1 ]
    [ f(x0)  x0  1 ]
    [ f(x1)  x1  1 ]
were invertible, then multiplying both sides of equation (2.6) by the inverse of this matrix would give the contradiction that the vector (−1, a, b)^T is the null vector. This implies that the matrix
    [ P(x)   x   1 ]
    [ f(x0)  x0  1 ]
    [ f(x1)  x1  1 ]
is singular for every value of x, and hence its determinant is zero for all values of x. Expanding this determinant along the first column, we get
    P(x)(x0 − x1) + f(x0)(x1 − x) + f(x1)(x − x0) = 0.
Rewriting it, we get
    P(x) = f(x0) (x − x1)/(x0 − x1) + f(x1) (x − x0)/(x1 − x0).   (2.7)
Observation 2.1. Now suppose we want to find a polynomial P(x) which agrees with a function f(x) at three distinct points x0, x1, x2, that is, P(x) satisfies P(x0) = f(x0), P(x1) = f(x1) and P(x2) = f(x2). If we consider the degree-two polynomial P(x) = a + bx + cx^2 and proceed as in the previous case, we conclude that the determinant of the matrix
    [ P(x)   x^2   x   1 ]
    [ f(x0)  x0^2  x0  1 ]
    [ f(x1)  x1^2  x1  1 ]
    [ f(x2)  x2^2  x2  1 ]
is equal to zero for all values of x in the interval. Expanding along the first column we get
    P(x) = f(x0) (x − x1)(x − x2)/((x0 − x1)(x0 − x2)) + f(x1) (x − x0)(x − x2)/((x1 − x0)(x1 − x2)) + f(x2) (x − x0)(x − x1)/((x2 − x0)(x2 − x1)).   (2.8)
Now if we write
    L0(x) = (x − x1)(x − x2)/((x0 − x1)(x0 − x2)),
    L1(x) = (x − x0)(x − x2)/((x1 − x0)(x1 − x2)),
    L2(x) = (x − x0)(x − x1)/((x2 − x0)(x2 − x1)),
then we observe the following properties of these functions.
• L0 + L1 + L2 = 1,
• Lj(xi) = δij, where δij = 1 if i = j and δij = 0 if i ≠ j.
    P(x) = a0 + a1 x + a2 x^2 + ... + an x^n,   (2.9)
Remark 2.4. We can use the following steps to find the Lagrange polynomial for the given data, as implemented in the sketch below.
a) First compute Lagrange's fundamental polynomial Lj for each j.
b) Multiply each Lj(x) by the corresponding functional value f(xj).
c) Then sum the products obtained in step b) over all j.
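These steps translate directly into code. Here is an illustrative Python sketch (the function name is ours):

    def lagrange_interpolate(nodes, values, x):
        """Evaluate the Lagrange interpolating polynomial at x,
        following steps a)-c) of Remark 2.4."""
        total = 0.0
        for j in range(len(nodes)):
            # Step a): Lagrange's fundamental polynomial L_j evaluated at x.
            Lj = 1.0
            for i in range(len(nodes)):
                if i != j:
                    Lj *= (x - nodes[i]) / (nodes[j] - nodes[i])
            # Step b): multiply L_j(x) by the functional value f(x_j).
            total += Lj * values[j]
        # Step c): the accumulated sum over all j is P(x).
        return total

    # Data of Problem 2.1: |x| at the nodes -1, 0, 1.
    print(lagrange_interpolate([-1, 0, 1], [1, 0, 1], 0.5))  # 0.25 = 0.5**2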
Remark 2.5. If we draw the graphs of both f(x) and P(x), then it is clear from the conditions that the two graphs intersect each other in at least n + 1 distinct points, namely (x0, f(x0)), (x1, f(x1)), ..., (xn, f(xn)). In other words, P(x) interpolates f(x) at n + 1 points.
We can also think of finding a polynomial P such that certain derivatives of P coincide with
the same order derivatives of the given function f at some points in the interval.
Example 2.4. If f(x) = cos x, then the polynomial P(x) = 1 − x^2/2 not only agrees with f(x) at x = 0, but the first and second derivatives of P(x) also agree with the first and second derivatives of f(x) respectively at x = 0. Thus 1 − x^2/2 is also an interpolating polynomial for cos x.
Example 2.5. If f(x) = sin x, then the polynomial P(x) = x not only agrees with f(x) at x = 0, but the first and second derivatives of P(x) also agree with the respective derivatives of f(x) at x = 0. Thus P(x) = x is an interpolating polynomial for sin x.
But the third-order derivative of P(x) = x does not coincide with the third-order derivative of the function sin x! If we consider P(x) = x − x^3/3!, then P(x) agrees with sin x at x = 0 along with its first three derivatives.
Does this remind you of something? (Think about it.) Let us see one more example; then your guess will be confirmed!
Example 2.6. If f(x) = e^x, then the polynomial
• P0(x) = 1 is an interpolating polynomial for e^x, because P0(x) = 1 agrees with the function at x = 0.
• P1(x) = 1 + x is an interpolating polynomial for e^x, because P1(x) and its first derivative agree with the function e^x at x = 0.
• P2(x) = 1 + x + x^2/2 is an interpolating polynomial for e^x, because P2(x) and its first and second order derivatives agree with the function e^x at x = 0.
• P3(x) = 1 + x + x^2/2 + x^3/3! is an interpolating polynomial for e^x, because P3(x) and its first three derivatives agree with the function e^x and its derivatives respectively at x = 0.
Example 2.7. The Taylor polynomial of degree n of the function f(x) around the point x0 is given by
    Pn(x) = f(x0) + (x − x0) f'(x0) + (1/2!)(x − x0)^2 f''(x0) + ... + (1/n!)(x − x0)^n f^(n)(x0).   (2.13)
Then Pn agrees with f (x) at x = x0 along with its first n derivatives. Therefore, Pn is also an
interpolating polynomial for the function f (x).
Remark 2.6. In all the above examples, to find the interpolating polynomial P(x) we do not need full information about the function; we only need specific data about it. In Example 2.5 we can also pose the problem in a different way, without saying anything about the sine function, as follows.
(i) Find a polynomial P(x) which satisfies P(0) = 0, P'(0) = 1, P''(0) = 0, and P'''(0) = −1, or
(ii) Find a polynomial P(x) which agrees with a function f(x) along the data f(0) = 0, f'(0) = 1, f''(0) = 0, and f'''(0) = −1.
Note that this f(x) need not be sin x; for example, it might be x − x^3/3! + x^10.
Example 2.8. The polynomial
    P(x) = f(x0) (x − x1)/(x0 − x1) + f(x1) (x − x0)/(x1 − x0)
is an interpolating polynomial for f(x), because P(x) coincides with f(x) at the points x0, x1.
On the other hand, if we want to find an interpolating polynomial for the data f(x0) = f(x1) = c, then the above formula yields the constant polynomial P(x) = c. But the polynomial P(x) = (x − x0)(x − x1) + c also satisfies the data. This means that the interpolating polynomial for a given data set is not necessarily unique.
Question 2.2. Since the interpolating polynomial is not unique, the question arises: by which interpolating polynomial should we approximate the function?
In the above example there are two interpolating polynomials for the given data: one is a constant polynomial and the other is of degree two. Note that calculations with the constant polynomial will be easier than with the degree-two polynomial.
Remark 2.7. In general we need to deal with large data sets, so to ease the computations it is better to approximate by an interpolating polynomial of minimal degree.
Remark 2.8. In Problem 2.1 we interpolated data at three distinct points by a degree-two polynomial, and by construction it is clear that this is the only polynomial of degree at most two which can interpolate the given data. If we change the functional value to the constant 1 at each of the three points −1, 0, 1 and follow the same procedure as above, we get the constant polynomial 1 as the interpolating polynomial, which is again unique by construction.
Question 2.3. If we are given the functional values at n + 1 distinct points, then by the answer to Problem 2.2 there exists a Lagrange polynomial of degree at most n which interpolates the given data. Now the question is whether there is any other interpolating polynomial (for the same data) of degree at most n. In other words, is the Lagrange polynomial the unique interpolating polynomial of degree at most n for given data at n + 1 distinct points?
The answer is yes.
Or in matrix form,
    [ 1  x0  x0^2 ... x0^n ] [ a0 ]   [ f(x0) ]
    [ 1  x1  x1^2 ... x1^n ] [ a1 ]   [ f(x1) ]
    [ .  .    .   ...   .  ] [ .. ] = [  ..   ]   (2.16)
    [ 1  xn  xn^2 ... xn^n ] [ an ]   [ f(xn) ]
This is a system of (n + 1) linear equations in the (n + 1) unknowns a0, a1, ..., an. The system has a unique solution if the determinant of the following matrix (known as the Vandermonde determinant) is nonzero:
        | 1  x0  x0^2 ... x0^n |
        | 1  x1  x1^2 ... x1^n |
    ∆ = | .  .    .   ...   .  | ≠ 0.   (2.17)
        | 1  xn  xn^2 ... xn^n |
But we know that ∆ = ∏_{0≤i<j≤n} (xi − xj) ≠ 0. So all the coefficients of P(x) are uniquely determined, and hence P(x) itself.
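In code, the Vandermonde system (2.16) can be set up and solved directly; NumPy's np.vander builds exactly this matrix. A short sketch (node data as in Problem 2.1):

    import numpy as np

    x = np.array([-1.0, 0.0, 1.0])
    f = np.abs(x)

    # Columns 1, x, x^2, ..., x^n, as in (2.16).
    V = np.vander(x, increasing=True)
    coeffs = np.linalg.solve(V, f)   # unique, since the determinant (2.17) is nonzero
    print(coeffs)                    # [0. 0. 1.] -> P(x) = x^2

Keep in mind that for many nodes the Vandermonde matrix is badly conditioned, which is the practical difficulty already raised in Remark 2.2.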
The above lemma can be proved by repeated application of Rolle's Theorem. We need the following corollary to this lemma for our second proof.
Corollary 2.3. A non zero polynomial of degree n cannot have n + 1 distinct zeros.
Indeed, if a polynomial P(x) ≠ 0 of degree n has n + 1 zeros, then by the above lemma its nth derivative must have at least one zero. But the nth derivative of a polynomial of degree n is the constant function n! an, where an is the coefficient of x^n in P(x), and this constant function cannot have any zero unless an itself is zero. This contradicts the degree of the polynomial P(x).
Proof. The proof is by contradiction. Suppose there is another polynomial Q(x) ≠ P(x) of degree at most n such that Q(x) also interpolates the same data, that is, we must have
From (2.14) and (2.18) it is clear that the polynomial R(x) = P(x) − Q(x) ≠ 0 must have n + 1 zeros at x0, x1, ..., xn. This gives a contradiction, because R(x) is of degree at most n, and hence it cannot have n + 1 distinct zeros unless R(x) is itself the zero polynomial.
Remark 2.9. The main difficulty in dealing with the Lagrange polynomial is that if we want to extend the data, even by a single point, we need to compute it from the very beginning.
2.3. Newton’s Divided Difference Interpolating Polynomial
Newton suggested writing the interpolating polynomial for given data at n + 1 distinct points x0, x1, ..., xn in the form
    P(x) = a0 + a1(x − x0) + a2(x − x0)(x − x1) + ... + an(x − x0)(x − x1)...(x − xn−1).   (2.19)
The interpolation conditions then read
    f(x0) = a0;
    f(x1) = a0 + (x1 − x0)a1;
    f(x2) = a0 + (x2 − x0)a1 + (x2 − x0)(x2 − x1)a2;
    ...
    f(xn) = a0 + (xn − x0)a1 + (xn − x0)(xn − x1)a2 + ... + (xn − x0)...(xn − xn−1)an.   (2.20)
It is clear from the above system of equations that a0 = f(x0); to compute a1 we plug the value of a0 into the second equation, and similarly to compute ak we can substitute the values a0, a1, ..., ak−1, obtained by solving the first k equations, into the (k + 1)th equation. It is important to note that we only need the first k + 1 equations to compute ak. Thus the computation of ak depends only on the points x0, ..., xk, so we can regard ak as a new function of the points {x0, x1, x2, ..., xk}. To show the dependence of ak on the function f as well, we denote ak by f[x0, x1, x2, ..., xk]. We call f[x0, x1, x2, ..., xk] the kth divided difference of the function f relative to the points x0, x1, ..., xk; we will justify the name "divided difference" in the discussion below. Now we can use the values of a0, ..., ak in the (k + 2)th equation to compute ak+1 and so on. Thus we can compute all the values a0, a1, ..., an and hence P(x). Since this P(x) is of degree at most n and interpolates the data at the n + 1 distinct points x0, x1, x2, ..., xn, we infer by Theorem 2.1 (uniqueness of the interpolating polynomial) that P(x) is nothing but the Lagrange polynomial written in a different form. But then an = f[x0, x1, x2, ..., xn], which is the coefficient of x^n in the interpolating polynomial.
Now it is worth noting that even if we rearrange the order of the data points, the interpolating polynomial does not change, and hence an, the coefficient of x^n in the interpolating polynomial, remains the same. This fact is also justified by the expression on the R.H.S. of (2.21). Thus we can say that the nth divided difference f[x0, x1, x2, ..., xn] depends on the set of node points {x0, x1, x2, ..., xn} but is independent of their order.
In the following we aim to justify the name divided difference. It is easy to see from (2.20) that
    a1 = (f(x1) − f(x0))/(x1 − x0).
Here a1 is obtained by dividing the difference of functional values by the difference of the points, and hence it is known as Newton's first divided difference of f(x) relative to x0, x1, denoted by f[x0, x1]. In general, Newton's first divided difference of f(x) relative to xj, xk (j ≠ k) is defined as
    f[xj, xk] = (f(xj) − f(xk))/(xj − xk).   (2.22)
It is clear from the definition that f[xj, xk] is independent of the order of j and k and depends only on the set {j, k}. Using these expressions for a0 and a1, we can compute a2 from the first three equations of (2.20):
    a2 = f[x0, x1, x2] = f(x0)/((x0 − x1)(x0 − x2)) + f(x1)/((x1 − x0)(x1 − x2)) + f(x2)/((x2 − x0)(x2 − x1)).   (2.23)
Clearly, f[x0, x1, x2] is independent of the order of x0, x1, x2. In fact, one can write f[x0, x1, x2] as the quotient of a difference of certain first divided differences by a difference of certain points, as follows.
Exercise 2.1. Show that
    f[x0, x1, x2] = (f[x1, x2] − f[x0, x1])/(x2 − x0) = (f[x1, x2] − f[x2, x0])/(x1 − x0) = (f[x0, x2] − f[x0, x1])/(x2 − x1).   (2.24)
Hence f[x0, x1, x2] is known as Newton's second divided difference of f(x) relative to {x0, x1, x2}. In general we define Newton's second divided difference of f(x) relative to {xi, xj, xk} as
    f[xi, xj, xk] = (f[xj, xk] − f[xi, xj])/(xk − xi).   (2.25)
Moreover, we can proceed in a similar way and see that f[x0, x1, x2, ..., xk−1, xk], Newton's kth divided difference of f(x) relative to {x0, x1, ..., xk−1, xk}, can be written as the quotient of a difference of certain (k − 1)th divided differences by a difference of certain point values, as follows:
    f[x0, x1, x2, ..., xk−1, xk] = (f[x1, x2, ..., xk−1, xk] − f[x0, x1, x2, ..., xk−1])/(xk − x0).   (2.26)
It can be checked that
    f[x0, x1, x2, ..., xk−1, xk] = Σ_{i=0}^{k} f(xi) / ∏_{j=0, j≠i}^{k} (xi − xj),   k = 3, ..., n.   (2.27)
2 INTERPOLATION BY POLYNOMIALS 12
In general we define Newton's kth divided difference of f(x) relative to {xi0, xi1, ..., xik−1, xik} recursively as follows:
    f[xi0, xi1, xi2, ..., xik−1, xik] = (f[xi1, xi2, ..., xik−1, xik] − f[xi0, xi1, xi2, ..., xik−1])/(xik − xi0).   (2.28)
Now we can rewrite Newton's interpolating polynomial by plugging the expressions for the ak's as Newton's divided differences into (2.19).
Thus to find the polynomial P(x) completely, we need to find f[x0, x1], f[x0, x1, x2], ..., f[x0, x1, ..., xn], that is, the divided differences of f of all orders from 1 to n. One can directly calculate f[x0, x1], but to find f[x0, x1, x2] we also need the first divided difference of f relative to the points x1, x2, that is, f[x1, x2]. Further, to find f[x0, x1, x2, x3] we need f[x1, x2, x3], which requires the extra computation of f[x2, x3]. Proceeding in this way, we need to compute the following table.
First we form the divided difference table for the above data.
    x    f      1st                   2nd                  3rd                 4th
    0    3      (3−3)/(1−0) = 0       (12−0)/(3−0) = 4     (8−4)/(4−0) = 1     (1−1)/(5−0) = 0
    1    3      (27−3)/(3−1) = 12     (36−12)/(4−1) = 8    (12−8)/(5−1) = 1
    3    27     (63−27)/(4−3) = 36    (60−36)/(5−3) = 12
    4    63     (123−63)/(5−4) = 60
    5    123                                                                      (2.30)
Now we substitute the corresponding values of the divided differences into the formula for Newton's interpolating polynomial of degree four; the computation is sketched below.
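The whole procedure — building the divided difference table and evaluating Newton's polynomial — fits in a few lines of Python (the function names are ours):

    def newton_coefficients(x, f):
        """Top edge of the divided difference table:
        f[x0], f[x0,x1], ..., f[x0,...,xn]."""
        a = list(f)
        for k in range(1, len(x)):
            # Update from the bottom so lower-order differences stay available.
            for i in range(len(x) - 1, k - 1, -1):
                a[i] = (a[i] - a[i - 1]) / (x[i] - x[i - k])
        return a

    def newton_eval(x, a, t):
        """Evaluate P(t) = a0 + a1(t - x0) + ... by nested multiplication."""
        p = a[-1]
        for i in range(len(a) - 2, -1, -1):
            p = p * (t - x[i]) + a[i]
        return p

    nodes = [0, 1, 3, 4, 5]
    values = [3, 3, 27, 63, 123]
    a = newton_coefficients(nodes, values)
    print(a)                         # [3, 0.0, 4.0, 1.0, 0.0], the top edge of table (2.30)
    print(newton_eval(nodes, a, 2))  # 9.0, since here P(x) = x^3 - x + 3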
2.4. Divided difference at a variable point (functional value in terms of divided differences)
In the following we find the divided difference of the function at an arbitrary point x of the interval, and we will see that the functional value at a given point can be expressed in terms of divided differences at that point. We have
    f[x, x0] = (f(x0) − f(x))/(x0 − x)
    ⟹ f(x) = f(x0) + (x − x0) f[x, x0].   (2.31)
Further,
    f[x, x0, x1] = (f[x0, x1] − f[x, x0])/(x1 − x)
    ⟹ f[x, x0] = f[x0, x1] + (x − x1) f[x, x0, x1].   (2.32)
Substituting (2.32) in (2.31), we get
f (x) = f (x0 ) + (x − x0 )f [x0 , x1 ] + (x − x0 )(x − x1 )f [x, x0 , x1 ]. (2.33)
Continuing this procedure up to the nth divided difference, we get inductively that
f (x) = f (x0 ) + (x − x0 )f [x0 , x1 ] + (x − x0 )(x − x1 )f [x0 , x1 , x2 ] + . . .
+(x − x0 )(x − x1 ) . . . (x − xn−1 )f [x, x0 , x1 , . . . , xn−1 ]. (2.34)
But
    f[x, x0, x1, ..., xn] = (f[x0, x1, x2, ..., xn] − f[x, x0, x1, ..., xn−1])/(xn − x)
    ⟹ f[x, x0, x1, ..., xn−1] = f[x0, x1, x2, ..., xn] + (x − xn) f[x, x0, x1, ..., xn].
Thus from (2.34), we get
f (x) = f (x0 ) + (x − x0 )f [x0 , x1 ] + (x − x0 )(x − x1 )f [x0 , x1 , x2 ]
+ . . . + (x − x0 )(x − x1 ) . . . (x − xn−1 )f [x0 , x1 , x2 , . . . , xn ]
+(x − x0 )(x − x1 ) . . . (x − xn )f [x, x0 , x1 , . . . , xn ]. (2.35)
Using the expression for Newton’s polynomial form (2.29), we get from (2.35)
f (x) = P (x) + (x − x0 )(x − x1 ) . . . (x − xn )f [x, x0 , x1 , . . . , xn ]. (2.36)
2.5. Properties of divided difference
As we saw, the first divided difference relative to the points x0, x1 is defined as f[x0, x1] = (f(x1) − f(x0))/(x1 − x0). This expression reminds us of the mean value theorem: if we also assume differentiability of the function, our first divided difference f[x0, x1] is nothing but the derivative of the function f at some point in the smallest open interval containing both x0 and x1.
Question 2.4. Can we say that second divided difference is related to second derivative of the
function?
If yes, then how?
If we assume that the function is twice differentiable and x0 < x1 < x2, then we can write
    f[x0, x1, x2] = (f[x1, x2] − f[x0, x1])/(x2 − x0) = (f'(ξ1) − f'(ξ0))/(x2 − x0) = ((ξ1 − ξ0)/(x2 − x0)) · (f'(ξ1) − f'(ξ0))/(ξ1 − ξ0) = ((ξ1 − ξ0)/(x2 − x0)) f''(ζ),   (2.37)
where ξ0 ∈ (x0, x1), ξ1 ∈ (x1, x2) and ζ ∈ (ξ0, ξ1).
Observation 2.2. Here we note that if ξ0, ξ1 are the middle points of the intervals (x0, x1) and (x1, x2) respectively, then f[x0, x1, x2] = (1/2) f''(ζ).
We will return to this and prove a relation between divided differences and derivatives of the function in Remark 2.12 of Theorem 2.4. Now we look at some other properties of divided differences.
Using Exercise 2.1, one can see that Newton's second divided difference is independent of the order of the node points and depends only on the set of all node points. If we consider the function ω(x) = ∏_{j=0}^{n} (x − xj), then we can see that
    f[x0, x1, x2, ..., xn] = Σ_{i=0}^{n} f(xi)/ω'(xi).   (2.38)
Further, if we assume that f is once differentiable on the whole interval (a, b), then it is clear by the mean value theorem that f[x0, x] = (f(x) − f(x0))/(x − x0) = f'(x̃) for some x̃ ∈ (min{x0, x}, max{x0, x}). Moreover,
    (d/dx) f[x0, x1, ..., xn, x] = lim_{h→0} (f[x0, x1, ..., xn, x + h] − f[x0, x1, ..., xn, x])/h
        = lim_{h→0} (f[x0, x1, ..., xn, x + h] − f[x, x0, x1, ..., xn])/((x + h) − x)
        = lim_{h→0} f[x, x0, x1, ..., xn, x + h]
        = f[x, x0, x1, ..., xn, x]
        = f[x0, x1, ..., xn, x, x].   (2.40)
Remark 2.11. We note that Newton's divided difference interpolating polynomial Pn(x), given by (2.29), interpolates the function f(x) at n + 1 distinct points and has degree at most n; but Lagrange's polynomial L(x), given by (2.11), also interpolates the same data, so by Theorem 2.1 both L(x) and Pn(x) are the same polynomial written in two different ways. Therefore, for a given continuous function f, the error function defined by (2.44) depends only on the set of node points. From the expression (2.36) it is clear that on interpolating the function f(x) by Newton's divided difference interpolating polynomial Pn(x), the error En f(x) can be expressed as
    En f(x) = (x − x0)(x − x1) ... (x − xn) f^(n+1)(ξ)/(n + 1)!.   (2.46)
It is clear that φ(xi) = 0 for all i = 0, 1, 2, ..., n. Thus the function φ has n + 2 zeros, namely {x, x0, x1, ..., xn}. Hence its (n + 1)th derivative has at least one zero at some point ξ in the interval (min{x, x0, x1, ..., xn}, max{x, x0, x1, ..., xn}), that is, φ^(n+1)(ξ) = 0. Therefore,
    En f(x) = f(x) − Pn(x) = (x − x0)(x − x1) ... (x − xn) f^(n+1)(ξ)/(n + 1)!.   (2.50)
If we denote
    Mn+1 = max_{x∈[a,b]} |f^(n+1)(x)|,   (2.53)
and if we somehow know a bound for |(x − x0)(x − x1) ... (x − xn)| on [a, b], or the value max_{x∈[a,b]} |(x − x0)(x − x1) ... (x − xn)|, then we can find a uniform bound for the error function. Certainly the maximum and minimum are attained by the continuous function ω(x) = (x − x0)(x − x1) ... (x − xn) at some points of the closed interval [a, b], and we can bound |En f(x)| by (Mn+1/(n + 1)!) max{|minimum|, |maximum|}. Since points of maximum or minimum are also points of local maximum or minimum, if we know all the points of local minima and maxima, that is, the critical points, we only need to compute the modulus of the functional values at these critical points and take the maximum of those values. But the critical points of a differentiable function are given by the roots of its derivative; in our case they are given by the solutions of the equation ω'(x) = 0. Finding these roots might not be an easy task if there is a large number of node points.
Problem 2.5. Find a bound on the error in interpolating the degree-five polynomial x^5 − x^3 + 1 by a Lagrange interpolating polynomial of degree two at the points 0, 2, 3.
Remark 2.13. Thus we see in the above problem that the error in interpolation might be significantly large. But suppose we have the freedom to choose the positions of the node points (not their number); can we then control the error?
Question 2.6. More precisely, suppose we are given a smooth function f which we aim to interpolate at n + 1 node points, and suppose the choice of node points is in our hands. Can we choose these n + 1 nodes in such a way that the maximum error on the smallest interval containing all the nodes is controlled?
Suppose the node points are equally spaced, that is, a = x0 < x1 = x0 + h < x2 = x0 + 2h < ... < xn = x0 + nh = b, and suppose we know that |f^(n+1)(x)| ≤ Mn+1. Then it is clear from (2.54) that the error function is bounded as follows:
    |En f(x)| ≤ (Mn+1/(n + 1)!) |(x − x0)(x − x0 − h) ... (x − x0 − nh)|.   (2.55)
Clearly ω(x) = (x − x0 )(x − x0 − h) . . . (x − x0 − nh) is independent of the function. Here we
will try to find a bound for ω(x) = (x − x0 )(x − x0 − h) . . . (x − x0 − nh) on the smallest interval
containing all nodes, that is, [x0 , x0 + nh].
We will find the bound for ω(x) in three different cases of linear (n = 1), quadratic (n = 2),
and cubic (n = 3) interpolation.
In the case of linear interpolation it is easy to observe that the maximum value of |(x − x0)(x − x1)| is attained at x = (x0 + x1)/2 and is given by (x1 − x0)^2/4. Thus the bound for the error in interpolation by a linear polynomial (a line) is M2 (x1 − x0)^2/8, that is,
    |E1 f(x)| ≤ M2 (x1 − x0)^2/8.   (2.56)
In the case of quadratic interpolation we need to find the maximum value of |ω(x)| = |(x − x0)(x − x1)(x − x2)|. This maximum is attained at one of the roots of the quadratic polynomial ω'(x), which certainly has two real roots. In the case of equidistant points we need the maximum of |ω(x)| = |(x − x0)(x − x0 − h)(x − x0 − 2h)| on the interval [x0, x0 + 2h]. This extremum is attained at some critical point given by ω'(x) = 0, that is, (x − x0 − h)(x − x0 − 2h) + (x − x0)(x − x0 − 2h) + (x − x0)(x − x0 − h) = 0. For simplification we reparameterize the curve ω(x) with the origin at the middle point of the interval [x0, x0 + 2h] by setting x − x0 = (t + 1)h; then x ∈ [x0, x0 + 2h] ⇔ t ∈ [−1, 1], and we need to solve t(t − 1) + (t + 1)(t − 1) + (t + 1)t = 0, that is, 3t^2 − 1 = 0. Thus t = ±1/√3, or x = x0 + h ± h/√3. At both of these values of x we have |ω(x)| = 2h^3/√27. Thus from (2.55) it is clear that
    |E2 f(x)| ≤ (M3/3!) · 2h^3/√27.   (2.57)
Now we can make E2 f smaller than any given positive number by choosing the step size h accordingly small.
Similarly, in the cubic case, if we know M4, then to bound E3 f we need to control |ω(x)| = |(x − x0)(x − x0 − h)(x − x0 − 2h)(x − x0 − 3h)|, and hence to solve ω'(x) = 0. For simplification we reparameterize the curve ω(x) with the origin at the center of the interval [x0, x0 + 3h]. Let x − x0 = (t + 3/2)h; then x ∈ [x0, x0 + 3h] ⇔ t ∈ [−3/2, 3/2], and we need to solve d/dt[(t^2 − 9/4)(t^2 − 1/4)] = 0, which gives t = 0, ±√5/2. The maximum value of |ω(x)| is h^4, attained at t = ±√5/2. Hence
    |E3 f(x)| ≤ (M4/4!) h^4.   (2.58)
Problem 2.6. Determine the maximum step size that can be used in the tabulation of f(x) = e^x on [0, 1] so that the error in cubic interpolation will be less than 5 × 10^−4.
Solution. Since we want to approximate f by a degree-three polynomial, we need to bound the error function E3 f using (2.58). For this we need M4, the maximum of the fourth derivative of f(x) = e^x on the interval [0, 1]. But the fourth derivative of e^x is e^x, so M4 = e. Since we want the error to be bounded by 5 × 10^−4, we need a step size h such that
    |E3 f(x)| ≤ (M4/4!) h^4 = (e/24) h^4 ≤ 5 × 10^−4,
that is,
    h^4 ≤ (5 × 24 × 10^−4)/e = 0.004414553294,
or
    h ≤ (0.004414553294)^{0.25} = 0.25776366.
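This arithmetic is easy to confirm in Python (a short sketch under the same assumptions):

    import math

    M4 = math.e                       # max of the 4th derivative of e^x on [0, 1]
    eps = 5e-4
    h_max = (24 * eps / M4) ** 0.25   # from (M4 / 4!) h^4 <= eps
    print(h_max)                      # approximately 0.25776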
Observation 2.3. Suppose we approximate the function f(x) = x^3 − x^2 + 2 at the nodes −1, 0, 1. The degree-two polynomial which satisfies f(−1) = 0, f(0) = 2, f(1) = 2 is given by P(x) = −x^2 + x + 2. If we draw the graphs of these functions, we observe that the graphs intersect each other at x = 1 but move apart rapidly as soon as they leave x = 1, and this is because they intersect perpendicularly at x = 1 (P'(1) = −1, f'(1) = 1). So it is desirable to approximate the function by a polynomial which not only agrees with the function at the nodes but whose derivative also agrees with the derivative of the function at the node points.
Solution. Since there are four conditions, it is reasonable to consider a degree-three polynomial P(x) = a0 + a1 x + a2 x^2 + a3 x^3, with four unknown coefficients (a0, a1, a2, a3) to be determined by the given data (2.59). Since P(x) satisfies (2.59), we have
    a0(1) + a1 x0 + a2 x0^2 + a3 x0^3 = f(x0)
    a0(0) + a1(1) + a2 (2x0) + a3 (3x0^2) = f'(x0)
    a0(1) + a1 x1 + a2 x1^2 + a3 x1^3 = f(x1)
    a0(0) + a1(1) + a2 (2x1) + a3 (3x1^2) = f'(x1)
Or in matrix form,
    [ 1  x0  x0^2  x0^3  ] [ a0 ]   [ f(x0)  ]
    [ 0  1   2x0   3x0^2 ] [ a1 ]   [ f'(x0) ]
    [ 1  x1  x1^2  x1^3  ] [ a2 ] = [ f(x1)  ]   (2.60)
    [ 0  1   2x1   3x1^2 ] [ a3 ]   [ f'(x1) ]
One can check that the determinant of the coefficient matrix equals (x1 − x0)^4 ≠ 0, so the above system of four equations has a unique solution. But finding the solution is a tedious task. (So we leave it to the very enthusiastic reader.)
Question 2.7. Can we find the above polynomial in terms of Lagrange's fundamental polynomials? Can we find values ai, bi, ci, di, i = 0, 1, 2, ..., n, so that the polynomial
    P(x) = Σ_{i=0}^{n} [ (ai x + bi) Li^2(x) f(xi) + (ci x + di) Li^2(x) f'(xi) ],   (2.61)
Clearly (2.67), (2.69), (2.71), (2.73) are satisfied because Li(xj) = 0 for i ≠ j. To satisfy the other conditions, we need
    (ai xi + bi) = 1   (2.74)
    ai + (ai xi + bi) 2L'i(xi) = 0   (2.75)
    (ci xi + di) = 0   (2.76)
    ci + (ci xi + di) 2L'i(xi) = 1   (2.77)
This polynomial is called the Hermite interpolating polynomial. Here we can write L'i(xi) in terms of ω(x) as follows:
    L'i(xi) = ω''(xi)/(2ω'(xi)).   (2.83)
Remark 2.14. In Hermite interpolation there are n + 1 nodes, and the interpolating polynomial should satisfy 2(n + 1) equations. This suggests that our P(x) should be of degree at most 2n + 1, and clearly from (2.82), P(x) satisfies this minimal degree criterion.
Theorem 2.6. (Uniqueness of the Hermite Interpolating Polynomial) If we know the values of a real valued function f and of its derivative f' at n + 1 distinct points x0 < x1 < ... < xn, then there exists exactly one polynomial of degree at most 2n + 1 which satisfies the data P(xi) = f(xi) and P'(xi) = f'(xi) for all i = 0, 1, 2, ..., n.
Proof. Clearly, the existence of such a polynomial is given by (2.82). We only need to prove the
uniqueness.
Suppose there is some polynomial Q(x) ≠ P(x) of degree at most 2n + 1 such that
Then φ(x) = P(x) − Q(x) ≠ 0 is a nonzero polynomial of degree at most 2n + 1 such that φ(xi) = 0 and φ'(xi) = 0. But φ(xi) = 0 for all i = 0, 1, 2, ..., n implies that φ'(ξi) = 0 for some ξi ∈ (xi−1, xi), for all i = 1, 2, ..., n. Thus we have found n distinct zeros of φ' other than the xi's. This shows that φ' has 2n + 1 distinct zeros, which contradicts the fact that φ'(x) is a nonzero polynomial of degree at most 2n (see Corollary 2.3). We note that φ'(x) cannot be the zero polynomial, because φ(x) cannot be a nonzero constant polynomial, as φ(xi) = 0.
We can also compute the error in Hermite interpolation.
Theorem 2.7. Let f be a (2n + 2)-times differentiable real valued function defined on the interval [a, b]. Let P2n+1 be the unique Hermite interpolating polynomial of degree at most 2n + 1, satisfying
    P2n+1(xi) = f(xi),  P'2n+1(xi) = f'(xi),  i = 0, 1, 2, ..., n.   (2.84)
Then
    E2n+1 f(x) = (x − x0)^2 (x − x1)^2 ... (x − xn)^2 f^(2n+2)(ξ)/(2n + 2)! = ω^2(x) f^(2n+2)(ξ)/(2n + 2)!,   (2.85)
where ξ is some point in the smallest interval I containing all the node points and depends upon x. The error bound on I is given by
    |E2n+1 f(x)| ≤ max_{x∈I} |ω^2(x)| · M2n+2/(2n + 2)!.   (2.86)
Proof. Let x be some point in the interval other than the node points, and let φ(t) be the function defined on the interval [a, b] by
    E2n+1 f(x) = f(x) − P2n+1(x) = (x − x0)^2 (x − x1)^2 ... (x − xn)^2 f^(2n+2)(ξ)/(2n + 2)!.   (2.90)
    P3(x) = Σ_{i=0}^{1} [1 − 2(x − xi) L'i(xi)] Li^2(x) f(xi) + Σ_{i=0}^{1} (x − xi) Li^2(x) f'(xi)
          = [1 − 2(x − 0) L'0(0)] L0^2(x) f(0) + [1 − 2(x − 1) L'1(1)] L1^2(x) f(1)
            + (x − 0) L0^2(x) f'(0) + (x − 1) L1^2(x) f'(1)
          = [1 − 2x(−1)] (1 − x)^2 · 2 + [1 − 2(x − 1) · 1] x^2 · 2
            + x (1 − x)^2 · (0) + (x − 1) x^2 · (1)
          = 2(1 + 2x)(x − 1)^2 + 2(3 − 2x) x^2 + (x − 1) x^2
          = x^3 − x^2 + 2.
We consider Problem (2.59) further, and assume the polynomial of degree three in the following form:
    P(x) = a0 + a1(x − x0) + a2(x − x0)^2 + a3(x − x0)^2 (x − x1).   (2.91)
If this polynomial satisfies the data P(x0) = f(x0), P'(x0) = f'(x0), P(x1) = f(x1), P'(x1) = f'(x1), then we must have
    a0 = f(x0)
    a1 = f'(x0)
    a0 + a1(x1 − x0) + a2(x1 − x0)^2 = f(x1)
    a1 + 2a2(x1 − x0) + a3(x1 − x0)^2 = f'(x1)
If we assume that the function is smooth enough (as many times differentiable as we want), then f'(x0) can be written as f[x0, x0], and f'(x1) = f[x1, x1]. Hence we have f(x0) + f[x0, x0](x1 − x0) + a2(x1 − x0)^2 = f(x1), or f[x0, x0] + a2(x1 − x0) = f[x0, x1], or a2 = f[x0, x0, x1]. Further, from the last equation we have a1 + a2(x1 − x0) + a2(x1 − x0) + a3(x1 − x0)^2 = f[x1, x1], or f[x0, x1] + a2(x1 − x0) + a3(x1 − x0)^2 = f[x1, x1], or f[x0, x0, x1](x1 − x0) + a3(x1 − x0)^2 = f[x1, x1] − f[x0, x1], or f[x0, x0, x1] + a3(x1 − x0) = f[x0, x1, x1], or a3 = f[x0, x0, x1, x1]. Thus
Remark 2.15. In general, if there are n + 1 nodes, one can compute Newton's divided difference table with each node considered twice, that is, with 2n + 2 nodes in total. To compute the first divided difference relative to the repeated points xi, xi we directly use the given data, because f[xi, xi] = f'(xi).
Example 2.9. We will use Newton's extended divided difference table to find a polynomial satisfying the following data, in which at each node we need to match derivatives up to a certain order (a computational sketch follows below):
P(0) = 1, P'(0) = 2, P''(0) = −2, P(1) = 3, P'(1) = 5, P''(1) = 18, P'''(1) = 60.
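One possible way to organize such an extended table in code is to repeat each node once per known derivative and to use f[x, ..., x] ((j + 1) copies) = f^(j)(x)/j! whenever the recursion (2.28) would divide by zero. Here is a hedged Python sketch for the data of Example 2.9 (the data layout and names are ours):

    from math import factorial

    # node -> [P(node), P'(node), P''(node), ...]
    data = {0.0: [1, 2, -2], 1.0: [3, 5, 18, 60]}

    # Repeat each node once per known derivative value.
    z = [x for x, ds in data.items() for _ in ds]
    n = len(z)

    # dd[i][j] = f[z_i, ..., z_{i+j}]
    dd = [[0.0] * n for _ in range(n)]
    for i in range(n):
        dd[i][0] = data[z[i]][0]
    for j in range(1, n):
        for i in range(n - j):
            if z[i] == z[i + j]:
                # All nodes in the block coincide: f[x,...,x] = f^(j)(x) / j!
                dd[i][j] = data[z[i]][j] / factorial(j)
            else:
                dd[i][j] = (dd[i + 1][j - 1] - dd[i][j - 1]) / (z[i + j] - z[i])

    coeffs = [dd[0][j] for j in range(n)]   # Newton coefficients on the nodes z
    print(coeffs)   # [1, 2.0, -1.0, 1.0, 2.0, 1.0, 0.0]

The resulting coefficients go into the Newton form P(x) = c0 + c1(x − z0) + c2(x − z0)(x − z1) + ..., exactly as in (2.19) but with repeated nodes.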
Remark 2.16. One cannot use Newton's extended divided difference table if the value of some higher order derivative is known at a node but the value of a lower order derivative is not. For instance, in this way we cannot find a polynomial of degree three such that P(0) = 1, P''(0) = 2, P''(1) = 0, P'''(1) = 1.
2.10. Piecewise Linear Interpolation
In practice it has been observed that approximating a given function by linear pieces is often better than using one polynomial of high degree. Let f be a nice function with f(x0), f(x1), ..., f(xn) known at n + 1 distinct points x0, x1, ..., xn. We aim to find n lines si, i = 1, 2, ..., n, such that each si interpolates the function at the end points of the interval [xi−1, xi], that is,
    si(xi−1) = f(xi−1),  si(xi) = f(xi),  i = 1, 2, ..., n.   (2.93)
Using Lagrange's linear interpolating polynomial we get
    si(x) = f(xi−1) (x − xi)/(xi−1 − xi) + f(xi) (x − xi−1)/(xi − xi−1),  x ∈ [xi−1, xi],  i = 1, 2, ..., n,   (2.94)
and si(x) = 0 for x outside [xi−1, xi]. Thus P1(x) = Σ_{i=1}^{n} si(x) is the desired piecewise linear interpolating polynomial satisfying the given data. If we write the shape function as
    Ni(x) = { 0,                        x ≤ xi−1
            { (x − xi−1)/(xi − xi−1),   xi−1 ≤ x ≤ xi
            { (x − xi+1)/(xi − xi+1),   xi ≤ x ≤ xi+1
            { 0,                        x ≥ xi+1,   (2.95)
then P1(x) = Σ_{i=1}^{n} Ni(x) f(xi). The error in piecewise linear interpolation is given by
    E1 f(x) = (1/2!) (x − xi−1)(x − xi) f''(ξ),  ξ ∈ [xi−1, xi].
Remark 2.17. Here it is important to note that on each subinterval the expression for the error function is different. So to find a uniform error bound we need to take the maximum of the error bounds on the different subintervals. Thus
    |Ef(x)| ≤ (M2/2) max_i (|xi − xi−1|^2/4) = (M2/8) max_i |xi − xi−1|^2.   (2.96)
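A piecewise linear interpolant is also simple to code; the following Python sketch (function name ours) evaluates (2.94) after locating the correct subinterval:

    import numpy as np

    def piecewise_linear(nodes, values, t):
        """Evaluate the piecewise linear interpolant (2.94) at t."""
        i = np.searchsorted(nodes, t)          # find the subinterval [x_{i-1}, x_i]
        i = min(max(i, 1), len(nodes) - 1)     # clamp to the tabulated range
        x0, x1 = nodes[i - 1], nodes[i]
        f0, f1 = values[i - 1], values[i]
        return f0 * (t - x1) / (x0 - x1) + f1 * (t - x0) / (x1 - x0)

    nodes = np.array([0.0, 0.5, 1.0])
    values = np.sin(nodes)
    print(piecewise_linear(nodes, values, 0.25))   # same as np.interp(0.25, nodes, values)

(NumPy's built-in np.interp does the same job; the explicit version shows formula (2.94) at work.)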
Here we aim to find n cubic polynomials si, i = 1, 2, ..., n, such that each si interpolates the function f at the node points xi−1, xi of the interval [xi−1, xi], and such that the spline s, obtained by adjoining the cubic pieces at the node points, is not only continuous on the interval (x0, xn) but s' and s'' are also continuous on this interval. These conditions can also be stated mathematically as follows.
These are 4n − 2 equations in total, but to obtain the polynomials uniquely we need two more conditions, known as boundary conditions. In practice there are two types of boundary conditions. The free boundary conditions are given by
    s''1(x0) = s''n(xn) = 0,   (2.97)
and the clamped boundary conditions are given by
    s'1(x0) = f'(x0) and s'n(xn) = f'(xn).   (2.98)
At the node points we tie together two distinct polynomials, so these nodes are sometimes also called knots. Since f(xi) is given, in the first two conditions both the RHS and the LHS are fixed; but in the last two conditions the common value RHS = LHS is to be determined so that the si's satisfy all the conditions simultaneously.
Suppose for notational simplification we consider new variables
m0 = s01 (x0 ), M0 = s001 (x0 ) (2.99)
mi = s0i (xi ) = s0i+1 (xi ), i = 1, 2, ..., n − 1, (2.100)
Mi = s00i (xi ) = s00i+1 (xi ), i = 1, 2, ..., n − 1. (2.101)
mn = s0n (xn ), Mn = s00n (xn ) (2.102)
Thus all four conditions will be automatically satisfied if we demand that each cubic polynomial si satisfy the following conditions.
si (xi−1 ) = f (xi−1 )
si (xi ) = f (xi )
s0i (xi−1 ) = mi−1
s0i (xi ) = mi
s00i (xi−1 ) = Mi−1
s00i (xi ) = Mi
Since on the interval [xi−1, xi] we want a cubic polynomial si, we want s''i to be a linear polynomial satisfying the last two conditions. Using Lagrange's formula we get
    s''i(x) = Mi−1 (x − xi)/(xi−1 − xi) + Mi (x − xi−1)/(xi − xi−1).
Now if we write hi = xi − xi−1 , then
    si(x) = Mi−1 (x − xi)^3/(−6hi) + Mi (x − xi−1)^3/(6hi) + ai x + bi.
And since si(xi−1) = f(xi−1) and si(xi) = f(xi), we must have
    f(xi−1) = hi^2 Mi−1/6 + ai xi−1 + bi  and  f(xi) = hi^2 Mi/6 + ai xi + bi.
Solving these equations we get
    ai = (f(xi) − f(xi−1))/hi − hi (Mi − Mi−1)/6,
    bi = (xi f(xi−1) − xi−1 f(xi))/hi − hi (xi Mi−1 − xi−1 Mi)/6.
Substituting these values of ai, bi into the expression for si, we get
    si(x) = Mi−1 (x − xi)^3/(−6hi) + Mi (x − xi−1)^3/(6hi) + [(f(xi) − f(xi−1))/hi − hi (Mi − Mi−1)/6] x
            + (xi f(xi−1) − xi−1 f(xi))/hi − hi (xi Mi−1 − xi−1 Mi)/6.   (2.103)
    hi Mi−1 + 2(hi + hi+1) Mi + hi+1 Mi+1 = 6 f[xi, xi+1] − 6 f[xi−1, xi].   (2.105)
By the assumption of the clamped boundary conditions f'(x0) = s'1(x0) and f'(xn) = s'n(xn), we get two more equations. Now using (2.105) and these two equations, we can write the system of linear equations in matrix form (blank entries are zero):
    [ 2h1  h1                                 ] [ M0  ]       [ f[x0,x1] − f[x0,x0]       ]
    [ h1   2(h1+h2)  h2                       ] [ M1  ]       [ f[x1,x2] − f[x0,x1]       ]
    [      h2        2(h2+h3)  h3             ] [ ... ]       [ ...                       ]
    [           ...      ...      ...         ] [ Mi  ]  =  6 [ f[xi,xi+1] − f[xi−1,xi]   ]   (2.106)
    [        hn−1  2(hn−1+hn)  hn             ] [ ... ]       [ f[xn−1,xn] − f[xn−2,xn−1] ]
    [                          hn   2hn       ] [ Mn  ]       [ f[xn,xn] − f[xn−1,xn]     ]
Remark 2.18. This matrix equation (corresponding to the clamped boundary conditions) takes a simple form when the step size is fixed, that is, hi = h:

    [ 2  1              ] [ M0  ]          [ f[x0,x1] − f[x0,x0]       ]
    [ 1  4  1           ] [ M1  ]          [ f[x1,x2] − f[x0,x1]       ]
    [    1  4  1        ] [ ... ]          [ ...                       ]
    [       ...  ...    ] [ Mi  ] = (6/h)  [ f[xi,xi+1] − f[xi−1,xi]   ]   (2.107)
    [         1  4  1   ] [ ... ]          [ f[xn−1,xn] − f[xn−2,xn−1] ]
    [            1  2   ] [ Mn  ]          [ f[xn,xn] − f[xn−1,xn]     ]
Remark 2.19. Further, for the free boundary conditions M0 = 0, Mn = 0, we replace the first and last rows of the matrix by these two conditions and get

    [ h1                                      ] [ M0  ]       [ 0                          ]
    [ h1   2(h1+h2)  h2                       ] [ M1  ]       [ f[x1,x2] − f[x0,x1]        ]
    [      h2        2(h2+h3)  h3             ] [ ... ]       [ ...                        ]
    [           ...      ...      ...         ] [ Mi  ]  =  6 [ f[xi,xi+1] − f[xi−1,xi]    ]   (2.108)
    [        hn−1  2(hn−1+hn)  hn             ] [ ... ]       [ f[xn−1,xn] − f[xn−2,xn−1]  ]
    [                          hn             ] [ Mn  ]       [ 0                          ]
This equation also takes a simple form when the step size is fixed:

    [ 1                 ] [ M0  ]          [ 0                          ]
    [ 1  4  1           ] [ M1  ]          [ f[x1,x2] − f[x0,x1]        ]
    [    1  4  1        ] [ ... ]          [ ...                        ]
    [       ...  ...    ] [ Mi  ] = (6/h)  [ f[xi,xi+1] − f[xi−1,xi]    ]   (2.109)
    [         1  4  1   ] [ ... ]          [ f[xn−1,xn] − f[xn−2,xn−1]  ]
    [               1   ] [ Mn  ]          [ 0                          ]
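For the free boundary conditions with fixed step size, the system (2.109) is tridiagonal and can be assembled and solved directly. An illustrative Python sketch (sample data and names are ours):

    import numpy as np

    x = np.linspace(0.0, 1.0, 5)    # equally spaced nodes, fixed step h
    f = np.exp(x)                   # any table of functional values works here
    n = len(x) - 1
    h = x[1] - x[0]

    A = np.zeros((n + 1, n + 1))
    rhs = np.zeros(n + 1)
    A[0, 0] = A[n, n] = 1.0         # first/last rows encode M_0 = M_n = 0
    for i in range(1, n):
        A[i, i - 1:i + 2] = [1.0, 4.0, 1.0]
        # (6/h) * (f[x_i, x_{i+1}] - f[x_{i-1}, x_i])
        rhs[i] = (6.0 / h) * ((f[i + 1] - f[i]) / h - (f[i] - f[i - 1]) / h)

    M = np.linalg.solve(A, rhs)     # second derivatives at the nodes
    print(M)                        # substitute into (2.103) to evaluate the spline

In practice one would solve the tridiagonal system by a specialized tridiagonal (Thomas) solver instead of a general one, but the general solver keeps the sketch short.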
Remark 2.20. In all the above matrix equations the first and last rows of the matrix appear because of the extra boundary conditions at x0 and xn respectively, which might be either clamped or free boundary conditions. One can also think of mixed boundary conditions, like a free boundary at x0, that is, f''(x0) = M0 = 0, and a clamped boundary at xn, that is, s'(xn) = f'(xn).
This shows that we can also approximate f' using a cubic spline, knowing only some functional values at the nodes and M4.
3 Numerical integration
Our aim in this chapter is to find the approximate value of a definite integral, especially in cases where the value of the integrand is known only at certain points, or where the integral of the integrand is not known in terms of standard functions.
Question 3.1. Can we use interpolating polynomials of some function f to find the approximate integral of f on [a, b], that is, ∫_a^b f(x) dx? Is this approximation a good one?
If P(x) is an interpolating polynomial for f(x) such that the error is as small as some given positive number ε, that is, |E(x)| = |f(x) − P(x)| ≤ ε, then
    |∫_a^b f(x) dx − ∫_a^b P(x) dx| = |∫_a^b (f(x) − P(x)) dx|
        ≤ ∫_a^b |f(x) − P(x)| dx
        ≤ ∫_a^b ε dx
        = ε (b − a).   (3.1)
Thus we can approximate the integral of f by the integral of interpolating polynomial with the
desirable accuracy.
3.1. Newton-Cotes Methods
In these methods we use an interpolating polynomial P(x) to approximate ∫_a^b f(x) dx by ∫_a^b P(x) dx, with the error in integration given by ∫_a^b [f(x) − P(x)] dx = ∫_a^b E(x) dx.
3.2. Trapezoidal Rule
Suppose f is known at the two end points of the interval x0 = a, x1 = b and P1 (x) is the linear
approximation to the function f at nodes x0 , x1 , then by Lagrange’s method we write
    P1(x) = f(x0) (x − x1)/(x0 − x1) + f(x1) (x − x0)/(x1 − x0).   (3.2)
Now,
    ∫_{x0}^{x1} P1(x) dx = ∫_{x0}^{x1} f(x0) (x − x1)/(x0 − x1) dx + ∫_{x0}^{x1} f(x1) (x − x0)/(x1 − x0) dx
                         = ((x1 − x0)/2) f(x0) + ((x1 − x0)/2) f(x1).
And if h1 = x1 − x0, then
    ∫_{x0}^{x1} P1(x) dx = (h1/2) [f(x0) + f(x1)].   (3.3)
Thus by the Trapezoidal rule the approximate value of ∫_a^b f(x) dx is given by (3.3). Further, from (2.46),
    f(x) − P1(x) = (x − x0)(x − x1) f''(ξx)/2.   (3.4)
And hence,
    |∫_{x0}^{x1} [f(x) − P1(x)] dx| ≤ ∫_{x0}^{x1} (1/2) |(x − x0)(x − x1)| |f''(ξx)| dx
        ≤ ∫_{x0}^{x1} (1/2) |x − x0| |x − x1| max_{x∈[x0,x1]} |f''(x)| dx
        = (1/2) max_{x∈[x0,x1]} |f''(x)| ∫_{x0}^{x1} (x − x0)(x1 − x) dx
        = (1/2) max_{x∈[x0,x1]} |f''(x)| · (x1 − x0)^3/6.   (3.5)
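Summing the rule (3.3) over subintervals gives the composite trapezoidal rule; a short Python sketch (function name ours):

    import numpy as np

    def trapezoid(f, a, b, n):
        """Composite trapezoidal rule: (3.3) summed over n subintervals."""
        x = np.linspace(a, b, n + 1)
        h = (b - a) / n
        y = f(x)
        return h * (y[0] / 2 + y[1:-1].sum() + y[-1] / 2)

    print(trapezoid(np.sin, 0.0, np.pi, 100))   # about 1.99984, exact value 2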
    |∫_{x0}^{x0+2h} E2(x) dx| ≤ ∫_{x0}^{x0+2h} |(x − x0)(x − x0 − h)(x − x0 − 2h)| |f'''(ξ)|/6 dx
        ≤ (M3/6) ∫_{x0}^{x0+2h} |(x − x0)(x − x0 − h)(x − x0 − 2h)| dx
        = (M3/6) [∫_{x0}^{x1} (x − x0)(x1 − x)(x2 − x) dx + ∫_{x1}^{x2} (x − x0)(x − x1)(x2 − x) dx]
        = (M3/6) [∫_{x0}^{x0+h} (x − x0)(x0 + h − x)(x0 + 2h − x) dx + ∫_{x0+h}^{x0+2h} (x − x0)(x − x0 − h)(x0 + 2h − x) dx].
By the substitution x − x0 = hu,
    |∫_{x0}^{x0+2h} E2(x) dx| ≤ (M3 h^4/6) [∫_0^1 u(1 − u)(2 − u) du + ∫_1^2 u(u − 1)(2 − u) du]
        = (M3 h^4/6) [(u^2 − u^3 + u^4/4)|_0^1 + (−u^4/4 + u^3 − u^2)|_1^2]
        = (M3 h^4/6) [(1 − 1 + 1/4) − 0 + (−4 + 8 − 4) − (−1/4 + 1 − 1)]
        = M3 h^4/12.   (3.14)
This shows that we can approximate ∫_{x0}^{x0+2h} f(x) dx by (h/3)[f(x0) + 4f(x0 + h) + f(x0 + 2h)] with a desired accuracy if h is small enough.
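In code, the rule on a single pair of subintervals reads (sketch; names ours):

    def simpson_13(f, x0, h):
        """Simpson's one-third rule on [x0, x0 + 2h]."""
        return h / 3.0 * (f(x0) + 4.0 * f(x0 + h) + f(x0 + 2.0 * h))

    # Exact for cubics, as Observation 3.1 below explains:
    g = lambda x: x**3 - x + 3
    print(simpson_13(g, 0.0, 0.5))   # 2.75, the exact value of the integral over [0, 1]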
Observation 3.1. Clearly, because there are three nodes, the approximation is exact for a degree-two polynomial. It can also be seen easily from (3.13) that if f is a degree-three polynomial, that is, f(x) = a0 + a1 x + a2 x^2 + a3 x^3, then
    ∫_{x0}^{x0+2h} E2(x) dx = ∫_{x0}^{x0+2h} (x − x0)(x − x0 − h)(x − x0 − 2h) f'''(ξ)/6 dx
        = a3 ∫_{x0}^{x0+2h} (x − x0)(x − x0 − h)(x − x0 − 2h) dx = 0.   (3.15)
Thus, because of the odd number of equidistant nodes, the approximation to the integral of a degree-three polynomial is also exact.
Since we are interpolating the function at the three node points x0, x0 + h, x0 + 2h, we have
    f(x) = f(x0) + f[x0, x0 + h](x − x0) + f[x0, x0 + h, x0 + 2h](x − x0)(x − x0 − h)
           + f[x0, x0 + h, x0 + 2h, x](x − x0)(x − x0 − h)(x − x0 − 2h)
         = P2(x) + (f'''(ξ)/6)(x − x0)(x − x0 − h)(x − x0 − 2h),
and the approximation to the integral ∫_{x0}^{x0+2h} f(x) dx is ∫_{x0}^{x0+2h} P2(x) dx = (h/3)[f(x0) + 4f(x0 + h) + f(x0 + 2h)], with the truncation error of integration given by (3.13). But using (3.15), the crux of Observation 3.1, one can obtain the same approximation (3.12) to the integral ∫_{x0}^{x0+2h} f(x) dx by approximating f(x) with a degree-three interpolating polynomial P3(x) satisfying the conditions P3(x0) = f(x0), P3(x0 + h) = f(x0 + h), P3(x0 + 2h) = f(x0 + 2h) and the extra condition P3'(x0 + h) = f'(x0 + h). This is because P3(x) = P2(x) + f[x0, x0 + h, x0 + 2h, x0 + h](x − x0)(x − x0 − h)(x − x0 − 2h) and
    ∫_{x0}^{x0+2h} P3(x) dx = ∫_{x0}^{x0+2h} [f(x0) + f[x0, x0 + h](x − x0) + f[x0, x0 + h, x0 + 2h](x − x0)(x − x0 − h)] dx
        + f[x0, x0 + h, x0 + 2h, x0 + h] ∫_{x0}^{x0+2h} (x − x0)(x − x0 − h)(x − x0 − 2h) dx
        = ∫_{x0}^{x0+2h} P2(x) dx + f[x0, x0 + h, x0 + 2h, x0 + h] · (0) = ∫_{x0}^{x0+2h} P2(x) dx.   (3.16)
Thus the error of interpolation in this case is given by
    f(x) − P3(x) = f[x0, x0 + h, x0 + 2h, x0 + h, x](x − x0)(x − x0 − h)^2(x − x0 − 2h) = (f⁗(ξ)/4!)(x − x0)(x − x0 − h)^2(x − x0 − 2h).
Further, the truncation error of integration will be
    ∫_{x0}^{x0+2h} [f(x) − P3(x)] dx = ∫_{x0}^{x0+2h} (f⁗(ξ)/4!)(x − x0)(x − x0 − h)^2(x − x0 − 2h) dx.   (3.17)
Remark 3.1. Thus we see from (3.14) and (3.18) that there are two bounds, of order h^4 and h^5 respectively, for the error of integration in Simpson's one-third rule. For small h we can infer better accuracy of the approximate integral from the latter bound (3.18). So we often use M4 h^5/90.
Remark 3.2. As discussed in Remark 3.1, in the composite Simpson's one-third rule we also get two bounds for the error of integration, as given above. But we always try to subdivide the interval [a, b] into the least number of subintervals needed to obtain a certain accuracy ε. Thus to obtain the error bound ε we need either M3(b − a)^4/(192 n^3) < ε or M4(b − a)^5/(2880 n^4) < ε. Hence the minimum n required for the desired error bound of integration should satisfy
    min{ (M3(b − a)^4/(192 ε))^{1/3}, (M4(b − a)^5/(2880 ε))^{1/4} } ≤ n.   (3.22)
Thus the error bound for Simpson's 3/8 rule can be given as
    |∫_{x0}^{x0+3h} [f(x) − P3(x)] dx| ≤ (M4 h^5/24) · (49/30) = 49 M4 h^5/720.   (3.27)
and apply Simpson's 3/8 rule on intervals of the form [x3i, x3(i+1)] to obtain
    ∫_{x3i}^{x3(i+1)} f(x) dx ≈ (3h/8) [f(x3i) + 3f(x3i+1) + 3f(x3i+2) + f(x3(i+1))].   (3.28)
And the approximation to the integral over the whole interval [a, b] can be obtained as follows:
    ∫_a^b f(x) dx = Σ_{i=0}^{n−1} ∫_{x3i}^{x3(i+1)} f(x) dx
        ≈ (3h/8) Σ_{i=0}^{n−1} [f(x3i) + 3f(x3i+1) + 3f(x3i+2) + f(x3(i+1))]
        = (3h/8) [f(x0) + 3 Σ_{i=0}^{n−1} f(x3i+1) + 3 Σ_{i=0}^{n−1} f(x3i+2) + 2 Σ_{i=1}^{n−1} f(x3i) + f(x3n)].   (3.29)
The total error in the composite Simpson's 3/8 rule is bounded as follows:
    |∫_a^b [f(x) − P(x)] dx| = |Σ_{i=0}^{n−1} ∫_{x3i}^{x3(i+1)} [f(x) − P(x)] dx| ≤ Σ_{i=0}^{n−1} |∫_{x3i}^{x3(i+1)} [f(x) − P(x)] dx|
        ≤ Σ_{i=0}^{n−1} 49 M4 h^5/720 = n · 49 M4 h^5/720 = 49 M4 (b − a)^5/(720 · 3^5 · n^4).   (3.30)
4 Solution of a system of linear equations
Consider the system of linear equations
    a11 x1 + a12 x2 + ... + a1n xn = b1
    a21 x1 + a22 x2 + ... + a2n xn = b2
    ...
    an1 x1 + an2 x2 + ... + ann xn = bn.   (4.1)
In this system of linear equations the aij's, i, j = 1, 2, ..., n, are known coefficients, the bi's, i = 1, 2, ..., n, are known values, and the xi's, i = 1, 2, ..., n, are unknowns to be determined. This system can also be written in matrix form as
    Ax = b,   (4.2)
where A is an n × n matrix and x and b are column vectors of order n × 1 of unknowns and known values respectively. If b = 0, then (4.2) is called a homogeneous system of equations. For the solution of the system of linear equations (4.2) we recall an important theorem from linear algebra.
Theorem 4.1. If A is a real matrix of order n × n, then the following statements are equivalent: (i) A is invertible; (ii) det(A) ≠ 0; (iii) the homogeneous system Ax = 0 has only the trivial solution x = 0; (iv) for every b, the system Ax = b has a unique solution.
Remark 4.1. If any of the four equivalent conditions in the above theorem is satisfied, then one can find the solution of the system of linear equations by multiplying both sides of the equation Ax = b by the inverse of the matrix A on the left, to get
    x = A^{−1} b.   (4.3)
To find A^{−1} we know the standard method (learned in the 10+2 standard) of computing the adjoint of the matrix A, so that A^{−1} = Adj(A)/det(A). One can also recall Cramer's rule for finding the solution.
Observation 4.1. Both of the above methods of finding the solution involve the intermediate step of computing the determinant of the matrix A. But finding the determinant of a large matrix is not an easy task.
Question 4.1. Can we find the solution of the system of linear equations without finding the
inverse of the coefficient matrix?
4.1. Direct method for some special form of the coefficient matrix A
Here we will try to answer Question 4.1 positively when the coefficient matrix has some special form for which the solution can be obtained by direct computations.
1. Diagonal case: A = D
Since the matrix A is assumed to be nonsingular, det(A) = ∏_{i=1}^{n} aii ≠ 0. And we have
    [ a11                     ] [ x1 ]   [ b1 ]
    [      a22                ] [ x2 ]   [ b2 ]
    [           ...           ] [ .. ] = [ .. ]
    [                aii      ] [ xi ]   [ bi ]
    [                     ann ] [ xn ]   [ bn ]
The solution is obvious in this case and can be written as
    xi = bi/aii,   i = 1, 2, ..., n.   (4.4)
Note that only n divisions are required as computer operations.
2. Lower triangular case: A = L (lower triangular)
We have the following matrix equation:
    [ a11                          ] [ x1 ]   [ b1 ]
    [ a21  a22                     ] [ x2 ]   [ b2 ]
    [ ...  ...  ...                ] [ .. ] = [ .. ]
    [ ai1  ai2  ...  aii           ] [ xi ]   [ bi ]
    [ ...  ...  ...  ...  ...      ] [ .. ]   [ .. ]
    [ an1  an2  ...  ani  ...  ann ] [ xn ]   [ bn ]
In this case also we assume the solution to exist, and hence det(A) = ∏_{i=1}^{n} aii ≠ 0. Here it is easy to compute x1 = b1/a11, but to compute x2 we need to substitute the value of x1 into the second equation. Similarly, to compute xk we need to substitute the values of x1, x2, ..., xk−1 (already obtained) into the kth equation:
    xk = (bk − Σ_{j=1}^{k−1} akj xj)/akk,   k = 2, ..., n.   (4.5)
Thus we substitute the entries forward, so we call this method forward substitution (a sketch follows below). For the computation of xk in the above equation we require (k − 1) multiplications, (k − 1) additions and one division. Thus the total number of computer operations is Σ_{k=1}^{n} [(k − 1) + (k − 1) + 1] = n^2.
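A direct Python sketch of forward substitution (function name ours):

    import numpy as np

    def forward_substitution(L, b):
        """Solve Lx = b for lower triangular L using (4.5)."""
        n = len(b)
        x = np.zeros(n)
        for k in range(n):
            x[k] = (b[k] - L[k, :k] @ x[:k]) / L[k, k]
        return x

    L = np.array([[2.0, 0.0],
                  [1.0, 3.0]])
    print(forward_substitution(L, np.array([4.0, 5.0])))   # [2. 1.]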
3. Upper triangular case: A = U (upper triangular)
In this case we also have each aii ≠ 0, and
    [ a11  a12  ...  a1i  ...  a1n ] [ x1 ]   [ b1 ]
    [      a22  ...  a2i  ...  a2n ] [ x2 ]   [ b2 ]
    [           ...  ...  ...  ... ] [ .. ] = [ .. ]
    [                aii  ...  ain ] [ xi ]   [ bi ]
    [                     ...  ... ] [ .. ]   [ .. ]
    [                          ann ] [ xn ]   [ bn ]
Here the computation of xn is easy compared to the other unknowns. So we first compute xn from the last equation and substitute its value into the second-to-last equation to compute xn−1. To compute xk we substitute the values of the unknowns xn, ..., xk+1, obtained from the last n − k equations, into the kth equation:
    xk = (bk − Σ_{j=k+1}^{n} akj xj)/akk,   k = n − 1, ..., 1.   (4.6)
Because of the substitution into the earlier equations this method is known as back substitution (a sketch follows below). As with forward substitution, we need n^2 computer operations in total for the complete solution.
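And the corresponding Python sketch of back substitution:

    import numpy as np

    def back_substitution(U, b):
        """Solve Ux = b for upper triangular U using (4.6)."""
        n = len(b)
        x = np.zeros(n)
        for k in range(n - 1, -1, -1):
            x[k] = (b[k] - U[k, k + 1:] @ x[k + 1:]) / U[k, k]
        return x

    U = np.array([[2.0, 1.0],
                  [0.0, 3.0]])
    print(back_substitution(U, np.array([5.0, 6.0])))   # [1.5 2. ]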
Remark 4.2. The above methods are applicable only to diagonal or triangular matrices and do not give a general answer to Question 4.1.
4.2. Certain Decomposition Methods for solving the system of linear equations
Here our aim is to write the coefficient matrix A as the product of two matrices B and C of some simpler form, and then solve Ax = b, or BCx = b, by solving two different systems: first Bz = b for z and then Cx = z for x.
4.3. Doolittle’s Method
In this method A is decomposed as A = LU, where

L = [ 1 0 0 ; l21 1 0 ; l31 l32 1 ] , U = [ u11 u12 u13 ; 0 u22 u23 ; 0 0 u33 ],

so that

A = [ a11 a12 a13 ; a21 a22 a23 ; a31 a32 a33 ] = LU = [ u11, u12, u13 ; l21 u11, l21 u12 + u22, l21 u13 + u23 ; l31 u11, l31 u12 + l32 u22, l31 u13 + l32 u23 + u33 ]. (4.7)

Now one has to solve nine equations to find all nine unknown coefficients of the lower triangular matrix L and the upper triangular matrix U. But these are easy to solve: more or less only substitutions are needed. The solution is obtained by first solving Lz = b for z by direct methods and then solving Ux = z for x, again by direct methods.
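A small Python sketch of Doolittle’s decomposition (ours, under the assumption that no zero pivot uii occurs):

    def doolittle(A):
        # Decompose A = L U with unit diagonal in L, as in (4.7).
        n = len(A)
        L = [[float(i == j) for j in range(n)] for i in range(n)]
        U = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i, n):       # row i of U by substitution
                U[i][j] = A[i][j] - sum(L[i][k] * U[k][j] for k in range(i))
            for j in range(i + 1, n):   # column i of L by substitution
                L[j][i] = (A[j][i] - sum(L[j][k] * U[k][i] for k in range(i))) / U[i][i]
        return L, U

Combined with forward_substitution and back_substitution from the earlier sketch, Ax = b is then solved via Lz = b followed by Ux = z.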
4.4. Crout’s Method
Here one decomposes A = LU, where

L = [ l11 0 0 ; l21 l22 0 ; l31 l32 l33 ] , U = [ 1 u12 u13 ; 0 1 u23 ; 0 0 1 ].

Here we again need to solve nine equations to determine all nine unknown coefficients. And, as in the previous method, we first solve Lz = b for z and then Ux = z for x.
4.5. Positive definite matrix
A real square matrix A is said to be positive definite if all its leading principal minors (including det A itself) are positive.
4.6. A matrix

A = [ a11 a12 a13 ; a21 a22 a23 ; a31 a32 a33 ]

is positive definite if
• a11 > 0,
• det [ a11 a12 ; a21 a22 ] > 0,
• det A > 0.
Example 4.1. The matrix [ 1 1/2 1/3 ; 1/2 1/3 1/4 ; 1/3 1/4 1/5 ] is positive definite, while the matrix [ 1 2 3 ; 2 3 2 ; 3 2 5 ] is not positive definite, because its second leading principal minor 1 · 3 − 2 · 2 = −1 is not positive.
4.7. Cholesky’s Method
Cholesky’s method is applicable when the matrix A is symmetric and positive definite. In this case the decomposition of A is A = LL^t, where

L = [ d1 0 0 ; l21 d2 0 ; l31 l32 d3 ],

so that

A = [ a11 a21 a31 ; a21 a22 a32 ; a31 a32 a33 ] = LL^t = [ d1^2, d1 l21, d1 l31 ; d1 l21, l21^2 + d2^2, l21 l31 + d2 l32 ; d1 l31, l31 l21 + l32 d2, l31^2 + l32^2 + d3^2 ]. (4.9)

Here we only need to solve six equations in six unknowns. To solve Ax = b we first solve Lz = b for z and then L^t x = z for x.
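A minimal Python sketch of Cholesky’s decomposition (ours; it assumes A is symmetric and positive definite, so the square roots are real and the divisions are safe):

    from math import sqrt

    def cholesky(A):
        # Decompose a symmetric positive definite A as L L^t, as in (4.9).
        n = len(A)
        L = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1):
                s = sum(L[i][k] * L[j][k] for k in range(j))
                if i == j:
                    L[i][i] = sqrt(A[i][i] - s)      # diagonal entries d_i
                else:
                    L[i][j] = (A[i][j] - s) / L[j][j]
        return L

    print(cholesky([[4.0, 2.0], [2.0, 5.0]]))  # [[2.0, 0.0], [1.0, 2.0]]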
Question 4.2. Can we think of some generalization of the elimination method, which we learned
in 10th standard?
Note that at each step of an elementary row transformation applied to a system of linear equations
we get an equivalent system of linear equations having the same solution.
The steps of the Gauss elimination method to convert the coefficient matrix into an upper triangular (or identity) matrix can be described as follows.
• If a11 = 0, interchange the first row with a jth row for which aj1 ≠ 0.
• Use the transformations Ri1(−ai1/a11) to eliminate all ai1, i > 1, with the pivot element a11.
• If a22 = 0, interchange the second row with a jth row, j > 2, for which aj2 ≠ 0.
• Use the transformations Ri2(−ai2/a22) to eliminate all ai2, i > 2, with the pivot element a22.
• Use similar transformations to convert the coefficient matrix into an upper triangular matrix.
If we know already that the matrix A is invertible, we can further use elementary row transformations to convert this upper triangular matrix into the identity matrix in the following steps. Note that in this upper triangular matrix each aii ≠ 0.
• Starting with the nth column, we then do similar transformations in the (n − 1)th column, after that in the (n − 2)th column, and so on down to the second and then the first column. In general, in the kth column we first apply Rk(1/akk) to get akk = 1 and then Rik(−aik), i = k − 1, . . . , 1, to eliminate aik, i = k − 1, . . . , 1, with akk = 1 as the pivot.
To implement the Gauss elimination method in general we apply to a system of linear equations a number of elementary row transformations, required to convert A into triangular form (or identity form), or equivalently the same elementary row transformations on the augmented matrix [A|b] to convert it into [T|bT] (or [I|bI]). Theoretically this can be understood as successive left multiplication by the corresponding elementary matrices on both sides of the matrix form of the system Ax = b, that is, El El−1 · · · E2 E1 Ax = El El−1 · · · E2 E1 b, to obtain the new equivalent matrix form of the system as Tx = bT (or Ix = bI). Since the product of invertible matrices is an invertible matrix, the matrix E = El El−1 · · · E2 E1 is invertible, and hence the solution x of EAx = Tx = bT = Eb (or EAx = Ix = bI = Eb) is the same as the solution of E^{−1}EAx = Ax = b = E^{−1}Eb.
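A small Python sketch of elimination to upper triangular form (ours; it assumes A is square and nonsingular, and it reuses back_substitution from the earlier sketch):

    def gauss_eliminate(A, b):
        # Reduce [A|b] to upper triangular form by the row operations R_i <- R_i - (a_ik/a_kk) R_k.
        n = len(b)
        A = [row[:] for row in A]; b = b[:]
        for k in range(n):
            if A[k][k] == 0.0:  # interchange with a row having a nonzero entry in column k
                j = next(j for j in range(k + 1, n) if A[j][k] != 0.0)
                A[k], A[j] = A[j], A[k]; b[k], b[j] = b[j], b[k]
            for i in range(k + 1, n):
                m = A[i][k] / A[k][k]
                for j in range(k, n):
                    A[i][j] -= m * A[k][j]
                b[i] -= m * b[k]
        return A, b

    A, b = gauss_eliminate([[2.0, 1.0], [4.0, 1.0]], [5.0, 9.0])
    print(back_substitution(A, b))  # [2.0, 1.0]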
Observation 4.2. It might happen in general that the coefficient matrix is not invertible, or that it is not a square matrix, that is, the number of equations is different from the number of unknowns. In these situations we cannot use Theorem 4.1, and the steps of the Gauss elimination method might fail.
Question 4.3. Can we convert Ax = b into some form which can be solved directly when A is a
rectangular or singular matrix?
4.8. Row Echelon Form
Yes! One can still use elementary row transformations to convert A into Echelon form, which can be described as follows.
A matrix A is said to be of Echelon form if it satisfies the following properties.
• Each of the zero rows (a row in which each entry is zero), if it occurs in the matrix A, must occur below every non-zero row (a row in which at least one entry is non-zero).
• If the leading non-zero entry (the first non-zero entry of the row) in the ith row occurs in the ki th column, that is, aij = 0 for j = 1, 2, . . . , ki − 1 and a_{i ki} ≠ 0, and there are only r non-zero rows, then one must have k1 < k2 < · · · < kr.
To obtain the row Echelon form of a matrix A of order m × n, one can use the following steps.
• If the first non-zero column is the l1 th, bring one non-zero entry of the l1 th column into the first row by a row transformation (row interchange), and use this entry to eliminate all other entries of the l1 th column by the row transformations R_{i l1}(−a_{i l1}/a_{1 l1}). This l1 is the k1 of the definition. In further row transformations we will not use the first row at all.
• Now consider the submatrix of order (m − 1) × n obtained by ignoring the first row. Search for the first non-zero column, say the l2 th, of this submatrix. If the first entry of this column (of order (m − 1) × 1) is zero, bring one non-zero entry of the column into its first row, and use this entry to eliminate all other non-zero entries of the column. This l2 is the k2 of the definition. In further row transformations we will not use the first two rows of the parent matrix.
• Now we further consider the submatrix of order (m − 2) × n of the parent matrix, search for its first non-zero column, say the l3 th, and use similar row transformations to eliminate all the entries of the l3 th column below the first row of the submatrix, using the non-zero entry of its first row as the pivot.
• Use similar transformations successively to find a column lr of the parent matrix satisfying
i) the lr th column is the first non-zero column of the submatrix (obtained by ignoring the first r − 1 rows) of order (m − r + 1) × n,
ii) the only non-zero entry of this column of the submatrix is in its first row.
• This process stops when i) either r = m, in which case m ≤ n (and m = n only when the Echelon form is upper triangular and invertible), ii) or the last m − r rows of the parent matrix are zero rows.
4.9. Solution of system through Echelon form
For solving the system Ax = b, where A is the coefficient matrix of order m × n, x is the column of unknowns of order n × 1 and b is a column of order m × 1, we apply the same sequence of elementary row transformations to the augmented matrix [A|b] to convert it into [A_E|b_E], where A_E is the Echelon form of the matrix A. Because only elementary row transformations are used, the solutions x of the two systems Ax = b and A_E x = b_E are the same.
We can also determine the nature of the solution by some observations on the augmented matrix [A_E|b_E], as follows.
• If r = m and n = m, then there is a unique solution, obtained by the direct method (back substitution).
• If r = m and n > m, then there are infinitely many solutions with n − m degrees of freedom, that is, out of the n unknowns the system can be solved for a suitable set of m unknowns in terms of the remaining n − m unknowns, which can be given arbitrary values.
• If r < m and some non-zero row occurs among the last m − r rows of the augmented matrix [A_E|b_E], then the only non-zero entry of such a row is the entry in the b_E column. If this entry is b^E_{r+p} ≠ 0, we end up with an equation of the form 0 × x_{r+p} = b^E_{r+p}, which shows the inconsistency of the system and leads to NO solution.
• Further, if r < m and the last m − r rows of the augmented matrix [A_E|b_E] are zero rows, then the system has a unique solution if n = r, and infinitely many solutions with n − r degrees of freedom if n > r. Note that the case n < r does not occur.
4.10. Partial Pivoting
Look at the class example for the need of partial pivoting.
One can ensure partial pivoting in Gauss elimination or in the Echelon form method just by making sure that the pivot entry is the largest in magnitude among the entries of its column which are to be eliminated.
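In code this is a one-line change to the elimination sketch given earlier; a hypothetical helper (ours) that selects the pivot row:

    def pivot_row(A, k, n):
        # Partial pivoting: among rows k..n-1, pick the row whose entry in
        # column k is largest in magnitude, and use it as the pivot row.
        return max(range(k, n), key=lambda j: abs(A[j][k]))

Replacing the zero-test interchange in gauss_eliminate by an unconditional swap with row pivot_row(A, k, n) gives Gauss elimination with partial pivoting.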
4.11. Norm
The norm is a generalization of the notion of modulus. The modulus of a number measures the distance of the position of that number from the origin. Now, to measure distances in a plane or in R^n, one considers a function

‖ · ‖ : R^n → R^+ ∪ {0}, (4.10)

satisfying
• ‖x‖ ≥ 0 and ‖x‖ = 0 if and only if x = 0, for all x ∈ R^n.
• ‖αx‖ = |α|‖x‖, for any scalar α ∈ R and x ∈ R^n. This also implies ‖x − y‖ = ‖y − x‖.
• Triangle Inequality: For any x, y ∈ R^n,

‖x + y‖ ≤ ‖x‖ + ‖y‖,

and consequently ‖x − z‖ ≤ ‖x − y‖ + ‖y − z‖ for any x, y, z ∈ R^n.

The function

‖x‖_p = ( Σ_{i=1}^{n} |x_i|^p )^{1/p} (4.13)

defines a norm on R^n, known as the p-norm of the vector x ∈ R^n. Note that in daily life we use the 2-norm for measuring distances in the plane and in space.

Example 4.3. The function ‖x‖_∞ = max{|x_i| : i = 1, . . . , n} also defines a norm on R^n, known as the infinity norm. To compare the two norms, sketch the unit circles
• {x ∈ R^2 : ‖x‖_2 = 1},
• {x ∈ R^2 : ‖x‖_∞ = 1}.

Exercise 4.2. Show that a norm is a continuous function from R^n to [0, ∞).
4.12. Matrix Norm
We know that the set of all real matrices of order m × n, denoted by M_mn, forms a vector space over the real numbers. One can define a norm on M_mn, which is also compatible with matrix multiplication, as a function

‖ · ‖ : M_mn → R^+ ∪ {0}, (4.15)

satisfying
• ‖A‖ ≥ 0 and ‖A‖ = 0 if and only if A is the null matrix, for all A ∈ M_mn.
• ‖αA‖ = |α|‖A‖, for any scalar α ∈ R and A ∈ M_mn.
• Triangle Inequality: For any A, B ∈ M_mn,

‖A + B‖ ≤ ‖A‖ + ‖B‖. (4.16)

This also defines a norm on M_mn, which is compatible with multiplication by vectors x ∈ R^n in the sense that

‖Ax‖_2 ≤ ‖A‖_2 ‖x‖_2, (4.21)

where ‖Ax‖_2 and ‖x‖_2 are as defined in (4.13) for p = 2.

Example 4.6. Consider the following function on M_mn:

‖A‖_∞ = max_{1≤i≤m} Σ_{j=1}^{n} |aij| . (4.22)
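For computations it is handy to have these norms as code; a minimal Python sketch (ours, not from the notes):

    def p_norm(x, p):
        # The p-norm of (4.13).
        return sum(abs(t) ** p for t in x) ** (1.0 / p)

    def vec_inf_norm(x):
        # The infinity norm of Example 4.3.
        return max(abs(t) for t in x)

    def mat_inf_norm(A):
        # Maximum absolute row sum, as in (4.22).
        return max(sum(abs(t) for t in row) for row in A)

    print(p_norm([3.0, 4.0], 2))                     # 5.0
    print(mat_inf_norm([[1.0, -2.0], [3.0, 1.0]]))   # 4.0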
Similar to the convergence of a sequence of numbers to a point, one can define the convergence of a sequence of vectors to another vector. A sequence of vectors (or matrices) converges to some vector (or matrix) if and only if, for each component, the sequence of components converges to the corresponding component of the limit vector (or matrix).
In R^m a sequence of vectors y^(n) converges to a vector y if and only if y_i^(n) → y_i for all i = 1, . . . , m.

Example 4.7. The vector [1/n, 2n/(n − 1), (n + 1)/(n − 1)]^t → [0, 2, 1]^t, while [1/n, 2n/(n − 1), n]^t does not converge, because the sequence corresponding to the third component is not convergent.
4.14. Gauss-Jacobi iterative method
Now our aim is to solve (4.1) using some iteration method. For this we first assume aii ≠ 0 for all i and rewrite the system of linear equations as follows:

a11 x1 = 0 · x1 − a12 x2 − · · · − a1n xn + b1
a22 x2 = −a21 x1 + 0 · x2 − · · · − a2n xn + b2
. . .
ann xn = −an1 x1 − an2 x2 − · · · + 0 · xn + bn

Note that if we decompose A = L + D + U, where L, D, U are strictly lower triangular, diagonal and strictly upper triangular matrices respectively, then the above system of equations can be written in matrix form as Dx = −(L + U)x + b, where x is the column vector of unknowns. Now, since D is invertible by our assumption, we have

x = −D^{−1}(L + U)x + D^{−1}b. (4.24)

This can also be written component-wise as

xi = − Σ_{j=1, j≠i}^{n} (aij/aii) xj + bi/aii , i = 1, 2, . . . , n. (4.25)

Denoting −D^{−1}(L + U) = B and D^{−1}b = C, we have x = Bx + C. Now if we assume the initial approximation to the solution as

x^(0) = [ x1^(0), x2^(0), . . . , xn^(0) ]^t , (4.26)
then the first approximation can be obtained by the equation x^(1) = Bx^(0) + C, and in general the (k + 1)th approximation is obtained by

x^(k+1) = Bx^(k) + C. (4.27)

Using (4.27) we can write the ith component of the (k + 1)th approximation of the solution vector as

xi^(k+1) = bi/aii + Σ_{j=1, j≠i}^{n} (−aij/aii) xj^(k) . (4.29)
Suppose x is the exact solution of the system. Define the error vector at the kth step as

e^(k) = x − x^(k) = [ x1 − x1^(k), x2 − x2^(k), . . . , xn − xn^(k) ]^t , (4.30)

where [ · ]^t represents the transpose of the corresponding matrix. Using (4.30) and (4.29), one has

e^(k+1) = Be^(k) , (4.31)

and component-wise

ei^(k+1) = Σ_{j=1, j≠i}^{n} (−aij/aii) ej^(k)

or

|ei^(k+1)| ≤ Σ_{j=1, j≠i}^{n} (|aij|/|aii|) |ej^(k)| ≤ Σ_{j=1, j≠i}^{n} (|aij|/|aii|) max_{1≤j≤n} |ej^(k)| = Σ_{j=1, j≠i}^{n} (|aij|/|aii|) ‖e^(k)‖_∞ . (4.32)
If we define

αi = Σ_{j=1}^{i−1} |aij|/|aii| , βi = Σ_{j=i+1}^{n} |aij|/|aii| , (4.33)

and

μ = max_{1≤i≤n} (αi + βi), (4.34)

then taking the maximum over i in (4.32) gives ‖e^(k+1)‖_∞ ≤ μ ‖e^(k)‖_∞. Thus for convergence we assume two conditions: first, aii ≠ 0 for all i = 1, . . . , n, and second,

μ < 1. (4.37)
Let us first consider the second condition (4.37), which will be valid if and only if

(αi + βi) < 1 for all i = 1, . . . , n

or Σ_{j=1}^{i−1} |aij|/|aii| + Σ_{j=i+1}^{n} |aij|/|aii| < 1 for all i = 1, . . . , n

or Σ_{j=1, j≠i}^{n} |aij| < |aii| for all i = 1, . . . , n. (4.38)

The condition (4.38) is known as strict row diagonal dominance, and it also implies the first condition. Thus for the convergence of the Gauss-Jacobi iteration method we only need the coefficient matrix to be strictly row diagonally dominant.
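A minimal Python sketch of the iteration (4.29) (ours; the example matrix is strictly row diagonally dominant and the exact solution is (1, 1)):

    def gauss_jacobi(A, b, x0, steps):
        # Iterate x^(k+1) = B x^(k) + C component-wise, as in (4.29).
        n = len(b)
        x = x0[:]
        for _ in range(steps):
            # the whole new iterate is built from the old one
            x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
                 for i in range(n)]
        return x

    print(gauss_jacobi([[4.0, 1.0], [2.0, 5.0]], [5.0, 7.0], [0.0, 0.0], 25))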
Remark 4.4. Note that ‖B‖_∞ = μ. Then from (4.31)

‖e^(k+1)‖ = ‖Be^(k)‖ ≤ ‖B‖ ‖e^(k)‖ ≤ ‖B‖ ‖B‖ ‖e^(k−1)‖ ≤ . . . ≤ ‖B‖^{k+1} ‖e^(0)‖ = μ^{k+1} ‖e^(0)‖.

Now the convergence follows if μ < 1. Moreover, we have x^(k+1) − x^(k) = Bx^(k) + C − Bx^(k−1) − C, and hence

‖x^(k+1) − x^(k)‖ = ‖B(x^(k) − x^(k−1))‖ ≤ ‖B‖ ‖x^(k) − x^(k−1)‖ = μ ‖x^(k) − x^(k−1)‖, for all k ∈ N.

Thus ‖x^(k+1) − x^(k)‖ ≤ μ‖x^(k) − x^(k−1)‖ ≤ μ · μ‖x^(k−1) − x^(k−2)‖ ≤ . . . ≤ μ^k ‖x^(1) − x^(0)‖. Now for any m > k > 1 we have

‖x^(m) − x^(k)‖ = ‖x^(m) − x^(m−1) + x^(m−1) − x^(m−2) + · · · + x^(k+1) − x^(k)‖
≤ ‖x^(m) − x^(m−1)‖ + ‖x^(m−1) − x^(m−2)‖ + . . . + ‖x^(k+1) − x^(k)‖
≤ (μ^{m−1} + μ^{m−2} + . . . + μ^k) ‖x^(1) − x^(0)‖
< μ^k ( Σ_{i=0}^{∞} μ^i ) ‖x^(1) − x^(0)‖ = μ^k/(1 − μ) ‖x^(1) − x^(0)‖, since μ < 1.

Further, since ‖x^(m) − x‖ → 0, for any given ε > 0 there exists an integer m0 > k such that ‖x^(m0) − x‖ < ε. And hence

‖x − x^(k)‖ ≤ ‖x − x^(m0)‖ + ‖x^(m0) − x^(k)‖ < ε + μ^k/(1 − μ) ‖x^(1) − x^(0)‖.

Since the above inequality is true for arbitrary ε > 0, we can let ε → 0 and hence

‖e^(k)‖ = ‖x − x^(k)‖ ≤ μ^k/(1 − μ) ‖x^(1) − x^(0)‖. (4.39)
4.16. Gauss-Seidel iterative method
We rewrite the system of equations as follows:

a11 x1 = −a12 x2 − a13 x3 − · · · − a1n xn + b1
a21 x1 + a22 x2 = −a23 x3 − · · · − a2n xn + b2
. . .
ai1 x1 + ai2 x2 + · · · + aii xi = −a_{i,i+1} x_{i+1} − · · · − ain xn + bi
. . .
an1 x1 + an2 x2 + · · · + ani xi + · · · + ann xn = bn

In matrix form this reads (L + D)x = −Ux + b. The corresponding iteration uses the already computed components of the (k + 1)th iterate, and component-wise is given by

xi^(k+1) = bi/aii + Σ_{j=1}^{i−1} (−aij/aii) xj^(k+1) + Σ_{j=i+1}^{n} (−aij/aii) xj^(k) . (4.41)
Suppose x is the exact solution of the system (4.25). Then the kth error vector can, as in (4.30), be defined as

e^(k) = x − x^(k) = [ x1 − x1^(k), x2 − x2^(k), . . . , xn − xn^(k) ]^t ,

and subtracting (4.41) from the ith equation of (4.25) satisfied by the exact solution, we get

ei^(k+1) = Σ_{j=1}^{i−1} (−aij/aii) (xj − xj^(k+1)) + Σ_{j=i+1}^{n} (−aij/aii) (xj − xj^(k))
= Σ_{j=1}^{i−1} (−aij/aii) ej^(k+1) + Σ_{j=i+1}^{n} (−aij/aii) ej^(k) . (4.42)

Taking absolute values and using the definitions (4.33) of αi and βi,

|ei^(k+1)| ≤ αi ‖e^(k+1)‖_∞ + βi ‖e^(k)‖_∞ .
Or,

|ei^(k+1)| − αi ‖e^(k+1)‖_∞ ≤ βi ‖e^(k)‖_∞ ,
|ei^(k+1)| − ‖e^(k+1)‖_∞ + (1 − αi) ‖e^(k+1)‖_∞ ≤ βi ‖e^(k)‖_∞ . (4.43)

If we assume (1 − αi) > 0 for all i = 1, . . . , n and define

η = max_{1≤i≤n} βi/(1 − αi) , (4.44)

then choosing in (4.43) an index i for which |ei^(k+1)| = ‖e^(k+1)‖_∞ gives ‖e^(k+1)‖_∞ ≤ η ‖e^(k)‖_∞.
Remark 4.5. Now we collect all the assumptions made for the convergence of the Gauss-Seidel method. These assumptions are
• aii ≠ 0 for all i = 1, . . . , n,
• (1 − αi) > 0 for all i = 1, . . . , n,
• and

η = max_{1≤i≤n} βi/(1 − αi) < 1. (4.48)

Let us first consider the third condition (4.48), which will be valid if and only if

βi/(1 − αi) < 1 for all i = 1, . . . , n
or βi < 1 − αi for all i = 1, . . . , n
or Σ_{j=1}^{i−1} |aij|/|aii| + Σ_{j=i+1}^{n} |aij|/|aii| < 1 for all i = 1, . . . , n
or Σ_{j=1, j≠i}^{n} |aij| < |aii| for all i = 1, . . . , n. (4.49)

Thus if we assume the coefficient matrix to be strictly row diagonally dominant, the third assumption is satisfied. Further, in this case 1 − αi > βi ≥ 0 implies the second condition, and the first is obviously true.
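A minimal Python sketch of the iteration (4.41) (ours); unlike the Gauss-Jacobi sketch, the new components are used as soon as they are available:

    def gauss_seidel(A, b, x0, steps):
        # As in (4.41): components x_j^(k+1), j < i, are used immediately.
        n = len(b)
        x = x0[:]
        for _ in range(steps):
            for i in range(n):
                s = sum(A[i][j] * x[j] for j in range(n) if j != i)
                x[i] = (b[i] - s) / A[i][i]
        return x

    print(gauss_seidel([[4.0, 1.0], [2.0, 5.0]], [5.0, 7.0], [0.0, 0.0], 12))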
Exercise 4.3. If μ < 1, prove that η ≤ μ.

Remark 4.6. The above exercise shows that if the coefficient matrix is strictly row diagonally dominant, then the convergence factor of the Gauss-Seidel method is less than or equal to the convergence factor of the Gauss-Jacobi method. And in case η < μ, the Gauss-Seidel method should converge faster than the Gauss-Jacobi method.
Remark 4.7. To apply either of the two methods, we need the coefficient matrix to be strictly row diagonally dominant. Then for any initial approximation the sequence of iterates converges to some exact solution. Now the question arises whether the sequences corresponding to two different initial approximations can converge to two different exact solutions. Note that this can happen only when the coefficient matrix, which is strictly row diagonally dominant, is singular. In the following discussion we will see that any strictly row diagonally dominant matrix is non-singular, and hence we conclude that every sequence of iterates, for any given initial approximation, converges to the exact solution.
Theorem 4.2. Suppose A and B are two square matrices of order n × n. If A is invertible and

‖A − B‖ < 1/‖A^{−1}‖,

then B is also invertible.

Proof. Suppose B is not invertible. Then the rank of B is less than n, and hence by the Rank-Nullity theorem the null space of B is not equal to {0}. Thus if 0 ≠ x is in the null space of B, then Bx = 0 and

‖x‖/‖A^{−1}‖ = ‖A^{−1}Ax‖/‖A^{−1}‖ ≤ ‖Ax‖ = ‖Ax − Bx‖ ≤ ‖A − B‖ ‖x‖.

But ‖x‖ ≠ 0, so we can conclude that 1/‖A^{−1}‖ ≤ ‖A − B‖, a contradiction to the assumption.
Problem 4.1. Show that a strictly row diagonally dominant matrix is invertible.

Solution. Let A be a strictly row diagonally dominant matrix. If A = L + D + U is the decomposition of A, then the diagonal matrix D has non-zero diagonal entries and hence is invertible. Since A = D(D^{−1}A), it is sufficient to prove that D^{−1}A is invertible. Note that I is an invertible matrix with ‖I^{−1}‖_∞ = 1. Thus if we can show that ‖I − D^{−1}A‖_∞ < 1, then by an application of the previous theorem it follows that D^{−1}A is invertible. But it is easy to see that ‖I − D^{−1}A‖_∞ = μ < 1.
4.18. Ill-conditioned matrix
We first solve the following system of two linear equations in two unknowns:

x1 + 3x2 = 19
2.5x1 + 7.857x2 = 47.499

The exact solution of this system is x1 = −3 and x2 = 7. But if we round off 47.499 to 47.500, then the solution changes drastically to x1 = 19 and x2 = 0. And if instead we round off 7.857 to 7.86, the solution changes to x1 = 98.17 and x2 = −26.39.
We observe that a small change in the coefficient matrix A or in the constant vector b leads to a large change in the solution vector. Such a system is called ill-conditioned; otherwise the system is called well-conditioned.
4.19. Small change in b vector
We want to solve

Ax = b. (4.50)

Suppose the change δb in b leads to the change δx in the solution vector, so that A(x + δx) = Ax + Aδx = b + δb. This implies Aδx = δb, or δx = A^{−1}δb. Thus

‖δx‖ ≤ ‖A^{−1}‖ ‖δb‖. (4.51)

The above inequality shows that δx is controlled if ‖A^{−1}‖ is controlled. Further, since

‖b‖ = ‖Ax‖ ≤ ‖A‖ ‖x‖,

we can control the relative error ‖δx‖/‖x‖ in the solution vector. Using (4.51),

‖δx‖/‖x‖ ≤ ‖A^{−1}‖ ‖δb‖/‖x‖ ≤ ‖A^{−1}‖ ‖A‖ ‖δb‖/‖b‖.

Thus we see that the relative error in x is controlled by the relative error in b if one has control over the quantity ‖A^{−1}‖ ‖A‖, which is known as the condition number of the matrix A.

Exercise 4.4. Show that for any invertible matrix A the condition number is always greater than or equal to 1, when the considered norm is the infinity norm.
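Assuming the NumPy library is available, the condition number in the infinity norm can be computed as in the following sketch (ours, not from the notes):

    import numpy as np

    def condition_number_inf(A):
        # kappa(A) = ||A^{-1}||_inf * ||A||_inf
        A = np.asarray(A, dtype=float)
        return np.linalg.norm(np.linalg.inv(A), np.inf) * np.linalg.norm(A, np.inf)

    # The coefficient matrix of the system above: noticeably larger than 1 (about 315).
    print(condition_number_inf([[1.0, 3.0], [2.5, 7.857]]))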
5 The Eigen-value Problem
Let B be a square matrix of order n × n. A scalar λ is called an eigenvalue of B, with corresponding eigenvector y ≠ 0, if

By = λy.

Suppose B has n linearly independent eigenvectors v1, v2, . . . , vn with corresponding eigenvalues λ1, λ2, . . . , λn. Then any vector z can be written as

z = α1 v1 + α2 v2 + . . . + αn vn , (5.1)

so that

Bz = α1 Bv1 + α2 Bv2 + . . . + αn Bvn = α1 λ1 v1 + α2 λ2 v2 + . . . + αn λn vn .

In this case the matrix of the linear transformation with respect to the basis {v1, v2, . . . , vn} turns out to be diagonal, with diagonal entries λ1, λ2, . . . , λn, and we say that the matrix B is diagonalizable.
Exercise 5.1. Show that if a matrix B is diagonalizable, with eigenvalues |λ1 | ≥ |λ2 | ≥ . . . ≥ |λn |,
then the image of the unit ball {x : kxk ≤ 1} is contained in a ball of radius |λ1 | with center at
origin.
This exercise shows the importance of the eigenvalue of largest magnitude.
5.1. The Power Method
This method is useful for finding the dominant eigenvalue among a collection of eigenvalues of a matrix, together with an eigenvector corresponding to the dominant eigenvalue. Let λ1, λ2, . . . , λm be a set of eigenvalues of a square matrix B of order n × n, with corresponding eigenvectors v1, v2, . . . , vm, m ≤ n, and let z be a vector such that z = α1 v1 + α2 v2 + . . . + αm vm, with α1 ≠ 0 and |λ1| > |λ2| ≥ |λ3| ≥ . . . ≥ |λm|. Thus,

B^k z = α1 λ1^k v1 + α2 λ2^k v2 + . . . + αm λm^k vm . (5.2)

Note that if u is a vector such that ⟨v1, u⟩ ≠ 0, then ⟨z, u⟩ ≠ 0 and

⟨B^{k+1} z, u⟩ / ⟨B^k z, u⟩ = λ1 · [ α1 ⟨v1, u⟩ + α2 (λ2/λ1)^{k+1} ⟨v2, u⟩ + . . . + αm (λm/λ1)^{k+1} ⟨vm, u⟩ ] / [ α1 ⟨v1, u⟩ + α2 (λ2/λ1)^k ⟨v2, u⟩ + . . . + αm (λm/λ1)^k ⟨vm, u⟩ ] . (5.3)

So in the limiting case

lim_{k→∞} ⟨B^{k+1} z, u⟩ / ⟨B^k z, u⟩ = λ1 . (5.4)
Similarly,

lim_{k→∞} λ1^{−k} B^k z = α1 v1 . (5.5)

Thus by (5.4) we can first find the eigenvalue of largest magnitude, and then by (5.5) the eigenvector, corresponding to the largest eigenvalue, involved in the representation of z. Note that the eigenvector α1 v1 is not necessarily of unit length.
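A minimal Python sketch of the ratios (5.4), taking u as the first standard basis vector (the code is ours; it assumes the first components of the iterates stay non-zero):

    def power_method(B, z, steps):
        # Ratios <B^{k+1} z, u> / <B^k z, u> with u = e_1, as in (5.4).
        lam = None
        for _ in range(steps):
            w = [sum(B[i][j] * z[j] for j in range(len(z))) for i in range(len(z))]
            lam = w[0] / z[0]   # inner product with e_1 picks the first component
            z = w
        return lam, z

    lam, z = power_method([[3.0, -1.0], [2.0, 0.0]], [3.0, 1.0], 20)
    print(lam)   # -> 2.000..., the dominant eigenvalue of Problem 5.1 below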
Problem 5.1. Use the power method to find the largest eigenvalue, and the corresponding eigenvector, of the matrix B = [ 3 −1 ; 2 0 ] with the initial vector z = [ 3 ; 1 ].

Solution. Clearly z = [3, 1]^t is not an eigenvector of the matrix B = [ 3 −1 ; 2 0 ], so we can infer that if there is a basis of eigenvectors, z is a linear combination of both of them. Further, since u = [1, 0]^t is not an eigenvector, its inner product with both the eigenvectors has to be non-zero, and hence we can use this vector as the vector u. Now

Bz = [ 8 ; 6 ], so ⟨B^1 z, u⟩ = 8,
B^2 z = B(Bz) = [ 18 ; 16 ], so ⟨B^2 z, u⟩ = 18,
B^3 z = [ 38 ; 36 ], so ⟨B^3 z, u⟩ = 38,
B^4 z = [ 78 ; 76 ], so ⟨B^4 z, u⟩ = 78,
B^5 z = [ 158 ; 156 ], so ⟨B^5 z, u⟩ = 158,
B^6 z = [ 318 ; 316 ], so ⟨B^6 z, u⟩ = 318,
B^7 z = [ 638 ; 636 ], so ⟨B^7 z, u⟩ = 638,
B^8 z = [ 1278 ; 1276 ], so ⟨B^8 z, u⟩ = 1278.

Thus the first terms of the sequence { ⟨B^{k+1} z, u⟩ / ⟨B^k z, u⟩ } are 18/8 = 2.25, 38/18 = 2.1111, 78/38 = 2.0526, 158/78 = 2.0256, 318/158 = 2.0126, 638/318 = 2.0062, 1278/638 = 2.0031. Thus, correct up to two decimal places, the largest eigenvalue is 2. Moreover, the first few terms of the sequence { λ1^{−k} B^k z } are [4, 3]^t, [4.5, 4]^t, [4.75, 4.5]^t, [4.875, 4.75]^t, [4.9375, 4.875]^t, [4.96875, 4.9375]^t, [4.984375, 4.96875]^t, [4.9921875, 4.984375]^t. Thus the eigenvector, correct up to one decimal place, is [5, 5]^t. Note that

[ 3 ; 1 ] = [ 5 ; 5 ] + [ −2 ; −4 ],

where [−2, −4]^t should be the eigenvector corresponding to the other eigenvalue. And hence the other eigenvalue is 1.
Remark 5.1. Note that since ‖ · ‖ is a continuous function, from (5.5) we have lim_{k→∞} ‖λ1^{−k} B^k z‖ = ‖α1 v1‖, and hence

lim_{k→∞} λ1^{−k} B^k z / ‖λ1^{−k} B^k z‖ = α1 v1 / ‖α1 v1‖ , or lim_{k→∞} B^k z / ‖B^k z‖ = v1 / ‖v1‖ .

Further, since B represents a continuous linear map from R^n to R^n, we have lim_{k→∞} B ( B^k z / ‖B^k z‖ ) = B ( v1 / ‖v1‖ ), or equivalently,

lim_{k→∞} B^{k+1} z / ‖B^k z‖ = λ1 v1 / ‖v1‖ . (5.6)
Note that the first few terms of the sequence { B^{k+1} z / ‖B^k z‖ } for Problem 5.1, with respect to the infinity norm, are

[18 ; 16]/8 = 2.25 [1 ; .88], [38 ; 36]/18 = 2.1111 [1 ; .9474], [78 ; 76]/38 = 2.0526 [1 ; .9744],
[158 ; 156]/78 = 2.0256 [1 ; .9873], [318 ; 316]/158 = 2.0127 [1 ; .9937], [638 ; 636]/318 = 2.006 [1 ; .9969],
[1278 ; 1276]/638 = 2.003 [1 ; .9984].

This shows that the eigenvalue is 2, correct up to two decimal places, and the corresponding unit norm eigenvector is [1 ; 1], correct up to one decimal place.
5.2. QR Decomposition
Let A = [a1, a2, . . . , an] be a non-singular square matrix of order n × n such that a1, a2, . . . , an are the column vectors of A. We apply the Gram-Schmidt process to find an orthonormal basis from the basis {a1, a2, . . . , an}. For this we first consider the unit vector in the direction of a1, e1 = a1/‖a1‖. Next we search for a vector u2 perpendicular to e1 such that span{a1, a2} = span{e1, u2}; this can be obtained by subtracting from a2 the projection of a2 in the direction of a1, that is, (⟨a2, a1⟩/‖a1‖^2) a1 = ⟨a2, e1⟩e1. Thus u2 = a2 − ⟨a2, e1⟩e1. Consider the unit vector in the direction of u2, that is, e2 = u2/‖u2‖. Using ⟨a2, e2⟩ = ⟨u2, e2⟩, we have ⟨a2, e2⟩e2 = ⟨u2, e2⟩e2 = ‖u2‖e2 = u2. Similarly we define e3, e4, . . . , en and get

a1 = ⟨a1, e1⟩e1,
a2 = ⟨a2, e1⟩e1 + ⟨a2, e2⟩e2,
and in general, ak = Σ_{j=1}^{k} ⟨ak, ej⟩ej .

Thus if we consider

Q = [e1, e2, . . . , en] and R = [ ⟨a1, e1⟩ ⟨a2, e1⟩ . . . ⟨an, e1⟩ ; 0 ⟨a2, e2⟩ . . . ⟨an, e2⟩ ; . . . ; 0 0 . . . ⟨an, en⟩ ],

then A = QR.
Note that the norm used here is ‖ · ‖2, which is compatible with the inner product. Moreover, it can also be shown that the QR decomposition of a non-singular square matrix is unique.
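A minimal Python sketch of the Gram-Schmidt construction of Q and R (ours; it takes the columns of a nonsingular A as input):

    def qr_gram_schmidt(cols):
        # cols: the column vectors a_1, ..., a_n of a nonsingular square A.
        def dot(u, v): return sum(x * y for x, y in zip(u, v))
        es, n = [], len(cols)
        R = [[0.0] * n for _ in range(n)]
        for k, a in enumerate(cols):
            u = a[:]
            for j, e in enumerate(es):
                R[j][k] = dot(a, e)                      # <a_k, e_j>
                u = [ui - R[j][k] * ei for ui, ei in zip(u, e)]
            R[k][k] = dot(u, u) ** 0.5                   # ||u_k|| = <a_k, e_k>
            es.append([ui / R[k][k] for ui in u])
        return es, R                                     # columns of Q, and R

    Q, R = qr_gram_schmidt([[-3.0, 6.0, -6.0], [-5.0, 4.0, 2.0], [-8.0, 1.0, 5.0]])
    print(R)   # [[9, 3, 0], [0, 6, 9], [0, 0, 3]], matching Problem 5.2 below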
Problem 5.2. Find the QR decomposition of the matrix

A = [ −3 −5 −8 ; 6 4 1 ; −6 2 5 ].

Solution. Note that a1 = [−3, 6, −6]^t, a2 = [−5, 4, 2]^t, and a3 = [−8, 1, 5]^t. So ‖a1‖ = √(9 + 36 + 36) = 9 and e1 = [−1/3, 2/3, −2/3]^t. Now ⟨a1, e1⟩ = 9, ⟨a2, e1⟩ = 3, ⟨a3, e1⟩ = 0, so that u2 = [−5, 4, 2]^t − 3[−1/3, 2/3, −2/3]^t = [−4, 2, 4]^t. Thus e2 = [−2/3, 1/3, 2/3]^t and ⟨a2, e2⟩ = 6, ⟨a3, e2⟩ = 9. Now u3 = [−8, 1, 5]^t − 0 · e1 − 9[−2/3, 1/3, 2/3]^t = [−2, −2, −1]^t, so that e3 = [−2/3, −2/3, −1/3]^t and ⟨a3, e3⟩ = 3. Thus

Q = (1/3) [ −1 −2 −2 ; 2 1 −2 ; −2 2 −1 ] and R = [ 9 3 0 ; 0 6 9 ; 0 0 3 ].
5.3. QR Algorithm
Let A1 be a square matrix of order n with n distinct eigenvalues λ1, λ2, . . . , λn such that |λ1| > |λ2| > . . . > |λn|. Decompose A1 as A1 = Q1R1, where R1 is an upper triangular matrix and Q1 is an orthogonal matrix. Then define A2 = R1Q1 and, in general, A_{k+1} = R_k Q_k, where A_k = Q_k R_k is the QR decomposition of the kth iterate. Since A_{k+1} = Q_k^t A_k Q_k, all the matrices A_k have the same eigenvalues, and as k grows A_k approaches an upper triangular matrix whose diagonal entries approximate the eigenvalues of A1.
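A small Python sketch of the iteration (ours; it reuses qr_gram_schmidt from the previous sketch and assumes the hypotheses above hold):

    def qr_algorithm(A, steps):
        # A_{k+1} = R_k Q_k where A_k = Q_k R_k; the diagonal approaches the eigenvalues.
        def matmul(X, Y):
            return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
                     for j in range(len(Y[0]))] for i in range(len(X))]
        n = len(A)
        for _ in range(steps):
            cols = [[A[i][j] for i in range(n)] for j in range(n)]  # columns of A_k
            Qcols, R = qr_gram_schmidt(cols)
            Q = [[Qcols[j][i] for j in range(n)] for i in range(n)] # Q as a matrix
            A = matmul(R, Q)
        return A

    A = qr_algorithm([[3.0, -1.0], [2.0, 0.0]], 30)
    print(A[0][0], A[1][1])   # -> about 2 and 1, the eigenvalues found in Problem 5.1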
Problem 5.4. Use Gerschgorin’s theorem to find the location of the eigenvalues of the matrix

[ 1 0 −1 ; 1 −2 1 ; 2 −1 −1 ].

Solution. Let λ be an eigenvalue of the given matrix. According to Gerschgorin’s theorem, λ has to satisfy at least one of the following conditions: |λ − 1| ≤ 1, |λ + 2| ≤ 1 + 1 and |λ + 1| ≤ 2 + 1. Thus all the eigenvalues of the matrix lie within the union of these three disks. Further, if we apply Gerschgorin’s theorem to the transpose of the given matrix, then λ should lie within the union of the disks |λ − 1| ≤ 1 + 2, |λ + 2| ≤ 1, and |λ + 1| ≤ 1 + 1. Thus finally we conclude that all the eigenvalues lie within the intersection of these two unions.
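The disks are immediate to compute; a minimal Python sketch (ours):

    def gerschgorin_disks(A):
        # Disk i: center a_ii, radius = sum of |a_ij| over j != i.
        n = len(A)
        return [(A[i][i], sum(abs(A[i][j]) for j in range(n) if j != i))
                for i in range(n)]

    print(gerschgorin_disks([[1.0, 0.0, -1.0], [1.0, -2.0, 1.0], [2.0, -1.0, -1.0]]))
    # [(1.0, 1.0), (-2.0, 2.0), (-1.0, 3.0)], the three row disks of Problem 5.4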
6 Nonlinear Equation
In the last section we learned to find the solution of a system of linear equations. But in practical
problems it is also very important to find the value (or values) of an unknown satisfying a certain
nonlinear equation.
Question 6.1. Let f be a nonlinear continuous function defined on real line. Can we determine
the values of x satisfying
f (x) = 0, x ∈ R? (6.1)
Remark 6.1. It might be easier to find the solution of f(x) = 0 if we somehow know that this
solution lies in some particular interval. Using the Intermediate Value Theorem, one can find such
an interval [a, b] by determining two points a, b ∈ R such that f(a) and f(b) have different signs.
Now we will discuss some iterative methods to find such a solution.
6.1. Bisection Method
This method is based on the fact that if a continuous function f changes sign at two points, then f must have at least one root in the smallest interval containing these two points. We then subdivide this interval into two subintervals of equal length and search for the subinterval in which the root lies. The method can be described in the following steps.
• First determine an interval [a, b] in which the root lies, which can be done by finding two points a, b such that f(a)f(b) < 0. We consider the middle point x1 of [a, b] as the first approximation to the root. If x1 is the exact root then we are done; otherwise proceed to the next step.
• Next choose the interval [a2, b2] with one endpoint x1 and the other endpoint a or b, according as f(x1)f(a) < 0 or f(x1)f(b) < 0, respectively. Now find the second approximation x2 as the middle point of [a2, b2], so that |x2 − r| ≤ |b2 − a2|/2 = |b − a|/4. If x2 is the exact root then we are done; otherwise proceed to the next step.
• In general, find the kth approximation xk to the root as the middle point of [ak, bk]. If xk is the root, stop the process. If not, find the next interval [ak+1, bk+1] such that one endpoint of this interval is xk and the other endpoint is ak or bk, according as f(xk)f(ak) < 0 or f(xk)f(bk) < 0, respectively. Note that

|xk − r| ≤ |bk − ak|/2 = |b − a|/2^k . (6.2)
In the bisection method we search for the location of a root r of (6.1) lying in the interval [a, b]. The inequality (6.2) shows that the error ek = r − xk satisfies

|ek| ≤ (b − a)/2^k .

Using this inequality we can approach the root as closely as we want by increasing the number of iterates.
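A minimal Python sketch of the bisection method (ours; it assumes f(a)f(b) < 0 on the given interval):

    def bisection(f, a, b, k):
        # After k steps |x_k - r| <= (b - a) / 2^k, as in (6.2).
        for _ in range(k):
            x = (a + b) / 2.0
            if f(x) == 0.0:
                return x
            if f(a) * f(x) < 0.0:
                b = x
            else:
                a = x
        return (a + b) / 2.0

    print(bisection(lambda x: x * x - 2.0, 1.0, 2.0, 30))  # ~1.41421356, root of x^2 - 2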
6.3. The secant method
This method is based upon a linear approximation of the function. First we search for two points near the root of the function, named x0, x1, such that |f(x1)| < |f(x0)|. These points x0, x1 are the first two approximations to the root. The line passing through the two points (x0, f(x0)), (x1, f(x1)) on the graph of the function is called a secant. We consider this secant as the approximating function and find the next iterate x2 as the intersection point of this secant with the real line. One can draw a figure and check that the slope of this secant can be written in two different ways using the similar triangles rule (assuming, without loss of generality, that x1 < x0 and 0 < f(x1) < f(x0)):

f(x0)/(x0 − x2) = (f(x0) − f(x1))/(x0 − x1) . (6.3)

After solving this we get

x2 = x1 − f(x1) (x0 − x1)/(f(x0) − f(x1)) . (6.4)
Now we use x1, x2 to determine the secant as the line passing through (x1, f(x1)), (x2, f(x2)) on the graph of the original function, and consider x3 as the intersection of this secant with the real line. In general, we consider the secant passing through the graphical points (xk−1, f(xk−1)), (xk, f(xk)) and take the next approximation xk+1 as the intersection point of this secant with the real line. We have

xk+1 = xk − f(xk) (xk−1 − xk)/(f(xk−1) − f(xk)) . (6.5)
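A minimal Python sketch of the iteration (6.5) (ours; it can fail if f(xk−1) = f(xk), and, as remarked below, the iterates need not converge):

    def secant(f, x0, x1, steps):
        # Iterates (6.5); the last two points define each new secant.
        for _ in range(steps):
            x0, x1 = x1, x1 - f(x1) * (x0 - x1) / (f(x0) - f(x1))
        return x1

    print(secant(lambda x: x * x - 2.0, 1.0, 2.0, 8))  # ~1.41421356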
Remark 6.2. The sequence of iterates need not converge to a root of the function. In fact it might
even diverge to infinity.

Question 6.2. Then why does one use the secant method instead of the bisection method, which
guarantees convergence?
6.4. Order of convergence and asymptotic error
Suppose the sequence of iterates {x0, x1, . . .} converges to the root r, and for the sequence of errors {ek = r − xk} there exist a number p and a constant C ≠ 0 such that

lim_{k→∞} |ek+1| / |ek|^p = C . (6.6)

Then p is called the order of convergence and C is called the asymptotic error.
6.5. Error analysis of the secant method
Since we do not know the exact position of the root, our treatment of the error will be different from that for the bisection method, where we knew for sure the interval in which the root lies. Let us go back to the theme of the secant method: a linear approximation to the function at each new iterate. In fact, for the linear approximation with nodes xk−1, xk, we can use (2.35) to write the expression for the function as

f(x) = f(xk) + f[xk, xk−1](x − xk) + f[xk, xk−1, x](x − xk)(x − xk−1). (6.7)

If r is the root of the function, that is, f(r) = 0, we have from (6.7)

r = xk+1 − (f[xk, xk−1, r]/f[xk, xk−1]) (r − xk)(r − xk−1). (6.8)

Further, if r − xk = ek denotes the error at the kth iterate, then we get from (6.8) that

ek+1 = − (f[xk, xk−1, r]/f[xk, xk−1]) ek ek−1 . (6.9)

To determine the order of convergence of the secant method we note from (6.9) that

|ek+1| = ck |ek| |ek−1| , (6.10)

where

ck = | f[xk, xk−1, r] / f[xk, xk−1] | . (6.11)

Now if we assume that the sequence of iterates converges to the root r and the function is twice continuously differentiable, we can write

ck = | f''(ηk) / (2 f'(ξk)) | ,

where ηk belongs to the smallest interval containing xk, xk−1, r and ξk belongs to the smallest interval containing xk, xk−1. But because of the convergence of the iterates to the root r these intervals eventually shrink to the single point r as k tends to infinity, and hence we have ck → | f''(r) / (2 f'(r)) |.
Theorem 6.2. Let l0 be a fixed point of g(x). Suppose ε > 0 is such that g is differentiable on [l0 − ε, l0 + ε] and |g'(x)| ≤ α < 1 for all x ∈ [l0 − ε, l0 + ε]. Then the sequence defined by xn = g(xn−1), with x1 ∈ [l0 − ε, l0 + ε], converges to l0.

Proof. To prove this theorem we first prove that g([l0 − ε, l0 + ε]) ⊂ [l0 − ε, l0 + ε]. If x ∈ [l0 − ε, l0 + ε], then |l0 − g(x)| = |g(l0) − g(x)| = |g'(c)| |l0 − x| ≤ α|l0 − x| ≤ αε < ε, hence g(x) ∈ [l0 − ε, l0 + ε]. Now g fulfils the conditions of Theorem 6.1 with the interval [a, b] taken as [l0 − ε, l0 + ε], so we conclude the rest.
Remark 6.4. Suppose g is twice differentiable and g'(r) ≠ 0, where r is the fixed point of g. Then by Taylor’s formula

xk+1 = g(xk) = g(r + (xk − r)) = g(r) + g'(r)(xk − r) + (1/2) g''(c)(xk − r)^2 = r + g'(r)ek + (1/2) g''(c)ek^2 ,

where ek = xk − r. Thus,

|ek+1| / |ek| = | g'(r) + (1/2) g''(c) ek | .

Now if {xk} converges to r, or equivalently ek → 0, we have

lim_{k→∞} |ek+1| / |ek| = |g'(r)| .

This shows that the order of convergence of the fixed point iteration method is 1 and the asymptotic error is |g'(r)|.
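A minimal Python sketch of fixed point iteration (ours; the example g(x) = cos x is a contraction near its fixed point 0.7390851..., so Theorem 6.2 applies):

    import math

    def fixed_point(g, x1, steps):
        # x_n = g(x_{n-1}); converges when |g'(x)| <= alpha < 1 near the fixed point.
        x = x1
        for _ in range(steps):
            x = g(x)
        return x

    print(fixed_point(math.cos, 1.0, 50))  # ~0.7390851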
Remark 6.5. Note that the iterations of Newton’s method are given by

xk+1 = xk − f(xk)/f'(xk) .

If g(x) = x − f(x)/f'(x), then Newton’s method is also a fixed point iteration method. From the last remark the order of convergence of a fixed point iteration method is 1, but we know that the order of convergence of Newton’s method is 2. This apparent ambiguity arises because in Newton’s method g'(r) = 0, which is not the case in general for a fixed point iteration method. Now let us examine the convergence behaviour of the fixed point iteration method when g'(r) = 0 and g''(r) ≠ 0. Suppose g is thrice differentiable; then by Taylor’s formula

xk+1 = g(xk) = g(r + (xk − r)) = g(r) + g'(r)(xk − r) + (1/2) g''(r)(xk − r)^2 + (1/6) g'''(c)(xk − r)^3
= r + (1/2) g''(r)ek^2 + (1/6) g'''(c)ek^3 .

Thus

|ek+1| / |ek|^2 = | (1/2) g''(r) + (1/6) g'''(c) ek | , or lim_{k→∞} |ek+1| / |ek|^2 = | (1/2) g''(r) | .

Hence the convergence is of order 2 with asymptotic error |(1/2) g''(r)|. Since for Newton’s method g(x) = x − f(x)/f'(x), the asymptotic error is |(1/2) g''(r)| = (1/2) |f''(r)| / |f'(r)|.
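A minimal Python sketch of Newton’s method (ours; it assumes f'(xk) ≠ 0 along the iteration):

    def newton(f, fprime, x0, steps):
        # x_{k+1} = x_k - f(x_k)/f'(x_k); order of convergence 2 at a simple root.
        x = x0
        for _ in range(steps):
            x = x - f(x) / fprime(x)
        return x

    print(newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0, 6))  # ~1.41421356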
If the root r of f(x) = 0 has multiplicity m, quadratic convergence can be recovered by the modified iteration

xk+1 = xk − m f(xk)/f'(xk) . (6.23)

Note that this sequence of iterates can be applied only if we know a priori the multiplicity of the root. To get quadratic convergence without knowing the multiplicity of the root, we take g(x) = x − u(x)/u'(x), where u(x) = f(x)/f'(x). Then g(r) = r and g'(r) = 0, and hence the iterates defined by xk+1 = g(xk) converge quadratically irrespective of the multiplicity of the root of f(x) = 0. Thus in general we use the following iteration to get quadratic convergence in the case of roots with multiplicity greater than 1:

xk+1 = xk − f(xk) f'(xk) / ( (f'(xk))^2 − f(xk) f''(xk) ) . (6.24)

Note that in this case we need an extra evaluation of f''(x) in each iteration.
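A small Python sketch of the iteration (6.24) (ours; the guard stops the iteration once the denominator vanishes, which happens when the root is reached exactly):

    def newton_multiple_root(f, f1, f2, x0, steps):
        # Iteration (6.24): Newton applied to u(x) = f(x)/f'(x);
        # quadratic convergence even at multiple roots.
        x = x0
        for _ in range(steps):
            d = f1(x) ** 2 - f(x) * f2(x)
            if d == 0.0:
                break
            x = x - f(x) * f1(x) / d
        return x

    # f(x) = (x - 1)^2 has a double root at 1.
    print(newton_multiple_root(lambda x: (x - 1.0) ** 2,
                               lambda x: 2.0 * (x - 1.0),
                               lambda x: 2.0, 2.0, 5))   # -> 1.0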
6.10. Solution of System of Nonlinear Equations
Consider the system of two nonlinear equations in two unknowns

f(x, y) = 0,
g(x, y) = 0.

Let F : R^2 → R^2 be defined by F(x, y) = [ f(x, y) ; g(x, y) ]. Using Taylor’s series expansion for functions of several variables, near an approximation (xk, yk) to the solution we have 0 = F(x, y) ≈ F(xk, yk) + J(xk, yk) [ x − xk ; y − yk ], where J is the Jacobian matrix of f, g with respect to x, y; solving this linear approximation for (x, y) gives the next approximation. This method is known as Newton’s iteration method for solving a system of nonlinear equations. In general, for a system of n nonlinear equations in n unknowns, we define the iterates as

X_{k+1} = X_k − J_k^{−1} F(X_k), (6.28)

where X_k = [x1^(k), x2^(k), . . . , xn^(k)]^t, F(X) = [f1(X), f2(X), . . . , fn(X)]^t and J_k is the Jacobian of f1, f2, . . . , fn with respect to x1, x2, . . . , xn, evaluated at X_k = [x1^(k), x2^(k), . . . , xn^(k)]^t.
Remark 6.6. Suppose we have a system of two linear equations, say

ax + by = s (6.29)
cx + dy = t. (6.30)

We can view them as f1(x, y) = ax + by − s = 0 and f2(x, y) = cx + dy − t = 0, so that the Jacobian J of f1, f2 with respect to x, y is

J = [ a b ; c d ],

which is constant. Now assume that [x0, y0]^t is the initial approximation to the solution and that J is invertible, so that we can apply Newton’s method to find the first approximation [x1, y1]^t to the solution:

[ x1 ; y1 ] = [ x0 ; y0 ] − [ a b ; c d ]^{−1} [ f1(x0, y0) ; f2(x0, y0) ]. (6.31)

Or,

[ a b ; c d ] [ x1 ; y1 ] = [ a b ; c d ] [ x0 ; y0 ] − [ f1(x0, y0) ; f2(x0, y0) ] = [ ax0 + by0 ; cx0 + dy0 ] − [ ax0 + by0 − s ; cx0 + dy0 − t ] = [ s ; t ]. (6.32)

This shows that when we apply Newton’s method to linear equations, the very first iterate is already the exact solution.
Remark 6.7. We can find the complex roots of an equation f (z) = 0 by finding the real and
imaginary parts of f (z) = f (x + iy) as f (x + iy) = u(x, y) + iv(x, y) and then solving the system
of equations as
u(x, y) = 0; v(x, y) = 0.
Problem 6.1. Find the solution of the following system of equations

x^2 + 10x + 2y − 13 = 0; x^2 + 6y^2 − 7 = 0,

using the initial approximation x0 = 0.5, y0 = 0.5.

Solution. Let f(x, y) = x^2 + 10x + 2y − 13 and g(x, y) = x^2 + 6y^2 − 7. Thus the Jacobian is

J = [ 2x + 10 2 ; 2x 12y ] , J^{−1} = ( 1/(24xy + 120y − 4x) ) [ 12y −2 ; −2x 2x + 10 ].

At (x0, y0) = (0.5, 0.5) we have f = −6.75, g = −5.25, and

[ x1 ; y1 ] = [ x0 ; y0 ] − J^{−1}(x0, y0) [ f(x0, y0) ; g(x0, y0) ] = [ 0.5 ; 0.5 ] − (1/64) [ 6 −2 ; −1 11 ] [ −6.75 ; −5.25 ] = [ 0.96875 ; 1.296875 ].

Repeating the same step,

[ x2 ; y2 ] = [ x1 ; y1 ] − J^{−1}(x1, y1) [ f(x1, y1) ; g(x1, y1) ] ≈ [ 0.994259 ; 1.034757 ],

and the next iterate is

[ x3 ; y3 ] = [ x2 ; y2 ] − J^{−1}(x2, y2) [ f(x2, y2) ; g(x2, y2) ] ≈ [ 0.999902 ; 1.000602 ].

Thus the solution, correct up to two decimal places, is x = 1.00, y = 1.00; indeed (1, 1) solves the system exactly.
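A minimal Python sketch of Newton’s iteration (6.28) for this problem, with the 2 × 2 Jacobian inverted directly (the code is ours and reproduces the iterates above):

    def newton_system(x, y, steps):
        # Newton's iteration for f = x^2 + 10x + 2y - 13, g = x^2 + 6y^2 - 7.
        for _ in range(steps):
            f = x * x + 10.0 * x + 2.0 * y - 13.0
            g = x * x + 6.0 * y * y - 7.0
            det = (2.0 * x + 10.0) * 12.0 * y - 4.0 * x   # det J = 24xy + 120y - 4x
            x, y = (x - (12.0 * y * f - 2.0 * g) / det,
                    y - (-2.0 * x * f + (2.0 * x + 10.0) * g) / det)
        return x, y

    print(newton_system(0.5, 0.5, 3))   # -> approximately (0.99990, 1.00060)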