Chapter 1: Introduction and Mathematical Preliminaries
Evy Kersalé
Motivation
Most of the mathematical problems you have encountered so far can be solved analytically. However, in real life, analytic solutions are rather rare, and we must therefore devise ways of approximating the solutions.
For example, while $\int_1^2 e^x\,dx$ has a well-known analytic solution, $\int_1^2 e^{x^2}\,dx$ can only be solved in terms of special functions and $\int_1^2 e^{x^3}\,dx$ has no analytic solution.
Definition
Traditionally, numerical algorithms are built upon the simplest arithmetic operations (+, −, × and ÷).
Computer arithmetic: Floating point numbers.
Computers can store integers exactly but not real numbers in general. Instead, they approximate them as floating point numbers.
A decimal floating point (or machine number) is a number of the form
$\pm\,\underbrace{0.d_1 d_2 \ldots d_k}_{m} \times 10^{\pm n}, \qquad 0 \le d_i \le 9, \quad d_1 \ne 0,$
where the significand or mantissa m (i.e. the fractional part) and the exponent n are fixed-length integers. (m cannot start with a zero.)
In fact, computers use binary numbers (base 2) rather than decimal numbers (base 10), but the same principle applies (see handout).
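Not part of the original notes: as a rough illustration, the Python sketch below splits a number into the normalised decimal mantissa and exponent of the form above. The function name decimal_float and the use of Python's decimal module are choices made here for the example only.

```python
from decimal import Decimal

def decimal_float(x, k):
    """Return (sign, digits, n) so that x ~ sign * 0.d1...dk * 10**n, with d1 != 0."""
    if x == 0:
        return 1, "0" * k, 0
    sign = 1 if x > 0 else -1
    t = Decimal(str(abs(x))).normalize().as_tuple()
    digits = "".join(map(str, t.digits))
    mantissa = digits[:k].ljust(k, "0")   # keep k digits (chopping)
    n = len(t.digits) + t.exponent        # exponent such that the mantissa reads 0.d1...dk
    return sign, mantissa, n

print(decimal_float(3.14159265, 5))   # (1, '31415', 1),  i.e. +0.31415 x 10^1
print(decimal_float(0.00123, 3))      # (1, '123', -2),   i.e. +0.123   x 10^-2
```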
Computer arithmetic: Machine ε.
Consider a simple computer where m is 3 digits long and n is one digit long. The smallest positive number this computer can store is 0.1 × 10⁻⁹ and the largest is 0.999 × 10⁹.
Thus, the length of the exponent determines the range of numbers that can be stored.
However, not all values in the range can be distinguished: numbers can only be recorded to a certain relative accuracy ε.
For example, on our simple computer, the next floating point number after 1 = 0.1 × 10¹ is 0.101 × 10¹ = 1.01. The quantity ε_machine = 0.01 (machine ε) is the worst relative uncertainty in the floating point representation of a number.
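As an aside (not in the original slides), the same idea can be checked on an actual computer, which works in binary double precision: the gap between 1.0 and the next representable number is the machine epsilon, about 2.2 × 10⁻¹⁶. The sketch assumes Python 3.9+ for math.nextafter; the last line mimics the toy 3-digit decimal machine.

```python
import math
import sys

# Gap between 1.0 and the next representable double = machine epsilon (binary, 52-bit mantissa).
print(math.nextafter(1.0, 2.0) - 1.0)   # ~2.220446049250313e-16
print(sys.float_info.epsilon)           # the same value, as reported by the runtime

# The toy 3-digit decimal machine of the slide: the number after 0.100 x 10^1 is 0.101 x 10^1,
# so its machine epsilon is 0.01.
print(0.101e1 - 0.100e1)                # ~0.01 (up to binary round-off)
```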
Chopping and rounding
There are two ways of terminating the mantissa of the k-digit decimal machine number approximating $0.d_1 d_2 \ldots d_k d_{k+1} d_{k+2} \ldots \times 10^n$, $0 \le d_i \le 9$, $d_1 \ne 0$:
▶ Chopping: chop off the digits $d_{k+1}, d_{k+2}, \ldots$ to get $0.d_1 d_2 \ldots d_k \times 10^n$.
▶ Rounding: add $5 \times 10^{n-(k+1)}$ and chop off the digits $d_{k+1}, d_{k+2}, \ldots$ (If $d_{k+1} \ge 5$ we add 1 to $d_k$ before chopping.)
Rounding is more accurate than chopping.
Example
The five-digit floating-point form of π = 3.14159265359… is 0.31415 × 10 using chopping and 0.31416 × 10 using rounding.
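A possible sketch of k-digit chopping and rounding, reproducing the π example above (not from the notes; terminate is a hypothetical helper name, and Python's decimal module is used so the decimal digits are handled exactly).

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP
import math

def terminate(x, k, mode):
    """k-digit decimal floating point form of x > 0, obtained by chopping or rounding."""
    d = Decimal(str(x))
    n = math.floor(math.log10(x)) + 1            # exponent putting the mantissa in [0.1, 1)
    mantissa = (d / Decimal(10) ** n).quantize(
        Decimal(1).scaleb(-k),                   # keep k digits after the decimal point
        rounding=ROUND_DOWN if mode == "chop" else ROUND_HALF_UP)
    return mantissa, n

print(terminate(math.pi, 5, "chop"))    # (Decimal('0.31415'), 1)
print(terminate(math.pi, 5, "round"))   # (Decimal('0.31416'), 1)
```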
Measure of the error
Let p* be the result of a numerical calculation and p the exact answer (i.e. p* is an approximation to p). We define two measures of the error:
▶ Absolute error: E = |p − p*|
▶ Relative error: Er = |p − p*|/|p| (provided p ≠ 0), which takes into consideration the size of the value.
Example
If p = 2 and p* = 2.1, the absolute error is E = 10⁻¹; if p = 2 × 10⁻³ and p* = 2.1 × 10⁻³, E = 10⁻⁴ is smaller; and if p = 2 × 10³ and p* = 2.1 × 10³, E = 10² is larger; but in all three cases the relative error remains the same, Er = 5 × 10⁻².
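A tiny numerical check of this example, assuming nothing beyond the definitions above (abs_err and rel_err are illustrative names, not from the notes):

```python
def abs_err(p, p_star):
    return abs(p - p_star)

def rel_err(p, p_star):
    return abs(p - p_star) / abs(p)     # only defined for p != 0

for p, p_star in [(2, 2.1), (2e-3, 2.1e-3), (2e3, 2.1e3)]:
    print(abs_err(p, p_star), rel_err(p, p_star))
# The absolute errors (0.1, 1e-4, 100) differ widely,
# but the relative error is about 0.05 in all three cases (up to binary round-off).
```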
Round-off errors
Example
The 4-digit representation of x = √2 = 1.4142136… is x* = 1.414 = 0.1414 × 10. Using 4-digit arithmetic, we can evaluate x*² = 1.999 ≠ 2, due to round-off errors.
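One way to reproduce the 4-digit arithmetic of this example (an emulation chosen for the sketch, not part of the notes) is Python's decimal module with the precision set to 4 significant digits:

```python
from decimal import Decimal, getcontext

getcontext().prec = 4          # emulate 4-significant-digit arithmetic
x_star = Decimal(2).sqrt()     # the 4-digit representation of sqrt(2)
print(x_star)                  # 1.414
print(x_star * x_star)         # 1.999, not 2, because of round-off
```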
Magnification of the error.
Consider the sum x + y computed from the stored values x* and y*, each carrying a relative round-off error of at most ε. The absolute error of the result is then the sum of the errors in x and y, E = ε(|x*| + |y*|), but the relative error of the answer is
$$E_r = \varepsilon\,\frac{|x^*| + |y^*|}{|x^* + y^*|}.$$
If x* and y* both have the same sign the relative accuracy remains equal to ε, but if they have opposite signs the relative error will be larger.
This magnification becomes particularly significant when two very close numbers are subtracted.
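A small illustration of this magnification (the particular numbers are chosen here for the sketch and are not from the notes): each value is stored to about 7 significant figures, yet the relative error of the sum is orders of magnitude larger because the leading digits cancel.

```python
# Each value is stored to about 7 significant figures (relative error ~ 1e-7).
x = 1.2345678901234
y = -1.2345

x_star = float(f"{x:.7g}")    # stored value: 1.234568
y_star = float(f"{y:.7g}")    # stored value: -1.2345 (already short, no extra error)

exact = x + y
computed = x_star + y_star
print(exact, computed)
print(abs(computed - exact) / abs(exact))   # ~1e-3: far larger than the ~1e-7 storage error
```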
Magnification of the error: Example
The example is the forward-difference approximation of a derivative, f'(x0) ≈ (f(x0 + h) − f(x0))/h, which requires subtracting two nearly equal numbers when h is small.
Truncation errors: Example continued
Using a finite value of h leads to a truncation error of size ≃ (h/2)|f''(x0)|.
The absolute round-off error in f(x0 + h) − f(x0) is 2ε|f(x0)| and that in the derivative (f(x0 + h) − f(x0))/h is (2ε/h)|f(x0)|.
So, the round-off error increases with decreasing h.
The total relative error of the computed derivative is therefore
$$E_r = \frac{h}{2}\,\frac{|f''|}{|f'|} + \frac{2\varepsilon}{h}\,\frac{|f|}{|f'|},$$
which has a minimum, $\min(E_r) = 2\sqrt{\varepsilon}\,\sqrt{|f f''|}\,/\,|f'|$, for $h_m = 2\sqrt{\varepsilon}\,\sqrt{|f/f''|}$.
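A sketch of this trade-off, using f(x) = eˣ at x0 = 1 (a choice made for the illustration, not from the notes); since f = f' = f'' for the exponential, the predicted optimal step is h_m ≈ 2√ε ≈ 3 × 10⁻⁸.

```python
import math

f = math.exp                  # for f(x) = e**x we have f = f' = f'', so h_m = 2*sqrt(eps)
x0 = 1.0
exact = math.exp(x0)          # the exact derivative f'(x0)

eps = 2.0 ** -52              # machine epsilon of double precision
h_m = 2.0 * math.sqrt(eps)    # predicted optimal step, about 3e-8

for h in (1e-2, 1e-4, 1e-6, h_m, 1e-10, 1e-12):
    approx = (f(x0 + h) - f(x0)) / h                 # forward difference
    print(f"h = {h:.1e}   relative error = {abs(approx - exact) / exact:.1e}")
# The error first falls with h (truncation ~ h/2) and then rises again (round-off ~ 2*eps/h),
# with the smallest error near h = h_m.
```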