
Numerical Methods and Computation

MTL 107

Harish Kumar

Dept. of Mathematics, IIT Delhi.


E-mail: hkumar@maths.iitd.ac.in

Lecture: Introduction

Outline of today’s Lecture

▸ What are Numerical Methods?
▸ Lecture Planning
▸ Tutorials and Exams
▸ References
▸ Start of the Lecture

Science and Technology

Figure: Experiments and Theory

Since 1980 ...

Figure: Experiments, Theory, and Computing (after access to computing)



Course Plan

▸ Introduction
▸ Calculus and Linear Algebra Review
▸ Roundoff Errors
▸ Nonlinear Equations in One Variable
▸ Direct Methods for Linear Systems
▸ Iterative Methods for Linear Systems
▸ Iterative Methods for Systems of Nonlinear Equations

Course Plan...

▸ Least Squares Approximation
▸ Numerical Differentiation
▸ Numerical Integration
▸ Numerical Methods for Ordinary Differential Equations: Initial Value Problems

Focus of the Course

▸ Mathematical foundation of the algorithms.
▸ Scope and limitations of algorithms.
▸ Efficient implementation of algorithms in Matlab.
▸ Numerical experiments.
▸ No focus on hardware-related issues like parallelization, vectorization, etc.

Aim of the Course

▸ Knowledge of basic algorithms in Computational Mathematics.
▸ Ability to analyze numerical algorithms and their errors.
▸ Ability to carry out a proper error analysis.
▸ Ability to choose appropriate numerical algorithms for a given problem.
▸ Efficient implementation of algorithms in Matlab.

Prerequisites

▸ MAL 100: Calculus.
▸ MAL 101: Linear Algebra and Differential Equations.
Revise them.

Literature

▸ Uri Ascher & Chen Greif: A First Course in Numerical Methods. SIAM, 2011. See http://www.siam.org/books/cs07/
▸ Richard L. Burden & J. Douglas Faires: Numerical Analysis (9th ed.), 2011.
▸ C. Moler: Numerical Computing with Matlab. SIAM, 2004. See http://www.mathworks.in/moler/
▸ Alfio Quarteroni, Riccardo Sacco & Fausto Saleri: Numerical Mathematics (2nd ed.). Springer, 2007.
▸ Samuel D. Conte & Carl de Boor: Elementary Numerical Analysis: An Algorithmic Approach.

Organisation

▸ Prof. Harish Kumar (hkumar@maths.iitd.ac.in) MZ 183


Coordinator (K Slot)
▸ Prof. Kamana Porwal (kamana@iitd.ac.in) MZ 196 (H Slot)

Evaluation Plan

1. Minor: 40 marks
2. Assignments + Quizzes: 20 marks
3. Major: 40 marks

Attendance Requirements

Minimum attendance required is 75%.

Attendance below this will be penalized in accordance with the institute rules.

MATLAB

▸ We will use Matlab for the implementation of algorithms.
▸ We will organize a small introduction to Matlab in an extra class.
▸ Several Matlab resources will be made available on the course website.
▸ We expect you to be able to code in Matlab after the first 2-3 weeks, so familiarize yourself with Matlab.
▸ For Python you can use: https://pythonnumericalmethods.studentorg.berkeley.edu/notebooks/Index.html

Calculus Review

▸ Limit
▸ Continuity
▸ Differentiability
▸ Riemann Integrability
▸ ... and related basic results about real-valued functions.

Calculus Review

Theorem (Rolle’s Theorem)


Assume f ∈ C[a, b] and f is differentiable on (a, b). If f(a) = f(b), then there exists c ∈ (a, b) such that f′(c) = 0.

Calculus Review

Theorem (Mean Value Theorem)


Assume f ∈ C[a, b] and f is differentiable on (a, b). Then there exists c ∈ (a, b) such that

f′(c) = (f(b) − f(a)) / (b − a).

Calculus Review

Theorem (Extreme Value Theorem)


If f ∈ C[a, b], then there exist c1, c2 ∈ [a, b] such that f(c1) ≤ f(x) ≤ f(c2) for all x ∈ [a, b]. Furthermore, if f is differentiable on (a, b), then c1 and c2 either occur at the endpoints of [a, b] or are zeros of f′.

Calculus Review

Theorem (Intermediate Value Theorem)


If f ∈ C[a, b] and C is any value between f(a) and f(b), then there exists c ∈ (a, b) such that f(c) = C.

Calculus Review

Theorem (Weighted Mean Value Theorem)


Assume that f ∈ C[a, b], g is Riemann integrable on [a, b], and g does not change sign on [a, b]. Then there exists c ∈ (a, b) such that

∫_a^b f(x)g(x) dx = f(c) ∫_a^b g(x) dx.

Calculus Review

Theorem (Taylor’s Theorem)


Assume that f ∈ C^n[a, b], f^(n+1) exists on [a, b], and x0 ∈ [a, b]. Then for every x ∈ [a, b] there exists a number x1(x) such that

f(x) = f(x0) + f′(x0)(x − x0) + ⋯ + (f^(n)(x0)/n!)(x − x0)^n + Rn,

where Rn = (f^(n+1)(x1)/(n+1)!)(x − x0)^(n+1).
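
As a quick illustration (a sketch, not from the original slides), the following Matlab lines check the remainder for f = sin about x0 = 0. Since the x⁴ term of sin vanishes, x − x³/6 is also the degree-4 Taylor polynomial, so ∣R∣/x⁵ should approach 1/120 ≈ 8.33 × 10^−3.

f = @(x) sin(x);
p = @(x) x - x.^3/6;        % Taylor polynomial of sin about x0 = 0
for x = [1 0.1 0.01]
    R = abs(f(x) - p(x));   % actual remainder
    fprintf('x = %5.2f   |R| = %e   |R|/x^5 = %e\n', x, R, R/x^5);
end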

Lecture 2
Introduction to the Numerical Methods

Numerical Methods and Errors

▸ The presence of error is inevitable.
▸ Computed results are only approximations.
▸ The goal is to ensure that the error stays below a tolerance level.

Measure of Errors

▸ Absolute error: if v is an approximation of u, then the absolute error is

∥u − v∥

▸ Relative error:

∥u − v∥ / ∥u∥

▸ Often a combination of both is used.

Example

▸ The Stirling approximation

v = Sn = √(2πn) (n/e)^n

is used to approximate u = n!.
▸ Use Example1_1.m from http://www.siam.org/books/cs07/programs.zip
▸ Compare absolute and relative errors for n = 1, ⋯, 10.

Example: The Stirling Approximation

% Example 1.1: Stirling approximation
format long;
e = exp(1);
n = 1:20;                        % array
Sn = sqrt(2*pi*n).*((n/e).^n);   % the Stirling approximation
factn = factorial(n);

abserr = abs(Sn - factn);        % absolute error
relerr = abserr./factn;          % relative error

format short g
[n; factn; Sn; abserr; relerr]'  % print out values

Types of errors

▸ Modelling errors: wrong assumptions, wrong simplifications, wrong input data.
▸ Approximation errors: wrong discretization, divergent method, unstable method.
▸ Roundoff errors: due to the computer implementation of algorithms and finite memory.

Discretization Error

▸ Let f be a smooth function; then by Taylor's theorem:

f(x0 + h) = f(x0) + h f′(x0) + (h²/2) f″(ξ),   x0 < ξ < x0 + h.

▸ Rearranging:

f′(x0) = (f(x0 + h) − f(x0))/h − (h/2) f″(ξ)

▸ The discretization error is

∣f′(x0) − (f(x0 + h) − f(x0))/h∣ = (h/2)∣f″(ξ)∣ ≈ (h/2)∣f″(x0)∣ = O(h).

Discretization Error

▸ Let f(x) = sin(x) and calculate its derivative at x0 = 1.2 (cos(1.2) = 0.362357754476674).
▸ Error:

h       Absolute Error
0.1     4.71667 × 10^−2
0.01    4.666196 × 10^−3
0.001   4.660799 × 10^−4
10^−4   4.660256 × 10^−5
10^−7   4.619326 × 10^−8

▸ Note that f″(x0)/2 ≈ −0.466, so the error behaves like 0.466 h.
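
The table can be reproduced with a few lines of Matlab; this is a sketch using only base Matlab, applying the forward difference to f = sin at x0 = 1.2:

x0 = 1.2; fp = cos(x0);                       % exact derivative of sin
for h = [0.1 0.01 0.001 1e-4 1e-7]
    err = abs(fp - (sin(x0+h) - sin(x0))/h);  % forward difference error
    fprintf('h = %8.1e   error = %e\n', h, err);
end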

What Error is this?

▸ Error for very small h:

h       Absolute Error
10^−8   4.36105 × 10^−10
10^−9   5.594726 × 10^−8
10^−10  1.669696 × 10^−7
10^−11  7.938531 × 10^−6
10^−13  6.851746 × 10^−4
10^−15  8.173146 × 10^−2
10^−16  3.623578 × 10^−1

Results

Figure: Error with decreasing h (absolute error vs. h on a log-log scale)

Properties of Algorithms

▸ Accuracy: how accurate is the algorithm for nice inputs?
▸ Efficiency: how fast we get the result; the number of floating point operations (flops) and the amount of memory needed.
▸ Robustness and stability: the algorithm should produce results under reasonable circumstances, and the errors should be acceptable.

Complexity

▸ Complexity or computational cost is the number of elementary operations in an algorithm.

Operation        Description                          #m/d    #a/s       Complexity
inner product    (x ∈ R^n, y ∈ R^n) → x⊺y             n       n−1        O(n)
tensor product   (x ∈ R^n, y ∈ R^m) → xy⊺             nm      0          O(mn)
matrix product   (A ∈ R^(m×n), B ∈ R^(n×k)) → AB      mnk     mn(k−1)    O(mnk)
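
A rough timing experiment (a sketch, not from the slides; absolute times depend on the machine and the BLAS) makes the O(mnk) cost visible: for square matrices, doubling n should increase the runtime by roughly a factor of 8.

for n = [200 400 800]
    A = randn(n); B = randn(n);
    t = tic; C = A*B; s = toc(t);   % n-by-n product, about 2n^3 flops
    fprintf('n = %4d   time = %8.4f s\n', n, s);
end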

Big-O

▸ f = O(g) if

lim sup_{n→∞} ∣f(n)∣/∣g(n)∣ < ∞

▸ For errors: f = O(h^p) if

lim sup_{h→0} ∣e∣/h^p < ∞

Small-o

▸ f = o(g) if

lim_{n→∞} ∣f(n)∣/∣g(n)∣ = 0

▸ For errors: f = o(h^p) if

lim_{h→0} ∣e∣/h^p = 0

Θ notation
The Θ notation signifies a stronger relation than O: a function ϕ(h) for small h is Θ(ψ(h)) if ϕ is asymptotically bounded both above and below by ψ.

More precisely: f = Θ(g) if

0 < lim inf_{n→∞} ∣f(n)∣/∣g(n)∣ ≤ lim sup_{n→∞} ∣f(n)∣/∣g(n)∣ < ∞

Difference: O(h²) means at least quadratic, whereas Θ(h²) means exactly quadratic convergence.

Problem Conditioning

▸ A problem is ill-conditioned if a small perturbation in the data produces a large difference in the result.
▸ An algorithm is stable if the output doesn't change much under small changes in the input.

Stable and Unstable Algorithms

▸ Stable Algorithm

An instance of a stable algorithm for computing y = g(x): the output ȳ is the exact result, ȳ = g(x̄), for a slightly perturbed input x̄ which is close to the input x. Thus, if the algorithm is stable and the problem is well-conditioned, then the computed result ȳ is close to the exact y.

Figure: Stable Algorithm

Stable and Unstable Algorithms

▸ Unstable Algorithm

An ill-conditioned problem of computing output values y from input values x by y = g(x): when x is slightly perturbed to x̄, the result ȳ = g(x̄) is far from y.

Figure: Unstable Algorithm

Example of Unstable Algorithm
Evaluate the integrals

yn = ∫_0^1 x^n/(x + 10) dx,   for n = 0, 1, 2, ⋯, 30.

Note that

yn + 10 y_{n−1} = ∫_0^1 (x^n + 10x^(n−1))/(x + 10) dx = ∫_0^1 x^(n−1) dx = 1/n

and

y0 = ∫_0^1 1/(x + 10) dx = log(11) − log(10).

▸ Evaluate y0 = log(11) − log(10)
▸ Calculate yn = 1/n − 10 y_{n−1}

See Example1.6.m

Example of Unstable Algorithm

% Example 1.6: Evaluate integral recursive formula
y(1) = log(11) - log(10);                 % this is y_0
for n = 1:30
    y(n+1) = 1/n - 10*y(n);
end
% For comparison, use numerical quadrature
for n = 1:31
    z(n) = quad(@(x) fun16(x, n-1), 0, 1, 1.e-10);
end
format long g
fprintf('recursion result   quadrature result   abs(difference)\n')
for n = 1:31
    fprintf('%e   %e   %e\n', y(n), z(n), abs(y(n) - z(n)))
end

Error in Stable/Unstable Algorithms

▸ In general, if En is the error after n elementary operations, it should behave like

En ≈ c0 n E0

▸ If the error grows exponentially,

En ≈ c1^n E0,

then the algorithm is unstable.

Lecture 3
Rounding Errors

In this Section
▸ Understand how numbers are stored in a computer.
▸ How roundoff errors can accumulate.
▸ Some recipes to avoid them.

Introduction I

One of the important tasks of numerical mathematics is the determination of the accuracy of the results of some computation. There are three types of errors that limit accuracy:
1. Errors in the mathematical model of the problem to be solved. Simplified models are easier to solve (shape of objects, 'unimportant' chemical reactants, linearisation).
2. Discretization or approximation errors, which depend on the chosen algorithm or the type of discretization. They may occur even when computing without rounding errors, e.g. when
▸ an infinite process is approximated by a finite number of terms (truncation error),
▸ a function is approximated by, e.g., a piecewise linear function,
▸ a derivative is approximated by finite differences,
▸ only a finite number of iterations is carried out.

Introduction II

3. Rounding errors occur if a real number (possibly an intermediate result of some computation) is rounded to the nearest machine number. The propagation of rounding errors from one floating point operation to the next is the most frequent source of numerical instabilities. Since computer memory is finite, practically no real number can be represented exactly in a computer.

We discuss floating point numbers as a representation of real numbers.

Computation of Pi
Motivating example: quadrature of the circle. Let's try to compute π, which is the area of a circle with radius r = 1. We approximate π by the area of an inscribed regular polygon.

Figure: Area of the circle approximated by an inscribed regular polygon, with αn := 2π/n, Fn = cos(αn/2) sin(αn/2), and An = nFn → π as n → ∞.

[See Gander, Gander & Kwok: Scientific Computing. Springer.]

Computation of Pi

▸ Define αn = 2π/n; then the area of one triangle is Fn = cos(αn/2) sin(αn/2).
▸ The area An covered by rotating this triangle n times is An = n cos(αn/2) sin(αn/2).
▸ An → π as n → ∞.
▸ An = (n/2) sin(2π/n) = (n/2) sin(αn)
▸ sin(αn/2) = √((1 − cos αn)/2) = √((1 − √(1 − sin² αn))/2)
▸ sin(α6) = √3/2

Computation of Pi

clear all;
clc;
s = sqrt(3)/2; A = 3*s; n = 6;       % initialization
z = [A-pi n A s];                    % store the results
while s > 1e-10                      % terminate if s = sin(alpha) is small
    s = sqrt((1 - sqrt(1 - s*s))/2); % new sin(alpha/2) value
    n = 2*n; A = n/2*s;              % A = new polygonal area
    z = [z; A-pi n A s];
end
for i = 1:length(z)
    fprintf('%10d %20.15f %20.15f %20.15f\n', z(i,2), z(i,3), z(i,1), z(i,4))
end

Integers

Integers also suffer from the finiteness of computers. Matlab represents integers by 32-bit signed int's (in the two's complement format):

a = Σ_{i=0}^{30} a_i 2^i              if a31 = 0,
a = −(2^32 − Σ_{i=0}^{30} a_i 2^i)    if a31 = 1.

Therefore the range is −2147483648 ≤ a ≤ 2147483647. These numbers are given by intmin and intmax, respectively.
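
A small sketch of this in Matlab; note that Matlab's integer arithmetic saturates at the endpoints of the range instead of wrapping around:

a = intmin('int32')   % -2147483648
b = intmax('int32')   %  2147483647
b + 1                 % saturates: the result is still 2147483647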

Real Numbers

A number x ∈ R (in the binary number system) has the form

x = ±(1.d1 d2 d3 ⋯ dt−1 dt dt+1 ⋯) × 2^e,

where e is an integer exponent. The binary digits di are either 0 or 1, so

1.d1 d2 d3 ⋯ = 1 + d1/2 + d2/2² + d3/2³ + ⋯

In general, infinitely many digits are needed to represent a real number.

The choice of a binary representation is just one of many possibilities. It is, indeed, a convenient choice when it comes to computers.

Examples

1. −(1.101)_2 × 2 = −(1 + 1/2 + 1/8) × 2 = −3.25
2. (10011.01)_2 = 19.25
3. (0.010101⋯)_2 = 1/3
4. (0.00110011⋯)_2 = 1/5 = 0.2

The last example is of interest insofar as it shows that a finite decimal number may correspond to a (nontrivial) infinite binary representation. (This is not true the other way round. Why?) So one cannot assume that a finite decimal number is exactly representable on a binary computer.
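
This is easy to observe in Matlab; a short sketch showing that 0.2 is stored inexactly as a double:

fprintf('%.20f\n', 0.2)   % prints 0.20000000000000001110..., not 0.2
0.1 + 0.2 == 0.3          % false, because of representation errors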

Floating Point representation

Given any real number

x = ±(1 + d1/2 + d2/2² + ⋯ + dt/2^t + ⋯) × 2^e
  = ±(1.d1 d2 d3 ⋯ dt−1 dt dt+1 ⋯) × 2^e

with di ∈ {0, 1}, we want to represent this number in the computer as

fl(x) = sign(x) × (1.d̃1 d̃2 ⋯ d̃t−1 d̃t) × 2^e.

So we need to store the sign, t bits (t digits), and the exponent e.

Floating point systems

Definition (Floating point system)
A floating point system can be characterized by a 4-tuple (β, t, L, U), where
▸ β is the base of the number system,
▸ t is the number of digits (precision),
▸ L is the lower bound on the exponent e,
▸ U is the upper bound on the exponent e.

So,

fl(x) = ±(d0/β^0 + d1/β^1 + ⋯ + dt−1/β^(t−1)) × β^e

▸ Single precision: β = 2, t = 23, L = −126, U = 127
▸ Double precision: β = 2, t = 52, L = −1022, U = 1023
▸ The sign is stored separately.

Error in Floating point representation

How big is the relative error

∣fl(x) − x∣ / ∣x∣

of the floating point representation?

Floating point systems: Rounding Unit and Significant Digits

The maximal relative error in a floating point representation is called the rounding unit, machine precision, or machine epsilon. For a general floating point system the rounding unit is

η = (1/2) β^(1−t)

In that case, t − 1 is called the number of significant digits.

Error in Floating point representation

▸ Chopping:

fl(x) = ±(1.d1 d2 d3 ⋯ dt−1 dt) × 2^e

The absolute error is bounded by 2^−t ⋅ 2^e.
▸ Rounding:

fl(x) = ±(1.d1 d2 d3 ⋯ dt−1 dt) × 2^e           if 0.dt+1 dt+2 ⋯ < 1/2,
fl(x) = ±(1.d1 d2 d3 ⋯ dt−1 dt + 2^−t) × 2^e    if 0.dt+1 dt+2 ⋯ > 1/2.

The absolute error is bounded by (1/2) 2^−t ⋅ 2^e, and the relative error is bounded by the rounding unit (machine precision)

η = (1/2) 2^−t

(EXERCISE)

Errors for general floating point systems

Theorem
Let x → fl(x) = g × β^e, where x ≠ 0 and g is a normalized, signed mantissa. Then the absolute error committed in using the floating point representation of x is bounded by

∣x − fl(x)∣ ≤ β^(1−t) β^e          for chopping,
∣x − fl(x)∣ ≤ (1/2) β^(1−t) β^e    for rounding.

The relative errors are:

∣x − fl(x)∣/∣x∣ ≤ β^(1−t)          for chopping,
∣x − fl(x)∣/∣x∣ ≤ (1/2) β^(1−t)    for rounding.

IEEE floating point numbers

ANSI/IEEE Standard 754-1985 for Binary Floating Point Arithmetic. According to the IEEE standard, a 32-bit float has the following structure (from en.wikipedia.org):

Figure: 32-bit number — the exponent has 8 bits, the mantissa 23 bits, and there is a sign bit.

The value of a normalized 32-bit IEEE floating point number V is

V = (−1)^S × 2^(E−127) × (1.M)

Normalized means 0 < E < 255 = 2^8 − 1. (127 is called a bias.)

Double

double:
▸ 1 sign bit
▸ 11 bits exponent
▸ 52 bits mantissa
The value of a normalized 64-bit IEEE floating point number V is

V = (−1)S × 2E −1023 × (1.M)

Normalized means that 0 < E < 2047 = 2^11 − 1.
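
In Matlab the stored bit fields of a double can be inspected with num2hex; a small sketch (the expected hex strings can be verified by hand from the formula above):

num2hex(1.5)   % '3ff8000000000000': S = 0, E = 1023 = 0x3FF, M = 100...0
num2hex(-2)    % 'c000000000000000': S = 1, E = 1024 = 0x400, M = 0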

Special Numbers

If the exponent contains only zeros or only ones, there are special cases:
▸ 0 (zero): E = 0, M = 0, sign arbitrary.
▸ −Infinity, +Infinity: E = all ones, M = 0.
▸ NaN: E = all ones, M ≠ 0.
There are also non-normalized (denormalized) numbers.

Rounding errors in IEEE

Parameters of the IEEE standard arithmetics with base β = 2:

Precision   t    emin     emax    η
Single      23   −125     128     2^−24 ≈ 6 ⋅ 10^−8
Double      52   −1021    1024    2^−53 ≈ 1.1 ⋅ 10^−16
Extended    63   −16381   16384   2^−64 ≈ 5 ⋅ 10^−20

Table: Rounding unit in various precisions

Lemma (EXERCISE)
If x ≠ 0 is a normalized floating point number and fl(x) is obtained by rounding with t digits, then

∣fl(x) − x∣ ≤ 2^(e−t)/2,   ∣fl(x) − x∣/∣x∣ ≤ 2^−t/2 ≡ η.

Rounding errors

We assume that all numbers are normalized. Let t be the length of the mantissa. Between powers of 2, the floating point numbers are equidistant.

Figure: Distribution of floating point numbers; here the length of the mantissa is t = 2 and −2 ≤ e ≤ 2.

Definition
Machine precision = 2^−(t+1) (half of Matlab's eps). This is half the spacing of the floating point numbers between 1 and 2.

Rounding errors

Rounding errors are random.

Figure: Sampling Errors

Floating point Arithmetic

It is important to use exact rounding: if x and y are machine numbers, then

fl(x ± y) = (x ± y)(1 + ε1),
fl(x × y) = (x × y)(1 + ε2),
fl(x/y) = (x/y)(1 + ε3),

with ∣εi∣ ≤ η.

In other words: a basic operation on two floating point numbers yields a result that is correct up to a relative error smaller than η. Thus, the relative errors remain small after each such operation.

This is achieved only by using guard digits (intermediate higher precision).

Guard Digit

Consider a floating point system with β = 10 and t = 4, so η = (1/2) × 10^−3. Let

x = 0.1103 = 1.103 × 10^−1,   y = 9.963 × 10^−3.

Then x − y = 0.100337, and exact rounding yields 0.1003.

Relative error: ∣0.100337 − 0.1003∣/∣0.100337∣ ≈ 0.37 × 10^−3 < η.

However, if we were to subtract these two numbers without guard digits, we would obtain 0.1103 − 0.0099 = 0.1004. Now the relative error is ≈ 0.63 × 10^−3 > η.

Thus, guard digits must be used to produce exact rounding.

Rounding error example

For t = 5 we have η = 2^−6 = 0.015625.

fl(π) = fl(2^1 + 2^0 + 2^−3 + 2^−6 + ⋯) = 2^1 + 2^0 + 2^−3 = (1.10010)_2 ⋅ 2^1

∣fl(π) − π∣/∣π∣ ≈ 0.0053

Similarly,

π² − fl(fl(π)fl(π)) ≈ 0.12,   ∣π² − fl(fl(π)fl(π))∣/π² ≈ 0.012

Note on machine epsilon

▸ For any number α with 0 ≤ α ≤ η we have fl(1 + α) = 1.
▸ The Matlab command eps returns 2^−52, i.e., the smallest positive number (for the data type double) for which fl(1 + eps) is greater than 1.
▸ eps can have a parameter; see help eps.
▸ In the finite difference example we had, for very small h, fl(f(x + h)) = fl(f(x)). Therefore

∣ (fl(f(x0 + h)) − fl(f(x0)))/h − fl(f′(x0)) ∣ = ∣fl(f′(x0))∣.
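
A short Matlab sketch of these facts:

eps == 2^-52        % true: the gap between 1 and the next larger double
(1 + eps)   > 1     % true
(1 + eps/2) > 1     % false: fl(1 + eps/2) = 1
eps(2^10)           % eps with a parameter: the gap near 2^10, i.e. 2^-42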

Rounding errors summary

Lemma
1. With the machine precision η we have fl(x) = x(1 + ε) with ∣ε∣ ≤ η.
2. If ∗ is an elementary operation, then fl(x ∗ y) = (x ∗ y)(1 + ε) with ∣ε∣ ≤ η.

Wilkinson's Principle
The result of a numerical computation on the computer is the exact result with slightly perturbed initial data. This also holds for good implementations of (library) functions!

Cancellation

Cancellation is a special kind of rounding error. Consider the following two numbers with 5 decimal digits:

1.2345e0 − 1.2344e0 = 0.0001e0 = 1.0000e−4

If the two numbers were exact, the result delivered by the computer would also be exact. But if the first two numbers had been obtained by previous calculations and were affected by rounding errors, then the result would at best be 1.xxxx e−4, where the digits denoted by x are unknown.
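
The same effect can be seen in double precision (a sketch; the exact trailing digits depend on the rounded operands): the subtraction itself is exact, but it exposes the rounding errors already present in the operands.

x = 1.2345; y = 1.2344;     % both operands are rounded when stored
fprintf('%.16e\n', x - y)   % prints roughly 9.99999999999899e-05, not 1e-4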

Cancellation (cont.)

Suppose z = x − y, where x ≈ y. Then

∣z − fl(z)∣ ≤ ∣x − fl(x)∣ + ∣y − fl(y)∣,

from which it follows that the relative error satisfies

∣z − fl(z)∣/∣z∣ ≤ (∣x − fl(x)∣ + ∣y − fl(y)∣)/∣x − y∣.

The numerator is fine, but the denominator is very close to zero if x ≈ y, so the relative error in z can become large.

Library function for sinh

The sinus hyperbolicus is defined as

y = sinh(x) = (1/2)(e^x − e^−x).

If x ≈ 0, we have to expect cancellation between the two terms. Instead, use the Taylor expansion of sinh:

sinh(x) = x + x³/6 + (cosh(ξ)/120) x⁵,   ∣ξ∣ < ∣x∣.

For small enough x we can expect very good approximations.
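
A sketch comparing the two formulas for a small argument; the naive form loses many digits to cancellation, the Taylor form does not:

x = 1e-9;
naive  = (exp(x) - exp(-x))/2;   % cancellation near x = 0
taylor = x + x^3/6;              % accurate for small x
fprintf('naive  = %.16e\ntaylor = %.16e\n', naive, taylor)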

Quadrature of a circle revisited

This is what happened in our previous example to compute π by inscribed regular polygons. To compute sin(α/2) from sin(α), we used the recurrence

sin(αn/2) = √((1 − √(1 − sin² αn))/2).

Since sin αn → 0, the numerator on the right,

1 − √(1 − ε²)   with small ε = sin(αn),

suffers from severe cancellation. Therefore the algorithm performed so badly, although theory and program are both correct.

Fix

Quadrature of the circle revisited (cont.): rewrite the recurrence in the algebraically equivalent, cancellation-free form

sin(αn/2) = sin αn / √(2(1 + √(1 − sin² αn))).
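
Applying this fix to the earlier Matlab program changes only one line; a sketch of the repaired loop:

s = sqrt(3)/2; A = 3*s; n = 6;           % start from the hexagon
while s > 1e-10
    s = s/sqrt(2*(1 + sqrt(1 - s*s)));   % stable sin(alpha/2) update
    n = 2*n; A = n/2*s;                  % new polygonal area
end
fprintf('n = %d   A - pi = %e\n', n, A - pi)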

A Very Good Source For Floating Point Arithmetic

Handbook of Floating-Point Arithmetic, Second Edition. Springer, 2018.

Authors: Jean-Michel Muller, Nicolas Brunie, Florent de Dinechin, Claude-Pierre Jeannerod, Mioara Joldes, Vincent Lefèvre, Guillaume Melquiond, Nathalie Revol, Serge Torres.

Paper on high precision computation

David H. Bailey, Roberto Barrio, and Jonathan M. Borwein: High-precision computation: Mathematical physics and dynamics. Applied Mathematics and Computation, vol. 218 (2012), pp. 10106-10121. http://dx.doi.org/10.1016/j.amc.2012.03.087

Gist of the paper: in many very large scale problems it is difficult to achieve sufficient accuracy; for a rapidly growing body of important scientific computing applications, a higher level of numeric precision is required. This is facilitated by high-precision software packages. Such software is available, but it is awfully slow.
