Chapter 1
MTL 107
Harish Kumar
1 / 78
Lecture: Introduction
2 / 78
Outline of today’s Lecture
3 / 78
Science and Technology
Experiments Theory
4 / 78
Since 1980 ...
Experiments Theory
Computing
5 / 78
6 / 78
▸ Introduction
▸ Calculus and Linear Algebra Review
▸ Roundoff Errors
▸ Nonlinear Equations in One Variable
▸ Direct Methods for Linear Systems
▸ Iterative Methods for Linear Systems
▸ Iterative methods for systems of nonlinear equations
7 / 78
Course Plan...
8 / 78
Focus of the Course
9 / 78
Aim of the Course
10 / 78
Prerequisites
11 / 78
Literature
12 / 78
Organisation
13 / 78
Evaluation Plan
1. Minor: 40 Marks
2. Assignments + Quizzes: 20 Marks
3. Major: 40 Marks
14 / 78
Attendance Requirements
15 / 78
MATLAB
16 / 78
Calculus Review
▸ Limit
▸ Continuity
▸ Differentiability
▸ Riemann Integrability
▸ ⋯ and related basic results about real-valued functions.
17 / 78
Calculus Review
18 / 78
Calculus Review
19 / 78
Calculus Review
20 / 78
Calculus Review
21 / 78
Calculus Review
22 / 78
Calculus Review
f(x) = f(x0) + f′(x0)(x − x0) + ⋯ + (f⁽ⁿ⁾(x0)/n!)(x − x0)ⁿ + Rn
where Rn = (f⁽ⁿ⁺¹⁾(x1)/(n+1)!)(x − x0)ⁿ⁺¹.
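As a quick sanity check of the Taylor theorem above, here is a small Python sketch (the course uses MATLAB; `taylor_exp` is a name chosen here, not from the slides) comparing the truncation error of the degree-n polynomial for f(x) = eˣ against the Lagrange remainder bound:

```python
import math

# Degree-n Taylor polynomial of exp around x0 = 0 (illustration; the
# slide's statement is for a general f).
def taylor_exp(x, n):
    return sum(x**k / math.factorial(k) for k in range(n + 1))

x = 0.5
for n in (2, 4, 8):
    err = abs(math.exp(x) - taylor_exp(x, n))
    # Lagrange remainder: R_n = exp(x1) x^(n+1)/(n+1)! for some x1 in (0, x),
    # hence |R_n| <= exp(x) * x^(n+1)/(n+1)!
    bound = math.exp(x) * x**(n + 1) / math.factorial(n + 1)
    print(n, err, bound)
```

The printed error always stays below the bound and shrinks rapidly as n grows.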
23 / 78
Lecture 2
Introduction to the Numerical Methods
24 / 78
Numerical Methods and Errors
25 / 78
Measure of Errors
26 / 78
Example
27 / 78
Example: The Stirling Approximation
% Example 1.1: Stirling approximation
format long;
e = exp(1);
n = 1:20;                          % array
Sn = sqrt(2*pi*n).*((n/e).^n);     % the Stirling approximation
factn = factorial(n);
abserr = abs(Sn - factn);          % absolute error
relerr = abserr./factn;            % relative error
format short g
[n; factn; Sn; abserr; relerr]'    % print out values
28 / 78
Types of errors
29 / 78
Discretization Error
f(x0 + h) = f(x0) + h f′(x0) + (h²/2) f′′(ξ),   x0 < ξ < x0 + h.
▸ Rearranging:
f′(x0) = (f(x0 + h) − f(x0))/h − (h/2) f′′(ξ)
▸ The discretization error is
∣f′(x0) − (f(x0 + h) − f(x0))/h∣ = (h/2)∣f′′(ξ)∣ ≈ (h/2)∣f′′(x0)∣ = O(h).
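The O(h) behaviour is easy to observe numerically. A short Python sketch (the slides use MATLAB; `fwd_diff` is an illustrative helper defined here) for f = sin at x0 = 1:

```python
import math

def fwd_diff(f, x0, h):
    # forward-difference approximation (f(x0+h) - f(x0)) / h of f'(x0)
    return (f(x0 + h) - f(x0)) / h

x0 = 1.0
exact = math.cos(x0)                       # derivative of sin is cos
e1 = abs(fwd_diff(math.sin, x0, 1e-3) - exact)
e2 = abs(fwd_diff(math.sin, x0, 5e-4) - exact)
print(e1 / e2)   # O(h): halving h roughly halves the error
```

The printed ratio is close to 2, as the (h/2)∣f′′(x0)∣ estimate predicts.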
30 / 78
Discretization Error
31 / 78
What Error is this?
32 / 78
Results
Figure: absolute error of the forward-difference approximation as a function of the step size h (log-log axes; h ranging from 10⁻²⁰ to 10⁰, absolute error from 10⁻¹⁵ to 10⁰).
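The shape of this error curve can be reproduced with a few lines of Python (an illustrative translation of the MATLAB experiment): the total error falls as h shrinks, until roundoff takes over for very small h.

```python
import math

x0, exact = 1.0, math.cos(1.0)
errs = {}
for k in range(1, 16):
    h = 10.0 ** (-k)
    errs[k] = abs((math.sin(x0 + h) - math.sin(x0)) / h - exact)

# discretization error dominates for large h, roundoff for tiny h;
# the total error is smallest in between (near h ~ sqrt(machine eps))
best = min(errs, key=errs.get)
print(best, errs[best])
```

The minimum typically lands around h ≈ 10⁻⁸ for double precision.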
33 / 78
Properties of Algorithms
34 / 78
Complexity
35 / 78
Big-O
▸ f = O(g) if
lim sup_{n→∞} ∣f(n)∣ / ∣g(n)∣ < ∞
▸ For errors: e = O(hᵖ) if
lim sup_{h→0} ∣e∣ / hᵖ < ∞
36 / 78
Small-o
▸ f = o(g) if
lim_{n→∞} ∣f(n)∣ / ∣g(n)∣ = 0
▸ For errors: e = o(hᵖ) if
lim_{h→0} ∣e∣ / hᵖ = 0
37 / 78
Θ notation
The Θ notation signifies a stronger relation than O: a function ϕ(h), for small h, is Θ(ψ(h)) if ϕ is asymptotically bounded both above and below by ψ:
0 < lim inf_{h→0} ∣ϕ(h)∣/∣ψ(h)∣ ≤ lim sup_{h→0} ∣ϕ(h)∣/∣ψ(h)∣ < ∞
38 / 78
Problem Conditioning
39 / 78
Stable and Unstable Algorithms
▸ Stable Algorithm
A stable algorithm does not amplify the small errors (e.g. roundoff) introduced during the computation.
▸ Unstable Algorithm
An unstable algorithm amplifies small errors, which may grow until they dominate the computed result.
41 / 78
Example of Unstable Algorithm
Evaluate the integrals
yₙ = ∫₀¹ xⁿ/(x + 10) dx,   for n = 0, 1, 2, ⋯, 30.
Note that
yₙ + 10yₙ₋₁ = ∫₀¹ (xⁿ + 10xⁿ⁻¹)/(x + 10) dx = 1/n
and
y₀ = ∫₀¹ 1/(x + 10) dx = log(11) − log(10)
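The resulting instability is easy to reproduce. The Python sketch below (the slides use MATLAB; the backward recurrence is the standard remedy and is an addition here, not taken from the slide) contrasts the forward recurrence with running it backwards from a crude guess:

```python
import math

# Forward recurrence from the slide: the initial rounding error in y_0 is
# multiplied by -10 at every step, so it grows like 10^n.
y = math.log(11) - math.log(10)          # y_0
fwd = [y]
for n in range(1, 31):
    y = 1.0 / n - 10.0 * y
    fwd.append(y)                        # fwd[n] "=" y_n

# Stable alternative (standard fix, not shown on the slide): run the same
# recurrence backwards, y_{n-1} = (1/n - y_n)/10, starting from the crude
# guess y_50 = 0; each step divides the error by 10.
z = 0.0
for n in range(50, 30, -1):
    z = (1.0 / n - z) / 10.0             # after this loop, z ~ y_30
bwd = [z]
for n in range(30, 0, -1):
    z = (1.0 / n - z) / 10.0
    bwd.append(z)
bwd.reverse()                            # bwd[n] ~ y_n

print(fwd[30], bwd[30])
```

The forward value of y₃₀ is astronomically wrong, while the backward one is correct to nearly full precision even though it started from a guess.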
42 / 78
Example of Unstable Algorithm
% Example 1.6: Evaluate integral by recursive formula
y(1) = log(11) - log(10);   % this is y_0
for n = 1:30
    y(n+1) = 1/n - 10*y(n);
end
% For comparison, use numerical quadrature
% (fun16(x,k) evaluates the integrand x.^k./(x+10); defined elsewhere)
for n = 1:31
    z(n) = quad(@(x) fun16(x, n-1), 0, 1, 1.e-10);
end
format long g
fprintf('recursion result   quadrature result   abs(difference)\n')
for n = 1:31
    fprintf('%e  %e  %e\n', y(n), z(n), abs(y(n)-z(n)))
end
43 / 78
Error in Stable/Unstable Algorithms
Eₙ ≈ c₁ⁿ E₀,
44 / 78
Lecture 3
Rounding Errors
45 / 78
In this Section
▸ Understand how numbers are stored in a computer.
▸ How roundoff errors can accumulate.
▸ Some recipes to avoid them.
46 / 78
Introduction I
47 / 78
Introduction II
48 / 78
Computation of Pi
Motivating example: quadrature of a circle
Let's try to compute π, the area of a circle with radius r = 1.
We approximate π by the area of an inscribed regular polygon:
αₙ := 2π/n,   Fₙ = cos(αₙ/2) sin(αₙ/2),   Aₙ = nFₙ = n cos(αₙ/2) sin(αₙ/2) → π as n → ∞.
Figure: area of the circle approximated by inscribed regular polygons.
[See Gander, Gander & Kwok: Scientific Computing. Springer.]
49 / 78
Computation of Pi
▸ Define αₙ = 2π/n; the area of one triangle is Fₙ = cos(αₙ/2) sin(αₙ/2).
▸ The area Aₙ covered by rotating this triangle n times is n cos(αₙ/2) sin(αₙ/2).
▸ Aₙ → π as n → ∞.
▸ Aₙ = (n/2) sin(2π/n) = (n/2) sin(αₙ).
▸ sin(α₂ₙ) = sin(αₙ/2) = √((1 − cos αₙ)/2) = √((1 − √(1 − sin² αₙ))/2).
▸ sin(α₆) = √3/2.
50 / 78
Computation of Pi
clear all; clc;
s = sqrt(3)/2; A = 3*s; n = 6;       % initialization
z = [A-pi n A s];                    % store the results
while s > 1e-10                      % terminate if s = sin(alpha) small
    s = sqrt((1 - sqrt(1 - s*s))/2); % new sin(alpha/2) value
    n = 2*n; A = n/2*s;              % A = new polygonal area
    z = [z; A-pi n A s];
end
for i = 1:length(z)
    fprintf('%10d %20.15f %20.15f %20.15f\n', z(i,2), z(i,3), z(i,1), z(i,4))
end
51 / 78
Integers
52 / 78
Real Numbers
1.d₁d₂d₃⋯ = 1 + d₁/2 + d₂/2² + d₃/2³ + ⋯
In general, infinitely many digits are needed to represent a real
number.
The choice of a binary representation is just one of many
possibilities. It is, indeed, a convenient choice when it comes to
computers.
53 / 78
Examples
1. −1.101₂ × 2 = −(1 + 1/2 + 1/8) × 2 = −3.25
2. (10011.01)₂ = 19.25
3. (0.010101⋯)₂ = 1/3
4. (0.00110011⋯)₂ = 1/5 = 0.2
The last example is of interest insofar as it shows that to a finite
decimal number there may correspond a (nontrivial) infinite binary
representation. (This is not true the other way round. Why?) So,
one cannot assume that a finite decimal number is exactly
representable on a binary computer.
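This is easy to verify in Python, which can expose the exact rational value a binary double actually stores:

```python
from fractions import Fraction

# 0.2 has an infinite binary expansion, so the stored double is NOT 1/5:
assert Fraction(0.2) != Fraction(1, 5)
print(Fraction(0.2))      # the dyadic rational actually stored

# finite binary fractions, on the other hand, are exact:
assert Fraction(0.25) == Fraction(1, 4)

# the classic consequence of 0.1, 0.2, 0.3 all being rounded:
print(0.1 + 0.2 == 0.3)   # False
```

(The converse holds because every finite binary fraction is a sum of powers of 1/2 = 5/10, hence a finite decimal.)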
54 / 78
Floating Point representation
55 / 78
Floating point systems
Definition (Floating point system)
A floating point system can be characterized by a 4-tuple (β, t, L, U), where
▸ β is the base of the number system,
▸ t is the number of digits (precision),
▸ L is the lower bound on the exponent e,
▸ U is the upper bound on the exponent e.
So,
fl(x) = ±(d₀/β⁰ + d₁/β¹ + ⋯ + d_{t−1}/β^{t−1}) × βᵉ
What is the relative error ∣fl(x) − x∣ / ∣x∣ ?
57 / 78
Floating point systems: Rounding Unit and Significant
digits
58 / 78
Error in Floating point representation
▸ Chopping:
fl(x) = ±(1.d₁d₂d₃⋯d_{t−1}d_t) × 2ᵉ
The absolute error is bounded by 2⁻ᵗ ⋅ 2ᵉ.
▸ Rounding:
fl(x) = ±(1.d₁d₂d₃⋯d_{t−1}d_t) × 2ᵉ          if 0.d_{t+1}⋯ < 1/2
fl(x) = ±(1.d₁d₂d₃⋯d_{t−1}d_t + 2⁻ᵗ) × 2ᵉ    if 0.d_{t+1}⋯ > 1/2
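A toy model of these two modes fits in a few lines of Python (`fl` is a hypothetical helper written for this illustration, not part of the slides); it checks the stated error bounds for chopping and rounding:

```python
import math

def fl(x, t, mode):
    # toy binary floating point: keep t mantissa bits after the leading 1,
    # either by chopping or by rounding to nearest
    e = math.floor(math.log2(abs(x)))
    m = abs(x) / 2.0**e                  # normalized mantissa in [1, 2)
    scaled = m * 2.0**t
    kept = math.floor(scaled) if mode == "chop" else round(scaled)
    return math.copysign(kept / 2.0**t * 2.0**e, x)

t = 10
x = math.pi                              # here the exponent is e = 1
assert abs(fl(x, t, "chop") - x) <= 2.0**(1 - t)        # bound 2^(e-t)
assert abs(fl(x, t, "round") - x) <= 2.0**(1 - t) / 2   # half of that
```

Rounding halves the worst-case error of chopping, which is why it is the default in practice.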
59 / 78
Errors for general floating point systems
Theorem
Let x → fl(x) = g × β e , where x ≠ 0 and g is normalized, signed
mantissa. Then the absolute error committed in using floating
point representation of x is bounded by,
60 / 78
IEEE floating point numbers
ANSI/IEEE Standard 754-1985 for Binary Floating Point Arithmetic. According to the IEEE standard, a 32-bit float has the following structure (from en.wikipedia.org).
double:
▸ 1 sign bit
▸ 11 bits exponent
▸ 52 bits mantissa
The value of a normalized 64-bit IEEE floating point number V is
V = (−1)ˢ × (1.m₅₁m₅₀⋯m₀)₂ × 2^(e−1023),
where s is the sign bit, m the 52-bit mantissa, and e the 11-bit (biased) exponent.
62 / 78
Special Numbers
63 / 78
Rounding errors in IEEE
Parameters of the IEEE Standard arithmetics with base β = 2.

Precision    t     e_min     e_max     η
Single       23    −125      128       2⁻²⁴ ≈ 6 ⋅ 10⁻⁸
Double       52    −1021     1024      2⁻⁵³ ≈ 1.1 ⋅ 10⁻¹⁶
Extended     63    −16381    16384     2⁻⁶⁴ ≈ 5 ⋅ 10⁻²⁰

Table: Rounding unit η in various precisions

Lemma
(EXERCISE) If x ≠ 0 is a normalized floating point number and fl(x) is obtained by rounding with t digits, then
∣fl(x) − x∣ ≤ 2^(e−t)/2,   ∣fl(x) − x∣/∣x∣ ≤ 2⁻ᵗ/2 ≡ η
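The double-precision value of η can be checked directly in Python (an illustration; the course uses MATLAB, where `eps` plays the role of 2η):

```python
import sys

eta = 2.0 ** -53                  # unit roundoff of IEEE double, ~1.1e-16
eps = sys.float_info.epsilon      # machine epsilon, 2^-52 = 2 * eta
assert eps == 2.0 ** -52
assert 1.0 + eta == 1.0           # ties-to-even: rounds back to 1.0
assert 1.0 + eps > 1.0            # eps is the spacing of floats above 1
print(eta)
```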
64 / 78
Rounding errors
We
We assume that all numbers are normalized.
Let t be the length of the mantissa.
Between powers of 2, the floating point numbers are equidistant.
Between
Between powers
powers of 2, the
the floating
floating point
pointnumbers
numbersare
areequidistant.
equidistant.
65 / 78
Rounding errors
66 / 78
Floating point Arithmetic
fl(x ± y ) = (x ± y )(1 + ε1 ),
fl(x × y ) = (x × y )(1 + ε2 ),
fl(x/y ) = (x/y )(1 + ε3 ),
with ∣εi ∣ ≤ η.
In other words: The result of a basic operation with two floating
point numbers yields a result that is correct up to a relative error
smaller than η. Thus, the relative errors remain small after each
such operation.
This is achieved only by using guard digits (intermediate higher
precision).
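The model fl(x ∘ y) = (x ∘ y)(1 + ε) with ∣ε∣ ≤ η can be verified for individual operations using exact rational arithmetic, e.g. in Python:

```python
from fractions import Fraction

eta = Fraction(1, 2**53)          # unit roundoff of IEEE double
x, y = 0.1, 0.3
pairs = [
    (Fraction(x) + Fraction(y), Fraction(x + y)),   # fl(x + y)
    (Fraction(x) * Fraction(y), Fraction(x * y)),   # fl(x * y)
    (Fraction(x) / Fraction(y), Fraction(x / y)),   # fl(x / y)
]
for exact, computed in pairs:
    rel = abs(computed - exact) / abs(exact)
    assert rel <= eta             # |eps_i| <= eta for each basic operation
```

Here `Fraction` gives the exact result of the operation on the stored doubles, so `rel` is exactly the ε of the model.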
67 / 78
Guard Digit
Relative error: ∣0.100337 − 0.1003∣ / ∣0.100337∣ ≈ 0.37 × 10⁻³ < η
68 / 78
Rounding error example
∣fl(π) − π∣ / ∣π∣ ≈ 0.0053
Similarly,
π² − fl(fl(π)fl(π)) ≈ 0.12,   ∣π² − fl(fl(π)fl(π))∣ / π² ≈ 0.012
69 / 78
Note on machine epsilon
70 / 78
Rounding errors summary
Lemma
1. With the machine precision η we have fl(x) = x(1 + ε) with
∣ε∣ ≤ η.
2. If ∗ is an elementary operation then fl(x ∗ y ) = (x ∗ y )(1 + ε)
with ∣ε∣ ≤ η.
Wilkinson’s Principle
The result of a numerical computation on the computer is the exact result of the same computation with slightly perturbed initial data.
This also holds for good implementations of (library) functions!
71 / 78
Cancellation
72 / 78
Cancellation (cont.)
Numerator: OK.
Denominator is very close to zero if x ≈ y . So the relative error in
z could become large.
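A classic instance of cancellation is computing 1 − cos x for small x. The Python sketch below (an added illustration, not the slide's example) contrasts the cancelling form with an algebraically equivalent stable one:

```python
import math

x = 1e-8
naive = 1.0 - math.cos(x)             # subtracts two nearly equal numbers
stable = 2.0 * math.sin(x / 2.0)**2   # same quantity, no cancellation
# true value is x^2/2 - x^4/24 + ... ~ 5e-17
print(naive, stable)
```

The stable form keeps nearly full relative accuracy, while the naive one loses all significant digits (it typically evaluates to exactly 0 here).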
73 / 78
Library function for sinh
sinh(x) = x + x³/6 + ξ⁵/120,   ∣ξ∣ < x.
For small enough x we can expect very good approximations.
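For instance, in Python (an illustrative check; the course uses MATLAB), the two-term truncation already agrees with the library sinh to about 14 digits at x = 10⁻³:

```python
import math

def sinh_small(x):
    # truncated series from the slide; the remainder is O(x^5)
    return x + x**3 / 6.0

x = 1e-3
rel = abs(sinh_small(x) - math.sinh(x)) / math.sinh(x)
print(rel)    # roughly x^4/120, i.e. below 1e-14 here
```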
74 / 78
Quadrature of a circle revisited
75 / 78
Fix
sin(αₙ/2) = sin αₙ / √(2(1 + √(1 − sin² αₙ))).
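The effect of the fix can be demonstrated in Python (a sketch mirroring the MATLAB program on an earlier slide; `approx_pi` is a name chosen here):

```python
import math

def approx_pi(stable, iters=25):
    # Archimedes-style angle doubling, starting from the hexagon
    s, n = math.sqrt(3.0) / 2.0, 6       # s = sin(alpha_6)
    for _ in range(iters):
        if stable:
            # rewritten formula from the slide: no cancellation
            s = s / math.sqrt(2.0 * (1.0 + math.sqrt(1.0 - s * s)))
        else:
            # naive formula: 1 - sqrt(1 - s^2) cancels once s is tiny
            s = math.sqrt((1.0 - math.sqrt(1.0 - s * s)) / 2.0)
        n *= 2
    return n / 2.0 * s                   # A_n = (n/2) sin(alpha_n)

print(abs(approx_pi(True) - math.pi))    # tiny
print(abs(approx_pi(False) - math.pi))   # large: cancellation ruined it
```

After 25 doublings the stable variant is accurate to roughly machine precision, while the naive one has lost most of its digits.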
76 / 78
A Very Good Source For Floating Point Arithmetic
77 / 78
Paper on high precision computation
78 / 78