Lecture9 - Fixed Point
Welcome!
Today’s Agenda:
Introduction
Float to Fixed Point and Back
Operations
Fixed Point & Accuracy
INFOMOV – Lecture 9 – “Fixed Point Math” 3
Introduction
The Concept of Fixed Point Math
Some consequences:
Introduction
The Concept of Fixed Point Math
In binary:
Looking at the first number (205887), and splitting in two sets of 16 bit, we get:
Introduction
But… Why!?
Introduction
But… Why!?
Quake’s solution:
And:
Start the floating point division (21 cycles) for the next segment, so it
can complete while we execute integer code for the linear interpolation.
Introduction
But… Why!?
Conversions
Practical Things
After calculations, cast the result to int by discarding the fractional bits. E.g.:
int result = fp_pi >> 16; // divide by 65536
Or, get the original float back by casting to float and dividing by $2^{\text{fractionalbits}}$:
float result = (float)fp_pi / 65536.0f;
Note that this last option has significant overhead, which should be
outweighed by the gains.
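The conversions above can be collected into a few helpers. This is a minimal sketch assuming a 16:16 format; the names `FP_SHIFT`, `FloatToFixed`, `FixedToInt` and `FixedToFloat` are chosen here for illustration and do not come from the lecture code.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 16:16 format (16 integer bits, 16 fractional bits).
constexpr int FP_SHIFT = 16;
constexpr float FP_SCALE = 65536.0f; // 2^16

// float -> 16:16; the cast truncates toward zero, like the (int) casts on the slides
inline int32_t FloatToFixed( float f )
{
    assert( f > -32768.0f && f < 32768.0f ); // value must fit in 16 integer bits
    return (int32_t)(f * FP_SCALE);
}

// 16:16 -> int, discarding the fractional bits (divide by 65536)
inline int32_t FixedToInt( int32_t fp ) { return fp >> FP_SHIFT; }

// 16:16 -> float (the more expensive route)
inline float FixedToFloat( int32_t fp ) { return (float)fp / FP_SCALE; }
```

For example, `FloatToFixed( 3.14159265f )` yields 205887, and `FixedToInt( 205887 )` yields 3.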
Conversions
Practical Things - Considerations
What is the best value for FP_SCALE in this case? And should we use int or
unsigned int for the table?
Sine/cosine: range is [-1, 1]. In this case, we need 1 sign bit, and 1 bit for the
whole part of the number. So:
We use 30 bits for fractional precision, 1 for sign, 1 for range.
In base 10, the fractional precision is ~10 digits (float has 7).
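A 2:30 sine table could be precalculated as sketched below. The table size of 4096 entries and the names `sintab230` and `BuildSineTable` are assumptions made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr int SIN_FRAC_BITS = 30;    // 2:30: 1 sign bit, 1 whole bit, 30 fractional bits
constexpr int SIN_TABLE_SIZE = 4096; // assumption: one full period in 4096 steps
static int32_t sintab230[SIN_TABLE_SIZE];

// Fill the table with sin( 2*pi*i / SIN_TABLE_SIZE ) in signed 2:30 format.
inline void BuildSineTable()
{
    const double twoPi = 6.283185307179586;
    for (int i = 0; i < SIN_TABLE_SIZE; i++)
    {
        double s = std::sin( i * twoPi / SIN_TABLE_SIZE );
        // +1.0 maps to 2^30 and -1.0 to -2^30; both fit in a signed 32-bit int
        sintab230[i] = (int32_t)std::lround( s * (double)(1 << SIN_FRAC_BITS) );
    }
    assert( sintab230[0] == 0 );
}
```

The table must be signed: the extra whole bit is what allows exactly +1.0 and -1.0 to be stored.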
Conversions
Practical Things - Considerations
1. All values are positive (no objects behind the camera are drawn);
2. Further away we need less precision.
Conversions
Practical Things - Considerations
Better: scale the simulation to a box of 127x127x127 for better use of the full
range; this gets you ~8.5 decimal digits of precision.
Conversions
Practical Things - Considerations
Suppose you want to add a sine wave to your 7:25 particle coordinates using
the precalculated 2:30 sine table. How do we get from 2:30 to 7:25?
Simple: shift the sine values 5 bits to the right (losing some precision).
(What happens if you used the 127x127x127 grid, and adding the sine wave
makes particles exceed this range?)
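The format change can be sketched as follows. The function names are hypothetical, and the assert only makes the range question visible in debug builds; it is not a fix for it.

```cpp
#include <cassert>
#include <cstdint>

// 2:30 -> 7:25: drop the 5 least significant fractional bits
inline int32_t Sine230To725( int32_t fp_sin ) { return fp_sin >> 5; }

// Add a sine offset (2:30) to a particle coordinate (7:25); the result is 7:25.
inline int32_t AddSineOffset( int32_t fp_pos, int32_t fp_sin )
{
    // check in 64-bit that the sum still fits the 32-bit 7:25 range
    int64_t sum = (int64_t)fp_pos + Sine230To725( fp_sin );
    assert( sum >= INT32_MIN && sum <= INT32_MAX );
    return (int32_t)sum;
}
```

Here 1.0 in 2:30 is `1 << 30`, which becomes `1 << 25` after the shift: 1.0 in 7:25.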
Conversions
Practical Things – 64 bit
So far, we assumed the use of 32bit integers to represent our fixed point
numbers. What about 64bit?
but we will use 64bit to overcome problems with multiplication and division.
Today’s Agenda:
Introduction
Float to Fixed Point and Back
Operations
Fixed Point & Accuracy
Operations
Addition & Subtraction
fp_a = … ;
fp_b = … ;
fp_sum = fp_a + fp_b;
Note that this does require that fp_a and fp_b have the same
number of fractional bits. Also don’t mix signed and unsigned
carelessly.
fp_a = … ; // 8:24
fp_b = … ; // 16:16
fp_sum = (fp_a >> 8) + fp_b; // result is 16:16
Operations
Multiplication
fp_a = … ; // 10:22
fp_b = … ; // 10:22
fp_prod = fp_a * fp_b; // 20:44
Operations
Multiplication
1. (fp_a * fp_b) >> 22; // good if fp_a and fp_b are very small
2. (fp_a >> 22) * fp_b; // good if fp_a is a whole number
3. (fp_a >> 11) * (fp_b >> 11); // good if fp_a and fp_b are large
4. ((fp_a >> 5) * (fp_b >> 5)) >> 12;
fp_a = PI; // 10:22
fp_b = 0.5f * (1 << 22); // 0.5 in 10:22
int fp_prod = fp_a >> 1; // multiplying by 0.5 reduces to a shift
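When the operands do not fall into one of the special cases above, a 64-bit intermediate keeps all bits of the product. This is a sketch of that alternative for 10:22 operands; `FixedMul` and `FRAC_BITS` are names introduced here.

```cpp
#include <cassert>
#include <cstdint>

constexpr int FRAC_BITS = 22; // 10:22

// 10:22 * 10:22 -> 10:22 using a 64-bit intermediate
inline int32_t FixedMul( int32_t fp_a, int32_t fp_b )
{
    int64_t wide = (int64_t)fp_a * fp_b;  // 20:44, exact
    int64_t result = wide >> FRAC_BITS;   // back to 22 fractional bits
    assert( result >= INT32_MIN && result <= INT32_MAX ); // whole part must fit in 10 bits
    return (int32_t)result;
}
```

No precision is lost before the multiply; the only remaining hazard is a result whose whole part exceeds the 10 available bits.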
Operations
Division
fp_a = … ; // 10:22
fp_b = … ; // 10:22
fp_quot = fp_a / fp_b; // 10:0
Operations
Division
1. (fp_a << 22) / fp_b; // good if fp_a and fp_b are very small
2. fp_a / (fp_b >> 22); // good if fp_b is a whole number
3. (fp_a << 11) / (fp_b >> 11); // good if fp_a and fp_b are large
4. ((fp_a << 5) / (fp_b >> 5)) >> ?;
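As with multiplication, a 64-bit intermediate avoids having to choose between these variants. The sketch below assumes 10:22 operands; `FixedDiv` and `FRAC_BITS` are names introduced for this example.

```cpp
#include <cassert>
#include <cstdint>

constexpr int FRAC_BITS = 22; // 10:22

// 10:22 / 10:22 -> 10:22 using a 64-bit intermediate
inline int32_t FixedDiv( int32_t fp_a, int32_t fp_b )
{
    assert( fp_b != 0 );
    // scale the dividend to 10:44 first; the multiply equals a left shift by 22
    // but is also well defined for negative values
    int64_t wide = (int64_t)fp_a * ((int64_t)1 << FRAC_BITS);
    int64_t result = wide / fp_b; // 10:44 / 10:22 -> 22 fractional bits
    assert( result >= INT32_MIN && result <= INT32_MAX );
    return (int32_t)result;
}
```

The divisor keeps all of its bits, so a small `fp_b` no longer collapses to zero the way it does in `fp_b >> 22`.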
Operations
Square Root
For square roots of fixed point numbers, optimal performance is achieved via
_mm_rsqrt_ps (via float). If precision is of little concern, use a lookup table, optionally
combined with interpolation and / or a Newton-Raphson iteration.
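For illustration, here is what a pure integer Newton-Raphson square root looks like for unsigned 16:16 values. This is not the float-based route recommended above and is likely slower; it only shows the iteration. `ISqrt64` and `FixedSqrt` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// floor( sqrt( n ) ) via Newton-Raphson on integers
inline uint64_t ISqrt64( uint64_t n )
{
    if (n < 2) return n;
    uint64_t x = n;           // initial guess, at or above the root
    uint64_t y = (x + 1) / 2;
    while (y < x)             // the estimate decreases until it reaches the floor
    {
        x = y;
        y = (x + n / x) / 2;
    }
    return x;
}

// Square root of an unsigned 16:16 value; the result is also 16:16.
// sqrt( v / 2^16 ) * 2^16 == sqrt( v * 2^16 ), so pre-scale by 16 bits.
inline uint32_t FixedSqrt( uint32_t fp )
{
    return (uint32_t)ISqrt64( (uint64_t)fp << 16 );
}
```

Starting from `x = n` costs many iterations; a lookup table would normally supply a much better initial guess.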
Operations
Fixed Point & SIMD
_mm_mul_epu32
_mm_mullo_epi16
_mm_mulhi_epu16
_mm_srl_epi32
_mm_srai_epi32
Accuracy
Error
16:16 fixed point numbers have a maximum error of $\frac{1}{2^{17}} \approx 7.6 \cdot 10^{-6}$.
We get slightly more than 5 digits of decimal precision.
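The error bound can be checked numerically. Note that the bound assumes rounding to nearest; a plain `(int)` cast truncates, which doubles the worst case to $1/2^{16}$. `QuantizationError1616` is a name made up for this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Absolute error made by storing x as a 16:16 number, rounding to nearest.
inline double QuantizationError1616( double x )
{
    assert( std::fabs( x ) < 32767.0 ); // must fit in 16 integer bits
    int32_t fp = (int32_t)std::lround( x * 65536.0 );
    return std::fabs( x - fp / 65536.0 );
}
```

The worst case occurs exactly halfway between two representable values, e.g. at `1.0 + 1.0 / 131072.0`.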
Accuracy
Error
$x = y/z$
Assuming 16:16 input, fp_z briefly becomes 16:8, with a precision of only 2 decimal digits.
Similarly:
Here, both fp_y and fp_z become 16:8, and the cumulative error will exceed $1/2^9$.
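The effect can be made concrete by comparing a pre-shifted 32-bit division against a 64-bit reference. The pre-shifted form below is an assumption about the kind of code meant here, not code taken from the lecture; both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// 16:16 division where the divisor is reduced to 16:8 first (assumed form)
inline int32_t DivPreShifted( int32_t fp_y, int32_t fp_z )
{
    assert( fp_y >= 0 && fp_y < (1 << 23) ); // fp_y << 8 must stay within 32 bits
    assert( (fp_z >> 8) != 0 );
    return (fp_y << 8) / (fp_z >> 8);        // 16:24 / 16:8 = 16:16
}

// Reference: the same division with a 64-bit intermediate, no bits dropped
inline int32_t DivWide( int32_t fp_y, int32_t fp_z )
{
    assert( fp_z != 0 );
    return (int32_t)(((int64_t)fp_y * 65536) / fp_z);
}
```

With `fp_y = 65536` (1.0) and `fp_z = 196863` (about 3.0039), the pre-shifted version returns 21845 while the reference returns 21817: dropping the low 8 bits of the divisor turns it into exactly 3.0.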
Accuracy
Error
Careful balancing of range and precision in fixed point calculations can reduce this problem.
Note that accuracy problems also occur in float calculations; they are just exposed more
clearly in fixed point. And: this time we can do something about it.
Accuracy
Error - Example
Accuracy
Improving the function.zip example
The following slides contain a step-by-step improvement of the fixed point evaluation of the
function $f(x) = \sin^3(4x) - \cos^2(4x) + \frac{1}{x}$, which failed during the real-time session in class.
Starting point is the working, but inaccurate version available from the website.
Initial accuracy, expressed as summed error relative to the ‘double’ evaluation, is 246.84.
For comparison, the summed error of the ‘float’ evaluation is just 0.013.
Accuracy
Improving the function.zip example
int EvaluateFixed( double x )
{
int fp_pi = (int)(PI * 65536.0); // 16:16
int fp_x = (int)(x * 65536.0); // 16:16
if ((fp_x >> 8) == 0) return 0; // safety net for division
Accuracy
Improving the function.zip example
Notice how many values do not use the full integer range: e.g., PI is 3 and needs two bits; x is -9..+9 and needs four bits; sin/cos is -1..1 and needs only one bit for range.
int EvaluateFixed( double x )
{
int fp_pi = (int)(PI * 65536.0); // 2:16
int fp_x = (int)(x * 65536.0); // 4:16
if ((fp_x >> 8) == 0) return 0; // safety net for division
Accuracy
Improving the function.zip example
Here, x is adjusted to use maximum precision: 4:27. 4x is then just a reinterpretation of this number, 6:25.
The calculation of sin(4x)^3 is interesting: since sin(x) is -1..1, sin(x)^3 is also -1..1. We drop a minimal amount of bits and keep precision.
Error is now down to 14.94.
int EvaluateFixed( double x )
{
int fp_pi = (int)(PI * 65536.0); // 2:16
int fp_x = (int)(x * (double)(1 << 27)); // 4:27
if ((fp_x >> 10) == 0) return 0; // safety net for division
int fp_4x = fp_x; // 6:25
int a = fp_4x / ((2 * fp_pi) >> 3); // 6:25 / 3:13 = 4:12
int fp_sin4x = sintab[a & 4095]; // 1:16
// 0:15 * 0:15 = 0:30, shifted back to 0:15; then 0:15 * 0:15 = 0:30
int fp_sin4x3 = (((fp_sin4x >> 1) * (fp_sin4x >> 1)) >> 15) * (fp_sin4x >> 1); // 0:30
int fp_cos4x = costab[a & 4095]; // 1:16
int fp_cos4x2 = (fp_cos4x >> 1) * (fp_cos4x >> 1); // 0:30; 0:15 * 0:15 = 0:30
int fp_recix = (1 << 30) / (fp_x >> 13); // 16:16; 1:30 / 5:14 = 0:16
Accuracy
Improving the function.zip example
Where do we go from here?
The sin/cos tables still contain 1:16 data. However, because of the way their data is used, increasing precision here doesn’t help.
We could calculate fp_sin4x3 and fp_cos4x2 via 64-[…] enough?
int EvaluateFixed( double x )
{
int fp_pi = (int)(PI * 65536.0);
int fp_x = (int)(x * (double)(1 << 27));
if ((fp_x >> 10) == 0) return 0; // safety net for division
To be continued.
Accuracy
Error – Take-away
Take-away
Fixed point: