py/obj: Improve REPR_C accuracy and include it in CI tests #17731
Summary
The current implementation of REPR_C works by clearing the two lower bits of the mantissa. Since this happens after each floating-point operation, it biases floating-point numbers toward zero, producing decimals like .9997 instead of round numbers. This is visible in test cases involving repeated computations, such as `misc/rge_sm` for instance, or when looking at the numerical results of `perf_bench/bm_fft`, which shows significantly higher errors than when using full 32-bit floats.

The standard way to fix this bias toward zero would be to add 2 to the mantissa before clearing the last two bits, effectively performing a rounding operation. This approach works quite well to remove the bias, but the repeated rounding still has some impact on accuracy over the longer term.
By experimenting with alternative approaches, I found that the most effective one is to fix the bias not when storing the floating-point number into an `mp_obj_t`, but when reloading the `float` value from the `mp_obj_t`, simply by copying the two preceding mantissa bits into the cleared positions. Although less mathematically sound, this method empirically works significantly better, even though I have no mathematical proof to offer. The only explanation I have for this good behaviour is that, statistically, it adds the required half-bit bias, while properly restoring the missing bits for numbers with a mantissa ending in 0000 or 1111.

Here is a summary of the comparison between the current baseline and these two methods:
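The schemes being compared can be sketched as follows. This is a hypothetical CPython illustration operating on 32-bit IEEE-754 bit patterns via `struct`; the actual implementation manipulates the `mp_obj_t` word in C, so all function names here are mine, for illustration only.

```python
import struct

def f2bits(x):
    # 32-bit IEEE-754 bit pattern of a float.
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits2f(b):
    return struct.unpack("<f", struct.pack("<I", b))[0]

def store_truncate(x):
    # Current REPR_C behaviour: clear the two low mantissa bits.
    return f2bits(x) & ~0x3

def store_round(x):
    # "Standard" fix: add 2 (half of the cleared range) before clearing,
    # i.e. round instead of truncating.  A carry out of the mantissa is
    # the usual benign IEEE-754 round-up into the exponent.
    return (f2bits(x) + 2) & ~0x3

def load_plain(b):
    # Baseline reload: cleared bits stay at zero.
    return bits2f(b)

def load_bitcopy(b):
    # Proposed fix: on reload, copy bits 2..3 into the cleared bits 0..1.
    return bits2f(b | ((b >> 2) & 0x3))
```

For example, on 1/3 (float32 pattern `0x3EAAAAAB`, mantissa ending `...1011`), plain truncation is off by 3 ulp while the bit-copy reload is off by only 1 ulp, and any mantissa ending in `0000` or `1111` is restored exactly.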
Given this improvement in REPR_C accuracy, and as discussed in PR 16953, I have updated the `unix/longlong` variant to use REPR_C, to ensure that it is included in CI builds. A few existing test cases had to be tweaked because the updated code caused the last digit to change in a few cases, but this has been made explicit in the test code.

The only potentially problematic case found while testing this code was in the implementation of `bm_fft`, which computes log_2(n) as `int(log(n) / log(2))`. This expression is missing a call to `round()` to make it safe, as nothing guarantees that the result of the division is not a fraction just below the expected integer value. That is exactly what happened when running the new code, causing a test failure. The fix to the test case was easy, but we cannot rule out that other code in the wild uses the same expression; it would fail in the same way unless the call to `round()` is added.

Testing
The new code has been tested as described above, on the latest development builds.
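To illustrate the `bm_fft` pitfall mentioned above, here is a small sketch (the function names are mine, not taken from the benchmark):

```python
from math import log

def ilog2_unsafe(n):
    # The pattern used by bm_fft: fragile, because log(n)/log(2) may
    # come out a hair below the exact integer, and int() truncates.
    return int(log(n) / log(2))

def ilog2_safe(n):
    # Adding round() makes the expression robust to tiny rounding errors.
    return int(round(log(n) / log(2)))

# int() truncates a value just below the integer, which is exactly the
# failure mode triggered by REPR_C's reduced precision:
print(int(9.9999999))   # -> 9, one less than intended
print(ilog2_safe(1024)) # robust regardless of small rounding errors
```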
Trade-offs and Alternatives
Two methods to fix the bias have been described above, and the more effective one has been chosen.

Another approach would be to use an alternate object representation that favors floats over small ints, but discussions on the MicroPython forum showed that this was not desirable.