//! Converting decimal strings into IEEE 754 binary floating point numbers.
//!
//! # Problem statement
//!
//! We are given a decimal string such as `12.34e56`. This string consists of integral (`12`),
//! fractional (`34`), and exponent (`56`) parts. All parts are optional and interpreted as a
//! default value (1 or 0) when missing.
//!
//! We seek the IEEE 754 floating point number that is closest to the exact value of the decimal
//! string. It is well-known that many decimal strings do not have terminating representations in
//! base two, so we round to within 0.5 units in the last place (in other words, as well as
//! possible). Ties, decimal values exactly half-way between two consecutive floats, are resolved
//! with the half-to-even strategy, also known as banker's rounding.
//!
//! Needless to say, this is quite hard, both in terms of implementation complexity and in terms
//! of CPU cycles taken.
//!
//! # Implementation
//!
//! First, we ignore the sign. Or rather, we remove it at the very beginning of the conversion
//! process and re-apply it at the very end. This is correct in all edge cases since IEEE
//! floats are symmetric around zero: negating one simply flips the first bit.
//!
//! Then we remove the decimal point by adjusting the exponent: Conceptually, `12.34e56` turns
//! into `1234e54`, which we describe with a positive integer `f = 1234` and an integer `e = 54`.
//! The `(f, e)` representation is used by almost all code past the parsing stage.
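//!
//! As a rough illustration of this step, the sketch below folds the fractional digits into `f`
//! and lowers the exponent accordingly. `to_f_e` is a hypothetical helper, not part of this
//! module; the real parser works on bytes and keeps track of truncated digits instead of
//! giving up on overflow.
//!
//! ```rust
//! // Hypothetical helper: turns (integral digits, fractional digits, exponent) into `(f, e)`.
//! fn to_f_e(integral: &str, fractional: &str, exponent: i64) -> Option<(u64, i64)> {
//!     let mut f: u64 = 0;
//!     for b in integral.bytes().chain(fractional.bytes()) {
//!         if !b.is_ascii_digit() {
//!             return None;
//!         }
//!         // Assumption of this sketch: bail out if the digits do not fit into a `u64`.
//!         f = f.checked_mul(10)?.checked_add(u64::from(b - b'0'))?;
//!     }
//!     // Every fractional digit moved in front of the decimal point lowers the exponent by one.
//!     let e = exponent.checked_sub(fractional.len() as i64)?;
//!     Some((f, e))
//! }
//! ```
//!
//! Under these assumptions, `to_f_e("12", "34", 56)` returns `Some((1234, 54))`.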
//!
//! We then try a long chain of progressively more general and expensive special cases using
//! machine-sized integers and small, fixed-sized floating point numbers (first `f32`/`f64`, then
//! a type with a 64-bit significand). The extended-precision step uses the Eisel-Lemire
//! algorithm, which works with a 128-bit (or 192-bit) representation and can accurately and
//! quickly compute the vast majority of floats. When all these fail, we bite the bullet and
//! resort to a large-decimal representation: shifting the digits into range, calculating the
//! upper significant bits, and rounding exactly to the nearest representation.
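//!
//! The cheapest of these special cases can be sketched for `f64` as follows. This is the classic
//! exact fast path, written here under the assumption that it resembles the first step of the
//! chain; `fast_path_f64` and `POW10` are hypothetical names, and the module's own fast path may
//! cover more inputs.
//!
//! ```rust
//! // Powers of ten that are exactly representable as `f64` (10^0 through 10^22).
//! const POW10: [f64; 23] = [
//!     1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9, 1e10, 1e11, 1e12, 1e13, 1e14, 1e15,
//!     1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22,
//! ];
//!
//! // Hypothetical helper: computes `w * 10^q` with a single correctly rounded operation,
//! // or returns `None` when the inputs are outside the range where that is exact.
//! fn fast_path_f64(w: u64, q: i32) -> Option<f64> {
//!     // `w` must convert to `f64` without rounding, and `10^|q|` must be exact as well.
//!     if w > (1u64 << 53) || !(-22..=22).contains(&q) {
//!         return None;
//!     }
//!     let w = w as f64;
//!     // Both operands are exact, so the one multiplication or division rounds only once.
//!     Some(if q < 0 { w / POW10[(-q) as usize] } else { w * POW10[q as usize] })
//! }
//! ```
//!
//! With this sketch, `fast_path_f64(1234, -2)` gives `Some(12.34)`, while
//! `fast_path_f64(1234, 54)` gives `None` and would have to go through the slower steps.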
//!
//! Another aspect that needs attention is the `RawFloat` trait by which almost all functions
//! are parametrized. One might think that it's enough to parse to `f64` and cast the result to
//! `f32`. Unfortunately this is not the world we live in, and this has nothing to do with using
//! base two or half-to-even rounding.
//!
//! Consider for example two types `d2` and `d4` representing decimal types with two and four
//! decimal digits respectively, and take "0.01499" as input. Let's use half-up rounding.
//! Going directly to two decimal digits gives `0.01`, but if we round to four digits first,
//! we get `0.0150`, which is then rounded up to `0.02`. The same principle applies to other
//! operations as well: if you want 0.5 ULP accuracy you need to do *everything* in full precision
//! and round *exactly once, at the end*, by considering all truncated bits at once.
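//!
//! The same effect can be observed with `f32` and `f64`. The sketch below uses an input that
//! lies just above `1 + 2^-24`, the midpoint between `1.0` and the next `f32`; the names
//! `INPUT`, `parse_direct` and `parse_via_f64` are hypothetical and only serve this example.
//!
//! ```rust
//! // Slightly above the midpoint between 1.0 and the next `f32` (that is, above 1 + 2^-24),
//! // but by far less than half an `f64` ULP.
//! const INPUT: &str = "1.0000000596046447753906250000000001";
//!
//! // Rounds once: decimal string straight to `f32`.
//! fn parse_direct(s: &str) -> Option<f32> {
//!     s.parse::<f32>().ok()
//! }
//!
//! // Rounds twice: decimal string to `f64`, then `f64` to `f32`.
//! fn parse_via_f64(s: &str) -> Option<f32> {
//!     s.parse::<f64>().ok().map(|x| x as f32)
//! }
//! ```
//!
//! Parsing `INPUT` directly rounds up to `1.0 + f32::EPSILON`. Going through `f64` first rounds
//! the input to exactly `1 + 2^-24`, which the cast then treats as a tie and resolves to `1.0`.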
//!
//! Primarily, this module and its children implement the algorithms described in:
//! "Number Parsing at a Gigabyte per Second", available online:
//! <https://arxiv.org/abs/2101.11408>.
//!
//! # Other
//!
//! The conversion should *never* panic. There are assertions and explicit panics in the code,
//! but they should never be triggered and only serve as internal sanity checks. Any panics should
//! be considered a bug.
//!
//! There are unit tests but they are woefully inadequate at ensuring correctness; they only cover
//! a small percentage of possible errors. Far more extensive tests are located in the directory
//! `src/tools/test-float-parse` as a Rust program.
//!
//! A note on integer overflow: Many parts of this file perform arithmetic with the decimal
//! exponent `e`. Primarily, we shift the decimal point around: Before the first decimal digit,
//! after the last decimal digit, and so on. This could overflow if done carelessly. We rely on
//! the parsing submodule to only hand out sufficiently small exponents, where "sufficient" means
//! "such that the exponent +/- the number of decimal digits fits into a 64 bit integer".
//! Larger exponents are accepted, but we don't do arithmetic with them; they are immediately
//! turned into {positive,negative} {zero,infinity}.
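//!
//! The observable effect of the last point can be sketched through the public `str::parse`
//! entry point. `parse_f64` is a hypothetical helper used only for this illustration.
//!
//! ```rust
//! // Hypothetical helper: parses `s` as an `f64`, returning `None` on a syntax error.
//! fn parse_f64(s: &str) -> Option<f64> {
//!     s.parse::<f64>().ok()
//! }
//! ```
//!
//! With exponents far too large for any arithmetic, `parse_f64("1e1000000000000000000000")` is
//! positive infinity and `parse_f64("-1e-1000000000000000000000")` is negative zero, rather
//! than an error.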
//!
//! # Notation
//!
//! This module uses the same notation as the Lemire paper:
//!
//! - `m`: binary mantissa; always nonnegative
//! - `p`: binary exponent; a signed integer
//! - `w`: decimal significand; always nonnegative
//! - `q`: decimal exponent; a signed integer
//!
//! This gives `m * 2^p` for the binary floating-point number, with `w * 10^q` as the decimal
//! equivalent.

#![doc(hidden)]
#![unstable(
    feature = "dec2flt",
    reason = "internal routines only exposed for testing",
    issue = "none"
)]

use self::common::BiasedFp;
use self::float::RawFloat;
use self::lemire::compute_float;
use self::parse::{parse_inf_nan, parse_number};
use self::slow::parse_long_mantissa;
use crate::error::Error;
use crate::fmt;
use crate::str::FromStr;

mod common;
pub mod decimal;
pub mod decimal_seq;
mod fpu;
mod slow;
mod table;
// float is used in flt2dec, and all are used in unit tests.
pub mod float;
pub mod lemire;
pub mod parse;

macro_rules! from_str_float_impl {
    ($t:ty) => {
        #[stable(feature = "rust1", since = "1.0.0")]
        impl FromStr for $t {
            type Err = ParseFloatError;

            /// Converts a string in base 10 to a float.
            /// Accepts an optional decimal exponent.
            ///
            /// This function accepts strings such as
            ///
            /// * '3.14'
            /// * '-3.14'
            /// * '2.5E10', or equivalently, '2.5e10'
            /// * '2.5E-10'
            /// * '5.'
            /// * '.5', or, equivalently, '0.5'
            /// * 'inf', '-inf', '+infinity', 'NaN'
            ///
            /// Note that alphabetical characters are not case-sensitive.
            ///
            /// Leading and trailing whitespace represent an error.
            ///
            /// # Grammar
            ///
            /// All strings that adhere to the following [EBNF] grammar when
            /// lowercased will result in an [`Ok`] being returned:
            ///
            /// ```txt
            /// Float  ::= Sign? ( 'inf' | 'infinity' | 'nan' | Number )
            /// Number ::= ( Digit+ |
            ///              Digit+ '.' Digit* |
            ///              Digit* '.' Digit+ ) Exp?
            /// Exp    ::= 'e' Sign? Digit+
            /// Sign   ::= [+-]
            /// Digit  ::= [0-9]
            /// ```
            ///
            /// [EBNF]: https://www.w3.org/TR/REC-xml/#sec-notation
            ///
            /// # Arguments
            ///
            /// * src - A string
            ///
            /// # Return value
            ///
            /// `Err(ParseFloatError)` if the string did not represent a valid
            /// number. Otherwise, `Ok(n)` where `n` is the closest
            /// representable floating-point number to the number represented
            /// by `src` (following the same rules for rounding as for the
            /// results of primitive operations).
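            ///
            /// # Examples
            ///
            /// A short sketch of which inputs the grammar accepts, shown for `f64`.
            /// `is_valid_float` is a hypothetical helper used only for this
            /// illustration; it is not part of the standard library.
            ///
            /// ```rust
            /// // Hypothetical helper: reports whether `s` parses as an `f64`.
            /// fn is_valid_float(s: &str) -> bool {
            ///     s.parse::<f64>().is_ok()
            /// }
            /// ```
            ///
            /// With this helper, `"5."`, `".5"`, `"2.5E-10"` and `"+infinity"` are
            /// accepted, while `""`, `"."`, `"1e"` and `" 3.14"` (leading whitespace)
            /// are rejected.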
            // We add the `#[inline(never)]` attribute, since its content will
            // be filled with that of `dec2flt`, which has #[inline(always)].
            // Since `dec2flt` is generic, a normal inline attribute on this function
            // with `dec2flt` having no attributes results in heavily repeated
            // generation of `dec2flt`, despite the fact only a maximum of 2
            // possible instances can ever exist. Adding #[inline(never)] avoids this.
            #[inline(never)]
            fn from_str(src: &str) -> Result<Self, ParseFloatError> {
                dec2flt(src)
            }
        }
    };
}

#[cfg(target_has_reliable_f16)]
from_str_float_impl!(f16);
from_str_float_impl!(f32);
from_str_float_impl!(f64);

// FIXME(f16_f128): A fallback is used when the backend+target does not support f16 well, in order
// to avoid ICEs.

#[cfg(not(target_has_reliable_f16))]
impl FromStr for f16 {
    type Err = ParseFloatError;

    #[inline]
    fn from_str(_src: &str) -> Result<Self, ParseFloatError> {
        unimplemented!("requires target_has_reliable_f16")
    }
}

/// An error which can be returned when parsing a float.
///
/// This error is used as the error type for the [`FromStr`] implementation
/// for [`f32`] and [`f64`].
///
/// # Example
///
/// ```
/// use std::str::FromStr;
///
/// if let Err(e) = f64::from_str("a.12") {
///     println!("Failed conversion to f64: {e}");
/// }
/// ```
#[derive(Debug, Clone, PartialEq, Eq)]
#[stable(feature = "rust1", since = "1.0.0")]
pub struct ParseFloatError {
    kind: FloatErrorKind,
}

#[derive(Debug, Clone, PartialEq, Eq)]
enum FloatErrorKind {
    Empty,
    Invalid,
}

#[stable(feature = "rust1", since = "1.0.0")]
impl Error for ParseFloatError {
    #[allow(deprecated)]
    fn description(&self) -> &str {
        match self.kind {
            FloatErrorKind::Empty => "cannot parse float from empty string",
            FloatErrorKind::Invalid => "invalid float literal",
        }
    }
}

#[stable(feature = "rust1", since = "1.0.0")]
impl fmt::Display for ParseFloatError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        #[allow(deprecated)]
        self.description().fmt(f)
    }
}

#[inline]
pub(super) fn pfe_empty() -> ParseFloatError {
    ParseFloatError { kind: FloatErrorKind::Empty }
}

// Used in unit tests, keep public.
// This is much better than making FloatErrorKind and ParseFloatError::kind public.
#[inline]
pub fn pfe_invalid() -> ParseFloatError {
    ParseFloatError { kind: FloatErrorKind::Invalid }
}

/// Converts a `BiasedFp` to the closest machine float type.
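///
/// As a rough sketch of this packing for `f64` only, assuming that `F::SIG_BITS` is 52 for
/// `f64` and that `m` holds just the explicitly stored significand bits (`pack_f64` is a
/// hypothetical stand-in, not part of this module):
///
/// ```rust
/// // Hypothetical stand-in: places the biased exponent above the 52 explicitly stored
/// // significand bits of an `f64`. The sign bit is left clear.
/// fn pack_f64(m: u64, p_biased: u64) -> f64 {
///     f64::from_bits(m | (p_biased << 52))
/// }
/// ```
///
/// Under these assumptions, `pack_f64(0, 1023)` is `1.0` and `pack_f64(1 << 51, 1023)` is `1.5`.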
fn biased_fp_to_float<F: RawFloat>(x: BiasedFp) -> F {
    let mut word = x.m;
    word |= (x.p_biased as u64) << F::SIG_BITS;
    F::from_u64_bits(word)
}

/// Converts a decimal string into a floating point number.
#[inline(always)] // Will be inlined into a function with `#[inline(never)]`, see above
pub fn dec2flt<F: RawFloat>(s: &str) -> Result<F, ParseFloatError> {
    let mut s = s.as_bytes();
    let c = if let Some(&c) = s.first() {
        c
    } else {
        return Err(pfe_empty());
    };
    let negative = c == b'-';
    if c == b'-' || c == b'+' {
        s = &s[1..];
    }
    if s.is_empty() {
        return Err(pfe_invalid());
    }

    let mut num = match parse_number(s) {
        Some(r) => r,
        None if let Some(value) = parse_inf_nan(s, negative) => return Ok(value),
        None => return Err(pfe_invalid()),
    };
    num.negative = negative;
    if !cfg!(feature = "optimize_for_size") {
        if let Some(value) = num.try_fast_path::<F>() {
            return Ok(value);
        }
    }

    // If significant digits were truncated, then we can have rounding error
    // only if `mantissa + 1` produces a different result. We also avoid
    // redundantly using the Eisel-Lemire algorithm if it was unable to
    // correctly round on the first pass.
    let mut fp = compute_float::<F>(num.exponent, num.mantissa);
    if num.many_digits
        && fp.p_biased >= 0
        && fp != compute_float::<F>(num.exponent, num.mantissa + 1)
    {
        fp.p_biased = -1;
    }
    // Unable to correctly round the float using the Eisel-Lemire algorithm.
    // Fall back to a slower, but always correct algorithm.
    if fp.p_biased < 0 {
        fp = parse_long_mantissa::<F>(s);
    }

    let mut float = biased_fp_to_float::<F>(fp);
    if num.negative {
        float = -float;
    }
    Ok(float)
}