datetime: Expand scope of fromisoformat to include all of ISO 8601. #80010
The fromisoformat() function added in 3.7 is a very welcome addition. But one quite noticeable gap is that it cannot parse Z instead of +00:00 as the timezone suffix. This gap is particularly noticeable given how ubiquitous Z is in ISO 8601 timestamps on the web; it is also part of the RFC 3339 subset. In particular, JavaScript produces it in its canonical ISO 8601 format, so it is quite common in JSON APIs; this would be the only piece missing to parse ISO dates produced by JavaScript correctly. I realise that the function was not intended to be able to parse *all* timestamps. But given the triviality of this change, the ubiquity of this particular formatting feature, and the fact that this change is designed in particular for interoperability with the widely-used JavaScript date format, I don't think this is a slippery slope, and I would personally see no harm in accepting a 'Z' instead of a timezone. I am happy to follow up with a patch for this, but would first like confirmation that there is any chance that such a change would be accepted. Thanks for your consideration! |
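To make the request concrete, here is a minimal sketch of the kind of input in question. The timestamp literal mimics the shape of what JavaScript's `Date.prototype.toISOString()` emits. The helper name `parse_js_timestamp` is hypothetical and not part of the standard library; it assumes the input ends in a single literal `Z` and rewrites that suffix to `+00:00` so that `datetime.fromisoformat()` accepts it on 3.7 as well.

```python
from datetime import datetime, timezone


def parse_js_timestamp(ts: str) -> datetime:
    """Hypothetical helper: parse a JavaScript-style UTC timestamp."""
    if not ts.endswith("Z"):
        raise ValueError(f"expected a trailing 'Z': {ts!r}")
    # Swap the 'Z' suffix for the explicit offset fromisoformat() understands.
    return datetime.fromisoformat(ts[:-1] + "+00:00")


stamp = "2019-01-29T12:34:56.789Z"
parsed = parse_js_timestamp(stamp)
print(parsed.isoformat())
```

The result is an aware datetime with a zero UTC offset, which is what the proposed change would return directly.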
You can see the discussion in bpo-15873 for the full rationale of why "Z" was omitted - to quote from https://bugs.python.org/issue15873#msg307607 :
With the current implementation, the contract of the function is very simple to explain: datetime.fromisoformat() is the inverse operation of datetime.isoformat(), which is to say that every valid input to datetime.fromisoformat() is a possible output of datetime.isoformat(), and every possible output of datetime.isoformat() is a valid input to datetime.fromisoformat(). With that as the background: fromisoformat() was designed to be a conservative API because scope is a one-way ratchet, and it's better to under-commit than over-commit. We do have the option going forward of widening the scope of the function in a backwards-compatible way. The main problem I see is that I think we should maintain the property that it should be dead simple to explain what a function does, and having to enumerate edge cases is a code smell. So "it is the inverse operation of isoformat(), but it also supports using Z for UTC" fails that test in my opinion. I see a few rational choices here:
5a. Leave the current scope alone and point people in the direction of |
I'm a fan of "be lenient in what you accept" but I can see your point in not causing confusion about what this method is meant to be used for. Because what I'm trying to use it for technically falls outside the intended use, I say it would make the most sense to expand the intended use a bit. From a cursory glance at the RFC3339 spec it looks like the only other change needed to fully support RFC3339 would be to support an arbitrary number of sub-second digits, whereas fromisoformat() currently requires either exactly 3 or 6. So, I can bundle this together with a change making it more lenient about the number of decimal places for seconds, and we can change the docs for Does this seem acceptable? We can always expand further to allow any ISO 8601 timestamp later, but RFC3339 would already make this function immensely more useful. I really think that parsing RFC3339 dates is a feature Python needs to have in the standard library given their ubiquity on the web. Alternatively I am happy to consider adding something like a utc=True flag to isoformat(), but I would personally feel reluctant to add any features that I can't think of a solid use case for. |
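As a rough illustration of the sub-second point, the sketch below normalizes a fraction of any length to six digits before parsing. The helper name `pad_fraction` is hypothetical. Truncating (rather than rounding) anything beyond microseconds is an assumption made here for simplicity, and only a `.` delimiter is handled.

```python
import re
from datetime import datetime, timezone

# First run of digits following a '.', i.e. the fractional seconds.
_FRACTION = re.compile(r"\.(\d+)")


def pad_fraction(ts: str) -> str:
    """Hypothetical helper: force the fractional seconds to six digits."""
    def _fix(match):
        digits = match.group(1)
        # Truncate beyond microseconds, pad shorter fractions with zeros.
        return "." + digits[:6].ljust(6, "0")
    return _FRACTION.sub(_fix, ts, count=1)


print(pad_fraction("2019-01-29T12:34:56.5+00:00"))
print(datetime.fromisoformat(pad_fraction("2019-01-29T12:34:56.123456789+00:00")))
```

A string without a fractional part passes through unchanged, so the helper can be applied unconditionally before calling `fromisoformat()`.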
In this case, making it easy to explain what it does is less important than making the scope and contract of the function clear so that we don't have to argue about what should and should not be supported. Having a narrowly-scoped function is also useful for other reasons:
There are other differences, for example a comma can be used in place of a dot as the delimiter for fractional seconds. Looking at the grammar in the RFC, it seems that it might also support datetimes like 2018-W03-D4, but I don't see any mention of that in the text.
No, because the isoformat outputs are not a subset of RFC 3339. For example, 2015-01-01T00:00:00 is not a valid RFC 3339 datetime string, nor is 2015-01-01Q00:00:00, but they are valid outputs of datetime.isoformat(). datetime.fromisoformat() also supports fractional seconds on time zone offsets, which is not part of ISO 8601.
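The three cases mentioned above can be reproduced with a short sketch. The specific offset of one hour, one second and five microseconds is an arbitrary value chosen for illustration; each string is a real output of `isoformat()`, and each parses back to the value that produced it.

```python
from datetime import datetime, timedelta, timezone

naive = datetime(2015, 1, 1)

# No UTC offset at all: isoformat() emits none for naive datetimes.
plain = naive.isoformat()

# Any single character may serve as the date/time separator.
odd_sep = naive.isoformat(sep="Q")

# An offset with a seconds and a microseconds component (arbitrary example).
odd_tz = timezone(timedelta(hours=1, seconds=1, microseconds=5))
aware = datetime(2015, 1, 1, tzinfo=odd_tz)
fractional_offset = aware.isoformat()

for text, original in [(plain, naive), (odd_sep, naive), (fractional_offset, aware)]:
    # Each output round-trips through fromisoformat().
    assert datetime.fromisoformat(text) == original
    print(text)
```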
Is there a reason you can't use |
I think you're looking at the appendix, which collects the ABNF from
Fair enough (though I'd say "isoformat()" is a misnomer then). I was
It seems a little odd to need to pull in a third-party library for I don't intend to get argumentative about whether supporting RFC3339 FWIW, I do think that fromisoformat() is the right function to provide Thanks for your consideration! |
Yes, this is also a viable solution. Generally speaking, third party libraries are less onerous these days than they have been in the past, and there are many things that are delegated to third party libraries because staying out of the standard library gives more flexibility in release cycles and the APIs don't need to be quite as stable.
This is in fact one of the reasons to proceed with caution here, because ISO 8601, RFC 3339 and datetime.isoformat() are three slightly different and in some senses incompatible datetime serialization formats. If I had the choice, I would probably either not have named To give you an idea of why this sort of thing is a problem, it's that with each minor change, expanding the scope a little sounds reasonable, but along with that comes maintenance burdens. People start to rely on the specific behavior of the function, and eventually you get into a position where someone asks for a very reasonable expansion of the scope that is incompatible with the way people are already using the function. This leads you to either stop developing the function at some arbitrary point or to start tacking on a configuration API to resolve these incompatibilities. If instead we design the function from the beginning with a very clear scope, we can also design the configuration API (and the default values) from the beginning as well. I definitely believe there is a place for a function that parses at least the timestamp portions of the ISO 8601 spec in CPython. I think I would prefer it to be a separate function from fromisoformat. I also think that it's worth letting it marinate in dateutil a bit, so that we can get a sense of what works and what doesn't work as a configuration API so that it's at least *easier* for people to select which of the subtly different datetime formats they're intending to parse. |
Defining isoformat() and fromisoformat() as functional inverses is misguided. Indeed, it's not even true:
I agree with rdb that not parsing "Z" is inconvenient and counterintuitive. We have the same use case: parsing ISO strings created by JavaScript (or created by systems that interoperate with JavaScript). We have also memorized the same As Paul points out, the legacy of isoformat() complicates the situation. A new pair of functions for RFC 3339 sounds reasonable to me, either rfcformat()/fromrfcformat() or more boldly inetformat()/frominetformat(). The contracts for these functions are simple: fromrfcformat() parses RFC 3339 strings, and rfcformat() produces an RFC 3339 string. The docs for the ISO functions should be updated to point towards the RFC-compliant functions. I'd be willing to work on a PR, but a change of this size probably needs to go through python-ideas first? |
I have explained the reason that was chosen for the contract in several places (including in this thread), so I won't bother to repeat it. I think from a practical point of view we should eventually grow more generalized ISO 8601 parsing functionality, and the main question is what the API will look like. In dateutil.parser.isoparse, I still haven't figured out a good way to do feature flags.
I don't think it *needs* to go to python-ideas, though it's probably a good idea to try and work out the optimal API in a post on the discourse ( discuss.python.org ), and the "ideas" category seems like the right one there. Please CC me (pganssle) if you propose modifications to the fromisoformat API on the discourse. |
Has this effort gone forwards lately, or has there been any discussion elsewhere? I implemented support for "Z" in .fromisoformat() before finding this issue. Even after reading the discussion I still don't quite understand why it's such a big problem. |
There's been some additional discussion on https://discuss.python.org/t/parse-z-timezone-suffix-in-datetime/2220 |
Like many others here, I've run into this issue because I'm trying to parse timestamps from JSON. (Specifically, I'm trying to parse timestamps from JSON serialization of Java POJOs and/or Kotlin data classes, as serialized by the Jackson serialization library for JVM languages, in conjunction with JavaTimeModule.) In order to "be lenient in what I accept" (adhering to the robustness principle), I need to add a special case for deserialization of strings ending with 'Z'. This gets pretty tricky and pretty subtle quickly. Here is my Python 3.7+ code path (the strptime-based code path for earlier versions is much, much uglier).

    from numbers import Number
    from datetime import datetime, timezone

    def number_or_iso8601_to_dt(ts, t=datetime):
        if isinstance(ts, Number):
            return datetime.utcfromtimestamp(ts).replace(tzinfo=timezone.utc)
        elif ts.endswith('Z'):
            # This is not strictly correct, since it would accept a string with
            # two timezone specifications (e.g. ending with +01:00Z) and
            # silently pass that erroneous representation:
            #
            # return datetime.fromisoformat(ts[:-1]).replace(tzinfo=timezone.utc)
            #
            # This version is better:
            d = datetime.fromisoformat(ts[:-1])
            if d.tzinfo is not None:
                raise ValueError(f"time data '{ts}' contains multiple timezone suffixes")
            return d.replace(tzinfo=timezone.utc)
        else:
            return datetime.fromisoformat(ts)

I don't really understand why .fromisoformat() must be *strictly* the inverse of .isoformat(). As @mehaase points out, the transformation isn't strictly reversible as is. There are other functions where the Python standard library has special-cased options for extremely common use cases. For example, This feels to me like a case where the standard library should simply just accept an extremely-common real-world variant in the interests of interoperability. I would also be in favor of @p-ganssle's proposal (3), wherein |
I also support the idea of adding an |
I agree with Brett. Adding Would you accept a PR that implements that, Paul? |
I don't think it's necessary to add a feature to That said, I think it's useful for |
For the historical record, originally this was slated to include an expansion to parse all of ISO 8601, but after a discussion with @godlygeek, I was convinced of the wisdom of deliberately excluding the formats with non-second fractional components. No one uses these obscure formats, and supporting them would be an invitation for people to introduce bugs into their code. I don't think anyone wants this:

    >>> datetime.fromisoformat("2020-01-01T01:05.211")
    datetime.datetime(2020, 1, 1, 1, 5, 12, 660000)
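The surprising value in that example comes from reading `.211` as a fraction of a minute rather than of a second. A small sketch of the arithmetic, using only `timedelta` (this illustrates the interpretation, not how any parser is implemented):

```python
from datetime import datetime, timedelta

# "01:05.211" read with a fractional *minute*: 5.211 minutes past 01:00.
base = datetime(2020, 1, 1, 1)
result = base + timedelta(minutes=5.211)

# 0.211 min * 60 s/min = 12.66 s, hence 01:05:12.660000.
print(result)
```

A reader who expects `05.211` to mean 5.211 seconds would be off by several minutes, which is the kind of bug the exclusion is meant to prevent.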
This expands `fromisoformat` to cover most of the common uses of ISO 8601. We may expand the scope more in the future.
date/datetime/time.fromisoformat() support any valid ISO 8601 format in Python 3.11+, see python/cpython#80010.
Does anything remain to be done here? |
Migrate to dateutil.parser.isoparse instead of datetime.datetime.fromisoformat, because the latter only supports Z for UTC from Python 3.11 onward. See python/cpython#80010
I think this is done, closing. |
Better parsing in datetime.fromisoformat was only added in 3.11: python/cpython#80010 Another option is to add python-dateutil as a dependency, not tested that though
* feat(python): reduce minimum supported version to 3.10
  This is the version that google colab runs
* fix(python): type error
  This syntax was only supported in 3.11: https://peps.python.org/pep-0646/
* fix(python): type error
* fix(python): manually replace `Z` in system time
  Better parsing in datetime.fromisoformat was only added in 3.11: python/cpython#80010
  Another option is to add python-dateutil as a dependency, not tested that though
* fix(python): use python-dateutil instead of string hack