Fix performance bug from cftime import #5640
I'd also like to append this to tag 14.1 and make tag 14.2 if possible - would this be ok?
Thanks for the suggestion @lusewell. I'm a bit confused as to how exactly this improves performance though - you've moved the location of the
I guess it always tries importing if the module doesn't exist and so that's a slowdown?
But in both cases we always check for the existence of cftime via Hopefully @lusewell can enlighten us 😅
Thanks for noting this @lusewell. Based on @TomNicholas's comment and @dcherian's investigation, I'm curious whether we can make this PR a little more targeted.
def assert_all_valid_date_type(data):
    import cftime
    if cftime is None:
        raise ModuleNotFoundError("No module named 'cftime'")
I think this may be the only place where this import could be attempted regularly without cftime installed. Would only making a change here fix the performance issue?
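For readers following along, the pattern under discussion can be sketched roughly as below: the import is attempted once at module load, the result is kept as a `None` sentinel, and functions that need cftime check the sentinel instead of re-importing. This is a minimal sketch, not the code in the PR; the helper name `require_cftime` is hypothetical.

```python
# Attempt the optional import exactly once, when this module is loaded.
try:
    import cftime
except ImportError:
    cftime = None  # sentinel: cftime is not installed


def require_cftime():
    """Hypothetical helper: return the cftime module or raise if it is missing.

    No import statement runs here, so repeated calls never search sys.path.
    """
    if cftime is None:
        raise ModuleNotFoundError("No module named 'cftime'")
    return cftime
```

With this layout a missing cftime costs one failed import per process rather than one per call.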
I'd have thought it's better to avoid a pattern with these performance characteristics in general, which is why I changed it for all instances rather than just the one that was causing me issues. I think it's pretty clearly a low-risk change, so I thought it's more a question of what's a better pattern to follow going forward, rather than what currently falls in the critical path.
I guess my thinking was, at least in the case of the cftime_offsets.py module, if it is a performance issue to attempt importing cftime when it is not installed, why attempt it -- even once -- if we know we will never need to? Note that we follow this current pattern in more than just the cftime_offsets.py and cftimeindex.py modules. There are places in times.py where we import cftime within functions, as well as numerous places in the tests (again though, these are places where I do not think we would see material impacts on performance). I'm open to discussing this more, however.
One other place where I do think this new pattern could positively impact performance is linked below; this is another example of where we might attempt importing cftime regularly when it is not installed. It would be great if you could modify the import logic there to be more performance-friendly too.
Lines 625 to 630 in 35d798a
try:
    import cftime
    cftime_datetime = [cftime.datetime]
except ImportError:
    cftime_datetime = []
So I've found another instance of this which causes a performance issue - this one with groupby.
Re performance: it's only a performance issue to attempt to import cftime repeatedly. Having it fail once in the top-level import is not a big problem. The issue comes when it does it thousands of times every time you try and I've fixed this for some other cases I've found that were causing me slowness - would you like me to change anything else before this can be merged?
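The reason a repeated failed import is costly is that Python does not record the failure: each attempt consults the finders on `sys.meta_path` again, which in turn inspect the entries of `sys.path`. The sketch below makes that visible by counting lookups with a small custom finder. The module name `hypothetical_missing_module_xyz` is made up and assumed not to be installed.

```python
import importlib
import importlib.abc
import sys

# Assumption: no module with this made-up name exists in the environment.
MISSING = "hypothetical_missing_module_xyz"


class CountingFinder(importlib.abc.MetaPathFinder):
    """Counts how often the import system asks for one module name."""

    def __init__(self, name):
        self.name = name
        self.calls = 0

    def find_spec(self, fullname, path, target=None):
        if fullname == self.name:
            self.calls += 1
        return None  # never resolves anything; defer to the regular finders


def try_import():
    """Mimics a function-local optional import that fails every time."""
    try:
        return importlib.import_module(MISSING)
    except ImportError:
        return None


finder = CountingFinder(MISSING)
sys.meta_path.insert(0, finder)
try:
    for _ in range(1000):
        try_import()
finally:
    sys.meta_path.remove(finder)

print(finder.calls)  # one lookup per attempt
```

Each of the 1000 attempts reaches the finder, so the count equals the number of calls. On a slow filesystem every one of those lookups can also involve checking the directories on `sys.path`, which fits the slowdown described above.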
Thanks for catching that additional spot.
Yes, understood. I just prefer that we are consistent across the code base -- either we use this pattern only where absolutely necessary or we use it everywhere. In light of that do you mind introducing this pattern in Line 167 in 7bfee3e
Line 417 in 7bfee3e
I think we don't have to worry about the tests, because they already follow this pattern to an extent; in building the After that, just fix the linting error and add a what's new entry, and I think this should be ready to go from my perspective.
4502168 to 8619ad9
Fixed other usages and added to
Thanks for your patience @lusewell. This looks good to me. Regarding your request:
I'd also like to append this to tag 14.1 and make tag 14.2 if possible - would this be ok?
It sounds like you would like us to backport this change? I am not an expert on doing this -- perhaps others in @pydata/xarray can weigh in.
Thanks @lusewell and @spencerkclark. Unfortunately we don't do backports.
Co-authored-by: Luke Sewell <lukeddsewell@gmail.com>
No functional change; this just removes a terrible performance bug when cftime isn't installed. Previously, calls to .sel would search your whole Python path trying to import cftime, leading to programs of mine spending 10% of their time just doing this against a slow filesystem. Tests all pass locally.