refactor: wip attempt at lazy import modules #1419
base: main
Cool, thanks. Note that there are at least two sources of long startup time for Datashader: imports and Numba compilation. I think the startup time will only be reasonable once Numba can be precompiled and/or cached (which is supposed to be possible already).
…is, and add lazy_register to pipeline
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1419      +/-   ##
==========================================
- Coverage   88.46%   87.95%   -0.51%
==========================================
  Files          94       95       +1
  Lines       18683    18662      -21
==========================================
- Hits        16527    16415     -112
- Misses       2156     2247      +91
Yes, it would be great to tackle that as well, but just from looking at the profiler I can see that, e.g., if I lazy-load pandas (and therefore xarray, with some ragged extension inheritance commented out for now), I can get the import time down to the following:
❯ hyperfine "python -c ''" "python -c 'import datashader'" --warmup 5
Benchmark 1: python -c ''
  Time (mean ± σ):       9.6 ms ±   0.7 ms    [User: 6.6 ms, System: 2.9 ms]
  Range (min … max):     7.7 ms …  11.8 ms    277 runs
Benchmark 2: python -c 'import datashader'
  Time (mean ± σ):     298.7 ms ±  14.9 ms    [User: 1847.7 ms, System: 57.0 ms]
  Range (min … max):   284.5 ms … 331.5 ms    10 runs
Summary
  python -c '' ran
   31.19 ± 2.67 times faster than python -c 'import datashader'
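For readers without hyperfine, the same kind of comparison can be approximated from Python itself. The following is only a rough sketch: `time_command` is a hypothetical helper, it does no warmup or outlier handling, and `json` stands in for `datashader` so that the snippet runs in any environment.

```python
import statistics
import subprocess
import sys
import time


def time_command(code, runs=5):
    """Mean wall-clock seconds to run `python -c <code>` in a fresh interpreter."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # A new process per run, so nothing is cached in sys.modules between runs.
        subprocess.run([sys.executable, "-c", code], check=True)
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples)


baseline = time_command("")  # bare interpreter startup
with_import = time_command("import json")  # stand-in for "import datashader"
print(f"baseline:    {baseline * 1000:.1f} ms")
print(f"with import: {with_import * 1000:.1f} ms")
```

To see which modules dominate an import, CPython's `-X importtime` option prints a per-module breakdown to stderr.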
That's great! What I was bringing up was the time to the first datashaded result, not the time for an import that isn't used. Are we often importing datashader without actually using it?
I understand that this is the main problem, but it is outside the scope of this PR. It is something we should improve at some point, though.
I don't think so, but when we do import it, we import a lot of things we don't necessarily need. For example, if we want to rasterize a pandas DataFrame, we still try to import dask.dataframe. For me, at least, this causes an unnecessary slowdown, because I have Dask installed in my development environment.
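One common way to avoid this kind of unnecessary import is to check `sys.modules` before doing a type check: if the optional library was never imported, the object in hand cannot be one of its types. The sketch below illustrates the idea only and is not the code in this PR; `is_instance_of_loaded` is a hypothetical helper, and `fractions` stands in for an optional heavy dependency such as Dask.

```python
import sys
from fractions import Fraction  # stand-in for an optional heavy dependency


def is_instance_of_loaded(obj, module_name, attr_path):
    """Check isinstance(obj, <module_name>.<attr_path>) without importing the module.

    If the module is not in sys.modules it has not been imported in this
    process, so we assume `obj` cannot be an instance of one of its classes.
    """
    module = sys.modules.get(module_name)
    if module is None:
        return False
    target = module
    for part in attr_path.split("."):
        target = getattr(target, part, None)
        if target is None:
            return False
    return isinstance(obj, target)


print(is_instance_of_loaded(Fraction(1, 2), "fractions", "Fraction"))  # True
# An int is never a Dask DataFrame, and this call does not import Dask.
print(is_instance_of_loaded(3, "dask.dataframe", "DataFrame"))  # False
```

With a check like this, a pandas-only code path never pays for importing Dask, while users who pass a Dask DataFrame have necessarily imported it already.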
This is an attempt to lazily import modules so that importing datashader is not so costly.
This is heavily inspired by the work done in holoviz/holoviews#6476; as there, it also includes an import hook that runs when the module is imported. However, I have chosen to keep the imports in a centralized file, as this better matches how datashader is structured.
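As a rough illustration of the idea (not the implementation in this PR or in holoviz/holoviews#6476), a lazy module with an on-import hook could look like the sketch below. The `LazyModule` class and its `on_import` parameter are hypothetical names, and `fractions` stands in for a heavy dependency such as pandas.

```python
import importlib


class LazyModule:
    """Proxy that imports the real module on first attribute access."""

    def __init__(self, name, on_import=None):
        self._name = name
        self._on_import = on_import  # optional hook, called once after the import
        self._module = None

    def _load(self):
        if self._module is None:
            self._module = importlib.import_module(self._name)
            if self._on_import is not None:
                self._on_import(self._module)
        return self._module

    def __getattr__(self, attr):
        # Only reached for attributes not found on the proxy itself.
        if attr.startswith("__") and attr.endswith("__"):
            # Do not trigger the import for dunder probes (copy, pickle, ...).
            raise AttributeError(attr)
        return getattr(self._load(), attr)


# In a centralized file, every lazily imported dependency would be declared once.
loaded = []
fractions = LazyModule("fractions", on_import=lambda mod: loaded.append(mod.__name__))

print(loaded)  # []
print(fractions.Fraction(1, 2))  # 1/2
print(loaded)  # ['fractions']
```

Declaring the proxies in one place keeps the import cost out of module import time and defers it to first use. The standard library's `importlib.util.LazyLoader` offers a related mechanism without a custom proxy class.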
The transition is nowhere near finished. Also, dask currently stalls, and I don't know why...
Benchmark
Hyperfine: main vs. this branch (7648773) (results not preserved)
Tuna: main vs. this branch (7648773) (results not preserved)