refactor: wip attempt at lazy import modules #1419

Draft: 10 commits, to be merged into main

Conversation

@hoxbro
Member

hoxbro commented May 14, 2025

Trying to see whether I can lazy-import modules so that importing datashader is not so expensive.

This is heavily inspired by the work done in holoviz/holoviews#6476, and it also includes an import hook that runs when a module is imported. However, I have chosen to keep the imports in a centralized file, as this better matches how datashader is structured.

The transition is nowhere near finished; also, dask currently stalls and I don't know why...
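
For readers unfamiliar with the idea, here is a minimal sketch of lazy importing using only the standard library's `importlib.util.LazyLoader`. It is not the implementation in this PR (which uses its own centralized file and import hook); the helper name `lazy_import` and the use of `colorsys` as a stand-in for a heavy dependency are assumptions made purely for illustration.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return module `name`, deferring execution of its body until first attribute access.

    Hypothetical helper for illustration; not part of datashader.
    """
    if name in sys.modules:
        # Already imported (eagerly or lazily): reuse it.
        return sys.modules[name]
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        raise ModuleNotFoundError(f"No module named {name!r}")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    # With LazyLoader this only arms the module; the real import runs on first attribute access.
    loader.exec_module(module)
    return module


# `colorsys` stands in for an expensive dependency such as pandas or dask.
colorsys = lazy_import("colorsys")          # cheap: module body not executed yet
print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))   # first attribute access triggers the real import
```

The trade-off is that import errors and import cost move from `import datashader` to the first place the dependency is actually used.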


Benchmark

Hyperfine

Main

❯ hyperfine "python -c ''" "python -c 'import datashader'" --warmup 5
Benchmark 1: python -c ''
  Time (mean ± σ):      10.0 ms ±   0.8 ms    [User: 6.8 ms, System: 3.0 ms]
  Range (min … max):     7.9 ms …  12.7 ms    263 runs

Benchmark 2: python -c 'import datashader'
  Time (mean ± σ):      1.099 s ±  0.017 s    [User: 2.510 s, System: 0.164 s]
  Range (min … max):    1.077 s …  1.125 s    10 runs

Summary
  python -c '' ran
  109.99 ± 8.70 times faster than python -c 'import datashader'

This branch (7648773)

❯ hyperfine "python -c ''" "python -c 'import datashader'" --warmup 5
Benchmark 1: python -c ''
  Time (mean ± σ):      10.0 ms ±   0.8 ms    [User: 7.0 ms, System: 2.9 ms]
  Range (min … max):     7.8 ms …  11.8 ms    257 runs

Benchmark 2: python -c 'import datashader'
  Time (mean ± σ):     558.2 ms ±   7.5 ms    [User: 2047.4 ms, System: 96.5 ms]
  Range (min … max):   544.8 ms … 568.1 ms    10 runs

Summary
  python -c '' ran
   55.59 ± 4.45 times faster than python -c 'import datashader'

Tuna

❯ python -X importtime -c 'import datashader' 2> tuna.log && tuna tuna.log

Main

image

This branch (7648773)

image
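
Since the tuna screenshots are not reproduced here, the same information can be read straight from the `-X importtime` log. Below is a small sketch that parses that log format and lists the most expensive imports by cumulative time. The function names `parse_importtime` and `slowest` are hypothetical, and the sample log is made-up data, not output from datashader.

```python
def parse_importtime(log_text):
    """Parse `python -X importtime` output into (module, self_us, cumulative_us) tuples."""
    rows = []
    for line in log_text.splitlines():
        if not line.startswith("import time:"):
            continue  # ignore anything else written to stderr
        fields = line[len("import time:"):].split("|")
        if len(fields) != 3:
            continue
        self_us, cumulative_us, name = (field.strip() for field in fields)
        if not (self_us.isdigit() and cumulative_us.isdigit()):
            continue  # skip the column-header line
        rows.append((name, int(self_us), int(cumulative_us)))
    return rows


def slowest(rows, n=5):
    """Return the n entries with the largest cumulative import time."""
    return sorted(rows, key=lambda row: row[2], reverse=True)[:n]


# Made-up sample in the -X importtime format (not real datashader numbers).
sample = """\
import time: self [us] | cumulative | imported package
import time:       120 |        120 |   pkg.a
import time:       300 |        420 | pkg
import time:        50 |         50 | other
"""

for name, self_us, cumulative_us in slowest(parse_importtime(sample), n=2):
    print(f"{name}: {cumulative_us} us cumulative ({self_us} us self)")
```

To use it on the log produced by the command above, pass the contents of `tuna.log` to `parse_importtime`.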

@jbednar
Member

jbednar commented May 14, 2025

Cool, thanks. Note that there are at least two sources of long startup time for Datashader, i.e. imports and Numba compilation. I think only once Numba can be precompiled and/or cached (which is supposed to be possible already) will the startup time be reasonable.

codecov bot commented May 15, 2025

Codecov Report

Attention: Patch coverage is 75.88235% with 41 lines in your changes missing coverage. Please review.

Project coverage is 87.95%. Comparing base (82a57c1) to head (c74fb8f).

Files with missing lines                 Patch %   Lines
datashader/_dependencies.py              66.30%    31 Missing ⚠️
datashader/data_libraries/__init__.py    77.77%    2 Missing ⚠️
datashader/data_libraries/cudf.py        0.00%     2 Missing ⚠️
datashader/data_libraries/dask_cudf.py   0.00%     2 Missing ⚠️
datashader/tiles.py                      33.33%    2 Missing ⚠️
datashader/glyphs/points.py              80.00%    1 Missing ⚠️
datashader/utils.py                      95.23%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1419      +/-   ##
==========================================
- Coverage   88.46%   87.95%   -0.51%     
==========================================
  Files          94       95       +1     
  Lines       18683    18662      -21     
==========================================
- Hits        16527    16415     -112     
- Misses       2156     2247      +91     

@hoxbro
Member Author

hoxbro commented May 15, 2025

> Cool, thanks. Note that there are at least two sources of long startup time for Datashader, i.e. imports and Numba compilation. I think only once Numba can be precompiled and/or cached (which is supposed to be possible already) will the startup time be reasonable.

Yes, it would be great to tackle that too, but just from looking at the profiler I can see that, e.g., dask.dataframe accounts for about half of the import time, and pandas for a little under 20%.

If I also lazy-load pandas (and therefore xarray, with some of the ragged extension inheritance commented out for now), I can get it down to the following:

❯ hyperfine "python -c ''" "python -c 'import datashader'" --warmup 5
Benchmark 1: python -c ''
  Time (mean ± σ):       9.6 ms ±   0.7 ms    [User: 6.6 ms, System: 2.9 ms]
  Range (min … max):     7.7 ms …  11.8 ms    277 runs

Benchmark 2: python -c 'import datashader'
  Time (mean ± σ):     298.7 ms ±  14.9 ms    [User: 1847.7 ms, System: 57.0 ms]
  Range (min … max):   284.5 ms … 331.5 ms    10 runs

Summary
  python -c '' ran
   31.19 ± 2.67 times faster than python -c 'import datashader'
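
To put the three hyperfine runs side by side, the short calculation below uses the mean wall-clock times reported in this thread (1.099 s on main, 558.2 ms on this branch, 298.7 ms with pandas also lazy-loaded). It is plain arithmetic on those means and ignores the reported standard deviations; the `speedup` helper is a hypothetical name.

```python
# Mean `import datashader` times from the hyperfine runs in this thread, in milliseconds.
timings_ms = {
    "main": 1099.0,
    "this branch (7648773)": 558.2,
    "pandas also lazy-loaded": 298.7,
}


def speedup(before_ms, after_ms):
    """How many times faster `after_ms` is than `before_ms`."""
    return before_ms / after_ms


main_ms = timings_ms["main"]
for label, ms in timings_ms.items():
    reduction = 100.0 * (1.0 - ms / main_ms)
    print(f"{label}: {ms:.1f} ms, {speedup(main_ms, ms):.2f}x vs main, {reduction:.1f}% less import time")
```

This works out to roughly a 2x improvement for the branch as benchmarked and roughly 3.7x once pandas is deferred as well.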

image

@jbednar
Member

jbednar commented May 28, 2025

That's great! What I was bringing up was the time to the first datashaded result, not the time for an import that isn't used. Are we often importing datashader without actually using it?

@hoxbro
Member Author

hoxbro commented May 28, 2025

> What I was bringing up was the time to the first datashaded result, not the time for an import that isn't used.

I understand that this is the main problem, but it is outside the scope of this PR. It is something we should improve at some point, though.

> Are we often importing datashader without actually using it?

I don't think so, but when we do import it, we import a lot of stuff we don't necessarily need. For example, if we want to rasterize a pandas DataFrame, we still try to import dask.dataframe. This, at least for me, causes an unnecessary slowdown, because I have Dask installed in my development environment.
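
One general way to avoid that kind of unnecessary import, sketched below, is to check `sys.modules` before doing a type check: if the caller passed a Dask DataFrame, `dask.dataframe` must already have been imported by them, so when it is absent the check can return `False` without importing anything. This is an illustrative pattern, not necessarily what this PR does; `is_instance_of_loaded` is a hypothetical helper, and `decimal.Decimal` stands in for a heavy optional type.

```python
import decimal
import sys


def is_instance_of_loaded(obj, module_name, class_name):
    """True if `obj` is an instance of `module_name.class_name`, without importing the module.

    If the module was never imported, no object of that class can exist yet,
    so the answer is False and the import cost is never paid.
    """
    module = sys.modules.get(module_name)
    if module is None:
        return False
    cls = getattr(module, class_name, None)
    return isinstance(cls, type) and isinstance(obj, cls)


# Stand-in usage: `decimal` plays the role of an optional heavy dependency.
print(is_instance_of_loaded(decimal.Decimal("1.5"), "decimal", "Decimal"))
print(is_instance_of_loaded(1.5, "not_an_installed_module", "DataFrame"))
```

In datashader's case the equivalent call would name `"dask.dataframe"` and `"DataFrame"`, so rasterizing a pandas DataFrame would not trigger the Dask import.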
