refactor: wip attempt at lazy import modules #1419

hoxbro · 2025-05-14T07:59:45Z

Trying to see if I could lazy import modules, so we don't get such a heavy import cost of datashader.

This is heavily inspired by the work done in holoviz/holoviews#6476, including an import hook that runs when the module is imported. However, I have chosen to keep the imports in a centralized file, as this better matches how datashader is structured.

Nowhere near finished with the transition~~, also currently dask stalls and I don't know why...~~
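For readers unfamiliar with the technique, here is a minimal sketch of a lazy import built on the standard library's `importlib.util.LazyLoader`. It is not the code in this PR: the helper name `lazy_import` is hypothetical, and the import hook mentioned above is not shown. The module object is created immediately, but its body only executes on first attribute access.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module whose body is executed on first attribute access.

    Hypothetical helper for illustration; not the implementation in this PR.
    """
    # If the module is already imported, there is nothing to defer.
    if name in sys.modules:
        return sys.modules[name]

    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        raise ModuleNotFoundError(f"No module named {name!r}")

    # Wrap the real loader so that exec_module() only prepares the module.
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # Does not run the module body yet.
    return module


# Usage: the cost of importing colorsys is paid on the attribute access below,
# not on the lazy_import() call.
colorsys = lazy_import("colorsys")
hsv = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)
```

The trade-off with this approach is that an import error surfaces at first use rather than at the import site.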

Benchmark

Hyperfine

Main

❯ hyperfine "python -c ''" "python -c 'import datashader'" --warmup 5
Benchmark 1: python -c ''
  Time (mean ± σ):      10.0 ms ±   0.8 ms    [User: 6.8 ms, System: 3.0 ms]
  Range (min … max):     7.9 ms …  12.7 ms    263 runs

Benchmark 2: python -c 'import datashader'
  Time (mean ± σ):      1.099 s ±  0.017 s    [User: 2.510 s, System: 0.164 s]
  Range (min … max):    1.077 s …  1.125 s    10 runs

Summary
  python -c '' ran
  109.99 ± 8.70 times faster than python -c 'import datashader'

This branch (7648773)

❯ hyperfine "python -c ''" "python -c 'import datashader'" --warmup 5
Benchmark 1: python -c ''
  Time (mean ± σ):      10.0 ms ±   0.8 ms    [User: 7.0 ms, System: 2.9 ms]
  Range (min … max):     7.8 ms …  11.8 ms    257 runs

Benchmark 2: python -c 'import datashader'
  Time (mean ± σ):     558.2 ms ±   7.5 ms    [User: 2047.4 ms, System: 96.5 ms]
  Range (min … max):   544.8 ms … 568.1 ms    10 runs

Summary
  python -c '' ran
   55.59 ± 4.45 times faster than python -c 'import datashader'

Tuna

❯ python -X importtime -c 'import datashader' 2> tuna.log && tuna tuna.log

[Figure: tuna import-time profile, Main]

[Figure: tuna import-time profile, This branch (7648773)]
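As a text-only alternative to the tuna view, the log written by `python -X importtime` can be ranked directly. The sketch below assumes the usual `import time: <self us> | <cumulative us> | <module>` line layout; the function names are hypothetical and the sample log is made up for illustration, not taken from this PR.

```python
import re

# Matches data lines such as "import time:       300 |        420 | io".
# The header line has no digits in the first column, so it is skipped.
_LINE = re.compile(r"^import time:\s+(\d+)\s+\|\s+(\d+)\s+\|\s+(\S.*)$")


def parse_importtime(text):
    """Return (module, self_us, cumulative_us) tuples from an importtime log."""
    rows = []
    for line in text.splitlines():
        match = _LINE.match(line)
        if match is None:
            continue  # Header or unrelated stderr output.
        self_us, cumulative_us, name = match.groups()
        rows.append((name.strip(), int(self_us), int(cumulative_us)))
    return rows


def slowest(text, n=5):
    """Return the n entries with the largest cumulative import time."""
    return sorted(parse_importtime(text), key=lambda row: row[2], reverse=True)[:n]


# Made-up sample log, for illustration only.
SAMPLE = """\
import time: self [us] | cumulative | imported package
import time:       120 |        120 |   _io
import time:       300 |        420 | io
"""

for name, self_us, cumulative_us in slowest(SAMPLE):
    print(f"{cumulative_us:>8} us  {name}")
```

With a real log, `slowest(open("tuna.log").read())` would list the heaviest imports by cumulative time.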

jbednar · 2025-05-14T19:23:33Z

Cool, thanks. Note that there are at least two sources of long startup time for Datashader, i.e. imports and Numba compilation. I think only once Numba can be precompiled and/or cached (which is supposed to be possible already) will the startup time be reasonable.


codecov · 2025-05-15T13:34:26Z

Codecov Report

Attention: Patch coverage is 75.88235% with 41 lines in your changes missing coverage. Please review.

Project coverage is 87.95%. Comparing base (82a57c1) to head (c74fb8f).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| datashader/_dependencies.py | 66.30% | 31 Missing ⚠️ |
| datashader/data_libraries/__init__.py | 77.77% | 2 Missing ⚠️ |
| datashader/data_libraries/cudf.py | 0.00% | 2 Missing ⚠️ |
| datashader/data_libraries/dask_cudf.py | 0.00% | 2 Missing ⚠️ |
| datashader/tiles.py | 33.33% | 2 Missing ⚠️ |
| datashader/glyphs/points.py | 80.00% | 1 Missing ⚠️ |
| datashader/utils.py | 95.23% | 1 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1419      +/-   ##
==========================================
- Coverage   88.46%   87.95%   -0.51%     
==========================================
  Files          94       95       +1     
  Lines       18683    18662      -21     
==========================================
- Hits        16527    16415     -112     
- Misses       2156     2247      +91


hoxbro · 2025-05-15T18:12:47Z

> Cool, thanks. Note that there are at least two sources of long startup time for Datashader, i.e. imports and Numba compilation. I think only once Numba can be precompiled and/or cached (which is supposed to be possible already) will the startup time be reasonable.

Yes, it would be great to tackle that as well, but just from looking at the profiler I can see that, e.g., dask.dataframe accounts for half of the import time and pandas for a little under 20%.

If I lazy load pandas (and therefore xarray), and comment out some of the ragged extension inheritance for now, I can get it down to the following:

❯ hyperfine "python -c ''" "python -c 'import datashader'" --warmup 5
Benchmark 1: python -c ''
  Time (mean ± σ):       9.6 ms ±   0.7 ms    [User: 6.6 ms, System: 2.9 ms]
  Range (min … max):     7.7 ms …  11.8 ms    277 runs

Benchmark 2: python -c 'import datashader'
  Time (mean ± σ):     298.7 ms ±  14.9 ms    [User: 1847.7 ms, System: 57.0 ms]
  Range (min … max):   284.5 ms … 331.5 ms    10 runs

Summary
  python -c '' ran
   31.19 ± 2.67 times faster than python -c 'import datashader'
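To compare the three runs, it helps to subtract the interpreter start-up time (`python -c ''`) from each measurement. The sketch below uses only the mean times reported by hyperfine in this thread; the run labels and helper names are my own.

```python
# Mean wall-clock times in milliseconds, copied from the hyperfine output in this thread.
RUNS = {
    "main": {"baseline": 10.0, "import": 1099.0},
    "branch 7648773": {"baseline": 10.0, "import": 558.2},
    "pandas lazy loaded": {"baseline": 9.6, "import": 298.7},
}


def net_import_ms(run):
    """Time attributable to `import datashader`, excluding interpreter start-up."""
    return run["import"] - run["baseline"]


def reduction_vs(reference, run):
    """Fractional reduction in net import time relative to the reference run."""
    return 1.0 - net_import_ms(run) / net_import_ms(reference)


for label, run in RUNS.items():
    saved = reduction_vs(RUNS["main"], run)
    print(f"{label:>20}: {net_import_ms(run):7.1f} ms net ({saved:.1%} less than main)")
```

These are means from 10 runs each on one machine, so the percentages should be read as rough indications.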

jbednar · 2025-05-28T15:35:39Z

That's great! What I was bringing up was the time to the first datashaded result, not the time for an import that isn't used. Are we often importing datashader without actually using it?

hoxbro · 2025-05-28T16:17:02Z

> What I was bringing up was the time to the first datashaded result, not the time for an import that isn't used.

I understand that this is the main problem, but it is outside the scope of this PR. It is something we should improve at some point, though.

> Are we often importing datashader without actually using it?

I don't think so, but when we do import it, we import a lot of stuff we don't necessarily need. For example, if we want to rasterize a pandas DataFrame, we still try to import dask.dataframe. This, at least for me, causes an unnecessary slowdown, because I have Dask installed in my development environment.
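One general way to avoid that kind of import is to check `sys.modules` before doing a type check: if the optional library was never imported, the object cannot be an instance of one of its classes. The sketch below illustrates the idea with a hypothetical helper, `is_instance_if_imported`; it is not the mechanism used in this PR.

```python
import sys


def is_instance_if_imported(obj, module_name, class_name):
    """Check isinstance() against a class without importing its module.

    Returns False if the module is not in sys.modules, since no instance of
    its classes can exist in that case. Hypothetical helper for illustration.
    """
    module = sys.modules.get(module_name)
    if module is None:
        return False
    cls = getattr(module, class_name, None)
    return cls is not None and isinstance(obj, cls)


# Usage: the second check never imports dask, whether or not it is installed.
import fractions

print(is_instance_if_imported(fractions.Fraction(1, 2), "fractions", "Fraction"))
print(is_instance_if_imported([1, 2, 3], "dask.dataframe", "DataFrame"))
```

The check costs a dictionary lookup, so a user who only passes pandas objects never pays for Dask.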

refactor: wip attempt at lazy import modules

2e63c01

hoxbro added 4 commits May 15, 2025 11:36

a bit of cleanup

a6fcb85

fix: not running dask

1198206

init lazy module if already imported, run import hook directly if it is, and add lazy_register to pipeline

dee9e43

fix pre-commit

4a0abc5

hoxbro force-pushed the lazy_import branch from 006f77d to 4a0abc5 Compare May 15, 2025 13:06

hoxbro added 3 commits May 15, 2025 16:06

don't lazy_import dask bag, add hook and find_spec fix

960200d

remove dask import

c162ad3

don't import mock

7648773

hoxbro force-pushed the lazy_import branch from 13383a3 to 7648773 Compare May 15, 2025 15:12

hoxbro added 2 commits May 15, 2025 17:46

hide more import

4879ec1

lazy_register to gpu

c74fb8f
