Skip to content

GH-132657: Add lock-free set contains implementation #132290

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

nascheme
Copy link
Member

@nascheme nascheme commented Apr 8, 2025

This roughly follows what was done for dictobject to make a lock-free lookup operation. On a benchmark running set.__contains__ in a tight loop, this is 1.5x faster on my computer. In the bm_deepcopy benchmark, the gains are very modest, between 1 to 2% faster. On the "set_contains" scaling benchmark, the results are much better. Also, the multi-threaded scaling of "copy" and "deepcopy" seem to be measurably improved.

Summary of changes:

  • refactor set_lookkey() into set_do_lookup() which now takes a function pointer that does the entry comparison. This is similar to dictobject and do_lookup(). In an optimized build, the comparison function is inlined and there should be no performance cost to this.
  • change set_do_lookup to return a status separately from the entry value.
  • add set_compare_frozenset() and use if the object is a frozenset. For the free-threaded build, this avoids some overhead (locking, atomic operations, incref/decref on key)
  • use FT_ATOMIC_* macros as needed for atomic loads and stores
  • use a deferred free on the set table array, if shared (only on free-threaded build, normal build always does an immediate free)
  • for free-threaded build, use explicit for loop to zero the table, rather than memcpy().
  • when mutating the set, assign so->table to NULL while the change is a happening. Assign the real table array after the change is done.

Free-threading scaling benchmark results from the attached scripts (result for 6 cores in parallel). This is a modified version of the ftscalingbenchmark.py script.

base this PR
dict_contains 4.0x faster 4.0x faster
tuple_contains 5.4x faster 5.3x faster
list_contains 7.1x faster 6.1x faster
frozenset_contains 1.0x faster 5.9x faster
frozenset_contains_dunder 6.4x faster 3.9x faster
set_contains 1.0x slower 5.4x faster
set_contains_dunder 1.4x faster 5.6x faster
shallow_copy 1.9x faster 3.7x faster
deepcopy 2.5x faster 3.5x faster

ftscaling_set.py.txt

@nascheme nascheme changed the title Add lock-free set contains implemention GH-132657: Add lock-free set contains implementation Jul 12, 2025
@nascheme nascheme added performance Performance or resource usage topic-free-threading 🔨 test-with-buildbots Test PR w/ buildbots; report in status section labels Jul 12, 2025
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @nascheme for commit 70a1c1f 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F132290%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jul 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage topic-free-threading
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy