Skip to content

Precompute embedded string literals hash code #10596

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 28, 2024

Conversation

etiennebarrie
Copy link
Contributor

[Feature #20415]

With embedded strings we often have some space left in the slot, which we can use to store the string Hash code.

It's probably only worth it for string literals, as they are the ones likely to be used as hash keys.

We chose to store the Hash code right after the string terminator as to make it easy/fast to compute, and not require one more union in RString.

compare-ruby: ruby 3.4.0dev (2024-04-22T06:32:21Z main f77618c1fa) [arm64-darwin23]
built-ruby: ruby 3.4.0dev (2024-04-22T10:13:03Z interned-string-ha.. 8a1a32331b) [arm64-darwin23]
last_commit=Precompute embedded string literals hash code

|            |compare-ruby|built-ruby|
|:-----------|-----------:|---------:|
|symbol      |     39.275M|   39.753M|
|            |           -|     1.01x|
|dyn_symbol  |     37.348M|   37.704M|
|            |           -|     1.01x|
|small_lit   |     29.514M|   33.948M|
|            |           -|     1.15x|
|frozen_lit  |     27.180M|   33.056M|
|            |           -|     1.22x|
|iseq_lit    |     27.391M|   32.242M|
|            |           -|     1.18x|

@etiennebarrie etiennebarrie force-pushed the interned-string-hashcode branch from 5281aa9 to 303929b Compare April 22, 2024 10:24

This comment has been minimized.

string.c Outdated
RUBY_ASSERT(free_bytes >= sizeof(st_index_t));
#endif

*(st_index_t *)(RSTRING_END(str) + TERM_LEN(str)) = rb_str_hash(str);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am afraid if this may cause a bus error on some platform. Is it safe?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this may cause a bus error on some platform

Because of alignment? If it's a concern, I think we could ensure it's always aligned, the overhead would be very slightly larger.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes > because of alignment

It would be safer to use aligned address if possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll look into it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
*(st_index_t *)(RSTRING_END(str) + TERM_LEN(str)) = rb_str_hash(str);
typedef struct {char bytes[sizeof(st_index_t)];} unaligned_index;
union {st_index_t i; unaligned_index b;} u = {.i = rb_str_hash(str)};
*(unaligned_index *)(RSTRING_END(str) + TERM_LEN(str)) = u.b;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank @nobu, interesting trick. I suppose we need it on read as well?

@casperisfine casperisfine force-pushed the interned-string-hashcode branch from 303929b to 54c8bda Compare April 22, 2024 11:47
@casperisfine
Copy link
Contributor

Looks like we broke something after the refactor, I'll investigate.

@casperisfine casperisfine force-pushed the interned-string-hashcode branch from 54c8bda to 6255749 Compare April 22, 2024 12:53
Comment on lines +248 to +250
if (FL_TEST_RAW(str, STR_PRECOMPUTED_HASH)) {
real_size += sizeof(st_index_t);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the bug. The compactor thought the string could fit in a smaller slot, corrupting the precomputed hash.

@casperisfine casperisfine force-pushed the interned-string-hashcode branch 5 times, most recently from 2bd83d4 to 8f949a4 Compare April 22, 2024 14:19
@casperisfine
Copy link
Contributor

Alright, I think the bugs have been squashed now.

@etiennebarrie etiennebarrie force-pushed the interned-string-hashcode branch from 8f949a4 to 39fae7c Compare May 6, 2024 09:45
@casperisfine casperisfine force-pushed the interned-string-hashcode branch from 39fae7c to 44bc1f4 Compare May 9, 2024 12:58
With embedded strings we often have some space left in the slot, which
we can use to store the string Hash code.

It's probably only worth it for string literals, as they are the ones
likely to be used as hash keys.

We chose to store the Hash code right after the string terminator as to
make it easy/fast to compute, and not require one more union in RString.

```
compare-ruby: ruby 3.4.0dev (2024-04-22T06:32:21Z main f77618c) [arm64-darwin23]
built-ruby: ruby 3.4.0dev (2024-04-22T10:13:03Z interned-string-ha.. 8a1a32331b) [arm64-darwin23]
last_commit=Precompute embedded string literals hash code

|            |compare-ruby|built-ruby|
|:-----------|-----------:|---------:|
|symbol      |     39.275M|   39.753M|
|            |           -|     1.01x|
|dyn_symbol  |     37.348M|   37.704M|
|            |           -|     1.01x|
|small_lit   |     29.514M|   33.948M|
|            |           -|     1.15x|
|frozen_lit  |     27.180M|   33.056M|
|            |           -|     1.22x|
|iseq_lit    |     27.391M|   32.242M|
|            |           -|     1.18x|
```

Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy