-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce branches in PyUnicode_MAX_CHAR_VALUE #92142
Conversation
134c043
to
ca2513a
Compare
I don't understand why msvc is failing to compile. |
504bde2
to
ca2513a
Compare
MSVC is failing to compile and test_cppext is failing as C++ does not support designated initializers in array by #91321 |
Include/cpython/unicodeobject.h
Outdated
@@ -436,17 +436,17 @@ static inline Py_UCS4 PyUnicode_MAX_CHAR_VALUE(PyObject *op) | |||
if (PyUnicode_IS_ASCII(op)) { | |||
return 0x7fU; | |||
} | |||
|
|||
static const Py_UCS4 max_char_values[] = { | |||
[PyUnicode_1BYTE_KIND] = 0xffU, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So just do it the oldfashioned way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done but I wanted to use designated initializer, it looks better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you benchmark it on a build with PGO and LTO? Could you also compare before/after for 1-byte wide and 2-byte wide strings? On my Mac (in debug mode, so far) the odd thing is that the 2-byte mode is faster than the 1-byte mode.
I also wonder whether having static data inside a static inline function (that's potentially expanded many times) is a good idea.
0xffffU, /* PyUnicode_2BYTE_KIND */ | ||
0, | ||
0x10ffffU, /* PyUnicode_4BYTE_KIND */ | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since kind
is 3 bits, maybe add extra zero entries for a total of 8?
I do think it's odd that it fails on MSVC, since designated initializers in arrays are supported -- see e.g. Include/internal/pycore_opcode.h. I wonder if MSVC doesn't like having it in a static inline function? |
Well the benchmark is so small that pgo and lto gives inconsistent result because of cpu boost and binary layout. I reject this as this is insignificant and not worth it. |
Okay, we learned a few things. :-)
|
Yes, but lto takes a lot of time to compile and can vary in results up to 5 to 10%.
Linker would de-dedup those in the executable.
MSVC is picky about designated initializers in array in C++ mode, in C mode it works fine. |
Got it! |
Offers a minor 5.5 % speed improvement in microbenchmark but less branches are better especially when it can be done with a small lookup table.
Benchmark:
./python -m timeit -s "a='🚀'" "a*1000"