Content-Length: 371849 | pFad | http://github.com/python/cpython/pull/92142

A6 Reduce branches in PyUnicode_MAX_CHAR_VALUE by kumaraditya303 · Pull Request #92142 · python/cpython · GitHub
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce branches in PyUnicode_MAX_CHAR_VALUE #92142

Closed
wants to merge 3 commits into from

Conversation

kumaraditya303
Copy link
Contributor

Offers a minor 5.5 % speed improvement in microbenchmark but less branches are better especially when it can be done with a small lookup table.

Benchmark:

./python -m timeit -s "a='🚀'" "a*1000"
# Before
2000000 loops, best of 5: 182 nsec per loop
# After
2000000 loops, best of 5: 172 nsec per loop

@kumaraditya303
Copy link
Contributor Author

I don't understand why msvc is failing to compile.

@kumaraditya303 kumaraditya303 force-pushed the fast-string branch 2 times, most recently from 504bde2 to ca2513a Compare May 2, 2022 15:51
@kumaraditya303
Copy link
Contributor Author

kumaraditya303 commented May 2, 2022

MSVC is failing to compile and test_cppext is failing as C++ does not support designated initializers in array by #91321

@@ -436,17 +436,17 @@ static inline Py_UCS4 PyUnicode_MAX_CHAR_VALUE(PyObject *op)
if (PyUnicode_IS_ASCII(op)) {
return 0x7fU;
}

static const Py_UCS4 max_char_values[] = {
[PyUnicode_1BYTE_KIND] = 0xffU,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So just do it the oldfashioned way?

Copy link
Contributor Author

@kumaraditya303 kumaraditya303 May 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done but I wanted to use designated initializer, it looks better

@kumaraditya303 kumaraditya303 requested a review from gvanrossum May 3, 2022 04:09
Copy link
Member

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you benchmark it on a build with PGO and LTO? Could you also compare before/after for 1-byte wide and 2-byte wide strings? On my Mac (in debug mode, so far) the odd thing is that the 2-byte mode is faster than the 1-byte mode.

I also wonder whether having static data inside a static inline function (that's potentially expanded many times) is a good idea.

0xffffU, /* PyUnicode_2BYTE_KIND */
0,
0x10ffffU, /* PyUnicode_4BYTE_KIND */
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since kind is 3 bits, maybe add extra zero entries for a total of 8?

@gvanrossum
Copy link
Member

I do think it's odd that it fails on MSVC, since designated initializers in arrays are supported -- see e.g. Include/internal/pycore_opcode.h. I wonder if MSVC doesn't like having it in a static inline function?

@kumaraditya303
Copy link
Contributor Author

Well the benchmark is so small that pgo and lto gives inconsistent result because of cpu boost and binary layout. I reject this as this is insignificant and not worth it.

@kumaraditya303 kumaraditya303 deleted the fast-string branch May 4, 2022 10:40
@gvanrossum
Copy link
Member

Okay, we learned a few things. :-)

  • Use PGO+LTO for benchmarking, even micro-benchmarking.
  • Benchmarking is important.
  • Maybe a static array inside a static inline function isn't a great idea.
  • MSVC seems picky about those.

@kumaraditya303
Copy link
Contributor Author

Use PGO+LTO for benchmarking, even micro-benchmarking.

Yes, but lto takes a lot of time to compile and can vary in results up to 5 to 10%.

Maybe a static array inside a static inline function isn't a great idea.

Linker would de-dedup those in the executable.

MSVC seems picky about those.

MSVC is picky about designated initializers in array in C++ mode, in C mode it works fine.

@gvanrossum
Copy link
Member

Got it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants








ApplySandwichStrip

pFad - (p)hone/(F)rame/(a)nonymizer/(d)eclutterfier!      Saves Data!


--- a PPN by Garber Painting Akron. With Image Size Reduction included!

Fetched URL: http://github.com/python/cpython/pull/92142

Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy