Make RustPython benchmarks readable #5742
Thanks for bringing this up. These issues do need to be addressed. Currently our benchmark graphs are handled by Criterion: https://bheisler.github.io/criterion.rs/book/user_guide/comparing_functions.html. Sadly, from what I know, Criterion doesn't support configuring its graphs. The workflow that updates these benchmarks is `.github/workflows/cron-ci.yaml` (line 111 at commit c97f4d1).
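For reference, here is a minimal sketch of how Criterion compares functions inside a single benchmark group, which is what produces the combined comparison/violin charts linked above. The group name, script path, and interpreter binaries are placeholders, not the repo's actual bench setup:

```rust
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};
use std::process::Command;

// Sketch only: run the same script under both interpreters inside one benchmark
// group so that Criterion's comparison report labels each measurement clearly.
// The script path and binary names are placeholders.
fn bench_interpreters(c: &mut Criterion) {
    let mut group = c.benchmark_group("mandelbrot");
    for binary in ["python3", "rustpython"] {
        group.bench_with_input(BenchmarkId::new(binary, "mandelbrot.py"), &binary, |b, bin| {
            b.iter(|| {
                // Spawning a process per iteration is purely illustrative; the
                // real benches drive the interpreters in-process.
                Command::new(bin)
                    .arg("benches/mandelbrot.py")
                    .status()
                    .expect("failed to run interpreter")
            })
        });
    }
    group.finish();
}

criterion_group!(benches, bench_interpreters);
criterion_main!(benches);
```

Because both measurements live in the same group, Criterion places them on the same comparison chart, which is exactly where the color and labeling issues described in this issue show up.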
Everything else (display-wise) is located here. I think the last 3 bullet points should be very feasible to implement. The graph-related ones could either be done during CI via a custom script or on the frontend with a JS plotting library.
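To make the "custom script during CI" option concrete, here is a hedged sketch that flattens Criterion's per-benchmark estimates into one summary file a frontend plotting library could consume. It assumes Criterion's default on-disk layout (`target/criterion/<bench>/new/estimates.json`, with times in nanoseconds) and a `serde_json` dependency; the output filename is made up:

```rust
// Hypothetical CI post-processing script: collect Criterion's mean estimates
// into one summary file that a frontend plotting library could render.
// Assumes Criterion's default layout: target/criterion/<bench>/new/estimates.json.
use std::fs;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut summary = Vec::new();
    for entry in fs::read_dir("target/criterion")? {
        let dir = entry?.path();
        let estimates = dir.join("new").join("estimates.json");
        if !estimates.is_file() {
            continue; // skips the `report` directory and anything non-benchmark
        }
        let json: serde_json::Value = serde_json::from_str(&fs::read_to_string(&estimates)?)?;
        // The mean execution time (in nanoseconds) sits under mean.point_estimate.
        let mean_ns = json["mean"]["point_estimate"].as_f64().unwrap_or(f64::NAN);
        let name = dir.file_name().unwrap().to_string_lossy().into_owned();
        summary.push(serde_json::json!({ "benchmark": name, "mean_ns": mean_ns }));
    }
    fs::write("benchmark-summary.json", serde_json::to_string_pretty(&summary)?)?;
    Ok(())
}
```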
I don't think anyone needs to be convinced of anything; in my opinion, you use the right tool depending on the project. Maybe it would be a good idea to ask whether RustPython should be included in various distribution tools like pyenv or similar. In any case, I can try to understand and solve this issue.
From what I could understand, Criterion uses a parameter in the function. I'm not experienced enough in Rust to confirm whether this constant can be modified before compile time depending on whether you're testing RustPython or CPython. Even when changing Criterion's backend, this parameter doesn't seem to be available, not to mention criterion-plot (an unmaintained library). One way to change the color scheme appears to be a `[colors]` configuration section:
```toml
[colors]
# These are used in many charts to compare the current measurement against
# the previous one.
current_sample = { r = 31, g = 120, b = 180 }
previous_sample = { r = 7, g = 26, b = 28 }

# These are used by the full PDF chart to highlight which samples were outliers.
not_an_outlier = { r = 31, g = 120, b = 180 }
mild_outlier = { r = 5, g = 127, b = 0 }
severe_outlier = { r = 7, g = 26, b = 28 }

# These are used for the line chart to compare multiple different functions.
comparison_colors = [
    { r = 230, g = 25, b = 75 },   # red
    { r = 60, g = 180, b = 75 },   # green
    { r = 255, g = 225, b = 25 },  # yellow
    { r = 0, g = 130, b = 200 },   # blue
    { r = 245, g = 130, b = 48 },  # orange
    { r = 145, g = 30, b = 180 },  # purple
    { r = 70, g = 240, b = 240 },  # light cyan
    { r = 240, g = 50, b = 230 },  # magenta
]
```
The solution should be to find a library that can replace Criterion as quickly as possible while also producing a similar HTML report.
TL;DR: the benchmarks are poorly readable and could be greatly improved. This is a key element in convincing people of the soundness of RustPython, so it should probably not be neglected IMHO.
The violin plots available here are not easily readable, and their Y-axis labels are hardly readable at all because they get cut off on the left. This is especially troublesome for the `MICROBENCHMARKS` section, for which it is impossible to tell RustPython from CPython. This could be alleviated by doing the following:

- Use a consistent ordering of the interpreters (in the `EXECUTION` tab, CPython is on top and RustPython on the bottom, while in the `PARSE_TO_AST` tab it is the other way around).
- Replace titles such as `execution/mandelbrot.py/cpython` by either `Mandelbrot` (and use a legend to indicate which color is which interpreter), or make a plot title saying `Mandelbrot` and use the Y-axis labels to tell whether it is CPython or RustPython.

In addition to these visual issues, some other improvements could be implemented:
- Use an interactive plotting library such as `plotly`.
- Indicate the compilation flags used locally (e.g. `-o3`), as well as the machine specs; this would allow for meaningful comparison and reproducibility (see the sketch after this list).

I think that benchmarks are one of the key elements that might convince anyone to switch from one interpreter to another (apart from functionality / low-level bindings). Hence they should not be neglected.
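As a rough illustration of the reproducibility bullet, environment metadata could be captured at benchmark time and published alongside the report. This is only a sketch under my own assumptions; the output file name and the probed commands are hypothetical, not part of the existing CI:

```rust
// Hypothetical sketch: record the toolchain, optimization level, and machine
// specs next to the benchmark results so readers can judge comparability.
// The output file name and the probed commands are illustrative only.
use std::{env, fs, process::Command};

fn capture(cmd: &str, args: &[&str]) -> String {
    Command::new(cmd)
        .args(args)
        .output()
        .map(|o| String::from_utf8_lossy(&o.stdout).trim().to_string())
        .unwrap_or_else(|_| String::from("unavailable"))
}

fn main() -> std::io::Result<()> {
    let metadata = format!(
        "rustc: {}\npython: {}\nmachine: {}\nopt-level: {}\n",
        capture("rustc", &["--version"]),
        capture("python3", &["--version"]),
        capture("uname", &["-a"]),
        // Cargo only sets OPT_LEVEL for build scripts, so a plain binary falls
        // back to "unknown" unless the CI job exports it explicitly.
        env::var("OPT_LEVEL").unwrap_or_else(|_| String::from("unknown")),
    );
    fs::write("benchmark-environment.txt", metadata)
}
```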
If someone could point me to where these plots are generated, I'd be happy to help typeset them and add further info (although I might need some technical support on why benchmark X is especially relevant or not).