ENH: support rng for summary_plot #3945
Thanks a lot for the PR, this looks sensible to me. I just need to think about whether we actually want to introduce a keyword that might become redundant once we implement a Generator version for the rng. On the other hand, this is just an optional parameter. Please give me some time to read into this and I will come back to you.
Also about the tests: these fail in all pipelines and are not caused by your changes.
Thanks for the ping. I suppose if I'm here, I'd suggest using one of the legacy names for the keyword unless it is going to follow the SPEC 7 plan. If this keyword is intended to be permanent, a different name might be helpful to signal different behavior from the ecosystem standard; if not (if the library wants to adopt the standard some day, but not yet), it's easier to replace the keyword than to change an existing keyword's behavior in a backward-compatible way.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##           master    #3945      +/-   ##
==========================================
+ Coverage   64.67%   64.68%   +0.01%
==========================================
  Files          92       92
  Lines       12862    12873      +11
==========================================
+ Hits         8318     8327       +9
- Misses       4544     4546       +2
* Some folks on my team are having issues writing tests downstream of `shap` because `summary_plot` mutates the NumPy legacy random machinery global state, which is hard to work around in a large testsuite. It would be great if a heavily-used project in the ML space like `shap` could start to adopt support for local random `Generator` objects per the community document at: https://scientific-python.org/specs/spec-0007/ This PR is a lot cruder than that approach, but appears to allow preservation of the current global state behavior while allowing the option of using the modern non-global approach for downstream developers that need it.
* This patch adds a regression test that fails when not using the optional `rng` argument, and passes when it is used to scope to a local random state. If there is a preference for using the full SPEC 7 approach (and that may very well be the best idea), I may ask one of my team members to help out a bit, since they will benefit from it.
* The full testsuite passed locally via `pytest --import-mode=append`
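The global-state problem described above can be sketched as follows. This is only an illustration under stated assumptions: `plot_like` is a hypothetical stand-in for a plotting helper that shuffles indices (it is not `shap`'s code), and `global_state_unchanged` is a hypothetical check in the spirit of the regression test mentioned in the commit message.

```python
import numpy as np


def plot_like(values, rng=None):
    """Hypothetical stand-in for a plotting helper that shuffles its input order."""
    inds = np.arange(len(values))
    if rng is None:
        # Legacy path: consumes numbers from the global NumPy RNG.
        np.random.shuffle(inds)
    else:
        # Local path: only a Generator built from the caller's argument advances.
        np.random.default_rng(rng).shuffle(inds)
    return inds


def global_state_unchanged(func, *args, **kwargs):
    """Return True if calling func leaves the global NumPy RNG state untouched."""
    before = np.random.get_state()
    func(*args, **kwargs)
    after = np.random.get_state()
    # The state tuple is (name, key array, pos, has_gauss, cached_gaussian).
    return (
        before[0] == after[0]
        and np.array_equal(before[1], after[1])
        and before[2:] == after[2:]
    )


# The legacy path disturbs the global state; the local path does not.
leaks = not global_state_unchanged(plot_like, range(10))
insulated = global_state_unchanged(plot_like, range(10), rng=0)
```

A downstream test suite could run a check like this around any call it suspects of touching the global RNG.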
for more information, see https://pre-commit.ci
* Ignore some `mypy` false positives in `test_summary_plot_seed_insulated`
* Switch from `rng` to `seed` argument name for `summary_legacy()` based on reviewer feedback, to leave an easier route open to SPEC 7.
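As a rough illustration of what a legacy-style `seed` keyword could look like, here is a minimal sketch. The function `shuffled_indices` is hypothetical and is not taken from the patch; it assumes that a provided seed should drive a private `np.random.RandomState`, while omitting it keeps the historical global behavior.

```python
import numpy as np


def shuffled_indices(n, seed=None):
    """Hypothetical helper: `seed` is a legacy-style keyword, not shap's actual code."""
    inds = np.arange(n)
    if seed is None:
        # No seed given: keep the historical behavior and use the global RNG.
        np.random.shuffle(inds)
    else:
        # Seed given: draw from a private RandomState, leaving global state alone.
        np.random.RandomState(seed).shuffle(inds)
    return inds


# Two calls with the same seed give the same permutation.
first = shuffled_indices(10, seed=0)
second = shuffled_indices(10, seed=0)
```

Keeping the name `seed` for this behavior leaves `rng` free for a later SPEC 7 style keyword that accepts `Generator` objects.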
@CloseChoice @connortann I've revised in the following ways, let me know if you'd like any other changes:
Thank you @tylerjereddy for the PR and for bringing SPEC 7 to our attention. It is fantastic to have you as a contributor. I think we should introduce the new `rng` keyword. Initially:
Then in future:
The initial change could be implemented as:

import warnings

import numpy as np


def my_func(*, rng=None):
    if rng is not None:
        # If rng argument is provided, normalise as per SPEC 7
        rng = np.random.default_rng(rng)
    else:
        # Otherwise, maintain backwards compatibility for now, raising a warning if the global seed was set.
        # Note: this check relies on private NumPy attributes, which may change between versions.
        global_seed_set = np.random.mtrand._rand._bit_generator._seed_seq is None
        if global_seed_set:
            msg = (
                "The NumPy global RNG was seeded by calling `np.random.seed`. "
                "In a future version this function will no longer use the global RNG. "
                "Pass `rng` explicitly to opt-in to the new behaviour and silence this warning."
            )
            warnings.warn(msg, FutureWarning, stacklevel=2)
    # Example use
    inds = np.arange(10)
    if rng is None:
        # For now, maintain backwards compatibility.
        np.random.shuffle(inds)
    else:
        rng.shuffle(inds)

Then at some point in future we'd just have:

def my_func(*, rng=None):
    rng = np.random.default_rng(rng)
    inds = np.arange(10)
    rng.shuffle(inds)

PS - @tylerjereddy you might be interested in the contributor discussion #3559
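For readers less familiar with the `np.random.default_rng(rng)` normalisation used in the proposal above, the following sketch shows how it treats the three common kinds of input. The seed value `123` and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# An int seed gives a reproducible, independent Generator each time.
a = np.random.default_rng(123).integers(0, 100, size=5)
b = np.random.default_rng(123).integers(0, 100, size=5)
same_stream = bool(np.array_equal(a, b))

# An existing Generator is passed through unchanged, so the callee
# advances the caller's stream rather than creating a new one.
gen = np.random.default_rng(123)
passed_through = np.random.default_rng(gen) is gen

# None asks NumPy to seed a new Generator from fresh OS entropy.
fresh = np.random.default_rng(None)
```

Because a `Generator` passes through unchanged, a caller can hand the same object to several functions and keep one reproducible stream without touching the global state.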
* Some folks on my team are having issues writing tests downstream of `shap` because `summary_plot` mutates the NumPy legacy random machinery global state, which is hard to work around in a large testsuite. It would be great if a heavily-used project in the ML space like `shap` could start to adopt support for local random `Generator` objects per the community document at https://scientific-python.org/specs/spec-0007/. This PR is a lot cruder than that approach, but appears to allow preservation of the current global state behavior while allowing the option of using the modern non-global approach for downstream developers that need it.
* This patch adds a regression test that fails when not using the optional `rng` argument, and passes when it is used to scope to a local random state. If there is a preference for using the full SPEC 7 approach (and that may very well be the best idea), I may ask one of my team members to help out a bit, since they will benefit from it.
* The full testsuite passed locally via `pytest --import-mode=append`
Maybe I'll cc @mdhaber @tupui -- not expecting a review, but because they also work in stats/ML space and were involved in the SPEC.