-
-
Notifications
You must be signed in to change notification settings - Fork 7.7k
[Misc] Add Ray Prometheus logger to V1 #17925
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Misc] Add Ray Prometheus logger to V1 #17925
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM. Mainly coding style and structure discussions.
Also cc @markmc
This pull request has merge conflicts that must be resolved before it can be |
05b63be
to
1c773e2
Compare
fc1838a
to
b793da4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No fundamental objection to adding a vllm.v1.metrics.ray.RayPrometheusStatLogger
custom logger
I'd prefer not to add the extra PrometheusMetrics
indirection
And ~90% of the diff is unnecessary reformatting of code which makes it more difficult to review
Thanks for the review @markmc! Apologies for reformatting changes. My pre-commit hook may have been configured incorrectly. |
a923f87
to
a41f879
Compare
Hey @eicherseiji I made some changes to the interface of PrometheusStatLogger that I think will conflict your stuff, but at the same time it will make this PR easier to write and review if you incorporate those changes. for easier review https://github.com/njhill/vllm/pull/6/files but this one is actually against master (which includes other stuff) #18053 |
Done. Thanks @kouroshHakha |
ece396d
to
3079a87
Compare
Starting with a rebase due to test failure:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty happy to merge the ray stuff, but there's still a lot of unrelated changes I'd like to see removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again @markmc for the review! Going to remove mentions to multiprocess_mode
and unnecessary type hint changes for now.
Let me know what the recommendation is for Ray wrapper class injection (should we prefer static class variables for now?)
What you had the first time around with the static class variables was fine - I just asked to remove the extra |
858f20b
to
a0f1b00
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks for your patience. Just some questions on the test you added
tests/metrics/test_ray_metrics.py
Outdated
|
||
# Create the actor and call the async method | ||
actor = EngineTestActor.remote() # type: ignore[attr-defined] | ||
ray.get(actor.run.remote()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're not actually verifying metrics behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a smoke test for now; added a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future it would be helpful to verify output to avoid regressions. But setting up a ray cluster in the test environment could be non-trivial.
So I think that ROI is okay with this to start, given the current design (class injection with a wrapper that's verified in ray-project).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Responded to comments @markmc! 🙏
tests/metrics/test_ray_metrics.py
Outdated
|
||
# Create the actor and call the async method | ||
actor = EngineTestActor.remote() # type: ignore[attr-defined] | ||
ray.get(actor.run.remote()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a smoke test for now; added a comment.
tests/metrics/test_ray_metrics.py
Outdated
|
||
# Create the actor and call the async method | ||
actor = EngineTestActor.remote() # type: ignore[attr-defined] | ||
ray.get(actor.run.remote()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future it would be helpful to verify output to avoid regressions. But setting up a ray cluster in the test environment could be non-trivial.
So I think that ROI is okay with this to start, given the current design (class injection with a wrapper that's verified in ray-project).
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
4af2dd1
to
ffc3331
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again for your patience!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stamp
Signed-off-by: Seiji Eicher <seiji@anyscale.com> Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com> Signed-off-by: Yuqi Zhang <yuqizhang@google.com>
Implements the following item from #10582:
Tested with V1 as follows (with ray-project/ray#52719):
VLLM_USE_V1=1 python serve.py
Ray dashboard output:
