Add a remote-vllm integration test to GitHub Actions workflow. #1648

Open

wukaixingxp opened this issue Mar 14, 2025 · 3 comments
Labels: api provider, enhancement

Comments

wukaixingxp (Contributor) commented Mar 14, 2025

🚀 Describe the new functionality needed

Given that vLLM has become a very popular choice for inference, I would like to suggest adding a remote-vllm integration test to the GitHub Actions workflow. Testing the CPU version of vLLM on a 1B/3B model would probably be enough, similar to the PR that added the Ollama test. A rough sketch of what such a job could look like is below (the job layout, model choice, readiness check, and test command are my assumptions, not a final design):
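
```yaml
# Sketch only: step names, the readiness check, and the pytest invocation are assumptions.
name: Remote vLLM Integration Tests

on:
  pull_request:
  push:
    branches: [main]

jobs:
  test-remote-vllm:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v5

      - name: Start vLLM (CPU) with a small model
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}  # needed to download the gated meta-llama weights
        run: |
          # Serve a small Llama model in the background; CPU startup can be slow.
          nohup uv run --with vllm --python 3.12 \
            vllm serve meta-llama/Llama-3.2-1B-Instruct --port 8000 > vllm.log 2>&1 &
          # Wait until the OpenAI-compatible endpoint answers.
          timeout 900 bash -c 'until curl -sf http://localhost:8000/v1/models > /dev/null; do sleep 10; done'

      - name: Run remote-vllm integration tests
        env:
          VLLM_URL: http://localhost:8000/v1
        run: |
          # Hypothetical test entry point; the real suite and flags may differ.
          uv run pytest -sv tests/integration/inference --stack-config=remote-vllm
```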

💡 Why is this needed? What if we don't build it?

Without such a test, the vLLM provider may break unnoticed, and many users and companies would not be able to use llama-stack with vLLM.

Other thoughts

This will add some inference cost to CI, but I believe making sure the vLLM provider works well with llama-stack is very important.

wukaixingxp added the api provider and enhancement labels Mar 14, 2025
ashwinb (Contributor) commented Mar 14, 2025

Here's how you can run vLLM easily enough:

uv run --with vllm --python 3.12 vllm serve meta-llama/Llama-3.2-3B-Instruct

This probably needs a Hugging Face token with permission to read the protected Llama repository, though :/

nathan-weinberg (Contributor):
Couldn't you just use a non-Llama model that doesn't require a Hugging Face token? Or are only Llama models supported with the vLLM provider?

wukaixingxp (Contributor, Author):
> Here's how you can run vLLM easily enough:
>
> uv run --with vllm --python 3.12 vllm serve meta-llama/Llama-3.2-3B-Instruct
>
> This probably needs a Hugging Face token with permission to read the protected Llama repository, though :/

Do we have a way to store secrets in GitHub Actions? I also wonder how we are testing the meta-reference server, as it also needs some credentials to fetch our PyTorch weights. If repository or organization secrets are an option, I imagine the workflow step could read the token roughly like this (the HF_TOKEN secret name is just an assumption):
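
```yaml
# Sketch: exposing a repo/org secret to the vLLM step; the secret name is an assumption.
- name: Serve Llama-3.2-3B-Instruct with vLLM
  env:
    HF_TOKEN: ${{ secrets.HF_TOKEN }}  # picked up by huggingface_hub when downloading gated models
  run: |
    uv run --with vllm --python 3.12 vllm serve meta-llama/Llama-3.2-3B-Instruct
```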
