🚀 Describe the new functionality needed
Given that vLLM has become a very popular choice as an inference solution, I would like to suggest we add a remote-vllm integration test to the GitHub Actions workflow, similar to the PR that added the Ollama test. Testing the CPU version of vLLM on a 1B/3B model should be enough; a rough sketch of such a job is included at the end of this description.
💡 Why is this needed? What if we don't build it?
Without such a test, the vLLM provider may break unnoticed, and many users/companies would not be able to use llama-stack with vLLM.
Other thoughts
This will add some inference cost to CI, but I believe making sure the vLLM provider works well with llama-stack is very important.
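For illustration, here is a minimal sketch of what such a CI job could run, assuming a CPU build of vLLM and a small instruct model; the pytest path and selector below are placeholders, not the actual llama-stack test layout:

# Start a small Llama model on CPU with vLLM's OpenAI-compatible server
uv run --with vllm --python 3.12 vllm serve meta-llama/Llama-3.2-1B-Instruct --device cpu --port 8000 &
# Wait until the server lists its models before running the suite
until curl -sf http://localhost:8000/v1/models > /dev/null; do sleep 5; done
# Run the remote-vllm integration tests against it (hypothetical path/selector)
uv run pytest tests/integration/inference -k vllm

The readiness loop matters here, since loading even a 1B model on CPU can take a while.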
uv run --with vllm --python 3.12 vllm serve meta-llama/Llama-3.2-3B-Instruct
This probably needs a Hugging Face token with permission to read the protected Llama repository, though :/
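Once that server is up (it listens on port 8000 by default), a quick check against vLLM's OpenAI-compatible API is enough to confirm it is serving before pointing the remote-vllm provider at it; the request below is just an example:

# List the served models
curl -sf http://localhost:8000/v1/models
# Minimal chat completion to confirm inference works end to end
curl -sf http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Llama-3.2-3B-Instruct", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 8}'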
Do we have a way to store secrets in the GitHub Action? I also wonder how we are testing the meta-reference server, since it also needs some credentials to get our PyTorch weights.
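For reference, GitHub Actions supports encrypted repository secrets, so one option is to inject the token as an environment variable on the step; huggingface_hub, which vLLM uses for downloads, picks up HF_TOKEN automatically. A sketch with assumed names:

# In the workflow YAML, the step would set something like:
#   env:
#     HF_TOKEN: ${{ secrets.HF_TOKEN }}
# The script then only needs the variable to be present:
: "${HF_TOKEN:?HF_TOKEN must be injected from the repository secret}"
uv run --with vllm --python 3.12 vllm serve meta-llama/Llama-3.2-3B-Instruct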