feat: CentML AI Inference Provider Integration #810

Open · wants to merge 17 commits into main from v2ark/add_centml
Conversation

@V2arK commented Jan 17, 2025

What does this PR do?

Add CentML as a Remote Inference Provider in llama-stack.

This PR integrates CentML into llama-stack so users can run CentML-hosted models (meta-llama/Llama-3.3-70B-Instruct and meta-llama/Llama-3.2-3B-Instruct) for inference tasks such as chat and text completion.

Currently this is supported only for conda deployments: build with llama stack build --template centml --image-type conda, run with llama stack run run.yaml --port <PORT> --env CENTML_API_KEY=<API_KEY>, and then use llama-stack-client to perform any inference workload as needed.
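For example, once the server is running, a chat completion can be issued from Python (a minimal sketch; the base URL matches the manual run below, and the message is illustrative; check your llama-stack-client version for the exact signature):

from llama_stack_client import LlamaStackClient

# Point the client at the locally running llama-stack server.
client = LlamaStackClient(base_url="http://localhost:5001")

# Ask one of the CentML-served models for a chat completion.
response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",
    messages=[{"role": "user", "content": "hello, what model are you"}],
)
print(response.completion_message.content)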

Key Changes:

  • Added CentML as a remote inference provider with model support for meta-llama/Llama-3.3-70B-Instruct and meta-llama/Llama-3.2-3B-Instruct.

Addresses issue #809


Test Plan

pytest -s -v --stack-config inference=centml ./tests/integration/inference/test_text_inference.py --text-model "meta-llama/Llama-3.2-3B-Instruct" --env CENTML_API_KEY=*********
/Users/honglin/.pyenv/versions/3.12.0/lib/python3.12/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
================================================ test session starts ================================================
****
configfile: pyproject.toml
plugins: html-4.1.1, metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0
asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=None
collected 16 items                                                                                                  

tests/integration/inference/test_text_inference.py::test_text_completion_non_streaming[txt=3B-inference:completion:sanity] PASSED
tests/integration/inference/test_text_inference.py::test_text_completion_streaming[txt=3B-inference:completion:sanity] PASSED
tests/integration/inference/test_text_inference.py::test_text_completion_log_probs_non_streaming[txt=3B-inference:completion:log_probs] PASSED
tests/integration/inference/test_text_inference.py::test_text_completion_log_probs_streaming[txt=3B-inference:completion:log_probs] PASSED
tests/integration/inference/test_text_inference.py::test_text_completion_structured_output[txt=3B-inference:completion:structured_output] PASSED
tests/integration/inference/test_text_inference.py::test_text_chat_completion_non_streaming[txt=3B-inference:chat_completion:non_streaming_01] PASSED
tests/integration/inference/test_text_inference.py::test_text_chat_completion_non_streaming[txt=3B-inference:chat_completion:non_streaming_02] PASSED
tests/integration/inference/test_text_inference.py::test_text_chat_completion_streaming[txt=3B-inference:chat_completion:streaming_01] PASSED
tests/integration/inference/test_text_inference.py::test_text_chat_completion_streaming[txt=3B-inference:chat_completion:streaming_02] PASSED
tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_non_streaming[txt=3B-inference:chat_completion:tool_calling] PASSED
tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_streaming[txt=3B-inference:chat_completion:tool_calling] PASSED
tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_tool_choice_required[txt=3B-inference:chat_completion:tool_calling] PASSED
tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_tool_choice_none[txt=3B-inference:chat_completion:tool_calling] PASSED
tests/integration/inference/test_text_inference.py::test_text_chat_completion_structured_output[txt=3B-inference:chat_completion:structured_output] PASSED
tests/integration/inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request[txt=3B-inference:chat_completion:tool_calling_tools_absent-True] PASSED
tests/integration/inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request[txt=3B-inference:chat_completion:tool_calling_tools_absent-False] PASSED

========================================== 16 passed, 2 warnings in 9.74s ===========================================


---

Manual running

llama stack build --template centml --image-type conda
llama stack run <run.yaml> --image-name centml --port 5001 --env CENTML_API_KEY=<API_KEY>

INFO:     Started server process [75830]
INFO:     Waiting for application startup.
INFO:     ASGI 'lifespan' protocol appears unsupported.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://['::', '0.0.0.0']:5001 (Press CTRL+C to quit)

02:13:02.671 [START] /v1/inference/chat-completion
INFO     2025-03-12 22:13:03,656 httpx:1025 uncategorized: HTTP Request: POST                                           
         https://api.centml.com/openai/v1/completions "HTTP/1.1 200 OK"                                                 
INFO:     ::1:58547 - "POST /v1/inference/chat-completion HTTP/1.1" 200 OK
02:13:03.669 [END] /v1/inference/chat-completion [StatusCode.OK] (997.47ms)
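The httpx log line above shows the provider forwarding the request to CentML's OpenAI-compatible endpoint. Conceptually, the adapter does something like the following; this is an illustrative sketch using the openai client against the endpoint seen in the log, not the actual adapter code, and the model id is assumed:

import os
from openai import OpenAI

# Endpoint taken from the request log above; the adapter authenticates
# with the CENTML_API_KEY passed at server startup.
client = OpenAI(
    base_url="https://api.centml.com/openai/v1",
    api_key=os.environ["CENTML_API_KEY"],
)

# A plain text completion against an assumed CentML model id.
resp = client.completions.create(
    model="meta-llama/Llama-3.2-3B-Instruct",
    prompt="hello, what model are you",
    max_tokens=64,
)
print(resp.choices[0].text)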
llama-stack-client inference chat-completion --message "hello, what model are you" --model-id meta-llama/Llama-3.2-3B-Instruct
ChatCompletionResponse(
    completion_message=CompletionMessage(
        content="Hello! I'm an AI, which stands for Artificial Intelligence. I'm a",
        role='assistant',
        stop_reason='out_of_tokens',
        tool_calls=[]
    ),
    logprobs=None,
    metrics=None
)

Sources

  • Related Issue: #809


Before submitting

@facebook-github-bot

Hi @V2arK!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

@facebook-github-bot added the "CLA Signed" label (managed by the Meta Open Source bot) on Jan 17, 2025
@V2arK marked this pull request as ready for review on January 21, 2025 at 22:03
@V2arK requested a review from ehhuang as a code owner on February 4, 2025 at 21:22
@V2arK force-pushed the v2ark/add_centml branch from 97b745a to 9dcf238 on March 4, 2025 at 15:25
@V2arK requested a review from terrytangyuan as a code owner on March 4, 2025 at 15:25
@V2arK changed the title from "CentML AI Inference Provider Integration" to "feat: CentML AI Inference Provider Integration" on Mar 4, 2025
@V2arK requested a review from SLR722 as a code owner on March 6, 2025 at 00:37
@V2arK force-pushed the v2ark/add_centml branch from ae90822 to 12a50a9 on March 6, 2025 at 00:43
@V2arK force-pushed the v2ark/add_centml branch from 12a50a9 to e2290a0 on March 11, 2025 at 20:37