feat: CentML AI Inference Provider Integration #810

Open · wants to merge 17 commits into main from v2ark/add_centml
Conversation

@V2arK commented Jan 17, 2025

What does this PR do?

Add CentML as a Remote Inference Provider in llama-stack.

This PR integrates CentML into llama-stack so users can run CentML-hosted models (meta-llama/Llama-3.3-70B-Instruct and meta-llama/Llama-3.2-3B-Instruct) for inference tasks such as chat and text completion.

Currently this is supported only for conda deployments: build with llama stack build --template centml --image-type conda, run with llama stack run run.yaml --port <PORT> --env CENTML_API_KEY=<API_KEY>, and then use llama-stack-client to perform any inference workload as needed.
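For example, once the server is running, a chat completion can be issued from Python (a minimal sketch; the base URL matches the manual run below, and the message is illustrative; check your llama-stack-client version for the exact signature):

from llama_stack_client import LlamaStackClient

# Point the client at the locally running llama-stack server.
client = LlamaStackClient(base_url="http://localhost:5001")

# Ask one of the CentML-served models for a chat completion.
response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",
    messages=[{"role": "user", "content": "hello, what model are you"}],
)
print(response.completion_message.content)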

Key Changes:

  • Added CentML as a remote inference provider with model support for meta-llama/Llama-3.3-70B-Instruct and meta-llama/Llama-3.2-3B-Instruct.

Addresses issue #809


Test Plan

pytest -s -v --stack-config inference=centml ./tests/integration/inference/test_text_inference.py --text-model "meta-llama/Llama-3.2-3B-Instruct" --env CENTML_API_KEY=*********
/Users/honglin/.pyenv/versions/3.12.0/lib/python3.12/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
================================================ test session starts ================================================
****
configfile: pyproject.toml
plugins: html-4.1.1, metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0
asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=None
collected 16 items                                                                                                  

tests/integration/inference/test_text_inference.py::test_text_completion_non_streaming[txt=3B-inference:completion:sanity] PASSED
tests/integration/inference/test_text_inference.py::test_text_completion_streaming[txt=3B-inference:completion:sanity] PASSED
tests/integration/inference/test_text_inference.py::test_text_completion_log_probs_non_streaming[txt=3B-inference:completion:log_probs] PASSED
tests/integration/inference/test_text_inference.py::test_text_completion_log_probs_streaming[txt=3B-inference:completion:log_probs] PASSED
tests/integration/inference/test_text_inference.py::test_text_completion_structured_output[txt=3B-inference:completion:structured_output] PASSED
tests/integration/inference/test_text_inference.py::test_text_chat_completion_non_streaming[txt=3B-inference:chat_completion:non_streaming_01] PASSED
tests/integration/inference/test_text_inference.py::test_text_chat_completion_non_streaming[txt=3B-inference:chat_completion:non_streaming_02] PASSED
tests/integration/inference/test_text_inference.py::test_text_chat_completion_streaming[txt=3B-inference:chat_completion:streaming_01] PASSED
tests/integration/inference/test_text_inference.py::test_text_chat_completion_streaming[txt=3B-inference:chat_completion:streaming_02] PASSED
tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_non_streaming[txt=3B-inference:chat_completion:tool_calling] PASSED
tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_streaming[txt=3B-inference:chat_completion:tool_calling] PASSED
tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_tool_choice_required[txt=3B-inference:chat_completion:tool_calling] PASSED
tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_tool_choice_none[txt=3B-inference:chat_completion:tool_calling] PASSED
tests/integration/inference/test_text_inference.py::test_text_chat_completion_structured_output[txt=3B-inference:chat_completion:structured_output] PASSED
tests/integration/inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request[txt=3B-inference:chat_completion:tool_calling_tools_absent-True] PASSED
tests/integration/inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request[txt=3B-inference:chat_completion:tool_calling_tools_absent-False] PASSED

========================================== 16 passed, 2 warnings in 9.74s ===========================================


---

Manual running

llama stack build --template centml --image-type conda
llama stack run <run.yaml> --image-name centml --port 5001 --env CENTML_API_KEY=<API_KEY>

INFO:     Started server process [75830]
INFO:     Waiting for application startup.
INFO:     ASGI 'lifespan' protocol appears unsupported.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://['::', '0.0.0.0']:5001 (Press CTRL+C to quit)

02:13:02.671 [START] /v1/inference/chat-completion
INFO     2025-03-12 22:13:03,656 httpx:1025 uncategorized: HTTP Request: POST                                           
         https://api.centml.com/openai/v1/completions "HTTP/1.1 200 OK"                                                 
INFO:     ::1:58547 - "POST /v1/inference/chat-completion HTTP/1.1" 200 OK
02:13:03.669 [END] /v1/inference/chat-completion [StatusCode.OK] (997.47ms)
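The httpx log line above shows the provider forwarding the request to CentML's OpenAI-compatible endpoint. Conceptually, the adapter does something like the following; this is an illustrative sketch using the openai client against the endpoint seen in the log, not the actual adapter code, and the model id is assumed:

import os
from openai import OpenAI

# Endpoint taken from the request log above; the adapter authenticates
# with the CENTML_API_KEY passed at server startup.
client = OpenAI(
    base_url="https://api.centml.com/openai/v1",
    api_key=os.environ["CENTML_API_KEY"],
)

# A plain text completion against an assumed CentML model id.
resp = client.completions.create(
    model="meta-llama/Llama-3.2-3B-Instruct",
    prompt="hello, what model are you",
    max_tokens=64,
)
print(resp.choices[0].text)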
llama-stack-client inference chat-completion --message "hello, what model are you" --model-id meta-llama/Llama-3.2-3B-Instruct
ChatCompletionResponse(
    completion_message=CompletionMessage(
        content="Hello! I'm an AI, which stands for Artificial Intelligence. I'm a",
        role='assistant',
        stop_reason='out_of_tokens',
        tool_calls=[]
    ),
    logprobs=None,
    metrics=None
)

Sources

  • Related Issue: #809


Before submitting

@facebook-github-bot

Hi @V2arK!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

@facebook-github-bot added the "CLA Signed" label (managed by the Meta Open Source bot) on Jan 17, 2025
@V2arK marked this pull request as ready for review on January 21, 2025 at 22:03
@V2arK requested a review from ehhuang as a code owner on February 4, 2025 at 21:22
@V2arK force-pushed the v2ark/add_centml branch from 97b745a to 9dcf238 on March 4, 2025 at 15:25
@V2arK requested a review from terrytangyuan as a code owner on March 4, 2025 at 15:25
@V2arK changed the title from "CentML AI Inference Provider Integration" to "feat: CentML AI Inference Provider Integration" on Mar 4, 2025
@V2arK requested a review from SLR722 as a code owner on March 6, 2025 at 00:37
@V2arK force-pushed the v2ark/add_centml branch from ae90822 to 12a50a9 on March 6, 2025 at 00:43
@V2arK force-pushed the v2ark/add_centml branch from 12a50a9 to e2290a0 on March 11, 2025 at 20:37