forked from BerriAI/litellm

Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs)


hyamanieu/litellm


🚅 LiteLLM

Call all LLM APIs using the OpenAI format [Bedrock, Huggingface, Cohere, TogetherAI, Azure, OpenAI, etc.]

LiteLLM manages:

  • Translating inputs to the provider's completion and embedding endpoints
  • Guaranteeing consistent output: text responses are always available at ['choices'][0]['message']['content']
  • Exception mapping: common exceptions across providers are mapped to the OpenAI exception types
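
As a minimal sketch of the consistent-output guarantee, the helper below reads the reply text from the path named above. The function name `extract_text` and the hand-written `fake_response` are illustrative assumptions, not part of LiteLLM; the dictionary only mimics the OpenAI response shape so the snippet runs offline.

```python
from typing import Any, Mapping


def extract_text(response: Mapping[str, Any]) -> str:
    """Return the reply text from an OpenAI-format response."""
    return response["choices"][0]["message"]["content"]


# Hand-written stand-in for a provider response (not real model output).
fake_response = {
    "choices": [{"message": {"role": "assistant", "content": "Hello!"}}]
}

print(extract_text(fake_response))
```

Because every provider's reply is expected at the same path, one accessor like this can serve all models.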

10/05/2023: LiteLLM is adopting Semantic Versioning for all commits. Learn more
10/16/2023: Self-hosted OpenAI-proxy server Learn more

Usage (Docs)

Important

LiteLLM v1.0.0 now requires openai>=1.0.0. Migration guide here

pip install litellm

from litellm import completion
import os

## set ENV variables 
os.environ["OPENAI_API_KEY"] = "your-openai-key" 
os.environ["COHERE_API_KEY"] = "your-cohere-key" 

messages = [{ "content": "Hello, how are you?","role": "user"}]

# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)

# cohere call
response = completion(model="command-nightly", messages=messages)
print(response)
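
Since every provider is called with the same `completion(model=..., messages=...)` signature, sending one prompt to several models reduces to a loop. The sketch below is one possible way to do that; `ask_models` and `fake_complete` are hypothetical names introduced here, and the fake lets the example run without LiteLLM installed or API keys set.

```python
from typing import Any, Callable, Dict, List, Optional

Message = Dict[str, str]


def ask_models(
    models: List[str],
    messages: List[Message],
    complete: Optional[Callable[..., Any]] = None,
) -> Dict[str, str]:
    """Send the same messages to each model and collect the text replies."""
    if complete is None:
        # Real calls need `pip install litellm` and the provider API keys.
        from litellm import completion as complete
    answers = {}
    for model in models:
        response = complete(model=model, messages=messages)
        answers[model] = response["choices"][0]["message"]["content"]
    return answers


def fake_complete(model: str, messages: List[Message]) -> Dict[str, Any]:
    """Offline stand-in that echoes the model name (hypothetical)."""
    reply = {"role": "assistant", "content": f"reply from {model}"}
    return {"choices": [{"message": reply}]}


messages = [{"content": "Hello, how are you?", "role": "user"}]
answers = ask_models(
    ["gpt-3.5-turbo", "command-nightly"], messages, complete=fake_complete
)
print(answers)
```

Dropping the `complete=fake_complete` argument makes the function fall back to `litellm.completion`.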

Streaming (Docs)

LiteLLM supports streaming the model response back. Pass stream=True to get a streaming iterator in the response.
Streaming is supported for all models (Bedrock, Huggingface, TogetherAI, Azure, OpenAI, etc.).

from litellm import completion

messages = [{"content": "Hello, how are you?", "role": "user"}]

# openai call, printed chunk by chunk
response = completion(model="gpt-3.5-turbo", messages=messages, stream=True)
for chunk in response:
    print(chunk['choices'][0]['delta'])

# claude 2
result = completion(model="claude-2", messages=messages, stream=True)
for chunk in result:
    print(chunk['choices'][0]['delta'])
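
To rebuild the full reply from a stream, the deltas can be concatenated. This is a sketch under the assumption that each delta behaves like a dict with an optional `content` key, as in the OpenAI streaming format; `collect_stream` and `fake_chunks` are illustrative names, and the chunks are hand-written rather than real model output.

```python
from typing import Any, Iterable, Mapping


def collect_stream(chunks: Iterable[Mapping[str, Any]]) -> str:
    """Join the text pieces carried by a sequence of streaming chunks."""
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        text = delta.get("content")  # may be missing or None on some chunks
        if text:
            parts.append(text)
    return "".join(parts)


# Hand-written chunks standing in for a real streaming iterator.
fake_chunks = [
    {"choices": [{"delta": {"role": "assistant"}}]},
    {"choices": [{"delta": {"content": "Hel"}}]},
    {"choices": [{"delta": {"content": "lo"}}]},
    {"choices": [{"delta": {}}]},
]

print(collect_stream(fake_chunks))
```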

OpenAI Proxy - (Docs)

If you want to use non-OpenAI models in an OpenAI code base, you can use the LiteLLM proxy. It creates a server to call 100+ LLMs (Huggingface/Bedrock/TogetherAI/etc.) in the OpenAI ChatCompletions & Completions format.

Step 1: Start litellm proxy

$ litellm --model huggingface/bigcode/starcoder

#INFO: Proxy running on http://0.0.0.0:8000

Step 2: Replace openai base

import openai # openai v1.0.0+
client = openai.OpenAI(api_key="anything",base_url="http://0.0.0.0:8000") # set proxy to base_url
# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(model="gpt-3.5-turbo", messages = [
    {
        "role": "user",
        "content": "this is a test request, write a short poem"
    }
])

print(response)
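
For callers that do not use the `openai` package, the same request can be assembled by hand. The sketch below only builds the HTTP request and does not send it. It assumes the proxy accepts a POST at `/chat/completions` under the base URL, which is the path the `openai` v1 client appends to `base_url`; `build_chat_request` is a hypothetical helper name.

```python
import json
import urllib.request
from typing import Dict, List


def build_chat_request(
    base_url: str,
    model: str,
    messages: List[Dict[str, str]],
    api_key: str = "anything",
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-format chat completion request."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",  # assumed path
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_chat_request(
    "http://0.0.0.0:8000",
    "gpt-3.5-turbo",
    [{"role": "user", "content": "this is a test request, write a short poem"}],
)
print(request.get_method(), request.full_url)
# With the proxy running, send it with: urllib.request.urlopen(request)
```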

Logging Observability (Docs)

LiteLLM exposes predefined callbacks to send data to Langfuse, LLMonitor, Helicone, Promptlayer, Traceloop, and Slack.

import os

import litellm
from litellm import completion

## set env variables for logging tools
os.environ["LANGFUSE_PUBLIC_KEY"] = ""
os.environ["LANGFUSE_SECRET_KEY"] = ""
os.environ["LLMONITOR_APP_ID"] = "your-llmonitor-app-id"

os.environ["OPENAI_API_KEY"] = "your-openai-key"

# set callbacks
litellm.success_callback = ["langfuse", "llmonitor"] # log input/output to langfuse, llmonitor

# openai call
response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}])
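
One way to avoid enabling a callback whose credentials are missing is to derive the callback list from the environment. This is a sketch only: `REQUIRED_ENV` and `enabled_callbacks` are hypothetical names, and the mapping from callback names to variables is taken from the snippet above rather than from a LiteLLM API.

```python
import os
from typing import Dict, List, Mapping

# Callback name -> environment variables it needs (from the snippet above).
REQUIRED_ENV: Dict[str, List[str]] = {
    "langfuse": ["LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY"],
    "llmonitor": ["LLMONITOR_APP_ID"],
}


def enabled_callbacks(env: Mapping[str, str]) -> List[str]:
    """Return the callbacks whose required variables are all set and non-empty."""
    return [
        name
        for name, keys in REQUIRED_ENV.items()
        if all(env.get(key) for key in keys)
    ]


# Example environment: the Langfuse secret key is empty, so only llmonitor qualifies.
example_env = {
    "LANGFUSE_PUBLIC_KEY": "pk",
    "LANGFUSE_SECRET_KEY": "",
    "LLMONITOR_APP_ID": "app",
}
print(enabled_callbacks(example_env))
# With litellm installed: litellm.success_callback = enabled_callbacks(os.environ)
```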

Supported Providers (Docs)

| Provider | Completion | Streaming | Async Completion | Async Streaming |
|---|---|---|---|---|
| openai | ✅ | ✅ | ✅ | ✅ |
| azure | ✅ | ✅ | ✅ | ✅ |
| aws - sagemaker | ✅ | ✅ | ✅ | ✅ |
| aws - bedrock | ✅ | ✅ | ✅ | ✅ |
| cohere | ✅ | ✅ | ✅ | ✅ |
| anthropic | ✅ | ✅ | ✅ | ✅ |
| huggingface | ✅ | ✅ | ✅ | ✅ |
| replicate | ✅ | ✅ | ✅ | ✅ |
| together_ai | ✅ | ✅ | ✅ | ✅ |
| openrouter | ✅ | ✅ | ✅ | ✅ |
| google - vertex_ai | ✅ | ✅ | ✅ | ✅ |
| google - palm | ✅ | ✅ | ✅ | ✅ |
| ai21 | ✅ | ✅ | ✅ | ✅ |
| baseten | ✅ | ✅ | ✅ | ✅ |
| vllm | ✅ | ✅ | ✅ | ✅ |
| nlp_cloud | ✅ | ✅ | ✅ | ✅ |
| aleph alpha | ✅ | ✅ | ✅ | ✅ |
| petals | ✅ | ✅ | ✅ | ✅ |
| ollama | ✅ | ✅ | ✅ | ✅ |
| deepinfra | ✅ | ✅ | ✅ | ✅ |
| perplexity-ai | ✅ | ✅ | ✅ | ✅ |
| anyscale | ✅ | ✅ | ✅ | ✅ |
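
The "Async Completion" column can be put to use by querying several providers concurrently. The sketch below assumes LiteLLM's `acompletion` coroutine, which this README does not show, takes the same `model` and `messages` arguments as `completion`. `ask_concurrently` and `fake_acomplete` are hypothetical names, and the fake keeps the example runnable offline.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Optional

Message = Dict[str, str]


async def ask_concurrently(
    models: List[str],
    messages: List[Message],
    acomplete: Optional[Callable[..., Awaitable[Any]]] = None,
) -> Dict[str, str]:
    """Query all models at once and map each model name to its reply text."""
    if acomplete is None:
        # Real calls need `pip install litellm` and the provider API keys.
        from litellm import acompletion as acomplete
    responses = await asyncio.gather(
        *(acomplete(model=model, messages=messages) for model in models)
    )
    return {
        model: response["choices"][0]["message"]["content"]
        for model, response in zip(models, responses)
    }


async def fake_acomplete(model: str, messages: List[Message]) -> Dict[str, Any]:
    """Offline stand-in that echoes the model name (hypothetical)."""
    await asyncio.sleep(0)
    reply = {"role": "assistant", "content": f"reply from {model}"}
    return {"choices": [{"message": reply}]}


messages = [{"content": "Hello, how are you?", "role": "user"}]
answers = asyncio.run(
    ask_concurrently(["gpt-3.5-turbo", "claude-2"], messages, acomplete=fake_acomplete)
)
print(answers)
```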

Read the Docs

Contributing

To contribute: Clone the repo locally -> Make a change -> Submit a PR with the change.

Here's how to modify the repo locally:

Step 1: Clone the repo

git clone https://github.com/BerriAI/litellm.git

Step 2: Navigate into the project, and install dependencies:

cd litellm
poetry install

Step 3: Test your change:

cd litellm/tests # pwd: Documents/litellm/litellm/tests
pytest .

Step 4: Submit a PR with your changes! 🚀

  • push your fork to your GitHub repo
  • submit a PR from there

Support / talk with founders

Why did we build this

  • Need for simplicity: Our code started to get extremely complicated managing & translating calls between Azure, OpenAI and Cohere.

Contributors
