runpod-workers/worker-infinity-embedding

High-throughput, OpenAI-compatible text embedding & reranker powered by Infinity

  1. Quickstart
  2. Endpoint Configuration
  3. API Specification
    1. List Models
    2. Create Embeddings
    3. Rerank Documents
  4. Usage
  5. Further Documentation
  6. Acknowledgements

Quickstart

  1. 🐳 Pull an image – use the tag shown on the latest GitHub release page (e.g. runpod/worker-infinity-embedding:<version>)
  2. πŸ”§ Configure – set at least MODEL_NAMES (see Endpoint Configuration)
  3. πŸš€ Deploy – create a RunPod Serverless endpoint
  4. πŸ§ͺ Call the API – follow the example in the Usage section

Endpoint Configuration

All behaviour is controlled through environment variables:

| Variable | Required | Default | Description |
|---|---|---|---|
| `MODEL_NAMES` | Yes | – | One or more Hugging Face model IDs; separate multiple IDs with a semicolon. Example: `BAAI/bge-small-en-v1.5` |
| `BATCH_SIZES` | No | `32` | Per-model batch size; semicolon-separated list matching `MODEL_NAMES`. |
| `BACKEND` | No | `torch` | Inference engine for all models: `torch`, `optimum`, or `ctranslate2`. |
| `DTYPES` | No | `auto` | Precision per model (`auto`, `fp16`, `fp8`); semicolon-separated list matching `MODEL_NAMES`. |
| `INFINITY_QUEUE_SIZE` | No | `48000` | Maximum number of items that can be queued inside the Infinity engine. |
| `RUNPOD_MAX_CONCURRENCY` | No | `300` | Maximum number of concurrent requests the RunPod wrapper will accept. |
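The semicolon-separated lists line up index by index with `MODEL_NAMES`. As a rough sketch of that pairing rule, here is a hypothetical helper (not part of this worker; the assumption that a single default value applies to every model is mine):

```python
# Sketch of how semicolon-separated per-model settings pair up.
# parse_model_config is a hypothetical helper, not part of the worker.
def parse_model_config(model_names, batch_sizes="32", dtypes="auto"):
    names = model_names.split(";")
    sizes = batch_sizes.split(";")
    types = dtypes.split(";")
    # Assumption: a single value applies to all models; otherwise the
    # list lengths must match MODEL_NAMES exactly.
    if len(sizes) == 1:
        sizes = sizes * len(names)
    if len(types) == 1:
        types = types * len(names)
    if not (len(names) == len(sizes) == len(types)):
        raise ValueError("BATCH_SIZES and DTYPES must match MODEL_NAMES")
    return [
        {"model": n, "batch_size": int(b), "dtype": d}
        for n, b, d in zip(names, sizes, types)
    ]
```

For example, `MODEL_NAMES=BAAI/bge-small-en-v1.5;intfloat/e5-large-v2` with `BATCH_SIZES=32;16` yields two model entries with their own batch sizes.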

API Specification

Two flavours, one schema.

  • OpenAI-compatible – drop-in replacement for /v1/models and /v1/embeddings. Point your existing OpenAI client at the base URL https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1 and authenticate with your RunPod API key instead of your OpenAI key.
  • Standard RunPod – call /run or /runsync with a JSON body under the input key.
    Base URL: https://api.runpod.ai/v2/<ENDPOINT_ID>

Except for transport (path + wrapper object) the JSON you send/receive is identical. The tables below describe the shared payload.
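To make the "identical payload" point concrete, here is an illustrative snippet (the `wrap_standard` helper is hypothetical) showing that the standard flavour simply nests the same payload under `input`:

```python
# The two flavours carry the same payload; the standard RunPod route
# just wraps it under "input". wrap_standard is illustrative only.
def wrap_standard(payload):
    return {"input": payload}

payload = {"model": "BAAI/bge-small-en-v1.5", "input": "Hello world"}
openai_body = payload                   # POST .../openai/v1/embeddings
standard_body = wrap_standard(payload)  # POST .../runsync
```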

List Models

| Method | Path | Body |
|---|---|---|
| GET | `/openai/v1/models` | – |
| POST | `/runsync` | `{ "input": { "openai_route": "/v1/models" } }` |

Response

{
  "data": [
    { "id": "BAAI/bge-small-en-v1.5", "stats": {} },
    { "id": "intfloat/e5-large-v2", "stats": {} }
  ]
}
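A quick way to check that a model from `MODEL_NAMES` is actually being served is to look it up in the `data` list. A stdlib-only sketch against the sample response above (the response values are copied from the example, not fetched):

```python
# Sample /v1/models response, as shown above; pull out the served IDs.
models_response = {
    "data": [
        {"id": "BAAI/bge-small-en-v1.5", "stats": {}},
        {"id": "intfloat/e5-large-v2", "stats": {}},
    ]
}
served = {m["id"] for m in models_response["data"]}
```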

Create Embeddings

Request Fields (shared)

| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | One of the IDs supplied via `MODEL_NAMES`. |
| `input` | string \| array | Yes | A single text string or a list of texts to embed. |

OpenAI route vs. Standard:

| Flavour | Method | Path | Body |
|---|---|---|---|
| OpenAI | POST | `/v1/embeddings` | `{ "model": "…", "input": "…" }` |
| Standard | POST | `/runsync` | `{ "input": { "model": "…", "input": "…" } }` |

Response (both flavours)

{
  "object": "list",
  "model": "BAAI/bge-small-en-v1.5",
  "data": [
    { "object": "embedding", "embedding": [0.01, -0.02 /* … */], "index": 0 }
  ],
  "usage": { "prompt_tokens": 2, "total_tokens": 2 }
}
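A common follow-up is to pull the vectors out of a response shaped like the one above and compare them. A stdlib-only sketch with made-up embedding values (real vectors are much longer):

```python
import math

# A response shaped like the example above; the numbers are made up.
response = {
    "object": "list",
    "model": "BAAI/bge-small-en-v1.5",
    "data": [
        {"object": "embedding", "embedding": [0.1, 0.2, 0.3], "index": 0},
        {"object": "embedding", "embedding": [0.1, 0.2, 0.25], "index": 1},
    ],
    "usage": {"prompt_tokens": 4, "total_tokens": 4},
}

# Re-order by "index" defensively, then extract the vectors.
vectors = [
    d["embedding"]
    for d in sorted(response["data"], key=lambda d: d["index"])
]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```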

Rerank Documents (Standard only)

| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Any deployed reranker model. |
| `query` | string | Yes | The search/query text. |
| `docs` | array | Yes | List of documents to rerank. |
| `return_docs` | bool | No | If `true`, return the documents in ranked order (default `false`). |

Call pattern

POST /runsync
Content-Type: application/json

{
  "input": {
    "model": "BAAI/bge-reranker-large",
    "query": "Which product has warranty coverage?",
    "docs": [
      "Product A comes with a 2-year warranty",
      "Product B is available in red and blue colors",
      "All electronics include a standard 1-year warranty"
    ],
    "return_docs": true
  }
}

Response contains either scores or the full docs list, depending on return_docs.
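On the client side, a reranked list can be reassembled from per-document scores. The sketch below assumes the worker returns one relevance score per document, aligned with the order of `docs`; the exact response shape is not specified here, so treat both the scores and the `rank_documents` helper as illustrative:

```python
# Illustrative only: assumes one relevance score per document, aligned
# with the "docs" order. rank_documents is a hypothetical helper.
def rank_documents(docs, scores):
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [(docs[i], scores[i]) for i in order]

docs = [
    "Product A comes with a 2-year warranty",
    "Product B is available in red and blue colors",
    "All electronics include a standard 1-year warranty",
]
scores = [0.92, 0.05, 0.87]  # made-up scores for the sketch
ranked = rank_documents(docs, scores)
```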


Usage

Below are minimal curl snippets so you can copy-paste from any machine.

Replace <ENDPOINT_ID> with your endpoint ID and <API_KEY> with a RunPod API key.

OpenAI-Compatible Calls

# List models
curl -H "Authorization: Bearer <API_KEY>" \
     https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1/models

# Create embeddings
curl -X POST \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"model":"BAAI/bge-small-en-v1.5","input":"Hello world"}' \
  https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1/embeddings

Standard RunPod Calls

# Create embeddings (wait for result)
curl -X POST \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"input":{"model":"BAAI/bge-small-en-v1.5","input":"Hello world"}}' \
  https://api.runpod.ai/v2/<ENDPOINT_ID>/runsync

# Rerank
curl -X POST \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"input":{"model":"BAAI/bge-reranker-large","query":"Which product has warranty coverage?","docs":["Product A comes with a 2-year warranty","Product B is available in red and blue colors","All electronics include a standard 1-year warranty"],"return_docs":true}}' \
  https://api.runpod.ai/v2/<ENDPOINT_ID>/runsync
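The same standard call can be built from Python with only the standard library. This sketch constructs the request but does not send it; `<ENDPOINT_ID>` and `<API_KEY>` are placeholders you must replace, and `build_runsync_request` is a hypothetical helper:

```python
import json
import urllib.request

# Builds (but does not send) the /runsync request from the curl example
# above. build_runsync_request is illustrative, not part of the worker.
def build_runsync_request(endpoint_id, api_key, payload):
    return urllib.request.Request(
        url=f"https://api.runpod.ai/v2/{endpoint_id}/runsync",
        data=json.dumps({"input": payload}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_runsync_request(
    "<ENDPOINT_ID>", "<API_KEY>",
    {"model": "BAAI/bge-small-en-v1.5", "input": "Hello world"},
)
# urllib.request.urlopen(req) would send it; the sketch stops here.
```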

Further Documentation


Acknowledgements

Special thanks to Michael Feil for creating the Infinity engine and for his ongoing support of this project.

