High-throughput, OpenAI-compatible text embedding & reranker powered by Infinity
- 🐳 Pull an image – use the tag shown on the latest GitHub release page (e.g. `runpod/worker-infinity-embedding:<version>`)
- 🔧 Configure – set at least `MODEL_NAMES` (see Endpoint Configuration)
- 🚀 Deploy – create a RunPod Serverless endpoint
- 🧪 Call the API – follow the example in the Usage section
All behaviour is controlled through environment variables:
| Variable | Required | Default | Description |
|---|---|---|---|
| `MODEL_NAMES` | Yes | – | One or more Hugging Face model IDs. Separate multiple IDs with a semicolon. Example: `BAAI/bge-small-en-v1.5` |
| `BATCH_SIZES` | No | `32` | Per-model batch size; semicolon-separated list matching `MODEL_NAMES`. |
| `BACKEND` | No | `torch` | Inference engine for all models: `torch`, `optimum`, or `ctranslate2`. |
| `DTYPES` | No | `auto` | Precision per model (`auto`, `fp16`, `fp8`). Semicolon-separated, must match `MODEL_NAMES`. |
| `INFINITY_QUEUE_SIZE` | No | `48000` | Max items queueable inside the Infinity engine. |
| `RUNPOD_MAX_CONCURRENCY` | No | `300` | Max concurrent requests the RunPod wrapper will accept. |
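Because the per-model variables are semicolon-separated lists, each must contain exactly one entry per model in `MODEL_NAMES`. A minimal Python sketch of that pairing (the values below are illustrative examples, not defaults):

```python
def parse_multi(value: str) -> list[str]:
    """Split a semicolon-separated env var into per-model values."""
    return [v.strip() for v in value.split(";") if v.strip()]

# Example configuration for two models.
env = {
    "MODEL_NAMES": "BAAI/bge-small-en-v1.5;BAAI/bge-reranker-large",
    "BATCH_SIZES": "32;16",
    "DTYPES": "auto;fp16",
}

models = parse_multi(env["MODEL_NAMES"])
for key in ("BATCH_SIZES", "DTYPES"):
    # Each per-model list must line up one-to-one with MODEL_NAMES.
    assert len(parse_multi(env[key])) == len(models), f"{key} must match MODEL_NAMES"
```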
Two flavours, one schema.

- **OpenAI-compatible** – a drop-in replacement for `/v1/models` and `/v1/embeddings`. Use this endpoint in place of OpenAI's API by replacing the base URL with `https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1` and using your RunPod API key instead of an OpenAI key.
- **Standard RunPod** – call `/run` or `/runsync` with a JSON body under the `input` key. Base URL: `https://api.runpod.ai/v2/<ENDPOINT_ID>`
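For the OpenAI-compatible flavour, any HTTP client works once the base URL and key are swapped. A hedged stdlib sketch that builds the request (endpoint ID and key are placeholders you must substitute):

```python
import json
import urllib.request

# Placeholders -- substitute your own endpoint ID and RunPod API key.
ENDPOINT_ID = "<ENDPOINT_ID>"
API_KEY = "<API_KEY>"
BASE_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/openai/v1"

def embeddings_request(model: str, texts) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/embeddings route."""
    body = json.dumps({"model": model, "input": texts}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/embeddings",
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Sending it requires a live endpoint:
#   with urllib.request.urlopen(embeddings_request("BAAI/bge-small-en-v1.5", "Hello world")) as resp:
#       data = json.load(resp)
```

The official `openai` Python client also works: point its `base_url` at the URL above and pass your RunPod key as `api_key`.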
Except for transport (path + wrapper object), the JSON you send and receive is identical. The tables below describe the shared payload.
| Method | Path | Body |
|---|---|---|
| GET | `/openai/v1/models` | – |
| POST | `/runsync` | `{ "input": { "openai_route": "/v1/models" } }` |
| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | One of the IDs supplied via `MODEL_NAMES`. |
| `input` | string \| array | Yes | A single text string or a list of texts to embed. |
OpenAI route vs. Standard:

| Flavour | Method | Path | Body |
|---|---|---|---|
| OpenAI | POST | `/v1/embeddings` | `{ "model": "…", "input": "…" }` |
| Standard | POST | `/runsync` | `{ "input": { "model": "…", "input": "…" } }` |
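The two bodies differ only in the wrapper, which a small sketch makes explicit:

```python
def openai_body(model: str, texts) -> dict:
    """Payload for the OpenAI-compatible /v1/embeddings route."""
    return {"model": model, "input": texts}

def runpod_body(model: str, texts) -> dict:
    """Standard flavour: the same payload, wrapped under the "input" key."""
    return {"input": openai_body(model, texts)}
```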
```json
{
  "object": "list",
  "model": "BAAI/bge-small-en-v1.5",
  "data": [
    { "object": "embedding", "embedding": [0.01, -0.02 /* … */], "index": 0 }
  ],
  "usage": { "prompt_tokens": 2, "total_tokens": 2 }
}
```
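To pull the vectors out of a response like the one above, sort by `index` so each embedding lines up with the input it belongs to (a minimal sketch, using the example response):

```python
# Example response, copied from the shape shown above (vector truncated).
response = {
    "object": "list",
    "model": "BAAI/bge-small-en-v1.5",
    "data": [
        {"object": "embedding", "embedding": [0.01, -0.02], "index": 0},
    ],
    "usage": {"prompt_tokens": 2, "total_tokens": 2},
}

# Sort by "index" so vectors align with the order of the submitted texts.
vectors = [
    item["embedding"]
    for item in sorted(response["data"], key=lambda d: d["index"])
]
```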
| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Any deployed reranker model. |
| `query` | string | Yes | The search/query text. |
| `docs` | array | Yes | List of documents to rerank. |
| `return_docs` | bool | No | If `true`, return the documents in ranked order (default `false`). |
Call pattern:

```http
POST /runsync
Content-Type: application/json

{
  "input": {
    "model": "BAAI/bge-reranker-large",
    "query": "Which product has warranty coverage?",
    "docs": [
      "Product A comes with a 2-year warranty",
      "Product B is available in red and blue colors",
      "All electronics include a standard 1-year warranty"
    ],
    "return_docs": true
  }
}
```
The response contains either `scores` or the full `docs` list, depending on `return_docs`.
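If you leave `return_docs` off and receive only `scores`, you can pair them back up with the documents you sent. A hedged sketch (the score values are made up for illustration, not real model output):

```python
# The documents sent in the request, in order.
docs = [
    "Product A comes with a 2-year warranty",
    "Product B is available in red and blue colors",
    "All electronics include a standard 1-year warranty",
]

# Hypothetical relevance scores; scores[i] corresponds to docs[i].
result = {"scores": [0.92, 0.11, 0.87]}

# Pair each score with its document and sort best-first.
ranked = sorted(zip(result["scores"], docs), reverse=True)
best_score, best_doc = ranked[0]
```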
Below are minimal `curl` snippets you can copy-paste from any machine. Replace `<ENDPOINT_ID>` with your endpoint ID and `<API_KEY>` with a RunPod API key.
```bash
# List models
curl -H "Authorization: Bearer <API_KEY>" \
  https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1/models

# Create embeddings
curl -X POST \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"model":"BAAI/bge-small-en-v1.5","input":"Hello world"}' \
  https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1/embeddings

# Create embeddings (wait for result)
curl -X POST \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"input":{"model":"BAAI/bge-small-en-v1.5","input":"Hello world"}}' \
  https://api.runpod.ai/v2/<ENDPOINT_ID>/runsync

# Rerank
curl -X POST \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"input":{"model":"BAAI/bge-reranker-large","query":"Which product has warranty coverage?","docs":["Product A comes with a 2-year warranty","Product B is available in red and blue colors","All electronics include a standard 1-year warranty"],"return_docs":true}}' \
  https://api.runpod.ai/v2/<ENDPOINT_ID>/runsync
```
- Infinity Engine – how the ultra-fast backend works.
- RunPod Docs – serverless concepts, limits, and API reference.
Special thanks to Michael Feil for creating the Infinity engine and for his ongoing support of this project.