
![](doc/logo.png)

# LLamaWorker

LLamaWorker is an HTTP API server built on the [LLamaSharp](https://github.com/SciSharp/LLamaSharp?wt.mc_id=DT-MVP-5005195) project. It provides an OpenAI-compatible API, making it easy for developers to integrate Large Language Models (LLMs) into their applications.

English | [中文](README_CN.md)

## Features

- **OpenAI API Compatible**: Offers an API similar to OpenAI and Azure OpenAI, making migration and integration easy.
- **Multi-Model Support**: Supports configuring and switching between different models to meet the needs of various scenarios.
- **Streaming Response**: Supports streaming responses to improve the efficiency of processing large responses.
- **Embedding Support**: Provides text embedding functionality with support for various embedding models.
- **Chat Templates**: Provides several common chat templates.
- **Auto-Release**: Supports automatic release of loaded models.
- **Function Call**: Supports function calls.
- **API Key Authentication**: Supports API Key authentication.
- **Gradio UI Demo**: Provides a UI demo based on Gradio.NET.
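
Streaming responses follow the OpenAI server-sent-events convention, where each chunk arrives as a `data: {json}` line carrying an incremental `delta` and the stream ends with `data: [DONE]`. A minimal parsing sketch (the sample chunk below is illustrative, not captured server output):

```python
import json

def extract_delta(sse_line):
    """Parse one `data:` line from a streaming chat completion.

    Returns the incremental text, or None for keep-alives and the
    [DONE] sentinel.
    """
    if not sse_line.startswith("data: "):
        return None
    payload = sse_line[len("data: "):].strip()
    if payload == "[DONE]":
        return None
    chunk = json.loads(payload)
    # Streaming chunks carry a partial "delta" instead of a full message.
    return chunk["choices"][0]["delta"].get("content")

# Illustrative chunk in the OpenAI streaming format:
sample = 'data: {"choices":[{"index":0,"delta":{"content":"Hello"}}]}'
print(extract_delta(sample))  # -> Hello
```

Concatenating the deltas from successive chunks reconstructs the full response text.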

## Use Vulkan Compiled Version

A Vulkan backend compiled version is provided with each release; download the build for your platform from [Releases](../../releases):

- `LLamaWorker-Vulkan-win-x64.zip`
- `LLamaWorker-Vulkan-linux-x64.zip`

After downloading and unzipping, modify the configuration in the `appsettings.json` file, then run the software and start using it.

> For other backends, you can still download the `Vulkan` version, then fetch the corresponding compiled libraries from [llama.cpp](https://github.com/ggerganov/llama.cpp/releases) and replace the bundled ones. You can also compile the `llama.cpp` project yourself to obtain the required libraries.

## Function Call

LLamaWorker supports function calls. The configuration file currently provides three templates, and function calling has been tested with `Phi-3`, `Qwen2`, and `Llama3.1`.

Function calls are compatible with OpenAI's API. You can test them with the following JSON request:

`POST /v1/chat/completions`

```json
{
  "model": "default",
  "messages": [
    {
      "role": "user",
      "content": "Where is the temperature high between Beijing and Shanghai?"
    }
  ],
  "tools": [
    {
      "function": {
        "name": "GetWeatherPlugin-GetCurrentTemperature",
        "description": "Get the current temperature of the specified city.",
        "parameters": {
          "type": "object",
          "required": [
            "city"
          ],
          "properties": {
            "city": {
              "type": "string",
              "description": "City Name"
            }
          }
        }
      },
      "type": "function"
    },
    {
      "function": {
        "name": "EmailPlugin-SendEmail",
        "description": "Send an email to the recipient.",
        "parameters": {
          "type": "object",
          "required": [
            "recipientEmails",
            "subject",
            "body"
          ],
          "properties": {
            "recipientEmails": {
              "type": "string",
              "description": "A recipient email list separated by semicolons"
            },
            "subject": {
              "type": "string"
            },
            "body": {
              "type": "string"
            }
          }
        }
      },
      "type": "function"
    }
  ],
  "tool_choice": "auto"
}
```
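
The request above can be sent with any HTTP client. A minimal Python sketch using only the standard library (the base URL is an assumption; adjust it to wherever your server is listening, and note that the sample response below is illustrative of the OpenAI tool-call format, not captured output):

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # adjust to your server's listen address

def chat_completion(payload):
    """POST a chat completion request and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def extract_tool_calls(response):
    """Pull tool calls (if any) out of an OpenAI-style chat response."""
    message = response["choices"][0]["message"]
    return message.get("tool_calls") or []

# Illustrative response shape when the model decides to call a function:
sample_response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "tool_calls": [{
                "type": "function",
                "function": {
                    "name": "GetWeatherPlugin-GetCurrentTemperature",
                    "arguments": "{\"city\": \"Beijing\"}",
                },
            }],
        }
    }]
}
call = extract_tool_calls(sample_response)[0]
print(call["function"]["name"])  # -> GetWeatherPlugin-GetCurrentTemperature
```

Note that `arguments` arrives as a JSON-encoded string, so it must be decoded with `json.loads` before invoking the actual function.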

## Compile and Run

1. Clone the repository locally
   ```bash
   git clone https://github.com/sangyuxiaowu/LLamaWorker.git
   ```
2. Enter the project directory
   ```bash
   cd LLamaWorker
   ```
3. Choose the project file according to your needs. The project provides the following backend project files:
   - `LLamaWorker.Backend.Cpu`: For CPU environments.
   - `LLamaWorker.Backend.Cuda11`: For GPU environments with CUDA 11.
   - `LLamaWorker.Backend.Cuda12`: For GPU environments with CUDA 12.
   - `LLamaWorker.Backend.Vulkan`: For environments with Vulkan support.
   
   Select the project file that suits your environment for the next step.
   
4. Install dependencies
   ```bash
   dotnet restore LLamaWorker.Backend.Cpu\LLamaWorker.Backend.Cpu.csproj
   ```
   If you are using a CUDA version, replace the project file name accordingly.
   
5. Modify the configuration file `appsettings.json`. The default configuration includes some common open-source model configurations; you only need to modify the model file path (`ModelPath`) as needed.
   
6. Start the server
   ```bash
   dotnet run --project LLamaWorker.Backend.Cpu\LLamaWorker.Backend.Cpu.csproj
   ```
   If you are using a CUDA version, replace the project file name accordingly.

## API Reference

LLamaWorker offers the following API endpoints:

- `/v1/chat/completions`: Chat completion requests
- `/v1/completions`: Prompt completion requests
- `/v1/embeddings`: Create embeddings
- `/models/info`: Returns basic information about the model
- `/models/config`: Returns information about configured models
- `/models/{modelId}/switch`: Switch to a specified model
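
For example, the `/v1/embeddings` endpoint accepts an OpenAI-style request body. A hedged sketch of building the request and reading the vector back (the base URL is an assumption, and the sample response follows the OpenAI embeddings format for illustration; the real vector length depends on the embedding model):

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # adjust to your server's listen address

def create_embedding(text, model="default"):
    """POST /v1/embeddings and return the embedding vector."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/embeddings",
        data=json.dumps({"input": text, "model": model}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-style responses wrap vectors in a "data" list, one per input.
    return body["data"][0]["embedding"]

# Illustrative response in the OpenAI embeddings format:
sample = {"data": [{"index": 0, "embedding": [0.01, -0.02, 0.03]}]}
vector = sample["data"][0]["embedding"]
print(len(vector))  # -> 3
```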

## Gradio UI Demo

This UI is based on [Gradio.NET](https://github.com/feiyun0112/Gradio.Net?wt.mc_id=DT-MVP-5005195).

You can also try the Gradio UI demo by running the following command:

```bash
dotnet restore ChatUI\ChatUI.csproj
dotnet run --project ChatUI\ChatUI.csproj
```

Then open the browser and visit the Gradio UI demo.

![](doc/ui.png)