Thank you for this great utility.
I was running the benchmark on our self-hosted deployments of Llama 3.3 and DeepSeek V3.
For DeepSeek, only a minor change was needed here, and everything else worked out of the box.
For Llama 3, a few more changes were needed on my side.
I was getting an assertion error about the prompt format in run_single, which I fixed by removing the tokenizer part in code_generation.py, lines 208-219:
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct", padding_side="left", use_fast=False
)
return tokenizer.apply_chat_template(
    chat_messages,
    tokenize=False,
    add_generation_prompt=True,
    truncation=False,
    padding=False,
)
```
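For context, a minimal sketch of what that spot might look like after the removal, assuming the serving endpoint applies the chat template itself (the function name `build_prompt` is hypothetical, not the repo's actual code):

```python
# Hypothetical sketch; build_prompt is an assumed name, not the actual
# function in code_generation.py.
def build_prompt(chat_messages: list[dict]) -> list[dict]:
    # Return the raw chat messages and let the serving backend apply the
    # model's own chat template instead of rendering it locally.
    return chat_messages
```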
Would it be appropriate to add a custom_runner so the benchmark can be run against models hosted via vLLM/SGLang/TGI? A rough sketch of what I have in mind follows.
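Since vLLM, SGLang, and TGI all expose OpenAI-compatible endpoints, a single runner could cover all three. A minimal sketch under the assumption that the benchmark only needs a callable that maps chat messages to a completion string; the class name, interface, and constructor parameters are assumptions, not existing benchmark APIs:

```python
# Hypothetical sketch of a runner for OpenAI-compatible servers (vLLM/SGLang/TGI).
# The class name and interface are assumptions; adapt to the benchmark's runner API.
from openai import OpenAI


class OpenAICompatibleRunner:
    def __init__(self, base_url: str, model: str, api_key: str = "EMPTY"):
        # Self-hosted servers typically serve at http://host:port/v1 and
        # accept an arbitrary API key.
        self.client = OpenAI(base_url=base_url, api_key=api_key)
        self.model = model

    def run(self, chat_messages: list[dict], **gen_kwargs) -> str:
        # Pass the raw chat messages; the server applies the model's chat template.
        response = self.client.chat.completions.create(
            model=self.model,
            messages=chat_messages,
            **gen_kwargs,
        )
        return response.choices[0].message.content


# Example usage against a local vLLM server (URL and model name are placeholders):
# runner = OpenAICompatibleRunner(
#     "http://localhost:8000/v1", "meta-llama/Llama-3.3-70B-Instruct"
# )
# print(runner.run([{"role": "user", "content": "Write hello world in Python."}],
#                  max_tokens=256))
```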