Thank you for this great utility.
I was running the benchmark on our self-hosted deployments of Llama 3.3 and DeepSeek V3.
For DeepSeek, only a minor change was needed here, and everything else worked out of the box.
For Llama 3, a few more changes were needed on my side.
I was getting an assertion error about the prompt format in run_single, which I fixed by removing the tokenizer part in code_generation.py, lines 208-219:
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct", padding_side="left", use_fast=False
)
return tokenizer.apply_chat_template(
    chat_messages,
    tokenize=False,
    add_generation_prompt=True,
    truncation=False,
    padding=False,
)
```
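For context, a minimal sketch of what that spot might look like after the removal, assuming the serving endpoint applies the chat template itself (the function name `build_prompt` is hypothetical, not the repo's actual code):

```python
# Hypothetical sketch; build_prompt is an assumed name, not the actual
# function in code_generation.py.
def build_prompt(chat_messages: list[dict]) -> list[dict]:
    # Return the raw chat messages and let the serving backend apply the
    # model's own chat template instead of rendering it locally.
    return chat_messages
```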
Would it be appropriate to add a custom_runner so the benchmark can be run against models hosted via vLLM/SGLang/TGI? A rough sketch of what I have in mind follows.
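Since vLLM, SGLang, and TGI all expose OpenAI-compatible endpoints, a single runner could cover all three. A minimal sketch under the assumption that the benchmark only needs a callable that maps chat messages to a completion string; the class name, interface, and constructor parameters are assumptions, not existing benchmark APIs:

```python
# Hypothetical sketch of a runner for OpenAI-compatible servers (vLLM/SGLang/TGI).
# The class name and interface are assumptions; adapt to the benchmark's runner API.
from openai import OpenAI


class OpenAICompatibleRunner:
    def __init__(self, base_url: str, model: str, api_key: str = "EMPTY"):
        # Self-hosted servers typically serve at http://host:port/v1 and
        # accept an arbitrary API key.
        self.client = OpenAI(base_url=base_url, api_key=api_key)
        self.model = model

    def run(self, chat_messages: list[dict], **gen_kwargs) -> str:
        # Pass the raw chat messages; the server applies the model's chat template.
        response = self.client.chat.completions.create(
            model=self.model,
            messages=chat_messages,
            **gen_kwargs,
        )
        return response.choices[0].message.content


# Example usage against a local vLLM server (URL and model name are placeholders):
# runner = OpenAICompatibleRunner(
#     "http://localhost:8000/v1", "meta-llama/Llama-3.3-70B-Instruct"
# )
# print(runner.run([{"role": "user", "content": "Write hello world in Python."}],
#                  max_tokens=256))
```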