This tool measures the throughput (tokens per second) of Large Language Models (LLMs) running on your local machine. Currently, only models served through Ollama are supported.
Example output from two benchmark runs:
Average stats:
(Running on dual RTX 3090 Ti GPUs, AMD EPYC 7763 CPU, Ubuntu 22.04)
----------------------------------------------------
Model: deepseek-r1:70b
Performance Metrics:
Prompt Processing: 336.73 tokens/sec
Generation Speed: 17.65 tokens/sec
Combined Speed: 18.01 tokens/sec
Workload Stats:
Input Tokens: 165
Generated Tokens: 7673
Model Load Time: 6.11s
Processing Time: 0.49s
Generation Time: 434.70s
Total Time: 441.31s
----------------------------------------------------
Average stats:
(Running on a single RTX 3090 GPU, Intel Core i9-13900KS CPU, WSL2 (Ubuntu 22.04) on Windows 11)
----------------------------------------------------
Model: deepseek-r1:32b
Performance Metrics:
Prompt Processing: 399.05 tokens/sec
Generation Speed: 27.18 tokens/sec
Combined Speed: 27.58 tokens/sec
Workload Stats:
Input Tokens: 168
Generated Tokens: 10601
Model Load Time: 15.44s
Processing Time: 0.42s
Generation Time: 390.00s
Total Time: 405.87s
----------------------------------------------------
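The throughput figures above follow directly from the timing fields Ollama returns with each completion (all durations are reported in nanoseconds). The sketch below only illustrates that arithmetic and is not taken from benchmark.py; it assumes a local server on the default port and uses the standard `/api/generate` endpoint:

```python
import requests

NS = 1e9  # Ollama reports durations in nanoseconds

# One non-streaming generation request against a local Ollama server.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:70b",            # any locally installed model
        "prompt": "Explain quantum computing",
        "stream": False,
    },
).json()

prompt_tokens = resp["prompt_eval_count"]
output_tokens = resp["eval_count"]
prompt_time = resp["prompt_eval_duration"] / NS  # "Processing Time"
gen_time = resp["eval_duration"] / NS            # "Generation Time"
load_time = resp["load_duration"] / NS           # "Model Load Time"

print(f"Prompt Processing: {prompt_tokens / prompt_time:.2f} tokens/sec")
print(f"Generation Speed:  {output_tokens / gen_time:.2f} tokens/sec")
# Combined Speed counts prompt plus generated tokens over the
# prompt-processing plus generation window (model load time excluded).
print(f"Combined Speed:    {(prompt_tokens + output_tokens) / (prompt_time + gen_time):.2f} tokens/sec")
print(f"Model Load Time:   {load_time:.2f}s")
print(f"Total Time:        {resp['total_duration'] / NS:.2f}s")
```

In the first sample above, for example, (165 + 7673) tokens over (0.49 s + 434.70 s) gives the reported 18.01 tokens/sec Combined Speed.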
Follow these instructions to set up and run benchmarks on your system.

Prerequisites:
- Python 3.11 or higher
- Ollama installed and configured
- Clone the repository

  ```bash
  git clone https://github.com/larryhopecode/ollama-benchmark.git
  cd ollama-benchmark
  ```
- Set up a Python environment

  ```bash
  # Linux/macOS
  python3 -m venv venv
  source venv/bin/activate

  # Windows
  python -m venv venv
  .\venv\Scripts\activate
  ```
- Install dependencies

  ```bash
  pip install -r requirements.txt
  ```
- Start the Ollama server

  Ensure the Ollama service is running:

  ```bash
  ollama serve
  ```
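  If you are unsure whether the server is already up, a quick check (assuming the default local endpoint, http://localhost:11434) is to ask it for its installed models:

  ```bash
  # A JSON list of local models means the server is reachable.
  curl http://localhost:11434/api/tags
  ```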
- Run the benchmarks

  Basic usage:

  ```bash
  python benchmark.py
  ```

  With options:

  ```bash
  python benchmark.py --verbose --models deepseek-r1:70b --prompts "Write a hello world program" "Explain quantum computing"
  ```
Available options:

- `-v, --verbose`: Enable detailed output, including streaming responses
- `-m, --models`: Space-separated list of models to benchmark (defaults to all available models)
- `-p, --prompts`: Space-separated list of custom prompts (defaults to a predefined set testing various capabilities)
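The space-separated flag values map naturally onto argparse's `nargs="+"`. The snippet below is only a minimal illustration of that parsing pattern, not code copied from benchmark.py (the real defaults and help text may differ):

```python
import argparse

# Illustrative parser only; benchmark.py's actual argument handling may differ.
parser = argparse.ArgumentParser(description="Benchmark Ollama models")
parser.add_argument("-v", "--verbose", action="store_true",
                    help="Print detailed output, including streamed responses")
parser.add_argument("-m", "--models", nargs="+",
                    help="Models to benchmark (default: all locally available models)")
parser.add_argument("-p", "--prompts", nargs="+",
                    help="Custom prompts (default: the built-in prompt suite)")
args = parser.parse_args()

# `--models deepseek-r1:70b deepseek-r1:32b` yields
# args.models == ["deepseek-r1:70b", "deepseek-r1:32b"]
```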
The default benchmark suite includes prompts testing:
- Analytical reasoning
- Creative writing
- Complex analysis
- Technical knowledge
- Structured output generation
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the MIT License - see the LICENSE file for details.