# Install the solo-server package using pip
pip install solo-server
# Run the solo server setup in simple mode
solo setup
- Seamless Setup: Manage your on-device AI with a simple CLI and HTTP servers
- Open Model Registry: Pull models from registries like Ollama & Hugging Face
- Cross-Platform Compatibility: Deploy AI models effortlessly on your hardware
- Configurable Framework: Auto-detects hardware (CPU, GPU, RAM) and sets configs
- 🐳 Docker: Required for containerization
# Install Solo-Server
pip install solo-server
Run the interactive setup to configure Solo Server:
# Setup Solo-Server
solo setup
✔️ Detects CPU, GPU, and RAM for hardware-optimized execution
✔️ Auto-configures solo.conf with optimal settings
✔️ Recommends the OCI for your compute backend (CUDA, HIP, SYCL, Vulkan, CPU, Metal)
╭────────────────── System Information ──────────────────╮
│ Operating System: Windows                              │
│ CPU: AMD64 Family 23 Model 96 Stepping 1, AuthenticAMD │
│ CPU Cores: 8                                            │
│ Memory: 15.42GB                                         │
│ GPU: NVIDIA                                             │
│ GPU Model: NVIDIA GeForce GTX 1660 Ti                   │
│ GPU Memory: 6144.0MB                                    │
│ Compute Backend: CUDA                                   │
╰─────────────────────────────────────────────────────────╯
🖥️ Detected GPU: NVIDIA GeForce GTX 1660 Ti (NVIDIA)
✅ NVIDIA GPU drivers and toolkit are correctly installed.
Would you like to use GPU for inference? [y/n] (y): y
Choose the domain that best describes your field:
1. Personal
2. Education
3. Agriculture
4. Software
5. Healthcare
6. Forensics
7. Robotics
8. Enterprise
9. Custom
Enter the number of your domain (1):
╭─ Commands ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ setup     Set up Solo Server environment with interactive prompts and saves configuration to config.json.                    │
│ serve     Start a model server with the specified model.                                                                     │
│ status    Check running models, system status, and configuration.                                                            │
│ list      List all downloaded models available in HuggingFace cache and Ollama.                                              │
│ test      Test if the Solo server is running correctly. Performs an inference test to verify server functionality.           │
│ stop      Stops Solo Server services. If a server type is specified (e.g., 'ollama', 'vllm', 'llama.cpp'), only that specific│
│           service will be stopped. Otherwise, all Solo services will be stopped.                                             │
│ download  Downloads a Hugging Face model using the huggingface repo id.                                                      │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Start a model server with a specific backend and model:
solo serve -s ollama -m llama3.2
╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --model   -m  TEXT     Model name or path. Can be: - HuggingFace repo ID (e.g., 'meta-llama/Llama-3.2-1B-Instruct') -        │
│                        Ollama model Registry (e.g., 'llama3.2') - Local path to a model file (e.g., '/path/to/model.gguf')   │
│                        If not specified, the default model from configuration will be used.                                  │
│                        [default: None]                                                                                       │
│ --server  -s  TEXT     Server type (ollama, vllm, llama.cpp) [default: None]                                                 │
│ --port    -p  INTEGER  Port to run the server on [default: None]                                                             │
│ --ui                   Start the UI for the server [default: True]                                                           │
│ --help                 Show this message and exit.                                                                           │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
View all downloaded models in your HuggingFace cache and Ollama:
solo list
Stop running Solo Server services:
solo stop
Solo Server provides consistent REST API endpoints across different server types (Ollama, vLLM, llama.cpp). The exact endpoint and request format differ slightly depending on which server type you're using. With the Ollama backend, the server exposes Ollama's native API:
# Generate a response
curl http://localhost:5070/api/generate -d '{
"model": "llama3.2",
"prompt": "Why is the sky blue?",
"stream": false
}'
# Chat with a model
curl http://localhost:5070/api/chat -d '{
"model": "llama3.2",
"messages": [
{ "role": "user", "content": "why is the sky blue?" }
]
}'
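You can also call these endpoints programmatically. Below is a minimal Python sketch using the `requests` library that streams tokens from the Ollama-style `/api/generate` endpoint; it assumes the server is listening on the default port 5070 used above and follows Ollama's newline-delimited JSON streaming format.

```python
import json
import requests

# Assumes a Solo Server with the Ollama backend on the default port used above.
URL = "http://localhost:5070/api/generate"

payload = {
    "model": "llama3.2",
    "prompt": "Why is the sky blue?",
    "stream": True,  # stream tokens as they are generated
}

# Ollama-style streaming returns one JSON object per line.
with requests.post(URL, json=payload, stream=True, timeout=120) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):
            break
print()
```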
vLLM and llama.cpp both use OpenAI-compatible endpoints:
# Chat completion
curl http://localhost:5070/v1/chat/completions -d '{
"model": "llama3.2",
"messages": [
{ "role": "user", "content": "Why is the sky blue?" }
],
"max_tokens": 50,
"temperature": 0.7
}'
# Text completion
curl http://localhost:5070/v1/completions -d '{
"model": "llama3.2",
"prompt": "Why is the sky blue?",
"max_tokens": 50,
"temperature": 0.7
}'
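Because these endpoints follow the OpenAI wire format, any OpenAI-compatible client can talk to the local server. Here is a minimal sketch with the official `openai` Python package, assuming the default port 5070 used above; the `api_key` value is a placeholder, since a local server typically doesn't validate it.

```python
from openai import OpenAI

# Point the OpenAI client at the local Solo Server instead of api.openai.com.
client = OpenAI(base_url="http://localhost:5070/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    max_tokens=50,
    temperature=0.7,
)
print(response.choices[0].message.content)
```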
Refer to example_apps for sample applications.
# Clone the repository
git clone https://github.com/GetSoloTech/solo-server.git
# Navigate to the directory
cd solo-server
# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate # On Unix/macOS
# OR
.venv\Scripts\activate # On Windows
# Install in editable mode
pip install -e .
This project wouldn't be possible without the help of other projects like:
- uv
- llama.cpp
- ramalama
- ollama
- whisper.cpp
- vllm
- podman
- huggingface
- aiaio
- llamafile
- cog
If you like using Solo, consider leaving us a ⭐ on GitHub.