# optillm
**Repository Path**: devdz/optillm
## Basic Information
- **Project Name**: optillm
- **Description**: Optimizing inference proxy for LLMs
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-03-09
- **Last Updated**: 2026-03-17
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# OptiLLM
🚀 2-10x accuracy improvements on reasoning tasks with zero training

🤗 HuggingFace Space • 📓 Colab Demo • 💬 Discussions
---
**OptiLLM** is an OpenAI API-compatible optimizing inference proxy that implements 20+ state-of-the-art techniques to dramatically improve LLM accuracy and performance on reasoning tasks - without requiring any model training or fine-tuning.
By spending additional compute at inference time, these techniques can beat frontier models across diverse tasks. A good example of how to combine several of them is the [CePO approach](optillm/cepo) from Cerebras.
## ✨ Key Features
- **🎯 Instant Improvements**: 2-10x better accuracy on math, coding, and logical reasoning
- **🔄 Drop-in Replacement**: Works with any OpenAI-compatible API endpoint
- **🧠 20+ Optimization Techniques**: From simple best-of-N to advanced MCTS and planning
- **📦 Zero Training Required**: Just proxy your existing API calls through OptiLLM
- **⚡ Production Ready**: Used in production by companies and researchers worldwide
- **🌐 Multi-Provider**: Supports OpenAI, Anthropic, Google, Cerebras, and 100+ models via LiteLLM
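These techniques range widely in complexity. As a rough sketch of the simplest one, best-of-N samples several candidate answers and keeps the highest-scoring one. The generator and scorer below are illustrative stand-ins, not OptiLLM code (real best-of-N samples the LLM at non-zero temperature and scores candidates with a reward model or self-rating):

```python
import random

def best_of_n(generate, score, prompt, n=4, seed=0):
    """Generate n candidate answers and return the highest-scoring one."""
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    return max(candidates, key=score)

# Stand-ins for illustration only: a real generator calls the LLM,
# and a real scorer ranks answers by estimated quality.
def fake_generate(prompt, rng):
    return f"answer-{rng.randint(0, 9)}"

def fake_score(answer):
    return int(answer.rsplit("-", 1)[1])

best = best_of_n(fake_generate, fake_score, "Solve: 2x + 3 = 7")
```

More advanced approaches such as MoA and MCTS build on the same idea: trade extra inference-time compute for a better final answer.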
## 🚀 Quick Start
Get powerful reasoning improvements in 3 simple steps:
```bash
# 1. Install OptiLLM
pip install optillm
# 2. Start the server
export OPENAI_API_KEY="your-key-here"
optillm
# 3. Use with any OpenAI client - just change the model name!
```
```python
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1")
# Add 'moa-' prefix for Mixture of Agents optimization
response = client.chat.completions.create(
model="moa-gpt-4o-mini", # MoA lets gpt-4o-mini match GPT-4-class performance on some benchmarks
messages=[{"role": "user", "content": "Solve: If 2x + 3 = 7, what is x?"}]
)
```
**Before OptiLLM**: "x = 1" ❌
**After OptiLLM**: "Let me work through this step by step: 2x + 3 = 7, so 2x = 4, therefore x = 2" ✅
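The approach is selected by prepending its slug to the model name, as in the `moa-` example above. A tiny helper makes the convention explicit (the helper name is ours, not part of OptiLLM):

```python
def optillm_model(approach: str, model: str) -> str:
    """Build the model string OptiLLM routes on: '<approach-slug>-<model>'."""
    return f"{approach}-{model}"

optillm_model("moa", "gpt-4o-mini")  # "moa-gpt-4o-mini"
```

Any slug from the techniques table below can be substituted for `moa`; the rest of the model string is passed through to the upstream provider unchanged.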
## 📊 Proven Results
OptiLLM delivers measurable improvements across diverse benchmarks:
| Technique | Base Model | Improvement | Benchmark |
|-----------|------------|-------------|-----------|
| **MARS** | Gemini 2.5 Flash Lite | **+30.0 points** | AIME 2025 (43.3→73.3) |
| **CePO** | Llama 3.3 70B | **+18.6 points** | Math-L5 (51.0→69.6) |
| **AutoThink** | DeepSeek-R1-1.5B | **+9.34 points** | GPQA-Diamond (21.72→31.06) |
| **LongCePO** | Llama 3.3 70B | **+13.6 points** | InfiniteBench (58.0→71.6) |
| **MOA** | GPT-4o-mini | **Matches GPT-4** | Arena-Hard-Auto |
| **PlanSearch** | GPT-4o-mini | **+20% pass@5** | LiveCodeBench |
*Full benchmark results [below](#sota-results-on-benchmarks-with-optillm)* ⬇️
## 🛠️ Installation
### Using pip
```bash
pip install optillm
optillm
2024-10-22 07:45:05,612 - INFO - Loaded plugin: privacy
2024-10-22 07:45:06,293 - INFO - Loaded plugin: memory
2024-10-22 07:45:06,293 - INFO - Starting server with approach: auto
```
### Using docker
```bash
docker pull ghcr.io/algorithmicsuperintelligence/optillm:latest
docker run -p 8000:8000 ghcr.io/algorithmicsuperintelligence/optillm:latest
2024-10-22 07:45:05,612 - INFO - Loaded plugin: privacy
2024-10-22 07:45:06,293 - INFO - Loaded plugin: memory
2024-10-22 07:45:06,293 - INFO - Starting server with approach: auto
```
**Available Docker image variants:**
- **Full image** (`latest`): Includes all dependencies for local inference and plugins
- **Proxy-only** (`latest-proxy`): Lightweight image without local inference capabilities
- **Offline** (`latest-offline`): Self-contained image with pre-downloaded models (spaCy) for fully offline operation
```bash
# Proxy-only (smallest)
docker pull ghcr.io/algorithmicsuperintelligence/optillm:latest-proxy
# Offline (largest, includes pre-downloaded models)
docker pull ghcr.io/algorithmicsuperintelligence/optillm:latest-offline
```
### Install from source
Clone the repository with `git`, create a virtual environment, and use `pip install` to set up the dependencies.
```bash
git clone https://github.com/algorithmicsuperintelligence/optillm.git
cd optillm
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
## 🔒 SSL Configuration
OptiLLM supports configuring SSL certificate verification for working with self-signed certificates or corporate proxies.
**Disable SSL verification (development only):**
```bash
# Command line
optillm --no-ssl-verify
# Environment variable
export OPTILLM_SSL_VERIFY=false
optillm
```
**Use custom CA certificate:**
```bash
# Command line
optillm --ssl-cert-path /path/to/ca-bundle.crt
# Environment variable
export OPTILLM_SSL_CERT_PATH=/path/to/ca-bundle.crt
optillm
```
⚠️ **Security Note**: Disabling SSL verification is insecure and should only be used in development. For production environments with custom CAs, use `--ssl-cert-path` instead. See [SSL_CONFIGURATION.md](SSL_CONFIGURATION.md) for details.
## Implemented techniques
| Approach | Slug | Description |
| ------------------------------------ | ------------------ | ---------------------------------------------------------------------------------------------- |
| [MARS (Multi-Agent Reasoning System)](optillm/mars) | `mars` | Multi-agent reasoning with diverse temperature exploration, cross-verification, and iterative improvement |
| [Cerebras Planning and Optimization](optillm/cepo) | `cepo` | Combines Best of N, Chain-of-Thought, Self-Reflection, Self-Improvement, and various prompting techniques |
| CoT with Reflection | `cot_reflection` | Implements chain-of-thought reasoning with \<thinking\>, \<reflection\> and \<output\> sections |