# bgiserver

**Repository Path**: jinfagang/BigServer

## Basic Information

- **Project Name**: bgiserver
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-10-03
- **Last Updated**: 2025-10-03

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# BigServer

BigServer is a high-performance LLM serving framework that supports multiple backends (vLLM, SGLang, Transformers) behind an OpenAI-compatible API. It is designed to be easy to use and customize, and supports a variety of models, including Qwen3VL, MiniCPM, and more.

Use cases for BigServer:

- Scenarios that `vllm serve` alone cannot handle, such as user authentication, authorization, and request auditing;
- Scenarios that require precise control over the inference process, such as custom conversation templates or custom stop words;
- Scenarios that require monitoring and analyzing the inference process.

In short, by unifying multiple backends instead of relying on a single inference backend, and by bundling offline usage for each backend, `bigserver` makes it much easier for users to customize and build their own inference backend.

## Features

- OpenAI-compatible API endpoints
- Support for multiple inference backends (vLLM, SGLang, Transformers)
- Easy model switching and management
- Multimodal model support (for models like Qwen3VL)
- FastAPI-based with async/await support
- Highly customizable and extensible

## Installation

```bash
pip install bigserver
```

Or for development:

```bash
git clone <repository-url>
cd BigServer
pip install -e .
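# Optional sanity check: confirm the editable install imports cleanly.
# (The import name "bigserver" is an assumption based on the package name.)
python -c "import bigserver"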
```

To install with specific backends:

```bash
# For vLLM backend
pip install bigserver[vllm]

# For SGLang backend
pip install bigserver[sglang]

# For both
pip install bigserver[vllm,sglang]
```

## Quick Start

Start the server with the default model:

```bash
python api.py
```

Or programmatically:

```python
from bigserver import serve

serve(
    model_name="microsoft/DialoGPT-medium",  # or any Hugging Face model
    backend="transformers",  # vllm, sglang, or transformers
    port=8000
)
```

## Usage

Once the server is running, you can use it with OpenAI-compatible clients:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="dummy-key",  # BigServer doesn't require a real API key
)

# Chat completion
response = client.chat.completions.create(
    model="microsoft/DialoGPT-medium",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)
print(response.choices[0].message.content)
```

## Configuration

You can configure the server using environment variables:

```bash
export MODEL_NAME="your-model-name"
export BACKEND="transformers"  # vllm, sglang, transformers
export MODEL_PATH="/path/to/local/model"
export HOST="0.0.0.0"
export PORT=8000
```

Or by passing parameters to the `serve` function:

```python
from bigserver import serve

serve(
    model_name="Qwen/Qwen2-7B",
    backend="transformers",
    model_path="/path/to/local/model",  # optional
    host="0.0.0.0",
    port=8000
)
```

## Supported Models

BigServer supports a wide range of models that work with the supported backends:

- Hugging Face transformers models
- Vision-language models (QwenVL, MiniCPM-V, etc.)
- Text generation models
- Instruction-following models
- Multimodal models (with appropriate backends)

## Backends

BigServer supports three different inference backends:

1. **Transformers**: Standard Hugging Face implementation, good for development and experimentation
2. **vLLM**: High-performance inference with advanced optimization techniques
3. **SGLang**: Efficient serving framework with runtime optimization

## Architecture

```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   API Client    │───▶│   FastAPI App    │───▶│  Model Manager  │
│ (OpenAI compat) │    │   (Endpoints)    │    │                 │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                                        │
                                                        ▼
                                              ┌─────────────────────┐
                                              │   Backend Manager   │
                                              │                     │
                                              ├─────────────────────┤
                                              │ ├─ VLLM Backend     │
                                              │ ├─ SGLang Backend   │
                                              │ └─ Transformers Bknd│
                                              └─────────────────────┘
```

## Contributing

We welcome contributions! Please see our [Issues](https://github.com/example/bigserver/issues) page for ways to contribute.

## License

This project is licensed under the MIT License - see the LICENSE file for details.