# OpenLLM
An open platform for running large language models as OpenAI-compatible API endpoints. OpenLLM lets you serve any supported open-source model with a single command and includes a built-in chat interface for testing.
## What You Can Do After Deployment
- Visit your domain — Open the built-in chat UI to interact with your LLM
- Use the OpenAI-compatible API — Connect any OpenAI SDK client to your endpoint for programmatic access
- Integrate with frameworks — Use with LangChain, LlamaIndex, AutoGen, and other AI frameworks
- Test with the playground — Experiment with different prompts and parameters in the web interface
- Monitor performance — View request metrics and model performance statistics
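Because the API is OpenAI-compatible, any HTTP client can talk to it. Below is a minimal sketch using only the Python standard library; the base URL and model id are assumptions (substitute your deployed domain and the model you are actually serving):

```python
import json
import urllib.request

# Assumptions: adjust BASE_URL to your deployment's domain and MODEL
# to whichever model id your OpenLLM instance is serving.
BASE_URL = "http://localhost:3000/v1"
MODEL = "meta-llama/Llama-3-8b-instruct"


def build_chat_request(prompt: str, model: str = MODEL) -> dict:
    """Build an OpenAI-style chat/completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }


def chat(prompt: str) -> str:
    """POST the payload to the OpenAI-compatible endpoint and return the reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example (requires a running OpenLLM server):
#   print(chat("Say hello in one sentence."))
```

The same endpoint also works with the official OpenAI SDKs by pointing their `base_url` at your deployment instead of api.openai.com.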
## Key Features
- OpenAI-compatible API (`/v1/chat/completions` and `/v1/completions` endpoints)
- Built-in web chat UI for interactive testing
- Support for Llama, Mistral, Gemma, Phi, Qwen, and many more models
- Streaming response support for real-time text generation
- Automatic model downloading and caching
- Quantization support (GPTQ, AWQ, SqueezeLLM)
- Multi-GPU inference with tensor parallelism
- Adapter support for LoRA fine-tuned models
- Compatible with LangChain, LlamaIndex, and BentoML
- RESTful API with automatic OpenAPI documentation
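Streaming responses arrive as OpenAI-style server-sent events: each `data:` line carries a JSON chunk with an incremental text delta, and the stream ends with a `[DONE]` sentinel. The sketch below parses such lines from any iterable of decoded text (for instance, a response body requested with `"stream": true`); the sample lines are illustrative, not captured server output:

```python
import json


def iter_stream_tokens(lines):
    """Yield incremental text deltas from OpenAI-style SSE lines ("data: {...}")."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip keep-alives and blank lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # sentinel that closes the stream
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:  # first chunk may carry only the role
            yield delta["content"]


# Example with canned SSE lines such as a server would emit:
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_tokens(sample)))  # -> Hello
```

Yielding deltas as they arrive lets a UI render text token by token instead of waiting for the full completion.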
## License
OpenLLM is released under the Apache-2.0 license; the source code is available on GitHub.