We are excited to introduce SGLang Diffusion, which brings SGLang's leading performance to the realm of diffusion model image and video generation.
SGLang Diffusion supports mainstream open-source video and image generation models, including the Wan series, Hunyuan, Qwen-Image, Qwen-Image-Edit, and Flux. It combines fast inference with ease of use through multiple interfaces (an OpenAI-compatible API, a CLI, and a Python interface), and delivers 1.2x to 5.9x speedups across diverse workloads.
In collaboration with the FastVideo team, we have built a complete ecosystem for diffusion models, from post-training to production deployment. The code is open-sourced on GitHub.
SGLang Diffusion Performance Benchmarks on H100 GPU
SGLang Diffusion Performance Benchmarks on H200 GPU
Why Bring Diffusion to SGLang?
As diffusion models become the core technology for image and video generation, the community has strongly called for extending SGLang's high performance and seamless experience to these modalities. We developed SGLang Diffusion to address this need, providing a unified high-performance engine that supports both language and diffusion tasks.
This unified approach matters because future generative architectures are likely to converge. Pioneering models like ByteDance's Bagel, Meta's Transfusion, and NVIDIA's Fast-dLLM v2 already combine autoregressive (AR) and diffusion methods. SGLang Diffusion is designed as a future-proof, high-performance solution for this direction.
Architecture
SGLang Diffusion builds on SGLang's mature serving architecture, inheriting its powerful scheduler and optimized sgl-kernel, ensuring both performance and flexibility.
At its core is ComposedPipelineBase, a flexible abstraction that orchestrates multiple modular PipelineStage components, such as DenoisingStage for denoising loops or DecodingStage for VAE decoding, allowing developers to easily build custom pipelines.
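The staged design described above can be illustrated with a minimal sketch. Every name here (`Stage`, `ToyDenoisingStage`, `ToyDecodingStage`, `ToyPipeline`) is hypothetical and chosen for illustration only; this is not the actual ComposedPipelineBase or PipelineStage interface, and the "denoising" and "decoding" steps are toy arithmetic rather than real model calls.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class Stage(ABC):
    """Hypothetical stand-in for a pipeline stage (not SGLang's real class)."""

    @abstractmethod
    def run(self, state: Dict[str, Any]) -> Dict[str, Any]:
        """Take the shared state, return an updated copy."""


class ToyDenoisingStage(Stage):
    """Toy denoising loop: halves every latent value once per step."""

    def __init__(self, num_steps: int) -> None:
        self.num_steps = num_steps

    def run(self, state: Dict[str, Any]) -> Dict[str, Any]:
        latent = state["latent"]
        for _ in range(self.num_steps):
            latent = [0.5 * x for x in latent]
        return {**state, "latent": latent}


class ToyDecodingStage(Stage):
    """Toy decoder: maps latents to integer pixel values clamped to 0..255."""

    def run(self, state: Dict[str, Any]) -> Dict[str, Any]:
        pixels = [max(0, min(255, round(x * 255))) for x in state["latent"]]
        return {**state, "pixels": pixels}


class ToyPipeline:
    """Runs the stages in order, threading the state through each one."""

    def __init__(self, stages: List[Stage]) -> None:
        self.stages = stages

    def __call__(self, state: Dict[str, Any]) -> Dict[str, Any]:
        for stage in self.stages:
            state = stage.run(state)
        return state


pipeline = ToyPipeline([ToyDenoisingStage(num_steps=2), ToyDecodingStage()])
result = pipeline({"latent": [1.0, 0.5]})
```

Because each stage only reads and returns the shared state, a custom pipeline in this sketch is just a different list of stages.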
To achieve top speeds, we integrate advanced parallelism techniques: core Transformers support Unified Sequence Parallelism (USP, including Ulysses-SP and Ring-Attention), while other components support CFG-parallelism and tensor parallelism (TP).
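As a rough illustration of the idea behind CFG-parallelism: classifier-free guidance needs a conditional and an unconditional forward pass per denoising step, and the two passes are independent, so they can run on separate devices before being combined. The sketch below assumes that general meaning of the term and uses threads as a stand-in for GPU ranks. `cfg_combine`, `cfg_parallel_step`, and `toy_model` are hypothetical names, not SGLang Diffusion functions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Vector = List[float]


def cfg_combine(cond: Vector, uncond: Vector, guidance_scale: float) -> Vector:
    """Standard classifier-free guidance: uncond + s * (cond - uncond)."""
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


def cfg_parallel_step(
    model: Callable[[Vector, bool], Vector],
    latent: Vector,
    guidance_scale: float,
) -> Vector:
    """Run the two CFG branches concurrently, then combine them.

    Threads stand in for the two devices; a real system would place each
    branch on its own GPU and exchange the predictions between ranks.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        cond_future = pool.submit(model, latent, True)
        uncond_future = pool.submit(model, latent, False)
        return cfg_combine(
            cond_future.result(), uncond_future.result(), guidance_scale
        )


def toy_model(latent: Vector, conditioned: bool) -> Vector:
    """Toy noise predictor: the conditional branch adds 1.0 to every value."""
    shift = 1.0 if conditioned else 0.0
    return [x + shift for x in latent]


guided = cfg_parallel_step(toy_model, [0.0, 1.0], guidance_scale=2.0)
```

Sequence parallelism (USP) splits the work differently, sharding the token sequence of a single forward pass across devices, so the two techniques can in principle be used together.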
The system is based on an enhanced FastVideo branch, developed through close collaboration with their team: SGLang Diffusion focuses on inference acceleration, while FastVideo provides training support such as model distillation.
Model Support
We support popular open-source video and image generation models:
- Video models: Wan series, FastWan, Hunyuan
- Image models: Qwen-Image, Qwen-Image-Edit, Flux
See the complete support list here.
Usage
We provide a CLI, a Python engine API, and an OpenAI-compatible API for easy integration.
Installation
# Via pip or uv
uv pip install 'sglang[diffusion]' --prerelease=allow
# From source
git clone https://github.com/sgl-project/sglang.git
cd sglang
uv pip install -e "python[diffusion]" --prerelease=allow
CLI
Start the server and send requests:
sglang serve --model-path black-forest-labs/FLUX.1-dev --port 3000
curl http://127.0.0.1:3000/v1/images/generations \
-o >(jq -r '.data[0].b64_json' | base64 --decode > example.png) \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "black-forest-labs/FLUX.1-dev",
"prompt": "A cute baby sea otter",
"n": 1,
"size": "1024x1024",
"response_format": "b64_json"
}'
Or generate images directly:
sglang generate --model-path black-forest-labs/FLUX.1-dev \
--prompt "A Logo With Bold Large Text: SGL Diffusion" \
--save-output
See the Installation Guide and CLI Guide for details.
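The same request can also be sent from Python. The following is a minimal sketch using only the standard library; it assumes the server from the `sglang serve` command above is listening on port 3000 and returns the same `data[0].b64_json` field that the curl example extracts. The helper names (`build_payload`, `decode_first_image`, `generate_image`) are illustrative and not part of SGLang.

```python
import base64
import json
import urllib.request
from typing import Any, Dict


def build_payload(
    prompt: str,
    model: str = "black-forest-labs/FLUX.1-dev",
    size: str = "1024x1024",
) -> Dict[str, Any]:
    """Build the same JSON body as the curl example."""
    return {
        "model": model,
        "prompt": prompt,
        "n": 1,
        "size": size,
        "response_format": "b64_json",
    }


def decode_first_image(response_body: Dict[str, Any]) -> bytes:
    """Extract and base64-decode the first image in the response."""
    return base64.b64decode(response_body["data"][0]["b64_json"])


def generate_image(
    prompt: str,
    base_url: str = "http://127.0.0.1:3000",  # assumed local server address
    api_key: str = "",
) -> bytes:
    """POST the request to /v1/images/generations and return raw image bytes."""
    request = urllib.request.Request(
        f"{base_url}/v1/images/generations",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return decode_first_image(json.load(response))
```

With the server running, calling `generate_image("A cute baby sea otter")` and writing the returned bytes to `example.png` in binary mode should mirror the curl command.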
Demo
Text-to-Video: Wan-AI/Wan2.1
sglang generate --model-path Wan-AI/Wan2.1-T2V-1.3B-Diffusers \
--prompt "A curious raccoon" \
--save-output
Image-to-Video: Wan-AI/Wan2.1-I2V
sglang generate --model-path=Wan-AI/Wan2.1-I2V-14B-480P-Diffusers \
--prompt="Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard..." \
--image-path="https://github.com/Wan-Video/Wan2.2/blob/990af50de458c19590c245151197326e208d7191/examples/i2v_input.JPG?raw=true" \
--num-gpus 2 --enable-cfg-parallel --save-output
Text-to-Image: FLUX
sglang generate --model-path black-forest-labs/FLUX.1-dev \
--prompt "A Logo With Bold Large Text: SGL Diffusion" \
--save-output
Text-to-Image: Qwen-Image
sglang generate --model-path=Qwen/Qwen-Image \
--prompt='A curious raccoon' \
--width=720 --height=720 --save-output
Image-to-Image: Qwen-Image-Edit
sglang generate --model-path=Qwen/Qwen-Image-Edit \
--prompt="Convert 2D style to 3D style" --image-path="https://github.com/lm-sys/lm-sys.github.io/releases/download/test/TI2I_Qwen_Image_Edit_Input.jpg" \
--width=1536 --height=1024 --save-output

Performance Benchmarks
As shown in the charts at the top, SGLang Diffusion outperforms popular baselines such as Hugging Face Diffusers in both image and video generation. Parallel configurations such as CFG-Parallel and USP deliver significant additional speedups over single-GPU execution.
Roadmap and Diffusion Ecosystem
We are collaborating with the FastVideo team to build a comprehensive diffusion ecosystem, providing end-to-end solutions from model training to high-performance inference.