MTP (1 article)

SGLang Instantly Supports MiMo-V2-Flash Model

SGLang now supports MiMo-V2-Flash, a 309B-parameter model built for efficient inference with sliding window attention and multi-layer multi-token prediction (MTP). It achieves a balance of throughput and latency on H200 GPUs.