NVFP4 (1 article)

GB200 NVL72 Deployment DeepSeek Optimization (Part 2): 3.8x Prefill and 4.8x Decode Throughput

The SGLang team reports its progress optimizing DeepSeek V3/R1 inference on GB200 NVL72. Using techniques such as FP8 attention, NVFP4 MoE, and large-scale expert parallelism, it reaches 26,156 input tokens/s in prefill and 13,386 output tokens/s in decode per NVIDIA Blackwell GPU.