Optimizing GPT-OSS on NVIDIA DGX Spark: Unleashing Spark's Maximum Potential
We optimized the GPT-OSS 20B and 120B models on NVIDIA DGX Spark using SGLang, achieving state-of-the-art performance of ~70 tokens/s for the 20B model and ~50 tokens/s for the 120B model. This throughput enables fully local AI applications, including coding agents.
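To give a feel for what these throughput figures mean in practice, here is a rough back-of-the-envelope sketch in Python. It assumes the approximate rates quoted above are steady generation (decode) rates and ignores prompt-processing time. The 1,000-token response length and the names `generation_seconds` and `REPORTED_RATES` are hypothetical, chosen only for illustration.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Estimate the wall-clock time to generate num_tokens at a steady rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Approximate rates taken from the summary above (assumed to be decode throughput).
REPORTED_RATES = {
    "GPT-OSS 20B": 70.0,
    "GPT-OSS 120B": 50.0,
}

# Hypothetical response length, for illustration only.
RESPONSE_TOKENS = 1000

for model_name, rate in REPORTED_RATES.items():
    seconds = generation_seconds(RESPONSE_TOKENS, rate)
    print(f"{model_name}: ~{seconds:.1f} s for {RESPONSE_TOKENS} tokens")
```

Under these assumptions, a 1,000-token answer takes on the order of 14 seconds on the 20B model and 20 seconds on the 120B model, which is within the range where an interactive coding agent stays usable.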