PD-Multiplexing: A New Paradigm for High-Goodput LLM Serving Driven by GreenContext
This article introduces PD-Multiplexing, a new serving paradigm in SGLang that uses NVIDIA's GreenContext technology (CUDA green contexts) to share the resources of a single GPU between the prefill and decode phases. This efficient intra-GPU sharing lets LLM services achieve higher goodput, that is, more requests completed per second within their latency targets.