KTransformers Accelerates SGLang's Heterogeneous Inference
KTransformers, developed by Tsinghua University's MadSys group and Approaching.AI, optimizes CPU/GPU collaborative inference for sparse MoE models through AMX-optimized kernels, efficient device coordination, and an expert-deferral mechanism. It is now integrated into SGLang for enhanced performance.
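To give a flavor of the heterogeneous-placement idea mentioned above, here is a minimal, hypothetical sketch (not the KTransformers API): in CPU/GPU collaborative MoE inference, frequently activated ("hot") experts can be kept in GPU memory while the long tail of rarely activated experts is served from CPU. The function name and capacity parameter below are illustrative assumptions.

```python
# Hypothetical sketch of hot/cold expert placement for heterogeneous
# MoE inference; NOT the actual KTransformers or SGLang API.
from collections import Counter

def plan_expert_placement(activation_counts, gpu_capacity):
    """Place the most frequently activated experts on GPU; the rest
    are served from CPU (where AMX-optimized kernels could apply)."""
    # Rank experts by observed activation frequency, highest first.
    ranked = [expert for expert, _ in Counter(activation_counts).most_common()]
    gpu_experts = set(ranked[:gpu_capacity])   # hot experts stay on GPU
    cpu_experts = set(ranked[gpu_capacity:])   # cold experts go to CPU
    return gpu_experts, cpu_experts

# Example: 8 experts, GPU memory budget for 3 of them.
counts = {0: 50, 1: 5, 2: 40, 3: 2, 4: 30, 5: 1, 6: 3, 7: 4}
gpu, cpu = plan_expert_placement(counts, gpu_capacity=3)
# gpu -> {0, 2, 4}; the remaining five experts run on CPU
```

In a real system the placement would be refreshed as routing statistics drift, and deferred (CPU-resident) experts would overlap their computation with GPU work to hide latency.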