Quantization-Aware Training (1 article)

Deploying 1TB Models on a Single H200: End-to-End INT4 QAT RL Practice

The SGLang RL team implements end-to-end INT4 quantization-aware training (QAT) for RL, improving training stability and efficiency. The approach enables deployment of a ~1TB model on a single H200 GPU while keeping training and inference consistent.
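For readers new to QAT, the core idea can be sketched as "fake quantization": weights are rounded to the INT4 grid and immediately dequantized in the forward pass, so training sees the same rounding error that inference will. The snippet below is a minimal NumPy illustration only, not the SGLang team's implementation. The function name `fake_quant_int4`, the symmetric range [-7, 7], and the group size of 32 are assumptions chosen for the example.

```python
import numpy as np

def fake_quant_int4(w, group_size=32):
    """Symmetric per-group INT4 fake quantization (hypothetical helper).

    Returns the dequantized weights and the integer codes.
    Assumes w.size is divisible by group_size.
    """
    w = np.asarray(w, dtype=np.float32)
    groups = w.reshape(-1, group_size)
    # One scale per group, mapping the largest magnitude to the code 7.
    max_abs = np.abs(groups).max(axis=1, keepdims=True)
    scale = np.where(max_abs > 0, max_abs / 7.0, 1.0)
    # Round to the nearest integer code and clamp to the symmetric range.
    codes = np.clip(np.round(groups / scale), -7, 7)
    dequant = codes * scale
    return dequant.reshape(w.shape), codes.astype(np.int8).reshape(w.shape)

weights = np.linspace(-1.0, 1.0, 64).reshape(2, 32)
dequant, codes = fake_quant_int4(weights)
```

In an actual QAT setup the dequantized weights would feed the forward pass, and gradients are commonly passed through the rounding step unchanged (a straight-through estimator); that part is omitted here.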