FP4 Mixed-Precision Inference Optimization on AMD GPUs
We developed Petit, a collection of mixed-precision GPU kernels (FP16/BF16 activations × FP4 weights) for AMD GPUs, enabling 1.74× faster Llama 3.3 70B inference on existing MI250/MI300 hardware without upgrades.
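To make the idea concrete, here is an illustrative sketch (not Petit's actual kernels) of what FP16 × FP4 mixed-precision GEMM amounts to: weights are stored as 4-bit e2m1 codes plus a scale, dequantized to higher precision on the fly, and multiplied against FP16 activations. The lookup table, per-row scaling scheme, and function names below are assumptions for illustration only; real kernels fuse these steps on the GPU rather than materializing the FP16 weights.

```python
import numpy as np

# Assumed e2m1 (FP4) magnitude table: 1 sign bit, 2 exponent bits, 1 mantissa bit.
# Low 3 bits select the magnitude; bit 3 is the sign.
E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float16)

def dequantize_fp4(codes: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Decode FP4 codes (uint8, one code per element, low 4 bits used)
    into FP16, applying a per-output-channel scale (illustrative scheme)."""
    sign = np.where(codes & 0x8, np.float16(-1.0), np.float16(1.0))
    mag = E2M1[codes & 0x7]
    return sign * mag * scale[:, None].astype(np.float16)

def gemm_fp16_fp4(a_fp16: np.ndarray, w_codes: np.ndarray,
                  w_scale: np.ndarray) -> np.ndarray:
    """FP16 activations x FP4 weights: dequantize, then accumulate in FP32."""
    w = dequantize_fp4(w_codes, w_scale)
    return a_fp16.astype(np.float32) @ w.astype(np.float32).T

# Tiny example: 2 tokens x 4 features, 2 output channels.
a = np.ones((2, 4), dtype=np.float16)
codes = np.array([[1, 2, 9, 0], [7, 3, 4, 10]], dtype=np.uint8)
scale = np.array([1.0, 0.5], dtype=np.float16)
out = gemm_fp16_fp4(a, codes, scale)
```

The payoff of this layout is that weights occupy 4 bits each in memory (4× smaller than FP16), so memory-bandwidth-bound decode-time inference speeds up even though arithmetic still happens at higher precision.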