YZ Index
Winzheng · Task Communication Rankings
Structured output quality and formatting compliance.
Side Dimension Rankings: Communication + Judgment sub-dimension performance
| # | Model | Task Communication | Code Execution | Overall |
|---|---|---|---|---|
| 🥇 | Claude Opus 4.7 | 90.3 | 89 | |
| 🥈 | Claude Sonnet 4.6 | 87.6 | 87.2 | |
| 🥉 | Grok 4 | 93.9 | 89.9 | |
| 4 | GPT-o3 | 84.8 | 82.8 | |
| 5 | GPT-5.5 | 81.9 | 80.9 | |
| 6 | Qwen3 Max | 89.7 | 86.2 | |
| 7 | DeepSeek V4 Pro | 87.9 | 83.3 | |
| 8 | Gemini 3.1 Pro | 88.4 | 84.8 | |
| 9 | Gemini 2.5 Pro | 88.1 | 86.4 | |
| 10 | Doubao Pro | 94.6 | 88.8 | |
| 11 | ERNIE Bot 4.5 | 78 | 76.9 | |
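
The listed positions do not follow the Task Communication column (Doubao Pro, at position 10, has the highest score at 94.6), so the ordering presumably comes from a value that is not shown here. As a minimal sketch for readers who want to re-rank by a single column, the snippet below sorts the rows using only the scores from the table. The names `ROWS`, `rank_by`, and `top_models` are hypothetical helpers for illustration and are not part of the YZ Index.

```python
# Scores copied from the table: (model, task_communication, code_execution).
# The Overall column is empty in the source, so it is not included here.
ROWS = [
    ("Claude Opus 4.7", 90.3, 89.0),
    ("Claude Sonnet 4.6", 87.6, 87.2),
    ("Grok 4", 93.9, 89.9),
    ("GPT-o3", 84.8, 82.8),
    ("GPT-5.5", 81.9, 80.9),
    ("Qwen3 Max", 89.7, 86.2),
    ("DeepSeek V4 Pro", 87.9, 83.3),
    ("Gemini 3.1 Pro", 88.4, 84.8),
    ("Gemini 2.5 Pro", 88.1, 86.4),
    ("Doubao Pro", 94.6, 88.8),
    ("ERNIE Bot 4.5", 78.0, 76.9),
]

TASK_COMMUNICATION = 1  # tuple index of the Task Communication score
CODE_EXECUTION = 2      # tuple index of the Code Execution score


def rank_by(rows, column):
    """Return the rows sorted from highest to lowest score in `column`."""
    return sorted(rows, key=lambda row: row[column], reverse=True)


def top_models(rows, column, n=3):
    """Return the names of the n highest-scoring models in `column`."""
    return [row[0] for row in rank_by(rows, column)[:n]]


if __name__ == "__main__":
    # Print the table re-ranked by Task Communication score.
    for position, (model, comm, code) in enumerate(
        rank_by(ROWS, TASK_COMMUNICATION), start=1
    ):
        print(f"{position:>2}  {model:<18} {comm:5.1f} {code:5.1f}")
```

Sorted this way, the top three by Task Communication are Doubao Pro, Grok 4, and Claude Opus 4.7, while Grok 4 leads on Code Execution.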